
centamiv/vektor
===============

A native PHP Vector Database implementation with strict binary storage and Zero-RAM overhead.

MIT license · PHP ^8.2 · [Source](https://github.com/centamiv/vektor) · [Packagist](https://packagist.org/packages/centamiv/vektor)

Vektor - Native PHP Vector Database
===================================

**Vektor** is a high-performance, purely file-based, embedded **Vector Database** written entirely in native PHP. It is designed for **Zero-RAM Overhead**, meaning it does not require loading your entire dataset into memory to function.

**Each Vektor instance operates as a standalone database**, with data stored by default in the `data/` directory.

Instead of memory-heavy indexes, Vektor utilizes strict binary file layouts and optimized disk-seeking strategies to perform **Approximate Nearest Neighbor (ANN)** searches using the **HNSW (Hierarchical Navigable Small World)** algorithm.
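To give a feel for how a graph-based ANN search navigates, here is a minimal sketch of the greedy step HNSW performs on a single layer: starting from an entry node, hop to whichever neighbour is closest to the query until no neighbour improves. This is not Vektor's implementation. The function names, the in-memory arrays, and the use of squared Euclidean distance are assumptions made for illustration, and a real HNSW search also descends through several layers and keeps a candidate list rather than a single node.

```php
<?php
declare(strict_types=1);

/** Squared Euclidean distance between two equal-length vectors. */
function squaredDistance(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $x) {
        $diff = $x - $b[$i];
        $sum += $diff * $diff;
    }
    return $sum;
}

/**
 * Greedy walk over one graph layer (hypothetical helper).
 *
 * @param array<int, float[]> $vectors node id => vector
 * @param array<int, int[]>   $edges   node id => neighbour ids
 */
function greedySearchLayer(array $vectors, array $edges, int $entry, array $query): int
{
    $current = $entry;
    $best = squaredDistance($vectors[$current], $query);

    do {
        // Pick the neighbour of $current that is closest to the query, if any beats $best.
        $next = $current;
        foreach ($edges[$current] ?? [] as $neighbour) {
            $distance = squaredDistance($vectors[$neighbour], $query);
            if ($distance < $best) {
                $best = $distance;
                $next = $neighbour;
            }
        }
        $moved = $next !== $current;
        $current = $next;
    } while ($moved);

    return $current;
}

// Four points on a line, linked as a chain 0 - 1 - 2 - 3.
$vectors = [0 => [0.0], 1 => [1.0], 2 => [2.0], 3 => [3.0]];
$edges = [0 => [1], 1 => [0, 2], 2 => [1, 3], 3 => [2]];

echo greedySearchLayer($vectors, $edges, 0, [2.9]), PHP_EOL; // walks 0 -> 1 -> 2 -> 3
```

Because the walk only ever follows edges, it touches a handful of nodes instead of the whole dataset, which is what makes a disk-backed graph practical.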

---

Features
--------

- **Pure PHP**: No external dependencies or C extensions required. Runs on any standard PHP 8.2+ environment.
- **Zero-RAM Overhead**: Data is read directly from disk. Memory usage is constant regardless of dataset size.
- **HNSW Algorithm**: Efficient graph-based index for fast approximate nearest neighbor search.
- **Binary Storage**: Compact binary file formats for Vectors, Graph connections, and Metadata.
- **Embedded or Server**: Use it directly in your PHP code or run it as a standalone HTTP API server.
- **Thread-Safety**: Implements file locking (flock) to safely handle concurrent reads and writes.
- **Cosine Similarity**: Similarity metric suited to high-dimensional embeddings. The vector dimension is configurable (default 1536).
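
For readers unfamiliar with the metric, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below is a plain reference implementation for illustration only; `cosineSimilarity` is a hypothetical helper, not part of Vektor's API, and returning 0 for a zero vector is a convention chosen here.

```php
<?php
declare(strict_types=1);

/**
 * Cosine similarity between two equal-length vectors.
 * The result lies in [-1, 1]; 1 means both vectors point the same way.
 */
function cosineSimilarity(array $a, array $b): float
{
    if ($a === [] || count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must be non-empty and of equal length.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $x) {
        $y = $b[$i];
        $dot += $x * $y;
        $normA += $x * $x;
        $normB += $y * $y;
    }

    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0; // Convention for zero vectors (an assumption of this sketch).
    }

    return $dot / sqrt($normA * $normB);
}

echo cosineSimilarity([1.0, 2.0], [2.0, 4.0]), PHP_EOL; // same direction
echo cosineSimilarity([1.0, 0.0], [0.0, 1.0]), PHP_EOL; // orthogonal
```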

---

Requirements
------------

- **PHP**: 8.2 or higher
- **Composer**: For dependency management

---

Upgrading to v2.0.0
-------------------

⚠️ **Important Breaking Change**: Version 2.0.0 introduces a change to the binary record length in `meta.bin`. If you are upgrading from a previous version, your existing data files will be **incompatible**. You must **delete your existing `data/` directory** and re-index (rebuild) your dataset.

---

Installation
------------

### 1. Installation via Composer

To use Vektor in your existing PHP project:

```
composer require centamiv/vektor
```

### 2. Standalone Installation

To run Vektor as a standalone API server:

```
git clone https://github.com/centamiv/vektor.git
cd vektor
composer install --no-dev
```

Ensure the `data/` directory is writable by your web server or script user:

```
mkdir -p data
chmod -R 775 data
```

---

Configuration
-------------

Vektor uses a `.env` file for configuration when running as a server.

1. Copy the example environment file:

    ```
    cp .env.example .env
    ```
2. Open `.env` and set your API token and vector dimension:

    ```
    # .env
    VEKTOR_API_TOKEN=your_secure_random_string_here
    VEKTOR_DIMENSIONS=1536
    ```

- **`VEKTOR_API_TOKEN`**: If set, all API requests (except `/up`) must include this token in the `Authorization` header. If left empty, the API accepts unauthenticated requests from anyone who can reach it.
- **`VEKTOR_DIMENSIONS`**: The dimension of your vectors (default: 1536). **Important**: Changing this requires a fresh database, so delete the `data/` directory and re-index.

---

Usage
-----

Vektor is designed for flexibility, allowing you to either integrate it directly into your PHP projects as a library or deploy it as a standalone REST API server.

---

### Usage: HTTP API Server

Vektor includes a built-in Controller to run as a REST API. You can serve this using Apache, Nginx, or the PHP built-in server.

#### Starting the Server

For testing/development:

```
# Serves the public/ directory on port 8000
php -S 0.0.0.0:8000 -t public
```

#### Authentication

If `VEKTOR_API_TOKEN` is set in your `.env`, you must include the following header in all requests except `GET /up`:

```
Authorization: Bearer <your_api_token>
```
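
As a rough sketch of what an authenticated client might look like from PHP, the helper below builds the headers and sends a JSON request with cURL. `vektorHeaders` and `vektorRequest` are hypothetical names introduced here, the base URL assumes the development server started above on port 8000, and the token is assumed to be the same value configured in `.env`.

```php
<?php
declare(strict_types=1);

/** Builds request headers; Authorization is added only when a token is configured. */
function vektorHeaders(?string $token): array
{
    $headers = ['Content-Type: application/json'];
    if ($token !== null && $token !== '') {
        $headers[] = 'Authorization: Bearer ' . $token;
    }
    return $headers;
}

/** Sends a JSON request and returns the decoded body. A null $body means GET. */
function vektorRequest(string $baseUrl, string $path, ?array $body = null, ?string $token = null): array
{
    $ch = curl_init(rtrim($baseUrl, '/') . $path);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, vektorHeaders($token));
    if ($body !== null) {
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($body, JSON_THROW_ON_ERROR));
    }

    $raw = curl_exec($ch);
    if ($raw === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }

    $decoded = json_decode($raw, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($decoded)) {
        throw new RuntimeException('Unexpected response body.');
    }
    return $decoded;
}

// Needs a running server, so the call is left commented out:
// $info = vektorRequest('http://localhost:8000', '/info', null, getenv('VEKTOR_API_TOKEN') ?: null);
print_r(vektorHeaders('example-token'));
```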

#### API Endpoints

##### `GET /up`

Health check endpoint.

- **Auth Required**: No
- **Response**:

```
{
  "status": "up"
}
```

##### `GET /info`

Returns database statistics.

- **Response**:

```
{
  "storage": {
    "vector_file_bytes": 1048576,
    "graph_file_bytes": 524288,
    "meta_file_bytes": 2048,
    "payload_file_bytes": 4096
  },
  "records": {
    "vectors_total": 150,
    "graph_nodes": 150
  },
  "config": {
    "dimension": 1536,
    "max_levels": 4
  }
}
```

##### `POST /insert`

Insert a vector.

- **Body**:

```
{
  "id": "my-doc-id",
  "vector": [0.1, 0.2, 0.3, ...],
  "metadata": {
    "source": "docs/intro.md",
    "chunk": 3
  }
}
```

- **Response**:

```
{
  "status": "success",
  "id": "my-doc-id"
}
```

##### `POST /search`

Search for nearest neighbors.

- **Body**:

```
{
  "vector": [0.1, 0.2, 0.3, ...],
  "k": 5
}
```

- **Response**:

```
{
  "results": [
    { "id": "my-doc-id", "distance": 0.95 },
    { "id": "another-id", "distance": 0.88 }
  ]
}
```

Optionally pass `"include_vector": true` to also get vector data of similar documents. Optionally pass `"include_metadata": true` to also get metadata stored with the document.

- **Body**:

```
{
  "vector": [0.1, 0.2, 0.3, ...],
  "include_vector": true,
  "include_metadata": true,
  "k": 5
}
```

- **Response**:

```
{
  "results": [
    { "id": "my-doc-id", "distance": 0.95, "vector": [0.5, 1.0, 0.3, ...], "metadata": { "source": "docs/intro.md", "chunk": 3 } },
    { "id": "another-id", "distance": 0.88, "vector": [0.5, 1.1, 0.3, ...], "metadata": { "source": "docs/faq.md", "chunk": 1 } }
  ]
}
```
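
On the client side, a response like the ones above can be decoded into plain PHP arrays. The sketch below is one possible way to do it: `parseSearchResults` is a hypothetical helper, the field names are taken from the sample responses, and the optional fields are treated as absent unless the request asked for them.

```php
<?php
declare(strict_types=1);

/** Decodes a /search response body into a list of normalized rows. */
function parseSearchResults(string $json): array
{
    $decoded = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    $rows = [];
    foreach ($decoded['results'] ?? [] as $result) {
        $rows[] = [
            'id' => (string) $result['id'],
            'distance' => (float) $result['distance'],
            'metadata' => $result['metadata'] ?? null, // only present with include_metadata
            'vector' => $result['vector'] ?? null,     // only present with include_vector
        ];
    }
    return $rows;
}

// Hand-written sample body, shaped like the responses shown above.
$body = <<<'JSON'
{
  "results": [
    { "id": "my-doc-id", "distance": 0.95, "metadata": { "source": "docs/intro.md", "chunk": 3 } },
    { "id": "another-id", "distance": 0.88 }
  ]
}
JSON;

foreach (parseSearchResults($body) as $row) {
    echo $row['id'], ' => ', $row['metadata']['source'] ?? '(no metadata)', PHP_EOL;
}
```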

##### `POST /delete`

Delete a vector.

- **Body**:

```
{
  "id": "my-doc-id"
}
```

- **Response**:

```
{
  "status": "success",
  "message": "..."
}
```

##### `POST /optimize`

Trigger database optimization.

- **Response**:

```
{
  "status": "success",
  "message": "..."
}
```

---

### Usage: Embedded Library

You can use Vektor directly in your PHP scripts without running an HTTP server. This is the fastest way to interact with the database.

#### Configuration

By default, Vektor stores data in the `data/` directory relative to the package root. You can change this path using the `Config` class:

```
use Centamiv\Vektor\Core\Config;

Config::setDataDir(__DIR__ . '/my_custom_data_dir');
```

You can also set the vector dimensions (default 1536):

```
Config::setDimensions(768);
// Note: This must be called BEFORE initializing Indexer/Searcher
```

#### Initialization

```
use Centamiv\Vektor\Services\Indexer;
use Centamiv\Vektor\Services\Searcher;
use Centamiv\Vektor\Services\Optimizer;

// The Indexer handles writing (Insert, Delete)
$indexer = new Indexer();

// The Searcher handles reading (Search)
$searcher = new Searcher();
```

#### 1. Inserting Vectors

Vectors must be arrays of floats whose length matches the configured dimension (**1536** by default).

```
$id = "doc-123"; // String ID (max 36 chars)
$vector = [0.0123, -0.5231, ...]; // Array of 1536 floats

$metadata = ['source' => 'docs/intro.md', 'chunk' => 3]; // Optional metadata stored with the vector

// Insert, or update if the ID already exists.
// Note: an update is essentially an append plus a pointer update.
$indexer->insert($id, $vector, $metadata);
```
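
Since inserts are rejected when the dimension does not match, it can help to check inputs before calling the indexer. The following is a minimal sketch of such a pre-flight check. `assertInsertable` is a hypothetical helper, not part of Vektor; it assumes the 36-character ID limit is counted in bytes and that an empty ID is not useful, neither of which is confirmed by the documentation above.

```php
<?php
declare(strict_types=1);

/**
 * Throws if the ID or vector would not fit the constraints described above.
 * $dimensions should match the value Vektor is configured with.
 */
function assertInsertable(string $id, array $vector, int $dimensions = 1536): void
{
    // Assumption: the limit is in bytes, hence strlen() rather than mb_strlen().
    if ($id === '' || strlen($id) > 36) {
        throw new InvalidArgumentException('ID must be 1 to 36 characters long.');
    }

    if (count($vector) !== $dimensions) {
        throw new InvalidArgumentException(
            sprintf('Expected %d dimensions, got %d.', $dimensions, count($vector))
        );
    }

    foreach ($vector as $value) {
        if ((!is_int($value) && !is_float($value)) || !is_finite((float) $value)) {
            throw new InvalidArgumentException('Vector values must be finite numbers.');
        }
    }
}

// Tiny dimension for the demo; real vectors would use the configured size.
assertInsertable('doc-123', [0.1, -0.2, 0.3], 3);
echo "doc-123 passes the checks", PHP_EOL;
```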

#### 2. Searching

Find the `k` nearest neighbors to a query vector.

```
$queryVector = [0.0123, ...];
$k = 5; // Number of results

$results = $searcher->search($queryVector, $k, includeMetadata: true);

// Output:
// [
//   ['id' => 'doc-123', 'score' => 0.9823, 'metadata' => ['source' => 'docs/intro.md', 'chunk' => 3]],
//   ['id' => 'doc-456', 'score' => 0.8912, 'metadata' => ['source' => 'docs/faq.md', 'chunk' => 1]],
//   ...
// ]
```

#### 3. Deleting Vectors

Deletes a document by its ID. This performs a "soft delete" in the vector file and updates the metadata mapping.

```
$success = $indexer->delete("doc-123");

if ($success) {
    echo "Document deleted.";
} else {
    echo "Document not found.";
}
```

#### 4. Getting Statistics

Retrieve current database stats, including file sizes and node counts.

```
$stats = $indexer->getStats();
print_r($stats);
```

#### 5. Optimizing (Vacuum)

Since deletions are "soft", the file size can grow over time. Run the optimizer to rebuild the index and reclaim space. **Note**: This is a blocking operation.

```
$optimizer = new Optimizer();
$optimizer->run();
```

---

Database File Structure
-----------------------

Vektor achieves its performance and low memory footprint through four specialized binary files located in the `data/` directory.

- **`vector.bin`**: Stores raw vector data in an append-only structure.
- **`meta.bin`**: Maps external string IDs to internal file offsets using a disk-based Binary Search Tree (BST) for efficient lookups without loading maps into RAM.
- **`payload.bin`**: Stores serialized metadata (JSON) in an append-only structure referenced by `meta.bin`.
- **`graph.bin`**: Stores the HNSW Graph structure to enable fast navigation and approximate nearest neighbor searches.
- **Concurrency**: Implements advisory file locking to manage simultaneous shared reads and exclusive write operations safely.
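
To illustrate why fixed-width binary records allow lookups without loading a file into memory, here is a small sketch that appends vectors to a file and reads one back by seeking straight to its offset, taking an exclusive lock for writes and a shared lock for reads. The record layout (consecutive little-endian 32-bit floats, no header), the constants, and the function names are assumptions for the demo and do not describe Vektor's actual `vector.bin` format.

```php
<?php
declare(strict_types=1);

const DEMO_DIMENSIONS = 4;                     // tiny for the demo; Vektor defaults to 1536
const DEMO_RECORD_BYTES = DEMO_DIMENSIONS * 4; // 4 bytes per 32-bit float (assumed layout)

/** Appends one vector under an exclusive lock and returns its record index. */
function appendVector(string $path, array $vector): int
{
    $fh = fopen($path, 'c+b');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    flock($fh, LOCK_EX);
    fseek($fh, 0, SEEK_END);
    $index = intdiv(ftell($fh), DEMO_RECORD_BYTES);
    fwrite($fh, pack('g*', ...$vector)); // 'g' = little-endian 32-bit float
    fflush($fh);
    flock($fh, LOCK_UN);
    fclose($fh);
    return $index;
}

/** Reads record $index under a shared lock by seeking to index * record size. */
function readVector(string $path, int $index): array
{
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    flock($fh, LOCK_SH);
    fseek($fh, $index * DEMO_RECORD_BYTES);
    $bytes = fread($fh, DEMO_RECORD_BYTES);
    flock($fh, LOCK_UN);
    fclose($fh);

    if ($bytes === false || strlen($bytes) !== DEMO_RECORD_BYTES) {
        throw new OutOfRangeException("No record at index $index");
    }
    return array_values(unpack('g*', $bytes));
}

$path = tempnam(sys_get_temp_dir(), 'vec');
appendVector($path, [0.5, -1.25, 2.0, 0.0]);
$second = appendVector($path, [1.0, 1.0, 1.0, 1.0]);
print_r(readVector($path, $second)); // only 16 bytes are read from disk
unlink($path);
```

The offset arithmetic also shows why the record size has to stay fixed: if the dimension changed, every stored offset would point at the wrong bytes.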

---

How to generate embeddings from a document?
-------------------------------------------

Vektor stores vectors, but it does not generate them. You need an embedding model for that. A great local option is [Ollama](https://ollama.com/).

1. Install Ollama and pull an embedding model (e.g., `nomic-embed-text`).
2. Use the Ollama API to generate the vector for your text.
3. Pass that vector to Vektor:

```
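use Centamiv\Vektor\Core\Config;
use Centamiv\Vektor\Services\Indexer;

function getEmbedding(string $text): array {
    $ch = curl_init('http://localhost:11434/api/embeddings');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, ['Content-Type: application/json']);
    curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode([
        'model' => 'nomic-embed-text',
        'prompt' => $text
    ]));

    $raw = curl_exec($ch);
    curl_close($ch);

    // Fail loudly if Ollama is unreachable or returns an unexpected body
    $response = is_string($raw) ? json_decode($raw, true) : null;
    if (!isset($response['embedding'])) {
        throw new RuntimeException('Ollama did not return an embedding.');
    }

    return $response['embedding'];
}

// IMPORTANT: Vektor stores the ID, the vector, and optional metadata, but NOT the original content.
// You are responsible for storing the actual text (in files, S3, etc.).

// nomic-embed-text returns 768-dimensional vectors, while Vektor defaults to 1536.
// Set the dimension BEFORE creating the Indexer.
Config::setDimensions(768);
$indexer = new Indexer();

// 1. Read your document
$id = "doc-hello";
$text = file_get_contents("{$id}.txt");

// 2. Generate vector
$vector = getEmbedding($text);

// 3. Insert into Vektor using the filename/ID as the reference
$indexer->insert($id, $vector);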
```
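
Because the original text lives outside Vektor, search results have to be mapped back to it by ID. The sketch below assumes the convention used above, where a document with ID `doc-hello` is stored as `doc-hello.txt`, and the result shape shown in the Searching section. `loadTextsForResults` and the directory layout are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Returns id => text for each search result, or null when the file is missing.
 * Assumes each document is stored as "<id>.txt" inside $docDir.
 */
function loadTextsForResults(array $results, string $docDir): array
{
    $texts = [];
    foreach ($results as $result) {
        $id = (string) $result['id'];
        // basename() keeps an unexpected ID from escaping $docDir.
        $file = rtrim($docDir, '/') . '/' . basename($id) . '.txt';
        $texts[$id] = is_file($file) ? (string) file_get_contents($file) : null;
    }
    return $texts;
}

// Demo with a throwaway directory and a hand-written result list.
$dir = sys_get_temp_dir() . '/vektor_docs_' . uniqid();
mkdir($dir);
file_put_contents($dir . '/doc-hello.txt', 'Hello, world.');

$results = [
    ['id' => 'doc-hello', 'score' => 0.98],
    ['id' => 'doc-missing', 'score' => 0.42],
];
print_r(loadTextsForResults($results, $dir));
```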

---

Troubleshooting
---------------

### 1. Permission Denied Errors

Ensure your PHP process (e.g., `www-data`) has read/write access to the `data/` folder and the files inside it.

```
chown -R www-data:www-data data/
chmod -R 775 data/
```

### 2. "Invalid Vector Dimensions"

Vektor defaults to **1536 dimensions**. A vector with any other number of dimensions is rejected. To change the default, set `VEKTOR_DIMENSIONS` in your `.env` or call `Config::setDimensions(N)` in your code. **Important**: If you change dimensions, you must start with an empty data directory, as the binary file structure depends on the dimension size.

### 3. Slow Performance?

- **Disk I/O**: Since Vektor is disk-based, SSDs are highly recommended. HDDs will result in slow seek times.
- **Opcache**: Ensure PHP Opcache is enabled for production.

---

Contributing
------------

Contributions are welcome! Please run the test suite before submitting a PR.

```
composer test
```

The test suite includes Unit tests for storage engines and Feature tests for the HNSW logic.

---

License
-------

This project is licensed under the **MIT License**.

Maintainer: [Ivan Centamori](https://github.com/centamiv)

