mauricioperera/php-vector-store
===============================

Zero-dependency PHP vector database with BM25, hybrid search, Matryoshka, IVF indexing, and Int8 quantization

v0.1.0 · GPL-2.0-or-later · PHP >=8.1

[Source](https://github.com/MauricioPerera/php-vector-store) · [Packagist](https://packagist.org/packages/mauricioperera/php-vector-store)

PHP Vector Store
================

Zero-dependency PHP vector database with **BM25 full-text search**, **hybrid search** (vector + text), Matryoshka progressive search, IVF indexing, and Int8 quantization. Pure PHP 8.1+ — no SQLite, no C extensions, no FFI.

```
composer require mauricioperera/php-vector-store

```

Why
---

Most vector databases require C extensions (sqlite-vec), external services (Pinecone, Weaviate), or specific runtimes (Python). PHP Vector Store runs anywhere PHP runs — shared hosting, WordPress, Laravel, any framework.

**New in v0.2:** BM25 full-text search, hybrid search fusion (RRF + Weighted), multiple distance metrics, `StoreInterface` for polymorphism, typed models, and a PHPUnit test suite.

Scaling Guide
-------------

| Vectors | Recommended Config | Storage/vec | Total (100K) | Speed |
|---|---|---|---|---|
| <5K | Float32 768d + Matryoshka | 3,072 B | 300 MB | ~3ms |
| 5K-20K | Float32 384d + Matryoshka | 1,536 B | 150 MB | ~1.4ms |
| 20K-100K | Int8 384d + IVF + Matryoshka | **392 B** | **38 MB** | ~5ms |
| 100K-500K | Int8 384d + IVF + Matryoshka | **392 B** | **192 MB** | ~15ms |
| >500K | Use sqlite-vec or external service | - | - | - |

Quick Start
-----------

```
use PHPVectorStore\VectorStore;
use PHPVectorStore\QuantizedStore;
use PHPVectorStore\IVFIndex;
use PHPVectorStore\HybridSearch;
use PHPVectorStore\HybridMode;
use PHPVectorStore\Distance;
use PHPVectorStore\BM25\Index as BM25Index;

// 1. Vector search
$store = new QuantizedStore( __DIR__ . '/vectors', 384 );
$store->set( 'articles', 'art-1', $embedding, ['title' => 'My Article'] );
$store->flush();

$results = $store->matryoshkaSearch( 'articles', $query, 5, [128, 256, 384] );

// 2. Full-text search (BM25)
$bm25 = new BM25Index();
$bm25->addDocument( 'articles', 'art-1', 'My article about machine learning...' );

$results = $bm25->search( 'articles', 'machine learning', 10 );

// 3. Hybrid search (vector + text combined)
$hybrid = new HybridSearch( $store, $bm25, HybridMode::RRF );
$results = $hybrid->search( 'articles', $query_vector, 'machine learning', 5 );

// 4. Multiple distance metrics
$results = $store->search( 'articles', $query, 5, 0, Distance::Euclidean );
```

Features
--------

### Vector Storage (Float32 & Int8)

```
// Full precision: dim x 4 bytes per vector
$store = new VectorStore( '/path', 768 );

// Quantized: dim + 8 bytes per vector (4x smaller)
$q8 = new QuantizedStore( '/path', 384 );
```

Both implement `StoreInterface` — use them interchangeably.
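The `dim + 8` figure can be made concrete with a small scalar-quantization sketch. It assumes the 8 extra bytes hold a per-vector minimum and scale as two float32 values; that layout, and the helper names `quantizeInt8` and `dequantizeInt8`, are illustrative assumptions and not necessarily what `QuantizedStore` does internally.

```php
<?php
declare(strict_types=1);

/**
 * Quantize a float vector to one unsigned byte per dimension plus an 8-byte header.
 * ASSUMPTION: header = (min, scale) as two little-endian float32 values.
 */
function quantizeInt8(array $vector): string
{
    $min   = min($vector);
    $max   = max($vector);
    $scale = ($max - $min) / 255.0;
    if ($scale == 0.0) {
        $scale = 1.0; // constant vector: avoid dividing by zero
    }
    $codes = [];
    foreach ($vector as $v) {
        $codes[] = (int) round(($v - $min) / $scale); // 0..255
    }
    // 'g' = little-endian float32, 'C' = unsigned byte
    return pack('g2', $min, $scale) . pack('C*', ...$codes);
}

/** Reverse of quantizeInt8(): rebuild approximate floats from the blob. */
function dequantizeInt8(string $blob): array
{
    $header = unpack('gmin/gscale', substr($blob, 0, 8));
    $codes  = unpack('C*', substr($blob, 8));
    return array_map(
        fn (int $c): float => $header['min'] + $c * $header['scale'],
        array_values($codes)
    );
}

// A 384-dimensional demo vector with values in [-1, 1].
$vector = [];
for ($i = 0; $i < 384; $i++) {
    $vector[] = sin($i / 10.0);
}

$blob     = quantizeInt8($vector);
$restored = dequantizeInt8($blob);

echo strlen($blob), " bytes\n"; // 392 bytes, versus 384 * 4 = 1,536 for float32
```

The round trip is lossy: each component is off by at most about half a quantization step, which is the price of the roughly 4x smaller footprint.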

### BM25 Full-Text Search

Okapi BM25 inverted index, collection-aware, with persistence.

```
use PHPVectorStore\BM25\Index;
use PHPVectorStore\BM25\Config;
use PHPVectorStore\BM25\SimpleTokenizer;

$bm25 = new Index(
    config: new Config( k1: 1.5, b: 0.75 ),
    tokenizer: new SimpleTokenizer(),
);

// Index documents
$bm25->addDocument( 'articles', 'doc-1', 'The quick brown fox...' );
$bm25->addDocument( 'articles', 'doc-2', 'Database systems and SQL...' );

// Search
$results = $bm25->search( 'articles', 'quick fox', 10 );
// [['id' => 'doc-1', 'score' => 1.234, 'rank' => 1], ...]

// Get raw scores (for hybrid fusion)
$scores = $bm25->scoreAll( 'articles', 'quick fox' );
// ['doc-1' => 1.234, 'doc-2' => 0.0]

// Persist to disk
$bm25->save( '/path/vectors', 'articles' );  // writes articles.bm25.bin
$bm25->load( '/path/vectors', 'articles' );  // restores state
```

The `SimpleTokenizer` handles Unicode text with configurable stop words:

```
// Custom stop words for Spanish
$tokenizer = new SimpleTokenizer(
    stopWords: ['el', 'la', 'los', 'las', 'de', 'en', 'y', 'que', 'es', 'un', 'una'],
    minTokenLength: 2,
);
$bm25 = new Index( tokenizer: $tokenizer );
```
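For readers who want to see what the `k1` and `b` parameters do, here is a self-contained sketch of textbook Okapi BM25 scoring. It does not use the library: `tokenize()` and `bm25Scores()` are hypothetical helpers, the tokenizer is a simplified ASCII-lowercasing stand-in for `SimpleTokenizer`, and the IDF is a common non-negative variant that may differ from the one the library implements.

```php
<?php
declare(strict_types=1);

/** Lowercase (ASCII only) and split on anything that is not a letter or digit. */
function tokenize(string $text): array
{
    return preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

/**
 * Score every document against the query with textbook Okapi BM25.
 *
 * @param array<string,string> $docs id => text
 * @return array<string,float> id => score, highest first
 */
function bm25Scores(array $docs, string $query, float $k1 = 1.5, float $b = 0.75): array
{
    $tokens = array_map('tokenize', $docs); // keys (document IDs) are preserved
    $n      = count($docs);
    $avgLen = array_sum(array_map('count', $tokens)) / max($n, 1);

    $scores = array_fill_keys(array_keys($docs), 0.0);
    foreach (array_unique(tokenize($query)) as $term) {
        // Document frequency: number of documents containing the term.
        $df = 0;
        foreach ($tokens as $docTokens) {
            if (in_array($term, $docTokens, true)) {
                $df++;
            }
        }
        if ($df === 0) {
            continue;
        }
        $idf = log(1 + ($n - $df + 0.5) / ($df + 0.5));

        foreach ($tokens as $id => $docTokens) {
            $tf = count(array_keys($docTokens, $term, true));
            if ($tf === 0) {
                continue;
            }
            // b controls length normalization, k1 controls term-frequency saturation.
            $norm = $tf + $k1 * (1 - $b + $b * count($docTokens) / $avgLen);
            $scores[$id] += $idf * $tf * ($k1 + 1) / $norm;
        }
    }
    arsort($scores);
    return $scores;
}

$scores = bm25Scores([
    'doc-1' => 'The quick brown fox jumps over the lazy dog',
    'doc-2' => 'Database systems and SQL',
], 'quick fox');

print_r($scores); // doc-1 has a positive score, doc-2 scores 0.0
```

A real inverted index stores postings per term instead of rescanning every document, which is what makes lookups fast; the arithmetic per matching term is the same.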

### Hybrid Search

Combines vector similarity with BM25 text relevance using fusion strategies.

```
use PHPVectorStore\HybridSearch;
use PHPVectorStore\HybridMode;

// RRF fusion (recommended — robust, no tuning needed)
$hybrid = new HybridSearch( $store, $bm25, HybridMode::RRF );
$results = $hybrid->search( 'articles', $vector, 'search text', 5 );

// Weighted fusion (tunable weights)
$hybrid = new HybridSearch( $store, $bm25, HybridMode::Weighted );
$results = $hybrid->search( 'articles', $vector, 'search text', 5, [
    'vectorWeight' => 0.7,
    'textWeight'   => 0.3,
]);

// Multi-collection hybrid
$results = $hybrid->searchAcross(
    ['articles', 'comments'],
    $vector, 'search text', 10,
);
```

**RRF (Reciprocal Rank Fusion):** `score(d) = Σ 1/(k + rank(d))` — combines ranks from both legs without needing score normalization. Best default choice.
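The RRF formula can be sketched in a few lines of plain PHP. The function name `rrfFuse` and the default `k = 60` are assumptions for illustration (60 is a commonly used constant; the library's own `rrfK` default is not stated here).

```php
<?php
declare(strict_types=1);

/**
 * Reciprocal Rank Fusion over any number of ranked ID lists.
 *
 * @param array<int,array<int,string>> $rankings each list is ordered best-first
 * @return array<string,float> id => fused score, highest first
 */
function rrfFuse(array $rankings, int $k = 60): array
{
    $scores = [];
    foreach ($rankings as $ranking) {
        foreach (array_values($ranking) as $position => $id) {
            $rank        = $position + 1; // ranks are 1-based
            $scores[$id] = ($scores[$id] ?? 0.0) + 1.0 / ($k + $rank);
        }
    }
    arsort($scores);
    return $scores;
}

$vectorRanking = ['doc-a', 'doc-b', 'doc-c']; // from the vector leg
$textRanking   = ['doc-b', 'doc-d', 'doc-a']; // from the BM25 leg

$fused = rrfFuse([$vectorRanking, $textRanking]);
print_r(array_keys($fused)); // doc-b, doc-a, doc-d, doc-c
```

`doc-b` wins because it ranks well in both lists (1/62 + 1/61), while documents that appear in only one list get a single term. Only ranks enter the sum, so the raw cosine and BM25 scores never need to share a scale.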

**Weighted:** Min-max normalizes both score sets to \[0,1\], then `combined = w_vec * vecNorm + w_text * textNorm`. Use when you want explicit control over the balance.
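A minimal sketch of weighted fusion, written independently of the library. The helper names `minMax` and `weightedFuse` are hypothetical, and two details are choices made here and not confirmed library behavior: a document missing from one leg contributes 0 for that leg, and a score set with no spread normalizes to 1.0.

```php
<?php
declare(strict_types=1);

/** Min-max normalize scores to [0, 1]; a set with no spread maps to 1.0. */
function minMax(array $scores): array
{
    if ($scores === []) {
        return [];
    }
    $min   = min($scores);
    $range = max($scores) - $min;
    return array_map(
        fn ($s) => $range > 0 ? ($s - $min) / $range : 1.0,
        $scores
    );
}

/**
 * combined = wVec * vecNorm + wText * textNorm, per document ID.
 *
 * @param array<string,float> $vecScores  id => vector similarity
 * @param array<string,float> $textScores id => BM25 score
 * @return array<string,float> id => combined score, highest first
 */
function weightedFuse(array $vecScores, array $textScores, float $wVec = 0.7, float $wText = 0.3): array
{
    $vecNorm  = minMax($vecScores);
    $textNorm = minMax($textScores);

    $combined = [];
    foreach (array_keys($vecNorm + $textNorm) as $id) {
        $combined[$id] = $wVec * ($vecNorm[$id] ?? 0.0) + $wText * ($textNorm[$id] ?? 0.0);
    }
    arsort($combined);
    return $combined;
}

$combined = weightedFuse(
    ['doc-a' => 0.92, 'doc-b' => 0.80, 'doc-c' => 0.62],
    ['doc-b' => 4.0, 'doc-c' => 1.0, 'doc-d' => 2.5]
);
print_r($combined); // doc-b ≈ 0.72, doc-a = 0.70, doc-d = 0.15, doc-c = 0.0
```

Because min-max depends on the extremes of each result set, one outlier score can compress everything else toward 0, which is one reason the rank-based mode is the more robust default.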

### Distance Metrics

```
use PHPVectorStore\Distance;

// Cosine similarity (default) — best for normalized embeddings
$store->search( 'col', $query, 5, 0, Distance::Cosine );

// Euclidean distance — converted to similarity: 1/(1+dist)
$store->search( 'col', $query, 5, 0, Distance::Euclidean );

// Dot product — for pre-normalized vectors
$store->search( 'col', $query, 5, 0, Distance::DotProduct );

// Manhattan distance — robust to outliers: 1/(1+dist)
$store->search( 'col', $query, 5, 0, Distance::Manhattan );
```

Works with `search()`, `matryoshkaSearch()`, and `searchAcross()` on both VectorStore and QuantizedStore.
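To make the comments above concrete, the following standalone sketch computes each metric and applies the `1/(1+dist)` conversion, so that a higher value always means a closer match. The function names are illustrative and are not the library's static helpers.

```php
<?php
declare(strict_types=1);

function dot(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $v) {
        $sum += $v * $b[$i];
    }
    return $sum;
}

/** Cosine similarity; returns 0.0 if either vector has zero length. */
function cosineSimilarity(array $a, array $b): float
{
    $norm = sqrt(dot($a, $a)) * sqrt(dot($b, $b));
    return $norm > 0.0 ? dot($a, $b) / $norm : 0.0;
}

/** Euclidean distance mapped to (0, 1] via 1 / (1 + dist). */
function euclideanSimilarity(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $v) {
        $sum += ($v - $b[$i]) ** 2;
    }
    return 1.0 / (1.0 + sqrt($sum));
}

/** Manhattan distance mapped to (0, 1] via 1 / (1 + dist). */
function manhattanSimilarity(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $v) {
        $sum += abs($v - $b[$i]);
    }
    return 1.0 / (1.0 + $sum);
}

$a = [1.0, 2.0];
$b = [4.0, 6.0]; // differs from $a by (3, 4)

printf("dot       %.4f\n", dot($a, $b));                 // 16.0000
printf("cosine    %.4f\n", cosineSimilarity($a, $b));    // 0.9923
printf("euclidean %.4f\n", euclideanSimilarity($a, $b)); // 0.1667 (distance 5)
printf("manhattan %.4f\n", manhattanSimilarity($a, $b)); // 0.1250 (distance 7)
```

Identical vectors score 1.0 under the two distance-based conversions, and the score decays toward 0 as the distance grows.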

### IVF Clustering

K-means partitions vectors into clusters for sub-linear search.

```
$ivf = new IVFIndex( $store, numClusters: 100, numProbes: 20 );
$ivf->build( 'articles' );
$results = $ivf->search( 'articles', $query, 5 );
$results = $ivf->matryoshkaSearch( 'articles', $query, 5, [128, 256, 384] );
```

Works with both VectorStore and QuantizedStore (via `StoreInterface`).
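The idea behind IVF can be shown with a toy, in-memory sketch: cluster the vectors with k-means, then at query time scan only the clusters whose centroids are nearest to the query. Everything here (`buildIvf`, `ivfSearch`, seeding with the first `k` vectors, squared Euclidean distance) is an assumption chosen to keep the example short and deterministic; it is not the library's implementation.

```php
<?php
declare(strict_types=1);

function sqDist(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $v) {
        $sum += ($v - $b[$i]) ** 2;
    }
    return $sum;
}

/** Group vector IDs by the index of their nearest centroid. */
function assignClusters(array $vectors, array $centroids): array
{
    $clusters = array_fill(0, count($centroids), []);
    foreach ($vectors as $id => $v) {
        $best = 0;
        $bestDist = INF;
        foreach ($centroids as $c => $centroid) {
            $d = sqDist($v, $centroid);
            if ($d < $bestDist) {
                $bestDist = $d;
                $best = $c;
            }
        }
        $clusters[$best][] = $id;
    }
    return $clusters;
}

/** Plain Lloyd's k-means, seeded with the first $k vectors for determinism. */
function buildIvf(array $vectors, int $k, int $iterations = 10): array
{
    $centroids = array_slice(array_values($vectors), 0, $k);
    for ($it = 0; $it < $iterations; $it++) {
        foreach (assignClusters($vectors, $centroids) as $c => $ids) {
            if ($ids === []) {
                continue; // an empty cluster keeps its previous centroid
            }
            $mean = array_fill(0, count($centroids[$c]), 0.0);
            foreach ($ids as $id) {
                foreach ($vectors[$id] as $d => $x) {
                    $mean[$d] += $x / count($ids);
                }
            }
            $centroids[$c] = $mean;
        }
    }
    return ['centroids' => $centroids, 'clusters' => assignClusters($vectors, $centroids)];
}

/** Scan only the $numProbes clusters whose centroids are closest to the query. */
function ivfSearch(array $index, array $vectors, array $query, int $limit, int $numProbes): array
{
    $centroidDist = array_map(fn ($c) => sqDist($query, $c), $index['centroids']);
    asort($centroidDist);
    $hits = [];
    foreach (array_slice(array_keys($centroidDist), 0, $numProbes) as $c) {
        foreach ($index['clusters'][$c] as $id) {
            $hits[$id] = sqDist($query, $vectors[$id]);
        }
    }
    asort($hits); // smallest distance first
    return array_slice($hits, 0, $limit, true);
}

$vectors = [
    'a1' => [0.0, 0.0],  'b1' => [10.0, 10.0], 'c1' => [20.0, 0.0],
    'a2' => [0.0, 1.0],  'a3' => [1.0, 0.0],   'b2' => [10.0, 11.0],
    'b3' => [11.0, 10.0], 'c2' => [20.0, 1.0], 'c3' => [21.0, 0.0],
];
$index  = buildIvf($vectors, 3);
$result = ivfSearch($index, $vectors, [0.4, 0.2], 2, 1);
print_r(array_keys($result)); // a1, a3 (only the 3 vectors of one cluster were scanned)
```

Raising the probe count trades speed for recall: a true neighbor that sits just across a cluster boundary is only found if its cluster is among those probed.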

### Matryoshka Multi-Stage Search

Progressive refinement — each stage narrows candidates before the next.

```
$store->matryoshkaSearch( 'col', $query, 5, [128, 384, 768] );
```

Speedup: **3-5x** over brute-force. Combined with IVF: **10-15x**.
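The staged narrowing can be sketched as follows: score every vector on a short prefix of its dimensions, keep a shortlist, and rescore only the shortlist on longer prefixes. This standalone version assumes that every stage except the last keeps `limit * multiplier` candidates; that rule and the function names are illustrative guesses at the general technique, not the library's code.

```php
<?php
declare(strict_types=1);

/** Cosine similarity over the first $dims components only. */
function cosinePrefix(array $a, array $b, int $dims): float
{
    $dot = $na = $nb = 0.0;
    for ($i = 0; $i < $dims; $i++) {
        $dot += $a[$i] * $b[$i];
        $na  += $a[$i] * $a[$i];
        $nb  += $b[$i] * $b[$i];
    }
    return ($na > 0.0 && $nb > 0.0) ? $dot / sqrt($na * $nb) : 0.0;
}

/**
 * @param array<string,array<int,float>> $vectors id => vector
 * @param array<int,int> $stages prefix lengths, shortest first
 * @return array<string,float> id => final-stage score, best first
 */
function matryoshkaSearch(array $vectors, array $query, int $limit, array $stages, int $multiplier = 3): array
{
    $candidates = array_keys($vectors);
    $scores     = [];
    $last       = count($stages) - 1;

    foreach (array_values($stages) as $s => $dims) {
        $scores = [];
        foreach ($candidates as $id) {
            $scores[$id] = cosinePrefix($vectors[$id], $query, $dims);
        }
        arsort($scores);
        // Early stages keep a wider shortlist; the final stage keeps $limit.
        $keep       = $s === $last ? $limit : $limit * $multiplier;
        $scores     = array_slice($scores, 0, $keep, true);
        $candidates = array_keys($scores);
    }
    return $scores;
}

// Tiny 4-dimensional demo with stages [2, 4] standing in for [128, 384, 768].
$vectors = [
    'x' => [1.0, 0.0, 0.0, 1.0],
    'y' => [1.0, 0.0, 1.0, 0.0],
    'z' => [0.0, 1.0, 0.0, 0.0],
    'w' => [1.0, 0.1, 0.0, 0.9],
];
$result = matryoshkaSearch($vectors, [1.0, 0.0, 0.0, 1.0], 1, [2, 4], 2);
print_r(array_keys($result)); // x
```

In the demo, `w` is dropped after the 2-dimension stage even though it is the second-best match at full dimension. That is the trade-off: fewer full-length comparisons in exchange for approximate recall, which is why the shortlist multiplier matters.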

### StoreInterface

Both `VectorStore` and `QuantizedStore` implement `StoreInterface`:

```
use PHPVectorStore\StoreInterface;

function buildIndex( StoreInterface $store ): void {
    $ivf = new IVFIndex( $store );
    $ivf->build( 'articles' );
}

// Works with either store
buildIndex( new VectorStore( '/path', 384 ) );
buildIndex( new QuantizedStore( '/path', 384 ) );
```

### Typed Models

```
use PHPVectorStore\Document;
use PHPVectorStore\SearchResult;

$doc = new Document(
    id: 'doc-1',
    vector: [0.1, 0.2, ...],
    text: 'The quick brown fox...',
    metadata: ['title' => 'My Doc'],
);

$result = new SearchResult(
    id: 'doc-1',
    score: 0.95,
    rank: 1,
    metadata: ['title' => 'My Doc'],
    collection: 'articles',
);
```

### Typed Exceptions

```
use PHPVectorStore\Exception\VectorStoreException;
use PHPVectorStore\Exception\DimensionMismatchException;
use PHPVectorStore\Exception\CollectionNotFoundException;
```

Concurrency & Scaling Notes
---------------------------

### File Locking

All `flush()` operations take an exclusive lock with `flock(LOCK_EX)`, so multiple PHP processes writing to the same collection are serialized instead of racing. This keeps each write intact under concurrent web requests.
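The locking pattern looks roughly like this in plain PHP. It is a generic sketch of an exclusive-lock write, not the library's `flush()` code; `lockedWrite` and the demo file name are made up for the example.

```php
<?php
declare(strict_types=1);

/** Replace the contents of $path while holding an exclusive lock. */
function lockedWrite(string $path, string $bytes): void
{
    // Mode 'c' opens for writing without truncating, so the old contents
    // are not destroyed before the lock has been acquired.
    $handle = fopen($path, 'c');
    if ($handle === false) {
        throw new RuntimeException("Cannot open {$path}");
    }
    try {
        if (!flock($handle, LOCK_EX)) { // blocks until other holders release
            throw new RuntimeException("Cannot lock {$path}");
        }
        ftruncate($handle, 0);
        fwrite($handle, $bytes);
        fflush($handle);                // push buffered bytes out before unlocking
        flock($handle, LOCK_UN);
    } finally {
        fclose($handle);
    }
}

$path = sys_get_temp_dir() . '/pvs-lock-demo.bin';
lockedWrite($path, pack('g*', 0.1, 0.2, 0.3)); // three float32 values = 12 bytes
```

Readers that must never observe a half-written file would take a shared lock (`LOCK_SH`) on the same handle before reading.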

### Dimension Validation

`set()` throws `DimensionMismatchException` if the vector has fewer dimensions than the store was configured with. This catches mismatches early (e.g., passing a 384d vector to a 768d store).

### JSON Manifest Scaling

Each collection stores its ID list and metadata in a `.json` sidecar file. For collections approaching 100K vectors, this manifest can grow large (~10-20 MB). Considerations:

- **Memory**: The entire manifest is loaded into memory on first access to a collection. For 100K vectors with metadata, budget ~50-100 MB of PHP memory.
- **Latency**: JSON decode of a large manifest adds ~50-200ms on first load (cached for subsequent operations within the same request).
- **Mitigation**: Use multiple collections (per entity type) to keep individual manifests small. A collection of 10K vectors has a ~1-2 MB manifest.

For datasets beyond 100K vectors, consider sqlite-vec or an external vector database.

API Reference
-------------

### StoreInterface (VectorStore & QuantizedStore)

```
// Write
->set( $collection, $id, $vector, $metadata = [] )
->remove( $collection, $id ): bool
->drop( $collection )
->flush()

// Read
->get( $collection, $id ): ?array     // {id, vector, metadata}
->has( $collection, $id ): bool
->count( $collection ): int
->ids( $collection ): string[]
->collections(): string[]
->stats(): array
->dimensions(): int
->directory(): string

// Search
->search( $collection, $query, $limit = 5, $dimSlice = 0, $distance = null )
->matryoshkaSearch( $collection, $query, $limit = 5, $stages = [...], $multiplier = 3, $distance = null )
->searchAcross( $collections, $query, $limit = 5, $dimSlice = 0, $distance = null )

// Import/Export
->import( $collection, $records ): int
->export( $collection ): array
```

### BM25\Index

```
->addDocument( $collection, $id, $text )
->removeDocument( $collection, $id )
->search( $collection, $query, $limit = 10 ): array
->scoreAll( $collection, $query ): array    // id => score
->count( $collection ): int
->vocabularySize( $collection ): int
->save( $directory, $collection )
->load( $directory, $collection )
->exportState( $collection ): array
->importState( $collection, $state )
```

### HybridSearch

```
->search( $collection, $vector, $text, $limit = 5, $options = [] )
->searchAcross( $collections, $vector, $text, $limit = 5, $options = [] )
```

Options: `fetchK`, `vectorWeight`, `textWeight`, `rrfK`, `dimSlice`.

### IVFIndex

```
new IVFIndex( StoreInterface $store, int $numClusters = 100, int $numProbes = 10 )

->build( $collection, $sampleDims = 128 ): array
->search( $collection, $query, $limit = 5, $dimSlice = 0 )
->matryoshkaSearch( $collection, $query, $limit, $stages, $multiplier = 3 )
->hasIndex( $collection ): bool
->indexStats( $collection ): ?array
->dropIndex( $collection )
```

### Math (static)

```
VectorStore::normalize( $vector ): array
VectorStore::cosineSim( $a, $b, $dims ): float
VectorStore::euclideanDist( $a, $b, $dims ): float
VectorStore::dotProduct( $a, $b, $dims ): float
VectorStore::manhattanDist( $a, $b, $dims ): float
VectorStore::computeScore( $a, $b, $dims, Distance $distance ): float
```

Storage Format
--------------

```
vectors/
├── articles.bin          ← Float32: N x dim x 4 bytes
├── articles.json         ← Manifest: IDs + metadata
├── articles.q8.bin       ← Int8: N x (dim + 8) bytes
├── articles.q8.json      ← Quantized manifest
├── articles.ivf.json     ← IVF: centroids + cluster assignments
├── articles.bm25.bin     ← BM25: inverted index (serialized PHP)
└── .htaccess             ← Access protection

```
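A flat `N x dim x 4` layout means any vector can be read with a single seek. The sketch below assumes there is no file header and that floats are little-endian; neither detail is stated above, so treat `readVector` as an illustration of the layout, not as a reader for the library's actual files.

```php
<?php
declare(strict_types=1);

/**
 * Read vector number $index (0-based) from a flat float32 file.
 * ASSUMPTION: no header, little-endian floats, vectors stored back to back.
 */
function readVector(string $binPath, int $index, int $dim): array
{
    $handle = fopen($binPath, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open {$binPath}");
    }
    try {
        fseek($handle, $index * $dim * 4); // 4 bytes per float32
        $bytes = fread($handle, $dim * 4);
    } finally {
        fclose($handle);
    }
    if ($bytes === false || strlen($bytes) !== $dim * 4) {
        throw new OutOfRangeException("Vector {$index} is not in {$binPath}");
    }
    // 'g*' decodes consecutive little-endian float32 values (keys start at 1).
    return array_values(unpack('g*', $bytes));
}

// Demo file with N = 2 vectors of dim = 3, i.e. 2 * 3 * 4 = 24 bytes.
$path = sys_get_temp_dir() . '/pvs-format-demo.bin';
file_put_contents($path, pack('g*', 1.0, 2.0, 3.0, 4.0, 5.0, 6.0));

$second = readVector($path, 1, 3);
print_r($second); // 4.0, 5.0, 6.0
```

The row number for a given ID would come from the JSON manifest, which is why the manifest and the `.bin` file have to be kept in sync.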

Testing
-------

```
composer install
vendor/bin/phpunit
```

41 tests across 5 suites: VectorStore, QuantizedStore, IVFIndex, BM25, HybridSearch.

Performance
-----------

### Speed (5,000 random vectors, PHP 8.2)

| Method | Time/query | Speedup |
|---|---|---|
| Brute-force 768d | 796ms | 1x |
| Matryoshka 128->384->768 | 182ms | 4.4x |
| IVF | 100ms | 7.9x |
| IVF + Matryoshka | **54ms** | **14.7x** |

### Storage

| Format | Per vector | 10K | 100K |
|---|---|---|---|
| Float32 768d | 3,072 B | 30 MB | 300 MB |
| Float32 384d | 1,536 B | 15 MB | 150 MB |
| Int8 768d | 776 B | 7.6 MB | 76 MB |
| **Int8 384d** | **392 B** | **3.8 MB** | **38 MB** |

Integration Patterns
--------------------

### WordPress

```
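use PHPVectorStore\BM25;
use PHPVectorStore\HybridMode;
use PHPVectorStore\HybridSearch;
use PHPVectorStore\QuantizedStore;

$dir   = WP_CONTENT_DIR . '/vectors';
$store = new QuantizedStore( $dir, 384 );
$bm25  = new BM25\Index();

// Restore the persisted BM25 index first; otherwise save() below would
// overwrite it with an index holding only the post from this request.
if ( file_exists( $dir . '/posts.bm25.bin' ) ) {
    $bm25->load( $dir, 'posts' );
}

add_action( 'wp_after_insert_post', function( $id, $post ) use ( $store, $bm25, $dir ) {
    if ( 'publish' !== $post->post_status ) return;
    $text   = $post->post_title . ' ' . wp_strip_all_tags( $post->post_content );
    // your_embedding_api() is a placeholder for your embedding provider.
    $vector = array_slice( your_embedding_api( $text ), 0, 384 );
    $store->set( 'posts', (string) $id, $vector, ['title' => $post->post_title] );
    $bm25->addDocument( 'posts', (string) $id, $text );
    $store->flush();
    $bm25->save( $dir, 'posts' );
}, 10, 2 );

// Hybrid search
$hybrid = new HybridSearch( $store, $bm25, HybridMode::RRF );
$results = $hybrid->search( 'posts', $query_vector, $search_text, 5 );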
```

### Laravel

```
// Service Provider
$this->app->singleton( StoreInterface::class, fn() =>
    new QuantizedStore( storage_path( 'vectors' ), 384 )
);

// Controller
public function search( Request $request ) {
    $store   = app( StoreInterface::class );
    $query   = array_slice( $this->embed( $request->q ), 0, 384 );
    $results = $store->matryoshkaSearch( 'articles', $query, 10, [128, 256, 384] );
    return Article::whereIn( 'id', array_column( $results, 'id' ) )->get();
}
```

### Neuron AI (RAG)

```
use PHPVectorStore\Integration\NeuronVectorStore;

class MyRAG extends RAG {
    protected function vectorStore(): VectorStoreInterface {
        return new NeuronVectorStore(
            directory:  __DIR__ . '/vectors',
            dimensions: 384,
            quantized:  true,
            matryoshka: true,
        );
    }
}
```

Architecture
------------

```
PHPVectorStore\
├── StoreInterface           ← Common interface
├── VectorStore              ← Float32 storage (implements StoreInterface)
├── QuantizedStore           ← Int8 storage (implements StoreInterface)
├── IVFIndex                 ← K-means clustering (wraps StoreInterface)
├── HybridSearch             ← Vector + BM25 fusion
├── Distance                 ← Enum: Cosine, Euclidean, DotProduct, Manhattan
├── HybridMode               ← Enum: RRF, Weighted
├── Document                 ← Typed model
├── SearchResult             ← Typed model
├── BM25\
│   ├── Index                ← Okapi BM25 inverted index
│   ├── Config               ← k1, b parameters
│   ├── TokenizerInterface   ← Pluggable tokenization
│   └── SimpleTokenizer      ← Unicode tokenizer with stop words
├── Exception\
│   ├── VectorStoreException
│   ├── DimensionMismatchException
│   └── CollectionNotFoundException
└── Integration\
    └── NeuronVectorStore    ← Neuron AI RAG adapter

```

License
-------

GPL-2.0-or-later
