memvector/ext-memvector
=======================

High-performance Local Vector Storage & Embedding Engine for PHP

MemVector: Local Vector API, Storage & Embedding Engine for PHP
===============================================================

MemVector is a high-performance PHP extension that brings local AI infrastructure to PHP: vector similarity search, text embedding, and cross-encoder reranking, all running directly in your PHP process. It is built in C++17 with AVX2 SIMD acceleration and needs no external vector database, no embedding API, and no network round-trips. Embedding plus search completes in under 10 ms, and the full embed, search, and rerank pipeline in roughly 10-30 ms (see Performance).

Works best with **small, specialized local GGUF models** (embedding + reranker, 24-636 MB) that run efficiently on CPU, and with **long-lived PHP runtimes** ([OpenSwoole](https://openswoole.com), [ReactPHP](https://reactphp.org), [RoadRunner](https://roadrunner.dev), [FrankenPHP](https://frankenphp.dev)) where persistent workers keep models loaded in memory across requests. That combination keeps the entire embed → search → rerank pipeline in the low tens of milliseconds with zero external dependencies.

Quick Start
-----------

```
// Semantic search with local embeddings (requires --with-llama)
$emb = new MemVectorEmbedding('/models/all-MiniLM-L6-v2.Q8_0.gguf');
$store = new MemVectorStore(null, ['dimensions' => $emb->dimensions()]);

$store->set('php',     $emb->embed('PHP is a server-side scripting language'),     'lang');
$store->set('python',  $emb->embed('Python is used for machine learning'),         'lang');
$store->set('gravity', $emb->embed('Gravity pulls objects toward the earth'),      'science');
$store->set('dna',     $emb->embed('DNA encodes genetic information'),             'science');

$results = $store->search($emb->embed('programming languages'), 2);
// [['key' => 'php', 'score' => 0.82, 'metadata' => 'lang'],
//  ['key' => 'python', 'score' => 0.79, 'metadata' => 'lang']]
```

```
// Two-stage retrieval: vector search + cross-encoder rerank (requires --with-llama)
$rr = new MemVectorReranker('/models/bge-reranker-v2-m3-Q8_0.gguf');
$candidates = $store->search($emb->embed('programming languages'), 50);
$results = $rr->rerank('programming languages', $candidates, 5);
```

```
// Or bring your own vectors (no llama.cpp needed)
$store = new MemVectorStore('/data/vectors', ['dimensions' => 1536]);
$store->set('doc_1', $openai_embedding, '{"title": "Introduction"}');
$results = $store->search($query_embedding, 10);
```

Features
--------

- **Key-value API** — `set(key, vector)`, `get(key)`, `delete(key)`, `batchSet()` with upsert semantics
- **Text embeddings** — optional [llama.cpp](https://github.com/ggerganov/llama.cpp) integration for local GGUF model inference
- **Cross-encoder reranking** — two-stage retrieval: fast vector search → accurate cross-encoder rerank
- **Three storage modes** — memory, disk (mmap), shared memory (cross-process)
- **HNSW index** — auto-builds transparently during search for fast approximate nearest neighbor
- **Vector quantization** — F16, Int8 (scalar), binary, and product quantization (PQ)
- **Four distance metrics** — cosine, dot product, euclidean, manhattan
- **Lock-free concurrency** — reads/writes via `std::atomic`
- **AVX2 SIMD** — accelerated cosine/dot distance computation
- **Multi-segment architecture** — auto-grows with fixed-capacity segments
- **JSONL import/export** — `dump()` and `load()` for backup, migration, and interop
- **Memory limits** — cap total mmap usage across segments

Performance
-----------

MemVector runs entirely in-process — no network calls, no serialization, no external services. This eliminates the latency and cost of external APIs and vector databases.

### Embedding generation

| Approach | Latency per call | Cost |
| --- | --- | --- |
| OpenAI API (`text-embedding-3-small`) | 50-200 ms (network round-trip) | $0.02 / 1M tokens |
| Cohere API (`embed-english-v3.0`) | 50-200 ms (network round-trip) | $0.10 / 1M tokens |
| **MemVector + local GGUF model** | **5-15 ms** (in-process) | **Free** |

By the figures above, local embedding with MemVector is roughly 3-40x faster than API calls and has zero per-token cost. The model runs inside the PHP process via llama.cpp, with no HTTP overhead, no API keys, and no rate limits.

### Vector search

| Approach | Latency per query | Notes |
| --- | --- | --- |
| Pinecone / Qdrant / Weaviate (cloud) | 10-50 ms (network) | Managed service, per-vector pricing |
| Pinecone / Qdrant / Weaviate (self-hosted) | 5-20 ms (network) | Separate process, TCP/gRPC overhead |
| PostgreSQL + pgvector | 5-50 ms (query + network) | Shared DB, connection pooling overhead |
| **MemVector (in-process)** | **0.1-5 ms** | **No network, no serialization** |

MemVector searches vectors directly in the PHP process memory (or mmap'd files). There is no serialization, no socket, no protocol parsing. HNSW index + AVX2 SIMD keeps search under 1 ms for most workloads up to millions of vectors.

### Full RAG pipeline (embed + search + rerank)

| Approach | Total latency | Components |
| --- | --- | --- |
| OpenAI embed + Pinecone search | 100-400 ms | 2 network round-trips |
| OpenAI embed + Pinecone search + Cohere rerank | 200-600 ms | 3 network round-trips |
| **MemVector (all in-process)** | **10-30 ms** | **0 network round-trips** |

A complete two-stage retrieval pipeline (embed query → vector search → cross-encoder rerank) runs in a single PHP process with no external dependencies.

### Memory footprint

| Component | RSS |
| --- | --- |
| MemVector extension (no model) | ~1 MB |
| + embedding model (all-MiniLM-L6-v2, 24 MB GGUF) | ~33 MB |
| + reranker model (bge-reranker-v2-m3, 636 MB GGUF) | ~200 MB |
| + 100K vectors (384 dim, f32) | ~150 MB |
| + 100K vectors (384 dim, int8 quantized) | ~40 MB |

Model weights are `mmap()`'d and shared across php-fpm/OpenSwoole workers by the OS page cache.
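
The two vector rows follow from simple arithmetic: record count × dimensions × bytes per component. The sketch below reproduces them in plain PHP. `estimateVectorPayloadMb()` is a hypothetical helper, not part of the extension, and it counts raw vector data only (keys, metadata, and index overhead are ignored, so real RSS will be somewhat higher).

```php
<?php
declare(strict_types=1);

/**
 * Rough size in decimal MB of the raw vector data alone.
 * Hypothetical helper: ignores keys, metadata, and HNSW index overhead.
 */
function estimateVectorPayloadMb(int $count, int $dimensions, string $quantization = 'none'): float
{
    // Bytes per vector component for the uncompressed and scalar modes.
    $bytesPerComponent = match ($quantization) {
        'none' => 4.0, // f32
        'f16'  => 2.0,
        'int8' => 1.0,
        default => throw new InvalidArgumentException("Unsupported quantization: $quantization"),
    };

    return $count * $dimensions * $bytesPerComponent / 1_000_000;
}

printf("f32:  %.1f MB\n", estimateVectorPayloadMb(100_000, 384));         // f32:  153.6 MB
printf("int8: %.1f MB\n", estimateVectorPayloadMb(100_000, 384, 'int8')); // int8: 38.4 MB
```

These land close to the ~150 MB and ~40 MB figures in the table.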

### Why small models + OpenSwoole

MemVector is designed for **small, task-specific models** — not large generative LLMs. Embedding models (24-138 MB) and reranker models (366-636 MB) are compact enough to run on CPU with low latency and fit comfortably in server memory. These models do one thing well: convert text to vectors or score relevance — tasks where small specialized models match or exceed large API-hosted models in quality.

Combined with **OpenSwoole**, where each worker process loads the model once and reuses it across thousands of requests, you get a fully self-contained semantic search stack:

- **No API costs** — models run locally, no per-token billing
- **No network latency** — embed, search, and rerank happen in-process
- **No external services** — no vector database, no embedding API, no reranker API to deploy and maintain
- **Predictable performance** — no cold starts, no rate limits, no variance from shared infrastructure

A single OpenSwoole worker with a 24 MB embedding model + MemVector store handles embed + search in under 10 ms per request using ~50 MB of RAM.

Requirements
------------

- PHP 8.1+
- C++17 compiler (GCC 7+ or Clang 5+)
- Optional: [llama.cpp](https://github.com/ggerganov/llama.cpp) for text embeddings and cross-encoder reranking

Install via PIE
---------------

[PIE](https://github.com/php/pie) is the official PHP installer for extensions.

```
pie install memvector/ext-memvector
```

To enable llama.cpp support (embeddings & reranking):

```
pie install memvector/ext-memvector --with-llama=/usr/local
```

Build from Source
-----------------

```
phpize
./configure --enable-memvector
make
make test
```

### Install llama.cpp (optional, for embeddings & reranking)

**macOS (Homebrew):**

```
brew install llama.cpp
```

**Linux (build from source):**

```
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build -DBUILD_SHARED_LIBS=ON -DGGML_CUDA=OFF
cmake --build build --config Release -j$(nproc)
sudo cmake --install build --prefix /usr/local
sudo ldconfig  # refresh shared library cache
```

For GPU acceleration, replace `-DGGML_CUDA=OFF` with `-DGGML_CUDA=ON` (requires CUDA toolkit).

### Build with llama.cpp support (embeddings & reranking)

```
phpize
./configure --enable-memvector --with-llama=/usr/local
make
make test
```

Pass `--with-llama=DIR` if llama.cpp is installed in a non-standard location.

### Download models

**Embedding models** — any GGUF embedding model works. Recommended starter (24 MB, 384 dimensions):

```
curl -L -o all-MiniLM-L6-v2.Q8_0.gguf \
  https://huggingface.co/leliuga/all-MiniLM-L6-v2-GGUF/resolve/main/all-MiniLM-L6-v2.Q8_0.gguf
```

| Model | Dimensions | Size | Download |
| --- | --- | --- | --- |
| all-MiniLM-L6-v2 (Q8) | 384 | 24 MB | [HuggingFace](https://huggingface.co/leliuga/all-MiniLM-L6-v2-GGUF) |
| nomic-embed-text-v1.5 (Q8) | 768 | 138 MB | [HuggingFace](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF) |
| bge-small-en-v1.5 (F16) | 384 | 67 MB | [HuggingFace](https://huggingface.co/CompendiumLabs/bge-small-en-v1.5-gguf) |

**Reranker models** — GGUF cross-encoder models for two-stage retrieval:

```
curl -L -o bge-reranker-v2-m3-Q8_0.gguf \
  https://huggingface.co/gpustack/bge-reranker-v2-m3-GGUF/resolve/main/bge-reranker-v2-m3-Q8_0.gguf
```

| Model | Size | Download |
| --- | --- | --- |
| bge-reranker-v2-m3 (Q8) | 636 MB | [HuggingFace](https://huggingface.co/gpustack/bge-reranker-v2-m3-GGUF) |
| bge-reranker-v2-m3 (Q2) | 366 MB | [HuggingFace](https://huggingface.co/gpustack/bge-reranker-v2-m3-GGUF) |

### Configure options

| Flag | Description |
| --- | --- |
| `--enable-memvector` | Enable the extension (required) |
| `--enable-memvector-avx2` | AVX2 SIMD (default `auto`, i.e. auto-detected) |
| `--with-llama[=DIR]` | Enable llama.cpp support (embeddings & reranking) |

Feature Detection
-----------------

Optional features can be detected at runtime via constants:

```
if (defined('MEMVECTOR_SHM')) {
    // Shared memory mode available
    $store = new MemVectorStore('mystore', ['storage' => 'shm', 'dimensions' => 128]);
}

if (defined('MEMVECTOR_LLAMA')) {
    // llama.cpp support compiled in (embeddings & reranking)
    $emb = new MemVectorEmbedding('/path/to/embedding-model.gguf');
    $rr  = new MemVectorReranker('/path/to/reranker-model.gguf');
}
```
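
A long-running worker may want to check these capabilities once at startup and fail fast instead of hitting a missing class mid-request. The sketch below gathers the checks into one array. `memvectorFeatures()` is a hypothetical helper. It relies only on the class name and the two constants documented here.

```php
<?php
declare(strict_types=1);

/**
 * Report which MemVector capabilities this PHP build exposes.
 * Hypothetical helper built on class_exists() and defined() only.
 *
 * @return array{store: bool, shm: bool, llama: bool}
 */
function memvectorFeatures(): array
{
    return [
        'store' => class_exists('MemVectorStore'),  // extension loaded at all
        'shm'   => defined('MEMVECTOR_SHM'),        // shared memory mode
        'llama' => defined('MEMVECTOR_LLAMA'),      // embeddings and reranking
    ];
}

foreach (memvectorFeatures() as $feature => $available) {
    printf("%-6s %s\n", $feature, $available ? 'yes' : 'no');
}
```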

PHP API
-------

### Class: `MemVectorStore`

#### `__construct(?string $dir = null, ?array $options = null)`

Create or open a vector store.

```
// Memory mode (ephemeral, single-process)
$store = new MemVectorStore();
$store = new MemVectorStore(null, ['dimensions' => 128]);

// Disk mode (persistent, file-backed mmap)
$store = new MemVectorStore('/path/to/dir', ['dimensions' => 128]);

// Shared memory mode (persistent, cross-process)
$store = new MemVectorStore('mystore', ['storage' => 'shm', 'dimensions' => 128]);
```

**Options** (all string values are case-insensitive):

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| `storage` | string | auto | `'disk'`, `'memory'`, or `'shm'` |
| `dimensions` | int | 1536 | Vector dimensionality (1-4096) |
| `distance` | string | `'cosine'` | `'cosine'`, `'dot'`, `'euclidean'`, `'manhattan'` |
| `quantization` | string | `'none'` | `'none'`, `'f16'`, `'int8'`, `'binary'`, `'pq'` |
| `segment_size` | int | 10M (disk/shm), 4096 (memory) | Max records per segment |
| `writable` | bool | true | Set to `false` to open in read-only mode (shm mode) |
| `memory_limit` | int\|string | 0 | Max total mmap bytes (`'1G'`, `'512M'`, `'128K'`). 0 = unlimited |

---

#### `set(string $key, array $vector, ?string $metadata = null): bool`

Insert or replace a vector by key (upsert semantics).

```
$store->set('doc_42', $vector);
$store->set('doc_42', $new_vector, '{"category": "science"}'); // replaces previous
```

- `$key` — Unique string key (max 63 chars)
- `$vector` — Float array matching the store's `dimensions`
- `$metadata` — Optional metadata string (max 4096 bytes)
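
When records come from user input or an import job, it can help to check them against these limits before calling `set()`. The following is a minimal sketch in plain PHP. `validateRecord()` is a hypothetical helper, and it measures the key in bytes with `strlen()`, which is an assumption since the limit above is stated in "chars".

```php
<?php
declare(strict_types=1);

/**
 * Check set() arguments against the documented limits.
 * Hypothetical helper; key length is measured in bytes (assumption).
 *
 * @param float[] $vector
 * @return string[] list of problems, empty when the record looks valid
 */
function validateRecord(string $key, array $vector, ?string $metadata, int $dimensions): array
{
    $problems = [];

    if (strlen($key) > 63) {
        $problems[] = 'key exceeds 63 bytes';
    }
    if (count($vector) !== $dimensions) {
        $problems[] = sprintf('vector has %d components, expected %d', count($vector), $dimensions);
    }
    if ($metadata !== null && strlen($metadata) > 4096) {
        $problems[] = 'metadata exceeds 4096 bytes';
    }

    return $problems;
}

$problems = validateRecord('doc_42', [0.1, 0.2, 0.3], '{"category": "science"}', 3);
echo $problems === [] ? "record ok\n" : implode("\n", $problems) . "\n";
```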

---

#### `batchSet(array $batch): int`

Insert or replace multiple vectors in one call. Returns the count of items inserted.

```
$count = $store->batchSet([
    ['key' => 'doc_1', 'vector' => $vec1],
    ['key' => 'doc_2', 'vector' => $vec2, 'metadata' => '{"tag": "a"}'],
]);
```

---

#### `search(array $vector, int $topK = 10, float $precision = 50): array`

Search for the nearest neighbors of a query vector.

```
$results = $store->search($query_vector, 5);
$results = $store->search($query_vector, 5, 100); // higher precision
$results = $store->search($query_vector, 5, 0);   // force brute-force
```

- `$topK` — Number of results (max 1000)
- `$precision` — HNSW beam width. Higher = better recall, slower. `0` = brute-force.

The HNSW index auto-builds when `precision > 0` and the vector count reaches the `memvector.auto_index_threshold` INI setting (default 100).

Returns results sorted by score (highest first):

```
[
    ['key' => 'doc_42', 'score' => 0.95, 'metadata' => '{"category": "science"}'],
    ['key' => 'doc_7',  'score' => 0.89, 'metadata' => null],
]
```

Score meaning by distance metric:

- **Cosine**: similarity (0 to 1, higher = more similar)
- **Dot**: raw dot product (higher = more similar)
- **Euclidean/Manhattan**: negated distance (higher = closer)
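
For reference, the "higher is better" convention can be written out in plain PHP. This is an illustrative sketch, not the extension's native code. `similarityScore()` is a hypothetical name, and returning 0.0 for a zero-length vector under cosine is an assumption made to avoid division by zero.

```php
<?php
declare(strict_types=1);

/**
 * Score two equal-length vectors so that a higher value means a closer match.
 * Illustrative reference only; the extension computes scores natively.
 *
 * @param float[] $a
 * @param float[] $b
 */
function similarityScore(array $a, array $b, string $metric = 'cosine'): float
{
    if (count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must have the same length');
    }

    $dot = $normA = $normB = $l1 = $l2 = 0.0;
    foreach (array_values($a) as $i => $x) {
        $y = array_values($b)[$i];
        $dot   += $x * $y;
        $normA += $x * $x;
        $normB += $y * $y;
        $l1    += abs($x - $y);
        $l2    += ($x - $y) ** 2;
    }

    return match ($metric) {
        // Assumption: a zero vector scores 0.0 rather than raising an error.
        'cosine'    => ($normA > 0.0 && $normB > 0.0) ? $dot / (sqrt($normA) * sqrt($normB)) : 0.0,
        'dot'       => $dot,
        'euclidean' => -sqrt($l2), // negated distance
        'manhattan' => -$l1,       // negated distance
        default     => throw new InvalidArgumentException("Unknown metric: $metric"),
    };
}

printf("cosine:    %.2f\n", similarityScore([3.0, 4.0], [3.0, 4.0]));              // cosine:    1.00
printf("manhattan: %.2f\n", similarityScore([1.0, 0.0], [0.0, 1.0], 'manhattan')); // manhattan: -2.00
```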

---

#### `get(string $key): ?array`

Retrieve a record by key. Returns `null` if not found.

```
$record = $store->get('doc_42');
// ['key' => 'doc_42', 'vector' => [0.1, ...], 'metadata' => '...']
```

---

#### `delete(string $key): bool`

Soft-delete a record by key. Returns `true` if found and deleted.

---

#### `count(): int`

Return the total number of live records.

---

#### `stats(): array`

Return store statistics.

| Key | Type | Description |
| --- | --- | --- |
| `storage` | string | `"memory"`, `"disk"`, or `"shm"` |
| `total_count` | int | Total records across all segments |
| `segment_count` | int | Number of segments |
| `dimensions` | int | Vector dimensionality |
| `distance` | string | `"cosine"`, `"dot"`, `"euclidean"`, or `"manhattan"` |
| `quantization` | string | `"none"`, `"f16"`, `"int8"`, `"binary"`, or `"pq"` |
| `bytes_per_vector` | int | Storage bytes per vector |
| `index_status` | string | `"none"` or `"ready"` |
| `unindexed_count` | int | Records not yet in the HNSW index |
| `memory_mb` | float | Total memory/mmap usage in MB |

---

#### `dump(?string $path = null): string|int`

Export all records as JSONL. Each line: `{"key":"...","vector":[...],"metadata":"..."}`

```
$jsonl = $store->dump();                        // returns string
$count = $store->dump('/backup/vectors.jsonl');  // streams to file, returns count
```
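
Because each line is a standalone JSON object, a dump can be inspected or transformed without the extension. The sketch below decodes a JSONL string in plain PHP. `parseJsonl()` is a hypothetical helper and the sample lines are hand-written to match the format above, not real `dump()` output.

```php
<?php
declare(strict_types=1);

/**
 * Decode a JSONL string (one JSON object per line) into PHP arrays.
 * Blank lines are skipped; malformed lines raise JsonException.
 *
 * @return array<int, array<string, mixed>>
 */
function parseJsonl(string $jsonl): array
{
    $records = [];
    foreach (preg_split('/\R/', $jsonl) as $line) {
        if (trim($line) === '') {
            continue;
        }
        $records[] = json_decode($line, true, 512, JSON_THROW_ON_ERROR);
    }

    return $records;
}

// Hand-written sample in the documented line format.
$sample = '{"key":"doc_1","vector":[0.1,0.2],"metadata":"a"}' . "\n"
        . '{"key":"doc_2","vector":[0.3,0.4],"metadata":null}' . "\n";

$records = parseJsonl($sample);
echo count($records), " records, first key: ", $records[0]['key'], "\n"; // 2 records, first key: doc_1
```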

---

#### `load(string $path): int`

Import records from a JSONL file (inverse of `dump()`). Returns the number of records loaded. Uses upsert semantics.

```
$count = $store->load('/backup/vectors.jsonl');
```

---

#### `clear(): bool`

Remove all records. Memory and shm modes only.

---

#### `snapshot(string $dir): bool`

Persist a memory or shm store to disk. Can be reloaded with `'storage' => 'memory'`:

```
$store->snapshot('/data/snapshot');
// Later:
$store = new MemVectorStore('/data/snapshot', ['storage' => 'memory', 'dimensions' => 128]);
```

---

#### `close(): void`

Close the store and release resources. Called automatically on GC. Safe to call multiple times.

---

#### `static destroy(string $name): bool`

Unlink a shared memory store. Requires `MEMVECTOR_SHM`.

---

### Class: `MemVectorEmbedding`

Requires `--with-llama` at build time. Detect with `defined('MEMVECTOR_LLAMA')`.

#### `__construct(string $model_path, ?array $options = null)`

Load a GGUF embedding model.

```
$emb = new MemVectorEmbedding('/models/all-MiniLM-L6-v2.Q8_0.gguf', [
    'context_size' => 512,
    'gpu_layers'   => 0,
    'normalize'    => true,
]);
```

**Options:**

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| `context_size` | int | 512 | Max tokens per text input |
| `gpu_layers` | int | 0 | Number of layers to offload to GPU |
| `normalize` | bool | true | L2 normalize output vectors |

Throws `MemVectorException` if the model file cannot be loaded.
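
L2 normalization scales a vector to unit length, and for unit-length vectors the cosine similarity equals the plain dot product. A small plain-PHP sketch of the operation follows. `l2Normalize()` is a hypothetical helper shown for illustration; leaving a zero vector unchanged is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Scale a vector to unit Euclidean length.
 * Assumption: a zero vector is returned unchanged.
 *
 * @param float[] $v
 * @return float[]
 */
function l2Normalize(array $v): array
{
    $norm = sqrt(array_sum(array_map(fn (float $x): float => $x * $x, $v)));
    if ($norm === 0.0) {
        return $v;
    }

    return array_map(fn (float $x): float => $x / $norm, $v);
}

$unit = l2Normalize([3.0, 4.0]);
printf("[%.1f, %.1f]\n", $unit[0], $unit[1]); // [0.6, 0.8]
```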

---

#### `embed(string $text): array`

Generate an embedding vector for a single text. Returns `float[]` of size `dimensions()`.

```
$vector = $emb->embed("The quick brown fox");
```

---

#### `embedBatch(array $texts): array`

Generate embeddings for multiple texts. Returns `float[][]`.

```
$vectors = $emb->embedBatch(["Hello", "World", "Test"]);
// count($vectors) === 3, count($vectors[0]) === $emb->dimensions()
```

---

#### `dimensions(): int`

Return the model's embedding dimensionality.

---

#### `close(): void`

Free the model and context. Called automatically on GC. Safe to call multiple times.

---

### Class: `MemVectorReranker`

Cross-encoder reranker for two-stage retrieval. Requires `--with-llama` at build time. Detect with `defined('MEMVECTOR_LLAMA')`.

#### `__construct(string $model_path, ?array $options = null)`

Load a GGUF cross-encoder reranker model.

```
$rr = new MemVectorReranker('/models/bge-reranker-v2-m3-Q8_0.gguf', [
    'context_size' => 512,
    'gpu_layers'   => 0,
]);
```

**Options:**

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| `context_size` | int | 512 | Max tokens per query+document pair |
| `gpu_layers` | int | 0 | Number of layers to offload to GPU |

Throws `MemVectorException` if the model file cannot be loaded.

**Recommended model:** [bge-reranker-v2-m3](https://huggingface.co/gpustack/bge-reranker-v2-m3-GGUF) (Q8\_0, ~636 MB)

---

#### `score(string $query, array $texts): array`

Score each text against the query using the cross-encoder. Returns `[['index' => int, 'score' => float], ...]` sorted descending by score.

```
$scores = $rr->score("What is PHP?", [
    "PHP is a scripting language.",
    "The weather is sunny.",
]);
// → [['index' => 0, 'score' => 0.87], ['index' => 1, 'score' => -3.2]]
```
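
Since the rows come back sorted by score, the `index` field is what ties each score to its input text. The sketch below joins the two in plain PHP. `attachTexts()` is a hypothetical helper, and `$scores` is a hand-written stand-in copied from the example output above, not a live `score()` call.

```php
<?php
declare(strict_types=1);

/**
 * Add the original text to each score row, using the row's 'index'.
 *
 * @param array<int, array{index: int, score: float}> $scores
 * @param string[] $texts
 * @return array<int, array{index: int, score: float, text: string}>
 */
function attachTexts(array $scores, array $texts): array
{
    return array_map(
        fn (array $row): array => $row + ['text' => $texts[$row['index']]],
        $scores
    );
}

$texts  = ['PHP is a scripting language.', 'The weather is sunny.'];
$scores = [['index' => 0, 'score' => 0.87], ['index' => 1, 'score' => -3.2]];

$ranked = attachTexts($scores, $texts);
echo $ranked[0]['text'], "\n"; // PHP is a scripting language.
```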

---

#### `rerank(string $query, array $candidates, int $topK = 0): array`

Rerank search results from `MemVectorStore::search()`. Scores each candidate's `metadata` field against the query. Returns `[['key' => string, 'score' => float, 'metadata' => string], ...]` sorted descending. `$topK = 0` returns all candidates.

```
$candidates = $store->search($emb->embed($query), 50);
$reranked = $rr->rerank($query, $candidates, 5);
```

---

#### `close(): void`

Free the model and context. Called automatically on GC. Safe to call multiple times.

---

#### Two-Stage Retrieval Example

```
$emb = new MemVectorEmbedding('/models/all-MiniLM-L6-v2.Q8_0.gguf');
$rr  = new MemVectorReranker('/models/bge-reranker-v2-m3-Q8_0.gguf');
$store = new MemVectorStore(null, ['dimensions' => $emb->dimensions()]);

// Index documents
foreach ($docs as $key => $text) {
    $store->set($key, $emb->embed($text), $text);
}

// Stage 1: vector search for broad recall (precision 0 forces an exact brute-force scan)
$candidates = $store->search($emb->embed($query), 50, 0);

// Stage 2: Accurate cross-encoder rerank (high precision)
$results = $rr->rerank($query, $candidates, 5);
```

---

### Exception Class: `MemVectorException`

All errors throw `MemVectorException` (extends `\Exception`).

```
try {
    $store->set('x', [0.1]); // wrong dimension
} catch (MemVectorException $e) {
    echo $e->getMessage(); // "Vector dim mismatch: expected 128"
}
```

---

### INI Settings

| Directive | Default | Description |
| --- | --- | --- |
| `memvector.default_dir` | `/tmp/memvector` | Default directory for disk stores |
| `memvector.default_dim` | 1536 | Default vector dimension |
| `memvector.default_storage` | 0 | Default storage mode |
| `memvector.mem_initial_capacity` | 4096 | Initial memory-mode capacity |
| `memvector.hnsw_max_connections` | 16 | HNSW max connections per node |
| `memvector.hnsw_construction_ef` | 200 | HNSW build beam width |
| `memvector.hnsw_ef_search` | 50 | HNSW search beam width |
| `memvector.auto_index_threshold` | 100 | Min vectors before HNSW auto-builds (0 = disable) |
| `memvector.pq_subvectors` | 0 | PQ sub-vector count (required for `quantization = 'pq'`) |
| `memvector.pq_training_size` | 256 | Vectors to buffer before PQ codebook auto-trains |
| `memvector.use_avx2` | 1 | Enable AVX2 SIMD (if compiled with support) |
| `memvector.model_cache` | 1 | Cache llama.cpp models per-process (PHP\_INI\_SYSTEM). Requires `--with-llama` |

---

Usage Examples
--------------

### Semantic Search with Embeddings

```
$emb = new MemVectorEmbedding('/models/all-MiniLM-L6-v2.Q8_0.gguf');
$store = new MemVectorStore(null, ['dimensions' => $emb->dimensions()]);

// Index a knowledge base
$articles = [
    'relativity'  => 'Einstein published the theory of general relativity in 1915',
    'evolution'   => 'Darwin proposed natural selection as the mechanism of evolution',
    'computing'   => 'Turing described a theoretical machine that could compute anything',
    'genetics'    => 'Watson and Crick discovered the double helix structure of DNA',
    'quantum'     => 'Heisenberg formulated the uncertainty principle in quantum mechanics',
];

foreach ($articles as $key => $text) {
    $store->set($key, $emb->embed($text), json_encode(['text' => $text]));
}

// Semantic search — finds related articles, not just keyword matches
$results = $store->search($emb->embed('biology and heredity'), 3);
foreach ($results as $r) {
    $meta = json_decode($r['metadata'], true);
    printf("  [%.2f] %s: %s\n", $r['score'], $r['key'], $meta['text']);
}
// Output:
//   [0.67] genetics: Watson and Crick discovered the double helix...
//   [0.51] evolution: Darwin proposed natural selection...
//   [0.24] quantum: Heisenberg formulated the uncertainty...
```

### RAG Pipeline (Retrieval-Augmented Generation)

```
// 1. Build index from documents
$emb = new MemVectorEmbedding('/models/nomic-embed-text-v1.5.Q8_0.gguf');
$store = new MemVectorStore('/data/knowledge', ['dimensions' => $emb->dimensions()]);

foreach ($documents as $doc) {
    // Chunk long documents and index each chunk
    foreach (chunk_text($doc->content, 512) as $i => $chunk) {
        $store->set("{$doc->id}_chunk_{$i}", $emb->embed($chunk), json_encode([
            'doc_id' => $doc->id,
            'title'  => $doc->title,
            'chunk'  => $chunk,
        ]));
    }
}

// 2. Retrieve relevant context for a user query
$query = "How does photosynthesis work?";
$results = $store->search($emb->embed($query), 5, 100);

$context = implode("\n\n", array_map(function ($r) {
    return json_decode($r['metadata'], true)['chunk'];
}, $results));

// 3. Send to LLM with retrieved context
$prompt = "Context:\n{$context}\n\nQuestion: {$query}\nAnswer:";
// $answer = $llm->complete($prompt);
```
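
The example above calls `chunk_text()`, which is not provided by the extension. One possible plain-PHP version is sketched below. It treats the second argument as a word budget, which is an assumption; a word count only approximates the model's token limit (`context_size`), so a smaller budget or a tokenizer-aware splitter may be needed in practice.

```php
<?php
declare(strict_types=1);

/**
 * Split text into chunks of at most $maxWords whitespace-separated words.
 * Hypothetical helper; the word-based budget is an assumption.
 *
 * @return string[]
 */
function chunk_text(string $text, int $maxWords): array
{
    if ($maxWords < 1) {
        throw new InvalidArgumentException('maxWords must be at least 1');
    }

    // Split on any run of whitespace and drop empty pieces.
    $words = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    return array_map(
        fn (array $chunk): string => implode(' ', $chunk),
        array_chunk($words, $maxWords)
    );
}

print_r(chunk_text('Plants convert light into chemical energy', 4));
// Two chunks: "Plants convert light into" and "chemical energy"
```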

### Two-Stage Retrieval with Reranking

```
$emb = new MemVectorEmbedding('/models/all-MiniLM-L6-v2.Q8_0.gguf');
$rr  = new MemVectorReranker('/models/bge-reranker-v2-m3-Q8_0.gguf');
$store = new MemVectorStore(null, ['dimensions' => $emb->dimensions()]);

// Index documents (store text as metadata for reranking)
$docs = [
    'php-web'     => 'PHP is widely used for building dynamic web applications and APIs.',
    'python-ml'   => 'Python is the dominant language for machine learning and data science.',
    'php-history' => 'PHP was originally created by Rasmus Lerdorf in 1994.',
    'weather'     => 'The weather forecast predicts rain for the upcoming weekend.',
];

foreach ($docs as $key => $text) {
    $store->set($key, $emb->embed($text), $text);
}

// Stage 1: vector search for broad recall (top 50; precision 0 forces an exact brute-force scan)
$query = "Tell me about PHP";
$candidates = $store->search($emb->embed($query), 50, 0);

// Stage 2: Cross-encoder rerank — high precision (top 3)
$results = $rr->rerank($query, $candidates, 3);

foreach ($results as $r) {
    printf("  [%.4f] %s: %s\n", $r['score'], $r['key'], $r['metadata']);
}
// PHP-related documents rank at the top, weather at the bottom
```

### Persistent Disk Store with HNSW

```
$store = new MemVectorStore('/var/data/vectors', [
    'dimensions'   => 768,
    'distance'     => 'cosine',
    'segment_size' => 100000,
]);

// Index vectors — HNSW auto-builds on first search
foreach ($documents as $doc) {
    $store->set($doc->id, $doc->embedding, json_encode($doc->metadata));
}

// Fast approximate search
$results = $store->search($query, 10, 100);
$store->close();

// Data and index persist — reopen anytime
$store = new MemVectorStore('/var/data/vectors', ['dimensions' => 768]);
$results = $store->search($query, 10, 100);
```

### Quantized Store (F16 — half memory)

```
$store = new MemVectorStore(null, [
    'dimensions'   => 768,
    'quantization' => 'f16',  // 2 bytes/dim instead of 4
]);

$store->set('doc_1', $embedding);
$results = $store->search($query, 10);
```

### Product Quantization (PQ — extreme compression)

```
ini_set('memvector.pq_subvectors', 96);      // 96 sub-vectors of 8 dims each
ini_set('memvector.pq_training_size', 1000); // auto-train after 1000 vectors

$store = new MemVectorStore(null, [
    'dimensions'   => 768,
    'quantization' => 'pq',
]);

// First 1000 vectors are buffered for codebook training
for ($i = 0; $i < 5000; $i++) {
    $store->set("vec_$i", $embeddings[$i]);
}
// After training: 96 bytes/vector instead of 3072 (32x compression)
```
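
The numbers in the comments can be derived from the two settings. The sketch below does that arithmetic in plain PHP. `pqLayout()` is a hypothetical helper. It assumes one byte per sub-vector code (consistent with the 96 bytes/vector figure above) and that the dimension count must divide evenly by the sub-vector count, which is typical for PQ but not stated in this README.

```php
<?php
declare(strict_types=1);

/**
 * Derive the PQ layout for a given dimension and sub-vector count.
 * Assumptions: one byte per sub-vector code, even divisibility required.
 *
 * @return array{sub_dims: int, bytes_per_vector: int, compression: float}
 */
function pqLayout(int $dimensions, int $subvectors): array
{
    if ($subvectors < 1 || $dimensions % $subvectors !== 0) {
        throw new InvalidArgumentException('dimensions must be divisible by subvectors');
    }

    return [
        'sub_dims'         => intdiv($dimensions, $subvectors),
        'bytes_per_vector' => $subvectors,
        // Ratio against uncompressed f32 storage (4 bytes per dimension).
        'compression'      => ($dimensions * 4.0) / $subvectors,
    ];
}

$layout = pqLayout(768, 96);
printf(
    "%d dims per sub-vector, %d bytes/vector, %.0fx compression\n",
    $layout['sub_dims'],
    $layout['bytes_per_vector'],
    $layout['compression']
); // 8 dims per sub-vector, 96 bytes/vector, 32x compression
```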

### JSONL Backup & Restore

```
$store = new MemVectorStore('/data/vectors', ['dimensions' => 768]);

// Export
$count = $store->dump('/backup/vectors.jsonl');
echo "Exported $count records\n";

// Import into a new store
$newStore = new MemVectorStore('/data/vectors_v2', ['dimensions' => 768]);
$loaded = $newStore->load('/backup/vectors.jsonl');
echo "Imported $loaded records\n";
```

### Snapshot & Reload in Memory Mode

```
$store = new MemVectorStore(null, ['dimensions' => 128]);
// ... add vectors ...

$store->snapshot('/data/snapshot');
$store->close();

// Reload entirely in memory (fast startup, no disk I/O after load)
$store = new MemVectorStore('/data/snapshot', ['storage' => 'memory', 'dimensions' => 128]);
```

### Shared Memory with OpenSwoole Workers

```
// Master: create and populate
$store = new MemVectorStore('search_index', [
    'storage'    => 'shm',
    'dimensions' => 1536,
]);
foreach ($documents as $doc) {
    $store->set($doc->id, $doc->embedding, $doc->title);
}
$store->close();

// Workers: attach read-only (zero-copy, cross-process)
$store = new MemVectorStore('search_index', [
    'storage'    => 'shm',
    'dimensions' => 1536,
    'writable'   => false,
]);
$results = $store->search($query_vector, 10, 100);
$store->close();

// Cleanup
MemVectorStore::destroy('search_index');
```

### Memory-Limited Store

```
$store = new MemVectorStore(null, [
    'dimensions'   => 128,
    'segment_size' => 10000,
    'memory_limit' => '256M',
]);
// Throws MemVectorException when adding would exceed the limit
```
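
To reason about how many vectors fit under a limit, the size string has to be turned into bytes. The sketch below does that in plain PHP. `sizeToBytes()` is a hypothetical helper, and treating K/M/G as 1024-based multiples is an assumption, since the README does not say which convention the extension uses.

```php
<?php
declare(strict_types=1);

/**
 * Convert a size such as '256M' to bytes.
 * Assumption: K, M, G are binary (1024-based) multiples.
 */
function sizeToBytes(int|string $value): int
{
    if (is_int($value)) {
        return $value;
    }
    if (!preg_match('/^(\d+)\s*([KMG]?)$/i', trim($value), $m)) {
        throw new InvalidArgumentException("Unrecognized size: $value");
    }

    $shift = ['' => 0, 'K' => 10, 'M' => 20, 'G' => 30][strtoupper($m[2])];

    return ((int) $m[1]) << $shift;
}

$limit = sizeToBytes('256M');
// Raw f32 payload only: 128 dimensions x 4 bytes; keys and metadata add more.
$maxVectors = intdiv($limit, 128 * 4);
echo "$limit bytes holds at most $maxVectors raw vectors\n"; // 268435456 bytes holds at most 524288 raw vectors
```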

---

### Model Cache (php-fpm)

When `memvector.model_cache=1` (default), `MemVectorEmbedding` and `MemVectorReranker` cache loaded GGUF models per worker process. The first construction loads the model from disk; subsequent constructions with the same file path reuse the cached model and only create a lightweight context. This avoids re-parsing weights on every request. Disable with `memvector.model_cache=0` in `php.ini` (PHP\_INI\_SYSTEM — cannot be changed at runtime).

**Latency impact** — construction time per request:

| Model | Size | Without cache | With cache | Speedup |
| --- | --- | --- | --- | --- |
| all-MiniLM-L6-v2 (Q8) | 24 MB | ~60 ms | ~1.5 ms | ~40x |
| nomic-embed-text-v1.5 (Q8) | 138 MB | ~300 ms | ~1.5 ms | ~200x |
| bge-reranker-v2-m3 (Q8) | 636 MB | ~1500 ms | ~1.5 ms | ~1000x |

Embed/score inference time is the same with or without the cache (~5-15 ms per call depending on model and text length).

**Memory impact** — per worker process:

| Model | Size | Cache RSS per worker | Context per request |
| --- | --- | --- | --- |
| all-MiniLM-L6-v2 (Q8) | 24 MB | ~22 MB | ~1 MB |
| nomic-embed-text-v1.5 (Q8) | 138 MB | ~60 MB | ~2 MB |
| bge-reranker-v2-m3 (Q8) | 636 MB | ~200 MB | ~5 MB |

Model weights are loaded via `mmap()`, so the OS shares physical pages across all workers through the page cache. The per-worker private cost is mainly model metadata (vocab, tensor descriptors). For example, 10 php-fpm workers with a 24 MB model use ~24 MB shared + 10 x 8 MB private = ~104 MB total physical RAM, not 10 x 48 MB.
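
The worked example generalizes to a one-line formula: shared weights are counted once, private memory once per worker. A plain-PHP sketch for capacity planning follows. `estimateTotalRamMb()` is a hypothetical helper, and the 8 MB private figure is taken from the example above and should be measured for other models.

```php
<?php
declare(strict_types=1);

/**
 * Estimate physical RAM when model weights are shared via the page cache.
 * Shared pages count once; private pages count once per worker.
 */
function estimateTotalRamMb(int $workers, float $sharedMb, float $privatePerWorkerMb): float
{
    return $sharedMb + $workers * $privatePerWorkerMb;
}

printf("%.0f MB\n", estimateTotalRamMb(10, 24.0, 8.0)); // 104 MB
```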

### Long-Lived PHP Runtimes

The model cache is designed for php-fpm's per-request lifecycle. For long-lived PHP runtimes — where worker processes persist across requests — the cache is unnecessary. Load the model once at startup and reuse the same object across all requests. This avoids both model loading **and** context creation overhead, giving the best possible performance.

This applies to any long-lived PHP framework or library:

- **[OpenSwoole](https://openswoole.com)** — async event-driven server
- **[ReactPHP](https://reactphp.org)** — event loop for non-blocking I/O
- **[AMPHP](https://amphp.org)** — async PHP framework
- **[Workerman](https://www.workerman.net)** — high-performance PHP socket server
- **[RoadRunner](https://roadrunner.dev)** — Go-powered PHP application server
- **[FrankenPHP](https://frankenphp.dev)** — modern PHP app server (worker mode)
- Any long-running CLI script or daemon

**OpenSwoole example:**

```
$server->on('workerStart', function ($server, $workerId) {
    $server->emb = new MemVectorEmbedding('/models/all-MiniLM-L6-v2.Q8_0.gguf');
    $server->store = new MemVectorStore('/data/vectors', [
        'dimensions' => $server->emb->dimensions(),
    ]);
});
$server->on('request', function ($request, $response) use ($server) {
    $vec = $server->emb->embed($request->post['text']);
    $results = $server->store->search($vec, 5);
    $response->end(json_encode($results));
});
```

**ReactPHP example:**

```
$emb = new MemVectorEmbedding('/models/all-MiniLM-L6-v2.Q8_0.gguf');
$store = new MemVectorStore('/data/vectors', ['dimensions' => $emb->dimensions()]);

$http = new React\Http\HttpServer(function (Psr\Http\Message\ServerRequestInterface $request) use ($emb, $store) {
    $body = json_decode((string) $request->getBody(), true);
    $vec = $emb->embed($body['text']);
    $results = $store->search($vec, 5);
    return React\Http\Message\Response::json($results);
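});

// Bind the server to a socket; the address here is only an example.
$socket = new React\Socket\SocketServer('127.0.0.1:8080');
$http->listen($socket);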
```

In all cases, the `MemVectorEmbedding`, `MemVectorReranker`, and `MemVectorStore` objects stay in memory for the lifetime of the worker — zero overhead per request beyond the actual embed/search computation.
