thesubhendu/embedvector-laravel
===============================

Recommendation engine using OpenAI embeddings and PostgreSQL pgvector

v0.0.7 · MIT · PHP ^8.3 · [Source](https://github.com/thesubhendu/embedvector-laravel) · [Packagist](https://packagist.org/packages/thesubhendu/embedvector-laravel)

EmbedVector Laravel Package
===========================

A Laravel package for building intelligent recommendation systems using OpenAI embeddings. Perfect for creating personalized content recommendations, job matching, product suggestions, and similar features where you need to find relevant matches based on user profiles or content similarity.

Features
--------

- Batch embedding processing using OpenAI's batch API
- Separate database connection support for vector operations
- Automatic vector extension creation for PostgreSQL
- Efficient batch processing with configurable chunk sizes
- **Dual Contract System**: Separate contracts for embedding generation and searchable models
- **Smart Model Separation**: Models can be either embedding sources or searchable targets

Installation
------------

1. Install the package via Composer:

```
composer require thesubhendu/embedvector-laravel
```

2. Publish the configuration and migrations:

```
php artisan vendor:publish --provider="Subhendu\EmbedVector\EmbedVectorServiceProvider"
```

3. Configure your environment variables:

```
OPENAI_API_KEY=your_openai_api_key_here
```

**Database Requirements**: This package requires PostgreSQL with the pgvector extension for vector operations.

Optional: To run vector operations on a PostgreSQL connection separate from your application database, set the `EMBEDVECTOR_DB_CONNECTION` environment variable to the name of that connection.

```
EMBEDVECTOR_DB_CONNECTION=pgsql
```

4. Run the migrations:

```
php artisan migrate
```
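
If you opt for a separate connection, it has to exist in `config/database.php` under the name you put in `EMBEDVECTOR_DB_CONNECTION`. A minimal sketch follows; the connection name `pgsql_vectors` and the `VECTOR_DB_*` variables are hypothetical, and it assumes a standard Laravel `pgsql` connection definition.

```php
<?php

// config/database.php (excerpt)
return [
    'connections' => [
        // Hypothetical dedicated connection for vector storage.
        'pgsql_vectors' => [
            'driver' => 'pgsql',
            'host' => env('VECTOR_DB_HOST', '127.0.0.1'),
            'port' => env('VECTOR_DB_PORT', '5432'),
            'database' => env('VECTOR_DB_DATABASE', 'vectors'),
            'username' => env('VECTOR_DB_USERNAME', 'postgres'),
            'password' => env('VECTOR_DB_PASSWORD', ''),
            'charset' => 'utf8',
            'prefix' => '',
            'search_path' => 'public',
            'sslmode' => 'prefer',
        ],
    ],
];
```

With that in place, `EMBEDVECTOR_DB_CONNECTION=pgsql_vectors` would point the package at the dedicated database.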

Usage
-----

### Understanding the Contract System

This package uses two distinct contracts to separate concerns based on the direction of matching:

1. **`EmbeddableContract`** - For models that generate embeddings (e.g., Customer/Candidate profiles)
2. **`EmbeddingSearchableContract`** - For models that can be found using embeddings (e.g., Jobs)

#### Example Use Case: Job Matching for Candidates

In this example, the system is designed to **find matching jobs for customers/candidates**, not the other way around:

- **Customer/Candidate** implements `EmbeddableContract` → generates embeddings from their profile, skills, preferences
- **Job** implements `EmbeddingSearchableContract` → can be found/recommended based on candidate embeddings
- **Flow**: Customer embeddings are used to find relevant Jobs that match their profile

**For Bidirectional Matching**: If you need matching in both directions (finding jobs for candidates and finding candidates for jobs), both models need to implement `EmbeddingSearchableContract`.

### Basic Embedding

```
use Subhendu\EmbedVector\Services\EmbeddingService;

$embeddingService = app(EmbeddingService::class);
$embedding = $embeddingService->createEmbedding('Your text here');
```

### Implementing Contracts

#### For Models That Generate Embeddings (e.g., Customer)

```
use Illuminate\Database\Eloquent\Model;
use Subhendu\EmbedVector\Contracts\EmbeddableContract;
use Subhendu\EmbedVector\Traits\EmbeddableTrait;

class Customer extends Model implements EmbeddableContract
{
    use EmbeddableTrait;

    public function toEmbeddingText(): string
    {
        return $this->name . ' ' . $this->department . ' ' . $this->skills;
    }
}
```

#### For Models That Can Be Searched (e.g., Job)

```
use Illuminate\Database\Eloquent\Model;
use Illuminate\Database\Eloquent\Factories\HasFactory;
use Subhendu\EmbedVector\Contracts\EmbeddingSearchableContract;
use Subhendu\EmbedVector\Traits\EmbeddingSearchableTrait;

class Job extends Model implements EmbeddingSearchableContract
{
    use EmbeddingSearchableTrait;
    use HasFactory;

    public function toEmbeddingText(): string
    {
        return $this->title . ' ' . $this->description . ' ' . $this->requirements;
    }
}
```

**Note**: `EmbeddingSearchableContract` extends `EmbeddableContract`, and `EmbeddingSearchableTrait` automatically includes `EmbeddableTrait` functionality, so you only need to use one trait.

### Finding Matching Results

#### Basic Usage

```
// Find jobs that match a customer's profile
$customer = Customer::find(1);
$matchingJobs = $customer->matchingResults(Job::class, 10);

foreach ($matchingJobs as $job) {
    echo "Job: {$job->title} - Match: {$job->match_percent}%\n";
    echo "Distance: {$job->distance}\n";
}
```

**Note:** The `matchingResults()` method automatically uses `getOrCreateEmbedding()` internally, which means:

- If no embedding exists for the source model, it will be created
- If an embedding exists but needs sync (`embedding_sync_required = true`), it will be updated
- This ensures you always get accurate similarity results
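
The three cases above amount to a small decision rule. The sketch below restates it in plain PHP, using a hypothetical `resolveEmbeddingAction()` helper and an array as a stand-in for the stored embedding row. It illustrates the described behaviour and is not the package's internal code.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: decide what a "get or create" call has to do.
 *
 * @param array{embedding_sync_required: bool}|null $stored Stored embedding row, or null if none exists.
 */
function resolveEmbeddingAction(?array $stored): string
{
    if ($stored === null) {
        return 'create'; // no embedding yet
    }

    if ($stored['embedding_sync_required']) {
        return 'update'; // embedding exists but is stale
    }

    return 'reuse'; // embedding exists and is current
}

echo resolveEmbeddingAction(null), PHP_EOL;                                 // create
echo resolveEmbeddingAction(['embedding_sync_required' => true]), PHP_EOL;  // update
echo resolveEmbeddingAction(['embedding_sync_required' => false]), PHP_EOL; // reuse
```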

#### Advanced Usage with Filters

You can add query filters to narrow down the search results before embedding similarity is calculated:

```
// Find only active jobs in specific locations
$customer = Customer::find(1);
$matchingJobs = $customer->matchingResults(
    targetModelClass: Job::class,
    topK: 10,
    queryFilter: function ($query) {
        $query->where('status', 'active')
              ->whereIn('location', ['New York', 'San Francisco'])
              ->where('salary', '>=', 80000);
    }
);
```

#### Method Parameters

- **`targetModelClass`** (string): The class name of the searchable model to match against (e.g. `Job::class`)
- **`topK`** (int, default: 5): Maximum number of results to return
- **`queryFilter`** (Closure, optional): Custom query constraints to apply before similarity matching

#### Return Properties

Each returned model includes additional properties:

- **`match_percent`** (float): Similarity percentage (0-100, higher is better)
- **`distance`** (float): Vector distance (lower is better for similarity)
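
To make the relationship between `distance` and `match_percent` concrete, here is a minimal sketch in plain PHP. It computes cosine distance as pgvector defines it (1 minus cosine similarity) and, for comparison, L2 distance. The `toMatchPercent()` mapping is an assumption made for illustration; the package's actual formula is not documented here.

```php
<?php

declare(strict_types=1);

/** Cosine distance: 1 - (a.b) / (|a| * |b|). 0 means same direction, 2 means opposite. */
function cosineDistance(array $a, array $b): float
{
    if ($a === [] || count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must be non-empty and of equal length.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }

    if ($normA === 0.0 || $normB === 0.0) {
        throw new InvalidArgumentException('Cosine distance is undefined for zero vectors.');
    }

    return 1.0 - $dot / sqrt($normA * $normB);
}

/** Euclidean (L2) distance between two vectors of equal length. */
function l2Distance(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $value) {
        $sum += ($value - $b[$i]) ** 2;
    }

    return sqrt($sum);
}

/** Hypothetical mapping from cosine distance to a 0-100 score (an assumption, not the package's formula). */
function toMatchPercent(float $cosineDistance): float
{
    return max(0.0, min(100.0, (1.0 - $cosineDistance) * 100.0));
}

// Same direction, different magnitude.
echo toMatchPercent(cosineDistance([1, 2], [2, 4])), PHP_EOL;
echo l2Distance([1, 2], [2, 4]), PHP_EOL;
```

Cosine distance ignores vector length, so `[1, 2]` and `[2, 4]` count as a perfect match, while their L2 distance is non-zero.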

Configuration
-------------

The package publishes a configuration file to `config/embedvector.php` with the following options:

```
return [
    'openai_api_key' => env('OPENAI_API_KEY', ''),
    'embedding_model' => env('EMBEDVECTOR_MODEL', 'text-embedding-3-small'),
    'distance_metric' => env('EMBEDVECTOR_DISTANCE', 'cosine'), // cosine | l2
    'search_strategy' => env('EMBEDVECTOR_SEARCH_STRATEGY', 'auto'), // auto | optimized | cross_connection
    'lot_size' => env('EMBEDVECTOR_LOT_SIZE', 50000),
    'chunk_size' => env('EMBEDVECTOR_CHUNK_SIZE', 500),
    'directories' => [
        'input' => 'embeddings/input',
        'output' => 'embeddings/output',
    ],
    'database_connection' => env('EMBEDVECTOR_DB_CONNECTION', 'pgsql'),
    'model_fields_to_check' => [
        // Configure fields to monitor for automatic sync
        // 'App\Models\Job' => ['title', 'description', 'requirements'],
    ],
];
```

### Configuration Options Explained

- **`openai_api_key`**: Your OpenAI API key (required in production)
- **`embedding_model`**: OpenAI embedding model to use (text-embedding-3-small, text-embedding-3-large, etc.)
- **`distance_metric`**: Vector similarity calculation method
    - `cosine`: Better for semantic similarity (recommended)
    - `l2`: Euclidean distance for geometric similarity
- **`search_strategy`**: How to perform similarity searches
    - `auto`: Automatically choose the best strategy (recommended)
    - `optimized`: Use JOIN-based queries (same database only)
    - `cross_connection`: Two-step approach (works across different databases)
- **`lot_size`**: Maximum items per OpenAI batch (up to 50,000)
- **`chunk_size`**: Items processed per chunk during batch generation
- **`database_connection`**: PostgreSQL connection for vector operations
- **`model_fields_to_check`**: Configure fields to monitor for automatic sync with `FireSyncEmbeddingTrait`
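
As a rough illustration of how `lot_size` and `chunk_size` might interact, the sketch below splits a record count into lots (one per OpenAI batch) and counts the chunks needed to fill each lot. `planLots()` is a hypothetical helper, and the interplay shown is an assumption based on the option descriptions above, not the package's actual code.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical planner: split $total records into lots of at most $lotSize,
 * each filled in chunks of at most $chunkSize.
 *
 * @return list<array{items: int, chunks: int}>
 */
function planLots(int $total, int $lotSize = 50000, int $chunkSize = 500): array
{
    if ($total < 0 || $lotSize < 1 || $chunkSize < 1) {
        throw new InvalidArgumentException('Total must be >= 0 and sizes must be >= 1.');
    }

    $lots = [];
    for ($remaining = $total; $remaining > 0; $remaining -= $lotSize) {
        $items = min($remaining, $lotSize);
        $lots[] = [
            'items' => $items,
            'chunks' => intdiv($items + $chunkSize - 1, $chunkSize), // ceiling division
        ];
    }

    return $lots;
}

foreach (planLots(120000) as $index => $lot) {
    echo "Lot {$index}: {$lot['items']} items in {$lot['chunks']} chunks", PHP_EOL;
}
```

With the default sizes, 120,000 records come out as three lots of 50,000, 50,000, and 20,000 items, needing 100, 100, and 40 chunks respectively.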

### Batch Processing

For processing large datasets efficiently, this package provides batch processing capabilities using OpenAI's batch API, which is more cost-effective for processing many embeddings at once.
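
OpenAI's Batch API takes a JSONL file in which each line is one request with a `custom_id`, `method`, `url`, and `body`. The sketch below builds such lines for the embeddings endpoint in plain PHP. The `buildBatchLine()` helper and the `job-42` style of `custom_id` are hypothetical; the files this package writes to `embeddings/input` may be laid out differently.

```php
<?php

declare(strict_types=1);

/** Hypothetical helper: encode one embeddings request as a single Batch API JSONL line. */
function buildBatchLine(string $customId, string $text, string $model = 'text-embedding-3-small'): string
{
    return json_encode([
        'custom_id' => $customId, // lets you match each result back to its record
        'method' => 'POST',
        'url' => '/v1/embeddings',
        'body' => ['model' => $model, 'input' => $text],
    ], JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE);
}

// Illustrative records keyed by a made-up custom_id scheme.
$records = [
    'job-42' => 'Senior PHP Developer. Laravel, PostgreSQL, remote.',
    'job-43' => "Data Engineer.\nPython, Airflow, on-site.",
];

$lines = [];
foreach ($records as $customId => $text) {
    $lines[] = buildBatchLine($customId, $text);
}

// One JSON object per line; newlines inside the text are escaped by json_encode.
echo implode("\n", $lines), PHP_EOL;
```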

Commands
--------

- `php artisan embedding:gen {model} {--type=sync|init} {--force}` - Generate batch embeddings for a specific model
- `php artisan embedding:proc {--batch-id=} {--all}` - Process completed batch results

### Command Options

#### `embedding:gen`

- `{model}` - The model class name to generate embeddings for (e.g. `App\\Models\\Job`)
- `--type=sync|init` - Processing type: `init` for first-time generation, `sync` to update existing embeddings (default: sync)
- `--force` - Force overwrite existing files

#### `embedding:proc`

- `--batch-id=` - Process a specific batch by ID
- `--all` - Process all completed batches
- No options - Check and process batches that are ready (default behavior)

### Usage Examples

```
# Generate embeddings for User model (init = first time, sync = update existing)
php artisan embedding:gen "App\\Models\\User" --type=init

# Generate embeddings for sync (only models that need updates)
php artisan embedding:gen "App\\Models\\Job" --type=sync

# Check and process ready batches (default)
php artisan embedding:proc

# Process all completed batches
php artisan embedding:proc --all

# Process specific batch
php artisan embedding:proc --batch-id=batch_abc123
```

Real-World Examples
-------------------

### E-commerce Product Recommendations

```
// Product model (searchable)
class Product extends Model implements EmbeddingSearchableContract
{
    use EmbeddingSearchableTrait;

    public function toEmbeddingText(): string
    {
        return $this->name . ' ' . $this->description . ' ' . $this->category . ' ' . $this->tags;
    }
}

// User model (generates embeddings from purchase history)
class User extends Model implements EmbeddableContract
{
    use EmbeddableTrait;

    public function toEmbeddingText(): string
    {
        $purchaseHistory = $this->orders()
            ->with('products')
            ->get()
            ->flatMap->products
            ->pluck('name')
            ->implode(' ');

        return $this->preferences . ' ' . $purchaseHistory;
    }
}

// Find recommended products for a user
$user = User::find(1);
$maxPrice = 100; // hypothetical price cap, for illustration only
$recommendations = $user->matchingResults(
    targetModelClass: Product::class,
    topK: 20,
    queryFilter: function ($query) use ($maxPrice) {
        $query->where('in_stock', true)
              ->where('price', '<=', $maxPrice);
    }
);

foreach ($recommendations as $product) {
    echo "Match: {$product->match_percent}% - {$product->name}\n";
}
```

### Content Recommendation System

```
// Article model
class Article extends Model implements EmbeddingSearchableContract
{
    use EmbeddingSearchableTrait;

    public function toEmbeddingText(): string
    {
        return $this->title . ' ' . $this->summary . ' ' . $this->tags . ' ' . $this->category;
    }
}

// User reading history model
class UserProfile extends Model implements EmbeddableContract
{
    use EmbeddableTrait;

    public function toEmbeddingText(): string
    {
        // GROUP_CONCAT is MySQL syntax; on PostgreSQL use string_agg(title || ' ' || summary, ' ')
        $readingHistory = $this->user->readArticles()
            ->selectRaw('GROUP_CONCAT(title, " ", summary) as content')
            ->value('content');

        return $this->interests . ' ' . $readingHistory;
    }
}

// Get personalized article recommendations
$profile = UserProfile::where('user_id', auth()->id())->first();
$recommendations = $profile->matchingResults(
    targetModelClass: Article::class,
    topK: 10,
    queryFilter: function ($query) use ($profile) {
        $query->where('published', true)
              ->where('created_at', '>=', now()->subDays(7))
              ->whereNotIn('id', $profile->user->read_article_ids);
    }
);
```

Embedding Management Examples
-----------------------------

### Working with Embeddings

```
$job = Job::find(1);

// Check if an embedding exists without creating one
$embedding = $job->getEmbedding();
if ($embedding) {
    echo "Embedding exists: " . ($embedding->embedding_sync_required ? "Needs sync" : "Up to date");
} else {
    echo "No embedding found";
}

// Get or create embedding (will create if missing or update if sync required)
$embedding = $job->getOrCreateEmbedding();
echo "Embedding ready for similarity matching";

// Force create a fresh embedding (useful for testing or manual refresh)
$freshEmbedding = $job->createFreshEmbedding();

// Queue for syncing (mark for batch update later)
$job->queueForSyncing();
```

### Batch Sync Workflow

```
// 1. Mark multiple models for syncing
$jobs = Job::where('updated_at', '>', now()->subDays(1))->get();
foreach ($jobs as $job) {
    $job->queueForSyncing(); // Queue each job for sync
}

// 2. Generate the batch for all queued embeddings (run in a shell):
//    php artisan embedding:gen "App\\Models\\Job" --type=sync

// 3. Process the completed batch (run in a shell):
//    php artisan embedding:proc --all
```

### Conditional Embedding Updates

```
class JobController extends Controller
{
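    // UpdateJobRequest is a hypothetical FormRequest; validated() is only available on form requests
    public function update(UpdateJobRequest $request, Job $job)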
    {
        $job->update($request->validated());

        // Only queue for syncing if embedding-relevant fields changed
        if ($job->wasChanged(['title', 'description', 'requirements'])) {
            $job->queueForSyncing();
        }

        return response()->json($job);
    }
}
```
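
The check in the controller reduces to a set intersection between the fields that changed and the fields that feed `toEmbeddingText()`. Below is a framework-free sketch of that idea, with a hypothetical `needsResync()` helper and a watched-fields map shaped like the `model_fields_to_check` config option.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: true when any changed field is one the embedding depends on.
 *
 * @param array<string, list<string>> $watched       Map of model class => embedding-relevant fields.
 * @param list<string>                $changedFields Names of the attributes that were modified.
 */
function needsResync(array $watched, string $modelClass, array $changedFields): bool
{
    $relevant = $watched[$modelClass] ?? [];

    return array_intersect($relevant, $changedFields) !== [];
}

$watched = ['App\Models\Job' => ['title', 'description', 'requirements']];

var_dump(needsResync($watched, 'App\Models\Job', ['salary']));          // bool(false)
var_dump(needsResync($watched, 'App\Models\Job', ['salary', 'title'])); // bool(true)
```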

### Real-time vs Batch Embedding Strategy

```
// Real-time embedding (immediate, good for single updates)
$job = Job::create($data);
$embedding = $job->getOrCreateEmbedding(); // Creates immediately

// Batch embedding (efficient for bulk updates)
$jobs = Job::factory()->count(100)->create();
foreach ($jobs as $job) {
    $job->queueForSyncing(); // Mark for batch processing
}
// Then run: php artisan embedding:gen "App\\Models\\Job" --type=sync
```

### Embedding Lifecycle Management

```
// Scenario 1: New model creation
$job = Job::create($data);
// Option A: Create embedding immediately
$embedding = $job->getOrCreateEmbedding();
// Option B: Queue for batch processing (more efficient)
$job->queueForSyncing();

// Scenario 2: Model updates
$job->update(['title' => 'Updated Title']);
// Option A: Update embedding immediately
$job->createFreshEmbedding();
// Option B: Queue for batch processing (recommended)
$job->queueForSyncing();

// Scenario 3: Checking embedding status
$embedding = $job->getEmbedding();
if (!$embedding) {
    echo "No embedding exists";
} elseif ($embedding->embedding_sync_required) {
    echo "Embedding needs update";
} else {
    echo "Embedding is up to date";
}

// Scenario 4: Bulk operations
$jobs = Job::where('department', 'Engineering')->get();
foreach ($jobs as $job) {
    $job->queueForSyncing(); // Queue all for batch processing
}
// Process in batch: php artisan embedding:gen "App\\Models\\Job" --type=sync
```

Best Practices
--------------

### 1. Optimize Your `toEmbeddingText()` Method

```
public function toEmbeddingText(): string
{
    // ✅ Good: Concise, relevant information
    return trim($this->title . ' ' . $this->description . ' ' . $this->tags);

    // ❌ Avoid: Too much noise or irrelevant data
    // return $this->created_at . ' ' . $this->id . ' ' . $this->long_legal_text;
}
```
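
One way to keep the embedding text clean when some attributes are null or empty is to filter them out before joining. The `buildEmbeddingText()` helper below is a hypothetical, framework-free sketch of that idea; inside a model you would call it from `toEmbeddingText()`.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: join the non-empty parts with single spaces and
 * collapse runs of whitespace inside each part.
 *
 * @param array<int, string|null> $parts
 */
function buildEmbeddingText(array $parts): string
{
    $clean = [];
    foreach ($parts as $part) {
        // Collapse whitespace runs (including newlines), then trim the ends.
        $text = trim((string) preg_replace('/\s+/u', ' ', (string) $part));
        if ($text !== '') {
            $clean[] = $text;
        }
    }

    return implode(' ', $clean);
}

echo buildEmbeddingText(['Backend Engineer', null, "  Laravel,\n PostgreSQL ", '']), PHP_EOL;
// Backend Engineer Laravel, PostgreSQL
```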

### 2. Use Appropriate Filters

```
// ✅ Good: Filter before similarity calculation
$matches = $user->matchingResults(
    Product::class,
    10,
    fn($q) => $q->where('available', true)->where('price', '