tamirrental/laravel-text-extractor
==================================

A Laravel package for extracting structured data from documents via OCR APIs. Ships with Koncile AI provider.

Version 1.0.0 · MIT license · PHP ^8.4 · CI passing

[Source](https://github.com/TamirRental/laravel-text-extractor) · [Packagist](https://packagist.org/packages/tamirrental/laravel-text-extractor)

Laravel Text Extractor
======================

[![Tests](https://github.com/TamirRental/laravel-text-extractor/actions/workflows/run-tests.yml/badge.svg)](https://github.com/TamirRental/laravel-text-extractor/actions/workflows/run-tests.yml)[![Type Coverage](https://camo.githubusercontent.com/0bb01e8961a61656e3f2511460349bc9b30a0d817d9a720ee8494bfe8f2b8402/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f747970652d2d636f7665726167652d3130302532352d73756363657373)](https://github.com/TamirRental/laravel-text-extractor)[![PHP](https://camo.githubusercontent.com/29410bb6bcb6a3fb064bad25ef3ce0793978e60c3876a15c4c80f62d6660e553/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f7068702d382e342532422d626c7565)](https://www.php.net)[![Laravel](https://camo.githubusercontent.com/5bd61ec377dc8f6b4d88d43749cfe4da6a6e79fe5f3471cc226b3c0353295a88/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f6c61726176656c2d313125323025374325323031322d726564)](https://laravel.com)

A Laravel package for extracting structured data from documents (images, PDFs) via OCR APIs. Ships with a [Koncile AI](https://koncile.ai) provider out of the box.

Features
--------

- Extract structured data from documents using OCR providers
- Fluent API — chainable `metadata()`, `force()`, and `submit()` methods
- Async processing via Laravel queues
- Pluggable provider architecture — bring your own OCR provider
- Facade for clean, expressive syntax
- Built-in model scopes for querying extractions

Requirements
------------

- PHP 8.4+
- Laravel 11 or 12

Installation
------------

```
composer require tamirrental/laravel-text-extractor
```

Run the install command to publish the config file and migration:

```
php artisan document-extraction:install
```

Then run the migration:

```
php artisan migrate
```

Configuration
-------------

### `config/document-extraction.php`

Provider connection settings.

```
return [
    'default' => env('EXTRACTION_PROVIDER', 'koncile_ai'),

    'providers' => [
        'koncile_ai' => [
            'url' => env('KONCILE_AI_API_URL', 'https://api.koncile.ai'),
            'key' => env('KONCILE_AI_API_KEY'),
            'webhook_secret' => env('KONCILE_AI_WEBHOOK_SECRET'),
        ],
    ],
];
```

### Environment Variables

Add these to your `.env` file:

```
KONCILE_AI_API_KEY=your-api-key
KONCILE_AI_WEBHOOK_SECRET=your-webhook-secret
```

Usage
-----

### Basic Usage with Facade

```
use TamirRental\DocumentExtraction\Facades\DocumentExtraction;

// Store the uploaded file
$path = $file->store('documents/car-licenses', 's3');

// Extract — creates a record and dispatches async processing
$extraction = DocumentExtraction::extract('car_license', $path)
    ->metadata([
        'template_id' => 'your-koncile-template-id',
        'folder_id' => 'optional-folder-id',         // optional
        'identifier_field' => 'license_number',       // optional — used to resolve identifier from extracted data
    ])
    ->submit();
```

The package automatically dispatches a queued job to download the file from storage, upload it to the OCR provider, and track the result.

### Metadata

The `metadata()` method accepts a key-value array that gets stored on the extraction record and passed to the provider. This is how you supply provider-specific data without any config files.
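To make the `identifier_field` key from the basic example more concrete, here is a minimal sketch of how an identifier could be picked out of extracted data. The `resolveIdentifier()` helper is hypothetical and not part of the package, and it assumes the extracted data is a flat array keyed by field name; the real provider payload may be shaped differently.

```php
<?php

/**
 * Hypothetical helper (not the package's API): read the field named by the
 * `identifier_field` metadata key from the extracted data.
 *
 * Assumption: extracted data is a flat array keyed by field name.
 */
function resolveIdentifier(array $metadata, array $extractedData): ?string
{
    $field = $metadata['identifier_field'] ?? null;

    // Nothing to resolve if no field was configured or the field is absent.
    if ($field === null || ! array_key_exists($field, $extractedData)) {
        return null;
    }

    $value = $extractedData[$field];

    // Only scalar values are usable as an identifier.
    return is_scalar($value) ? (string) $value : null;
}

$metadata = [
    'template_id' => 'your-koncile-template-id',
    'identifier_field' => 'license_number',
];
$extracted = ['license_number' => '12-345-67', 'owner' => 'Jane Doe'];

var_dump(resolveIdentifier($metadata, $extracted));                 // the license number
var_dump(resolveIdentifier(['template_id' => 'x'], $extracted));    // null: no identifier_field set
```

Returning `null` instead of throwing keeps a missing identifier from being treated as a failed extraction, which seems the safer default for an optional key.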

| Key | Required | Description |
| --- | --- | --- |
| `template_id` | Yes (Koncile AI) | The OCR template ID on the provider side |
| `folder_id` | No | Optional folder/organization ID on the provider side |
| `identifier_field` | No | The field name from extracted data to use as a unique identifier (e.g. `license_number`) |

### Force Re-extraction

If an extraction already exists for a file, chain `force()` to create a new one:

```
$extraction = DocumentExtraction::extract('car_license', $path)
    ->metadata(['template_id' => 'your-template-id'])
    ->force()
    ->submit();
```

### Conditional Force

The pending extraction uses the `Conditionable` trait, so you can chain methods conditionally with `when()`:

```
$extraction = DocumentExtraction::extract('car_license', $path)
    ->metadata(['template_id' => 'your-template-id'])
    ->when($shouldForce, fn ($pending) => $pending->force())
    ->submit();
```
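The sketch below illustrates what `when()` does in a chain like the one above. `PendingExtractionSketch` is a stand-in written for this example, not the package's class, and its `when()` only mirrors the basic behaviour of a conditionable builder: the callback runs when the condition is truthy, and the chain continues either way.

```php
<?php

// Stand-in builder for illustration only; the package's real pending
// extraction class is not shown in this README.
final class PendingExtractionSketch
{
    public bool $force = false;

    public function force(): static
    {
        $this->force = true;

        return $this;
    }

    // Run the callback only when the condition is truthy. If the callback
    // returns nothing, fall back to $this so the chain can continue.
    public function when(mixed $condition, callable $callback): static
    {
        if ($condition) {
            return $callback($this, $condition) ?? $this;
        }

        return $this;
    }
}

$forced = (new PendingExtractionSketch())
    ->when(true, fn ($pending) => $pending->force());

$skipped = (new PendingExtractionSketch())
    ->when(false, fn ($pending) => $pending->force());

var_dump($forced->force);  // bool(true)
var_dump($skipped->force); // bool(false)
```

This avoids breaking the chain into an `if` block just to decide whether `force()` applies.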

### Checking Extraction Status

```
use TamirRental\DocumentExtraction\Enums\DocumentExtractionStatusEnum;
use TamirRental\DocumentExtraction\Models\DocumentExtraction;

$extraction = DocumentExtraction::find($id);

if ($extraction->status === DocumentExtractionStatusEnum::Completed) {
    $data = $extraction->extracted_data;
    $identifier = $extraction->identifier; // e.g. "12-345-67"
}
```
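When every status needs handling, a `match` expression keeps the branches exhaustive. The sketch below uses a stand-in enum so it runs on its own: only the `Completed` case appears in this README, so the `Pending` and `Failed` case names and the string backing values are assumptions based on the status names the package uses elsewhere.

```php
<?php

// Stand-in for DocumentExtractionStatusEnum. Case names other than
// `Completed`, and the string backing, are assumptions.
enum StatusSketch: string
{
    case Pending = 'pending';
    case Completed = 'completed';
    case Failed = 'failed';
}

// Map each status to a user-facing message. `match` throws an
// UnhandledMatchError if a new case is added and not handled here.
function describeStatus(StatusSketch $status): string
{
    return match ($status) {
        StatusSketch::Pending => 'Still processing, check again later.',
        StatusSketch::Completed => 'Done, extracted data is available.',
        StatusSketch::Failed => 'Extraction failed, consider retrying with force().',
    };
}

echo describeStatus(StatusSketch::from('completed')), PHP_EOL;
```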

### Querying Extractions

The `DocumentExtraction` model includes useful scopes:

```
use TamirRental\DocumentExtraction\Models\DocumentExtraction;

// Filter by status
DocumentExtraction::pending()->get();
DocumentExtraction::completed()->get();
DocumentExtraction::failed()->get();

// Filter by type or file
DocumentExtraction::forType('car_license')->get();
DocumentExtraction::forFile('documents/license.png')->get();

// Combine scopes
DocumentExtraction::forType('car_license')->completed()->latest()->first();
```

How It Works
------------

```
1. Your App                    2. Queue Worker               3. Provider (Koncile AI)
   │                              │                              │
   ├─ Store file to Storage       │                              │
   ├─ extract()->submit() ───────►│                              │
   │  (auto-dispatches event)     ├─ Download from Storage       │
   │                              ├─ Upload to provider ────────►│
   │                              ├─ Save external_task_id       │
   │                              │                              ├─ OCR Processing...
   │                              │                              │
   │◄─────────────── Provider webhook callback ◄────────────────┤
   ├─ Your controller handles it  │                              │
   ├─ complete() / fail()         │                              │
   │                              │                              │
   ├─ Check status / display      │                              │

```
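The flow in the diagram can also be traced as plain state changes on a record. This is a rough in-memory model, not the package's implementation: the array shape and the helper names (`createRecord()`, `markSent()`, `completeRecord()`, `failRecord()`) are hypothetical.

```php
<?php

// Hypothetical in-memory model of an extraction record; illustrative only.
function createRecord(): array
{
    return ['status' => 'pending', 'external_task_id' => null, 'extracted_data' => []];
}

// The queue worker uploads the file and stores the provider's task ID.
function markSent(array $record, string $taskId): array
{
    $record['external_task_id'] = $taskId; // status stays "pending"

    return $record;
}

// The webhook reports success: store the provider data.
function completeRecord(array $record, array $data): array
{
    $record['status'] = 'completed';
    $record['extracted_data'] = $data;

    return $record;
}

// The webhook reports failure: extracted data stays empty.
function failRecord(array $record): array
{
    $record['status'] = 'failed';

    return $record;
}

$sent = markSent(createRecord(), 'task-abc-123');
$done = completeRecord($sent, ['license_number' => '12-345-67']);
$failed = failRecord($sent);

echo json_encode($done), PHP_EOL;
echo json_encode($failed), PHP_EOL;
```

Note that the record stays `pending` between upload and webhook callback; only the `external_task_id` tells the two stages apart.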

### Extraction Lifecycle

| Stage | Status | external\_task\_id | extracted\_data |
| --- | --- | --- | --- |
| Record created | `pending` | `null` | `{}` |
| Sent to provider | `pending` | `task-abc-123` | `{}` |
| Provider succeeds | `completed` | `task-abc-123` | `{...provider data}` |
| Provider fails | `failed` | `task-abc-123` | `{}` |

Handling Webhooks
-----------------

The package does **not** register webhook routes — you own the entire webhook flow. Create your own controller to receive provider callbacks and use the service to update extractions:

```