ges/ocr
=======

Core document processing services for OCR, classification, extraction, and normalization.

Version 0.3.0 · MIT license · PHP ^8.4

[Source](https://github.com/TechGES/ocr) · [Packagist](https://packagist.org/packages/ges/ocr)

Laravel package for document OCR, classification, extraction, and normalization.

This package is built for French business and identity documents, with current support for:

- `identity_card`
- `residence_permit`
- `passport`
- `visa`
- `crew_card`
- `travel_document`
- `other_identity_document`
- `kbis`
- `acte_propriete` (land-title deed only)
- `msa` (parcel table)

What This Package Does
----------------------

Input pipeline:

- detect technical input type: `image`, `pdf_text`, `pdf_scan`
- transcribe images and scanned PDFs
- classify the business document type
- extract structured data
- normalize values into a stable shape
- return a `ProcessedDocumentResult`
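
The first step of this pipeline can be pictured with a short sketch. It assumes that a PDF counts as `pdf_text` when a usable text layer can be extracted (for example with `pdftotext`) and as `pdf_scan` otherwise. The function name `detectInputType` and the 20-character threshold are hypothetical and are not taken from the package.

```php
<?php

/**
 * Hypothetical helper: map a MIME type, plus any text already pulled from the
 * file, to one of the three technical input types listed above.
 */
function detectInputType(string $mimeType, string $extractedText = ''): string
{
    if (str_starts_with($mimeType, 'image/')) {
        return 'image';
    }

    if ($mimeType === 'application/pdf') {
        // Assumption: a PDF with almost no extractable text is a scan.
        $visibleChars = strlen(preg_replace('/\s+/', '', $extractedText) ?? '');

        return $visibleChars >= 20 ? 'pdf_text' : 'pdf_scan';
    }

    throw new InvalidArgumentException("Unsupported MIME type: {$mimeType}");
}
```

Under these assumptions, only `image` and `pdf_scan` inputs would need the transcription step; a `pdf_text` input already carries its text.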

Current model strategy (the Ollama defaults):

- `qwen2.5vl:7b` for visual transcription only
- `qwen2.5:7b` for classification and structured extraction

Available AI providers:

- `ollama`
- `openai`

Provider strategy:

- `ollama` uses a multi-step pipeline: vision transcription, classification, extraction, optional MRZ merge
- `openai` uses a single structured request per document and returns classification plus extracted data in one response

Package Boundaries
------------------

This package contains:

- OCR/transcription services
- classifier
- extractor
- normalizer
- schema factory
- AI clients for Ollama and OpenAI
- package `DocumentProcessing` model
- package migration and factory
- install command

This package does not own your application workflow.

Typical app-specific code stays outside:

- accepted `Document` model
- upload flow
- matching an identity document against a user
- deciding whether to persist a final document
- queue jobs tied to your app domain

Install
-------

```
composer require ges/ocr
```

Then install package assets:

```
php artisan ocr:install
```

Or install and migrate immediately:

```
php artisan ocr:install --migrate
```

Optional install flags:

```
php artisan ocr:install --check
php artisan ocr:install --no-config
php artisan ocr:install --no-migrations
php artisan ocr:install --force
```

What this command does:

- publishes `config/ges-ocr.php`
- publishes package migrations
- optionally runs `php artisan migrate`
- optionally runs `php artisan ocr:health`

Health check command:

```
php artisan ocr:health
```

It checks:

- `pdftotext`
- `pdftoppm`
- selected AI provider connectivity
- configured text and vision models
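
For a rough idea of what the binary checks involve, the sketch below looks up the two PDF tools on `PATH`. It is a standalone illustration, not the implementation behind `ocr:health`; the helper name `findBinary` is hypothetical.

```php
<?php

/**
 * Hypothetical helper: return the full path of an executable found on PATH,
 * or null when it is missing.
 */
function findBinary(string $name): ?string
{
    $path = getenv('PATH');
    $dirs = $path === false ? [] : explode(PATH_SEPARATOR, $path);

    foreach ($dirs as $dir) {
        $candidate = rtrim($dir, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR . $name;

        if (is_file($candidate) && is_executable($candidate)) {
            return $candidate;
        }
    }

    return null;
}

// Report which PDF tools are reachable from the PHP process.
foreach (['pdftotext', 'pdftoppm'] as $tool) {
    printf("%-10s %s\n", $tool, findBinary($tool) ?? 'missing');
}
```

Running the check as the same user as the web or queue worker matters, since `PATH` can differ between a shell session and the PHP process.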

Configuration
-------------

Published config file:

```
config/ges-ocr.php
```

Main environment variables:

```
GES_OCR_AI_PROVIDER=ollama
GES_OCR_CLASSIFICATION_CONFIDENCE_THRESHOLD=0.75
GES_OCR_MAX_PAGES=0

OLLAMA_BASE_URL=http://host.docker.internal:11434
OLLAMA_TEXT_MODEL=qwen2.5:7b
OLLAMA_VISION_MODEL=qwen2.5vl:7b
OLLAMA_CONNECT_TIMEOUT=10
OLLAMA_TIMEOUT=120
OLLAMA_RETRY_TIMES=2
OLLAMA_RETRY_SLEEP_MS=500
OLLAMA_BASIC_AUTH_ENABLED=false
OLLAMA_BASIC_AUTH_USERNAME=
OLLAMA_BASIC_AUTH_PASSWORD=

OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_API_KEY=
OPENAI_TEXT_MODEL=gpt-4.1-mini
OPENAI_VISION_MODEL=gpt-4.1-mini
OPENAI_CONNECT_TIMEOUT=10
OPENAI_TIMEOUT=120
OPENAI_RETRY_TIMES=2
OPENAI_RETRY_SLEEP_MS=500

GES_OCR_MRZ_OCR_ENABLED=true
GES_OCR_CLEANUP_TEMPORARY_FILES=true
```

`GES_OCR_AI_PROVIDER` accepts `ollama` or `openai`.

`GES_OCR_MAX_PAGES=0` means unlimited pages.
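
These two rules are easy to express in code. The helpers below are a minimal sketch of how an application might mirror them in its own validation; `resolveProvider` and `pagesToProcess` are hypothetical names, and the package may apply the rules differently internally.

```php
<?php

/** Hypothetical helper: reject any provider other than the two documented ones. */
function resolveProvider(string $value): string
{
    if (!in_array($value, ['ollama', 'openai'], true)) {
        throw new InvalidArgumentException("Unknown AI provider: {$value}");
    }

    return $value;
}

/** Hypothetical helper: apply the "0 means unlimited" rule to a page count. */
function pagesToProcess(int $pagesCount, int $maxPages): int
{
    return $maxPages > 0 ? min($pagesCount, $maxPages) : $pagesCount;
}
```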

Main config areas:

- `ai`
- `ollama`
- `openai`
- `mrz`
- `processing`

Optional Ollama upstream basic auth:

- `OLLAMA_BASIC_AUTH_ENABLED=true` enables HTTP basic auth on requests sent to `OLLAMA_BASE_URL`
- `OLLAMA_BASIC_AUTH_USERNAME` sets the upstream username
- `OLLAMA_BASIC_AUTH_PASSWORD` sets the upstream password
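
HTTP basic auth adds an `Authorization` header carrying the Base64-encoded `username:password` pair. The sketch below shows that header being built; the helper name `ollamaAuthHeaders` is hypothetical, and the package's own HTTP client code is not shown in this README.

```php
<?php

/**
 * Hypothetical helper: build the extra request headers for the Ollama upstream.
 * The header format itself is standard HTTP Basic authentication.
 */
function ollamaAuthHeaders(bool $enabled, string $username, string $password): array
{
    if (!$enabled) {
        return [];
    }

    return ['Authorization' => 'Basic ' . base64_encode($username . ':' . $password)];
}
```

Basic auth only encodes the credentials, so it is worth pointing `OLLAMA_BASE_URL` at an HTTPS endpoint when this option is enabled.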

Example OpenAI setup:

```
GES_OCR_AI_PROVIDER=openai
OPENAI_API_KEY=sk-...
OPENAI_TEXT_MODEL=gpt-4.1-mini
OPENAI_VISION_MODEL=gpt-4.1-mini
```

Public API
----------

Main service:

```
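use Ges\Ocr\DocumentProcessor;

// Resolve the processor from the Laravel container and process one file.
$result = app(DocumentProcessor::class)->processFile(
    path: $absolutePath,         // absolute path to the file on disk
    mimeType: $mimeType,         // MIME type, e.g. application/pdf
    originalName: $originalName, // original file name
);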
```

Returned DTO:

- `originalName`
- `mimeType`
- `path`
- `inputType`
- `documentType`
- `status`
- `pagesCount`
- `rawClassificationJson`
- `rawExtractionJson`
- `normalizedJson`
- `errorMessage`

Main statuses:

- `pending`
- `processing`
- `done`
- `failed`
- `needs_review`
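
On the consuming side, the status usually decides what happens next. The sketch below is one possible way to branch on it. It assumes `status` is exposed as a plain string (the package may use an enum instead) and uses a stand-in object rather than the real DTO; `nextAction` and the returned labels are hypothetical.

```php
<?php

/**
 * Hypothetical consumer-side helper. $result stands in for the returned DTO;
 * only the documented property names `status` and `errorMessage` are used.
 */
function nextAction(object $result): string
{
    return match ($result->status) {
        'done' => 'use normalizedJson',
        'needs_review' => 'send to manual review',
        'failed' => 'report: ' . ($result->errorMessage ?? 'unknown error'),
        'pending', 'processing' => 'wait',
        default => throw new UnexpectedValueException("Unknown status: {$result->status}"),
    };
}

// Stand-in object for illustration; real code would receive the DTO from processFile().
$result = (object) ['status' => 'needs_review', 'errorMessage' => null];
echo nextAction($result), PHP_EOL;
```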

Supported Output Shapes
-----------------------

### Identity Card

Normalized keys:

- `document_type`
- `civility`
- `first_name`
- `last_name`
- `date_of_birth`
- `place_of_birth`
- `document_number`
- `expiry_date`
- `nationality`
- `sex`
- `street_address`
- `postal_code`
- `city`

### Residence Permit

Normalized keys:

- `document_type`
- `civility`
- `first_name`
- `last_name`
- `date_of_birth`
- `place_of_birth`
- `document_number`
- `expiry_date`
- `nationality`
- `sex`
- `street_address`
- `postal_code`
- `city`
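
Identity cards and residence permits share the same key list, so a single check can cover both. The sketch below reports which documented keys are absent from a decoded payload. It assumes every documented key is always present, possibly with a null value, which this README does not state; `IDENTITY_KEYS` and `missingKeys` are hypothetical names.

```php
<?php

// Keys documented for both `identity_card` and `residence_permit`.
const IDENTITY_KEYS = [
    'document_type', 'civility', 'first_name', 'last_name', 'date_of_birth',
    'place_of_birth', 'document_number', 'expiry_date', 'nationality', 'sex',
    'street_address', 'postal_code', 'city',
];

/** Hypothetical helper: list the documented keys absent from a decoded payload. */
function missingKeys(array $normalized, array $expected = IDENTITY_KEYS): array
{
    return array_values(array_diff($expected, array_keys($normalized)));
}
```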

### KBIS

Normalized keys:

- `document_type`
- `company_name`
- `trade_name`
- `legal_form`
- `capital`
- `registration_number`
- `siret`
- `sirene`
- `street_address`
- `postal_code`
- `city`
- `naf_code`
- `registration_date`
- `issue_date`
- `registry_city`
- `legal_representatives`

Representative shape:

- `entity_type`
- `company_name`
- `legal_form`
- `civility`
- `first_name`
- `last_name`
- `street_address`
- `postal_code`
- `city`
- `registration_number`
- `registry_city`
- `role`

### Acte Propriete

Important: this currently means French land-title deed only.

Normalized keys:

- `document_type`
- `cadastral_parcels`
- `owners`

Parcel shape:

- `prefixe`
- `section`
- `numero`
- `street_address`
- `postal_code`
- `city`

Owner shape:

- `entity_type`
- `company_name`
- `civility`
- `first_name`
- `last_name`

Rules:

- owners are acquirers only
- sellers must not be returned as owners
- municipalities and administrations are treated as `company`
- `lieudit` / `leudit` may be used as parcel `street_address`
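
Because owners can be either legal entities or people, consuming code typically branches on `entity_type`. The sketch below builds a display label from one owner entry. It assumes `company` is the value used for legal entities, as the rules above suggest, and treats any other value as a natural person; `ownerLabel` is a hypothetical helper.

```php
<?php

/**
 * Hypothetical helper: build a display label from one entry of `owners`.
 * Assumes `entity_type` is `company` for legal entities (including
 * municipalities and administrations) and anything else is a natural person.
 */
function ownerLabel(array $owner): string
{
    if (($owner['entity_type'] ?? null) === 'company') {
        return (string) ($owner['company_name'] ?? '');
    }

    $parts = [$owner['civility'] ?? null, $owner['first_name'] ?? null, $owner['last_name'] ?? null];

    // Drop empty parts so missing civility or names do not leave stray spaces.
    return implode(' ', array_filter($parts, fn ($part) => $part !== null && $part !== ''));
}
```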

Package Model
-------------

The package provides:

```
Ges\Ocr\Models\DocumentProcessing
```

This model stores:

- source file metadata
- detected input type
- business document type
- status
- raw classification JSON
- raw extraction JSON
- normalized JSON
- error message

If your app wants its own subclass, it can extend the package model.

AI Notes
--------

If you are an AI agent working in a project using this package:

- Use `DocumentProcessor::processFile(...)` as the main entry point.
- Treat `rawClassificationJson` as model output, not final truth.
- Treat `normalizedJson` as the stable application-facing payload.
- For images and scanned PDFs, the package uses two LLM stages:
    - vision transcription
    - text classification/extraction
- Exception: when `GES_OCR_AI_PROVIDER=openai`, the package uses a one-shot analysis request instead.
- Do not assume `acte_propriete` means generic property deed. In this package it currently means land-title deed only.
- Distinguish `identity_card` from `residence_permit`: use `residence_permit` for French residence permits and `identity_card` for French identity cards.
- For KBIS:
    - `registration_number` is the raw `Immatriculation RCS`
    - `sirene` is 9 digits
    - `siret` is optional and only if explicitly present
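
The KBIS identifier rules lend themselves to a quick sanity check on the normalized payload. A SIRET is 14 digits and begins with the 9-digit SIREN, so a present `siret` can be cross-checked against `sirene`. The sketch below assumes both values are stored as digit-only strings without spaces; the helper names are hypothetical.

```php
<?php

/** Hypothetical helper: true when the value is exactly 9 ASCII digits. */
function isSireneFormat(string $value): bool
{
    return preg_match('/^\d{9}\z/', $value) === 1;
}

/**
 * Hypothetical helper: when `siret` is present, it must be 14 digits and
 * start with the `sirene` value.
 */
function siretMatchesSirene(?string $siret, string $sirene): bool
{
    if ($siret === null || $siret === '') {
        return true; // `siret` is optional
    }

    return preg_match('/^\d{14}\z/', $siret) === 1 && str_starts_with($siret, $sirene);
}
```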

Tests
-----

Package tests live under:

```
tests/Unit
```

Manual OCR fixture tests exist for:

- CIN (French national identity card)
- titre de séjour (French residence permit)
- KBIS
- land-title deeds

They are gated by:

```
RUN_MANUAL_OCR_TESTS=1
```

Current Assumptions
-------------------

- documents are French documents
- the selected AI provider is reachable from the Laravel app
- `pdftotext` and `pdftoppm` are available for PDF handling

Non-Goals
---------

This package does not currently provide:

- user/document matching workflow
- approval workflow
- final accepted document persistence
- domain-specific queue orchestration
- UI components

Those belong in the consuming application.
