ges/ocr
=======

Core document processing services for OCR, classification, extraction, and normalization.

Version 0.1.0 · MIT license · PHP ^8.4

[Source](https://github.com/TechGES/ocr) · [Packagist](https://packagist.org/packages/ges/ocr)

Laravel package for document OCR, classification, extraction, and normalization.

This package is built for French business and identity documents, with current support for:

- `identity_card`
- `residence_permit`
- `passport`
- `visa`
- `crew_card`
- `travel_document`
- `other_identity_document`
- `kbis`
- `acte_propriete` (land-title deed only)

What This Package Does
----------------------

Input pipeline:

- detect technical input type: `image`, `pdf_text`, `pdf_scan`
- transcribe images and scanned PDFs
- classify the business document type
- extract structured data
- normalize values into a stable shape
- return a `ProcessedDocumentResult`
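
The first step of the pipeline can be sketched as follows. This is a minimal illustration, not the package's implementation: `detectInputType` is a hypothetical helper, and the rule that a PDF without an extractable text layer counts as a scan is an assumption. The embedded text is assumed to have been extracted beforehand (for example with `pdftotext`).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: map a MIME type plus any embedded PDF text
 * to one of the three technical input types listed above.
 */
function detectInputType(string $mimeType, string $embeddedText = ''): string
{
    if (str_starts_with($mimeType, 'image/')) {
        return 'image';
    }

    if ($mimeType === 'application/pdf') {
        // Assumption: a PDF with no extractable text layer is a scan.
        return trim($embeddedText) !== '' ? 'pdf_text' : 'pdf_scan';
    }

    throw new InvalidArgumentException("Unsupported MIME type: {$mimeType}");
}
```

Under this sketch, only `image` and `pdf_scan` inputs would need the visual transcription step; `pdf_text` inputs already carry usable text.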

Current model strategy:

- `qwen2.5vl:7b` for visual transcription only
- `qwen2.5:7b` for classification and structured extraction

Package Boundaries
------------------

This package contains:

- OCR/transcription services
- classifier
- extractor
- normalizer
- schema factory
- Ollama client
- package `DocumentProcessing` model
- package migration and factory
- install command

This package does not own your application workflow.

Typical app-specific code stays outside:

- your application's own accepted `Document` model
- upload flow
- matching an identity document against a user
- deciding whether to persist a final document
- queue jobs tied to your app domain

Install
-------

```
composer require ges/ocr
```

Then install package assets:

```
php artisan ocr:install
```

Or install and migrate immediately:

```
php artisan ocr:install --migrate
```

Optional install flags:

```
php artisan ocr:install --check
php artisan ocr:install --no-config
php artisan ocr:install --no-migrations
php artisan ocr:install --force
```

What this command does:

- publishes `config/ges-ocr.php`
- publishes package migrations
- optionally runs `php artisan migrate`
- optionally runs `php artisan ocr:health`

Health check command:

```
php artisan ocr:health
```

It checks:

- `pdftotext`
- `pdftoppm`
- Ollama connectivity
- configured Ollama models
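
To illustrate the last check, here is a hedged sketch of comparing the configured models against what an Ollama server reports. It assumes the decoded JSON body of Ollama's `GET /api/tags` endpoint, which lists installed models under a `models` array with a `name` field. `missingOllamaModels` is a hypothetical helper and the sample response is made up; how `ocr:health` performs this check internally is not documented here.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: given the decoded JSON body of Ollama's
 * GET /api/tags endpoint, return the required models that are not installed.
 *
 * @param array<string, mixed> $tagsResponse
 * @param list<string> $requiredModels
 * @return list<string>
 */
function missingOllamaModels(array $tagsResponse, array $requiredModels): array
{
    // Collect the "name" field of every installed model.
    $installed = array_column($tagsResponse['models'] ?? [], 'name');

    // array_diff keeps original keys, so reindex the result.
    return array_values(array_diff($requiredModels, $installed));
}

// Made-up sample response for illustration only.
$response = json_decode(
    '{"models":[{"name":"qwen2.5:7b"},{"name":"llama3:8b"}]}',
    true,
    flags: JSON_THROW_ON_ERROR
);

$missing = missingOllamaModels($response, ['qwen2.5:7b', 'qwen2.5vl:7b']);
// $missing === ['qwen2.5vl:7b']
```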

Configuration
-------------

Published config file:

```
config/ges-ocr.php
```

Main environment variables:

```
OLLAMA_BASE_URL=http://host.docker.internal:11434
OLLAMA_TEXT_MODEL=qwen2.5:7b
OLLAMA_VISION_MODEL=qwen2.5vl:7b
OLLAMA_CONNECT_TIMEOUT=10
OLLAMA_TIMEOUT=120
OLLAMA_RETRY_TIMES=2
OLLAMA_RETRY_SLEEP_MS=500
OLLAMA_CLASSIFICATION_CONFIDENCE_THRESHOLD=0.75
OLLAMA_MAX_PAGES=0
GES_OCR_MRZ_OCR_ENABLED=true
GES_OCR_CLEANUP_TEMPORARY_FILES=true
```

`OLLAMA_MAX_PAGES=0` means unlimited pages.
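
The "zero means unlimited" convention can be expressed as a small guard. This is a sketch under assumptions: `pagesToProcess` is a hypothetical helper, and treating negative values like `0` is a choice made here, not documented package behavior.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: how many pages to process given the configured cap.
 * A cap of 0 means "no limit", matching the OLLAMA_MAX_PAGES convention.
 * Assumption: negative caps are treated the same as 0.
 */
function pagesToProcess(int $totalPages, int $maxPages): int
{
    if ($maxPages <= 0) {
        return $totalPages;
    }

    return min($totalPages, $maxPages);
}
```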

Main config areas:

- `ollama`
- `mrz`
- `processing`

Public API
----------

Main service:

```
use Ges\Ocr\DocumentProcessor;

$result = app(DocumentProcessor::class)->processFile(
    path: $absolutePath,
    mimeType: $mimeType,
    originalName: $originalName,
);
```

Returned DTO (`ProcessedDocumentResult`) properties:

- `originalName`
- `mimeType`
- `path`
- `inputType`
- `documentType`
- `status`
- `pagesCount`
- `rawClassificationJson`
- `rawExtractionJson`
- `normalizedJson`
- `errorMessage`

Main statuses:

- `pending`
- `processing`
- `done`
- `failed`
- `needs_review`
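
Because the package leaves workflow decisions to the consuming application, the app typically branches on the status. The sketch below is one possible shape for that branching; `nextActionForStatus` and the action names are hypothetical and not part of the package.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical app-side routing based on the result status.
 * The returned action names are illustrative only.
 */
function nextActionForStatus(string $status): string
{
    return match ($status) {
        'done' => 'use_normalized_payload',
        'needs_review' => 'send_to_manual_review',
        'failed' => 'report_error',
        'pending', 'processing' => 'wait',
        // Fail loudly on a status this sketch does not know about.
        default => throw new UnexpectedValueException("Unknown status: {$status}"),
    };
}
```

Using `match` with a throwing `default` arm means a status added later surfaces as an error instead of being silently ignored.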

Supported Output Shapes
-----------------------

### Identity Card

Normalized keys:

- `document_type`
- `civility`
- `first_name`
- `last_name`
- `date_of_birth`
- `place_of_birth`
- `document_number`
- `expiry_date`
- `nationality`
- `sex`
- `street_address`
- `postal_code`
- `city`

### Residence Permit

Normalized keys:

- `document_type`
- `civility`
- `first_name`
- `last_name`
- `date_of_birth`
- `place_of_birth`
- `document_number`
- `expiry_date`
- `nationality`
- `sex`
- `street_address`
- `postal_code`
- `city`

### KBIS

Normalized keys:

- `document_type`
- `company_name`
- `trade_name`
- `legal_form`
- `capital`
- `registration_number`
- `siret`
- `sirene`
- `street_address`
- `postal_code`
- `city`
- `naf_code`
- `registration_date`
- `registry_city`
- `legal_representatives`

Representative shape:

- `entity_type`
- `company_name`
- `legal_form`
- `civility`
- `first_name`
- `last_name`
- `street_address`
- `postal_code`
- `city`
- `registration_number`
- `registry_city`
- `role`

### Acte Propriete

Important: `acte_propriete` currently covers French land-title deeds only.

Normalized keys:

- `document_type`
- `cadastral_parcels`
- `owners`

Parcel shape:

- `prefixe`
- `section`
- `numero`
- `street_address`
- `postal_code`
- `city`

Owner shape:

- `entity_type`
- `company_name`
- `civility`
- `first_name`
- `last_name`

Rules:

- owners are acquirers only
- sellers must not be returned as owners
- municipalities and administrations are treated as `company`
- `lieudit` / `leudit` may be used as parcel `street_address`

Package Model
-------------

The package provides:

```
Ges\Ocr\Models\DocumentProcessing
```

This model stores:

- source file metadata
- detected input type
- business document type
- status
- raw classification JSON
- raw extraction JSON
- normalized JSON
- error message

If your app wants its own subclass, it can extend the package model.

AI Notes
--------

If you are an AI agent working in a project using this package:

- Use `DocumentProcessor::processFile(...)` as the main entry point.
- Treat `rawClassificationJson` as model output, not final truth.
- Treat `normalizedJson` as the stable application-facing payload.
- For images and scanned PDFs, the package uses two LLM stages:
    - vision transcription
    - text classification/extraction
- Do not assume `acte_propriete` means generic property deed. In this package it currently means land-title deed only.
- Distinguish `identity_card` from `residence_permit`: use `residence_permit` for French residence permits and `identity_card` for French identity cards.
- For KBIS:
    - `registration_number` is the raw `Immatriculation RCS`
    - `sirene` is the 9-digit SIREN number
    - `siret` is optional and is set only when explicitly present in the document

Tests
-----

Package tests live under:

```
tests/Unit
```

Manual OCR fixture tests exist for:

- CIN (French national identity card)
- titre de séjour (French residence permit)
- KBIS
- land-title deeds

They are gated by:

```
RUN_MANUAL_OCR_TESTS=1
```
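
A gate of this kind can be as small as the check below. It is only a sketch: `manualOcrTestsEnabled` is a hypothetical helper, and the package's test suite may implement the gate differently.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: true only when RUN_MANUAL_OCR_TESTS is set to "1".
 * getenv() returns false for an unset variable, so the strict comparison
 * also covers the "not set" case.
 */
function manualOcrTestsEnabled(): bool
{
    return getenv('RUN_MANUAL_OCR_TESTS') === '1';
}
```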

Current Assumptions
-------------------

- documents are French documents
- Ollama is reachable from the Laravel app
- `pdftotext` and `pdftoppm` are available for PDF handling

Non-Goals
---------

This package does not currently provide:

- user/document matching workflow
- approval workflow
- final accepted document persistence
- domain-specific queue orchestration
- UI components

Those belong in the consuming application.
