azaharizaman/nexus-data-processor
=================================

Nexus DataProcessor Package - Contracts for OCR, ETL, and document processing

Version v0.1.0-alpha1, MIT license, requires PHP ^8.3.

[Source](https://github.com/azaharizaman/nexus-data-processor) | [Packagist](https://packagist.org/packages/azaharizaman/nexus-data-processor)


Nexus\\DataProcessor
====================

Framework-agnostic contracts for OCR, ETL, and document processing capabilities.

Purpose
-------

The DataProcessor package provides **interface-only** contracts for specialized data processing tasks. It ships no concrete implementations: those depend on vendor SDKs and must be provided in the application layer (`apps/Atomy`).

Key Features
------------

- **OCR/Document Recognition**: Extract structured data from images and PDFs
- **Document Classification**: Identify document types (invoice, receipt, contract, ID)
- **Data Transformation**: Format conversion and normalization
- **Batch Processing**: High-volume document processing queues
- **Multi-Language Support**: Process documents in various languages
- **Confidence Scoring**: Validation thresholds for extracted data

Architecture
------------

### Contracts (Interfaces)

- `DocumentRecognizerInterface` - OCR service contract
- `DocumentParserInterface` - Structured data extraction
- `DocumentClassifierInterface` - Document type identification
- `DataTransformerInterface` - Data format conversions
- `DataValidatorInterface` - Extracted data validation
- `BatchProcessorInterface` - Bulk document processing

### Value Objects

- `ProcessingResult` - OCR output with confidence scores
- `DocumentMetadata` - Document properties (type, size, MIME)
- `ExtractionConfidence` - Confidence score (0-100%)
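
As an illustration of the 0-100% range noted for `ExtractionConfidence`, here is a minimal sketch of an immutable, self-validating value object. The class name `ExtractionConfidenceSketch` and the `meets()` method are hypothetical stand-ins; the package's actual value object may be shaped differently.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in; NOT the package's real ExtractionConfidence class.
final readonly class ExtractionConfidenceSketch
{
    public function __construct(public float $percent)
    {
        // Reject NaN and anything outside the documented 0-100 range.
        if (is_nan($percent) || $percent < 0.0 || $percent > 100.0) {
            throw new InvalidArgumentException(
                'Confidence must be a percentage between 0 and 100.'
            );
        }
    }

    // True when the score is at or above the given threshold.
    public function meets(float $threshold): bool
    {
        return $this->percent >= $threshold;
    }
}

$confidence = new ExtractionConfidenceSketch(72.5);
var_dump($confidence->meets(80.0)); // bool(false)
```

Validating in the constructor means an out-of-range score can never reach downstream threshold checks.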

### No Concrete Implementations

This package provides ONLY contracts. Vendor-specific implementations (Azure Cognitive Services, AWS Textract, Google Vision API) must be created in the application layer.

Supported Vendors (Application Layer)
-------------------------------------

Recommended OCR vendors for implementation in `apps/Atomy`:

- **Azure Cognitive Services** - Form Recognizer, OCR
- **AWS Textract** - Document analysis, forms, tables
- **Google Cloud Vision API** - OCR, label detection
- **Tesseract OCR** - Open-source (lower accuracy)
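
Because every vendor adapter sits behind the same contract, the application layer can chain several of them. The sketch below shows one possible fallback strategy: try each recognizer in order, return the first result that meets a confidence threshold, and otherwise return the best result seen. `ResultSketch`, `RecognizerSketch`, `FallbackRecognizer`, and `FixedRecognizer` are all hypothetical local stand-ins so the example runs without the package installed; only the `recognizeDocument(string, string)` shape is taken from the examples in this README.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-ins for ProcessingResult and DocumentRecognizerInterface.
final readonly class ResultSketch
{
    public function __construct(public array $extractedData, public float $confidence) {}
}

interface RecognizerSketch
{
    public function recognizeDocument(string $filePath, string $documentType): ResultSketch;
}

final class FallbackRecognizer implements RecognizerSketch
{
    /** @param list<RecognizerSketch> $recognizers tried in the order given */
    public function __construct(
        private readonly array $recognizers,
        private readonly float $minConfidence = 80.0,
    ) {}

    public function recognizeDocument(string $filePath, string $documentType): ResultSketch
    {
        $best = null;
        foreach ($this->recognizers as $recognizer) {
            try {
                $result = $recognizer->recognizeDocument($filePath, $documentType);
            } catch (Throwable) {
                continue; // this vendor failed; move on to the next one
            }
            if ($result->confidence >= $this->minConfidence) {
                return $result; // good enough, stop here
            }
            if ($best === null || $result->confidence > $best->confidence) {
                $best = $result; // remember the strongest weak result
            }
        }

        return $best ?? throw new RuntimeException('All recognizers failed.');
    }
}

// Canned recognizer used only to demonstrate the chain.
final readonly class FixedRecognizer implements RecognizerSketch
{
    public function __construct(private ResultSketch $result) {}

    public function recognizeDocument(string $filePath, string $documentType): ResultSketch
    {
        return $this->result;
    }
}

$chain = new FallbackRecognizer([
    new FixedRecognizer(new ResultSketch(['vendor' => 'primary'], 55.0)),
    new FixedRecognizer(new ResultSketch(['vendor' => 'secondary'], 91.0)),
]);
$result = $chain->recognizeDocument('/tmp/invoice.pdf', 'invoice');
echo $result->extractedData['vendor'], PHP_EOL; // prints "secondary"
```

The chain implements the same interface as a single recognizer, so consumers do not need to know how many vendors sit behind it.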

Usage Example
-------------

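```
// In the application layer (Atomy), inject DocumentRecognizerInterface.
use Nexus\DataProcessor\Contracts\DocumentRecognizerInterface;

// Illustrative consumer class; the name is not part of the package.
final class InvoiceOcrService
{
    public function __construct(
        private readonly DocumentRecognizerInterface $ocr
    ) {}

    public function processInvoice(string $filePath): array
    {
        $result = $this->ocr->recognizeDocument($filePath, 'invoice');

        // Confidence is a percentage (0-100); below 80, queue for manual review.
        if ($result->getConfidence() < 80) {
            $this->queueForReview($result);
        }

        return $result->getExtractedData();
    }

    // Application-specific: push the result onto your own review queue.
    private function queueForReview(object $result): void
    {
        // ...
    }
}
```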
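
Since the consumer depends only on a contract, the review threshold can be exercised without any vendor SDK or network access. The following is a minimal, runnable sketch of that idea. `StubResult`, `StubRecognizerContract`, and `InvoiceIntake` are hypothetical stand-ins defined locally so the example is self-contained; they mirror the method names used above but are not the package's classes.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the package's ProcessingResult.
final readonly class StubResult
{
    public function __construct(private array $data, private float $confidence) {}

    public function getConfidence(): float { return $this->confidence; }
    public function getExtractedData(): array { return $this->data; }
}

// Hypothetical stand-in for DocumentRecognizerInterface.
interface StubRecognizerContract
{
    public function recognizeDocument(string $filePath, string $documentType): StubResult;
}

final class InvoiceIntake
{
    /** @var list<StubResult> results held back for manual review */
    public array $reviewQueue = [];

    public function __construct(
        private readonly StubRecognizerContract $ocr,
        private readonly float $reviewThreshold = 80.0,
    ) {}

    public function processInvoice(string $filePath): array
    {
        $result = $this->ocr->recognizeDocument($filePath, 'invoice');

        // Low-confidence results are queued for a human, but still returned.
        if ($result->getConfidence() < $this->reviewThreshold) {
            $this->reviewQueue[] = $result;
        }

        return $result->getExtractedData();
    }
}

// A canned recognizer that always reports 64% confidence.
$stub = new class implements StubRecognizerContract {
    public function recognizeDocument(string $filePath, string $documentType): StubResult
    {
        return new StubResult(['total' => '125.00'], 64.0);
    }
};

$intake = new InvoiceIntake($stub);
$data = $intake->processInvoice('/tmp/invoice.pdf');
echo count($intake->reviewQueue), PHP_EOL; // prints 1
```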

Integration
-----------

This package is consumed by:

- `Nexus\Payable` - for vendor bill OCR processing
- `Nexus\Receivable` - for customer document processing
- `Nexus\Hrm` - for employee document verification
- `Nexus\Procurement` - for PO/GR document scanning

This package integrates with:

- `Nexus\Storage` - for document archiving (REQUIRED)
- `Nexus\AuditLogger` - for processing audit trails (REQUIRED)
- `Nexus\Notifier` - for processing completion notifications (REQUIRED)

Performance Requirements
------------------------

- OCR processing: < 10s per document (single page) via async queue
- Batch processing: 100 documents per hour minimum
- Image preprocessing: < 2s per document
- Document classification: < 1s per document
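
The figures above can be cross-checked with some quick arithmetic. This sketch assumes single-page documents, the three stages running one after another at their latency ceilings, and no queue wait time; none of those assumptions come from the package itself.

```php
<?php
declare(strict_types=1);

// Latency ceilings from the list above, in seconds.
$stageCeilingsSeconds = [
    'preprocessing'  => 2,
    'ocr'            => 10,
    'classification' => 1,
];

// Worst-case time for one document if the stages run serially.
$perDocumentSeconds = array_sum($stageCeilingsSeconds);

// Documents a single sequential worker can finish in one hour.
$docsPerHourPerWorker = intdiv(3600, $perDocumentSeconds);

// Workers needed to sustain the 100 documents/hour minimum.
$requiredPerHour = 100;
$workersNeeded = (int) ceil($requiredPerHour / $docsPerHourPerWorker);

printf(
    "%d s per document, %d docs/hour per worker, %d worker(s) needed\n",
    $perDocumentSeconds,
    $docsPerHourPerWorker,
    $workersNeeded
);
// prints "13 s per document, 276 docs/hour per worker, 1 worker(s) needed"
```

Under these assumptions a single worker that meets the per-stage limits already clears the batch minimum, so the batch figure is a floor rather than the binding constraint.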

Small to Enterprise Scale
-------------------------

- **Small business**: Basic OCR for common document types (< 100 docs/month)
- **Medium business**: Advanced OCR with field mapping and validation (100-1000 docs/month)
- **Large enterprise**: ML-powered OCR with continuous learning (1000+ docs/day)

Vendor Implementation Example
-----------------------------

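```
// In apps/Atomy/app/Services/AzureOcrAdapter.php
namespace App\Services;

use Nexus\DataProcessor\Contracts\DocumentRecognizerInterface;
// Assumed namespace for the value object; check the package source.
use Nexus\DataProcessor\ValueObjects\ProcessingResult;
// Illustrative client class; substitute the SDK or HTTP wrapper your application uses.
use Azure\AI\FormRecognizer\FormRecognizerClient;

final class AzureOcrAdapter implements DocumentRecognizerInterface
{
    public function __construct(
        private readonly FormRecognizerClient $client
    ) {}

    public function recognizeDocument(string $filePath, string $documentType): ProcessingResult
    {
        // Azure-specific call. The method name implies a URL, so $filePath
        // should be a location the service can reach.
        $response = $this->client->beginRecognizeCustomFormsFromUrl($filePath);

        // Transform the Azure response into the package's ProcessingResult.
        return new ProcessingResult(
            extractedData: $this->transformAzureData($response),
            confidence: $this->calculateConfidence($response)
        );
    }

    // transformAzureData() and calculateConfidence() are private helpers that
    // map the vendor response; they are omitted here.
}
```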

Security Considerations
-----------------------

- Encrypt documents at rest and in transit
- Sanitize extracted data to prevent injection attacks
- Support GDPR compliance with document retention and deletion policies
- Log all document access and processing events
- Implement rate limiting for OCR API usage
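
For the point about sanitizing extracted data, the following sketch shows one possible approach for single-line text fields: strip control characters that OCR noise can introduce, then escape at the point of output. The helper names `sanitizeExtractedField()` and `escapeForHtml()` are hypothetical and not part of this package. Escaping alone does not cover SQL; database writes should still go through parameterized queries.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: normalise a single-line OCR field before storing it.
function sanitizeExtractedField(string $value): string
{
    // Remove ASCII control characters (NUL, tabs, newlines, DEL).
    // preg_replace() returns null on invalid UTF-8, so fall back to ''.
    $clean = preg_replace('/[\x00-\x1F\x7F]/u', '', $value) ?? '';

    return trim($clean);
}

// Hypothetical helper: escape a stored value when rendering it as HTML.
function escapeForHtml(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$raw = "  ACME\x00 Ltd\n ";
echo sanitizeExtractedField($raw), PHP_EOL; // prints "ACME Ltd"
```

Keeping storage-time cleanup separate from output-time escaping lets the same stored value be rendered safely in different contexts.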

---

Documentation
-------------

### Quick Start

- **[Getting Started Guide](docs/getting-started.md)** - Installation, prerequisites, and your first OCR integration
- **[API Reference](docs/api-reference.md)** - Complete interface, value object, and exception documentation
- **[Integration Guide](docs/integration-guide.md)** - Laravel and Symfony integration with vendor adapter examples

### Code Examples

- **[Basic Usage](docs/examples/basic-usage.php)** - Simple OCR processing with confidence validation
- **[Advanced Usage](docs/examples/advanced-usage.php)** - Multi-vendor fallback, batch processing, custom validation

### Package Metadata

- **[Requirements](REQUIREMENTS.md)** - Detailed package requirements (24 requirements, 87.5% complete)
- **[Implementation Summary](IMPLEMENTATION_SUMMARY.md)** - Development progress, metrics, and design decisions
- **[Test Suite Summary](TEST_SUITE_SUMMARY.md)** - Testing strategy for contract-only packages
- **[Valuation Matrix](VALUATION_MATRIX.md)** - Package value assessment ($475,000 estimated value)

### Architecture

- **Package Type:** Pure contract package (interface-only)
- **Lines of Code:** 196 lines across 5 files
- **Dependencies:** Zero external dependencies (PHP 8.3+ only)
- **Framework:** Framework-agnostic

---

License
-------

MIT License - see [LICENSE](LICENSE) file for details.

Maintainer: Azahari Zaman ([@azaharizaman](https://github.com/azaharizaman))

