daniel-jorg-schuppelius/php-pdf-toolkit
=======================================

PHP 8.2+ library for PDF text extraction with automatic reader selection. Supports embedded text and scanned documents via OCR.

v0.13.4 · AGPL-3.0-or-later · Composer PHP constraint: >=8.1 <8.6

[Source](https://github.com/Daniel-Jorg-Schuppelius/php-pdf-toolkit) · [Packagist](https://packagist.org/packages/daniel-jorg-schuppelius/php-pdf-toolkit)

PHP PDF Toolkit
===============

A PHP 8.2+ library for extracting text from PDF documents and creating PDFs with intelligent reader/writer selection.

Features
--------

### PDF Text Extraction (Readers)

- **Multiple PDF Readers** with automatic fallback:

    - `pdftotext` (poppler-utils) - Fast extraction for text-based PDFs
    - `PDFBox` (Apache, Java) - Better handling of complex layouts
    - `Tesseract` - OCR for scanned documents
    - `OCRmyPDF` - High-quality OCR with preprocessing
- **Automatic Reader Selection** - Tries text extraction first, falls back to OCR if needed
- **Caching** - Extracted text is cached to avoid redundant processing
- **Language Support** - Configurable OCR languages (German + English by default)
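
The caching behaviour listed above could be sketched roughly as follows. This is not the toolkit's actual implementation: `TextCache`, its methods, and the choice of a content hash plus options as the cache key are assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical in-memory cache for extracted text (not part of the toolkit's API).
 * Entries are keyed by a hash of the file contents plus the extraction options,
 * so a changed file or a different OCR language counts as a cache miss.
 */
final class TextCache
{
    /** @var array<string, string> */
    private array $entries = [];

    public function key(string $path, array $options = []): string
    {
        ksort($options); // make the key independent of option order
        return hash_file('sha256', $path) . ':' . md5((string) json_encode($options));
    }

    /** Returns the cached text, or runs $extract once and stores its result. */
    public function remember(string $path, array $options, callable $extract): string
    {
        $key = $this->key($path, $options);
        if (!array_key_exists($key, $this->entries)) {
            $this->entries[$key] = $extract($path, $options);
        }
        return $this->entries[$key];
    }
}

// Usage with a stand-in extractor that counts how often it runs.
$file = tempnam(sys_get_temp_dir(), 'pdf');
file_put_contents($file, 'dummy pdf bytes');

$calls = 0;
$extract = function (string $path, array $options) use (&$calls): string {
    $calls++;
    return 'extracted text';
};

$cache = new TextCache();
$first = $cache->remember($file, ['language' => 'deu+eng'], $extract);
$second = $cache->remember($file, ['language' => 'deu+eng'], $extract);
echo "Extractor calls: {$calls}\n"; // the second lookup is served from the cache
unlink($file);
```

Keying on the file contents rather than the path means a file that is overwritten in place is re-extracted, at the cost of hashing the file on every lookup.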

### PDF Creation (Writers)

- **Multiple PDF Writers** with automatic fallback:

    - `Dompdf` - HTML to PDF conversion (pure PHP, LGPL)
    - `TCPDF` - Programmatic PDF creation (pure PHP, LGPL)
    - `wkhtmltopdf` - High-quality HTML rendering via WebKit (external tool)
- **Automatic Writer Selection** - Uses the first available writer by priority
- **Multiple Input Formats** - HTML, plain text, or HTML files
- **Metadata Support** - Title, author, subject for generated PDFs

Requirements
------------

- PHP 8.2+

### For Text Extraction (at least one)

- `pdftotext` (`apt install poppler-utils`)
- `tesseract-ocr` (`apt install tesseract-ocr tesseract-ocr-deu`)
- `ocrmypdf` (`apt install ocrmypdf`)
- Java + PDFBox JAR (optional)

### For PDF Creation (at least one)

- `dompdf/dompdf` (`composer require dompdf/dompdf`)
- `tecnickcom/tcpdf` (`composer require tecnickcom/tcpdf`)
- `wkhtmltopdf` (`apt install wkhtmltopdf`)

Installation
------------

### Via Composer

```
composer require daniel-jorg-schuppelius/php-pdf-toolkit
```

### Clone with Submodules

```
git clone --recurse-submodules https://github.com/Daniel-Jorg-Schuppelius/php-pdf-toolkit.git
```

Or if already cloned:

```
git submodule update --init
```

### Install System Dependencies

Use the included install script for system dependencies:

```
# Install PDF extraction tools (poppler-utils, tesseract, ocrmypdf)
sudo ./installscript/install-dependencies.sh
```

### Install PHP Libraries for PDF Creation

```
# Dompdf (recommended, pure PHP)
composer require dompdf/dompdf

# Or TCPDF (alternative, pure PHP)
composer require tecnickcom/tcpdf

# Or wkhtmltopdf (external tool, best quality)
sudo apt install wkhtmltopdf
```

Usage
-----

### Text Extraction

```
use PDFToolkit\Registries\PDFReaderRegistry;

$registry = PDFReaderRegistry::getInstance();
$document = $registry->extractText('/path/to/file.pdf', [
    'language' => 'deu+eng'
]);

if ($document->hasText()) {
    echo $document->text;
    echo "Reader: " . $document->reader;
    echo "Scanned: " . ($document->isScanned ? 'Yes' : 'No');
}

// Without OCR fallback (faster for text-based PDFs such as bank statements)
$document = $registry->extractTextOnly('/path/to/bankstatement.pdf', [
    'layout' => false  // No layout formatting, which makes regex extraction easier
]);
```
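
Extracting with `'layout' => false` is aimed at regex-based post-processing. Below is a minimal sketch of that step, using a hard-coded sample string in place of `$document->text`. The statement format, the `parseTransactions` helper, and the pattern are all made up for illustration and would need adapting to real statements.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: pulls "date description amount" lines out of plain text.
 * Assumes German-style dates (dd.mm.yyyy) and amounts (1.234,56); this is an
 * assumption about the statement format, not something the toolkit guarantees.
 *
 * @return list<array{date: string, description: string, amount: float}>
 */
function parseTransactions(string $text): array
{
    $pattern = '/^(\d{2}\.\d{2}\.\d{4})\s+(.+?)\s+(-?\d{1,3}(?:\.\d{3})*,\d{2})$/m';
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    $rows = [];
    foreach ($matches as $m) {
        // Convert "1.234,56" to 1234.56: drop thousands dots, then swap the decimal comma.
        $amount = (float) str_replace(['.', ','], ['', '.'], $m[3]);
        $rows[] = ['date' => $m[1], 'description' => $m[2], 'amount' => $amount];
    }
    return $rows;
}

// Stand-in for $document->text
$text = "Kontoauszug 01/2025\n"
      . "02.01.2025 Miete Januar -850,00\n"
      . "15.01.2025 Gehalt 2.500,00\n";

foreach (parseTransactions($text) as $row) {
    printf("%s | %s | %.2f\n", $row['date'], $row['description'], $row['amount']);
}
```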

### PDF Creation

```
use PDFToolkit\Registries\PDFWriterRegistry;
use PDFToolkit\Entities\PDFContent;

$registry = PDFWriterRegistry::getInstance();

// Simple: HTML to PDF
$registry->htmlToPdf('<h1>Hello World</h1><p>Content</p>', '/path/to/output.pdf');

// Simple: Text to PDF
$registry->textToPdf('Plain text content', '/path/to/output.pdf');

// Advanced: With metadata and options
$html = '<h1>My Document</h1><p>Example content</p>';  // any HTML string
$content = PDFContent::fromHtml($html, [
    'title' => 'My Document',
    'author' => 'John Doe',
    'subject' => 'Example PDF'
]);

$registry->createPdf($content, '/path/to/output.pdf', [
    'paper_size' => 'A4',
    'orientation' => 'portrait',
    'margins' => ['top' => 15, 'bottom' => 15, 'left' => 15, 'right' => 15]
]);

// Use specific writer
$registry->createPdf($content, '/path/to/output.pdf', [], 'dompdf');

// Get PDF as string (for download/streaming)
$pdfString = $registry->createPdfString($content);
header('Content-Type: application/pdf');
echo $pdfString;
```
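
When streaming the string returned by `createPdfString()`, a browser usually also needs a file name and a content length. The sketch below shows one way to build those headers. `pdfDownloadHeaders` is a hypothetical helper, not part of the toolkit, and the `%PDF-` string only stands in for real output.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: builds HTTP headers for sending a PDF held in memory.
 *
 * @return list<string>
 */
function pdfDownloadHeaders(string $pdfString, string $filename, bool $inline = false): array
{
    // Replace characters that could break the header (spaces, quotes, control chars).
    $safeName = (string) preg_replace('/[^A-Za-z0-9._-]/', '_', basename($filename));
    $disposition = $inline ? 'inline' : 'attachment';

    return [
        'Content-Type: application/pdf',
        sprintf('Content-Disposition: %s; filename="%s"', $disposition, $safeName),
        'Content-Length: ' . strlen($pdfString), // strlen() counts bytes, as the header requires
    ];
}

$pdfString = "%PDF-1.4\n% stand-in for createPdfString() output\n";
$headers = pdfDownloadHeaders($pdfString, 'invoice 2025.pdf');

if (PHP_SAPI !== 'cli' && !headers_sent()) {
    foreach ($headers as $line) {
        header($line);
    }
    echo $pdfString;
} else {
    // On the command line, just show what would be sent.
    echo implode("\n", $headers), "\n";
}
```

Passing `true` as the third argument switches to `inline`, which asks the browser to display the PDF instead of saving it.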

### Check Available Tools

```
use PDFToolkit\Registries\PDFReaderRegistry;
use PDFToolkit\Registries\PDFWriterRegistry;

// Readers
$readerRegistry = PDFReaderRegistry::getInstance();
foreach ($readerRegistry->getReaderInfo() as $info) {
    echo "{$info['name']}: " . ($info['available'] ? '✓' : '✗') . "\n";
}

// Writers
$writerRegistry = PDFWriterRegistry::getInstance();
foreach ($writerRegistry->getWriterInfo() as $info) {
    echo "{$info['name']}: " . ($info['available'] ? '✓' : '✗') . "\n";
}
```

Configuration
-------------

Tool paths can be configured in `config/executables.json`:

```
{
    "shellExecutables": {
        "pdftotext": {
            "path": "/usr/bin/pdftotext",
            "required": true
        },
        "wkhtmltopdf": {
            "path": "/usr/bin/wkhtmltopdf",
            "required": false
        }
    }
}
```
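
How the toolkit itself reads this file is not shown here. The following sketch only illustrates how a file of this shape could be loaded and checked. `checkExecutables` is a hypothetical helper; the JSON keys are the ones from the example above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: reports, per configured tool, whether its path is executable.
 * Throws if a tool marked "required" is not available.
 *
 * @return array<string, bool> tool name => available
 */
function checkExecutables(string $json): array
{
    $config = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    $status = [];

    foreach ($config['shellExecutables'] ?? [] as $name => $tool) {
        $path = (string) ($tool['path'] ?? '');
        $available = $path !== '' && is_executable($path);
        if (!$available && ($tool['required'] ?? false)) {
            throw new RuntimeException("Required tool '{$name}' not found at '{$path}'");
        }
        $status[$name] = $available;
    }
    return $status;
}

// PHP_BINARY serves as a path that is known to exist on the machine running this sketch.
$json = (string) json_encode([
    'shellExecutables' => [
        'php'         => ['path' => PHP_BINARY, 'required' => true],
        'wkhtmltopdf' => ['path' => '/nonexistent/wkhtmltopdf', 'required' => false],
    ],
]);

foreach (checkExecutables($json) as $name => $ok) {
    echo $name, ': ', $ok ? 'available' : 'missing', "\n";
}
```

In practice the JSON string would come from `file_get_contents('config/executables.json')`.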

Architecture
------------

```
PDFReaderRegistry → [Readers by Priority] → PDFDocument
                          ↓
              PDFToTextReader (10)     # Fast, for text PDFs
              PDFBoxReader (30)        # Complex layouts
              TesseractReader (50)     # OCR for scans
              OcrMyPDFReader (60)      # Best OCR quality

PDFWriterRegistry → [Writers by Priority] → PDF File
                          ↓
              DompdfWriter (10)        # HTML→PDF, pure PHP
              TcpdfWriter (20)         # Programmatic, pure PHP
              WkhtmltopdfWriter (30)   # Best HTML rendering

```
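
The priority-ordered fallback in the diagram can be approximated as follows. The names `FallbackRegistry`, `register`, and `run` are invented for this sketch and do not mirror the toolkit's real interfaces. Lower numbers run first, as in the diagram.

```php
<?php
declare(strict_types=1);

/** Hypothetical registry: tries handlers in ascending priority until one yields text. */
final class FallbackRegistry
{
    /** @var list<array{priority: int, name: string, handler: callable}> */
    private array $handlers = [];

    public function register(string $name, int $priority, callable $handler): void
    {
        $this->handlers[] = ['priority' => $priority, 'name' => $name, 'handler' => $handler];
        // Keep the list sorted so run() can simply iterate.
        usort($this->handlers, fn(array $a, array $b): int => $a['priority'] <=> $b['priority']);
    }

    /** @return array{reader: string, text: string}|null null if no handler produced text */
    public function run(string $path): ?array
    {
        foreach ($this->handlers as $entry) {
            $text = ($entry['handler'])($path);
            if (is_string($text) && trim($text) !== '') {
                return ['reader' => $entry['name'], 'text' => $text];
            }
        }
        return null;
    }
}

$registry = new FallbackRegistry();
// Registered out of order on purpose; the stand-in closures replace real tools.
$registry->register('tesseract', 50, fn(string $p): string => 'text recovered by OCR');
$registry->register('pdftotext', 10, fn(string $p): string => ''); // scanned PDF: no embedded text

$result = $registry->run('/path/to/scan.pdf');
// pdftotext is tried first, returns nothing, and the call falls through to tesseract.
echo $result['reader'], ': ', $result['text'], "\n";
```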

License
-------

AGPL-3.0-or-later - see [LICENSE](LICENSE) file.
