daniel-jorg-schuppelius/php-pdf-toolkit
=======================================

PHP 8.2+ library for PDF text extraction with automatic reader selection. Supports embedded text and scanned documents via OCR.

v0.8.0.1 (released 1 month ago) | License: AGPL-3.0-or-later | PHP `>=8.1 <8.6` | CI passing

Since Jan 22 | Pushed 1 month ago

[Source](https://github.com/Daniel-Jorg-Schuppelius/php-pdf-toolkit) | [Packagist](https://packagist.org/packages/daniel-jorg-schuppelius/php-pdf-toolkit)

PHP PDF Toolkit
===============

A PHP 8.2+ library for extracting text from PDF documents and creating PDFs, with automatic, priority-based reader/writer selection.

Features
--------

### PDF Text Extraction (Readers)

- **Multiple PDF Readers** with automatic fallback:

    - `pdftotext` (poppler-utils) - Fast extraction for text-based PDFs
    - `PDFBox` (Apache, Java) - Better handling of complex layouts
    - `Tesseract` - OCR for scanned documents
    - `OCRmyPDF` - High-quality OCR with preprocessing
- **Automatic Reader Selection** - Tries text extraction first, falls back to OCR if needed
- **Caching** - Extracted text is cached to avoid redundant processing
- **Language Support** - Configurable OCR languages (German + English by default)
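
The fallback and caching behaviour described above can be sketched roughly as follows. This is not the toolkit's actual implementation: the `TextReader` interface, the `ExtractionResult` class, `extractWithFallback()` and the array-based cache are hypothetical names chosen for illustration. Readers are tried in ascending priority order, the first non-empty result wins, a result produced by an OCR reader is flagged as scanned, and results are memoised by file path, modification time and options.

```php
<?php

// Hypothetical reader contract; the real toolkit's interfaces may differ.
interface TextReader
{
    public function name(): string;
    public function priority(): int;      // lower value = tried first
    public function isOcr(): bool;        // true for Tesseract/OCRmyPDF-style readers
    public function isAvailable(): bool;  // e.g. binary found on PATH
    public function extract(string $file, array $options): string;
}

// Hypothetical result object mirroring $document->text / reader / isScanned.
final class ExtractionResult
{
    public function __construct(
        public readonly string $text,
        public readonly ?string $reader,
        public readonly bool $isScanned,
    ) {}

    public function hasText(): bool
    {
        return trim($this->text) !== '';
    }
}

/**
 * Try readers in priority order and return the first non-empty result.
 * Results are memoised in $cache, keyed by path, mtime and options.
 *
 * @param TextReader[] $readers
 */
function extractWithFallback(string $file, array $readers, array $options, array &$cache): ExtractionResult
{
    $mtime = @filemtime($file) ?: 0;
    $key = md5($file . '|' . $mtime . '|' . serialize($options));
    if (isset($cache[$key])) {
        return $cache[$key]; // cache hit: no reader is invoked
    }

    usort($readers, fn (TextReader $a, TextReader $b) => $a->priority() <=> $b->priority());

    $result = new ExtractionResult('', null, false);
    foreach ($readers as $reader) {
        if (!$reader->isAvailable()) {
            continue; // skip tools that are not installed
        }
        $text = $reader->extract($file, $options);
        if (trim($text) !== '') {
            // Text that only came out via OCR means the PDF is treated as scanned.
            $result = new ExtractionResult($text, $reader->name(), $reader->isOcr());
            break;
        }
    }

    return $cache[$key] = $result;
}
```

Keying the cache on the modification time means an edited file is re-extracted automatically, at the cost of one `filemtime()` call per lookup.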

### PDF Creation (Writers)

- **Multiple PDF Writers** with automatic fallback:

    - `Dompdf` - HTML to PDF conversion (pure PHP, LGPL)
    - `TCPDF` - Programmatic PDF creation (pure PHP, LGPL)
    - `wkhtmltopdf` - High-quality HTML rendering via WebKit (external tool)
- **Automatic Writer Selection** - Uses the first available writer by priority
- **Multiple Input Formats** - HTML, plain text, or HTML files
- **Metadata Support** - Title, author, subject for generated PDFs
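
"First available writer by priority" can be read as: honour an explicitly requested writer if it is available, otherwise take the available writer with the lowest priority number. A minimal sketch under that assumption; `selectWriter()` and the `name`/`priority`/`available` array shape are hypothetical, and the priorities follow the Architecture section.

```php
<?php

/**
 * Pick a writer name: an explicitly requested writer wins if available,
 * otherwise the available writer with the lowest priority number.
 *
 * @param array<int, array{name: string, priority: int, available: bool}> $writers
 */
function selectWriter(array $writers, ?string $requested = null): string
{
    $available = array_filter($writers, fn (array $w) => $w['available']);

    if ($requested !== null) {
        foreach ($available as $w) {
            if ($w['name'] === $requested) {
                return $w['name'];
            }
        }
        throw new RuntimeException("Requested writer '$requested' is not available");
    }

    if ($available === []) {
        throw new RuntimeException('No PDF writer available: install Dompdf, TCPDF or wkhtmltopdf');
    }

    // usort() also re-indexes the filtered array, so index 0 is the winner.
    usort($available, fn (array $a, array $b) => $a['priority'] <=> $b['priority']);
    return $available[0]['name'];
}

// Hypothetical availability snapshot: Dompdf is not installed via Composer.
$writers = [
    ['name' => 'wkhtmltopdf', 'priority' => 30, 'available' => true],
    ['name' => 'dompdf',      'priority' => 10, 'available' => false],
    ['name' => 'tcpdf',       'priority' => 20, 'available' => true],
];

echo selectWriter($writers), PHP_EOL;                // tcpdf
echo selectWriter($writers, 'wkhtmltopdf'), PHP_EOL; // wkhtmltopdf
```

Throwing when a requested writer is missing (instead of silently falling back) is a design choice in this sketch: it makes misconfiguration visible rather than producing a PDF from an unexpected engine.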

Requirements
------------

- PHP 8.2+

### For Text Extraction (at least one)

- `pdftotext` (`apt install poppler-utils`)
- `tesseract-ocr` (`apt install tesseract-ocr tesseract-ocr-deu`)
- `ocrmypdf` (`apt install ocrmypdf`)
- Java + PDFBox JAR (optional)
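
To see which of these tools a machine actually has, a PATH lookup is usually enough. A minimal pure-PHP sketch, assuming a Unix-style PATH (no `.exe` suffix handling); `findExecutable()` is a hypothetical helper, not part of the toolkit.

```php
<?php

/**
 * Return the absolute path of an executable found on PATH, or null.
 * Pure PHP (no shell_exec); assumes Unix-style executables without extensions.
 */
function findExecutable(string $name, ?string $path = null): ?string
{
    $path ??= getenv('PATH') ?: '';
    foreach (explode(PATH_SEPARATOR, $path) as $dir) {
        if ($dir === '') {
            continue; // ignore empty PATH segments
        }
        $candidate = rtrim($dir, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR . $name;
        if (is_file($candidate) && is_executable($candidate)) {
            return $candidate;
        }
    }
    return null;
}

// Binaries behind the readers listed above (PDFBox additionally needs its JAR).
foreach (['pdftotext', 'tesseract', 'ocrmypdf', 'java'] as $tool) {
    $found = findExecutable($tool);
    printf("%-10s %s\n", $tool, $found ?? 'not found');
}
```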

### For PDF Creation (at least one)

- `dompdf/dompdf` (`composer require dompdf/dompdf`)
- `tecnickcom/tcpdf` (`composer require tecnickcom/tcpdf`)
- `wkhtmltopdf` (`apt install wkhtmltopdf`)
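
The two Composer packages can be detected at runtime by checking for their main classes (`Dompdf\Dompdf` and the global `TCPDF` class). A minimal sketch, assuming Composer's autoloader has already been required; the `$libraries` map is illustrative only.

```php
<?php

// Assumes vendor/autoload.php has been required so autoloading can resolve classes.
$libraries = [
    'dompdf/dompdf'    => 'Dompdf\\Dompdf',
    'tecnickcom/tcpdf' => 'TCPDF',
];

$status = [];
foreach ($libraries as $package => $class) {
    // class_exists() triggers autoloading by default, so lazy-loaded classes are found too.
    $status[$package] = class_exists($class);
}

foreach ($status as $package => $ok) {
    echo str_pad($package, 18), $ok ? 'installed' : 'missing', PHP_EOL;
}
```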

Installation
------------

### Via Composer

```
composer require daniel-jorg-schuppelius/php-pdf-toolkit
```

### Clone with Submodules

```
git clone --recurse-submodules https://github.com/Daniel-Jorg-Schuppelius/php-pdf-toolkit.git
```

Or if already cloned:

```
git submodule update --init
```

### Install System Dependencies

Use the included install script for system dependencies:

```
# Install PDF extraction tools (poppler-utils, tesseract, ocrmypdf)
sudo ./installscript/install-dependencies.sh
```

### Install PHP Libraries for PDF Creation

```
# Dompdf (recommended, pure PHP)
composer require dompdf/dompdf

# Or TCPDF (alternative, pure PHP)
composer require tecnickcom/tcpdf

# Or wkhtmltopdf (external tool, best quality)
sudo apt install wkhtmltopdf
```

Usage
-----

### Text Extraction

```php
use PDFToolkit\Registries\PDFReaderRegistry;

$registry = PDFReaderRegistry::getInstance();
$document = $registry->extractText('/path/to/file.pdf', [
    'language' => 'deu+eng'
]);

if ($document->hasText()) {
    echo $document->text . "\n";
    echo "Reader: " . $document->reader . "\n";
    echo "Scanned: " . ($document->isScanned ? 'Yes' : 'No') . "\n";
}

// Without OCR fallback (faster for text-based PDFs such as bank statements)
$document = $registry->extractTextOnly('/path/to/bankstatement.pdf', [
    'layout' => false  // No layout formatting, which makes regex extraction easier
]);
```
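
As an illustration of why layout-free text helps, here is one way line-oriented output might be parsed afterwards. The transaction line format (`DD.MM.YYYY description amount` with a German decimal comma) and the `parseTransactions()` helper are hypothetical assumptions, not something the toolkit defines; real statements vary by bank.

```php
<?php

/**
 * Parse lines like "15.01.2024 Rent January -850,00" into transactions.
 * The line format is an assumption for illustration only.
 *
 * @return array<int, array{date: string, description: string, amount: float}>
 */
function parseTransactions(string $text): array
{
    // Date, lazy description, then an amount such as -850,00 or 2.345,67 at line end.
    $pattern = '/^(\d{2})\.(\d{2})\.(\d{4})\s+(.+?)\s+(-?\d{1,3}(?:\.\d{3})*,\d{2})$/m';
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    $transactions = [];
    foreach ($matches as $m) {
        // German number format: "." groups thousands, "," separates decimals.
        $amount = (float) str_replace(['.', ','], ['', '.'], $m[5]);
        $transactions[] = [
            'date'        => "{$m[3]}-{$m[2]}-{$m[1]}", // ISO 8601
            'description' => $m[4],
            'amount'      => $amount,
        ];
    }
    return $transactions;
}

// In practice $text would be $document->text from extractTextOnly().
$text = "Statement 01/2024\n15.01.2024 Rent January -850,00\n20.01.2024 Salary 2.345,67\n";
print_r(parseTransactions($text));
```

The `^`/`$` anchors with the `m` modifier only work when each transaction sits on its own line, which is what disabling layout formatting is meant to make more likely.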

### PDF Creation

```php
use PDFToolkit\Registries\PDFWriterRegistry;
use PDFToolkit\Entities\PDFContent;

$registry = PDFWriterRegistry::getInstance();

// Simple: HTML to PDF
$registry->htmlToPdf('<h1>Hello World</h1><p>Content</p>', '/path/to/output.pdf');

// Simple: Text to PDF
$registry->textToPdf('Plain text content', '/path/to/output.pdf');

// Advanced: With metadata and options
$html = '<h1>My Document</h1><p>Body text</p>'; // any HTML string
$content = PDFContent::fromHtml($html, [
    'title' => 'My Document',
    'author' => 'John Doe',
    'subject' => 'Example PDF'
]);

$registry->createPdf($content, '/path/to/output.pdf', [
    'paper_size' => 'A4',
    'orientation' => 'portrait',
    'margins' => ['top' => 15, 'bottom' => 15, 'left' => 15, 'right' => 15]
]);

// Use a specific writer
$registry->createPdf($content, '/path/to/output.pdf', [], 'dompdf');

// Get the PDF as a string (for download/streaming)
$pdfString = $registry->createPdfString($content);
header('Content-Type: application/pdf');
echo $pdfString;
```
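
When streaming the string to a browser, a download usually also needs `Content-Disposition` and `Content-Length` headers. A minimal sketch; `pdfDownloadHeaders()` is a hypothetical helper, and the filename sanitising rule (anything outside `A-Za-z0-9._-` becomes `_`) is an assumption chosen to keep the header safe.

```php
<?php

/**
 * Build HTTP headers for sending an in-memory PDF (e.g. from createPdfString()).
 * $inline = true lets the browser display it; false triggers a download.
 *
 * @return string[]
 */
function pdfDownloadHeaders(string $pdf, string $filename, bool $inline = false): array
{
    // Keep the filename header-safe: drop directories, quotes and other special chars.
    $safe = preg_replace('/[^A-Za-z0-9._-]+/', '_', basename($filename));
    $disposition = $inline ? 'inline' : 'attachment';

    return [
        'Content-Type: application/pdf',
        "Content-Disposition: {$disposition}; filename=\"{$safe}\"",
        'Content-Length: ' . strlen($pdf), // strlen() counts bytes, as HTTP expects
    ];
}

// Usage inside a web request, with $pdfString from createPdfString():
// foreach (pdfDownloadHeaders($pdfString, 'invoice 2024.pdf') as $h) { header($h); }
// echo $pdfString;
```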

### Check Available Tools

```php
use PDFToolkit\Registries\PDFReaderRegistry;
use PDFToolkit\Registries\PDFWriterRegistry;

// Readers
$readerRegistry = PDFReaderRegistry::getInstance();
foreach ($readerRegistry->getReaderInfo() as $info) {
    echo "{$info['name']}: " . ($info['available'] ? '✓' : '✗') . "\n";
}

// Writers
$writerRegistry = PDFWriterRegistry::getInstance();
foreach ($writerRegistry->getWriterInfo() as $info) {
    echo "{$info['name']}: " . ($info['available'] ? '✓' : '✗') . "\n";
}
```
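
For a deployment check it can help to turn those info arrays into a pass/fail result. A minimal sketch, assuming each entry carries at least the `name` and `available` keys used in the loops above; `summarizeTools()` and the sample `$info` data are hypothetical.

```php
<?php

/**
 * Split registry info entries into available and missing tool names.
 * Assumes each entry has 'name' and 'available' keys.
 *
 * @param array<int, array{name: string, available: bool}> $info
 * @return array{available: string[], missing: string[]}
 */
function summarizeTools(array $info): array
{
    $summary = ['available' => [], 'missing' => []];
    foreach ($info as $entry) {
        $summary[$entry['available'] ? 'available' : 'missing'][] = $entry['name'];
    }
    return $summary;
}

// In a real check: $info = PDFReaderRegistry::getInstance()->getReaderInfo();
$info = [
    ['name' => 'pdftotext', 'available' => true],
    ['name' => 'pdfbox',    'available' => false],
    ['name' => 'tesseract', 'available' => true],
];

$summary = summarizeTools($info);
if ($summary['available'] === []) {
    fwrite(STDERR, "No PDF reader available\n");
    exit(1); // non-zero exit code fails a CI or container health check
}
echo 'Available: ', implode(', ', $summary['available']), PHP_EOL;
```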

Configuration
-------------

Tool paths can be configured in `config/executables.json`:

```
{
    "shellExecutables": {
        "pdftotext": {
            "path": "/usr/bin/pdftotext",
            "required": true
        },
        "wkhtmltopdf": {
            "path": "/usr/bin/wkhtmltopdf",
            "required": false
        }
    }
}
```
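
A sketch of how the `shellExecutables` structure above could be validated at startup. The function name `loadExecutables()` and its behaviour (exception for a missing required tool, `null` for a missing optional one) are assumptions; the toolkit's own config loader may behave differently.

```php
<?php

/**
 * Read a config/executables.json-style file and resolve each tool path.
 * Missing required tools raise an exception; missing optional tools map to null.
 *
 * @return array<string, ?string> tool name => usable path or null
 */
function loadExecutables(string $configFile): array
{
    // JSON_THROW_ON_ERROR turns malformed or empty files into a JsonException.
    $config = json_decode((string) file_get_contents($configFile), true, 512, JSON_THROW_ON_ERROR);

    $resolved = [];
    foreach ($config['shellExecutables'] ?? [] as $name => $spec) {
        $path = $spec['path'] ?? null;
        $usable = is_string($path) && is_file($path) && is_executable($path);

        if (!$usable && ($spec['required'] ?? false)) {
            throw new RuntimeException("Required executable '$name' not found at '" . ($path ?? '') . "'");
        }
        $resolved[$name] = $usable ? $path : null;
    }
    return $resolved;
}
```

Failing fast on required tools surfaces a missing `pdftotext` at boot time instead of on the first extraction request.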

Architecture
------------

```
PDFReaderRegistry → [Readers by Priority] → PDFDocument
                          ↓
              PdftotextReader (10)     # Fast, for text PDFs
              PdfboxReader (30)        # Complex layouts
              TesseractReader (50)     # OCR for scans
              OcrmypdfReader (60)      # Best OCR quality

PDFWriterRegistry → [Writers by Priority] → PDF File
                          ↓
              DompdfWriter (10)        # HTML→PDF, pure PHP
              TcpdfWriter (20)         # Programmatic, pure PHP
              WkhtmltopdfWriter (30)   # Best HTML rendering

```
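
The diagram implies registries that keep implementations ordered by a numeric priority and are reached through a shared `getInstance()`, as in the usage examples. A minimal sketch of that pattern; `PriorityRegistry`, `register()`, `ordered()` and the two subclasses are hypothetical names, and the real registries may register their readers and writers differently.

```php
<?php

// Hypothetical base class: one shared instance per concrete registry class.
abstract class PriorityRegistry
{
    /** @var array<class-string, static> */
    private static array $instances = [];

    /** @var array<string, int> name => priority (lower runs first) */
    private array $entries = [];

    final protected function __construct() {}

    public static function getInstance(): static
    {
        // Keyed by static::class so each subclass gets its own singleton.
        return self::$instances[static::class] ??= new static();
    }

    public function register(string $name, int $priority): void
    {
        $this->entries[$name] = $priority;
    }

    /** @return string[] names in ascending priority order */
    public function ordered(): array
    {
        $entries = $this->entries;
        asort($entries); // sort by priority value, keeping names as keys
        return array_keys($entries);
    }
}

final class ReaderRegistry extends PriorityRegistry {}
final class WriterRegistry extends PriorityRegistry {}

$readers = ReaderRegistry::getInstance();
$readers->register('ocrmypdf', 60);
$readers->register('pdftotext', 10);
$readers->register('tesseract', 50);
$readers->register('pdfbox', 30);

echo implode(', ', $readers->ordered()), PHP_EOL; // pdftotext, pdfbox, tesseract, ocrmypdf
```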

License
-------

AGPL-3.0-or-later - see [LICENSE](LICENSE) file.

### Release Activity

- Cadence: every ~2 days
- Total releases: 27
- Last release: 52 days ago
- PHP version history (2 changes):
    - v0.1: PHP `^8.2 || ^8.3 || ^8.4`
    - v0.8.0.1: PHP `>=8.1 <8.6`

### Community

- Maintainer: [l0gtr0n](/maintainers/l0gtr0n)
- Top contributor: [DSchuppelius](https://github.com/DSchuppelius) (2 commits)

### Code Quality

- Tests: PHPUnit

### Alternatives

- [barryvdh/laravel-dompdf](/packages/barryvdh-laravel-dompdf): A DOMPDF wrapper for Laravel
- [mpdf/mpdf](/packages/mpdf-mpdf): PHP library generating PDF files from UTF-8 encoded HTML
- [iio/libmergepdf](/packages/iio-libmergepdf): Library for merging multiple PDFs
- [creagia/laravel-sign-pad](/packages/creagia-laravel-sign-pad): Laravel package for e-signatures with Signature Pad and digitally certified signing with TCPDF
- [elibyy/tcpdf-laravel](/packages/elibyy-tcpdf-laravel): TCPDF support for Laravel 6, 7, 8, 9, 10, 11
- [lsnepomuceno/laravel-a1-pdf-sign](/packages/lsnepomuceno-laravel-a1-pdf-sign): Sign PDF files with valid X.509 certificates
