
Active | Php-ext | [PDF & Document Generation](/categories/documents)

kreuzberg/kreuzberg
===================

High-performance document intelligence for PHP. Extract text, metadata, and structured information from PDFs, Office documents, images, and 91+ file formats in total. Powered by a Rust core for 10-50x speed improvements.

4.6.3 (1mo ago) | 7.2k103↑25%348 | [28 issues](https://github.com/kreuzberg-dev/kreuzberg/issues) | [6 PRs](https://github.com/kreuzberg-dev/kreuzberg/pulls) | MIT | Rust | PHP ^8.4 | CI passing

Since Dec 29 | Pushed 1mo ago | 26 watchers

[Source](https://github.com/kreuzberg-dev/kreuzberg) | [Packagist](https://packagist.org/packages/kreuzberg/kreuzberg) | [Docs](https://kreuzberg.dev) | [RSS](/packages/kreuzberg-kreuzberg/feed)


Kreuzberg
=========

[Rust](https://crates.io/crates/kreuzberg) | [Elixir](https://hex.pm/packages/kreuzberg) | [Python](https://pypi.org/project/kreuzberg/) | [Node.js](https://www.npmjs.com/package/@kreuzberg/node) | [WASM](https://www.npmjs.com/package/@kreuzberg/wasm) | [Java](https://central.sonatype.com/artifact/dev.kreuzberg/kreuzberg) | [Go](https://github.com/kreuzberg-dev/kreuzberg/releases) | [C#](https://www.nuget.org/packages/Kreuzberg/) | [PHP](https://packagist.org/packages/kreuzberg/kreuzberg) | [Ruby](https://rubygems.org/gems/kreuzberg) | [R](https://kreuzberg-dev.r-universe.dev/kreuzberg) | [Docker](https://github.com/kreuzberg-dev/kreuzberg/pkgs/container/kreuzberg) | [C](https://github.com/kreuzberg-dev/kreuzberg/releases) | [License](https://github.com/kreuzberg-dev/kreuzberg/blob/main/LICENSE) | [Documentation](https://docs.kreuzberg.dev) | [Live Demo](https://docs.kreuzberg.dev/demo.html) | [Hugging Face](https://huggingface.co/Kreuzberg)

[Discord](https://discord.gg/xt9WY3GnKR)

Extract text and metadata from a wide range of file formats (91+), generate embeddings and post-process at native speeds without needing a GPU.

Key Features
------------

- **Extensible architecture** – Plugin system for custom OCR backends, validators, post-processors, and document extractors
- **Polyglot** – Native bindings for Rust, Python, TypeScript/Node.js, Ruby, Go, Java, C#, PHP, Elixir, R, and C
- **91+ file formats** – PDF, Office documents, images, HTML, XML, emails, archives, academic formats across 8 categories
- **OCR support** – Tesseract (all bindings, including Tesseract-WASM for browsers), PaddleOCR (all native bindings), EasyOCR (Python), extensible via plugin API
- **High performance** – Rust core with native PDFium, SIMD optimizations and full parallelism
- **Flexible deployment** – Use as library, CLI tool, REST API server, or MCP server
- **Memory efficient** – Streaming parsers for multi-GB files

**[Complete Documentation](https://kreuzberg.dev/)** | **[Live Demo](https://docs.kreuzberg.dev/demo.html)** | **[Installation Guides](#installation)**

Installation
------------

Each language binding provides comprehensive documentation with examples and best practices. Choose your platform to get started:

**Scripting Languages:**

- **[Python](https://github.com/kreuzberg-dev/kreuzberg/tree/main/packages/python)** – PyPI package, async/sync APIs, OCR backends (Tesseract, PaddleOCR, EasyOCR)
- **[Ruby](https://github.com/kreuzberg-dev/kreuzberg/tree/main/packages/ruby)** – RubyGems package, idiomatic Ruby API, native bindings
- **[PHP](https://github.com/kreuzberg-dev/kreuzberg/tree/main/packages/php)** – Composer package, modern PHP 8.4+ support, type-safe API, async extraction
- **[Elixir](https://github.com/kreuzberg-dev/kreuzberg/tree/main/packages/elixir)** – Hex package, OTP integration, concurrent processing
- **[R](https://github.com/kreuzberg-dev/kreuzberg/tree/main/packages/r)** – r-universe package, idiomatic R API, extendr bindings

**JavaScript/TypeScript:**

- **[@kreuzberg/node](https://github.com/kreuzberg-dev/kreuzberg/tree/main/crates/kreuzberg-node)** – Native NAPI-RS bindings for Node.js/Bun, fastest performance
- **[@kreuzberg/wasm](https://github.com/kreuzberg-dev/kreuzberg/tree/main/packages/typescript)** – WebAssembly for browsers/Deno/Cloudflare Workers, full feature parity (PDF, Excel, OCR, archives)

**Compiled Languages:**

- **[Go](https://github.com/kreuzberg-dev/kreuzberg/tree/main/packages/go)** – Go module with FFI bindings, context-aware async
- **[Java](https://github.com/kreuzberg-dev/kreuzberg/tree/main/packages/java)** – Maven Central, Foreign Function & Memory API
- **[C#](https://github.com/kreuzberg-dev/kreuzberg/tree/main/packages/csharp)** – NuGet package, .NET 6.0+, full async/await support

**Native:**

- **[Rust](https://github.com/kreuzberg-dev/kreuzberg/tree/main/crates/kreuzberg)** – Core library, flexible feature flags, zero-copy APIs
- **[C (FFI)](https://github.com/kreuzberg-dev/kreuzberg/tree/main/crates/kreuzberg-ffi)** – C header + shared library, pkg-config/CMake support, cross-platform

**Containers:**

- **[Docker](https://docs.kreuzberg.dev/guides/docker/)** – Official images with API, CLI, and MCP server modes (Core: ~1.0-1.3GB, Full: ~1.0-1.3GB with OCR + legacy format support)

**Command-Line:**

- **[CLI](https://docs.kreuzberg.dev/cli/usage/)** – Cross-platform binary, batch processing, MCP server mode

> Language bindings include precompiled binaries for x86\_64 and aarch64 on Linux and for ARM64 (Apple Silicon) on macOS. See Platform Support for the full matrix, including Windows.

Platform Support
----------------

Precompiled binary availability by language binding and platform:

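| Language | Linux x86\_64 | Linux aarch64 | macOS ARM64 | Windows x64 |
|----------|---------------|---------------|-------------|-------------|
| Python   | ✅ | ✅ | ✅ | ✅ |
| Node.js  | ✅ | ✅ | ✅ | ✅ |
| WASM     | ✅ | ✅ | ✅ | ✅ |
| Ruby     | ✅ | ✅ | ✅ | - |
| R        | ✅ | ✅ | ✅ | ✅ |
| Elixir   | ✅ | ✅ | ✅ | ✅ |
| Go       | ✅ | ✅ | ✅ | ✅ |
| Java     | ✅ | ✅ | ✅ | ✅ |
| C#       | ✅ | ✅ | ✅ | ✅ |
| PHP      | ✅ | ✅ | ✅ | ✅ |
| Rust     | ✅ | ✅ | ✅ | ✅ |
| C (FFI)  | ✅ | ✅ | ✅ | ✅ |
| CLI      | ✅ | ✅ | ✅ | ✅ |
| Docker   | ✅ | ✅ | ✅ | - |

**Note**: ✅ = Precompiled binaries available with instant installation. WASM runs in any environment with WebAssembly support (browsers, Deno, Bun, Cloudflare Workers). All platforms are tested in CI. macOS support is Apple Silicon only.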
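As a rough illustration of how the matrix above could be checked before choosing a binding, the sketch below encodes it as data. It is not part of Kreuzberg: `COMMON_TARGETS`, `NO_WINDOWS`, and `has_prebuilt_binary` are hypothetical names, and the `(system, machine)` pairs assume the strings that Python's `platform.system()` and `platform.machine()` usually report.

```python
# Illustrative only: encodes the support matrix from the table above.
# None of these names are Kreuzberg APIs.

# Targets every listed binding ships binaries for, as (system, machine) pairs.
COMMON_TARGETS = {
    ("Linux", "x86_64"),
    ("Linux", "aarch64"),
    ("Darwin", "arm64"),  # macOS on Apple Silicon
}
WINDOWS_X64 = ("Windows", "AMD64")

# Rows the table marks "-" in the Windows x64 column.
NO_WINDOWS = {"Ruby", "Docker"}


def has_prebuilt_binary(binding: str, system: str, machine: str) -> bool:
    """Return True if the table lists a precompiled binary for this target."""
    target = (system, machine)
    if target in COMMON_TARGETS:
        return True
    if target == WINDOWS_X64:
        return binding not in NO_WINDOWS
    # Anything else (for example Intel macOS) is not in the table.
    return False
```

A caller could pass `platform.system()` and `platform.machine()` to test the current host.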

### Embeddings Support (Optional)

To use embeddings functionality:

1. **Install ONNX Runtime 1.24+**:

    - Linux: Download from [ONNX Runtime releases](https://github.com/microsoft/onnxruntime/releases) (Debian packages may have older versions)
    - macOS: `brew install onnxruntime`
    - Windows: Download from [ONNX Runtime releases](https://github.com/microsoft/onnxruntime/releases)
2. Use embeddings in your code - see [Embeddings Guide](https://docs.kreuzberg.dev/features/#embeddings)

**Note:** Kreuzberg requires ONNX Runtime version 1.24+ for embeddings. All other Kreuzberg features work without ONNX Runtime.
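The 1.24+ requirement is easy to get wrong with plain string comparison, since `"1.9"` sorts after `"1.24"` as text. The sketch below compares versions numerically. It is a generic helper, not a Kreuzberg function; `parse_version` and `meets_minimum` are hypothetical names, and how the installed version string is obtained is left to the caller.

```python
import re

# Minimum ONNX Runtime version stated in this README.
MINIMUM = (1, 24)


def parse_version(text: str) -> tuple:
    """Parse the leading dotted number: '1.24.0-rc1' -> (1, 24, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"not a version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: tuple = MINIMUM) -> bool:
    """Compare numerically, component by component."""
    return parse_version(installed) >= minimum
```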

Supported Formats
-----------------

91+ file formats across 8 major categories with intelligent format detection and comprehensive metadata extraction.

### Office Documents

| Category | Formats | Capabilities |
|----------|---------|--------------|
| **Word Processing** | `.docx`, `.docm`, `.dotx`, `.dotm`, `.dot`, `.odt`, `.pages` | Full text, tables, lists, images, metadata, styles |
| **Spreadsheets** | `.xlsx`, `.xlsm`, `.xlsb`, `.xls`, `.xla`, `.xlam`, `.xltm`, `.xltx`, `.xlt`, `.ods`, `.numbers` | Sheet data, formulas, cell metadata, charts |
| **Presentations** | `.pptx`, `.pptm`, `.ppsx`, `.potx`, `.potm`, `.pot`, `.key` | Slides, speaker notes, images, metadata |
| **PDF** | `.pdf` | Text, tables, images, metadata, OCR support |
| **eBooks** | `.epub`, `.fb2` | Chapters, metadata, embedded resources |
| **Database** | `.dbf` | Table data extraction, field type support |
| **Hangul** | `.hwp`, `.hwpx` | Korean document format, text extraction |

### Images (OCR-Enabled)

| Category | Formats | Features |
|----------|---------|----------|
| **Raster** | `.png`, `.jpg`, `.jpeg`, `.gif`, `.webp`, `.bmp`, `.tiff`, `.tif` | OCR, table detection, EXIF metadata, dimensions, color space |
| **Advanced** | `.jp2`, `.jpx`, `.jpm`, `.mj2`, `.jbig2`, `.jb2`, `.pnm`, `.pbm`, `.pgm`, `.ppm` | Pure Rust decoders (JPEG 2000, JBIG2), OCR, table detection |
| **Vector** | `.svg` | DOM parsing, embedded text, graphics metadata |

### Web & Data

| Category | Formats | Features |
|----------|---------|----------|
| **Markup** | `.html`, `.htm`, `.xhtml`, `.xml`, `.svg` | DOM parsing, metadata (Open Graph, Twitter Card), link extraction |
| **Structured Data** | `.json`, `.yaml`, `.yml`, `.toml`, `.csv`, `.tsv` | Schema detection, nested structures, validation |
| **Text & Markdown** | `.txt`, `.md`, `.markdown`, `.djot`, `.mdx`, `.rst`, `.org`, `.rtf` | CommonMark, GFM, Djot, MDX, reStructuredText, Org Mode, Rich Text |

### Email & Archives

| Category | Formats | Features |
|----------|---------|----------|
| **Email** | `.eml`, `.msg` | Headers, body (HTML/plain), attachments, UTF-16 support |
| **Archives** | `.zip`, `.tar`, `.tgz`, `.gz`, `.7z` | Recursive extraction, nested archives, metadata |

### Academic & Scientific

| Category | Formats | Features |
|----------|---------|----------|
| **Citations** | `.bib`, `.ris`, `.nbib`, `.enw`, `.csl` | BibTeX/BibLaTeX, RIS, PubMed/MEDLINE, EndNote XML, CSL JSON |
| **Scientific** | `.tex`, `.latex`, `.typ`, `.typst`, `.jats`, `.ipynb` | LaTeX, Typst, JATS journal articles, Jupyter notebooks |
| **Publishing** | `.fb2`, `.docbook`, `.dbk`, `.opml` | FictionBook, DocBook XML, OPML outlines |
| **Documentation** | `.pod`, `.mdoc`, `.troff` | Perl POD, man pages, troff |

**[Complete Format Reference →](https://docs.kreuzberg.dev/reference/formats/)**
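To make the category tables concrete, here is a minimal sketch that maps a file name to one of the categories listed above by extension alone. It is an assumption-laden illustration, not how Kreuzberg detects formats (the README only says detection is "intelligent", which may involve MIME types or content inspection). `CATEGORY_BY_EXTENSION` covers only a handful of the listed extensions, and `category_for` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional

# A small excerpt of the tables above; extend as needed.
CATEGORY_BY_EXTENSION = {
    ".docx": "Word Processing",
    ".xlsx": "Spreadsheets",
    ".pptx": "Presentations",
    ".pdf": "PDF",
    ".png": "Raster",
    ".svg": "Vector",
    ".html": "Markup",
    ".json": "Structured Data",
    ".md": "Text & Markdown",
    ".eml": "Email",
    ".zip": "Archives",
    ".gz": "Archives",
    ".bib": "Citations",
    ".ipynb": "Scientific",
}


def category_for(filename: str) -> Optional[str]:
    """Look up the category by the final extension, case-insensitively."""
    suffix = Path(filename).suffix.lower()
    return CATEGORY_BY_EXTENSION.get(suffix)
```

Extension lookup is only a first guess: a misnamed file would be misclassified, which is one reason content-based detection exists.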

Key Features
------------

**OCR with Table Extraction**

Multiple OCR backends (Tesseract, EasyOCR, PaddleOCR) with intelligent table detection and reconstruction. Extract structured data from scanned documents and images with configurable accuracy thresholds.

**[OCR Backend Documentation →](https://docs.kreuzberg.dev/guides/ocr/)**

**Batch Processing**

Process multiple documents concurrently with configurable parallelism. Optimize throughput for large-scale document processing workloads with automatic resource management.

**[Batch Processing Guide →](https://docs.kreuzberg.dev/features/#batch-processing)**
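The idea of "configurable parallelism" can be sketched with the Python standard library alone. The example below is not Kreuzberg's batch API: `extract_text` is a placeholder that just reads a file as UTF-8, and `extract_batch` and `max_workers` are hypothetical names chosen for illustration. It shows a bounded worker pool where one failing document does not abort the rest.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def extract_text(path: str) -> str:
    """Placeholder extractor: reads the file as UTF-8 text."""
    return Path(path).read_text(encoding="utf-8", errors="replace")


def extract_batch(paths, max_workers: int = 4, extract=extract_text) -> dict:
    """Run `extract` over `paths` with at most `max_workers` threads.

    Returns a dict mapping each path to its result, or to the exception
    raised while processing it.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {path: pool.submit(extract, path) for path in paths}
        for path, future in futures.items():
            try:
                results[path] = future.result()
            except Exception as exc:  # keep going; record the failure
                results[path] = exc
    return results
```

Returning exceptions alongside results keeps a large batch from failing because of a single corrupt file; the caller decides how to report them.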

**Password-Protected PDFs**

Handle encrypted PDFs with single or multiple password attempts. Supports both RC4 and AES encryption with automatic fallback strategies.

**[PDF Configuration →](https://docs.kreuzberg.dev/migration/v3-to-v4/#password-protected-pdfs)**
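"Multiple password attempts" amounts to trying candidates in order until one opens the document. The sketch below shows that loop in isolation. It does not use Kreuzberg or any PDF library: `WrongPassword`, `open_with_passwords`, and the `open_fn` callback are all hypothetical stand-ins for whatever the real opener raises and returns.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


class WrongPassword(Exception):
    """Raised by the (hypothetical) opener when a password is rejected."""


def open_with_passwords(open_fn: Callable[[str], T], passwords: Iterable[str]) -> T:
    """Try each password in order and return the first successful result."""
    for password in passwords:
        try:
            return open_fn(password)
        except WrongPassword:
            continue  # fall through to the next candidate
    raise WrongPassword("none of the supplied passwords unlocked the document")
```

Only the "wrong password" error is swallowed; any other failure (a corrupt file, for instance) propagates immediately instead of being retried.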

**Language Detection**

Automatic language detection in extracted text using fast-langdetect. Configure confidence thresholds and access per-language statistics.

**[Language Detection Guide →](https://docs.kreuzberg.dev/features/#language-detection)**

**Metadata Extraction**

Extract comprehensive metadata from all supported formats: authors, titles, creation dates, page counts, EXIF data, and format-specific properties.

**[Metadata Guide →](https://docs.kreuzberg.dev/reference/types/#metadata)**

AI Coding Assistants
--------------------

Kreuzberg ships with an [Agent Skill](https://agentskills.io) that teaches AI coding assistants how to use the library correctly. It works with Claude Code, Codex, Gemini CLI, Cursor, VS Code, Amp, Goose, Roo Code, and any tool supporting the Agent Skills standard.

Install the skill into any project using the [Vercel Skills CLI](https://github.com/vercel-labs/skills):

```
npx skills add kreuzberg-dev/kreuzberg
```

The skill is located at [`skills/kreuzberg/SKILL.md`](skills/kreuzberg/SKILL.md) and is automatically discovered by supported AI coding tools once installed.

Documentation
-------------

- **[Installation Guide](https://docs.kreuzberg.dev/getting-started/installation/)** – Setup and dependencies
- **[User Guide](https://docs.kreuzberg.dev/guides/extraction/)** – Comprehensive usage guide
- **[API Reference](https://docs.kreuzberg.dev/reference/api-python/)** – Complete API documentation
- **[Format Support](https://docs.kreuzberg.dev/reference/formats/)** – Supported file formats
- **[OCR Backends](https://docs.kreuzberg.dev/guides/ocr/)** – OCR engine setup
- **[CLI Guide](https://docs.kreuzberg.dev/cli/usage/)** – Command-line usage
- **[Migration Guide](https://docs.kreuzberg.dev/migration/v3-to-v4/)** – Upgrading from v3

Contributing
------------

Contributions are welcome! See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

License
-------

MIT License - see [LICENSE](LICENSE) for details. You can use Kreuzberg freely in both commercial and closed-source products with no viral effects; the only condition is that the copyright and license notice be kept in copies of the software.

### Health Score

**62** (Fair). Better than 99% of packages.

- **Maintenance: 91** – Actively maintained with recent releases
- **Popularity: 46** – Moderate usage in the ecosystem
- **Community: 35** – Small or concentrated contributor base
- **Maturity: 68** – Established project with proven stability
- **Bus Factor: 1** – Top contributor holds 93.1% of commits, a single point of failure

How is this calculated?

- **Maintenance (25%)** – Last commit recency, latest release date, and issue-to-star ratio. Uses a 2-year decay window.
- **Popularity (30%)** – Total and monthly downloads, GitHub stars, and forks. Logarithmic scaling prevents top-heavy scores.
- **Community (15%)** – Contributors, dependents, forks, watchers, and maintainers. Measures real ecosystem engagement.
- **Maturity (30%)** – Project age, version count, PHP version support, and release stability.

### Release Activity

- Cadence: every ~1 day
- Total: 62
- Last release: 46d ago

PHP version history (2 changes):

- 4.0.0-rc.22: PHP ^8.2
- 4.3.2: PHP ^8.4

### Community

Maintainers

[Goldziher](/maintainers/Goldziher)

---

Top Contributors

- [Goldziher](https://github.com/Goldziher) (3884 commits)
- [kh3rld](https://github.com/kh3rld) (78 commits)
- [dependabot\[bot\]](https://github.com/dependabot[bot]) (52 commits)
- [v-tan](https://github.com/v-tan) (52 commits)
- [pratik-mahalle](https://github.com/pratik-mahalle) (27 commits)
- [naderalexan](https://github.com/naderalexan) (26 commits)
- [yiftachashkenazi](https://github.com/yiftachashkenazi) (7 commits)
- [tobocop2](https://github.com/tobocop2) (7 commits)
- [ivanova-gif](https://github.com/ivanova-gif) (5 commits)
- [deepsource-autofix\[bot\]](https://github.com/deepsource-autofix[bot]) (5 commits)
- [mlinmg](https://github.com/mlinmg) (4 commits)
- [saiintbrisson](https://github.com/saiintbrisson) (3 commits)
- [dereuromark](https://github.com/dereuromark) (2 commits)
- [hoesler](https://github.com/hoesler) (2 commits)
- [ktos](https://github.com/ktos) (2 commits)
- [sandmor](https://github.com/sandmor) (2 commits)
- [wrmthorne](https://github.com/wrmthorne) (2 commits)
- [louisguitton](https://github.com/louisguitton) (1 commit)
- [martinschaer](https://github.com/martinschaer) (1 commit)
- [Marvelousmicheal](https://github.com/Marvelousmicheal) (1 commit)

---

Tags

bun, csharp, document-intelligence, elixir, ffi, golang, java, metadata-extraction, node, pdf-extraction, pdfium, php, python, rag, ruby, rust, table-extraction, tesseract, text-extraction, wasm, pdf, performance, xlsx, docx, OCR, pptx, php8, document-parsing, document-extraction, document-processing

###  Code Quality

- Tests: PHPUnit
- Static Analysis: PHPStan
- Code Style: PHP CS Fixer
- Type Coverage: Yes


###  Alternatives

- [gotenberg/gotenberg-php](/packages/gotenberg-gotenberg-php) – A PHP client for interacting with Gotenberg, a developer-friendly API for converting numerous document formats into PDF files, and more!
- [vaites/php-apache-tika](/packages/vaites-php-apache-tika) – Apache Tika bindings for PHP: extracts text from documents and images (with OCR), metadata and more...
- [mnvx/lowrapper](/packages/mnvx-lowrapper) – PHP wrapper over LibreOffice converter
- [kartik-v/yii2-export](/packages/kartik-v-yii2-export) – A library to export server/db data in various formats (e.g. excel, html, pdf, csv etc.)
- [aspose-cloud/aspose-words-cloud](/packages/aspose-cloud-aspose-words-cloud) – Open, generate, edit, split, merge, compare and convert Word documents. Integrate Cloud API into your solutions to manipulate documents. Convert PDF to Word (DOC, DOCX, ODT, RTF and HTML) and in the opposite direction.
- [nilgems/laravel-textract](/packages/nilgems-laravel-textract) – A Laravel package to extract text from files like DOC, XL, Image, Pdf and more. I've developed this package by inspiring "npm textract".

