nekulin/php-apache-tika
=======================

Apache Tika bindings for PHP: extracts text from documents and images (with OCR), metadata and more...

Version 0.3.0 · MIT license · PHP >=5.4.0

[Source](https://github.com/nekulin/php-apache-tika) · [Packagist](https://packagist.org/packages/nekulin/php-apache-tika)

PHP Apache Tika
===============

This tool provides [Apache Tika](https://tika.apache.org) bindings for PHP, allowing you to extract text and metadata from documents, images and other formats.

Two modes are supported:

- **App mode**: runs the Tika app JAR through the command-line interface
- **Server mode**: makes HTTP requests to the [JSR 311 network server](http://wiki.apache.org/tika/TikaJAXRS)

Server mode is recommended because it is 5 times faster, but some shared hosts don't allow running processes in the background.
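
To make the difference between the two modes concrete, the sketch below shows roughly what each one amounts to, without using this library. It assumes the Tika app's command-line flags (`--text`, `--html`, `--metadata`, `--language`) and the Tika server's `/tika` and `/meta` endpoints. The helper names `tikaAppCommand` and `tikaServerRequest` are hypothetical and are not part of this package.

```php
<?php
// Hypothetical helpers (not part of this package) illustrating the two modes.

// App mode: each call launches a new Java process for the app JAR.
function tikaAppCommand($jarPath, $what, $file)
{
    // Assumed flags, as listed in the Tika app's command-line help.
    $flags = array(
        'text'     => '--text',
        'html'     => '--html',
        'metadata' => '--metadata',
        'language' => '--language',
    );
    if (!isset($flags[$what])) {
        throw new InvalidArgumentException('Unsupported output: ' . $what);
    }
    return 'java -jar ' . escapeshellarg($jarPath) . ' ' . $flags[$what] . ' ' . escapeshellarg($file);
}

// Server mode: an already running server answers one HTTP request per call.
function tikaServerRequest($host, $port, $what)
{
    // Assumed routes: the document is sent as the body of a PUT request.
    $routes = array(
        'text'     => array('/tika', 'text/plain'),
        'html'     => array('/tika', 'text/html'),
        'metadata' => array('/meta', 'application/json'),
    );
    if (!isset($routes[$what])) {
        throw new InvalidArgumentException('Unsupported output: ' . $what);
    }
    return array(
        'method'  => 'PUT',
        'url'     => 'http://' . $host . ':' . $port . $routes[$what][0],
        'headers' => array('Accept: ' . $routes[$what][1]),
    );
}
```

Starting a Java process for every document is a plausible reason for the speed difference mentioned above, although the original text does not say where the figure comes from.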

Features
--------

- Simple class interface to Apache Tika features:
    - Text and HTML extraction
    - Metadata extraction
    - Optical character recognition (OCR)
- Standardized metadata for documents
- Support for local and remote resources
- No heavyweight library dependencies

Requirements
------------

- PHP 5.4 or greater
- Apache Tika 1.7 or greater
- Oracle Java or OpenJDK
    - Java 6 for Tika up to 1.9
    - Java 7 for Tika 1.10 or greater
- [Tesseract](https://github.com/tesseract-ocr/tesseract) (optional, only needed for OCR)
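
The Java requirement above depends on the Tika version, which can be expressed as a small check. This is only a sketch: `minimumJavaFor` is a hypothetical helper, not part of this package, and it covers only the versions named in this list.

```php
<?php
// Hypothetical helper: minimum Java major version for a given Tika version,
// following the requirements list in this README.
function minimumJavaFor($tikaVersion)
{
    if (version_compare($tikaVersion, '1.7', '<')) {
        throw new InvalidArgumentException('Apache Tika 1.7 or greater is required');
    }
    // version_compare() compares segments numerically, so 1.10 is newer than 1.9.
    return version_compare($tikaVersion, '1.10', '>=') ? 7 : 6;
}
```

A plain string comparison would sort `1.10` before `1.9`, which is why `version_compare()` is used here.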

Installation
------------

Install using composer:

```
composer require vaites/php-apache-tika

```

If you want to use OCR you must install [Tesseract](https://github.com/tesseract-ocr/tesseract):

- **Fedora/CentOS**: `sudo yum install tesseract` (use dnf instead of yum on Fedora 22 or greater)
- **Debian/Ubuntu**: `sudo apt-get install tesseract-ocr`
- **Mac OS X**: `brew install tesseract` (using [Homebrew](http://brew.sh))
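
After installing, it can be useful to confirm from PHP that the `tesseract` executable is actually on the `PATH`. The sketch below is one way to do that. `tesseractAvailable` is a hypothetical helper, not part of this package, and it assumes `exec()` is enabled, which is not always the case on shared hosts.

```php
<?php
// Hypothetical helper: returns true when a `tesseract` executable is on the PATH.
function tesseractAvailable()
{
    $isWindows = stripos(PHP_OS, 'WIN') === 0;
    // `where` on Windows, the POSIX shell builtin `command -v` elsewhere.
    $command = $isWindows
        ? 'where tesseract 2>NUL'
        : 'command -v tesseract 2>/dev/null';
    exec($command, $output, $status);
    // Both lookups exit with status 0 only when the executable was found.
    return $status === 0;
}

echo tesseractAvailable() ? "Tesseract found\n" : "Tesseract not found\n";
```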

Usage
-----

Start the Apache Tika server, with [caution](http://www.openwall.com/lists/oss-security/2015/08/13/5):

```
java -jar tika-server-1.10.jar

```
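
Before creating a client in server mode, you may want to check that something is listening on the server port. The sketch below assumes port 9998, the port used in the examples in this README. `tikaServerIsUp` is a hypothetical helper, not part of this package, and it only tests that the TCP port accepts connections, not that the process behind it is Tika.

```php
<?php
// Hypothetical helper: true when a TCP connection to host:port succeeds.
function tikaServerIsUp($host = 'localhost', $port = 9998, $timeout = 1.0)
{
    // The @ suppresses the warning fsockopen() emits on connection failure.
    $socket = @fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($socket === false) {
        return false;
    }
    fclose($socket);
    return true;
}

echo tikaServerIsUp() ? "Server reachable\n" : "Server not reachable\n";
```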

Instantiate the class:

```
$client = \Vaites\ApacheTika\Client::make('localhost', 9998);           // server mode (default)
$client = \Vaites\ApacheTika\Client::make('/path/to/tika-app.jar');     // app mode

```

Use the client to extract the language, metadata, HTML and text from documents:

```
$language = $client->getLanguage('/path/to/your/document');
$metadata = $client->getMetadata('/path/to/your/document');

$html = $client->getHTML('/path/to/your/document');
$text = $client->getText('/path/to/your/document');

```
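
The calls above can be grouped into one helper that also handles failures. This is a minimal sketch: `summarizeDocument` is a hypothetical function, not part of this package, and it assumes that the client reports errors by throwing an `Exception` and that `getText()` returns a string.

```php
<?php
// Hypothetical helper: collects language, metadata and text for one document.
// $client is any object exposing the getLanguage(), getMetadata() and
// getText() methods shown in this README.
function summarizeDocument($client, $path)
{
    try {
        return array(
            'ok'       => true,
            'language' => $client->getLanguage($path),
            'metadata' => $client->getMetadata($path),
            'text'     => trim($client->getText($path)),
        );
    } catch (Exception $e) {
        // Assumption: extraction failures surface as exceptions.
        return array('ok' => false, 'error' => $e->getMessage());
    }
}
```

With the library installed, the result of `\Vaites\ApacheTika\Client::make('localhost', 9998)` can be passed as `$client`. Returning an array rather than rethrowing keeps batch processing simple, since one unreadable file does not stop the rest.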

Or use it to extract text from images:

```
$client = \Vaites\ApacheTika\Client::make($host, $port);
$metadata = $client->getMetadata('/path/to/your/image');

$text = $client->getText('/path/to/your/image');

```

Integrations
------------

- [Symfony2 Bundle](https://github.com/welcoMattic/ApacheTikaBundle)



