izisoft/tesseract\_ocr
======================

A wrapper to work with Tesseract OCR inside PHP.

Version 2.12.0 (released 4y ago) · MIT license · PHP ^5.3 || ^7.0 || ^8.0

[Source](https://github.com/izisoft/tesseract-ocr-for-php) · [Packagist](https://packagist.org/packages/izisoft/tesseract_ocr)

[![Tesseract OCR for PHP](https://camo.githubusercontent.com/5644ef8ac7456f307f044c699a24a91fd0005382b573633875487226c5db47a6/68747470733a2f2f74686961676f616c657373696f2e6769746875622e696f2f7465737365726163742d6f63722d666f722d7068702f696d616765732f6c6f676f2e706e67)](https://camo.githubusercontent.com/5644ef8ac7456f307f044c699a24a91fd0005382b573633875487226c5db47a6/68747470733a2f2f74686961676f616c657373696f2e6769746875622e696f2f7465737365726163742d6f63722d666f722d7068702f696d616765732f6c6f676f2e706e67)

Tesseract OCR for PHP
=====================

A wrapper to work with Tesseract OCR inside PHP.

[![CI](https://github.com/thiagoalessio/tesseract-ocr-for-php/workflows/CI/badge.svg?event=push&branch=main)](https://github.com/thiagoalessio/tesseract-ocr-for-php/actions?query=workflow%3ACI)[![AppVeyor](https://camo.githubusercontent.com/97e90624b633c6c9d9d23f000894289d0d1db9754dc698e59e19cca5ee3432c6/68747470733a2f2f63692e6170707665796f722e636f6d2f6170692f70726f6a656374732f7374617475732f787779356c7330373938697763696d332f6272616e63682f6d61696e3f7376673d74727565)](https://ci.appveyor.com/project/thiagoalessio/tesseract-ocr-for-php/branch/main)[![Codacy](https://camo.githubusercontent.com/76ef3c8d10c1fb64294bd95c0332cc13be9aa73d941bdc4d408295ebd7d4a1c4/68747470733a2f2f6170702e636f646163792e636f6d2f70726f6a6563742f62616467652f47726164652f6138316161313030313238373466323361353764663562343932643833356632)](https://www.codacy.com/gh/thiagoalessio/tesseract-ocr-for-php/dashboard)[![Test Coverage](https://camo.githubusercontent.com/616ae1101b9868b6a298e1da968294c831071b77571dac026a5f4a1f290a6bd5/68747470733a2f2f636f6465636f762e696f2f67682f74686961676f616c657373696f2f7465737365726163742d6f63722d666f722d7068702f6272616e63682f6d61696e2f67726170682f62616467652e7376673f746f6b656e3d5930566e727169534966)](https://codecov.io/gh/thiagoalessio/tesseract-ocr-for-php)
[![Latest Stable Version](https://camo.githubusercontent.com/19e5fc8c5538aba3333632edbc03ee69f27c478cd76e4f459cc91e139cb87bf5/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f762f74686961676f616c657373696f2f7465737365726163745f6f63722e737667)](https://packagist.org/packages/thiagoalessio/tesseract_ocr)[![Total Downloads](https://camo.githubusercontent.com/a70d4f0ab0ba9167c40c8e4f19260e067f70d9fb30d823711b6ab1c7ad27e308/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f64742f74686961676f616c657373696f2f7465737365726163745f6f63722e737667)](https://packagist.org/packages/thiagoalessio/tesseract_ocr)[![Monthly Downloads](https://camo.githubusercontent.com/7b4aaebd34cb0da4401b075eedc0ee6cd885afaf5bf81cb0ec3265d6a63db40d/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f646d2f74686961676f616c657373696f2f7465737365726163745f6f63722e737667)](https://packagist.org/packages/thiagoalessio/tesseract_ocr)

Installation
------------

Via [Composer](http://getcomposer.org/):

```
$ composer require thiagoalessio/tesseract_ocr

```

‼️ **This library depends on [Tesseract OCR](https://github.com/tesseract-ocr/tesseract), version *3.02* or later.**

### [![](https://camo.githubusercontent.com/02979a56ab84a0cc36fef243c1d8e4754dca7192cfb80268601339eab161ac01/68747470733a2f2f74686961676f616c657373696f2e6769746875622e696f2f7465737365726163742d6f63722d666f722d7068702f696d616765732f77696e646f77732d31382e737667)](https://camo.githubusercontent.com/02979a56ab84a0cc36fef243c1d8e4754dca7192cfb80268601339eab161ac01/68747470733a2f2f74686961676f616c657373696f2e6769746875622e696f2f7465737365726163742d6f63722d666f722d7068702f696d616765732f77696e646f77732d31382e737667) Note for Windows users

There are [many ways](https://github.com/tesseract-ocr/tesseract/wiki#windows) to install [Tesseract OCR](https://github.com/tesseract-ocr/tesseract) on your system, but if you just want something quick to get up and running, I recommend installing the [Capture2Text](https://chocolatey.org/packages/capture2text) package with [Chocolatey](https://chocolatey.org).

```
choco install capture2text --version 3.9

```

⚠️ Recent versions of [Capture2Text](https://chocolatey.org/packages/capture2text) stopped shipping the `tesseract` binary.

### [![](https://camo.githubusercontent.com/e47a2447dd05578dc5203d4197928f309f60621a17d6cca7b37468e52b99043a/68747470733a2f2f74686961676f616c657373696f2e6769746875622e696f2f7465737365726163742d6f63722d666f722d7068702f696d616765732f6170706c652d31382e737667)](https://camo.githubusercontent.com/e47a2447dd05578dc5203d4197928f309f60621a17d6cca7b37468e52b99043a/68747470733a2f2f74686961676f616c657373696f2e6769746875622e696f2f7465737365726163742d6f63722d666f722d7068702f696d616765732f6170706c652d31382e737667) Note for macOS users

With [MacPorts](https://www.macports.org) you can install support for individual languages, like so:

```
$ sudo port install tesseract-<langcode>

```

But that is not possible with [Homebrew](https://brew.sh). It comes only with **English** support by default, so if you intend to use it for other languages, the quickest solution is to install them all:

```
$ brew install tesseract tesseract-lang

```

Usage
-----

### Basic usage

[![](./tests/EndToEnd/images/text.png "The quick brown fox jumps over the lazy dog.")](./tests/EndToEnd/images/text.png)

```
use thiagoalessio\TesseractOCR\TesseractOCR;
echo (new TesseractOCR('text.png'))
    ->run();
```

```
The quick brown fox
jumps over
the lazy dog.

```

### Other languages

[![](./tests/EndToEnd/images/german.png "Bülowstraße")](./tests/EndToEnd/images/german.png)

```
use thiagoalessio\TesseractOCR\TesseractOCR;
echo (new TesseractOCR('german.png'))
    ->lang('deu')
    ->run();
```

```
Bülowstraße

```

### Multiple languages

[![](./tests/EndToEnd/images/mixed-languages.png "I eat すし y Pollo")](./tests/EndToEnd/images/mixed-languages.png)

```
use thiagoalessio\TesseractOCR\TesseractOCR;
echo (new TesseractOCR('mixed-languages.png'))
    ->lang('eng', 'jpn', 'spa')
    ->run();
```

```
I eat すし y Pollo

```
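For reference, the `tesseract` command line accepts several languages in its `-l` flag joined by `+`. The sketch below only illustrates that string form; `languageFlag` is a hypothetical helper, and how the wrapper assembles the flag internally is an assumption.

```php
<?php
// Illustrative only: builds the value tesseract's -l flag expects
// for several languages, e.g. "eng+jpn+spa".
function languageFlag(string ...$codes): string
{
    return implode('+', $codes);
}

echo languageFlag('eng', 'jpn', 'spa'), PHP_EOL;
```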

### Inducing recognition

[![](./tests/EndToEnd/images/8055.png "8055")](./tests/EndToEnd/images/8055.png)

```
use thiagoalessio\TesseractOCR\TesseractOCR;
echo (new TesseractOCR('8055.png'))
    ->allowlist(range('A', 'Z'))
    ->run();
```

```
BOSS

```

### Breaking CAPTCHAs

Yes, I know some of you might want to use this library for the *noble* purpose of breaking CAPTCHAs, so please take a look at this comment:

API
---

### run

Executes a `tesseract` command, optionally receiving an integer as `timeout`, in case you experience stalled tesseract processes.

```
$ocr = new TesseractOCR();
$ocr->run();
```

```
$ocr = new TesseractOCR();
$timeout = 500;
$ocr->run($timeout);
```

### image

Define the path of an image to be recognized by `tesseract`.

```
$ocr = new TesseractOCR();
$ocr->image('/path/to/image.png');
$ocr->run();
```

### imageData

Set the image to be recognized by `tesseract` from a string, along with its size. This can be useful when dealing with files that are already loaded in memory. You can easily retrieve the image data and size from an image object:

```
// Option 1: using Imagick ($img is an Imagick object)
$data = $img->getImageBlob();
$size = $img->getImageLength();

// Option 2: using GD ($img is a GD image); the encoded bytes are
// captured from the output buffer
ob_start();
// Note that you can use any format supported by tesseract
imagepng($img, null, 0);
$size = ob_get_length();
$data = ob_get_clean();

$ocr = new TesseractOCR();
$ocr->imageData($data, $size);
$ocr->run();
```

### executable

Define a custom location of the `tesseract` executable, if for any reason it is not present in the `$PATH`.

```
echo (new TesseractOCR('img.png'))
    ->executable('/path/to/tesseract')
    ->run();
```

### version

Returns the current version of `tesseract`.

```
echo (new TesseractOCR())->version();
```
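Since the library requires Tesseract 3.02 or later, the reported version can be checked up front. This is a minimal sketch, assuming `version()` returns a plain version string such as `4.1.1`; `meetsMinimumVersion` is a hypothetical helper, not part of the library.

```php
<?php
// Hypothetical helper: true when a version string satisfies the
// documented minimum of Tesseract 3.02.
function meetsMinimumVersion(string $version, string $minimum = '3.02'): bool
{
    return version_compare($version, $minimum, '>=');
}

// Sample strings for illustration; in practice pass the result of
// (new TesseractOCR())->version().
var_dump(meetsMinimumVersion('4.1.1'));
var_dump(meetsMinimumVersion('3.01'));
```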

### availableLanguages

Returns a list of available languages/scripts.

```
foreach((new TesseractOCR())->availableLanguages() as $lang) echo $lang;
```
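Before running recognition it can help to confirm that the language packs you need are installed. A minimal sketch, assuming `availableLanguages()` returns an array of language codes such as `eng` or `deu`; `missingLanguages` is a hypothetical helper, not part of the library.

```php
<?php
// Hypothetical helper (not part of the library): returns the requested
// language codes that are absent from the list of installed ones.
function missingLanguages(array $available, array $wanted): array
{
    return array_values(array_diff($wanted, $available));
}

// Hand-written sample list; in practice this would come from
// (new TesseractOCR())->availableLanguages().
$available = ['eng', 'deu', 'osd'];
$missing = missingLanguages($available, ['eng', 'jpn', 'spa']);
echo implode(', ', $missing), PHP_EOL;
```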

**More info:**

### tessdataDir

Specify a custom location for the tessdata directory.

```
echo (new TesseractOCR('img.png'))
    ->tessdataDir('/path')
    ->run();
```

### userWords

Specify the location of user words file.

This is a plain text file containing a list of words that you want `tesseract` to treat as normal dictionary words.

Useful when dealing with contents that contain technical terminology, jargon, etc.

```
$ cat /path/to/user-words.txt
foo
bar

```

```
echo (new TesseractOCR('img.png'))
    ->userWords('/path/to/user-words.txt')
    ->run();
```
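If the word list is produced at runtime, it can be written to a temporary file first. The following is a sketch under that assumption; `writeUserWords` is a hypothetical helper, not part of the library.

```php
<?php
// Hypothetical helper: writes one word per line to a temporary file
// and returns its path, suitable for passing to ->userWords().
function writeUserWords(array $words): string
{
    $path = tempnam(sys_get_temp_dir(), 'words');
    if ($path === false) {
        throw new RuntimeException('Could not create a temporary file.');
    }
    file_put_contents($path, implode("\n", $words) . "\n");
    return $path;
}

$path = writeUserWords(['foo', 'bar']);
echo file_get_contents($path);
```

The returned path can then be handed to `->userWords($path)`; remember to delete the file once recognition has finished.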

### userPatterns

Specify the location of user patterns file.

If the contents you are dealing with have known patterns, this option can significantly improve tesseract's recognition accuracy.

```
$ cat /path/to/user-patterns.txt
1-\d\d\d-GOOG-441
www.\n\\\*.com

```

```
echo (new TesseractOCR('img.png'))
    ->userPatterns('/path/to/user-patterns.txt')
    ->run();
```

### lang

Define one or more languages to be used during the recognition. A complete list of available languages can be found at:

**Tip from [@daijiale](https://github.com/daijiale):** Use the combination `->lang('chi_sim', 'chi_tra')` for proper recognition of Chinese.

```
echo (new TesseractOCR('img.png'))
    ->lang('lang1', 'lang2', 'lang3')
    ->run();
```

### psm

Specify the Page Segmentation Mode, which instructs `tesseract` how to interpret the given image.

**More info:**

```
echo (new TesseractOCR('img.png'))
    ->psm(6)
    ->run();
```
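The numeric argument is one of tesseract's page segmentation modes. As a rough guide, a few commonly used values are shown below as named constants. The constant names are invented for this sketch and are not provided by the library, and the set of modes available may vary with your tesseract version.

```php
<?php
// A subset of tesseract's page segmentation modes; the constant
// names are made up for this sketch.
const PSM_AUTO = 3;          // fully automatic page segmentation (default)
const PSM_SINGLE_BLOCK = 6;  // a single uniform block of text
const PSM_SINGLE_LINE = 7;   // a single text line
const PSM_SINGLE_WORD = 8;   // a single word
const PSM_SINGLE_CHAR = 10;  // a single character

echo PSM_SINGLE_BLOCK, PHP_EOL;
```

With these in place, `->psm(PSM_SINGLE_LINE)` reads more clearly than `->psm(7)`.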

### oem

Specify the OCR Engine Mode. (see `tesseract --help-oem`)

```
echo (new TesseractOCR('img.png'))
    ->oem(2)
    ->run();
```

### dpi

Specify the image DPI. It is useful if your image does not contain this information in its metadata.

```
echo (new TesseractOCR('img.png'))
    ->dpi(300)
    ->run();
```

### allowlist

This is a shortcut for `->config('tessedit_char_whitelist', 'abcdef....')`.

```
echo (new TesseractOCR('img.png'))
    ->allowlist(range('a', 'z'), range(0, 9), '-_@')
    ->run();
```
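To make the shortcut concrete, the sketch below shows the flat character string that a mix of ranges and literal strings corresponds to. `flattenAllowlist` is a hypothetical stand-in written for illustration; the library's internal handling may differ.

```php
<?php
// Illustrative only: flattens a mix of arrays and strings into one
// character string, the shape tessedit_char_whitelist takes.
function flattenAllowlist(...$parts): string
{
    $chars = '';
    foreach ($parts as $part) {
        $chars .= is_array($part) ? implode('', $part) : (string) $part;
    }
    return $chars;
}

echo flattenAllowlist(range('a', 'c'), range(0, 2), '-_@'), PHP_EOL;
```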

### configFile

Specify a config file to be used. It can either be the path to your own config file or the name of one of the predefined config files:

```
echo (new TesseractOCR('img.png'))
    ->configFile('hocr')
    ->run();
```

### setOutputFile

Specify an output file to be used. Be aware: if you set an output file, the `withoutTempFiles` option is ignored. Temporary files are written (and deleted) even if `withoutTempFiles` is set.

In combination with `configFile` you are able to get the `hocr`, `tsv` or `pdf` files.

```
echo (new TesseractOCR('img.png'))
    ->configFile('pdf')
    ->setOutputFile('/PATH_TO_MY_OUTPUTFILE/searchable.pdf')
    ->run();
```

### digits

Shortcut for `->configFile('digits')`.

```
echo (new TesseractOCR('img.png'))
    ->digits()
    ->run();
```

### hocr

Shortcut for `->configFile('hocr')`.

```
echo (new TesseractOCR('img.png'))
    ->hocr()
    ->run();
```

### pdf

Shortcut for `->configFile('pdf')`.

```
echo (new TesseractOCR('img.png'))
    ->pdf()
    ->run();
```

### quiet

Shortcut for `->configFile('quiet')`.

```
echo (new TesseractOCR('img.png'))
    ->quiet()
    ->run();
```

### tsv

Shortcut for `->configFile('tsv')`.

```
echo (new TesseractOCR('img.png'))
    ->tsv()
    ->run();
```
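TSV output is easier to work with once it is split into rows. The following is a minimal sketch, assuming `run()` hands back the TSV text as a string; `parseTsv` is a hypothetical helper and the sample data is hand-written, not real tesseract output.

```php
<?php
// Hypothetical helper: turns tab-separated output with a header row
// into a list of associative arrays keyed by column name.
function parseTsv(string $tsv): array
{
    $lines = preg_split('/\R/', rtrim($tsv, "\r\n"));
    $header = explode("\t", array_shift($lines));
    $rows = [];
    foreach ($lines as $line) {
        $fields = explode("\t", $line);
        // Skip malformed lines whose field count differs from the header.
        if (count($fields) === count($header)) {
            $rows[] = array_combine($header, $fields);
        }
    }
    return $rows;
}

// Hand-written sample in the column layout tesseract uses for TSV;
// real output would come from ->tsv()->run().
$sample = "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\tleft\ttop\twidth\theight\tconf\ttext\n"
        . "5\t1\t1\t1\t1\t1\t10\t12\t40\t15\t96\tquick\n";

$rows = parseTsv($sample);
echo $rows[0]['text'], ' (', $rows[0]['conf'], ')', PHP_EOL;
```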

### txt

Shortcut for `->configFile('txt')`.

```
echo (new TesseractOCR('img.png'))
    ->txt()
    ->run();
```

### tempDir

Define a custom directory to store temporary files generated by tesseract. Make sure the directory actually exists and the user running `php` is allowed to write in there.

```
echo (new TesseractOCR('img.png'))
    ->tempDir('./my/custom/temp/dir')
    ->run();
```
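The existence and permission requirements can be checked before the directory is passed in. This is a sketch only; `ensureWritableDir` is a hypothetical helper and the directory name is an arbitrary example.

```php
<?php
// Hypothetical helper: makes sure a directory exists and is writable
// by the current PHP process before it is handed to ->tempDir().
function ensureWritableDir(string $dir): string
{
    if (!is_dir($dir) && !mkdir($dir, 0700, true) && !is_dir($dir)) {
        throw new RuntimeException("Could not create directory: $dir");
    }
    if (!is_writable($dir)) {
        throw new RuntimeException("Directory is not writable: $dir");
    }
    return $dir;
}

$dir = ensureWritableDir(sys_get_temp_dir() . '/ocr-temp-example');
echo $dir, PHP_EOL;
```

The returned path can then be used as `->tempDir($dir)`.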

### withoutTempFiles

Specify that `tesseract` should output the recognized text without writing to temporary files. The data is gathered from the standard output of `tesseract` instead.

```
echo (new TesseractOCR('img.png'))
    ->withoutTempFiles()
    ->run();
```

### Other options

Any configuration option offered by Tesseract can be used like this:

```
echo (new TesseractOCR('img.png'))
    ->config('config_var', 'value')
    ->config('other_config_var', 'other value')
    ->run();
```

Or like this:

```
echo (new TesseractOCR('img.png'))
    ->configVar('value')
    ->otherConfigVar('other value')
    ->run();
```
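The two forms above suggest that camelCase method names map onto snake_case configuration variables (`configVar` to `config_var`). The sketch below illustrates one way such a conversion could work; `camelToSnake` is a hypothetical function and is not taken from the library's source.

```php
<?php
// Illustrative only: converts a camelCase method name into the
// snake_case form used by tesseract configuration variables.
function camelToSnake(string $name): string
{
    // Prefix every uppercase letter (except a leading one) with "_",
    // then lowercase the whole string.
    return strtolower(preg_replace('/(?<!^)[A-Z]/', '_$0', $name));
}

echo camelToSnake('otherConfigVar'), PHP_EOL;
```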

**More info:**

### Thread-limit

Sometimes it may be useful to limit the number of threads that tesseract is allowed to use (e.g. in [this case](https://github.com/tesseract-ocr/tesseract/issues/898)). Set the maximum number of threads with `threadLimit` before calling `run`:

```
echo (new TesseractOCR('img.png'))
    ->threadLimit(1)
    ->run();
```

How to contribute
-----------------

You can contribute to this project by:

- Opening an [Issue](https://github.com/thiagoalessio/tesseract-ocr-for-php/issues) if you found a bug or wish to propose a new feature;
- Placing a [Pull Request](https://github.com/thiagoalessio/tesseract-ocr-for-php/pulls) with code that fixes a bug, corrects missing or wrong documentation, or implements a new feature.

Just make sure you take a look at our [Code of Conduct](https://github.com/thiagoalessio/tesseract-ocr-for-php/blob/main/.github/CODE_OF_CONDUCT.md) and [Contributing](https://github.com/thiagoalessio/tesseract-ocr-for-php/blob/main/.github/CONTRIBUTING.md) instructions.

License
-------

tesseract-ocr-for-php is released under the [MIT License](https://github.com/thiagoalessio/tesseract-ocr-for-php/blob/main/MIT-LICENSE).

Made with [![love](https://camo.githubusercontent.com/7199424ae82041c5374e9302c1276d79b7e62e5a20e02a7942485c15e0ff38da/68747470733a2f2f74686961676f616c657373696f2e6769746875622e696f2f7465737365726163742d6f63722d666f722d7068702f696d616765732f68656172742e737667)](#) in Berlin
