xatham/text-extraction
======================

Easy text extraction for many different file types

Version 0.0.2 · MIT license · PHP >=7.4

[Source](https://github.com/xatham/text-extraction) · [Packagist](https://packagist.org/packages/xatham/text-extraction)

[![PHP Composer](https://github.com/xatham/text-extraction/workflows/PHP%20Composer/badge.svg)](https://github.com/xatham/text-extraction/workflows/PHP%20Composer/badge.svg)

text-extraction
===============

About
-----

This PHP library lets you extract plain text from various document types.

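Currently supported MIME types for extraction:

- `text/plain` (plain text)
- `text/csv` (CSV)
- `application/vnd.ms-excel` (legacy Excel, `.xls`)
- `application/vnd.oasis.opendocument.text` (OpenDocument text, `.odt`)
- `application/pdf` (PDF)
- `application/msword` (legacy Word, `.doc`)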
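Because the extractor is configured by MIME type, it can help to check a file's type before handing it over. The sketch below is not part of this library's API: `SUPPORTED_MIME_TYPES`, `detectMimeType()` and `isSupportedMimeType()` are hypothetical helpers, and it assumes the library matches against the exact strings listed above. It uses only PHP's bundled `fileinfo` extension, whose detection relies on the libmagic database, so the reported type may not always match what the library itself detects (for example, a CSV file may be reported as `text/plain`).

```php
<?php

declare(strict_types=1);

// Hypothetical constant: the MIME types this README lists as supported.
const SUPPORTED_MIME_TYPES = [
    'text/plain',
    'text/csv',
    'application/vnd.ms-excel',
    'application/vnd.oasis.opendocument.text',
    'application/pdf',
    'application/msword',
];

/**
 * Hypothetical helper: detect a file's MIME type from its content
 * using PHP's fileinfo extension. Returns null if detection fails.
 */
function detectMimeType(string $path): ?string
{
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mimeType = $finfo->file($path);

    return $mimeType === false ? null : $mimeType;
}

/**
 * Hypothetical helper: true if the MIME type is one of the listed types.
 */
function isSupportedMimeType(?string $mimeType): bool
{
    return $mimeType !== null && in_array($mimeType, SUPPORTED_MIME_TYPES, true);
}
```

For example, calling `isSupportedMimeType(detectMimeType($path))` before `extractByFilePath($path)` lets you skip files outside the listed types without invoking the extractor.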

Install
-------

```shell
composer require xatham/text-extraction
```

Usage
-----

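```php
<?php

// Assumes Composer's autoloader has been loaded and TextExtractionBuilder
// has been imported with a `use` statement (namespace not shown here).

/**
 * Extract text from PDF files only, without OCR.
 */
$textExtractor = (new TextExtractionBuilder())->buildTextExtractor(
    [
        'withOcr' => false,                      // disable OCR
        'validMimeTypes' => ['application/pdf'], // accept PDFs only
    ],
);

// dirname(__DIR__) is the parent of the directory containing this script
$target = dirname(__DIR__) . '/examples/sample.pdf';
$plainTextDocument = $textExtractor->extractByFilePath($target);
if ($plainTextDocument === null) {
    // null means no text could be extracted from the file
    exit('Could not extract any data');
}

// Dump each extracted text item
$texts = $plainTextDocument->getTextItems();
foreach ($texts as $text) {
    var_dump($text);
}
```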

License
-------

text-extraction is licensed under [MIT](https://github.com/xatham/text-extraction/blob/main/LICENSE).