rembish/text-at-any-cost
========================

Extract plain text from common document formats: DOC, PDF, PPT, RTF, DOCX, ODT, RAR

v1.0.0 · BSD-3-Clause · PHP ^8.3

[Source](https://github.com/rembish/TextAtAnyCost) · [Packagist](https://packagist.org/packages/rembish/text-at-any-cost)

TextAtAnyCost
=============

Extract plain text from common document formats — no external programs or PECL extensions required.

Supported formats
-----------------

| Format | Extension | Notes |
|--------|-----------|-------|
| Microsoft Word 97–2003 | `.doc` | CFB/WCBFF, ANSI and Unicode |
| Microsoft PowerPoint 97–2003 | `.ppt` | CFB/WCBFF |
| Adobe PDF | `.pdf` | FlateDecode, ASCII-85, ASCII-Hex, ToUnicode CMaps |
| Rich Text Format | `.rtf` | Stack-based parser, Mac Roman + Windows-1251 |
| Word 2007+ (Open XML) | `.docx` | ZIP + XML |
| OpenDocument Text | `.odt` | ZIP + XML |
| RAR archives (read list) | `.rar` | RAR 4.x, no PECL required |
| RAR archives (write/store) | `.rar` | Store method only |

Requirements
------------

- PHP **8.3** or later
- Extensions: `mbstring`, `zlib`, `dom`, `zip` (all bundled with PHP 8, though some distributions ship them as separate packages that must be installed or enabled)
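
A quick way to confirm these requirements on a given machine is a small preflight script. The sketch below is not part of the package: the function name `missingRequirements` is hypothetical, and it relies only on PHP's built-in `extension_loaded()` and `PHP_VERSION_ID`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the package): returns the names of
 * required extensions that are not loaded in the current PHP runtime.
 *
 * @param list<string> $required
 * @return list<string>
 */
function missingRequirements(array $required = ['mbstring', 'zlib', 'dom', 'zip']): array
{
    return array_values(array_filter(
        $required,
        static fn (string $ext): bool => !extension_loaded($ext)
    ));
}

// PHP_VERSION_ID is 80300 for PHP 8.3.0.
if (PHP_VERSION_ID < 80300) {
    echo 'PHP 8.3 or later is required, found ' . PHP_VERSION . PHP_EOL;
}

$missing = missingRequirements();
echo $missing === []
    ? 'All required extensions are loaded.' . PHP_EOL
    : 'Missing extensions: ' . implode(', ', $missing) . PHP_EOL;
```

Running this once in CI or during deployment gives an early, readable failure instead of an undefined-function error at extraction time.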

Installation
------------

### Via Composer (recommended)

```
composer require rembish/text-at-any-cost
```

### Directly from GitHub

```
composer require rembish/text-at-any-cost:dev-master
```

> **Packagist**: submit your GitHub URL at [packagist.org](https://packagist.org/packages/submit) once to enable tagged releases (`composer require rembish/text-at-any-cost:^1.0`).

Usage
-----

### Unified facade (auto-detects by extension)

```
use TextAtAnyCost\TextExtractor;

$text = TextExtractor::fromFile('/path/to/document.docx');
```
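
Extension-based detection amounts to normalising the file extension and mapping it to a parser. The sketch below illustrates the idea only: the function name `parserForPath` is hypothetical, and the package's actual implementation may differ. The parser class names are the ones this README uses, returned here as plain strings so the snippet runs without the package installed.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical illustration of extension-based dispatch; not the package's code.
 * Returns the name of the parser class that would handle the given path.
 */
function parserForPath(string $path): string
{
    // pathinfo() returns the extension without the dot; compare case-insensitively.
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    return match ($extension) {
        'doc'         => 'TextAtAnyCost\Parser\DocParser',
        'pdf'         => 'TextAtAnyCost\Parser\PdfParser',
        'ppt'         => 'TextAtAnyCost\Parser\PptParser',
        'rtf'         => 'TextAtAnyCost\Parser\RtfParser',
        'docx', 'odt' => 'TextAtAnyCost\Parser\ZippedXmlParser',
        default       => throw new InvalidArgumentException(
            "Unsupported extension: '{$extension}'"
        ),
    };
}

echo parserForPath('/path/to/REPORT.DOCX'), PHP_EOL; // prints TextAtAnyCost\Parser\ZippedXmlParser
```

Because detection of this kind looks only at the file name, a misnamed file (for example a PDF saved as `.doc`) would be routed to the wrong parser; in that case calling the matching parser directly is the safer option.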

### Individual parsers

```
use TextAtAnyCost\Parser\DocParser;
use TextAtAnyCost\Parser\PdfParser;
use TextAtAnyCost\Parser\PptParser;
use TextAtAnyCost\Parser\RtfParser;
use TextAtAnyCost\Parser\ZippedXmlParser;

$text = (new DocParser())->extractText('report.doc');
$text = (new PdfParser())->extractText('report.pdf');
$text = (new PptParser())->extractText('slides.ppt');
$text = (new RtfParser())->extractText('memo.rtf');
$text = (new ZippedXmlParser())->extractDocx('report.docx');
$text = (new ZippedXmlParser())->extractOdt('report.odt');
```

### RTF from a string

```
use TextAtAnyCost\Parser\RtfParser;

$text = (new RtfParser())->parseString($rtfString);
```
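
When building an RTF string by hand, for example in a test, single quotes keep the RTF control words intact, because PHP does not interpret `\r` or `\f` inside single-quoted strings. The sketch below assumes the package has been installed via Composer and skips the parser call otherwise; the sample document is made up for illustration.

```php
<?php
declare(strict_types=1);

// A minimal, made-up RTF document. Single quotes matter: in a double-quoted
// string "\r" and "\f" would become control characters and corrupt the markup.
$rtfString = '{\rtf1\ansi\deff0 {\fonttbl{\f0 Arial;}}\f0 Hello, world!\par}';

// Only call the parser when the class is autoloadable (assumption: the
// Composer autoloader has been required earlier in the script).
if (class_exists('TextAtAnyCost\Parser\RtfParser')) {
    $text = (new \TextAtAnyCost\Parser\RtfParser())->parseString($rtfString);
    echo $text, PHP_EOL;
} else {
    echo 'RtfParser not available; install the package to run this example.', PHP_EOL;
}
```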

### RAR archives

```
use TextAtAnyCost\Archive\RarReader;
use TextAtAnyCost\Archive\RarWriter;

// List files
$reader = new RarReader();
$files  = $reader->getFileList('archive.rar');
$tree   = $reader->getFileTree('archive.rar');

// Create a stored (no-compression) archive
$writer = new RarWriter();
$writer->create('output.rar');
$writer->addDirectory('docs/reports');
$writer->addFile('/var/www/report.pdf', 'docs/reports');
$writer->close();
```

### Procedural wrappers (backward-compatible)

Each parser file still exports a procedural function for drop-in compatibility:

```
require 'vendor/autoload.php';

$text = doc2text('report.doc');
$text = pdf2text('report.pdf');
$text = ppt2text('slides.ppt');
$text = rtf2text('memo.rtf');
$text = docx2text('report.docx');
$text = odt2text('report.odt');
```

Error handling
--------------

All parsers throw `TextAtAnyCost\Exception\ParseException` (extends `RuntimeException`) on structural or I/O errors. `TextExtractor::fromFile()` additionally throws `\InvalidArgumentException` for unsupported extensions.

```
use TextAtAnyCost\Exception\ParseException;
use TextAtAnyCost\TextExtractor;

try {
    $text = TextExtractor::fromFile($path);
} catch (ParseException $e) {
    // file unreadable or format invalid
} catch (\InvalidArgumentException $e) {
    // extension not supported
}
```
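
When processing many files, it is often convenient to collect failures rather than stop at the first one. The sketch below shows one way to do that. `extractAll` is a hypothetical helper, and the extractor is passed in as a callable so the snippet runs without the package; with the package installed, `TextExtractor::fromFile(...)` could be passed instead. Catching `\RuntimeException` also covers `ParseException`, since it extends that class.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical batch helper (not part of the package).
 *
 * @param callable(string): string $extractor
 * @param list<string> $paths
 * @return array{texts: array<string, string>, errors: array<string, string>}
 */
function extractAll(callable $extractor, array $paths): array
{
    $texts = [];
    $errors = [];

    foreach ($paths as $path) {
        try {
            $texts[$path] = $extractor($path);
        } catch (RuntimeException | InvalidArgumentException $e) {
            // RuntimeException covers ParseException; InvalidArgumentException
            // covers unsupported extensions. Record the message and continue.
            $errors[$path] = $e->getMessage();
        }
    }

    return ['texts' => $texts, 'errors' => $errors];
}

// Stand-in extractor for demonstration: accepts only ".txt" paths.
$fake = static function (string $path): string {
    if (!str_ends_with($path, '.txt')) {
        throw new InvalidArgumentException("Unsupported: {$path}");
    }
    return "contents of {$path}";
};

$result = extractAll($fake, ['a.txt', 'b.xyz']);
print_r($result['errors']);
```

Keeping the two exception types in one `catch` is a simplification; separate branches make sense when unsupported files should be skipped silently but parse failures should be logged.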

Development
-----------

All development tasks run inside Docker — no local PHP installation required.

```
make install       # install Composer dependencies
make test          # run PHPUnit test suite
make stan          # PHPStan static analysis (level 8)
make cs            # check code style (PHP-CS-Fixer, dry-run)
make cs-fix        # apply code-style fixes
make lint          # PHP syntax check on all files
make test-coverage # HTML coverage report in coverage/
make shell         # interactive shell in the container
```

Architecture
------------

```
src/
├── Exception/
│   └── ParseException.php
├── Parser/
│   ├── CfbParser.php          # Abstract base: Windows Compound Binary File
│   ├── DocParser.php          # .doc  (extends CfbParser)
│   ├── PptParser.php          # .ppt  (extends CfbParser)
│   ├── PdfParser.php          # .pdf
│   ├── RtfParser.php          # .rtf
│   └── ZippedXmlParser.php    # .docx / .odt
├── Archive/
│   ├── RarReader.php          # RAR 4.x file listing
│   └── RarWriter.php          # RAR store-mode archive creation
└── TextExtractor.php          # Unified facade

```

Changelog / Bug fixes
---------------------

The following bugs from the original 2009 codebase were fixed during modernisation:

| File | Bug |
|------|-----|
| `stored-rar.php` | `getDateTime()`: inverted null-check always returned the current time, ignoring the provided timestamp |
| `stored-rar.php` | `getBytes()`: `strlen(0)` returns 1, not 0, so header size was off by one for zero-length fields |
| `pdf.php` | Single-quoted `'\n'`, `'\r'` etc. are literal two-character strings in PHP, so text output contained backslash-n instead of actual newlines |
| `pdf.php` | `FILE_BINARY` constant does not exist in PHP; removed (the flag was silently ignored) |
| `cfb.php` | Dead code after `continue`, including a debug `echo "@"` statement that would corrupt output |
| `cfb.php` | `while(...["type"] == 0) array_pop()` could loop forever on an empty array (PR #7) |
| `doc.php` | `html_entity_decode("&#x...;")` replaced with `mb_chr()` for correct multi-byte output (PR #9) |
| `zipped-xml.php` | `LIBXML_XINCLUDE` removed: it allowed XInclude elements in the XML to read arbitrary local files (XXE) |
| `zipped-xml.php` | Lossy `iconv("utf-8", "windows-1250")` conversion removed; output is now UTF-8 throughout |
| `rtf.php` | Stack underflow when `j < 0` or stack entry missing (PR #4) |

License
-------

BSD 3-Clause — see [LICENSE](LICENSE).
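
As a footnote to the changelog, three of the PHP behaviours behind the listed fixes are easy to reproduce in isolation. The snippet below is an illustration only, not code from the package; the `$stack` contents are made up.

```php
<?php
// Note: strict_types is deliberately not enabled, so strlen(0) coerces the
// integer to the string "0" instead of raising a TypeError.

// 1. Single-quoted escapes are literal: '\n' is a backslash followed by "n".
$literalLength = strlen('\n'); // 2
$newlineLength = strlen("\n"); // 1

// 2. strlen() of the integer 0 measures the string "0", which has one character.
$zeroLength = strlen(0);       // 1

// 3. Popping trailing entries needs an explicit emptiness check. Without it,
//    end() on an empty array returns false, the offset lookup yields null,
//    and the loose comparison null == 0 stays true forever.
$stack = [['type' => 0], ['type' => 0]];
while ($stack !== [] && end($stack)['type'] === 0) {
    array_pop($stack);
}

var_dump($literalLength, $newlineLength, $zeroLength, count($stack));
```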
