
lukemadhanga/php-document-parser
================================

A PHP parser for getting the text from a .doc, .docx, .rtf or .txt file

Latest release: 0.1.5. Requires PHP >=5.3.3.

[Source](https://github.com/LukeMadhanga/PHPDocumentParser) · [Packagist](https://packagist.org/packages/lukemadhanga/php-document-parser)

PHP DocumentParser
==================


---

Authors

- @facuonline
- Luke Madhanga @LukeMadhanga

---

This library is a good fit if you want users to upload Word documents to your content management system instead of forcing them to copy and paste. Supported file types are **.doc**, **.docx**, **.txt** and **.rtf**.

> composer require lukemadhanga/php-document-parser

You may also need to install the PHP Zip extension:

> sudo apt-get install php7.0-zip

The `Ubuntu` command above will vary depending on your version of PHP and the OS running on your server.
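To find out whether the Zip extension is already available before installing anything, a quick check along these lines may help. It relies only on PHP's built-in `extension_loaded` and `class_exists`; the helper name `zipSupportAvailable` is made up for this sketch, which is written for PHP 7+.

```php
<?php
// Hypothetical helper: reports whether PHP's Zip extension is usable.
function zipSupportAvailable(): bool
{
    return extension_loaded('zip') && class_exists('ZipArchive');
}

echo zipSupportAvailable()
    ? "Zip extension is loaded\n"
    : "Zip extension is missing; install it for your PHP version\n";
```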

---

Methods
-------


#### parseFromFile


*Parse a document from a file*

Arguments

string `$filename` The path to the file to parse

string `$mimetype` The mimetype of the file. It determines which decoding algorithm is used

**returns** string The text from the file
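Because `parseFromFile` takes a mimetype alongside the path, one way to obtain it is a lookup by file extension. The sketch below is only an illustration: `guessMimetype` is a hypothetical helper, and the strings are the commonly registered MIME types for each format. Whether the library expects exactly these strings is an assumption to verify against its source. The commented call at the end also assumes a class named `LukeMadhanga\DocumentParser` with static methods, which this README does not state.

```php
<?php
// Hypothetical helper: map a supported file extension to its usual MIME type.
function guessMimetype(string $filename): string
{
    $map = [
        'doc'  => 'application/msword',
        'docx' => 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
        'rtf'  => 'application/rtf',
        'txt'  => 'text/plain',
    ];
    // Compare case-insensitively so "report.DOCX" is handled too.
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    if (!isset($map[$ext])) {
        throw new InvalidArgumentException("Unsupported file type: .$ext");
    }
    return $map[$ext];
}

echo guessMimetype('uploads/notes.TXT'), "\n";

// Assumed call shape; the class name and static call are not confirmed by this README:
// $text = \LukeMadhanga\DocumentParser::parseFromFile($path, guessMimetype($path));
```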

---

#### parseFromString

*Parse a document from a string*

Arguments

string `$string` The contents of the file to parse

string `$mimetype` The mimetype of the file. It determines which decoding algorithm is used

**returns** string The text in the document
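When the file contents are already in memory (for example, read from an upload or a database), a thin wrapper can reject unsupported mimetypes before the parser is called. In this sketch the parser is passed in as a callable so that the example runs without the library installed. `extractText` and the stub parser are hypothetical, and the list of accepted mimetype strings is an assumption.

```php
<?php
// Hypothetical wrapper: validate the mimetype, then delegate to a parser callable.
function extractText(string $contents, string $mimetype, callable $parser): string
{
    $supported = [
        'application/msword',
        'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
        'application/rtf',
        'text/plain',
    ];
    if (!in_array($mimetype, $supported, true)) {
        throw new InvalidArgumentException("Unsupported mimetype: $mimetype");
    }
    // Trim surrounding whitespace from whatever the parser returns.
    return trim((string) $parser($contents, $mimetype));
}

// Stand-in parser so the sketch runs on its own. With the library installed,
// its parseFromString method would be passed here instead (assumed callable form).
$stub = function (string $string, string $mimetype): string {
    return $string;
};

echo extractText("  Hello from a text file\n", 'text/plain', $stub), "\n";
```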

---

Change log
----------


#### September 21 2019 (0.1.4)

**Better ODT Support**

Merged in PR#13 for better ODT support. Author: facuonline

#### August 1 2019 (0.1.3)

**PHPUnit**

Merged in PR#12 for PHPUnit testing. Author: facuonline

#### March 21 2019 (0.1.2)

**DOCX Handling**

Merged in PR#10 for better DOCX handling. Includes bug fixes for exception handling. Author: facuonline

#### September 13th 2017


**Added composer**

> composer require lukemadhanga/php-document-parser

#### April 29th 2016


**Improved .doc process**

The script to parse .doc files is unreliable: it breaks on complicated documents. I would suggest installing the `antiword` command line utility, which works almost perfectly for the vast majority of documents.
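As a rough illustration of the `antiword` suggestion, the sketch below shells out to `antiword` and returns whatever it prints. It assumes `antiword` is on the `PATH`, that `exec` is not disabled, and that the server is Unix-like (for the `2>/dev/null` redirect). Both function names are hypothetical and not part of this library.

```php
<?php
// Build the shell command for antiword, quoting the path safely.
function antiwordCommand(string $path): string
{
    return 'antiword ' . escapeshellarg($path);
}

// Hypothetical helper: run antiword and return its output, or null on failure
// (for example when antiword is not installed or the file cannot be read).
function docToTextWithAntiword(string $path): ?string
{
    exec(antiwordCommand($path) . ' 2>/dev/null', $lines, $exitCode);
    return $exitCode === 0 ? implode("\n", $lines) : null;
}

echo antiwordCommand('uploads/my report.doc'), "\n";
```

Quoting the path with `escapeshellarg` matters here because uploaded filenames are user-controlled and may contain spaces or shell metacharacters.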

