raphaelramosds/pdf-to-txt
=========================

A simple package for converting a PDF file into TXT

v1.0.0 | Apache-2.0 | PHP >= 8.3

[Source](https://github.com/raphaelramosds/pdf-to-txt) | [Packagist](https://packagist.org/packages/raphaelramosds/pdf-to-txt)

PdfToTxt
========

PdfToTxt is a simple package for converting a PDF file into TXT with PHP

Installation via Composer
-------------------------

```
composer require raphaelramosds/pdf-to-txt
```

Example
-------

Convert `file.pdf` into `file.txt` and save it in the `path/to/txt` directory:

```
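// Arguments: source PDF, output directory, output file name (without extension)
$ptt = new PdfToTxt('path/to/file.pdf', 'path/to/txt', 'file');
// Writes path/to/txt/file.txt
$ptt->convert();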
```

Web
---

You can also use it as a web application: clone the repository, build the Docker image, and run the container with Docker Compose.

 [![Web page to convert PDF to TXT using PdfToTxt](./docs/demo.png)](./docs/demo.png)
 *Web interface for converting PDF files to TXT*

Dependencies
------------

This package only runs on Linux. You also need to install the following dependencies.

### Tesseract OCR package

```
# Install Tesseract OCR and its Portuguese (PT-BR) language data
sudo apt install tesseract-ocr tesseract-ocr-por
```

### ImageMagick

```
# Install
sudo apt install imagemagick php-imagick

# Enable imagick extension
sudo phpenmod imagick

# (Optional) Check if it is enabled
php -m | grep imagick
```
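
The requirements above can also be checked from PHP before running a conversion. The following is a minimal sketch, not part of the package: `missingRequirements` is a hypothetical helper name, and the check assumes that `tesseract` is expected on the `PATH`.

```php
<?php
// Hypothetical helper, not part of the package: lists the unmet requirements.
function missingRequirements(string $osFamily, bool $hasImagick, bool $hasTesseract): array
{
    $missing = [];
    if ($osFamily !== 'Linux') {
        $missing[] = 'a Linux environment';
    }
    if (!$hasImagick) {
        $missing[] = 'the imagick PHP extension';
    }
    if (!$hasTesseract) {
        $missing[] = 'the tesseract binary on PATH';
    }
    return $missing;
}

// Probe the current machine. `command -v` prints the binary's path when it is found.
$tesseractPath = shell_exec('command -v tesseract');
$missing = missingRequirements(
    PHP_OS_FAMILY,
    extension_loaded('imagick'),
    is_string($tesseractPath) && trim($tesseractPath) !== ''
);

echo $missing === []
    ? "All requirements met\n"
    : 'Missing: ' . implode(', ', $missing) . "\n";
```

Keeping the decision logic in a function that takes plain values makes it easy to test without touching the host system.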

How does it work?
-----------------

It uses ImageMagick to convert each PDF page into a JPG image, extracts the text from each image with Tesseract OCR, and compiles the results into a single TXT file.
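
The pipeline described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the package's actual implementation: the function names, the 300 DPI resolution, the `page-NNN.jpg` naming scheme, and the default `por` language are all hypothetical choices. It relies on the `imagick` extension and on the `tesseract` CLI, which prints to standard output when the output base is `stdout`.

```php
<?php
// Hypothetical helpers for illustration; this is not the package's code.

/** Build the Tesseract command that prints the OCR text of one image to stdout. */
function tesseractCommand(string $imagePath, string $lang = 'por'): string
{
    return sprintf(
        'tesseract %s stdout -l %s',
        escapeshellarg($imagePath),
        escapeshellarg($lang)
    );
}

/** Render every PDF page to a JPG in $workDir and return the paths in page order. */
function pdfToJpgs(string $pdfPath, string $workDir): array
{
    $pdf = new Imagick();
    $pdf->setResolution(300, 300); // assumed DPI; must be set before reading the PDF
    $pdf->readImage($pdfPath);

    $paths = [];
    $pageCount = $pdf->getNumberImages();
    for ($i = 0; $i < $pageCount; $i++) {
        $pdf->setIteratorIndex($i);   // select page $i
        $pdf->setImageFormat('jpeg');
        $path = sprintf('%s/page-%03d.jpg', rtrim($workDir, '/'), $i + 1);
        $pdf->writeImage($path);      // writes only the selected page
        $paths[] = $path;
    }
    $pdf->clear();

    return $paths;
}

/** OCR each page image and concatenate the results into one TXT file. */
function ocrToTxt(array $imagePaths, string $txtPath, string $lang = 'por'): void
{
    $text = '';
    foreach ($imagePaths as $imagePath) {
        $text .= (string) shell_exec(tesseractCommand($imagePath, $lang)) . "\n";
    }
    file_put_contents($txtPath, $text);
}
```

With these helpers, a conversion would look like `ocrToTxt(pdfToJpgs('path/to/file.pdf', sys_get_temp_dir()), 'path/to/txt/file.txt');`.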

### Ghostscript support

While some PDF files use standard fonts that map easily to text, others rely on custom fonts, which often store characters as vector graphics. In such cases, OCR is needed to extract readable content. I plan to add Ghostscript support to this package in the future, as an alternative way to handle PDFs without relying solely on OCR.

In the meantime, you can use the following Ghostscript command to convert a PDF into a plain text file:

```
gs -sDEVICE=txtwrite -o file.txt file.pdf
```

Before using this approach, check which fonts the PDF uses. You can do that with the following command:

```
gs -DPDFINFO file.pdf
```
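
Until Ghostscript support lands in the package, the `txtwrite` command shown above can be wrapped in PHP. The sketch below is an assumption about how that might look, not a planned API: both function names are hypothetical, and it expects `gs` to be available on the `PATH`.

```php
<?php
// Hypothetical helpers, not part of the package.

/** Build the Ghostscript command that extracts a PDF's text into a TXT file. */
function ghostscriptTxtCommand(string $pdfPath, string $txtPath): string
{
    return sprintf(
        'gs -sDEVICE=txtwrite -o %s %s',
        escapeshellarg($txtPath),
        escapeshellarg($pdfPath)
    );
}

/** Run the command and report whether Ghostscript exited successfully. */
function pdfToTxtWithGhostscript(string $pdfPath, string $txtPath): bool
{
    exec(ghostscriptTxtCommand($pdfPath, $txtPath), $output, $exitCode);

    return $exitCode === 0;
}
```

Escaping both paths with `escapeshellarg` keeps file names with spaces or shell metacharacters from breaking the command.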

Tests
-----

Unit tests are written with [PHPUnit](https://phpunit.de/). Run them with:

```
./vendor/bin/phpunit tests
```
