joest8/pdfinterpreter
=====================

Version v1.0 · License: MIT · Requires PHP ^8.2

[Source](https://github.com/joest8/pdfinterpreter) · [Packagist](https://packagist.org/packages/joest8/pdfinterpreter)

Pdf Interpreter
===============

Introduction
------------

This class converts one or more PDF files, whether text-based or image-based (such as scans), into a PHP array. Extraction is controlled by user-defined templates of regular expressions, so you decide which data is pulled out and how the output is structured.
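
To make the idea of a regex template concrete, here is a minimal PHP sketch of template-driven extraction. It is not this package's API: the `applyTemplate()` function and the "field name => pattern" template format are hypothetical, and the sample text merely stands in for whatever the PDF-to-text or OCR step produces.

```php
<?php
// Hypothetical illustration of template-driven extraction; NOT the
// joest8/pdfinterpreter API. Here a "template" is simply a map of
// field name => regular expression with one capture group.

/**
 * Apply each pattern of a template to already-extracted PDF text and
 * return the first captured value per field (null if nothing matched).
 *
 * @param array<string, string> $template field name => PCRE pattern
 * @return array<string, string|null>
 */
function applyTemplate(array $template, string $text): array
{
    $result = [];
    foreach ($template as $field => $pattern) {
        // preg_match returns 1 on a match, 0 on no match, false on error.
        if (preg_match($pattern, $text, $matches) === 1 && isset($matches[1])) {
            $result[$field] = trim($matches[1]);
        } else {
            $result[$field] = null;
        }
    }
    return $result;
}

// Sample text, as it might come out of a PDF-to-text or OCR step.
$text = "Invoice No: 2023-0042\nDate: 05.11.2023\nTotal: 119.00 EUR";

$invoiceTemplate = [
    'invoice_number' => '/Invoice No:\s*(\S+)/',
    'date'           => '/Date:\s*([\d.]+)/',
    'total'          => '/Total:\s*([\d.,]+)/',
    'iban'           => '/IBAN:\s*([A-Z0-9 ]+)/',
];

print_r(applyTemplate($invoiceTemplate, $text));
```

In this sketch unmatched fields come back as `null` instead of being omitted, which keeps every result array the same shape; the package's real templates may handle missing matches differently.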

Table of Contents
-----------------

This README is divided into several sections:

- [Installation](#installation)
    - [Console Applications](#console-applications)
    - [Automated installation](#automated-installation)
    - [Manual installation with Homebrew](#manual-installation-with-homebrew)
    - [Manual installation of Tesseract Language Files](#manual-installation-of-tesseract-language-files)
- [Usage](#usage)
    - [Create Object](#create-object)
    - [Get Sample Output](#get-sample-output)
    - [Set new Template](#set-new-template)
    - [Add pattern to template](#add-pattern-to-template)
    - [Get template](#get-template)
    - [Delete template](#delete-template)
    - [Convert Files from Folder](#convert-files-from-folder)
    - [Convert File](#convert-file)

Installation
------------

```
composer require joest8/pdfinterpreter
```

### Console Applications

The class relies on the following command-line applications, which you need to install separately:

1. **Poppler**: converts PDF to text and reports the number of pages in a file.
2. **Tesseract**: performs OCR, reading the text from PNG images.
3. **ImageMagick**: converts PDF pages to PNG images.

Make sure a package manager (such as Homebrew) is installed on your system.

### Automated installation

From the source folder, run the following script to install all dependencies and the Tesseract language files automatically:

```
php install/install_dependencies.php
```

### Manual installation with Homebrew

If Homebrew is installed, run the following command to install the required packages:

```
brew install poppler tesseract imagemagick
```
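
After installing, whether with the script or with Homebrew, you can check that the tools are reachable on your `PATH`. The PHP sketch below is not part of the package. The binary names it looks for (`pdftotext` and `pdfinfo` from Poppler, `tesseract`, and `magick` or the older `convert` from ImageMagick) are assumptions based on what those projects ship, not on which commands this class actually invokes. It also assumes a Unix-like system; on Windows you would additionally have to try the `.exe` suffix.

```php
<?php
// Sketch only: checks that the external tools are reachable on PATH.
// Binary names are assumptions based on the upstream projects, not on
// the commands joest8/pdfinterpreter actually runs. Unix-like systems only.

/** Return true if an executable file called $name exists in a PATH directory. */
function commandExists(string $name): bool
{
    $path = getenv('PATH');
    if ($path === false || $path === '') {
        return false;
    }
    foreach (explode(PATH_SEPARATOR, $path) as $dir) {
        // An empty PATH entry conventionally means the current directory.
        $dir = $dir === '' ? '.' : $dir;
        $candidate = $dir . DIRECTORY_SEPARATOR . $name;
        if (is_file($candidate) && is_executable($candidate)) {
            return true;
        }
    }
    return false;
}

// Each entry passes if at least one of the listed binaries is found.
$checks = [
    'Poppler: pdftotext'             => ['pdftotext'],
    'Poppler: pdfinfo'               => ['pdfinfo'],
    'Tesseract: tesseract'           => ['tesseract'],
    'ImageMagick: magick or convert' => ['magick', 'convert'],
];

foreach ($checks as $label => $alternatives) {
    $found = false;
    foreach ($alternatives as $binary) {
        if (commandExists($binary)) {
            $found = true;
            break;
        }
    }
    echo ($found ? '[ok]      ' : '[missing] ') . $label . PHP_EOL;
}
```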

### Manual installation of Tesseract Language Files

You also need to install the Tesseract language files for the languages your documents use. The available languages are listed at [https://github.com/tesseract-ocr/tessdata\_best/](https://github.com/tesseract-ocr/tessdata_best/).

Download the `.traineddata` files you need and place them in Tesseract's data directory. To find that directory, run:

```
tesseract --list-langs
```
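
If you want to script this check, the following PHP sketch parses the output of `tesseract --list-langs`. It assumes a recent Tesseract release, where the first line names the data directory in double quotes; older releases may print only a header and the language list, in which case `dir` is `null`. The function name `parseListLangs()` is hypothetical, and the sample output below is illustrative only; your path and languages will differ.

```php
<?php
// Hypothetical helper: parse the output of `tesseract --list-langs`.
// Assumes a header such as
//   List of available languages in "/some/path/tessdata/" (3):
// followed by one language code per line. Older versions may omit the path.

/**
 * @return array{dir: ?string, languages: list<string>}
 */
function parseListLangs(string $output): array
{
    $dir = null;
    $languages = [];
    foreach (preg_split('/\R/', trim($output)) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        if (str_starts_with($line, 'List of available languages')) {
            // Capture the quoted tessdata directory, if the header has one.
            if (preg_match('/"([^"]+)"/', $line, $m) === 1) {
                $dir = $m[1];
            }
            continue;
        }
        $languages[] = $line;
    }
    return ['dir' => $dir, 'languages' => $languages];
}

// Illustrative sample output; the real path depends on your installation.
$sample = "List of available languages in \"/opt/homebrew/share/tessdata/\" (3):\neng\nosd\nsnum\n";
$info = parseListLangs($sample);
echo 'tessdata directory: ' . ($info['dir'] ?? 'unknown') . PHP_EOL;
echo 'German installed: ' . (in_array('deu', $info['languages'], true) ? 'yes' : 'no') . PHP_EOL;

// Against the real binary (requires Tesseract on PATH). Note that 2>&1 also
// merges any warnings on stderr, which this sketch would list as languages:
// $info = parseListLangs((string) shell_exec('tesseract --list-langs 2>&1'));
```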

Usage
-----

### Create Object

```
