oneofftech/parse-client
=======================

Parse PDF documents while keeping their structure.

Latest release: v0.2.0. License: MIT. Requires PHP ^8.2.

OneOffTech Parse client
=======================

[![Latest Version on Packagist](https://camo.githubusercontent.com/d012cd517e8d74b0a6d8ed7fde5fc3ebd99e18f716ef371ffa071dec73f9f462/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f762f6f6e656f6666746563682f70617273652d636c69656e742e7376673f7374796c653d666c61742d737175617265)](https://packagist.org/packages/oneofftech/parse-client)[![Tests](https://github.com/OneOffTech/parse-client/actions/workflows/run-tests.yml/badge.svg)](https://github.com/OneOffTech/parse-client/actions/workflows/run-tests.yml)[![Total Downloads](https://camo.githubusercontent.com/8c395914f6b9bfe158d2e7b579e3887df6161b440eb6b5e8bf2dc6b873e7e6ee/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f64742f6f6e656f6666746563682f70617273652d636c69656e742e7376673f7374796c653d666c61742d737175617265)](https://packagist.org/packages/oneofftech/parse-client)

Parse client is a library for interacting with the [OneOffTech Parse](https://parse.oneofftech.de) service. OneOffTech Parse extracts text from PDF files while preserving the [structure of the document](#document-structure), which improves interaction with Large Language Models (LLMs).

OneOffTech Parse is based on the [Parxy extractor](https://github.com/OneOffTech/parxy). The client can also connect to self-hosted instances of [Parxy](https://github.com/OneOffTech/parxy).

Note

The Parse client package is under development and is not ready for production use.

Installation
------------

You can install the package via Composer:

```
composer require oneofftech/parse-client
```

Usage
-----

The Parse client can connect either to a self-hosted instance of the [Parxy](https://github.com/OneOffTech/parxy) extractor service or to the cloud-hosted [OneOffTech Parse](https://parse.oneofftech.de) service.

### Use with self-hosted instance

Before proceeding, you need a running instance of [Parxy](https://github.com/OneOffTech/parxy). Once the instance is running, instantiate the connector by passing the URL that the extractor service is listening on.

```
use OneOffTech\Parse\Client\Connectors\ParseConnector;

$client = new ParseConnector(baseUrl: "http://localhost:5000");

/** @var \OneOffTech\Parse\Client\Dto\DocumentDto */
$document = $client->parse("https://domain.internal/document.pdf");
```

Note

- The URL of the document must be accessible without authentication.
- Documents are downloaded only for the duration of processing; the file is deleted immediately afterwards.
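
Because the document URL must be reachable without authentication, it can help to reject obviously unsuitable URLs before sending a request. The following is a minimal sketch, not part of the Parse client: `isPlainHttpUrl` is a hypothetical helper that only checks the shape of the URL (absolute `http`/`https`, no embedded credentials). It cannot tell whether the server itself demands a login.

```php
<?php

/**
 * Hypothetical helper (not part of parse-client): returns true when $url is an
 * absolute http(s) URL that carries no embedded credentials.
 */
function isPlainHttpUrl(string $url): bool
{
    $parts = parse_url($url);

    // parse_url() returns false for seriously malformed URLs.
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false;
    }

    // Only http and https are accepted in this sketch.
    if (!in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
        return false;
    }

    // URLs of the form user:pass@host imply authentication, which the note above rules out.
    return !isset($parts['user']) && !isset($parts['pass']);
}
```

A caller could run this check first and only pass the URL to `$client->parse(...)` when it returns `true`.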

### Use the cloud hosted service

Important

The cloud-hosted service is currently in private beta. [Drop us a message](https://oneofftech.de/) to request access.

Go to [parse.oneofftech.de](https://parse.oneofftech.de) and obtain an access token. Then instantiate the client with the token and provide the URL of a PDF document.

```
use OneOffTech\Parse\Client\Connectors\ParseConnector;

$client = new ParseConnector("token");

/** @var \OneOffTech\Parse\Client\Dto\DocumentDto */
$document = $client->parse("https://domain.internal/document.pdf");
```

Note

- The URL of the document must be accessible without authentication.
- Documents are downloaded only for the duration of processing; the file is deleted immediately afterwards.
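
The example above hard-codes the token for brevity. One common alternative is to read it from the environment. The sketch below is an assumption rather than something the package provides: `parseTokenFromEnv` is a hypothetical helper, and the variable name `PARSE_ACCESS_TOKEN` is made up for illustration.

```php
<?php

/**
 * Hypothetical helper (not part of parse-client): reads the access token from
 * an environment variable. The default name PARSE_ACCESS_TOKEN is an assumption.
 */
function parseTokenFromEnv(string $name = 'PARSE_ACCESS_TOKEN'): string
{
    // getenv() returns false when the variable is not defined.
    $token = getenv($name);

    if ($token === false || trim($token) === '') {
        throw new RuntimeException("Environment variable {$name} is not set.");
    }

    return trim($token);
}
```

The result can then be passed where the examples use the literal `"token"`, for instance `new ParseConnector(parseTokenFromEnv())`.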

### Specify the preferred extraction method

The Parse service supports several processors: [`pymupdf`](https://github.com/pymupdf/PyMuPDF), [`pdfact`](https://github.com/data-house/pdfact), [`unstructured`](https://unstructured.io/) and [`llamaparse`](https://docs.cloud.llamaindex.ai/llamaparse/getting_started). You can specify the preferred processor for each request.

```
use OneOffTech\Parse\Client\ParseOption;
use OneOffTech\Parse\Client\DocumentProcessor;
use OneOffTech\Parse\Client\Connectors\ParseConnector;

$client = new ParseConnector("token");

/** @var \OneOffTech\Parse\Client\Dto\DocumentDto */
$document = $client->parse(
    url: "https://domain.internal/document.pdf",
    options: new ParseOption(DocumentProcessor::PYMUPDF)
);
```

### PDFAct vs PyMuPDF

PDFAct offers more flexibility than PyMuPDF. Evaluate which extraction method is best suited to your application. Here is a short comparison of the two.

| Feature | PDFAct | PyMuPDF |
|---|---|---|
| Text extraction | ✅ | ✅ |
| Pagination | ✅ | ✅ |
| Headings identification | ✅ | - |
| Text styles (e.g. bold or italic) | ✅ | - |
| Page header | ✅ | - |
| Page footer | ✅ | - |

Document structure
------------------

Parse is designed to preserve the document's structure, so the content is returned as a hierarchy.

```
Document
 ├─Page
 │  ├─Text (category: heading)
 │  └─Text (category: body)
 └─Page
    ├─Text (category: heading)
    └─Text (category: body)

```

For a more in-depth explanation of the structure see [Parse Document Model](https://github.com/OneOffTech/parse-document-model-python).
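
To illustrate how such a hierarchy can be traversed, here is a minimal sketch that collects the headings of each page. It works on plain PHP arrays, not on the `DocumentDto` returned by the client: the keys `pages`, `texts`, `category` and `content`, as well as the function `headingsByPage`, are assumptions chosen to mirror the tree above.

```php
<?php

/**
 * Collects the text of every heading, page by page.
 * The array shape ('pages', 'texts', 'category', 'content') is illustrative
 * only and is not the DocumentDto API.
 *
 * @return array<int, string[]> headings keyed by 1-based page number
 */
function headingsByPage(array $document): array
{
    $result = [];

    foreach ($document['pages'] ?? [] as $index => $page) {
        $headings = [];

        foreach ($page['texts'] ?? [] as $text) {
            // Keep only the nodes categorised as headings.
            if (($text['category'] ?? null) === 'heading') {
                $headings[] = $text['content'] ?? '';
            }
        }

        $result[$index + 1] = $headings;
    }

    return $result;
}

// Sample data shaped like the tree above: two pages, one heading and one body text each.
$document = [
    'pages' => [
        ['texts' => [
            ['category' => 'heading', 'content' => 'Introduction'],
            ['category' => 'body', 'content' => 'First paragraph.'],
        ]],
        ['texts' => [
            ['category' => 'heading', 'content' => 'Methods'],
            ['category' => 'body', 'content' => 'Second paragraph.'],
        ]],
    ],
];

$headings = headingsByPage($document);
```

Keeping the page level in the result means a heading can still be traced back to the page it came from, which is the point of a structure-preserving extraction.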

Testing
-------

Parse client is tested using [Pest](https://pestphp.com/). Tests run on every commit and pull request.

To execute the test suite run:

```
composer test
```

Changelog
---------

Please see [CHANGELOG](CHANGELOG.md) for more information on what has changed recently.

Contributing
------------

Thank you for considering contributing to the Parse client! The contribution guide can be found in the [CONTRIBUTING.md](./.github/CONTRIBUTING.md) file.

Security Vulnerabilities
------------------------

Please review [our security policy](./.github/SECURITY.md) on how to report security vulnerabilities.

Credits
-------

- [OneOffTech](https://github.com/OneOffTech)
- [All Contributors](../../contributors)

Supporters
----------

The project is provided and supported by [OneOff-Tech (UG)](https://oneofftech.de).

[![](https://raw.githubusercontent.com/OneOffTech/.github/main/art/oneofftech-logo.svg)](https://oneofftech.de)

License
-------

The MIT License (MIT). Please see [License File](LICENSE.md) for more information.
