iamgerwin/php-pdf-to-markdown-parser
====================================

A lightweight PHP library to convert PDF documents into clean, structured Markdown. Supports text extraction, headings, lists, tables, diagrams and code blocks for easier content reuse and publishing.

v0.0.1 (9mo ago) · MIT · PHP ^8.3 · CI passing

PHP PDF to Markdown Parser
==========================

[![Tests](https://github.com/iamgerwin/php-pdf-to-markdown-parser/actions/workflows/tests.yml/badge.svg)](https://github.com/iamgerwin/php-pdf-to-markdown-parser/actions/workflows/tests.yml)[![Latest Version on Packagist](https://camo.githubusercontent.com/bf583d91807cf038a160e0b389f1c4d23ddd44fb084534f1ceb9c5bf2edddaac/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f762f69616d67657277696e2f7068702d7064662d746f2d6d61726b646f776e2d7061727365722e7376673f7374796c653d666c61742d737175617265)](https://packagist.org/packages/iamgerwin/php-pdf-to-markdown-parser)[![Total Downloads](https://camo.githubusercontent.com/df1d2d06e8aa0b59e2a375fad60cf527df1011aedb14101f01779805cf4fed1d/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f64742f69616d67657277696e2f7068702d7064662d746f2d6d61726b646f776e2d7061727365722e7376673f7374796c653d666c61742d737175617265)](https://packagist.org/packages/iamgerwin/php-pdf-to-markdown-parser)

Because sometimes PDFs just need to chill out and become Markdown.

Features
--------

- 📝 **Text Extraction with Styling** - Preserves headings, bold, italic, and strikethrough formatting
- 📊 **Table Parsing** - Extracts tables with proper headers and body formatting
- 🎨 **Diagram Support** - Converts diagrams to Mermaid and dbdiagram.io formats
    - Flowcharts
    - Sequence diagrams
    - Entity Relationship Diagrams (ERD)
    - Gantt charts
    - Class diagrams
    - State diagrams
    - Pie charts
- 📋 **List Detection** - Automatically converts bullet points and numbered lists
- 💻 **Code Block Recognition** - Identifies and formats code snippets
- 🚀 **PHP 8.3 Compatible** - Built with modern PHP features
- ✅ **PSR-12 Compliant** - Follows PHP coding standards

Installation
------------

You can install the package via composer:

```bash
composer require iamgerwin/php-pdf-to-markdown-parser
```

Usage
-----

### Basic Usage

```php
use Iamgerwin\PdfToMarkdownParser\PdfToMarkdownParser;

// Composer autoloader (path assumes the script sits in the project root)
require __DIR__ . '/vendor/autoload.php';

$parser = new PdfToMarkdownParser();

// Parse a PDF file from disk
$markdown = $parser->parseFile('path/to/document.pdf');

// Or parse PDF bytes that are already in memory
$pdfContent = file_get_contents('path/to/document.pdf');
$markdown = $parser->parseContent($pdfContent);

// Output the Markdown
echo $markdown;
```
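
A common next step is writing the result to disk. The sketch below is not part of the library: `saveMarkdown()` is a hypothetical helper that accepts any converter callable, so it can be tried without the package or a PDF at hand.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the library): converts one PDF with the
 * given converter and writes the Markdown next to it as "<name>.md".
 *
 * @param callable(string): string $convert e.g. $parser->parseFile(...)
 */
function saveMarkdown(string $pdfPath, callable $convert): string
{
    $markdown = $convert($pdfPath);

    // Swap a trailing ".pdf" (any case) for ".md"; append ".md" otherwise.
    $target = preg_replace('/\.pdf$/i', '', $pdfPath) . '.md';

    if (file_put_contents($target, $markdown) === false) {
        throw new RuntimeException("Could not write {$target}");
    }

    return $target;
}
```

With a parser instance as in Basic Usage, this could be called as `saveMarkdown('path/to/document.pdf', $parser->parseFile(...))`, using PHP's first-class callable syntax.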

### Working with Tables

The parser automatically detects and converts tables in your PDF:

```markdown
| Header 1 | Header 2 | Header 3 |
| --- | --- | --- |
| Row 1 Col 1 | Row 1 Col 2 | Row 1 Col 3 |
| Row 2 Col 1 | Row 2 Col 2 | Row 2 Col 3 |
```
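
To make the output shape concrete, here is a minimal sketch that turns tab-separated rows into a table like the one above. It is an illustration only, not the library's `TableExtractor`: `rowsToMarkdownTable()` is a hypothetical name, the first row is assumed to be the header, and pipe characters inside cells are not escaped.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch (not the library's implementation): renders rows of
 * tab-separated text as a Markdown table, treating the first row as headers.
 */
function rowsToMarkdownTable(string $text): string
{
    // Keep non-empty lines, then split each line into trimmed cells on tabs.
    $lines = array_filter(explode("\n", $text), fn (string $l): bool => trim($l) !== '');
    $rows = array_map(
        fn (string $l): array => array_map('trim', explode("\t", $l)),
        array_values($lines)
    );

    if ($rows === []) {
        return '';
    }

    $render = fn (array $cells): string => '| ' . implode(' | ', $cells) . ' |';

    $header = array_shift($rows);
    $out = [$render($header), $render(array_fill(0, count($header), '---'))];

    foreach ($rows as $row) {
        // Pad or cut so every row has as many cells as the header.
        $row = array_slice(array_pad($row, count($header), ''), 0, count($header));
        $out[] = $render($row);
    }

    return implode("\n", $out);
}

echo rowsToMarkdownTable("Header 1\tHeader 2\nRow 1 Col 1\tRow 1 Col 2\n"), "\n";
```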

### Diagram Extraction

Diagrams are automatically detected and converted to appropriate formats:

**Mermaid Flowcharts:**

````markdown
```mermaid
flowchart TD
    Start --> Process --> End
```
````

**ERD (dbdiagram.io format):**

````markdown
```dbdiagram
Table users {
  id int
  name varchar
  email varchar
}
```
````

**Sequence Diagrams:**

````markdown
```mermaid
sequenceDiagram
    User->>System: Request
    System->>Database: Query
    Database->>System: Response
    System->>User: Result
```
````

### Text Styling

The parser preserves text styling from PDFs:

- Headings (H1-H6) based on font size and formatting
- **Bold text**
- *Italic text*
- ~~Strikethrough text~~
- Lists (bulleted and numbered)
- Code blocks

Advanced Configuration
----------------------

### Custom Extractors

You can extend the parser with custom extractors:

```php
use Iamgerwin\PdfToMarkdownParser\PdfToMarkdownParser;
use Iamgerwin\PdfToMarkdownParser\Extractors\TextExtractor;
use Iamgerwin\PdfToMarkdownParser\Extractors\TableExtractor;
use Iamgerwin\PdfToMarkdownParser\Extractors\DiagramExtractor;

$parser = new PdfToMarkdownParser();

// The parser uses these extractors internally:
// - TextExtractor: Handles text and styling
// - TableExtractor: Processes tables
// - DiagramExtractor: Converts diagrams

```

Testing
-------

Run the test suite:

```bash
composer test
```

Run tests with coverage:

```bash
composer test-coverage
```

Run PHPStan static analysis:

```bash
composer analyse
```

Format code with Laravel Pint:

```bash
composer format
```

Requirements
------------

- PHP 8.3 or higher
- ext-mbstring
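
A quick runtime check for both requirements, using only built-in PHP functions. `missingRequirements()` is a hypothetical helper name, not something the package ships; it is mainly useful for diagnosing an environment where the package fails to load.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: lists which of the README's requirements
 * (PHP 8.3+, ext-mbstring) the current runtime does not meet.
 *
 * @return list<string>
 */
function missingRequirements(string $phpVersion = PHP_VERSION): array
{
    $missing = [];

    if (version_compare($phpVersion, '8.3.0', '<')) {
        $missing[] = "PHP 8.3 or higher (running {$phpVersion})";
    }

    if (!extension_loaded('mbstring')) {
        $missing[] = 'ext-mbstring';
    }

    return $missing;
}

// Prints one line per unmet requirement; prints nothing when all are met.
foreach (missingRequirements() as $item) {
    echo "Missing requirement: {$item}\n";
}
```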

How It Works
------------

The parser uses a multi-stage extraction process:

1. **PDF Parsing** - Uses the robust smalot/pdfparser library to extract raw content
2. **Text Analysis** - Identifies text styling, headings, and formatting patterns
3. **Table Detection** - Recognizes table structures (pipe, tab, or space-separated)
4. **Diagram Recognition** - Detects diagram patterns and converts to Mermaid/dbdiagram formats
5. **Markdown Generation** - Combines all elements into properly formatted Markdown
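
Step 4 can be pictured as keyword matching on a block of extracted text. The sketch below is an assumption about how such detection might look, not the library's `DiagramExtractor`: `detectDiagramType()` and `wrapDiagram()` are hypothetical names, and the patterns are the standard Mermaid diagram declarations plus dbdiagram.io's `Table name {` form.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch: guesses the fence language for a block of text by
 * looking at its first line. Returns null when nothing matches.
 */
function detectDiagramType(string $block): ?string
{
    $first = trim(explode("\n", trim($block))[0]);

    // Mermaid diagrams open with a type declaration such as "flowchart TD".
    $mermaid = '/^(flowchart|graph|sequenceDiagram|erDiagram|gantt|classDiagram|stateDiagram(-v2)?|pie)\b/';
    if (preg_match($mermaid, $first) === 1) {
        return 'mermaid';
    }

    // dbdiagram.io tables look like "Table users {".
    if (preg_match('/^Table\s+\w+\s*\{/', $first) === 1) {
        return 'dbdiagram';
    }

    return null;
}

/** Wraps a recognised diagram in a fenced block; other text is returned unchanged. */
function wrapDiagram(string $block): string
{
    $type = detectDiagramType($block);
    $fence = str_repeat('`', 3);

    return $type === null
        ? $block
        : $fence . $type . "\n" . trim($block) . "\n" . $fence;
}

echo wrapDiagram("flowchart TD\n    Start --> Process --> End"), "\n";
```

Matching only on the first line keeps the sketch short, but it also shows why pattern matching can miss or misclassify diagrams: ordinary prose that happens to start with the word "graph" would be treated as Mermaid here.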

Limitations
-----------

- **Images**: Currently, images are not extracted (coming in future versions)
- **Complex Layouts**: Multi-column layouts may require manual adjustment
- **Font Styling**: Bold/italic detection is basic because font metadata parsing is limited
- **Diagrams**: Pattern matching may not catch all diagram types

Changelog
---------

Please see [CHANGELOG](CHANGELOG.md) for more information on what has changed recently.

Contributing
------------

Contributions are welcome! Please feel free to submit a Pull Request.

Security
--------

If you discover any security-related issues, please email the maintainer instead of using the issue tracker.

Credits
-------

- [iamgerwin](https://github.com/iamgerwin)

License
-------

The MIT License (MIT). Please see [License File](LICENSE.md) for more information.

Acknowledgments
---------------

Built with inspiration from the PHP community and the need to make PDF content more accessible and reusable. Special thanks to the maintainers of [smalot/pdfparser](https://github.com/smalot/pdfparser) for their excellent PDF parsing library.
