pandoc-php/pandoc
=================

A native PHP 8.4 port of the Pandoc document converter.

Version 4.2.0 · License GPL-2.0-or-later · Requires PHP >= 8.4


[Source](https://github.com/snorky22/php-pandoc) · [Packagist](https://packagist.org/packages/pandoc-php/pandoc) · [Docs](https://github.com/pandoc-php/pandoc)


Pandoc PHP
==========

A native PHP 8.4 port of the [Pandoc](https://pandoc.org/) document converter. This library converts Word (`.docx`), Excel (`.xlsx`), PowerPoint (`.pptx`), HTML (`.html`), Markdown (`.md`), Jupyter (`.ipynb`), and BibTeX (`.bib`) documents to LaTeX without requiring the system-level Pandoc binary.

Features
--------

- **Native PHP 8.4**: Uses `readonly` classes, Enums, and property hooks.
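- **AST-Centric Architecture**: Readers parse input into a document tree modelled on Pandoc's Abstract Syntax Tree, and the writer renders that tree, so every input format shares the same LaTeX output path.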
- **Modular Reader System**: Factory pattern and `ReaderInterface` for easy format expansion.
- **Deep Docx Parsing**: Paragraphs, headers, tables, lists, images, bold/italic/underline/strikeout, superscript/subscript, text and background colors, hyperlinks (external `\href`/`\url`, internal `\hyperref`), footnotes and endnotes (`\footnote`), automatic run-merging (consecutive runs with identical styling are collapsed into one command), and black-color suppression (spurious `\textcolor[HTML]{000000}` commands are dropped).
- **Excel (XLSX)**: All sheets as booktabs tables, shared strings, bold/italic, embedded images, chart extraction (JSON metadata + CSV data for Chart.js), per-sheet CSV export with locale-aware separators, and a `metadata.json` summary of document locale.
- **PowerPoint (PPTX)**: Each slide becomes a `slide` environment, and all slides are wrapped in a `slider` environment. Images, embedded videos (`\begin{video}...\end{video}`), and audio (`\begin{audio}...\end{audio}`) are extracted to the MediaBag.
- **LaTeX Generation**: Standalone documents or body fragments.
- **Automatic ZIP Bundling**: When a document contains images or chart data, output is a `.zip` with the `.tex` and all media files in the same directory. Plain `.tex` otherwise.
- **Full UTF-8**: End-to-end UTF-8, supporting CJK, Cyrillic, Arabic, Thai, and all Latin-extended scripts.
- **No External Dependencies**: Pure PHP 8.4+.
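
Automatic run-merging and black-colour suppression can be pictured as a single pass over a paragraph's runs. The sketch below is an illustration only, not the library's internals: the run shape `['text' => string, 'style' => array]` and the function name `mergeRuns()` are hypothetical.

```php
<?php
/**
 * Collapse consecutive runs that share identical styling (hypothetical sketch).
 * Run shape assumed here: ['text' => string, 'style' => array<string, mixed>].
 */
function mergeRuns(array $runs): array
{
    $merged = [];
    foreach ($runs as $run) {
        $style = $run['style'];
        // Treat an explicit black text colour as "no colour" so it can never
        // produce a spurious \textcolor[HTML]{000000} command.
        if (isset($style['color']) && strtoupper($style['color']) === '000000') {
            unset($style['color']);
        }
        ksort($style); // key order must not affect the equality check below
        $last = array_key_last($merged);
        if ($last !== null && $merged[$last]['style'] === $style) {
            $merged[$last]['text'] .= $run['text']; // same styling: extend previous run
        } else {
            $merged[] = ['text' => $run['text'], 'style' => $style];
        }
    }
    return $merged;
}

$runs = [
    ['text' => 'Hello ', 'style' => ['bold' => true]],
    ['text' => 'world', 'style' => ['bold' => true, 'color' => '000000']],
    ['text' => '!', 'style' => []],
];
print_r(mergeRuns($runs)); // two runs: bold "Hello world", then plain "!"
```

Normalizing the style before comparing is what lets suppression and merging cooperate: once the black colour is dropped, the second run matches the first and the two collapse into one command.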

Installation
------------

Requires PHP 8.4 or higher.

```
composer require pandoc-php/pandoc
```

Basic Usage
-----------

### Converting a Word Document to LaTeX

```
use Pandoc\Reader\DocxReader;
use Pandoc\Writer\LatexWriter;

$reader = new DocxReader();
$writer = new LatexWriter();

$doc   = $reader->read('document.docx');
$latex = $writer->write($doc, standalone: true);

file_put_contents('document.tex', $latex);
```

### Converting Markdown to a LaTeX Fragment

```
use Pandoc\Reader\MarkdownReader;
use Pandoc\Writer\LatexWriter;

$reader   = new MarkdownReader();
$writer   = new LatexWriter();
$markdown = "# Hello World\nThis is a paragraph.";
$doc      = $reader->read($markdown);

// standalone: false → body only, no \documentclass preamble
$fragment = $writer->write($doc, standalone: false);
```

### Converting HTML to LaTeX

```
use Pandoc\Reader\HtmlReader;
use Pandoc\Writer\LatexWriter;

$reader = new HtmlReader();
$writer = new LatexWriter();

$doc   = $reader->read('<h1>Hello</h1><p>World</p>');
$latex = $writer->write($doc);
```

### Converting an Excel Spreadsheet to LaTeX

```
use Pandoc\Reader\XlsxReader;
use Pandoc\Writer\LatexWriter;

$reader = new XlsxReader();
$writer = new LatexWriter();

$doc   = $reader->read('spreadsheet.xlsx');
$latex = $writer->write($doc);
```

Each sheet produces a level-2 header followed by a `booktabs` table. If the spreadsheet contains embedded images or charts, use the ZIP output pattern below.
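
As an illustration of the table shape (not the library's writer), the sketch below renders a 2D array of cell strings as a `booktabs` tabular, treating the first row as the header. The helper names `latexEscape()` and `toBooktabs()` are hypothetical, every column is assumed left-aligned, and the generated LaTeX needs `\usepackage{booktabs}` in the preamble.

```php
<?php
/** Escape LaTeX special characters in a plain-text cell. */
function latexEscape(string $s): string
{
    // strtr() does not re-scan text it has already replaced, so the braces
    // inserted for "\" are not escaped a second time.
    return strtr($s, [
        '\\' => '\textbackslash{}', '&' => '\&', '%' => '\%', '$' => '\$',
        '#'  => '\#', '_' => '\_', '{' => '\{', '}' => '\}',
        '~'  => '\textasciitilde{}', '^' => '\textasciicircum{}',
    ]);
}

/** Render rows (first row = header) as a booktabs tabular, all columns left-aligned. */
function toBooktabs(array $rows): string
{
    $header = array_shift($rows);
    // One LaTeX table row: escaped cells joined by "&", terminated by "\\".
    $row = fn(array $cells): string => implode(' & ', array_map('latexEscape', $cells)) . ' \\\\';
    $out = ['\begin{tabular}{' . str_repeat('l', count($header)) . '}', '\toprule', $row($header), '\midrule'];
    foreach ($rows as $cells) {
        $out[] = $row($cells);
    }
    $out[] = '\bottomrule';
    $out[] = '\end{tabular}';
    return implode("\n", $out);
}

echo toBooktabs([['Quarter', 'Revenue'], ['Q1', '120'], ['Q2', '135']]), "\n";
```

`booktabs` replaces vertical rules and `\hline` with three horizontal rules of different weights (`\toprule`, `\midrule`, `\bottomrule`), which is why the header is separated from the body by `\midrule` only.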

> **Note**: Only `.xlsx` (OOXML) is supported. Legacy `.xls` files must be converted first (e.g. via LibreOffice).

**Chart extraction**: Charts are exported as two companion files added to the MediaBag:

`chart1.json` — Chart.js-ready metadata:

```
{
  "type": "bar",
  "title": "Sales by Quarter",
  "dataFile": "chart1.csv",
  "options": {
    "indexAxis": "x",
    "scales": {
      "x": { "title": { "display": true, "text": "Quarter" }, "stacked": false },
      "y": { "title": { "display": true, "text": "Revenue" }, "stacked": false }
    }
  },
  "series": [
    { "label": "Product A" },
    { "label": "Product B" }
  ]
}
```

`chart1.csv` — the data (categories + one column per series):

```
Category,Product A,Product B
Q1,120,85
Q2,135,90
Q3,128,95
Q4,145,110

```

A comment marker is inserted in the LaTeX at the chart's position:

```
% [pandoc-chart: chart1.json]
```

Your app reads the marker → loads the JSON → finds `dataFile` → loads the CSV → renders with Chart.js.
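
A minimal PHP sketch of that pipeline, up to the point where Chart.js takes over, is shown below. It assumes the media files are available as a name-to-contents array (for example, read from the root of the output ZIP) and that the chart CSV is comma-delimited as in the example above; `chartConfigs()` is a hypothetical helper. The title is placed under `options.plugins.title`, where Chart.js 3 and later expect it.

```php
<?php
/**
 * Build one Chart.js config per "% [pandoc-chart: X.json]" marker (hypothetical helper).
 * $media maps file names to contents, e.g. the files at the root of the ZIP.
 */
function chartConfigs(string $latex, array $media): array
{
    preg_match_all('/^% \[pandoc-chart: ([^\]]+)\]$/m', $latex, $matches);
    $configs = [];
    foreach ($matches[1] as $jsonFile) {
        // 1. Load the Chart.js-ready metadata.
        $meta = json_decode($media[$jsonFile], true, 512, JSON_THROW_ON_ERROR);
        // 2. Load the CSV named by dataFile: header row, then one row per category.
        $lines  = preg_split('/\R/', trim($media[$meta['dataFile']]));
        $rows   = array_map(fn(string $l): array => str_getcsv($l, ',', '"', ''), $lines);
        $header = array_shift($rows);
        // 3. One dataset per series column; column 0 holds the category labels.
        $datasets = [];
        foreach (array_slice($header, 1) as $i => $label) {
            $datasets[] = [
                'label' => $label,
                'data'  => array_map(fn(array $r): float => (float) $r[$i + 1], $rows),
            ];
        }
        $options = $meta['options'] ?? [];
        $options['plugins']['title'] = ['display' => true, 'text' => $meta['title'] ?? ''];
        $configs[$jsonFile] = [
            'type'    => $meta['type'],
            'data'    => ['labels' => array_column($rows, 0), 'datasets' => $datasets],
            'options' => $options,
        ];
    }
    return $configs;
}

$media = [
    'chart1.json' => '{"type":"bar","title":"Sales by Quarter","dataFile":"chart1.csv",'
                   . '"options":{"indexAxis":"x"},"series":[{"label":"Product A"},{"label":"Product B"}]}',
    'chart1.csv'  => "Category,Product A,Product B\nQ1,120,85\nQ2,135,90\n",
];
$configs = chartConfigs("Intro text.\n% [pandoc-chart: chart1.json]\n", $media);
// In the browser, this JSON can be passed to `new Chart(canvas, config)`.
echo json_encode($configs['chart1.json'], JSON_PRETTY_PRINT), "\n";
```

Series labels are taken from the CSV header rather than the `series` array, since both carry the same names in the example above.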

**Per-sheet CSV export**: Each worksheet is also exported as a standalone CSV file (e.g. `sheet-Sales.csv`) added to the MediaBag. Trailing empty rows and columns are stripped automatically.
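
The trimming step can be sketched as follows, assuming a sheet has already been read into a list of rows and that "empty" means `''` or `null`; `trimTrailingEmpty()` is a hypothetical name. Columns are cut at the right-most non-empty cell of any row, and shorter rows are padded so every CSV line has the same number of fields.

```php
<?php
/** Drop trailing rows and columns whose cells are all empty ('' or null). */
function trimTrailingEmpty(array $rows): array
{
    $isEmpty = fn($cell): bool => $cell === null || $cell === '';
    // Trailing rows: pop while the last row contains no non-empty cell.
    while ($rows && array_filter(end($rows), fn($c) => !$isEmpty($c)) === []) {
        array_pop($rows);
    }
    // Trailing columns: keep up to the right-most non-empty cell in any row.
    $width = 0;
    foreach ($rows as $row) {
        foreach (array_values($row) as $i => $cell) {
            if (!$isEmpty($cell)) {
                $width = max($width, $i + 1);
            }
        }
    }
    return array_map(
        fn(array $row): array => array_pad(array_slice(array_values($row), 0, $width), $width, ''),
        $rows
    );
}

$grid = trimTrailingEmpty([['a', 'b', ''], ['c', '', null], ['', '', ''], [null, '', '']]);

// Write the trimmed grid as CSV (explicit empty escape = RFC 4180 quoting only).
$out = fopen('php://memory', 'w+');
foreach ($grid as $row) {
    fputcsv($out, $row, ',', '"', '');
}
rewind($out);
$csv = stream_get_contents($out);
fclose($out);
echo $csv;
```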

**Locale detection**: The reader inspects `docProps/core.xml` for a language tag and selects separators accordingly:

| Language group | Decimal sep. | Thousands sep. | Column delim. |
|---|---|---|---|
| `en`, `ja`, `zh`, `pt-BR`, … | `.` | `,` | `,` |
| `fr`, `de`, `it`, `es`, `nl`, `pl`, `ru`, … | `,` | `.` | `;` |

When no language tag is present, the reader falls back to `en-US` conventions.
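
The rule in the table can be expressed as a small lookup. This is a hedged sketch: it knows only the languages listed above (the real groups are longer, as the "…" indicates), `separatorsFor()` is hypothetical, and the returned keys simply mirror the `metadata.json` fields.

```php
<?php
/** Pick number/CSV separators from a language tag (only the table's languages are known). */
function separatorsFor(?string $languageTag): array
{
    $tag = $languageTag ?: 'en-US'; // no tag: fall back to en-US conventions
    // Primary subtag, e.g. "fr" from "fr-FR" or "de" from "de_DE".
    $primary = strtolower(explode('-', str_replace('_', '-', $tag))[0]);
    $commaDecimal = ['fr', 'de', 'it', 'es', 'nl', 'pl', 'ru'];

    return in_array($primary, $commaDecimal, true)
        ? ['language' => $tag, 'decimalSeparator' => ',', 'thousandsSeparator' => '.', 'columnDelimiter' => ';']
        : ['language' => $tag, 'decimalSeparator' => '.', 'thousandsSeparator' => ',', 'columnDelimiter' => ','];
}

// Formatting a value with the selected separators:
$s = separatorsFor('fr-FR');
echo number_format(1234.5, 2, $s['decimalSeparator'], $s['thousandsSeparator']), "\n"; // 1.234,50
```

The `;` column delimiter in the comma-decimal group follows from the decimal separator: a `,` delimiter would collide with values such as `1.234,50`.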

**`metadata.json`**: Always added to the MediaBag alongside the CSVs:

```
{
    "language": "fr-FR",
    "decimalSeparator": ",",
    "thousandsSeparator": ".",
    "columnDelimiter": ";",
    "quoteCharacter": "\"",
    "sheets": ["Sheet1", "Sheet2"]
}
```
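
A consumer can use these fields to read the exported CSVs back. The sketch below assumes the CSV contents and the decoded `metadata.json` are already in memory; `readLocalizedCsv()` and `parseLocalizedNumber()` are hypothetical helpers, and numbers are normalized by dropping the thousands separator before swapping the decimal separator for `.`.

```php
<?php
/** Read a CSV export using the delimiter and quote character from metadata.json. */
function readLocalizedCsv(string $csv, array $meta): array
{
    $h = fopen('php://memory', 'r+');
    fwrite($h, $csv);
    rewind($h);
    $rows = [];
    // fgetcsv (unlike splitting on "\n") keeps quoted multi-line cells intact.
    while (($row = fgetcsv($h, null, $meta['columnDelimiter'], $meta['quoteCharacter'], '')) !== false) {
        if ($row !== [null]) { // fgetcsv returns [null] for a blank line
            $rows[] = $row;
        }
    }
    fclose($h);
    return $rows;
}

/** Turn a localized number such as "1.234,5" (fr) into 1234.5; null if not numeric. */
function parseLocalizedNumber(string $cell, array $meta): ?float
{
    // Drop thousands separators first, then normalize the decimal separator to ".".
    $normalized = str_replace(
        [$meta['thousandsSeparator'], $meta['decimalSeparator']],
        ['', '.'],
        trim($cell)
    );
    return is_numeric($normalized) ? (float) $normalized : null;
}

$meta = json_decode('{"language":"fr-FR","decimalSeparator":",","thousandsSeparator":".",'
    . '"columnDelimiter":";","quoteCharacter":"\"","sheets":["Sheet1"]}', true);
$rows = readLocalizedCsv("Produit;Montant\nA;\"1.234,5\"\n", $meta);
var_dump(parseLocalizedNumber($rows[1][1], $meta)); // float(1234.5)
```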

**Utility script**: `export_xlsx_media.php` converts any `.xlsx` file to a ZIP containing its CSVs and `metadata.json`:

```
php export_xlsx_media.php spreadsheet.xlsx output.zip
```

### Converting a PowerPoint Presentation to LaTeX

```
use Pandoc\Reader\PptxReader;
use Pandoc\Writer\LatexWriter;

$reader = new PptxReader();
$writer = new LatexWriter();

$doc   = $reader->read('presentation.pptx');
$latex = $writer->write($doc, standalone: true);
```

Each slide is wrapped in a `slide` environment (with the slide title as argument), and all slides are enclosed in a `slider` environment:

```
\begin{slider}

\begin{slide}{Slide Title}
Paragraph content here.
\end{slide}

\begin{slide}{Second Slide}
More content.
\end{slide}

\end{slider}
```

These are custom environments — define them in your LaTeX preamble to control rendering. All images (including slide master/template graphics) are extracted into the MediaBag.

Embedded videos are exported as a `video` environment:

```
\begin{video}
\url{media1.mp4}
\type{mp4}
\end{video}
```

Embedded audio is exported as an `audio` environment:

```
\begin{audio}
\url{recording.mp3}
\end{audio}
```

All media files (images, video, audio) are included in the ZIP output alongside the `.tex`.

### Converting BibTeX to LaTeX

```
use Pandoc\Reader\BibtexReader;
use Pandoc\Writer\LatexWriter;

$reader  = new BibtexReader();
$writer  = new LatexWriter();

$content = file_get_contents('references.bib');
$doc     = $reader->read($content);

// standalone: false → bibliography block only, no \documentclass preamble
$fragment = $writer->write($doc, standalone: false);
file_put_contents('references.tex', $fragment);
```

The output is a self-contained `thebibliography` block:

```
\begin{thebibliography}{99}

\bibitem{Smith2020}
\emph{A Great Title}, John Smith, Journal of Examples, 2020

\end{thebibliography}
```

- HTTP/HTTPS URLs are automatically wrapped in `\url{…}`.
- The `title`, `booktitle`, `journal`, `series`, and `publisher` fields are italicised with `\emph{…}`.
- BibTeX output is always produced as a fragment (`standalone: false`); the web interface enforces this automatically.
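
The two formatting rules above can be sketched as a single function. This is not the library's implementation: `formatBibitem()` is hypothetical, fields are joined in input order (the actual field order may differ), and values are assumed to already be valid LaTeX, so nothing is escaped.

```php
<?php
/**
 * Format one parsed BibTeX entry as a \bibitem (hypothetical sketch):
 * http(s) URLs are wrapped in \url{...}, selected fields in \emph{...}.
 */
function formatBibitem(string $key, array $fields): string
{
    $emphFields = ['title', 'booktitle', 'journal', 'series', 'publisher'];
    $parts = [];
    foreach ($fields as $name => $value) {
        // Wrap every http/https URL found anywhere in the value.
        $value = preg_replace_callback(
            '~https?://[^\s,}]+~i',
            fn(array $m): string => '\url{' . $m[0] . '}',
            $value
        );
        $parts[] = in_array(strtolower($name), $emphFields, true) ? '\emph{' . $value . '}' : $value;
    }
    return '\bibitem{' . $key . "}\n" . implode(', ', $parts);
}

echo formatBibitem('Smith2020', [
    'title'  => 'A Great Title',
    'author' => 'John Smith',
    'url'    => 'https://example.org/smith2020',
]), "\n";
```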

### Converting Jupyter Notebooks to LaTeX

```
use Pandoc\Reader\IpynbReader;
use Pandoc\Writer\LatexWriter;

$reader = new IpynbReader();
$writer = new LatexWriter();

$json  = file_get_contents('notebook.ipynb');
$doc   = $reader->read($json);
$latex = $writer->write($doc);
```

Output: Plain `.tex` or `.zip`
------------------------------

When a document contains images, charts, or other media, you need to bundle them alongside the `.tex` file. The `MediaBag` tells you whether there are any attachments:

```
use Pandoc\Reader\ReaderFactory;
use Pandoc\Writer\LatexWriter;

$filePath = 'document.docx';
$reader   = ReaderFactory::createForExtension('docx'); // or xlsx, pptx, etc.
$doc      = $reader->read($filePath);
$latex    = (new LatexWriter())->write($doc, standalone: true);

if (!$doc->mediaBag->isEmpty()) {
    // Bundle .tex + all media into a ZIP (open() returns true or an error code)
    $zip = new ZipArchive();
    if ($zip->open('output.zip', ZipArchive::CREATE | ZipArchive::OVERWRITE) !== true) {
        throw new RuntimeException('Cannot create output.zip');
    }
    $zip->addFromString('document.tex', $latex);
    foreach ($doc->mediaBag->getAll() as $filename => $media) {
        $zip->addFromString($filename, $media['contents']);
    }
    $zip->close();
    // → distribute output.zip
} else {
    // No media — plain .tex is sufficient
    file_put_contents('document.tex', $latex);
}
```

All media files (images, chart JSON/CSV) are stored at the **root of the ZIP**, so `\includegraphics{image.png}` and chart references resolve correctly when the `.tex` is compiled or processed from the same directory.
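
Before distributing a ZIP, it can be worth checking that every image the `.tex` references actually sits at the archive root. The sketch below is a hypothetical helper that compares `\includegraphics` targets against a list of file names; with `ZipArchive`, such a list can be built from `getNameIndex($i)` for `$i` from `0` to `numFiles - 1`. It assumes references include the file extension, since graphicx's extension search is not emulated.

```php
<?php
/**
 * Return the \includegraphics targets in $latex that are absent from $files
 * (the names stored at the ZIP root). Hypothetical helper.
 */
function missingGraphics(string $latex, array $files): array
{
    // Matches \includegraphics{x} and \includegraphics[options]{x}.
    preg_match_all('/\\\\includegraphics(?:\[[^\]]*\])?\{([^}]+)\}/', $latex, $m);
    return array_values(array_diff(array_unique($m[1]), $files));
}

$latex = "\\includegraphics[width=0.8\\linewidth]{image1.png}\n\\includegraphics{image2.png}";
$files = ['document.tex', 'image1.png'];
print_r(missingGraphics($latex, $files)); // image2.png is reported missing
```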

Web Interface
-------------

The project includes a web-based demonstration tool in `web/`.

1. Point your web server to the `php-pandoc/web/` folder.
2. Open `index.html` in your browser.
3. Upload a `.docx`, `.xlsx`, `.pptx`, `.html`, `.ipynb`, `.md`, or `.bib` file.
4. Choose Standalone or Fragment output.
5. Download the result — a plain `.tex` if the document has no media, or a `.zip` if it does.

Supported Structures
--------------------

See [SUPPORTED\_STRUCTURES.md](SUPPORTED_STRUCTURES.md) for a full feature list. Highlights:

- **Word**: Headers (H1–H6, Title), bold/italic/underline/strikeout/color, lists, tables, images, headers & footers, hyperlinks, footnotes/endnotes, automatic run-merging.
- **Excel**: Multi-sheet tables, cell formatting, embedded images, Chart.js-ready chart extraction, per-sheet CSV export with locale-aware separators.
- **PowerPoint**: Slide titles, body text, bullet/ordered lists, images, tables, `slide`/`slider` LaTeX environments.
- **HTML**: Full block and inline element support.
- **Jupyter**: Markdown cells, code blocks, output images.
- **BibTeX**: Entries rendered as a `thebibliography` environment with `\bibitem` items; URLs wrapped in `\url{…}`, and title/journal/booktitle/series/publisher fields italicised with `\emph{…}`.

Development and Testing
-----------------------

```
./vendor/bin/phpunit
```

Credits
-------

This project is a port of [Pandoc](https://github.com/jgm/pandoc), originally created by John MacFarlane.

License
-------

GPL v2 or later, mirroring the original Pandoc license.

Maintainer: [@snorky22](https://github.com/snorky22)
