ineersa/html2text
=================

`html2text` converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).

Version v1.0.0, licensed GPL-3.0-or-later, requires PHP >=8.4.

[Source](https://github.com/ineersa/html2text) | [Packagist](https://packagist.org/packages/ineersa/html2text)

PHP Html2Text
=============

[![CI](https://github.com/ineersa/html2markdown/actions/workflows/main.yml/badge.svg?branch=main)](https://github.com/ineersa/html2markdown/actions/workflows/main.yml)[![codecov](https://camo.githubusercontent.com/f03a3800e8604115e2e543bbbb30cc46d73672be1b27ba0aed0c1c2c4c764b6b/68747470733a2f2f636f6465636f762e696f2f67682f696e65657273612f68746d6c326d61726b646f776e2f6272616e63682f6d61696e2f67726170682f62616467652e737667)](https://codecov.io/gh/ineersa/html2markdown)

It is a PHP port of [Alir3z4/html2text](https://github.com/Alir3z4/html2text) with a few fixes and updates.

Functional parity is checked by the test suite, which contains all the test cases from the original plus additional ones. Most of the code was translated with AI and then heavily refactored and fixed.

How to install/requirements
---------------------------

The project uses the new DOM extension for better HTML parsing and requires `ext-libxml`. PHP 8.4 or later is required.

To install, run the Composer command:

```
composer require ineersa/html2text
```

Usage
-----

Basic usage:

```
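// $source is the path or URL of the HTML document to convert.
$html = (string) file_get_contents($source);

$config = new Ineersa\Html2text\Config();
$html2Markdown = new Ineersa\Html2text\HTML2Markdown($config);
// The converter object is invoked like a function with the HTML string.
$markdown = $html2Markdown($html);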
```
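The `(string)` cast in the snippet above turns a failed read (`false`) into an empty string, so an unreadable source produces empty input rather than an error. A minimal sketch of a stricter wrapper follows. `convertHtmlFile` is a hypothetical helper name, not part of the library, and the autoload path assumes a standard Composer install in the current directory.

```php
<?php
// Assumes the package was installed with Composer in this project.
require __DIR__ . '/vendor/autoload.php';

use Ineersa\Html2text\Config;
use Ineersa\Html2text\HTML2Markdown;

// Hypothetical helper (not part of the library): fail loudly on unreadable input.
function convertHtmlFile(string $path)
{
    // file_get_contents() returns false when the read fails.
    $html = @file_get_contents($path);
    if ($html === false) {
        throw new RuntimeException("Unable to read HTML from {$path}");
    }

    // Default configuration, as in the basic usage example.
    $converter = new HTML2Markdown(new Config());

    return $converter($html);
}
```

Calling `convertHtmlFile('page.html')` would then either return the converted text or throw, instead of quietly converting an empty string.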

Config options are compatible with the Python library:

```
final readonly class Config
{
    public function __construct(
        /** Use Unicode characters instead of ASCII fallbacks. */
        public bool $unicodeSnob = false,
        /** Escape all special characters even if output is less readable. */
        public bool $escapeSnob = false,
        /** Append footnote links immediately after each paragraph. */
        public bool $linksEachParagraph = false,
        /** Wrap long lines at the configured column (0 disables wrapping). */
        public int $bodyWidth = 78,
        /** Skip internal anchors like href="#local-anchor". */
        public bool $skipInternalLinks = true,
        /** Render links using inline Markdown syntax. */
        public bool $inlineLinks = true,
        /** Surround links with angle brackets to prevent wraps. */
        public bool $protectLinks = false,
        /** Allow links to wrap across lines. */
        public bool $wrapLinks = true,
        /** Wrap list items at the configured body width. */
        public bool $wrapListItems = false,
        /** Wrap table output text. */
        public bool $wrapTables = false,
        /** Treat the input as HTML exported from a Google Doc. */
        public bool $googleDoc = false,
        /** Callback to apply at tag processing, $callback($this, $tag, $attrs, $start), should return true to break processing, false otherwise */
        public ?\Closure $tagCallback = null,
        /** Pixels Google uses to indent nested lists. */
        public int $googleListIndent = 36,
        /**
         * Values that indicate bold text in inline styles.
         *
         * @var string[]
         */
        public array $boldTextStyleValues = ['bold', '700', '800', '900'],
        /** Ignore anchor tags entirely. */
        public bool $ignoreAnchors = false,
        /** Ignore mailto links during conversion. */
        public bool $ignoreMailtoLinks = false,
        /** Drop all image tags from the output. */
        public bool $ignoreImages = false,
        /** Keep image tags rendered as raw HTML. */
        public bool $imagesAsHtml = false,
        /** Replace images with their alt text. */
        public bool $imagesToAlt = false,
        /** Include width/height attributes when preserving images. */
        public bool $imagesWithSize = false,
        /** Ignore text emphasis such as italics and bold. */
        public bool $ignoreEmphasis = false,
        /** Wrap inline code with custom markers. */
        public bool $markCode = false,
        /** Use backquotes instead of indentation for code blocks. */
        public bool $backquoteCodeStyle = false,
        /** Fallback alt text when an image omits it. */
        public string $defaultImageAlt = '',
        /** Pad tables to align cell widths. */
        public bool $padTables = false,
        /** Convert absolute links with identical href/text to <href> style. */
        public bool $useAutomaticLinks = true,
        /** Render tables as HTML instead of Markdown. */
        public bool $bypassTables = false,
        /** Ignore table tags but retain row content. */
        public bool $ignoreTables = false,
        /** Emit a single line break after block elements (requires width 0). */
        public bool $singleLineBreak = false,
        /** Use as the opening quotation mark for <q> tags. */
        public string $openQuote = '"',
        /** Use as the closing quotation mark for <q> tags. */
        public string $closeQuote = '"',
        /** Include <sup> and <sub> tags in the output. */
        public bool $includeSupSub = false,
        /** Base URL to join with URLs if needed. */
        public string $baseUrl = '',
        /** Number of list nesting levels skipped when applying visual indentation. */
        public int $listIndentBaseLevel = 0,
        /** Add indentation before definition list descriptions (<dd>). */
        public bool $indentDefinitionDescriptions = true,
        /** Add a blank line after closing a definition description (<dd>). */
        public bool $blankLineAfterDefinitionDescription = false,
        /** Append an extra newline after closing a top-level list. */
        public bool $appendFinalListNewline = true,
        /** Append one extra raw newline after a top-level list closes. */
        public bool $appendRawNewlineAfterTopLevelList = false,
        /** Markers for unordered list items, emphasis, and strong text. */
        public string $ulItemMark = '*',
        public string $emphasisMark = '_',
        public string $strongMark = '**',
        /** Hide strikethrough emphasis. */
        public bool $hideStrikethrough = false,
    ) {
    }
}
```
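The `tagCallback` option is the least self-explanatory entry above. Going only by the signature in its doc comment, the callback receives the converter, the tag name, the attributes, and a start flag, and returns `true` to stop the default handling of that tag. The sketch below tallies opening tags and always returns `false` so normal conversion continues. Parameter types are not documented here, so they are left untyped; treating `$start` as truthy for opening tags and `$tag` as a string are assumptions based on the Python original.

```php
<?php
// Assumes the package was installed with Composer in this project.
require __DIR__ . '/vendor/autoload.php';

use Ineersa\Html2text\Config;
use Ineersa\Html2text\HTML2Markdown;

// Collected by reference from inside the callback.
$tagCounts = [];

$config = new Config(
    // Named argument: every other option keeps its default value.
    tagCallback: function ($converter, $tag, $attrs, $start) use (&$tagCounts) {
        // Assumption: $start is truthy for opening tags, falsy for closing ones.
        if ($start) {
            $tagCounts[$tag] = ($tagCounts[$tag] ?? 0) + 1;
        }

        // false = do not interrupt the converter's own handling of this tag.
        return false;
    },
);

$converter = new HTML2Markdown($config);
$markdown = $converter('<p>One <b>bold</b> word and <a href="https://example.com">a link</a>.</p>');
```

Returning `true` for selected tags would be the hook for replacing the default handling of those tags with custom logic.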

New parity-oriented options added for finer output control:

- `listIndentBaseLevel`: removes indentation from the first N list levels (useful when aligning with other converters that keep top-level lists flush-left).
- `indentDefinitionDescriptions`: toggles the `    ` prefix for `<dd>` entries.
- `blankLineAfterDefinitionDescription`: inserts a paragraph break after each `<dd>`.
- `appendFinalListNewline`: appends a newline when closing the outermost list.
- `appendRawNewlineAfterTopLevelList`: appends one more raw newline after a top-level list close.
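
Because `Config` is a readonly class whose options are all constructor parameters, PHP named arguments are a convenient way to override just the options listed above. The following is a sketch only: the values are illustrative, not the settings needed to match any particular converter, and the sample HTML is made up for the example.

```php
<?php
// Assumes the package was installed with Composer in this project.
require __DIR__ . '/vendor/autoload.php';

use Ineersa\Html2text\Config;
use Ineersa\Html2text\HTML2Markdown;

$config = new Config(
    bodyWidth: 0,                              // 0 disables line wrapping
    listIndentBaseLevel: 1,                    // keep first-level lists flush-left
    indentDefinitionDescriptions: false,       // no four-space prefix for <dd>
    blankLineAfterDefinitionDescription: true, // paragraph break after each <dd>
    appendFinalListNewline: false,             // no extra newline after the outermost list
);

// Illustrative input containing a list and a definition list.
$html = '<ul><li>First</li><li>Second</li></ul>'
    . '<dl><dt>Term</dt><dd>Description</dd></dl>';

echo (new HTML2Markdown($config))($html);
```

Options that are not named keep the defaults shown in the class definition, so the configuration stays short even as the option list grows.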

Rust parity
-----------

This project can be validated against the [Rust implementation](https://github.com/kreuzberg-dev/html-to-markdown) using imported HTML-to-Markdown fixture pairs.

- `PHP Fixtures Suite` checks the PHP port's original fixture set.
- `Rust Fixtures Suite` checks parity fixtures imported from the Rust repository.
- You can run only parity tests with: `vendor/bin/phpunit --testsuite "Rust Fixtures Suite"`.

Rust parity configuration example:

```
