wa72/htmlpagedom
================

jQuery-inspired DOM manipulation extension for Symfony's Crawler


HtmlPageDom
===========


[![tests](https://github.com/wasinger/htmlpagedom/actions/workflows/tests.yml/badge.svg?branch=master)](https://github.com/wasinger/htmlpagedom/actions/workflows/tests.yml/badge.svg?branch=master)[![Latest Version](https://camo.githubusercontent.com/b0653b1c42c7b563c785cf3b125ff399cdf0e67a2ce0d84f0eb3df8d392fb59e/687474703a2f2f696d672e736869656c64732e696f2f7061636b61676973742f762f776137322f68746d6c70616765646f6d2e737667)](https://packagist.org/packages/wa72/htmlpagedom)[![Downloads from Packagist](https://camo.githubusercontent.com/39df85f059765f841fcf38cedd50b9e8ecb9c22080868afb714dff10b28558c6/687474703a2f2f696d672e736869656c64732e696f2f7061636b61676973742f64742f776137322f68746d6c70616765646f6d2e737667)](https://packagist.org/packages/wa72/htmlpagedom)

`Wa72\HtmlPageDom` is a PHP library for easy manipulation of HTML documents using DOM. It requires [DomCrawler from Symfony components](https://github.com/symfony/DomCrawler) for traversing the DOM tree and extends it by adding methods for manipulating the DOM tree of HTML documents.

It's useful when you need not just to extract information from an HTML file (which is what DomCrawler does) but also to modify HTML pages. It can also serve as a template engine: load your HTML template file, set new HTML content on certain elements such as the page title, `div#content` or `ul#menu`, and print out the modified page.

`Wa72\HtmlPageDom` consists of two main classes:

- `HtmlPageCrawler` extends `Symfony\Component\DomCrawler\Crawler` by adding jQuery-inspired, HTML-specific DOM *manipulation* functions such as `setInnerHtml($htmltext)`, `before()`, `append()`, `wrap()`, `addClass()` or `css()`. It's like jQuery for PHP: simply select elements of an HTML page using CSS selectors and change their attributes and content.

    [API doc for HtmlPageCrawler](doc/HtmlPageCrawler.md)
- `HtmlPage` represents one complete HTML document and offers convenience functions like `getTitle()`, `setTitle($title)`, `setMeta('description', $description)`, `getBody()`. Internally, it uses the `HtmlPageCrawler` class for filtering and manipulating DOM elements. Since version 1.2, it offers methods for compressing (`minify()`) and pretty-printing (`indent()`) the HTML page.

    [API doc for HtmlPage](doc/HtmlPage.md)

Requirements and Compatibility
------------------------------


Version 3.x:

- PHP 8.x
- [Symfony\\Components\\DomCrawler](https://github.com/symfony/DomCrawler) 6.x | 7.x | 8.x
- [Symfony\\Components\\CssSelector](https://github.com/symfony/CssSelector) 6.x | 7.x | 8.x

Version 2.x:

- PHP ^7.4 | 8.x
- [Symfony\\Components\\DomCrawler](https://github.com/symfony/DomCrawler) ^4.4 | 5.x
- [Symfony\\Components\\CssSelector](https://github.com/symfony/CssSelector) ^4.4 | 5.x

There is no difference in our API between versions 2.x and 3.0.x. The only difference is the compatibility with different versions of Symfony.

Installation
------------


- using [composer](http://getcomposer.org): `composer require wa72/htmlpagedom`
- using other [PSR-4](http://www.php-fig.org/psr/psr-4/) compliant autoloader: clone this project to where your included libraries are and point your autoloader to look for the "\\Wa72\\HtmlPageDom" namespace in the "src" directory of this project
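
For the second option, a minimal PSR-4 autoloader could look like the sketch below. The path in `$htmlPageDomSrc` is a hypothetical location for the cloned repository, not something the project prescribes. The Symfony DomCrawler and CssSelector components must also be loadable by some autoloader of your own.

```php
<?php
// Hypothetical location of the cloned repository's "src" directory; adjust to your setup.
$htmlPageDomSrc = __DIR__ . '/lib/htmlpagedom/src';

spl_autoload_register(function (string $class) use ($htmlPageDomSrc): void {
    $prefix = 'Wa72\\HtmlPageDom\\';

    // Ignore classes outside the library's namespace.
    if (strncmp($class, $prefix, strlen($prefix)) !== 0) {
        return;
    }

    // Map the rest of the class name to a file below src/ (PSR-4 convention).
    $relative = substr($class, strlen($prefix));
    $file = $htmlPageDomSrc . '/' . str_replace('\\', '/', $relative) . '.php';

    if (is_file($file)) {
        require $file;
    }
});
```

If the file is missing, the closure simply returns, so other registered autoloaders still get a chance to resolve the class.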

Usage
-----


`HtmlPageCrawler` is a wrapper around DOMNodes. `HtmlPageCrawler` objects can be created using `new` or the static function `HtmlPageCrawler::create()`, which accepts an HTML string or a DOMNode (or an array of DOMNodes, a DOMNodeList, or even another `Crawler` object) as arguments.

Afterwards you can select nodes from the added DOM tree by calling `filter()` (equivalent to `find()` in jQuery) and alter the selected elements using the following jQuery-like manipulation functions:

- `addClass()`, `hasClass()`, `removeClass()`, `toggleClass()`
- `after()`, `before()`
- `append()`, `appendTo()`
- `makeClone()` (equivalent to `clone()` in jQuery)
- `css()` (alias `getStyle()` / `setStyle()`)
- `html()` (get inner HTML content) and `setInnerHtml($html)`
- `attr()` (alias `getAttribute()` / `setAttribute()`), `removeAttr()`
- `insertAfter()`, `insertBefore()`
- `makeEmpty()` (equivalent to `empty()` in jQuery)
- `prepend()`, `prependTo()`
- `remove()`
- `replaceAll()`, `replaceWith()`
- `text()`, `getCombinedText()` (get text content of all nodes in the Crawler), and `setText($text)`
- `wrap()`, `unwrap()`, `wrapInner()`, `unwrapInner()`, `wrapAll()`

To get the modified DOM as HTML code use `html()` (returns innerHTML of the first node in your crawler object) or `saveHTML()` (returns combined "outer" HTML code of all elements in the list).
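
The difference between the two output methods can be illustrated with a short sketch. It assumes a Composer installation with the autoloader at `vendor/autoload.php`; the list markup and the `item` class name are made up for illustration.

```php
<?php
// Assumes the library was installed with Composer in the current directory.
require __DIR__ . '/vendor/autoload.php';

use Wa72\HtmlPageDom\HtmlPageCrawler;

// Illustrative fragment with two sibling list items.
$c = HtmlPageCrawler::create('<ul><li>one</li><li>two</li></ul>');

// The filtered crawler holds both <li> elements.
$items = $c->filter('li');
$items->addClass('item');

// html(): inner HTML of the first node only.
echo $items->html(), "\n";

// saveHTML(): combined outer HTML of all nodes in the crawler.
echo $items->saveHTML(), "\n";
```

In practice, `saveHTML()` is usually the right choice when the crawler may contain more than one element.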

See the full methods documentation in the generated [API doc for HtmlPageCrawler](doc/HtmlPageCrawler.md)

**Example:**

```
use \Wa72\HtmlPageDom\HtmlPageCrawler;

// create an object from a fragment of HTML code (sample markup) as you would do with jQuery's $() function
$c = HtmlPageCrawler::create('<div id="content"><h1>Title</h1></div>');

// the above is the same as calling:
$c = new HtmlPageCrawler('<div id="content"><h1>Title</h1></div>');

// filter for h1 elements and wrap them with an HTML structure
$c->filter('h1')->wrap('<div class="innercontent">');

// return the modified HTML
echo $c->saveHTML();
// or simply:
echo $c; // implicit __toString() calls saveHTML()
// will output: <div id="content"><div class="innercontent"><h1>Title</h1></div></div>
```

**Advanced example: remove the third column from an HTML table**

```
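use \Wa72\HtmlPageDom\HtmlPageCrawler;

// sample table (illustrative markup): two rows of three cells each
$html = '<table>'
    . '<tr><td>a1</td><td>a2</td><td>a3</td></tr>'
    . '<tr><td>b1</td><td>b2</td><td>b3</td></tr>'
    . '</table>';

$c = HtmlPageCrawler::create($html);

// select all cells and keep only every third one (the third column)
$tr = $c->filter('td')
    ->reduce(
        function ($c, $j) {
            if (($j+1) % 3 == 0) {
                return true;
            }
            return false;
        }
    );
$tr->remove();
echo $c->saveHTML();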
```

**Usage examples for the `HtmlPage` class:**

```
use \Wa72\HtmlPageDom\HtmlPage;

// create a new HtmlPage object with an empty HTML skeleton
$page = new HtmlPage();

// or create a HtmlPage object from an existing page
$page = new HtmlPage(file_get_contents('http://www.heise.de'));

// get or set page title
echo $page->getTitle();
$page->setTitle('New page title');
echo $page->getTitle();

// add HTML content
$page->filter('body')->setInnerHtml('<div id="content"><h1>This is the headline</h1><p class="text">This is a paragraph</p></div>');

// select elements by css selector
$h1 = $page->filter('#content h1');
$p = $page->filter('p.text');

// change attributes and content of an element
$h1->addClass('headline')->css('margin-top', '10px')->setInnerHtml('This is the new headline');

$p->removeClass('text')->append('<br>There is more than one line in this paragraph');

// add a new paragraph to div#content
$page->filter('#content')->append('<p>This is a new paragraph.</p>');

// add a class and some attribute to all paragraphs
$page->filter('p')->addClass('newclass')->setAttribute('data-foo', 'bar');

// get HTML content of an element
echo $page->filter('#content')->saveHTML();

// output the whole HTML page
echo $page->save();
// or simply:
echo $page;

// output formatted HTML code
echo $page->indent()->save();

// output compressed (minified) HTML code
echo $page->minify()->save();
```

See also the generated [API doc for HtmlPage](doc/HtmlPage.md)

Limitations
-----------


- HtmlPageDom builds on top of PHP's DOM functions and uses the loadHTML() and saveHTML() methods of the DOMDocument class. That's why its output is always HTML, not XHTML.
- The HTML parser used by PHP is built for HTML4. It reports errors for HTML5-specific elements; HtmlPageDom ignores these errors, so it is usable for HTML5 with some limitations.
- HtmlPageDom has not been tested with character encodings other than UTF-8.
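
The parser behaviour described above can be observed with PHP's DOM extension alone. The following is a sketch of the underlying mechanism, not of HtmlPageDom's internal code. The sample markup is made up, and whether the parser actually reports an error for `<article>` depends on the installed libxml2 version.

```php
<?php
// Collect libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
// <article> is an HTML5 element that an HTML4-era parser may flag as invalid.
$doc->loadHTML('<!DOCTYPE html><html><body><article><p>Hello</p></article></body></html>');

// Any reported problems end up here; the DOM tree is built regardless.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The HTML5 element is still present in the tree and can be serialized.
$article = $doc->getElementsByTagName('article')->item(0);
echo $doc->saveHTML($article), "\n";
echo count($errors), " parse error(s) reported\n";
```

Restoring the previous error-handling setting at the end keeps the sketch from changing libxml behaviour for the rest of the script.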

History
-------


When I discovered how easy it was to modify HTML documents using jQuery I looked for a PHP library providing similar possibilities for PHP.

Googling around, I found [SimpleHtmlDom](http://simplehtmldom.sourceforge.net) and later [Ganon](http://code.google.com/p/ganon), but both turned out to be very slow. Nevertheless, I used both libraries in my projects.

When Symfony2 appeared with its DomCrawler and CssSelector components, I thought: the functions for traversing the DOM tree and selecting elements by CSS selectors are already there, only the manipulation functions are missing. Let's implement them! So the HtmlPageDom project was born.

Building on PHP's DOM functions turned out to be a good choice: compared to SimpleHtmlDom and Ganon, HtmlPageDom is lightning fast. In one of my projects, I have a PHP script that takes a huge HTML page containing several hundred article elements and extracts them into individual HTML files (which are later loaded back into the original HTML page by AJAX on demand). Using SimpleHtmlDom, the script took 3 minutes (right, minutes!) to run, and I needed to raise PHP's memory limit to over 500MB. Using Ganon as the HTML parsing and manipulation engine, it took even longer, about 5 minutes. After switching to HtmlPageDom, the same script doing the same processing tasks runs in about one second (all on the same server). HtmlPageDom is really fast.

© 2012-2023 Christoph Singer. Licensed under the MIT License.

