ee01/php-html-parser
====================

An HTML DOM parser. It allows you to manipulate HTML. Find tags on an HTML page with selectors just like jQuery.

Version 2.0.2.1, MIT license, PHP >=7.1. [Source](https://github.com/ee01/php-html-parser), [Packagist](https://packagist.org/packages/ee01/php-html-parser)

PHP Html Parser
===============

Version 2.0.2.1

[![Build Status](https://camo.githubusercontent.com/3a118028f9b6acd3e4e63e7fd51afe0b9a8778dd0ed61b306e670d7e42bc6786/68747470733a2f2f7472617669732d63692e6f72672f70617175657474672f7068702d68746d6c2d7061727365722e706e67)](https://travis-ci.org/paquettg/php-html-parser)[![Coverage Status](https://camo.githubusercontent.com/b8d097747e3b068cc49bfa53c85db8aebc6b3a344d27a21b11896fbe7af32474/68747470733a2f2f636f766572616c6c732e696f2f7265706f732f70617175657474672f7068702d68746d6c2d7061727365722f62616467652e706e67)](https://coveralls.io/r/paquettg/php-html-parser)[![Scrutinizer Code Quality](https://camo.githubusercontent.com/de5880a9cb0cec455d15df2900f8a27b8f62fa1cf7388c5474ef7eb557123550/68747470733a2f2f7363727574696e697a65722d63692e636f6d2f672f70617175657474672f7068702d68746d6c2d7061727365722f6261646765732f7175616c6974792d73636f72652e706e673f623d6d6173746572)](https://scrutinizer-ci.com/g/paquettg/php-html-parser/?branch=master)

PHPHtmlParser is a simple, flexible HTML parser which allows you to select tags using any CSS selector, like jQuery. The goal is to assist in the development of tools which require a quick, easy way to scrape HTML, whether it's valid or not!

Install
-------

This package can be found on [Packagist](https://packagist.org/packages/paquettg/php-html-parser) and is best loaded using [Composer](http://getcomposer.org/). We support PHP 7.1, 7.2, and 7.3.

Usage
-----

The tests directory contains many examples of how to use the dom parser and its parts (most of which you will likely never touch). The tests use PHPUnit and are very small, a few lines each, which makes them a great place to start. Even so, here are a few examples of how the package should be used, beginning with a very simple one.

```
// Assuming you installed from Composer:
require "vendor/autoload.php";
use PHPHtmlParser\Dom;

$dom = new Dom;
$dom->load('<div class="all"><p>Hey bro, <a href="google.com">click here</a><br /> :)</p></div>');
$a = $dom->find('a')[0];
echo $a->text; // "click here"
```

The above will output "click here". Simple, no? There are many ways to get the same result from the dom, such as `$dom->getElementsByTag('a')[0]` or `$dom->find('a', 0)`, all of which can be found in the tests or in the code itself.
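As a quick illustration of the alternatives mentioned above, the following sketch reaches the same `<a>` element in three ways. The markup string is only sample input, and the variable names are arbitrary.

```php
<?php
// Assuming you installed from Composer:
require "vendor/autoload.php";
use PHPHtmlParser\Dom;

$dom = new Dom;
$dom->load('<div class="all"><p>Hey bro, <a href="google.com">click here</a><br /> :)</p></div>');

// Three ways to reach the first <a> element.
$viaIndex  = $dom->find('a')[0];              // index into the result collection
$viaOffset = $dom->find('a', 0);              // ask find() for the first match only
$viaTag    = $dom->getElementsByTag('a')[0];  // look up by tag name

echo $viaIndex->text, "\n";  // "click here"
echo $viaOffset->text, "\n"; // "click here"
echo $viaTag->text, "\n";    // "click here"
```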

Loading Files
-------------

You may also seamlessly load a file into the dom instead of a string, which is much more convenient and is how I expect most developers will be loading the html. The following example is taken from our tests and uses the "big.html" file found there.

```
// Assuming you installed from Composer:
require "vendor/autoload.php";
use PHPHtmlParser\Dom;

$dom = new Dom;
$dom->loadFromFile('tests/big.html');
$contents = $dom->find('.content-border');
echo count($contents); // 10

foreach ($contents as $content)
{
	// get the class attr
	$class = $content->getAttribute('class');

	// do something with the html
	$html = $content->innerHtml;

	// or refine the find some more
	$child   = $content->firstChild();
	$sibling = $child->nextSibling();
}
```

This example loads the html from big.html, a real page found online, and selects every element with the `content-border` class for processing. It also shows a few things you can do with a node, but it is not an exhaustive list of the methods a node has available.

Alternatively, you can always use the `load()` method to load the file. It will attempt to find the file using `file_exists` and, if successful, will call `loadFromFile()` for you. The same applies to a URL and the `loadFromUrl()` method.

Loading Url
-----------

Loading a url is very similar to the way you would load the html from a file.

```
// Assuming you installed from Composer:
require "vendor/autoload.php";
use PHPHtmlParser\Dom;

$dom = new Dom;
$dom->loadFromUrl('http://google.com');
$html = $dom->outerHtml;

// or
$dom->load('http://google.com');
$html = $dom->outerHtml; // same result as the first example
```

You can also pass PHP cURL options under the `curl` key, for example to switch the request to the POST method.

```
// load via post method
$dom->load('http://google.com/login', [
	'curl' => [
		CURLOPT_REFERER => 'http://google.com',
		CURLOPT_HTTPHEADER => [
			'Content-Type: application/json;',
		],
		CURLOPT_POST => 1,
		CURLOPT_POSTFIELDS => [
			'account' => '111'
		]
	]
]);
$html = $dom->outerHtml;
```

What makes the `loadFromUrl` method noteworthy is the optional third parameter, which accepts a `PHPHtmlParser\CurlInterface` implementation. By default, we use the `PHPHtmlParser\Curl` class to get the contents of the url. You can, however, inject your own implementation of `CurlInterface`, and we will attempt to load the url using whatever tool or settings you choose.

```
// Assuming you installed from Composer:
require "vendor/autoload.php";
use PHPHtmlParser\Dom;
use App\Services\Connector;

$dom = new Dom;
$dom->loadFromUrl('http://google.com', [], new Connector);
$html = $dom->outerHtml;
```

As long as the Connector object implements the `PHPHtmlParser\CurlInterface` interface properly, the parser uses that object to get the content of the url instead of the default `PHPHtmlParser\Curl` class.
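A minimal sketch of such a connector follows. It returns canned HTML instead of making a request, which can be handy in tests. `FixtureConnector` and the `http://example.test/` URL are hypothetical. The sketch also assumes the interface declares a single `get(string $url): string` method, as upstream paquettg/php-html-parser 2.0 does. This fork may declare it differently, so check its `CurlInterface` before relying on this.

```php
<?php
// Assuming you installed from Composer:
require "vendor/autoload.php";

use PHPHtmlParser\CurlInterface;
use PHPHtmlParser\Dom;

// Hypothetical stub connector: serves pre-registered HTML, no network access.
final class FixtureConnector implements CurlInterface
{
    /** @var array<string, string> map of url => html */
    private $pages;

    public function __construct(array $pages)
    {
        $this->pages = $pages;
    }

    // Assumed signature (taken from upstream 2.0); adjust if this fork differs.
    public function get(string $url): string
    {
        if (!isset($this->pages[$url])) {
            throw new RuntimeException("No fixture registered for $url");
        }

        return $this->pages[$url];
    }
}

$dom = new Dom;
$dom->loadFromUrl('http://example.test/', [], new FixtureConnector([
    'http://example.test/' => '<p>Hello <a href="/next">next</a></p>',
]));

echo $dom->find('a')[0]->getAttribute('href'); // "/next"
```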

Loading Strings
---------------

Loading a string directly, without the checks in `load()`, is also easily done.

```
// Assuming you installed from Composer:
require "vendor/autoload.php";
use PHPHtmlParser\Dom;

$dom = new Dom;
$dom->loadStr('<html>String</html>', []);
$html = $dom->outerHtml;
```

If the string is too long, depending on your file system, the `load()` method will throw a warning. If this happens, call `loadStr()` as shown above to bypass the `is_file()` check in the `load()` method.
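To make the order of those checks concrete, here is a rough sketch of the kind of dispatch this README describes for `load()`: file first, then URL, otherwise a raw string. `classifyInput()` is a hypothetical helper written for illustration and is not the library's code.

```php
<?php
// Hypothetical helper, not part of PHPHtmlParser. It only mirrors the order
// of checks the README describes for load().
function classifyInput(string $input): string
{
    // The filesystem probe comes first. A very long markup string passed here
    // is what can trigger the warning mentioned above.
    if (is_file($input)) {
        return 'file';
    }

    // Anything that starts with http:// or https:// is treated as a URL.
    if (preg_match('/^https?:\/\//i', $input) === 1) {
        return 'url';
    }

    // Everything else is treated as markup.
    return 'string';
}

echo classifyInput(__FILE__), "\n";            // file
echo classifyInput('http://google.com'), "\n"; // url
echo classifyInput('<p>Hello</p>'), "\n";      // string
```

Calling `loadStr()` directly skips the first two branches entirely, which is why it avoids the filesystem warning.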

Options
-------

You can also set parsing options that affect the behavior of the parsing engine. You can set a global option array using the `setOptions` method on the `Dom` object, or an instance-specific option by passing it to the `load` method as an extra (optional) parameter.

```
// Assuming you installed from Composer:
require "vendor/autoload.php";
use PHPHtmlParser\Dom;

$dom = new Dom;
$dom->setOptions([
	'strict' => true, // Set a global option to enable strict html parsing.
]);

$dom->load('http://google.com', [
	'whitespaceTextNode' => false, // Only applies to this load.
]);

$dom->load('http://gmail.com'); // will not have whitespaceTextNode set to false.
```
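The precedence shown above (per-load options over global options over defaults) can be pictured as a plain array merge. The sketch below is only a model of that behavior, not the library's implementation. `effectiveOptions()` is a hypothetical helper, and the default values are the ones this README lists for each option.

```php
<?php
// Defaults as documented in this README's option list.
const DEFAULT_OPTIONS = [
    'strict'             => false,
    'whitespaceTextNode' => true,
    'enforceEncoding'    => null,
    'cleanupInput'       => true,
    'removeScripts'      => true,
    'removeStyles'       => true,
    'preserveLineBreaks' => false,
    'removeDoubleSpace'  => true,
];

// Hypothetical helper: later arrays win, so per-load beats global beats default.
function effectiveOptions(array $global, array $perLoad): array
{
    return array_merge(DEFAULT_OPTIONS, $global, $perLoad);
}

$global = ['strict' => true];

$first  = effectiveOptions($global, ['whitespaceTextNode' => false]);
$second = effectiveOptions($global, []);

var_dump($first['strict']);              // bool(true)
var_dump($first['whitespaceTextNode']);  // bool(false)
var_dump($second['whitespaceTextNode']); // bool(true)
```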

At the moment we support 8 options.

**Strict**

Strict, by default false, will throw a `StrictException` if it finds that the html is not strictly compliant (all tags must have a closing tag, no attribute without a value, etc.).

**whitespaceTextNode**

The whitespaceTextNode option, by default true, tells the parser to save text nodes even if the content of the node is empty (only whitespace). Setting it to false will ignore all whitespace-only text nodes found in the document.

**enforceEncoding**

The enforceEncoding option, by default null, enforces a character set to be used for reading the content and returning the content in that encoding. Setting it to null triggers an attempt to detect the encoding from the content of the given string instead.

**cleanupInput**

Set this to `false` to skip the entire clean up phase of the parser. If this is set to `false`, the next 3 options will be ignored. Defaults to `true`.

**removeScripts**

Set this to `false` to skip removing the script tags from the document body. This might have adverse effects. Defaults to `true`.

**removeStyles**

Set this to `false` to skip removing of style tags from the document body. This might have adverse effects. Defaults to `true`.

**preserveLineBreaks**

Preserves line breaks if set to `true`. If set to `false`, line breaks are cleaned up as part of the input clean up process. Defaults to `false`.

**removeDoubleSpace**

Set this to `false` if you want to preserve whitespace inside of text nodes. It is set to `true` by default.

Static Facade
-------------

You can also mount a static facade for the Dom object.

```
PHPHtmlParser\StaticDom::mount();

Dom::load('tests/big.html');
$objects = Dom::find('.content-border');
```

The above php block does the same load and find as the earlier `big.html` example, but it is done using the static facade, which supports all public methods found in the Dom object.

Modifying The Dom
-----------------

You can always modify the dom that was created from any loading method. To change the attribute of any node you can just call the `setAttribute` method.

```
$dom = new Dom;
$dom->load('<div class="all"><p>Hey bro, <a href="google.com">click here</a><br /> :)</p></div>');
$a = $dom->find('a')[0];
$a->setAttribute('class', 'foo');
echo $a->getAttribute('class'); // "foo"
```
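The same call works on every match of a selector. As a small sketch, the snippet below adds a `rel` attribute to each link in a made-up fragment and reads it back. The markup and the attribute value are illustrative only.

```php
<?php
// Assuming you installed from Composer:
require "vendor/autoload.php";
use PHPHtmlParser\Dom;

$dom = new Dom;
$dom->load('<p><a href="a.html">A</a> and <a href="b.html">B</a></p>');

// Set the same attribute on every <a> element found.
foreach ($dom->find('a') as $link) {
    $link->setAttribute('rel', 'nofollow');
}

// Read the attributes back to confirm the change.
$pairs = [];
foreach ($dom->find('a') as $link) {
    $pairs[$link->getAttribute('href')] = $link->getAttribute('rel');
}

print_r($pairs);
```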

You may also get the `PHPHtmlParser\Dom\Tag` class directly and manipulate it as you see fit.

```
$dom = new Dom;
$dom->load('<div class="all"><p>Hey bro, <a href="google.com">click here</a><br /> :)</p></div>');
$a   = $dom->find('a')[0];
$tag = $a->getTag();
$tag->setAttribute('class', 'foo');
echo $a->getAttribute('class'); // "foo"
```

It is also possible to remove a node from the tree. Simply call the `delete` method on any node to remove it. Note that you should also unset the node after removing it from the `DOM`, because it will still take memory as long as it is not unset.

```
$dom = new Dom;
$dom->load('<div class="all"><p>Hey bro, <a href="google.com">click here</a><br /> :)</p></div>');
$a   = $dom->find('a')[0];
$a->delete();
unset($a);
echo $dom; // '<div class="all"><p>Hey bro, <br /> :)</p></div>'
```

You can modify the text of `TextNode` objects easily. Please note that, if you set an encoding, the new text will be encoded using the existing encoding.

```
$dom = new Dom;
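$dom->load('<div class="all"><p>Hey bro, <a href="google.com">click here</a><br /> :)</p></div>');
$a   = $dom->find('a')[0];
$a->firstChild()->setText('biz baz');
echo $dom; // '<div class="all"><p>Hey bro, <a href="google.com">biz baz</a><br /> :)</p></div>'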
```
