
voilab/htmlcleaner
==================

An HTML cleaner based on SimpleXML, fast and customizable.

0.2.0 · MIT · PHP >=5.6.0


[Source](https://github.com/voilab/htmlcleaner) · [Packagist](https://packagist.org/packages/voilab/htmlcleaner) · [Docs](http://www.voilab.ch)


Voilab HTML cleaner
===================

An HTML cleaner based on SimpleXML, fast and customizable.

Install
-------

Via Composer

Create a composer.json file in your project root:

```
{
    "require": {
        "voilab/htmlcleaner": "0.*"
    }
}
```

Or require the package directly from the command line:

```
$ composer require voilab/htmlcleaner
```

Sample dataset
--------------

```

    Some paragraph with bold or
    nested tags.

    And a second paragraph (so two roots elements, here) with
    a cool link,
    a bad link
    and some nice attributes to try to keep.

```

Basic usage
-----------

### All tags stripped

```
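require 'vendor/autoload.php'; // Composer autoloader

use voilab\cleaner\HtmlCleaner;

$cleaner = new HtmlCleaner();
$raw_html = '...'; // use the sample dataset above

// no allowed tags are configured, so every tag is stripped
echo $cleaner->clean($raw_html);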

### Allow some tags

```
// create cleaner...
$cleaner->addAllowedTags(['p', 'strong']);
// call clean method
```

### Allow some tags and attributes (regardless of tags)

```
// create cleaner...
$cleaner
    ->addAllowedTags(['p', 'span'])
    ->addAllowedAttributes(['class']);
// call clean method
```

### Allow some attributes only on certain tags

```
// create cleaner...
$cleaner
    ->addAllowedTags(['p', 'span'])
    ->addAllowedAttributes([
        // keep attribute "class" only for spans
        new \voilab\cleaner\attribute\Keep('class', 'span'),

        // you can use this shorthand too, as a string
        'style:span'
    ]);
// call clean method
```

Advanced usage
--------------

### Processors

Processors prepare the HTML string before it is loaded into a new SimpleXMLElement (the basis of the cleaning process). They also format the HTML after it has been cleaned. In other words, they act as a pre-process and a post-process step.

> The pre-process **must** remove every tag that is not allowed.

#### Standard processor

The standard processor uses `strip_tags()` to remove tags that are not allowed. In the post-process step, it removes all carriage returns from the string.
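The behaviour described above can be approximated with PHP built-ins alone. The sketch below is not the library's code: `preProcessSketch()` and `postProcessSketch()` are hypothetical names, and it assumes that "carriage returns" means the `\r` character.

```php
<?php
// Sketch only: mimics the described behaviour with PHP built-ins.
// Both function names are hypothetical, not part of the library.

function preProcessSketch($html, array $allowedTags)
{
    // strip_tags() expects its allow-list as a string such as "<p><strong>"
    $allowList = '';
    foreach ($allowedTags as $tag) {
        $allowList .= '<' . $tag . '>';
    }
    return strip_tags($html, $allowList);
}

function postProcessSketch($html)
{
    // Assumption: "carriage returns" means the "\r" character
    return str_replace("\r", '', $html);
}

$raw = "<p>Hello <em>world</em></p>\r\n<p>Bye</p>";
$clean = postProcessSketch(preProcessSketch($raw, ['p']));
// $clean is now "<p>Hello world</p>\n<p>Bye</p>"
```

Note that `strip_tags()` removes the tags themselves but keeps the text they contained, which is why "world" survives even though `<em>` is not allowed.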

#### Custom processor

You can create your own processor by implementing `\voilab\cleaner\processor\Processor`. Remember that the pre-process is responsible for removing every tag that is not allowed.

### Attributes

Attribute classes validate attributes and their content. By default, an allowed attribute becomes a `\voilab\cleaner\attribute\Keep`, and every attribute that is not allowed becomes a `\voilab\cleaner\attribute\Remove`.

You do not need to instantiate these two attribute types yourself. All attributes provided as strings in `addAllowedAttributes()` are converted to `Keep` instances.

#### Js attribute

You may want to keep some attributes but check their content. This is the case for the `href` attribute, which can contain either a valid URL or a JavaScript injection. An attribute validator already exists for that:

```
$cleaner
    ->addAllowedTags(['a'])
    ->addAllowedAttributes([
        new \voilab\cleaner\attribute\Js('href')
    ]);
```

> Note that an allowed attribute may or may not be bound to a specific tag. In the example above, the `href` attribute is valid for every HTML tag. To bind the attribute to a tag, specify the tag as the second parameter.
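To illustrate the kind of check such a validator has to perform, here is a minimal sketch in plain PHP. It is independent of the library's `Js` class, whose actual rules are not documented here; the helper name `looksLikeScriptUrl()` and the list of flagged schemes are assumptions.

```php
<?php
// Hypothetical helper, independent of the library's Js class.
function looksLikeScriptUrl($value)
{
    // Decode entities first: "jav&#x61;script:" is still "javascript:" to a browser
    $decoded = html_entity_decode($value, ENT_QUOTES, 'UTF-8');

    // Browsers ignore whitespace and control characters inside the scheme
    $normalized = preg_replace('/[\x00-\x20]+/', '', $decoded);

    // Flag schemes that can carry script; this list is an assumption
    return preg_match('/^(javascript|vbscript|data):/i', $normalized) === 1;
}

var_dump(looksLikeScriptUrl('https://example.com/page')); // bool(false)
var_dump(looksLikeScriptUrl(' JavaScript:alert(1)'));     // bool(true)
var_dump(looksLikeScriptUrl("java\tscript:alert(1)"));    // bool(true)
```

A deny-list like this one is easy to bypass with schemes it does not know about, so an allow-list of accepted schemes (for example `http`, `https`, `mailto`) is usually the safer design.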

Known limitations
-----------------

### Root mixed content

Text outside of tags (mixed content) is not allowed at root level.

```

some root mixed special content

some root mixed special content

some root element
and an other root element
```
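Some background on SimpleXML helps explain why loose text is awkward to handle. The sketch below uses plain PHP with a hypothetical `<root>` wrapper and shows that SimpleXML exposes child elements but not the text runs between them. Whether this is the exact reason for the limitation in this library is an assumption.

```php
<?php
// <root> is a hypothetical wrapper added only for this demonstration
$xml = simplexml_load_string(
    '<root>loose text<p>a paragraph</p>more loose text</root>'
);

// Only the <p> element is reachable as a child; the two text runs
// around it are not returned by children()
$count = count($xml->children()); // 1
$first = (string) $xml->p;        // "a paragraph"
```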

### Bad HTML format with Standard processor

If the HTML is not well formed, the cleaner throws an `\Exception`. The string must be strictly valid because it is parsed with `simplexml_load_string($html)`, which is very strict:

- tags must be closed (for example `<br />` or `<p>...</p>`)
- attributes must be wrapped in (double-)quotes (for example `<p class="intro">`)
- (double-)quotes are not allowed in attribute content; they must be converted to `&quot;` before `HtmlCleaner::clean()` is called
- opening tag `
