mehr-it/html-cleaner
====================

HTML cleaner to remove unwanted tags, attributes and elements from HTML fragments

Version 1.1.0 · MIT license · PHP >=7.1.0

[Source](https://github.com/mehr-it/html-cleaner) · [Packagist](https://packagist.org/packages/mehr-it/html-cleaner)

HTML cleaner for PHP
====================

This library aims to offer an easy API for removing unwanted elements from a given HTML fragment. This is required when outputting HTML from an untrusted source such as browsers, API clients or other third parties.

Usage
=====

The `HtmlCleaner` class lets you define blacklists and whitelists for element types, tag names and attributes. If more customization is required, callbacks may be defined for filtering.

Restrict allowed tags
---------------------

The following example only allows `<p>` and `<br>` tags. All other tags **and their content** are removed.

```
$cleaned = (new HtmlCleaner())
    ->setTagWhitelist(['p', 'br'])
    ->cleanFragment($html);

```

Instead of a whitelist, a blacklist can be used via `setTagBlacklist()` or even a callback which receives the tag name and must return `true` to keep the designated tag:

```
$cleaned = (new HtmlCleaner())
    ->setTagCallback(function($tag, $cleaner) {
        return $tag == 'span';
    })
    ->cleanFragment($html);

```
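To make the "tags **and their content** are removed" behaviour concrete, here is a minimal standalone sketch built on PHP's bundled DOM extension. It is an illustration only and not the library's implementation; the helper name `keepOnlyTagsSketch()` and the wrapper `<div>` are assumptions made for this example.

```php
<?php
declare(strict_types=1);

/**
 * Illustration only: drops every element whose tag is not whitelisted,
 * together with its whole subtree. Hypothetical helper built on PHP's DOM
 * extension; it is not the code used by mehr-it/html-cleaner.
 */
function keepOnlyTagsSketch(string $html, array $allowedTags): string
{
    $doc = new DOMDocument();

    // Parse the fragment inside a wrapper <div> so it can be extracted again.
    // libxml warnings about loose markup are collected instead of emitted.
    // Non-ASCII input would additionally need an encoding hint (not shown).
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The wrapper is the first <div> in document order.
    $root = $doc->getElementsByTagName('div')->item(0);

    // Snapshot the live node list before modifying the tree.
    $elements = iterator_to_array($root->getElementsByTagName('*'));
    foreach ($elements as $element) {
        if (!in_array(strtolower($element->tagName), $allowedTags, true)) {
            // Removing the element also discards all of its descendants.
            $element->parentNode->removeChild($element);
        }
    }

    // Serialize the wrapper's children to get the cleaned fragment back.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo keepOnlyTagsSketch('<p>Hello<br>world</p><script>alert(1)</script>', ['p', 'br']), PHP_EOL;
```

In this sketch the `<script>` element disappears together with its text, while the whitelisted `<p>` and `<br>` elements are kept.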

Restrict element types
----------------------

HTML also contains other element types, such as comments and CDATA sections. These cannot be filtered by tag name, but the element type filter functions work in the same way as the tag filters. The following example only allows tags and text nodes:

```
$cleaned = (new HtmlCleaner())
    ->setElementTypeWhitelist([
        HtmlCleaner::ELEMENT_TYPE_TAG,
        HtmlCleaner::ELEMENT_TYPE_TEXT,
    ])
    ->cleanFragment($html);

```

Filter attributes
-----------------

Even if certain tags are allowed, some of their attributes might have to be removed. The following example only allows `style` attributes:

```
$cleaned = (new HtmlCleaner())
    ->setAttributeWhitelist(['style'])
    ->cleanFragment($html);

```

If all attributes should be removed, the blacklist with the wildcard entry `'*'` can be used:

```
$cleaned = (new HtmlCleaner())
    ->setAttributeBlacklist(['*'])
    ->cleanFragment($html);

```

Replacing nodes
---------------

Imagine the following HTML fragment:

```
A big search engine is called <a href="…">Google</a>.

```

Simply removing all `<a>` tags would also remove their content. But what if the text should be kept? This is where replacements come in. The following example replaces all `<a>` tags with `<span>`:

```
$cleaned = (new HtmlCleaner())
    ->setReplacements([
        'a' => 'span'
    ])
    ->cleanFragment($html);

// output: "A big search engine is called <span>Google</span>."

```

As you see, existing attributes are removed automatically.

To get rid of the `<a>` tags entirely, pass `null` as the value so that only the text content of the node is kept:

```
$cleaned = (new HtmlCleaner())
    ->setReplacements([
        'a' => null
    ])
    ->cleanFragment($html);

// output: "A big search engine is called Google."

```

You may even pass a `Closure` as the replacement. It generates the replacement value, which can be a tag name, `null` or a newly created `DOMNode`. If the callback returns `false`, the corresponding node is not replaced but removed.

Unwrapping nodes
----------------

Sometimes replacing nodes is not what you want. Often you might want to get rid of some nodes but keep their content. You may specify these nodes using the "unwrap" list:

```
$html = '<p>This is <b>me</b> and you</p>';

$cleaned = (new HtmlCleaner())
   ->setUnwraps([
       'p',
   ])
   ->cleanFragment($html);

// output: "This is <b>me</b> and you"

```

You may pass `'*'` as a wildcard to unwrap any nodes. Note: **Replacements take precedence over unwraps!**

If you want to unwrap a node and prepend/append other elements, an associative array may be passed:

```
$html = '<p>This is <b>me</b> and you</p>';

$cleaned = (new HtmlCleaner())
   ->setUnwraps([
       'p',
       'b' => ['about ', ':innerHtml', ', my life'],
   ])
   ->cleanFragment($html);

// output: "This is about me, my life and you"

```

The string `':innerHTML'` has a special meaning and will be replaced with all child nodes of the unwrapped node. Any other strings are converted to text nodes.
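
For intuition, unwrapping corresponds roughly to moving a node's children in front of it and then dropping the now empty node. The following standalone sketch shows that idea with PHP's bundled DOM extension. It is not the library's implementation; the helper name `unwrapTagSketch()` and the wrapper `<div>` are assumptions made for this example.

```php
<?php
declare(strict_types=1);

/**
 * Illustration only: removes every element with the given tag name but
 * keeps its children in place. Hypothetical helper built on PHP's DOM
 * extension; it is not the code used by mehr-it/html-cleaner.
 */
function unwrapTagSketch(string $html, string $tagName): string
{
    $doc = new DOMDocument();

    // Parse the fragment inside a wrapper <div> so it can be extracted again.
    // Non-ASCII input would additionally need an encoding hint (not shown).
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The wrapper is the first <div> in document order.
    $root = $doc->getElementsByTagName('div')->item(0);

    // Snapshot the live node list before modifying the tree.
    $targets = iterator_to_array($root->getElementsByTagName($tagName));
    foreach ($targets as $node) {
        // Move each child in front of the node, preserving order.
        while ($node->firstChild !== null) {
            $node->parentNode->insertBefore($node->firstChild, $node);
        }
        // The node is empty now and can be dropped.
        $node->parentNode->removeChild($node);
    }

    // Serialize the wrapper's children to get the fragment back.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo unwrapTagSketch('<p>This is <b>me</b> and you</p>', 'b'), PHP_EOL;
```

Here only the `<b>` element is unwrapped, so the surrounding `<p>` stays and the text "me" remains in its original position.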
