wp-php-toolkit/encoding
=======================

Encoding component for WordPress.

v0.8.1 | GPL-2.0-or-later | PHP >=7.2

[Source](https://github.com/wp-php-toolkit/encoding) | [Packagist](https://packagist.org/packages/wp-php-toolkit/encoding) | [Docs](https://wordpress.github.io/php-toolkit/reference/encoding.html)

Install: `wp-php-toolkit/encoding`

See also:

- HTML: Normalize incoming text before HTML tokenization.
- XML: Keep invalid bytes out of XML streams.
- DataLiberation: Clean content before importing it into WordPress.

UTF-8 validation and scrubbing with a pure-PHP fallback when `mbstring` is unavailable. Detects malformed bytes and replaces them per the Unicode maximal-subpart algorithm.

Why this exists
---------------

Every parser in this toolkit eventually has to decide what to do with text bytes. XML rejects malformed UTF-8. JSON and databases can fail late. CSS, HTML, WXR, and Blueprint validation all need consistent answers about whether a string is well-formed Unicode.

The Encoding component provides the small UTF-8 primitives the rest of the toolkit can share: validate bytes, scrub invalid sequences, scan code points, and detect Unicode noncharacters. When `mbstring` is available it can delegate to it; when it is not, the component uses its own byte scanner so behavior stays available in restricted PHP environments.
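
The delegation described above can be sketched roughly as follows. This is an illustration, not the component's source: the `example_*` function names are hypothetical, and the fallback scanner simply follows the well-formed byte ranges from the Unicode Standard.

```php
<?php
// Illustrative only: the example_* names are hypothetical, not the component's API.

/** Pure-PHP check against the well-formed UTF-8 byte ranges. */
function example_scan_utf8( string $bytes ): bool {
	$length = strlen( $bytes );
	$i      = 0;
	while ( $i < $length ) {
		$lead = ord( $bytes[ $i ] );
		if ( $lead < 0x80 ) {
			$i++; // ASCII byte.
			continue;
		}
		// Continuation count plus the allowed range of the second byte.
		if ( $lead >= 0xC2 && $lead <= 0xDF ) {
			list( $extra, $min, $max ) = array( 1, 0x80, 0xBF );
		} elseif ( 0xE0 === $lead ) {
			list( $extra, $min, $max ) = array( 2, 0xA0, 0xBF ); // Excludes overlong forms.
		} elseif ( 0xED === $lead ) {
			list( $extra, $min, $max ) = array( 2, 0x80, 0x9F ); // Excludes surrogates.
		} elseif ( $lead >= 0xE1 && $lead <= 0xEF ) {
			list( $extra, $min, $max ) = array( 2, 0x80, 0xBF );
		} elseif ( 0xF0 === $lead ) {
			list( $extra, $min, $max ) = array( 3, 0x90, 0xBF ); // Excludes overlong forms.
		} elseif ( $lead >= 0xF1 && $lead <= 0xF3 ) {
			list( $extra, $min, $max ) = array( 3, 0x80, 0xBF );
		} elseif ( 0xF4 === $lead ) {
			list( $extra, $min, $max ) = array( 3, 0x80, 0x8F ); // Caps at U+10FFFF.
		} else {
			return false; // 0x80-0xC1 and 0xF5-0xFF never start a sequence.
		}
		if ( $i + $extra >= $length ) {
			return false; // Sequence is truncated.
		}
		$second = ord( $bytes[ $i + 1 ] );
		if ( $second < $min || $second > $max ) {
			return false;
		}
		for ( $k = 2; $k <= $extra; $k++ ) {
			$next = ord( $bytes[ $i + $k ] );
			if ( $next < 0x80 || $next > 0xBF ) {
				return false;
			}
		}
		$i += $extra + 1;
	}
	return true;
}

/** Delegate to mbstring when it is loaded; otherwise use the scanner. */
function example_is_valid_utf8( string $bytes ): bool {
	if ( function_exists( 'mb_check_encoding' ) ) {
		return mb_check_encoding( $bytes, 'UTF-8' );
	}
	return example_scan_utf8( $bytes );
}
```

Callers only see `example_is_valid_utf8()`, so the same code path works whether or not the host has `mbstring` installed.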

Historically, this became the common foundation for Blueprint validation and CSS/XML processing, replacing ad hoc Unicode helpers with the WordPress core UTF-8 routines used here.

Validating UTF-8 before storing it
----------------------------------

`wp_is_valid_utf8()` rejects overlong sequences, surrogate halves, and stray ISO-8859-1 bytes. Use it as a guard in front of any code path that assumes UTF-8 (database, JSON, XML).
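
As a minimal sketch of such a guard, assuming `wp_is_valid_utf8()` takes a string and returns a boolean: the `example_encode_comment()` helper is hypothetical, and the stand-in definition exists only so the snippet runs without the package loaded.

```php
<?php
// Stand-in so the sketch runs without the package installed. It is NOT the
// component's implementation: it relies on preg_match() failing on
// malformed UTF-8 subjects when the /u modifier is set.
if ( ! function_exists( 'wp_is_valid_utf8' ) ) {
	function wp_is_valid_utf8( string $bytes ): bool {
		return 1 === preg_match( '//u', $bytes );
	}
}

/**
 * Hypothetical guard: encode a comment as JSON only when it is valid UTF-8.
 * Returns null so the caller can reject or scrub the input instead.
 */
function example_encode_comment( string $text ): ?string {
	if ( ! wp_is_valid_utf8( $text ) ) {
		return null;
	}
	$json = json_encode( array( 'comment' => $text ) );
	return false === $json ? null : $json;
}

// A Latin-1 "é" (0xE9) is not valid UTF-8, so the guard refuses it.
var_dump( example_encode_comment( "caf\xE9" ) );
// The UTF-8 form (0xC3 0xA9) passes through to json_encode().
var_dump( example_encode_comment( "caf\xC3\xA9" ) );
```

Checking up front keeps the failure at the boundary where the input arrived, rather than letting it surface later inside the JSON encoder, the XML writer, or the database.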

```
