
infoxy/utext
============

Tiny set of PHP text utility classes.

Version 1.1.2 · MIT license · PHP >=5.5.0 · [Source](https://github.com/infoxy/utext) · [Packagist](https://packagist.org/packages/infoxy/utext)

- [Purposes and intro](#purposes-and-intro)

**Requirements:**

- PHP intl extension, or polyfills for the [Normalizer class](https://www.php.net/manual/en/class.normalizer.php) and the [IDN functions](https://www.php.net/manual/en/ref.intl.idn.php).
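
A quick runtime check for these requirements might look like the sketch below. The helper name `utext_requirements_sketch` is hypothetical and not part of the package; it relies only on PHP's built-in `class_exists()` and `function_exists()`, which also report polyfilled implementations once they are loaded. The return type declaration assumes PHP 7 or later.

```php
<?php
// Hypothetical helper (not part of infoxy/utext): reports whether the
// Normalizer class and the IDN functions are available, either from the
// intl extension or from a polyfill that is already loaded.
function utext_requirements_sketch(): array
{
    return [
        'normalizer'   => class_exists('Normalizer'),
        'idn_to_ascii' => function_exists('idn_to_ascii'),
        'idn_to_utf8'  => function_exists('idn_to_utf8'),
    ];
}

foreach (utext_requirements_sketch() as $name => $ok) {
    echo $name, ': ', $ok ? 'available' : 'missing', PHP_EOL;
}
```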

**Class list (all classes are in the `\infoxy\utext` namespace):**

- [PlainFilter](#plainfilter): Plain text filter and related utilities.
- [PlainSimpler](#plainsimpler): Filter that simplifies plain Unicode text.
- [HtmlBase](#htmlbase): Collection of static functions for DOMDocument manipulation.
    - [String to DOM and back](#string-to-dom-and-back)
    - [Class checking](#class-checking)
    - [DOM elements manipulations](#dom-elements-manipulations)
- [IdnaURL](#idnaurl): Internationalized domain name normalization and humanization class.

The first two classes can be used standalone; the latter two are built on them.

Purposes and intro
------------------

Editors, copywriters, and users differ in their HTML and Unicode skills. Some type text in a plain notepad, others use word processors or advanced publishing platforms, and anyone may copy and paste from foreign sources.

As a result, pieces of simple UTF-8 text (in a site's database, for example) can vary widely in formatting and technical quality. They:

- can contain invalid UTF-8 byte sequences;
- can mix composed and decomposed Unicode characters;
- may or may not contain encoded HTML entities;
- may or may not contain denormalized whitespace;
- can contain special spaces (thin spaces, fixed-width spaces) that look good in print but cause problems when copy-pasted into web pages;
- can contain special dashes, hyphens, or other symbols that may be missing from the fonts in use.

This makes text harder to search and ugly to look at. PlainFilter can transform plain text into a more normalized, clean form (based on the specified options) and also provides some additional services, such as a tag stripper and pattern usage. See the [PlainFilter](#plainfilter) section for details.
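
The differences described above are easy to reproduce with plain PHP. The snippet below is only an illustration with made-up sample strings; it does not use the package and assumes PHP 7 or later for the `\u{...}` escapes.

```php
<?php
// The same visible word "café" stored in different ways.
$composed   = "caf\u{00E9}";    // precomposed é (U+00E9)
$decomposed = "cafe\u{0301}";   // e + combining acute accent (U+0301)
$encoded    = 'caf&eacute;';    // HTML entity
$broken     = "caf\xC3";        // truncated UTF-8 sequence

// Byte-wise comparison treats the composed and decomposed forms as different.
var_dump($composed === $decomposed); // bool(false)

// Decoding the entity yields the composed form.
$decoded = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');
var_dump($decoded === $composed); // bool(true)

// With the /u modifier, preg_match() returns false for malformed UTF-8.
var_dump(preg_match('//u', $broken)); // bool(false)

// A thin space (U+2009) is not equal to an ordinary space.
var_dump("10\u{2009}000" === '10 000'); // bool(false)
```

A search for one form will miss the others unless the text is normalized first.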

PlainFilter
-----------

### Basic filtration

```
use \infoxy\utext\PlainFilter;

$pf = new PlainFilter;
$pf -> setLangId('ru')  // language for quote filter
    -> setOptions([     // set filter options
        'filter_utf8' => true,
        'decode_entities' => true,
        'lang_quotes' => true,    // replace " with language-specific quotes
        'replace_quotes' => true, // replace ' and " with curly forms
        'simplify_spaces' => true,
        'collapse_spaces' => true,
        'trim' => true,
        'normalize' => true,
    ]);
$filtered_string = $pf->filter($input_string);

```

The filter options are listed below in "logical pipeline" order:

**filter\_utf8** Pass through only valid UTF-8 characters and strip out any invalid byte sequences.

Note: the static method `PlainFilter::filter_utf8($s)` can also be called directly.
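
To make the idea concrete, here is a minimal stand-alone sketch of UTF-8 filtering. It is not the library's implementation: the function name `strip_invalid_utf8_sketch` is hypothetical, and the technique (collecting only well-formed byte sequences with a byte-level regular expression) is just one possible approach. It assumes PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical stand-alone sketch, not the library's implementation:
// keep only byte sequences that form well-formed UTF-8 characters.
function strip_invalid_utf8_sketch(string $s): string
{
    // Byte-level pattern for one well-formed UTF-8 character
    // (no overlong forms, no surrogates, nothing above U+10FFFF).
    $valid = '/[\x00-\x7F]'
           . '|[\xC2-\xDF][\x80-\xBF]'
           . '|\xE0[\xA0-\xBF][\x80-\xBF]'
           . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'
           . '|\xED[\x80-\x9F][\x80-\xBF]'
           . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'
           . '|[\xF1-\xF3][\x80-\xBF]{3}'
           . '|\xF4[\x80-\x8F][\x80-\xBF]{2}/';

    // Bytes that do not start a valid sequence are simply skipped.
    preg_match_all($valid, $s, $m);
    return implode('', $m[0]);
}

var_dump(strip_invalid_utf8_sketch("abc\xFFdef") === 'abcdef'); // bool(true)
```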

**newline\_tags** Insert `\r\n` before every […]

```
$opt = [
    // […]
    'replace_quotes' => true, // replace ' and " with curly forms
    'simplify_spaces' => true,
    'collapse_spaces' => true,
    'trim' => true,
    'normalize' => true,
];

// Call static escape_filter()
$prepared_string = PlainFilter::escape_filter($input_string, $opt);

```

PlainSimpler
------------

Unicode plain text simplifier.

**PlainSimpler::simplify($s, $lang)**

Simplifies Unicode plain text. The result is typically not something you want to show to end users; simplified text is meant for improving search queries and string comparison.

In more detail, `simplify()` does the following:

- Decodes HTML entities.
- Decomposes digraphs and ligatures by normalizing to NFKD.
- Applies additional language-based decomposition for umlauts, AE, etc. (Latin-based scripts).
- Preserves some specific diacritic combinations (Cyrillic 'Й').
- Removes all other diacritics.
- Finally, normalizes to NFC.
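
The generic steps in this list can be sketched with the intl `Normalizer` class. This is an illustration under stated assumptions, not the code of `PlainSimpler`: the function name `simplify_sketch` is hypothetical, the input is assumed to be valid UTF-8, and the language-based decomposition step is left out, so letters that NFKD does not decompose (such as "ß", "ø", or "œ") pass through unchanged.

```php
<?php
// Hypothetical sketch of the generic steps (needs the intl Normalizer class
// or a polyfill). Language-based decomposition is intentionally omitted.
function simplify_sketch(string $s): string
{
    // 1. Decode HTML entities.
    $s = html_entity_decode($s, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // 2. Compatibility decomposition: splits ligatures and separates
    //    base letters from their combining marks.
    $s = Normalizer::normalize($s, Normalizer::FORM_KD);

    // 3. Drop combining marks, except a breve (U+0306) that directly
    //    follows Cyrillic И/и, so that "Й" survives.
    $s = preg_replace(
        '/(?<=[\x{0418}\x{0438}])\x{0306}(*SKIP)(*FAIL)|\p{Mn}/u',
        '',
        $s
    );

    // 4. Recompose what is left.
    return Normalizer::normalize($s, Normalizer::FORM_C);
}

if (class_exists('Normalizer')) {
    echo simplify_sketch('cr&egrave;me br&ucirc;l&eacute;e'), PHP_EOL; // creme brulee
}
```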

Note: PlainSimpler can be used as the next stage after PlainFilter.

### PlainSimpler example

```
use \infoxy\utext\PlainSimpler;

// ...

$src = "bœf (fr), el niño (es), клёвый (ru), regelmäßig (de), øjemål (da)";

$simp_en = PlainSimpler::simplify($src, 'en');
// Produces "boef (fr), el ninno (es), клевый (ru), regelmassig (de), oejemal (da)"

$simp_de = PlainSimpler::simplify($src, 'de');
// Produces "boef (fr), el ninno (es), клевый (ru), regelmaessig (de), oejemaal (da)"

```

HtmlBase
--------

A collection of static functions for DOMDocument manipulation, so you do not need to create HtmlBase objects to use its methods.

Note: `toText()` and `toDom()` are intended for importing and exporting in-body HTML tags, not full documents with embedded scripts, styles, and CDATA sections.

### String to DOM and back

**HtmlBase::toDom($s)** Creates an HTML DOMDocument from string $s, which defines the body content of the created document. Returns the DOMElement body of the created document.

**HtmlBase::toText($e)** Exports the content of DOMElement $e to a string. Returns the HTML as a string.
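
For readers unfamiliar with PHP's DOM extension, the round trip described here could be approximated as follows. Both function names are hypothetical and the code is not taken from `HtmlBase`; it only shows one plausible way to wrap a fragment in a body element and serialize its children again, assuming PHP 7 or later.

```php
<?php
// Hypothetical sketch: parse an in-body HTML fragment and return <body>.
function body_from_string_sketch(string $html): DOMElement
{
    $doc  = new DOMDocument('1.0', 'UTF-8');
    $prev = libxml_use_internal_errors(true); // silence parser warnings
    $doc->loadHTML(
        '<!DOCTYPE html><html><head><meta charset="utf-8"></head><body>'
        . $html . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    return $doc->getElementsByTagName('body')->item(0);
}

// Hypothetical sketch: serialize the children of an element to HTML.
function inner_html_sketch(DOMElement $e): string
{
    $out = '';
    foreach ($e->childNodes as $child) {
        $out .= $e->ownerDocument->saveHTML($child);
    }
    return $out;
}

$body = body_from_string_sketch('Hi <b>there</b>');
echo $body->nodeName, PHP_EOL;          // body
echo inner_html_sketch($body), PHP_EOL;
```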

### Class checking

**HtmlBase::classCheck($s)** Checks whether string $s is acceptable as a class list. In the current version, this means that $s contains only a mixture of alphanumerics, '-', underscores, and spaces. Returns TRUE if the check passes and FALSE otherwise.

**HtmlBase::classArray($s)** Splits string $s into class names. Returns an array of strings (or an empty array if there are no classes).

**HtmlBase::classPat($classes)** Generates a pattern to match against the specified classes. $classes: an array of class names or a string of class names.

#### Usage example

```
$pat = HtmlBase::classPat('class1 class2 class3');
foreach ($nodes as $n) {
  if ($n->hasAttribute('class') && preg_match($pat, $n->getAttribute('class'))) {
     // the class attribute matches at least one class in the pattern
     // DoSomething($n);
  }
}

```
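
How such helpers might work internally can be sketched in a few lines. The three functions below are hypothetical re-implementations written for illustration (the `_sketch` names are not part of the package), and the real `HtmlBase` methods may differ in details such as how an empty string is treated. The type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical: accept only alphanumerics, '-', underscore and space.
function class_check_sketch(string $s): bool
{
    return preg_match('/^[A-Za-z0-9_\- ]*\z/', $s) === 1;
}

// Hypothetical: split a class attribute value into class names.
function class_array_sketch(string $s): array
{
    return preg_split('/\s+/', $s, -1, PREG_SPLIT_NO_EMPTY);
}

// Hypothetical: build a regex matching any of the given classes as a
// whole whitespace-delimited token. Assumes at least one class name.
function class_pat_sketch($classes): string
{
    $names = is_array($classes) ? $classes : class_array_sketch($classes);
    $alts  = implode('|', array_map(function ($c) {
        return preg_quote($c, '/');
    }, $names));

    return '/(?:^|\s)(?:' . $alts . ')(?:\s|$)/';
}

$pat = class_pat_sketch('class1 class2');
var_dump(preg_match($pat, 'foo class2 bar')); // int(1)
var_dump(preg_match($pat, 'foo class22'));    // int(0)
```

Matching whole tokens avoids false positives on names that merely start with one of the listed classes.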

### DOM elements manipulations

**HtmlBase::tagStrip($e)** Strips the tag (DOMElement) $e and reattaches its children to its parent. Returns the first reattached child (DOMNode), or NULL if there are no children or $e has no parent.

**HtmlBase::tagWrap($e, $tag)** Wraps (DOMElement) $e in a new DOMElement with the (string) $tag name. Returns the newly created DOMElement.

**HtmlBase::tagReplace($e, $tag)** Replaces (DOMElement) $e with a new DOMElement with the (string) $tag name and reattaches the children to it. The id, class, lang, and dir attributes are also copied to the new element. Returns the newly created DOMElement.

**HtmlBase::contentWrap($e, $tag)** Wraps the children of (DOMElement) $e in the specified tag. Returns the newly created DOMElement.
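
The strip and wrap operations map onto a handful of standard DOM calls. The sketch below is a hypothetical illustration using `insertBefore()`, `removeChild()`, and `replaceChild()`; the `_sketch` functions are not part of `HtmlBase`, and details such as removing a childless element are assumptions. It assumes PHP 7.1 or later for the nullable return type.

```php
<?php
// Hypothetical: remove $e but keep its children in the same position.
function unwrap_sketch(DOMElement $e): ?DOMNode
{
    $parent = $e->parentNode;
    if ($parent === null) {
        return null;
    }
    $first = $e->firstChild; // null when $e has no children
    while ($e->firstChild !== null) {
        $parent->insertBefore($e->firstChild, $e);
    }
    $parent->removeChild($e); // assumption: remove even when childless
    return $first;
}

// Hypothetical: put $e inside a new element named $tag.
// Assumes $e is attached to a parent node.
function wrap_sketch(DOMElement $e, string $tag): DOMElement
{
    $wrapper = $e->ownerDocument->createElement($tag);
    $e->parentNode->replaceChild($wrapper, $e);
    $wrapper->appendChild($e);
    return $wrapper;
}

$doc  = new DOMDocument();
$div  = $doc->appendChild($doc->createElement('div'));
$span = $div->appendChild($doc->createElement('span'));
$span->appendChild($doc->createTextNode('a'));
$b = $span->appendChild($doc->createElement('b', 'b'));

unwrap_sketch($span);
echo $doc->saveXML($div), PHP_EOL; // <div>a<b>b</b></div>

wrap_sketch($b, 'em');
echo $doc->saveXML($div), PHP_EOL; // <div>a<em><b>b</b></em></div>
```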

IdnaURL
-------

... in progress ...

