onoi/tesa
=========

A simple library to sanitize text elements

Tesa (text sanitizer)
=====================

[![Build Status](https://camo.githubusercontent.com/0022ee4b0918a4bb609d001207a3acb73d7cdb4a73ed4d23f508cab7e9214e3c/68747470733a2f2f7365637572652e7472617669732d63692e6f72672f6f6e6f692f746573612e7376673f6272616e63683d6d6173746572)](http://travis-ci.org/onoi/tesa)[![Code Coverage](https://camo.githubusercontent.com/97e555dd5b34d0d8d08ec64a568f69f7948753becc217e950ceb0274d4501922/68747470733a2f2f7363727574696e697a65722d63692e636f6d2f672f6f6e6f692f746573612f6261646765732f636f7665726167652e706e673f623d6d6173746572)](https://scrutinizer-ci.com/g/onoi/tesa/?branch=master)[![Scrutinizer Code Quality](https://camo.githubusercontent.com/a2a903151e9fb98bbadd46bcef61d2cec0611fc1ff3fde47667cf6bc6c4bba63/68747470733a2f2f7363727574696e697a65722d63692e636f6d2f672f6f6e6f692f746573612f6261646765732f7175616c6974792d73636f72652e706e673f623d6d6173746572)](https://scrutinizer-ci.com/g/onoi/tesa/?branch=master)[![Latest Stable Version](https://camo.githubusercontent.com/8e42382603f8e18eeea8a9678175159ce95d71fbb81dfc53b9eea5df24e3eef5/68747470733a2f2f706f7365722e707567782e6f72672f6f6e6f692f746573612f76657273696f6e2e706e67)](https://packagist.org/packages/onoi/tesa)[![Packagist download count](https://camo.githubusercontent.com/4fc7b07a6e2367dc44d5008cbc5a61e18b5dcf1743c6e28b46073190eb452b9b/68747470733a2f2f706f7365722e707567782e6f72672f6f6e6f692f746573612f642f746f74616c2e706e67)](https://packagist.org/packages/onoi/tesa)[![Dependency Status](https://camo.githubusercontent.com/d66dfda44c4b7ee2b1bdf4dd64763d026508cee3c4f018b5fc3c12bbf365cc65/68747470733a2f2f7777772e76657273696f6e6579652e636f6d2f7068702f6f6e6f693a746573612f62616467652e706e67)](https://www.versioneye.com/php/onoi:tesa)

The library contains a small collection of helper classes to support the sanitization of text or string elements of arbitrary length, with the aim of improving search match confidence during query execution. It is required by the [Semantic MediaWiki](https://github.com/SemanticMediaWiki/SemanticMediaWiki/) project and is deployed independently.

Requirements
------------

- PHP 5.3 / HHVM 3.5 or later
- Enabling the [ICU](http://php.net/manual/en/intro.intl.php) extension is recommended
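
To illustrate why the ICU extension is worth enabling, here is a minimal sketch of word splitting that uses ICU word boundaries when `intl` is loaded and falls back to a regular expression otherwise. `splitIntoWords` is a hypothetical helper written for this example and is not part of the library; the `'en'` locale is likewise an assumption.

```php
<?php
/**
 * Hypothetical helper (not part of onoi/tesa): splits text into words,
 * using ICU word boundaries when the intl extension is loaded and a
 * Unicode-aware regular expression otherwise.
 */
function splitIntoWords( $text ) {
	if ( extension_loaded( 'intl' ) && class_exists( 'IntlBreakIterator' ) ) {
		$iterator = IntlBreakIterator::createWordInstance( 'en' );
		$iterator->setText( $text );

		$words = array();

		// getPartsIterator() yields every segment between two boundaries,
		// including whitespace and punctuation, which are skipped here
		foreach ( $iterator->getPartsIterator() as $part ) {
			if ( preg_match( '/[\p{L}\p{N}]/u', $part ) ) {
				$words[] = $part;
			}
		}

		return $words;
	}

	// Fallback: split on anything that is not a letter or a digit
	return preg_split( '/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY );
}

print_r( splitIntoWords( 'Sanitize this text, please!' ) );
```

The regular expression fallback is adequate for space-delimited scripts, whereas ICU boundary analysis also handles scripts that do not separate words with whitespace.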

Installation
------------

The recommended installation method for this library is by adding the following dependency to your [composer.json](https://getcomposer.org/).

```
{
	"require": {
		"onoi/tesa": "~0.1"
	}
}
```

Usage
-----

```
use Onoi\Tesa\SanitizerFactory;
use Onoi\Tesa\Transliterator;
use Onoi\Tesa\Sanitizer;

// The factory is the entry point for creating library instances
$sanitizerFactory = new SanitizerFactory();

$sanitizer = $sanitizerFactory->newSanitizer( 'A string that contains ...' );

// Limit the text length and normalize the case
$sanitizer->reduceLengthTo( 200 );
$sanitizer->toLowercase();

// Replace apostrophes and common URI prefixes with an empty string
$sanitizer->replace(
	array( "'", "http://", "https://", "mailto:", "tel:" ),
	array( '' )
);

// Set the minimum length option and whitelist the word "that"
$sanitizer->setOption( Sanitizer::MIN_LENGTH, 4 );
$sanitizer->setOption( Sanitizer::WHITELIST, array( 'that' ) );

// Transliterate diacritics and Greek characters
$sanitizer->applyTransliteration(
	Transliterator::DIACRITICS | Transliterator::GREEK
);

// Run the text through a tokenizer, stopword analyzer, and synonymizer
$text = $sanitizer->sanitizeWith(
	$sanitizerFactory->newGenericTokenizer(),
	$sanitizerFactory->newNullStopwordAnalyzer(),
	$sanitizerFactory->newNullSynonymizer()
);
```

- `SanitizerFactory` is expected to be the sole entry point for services and instances when used outside of this library
- `IcuWordBoundaryTokenizer` is the preferred tokenizer when the [ICU](http://php.net/manual/en/intro.intl.php) extension is available
- `NGramTokenizer` is provided to increase CJK match confidence when the back-end does not provide an explicit n-gram tokenizer
- `StopwordAnalyzer`, together with a `LanguageDetector`, is provided as a means to keep frequent "noise" words out of a possible search index and thereby reduce ambiguity
- `Synonymizer` currently only provides an interface
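
Two of the ideas in the list above, n-gram tokenization and noise word filtering, can be sketched in plain PHP. The functions `characterNgrams` and `filterTokens` are hypothetical helpers written for this illustration. They are not the library's `NGramTokenizer` or `StopwordAnalyzer`, and the filtering rule (drop stopwords and short tokens unless whitelisted) is an assumption about how such a filter might behave. The sketch also assumes that the `mbstring` extension is available.

```php
<?php
/**
 * Hypothetical helper (not the library's NGramTokenizer): returns all
 * overlapping character n-grams of a string, multibyte-safe.
 */
function characterNgrams( $text, $n = 2 ) {
	$length = mb_strlen( $text, 'UTF-8' );

	// Text that is not longer than n is returned as a single token
	if ( $length <= $n ) {
		return $length > 0 ? array( $text ) : array();
	}

	$ngrams = array();

	for ( $i = 0; $i <= $length - $n; $i++ ) {
		$ngrams[] = mb_substr( $text, $i, $n, 'UTF-8' );
	}

	return $ngrams;
}

/**
 * Hypothetical helper: drops tokens that are stopwords or shorter than
 * $minLength, unless they are whitelisted.
 */
function filterTokens( array $tokens, array $stopwords, $minLength, array $whitelist = array() ) {
	$kept = array();

	foreach ( $tokens as $token ) {
		$isExempt = in_array( $token, $whitelist, true );
		$isNoise = mb_strlen( $token, 'UTF-8' ) < $minLength || in_array( $token, $stopwords, true );

		if ( $isExempt || !$isNoise ) {
			$kept[] = $token;
		}
	}

	return $kept;
}

print_r( characterNgrams( '東京都庁', 2 ) );
print_r( filterTokens(
	array( 'a', 'string', 'that', 'is', 'the', 'input' ),
	array( 'the', 'that' ),
	3,
	array( 'that' )
) );
```

Overlapping n-grams let a search for a substring such as 京都 match text that has no whitespace to split on, which is why they help with CJK content when the back-end has no n-gram tokenizer of its own.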

Contribution and support
------------------------

If you want to contribute work to the project, please subscribe to the developers mailing list and have a look at the [contribution guidelines](/CONTRIBUTING.md). A list of people who have made contributions in the past can be found [here](https://github.com/onoi/tesa/graphs/contributors).

- [File an issue](https://github.com/onoi/tesa/issues)
- [Submit a pull request](https://github.com/onoi/tesa/pulls)

Tests
-----

The library provides unit tests that cover the core functionality and are normally run by the [continuous integration platform](https://travis-ci.org/onoi/tesa). Tests can also be executed manually using the `composer phpunit` command from the root directory.

Release notes
-------------

- 0.1.0 Initial release (2016-08-07)
  - Added `SanitizerFactory` with support for a `Tokenizer`, `LanguageDetector`, `Synonymizer`, and `StopwordAnalyzer` interface

Acknowledgments
---------------

- The `Transliterator` uses the same diacritics conversion table as (except the German diaeresis ä, ü, and ö)
- The stopwords used by the `StopwordAnalyzer` have been collected from different sources; each `json` file identifies its origin
- `CdbStopwordAnalyzer` relies on `wikimedia/cdb` to avoid using an external database or cache layer (with extra stopwords being available [here](https://github.com/6/stopwords-json))
- `JaTinySegmenterTokenizer` is based on the work of Taku Kudo and his [tiny\_segmenter.js](http://chasen.org/~taku/software/TinySegmenter)
- `TextCatLanguageDetector` uses the [`wikimedia/textcat`](https://github.com/wikimedia/wikimedia-textcat) library to make predictions about a language

License
-------

[GNU General Public License 2.0 or later](https://www.gnu.org/copyleft/gpl.html).
