thingston/language
==================

Pure-PHP language detection library using n-gram profiles

[Source](https://github.com/thingston/language) · [Packagist](https://packagist.org/packages/thingston/language)

Thingston Language
==================

Pure-PHP language detection library using n-gram frequency profiles. Given any input text, returns a ranked list of candidate languages with confidence scores. No external services or compiled extensions required beyond `ext-mbstring`.

Installation
------------

```
composer require thingston/language
```

Usage
-----

```
use Thingston\Language\LanguageDetector;

$detector = new LanguageDetector();

$results = $detector->detect('Bonjour tout le monde');

$best = $results->best();
echo $best->getCode();       // fr
echo $best->getName();       // French
echo $best->getConfidence(); // ~0.97

foreach ($results->top(5) as $score) {
    echo $score->getCode() . ': ' . round($score->getConfidence() * 100, 1) . '%' . PHP_EOL;
}
```

### Restrict to a subset of languages

```
$detector = new LanguageDetector(languages: ['en', 'fr', 'de', 'es']);
$best = $detector->detect('The weather is lovely today')->best();
echo $best->getCode(); // en
```

### Custom n-gram sizes

```
$detector = new LanguageDetector(ngramSizes: [1, 2, 3]);
```

### Custom profile repository

```
use Thingston\Language\Profile\ProfileRepository;

$detector = new LanguageDetector(profileRepository: new ProfileRepository('/path/to/profiles'));
```

Supported languages (74)
------------------------

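| Code | Language    | Code | Language   | Code | Language   |
|------|-------------|------|------------|------|------------|
| af   | Afrikaans   | hi   | Hindi      | ps   | Pashto     |
| am   | Amharic     | hr   | Croatian   | pt   | Portuguese |
| ar   | Arabic      | hu   | Hungarian  | ro   | Romanian   |
| az   | Azerbaijani | id   | Indonesian | ru   | Russian    |
| be   | Belarusian  | ig   | Igbo       | si   | Sinhala    |
| bg   | Bulgarian   | it   | Italian    | sk   | Slovak     |
| bn   | Bengali     | ja   | Japanese   | sl   | Slovenian  |
| bs   | Bosnian     | kk   | Kazakh     | so   | Somali     |
| ca   | Catalan     | km   | Khmer      | sq   | Albanian   |
| cs   | Czech       | kn   | Kannada    | sr   | Serbian    |
| cy   | Welsh       | ko   | Korean     | sv   | Swedish    |
| da   | Danish      | lo   | Lao        | sw   | Swahili    |
| de   | German      | lt   | Lithuanian | ta   | Tamil      |
| el   | Greek       | lv   | Latvian    | te   | Telugu     |
| en   | English     | mk   | Macedonian | tg   | Tajik      |
| es   | Spanish     | ml   | Malayalam  | th   | Thai       |
| et   | Estonian    | mn   | Mongolian  | tl   | Tagalog    |
| eu   | Basque      | mr   | Marathi    | tr   | Turkish    |
| fa   | Persian     | ms   | Malay      | uk   | Ukrainian  |
| fi   | Finnish     | my   | Burmese    | ur   | Urdu       |
| fr   | French      | ne   | Nepali     | uz   | Uzbek      |
| gl   | Galician    | nl   | Dutch      | vi   | Vietnamese |
| gu   | Gujarati    | no   | Norwegian  | yo   | Yoruba     |
| ha   | Hausa       | pa   | Punjabi    | zh   | Chinese    |
| he   | Hebrew      | pl   | Polish     |      |            |

How it works
------------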

Language profiles are built from [Tatoeba](https://tatoeba.org) sentence corpora (CC-BY 2.0). For each language, the 1,000 most frequent character n-grams (sizes 1–4) are stored together with their relative frequencies.
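To make the profile idea concrete, here is a minimal sketch of how such a profile could be computed from a handful of sentences. It is not the library's build script: the function name `buildProfile`, the array layout, the normalization applied at build time, and the choice to divide by the total n-gram count before truncation are all assumptions made for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Illustrative only: builds an n-gram => relative-frequency map.
 * The name, signature and output layout are hypothetical, not the library's.
 *
 * @param list<string> $sentences Training sentences for one language.
 * @param list<int>    $sizes     N-gram sizes to count.
 * @param int          $top       How many of the most frequent n-grams to keep.
 * @return array<string, float>   Most frequent n-grams first.
 */
function buildProfile(array $sentences, array $sizes = [1, 2, 3, 4], int $top = 1000): array
{
    $counts = [];
    foreach ($sentences as $sentence) {
        // Assumption: lowercase, then collapse every run of non-letters into one space.
        $text = trim(preg_replace('/[^\p{L}]+/u', ' ', mb_strtolower($sentence, 'UTF-8')) ?? '');
        $length = mb_strlen($text, 'UTF-8');
        foreach ($sizes as $n) {
            // Slide a window of $n characters across the text.
            for ($i = 0; $i + $n <= $length; $i++) {
                $gram = mb_substr($text, $i, $n, 'UTF-8');
                $counts[$gram] = ($counts[$gram] ?? 0) + 1;
            }
        }
    }

    $total = array_sum($counts);
    if ($total === 0) {
        return [];
    }

    // Sort by count, descending, and keep only the $top most frequent n-grams.
    arsort($counts);
    $profile = [];
    foreach (array_slice($counts, 0, $top, true) as $gram => $count) {
        // Assumption: frequencies are relative to all n-grams seen, not just the kept ones.
        $profile[(string) $gram] = $count / $total;
    }

    return $profile;
}
```

Calling `buildProfile(['ab ab'], [1, 2], 3)` keeps the three n-grams that occur twice (`a`, `b`, `ab`), each with frequency 2/9, since nine n-grams are counted in total.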

At detection time the input text is normalized (lowercased, non-letter characters stripped), n-grams are extracted, and each language profile is scored with a log-probability sum. Raw scores are converted to confidences via softmax.
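The detection steps can be sketched in a few lines of PHP. This is an illustration under stated assumptions, not the library's implementation: `extractNgrams`, `scoreLanguages`, the `$floor` probability used for n-grams missing from a profile, and the profile array shape are all hypothetical.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: normalizes the text and returns its character n-grams.
 *
 * @param list<int> $sizes
 * @return list<string>
 */
function extractNgrams(string $text, array $sizes): array
{
    // Lowercase and replace every run of non-letters with a single space.
    $clean = trim(preg_replace('/[^\p{L}]+/u', ' ', mb_strtolower($text, 'UTF-8')) ?? '');
    $length = mb_strlen($clean, 'UTF-8');
    $grams = [];
    foreach ($sizes as $n) {
        for ($i = 0; $i + $n <= $length; $i++) {
            $grams[] = mb_substr($clean, $i, $n, 'UTF-8');
        }
    }

    return $grams;
}

/**
 * Hypothetical scorer: log-probability sum per language, then softmax.
 *
 * @param array<string, array<string, float>> $profiles Language code => (n-gram => frequency).
 * @param list<int>                           $sizes
 * @param float                               $floor    Assumed probability for unseen n-grams.
 * @return array<string, float> Language code => confidence, best first.
 */
function scoreLanguages(string $text, array $profiles, array $sizes = [1, 2, 3], float $floor = 1e-6): array
{
    if ($profiles === []) {
        return [];
    }

    $grams = extractNgrams($text, $sizes);
    $logScores = [];
    foreach ($profiles as $code => $profile) {
        $sum = 0.0;
        foreach ($grams as $gram) {
            // Unseen n-grams get a small floor so that log() stays finite.
            $sum += log($profile[$gram] ?? $floor);
        }
        $logScores[$code] = $sum;
    }

    // Softmax; subtracting the maximum first avoids underflow in exp().
    $max = max($logScores);
    $exp = array_map(fn (float $s): float => exp($s - $max), $logScores);
    $total = array_sum($exp);
    $confidences = array_map(fn (float $e): float => $e / $total, $exp);
    arsort($confidences);

    return $confidences;
}
```

Because the raw scores are sums of logarithms, they grow more negative with text length; the softmax step turns them into values that add up to 1 across the candidate languages, which is what makes them usable as confidences.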

Rebuilding profiles
-------------------

If you want to regenerate language profiles from fresh training data:

```
# 1. Download Tatoeba sentence corpora (requires ext-bz2)
php bin/download-corpus.php

# 2. Build PHP profile files
php bin/build-profiles.php

# Optional: restrict to specific languages
php bin/download-corpus.php en fr de
php bin/build-profiles.php en fr de --top=1000 --sizes=1,2,3,4
```

Development
-----------

```
composer install

# Run unit tests
composer test-unit

# Run accuracy tests (requires built profiles)
composer test-accuracy

# Code style check
composer cs

# Code style fix
composer cs-fix

# Static analysis (PHPStan level 9)
composer stan
```

License
-------

MIT

Maintainer
----------

[pedrobrazao](https://github.com/pedrobrazao)
