andywer/language-detector
=========================

PHP library to detect the language of any free text.

Version 0.2.0 (10y ago) · BSD-4-Clause · PHP

[Source](https://github.com/andywer/language-detector) · [Packagist](https://packagist.org/packages/andywer/language-detector)

LanguageDetector [![Build Status](https://camo.githubusercontent.com/bd47fc04624b1f430283bc11f7c77436b5b89abcee0b53cf8cfdc53784008b2c/68747470733a2f2f7472617669732d63692e6f72672f616e64797765722f6c616e67756167652d6465746563746f722e706e67)](https://travis-ci.org/andywer/language-detector)
================

PHP library to detect languages from any free text.

It follows the approach described in the [paper](http://scholar.google.com.py/scholar?q=N-Gram-Based+Text+Categorization): a given text is tokenized into [N-Grams](http://en.wikipedia.org/wiki/N-gram) (whitespace is cleaned up before this step). The `tokens` are then sorted and compared against a language `model`.
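To make the tokenization step more concrete, here is a minimal sketch of splitting a text into character N-Grams after cleaning up whitespace. The function name `extractNgrams` and the default of N = 3 are assumptions made for this illustration and are not part of the library's API; the sketch also assumes the `mbstring` extension is available.

```php
<?php
// Hypothetical helper, NOT part of LanguageDetector: splits a text
// into overlapping character N-Grams after cleaning up whitespace.
function extractNgrams(string $text, int $n = 3): array
{
    // collapse every run of whitespace into one space, then trim the ends
    $clean  = trim((string) preg_replace('/\s+/u', ' ', $text));
    $length = mb_strlen($clean, 'UTF-8');

    $ngrams = [];
    // slide a window of $n characters over the cleaned text
    for ($i = 0; $i + $n <= $length; $i++) {
        $ngrams[] = mb_substr($clean, $i, $n, 'UTF-8');
    }
    return $ngrams;
}

print_r(extractNgrams("hello   world"));
```

A text shorter than N characters yields an empty list, so very short inputs carry little or no signal for this kind of approach.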

*Fork of [crodas/languagedetector](https://github.com/crodas/LanguageDetector), since the original package seems abandoned.*

How it works
------------


The first thing we need is a `language model` (which looks like [this file](https://github.com/crodas/LanguageDetector/blob/master/example/datafile.php)) that texts are compared against at classification time. The model must be generated *before* anything else, using a script similar to [this file](https://github.com/crodas/LanguageDetector/blob/master/example/learn.php).

```
// register the autoloader
require 'lib/LanguageDetector/autoload.php';

// it could use a little bit of memory, but it's fine
// because this process runs once.
ini_set('memory_limit', '1G');

// we load the configuration (which will be serialized
// later into our language model file)
$config = new LanguageDetector\Config;

$c = new LanguageDetector\Learn($config);
foreach (glob(__DIR__ . '/samples/*') as $file) {
    // feed with examples ('language', 'text');
    $c->addSample(basename($file), file_get_contents($file));
}

// some callback so we know where the process is
$c->addStepCallback(function($lang, $status) {
    echo "Learning {$lang}: $status\n";
});

// save it in `language.php`.
// we currently support the `php` serialization, but it's trivial
// to add other formats: just extend `\LanguageDetector\Format\AbstractFormat`.
// You can check an example at https://github.com/crodas/LanguageDetector/blob/master/lib/LanguageDetector/Format/PHP.php
$c->save(LanguageDetector\Format\AbstractFormat::initFormatByPath('language.php'));
```

Once we have our language model file (in this case `language.php`) we're ready to classify texts by their language.

```
// register the autoloader
require 'lib/LanguageDetector/autoload.php';

// we load the language model, it would create
// the $config object for us.
$detect = LanguageDetector\Detect::initByPath('language.php');

$lang = $detect->detect("Agricultura (-ae, f.), sensu latissimo,
est summa omnium artium et scientiarum et technologiarum quae de
terris colendis et animalibus creandis curant, ut poma, frumenta,
charas, carnes, textilia, et aliae res e terra bene producantur.
Specialius, agronomia est ars et scientia quae terris colendis student,
agricultio autem animalibus creandis.");

var_dump($lang);
```

And that's it.

Algorithms
----------


The project is designed to work with modules, which means you can provide your own algorithms for `sorting` and `comparing` the N-Grams. By default the library uses [PageRank](http://en.wikipedia.org/wiki/PageRank) as the `sorting` algorithm and *out of place* (described in the paper) as the `comparing` algorithm.

To supply your own algorithms, change the `$config` at the *learning stage* to load your own classes (which should implement the corresponding interfaces).
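As an illustration of the *out of place* comparison mentioned above, the following sketch computes the measure as the paper describes it: each N-Gram in the text profile is looked up in the model profile, the difference in rank is added to the distance, and an N-Gram missing from the model gets a maximum penalty. The function name `outOfPlaceDistance`, the tiny profiles, and the choice of the model size as the penalty are assumptions for this example; this is not the library's own implementation, and it does not reproduce the PageRank sorting.

```php
<?php
// Hypothetical helper, NOT the library's implementation.
// Both profiles are lists of N-Grams ordered from most to least relevant.
function outOfPlaceDistance(array $textProfile, array $modelProfile): int
{
    // map each model N-Gram to its rank (its position in the list)
    $modelRanks = array_flip($modelProfile);
    // assumption: N-Grams missing from the model cost the model's size
    $maxPenalty = count($modelProfile);

    $distance = 0;
    foreach ($textProfile as $rank => $ngram) {
        $distance += isset($modelRanks[$ngram])
            ? abs($rank - $modelRanks[$ngram])
            : $maxPenalty;
    }
    return $distance;
}

// made-up toy profiles, for illustration only
$text   = ['th', 'he', 'er'];
$models = [
    'english' => ['th', 'he', 'in', 'er'],
    'spanish' => ['de', 'es', 'en', 'er'],
];

$scores = [];
foreach ($models as $lang => $profile) {
    $scores[$lang] = outOfPlaceDistance($text, $profile);
}
asort($scores); // the smallest distance is the best match
print_r($scores);
```

The language whose model has the smallest total distance is taken as the best match, which is why the scores are sorted in ascending order.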

Language Detection Training Files
---------------------------------


Have a look at the `example/samples` directory. For more advanced training data, visit the [Leipzig Corpora Download Page](http://corpora2.informatik.uni-leipzig.de/download.html).

Languages with non-Latin characters
-----------------------------------


If you train for languages written in non-Latin characters, remember to set the Config's `mb` property before creating the language model, and use UTF-8 encoded texts.
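The reason multi-byte handling matters can be seen with PHP's standard string functions alone. The sketch below does not use the library or its `mb` setting (whose exact API is not shown here); it only contrasts byte-based and character-based functions on a UTF-8 string, and assumes the `mbstring` extension is available.

```php
<?php
// "язык" (Russian for "language") is 4 characters but 8 bytes in UTF-8.
$word = "язык";

var_dump(strlen($word));             // counts bytes
var_dump(mb_strlen($word, 'UTF-8')); // counts characters

// A byte-based slice can cut a character in half and yield invalid UTF-8,
// while the multi-byte slice always returns whole characters.
$byteSlice = substr($word, 0, 3);
$charSlice = mb_substr($word, 0, 3, 'UTF-8');

var_dump(mb_check_encoding($byteSlice, 'UTF-8')); // validity of the byte slice
var_dump(mb_check_encoding($charSlice, 'UTF-8')); // validity of the character slice
```

N-Grams built from byte slices of such text would contain broken characters, which is presumably what the multi-byte setting is there to avoid.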



