xcalder/languagedetector
========================

simple library to classify texts

v0.1.1(12y ago)019BSD-4-ClausePHP

Since Apr 11 · Pushed 7y ago · 1 watcher

[Source](https://github.com/xcalder/LanguageDetector) · [Packagist](https://packagist.org/packages/xcalder/languagedetector)


LanguageDetector [![Build Status](https://camo.githubusercontent.com/7ec31c8fcec9646e190597f381c156bc89bce7de23271fb3e1a6b6001e976b15/68747470733a2f2f7472617669732d63692e6f72672f63726f6461732f4c616e67756167654465746563746f722e706e67)](https://travis-ci.org/crodas/LanguageDetector) [![Flattr this git repo](https://camo.githubusercontent.com/7e3f46a36526479d701ef7f90a0f8c3ac2fbab3087446e2a9fceed75cd1ab802/687474703a2f2f6170692e666c617474722e636f6d2f627574746f6e2f666c617474722d62616467652d6c617267652e706e67)](https://flattr.com/submit/auto?user_id=crodas&url=https://github.com/crodas/LanguageDetector&title=Language%20Detector%20Library&language=en&tags=github&category=software)
================


PHP Class to detect languages from any free text.

It follows the approach described in the paper [N-Gram-Based Text Categorization](http://scholar.google.com.py/scholar?q=N-Gram-Based+Text+Categorization): a given text is tokenized into [N-Grams](http://en.wikipedia.org/wiki/N-gram) (whitespace is cleaned up before this step). The `tokens` are then sorted and compared against a language `model`.
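To make the tokenizing step concrete, here is a minimal sketch of splitting a text into character N-Grams and ranking them. The function `rankNGrams` and its parameters are hypothetical and not part of this library's API. It ranks by raw frequency, as in the paper, rather than with the PageRank sorting the library uses by default.

```php
<?php
// Hypothetical helper for illustration only; not part of LanguageDetector.
// Splits $text into character N-Grams of length 1..$maxN and returns the
// $limit most frequent ones, ordered from most to least frequent.
function rankNGrams(string $text, int $maxN = 3, int $limit = 300): array
{
    // Clean up whitespace: collapse each run into one underscore and pad
    // both ends, so word boundaries become part of the N-Grams.
    $clean = '_' . preg_replace('/\s+/u', '_', trim($text)) . '_';

    // Split into UTF-8 characters rather than bytes.
    $chars = preg_split('//u', $clean, -1, PREG_SPLIT_NO_EMPTY);
    $total = count($chars);

    // Count every N-Gram of every length from 1 to $maxN.
    $counts = [];
    for ($n = 1; $n <= $maxN; $n++) {
        for ($i = 0; $i + $n <= $total; $i++) {
            $gram = implode('', array_slice($chars, $i, $n));
            $counts[$gram] = ($counts[$gram] ?? 0) + 1;
        }
    }

    // Most frequent first; ties are broken alphabetically so the
    // result is deterministic.
    uksort($counts, function ($a, $b) use ($counts) {
        return $counts[$b] <=> $counts[$a] ?: strcmp((string) $a, (string) $b);
    });

    // PHP turns numeric string keys into integers, so cast back to strings.
    $ranked = array_map('strval', array_keys($counts));

    return array_slice($ranked, 0, $limit);
}

print_r(rankNGrams('aa a', 2));
```

The ranked list, not the raw counts, is what gets compared against a language model, which is why only the order is returned.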

How it works
------------


The first thing we need is a `language model` (which looks like [this file](https://github.com/crodas/LanguageDetector/blob/master/example/datafile.php)) that texts are compared against at classification time. This step must be done *before* anything else; the model can be generated with a script similar to [this file](https://github.com/crodas/LanguageDetector/blob/master/example/learn.php).

```
// register the autoloader
require 'lib/LanguageDetector/autoload.php';

// it could use a little bit of memory, but it's fine
// because this process runs once.
ini_set('memory_limit', '1G');

// we load the configuration (which will be serialized
// later into our language model file)
$config = new LanguageDetector\Config;

$c = new LanguageDetector\Learn($config);
foreach (glob(__DIR__ . '/samples/*') as $file) {
    // feed with examples ('language', 'text');
    $c->addSample(basename($file), file_get_contents($file));
}

// some callback so we know where the process is
$c->addStepCallback(function($lang, $status) {
    echo "Learning {$lang}: $status\n";
});

// save it in `language.php`.
// we currently support the `php` serialization, but it's trivial
// to add other formats: just extend `\LanguageDetector\Format\AbstractFormat`.
// You can check an example at https://github.com/crodas/LanguageDetector/blob/master/lib/LanguageDetector/Format/PHP.php
// (`AbstractFormat` must be imported with `use` or fully qualified to resolve here)
$c->save(AbstractFormat::initFormatByPath('language.php'));
```

Once we have our language model file (in this case `language.php`) we're ready to classify texts by their language.

```
// register the autoloader
require 'lib/LanguageDetector/autoload.php';

// we load the language model, which creates
// the $config object for us.
$detect = LanguageDetector\Detect::initByPath('language.php');

$lang = $detect->detect("Agricultura (-ae, f.), sensu latissimo,
est summa omnium artium et scientiarum et technologiarum quae de
terris colendis et animalibus creandis curant, ut poma, frumenta,
charas, carnes, textilia, et aliae res e terra bene producantur.
Specialius, agronomia est ars et scientia quae terris colendis student,
agricultio autem animalibus creandis.");

var_dump($lang);
```

And that's it.

Algorithms
----------


The project is modular: you can provide your own algorithms for `sorting` and `comparing` the N-Grams. By default the library uses [PageRank](http://en.wikipedia.org/wiki/PageRank) as the `sorting` algorithm and *out of place* (described in the paper) as the `comparing` algorithm.

To supply your own algorithms, change the `$config` at the *learning stage* to load your own classes (which should implement the library's interfaces).
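As an illustration of the *out of place* comparison, here is a minimal sketch of the measure as described in the paper. It is not this library's own comparer: the function `outOfPlace`, the toy models, and the language codes `xx` and `yy` are all made up for the example. Each N-Gram of the document is charged the difference between its rank in the document and its rank in the model; an N-Gram the model has never seen is charged a fixed maximum penalty.

```php
<?php
// Hypothetical helper for illustration only; not part of LanguageDetector.
// $document and $model are 0-indexed lists of N-Grams ordered from most
// to least frequent. A smaller result means a closer match.
function outOfPlace(array $document, array $model): int
{
    $modelRank  = array_flip($model); // N-Gram => rank inside the model
    $maxPenalty = count($model);      // cost of an N-Gram missing from the model

    $distance = 0;
    foreach ($document as $rank => $gram) {
        $distance += isset($modelRank[$gram])
            ? abs($rank - $modelRank[$gram])
            : $maxPenalty;
    }

    return $distance;
}

// Toy models with made-up language codes and single-letter "N-Grams".
$models = [
    'xx' => ['a', 'b', 'c', 'd'],
    'yy' => ['d', 'c', 'b', 'a'],
];
$document = ['a', 'c', 'b', 'z'];

// Score the document against every model and keep the closest one.
$scores = [];
foreach ($models as $lang => $model) {
    $scores[$lang] = outOfPlace($document, $model);
}
asort($scores);
$best = array_key_first($scores);

echo "Best match: {$best}\n";
```

The language whose model gives the smallest total distance is taken as the best guess. `array_key_first` requires PHP 7.3 or later.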


### Community

Maintainer: [xcalder](https://github.com/xcalder)

Top contributors: [crodas](https://github.com/crodas) (47 commits), [mente](https://github.com/mente) (7 commits), [adam-lynch](https://github.com/adam-lynch) (2 commits), [xcalder](https://github.com/xcalder) (2 commits), [pborreli](https://github.com/pborreli) (1 commit), [sasezaki](https://github.com/sasezaki) (1 commit)

