
patrickschur/language-detection
===============================

A language detection library for PHP. Detects the language from a given text string.

Latest release: v5.3.1 · License: MIT · Requires PHP ^7.4 || ^8.0


This library can detect the language of a given text string. It parses training text in many different languages into a sequence of [N-grams](https://en.wikipedia.org/wiki/N-gram) and builds a database file in PHP to be used in the detection phase. It then takes a given text and detects its language using the database generated during the training phase. The library ships with text samples for training and detection in 110 languages.
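To make the N-gram idea concrete, here is a minimal sketch of how a ranked character n-gram profile could be built from a text. It is not the library's implementation: the function name `buildNgramProfile`, the `_` word padding and the maximum n-gram length of 3 are assumptions made for illustration. Only the default of 310 n-grams is taken from this README. Lowercasing is left out, and the input is assumed to be valid UTF-8.

```php
<?php

/**
 * Build a ranked character n-gram profile for a text.
 * Illustrative only; names and defaults other than 310 are assumptions.
 *
 * @return string[] n-grams ordered from most to least frequent
 */
function buildNgramProfile(string $text, int $maxLength = 3, int $maxNgrams = 310): array
{
    $counts = [];

    // Split the text on whitespace and pad every word with "_" to mark its boundaries.
    foreach (preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY) as $word) {
        // Split the padded word into single UTF-8 characters.
        $chars = preg_split('//u', '_' . $word . '_', -1, PREG_SPLIT_NO_EMPTY);
        $total = count($chars);

        // Count every n-gram of length 1..$maxLength.
        for ($n = 1; $n <= $maxLength; $n++) {
            for ($i = 0; $i + $n <= $total; $i++) {
                $ngram = implode('', array_slice($chars, $i, $n));
                $counts[$ngram] = ($counts[$ngram] ?? 0) + 1;
            }
        }
    }

    // Most frequent first, then keep only the top $maxNgrams entries.
    arsort($counts);

    // Numeric n-grams such as "1" become integer keys, so cast back to strings.
    return array_map('strval', array_slice(array_keys($counts), 0, $maxNgrams));
}

print_r(buildNgramProfile('aa aa ab', 1, 2));
```

A detector would then compare the profile of the input text with the stored profile of each language. How the library scores that comparison is not described in this README.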

Table of Contents
-----------------

- [Installation with Composer](#installation-with-composer)
- [How to upgrade from 3.y.z to 4.y.z?](#how-to-upgrade)
- [Basic Usage](#basic-usage)
- [API](#api)
- [Method Chaining](#method-chaining)
- [Array Access](#arrayaccess)
- [List of supported languages](#supported-languages)
- [Other languages](#other-languages)
- [FAQ](#faq)
- [Contributing](#contributing)
- [License](#license)

Installation with Composer
--------------------------

> **Note:** This library requires the [Multibyte String](https://secure.php.net/manual/en/book.mbstring.php) extension in order to work.

```
$ composer require patrickschur/language-detection
```

How to upgrade from `3.y.z` to `4.y.z`?
---------------------------------------

**Important**: Only for people who are using a **custom directory** with their **own** translation files.

Starting with version `4.y.z`, the resource files use PHP instead of JSON as their format, for performance reasons. If you used `3.y.z` before and want to move to `4.y.z`, you have to convert your JSON files to PHP by generating the language profiles again. The JSON files are then no longer needed.

You can delete unnecessary JSON files under Linux with the following command.

```
rm resources/*/*.json
```

Basic Usage
-----------

To detect the language correctly, the input text should be at least a few sentences long.

```
use LanguageDetection\Language;

$ld = new Language;

$ld->detect('Mag het een onsje meer zijn?')->close();
```

Result:

```
Array
(
    "nl" => 0.66193548387097,
    "af" => 0.51338709677419,
    "br" => 0.49634408602151,
    "nb" => 0.48849462365591,
    "nn" => 0.48741935483871,
    "fy" => 0.47822580645161,
    "dk" => 0.47172043010753,
    "sv" => 0.46408602150538,
    "bi" => 0.46021505376344,
    "de" => 0.45903225806452,
    [...]
)

```

API
---

#### `__construct(array $result = [], string $dirname = '')`

You can pass an array of languages to the constructor so that the input is compared only with those languages. This can dramatically improve performance. The second parameter is optional and names the directory where the translation files are located.

```
$ld = new Language(['de', 'en', 'nl']);

// Compares the sentence only with "de", "en" and "nl" language models.
$ld->detect('Das ist ein Test');
```

---

#### `whitelist(string ...$whitelist)`

Provide a whitelist. Keeps only the given languages in the result.

```
$ld->detect('Mag het een onsje meer zijn?')->whitelist('de', 'nn', 'nl', 'af')->close();
```

Result:

```
Array
(
    "nl" => 0.66193548387097,
    "af" => 0.51338709677419,
    "nn" => 0.48741935483871,
    "de" => 0.45903225806452
)

```

---

#### `blacklist(string ...$blacklist)`

Provide a blacklist. Removes the given languages from the result.

```
$ld->detect('Mag het een onsje meer zijn?')->blacklist('dk', 'nb', 'de')->close();
```

Result:

```
Array
(
    "nl" => 0.66193548387097,
    "af" => 0.51338709677419,
    "br" => 0.49634408602151,
    "nn" => 0.48741935483871,
    "fy" => 0.47822580645161,
    "sv" => 0.46408602150538,
    "bi" => 0.46021505376344,
    [...]
)

```
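The effect of `whitelist()` and `blacklist()` on the result can be modeled on a plain array of scores. The sketch below is not the library's code: `keepOnly` and `dropAll` are hypothetical helpers, and the scores are copied from the result listings in this section.

```php
<?php

// Scores copied from the result listings in this section (truncated).
$scores = [
    'nl' => 0.66193548387097,
    'af' => 0.51338709677419,
    'br' => 0.49634408602151,
    'nb' => 0.48849462365591,
    'nn' => 0.48741935483871,
    'de' => 0.45903225806452,
];

/** Hypothetical helper: keep only the listed codes, preserving the original order. */
function keepOnly(array $scores, string ...$codes): array
{
    return array_intersect_key($scores, array_flip($codes));
}

/** Hypothetical helper: drop the listed codes, preserving the original order. */
function dropAll(array $scores, string ...$codes): array
{
    return array_diff_key($scores, array_flip($codes));
}

print_r(keepOnly($scores, 'de', 'nn', 'nl', 'af')); // keys: nl, af, nn, de
print_r(dropAll($scores, 'dk', 'nb', 'de'));        // keys: nl, af, br, nn
```

In both cases the order of the remaining entries follows the original ranking, not the order of the arguments. Codes that are not in the result, such as `dk` here, are simply ignored.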

---

#### `bestResults()`

Returns the best results: the top entry, plus any entries whose scores are close to it.

```
$ld->detect('Mag het een onsje meer zijn?')->bestResults()->close();
```

Result:

```
Array
(
    "nl" => 0.66193548387097
)

```
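One way to picture a "best results" filter is to keep every entry whose score lies within a small distance of the top score. The sketch below assumes exactly that. The helper name `nearTop` and the threshold of `0.025` are assumptions for illustration, not values confirmed by this README.

```php
<?php

/**
 * Hypothetical helper: keep the entries whose score is within $threshold
 * of the highest score. The default threshold is an assumed value.
 */
function nearTop(array $scores, float $threshold = 0.025): array
{
    if ($scores === []) {
        return [];
    }

    arsort($scores);          // highest score first
    $best = reset($scores);   // value of the first (best) entry

    return array_filter(
        $scores,
        static fn (float $score): bool => ($best - $score) <= $threshold
    );
}

// Only "nl" survives: the runner-up is more than 0.025 behind.
print_r(nearTop(['nl' => 0.66193548387097, 'af' => 0.51338709677419]));

// Both entries survive: the scores differ by less than 0.025.
print_r(nearTop(['fr' => 0.91307037037037, 'en' => 0.90623333333333]));
```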

---

#### `limit(int $offset, ?int $length = null)`

You can specify the number of records to return: `$offset` is the position to start from and `$length` is the number of entries to keep. For example, the following code returns the top three entries.

```
$ld->detect('Mag het een onsje meer zijn?')->limit(0, 3)->close();
```

Result:

```
Array
(
    "nl" => 0.66193548387097,
    "af" => 0.51338709677419,
    "br" => 0.49634408602151
)

```
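The `$offset` and `$length` parameters behave like a window over the ranked result. As a rough model on a plain array (the helper `takeSlice` is hypothetical and not part of the library), the same window can be expressed with `array_slice()`:

```php
<?php

/**
 * Hypothetical helper: return $length entries starting at position $offset.
 * A null $length means "everything from $offset to the end".
 */
function takeSlice(array $scores, int $offset, ?int $length = null): array
{
    // array_slice() always preserves string keys such as language codes.
    return array_slice($scores, $offset, $length);
}

$ranked = [
    'nl' => 0.66193548387097,
    'af' => 0.51338709677419,
    'br' => 0.49634408602151,
    'nb' => 0.48849462365591,
];

print_r(takeSlice($ranked, 0, 3)); // keys: nl, af, br
print_r(takeSlice($ranked, 1, 2)); // keys: af, br
```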

---

#### `close()`

Returns the result as an array.

```
$ld->detect('This is an example!')->close();
```

Result:

```
Array
(
    "en" => 0.5889400921659,
    "gd" => 0.55691244239631,
    "ga" => 0.55376344086022,
    "et" => 0.48294930875576,
    "af" => 0.48218125960061,
    [...]
)

```

---

#### `setTokenizer(TokenizerInterface $tokenizer)`

The script uses a tokenizer to get all the words in a sentence. You can define your own tokenizer, for example to deal with numbers.

```
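// Namespace assumed here; adjust it if your installed version differs.
use LanguageDetection\Tokenizer\TokenizerInterface;

$ld->setTokenizer(new class implements TokenizerInterface
{
    public function tokenize(string $str): array
    {
        // Split on every character that is not a-z or 0-9 and drop empty tokens.
        return preg_split('/[^a-z0-9]/u', $str, -1, PREG_SPLIT_NO_EMPTY);
    }
});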
```

This will return only characters from the alphabet in lowercase and numbers between 0 and 9.
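Because the pattern treats every character outside `a-z0-9` as a separator, uppercase and accented letters also split words. The standalone sketch below shows this with plain functions instead of the tokenizer interface. `tokenizeAscii` uses the pattern from the example above, and `tokenizeUnicode` is a hypothetical variant that splits on anything that is not a Unicode letter or digit. Both assume valid UTF-8 input.

```php
<?php

/** The pattern from the example above: split on anything that is not a-z or 0-9. */
function tokenizeAscii(string $str): array
{
    return preg_split('/[^a-z0-9]/u', $str, -1, PREG_SPLIT_NO_EMPTY);
}

/** Hypothetical variant: split on anything that is not a Unicode letter or digit. */
function tokenizeUnicode(string $str): array
{
    return preg_split('/[^\p{L}\p{N}]+/u', $str, -1, PREG_SPLIT_NO_EMPTY);
}

print_r(tokenizeAscii('Room 42 is free'));       // "oom", "42", "is", "free"
print_r(tokenizeUnicode('Zimmer 42 ist schön')); // "Zimmer", "42", "ist", "schön"
```

Note how the uppercase `R` in `Room` is swallowed as a separator by the first pattern. If you use a pattern like that, lowercase the input first or widen the character class.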

---

#### `__toString()`

Returns the top entry of the result. Note the `echo` at the beginning.

```
echo $ld->detect('Das ist ein Test.');
```

Result:

```
de

```

---

#### `jsonSerialize()`

Serializes the data to JSON.

```
$object = $ld->detect('Tere tulemast tagasi! Nägemist!');

// Encode the result object as pretty-printed JSON and print it.
echo json_encode($object, JSON_PRETTY_PRINT);
```

Result:

```
{
    "et": 0.5224748810153358,
    "ch": 0.45817028027498674,
    "bi": 0.4452670544685352,
    "fi": 0.440983606557377,
    "lt": 0.4382866208355367,
    [...]
}

```

---

Method chaining
---------------

You can also combine methods with each other. The following example removes all entries specified in the blacklist and returns only the top four of the remaining entries.

```
$ld->detect('Mag het een onsje meer zijn?')->blacklist('af', 'dk', 'sv')->limit(0, 4)->close();
```

Result:

```
Array
(
    "nl" => 0.66193548387097,
    "br" => 0.49634408602151,
    "nb" => 0.48849462365591,
    "nn" => 0.48741935483871
)

```

---

ArrayAccess
-----------

You can also access the object directly as an array.

```
$object = $ld->detect('Das ist ein Test');

echo $object['de'];
echo $object['en'];
echo $object['xy']; // does not exist
```

Result:

```
0.6623339658444
0.56859582542694
NULL

```
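The array access, `__toString()` and `jsonSerialize()` behavior described above comes from standard PHP interfaces. The following sketch shows how such a result object can be put together. `ScoreList` is a hypothetical, read-only stand-in and not the library's own result class, which may behave differently in detail.

```php
<?php

/** Hypothetical read-only score list; not the library's result class. */
final class ScoreList implements ArrayAccess, JsonSerializable
{
    /** @var array<string, float> */
    private array $scores;

    public function __construct(array $scores)
    {
        arsort($scores); // highest score first
        $this->scores = $scores;
    }

    public function offsetExists($offset): bool
    {
        return isset($this->scores[$offset]);
    }

    #[\ReturnTypeWillChange]
    public function offsetGet($offset)
    {
        return $this->scores[$offset] ?? null; // unknown codes yield null
    }

    public function offsetSet($offset, $value): void
    {
        throw new LogicException('ScoreList is read-only.');
    }

    public function offsetUnset($offset): void
    {
        throw new LogicException('ScoreList is read-only.');
    }

    public function jsonSerialize(): array
    {
        return $this->scores;
    }

    public function __toString(): string
    {
        // The first key is the best-scoring language code.
        return (string) array_key_first($this->scores);
    }
}

$list = new ScoreList(['en' => 0.56859582542694, 'de' => 0.6623339658444]);

var_dump($list['de']);            // the score stored for "de"
var_dump($list['xy']);            // NULL
echo $list, PHP_EOL;              // de
echo json_encode($list), PHP_EOL; // JSON object with "de" first
```

Use `var_dump()` rather than `echo` when a key may be missing, since `echo` prints nothing for `null`.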

---

Supported languages
-------------------

The library currently supports 110 languages. For an overview, see the [list of supported languages](resources/README.md).

---

Other languages
---------------

The library is trainable, which means you can change, remove and add your own language files. If your language is not supported, feel free to add your own language files. To do that, create a new directory in `resources` and add your training text to it.

> **Note:** The training text should be a **.txt** file.

#### Example

```
|- resources
    |- ham
        |- ham.txt
    |- spam
        |- spam.txt

```

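As you can see, the library can also be used to tell spam from ham.

If you store your translation files outside of `resources`, you have to pass the path to the trainer's `learn()` method (`$t` is a `Trainer` instance, created as in the next example).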

```
$t->learn('YOUR_PATH_HERE');
```

Whenever you change one of the translation files you must first generate a language profile for it. This may take a few seconds.

```
use LanguageDetection\Trainer;

$t = new Trainer();

$t->learn();
```

Remove these few lines after execution. You can now classify texts by their language using your own training text.

---

FAQ
---

#### How can I improve the detection phase?

To improve detection you have to use more n-grams, but be careful: this will slow down the script. I found that detection is much better with around 9,000 n-grams (the default is 310). To do that, look at the code right below:

```
$t = new Trainer();

$t->setMaxNgrams(9000);

$t->learn();
```

First you have to train it. After that you can classify texts as before, but you must also specify how many n-grams the detector should use.

```
$ld = new Language();

$ld->setMaxNgrams(9000);

// "grille pain" is French and means "toaster" in English
var_dump($ld->detect('grille pain')->bestResults());
```

Result:

```
class LanguageDetection\LanguageResult#5 (1) {
  private $result =>
  array(2) {
    'fr' =>
    double(0.91307037037037)
    'en' =>
    double(0.90623333333333)
  }
}

```

#### Is the detection process slower if language files are very big?

No, it is not. The trainer class only uses the best 310 n-grams of each language. As long as you don't change this number or add more language files, performance is not affected. Only creating the n-grams is slower, and that has to be done only once. The detection phase is affected only when you try to detect large chunks of text.

> **Summary**: The training phase will be slower but the detection phase remains the same.

Contributing
------------

Feel free to contribute. Any help is welcome.

License
-------

This project is licensed under the terms of the MIT license.


