phonyland/ngram
===============

🧪 N-Gram Tools for 🙃 Phony Language Models with sanitizing, tokenization, n-gram extraction, and frequency mapping

[![Phony Logo - Light](https://raw.githubusercontent.com/phonyland/artwork/master/logo-light.png#gh-light-mode-only)](https://raw.githubusercontent.com/phonyland/artwork/master/logo-light.png#gh-light-mode-only)[![Phony Logo - Dark](https://raw.githubusercontent.com/phonyland/artwork/master/logo-dark.png#gh-dark-mode-only)](https://raw.githubusercontent.com/phonyland/artwork/master/logo-dark.png#gh-dark-mode-only)

🧪 N-Gram Tools
===============

This repository contains the N-Gram tools for 🙃 Phony language models: sanitizing, tokenization, n-gram extraction, and frequency mapping.

🚀 Installation
--------------

Requires PHP `>= 8.0`.

You can install the package via composer:

```
composer require phonyland/ngram
```

⌨️ Usage
--------

### Tokenizer

#### Word Tokenization

```
$tokenizer->tokenize($text);
```

⌨️ Usage

```
use Phonyland\NGram\Tokenizer;
use Phonyland\NGram\TokenizerFilterType;

$tokenizer = new Tokenizer();
$tokenizer
  ->addWordSeparatorPattern(';')
  ->addWordSeparatorPattern('\s')
  ->addWordFilterRule(TokenizerFilterType::NO_SYMBOLS);

$text = 'sample   text;sample;text';

$tokenizer->tokenize($text);
```

🖥 Output

```
[
    "sample",
    "text",
    "sample",
    "text",
];
```
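
To make the roles of separator patterns and filter rules concrete, here is a minimal sketch in plain PHP that reproduces the output above. It is not the library's implementation: `sketchTokenize()` is a hypothetical helper, and the filter regex is only an assumed stand-in for what `NO_SYMBOLS` might do.

```php
<?php
// Hypothetical helper for illustration; not part of phonyland/ngram.
// Splits on any separator pattern, then strips characters matched by a filter regex.
function sketchTokenize(string $text, array $separatorPatterns, string $filterRegex): array
{
    // Combine the separator patterns into one alternation, e.g. /(?:;|\s)+/u.
    // The trailing "+" collapses runs of separators so no empty tokens appear.
    $splitRegex = '/(?:' . implode('|', $separatorPatterns) . ')+/u';
    $parts = preg_split($splitRegex, $text, -1, PREG_SPLIT_NO_EMPTY);

    $tokens = [];
    foreach ($parts as $part) {
        // Remove every character the filter matches; drop tokens that end up empty.
        $clean = preg_replace($filterRegex, '', $part);
        if ($clean !== '' && $clean !== null) {
            $tokens[] = $clean;
        }
    }

    return $tokens;
}

// Assumed stand-in for a "no symbols" rule: keep only letters and digits.
$tokens = sketchTokenize('sample   text;sample;text', [';', '\s'], '/[^\p{L}\p{N}]/u');
```

With this input, `$tokens` holds the same four words as the output above.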

#### Sentence Tokenization

```
$tokenizer->sentences($text);
```

⌨️ Usage

```
use Phonyland\NGram\Tokenizer;

$tokenizer = new Tokenizer();
$tokenizer
  ->addSentenceSeparatorPattern('.')
  ->addSentenceSeparatorPattern('!')
  ->addSentenceSeparatorPattern('?');

$text = 'Sample Sentence. Sample Sentence! Sample Sentence? Sample Sentence no. 4?! Sample sample sentence... End';

$tokenizer->sentences($text);
```

🖥 Output

```
[
    "Sample Sentence.",
    "Sample Sentence!",
    "Sample Sentence?",
    "Sample Sentence no.",
    "4?!",
    "Sample sample sentence...",
    "End",
];
```
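
The output above keeps each terminator attached to its sentence and splits `no. 4?!` into two pieces. One way to get that behavior, sketched here in plain PHP, is to split on whitespace that directly follows a terminator. `sketchSentences()` is a hypothetical helper and only an assumption about the approach; the library may work differently.

```php
<?php
// Hypothetical helper for illustration; not the phonyland/ngram implementation.
function sketchSentences(string $text, array $terminators): array
{
    // Escape each terminator and build a character class such as [\.\!\?].
    $escaped = array_map(
        fn (string $t): string => preg_quote($t, '/'),
        $terminators
    );
    $class = '[' . implode('', $escaped) . ']';

    // Split on whitespace preceded by a terminator.
    // The lookbehind leaves the punctuation on the sentence it ends.
    return preg_split('/(?<=' . $class . ')\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
}

$text = 'Sample Sentence. Sample Sentence! Sample Sentence? Sample Sentence no. 4?! Sample sample sentence... End';
$sentences = sketchSentences($text, ['.', '!', '?']);
```

Because the split point requires whitespace after a terminator, runs such as `?!` and `...` stay together.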

#### Word Tokenization by Sentences

```
$tokenizer->tokenizeBySentences($text);
```

⌨️ Usage

```
use Phonyland\NGram\Tokenizer;
use Phonyland\NGram\TokenizerFilterType;

$tokenizer = new Tokenizer();
$tokenizer
  ->addSentenceSeparatorPattern('.')
  ->addSentenceSeparatorPattern('!')
  ->addSentenceSeparatorPattern('?')
  ->addWordFilterRule(TokenizerFilterType::NO_SYMBOLS)
  ->addWordSeparatorPattern(TokenizerFilterType::WHITESPACE_SEPARATOR);

$text = 'Sample Sentence. Sample Sentence! Sample Sentence? Sample Sentence no. 4?! Sample sample sentence... End';

$tokenizer->tokenizeBySentences($text);
```

🖥 Output

```
[
    ["Sample", "Sentence"],
    ["Sample", "Sentence"],
    ["Sample", "Sentence"],
    ["Sample", "Sentence", "no"],
    ["Sample", "sample", "sentence"],
    ["End"],
];
```

### N-Gram

#### N-Gram Sequence

```
NGramSequence::multigram($n, $tokens, $isUnique);
NGramSequence::trigram($tokens, $isUnique);
NGramSequence::bigram($tokens, $isUnique);
NGramSequence::unigram($tokens, $isUnique);
```

⌨️ Usage

```
use Phonyland\NGram\Tokenizer;
use Phonyland\NGram\NGramSequence;
use Phonyland\NGram\TokenizerFilterType;

$tokenizer = new Tokenizer();
$tokenizer->addWordSeparatorPattern(TokenizerFilterType::WHITESPACE_SEPARATOR);
$tokens = $tokenizer->tokenize('sample text');

NGramSequence::multigram(4, $tokens);
// ['samp', 'ampl', 'mple', 'text'];

// Generate Unique N-Grams
NGramSequence::unigram($tokens, true);
// ['s', 'a', 'm', 'p', 'l', 'e', 't', 'x'];
```
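
The results above are character-level n-grams taken per token, so no n-gram spans two words. A sliding window shows the idea; the following is a minimal sketch in plain PHP, where `sketchNgrams()` is a hypothetical helper and not the library's code. How the library treats tokens shorter than `$n` is not documented here; this sketch simply skips them.

```php
<?php
// Hypothetical helper for illustration; not part of phonyland/ngram.
function sketchNgrams(int $n, array $tokens, bool $unique = false): array
{
    $ngrams = [];
    foreach ($tokens as $token) {
        // Split into Unicode characters (PCRE only, no mbstring needed).
        $chars = preg_split('//u', $token, -1, PREG_SPLIT_NO_EMPTY);

        // Slide a window of width $n across the token.
        // Tokens shorter than $n yield nothing (an assumption of this sketch).
        $last = count($chars) - $n;
        for ($i = 0; $i <= $last; $i++) {
            $ngrams[] = implode('', array_slice($chars, $i, $n));
        }
    }

    // array_unique() keeps the first occurrence; array_values() reindexes from 0.
    return $unique ? array_values(array_unique($ngrams)) : $ngrams;
}

$fourGrams = sketchNgrams(4, ['sample', 'text']);
$uniqueUnigrams = sketchNgrams(1, ['sample', 'text'], true);
```

For the tokens `sample` and `text`, the two calls give the same lists as the comments in the usage example above.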

#### N-Gram Sequences with Count

```
NGramCount::multigram(4, $tokens);
NGramCount::trigram($tokens);
NGramCount::bigram($tokens);
NGramCount::unigram($tokens);

NGramCount::incrementElementCount($element, $elements);
```

⌨️ Usage

```
use Phonyland\NGram\Tokenizer;
use Phonyland\NGram\NGramCount;
use Phonyland\NGram\TokenizerFilterType;

$tokenizer = new Tokenizer();
$tokenizer->addWordSeparatorPattern(TokenizerFilterType::WHITESPACE_SEPARATOR);
$tokens = $tokenizer->tokenize('sample text');

NGramCount::multigram(4, $tokens);
// [
//     'samp' => 1,
//     'ampl' => 1,
//     'mple' => 1,
//     'text' => 1,
// ];
```
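
Counting is a tally over the n-gram sequence: each n-gram becomes an array key and its value goes up by one per occurrence. The sketch below shows this in plain PHP. `sketchCount()` is a hypothetical helper, not the library's implementation, and its loop body is only a guess at the kind of step `incrementElementCount` performs.

```php
<?php
// Hypothetical helper for illustration; not part of phonyland/ngram.
function sketchCount(array $ngrams): array
{
    $counts = [];
    foreach ($ngrams as $ngram) {
        // Start at 0 for an unseen n-gram, then add one.
        $counts[$ngram] = ($counts[$ngram] ?? 0) + 1;
    }

    return $counts;
}

// 'text' appears twice in this sequence, so its count is 2.
$counts = sketchCount(['samp', 'ampl', 'mple', 'text', 'text']);
```

PHP's built-in `array_count_values()` produces the same tally for arrays of strings.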

#### N-Gram Frequency

```
NGramFrequency::multigram(4, $tokens);
NGramFrequency::trigram($tokens);
NGramFrequency::bigram($tokens);
NGramFrequency::unigram($tokens);

NGramFrequency::frequencyFromCount($countArray);
```

⌨️ Usage

```
use Phonyland\NGram\Tokenizer;
use Phonyland\NGram\NGramFrequency;
use Phonyland\NGram\TokenizerFilterType;

$tokenizer = new Tokenizer();
$tokenizer->addWordSeparatorPattern(TokenizerFilterType::WHITESPACE_SEPARATOR);
$tokenizer->addWordFilterRule(TokenizerFilterType::ALPHABETICAL);
$tokens = $tokenizer->tokenize('bombadil! bombadillo!');

NGramFrequency::multigram(4, $tokens);
//[
//    'bomb' => 0.16666666666666666,
//    'omba' => 0.16666666666666666,
//    'mbad' => 0.16666666666666666,
//    'badi' => 0.16666666666666666,
//    'adil' => 0.16666666666666666,
//    'dill' => 0.08333333333333333,
//    'illo' => 0.08333333333333333,
//]
```
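
The numbers above follow from dividing each count by the total number of n-grams. `bombadil` yields 5 four-grams and `bombadillo` yields 7, so the total is 12; the five shared n-grams occur twice (2/12) and `dill` and `illo` once (1/12). The sketch below reproduces that arithmetic in plain PHP. `sketchFrequencyFromCount()` is a hypothetical helper and only an assumption about what `frequencyFromCount` computes.

```php
<?php
// Hypothetical helper for illustration; not part of phonyland/ngram.
function sketchFrequencyFromCount(array $counts): array
{
    $total = array_sum($counts);
    if ($total == 0) {
        // Nothing to normalize; avoid dividing by zero.
        return [];
    }

    $frequencies = [];
    foreach ($counts as $ngram => $count) {
        // Relative frequency: this n-gram's share of all n-grams.
        $frequencies[$ngram] = $count / $total;
    }

    return $frequencies;
}

// Four-gram counts for the tokens 'bombadil' and 'bombadillo' (12 n-grams in total).
$counts = [
    'bomb' => 2, 'omba' => 2, 'mbad' => 2, 'badi' => 2, 'adil' => 2,
    'dill' => 1, 'illo' => 1,
];
$frequencies = sketchFrequencyFromCount($counts);
```

The resulting values sum to 1, up to floating-point rounding.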

🙃
=

To start generating fake data with the 🙃 Phony Framework, visit the main **[Phony Repository](https://github.com/phonyland/framework)**.

Explore the docs » **[https://phony.land](https://phony.land/)**
Follow us on Twitter » **[@phony\_land](https://twitter.com/phony_land)**

**[🙃 Phony Fake Data Generation Framework](https://phony.land)** was created by **[Yunus Emre Deligöz](https://twitter.com/yedeligoez)** under the **[MIT license](https://opensource.org/licenses/MIT)**.
