Abandoned → [yooper/php-text-analysis](/?search=yooper%2Fphp-text-analysis)

php-text-analysis/php-text-analysis
===================================

PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP language.

Version 1.9.2 · MIT license · PHP >=7.4 · [8 issues](https://github.com/yooper/php-text-analysis/issues) · [Source](https://github.com/yooper/php-text-analysis) · [Packagist](https://packagist.org/packages/php-text-analysis/php-text-analysis)

php-text-analysis
=================

PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP language. There are tools in this library that can perform:

- document classification
- sentiment analysis
- document comparison
- frequency analysis
- tokenization
- stemming
- collocations with Pointwise Mutual Information
- lexical diversity
- corpus analysis
- text summarization

All the documentation for this project can be found in the book and wiki.

PHP Text Analysis Book & Wiki
=============================

A book is in the works and your contributions are needed. You can find the book at

Documentation for the library also resides in the wiki.

Installation Instructions
=========================

Add PHP Text Analysis to your project:

```
composer require yooper/php-text-analysis
```

### Tokenization

```
$tokens = tokenize($text);
```

You can choose a different tokenizer by passing in the name of the tokenizer class:

```
$tokens = tokenize($text, \TextAnalysis\Tokenizers\PennTreeBankTokenizer::class);
```

The default tokenizer is **\\TextAnalysis\\Tokenizers\\GeneralTokenizer::class**. Some tokenizers require parameters to be set upon instantiation.
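To show the shape of a tokenizer's input and output, here is a minimal plain-PHP sketch that splits on whitespace and a few punctuation characters. The function name `simple_tokenize` is hypothetical, and this is not how the library's tokenizers are implemented; their splitting rules may differ.

```php
<?php
// Hypothetical helper for illustration only; not part of php-text-analysis.
function simple_tokenize(string $text): array
{
    // Split on runs of whitespace or the listed punctuation and drop empty pieces.
    $parts = preg_split('/[\s,.!?;:]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split returns false on failure; fall back to an empty list.
    return $parts === false ? [] : $parts;
}

$tokens = simple_tokenize('Hello, world! Tokenizers split text into pieces.');
print_r($tokens);
```

The result is a flat array of strings, which is the form the other helpers in this README take as input.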

### Normalization

By default, **normalize\_tokens** uses the function **strtolower** to lowercase all the tokens. To customize the normalize function, pass in either a function or a string to be used by array\_map.

```
$normalizedTokens = normalize_tokens($tokens);
```

```
// pass the name of a built-in function
$normalizedTokens = normalize_tokens($tokens, 'mb_strtolower');

// or pass a closure
$normalizedTokens = normalize_tokens($tokens, function($token){ return mb_strtoupper($token); });
```
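The behavior described above, applying a callable to every token through `array_map`, can be sketched in plain PHP. The function `normalize_sketch` is a hypothetical stand-in written for this example; the library's own helper may differ in detail.

```php
<?php
// Hypothetical stand-in for illustration; not the library's normalize_tokens().
function normalize_sketch(array $tokens, $normalizer = 'strtolower'): array
{
    // $normalizer may be a function name or a closure; array_map accepts both.
    return array_map($normalizer, $tokens);
}

$tokens = ['The', 'Quick', 'BROWN', 'Fox'];

$lower = normalize_sketch($tokens);
$upper = normalize_sketch($tokens, 'strtoupper');
$short = normalize_sketch($tokens, function ($token) {
    // Any per-token transformation works, for example keeping the first three characters.
    return substr($token, 0, 3);
});

print_r($lower);
```

Because the normalizer is just a callable, several transformations can be chained by calling the function repeatedly on its own output.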

### Frequency Distributions

The call to **freq\_dist** returns a [FreqDist](https://github.com/yooper/php-text-analysis/blob/master/src/Analysis/FreqDist.php) instance.

```
$freqDist = freq_dist(tokenize($text));
```
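As a rough picture of what a frequency distribution holds, the following plain-PHP sketch counts tokens and derives relative weights using only built-in functions. It does not use or reproduce the FreqDist class, whose methods are documented in the linked source file.

```php
<?php
// Plain-PHP illustration of token counting; independent of the FreqDist class.
$tokens = ['the', 'cat', 'sat', 'on', 'the', 'mat', 'the', 'end'];

// Map each distinct token to the number of times it occurs.
$counts = array_count_values($tokens);

// Sort by count, highest first, keeping the token keys.
arsort($counts);

// Relative frequency of each token (count divided by total tokens).
$total = count($tokens);
$weights = array_map(fn ($count) => $count / $total, $counts);

print_r($counts);
```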

### Ngram Generation

By default bigrams are generated.

```
$bigrams = ngrams($tokens);
```

Customize the n-gram size and delimiter:

```
// create trigrams with a pipe delimiter in between each word
$trigrams = ngrams($tokens, 3, '|');
```
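The sliding-window idea behind n-gram generation can be sketched in a few lines of plain PHP. The helper `ngrams_sketch` is hypothetical, and the space used as its default delimiter is an assumption made for this example rather than a statement about the library's default.

```php
<?php
// Hypothetical helper for illustration; not the library's ngrams() implementation.
function ngrams_sketch(array $tokens, int $n = 2, string $separator = ' '): array
{
    if ($n < 1) {
        return [];
    }

    $tokens = array_values($tokens); // reindex so offsets are contiguous
    $result = [];
    $last = count($tokens) - $n;     // last valid window start

    // Slide a window of $n tokens across the list and join each window.
    for ($i = 0; $i <= $last; $i++) {
        $result[] = implode($separator, array_slice($tokens, $i, $n));
    }

    return $result;
}

$tokens = ['one', 'two', 'three', 'four'];
$bigrams = ngrams_sketch($tokens);
$trigrams = ngrams_sketch($tokens, 3, '|');

print_r($trigrams);
```

A list of k tokens yields k - n + 1 n-grams, so inputs shorter than n produce an empty result.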

### Stemming

By default, the **stem** function uses the Porter Stemmer.

```
$stemmedTokens = stem($tokens);
```

You can choose a different stemmer by passing in the name of the stemmer class:

```
$stemmedTokens = stem($tokens, \TextAnalysis\Stemmers\MorphStemmer::class);
```

### Keyword Extract with Rake

There is a shortcut function for using the Rake algorithm. You will need to clean your data before using it. The second parameter is the n-gram size of the keywords to extract.

```
$rake = rake($tokens, 3);
$results = $rake->getKeywordScores();
```
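For readers unfamiliar with keyword scoring, here is a minimal sketch of the classic RAKE scheme: candidate phrases are split at stop words, each word is scored by degree divided by frequency, and a phrase scores the sum of its words. This is the textbook formulation, not the library's implementation, which takes an n-gram size instead; the function `rake_sketch` and the stop-word list are hypothetical.

```php
<?php
// Classic RAKE scoring sketch; hypothetical helper, not the library's rake().
function rake_sketch(array $tokens, array $stopWords): array
{
    // 1. Split the token stream into candidate phrases at stop words.
    $phrases = [];
    $current = [];
    foreach ($tokens as $token) {
        if (in_array($token, $stopWords, true)) {
            if ($current) {
                $phrases[] = $current;
            }
            $current = [];
        } else {
            $current[] = $token;
        }
    }
    if ($current) {
        $phrases[] = $current;
    }

    // 2. Word frequency, and degree (length of every phrase the word appears in).
    $freq = [];
    $degree = [];
    foreach ($phrases as $phrase) {
        foreach ($phrase as $word) {
            $freq[$word] = ($freq[$word] ?? 0) + 1;
            $degree[$word] = ($degree[$word] ?? 0) + count($phrase);
        }
    }

    // 3. Phrase score is the sum of degree / frequency over its words.
    $scores = [];
    foreach ($phrases as $phrase) {
        $score = 0.0;
        foreach ($phrase as $word) {
            $score += $degree[$word] / $freq[$word];
        }
        $scores[implode(' ', $phrase)] = $score;
    }

    arsort($scores); // highest-scoring phrase first
    return $scores;
}

$tokens = ['compatibility', 'of', 'systems', 'of', 'linear', 'constraints'];
$scores = rake_sketch($tokens, ['of']);
print_r($scores);
```

Longer phrases tend to score higher under this scheme because every word in a phrase contributes the phrase length to its degree.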

### Sentiment Analysis with Vader

Need sentiment analysis with PHP? Use Vader. The PHP implementation can be invoked easily; just normalize your data beforehand.

```
$sentimentScores = vader($tokens);
```

### Document Classification with Naive Bayes

Need to do some document classification with PHP? Try the Naive Bayes implementation. An example of classifying movie reviews can be found in the unit tests.

```
$nb = naive_bayes();
$nb->train('mexican', tokenize('taco nacho enchilada burrito'));
$nb->train('american', tokenize('hamburger burger fries pop'));
$nb->predict(tokenize('my favorite food is a burrito'));
```
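To make the train and predict steps concrete, here is a minimal multinomial Naive Bayes sketch with add-one smoothing, written in plain PHP. The class `TinyNaiveBayes` is hypothetical and is not the library's implementation; in particular, returning log-probability scores sorted highest first is an assumption of this example.

```php
<?php
// Illustrative multinomial Naive Bayes with add-one smoothing.
// Hypothetical class; not the library's implementation.
final class TinyNaiveBayes
{
    private array $wordCounts = []; // label => [word => count]
    private array $docCounts = [];  // label => number of training documents
    private array $vocabulary = []; // word => true

    public function train(string $label, array $tokens): void
    {
        $this->docCounts[$label] = ($this->docCounts[$label] ?? 0) + 1;
        foreach ($tokens as $token) {
            $this->wordCounts[$label][$token] = ($this->wordCounts[$label][$token] ?? 0) + 1;
            $this->vocabulary[$token] = true;
        }
    }

    /** Returns label => log-probability score, highest first. */
    public function predict(array $tokens): array
    {
        $totalDocs = array_sum($this->docCounts);
        $vocabSize = count($this->vocabulary);
        $scores = [];

        foreach ($this->docCounts as $label => $docs) {
            $counts = $this->wordCounts[$label] ?? [];
            $totalWords = array_sum($counts);
            $score = log($docs / $totalDocs); // log prior of the label

            foreach ($tokens as $token) {
                if (!isset($this->vocabulary[$token])) {
                    continue; // ignore words never seen during training
                }
                // Smoothed log likelihood of the word given the label.
                $score += log((($counts[$token] ?? 0) + 1) / ($totalWords + $vocabSize));
            }
            $scores[$label] = $score;
        }

        arsort($scores);
        return $scores;
    }
}

$nb = new TinyNaiveBayes();
$nb->train('mexican', ['taco', 'nacho', 'enchilada', 'burrito']);
$nb->train('american', ['hamburger', 'burger', 'fries', 'pop']);
$scores = $nb->predict(['my', 'favorite', 'food', 'is', 'a', 'burrito']);

echo array_key_first($scores), PHP_EOL; // prints "mexican"
```

Only "burrito" was seen in training, so it alone separates the two labels: its smoothed likelihood is 2/12 under "mexican" and 1/12 under "american", while the priors are equal.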

### Health Score

50: Fair. Better than 95% of packages.

- Maintenance: 37. Infrequent updates; may be unmaintained.
- Popularity: 45. Moderate usage in the ecosystem.
- Community: 28. Small or concentrated contributor base.
- Maturity: 76. Established project with proven stability.
- Bus factor: 1. The top contributor holds 80.3% of commits, a single point of failure.

How is this calculated?

- **Maintenance (25%)**: Last commit recency, latest release date, and issue-to-star ratio. Uses a 2-year decay window.
- **Popularity (30%)**: Total and monthly downloads, GitHub stars, and forks. Logarithmic scaling prevents top-heavy scores.
- **Community (15%)**: Contributors, dependents, forks, watchers, and maintainers. Measures real ecosystem engagement.
- **Maturity (30%)**: Project age, version count, PHP version support, and release stability.

### Release Activity

- Cadence: every ~71 days (recently: every ~329 days)
- Total: 43
- Last release: 552d ago

PHP version history (7 changes):

- v1.0: PHP >=5.5
- v1.2: PHP >=7.0
- 1.3.2: PHP >=7
- 1.4.9: PHP >=7.1
- 1.6: PHP ~7.4
- 1.7: PHP ~7.4|~8.0
- 1.9: PHP >=7.4

### Community

Maintainers: [yooper](/maintainers/yooper) ([@yooper](https://github.com/yooper))

Top Contributors

- [yooper](https://github.com/yooper) (192 commits)
- [Euak](https://github.com/Euak) (28 commits)
- [carbon-cloud-deploy](https://github.com/carbon-cloud-deploy) (6 commits)
- [thiagogomesverissimo](https://github.com/thiagogomesverissimo) (4 commits)
- [ace411](https://github.com/ace411) (2 commits)
- [evertharmeling](https://github.com/evertharmeling) (1 commit)
- [maxguru](https://github.com/maxguru) (1 commit)
- [NeoBlack](https://github.com/NeoBlack) (1 commit)
- [nielsriekert](https://github.com/nielsriekert) (1 commit)
- [repat](https://github.com/repat) (1 commit)
- [elievischel](https://github.com/elievischel) (1 commit)
- [cicnavi](https://github.com/cicnavi) (1 commit)

Tags: nlp, php, php-language, php-text-analysis, text-analysis, tokenization, natural language processing, text classification, ir, text analysis

### Code Quality

Tests: PHPUnit

