batdan/php-text-analysis
========================

PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP language

v1.0.1 (2y ago) · MIT · PHP &gt;=7.4

Since Jul 14 · Pushed 2y ago

[Source](https://github.com/batdan/php-text-analysis) · [Packagist](https://packagist.org/packages/batdan/php-text-analysis)


php-text-analysis
=================

[![alt text](https://camo.githubusercontent.com/58ce8a87d57fb5b31505520c7bbdfad17b0f397f5a2d756aad14ce17a4d5aef4/68747470733a2f2f7472617669732d63692e6f72672f796f6f7065722f7068702d746578742d616e616c797369732e7376673f6272616e63683d6d6173746572 "Build status")](https://camo.githubusercontent.com/58ce8a87d57fb5b31505520c7bbdfad17b0f397f5a2d756aad14ce17a4d5aef4/68747470733a2f2f7472617669732d63692e6f72672f796f6f7065722f7068702d746578742d616e616c797369732e7376673f6272616e63683d6d6173746572)

[![Latest Stable Version](https://camo.githubusercontent.com/5f50fcaf3548eaff505aec684cef1dd0baeef6fe7e61f3dfe8c8f8e4a21b5558/68747470733a2f2f706f7365722e707567782e6f72672f796f6f7065722f7068702d746578742d616e616c797369732f762f737461626c65)](https://packagist.org/packages/yooper/php-text-analysis)

[![Total Downloads](https://camo.githubusercontent.com/0b554ab1fda829dcbe74c8fddb0865cae894ac8f1cc99c0a3b869161cb6b6179/68747470733a2f2f706f7365722e707567782e6f72672f796f6f7065722f7068702d746578742d616e616c797369732f646f776e6c6f616473)](https://packagist.org/packages/yooper/php-text-analysis)

PHP Text Analysis is a library for performing Information Retrieval (IR) and Natural Language Processing (NLP) tasks using the PHP language. There are tools in this library that can perform:

- document classification
- sentiment analysis
- document comparison
- frequency analysis
- tokenization
- stemming
- collocations with Pointwise Mutual Information
- lexical diversity
- corpus analysis
- text summarization

All the documentation for this project can be found in the book and wiki.

PHP Text Analysis Book &amp; Wiki
=================================

A book is in the works and your contributions are needed. You can find the book at

Documentation for the library also resides in the wiki.

Installation Instructions
=========================

Add PHP Text Analysis to your project with Composer:

```
composer require batdan/php-text-analysis
```

### Tokenization

```
$tokens = tokenize($text);
```

To use a different tokenizer, pass the tokenizer's class name as the second argument:

```
$tokens = tokenize($text, \TextAnalysis\Tokenizers\PennTreeBankTokenizer::class);
```

The default tokenizer is **\\TextAnalysis\\Tokenizers\\GeneralTokenizer::class**. Some tokenizers require parameters to be set upon instantiation.
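
The idea of selecting a tokenizer by class name can be sketched in plain PHP without the library. Every name below (`SketchTokenizer`, `SketchWhitespaceTokenizer`, `SketchLineTokenizer`, `sketch_tokenize`) is hypothetical and exists only for illustration; this is not the library's implementation, and it assumes tokenizers with no-argument constructors.

```php
<?php
// All names below are hypothetical and exist only for this sketch.
interface SketchTokenizer
{
    public function tokenize(string $text): array;
}

final class SketchWhitespaceTokenizer implements SketchTokenizer
{
    public function tokenize(string $text): array
    {
        // Split on runs of whitespace and drop empty pieces.
        return preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    }
}

final class SketchLineTokenizer implements SketchTokenizer
{
    public function tokenize(string $text): array
    {
        // Split on line breaks and drop empty lines.
        return preg_split('/\R/', $text, -1, PREG_SPLIT_NO_EMPTY);
    }
}

function sketch_tokenize(string $text, string $tokenizerClass = SketchWhitespaceTokenizer::class): array
{
    // Instantiate the tokenizer from its class name (no-argument constructor assumed).
    $tokenizer = new $tokenizerClass();
    return $tokenizer->tokenize($text);
}

$words = sketch_tokenize("first line\nsecond line");
$lines = sketch_tokenize("first line\nsecond line", SketchLineTokenizer::class);
```

Here `$words` holds the four whitespace-separated words and `$lines` holds the two lines. A tokenizer that needs constructor parameters would not fit this sketch as written.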

### Normalization

By default, **normalize\_tokens** lowercases all tokens with **strtolower**. To customize normalization, pass either a function or the name of a function as a string; it is applied to each token with array\_map.

```
$normalizedTokens = normalize_tokens($tokens);
```

```
// pass the name of a function as a string
$normalizedTokens = normalize_tokens($tokens, 'mb_strtolower');

// or pass a closure
$normalizedTokens = normalize_tokens($tokens, function($token){ return mb_strtoupper($token); });
```
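
Since the normalizer is handed to array\_map, the behaviour described above can be approximated in plain PHP. The following is a minimal sketch, not the library's code; `sketch_normalize` is a hypothetical name, and `strtoupper` is used in place of `mb_strtoupper` only so the example runs without the mbstring extension.

```php
<?php
// Hypothetical stand-in for illustration; not the library's normalize_tokens().
function sketch_normalize(array $tokens, $normalizer = 'strtolower'): array
{
    // Apply the callable (function name or closure) to every token.
    return array_map($normalizer, $tokens);
}

$tokens = ['The', 'Quick', 'BROWN', 'Fox'];

// Default: lowercase every token.
$lower = sketch_normalize($tokens);

// Custom: any callable works, here a closure that uppercases.
$upper = sketch_normalize($tokens, function ($token) {
    return strtoupper($token);
});
```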

### Frequency Distributions

The call to **freq\_dist** returns a [FreqDist](https://github.com/yooper/php-text-analysis/blob/master/src/Analysis/FreqDist.php) instance.

```
$freqDist = freq_dist(tokenize($text));
```
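
For readers unfamiliar with the term, a frequency distribution counts how often each token occurs. The sketch below computes one with PHP built-ins only; it does not use or reproduce the FreqDist class, and the variable names are illustrative.

```php
<?php
$tokens = ['the', 'cat', 'sat', 'on', 'the', 'mat', 'the', 'end'];

// Count occurrences of each distinct token.
$counts = array_count_values($tokens);

// Sort by count, highest first, keeping the token keys.
arsort($counts);

// Relative frequency: each count divided by the total number of tokens.
$total = count($tokens);
$relative = [];
foreach ($counts as $token => $count) {
    $relative[$token] = $count / $total;
}
```

After this runs, `the` is the first key of `$counts` with a count of 3 out of 8 tokens.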

### Ngram Generation

By default bigrams are generated.

```
$bigrams = ngrams($tokens);
```

To customize the ngrams, pass the ngram size and a delimiter:

```
// create trigrams with a pipe delimiter in between each word
$trigrams = ngrams($tokens, 3, '|');
```
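
To make the size and delimiter parameters concrete, here is a minimal sliding-window sketch in plain PHP. `sketch_ngrams` is a hypothetical helper, not the library's `ngrams` function, and the single-space default delimiter is an assumption made for this example.

```php
<?php
// Hypothetical helper for illustration; not the library's ngrams() implementation.
// The default delimiter of a single space is an assumption of this sketch.
function sketch_ngrams(array $tokens, int $n = 2, string $separator = ' '): array
{
    if ($n < 1) {
        throw new InvalidArgumentException('n must be at least 1');
    }

    $tokens = array_values($tokens); // ensure sequential integer keys
    $ngrams = [];
    $last = count($tokens) - $n;     // index of the last window start

    // Slide a window of $n tokens across the list and join each window.
    for ($i = 0; $i <= $last; $i++) {
        $ngrams[] = implode($separator, array_slice($tokens, $i, $n));
    }

    return $ngrams;
}

$tokens = ['one', 'two', 'three', 'four'];
$bigrams = sketch_ngrams($tokens);          // ['one two', 'two three', 'three four']
$trigrams = sketch_ngrams($tokens, 3, '|'); // ['one|two|three', 'two|three|four']
```

When the token list is shorter than the requested size, the sketch returns an empty array.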

### Stemming

By default, the **stem** function uses the Porter Stemmer.

```
$stemmedTokens = stem($tokens);
```

To use a different stemmer, pass the stemmer's class name as the second argument:

```
$stemmedTokens = stem($tokens, \TextAnalysis\Stemmers\MorphStemmer::class);
```

### Keyword Extract with Rake

There is a shortcut function for the Rake algorithm. Clean your data before using it. The second parameter is the ngram size of the keywords to extract.

```
$rake = rake($tokens, 3);
$results = $rake->getKeywordScores();
```

### Sentiment Analysis with Vader

Need sentiment analysis with PHP? Use Vader. The PHP implementation can be invoked easily; just normalize your data beforehand.

```
$sentimentScores = vader($tokens);
```

### Document Classification with Naive Bayes

Need to do some document classification with PHP? Try the Naive Bayes implementation. An example of classifying movie reviews can be found in the unit tests.

```
$nb = naive_bayes();
$nb->train('mexican', tokenize('taco nacho enchilada burrito'));
$nb->train('american', tokenize('hamburger burger fries pop'));
$nb->predict(tokenize('my favorite food is a burrito'));
```
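
For intuition about what training and prediction involve, the following is a small multinomial Naive Bayes sketch with add-one smoothing, written in plain PHP. `SketchNaiveBayes` is a hypothetical class and not the library's implementation. Returning log scores sorted from most to least likely is a choice made for this sketch, and the library's return format may differ. `explode` stands in for `tokenize` so the example runs on its own.

```php
<?php
// Hypothetical class for illustration; not the library's naive_bayes() implementation.
final class SketchNaiveBayes
{
    private array $tokenCounts = []; // label => [token => count]
    private array $docCounts = [];   // label => number of training documents
    private array $vocabulary = [];  // token => true

    public function train(string $label, array $tokens): void
    {
        $this->docCounts[$label] = ($this->docCounts[$label] ?? 0) + 1;
        foreach ($tokens as $token) {
            $this->tokenCounts[$label][$token] = ($this->tokenCounts[$label][$token] ?? 0) + 1;
            $this->vocabulary[$token] = true;
        }
    }

    // Returns label => log score, best label first.
    public function predict(array $tokens): array
    {
        $totalDocs = array_sum($this->docCounts);
        $vocabSize = count($this->vocabulary);
        $scores = [];

        foreach ($this->docCounts as $label => $docCount) {
            $labelTokens = $this->tokenCounts[$label] ?? [];
            $labelTotal = array_sum($labelTokens);

            // Start from the log prior of the label.
            $score = log($docCount / $totalDocs);

            foreach ($tokens as $token) {
                if (!isset($this->vocabulary[$token])) {
                    continue; // words never seen in training carry no evidence
                }
                // Add-one (Laplace) smoothed log likelihood of the token.
                $count = $labelTokens[$token] ?? 0;
                $score += log(($count + 1) / ($labelTotal + $vocabSize));
            }

            $scores[$label] = $score;
        }

        arsort($scores);
        return $scores;
    }
}

$nb = new SketchNaiveBayes();
$nb->train('mexican', explode(' ', 'taco nacho enchilada burrito'));
$nb->train('american', explode(' ', 'hamburger burger fries pop'));
$scores = $nb->predict(explode(' ', 'my favorite food is a burrito'));
$best = array_key_first($scores);
```

Only `burrito` appears in the training vocabulary, and it was seen under `mexican`, so `mexican` receives the higher score in this sketch.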

### Community

Maintainers: [batdan](/maintainers/batdan)

Top Contributors: [yooper](https://github.com/yooper) (188 commits), [Euak](https://github.com/Euak) (28 commits), [carbon-cloud-deploy](https://github.com/carbon-cloud-deploy) (6 commits), [batdan](https://github.com/batdan) (4 commits), [thiagogomesverissimo](https://github.com/thiagogomesverissimo) (4 commits), [ace411](https://github.com/ace411) (2 commits), [maxguru](https://github.com/maxguru) (1 commit), [NeoBlack](https://github.com/NeoBlack) (1 commit), [nielsriekert](https://github.com/nielsriekert) (1 commit), [repat](https://github.com/repat) (1 commit), [elievischel](https://github.com/elievischel) (1 commit), [cicnavi](https://github.com/cicnavi) (1 commit)

Tags: nlp, natural language processing, text classification, ir, text analysis

