byjg/text-classifier
====================

A PHP text classifier supporting binary spam filtering (Robinson-Fisher Bayesian) and multi-class Naive Bayes classification, with optional LLM-assisted active learning fallback.

Latest release: 6.0.0. Requires PHP `>=8.3 <8.6`.

[Source](https://github.com/byjg/php-text-classifier) | [Packagist](https://packagist.org/packages/byjg/text-classifier)


text-classifier: Bayesian Text Classifier
=========================================

A PHP library for statistical text classification. It provides two independent engines:

- **BinaryClassifier** — Binary Robinson-Fisher Bayesian filter. Classifies text as spam or ham. Designed for high-accuracy two-class filtering with word degeneration support.
- **NaiveBayes** — Multi-class Naive Bayes classifier. Classifies text into any number of user-defined categories. Suitable for language detection, topic tagging, content routing, and similar tasks.
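To give a feel for what "Robinson-Fisher" refers to, here is a minimal sketch of the technique as it is commonly described: Gary Robinson's smoothed per-token probability, combined across tokens with Fisher's chi-square method. The function names, the `$strength` and `$assumed` defaults, and the simple `spam / (spam + ham)` estimate are assumptions made for illustration. This is not the library's code, and its constants and token selection may differ.

```php
<?php
declare(strict_types=1);

/**
 * Robinson's smoothed per-token spam probability.
 * $strength and $assumed are illustrative defaults, not the library's settings.
 */
function robinsonProbability(int $spamCount, int $hamCount, float $strength = 1.0, float $assumed = 0.5): float
{
    $n = $spamCount + $hamCount;
    if ($n === 0) {
        return $assumed; // unseen token: fall back to the neutral prior
    }
    $raw = $spamCount / $n; // naive estimate; assumes equally sized spam and ham corpora
    return ($strength * $assumed + $n * $raw) / ($strength + $n);
}

/** Upper-tail probability of a chi-square variable with an even number of degrees of freedom. */
function chi2Q(float $x2, int $df): float
{
    $m = $x2 / 2.0;
    $term = exp(-$m);
    $sum = $term;
    for ($i = 1; $i < intdiv($df, 2); $i++) {
        $term *= $m / $i;
        $sum += $term;
    }
    return min($sum, 1.0);
}

/**
 * Fisher combination of token probabilities (each strictly between 0 and 1).
 * Returns a value near 1.0 for spam-like input and near 0.0 for ham-like input.
 *
 * @param float[] $probabilities
 */
function fisherScore(array $probabilities): float
{
    $n = count($probabilities);
    if ($n === 0) {
        return 0.5; // nothing to go on
    }
    $lnP = 0.0;
    $lnQ = 0.0;
    foreach ($probabilities as $p) {
        $lnP += log($p);
        $lnQ += log(1.0 - $p);
    }
    $h = chi2Q(-2.0 * $lnP, 2 * $n); // close to 1 when the tokens look spammy
    $s = chi2Q(-2.0 * $lnQ, 2 * $n); // close to 1 when the tokens look hammy
    return (1.0 + $h - $s) / 2.0;
}

// Tokens seen mostly in spam versus tokens seen mostly in ham.
$spammy = fisherScore([robinsonProbability(10, 0), robinsonProbability(8, 1), robinsonProbability(5, 0)]);
$hammy  = fisherScore([robinsonProbability(0, 10), robinsonProbability(1, 8), robinsonProbability(0, 5)]);
```

The smoothing step keeps every probability strictly inside (0, 1), so the logarithms stay finite even for tokens seen in only one class.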

Both engines return a `ClassificationResult` with the winning category, confidence score, and all category scores. Both support optional LLM injection for automatic escalation when the statistical model is uncertain — the LLM decision is fed back as training data, improving the model over time (active learning).
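The escalation loop described above can be sketched without the library. Everything in this example is hypothetical: `classifyWithEscalation`, the callables, the array shape, and the `0.8` threshold are stand-ins, and the score is treated as confidence in the winning category. The library's actual LLM integration is covered in its LLM-assisted classification guide.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: ask the statistical model first and escalate to a
 * second opinion (for example an LLM) only when its confidence is low.
 * The second opinion is fed back as training data.
 *
 * @param callable(string): array{choice: string, score: float} $classify
 * @param callable(string, string): void $learn   stores a labelled example
 * @param callable(string): string $askLlm        returns a category label
 */
function classifyWithEscalation(
    string $text,
    callable $classify,
    callable $learn,
    callable $askLlm,
    float $threshold = 0.8
): array {
    $result = $classify($text);
    if ($result['score'] >= $threshold) {
        return $result + ['escalated' => false];
    }
    $label = $askLlm($text);
    $learn($text, $label); // active learning: the model improves for next time
    return ['choice' => $label, 'score' => $result['score'], 'escalated' => true];
}

// Stub model: fully confident about texts it has been taught, unsure otherwise.
$memory = [];
$classify = function (string $text) use (&$memory): array {
    return isset($memory[$text])
        ? ['choice' => $memory[$text], 'score' => 1.0]
        : ['choice' => 'ham', 'score' => 0.5];
};
$learn = function (string $text, string $label) use (&$memory): void {
    $memory[$text] = $label;
};
// Stub "LLM": a keyword rule standing in for a real model call.
$askLlm = fn(string $text): string => str_contains($text, 'pills') ? 'spam' : 'ham';

$first  = classifyWithEscalation('cheap pills', $classify, $learn, $askLlm);
$second = classifyWithEscalation('cheap pills', $classify, $learn, $askLlm);
// $first['escalated']  === true  (the stub LLM was consulted and its label learned)
// $second['escalated'] === false (the model now answers on its own)
```

The point of the design is that each escalation is paid for once: after the label is learned, the same or similar input no longer needs the slower fallback.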

Both engines share the same tokenisation pipeline (`StandardLexer`, `StandardDegenerator`) and support pluggable storage backends (in-memory, SQLite, MySQL, PostgreSQL, GDBM).

Installation
------------

```
composer require byjg/text-classifier
```

Requires PHP `>=8.3 <8.6`. The GDBM storage backend additionally requires `ext-dba`.

Quick Example
-------------

**Spam filter:**

```
<?php
require __DIR__ . '/vendor/autoload.php';

use ByJG\TextClassifier\BinaryClassifier;
use ByJG\TextClassifier\ConfigBinaryClassifier;
use ByJG\TextClassifier\Lexer\StandardLexer;
use ByJG\TextClassifier\Lexer\ConfigLexer;
use ByJG\TextClassifier\Degenerator\StandardDegenerator;
use ByJG\TextClassifier\Degenerator\ConfigDegenerator;
use ByJG\TextClassifier\Storage\Rdbms;
use ByJG\Util\Uri;

// SQLite-backed storage; the degenerator is passed to the storage layer.
$storage = new Rdbms(new Uri('sqlite:///tmp/spam.db'), new StandardDegenerator(new ConfigDegenerator()));
$storage->createDatabase();

$classifier = new BinaryClassifier(new ConfigBinaryClassifier(), $storage, new StandardLexer(new ConfigLexer()));

// Train with one labelled example per class.
$classifier->learn('Buy cheap pills now!!!', BinaryClassifier::SPAM);
$classifier->learn('Meeting at 3pm in the conference room', BinaryClassifier::HAM);

// Classify unseen text.
$result = $classifier->classify('buy pills online cheap');
// $result->choice === 'spam'
// $result->score  is close to 1.0
```

**Multi-class classifier:**

```
<?php
require __DIR__ . '/vendor/autoload.php';

use ByJG\TextClassifier\NaiveBayes\NaiveBayes;
use ByJG\TextClassifier\NaiveBayes\Storage\Memory;
use ByJG\TextClassifier\Lexer\StandardLexer;
use ByJG\TextClassifier\Lexer\ConfigLexer;

// In-memory storage: trained data is lost when the script ends.
$nb = new NaiveBayes(new Memory(), new StandardLexer(new ConfigLexer()));

// Train one example per category.
$nb->train('PHP is a programming language', 'tech');
$nb->train('The cat sat on the mat', 'animals');

$result = $nb->classify('programming language');
// $result->choice === 'tech'
// $result->score  is the winning category's score, e.g. 0.93
// $result->scores maps every category to its score, e.g. ['tech' => 0.93, 'animals' => 0.07]
```
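To show where per-category scores like those in the comments above can come from, here is a toy multinomial Naive Bayes with add-one (Laplace) smoothing. `ToyNaiveBayes`, its tokeniser, and its smoothing are assumptions made for this sketch, not the library's `NaiveBayes` class, so its numbers will not match the `0.93` / `0.07` shown above.

```php
<?php
declare(strict_types=1);

/** Toy multinomial Naive Bayes; illustrative only, not the library's implementation. */
final class ToyNaiveBayes
{
    /** @var array<string, array<string, int>> token counts per category */
    private array $tokenCounts = [];
    /** @var array<string, int> number of training documents per category */
    private array $docCounts = [];
    /** @var array<string, true> every token seen during training */
    private array $vocabulary = [];

    /** Lowercase the text and split it on anything that is not a letter or digit. */
    private function tokenize(string $text): array
    {
        $parts = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
        return $parts === false ? [] : $parts;
    }

    public function train(string $text, string $category): void
    {
        $this->docCounts[$category] = ($this->docCounts[$category] ?? 0) + 1;
        $this->tokenCounts[$category] ??= [];
        foreach ($this->tokenize($text) as $token) {
            $this->tokenCounts[$category][$token] = ($this->tokenCounts[$category][$token] ?? 0) + 1;
            $this->vocabulary[$token] = true;
        }
    }

    /** @return array<string, float> scores per category, normalised to sum to 1.0 */
    public function scores(string $text): array
    {
        $totalDocs = array_sum($this->docCounts);
        $vocabSize = count($this->vocabulary);
        if ($totalDocs === 0 || $vocabSize === 0) {
            return []; // nothing has been learned yet
        }
        $tokens = $this->tokenize($text);
        $logScores = [];
        foreach ($this->docCounts as $category => $docs) {
            $categoryTotal = array_sum($this->tokenCounts[$category]);
            $log = log($docs / $totalDocs); // log prior
            foreach ($tokens as $token) {
                $count = $this->tokenCounts[$category][$token] ?? 0;
                $log += log(($count + 1) / ($categoryTotal + $vocabSize)); // add-one smoothing
            }
            $logScores[$category] = $log;
        }
        // Subtract the maximum before exponentiating to avoid underflow, then normalise.
        $max = max($logScores);
        $exp = array_map(fn(float $l): float => exp($l - $max), $logScores);
        $sum = array_sum($exp);
        return array_map(fn(float $e): float => $e / $sum, $exp);
    }
}

$toy = new ToyNaiveBayes();
$toy->train('PHP is a programming language', 'tech');
$toy->train('The cat sat on the mat', 'animals');

$scores = $toy->scores('programming language');
arsort($scores);                     // highest score first
$choice = array_key_first($scores);  // 'tech'
```

Working in log space and normalising at the end is what lets the scores be read as shares that sum to 1.0, even when the raw products are vanishingly small.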

Documentation
-------------

| Section | Description |
|---------|-------------|
| [Getting Started](docs/getting-started/installation.md) | Installation, requirements, first working example |
| [Guides: Spam Filter](docs/guides/spam-filter/training.md) | Training, classifying, choosing storage |
| [Guides: Multi-class](docs/guides/multi-class/training.md) | Training categories, classifying, persistence |
| [Guide: LLM-Assisted Classification](docs/guides/llm-assisted-classification.md) | Automatic LLM fallback and active learning |
| [Concepts](docs/concepts/how-binary-classifier-works.md) | How the algorithms work, architecture overview |
| [Reference](docs/reference/binary-classifier.md) | Full API, configuration parameters, error codes |

Acknowledgements
----------------

This library is inspired by the original **b8** spam filter written by [Tobias Leupold](mailto:tobias.leupold@web.de). The core algorithm, Robinson-Fisher probability model, token degeneration approach, and the `tc*` internal variable convention all originate from his work. This project modernises the codebase for PHP 8.3+, replaces the storage layer with `byjg/micro-orm` and `byjg/migration`, and adds a multi-class NaiveBayes engine built on the same tokenisation pipeline.

Dependencies
------------

 ```
flowchart TD
    byjg/text-classifier --> byjg/micro-orm
    byjg/text-classifier --> byjg/migration
    byjg/text-classifier --> byjg/llm-api-objects
    byjg/text-classifier --> openai-php/client
```

Maintainer: Joao Gilberto Magalhaes ([@byjg](https://github.com/byjg))
