albertvankiel/classification
============================

Text classification for PHP

Version v1.0.1 · MIT license · requires PHP ^8.1

[Source](https://github.com/albertvankiel/classification) · [Packagist](https://packagist.org/packages/albertvankiel/classification)

Classification
==============

[![Tests](https://github.com/albertvankiel/classification/actions/workflows/tests.yml/badge.svg)](https://github.com/albertvankiel/classification/actions/workflows/tests.yml)

Lightweight zero-dependency text classification library for PHP 8.1+.

This package provides an implementation of the Naive Bayes algorithm for text classification, sentiment analysis and spam filtering. It includes built-in support for N-Grams, stop-word filtering and Laplace smoothing.
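To make the role of Laplace smoothing concrete, here is a minimal, self-contained sketch of how a Naive Bayes score for one category can be computed. It is an illustration only and not the package's internal code: the function name `logScore`, the token counts and the vocabulary size are all made up for the example.

```php
<?php

/**
 * Illustrative only: log-score of a token list for one category, using
 * Laplace (add-one) smoothing. Hypothetical helper, not the package's code.
 *
 * @param string[]          $tokens      tokens of the text to score
 * @param array<string,int> $tokenCounts token => occurrences seen in this category
 * @param int               $vocabSize   number of distinct tokens across all categories
 * @param float             $prior       prior probability of the category
 */
function logScore(array $tokens, array $tokenCounts, int $vocabSize, float $prior): float
{
    $total = array_sum($tokenCounts);
    $score = log($prior);

    foreach ($tokens as $token) {
        $count = $tokenCounts[$token] ?? 0;
        // Adding 1 keeps a token that was never seen in this category
        // from producing log(0) and wiping out the whole score.
        $score += log(($count + 1) / ($total + $vocabSize));
    }

    return $score;
}

// Made-up counts for two categories sharing a vocabulary of 4 tokens.
$spamCounts    = ['free' => 2, 'prize' => 1];
$notSpamCounts = ['meeting' => 2, 'report' => 1];
$vocabSize     = 4;

$tokens       = ['free', 'meeting', 'free'];
$spamScore    = logScore($tokens, $spamCounts, $vocabSize, 0.5);
$notSpamScore = logScore($tokens, $notSpamCounts, $vocabSize, 0.5);

echo $spamScore > $notSpamScore ? 'spam' : 'not_spam', PHP_EOL;
```

Working in log space avoids numeric underflow when many small probabilities are multiplied; the category with the highest score wins.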

Installation
------------

Install the package through Composer.

```
composer require albertvankiel/classification
```

Basic usage
-----------

The most common use for Naive Bayes is binary classification, such as spam detection or sentiment analysis.

### Training the model

To train the model, you must provide an array of text samples and an array of corresponding labels:

```
use AlbertvanKiel\Classification\Classifiers\NaiveBayes;

$classifier = new NaiveBayes();

// 1. Prepare your training data
$samples = [
    "Win a FREE iPhone today! Click here to claim your prize.",
    "Cheap medication for sale, limited time offer!",
    "Hey John, are we still on for the marketing meeting at 10?",
    "Can you please send me the Q3 financial report by Friday?"
];
$labels = [
    "spam",
    "spam",
    "not_spam",
    "not_spam"
];

// 2. Train the classifier
$classifier->train($samples, $labels);
```

### Making predictions

Once trained, you can use the classifier to predict the category of text:

```
// Predict the single most likely category
$prediction = $classifier->predict("Click here to get your free gift card!");
echo $prediction; // Outputs: spam

// Get the probability of each category (values between 0 and 1)
$probabilities = $classifier->predictProbabilities("Are we meeting tomorrow?");
print_r($probabilities);
// Example result: ['not_spam' => 0.98, 'spam' => 0.02]
```
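If you only want to act on confident predictions, the probability array can be filtered with a threshold. The sketch below assumes the result has the `label => probability` shape shown in the example above; `confidentLabel` and the default threshold of 0.9 are hypothetical and not part of the package.

```php
<?php

/**
 * Hypothetical helper (not part of the package): return the most likely
 * label only when its probability reaches $threshold, otherwise null.
 *
 * @param array<string,float> $probabilities label => probability
 */
function confidentLabel(array $probabilities, float $threshold = 0.9): ?string
{
    if ($probabilities === []) {
        return null;
    }

    // Sort by probability, highest first, while keeping the label keys.
    arsort($probabilities);
    $label = array_key_first($probabilities);

    return $probabilities[$label] >= $threshold ? (string) $label : null;
}

var_dump(confidentLabel(['not_spam' => 0.98, 'spam' => 0.02]));
var_dump(confidentLabel(['not_spam' => 0.55, 'spam' => 0.45]));
```

A `null` result can then be routed to manual review instead of being treated as either category.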

Loading training datasets from JSON
-----------------------------------

To load a larger training set from a JSON file, use the `Json::fromFile()` factory, which returns a dataset object. The file must contain an array of objects, each with a `text` and a `label` key, for example:

**`dataset.json`**

```
[
  {
    "text": "Win a FREE iPhone today! Click here to claim your prize.",
    "label": "spam"
  },
  {
    "text": "Hey John, are we still on for the marketing meeting at 10?",
    "label": "not_spam"
  },
  {
    "text": "Cheap medication for sale, limited time offer!",
    "label": "spam"
  }
]
```

Loading the dataset in PHP:

```
use AlbertvanKiel\Classification\Data\Json;

// Load the data from the file
$dataset = Json::fromFile('/path/to/dataset.json');

// Extract the data and train the classifier
$classifier->train($dataset->getSamples(), $dataset->getLabels());
```
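When a dataset comes from an external source, it can help to check its shape before training. The following sketch uses only PHP's built-in `json_decode` and does not describe how the package's loader works; `splitDataset` is a hypothetical helper that enforces the `text`/`label` format described above.

```php
<?php

/**
 * Hypothetical helper (not part of the package): decode a JSON dataset
 * string and split it into parallel sample and label arrays.
 *
 * @return array{0: string[], 1: string[]} [samples, labels]
 */
function splitDataset(string $json): array
{
    // JSON_THROW_ON_ERROR turns malformed JSON into a JsonException.
    $rows = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    if (!is_array($rows)) {
        throw new InvalidArgumentException('Dataset must be a JSON array of objects.');
    }

    $samples = [];
    $labels  = [];

    foreach ($rows as $i => $row) {
        if (!is_array($row)
            || !is_string($row['text'] ?? null)
            || !is_string($row['label'] ?? null)
        ) {
            throw new InvalidArgumentException("Row $i needs string \"text\" and \"label\" keys.");
        }
        $samples[] = $row['text'];
        $labels[]  = $row['label'];
    }

    return [$samples, $labels];
}

[$samples, $labels] = splitDataset(
    '[{"text":"Win a FREE iPhone today!","label":"spam"},'
    . '{"text":"Are we meeting tomorrow?","label":"not_spam"}]'
);
```

The two returned arrays have the same shape as the `$samples` and `$labels` arrays passed to `train()` in the basic usage example.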

Saving and loading models
-------------------------

You can train the model once, save it to disk, and load it later without retraining:

```
// Save the trained model to a file
$classifier->save('/path/to/storage/spam_model.txt');

// Later, load it without retraining
$fastClassifier = new NaiveBayes();
$fastClassifier->load('/path/to/storage/spam_model.txt');
$result = $fastClassifier->predict($_POST['message']);
```

Customizing the tokenizer
-------------------------

By default, the built-in tokenizer filters out common English stop words (such as "the", "and", "is") and uses unigrams (single words).

You can inject a custom tokenizer to support other languages, or use N-Grams to give the algorithm context about word order:

```
use AlbertvanKiel\Classification\Tokenizer\Tokenizer;
use AlbertvanKiel\Classification\Tokenizer\StopWords;

// Example 1: Use Spanish stop words
$spanishStopWords = ['el', 'la', 'los', 'las', 'un', 'una', 'y', 'o', 'pero'];
$spanishTokenizer = new Tokenizer($spanishStopWords);

// Example 2: Use Bigrams (pairs of words) for better context
// "not good" becomes "not_good" instead of ["not", "good"]
$bigramTokenizer = new Tokenizer(StopWords::english(), 2);

// Example 3: Disable stop word filtering entirely
$rawTokenizer = new Tokenizer([]);

// Inject the custom tokenizer into the classifier
$classifier = new NaiveBayes($bigramTokenizer);
```
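To show what stop-word filtering and N-Grams do to a piece of text, here is a small standalone sketch. It is not the package's `Tokenizer` class: the function `ngramTokens` is hypothetical, and details such as removing stop words before building N-Grams and splitting on non-alphanumeric characters are assumptions made for the illustration. The underscore joiner follows the `not_good` example above.

```php
<?php

/**
 * Illustrative tokenizer sketch (hypothetical, not the package's Tokenizer).
 *
 * @param string[] $stopWords lower-case words to drop
 * @param int      $n         1 for unigrams, 2 for bigrams, and so on
 * @return string[]
 */
function ngramTokens(string $text, array $stopWords, int $n = 1): array
{
    if ($n < 1) {
        throw new InvalidArgumentException('N-Gram size must be at least 1.');
    }

    // strtolower covers ASCII letters; accented text would need mb_strtolower.
    $words = preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];

    // Drop stop words, then reindex so slices are contiguous.
    $words = array_values(array_diff($words, $stopWords));

    $tokens = [];
    for ($i = 0; $i + $n <= count($words); $i++) {
        // Join each window of $n consecutive words with an underscore.
        $tokens[] = implode('_', array_slice($words, $i, $n));
    }

    return $tokens;
}

print_r(ngramTokens('The movie is not good', ['the', 'is'], 2));
```

With bigrams, "not good" survives as a single feature, which a unigram model would split into two unrelated words.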

License
-------

The MIT License (MIT). Please see [License File](LICENSE.md) for more information.

