

Status: abandoned and archived. Suggested replacement: [coral-media/ext-ir](/?search=coral-media%2Fext-ir). Type: library. Category: [Search & Filtering](/categories/search).

coral-media/php-ir
==================

Information Retrieval algorithms (vector space, similarity, clustering)

Latest release: v0.7.2 · License: MIT · Language: PHP (requires PHP >=8.2)

[Source](https://github.com/coral-media/php-ir) · [Packagist](https://packagist.org/packages/coral-media/php-ir)

PHP-IR
======


[![PHP](https://camo.githubusercontent.com/5a4f359a9d75caa6a221ef60f3f354e97287bef4051283935a7710e2cc7e50f1/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f7068702d382e322532422d3737374242343f6c6f676f3d706870)](https://camo.githubusercontent.com/5a4f359a9d75caa6a221ef60f3f354e97287bef4051283935a7710e2cc7e50f1/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f7068702d382e322532422d3737374242343f6c6f676f3d706870)[![License](https://camo.githubusercontent.com/e7dac58613f2dcadc20e7cd1d29effb32da6b2787ff83b9ed120f86b920bd8ff/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f6c2f636f72616c2d6d656469612f7068702d6972)](https://camo.githubusercontent.com/e7dac58613f2dcadc20e7cd1d29effb32da6b2787ff83b9ed120f86b920bd8ff/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f6c2f636f72616c2d6d656469612f7068702d6972)

[![PHPStan](https://camo.githubusercontent.com/13f601b8b984aa928494c759cc8a1e0c7cb3edfef605b044d2d3a659b66e0482/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f7068707374616e2d6c6576656c25323031302d627269676874677265656e)](https://camo.githubusercontent.com/13f601b8b984aa928494c759cc8a1e0c7cb3edfef605b044d2d3a659b66e0482/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f7068707374616e2d6c6576656c25323031302d627269676874677265656e)[![PHPMD](https://camo.githubusercontent.com/c2321a49e1d204f5366f1de678700e2841cd5be1c726a4d31c4673ac75e33282/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f7068706d642d656e61626c65642d626c7565)](https://camo.githubusercontent.com/c2321a49e1d204f5366f1de678700e2841cd5be1c726a4d31c4673ac75e33282/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f7068706d642d656e61626c65642d626c7565)

[![GitHub last commit](https://camo.githubusercontent.com/bca4f65dd886fe4a80a9b12e1c57dcc64db0db8b2e21302ad4e5d930d6aaa077/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f6c6173742d636f6d6d69742f636f72616c2d6d656469612f7068702d6972)](https://camo.githubusercontent.com/bca4f65dd886fe4a80a9b12e1c57dcc64db0db8b2e21302ad4e5d930d6aaa077/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f6c6173742d636f6d6d69742f636f72616c2d6d656469612f7068702d6972)[![GitHub repo size](https://camo.githubusercontent.com/8b5e22ac4e928b08a2d20428d1b885bbbbaf852c18243d52809c858b24fc3827/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f7265706f2d73697a652f636f72616c2d6d656469612f7068702d6972)](https://camo.githubusercontent.com/8b5e22ac4e928b08a2d20428d1b885bbbbaf852c18243d52809c858b24fc3827/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f7265706f2d73697a652f636f72616c2d6d656469612f7068702d6972)

**PHP-IR** is a modern, research-oriented **Information Retrieval (IR)** and **Vector Space Modeling** library for PHP, focused on correctness, transparency, and theoretical grounding.

It provides low-level, composable primitives for **text representation, weighting, similarity, clustering, and evaluation**, designed for engineers who need **full control and explainability**, not opaque ML abstractions.

---

Why PHP-IR exists
-----------------


The PHP ecosystem has historically lacked serious IR tooling beyond thin wrappers around search engines. PHP-IR fills that gap by offering:

- Explicit **vector space modeling**
- Reproducible **term weighting pipelines**
- Deterministic **clustering algorithms**
- Quantitative **cluster quality metrics**
- APIs aligned with **Information Retrieval literature**

The goal is not convenience-first APIs, but **scientifically correct and inspectable IR workflows**.

---

Core capabilities
-----------------


### Text processing


- Tokenization (regex, whitespace)
- Text normalization (lowercasing, accent folding, composition)
- Stop-word filtering with language support (English, Spanish)
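
The three steps above can be sketched in plain PHP. This is only an illustration of the technique, not the PHP-IR API: the function names (`sketchNormalize`, `sketchTokenize`, `sketchRemoveStopWords`), the tiny accent-folding table, and the toy stop-word list are all assumptions made for this example, and the `mbstring` extension is assumed to be available.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of PHP-IR): lowercases the text and folds
 * a handful of accented Latin letters to their ASCII base letter.
 */
function sketchNormalize(string $text): string
{
    $text = mb_strtolower($text, 'UTF-8');

    // Deliberately tiny folding table; a real one would cover far more characters.
    return strtr($text, [
        'á' => 'a', 'é' => 'e', 'í' => 'i', 'ó' => 'o',
        'ú' => 'u', 'ü' => 'u', 'ñ' => 'n',
    ]);
}

/**
 * Hypothetical regex tokenizer: splits on runs of anything that is
 * neither a letter (\p{L}) nor a digit (\p{N}).
 *
 * @return list<string>
 */
function sketchTokenize(string $text): array
{
    $tokens = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    return $tokens === false ? [] : $tokens;
}

/**
 * Hypothetical stop-word filter.
 *
 * @param list<string> $tokens
 * @param list<string> $stopWords
 * @return list<string>
 */
function sketchRemoveStopWords(array $tokens, array $stopWords): array
{
    $lookup = array_flip($stopWords); // constant-time membership checks

    return array_values(
        array_filter($tokens, static fn (string $t): bool => !isset($lookup[$t]))
    );
}

$tokens = sketchRemoveStopWords(
    sketchTokenize(sketchNormalize('La Recuperación de Información es útil')),
    ['la', 'de', 'es'] // toy Spanish stop-word list
);
// $tokens is ['recuperacion', 'informacion', 'util']
```

Normalizing before tokenizing keeps the stop-word lookup simple, because the list only needs lowercase, accent-free entries.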

### Vocabulary & statistics


- Vocabulary construction
- Document frequency tracking
- IDF computation (per-term and vectorized)
- Corpus-level statistics via dedicated façades (no core pollution)
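
As a rough illustration of document frequency and IDF, the sketch below counts how many documents contain each term and then applies the textbook formula `idf(t) = log(N / df(t))` from *Introduction to Information Retrieval*. Whether PHP-IR uses this exact variant (or a smoothed one) is not stated here, and the function names are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of PHP-IR).
 *
 * @param list<list<string>> $documents tokenized documents
 * @return array<string, int> term => number of documents containing it
 */
function sketchDocumentFrequencies(array $documents): array
{
    $df = [];
    foreach ($documents as $tokens) {
        // array_unique: a term counts once per document, however often it occurs.
        foreach (array_unique($tokens) as $term) {
            $df[$term] = ($df[$term] ?? 0) + 1;
        }
    }

    return $df;
}

/**
 * Unsmoothed IDF: log(N / df). Assumes every df value is at least 1.
 *
 * @param array<string, int> $df
 * @return array<string, float>
 */
function sketchIdf(array $df, int $documentCount): array
{
    $idf = [];
    foreach ($df as $term => $count) {
        $idf[$term] = log($documentCount / $count);
    }

    return $idf;
}

$documents = [
    ['cat', 'sat'],
    ['cat', 'ran'],
    ['dog', 'ran', 'ran'],
];

$df  = sketchDocumentFrequencies($documents); // cat: 2, sat: 1, ran: 2, dog: 1
$idf = sketchIdf($df, count($documents));     // rarer terms get larger weights
```

A term that appears in every document gets an IDF of `log(1) = 0`, so it contributes nothing to a TF-IDF vector.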

### Vectorization


- Sparse and dense vector representations
- Term Frequency (TF)
- TF-IDF weighting
- Spherical (L2-normalized) vector spaces
- Explicit densification for algorithms that require fixed dimensions
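
A minimal sketch of these representations, assuming a sparse vector is a plain `term => weight` array: TF-IDF weighting, L2 normalization onto the unit sphere, and explicit densification against a fixed vocabulary order. The function names and the IDF values are made up for the example and are not taken from PHP-IR.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: raw term frequency multiplied by IDF.
 *
 * @param list<string> $tokens
 * @param array<string, float> $idf
 * @return array<string, float> sparse vector; zero weights are omitted
 */
function sketchTfIdf(array $tokens, array $idf): array
{
    $vector = [];
    foreach (array_count_values($tokens) as $term => $tf) {
        $weight = $tf * ($idf[$term] ?? 0.0);
        if ($weight != 0.0) {
            $vector[$term] = $weight;
        }
    }

    return $vector;
}

/**
 * Scales a sparse vector to unit Euclidean length (keys are preserved).
 *
 * @param array<string, float> $vector
 * @return array<string, float>
 */
function sketchL2Normalize(array $vector): array
{
    $norm = sqrt(array_sum(array_map(static fn (float $w): float => $w * $w, $vector)));
    if ($norm == 0.0) {
        return $vector; // the zero vector cannot be normalized
    }

    return array_map(static fn (float $w): float => $w / $norm, $vector);
}

/**
 * Expands a sparse vector into a fixed-length list ordered by $vocabulary.
 *
 * @param array<string, float> $vector
 * @param list<string> $vocabulary
 * @return list<float>
 */
function sketchDensify(array $vector, array $vocabulary): array
{
    return array_map(static fn (string $term): float => $vector[$term] ?? 0.0, $vocabulary);
}

$idf    = ['cat' => 3.0, 'ran' => 2.0];                 // made-up IDF values
$sparse = sketchTfIdf(['cat', 'ran', 'ran'], $idf);     // cat: 3.0, ran: 4.0
$unit   = sketchL2Normalize($sparse);                   // cat: 0.6, ran: 0.8
$dense  = sketchDensify($unit, ['cat', 'dog', 'ran']);  // 0.6, 0.0, 0.8
```

Keeping vectors sparse until an algorithm needs fixed dimensions avoids storing a zero for every vocabulary term a document does not contain.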

### Similarity


- Cosine similarity
- Pluggable similarity interfaces
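
One way to picture a pluggable similarity is a small interface with cosine as one implementation. The interface and class below (`SketchSimilarity`, `SketchCosineSimilarity`) are hypothetical names invented for this sketch and do not reflect PHP-IR's actual contracts. Vectors are assumed to be sparse `term => weight` arrays.

```php
<?php
declare(strict_types=1);

/** Hypothetical contract: any similarity measure over sparse vectors. */
interface SketchSimilarity
{
    /**
     * @param array<string, float> $a
     * @param array<string, float> $b
     */
    public function between(array $a, array $b): float;
}

/** Cosine similarity: dot(a, b) / (|a| * |b|). */
final class SketchCosineSimilarity implements SketchSimilarity
{
    public function between(array $a, array $b): float
    {
        $normA = $this->norm($a);
        $normB = $this->norm($b);
        if ($normA == 0.0 || $normB == 0.0) {
            return 0.0; // convention chosen here for zero vectors
        }

        // Iterate over the shorter vector; only shared terms contribute.
        if (count($a) > count($b)) {
            [$a, $b] = [$b, $a];
        }

        $dot = 0.0;
        foreach ($a as $term => $weight) {
            if (isset($b[$term])) {
                $dot += $weight * $b[$term];
            }
        }

        return $dot / ($normA * $normB);
    }

    /** @param array<string, float> $vector */
    private function norm(array $vector): float
    {
        $sum = 0.0;
        foreach ($vector as $weight) {
            $sum += $weight * $weight;
        }

        return sqrt($sum);
    }
}

$cosine = new SketchCosineSimilarity();

$a = ['cat' => 1.0, 'ran' => 1.0];
$b = ['cat' => 1.0];
$c = ['dog' => 2.0];

$ab = $cosine->between($a, $b); // about 0.7071, i.e. 1 / sqrt(2)
$ac = $cosine->between($a, $c); // 0.0, the vectors share no terms
```

Code that depends only on the interface can swap in another measure without changing its callers.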

### Clustering


- Spherical K-Means
- **Spherical K-Medians** (robust to outliers)
- Deterministic centroid update strategies
- Explicit iteration control
- Centroid initialization and update policies
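
The following is a minimal sketch of spherical k-means in the spirit of Dhillon and Modha, assuming dense unit-length vectors, caller-supplied initial centroids, and a fixed iteration cap. It is not PHP-IR's implementation, it does not cover the K-Medians variant, and every function name is hypothetical.

```php
<?php
declare(strict_types=1);

/** @param list<float> $a @param list<float> $b */
function sketchDot(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $value) {
        $sum += $value * $b[$i];
    }

    return $sum;
}

/** @param list<float> $v @return list<float> */
function sketchUnit(array $v): array
{
    $norm = sqrt(sketchDot($v, $v));

    return $norm == 0.0 ? $v : array_map(static fn (float $x): float => $x / $norm, $v);
}

/**
 * @param list<list<float>> $points    unit-length dense vectors
 * @param list<list<float>> $centroids initial unit-length centroids
 * @return array{assignments: list<int>, centroids: list<list<float>>}
 */
function sketchSphericalKMeans(array $points, array $centroids, int $maxIterations): array
{
    $assignments = [];
    for ($iteration = 0; $iteration < $maxIterations; $iteration++) {
        // Assignment step: on unit vectors the dot product equals the cosine.
        $next = [];
        foreach ($points as $p => $point) {
            $best = 0;
            $bestSimilarity = -INF;
            foreach ($centroids as $c => $centroid) {
                $similarity = sketchDot($point, $centroid);
                if ($similarity > $bestSimilarity) { // ties go to the lowest index
                    $bestSimilarity = $similarity;
                    $best = $c;
                }
            }
            $next[$p] = $best;
        }

        if ($next === $assignments) {
            break; // assignments are stable, so the centroids would not move
        }
        $assignments = $next;

        // Update step: each centroid becomes the normalized sum of its members.
        foreach ($centroids as $c => $centroid) {
            $sum = array_fill(0, count($centroid), 0.0);
            $members = 0;
            foreach ($points as $p => $point) {
                if ($assignments[$p] !== $c) {
                    continue;
                }
                $members++;
                foreach ($point as $i => $value) {
                    $sum[$i] += $value;
                }
            }
            if ($members > 0) { // an empty cluster keeps its previous centroid
                $centroids[$c] = sketchUnit($sum);
            }
        }
    }

    return ['assignments' => $assignments, 'centroids' => $centroids];
}

$points = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [0.6, 0.8]];
$result = sketchSphericalKMeans($points, [[1.0, 0.0], [0.0, 1.0]], 10);
// $result['assignments'] is [0, 0, 1, 1]
```

Passing the initial centroids in explicitly, instead of sampling them at random, is what makes the run reproducible in this sketch.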

### Cluster evaluation


- Intra-cluster cohesion
- Inter-cluster separation
- Global quality score aligned with IR theory
- Metrics designed for **algorithm comparison**, not just reporting
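
The exact definitions PHP-IR uses for these metrics are not given here, so the sketch below picks one plausible pair purely as an assumption: cohesion as the mean cosine between each point and its assigned centroid, and separation as the mean cosine distance (`1 - cosine`) over all centroid pairs. It works on dense unit-length vectors, and the function names are hypothetical. The global quality score is left out because its formula is not described.

```php
<?php
declare(strict_types=1);

/** Dot product; equals the cosine when both vectors have unit length. */
function sketchDotProduct(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $value) {
        $sum += $value * $b[$i];
    }

    return $sum;
}

/**
 * Assumed definition: mean cosine between each point and its centroid.
 * Higher means tighter clusters.
 *
 * @param list<list<float>> $points
 * @param list<int>         $assignments cluster index for each point
 * @param list<list<float>> $centroids
 */
function sketchCohesion(array $points, array $assignments, array $centroids): float
{
    if ($points === []) {
        return 0.0;
    }

    $total = 0.0;
    foreach ($points as $p => $point) {
        $total += sketchDotProduct($point, $centroids[$assignments[$p]]);
    }

    return $total / count($points);
}

/**
 * Assumed definition: mean (1 - cosine) over all distinct centroid pairs.
 * Higher means the clusters point in more different directions.
 *
 * @param list<list<float>> $centroids
 */
function sketchSeparation(array $centroids): float
{
    $k = count($centroids);
    if ($k < 2) {
        return 0.0;
    }

    $total = 0.0;
    $pairs = 0;
    for ($i = 0; $i < $k; $i++) {
        for ($j = $i + 1; $j < $k; $j++) {
            $total += 1.0 - sketchDotProduct($centroids[$i], $centroids[$j]);
            $pairs++;
        }
    }

    return $total / $pairs;
}

$points      = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]];
$centroids   = [[1.0, 0.0], [0.0, 1.0]];
$assignments = [0, 1, 1];

$cohesion   = sketchCohesion($points, $assignments, $centroids); // (1.0 + 0.8 + 1.0) / 3
$separation = sketchSeparation($centroids);                      // orthogonal centroids
```

Because both numbers are computed from the same inputs for any clustering, they can be compared across algorithms run on the same corpus.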

---

Design philosophy
-----------------


PHP-IR is intentionally **not**:

- A search engine
- A machine learning framework
- A black-box clustering toolkit

Instead, it provides **clear, inspectable building blocks** that let you:

- Reason about every step of the IR pipeline
- Swap strategies without side effects
- Validate theoretical assumptions with executable code
- Compare algorithms using quantitative invariants

If you are familiar with TF-IDF, cosine similarity, and clustering theory, PHP-IR should feel predictable and rigorous.

---

Theoretical foundation
----------------------


The library is grounded in classical and modern IR research, including:

- *[Introduction to Information Retrieval](https://nlp.stanford.edu/IR-book/)* - Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze
- *[Spherical k-means clustering](http://www.cs.utexas.edu/~inderjit/public_papers/concept_mlj.pdf)* - I. S. Dhillon and D. S. Modha
- *[Spherical K - Medians](https://www.academia.edu/27956871/Spherical_K_Medians)* - Rafael E. Espinosa Santiesteban

---

Current status
--------------


- Actively developed
- API stabilized through real-world usage
- Strong test coverage with **invariant-based tests**
- English and Spanish corpora used for validation
- Designed to evolve without breaking theoretical guarantees

> Detailed documentation, examples, and usage guides will be added incrementally.

---

Roadmap (high level)
--------------------


- Advanced convergence criteria beyond fixed iteration limits
- Additional robustness heuristics for clustering
- Optional serialization of evaluation artifacts
- Extended language tooling and corpora support

---

License
-------


MIT License.
Use it, extend it, and build on it responsibly.


### Community

Maintainer: [Rafael Ernesto Espinosa Santiesteban](/maintainers/rernesto) ([@rernesto](https://github.com/rernesto))

Tags: clustering-algorithm, information-retrieval, k-means, k-means-clustering, text-classification, search, recommendation, clustering, TF-IDF, text-similarity, cosine-similarity, vector-space-model

