


brucetruth/ml-idea
==================

A production-ready machine learning library for PHP


[Source](https://github.com/brucetruth/ml-idea) | [Packagist](https://packagist.org/packages/brucetruth/ml-idea)


ml-idea
=======


[![Minimum PHP Version](https://camo.githubusercontent.com/ec21f169d70b69344c67d6f18fa1a24d20476d2f0cd680e8c4a1534c22f34e5f/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f7068702d253345253344253230382e322d3838393242462e737667)](https://php.net/)[![License](https://camo.githubusercontent.com/5b585c831c71de78d6f4e4cf760f5768caaa109716e1129c610b07a95f137765/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f6c6963656e73652f627275636574727574682f6d6c2d696465612e737667)](LICENSE)

`ml-idea` is a modern, production-oriented machine learning library for PHP focused on clean APIs, strict typing, and practical classification workflows.

Others have always looked down on PHP and have proclaimed its end since 2000. Well, the elephant keeps moving.

Features
--------


- PHP 8.2+ with strict types
- Consistent classifier contract (`train`, `predict`, `predictBatch`)
- Production-ready baseline classifiers:
    - `KNearestNeighbors`
    - `LogisticRegression` (binary classification)
    - `GaussianNaiveBayes`
- Model persistence (`ModelSerializer`)
- Data splitting utility (`TrainTestSplit`)
- Evaluation metrics (`accuracy`, `precision`, `recall`, `f1Score`)
- Advanced evaluation metrics: `rocAuc`, `prAuc`, `logLoss`, `brierScore`, `matthewsCorrcoef`, `meanAbsolutePercentageError`
- Preprocessing transformers (`StandardScaler`, `MinMaxScaler`)
- Workflow tools (`PipelineClassifier`, `KFold` cross-validation splits)
- Extra splitters: `StratifiedKFold`, `TimeSeriesSplit`
- Cross-validation helpers: `CrossValidation::crossValScore*`, `CrossValidation::crossValPredict*`
- Probability calibration + threshold optimization: `CalibratedClassifierCV` (CV + `cv='prefit'`), `ThresholdTuner`, isotonic regression
- Regression support (`LinearRegression`, `RegressionMetrics`)
- Advanced modules: `PCA`, `KMeans`, `MiniBatchKMeans`, `DBSCAN`, sparse `TfidfVectorizer`
- Tree-based and linear models: `DecisionTree`, `RandomForestClassifier/Regressor`, `GradientBoostingClassifier/Regressor`, `LinearSVC`
- Model selection: stratified `GridSearchClassifier`, `RandomizedSearchClassifier` (accuracy, F1, ROC-AUC, PR-AUC)
- Pipeline persistence: `PipelineSerializer`, `TabularPipelineClassifier` (OneHot + scalers + estimator)
- Vision module: DCT/noise/patch forensics, `ForensicsVisionEmbedder`, `OllamaVisionEmbedder`, `VisionIndexer`, `VisionEval` (ROC-AUC), trainable `AuthenticityClassifier`, neural hooks
- Vision heuristics: color palette analysis, skin-tone risk, AI-generation authenticity scoring
- NLP foundation (Phase 1): fluent Text API, unicode tokenization with offsets, PII redaction, rule-based POS tagging
- NLP Phase 2: language detection, keyword extraction (RAKE), BM25 retrieval, hashing vectorizer, similarity utilities, and NLP RAG helpers
- NLP advanced tagging: multilingual rule-based POS, extensible language profiles, rule-based NER, spaCy-style `Nlp::load()` API (104 languages)
- NLP neural backend hooks: `CallableNlpBackend`, `OllamaNlpBackend`, `HuggingFaceInferenceBackend`
- GEO service + ML-GEO helpers: country/state/city lookup, nearest-place search, and geo feature building
- Managed dataset assets: registry, integrity checks, licenses metadata, and compiled indexes (trie/automaton/kd-tree)
- RAG foundations: embedders (`EmbedderFactory::fromEnv`, `OpenAI`, `AzureOpenAI`, `Ollama`, `HuggingFaceEmbedder`, `TeiEmbedder`, `HashEmbedder`), `VisionPathEmbedder`, splitters, retriever, vector stores
- RAG LLM clients for QA generation: `Echo`, `OpenAI`, `Azure OpenAI`, and `Ollama` (direct or `LlmClientFactory::fromEnv()`)
- Advanced RAG workflow: document loaders, hybrid retrieval, rerankers, citations/diagnostics, vector-index persistence, tool-calling + streaming hooks
- AI agents + tool routing: `ToolCallingAgent`, `ToolRoutingAgent`, deterministic/local routing, and provider-backed routing (OpenAI/Azure/Anthropic/Ollama/custom)
- Unified core contracts (v1.4): `fit/predict`, probabilistic, online-learning, serializable model interfaces
- Hyperparameter lifecycle helpers: `getParams`, `setParams`, `cloneWithParams`, random-state aware models
- PHPUnit test suite + CI workflow
- Static analysis support with PHPStan
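
The basic evaluation metrics named in the list above (`accuracy`, `precision`, `recall`, `f1Score`) presumably follow the standard textbook definitions. As a rough sketch of what those definitions compute, the plain-PHP helpers below work directly on arrays of labels. The function names `sketchAccuracy` and `sketchBinaryReport`, their signatures, and the zero-denominator convention are assumptions made for illustration; they are not taken from ml-idea's API.

```php
<?php

declare(strict_types=1);

// Illustrative helpers only: hypothetical names, NOT part of ml-idea's API.

/** Fraction of predictions that exactly match the true labels. */
function sketchAccuracy(array $yTrue, array $yPred): float
{
    $n = count($yTrue);
    if ($n === 0 || $n !== count($yPred)) {
        throw new InvalidArgumentException('Label arrays must be non-empty and of equal length.');
    }

    $correct = 0;
    foreach ($yTrue as $i => $label) {
        if ($label === $yPred[$i]) {
            $correct++;
        }
    }

    return (float) ($correct / $n);
}

/**
 * Precision, recall and F1 for a single positive class.
 *
 * @return array{precision: float, recall: float, f1: float}
 */
function sketchBinaryReport(array $yTrue, array $yPred, int|string $positive = 1): array
{
    if (count($yTrue) !== count($yPred)) {
        throw new InvalidArgumentException('Label arrays must be of equal length.');
    }

    // Count true positives, false positives and false negatives.
    $tp = $fp = $fn = 0;
    foreach ($yTrue as $i => $label) {
        $predictedPositive = $yPred[$i] === $positive;
        $actuallyPositive = $label === $positive;

        if ($predictedPositive && $actuallyPositive) {
            $tp++;
        } elseif ($predictedPositive) {
            $fp++;
        } elseif ($actuallyPositive) {
            $fn++;
        }
    }

    // Assumed convention: a ratio with a zero denominator is reported as 0.0.
    $precision = ($tp + $fp) > 0 ? (float) ($tp / ($tp + $fp)) : 0.0;
    $recall = ($tp + $fn) > 0 ? (float) ($tp / ($tp + $fn)) : 0.0;
    $f1 = ($precision + $recall) > 0.0
        ? 2.0 * $precision * $recall / ($precision + $recall)
        : 0.0;

    return ['precision' => $precision, 'recall' => $recall, 'f1' => $f1];
}

// Toy labels: 1 is the positive class.
$yTrue = [1, 0, 1, 1, 0, 0];
$yPred = [1, 0, 0, 1, 1, 0];

$accuracy = sketchAccuracy($yTrue, $yPred);
$report = sketchBinaryReport($yTrue, $yPred);

printf(
    "accuracy=%.3f precision=%.3f recall=%.3f f1=%.3f\n",
    $accuracy,
    $report['precision'],
    $report['recall'],
    $report['f1']
);
```

In the toy data, four of six predictions are correct, with two true positives, one false positive and one false negative. For real workloads, prefer the library's own metric functions and check its documentation for how edge cases such as empty classes are handled.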

Installation
------------


```
composer require brucetruth/ml-idea
```

Quick Start
-----------


```
