token27/nexus-ai-tokenizer
==========================

Provider-agnostic LLM token counter for PHP 8.3+. Supports Tiktoken BPE (OpenAI, Claude approx.), HuggingFace JSON vocabularies (DeepSeek, LLaMA, Mistral), SentencePiece (Gemini), and extensible custom strategies.

v1.0.0 · MIT · PHP ^8.3

[Source](https://github.com/token27/nexus-ai-tokenizer) · [Packagist](https://packagist.org/packages/token27/nexus-ai-tokenizer)


nexus-ai-tokenizer
==================

A **universal, multi-provider** token counting and context estimation library for PHP 8.3+. It counts tokens and checks context-window budgets across providers whose tokenizers differ, through a single extensible API.

Why nexus-ai-tokenizer?
-----------------------

Different providers use completely different tokenization algorithms. OpenAI relies on Tiktoken (`cl100k_base`, `o200k_base`), DeepSeek uses HuggingFace BPE, Gemini uses SentencePiece, and Claude has a proprietary BPE.

**nexus-ai-tokenizer** addresses this fragmentation with:

- **Zero-config OpenAI counting**: Built-in, fast, exact token counting for all modern OpenAI models (`gpt-4o`, `o1`, `gpt-3.5-turbo`, etc.).
- **Consistent Interface**: Handle any provider with one method (`TokenizerEngine::for('model')`).
- **Graceful Approximations**: Approximates token counts for closed tokenizers (such as Claude's) using `cl100k_base`.
- **Exact Native Integration**: Loads HuggingFace `.json` and SentencePiece `.model` vocabularies directly when you need exact counts.
- **Multimodal Counting**: Translates image resolutions and detail settings into estimated token costs across providers.
- **Batching & Context Windows**: Percentage checks, `isWithinContextWindow()`, and batch token processing.

Features
--------

- **Built-in Catalog**: Maps 50+ mainstream models to the correct strategy out of the box.
- **Tiktoken Strategy**: Fast, exact counts for `o200k_base`, `cl100k_base`, and more.
- **ChatML & Conversation Overhead**: `countChat()` accounts for provider-specific message framing.
- **Extensible Architecture**: Define your own custom tokenizer strategies and dynamic providers.
- **Type Safety**: PHPStan Level 8, immutable Value Objects.

Installation
------------

```
composer require token27/nexus-ai-tokenizer
```

**Requires:** PHP 8.3+

*Note: Exact SentencePiece tokenization (e.g. Gemini/Gemma) needs the optional `textualization/sentencepiece` extension. HuggingFace vocabularies need no extensions.*

Quick Start
-----------

### 1. Zero-config Token Counting

Count exact tokens for any OpenAI model right out of the box:

```
use Token27\Tokenizer\Engine\TokenizerEngine;

$text = 'Hello world! This is a test of the token counting library.';

// Automatically resolves 'gpt-4o' to Tiktoken (o200k_base)
$count = TokenizerEngine::for('gpt-4o')->count($text);

echo $count->count(); // 14
echo $count->strategy(); // tiktoken:o200k_base
echo $count->isApproximate() ? 'Yes' : 'No'; // No
```

### 2. Multi-turn Chat Counting

Count the tokens in a conversation, including provider-specific overhead (role markers, ChatML syntax):

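```
use Token27\Tokenizer\Engine\TokenizerEngine;

// Each message is a role/content pair, listed in conversation order.
$messages = [
    ['role' => 'system', 'content' => 'You are a helpful assistant.'],
    ['role' => 'user', 'content' => 'What is tokenization?'],
];

$chat = TokenizerEngine::for('gpt-4-turbo')->countChat($messages);

echo $chat->contentTokens();  // 12 (message text only)
echo $chat->overheadTokens(); // 11 (role markers and framing)
echo $chat->count();          // 23 (content + overhead)
```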
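To make the overhead figure less opaque, here is a minimal sketch of how framing overhead is commonly estimated for OpenAI chat models. It does not use or reproduce the library's internals. The function name `estimateChatOverhead` is hypothetical, and the constants (3 framing tokens per message, 1 token per role name, 3 tokens priming the reply) are assumptions taken from OpenAI's published counting guidance, not from this package.

```php
<?php

/**
 * Estimate chat framing overhead, excluding the message text itself.
 *
 * Assumptions (not taken from nexus-ai-tokenizer): every message costs a
 * fixed number of framing tokens plus the tokens of its role name, and the
 * assistant reply is primed with a fixed number of extra tokens.
 */
function estimateChatOverhead(
    array $messages,
    int $perMessage = 3,   // assumed framing tokens per message
    int $roleTokens = 1,   // assumed tokens per role name ("system", "user")
    int $replyPriming = 3  // assumed tokens that prime the assistant reply
): int {
    return $replyPriming + count($messages) * ($perMessage + $roleTokens);
}

$messages = [
    ['role' => 'system', 'content' => 'You are a helpful assistant.'],
    ['role' => 'user', 'content' => 'What is tokenization?'],
];

echo estimateChatOverhead($messages), PHP_EOL; // 11
```

Under these assumptions two messages cost 2 × (3 + 1) + 3 = 11 overhead tokens. Other providers frame messages differently, so the constants should be treated as per-model parameters rather than fixed values.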

### 3. Context Window Management

Check whether a prompt fits within a model's context window, so you can truncate it or switch providers when it does not:

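```
use Token27\Tokenizer\Engine\TokenizerEngine;

// $hugeDocument is assumed to already hold the prompt text as a string.
$count = TokenizerEngine::for('claude-3-opus-20240229')->count($hugeDocument);
$limit = 200_000; // context window size, in tokens

if ($count->isWithinContextWindow($limit)) {
    echo "Fits! " . $count->remainingTokens($limit) . " tokens left.";
} else {
    echo "Too large by " . ($count->count() - $limit) . " tokens.";
}
```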
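The arithmetic behind such a check is simple enough to sketch in plain PHP. The helper below is hypothetical and independent of the library's classes. It takes a token count you already have and reports fit, remaining tokens, overflow, and percentage used. The `$reservedForReply` parameter is an assumption added for illustration, covering the common case where part of the window is kept free for the model's response.

```php
<?php

/**
 * Summarise how a token count sits against a context window.
 * Hypothetical helper; it does not call nexus-ai-tokenizer.
 *
 * @return array{fits: bool, remaining: int, overflow: int, percentUsed: float}
 */
function contextBudget(int $tokens, int $limit, int $reservedForReply = 0): array
{
    if ($limit <= 0 || $tokens < 0 || $reservedForReply < 0) {
        throw new InvalidArgumentException('Limit must be positive; counts must not be negative.');
    }

    // Tokens actually available to the prompt once the reply reserve is set aside.
    $available = $limit - $reservedForReply;

    return [
        'fits'        => $tokens <= $available,
        'remaining'   => max(0, $available - $tokens),
        'overflow'    => max(0, $tokens - $available),
        'percentUsed' => round($tokens / $limit * 100, 1),
    ];
}

$budget = contextBudget(150_000, 200_000, 4_096);

echo $budget['fits'] ? 'fits' : 'too large', PHP_EOL; // fits
echo $budget['remaining'], PHP_EOL;                   // 45904
```

When `fits` is false, `overflow` gives the number of tokens to trim before retrying, or a reason to route the request to a model with a larger window.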

### 4. Multimodal Image Tokens

Estimate the token cost of an image from its dimensions:

```
use Token27\Tokenizer\Engine\TokenizerEngine;

// OpenAI high-detail image token math (width, height, detail level)
$imageCost = TokenizerEngine::for('gpt-4o')->estimateImage(1920, 1080, 'high');
echo $imageCost->count(); // e.g. 1105 tokens

// Anthropic logic (no detail level argument)
$claudeCost = TokenizerEngine::for('claude-sonnet-4-20250514')->estimateImage(1920, 1080);
echo $claudeCost->count(); // e.g. 2765 tokens
```
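For readers wondering where figures like these come from, the sketch below works through the publicly documented provider formulas in plain PHP. It is not the library's implementation. The function names are hypothetical, rounding scaled dimensions to whole pixels is an assumption, and the Anthropic formula deliberately ignores any provider-side resizing of large images.

```php
<?php

/**
 * OpenAI-style high-detail estimate: fit the image within 2048 x 2048,
 * shrink the shortest side to 768 px, then charge 170 tokens per
 * 512 px tile plus a base of 85 tokens.
 */
function openAiHighDetailTokens(int $width, int $height): int
{
    if ($width <= 0 || $height <= 0) {
        throw new InvalidArgumentException('Image dimensions must be positive.');
    }

    // Step 1: shrink (never enlarge) so the longest side is at most 2048 px.
    $longest = max($width, $height);
    if ($longest > 2048) {
        $width  = (int) round($width * 2048 / $longest);
        $height = (int) round($height * 2048 / $longest);
    }

    // Step 2: shrink so the shortest side is at most 768 px.
    $shortest = min($width, $height);
    if ($shortest > 768) {
        $width  = (int) round($width * 768 / $shortest);
        $height = (int) round($height * 768 / $shortest);
    }

    // Step 3: count 512 px tiles using integer ceiling division.
    $tiles = intdiv($width + 511, 512) * intdiv($height + 511, 512);

    return 85 + 170 * $tiles;
}

/**
 * Anthropic-style estimate: roughly (width * height) / 750, rounded up here.
 */
function anthropicImageTokens(int $width, int $height): int
{
    return intdiv($width * $height + 749, 750);
}

echo openAiHighDetailTokens(1920, 1080), PHP_EOL; // 1105
echo anthropicImageTokens(1920, 1080), PHP_EOL;   // 2765
```

A 1920 × 1080 image scales to 1365 × 768, which covers 3 × 2 tiles, giving 85 + 6 × 170 = 1105 tokens. Both results agree with the example values shown above.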

Documentation
-------------

- [Installation & Setup](docs/installation.md): Dependencies and optional extensions
- [Text & Chat Token Counting](docs/counting-text-and-chat.md): Calculating content and formatting overhead
- [Context Windows](docs/context-window.md): Handling prompt limits and budget calculations
- [Exact vs Approximate Counting](docs/exact-vs-approximate.md): How closed models are approximated
- [HuggingFace BPE](docs/huggingface-bpe.md): Using `.json` vocabularies for exact DeepSeek/Llama counts
- [SentencePiece / Gemini](docs/sentencepiece-gemini.md): Using `.model` files with FFI
- [Vision & Images](docs/vision-images.md): Estimating tokens for multimodal images
- [Custom Strategies](docs/custom-strategies.md): Developing dynamic catalogs and implementations
- [Testing](docs/testing.md): Running the test suite
- [Contributing](docs/contributing.md): Development guidelines

License
-------

MIT. See [LICENSE](LICENSE).
