ze/tokenizer-gpt3
=================

PHP package for Byte Pair Encoding (BPE) used by GPT-3.

v1.1 · Apache-2.0 · PHP ^7.4.0

[Source](https://github.com/zyz954489346/Tokenizer-GPT3) · [Packagist](https://packagist.org/packages/ze/tokenizer-gpt3)

Tokenizer-GPT3 based on PHP
===========================

> This project is a fork of [Gioni06/GPT3Tokenizer](https://github.com/Gioni06/GPT3Tokenizer),
> with a few changes for PHP 7.4 compatibility.

This is a PHP port of the GPT-3 tokenizer. It is based on the [original Python implementation](https://huggingface.co/docs/transformers/model_doc/gpt2#transformers.GPT2Tokenizer) and the [Node.js implementation](https://github.com/latitudegames/GPT-3-Encoder).

GPT-2 and GPT-3 use a technique called byte pair encoding to convert text into a sequence of integers, which are then used as input for the model. When you interact with the OpenAI API, it can be useful to calculate the number of tokens in a given text before sending it to the API.
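
To give a feel for how byte pair encoding works, here is a minimal toy sketch. It is not this package's implementation: the function name `toyBpe` and the merge table are made up for illustration, and it merges one pair at a time rather than working on bytes with the real GPT-3 merge list.

```php
<?php
declare(strict_types=1);

/**
 * Toy BPE: repeatedly merge the adjacent pair with the lowest rank.
 * Hypothetical helper for illustration only.
 *
 * @param string[]          $symbols initial symbols (here: single characters)
 * @param array<string,int> $ranks   "left right" => rank (lower merges first)
 * @return string[]
 */
function toyBpe(array $symbols, array $ranks): array
{
    while (count($symbols) > 1) {
        $bestIndex = null;
        $bestRank = PHP_INT_MAX;

        // Find the adjacent pair with the lowest rank in the merge table.
        for ($i = 0; $i < count($symbols) - 1; $i++) {
            $pair = $symbols[$i] . ' ' . $symbols[$i + 1];
            if (isset($ranks[$pair]) && $ranks[$pair] < $bestRank) {
                $bestRank = $ranks[$pair];
                $bestIndex = $i;
            }
        }

        if ($bestIndex === null) {
            break; // no known pair left to merge
        }

        // Replace the two symbols with their concatenation.
        $merged = $symbols[$bestIndex] . $symbols[$bestIndex + 1];
        array_splice($symbols, $bestIndex, 2, [$merged]);
    }

    return $symbols;
}

// Made-up merge table, not GPT-3's.
$ranks = ['l o' => 0, 'lo w' => 1, 'e r' => 2];
$pieces = toyBpe(str_split('lower'), $ranks);
// $pieces is ['low', 'er']
```

In a real tokenizer each resulting piece is then looked up in the vocabulary to get its integer ID.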

If you want to learn more, read the [Summary of the tokenizers](https://huggingface.co/docs/transformers/tokenizer_summary) from Hugging Face.

Installation
------------

```
composer require ze/tokenizer-gpt3
```

Use the configuration class
---------------------------

```
$defaultConfig = new Gpt3TokenizerConfig();

$customConfig = new Gpt3TokenizerConfig();
$customConfig
    ->vocabPath('custom_vocab.json')
    ->mergesPath('custom_merges.txt')
    ->useCache(false);
```

### A note on caching

The tokenizer tries to use `apcu` for caching. If that is not available, it falls back to a plain PHP `array`. With the cache enabled you will see slightly better performance for long texts. The cache is enabled by default.
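
The fallback described above could look roughly like the sketch below. `SketchCache` is a hypothetical class written for this note, not the package's actual cache; it only assumes the standard APCu functions `apcu_enabled`, `apcu_fetch` and `apcu_store`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical cache wrapper (not the package's real class): uses APCu
 * when the extension is loaded and enabled, otherwise a plain PHP array.
 */
final class SketchCache
{
    /** @var array<string,mixed> */
    private array $local = [];
    private bool $useApcu;

    public function __construct()
    {
        // On the CLI, APCu is usually disabled unless apc.enable_cli=1.
        $this->useApcu = function_exists('apcu_enabled') && apcu_enabled();
    }

    /** @return mixed|null the cached value, or null on a miss */
    public function get(string $key)
    {
        if ($this->useApcu) {
            $value = apcu_fetch($key, $found);
            return $found ? $value : null;
        }
        return $this->local[$key] ?? null;
    }

    /** @param mixed $value */
    public function set(string $key, $value): void
    {
        if ($this->useApcu) {
            apcu_store($key, $value);
            return;
        }
        $this->local[$key] = $value;
    }
}

$cache = new SketchCache();
$cache->set('bpe:hello', ['hello']);
$hit = $cache->get('bpe:hello');          // the stored array
$miss = $cache->get('bpe:never-stored');  // null
```

The array fallback lives only for the current request, while APCu entries are shared across requests handled by the same PHP process pool.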

Encode a text
-------------

```
$config = new Gpt3TokenizerConfig();
$tokenizer = new Gpt3Tokenizer($config);
$text = "This is some text";
$tokens = $tokenizer->encode($text);
// [1212,318,617,2420]
```

Decode a text
-------------

```
$config = new Gpt3TokenizerConfig();
$tokenizer = new Gpt3Tokenizer($config);
$tokens = [1212,318,617,2420];
$text = $tokenizer->decode($tokens);
// "This is some text"
```

Count the number of tokens in a text
------------------------------------

```
$config = new Gpt3TokenizerConfig();
$tokenizer = new Gpt3Tokenizer($config);
$text = "This is some text";
$numberOfTokens = $tokenizer->count($text);
// 4
```
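
A typical use of the count is to check a prompt against a token budget before calling the API. The helper below is a hypothetical sketch, not part of this package: `fitsTokenBudget` accepts any callable that returns a token count, so with this package you would presumably pass `[$tokenizer, 'count']`. The limits are example values, and the stand-in word counter exists only so the sketch runs without the package installed.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: true when $text plus the tokens reserved for the
 * reply fit within $limit. $countTokens is any callable string => int.
 */
function fitsTokenBudget(callable $countTokens, string $text, int $limit, int $reservedForReply = 0): bool
{
    return $countTokens($text) + $reservedForReply <= $limit;
}

// Stand-in counter: counts whitespace-separated words.
// This is NOT how BPE counts tokens; swap in the real tokenizer's count.
$wordCounter = static function (string $text): int {
    return count(preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY));
};

$fitsLarge = fitsTokenBudget($wordCounter, 'This is some text', 10, 4); // 4 + 4 <= 10
$fitsSmall = fitsTokenBudget($wordCounter, 'This is some text', 6, 4);  // 4 + 4 > 6
```

Passing the counter in as a callable keeps the budget check independent of any particular tokenizer.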

License
-------

This project is licensed under the Apache License 2.0. See the [LICENSE](LICENSE) file for more information.

