Source repository: <https://github.com/everstu/GPT3Tokenizer> · Packagist: <https://packagist.org/packages/everstu/gpt3-tokenizer>

GPT3Tokenizer for PHP
=====================


### This package is forked from [gioni06/gpt3-tokenizer](https://packagist.org/packages/gioni06/gpt3-tokenizer) and changed to support PHP 7. Thanks to the original author!

This is a PHP port of the GPT-3 tokenizer. It is based on the [original Python implementation](https://huggingface.co/docs/transformers/model_doc/gpt2#transformers.GPT2Tokenizer) and the [Nodejs implementation](https://github.com/latitudegames/GPT-3-Encoder).

GPT-2 and GPT-3 use a technique called byte pair encoding (BPE) to convert text into a sequence of integers, which are then used as input for the model. When you interact with the OpenAI API, it is useful to know the number of tokens in a given text before sending it to the API.

If you want to learn more, read the [Summary of the tokenizers](https://huggingface.co/docs/transformers/tokenizer_summary) from Hugging Face.
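The merge loop at the heart of byte pair encoding can be sketched in a few lines of plain PHP. This is an illustrative toy, not this package's actual implementation; the function names are invented for the example:

```php
// Illustrative only: one merge step of byte pair encoding. BPE repeatedly
// finds the most frequent adjacent pair of symbols and merges it into a
// new, longer symbol until the merge rules are exhausted.

function mostFrequentPair(array $symbols)
{
    $counts = [];
    for ($i = 0; $i < count($symbols) - 1; $i++) {
        $key = $symbols[$i] . "\x00" . $symbols[$i + 1];
        $counts[$key] = ($counts[$key] ?? 0) + 1;
    }
    if ($counts === []) {
        return null;
    }
    arsort($counts);                      // highest count first
    return explode("\x00", key($counts)); // first key = most frequent pair
}

function mergePair(array $symbols, array $pair)
{
    $out = [];
    for ($i = 0; $i < count($symbols); ) {
        if ($i + 1 < count($symbols)
            && $symbols[$i] === $pair[0]
            && $symbols[$i + 1] === $pair[1]
        ) {
            $out[] = $pair[0] . $pair[1]; // merge the two symbols into one
            $i += 2;
        } else {
            $out[] = $symbols[$i];
            $i += 1;
        }
    }
    return $out;
}

$symbols = ['a', 'a', 'a', 'b'];
$pair    = mostFrequentPair($symbols); // ['a', 'a'] occurs most often
$merged  = mergePair($symbols, $pair); // ['aa', 'a', 'b']
```

A real tokenizer applies such merges in the fixed order given by its merges file, then maps the resulting symbols to integer ids via the vocabulary.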

Support ⭐️
----------


If you find my work useful, I would be thrilled if you could show your support by giving this project a star ⭐️. It only takes a second and it means a lot to me. Your star will not only make me feel warm and fuzzy inside, it will also help the project reach more people who can benefit from it.

Installation
------------


Install the package from [Packagist](https://packagist.org/packages/everstu/gpt3-tokenizer) using Composer:

```
composer require everstu/gpt3-tokenizer
```

Testing
-------


Loading the vocabulary files consumes a lot of memory, so you may need to raise the PHPUnit memory limit:

```
./vendor/bin/phpunit -d memory_limit=-1 tests/
```

Using the configuration class
-----------------------------

```
use everstu\Gpt3Tokenizer\Gpt3TokenizerConfig;

// default vocab path
// default merges path
// caching enabled
$defaultConfig = new Gpt3TokenizerConfig();

$customConfig = new Gpt3TokenizerConfig();
$customConfig
    ->vocabPath('custom_vocab.json') // path to a custom vocabulary file
    ->mergesPath('custom_merges.txt') // path to a custom merges file
    ->useCache(false); // disable the cache
```

### A note on caching


The tokenizer will try to use `apcu` for caching; if that is not available, it falls back to a plain PHP `array`. The cache is enabled by default, and you will see slightly better performance on long texts when it is active.
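If you want to know which cache path your environment will take, you can probe for APCu yourself. This is a sketch of such a check, not code from the package (it performs its own detection internally):

```php
// Assumption: probing for the APCu extension the same way the runtime would.
// function_exists() guards against the extension being absent entirely;
// apcu_enabled() reports whether APCu is actually active for this SAPI.
$apcuAvailable = function_exists('apcu_enabled') && apcu_enabled();

echo $apcuAvailable
    ? "APCu cache will be used\n"
    : "Falling back to a plain PHP array cache\n";
```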

Encode a text
-------------


```
use everstu\Gpt3Tokenizer\Gpt3TokenizerConfig;
use everstu\Gpt3Tokenizer\Gpt3Tokenizer;

$config = new Gpt3TokenizerConfig();
$tokenizer = new Gpt3Tokenizer($config);
$text = "This is some text";
$tokens = $tokenizer->encode($text);
// [1212,318,617,2420]
```

Decode a text
-------------


```
use everstu\Gpt3Tokenizer\Gpt3TokenizerConfig;
use everstu\Gpt3Tokenizer\Gpt3Tokenizer;

$config = new Gpt3TokenizerConfig();
$tokenizer = new Gpt3Tokenizer($config);
$tokens = [1212,318,617,2420];
$text = $tokenizer->decode($tokens);
// "This is some text"
```

Count the number of tokens in a text
------------------------------------


```
use everstu\Gpt3Tokenizer\Gpt3TokenizerConfig;
use everstu\Gpt3Tokenizer\Gpt3Tokenizer;

$config = new Gpt3TokenizerConfig();
$tokenizer = new Gpt3Tokenizer($config);
$text = "This is some text";
$numberOfTokens = $tokenizer->count($text);
// 4
```
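A common use of the token count is checking that a prompt leaves room for the completion inside a model's context window before calling the API. The helper below is hypothetical (not part of this package); you would pass it the result of `$tokenizer->count($text)` plus limits of your choosing:

```php
// Hypothetical helper, not part of everstu/gpt3-tokenizer: combines a
// prompt's token count with a model's context size and the number of
// tokens you want to reserve for the completion.
function fitsContextWindow($promptTokens, $contextWindow, $maxCompletionTokens)
{
    return $promptTokens + $maxCompletionTokens <= $contextWindow;
}

$fits     = fitsContextWindow(4, 2048, 256);    // true:  4 + 256 <= 2048
$tooLong  = fitsContextWindow(2000, 2048, 256); // false: 2000 + 256 > 2048
```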

License
-------


This project is licensed under the Apache License 2.0. See the [LICENSE](LICENSE) file for more information.
