
crazzy501/gpt-3-encoder-php
===========================

PHP BPE Text Encoder/Decoder for GPT-2 / GPT-3. Based on coderevolutionplugins/gpt-3-encoder-php

Latest version: 1.0.3 · License: MIT · Requires PHP >= 8.2

[Source](https://github.com/AntonyBenjiro/GPT-3-Encoder-PHP) · [Packagist](https://packagist.org/packages/crazzy501/gpt-3-encoder-php)

GPT-3-Encoder-Decoder-PHP
=========================


PHP BPE Text Encoder/Decoder for GPT-2 / GPT-3

About
-----


GPT-2 and GPT-3 use byte pair encoding to turn text into a series of integers to feed into the model.
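As a rough illustration of the idea (not this package's actual code), the sketch below merges adjacent symbols according to a ranked merge table and then maps the resulting pieces to integer IDs. The function `bpeMerge`, the merge table `$ranks`, and the vocabulary `$vocab` are made up for this example; the real GPT-2 tables hold tens of thousands of entries and operate on bytes rather than plain characters. For simplicity, it merges one occurrence of the best pair per pass.

```php
<?php
/**
 * Toy byte pair encoding: repeatedly merge the adjacent pair with the
 * lowest rank until no ranked pair remains.
 *
 * @param string[]          $symbols Initial symbols (here: single characters).
 * @param array<string,int> $ranks   Hypothetical merge table: "left right" => priority.
 * @return string[] The merged pieces.
 */
function bpeMerge(array $symbols, array $ranks): array
{
    while (count($symbols) > 1) {
        $bestRank = PHP_INT_MAX;
        $bestIndex = -1;

        // Find the leftmost adjacent pair with the lowest rank.
        for ($i = 0; $i < count($symbols) - 1; $i++) {
            $pair = $symbols[$i] . ' ' . $symbols[$i + 1];
            if (isset($ranks[$pair]) && $ranks[$pair] < $bestRank) {
                $bestRank = $ranks[$pair];
                $bestIndex = $i;
            }
        }

        if ($bestIndex === -1) {
            break; // no known pair is left to merge
        }

        // Replace the two symbols with their concatenation.
        $merged = $symbols[$bestIndex] . $symbols[$bestIndex + 1];
        array_splice($symbols, $bestIndex, 2, [$merged]);
    }

    return $symbols;
}

// Made-up merge table and vocabulary, for illustration only.
$ranks = ['l o' => 0, 'lo w' => 1, 'e r' => 2];
$vocab = ['low' => 0, 'er' => 1, 'l' => 2, 'o' => 3, 'w' => 4, 'e' => 5, 'r' => 6];

$pieces = bpeMerge(str_split('lower'), $ranks);                    // ['low', 'er']
$ids = array_map(fn (string $piece): int => $vocab[$piece], $pieces); // [0, 1]
```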

This is an object-oriented PHP implementation of OpenAI's original Python encoder and decoder, which can be found [here](https://github.com/openai/gpt-2). The main inspiration for this encoder was the Node.js version, found [here](https://github.com/latitudegames/GPT-3-Encoder).

You can test the results by comparing the output of this library with the [official tokenizer page from OpenAI](https://beta.openai.com/tokenizer).

The encoder caches the character arrays it uses for encoding and decoding. Caching can be configured through the `Gpt3EncoderConfiguration` class.

By default, the encoder caches in PHP arrays, which may be memory intensive, but you can switch to a Redis or Memcached driver.

Usage
-----


The mbstring PHP extension is required for this tool to work correctly when the text being tokenized contains non-ASCII characters. See the [mbstring installation instructions](https://www.php.net/manual/en/mbstring.installation.php).
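A small sketch of how one might check for the extension before using the encoder, and of why it matters: byte-oriented functions such as `strlen` count UTF-8 bytes, while `mb_strlen` counts code points. The variable names are illustrative only; the emoji is the hand emoji from the usage example, written with escape sequences.

```php
<?php
// Check whether mbstring is available before tokenizing non-ASCII text.
$hasMbstring = extension_loaded('mbstring');
if (!$hasMbstring) {
    fwrite(STDERR, "Warning: mbstring is not loaded; non-ASCII text may be handled incorrectly.\n");
}

// Raised back of hand (U+1F91A) followed by a skin tone modifier (U+1F3FE).
$emoji = "\u{1F91A}\u{1F3FE}";

// strlen() counts raw bytes: each of these code points takes 4 bytes in UTF-8.
$byteCount = strlen($emoji); // 8

// mb_strlen() counts code points; null here means mbstring is unavailable.
$charCount = $hasMbstring ? mb_strlen($emoji, 'UTF-8') : null; // 2 when mbstring is loaded
```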

```
require './vendor/autoload.php';

$encoder = new \GPT3Encoder\Gpt3Encoder();

$prompt = "Many words map to one token, but some don't: indivisible. Unicode characters like emojis may be split into many tokens containing the underlying bytes: 🤚🏾 Sequences of characters commonly found next to each other may be grouped together: 1234567890";

// Encode the prompt into an array of integer token IDs.
$token_array = $encoder->encode($prompt);

// Decode the token IDs back into text.
$original_text = $encoder->decode($token_array);
```
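One common use of the token array is to count tokens and to confirm that decoding returns the original text. The sketch below shows one way to do that. `roundTripReport` is a hypothetical helper, not part of the package, and the byte-level stand-in codec exists only so the example runs without the library installed.

```php
<?php
/**
 * Hypothetical helper (not part of the package): encodes, decodes, and
 * reports the token count and whether the text survived the round trip.
 */
function roundTripReport(callable $encode, callable $decode, string $text): array
{
    $tokens = $encode($text);

    return [
        'token_count' => count($tokens),
        'lossless'    => $decode($tokens) === $text,
    ];
}

// Stand-in codec for demonstration only: one token per byte.
$stubEncode = fn (string $text): array => array_values(unpack('C*', $text));
$stubDecode = fn (array $tokens): string => pack('C*', ...$tokens);

$report = roundTripReport($stubEncode, $stubDecode, 'indivisible');
// ['token_count' => 11, 'lossless' => true] for the byte-level stand-in
```

With the real encoder, the same helper could be called as `roundTripReport([$encoder, 'encode'], [$encoder, 'decode'], $prompt)`; the token count will differ from the stand-in, since BPE groups bytes into larger tokens.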

### Or


Pass `true` as the second argument to make the decoder throw an exception when it cannot decode a token. Otherwise, it silently skips whatever it cannot decode.

```
$original_text = $encoder->decode($token_array, true);
```
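When strict decoding is enabled, the call should be wrapped in a `try`/`catch`. The sketch below is one way to do that. The concrete exception class thrown by the decoder is not stated here, so the wrapper catches `\Throwable` as an assumption. `decodeStrictOrNull` and the stand-in decoder are hypothetical and only mimic the strict and lenient behaviour described above.

```php
<?php
/**
 * Hypothetical wrapper: runs a strict decode and returns null on failure
 * instead of letting the exception propagate.
 */
function decodeStrictOrNull(callable $decode, array $tokens): ?string
{
    try {
        return $decode($tokens, true);
    } catch (\Throwable $e) {
        // Assumption: the exception class is unknown, so catch broadly.
        return null;
    }
}

// Stand-in decoder for demonstration only: it knows two tokens.
$vocab = [0 => 'Hello', 1 => ' world'];
$stubDecode = function (array $tokens, bool $strict = false) use ($vocab): string {
    $text = '';
    foreach ($tokens as $token) {
        if (!isset($vocab[$token])) {
            if ($strict) {
                throw new \UnexpectedValueException("Unknown token: $token");
            }
            continue; // lenient mode: skip what cannot be decoded
        }
        $text .= $vocab[$token];
    }
    return $text;
};

$ok = decodeStrictOrNull($stubDecode, [0, 1]);      // 'Hello world'
$failed = decodeStrictOrNull($stubDecode, [0, 99]); // null: strict mode threw
$lenient = $stubDecode([0, 99]);                    // 'Hello': unknown token skipped
```

With the real encoder, the first argument could be `[$encoder, 'decode']`.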

Configuration
-------------


```
require './vendor/autoload.php';

$configuration = (new \GPT3Encoder\Gpt3EncoderConfiguration())
    ->setCacheClass(\GPT3Encoder\Gpt3CacheMemcached::class)
    ->setEncoder(__DIR__ . '/encoder.json')
    ->setVocabulary(__DIR__ . '/vocabulary.bpe')
    ->setCharacters(__DIR__ . '/characters.json')
    ->setMemoryLimitThreshold(10000); // for memory limit checks

$encoder = new \GPT3Encoder\Gpt3Encoder($configuration);
```


### Community

Maintainer: [AntonyBenjiro](https://github.com/AntonyBenjiro)

Top contributors: [CodeRevolutionPlugins](https://github.com/CodeRevolutionPlugins) (17 commits), [AntonyBenjiro](https://github.com/AntonyBenjiro) (1 commit)

Tags: php, GPT-3, encoder, decoder, gpt-2, GPT-3-Tokenizer, GPT-2-Tokenizer, gpt-2-simple, gpt-3-prompts, gpt-2-text-generation, gpt-3-text-generation, gpt-3-prompt, gpt3-encoder, gpt3-decoder

