PHPackages                             jared/php-tokenizer - PHPackages - PHPackages  [Skip to content](#main-content)[PHPackages](/)[Directory](/)[Categories](/categories)[Trending](/trending)[Leaderboard](/leaderboard)[Changelog](/changelog)[Analyze](/analyze)[Collections](/collections)[Log in](/login)[Sign up](/register)

1. [Directory](/)
2. /
3. jared/php-tokenizer

ActiveLibrary

jared/php-tokenizer
===================

This package provides a simple standalone regular expressions powered tokenizer.

015PHP

Since Sep 13Pushed 2y ago1 watchersCompare

[ Source](https://github.com/JaredSpb/php-tokenizer)[ Packagist](https://packagist.org/packages/jared/php-tokenizer)[ RSS](/packages/jared-php-tokenizer/feed)WikiDiscussions main Synced 1mo ago

READMEChangelogDependenciesVersions (1)Used By (0)

php-tokenizer
=============

[](#php-tokenizer)

A simple RE driven tokenizer

This package provides a simple standalone regular expressions powered tokenizer.

Installation
------------

[](#installation)

`composer require jared/php-tokenizer`

Usage
-----

[](#usage)

The example usage:

```
$rules = [
    'NON_SPACE_STRING' => '/\\G[^\\s]+/u',
    'ANY_CHARACTER' => '/\\G./u'
];

$string = 'abc 1qz';
$stream = ( new Falloff\Tokenizer\Factory( $rules ) )->getStream( $string );
while( $token = $stream->nextToken() ){
    print "Token has type `{$token->type}` and its value is `{$token->value}` at offset `{$token->offset}`\n";
}

// The output is:
# Token has type `NON_SPACE_STRING` and its value is `abc` at offset `0`
# Token has type `ANY_CHARACTER` and its value is ` ` at offset `3`
# Token has type `NON_SPACE_STRING` and its value is `1qz` at offset `4`
```

*Note:* the regexps used MUST start with the `\G` assertion.

*Note:* data is interpretted as UTF-8 so regexps are recommended to be provided with a `u` setting.

Rules might be added on the fly to the factory or either the stream itself. Adding rules to the factory will not affect the streams instantinated already.

```
$rules = [
    'NON_SPACE_STRING' => '/\\G[^\\s]+/u',
    'ANY_CHARACTER' => '/\\G./u'
];

$string = 'a b 1 qz';
$stream = ( new Falloff\Tokenizer\Factory( $rules ) )->getStream( $stream );

// This rule will never trigger, coz 'NON_SPACE_STRING' will be macthed earlier
$stream->appendRule('DIGIT', '/\\G\d/u');

// Prepending rules, so these will be matched before the 'NON_SPACE_STRING'
$stream->prependRules([
    'Q_CHAR' => '/\\Gq/u',
    'Z_CHAR' => '/\\Gz/u',
]);

// Stream might be invoked like it was a function
while( $token = $stream() ){
    print "Token has type `{$token->type}` and its value is `{$token->value}` at offset `{$token->offset}`\n";
}

// The output is:
# Token has type `NON_SPACE_STRING` and its value is `a` at offset `0`
# Token has type `ANY_CHARACTER` and its value is ` ` at offset `1`
# Token has type `NON_SPACE_STRING` and its value is `b` at offset `2`
# Token has type `ANY_CHARACTER` and its value is ` ` at offset `3`
# Token has type `NON_SPACE_STRING` and its value is `1` at offset `4`
# Token has type `ANY_CHARACTER` and its value is ` ` at offset `5`
# Token has type `Q_CHAR` and its value is `q` at offset `6`
# Token has type `Z_CHAR` and its value is `z` at offset `7`
```

When no rules matched for the next chunk of input stream, the `UnknownTokenException` is thrown. This exception is a token itself. It has its type set to `NULL` but yet allows accessing the `value` and `offset` properties.

When stream ends, the call for next token will return `false`. The `eof` property may be checked to retrieve the stream state without asking for next token:

```
if( $stream->eof ){
    print "Got all the tokens we had there";
} else{
    $token = $stream();
}
```

The remaining substring may be retrieved at any moment with the `tail` method:

```
print "The untokenized substring currently is: " . $stream->tail();
```

Stream can have an attached callback that is triggered every time token is requested from the tokenizer:

```
use \Falloff\Tokenizer\{UnknownTokenException,Token};

$stream->onTokenRequest(function( UnknownTokenException|Token $token ){
    print $token->type . ' token retrieved from the stream';
});
```

If this callback returns a `Token` instance, this instance will be returned to the initial caller:

###  Health Score

14

—

LowBetter than 2% of packages

Maintenance20

Infrequent updates — may be unmaintained

Popularity7

Limited adoption so far

Community7

Small or concentrated contributor base

Maturity21

Early-stage or recently created project

 Bus Factor1

Top contributor holds 100% of commits — single point of failure

How is this calculated?**Maintenance (25%)** — Last commit recency, latest release date, and issue-to-star ratio. Uses a 2-year decay window.

**Popularity (30%)** — Total and monthly downloads, GitHub stars, and forks. Logarithmic scaling prevents top-heavy scores.

**Community (15%)** — Contributors, dependents, forks, watchers, and maintainers. Measures real ecosystem engagement.

**Maturity (30%)** — Project age, version count, PHP version support, and release stability.

### Community

Maintainers

![](https://www.gravatar.com/avatar/b429117d3670af5a4be542223331c8239d40db8f5f5fd20edde449f15783f7ef?d=identicon)[JaredSpb](/maintainers/JaredSpb)

---

Top Contributors

[![JaredSpb](https://avatars.githubusercontent.com/u/26169319?v=4)](https://github.com/JaredSpb "JaredSpb (12 commits)")

### Embed Badge

![Health badge](/badges/jared-php-tokenizer/health.svg)

```
[![Health](https://phpackages.com/badges/jared-php-tokenizer/health.svg)](https://phpackages.com/packages/jared-php-tokenizer)
```

PHPackages © 2026

[Directory](/)[Categories](/categories)[Trending](/trending)[Changelog](/changelog)[Analyze](/analyze)
