nabu-3/lexer
============

nabu-3 Lexer library to generate and analyze lexical expressions

nabu-3 Lexer
============

[![GitHub](https://camo.githubusercontent.com/be1b79ec5a15a8e192724b1e1b47f7405f0ad6e901216fad10785d4de71807f4/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f6c6963656e73652f6e6162752d332f6c657865722e737667)](https://opensource.org/licenses/Apache-2.0)[![Build Status](https://camo.githubusercontent.com/3ecc6ea0be4d571069e5e434524a3a0ac5bf7b9fa017bd40d596df77b3d1c9e6/68747470733a2f2f7472617669732d63692e6f72672f6e6162752d332f6c657865722e7376673f6272616e63683d6d6173746572)](https://travis-ci.org/nabu-3/lexer)[![Quality Gate Status](https://camo.githubusercontent.com/fff80c84474d7cf3350c30128980a363e50ad9b90f82e8b60e48366f93c63a4c/68747470733a2f2f736f6e6172636c6f75642e696f2f6170692f70726f6a6563745f6261646765732f6d6561737572653f70726f6a6563743d6e6162752d335f6c65786572266d65747269633d616c6572745f737461747573)](https://sonarcloud.io/dashboard?id=nabu-3_lexer)[![Maintainability Rating](https://camo.githubusercontent.com/3d3f9dac8ff4ecf49b2143bc100a7292f1ac0b962ac4a93491799bbfcb86a269/68747470733a2f2f736f6e6172636c6f75642e696f2f6170692f70726f6a6563745f6261646765732f6d6561737572653f70726f6a6563743d6e6162752d335f6c65786572266d65747269633d7371616c655f726174696e67)](https://sonarcloud.io/dashboard?id=nabu-3_lexer&metric=Maintainability)[![Reliability Rating](https://camo.githubusercontent.com/94101228af3c913f3829de1dc0fab70d28ec1438e59ba39aa3726e6eaff9d106/68747470733a2f2f736f6e6172636c6f75642e696f2f6170692f70726f6a6563745f6261646765732f6d6561737572653f70726f6a6563743d6e6162752d335f6c65786572266d65747269633d72656c696162696c6974795f726174696e67)](https://sonarcloud.io/dashboard?id=nabu-3_lexer&metric=Reliability)[![Security 
Rating](https://camo.githubusercontent.com/58742975080a8781ef49476b3fed3487cdb462478ea558e4d5dcad62929395d8/68747470733a2f2f736f6e6172636c6f75642e696f2f6170692f70726f6a6563745f6261646765732f6d6561737572653f70726f6a6563743d6e6162752d335f6c65786572266d65747269633d73656375726974795f726174696e67)](https://sonarcloud.io/dashboard?id=nabu-3_lexer&metric=Security)

This is a Lexer library written in **PHP** to analyze lexical expressions and obtain a tokenized representation and a data structure as a descriptor of interpreted content.

The Lexer supports Unicode strings and Regular Expressions.

Installation
------------

The Lexer library requires **PHP 7.2** or higher and the native **mbstring** extension.

The library is distributed as a standard **PHP** package through [**composer**](https://getcomposer.org) and [**Packagist**](https://packagist.org/packages/nabu-3/lexer). To use it, you only need to require it via composer:

```
composer require nabu-3/lexer
```
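If you want to verify the requirements stated above before installing, a minimal sketch could look like the following. The helper name `lexerRequirementsMet` is hypothetical and is not part of the library; it only uses the standard PHP functions `version_compare` and `extension_loaded`.

```php
<?php
// Hypothetical helper (not part of nabu-3/lexer): tells whether a given PHP
// version and mbstring availability satisfy the requirements listed above.
function lexerRequirementsMet(string $phpVersion, bool $hasMbstring): bool
{
    return version_compare($phpVersion, '7.2.0', '>=') && $hasMbstring;
}

// Check the PHP runtime that is executing this script.
if (!lexerRequirementsMet(PHP_VERSION, extension_loaded('mbstring'))) {
    echo "nabu-3/lexer needs PHP 7.2 or higher and the mbstring extension\n";
}
```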

Basic usage
-----------

To start using this library you need to include the standard *autoload.php* file that is maintained by **composer**:

```
require_once 'vendor/autoload.php';
```

Then you can create a CNabuCustomLexer object and provide it with a Lexer Data storage as follows:

```
use nabu\lexer\CNabuCustomLexer;
use nabu\lexer\data\CNabuLexerData;

$lexer = CNabuCustomLexer::getLexer();
$lexer->setData(new CNabuLexerData());
```

This gives you a custom lexer to which you can add rules and with which you can analyze your sample strings.

### The Keyword Rule

The most basic rule is the Keyword Rule. With it, you can parse a keyword and obtain the tokenized result.

Below is a basic sample using the **Keyword Rule**:

```
require_once 'vendor/autoload.php';

use nabu\lexer\CNabuCustomLexer;
use nabu\lexer\data\CNabuLexerData;

use nabu\lexer\rules\CNabuLexerRuleKeyword;

$lexer = CNabuCustomLexer::getLexer();
$lexer->setData(new CNabuLexerData());

$keyword_rule = CNabuLexerRuleKeyword::createFromDescriptor(
    $lexer,
    array(
        'keyword' => 'RULE',
        'method' => 'ignore case'
    )
);
$lexer->registerRule('keyword_rule', $keyword_rule);
$keyword_rule->applyRuleToContent('Rule is the basics');

var_export($keyword_rule->getTokens());
echo "\n";
```

Allowed *methods* are 'ignore case' and 'literal':

- 'ignore case' matches the keyword regardless of letter case. Internally, both strings (sample and keyword) are converted to lowercase and compared. If they match, the rule is considered covered and returns true.
- 'literal' requires every character to match exactly as written in the keyword; the rule is covered only if all characters match *literally*.

You can run this sample from the terminal by typing:

```
php samples/basic_sample_01.php
```

After executing this sample, you can see the list of parsed tokens in your terminal:

```
array (
  0 => 'Rule',
)
```

Note that the list contains only one item because the *Keyword Rule* consumes a single occurrence of the keyword. As the rule method is 'ignore case', the token reproduces the text of the sample string as written, not the keyword attribute.
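The difference between the two methods can be sketched in plain PHP. This is an illustration only, not the library's code: `matchKeywordPrefix` is a hypothetical helper, and it assumes the keyword has to appear at the start of the content. It uses the byte-oriented `strtolower` and `substr` to stay dependency-free; a Unicode-aware version would use `mb_strtolower` and `mb_substr` instead.

```php
<?php
// Illustrative only: matchKeywordPrefix() is a hypothetical helper, not the
// library's code. Assumption: the keyword must appear at the start of $content.
// strtolower() folds ASCII letters only; use the mb_* functions for Unicode.
function matchKeywordPrefix(string $keyword, string $content, string $method): ?string
{
    // Take as many bytes from the content as the keyword has.
    $candidate = substr($content, 0, strlen($keyword));

    if ($method === 'ignore case') {
        // Lowercase both sides, then compare.
        $matched = strtolower($candidate) === strtolower($keyword);
    } elseif ($method === 'literal') {
        // Every character must be identical.
        $matched = $candidate === $keyword;
    } else {
        throw new InvalidArgumentException("Unknown method: $method");
    }

    // The token is the text as written in the content, not the keyword.
    return $matched ? $candidate : null;
}

var_export(matchKeywordPrefix('RULE', 'Rule is the basics', 'ignore case')); // 'Rule'
echo "\n";
var_export(matchKeywordPrefix('RULE', 'Rule is the basics', 'literal'));     // NULL
echo "\n";
```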

### The Regular Expression Rule

This rule has wide application for polymorphic strings or dynamic structures that require a regular expression to interpret their content. As with the Keyword Rule, you can apply the match as 'literal' or 'ignore case'; with 'ignore case', the '/i' modifier is applied when the regular expression is evaluated using preg\_match.

Below is a basic example using the **Regular Expression Rule**:

```
require_once 'vendor/autoload.php';

use nabu\lexer\CNabuCustomLexer;
use nabu\lexer\data\CNabuLexerData;

use nabu\lexer\rules\CNabuLexerRuleRegEx;

$lexer = CNabuCustomLexer::getLexer();
$lexer->setData(new CNabuLexerData());

$regex_rule = CNabuLexerRuleRegEx::createFromDescriptor(
    $lexer,
    array(
        'match' => '\\w+',
        'method' => 'ignore case'
    )
);
$lexer->registerRule('regex_rule', $regex_rule);
$regex_rule->applyRuleToContent('RUle is the basics');

var_export($regex_rule->getTokens());
echo "\n";
```

Allowed *methods* are the same as for the Keyword Rule.

You can run this sample from the terminal by typing:

```
php samples/basic_sample_02.php
```

After executing this sample, you can see the list of parsed tokens in your terminal:

```
array (
  0 => 'RUle',
)
```

Note that the list contains only one item because the *Regular Expression Rule* consumes a single occurrence of the expression. Because the token is copied from the sample string, it keeps the casing of the source ('RUle').
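How the 'ignore case' method maps onto the '/i' modifier can be sketched with `preg_match` directly. This is a hypothetical illustration, not the library's implementation: the helper `matchRegexPrefix`, the anchoring at the start of the content and the use of the 'u' (UTF-8) modifier are all assumptions.

```php
<?php
// Illustrative only: matchRegexPrefix() is a hypothetical helper, not the
// library's code. Assumptions: the expression is anchored at the start of
// $content and is compiled in UTF-8 mode.
function matchRegexPrefix(string $expression, string $content, string $method): ?string
{
    // 'u' enables UTF-8 mode; 'i' is added only for 'ignore case'.
    $modifiers = $method === 'ignore case' ? 'iu' : 'u';
    // '~' is the delimiter, so $expression must not contain it unescaped.
    $pattern = '~^(?:' . $expression . ')~' . $modifiers;

    return preg_match($pattern, $content, $matches) === 1 ? $matches[0] : null;
}

var_export(matchRegexPrefix('\\w+', 'RUle is the basics', 'ignore case'));   // 'RUle'
echo "\n";
var_export(matchRegexPrefix('[a-z]+', 'RUle is the basics', 'literal'));     // NULL
echo "\n";
var_export(matchRegexPrefix('[a-z]+', 'RUle is the basics', 'ignore case')); // 'RUle'
echo "\n";
```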

Block rules
-----------

Block rules can group rules of any kind to apply a *case*, a *sequence* or a *repetition* over a list of rules.

### The Case Rule

This rule treats a list of rules as a switch/case statement. You define the list and apply the rule. If the sample string matches at least one of the listed rules, the first rule that matches is applied and the evaluation stops there.

Below is a basic example using the **Case Rule**:

```
require_once 'vendor/autoload.php';

use nabu\lexer\CNabuCustomLexer;
use nabu\lexer\data\CNabuLexerData;

use nabu\lexer\rules\CNabuLexerRuleGroup;

$lexer = CNabuCustomLexer::getLexer();
$lexer->setData(new CNabuLexerData());

$case_rule = CNabuLexerRuleGroup::createFromDescriptor(
    $lexer,
    array(
        'method' => 'case',
        'group' => array(
            array(
                'keyword' => 'Rule',
                'method' => 'ignore case'
            ),
            array(
                'keyword' => 'are',
                'method' => 'ignore case'
            ),
            array(
                'keyword' => 'the',
                'method' => 'ignore case'
            ),
            array(
                'keyword' =>  'basics',
                'method' => 'literal'
            )
        )
    )
);
$lexer->registerRule('case_rule', $case_rule);
$case_rule->applyRuleToContent('The basics are Rules?');

var_export($case_rule->getTokens());
echo "\n";
```

You can run this sample from the terminal by typing:

```
php samples/block_sample_01.php
```

After executing this sample, you can see the list of parsed tokens in your terminal:

```
array (
  0 => 'The',
)
```

Note that the list contains only one item because the *Case Rule* applies only the first matching rule in the list of rules.
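The first-match behaviour can be sketched independently of the library. In this hypothetical illustration each rule is modelled as a callable that returns the matched token or null; `firstMatchingRule` and `keywordRule` are invented names, and matching at the start of the content is an assumption.

```php
<?php
// Illustrative only: both functions are hypothetical helpers, not the
// library's code. Each rule is a callable returning a token or null.
function firstMatchingRule(array $rules, string $content): ?string
{
    foreach ($rules as $rule) {
        $token = $rule($content);
        if ($token !== null) {
            return $token; // stop at the first rule that matches
        }
    }
    return null; // no rule matched
}

// Builds a keyword matcher anchored at the start of the content.
// $ignoreCase mirrors the 'ignore case' and 'literal' methods.
function keywordRule(string $keyword, bool $ignoreCase): callable
{
    return function (string $content) use ($keyword, $ignoreCase): ?string {
        $pattern = '~^' . preg_quote($keyword, '~') . '~' . ($ignoreCase ? 'i' : '');
        return preg_match($pattern, $content, $m) === 1 ? $m[0] : null;
    };
}

$rules = [
    keywordRule('Rule', true),
    keywordRule('are', true),
    keywordRule('the', true),
    keywordRule('basics', false),
];

var_export(firstMatchingRule($rules, 'The basics are Rules?')); // 'The'
echo "\n";
```

The third rule is the one that matches here, even though 'basics' and 'are' also occur later in the string, because the earlier rules are tried first and evaluation stops at the first success.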

### The Sequence Rule

Sequence Rules are similar to Case Rules, with two differences: the method is 'sequence', and you can define a *tokenizer* expression that acts as a *separator* between the rules involved in the sequence.

Below is a basic example using the **Sequence Rule**:

```
require_once 'vendor/autoload.php';

use nabu\lexer\CNabuCustomLexer;
use nabu\lexer\data\CNabuLexerData;

use nabu\lexer\rules\CNabuLexerRuleGroup;

$lexer = CNabuCustomLexer::getLexer();
$lexer->setData(new CNabuLexerData());

$sequence_rule = CNabuLexerRuleGroup::createFromDescriptor(
    $lexer,
    array(
        'method' => 'sequence',
        'tokenizer' => array(
            'method' => 'literal',
            'match' => '\s+',
        ),
        'group' => array(
            array(
                'keyword' => 'the',
                'method' => 'ignore case'
            ),
            array(
                'keyword' =>  'basics',
                'method' => 'literal'
            ),
            array(
                'keyword' => 'are',
                'method' => 'ignore case'
            ),
            array(
                'keyword' => 'Rules',
                'method' => 'ignore case'
            )
        )
    )
);
$lexer->registerRule('sequence_rule', $sequence_rule);
$sequence_rule->applyRuleToContent("The basics   are\tRules?");

var_export($sequence_rule->getTokens());
echo "\n";
```

Note that this sample differs from the **Case Rule** sample in two respects:

1. The **method** is 'sequence'.
2. We add a **tokenizer** attribute that contains an explicit rule declaration (in this case a Regular Expression Rule). This rule is applied before each iteration in the list of rules.

You can run this sample from the terminal by typing:

```
php samples/block_sample_02.php
```

After executing this sample, you can see the list of parsed tokens in your terminal:

```
array (
  0 => 'The',
  1 => ' ',
  2 => 'basics',
  3 => '   ',
  4 => 'are',
  5 => '	',
  6 => 'Rules',
)
```

Note that the list contains all the words in the sample string because the *Sequence Rule* tries to match the full list in the order in which it is declared. If one rule fails, the sequence stops and resets the token list to NULL, so that no tokens are returned.
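The all-or-nothing behaviour described above can be sketched with `preg_match` and the `\G` anchor. This is a hypothetical illustration, not the library's implementation: `matchSequence` is an invented helper, the rules are given as regular expression fragments, and the sketch assumes the separator is required between consecutive rules but not before the first one.

```php
<?php
// Illustrative only: matchSequence() is a hypothetical helper, not the
// library's code. $patterns and $separator are regular expression fragments.
function matchSequence(array $patterns, string $separator, string $content): ?array
{
    $tokens = [];
    $offset = 0;

    foreach (array_values($patterns) as $index => $pattern) {
        // Assumption: from the second rule on, a separator must come first.
        $steps = $index === 0 ? [$pattern] : [$separator, $pattern];
        foreach ($steps as $step) {
            // \G anchors the match exactly at $offset.
            if (preg_match('~\G(?:' . $step . ')~', $content, $m, 0, $offset) !== 1) {
                return null; // one failure discards every token collected so far
            }
            $tokens[] = $m[0];
            $offset += strlen($m[0]);
        }
    }
    return $tokens;
}

// (?i) makes a single fragment case-insensitive, like 'ignore case'.
$words = ['(?i)the', 'basics', '(?i)are', '(?i)rules'];

var_export(matchSequence($words, '\s+', "The basics   are\tRules?"));
echo "\n";
var_export(matchSequence($words, '\s+', 'The basics are not Rules')); // NULL
echo "\n";
```

The second call returns NULL even though the first three words match, because the fourth rule fails and the partial result is thrown away.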

### The Repeat Rule

A Repeat Rule defines a cardinality for a rule, either as a fixed value or as a minimum and a maximum. The allowed formats are:

- Fixed cardinality: any natural number starting at 0, applied as '*repeat exactly n times*', where **n** is the chosen number.
- Range: a pair of values in the form '*m..n*', where **m** is a natural number starting at 0 and **n** is a natural number greater than or equal to **m**. This means '*repeat between **m** and **n** times*'. If the rule matches fewer than **m** times, the evaluation fails. If the rule stops matching after at least **m** and fewer than **n** iterations, the evaluation succeeds. If the number of iterations reaches **n**, the evaluation stops and succeeds.
- Infinite: use 'n' as the value. Internally this is translated to **1..n** and the Range cardinality explained above applies, meaning '*at least once, and then until the rule fails*'.

Like **Sequence Rules**, Repeat Rules support a *tokenizer* that acts as a separator between iterations of the rule.
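The three cardinality formats can be normalized into a minimum and a maximum, as in the following sketch. It is not the library's parser: `parseCardinality` is a hypothetical helper, and representing "no upper limit" as null is an assumption made for this illustration.

```php
<?php
// Illustrative only: parseCardinality() is a hypothetical helper, not the
// library's parser. Returns [min, max]; max = null means no upper limit.
function parseCardinality(string $repeat): array
{
    if ($repeat === 'n') {
        return [1, null]; // 'n' stands for "at least once, no upper limit"
    }
    if (preg_match('~^\d+\z~', $repeat) === 1) {
        return [(int) $repeat, (int) $repeat]; // fixed: exactly that many times
    }
    if (preg_match('~^(\d+)\.\.(\d+)\z~', $repeat, $m) === 1) {
        $min = (int) $m[1];
        $max = (int) $m[2];
        if ($max < $min) {
            throw new InvalidArgumentException("Upper bound below lower bound: $repeat");
        }
        return [$min, $max]; // range: between min and max times
    }
    throw new InvalidArgumentException("Unsupported cardinality: $repeat");
}

var_export(parseCardinality('1..4'));
echo "\n";
```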

Below is a basic example using the **Repeat Rule**:

```
require_once 'vendor/autoload.php';

use nabu\lexer\CNabuCustomLexer;
use nabu\lexer\data\CNabuLexerData;

use nabu\lexer\rules\CNabuLexerRuleRepeat;

$lexer = CNabuCustomLexer::getLexer();
$lexer->setData(new CNabuLexerData());

$repeat_rule = CNabuLexerRuleRepeat::createFromDescriptor(
    $lexer,
    array(
        'repeat' => '1..4',
        'tokenizer' => array(
            'method' => 'literal',
            'match' => '\s+'
        ),
        'rule' => array(
            'method' => 'ignore case',
            'match' => '[a-zA-Z]+'
        )
    )
);
$lexer->registerRule('repeat_rule', $repeat_rule);
$repeat_rule->applyRuleToContent("The basics   are\tRules?");

var_export($repeat_rule->getTokens());
echo "\n";
```

You can run this sample from the terminal by typing:

```
php samples/block_sample_03.php
```

This sample produces a result similar to the **Sequence Rule** sample above, but the rule involved is less restrictive: it matches between 1 and 4 repetitions of a sequence of uppercase or lowercase letters. Any other phrase containing at least one word also matches this rule, up to a limit of four words.

```
array (
  0 => 'The',
  1 => ' ',
  2 => 'basics',
  3 => '   ',
  4 => 'are',
  5 => '	',
  6 => 'Rules',
)
```
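The repetition loop with a minimum, a maximum and a separator can be sketched as follows. This is a hypothetical illustration rather than the library's implementation: `matchRepeated` is an invented helper, a null maximum stands for "no upper limit", and the sketch assumes that a separator with no rule after it is not consumed.

```php
<?php
// Illustrative only: matchRepeated() is a hypothetical helper, not the
// library's code. $max = null means no upper limit.
function matchRepeated(string $pattern, string $separator, int $min, ?int $max, string $content): ?array
{
    $tokens = [];
    $offset = 0;
    $count = 0;

    while ($max === null || $count < $max) {
        $pending = [];
        $cursor = $offset;
        // A separator is expected before every repetition except the first.
        $steps = $count === 0 ? [$pattern] : [$separator, $pattern];
        foreach ($steps as $step) {
            if (preg_match('~\G(?:' . $step . ')~', $content, $m, 0, $cursor) !== 1) {
                break 2; // keep earlier repetitions, drop this partial one
            }
            $pending[] = $m[0];
            $cursor += strlen($m[0]);
        }
        if ($cursor === $offset) {
            break; // zero-length repetition: stop to avoid looping forever
        }
        $tokens = array_merge($tokens, $pending);
        $offset = $cursor;
        $count++;
    }

    // Fewer than $min repetitions means the rule fails.
    return $count >= $min ? $tokens : null;
}

var_export(matchRepeated('[a-zA-Z]+', '\s+', 1, 4, "The basics   are\tRules?"));
echo "\n";
```

With a maximum of 2 the same call would stop after 'basics', and a string that does not start with a letter would return NULL because the minimum of one repetition is not reached.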

