
dbeurive/lexer
==============

This project implements a simple lexer.

Version 1.0.4 · MIT license · PHP · [Source](https://github.com/dbeurive/Lexer) · [Packagist](https://packagist.org/packages/dbeurive/lexer)

Introduction
============

This repository contains the implementation of a basic lexer.

A lexer explodes a given string into a list of tokens.

Installation
============

From the command line:

```
composer require dbeurive/lexer
```

Alternatively, to add this package to your project by hand, edit your `composer.json` file and add the following entry:

```
"require": {
    "dbeurive/lexer": "*"
}

```

Synopsis
========

```
    use dbeurive\Lexer\Lexer;
    use dbeurive\Lexer\Token;

    // Callback: turns "$gName" / "$lName" into "GLOBAL_name" / "LOCAL_name".
    $varProcessor = function(array $inMatches) {
        $name = strtolower($inMatches[2]);
        switch (strtoupper($inMatches[1])) {
            case 'L': return 'LOCAL_'  . $name;
            case 'G': return 'GLOBAL_' . $name;
        }
        throw new \Exception("Impossible error!");
    };

    // Token specifications (the declaration order matters).
    $specifications = array(
        array('/[0-9]+/',                'numeric'),
        array('/\\$([lg])([a-z0-9]+)/i', 'variable', $varProcessor),
        array('/[a-z]{2,}/i',            'function'),
        array('/(\\+|\\-|\\*|\\/)/',     'operator'),
        array('/\\(/',                   'open_bracket'),
        array('/\\)/',                   'close_bracket'),
        // Returning null from the callback drops the token (blanks are ignored).
        array('/(\\s+|\\r?\\n)/',        'blank', function(array $m) { return null; })
    );

    try {
        $lexer = new Lexer($specifications);
        $text = '$gConstant1 + sin($lCoef1) / cos($lcoef2) * $gTemp - tan(21)';
        $tokens = $lexer->lex($text);
    } catch (\Exception $e) {
        print "ERROR: " . $e->getMessage() . "\n";
        exit(1);
    }

    // Print the type and the value of each detected token.
    /** @var Token $_token */
    foreach ($tokens as $_token) {
        printf("%s %s\n", $_token->type, $_token->value);
    }
```

Specifications
==============

Description
-----------

The lexer is configured by a list of token specifications:

```
array(
    <token specification>,
    <token specification>,
    ...
)
```

Each token specification is an array that contains 2 or 3 elements.

```
<token specification> = array(<regular expression>, <type name>, [<function>])
```

- The first element is a regular expression that describes the token.
- The second element is a name that identifies the type of the token.
- The optional third element is a function that is applied to the token's value before it is returned.

> **WARNING**
>
> Make sure to double every backslash (`\`) within the regular expressions that define the tokens. That is: `'/\s/'` becomes `'/\\s/'`.

The signature of the optional third element (the function) must be:

```
mixed|null function(array $inMatches)

```

The array (`$inMatches`) passed to the function comes from the processing of the regular expression that describes the token.

- The first element of the array (`$inMatches[0]`) contains the text that matches the full pattern.
- The second element of the array (`$inMatches[1]`) contains the text that matched the first captured parenthesized subpattern.
- The third element of the array (`$inMatches[2]`) contains the text that matched the second captured parenthesized subpattern.
- ... and so on.

> See the description for the PHP function `preg_match()`.

- If the function returns the value `null`, then the detected token is "ignored". That is: it will not be inserted into the list of extracted tokens.
- If the function returns a non-null value, then the token is inserted into the list of detected tokens. The value of the inserted token will be the value returned by the function.
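
The following sketch shows what such a function receives and returns. It calls `preg_match()` directly, so it runs without the package; the way the lexer itself invokes the function is not shown here. The variable names `$skipBlank`, `$matches` and `$blankMatches` are illustrative.

```php
<?php

// Same callback as in the synopsis: builds the token value from the captures.
$varProcessor = function (array $inMatches) {
    $name = strtolower($inMatches[2]);
    switch (strtoupper($inMatches[1])) {
        case 'L': return 'LOCAL_'  . $name;
        case 'G': return 'GLOBAL_' . $name;
    }
    throw new \Exception("Impossible error!");
};

// A callback that always returns null: the token would be ignored.
$skipBlank = function (array $inMatches) {
    return null;
};

// preg_match() fills $matches the way the bullets above describe.
preg_match('/\\$([lg])([a-z0-9]+)/i', '$gConstant1', $matches);

print $matches[0] . "\n";              // $gConstant1 (full match)
print $matches[1] . "\n";              // g           (first subpattern)
print $matches[2] . "\n";              // Constant1   (second subpattern)
print $varProcessor($matches) . "\n";  // GLOBAL_constant1

preg_match('/(\\s+|\\r?\\n)/', "  \n", $blankMatches);
var_dump($skipBlank($blankMatches));   // NULL
```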

Very important note
-------------------

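The order in which the tokens are declared matters: as the result at the end of this section shows, when two specifications can match the same text, the one declared first is used.

[Example 2](examples/example2.php) illustrates this point.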

```
    use dbeurive\Lexer\Lexer;
    use dbeurive\Lexer\Token;

    $text = 'AAAA AA';

    // ---------------------------------------------------------
    // TEST 1
    // ---------------------------------------------------------

    $specifications = array(
        array('/AA/',                    'type A2'),
        array('/A/',                     'type A1'),
        array('/(\\s+|\\r?\\n)/',        'blank', function(array $m) { return null; })
    );

    try {
        $lexer = new Lexer($specifications);
        $tokens = $lexer->lex($text);
    } catch (\Exception $e) {
        print "ERROR: " . $e->getMessage() . "\n";
        exit(1);
    }

    print "Test1: $text\n\n";
    dumpToken($tokens);
    print "\n";

    // ---------------------------------------------------------
    // TEST 2
    // ---------------------------------------------------------

    $specifications = array(
        array('/A/',                     'type A1'),
        array('/AA/',                    'type A2'),
        array('/(\\s+|\\r?\\n)/',        'blank', function(array $m) { return null; })
    );

    try {
        $lexer = new Lexer($specifications);
        $tokens = $lexer->lex($text);
    } catch (\Exception $e) {
        print "ERROR: " . $e->getMessage() . "\n";
        exit(1);
    }

    print "Test2: $text\n\n";
    dumpToken($tokens);

    exit(0);

    function dumpToken(array $inTokens) {
        $max = 0;

        /** @var Token $_token */
        foreach ($inTokens as $_token) {
            $max = strlen($_token->type) > $max ? strlen($_token->type) : $max;
        }

        /** @var Token $_token */
        foreach ($inTokens as $_token) {
            printf("%{$max}s %s\n", $_token->type, $_token->value);
        }
    }
```

The result is:

```
Test1: AAAA AA

type A2 AA
type A2 AA
type A2 AA

Test2: AAAA AA

type A1 A
type A1 A
type A1 A
type A1 A
type A1 A
type A1 A

```

API
===

Constructor
-----------

```
    /**
     * Lexer constructor.
     * @param array $inSpecifications This array represents the tokens specifications.
     *        Each element of this array is an array that specifies a token.
     *        It contains 2 or 3 elements.
     *        - First element: a regular expression that describes the token.
     *        - Second element: the name of the token.
     *        - Third element: an optional callback function.
     *          The signature of this function must be:
     *          null|string function(array $inMatches)
     * @throws \Exception
     */
    public function __construct(array $inSpecifications)
```

See the "Specifications" section for a detailed description of this parameter.

lex()
-----

```
    /**
     * Explode a given string into a list of tokens.
     * @param string $inString The string to explode into tokens.
     * @return array The method returns a list of tokens.
     *         Each element of the returned list is an instance of the class Token.
     * @throws \Exception
     * @see Token
     */
    public function lex($inString)
```

This method scans the given text and returns an array of the detected tokens.

Each element of the returned array is an instance of the class `\dbeurive\Lexer\Token`.

```
    /**
     * Class Token
     *
     * This class implements a token.
     *
     * @package dbeurive\Lexer
    */
    class Token
    {
        /** @var null|mixed Token's value. */
        public $value = null;
        /** @var null|string Token's type. */
        public $type = null;

        /**
         * Token constructor.
         * @param string $inOptValue The token's value.
         * @param string $inOptType The token's type.
         */
        public function __construct($inOptValue=null, $inOptType=null)
        {
            $this->value = $inOptValue;
            $this->type  = $inOptType;
        }
    }
```
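
Because `type` and `value` are public properties, a token list can be processed with plain array functions. The sketch below filters a list by type. To stay runnable without the package, it uses `StandInToken`, a hypothetical stand-in with the same two properties as the class above, and a hand-built list whose values are assumed for illustration rather than taken from a real `lex()` call.

```php
<?php

// Stand-in with the same public properties as the documented Token class,
// defined here only so that the snippet runs without the package installed.
class StandInToken
{
    public $value = null;
    public $type = null;

    public function __construct($inOptValue = null, $inOptType = null)
    {
        $this->value = $inOptValue;
        $this->type  = $inOptType;
    }
}

// Hand-built list (assumed values), shaped like the documented return of lex().
$tokens = [
    new StandInToken('GLOBAL_constant1', 'variable'),
    new StandInToken('+', 'operator'),
    new StandInToken('sin', 'function'),
    new StandInToken('(', 'open_bracket'),
    new StandInToken('LOCAL_coef1', 'variable'),
    new StandInToken(')', 'close_bracket'),
];

// Keep only the tokens whose type is "variable".
$variables = array_filter($tokens, function ($token) {
    return $token->type === 'variable';
});

// Extract their values and reindex the resulting array from 0.
$names = array_values(array_map(function ($token) {
    return $token->value;
}, $variables));

print implode(', ', $names) . "\n"; // GLOBAL_constant1, LOCAL_coef1
```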

Maintainer: Denis BEURIVE ([@dbeurive](https://github.com/dbeurive))
