PHPackages                             vojtech-dobes/php-grammar-processing - PHPackages - PHPackages  [Skip to content](#main-content)[PHPackages](/)[Directory](/)[Categories](/categories)[Trending](/trending)[Leaderboard](/leaderboard)[Changelog](/changelog)[Analyze](/analyze)[Collections](/collections)[Log in](/login)[Sign up](/register)

1. [Directory](/)
2. /
3. [Parsing &amp; Serialization](/categories/parsing)
4. /
5. vojtech-dobes/php-grammar-processing

ActiveLibrary[Parsing &amp; Serialization](/categories/parsing)

vojtech-dobes/php-grammar-processing
====================================

Library for tokenization, abstract syntax tree parsing &amp; interpretation

11.1kPHPCI passing

Since Jun 3Pushed 11mo ago2 watchersCompare

[ Source](https://github.com/vojtech-dobes/php-grammar-processing)[ Packagist](https://packagist.org/packages/vojtech-dobes/php-grammar-processing)[ RSS](/packages/vojtech-dobes-php-grammar-processing/feed)WikiDiscussions master Synced 1mo ago

READMEChangelogDependenciesVersions (2)Used By (0)

📓 Grammar Processing for PHP
============================

[](#-grammar-processing-for-php)

[![Checks](https://github.com/vojtech-dobes/php-grammar-processing/actions/workflows/checks.yml/badge.svg?branch=master&event=push)](https://github.com/vojtech-dobes/php-grammar-processing/actions/workflows/checks.yml/badge.svg?branch=master&event=push)

This library provides convenient APIs for describing a context-free grammar, and subsequently using it to build an abstract syntax tree and analyze it for an arbitrary purpose.

Installation
------------

[](#installation)

To install the latest version, run the following command:

```
composer require vojtech-dobes/php-grammar-processing

```

Minimum supported PHP version is 8.4.

Documentation
-------------

[](#documentation)

Note

All code examples assume classes are imported from `Vojtechdobes\GrammarProcessing` namespace.

Let's say in variable `$source` we have source text of a language that we want to process. For example, we want to evaluate a mathematical expression:

```
$source = '2 + 5';
```

Processing of the source text happens in few phases:

$$Tokenization \\to Parsing \\to Interpretation$$

1. In *Tokenization* phase, source text is split into basic small units called lexical tokens.
2. In *Parsing* phase, abstract syntax tree is constructed from these lexical tokens.
3. In *Interpretation* phase, abstract syntax tree is evaluated to a final arbitrary result that can be used further.

To carry out *Tokenization* and *Parsing* phases, we first have to define grammar of the language.

### Grammar definition

[](#grammar-definition)

Grammar is defined by creating [`Grammar`](../src/GrammarProcessing/Grammar.php) object.

```
$grammar = new Grammar(
    // Tokenization
    lexicalSymbols: $lexicalSymbols,
    syntaxTokenSymbols: $syntaxTokenSymbols,
    ignoredTokenSymbols: $ignoredTokenSymbols,
    // Parsing
    syntacticSymbols: $syntacticSymbols,
);
```

- **`$lexicalSymbols`**

    ```
    array

    ```

    Lexical symbols represent [production rules](https://en.wikipedia.org/wiki/Production_(computer_science)) for the source text. Production rule is provided as implementation of `Vocabulary\Symbol` interface. Example:

    ```
    $lexicalSymbols = [
        'Digit' => new Vocabulary\Regexp('[0-9]'),
        'Integer' => new Vocabulary\Sequence([
            new Vocabulary\Repeat(new Vocabulary\Nonterminal('Digit'), min: 1, max: null),
            new Vocabulary\NegativeLookahead(new Vocabulary\Nonterminal('Digit')),
        ]),
        'Operator' => new Vocabulary\OneOf([
            new Vocabulary\Literal('+'),
            new Vocabulary\Literal('-'),
        ]),
        'Whitespace' => new Vocabulary\Regexp('\s'),
    ];
    ```

    All available implementations of `Vocabulary\Symbol` are [listed below](#available-symbols). Through `Vocabulary\Nonterminal` instance lexical symbol can be based on other lexical symbols.
- **`$syntaxTokenSymbols`**

    ```
    list

    ```

    From lexical symbols we select those that will become constituent items of the resulting token stream. Every part of the source text (except for [ignored parts](#ignored-token-symbols)) must fit into one of syntax token symbols, otherwise `UnexpectedTokenException` will be thrown.

    ```
    $syntaxTokenSymbols = ['Integer', 'Operator'];
    ```
- **`$ignoredTokenSymbols`**

    ```
    list

    ```

    Not all parts of the source code are relevant in later stages, e.g. arbitrary whitespace or comments can be often discarded. We can list lexical symbols that cover such parts of the source code:

    ```
    $ignoredTokenSymbols = ['Whitespace'];
    ```
- **`$syntacticSymbols`**

    ```
    array

    ```

    Every syntactic symbol represents a syntactically meaningful sequence of lexical tokens. Like `$lexicalSymbols`, this parameter is a map where key is name of a symbol and value is production rule in form of `Vocabulary\Symbol` instance. Through `Vocabulary\Nonterminal` instance syntactic symbol can contain both lexical symbols and other syntactic symbols, thus giving rise to a tree structure. Example:

    ```
    $syntacticSymbols = [
        'BinaryOperation' => new Vocabulary\OneOf([
            new Vocabulary\Nonterminal('Addition'),
            new Vocabulary\Nonterminal('Subtraction'),
        ]),
        'Addition' => new Vocabulary\Sequence([
            new Vocabulary\Nonterminal('Integer'),
            new Vocabulary\Literal('+'),
            new Vocabulary\Nonterminal('Integer'),
        ]),
        'Subtraction' => new Vocabulary\Sequence([
            new Vocabulary\Nonterminal('Integer'),
            new Vocabulary\Literal('-'),
            new Vocabulary\Nonterminal('Integer'),
        ]),
    ];
    ```

### Grammar usage

[](#grammar-usage)

With `Grammar` instance, we can produce abstract syntax tree. We have to choose root syntactical symbol, which is expected to encompass the whole source text (the whole token stream).

```
$abstractSyntaxTree = $grammar->parseSource($source, rootSymbol: 'BinaryOperation');
```

In case of an error, `UnexpectedTokenException` will be thrown.

The `parseSource` call carries out both *Tokenization* and *Parsing* phase. Alternatively we can execute them manually:

```
$tokenStream = $grammar->tokenizeSource($source);
$abstractSyntaxTree = $grammar->parseTokenStream($tokenStream, rootSymbol: 'BinaryOperation');
```

### Interpretation

[](#interpretation)

With our example mathematical expression, result of *Interpretation* phase should be a number `7`. To do so, we will make following call:

```
$result = $abstractSyntaxTree->interpret([
    'Addition' => new AdditionNodeInterpretation(),
    'Integer' => new IntegerNodeInterpretation(),
    'Subtraction' => new SubtractionNodeInterpretation(),
]);
```

Values in the provided map are custom implementations of `NodeInterpretation` interface. Each is responsible to evaluate node in the abstract syntax tree it's registered for. Our `AdditionNodeInterpretation` implementation could look like this:

```
class AdditionNodeInterpretation implements NodeInterpretation
{
    public function evaluate(Node $node): Generator
    {
        return yield $node->value[0] + yield $node->value[2];
    }
}
```

What's happening here? In `$syntacticSymbols` we have defined `Addition` to be a `Sequence` with 3 elements. First and last elements are the operands of our mathematical addition operation. The middle element in the sequence (literal `+`) doesn't play role in our interpretation anymore - we already know we are in `Addition` node which is responsible for... adding things up!

To make sense of the operands, we can yield subnodes of the abstract syntax tree in order to interpret them first. In this case `$node->value[0]` &amp; `$node->value[2]` represent `Integer` nodes, and so `IntegerNodeInterpretation` will be used to interpret them.

```
class IntegerNodeInterpretation implements NodeInterpretation
{
    public function evaluate(Node $node): Generator
    {
        return (int) yield $node->value;
    }
}
```

Because `Integer` is lexical token, value of the node is `Token` from the token stream. Yielding instance of `Token` simply returns it's string value, which we can safely cast to PHP `int` type.

### Available symbols

[](#available-symbols)

- `Vocabulary\Literal`
- `Vocabulary\NegativeLookahead`
- `Vocabulary\Nonterminal`
- `Vocabulary\OneOf`
- `Vocabulary\Regexp`
- `Vocabulary\Sequence`
- `Vocabulary\Subtract`

###  Health Score

20

—

LowBetter than 14% of packages

Maintenance38

Infrequent updates — may be unmaintained

Popularity16

Limited adoption so far

Community8

Small or concentrated contributor base

Maturity16

Early-stage or recently created project

 Bus Factor1

Top contributor holds 100% of commits — single point of failure

How is this calculated?**Maintenance (25%)** — Last commit recency, latest release date, and issue-to-star ratio. Uses a 2-year decay window.

**Popularity (30%)** — Total and monthly downloads, GitHub stars, and forks. Logarithmic scaling prevents top-heavy scores.

**Community (15%)** — Contributors, dependents, forks, watchers, and maintainers. Measures real ecosystem engagement.

**Maturity (30%)** — Project age, version count, PHP version support, and release stability.

### Community

Maintainers

![](https://www.gravatar.com/avatar/920689f25a0fc31b2807556eddd47203f33c4318893bce98818cf2f3e627ee8e?d=identicon)[vojtechdobes](/maintainers/vojtechdobes)

---

Top Contributors

[![vojtech-dobes](https://avatars.githubusercontent.com/u/415925?v=4)](https://github.com/vojtech-dobes "vojtech-dobes (35 commits)")

---

Tags

astcontext-freegrammarlanguagelexerparserphpsyntaxtoken

### Embed Badge

![Health badge](/badges/vojtech-dobes-php-grammar-processing/health.svg)

```
[![Health](https://phpackages.com/badges/vojtech-dobes-php-grammar-processing/health.svg)](https://phpackages.com/packages/vojtech-dobes-php-grammar-processing)
```

###  Alternatives

[masterminds/html5

An HTML5 parser and serializer.

1.8k242.8M229](/packages/masterminds-html5)[sabberworm/php-css-parser

Parser for CSS Files written in PHP

1.8k191.2M65](/packages/sabberworm-php-css-parser)[jms/metadata

Class/method/property metadata management in PHP

1.8k152.8M88](/packages/jms-metadata)[jms/serializer-bundle

Allows you to easily serialize, and deserialize data of any complexity

1.8k89.3M627](/packages/jms-serializer-bundle)[hassankhan/config

Lightweight configuration file loader that supports PHP, INI, XML, JSON, and YAML files

97513.5M170](/packages/hassankhan-config)[meyfa/php-svg

Read, edit, write, and render SVG files with PHP

54613.9M42](/packages/meyfa-php-svg)

PHPackages © 2026

[Directory](/)[Categories](/categories)[Trending](/trending)[Changelog](/changelog)[Analyze](/analyze)
