
dekor/php-syntax-tree-builder
=============================

This library allows you to build your custom language by describing lexing/parsing rules.

[Source](https://github.com/deniskoronets/PHP-Syntax-Tree-Builder) | [Packagist](https://packagist.org/packages/dekor/php-syntax-tree-builder)

PHP-Syntax-Tree-Builder
=======================

This library lets you build a left-to-right, recursive-descent (RD) parser for your own grammar by describing lexing and parsing rules. It is built on a [finite-state machine](https://en.wikipedia.org/wiki/Finite-state_machine) and produces an [abstract syntax tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree).

Inspired by lex/yacc for C/C++ *(but not the same!)*.

Installation
------------

```
composer require dekor/php-syntax-tree-builder
```

How it works
------------

Building a grammar takes two main steps:

1. Lexing: splitting the source file into language lexemes.
2. Parsing: using a finite-state machine to parse the lexeme sequence according to the grammar rules.

### Lexing

This part is very simple. All you need to do is describe pairs of regex => lexeme name. Here is a simple lexer definition:

```
$lexer = new \Dekor\PhpSyntaxTreeBuilder\Lexer([
    '[\s\t\n]+' => '',
    '\;' => ';',
    '=' => '=',
    '\$([a-z]+)' => 'VAR',
    'if' => 'IF',
    '([0-9]+)' => 'NUMBER',
    '\"(.*)\"' => 'STRING',
    'print' => 'PRINT',
    '(\+|\-|\*|\/)' => 'MATH_OPERATOR',
]);

$lexems = $lexer->parse($pathToFile);
```

The `parse` method returns an array of lexemes. If the lexer cannot match the current character sequence against any of the described lexemes, it throws a `LexerAnalyseException`, which you can catch and handle.
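
To make the regex => lexeme name idea concrete, here is a minimal, self-contained sketch of regex-driven lexing. It is not the library's implementation: the function `sketchTokenize`, the shape of the returned array, and the treatment of an empty name as "skip this match" are all assumptions made for illustration.

```php
<?php
// Hypothetical sketch of regex-driven lexing; NOT the library's actual code.

/**
 * @param array<string,string> $rules regex => lexeme name ('' is assumed to mean "skip")
 * @return array<int,array{name:string,value:string}>
 */
function sketchTokenize(string $source, array $rules): array
{
    $lexemes = [];
    $offset = 0;
    $length = strlen($source);

    while ($offset < $length) {
        $matched = false;
        foreach ($rules as $regex => $name) {
            // \G anchors the match at the current offset.
            if (preg_match('~\G(?:' . $regex . ')~', $source, $m, 0, $offset) && $m[0] !== '') {
                if ($name !== '') {
                    // Use the first capture group when present, else the whole match.
                    $lexemes[] = ['name' => $name, 'value' => $m[1] ?? $m[0]];
                }
                $offset += strlen($m[0]);
                $matched = true;
                break; // First matching rule wins.
            }
        }
        if (!$matched) {
            // The library throws LexerAnalyseException here; this sketch uses a generic exception.
            throw new RuntimeException("Unexpected input at offset $offset");
        }
    }

    return $lexemes;
}

$lexemes = sketchTokenize('$a = 5;', [
    '\s+' => '',
    ';' => ';',
    '=' => '=',
    '\$([a-z]+)' => 'VAR',
    '[0-9]+' => 'NUMBER',
]);
```

Because the first matching rule wins in this sketch, rule order matters: a keyword rule placed after a more general rule that also matches it would never fire.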

### Parsing

This part is more complex. When you describe the grammar, you need a root (boot) group, which is the point where the finite-state machine starts:

```
$parser = new \Dekor\PhpSyntaxTreeBuilder\Parser([
    'startFrom' => 'g:php',
], [
    'g:php' => [
        'sequence' => [
            'OPENING_PHP_TAG', 'g:statements',
        ],
    ],
    'g:statements' => [
        'sequence' => [
            'g:statement',
            '?g:statements',
        ],
    ],
    'g:statement' => [
        'or' => [
            'g:var_assign',
            'g:if',
        ],
    ],
    ...
]);
```

As you can see in this sample, the config tells the parser to start from `g:php`. Why the `g:` prefix? It is purely visual: it makes it easy to tell when you are referring to a group and when to a token.

Let's move on. Here we can see `sequence` and `or`. Each group can have only one of them.

With `sequence`, the parser iterates over the elements and tries to match the lexemes in that order. A group mentioned within a sequence is parsed recursively. In the example, the parser first checks for the opening PHP tag lexeme and then parses `g:statements`. `g:statements` is itself a sequence: `g:statement` followed by `?g:statements`. `g:statement` is a single statement (defined just below it), and `?g:statements` means that more statements may follow. The `?` prefix marks an element as optional: the parser tries to parse the group and, if that fails, continues with the next element of the sequence.

With `or`, the parser tries each group in the list from left to right. Once one of them matches, it continues with that group for the current statement. This lets you split your grammar into multiple branches.
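
The behaviour of `sequence`, `or`, and `?` described above can be sketched as a small recursive-descent matcher. This is an illustration under assumptions, not the library's code: the function `sketchParse`, the node shape, and the groups `g:assign` and `g:print` are hypothetical, and tokens are reduced to plain lexeme names.

```php
<?php
// Hypothetical sketch of how `sequence`, `or` and `?` could be interpreted;
// NOT the library's actual implementation.

/**
 * Tries to match $symbol at position $pos of $tokens.
 * Returns [node, nextPos] on success, or null on failure.
 */
function sketchParse(string $symbol, array $rules, array $tokens, int $pos): ?array
{
    if (!isset($rules[$symbol])) {
        // Not a group: treat the symbol as a lexeme name.
        return ($tokens[$pos] ?? null) === $symbol ? [$symbol, $pos + 1] : null;
    }

    $rule = $rules[$symbol];

    if (isset($rule['or'])) {
        // Try alternatives left to right; the first match wins.
        foreach ($rule['or'] as $alternative) {
            $result = sketchParse($alternative, $rules, $tokens, $pos);
            if ($result !== null) {
                return [[$symbol => [$result[0]]], $result[1]];
            }
        }
        return null;
    }

    $children = [];
    foreach ($rule['sequence'] as $element) {
        $optional = $element[0] === '?';
        $result = sketchParse(ltrim($element, '?'), $rules, $tokens, $pos);
        if ($result === null) {
            if ($optional) {
                continue; // Skip the optional element and keep going.
            }
            return null; // A required element failed, so the whole group fails.
        }
        $children[] = $result[0];
        $pos = $result[1];
    }

    return [[$symbol => $children], $pos];
}

$rules = [
    'g:statements' => ['sequence' => ['g:statement', '?g:statements']],
    'g:statement'  => ['or' => ['g:assign', 'g:print']],
    'g:assign'     => ['sequence' => ['VAR', '=', 'NUMBER', ';']],
    'g:print'      => ['sequence' => ['PRINT', 'VAR', ';']],
];
$tokens = ['VAR', '=', 'NUMBER', ';', 'PRINT', 'VAR', ';'];

[$tree, $next] = sketchParse('g:statements', $rules, $tokens, 0);
```

Here `$next` ends up equal to the number of tokens, meaning the whole input was consumed. The optional `?g:statements` is what stops the recursion: once no further statement matches, the element is skipped and the enclosing sequence still succeeds.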

#### What can go wrong?

In some cases a recursive grammar can loop infinitely. Because this parser works left to right, [left recursion](https://en.wikipedia.org/wiki/Left_recursion) is a problem. See the linked article for ways to resolve it. The demo in `/demo/php/demo.php` resolves this particular situation for formula parsing; check its grammar section.
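
As a rough illustration of the problem and the usual fix, the sketch below shows a left-recursive rule set next to an equivalent right-recursive one, written in the same rule format as above. The group names `g:expr`, `g:binary`, and `g:expr_tail` are hypothetical, and this is not the grammar used in the demo.

```php
<?php
// Hypothetical rule names; this is NOT the grammar from /demo/php/demo.php.

// Left-recursive: to parse g:expr the parser tries g:binary, whose first
// element is g:expr again, so it recurses without consuming any lexeme.
$leftRecursive = [
    'g:expr' => [
        'or' => ['g:binary', 'NUMBER'],
    ],
    'g:binary' => [
        'sequence' => ['g:expr', 'MATH_OPERATOR', 'NUMBER'],
    ],
];

// Right-recursive rewrite: every group consumes a lexeme before recursing,
// and the optional tail ends the recursion when no operator follows.
$rightRecursive = [
    'g:expr' => [
        'sequence' => ['NUMBER', '?g:expr_tail'],
    ],
    'g:expr_tail' => [
        'sequence' => ['MATH_OPERATOR', 'NUMBER', '?g:expr_tail'],
    ],
];
```

Both rule sets accept the same inputs, such as `NUMBER MATH_OPERATOR NUMBER`, but the rewritten one yields a right-nested tree, so left-associative operators need extra care when the tree is evaluated.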

Usage
-----

Here is the basic usage structure:

```