
loupe/matcher
=============

Tokenize, decompose, highlight and crop around text and search terms

Version 0.3.0 · MIT license · PHP ^8.1 · [Source](https://github.com/loupe-php/matcher) · [Packagist](https://packagist.org/packages/loupe/matcher) · [Issues](https://github.com/loupe-php/matcher/issues)

Loupe Matcher
=============

Loupe Matcher turns plain search queries and arbitrary text into precise, user-friendly matches: tokenize phrases and negations, normalize language-specific spelling variants, decompose compound words, calculate match spans, and format the result with highlighting, cropping, and truncation.

Installation
------------

```
composer require loupe/matcher
```

Quick Start
-----------

Here's a simple example of how to use Loupe Matcher to highlight search terms in a text document and crop around the highlights:

```
use Loupe\Matcher\Tokenizer\LocaleConfiguration\English;
use Loupe\Matcher\Tokenizer\Tokenizer;
use Loupe\Matcher\Matcher;
use Loupe\Matcher\Formatter;
use Loupe\Matcher\FormatterOptions;

$tokenizer = new Tokenizer(new English());
$matcher = new Matcher($tokenizer);
$formatter = new Formatter($matcher);

$options = (new FormatterOptions())
    ->withEnableHighlight()
    ->withEnableCrop()
    ->withCropLength(20);

$result = $formatter->format(
    'I always take my toothbrush with me for holidays',
    'brush',
    $options
);

echo $result->getFormattedText();
// e.g. "…take my toothbrush with…", with "brush" wrapped in the highlight tags
```

Core Components
---------------

### Tokenizer

**Purpose:** Breaks text into searchable tokens (words, phrases, terms) for accurate matching.

The `Tokenizer` converts strings into `TokenCollection` objects, handling:

- Word boundaries using `ext-intl` rules
- Phrase groups (quoted terms like `"exact phrase"`)
- Negated terms (prefixed with `-`)
- Locale-specific tokenization
- Locale-specific term decomposition

```
// The locale configuration is optional; if given, it must implement `LocaleConfigurationInterface`.
$localeConfiguration = new English();
$tokenizer = new Tokenizer($localeConfiguration);
$tokens = $tokenizer->tokenize('search for "exact phrase" -exclude');

$tokens->all();          // All tokens
$tokens->phraseGroups(); // Quoted phrases only
$tokens->allNegated();   // Terms to exclude
```
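
To make the phrase and negation handling concrete, here is a minimal plain-PHP sketch of how a raw query could be split into terms, quoted phrases, and negated terms. It is an illustration only and not how the library's `Tokenizer` works internally: `sketchParseQuery()` is a hypothetical helper, it splits on whitespace rather than `ext-intl` word boundaries, and it applies no locale handling.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of loupe/matcher): splits a raw query into
 * plain terms, quoted phrases, and negated terms.
 *
 * @return array{terms: list<string>, phrases: list<string>, negated: list<string>}
 */
function sketchParseQuery(string $query): array
{
    $result = ['terms' => [], 'phrases' => [], 'negated' => []];

    // Alternative 1: optional "-" followed by a double-quoted phrase.
    // Alternative 2: optional "-" followed by a run of non-space characters.
    preg_match_all('/(-?)"([^"]+)"|(-?)(\S+)/u', $query, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        $isPhrase = ($m[2] ?? '') !== '';
        $isNegated = ($isPhrase ? $m[1] : ($m[3] ?? '')) === '-';
        $value = $isPhrase ? $m[2] : ($m[4] ?? '');

        if ($isNegated) {
            $result['negated'][] = $value;
        } elseif ($isPhrase) {
            $result['phrases'][] = $value;
        } else {
            $result['terms'][] = $value;
        }
    }

    return $result;
}

$parsed = sketchParseQuery('search for "exact phrase" -exclude');
echo implode(', ', $parsed['negated']), PHP_EOL; // prints "exclude"
```

In this sketch a negated phrase such as `-"exact phrase"` lands in the `negated` bucket; whether the library treats that case the same way is not stated in this README.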

To configure how the `Tokenizer` handles locale specifics (such as decomposition or normalization), provide your own implementation of `LocaleConfigurationInterface` or use one of the pre-built configurations shipped with this library. The following are currently available:

- English: Handles decomposition (`toothbrush` -> `tooth`, `brush`)
- German: Handles normalization of German umlauts and `ß`, as well as decomposition (`Zeitungspapier` -> `zeitung`, `papier`)

Check out the [separate docs on decomposition](./docs/decomposition.md) if you want to improve the existing locale configurations or add support for a new one!
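
As a rough idea of what decomposition means, the following plain-PHP sketch splits a compound word into parts that all appear in a small word list. It is a simplified, dictionary-based illustration under stated assumptions, not the algorithm the shipped locale configurations use: `sketchDecompose()` and its word list are hypothetical, lowercasing is ASCII-only, and linking elements (such as the `s` in `Zeitungspapier`) are not handled.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of loupe/matcher): splits $word into parts
 * that are all contained in $dictionary. Returns an empty list if no such
 * split exists.
 *
 * @param list<string> $dictionary lowercase known words
 * @return list<string>
 */
function sketchDecompose(string $word, array $dictionary, int $minLength = 3): array
{
    $word = strtolower($word);
    $known = array_flip($dictionary); // word => index, for fast lookups
    $length = strlen($word);

    // Try every split point, preferring the longest known head.
    for ($i = $length - $minLength; $i >= $minLength; $i--) {
        $head = substr($word, 0, $i);
        $tail = substr($word, $i);

        if (!isset($known[$head])) {
            continue;
        }
        if (isset($known[$tail])) {
            return [$head, $tail];
        }

        // The tail itself may be a compound, so try to split it further.
        $rest = sketchDecompose($tail, $dictionary, $minLength);
        if ($rest !== []) {
            return [$head, ...$rest];
        }
    }

    return [];
}

echo implode(', ', sketchDecompose('toothbrush', ['tooth', 'brush'])), PHP_EOL; // prints "tooth, brush"
```

With parts like these available as extra tokens, a query for `brush` can match a text that only contains `toothbrush`, which is the behavior the Quick Start example relies on.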

### Matcher

**Purpose:** Finds which tokens in your text match the search query.

The `Matcher` compares tokenized text against search terms, with support for:

- Stop word filtering (ignore common words like "the", "and")
- Match span calculation (start/end positions)
- Flexible matching between token collections

```
$text = 'Text to search';
$query = 'search query';

$matcher = new Matcher($tokenizer, ['the', 'and', 'or']); // Second argument: stop words
$matches = $matcher->calculateMatches($text, $query);

// Get position information for highlighting
$spans = $matcher->calculateMatchSpans($text, $query, $matches);
foreach ($spans as $span) {
    echo "Match at position {$span->getStartPosition()}-{$span->getEndPosition()}" . PHP_EOL;
}
```
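
The idea behind stop word filtering and span calculation can be sketched in plain PHP as follows. This is a simplified illustration, not the library's implementation: `sketchMatchSpans()` is a hypothetical helper, it only finds exact word matches (no decomposition or normalization), lowercasing is ASCII-only, and the offsets are byte offsets with an exclusive end.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of loupe/matcher): returns the byte spans of
 * all words in $text that equal a non-stop-word term of $query.
 *
 * @param list<string> $stopWords lowercase words to ignore in the query
 * @return list<array{start: int, end: int}>
 */
function sketchMatchSpans(string $text, string $query, array $stopWords = []): array
{
    // Query terms, lowercased, without stop words.
    preg_match_all('/\w+/u', strtolower($query), $queryWords);
    $terms = array_flip(array_diff($queryWords[0], $stopWords));

    // Walk the words of the text together with their byte offsets.
    preg_match_all('/\w+/u', $text, $textWords, PREG_OFFSET_CAPTURE);

    $spans = [];
    foreach ($textWords[0] as [$word, $offset]) {
        if (isset($terms[strtolower($word)])) {
            $spans[] = ['start' => $offset, 'end' => $offset + strlen($word)];
        }
    }

    return $spans;
}

$spans = sketchMatchSpans(
    'The quick brown fox and the lazy dog',
    'the fox dog',
    ['the', 'and', 'or']
);

foreach ($spans as $span) {
    echo "Match at position {$span['start']}-{$span['end']}", PHP_EOL;
}
// prints "Match at position 16-19" and "Match at position 33-36"
```

Because `the` is a stop word, neither occurrence of it in the text produces a span, even though it appears in the query.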

### Formatter

**Purpose:** Combines matching and highlighting to create formatted output with context.

The `Formatter` orchestrates the entire process:

- Highlights matched terms with HTML tags
- Crops text to show relevant context around matches
- Truncates long text while preserving word boundaries and highlights
- Configurable through `FormatterOptions`

```
$formatter = new Formatter($matcher);

$options = (new FormatterOptions())
    ->withEnableHighlight()
    ->withHighlightStartTag('<mark>') // example tag pair; any opening/closing tags work
    ->withHighlightEndTag('</mark>')
    ->withEnableCrop()
    ->withCropLength(150)
    ->withCropMarker(' ... ')
    ->withEnableTruncation()
    ->withTruncationLength(200)
    ->withTruncationMarker('...')
    ->withEnableMatchPrioritization();

$result = $formatter->format($text, $query, $options);
echo $result->getFormattedText();
```
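
To illustrate the highlighting step on its own, the sketch below wraps a list of match spans in tags using plain PHP. It is not the `Formatter`'s actual code: `sketchHighlight()` is a hypothetical helper, the spans are assumed to be non-overlapping byte offsets with an exclusive end, and the text is not HTML-escaped.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of loupe/matcher): wraps each span of $text
 * in the given tags.
 *
 * @param list<array{start: int, end: int}> $spans non-overlapping byte spans
 */
function sketchHighlight(
    string $text,
    array $spans,
    string $startTag = '<mark>',
    string $endTag = '</mark>'
): string {
    // Apply spans from last to first so that earlier offsets stay valid
    // while the string grows.
    usort($spans, static fn (array $a, array $b): int => $b['start'] <=> $a['start']);

    foreach ($spans as $span) {
        $length = $span['end'] - $span['start'];
        $match = substr($text, $span['start'], $length);
        $text = substr_replace($text, $startTag . $match . $endTag, $span['start'], $length);
    }

    return $text;
}

echo sketchHighlight('I take my toothbrush', [['start' => 15, 'end' => 20]]), PHP_EOL;
// prints "I take my tooth<mark>brush</mark>"
```

Working backwards through the spans avoids having to shift every later offset by the length of the tags that were already inserted.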

#### Match Prioritization

By default, cropping emits a snippet around every match cluster and truncation cuts from the start of the text. Enabling `withEnableMatchPrioritization()` makes the formatter try to choose the most relevant window(s) for display instead. Windows are scored by the number of distinct query terms hit, then by total matches, then by match density.

- **Cropping** finds the best windows around matches, limits each window to `crop_length`, and shows up to `crop_max_fragments` windows in document order.
- **Truncation** picks a single window centered on the best cluster of matches and falls back to truncating from the start if the text contains no matches.
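
The scoring order described above (distinct terms, then total matches, then density) can be sketched in plain PHP. This is only an illustration of the ranking idea, not the library's implementation: `sketchBestWindow()` is a hypothetical helper, candidate windows are assumed to start at a match, `$windowLength` is assumed to be at least 1, and density is taken here as matches per character between the window start and its last match.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of loupe/matcher): picks the best window of
 * $windowLength characters among the windows that start at a match.
 *
 * @param list<array{term: string, start: int}> $matches
 * @return array{from: int, to: int, score: array{int, int, float}}|null
 */
function sketchBestWindow(array $matches, int $windowLength): ?array
{
    $best = null;

    foreach ($matches as $anchor) {
        $from = $anchor['start'];
        $to = $from + $windowLength;

        // Collect the matches whose start offset falls inside [from, to).
        $inside = array_values(array_filter(
            $matches,
            static fn (array $m): bool => $m['start'] >= $from && $m['start'] < $to
        ));

        $total = count($inside);
        $distinct = count(array_unique(array_column($inside, 'term')));

        // Density: matches per character between the window start and the last match.
        $spread = max(array_column($inside, 'start')) - $from + 1;
        $density = $total / $spread;

        // Arrays compare element by element, so this ranks by distinct
        // terms first, then total matches, then density.
        $score = [$distinct, $total, $density];
        if ($best === null || ($score <=> $best['score']) > 0) {
            $best = ['from' => $from, 'to' => $to, 'score' => $score];
        }
    }

    return $best; // null when there are no matches at all
}

$best = sketchBestWindow([
    ['term' => 'quick', 'start' => 0],
    ['term' => 'quick', 'start' => 10],
    ['term' => 'fox', 'start' => 100],
    ['term' => 'quick', 'start' => 110],
    ['term' => 'dog', 'start' => 120],
], 50);

echo "{$best['from']}-{$best['to']}", PHP_EOL; // prints "100-150"
```

The window starting at offset 100 wins because it covers three distinct terms, while the window at offset 0 only repeats one term twice. A `null` result corresponds to the fallback case where there is nothing to prioritize.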

Advanced Usage
--------------

### Custom Tokenizer

Implement `TokenizerInterface` for specialized tokenization:

```
class CustomTokenizer implements TokenizerInterface {
    public function tokenize(string $text): TokenCollection {
        // Your custom tokenization logic
    }

    public function matches(Token $token, TokenCollection $tokens): bool {
        // Your custom logic for checking if a token is a match
    }
}
```

### Pre-highlighted Text Cropping

When you already have highlighted text that needs cropping:

```
$cropper = new \Loupe\Matcher\Formatting\Cropper(
    cropLength: 50,
    cropMarker: '…',
    highlightStartTag: '<mark>', // example tag pair; use the tags your text already contains
    highlightEndTag: '</mark>'
);

// "…text with <mark>highlighted</mark> terms."
echo $cropper->cropHighlightedText('Long text with <mark>highlighted</mark> terms.');
```

### Using Pre-calculated Matches

When you already have a `TokenCollection` of matches (e.g., from a previous search operation or external source), you can format text directly without re-calculating matches. This approach is useful when your search engine already provides match information or you want to cache match results for performance.

```
// Assume you already have matches from somewhere else
$existingMatches = new TokenCollection(/* ... */);

// Set up the tokenizer, matcher, and formatter as usual
$tokenizer = new Tokenizer();
$matcher = new Matcher($tokenizer);
$formatter = new Formatter($matcher);
$options = (new FormatterOptions())
    ->withEnableHighlight()
    ->withEnableCrop()
    ->withCropLength(100);

// Format using the existing matches - no duplicate processing
$result = $formatter->format($text, $query, $options, matches: $existingMatches);
echo $result->getFormattedText();
```

### Community

Maintainers: [Yanick Witschi](https://github.com/Toflar) (@Toflar), [daun](https://github.com/daun)

Top contributors: [daun](https://github.com/daun) (28 commits), [Toflar](https://github.com/Toflar) (16 commits), [lemmon](https://github.com/lemmon) (1 commit)

Tags: search, highlight, tokenizer, crop, dictionary, decomposition, tokenize, decompose

### Code Quality

- Tests: PHPUnit
- Static analysis: PHPStan
- Code style: ECS
- Type coverage: yes
