pucene/analysis
===============

Analysis package for pucene.

Version 0.1.x-dev · MIT license · PHP ^8.1

[Source](https://github.com/pucene/analysis) · [Packagist](https://packagist.org/packages/pucene/analysis)


Pucene: Analysis
================

Text analysis enables pucene to perform full-text search, where the search returns all relevant results rather than just exact matches.

If you search for `Quick fox jumps`, you probably want the document that contains `A quick brown fox jumps over the lazy dog`, and you might also want documents that contain related words like `fast fox` or `foxes leap`.

> This is a subtree split of the `pucene/pucene` project. Please create issues in the [main repository](https://github.com/pucene/pucene).

Tokenization
------------

Analysis makes full-text search possible through tokenization: breaking a text down into smaller chunks, called tokens. In most cases, these tokens are individual words.

If you index the phrase `the quick brown fox jumps` as a single string and the user searches for `quick fox`, it isn’t considered a match. However, if you tokenize the phrase and index each word separately, the terms in the query string can be looked up individually. This means the phrase can be matched by searches for `quick fox`, `fox brown`, or other variations.
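The idea above can be sketched in a few lines of plain PHP. This is only an illustration of tokenizing and indexing each word separately; it does not use the pucene API, and the function names `sketchTokenize`, `sketchBuildIndex`, and `sketchSearch` are hypothetical.

```php
<?php

/**
 * Splits a text into word tokens (hypothetical helper, not part of pucene).
 *
 * @return list<string>
 */
function sketchTokenize(string $text): array
{
    // Split on any run of characters that is neither a letter nor a digit.
    return preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];
}

/**
 * Builds a tiny inverted index: token => list of document ids.
 *
 * @param array<int, string> $documents
 * @return array<string, list<int>>
 */
function sketchBuildIndex(array $documents): array
{
    $index = [];
    foreach ($documents as $id => $text) {
        // array_unique avoids listing a document twice for a repeated word.
        foreach (array_unique(sketchTokenize($text)) as $token) {
            $index[$token][] = $id;
        }
    }

    return $index;
}

/**
 * Returns the ids of documents that contain every token of the query.
 *
 * @param array<string, list<int>> $index
 * @return list<int>
 */
function sketchSearch(array $index, string $query): array
{
    $result = null;
    foreach (sketchTokenize($query) as $token) {
        $ids = $index[$token] ?? [];
        $result = $result === null ? $ids : array_values(array_intersect($result, $ids));
    }

    return $result ?? [];
}

$index = sketchBuildIndex([
    1 => 'the quick brown fox jumps',
    2 => 'the lazy dog sleeps',
]);

print_r(sketchSearch($index, 'quick fox')); // contains document 1
print_r(sketchSearch($index, 'fox brown')); // contains document 1
print_r(sketchSearch($index, 'Quick fox')); // empty: tokens are compared literally
```

Because each word is looked up on its own, word order in the query does not matter. The last call returns nothing because `Quick` and `quick` are different tokens in this sketch.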

Normalization
-------------

Tokenization enables matching on individual terms, but each token is still matched literally. This means:

- A search for `Quick` would not match `quick`, even though you likely want either term to match the other.
- Although `fox` and `foxes` share the same root word, a search for `foxes` would not match `fox` or vice versa.
- A search for `jumps` would not match `leaps`. While they don’t share a root word, they are synonyms and have a similar meaning.

To solve these problems, text analysis can normalize these tokens into a standard format. This allows you to match tokens that are not exactly the same as the search terms, but similar enough to still be relevant. For example:

- `Quick` can be lowercased: `quick`.
- `foxes` can be stemmed, or reduced to its root word: `fox`.
- `jump` and `leap` are synonyms and can be indexed as a single word: `jump`.

To ensure search terms match these words as intended, you can apply the same tokenization and normalization rules to the query string. For example, a search for `Foxes leap` can be normalized to a search for `fox jump`.
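As a rough sketch of these three normalization steps, the plain PHP below lowercases, stems, and maps synonyms. It is not the pucene API: the `SKETCH_SYNONYMS` table, the `sketchStem` function, and the `sketchNormalize` function are hypothetical, and the stemmer is deliberately naive (it only strips a trailing `s` or the `es` after an `x`), whereas a real stemmer covers many more cases.

```php
<?php

// Hypothetical synonym table: each key is replaced by its value.
const SKETCH_SYNONYMS = ['leap' => 'jump', 'fast' => 'quick'];

/**
 * Deliberately naive stemmer, for illustration only.
 */
function sketchStem(string $token): string
{
    // foxes -> fox
    if (strlen($token) > 3 && str_ends_with($token, 'xes')) {
        return substr($token, 0, -2);
    }
    // jumps -> jump, but leave words such as "glass" alone
    if (strlen($token) > 3 && str_ends_with($token, 's') && !str_ends_with($token, 'ss')) {
        return substr($token, 0, -1);
    }

    return $token;
}

/**
 * Tokenizes a text and normalizes every token.
 *
 * @return list<string>
 */
function sketchNormalize(string $text): array
{
    $tokens = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    return array_map(
        static function (string $token): string {
            // strtolower only handles ASCII; mb_strtolower would be needed for other scripts.
            $token = strtolower($token);              // Quick -> quick
            $token = sketchStem($token);              // foxes -> fox
            return SKETCH_SYNONYMS[$token] ?? $token; // leap  -> jump
        },
        $tokens
    );
}

echo implode(' ', sketchNormalize('Foxes leap')), PHP_EOL; // fox jump
```

Running the same function over both the indexed text and the query string is what makes `Foxes leap` find a document that was indexed with `fox` and `jump`.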

Customize text analysis
-----------------------

Text analysis is performed by an analyzer, a set of rules that govern the entire process.

Pucene includes a default analyzer, called the standard analyzer, which works well for most use cases right out of the box.

If you want to tailor your search experience, you can choose a different [built-in analyzer](examples/standard.php#L33) or even configure a custom one. A [custom analyzer](examples/custom.php#L45-L100) gives you control over each step of the analysis process, including:

- Changes to the text before tokenization
- How text is converted to tokens
- Normalization changes made to tokens before indexing or search
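The three steps listed above can be pictured as a small pipeline. The class below is a hypothetical sketch in plain PHP and is not the pucene API (see the linked example files for the real classes): `SketchAnalyzer` and its constructor arguments are invented for illustration, and the stop-word filter is an assumed example of a token-level change.

```php
<?php

/**
 * Hypothetical analyzer illustrating the three stages; NOT the pucene API.
 */
final class SketchAnalyzer
{
    /**
     * @param list<callable(string): string>             $characterFilters applied to the raw text
     * @param \Closure(string): list<string>              $tokenizer        converts text to tokens
     * @param list<callable(array<string>): array<string>> $tokenFilters     applied to the token list
     */
    public function __construct(
        private array $characterFilters,
        private \Closure $tokenizer,
        private array $tokenFilters,
    ) {
    }

    /** @return list<string> */
    public function analyze(string $text): array
    {
        foreach ($this->characterFilters as $filter) {
            $text = $filter($text);
        }

        $tokens = ($this->tokenizer)($text);

        foreach ($this->tokenFilters as $filter) {
            $tokens = $filter($tokens);
        }

        // Re-index, because filters may have removed tokens.
        return array_values($tokens);
    }
}

$analyzer = new SketchAnalyzer(
    // 1. Changes to the text before tokenization: strip HTML tags.
    [fn (string $text): string => strip_tags($text)],
    // 2. How text is converted to tokens: split on whitespace.
    fn (string $text): array => preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [],
    // 3. Normalization of tokens: lowercase, then drop a few stop words.
    [
        fn (array $tokens): array => array_map('strtolower', $tokens),
        fn (array $tokens): array => array_filter(
            $tokens,
            fn (string $token): bool => !in_array($token, ['a', 'the'], true)
        ),
    ],
);

echo implode(' ', $analyzer->analyze('<p>The Quick brown fox</p>')), PHP_EOL; // quick brown fox
```

Keeping each stage as a separate callable means one stage can be swapped (for example, a different tokenizer) without touching the others.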

Examples
--------

The [examples](examples) directory contains several files that show how to use this library.

Disclaimer
----------

This text was highly inspired by:

### Community

Maintainer: [wachterjohannes](https://github.com/wachterjohannes)

