PHPackages                             opencat/segmentation - PHPackages - PHPackages  [Skip to content](#main-content)[PHPackages](/)[Directory](/)[Categories](/categories)[Trending](/trending)[Leaderboard](/leaderboard)[Changelog](/changelog)[Analyze](/analyze)[Collections](/collections)[Log in](/login)[Sign up](/register)

1. [Directory](/)
2. /
3. [Utility &amp; Helpers](/categories/utility)
4. /
5. opencat/segmentation

ActiveLibrary[Utility &amp; Helpers](/categories/utility)

opencat/segmentation
====================

SRX-based sentence segmentation engine for the OpenCAT Framework

00PHP

Since May 9Pushed 1mo agoCompare

[ Source](https://github.com/shaikhammar/opencat-segmentation)[ Packagist](https://packagist.org/packages/opencat/segmentation)[ RSS](/packages/opencat-segmentation/feed)WikiDiscussions main Synced 1w ago

READMEChangelogDependenciesVersions (1)Used By (0)

opencat/segmentation
====================

[](#opencatsegmentation)

SRX-based sentence segmentation engine for the [OpenCAT Framework](https://github.com/shaikhammar/opencat-framework).

Takes a `Segment` (one structural unit from a file filter — a paragraph, cell, or node) and splits it into individual sentences according to SRX 2.0 rules. `InlineCode` elements inside the segment are distributed correctly across the resulting sentences, with spanning codes automatically repaired.

Installation
------------

[](#installation)

```
composer require opencat/segmentation
```

Requires `ext-mbstring`.

Usage
-----

[](#usage)

```
use CatFramework\Segmentation\SrxSegmentationEngine;

$engine = new SrxSegmentationEngine();
// Auto-loads the bundled default SRX rules on first call to segment()

$sentences = $engine->segment($segment, 'en-US');
// Returns Segment[] — one item if no sentence boundary was found
```

### Loading custom SRX rules

[](#loading-custom-srx-rules)

```
$engine->loadRules('/path/to/custom.srx');
$sentences = $engine->segment($segment, 'de-DE');
```

How segmentation works
----------------------

[](#how-segmentation-works)

1. **Plain text extraction** — `Segment::getPlainText()` strips all `InlineCode` elements, leaving only text characters.
2. **Rule matching** — the engine iterates every character position in the plain text. At each position it applies `LanguageRule` rules in order; the first matching rule decides break or no-break. First match wins.
3. **Break position adjustment** — break positions advance past any inter-sentence whitespace so trailing spaces stay with the preceding sentence.
4. **Element distribution** — text strings are sliced at break boundaries; `InlineCode` objects (zero-width) are assigned to the segment whose range contains them.
5. **Spanning code repair** — if a `` open tag lands in segment A and its close tag in segment B, both are marked `isIsolated = true`, a synthetic closing tag is appended to segment A, and a synthetic opening tag is prepended to segment B. This maps to XLIFF ``.

Segment IDs
-----------

[](#segment-ids)

Sub-segment IDs are derived from the parent: `"para-3"` → `"para-3:1"`, `"para-3:2"`, etc. This preserves the origin segment in all downstream IDs.

Inline code example
-------------------

[](#inline-code-example)

Given a segment `"Hello **world**. Next sentence."`:

```
Elements: ["Hello ", , "world", , ". Next sentence."]

```

After segmentation into two sentences:

```
Sentence 1: ["Hello ", , "world", , ". "]
Sentence 2: ["Next sentence."]

```

The bold span did not cross the boundary in this case, so no synthetic codes are needed. If it had:

```
Elements: ["Hello ", , "world. Next", , " sentence."]

```

```
Sentence 1: ["Hello ", , "world. ", ]
Sentence 2: [, "Next", , " sentence."]

```

Language codes
--------------

[](#language-codes)

Pass BCP 47 codes. The bundled SRX uses prefix patterns (`EN.*`, `HI.*`, etc.) so `"en-US"`, `"en-GB"`, and `"en"` all match the English rules. If no rule matches, the segment is returned unchanged.

Related packages
----------------

[](#related-packages)

- [`opencat/core`](https://github.com/shaikhammar/opencat-framework/tree/main/packages/core) — `Segment`, `InlineCode`, `SegmentationEngineInterface`
- [`opencat/srx`](https://github.com/shaikhammar/opencat-framework/tree/main/packages/srx) — SRX parser and bundled default rules
- [`opencat/workflow`](https://github.com/shaikhammar/opencat-framework/tree/main/packages/workflow) — uses `SrxSegmentationEngine` as part of the full pipeline

###  Health Score

19

—

LowBetter than 10% of packages

Maintenance61

Regular maintenance activity

Popularity0

Limited adoption so far

Community6

Small or concentrated contributor base

Maturity11

Early-stage or recently created project

 Bus Factor1

Top contributor holds 100% of commits — single point of failure

How is this calculated?**Maintenance (25%)** — Last commit recency, latest release date, and issue-to-star ratio. Uses a 2-year decay window.

**Popularity (30%)** — Total and monthly downloads, GitHub stars, and forks. Logarithmic scaling prevents top-heavy scores.

**Community (15%)** — Contributors, dependents, forks, watchers, and maintainers. Measures real ecosystem engagement.

**Maturity (30%)** — Project age, version count, PHP version support, and release stability.

### Community

Maintainers

![](https://www.gravatar.com/avatar/c6b9044413071860fd6199e56c661a24f449797b77a1b130f0df7e98be8481f4?d=identicon)[shaikhammar](/maintainers/shaikhammar)

---

Top Contributors

[![actions-user](https://avatars.githubusercontent.com/u/65916846?v=4)](https://github.com/actions-user "actions-user (4 commits)")

### Embed Badge

![Health badge](/badges/opencat-segmentation/health.svg)

```
[![Health](https://phpackages.com/badges/opencat-segmentation/health.svg)](https://phpackages.com/packages/opencat-segmentation)
```

PHPackages © 2026

[Directory](/)[Categories](/categories)[Trending](/trending)[Changelog](/changelog)[Analyze](/analyze)
