zbmowrey/weighted-levenshtein
=============================

Weighted Levenshtein, Optimal String Alignment, and Damerau-Levenshtein distance for PHP 8.4+ — a port of infoscout/weighted-levenshtein.

v1.1.1 · MIT · PHP ^8.4 · [Source](https://github.com/zbmowrey/weighted-levenshtein) · [Packagist](https://packagist.org/packages/zbmowrey/weighted-levenshtein)

weighted-levenshtein
====================

[![CI](https://github.com/zbmowrey/weighted-levenshtein/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/zbmowrey/weighted-levenshtein/actions/workflows/ci.yml)[![License: MIT](https://camo.githubusercontent.com/7013272bd27ece47364536a221edb554cd69683b68a46fc0ee96881174c4214c/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f6c6963656e73652d4d49542d626c75652e737667)](LICENSE)

Weighted Levenshtein, Optimal String Alignment, and Damerau-Levenshtein edit-distance algorithms for PHP 8.4+. Pure PHP, strict types, no extensions.

PHP port of [weighted-levenshtein](https://github.com/infoscout/weighted-levenshtein) by David Su / InfoScout. The algorithms, default behavior, and asymmetric cost semantics are preserved exactly; only the API has been reshaped to feel native in modern PHP.

Why weighted distance?
----------------------

Most edit-distance libraries charge a cost of 1 for every insertion, deletion, substitution, or transposition. That works for generic fuzzy matching, but it ignores the fact that some errors are far more likely than others:

- **OCR correction.** Substituting `0` for `O` is a likely confusion; substituting `X` for `O` is not. Give the first a smaller cost.
- **Typo correction.** On a QWERTY keyboard, `X` and `Z` are neighbors; `X` and `K` are not. Humans also transpose adjacent letters frequently, so transpositions should be cheap.
- **Domain-specific noise.** If your input pipeline tends to drop trailing whitespace or duplicate digits, weight those operations accordingly.

This library lets you specify a cost per character (for insert/delete) and per ordered pair (for substitute/transpose), then runs the appropriate dynamic-programming algorithm.
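As a rough illustration of that dynamic-programming step, here is a minimal standalone sketch of weighted Levenshtein. It is not the library's source: the function name `sketch_weighted_lev` and the plain-array cost tables are assumptions made for this example, and any missing entry falls back to a cost of 1.0.

```php
<?php

/**
 * Weighted Levenshtein distance using a two-row rolling buffer.
 * Hypothetical helper for illustration; cost tables are plain arrays.
 *
 * @param array<string, float> $insert      cost of inserting a character
 * @param array<string, float> $delete      cost of deleting a character
 * @param array<string, array<string, float>> $substitute  $substitute[$from][$to]
 */
function sketch_weighted_lev(
    string $a,
    string $b,
    array $insert = [],
    array $delete = [],
    array $substitute = [],
): float {
    $m = strlen($a);
    $n = strlen($b);

    // Row 0: build each prefix of $b from the empty string with insertions only.
    $prev = [0.0];
    for ($j = 1; $j <= $n; $j++) {
        $prev[$j] = $prev[$j - 1] + ($insert[$b[$j - 1]] ?? 1.0);
    }

    for ($i = 1; $i <= $m; $i++) {
        $ca = $a[$i - 1];
        // Column 0: delete every character of the current prefix of $a.
        $curr = [$prev[0] + ($delete[$ca] ?? 1.0)];
        for ($j = 1; $j <= $n; $j++) {
            $cb = $b[$j - 1];
            // The substitution cost is looked up by ordered pair (from, to).
            $sub = $ca === $cb ? 0.0 : ($substitute[$ca][$cb] ?? 1.0);
            $curr[$j] = min(
                $prev[$j] + ($delete[$ca] ?? 1.0),
                $curr[$j - 1] + ($insert[$cb] ?? 1.0),
                $prev[$j - 1] + $sub,
            );
        }
        $prev = $curr;
    }

    return $prev[$n];
}

echo sketch_weighted_lev('BANANAS', 'BANDANAS', insert: ['D' => 1.5]), PHP_EOL;            // 1.5
echo sketch_weighted_lev('HANANA', 'BANANA', substitute: ['H' => ['B' => 1.25]]), PHP_EOL; // 1.25
```

Because the substitution table is keyed by `[$from][$to]`, the reverse direction stays at 1.0 unless it is set separately.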

Installation
------------

```
composer require zbmowrey/weighted-levenshtein
```

Requires PHP 8.4 or newer.

Quick start
-----------

```
use Zbmowrey\WeightedLevenshtein\Distance;

// Default (uniform) costs of 1.0 per operation.
echo Distance::levenshtein('kitten', 'sitting');             // 3
echo Distance::optimalStringAlignment('ca', 'ac');           // 1 (one transposition)
echo Distance::damerauLevenshtein('ab', 'bca');              // 2
```

Cost maps are immutable value objects. Build them with `withCost()`:

```
use Zbmowrey\WeightedLevenshtein\CharCostMap;
use Zbmowrey\WeightedLevenshtein\Distance;

$insertCosts = CharCostMap::uniform()->withCost('D', 1.5);
echo Distance::levenshtein('BANANAS', 'BANDANAS', $insertCosts);  // 1.5
```
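To show what "immutable value object" means in practice, here is a small standalone sketch of a copy-on-write cost map. `SketchCharCosts` is a hypothetical stand-in written for this example, not the library's `CharCostMap`; only the `uniform()` / `withCost()` / `cost()` shape is borrowed from the API above.

```php
<?php

// Hypothetical stand-in for an immutable per-character cost map.
final class SketchCharCosts
{
    /** @param array<int, float> $overrides costs keyed by byte value (0-127) */
    private function __construct(
        private readonly float $defaultCost,
        private readonly array $overrides,
    ) {
    }

    public static function uniform(float $defaultCost = 1.0): self
    {
        return new self($defaultCost, []);
    }

    public function withCost(string $char, float $cost): self
    {
        if (strlen($char) !== 1 || ord($char) > 127) {
            throw new InvalidArgumentException('Expected a single ASCII byte.');
        }

        // Copy-on-write: return a new instance instead of mutating $this.
        return new self($this->defaultCost, [ord($char) => $cost] + $this->overrides);
    }

    public function cost(string $char): float
    {
        return $this->overrides[ord($char)] ?? $this->defaultCost;
    }
}

$base  = SketchCharCosts::uniform();
$withD = $base->withCost('D', 1.5);

echo $base->cost('D'), PHP_EOL;   // 1   (the original map is untouched)
echo $withD->cost('D'), PHP_EOL;  // 1.5
```

The practical upshot is that a base map can be shared freely and specialised per call site without one caller's overrides leaking into another's.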

Full examples
-------------

These are the same examples as the original Python README, translated 1:1. Every snippet has a corresponding test in `tests/Readme/`.

```
use Zbmowrey\WeightedLevenshtein\CharCostMap;
use Zbmowrey\WeightedLevenshtein\CharPairCostMap;
use Zbmowrey\WeightedLevenshtein\Distance;

// --- Insertion cost ---
$insertCosts = CharCostMap::uniform()->withCost('D', 1.5);
echo Distance::levenshtein('BANANAS', 'BANDANAS', $insertCosts);
// 1.5

// --- Deletion cost ---
$deleteCosts = CharCostMap::uniform()->withCost('S', 0.5);
echo Distance::levenshtein('BANANAS', 'BANANA', null, $deleteCosts);
// 0.5

// --- Substitution cost (asymmetric!) ---
$subs = CharPairCostMap::uniform()->withCost('H', 'B', 1.25);
echo Distance::levenshtein('HANANA', 'BANANA', null, null, $subs);
// 1.25

// The reverse direction is unweighted because we never set ('B','H').
echo Distance::levenshtein('BANANA', 'HANANA', null, null, $subs);
// 1.0

// Make the reverse direction match by setting the other ordered pair.
$subs = $subs->withCost('B', 'H', 1.25);
echo Distance::levenshtein('BANANA', 'HANANA', null, null, $subs);
// 1.25

// --- Transposition cost (Damerau-Levenshtein) ---
$transposes = CharPairCostMap::uniform()->withCost('A', 'B', 0.75);
echo Distance::damerauLevenshtein('ABNANA', 'BANANA', null, null, null, $transposes);
// 0.75

// Like substitution, transposition is also asymmetric.
echo Distance::damerauLevenshtein('BANANA', 'ABNANA', null, null, null, $transposes);
// 1.0

// Set the other direction to make it symmetric.
$transposes = $transposes->withCost('B', 'A', 0.75);
echo Distance::damerauLevenshtein('BANANA', 'ABNANA', null, null, null, $transposes);
// 0.75
```
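The examples above set each direction of a pair by hand. When many pairs should be symmetric, one option is to expand an unordered list into ordered triples first. The helper below is a hypothetical sketch (`sketch_symmetric_triples` is not part of the library) and works on plain arrays so it can run on its own.

```php
<?php

/**
 * Expand unordered character pairs into ordered (from, to, cost) triples.
 * Hypothetical helper for illustration only.
 *
 * @param list<array{string, string}> $pairs
 * @return list<array{string, string, float}>
 */
function sketch_symmetric_triples(array $pairs, float $cost): array
{
    $triples = [];
    foreach ($pairs as [$x, $y]) {
        $triples[] = [$x, $y, $cost];
        if ($x !== $y) {
            // Register the reverse direction as its own ordered pair.
            $triples[] = [$y, $x, $cost];
        }
    }

    return $triples;
}

$triples = sketch_symmetric_triples([['H', 'B'], ['A', 'B']], 1.25);
echo count($triples), PHP_EOL; // 4
```

With the library installed, each triple could then be applied in a loop with `$map = $map->withCost($from, $to, $cost)`, starting from `CharPairCostMap::uniform()`.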

Short aliases are available as free functions:

```
use function Zbmowrey\WeightedLevenshtein\{lev, osa, dam_lev};

echo lev('kitten', 'sitting');     // 3
echo osa('ca', 'ac');              // 1
echo dam_lev('ab', 'ba');          // 1
```

Built-in cost map presets
-------------------------

Two opinionated presets ship under `Zbmowrey\WeightedLevenshtein\Presets\` for the two most common weighted-distance use cases. Both return immutable `CharPairCostMap` instances, so you can layer your own overrides with `withCost()`.

### OCR output (`OcrConfusions`)

```
use Zbmowrey\WeightedLevenshtein\Distance;
use Zbmowrey\WeightedLevenshtein\Presets\OcrConfusions;

$substitute = OcrConfusions::common();              // default cost 0.25
echo Distance::levenshtein('FOOD', 'F00D', null, null, $substitute);  // 0.5
echo Distance::levenshtein('Hello', 'He11o', null, null, $substitute); // 0.5
```

`OcrConfusions::common(float $cost = 0.25)` covers a curated list of high-confidence OCR confusions in both directions. The full set:

- **Zero cluster:** `0`↔`O`↔`o`, `0`↔`D`, `O`↔`D`, `O`↔`Q`
- **One cluster:** `1`↔`l`↔`I`↔`i` (every pair)
- **Seven cluster:** `1`↔`7`, `2`↔`7`, `7`↔`T`, `7`↔`Z`↔`z`
- **Other digit/digit:** `2`↔`Z`↔`z`, `3`↔`5`, `3`↔`8`
- **Other digit/letter:** `5`↔`S`↔`s`, `8`↔`B`, `6`↔`G`, `6`↔`b`, `9`↔`g`, `9`↔`q`
- **Lowercase letter pairs:** `c`↔`e`, `n`↔`h`, `u`↔`v`, `m`↔`n`, `f`↔`t`, `r`↔`n`
- **Uppercase letter pairs:** `C`↔`G`, `E`↔`F`, `M`↔`N`, `P`↔`R`, `U`↔`V`, `V`↔`Y`

The list is intentionally conservative — only pairs that confuse across most fonts are included. Font-specific or low-resolution-only confusions (e.g. `4`↔`A`, `0`↔`6`, `n`↔`u`) and multi-character confusions (`rn`↔`m`, `cl`↔`d`, `vv`↔`w`) are not registered. Layer them on with `withCost()` if your data needs them.

Pairs not in the list keep the default cost of 1.0. Registered pairs can be overridden the same way, for example:

```
$substitute = OcrConfusions::common()
    ->withCost('I', '1', 0.05)  // your OCR confuses I and 1 more strongly
    ->withCost('1', 'I', 0.05);
```
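A common use of an OCR cost map is picking the best dictionary candidate for a noisy token. The sketch below is a hypothetical helper (`sketch_closest` is not a library function) that takes the distance as a callable, so it runs standalone with PHP's built-in `levenshtein()` as a stand-in.

```php
<?php

/**
 * Return the candidate with the smallest distance to $input, or null if the list is empty.
 * Ties go to the earliest candidate. Hypothetical helper for illustration.
 *
 * @param list<string> $candidates
 * @param callable(string, string): (int|float) $distance
 */
function sketch_closest(string $input, array $candidates, callable $distance): ?string
{
    $best = null;
    $bestScore = INF;
    foreach ($candidates as $candidate) {
        $score = $distance($input, $candidate);
        if ($score < $bestScore) {
            $best = $candidate;
            $bestScore = $score;
        }
    }

    return $best;
}

// Unweighted stand-in: all three candidates are 2 edits away, so the first one wins the tie.
echo sketch_closest('F00D', ['FOOD', 'FORD', 'FEED'], 'levenshtein'), PHP_EOL; // FOOD
```

With the preset, the tie would be broken by the costs rather than by list order. Passing `fn (string $a, string $b) => Distance::levenshtein($a, $b, null, null, OcrConfusions::common())` as the callable should score `FOOD` at 0.5 (two `0`/`O` confusions at 0.25 each), well below the other two candidates.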

### Human typos (`QwertyKeyboard`)

```
use Zbmowrey\WeightedLevenshtein\Distance;
use Zbmowrey\WeightedLevenshtein\Presets\QwertyKeyboard;

$substitute = QwertyKeyboard::substituteCosts();
$transpose  = QwertyKeyboard::transposeCosts();

echo Distance::damerauLevenshtein(
    'helo',
    'hwlo',
    null,
    null,
    $substitute,   // w is adjacent to e on the keyboard
    $transpose,
);
// 0.5
```

Costs are derived from the Euclidean distance between keys on a standard staggered US QWERTY layout. Orthogonal and close-diagonal neighbors get the adjacent cost (default 0.5); one-key-removed pairs get the near cost (default 0.75); everything else stays at 1.0. Both lowercase and uppercase letters are populated. Mixed-case and mixed-shift-state pairs (e.g. `q`/`W`, `Q`/`1`) are left at default — those errors are rare in practice.

Override the thresholds if 0.5/0.75 don't fit your data:

```
$strict = QwertyKeyboard::substituteCosts(adjacentCost: 0.2, nearCost: 0.5);
```
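To make the distance-based derivation more concrete, here is a standalone sketch of how key geometry could be turned into costs. Everything in it is an assumption for illustration: the row offsets (0, 0.25, 0.75 key widths), the cut-offs of 1.5 and 2.5, and the `sketch_*` names are not taken from the library, whose actual layout table and thresholds may differ. Only lowercase letters are modelled.

```php
<?php

// Assumed geometry: each row is shifted right relative to the top row (in key widths).
const SKETCH_ROWS = [
    ['keys' => 'qwertyuiop', 'offset' => 0.0],
    ['keys' => 'asdfghjkl',  'offset' => 0.25],
    ['keys' => 'zxcvbnm',    'offset' => 0.75],
];

/** @return array{float, float}|null (x, y) position, or null if the key is not modelled */
function sketch_key_position(string $key): ?array
{
    if (strlen($key) !== 1) {
        return null;
    }
    foreach (SKETCH_ROWS as $row => $spec) {
        $col = strpos($spec['keys'], $key);
        if ($col !== false) {
            return [$col + $spec['offset'], (float) $row];
        }
    }

    return null;
}

function sketch_key_distance(string $a, string $b): ?float
{
    $pa = sketch_key_position($a);
    $pb = sketch_key_position($b);
    if ($pa === null || $pb === null) {
        return null;
    }

    // Euclidean distance between the two key centres.
    return hypot($pa[0] - $pb[0], $pa[1] - $pb[1]);
}

function sketch_substitution_cost(
    string $a,
    string $b,
    float $adjacentCost = 0.5,
    float $nearCost = 0.75,
): float {
    $d = sketch_key_distance($a, $b);
    if ($d === null) {
        return 1.0; // unmodelled keys keep the default cost
    }

    // Hypothetical cut-offs: roughly one key apart is "adjacent", roughly two is "near".
    return match (true) {
        $d <= 1.5 => $adjacentCost,
        $d <= 2.5 => $nearCost,
        default   => 1.0,
    };
}

echo sketch_substitution_cost('e', 'w'), PHP_EOL; // 0.5
echo sketch_substitution_cost('x', 'k'), PHP_EOL; // 1
```

Under these assumed numbers, `e`/`w` and `x`/`z` land in the adjacent band while `x`/`k` stays at the default, which matches the intuition described earlier in this README.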

API reference
-------------

### `Zbmowrey\WeightedLevenshtein\Distance`

| Method | Description |
| --- | --- |
| `Distance::levenshtein(string $a, string $b, ?CharCostMap $insertCosts = null, ?CharCostMap $deleteCosts = null, ?CharPairCostMap $substituteCosts = null): float` | Wagner-Fischer Levenshtein distance. |
| `Distance::optimalStringAlignment(string $a, string $b, ?CharCostMap $insertCosts = null, ?CharCostMap $deleteCosts = null, ?CharPairCostMap $substituteCosts = null, ?CharPairCostMap $transposeCosts = null): float` | Wagner-Fischer with a single adjacent transposition check; substrings used in a transposition cannot also be edited. |
| `Distance::damerauLevenshtein(string $a, string $b, ?CharCostMap $insertCosts = null, ?CharCostMap $deleteCosts = null, ?CharPairCostMap $substituteCosts = null, ?CharPairCostMap $transposeCosts = null): float` | True Damerau-Levenshtein distance with arbitrary non-overlapping adjacent transpositions. |

### `Zbmowrey\WeightedLevenshtein\CharCostMap`

Immutable per-character cost map for insert/delete operations.

| Method | Description |
| --- | --- |
| `CharCostMap::uniform(float $defaultCost = 1.0): self` | Construct a map where every character has cost `$defaultCost`. |
| `withCost(string $char, float $cost): self` | Return a new map with `$cost` for the single ASCII byte `$char`. |
| `cost(string $char): float` | Look up the cost for `$char`. |

### `Zbmowrey\WeightedLevenshtein\CharPairCostMap`

Immutable per-ordered-pair cost map for substitute/transpose operations.

| Method | Description |
| --- | --- |
| `CharPairCostMap::uniform(float $defaultCost = 1.0): self` | Construct a map where every ordered pair has cost `$defaultCost`. |
| `withCost(string $from, string $to, float $cost): self` | Return a new map with `$cost` for the ordered pair (`$from`, `$to`). |
| `cost(string $from, string $to): float` | Look up the cost for the ordered pair. |

### `Zbmowrey\WeightedLevenshtein\Presets\OcrConfusions`

| Method | Description |
| --- | --- |
| `OcrConfusions::common(float $cost = 0.25): CharPairCostMap` | Curated OCR confusion substitutions in both directions. |

### `Zbmowrey\WeightedLevenshtein\Presets\QwertyKeyboard`

| Method | Description |
| --- | --- |
| `QwertyKeyboard::substituteCosts(float $adjacentCost = 0.5, float $nearCost = 0.75): CharPairCostMap` | Adjacency-weighted substitution cost map for ASCII letters and digits on a US QWERTY layout. |
| `QwertyKeyboard::transposeCosts(float $adjacentCost = 0.5, float $nearCost = 0.75): CharPairCostMap` | Adjacency-weighted transposition cost map using the same layout. |

### Free function aliases

In namespace `Zbmowrey\WeightedLevenshtein`:

- `lev(...)` — alias for `Distance::levenshtein(...)`.
- `osa(...)` — alias for `Distance::optimalStringAlignment(...)`.
- `dam_lev(...)` — alias for `Distance::damerauLevenshtein(...)`.

Limitations
-----------

- **ASCII only.** Inputs must consist of bytes 0–127. Any byte ≥ 128 raises `InvalidArgumentException`. This matches the original library, whose internal arrays are indexed by `ord()` over a 128-wide alphabet.
- **Case sensitive.** `'a'` and `'A'` are distinct characters.
- **Costs are asymmetric.** `CharPairCostMap` is keyed by *ordered* pairs. Setting `('A', 'B')` does not set `('B', 'A')`. If you want symmetry, set both.
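Since non-ASCII bytes raise an exception and case is significant, callers may want to validate and normalize input before computing a distance. The following is a minimal sketch under that assumption; `sketch_is_ascii` and `sketch_normalize` are hypothetical helpers, and whether to fold case at all depends on your data.

```php
<?php

// True when every byte of $s is in the range 0-127.
function sketch_is_ascii(string $s): bool
{
    return preg_match('/[^\x00-\x7F]/', $s) === 0;
}

// Reject non-ASCII input up front, then fold case so 'a' and 'A' compare equal.
function sketch_normalize(string $s): string
{
    if (!sketch_is_ascii($s)) {
        throw new InvalidArgumentException('Non-ASCII input; transliterate or reject it first.');
    }

    return strtolower($s);
}

var_dump(sketch_is_ascii('BANANA'));       // bool(true)
var_dump(sketch_is_ascii("caf\xC3\xA9"));  // bool(false): UTF-8 "é" is two bytes >= 128
echo sketch_normalize('BaNaNa'), PHP_EOL;  // banana
```

Checking up front keeps the failure at the boundary of your own code, where you can decide whether to transliterate, skip, or report the offending value.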

Performance
-----------

Pure PHP. Suitable for typical fuzzy-matching workloads in the range of low thousands of comparisons per second on strings of ~100 characters. If you need C-speed, use the original Python library, or PHP's built-in `levenshtein()` for the unweighted case. A two-row rolling buffer is used for plain Levenshtein; full O(m × n) DP matrices are used for OSA and Damerau-Levenshtein, whose transposition steps read cells from rows before the previous one.
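To illustrate why the transposition variants keep more state than plain Levenshtein, here is a minimal unweighted OSA sketch over a full matrix. It is written for this example only (`sketch_osa` is a hypothetical name, not the library's implementation); note how the transposition check reads `$d[$i - 2][$j - 2]`, a row that a two-row buffer would already have discarded.

```php
<?php

// Unweighted Optimal String Alignment distance over a full (m+1) x (n+1) matrix.
function sketch_osa(string $a, string $b): int
{
    $m = strlen($a);
    $n = strlen($b);

    $d = [];
    for ($i = 0; $i <= $m; $i++) {
        $d[$i] = array_fill(0, $n + 1, 0);
        $d[$i][0] = $i; // delete $i characters
    }
    for ($j = 0; $j <= $n; $j++) {
        $d[0][$j] = $j; // insert $j characters
    }

    for ($i = 1; $i <= $m; $i++) {
        for ($j = 1; $j <= $n; $j++) {
            $cost = $a[$i - 1] === $b[$j - 1] ? 0 : 1;
            $d[$i][$j] = min(
                $d[$i - 1][$j] + 1,          // deletion
                $d[$i][$j - 1] + 1,          // insertion
                $d[$i - 1][$j - 1] + $cost,  // match or substitution
            );

            // Adjacent transposition: looks two rows and two columns back.
            if ($i > 1 && $j > 1 && $a[$i - 1] === $b[$j - 2] && $a[$i - 2] === $b[$j - 1]) {
                $d[$i][$j] = min($d[$i][$j], $d[$i - 2][$j - 2] + 1);
            }
        }
    }

    return $d[$m][$n];
}

echo sketch_osa('ca', 'ac'), PHP_EOL;  // 1
echo sketch_osa('ab', 'bca'), PHP_EOL; // 3
```

The second result is where OSA and true Damerau-Levenshtein part ways: OSA cannot edit a transposed substring again, so it reports 3 for `'ab'` and `'bca'`, whereas the Quick start example shows `Distance::damerauLevenshtein('ab', 'bca')` returning 2.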

Contributing
------------

Contributions welcome. Open an issue or PR. The full QA suite is `composer qa` (PHPUnit + PHPStan level max + PHP-CS-Fixer dry-run). All three must be green for a PR to land.

License
-------

MIT. See [LICENSE](LICENSE). PHP port copyright © 2026 Zach Mowrey. Original Python library copyright © 2016 InfoScout, distributed under the MIT License.

