
szopen/similarity
=================

A similarity library for string and date matching

v1.0.1 (7mo ago) · MIT · PHP >=8.2 · CI passing

Since Nov 12 · Pushed 7mo ago

[Source](https://github.com/LeandroLuccerini/similarity) · [Packagist](https://packagist.org/packages/szopen/similarity)

Similarity
==========

A lightweight PHP library for measuring the similarity between strings and dates with flexible normalization, transliteration, and fuzzy comparison strategies.

Overview
--------

The **Similarity** package provides utilities to compare textual or date-based inputs and determine how closely they match.
It is designed to be **extensible**, **locale-aware**, and **tolerant of noisy input**, which makes it a good fit for data deduplication, data cleaning, and record linkage.

The core idea is that different types of data (like names, text, or dates) require different similarity strategies, so the library offers multiple specialized classes.

---

Main Components
---------------

### `StringFuzzySimilarity`

Compares two strings with a fuzzy matching algorithm.
Useful when dealing with typos, transliteration differences, or minor formatting variations.

- **Normalization**: Removes punctuation, trims whitespace, and can optionally transliterate text to Latin.
- **Algorithm**: Combines `similar_text` and normalized Levenshtein distance.
- **Use case**: Matching names like `José` vs `Jose`, or `McDonald’s` vs `McDonalds`.

```
$similarity = new StringFuzzySimilarity(
    new StringNormalizer(
        new TransliteratorFactory()
    )
);
$result = $similarity->similarity('José García', 'Jose Garcia'); // e.g., 0.97
```
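The normalization and scoring steps described above can be sketched with PHP built-ins alone. This is a minimal illustration, not the library's implementation: `normalizeForMatching()` and `fuzzyScore()` are hypothetical names, and averaging the two measures with equal weight is an assumption, since the README does not say how `similar_text` and the Levenshtein distance are combined.

```php
<?php
declare(strict_types=1);

// Hypothetical helper, not the library's StringNormalizer:
// drop punctuation, collapse whitespace, trim.
function normalizeForMatching(string $value): string
{
    // Keep letters, digits and whitespace; remove everything else.
    $value = preg_replace('/[^\p{L}\p{N}\s]+/u', '', $value) ?? '';
    // Collapse runs of whitespace into a single space.
    $value = preg_replace('/\s+/u', ' ', $value) ?? '';
    return trim($value);
}

// Hypothetical scorer: equal-weight average (an assumption) of the
// similar_text() percentage and a length-normalized Levenshtein score.
// Both built-ins operate on bytes, not on Unicode characters.
function fuzzyScore(string $a, string $b): float
{
    if ($a === $b) {
        return 1.0;
    }
    // Non-zero here, because two empty strings are caught above.
    $maxLen = max(strlen($a), strlen($b));
    similar_text($a, $b, $percent); // $percent is in the range 0..100
    $levScore = 1.0 - levenshtein($a, $b) / $maxLen;
    return ($percent / 100 + $levScore) / 2;
}

$score = fuzzyScore(
    normalizeForMatching('McDonald’s'),
    normalizeForMatching('McDonalds')
);
```

Because the apostrophe is stripped during normalization, both inputs reduce to the same string and `$score` is `1.0`. Transliteration is left out of this sketch; without it, `José` and `Jose` would still differ at the byte level.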

---

### `DateFuzzySimilarity`

Compares two date values even if they use different formats or delimiters.

- **Normalization**: Converts supported formats (e.g., `YYYY-MM-DD`, `DD/MM/YYYY`, `MM.DD.YYYY`) into a canonical form.
- **Algorithm**: Computes a similarity score based on date component proximity (e.g., days, months, years).
- **Use case**: Matching `12-03-1990` and `1990/03/12` as the same date.

```
$similarity = new DateFuzzySimilarity(
    new DateFuzzySimilarityConfiguration(
        new DatePartsWeights(),
        new DateDiffPenalty()
    ),
    new DateNormalizer()
);
$result = $similarity->similarity('1990-03-12', '12/03/1990'); // 1.0
```
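To make the two stages concrete, here is a rough sketch of canonicalization followed by a weighted component comparison. Everything in it is an assumption made for illustration: the function names, the format list (ambiguous inputs are read day-first), and the 50/30/20 weights are hypothetical and do not reflect how `DateNormalizer`, `DatePartsWeights`, or `DateDiffPenalty` actually behave.

```php
<?php
declare(strict_types=1);

// Hypothetical format list; ambiguous inputs are read day-first (assumption).
const DATE_FORMATS = ['Y-m-d', 'Y/m/d', 'd/m/Y', 'd-m-Y', 'd.m.Y'];

// Returns the date as YYYY-MM-DD, or null if no format matches.
function toCanonicalDate(string $input): ?string
{
    foreach (DATE_FORMATS as $format) {
        // "!" resets unparsed fields so the current time does not leak in.
        $date = DateTimeImmutable::createFromFormat('!' . $format, $input);
        // The round-trip check rejects overflowed dates such as 31/02/1990.
        if ($date !== false && $date->format($format) === $input) {
            return $date->format('Y-m-d');
        }
    }
    return null;
}

// Sums the weights of the components (year, month, day) that agree.
function dateScore(string $a, string $b): float
{
    $left = toCanonicalDate($a);
    $right = toCanonicalDate($b);
    if ($left === null || $right === null) {
        return 0.0;
    }
    // Illustrative integer weights (percent) for year, month, day.
    $weights = [50, 30, 20];
    $leftParts = explode('-', $left);
    $rightParts = explode('-', $right);
    $total = 0;
    foreach ($weights as $i => $weight) {
        if ($leftParts[$i] === $rightParts[$i]) {
            $total += $weight;
        }
    }
    return $total / 100.0;
}
```

Integer weights avoid floating-point rounding when the parts are summed. A real implementation would likely also reward near misses, for example a day that is off by one, which this sketch treats as a plain mismatch.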

---

### `StringExactSimilarity`

Compares two strings for an **exact match** after normalization.

- **Normalization**: Cleans strings but does not introduce fuzziness.
- **Algorithm**: Returns `1.0` if normalized strings are identical, otherwise `0.0`.
- **Use case**: Validating IDs, codes, or fields that must match exactly.

```
$similarity = new StringExactSimilarity();
$result = $similarity->similarity('ABC123', 'abc123'); // 1.0
```
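The behaviour amounts to normalizing both sides and then comparing them strictly. A minimal sketch, assuming normalization means trimming and ASCII case folding (inferred only from the `'ABC123'` vs `'abc123'` example above; `exactScore()` is a hypothetical name, not part of the library):

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for an exact comparator: normalize, then compare.
function exactScore(string $a, string $b): float
{
    // Assumed normalization: trim surrounding whitespace, fold ASCII case.
    $normalize = static fn (string $s): string => strtolower(trim($s));
    return $normalize($a) === $normalize($b) ? 1.0 : 0.0;
}
```

There is no partial credit: `'ABC123'` and `'ABC124'` score `0.0` even though they differ by a single character.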

---

Factory Usage Example
---------------------

The library provides a simple factory for creating the right similarity strategy depending on the data type or context.

```
use Szopen\Similarity\SimilarityFactory;
// DateFuzzySimilarityConfiguration, DatePartsWeights, and DateDiffPenalty
// also need `use` imports; their namespaces are not shown in this README.

$factory = new SimilarityFactory(
    new DateFuzzySimilarityConfiguration(
        new DatePartsWeights(),
        new DateDiffPenalty(),
    )
);

// Request a comparator by its strategy constant
$stringSim = $factory->create(SimilarityFactory::STRING_FUZZY);
$dateSim = $factory->create(SimilarityFactory::DATE_FUZZY);

// Compute similarity (one score per line)
echo $stringSim->similarity('Leandro', 'Leandor'), PHP_EOL; // e.g., 0.9
echo $dateSim->similarity('2023-11-12', '12.11.2023'), PHP_EOL; // 1.0
```

You can extend or customize the factory to add your own similarity strategies.
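The README does not show the extension point itself, so the following is only a sketch of the general pattern. `SimilarityStrategy`, `PrefixSimilarity`, and `StrategyRegistry` are hypothetical names invented for this example; the library's real interface and factory hooks may look different.

```php
<?php
declare(strict_types=1);

// Hypothetical contract mirroring the similarity() method used above.
interface SimilarityStrategy
{
    public function similarity(string $a, string $b): float;
}

// Example custom strategy: shared prefix length over the longer length.
final class PrefixSimilarity implements SimilarityStrategy
{
    public function similarity(string $a, string $b): float
    {
        $maxLen = max(strlen($a), strlen($b));
        if ($maxLen === 0) {
            return 1.0; // two empty strings are identical
        }
        $limit = min(strlen($a), strlen($b));
        $common = 0;
        while ($common < $limit && $a[$common] === $b[$common]) {
            $common++;
        }
        return $common / $maxLen;
    }
}

// Hypothetical registry standing in for a customized factory.
final class StrategyRegistry
{
    /** @var array<string, callable(): SimilarityStrategy> */
    private array $builders = [];

    public function register(string $name, callable $builder): void
    {
        $this->builders[$name] = $builder;
    }

    public function create(string $name): SimilarityStrategy
    {
        if (!isset($this->builders[$name])) {
            throw new InvalidArgumentException("Unknown strategy: $name");
        }
        return ($this->builders[$name])();
    }
}

$registry = new StrategyRegistry();
$registry->register('prefix', static fn () => new PrefixSimilarity());
$prefixScore = $registry->create('prefix')->similarity('abcd', 'abxy');
```

Registering builders as callables keeps construction lazy, so a strategy with expensive dependencies is only instantiated when it is requested.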

---

Installation
------------

```
composer require szopen/similarity
```

---

Requirements
------------

- PHP 8.2+
- `ext-intl` (recommended for accurate transliteration)
- `ext-iconv` (recommended as a fallback when `ext-intl` is not available)
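The relationship between the two extensions can be illustrated with a fallback chain like the one below. This is a sketch under assumptions, not the library's `TransliteratorFactory`: `toAscii()` is a hypothetical helper, and the `Any-Latin; Latin-ASCII` transform and the `ASCII//TRANSLIT//IGNORE` target are choices made for this example.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: prefer ext-intl, fall back to ext-iconv,
// and return the input unchanged if neither is usable.
function toAscii(string $text): string
{
    if (class_exists(\Transliterator::class)) {
        // ICU compound transform: any script to Latin, then strip accents.
        $transliterator = \Transliterator::create('Any-Latin; Latin-ASCII');
        if ($transliterator !== null) {
            $result = $transliterator->transliterate($text);
            if ($result !== false) {
                return $result;
            }
        }
    }
    if (function_exists('iconv')) {
        // iconv output for accented characters varies by platform and locale.
        $result = @iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $text);
        if ($result !== false) {
            return $result;
        }
    }
    return $text;
}
```

With `ext-intl` loaded, ICU handles a wide range of scripts consistently. The `iconv` path depends on the underlying C library, which is why it works better as a fallback than as the primary option.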

---

License
-------

This project is licensed under the [MIT License](https://opensource.org/license/mit).

---

Contributing
------------

Contributions are welcome!
Please open an issue or submit a pull request if you’d like to add new normalization strategies or similarity metrics.


### Community

Maintainers: [leandro.luccerini](/maintainers/leandro.luccerini)

Top Contributors: [LeandroLuccerini](https://github.com/LeandroLuccerini) (21 commits)

---

Tags: levenshtein, similarity

### Code Quality

- Tests: PHPUnit
- Static Analysis: PHPStan
- Code Style: PHP CS Fixer
- Type Coverage: Yes

