
vladan-me/fingerprint
=====================

Provides a custom implementation of fingerprint and ngram algorithms in PHP

Latest version: 1.2.2. License: MIT. Requires PHP >= 5.4.0.

[Source](https://github.com/vladan-me/fingerprint) | [Packagist](https://packagist.org/packages/vladan-me/fingerprint)

Fingerprint
===========

[Fingerprint](https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth#fingerprint) is an algorithm developed for Google Refine (later OpenRefine). The optional improvements over the original algorithm are shown in bold.

- remove leading and trailing whitespace
- change all characters to their lowercase representation
- remove all punctuation and control characters
- normalize extended western characters to their ASCII representation (for example "gödel" → "godel")
- **apply synonyms**
- **apply removals**
- split the string into whitespace-separated tokens
- sort the tokens and remove duplicates
- join the tokens back together
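
The original steps above (without the optional bolded ones) can be sketched in plain PHP as follows. This is an illustrative approximation, not the package's implementation: the function name `sketchFingerprint` and the tiny transliteration map are assumptions made for this example, and a real map would cover far more characters.

```php
<?php
// Illustrative sketch only; not the code shipped in vladan-me/fingerprint.
// sketchFingerprint() and $asciiMap are hypothetical names.

function sketchFingerprint($input)
{
    // Remove leading and trailing whitespace.
    $s = trim($input);

    // Lowercase; use mbstring when available so non-ASCII letters are handled.
    $s = function_exists('mb_strtolower') ? mb_strtolower($s, 'UTF-8') : strtolower($s);

    // Remove punctuation and control characters (tab is kept as a separator).
    $s = preg_replace('/[\p{P}\x00-\x08\x0A-\x1F\x7F]+/u', '', $s);

    // Normalize a few extended western characters to ASCII (sample map only).
    $asciiMap = ['ä' => 'a', 'ö' => 'o', 'ü' => 'u', 'é' => 'e', 'ñ' => 'n'];
    $s = strtr($s, $asciiMap);

    // Split into whitespace-separated tokens, dropping empty ones.
    $tokens = preg_split('/\s+/', $s, -1, PREG_SPLIT_NO_EMPTY);

    // Remove duplicates, then sort.
    $tokens = array_unique($tokens);
    sort($tokens, SORT_STRING);

    // Join the tokens back together.
    return implode(' ', $tokens);
}

echo sketchFingerprint('qUick Brown FOX jumps over lazy dog.'), "\n"; // brown dog fox jumps lazy over quick
echo sketchFingerprint('  Gödel, Kurt '), "\n";                       // godel kurt
```

Because the tokens are sorted and de-duplicated, word order and repeated words in the input do not affect the resulting key.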

Transliteration is the slowest part of the original algorithm, and if you are dealing mostly with English text it is wasted effort. The original algorithm is also limited because it ignores synonyms and removals. The synonyms and removals in this package are based on English, so they have limited applicability to other languages. Consider titles like:

- VP Sales and Marketing
- Vice President Marketing & Sales
- Vice President of Sales and Marketing
- Vice President - Sales and Marketing

... and 100+ more ways to write that title, literally.
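
To illustrate why synonyms and removals matter here, the sketch below reduces the four titles above to a single key. The rule lists (`vp` as a synonym for `vice president`; `and` and `of` as removals) and the function name `titleKey` are assumptions chosen for this example, not the lists shipped with the package.

```php
<?php
// Illustrative sketch only; the rule lists and titleKey() are hypothetical.

function titleKey($title, array $synonyms, array $removals)
{
    // Trim, lowercase, and drop punctuation such as "&" and "-".
    $s = strtolower(trim($title));
    $s = preg_replace('/\p{P}+/u', '', $s);

    // Apply synonyms on whole words only.
    foreach ($synonyms as $from => $to) {
        $s = preg_replace('/\b' . preg_quote($from, '/') . '\b/', $to, $s);
    }

    // Apply removals on whole words only.
    foreach ($removals as $word) {
        $s = preg_replace('/\b' . preg_quote($word, '/') . '\b/', '', $s);
    }

    // Tokenize, de-duplicate, sort, and join.
    $tokens = array_unique(preg_split('/\s+/', $s, -1, PREG_SPLIT_NO_EMPTY));
    sort($tokens, SORT_STRING);

    return implode(' ', $tokens);
}

$synonyms = ['vp' => 'vice president'];
$removals = ['and', 'of'];

$titles = [
    'VP Sales and Marketing',
    'Vice President Marketing & Sales',
    'Vice President of Sales and Marketing',
    'Vice President - Sales and Marketing',
];

foreach ($titles as $title) {
    // Each line prints: marketing president sales vice
    echo titleKey($title, $synonyms, $removals), "\n";
}
```

Without the synonym and removal steps, `vp`, `of`, and `and` would remain as tokens and the four titles would produce different keys.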

Use cases
---------

- Simple and fast clustering of data.
- Standardizing and grouping similar values in a database.
- Situations where you have users typing city/company/street/title in so many ways and you're slowly dying inside with so many combinations...
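
As a rough illustration of the clustering use case, values can be bucketed by a fingerprint-style key so that variants of the same entry land in one group. The helper `clusterByKey` and the simplified key function below are assumptions for this sketch; in practice the key would come from this package's `fingerprint()` call.

```php
<?php
// Illustrative sketch only; clusterByKey() and $simpleKey are hypothetical.

function clusterByKey(array $values, callable $keyFn)
{
    $clusters = [];
    foreach ($values as $value) {
        // Values that share a key end up in the same bucket.
        $clusters[$keyFn($value)][] = $value;
    }
    return $clusters;
}

// Simplified stand-in for a fingerprint: lowercase, strip punctuation,
// then sort the unique tokens.
$simpleKey = function ($value) {
    $s = preg_replace('/\p{P}+/u', '', strtolower(trim($value)));
    $tokens = array_unique(preg_split('/\s+/', $s, -1, PREG_SPLIT_NO_EMPTY));
    sort($tokens, SORT_STRING);
    return implode(' ', $tokens);
};

$cities = ['New York', 'new york.', 'York, New', 'Boston', ' BOSTON '];
$clusters = clusterByKey($cities, $simpleKey);

// Two clusters: 'new york' (3 values) and 'boston' (2 values).
print_r(array_keys($clusters));
```

Grouping by key takes a single pass over the data, which is what keeps this kind of clustering fast compared with pairwise similarity checks.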

Documentation
-------------

Create a `FingerprintType` instance and pass it, together with the input string, to the `Fingerprint` constructor.

```
    $type = new FingerprintType();
    $string1 = 'Quick brown fox jumps over lazy dog';
    $fp = new Fingerprint($string1, $type);
    $fingerprintResult1 = $fp->fingerprint();
    // $fingerprintResult1 is 'brown dog fox jumps lazy over quick'.

    $string2 = 'qUick Brown FOX jumps over lazy dog.';
    $fp = new Fingerprint($string2, $type);
    $fingerprintResult2 = $fp->fingerprint();
    // $fingerprintResult2 is 'brown dog fox jumps lazy over quick'.
    // Also $fingerprintResult1 == $fingerprintResult2
```

More advanced usage relies on specific types, for example:

```
    $type = new City();
    $string1 = 'Elk Grove Vlg';
    $fp = new Fingerprint($string1, $type);
    // Include all available synonyms for city type.
    $fp->includeAllSyn();
    $fingerprintResult1 = $fp->fingerprint();
    // $fingerprintResult1 is 'elk grove village'.

    $string2 = '/Elk Grove Village';
    $fp = new Fingerprint($string2, $type);
    // Include all available synonyms for city type.
    $fp->includeAllSyn();
    $fingerprintResult2 = $fp->fingerprint();
    // $fingerprintResult2 is 'elk grove village'.
    // Also $fingerprintResult1 == $fingerprintResult2
```

See the tests for more examples of common usage.

Synonyms and Removals
---------------------

Synonyms and removals are split into two categories: a basic set that covers the most common cases, and a full set with all other combinations, which is heavier to compute. For the fastest results, you don't need the full set. All entries were handpicked from clusters found in a large dataset. Many more exist, but only the ones that make sense are listed. In some cases a term is both a synonym and a removal, for example, for the Company type:

'corp' first becomes 'corporation' and then is removed completely.
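
The two-tier idea and the 'corp' example can be sketched as follows. The list contents and the function name `applyCompanyRules` are assumptions made for illustration; the package's actual lists and API may differ.

```php
<?php
// Illustrative sketch only; list contents and names are hypothetical.

$basicSynonyms = ['corp' => 'corporation', 'inc' => 'incorporated'];
$extraSynonyms = ['co' => 'company', 'ltd' => 'limited'];
$basicRemovals = ['corporation', 'incorporated'];
$extraRemovals = ['company', 'limited'];

function applyCompanyRules(array $tokens, array $synonyms, array $removals)
{
    $result = [];
    foreach ($tokens as $token) {
        // Synonym step: expand the token if a synonym is known.
        if (isset($synonyms[$token])) {
            $token = $synonyms[$token];
        }
        // Removal step: runs after synonyms, so 'corp' is dropped as 'corporation'.
        if (!in_array($token, $removals, true)) {
            $result[] = $token;
        }
    }
    return $result;
}

$tokens = ['acme', 'corp', 'ltd'];

// Basic lists only: fewer rules to check, 'ltd' is left alone.
$basic = applyCompanyRules($tokens, $basicSynonyms, $basicRemovals);

// All lists: basic and extra rules merged.
$all = applyCompanyRules(
    $tokens,
    $basicSynonyms + $extraSynonyms,
    array_merge($basicRemovals, $extraRemovals)
);

echo implode(' ', $basic), "\n"; // acme ltd
echo implode(' ', $all), "\n";   // acme
```

Running synonyms before removals means a removal list only needs the expanded form of each word, not every abbreviation of it.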

System Requirements
-------------------

You need **PHP >= 5.4.0**.

Install
-------

Install `fingerprint` using Composer.

```
$ composer require vladan-me/fingerprint
```

Additional Notes
----------------

There's another package named [fingerprint-elasticsearch](https://github.com/vladan-me/fingerprint-elasticsearch) that prepares an Elasticsearch analyzer and filters to use this version of the fingerprint algorithm. This project currently also contains an ngram implementation, which should probably be split into a separate package at some point.

Contributing
------------

Contributions are welcome and will be fully credited. Please see [CONTRIBUTING](.github/CONTRIBUTING.md) and [CONDUCT](CONDUCT.md) for details.

License
-------

The MIT License (MIT). Please see [LICENSE](LICENSE) for more information.
