legitphp/hash-money
===================

A performance-oriented PHP package for image hashing algorithms using VIPS.

v1.3.1 · MIT license · PHP ^8.3 · [Source](https://github.com/mferrara/hash-money) · [Packagist](https://packagist.org/packages/legitphp/hash-money)

Hash Money 💰
============

[![Latest Version on Packagist](https://camo.githubusercontent.com/4c66561525fc60aa7669e3b216cb0223ffce4f336b90020d1f0aff52ae6a0c7c/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f762f6c656769747068702f686173682d6d6f6e65792e7376673f7374796c653d666c61742d737175617265)](https://packagist.org/packages/legitphp/hash-money)[![Tests](https://camo.githubusercontent.com/3187b4c9d732239a21f881774588fa3c2fd13d30ffdf083ad0dc65e744728671/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f616374696f6e732f776f726b666c6f772f7374617475732f6d666572726172612f686173682f72756e2d74657374732e796d6c3f6272616e63683d6d61696e266c6162656c3d7465737473267374796c653d666c61742d737175617265)](https://github.com/mferrara/hash/actions/workflows/run-tests.yml)[![Total Downloads](https://camo.githubusercontent.com/42adbf16fe156ad28e165fc6d0f3434e31ed545a5985377741faddb271bc1aaf/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f64742f6c656769747068702f686173682d6d6f6e65792e7376673f7374796c653d666c61742d737175617265)](https://packagist.org/packages/legitphp/hash-money)[![PHP Version](https://camo.githubusercontent.com/761f4057880007ac7a0f46a0123075cc07409af0000e2733f6b5847bc05252df/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f7068702d762f6c656769747068702f686173682d6d6f6e65793f7374796c653d666c61742d737175617265)](https://packagist.org/packages/legitphp/hash-money)

**Cache rules everything around me.**

Onions? Eggs? What do you like with your hash?

Hash Money
==========

We're serving up a performance-oriented and opinionated collection of similarity hashing algorithms for PHP. Whether you're comparing images, finding duplicates, or measuring how alike things are - we got you covered. We're riding dirty with [php-vips](https://github.com/libvips/php-vips) for maximum speed. Get your FFI poppin'.

Features
--------

- 🚀 **Multiple Algorithms**: Perceptual (pHash), Difference (dHash), Color Histogram, Mashed, Block Mean, and PDQ hashes
- 🧬 **Composite Hashes**: Concatenate several algorithms into a wider multi-view fingerprint (e.g. 256-bit `pHash + dHash + ColorHistogram + BlockMean`)
- 🔎 **LSH + MIH helpers**: Split hashes into band / chunk keys for indexed similarity search at scale
- 🔒 **Type Safety**: Value objects ensure you can't compare incompatible hashes
- 🎯 **Configurable Bit Sizes**: 8 / 16 / 32 / 64-bit integer hashes, plus wider 128 / 256-bit hashes via `HashValue::fromBytes()`
- ⚡ **High Performance**: Optimized VIPS operations for speed
- 🛠️ **Clean API**: Simple static methods with full IDE support
- 🧩 **Extensible**: Strategy pattern makes adding new algorithms easy

Algorithms
----------

### Perceptual Hash (pHash)

DCT-based algorithm that's robust to scaling, aspect ratio changes, and minor color variations. Best for finding near-duplicate images.

- Uses Discrete Cosine Transform (DCT)
- More computationally intensive but highly accurate
- Excellent for matching images with color/brightness variations
- Based on the work from [VincentChalnot/PerceptualHash](https://github.com/VincentChalnot/perceptual-hash)

### Difference Hash (dHash)

Gradient-based algorithm that's faster than pHash and good at detecting similar images. It works by comparing adjacent pixels to encode the image structure.

- Analyzes gradient changes between adjacent pixels
- Faster computation than pHash
- Good for detecting cropped or slightly modified images
- More sensitive to rotation than pHash
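
The adjacent-pixel comparison can be sketched in plain PHP without libvips. This is a minimal illustration, not the library's implementation: the function name `dhashBits()` is hypothetical, and the convention "bit is 1 when the left pixel is brighter than its right neighbour" is an assumption.

```php
<?php
// Hypothetical helper: turn a small grayscale grid into dHash-style bits.
// Each row yields (columns - 1) bits, one per pair of horizontal neighbours.
function dhashBits(array $pixels): string
{
    $bits = '';
    foreach ($pixels as $row) {
        for ($x = 0; $x < count($row) - 1; $x++) {
            // Assumed convention: 1 when brightness drops from left to right.
            $bits .= $row[$x] > $row[$x + 1] ? '1' : '0';
        }
    }
    return $bits;
}

// 2 rows x 5 columns -> 2 x 4 = 8 bits.
$grid = [
    [10, 20, 15, 15, 90],
    [200, 100, 120, 110, 110],
];

echo dhashBits($grid), "\n"; // 01001010
```

Because only the sign of each gradient is kept, a uniform brightness shift leaves every bit unchanged.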

### Color Histogram Hash

Color distribution-based algorithm that captures global color patterns in images. Particularly effective for finding images with similar color palettes.

- Uses HSV color space for robustness to illumination changes
- Quantizes colors into bins (8×4×4 by default)
- Excellent for detecting color-shifted or filtered variants
- Complements spatial hashes by focusing on color information
- **Enhanced bit distribution**: Now uses all 64 bits effectively with proper mixing
- **Improved uniqueness**: Fixed algorithm provides much better hash diversity
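
To make the 8×4×4 quantization concrete, here is a rough sketch of HSV binning in plain PHP. It assumes the 8 bins go to hue and 4 each to saturation and value, which the text above does not spell out, and the helper names (`rgbToHsv()`, `hsvBin()`, `histogram()`) are hypothetical. How the library folds the 128-bin histogram into a 64-bit hash is not shown here.

```php
<?php
// Convert 8-bit RGB to HSV: h in [0, 360), s and v in [0, 1].
function rgbToHsv(int $r, int $g, int $b): array
{
    $rf = $r / 255;
    $gf = $g / 255;
    $bf = $b / 255;
    $max = max($rf, $gf, $bf);
    $delta = $max - min($rf, $gf, $bf);

    if ($delta == 0) {
        $h = 0.0; // gray: hue is undefined, use 0
    } elseif ($max == $rf) {
        $h = 60 * fmod(($gf - $bf) / $delta + 6, 6);
    } elseif ($max == $gf) {
        $h = 60 * (($bf - $rf) / $delta + 2);
    } else {
        $h = 60 * (($rf - $gf) / $delta + 4);
    }
    $s = $max == 0 ? 0.0 : $delta / $max;

    return [$h, $s, $max];
}

// Map a pixel to one of 8 x 4 x 4 = 128 bins (assumed split: hue 8, sat 4, value 4).
function hsvBin(int $r, int $g, int $b): int
{
    [$h, $s, $v] = rgbToHsv($r, $g, $b);
    $hBin = min(7, (int) floor($h / 360 * 8));
    $sBin = min(3, (int) floor($s * 4));
    $vBin = min(3, (int) floor($v * 4));

    return ($hBin * 4 + $sBin) * 4 + $vBin;
}

// Count how many pixels land in each bin.
function histogram(array $pixels): array
{
    $bins = array_fill(0, 128, 0);
    foreach ($pixels as [$r, $g, $b]) {
        $bins[hsvBin($r, $g, $b)]++;
    }
    return $bins;
}

$bins = histogram([[255, 0, 0], [255, 0, 0], [0, 0, 255]]);
echo $bins[hsvBin(255, 0, 0)], "\n"; // 2 (both red pixels share a bin)
```

Spatial position never enters the histogram, which is why this hash complements pHash and dHash rather than replacing them.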

### MashedHash 🥔

A comprehensive image fingerprint that "mashes" together multiple image characteristics into a single 64-bit hash. This algorithm analyzes 11 different aspects of an image to create a rich signature that captures both content and style.

**Bit Layout (64 bits total):**

- **Bits 0-3**: Colorfulness level (0-15) - Detects grayscale vs vibrant images
- **Bits 4-7**: Edge density (0-15) - Measures detail and texture complexity
- **Bits 8-11**: Entropy/complexity (0-15) - Identifies simple vs complex compositions
- **Bits 12-14**: Aspect ratio class (0-7) - Captures image orientation and format
- **Bit 15**: Border flag - Detects images with uniform borders (common in social media)
- **Bits 16-31**: Color distribution (16 bits) - Analyzes RGB channel characteristics
- **Bits 32-39**: Spatial color layout (8 bits) - Tracks dominant colors by quadrant
- **Bits 40-47**: Brightness pattern (8 bits) - Encodes luminance distribution
- **Bits 48-55**: Texture features (8 bits) - Captures directional patterns
- **Bits 56-59**: Dominant color count (0-15) - Estimates color palette size
- **Bits 60-63**: Special indicators (4 bits) - Flags for text, uniform regions, etc.

**Why use MashedHash?**

- **Rich metadata**: Unlike single-feature hashes, it captures multiple image properties
- **Versatile matching**: Can identify similar images even with different modifications
- **Social media ready**: Detects common edits like borders, filters, and crops
- **Fast comparison**: Despite encoding 11 features, it's still just a 64-bit integer
- **Complementary**: Works best when combined with pHash or dHash for robust matching

**Gray coding note:** Ordinal integer fields (colorfulness, edge density, entropy, aspect ratio, RGB channel levels, brightness, texture, dominant colors) are stored in reflected-binary Gray code so adjacent quantization levels differ by exactly one bit. Reading raw bit fields directly sees the Gray-coded representation; use `MashedHash::decode($hash)` to read semantic values.
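
The effect of Gray coding on a packed field can be sketched as follows. This is only an illustration: `grayEncode()`, `grayDecode()`, and `extractField()` are hypothetical helpers, and treating bit 0 as the least significant bit is an assumption. In real code, prefer `MashedHash::decode($hash)`.

```php
<?php
// Reflected-binary Gray code: adjacent integers differ in exactly one bit.
function grayEncode(int $n): int
{
    return $n ^ ($n >> 1);
}

// Inverse transform: XOR together all right-shifts of the Gray value.
function grayDecode(int $g): int
{
    $n = 0;
    for (; $g > 0; $g >>= 1) {
        $n ^= $g;
    }
    return $n;
}

// Pull a $width-bit field starting at bit $lowBit (bit 0 = least significant, assumed).
function extractField(int $hash, int $lowBit, int $width): int
{
    return ($hash >> $lowBit) & ((1 << $width) - 1);
}

// Pack two example fields: colorfulness (bits 0-3) and edge density (bits 4-7).
$colorfulness = 7;
$edgeDensity = 8;
$hash = grayEncode($colorfulness) | (grayEncode($edgeDensity) << 4);

$raw = extractField($hash, 4, 4);
printf("stored bits 4-7: %04b\n", $raw);           // 1100
printf("decoded level:   %d\n", grayDecode($raw)); // 8
```

Plain binary 7 (`0111`) and 8 (`1000`) differ in four bits, while their Gray codes (`0100` and `1100`) differ in one, so a one-step quantization change costs one bit of Hamming distance.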

### Block Mean Hash

Spatial-domain fingerprint: resize the image to a √bits × √bits grayscale grid, compute the mean luminance of the whole grid, then set one bit per cell based on whether that cell is brighter than the overall mean.

- Supports 8 / 16 / 32 / 64 / 128 / 256-bit output
- Retains "where the bright/dark regions live" through luminance changes and JPEG re-encoding
- Statistically independent from pHash (frequency-domain) and dHash (gradient-domain) — ideal as a 4th chunk in a composite hash
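
The thresholding step can be sketched on an already-resized grid. This is a simplified illustration in plain PHP, not the library's code: `blockMeanBits()` is a hypothetical name and the row-major bit order is an assumption.

```php
<?php
// Hypothetical helper: one bit per cell, set when the cell is brighter
// than the mean of the whole grid. Bits are emitted in row-major order (assumed).
function blockMeanBits(array $grid): string
{
    $cells = array_merge(...$grid);
    $mean = array_sum($cells) / count($cells);

    $bits = '';
    foreach ($cells as $cell) {
        $bits .= $cell > $mean ? '1' : '0';
    }
    return $bits;
}

// A 4 x 4 grayscale grid gives a 16-bit hash (sqrt(16) = 4 cells per side).
$grid = [
    [10, 200, 30, 220],
    [15, 210, 25, 230],
    [12, 190, 35, 240],
    [8, 205, 20, 215],
];

echo blockMeanBits($grid), "\n"; // 0101010101010101
```

Adding a constant to every cell shifts the mean by the same amount, so the bits stay put under a uniform brightness change.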

### PDQ Hash

Industrial-scale 256-bit perceptual hash from [Meta ThreatExchange](https://github.com/facebook/ThreatExchange/tree/main/pdq). Closely related to pHash but designed for billions-scale matching:

- **Larger output** — 256 bits vs. pHash's 64. Far lower birthday-collision rate at scale; you can store millions of hashes without the false-positive drift you'd see at 64 bits.
- **Quality metric** — alongside the hash, PDQ reports a gradient-derived reliability score in \[0, 100\]. Meta recommends discarding hashes with quality below 50 (uniform/blurry images, where the median threshold isn't meaningful). Accessible via `PdqHash::quality($hash)` or `$hash->getMetadata('pdq_quality')`.
- **Eight dihedral hashes** — `PdqHash::hashesFromFile()` returns all rotation and flip variants in a single pass.
- **Recommended match threshold** — Hamming distance ≤ 31 of 256 bits.
- **Pipeline** — Rec. 601 luminance → two-pass Jarosz 1-D box filter (a tent approximation, Wojciech Jarosz, "Fast Image Convolutions", SIGGRAPH 2001) → 64×64 decimation → 16×16 DCT-II → median threshold.

This is an independent port of Meta's BSD-3-Clause reference. See [LICENSE.md](LICENSE.md) for the third-party notice.
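
For intuition about the match threshold, here is a small sketch of Hamming distance over hex strings in plain PHP. `hammingDistanceHex()` is a hypothetical helper, not part of the package; with the library itself you would call `PdqHash::distance()` on `HashValue` objects.

```php
<?php
// Hamming distance between two equal-length hex strings (64-bit, 256-bit, ...).
function hammingDistanceHex(string $a, string $b): int
{
    if (strlen($a) !== strlen($b) || strlen($a) % 2 !== 0
        || !ctype_xdigit($a) || !ctype_xdigit($b)) {
        throw new InvalidArgumentException('Expected two even-length hex strings of equal size.');
    }

    // PHP XORs two strings byte by byte.
    $xor = hex2bin($a) ^ hex2bin($b);

    // Count the set bits in every byte of the XOR result.
    $distance = 0;
    foreach (unpack('C*', $xor) as $byte) {
        $distance += substr_count(decbin($byte), '1');
    }
    return $distance;
}

$stored = str_repeat('0', 64);        // 256-bit hash, all bits zero
$query = str_repeat('0', 62) . 'ff';  // same hash with the last 8 bits flipped

$distance = hammingDistanceHex($stored, $query);
echo $distance, "\n";                              // 8
echo $distance <= 31 ? "match\n" : "no match\n";   // match
```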

### Composite Hash

Concatenates several algorithms' 64-bit output into one wider fingerprint with independent signal types in each chunk. The default composition is the 256-bit "quartet" `pHash + dHash + ColorHistogram + BlockMean`:

```
use LegitPHP\HashMoney\CompositeHash;

$composite = CompositeHash::default();
$hash = $composite->hashFromFile('/path/to/image.jpg');

echo $hash->getBits();     // 256
echo $hash->toHex();       // 64 hex chars
echo $hash->getAlgorithm();// "composite:perceptual+dhash+color-histogram+block-mean"
```

Chunk boundaries are preserved in the `HashValue`'s `chunks` metadata so LSH helpers can band each chunk separately. See the [LSH helpers](#scaling-to-large-datasets-lsh--mih) section below.
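
Conceptually, concatenation with preserved boundaries looks like the sketch below. The array shape, key names (`offset_bits`, `bits`), and helpers (`concatChunks()`, `sliceChunk()`) are assumptions made for illustration. The actual layout of the `chunks` metadata is not documented in this README.

```php
<?php
// Hypothetical helper: join named hex chunks and remember where each one starts.
function concatChunks(array $chunks): array
{
    $hex = '';
    $layout = [];
    foreach ($chunks as $name => $chunkHex) {
        $layout[$name] = [
            'offset_bits' => strlen($hex) * 4, // 4 bits per hex character
            'bits' => strlen($chunkHex) * 4,
        ];
        $hex .= $chunkHex;
    }
    return ['hex' => $hex, 'chunks' => $layout];
}

// Hypothetical helper: recover one chunk from the composite by name.
function sliceChunk(array $composite, string $name): string
{
    $chunk = $composite['chunks'][$name];
    return substr(
        $composite['hex'],
        intdiv($chunk['offset_bits'], 4),
        intdiv($chunk['bits'], 4)
    );
}

// Four made-up 64-bit hashes, one per algorithm.
$composite = concatChunks([
    'perceptual' => 'f0e1d2c3b4a59687',
    'dhash' => '1cf0e2a3b4596d87',
    'color-histogram' => 'a1b2c3d4e5f60718',
    'block-mean' => '0123456789abcdef',
]);

echo strlen($composite['hex']) * 4, "\n";     // 256
echo sliceChunk($composite, 'dhash'), "\n";   // 1cf0e2a3b4596d87
```

Keeping the boundaries means each chunk can be compared or banded on its own, so a color shift that scrambles the histogram chunk does not have to disqualify a match on the other three.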

Requirements
------------

- PHP 8.3+ (64-bit required)
- [libvips](https://github.com/libvips/libvips) 8.7+
- [php-vips](https://github.com/libvips/php-vips) extension 2.5+

Installation
------------

You can install the package via composer:

```
composer require legitphp/hash-money
```

### Installing libvips

**Ubuntu/Debian:**

```
sudo apt install libvips-dev
```

**macOS:**

```
brew install vips
```

**Then install the PHP extension:**

```
pecl install vips
```

### Versioning

Hash Money follows [semantic versioning](https://semver.org/). Pin with `"legitphp/hash-money": "^1.1"` to take patch releases and additive minor releases automatically while opting in to majors deliberately.

If you're rolling out at scale (~100K+ images), also read [`PRE_BATCH_REVIEW.md`](PRE_BATCH_REVIEW.md) before generating a production dataset — the guide covers version prerequisites, timing calibration, distribution/collision checks, and a go/no-go checklist.

Usage
-----

### Basic Usage

```
use LegitPHP\HashMoney\PerceptualHash;
use LegitPHP\HashMoney\DHash;
use LegitPHP\HashMoney\ColorHistogramHash;
use LegitPHP\HashMoney\MashedHash;
use LegitPHP\HashMoney\PdqHash;

// Generate a perceptual hash
$pHash = PerceptualHash::hashFromFile('/path/to/image.jpg');
echo $pHash->toHex(); // e.g., "f0e1d2c3b4a59687"

// Generate a difference hash
$dHash = DHash::hashFromFile('/path/to/image.jpg');
echo $dHash->toBinary(); // e.g., "1010101100110011..."

// Generate a color histogram hash
$colorHash = ColorHistogramHash::hashFromFile('/path/to/image.jpg');
echo $colorHash->toHex(); // e.g., "a1b2c3d4e5f60718"

// Generate a MashedHash (comprehensive fingerprint)
$mHash = MashedHash::hashFromFile('/path/to/image.jpg');
echo $mHash->toHex(); // e.g., "1cf0e2a3b4596d87"

// Generate a PDQ hash (256-bit, with quality score)
$pdqHash = PdqHash::hashFromFile('/path/to/image.jpg');
echo $pdqHash->toHex(); // 64 hex chars
echo PdqHash::quality($pdqHash); // 0-100; Meta recommends discarding < 50

// Compare images
$hash1 = PerceptualHash::hashFromFile('/path/to/image1.jpg');
$hash2 = PerceptualHash::hashFromFile('/path/to/image2.jpg');
$distance = PerceptualHash::distance($hash1, $hash2);

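if ($distance <= 10) { // illustrative threshold; tune it for your dataset
    echo "Images are similar\n";
}
```

### Scaling to Large Datasets (LSH + MIH)

For indexed similarity search, split a hash into band keys with the LSH helpers:

```
use LegitPHP\HashMoney\CompositeHash;

$composite = CompositeHash::default();
$hash = $composite->hashFromFile('image.jpg');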

// 4 bands per chunk → 16 total bucket keys for the 256-bit quartet.
$bucketKeys = Lsh::bandsByChunk($hash, bandsPerChunk: 4);
// [
//   'perceptual'       => [k1, k2, k3, k4],
//   'dhash'            => [k5, k6, k7, k8],
//   'color-histogram'  => [k9, k10, k11, k12],
//   'block-mean'       => [k13, k14, k15, k16],
// ]

// Flat banding (no chunk awareness):
$flatKeys = Lsh::bands($hash, bandCount: 16);
```

Tune bands vs. chunks against your dataset: more bands raise recall but inflate the candidate set. A reasonable starting point for a 256-bit composite is 16 bands of 16 bits each (B=16, R=16 → 65,536 possible bucket values per band).
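
The banding arithmetic can be sketched in plain PHP. This is not how `Lsh::bands()` is implemented or what its keys look like; `bandKeys()` and the `index:hex` key format are hypothetical, chosen only to show why B=16 and R=16 gives 2^16 = 65,536 possible values per band.

```php
<?php
// Hypothetical helper: split a hex hash into $bandCount equal bands.
// Each key is prefixed with its band index so band 0 never collides with band 5.
function bandKeys(string $hex, int $bandCount): array
{
    if ($bandCount < 1 || strlen($hex) % $bandCount !== 0) {
        throw new InvalidArgumentException('Hash length must divide evenly into bands.');
    }

    $hexPerBand = intdiv(strlen($hex), $bandCount);
    $keys = [];
    foreach (str_split($hex, $hexPerBand) as $index => $band) {
        $keys[] = $index . ':' . $band;
    }
    return $keys;
}

$a = str_repeat('0123456789abcdef', 4); // 256-bit hash as 64 hex chars
$b = substr($a, 0, 63) . '0';           // same hash with the last hex digit changed

$keysA = bandKeys($a, 16);
$keysB = bandKeys($b, 16);

echo count($keysA), "\n";                          // 16 bands
echo 2 ** (strlen($a) * 4 / 16), "\n";             // 65536 possible values per band
echo count(array_intersect($keysA, $keysB)), "\n"; // 15 shared bands
```

Two hashes become candidates as soon as they share any band key, so the small change to `$b` still leaves 15 of 16 lookups hitting. The exact Hamming distance is then checked only on that candidate set.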

Candidate-filtering is usually done in the database; see [`docs/LARAVEL_BRIDGE.md`](docs/LARAVEL_BRIDGE.md) for a proposed Laravel package that wires all of this (Eloquent cast, migrations, scopes, pluggable MySQL-chunked / MySQL-banded drivers) on top of this library.

### Real-World Examples

#### Database Storage Pattern

Store and retrieve hashes efficiently:

```
// Storing in database
$hash = MashedHash::hashFromFile('product-image.jpg');
$data = [
    'image_id' => 12345,
    'hash_value' => $hash->getValue(),      // Store as BIGINT
    'hash_hex' => $hash->toHex(),          // Store as CHAR(16) for 64-bit
    'algorithm' => $hash->getAlgorithm(),   // Store algorithm type
    'metadata' => json_encode([
        'original_name' => 'product-image.jpg',
        'processed_at' => date('Y-m-d H:i:s')
    ])
];

// Retrieving from database
$row = $db->fetchRow("SELECT * FROM image_hashes WHERE image_id = ?", [12345]);
$hash = new HashValue(
    $row['hash_value'],
    64,
    $row['algorithm'],
    json_decode($row['metadata'], true)
);

// Or use hex value
$hash = HashValue::fromHex($row['hash_hex'], 64, $row['algorithm']);
```

#### API Integration

Send and receive hashes via APIs:

```
// Sending hash data
$hash = ColorHistogramHash::hashFromFile('image.jpg');
$apiPayload = [
    'image_hash' => $hash->toUrlSafeBase64(), // URL-safe for GET requests
    'algorithm' => $hash->getAlgorithm(),
    'bits' => $hash->getBits()
];

$response = $httpClient->post('/api/check-duplicate', [
    'json' => $apiPayload
]);

// Receiving and reconstructing
$data = json_decode($response->getBody(), true);
$receivedHash = HashValue::fromBase64(
    $data['image_hash'],
    $data['bits'],
    $data['algorithm']
);
```

#### Duplicate Detection System

Build a complete duplicate detection workflow:

```
class ImageDuplicateDetector
{
    private array $hashDatabase = [];

    public function addImage(string $path): void
    {
        // Generate multiple hash types for robust matching
        $pHash = PerceptualHash::hashFromFile($path);
        $dHash = DHash::hashFromFile($path);
        $mHash = MashedHash::hashFromFile($path);

        // Store with metadata
        $this->hashDatabase[$path] = [
            'perceptual' => $pHash->withMetadata(['path' => $path]),
            'dhash' => $dHash->withMetadata(['path' => $path]),
            'mashed' => $mHash->withMetadata(['path' => $path]),
            'added_at' => time()
        ];
    }

    public function findDuplicates(string $imagePath, int $threshold = 10): array
    {
        $candidates = [];
        $testHashes = [
            'perceptual' => PerceptualHash::hashFromFile($imagePath),
            'dhash' => DHash::hashFromFile($imagePath),
            'mashed' => MashedHash::hashFromFile($imagePath)
        ];

        foreach ($this->hashDatabase as $storedPath => $storedHashes) {
            $scores = [
                'perceptual' => $testHashes['perceptual']->hammingDistance($storedHashes['perceptual']),
                'dhash' => $testHashes['dhash']->hammingDistance($storedHashes['dhash']),
                'mashed' => $testHashes['mashed']->hammingDistance($storedHashes['mashed'])
            ];

            // Weighted scoring
            $totalScore = ($scores['perceptual'] * 2 + $scores['dhash'] + $scores['mashed']) / 4;

            if ($totalScore <= $threshold) {
                $candidates[] = [
                    'path' => $storedPath,
                    'score' => $totalScore,
                    'individual_scores' => $scores
                ];
            }
        }

        // Sort by similarity
        usort($candidates, fn($a, $b) => $a['score'] <=> $b['score']);

        return $candidates;
    }
}
```

Example Scripts and Benchmarks
------------------------------

### Hash Generation Example

The package includes a comprehensive example script for testing hash generation:

```
# Test all algorithms with 64-bit hashes
php example.php

# Test specific algorithm and bit size
php example.php perceptual 32
php example.php dhash 16
php example.php color 64
php example.php all 64
```

Testing
-------

Run the test suite using Pest:

```
composer test
```

For code formatting:

```
composer format
```

Performance Considerations
--------------------------

- **DHash** is typically 2-3x faster than **Perceptual Hash**
- **Color Histogram Hash** is comparable to DHash in speed
- **MashedHash** is slightly slower but provides the richest feature set
- **PDQ** is the slowest of the algorithms (~150–200 ms/image at the default 512×512 working size) — the Jarosz tent filter and 16×16 DCT run in pure PHP, not libvips. Tune via `PdqHash::configure(['workingSize' => 256])` to halve the cost at the expense of less Jarosz blur. Worth the headroom for production near-dup detection where 256-bit signal and quality filtering matter.
- Smaller bit sizes compute faster but may reduce accuracy
- VIPS caching significantly improves performance for batch operations
- The package automatically detects CPU cores for optimal concurrency

Rolling Out at Scale
--------------------

Before generating hashes for a large production dataset (~100K images or more), run the calibration and validation steps in [`PRE_BATCH_REVIEW.md`](PRE_BATCH_REVIEW.md) **inside your consumer project**. The guide covers version prerequisites, timing extrapolation, hash-distribution sanity checks, known-pair validation, failure-mode probes, database/query-plan review, and a go/no-go checklist. It's intended to be executable by either a human operator or an AI coding agent working in the consumer repo, and produces a single report that justifies the decision to run the full batch.

Use Cases
---------

- **Duplicate Detection**: Find exact or near-duplicate images in large collections
- **Content Moderation**: Detect previously flagged images even after modifications
- **Image Organization**: Group similar images automatically
- **Copyright Protection**: Identify unauthorized use of images
- **Quality Control**: Detect corrupted or incorrectly processed images

Choosing the Right Hash
-----------------------

| Hash Type | Best For | Speed | Key Features |
|-----------|----------|-------|--------------|
| **pHash** | Near-duplicate detection, scaled/compressed variants | Medium | Robust to compression, scaling, minor edits |
| **dHash** | Quick similarity checks, cropped images | Fast | Good for crops, sensitive to rotation |
| **ColorHistogram** | Color-based matching, filter detection | Fast | Catches recolored/filtered versions |
| **MashedHash** | Reducing false positives (as augmenting signal) | Medium | 11 Gray-coded features, read via `decode()` |
| **BlockMean** | Spatial layout fingerprint | Fast | Orthogonal to pHash/dHash, 8/16/…/256-bit |
| **PDQ** | Large-scale near-duplicate detection (256-bit) | Slower | 256-bit, gradient-based **quality metric**, 8 dihedral variants |
| **Composite** | LSH-friendly multi-view fingerprint | Medium | Chunks carry independent signal types |

### Recommended Combinations

**For social media images:**

```
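// Use MashedHash + pHash for best results.
// $image1 and $image2 are placeholder paths for the two files being compared.
$mHash1 = MashedHash::hashFromFile($image1);
$mHash2 = MashedHash::hashFromFile($image2);
$pHash1 = PerceptualHash::hashFromFile($image1);
$pHash2 = PerceptualHash::hashFromFile($image2);

if (MashedHash::distance($mHash1, $mHash2) < 20 &&
    PerceptualHash::distance($pHash1, $pHash2) < 12) {
    // High confidence match
}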
```

**For copyright detection:**

```
// Use all three spatial/color hashes
$pHash = PerceptualHash::hashFromFile($image);
$dHash = DHash::hashFromFile($image);
$colorHash = ColorHistogramHash::hashFromFile($image);
```

**For large-scale similarity search with LSH:**

```
// One 256-bit composite instead of four separate hashes — lets you
// band-index each chunk and do indexed candidate generation in the DB.
$composite = CompositeHash::default();
$hash = $composite->hashFromFile($image);
$bucketKeys = Lsh::bandsByChunk($hash, bandsPerChunk: 4);
```

**For industrial-scale near-duplicate detection with PDQ:**

```
use LegitPHP\HashMoney\PdqHash;
use LegitPHP\HashMoney\Strategies\PdqHashStrategy;

$hash = PdqHash::hashFromFile($image);
$quality = PdqHash::quality($hash);

// Meta's recommended thresholds
if ($quality < PdqHashStrategy::RECOMMENDED_QUALITY_THRESHOLD) {
    // Image hashes unreliably (uniform / blurry) — skip or fall back to MashedHash.
    return null;
}

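// $candidate is a previously stored PDQ hash; 31 is the recommended
// match threshold (Hamming distance ≤ 31 of 256 bits) from the PDQ section.
if (PdqHash::distance($hash, $candidate) <= 31) {
    // Near-duplicate match
}
```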
