white-rabbit-1-sketch/php-file-hash-map
=======================================

This PHP library implements a simple Hash Map with file-based storage. It uses a hashing function to calculate an offset, which determines the location of the corresponding bucket in the file. Data is stored in a binary file, with each bucket containing key-value pairs. The library allows for efficient data retrieval through direct file access.

Php File Hash Map
=================

[![Latest Version](https://camo.githubusercontent.com/6359de46a97a6dbcd8d0298d1bdd4a89878c67554155efd4588c4230722e8e81/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f762f7461672f77686974652d7261626269742d312d736b657463682f7068702d66696c652d686173682d6d6170)](https://camo.githubusercontent.com/6359de46a97a6dbcd8d0298d1bdd4a89878c67554155efd4588c4230722e8e81/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f762f7461672f77686974652d7261626269742d312d736b657463682f7068702d66696c652d686173682d6d6170)[![Phpunit](https://camo.githubusercontent.com/6b1ad7ac917deba259562e348d98683908d65d02c814ed4ffff6e2ca18bf676c/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f616374696f6e732f776f726b666c6f772f7374617475732f77686974652d7261626269742d312d736b657463682f7068702d66696c652d686173682d6d61702f2e676974687562253246776f726b666c6f7773253246706870756e69742e796d6c)](https://camo.githubusercontent.com/6b1ad7ac917deba259562e348d98683908d65d02c814ed4ffff6e2ca18bf676c/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f616374696f6e732f776f726b666c6f772f7374617475732f77686974652d7261626269742d312d736b657463682f7068702d66696c652d686173682d6d61702f2e676974687562253246776f726b666c6f7773253246706870756e69742e796d6c)[![codecov](https://camo.githubusercontent.com/bd05df4c21f5fa9c34ac8306d019d07bb98f8bb1f7e120ae63025db1c37c3a43/68747470733a2f2f636f6465636f762e696f2f6769746875622f77686974652d7261626269742d312d736b657463682f7068702d66696c652d686173682d6d61702f67726170682f62616467652e7376673f746f6b656e3d33544a39474c344f4153)](https://codecov.io/github/white-rabbit-1-sketch/php-file-hash-map)

[![Banner](readme/assets/img/banner.webp)](readme/assets/img/banner.webp)

`PhpFileHashMap` is a PHP implementation of a file-based hash map that stores key-value pairs in a binary file. The hash map operates on a file system level, which makes it suitable for handling large amounts of data with minimal memory usage. This implementation allows persisting hash map data to a file while providing standard hash map operations like `set`, `get`, `remove`, and more.

Features
--------

- **Persistent storage**: The hash map data is stored in a binary file, allowing data persistence even after the script execution ends.
- **Efficient memory usage**: Uses file storage to manage large datasets with low memory overhead.
- **Basic hash map operations**: Supports key-value insertion, retrieval, deletion, existence checks, and iteration.
- **Collision handling**: The hash map handles collisions by chaining multiple buckets in the file.

Warning!
--------

This is not a data storage solution and was never intended to be used as one. Essentially, it is an implementation of the hash map data structure with data stored on disk, and its current applicability is specifically within this context. But of course, you can use it as storage if it suits your task and you understand all the nuances.

⭐️ Star the Project
-------------------

If you found this project useful, please consider giving it a star! 🌟 Your support helps improve the project and motivates us to keep adding new features and improvements. Thank you! 🙏

Table of Contents
-----------------

- [Php File Hash Map](#php-file-hash-map)
- [Features](#features)
- [Performance Benchmarks](#performance-benchmarks)
- [Installation](#installation)
- [Usage](#usage)
    - [Creating a Hash Map](#creating-a-hash-map)
    - [Adding Data](#adding-data)
    - [Retrieving Data](#retrieving-data)
    - [Removing Data](#removing-data)
    - [Checking for Key Existence](#checking-for-key-existence)
    - [Counting Data](#counting-data)
    - [Iterating Over Keys and Values](#iterating-over-keys-and-values)
    - [Clearing the Hash Map](#clearing-the-hash-map)
- [Nuances and Performance Considerations](#nuances-and-performance-considerations)
    - [Recommended Hash Map Size](#recommended-hash-map-size)
    - [Data File and Custom Location](#data-file-and-custom-location)
    - [Defragmentation](#defragmentation)
    - [Serialization](#serialization)
        - [Serialization Override](#serialization-override)
        - [Serialization of Closures](#serialization-of-closures)
- [Restrictions](#restrictions)
    - [Concurrent Access](#1-concurrent-access)
    - [Distributed Systems](#2-distributed-systems)
- [Why Choose This Library Over SQLite?](#why-choose-this-library-over-sqlite)
- [Data File Structure](#data-file-structure)
    - [Map Index Section](#1-map-index-section)
    - [Heap Section](#2-heap-section)
    - [File Layout Example](#file-layout-example)
- [Author and License](#author-and-license)

Performance Benchmarks
----------------------

Performance depends on the system configuration and the number of elements. On my MacBook Air M2, single-threaded throughput (operations per second) was approximately:

- File Hash Map: 140k writes, 280k to 700k reads (depending on data and buffering)
- Redis: 25k writes, 20k reads
- Memcached: 24k writes, 30k reads
- MySQL with Hash Index: 6k writes, 15k reads
- Aerospike: 5k writes, 5k reads

Installation
------------

You can install `PhpFileHashMap` via Composer by running the following command:

```
composer require white-rabbit-1-sketch/php-file-hash-map
```

Usage
-----

### Creating a Hash Map

```
use PhpFileHashMap\FileHashMap;

$hashMap = new FileHashMap(256); // Creates a hash map with a size of 256 buckets
```

### Adding Data

```
$hashMap->set('key1', 'value1');
$hashMap->set('key2', 'value2');
```

### Retrieving Data

```
$value = $hashMap->get('key1');
echo $value; // Outputs 'value1'
```

### Removing Data

```
$hashMap->remove('key2');
```

### Checking for Key Existence

```
if ($hashMap->has('key2')) {
    echo "Key exists!";
} else {
    echo "Key does not exist!";
}
```

### Counting Data

```
echo $hashMap->count(); // Outputs the number of active buckets
```

### Iterating Over Keys and Values

```
// Iterating over keys
foreach ($hashMap->keys() as $key) {
    echo $key . "\n";
}

// Iterating over values
foreach ($hashMap->values() as $value) {
    echo $value . "\n";
}
```

The `keys()` and `values()` methods each iterate over every element in the hash map:

- **`keys()`**: Returns an iterator of all the keys in the hash map.
- **`values()`**: Returns an iterator of all the values in the hash map.

Both of these methods require scanning the entire hash map, including both the Map Index Section and the Heap Section, to collect the keys or values. This means that they need to read through all buckets, including any inactive (deleted) ones, and this can be **resource-intensive in terms of time** if the hash map contains a large number of entries or deleted buckets.

However, it’s important to note that these operations **are not memory-intensive**. Since the methods use **generators**, they do not load all keys or values into memory at once, making them **efficient in terms of memory usage**. Only one key or value is held in memory at a time during iteration.
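To illustrate why generator-based iteration stays cheap in memory, here is a minimal standalone sketch. It is not the library's code: the `iterateRecords()` function and the length-prefixed record format are assumptions made purely for this illustration.

```php
<?php
// Hypothetical helper, not part of the library: lazily yields length-prefixed
// records from a stream, holding only one record in memory at a time.
function iterateRecords($stream): Generator
{
    rewind($stream);
    while (true) {
        $header = fread($stream, 8);
        if ($header === false || strlen($header) < 8) {
            return; // end of stream
        }
        $length = unpack('P', $header)[1]; // 8-byte little-endian length
        yield $length > 0 ? fread($stream, $length) : '';
    }
}

// Write three records to an in-memory stream, then read them back lazily.
$stream = fopen('php://memory', 'w+b');
foreach (['alpha', 'beta', 'gamma'] as $record) {
    fwrite($stream, pack('P', strlen($record)) . $record);
}

$records = [];
foreach (iterateRecords($stream) as $record) {
    $records[] = $record;
}
fclose($stream);
```

The loop body sees one record per iteration, so memory use stays flat regardless of how many records the stream holds. The cost is time, since every record still has to be read.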

### Clearing the Hash Map

```
$hashMap->clear(); // Removes all keys and values
```

Nuances and Performance Considerations
--------------------------------------

This file-based hash map efficiently resolves collisions by utilizing chaining (linked lists of buckets). However, as the number of collisions increases, the performance may degrade. This degradation becomes particularly noticeable during write operations (insertion and deletion). Therefore, to ensure optimal performance, it is recommended to keep the hash map at a reasonable size relative to the expected number of elements.

#### Recommended Hash Map Size

For best performance, the size of the hash map should be chosen based on the estimated number of elements you plan to store. A good rule of thumb is to set the map size to a value that is roughly **1.5 to 2 times larger than the expected number of elements**. This helps reduce the likelihood of collisions and ensures fast access times.

For example:

- For up to 10,000 elements, consider a map size of 15,000 to 20,000.
- For up to 100,000 elements, aim for a map size of 150,000 to 200,000.

By keeping the number of collisions low, you maintain fast read and write speeds, especially in the case of write-heavy workloads.
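The effect of map size on collisions can be estimated with a small simulation. This is only a sketch: `averageChainLength()` is a hypothetical helper, and `crc32` stands in for whatever hash function the library actually uses.

```php
<?php
// Illustrative only: crc32 is an assumption, not the library's hash function.
function averageChainLength(int $numKeys, int $mapSize): float
{
    if ($numKeys === 0) {
        return 0.0;
    }
    $slots = [];
    for ($i = 0; $i < $numKeys; $i++) {
        $slot = crc32("key-$i") % $mapSize;
        $slots[$slot] = ($slots[$slot] ?? 0) + 1;
    }
    // Average number of keys per occupied slot (1.0 means no collisions).
    return $numKeys / count($slots);
}

$crowded = averageChainLength(10000, 1000);  // map far smaller than the data set
$roomy   = averageChainLength(10000, 20000); // map sized at 2x the data set
printf("crowded: %.2f, roomy: %.2f\n", $crowded, $roomy);
```

With the undersized map, every occupied slot holds a chain of at least ten keys on average, and each lookup or write has to walk that chain. The larger map keeps most chains close to a single bucket.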

#### Data File and Custom Location

By default, the hash map automatically creates a data file in the system's temporary directory. This file is used to store the hash map's data persistently.

- **Default behavior**: The file will be created in the temporary directory (e.g., `/tmp` on Unix-based systems).
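- **Customization**: You can override this behavior and specify your own file location by providing a custom file path when constructing the hash map instance.
- **Cleanup**: Pass `destroyDataFileOnShutdown: true` to the constructor to delete the data file on shutdown, as the following example does.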

```
use PhpFileHashMap\FileHashMap;

$hashMap = new FileHashMap(256, destroyDataFileOnShutdown: true); // Deletes the file on shutdown
```

#### Defragmentation

When keys are removed from the hash map, the corresponding buckets are not physically deleted from the file. Instead, they are marked as deleted. This is done to avoid the performance cost of file operations, as physically deleting data would require shifting the file contents, which can be expensive.

However, over time, especially with many deletions, the file may accumulate a significant number of deleted buckets, which could reduce performance. In such cases, it is advisable to perform **defragmentation** to reclaim space and optimize the file layout.

The `defrag()` method reorganizes the entire hash map file by recalculating the entire structure from scratch. This includes removing any deleted buckets and restructuring the map for better performance.

```
$hashMap->defrag();
```

Note: Defragmentation is a resource-intensive operation, especially for large hash maps, as it requires reading and rewriting the entire file. Therefore, it should be used carefully and ideally not too frequently.
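One way to avoid defragmenting too often is to trigger it only once deleted buckets make up a meaningful share of the file. The sketch below assumes your application tracks the number of removals itself. `shouldDefrag()` and the 30% threshold are hypothetical and not part of the library.

```php
<?php
// Hypothetical helper: decides whether enough deletions have accumulated
// to justify the cost of a full defragmentation pass.
function shouldDefrag(int $activeCount, int $deletedCount, float $maxDeletedRatio = 0.3): bool
{
    $total = $activeCount + $deletedCount;
    if ($total === 0) {
        return false; // nothing stored, nothing to reclaim
    }
    return ($deletedCount / $total) > $maxDeletedRatio;
}

// Usage idea (requires the library; $removedSinceDefrag is tracked by your code):
// if (shouldDefrag($hashMap->count(), $removedSinceDefrag)) {
//     $hashMap->defrag();
//     $removedSinceDefrag = 0;
// }
```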

#### Serialization

By default, this hash map uses PHP's built-in `serialize()` and `unserialize()` functions to handle the serialization of values stored in the map. This allows you to store any PHP data type, including objects, arrays, and other complex structures.

##### Serialization Override

The default methods for serializing and unserializing data are:

```
protected function serialize(mixed $data): string
{
    return serialize($data);
}

protected function unserialize(string $data): mixed
{
    return unserialize($data);
}
```

These methods can be easily overridden if you need to use a different serialization format (e.g., JSON, MessagePack, etc.) or a custom approach. By overriding these methods, you can control how data is converted before being stored in the hash map and after being retrieved.
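As a sketch of such an override, the trait below swaps in JSON. The `JsonSerialization` name is hypothetical; the method signatures mirror the defaults shown above so that the trait can replace them in a subclass.

```php
<?php
// Hypothetical trait: swaps PHP's native serialization for JSON.
trait JsonSerialization
{
    protected function serialize(mixed $data): string
    {
        return json_encode($data, JSON_THROW_ON_ERROR);
    }

    protected function unserialize(string $data): mixed
    {
        // Objects come back as associative arrays, so class information is lost.
        return json_decode($data, true, 512, JSON_THROW_ON_ERROR);
    }
}

// With the library installed, the trait could be mixed into a subclass:
// class JsonFileHashMap extends \PhpFileHashMap\FileHashMap
// {
//     use JsonSerialization;
// }
```

JSON is portable and readable by other languages, but it only round-trips scalars, arrays, and `null`. Keep the default if you need to store objects with their class intact.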

##### Serialization of Closures

PHP's built-in `serialize()` cannot serialize closures and throws an exception when it encounters one. To store closures in the hash map, you can use the **`opis/closure`** library, which can be installed via Composer:

```
composer require opis/closure
```

Once the library is installed, you can easily customize the serialization and unserialization methods of your hash map to handle closures. Here's an example of how to do it:

```
use PhpFileHashMap\FileHashMap;

class FileHashMapWithClosures extends FileHashMap
{
    // Override the serialize method to handle closures
    protected function serialize(mixed $data): string
    {
        return \Opis\Closure\serialize($data);
    }

    // Override the unserialize method to handle closures
    protected function unserialize(string $data): mixed
    {
        return \Opis\Closure\unserialize($data);
    }
}
```

Restrictions
------------

When using `PhpFileHashMap`, keep in mind the following limitations due to its file system-based storage:

### 1. Concurrent Access

If multiple processes attempt to access the same hash map file simultaneously, race conditions may occur. To ensure data integrity, **you must implement locking mechanisms** when working with the same file in parallel processes.

Locking is intentionally **not implemented in this library** to keep it lightweight and to give developers the freedom to choose their preferred locking strategy. Examples of possible solutions include:

- Using PHP's `flock()` function for file-level locks.
- Implementing inter-process locks via shared memory or database-backed mutexes.
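A minimal sketch of the `flock()` approach follows. `withExclusiveLock()` and the lock file path are hypothetical and not provided by the library. Locking a separate lock file rather than the data file itself is a design choice made for this example.

```php
<?php
// Hypothetical helper: runs $operation while holding an exclusive lock on a
// separate lock file, so cooperating processes take turns.
function withExclusiveLock(string $lockPath, callable $operation): mixed
{
    $handle = fopen($lockPath, 'c'); // create if missing, never truncate
    if ($handle === false) {
        throw new RuntimeException("Cannot open lock file: $lockPath");
    }
    try {
        if (!flock($handle, LOCK_EX)) { // blocks until the lock is granted
            throw new RuntimeException("Cannot acquire lock: $lockPath");
        }
        return $operation();
    } finally {
        flock($handle, LOCK_UN);
        fclose($handle);
    }
}

// Usage idea (requires the library):
// withExclusiveLock($lockPath, fn () => $hashMap->set('key1', 'value1'));
$lockPath = sys_get_temp_dir() . '/file-hash-map-example.lock';
$result = withExclusiveLock($lockPath, fn () => 'done');
```

`flock()` locks are advisory: they only protect the file if every process that touches it goes through the same wrapper.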

### 2. Distributed Systems

This library does not handle synchronization in distributed environments. If you need to share the same hash map file across multiple machines, synchronization must be implemented externally.

Examples of how this can be addressed:

- Use a distributed file system (e.g., NFS, GlusterFS) with appropriate locking mechanisms.
- Employ a coordination service like **Zookeeper** for managing access and updates.

These restrictions are by design to maintain the simplicity and portability of `PhpFileHashMap`, leaving implementation details of complex infrastructure to the developer.

Why Choose This Library Over SQLite?
------------------------------------

- **Performance**: This library outperforms SQLite in terms of raw speed. Benchmark tests show it can handle **700,000 reads** and **140,000 writes** per second, while SQLite is limited to **70,000 reads** and **4,000 writes** per second. This makes it a better choice for high-performance applications that require fast access to key-value data.
- **Lightweight**: Unlike SQLite, which includes a full relational database engine, this library focuses purely on key-value storage. This minimalism reduces latency and avoids the overhead associated with SQL parsing and transaction management, making it faster and more efficient for simple use cases.
- **No Database Overhead**: SQLite is designed for relational data storage and comes with features that aren't needed for basic key-value storage. If all you need is a fast, persistent key-value store, this library eliminates the complexities of relational databases and provides a streamlined solution.
- **Customization and Control**: With this library, you have full control over the storage and retrieval logic. You can tailor it to meet your specific needs without being constrained by the rigid structure and limitations of SQLite.

In summary, if you need a **high-performance, simple key-value storage solution** without the overhead of a full-fledged database engine, this library offers a more optimized, flexible, and customizable alternative to SQLite.

Data File Structure
-------------------

The file structure of the hash map is designed to efficiently manage large amounts of data. It consists of two main sections: the **Map Index Section** and the **Heap Section**.

#### 1. **Map Index Section**

The Map Index Section is located at the beginning of the file. It contains a series of integers, each representing the offset of a bucket in the Heap Section. The number of entries in the index is equal to the number of buckets in the hash map.

- **Size**: `$mapSize * INT_SIZE`
- **Format**: The section contains a list of integers, each corresponding to the offset of a bucket in the heap.
    - Example: If you have a hash map with 256 buckets, this section will consist of 256 integers.
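The arithmetic behind the index can be sketched as follows. The 8-byte cell size comes from the file layout described in this document. The use of `crc32` and the `indexCellOffset()` helper are assumptions, since the library's actual hash function is not stated here.

```php
<?php
const INT_SIZE = 8; // bytes per index cell

// Illustrative only: crc32 is an assumption; the library's hash may differ.
function indexCellOffset(string $key, int $mapSize): int
{
    $slot = crc32($key) % $mapSize; // which bucket slot the key maps to
    return $slot * INT_SIZE;        // byte position of that slot's cell
}

$mapSize = 256;
$indexSectionSize = $mapSize * INT_SIZE; // the heap starts right after this
$offset = indexCellOffset('key1', $mapSize);
```

For a map of 256 buckets the index occupies 2,048 bytes, and locating a key's cell takes one hash computation and one seek.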

#### 2. **Heap Section**

The Heap Section contains all the actual data for the hash map’s buckets. Each bucket is a block of data that includes the following elements:

- **Bucket State (INT)**: An integer representing the state of the bucket. A value of `1` indicates that the bucket is active, and `0` indicates that the bucket is deleted.
- **Next Bucket Pointer (INT)**: A pointer (offset in the heap) to the next bucket in the chain (used for handling collisions).
- **Key Size (INT)**: The size of the key in bytes.
- **Key (string)**: The key itself.
- **Value Size (INT)**: The size of the value in bytes.
- **Value (serialized data)**: The serialized value associated with the key.

Additionally, the Heap Section begins with two integers that hold the following data:

- **Active Bucket Count (INT)**: The number of active (non-deleted) buckets in the hash map.
- **Deleted Bucket Count (INT)**: The number of deleted buckets in the hash map.

The rest of the heap consists of individual buckets, which contain the serialized data for each key-value pair.
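To make the bucket layout concrete, here is a sketch that encodes and decodes one bucket in the field order listed above. It assumes 8-byte little-endian integers (`P` in `pack()`). The library's real byte order and helper names are not documented here, so `encodeBucket()` and `decodeBucket()` are hypothetical.

```php
<?php
// Illustrative only: assumes 8-byte little-endian integers ('P' in pack()).
function encodeBucket(int $state, int $next, string $key, mixed $value): string
{
    $payload = serialize($value);
    return pack('P', $state)            // 1 = active, 0 = deleted
        . pack('P', $next)              // heap offset of the next bucket in the chain
        . pack('P', strlen($key)) . $key
        . pack('P', strlen($payload)) . $payload;
}

function decodeBucket(string $bytes): array
{
    // The first three integers occupy bytes 0 to 23.
    $header = unpack('Pstate/Pnext/PkeySize', $bytes);
    $key = substr($bytes, 24, $header['keySize']);
    $valueSizeAt = 24 + $header['keySize'];
    $valueSize = unpack('P', $bytes, $valueSizeAt)[1];
    $value = unserialize(substr($bytes, $valueSizeAt + 8, $valueSize));

    return [
        'state' => $header['state'],
        'next'  => $header['next'],
        'key'   => $key,
        'value' => $value,
    ];
}

$bucket = decodeBucket(encodeBucket(1, 0, 'key1', ['a' => 1]));
```

Because the key and value are variable-length, each is preceded by its size, which lets a reader skip over a bucket without deserializing its value.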

#### File Layout Example

```
+-----------------------------------------------+
|                   File                        |
+-----------------------------------------------+
| First part: $mapSize * INT_SIZE (offset cells)|
+-----------------------------------------------+
| $mapSize INT cells, each containing an offset |
| to the heap area (each offset is 8 bytes)     |
|  - Cell 0: Offset for bucket 0                |
|  - Cell 1: Offset for bucket 1                |
|  - Cell 2: Offset for bucket 2                |
|  ...                                          |
|  - Cell X: Offset for bucket X                |
+-----------------------------------------------+
| Next: Heap area                               |
+-----------------------------------------------+
| [Heap]                                        |
|  - First two INT values:                      |
|      - Active bucket count (INT)              |
|      - Deleted bucket count (INT)             |
|  - Bucket data:                               |
|      +-----------------------------------+    |
|      | Bucket 1                          |    |
|      +-----------------------------------+    |
|      | - State (deleted or active) (INT) |    |
|      | - Next bucket (heap offset) (INT) |    |
|      | - Key size (INT)                  |    |
|      | - Key (string)                    |    |
|      | - Value size (INT)                |    |
|      | - Value (serialized data)         |    |
|      +-----------------------------------+    |
|      | Bucket 2                          |    |
|      +-----------------------------------+    |
|      |   ...                             |    |
|      +-----------------------------------+    |
+-----------------------------------------------+

```

Author and License
------------------

**Author**: Mikhail Chuloshnikov

**License**: MIT License

This library is released under the MIT License. See the [LICENSE](LICENSE) file for more details.

