


rkr/data-diff
=============

A handy tool for comparing structured data quickly in a key-value manner

0.4.4 (2mo ago) · 42.1k↓33.3%1 · MIT · PHP · PHP >= 8.1

Since Apr 13 · Pushed 2mo ago · 2 watchers

[Source](https://github.com/rkrx/php-data-diff) · [Packagist](https://packagist.org/packages/rkr/data-diff) · [Docs](https://github.com/rkrx/php-data-diff)


data-diff
=========

[![Scrutinizer Code Quality](https://camo.githubusercontent.com/5e383a669d42e9ea35c049a5545713005cafbec95e81d8424348756950da4791/68747470733a2f2f7363727574696e697a65722d63692e636f6d2f672f726b72782f646174612d646966662f6261646765732f7175616c6974792d73636f72652e706e673f623d6d6173746572)](https://scrutinizer-ci.com/g/rkrx/data-diff/?branch=master)[![Latest Stable Version](https://camo.githubusercontent.com/0c7b305b1b35b62e618bd131e0c4f3f3f0382282f1b8db179fb10118d2711b5a/68747470733a2f2f706f7365722e707567782e6f72672f726b722f646174612d646966662f762f737461626c65)](https://packagist.org/packages/rkr/data-diff)[![License](https://camo.githubusercontent.com/3fb04a024303209461d431d54310fe62051e9d0180c4aac0f2e14a1b53e1cda8/68747470733a2f2f706f7365722e707567782e6f72672f726b722f646174612d646966662f6c6963656e7365)](https://packagist.org/packages/rkr/data-diff)

composer
--------

Install the package with `composer require rkr/data-diff`. See the [Packagist page](https://packagist.org/packages/rkr/data-diff) for details.

Support for PHPStan
-------------------

Add the following to your `phpstan.neon` file:

```
includes:
	- vendor/rkr/data-diff/extension.neon
```

WTF
---

This component is useful if you have a large amount of structured data to import into a local database and you want to identify changes without overwriting everything on each run. Instead, you can determine what has actually changed and take appropriate actions.

Usage
-----

You start with two two-dimensional data lists that you want to compare. Typically, some columns in such a list identify a row, so they determine which rows are new and which are missing. Other columns indicate changes in existing rows. Still other columns should not trigger any action, but their data may be needed for subsequent processing.

For example, suppose you have article metadata from an external data source that you want to import into a local database, and you want to take action (e.g., logging) whenever a record is added, removed, or changed.

External Data:

```
name;reference;price;stock;last-change
Some Notebook;B0001;1499.90;1254;2016-04-01T10:00:00+02:00
A Hairdryer;C0001;49.95;66;2016-04-01T10:00:00+02:00
A Pencil;D0001;2.9499;2481;2016-04-01T10:00:00+02:00

```

Local data:

```
name;reference;price;stock
A shiny Smartphone;A0001;519.99;213
A Hairdryer;C0001;49.95;12
A Pencil;D0001;2.95;2481

```
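Listings like the two above can be read into associative rows with plain PHP before any comparison takes place. The following is a minimal sketch; `parseRows` is a hypothetical helper introduced for illustration and is not part of rkr/data-diff.

```php
<?php
// Hypothetical helper (not part of rkr/data-diff): parses semicolon-separated
// text with a header line into a list of associative rows.
function parseRows(string $text): array {
    $lines = preg_split('/\R/', trim($text));
    $header = str_getcsv(array_shift($lines), ';', '"', '\\');
    $rows = [];
    foreach ($lines as $line) {
        // Map each header name to the value in the same position.
        $rows[] = array_combine($header, str_getcsv($line, ';', '"', '\\'));
    }
    return $rows;
}

$local = parseRows(<<<CSV
name;reference;price;stock
A shiny Smartphone;A0001;519.99;213
A Hairdryer;C0001;49.95;12
A Pencil;D0001;2.95;2481
CSV);

print_r($local[1]);
```

All values are still strings at this point; converting them to the intended types is a separate step.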

Each list contains three data rows. Both lists have a row that is not present in the other list. The two common rows (`A Hairdryer;C0001` and `A Pencil;D0001`) differ in the `stock` and `price` columns respectively, while the `name` column is identical. The `last-change` column should not be compared, but it should be available when a row is inserted or updated. The primary objective is to synchronize all changes from the external data source to the local database. In other scenarios it might be important to track changes in the `last-change` column even when all other columns are unchanged; this example covers the case where that is not a priority.

The comparison result is derived by comparing two distinct key-value lists. Three checks identify added keys, missing keys, and changed data where keys are equal. Deciding whether a particular row counts as added, removed, or changed can be complex and depends on the specific data. The rules established for this example may differ in other scenarios.

In this example, only the `reference` column is used to determine if a row is new or has been removed. For instance, the local database contains a reference to an article `A0001` that is not present in the external data, necessitating its removal from the local data. Conversely, `B0001` is absent in the local data and should be added. The *Hairdryer* has a different stock, and the *Pencil* has a slightly different price. Since prices are stored locally with a decimal precision of two, the two pencil prices are considered equal, and the comparison should not report a change for the row `D0001`.

First, define what constitutes a key and what constitutes a value, so that the `Storage` understands the schema of the key-value list. The data is already in the correct format, so no transformation is required.

So, let's give some meaning to the columns:

- The `reference` column indicates whether a particular row is present or not. This serves as the unique identifier for each row. A row may have more than one identifier column (such as `reference` and `environment-id`), but in this case, there is only one identifier.
- The `name` column should only be considered when a row is already present in the other list.
- The `price` column should only be considered when a row is already present in the other list.
- The `stock` column should only be considered when a row is already present in the other list.
- The `last-change` column should not be checked at all.

Therefore, when constructing a key-value array for comparison, the key part is composed of the `reference` column, and the value part is represented by the `name`, `price`, and `stock` columns.

The key-value array of the first list would then appear as follows:

```
'B0001' => ['Some Notebook', 1499.90, 1254]
'C0001' => ['A Hairdryer', 49.95, 66]
'D0001' => ['A Pencil', 2.9499, 2481]

```

The key-value array of the second list would look like this:

```
'A0001' => ['A shiny Smartphone', 519.99, 213]
'C0001' => ['A Hairdryer', 49.95, 12]
'D0001' => ['A Pencil', 2.95, 2481]

```
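The two arrays above could be built from associative rows roughly as follows. This is a sketch in plain PHP, not the library's API; `toKeyValue` and the `|` separator for multi-column keys are assumptions made for illustration.

```php
<?php
// Hypothetical helper (not part of rkr/data-diff): splits each row into a
// key part and a value part. Columns listed in neither part are ignored.
function toKeyValue(array $rows, array $keyColumns, array $valueColumns): array {
    $result = [];
    foreach ($rows as $row) {
        $keyParts = [];
        foreach ($keyColumns as $column) {
            $keyParts[] = $row[$column];
        }
        $values = [];
        foreach ($valueColumns as $column) {
            $values[] = $row[$column];
        }
        // Several identifier columns are joined into one composite key.
        $result[implode('|', $keyParts)] = $values;
    }
    return $result;
}

$external = [
    ['name' => 'Some Notebook', 'reference' => 'B0001', 'price' => 1499.90, 'stock' => 1254, 'last-change' => '2016-04-01T10:00:00+02:00'],
    ['name' => 'A Hairdryer', 'reference' => 'C0001', 'price' => 49.95, 'stock' => 66, 'last-change' => '2016-04-01T10:00:00+02:00'],
    ['name' => 'A Pencil', 'reference' => 'D0001', 'price' => 2.9499, 'stock' => 2481, 'last-change' => '2016-04-01T10:00:00+02:00'],
];

$first = toKeyValue($external, ['reference'], ['name', 'price', 'stock']);
print_r($first);
```

Because `last-change` appears in neither column list, it plays no role in the comparison.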

Now, let's compare those arrays in three distinct ways:

Which rows are present in the first list, but not in the second:

```
'B0001' => ['Some Notebook', 1499.90, 1254]

```

Which rows are present in the second list, but not in the first:

```
'A0001' => ['A shiny Smartphone', 519.99, 213]

```

Which rows are present in both lists, but have different values in the first list than in the second:

```
'C0001' => ['A Hairdryer', 49.95, 66]
'D0001' => ['A Pencil', 2.9499, 2481]

```

You now have all the necessary information to identify the differences between the two lists.
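The three comparisons can be reproduced with PHP's built-in array functions. This sketch only illustrates the idea on the raw arrays; it is not how rkr/data-diff is used, and the variable names are chosen for this example.

```php
<?php
$first = [
    'B0001' => ['Some Notebook', 1499.90, 1254],
    'C0001' => ['A Hairdryer', 49.95, 66],
    'D0001' => ['A Pencil', 2.9499, 2481],
];
$second = [
    'A0001' => ['A shiny Smartphone', 519.99, 213],
    'C0001' => ['A Hairdryer', 49.95, 12],
    'D0001' => ['A Pencil', 2.95, 2481],
];

// Keys present in the first list, but not in the second.
$onlyInFirst = array_diff_key($first, $second);

// Keys present in the second list, but not in the first.
$onlyInSecond = array_diff_key($second, $first);

// Keys present in both lists whose values differ.
$changed = [];
foreach (array_intersect_key($first, $second) as $key => $values) {
    if ($values !== $second[$key]) {
        $changed[$key] = $values;
    }
}

print_r(array_keys($onlyInFirst));
print_r(array_keys($onlyInSecond));
print_r(array_keys($changed));
```

On the raw values, `D0001` is still reported as changed, because `2.9499` and `2.95` are different numbers.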

Consider a special case: the pencil has a price of `2.9499` in the first list. However, since we only compare prices with a decimal precision of two, the prices are effectively identical, as the computed price for `D0001` is `2.95` in both cases. This is where the `Schema` component becomes relevant.
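The effect of such a normalization can be sketched in plain PHP. The `normalizeRow` function below is hypothetical and only mimics the idea of casting each value to a declared type before comparing; the library's own schema handling may work differently.

```php
<?php
// Hypothetical normalizer (not part of rkr/data-diff): casts name to string,
// price to a string with two decimals, and stock to int.
function normalizeRow(array $values): array {
    [$name, $price, $stock] = $values;
    return [
        (string) $name,
        number_format((float) $price, 2, '.', ''),
        (int) $stock,
    ];
}

$externalPencil = normalizeRow(['A Pencil', 2.9499, 2481]);
$localPencil = normalizeRow(['A Pencil', 2.95, 2481]);

// After normalization both rows carry the price "2.95".
var_dump($externalPencil === $localPencil); // bool(true)
```

Comparing normalized rows instead of raw rows means `D0001` no longer counts as changed, while the hairdryer's stock difference would still be detected.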

When defining a `MemoryDiffStorage`, you specify two schemas: one for the key part and one for the value part:

```
