

Active library · [Debugging & Profiling](/categories/debugging)

jeroen/json-dump-reader
=======================

Provides line-by-line readers and iterators for Wikibase/Wikidata JSON dumps

Latest version 2.0.0 (7y ago) · [2 issues](https://github.com/JeroenDeDauw/JsonDumpReader/issues) · [2 PRs](https://github.com/JeroenDeDauw/JsonDumpReader/pulls) · GPL-2.0-or-later · Language: PHP · Requires PHP >=7.1

Since Oct 22 · Pushed 4mo ago · 4 watchers

[Source](https://github.com/JeroenDeDauw/JsonDumpReader) · [Packagist](https://packagist.org/packages/jeroen/json-dump-reader) · [Docs](https://github.com/JeroenDeDauw/JsonDumpReader) · [RSS](/packages/jeroen-json-dump-reader/feed) · master branch, synced 2d ago

Dependencies (12) · Versions (12) · Used by (0)

JsonDumpReader
==============

**I am not actively developing this library, and it might no longer work. You can commission me and other leading Wikibase experts for [Wikibase Software Development](https://professional.wiki/en/wikibase-software-development) or other [Wikibase services](https://www.wikibase.consulting/wikibase-services/).**

[![Build Status](https://camo.githubusercontent.com/473c92477c9ba632e8dc5dd8155ae9a56bb03703765107421aeca45a581dec67/68747470733a2f2f7472617669732d63692e6f72672f4a65726f656e4465446175772f4a736f6e44756d705265616465722e7376673f6272616e63683d6d6173746572)](https://travis-ci.org/JeroenDeDauw/JsonDumpReader)[![Scrutinizer Code Quality](https://camo.githubusercontent.com/0db31b64dc3ed9d2615dfc69f2d296390141770844f29ee7908ba0813d70b21a/68747470733a2f2f7363727574696e697a65722d63692e636f6d2f672f4a65726f656e4465446175772f4a736f6e44756d705265616465722f6261646765732f7175616c6974792d73636f72652e706e673f623d6d6173746572)](https://scrutinizer-ci.com/g/JeroenDeDauw/JsonDumpReader/?branch=master)[![Code Coverage](https://camo.githubusercontent.com/ff788bedc9aa088220c485e00c2f937177d73f6265978ded42bd7fee3b55e4c7/68747470733a2f2f7363727574696e697a65722d63692e636f6d2f672f4a65726f656e4465446175772f4a736f6e44756d705265616465722f6261646765732f636f7665726167652e706e673f623d6d6173746572)](https://scrutinizer-ci.com/g/JeroenDeDauw/JsonDumpReader/?branch=master)[![Download count](https://camo.githubusercontent.com/87559ba7006152460339ab8363c98c9f2fbee54396a39e309566b6a1c4ebae4c/68747470733a2f2f706f7365722e707567782e6f72672f6a65726f656e2f6a736f6e2d64756d702d7265616465722f642f746f74616c2e706e67)](https://packagist.org/packages/jeroen/json-dump-reader)[![Latest Stable Version](https://camo.githubusercontent.com/5d61178a0ae66096b979aea9ab31adcc90f25a2a894c68ad4361ea7157f59925/68747470733a2f2f706f7365722e707567782e6f72672f6a65726f656e2f6a736f6e2d64756d702d7265616465722f76657273696f6e2e706e67)](https://packagist.org/packages/jeroen/json-dump-reader)

**JsonDumpReader** is a PHP library that provides ways to read from and iterate through the [Wikibase](https://www.wikibase.consulting/what-is-wikibase/) entities in a Wikibase Repository JSON dump, such as the Wikidata JSON dumps. You can find more information about the format on the [Wikidata dump download page](https://www.wikidata.org/wiki/Wikidata:Database_download).

- [Usage](#usage)
- [Installation](#installation)
- [Development](#development)
- [Release notes](#release-notes)
- [See also](#see-also)

You can hire [the author's](https://www.entropywins.wtf/wikidata) company, [Professional Wiki](https://professional.wiki/), for custom development or for [Wikibase hosting](https://professional.wiki/en/hosting/wikibase).

Usage
-----

All services are constructed via the `JsonDumpFactory` class:

```
<?php
require_once __DIR__ . '/vendor/autoload.php'; // Composer's autoloader

use Wikibase\JsonDumpReader\JsonDumpFactory;

$factory = new JsonDumpFactory();
```

There are two types of services provided by this library: those implementing `DumpReader` and those implementing `Iterator`. The former let you ask for the next line of the dump. They are the lowest-level services, with different implementations supporting different dump file formats (such as `.json`, `.json.gz` and `.json.bz2`). The iterators all depend on a `DumpReader` and let you easily iterate over all entities in the dump. They differ in how much additional processing they do, from none (returning the JSON strings) to fully deserializing the entities into `EntityDocument` objects.
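To illustrate this layering, here is a minimal, self-contained sketch that does not use the library. `LineSource`, `InMemoryLineSource` and `jsonLines()` are hypothetical stand-ins, and the sketch assumes a reader signals the end of the dump by returning `null` from `nextJsonLine()`. The generator plays the role of the simplest iterator: it pulls lines from the reader until it is exhausted, without further processing.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a DumpReader: returns the next line of the
// dump, or null once the dump is exhausted (assumed end-of-dump signal).
interface LineSource {
    public function nextJsonLine(): ?string;
}

// Hypothetical in-memory reader, used here instead of a real dump file.
final class InMemoryLineSource implements LineSource {
    private $lines;
    private $index = 0;

    public function __construct(array $lines) {
        $this->lines = $lines;
    }

    public function nextJsonLine(): ?string {
        if ($this->index >= count($this->lines)) {
            return null; // end of dump
        }
        return $this->lines[$this->index++];
    }
}

// Iterator layer: lazily yields each line the reader returns,
// doing no further processing (like a string iterator would).
function jsonLines(LineSource $reader): Generator {
    while (($line = $reader->nextJsonLine()) !== null) {
        yield $line;
    }
}

$reader = new InMemoryLineSource(['{"id":"Q1"}', '{"id":"Q2"}']);
foreach (jsonLines($reader) as $line) {
    echo $line, PHP_EOL;
}
```

Keeping the reader and the iterator separate is what lets the same iteration logic work regardless of whether the lines come from a plain, gzip or bzip2 file.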

### Reading some lines from a dump

```
$dumpReader = $factory->newExtractedDumpReader( '/tmp/wd-dump.json' );
echo 'First line: ' . $dumpReader->nextJsonLine();
echo 'Second line: ' . $dumpReader->nextJsonLine();
```

```
$dumpReader = $factory->newGzDumpReader( '/tmp/wd-dump.json.gz' );
echo 'First line: ' . $dumpReader->nextJsonLine();
echo 'Second line: ' . $dumpReader->nextJsonLine();
```

```
$dumpReader = $factory->newBz2DumpReader( '/tmp/wd-dump.json.bz2' );
echo 'First line: ' . $dumpReader->nextJsonLine();
echo 'Second line: ' . $dumpReader->nextJsonLine();
```

### Resume reading from a previous position

```
$dumpReader = $factory->newGzDumpReader( '/tmp/wd-dump.json.gz' );
echo 'First line: ' . $dumpReader->nextJsonLine();
echo 'Second line: ' . $dumpReader->nextJsonLine();

$newReader = $factory->newGzDumpReader( '/tmp/wd-dump.json.gz' );
$newReader->seekToPosition( $dumpReader->getPosition() );

echo 'Third line: ' . $newReader->nextJsonLine();
```
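This README does not say what the value returned by `getPosition()` represents, so it is safest to treat it as opaque and simply hand it back to `seekToPosition()`, for example after persisting it between runs. As a rough analogy for how resumption can work on an uncompressed file, the sketch below uses plain byte offsets with PHP's built-in `ftell()` and `fseek()`. It does not use the library, and the assumption that the library's positions behave like byte offsets is not confirmed here.

```php
<?php
declare(strict_types=1);

// Create a small uncompressed "dump" with one JSON value per line.
$path = tempnam(sys_get_temp_dir(), 'dump');
file_put_contents($path, "{\"id\":\"Q1\"}\n{\"id\":\"Q2\"}\n{\"id\":\"Q3\"}\n");

// First run: read two lines, then record the byte offset reached.
$first = fopen($path, 'rb');
$line1 = rtrim(fgets($first), "\n");
$line2 = rtrim(fgets($first), "\n");
$position = ftell($first); // offset just past the second line
fclose($first);

// Second run: open the file again and jump straight to the saved offset.
$second = fopen($path, 'rb');
fseek($second, $position);
$line3 = rtrim(fgets($second), "\n");
fclose($second);
unlink($path);

echo 'Third line: ', $line3, PHP_EOL;
```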

### Iterating through the JSON

```
$dumpReader = $factory->newGzDumpReader( '/tmp/wd-dump.json.gz' );
$dumpIterator = $factory->newStringDumpIterator( $dumpReader );

foreach ( $dumpIterator as $jsonLine ) {
	echo 'You can haz JSON: ' . $jsonLine;
}
```
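Each value yielded by the string iterator is a JSON string, so a natural next step is decoding it with PHP's built-in `json_decode()`. The sketch below substitutes a hand-written, heavily abbreviated entity line for the iterator's output (real Wikidata entities carry many more keys), and assumes each line holds one complete entity object. The `labelOf()` helper is hypothetical. `JSON_THROW_ON_ERROR` requires PHP 7.3 or later.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: returns the label in the given language, or null.
function labelOf(array $entity, string $language): ?string {
    return $entity['labels'][$language]['value'] ?? null;
}

// Abbreviated stand-in for one line yielded by the string iterator.
$jsonLine = '{"type":"item","id":"Q42","labels":{"en":{"language":"en","value":"Douglas Adams"}}}';

// Decode into associative arrays; JSON_THROW_ON_ERROR turns malformed
// input into a JsonException instead of a silent null.
$entity = json_decode($jsonLine, true, 512, JSON_THROW_ON_ERROR);

echo $entity['id'], ': ', labelOf($entity, 'en') ?? '(no label)', PHP_EOL;
```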

### Creating an EntityDocument iterator

```
$dumpReader = $factory->newBz2DumpReader( '/tmp/wd-dump.json.bz2' );
$dumpIterator = $factory->newEntityDumpIterator( $dumpReader, /* Deserializer */ $entityDeserializer );

foreach ( $dumpIterator as $entityDocument ) {
	echo 'At entity ' . $entityDocument->getId()->getSerialization();
}
```

The second argument needs to be an instance of `Deserializer` that can deserialize entities. Such an instance is typically constructed via the [Wikibase DataModel Serialization library](https://github.com/wmde/WikibaseDataModelSerialization). For an example of how to do this, see the `tests/integration/EntityDumpIteratorTest.php` file. Note that this code has [additional dependencies](https://github.com/JeroenDeDauw/JsonDumpReader/blob/bcb260f2a04193490f69b1bc794c1788aa235888/composer.json#L30-L33).

### Combining iterators

The iterator approach taken by this library is lazy and can easily be combined with iterator tools provided by PHP, such as `LimitIterator` and `CallbackFilterIterator`.
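For example, to look at only the first few items in a dump, you could wrap the iterator in a `CallbackFilterIterator` and then a `LimitIterator`. The sketch below is self-contained: the `fakeDumpLines()` generator is a hypothetical stand-in for `$factory->newStringDumpIterator( $dumpReader )`, and the sample lines are abbreviated.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for $factory->newStringDumpIterator( $dumpReader ):
// a lazy source that yields one JSON entity per line.
function fakeDumpLines(): Generator {
    yield '{"type":"item","id":"Q1"}';
    yield '{"type":"property","id":"P31"}';
    yield '{"type":"item","id":"Q2"}';
    yield '{"type":"item","id":"Q3"}';
}

// Filter callback: accept only lines that decode to an entity of type "item".
$isItem = function (string $jsonLine): bool {
    $entity = json_decode($jsonLine, true);
    return is_array($entity) && ($entity['type'] ?? null) === 'item';
};

// Skip non-items, then stop after the first two matches.
$firstTwoItems = new LimitIterator(
    new CallbackFilterIterator(fakeDumpLines(), $isItem),
    0, // offset: start at the first accepted line
    2  // count: yield at most two lines
);

foreach ($firstTwoItems as $jsonLine) {
    echo $jsonLine, PHP_EOL;
}
```

Because both wrappers pull from the inner iterator on demand, iteration ends shortly after the second match instead of scanning the whole dump.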

### More documentation and examples

To get documentation that is never out of date and always fully correct for your version of the library, have a look at the public methods in `src/JsonDumpFactory.php`. Every public method has at least one test, so you can find good examples in the tests directory.

Installation
------------

To use the JsonDumpReader library in your project, simply add a dependency on `jeroen/json-dump-reader` to your project's `composer.json` file. Here is a minimal example of a `composer.json` file that just defines a dependency on JsonDumpReader 2.x:

```
{
    "require": {
        "jeroen/json-dump-reader": "~2.0"
    }
}
```

Supported PHP versions:

- Version 2.x: 7.1 - 7.3+
- Version 1.4: 5.6 - 7.2
- Version 1.3: 5.5 - 7.2

### Installation when using EntityDocument

If you want to use the EntityDocument Iterator, you will also need to install the DataValue libraries used by the Wikibase that created the dump. For Wikidata and typical Wikibase installations these are:

- [DataValues Geo](https://github.com/DataValues/Geo/)
- [DataValues Number](https://github.com/DataValues/Number/)
- [DataValues Time](https://github.com/DataValues/Time/)

These can be added to the `require` section in your `composer.json` as follows. The versions shown were current as of August 2018. You can use the latest versions that work for you, since JsonDumpReader places no restrictions on these libraries.

```
{
    "require": {
        "jeroen/json-dump-reader": "~2.0",
        "data-values/geo": "~4.0|~3.0",
        "data-values/number": "~0.10.0",
        "data-values/time": "~1.0"
    }
}
```

Development
-----------

### Running CI checks and tests locally

If you have PHP and Composer installed locally, you do not need Docker and can just execute composer commands.

For tests only:

```
composer test
```

For style checks only:

```
composer cs
```

For a full CI run:

```
composer ci
```

### Docker: installation

You can develop without a local installation of PHP or Composer by using Docker. On Debian or Ubuntu, install it with:

```
sudo apt-get install docker.io docker-compose
```

### Docker: Running Composer

To pull in the project dependencies via Composer, run:

```
make composer install
```

You can run other Composer commands via `make run`, but at present this does not support argument flags. If you need to execute such a command, you can do so in this format:

```
docker run --rm --interactive --tty --volume $PWD:/app -w /app \
 --volume ~/.composer:/composer --user $(id -u):$(id -g) composer composer install --no-scripts
```

Here, `composer install --no-scripts` is the command being run.

### Docker: Running the CI checks

To run all CI checks, which include PHPUnit tests, PHPCS style checks and coverage tag validation, run:

```
make
```

### Docker: Running the tests

To run just the PHPUnit tests, run:

```
make test
```

To run only a subset of PHPUnit tests or otherwise pass flags to PHPUnit, run:

```
docker-compose run --rm app ./vendor/bin/phpunit --filter SomeClassNameOrFilter
```

Release notes
-------------

### Version 2.0.0 (2018-08-14)

- Added support for PHP 7.3
- Dropped support for PHP 5.6 and PHP 7.0
- Added scalar and return type hints
    - Breaking change for classes extending `DumpReader` or `SeekableDumpReader`

### Version 1.4.0 (2017-03-03)

- Added support for PHP 7.1 and PHP 7.2
- Dropped support for PHP 5.5

### Version 1.3.0 (2015-11-23)

- `JsonDumpFactory::newGzDumpReader` now takes an optional `$initialPosition` argument

### Version 1.2.0 (2015-11-23)

- Added `SeekableDumpReader` interface
    - `JsonDumpFactory::newGzDumpReader` now returns a `SeekableDumpReader`
    - `JsonDumpFactory::newExtractedDumpReader` now returns a `SeekableDumpReader`
- `ExtractedDumpReader` is now package private (no breaking changes to it will be made before 2.0)

### Version 1.1.0 (2015-11-12)

- Added `JsonDumpFactory::newGzDumpReader` for gzip dump support

### Version 1.0.1 (2015-11-10)

- Fixed off-by-one error in resumption of `ExtractedDumpReader` via `getPosition`

### Version 1.0.0 (2015-11-08)

- Added `JsonDumpFactory`
    - Added `JsonDumpFactory::newBz2DumpReader`
    - Added `JsonDumpFactory::newExtractedDumpReader`
    - Added `JsonDumpFactory::newStringDumpIterator`
    - Added `JsonDumpFactory::newObjectDumpIterator`
    - Added `JsonDumpFactory::newEntityDumpIterator`
- Removed `JsonDumpReader` (now `JsonDumpFactory::newExtractedDumpReader`)
- Removed `JsonDumpIterator` (now `JsonDumpFactory::newEntityDumpIterator`)
- Added `ci` command that runs PHPUnit, PHPCS, PHPMD and `@covers` tag validation

### Version 0.2.0 (2015-09-29)

- Installation with Wikibase DataModel Serialization 2.x is now supported
- Installation restrictions of Wikibase DataModel version have been dropped

### Version 0.1.0 (2014-10-22)

Initial release with

- `JsonDumpReader` to read entity JSON from the dump
- `JsonDumpIterator` to iterate through the dump as if it were a collection of `EntityDocument`

See also
--------

- [Replicator](https://github.com/JeroenDeDauw/Replicator) - a CLI application using JsonDumpReader
- [Wikibase components](http://wikiba.se/components/) - various libraries for working with Wikibase/Wikidata

### Health Score

**47** (Fair): better than 93% of packages

- Maintenance: 51 (moderate activity, may be stable)
- Popularity: 40 (moderate usage in the ecosystem)
- Community: 17 (small or concentrated contributor base)
- Maturity: 65 (established project with proven stability)
- Bus factor: 1 (top contributor holds 97.8% of commits, a single point of failure)

How is this calculated?

- **Maintenance (25%)**: last commit recency, latest release date, and issue-to-star ratio. Uses a 2-year decay window.
- **Popularity (30%)**: total and monthly downloads, GitHub stars, and forks. Logarithmic scaling prevents top-heavy scores.
- **Community (15%)**: contributors, dependents, forks, watchers, and maintainers. Measures real ecosystem engagement.
- **Maturity (30%)**: project age, version count, PHP version support, and release stability.
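Assuming the overall score is simply the weighted sum of the four sub-scores (the page does not state this explicitly), the reported numbers are consistent with it:

$$
0.25 \cdot 51 + 0.30 \cdot 40 + 0.15 \cdot 17 + 0.30 \cdot 65 = 12.75 + 12 + 2.55 + 19.5 = 46.8 \approx 47
$$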

### Release Activity

- Cadence: every ~174 days (recently every ~251 days)
- Total releases: 9
- Last release: 2882d ago
- Major versions: 0.2.0 → 1.0.0 (2015-11-08), 1.4.0 → 2.0.0 (2018-08-14)

PHP version history (4 changes):

- 0.1.0: PHP >=5.4.0
- 1.0.0: PHP >=5.5.0
- 1.4.0: PHP >=5.6.0
- 2.0.0: PHP >=7.1

### Community

Maintainers:

- [Jeroen De Dauw](/maintainers/Jeroen%20De%20Dauw)

Top contributors:

- [JeroenDeDauw](https://github.com/JeroenDeDauw) (90 commits)
- [tobijat](https://github.com/tobijat) (1 commit)
- [xEdelweiss](https://github.com/xEdelweiss) (1 commit)

---

Tags: json, dump, wikidata, wikibase, bz2

### Code Quality

- Tests: PHPUnit
- Code style: PHP\_CodeSniffer

### Alternatives

- [symfony/var-dumper](/packages/symfony-var-dumper): Provides mechanisms for walking through any arbitrary PHP variable
- [kint-php/kint](/packages/kint-php-kint): Kint - Advanced PHP dumper
- [xrdebug/php](/packages/xrdebug-php): PHP client library for xrDebug
- [leeoniya/dump-r](/packages/leeoniya-dump-r): a cleaner, leaner mix of print\_r() and var\_dump()
- [jbzoo/jbdump](/packages/jbzoo-jbdump): Script for debug and dump PHP variables and other stuff. This tool is a nice replacement for print\_r() and var\_dump() functions.
- [phalcon/dd](/packages/phalcon-dd): This package will add the `dd` and `dump` helpers to your Phalcon application.

PHPackages © 2026

