kwattro/gh4j
============

Github Repository Forks Analysis with Neo4j


[Source](https://github.com/kwattro/gh4j) | [Packagist](https://packagist.org/packages/kwattro/gh4j)

\[WIP\] Gh4j - PHP
==================

### Easily import the GitHub Events Data Archive into a Neo4j Graph Database

#### Disclaimer!

This is not compatible with the GitHub REST API, as the Event payloads are completely different. It can, however, be used as a sandbox for a later switch to the API.

### What?

This is a simple library that parses GitHub Events Archive files and loads the Events into a Neo4j graph DB.

It consists of a simple entry point for loading the data and some `EventType` Loaders that generate the Cypher queries needed to insert the data. The lib is coupled with a dead simple connection library to access the Neo4j REST API.

It also creates relationships between Events, Repositories, Users, Forks, and Comments, so that you get a full overview of what is going on on GitHub and how it is all related.

#### `Attention/Achtung/Ola:`

This lib is primarily a personal experiment. The DB schema that the queries produce reflects the way I intend to manipulate the data. I do not claim it is the best schema to use, but it is a good exercise for playing with the Neo4j graph DB.

Of course, suggestions, points of view, PRs, ... are always welcome.

NB: FYI, there are approximately 8000 events per hour on GitHub.

### How to use

#### 1. Require the `Gh4j` library in your project dependencies

Add the following requirement to your `composer.json` file:

```
"require":{
	// ..... other dependencies
	"kwattro/gh4j" : "dev-master"
}
```

#### 2. Instantiate the `Gh4j` class

```
require 'vendor/autoload.php';

use Kwattro\Gh4j\Gh4j;

$gh = new Gh4j();
```

#### 3. Download the GitHub Events Data Archive

You can download the data archives from the [GitHub Archive website](http://www.githubarchive.org). Just follow the instructions and download the data for the period you want.

Unzip the downloaded file somewhere on your computer.

#### 4. Load the data in the database

The downloaded file is not valid JSON as a whole, but each of its lines is a valid JSON document representing an `Event` occurrence.

With the PHP `file` function, you get the file as an array in which each row contains a JSON Event object. Loop through the array and load the events:

```
$events = file('/Users/kwattro/gh/data/2014-06-01-3.json', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

foreach ($events as $event) {
	$gh->loadEvent($event);
}
```
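For larger archive files, reading line by line may keep memory usage lower than `file()`, which loads the whole file into an array. The following is a minimal sketch of that idea; `readEventLines()` is a hypothetical helper and not part of `Gh4j`.

```php
<?php
// Hypothetical helper (not part of Gh4j): yields one non-empty line at a time,
// so the whole archive file never has to sit in memory as an array.
function readEventLines(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (($line = fgets($handle)) !== false) {
            $line = trim($line);
            if ($line !== '') {
                yield $line; // one JSON-encoded event per line
            }
        }
    } finally {
        fclose($handle); // also runs if the caller stops iterating early
    }
}

// Possible usage with the loop shown above:
// foreach (readEventLines('/path/to/2014-06-01-3.json') as $event) {
//     $gh->loadEvent($event);
// }
```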

The `Gh4j` library will then check the validity of the JSON, load the appropriate `EventType` Loader and insert the data into the DB.
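The check described above could look roughly like the sketch below. It is an illustration, not the library's actual code: `detectEventType()` and `SUPPORTED_EVENT_TYPES` are hypothetical names, and the sketch assumes each archive line carries its event type in a top-level `type` field.

```php
<?php
// Hypothetical names, for illustration only. The list mirrors the four
// event types this README documents as supported.
const SUPPORTED_EVENT_TYPES = ['PushEvent', 'PullRequestEvent', 'ForkEvent', 'IssueCommentEvent'];

// Returns the event type, or null when the line should be skipped.
// Assumption: the type lives in a top-level "type" field of each JSON line.
function detectEventType(string $json): ?string
{
    $event = json_decode($json, true);
    if (!is_array($event) || !isset($event['type']) || !is_string($event['type'])) {
        return null; // invalid JSON or no usable "type" field
    }
    return in_array($event['type'], SUPPORTED_EVENT_TYPES, true) ? $event['type'] : null;
}
```

Returning `null` for both invalid JSON and unsupported types keeps the calling loop simple: anything that is not a known type is skipped.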

By default, each `loadEvent()` call triggers a connection to the DB to insert the data. You can instead use a `Stack` method that queues up a given number of queries and flushes them when the specified limit is reached.

To use it, tell `loadEvent()` to use the stack by supplying `true` as the second argument:

```
// Code to read your file ....
foreach ($events as $event) {
	$gh->loadEvent($event, true);
}
// Don't forget to flush after the loop to insert remaining queued queries
$gh->flush();
```

The above stacks queries until the limit is reached, then sends the stack as one big query. With a limit of 50, this is 2x faster than calling the DB for each event: a 1-hour file containing about 8000 events is inserted in 50 seconds on a small setup.
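To make the stacking behaviour concrete, here is a minimal sketch of a queue that flushes when a limit is reached. `QueryStack` and its callback are hypothetical and only illustrate the idea; the real connector may be implemented differently.

```php
<?php
// Hypothetical sketch of the stacking idea; not the connector's real code.
final class QueryStack
{
    /** @var string[] queries waiting to be sent */
    private $queries = [];
    private $limit;
    private $send;

    public function __construct(int $limit, callable $send)
    {
        $this->limit = $limit;
        $this->send = $send; // receives the queued queries as one batch
    }

    public function push(string $query): void
    {
        $this->queries[] = $query;
        if (count($this->queries) >= $this->limit) {
            $this->flush();
        }
    }

    public function flush(): void
    {
        if ($this->queries === []) {
            return; // nothing queued, so no DB round trip
        }
        ($this->send)($this->queries);
        $this->queries = [];
    }
}

// Usage: collect the batches instead of sending them to a database.
$batches = [];
$stack = new QueryStack(2, function (array $batch) use (&$batches) {
    $batches[] = $batch;
});
foreach (['q1', 'q2', 'q3'] as $query) {
    $stack->push($query);
}
$stack->flush(); // sends the remaining 'q3'
```

The final `flush()` matters for the same reason as in the `Gh4j` example above: the last batch is usually smaller than the limit and would otherwise stay queued.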

You can adjust the limit of the `stack` by accessing the DB connector:

```
$gh = new Gh4j();
$conn = $gh->getConnector();
$conn->setStackFlushLimit(30); // Stack will be flushed after 30 queries
```

---

### Event Types supported

Currently, 4 EventTypes are handled:

- PushEvent
- PullRequestEvent
- ForkEvent
- IssueCommentEvent

Each type is handled by a custom EventLoader, which extends a BaseEventLoader that creates the Cypher query for the common payload of the Event.
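The split between common and type-specific query parts could be sketched as follows. The class names, method names and the flat input array are all hypothetical; only the Cypher shapes are taken from the generated query examples in this README.

```php
<?php
// Hypothetical classes; only the Cypher shapes come from this README's examples.
abstract class SketchBaseEventLoader
{
    /** Cypher shared by every event: the acting user, the event node, the DO relationship. */
    protected function commonQuery(string $label, string $actor, int $time): string
    {
        // Escape backslashes and single quotes so the name stays a valid string literal.
        $name = addcslashes($actor, "\\'");
        return "MERGE (u:User {name:'$name'})\n"
            . "CREATE (ev:$label {time:toInt($time)})\n"
            . "MERGE (u)-[:DO]->(ev)\n";
    }

    abstract public function buildQuery(array $event): string;
}

final class SketchForkEventLoader extends SketchBaseEventLoader
{
    // Assumption: $event is a simplified flat array, not the real archive payload.
    public function buildQuery(array $event): string
    {
        $repo = addcslashes($event['repo'], "\\'");
        return $this->commonQuery('ForkEvent', $event['actor'], $event['time'])
            . "CREATE (fork:Fork:Repository {name:'$repo'})\n"
            . "MERGE (ev)-[:FORK]->(fork)-[:OWNED_BY]->(u)";
    }
}

$query = (new SketchForkEventLoader())->buildQuery([
    'actor' => 'rudymalhi',
    'repo'  => 'Full-Stack-JS-Nodember',
    'time'  => 1401606379,
]);
```

String interpolation is used here only to mirror the literal queries shown in the examples; passing values as query parameters would be an alternative design.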

I suggest looking at the code comments inside the EventLoader directory for an overview of how the data is inserted, or at the `Generated Cypher Queries examples` section at the end of this README.

If an unsupported EventType is encountered in the data, it is skipped.

More will come ...

---

### Generated Cypher Queries examples

#### PushEvent

```
MERGE (u:User {name:'ZhukV'})
CREATE (ev:PushEvent {time:toInt(1401606330) })
MERGE (u)-[:DO]->(ev)
MERGE (repo:Repository {id:toInt(20051270)})
SET repo.name='Unicode'
MERGE (branch:Branch {ref:'refs/heads/master', repo_id:toInt(20051270)})
MERGE (ev)-[:PUSH_TO]->(branch)
MERGE (branch)-[:BRANCH_OF]->(repo)
MERGE (owner:User {name:'ZhukV'})
MERGE (repo)-[:OWNED_BY]->(owner)
```

Explanation:

- Match or create the User doing the Event
- Relate the User to this Event
- Match or create the Repository it is pushed to
- Match or create the Branch it is pushed to
- Relate the Event to the Branch with PUSH\_TO
- Relate the Branch to the Repository with BRANCH\_OF
- Match or create the Owner of the Repository
- Relate the Repository to its Owner with OWNED\_BY

#### PullRequestEvent

```
MERGE (u:User {name:'pixelfreak2005'})
CREATE (ev:PullRequestEvent {time:toInt(1401606356) })
MERGE (u)-[:DO]->(ev)
MERGE (pr:PullRequest {html_url:'https://github.com/pixelfreak2005/liqiud_android_packages_apps_Settings/pull/2'})
SET pr += { id:toInt(16573622), number:toInt(2), state:'open'}
MERGE (ev)-[:PR_OPEN]->(pr)
MERGE (ow:User {name:'pixelfreak2005'})
MERGE (or:Repository {id:toInt(20338536), name:'liqiud_android_packages_apps_Settings'})
MERGE (or)-[:OWNED_BY]->(ow)
MERGE (pr)-[:PR_ON_REPO]->(or)
```

#### ForkEvent

```
MERGE (u:User {name:'rudymalhi'})
CREATE (ev:ForkEvent {time:toInt(1401606379) })
MERGE (u)-[:DO]->(ev)
CREATE (fork:Fork:Repository {name:'Full-Stack-JS-Nodember'})
MERGE (ev)-[:FORK]->(fork)-[:OWNED_BY]->(u)
MERGE (bro:User {name:'mgenev'})
MERGE (br:Repository {id:toInt(15503488), name:'Full-Stack-JS-Nodember'})-[:OWNED_BY]->(bro)
MERGE (fork)-[:FORK_OF]->(br)
```

#### IssueCommentEvent

```
MERGE (u:User {name:'johanneswilm'})
CREATE (ev:IssueCommentEvent {time:toInt(1401606384) })
MERGE (u)-[:DO]->(ev)
MERGE (comment:IssueComment {id:toInt(44769338)})
MERGE (ev)-[:ISSUE_COMMENT]->(comment)
MERGE (issue:Issue {id:toInt(34722578)})
MERGE (repo:Repository {id:toInt(14487686)})
MERGE (comment)-[:COMMENT_ON]->(issue)-[:ISSUE_ON]->(repo)
SET repo.name = 'diffDOM'
MERGE (owner:User {name:'fiduswriter'})
MERGE (comment)-[:COMMENT_ON]->(issue)-[:ISSUE_ON]->(repo)-[:OWNED_BY]->(owner)
```

Suggestions that could improve query performance are welcome :)

