himedia/emr-monitoring
======================

Command-line tool for monitoring Amazon Elastic MapReduce (Amazon EMR) jobflows and analyzing past jobflows.

v1.7.0, Apache-2.0, PHP >=5.3.3. [Source](https://github.com/Hi-Media/EmrMonitoring) | [Packagist](https://packagist.org/packages/himedia/emr-monitoring)

EMR Monitoring
==============

[![Latest stable version](https://camo.githubusercontent.com/59dc486a05582ed515b05f695968125f0db6feca32ade38e575067530a8182cd/68747470733a2f2f706f7365722e707567782e6f72672f68696d656469612f656d722d6d6f6e69746f72696e672f762f737461626c652e706e67 "Latest stable version")](https://packagist.org/packages/himedia/emr-monitoring)[![Dependency Status](https://camo.githubusercontent.com/caf9cef61c07ef755b4420942e58e5dbbb79d1c22bdd366078d1717276031bda/68747470733a2f2f7777772e76657273696f6e6579652e636f6d2f757365722f70726f6a656374732f3533353239336562666530643037346633383030303035372f62616467652e706e67)](https://www.versioneye.com/user/projects/535293ebfe0d074f38000057)

Command-line tool for monitoring Amazon Elastic MapReduce ([Amazon EMR](http://aws.amazon.com/elasticmapreduce/)) jobflows and analyzing past jobflows.

Table of Contents
-----------------

- [Overview](#overview)
- [Description](#description)
    - [Retrieve information from many places](#retrieve-information-from-many-places)
    - [All that information is gathered in one screen](#all-that-information-is-gathered-in-one-screen)
    - [Task timeline](#task-timeline)
- [Installing](#installing)
    - [Git clone](#git-clone)
    - [Configuration](#configuration)
    - [Dependencies](#dependencies)
- [Usage](#usage)
    - [Command line options](#command-line-options)
    - [With a finished jobflow](#with-a-finished-jobflow)
    - [With a new jobflow](#with-a-new-jobflow)
- [Documentation](#documentation)
- [Copyrights & licensing](#copyrights--licensing)
- [ChangeLog](#changelog)
- [Git branching model](#git-branching-model)

Overview
--------

[![Overview](doc/images/overview.png "Overview")](doc/images/overview.png)

Description
-----------

### Retrieve information from many places

1. Amazon EMR via [Amazon Elastic MapReduce Ruby Client](http://aws.amazon.com/developertools/2264) to get the description of a jobflow:

    ```bash
    $ elastic-mapreduce --describe …
    ```

2. Amazon EC2 via [Amazon EC2 API Tools](http://aws.amazon.com/developertools/351) to retrieve the spot instance price history:

    ```bash
    $ ec2-describe-spot-price-history …
    ```

3. Amazon S3 via [S3cmd](http://s3tools.org/s3cmd) to get the size of both input and output files, to retrieve potential errors and to get the log summary:

    ```bash
    $ s3cmd ls <input|output>
    $ s3cmd get s3://…/steps/…/stderr
    $ s3cmd get s3://…/jobs/job_…
    ```

4. Amazon Elastic MapReduce pricing of On-Demand instances via this [URL](http://aws.amazon.com/elasticmapreduce/pricing/) and its underlying [JSON service](http://aws.amazon.com/elasticmapreduce/pricing/pricing-emr.json).
5. [Hadoop JobTracker](http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-manage-view-web-interfaces.html) running on the master node and accessed through an automatic SSH tunnel:

    ```bash
    $ ssh -N -L 12345:localhost:9100 hadoop@…
    $ wget …
    ```

6. Additionally, **EMR Monitoring** computes elapsed times between various events and estimates the jobflow's total cost.
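As an illustration of the S3 step above, the total size of a set of input or output files can be derived from an `s3cmd ls` listing. This is only a sketch, not the tool's own code: the helper name `sum_s3_sizes` is hypothetical, and it assumes each object line has the layout `DATE TIME SIZE URI` while directory lines carry `DIR` instead of a size.

```bash
# Sum the size column of an `s3cmd ls` listing read from stdin.
# Assumption: object lines look like "DATE TIME SIZE URI";
# directory lines ("DIR s3://…") have no numeric third field and are skipped.
sum_s3_sizes() {
    awk '$3 ~ /^[0-9]+$/ { total += $3 } END { printf "%.0f\n", total }'
}

# Usage (the bucket path is a placeholder):
#   s3cmd ls s3://my-bucket/path/to/input/ | sum_s3_sizes
```

The result is a byte count; piping both the input and the output listing through the same helper gives the two sizes side by side.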

### All that information is gathered in one screen

An *animation* is better than a thousand words:

[![Animated monitoring](doc/images/animated-monitoring.gif "Animated monitoring")](doc/images/animated-monitoring.gif)

Result with a completed jobflow *(click for full resolution image)*:

[![A completed jobflow](doc/images/completed-jobflow-mini.png "A completed jobflow")](doc/images/completed-jobflow.png)

#### Some clarifications

##### Price

- The ask price for spot instances comes in real time from EC2 API Tools.
- The total price in the general section is the sum of the prices of each instance group, *i.e.* for each group: `<instance count> × <hourly price> × ceil(<elapsed hours>)`.
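The per-group computation can be sketched in shell with `awk`. This is a minimal sketch under stated assumptions: the three factors are taken to be the instance count, a fixed hourly price and the number of hours rounded up, and the function name and argument order are illustrative rather than part of EMR Monitoring.

```bash
# Cost of one instance group: instance count × hourly price × ceil(hours).
# Arguments (illustrative): <instance count> <price per hour> <elapsed seconds>
group_cost() {
    awk -v n="$1" -v price="$2" -v secs="$3" 'BEGIN {
        hours = int(secs / 3600)
        if (secs % 3600 > 0) hours++      # any started hour counts as a full hour
        printf "%.4f\n", n * price * hours
    }'
}

# 5 instances at 0.015 per hour for 1.5 hours count as 2 full hours.
group_cost 5 0.015 5400    # prints 0.1500
```

Summing the result over the master, core and task groups gives an estimate of the jobflow's total price.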

##### Elapsed times

- Elapsed times in gray measure the time elapsed between initialization and start date of instance/step, and between start date and end date of instance/step.
- When start date or end date is unknown, then elapsed times are computed according to the local time and a `≈` sign is added.
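The fallback to local time can be illustrated as follows. This is a sketch, not the tool's implementation: it assumes dates are available as Unix timestamps, and the function name `elapsed` and the `HH:MM:SS` output format are choices made for this example.

```bash
# Print the elapsed time between two Unix timestamps as HH:MM:SS.
# If the end timestamp is missing, fall back to the local clock and prefix "≈".
elapsed() (
    start=$1
    end=${2:-}
    prefix=""
    if [ -z "$end" ]; then
        end=$(date +%s)
        prefix="≈"
    fi
    delta=$((end - start))
    printf '%s%02d:%02d:%02d\n' "$prefix" \
        $((delta / 3600)) $((delta % 3600 / 60)) $((delta % 60))
)

elapsed 1000 4661     # end date known
elapsed 1000          # end date unknown: measured against the local clock
```

The `≈` prefix signals that the value depends on the clock of the machine running the command, which may drift from the timestamps reported by AWS.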

##### Completion percentages

Completion percentages are computed from Hadoop JobTracker data and are **NOT** the number of remaining tasks divided by the number of completed tasks.

##### Error messages

Error messages, if any, are always displayed:

[![Jobflow failed](doc/images/jobflow-failed.png "Jobflow failed")](doc/images/jobflow-failed.png)

### Task timeline

A task timeline covering all jobs of an **in-progress or past jobflow** is generated via [gnuplot](http://www.gnuplot.info/). It details the number of mapper, shuffle, merge and reducer tasks.

*Animation* from generated task timelines throughout jobflow run:

[![Animated task timeline](doc/images/animated-task-timeline.gif "Animated task timeline")](doc/images/animated-task-timeline.gif)

Result with a completed jobflow *(click for full resolution image)*:

[![Task timeline of a completed jobflow](doc/images/tasktimeline-mini.png "Task timeline of a completed jobflow")](doc/images/tasktimeline.png)

Installing
----------

### Git clone

Create a folder, *e.g.* `/usr/local/lib/emr-monitoring`, and `cd` into it. Then clone the repository (the folder must be empty!):

```
$ git clone https://github.com/Hi-Media/EmrMonitoring.git .
```

### Configuration

Initialize configuration file from `conf/config-dist.php` and adapt it:

[![Config file](doc/images/dependency-config-file.png "Config file")](doc/images/dependency-config-file.png)

```
$ cp '/usr/local/lib/emr-monitoring/conf/config-dist.php' '/usr/local/lib/emr-monitoring/conf/config.php'
```

If Bash is not your default shell, then set `$aConfig['Himedia\EMR']['shell']` to your Bash interpreter path, *e.g.* `/bin/bash`.

### Dependencies

All dependencies are checked at launch and **EMR Monitoring** systematically helps to resolve them.
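To get a quick picture before the first launch, a similar check can be approximated by hand. The sketch below is not the tool's own check: the helper `missing_deps` is hypothetical, and it assumes the command-line tools are reachable through `PATH`, which may not hold if you rely on the paths set in `conf/config.php`.

```bash
# Print the name of each command that cannot be found on PATH.
missing_deps() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Command names taken from this README; adjust the list to your installation.
missing_deps elastic-mapreduce ec2-describe-spot-price-history s3cmd gnuplot
```

An empty output means every listed command was found.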

#### Composer dependencies

PHP class autoloading and PHP dependencies are managed by [composer](http://getcomposer.org).

[![Composer dependencies](doc/images/dependency-composer.png "Composer dependencies")](doc/images/dependency-composer.png)

##### Text version

To set up the project dependencies with composer, run one of the following commands:

```
$ composer install
# or
$ php composer.phar install
```

If needed, to install composer locally, run one of the following commands:

```
$ curl -sS https://getcomposer.org/installer | php
# or
$ wget --no-check-certificate -q -O- https://getcomposer.org/installer | php
```

Read the [composer](http://getcomposer.org) documentation for more information.

#### EMR CLI

[Amazon Elastic MapReduce Ruby Client](http://aws.amazon.com/developertools/2264) is needed to get the description of a jobflow. *Warning: it requires Ruby 1.8.7 and is not compatible with later versions of Ruby.*

[![Dependency on EMR CLI](doc/images/dependency-emr-cli.png "Dependency on EMR CLI")](doc/images/dependency-emr-cli.png)

##### Text version

To install Amazon EMR Command Line Interface:

```
$ sudo apt-get install ruby1.8-full
$ mkdir /usr/local/lib/elastic-mapreduce-cli
$ wget http://elasticmapreduce.s3.amazonaws.com/elastic-mapreduce-ruby.zip
$ unzip -d /usr/local/lib/elastic-mapreduce-cli elastic-mapreduce-ruby.zip
```

Create a file named `/usr/local/lib/elastic-mapreduce-cli/credentials.json` with at least the following lines:

```
{
    "keypair": "Your key pair name",
    "key-pair-file": "The path and name of your PEM/private key file"
}
```

The `key-pair-file` key is used in particular to open an SSH tunnel to the master node and consult the Hadoop JobTracker.

If necessary, adapt `emr_cli_bin`, `aws_access_key` and `aws_secret_key` keys of `$aConfig['Himedia\EMR']` in `conf/config.php`.

Read the [Amazon Elastic MapReduce Ruby Client](http://aws.amazon.com/developertools/2264) page for more information.

#### EC2 API Tools

[Amazon EC2 API Tools](http://aws.amazon.com/developertools/351) is used to retrieve the spot instance price history.

[![Dependency on EC2 API Tools](doc/images/dependency-ec2.png "Dependency on EC2 API Tools")](doc/images/dependency-ec2.png)

##### Text version

To install Amazon EC2 API Tools:

```
$ wget http://s3.amazonaws.com/ec2-downloads/ec2-api-tools.zip
$ unzip -d /usr/local/lib ec2-api-tools.zip
```

If necessary, adapt the `ec2_api_tools_dir`, `aws_access_key` and `aws_secret_key` keys of `$aConfig['Himedia\EMR']` in `conf/config.php`.

Set and export both the `JAVA_HOME` and `EC2_HOME` environment variables. For example, include these commands in your `~/.bashrc` and reload it:

```
    export JAVA_HOME=/usr
    export EC2_HOME=/usr/local/lib/ec2-api-tools-1.6.7.2
```
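After reloading `~/.bashrc`, it can be worth confirming that both variables point to existing directories. The following is a small sketch for that purpose; the helper `check_env_dirs` is hypothetical and not shipped with EMR Monitoring or the EC2 API Tools.

```bash
# Report each named environment variable that is unset or is not a directory.
# Returns a non-zero status if at least one variable fails the check.
check_env_dirs() (
    status=0
    for name in "$@"; do
        eval "value=\${$name:-}"          # indirect lookup of the variable named in $name
        if [ -z "$value" ] || [ ! -d "$value" ]; then
            printf '%s is unset or not a directory\n' "$name" >&2
            status=1
        fi
    done
    exit "$status"
)

check_env_dirs JAVA_HOME EC2_HOME || echo "fix the variables listed above"
```

The function runs in a subshell, so its working variables do not leak into your session.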

Read [http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/setting\_up\_ec2\_command\_linux.html](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/setting_up_ec2_command_linux.html) for more information.

#### S3cmd

[S3cmd](http://s3tools.org/s3cmd) is required to get size of both input and output files, to retrieve potential errors and to get log summary.

[![Dependency on S3cmd](doc/images/dependency-s3cmd.png "Dependency on S3cmd")](doc/images/dependency-s3cmd.png)

##### Text version

Please run:

```
$ sudo apt-get install s3cmd
$ s3cmd --configure
```

Read the [S3cmd](http://s3tools.org/s3cmd) documentation for more information.

#### Gnuplot

Task timelines are generated via [gnuplot](http://www.gnuplot.info/) for an **in-progress or past jobflow** and give details on the number of mapper, shuffle, merge and reducer tasks.

[![Dependency on Gnuplot](doc/images/dependency-gnuplot.png "Dependency on Gnuplot")](doc/images/dependency-gnuplot.png)

##### Text version

```
$ sudo apt-get install gnuplot
```

Usage
-----

### Command line options

You can view the options by running:

```
$ src/emr-monitoring.php [-h|--help]
```

[![CLI options](doc/images/cli-options.png "CLI options")](doc/images/cli-options.png)

##### Text version

```
Usage
    emr_monitoring.php [OPTION]…

Options
    -h, --help
        Display this help.

    -l, --list-all-jobflows
        List all jobflows in the last 2 weeks.

    -j, --jobflow-id <jobflowid>
        Display statistics on any <jobflowid>, finished or in progress.
        ⇒ to monitor a jobflow in real-time: watch -n10 --color emr_monitoring.php -j <jobflowid>

    --list-input-files
        With -j, list all S3 input files really loaded by Hadoop instance of the completed <jobflowid>.
        Disable --json.

    --json
        With -j, convert statistics to JSON format.
        Overridden by --list-input-files.

    -p, --ssh-tunnel-port <port>
        With -j, specify the <port> used to establish a connection to the master node and retrieve data
        from the Hadoop jobtracker.

    -d, --debug
        Enable debug mode and list all shell commands.
```

### With a finished jobflow

Simply:

```
$ src/emr-monitoring.php -j <jobflowid>
```

### With a new jobflow

1. Launch a jobflow using Amazon Elastic MapReduce:

    ```bash
    $ /usr/local/lib/elastic-mapreduce-cli/elastic-mapreduce \
        --region us-east-1 --log-uri s3n://path/to/hadoop-logs \
        --create --name my-name --visible-to-all-users --enable-debugging \
        --pig-script s3://path/to/script.pig \
        --args "-p,INPUT=s3://path/to/input" \
        --args "-p,OUTPUT=s3://path/to/output" \
        --args … \
        --instance-group master --instance-type m1.medium --instance-count 1 \
        --instance-group core --instance-type m1.medium --instance-count 5 \
        --instance-group task --instance-type m1.medium --instance-count 90 --bid-price 0.015
    ```

2. You can see it in the list of all jobflows:

    ```bash
    $ src/emr-monitoring.php -l
    ```

    ![All jobflows](doc/images/list-all-jobflows.png "All jobflows")

3. Start monitoring the jobflow:

    ```bash
    $ watch -n15 --color src/emr-monitoring.php -j j-88OW7Z7O3T9H
    ```

    You can easily view the task timeline with, for example, [Eye of Gnome](http://projects.gnome.org/eog/):

    ```bash
    $ eog <task-timeline-image> &
    ```

Documentation
-------------

The [API documentation](http://htmlpreview.github.io/?https://github.com/Hi-Media/EmrMonitoring/blob/stable/doc/api/index.html) is generated by [ApiGen](http://apigen.org/) and included in the `doc/api` folder.

Copyrights & licensing
----------------------

Licensed under the Apache License 2.0. See [LICENSE](LICENSE) file for details.

ChangeLog
---------

See [CHANGELOG](CHANGELOG.md) file for details.

Git branching model
-------------------

The git branching model used for development is the one described and assisted by the `twgit` tool.
