yoozi/miner
===========

PHP library to extract the metadata from a public web page and/or summarize it.

Version 1.0.3 · MIT license · PHP >= 5.3.0

[Source](https://github.com/yoozi/miner) · [Packagist](https://packagist.org/packages/yoozi/miner) · [Docs](http://golem.yoozi.cn/) · [Issues](https://github.com/yoozi/miner/issues)

Miner
=====

[![Gitter](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/yoozi/miner?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)

> This library is part of [Project Golem](http://golem.yoozi.cn/), see [yoozi/golem](https://github.com/yoozi/golem) for more info.

Miner is a PHP library that extracts metadata and interesting text content (such as the author and a summary) from HTML pages. It acts like a simplified version of the [HTML metadata parser](https://tika.apache.org/1.4/formats.html#HyperText_Markup_Language) in [Apache Tika](https://tika.apache.org/).

WTF is Miner?
-------------

Ta-da! Consider the screenshot taken from LinkedIn below:

[![image](https://cloud.githubusercontent.com/assets/275750/2751070/1773aa32-c8ae-11e3-9de3-e022ddcb851f.png)](https://cloud.githubusercontent.com/assets/275750/2751070/1773aa32-c8ae-11e3-9de3-e022ddcb851f.png)

When you share a link with your connections on LinkedIn, it automatically extracts the title, a summary, and even a cover image for you. Miner is typically used for tasks like this.

Installation
------------

The easiest way to install the Miner package is with [Composer](https://getcomposer.org).

1. Open your composer.json and add the following to the require array:

    ```
    "yoozi/miner": "1.0.*"
    ```
2. Run Composer to install or update the package dependencies:

    ```
    composer install
    ```

    or

    ```
    composer update
    ```

Usage
-----

### Parsers

- **Meta**: Summarize a webpage by parsing its HTML meta tags. In most cases it favors Open Graph (OG) markup, and will fall back to standard meta tags if necessary.
- **Readability**: Summarize a webpage using [Arc90's Readability algorithm](https://code.google.com/p/arc90labs-readability/). All credit goes to [@feelinglucky's PHP port](https://github.com/feelinglucky/php-readability).
- **Hybrid**: Combines the two parsers above, using Readability as the primary parser and Meta as its fallback.
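
To make the Meta strategy more concrete, here is a minimal sketch of "Open Graph first, standard meta tags as fallback" extraction using PHP's built-in DOM extension. It is not Miner's implementation or API: the function name `summarizeByMeta` and the returned keys are hypothetical, and the sketch assumes PHP 7.1 or later (for `??` and nullable return types), unlike Miner's own PHP >= 5.3.0 requirement.

```php
<?php
// Hypothetical helper illustrating the Meta strategy: prefer Open Graph tags,
// fall back to standard <meta> tags or <title>. NOT part of Miner's API.
function summarizeByMeta(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    // Content of the first <meta> whose attribute $attr equals $name, or null.
    $meta = function (string $attr, string $name) use ($xpath): ?string {
        $nodes = $xpath->query(sprintf('//meta[@%s="%s"]/@content', $attr, $name));
        return $nodes->length > 0 ? trim($nodes->item(0)->nodeValue) : null;
    };

    $titleNodes = $xpath->query('//title');
    $htmlTitle = $titleNodes->length > 0 ? trim($titleNodes->item(0)->textContent) : null;

    return [
        // Open Graph first, then the plain HTML equivalent where one exists.
        'title'       => $meta('property', 'og:title') ?? $htmlTitle,
        'description' => $meta('property', 'og:description') ?? $meta('name', 'description'),
        'image'       => $meta('property', 'og:image'),
        'author'      => $meta('name', 'author'),
    ];
}
```

Real pages also use other variants (for example Twitter Card tags), so a production parser would likely check more sources than this sketch does.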

Hybrid is enabled by default. You can change parsers to best fit your needs:

```php
// Use the Readability parser.
$extractor->getConfig()->set('parser', 'readability');

// Or use the Hybrid parser (the default).
// $extractor->getConfig()->set('parser', 'hybrid');

// Or use the Meta parser.
// $extractor->getConfig()->set('parser', 'meta');
```
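
The Hybrid strategy described above can be thought of as a simple fallback chain. The following is a hypothetical illustration, not Miner's code: `hybridSummarize`, the callable-based parser interface, and the field names are assumptions, and the sketch assumes missing fields are filled in one at a time, which may not match how Miner actually decides to fall back. It requires PHP 7.0 or later.

```php
<?php
// Hypothetical "primary parser + fallback" chain, in the spirit of the Hybrid parser.
// Each parser is modeled as a callable that takes HTML and returns an array of fields.
function hybridSummarize(string $html, callable $primary, callable $fallback): array
{
    $result = $primary($html);
    $backup = null; // computed lazily, only if the primary parser leaves a gap

    foreach (['title', 'description', 'image', 'author'] as $field) {
        if (!empty($result[$field])) {
            continue; // the primary parser already produced a value
        }
        if ($backup === null) {
            $backup = $fallback($html);
        }
        $result[$field] = $backup[$field] ?? null;
    }

    return $result;
}
```

Computing the fallback lazily means the second parser only runs when the first one leaves a field empty.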

### Example

We can parse a remote URL and extract its metadata directly.

```
