graceas/php-scraper-engine
==========================

Scraper engine.

Latest release: v0.2.5 · License: MIT · Requires PHP >=5.5.9

[Source](https://github.com/Graceas/php-scraper-engine) · [Packagist](https://packagist.org/packages/graceas/php-scraper-engine)

ScraperEngine
=============

Scraper engine.

Installation
============

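Install through Composer, either by running `composer require graceas/php-scraper-engine:v0.2.5` or by adding the package to the `require` section of your `composer.json`: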

```
"require": {
    ...
    "graceas/php-scraper-engine": "v0.2.5"
    ...
}

```

Usage
=====

```
include_once __DIR__.'/../vendor/autoload.php';

$scraper = new \ScraperEngine\Scraper(array(
    // prepare URLs
    new \ScraperEngine\Rules\PaginatorRequestBuilderRule(
        'index_pages', // name under which this rule's result is stored
        array(),       // names of stored results used as input (none here)
        array(         // rule settings
            'categories' => array(
                'cat1/subcat1' => array(
                    'start_page' => 3,
                    'end_page'   => 5
                ),
                'cat1/subcat2' => array(
                    'start_page' => 3,
                    'end_page'   => 5
                ),
            ),
            'base_url' => 'https://www.example.com/{category}/?page={page}',
            // callback that turns each generated URL into a request object
            'create_request_function' => function ($url) {
                return (new \ScraperEngine\Loader\Request\SimpleCurlRequestWrapper())
                    ->setUrl($url)
                    ->setMethod(\SimpleCurlWrapper\SimpleCurlRequest::METHOD_GET)
                    ->setHeaders(array(
                        'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
                        'accept-language: en-GB,en-US;q=0.9,en;q=0.8,ru;q=0.7',
                        'cache-control: no-cache',
                        'cookie: dfp_segment_test=47; dfp_segment_test_v3=17; dfp_segment_test_v4=62; dfp_segment_test_oa=2',
                        'pragma: no-cache',
                        'referer: https://www.example.com/',
                        'sec-fetch-mode: navigate',
                        'sec-fetch-site: same-origin',
                        'sec-fetch-user: ?1',
                        'upgrade-insecure-requests: 1',
                        'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36',
                    ));
            }
        )
    ),
    // create requests and load
    new \ScraperEngine\Rules\LoadRequestsRule(
        'index_pages_responses', // name under which this rule's result is stored
        array( // names of stored results used as input
            'index_pages'
        ),
        array( // rule settings
            'loader' => new \ScraperEngine\Loader\SimpleCurlLoaderWrapper(),
            'response_class' => '\ScraperEngine\Loader\Response\SimpleCurlResponseWrapper'
        )
    ),
    // parse responses content
    new \ScraperEngine\Rules\ParseResponsesRule(
        'index_pages_products',
        array(
            'index_pages_responses'
        ),
        array(
            'parser'       => new \ScraperEngine\Parser\HtmlToArrayParser(),
            'instructions' => file_get_contents('index_pages.xpath')
        )
    ),
    // flatten the multi-dimensional array into a single-dimensional one
    new \ScraperEngine\Rules\FormatResponsesRule(
        'index_pages_products_formatted_flatted',
        array(
            'index_pages_products'
        ),
        array(
            'formatter' => new \ScraperEngine\Formatter\ArrayToFlatArrayFormatter()
        )
    ),
    // format every array item to json
    new \ScraperEngine\Rules\FormatResponsesRule(
        'index_pages_products_formatted_json',
        array(
            'index_pages_products_formatted_flatted'
        ),
        array(
            'formatter' => new \ScraperEngine\Formatter\ArrayToJsonFormatter(array(
                \ScraperEngine\Formatter\ArrayToJsonFormatter::OPTION_SPLIT_ARRAY_TO_SINGLE_ELEMENTS => true,
            ))
        )
    ),
    // store every item
    new \ScraperEngine\Rules\StoreDataRule(
        '',
        array(
            'index_pages_products_formatted_json'
        ),
        array(
            'storage' => new \ScraperEngine\Storage\FileStorage('results')
        )
    ),
));

$scraper->execute();

```
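The `PaginatorRequestBuilderRule` settings above combine a `base_url` template with per-category page ranges. The following is a minimal sketch of the URLs such a configuration would plausibly produce. It is not the library's implementation: `expandPaginatedUrls` is a hypothetical helper written for illustration, and it assumes that `start_page` and `end_page` are both inclusive.

```php
<?php
/**
 * Hypothetical helper (not part of php-scraper-engine): expands a URL
 * template for every category and every page number in its range.
 * Assumption: start_page and end_page are both inclusive.
 */
function expandPaginatedUrls($baseUrl, array $categories)
{
    $urls = array();
    foreach ($categories as $category => $range) {
        for ($page = $range['start_page']; $page <= $range['end_page']; $page++) {
            // Substitute both placeholders in a single pass.
            $urls[] = strtr($baseUrl, array(
                '{category}' => $category,
                '{page}'     => (string) $page,
            ));
        }
    }

    return $urls;
}

$urls = expandPaginatedUrls(
    'https://www.example.com/{category}/?page={page}',
    array(
        'cat1/subcat1' => array('start_page' => 3, 'end_page' => 5),
        'cat1/subcat2' => array('start_page' => 3, 'end_page' => 5),
    )
);

foreach ($urls as $url) {
    echo $url, PHP_EOL;
}
```

Under the inclusive-range assumption, the two categories yield three URLs each. In the real pipeline each URL is then passed to `create_request_function` to build the request.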

index\_pages.xpath content
==========================

```
;; => query('//table[contains(@class, "ad_id")]') || null
title => current -> query('.//td[contains(@class, "title-cell")]/div/h3') -> item('0') -> __get('nodeValue') || null
category => current -> query('.//td[contains(@class, "title-cell")]/div/p') -> item('0') -> __get('nodeValue') || null
product_id => current -> getAttribute('data-id') || null

```
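To show what these instructions select, here is a rough equivalent written against PHP's built-in `DOMDocument` and `DOMXPath` classes. This is an interpretation of the instruction file, not the code of `HtmlToArrayParser`: it assumes that the `;;` line picks the nodes to iterate over, that `current` refers to the node being iterated, and that `|| null` supplies a fallback value. The sample markup and the `extractProducts` function are made up for the example.

```php
<?php
// Hypothetical sample markup; the real page structure is not given here.
$html = <<<'HTML'
<html><body>
<table class="ad_id" data-id="101">
  <tr><td class="title-cell"><div><h3>First product</h3><p>Category A</p></div></td></tr>
</table>
<table class="ad_id" data-id="102">
  <tr><td class="title-cell"><div><h3>Second product</h3><p>Category B</p></div></td></tr>
</table>
</body></html>
HTML;

/**
 * Illustrative function (not part of php-scraper-engine): applies the same
 * XPath queries as the instruction file using plain DOM classes.
 */
function extractProducts($html)
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $products = array();

    // ";; => query(...)": select the nodes to iterate over.
    foreach ($xpath->query('//table[contains(@class, "ad_id")]') as $current) {
        // "current -> query(...) -> item('0')": first match relative to the node.
        $title    = $xpath->query('.//td[contains(@class, "title-cell")]/div/h3', $current)->item(0);
        $category = $xpath->query('.//td[contains(@class, "title-cell")]/div/p', $current)->item(0);

        $products[] = array(
            // "|| null": fall back to null when nothing matched.
            'title'      => $title ? $title->nodeValue : null,
            'category'   => $category ? $category->nodeValue : null,
            'product_id' => $current->hasAttribute('data-id') ? $current->getAttribute('data-id') : null,
        );
    }

    return $products;
}

print_r(extractProducts($html));
```

Each array element carries the `title`, `category`, and `product_id` keys named on the left-hand side of the instruction lines.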

