mahadazad/page-scraper
======================

[Source](https://github.com/mahadazad/page-scraper) | [Packagist](https://packagist.org/packages/mahadazad/page-scraper)


Page Scraper
============

An easy-to-use page scraper that needs just a few lines of code. Scrape data from any website using XPath or CSS selectors.

Introduction:
=============

The easiest way to parse data from a valid XML/HTML page is to use XPath queries. The method of fetching the remote data can vary, though: the built-in `file_get_contents` function (which uses PHP streams), `cURL`, or the popular `Guzzle` library can all be used. To decouple the final product, i.e. the `Page`, from the remote page fetching logic, and to avoid leaving the `Page` object in an inconsistent state, I have used the Builder pattern. The `Page` object is passed to the builder object, which contains the logic for fetching the remote page; the builder is then passed to the director object, which tells the builder how to configure the `Page` object. In a nutshell:

```
$page = new Page('https://news.ycombinator.com');
$builder = new PageBuilder($page);
$builder->setDataConfig(array(
    'side_links' => array('css' => '.title .comhead'), // use CSS Selector
    'titles'     => '//td[@class="title"]//a/text()', // use XPath
    'links'      => '//td[@class="title"]//a/@href', // use XPath
));
$director = new PageBuilderDirector($builder);
$director->buildPage();
$data = $page->getData();
```
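To make the division of roles concrete, here is a minimal, self-contained sketch of the same Builder arrangement. All class names in it (`SketchPage`, `SketchBuilder`, `StringBuilderSketch`, `SketchDirector`) are hypothetical and are not the library's actual classes. It also assumes the markup comes from an in-memory string rather than a remote fetch, so that it runs without network access.

```php
<?php
// Hypothetical classes for illustration only; NOT the library's real code.

interface SketchBuilder
{
    public function loadContent();   // obtain the raw markup
    public function extractData();   // run the configured queries
}

class SketchPage
{
    public $url;
    public $data = array();

    public function __construct($url)
    {
        $this->url = $url;
    }
}

// One concrete builder: reads markup from a string instead of the network.
// A cURL- or Guzzle-based builder would differ only in loadContent().
class StringBuilderSketch implements SketchBuilder
{
    private $page;
    private $html;
    private $queries;
    private $xpath;

    public function __construct(SketchPage $page, $html, array $queries)
    {
        $this->page = $page;
        $this->html = $html;
        $this->queries = $queries;
    }

    public function loadContent()
    {
        $dom = new DOMDocument();
        // Suppress the warnings libxml raises for imperfect real-world HTML.
        @$dom->loadHTML($this->html);
        $this->xpath = new DOMXPath($dom);
    }

    public function extractData()
    {
        foreach ($this->queries as $key => $query) {
            $values = array();
            foreach ($this->xpath->query($query) as $node) {
                $values[] = $node->nodeValue;
            }
            $this->page->data[$key] = $values;
        }
    }
}

// The director fixes the order of the build steps.
class SketchDirector
{
    private $builder;

    public function __construct(SketchBuilder $builder)
    {
        $this->builder = $builder;
    }

    public function buildPage()
    {
        $this->builder->loadContent();
        $this->builder->extractData();
    }
}

$html = '<html><body><table><tr><td class="title">'
      . '<a href="http://www.example.com/one">One</a>'
      . '</td></tr></table></body></html>';

$page = new SketchPage('http://www.example.com');
$builder = new StringBuilderSketch($page, $html, array(
    'titles' => '//td[@class="title"]//a/text()',
    'links'  => '//td[@class="title"]//a/@href',
));
$director = new SketchDirector($builder);
$director->buildPage();

print_r($page->data);
```

Because the director only calls the builder's interface methods, swapping the fetching strategy means writing another builder; the `Page` and the director stay unchanged.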

Using Client Class To Make Things Easier:
=========================================

To avoid the boilerplate, you can use the `Client` class:

```
$client = new Client(array(
    'url'         => 'https://news.ycombinator.com',
    'data_config' => array(
        'titles' => '//td[@class="title"]//a/text()', // the xpath query
        'links' => '//td[@class="title"]//a/@href', // the xpath query
    ),
));

$page = $client->fetchPage();
$data = $page->getData();

/*
  prints:
   array (
    'titles' => array(
        'title one from the remote page',
        'title two from the remote page',
        'title three from the remote page',
        // so on...
    ),
    'links' => array(
        'http://www.example.com/one',
        'http://www.example.com/two',
        'http://www.example.com/three',
        // so on...
    ),
  )
*/
print_r($data);
```
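Since the result holds `titles` and `links` as two separate lists, you may want to combine them into one record per story. The sketch below uses placeholder data in the shape shown above and assumes that the two lists line up position by position, which the library does not guarantee; the `$stories` variable is introduced here only for illustration.

```php
<?php
// Sample data in the shape returned by getData(); the values are placeholders.
$data = array(
    'titles' => array('title one', 'title two'),
    'links'  => array('http://www.example.com/one', 'http://www.example.com/two'),
);

// Pair titles with links only when both lists have the same length;
// otherwise the positions cannot be trusted to line up.
$stories = array();
if (count($data['titles']) === count($data['links'])) {
    foreach ($data['titles'] as $i => $title) {
        $stories[] = array('title' => $title, 'link' => $data['links'][$i]);
    }
}

print_r($stories);
```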

That said, you can also set your own `builders` and `directors` using the client's setter methods. Please see the class definition for the docs.

Advanced Parsing Data:
======================

The `data_config` array holds `key` => `value` pairs. Each value can be a valid XPath query, a CSS selector, or a callback that receives the configured `Page` object, which you can use for advanced parsing. The parsed result is stored under the corresponding `key`. For example:

```
$client = new Client(array(
    'url'         => 'https://news.ycombinator.com',
    'data_config' => array(
        'side_links' => array('css' => '.title .comhead'), // use CSS selector
        'titles' => '//td[@class="title"]//a/text()', // use xpath query
        'links' => function ($page) {
            $links = array();
            $node_list = $page->getXpath()->query('//td[@class="title"]//a/@href');
            foreach($node_list as $node) {
                $links[] = $node->nodeValue;
            }
            return $links;
        },
    ),
));

$page = $client->fetchPage();
$data = $page->getData();
```
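A callback is useful when a single query string is not enough, for instance when you want several values from each matched node. The following is a standalone sketch using PHP's built-in `DOMDocument` and `DOMXPath` classes. The `$xpath` object stands in for whatever `$page->getXpath()` returns in the example above (assumed here to be a `DOMXPath` instance), and the `$extractStories` closure and the inline HTML are hypothetical.

```php
<?php
// Stand-in for the object returned by $page->getXpath(); it is built from an
// inline HTML string so that the sketch runs on its own.
$html = '<table>'
      . '<tr><td class="title"><a href="http://www.example.com/one"> One </a></td></tr>'
      . '<tr><td class="title"><a href="http://www.example.com/two">Two</a></td></tr>'
      . '</table>';
$dom = new DOMDocument();
@$dom->loadHTML($html); // suppress libxml warnings for imperfect markup
$xpath = new DOMXPath($dom);

// Hypothetical callback body: return one record per anchor element,
// reading the text and the href from the same node.
$extractStories = function (DOMXPath $xpath) {
    $stories = array();
    foreach ($xpath->query('//td[@class="title"]//a') as $anchor) {
        $stories[] = array(
            'title' => trim($anchor->textContent),
            'link'  => $anchor->getAttribute('href'),
        );
    }
    return $stories;
};

$stories = $extractStories($xpath);
print_r($stories);
```

Reading both values from the same element keeps each title attached to its own link, without relying on two independent result lists having the same order and length.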

Installation:
=============

Use Composer to install the library. In your `composer.json`:

```
{
    "require": {
        "mahadazad/page-scraper": "dev-master"
    }
}
```

or run

`php composer.phar require "mahadazad/page-scraper":"dev-master"`
