
daa/web-scraping-sdk
====================

Composer package that simplifies web scraping

v1.1 · MIT · PHP >=5.3.3

[Source](https://github.com/danielanteloagra/web-scraping-sdk) · [Packagist](https://packagist.org/packages/daa/web-scraping-sdk)

Web Scraping PHP SDK
====================


This is a Composer package that simplifies web content scraping by providing a lightweight, easy-to-use code base.

Simply extend the provided WebScraper class and implement the gather() method to extract the desired content using XPath expressions. You can then write this content to a file, store it in a database, return a JSON string, etc.

Highlights:

- XPath-driven extraction of content
- Just one method to implement
- Allows easy file writing, database storage or formatted string/object return
- PSR-2 coding standard
- Uses cURL to retrieve content from the specified source
- Configurable retry count and pause time for failed attempts
- Easily follow links to get additional content

Packagist link: https://packagist.org/packages/daa/web-scraping-sdk

Usage
-----


Add the following requirement to your composer.json file and run composer install or composer update:

```
  "require": {
        ...
        "daa/web-scraping-sdk": "1.*"
  },

```

Write your own scraper class which extends Scraper\\Sdk\\WebScraper and implements the gather method:

```
namespace Your\Package\Scraper;

use Scraper\Sdk\WebScraper;

class YourScraper extends WebScraper
{
    /**
     * {@inheritdoc}
     */
    protected function gather(\DOMXPath $dom)
    {
        $nodes = $dom->query(".//article[@class='product']");
        foreach ($nodes as $node) {
            // ...

            // follow a URL and extract more data
            $linkDom = $this->getLinkContent($node->getElementsByTagName('a')->item(0));
            // $linkDom->query...
        }
    }
}

```
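To get a feel for the kind of XPath work a gather() implementation does, here is a standalone sketch that uses only PHP's built-in DOMDocument and DOMXPath classes. It does not use the SDK at all: the HTML markup, the `$products` array and its field names are made up for illustration, and in a real scraper the DOMXPath object would be the one passed to gather().

```php
<?php
// Made-up markup standing in for a fetched page.
$html = <<<HTML
<html><body>
  <article class="product"><h2>Red Mug</h2><a href="/products/red-mug">details</a></article>
  <article class="product"><h2>Blue Mug</h2><a href="/products/blue-mug">details</a></article>
</body></html>
HTML;

// Collect libxml warnings internally instead of printing them
// (libxml complains about HTML5 tags such as <article>).
libxml_use_internal_errors(true);
$document = new DOMDocument();
$document->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($document);

$products = array();
foreach ($xpath->query("//article[@class='product']") as $node) {
    // A query starting with "." is evaluated relative to the context node.
    $title = $xpath->query('./h2', $node)->item(0);
    $link  = $node->getElementsByTagName('a')->item(0);

    $products[] = array(
        'title' => trim($title->textContent),
        'href'  => $link->getAttribute('href'),
    );
}

// One possible output format: a JSON string.
echo json_encode($products), "\n";
```

Collecting the rows into a plain array first keeps the extraction separate from the output step, so the same data could just as well be written to a file or a database.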

Now call your class, for example from a script that is executed by a cron job:

```
require __DIR__.'/../vendor/autoload.php';

$scraper = new Your\Package\Scraper\YourScraper('http://www.someurl.com/with/content/');
$scraper->execute();

```

For troublesome sources, you can specify the retry configuration (the default is 3 retries with a 3-second pause in between):

```
$scraper = new Your\Package\Scraper\YourScraper('http://www.someurl.com/with/content/', $retryAttempts, $pauseSeconds);
$scraper->execute();

```
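The general retry-with-pause idea can be sketched as below. This is not the SDK's implementation: `retryWithPause` is a hypothetical helper, and it assumes that "3 retries" means one initial attempt followed by up to three more, which the README does not spell out. The fake operation stands in for a cURL request.

```php
<?php
/**
 * Calls $operation until it succeeds or the retries run out.
 * Hypothetical helper for illustration; not part of the SDK.
 */
function retryWithPause($operation, $retryAttempts = 3, $pauseSeconds = 3)
{
    $lastError = null;
    // One initial attempt plus up to $retryAttempts retries (an assumption).
    for ($attempt = 0; $attempt <= $retryAttempts; $attempt++) {
        try {
            return call_user_func($operation);
        } catch (Exception $e) {
            $lastError = $e;
            // Pause only if another attempt will follow.
            if ($attempt < $retryAttempts) {
                sleep($pauseSeconds);
            }
        }
    }
    // Every attempt failed: surface the last error to the caller.
    throw $lastError;
}

// Usage: a fake fetch that fails twice before succeeding.
$calls = 0;
$result = retryWithPause(function () use (&$calls) {
    $calls++;
    if ($calls < 3) {
        throw new RuntimeException('temporary failure');
    }
    return 'page content';
}, 3, 0);

echo $result, ' after ', $calls, " calls\n";
```

A pause of 0 seconds is used here only so the example finishes immediately; a real source would usually need a non-zero pause.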

You can use the same instance to scrape several URLs with the same structure:

```
$pages = array(
    'http://www.someurl.com/section-one/',
    'http://www.someurl.com/section-two/page1',
    'http://www.someurl.com/section-one/page2'
);

$scraper = new Your\Package\Scraper\YourScraper();

foreach ($pages as $url) {
    $scraper->setSource($url);
    $scraper->execute();
}

```

Check out the examples folder for more details and fully working examples.


