PHPackages: boyhagemann/scrape


Active · Library

boyhagemann/scrape
==================

118 · PHP

Since Jun '16 · Pushed 11y ago · 1 watcher

[Source](https://github.com/boyhagemann/scrape) · [Packagist](https://packagist.org/packages/boyhagemann/scrape) · [RSS](/packages/boyhagemann-scrape/feed) · master · Synced 2d ago


Scrape
------


With this awesome Laravel 4 package you can:

- scrape any contents from a page
- follow specific link paths to get to that content
- use closures for optimum usability

### How to install

Use Composer to install the package with all its dependencies: `composer require boyhagemann/scrape`

Then add the ServiceProvider to the `providers` array in the application config:

```
'Boyhagemann\Scrape\ScrapeServiceProvider'
```

You can optionally add the alias to the `aliases` array in the same config file:

```
'Scraper' => 'Boyhagemann\Scrape\Facades\Container'
```

How does it work?
-----------------

Scrape uses two components for scraping pages:

##### Container

The container is the single class you will use in most cases. It registers a page name together with a closure that describes how to scrape that page.

##### Page

A page is a template that is shared by many URLs. A news item page, for example, can have many URLs, but only one page template. With Scrape you define how to deal with the content on that page.

### Add pages


The first thing you want to do is to add a page to the container.

```
Scraper::add('my-first-page', function($crawler) {

    // Start scraping...

});
```

If you don't use the facade, you can do something like this:

```
$container = App::make('Boyhagemann\Scrape\Container');
$container->add('my-second-page', function($crawler) {

    // Your magic scraping starts here...

});
```

Now start scraping!
-------------------

After you have defined all your pages, you are ready to scrape some content. It is as easy as this:

```
Scraper::scrape('my-first-page', 'http://theurl.toscrape.com');
```

### How to use the Crawler

Scrape uses the Symfony DomCrawler component to crawl the HTML from a URL. Check out its documentation for full details. To get autocompletion in your IDE, it is useful to type hint the `$crawler` variable:

```
use Symfony\Component\DomCrawler\Crawler;

Scraper::add('page-name', function(Crawler $crawler) {

    // You have autocompletion on the $crawler instance...

});
```

Crawling strategies
-------------------

Most of the time you don't know all the URLs to the desired content in advance. If there are thousands of URLs to crawl, managing them by hand is impossible. You can use Scrape to follow links until it reaches the desired content.

### Chain pages together

You can crawl multiple pages one after another with great ease:

```
// Add a page that has links to your content
Scraper::add('page-1', function($crawler) {

    $crawler->filter('.your-link')->each(function($node) {
        Scraper::scrape('page-2', $node->attr('href'));
    });
});

// Add the page with all the content
Scraper::add('page-2', function($crawler) {

    $crawler->filter('.your-content')->each(function($node) {

        // Get the content and do a little dance!

    });
});
```

### No more time outs!

Chained processes can consume a lot of time and resources, so don't go mental on chaining everything. You can use the Laravel Queue, or a database combined with cron jobs, to manage all the page crawls. This will save you from nasty request time outs!

```
Scraper::add('page-1', function($crawler) {

    $crawler->filter('.link')->each(function($node) {

        // Put the next crawl on a queue
        Queue::push(function($job) use ($node) {

            // Scrape this page!
            Scraper::scrape('page-2', $node->attr('href'));

            // Delete the queue job once finished
            $job->delete();
        });

    });

});
```

### Health Score

**21** (Low; better than 19% of packages)

- Maintenance: 20 (infrequent updates, may be unmaintained)
- Popularity: 8 (limited adoption so far)
- Community: 9 (small or concentrated contributor base)
- Maturity: 41 (maturing project, gaining track record)
- Bus Factor: 1 (top contributor holds 61.1% of commits, a single point of failure)

How is this calculated?

**Maintenance (25%)** — Last commit recency, latest release date, and issue-to-star ratio. Uses a 2-year decay window.

**Popularity (30%)** — Total and monthly downloads, GitHub stars, and forks. Logarithmic scaling prevents top-heavy scores.

**Community (15%)** — Contributors, dependents, forks, watchers, and maintainers. Measures real ecosystem engagement.

**Maturity (30%)** — Project age, version count, PHP version support, and release stability.
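The overall score is consistent with a plain weighted average of the four sub-scores using the percentages above. This is an assumption about the formula, not the site's documented implementation, but it reproduces the displayed 21:

```python
# Hypothetical reconstruction: overall health as a weighted average of the
# four sub-scores, using the weights stated above. Rounding is assumed.
weights = {"maintenance": 0.25, "popularity": 0.30, "community": 0.15, "maturity": 0.30}
scores = {"maintenance": 20, "popularity": 8, "community": 9, "maturity": 41}

health = sum(weights[k] * scores[k] for k in weights)
print(round(health))  # 5.0 + 2.4 + 1.35 + 12.3 = 21.05, which rounds to 21
```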

### Community

Maintainers

[boyhagemann](/maintainers/boyhagemann)

---

Top Contributors

[boyhagemann](https://github.com/boyhagemann) (11 commits) · [webble](https://github.com/webble) (7 commits)
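The 61.1% figure in the Bus Factor row can be checked against the commit counts listed here, assuming the two listed contributors are the full set:

```python
# Share of commits held by the top contributor (the "bus factor" share).
# Assumes the listed contributors (11 and 7 commits) are the full set.
commits = {"boyhagemann": 11, "webble": 7}
top_share = max(commits.values()) / sum(commits.values())
print(f"{top_share:.1%}")  # 61.1%
```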

### Embed Badge

![Health badge](/badges/boyhagemann-scrape/health.svg)

```
[![Health](https://phpackages.com/badges/boyhagemann-scrape/health.svg)](https://phpackages.com/packages/boyhagemann-scrape)
```

