ssola/crawly
============

Simple web crawler library

Version 1.0.0 · PHP · [Source](https://github.com/ssola/crawly) · [Packagist](https://packagist.org/packages/ssola/crawly)

Crawly [![Build Status](https://camo.githubusercontent.com/a86b11db3462dea40342f9dc0a84a2bffd40009adbed8e7963817220060e3462/68747470733a2f2f7363727574696e697a65722d63692e636f6d2f672f73736f6c612f637261776c792f6261646765732f6275696c642e706e673f623d6d6173746572)](https://scrutinizer-ci.com/g/ssola/crawly/build-status/master) [![Scrutinizer Code Quality](https://camo.githubusercontent.com/a991f31bfd9ef4b435a0cc59f8d1aea46c188a626437566a956acd22eeef6b1b/68747470733a2f2f7363727574696e697a65722d63692e636f6d2f672f73736f6c612f637261776c792f6261646765732f7175616c6974792d73636f72652e706e673f623d6d6173746572)](https://scrutinizer-ci.com/g/ssola/crawly/?branch=master)
======

Crawly is a simple web crawler that extracts and follows links according to the discovers you attach to it.

### Simple Example

```
require_once("vendor/autoload.php");

// Create a new Crawly object
$crawler = Crawly\Factory::generic();

// Discovers extract the links that the crawler should follow
$crawler->attachDiscover(
    new Crawly\Discovers\CssSelector('nav.pagination > ul > li > a')
);

// Once a page has been scraped and its links discovered, your own closures can handle the data
$crawler->attachExtractor(
    function($response) {
        // here we have the response, work with it!
    }
);

// set seed page
$crawler->setSeed("http://www.webpage.com/test/");

// start the crawler
$crawler->run();
```

Crawler object
--------------

You can create a simple crawler with the Crawler Factory. It generates a Crawly object that uses Guzzle as its HTTP client.

```
$crawler = Crawly\Factory::generic();
```

You can also create a customized crawler by specifying which HTTP client, URL queue, and visited-link collection to use.

```
$crawler = Crawly\Factory::create(new MyHttpClass(), new MyUrlQueue(), new MyVisitedCollection());
```

Discovers
---------

Discovers extract a set of links from the HTML and add them to the queue. You can attach as many discovers as you want, and you can also write your own discover classes.

At the moment Crawly includes only a CSS selector discover.

### Create your own discover

Create a new class that implements the **Discoverable** interface. The class should look like this example:

```
class MyOwnDiscover implements Discoverable
{
    private $configuration;

    public function __construct($configuration)
    {
        $this->configuration = $configuration;
    }

    public function find(Crawly &$crawler, $response)
    {
        // $response holds the content of the crawled URL.
        // Placeholder: parse $response here and collect the link nodes.
        $links = [];

        foreach ($links as $node) {
            $uri = new Uri($node->getAttribute('href'), $crawler->getHost());

            // If the URL has not been visited yet, add the new link to the URL queue.
            if(!$crawler->getVisitedUrl()->seen($uri->toString())) {
                $crawler->getUrlQueue()->push($uri);
            }
        }
    }
}
```
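The link-collection step left open in the example above could be sketched with PHP's built-in DOM extension. The helper below is not part of Crawly: `extractLinks()` is a hypothetical name, it takes a plain HTML string rather than a Crawly response object, and its URL resolution is deliberately naive (absolute and root-relative links only).

```php
<?php
// Sketch: collect link URLs from an HTML string using the DOM extension.
// extractLinks() is illustrative and not part of Crawly.

function extractLinks($html, $baseUrl)
{
    if (trim($html) === '') {
        return [];
    }

    $document = new DOMDocument();
    // Collect libxml warnings internally; real-world markup is rarely well formed.
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($document->getElementsByTagName('a') as $node) {
        $href = trim($node->getAttribute('href'));
        if ($href === '' || $href[0] === '#') {
            continue; // skip empty hrefs and in-page anchors
        }
        if (preg_match('#^https?://#i', $href)) {
            $links[] = $href; // already absolute
        } elseif ($href[0] === '/') {
            $links[] = rtrim($baseUrl, '/') . $href; // root-relative path
        }
        // Other relative forms are ignored in this sketch.
    }

    // Drop duplicates while keeping first-seen order.
    return array_values(array_unique($links));
}
```

Inside a discover, each returned URL would then be checked against the visited collection and pushed to the queue, as in the `find()` method above.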

Limiters
--------

Limiters restrict what the crawler does. For instance, you can limit how many links are crawled or the maximum amount of bandwidth to use.
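This README does not show the limiter API, so the following is only a sketch of the idea behind a link-count limiter. `MaxLinksLimiter`, `record()` and `allowsMore()` are hypothetical names, not Crawly's own.

```php
<?php
// Hypothetical limiter that caps the number of crawled links.
// Class and method names are illustrative. They are not Crawly's API.

final class MaxLinksLimiter
{
    private $max;
    private $count = 0;

    public function __construct($max)
    {
        $this->max = (int) $max;
    }

    // Call once for every link that has been crawled.
    public function record()
    {
        $this->count++;
    }

    // True while the crawler is still under its link budget.
    public function allowsMore()
    {
        return $this->count < $this->max;
    }
}
```

A crawl loop would check `allowsMore()` before each fetch and call `record()` afterwards. A bandwidth limiter could follow the same shape, counting response bytes instead of links.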

Extractors
----------

