ddliu/spider
============

Lightweight spider for the web.

v0.2.9 · MIT · PHP

[Source](https://github.com/ddliu/spider) · [Packagist](https://packagist.org/packages/ddliu/spider)

Spider [![Build Status](https://camo.githubusercontent.com/0e15a96ad43bae61c209d897eba117d293eae08186ba782e0702ac1702056290/68747470733a2f2f7472617669732d63692e6f72672f64646c69752f7370696465722e737667)](https://travis-ci.org/ddliu/spider)
==============================================================================================================================================================================================================================================

A flexible spider in PHP.

Concepts
--------

A spider contains a chain of processors called `pipes`. You can pass as many tasks as you like to the spider; each task goes through these `pipes` in order and gets processed.
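The idea can be sketched in plain PHP without the library. This is only an illustration of tasks flowing through pipes: `runPipeline` and the `host` and `label` keys are hypothetical names, and the pipes here return a new array, whereas the library's pipes receive `($spider, $task)`.

```php
<?php
// Hypothetical helper, not part of ddliu/spider:
// run every task through every pipe, in order.
function runPipeline(array $pipes, array $tasks): array
{
    $results = [];
    foreach ($tasks as $task) {
        foreach ($pipes as $pipe) {
            // each pipe receives the task as left by the previous pipe
            $task = $pipe($task);
        }
        $results[] = $task;
    }
    return $results;
}

$pipes = [
    // first pipe: extract the host name from the URL
    fn(array $task): array => $task + ['host' => parse_url($task['url'], PHP_URL_HOST)],
    // second pipe: depends on the key added by the first pipe
    fn(array $task): array => $task + ['label' => strtoupper($task['host'])],
];

$results = runPipeline($pipes, [
    ['url' => 'http://example.com/a'],
    ['url' => 'http://example.org/b'],
]);
```

The order of the pipes matters: the second pipe only works because the first one has already added `host` to the task.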

Installation
------------

```
composer require ddliu/spider
```

Requirements
------------

- PHP 5.3+
- curl (required by `RequestPipe`)

Dependencies
------------

See `composer.json`.

Usage
-----

```
use ddliu\spider\Spider;
use ddliu\spider\Pipe\NormalizeUrlPipe;
use ddliu\spider\Pipe\RequestPipe;
use ddliu\spider\Pipe\DomCrawlerPipe;

(new Spider())
    // normalize $task['url']
    ->pipe(new NormalizeUrlPipe())
    // fetch the page into $task['content']
    ->pipe(new RequestPipe())
    // build a DomCrawler, available as $task['$dom']
    ->pipe(new DomCrawlerPipe())
    // follow every link on the page by forking a sub task
    ->pipe(function($spider, $task) {
        $task['$dom']->filter('a')->each(function($a) use ($task) {
            $href = $a->attr('href');
            $task->fork($href);
        });
    })
    // the entry task
    ->addTask('http://example.com')
    ->run()
    ->report();
```

Find more examples in the `examples` folder.

Spider
------

The `Spider` class.

### Options

- `limit`: maximum number of tasks to run

### Methods

- `pipe($pipe)`: add a pipe
- `addTask($task)`: add a task
- `run()`: run the spider
- `report()`: write report to log

Task
----

A task contains the data array and some helper functions.

The `Task` class implements the `ArrayAccess` interface, so you can access its data like an array.
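To show what array-style access on an object means, here is a minimal sketch of a class implementing `ArrayAccess`. `DemoTask` is a hypothetical stand-in, not the library's `Task` class, and the method signatures assume PHP 8.

```php
<?php
// Hypothetical stand-in for a task object; not the library's Task class.
final class DemoTask implements ArrayAccess
{
    /** @var array<string, mixed> the task data */
    private array $data;

    public function __construct(array $data = [])
    {
        $this->data = $data;
    }

    // called by isset($task['key'])
    public function offsetExists(mixed $offset): bool
    {
        return isset($this->data[$offset]);
    }

    // called by $task['key']
    public function offsetGet(mixed $offset): mixed
    {
        return $this->data[$offset] ?? null;
    }

    // called by $task['key'] = $value and $task[] = $value
    public function offsetSet(mixed $offset, mixed $value): void
    {
        if ($offset === null) {
            $this->data[] = $value;
        } else {
            $this->data[$offset] = $value;
        }
    }

    // called by unset($task['key'])
    public function offsetUnset(mixed $offset): void
    {
        unset($this->data[$offset]);
    }
}

$task = new DemoTask(['url' => 'http://example.com']);
$task['content'] = '<html></html>'; // write like an array
$url = $task['url'];                // read like an array
```

Because the task is an object, a pipe that writes `$task['content']` changes the same task that later pipes will see.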

### Methods

- `fork($task)`: add a sub task to the spider
- `ignore()`: ignore the task

Pipes
-----

Pipes define how each task is processed.

A pipe can be a function:

```
function($spider, $task) {}
```

Or a class that extends `BasePipe`:

```
use ddliu\spider\Pipe\BasePipe;

class MyPipe extends BasePipe {
    public function run($spider, $task) {
        // process the task...
    }
}
```
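As a small sketch of the function form, the pipe below reads `$task['content']` and stores the page title back on the task. The `title` key is an illustrative choice rather than a library convention, and `ArrayObject` stands in for a real task so that the snippet runs on its own.

```php
<?php
// A pipe in the documented function form: function($spider, $task).
// It extracts the <title> text from $task['content'] into $task['title'].
$titlePipe = function ($spider, $task) {
    if (preg_match('/<title>(.*?)<\/title>/is', (string) $task['content'], $m)) {
        $task['title'] = trim($m[1]);
    }
};

// ArrayObject is only a stand-in here: like a task, it supports array-style access.
$task = new ArrayObject([
    'url' => 'http://example.com',
    'content' => '<html><head><title> Example Domain </title></head></html>',
]);

// The spider argument is unused by this pipe, so null is passed for the demo.
$titlePipe(null, $task);
```

In a real spider the same closure would be registered with `pipe()` after a pipe that fills `$task['content']`.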

Useful Pipes
------------

### NormalizeUrlPipe

Normalize `$task['url']`.

```
new NormalizeUrlPipe()
```

### RequestPipe

Start an HTTP request with `$task['url']` and save the result in `$task['content']`.

```
new RequestPipe(array(
    'useragent' => 'myspider',
    'timeout' => 10
));
```

### FileCachePipe

Cache the result of another pipe (e.g. `RequestPipe`).

```
$requestPipe = new RequestPipe();
$cacheForReqPipe = new FileCachePipe($requestPipe, [
    'input' => 'url',
    'output' => 'content',
    'root' => '/path/to/cache/root',
]);
```
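The caching idea can be sketched as a wrapper around another pipe. This is an assumption about the general technique, not the library's implementation: `withCache` is a hypothetical helper, it keeps results in memory instead of on disk, and it assumes `input` names the task key used as the cache key while `output` names the key whose value is cached.

```php
<?php
// Hypothetical sketch of pipe caching; not the library's FileCachePipe.
function withCache(callable $pipe, string $input, string $output): callable
{
    $cache = []; // in-memory store; a file cache would write to disk instead
    return function ($spider, $task) use ($pipe, $input, $output, &$cache) {
        $key = md5((string) $task[$input]);
        if (array_key_exists($key, $cache)) {
            $task[$output] = $cache[$key]; // cache hit: skip the wrapped pipe
            return;
        }
        $pipe($spider, $task);          // cache miss: run the wrapped pipe
        $cache[$key] = $task[$output];  // remember its output for next time
    };
}

// A fake request pipe that counts how often it really runs.
$requests = 0;
$fakeRequest = function ($spider, $task) use (&$requests) {
    $requests++;
    $task['content'] = 'body of ' . $task['url'];
};

$cached = withCache($fakeRequest, 'url', 'content');

// ArrayObject stands in for a task; both tasks share the same URL.
$first = new ArrayObject(['url' => 'http://example.com']);
$second = new ArrayObject(['url' => 'http://example.com']);
$cached(null, $first);
$cached(null, $second);
```

The second task gets its content from the cache, so the wrapped pipe runs only once.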

### RetryPipe

Retry the wrapped pipe on failure.

```
$requestPipe = new RequestPipe();
$retryForReqPipe = new RetryPipe($requestPipe, [
    'count' => 10,
]);
```
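The retry idea can be sketched the same way. `withRetry` is a hypothetical helper, not the library's `RetryPipe`; the sketch assumes that a failure is signalled by an exception and that `count` is the total number of attempts, neither of which is confirmed above.

```php
<?php
// Hypothetical sketch of retrying a pipe; not the library's RetryPipe.
// Calls the wrapped pipe again when it throws, up to $count attempts in total.
function withRetry(callable $pipe, int $count): callable
{
    if ($count < 1) {
        throw new InvalidArgumentException('count must be at least 1');
    }
    return function ($spider, $task) use ($pipe, $count) {
        $lastError = null;
        for ($attempt = 1; $attempt <= $count; $attempt++) {
            try {
                return $pipe($spider, $task);
            } catch (Exception $e) {
                $lastError = $e; // remember the failure and try again
            }
        }
        throw $lastError; // every attempt failed
    };
}

// A flaky pipe that fails twice before succeeding.
$calls = 0;
$flaky = function ($spider, $task) use (&$calls) {
    $calls++;
    if ($calls < 3) {
        throw new RuntimeException('temporary failure');
    }
    $task['content'] = 'ok';
};

// ArrayObject stands in for a task.
$task = new ArrayObject(['url' => 'http://example.com']);
withRetry($flaky, 10)(null, $task);
```

The wrapper stops as soon as one attempt succeeds, so the remaining attempts are never used.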

### DomCrawlerPipe

Create a [DomCrawler](https://github.com/symfony/DomCrawler) from `$task['content']`. Access it with `$task['$dom']` in subsequent pipes.

### ReportPipe

Write a report at a fixed interval, every 10 minutes in this example.

```
new ReportPipe(array(
    'seconds' => 600
))
```

Logging
-------

`$spider->logger` is an instance of `Monolog\Logger`. You can add logging handlers to it before starting the spider:

```
use Monolog\Logger;
use Monolog\Handler\StreamHandler;

// log warnings and above to a file
$spider->logger->pushHandler(new StreamHandler('path/to/your.log', Logger::WARNING));
```

TODO/Ideas
----------

- Real-world examples.
- Running tasks concurrently (with pthreads?).

Alternative
-----------

For better performance, use the [Go version](http://github.com/ddliu/go-spider).
