

Active | Library | [Utility & Helpers](/categories/utility)

fievel/webspider
================

webspider

0.1.0 (9y ago) | 014.1k ↓25% | MIT | PHP | PHP >=5.5

Since Jul 21 | Pushed 8y ago | 2 watchers

[Source](https://github.com/Fievel90/WebSpider) | [Packagist](https://packagist.org/packages/fievel/webspider)


WebSpider
=========


This library wraps Guzzle and several Symfony components to provide an easy way to spider websites.

Requirements
------------


- PHP >= 5.5
- Guzzle >= 6.0
- Doctrine ORM >= 2.2
- Symfony Components >= 2.7

Installation
------------


Add `fievel/webspider` as a required dependency of your project. Running the following command updates your `composer.json` file for you:

```
composer require fievel/webspider

```

Usage
-----


Extend the `WebSpiderAbstract` class and implement these methods as needed:

`getDataFromResponse`: extracts the data from the response. The default behaviour treats the body as plain text.

```
protected function getDataFromResponse(ResponseInterface $response)
{
    return (string) $response->getBody();
}

```

`parseData`: extracts the information you need from the data. You can load the data into the Symfony `DomCrawler` if needed.

```
protected function parseData($data)
{
    $this->crawler->addHtmlContent($data);

    $node = $this->crawler->filter('input');

    $value = null;
    if ($node->count() > 0) {
        $value = $node->first()->attr('value');
    }

    return $value;
}

```

`handleException`: handles Guzzle exceptions.

```
protected function handleException(\Exception $e)
{
    return null;
}

```
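To try out the extraction logic from the `parseData` example without running a spider, the same idea can be sketched with PHP's built-in DOM extension. This is a standalone illustration, not part of WebSpider: `extractFirstInputValue` is a hypothetical helper, and it swaps Symfony `DomCrawler` for `DOMDocument` and `DOMXPath`, assuming the `dom` extension is enabled.

```php
<?php
// Hypothetical helper (not part of WebSpider): returns the "value" attribute
// of the first <input> element in an HTML string, or null if there is none.
function extractFirstInputValue($html)
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }

    $document = new DOMDocument();

    // Collect libxml warnings internally so imperfect markup does not emit notices.
    $previous = libxml_use_internal_errors(true);
    $loaded = $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    // Equivalent of the 'input' CSS selector used in the parseData() example.
    $xpath = new DOMXPath($document);
    $nodes = $xpath->query('//input');

    if ($nodes->length === 0) {
        return null;
    }

    $input = $nodes->item(0);

    return $input->hasAttribute('value') ? $input->getAttribute('value') : null;
}
```

Calling `extractFirstInputValue('<form><input name="a" value="42"></form>')` returns the string `'42'`, and markup without an `input` element yields `null`, matching the behaviour of the `parseData` example.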

All that remains is to launch the spider you created, which you can do with the `SpiderManager` service.

```
$manager = $this->container->get('fievel_web_spider.manager.spider');
$manager->setLogger($this->logger);

$response = null;
try {
    $response = $manager->runSpider([
        AppBundle\Spiders\CustomSpider::class,  // Spider class you created
        'http://localhost/test-spider',         // URL to spider
        'post',                                 // HTTP method supported by Guzzle
        ['cookies' => true],                    // Custom config for the Guzzle client
        [                                       // Custom request options for the Guzzle client
            RequestOptions::FORM_PARAMS => [    // Constant from GuzzleHttp\RequestOptions
                'full_name' => 'John Doe'
            ]
        ]
    ]);
} catch (\Exception $e) {
    // Handle or log the failure here; $response stays null.
}

```

Features
--------


A storage object can be shared between subsequent spider calls.

```
$storage = new SpiderStorage();
$storage->add($sharedData);

$response = $manager->runSpider([
    AppBundle\Spiders\CustomSpider::class,  // Spider class you created
    'http://localhost/test-spider',         // URL to spider
    'post',                                 // HTTP method supported by Guzzle
    ['cookies' => true],                    // Custom config for the Guzzle client
    [                                       // Custom request options for the Guzzle client
        RequestOptions::FORM_PARAMS => [
            'full_name' => 'John Doe'
        ]
    ],
    $storage                                // Shared storage
]);

```

It's even possible to create queues and leave the entire execution to the manager.

```
$queue = new SpiderCallQueue();

$queue->enqueue(
    AppBundle\Spiders\FirstPageSpider::class,
    'http://localhost/test-spider',
    'post',
    ['cookies' => true],
    [
        RequestOptions::FORM_PARAMS => [
            'full_name' => 'John Doe'
        ]
    ]
);
$queue->enqueue(
    AppBundle\Spiders\SecondPageSpider::class,
    'http://localhost/test-spider',
    'get',
    ['cookies' => true],
    []
);

$response = $manager->runSpiderQueue($queue);

```

Last but not least, the `SpiderManager` will handle retries on failure using a custom `GuzzleMiddleware`.
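The README does not show how the retry middleware is configured, so the following is only a sketch of how retries are commonly wired into Guzzle, not WebSpider's actual implementation. Guzzle's `Middleware::retry()` takes a decider callable and an optional delay callable. The helper names `shouldRetry` and `retryDelayMs`, the limit of three retries, and the backoff values are all assumptions made for illustration.

```php
<?php
// Hypothetical helpers: names and retry policy are assumptions, not WebSpider code.

// Decide whether a failed attempt should be retried.
function shouldRetry($retries, $statusCode = null, $connectionFailed = false, $maxRetries = 3)
{
    if ($retries >= $maxRetries) {
        return false;
    }

    // Retry on connection failures and on 5xx server errors only.
    return $connectionFailed || ($statusCode !== null && $statusCode >= 500);
}

// Exponential backoff in milliseconds: 1000, 2000, 4000, ...
function retryDelayMs($retries)
{
    return 1000 * (int) pow(2, max(0, $retries - 1));
}

// Wire the helpers into a Guzzle client only when Guzzle is installed.
if (class_exists('GuzzleHttp\Middleware')) {
    $stack = \GuzzleHttp\HandlerStack::create();
    $stack->push(\GuzzleHttp\Middleware::retry(
        function ($retries, $request, $response = null, $exception = null) {
            $status = $response ? $response->getStatusCode() : null;
            $failed = $exception instanceof \GuzzleHttp\Exception\ConnectException;

            return shouldRetry($retries, $status, $failed);
        },
        'retryDelayMs'
    ));

    $client = new \GuzzleHttp\Client(['handler' => $stack]);
}
```

Keeping the decision and the delay in plain functions makes the retry policy testable without sending any HTTP request. Client errors such as 404 are not retried in this sketch, since repeating the request is unlikely to change the result.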

Proxy
-----


Links
-----


- [Guzzle Documentation](http://docs.guzzlephp.org/en/latest/overview.html)
- [Symfony DomCrawler Documentation](http://symfony.com/doc/current/components/dom_crawler.html)

###  Health Score

28 (Low). Better than 54% of packages.

- Maintenance: 20. Infrequent updates; may be unmaintained.
- Popularity: 24. Limited adoption so far.
- Community: 8. Small or concentrated contributor base.
- Maturity: 48. Maturing project, gaining track record.
- Bus Factor: 1. Top contributor holds 100% of commits; single point of failure.

How is this calculated?

- **Maintenance (25%)**: Last commit recency, latest release date, and issue-to-star ratio. Uses a 2-year decay window.
- **Popularity (30%)**: Total and monthly downloads, GitHub stars, and forks. Logarithmic scaling prevents top-heavy scores.
- **Community (15%)**: Contributors, dependents, forks, watchers, and maintainers. Measures real ecosystem engagement.
- **Maturity (30%)**: Project age, version count, PHP version support, and release stability.

###  Release Activity

- Cadence: Unknown
- Total: 1
- Last Release: 3588d ago

### Community

Maintainers

![](https://www.gravatar.com/avatar/2f09a62d613118ec59c0eedb1b7f6a54649ab7d8ec3cc22946426a3634cd0fb5?d=identicon)[Fievel90](/maintainers/Fievel90)

---

Top Contributors

[![Fievel90](https://avatars.githubusercontent.com/u/11059118?v=4)](https://github.com/Fievel90 "Fievel90 (5 commits)")

---

Tags

proxy, spider



