
Abandoned · Symfony bundle · HTTP & Networking

simgroep/concurrent-spider-bundle
=================================

Symfony bundle for running a distributed web page crawler

Version 1.1.0 (8y ago) · MIT license · PHP >=5.4.0


[Source](https://github.com/simgroep/concurrent-spider-bundle) · [Packagist](https://packagist.org/packages/simgroep/concurrent-spider-bundle)


Concurrent Spider Bundle
========================


[![Build Status](https://camo.githubusercontent.com/a1b2c3a230d9111d0432343fed023a8cdb14d5b9996c8247fe2dbf9d5f47c658/68747470733a2f2f7472617669732d63692e6f72672f73696d67726f65702f636f6e63757272656e742d7370696465722d62756e646c652e7376673f6272616e63683d6d6173746572)](http://travis-ci.org/simgroep/concurrent-spider-bundle)[![Coverage Status](https://camo.githubusercontent.com/61a00f939ca27bc6417872cd4ae534b2ecf5ecb2ef05e47298046736e73996c9/68747470733a2f2f636f766572616c6c732e696f2f7265706f732f73696d67726f65702f636f6e63757272656e742d7370696465722d62756e646c652f62616467652e7376673f6272616e63683d6d6173746572)](https://coveralls.io/r/simgroep/concurrent-spider-bundle?branch=master)

This bundle provides a set of commands to run a distributed web page crawler. Crawled web pages are saved to Solr.

### Installation


Install it with Composer:

```
composer require simgroep/concurrent-spider-bundle dev-master

```

Then register the bundle in your `AppKernel.php`:

```
new Simgroep\ConcurrentSpiderBundle\SimgroepConcurrentSpiderBundle(),

```

An external PDF tool must also be installed; only `pdftotext` really needs to be functional from the command line:

```
/path_to_command/pdftotext pdffile.pdf

```

### Configuration


Minimal configuration is necessary. The crawler needs to know the mapping you're using in Solr so it can save documents. The only mandatory part of the config is "mapping". Other values are optional:

```
simgroep_concurrent_spider:
    http_user_agent: "PHP Concurrent Spider"

    rabbitmq.host: localhost
    rabbitmq.port: 5672
    rabbitmq.user: guest
    rabbitmq.password: guest

    queue.discoveredurls_queue: discovered_urls
    queue.indexer_queue: indexer

    solr.host: localhost
    solr.port: 8080
    solr.path: /solr

    mapping:
        id: #required
        title: #required
        content: #required
        url: #required
        tstamp: ~
        date: ~
        publishedDate: ~

```
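
The README does not say what the values under `mapping` should contain. As a sketch only, assuming each value is the name of the field in your Solr schema that receives the corresponding crawled value, a filled-in mapping might look like the fragment below. The field names on the right-hand side are hypothetical and must match your own schema.

```yaml
# Illustrative only: the right-hand values are hypothetical Solr field names.
# Assumption: each key maps a crawled value to the Solr field of that name.
simgroep_concurrent_spider:
    mapping:
        id: id            # required
        title: title      # required
        content: content  # required
        url: url          # required
        tstamp: ~         # optional, left unset here
        date: ~
        publishedDate: ~
```

All other keys are omitted here because the README describes them as optional.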

### How does it work?


You start the crawler with:

```
app/console simgroep:start-crawler https://github.com

```

This adds one job to the queue to crawl the given URL. Then run the following process in the background to start crawling:

```
app/console simgroep:crawl

```

It's recommended to use a process manager to keep the crawler running in the background; we recommend Supervisord. You can run as many threads as you like (and as your machine can handle), but be careful not to flood the website: every thread acts as a visitor on the website you're crawling.
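
As a rough sketch of how Supervisord could keep several crawl workers alive, a program definition along the following lines may work. The project path, the number of processes, the user and the log location are all assumptions; adjust them to your environment.

```ini
; Hypothetical Supervisor program definition for the crawl workers.
; Paths, process count and user are assumptions, not values from the bundle.
[program:simgroep_crawl]
command=php app/console simgroep:crawl
directory=/var/www/my-project
; Each process acts as one visitor on the crawled website.
numprocs=4
; process_name must include process_num when numprocs is greater than 1.
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true
user=www-data
redirect_stderr=true
stdout_logfile=/var/log/supervisor/%(program_name)s_%(process_num)02d.log
```

After placing the file in Supervisor's configuration directory, `supervisorctl reread` followed by `supervisorctl update` should pick it up. Keeping `numprocs` low is the simplest way to avoid flooding the target website.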

### Architecture


This bundle uses RabbitMQ to maintain a queue of URLs that should be indexed, and Solr to store the crawled web pages.



