mihaeu/tarantula
================

Another PHP crawler based on Guzzle.

v1.3.0 (released 11 years ago) · MIT license · PHP >=5.3.3 · [1 open issue](https://github.com/mihaeu/tarantula/issues) · [Source](https://github.com/mihaeu/tarantula) · [Packagist](https://packagist.org/packages/mihaeu/tarantula)


[![Logo](https://camo.githubusercontent.com/ae5d4bebfbe707a5b3c501319cdd68ba9161cb0fb2928e586d6ec277d96c5010/68747470733a2f2f63646e312e69636f6e66696e6465722e636f6d2f646174612f69636f6e732f6465736b746f702d68616c6c6f7765656e2f34382f5370696465722e706e67)](https://github.com/mihaeu/tarantula) Tarantula
==========================================================================================================================================================================================================================================================================================================


[![Build Status](https://camo.githubusercontent.com/afce7864324268d28c848be084ddd611e2fb6779f60d59fd2cdd4e588094cb94/68747470733a2f2f7472617669732d63692e6f72672f6d69686165752f746172616e74756c612e7376673f6272616e63683d646576656c6f70)](https://travis-ci.org/mihaeu/tarantula)[![Coverage Status](https://camo.githubusercontent.com/0d6800f16a6d05b0ab5a6b436246ac5e69a0b5014328dbb33270f1042345aed8/68747470733a2f2f636f766572616c6c732e696f2f7265706f732f6d69686165752f746172616e74756c612f62616467652e706e67)](https://coveralls.io/r/mihaeu/tarantula)[![SensioLabsInsight](https://camo.githubusercontent.com/55048fbee3406d78726d9a2551b4d8dcbf19b477a830a802951100ee689e6a7d/68747470733a2f2f696e73696768742e73656e73696f6c6162732e636f6d2f70726f6a656374732f34666262303664322d343436352d346564392d393931372d6339363236646465643830312f6d696e692e706e67)](https://insight.sensiolabs.com/projects/4fbb06d2-4465-4ed9-9917-c9626dded801)

Tarantula is a web crawler written in PHP. It utilizes the amazing work of the people behind Guzzle and Symfony's DomCrawler.

Installation
------------


### Global tool


Make sure Composer's global bin directory (`~/.composer/vendor/bin` by default) is in your `$PATH`, then run:

```
composer global require mihaeu/tarantula:1.*
```
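The `$PATH` step can be scripted. The snippet below is a minimal sketch, assuming Composer's default home directory `~/.composer`, which can be overridden through the `COMPOSER_HOME` environment variable (some newer Composer setups on Linux use `~/.config/composer` instead). Globally required packages expose their executables in the `vendor/bin` subdirectory of that home directory; `COMPOSER_BIN` is simply a variable name chosen for this example.

```shell
# Prepend Composer's global bin directory to PATH unless it is already there.
# Assumption: COMPOSER_HOME defaults to ~/.composer; adjust if your setup
# uses ~/.config/composer instead.
COMPOSER_BIN="${COMPOSER_HOME:-$HOME/.composer}/vendor/bin"
case ":$PATH:" in
  *":$COMPOSER_BIN:"*) ;;                  # already on PATH, nothing to do
  *) export PATH="$COMPOSER_BIN:$PATH" ;;  # prepend it to PATH
esac
```

Running the snippet twice does not add the directory twice. To make the change permanent, append it to your shell's startup file, e.g. `~/.bashrc` or `~/.profile`.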

### Library


Assuming you are using [Composer](http://getcomposer.org), add the following to your `composer.json` file:

```
{
    "require": {
        "mihaeu/tarantula": "1.*"
    }
}
```

Alternatively, run `composer require mihaeu/tarantula:1.*` from the command line.

Usage
-----


### Global tool


Right now the only available command is `crawl`. Some usage examples:

```
# most basic use case
tarantula crawl "http://google.com"

# go deeper
tarantula crawl "http://products.com/categories" --depth=4

# mirror
tarantula crawl "http://myblog.com" --mirror=/tmp/blog-backup

# filters
tarantula crawl "http://myblog.com" --contains=yolo
tarantula crawl "http://myblog.com" --regex="(post)\|(\d+)"

# save crawled pages as hashed files, minifying the HTML
tarantula crawl "http://myblog.com" --save-hashed=/tmp/blog-backup --minify-html

# HTTP basic auth
tarantula crawl "http://secure.com" --user=admin --password=admin

# search for "Avatar" on IMDb
tarantula crawl "http://www.imdb.com/find?q=avatar&s=all" --depth=0 --quiet --css=".findSection td.result_text"

# today's weather in Seattle
tarantula crawl --depth=0 "http://www.weather.com/weather/today/Seattle+WA+USWA0395:1:US" --css=".wx-first" | head -n 2
```

For all arguments and options use the `help` command:

```
tarantula help                    # displays all available commands
tarantula help crawl              # all arguments and options for the crawler
tarantula crawl "..." --verbose   # switch on debugging output
```
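Depending on how Tarantula was installed, the executable lives in a different place: in `vendor/bin/` after a project-local `composer require` (assuming the package declares `bin/tarantula` as a Composer binary, which the global install relies on), in `bin/` when running from a checkout of the repository, and on `$PATH` after a global install. The hypothetical `find_tarantula` function below is a small sketch that prints the first one it finds.

```shell
# Hypothetical helper: print the path of the Tarantula executable to use.
# Lookup order: project-local Composer install, repository checkout, global $PATH.
find_tarantula() {
  if [ -x vendor/bin/tarantula ]; then
    echo vendor/bin/tarantula
  elif [ -x bin/tarantula ]; then
    echo bin/tarantula
  elif command -v tarantula >/dev/null 2>&1; then
    command -v tarantula
  else
    echo "tarantula: executable not found" >&2
    return 1
  fi
}

# Example, using only options documented above:
# tarantula_bin=$(find_tarantula) && "$tarantula_bin" crawl "http://myblog.com" --depth=2
```

The lookup order is a design choice: the project-local copy wins so that a project pinned to a specific version keeps using it. Move the `command -v` branch to the top if you would rather prefer the global install.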

### Library


Have a look at the tests to see what's possible or just try the following in your code:

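```
<?php
// Load Composer's autoloader so the Tarantula classes are available.
require __DIR__ . '/vendor/autoload.php';

use Mihaeu\Tarantula\Crawler;
use Mihaeu\Tarantula\HttpClient;

// Create a crawler for google.com and start crawling.
$crawler = new Crawler(new HttpClient('http://google.com'));
$links = $crawler->go(1);
```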

All HTTP requests go through Guzzle, so any configuration you would pass to Guzzle's request object can also be passed to Tarantula's `HttpClient`.

Tests
-----


Test coverage is not at 100%: this started as an afternoon project, and testing a crawler takes a lot of time because of the test setup it requires.

If you want to get a quick overview of the project, I recommend running the test suite with the `--testdox` flag:

```
vendor/bin/phpunit --testdox
```

To Do
-----


- filters (url, filetype, etc.)
- allow for Guzzle to be configured via command line
- more actions (save plain result, crawl via DOM/XPath, ...)

Troubleshooting
---------------


### Composer global install fails


This is most likely caused by a conflict with the requirements of other globally installed packages. Unfortunately, Composer's architecture doesn't offer a solution for this yet, so I tried to keep Tarantula's requirements loose to avoid the problem.

If you want Tarantula available system-wide, install it into a separate directory instead (e.g. using `composer create-project`) and symlink `bin/tarantula` into a directory on your `$PATH`.
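The create-project-and-symlink workaround might look like the sketch below. `link_into_path` is a hypothetical helper written for this example, and the locations `~/tools/tarantula` and `~/bin` are placeholders; use any directory that is already on your `$PATH`.

```shell
# Hypothetical helper: symlink an executable into a directory on $PATH.
# The source path is made absolute so the link works from any directory.
link_into_path() {
  src_dir=$(cd "$(dirname "$1")" && pwd) || return 1
  src="$src_dir/$(basename "$1")"
  dest_dir="$2"
  mkdir -p "$dest_dir" || return 1
  ln -sf "$src" "$dest_dir/$(basename "$src")"
}

# Example with placeholder locations:
# composer create-project mihaeu/tarantula "$HOME/tools/tarantula" "1.*"
# link_into_path "$HOME/tools/tarantula/bin/tarantula" "$HOME/bin"
```

Because the install lives in its own directory, its dependencies never have to be resolved together with your other global packages.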

Thanks to
---------


- [Symfony](http://symfony.com/)/[SensioLabs](http://sensiolabs.com/en) and especially [Fabien Potencier](http://fabien.potencier.org/) for what he does for PHP (for this particular project the [DomCrawler](https://github.com/symfony/DomCrawler))
- the [Guzzle](http://guzzlephp.org/) team for their awesome HTTP client
- [Aha Soft](http://www.aha-soft.com/) for the logo
- the [Composer](https://getcomposer.org/) team for revolutionizing the way I and many others write PHP
- [GitHub](https://github.com) for redefining collaboration
- [Travis CI](https://travis-ci.org/) for improving the quality and compatibility of thousands of open source projects
- [Sebastian Bergmann](http://sebastian-bergmann.de/) for [PHPUnit](http://phpunit.de) and many other awesome QA tools

License
-------


MIT, see `LICENSE` file.
