


octopoda/octopus
================

PHP Sitemap crawler

0.11.1 (1y ago) · [3 issues](https://github.com/dpovshed/octopus/issues) · MIT · PHP ~8.3 || ~8.4 · CI failing

Since Jul 26 · Pushed 9mo ago · 2 watchers

[Source](https://github.com/dpovshed/octopus) · [Packagist](https://packagist.org/packages/octopoda/octopus)

Octopus Sitemap Crawler
=======================


A small PHP tool that crawls the URLs listed in a sitemap, using the [ReactPHP](https://github.com/reactphp/react) library to load the URLs asynchronously. Both plain text files and [XML Sitemaps](https://www.sitemaps.org/protocol.html) are supported.
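To make the two input formats concrete, here is a minimal sketch of how a list of URLs could be extracted from each. This is not Octopus's own parser: the function names `urlsFromXmlSitemap` and `urlsFromPlainText` are hypothetical, and the plain text layout (one URL per line) is an assumption, since the README does not spell it out. The XML handling follows the `urlset` / `url` / `loc` structure of the sitemaps.org protocol.

```php
<?php
declare(strict_types=1);

// Namespace URI defined by the sitemaps.org protocol.
const SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9';

/**
 * Hypothetical helper: collect the text of every <loc> element in an XML sitemap.
 *
 * @return list<string>
 */
function urlsFromXmlSitemap(string $xml): array
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $loaded = $dom->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        throw new InvalidArgumentException('The sitemap is not well-formed XML.');
    }

    $urls = [];
    foreach ($dom->getElementsByTagNameNS(SITEMAP_NS, 'loc') as $loc) {
        $urls[] = trim($loc->textContent);
    }

    return $urls;
}

/**
 * Hypothetical helper: assumes one URL per line and skips blank lines.
 *
 * @return list<string>
 */
function urlsFromPlainText(string $text): array
{
    $lines = preg_split('/\R/', $text) ?: [];
    $lines = array_map('trim', $lines);

    return array_values(array_filter($lines, static fn (string $line): bool => $line !== ''));
}

// Sample inputs using the placeholder domain from this README.
$xml = <<<'XML'
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.domain.ext/</loc></url>
  <url><loc>http://www.domain.ext/about</loc></url>
</urlset>
XML;

$text = "http://www.domain.ext/\nhttp://www.domain.ext/about\n";

print_r(urlsFromXmlSitemap($xml));
print_r(urlsFromPlainText($text));
```

Both calls yield the same two URLs, which is the kind of flat list a crawler would then request one by one.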

[![Logo](logo-medium.png)](logo-medium.png)

Usage from the Command Line Interface (CLI)
-------------------------------------------


Crawl the URLs in a Sitemap with verbose logging (`-vvv`).

```
php application.php http://www.domain.ext/sitemap.xml -vvv
```

Use 15 concurrent connections instead of the default 5:

```
php application.php http://www.domain.ext/sitemap.xml --concurrency 15 -vvv
```

Use an `HTTP GET` request instead of the default `HTTP HEAD`. Note that `HTTP HEAD` requests transfer less data, because the response carries no body:

```
php application.php http://www.domain.ext/sitemap.xml --requestType GET -vvv
```

Use a timeout of 3 seconds instead of the default 10 seconds:

```
php application.php http://www.domain.ext/sitemap.xml --timeout 3 -vvv
```

Use a specific User-Agent instead of the default `Octopus/1.0`, for example to simulate a search engine crawling a sitemap:

```
php application.php http://www.domain.ext/sitemap.xml --userAgent 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' -vvv
```

Use the `TablePresenter` to display intermediate results instead of the default `EchoPresenter`:

```
php application.php http://www.domain.ext/sitemap.xml --presenter Octopus\\Presenter\\TablePresenter -vvv
```

Usage from your own application
-------------------------------


You can easily integrate sitemap crawling into your own application. See the `Config` class for all available configuration options. If required, you can inject a [PSR-3 logger](https://www.php-fig.org/psr/psr-3/) for logging purposes.

```
use Octopus\Config;
use Octopus\Processor;

// This snippet assumes it runs inside a class that holds a PSR-3 logger in $this->logger.
$config = new Config();
$config->concurrency = 2;
$config->targetFile = 'https://www.domain.ext/sitemap.xml';
$config->additionalResponseHeadersToCount = [
    'CF-Cache-Status', // Useful to check the Cloudflare edge server cache status
];
$config->requestHeaders = [
    'User-Agent' => 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)', // Simulate Google's web crawler
];
$processor = new Processor($config, $this->logger); // A PSR-3 logger can be injected if required
$processor->run();

// Report the results once the crawl has finished.
$this->logger->info('Statistics: ' . print_r($processor->result->getStatusCodes(), true));
$this->logger->info('Applied concurrency: ' . $config->concurrency);
$this->logger->info('Total amount of processed data: ' . $processor->result->getTotalData());
$this->logger->info('Failed to load #URLs: ' . count($processor->result->getBrokenUrls()));
```
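The statistics logged above can be condensed further before reporting. The sketch below groups per-status-code counts into classes such as `2xx` and `4xx`. It assumes that `getStatusCodes()` returns an array mapping each HTTP status code to the number of URLs that produced it; the README does not document that shape, so verify it against the `Result` class first. The helper `summarizeStatusClasses` and the sample data are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: group per-status-code counts into classes ("2xx", "4xx", ...).
 *
 * @param array<int|string, int> $statusCodes assumed shape: status code => number of URLs
 * @return array<string, int> status class => number of URLs, sorted by class
 */
function summarizeStatusClasses(array $statusCodes): array
{
    $summary = [];
    foreach ($statusCodes as $code => $count) {
        // 200 and 204 both map to "2xx", 404 maps to "4xx", and so on.
        $class = intdiv((int) $code, 100) . 'xx';
        $summary[$class] = ($summary[$class] ?? 0) + $count;
    }
    ksort($summary);

    return $summary;
}

// Sample data standing in for $processor->result->getStatusCodes() (assumed shape).
$sample = [200 => 120, 301 => 4, 404 => 3, 500 => 1, 204 => 2];

print_r(summarizeStatusClasses($sample));
```

In an application, the sample array would be replaced by the real result, for example `summarizeStatusClasses($processor->result->getStatusCodes())`, provided the assumed shape holds.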

Limitations
-----------


Currently, Octopus is mainly an experimental / educational tool. Advanced use cases in HTTP response handling might not be supported.

Tests
-----


Before running the test suite, clone this repository and install all dependencies [using Composer](https://getcomposer.org):

```
$ composer install
```

Then, from the project root, run:

```
$ make test
```

###  Health Score

45 (Fair): better than 91% of packages

- Maintenance: 42. Moderate activity, may be stable
- Popularity: 29. Limited adoption so far
- Community: 11. Small or concentrated contributor base
- Maturity: 81. Battle-tested with a long release history
- Bus Factor: 1. Top contributor holds 88.6% of commits (single point of failure)

How is this calculated?

- **Maintenance (25%)**: Last commit recency, latest release date, and issue-to-star ratio. Uses a 2-year decay window.
- **Popularity (30%)**: Total and monthly downloads, GitHub stars, and forks. Logarithmic scaling prevents top-heavy scores.
- **Community (15%)**: Contributors, dependents, forks, watchers, and maintainers. Measures real ecosystem engagement.
- **Maturity (30%)**: Project age, version count, PHP version support, and release stability.

###  Release Activity

- Cadence: every ~92 days (recently: every ~282 days)
- Total: 30
- Last Release: 582d ago

PHP version history (6 changes):

- 0.1.0: PHP ^7.1
- 0.8.0: PHP ^7.4
- 0.8.4: PHP ^7.4 || ^8.0
- 0.9.0: PHP ^8.0
- 0.11.0: PHP ~8.3
- 0.11.1: PHP ~8.3 || ~8.4

### Community

Maintainers: [Dennis Povshedny](/maintainers/dpovshed) ([@dpovshed](https://github.com/dpovshed))

Top Contributors: [holtkamp](https://github.com/holtkamp) (101 commits), [dpovshed](https://github.com/dpovshed) (13 commits)

###  Code Quality

- Tests: PHPUnit
- Static Analysis: PHPStan, Rector
- Code Style: PHP CS Fixer
- Type Coverage: Yes


###  Alternatives

- [matomo/matomo](/packages/matomo-matomo): Matomo is the leading Free/Libre open analytics platform
- [composer/composer](/packages/composer-composer): Composer helps you declare, manage and install dependencies of PHP projects. It ensures you have the right stack everywhere.
- [phpro/soap-client](/packages/phpro-soap-client): A general purpose SoapClient library
- [flow-php/flow](/packages/flow-php-flow): PHP ETL - Extract Transform Load - Data processing framework
- [blackfire/player](/packages/blackfire-player): A powerful web crawler and web scraper with Blackfire support
- [dagger/dagger](/packages/dagger-dagger): Dagger PHP SDK

