octopoda/octopus
================

PHP Sitemap crawler

Version 0.11.1 · MIT license · PHP ~8.3 || ~8.4 · CI failing · [3 issues](https://github.com/dpovshed/octopus/issues)

[Source](https://github.com/dpovshed/octopus) · [Packagist](https://packagist.org/packages/octopoda/octopus)

Octopus Sitemap Crawler
=======================

A small PHP tool that crawls the URLs listed in a sitemap, using the [ReactPHP](https://github.com/reactphp/react) library to load them asynchronously. Both plain-text files and [XML Sitemaps](https://www.sitemaps.org/protocol.html) are supported.

[![Logo](logo-medium.png)](logo-medium.png)
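
The two input formats differ only in how the URLs are listed: a plain-text sitemap has one URL per line, while an XML Sitemap wraps each URL in a `<loc>` element in the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace. The sketch below extracts the URLs from either format using only PHP's bundled DOM extension. The function name `extractSitemapUrls` is hypothetical, and this is not Octopus's own parser.

```php
<?php

declare(strict_types=1);

// Namespace defined by the sitemaps.org protocol for <urlset> and <sitemapindex>.
const SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9';

/**
 * Hypothetical helper: returns the URLs listed in a plain-text or XML sitemap.
 *
 * @return list<string>
 */
function extractSitemapUrls(string $content): array
{
    $trimmed = trim($content);

    // Plain-text sitemap: one URL per line, blank lines ignored.
    if (!str_starts_with($trimmed, '<')) {
        $lines = array_map('trim', preg_split('/\R/', $trimmed));

        return array_values(array_filter($lines, static fn (string $line): bool => $line !== ''));
    }

    // XML Sitemap: collect parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $loaded = $dom->loadXML($trimmed);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        throw new InvalidArgumentException('The sitemap is not well-formed XML.');
    }

    // Collect the text of every <loc> element in the sitemap namespace.
    $urls = [];
    foreach ($dom->getElementsByTagNameNS(SITEMAP_NS, 'loc') as $loc) {
        $urls[] = trim($loc->textContent);
    }

    return $urls;
}
```

Sitemap index files are not expanded by this sketch: the `<loc>` entries of a `<sitemapindex>` are returned as-is, without fetching the nested sitemaps they point to.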

Usage from the Command Line Interface (CLI)
-------------------------------------------

Crawl the URLs in a sitemap with verbose logging (`-vvv`):

```
php application.php http://www.domain.ext/sitemap.xml -vvv
```

Use 15 concurrent connections instead of the default 5:

```
php application.php http://www.domain.ext/sitemap.xml --concurrency 15 -vvv
```

Use an HTTP `GET` request instead of the default HTTP `HEAD` request. `HEAD` requests transfer less data, because the server returns only the response headers and no body:

```
php application.php http://www.domain.ext/sitemap.xml --requestType GET -vvv
```

Use a timeout of 3 seconds instead of the default 10 seconds:

```
php application.php http://www.domain.ext/sitemap.xml --timeout 3 -vvv
```

Use a specific User-Agent instead of the default `Octopus/1.0`, for example to simulate a search engine crawling the sitemap:

```
php application.php http://www.domain.ext/sitemap.xml --userAgent 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' -vvv
```
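
The `--requestType`, `--timeout` and `--userAgent` options correspond to ordinary HTTP request settings. Purely as an illustration (Octopus itself sends its requests through ReactPHP, not through PHP's stream wrapper), the sketch below expresses the same three settings as options of PHP's built-in `http` stream context, using the defaults listed above. The helper name `buildRequestContext` is hypothetical.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: builds an http stream context mirroring the CLI options
 * --requestType, --timeout and --userAgent (defaults as documented above).
 *
 * @return resource
 */
function buildRequestContext(string $method = 'HEAD', float $timeout = 10.0, string $userAgent = 'Octopus/1.0')
{
    return stream_context_create([
        'http' => [
            'method' => $method,        // HEAD returns headers only; GET also downloads the body
            'timeout' => $timeout,      // read timeout in seconds
            'user_agent' => $userAgent, // sent as the User-Agent request header
            'ignore_errors' => true,    // still return the response for 4xx/5xx status codes
        ],
    ]);
}
```

Such a context could be passed as the third argument of `file_get_contents()`; with `HEAD` the returned body is empty, which is exactly where the bandwidth saving comes from.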

Use the `TablePresenter` to display intermediate results instead of the default `EchoPresenter`:

```
php application.php http://www.domain.ext/sitemap.xml --presenter Octopus\\Presenter\\TablePresenter -vvv
```

Usage from your own application
-------------------------------

You can integrate sitemap crawling into your own application; see the `Config` class for all available configuration options. Optionally, a [PSR-3 logger](https://www.php-fig.org/psr/psr-3/) can be injected for logging.

```php
<?php

require __DIR__ . '/vendor/autoload.php';

use Octopus\Config;
use Octopus\Processor;

// $logger is assumed to be a PSR-3 logger (Psr\Log\LoggerInterface) provided by your application.

$config = new Config();
$config->concurrency = 2;
$config->targetFile = 'https://www.domain.ext/sitemap.xml';
$config->additionalResponseHeadersToCount = [
    'CF-Cache-Status', // Useful to check the Cloudflare edge server cache status
];
$config->requestHeaders = [
    'User-Agent' => 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)', // Simulate Google's web crawler
];

$processor = new Processor($config, $logger); // A PSR-3 logger can be injected if required
$processor->run();

$logger->info('Statistics: ' . print_r($processor->result->getStatusCodes(), true));
$logger->info('Applied concurrency: ' . $config->concurrency);
$logger->info('Total amount of processed data: ' . $processor->result->getTotalData());
$logger->info('Number of URLs that failed to load: ' . count($processor->result->getBrokenUrls()));
```
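
The `additionalResponseHeadersToCount` option suggests that Octopus tallies the values of the listed response headers across the crawled URLs (for example, how many responses carried `CF-Cache-Status: HIT`). To illustrate what such a tally involves, here is a stdlib-only sketch. The function `countHeaderValues` and its input shape (one associative array of response headers per URL) are hypothetical, and this is not Octopus's own implementation.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: counts how often each value of $headerName occurs.
 *
 * @param list<array<string, string>> $responseHeaders one header map per response
 * @return array<string, int> header value => number of responses carrying it
 */
function countHeaderValues(array $responseHeaders, string $headerName): array
{
    $counts = [];
    $wanted = strtolower($headerName);

    foreach ($responseHeaders as $headers) {
        foreach ($headers as $name => $value) {
            // HTTP header field names are case-insensitive.
            if (strtolower((string) $name) === $wanted) {
                $counts[$value] = ($counts[$value] ?? 0) + 1;
                break; // count each response at most once
            }
        }
    }

    arsort($counts); // most frequent value first
    return $counts;
}
```

Responses that lack the header are simply not counted; a real crawler might prefer to report them separately.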

Limitations
-----------

Octopus is currently mainly an experimental and educational tool. Advanced HTTP response-handling use cases may not be supported.

Tests
-----

To run the test suite, you first need to clone this repository and then install all dependencies [using Composer](https://getcomposer.org):

```
$ composer install
```

Then run the test suite from the project root:

```
$ make test
```

### PHP version history

- 0.1.0: PHP ^7.1
- 0.8.0: PHP ^7.4
- 0.8.4: PHP ^7.4 || ^8.0
- 0.9.0: PHP ^8.0
- 0.11.0: PHP ~8.3
- 0.11.1: PHP ~8.3 || ~8.4

### Community

Maintainer: dpovshed

Top contributors: [holtkamp](https://github.com/holtkamp) (101 commits), [dpovshed](https://github.com/dpovshed) (13 commits)

### Code quality

- Tests: PHPUnit
- Static analysis: PHPStan, Rector
- Code style: PHP CS Fixer
- Type coverage: yes
