shel/crawler
============

Allows crawling of sitemaps and node-trees

Version 3.0.0 · GPL-3.0 · PHP >=8.1

[Source](https://github.com/Sebobo/Shel.Crawler) · [Packagist](https://packagist.org/packages/shel/crawler)

Shel.Crawler for Neos CMS
=========================


Crawler for Neos CMS nodes and sites. It can be used to warm up the caches after a release or to dump your site as HTML files.

Installation
------------


Run the following command in your project:

```
composer require shel/crawler

```

Usage
-----


To crawl all pages listed in a single sitemap, run:

```
./flow crawler:crawlsitemap --url=http://huve.de.test/sitemap.xml --simultaneousLimit=10 --delay=0
```

To crawl all pages based on all sitemaps listed in a robots.txt file, run:

```
./flow crawler:crawlrobotstxt --url=http://huve.de.test/robots.txt --simultaneousLimit=10 --delay=0
```
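For the cache warm-up use case mentioned above, the robots.txt crawl can be wrapped in a small function and called from a deployment script. This is only a sketch: the function name `warm_up_caches` and the `FLOW_BIN` variable are hypothetical helpers for illustration, and it uses only the flags shown in the commands above.

```shell
#!/bin/sh
# Sketch of a post-release warm-up helper (hypothetical, not part of the package).
# FLOW_BIN is an assumed override for the path to the Flow CLI; it defaults to ./flow.

warm_up_caches() {
    base_url="${1%/}"   # site base URL, with a trailing slash stripped if present
    limit="${2:-10}"    # value passed to --simultaneousLimit
    delay="${3:-0}"     # value passed to --delay

    "${FLOW_BIN:-./flow}" crawler:crawlrobotstxt \
        --url="$base_url/robots.txt" \
        --simultaneousLimit="$limit" \
        --delay="$delay"
}
```

Calling `warm_up_caches http://huve.de.test` from the project root then runs the same command as the robots.txt example above, with the default limit and delay.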

Node based crawling
-------------------


This command tries to generate the HTML of all pages by rendering them internally, without sending actual requests. Due to the complexity of the page context, this might not give the desired results, but the resulting HTML of all crawled pages can be stored for further use.

This can be much faster, as all pages are rendered in one process and all caches are reused.

To make this work, you need to provide a valid hostname.

This can be done in one of the following ways:

- have an active domain setup for a site (recommended, the crawler will use the first active domain)
- set the `Neos.Flow.http.baseUri` setting for Neos in your `Settings.yaml`
- provide the `baseUri` in general via the environment variable `CRAWLER_BASE_URI` and use the example in `Configuration/Production/Settings.yaml`

```
./flow crawler:crawlnodes --siteNodeName

# The default preset is crawled unless you provide a dimension (or a combination of dimensions) to crawl
./flow crawler:crawlnodes --siteNodeName  --dimensions language:en
./flow crawler:crawlnodes --siteNodeName  --dimensions language:en,country:de
```
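If several dimension combinations need to be crawled, the command can be run once per combination. The following is a minimal sketch under assumptions: `crawl_dimensions` and `FLOW_BIN` are hypothetical names, the site node name is whatever your site uses, and `https://www.example.org` is a placeholder base URI. `CRAWLER_BASE_URI` is only picked up if your settings read it, as in the package's `Configuration/Production/Settings.yaml` example.

```shell
#!/bin/sh
# Sketch: crawl one site in several dimension combinations (hypothetical helper).
# Usage: crawl_dimensions <siteNodeName> <dimensions> [<dimensions> ...]

crawl_dimensions() {
    site_node_name="$1"
    shift

    for dimensions in "$@"; do
        # Pass a base URI via the environment unless one is already set.
        # The default below is a placeholder, not a value from the package.
        CRAWLER_BASE_URI="${CRAWLER_BASE_URI:-https://www.example.org}" \
            "${FLOW_BIN:-./flow}" crawler:crawlnodes \
            --siteNodeName "$site_node_name" \
            --dimensions "$dimensions" || return 1
    done
}
```

For example, `crawl_dimensions mysite language:en language:en,country:de` runs the two dimension-specific commands from above in sequence and stops at the first failure.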

To crawl all sites based on their primary active domain:

```
./flow crawler:crawlsites
```

To crawl all sites based on their primary active domain and use the URLs listed in robots.txt:

```
./flow crawler:crawlsites --method robotstxt
```

### Experimental static file cache


By providing the `outputPath` option, you can store all crawled content as HTML files.

```
./flow crawler:crawlnodes --siteNodeName  --outputPath=Web/cache
```

You can use this as a very simple static file cache by adapting your webserver configuration. Here is an example for nginx:

```
# Serve a cached page matching the request if it exists
location / {
    default_type "text/html";
    try_files /cache/$uri $uri $uri/ /index.php?$args;
}

# Serve cache/index(.html) instead of / if it exists
location = / {
    default_type "text/html";
    try_files /cache/index.html /cache/index /index.php?$args;
}
```

Replace the existing `try_files` part of your configuration with the given code and adapt the path `cache` if you use a different one. This cache feature is experimental, and you are currently in charge of keeping the files up to date and removing old ones. Known limitations:

- Doesn't clear cache
- Doesn't update automatically on publish
- Ignores Fusion caching configuration
- Shortcuts are ignored (open TODO)
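Since stale files are not removed automatically, one possible approach is to re-crawl into the output path and then delete every cached file that was not rewritten recently. The sketch below assumes that a crawl rewrites the files of all pages that still exist (this is an assumption, not documented behaviour), and that your `find` supports `-mmin` and `-delete` (GNU and BSD `find` do). The function name `prune_stale_cache` is hypothetical.

```shell
#!/bin/sh
# Sketch: remove cached files older than a given age (hypothetical helper).
# Usage: prune_stale_cache <cacheDir> [<maxAgeMinutes>]

prune_stale_cache() {
    cache_dir="$1"
    max_age_minutes="${2:-1440}"   # default: one day

    # Nothing to do if the cache directory does not exist yet.
    [ -d "$cache_dir" ] || return 0

    # Delete regular files whose modification time is older than the limit.
    find "$cache_dir" -type f -mmin +"$max_age_minutes" -delete
}
```

Run after a fresh crawl with `--outputPath=Web/cache`, a call such as `prune_stale_cache Web/cache 60` would remove files that the crawl did not touch within the last hour.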

Contributing
------------


Contributions or sponsorships are very welcome.





