crwlr/crawler
=============

Web crawling and scraping library.

v3.5.6 (released 4 months ago) · MIT license · PHP ^8.1 · 4 watchers · [1 PR](https://github.com/crwlrsoft/crawler/pulls)

[Source](https://github.com/crwlrsoft/crawler) · [Packagist](https://packagist.org/packages/crwlr/crawler) · [Docs](https://www.crwlr.software/packages/crawler) · [GitHub Sponsors](https://github.com/sponsors/otsch)

[![crwlr.software logo](https://github.com/crwlrsoft/graphics/raw/eee6cf48ee491b538d11b9acd7ee71fbcdbe3a09/crwlr-logo.png)](https://www.crwlr.software)

Library for Rapid (Web) Crawler and Scraper Development
=======================================================


This library provides a framework and many ready-to-use **steps** that serve as building blocks for your own crawlers and scrapers.

To give you an overview, here's a list of things that it helps you with:

- [Crawler **Politeness**](https://www.crwlr.software/packages/crawler/the-crawler/politeness) 😇 (respecting robots.txt, throttling, ...)
- Load URLs using
    - [a **(PSR-18) HTTP client**](https://www.crwlr.software/packages/crawler/the-crawler/loaders) (Guzzle by default)
    - or a [**headless browser**](https://www.crwlr.software/packages/crawler/the-crawler/loaders#using-a-headless-browser) (Chrome) to get the page source after JavaScript execution
- [Get **absolute links** from HTML documents](https://www.crwlr.software/packages/crawler/included-steps/html#html-get-link) 🔗
- [Get **sitemaps** from robots.txt and get all URLs from those sitemaps](https://www.crwlr.software/packages/crawler/included-steps/sitemap)
- [**Crawl** (load) all pages of a website](https://www.crwlr.software/packages/crawler/included-steps/http#crawling) 🕷
- [Use **cookies** (or don't)](https://www.crwlr.software/packages/crawler/the-crawler/loaders#http-loader) 🍪
- [Use any **HTTP method** (GET, POST, ...) and send any headers or body](https://www.crwlr.software/packages/crawler/included-steps/http#http-requests)
- [Easily iterate over **paginated** list pages](https://www.crwlr.software/packages/crawler/included-steps/http#paginating) 🔁
- Extract data from:
    - [**HTML**](https://www.crwlr.software/packages/crawler/included-steps/html#extracting-data) and also [**XML**](https://www.crwlr.software/packages/crawler/included-steps/xml) (using CSS selectors or XPath queries)
    - [**JSON**](https://www.crwlr.software/packages/crawler/included-steps/json) (using dot notation)
    - [**CSV**](https://www.crwlr.software/packages/crawler/included-steps/csv) (map columns)
- [Extract **schema.org** structured data](https://www.crwlr.software/packages/crawler/included-steps/html#schema-org) in **JSON-LD** format from HTML documents
- [Keep memory usage low](https://www.crwlr.software/packages/crawler/crawling-procedure#memory-usage) by using PHP **Generators** 💪
- [**Cache** HTTP responses](https://www.crwlr.software/packages/crawler/response-cache) during development, so you don't have to load pages again and again after every code change
- [Get **logs**](https://www.crwlr.software/packages/crawler/the-crawler#loggers) about what your crawler is doing (accepts any PSR-3 LoggerInterface)
- And a lot more...
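
The memory point in the list above relies on a general PHP language feature, so it can be illustrated without the library. The following is a minimal sketch in plain PHP, not the library's actual step API: `buildUrls()` and `describeUrls()` are hypothetical stand-ins for two chained steps. It shows how generators pass one item at a time down a pipeline instead of building complete arrays between stages.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical first "step": yields one absolute URL at a time
 * instead of returning a complete array of URLs.
 *
 * @param iterable<string> $paths
 */
function buildUrls(string $baseUrl, iterable $paths): Generator
{
    foreach ($paths as $path) {
        yield rtrim($baseUrl, '/') . '/' . ltrim($path, '/');
    }
}

/**
 * Hypothetical second "step": consumes URLs lazily and yields
 * one small result array per URL.
 *
 * @param iterable<string> $urls
 */
function describeUrls(iterable $urls): Generator
{
    foreach ($urls as $url) {
        yield ['url' => $url, 'length' => strlen($url)];
    }
}

// Nothing is computed until the loop below starts pulling items.
$results = describeUrls(buildUrls('https://example.com/', ['/a', 'b/c']));

foreach ($results as $result) {
    // Only the current item is held in memory at this point.
    echo $result['url'], ' (', $result['length'], " characters)\n";
}
```

Because each stage only pulls the next item when the consumer asks for it, peak memory depends on the size of a single item rather than on the total number of items.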

Documentation
-------------


You can find the documentation at [crwlr.software](https://www.crwlr.software/packages/crawler/getting-started).

Contributing
------------


If you are considering contributing to this package, please read the [contribution guide (CONTRIBUTING.md)](CONTRIBUTING.md).

###  Health Score

Overall: 57 (Fair), better than 98% of packages.

- Maintenance: 75 (regular maintenance activity)
- Popularity: 45 (moderate usage in the ecosystem)
- Community: 20 (small or concentrated contributor base)
- Maturity: 73 (established project with proven stability)
- Bus factor: 1 (the top contributor holds 93.5% of commits, a single point of failure)

How is this calculated?

- **Maintenance (25%)**: last commit recency, latest release date, and issue-to-star ratio. Uses a 2-year decay window.
- **Popularity (30%)**: total and monthly downloads, GitHub stars, and forks. Logarithmic scaling prevents top-heavy scores.
- **Community (15%)**: contributors, dependents, forks, watchers, and maintainers. Measures real ecosystem engagement.
- **Maturity (30%)**: project age, version count, PHP version support, and release stability.

###  Release Activity

- Cadence: every ~16 days overall; recently every ~58 days
- Total releases: 85
- Last release: 133 days ago

Major versions:

- v0.7.0 → v1.0.0 (2023-02-08)
- v1.10.0 → v2.0.0-beta (2024-08-08)
- v2.1.3 → v3.0.0 (2024-12-08)

### Community

Maintainers: [crwlr](/maintainers/crwlr)

Top contributors:

- [otsch](https://github.com/otsch) (414 commits)
- [szepeviktor](https://github.com/szepeviktor) (26 commits)
- github-actions[bot] (3 commits)

Tags

crawlercrawlinghacktoberfestphpscraperscrapingscraping-websitesweb-crawlerweb-crawlingweb-scraperweb-scrapingwebcrawlerbotcrawlscrapescrapercrawlingscrapingcrwlr

###  Code Quality

- Tests: Pest
- Static analysis: PHPStan
- Code style: PHP CS Fixer
- Type coverage: yes



