
mjorgens/web-crawler
====================

A PHP web crawler library

V1.0.3(5y ago)013[1 PRs](https://github.com/mjorgens/web-crawler/pulls)MITPHPPHP ^7.2CI passing


[Source](https://github.com/mjorgens/web-crawler) · [Packagist](https://packagist.org/packages/mjorgens/web-crawler)


Web Crawler for PHP
===================


[![GitHub release (latest by date)](https://camo.githubusercontent.com/354296c73c64d554c790726a705b80d61c6a286708bc9b1da32de3c159b7267c/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f762f72656c656173652f6d6a6f7267656e732f7765622d637261776c6572)](https://camo.githubusercontent.com/354296c73c64d554c790726a705b80d61c6a286708bc9b1da32de3c159b7267c/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f762f72656c656173652f6d6a6f7267656e732f7765622d637261776c6572) [![GitHub Workflow Status (branch)](https://camo.githubusercontent.com/3cdfa72f3d8e42d67970f7ffa323e5d585df3e75e95688ea65b313de79e7a8d2/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f776f726b666c6f772f7374617475732f6d6a6f7267656e732f7765622d637261776c65722f43492f6d6173746572)](https://camo.githubusercontent.com/3cdfa72f3d8e42d67970f7ffa323e5d585df3e75e95688ea65b313de79e7a8d2/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f776f726b666c6f772f7374617475732f6d6a6f7267656e732f7765622d637261776c65722f43492f6d6173746572) [![GitHub](https://camo.githubusercontent.com/99493532f97a511ec1aef8cc64dc2cf2308954cc848a8d5b670c128af70aafda/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f6c6963656e73652f6d6a6f7267656e732f7765622d637261776c6572)](https://camo.githubusercontent.com/99493532f97a511ec1aef8cc64dc2cf2308954cc848a8d5b670c128af70aafda/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f6c6963656e73652f6d6a6f7267656e732f7765622d637261776c6572)

This PHP library takes a starting URL, parses the page's HTML, and extracts the URLs it finds. It then follows each extracted URL and parses those pages in turn, until the maximum number of URLs is reached.

Requirements
------------


[![PHP from Packagist](https://camo.githubusercontent.com/fe649d46062b7d1574795f155ccf8435c822a5b3b0b08e72d62b48b10b9230ce/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f7068702d762f6d6a6f7267656e732f7765622d637261776c6572)](https://camo.githubusercontent.com/fe649d46062b7d1574795f155ccf8435c822a5b3b0b08e72d62b48b10b9230ce/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f7068702d762f6d6a6f7267656e732f7765622d637261776c6572)

Requires PHP ^7.2, that is, version 7.2 or any later 7.x release.

Installation
------------


The recommended way to install this library is through Composer.

```
composer require mjorgens/web-crawler
```

Usage
-----


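```
use GuzzleHttp\Psr7\Uri;      // Assumption: Uri is the Guzzle PSR-7 class; check the package source
use Mjorgens\Crawler\Crawler; // Assumption: Crawler lives in the package's root namespace
use Mjorgens\Crawler\CrawledRepository\CrawledMemoryRepository;

$repository = new CrawledMemoryRepository(); // In-memory collection of crawled pages
$url = new Uri('https://example.com'); // Starting URL
$maxUrls = 5; // Maximum number of URLs to crawl

Crawler::create()
    ->setRepository($repository)
    ->setMaxCrawl($maxUrls)
    ->startCrawling($url); // Start the crawler

// Each stored page exposes the URL it was fetched from and its HTML
foreach ($repository as $page) {
    echo $page->url;
    echo $page->html;
}
```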
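
To make the crawl described in the introduction concrete, here is a minimal sketch of the general technique: a breadth-first traversal that stops once a page limit is reached. It is not the library's actual implementation. The functions `extractLinks()` and `crawl()`, the `$fetch` callable, and the stubbed `$site` array are all hypothetical names introduced for illustration, and only absolute and root-relative links are handled.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of mjorgens/web-crawler):
 * returns the absolute http(s) links found in an HTML string.
 */
function extractLinks(string $html, string $baseUrl): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings from malformed markup
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Build "scheme://host[:port]" so root-relative links can be resolved.
    $parts = parse_url($baseUrl);
    $origin = ($parts['scheme'] ?? 'https') . '://' . ($parts['host'] ?? '');
    if (isset($parts['port'])) {
        $origin .= ':' . $parts['port'];
    }

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if (preg_match('#^https?://#i', $href) === 1) {
            $links[] = $href; // already absolute
        } elseif (strpos($href, '/') === 0 && strpos($href, '//') !== 0) {
            $links[] = $origin . $href; // root-relative, e.g. "/about"
        }
        // Other forms (fragments, mailto:, path-relative) are skipped in this sketch.
    }

    return array_values(array_unique($links));
}

/**
 * Hypothetical breadth-first crawl: visits pages in discovery order
 * until $maxUrls pages have been stored or the queue runs dry.
 * $fetch takes a URL and returns the HTML string, or null on failure.
 */
function crawl(string $startUrl, int $maxUrls, callable $fetch): array
{
    $queue = [$startUrl];
    $seen = [$startUrl => true];
    $pages = [];

    while ($queue !== [] && count($pages) < $maxUrls) {
        $url = array_shift($queue);
        $html = $fetch($url);
        if ($html === null) {
            continue; // fetch failed; move on to the next queued URL
        }
        $pages[$url] = $html;

        foreach (extractLinks($html, $url) as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = true; // never queue the same URL twice
                $queue[] = $link;
            }
        }
    }

    return $pages;
}

// Stubbed "website" so the sketch runs without network access.
$site = [
    'https://example.com'   => '<a href="/a">A</a><a href="https://example.com/b">B</a>',
    'https://example.com/a' => '<a href="/b">B</a><a href="/c">C</a>',
    'https://example.com/b' => '<a href="/">Home</a>',
    'https://example.com/c' => '<p>No links here</p>',
];
$fetch = function (string $url) use ($site): ?string {
    return $site[$url] ?? null;
};

$pages = crawl('https://example.com', 3, $fetch);
foreach (array_keys($pages) as $url) {
    echo $url, PHP_EOL;
}
```

Passing the fetcher in as a callable keeps the traversal logic independent of any HTTP client. In real use the stub could be swapped for a function that performs an HTTP request and returns the response body.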

### Health Score

35 (Low): better than 79% of packages.

- Maintenance: 63. Regular maintenance activity.
- Popularity: 5. Limited adoption so far.
- Community: 8. Small or concentrated contributor base.
- Maturity: 54. Maturing project, gaining track record.
- Bus Factor: 1. Top contributor holds 93.3% of commits, a single point of failure.

How is this calculated?

**Maintenance (25%)**: Last commit recency, latest release date, and issue-to-star ratio. Uses a 2-year decay window.

**Popularity (30%)**: Total and monthly downloads, GitHub stars, and forks. Logarithmic scaling prevents top-heavy scores.

**Community (15%)**: Contributors, dependents, forks, watchers, and maintainers. Measures real ecosystem engagement.

**Maturity (30%)**: Project age, version count, PHP version support, and release stability.

### Release Activity

- Cadence: every ~51 days
- Total: 4
- Last Release: 1916d ago

### Community

Maintainers: [Marc Jorgensen](/maintainers/mjorgens) ([@mjorgens](https://github.com/mjorgens))

Top Contributors:

- [mjorgens](https://github.com/mjorgens) (14 commits)
- [dependabot[bot]](https://github.com/dependabot[bot]) (1 commit)

### Code Quality

- Tests: PHPUnit
- Code Style: PHP\_CodeSniffer


### Alternatives

- [spatie/crawler](/packages/spatie-crawler): Crawl all internal links found on a website.
- [google/cloud-core](/packages/google-cloud-core): Google Cloud PHP shared dependency, providing functionality useful to all components.
- [civicrm/civicrm-core](/packages/civicrm-civicrm-core): Open source constituent relationship management for non-profits, NGOs and advocacy organizations.
- [wallabag/wallabag](/packages/wallabag-wallabag): Open source, self-hostable read-it-later web application.
- [ashallendesign/favicon-fetcher](/packages/ashallendesign-favicon-fetcher): A Laravel package for fetching websites' favicons.
- [flarum/core](/packages/flarum-core): Delightfully simple forum software.

