Active · CLI application · [CLI & Console](/categories/cli)

robertbyrnes/phpcrawler
=======================

A PHP CLI link crawler using a FIFO queue data structure and the producer/consumer model.

Version 1.1 (5y ago) · GPL-3.0 · PHP · PHP >=7.4

Since Mar 29 · Pushed 5y ago · 1 watcher

[Source](https://github.com/RobertByrnes/PHP-Crawler) · [Packagist](https://packagist.org/packages/robertbyrnes/phpcrawler) · [Docs](https://github.com/RobertByrnes/PHP-Crawler)

PHP-Crawler
===========

Implementation of a queue-based producer/consumer web crawler in PHP. It uses multiple processes or native threads, via the amphp/parallel dependency, to crawl a domain and collect its links.

```
                                                    / _ \
                                                  \_\(_)/_/
                                                   _//o|\_
                                                    /  |

```

@author: Robert Byrnes @email:

Install
=======

Install with `composer require robertbyrnes/phpcrawler`. Once installed, `cd` into `vendor/robertbyrnes/phpcrawler` to find `main.php`, which is the file that runs the program. If you run into 'class not found' errors, run `composer update` and `composer dump-autoload`. Running `php main.php` from a terminal or command prompt shows the help menu, which details the arguments required to begin a crawl.

```
/*** ARGUMENTS ***/
Required arguments:
    -u url (string) e.g. http://website.org or https://...
    -n project name (string) e.g. website - this is used to create the dir to save the results to
        following crawling.
Optional arguments:
    -s number of spiders (integer)[parallel processes] used in crawling. Default is 4.
    -v returns the version.
    -h prints this help message.

```
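As a rough illustration of how these arguments could be read, here is a minimal sketch built on PHP's `getopt()` with the option string `u:n:s:vh`. It is not the project's `main.php`: the helper `parseCrawlerArgs()` is hypothetical, and the URL check via `filter_var()` is an added assumption. It takes the array that `getopt()` returns, so it can be exercised without a terminal.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: validates the array produced by getopt('u:n:s:vh')
 * and applies the documented default of 4 spiders.
 * Returns null when -u or -n is missing or the URL is malformed,
 * in which case the caller would print the help message instead.
 */
function parseCrawlerArgs(array $opts): ?array
{
    if (!isset($opts['u'], $opts['n']) || !is_string($opts['u']) || !is_string($opts['n'])) {
        return null; // required argument missing (or given more than once)
    }
    if (filter_var($opts['u'], FILTER_VALIDATE_URL) === false) {
        return null; // e.g. "website.org" without a scheme is rejected
    }
    $spiders = isset($opts['s']) ? (int) $opts['s'] : 4;

    return [
        'url'     => $opts['u'],
        'project' => $opts['n'],
        'spiders' => max(1, $spiders), // never fewer than one process
    ];
}

// In a CLI entry script one might write:
// $args = parseCrawlerArgs(getopt('u:n:s:vh') ?: []);
```

A null result is the natural point to print the help text shown above.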

Dependencies
============

This program requires PHP 7 (the package constraint is PHP >=7.4) as well as the amphp/parallel library for running multiple processes. Composer should install amphp/parallel automatically.

Classes
=======

Crawler::class
==============

Manages the queueing of tasks and passes work between Queue::class and Spider::class, using a producer/consumer model built around the queue.

- Crawler::class functions:
  - spawn() is a recursive function that uses the parallel library to create processes, each of which calls Spider::search() to do the crawling. The recursion continues until Queue::class->queue is empty, at which point the program exits.
  - add\_job() reads queue.txt using SaveData::class and pushes the links it contains onto Queue::class. This function is also recursive and forms one of the program's two loops.
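The producer/consumer loop that spawn() and add\_job() implement can be modelled in a single process with PHP's `SplQueue`. This is a simplified, hypothetical sketch rather than the project's code: it runs iteratively instead of recursively, it does not use amphp/parallel, and `crawlDomain()` and the `$fetchLinks` callback (standing in for a Spider fetching a page) are illustrative names. The example site is a fake in-memory link graph.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical single-process model of the Crawler's producer/consumer loop.
 * The real project dispatches consumers to parallel processes through
 * amphp/parallel; here the consumer is a plain callable and the loop is
 * iterative rather than recursive.
 *
 * @param callable $fetchLinks takes a URL and returns the links found on it
 * @return string[] URLs in the order they were crawled
 */
function crawlDomain(string $startUrl, callable $fetchLinks, int $maxPages = 100): array
{
    $queue = new SplQueue();      // FIFO: enqueue() at the tail, dequeue() at the head
    $queue->enqueue($startUrl);
    $seen = [$startUrl => true];  // every URL ever queued, to avoid duplicates
    $crawled = [];

    // Keep consuming until the queue is empty (or the page limit is hit).
    while (!$queue->isEmpty() && count($crawled) < $maxPages) {
        $url = $queue->dequeue();   // consumer: take the oldest job
        $crawled[] = $url;

        foreach ($fetchLinks($url) as $link) {  // producer: queue new links
            if (!isset($seen[$link])) {
                $seen[$link] = true;
                $queue->enqueue($link);
            }
        }
    }

    return $crawled;
}

// Usage with a fake in-memory site instead of real HTTP requests.
$site = [
    'https://website.org/'  => ['https://website.org/a', 'https://website.org/b'],
    'https://website.org/a' => ['https://website.org/b', 'https://website.org/c'],
    'https://website.org/b' => ['https://website.org/'],
    'https://website.org/c' => [],
];
$order = crawlDomain('https://website.org/', fn(string $u): array => $site[$u] ?? []);
// $order lists /, /a, /b and /c of website.org, breadth-first, each exactly once
```

Because the queue is FIFO, pages are visited breadth-first: everything linked from the start page is crawled before anything discovered later.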

Queue::class
============

A first-in, first-out data structure.

- Queue::class functions:
  - push() adds an item to the end of the queue.
  - shift() takes an item off the head of the queue.
  - pop() takes an item off the end of the queue.
  - open\_job() increments the count of unfinished tasks.
  - task\_done() decrements the count of unfinished tasks.
  - getCount() returns the total number of items in the queue.
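A minimal sketch of such a FIFO queue in PHP follows. The method names mirror the list above, but the internals (a plain array plus an unfinished-task counter) and the extra `unfinishedTasks()` accessor are assumptions, not the project's implementation.

```php
<?php
declare(strict_types=1);

/** Sketch of a FIFO queue with an unfinished-task counter (internals assumed). */
final class Queue
{
    /** @var string[] */
    private array $items = [];
    private int $unfinished = 0;

    /** Add an item to the end (tail) of the queue. */
    public function push(string $item): void
    {
        $this->items[] = $item;
    }

    /** Remove and return the item at the head, or null if the queue is empty. */
    public function shift(): ?string
    {
        return array_shift($this->items);
    }

    /** Remove and return the item at the end, or null if the queue is empty. */
    public function pop(): ?string
    {
        return array_pop($this->items);
    }

    /** Record that a job has been handed out. */
    public function open_job(): void
    {
        $this->unfinished++;
    }

    /** Record that a job has finished. */
    public function task_done(): void
    {
        if ($this->unfinished === 0) {
            throw new LogicException('task_done() called more often than open_job()');
        }
        $this->unfinished--;
    }

    /** Hypothetical accessor for the unfinished-task count. */
    public function unfinishedTasks(): int
    {
        return $this->unfinished;
    }

    public function getCount(): int
    {
        return count($this->items);
    }
}

$q = new Queue();
$q->push('https://website.org/a');
$q->push('https://website.org/b');
$q->push('https://website.org/c');
$first = $q->shift(); // 'https://website.org/a' (oldest item)
$last  = $q->pop();   // 'https://website.org/c' (newest item)
$q->open_job();
```

`array_shift()` re-indexes the array, so shift() is O(n) here; for very large queues PHP's built-in `SplQueue` offers constant-time operations at both ends.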

SaveData::class
===============

Handles all file tasks.

- SaveData::class functions:
  - create\_dir() creates the project directory inside the results directory at the project root.
  - create\_files() creates queue.txt and crawled.txt in the project directory.
  - file\_to\_array() opens either queue.txt or crawled.txt and parses the stream into an array.
  - array\_to\_file() writes the queue array or the crawled array to queue.txt or crawled.txt.
  - write\_file() uses fwrite() to write new files, or to open, empty, and then rewrite existing ones.
  - append\_to\_file() appends lines to either queue.txt or crawled.txt.
  - delete\_file\_contents() empties files; it is unused in PHP-Crawler.
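The file helpers above amount to reading and writing one link per line. The following sketch assumes that format (it is not documented here) and uses PHP's `file()` and `file_put_contents()`; the function names mirror the list, but the bodies are hypothetical rather than the project's code.

```php
<?php
declare(strict_types=1);

/** Read a file such as queue.txt into an array of non-empty lines. */
function file_to_array(string $path): array
{
    if (!is_file($path)) {
        return [];
    }
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    return $lines === false ? [] : $lines;
}

/** Overwrite the file with one link per line (truncate, then rewrite). */
function array_to_file(array $links, string $path): void
{
    $data = $links === [] ? '' : implode("\n", $links) . "\n";
    if (file_put_contents($path, $data, LOCK_EX) === false) {
        throw new RuntimeException("Cannot write $path");
    }
}

/** Append links to the end of an existing file, one per line. */
function append_to_file(array $links, string $path): void
{
    if ($links === []) {
        return;
    }
    $data = implode("\n", $links) . "\n";
    if (file_put_contents($path, $data, FILE_APPEND | LOCK_EX) === false) {
        throw new RuntimeException("Cannot append to $path");
    }
}
```

`LOCK_EX` takes an exclusive lock while writing, which matters if several spider processes might touch the same file.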

Spider::class
=============

Extracts links from a given URL and updates the queues and files.

- Spider::class functions:
  - setup() prints the domain name derived from the URL, creates the project directory and files, and populates both the queue and crawled arrays from those files.
  - search() takes a URL as an argument and calls extract\_links(), passing the result to sort\_to\_queue(). It prints the updated queue counts and updates the files once the crawling round is complete.
  - getDomain() extracts the domain name from the given URL.
  - extract\_links() uses PHP's built-in DOMDocument class to parse the links on a URL's page into an array.
  - sort\_to\_queue() cleans the links so that only those within the domain remain, and pushes unique links onto the queue in Queue::class.
  - update() uses SaveData::class to write queue.txt and crawled.txt with the updated links.
  - check\_queue() checks the count of the Spider::class property $queue and exits the program once the queue is empty.
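The link extraction and filtering steps can be sketched with `DOMDocument` and `parse_url()`. This is an illustrative approximation, not the project's code: it parses an HTML string that has already been fetched, resolves only root-relative and protocol-relative links (page-relative paths such as `about.html` are skipped), and treats two URLs as the same domain only when their hosts match exactly.

```php
<?php
declare(strict_types=1);

/** Lower-cased host of a URL, e.g. "website.org"; '' if there is none. */
function getDomain(string $url): string
{
    return strtolower((string) parse_url($url, PHP_URL_HOST));
}

/** Return the href of every <a> tag in an HTML string, in document order. */
function extract_links(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return $links;
}

/**
 * Keep same-domain links only, resolve root- and protocol-relative ones,
 * strip #fragments, and skip anything already crawled or queued.
 */
function sort_to_queue(array $links, string $baseUrl, array $crawled, array $queue): array
{
    $domain = getDomain($baseUrl);
    $scheme = parse_url($baseUrl, PHP_URL_SCHEME) ?: 'https';
    $known = array_flip(array_merge($crawled, $queue));
    $new = [];

    foreach ($links as $link) {
        $link = explode('#', $link, 2)[0];   // drop the fragment
        if ($link === '') {
            continue;                        // was a same-page anchor like "#top"
        }
        if (strpos($link, '//') === 0) {
            $link = $scheme . ':' . $link;   // protocol-relative
        } elseif ($link[0] === '/') {
            $link = $scheme . '://' . $domain . $link; // root-relative
        }
        if (getDomain($link) !== $domain || isset($known[$link])) {
            continue;                        // off-site, page-relative, or already seen
        }
        $known[$link] = true;
        $new[] = $link;
    }
    return $new;
}
```

Keeping a lookup of known URLs (via `array_flip()`) makes the uniqueness check a constant-time key lookup instead of a linear `in_array()` scan per link.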

### Health Score

25 (Low; better than 37% of packages)

- Maintenance: 20. Infrequent updates; may be unmaintained.
- Popularity: 9. Limited adoption so far.
- Community: 8. Small or concentrated contributor base.
- Maturity: 52. Maturing project, gaining track record.
- Bus factor: 1. Top contributor holds 100% of commits; single point of failure.

How is this calculated?

**Maintenance (25%)**: last commit recency, latest release date, and issue-to-star ratio. Uses a 2-year decay window.

**Popularity (30%)**: total and monthly downloads, GitHub stars, and forks. Logarithmic scaling prevents top-heavy scores.

**Community (15%)**: contributors, dependents, forks, watchers, and maintainers. Measures real ecosystem engagement.

**Maturity (30%)**: project age, version count, PHP version support, and release stability.

### Release Activity

- Cadence: unknown
- Total releases: 1
- Last release: 1875d ago

### Community

Maintainers: [Robert Byrnes](/maintainers/RobertByrnes) ([@RobertByrnes](https://github.com/RobertByrnes))

Top contributors: [RobertByrnes](https://github.com/RobertByrnes) (37 commits)

Tags: php, cli, crawler

### Code Quality

Tests: PHPUnit

PHPackages © 2026

