
Active · Library · [Utility & Helpers](/categories/utility)

gyaaniguy/pcrawl
================

PHP web scraping and crawling library, with support for multiple clients, fast parsing, debugging, and on-the-fly changes to various options.

0.10-alpha (3y ago) · 28 · BSD-4-Clause · PHP · PHP >=7.4

Since Feb 22 · Pushed 3y ago · 2 watchers

[Source](https://github.com/gyaaniguy/PCrawl) · [Packagist](https://packagist.org/packages/gyaaniguy/pcrawl)

This library is in the alpha stage.
-----------------------------------


PCrawl
======


PCrawl is a PHP library for crawling and scraping web pages.
It supports multiple HTTP clients (cURL and Guzzle) and provides options to debug, modify, and parse responses.

Features
--------


- Rapidly create custom clients, and fluently change clients and client options (such as the user agent) with method chaining.
- Modify responses using reusable callback functions.
- Debug responses using different criteria, such as HTTP code or regex.
- Parse responses using the QueryPath library. Several convenience functions are provided.
- Fluent API: debuggers, clients, and response-modification objects can be swapped on the fly.
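PCrawl modifies responses through its own response-mod objects. The sketch below illustrates the idea of reusable callbacks without the library, by chaining plain PHP closures over a body string. `applyModifiers()`, `$stripScripts`, and `$collapseWhitespace` are hypothetical names and are not part of PCrawl's API.

```php
<?php
// Hypothetical sketch: reusable body modifiers as plain callables.
// This is not PCrawl's API. It only illustrates reusable modification callbacks.

/** Apply each modifier in order. Each one takes the body string and returns a new one. */
function applyModifiers(string $body, callable ...$modifiers): string
{
    foreach ($modifiers as $modify) {
        $body = $modify($body);
    }
    return $body;
}

// Remove <script>...</script> blocks (case-insensitive, spanning newlines).
$stripScripts = fn(string $b): string => preg_replace('#<script\b[^>]*>.*?</script>#is', '', $b);

// Collapse runs of whitespace into a single space and trim both ends.
$collapseWhitespace = fn(string $b): string => trim(preg_replace('/\s+/', ' ', $b));

$html = "<p>Hi</p>\n<script>track();</script>\n  <p>there</p>";
echo applyModifiers($html, $stripScripts, $collapseWhitespace), PHP_EOL; // <p>Hi</p> <p>there</p>
```

Each modifier is an ordinary callable, so the same modifiers can be reused across responses and combined in any order.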

Full Example
------------


First we'll fetch a page with a misconfigured client. Next we'll detect the problem with a debugger. Finally we'll change the client options so the page is fetched correctly.

- Set up some clients

```
// A simple client.
$gu = new GuzzleClient();

// Custom Client, that does not allow redirects.
$uptightNoRedirectClient = new CurlClient();
$uptightNoRedirectClient->setRedirects(0); // disable redirects

// Custom client: a thin wrapper around CurlClient that forces HTTPS.
class ConvertToHttpsClient extends CurlClient
{
    public function get(string $url, array $options = []): PResponse
    {
        // Rewrite only a leading http:// scheme, not later occurrences (e.g. in a query string).
        $url = preg_replace('#^http://#i', 'https://', $url);
        return parent::get($url, $options);
    }
}
```
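The URL rewrite in `ConvertToHttpsClient::get()` can be tested on its own, without making any requests. The sketch below uses a hypothetical helper, `forceHttps()`, that anchors the match to the start of the URL. An `http://` that appears later in the URL, for example inside a query parameter, is left untouched.

```php
<?php
// Hypothetical standalone helper mirroring the scheme rewrite in the custom client.
function forceHttps(string $url): string
{
    // '^' anchors the pattern so only the leading scheme is replaced; 'i' ignores case.
    return preg_replace('#^http://#i', 'https://', $url);
}

echo forceHttps('http://example.com/?next=http://other.test'), PHP_EOL;
// https://example.com/?next=http://other.test
```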

- Let's make some debugger objects

```
$redirectDetector = new ResponseDebug();
$redirectDetector->setMustNotExistHttpCodes([301, 302, 303, 307, 308]);
$fullPageDetector = new ResponseDebug();
$fullPageDetector->setMustExistRegex(['##']);
```
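The two debuggers above express simple pass/fail rules. The first rejects redirect status codes, and the second requires the body to match given regexes. The following is a rough approximation of those rules in plain PHP, without the library. `checkResponse()` is a hypothetical helper and the `#</html>#i` pattern is an illustrative choice. Neither reflects how `ResponseDebug` is actually implemented.

```php
<?php
// Hypothetical sketch of the two kinds of checks configured above; not PCrawl's ResponseDebug.

/**
 * Return a list of failure messages. An empty list means the response passed.
 *
 * @param int[]    $mustNotExistCodes HTTP codes that count as failures
 * @param string[] $mustExistRegexes  patterns the body must match
 */
function checkResponse(int $status, string $body, array $mustNotExistCodes, array $mustExistRegexes): array
{
    $failures = [];
    if (in_array($status, $mustNotExistCodes, true)) {
        $failures[] = "unexpected HTTP code $status";
    }
    foreach ($mustExistRegexes as $regex) {
        if (preg_match($regex, $body) !== 1) {
            $failures[] = "body does not match $regex";
        }
    }
    return $failures;
}

$redirectCodes = [301, 302, 303, 307, 308];
var_dump(checkResponse(301, '', $redirectCodes, []));               // one failure
var_dump(checkResponse(200, '<html></html>', [], ['#</html>#i'])); // no failures
```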

##### Start fetching!


For testing, we fetch the page with a client that does not follow redirects. We then use `$redirectDetector` to detect a redirect status code such as 301. If one is found, we change the client option to allow redirects and fetch again.

```
$req = new Request();
$url = "http://www.whatsmyua.info";
$req->setClient($uptightNoRedirectClient);
$attempts = 0;
do {
    $res = $req->get($url);
    $redirectDetector->setResponse($res);
    $failed = $redirectDetector->isFail();
    if ($failed) {
        var_dump($redirectDetector->getFailDetail()); // why the check failed
        $uptightNoRedirectClient->setRedirects(1);    // allow redirects, then retry
    }
} while ($failed && $attempts++ < 1); // retry at most once
```
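The loop above follows a fetch, detect, adjust, retry pattern. That control flow can be sketched with plain callables. `fetchWithRetry()` and the simulated client below are hypothetical stand-ins for `Request::get()` and the redirect setting, and returning a status code in place of a response is a simplification so the example runs without network access.

```php
<?php
// Hypothetical sketch of fetch -> detect -> adjust -> retry using plain callables.

/**
 * @param callable $fetch  fn(): mixed      performs one fetch
 * @param callable $isFail fn(mixed): bool  true if the result is bad
 * @param callable $adjust fn(): void       changes settings before a retry
 */
function fetchWithRetry(callable $fetch, callable $isFail, callable $adjust, int $maxRetries = 1)
{
    $result = $fetch();
    for ($retry = 0; $retry < $maxRetries && $isFail($result); $retry++) {
        $adjust();          // e.g. enable redirects
        $result = $fetch(); // fetch again with the new settings
    }
    return $result;
}

// Simulated client: returns 301 until redirects are enabled, then 200.
$followRedirects = false;
$fetch  = function () use (&$followRedirects): int { return $followRedirects ? 200 : 301; };
$isFail = fn(int $code): bool => in_array($code, [301, 302, 303, 307, 308], true);
$adjust = function () use (&$followRedirects): void { $followRedirects = true; };

echo fetchWithRetry($fetch, $isFail, $adjust), PHP_EOL; // 200
```

Only one fetch happens per attempt, and `$maxRetries` bounds the loop even if the adjustment does not fix the problem.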

Use `$fullPageDetector` to check that a complete page was received, then parse the response body with `ParserCommon`:

```
if ($fullPageDetector->setResponse($res)->isFail()) {
    var_dump($fullPageDetector->getFailDetail()); // report why the full-page check failed
} else {
    $parser = new ParserCommon($res->getBody());
    $h1 = $parser->find('h1')->text();
    $htmlClass = $parser->find('html')->attr('class');
}
```
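PCrawl's parser relies on the QueryPath library. For comparison, the sketch below performs the same two lookups: the `h1` text and the `class` attribute of `html`. It swaps in PHP's built-in `DOMDocument`/`DOMXPath` in place of QueryPath and assumes the `dom` extension is enabled, which it is by default. The sample HTML string is illustrative.

```php
<?php
// Sketch using PHP's built-in DOM extension instead of QueryPath.

$html = '<html class="no-js"><body><h1> Hello </h1></body></html>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them (real pages are often malformed).
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$h1Node = $xpath->query('//h1')->item(0);
$h1 = $h1Node !== null ? trim($h1Node->textContent) : null;
$htmlClass = $doc->documentElement->getAttribute('class');

echo $h1, PHP_EOL;        // Hello
echo $htmlClass, PHP_EOL; // no-js
```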

> Note: debuggers, clients, and parsers can all be reused.

### Detailed Usage


Detailed usage is documented in four parts:

- [Fetching a page](docs/Fetching.md)
- [Modifying the response body](docs/Modify_Response.md)
- [Debugging the response](docs/Debugging_Response.md)
- [Parsing the response body](docs/Parse_Response.md)

Installation
------------


- Composer:

```
composer init                          # for new projects
composer config minimum-stability dev  # required until a stable release is tagged
composer require gyaaniguy/pcrawl
composer update
require __DIR__ . '/vendor/autoload.php'; // in PHP
```

- GitHub:

```
git clone git@github.com:gyaaniguy/PCrawl.git /desired/location/PCrawl # clone the repo to the desired location
cd /desired/location/PCrawl
composer update # install dependencies
require '/desired/location/PCrawl/vendor/autoload.php'; // in PHP
```

### TODO list


- Leverage Guzzle's asynchronous request support

### Standards


- PSR-12
- PHPUnit tests

### Health Score

20 (Low; better than 13% of packages)

- Maintenance: 20 (infrequent updates; may be unmaintained)
- Popularity: 7 (limited adoption so far)
- Community: 8 (small or concentrated contributor base)
- Maturity: 39 (early-stage or recently created project)
- Bus factor: 1 (top contributor holds 100% of commits, a single point of failure)


### Release Activity

- Cadence: every ~11 days (recently every ~25 days)
- Total releases: 11
- Last release: 1117 days ago

### Community

- Maintainers: [gyaaniguy](/maintainers/gyaaniguy)
- Top contributors: [gyaaniguy](https://github.com/gyaaniguy) (94 commits)

Tags: php, crawler, scraping, web-scraping, web-crawler, php web scraping, web scraping library

### Code Quality

- Tests: PHPUnit


### Alternatives

- [crwlr/crawler](/packages/crwlr-crawler): Web crawling and scraping library.
- [dusterio/link-preview](/packages/dusterio-link-preview): Link preview generation for PHP with Laravel support.
- [oat-sa/tao-core](/packages/oat-sa-tao-core): TAO core extension.
- [eddieace/php-simple](/packages/eddieace-php-simple)
- [eslazarev/wildberries-sdk](/packages/eslazarev-wildberries-sdk): Wildberries OpenAPI clients (generated).

