ixnode/php-web-crawler
======================

PHP Web Crawler - This PHP class lets you recursively crawl a given HTML page (or a given HTML file) and collect data from it.

Version 0.1.24 · MIT license · PHP ^8.2

[Source](https://github.com/ixnode/php-web-crawler) · [Packagist](https://packagist.org/packages/ixnode/php-web-crawler)

PHP Web Crawler
===============

[![Release](https://camo.githubusercontent.com/08056929a607b3b24025db2bf3db8f0b939c8f5e2700f712a6203a1289cb6f2c/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f762f72656c656173652f69786e6f64652f7068702d7765622d637261776c6572)](https://github.com/ixnode/php-web-crawler/releases)[![](https://camo.githubusercontent.com/5c894c5c189963748d3be115bbe5f304b046879fac37a4f50318bbe7ce6a0534/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f72656c656173652d646174652f69786e6f64652f7068702d7765622d637261776c6572)](https://github.com/ixnode/php-web-crawler/releases)[![](https://camo.githubusercontent.com/66c4e4ce5af8cc9e4dd486d25d651ab287a009116db94b4d57b72d0f35d98ff1/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f7265706f2d73697a652f69786e6f64652f7068702d7765622d637261776c65722e737667)](https://camo.githubusercontent.com/66c4e4ce5af8cc9e4dd486d25d651ab287a009116db94b4d57b72d0f35d98ff1/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f7265706f2d73697a652f69786e6f64652f7068702d7765622d637261776c65722e737667)[![PHP](https://camo.githubusercontent.com/9ffda6c94e9634fafbd5ff0f68f442ccb690a9b88c8613f79b9c2ec14f5e596b/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f5048502d253545382e322d3737376262332e7376673f6c6f676f3d706870266c6f676f436f6c6f723d7768697465266c6162656c436f6c6f723d353535353535267374796c653d666c6174)](https://www.php.net/supported-versions.php)[![PHPStan](https://camo.githubusercontent.com/29fcb055286f72da2b9c961f987152d1ac26a91d6cea221174abe7e01b5b7857/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f5048505374616e2d4c6576656c2532304d61782d3737376262332e7376673f7374796c653d666c6174)](https://phpstan.org/user-guide/rule-levels)[![PHPUnit](https://camo.githubusercontent.com/de101a446df383674178b80c404288664983cc6de751737d976d3df023942af9/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f504850556e69742d556e697425323054657374732d3662396264322e7376673f7374796c653d666c6174)](https://phpunit.de)[![PHPCS](https://camo.githubusercontent.com/ee1fdc76b568e414c96c160eb608dd4680475a67d714661b70a3b3c43f6ded54/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f50485043532d50535231322d3431366434652e7376673f7374796c653d666c6174)](https://www.php-fig.org/psr/psr-12/)[![PHPMD](https://camo.githubusercontent.com/223a2afede2bd4bf4d524b7610277838a5d75504f7087fc6c36847b81a80ab43/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f5048504d442d414c4c2d3336346138332e7376673f7374796c653d666c6174)](https://github.com/phpmd/phpmd)[![Rector - Instant Upgrades and Automated Refactoring](https://camo.githubusercontent.com/9c119c037099401c8a61bd1ce556a58158eb09bfe452b2d132b94500e6368714/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f526563746f722d504850253230382e322d3733613136352e7376673f7374796c653d666c6174)](https://github.com/rectorphp/rector)[![LICENSE](https://camo.githubusercontent.com/e5ea4c19422baff789692dd36ee83fc11b6dc67d211dd4a56b4c721e509e8b27/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f6c6963656e73652f69786e6f64652f7068702d6170692d76657273696f6e2d62756e646c65)](https://github.com/ixnode/php-api-version-bundle/blob/master/LICENSE)

> This PHP class lets you recursively crawl a given HTML page (or a given HTML file) and collect data from it. Define the URL (or an HTML file) and a set of XPath expressions that map to the fields of the output data object. The result is a PHP array that can easily be converted to JSON for further processing.

1. Installation
---------------

```
composer require ixnode/php-web-crawler
```

```
vendor/bin/php-web-crawler -V
```

```
php-web-crawler 0.1.0 (02-24-2024 14:46:26) - Björn Hempel
```

2. Usage
--------

### 2.1 PHP Code

```
use Ixnode\PhpWebCrawler\Output\Field;
use Ixnode\PhpWebCrawler\Source\Raw;
use Ixnode\PhpWebCrawler\Value\Text;
use Ixnode\PhpWebCrawler\Value\XpathTextNode;

$rawHtml = parse()->getJsonStringFormatted();
// See below
```

### 2.2 JSON result

```
{
    "version": "1.0.0",
    "title": "Test Title",
    "paragraph": "Test Paragraph"
}
```
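To make the mapping from XPath expressions to output fields concrete, here is a minimal sketch that uses only PHP's built-in DOM extension, not this library's API. The `extractFields()` helper and the input HTML are assumptions made for illustration; the library's own classes (`Raw`, `Field`, `Text`, `XpathTextNode`) may behave differently in detail.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of ixnode/php-web-crawler): evaluates one
 * XPath expression per output field and keeps the text of the first match.
 *
 * @param array<string, string> $fieldToXpath field name => XPath expression
 * @return array<string, string|null>
 */
function extractFields(string $html, array $fieldToXpath): array
{
    $document = new DOMDocument();

    // Collect libxml warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);
    $result = [];

    foreach ($fieldToXpath as $field => $expression) {
        $nodes = $xpath->query($expression);
        $node = $nodes === false ? null : $nodes->item(0);
        $result[$field] = $node === null ? null : trim($node->textContent);
    }

    return $result;
}

// Assumed input; the HTML used by the original example is not shown.
$html = <<<HTML
<html>
    <head><title>Test Page</title></head>
    <body>
        <h1>Test Title</h1>
        <p>Test Paragraph</p>
    </body>
</html>
HTML;

// A fixed value ("version") combined with two XPath-driven fields.
$data = ['version' => '1.0.0'] + extractFields($html, [
    'title' => '//h1',
    'paragraph' => '//p',
]);

echo json_encode($data, JSON_PRETTY_PRINT | JSON_THROW_ON_ERROR), PHP_EOL;
```

The resulting array has the same three keys as the JSON result above: a constant `version` plus the text of the first `h1` and the first `p`.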

3. Advanced usage
-----------------

### 3.1 Group

#### PHP Code

```
use Ixnode\PhpWebCrawler\Output\Field;
use Ixnode\PhpWebCrawler\Output\Group;
use Ixnode\PhpWebCrawler\Source\Raw;
use Ixnode\PhpWebCrawler\Value\XpathTextNode;

$rawHtml = parse()->getJsonStringFormatted();
// See below
```

#### JSON result

```
{
  "title": "Test Page",
  "content": {
    "header": {
      "h1": "Test Title"
    },
    "text": {
      "p1": "Test Paragraph 1",
      "p2": "Test Paragraph 2"
    }
  }
}
```
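The nested shape of a grouped result can be illustrated with a small recursive sketch built on PHP's DOM extension. This is not the library's `Group` implementation: `extractNested()` and the input HTML are hypothetical, and the sketch only shows how a nested map of XPath expressions can produce a nested array of the same shape.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of ixnode/php-web-crawler): walks a nested
 * map whose leaves are XPath expressions and returns an array of the same
 * shape, with each leaf replaced by the text of the first matching node.
 *
 * @param array<string, mixed> $map
 * @return array<string, mixed>
 */
function extractNested(DOMXPath $xpath, array $map): array
{
    $result = [];

    foreach ($map as $key => $value) {
        if (is_array($value)) {
            // A nested array acts as a group: recurse and keep the nesting.
            $result[$key] = extractNested($xpath, $value);
            continue;
        }

        $nodes = $xpath->query($value);
        $node = $nodes === false ? null : $nodes->item(0);
        $result[$key] = $node === null ? null : trim($node->textContent);
    }

    return $result;
}

// Assumed input; the HTML used by the original example is not shown.
$html = <<<HTML
<html>
    <head><title>Test Page</title></head>
    <body>
        <h1>Test Title</h1>
        <p>Test Paragraph 1</p>
        <p>Test Paragraph 2</p>
    </body>
</html>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);

$data = extractNested(new DOMXPath($document), [
    'title' => '//title',
    'content' => [
        'header' => ['h1' => '//h1'],
        'text' => ['p1' => '(//p)[1]', 'p2' => '(//p)[2]'],
    ],
]);

echo json_encode($data, JSON_PRETTY_PRINT | JSON_THROW_ON_ERROR), PHP_EOL;
```

Keeping the output structure in the map itself means the nesting of the result is decided by the caller, independently of how the elements are nested in the HTML.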

### 3.2 XpathSection

#### PHP Code

```
use Ixnode\PhpWebCrawler\Output\Field;
use Ixnode\PhpWebCrawler\Output\Group;
use Ixnode\PhpWebCrawler\Source\Raw;
use Ixnode\PhpWebCrawler\Source\XpathSection;
use Ixnode\PhpWebCrawler\Value\XpathTextNode;

$rawHtml = parse()->getJsonStringFormatted();
// See below
```

#### JSON result

```
{
    "title": "Test Page",
    "content": {
        "header": {
            "h1": "Test Title"
        },
        "text": {
            "p1": "Test Paragraph 1",
            "p2": "Test Paragraph 2"
        }
    }
}
```

### 3.3 XpathSection (flat)

#### PHP Code

```
use Ixnode\PhpWebCrawler\Output\Field;
use Ixnode\PhpWebCrawler\Output\Group;
use Ixnode\PhpWebCrawler\Source\Raw;
use Ixnode\PhpWebCrawler\Source\XpathSections;
use Ixnode\PhpWebCrawler\Value\XpathTextNode;

$rawHtml = parse()->getJsonStringFormatted();
// See below
```

#### JSON result

```
{
    "title": "Test Page",
    "hits": [
        [
            "Test Item 1",
            "Test Item 2"
        ]
    ]
}
```

### 3.4 XpathSection (structured)

#### PHP Code

```
use Ixnode\PhpWebCrawler\Output\Field;
use Ixnode\PhpWebCrawler\Output\Group;
use Ixnode\PhpWebCrawler\Source\Raw;
use Ixnode\PhpWebCrawler\Source\XpathSections;
use Ixnode\PhpWebCrawler\Value\XpathTextNode;

$rawHtml = parse()->getJsonStringFormatted();
// See below
```

#### JSON result

```
{
    "title": "Test Page",
    "hits": [
        {
            "caption": "Caption 1",
            "content": "Cell 1"
        },
        {
            "caption": "Caption 2",
            "content": "Cell 2"
        }
    ]
}
```
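A structured list of hits like the one above is typically built by selecting every repeated section first and then evaluating field expressions relative to each section. The sketch below shows that pattern with PHP's DOM extension only. `extractSections()` and the table markup are assumptions for illustration; they are not taken from the library's `XpathSections` class.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of ixnode/php-web-crawler): selects every
 * node matching $sectionExpression and evaluates the field expressions
 * relative to each of them, producing one associative array per section.
 *
 * @param array<string, string> $fieldToXpath field name => relative XPath
 * @return list<array<string, string|null>>
 */
function extractSections(DOMXPath $xpath, string $sectionExpression, array $fieldToXpath): array
{
    $hits = [];
    $sections = $xpath->query($sectionExpression);

    if ($sections === false) {
        return $hits;
    }

    foreach ($sections as $section) {
        $hit = [];

        foreach ($fieldToXpath as $field => $expression) {
            // The second argument makes the expression relative to $section.
            $nodes = $xpath->query($expression, $section);
            $node = $nodes === false ? null : $nodes->item(0);
            $hit[$field] = $node === null ? null : trim($node->textContent);
        }

        $hits[] = $hit;
    }

    return $hits;
}

// Assumed input; the HTML used by the original example is not shown.
$html = <<<HTML
<html>
    <head><title>Test Page</title></head>
    <body>
        <table>
            <tr><th>Caption 1</th><td>Cell 1</td></tr>
            <tr><th>Caption 2</th><td>Cell 2</td></tr>
        </table>
    </body>
</html>
HTML;

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

$titleNode = $xpath->query('//title')->item(0);

$data = [
    'title' => $titleNode === null ? null : trim($titleNode->textContent),
    'hits' => extractSections($xpath, '//table//tr', [
        'caption' => './th',
        'content' => './td',
    ]),
];

echo json_encode($data, JSON_PRETTY_PRINT | JSON_THROW_ON_ERROR), PHP_EOL;
```

The leading `./` in the field expressions matters: an expression starting with `//` would search the whole document again and return the first row's values for every hit.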

4. More examples
----------------

- [examples/converter.php](examples/converter.php)
- [examples/group.php](examples/group.php)
- [examples/section.php](examples/section.php)
- [examples/sections-recursive-url.php](examples/sections-recursive-url.php)
- [examples/sections.php](examples/sections.php)
- [examples/simple-wiki-page.php](examples/simple-wiki-page.php)

5. Development
--------------

```
git clone git@github.com:ixnode/php-web-crawler.git && cd php-web-crawler
```

```
composer install
```

```
composer test
```

6. License
----------

This library is licensed under the MIT License - see the [LICENSE.md](/LICENSE.md) file for details.

