fozbek/scrawler
===============

Simple, schema based scraping tool

v1.1.1 (11mo ago) · MIT · PHP ^8.1

[Source](https://github.com/fozbek/scrawler) · [Packagist](https://packagist.org/packages/fozbek/scrawler)

Scrawler
========

A modern, schema-based web scraping library for PHP with powerful transformers and a clean, intuitive syntax. Perfect for both manual use and API integration.

[![License: MIT](https://camo.githubusercontent.com/fdf2982b9f5d7489dcf44570e714e3a15fce6253e0cc6b5aa61a075aac2ff71b/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4c6963656e73652d4d49542d79656c6c6f772e737667)](https://opensource.org/licenses/MIT)[![PHP Version](https://camo.githubusercontent.com/acffb6ae1962992d26e4466782832787e79504a6250f80d732c4283458b9f497/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f7068702d253545382e312d626c75652e737667)](https://php.net)

Features
--------

- **Intuitive Schema Syntax**: Easy to write by hand and by AI
- **Built-in Transformers**: 20+ transformers for data manipulation (trim, float, int, upper, lower, etc.)
- **Flexible Lists**: Support for limit and offset
- **JSON-Friendly**: Perfect for API usage
- **Type-Safe**: Full PHPStan max level compliance
- **Clean Architecture**: SOLID principles, no anti-patterns
- **Well-Tested**: 47 tests, 107 assertions

Installation
------------

```
composer require fozbek/scrawler
```

Quick Start
-----------

```
use Scrawler\Bootstrap;
use Scrawler\Scrawler;

// Handle PHP 8.4 deprecation warnings from vendor libraries (optional)
Bootstrap::init();

$scrawler = new Scrawler();

$schema = [
    'title' => 'h1',
    'price' => ['span.price', 'trim|float'],
    'items' => [
        'li' => [
            'text' => [null, 'trim|upper']
        ],
        'limit' => 5
    ]
];

$data = $scrawler->scrape('https://example.com', $schema);
```

### PHP 8.4 Compatibility

If you're running PHP 8.4+, you may see deprecation warnings from vendor libraries (DiDom, Guzzle) related to implicitly nullable parameters. These are harmless but can clutter output. Use `Bootstrap::init()` to suppress these vendor-specific warnings:

```
use Scrawler\Bootstrap;

Bootstrap::init(); // Call once at the start of your script
```

This only suppresses deprecation warnings from vendor code, keeping your own code's warnings intact.
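
How `Bootstrap::init()` does this internally is not shown here. As a rough sketch of the general technique only, a handler registered for deprecation notices can suppress those raised from files under `vendor/` and pass everything else through. The helper name `isVendorFile` is hypothetical and not part of the library:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: true when the path lies inside a Composer vendor directory.
 */
function isVendorFile(string $file): bool
{
    // Normalise Windows separators so one check covers both platforms.
    return str_contains(str_replace('\\', '/', $file), '/vendor/');
}

// Handle only deprecation notices; every other error level is left untouched.
set_error_handler(
    static function (int $errno, string $errstr, string $errfile): bool {
        // Returning true marks the notice as handled (suppressed);
        // returning false passes it on to PHP's standard error handler.
        return isVendorFile($errfile);
    },
    E_DEPRECATED | E_USER_DEPRECATED
);
```

With a handler like this, a deprecation triggered in your own `src/` files would still be reported as usual.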

Schema Syntax
-------------

### Simple Text Extraction

```
$schema = [
    'title' => 'h1',
    'description' => '.content p'
];
```

### Attribute Extraction

```
$schema = [
    'image' => 'img@src',
    'link' => 'a@href',
    'dataId' => 'div@data-id'
];
```

**Extracting attributes from the current element** (useful in lists):

```
$schema = [
    'items' => [
        '.product' => [
            'id' => '@id',              // Get id attribute from .product element
            'data' => '@data-value',    // Get data-value attribute
            'name' => '.title'          // Get text from nested .title
        ]
    ]
];
```
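
To make the `selector@attribute` notation concrete, here is a minimal sketch of how such an expression could be split into its two parts. The function `splitSelector` is hypothetical and only illustrates the notation; the library's own parsing may differ:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical parser for the "selector@attribute" notation.
 *
 * @return array{selector: ?string, attribute: ?string}
 */
function splitSelector(string $expression): array
{
    $at = strrpos($expression, '@');
    if ($at === false) {
        // No "@": the whole expression is a CSS selector and the text is wanted.
        return ['selector' => $expression, 'attribute' => null];
    }

    $selector = substr($expression, 0, $at);

    return [
        // An empty selector part ("@id") refers to the current element.
        'selector' => $selector === '' ? null : $selector,
        'attribute' => substr($expression, $at + 1),
    ];
}
```

Under these assumptions, `splitSelector('img@src')` yields the selector `img` with attribute `src`, while `splitSelector('@id')` yields a `null` selector (the current element) with attribute `id`.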

### Transformers

Apply transformations using pipe-separated transformer names:

```
$schema = [
    'price' => ['span.price', 'trim|float'],
    'name' => ['.product-name', 'trim|upper'],
    'url' => ['a@href', 'urldecode']
];
```

**Available Transformers:**

**Type Conversions:**

- `int`, `float`, `bool`, `string`

**String Operations:**

- `trim`, `ltrim`, `rtrim`
- `upper`, `lower`, `ucfirst`, `ucwords`
- `strip_tags`

**URL/Path:**

- `basename`, `dirname`
- `urlencode`, `urldecode`

**Parsing:**

- `json` - decode JSON strings
- `timestamp` - convert dates to Unix timestamp

**Utility:**

- `abs` - absolute value
- `md5`, `sha1` - hashing
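
A pipe-separated chain such as `trim|float` reads left to right: each step receives the previous step's output. The sketch below shows that idea with a small subset of names mapped onto PHP built-ins. The function `applyTransformers` is hypothetical, and the plain casts used here may behave differently from the library's own transformers (for example with currency symbols):

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical pipeline runner: applies each named step from left to right.
 */
function applyTransformers(mixed $value, string $pipeline): mixed
{
    // Illustrative subset of names mapped onto PHP built-ins.
    $transformers = [
        'trim'      => static fn ($v) => trim((string) $v),
        'upper'     => static fn ($v) => strtoupper((string) $v),
        'lower'     => static fn ($v) => strtolower((string) $v),
        'ucwords'   => static fn ($v) => ucwords((string) $v),
        'int'       => static fn ($v) => (int) $v,
        'float'     => static fn ($v) => (float) $v,
        'urldecode' => static fn ($v) => urldecode((string) $v),
    ];

    foreach (explode('|', $pipeline) as $name) {
        if (!isset($transformers[$name])) {
            throw new InvalidArgumentException("Unknown transformer: {$name}");
        }
        $value = $transformers[$name]($value);
    }

    return $value;
}
```

Order matters in such a chain: `trim|float` strips whitespace before the cast, which is why trimming usually comes first.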

### Lists (New Syntax)

**Simple list:**

```
$schema = [
    'items' => [
        'li' => [
            'text' => null  // Current element text
        ]
    ]
];
```

**List with transformers:**

```
$schema = [
    'products' => [
        '.product' => [
            'name' => ['.name', 'trim|ucwords'],
            'price' => ['.price', 'trim|float']
        ]
    ]
];
```

**List with limit and offset:**

```
$schema = [
    'items' => [
        'li' => ['text' => null],
        'limit' => 10,    // Return at most 10 items
        'offset' => 5     // Skip the first 5 matches before the limit applies
    ]
];
```
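
The combined effect of `offset` and `limit` can be pictured with `array_slice()`: skip the first matches, then keep at most `limit` of what remains. The helper `sliceMatches` below is a hypothetical illustration of these semantics, not the library's code:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: skip $offset matches, then keep at most $limit of the rest.
 *
 * @param list<mixed> $matches
 * @return list<mixed>
 */
function sliceMatches(array $matches, int $offset = 0, ?int $limit = null): array
{
    // array_slice() reindexes list keys and treats a null length as "to the end".
    return array_slice($matches, $offset, $limit);
}
```

For five matched elements, an offset of 1 and a limit of 3 would keep the second, third, and fourth.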

**Old syntax still supported:**

```
$schema = [
    'items' => [
        'list-selector' => 'li',
        'content' => [
            'text' => null
        ]
    ]
];
```

### Nested Lists

```
$schema = [
    'categories' => [
        '.category' => [
            'name' => '.category-name',
            'products' => [
                '.product' => [
                    'name' => ['.name', 'trim'],
                    'price' => ['.price', 'trim|float']
                ],
                'limit' => 5
            ]
        ]
    ]
];
```

Examples
--------

### Scraping with Transformers

```
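$html = '
    <div class="product">
        <h2>  wireless headphones  </h2>
        <span class="price">  $59.99  </span>
        <a href="/products/item%20123">Details</a>
    </div>
';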

$schema = [
    'name' => ['h2', 'trim|ucwords'],
    'price' => ['.price', 'trim|float'],
    'url' => ['a@href', 'urldecode']
];

$result = $scrawler->scrape($html, $schema, true);

// Output:
// [
//     'name' => 'Wireless Headphones',
//     'price' => 59.99,
//     'url' => '/products/item 123'
// ]
```

### Scraping Lists with Limits

```
$html = '<ul><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li></ul>';

$schema = [
    'items' => [
        'li' => ['text' => null],
        'offset' => 1,
        'limit' => 3
    ]
];

$result = $scrawler->scrape($html, $schema, true);

// Output: ['items' => [['text' => '2'], ['text' => '3'], ['text' => '4']]]
```

### Complex Real-World Example

```
$schema = [
    'title' => ['h1', 'trim|upper'],
    'author' => '.meta .author',
    'publishedAt' => ['.meta .date', 'timestamp'],
    'content' => ['.content', 'trim|strip_tags'],
    'tags' => [
        '.tag' => [
            'name' => [null, 'trim|lower'],
            'url' => ['a@href', 'urldecode']
        ],
        'limit' => 10
    ]
];
```

JSON API Usage
--------------

The schema syntax is designed to work seamlessly with JSON:

```
{
  "title": ["h1", "trim|upper"],
  "price": ["span.price", "trim|float"],
  "products": {
    ".product": {
      "name": [".name", "trim"],
      "price": [".price", "trim|float"]
    },
    "limit": 10,
    "offset": 0
  }
}
```
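
On the PHP side, a JSON schema like this can be turned into the array form with `json_decode()`. The following is a minimal sketch, assuming the JSON arrives as a string (for instance from a request body); the variable names are illustrative:

```php
<?php
declare(strict_types=1);

$json = <<<'JSON'
{
  "title": ["h1", "trim|upper"],
  "products": {
    ".product": {
      "name": [".name", "trim"],
      "price": [".price", "trim|float"]
    },
    "limit": 10,
    "offset": 0
  }
}
JSON;

// The second argument (true) yields associative arrays instead of stdClass objects;
// JSON_THROW_ON_ERROR turns malformed input into a JsonException.
$schema = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

// $schema now has the same shape as a hand-written PHP schema array and
// could be passed on, e.g. $scrawler->scrape($url, $schema).
```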

**Note:** Callbacks and filtering should be handled by the API consumer after receiving the data.
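
As one possible way to do that on the consumer side, standard array functions are often enough. The data below is a hypothetical scrape result, shaped like the list examples above:

```php
<?php
declare(strict_types=1);

// Hypothetical scrape result; the values are made up for illustration.
$data = [
    'products' => [
        ['name' => 'Headphones', 'price' => 59.99],
        ['name' => 'Cable', 'price' => 4.5],
        ['name' => 'Speaker', 'price' => 120.0],
    ],
];

// Keep products costing at least 50; array_values() restores 0-based keys.
$expensive = array_values(array_filter(
    $data['products'],
    static fn (array $product): bool => $product['price'] >= 50.0
));

// Callback-style post-processing: collect just the names.
$names = array_column($expensive, 'name');
```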

### Custom HTTP Client

```
use GuzzleHttp\Client;
use Scrawler\Scrawler;

$client = new Client([
    'timeout' => 30,
    'headers' => ['User-Agent' => 'My Bot/1.0'],
    'proxy' => 'http://proxy.example.com:8080'
]);

$scrawler = new Scrawler($client);
```

Testing
-------

```
# Run all tests
composer test

# Run specific test
./vendor/bin/phpunit tests/ScrawlerNewSyntaxTest.php

# With coverage
composer coverage
```

Static Analysis
---------------

```
composer analyse
```

**PHPStan Level:** Max (strictest)

Requirements
------------

- PHP 8.1 or higher
- ext-dom
- Guzzle 6.0 or 7.0+
- DiDom 2.0+

License
-------

MIT License - see [LICENSE](LICENSE)

Contributing
------------

Contributions welcome! Please ensure:

- All tests pass
- PHPStan analysis passes
- Code follows PSR-12

Author
------

Fatih Özbek

Credits
-------

- [Guzzle](https://github.com/guzzle/guzzle) - HTTP client
- [DiDom](https://github.com/Imangazaliev/DiDom) - DOM parsing
