coderden/page-parser
====================

Powerful PHP package for parsing HTML pages with DOM, XPath and CSS selector support

Version 1.0.0 (released 5 months ago), MIT license, requires PHP ^8.1. Listed since Jan 15; last pushed 5 months ago.

[Source](https://github.com/dnsinyukov/page-parser) | [Packagist](https://packagist.org/packages/coderden/page-parser)


Page Parser for PHP
===================

[![PHP Version](https://camo.githubusercontent.com/f870cee2a2e2a442c6b62c8bf79f45ec0ce794dc5af13834902518c9107230f9/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f7068702d382e312532422d626c75652e737667)](https://packagist.org/packages/coderden/page-parser)[![License](https://camo.githubusercontent.com/8bb50fd2278f18fc326bf71f6e88ca8f884f72f179d3e555e20ed30157190d0d/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f6c6963656e73652d4d49542d677265656e2e737667)](LICENSE.md)

A powerful PHP package for parsing HTML pages with DOM, XPath, and CSS selector support. Perfect for web scraping, data extraction, and automation tasks.

Features
--------

- 🔍 **XPath and CSS selector support** for precise element targeting
- 🌐 **HTTP client integration** with Guzzle
- 🔗 **Automatic URL resolution** for relative links and images
- 📦 **Multiple extraction methods** for text, attributes, and HTML
- 🔄 **Meta tag parsing** and SEO data extraction
- 🚀 **Easy-to-use fluent interface**
- 📊 **Response handling** with status codes and headers
- 🛡️ **Error handling** and exception management

Installation
------------

```
composer require coderden/page-parser
```

Quick Start
-----------

```
use CoderDen\PageParser\PageParser;

// Create parser instance
$parser = new PageParser();

// Load and parse a page
$parser->loadPage('https://example.com');

// Get page title
echo $parser->getTitle();

// Extract all links
$links = $parser->getAllLinks();

// Extract specific elements
$products = $parser->extractByXPath('//div[@class="product"]', [
    'name' => './/h3/text()',
    'price' => './/span[@class="price"]/text()',
    'url' => './/a/@href',
]);
```
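
To make the "container XPath plus field map" pattern above concrete, here is a minimal sketch of the same idea using only PHP's built-in DOM extension on an inline HTML string. `extractFields()` is a hypothetical helper written for this illustration; it is not part of this package and may differ from how `extractByXPath()` behaves internally (for example, in how it treats multiple matches).

```php
<?php

// Sketch only: mimics the "container XPath + field map" pattern with
// DOMDocument and DOMXPath. extractFields() is a hypothetical helper.
function extractFields(string $html, string $containerXPath, array $fields): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $rows = [];

    foreach ($xpath->query($containerXPath) as $container) {
        $row = [];
        foreach ($fields as $name => $expression) {
            // Relative expressions (".//...") are evaluated against the container.
            $nodes = $xpath->query($expression, $container);
            $row[$name] = ($nodes !== false && $nodes->length > 0)
                ? trim((string) $nodes->item(0)->nodeValue)
                : null; // field not found in this container
        }
        $rows[] = $row;
    }

    return $rows;
}

$html = <<<'HTML'
<html><body>
<div class="product"><h3>Lamp</h3><span class="price">19.99</span><a href="/lamp">View</a></div>
<div class="product"><h3>Desk</h3><span class="price">120.00</span><a href="/desk">View</a></div>
</body></html>
HTML;

$products = extractFields($html, '//div[@class="product"]', [
    'name'  => './/h3/text()',
    'price' => './/span[@class="price"]/text()',
    'url'   => './/a/@href',
]);

print_r($products);
```

Note that `@class="product"` only matches when the attribute is exactly `product`; elements carrying additional classes need a `contains(...)` test instead.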

Basic Usage
-----------

### Using PageParser Directly

```
use CoderDen\PageParser\PageParser;

// Constructor options are passed through to the HTTP client
$parser = new PageParser([
    'timeout' => 30,
    'headers' => [
        'User-Agent' => 'MyBot/1.0',
    ],
]);

// Load page
$parser->loadPage('https://example.com');

// Extract by XPath
$data = $parser->extractByXPath('//article', [
    'title' => './/h2/text()',
    'content' => './/p/text()',
]);

// Extract by CSS selector
$links = $parser->extractByCss('a.article-link', ['href', '_text']);

// Check element existence
if ($parser->exists('.pagination')) {
    echo 'Pagination found!';
}

// Get element count
$imageCount = $parser->count('img');
```
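
As a rough illustration of what existence and count checks amount to at the DOM level, the sketch below uses PHP's built-in `DOMXPath` on an inline document. `classToXPath()` is a hypothetical helper that handles only a bare class name; real CSS-to-XPath conversion covers far more syntax, and this is not a description of how the package implements `exists()` or `count()`.

```php
<?php

// Hypothetical helper: build an XPath that matches elements whose class
// attribute contains $className as a whole, space-separated token.
function classToXPath(string $className): string
{
    return sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $className
    );
}

$dom = new DOMDocument();
$dom->loadHTML(
    '<html><body><ul class="pagination wide"><li>1</li></ul>'
    . '<img src="a.png"><img src="b.png"></body></html>'
);
$xpath = new DOMXPath($dom);

// Existence check: at least one node matches.
$paginationExists = $xpath->query(classToXPath('pagination'))->length > 0;

// Count check: XPath count() returns a float, so cast it.
$imageCount = (int) $xpath->evaluate('count(//img)');

// Token matching avoids false positives on partial class names.
$partialMatch = $xpath->query(classToXPath('pag'))->length > 0;

var_dump($paginationExists, $imageCount, $partialMatch);
```

The `concat`/`normalize-space` wrapper is what lets `pagination` match `class="pagination wide"` while `pag` does not.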

### Using ParserHelper

```
use CoderDen\PageParser\ParserHelper;

// Quick extraction
$links = ParserHelper::extractLinks('https://example.com');

// Get page title
$title = ParserHelper::getTitle('https://example.com');

// Extract specific data
$products = ParserHelper::extract(
    'https://example.com/products',
    '//div[@class="product-item"]',
    ['name' => './/h3/text()', 'price' => './/span[@class="price"]/text()']
);

// Check URL availability
if (ParserHelper::checkUrl('https://example.com')) {
    echo 'URL is accessible';
}
```

Advanced Features
-----------------

### Meta Data Extraction

```
$parser = new PageParser();
$parser->loadPage('https://example.com');

// Get meta tags
$metaTags = $parser->getMetaTags();

// Get canonical URL
$canonical = $parser->getCanonicalUrl();

// Get page charset
$charset = $parser->getCharset();

// Get Open Graph data
$ogTitle = $parser->getAttribute('meta[property="og:title"]', 'content');
$ogImage = $parser->getAttribute('meta[property="og:image"]', 'content');
```
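
For readers curious what meta tag and canonical URL extraction involves, the following sketch does it by hand with PHP's built-in DOM extension. It is an assumption-laden illustration rather than the package's code: the shape of the `$meta` array (keyed by `name` or `property`) is chosen for this example and may not match what `getMetaTags()` returns.

```php
<?php

$html = <<<'HTML'
<html><head>
<meta charset="utf-8">
<meta name="description" content="Demo page">
<meta property="og:title" content="Demo title">
<link rel="canonical" href="https://example.com/demo">
</head><body></body></html>
HTML;

$dom = new DOMDocument();
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);
$xpath = new DOMXPath($dom);

// Collect every <meta> that carries a content attribute, keyed by its
// name attribute or, failing that, its property attribute (Open Graph).
$meta = [];
foreach ($xpath->query('//meta[@content]') as $tag) {
    $key = $tag->getAttribute('name') ?: $tag->getAttribute('property');
    if ($key !== '') {
        $meta[$key] = $tag->getAttribute('content');
    }
}

// The canonical URL lives in <link rel="canonical" href="...">.
$canonicalNode = $xpath->query('//link[@rel="canonical"]/@href')->item(0);
$canonical = $canonicalNode ? $canonicalNode->nodeValue : null;

print_r($meta);
echo $canonical, "\n";
```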

### URL Resolution

```
$parser = new PageParser();
$parser->loadPage('https://example.com/blog');

// All links are automatically resolved to absolute URLs
$links = $parser->extractLinksByXPath('//a[@href]');

// Images with relative paths become absolute
$images = $parser->extractImagesByXPath('//img[@src]');
```
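
Resolving a relative link against the page URL follows the rules in RFC 3986, section 5.2. The sketch below is a simplified, hypothetical `resolveUrl()` helper that covers the common cases (absolute, protocol-relative, root-relative, path-relative, and dot segments). It is not the package's resolver, and it ignores edge cases such as user info in the base URL or trailing dot segments.

```php
<?php

// Hypothetical helper, simplified from RFC 3986 section 5.2.
function resolveUrl(string $base, string $relative): string
{
    // A reference with its own scheme is already absolute.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $relative)) {
        return $relative;
    }

    $parts = parse_url($base);
    $scheme = $parts['scheme'] ?? 'http';
    $origin = $scheme . '://' . ($parts['host'] ?? '')
        . (isset($parts['port']) ? ':' . $parts['port'] : '');
    $basePath = $parts['path'] ?? '/';

    // Protocol-relative reference: reuse the base scheme.
    if (strncmp($relative, '//', 2) === 0) {
        return $scheme . ':' . $relative;
    }

    // Split the reference into its path and any trailing query/fragment.
    $suffixPos = strcspn($relative, '?#');
    $path = substr($relative, 0, $suffixPos);
    $suffix = substr($relative, $suffixPos);

    if ($path === '') {
        // Query-only or fragment-only reference: keep the base path.
        $path = $basePath;
        if (isset($parts['query']) && strncmp($suffix, '#', 1) === 0) {
            $suffix = '?' . $parts['query'] . $suffix;
        }
    } elseif ($path[0] !== '/') {
        // Path-relative: replace everything after the last "/" of the base path.
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $path;
    }

    // Collapse "." and ".." segments.
    $segments = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            array_pop($segments);
        } elseif ($segment !== '.') {
            $segments[] = $segment;
        }
    }

    return $origin . '/' . ltrim(implode('/', $segments), '/') . $suffix;
}

echo resolveUrl('https://example.com/blog/post', '../img/a.png'), "\n";
echo resolveUrl('https://example.com/blog', 'page2'), "\n";
```

One consequence worth knowing: a base of `https://example.com/blog` (no trailing slash) treats `blog` as a file, so `page2` resolves to `/page2`, not `/blog/page2`.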

### Regular Expression Search

```
$parser = new PageParser();
$parser->loadPage('https://example.com');

// Search for email addresses
$emails = $parser->searchByRegex('/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/');

// Search for phone numbers
$phones = $parser->searchByRegex('/\+?[\d\s\-\(\)]{7,}/');
```
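
The two patterns above can be tried in isolation with `preg_match_all()` before pointing them at a live page. This sketch runs them against a made-up string; the de-duplication and trimming steps are additions for this example, and whether `searchByRegex()` does anything similar is not stated here.

```php
<?php

$text = 'Contact sales@example.com or support@example.org. Call +1 (555) 010-9999.';

// Same email pattern as above; the trailing full stop after ".org" is not captured.
preg_match_all('/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/', $text, $emailMatches);
$emails = array_values(array_unique($emailMatches[0]));

// Same phone pattern as above; trim stray whitespace the character class may capture.
preg_match_all('/\+?[\d\s\-\(\)]{7,}/', $text, $phoneMatches);
$phones = array_map('trim', $phoneMatches[0]);

print_r($emails);
print_r($phones);
```

The phone pattern is deliberately loose: any run of seven or more digits, spaces, hyphens, or parentheses matches, so dates and order numbers can show up in the results and usually need a second validation pass.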

Configuration Options
---------------------

```
$parser = new PageParser([
    // HTTP client options
    'timeout' => 30,
    'connect_timeout' => 10,
    'verify' => true, // SSL verification
    'allow_redirects' => true,

    // Custom headers
    'headers' => [
        'User-Agent' => 'MyCrawler/1.0',
        'Accept' => 'text/html,application/xhtml+xml',
        'Accept-Language' => 'en-US,en;q=0.9',
        'Referer' => 'https://google.com',
    ],

    // Proxy support
    'proxy' => 'http://proxy.example.com:8080',

    // Authentication
    'auth' => ['username', 'password'],

    // Cookies
    'cookies' => true,
]);
```

Error Handling
--------------

```
use CoderDen\PageParser\PageParser;

try {
    $parser = new PageParser();
    $parser->loadPage('https://example.com');

    // Your parsing logic here

} catch (\RuntimeException $e) {
    echo "Failed to load page: " . $e->getMessage();

} catch (\Exception $e) {
    echo "General error: " . $e->getMessage();
}
```

Examples
--------

### Example 1: Scrape Product List

```
$parser = new PageParser();
$parser->loadPage('https://example.com/products');

$products = $parser->extractByXPath('//div[contains(@class, "product")]', [
    'name' => './/h3/text()',
    'price' => './/span[@class="price"]/text()',
    'sku' => './/span[@class="sku"]/text()',
    'image' => './/img/@src',
    'url' => './/a/@href',
]);

foreach ($products as $product) {
    echo "Product: {$product['name']}\n";
    echo "Price: {$product['price']}\n";
    echo "Image: {$product['image']}\n";
    echo "---\n";
}
```

### Example 2: Extract Article Data

```
use CoderDen\PageParser\ParserHelper;

$articleData = ParserHelper::extract(
    'https://example.com/article',
    '//article',
    [
        'title' => './/h1/text()',
        'author' => './/span[@class="author"]/text()',
        'date' => './/time/@datetime',
        'content' => './/div[@class="content"]//p//text()',
        'tags' => './/a[@rel="tag"]//text()',
    ]
);

// Join multi-node content into one string; default to empty so $content is always defined
$content = '';
if (!empty($articleData[0]['content'])) {
    $content = is_array($articleData[0]['content'])
        ? implode("\n", $articleData[0]['content'])
        : $articleData[0]['content'];
}
```

### Example 3: Batch Processing URLs

```
use CoderDen\PageParser\ParserHelper;

$urls = [
    'https://example.com/page1',
    'https://example.com/page2',
    'https://example.com/page3',
];

$allData = [];
foreach ($urls as $url) {
    try {
        $data = ParserHelper::extract($url, '//h1', ['_text']);
        $allData[$url] = $data[0] ?? 'No title';
    } catch (\Exception $e) {
        $allData[$url] = "Error: " . $e->getMessage();
    }
}

// Save results
file_put_contents('results.json', json_encode($allData, JSON_PRETTY_PRINT));
```

### Example 4: Monitor Website Changes

```
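use CoderDen\PageParser\PageParser;

// Captures a snapshot of a page; compare successive snapshots to detect changes
class WebsiteMonitor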
{
    private PageParser $parser;

    public function __construct()
    {
        $this->parser = new PageParser();
    }

    public function checkForChanges(string $url, string $elementSelector): array
    {
        $this->parser->loadPage($url);

        return [
            'title' => $this->parser->getTitle(),
            'element_count' => $this->parser->count($elementSelector),
            'element_exists' => $this->parser->exists($elementSelector),
            'status_code' => $this->parser->getStatusCode(),
            'timestamp' => date('Y-m-d H:i:s'),
        ];
    }
}

$monitor = new WebsiteMonitor();
$changes = $monitor->checkForChanges('https://example.com', '.news-item');
```
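
The monitor above returns a single snapshot, so detecting a change still requires comparing it with an earlier one. A minimal sketch of that comparison follows; `diffSnapshots()` is a hypothetical helper, the sample arrays are invented, and how snapshots are stored between runs is left open.

```php
<?php

// Hypothetical helper: report which fields differ between two snapshots,
// skipping keys (such as the timestamp) that change on every run.
function diffSnapshots(array $previous, array $current, array $ignore = ['timestamp']): array
{
    $changes = [];
    foreach ($current as $key => $value) {
        if (in_array($key, $ignore, true)) {
            continue;
        }
        $old = $previous[$key] ?? null;
        if ($old !== $value) {
            $changes[$key] = ['from' => $old, 'to' => $value];
        }
    }

    return $changes;
}

// Invented sample data in the same shape as checkForChanges() returns.
$yesterday = [
    'title' => 'News',
    'element_count' => 10,
    'element_exists' => true,
    'status_code' => 200,
    'timestamp' => '2025-01-01 09:00:00',
];
$today = [
    'title' => 'News',
    'element_count' => 12,
    'element_exists' => true,
    'status_code' => 200,
    'timestamp' => '2025-01-02 09:00:00',
];

$diff = diffSnapshots($yesterday, $today);
print_r($diff);
```

An empty result means nothing monitored has changed; here only `element_count` is reported.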


### Release Activity

- Cadence: unknown
- Total releases: 1
- Last release: 169 days ago

### Community

Maintainer and top contributor: Denis Sinyukov ([@dnsinyukov](https://github.com/dnsinyukov)), 2 commits.

Tags: parser, html, dom, crawler, Xpath, scraper, css-selector



