rajpurohithitesh/advance-phpscraper
===================================

Advanced PHP web scraping library with plugin support

v1.0.0 · MIT · PHP ^8.0

[Source](https://github.com/RajpurohitHitesh/advance-phpscraper) · [Packagist](https://packagist.org/packages/rajpurohithitesh/advance-phpscraper)

Advance PHP Scraper
===================

**Advance PHP Scraper** is a powerful, modular, and extensible PHP library designed for web scraping. It simplifies extracting data from websites, such as links, images, meta tags, structured data, and more, while offering advanced features like plugin support, rate limiting, and asynchronous scraping. Whether you're a beginner or an experienced developer, this library provides a flexible and user-friendly interface to scrape web content efficiently.

This document is crafted to be beginner-friendly, with detailed explanations and examples to help you get started, even if you're new to PHP or web scraping. By the end, you'll know how to install, use, and extend the library with ease.

---

Table of Contents
-----------------

1. [What is Advance PHP Scraper?](#what-is-advance-php-scraper)
    - [Why Use This Library?](#why-use-this-library)
    - [Who Should Use It?](#who-should-use-it)
2. [Key Features](#key-features)
    - [Core Scraping Features](#core-scraping-features)
    - [Advanced Features](#advanced-features)
    - [Plugin System](#plugin-system)
3. [Getting Started](#getting-started)
    - [Prerequisites](#prerequisites)
    - [Installation](#installation)
    - [Verifying Installation](#verifying-installation)
4. [Basic Usage: Your First Scrape](#basic-usage-your-first-scrape)
    - [Scraping a Simple Website](#scraping-a-simple-website)
    - [Extracting Links](#extracting-links)
    - [Extracting Images](#extracting-images)
    - [Extracting Meta Tags](#extracting-meta-tags)
    - [Using the Command-Line Interface (CLI)](#using-the-command-line-interface-cli)
5. [Intermediate Usage: Leveling Up](#intermediate-usage-leveling-up)
    - [Scraping Sitemaps](#scraping-sitemaps)
    - [Scraping RSS Feeds](#scraping-rss-feeds)
    - [Parsing Assets (CSV, JSON, XML)](#parsing-assets-csv-json-xml)
    - [Checking HTTP Status Codes](#checking-http-status-codes)
6. [Advanced Usage: Power User Mode](#advanced-usage-power-user-mode)
    - [Rate Limiting: Playing Nice with Servers](#rate-limiting-playing-nice-with-servers)
    - [Queue System: Scraping Multiple URLs](#queue-system-scraping-multiple-urls)
    - [API Integration: Combining Scraping with APIs](#api-integration-combining-scraping-with-apis)
    - [Custom CSS Selectors](#custom-css-selectors)
7. [Plugins: Supercharging Your Scraper](#plugins-supercharging-your-scraper)
    - [What Are Plugins?](#what-are-plugins)
    - [Available Plugins](#available-plugins)
    - [How to Use Plugins](#how-to-use-plugins)
    - [Learn More About Plugins](#learn-more-about-plugins)
8. [Configuration: Customizing Your Scraper](#configuration-customizing-your-scraper)
    - [Setting User Agent](#setting-user-agent)
    - [Adjusting Timeout](#adjusting-timeout)
    - [Following Redirects](#following-redirects)
    - [Using Constructor Configuration](#using-constructor-configuration)
9. [Testing: Ensuring Everything Works](#testing-ensuring-everything-works)
    - [Running Tests](#running-tests)
    - [Writing Your Own Tests](#writing-your-own-tests)
10. [Troubleshooting: Solving Common Problems](#troubleshooting-solving-common-problems)
    - [Installation Issues](#installation-issues)
    - [Scraping Errors](#scraping-errors)
    - [Plugin Problems](#plugin-problems)
11. [Contributing: Joining the Community](#contributing-joining-the-community)
12. [License: Understanding Usage Rights](#license-understanding-usage-rights)
13. [Resources: Further Learning](#resources-further-learning)

---

What is Advance PHP Scraper?
----------------------------

**Advance PHP Scraper** is a PHP library that helps you extract data from websites, like a super-smart librarian who can quickly find and summarize books for you. Web scraping means collecting information from a webpage (e.g., product names, prices, or blog titles) with code instead of copying and pasting by hand. This library makes it easy to navigate websites, grab specific data, and, with its optional plugins, handle trickier tasks like scraping JavaScript-heavy pages or processing thousands of URLs at once.

Imagine you’re at a giant library (the internet), and you need to collect all book titles from a specific shelf (a website). Doing this by hand would take forever, but **Advance PHP Scraper** is like a magical robot that does it for you in seconds. It’s designed to be:

- **Easy**: Simple commands to get data, even if you’re new to coding.
- **Powerful**: Handles complex tasks like async scraping or cloud deployment.
- **Flexible**: Add your own features using plugins, like customizing a Lego set.

### Why Use This Library?

There are other scraping tools out there, but here’s why **Advance PHP Scraper** is special:

- **Beginner-Friendly**: The code is straightforward, and this guide explains everything like you’re five.
- **Modular**: Only use the features you need, keeping your project lightweight.
- **Robust**: Built-in error handling, logging, and rate limiting help reduce the risk of crashes or bans.
- **Extensible**: Plugins let you add custom features without touching the core code.
- **Free and Open-Source**: Use it, modify it, share it—under the MIT License.

### Who Should Use It?

- **New Coders**: If you’re learning PHP and want to try web scraping, this is a great starting point.
- **Hobbyists**: Want to scrape your favorite blog’s headlines or collect product prices? This is for you.
- **Professionals**: Need to scrape thousands of pages for data analysis? The library’s advanced features have you covered.
- **Educators**: Teaching PHP or web scraping? Use this library for hands-on examples.

---

Key Features
------------

Let’s explore what **Advance PHP Scraper** can do. Think of these features as tools in a toolbox, each designed for a specific job.

### Core Scraping Features

These are the basic tools you’ll use most often:

- **Extract Common Data**:
    - **Links**: Grab all `<a>` tags (e.g., URLs and their text).
    - **Images**: Collect `<img>` tags (e.g., source URLs and alt text).
    - **Meta Tags**: Extract `<meta>` tags (e.g., description, Open Graph data).
    - **Headings**: Get `<h1>` to `<h6>` tags for page structure.
    - **Paragraphs**: Pull `<p>` tag content for text.
    - **Structured Data**: Extract JSON-LD, Microdata, and RDFa (e.g., schema.org data).
- **Sitemap Parsing**: Read XML sitemaps to discover all pages on a site.
- **RSS Feed Parsing**: Extract news or blog feeds.
- **Asset Parsing**: Process CSV, JSON, or XML files linked on pages.
- **Custom Selectors**: Use CSS selectors to target specific elements (e.g., `div.content`).
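
To make "extracting links" concrete, here is a minimal sketch of the idea in plain PHP. It uses PHP's built-in DOM extension rather than this library's API, and the function name `extractLinks` is a hypothetical helper chosen for illustration.

```php
<?php
/**
 * Illustration only: built-in DOM extension, not this library's API.
 * Returns every <a> tag that has an href as ['url' => ..., 'text' => ...].
 */
function extractLinks(string $html): array
{
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid, so collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        if (!$anchor->hasAttribute('href')) {
            continue; // skip anchors that are not links
        }
        $links[] = [
            'url'  => $anchor->getAttribute('href'),
            'text' => trim($anchor->textContent),
        ];
    }
    return $links;
}

$html = '<html><body><a href="/about">About us</a> <a href="https://example.com">Example</a></body></html>';
print_r(extractLinks($html));
```

The same pattern (load the HTML, select elements, read attributes) applies to images, meta tags, and headings; the library wraps this kind of work behind simpler calls.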

### Advanced Features

These tools are for power users:

- **Rate Limiting**: Control how fast you scrape to avoid server bans (like driving at the speed limit).
- **Queue System**: Scrape multiple URLs in batches, like a to-do list for your scraper.
- **API Integration**: Combine scraped data with external APIs (e.g., fetch product details).
- **CLI Interface**: Run scraping tasks from the command line, perfect for quick jobs.
- **Multilingual Support**: Handle non-English text with proper encoding (e.g., Spanish, Chinese).
- **Error Handling**: Logs errors and checks HTTP status codes to keep scraping smooth.
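
Rate limiting is easier to picture with code. The sketch below is a fixed-delay limiter in plain PHP; the class `SimpleRateLimiter` and its `wait()` method are hypothetical names for illustration and are not the library's own rate-limiting API.

```php
<?php
/**
 * Illustration only: a minimal fixed-delay rate limiter.
 * Hypothetical class, not part of Advance PHP Scraper.
 */
final class SimpleRateLimiter
{
    private float $minInterval;      // minimum seconds between two requests
    private ?float $lastCall = null; // timestamp of the previous request, if any

    public function __construct(float $requestsPerSecond)
    {
        if ($requestsPerSecond <= 0) {
            throw new InvalidArgumentException('requestsPerSecond must be positive');
        }
        $this->minInterval = 1.0 / $requestsPerSecond;
    }

    /** Blocks until the minimum interval has passed; returns the seconds it chose to sleep. */
    public function wait(): float
    {
        $delay = 0.0;
        if ($this->lastCall !== null) {
            $delay = max(0.0, $this->lastCall + $this->minInterval - microtime(true));
            if ($delay > 0) {
                usleep((int) round($delay * 1000000));
            }
        }
        $this->lastCall = microtime(true);
        return $delay;
    }
}

$limiter = new SimpleRateLimiter(5.0); // roughly 5 requests per second at most
foreach (['https://example.com/a', 'https://example.com/b', 'https://example.com/c'] as $url) {
    $limiter->wait();
    echo "Would fetch {$url}\n"; // a real scraper would send the HTTP request here
}
```

A fixed delay is the simplest policy; a queue system can call the same `wait()` before each URL in a batch.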

### Plugin System

Plugins are like optional upgrades for your toolbox:

- **Headless Browsing**: Scrape JavaScript-rendered pages (e.g., React apps).
- **Async Scraping**: Scrape multiple pages at once for speed.
- **NLP Analysis**: Extract keywords and entities from text.
- **PDF Parsing**: Read text from linked PDFs.
- **Caching**: Save scraped data to reduce server load.
- **Cloud Deployment**: Run scraping tasks on AWS Lambda.
- **Custom Plugins**: Add your own features (e.g., custom logging).

---

Getting Started
---------------

Let’s set up the library and run your first scrape. This section is like a cooking recipe: follow each step, and you’ll have a working scraper in no time.

### Prerequisites

Before you start, you need:

- **PHP 8.0 or Higher**: The package's Composer metadata declares `php ^8.0`, so PHP 7.4 will not satisfy the requirement. Check your version:

    ```
    php -v
    ```

    If it’s lower, download a newer version from [php.net](https://www.php.net/).
- **Composer**: This is a tool to manage PHP dependencies (like a grocery delivery service for code). Install it:

    ```
    # Download the installer, run it, then delete it
    php -r "copy('https://getcomposer.org/installer', 'composer-setup.php');"
    php composer-setup.php
    php -r "unlink('composer-setup.php');"
    # Make Composer available system-wide (on Linux/macOS this may need sudo)
    mv composer.phar /usr/local/bin/composer
    ```
- **A Text Editor**: Use VS Code, Sublime Text, or any editor to write PHP code.
- **Internet Connection**: Needed to download the library and scrape websites.
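
If you would rather check the prerequisites from PHP itself, the sketch below reports whether the running interpreter meets a minimum version and which extensions are missing. The function name `environmentReport` is hypothetical, the `8.0.0` minimum assumes the `php ^8.0` constraint shown in the package metadata, and the extension list is an assumption; the library's `composer.json` is the authority on what it actually requires.

```php
<?php
// Illustration only: a small environment check in plain PHP.
// Function name and extension list are assumptions, not part of the library.
function environmentReport(string $minimumPhp, array $extensions): array
{
    $report = [
        'php_ok'  => version_compare(PHP_VERSION, $minimumPhp, '>='),
        'missing' => [],
    ];
    foreach ($extensions as $extension) {
        if (!extension_loaded($extension)) {
            $report['missing'][] = $extension;
        }
    }
    return $report;
}

$report = environmentReport('8.0.0', ['curl', 'dom', 'mbstring']);
echo 'PHP ' . PHP_VERSION . ($report['php_ok'] ? ' meets' : ' does not meet') . " the 8.0 minimum\n";
echo $report['missing'] === []
    ? "All checked extensions are loaded\n"
    : 'Missing extensions: ' . implode(', ', $report['missing']) . "\n";
```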

### Installation

Here’s how to install the library:

1. **Create a Project Folder**: Make a new directory for your scraping project:

    ```
    mkdir my-scraper
    cd my-scraper
    ```
2. **Install Advance PHP Scraper**: Run this Composer command to download the library and its dependencies:

    ```
    composer require rajpurohithitesh/advance-phpscraper
    ```

    This creates a `vendor/` folder with the library and dependencies like `symfony/browser-kit` and `guzzlehttp/guzzle`.
3. **Check the Files**: After installation, you’ll see:

    - `vendor/`: Contains the library and dependencies.
    - `composer.json`: Lists the project’s dependencies.
    - `composer.lock`: Locks dependency versions.
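
The file check in step 3 can also be scripted. This is a minimal sketch in plain PHP, assuming the standard Composer layout (`vendor/autoload.php`, `composer.json`, `composer.lock`); the function name `checkComposerInstall` is hypothetical.

```php
<?php
// Illustration only: checks a project directory for the files Composer creates.
// Hypothetical helper, not part of the library.
function checkComposerInstall(string $projectDir, string $package): array
{
    $problems = [];
    foreach (['vendor/autoload.php', 'composer.json', 'composer.lock'] as $path) {
        if (!file_exists($projectDir . '/' . $path)) {
            $problems[] = "Missing {$path}";
        }
    }

    // Confirm the package is listed as a dependency of the project.
    $manifestPath = $projectDir . '/composer.json';
    if (is_file($manifestPath)) {
        $manifest = json_decode((string) file_get_contents($manifestPath), true);
        if (!isset($manifest['require'][$package])) {
            $problems[] = "{$package} is not listed under \"require\" in composer.json";
        }
    }
    return $problems;
}

// Run from the project root (the my-scraper folder created above).
$problems = checkComposerInstall(__DIR__, 'rajpurohithitesh/advance-phpscraper');
echo $problems === [] ? "Installation looks complete\n" : implode("\n", $problems) . "\n";
```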

### Verifying Installation

Let’s make sure everything works. Create a file named `test.php`:

```
