

Active · Library · [API Development](/categories/api)

webcrawlerapi/sdk
=================

A PHP SDK for WebCrawler API - turn websites into data

1.0.10 (1 mo ago) · 7,105↓ · 100% · 1 · MIT · PHP >=8.1 · CI passing

Since Dec 31 · Pushed 1 mo ago

[Source](https://github.com/WebCrawlerAPI/webcrawlerapi-php-sdk) · [Packagist](https://packagist.org/packages/webcrawlerapi/sdk) · [Docs](https://github.com/webcrawlerapi/webcrawlerapi-php-sdk) · [RSS](/packages/webcrawlerapi-sdk/feed) · branch master, synced 1 mo ago

README · Changelog · Dependencies (3) · Versions (7) · Used By (0)

WebCrawler API PHP SDK
======================


[![Latest Version on Packagist](https://camo.githubusercontent.com/bb0424d434719966b8cf3e22cd56213b3cd27462ba48ccbdf3f7049e60c332a3/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f762f776562637261776c65726170692f73646b2e7376673f7374796c653d666c61742d737175617265)](https://packagist.org/packages/webcrawlerapi/sdk)[![Total Downloads](https://camo.githubusercontent.com/9200a9f2f22e385d1c10a5c9b7a134703d83ba03ec6ddbda9198c718ac65d6ea/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f64742f776562637261776c65726170692f73646b2e7376673f7374796c653d666c61742d737175617265)](https://packagist.org/packages/webcrawlerapi/sdk)[![License](https://camo.githubusercontent.com/3eb3bd516928a2e0ac8ce70ed65e5b1ae6199cba35bfee6b653d945386bd7a52/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f6c2f776562637261776c65726170692f73646b2e7376673f7374796c653d666c61742d737175617265)](https://packagist.org/packages/webcrawlerapi/sdk)

A PHP SDK for interacting with the WebCrawlerAPI - a powerful web crawling and scraping service.

> To use the API, you need an API key from [WebCrawlerAPI](https://dash.webcrawlerapi.com/access)

See the [WebCrawlerAPI Docs](https://webcrawlerapi.com/docs) for more information.

Requirements
------------


- PHP 8.1 or higher
- Composer
- `ext-json` PHP extension
- Guzzle HTTP Client 7.0 or higher

Installation
------------


You can install the package via Composer:

```
composer require webcrawlerapi/sdk
```

Usage
-----


```
require __DIR__ . '/vendor/autoload.php';

use WebCrawlerAPI\WebCrawlerAPI;

// Initialize the client
$crawler = new WebCrawlerAPI('your_api_key');

// Synchronous crawling (blocks until completion)
$job = $crawler->crawl(
    url: 'https://example.com',
    scrapeType: 'markdown',
    itemsLimit: 10,
    webhookUrl: 'https://yourserver.com/webhook',
    allowSubdomains: false,
    maxPolls: 100  // Optional: maximum number of status checks
);
echo "Job completed with status: {$job->status}\n";

// Access job items and their content
foreach ($job->jobItems as $item) {
    echo "Page title: {$item->title}\n";
    echo "Original URL: {$item->originalUrl}\n";
    echo "Item status: {$item->status}\n";

    // Get the content based on job's scrape_type
    // Returns null if item is not in "done" status
    $content = $item->getContent();
    if ($content) {
        echo "Content length: " . strlen($content) . "\n";
        echo "Content preview: " . substr($content, 0, 200) . "...\n";
    } else {
        echo "Content not available or item not done\n";
    }
}

// Access job items and their parent job
foreach ($job->jobItems as $item) {
    echo "Item URL: {$item->originalUrl}\n";
    echo "Parent job status: {$item->job->status}\n";
    echo "Parent job URL: {$item->job->url}\n";
}

// Or use asynchronous crawling
$response = $crawler->crawlAsync(
    url: 'https://example.com',
    scrapeType: 'markdown',
    itemsLimit: 10,
    webhookUrl: 'https://yourserver.com/webhook',
    allowSubdomains: false
);

// Get the job ID from the response
$jobId = $response->id;
echo "Crawling job started with ID: {$jobId}\n";

// Check job status and get results
$job = $crawler->getJob($jobId);
echo "Job status: {$job->status}\n";

// Access job details
echo "Crawled URL: {$job->url}\n";
echo "Created at: {$job->createdAt->format('Y-m-d H:i:s')}\n";
echo "Number of items: " . count($job->jobItems) . "\n";

// Cancel a running job if needed
$cancelResponse = $crawler->cancelJob($jobId);
echo "Cancellation response: " . json_encode($cancelResponse) . "\n";
```

API Methods
-----------


### crawl()

Starts a new crawling job and waits for it to complete. The method polls the job status until:

- the job reaches a terminal state (done, error, or cancelled), or
- the maximum number of polls is reached (default: 100).

The polling interval is taken from the server's `recommendedPullDelayMs`, defaulting to 5 seconds.

### crawlAsync()


Starts a new crawling job and returns immediately with a job ID. Use this when you want to handle polling and status checks yourself, or when using webhooks.
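A self-managed polling loop can be sketched with only the methods documented here (`crawlAsync()` and `getJob()`); the terminal-state check and fallback delay mirror what `crawl()` does internally. The API key and URL are placeholders:

```php
<?php
// Minimal sketch of manual polling with crawlAsync() + getJob().
// Assumes the client from the Usage section; 'your_api_key' is a placeholder.
require __DIR__ . '/vendor/autoload.php';

use WebCrawlerAPI\WebCrawlerAPI;

$crawler = new WebCrawlerAPI('your_api_key');
$jobId = $crawler->crawlAsync(url: 'https://example.com', scrapeType: 'markdown')->id;

$maxPolls = 100;
$job = null;
for ($i = 0; $i < $maxPolls; $i++) {
    $job = $crawler->getJob($jobId);
    // Stop on any terminal state.
    if (in_array($job->status, ['done', 'error', 'cancelled'], true)) {
        break;
    }
    // Honor the server-recommended delay, defaulting to 5 seconds.
    usleep(($job->recommendedPullDelayMs ?? 5000) * 1000);
}
echo "Final status: {$job->status}\n";
```

With a webhook configured, you can skip the loop entirely and react to the POST notification instead.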

### getJob()


Retrieves the current status and details of a specific job.

### cancelJob()

Cancels a running job. Items that have not started and are not already completed are marked as canceled and are not charged.

Parameters
----------


### Crawl Methods (crawl and crawlAsync)


- `url` (required): The seed URL where the crawler starts. Can be any valid URL.
- `scrapeType` (default: "html"): The type of scraping you want to perform. Can be "html", "cleaned", or "markdown".
- `itemsLimit` (default: 10): Crawler will stop when it reaches this limit of pages for this job.
- `webhookUrl` (optional): The URL where the server will send a POST request once the task is completed.
- `allowSubdomains` (default: false): If true, the crawler will also crawl subdomains.
- `whitelistRegexp` (optional): A regular expression to whitelist URLs. Only URLs that match the pattern will be crawled.
- `blacklistRegexp` (optional): A regular expression to blacklist URLs. URLs that match the pattern will be skipped.
- `maxPolls` (optional, crawl only): Maximum number of status checks before returning (default: 100)
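As an illustration of the regex filters, the sketch below restricts a crawl to one section of a site. The URL and patterns are made up for the example, and `$crawler` is the client from the Usage section:

```php
// Hypothetical example: crawl only blog pages, skipping tag/category listings.
$job = $crawler->crawl(
    url: 'https://example.com/blog',
    scrapeType: 'markdown',
    itemsLimit: 50,
    // Only URLs matching this pattern are crawled...
    whitelistRegexp: '.*/blog/.*',
    // ...and URLs matching this pattern are skipped.
    blacklistRegexp: '.*/(tag|category)/.*',
);
```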

### Responses


#### CrawlAsync Response


The `crawlAsync()` method returns a `CrawlResponse` object with:

- `id`: The unique identifier of the created job

#### Job Response


The Job object contains detailed information about the crawling job:

- `id`: The unique identifier of the job
- `orgId`: Your organization identifier
- `url`: The seed URL where the crawler started
- `status`: The status of the job (new, in\_progress, done, error, cancelled)
- `scrapeType`: The type of scraping performed
- `createdAt`: The date when the job was created
- `finishedAt`: The date when the job was finished (if completed)
- `webhookUrl`: The webhook URL for notifications
- `webhookStatus`: The status of the webhook request
- `webhookError`: Any error message if the webhook request failed
- `jobItems`: Array of JobItem objects representing crawled pages
- `recommendedPullDelayMs`: Server-recommended delay between status checks

### JobItem Properties


Each JobItem object represents a crawled page and contains:

- `id`: The unique identifier of the item
- `jobId`: The parent job identifier
- `job`: Reference to the parent Job object
- `originalUrl`: The URL of the page
- `pageStatusCode`: The HTTP status code of the page request
- `status`: The status of the item (new, in\_progress, done, error)
- `title`: The page title
- `createdAt`: The date when the item was created
- `cost`: The cost of the item in USD
- `referredUrl`: The URL where the page was referred from
- `lastError`: Any error message if the item failed
- `errorCode`: The error code if the item failed (if available)
- `getContent()`: Method to get the page content based on the job's scrapeType (html, cleaned, or markdown). Returns null if the item's status is not "done" or if content is not available. Content is automatically fetched and cached when accessed.
- `rawContentUrl`: URL to the raw content (if available)
- `cleanedContentUrl`: URL to the cleaned content (if scrapeType is "cleaned")
- `markdownContentUrl`: URL to the markdown content (if scrapeType is "markdown")
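Putting the item fields together, here is a sketch of per-item error handling (assuming `$job` came from `crawl()` as in the Usage section):

```php
// Sketch: report failed items, print content size for completed ones.
foreach ($job->jobItems as $item) {
    if ($item->status === 'error') {
        // lastError and errorCode (when present) describe the failure.
        $code = $item->errorCode ? " [{$item->errorCode}]" : '';
        echo "Failed {$item->originalUrl} (HTTP {$item->pageStatusCode}): {$item->lastError}{$code}\n";
        continue;
    }
    // getContent() returns null unless the item's status is "done".
    $content = $item->getContent();
    if ($content !== null) {
        echo "Done {$item->originalUrl}: " . strlen($content) . " bytes\n";
    }
}
```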

Testing
-------


### Running Tests


1. **Install dependencies:**

    ```
    composer install
    ```
2. **Run unit tests:**

    ```
    vendor/bin/phpunit tests/Unit --testdox
    ```
3. **Run integration tests (optional, requires API key):**

    ```
    export WEBCRAWLER_API_KEY="your-api-key"
    vendor/bin/phpunit tests/Integration --testdox
    ```

Or use the test runner script: `./run-tests.sh`

License
-------


MIT License

### Health Score

45 — Fair. Better than 92% of packages.

- Maintenance: 96 (actively maintained with recent releases)
- Popularity: 18 (limited adoption so far)
- Community: 4 (small or concentrated contributor base)
- Maturity: 51 (maturing project, gaining track record)

How is this calculated?

- **Maintenance (25%):** Last commit recency, latest release date, and issue-to-star ratio. Uses a 2-year decay window.
- **Popularity (30%):** Total and monthly downloads, GitHub stars, and forks. Logarithmic scaling prevents top-heavy scores.
- **Community (15%):** Contributors, dependents, forks, watchers, and maintainers. Measures real ecosystem engagement.
- **Maturity (30%):** Project age, version count, PHP version support, and release stability.

### Release Activity

- Cadence: every ~88 days
- Total releases: 6
- Last release: 52 days ago
- PHP version history (2 changes): 1.0.1 required PHP >=8.0; 1.0.6 raised it to PHP >=8.1

### Community

Maintainers: [webcrawlerapi](/maintainers/webcrawlerapi)

Tags: api, sdk, data, crawler, website, scraper, llm, rag, webcrawler, webcrawlerapi

### Code Quality

Tests: PHPUnit

### Embed Badge

![Health badge](/badges/webcrawlerapi-sdk/health.svg)

```
[![Health](https://phpackages.com/badges/webcrawlerapi-sdk/health.svg)](https://phpackages.com/packages/webcrawlerapi-sdk)
```

### Alternatives

- [openai-php/laravel](/packages/openai-php-laravel): OpenAI PHP for Laravel is a supercharged PHP API client that allows you to interact with the Open AI API (3.7k · 7.6M · 74)
- [theodo-group/llphant](/packages/theodo-group-llphant): LLPhant is a library to help you build Generative AI applications. (1.5k · 311.5k · 5)
- [mailchimp/transactional](/packages/mailchimp-transactional) (45 · 8.9M · 16)
- [deepseek-php/deepseek-php-client](/packages/deepseek-php-deepseek-php-client): deepseek PHP client is a robust and community-driven PHP client library for seamless integration with the Deepseek API, offering efficient access to advanced AI and data processing capabilities. (470 · 73.9k · 5)
- [resend/resend-php](/packages/resend-resend-php): Resend PHP library. (56 · 4.7M · 21)
- [mozex/anthropic-laravel](/packages/mozex-anthropic-laravel): Anthropic PHP for Laravel is a supercharged PHP API client that allows you to interact with the Anthropic API (71 · 226.4k · 1)

