PHPackages                             spidra/spidra-php - PHPackages - PHPackages  [Skip to content](#main-content)[PHPackages](/)[Directory](/)[Categories](/categories)[Trending](/trending)[Leaderboard](/leaderboard)[Changelog](/changelog)[Analyze](/analyze)[Collections](/collections)[Log in](/login)[Sign up](/register)

1. [Directory](/)
2. /
3. [Utility &amp; Helpers](/categories/utility)
4. /
5. spidra/spidra-php

ActiveLibrary[Utility &amp; Helpers](/categories/utility)

spidra/spidra-php
=================

Official PHP SDK for Spidra — AI-powered web scraping and crawling

v0.1.0(1mo ago)11MITPHPPHP &gt;=8.1

Since Apr 19Pushed 1mo agoCompare

[ Source](https://github.com/spidra-io/spidra-php)[ Packagist](https://packagist.org/packages/spidra/spidra-php)[ Docs](https://spidra.io)[ RSS](/packages/spidra-spidra-php/feed)WikiDiscussions main Synced 1w ago

READMEChangelog (1)Dependencies (2)Versions (2)Used By (0)

Spidra PHP SDK
==============

[](#spidra-php-sdk)

The official PHP SDK for [Spidra](https://spidra.io) that allows you to scrape pages, run browser actions, batch-process URLs, and crawl entire sites. All results come back as structured data ready to feed into your pipelines or store directly.

Requirements
------------

[](#requirements)

- PHP &gt;= 8.1
- Composer

Installation
------------

[](#installation)

```
composer require spidra/spidra-php
```

Get your API key at [app.spidra.io](https://app.spidra.io) under **Settings → API Keys**.

Quick Start
-----------

[](#quick-start)

```
use Spidra\SpidraClient;

$spidra = new SpidraClient('spd_YOUR_API_KEY');

$job = $spidra->scrape->run([
    'urls'   => [['url' => 'https://news.ycombinator.com']],
    'prompt' => 'List the top 5 stories with title, points, and comment count',
    'output' => 'json',
]);

print_r($job['result']['content']);
```

Table of Contents
-----------------

[](#table-of-contents)

- [Scraping](#scraping)
    - [Basic scrape](#basic-scrape)
    - [Structured output with JSON schema](#structured-output-with-json-schema)
    - [Geo-targeted scraping](#geo-targeted-scraping)
    - [Authenticated pages](#authenticated-pages)
    - [Browser actions](#browser-actions)
    - [forEach: process every element on a page](#foreach-process-every-element-on-a-page)
    - [Manual job control](#manual-job-control)
- [Batch Scraping](#batch-scraping)
- [Crawling](#crawling)
- [Logs](#logs)
- [Usage Statistics](#usage-statistics)
- [Error Handling](#error-handling)

Scraping
--------

[](#scraping)

All scrape jobs run asynchronously. The `run()` method submits a job and polls until it finishes. If you need more control, use `submit()` and `get()` directly.

Up to 3 URLs can be passed per request and they are processed in parallel.

### Basic scrape

[](#basic-scrape)

```
$job = $spidra->scrape->run([
    'urls'   => [['url' => 'https://example.com/pricing']],
    'prompt' => 'Extract all pricing plans with name, price, and included features',
    'output' => 'json',
]);

print_r($job['result']['content']);
// ['plans' => [['name' => 'Starter', 'price' => '$9/mo', 'features' => [...]], ...]]
```

### Structured output with JSON schema

[](#structured-output-with-json-schema)

When you need a guaranteed shape, pass a `schema`. The API will enforce the structure and return `null` for any missing fields rather than hallucinating values.

```
$job = $spidra->scrape->run([
    'urls'   => [['url' => 'https://jobs.example.com/senior-engineer']],
    'prompt' => 'Extract the job listing details',
    'output' => 'json',
    'schema' => [
        'type'       => 'object',
        'required'   => ['title', 'company', 'remote'],
        'properties' => [
            'title'      => ['type' => 'string'],
            'company'    => ['type' => 'string'],
            'remote'     => ['type' => ['boolean', 'null']],
            'salary_min' => ['type' => ['number', 'null']],
            'salary_max' => ['type' => ['number', 'null']],
            'skills'     => ['type' => 'array', 'items' => ['type' => 'string']],
        ],
    ],
]);
```

### Geo-targeted scraping

[](#geo-targeted-scraping)

Pass `useProxy: true` and a `proxyCountry` code to route the request through a specific country. Useful for geo-restricted content or localized pricing.

```
$job = $spidra->scrape->run([
    'urls'         => [['url' => 'https://www.amazon.de/gp/bestsellers']],
    'prompt'       => 'List the top 10 products with name and price',
    'useProxy'     => true,
    'proxyCountry' => 'de',
]);
```

Supported country codes include: `us`, `gb`, `de`, `fr`, `jp`, `au`, `ca`, `br`, `in`, `nl`, `sg`, `es`, `it`, `mx`, and [40+ more](https://docs.spidra.io/features/stealth-mode#country-targeting). Use `"global"` or `"eu"` for regional routing.

### Authenticated pages

[](#authenticated-pages)

Pass cookies as a string to scrape pages that require a login session.

```
$job = $spidra->scrape->run([
    'urls'    => [['url' => 'https://app.example.com/dashboard']],
    'prompt'  => 'Extract the monthly revenue and active user count',
    'cookies' => 'session=abc123; auth_token=xyz789',
]);
```

### Browser actions

[](#browser-actions)

Actions let you interact with the page before the scrape runs. They execute in order, and the scrape happens after all actions complete.

```
$job = $spidra->scrape->run([
    'urls' => [
        [
            'url'     => 'https://example.com/products',
            'actions' => [
                ['type' => 'click', 'selector' => '#accept-cookies'],
                ['type' => 'wait',  'duration' => 1000],
                ['type' => 'scroll', 'to' => '80%'],
            ],
        ],
    ],
    'prompt' => 'Extract all product names and prices',
]);
```

**Available actions:**

ActionRequired fieldsDescription`click``selector` or `value`Click a button, link, or any element`type``selector`, `value`Type text into an input or textarea`check``selector` or `value`Check a checkbox`uncheck``selector` or `value`Uncheck a checkbox`wait``duration` (ms)Pause for a set number of milliseconds`scroll``to` (0–100%)Scroll the page to a percentage of its height`forEach``observe`Loop over every matched element and process each oneFor `selector`, use a CSS selector. For `value`, use a plain English description and Spidra will locate the element using AI.

```
// CSS selector
['type' => 'click', 'selector' => "button[data-testid='submit']"]

// Plain English — AI finds the element
['type' => 'click', 'value' => 'Accept all cookies button']

// Type into a field
['type' => 'type', 'selector' => "input[name='q']", 'value' => 'wireless headphones']

// Wait for content to load
['type' => 'wait', 'duration' => 2000]

// Scroll to bottom
['type' => 'scroll', 'to' => '100%']
```

### forEach: process every element on a page

[](#foreach-process-every-element-on-a-page)

`forEach` finds a set of elements on the page and processes each one individually. It is the right tool when you need to collect data from a list of items, paginate through multiple pages, or click into each item's detail page.

> You don't need `forEach` if the data fits on a single page and is short — a plain `prompt` is simpler and works just as well.

**Use forEach when:**

- The list spans multiple pages and you need `pagination`
- You need to click into each item's detail page (`navigate` mode)
- You have 20+ items and want per-item AI extraction to stay consistent (`itemPrompt`)

#### inline mode

[](#inline-mode)

Read each element's content directly without navigating. Best for product cards, search results, table rows.

```
$job = $spidra->scrape->run([
    'urls' => [
        [
            'url'     => 'https://books.toscrape.com/catalogue/category/books/mystery_3/index.html',
            'actions' => [
                [
                    'type'            => 'forEach',
                    'observe'         => 'Find all book cards in the product grid',
                    'mode'            => 'inline',
                    'captureSelector' => 'article.product_pod',
                    'maxItems'        => 20,
                    'itemPrompt'      => 'Extract title, price, and star rating. Return as JSON: {title, price, star_rating}',
                ],
            ],
        ],
    ],
    'prompt' => 'Return a clean JSON array of all books',
    'output' => 'json',
]);
```

#### navigate mode

[](#navigate-mode)

Follow each element's link to its destination page and capture content there. Best for product listings where the full detail is only on the individual page.

```
[
    'type'            => 'forEach',
    'observe'         => 'Find all book title links in the product grid',
    'mode'            => 'navigate',
    'captureSelector' => 'article.product_page',
    'maxItems'        => 10,
    'waitAfterClick'  => 800,
    'itemPrompt'      => 'Extract title, price, star rating, and availability. Return as JSON.',
]
```

#### click mode

[](#click-mode)

Click each element, capture the content that appears (a modal, drawer, or expanded section), then move on. Best for hotel room cards, FAQ accordions, or any UI where clicking reveals hidden content.

```
[
    'type'            => 'forEach',
    'observe'         => 'Find all room type cards',
    'mode'            => 'click',
    'captureSelector' => "[role='dialog']",
    'maxItems'        => 8,
    'waitAfterClick'  => 1200,
    'itemPrompt'      => 'Extract room name, bed type, price per night, and amenities. Return as JSON.',
]
```

#### Pagination

[](#pagination)

After processing all elements on the current page, follow the next-page link and continue collecting.

```
[
    'type'       => 'forEach',
    'observe'    => 'Find all book title links',
    'mode'       => 'navigate',
    'maxItems'   => 40,
    'pagination' => [
        'nextSelector' => 'li.next > a',
        'maxPages'     => 3, // 3 additional pages beyond the first
    ],
]
```

`maxItems` applies across all pages combined. The loop stops when you hit `maxItems`, run out of elements, or reach `maxPages`.

#### Per-element actions

[](#per-element-actions)

Run additional browser actions on each item after navigating or clicking into it, before the content is captured. Useful for scrolling below the fold or expanding collapsed sections.

```
[
    'type'            => 'forEach',
    'observe'         => 'Find all book title links',
    'mode'            => 'navigate',
    'captureSelector' => 'article.product_page',
    'maxItems'        => 5,
    'waitAfterClick'  => 1000,
    'actions'         => [
        ['type' => 'scroll', 'to' => '50%'],
    ],
    'itemPrompt' => 'Extract title, price, and full description. Return as JSON.',
]
```

#### itemPrompt vs top-level prompt

[](#itemprompt-vs-top-level-prompt)

`itemPrompt``prompt`When it runsDuring scraping, once per itemAfter all items are collectedWhat it seesOne item's contentAll items combinedOutput location`result['content']` (per-item array)`result['content']` (final shaped output)Use `itemPrompt` to extract fields from each item individually. Use the top-level `prompt` to filter, sort, or reshape the full combined output. They can be used together.

### Manual job control

[](#manual-job-control)

Use `submit()` and `get()` when you want to manage polling yourself, or fire-and-forget and check back later.

```
// Submit a job and get the jobId immediately
$queued = $spidra->scrape->submit([
    'urls'   => [['url' => 'https://example.com/listings']],
    'prompt' => 'Extract all property listings',
    'output' => 'json',
]);

$jobId = $queued['jobId'];

// Check status at any point
$result = $spidra->scrape->get($jobId);

if ($result['status'] === 'completed') {
    print_r($result['result']['content']);
} elseif ($result['status'] === 'failed') {
    echo $result['error'];
}
```

Job statuses: `waiting`, `active`, `completed`, `failed`.

**Timeout option:**

`run()` accepts a `$timeout` (seconds) and `$pollInterval` (seconds) argument:

```
$job = $spidra->scrape->run(
    params:       [...],
    timeout:      180, // wait up to 3 minutes
    pollInterval: 5,   // check every 5 seconds
);
```

Batch Scraping
--------------

[](#batch-scraping)

Submit up to 50 URLs in a single request. All URLs are processed in parallel. Each URL is a plain string.

```
$batch = $spidra->batch->run([
    'urls'     => [
        'https://shop.example.com/product/1',
        'https://shop.example.com/product/2',
        'https://shop.example.com/product/3',
    ],
    'prompt'   => 'Extract product name, price, and availability',
    'output'   => 'json',
    'useProxy' => true,
]);

foreach ($batch['items'] as $item) {
    if ($item['status'] === 'completed') {
        echo $item['url'] . ': ';
        print_r($item['result']);
    } elseif ($item['status'] === 'failed') {
        echo $item['url'] . ' failed: ' . $item['error'] . "\n";
    }
}
```

**Retry failed items:**

```
$queued = $spidra->batch->submit([
    'urls'   => ['https://example.com/1', 'https://example.com/2'],
    'prompt' => 'Extract the page title',
]);

$batchId = $queued['batchId'];

// Later, after checking status
$result = $spidra->batch->get($batchId);
if ($result['failedCount'] > 0) {
    $spidra->batch->retry($batchId);
}
```

**Cancel a running batch:**

```
$result = $spidra->batch->cancel($batchId);
echo "Cancelled {$result['cancelledItems']} items, refunded {$result['creditsRefunded']} credits\n";
```

**List past batches:**

```
$result = $spidra->batch->list(page: 1, limit: 20);

foreach ($result['jobs'] as $job) {
    echo "{$job['uuid']} {$job['status']} {$job['completedCount']}/{$job['totalUrls']}\n";
}
```

Crawling
--------

[](#crawling)

Given a starting URL, Spidra discovers pages automatically according to your instruction and extracts structured data from each one.

```
$job = $spidra->crawl->run([
    'baseUrl'              => 'https://competitor.com/blog',
    'crawlInstruction'     => 'Find all blog posts published in 2024',
    'transformInstruction' => 'Extract the title, author, publish date, and a one-sentence summary',
    'maxPages'             => 30,
    'useProxy'             => true,
]);

foreach ($job['result'] as $page) {
    echo $page['url'] . "\n";
    print_r($page['data']);
}
```

**Submit without waiting:**

```
$queued = $spidra->crawl->submit([
    'baseUrl'              => 'https://example.com/docs',
    'crawlInstruction'     => 'Find all documentation pages',
    'transformInstruction' => 'Extract the page title and main content summary',
    'maxPages'             => 50,
]);

$jobId = $queued['jobId'];

// Check status later
$status = $spidra->crawl->get($jobId);
```

**Get signed download URLs for all crawled pages:**

Each page includes `html_url` and `markdown_url` pointing to S3-signed URLs that expire after 1 hour.

```
$result = $spidra->crawl->pages($jobId);

foreach ($result['pages'] as $page) {
    echo $page['url'] . ' — ' . $page['status'] . "\n";
    // $page['html_url']     — download raw HTML
    // $page['markdown_url'] — download markdown
}
```

**Re-extract with a new instruction:**

Runs a new AI transformation over an existing completed crawl without re-crawling. Charges credits for the transformation only.

```
$newJob = $spidra->crawl->extract(
    $sourceJobId,
    'Extract only the product SKUs and prices as a flat list'
);

// Poll the new job
$result = $spidra->crawl->get($newJob['jobId']);
```

**Crawl history and stats:**

```
$history = $spidra->crawl->history(page: 1, limit: 10);
foreach ($history['jobs'] as $job) {
    echo "{$job['base_url']} — {$job['status']} — {$job['pages_crawled']} pages\n";
}

$stats = $spidra->crawl->stats();
echo "Total crawls: {$stats['total']}\n";
```

Logs
----

[](#logs)

Scrape logs are stored for every job that runs through the API.

```
// List logs with optional filters
$result = $spidra->logs->list([
    'status'    => 'failed',       // 'success' | 'failed'
    'searchTerm'=> 'amazon.com',
    'dateStart' => '2024-01-01',
    'dateEnd'   => '2024-12-31',
    'page'      => 1,
    'limit'     => 20,
]);

foreach ($result['data']['logs'] as $log) {
    echo $log['urls'][0]['url'] . ' — ' . $log['status'] . ' — ' . $log['credits_used'] . " credits\n";
}
```

**Get a single log with full extraction result:**

```
$log = $spidra->logs->get('log-uuid-here');
print_r($log['data']['result_data']); // full AI output for that job
```

Usage Statistics
----------------

[](#usage-statistics)

Returns credit and request usage broken down by day or week.

```
// Range options: '7d' | '30d' | 'weekly'
$result = $spidra->usage->get('30d');

foreach ($result['data'] as $row) {
    echo "{$row['date']}: {$row['requests']} requests, {$row['credits']} credits, {$row['tokens']} tokens\n";
}
```

Error Handling
--------------

[](#error-handling)

Every API error throws a typed exception. Catch the specific class you care about or fall back to the base `SpidraException`.

```
use Spidra\Exceptions\AuthenticationException;
use Spidra\Exceptions\InsufficientCreditsException;
use Spidra\Exceptions\RateLimitException;
use Spidra\Exceptions\ServerException;
use Spidra\Exceptions\SpidraException;

try {
    $job = $spidra->scrape->run([...]);
} catch (AuthenticationException $e) {
    // 401 — API key is missing or invalid
    echo "Check your API key\n";
} catch (InsufficientCreditsException $e) {
    // 403 — monthly credit limit reached
    echo "Out of credits. Top up at app.spidra.io\n";
} catch (RateLimitException $e) {
    // 429 — too many requests
    echo "Rate limited, back off and retry\n";
} catch (ServerException $e) {
    // 500 — something went wrong on Spidra's side
    echo "Server error, try again\n";
} catch (SpidraException $e) {
    // Any other API error
    echo "{$e->statusCode}: {$e->getMessage()}\n";
}
```

Custom Base URL
---------------

[](#custom-base-url)

```
$spidra = new SpidraClient(
    apiKey:  'spd_YOUR_API_KEY',
    baseUrl: 'http://localhost:4321/api', // for local development
);
```

Resources
---------

[](#resources)

- [Spidra Documentation](https://docs.spidra.io)
- [Spidra Dashboard](https://app.spidra.io)
- [Report an issue](https://github.com/spidra-io/spidra-php/issues)

License
-------

[](#license)

MIT

###  Health Score

34

—

LowBetter than 75% of packages

Maintenance90

Actively maintained with recent releases

Popularity3

Limited adoption so far

Community6

Small or concentrated contributor base

Maturity32

Early-stage or recently created project

 Bus Factor1

Top contributor holds 100% of commits — single point of failure

How is this calculated?**Maintenance (25%)** — Last commit recency, latest release date, and issue-to-star ratio. Uses a 2-year decay window.

**Popularity (30%)** — Total and monthly downloads, GitHub stars, and forks. Logarithmic scaling prevents top-heavy scores.

**Community (15%)** — Contributors, dependents, forks, watchers, and maintainers. Measures real ecosystem engagement.

**Maturity (30%)** — Project age, version count, PHP version support, and release stability.

###  Release Activity

Cadence

Unknown

Total

1

Last Release

51d ago

### Community

Maintainers

![](https://www.gravatar.com/avatar/e9da9279072f013a2930dd3e9b8c2b13c1c961b40e2376e5256f1d303ac987d8?d=identicon)[olawanlejoel](/maintainers/olawanlejoel)

---

Top Contributors

[![olawanlejoel](https://avatars.githubusercontent.com/u/57611810?v=4)](https://github.com/olawanlejoel "olawanlejoel (3 commits)")

---

Tags

ai-scrapingdata-extractionphp-web-scraperphp-web-scrapingwebscrapingaicrawlingscrapingweb-scraping spidra

###  Code Quality

TestsPHPUnit

### Embed Badge

![Health badge](/badges/spidra-spidra-php/health.svg)

```
[![Health](https://phpackages.com/badges/spidra-spidra-php/health.svg)](https://phpackages.com/packages/spidra-spidra-php)
```

###  Alternatives

[maestroerror/laragent

Power of AI Agents in your Laravel project

638142.5k](/packages/maestroerror-laragent)[crwlr/crawler

Web crawling and scraping library.

37116.4k2](/packages/crwlr-crawler)[mage-os/module-llm-txt

AI-powered LLMs.txt generation for Magento 2 / Mage-OS stores. Help AI systems understand your store with OpenAI-generated content.

223.3k](/packages/mage-os-module-llm-txt)[eslazarev/wildberries-sdk

Wildberries OpenAPI clients (generated).

232.5k](/packages/eslazarev-wildberries-sdk)

PHPackages © 2026

[Directory](/)[Categories](/categories)[Trending](/trending)[Changelog](/changelog)[Analyze](/analyze)
