
ScrapeNinja Web scraper PHP API Client
======================================


This library is a thin Guzzle-based wrapper around [ScrapeNinja Web Scraping API](https://scrapeninja.net/).

What is ScrapeNinja?
--------------------


A simple, high-performance web scraping API which:

- has 2 modes of website rendering:
    - `scrape()`: fast; emulates the Chrome TLS fingerprint without Puppeteer/Playwright overhead
    - `scrapeJs()`: a full-fledged real Chrome with JavaScript rendering and basic interaction ([clicking, filling in forms](https://scrapeninja.net/scraper-sandbox?slug=interact-click)).
- is backed by rotating proxies (geos: US, EU, Brazil, France, Germany; 4G residential proxies available; your own proxy can be specified as well upon request).
- has smart retries and timeouts working out of the box.
- lets you extract arbitrary data from raw HTML without dealing with PHP HTML parsing libraries: just pass an `extractor` function, written in JavaScript, and it will be executed on ScrapeNinja servers. ScrapeNinja uses Cheerio, a jQuery-like library, to extract data from HTML; you can quickly build & test your extractor function in the [Live Cheerio Sandbox](https://scrapeninja.net/cheerio-sandbox/). See `/examples/extractor.php` for an extractor that gets pure data from the HackerNews HTML source.

ScrapeNinja Full API Documentation
----------------------------------


ScrapeNinja Live Sandbox
------------------------


ScrapeNinja lets you quickly create and test your web scraper in the browser.

Use cases
---------


A popular use case for ScrapeNinja is when regular Guzzle/cURL fails to retrieve the scraped website's response reliably, even with headers fully identical to a real browser's, and gets 403 or 5xx errors instead.

Another major use case is when you want to avoid Puppeteer setup and maintenance but still need real JavaScript rendering instead of sending raw network requests.

ScrapeNinja helps reduce the amount of code needed for retrieving HTTP responses and dealing with retries, proxy handling, and timeouts.

### Read more about ScrapeNinja:


### Get your free access key here:


See the `/examples` folder for examples.

Installation
============


```
composer require restyler/scrapeninja-api-php-client

```

Examples:
=========


The `/examples` folder of this repo contains quick, ready-to-launch examples of how ScrapeNinja can be used. To execute these examples in a terminal, [retrieve your API key](https://rapidapi.com/restyler/api/scrapeninja) and then set it as an environment variable:

```
export SCRAPENINJA_RAPIDAPI_KEY=YOUR-KEY
php ./examples/extractor.php
```

### Basic scrape request


```
use ScrapeNinja\Client;

$client = new Client([
    "rapidapi_key" => getenv('SCRAPENINJA_RAPIDAPI_KEY')
]);

$response = $client->scrape([
  // target website URL
  "url" => "https://news.ycombinator.com/",

  // Proxy geo. eu, br, de, fr, 4g-eu, us proxy locations are available. Default: "us"
  "geo" => "us",

  // Custom headers to pass to target website. Space after ':' is mandatory according to HTTP spec.
  // User-agent header is not required, it is attached automatically.
  "headers" => ["Some-custom-header: header1-val", "Other-header: header2-val"],

  "method" => "GET" // HTTP method to use. Default: "GET". Allowed: "GET", "POST", "PUT".
]);

echo 'Basic scrape response:';

// The response is an associative array, with
// 'body' containing the target website response (as a string) and
// 'info' containing all the metadata.
echo 'HTTP Response status: ' . $response['info']['statusCode'] . "\n";
echo 'HTTP Response headers: ' . print_r($response['info']['headers'], 1) . "\n";
echo 'HTTP Response body (truncated): ' . mb_substr($response['body'], 0, 300) . '...' . "\n";

/*
    Array
(
    [info] => Array
        (
            [version] => 1.1 (string)
            [statusCode] => 200 (integer)
            [statusMessage] => OK (string)
            [headers] => Array
                (
                    [server] => nginx
                    [date] => Mon, 02 May 2022 04:38:12 GMT
                    [content-type] => text/html; charset=utf-8
                    [content-encoding] => gzip
                )

        )

    [body] => ...
)
    */
```
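Note that the target site's status code lives in `info.statusCode`, so it is worth checking it before using `body`. A minimal sketch, using a hypothetical hard-coded `$response` in the shape shown above:

```php
<?php
// Hypothetical response, hard-coded in the shape documented above.
$response = [
    'info' => ['statusCode' => 200, 'statusMessage' => 'OK', 'headers' => []],
    'body' => '<html><head><title>Example</title></head></html>',
];

// Check the target site's status before parsing the body.
$title = '';
if ($response['info']['statusCode'] === 200) {
    if (preg_match('~<title>(.*?)</title>~', $response['body'], $m)) {
        $title = $m[1];
    }
    echo "Page title: {$title}\n"; // prints "Page title: Example"
}
```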

Get full HTML rendered by a real browser (Puppeteer) in PHP:
==========================================================


```
$response = $client->scrapeJs([
    "url" => "https://news.ycombinator.com/"
]);
```

Extract data from raw HTML:
===========================


```
// JavaScript extractor function, executed on ScrapeNinja servers
$extractor = "// define a function which accepts body and cheerio as args
    function extract(input, cheerio) {
        // return an object with extracted values
        let $ = cheerio.load(input);

        let items = [];
        $('.titleline').map(function() {
            let infoTr = $(this).closest('tr').next();
            let commentsLink = infoTr.find('a:contains(comments)');
            items.push([
                $(this).text(),
                $('a', this).attr('href'),
                infoTr.find('.hnuser').text(),
                parseInt(infoTr.find('.score').text()),
                infoTr.find('.age').attr('title'),
                parseInt(commentsLink.text()),
                'https://news.ycombinator.com/' + commentsLink.attr('href'),
                new Date()
            ]);
        });

        return { items };
    }";

// the extractor function works identically with both scrape() and scrapeJs() ScrapeNinja rendering modes
$response = $client->scrapeJs([
    'url' => 'https://scrapeninja.net/samples/hackernews.html',
    'extractor' => $extractor
]);

echo 'Extractor function test:';
print_r($response['extractor']);
```

The response will contain a PHP array with pure data:

```
(
    [result] => Array
        (
            [items] => Array
                (
                    [0] => Array
                        (
                            [0] => A bug fix in the 8086 microprocessor, revealed in the die's silicon (righto.com)
                            [1] => https://www.righto.com/2022/11/a-bug-fix-in-8086-microprocessor.html
                            [2] => _Microft
                            [3] => 216
                            [4] => 2022-11-26T22:28:40
                            [5] => 66
                            [6] => https://news.ycombinator.com/item?id=33757484
                            [7] => 2022-12-19T09:20:53.875Z
                        )

                    [1] => Array
                        (
                            [0] => Cache invalidation is one of the hardest problems in computer science (surfingcomplexity.blog)
                            [1] => https://surfingcomplexity.blog/2022/11/25/cache-invalidation-really-is-one-of-the-hardest-things-in-computer-science/
                            [2] => azhenley
                            [3] => 126
                            [4] => 2022-11-26T03:43:06
                            [5] => 66
                            [6] => https://news.ycombinator.com/item?id=33749677
                            [7] => 2022-12-19T09:20:53.878Z
                        )

                    [2] => Array
                        (
                            [0] => FCC Bans Authorizations for Devices That Pose National Security Threat (fcc.gov)
                            [1] => https://www.fcc.gov/document/fcc-bans-authorizations-devices-pose-national-security-threat
                            [2] => terramex
                            [3] => 236
                            [4] => 2022-11-26T20:01:49
                            [5] => 196
                            [6] => https://news.ycombinator.com/item?id=33756089
                            [7] => 2022-12-19T09:20:53.881Z
                        )
    ....

```
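On the PHP side, these positional rows can be mapped onto named keys so downstream code stays readable. A minimal sketch; the sample row and the key names are illustrative, not part of the API:

```php
<?php
// One hypothetical row in the same positional layout the extractor produces.
$row = [
    'A bug fix in the 8086 microprocessor (righto.com)',                    // title
    'https://www.righto.com/2022/11/a-bug-fix-in-8086-microprocessor.html', // url
    '_Microft',                 // author
    216,                        // score
    '2022-11-26T22:28:40',      // posted at
    66,                         // comment count
    'https://news.ycombinator.com/item?id=33757484', // comments url
    '2022-12-19T09:20:53.875Z', // scraped at
];

// Give each position a name; array_combine pairs keys with values one-to-one.
$keys = ['title', 'url', 'author', 'score', 'postedAt', 'comments', 'commentsUrl', 'scrapedAt'];
$item = array_combine($keys, $row);

echo $item['author'] . ': ' . $item['score'] . " points\n"; // prints "_Microft: 216 points"
```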

Sending POST requests
=====================


ScrapeNinja can perform POST requests.

Sending JSON POST
-----------------


```
$response = $client->scrape([
    "url" => "https://news.ycombinator.com/",
    "headers" => ["Content-Type: application/json"],
    "method" => "POST",
    "data" => "{\"fefe\":\"few\"}"
]);
```
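Rather than hand-escaping a JSON string, the `data` payload can be built with PHP's `json_encode`; a small sketch of the same request options:

```php
<?php
// Build the JSON body programmatically instead of hand-escaping quotes.
$requestOpts = [
    "url"     => "https://news.ycombinator.com/",
    "headers" => ["Content-Type: application/json"],
    "method"  => "POST",
    "data"    => json_encode(["fefe" => "few"]),
];

echo $requestOpts["data"] . "\n"; // prints {"fefe":"few"}
```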

Sending www-encoded POST
------------------------


```
$response = $client->scrape([
    "url" => "https://news.ycombinator.com/",
    "headers" => ["Content-Type: application/x-www-form-urlencoded"],
    "method" => "POST",
    "data" => "key1=val1&key2=val2"
]);
```
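Similarly, `http_build_query` produces a correctly url-encoded form body from an associative array; a small sketch:

```php
<?php
// http_build_query url-encodes keys and values and joins them with '&'.
$data = http_build_query(["key1" => "val1", "key2" => "val2"]);

echo $data . "\n"; // prints key1=val1&key2=val2
```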

Retries logic
=============


By default, ScrapeNinja retries the request 2 times (so 3 requests in total) in case of failure (target website timeout, proxy timeout, or certain provider captcha responses). This behaviour can be modified or disabled.

ScrapeNinja can also be instructed to retry on specific HTTP response status codes, or when certain text appears in the response body (useful for custom captchas):

```
$response = $client->scrape([
    "url" => "https://news.ycombinator.com/",
    "retryNum" => 1, // 0 to disable retries
    "textNotExpected" => [
        "random-captcha-text-which-might-appear"
    ],
    "statusNotExpected" => [
        403,
        502
    ]
]);
```
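If you want an extra client-side safety net on top of ScrapeNinja's server-side retries (e.g. for transient RapidAPI or network errors), you can wrap the call yourself. A sketch under stated assumptions: `scrapeWithRetry` is a hypothetical helper, not part of this library, and `$client` is a `ScrapeNinja\Client` as in the examples above.

```php
<?php
// Hypothetical client-side retry wrapper with exponential backoff.
// Retries on any exception; after the last attempt the error is rethrown.
function scrapeWithRetry($client, array $requestOpts, int $maxAttempts = 3, int $baseDelay = 1)
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        try {
            return $client->scrape($requestOpts);
        } catch (\Exception $e) {
            if ($attempt === $maxAttempts) {
                throw $e; // attempts exhausted, surface the last error
            }
            sleep($baseDelay * (2 ** ($attempt - 1))); // 1s, 2s, 4s, ...
        }
    }
}
```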

Error handling
==============


You should definitely wrap `scrape()` calls in a try/catch handler and log your errors. RapidAPI might go down, the ScrapeNinja server might go down, and the target website might go down.

- If RapidAPI or ScrapeNinja is down, you will get a Guzzle exception, since Guzzle treats any non-200 response from the ScrapeNinja server as an unusual situation (which is good). You might get a 429 error if you exceed your plan limit.
- If ScrapeNinja fails to get a "good" response even after 3 retries, it might throw a 503 error.

In all these cases, it is useful to get the HTTP response of the failure.

```
try {
    $response = $client->scrape($requestOpts);

    // you might want to add your custom errors here
    if ($response['info']['statusCode'] != 200) {
        throw new \Exception('your custom exception because you didn\'t expect this from the target website');
    }
} catch (GuzzleHttp\Exception\ClientException $e) {
    $response = $e->getResponse();

    echo 'Status code: ' . $response->getStatusCode() . "\n";
    echo 'Err message: ' . $e->getMessage() . "\n";

} catch (\Exception $e) {
    // your custom error handling logic; this is a non-Guzzle error
}
```

(see the `examples/` folder for a full error handling example)
