
panakour/pkscraper
==================

Get whatever data you want.

v1.2.0(2y ago)11881MITPHPPHP ^8.1CI failing

Since Jun 13Pushed 2y agoCompare

[Source](https://github.com/panakour/pkscraper) | [Packagist](https://packagist.org/packages/panakour/pkscraper)

[![ci](https://github.com/panakour/pkscraper/actions/workflows/ci.yml/badge.svg)](https://github.com/panakour/pkscraper/actions/workflows/ci.yml)[![Code Coverage Badge](https://raw.githubusercontent.com/panakour/pkscraper/image-data/coverage.svg)](https://raw.githubusercontent.com/panakour/pkscraper/image-data/coverage.svg)

Installation
------------

`composer require panakour/pkscraper`

Examples
--------

### Create an HTTP client with proxy and headers

```
$httpClient = new \Pkscraper\Http\GuzzleClient();
$httpClient->setProxy('socks5://172.17.0.1:9050', 'socks5://172.17.0.1:9050');
$httpClient->setHeaders(['User-Agent' => 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36']);
$httpClient->newClient();
```

### Get text from a single URL

```
$resp = $httpClient->doGetRequest("https://example.com/");
$con = new Text("img", new SymfonyDomCrawler($resp->getBody()->getContents()), "//meta[@property='og:image']/@content");
$con->build();
\Pkscraper\ToolBox::debugResult($con->getExtractedValue());
```
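
To make clear what the XPath expression in this example selects, here is a minimal sketch that uses only PHP's built-in `DOMDocument` and `DOMXPath` rather than pkscraper itself. The HTML string is a made-up stand-in for the response body, and the sketch is not a description of how `Text` or `SymfonyDomCrawler` work internally.

```php
<?php
// Stand-in markup (an assumption); in the example above it would come from
// $resp->getBody()->getContents().
$html = <<<'HTML'
<html><head>
<meta property="og:title" content="Example">
<meta property="og:image" content="https://example.com/cover.jpg">
</head><body><p>Hello</p></body></html>
HTML;

$doc = new DOMDocument();
// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Same expression as above: it selects the content attribute node itself.
$nodes = $xpath->query("//meta[@property='og:image']/@content");

// Take the first match, or null when the page has no og:image tag.
$ogImage = $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;
echo $ogImage, PHP_EOL;
```

Because the expression ends in `/@content`, the match is an attribute node, so its value is the image URL rather than the text of an element.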

### Concurrent requests and group multiple fields

```
$urls = UrlExtractor::extract($httpClient, 'https://www.example.com/feed', "//item/link");

$bags = []; // initialise before the loop so the result is defined even when $urls is empty
$pool = $httpClient->concurrentRequests($urls);
foreach ($pool as $index => $response) {
    if ($response instanceof \GuzzleHttp\Exception\RequestException) {
        dd('something went wrong');
    }
    $domCrawler = new SymfonyDomCrawler($response->getBody()->getContents());
    $bags[$index] = new Bag($urls[$index]);
    $titleItem = new Text('title', $domCrawler, "//article/div[@class='box']/h2/a");
    $featuredImage = new Text('featuredImage', $domCrawler, '//meta[@property="og:image"]/@content');
    $htmlContentItem = new SafeHtml('mainContent', $domCrawler, "//article/div[@class='box']");
    $storeTitles = new TextArray('tags', $domCrawler, "//div[@class='box']/div[@class='cp-admin-row']//a[@rel='tag']");
    $storeTitles->setRequired(false);
    $bags[$index]->setItems($featuredImage, $titleItem, $htmlContentItem, $storeTitles);
    $bags[$index]->build();
}
ToolBox::debugResult($bags);
```
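
The `"//item/link"` expression passed to `UrlExtractor::extract` picks the link of every feed item. The sketch below shows that selection with PHP's built-in DOM classes on a hypothetical RSS snippet. It assumes the extractor returns the text of each matched node, which the README does not state, so treat it as an illustration of the XPath only.

```php
<?php
// Hypothetical feed content; a real feed would be fetched over HTTP.
$feed = <<<'XML'
<rss version="2.0"><channel>
  <title>Example feed</title>
  <link>https://www.example.com/</link>
  <item><title>First</title><link>https://www.example.com/posts/1</link></item>
  <item><title>Second</title><link> https://www.example.com/posts/2 </link></item>
</channel></rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($feed);
$xpath = new DOMXPath($doc);

$urls = [];
// Only <link> elements that are direct children of <item> match,
// so the channel-level link is skipped.
foreach ($xpath->query('//item/link') as $node) {
    $urls[] = trim($node->textContent); // feeds often pad values with whitespace
}

print_r($urls);
```

The resulting list is zero-indexed, which matches the way `$urls[$index]` is paired with each response in the loop above.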

### More advanced example:

```
    $pool = $httpClient->concurrentRequests($urls);
    $bags = [];
    foreach ($pool as $index => $response) {
        try {
            if ($response instanceof \GuzzleHttp\Exception\RequestException) {
                continue;
            }
            $domCrawler = new SymfonyDomCrawler($response->getBody()->getContents());
            $bags[$index] = new Bag($urls[$index]);

            $titleItem = new Text('title', $domCrawler, "//div[@class='grayTopCnt topInfo ']/div[@class='row'][2]/div[@class='col col12']/div[@class='title']/h1");
            $featuredImage = new Text('featuredImage', $domCrawler, "//div[@class='imgWrp']/div[@class='topImg mainVideo']/div[@class='item']/picture/img[@class='lazyload']/@data-src");
            // Reuse $domCrawler: the response body stream was already read above, and $resp is not defined in this loop.
            $safeHtmlContent = new \Pkscraper\Items\SafeHtml('contentTest', $domCrawler, "//div[@id='main-post']/div[@class='post']/div[@class='blog-standard']/div[@class='cntTxt']");
            $safeHtmlContent->addTransformer(new \Pkscraper\Transform\ImageRelativeSourceToAbsoluteTransformer($httpClient->getCurrentUrlWithoutPath()));
            $safeHtmlContent->addRemover(new \Pkscraper\Remove\ElementByTagByIndexRemover('img', 0));
            $safeHtmlContent->addRemover(new \Pkscraper\Remove\ElementByTagByIndexRemover('a', 0));
            $safeHtmlContent->addCleaner(new \Pkscraper\Clean\TextCleaner('                Loading...                						', ''));
            $safeHtmlContent->addRemover(new \Pkscraper\Remove\ElementsByTagRemover('footer'));
            $safeHtmlContent->addTransformer(new \Pkscraper\Transform\ImageRelativeSourceToAbsoluteTransformer($httpClient->getCurrentUrlWithoutPath()));
            $safeHtmlContent->addCleaner(new \Pkscraper\Clean\RegExCleaner('/|>(?1))/', ''));
            $safeHtmlContent->addCleaner(new \Pkscraper\Clean\RegExCleaner('/|>)(?1)/', ''));
            $safeHtmlContent->addRemover(new \Pkscraper\Remove\ElementByIdRemover('jp-post-flair'));
            $safeHtmlContent->addRemover(new \Pkscraper\Remove\ElementByClassByIndexRemover('size-full', 0));
            $safeHtmlContent->addCleaner(new \Pkscraper\Clean\TextCleaner('";                        i.innerHTML=l};                      //]]&gt;                    ', ''));
            $safeHtmlContent->addRemover(new \Pkscraper\Remove\ElementsContainsClassRemover('post-'));
            $safeHtmlContent->addTransformer(new ImageRelativeSourceToAbsoluteTransformer($httpClient->getCurrentUrlWithoutPath()));
            $domRunnerBeforePurify = function () {
                foreach ($this->getAttributesValue('img', 'data-src') as $index => $imgLink) {
                    $paths = \Pkscraper\ToolBox::getUrlPathComponents($imgLink);

                    if (isset($paths[3]) && $paths[3] === "YouTube") { //this let me find which img element is used for youtube and fix them
                        $youtubeId = substr($paths[4], 0, -4);

                        $iframe = $this->DOMDocument->createElement('iframe');
                        $iframe->setAttribute('src', "https://www.youtube.com/embed/$youtubeId");

                        $elementToBeReplaced = $this->getNodeList('img')->item($index);
                        if ($elementToBeReplaced) {
                            $this->replaceElement($elementToBeReplaced, $iframe);
                        }
                    }
                }
                foreach ($this->getAttributesValue('img', 'data-src') as $index => $imgLink) { //the rest is not a youtube but only image
                    $this->replaceImagesAttributes("", $imgLink);
                }

            };

            $htmlContent = new SafeHtml('mainContent', $domCrawler,
                "//div[@class='main withShare']/div[@class='content details']/div[@class='cntTxt']", [
                    'h1',
                    'h2',
                    'h3',
                    'h4',
                    'h5',
                    'h6',
                    'div',
                    'a',
                    'em',
                    'strong',
                    'b',
                    'cite',
                    'blockquote',
                    'ul',
                    'ol',
                    'li',
                    'dl',
                    'dt',
                    'dd',
                    'img',
                    'br',
                    'p',
                    'center',
                    'span',
                    'table',
                    'thead',
                    'tbody',
                    'td',
                    'th',
                    'tr',
                    'sub',
                    'sup',
                ], $domRunnerBeforePurify);

            $bags[$index]->setItems($featuredImage, $titleItem, $htmlContent);
            $bags[$index]->build();
        } catch (\Exception $e) {
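            // Report the failure and move on to the next response.
            print 'Skipping ' . $urls[$index] . ': ' . $e->getMessage() . PHP_EOL;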
        }
    }
    echo(json_encode(Collector::collect($bags), JSON_UNESCAPED_UNICODE));
```
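
The `$domRunnerBeforePurify` closure above swaps YouTube thumbnail images for iframes and promotes `data-src` to `src` on the remaining images. The following standalone sketch reproduces that idea with PHP's built-in DOM classes only. The image URLs, the video ID, and the way the path is split into components are assumptions made for illustration; `ToolBox::getUrlPathComponents` may index its result differently.

```php
<?php
// Hypothetical content block with one YouTube thumbnail and one ordinary lazy-loaded image.
$html = <<<'HTML'
<div>
<img data-src="https://cdn.example.com/media/2024/thumbs/YouTube/abc123XYZ_0.jpg">
<img data-src="https://cdn.example.com/media/2024/photos/cat.jpg">
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Copy the live node list into an array first: replacing nodes while
// iterating a live DOMNodeList can skip elements.
$images = iterator_to_array($doc->getElementsByTagName('img'));

foreach ($images as $img) {
    $src = $img->getAttribute('data-src');
    // Assumed splitting: path segments without the leading slash.
    $parts = explode('/', trim((string) parse_url($src, PHP_URL_PATH), '/'));

    if (isset($parts[3], $parts[4]) && $parts[3] === 'YouTube') {
        // Drop the four-character file extension to get the video ID.
        $videoId = substr($parts[4], 0, -4);
        $iframe = $doc->createElement('iframe');
        $iframe->setAttribute('src', "https://www.youtube.com/embed/$videoId");
        $img->parentNode->replaceChild($iframe, $img);
    } else {
        // Ordinary image: expose the lazy-load URL as the real source.
        $img->setAttribute('src', $src);
        $img->removeAttribute('data-src');
    }
}

$result = $doc->saveHTML($doc->getElementsByTagName('div')->item(0));
echo $result, PHP_EOL;
```

Handling both cases in one pass avoids relying on node indexes that shift once an image has been replaced.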

### Health Score

32 (Low). Better than 72% of packages.

Maintenance: 20. Infrequent updates; may be unmaintained.

Popularity: 13. Limited adoption so far.

Community: 9. Small or concentrated contributor base.

Maturity: 74. Established project with proven stability.

Bus Factor: 1. Top contributor holds 89.3% of commits, a single point of failure.

How is this calculated?

**Maintenance (25%)**: Last commit recency, latest release date, and issue-to-star ratio. Uses a 2-year decay window.

**Popularity (30%)**: Total and monthly downloads, GitHub stars, and forks. Logarithmic scaling prevents top-heavy scores.

**Community (15%)**: Contributors, dependents, forks, watchers, and maintainers. Measures real ecosystem engagement.

**Maturity (30%)**: Project age, version count, PHP version support, and release stability.

### Release Activity

Cadence: every ~183 days (recently every ~345 days)

Total releases: 10

Last release: 878d ago

PHP version history (3 changes):

v1.0.0: PHP ^7.2

1.0.6: PHP ^7.2|^8.0

v1.1.0: PHP ^8.1

### Community

Maintainers: [panakour](/maintainers/panakour)

Top Contributors: [panakour](https://github.com/panakour) (25 commits), github-actions[bot] (3 commits)


Tags: crawler, crawling, scraper, scraping, scraping-websites, webcrawler, scraper

### Code Quality

Tests: PHPUnit

Static Analysis: Psalm

Type Coverage: Yes

