
Category: [Utility & Helpers](/categories/utility)

cacing69/cquery
===============

A PHP scraper with a query-like expression language; it can also be used to scrape data from websites that load content with JavaScript or AJAX.

v1.4.0 (2y ago) · MIT · PHP ^7.2|^8.1

Since Sep 5 · Pushed 2y ago · 1 watcher

[Source](https://github.com/cacing69/cquery) · [Packagist](https://packagist.org/packages/cacing69/cquery) · [Docs](https://github.com/cacing69/cquery) · [Fund (Saweria)](https://saweria.co/cacing69) · [Fund (Trakteer)](https://trakteer.id/cacing69/tip) · [RSS](/packages/cacing69-cquery/feed)

![Cquery logo](https://raw.githubusercontent.com/cacing69/cquery/main/docs/media/logo-sm.png)

Cquery (Crawl Query)
====================


[![Latest Version on Packagist](https://camo.githubusercontent.com/03c73e6fca4555bf2b7f713ef262f72be0652d016349e72e5e1912f35d6dc82a/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f762f636163696e6736392f6371756572792e737667)](https://packagist.org/packages/cacing69/cquery)[![Software License](https://camo.githubusercontent.com/074b89bca64d3edc93a1db6c7e3b1636b874540ba91d66367c0e5e354c56d0ea/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f6c6963656e73652d4d49542d627269676874677265656e2e737667)](LICENSE.md)[![PRs Welcome](https://camo.githubusercontent.com/25b3e6d0d42c98de74a98cbb4d149a1c09020cf6d1361993b72d7d5b8ffed363/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f5052732d77656c636f6d652d627269676874677265656e2e7376673f7374796c653d666c61742d737175617265)](http://makeapullrequest.com)[![StyleCI](https://camo.githubusercontent.com/76ae13bf8ea1b3e603b36ace59e2fef635c84c3d66684fcf5975932221bb0aad/68747470733a2f2f7374796c6563692e696f2f7265706f732f3638313535373330352f736869656c64)](https://styleci.io/repos/681557305)

About Cquery
------------


The goal is to be able to write a web-scraping query like the one below, so that scrape queries can be composed the same way for any website.

```
from (.item)
define
    span > a.title as title
    attr(href, div > h1 > span > a) as url
filter
    span > a.title has 'history'
limit 1
```

### Changelog


Please see the [CHANGELOG](https://cacing69.github.io/cquery/#/CHANGELOG) for more information on what has changed recently.

### Currently experimenting


This is an attempt to make extracting data from web pages more enjoyable. My intention in creating it was to enable scraping of websites that load content via JS/AJAX.

To scrape pages rendered with JS/AJAX, you need an adapter outside of this package; one was developed using `symfony/panther`. I don't want to add it as a default dependency in the core of cquery because this feature is optional for many people. Please check and understand its usage at [cacing69/cquery-panther-loader](https://github.com/cacing69/cquery-panther-loader). For installation and additional information about the underlying driver, read about [symfony/panther](https://github.com/symfony/panther).

All methods and usage instructions provided here are designed around my own needs. Any suggestions or feedback to improve them would be highly appreciated.

> I hope someone kind-hearted will build a web app/UI as a dedicated tool for cquery: a playground with a textarea (for query input) and a table container to show the results, for running raw cquery. If someone is ready, I will create an API for it and start developing more logic in the **Parser** class.

### What kind of thing is this


Cquery is an acronym for "crawl query": a tool for crawling/scraping web pages that extracts text from HTML elements using PHP. It is called a query because it adopts the structure of an SQL query, so you can think of your DOM/HTML document as a table that you query.

Let's play for a moment and figure out how to make website scraping easier, much like crafting a query for a database.

Please keep in mind that I haven't yet reached a beta/stable release for this library, so the available features are still very limited.

Support and contributions from everyone are greatly appreciated. See [CONTRIBUTING.md](CONTRIBUTING.md) for help getting started.

### A few examples of the advanced features

- [Manipulate Query Result](#handle-manipualte-result)
- [Multiple Requests (to get detail from another url)](#handle-multi-async)
- [Doing action after page load (click link/submit form)](#handle-doing-action)
- [Scrape website load by js/ajax with PHP](#handle-js-ajax)

Quick Installation
------------------


```
composer require cacing69/cquery
```

For example, suppose you have a simple HTML page such as the one below.

The sample markup lives in `src/Samples/sample.html`: a page titled "Href Attribute Example" containing a list of `.link` items, each with a title ("Title 1", "Title 2", "Title 3", …) and an anchor ("Href Attribute Example 1", "Href Attribute Example 2", …), followed by a link to the freeCodeCamp Contribution Page and a 2023 copyright notice.

### Available definer expressions

Below are the expressions you can use; they may change over time.

| function | example | description |
| --- | --- | --- |
| `attr(attrName, selector)` | `attr(class, .link)` | retrieves the attribute value (here `class`) of every element matching the selector (`.link`) |
| `length(selector)` | `length(h1)` | retrieves the string length of the content of every element matching the selector (`h1`) |
| `lower(selector)` | `lower(h1)` | converts the text of the matched elements to lowercase |
| `upper(selector)` | `upper(h1)` | converts the text of the matched elements to uppercase |
| `str(selector)` | `str(h1)` | parses the element content as a string |
| `int(selector)` | `int(h1)` | parses the element content as an integer |
| `float(selector)` | `float(h1)` | parses the element content as a float |
| `reverse(selector)` | `reverse(h1)` | reverses the text of the matched elements |
| `replace(from, to, selector)` | `replace('lorem', 'ipsum', h1)` | replaces `lorem` with `ipsum` in the matched elements. Three forms are available: `replace('lorem', 'ipsum', h1)`, `replace(['lorem', 'dolor'], ['ipsum', 'sit'], h1)`, and `replace(['lorem', 'ipsum'], 'ipsum', h1)`. Arguments use single quotes. |
| `append(selectorParent)` | `append(title) as main_title` | appends a single element, taken from outside the main source, as a new key on each item |
| `append_node(selectorParent, selectorChildAfterParent)` | `append_node(div > .tags, a) as tags` | appends an array of child elements to each item; for its usage, refer to the `$result_4` sample below |

### Rules for aliases

Below are the alias rules you can use; they may change over time.
**Note:** nested functions are supported.

| # | example | resulting key | description |
| --- | --- | --- | --- |
| 1 | `h1` | `h1` | - |
| 2 | `h1 > a` | `h1_a` | - |
| 3 | `h1 > a as title` | `title` | - |
| 4 | `append_node(div > .tags, a) as _tags.key` | `_tags[key]` | appends the matched elements as an array on each item |
| 5 | `append_node(div > .tags, a) as tags.*.text` | `tags[0]['text']` | `*` signifies every index; a new key (here `text`) is appended to each array element |

### How to use filter


**Note:** nested filters are not supported yet.

| operator | example | description |
| --- | --- | --- |
| `=` or `==` | `filter("h1", "=", "99")` | elements whose value equals 99 |
| `===` | `filter("h1", "===", "99")` | elements whose value is identical (same value and type) to 99 |
| `>=` | `filter("attr(id, a)", ">=", 99)` | elements whose value is greater than or equal to 99 |
| `<>` or `!=` | `filter("attr(id, a)", "!=", 99)` | elements whose value is not equal to 99 |
| `!==` | `filter("attr(id, a)", "!==", 99)` | elements whose value is not equal to 99 or not of the same type |
| `has` | `filter("attr(class, a)", "has", "foo")` | elements whose class list contains "foo" |
| `regex` | `filter("attr(class, a)", "regex", "/[a-z]+\-[0-9]+\-[a-z]+/im")` | elements matching the given regex pattern (e.g. `a-192-ab`, `b-12-ac`, `zx-1223-ac`) |
| `like` | `filter("attr(class, a)", "like", "%foo%")`, `filter("attr(class, a)", "like", "%foo")`, `filter("attr(class, a)", "like", "foo%")` | SQL-style wildcard match: `%foo%` = contains "foo", `%foo` = ends with "foo", `foo%` = starts with "foo" |

---

So, let's start scraping this website.

```
require_once 'vendor/autoload.php';

$html = file_get_contents("src/Samples/sample.html");
$data = new Cacing69\Cquery\Cquery($html);

$result = $data
        ->from("#lorem .link") // next will be from("(#lorem .link) as el")
        ->define(
            "h1 as title",
            "a as description",
            "attr(href, a) as url", // get the href attribute of every element at #lorem .link a
            "attr(class, a) as class"
        )
        // just imagine this is your table, and every element is a column
        ->filter("attr(class, a)", "has", "vip") // add a filter here
        // ->orFilter("attr(class, a)", "has", "super") // another condition, like OR in SQL
        // ->filter("attr(class, a)", "has", "blocked") // another condition, like AND in SQL
        ->get(); // return type is \Doctrine\Common\Collections\ArrayCollection
```

Or you can use the raw method:

```
require_once 'vendor/autoload.php';

$html = file_get_contents("src/Samples/sample.html");
$data = new Cacing69\Cquery\Cquery($html);

$result = $data
        ->raw("
            from (#lorem .link)
            define
              h1 as title,
              a as description,
              attr(href, a) as url,
              attr(class, a) as class
            filter
              attr(class, a) has 'vip'
        ");
```

And here are the results:

[![Alt text](https://camo.githubusercontent.com/dd6b83fe5b5c5239253446294b5066e1182f55dcd22dff210caed51f9d92b7d8/68747470733a2f2f6763646e622e706272642e636f2f696d616765732f513658484b5279645369676c2e706e673f6f3d31 "a title")](https://camo.githubusercontent.com/dd6b83fe5b5c5239253446294b5066e1182f55dcd22dff210caed51f9d92b7d8/68747470733a2f2f6763646e622e706272642e636f2f696d616765732f513658484b5279645369676c2e706e673f6f3d31)

#### Another example with anonymous function


```
require_once 'vendor/autoload.php';

use Cacing69\Cquery\Definer;

$html = file_get_contents("src/Samples/sample.html");
$data = new Cacing69\Cquery\Cquery($html);
$date = date("Y-m-d"); // captured by the closure below

$result_1 = $data
          ->from("#lorem .link")
          ->define(
              "upper(h1) as title_upper",
              new Definer("a", "col_2", function ($value) use ($date) {
                  return "{$value} fetched on: {$date}";
              })
          )
          ->filter("attr(class, a)", "has", "vip")
          ->limit(2)
          ->get() // return type is \Doctrine\Common\Collections\ArrayCollection
          ->toArray();
```

Output of `$result_1`:

[![Alt text](https://camo.githubusercontent.com/a4363da9c93ea2a188443f8f7b0496948dcf782f263fb97af27f615507d599de/68747470733a2f2f6763646e622e706272642e636f2f696d616765732f7174497456657a63455571372e706e673f6f3d31 "a title")](https://camo.githubusercontent.com/a4363da9c93ea2a188443f8f7b0496948dcf782f263fb97af27f615507d599de/68747470733a2f2f6763646e622e706272642e636f2f696d616765732f7174497456657a63455571372e706e673f6f3d31)

```
// another example, filter with closure
$result_2 = $data
            ->from("#lorem .link")
            ->define("reverse(h1) as title", "attr(href, a) as url")
            ->filter("h1", function ($e) {
                return $e->text() === "Title 3";
            })
            ->get() // -> return type is \Doctrine\Common\Collections\ArrayCollection
            ->toArray();
```

Output of `$result_2`:

[![Alt text](https://camo.githubusercontent.com/a4363da9c93ea2a188443f8f7b0496948dcf782f263fb97af27f615507d599de/68747470733a2f2f6763646e622e706272642e636f2f696d616765732f7174497456657a63455571372e706e673f6f3d31 "a title")](https://camo.githubusercontent.com/a4363da9c93ea2a188443f8f7b0496948dcf782f263fb97af27f615507d599de/68747470733a2f2f6763646e622e706272642e636f2f696d616765732f7174497456657a63455571372e706e673f6f3d31)

#### How to load the source page from a URL


```
// another example: loading data from a URL uses BrowserKit

use Cacing69\Cquery\Cquery;

$url = "https://free-proxy-list.net/";
$data = new Cquery($url);

$result_3 = $data
    ->from(".fpl-list")
    ->pick(
        "td:nth-child(1) as ip_address",
        "td:nth-child(4) as country",
        "td:nth-child(7) as https"
    )->filter('td:nth-child(7)', "=", "no")
    ->limit(1)
    ->get() // return type is \Doctrine\Common\Collections\ArrayCollection
    ->toArray();
```

Output of `$result_3`:

[![Alt text](https://camo.githubusercontent.com/f63d7d3bb5551d29d7932e0083e8489370b694b0b5ad055cb6dc659c0dac3b70/68747470733a2f2f6763646e622e706272642e636f2f696d616765732f57653065613766726c5a77312e706e673f6f3d31 "a title")](https://camo.githubusercontent.com/f63d7d3bb5551d29d7932e0083e8489370b694b0b5ad055cb6dc659c0dac3b70/68747470733a2f2f6763646e622e706272642e636f2f696d616765732f57653065613766726c5a77312e706e673f6f3d31)

#### How to use append\_node(a, b)


```
// another example: loading data from a URL uses BrowserKit

$url = "http://quotes.toscrape.com/";
$data = new Cquery($url);

$result_4 = $data
              ->from(".col-md-8 > .quote")
              ->define(
                  "span.text as text",
                  "span:nth-child(2) > small as author",
                  "append_node(div > .tags, a) as tags"
              )
              ->get() // return type is \Doctrine\Common\Collections\ArrayCollection
              ->toArray();
```

Output of `$result_4`:

[![Alt text](https://camo.githubusercontent.com/edf965134615d4982695bfb06a7e58432e4779c54f3597e71deb8f55cce8a139/68747470733a2f2f6763646e622e706272642e636f2f696d616765732f34366d45547a4161746a75722e706e673f6f3d31 "a title")](https://camo.githubusercontent.com/edf965134615d4982695bfb06a7e58432e4779c54f3597e71deb8f55cce8a139/68747470733a2f2f6763646e622e706272642e636f2f696d616765732f34366d45547a4161746a75722e706e673f6f3d31)

#### Another example: append\_node() with a custom key for each item


```
// another example: loading data from a URL uses BrowserKit

$url = "http://quotes.toscrape.com/";
$data = new Cquery($url);

$result_5 = $data
              ->from(".col-md-8 > .quote")
              ->define(
                  "span.text as text",
                  "append_node(div > .tags, a) as tags.key" // grab child `a` of `div > .tags` and place it into tags['key']
              )
              ->get() // return type is \Doctrine\Common\Collections\ArrayCollection
              ->toArray();
```

Output of `$result_5`:

[![Alt text](https://camo.githubusercontent.com/c877707801ac06b2b82089b55f912c3fe7196e7d0847e0909dc9de32fd4d6baa/68747470733a2f2f6763646e622e706272642e636f2f696d616765732f4e59557353746a49736873662e706e673f6f3d31 "a title")](https://camo.githubusercontent.com/c877707801ac06b2b82089b55f912c3fe7196e7d0847e0909dc9de32fd4d6baa/68747470733a2f2f6763646e622e706272642e636f2f696d616765732f4e59557353746a49736873662e706e673f6f3d31)

```
// another example: loading data from a URL uses BrowserKit

$url = "http://quotes.toscrape.com/";
$data = new Cquery($url);

$result_6 = $data
              ->from(".col-md-8 > .quote")
              ->define(
                  "span.text as text",
                  "append_node(div > .tags, a) as _tags",
                  "append_node(div > .tags, a) as tags.*.text",
                  "append_node(div > .tags, attr(href, a)) as tags.*.url" // [*] means each index; for now it's limited to one level
              )
              ->get() // return type is \Doctrine\Common\Collections\ArrayCollection
              ->toArray();
```

Output of `$result_6`:

[![Alt text](https://camo.githubusercontent.com/cbe6a522d8a1f60ebe15cbcc4441f07d17de1f1395fe37ef6c3dc114d4c1197a/68747470733a2f2f6763646e622e706272642e636f2f696d616765732f6c58686877376841384c59662e706e673f6f3d31 "a title")](https://camo.githubusercontent.com/cbe6a522d8a1f60ebe15cbcc4441f07d17de1f1395fe37ef6c3dc114d4c1197a/68747470733a2f2f6763646e622e706272642e636f2f696d616765732f6c58686877376841384c59662e706e673f6f3d31)

#### How to use replace


```
  // replace with a single string
  $content = file_get_contents(SAMPLE_HTML); // SAMPLE_HTML points to a sample file

  $data = new Cquery($content);

  $result = $data
      ->from(".col-md-8 > .quote")
      ->define(
          "replace('The', 'Lorem', span.text) as text"
      )
      ->get();

  // replace with array arguments
  $data_2 = new Cquery($content);

  $result = $data_2
      ->from(".col-md-8 > .quote")
      ->define(
          "replace(['The', 'are'], ['Please ', 'son'], span.text) as text"
          // "replace(['The', 'are'], ['Please'], span.text) as text" // or do this if you only want a single replacement
      )
      ->get();

  // replace with array arguments and a single replacement
  $data_3 = new Cquery($simpleHtml); // $simpleHtml holds the contents of the simple sample page

  $result = $data_3
      ->from("#lorem .link")
      ->define("replace(['Title', '331'], 'LOREM', h1) as title")
      ->get();
```

#### Methods to manipulate query results

There are two methods in Cquery for manipulating query results.

1. **Each-item closure** — `...->eachItem(function ($el, $i) {})` or `...->eachItem(function ($el) {})`. Example:

```
  ...->eachItem(function ($item, $i) use ($resultDetail) {
    // $resultDetail is assumed to come from an outer scope
    $item["price"] = $i == 2 ? 1000 : $resultDetail["price"];

    return $item;
  })
```

Basically, you can execute any action on each item. In this example, a new key, "price", is inserted into each item; if the index equals 2 (the third item), a price of 1000 is assigned.

2. **On-obtained-results closure** — `...->onObtainedResults(function ($results) {})`. Example:

```
  ...->onObtainedResults(function ($results) {
    // you can do any operation here

    return array_map(function ($_item) {
        $_item["sub"] = [
            "foo" => "bar"
        ];

        return $_item;
    }, $results);
  })
```

Basically, `$results` is the array produced by the query, and you have the flexibility to perform any manipulation on it. This is particularly useful when you need to load different details from another page for each entry; see [the async multiple-request example](#handle-multi-async).
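As a minimal sketch tying both hooks into one complete chain (assuming the sample file from the earlier examples, and assuming the two hooks can be chained together; the added keys `position` and `total_rows` are illustrative):

```php
<?php
require_once 'vendor/autoload.php';

use Cacing69\Cquery\Cquery;

$data = new Cquery(file_get_contents("src/Samples/sample.html"));

$result = $data
    ->from("#lorem .link")
    ->define("h1 as title")
    ->eachItem(function ($item, $i) {
        $item["position"] = $i + 1; // per-item manipulation
        return $item;
    })
    ->onObtainedResults(function ($results) {
        $total = count($results); // whole-result manipulation
        return array_map(function ($item) use ($total) {
            $item["total_rows"] = $total;
            return $item;
        }, $results);
    })
    ->get();
```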

#### How to handle multiple requests per element

[](#--how-to-handle-multiple-request-each-element)

If the details for each row live on a different URL, you have to load every detail page. You should use a client that can perform non-blocking requests, such as [amphp/http-client](https://github.com/amphp), [guzzle](https://github.com/guzzle/guzzle), [reactphp/http](https://github.com/reactphp/http), or [curl\_multi\_init](https://www.php.net/curl_multi_init); for an OOP cURL wrapper, check [php-curl-class](https://github.com/php-curl-class/php-curl-class).

I suggest using ReactPHP to make async requests.

```
  use Cacing69\Cquery\Cquery;
  use React\EventLoop\Loop;
  use React\Http\Browser;
  use Psr\Http\Message\ResponseInterface;

  $url = "http://www.classiccardatabase.com/postwar-models/Cadillac.php";

  $data = new Cquery($url);

  $loop = Loop::get();
  $client = new Browser($loop);

  // detail is on another page
  $result = $data
            ->from(".content")
            ->define(
                ".car-model-link > a as name",
                "replace('../', 'http://www.classiccardatabase.com/', attr(href, .car-model-link > a)) as url",
            )
            ->filter("attr(href, .car-model-link > a)", "!=", "#")
            ->onObtainedResults(function ($results) use ($loop, $client){
                // I've found a practical maximum of 25 per chunk; with 30, some data came back null.
                $results = array_chunk($results, 25);

                foreach ($results as $key => $_chunks) {
                    foreach ($_chunks as $_key => $_result) {
                        $client
                        ->get($_result["url"])
                        ->then(function (ResponseInterface $response) use (&$results, $key, $_key) {
                            $detail = new Cquery((string) $response->getBody());

                            $resultDetail = $detail
                                ->from(".spec")
                                ->define(
                                    ".specleft tr:nth-child(1) > td.data as price"
                                )
                                ->first();

                            $results[$key][$_key]["price"] = $resultDetail["price"];
                        });
                    }
                    $loop->run();
                }

                return $results;
            })
            ->get();
```

Here's a comparison when using ReactPHP.

**without ReactPHP**

[![Alt text](https://camo.githubusercontent.com/eeed37e14492707dcf0152f46bb05103656ed33ea458ad984ed717f1ae5ce6f2/68747470733a2f2f6763646e622e706272642e636f2f696d616765732f6c314747447a5579786173592e706e673f6f3d31 "a title")](https://camo.githubusercontent.com/eeed37e14492707dcf0152f46bb05103656ed33ea458ad984ed717f1ae5ce6f2/68747470733a2f2f6763646e622e706272642e636f2f696d616765732f6c314747447a5579786173592e706e673f6f3d31)

**with ReactPHP**

[![Alt text](https://camo.githubusercontent.com/dec8cf0c57ef80b13c9e5a1fd8dad5b947b1c75b1027cb59c076c679220a6b04/68747470733a2f2f6763646e622e706272642e636f2f696d616765732f6e61644d6c463664354175332e706e673f6f3d31 "a title")](https://camo.githubusercontent.com/dec8cf0c57ef80b13c9e5a1fd8dad5b947b1c75b1027cb59c076c679220a6b04/68747470733a2f2f6763646e622e706272642e636f2f696d616765732f6e61644d6c463664354175332e706e673f6f3d31)

In this scenario there are 320 rows of data, and a detail page is loaded for each, which means a lot of HTTP requests are made to fetch the individual details.

#### How to perform actions after page load (click link/submit form)

1. **Submit a form.** If you need to submit data in order to reach the data you want to scrape, handle this case as follows.

- *Case 1: without the crawler object*

```
use Cacing69\Cquery\Cquery;
use Symfony\Component\BrowserKit\HttpBrowser;

$url = "https://user-agents.net/random";
$data = new Cquery($url);

$result = $data
  ->onContentLoaded(function (HttpBrowser $browser) {
      $browser->submitForm("Generate random list", [
          "limit" => 5,
      ]);

      return $browser;
  })
  ->from("section > article")
  ->define(
      "ol > li > a as user_agent"
  )
  ->get();
```

With the code above, you perform a form submission that sets the `limit` field (matched by input name) to 5.

- *Case 2: with the crawler object*

Let's simulate this on Wikipedia: perform a search for the phrase "sambas" and see whether the results match a manual search.

```
use Cacing69\Cquery\Cquery;
use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\DomCrawler\Crawler;
use Symfony\Component\DomCrawler\Form;

$url = "https://id.wikipedia.org/wiki/Halaman_Utama";
$data = new Cquery($url);

$result = $data
    ->onContentLoaded(function (HttpBrowser $browser, Crawler $crawler) use ($url) {
        // Form is a native class from symfony/dom-crawler.
        $form = new Form($crawler->filter("#searchform")->getNode(0), $url);

        $browser->submit($form, [
            "search" => "sambas",
        ]);

        return $browser;
    })
    ->from("html")
    ->define(
        "title as title"
    )
    ->get();
```

**result**

[![Alt text](https://camo.githubusercontent.com/6ce68f0bf695bd62d6c7cb840d1857c0c665fea62aca6d179c950a30e859609d/68747470733a2f2f6763646e622e706272642e636f2f696d616765732f7448684b33394366437573702e706e673f6f3d31 "a title")](https://camo.githubusercontent.com/6ce68f0bf695bd62d6c7cb840d1857c0c665fea62aca6d179c950a30e859609d/68747470733a2f2f6763646e622e706272642e636f2f696d616765732f7448684b33394366437573702e706e673f6f3d31)

**web page**

[![Alt text](https://camo.githubusercontent.com/0ebf4fe44ff339d37f56c54e25b564490edaf53dcfa5614a9953124bdc07d257/68747470733a2f2f6763646e622e706272642e636f2f696d616765732f4530414b65524351433666302e706e673f6f3d31 "a title")](https://camo.githubusercontent.com/0ebf4fe44ff339d37f56c54e25b564490edaf53dcfa5614a9953124bdc07d257/68747470733a2f2f6763646e622e706272642e636f2f696d616765732f4530414b65524351433666302e706e673f6f3d31)

`page source`

[![Alt text](https://camo.githubusercontent.com/521c2541c11901dbe5d2f162501c3a666520384dc64abfab49f154b875230e75/68747470733a2f2f6763646e622e706272642e636f2f696d616765732f4d396d59474a4c30486a32702e706e673f6f3d31 "a title")](https://camo.githubusercontent.com/521c2541c11901dbe5d2f162501c3a666520384dc64abfab49f154b875230e75/68747470733a2f2f6763646e622e706272642e636f2f696d616765732f4d396d59474a4c30486a32702e706e673f6f3d31)

2. **Click a link.** If you want to click a link on a loaded page, observe the code below.

[![Alt text](https://camo.githubusercontent.com/bfd30ecf0076a30d2b876155164f10e5646f9c9e099ceecad04cddadb3549e4e/68747470733a2f2f6763646e622e706272642e636f2f696d616765732f626e667a616d7470385670722e706e673f6f3d31 "a title")](https://camo.githubusercontent.com/bfd30ecf0076a30d2b876155164f10e5646f9c9e099ceecad04cddadb3549e4e/68747470733a2f2f6763646e622e706272642e636f2f696d616765732f626e667a616d7470385670722e706e673f6f3d31)

**click that link before scraping starts**

```
use Cacing69\Cquery\Cquery;
use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\DomCrawler\Crawler;

$url = "https://semver.org/";
$data = new Cquery($url);

$result = $data
    ->onContentLoaded(function (HttpBrowser $browser, Crawler $crawler) {
        $browser->clickLink("Bahasa Indonesia (id)");

        return $browser;
    })
    ->from("#spec")
    ->define(
        "h2 as text"
    )
    ->get();
```

**result of clicking the link**

[![Alt text](https://camo.githubusercontent.com/e9938f620a8bf902da60088f9a035a5857dffcda1529497c8eb4ecdb5da35b18/68747470733a2f2f6763646e622e706272642e636f2f696d616765732f7166497467334148705473522e706e673f6f3d31 "a title")](https://camo.githubusercontent.com/e9938f620a8bf902da60088f9a035a5857dffcda1529497c8eb4ecdb5da35b18/68747470733a2f2f6763646e622e706272642e636f2f696d616765732f7166497467334148705473522e706e673f6f3d31)

#### How to scrape a website loaded by JS/AJAX with PHP

If the page to be scraped uses JavaScript/AJAX to load its data, you need to add the Panther loader for cquery.

Install the loader via Composer:

```
composer require cacing69/cquery-panther-loader
```
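A HYPOTHETICAL sketch of what using the loader might look like: the class name `PantherLoader`, its namespace, and the constructor argument are all assumptions here, not confirmed API; consult the [cacing69/cquery-panther-loader](https://github.com/cacing69/cquery-panther-loader) README for the real usage.

```php
<?php
require_once 'vendor/autoload.php';

use Cacing69\Cquery\Cquery;
// HYPOTHETICAL: namespace and class name are assumptions.
use Cacing69\CqueryPantherLoader\PantherLoader;

// HYPOTHETICAL: passing a loader to the constructor is an assumption.
$data = new Cquery("https://example.com/js-rendered-page", new PantherLoader());

$result = $data
    ->from(".rendered-by-js")
    ->define("h2 as title")
    ->get();
```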

#### Another Examples


A full list of methods with example code can be found in the [tests](https://github.com/cacing69/cquery/tree/main/tests).

### Note


I've only recently started building this, and I would certainly appreciate feedback from everyone who has read or seen my little project, in any form (issue, pull request, or otherwise). Right now I'm focused on making it more flexible and user-friendly for website scraping.

This is just the beginning, and I will continue to develop it for as long as I can.

License
-------


The MIT License (MIT). Please see [License File](LICENSE.md) for more information.

### Health Score

**28 — Low.** Better than 54% of packages.

- **Maintenance: 20** — infrequent updates; may be unmaintained
- **Popularity: 12** — limited adoption so far
- **Community: 16** — small or concentrated contributor base
- **Maturity: 57** — maturing project, gaining a track record
- **Bus factor: 1** — the top contributor holds 92.7% of commits, a single point of failure

How is this calculated?

- **Maintenance (25%)** — last commit recency, latest release date, and issue-to-star ratio; uses a 2-year decay window.
- **Popularity (30%)** — total and monthly downloads, GitHub stars, and forks; logarithmic scaling prevents top-heavy scores.
- **Community (15%)** — contributors, dependents, forks, watchers, and maintainers; measures real ecosystem engagement.
- **Maturity (30%)** — project age, version count, PHP version support, and release stability.

### Release Activity

- Cadence: every ~2 days
- Total releases: 7
- Last release: 969d ago

### Community

Maintainers: [ibnuul](/maintainers/ibnuul)

Top contributors: [cacing69](https://github.com/cacing69) (242 commits), [StyleCIBot](https://github.com/StyleCIBot) (16 commits), [nrzkh](https://github.com/nrzkh) (2 commits), [alphard29](https://github.com/alphard29) (1 commit)

Tags: crawler, dom, html, html-parser, php, scrape, scraper, phpcrawler

### Code Quality

- Tests: PHPUnit
- Code style: PHP CS Fixer

### Embed Badge

![Health badge](/badges/cacing69-cquery/health.svg)

```
[![Health](https://phpackages.com/badges/cacing69-cquery/health.svg)](https://phpackages.com/packages/cacing69-cquery)
```

### Alternatives

- [sulu/sulu](/packages/sulu-sulu) — core framework that implements the functionality of the Sulu content management system
- [vdb/php-spider](/packages/vdb-php-spider) — a configurable and extensible PHP web spider
- [contao/core-bundle](/packages/contao-core-bundle) — Contao Open Source CMS
- [crwlr/crawler](/packages/crwlr-crawler) — web crawling and scraping library
- [netgen/layouts-core](/packages/netgen-layouts-core) — Netgen Layouts enables you to build and manage complex web pages with less coding; this is the core of Netgen Layouts
- [blackfire/player](/packages/blackfire-player) — a powerful web crawler and web scraper with Blackfire support

PHPackages © 2026
