dimj/leoninventquerylist
========================

A simple, elegant, extensible PHP web scraper (crawler/spider) that uses CSS3 DOM selectors and is based on phpQuery.

[Source](https://github.com/DimJ/LeonInventQueryList) · [Packagist](https://packagist.org/packages/dimj/leoninventquerylist)

 [![QueryList](logo.png)](logo.png)

QueryList
=========

`QueryList` is a simple, elegant, extensible PHP web scraper (crawler/spider) based on phpQuery.

[API Documentation](https://github.com/jae-jae/QueryList/wiki)

[Chinese documentation](README-ZH.md)

Features
--------

- Uses the same CSS3 DOM selectors as jQuery
- Offers the same DOM manipulation API as jQuery
- Includes a generic list-crawling routine
- Ships an HTTP request suite that handles simulated logins, forged browser headers, HTTP proxies, and other complex requests
- Fixes garbled text by converting character encodings
- Filters content with jQuery selectors
- Has a highly modular, extensible design
- Has an expressive API
- Has a rich set of plug-ins

Through plug-ins you can easily implement things like:

- Multithreaded crawling
- Crawling pages rendered dynamically by JavaScript (PhantomJS/headless WebKit)
- Downloading images to local storage
- Simulating browser behavior such as submitting forms
- Building a web crawler
- ...

Requirements
------------

- PHP >= 7.1

Installation
------------

Install with Composer:

```
composer require jaeger/querylist
```

Usage
-----

#### DOM Traversal and Manipulation

- Collect every image link on the GitHub home page

```
QueryList::get('https://github.com')->find('img')->attrs('src');
```

- Crawl Google search results

```
$ql = QueryList::get('https://www.google.co.jp/search?q=QueryList');

$ql->find('title')->text(); //The page title
$ql->find('meta[name=keywords]')->content; //The page keywords

$ql->find('h3>a')->texts(); //Get a list of search results titles
$ql->find('h3>a')->attrs('href'); //Get a list of search results links

$ql->find('img')->src; //Gets the link address of the first image
$ql->find('img:eq(1)')->src; //Gets the link address of the second image
$ql->find('img')->eq(2)->src; //Gets the link address of the third image
// Loop all the images
$ql->find('img')->map(function($img){
	echo $img->alt;  //Print the alt attribute of the image
});
```

- More usage

```
$ql->find('#head')->append('Append content')->find('div')->htmls();
$ql->find('.two')->children('img')->attrs('alt'); // Get the alt text of every img child of elements with class "two"
// Loop over all child nodes of elements with class "two"
$data = $ql->find('.two')->children()->map(function ($item){
    // Use "is" to determine the node type
    if($item->is('a')){
        return $item->text();
    }elseif($item->is('img'))
    {
        return $item->alt;
    }
});

$ql->find('a')->attr('href', 'newVal')->removeClass('className')->html('newHtml')->...
$ql->find('div > p')->add('div > ul')->filter(':has(a)')->find('p:first')->nextAll()->andSelf()->...
$ql->find('div.old')->replaceWith( $ql->find('div.new')->clone())->appendTo('.trash')->prepend('Deleted')->...
```

#### List crawl

Crawl the title and link of each entry in the Google search results list. Each rule maps an output field name to a `[selector, attribute]` pair:

```
$data = QueryList::get('https://www.google.co.jp/search?q=QueryList')
    // Set the crawl rules
    ->rules([
        'title' => ['h3', 'text'],
        'link'  => ['h3>a', 'href'],
    ])
    ->query()->getData();

print_r($data->all());
```

Results:

```
Array
(
    [0] => Array
        (
            [title] => Angular - QueryList
            [link] => https://angular.io/api/core/QueryList
        )
    [1] => Array
        (
            [title] => QueryList | @angular/core - Angularリファレンス - Web Creative Park
            [link] => http://www.webcreativepark.net/angular/querylist/
        )
    [2] => Array
        (
            [title] => QueryListにQueryを追加したり、追加されたことを感知する | TIPS ...
            [link] => http://www.webcreativepark.net/angular/querylist_query_add_subscribe/
        )
        //...
)

```
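
The rule format is easier to see on a small, fixed document than on a live search page. The sketch below assumes the upstream `QL\QueryList` namespace, Composer's `vendor/autoload.php`, and the `QueryList::html()` and `range()` methods described in the upstream API documentation; the HTML snippet and its selectors are made up for illustration.

```php
<?php
require 'vendor/autoload.php';

use QL\QueryList; // assumed: namespace of the upstream jaeger/querylist package

// Hypothetical markup standing in for a fetched page
$html = <<<HTML
<ul id="results">
    <li><a href="/first">First result</a></li>
    <li><a href="/second">Second result</a></li>
</ul>
HTML;

$data = QueryList::html($html)
    // Each rule is 'field' => [selector, attribute]
    ->rules([
        'title' => ['a', 'text'],
        'link'  => ['a', 'href'],
    ])
    // Evaluate the rules once per matching list item
    ->range('#results li')
    ->query()
    ->getData()
    ->all();

print_r($data);
```

Restricting the rules with `range()` keeps each title paired with the link from the same list item, which is usually what a list crawl needs.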

#### Encoding conversion

```
// Output charset: UTF-8
// Input charset: GB2312
QueryList::get('https://top.etao.com')->encoding('UTF-8','GB2312')->find('a')->texts();

// Output charset: UTF-8
// Input charset: detected automatically
QueryList::get('https://top.etao.com')->encoding('UTF-8')->find('a')->texts();
```
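
Conceptually, the conversion re-encodes the fetched bytes before they are parsed. The plain-PHP sketch below illustrates that idea with the `mbstring` extension. It is not QueryList's internal code, and the helper name `toUtf8` and the candidate charset list are assumptions.

```php
<?php
// Hypothetical helper: convert $raw to UTF-8, detecting the source
// charset from a candidate list when none is given.
function toUtf8(string $raw, ?string $from = null): string
{
    if ($from === null) {
        // Strict detection: only candidates in which the bytes are valid are considered
        $from = mb_detect_encoding($raw, ['UTF-8', 'GB2312'], true);
        if ($from === false) {
            throw new RuntimeException('Could not detect the input charset');
        }
    }
    return mb_convert_encoding($raw, 'UTF-8', $from);
}

// Build a GB2312 sample from a UTF-8 literal so this file itself stays UTF-8
$utf8 = '采集';
$gb   = mb_convert_encoding($utf8, 'GB2312', 'UTF-8');

var_dump(toUtf8($gb, 'GB2312') === $utf8); // explicit input charset
var_dump(toUtf8($gb) === $utf8);           // detected input charset
```

Passing the input charset explicitly is the safer choice when it is known, because detection can only pick among the candidates it is given.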

#### HTTP Client (GuzzleHttp)

- Log in to GitHub by sending a browser cookie

```
// Crawl GitHub content
$ql = QueryList::get('https://github.com','param1=testvalue&params2=somevalue',[
  'headers' => [
      // Fill in the cookie from the browser
      'Cookie' => 'SINAGLOBAL=546064; wb_cmtLike_2112031=1; wvr=6;....'
  ]
]);
//echo $ql->getHtml();
$userName = $ql->find('.header-nav-current-user>.css-truncate-target')->text();
echo $userName;
```

- Use an HTTP proxy

```
$urlParams = ['param1' => 'testvalue','params2' => 'somevalue'];
$opts = [
	// Set the http proxy
    'proxy' => 'http://222.141.11.17:8118',
    //Set the timeout time in seconds
    'timeout' => 30,
     // Fake HTTP headers
    'headers' => [
        'Referer' => 'https://querylist.cc/',
        'User-Agent' => 'testing/1.0',
        'Accept'     => 'application/json',
        'X-Foo'      => ['Bar', 'Baz'],
        'Cookie'    => 'abc=111;xxx=222'
    ]
];
$ql->get('http://httpbin.org/get',$urlParams,$opts);
// echo $ql->getHtml();
```

- Simulated login

```
// Log in with a POST request
$ql = QueryList::post('http://xxx.com/login',[
    'username' => 'admin',
    'password' => '123456'
])->get('http://xxx.com/admin');
// Crawl pages that need to be logged in to access
$ql->get('http://xxx.com/admin/page');
//echo $ql->getHtml();
```

#### Submit forms

Log in to GitHub:

```
// Get the QueryList instance
$ql = QueryList::getInstance();
// Get the login form
$form = $ql->get('https://github.com/login')->find('form');

// Fill in the GitHub username and password
$form->find('input[name=login]')->val('your github username or email');
$form->find('input[name=password]')->val('your github password');

// Serialize the form data
$formData = $form->serializeArray();
$postData = [];
foreach ($formData as $item) {
    $postData[$item['name']] = $item['value'];
}

// Submit the login form
$actionUrl = 'https://github.com'.$form->attr('action');
$ql->post($actionUrl,$postData);
// Check whether the login succeeded
// echo $ql->getHtml();
$userName = $ql->find('.header-nav-current-user>.css-truncate-target')->text();
if($userName)
{
    echo 'Login successful ! Welcome:'.$userName;
}else{
    echo 'Login failed !';
}
```
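
The loop that flattens the serialized form fields can also be written with PHP's built-in `array_column`. The sketch below runs on a hand-written array shaped like the name/value pairs that loop iterates over; the field names and values are made up.

```php
<?php
// Hypothetical sample in the shape the loop consumes:
// one ['name' => ..., 'value' => ...] entry per form field
$fields = [
    ['name' => 'login',    'value' => 'alice'],
    ['name' => 'password', 'value' => 'secret'],
    ['name' => 'commit',   'value' => 'Sign in'],
];

// Use 'value' as the array value and 'name' as the key
$postData = array_column($fields, 'value', 'name');

print_r($postData);
```

If a form repeats a field name, later entries overwrite earlier ones, just as in the explicit loop.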

#### Bind function extension

Add a custom `myHttp` method:

```
$ql = QueryList::getInstance();

//Bind a `myHttp` method to the QueryList object
$ql->bind('myHttp',function ($url){
	// $this is the current QueryList object
    $html = file_get_contents($url);
    $this->setHtml($html);
    return $this;
});

// The bound method can then be called by name
$data = $ql->myHttp('https://toutiao.io')->find('h3 a')->texts();
print_r($data->all());
```

Or wrap the logic in a class of your own (here called `MyHttp`) and bind that:

```
$ql->bind('myHttp',function ($url){
    return new MyHttp($this,$url);
});
```

#### Using plugins

- Use the PhantomJS plugin to crawl JavaScript dynamically rendered pages:

```
// Pass the path to the PhantomJS binary when installing the plugin
$ql = QueryList::use(PhantomJs::class,'/usr/local/bin/phantomjs');

// Collect all image links from 500px
$data = $ql->browser('https://500px.com/editors')->find('img')->attrs('src');
print_r($data->all());

// Use an HTTP proxy
$ql->browser('https://500px.com/editors',false,[
    '--proxy' => '192.168.1.42:8080',
    '--proxy-type' => 'http'
]);
```

- Use the CurlMulti plugin to crawl GitHub trending pages with multiple threads:

```
$ql = QueryList::use(CurlMulti::class);
$ql->curlMulti([
    'https://github.com/trending/php',
    'https://github.com/trending/go',
    //.....more urls
])
// Called when a task succeeds
->success(function (QueryList $ql, CurlMulti $curl, $r) {
    echo "Current url:{$r['info']['url']} \r\n";
    $data = $ql->find('h3 a')->texts();
    print_r($data->all());
})
// Called when a task fails
->error(function ($errorInfo, CurlMulti $curl) {
    echo "Current url:{$errorInfo['info']['url']} \r\n";
    print_r($errorInfo['error']);
})
->start([
    // Maximum number of threads
    'maxThread' => 10,
    // Number of error retries
    'maxTry' => 3,
]);
```

Plugins
-------

- [jae-jae/QueryList-PhantomJS](https://github.com/jae-jae/QueryList-PhantomJS): Use PhantomJS to crawl pages rendered dynamically by JavaScript.
- [jae-jae/QueryList-CurlMulti](https://github.com/jae-jae/QueryList-CurlMulti): Multithreaded crawling with cURL.
- [jae-jae/QueryList-AbsoluteUrl](https://github.com/jae-jae/QueryList-AbsoluteUrl): Convert relative URLs to absolute URLs.
- [jae-jae/QueryList-Rule-Google](https://github.com/jae-jae/QueryList-Rule-Google): Google search scraper.
- [jae-jae/QueryList-Rule-Baidu](https://github.com/jae-jae/QueryList-Rule-Baidu): Baidu search scraper.

View more QueryList plugins and QueryList-based products: [QueryList Community](https://github.com/jae-jae/QueryList-Community)

Contributing
------------

Contributions to QueryList are welcome. For plugin contributions, see the [QueryList Plugin Contributing Guide](https://github.com/jae-jae/QueryList-Community/blob/master/CONTRIBUTING.md).

Author
------

Jaeger

If this library is useful to you, say thanks by [buying me a beer 🍺](https://www.paypal.me/jaepay)!

License
-------

QueryList is licensed under the MIT license. See the LICENSE file for more details.


