
w3zone/crawler
==============

Version 1.0.1 (released 9y ago) · MIT License · PHP >= 5.5

[Source](https://github.com/w3zone/crawler) · [Packagist](https://packagist.org/packages/w3zone/crawler)

Crawler
=======

Write Less, Do More.

Installation
------------

```
composer require w3zone/crawler
```

### Requirements

- node.js > 4.x (needed by the nodejsRequest service)
- libcurl
- php-curl
- the node.js `request` module:

```
npm install request
```

Usage
-----

```
require_once 'vendor/autoload.php';

use w3zone\Crawler\{Crawler, Services\phpCurl};

$crawler = new Crawler(new phpCurl);

$link = 'http://www.example.com';

// run() returns an array: [statusCode, body, headers, cookies]
// get() accepts either a URL string or an array of [url, query string]
$homePage = $crawler->get($link)->dumpHeaders()->run();

// reuse the cookies from the first response in read-and-write mode
$response = $crawler->get($link)->dumpHeaders()->cookies($homePage['cookies'], 'w+r')->run();
```
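The comment above notes that `get()` also accepts an array of `[url, query string]`. A minimal sketch of that form, assuming the first element is the URL and the second is the raw query string (the exact array shape is not documented here):

```php
require_once 'vendor/autoload.php';

use w3zone\Crawler\{Crawler, Services\phpCurl};

$crawler = new Crawler(new phpCurl);

// assumption: [url, query string] — first element the URL, second the query
$response = $crawler->get(['http://www.example.com/search', 'q=crawler&page=2'])->run();

echo $response['statusCode'];
```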

Available Services
------------------

- phpCurl
    `use w3zone\Crawler\Services\phpCurl;`
- nodejsRequest
    `use w3zone\Crawler\Services\nodejsRequest;`
- cliCurl
    `use w3zone\Crawler\Services\cliCurl;`
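The service object passed to the constructor selects the HTTP backend; the rest of the fluent API stays the same. A sketch of swapping services, assuming the constructor shown under Usage:

```php
require_once 'vendor/autoload.php';

use w3zone\Crawler\Crawler;
use w3zone\Crawler\Services\phpCurl;
use w3zone\Crawler\Services\nodejsRequest;
use w3zone\Crawler\Services\cliCurl;

$viaExtension = new Crawler(new phpCurl);       // uses the php-curl extension
$viaNode      = new Crawler(new nodejsRequest); // uses node.js + the request module
$viaBinary    = new Crawler(new cliCurl);       // shells out to the curl binary

// the same chain works regardless of the chosen service
$response = $viaExtension->get('http://www.example.com')->run();
```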

Available Methods
-----------------

- Get
    `Crawler::get(mixed $arguments);`
    sets the request method to GET;
    accepts a URL string (or an array of `[url, query string]`).
- Post
    `Crawler::post(mixed $arguments);`
    sets the request method to POST;
    accepts an array of options:

```
$arguments = [
    'url' => 'www.example.com/login',
    'data' => [
        'username' => '',
        'password' => ''
    ]
];
```

- Json
    `Crawler::json(void)`
    an easy way to create a JSON request.
- XML
    `Crawler::xml(void)`
    an easy way to create an XML request.
- Referer
    `Crawler::referer(string $referer)`
    sets the referer of the current request.
- Headers
    `Crawler::headers(array $headers)`
    sets additional request headers;
    note that this method overwrites the headers set by `json()` and `xml()`.
- DumpHeaders
    `Crawler::dumpHeaders(void)`
    includes the response headers in the response array.
- Proxy
    `Crawler::proxy(mixed $proxy)`
    sets the proxy IP and proxy type;
    accepts either an IP string or an array with the proxy IP and type:

```
$proxy = [
    'ip' => 'xx.xx.xx.xx:xx',
    'type' => 'socks5'
];
```

If you pass the IP as a string, the proxy type defaults to HTTP.

- Cookies
    `Crawler::cookies(string $file, string $mode)`
    sets the request cookies; the first argument is a cookie string,
    the second is the cookie mode.
    Available modes:
    - `w` : write only
    - `r` : read only
    - `w+r` : read and write
- Initialize
    `Crawler::initialize(array $arguments)`
    initializes or re-initializes the request;
    note that this method overwrites all other options.
- Run
    `Crawler::run(void)`
    fires the request.
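Several of the methods above can be chained in one request. A hedged sketch of a POST through a SOCKS5 proxy with a referer set, using the option arrays documented above (the URL, credentials, and proxy address are placeholders):

```php
require_once 'vendor/autoload.php';

use w3zone\Crawler\{Crawler, Services\phpCurl};

$crawler = new Crawler(new phpCurl);

// POST options, as documented for Crawler::post()
$arguments = [
    'url'  => 'http://www.example.com/login',
    'data' => [
        'username' => 'user',
        'password' => 'pass',
    ],
];

// proxy options, as documented for Crawler::proxy()
$proxy = [
    'ip'   => '127.0.0.1:1080', // placeholder address
    'type' => 'socks5',
];

$response = $crawler
    ->post($arguments)
    ->referer('http://www.example.com')
    ->proxy($proxy)
    ->dumpHeaders()
    ->run();

echo $response['statusCode'];
```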

---

### Examples

A quick example of logging in to GitHub:

```
require_once 'vendor/autoload.php';

use w3zone\Crawler\{Crawler, Services\phpCurl};

$crawler = new Crawler(new phpCurl);

$url = 'https://github.com/login';
$response = $crawler->get($url)->dumpHeaders()->run();

preg_match('#
