nggiahao/crawler
================

Version 0.1.0 · MIT license · PHP ^7.3

[Source](https://github.com/nggiahao/crawler) · [Packagist](https://packagist.org/packages/nggiahao/crawler)

Laravel Crawler
===============

[![Latest Version on Packagist](https://camo.githubusercontent.com/40c98069b46f51c34208ac7e818b30a55e2730582bcf03ba931a87d663803ae6/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f762f6e6767696168616f2f637261776c65722e7376673f7374796c653d666c61742d737175617265)](https://packagist.org/packages/nggiahao/crawler)[![Total Downloads](https://camo.githubusercontent.com/a1d0c7be0ca403236d834bed593125def2fe8d0de9fc536fa34b4dcae6442598/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f64742f6e6767696168616f2f637261776c65722e7376673f7374796c653d666c61742d737175617265)](https://packagist.org/packages/nggiahao/crawler)

This package collects data from other websites using Guzzle, PhantomJS, or Puppeteer.

It uses Amphp to run multiple processes concurrently.

Installation
------------

Install the package via Composer:

```
composer require nggiahao/crawler
```

Then publish the config and migration files and run the migrations:

```
php artisan vendor:publish --provider="Nggiahao\Crawler\CrawlerServiceProvider" --tag="config"
php artisan vendor:publish --provider="Nggiahao\Crawler\CrawlerServiceProvider" --tag="migrations"
php artisan migrate
```

If you use PhantomJS or Puppeteer, install them as well.

Usage
-----

### Step 1: Create a Site

```
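use Nggiahao\Crawler\SitesConfig\SiteAbstract;
use Symfony\Component\DomCrawler\Crawler; // assumed type of $dom_crawler (Symfony DomCrawler)

class W123job extends SiteAbstract
{
    // Root URL of the site being crawled.
    public function rootUrl(): string
    {
        return 'https://123job.vn';
    }

    // URLs used to seed the first run.
    public function startUrls(): array
    {
        return [
            'https://123job.vn',
        ];
    }

    // Follow only job-listing (/viec-lam/) and company (/company/) pages.
    public function shouldCrawl($url)
    {
        return preg_match("/^https:\/\/123job\.vn\/viec-lam\//", $url)
            || preg_match("/^https:\/\/123job\.vn\/company\//", $url);
    }

    // Extract data only from company pages.
    public function shouldGetData($url)
    {
        return preg_match("/\/company\//", $url);
    }

    // Defines how data is extracted from a page; here it defers to the parent.
    public function getInfoFromCrawler(Crawler $dom_crawler)
    {
        return parent::getInfoFromCrawler($dom_crawler);
    }
}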
```

- `startUrls()` returns the array of URLs used to seed the first run.
- `shouldCrawl()` defines which URLs the crawler should follow.
- `shouldGetData()` defines which URLs data should be extracted from.
- `getInfoFromCrawler()` defines how the data is extracted from a page (using [DomCrawler](https://symfony.com/doc/current/components/dom_crawler.html)).
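
To see how the two URL filters classify pages, here is a minimal standalone sketch that mirrors the patterns from `W123job` as plain functions. The sample URLs are hypothetical, and the sketch does not show how the package itself combines the two checks internally; it only shows what each pattern matches.

```php
<?php
// Standalone equivalents of the two filters in W123job (same patterns,
// written with '#' delimiters so the slashes need no escaping).
function shouldCrawl(string $url): bool
{
    return preg_match('#^https://123job\.vn/viec-lam/#', $url) === 1
        || preg_match('#^https://123job\.vn/company/#', $url) === 1;
}

function shouldGetData(string $url): bool
{
    return preg_match('#/company/#', $url) === 1;
}

// Hypothetical sample URLs, used only for illustration.
$samples = [
    'https://123job.vn/viec-lam/php-developer',
    'https://123job.vn/company/example-co',
    'https://123job.vn/blog/some-post',
];

foreach ($samples as $url) {
    echo $url,
        ' crawl=', var_export(shouldCrawl($url), true),
        ' get_data=', var_export(shouldGetData($url), true),
        PHP_EOL;
}
```

With these patterns, a job-listing URL is followed but not scraped, a company URL is both followed and scraped, and anything else on the site is ignored.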

### Step 2: Register the Site

In `config/crawler.php`, add the site class to `site_config`:

```
'site_config' => [
    W123job::class,
],
```
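
For context, a sketch of how that entry might sit in the published config file. The `App\Crawlers` namespace is a hypothetical location for the class from Step 1, and any other keys the package publishes are omitted here rather than guessed.

```php
<?php
// config/crawler.php (sketch: only the site_config key is shown)

use App\Crawlers\W123job; // hypothetical namespace for the Step 1 class

return [
    'site_config' => [
        W123job::class,
    ],
];
```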

### Step 3: Start

```
$sites = ['W123job'];
$config = [
    'concurrency' => 10,
    'proxy'       => null,
    'browser'     => 'guzzle',
];
$reset = false; // set to true to reset the queue
app(\Nggiahao\Crawler\Crawler::class)->run($sites, $config, $reset);
```

### Testing

```
composer test
```

### Changelog

Please see [CHANGELOG](CHANGELOG.md) for more information on what has changed recently.

Contributing
------------

Please see [CONTRIBUTING](CONTRIBUTING.md) for details.

### Security

If you discover any security-related issues, please email the maintainer instead of using the issue tracker.

Credits
-------

- [Nguyen Gia Hao](https://github.com/nggiahao)
- [All Contributors](../../contributors)

License
-------

The MIT License (MIT). Please see [License File](LICENSE.md) for more information.

