wengoooo/haixun
===============

Latest release: v0.3 (PHP > 7.0)

[Source](https://github.com/wwwenge/haixun) | [Packagist](https://packagist.org/packages/wengoooo/haixun)

Installation
============

Requirements
------------

> - PHP >= 7.0
> - [PHP cURL extension](http://php.net/manual/en/book.curl.php)
> - [PHP OpenSSL extension](http://php.net/manual/en/book.openssl.php)

Install
-------

Install with [Composer](http://getcomposer.org/):

```
$ composer require wengoooo/haixun
```
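The requirements listed above can be checked from PHP before running a spider. The following is a minimal sketch that uses only built-in functions; the helper name `missingRequirements` is hypothetical and not part of haixun.

```php
<?php
// Returns a list of unmet requirements; an empty array means the
// environment satisfies the README's requirements.
// NOTE: missingRequirements() is a hypothetical helper, not a haixun API.
function missingRequirements(): array
{
    $missing = [];

    // The README asks for PHP >= 7.0.
    if (version_compare(PHP_VERSION, '7.0.0', '<')) {
        $missing[] = 'PHP >= 7.0 (running ' . PHP_VERSION . ')';
    }

    // The README asks for the cURL and OpenSSL extensions.
    foreach (['curl', 'openssl'] as $extension) {
        if (!extension_loaded($extension)) {
            $missing[] = "ext-{$extension}";
        }
    }

    return $missing;
}

$missing = missingRequirements();
echo $missing
    ? 'Missing: ' . implode(', ', $missing) . PHP_EOL
    : 'All requirements met.' . PHP_EOL;
```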

Quick Start
-----------

> Create a spider

```
require_once "vendor/autoload.php";

use GuzzleHttp\Psr7\Request;
class TheBaseSpider extends \Haixun\Core\Spiders {
    public $maxPage = 1;
    public $currentPage = 1;
    public $userId;

//    public $startUrls = ['http://www.httpbin.org/get', 'http://www.httpbin.org/user-agent'];

    public function startRequests()
    {
        yield new Request("GET", "https://www.domain.com/categories/1735750");
    }

    public function parse(Haixun\Http\Response $response, $index)
    {
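        // When the page contains a #max_page element, read the total page
        // count and capture the user ID from the page body.
        if (sizeof($response->css("#max_page")) > 0) {
            $this->maxPage = (int)$response->css("#max_page")->text();
            $this->currentPage = 1;
            // $result[0][0] is the first full match of the "user_..." pattern.
            preg_match_all("%(user_[^']+)%", $response->getBodyContents(), $result, PREG_PATTERN_ORDER);
            $this->userId = $result[0][0];
        }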

        $uri = new \GuzzleHttp\Psr7\Uri($response->getCurrentUrl());

        while ($this->currentPage++ < $this->maxPage) {
            yield new Request("GET", sprintf("https://%s/load_items/categories/1735750/%s/%s/0", $uri->getHost(), $this->currentPage, $this->userId));
        }

        foreach ($response->css(".item a[href*=items]")->links() as $link) {
            yield new Request("GET", $link->getUri(), ['meta' => ['callback' => 'parseProduct']]);
        }

    }

    public function parseProduct(Haixun\Http\Response $response, $index) {
        var_dump($response->css("h2.itemTitle")->text());
    }

    public function finish() {}
}
```
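The pagination and user ID logic in `parse()` can be tried in isolation, without the crawler. This is a sketch, not the library's code: `extractUserId` and `pageUrls` are hypothetical helpers, the sample body is made up, and the loop condition is assumed to be `currentPage++ < maxPage`, so that pages 2 through `maxPage` are requested after the start page.

```php
<?php
// Extracts the first "user_..." token from a page body, or null when the
// pattern is absent. Same pattern as the spider; the null guard is an
// addition (the spider reads $result[0][0] without checking).
function extractUserId(string $body)
{
    return preg_match("%(user_[^']+)%", $body, $m) === 1 ? $m[1] : null;
}

// Builds the follow-up page URLs the way the spider's while loop does:
// the post-increment runs before sprintf(), so the first URL is page 2.
function pageUrls(string $host, int $maxPage, string $userId): Generator
{
    $currentPage = 1;
    while ($currentPage++ < $maxPage) {
        yield sprintf(
            "https://%s/load_items/categories/1735750/%s/%s/0",
            $host,
            $currentPage,
            $userId
        );
    }
}

// Hypothetical page body containing a quoted user ID.
$body   = "<script>var id = 'user_abc123';</script>";
$userId = extractUserId($body);

$urls = $userId === null
    ? []
    : iterator_to_array(pageUrls("www.domain.com", 3, $userId), false);

print_r($urls);
```

With `maxPage` set to 1 the generator yields nothing, which matches a category that has a single page.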

> Start the spider

```
$crawler = new \Haixun\Core\Crawler(new TheBaseSpider());
$crawler->crawl();
```

DomCrawler Crawler
------------------

> Instantiate

```
use Symfony\Component\DomCrawler\Crawler;

$url = 'https://movie.douban.com/subject/25812712/?from=showing';

$response = file_get_contents($url);
// Extract page data with XPath
$data    = []; // structured data is stored in this array

$crawler = new Crawler();
$crawler->addHtmlContent($response);
```

> Find elements

```
# xpath
$crawler->filterXPath('//*[@id="content"]/h1/span[1]')->text();
$crawler->filterXPath('//*[@id="content"]/h1/span[1]')->html();

# css
$crawler->filter('#content h1 span')->text();
$crawler->filter('#content h1 span')->html();
```
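`text()` and `html()` throw an `InvalidArgumentException` when the selection is empty, so checking `count()` first can make extraction more robust. A minimal sketch, assuming `symfony/dom-crawler` is installed through Composer; `firstText` is a hypothetical helper and the HTML is made up.

```php
<?php
require_once "vendor/autoload.php";

use Symfony\Component\DomCrawler\Crawler;

// Inline HTML stands in for a downloaded page (hypothetical markup).
$html = '<div id="content"><h1><span>Title</span><span>(2019)</span></h1></div>';

$crawler = new Crawler();
$crawler->addHtmlContent($html);

// Returns the text of the first matching node, or $default when the
// XPath expression matches nothing. Hypothetical helper, not a library API.
function firstText(Crawler $crawler, string $xpath, string $default = ''): string
{
    $nodes = $crawler->filterXPath($xpath);

    return $nodes->count() > 0 ? $nodes->text() : $default;
}

$title   = firstText($crawler, '//*[@id="content"]/h1/span[1]');
$missing = firstText($crawler, '//*[@id="no-such-id"]', 'n/a');

echo $title, PHP_EOL;
echo $missing, PHP_EOL;
```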

> Iterate over elements

```
$crawler->filterXPath('//ul[contains(@class,"celebrities-list from-subject")]/li')->each(function (Crawler $node, $i) {
    $node->attr("class"); // get an attribute
});
```

> Get the total count

```
$crawler->filter(".item a[href*=items]")->count();
```

> Iterate over all links

```
foreach($crawler->filter(".item a[href*=items]")->links() as $link) {
    echo $link->getUri();
}
```
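`links()` resolves each `href` against the URI the `Crawler` was given, so relative links generally need a base URI passed to the constructor. A sketch under the assumption that `symfony/dom-crawler` and `symfony/css-selector` are installed; the markup and URL are made up.

```php
<?php
require_once "vendor/autoload.php";

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical listing markup with relative item links.
$html = '<div class="item"><a href="/items/1">One</a></div>'
      . '<div class="item"><a href="/items/2">Two</a></div>';

// The second constructor argument is the URI of the page being parsed;
// relative hrefs are resolved against it.
$crawler = new Crawler(null, 'https://www.domain.com/categories/1735750');
$crawler->addHtmlContent($html);

$uris = [];
foreach ($crawler->filter('.item a[href*=items]')->links() as $link) {
    $uris[] = $link->getUri(); // absolute URI
}

print_r($uris);
```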
