
bitandblack/document-crawler
============================

Extract different parts of an HTML or XML document.

0.5.0 (5mo ago) · MIT · PHP >=8.2 · Since Nov 28 · Pushed 4mo ago

[Source](https://github.com/BitAndBlack/document-crawler) · [Packagist](https://packagist.org/packages/bitandblack/document-crawler)


[![PHP from Packagist](https://camo.githubusercontent.com/54ab39a528eb5595c130441335344ae21ef76f1e6ec96ed10bd14c2d31ecc964/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f7068702d762f626974616e64626c61636b2f646f63756d656e742d637261776c6572)](http://www.php.net)[![Latest Stable Version](https://camo.githubusercontent.com/18562f1fb305041ed65cf2f8f8913dd661586eda8e686c6345a7434c60efe2dd/68747470733a2f2f706f7365722e707567782e6f72672f626974616e64626c61636b2f646f63756d656e742d637261776c65722f762f737461626c65)](https://packagist.org/packages/bitandblack/document-crawler)[![Total Downloads](https://camo.githubusercontent.com/46ef464d2dcec0d10dcc354bbdfc28a69c3337e7c3635eca95f092ffc345a773/68747470733a2f2f706f7365722e707567782e6f72672f626974616e64626c61636b2f646f63756d656e742d637261776c65722f646f776e6c6f616473)](https://packagist.org/packages/bitandblack/document-crawler)[![License](https://camo.githubusercontent.com/296f56e1e262d4a2271e5539bdba9e3b37f198591f072ca541f032f831973a78/68747470733a2f2f706f7365722e707567782e6f72672f626974616e64626c61636b2f646f63756d656e742d637261776c65722f6c6963656e7365)](https://packagist.org/packages/bitandblack/document-crawler)

 [ ![Bit&Black Logo](https://camo.githubusercontent.com/6cd6b1b85b6247964059c5d9420aae54482ebf17afbaf5f08127bc49de5916f4/68747470733a2f2f7777772e626974616e64626c61636b2e636f6d2f6275696c642f696d616765732f426974416e64426c61636b2d4c6f676f2d46756c6c2e706e67) ](https://www.bitandblack.com)

Bit&Black Document Crawler
==========================



Installation
------------


This library is intended for use with [Composer](https://packagist.org/packages/bitandblack/document-crawler). Add it to your project by running `composer require bitandblack/document-crawler`.

Usage
-----


### Using Crawlers to extract parts of a document


The *Bit&Black Document Crawler* library provides several crawlers that extract information from a document. The following crawlers currently exist:

- [**AnchorsCrawler**](./src/Crawler/AnchorsCrawler.php): Extracts all anchors that a document declares with `<a>...</a>`.
- [**IconsCrawler**](./src/Crawler/IconsCrawler.php): Extracts all icons that a document declares with `<link rel="icon">`.
- [**ImagesCrawler**](./src/Crawler/ImagesCrawler.php): Extracts all images that a document declares with `<img>`.
- [**LanguageCodeCrawler**](./src/Crawler/LanguageCodeCrawler.php): Extracts the language code that a document declares with `<html lang="...">`.
- [**MetaTagsCrawler**](./src/Crawler/MetaTagsCrawler.php): Extracts all meta tags that a document declares with `<meta>`.
- [**TitleCrawler**](./src/Crawler/TitleCrawler.php): Extracts the title that a document declares with `<title>...</title>`.
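
To make the list above more concrete, the following sketch pulls the same kinds of elements (anchors, icons, images, language code, meta tags, title) out of a small HTML string. It is only an illustration and does not use this library's crawler classes: it relies on PHP's built-in `DOMDocument` and `DOMXPath`, and the sample markup, the `attributeValues()` helper, and the XPath expressions are all assumptions made for the example, not the library's implementation.

```php
<?php

declare(strict_types=1);

// Hypothetical sample document; not taken from the library.
$html = <<<'HTML'
<!DOCTYPE html>
<html lang="en">
<head>
    <title>Example page</title>
    <meta name="description" content="A short description">
    <link rel="icon" href="/favicon.ico">
</head>
<body>
    <a href="/about">About</a>
    <img src="/logo.png" alt="Logo">
</body>
</html>
HTML;

$document = new DOMDocument();
// Suppress parser warnings that libxml may emit for real-world HTML.
@$document->loadHTML($html);

$xpath = new DOMXPath($document);

/**
 * Hypothetical helper: collect one attribute from every element
 * matched by an XPath expression.
 *
 * @return list<string>
 */
function attributeValues(DOMXPath $xpath, string $expression, string $attribute): array
{
    $values = [];

    foreach ($xpath->query($expression) as $node) {
        if ($node instanceof DOMElement && $node->hasAttribute($attribute)) {
            $values[] = $node->getAttribute($attribute);
        }
    }

    return $values;
}

// Assumed selectors, one per kind of element named in the list above.
$anchors = attributeValues($xpath, '//a', 'href');
$icons = attributeValues($xpath, '//link[@rel="icon"]', 'href');
$images = attributeValues($xpath, '//img', 'src');
$metaTags = attributeValues($xpath, '//meta[@name]', 'content');

// The language code is an attribute of the root element.
$languageCode = $document->documentElement?->getAttribute('lang');

// string() returns the text content of the first matching node.
$title = $xpath->evaluate('string(//title)');
```

The library's crawlers wrap this kind of lookup behind dedicated classes, so the raw XPath expressions above should not be needed when using them.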

All of these crawlers work the same way: each one needs a [DomCrawler](https://symfony.com/doc/current/components/dom_crawler.html) object that contains the document:

```
