baraja-core/webcrawler
======================

A simple package that loads a list of URLs and builds a sitemap.

Latest release: v1.3.3 (2y ago) · Requires PHP ^8.0

   ![BRJ logo](https://camo.githubusercontent.com/813c67e02a7ab7e4dc900316a4521c3ddf5846fe2cabba7140f3f4b78afda198/68747470733a2f2f63646e2e62726a2e6170702f696d616765732f62726a2d6c6f676f2f6c6f676f2d6461726b2e706e67)
 [BRJ organisation](https://brj.app)

---

Web crawler
===========

[![Integrity check](https://github.com/baraja-core/webcrawler/workflows/Integrity%20check/badge.svg)](https://github.com/baraja-core/webcrawler/workflows/Integrity%20check/badge.svg)

A simple library for crawling websites by following links, with minimal dependencies.

[Czech documentation](https://php.baraja.cz/stazeni-celeho-webu-po-odkazech)

📦 Installation
--------------

It's best to use [Composer](https://getcomposer.org) for installation, and you can also find the package on [Packagist](https://packagist.org/packages/baraja-core/webcrawler) and [GitHub](https://github.com/baraja-core/webcrawler).

To install, simply use the command:

```
$ composer require baraja-core/webcrawler
```

You can use the package manually by creating an instance of the internal classes, or register a DIC extension to link the services directly to the Nette Framework.

How to use
----------

The crawler can run without any additional dependencies.

With the default settings, create an instance and call the `crawl()` method:

```
$crawler = new \Baraja\WebCrawler\Crawler;

$result = $crawler->crawl('https://example.com');
```

The `$result` variable will hold an entity of type `CrawledResult`.

Advanced checking of multiple URLs
----------------------------------

In a real-world case you often need to download multiple URLs within a single domain and check whether some specific URLs work.

Simple example:

```
$crawler = new \Baraja\WebCrawler\Crawler;

$result = $crawler->crawlList(
    'https://example.com', // Starting (main) URL
    [ // Additional URLs
        'https://example.com/error-404',
        '/robots.txt', // Relative links are also allowed
        '/web.config',
    ]
);
```

Notice: The **robots.txt** file and the sitemap will be downloaded automatically if they exist.
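The example above mixes absolute and relative links. As a rough illustration of how a relative link could be turned into an absolute one, here is a minimal sketch. The `resolveAgainstBase()` helper is hypothetical and is not part of `baraja-core/webcrawler`; the library's own resolution logic may differ. The sketch treats every link without a scheme as root-relative.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of baraja-core/webcrawler).
 * Returns absolute URLs unchanged and treats every other link
 * as a path relative to the root of the base URL's origin.
 */
function resolveAgainstBase(string $baseUrl, string $link): string
{
    // A link that starts with "scheme://" is already absolute.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $link) === 1) {
        return $link;
    }

    $parts = parse_url($baseUrl);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException('Base URL must be absolute: ' . $baseUrl);
    }

    // Rebuild the origin (scheme, host, optional port) of the base URL.
    $origin = $parts['scheme'] . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Append the link as a root-relative path.
    return $origin . '/' . ltrim($link, '/');
}

$urls = array_map(
    static fn (string $link): string => resolveAgainstBase('https://example.com', $link),
    ['https://example.com/error-404', '/robots.txt', '/web.config'],
);

print_r($urls);
```

With `crawlList()` you do not need such a helper, because relative links are accepted directly; the sketch only shows what the absolute form of each additional URL would be.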

Settings
--------

In the constructor of the `Crawler` service you can define your project-specific configuration.

For example:

```
$crawler = new \Baraja\WebCrawler\Crawler(
    new \Baraja\WebCrawler\Config([
        // key => value
    ])
);
```

No value is required. Pass the configuration as a key-value array.
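As a sketch, a configuration that limits the crawl budget might look like the following. The option names are taken from the configuration options listed in this section; the values are purely illustrative. The regex in `forbiddenUrls` is written without delimiters on the assumption that it follows the same format as the default `.+`; how the library wraps or anchors patterns is not documented here.

```php
<?php

declare(strict_types=1);

// Option names come from the configuration options of this package;
// the values below are illustrative, not recommendations.
$options = [
    'followExternalLinks' => false,  // do not follow links to other domains
    'sleepBetweenRequests' => 500,   // milliseconds to wait between requests
    'maxHttpRequests' => 200,        // request budget for the whole crawl
    'maxCrawlTimeInSeconds' => 60,   // stop crawling after one minute
    // Assumption: patterns are written without delimiters, like the default '.+'.
    'forbiddenUrls' => ['\.pdf$'],
];

// The guard only keeps this sketch runnable where the package is not installed.
if (class_exists(\Baraja\WebCrawler\Crawler::class)) {
    $crawler = new \Baraja\WebCrawler\Crawler(
        new \Baraja\WebCrawler\Config($options)
    );
}
```

Any option left out of the array keeps its default value.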

Configuration options:

| Option | Default value | Possible values |
|--------|---------------|-----------------|
| `followExternalLinks` | `false` | `Bool`: Stay only in given domain? |
| `sleepBetweenRequests` | `1000` | `Int`: Sleep in milliseconds. |
| `maxHttpRequests` | `1000000` | `Int`: Crawler budget limit. |
| `maxCrawlTimeInSeconds` | `30` | `Int`: Stop crawling when limit is exceeded. |
| `allowedUrls` | `['.+']` | `String[]`: List of valid regex about allowed URL format. |
| `forbiddenUrls` | `['']` | `String[]`: List of valid regex about banned URL format. |

📄 License
---------

`baraja-core/webcrawler` is licensed under the MIT license. See the [LICENSE](https://github.com/baraja-core/variable-generator/blob/master/LICENSE) file for more details.

### Health Score

35 (Low). Better than 79% of packages.

- Maintenance: 27. Infrequent updates; may be unmaintained.
- Popularity: 14. Limited adoption so far.
- Community: 13. Small or concentrated contributor base.
- Maturity: 72. Established project with proven stability.
- Bus factor: 1. Top contributor holds 95.3% of commits, a single point of failure.

How is this calculated?

**Maintenance (25%)**: Last commit recency, latest release date, and issue-to-star ratio. Uses a 2-year decay window.

**Popularity (30%)**: Total and monthly downloads, GitHub stars, and forks. Logarithmic scaling prevents top-heavy scores.

**Community (15%)**: Contributors, dependents, forks, watchers, and maintainers. Measures real ecosystem engagement.

**Maturity (30%)**: Project age, version count, PHP version support, and release stability.

### Release Activity

- Cadence: every ~104 days (recently: every ~228 days)
- Total: 15
- Last release: 1018d ago

PHP version history (4 changes):

- v1.0.0: PHP >=7.1.0
- v1.2.0: PHP >=7.4.0
- v1.3.0: PHP ^7.4 || ^8.0
- v1.3.1: PHP ^8.0

### Community

Maintainers: [baraja](/maintainers/baraja) ([@baraja](https://github.com/baraja))

Top contributors: [janbarasek](https://github.com/janbarasek) (61 commits), dependabot[bot] (1 commit), dependabot-preview[bot] (1 commit), [pschur](https://github.com/pschur) (1 commit)

Tags: bot, crawler, crawling-websites, fast, php, robot, speed

### Code Quality

- Static analysis: PHPStan
- Type coverage: yes

