anassrojea/laracrawler
======================

Laravel sitemap generator and crawler package for SEO optimization. Supports multilingual sites, image/video indexing, link validation, priority scoring, and advanced crawling automation.

Version 1.2.0 · MIT license · PHP ^8.0 · [Source](https://github.com/anassrojea/laracrawler) · [Packagist](https://packagist.org/packages/anassrojea/laracrawler)

🚀 Laracrawler Sitemap Generator
===============================

 [![Laracrawler Sitemap Generator](./assets/cover.png)](./assets/cover.png)

A powerful **Laravel sitemap generator** with crawling, validation, multilingual support, priority auto-scoring, indexability audit, and more.
Optimized for **Google SEO best practices**.

---

✨ Features
----------

- **Recursive crawling** with depth control
- **URL normalization** (HTTPS, trailing slashes, lowercase, strip queries/anchors)
- **Exclusion rules** for URLs and assets (regex, extensions, substrings)
- **Multilingual alternates (`hreflang`)** with validation
- **Image sitemap enhancements**
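    - Extract image sources from page markup
    - Add `<image:title>` and `<image:caption>` from `alt`/`title`
- **Video sitemap enhancements**
    - Extract video sources and embeds from page markup (YouTube, Vimeo)
    - Add `<video:title>` and `<video:description>` (defaults configurable)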
- **Priority auto-scoring**
    - Based on crawl depth, internal link popularity, freshness
    - Supports per-page `priority_boost`
- **Flexible lastmod strategies**
    - `now` → always current time
    - `file` → file modification time
    - `db` → fetch from database column
    - `callback` → resolve dynamically via Closure/service
- **Indexability audit**
    - Detects `noindex` in headers (`X-Robots-Tag`) or meta tags
    - Excludes such pages and logs them into `sitemap-errors.xml`
- **Link validation**
    - Detects broken or soft-404 links
    - Excludes them and logs into `sitemap-errors.xml`
- **Split & index**
    - Auto-splits large sitemaps at the 50,000-URL or 50 MB limit
    - Generates `sitemap-index.xml`
- **Queue support** for async crawling in Laravel jobs
- **Auto-ping search engines** (Google, Bing, Yandex, Baidu)
- **Configurable HTTP client** (timeouts, SSL verify, User-Agent)

---

⚙️ Installation
---------------

```
composer require anassrojea/laracrawler
```

Publish config:

```
php artisan vendor:publish --tag=laracrawler-config
```

---

🛠️ Usage
--------

Generate sitemap:

```
php artisan laracrawler:generate
```

Options:

- `--summary` → Print summary of exclusions.
- `--debug` → Extra debug output.
- `--validate` → Force validation of links even if disabled in config.

---

📂 Configuration (`config/sitemap.php`)
--------------------------------------

### 🔗 Base settings

```
'base_url'     => env('APP_URL', 'https://example.com'),
'xdefault'     => 'https://example.com', // URL for the hreflang x-default alternate
'validate_links' => false,
'max_errors'   => 5000,
```

---

### 🚫 Exclusions

```
'exclude_urls' => [
    '/admin',
    '#\?page=\d+#', // regex pagination
    '#/search#',
    '#\.(css|js)$#',
],
'exclude_assets' => [
    '#\.(css|js|json|xml|txt|md)$#',
    '#\.(zip|rar|tar|gz|7z)$#',
],
```
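
The README does not spell out how these patterns are matched. The following is a minimal sketch, assuming that entries wrapped in `#` delimiters are treated as regular expressions and everything else as a plain substring. `isExcluded()` is a hypothetical helper for illustration, not part of the package:

```php
<?php

/**
 * Hypothetical helper: returns true when $url matches any exclusion pattern.
 * Assumption: "#...#" entries are regexes, anything else is a substring.
 */
function isExcluded(string $url, array $patterns): bool
{
    foreach ($patterns as $pattern) {
        $isRegex = strlen($pattern) >= 2
            && $pattern[0] === '#'
            && substr($pattern, -1) === '#';

        if ($isRegex) {
            if (preg_match($pattern, $url) === 1) {
                return true;
            }
            continue;
        }

        // Plain entries such as '/admin' match anywhere in the URL.
        if (str_contains($url, $pattern)) {
            return true;
        }
    }

    return false;
}

$patterns = ['/admin', '#\?page=\d+#', '#/search#', '#\.(css|js)$#'];

var_dump(isExcluded('https://example.com/admin/users', $patterns)); // bool(true)
var_dump(isExcluded('https://example.com/blog?page=2', $patterns)); // bool(true)
var_dump(isExcluded('https://example.com/blog/hello', $patterns));  // bool(false)
```

Under this assumption a plain `/admin` entry also matches `/blog/admin-tips`, so an anchored regex such as `#^/admin#` may be the safer choice when matching against paths.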

---

### 🌍 Normalization

```
'normalize' => [
    'strip_queries'       => true,
    'strip_anchors'       => true,
    'strip_trailing_slash'=> true,
    'canonicalize'        => true,   // lowercase
    'enforce_https'       => true,
    'enforce_www'         => null,   // true = add, false = strip, null = leave host unchanged
    'force_trailing_slash'=> false,
],
```
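
To make the effect of these options concrete, here is a rough sketch of a normalizer driven by the same keys. The option names come from the config above; the function `normalizeUrl()`, the order of operations, and the decision to ignore ports and credentials are assumptions made for illustration:

```php
<?php

/**
 * Hypothetical normalizer mirroring the 'normalize' options.
 * Ports and user info are ignored in this sketch.
 */
function normalizeUrl(string $url, array $opts): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return $url; // leave unparsable or relative URLs untouched
    }

    $scheme = ($opts['enforce_https'] ?? false) ? 'https' : ($parts['scheme'] ?? 'https');

    $host = strtolower($parts['host']);
    $www  = $opts['enforce_www'] ?? null;
    if ($www === true && !str_starts_with($host, 'www.')) {
        $host = 'www.' . $host;
    } elseif ($www === false && str_starts_with($host, 'www.')) {
        $host = substr($host, 4);
    }

    $path = $parts['path'] ?? '/';
    if ($opts['canonicalize'] ?? false) {
        $path = strtolower($path);
    }
    if (($opts['strip_trailing_slash'] ?? false) && $path !== '/') {
        $path = rtrim($path, '/') ?: '/';
    }
    if (($opts['force_trailing_slash'] ?? false) && !str_ends_with($path, '/')) {
        $path .= '/';
    }

    $keepQuery    = isset($parts['query']) && !($opts['strip_queries'] ?? false);
    $keepFragment = isset($parts['fragment']) && !($opts['strip_anchors'] ?? false);

    return $scheme . '://' . $host . $path
        . ($keepQuery ? '?' . $parts['query'] : '')
        . ($keepFragment ? '#' . $parts['fragment'] : '');
}

$opts = [
    'strip_queries'        => true,
    'strip_anchors'        => true,
    'strip_trailing_slash' => true,
    'canonicalize'         => true,
    'enforce_https'        => true,
    'enforce_www'          => null,
    'force_trailing_slash' => false,
];

echo normalizeUrl('http://Example.com/Blog/Post/?utm=1#top', $opts), PHP_EOL;
// https://example.com/blog/post
```

Note that `strip_trailing_slash` and `force_trailing_slash` pull in opposite directions, so enabling only one of them is the sensible configuration.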

---

### 🌐 Multilingual

```
'default_lang' => 'en',
'lang_mode'    => 'path', // "path", "subdomain", or "query"
'alternates'   => [
    'en' => 'https://example.com/en',
    'ar' => 'https://example.com/ar',
    'tr' => 'https://example.com/tr',
],
```
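
For orientation, sitemap alternates are expressed as `<xhtml:link rel="alternate" hreflang="…">` elements inside each `<url>` entry. The sketch below shows how such links could be derived from the `alternates` map in `path` mode. `buildAlternates()` is a hypothetical helper, and appending the page path to each language root (and to the x-default URL) is an assumption about how the package combines them:

```php
<?php

/**
 * Hypothetical helper: builds <xhtml:link> alternates for one page path.
 * Assumes lang_mode "path", where each alternate base URL is a language root.
 */
function buildAlternates(string $pagePath, array $alternates, ?string $xDefault = null): array
{
    $targets = $alternates;
    if ($xDefault !== null) {
        $targets['x-default'] = $xDefault;
    }

    $links = [];
    foreach ($targets as $lang => $baseUrl) {
        $href = rtrim($baseUrl, '/') . '/' . ltrim($pagePath, '/');
        $links[] = sprintf(
            '<xhtml:link rel="alternate" hreflang="%s" href="%s"/>',
            htmlspecialchars((string) $lang, ENT_XML1 | ENT_QUOTES),
            htmlspecialchars($href, ENT_XML1 | ENT_QUOTES)
        );
    }

    return $links;
}

$alternates = [
    'en' => 'https://example.com/en',
    'ar' => 'https://example.com/ar',
    'tr' => 'https://example.com/tr',
];

echo implode(PHP_EOL, buildAlternates('/services', $alternates, 'https://example.com')), PHP_EOL;
```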

---

### 🖼 Include Rules

```
'include' => [
    'urls'      => true,
    'images'    => true,
    'videos'    => true,
    'languages' => true,

    'rules' => [
        '#/blog#' => [
            'images' => true,
            'videos' => false,
        ],
    ],
],
```

---

### 🖼 Image Settings

```
'image_whitelist' => [
    // '/storage/uploads/services/',
],
'image_defaults' => [
    'title'       => 'Image Title',
    'description' => 'Image Description',
],
```

---

### 🎥 Video Settings

```
'video_whitelist' => [
    // '/storage/uploads/services/',
],
'video_defaults' => [
    'title'       => 'Video Title',
    'description' => 'Video Description',
],
```

---

### 📊 Rules (SEO Overrides)

Rules let you **override defaults per URL pattern**:

```
'rules' => [
    '/$' => [ // homepage
        'changefreq' => 'daily',
        'priority'   => '1.0',
        'lastmod'    => 'now',
    ],

    '/blog' => [
        'changefreq'    => 'daily',
        'priority'      => '0.9',
        'priority_boost'=> 0.3, // ignored while 'priority' is fixed; applies only to auto-score
        'lastmod'       => [
            'strategy' => 'db',
            'table'    => 'posts',
            'lookup'   => 'slug',
            'column'   => 'updated_at',
        ],
    ],

    '#^/(?:(?:en|ar|tr)/)?service#' => [
        'changefreq'    => 'weekly',
        'priority'      => null, // auto-score
        'priority_boost'=> 0.3,  // 🚀 boost services
        'lastmod'       => [
            'strategy' => 'db',
            'table'    => 'services',
            'lookup'   => 'slug',
            'column'   => 'updated_at',
        ],
    ],
],
```

- `priority` → fixed value (`0.1`–`1.0`) or `null` for auto-score.
- `priority_boost` → bump score (applied only if auto-score).
- `lastmod` strategies:
    - `"now"` → always current timestamp
    - `"file"` → filesystem `mtime`
    - `"db"` → fetch `updated_at` from DB
    - `"callback"` → custom closure or service

---

### 📈 Priority Scoring

```
'priority_scoring' => [
    'enabled'   => true,
    'weights'   => [
        'depth'     => 0.4,
        'links'     => 0.4,
        'freshness' => 0.2,
    ],
    'min' => 0.1,
    'max' => 1.0,
],
```
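
The README names the three signals and their weights but not the formula. One plausible reading is a weighted sum clamped to `min`/`max`; the sketch below follows that reading. The per-signal formulas (`1 / (1 + depth)`, inbound-link share, linear decay over a year) and the function `scorePriority()` are assumptions for illustration, not the package's actual algorithm:

```php
<?php

/**
 * Hypothetical auto-score: weighted sum of three signals in [0, 1],
 * plus an optional boost, clamped to the configured min/max.
 */
function scorePriority(
    int $depth,
    int $inboundLinks,
    int $maxInboundLinks,
    int $ageDays,
    array $cfg,
    float $boost = 0.0
): float {
    $w = $cfg['weights'];

    $depthScore = 1 / (1 + $depth);                         // homepage (depth 0) scores 1.0
    $linkScore  = $maxInboundLinks > 0
        ? min(1.0, $inboundLinks / $maxInboundLinks)        // share of the most-linked page
        : 0.0;
    $freshScore = max(0.0, 1 - $ageDays / 365);             // decays to 0 after one year

    $raw = $w['depth'] * $depthScore
         + $w['links'] * $linkScore
         + $w['freshness'] * $freshScore
         + $boost;

    return round(min($cfg['max'], max($cfg['min'], $raw)), 1);
}

$cfg = [
    'weights' => ['depth' => 0.4, 'links' => 0.4, 'freshness' => 0.2],
    'min'     => 0.1,
    'max'     => 1.0,
];

echo scorePriority(0, 10, 10, 0, $cfg), PHP_EOL;        // fresh, well-linked homepage
echo scorePriority(3, 1, 10, 365, $cfg, 0.3), PHP_EOL;  // deep, stale page with a 0.3 boost
```

With the default weights, a fresh homepage that is also the most-linked page reaches the maximum of 1.0, while a deep, stale, rarely linked page falls back to the configured minimum unless a `priority_boost` lifts it.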

---

### 📡 Pinging Search Engines

```
'ping' => true,
'ping_targets' => [
    'Google' => 'http://www.google.com/ping?sitemap=',
    'Bing'   => 'http://www.bing.com/ping?sitemap=',
    'Yandex' => 'https://webmaster.yandex.com/ping?sitemap=',
    'Baidu'  => 'http://ping.baidu.com/ping?sitemap=',
],
```

---

### 🧵 Queue Support

```
'queue' => [
    'enabled'    => false,
    'connection' => 'default',
    'batch_size' => 100,
],
```

---

### 🌐 HTTP Client Settings

```
'http' => [
    'validate_links' => [
        'timeout' => 10,
        'connect_timeout' => 5,
        'verify' => false,
        'http_errors' => false,
        'headers' => [
            'User-Agent' => 'LaracrawlerBot/1.0 (https://example.com)',
        ],
    ],
    'validate_alternates' => [
        'timeout' => 5,
        'connect_timeout' => 1,
        'verify' => false,
        'http_errors' => false,
        'headers' => [
            'User-Agent' => 'LaracrawlerBot/1.0 (https://example.com)',
        ],
    ],
],
```

---

### 🕵 Indexability Audit

```
'indexability_audit' => true,
```

Flags URLs with:

- `X-Robots-Tag: noindex`
- `<meta name="robots" content="noindex">`
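
As a rough illustration of the two checks, the sketch below inspects response headers for `X-Robots-Tag` and scans the HTML for a robots meta tag containing `noindex`. `isNoindex()` is a hypothetical helper, and the regex-based scan is a simplification; the package may well parse the DOM instead:

```php
<?php

/**
 * Hypothetical check: true when headers or HTML mark the page as noindex.
 * $headers maps header names to a string or a list of strings.
 */
function isNoindex(array $headers, string $html): bool
{
    foreach ($headers as $name => $value) {
        if (strcasecmp((string) $name, 'X-Robots-Tag') === 0
            && stripos(implode(',', (array) $value), 'noindex') !== false) {
            return true;
        }
    }

    // Scan each <meta> tag for name="robots" with "noindex" in its content.
    if (preg_match_all('/<meta\b[^>]*>/i', $html, $tags)) {
        foreach ($tags[0] as $tag) {
            if (preg_match('/\bname\s*=\s*["\']?robots\b/i', $tag)
                && preg_match('/\bcontent\s*=\s*["\'][^"\']*\bnoindex\b/i', $tag)) {
                return true;
            }
        }
    }

    return false;
}

var_dump(isNoindex(['X-Robots-Tag' => 'noindex, nofollow'], ''));                        // bool(true)
var_dump(isNoindex([], '<meta name="robots" content="noindex">'));                       // bool(true)
var_dump(isNoindex(['Content-Type' => 'text/html'], '<meta name="robots" content="index, follow">')); // bool(false)
```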

---

🛠 Artisan Command
-----------------

```
php artisan laracrawler:generate \
    --max-depth=2 \
    --output=public \
    --split \
    --single \
    --no-ping \
    --ping-only \
    --sitemap=sitemap.xml \
    --debug \
    --summary \
    --fresh \
    --queue \
    --validate \
    --audit-indexability
```

### Flags

- `--max-depth` → set crawl depth
- `--output` → custom output dir
- `--split` → force multiple sitemap files
- `--single` → force one sitemap.xml
- `--no-ping` → skip pinging search engines
- `--ping-only` → only ping, no crawl
- `--sitemap` → custom sitemap name (with ping-only)
- `--debug` → show exclusions in detail
- `--summary` → summary of exclusions
- `--fresh` → clear cache and recrawl
- `--queue` → run crawl in background via jobs
- `--validate` → enable link validation
- `--audit-indexability` → enable noindex audit
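
To regenerate the sitemap on a schedule, the command can be registered with Laravel's scheduler. This is a sketch rather than something the README prescribes: it assumes Laravel 11 or later (older versions register the same call inside `app/Console/Kernel.php`), and the time of day and flag selection are arbitrary choices:

```php
<?php
// routes/console.php

use Illuminate\Support\Facades\Schedule;

// Rebuild the sitemap nightly with link validation and an exclusion summary.
// withoutOverlapping() prevents a slow crawl from colliding with the next run.
Schedule::command('laracrawler:generate --validate --summary')
    ->dailyAt('03:00')
    ->withoutOverlapping();
```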

---

📦 Outputs
---------

- `sitemap.xml` or `sitemap-index.xml`
- `sitemap-errors.xml` (broken links, invalid alternates, noindex pages)
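
The choice between a single `sitemap.xml` and a `sitemap-index.xml` comes down to the sitemap protocol's limit of 50,000 URLs per file. The sketch below shows the splitting idea with `array_chunk()`. The helper `buildSitemapIndex()` and the `sitemap-N.xml` file names are hypothetical, and the 50 MB size check is left out:

```php
<?php

/**
 * Hypothetical splitter: chunks URLs into files of at most $maxPerFile entries
 * and returns the chunks together with the XML of an index that lists them.
 */
function buildSitemapIndex(array $urls, string $baseUrl, int $maxPerFile = 50000): array
{
    $files = [];
    foreach (array_chunk($urls, $maxPerFile) as $i => $chunk) {
        $files[sprintf('sitemap-%d.xml', $i + 1)] = $chunk;
    }

    $index = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
        . '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach (array_keys($files) as $name) {
        $loc = rtrim($baseUrl, '/') . '/' . $name;
        $index .= '  <sitemap><loc>' . htmlspecialchars($loc, ENT_XML1) . '</loc></sitemap>' . "\n";
    }
    $index .= '</sitemapindex>' . "\n";

    return ['files' => $files, 'index' => $index];
}

// Tiny limit so the split is visible: 5 URLs, 2 per file, 3 files.
$result = buildSitemapIndex(
    ['/a', '/b', '/c', '/d', '/e'],
    'https://example.com',
    2
);

echo count($result['files']), PHP_EOL; // 3
echo $result['index'];
```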

---

✅ SEO Benefits
--------------

- Clean, canonicalized URLs only
- Correct handling of alternates (`hreflang` + `x-default`)
- Image metadata (`title`, `caption`)
- Video metadata (`title`, `description`)
- Excludes noindex & broken pages automatically
- Auto-prioritization based on crawl depth, freshness, and internal link popularity

---

🔧 Best Practices
----------------

- Always run with `--validate` in production
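- Review `ping_targets` before relying on them: Google and Bing have retired their public sitemap ping endpoints, so also submit the sitemap through Search Console and Bing Webmaster Tools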
- Use `priority_boost` in rules for critical pages
- Whitelist only important image/video directories to keep sitemap lean
- Enable `indexability_audit` to avoid indexing blocked content

---

📜 License
---------

This package is open-sourced software licensed under the [MIT license](LICENSE).
