iprodev/sitemap-generator-pro
=============================

A professional, production-ready PHP XML sitemap generator with advanced features: concurrency, robots.txt, caching, database storage, change detection, SEO analysis, JavaScript rendering, proxy support, webhooks, and more.

PHP XML Sitemap Generator (Library + CLI)
=========================================

A professional, production-ready PHP sitemap generator by **iProDev (Hemn Chawroka)** — supports concurrency, robots.txt, gzip compression, sitemap index files, and comprehensive error handling.

[![CI/CD Pipeline](https://github.com/iprodev/PHP-XML-Sitemap-Generator/actions/workflows/ci.yml/badge.svg)](https://github.com/iprodev/PHP-XML-Sitemap-Generator/actions/workflows/ci.yml)[![PHP Version](https://camo.githubusercontent.com/f32695bd6f65b12545162e869707d33dac6bcb5f6e5dc0d48b6d1f8162b6c247/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f7068702d253345253344382e302d626c75652e737667)](https://php.net)[![License](https://camo.githubusercontent.com/8bb50fd2278f18fc326bf71f6e88ca8f884f72f179d3e555e20ed30157190d0d/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f6c6963656e73652d4d49542d677265656e2e737667)](LICENSE.md)[![Version](https://camo.githubusercontent.com/e2a085069c3b859ca6c994bcb2d4ca1295bdebc2bb85b6f660197bc97bef4356/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f76657273696f6e2d332e302e302d6f72616e67652e737667)](CHANGELOG.md)[![codecov](https://camo.githubusercontent.com/5225c761594d8d45ff43e2288d8920dff15c8eda15318a28e035eb4a85fdd526/68747470733a2f2f636f6465636f762e696f2f67682f6970726f6465762f5048502d584d4c2d536974656d61702d47656e657261746f722f6272616e63682f6d61696e2f67726170682f62616467652e737667)](https://codecov.io/gh/iprodev/PHP-XML-Sitemap-Generator)

---

🚀 What's New in v3.0
--------------------

- ✨ **Database Storage** with change detection and historical tracking
- 🔄 **Resume Capability** with checkpoint system
- 🎯 **SEO Analysis** and content quality checking
- 📊 **Performance Metrics** and detailed analytics
- 🖼️ **Multi-format Sitemaps** (Images, Videos, News)
- 🌐 **JavaScript Rendering** support for SPAs
- 🔐 **Proxy Support** with rotation
- 🔔 **Webhook Notifications** for events
- 📅 **Scheduled Crawling** with cron integration
- 🎨 **Interactive Mode** for easy configuration
- ⚡ **Caching System** (File & Redis)
- 🎛️ **Smart Filtering** with priority rules
- 📈 **Rate Limiting** with retry handling

---

📋 Table of Contents
-------------------

- [Features](#-features)
- [Installation](#-installation)
- [Quick Start](#-quick-start)
- [CLI Usage](#-cli-usage)
- [Advanced Features](#-advanced-features)
- [Programmatic Usage](#-programmatic-usage)
- [Examples](#-examples)
- [Configuration](#-configuration)
- [API Reference](#-api-reference)
- [Testing](#-testing)
- [Docker](#-docker)
- [Troubleshooting](#-troubleshooting)
- [Contributing](#-contributing)

---

✨ Features
----------

### Core Features

- 🚀 **High Performance** - Concurrent HTTP requests
- 🤖 **Robots.txt Compliant** - Respects crawling rules
- 📦 **Gzip Compression** - Automatic compression
- 📊 **Sitemap Index** - Multiple sitemap files
- 🛡️ **Error Handling** - Comprehensive error management
- 📝 **PSR-3 Logging** - Standard logging interface

### Advanced Features

- 💾 **Database Storage** - SQLite/MySQL/PostgreSQL support
- 🔄 **Change Detection** - Track URL changes over time
- 📈 **SEO Analysis** - Analyze pages for SEO issues
- 🔍 **Quality Checks** - Find duplicates, broken links
- 🎯 **Smart Filtering** - Include/exclude patterns
- ⚡ **Caching** - File and Redis cache support
- 📍 **Resume Support** - Continue interrupted crawls
- 🔔 **Webhooks** - Real-time notifications
- 📅 **Scheduling** - Automated periodic crawls
- 🌐 **JavaScript** - Render SPAs with headless Chrome
- 🔐 **Proxy Support** - HTTP/SOCKS proxies with rotation
- 🎨 **Interactive Mode** - User-friendly configuration

### Sitemap Types

- 📄 Standard XML Sitemap
- 🖼️ Image Sitemap
- 🎬 Video Sitemap
- 📰 News Sitemap

---

📥 Installation
--------------

```
composer require iprodev/sitemap-generator-pro
```

### Requirements

- PHP >= 8.0
- Extensions: curl, xml, mbstring, zlib, pdo
- Optional: redis, posix
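
Before installing, it can help to confirm that the extensions listed above are loaded. The following is a minimal sketch built on PHP's `extension_loaded()`; the helper name `missingExtensions` is hypothetical and not part of the package.

```php
<?php
/**
 * Return the names from $extensions that are not loaded in this PHP runtime.
 * Hypothetical helper for illustration; not part of the package.
 */
function missingExtensions(array $extensions): array
{
    return array_values(array_filter(
        $extensions,
        static fn (string $name): bool => !extension_loaded($name)
    ));
}

$required = ['curl', 'xml', 'mbstring', 'zlib', 'pdo'];
$optional = ['redis', 'posix'];

$missing = missingExtensions($required);
if ($missing !== []) {
    echo 'Missing required extensions: ' . implode(', ', $missing) . PHP_EOL;
}
foreach (missingExtensions($optional) as $name) {
    echo "Optional extension not loaded: {$name}" . PHP_EOL;
}
```

Running `php -m` on the command line lists the loaded extensions as well.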

---

🚀 Quick Start
-------------

### Basic Usage

```
php bin/sitemap --url=https://www.example.com
```

### Interactive Mode

```
php bin/sitemap --interactive
```

### With All Features

```
php bin/sitemap \
  --url=https://www.example.com \
  --out=./sitemaps \
  --concurrency=20 \
  --cache-enabled \
  --db-enabled \
  --seo-analysis \
  --image-sitemap \
  --webhook-url=https://example.com/webhook \
  --verbose
```

---

🖥️ CLI Usage
------------

### Basic Options

```
--url=<url>              # Starting URL (required)
--out=<path>             # Output directory
--concurrency=<n>        # Concurrent requests (1-100)
--max-pages=<n>          # Maximum pages to crawl
--max-depth=<n>          # Maximum link depth
--public-base=<url>      # Public base URL for sitemap index
--verbose, -v            # Verbose output
--help, -h               # Show help
```

### Caching

```
--cache-enabled          # Enable caching
--cache-driver=file      # Cache driver: file|redis
--cache-ttl=3600         # Cache TTL in seconds
```

### Database & Change Detection

```
--db-enabled             # Enable database storage
--db-dsn=<dsn>           # Database DSN
--detect-changes         # Compare with previous crawl
--only-changed           # Only include changed URLs
```

### Resume Support

```
--resume                 # Resume from checkpoint
--checkpoint-interval=<n> # Save checkpoint every N pages
```

### Rate Limiting

```
--rate-limit=<n>         # Requests per minute
--delay=<ms>             # Delay between requests (ms)
```
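
The two options express the same idea in different units: a limit in requests per minute implies a minimum pause between requests. The sketch below shows that arithmetic. The function names are hypothetical, and the rule that the longer pause wins when both options are set is an assumption, not documented behavior of the CLI.

```php
<?php
/**
 * Smallest pause (in milliseconds) between requests that keeps a crawler
 * at or below $requestsPerMinute. Illustrative helper, not the package's API.
 */
function minDelayMs(int $requestsPerMinute): int
{
    if ($requestsPerMinute <= 0) {
        throw new InvalidArgumentException('Rate limit must be positive.');
    }
    return (int) ceil(60000 / $requestsPerMinute);
}

/** Assumption: when both options are set, the stricter (longer) pause wins. */
function effectiveDelayMs(?int $requestsPerMinute, int $delayMs): int
{
    $fromRate = $requestsPerMinute === null ? 0 : minDelayMs($requestsPerMinute);
    return max($fromRate, $delayMs);
}

// 120 requests per minute needs a 500 ms pause, which exceeds the 250 ms delay.
echo effectiveDelayMs(120, 250), PHP_EOL; // prints 500
```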

### Filtering

```
--exclude=<patterns>     # Exclude patterns (comma-separated)
--include=<patterns>     # Include only patterns
--priority-rules=<json>  # Priority rules as JSON
```

### SEO & Analysis

```
--seo-analysis           # Enable SEO analysis
--check-quality          # Check content quality
--find-duplicates        # Find duplicate content
--find-broken-links      # Find broken links
```

### Advanced Sitemaps

```
--image-sitemap          # Generate image sitemap
--video-sitemap          # Generate video sitemap
--news-sitemap           # Generate news sitemap
```

### JavaScript Rendering

```
--enable-javascript      # Enable JS rendering
--chrome-path=<path>     # Path to Chrome/Chromium
--wait-for-ajax=<ms>     # Wait time for AJAX
```

### Proxy Support

```
--proxy=<url>            # Proxy URL
--proxy-file=<path>      # Load proxies from file
--rotate-proxies         # Rotate through proxies
```

### Webhooks

```
--webhook-url=<url>      # Webhook for notifications
--notify-on-complete     # Notify when complete
--notify-on-error        # Notify on errors
```

---

🎯 Advanced Features
-------------------

### 1. Database Storage & Change Detection

Track changes over time:

```
php bin/sitemap \
  --url=https://example.com \
  --db-enabled \
  --detect-changes
```

The system will:

- Store all URLs in the database
- Compare with previous crawl
- Generate change report (new, modified, deleted)
- Track SEO metrics over time
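
To make the "new, modified, deleted" report concrete, here is a minimal sketch of comparing two crawl snapshots. It assumes each snapshot is a map of URL to content hash; the package's actual schema and comparison logic are not shown in this README and may differ.

```php
<?php
/**
 * Compare two crawl snapshots, each mapping URL => content hash.
 * Returns the URLs that are new, modified, or deleted.
 */
function diffCrawls(array $previous, array $current): array
{
    $new = array_keys(array_diff_key($current, $previous));
    $deleted = array_keys(array_diff_key($previous, $current));

    $modified = [];
    foreach (array_intersect_key($current, $previous) as $url => $hash) {
        if ($hash !== $previous[$url]) {
            $modified[] = $url;
        }
    }

    return ['new' => $new, 'modified' => $modified, 'deleted' => $deleted];
}

$previous = [
    'https://example.com/'      => md5('home v1'),
    'https://example.com/about' => md5('about'),
    'https://example.com/old'   => md5('old'),
];
$current = [
    'https://example.com/'      => md5('home v2'),
    'https://example.com/about' => md5('about'),
    'https://example.com/new'   => md5('new'),
];

print_r(diffCrawls($previous, $current));
```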

### 2. Resume Interrupted Crawls

Large crawls can be resumed:

```
php bin/sitemap \
  --url=https://example.com \
  --resume \
  --checkpoint-interval=1000
```
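
The idea behind a checkpoint interval can be sketched in a few lines: persist the crawl queue every N pages, and load it again on the next run. The file name, state layout, and function names below are assumptions for illustration; the package's own checkpoint format is not documented here.

```php
<?php
/** Persist crawl state so a later run can pick up where this one stopped. */
function saveCheckpoint(string $file, array $state): void
{
    // Write to a temp file first, then rename, so a crash never leaves a half-written checkpoint.
    $tmp = $file . '.tmp';
    file_put_contents($tmp, json_encode($state, JSON_THROW_ON_ERROR));
    rename($tmp, $file);
}

/** Load a saved state, or return a fresh one when no checkpoint exists. */
function loadCheckpoint(string $file, array $seedUrls): array
{
    if (!is_file($file)) {
        return ['queue' => $seedUrls, 'done' => []];
    }
    return json_decode(file_get_contents($file), true, 512, JSON_THROW_ON_ERROR);
}

$file = sys_get_temp_dir() . '/sitemap-checkpoint-demo.json'; // hypothetical location
$interval = 2; // the CLI example above uses 1000
$state = loadCheckpoint($file, ['/a', '/b', '/c', '/d', '/e']);

while ($state['queue'] !== []) {
    $url = array_shift($state['queue']);
    $state['done'][] = $url; // a real crawler would fetch and parse $url here
    if (count($state['done']) % $interval === 0) {
        saveCheckpoint($file, $state);
    }
}
saveCheckpoint($file, $state); // record the final state
echo count($state['done']), ' pages processed', PHP_EOL;
```

A smaller interval loses less work after a crash but writes to disk more often.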

### 3. SEO Analysis

Analyze pages for SEO issues:

```
php bin/sitemap \
  --url=https://example.com \
  --seo-analysis \
  --find-duplicates \
  --find-broken-links
```

Reports include:

- Missing title/meta descriptions
- Duplicate content
- Broken links
- Page load times
- Mobile optimization
- Structured data
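
As an illustration of the first item, the sketch below detects a missing title or meta description with PHP's `DOMDocument`. The function name and issue codes are hypothetical, and the package's `SeoAnalyzer` may apply different rules.

```php
<?php
/**
 * Report two of the issues listed above: a missing <title> and a missing
 * meta description. Illustrative only.
 */
function findBasicSeoIssues(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $issues = [];

    $title = $doc->getElementsByTagName('title')->item(0);
    if ($title === null || trim($title->textContent) === '') {
        $issues[] = 'missing_title';
    }

    $hasDescription = false;
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if (strtolower($meta->getAttribute('name')) === 'description'
            && trim($meta->getAttribute('content')) !== '') {
            $hasDescription = true;
            break;
        }
    }
    if (!$hasDescription) {
        $issues[] = 'missing_meta_description';
    }

    return $issues;
}

print_r(findBasicSeoIssues('<html><head><title>Home</title></head><body></body></html>'));
```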

### 4. JavaScript Rendering

For SPAs (React, Vue, Angular):

```
php bin/sitemap \
  --url=https://spa.example.com \
  --enable-javascript \
  --chrome-path=/usr/bin/chromium \
  --wait-for-ajax=5000
```

### 5. Scheduled Crawling

Set up automated crawls:

```
use IProDev\Sitemap\Scheduler\CronScheduler;

$scheduler = new CronScheduler();
$scheduler->addSchedule('daily-crawl', [
    'url' => 'https://example.com',
    'schedule' => 'daily',  // or cron: '0 2 * * *'
    'out' => './sitemaps',
    'db_enabled' => true
]);

// Add to crontab:
// * * * * * php bin/scheduler
```
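
Because the crontab entry runs every minute, a scheduler has to decide on each run which schedules are due. The following is a deliberately simplified sketch of that check, not the package's `CronScheduler`: it supports only `*` and single integers in each field, and it ignores ranges, lists, step values, and the `7` alias for Sunday.

```php
<?php
/** Match one cron field; this sketch supports only "*" and a single integer. */
function cronFieldMatches(string $field, int $value): bool
{
    return $field === '*' || (ctype_digit($field) && (int) $field === $value);
}

/** True when a five-field cron expression matches the given minute. */
function isDue(string $expression, DateTimeInterface $now): bool
{
    $fields = preg_split('/\s+/', trim($expression));
    if (count($fields) !== 5) {
        throw new InvalidArgumentException("Expected 5 cron fields: {$expression}");
    }
    [$minute, $hour, $dayOfMonth, $month, $dayOfWeek] = $fields;

    return cronFieldMatches($minute, (int) $now->format('i'))
        && cronFieldMatches($hour, (int) $now->format('G'))
        && cronFieldMatches($dayOfMonth, (int) $now->format('j'))
        && cronFieldMatches($month, (int) $now->format('n'))
        && cronFieldMatches($dayOfWeek, (int) $now->format('w'));
}

$now = new DateTimeImmutable('2025-01-20 02:00:00');
var_dump(isDue('0 2 * * *', $now)); // bool(true)
```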

### 6. Webhooks

Get notified of events:

```
php bin/sitemap \
  --url=https://example.com \
  --webhook-url=https://example.com/webhook \
  --notify-on-complete \
  --notify-on-error
```

Webhook payload:

```
{
  "event": "crawl.completed",
  "timestamp": "2025-01-20T10:30:00Z",
  "data": {
    "url": "https://example.com",
    "stats": {
      "pages": 1523,
      "duration": 45.3
    }
  }
}
```
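
On the receiving side, a handler only needs to decode the JSON body and read the fields shown above. This is a minimal sketch with a hypothetical function name; it assumes the payload has exactly the shape of the sample and falls back to defaults for anything missing.

```php
<?php
/**
 * Decode a webhook body and summarize the fields from the sample payload.
 * Throws JsonException on malformed JSON.
 */
function summarizeWebhook(string $body): string
{
    $payload = json_decode($body, true, 512, JSON_THROW_ON_ERROR);

    $event = $payload['event'] ?? 'unknown';
    $url = $payload['data']['url'] ?? '(no url)';
    $pages = $payload['data']['stats']['pages'] ?? 0;
    $duration = $payload['data']['stats']['duration'] ?? 0.0;

    return sprintf('%s: %s (%d pages, duration %.1F)', $event, $url, $pages, $duration);
}

$body = '{"event":"crawl.completed","timestamp":"2025-01-20T10:30:00Z",'
      . '"data":{"url":"https://example.com","stats":{"pages":1523,"duration":45.3}}}';

echo summarizeWebhook($body), PHP_EOL;
// In a real endpoint the body would come from file_get_contents('php://input').
```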

### 7. Proxy Support

Use proxies for crawling:

```
# Single proxy
php bin/sitemap \
  --url=https://example.com \
  --proxy=http://proxy.example.com:8080

# Proxy file with rotation
php bin/sitemap \
  --url=https://example.com \
  --proxy-file=./proxies.txt \
  --rotate-proxies
```

Proxy file format:

```
http://proxy1.example.com:8080
http://proxy2.example.com:8080|username:password
socks5://proxy3.example.com:1080

```
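
The file format above is simple enough to parse by hand. The sketch below splits each line into a proxy URL and optional credentials, then picks proxies in round-robin order. Both function names are hypothetical, and round-robin is an assumption about what `--rotate-proxies` does rather than documented behavior.

```php
<?php
/**
 * Parse proxy lines in the format shown above: "scheme://host:port" with an
 * optional "|username:password" suffix. Blank lines are skipped.
 */
function parseProxyLines(array $lines): array
{
    $proxies = [];
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        [$url, $credentials] = array_pad(explode('|', $line, 2), 2, null);
        $parts = parse_url($url);
        if ($parts === false || !isset($parts['scheme'], $parts['host'], $parts['port'])) {
            throw new InvalidArgumentException("Invalid proxy line: {$line}");
        }
        $user = $pass = null;
        if ($credentials !== null) {
            [$user, $pass] = array_pad(explode(':', $credentials, 2), 2, null);
        }
        $proxies[] = ['url' => $url, 'scheme' => $parts['scheme'], 'username' => $user, 'password' => $pass];
    }
    return $proxies;
}

/** Round-robin rotation: request number $i uses proxy $i modulo the list size. */
function pickProxy(array $proxies, int $requestIndex): array
{
    return $proxies[$requestIndex % count($proxies)];
}

$proxies = parseProxyLines([
    'http://proxy1.example.com:8080',
    'http://proxy2.example.com:8080|username:password',
    'socks5://proxy3.example.com:1080',
    '',
]);
echo pickProxy($proxies, 4)['url'], PHP_EOL; // the fifth request wraps around to proxy2
```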

### 8. Smart Filtering

Control what gets crawled:

```
php bin/sitemap \
  --url=https://example.com \
  --exclude="/admin/*,/test/*,*.pdf" \
  --include="/products/*,/blog/*" \
  --priority-rules='{"homepage":1.0,"/products/*":0.8}'
```
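
Patterns like these can be evaluated with PHP's built-in `fnmatch()`. The sketch below is one plausible reading of the semantics, in which exclusions take precedence and a non-empty include list acts as an allowlist. The function names are hypothetical and the package's matching rules may differ.

```php
<?php
/** True when $path matches at least one shell-style pattern such as "/admin/*" or "*.pdf". */
function matchesAny(string $path, array $patterns): bool
{
    foreach ($patterns as $pattern) {
        if (fnmatch($pattern, $path)) {
            return true;
        }
    }
    return false;
}

/**
 * Assumed semantics: exclusions win; if an include list is given,
 * a path must match it to be crawled.
 */
function shouldCrawl(string $path, array $include, array $exclude): bool
{
    if (matchesAny($path, $exclude)) {
        return false;
    }
    return $include === [] || matchesAny($path, $include);
}

$exclude = explode(',', '/admin/*,/test/*,*.pdf');
$include = explode(',', '/products/*,/blog/*');

foreach (['/products/shoes', '/admin/users', '/blog/manual.pdf', '/about'] as $path) {
    printf("%-18s %s\n", $path, shouldCrawl($path, $include, $exclude) ? 'crawl' : 'skip');
}
```

Without the `FNM_PATHNAME` flag, `*` also matches `/`, which is why `*.pdf` catches `/blog/manual.pdf`.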

---

💻 Programmatic Usage
--------------------

### Basic Example

```
use IProDev\Sitemap\Fetcher;
use IProDev\Sitemap\Crawler;
use IProDev\Sitemap\SitemapWriter;
use IProDev\Sitemap\RobotsTxt;

$fetcher = new Fetcher(['concurrency' => 20]);
$robots = RobotsTxt::fromUrl('https://example.com', $fetcher);
$crawler = new Crawler($fetcher, $robots);

$pages = $crawler->crawl('https://example.com', 10000, 5);
$files = SitemapWriter::write($pages, './sitemaps');
```
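
For readers curious about what ends up on disk, the sketch below renders a `<urlset>` document in the sitemaps.org 0.9 format with PHP's `XMLWriter` and gzips it. It is not the package's `SitemapWriter`: the page array shape, the function name, and the file names are assumptions. The 50,000 URL cap per file comes from the sitemap protocol.

```php
<?php
/**
 * Render a <urlset> document per the sitemaps.org 0.9 protocol.
 * Each page is ['url' => string, 'lastmod' => ?string]; this shape is an assumption.
 */
function renderUrlset(array $pages): string
{
    $xml = new XMLWriter();
    $xml->openMemory();
    $xml->setIndent(true);
    $xml->startDocument('1.0', 'UTF-8');
    $xml->startElement('urlset');
    $xml->writeAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');
    foreach ($pages as $page) {
        $xml->startElement('url');
        $xml->writeElement('loc', $page['url']); // XMLWriter escapes &, < and > in text
        if (!empty($page['lastmod'])) {
            $xml->writeElement('lastmod', $page['lastmod']);
        }
        $xml->endElement(); // </url>
    }
    $xml->endElement(); // </urlset>
    $xml->endDocument();
    return $xml->outputMemory();
}

$pages = [
    ['url' => 'https://example.com/', 'lastmod' => '2025-01-20'],
    ['url' => 'https://example.com/search?q=a&page=2', 'lastmod' => null],
];

// The protocol caps each sitemap file at 50,000 URLs, hence the chunking.
foreach (array_chunk($pages, 50000) as $i => $chunk) {
    $gz = gzencode(renderUrlset($chunk)); // requires the zlib extension
    printf("sitemap-%d.xml.gz: %d URLs, %d bytes\n", $i + 1, count($chunk), strlen($gz));
}
```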

### With Database & Change Detection

```
use IProDev\Sitemap\Database\Database;
use IProDev\Sitemap\ChangeDetector;

// Initialize database
$db = new Database('sqlite:./sitemap.db');
$db->createTables();

// Start crawl
$domain = 'example.com';
$crawlId = $db->startCrawl($domain, 'https://example.com', []);

// Crawl and save ($pages is the array returned by $crawler->crawl() in the basic example)
foreach ($pages as $page) {
    $db->saveUrl($crawlId, $page);
}

// Detect changes
$prevCrawl = $db->getPreviousCrawl($domain, $crawlId);
if ($prevCrawl) {
    $detector = new ChangeDetector($db);
    $changes = $detector->detectChanges($prevCrawl['id'], $crawlId);

    print_r($changes);
}
```

### With SEO Analysis

```
use IProDev\Sitemap\Analyzer\SeoAnalyzer;

$analyzer = new SeoAnalyzer();

// $pages is the array returned by $crawler->crawl() in the basic example
foreach ($pages as $page) {
    $analysis = $analyzer->analyze(
        $page['url'],
        $page['html'],
        $page['status_code']
    );

    echo "Score: {$analysis['score']}/100\n";
    echo "Issues: " . count($analysis['issues']) . "\n";
}
```

### With Caching

```
use IProDev\Sitemap\Fetcher;
use IProDev\Sitemap\Cache\FileCache;
use IProDev\Sitemap\Cache\RedisCache;

// File cache
$cache = new FileCache('./cache', 3600);

// Redis cache (alternative to the file cache; keep only one of the two assignments)
$cache = new RedisCache('127.0.0.1', 6379);

// Use in fetcher
$fetcher = new Fetcher(['cache' => $cache]);
```

---

⚙️ Configuration
----------------

### Configuration File

Create `sitemap.config.php`:

```
