
vipnytt/sitemapparser
=====================

XML Sitemap parser class compliant with the Sitemaps.org protocol.

Latest release: v1.3.1 · License: MIT · PHP ^5.6 || ^7.0 || ^8.0

[Source](https://github.com/VIPnytt/SitemapParser) · [Packagist](https://packagist.org/packages/vipnytt/sitemapparser)


XML Sitemap parser
==================

An easy-to-use PHP library to parse XML Sitemaps compliant with the [Sitemaps.org protocol](http://www.sitemaps.org/protocol.html).

The [Sitemaps.org](http://www.sitemaps.org/) protocol is the leading standard and is supported by Google, Bing, Yahoo, Ask and many others.
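For orientation, here is a minimal sketch of what a Sitemaps.org `<urlset>` document looks like and how its tags map to PHP values. It uses PHP's built-in SimpleXML directly and is not this library's implementation; the `example.com` URLs and the layout of the `$urls` array are assumptions chosen for illustration.

```php
<?php
// Illustration only: a tiny sitemap held in a string (no HTTP request is made).
$xml = <<<XML
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
</urlset>
XML;

$urls = [];
$document = simplexml_load_string($xml);
foreach ($document->url as $entry) {
    // <loc> is required by the protocol; the other tags are optional,
    // so absent ones are stored as null.
    $urls[(string) $entry->loc] = [
        'lastmod'    => isset($entry->lastmod) ? (string) $entry->lastmod : null,
        'changefreq' => isset($entry->changefreq) ? (string) $entry->changefreq : null,
        'priority'   => isset($entry->priority) ? (string) $entry->priority : null,
    ];
}

foreach ($urls as $url => $tags) {
    $priority = $tags['priority'] !== null ? $tags['priority'] : 'n/a';
    echo $url . ' (priority: ' . $priority . ')' . PHP_EOL;
}
```

The resulting array is keyed by URL with the optional tags as values, which is the same general shape the usage examples in this README iterate over with `$url => $tags`.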

Features
--------

- Basic parsing
- Recursive parsing
- String parsing
- Custom User-Agent string
- Proxy support
- URL blacklist
- Request throttling (using [https://github.com/hamburgscleanest/guzzle-advanced-throttle](https://github.com/hamburgscleanest/guzzle-advanced-throttle))
- Retry (using [https://github.com/caseyamcl/guzzle\_retry\_middleware](https://github.com/caseyamcl/guzzle_retry_middleware))
- Advanced logging (using [https://github.com/gmponos/guzzle\_logger](https://github.com/gmponos/guzzle_logger))

Formats supported
-----------------

- XML `.xml`
- Compressed XML `.xml.gz`
- Robots.txt rule sheet `robots.txt`
- Line separated text *(disabled by default)*
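As an illustration of the `robots.txt` format listed above: sitemap locations are announced there with `Sitemap:` lines. The sketch below extracts them from a string with a regular expression. It is a simplified stand-in, not the library's own parser, and the sample file contents and the `$sitemaps` variable are assumptions.

```php
<?php
// Sample robots.txt contents held in a string (no HTTP request is made).
$robotsTxt = <<<TXT
User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
sitemap: https://example.com/news.xml.gz
TXT;

// The /m flag anchors ^ at each line start; /i matches the directive
// name case-insensitively in this sketch.
preg_match_all('/^[ \t]*sitemap:[ \t]*(\S+)/mi', $robotsTxt, $matches);
$sitemaps = $matches[1];

foreach ($sitemaps as $sitemap) {
    echo $sitemap . PHP_EOL;
}
```

Each extracted location can itself be a plain or compressed XML sitemap, which is why recursive parsing is useful when starting from `robots.txt`.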

Requirements:
-------------

- PHP [5.6, 7.x or 8.x](http://php.net/supported-versions.php), alternatively [HHVM](http://hhvm.com)
- PHP extensions:
    - [mbstring](http://php.net/manual/en/book.mbstring.php)
    - [libxml](http://php.net/manual/en/book.libxml.php) *(enabled by default)*
    - [SimpleXML](http://php.net/manual/en/book.simplexml.php) *(enabled by default)*
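- Optional:
    - [https://github.com/caseyamcl/guzzle\_retry\_middleware](https://github.com/caseyamcl/guzzle_retry_middleware)
    - [https://github.com/hamburgscleanest/guzzle-advanced-throttle](https://github.com/hamburgscleanest/guzzle-advanced-throttle)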

Installation
------------

The library can be installed via [Composer](https://getcomposer.org). Add this to your `composer.json` file:

```
{
    "require": {
        "vipnytt/sitemapparser": "^1.0"
    }
}
```

Then run `composer update`.

Getting Started
---------------

### Basic example

Returns a list of URLs only.

```
use vipnytt\SitemapParser;
use vipnytt\SitemapParser\Exceptions\SitemapParserException;

try {
    $parser = new SitemapParser();
    $parser->parse('http://php.net/sitemap.xml');
    foreach ($parser->getURLs() as $url => $tags) {
        echo $url . PHP_EOL;
    }
} catch (SitemapParserException $e) {
    echo $e->getMessage();
}
```

### Advanced

Returns all available tags, for both Sitemaps and URLs.

```
use vipnytt\SitemapParser;
use vipnytt\SitemapParser\Exceptions\SitemapParserException;

try {
    $parser = new SitemapParser('MyCustomUserAgent');
    $parser->parse('http://php.net/sitemap.xml');
    foreach ($parser->getSitemaps() as $url => $tags) {
        echo 'Sitemap' . PHP_EOL;
        echo 'URL: ' . $url . PHP_EOL;
        echo 'LastMod: ' . $tags['lastmod'] . PHP_EOL;
        echo PHP_EOL;
    }
    foreach ($parser->getURLs() as $url => $tags) {
        echo 'URL: ' . $url . PHP_EOL;
        echo 'LastMod: ' . $tags['lastmod'] . PHP_EOL;
        echo 'ChangeFreq: ' . $tags['changefreq'] . PHP_EOL;
        echo 'Priority: ' . $tags['priority'] . PHP_EOL;
        echo PHP_EOL;
    }
} catch (SitemapParserException $e) {
    echo $e->getMessage();
}
```

### Recursive

Parses any sitemap detected while parsing, to get a complete list of URLs.

Use `url_black_list` to skip sitemaps that are part of a parent sitemap. Entries are compared by exact match only.

```
use vipnytt\SitemapParser;
use vipnytt\SitemapParser\Exceptions\SitemapParserException;

try {
    $parser = new SitemapParser('MyCustomUserAgent');
    $parser->parseRecursive('http://www.google.com/robots.txt');
    echo 'Sitemaps' . PHP_EOL;
    foreach ($parser->getSitemaps() as $url => $tags) {
        echo 'URL: ' . $url . PHP_EOL;
        echo 'LastMod: ' . $tags['lastmod'] . PHP_EOL;
        echo PHP_EOL;
    }
    echo 'URLs' . PHP_EOL;
    foreach ($parser->getURLs() as $url => $tags) {
        echo 'URL: ' . $url . PHP_EOL;
        echo 'LastMod: ' . $tags['lastmod'] . PHP_EOL;
        echo 'ChangeFreq: ' . $tags['changefreq'] . PHP_EOL;
        echo 'Priority: ' . $tags['priority'] . PHP_EOL;
        echo PHP_EOL;
    }
} catch (SitemapParserException $e) {
    echo $e->getMessage();
}
```
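To make the idea of recursion and exact-match blacklisting concrete, here is a minimal sketch that walks a sitemap index held in memory. It is not the library's implementation: `collectUrls()`, the `$documents` map and all `example.com` URLs are hypothetical, and no HTTP requests are made.

```php
<?php
// Hypothetical in-memory "web": URL => sitemap XML.
$ns = 'xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"';
$documents = [
    'https://example.com/sitemap.xml' =>
        "<sitemapindex $ns>"
        . '<sitemap><loc>https://example.com/pages.xml</loc></sitemap>'
        . '<sitemap><loc>https://example.com/archive.xml</loc></sitemap>'
        . '</sitemapindex>',
    'https://example.com/pages.xml' =>
        "<urlset $ns><url><loc>https://example.com/</loc></url></urlset>",
    'https://example.com/archive.xml' =>
        "<urlset $ns><url><loc>https://example.com/2001</loc></url></urlset>",
];

// Hypothetical helper: collects page URLs, following nested sitemaps.
function collectUrls($url, array $documents, array $blackList, array &$seen = [])
{
    // Strict comparison: a blacklisted entry must match the URL exactly.
    if (in_array($url, $blackList, true) || isset($seen[$url]) || !isset($documents[$url])) {
        return [];
    }
    $seen[$url] = true; // guards against sitemaps that reference each other

    $xml = simplexml_load_string($documents[$url]);
    $urls = [];
    foreach ($xml->sitemap as $child) {
        // A <sitemap> entry points to another sitemap: recurse into it.
        $urls = array_merge($urls, collectUrls((string) $child->loc, $documents, $blackList, $seen));
    }
    foreach ($xml->url as $entry) {
        $urls[] = (string) $entry->loc;
    }
    return $urls;
}

$all      = collectUrls('https://example.com/sitemap.xml', $documents, []);
$filtered = collectUrls('https://example.com/sitemap.xml', $documents, ['https://example.com/archive.xml']);

print_r($all);
print_r($filtered);
```

Because the comparison is exact, a blacklist entry that differs from the sitemap URL in any way (scheme, trailing slash, query string) would not be skipped in this sketch.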

### Parsing of line separated text strings

**Note:** This is **disabled by default** to avoid false positives when XML is expected but plain text is fetched instead.

To disable `strict` mode, pass this configuration as the second constructor parameter: `['strict' => false]`.

```
use vipnytt\SitemapParser;
use vipnytt\SitemapParser\Exceptions\SitemapParserException;

try {
    $parser = new SitemapParser('MyCustomUserAgent', ['strict' => false]);
    $parser->parse('https://www.xml-sitemaps.com/urllist.txt');
    foreach ($parser->getSitemaps() as $url => $tags) {
        echo $url . PHP_EOL;
    }
    foreach ($parser->getURLs() as $url => $tags) {
        echo $url . PHP_EOL;
    }
} catch (SitemapParserException $e) {
    echo $e->getMessage();
}
```
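To see why non-strict mode can produce false positives, consider this minimal sketch of line-separated parsing. It is not the library's code; the sample text and the use of `filter_var` for validation are assumptions. Any line that happens to be a valid URL is accepted, so an unexpected plain-text response could be mistaken for a URL list.

```php
<?php
// Sample plain-text body: two URLs, one junk line and one empty line.
$text = "https://example.com/\nnot a url\n\nhttps://example.com/contact\n";

$urls = [];
foreach (preg_split('/\R/', $text) as $line) {
    $line = trim($line);
    // Keep only non-empty lines that validate as absolute URLs.
    if ($line !== '' && filter_var($line, FILTER_VALIDATE_URL) !== false) {
        $urls[] = $line;
    }
}

print_r($urls);
```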

### Throttling

1. Install middleware:

```
composer require hamburgscleanest/guzzle-advanced-throttle
```

2. Define host rules:

```
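use hamburgscleanest\GuzzleAdvancedThrottle\RequestLimitRuleset;

$rules = new RequestLimitRuleset([
    'https://www.google.com' => [
        [
            // At most 20 requests per 1 second
            'max_requests'     => 20,
            'request_interval' => 1
        ],
        [
            // At most 100 requests per 120 seconds
            'max_requests'     => 100,
            'request_interval' => 120
        ]
    ]
]);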
```

3. Create handler stack:

```
use GuzzleHttp\Handler\CurlHandler;
use GuzzleHttp\HandlerStack;
$stack = new HandlerStack();
$stack->setHandler(new CurlHandler());
```

4. Create middleware:

```
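use hamburgscleanest\GuzzleAdvancedThrottle\Middleware\ThrottleMiddleware;

$throttle = new ThrottleMiddleware($rules);

// Invoke the middleware
$stack->push($throttle());

// Alternatively, call the handle method directly instead of the line above:
// $stack->push($throttle->handle());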
```

5. Create client manually:

```
$client = new \GuzzleHttp\Client(['handler' => $stack]);
```

6. Pass client as an argument or use `setClient` method:

```
$parser = new SitemapParser();
$parser->setClient($client);
```

More details about this middleware are available [here](https://github.com/hamburgscleanest/guzzle-advanced-throttle).

### Automatic retry

1. Install middleware:

```
composer require caseyamcl/guzzle_retry_middleware
```

2. Create stack:

```
use GuzzleHttp\Handler\CurlHandler;
use GuzzleHttp\HandlerStack;
$stack = new HandlerStack();
$stack->setHandler(new CurlHandler());
```

3. Add middleware to the stack:

```
use GuzzleRetry\GuzzleRetryMiddleware;
$stack->push(GuzzleRetryMiddleware::factory());
```

4. Create client manually:

```
$client = new \GuzzleHttp\Client(['handler' => $stack]);
```

5. Pass client as an argument or use `setClient` method:

```
$parser = new SitemapParser();
$parser->setClient($client);
```

More details about this middleware are available [here](https://github.com/caseyamcl/guzzle_retry_middleware).

### Advanced logging

1. Install middleware:

```
composer require gmponos/guzzle_logger
```

2. Create a PSR-3 style logger:

```
// Logger stands in for any PSR-3 compatible logger class
$logger = new Logger();
```

3. Create handler stack:

```
use GuzzleHttp\Handler\CurlHandler;
use GuzzleHttp\HandlerStack;
$stack = new HandlerStack();
$stack->setHandler(new CurlHandler());
```

4. Push logger middleware to stack:

```
use GuzzleLogMiddleware\LogMiddleware;
$stack->push(new LogMiddleware($logger));
```

5. Create client manually:

```
$client = new \GuzzleHttp\Client(['handler' => $stack]);
```

6. Pass client as an argument or use `setClient` method:

```
$parser = new SitemapParser();
$parser->setClient($client);
```

More details about configuring this middleware (log levels, when to log and what to log) are available [here](https://github.com/gmponos/guzzle_logger).

### Additional examples

More examples are available in the [examples](https://github.com/VIPnytt/SitemapParser/tree/master/examples) directory.

Configuration
-------------

Available configuration options, with their default values:

```
$config = [
    'strict' => true, // (bool) Disallow parsing of line-separated plain text
    'guzzle' => [
        // GuzzleHttp request options
        // http://docs.guzzlephp.org/en/latest/request-options.html
    ],
    // URLs of sitemaps to skip when a parent sitemap references several child sitemaps. Exact match only.
    'url_black_list' => []
];
$parser = new SitemapParser('MyCustomUserAgent', $config);
```

*If a User-Agent is also set using the GuzzleHttp request options, it takes priority and replaces the User-Agent passed to the constructor.*
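As a sketch of the `guzzle` option, the array below sets a request timeout, a proxy and a User-Agent header. `timeout`, `proxy` and `headers` are standard GuzzleHttp request options; the concrete values (proxy address, agent string) are placeholders. Passed as the second constructor argument, the `User-Agent` header set here would be the one that takes priority over the first constructor argument.

```php
<?php
$config = [
    'strict' => true,
    'guzzle' => [
        'timeout' => 10,                         // seconds before a request is aborted
        'proxy'   => 'http://127.0.0.1:8080',    // placeholder proxy address
        'headers' => [
            'User-Agent' => 'OverridingAgent/1.0', // placeholder agent string
        ],
    ],
    'url_black_list' => [],
];
```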


### PHP version history

- v1.0.0: PHP >=5.5.9
- v1.0.2: PHP >=5.6.0
- 1.0.3: PHP ^5.6 || ^7.0
- 1.1.4: PHP ^5.6 || ^7.0 || ^8.0

### Community

Maintainer: [JanPetterMG](https://github.com/JanPetterMG)

Top contributors: [JanPetterMG](https://github.com/JanPetterMG) (39 commits), [GrzegorzDrozd](https://github.com/GrzegorzDrozd) (6 commits), [adamberryhuff](https://github.com/adamberryhuff) (2 commits), and [jszczypk](https://github.com/jszczypk), [madeITBelgium](https://github.com/madeITBelgium), [peter279k](https://github.com/peter279k), [schrojf](https://github.com/schrojf), [ThomasNicoullaud](https://github.com/ThomasNicoullaud), [vpominchuk](https://github.com/vpominchuk), [ao-jhelmich](https://github.com/ao-jhelmich) and [heathstannard](https://github.com/heathstannard) (1 commit each).
