vipnytt/sitemapparser
=====================

XML Sitemap parser class compliant with the Sitemaps.org protocol.

v1.3.1 (7 months ago) · MIT license · PHP ^5.6 || ^7.0 || ^8.0

XML Sitemap parser
==================

An easy-to-use PHP library to parse XML Sitemaps compliant with the [Sitemaps.org protocol](http://www.sitemaps.org/protocol.html).

The [Sitemaps.org](http://www.sitemaps.org/) protocol is the leading standard and is supported by Google, Bing, Yahoo, Ask and many others.

Features
--------

- Basic parsing
- Recursive parsing
- String parsing
- Custom User-Agent string
- Proxy support
- URL blacklist
- Request throttling (using [https://github.com/hamburgscleanest/guzzle-advanced-throttle](https://github.com/hamburgscleanest/guzzle-advanced-throttle))
- Automatic retry (using [https://github.com/caseyamcl/guzzle\_retry\_middleware](https://github.com/caseyamcl/guzzle_retry_middleware))
- Advanced logging (using [https://github.com/gmponos/guzzle\_logger](https://github.com/gmponos/guzzle_logger))

Formats supported
-----------------

- XML `.xml`
- Compressed XML `.xml.gz`
- Robots.txt file `robots.txt`
- Line-separated plain text *(disabled by default)*
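
The library handles these formats internally. As a rough illustration only, the sketch below shows one way the formats above can be told apart. `classifySitemapBody()` is a hypothetical helper, not part of vipnytt/sitemapparser; it assumes PHP 7+ with the zlib extension, recognises gzip data by its magic bytes and decompresses it with `gzdecode()`, identifies `robots.txt` by its path, treats a body starting with `<` as XML, and falls back to plain text.

```php
<?php
// Hypothetical helper, NOT part of vipnytt/sitemapparser: classifies a fetched
// body as one of the supported formats, gunzipping .xml.gz content first.
function classifySitemapBody(string $url, string $body): array
{
    // Gzip data always starts with the magic bytes 0x1f 0x8b.
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($body);
        if ($decoded === false) {
            throw new RuntimeException("Corrupt gzip data from $url");
        }
        $body = $decoded;
    }

    $path = (string) parse_url($url, PHP_URL_PATH);
    if (strtolower(basename($path)) === 'robots.txt') {
        $format = 'robots.txt';
    } elseif (strncmp(ltrim($body), '<', 1) === 0) {
        $format = 'xml';          // plain or decompressed XML
    } else {
        $format = 'text';         // only acceptable when strict mode is off
    }
    return [$format, $body];
}

list($format, $body) = classifySitemapBody('https://example.com/sitemap.xml.gz', gzencode('<urlset/>'));
echo $format, PHP_EOL;
```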

Requirements
------------

- PHP [5.6, 7.x or 8.x](http://php.net/supported-versions.php), alternatively [HHVM](http://hhvm.com)
- PHP extensions:
    - [mbstring](http://php.net/manual/en/book.mbstring.php)
    - [libxml](http://php.net/manual/en/book.libxml.php) *(enabled by default)*
    - [SimpleXML](http://php.net/manual/en/book.simplexml.php) *(enabled by default)*
- Optional:
    - [https://github.com/caseyamcl/guzzle\_retry\_middleware](https://github.com/caseyamcl/guzzle_retry_middleware)
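    - [https://github.com/hamburgscleanest/guzzle-advanced-throttle](https://github.com/hamburgscleanest/guzzle-advanced-throttle)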

Installation
------------

The library is available for installation via [Composer](https://getcomposer.org). Add this to your `composer.json` file:

```
{
    "require": {
        "vipnytt/sitemapparser": "^1.0"
    }
}
```

Then run `composer update`.

Getting Started
---------------

### Basic example

Returns a list of URLs only.

```
use vipnytt\SitemapParser;
use vipnytt\SitemapParser\Exceptions\SitemapParserException;

try {
    $parser = new SitemapParser();
    $parser->parse('http://php.net/sitemap.xml');
    foreach ($parser->getURLs() as $url => $tags) {
        echo $url . PHP_EOL;
    }
} catch (SitemapParserException $e) {
    echo $e->getMessage();
}
```
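
To make the `$url => $tags` shape of `getURLs()` concrete, here is a minimal sketch that parses a `<urlset>` string with SimpleXML. `parseUrlset()` is a hypothetical function written for illustration (PHP 7+), not the library's implementation, and storing `null` for missing optional tags is an assumption of this sketch.

```php
<?php
// Minimal sketch, NOT the library's implementation: turns a sitemaps.org
// <urlset> string into an array keyed by URL, like the one iterated above.
function parseUrlset(string $xml): array
{
    $doc = new SimpleXMLElement($xml);
    $urls = [];
    foreach ($doc->url as $entry) {
        $loc = trim((string) $entry->loc);
        if ($loc === '') {
            continue; // <loc> is required; skip malformed entries
        }
        // Optional tags become null in this sketch when they are absent.
        $urls[$loc] = [
            'lastmod'    => isset($entry->lastmod) ? (string) $entry->lastmod : null,
            'changefreq' => isset($entry->changefreq) ? (string) $entry->changefreq : null,
            'priority'   => isset($entry->priority) ? (string) $entry->priority : null,
        ];
    }
    return $urls;
}

$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-01-01</lastmod><priority>1.0</priority></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>
XML;

foreach (parseUrlset($xml) as $url => $tags) {
    echo $url, ' ', $tags['lastmod'] ?? '-', PHP_EOL;
}
```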

### Advanced

Returns all available tags, for both Sitemaps and URLs.

```
use vipnytt\SitemapParser;
use vipnytt\SitemapParser\Exceptions\SitemapParserException;

try {
    $parser = new SitemapParser('MyCustomUserAgent');
    $parser->parse('http://php.net/sitemap.xml');
    foreach ($parser->getSitemaps() as $url => $tags) {
        echo 'Sitemap' . PHP_EOL;
        echo 'URL: ' . $url . PHP_EOL;
        echo 'LastMod: ' . $tags['lastmod'] . PHP_EOL;
        echo PHP_EOL;
    }
    foreach ($parser->getURLs() as $url => $tags) {
        echo 'URL: ' . $url . PHP_EOL;
        echo 'LastMod: ' . $tags['lastmod'] . PHP_EOL;
        echo 'ChangeFreq: ' . $tags['changefreq'] . PHP_EOL;
        echo 'Priority: ' . $tags['priority'] . PHP_EOL;
        echo PHP_EOL;
    }
} catch (SitemapParserException $e) {
    echo $e->getMessage();
}
```
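
The `getSitemaps()` loop above iterates child sitemaps found in a sitemap index. A minimal sketch of reading such a `<sitemapindex>` document with SimpleXML follows; `parseSitemapIndex()` is hypothetical (PHP 7+) and is not how the library is implemented.

```php
<?php
// Minimal sketch, NOT the library's implementation: reads a <sitemapindex>
// string into an array keyed by child sitemap URL.
function parseSitemapIndex(string $xml): array
{
    $doc = new SimpleXMLElement($xml);
    $sitemaps = [];
    // <sitemap> entries live in the default sitemaps.org namespace, which
    // SimpleXML property access matches without extra setup.
    foreach ($doc->sitemap as $entry) {
        $loc = trim((string) $entry->loc);
        if ($loc === '') {
            continue; // <loc> is required; skip malformed entries
        }
        // <lastmod> is optional; this sketch stores null when it is missing.
        $sitemaps[$loc] = [
            'lastmod' => isset($entry->lastmod) ? (string) $entry->lastmod : null,
        ];
    }
    return $sitemaps;
}

$index = '<?xml version="1.0" encoding="UTF-8"?>'
    . '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    . '<sitemap><loc>https://example.com/sitemap-posts.xml</loc><lastmod>2024-03-01</lastmod></sitemap>'
    . '<sitemap><loc>https://example.com/sitemap-pages.xml</loc></sitemap>'
    . '</sitemapindex>';

foreach (parseSitemapIndex($index) as $url => $tags) {
    echo $url, ' ', $tags['lastmod'] ?? 'n/a', PHP_EOL;
}
```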

### Recursive

Recursively parses every sitemap discovered along the way, to build a complete list of URLs.

Use the `url_black_list` configuration option to skip child sitemaps listed in a parent sitemap index. Only exact URL matches are skipped.

```
use vipnytt\SitemapParser;
use vipnytt\SitemapParser\Exceptions\SitemapParserException;

try {
    $parser = new SitemapParser('MyCustomUserAgent');
    $parser->parseRecursive('http://www.google.com/robots.txt');
    echo 'Sitemaps' . PHP_EOL;
    foreach ($parser->getSitemaps() as $url => $tags) {
        echo 'URL: ' . $url . PHP_EOL;
        echo 'LastMod: ' . $tags['lastmod'] . PHP_EOL;
        echo PHP_EOL;
    }
    echo 'URLs' . PHP_EOL;
    foreach ($parser->getURLs() as $url => $tags) {
        echo 'URL: ' . $url . PHP_EOL;
        echo 'LastMod: ' . $tags['lastmod'] . PHP_EOL;
        echo 'ChangeFreq: ' . $tags['changefreq'] . PHP_EOL;
        echo 'Priority: ' . $tags['priority'] . PHP_EOL;
        echo PHP_EOL;
    }
} catch (SitemapParserException $e) {
    echo $e->getMessage();
}
```
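
Conceptually, recursive parsing is a breadth-first walk: start from `robots.txt` or a sitemap index, queue every child sitemap found, and collect URLs from each `<urlset>`. The sketch below illustrates that idea under stated assumptions: `crawlSitemaps()` and the `$fetch` callable are hypothetical, responses come from an in-memory array instead of HTTP, and the exact-match skip list only mimics the intent of `url_black_list`. It is not the library's `parseRecursive()` code (PHP 7+).

```php
<?php
// Minimal sketch of recursive discovery, NOT the library's implementation.
// $fetch is a hypothetical callable (URL => response body).
function crawlSitemaps(string $start, callable $fetch, array $blackList = []): array
{
    $queue = [$start];
    $visited = [];
    $sitemaps = [];
    $urls = [];
    while ($queue) {
        $current = array_shift($queue);
        if (isset($visited[$current])) {
            continue; // guard against sitemap cycles
        }
        $visited[$current] = true;
        $body = $fetch($current);
        $children = [];
        if (substr($current, -10) === 'robots.txt') {
            // robots.txt announces sitemaps with "Sitemap: <url>" lines.
            preg_match_all('/^\s*sitemap\s*:\s*(\S+)/im', $body, $m);
            $children = $m[1];
        } else {
            $doc = new SimpleXMLElement($body);
            foreach ($doc->sitemap as $entry) { // <sitemapindex> children
                $children[] = trim((string) $entry->loc);
            }
            foreach ($doc->url as $entry) {     // <urlset> children
                $urls[trim((string) $entry->loc)] = true;
            }
        }
        foreach ($children as $child) {
            // Exact-match skip list, similar in spirit to url_black_list.
            if (!in_array($child, $blackList, true)) {
                $sitemaps[$child] = true;
                $queue[] = $child;
            }
        }
    }
    return ['sitemaps' => array_keys($sitemaps), 'urls' => array_keys($urls)];
}

// In-memory stand-in for a website.
$site = [
    'https://example.com/robots.txt' =>
        "User-agent: *\nDisallow:\nSitemap: https://example.com/sitemap_index.xml\n",
    'https://example.com/sitemap_index.xml' =>
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        . '<sitemap><loc>https://example.com/sitemap-posts.xml</loc></sitemap>'
        . '<sitemap><loc>https://example.com/sitemap-archive.xml</loc></sitemap>'
        . '</sitemapindex>',
    'https://example.com/sitemap-posts.xml' =>
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        . '<url><loc>https://example.com/post-1</loc></url>'
        . '<url><loc>https://example.com/post-2</loc></url>'
        . '</urlset>',
];

$result = crawlSitemaps('https://example.com/robots.txt', function (string $url) use ($site): string {
    return $site[$url];
}, ['https://example.com/sitemap-archive.xml']);
print_r($result);
```

Here `sitemap-archive.xml` is never fetched, which is why the in-memory site does not need to contain it.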

### Parsing of line separated text strings

**Note:** This is **disabled by default** to avoid false positives when XML is expected but plain text is fetched instead.

To enable it, turn off `strict` mode by passing `['strict' => false]` as the second constructor argument.

```
use vipnytt\SitemapParser;
use vipnytt\SitemapParser\Exceptions\SitemapParserException;

try {
    $parser = new SitemapParser('MyCustomUserAgent', ['strict' => false]);
    $parser->parse('https://www.xml-sitemaps.com/urllist.txt');
    foreach ($parser->getSitemaps() as $url => $tags) {
        echo $url . PHP_EOL;
    }
    foreach ($parser->getURLs() as $url => $tags) {
        echo $url . PHP_EOL;
    }
} catch (SitemapParserException $e) {
    echo $e->getMessage();
}
```
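
Roughly, non-strict parsing of a line-separated list means treating each line that holds a valid URL as an entry. The hypothetical `parseUrlList()` below (PHP 7+, not the library's code) sketches that with `preg_split()` and `FILTER_VALIDATE_URL`.

```php
<?php
// Minimal sketch, NOT the library's code: one URL per line, keeping only
// lines that validate as URLs.
function parseUrlList(string $text): array
{
    $urls = [];
    // \R matches any line ending (\n, \r\n or \r).
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line !== '' && filter_var($line, FILTER_VALIDATE_URL) !== false) {
            $urls[] = $line;
        }
    }
    // Drop duplicates and reindex.
    return array_values(array_unique($urls));
}

$text = "https://example.com/\r\nhttps://example.com/about\n\nnot a url\nhttps://example.com/\n";
print_r(parseUrlList($text));
```

It also hints at why this is off by default: any plain-text response that happens to contain a URL on its own line would be accepted in the same way.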

### Throttling

1. Install middleware:

```
composer require hamburgscleanest/guzzle-advanced-throttle
```

2. Define host rules:

```
use hamburgscleanest\GuzzleAdvancedThrottle\RequestLimitRuleset;

$rules = new RequestLimitRuleset([
    'https://www.google.com' => [
        [
            'max_requests'     => 20,
            'request_interval' => 1
        ],
        [
            'max_requests'     => 100,
            'request_interval' => 120
        ]
    ]
]);
```

3. Create handler stack:

```
use GuzzleHttp\Handler\CurlHandler;
use GuzzleHttp\HandlerStack;

$stack = new HandlerStack();
$stack->setHandler(new CurlHandler());
```

4. Create middleware:

```
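use hamburgscleanest\GuzzleAdvancedThrottle\Middleware\ThrottleMiddleware;

$throttle = new ThrottleMiddleware($rules);

// Invoke the middleware
$stack->push($throttle());

// OR: alternatively call the handle method directly (push only one of the two)
$stack->push($throttle->handle());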
```

5. Create client manually:

```
$client = new \GuzzleHttp\Client(['handler' => $stack]);
```

6. Pass the client as an argument or use the `setClient` method:

```
$parser = new SitemapParser();
$parser->setClient($client);
```

More details about this middleware are available [here](https://github.com/hamburgscleanest/guzzle-advanced-throttle).
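
To make the two rules above concrete (at most 20 requests within any 1 second and at most 100 within any 120 seconds, assuming `request_interval` is in seconds), here is a sliding-window check for a single host's rule list. `SlidingWindowLimiter` is a hypothetical class written for illustration (PHP 7+); guzzle-advanced-throttle has its own implementation and storage, so treat this only as a model of what the rules mean.

```php
<?php
// Illustration only, NOT how guzzle-advanced-throttle is implemented.
final class SlidingWindowLimiter
{
    /** @var array[] rules with 'max_requests' and 'request_interval' (seconds) */
    private $rules;
    /** @var float[] timestamps (seconds) of requests that were allowed */
    private $history = [];

    public function __construct(array $rules)
    {
        $this->rules = $rules;
    }

    /** Returns true and records the request if every rule still has room. */
    public function allow(float $now): bool
    {
        // Forget requests older than the longest window; no rule can see them.
        $longest = $this->rules ? max(array_column($this->rules, 'request_interval')) : 0;
        $this->history = array_values(array_filter($this->history, function ($t) use ($now, $longest) {
            return $t > $now - $longest;
        }));

        foreach ($this->rules as $rule) {
            $windowStart = $now - $rule['request_interval'];
            $inWindow = count(array_filter($this->history, function ($t) use ($windowStart) {
                return $t > $windowStart;
            }));
            if ($inWindow >= $rule['max_requests']) {
                return false; // this rule's budget is used up
            }
        }
        $this->history[] = $now;
        return true;
    }
}

// Same limits as the google.com example: 20 per second and 100 per 120 seconds.
$limiter = new SlidingWindowLimiter([
    ['max_requests' => 20, 'request_interval' => 1],
    ['max_requests' => 100, 'request_interval' => 120],
]);
$allowed = 0;
for ($i = 0; $i < 25; $i++) {
    if ($limiter->allow(0.0)) {
        $allowed++;
    }
}
echo $allowed, PHP_EOL;
```

A real throttle also has to decide whether to wait or reject once a rule is exhausted; this sketch only answers yes or no.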

### Automatic retry

1. Install middleware:

```
composer require caseyamcl/guzzle_retry_middleware
```

2. Create handler stack:

```
use GuzzleHttp\Handler\CurlHandler;
use GuzzleHttp\HandlerStack;

$stack = new HandlerStack();
$stack->setHandler(new CurlHandler());
```

3. Add middleware to the stack:

```
use GuzzleRetry\GuzzleRetryMiddleware;

$stack->push(GuzzleRetryMiddleware::factory());
```

4. Create client manually:

```
$client = new \GuzzleHttp\Client(['handler' => $stack]);
```

5. Pass the client as an argument or use the `setClient` method:

```
$parser = new SitemapParser();
$parser->setClient($client);
```

More details about this middleware are available [here](https://github.com/caseyamcl/guzzle_retry_middleware).
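
The general idea behind automatic retry is to repeat a failed request after a growing delay. The sketch below shows exponential backoff around an arbitrary callable; `retryWithBackoff()` is hypothetical (PHP 7.1+), not part of guzzle\_retry\_middleware, whose own delay logic and defaults may differ. Injecting the `$sleep` callable keeps the example fast and testable.

```php
<?php
// Hypothetical helper, NOT part of caseyamcl/guzzle_retry_middleware:
// calls $attempt until it succeeds, sleeping baseDelay * 2^n seconds between tries.
function retryWithBackoff(callable $attempt, int $maxRetries, float $baseDelay, ?callable $sleep = null)
{
    // Real sleeping by default; callers can inject a fake to avoid waiting.
    $sleep = $sleep ?: function (float $seconds) {
        usleep((int) round($seconds * 1000000));
    };
    for ($try = 0; ; $try++) {
        try {
            return $attempt($try);
        } catch (RuntimeException $e) {
            if ($try >= $maxRetries) {
                throw $e; // out of retries: surface the last error
            }
            $sleep($baseDelay * (2 ** $try)); // 1x, 2x, 4x, ... the base delay
        }
    }
}

// Simulated flaky request: fails twice, then succeeds.
$delays = [];
$body = retryWithBackoff(function (int $try) {
    if ($try < 2) {
        throw new RuntimeException("Simulated 503 on try $try");
    }
    return 'OK';
}, 3, 0.5, function (float $seconds) use (&$delays) {
    $delays[] = $seconds; // record instead of sleeping
});
echo $body, PHP_EOL;
```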

### Advanced logging

1. Install middleware:

```
composer require gmponos/guzzle_logger
```

2. Create a PSR-3 logger:

```
// Any PSR-3 (Psr\Log\LoggerInterface) implementation works, e.g. Monolog:
$logger = new \Monolog\Logger('sitemap-parser');
```

3. Create handler stack:

```
use GuzzleHttp\Handler\CurlHandler;
use GuzzleHttp\HandlerStack;

$stack = new HandlerStack();
$stack->setHandler(new CurlHandler());
```

4. Push the logger middleware to the stack:

```
$stack->push(new LogMiddleware($logger));
```

5. Create client manually:

```
$client = new \GuzzleHttp\Client(['handler' => $stack]);
```

6. Pass the client as an argument or use the `setClient` method:

```
$parser = new SitemapParser();
$parser->setClient($client);
```

More details about configuring this middleware (log levels, when to log and what to log) are available [here](https://github.com/gmponos/guzzle_logger).

### Additional examples

More examples are available in the [examples](https://github.com/VIPnytt/SitemapParser/tree/master/examples) directory.

Configuration
-------------

Available configuration options, with their default values:

```
$config = [
    'strict' => true, // (bool) Disallow parsing of line-separated plain text
    'guzzle' => [
        // GuzzleHttp request options
        // http://docs.guzzlephp.org/en/latest/request-options.html
    ],
    // (array) Sitemap URLs to skip when parsing a sitemap index. Exact match only.
    'url_black_list' => []
];
$parser = new SitemapParser('MyCustomUserAgent', $config);
```

*If a User-Agent is also set in the GuzzleHttp request options, it takes priority over the one passed to the constructor.*
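
The precedence rule above can be sketched as follows, assuming the User-Agent is supplied through Guzzle's standard `headers` request option. `effectiveUserAgent()` is a hypothetical helper (PHP 7+) for illustration, not the library's code.

```php
<?php
// Hypothetical helper, NOT the library's code: a User-Agent in the Guzzle
// 'headers' request option wins over the constructor's User-Agent.
function effectiveUserAgent(string $constructorUserAgent, array $config): string
{
    $headers = $config['guzzle']['headers'] ?? [];
    foreach ($headers as $name => $value) {
        // HTTP header names are case-insensitive.
        if (strcasecmp((string) $name, 'User-Agent') === 0) {
            // Guzzle also accepts an array of values for one header.
            return is_array($value) ? implode(', ', $value) : (string) $value;
        }
    }
    return $constructorUserAgent;
}

$config = [
    'strict' => true,
    'guzzle' => ['headers' => ['User-Agent' => 'OverrideBot/1.0']],
    'url_black_list' => [],
];
echo effectiveUserAgent('MyCustomUserAgent', $config), PHP_EOL;
```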

### Release Activity

- Cadence: every ~231 days (recently every ~266 days)
- Total releases: 16
- Last release: 221 days ago
- PHP version history (4 changes):
    - v1.0.0: PHP >=5.5.9
    - v1.0.2: PHP >=5.6.0
    - 1.0.3: PHP ^5.6 || ^7.0
    - 1.1.4: PHP ^5.6 || ^7.0 || ^8.0

### Community

Maintainer: [JanPetterMG](https://github.com/JanPetterMG)

Top contributors: JanPetterMG (39 commits), GrzegorzDrozd (6), adamberryhuff (2), jszczypk (1), madeITBelgium (1), peter279k (1), schrojf (1), ThomasNicoullaud (1), vpominchuk (1), ao-jhelmich (1), heathstannard (1)

Tags: `parser`, `sitemap`, `sitemaps-org`, `xml`, `xml-sitemap-parser`, `xmlparser`, `Sitemap`, `robots.txt`, `sitemaps.org`

### Code Quality

Tests: PHPUnit
