
ranvis/robots-txt-processor
===========================

robots.txt filter and tester for untrusted source.

v1.0.2 · BSD-2-Clause · PHP >=7.3.0

[Source](https://github.com/ranvis/robots-txt-processor) · [Packagist](https://packagist.org/packages/ranvis/robots-txt-processor) · [Docs](https://github.com/ranvis/robots-txt-processor/blob/master/README.md)


- [Introduction](#introduction)
- [License](#license)
- [Installation](#installation)
- [Example Usage](#example-usage)
- [Implementation Notes](#implementation-notes)
    - [Setting user-agents](#setting-user-agents)
    - [Record separator](#record-separator)
    - [Case sensitivity](#case-sensitivity)
    - [Encoding conversion](#encoding-conversion)
    - [Features](#features)
- [Options](#options)
    - [`Tester` class options](#tester-class-options)
    - [`Filter` class options](#filter-class-options)
    - [`FilterParser` class options](#filterparser-class-options)
    - [`Parser` class options](#parser-class-options)
- [Interface](#interface)

Introduction
------------

robots-txt-processor is a tester with a filter for the wild robots.txt data found on the Internet. The filter can remove, for example:

- Rules for other user-agents
- Rules that are too long
- Paths that contain too many wildcards
- Comments (inline or whole-line)

It can also, for example:

- Parse line continuations (LWS), although they are not widely used
- Recognize the misspelled `Useragent` directive
- Complement a missing leading slash in a path

The Tester module can process Allow/Disallow directives containing the `*`/`$` meta characters. Alternatively, you can use the filter module alone and feed its output to another tester module as a single `User-agent: *` record together with the non-group record (e.g. Sitemap).
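
The `*`/`$` handling above can be illustrated by translating a path rule into an anchored regular expression. This is a minimal sketch with an assumed helper name, not the library's actual implementation:

```php
<?php

// Translate a Disallow/Allow path rule with `*` (any sequence) and a
// trailing `$` (end anchor) into a PCRE pattern. Illustrative only.
function ruleToRegex(string $rule): string
{
    $hasEndAnchor = substr($rule, -1) === '$';
    if ($hasEndAnchor) {
        $rule = substr($rule, 0, -1);
    }
    // Quote everything, then turn each quoted `*` back into `.*`.
    $pattern = str_replace('\\*', '.*', preg_quote($rule, '#'));
    return '#^' . $pattern . ($hasEndAnchor ? '$' : '') . '#';
}

var_dump(preg_match(ruleToRegex('/path'), '/path.html'));    // int(1): prefix match
var_dump(preg_match(ruleToRegex('/*.php$'), '/index.php'));  // int(1): wildcard + end anchor
var_dump(preg_match(ruleToRegex('/*.php$'), '/index.phps')); // int(0): `$` rejects the suffix
```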

License
-------

BSD 2-Clause License

Installation
------------

`composer require "ranvis/robots-txt-processor:^1.0"`

Example Usage
-------------

```
require_once __DIR__ . '/vendor/autoload.php';

$source = "User-agent: *\nDisallow: /path";
$userAgents = 'MyBotIdentifier';
$tester = new \Ranvis\RobotsTxt\Tester();
$tester->setSource($source, $userAgents);
var_dump($tester->isAllowed('/path.html')); // false
```

`Tester->setSource(string)` is a shorthand for `Tester->setSource(RecordSet)`:

```
use Ranvis\RobotsTxt;

$source = "User-agent: *\nDisallow: /path";
$userAgents = 'MyBotIdentifier';
$filter = new RobotsTxt\Filter();
$filter->setUserAgents($userAgents);
$recordSet = $filter->getRecordSet($source);
$tester = new RobotsTxt\Tester();
$tester->setSource($recordSet);
var_dump($tester->isAllowed('/path.php')); // false
```

See [EXAMPLES.md](EXAMPLES.md) for more examples, including filter-only usage.

Implementation Notes
--------------------

### Setting user-agents

When setting the source, you can optionally pass user-agents as in the examples above. If you pass a user-agent string or an array of strings, the subsequent `Filter` will filter out records for unspecified user-agents (aside from `*`). While `Tester->isAllowed()` accepts user-agents, it is faster to filter once (with `Filter->setUserAgents()` or `Tester->setSource(source, userAgents)`) and then call `Tester->isAllowed()` multiple times without specifying user-agents. (When an array of user-agent strings is passed, user-agents specified earlier take precedence when testing.)

### Record separator

This parser ignores blank lines. A new record starts on a `User-agent` line that follows group member lines (i.e. `Disallow`/`Allow`).
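
Under this rule, a blank line alone does not start a new record; only a `User-agent` line after member lines does. A minimal sketch of that logic (assumed helper name, not the library's code):

```php
<?php

// Split robots.txt source into records: ignore blank lines, and start a new
// record when a User-agent line follows group member lines.
function splitRecords(string $source): array
{
    $records = [];
    $current = [];
    $inMembers = false;
    foreach (preg_split('/\R/', $source) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // blank lines never separate records
        }
        $isUserAgent = (bool)preg_match('/^User-agent\s*:/i', $line);
        if ($isUserAgent && $inMembers) {
            $records[] = $current; // a new record starts here
            $current = [];
            $inMembers = false;
        } elseif (!$isUserAgent) {
            $inMembers = true;
        }
        $current[] = $line;
    }
    if ($current !== []) {
        $records[] = $current;
    }
    return $records;
}

$source = "User-agent: a\nUser-agent: b\nDisallow: /x\n\nUser-agent: c\nDisallow: /y";
var_dump(count(splitRecords($source))); // int(2)
```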

### Case sensitivity

The `User-agent` value and directive names like `Disallow` are case-insensitive. The `Filter` class normalizes directive names to first-character-uppercase form.
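
The normalization can be reproduced with PHP's string functions; a one-line sketch (not the library's code):

```php
<?php

// Normalize a directive name to first-character-uppercase form,
// e.g. "DISALLOW" => "Disallow".
function normalizeDirective(string $name): string
{
    return ucfirst(strtolower($name));
}

var_dump(normalizeDirective('DISALLOW')); // string(8) "Disallow"
var_dump(normalizeDirective('allow'));    // string(5) "Allow"
```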

### Encoding conversion

The filter/tester themselves don't handle encoding conversion because it isn't needed: if a remote robots.txt uses a non-Unicode (specifically, non-UTF-8) encoding, the URL paths you test should be in that encoding too. The filter/tester work safely with any character or percent-encoded sequence, even one that results in invalid UTF-8. The exception is a remote robots.txt in a Unicode encoding with a BOM; if that ever happens, you will need to convert it to UTF-8 (without a BOM) beforehand.
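
If you do have to deal with a UTF-8 source that carries a BOM, stripping it before calling `setSource()` is straightforward. A minimal pre-processing sketch (assumed helper, not part of this package):

```php
<?php

// Remove a leading UTF-8 BOM (EF BB BF) if present.
function stripUtf8Bom(string $data): string
{
    return strncmp($data, "\xEF\xBB\xBF", 3) === 0 ? substr($data, 3) : $data;
}

$raw = "\xEF\xBB\xBFUser-agent: *\nDisallow: /path";
var_dump(substr(stripUtf8Bom($raw), 0, 10)); // string(10) "User-agent"
```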

### Features

See the [features/behaviors table](https://github.com/ranvis/robots-txt-processor-test/wiki/Features) of the robots-txt-processor-test project.

Options
-------

Options can be specified in the first argument of the constructors. Normally, the default values should suffice to filter potentially offensive input while preserving the requested rules.

### `Tester` class options

- `'respectOrder' => false,`

    If true, path rules are processed in their specified order. If false, longer paths are processed first, as Googlebot does.
- `'ignoreForbidden' => false,`

    If true, a `401 Unauthorized` or `403 Forbidden` status set with `setResponseCode()` is treated as if no robots.txt existed, as Googlebot does, as opposed to the robotstxt.org spec.
- `'escapedWildcard' => false,`

    If true, `%2A` in a path line is treated as the wildcard `*`. Normally you don't want to set this to true for this class. See the `Filter` class options for more information.

`Tester->setSource(string)` internally instantiates `Filter` with the initially passed options and calls `Filter->getRecordSet(string)`.
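
The effect of `respectOrder` can be sketched with a simplified prefix matcher (hypothetical helper, not the library's code): with the default `false`, the longest matching path wins, as Googlebot behaves; with `true`, the first listed matching rule wins.

```php
<?php

// Return the directive of the matching rule, honoring the respectOrder flag.
// Each rule is [directive, path]; matching is a plain prefix test here.
function firstMatch(array $rules, string $target, bool $respectOrder): ?string
{
    if (!$respectOrder) {
        usort($rules, function (array $a, array $b): int {
            return strlen($b[1]) - strlen($a[1]); // longer paths first
        });
    }
    foreach ($rules as [$directive, $path]) {
        if (strncmp($target, $path, strlen($path)) === 0) {
            return $directive;
        }
    }
    return null;
}

$rules = [
    ['Disallow', '/path'],
    ['Allow', '/path/sub'],
];
var_dump(firstMatch($rules, '/path/sub/page', false)); // string(5) "Allow"
var_dump(firstMatch($rules, '/path/sub/page', true));  // string(8) "Disallow"
```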

### `Filter` class options

- `'maxRecords' => 1000,`

    Maximum number of records (grouped rules) to parse. Records beyond this limit are not kept. Don't set this too low, or the filter will give up before reaching your user-agents. This limit applies only to parsing; calling `setUserAgents()` limits which user-agents are kept.

`Filter->getRecordSet(string)` internally instantiates `FilterParser` with the initially passed options.

### `FilterParser` class options

- `'maxLines' => 1000,`

    Maximum number of lines to parse for each record (grouped or non-grouped). Lines beyond this limit are not kept for the current record.
- `'keepTrailingSpaces' => false,`

    If false, trailing spaces (including tabs) of lines without a comment are trimmed. For lines with a comment, spaces before `#` are always trimmed. Retaining spaces is required by both the robotstxt.org and Google specs.
- `'maxWildcards' => 10,`

    Maximum number of non-repeated `*` in a path to accept. If a path contains more than this, the rule itself is ignored.
- `'escapedWildcard' => true,`

    If true, `%2A` in a path line is treated as the wildcard `*` and becomes subject to the `maxWildcards` limit. When using an external tester, don't set this to false unless you are sure that your tester doesn't treat `%2A` as a wildcard (this package's tester does not), so that rules cannot circumvent the `maxWildcards` limit. (For testers listed as PeDecodeWildcard=yes in the [feature test table](https://github.com/ranvis/robots-txt-processor-test/wiki/Features), this flag should not be changed.)
- `'complementLeadingSlash' => true,`

    If true and the path starts with neither `/` nor `*` (which must be a mistake), `/` is prepended.
- `'pathMemberRegEx' => '/^(?:Dis)?Allow$/i',`

    The value of any directive matching this regex is treated as a path, and settings like `maxWildcards` are applied to it.
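
Two of the path checks above are easy to illustrate: counting non-repeated `*` wildcards for the `maxWildcards` limit, and complementing a missing leading slash. These are sketches with assumed helper names, not the library's internals:

```php
<?php

// A run of consecutive `*` characters counts as a single wildcard.
function countWildcards(string $path): int
{
    return preg_match_all('/\*+/', $path);
}

// Prepend `/` when the path starts with neither `/` nor `*`.
function complementLeadingSlash(string $path): string
{
    return preg_match('#^[/*]#', $path) ? $path : '/' . $path;
}

var_dump(countWildcards('/a*b**c'));        // int(2): `**` counts once
var_dump(complementLeadingSlash('path'));   // string(5) "/path"
var_dump(complementLeadingSlash('*/path')); // string(6) "*/path" (unchanged)
```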

`FilterParser` extends the `Parser` class.

### `Parser` class options

- `'maxUserAgents' => 1000,`

    Maximum number of user-agents to parse. User-agents beyond this limit are ignored, and any new grouped records after that point are skipped.
- `'maxDirectiveLength' => 32,`

    Maximum number of characters in a directive name. Longer directives are skipped. This must be at least 10 to parse the `User-agent` directive. Increase it if you need to keep custom directives with long names.
- `'maxNameLength' => 200,`

    Maximum number of characters in a `User-agent` value. Longer user-agent names are truncated.
- `'maxValueLength' => 2000,`

    Maximum number of characters in a directive value. Longer values are replaced with an `-ignored-` directive whose value contains the original value's length.
- `'userAgentRegEx' => '/^User-?agent$/i',`

    A directive matching this regex is treated as the `User-agent` directive.
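
The default `userAgentRegEx` is what lets the parser identify the misspelled `Useragent` directive mentioned in the introduction; the optional hyphen accepts both spellings:

```php
<?php

// The default pattern from the options above, applied to directive names.
$userAgentRegEx = '/^User-?agent$/i';

var_dump((bool)preg_match($userAgentRegEx, 'User-agent')); // bool(true)
var_dump((bool)preg_match($userAgentRegEx, 'Useragent'));  // bool(true)
var_dump((bool)preg_match($userAgentRegEx, 'User agent')); // bool(false)
```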

Interface
---------

- `new Tester(array $options = [])`
- `Tester->setSource($source, $userAgents = null)`
- `Tester->setResponseCode(int $code)`
- `Tester->isAllowed(string $targetPath, $userAgents = null)`
- `new Filter(array $options = [])`
- `Filter->setUserAgents($userAgents, bool $fallback = true) : RecordSet`
- `Filter->getRecordSet($source) : RecordSet`
- `new Parser(array $options = [])`
- `Parser->registerGroupDirective(string $directive)`
- `Parser->getRecordIterator($it) : \Traversable`
- `(string)RecordSet`
- `RecordSet->extract($userAgents = null)`
- `RecordSet->getRecord($userAgents = null, bool $dummy = true) : ?RecordSet`
- `RecordSet->getNonGroupRecord(bool $dummy = true) : ?RecordSet`
- `(string)Record`
- `Record->getValue(string $directive) : ?string`
- `Record->getValueIterator(string $directive) : \Traversable`

