magebitcom/pii-redactor
=======================

Framework-agnostic PII detection and redaction for text strings

Version v1.0.0 · License MIT · PHP ^8.1

[Source](https://github.com/magebitcom/pii-redactor) · [Packagist](https://packagist.org/packages/magebitcom/pii-redactor)

PII Redactor
============

Framework-agnostic PHP library for detecting and redacting PII in text. No runtime dependencies. PHP 8.1+.

Install
-------

```
composer require magebitcom/pii-redactor
```

Quick start
-----------

```
use Magebit\PiiRedactor\PiiRedactor;

$redactor = new PiiRedactor();

echo $redactor->redact('Mail john@example.com, card 4111 1111 1111 1111')->text();
// Mail [EMAIL], card [CREDIT_CARD]
```

Detect without redacting
------------------------

```
foreach ($redactor->analyze($text)->matches() as $match) {
    printf("%s [%d, %d) score %.2f via %s\n",
        $match->entityType, $match->start, $match->end, $match->score, $match->detectorName);
}
```

Built-in detectors
------------------

EMAIL, PHONE, CREDIT\_CARD (Luhn-validated), IBAN (mod-97-validated), IP\_ADDRESS (v4/v6), MAC\_ADDRESS, URL, CRYPTO\_ADDRESS (BTC/ETH), DATE\_OF\_BIRTH.

Each detector lives in its own class under `Detector\Builtin\` (regexes, validator and sanitize config), wired together by `BuiltinDetectors`.

Checksum validation is tri-state: a passing checksum forces confidence to 1.0, a failing one discards the match, otherwise the pattern's base score applies.
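The tri-state rule can be sketched in a few lines. This is a minimal illustration, not the library's implementation: `luhnValid()` and `scoreWithChecksum()` are hypothetical helper names, and the base score of 0.6 is an assumed value.

```php
<?php
declare(strict_types=1);

/** Luhn checksum: true when the digit string passes. */
function luhnValid(string $digits): bool
{
    if ($digits === '' || !ctype_digit($digits)) {
        return false;
    }
    $sum = 0;
    $double = false; // the rightmost digit is not doubled
    for ($i = strlen($digits) - 1; $i >= 0; $i--) {
        $d = (int) $digits[$i];
        if ($double) {
            $d *= 2;
            if ($d > 9) {
                $d -= 9; // same as summing the two digits of the product
            }
        }
        $sum += $d;
        $double = !$double;
    }
    return $sum % 10 === 0;
}

/**
 * Tri-state scoring (hypothetical helper, not the library's API):
 *   true  -> checksum passed, confidence forced to 1.0
 *   false -> checksum failed, match discarded (null)
 *   null  -> no checksum applies, the base score is kept
 */
function scoreWithChecksum(float $baseScore, ?bool $checksum): ?float
{
    return match ($checksum) {
        true => 1.0,
        false => null,
        null => $baseScore,
    };
}

$passed  = scoreWithChecksum(0.6, luhnValid('4111111111111111')); // 1.0
$dropped = scoreWithChecksum(0.6, luhnValid('4111111111111112')); // null
$kept    = scoreWithChecksum(0.6, null);                          // 0.6
```

An IBAN mod-97 check would slot into the same three-way decision in place of `luhnValid()`.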

### Context boosting

Weak patterns (a bare 8-digit phone number, an ambiguous date) carry a low base score on purpose. They only cross the reporting threshold (default 0.5) when a context word appears within ~40 characters before the match — `+0.35` to the score, shown in the match's `explanation` (e.g. `context "phone" (+0.35)`). This is the library's primary false-positive control: without it, weak patterns either flood with noise at a high base score or stay undetectable.
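As a rough sketch of the mechanism, the boost amounts to a look-behind over a fixed window. The helper name `boostScore()` and the base score of 0.2 are assumptions for illustration; only the 40-character window, the `+0.35` boost and the 0.5 threshold come from the description above.

```php
<?php
declare(strict_types=1);

/**
 * Returns the (possibly boosted) score for a match starting at byte offset $start.
 * Hypothetical helper; uses ASCII-only lowercasing for simplicity.
 *
 * @param string[] $contextWords
 */
function boostScore(
    string $text,
    int $start,
    float $baseScore,
    array $contextWords,
    int $window = 40,
    float $boost = 0.35,
): float {
    // Only the text immediately before the match is inspected.
    $from = max(0, $start - $window);
    $before = strtolower(substr($text, $from, $start - $from));

    foreach ($contextWords as $word) {
        if (str_contains($before, strtolower($word))) {
            return min(1.0, $baseScore + $boost);
        }
    }
    return $baseScore;
}

$text = 'Call me, phone 26123456 after lunch';
$start = strpos($text, '26123456');

// Context word present: about 0.55, which crosses the 0.5 threshold.
$boosted = boostScore($text, $start, 0.2, ['phone', 'tel']);

// No context word: stays at the base score and is not reported.
$plain = boostScore('Order 26123456 shipped', 6, 0.2, ['phone', 'tel']);
```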

The library ships a single English (`en`) context-word pack. To recognise terms in other languages, or words specific to your domain, extend the default pack with your own words; no fork is required. `ContextWords` is immutable: `withWords()` merges the new words into an entity type's existing list (de-duplicated) and returns a copy:

```
use Magebit\PiiRedactor\Context\ContextWords;
use Magebit\PiiRedactor\Detector\DetectorRegistry;
use Magebit\PiiRedactor\EntityType;
use Magebit\PiiRedactor\PiiRedactor;

// English context words only (default)
$redactor = new PiiRedactor();

// Extend an existing entity type (e.g. Latvian phone words) and add a custom one
$words = ContextWords::default()
    ->withWords(EntityType::PHONE, ['tālrunis', 'telefons'])   // built-in type
    ->withWords('EMPLOYEE_ID', ['employee', 'staff']);          // custom type (string)

$redactor = new PiiRedactor(DetectorRegistry::withDefaults($words));
```

Pass `ContextWords::none()` for a pack with no context words at all. For full control over a single detector's words, `Builtin\PhoneDetector::create($words)` still accepts a plain `string[]`.

**Opting out of boosting.** To score matches purely on their patterns, inject the no-op enhancer (or build detectors without context words):

```
use Magebit\PiiRedactor\Analyzer;
use Magebit\PiiRedactor\NullContextEnhancer;
use Magebit\PiiRedactor\Detector\DetectorRegistry;

$analyzer = new Analyzer(DetectorRegistry::withDefaults(), new NullContextEnhancer());
```

With boosting disabled, weak detectors (bare phone/date forms) stay below the default threshold and are not reported; strong detectors (email, validated card or IBAN) are unaffected.

Per-entity strategies
---------------------

```
use Magebit\PiiRedactor\RedactionConfig;
use Magebit\PiiRedactor\Strategy\MaskStrategy;
use Magebit\PiiRedactor\PiiRedactor;
use Magebit\PiiRedactor\Strategy\ReplaceStrategy;

$config = new RedactionConfig(
    ['CREDIT_CARD' => new MaskStrategy('*', 4, true)],   // **** -> ************1111
    new ReplaceStrategy(''),            // default for everything else
);

$redactor = new PiiRedactor(config: $config);
```
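To make the masking output concrete, here is a small stand-alone sketch of a "keep the last N characters" mask. It is not `MaskStrategy` itself: `maskKeepLast()` is a hypothetical helper, and reading the arguments `('*', 4, true)` as mask character, visible count and keep-from-end is an assumption based on the example output.

```php
<?php
declare(strict_types=1);

/**
 * Masks everything except the last $visible characters (hypothetical helper).
 * Values no longer than $visible are masked completely, so short values
 * are never revealed in full.
 */
function maskKeepLast(string $value, string $maskChar = '*', int $visible = 4): string
{
    $length = strlen($value);
    if ($visible <= 0 || $length <= $visible) {
        return str_repeat($maskChar, $length);
    }
    return str_repeat($maskChar, $length - $visible) . substr($value, -$visible);
}

$masked = maskKeepLast('4111111111111111'); // ************1111
$short  = maskKeepLast('123');              // ***
```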

Options
-------

```
use Magebit\PiiRedactor\AnalyzerOptions;

$options = new AnalyzerOptions(
    entityTypes: ['EMAIL', 'CREDIT_CARD'],   // restrict types
    minScore: 0.6,                           // raise threshold
    allowList: ['support@magebit.com'],      // never redact these literals
);
```

Custom detectors
----------------

```
use Magebit\PiiRedactor\Detector\DetectorRegistry;
use Magebit\PiiRedactor\Detector\Pattern;
use Magebit\PiiRedactor\Detector\PatternDetector;
use Magebit\PiiRedactor\PiiRedactor;

$registry = DetectorRegistry::withDefaults();
$registry->register(new PatternDetector('employee-id', 'EMPLOYEE_ID', [
    new Pattern('emp', '/\bEMP-\d{6}\b/u', 0.9, ['EMP-']),
]));

$redactor = new PiiRedactor($registry);
```

The optional fourth `Pattern` argument is a list of `requiredNeedles`: cheap literal substrings the engine checks with `str_contains` before running the regex. If none are present the regex is skipped entirely. Every needle must be guaranteed to appear in any real match (here, `EMP-` is part of the pattern), or matches will be lost. The built-in EMAIL/URL/CRYPTO/PHONE detectors use this to skip their regexes on the (common) lines that contain no `@`, `://`, `0x`, etc.
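The needle pre-check described above can be sketched as follows. `matchWithNeedles()` is a hypothetical helper written for illustration, not the engine's code; it also shows how a needle that is not guaranteed to appear (here, the wrong letter case) silently loses a real match.

```php
<?php
declare(strict_types=1);

/**
 * Runs $regex only when at least one needle occurs in $text.
 * Hypothetical helper illustrating the pre-check.
 *
 * @param string[] $needles
 * @return string[] matched substrings
 */
function matchWithNeedles(string $text, string $regex, array $needles): array
{
    if ($needles !== []) {
        $found = false;
        foreach ($needles as $needle) {
            if (str_contains($text, $needle)) { // cheap, case-sensitive literal check
                $found = true;
                break;
            }
        }
        if (!$found) {
            return []; // regex skipped entirely
        }
    }
    preg_match_all($regex, $text, $m);
    return $m[0];
}

$regex = '/\bEMP-\d{6}\b/u';

$hits = matchWithNeedles('badge EMP-123456 scanned', $regex, ['EMP-']); // ['EMP-123456']
$none = matchWithNeedles('nothing to see here', $regex, ['EMP-']);      // []

// A needle that real matches do not contain: the regex never runs.
$lost = matchWithNeedles('badge EMP-123456 scanned', $regex, ['emp-']); // []
```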

For ML-grade name/location detection, extend `Detector\RemoteDetector` and call your NER provider (Google DLP, AWS Comprehend, a Presidio sidecar); it handles chunking and fail-open/fail-closed behavior.

Performance & logging
-------------------------

Logging is often on a blocking hot path, so per-call latency matters. **Construct `PiiRedactor` once and reuse it** — do not build a new instance per log record. Construction wires up 9 detectors and their context-word packs, which costs about as much as a full short-line redaction; reusing the instance roughly halves per-call latency.

Measured on PHP 8.4 (Xdebug off), reusing a single instance:

| Scenario | µs/op |
| --- | --- |
| Clean short log line (no PII) | ~5.3 |
| Short line with one email | ~5.9 |
| `new PiiRedactor()` per call + redact | ~11.0 |

So a naive `(new PiiRedactor())->redact($line)` inside a Monolog processor is ~2x slower than holding one instance on the processor. For a Monolog processor, build the redactor in the constructor and call `redact()` in `__invoke()`.
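A minimal sketch of that processor shape, assuming Monolog 2-style array records (Monolog 3 passes a `LogRecord` object instead, so the body of `__invoke()` would differ). `RedactingProcessor` is a hypothetical class name, and the redaction callable is a stand-in so the sketch runs without the library installed.

```php
<?php
declare(strict_types=1);

/**
 * Processor sketch: the redaction callable is created once and reused
 * for every record, instead of being rebuilt per log call.
 */
final class RedactingProcessor
{
    /** @var callable(string): string */
    private $redact;

    public function __construct(callable $redact)
    {
        // Wired up once, when the logger is configured.
        $this->redact = $redact;
    }

    public function __invoke(array $record): array
    {
        $record['message'] = ($this->redact)((string) $record['message']);
        return $record;
    }
}

// Stand-in redaction. In real use the callable would wrap one long-lived
// instance, for example:
//   $redactor = new PiiRedactor();
//   fn (string $s): string => $redactor->redact($s)->text()
$processor = new RedactingProcessor(
    fn (string $s): string => preg_replace('/\S+@\S+/', '[EMAIL]', $s) ?? $s
);

$out = $processor(['message' => 'Mail john@example.com now']);
// $out['message'] === 'Mail [EMAIL] now'
```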

A throwaway benchmark harness lives at `tools/benchmark.php` (`php -dxdebug.mode=off tools/benchmark.php`).

Guarantees & limits
-----------------------

- Match offsets are byte offsets and UTF-8 safe; all bundled regexes use the `u` modifier.
- A failing detector never breaks redaction (reported in `failures()`, strict mode available).
- No regex-based detector can promise 100% recall — treat this as defense in depth, not a compliance guarantee.
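On the first point, byte offsets and character positions diverge as soon as the text contains multibyte characters. The plain-PHP sketch below (no library calls; the regex is a simplified stand-in) shows that `preg_match()` with `PREG_OFFSET_CAPTURE` reports bytes even under the `u` modifier, so such offsets pair with `substr()` rather than `mb_substr()`.

```php
<?php
declare(strict_types=1);

$text = 'Tālrunis: john@example.com';

// Simplified email pattern, for illustration only.
preg_match('/[\w.]+@[\w.]+/u', $text, $m, PREG_OFFSET_CAPTURE);
[$match, $offset] = $m[0];

// 'ā' occupies two bytes in UTF-8, so the match starts at byte 11
// even though it is the 11th character (character index 10).
$byteSlice = substr($text, $offset, strlen($match)); // 'john@example.com'
```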
