pforret/pf-article-extractor
============================

PfArticleExtractor. Boilerplate Removal and Fulltext Extraction from HTML pages

Version 0.3.4 · MIT license · PHP ^8.2

[Source](https://github.com/pforret/pf-article-extractor) · [Packagist](https://packagist.org/packages/pforret/pf-article-extractor)

[![Tests](https://github.com/pforret/pf-article-extractor/actions/workflows/run-tests.yml/badge.svg)](https://github.com/pforret/pf-article-extractor/actions)[![GitHub Release](https://camo.githubusercontent.com/09e24494c31187ce21823cf6b82890e38492282e65e1f45025b010e35b7ba7bc/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f762f72656c656173652f70666f727265742f70662d61727469636c652d657874726163746f72)](https://camo.githubusercontent.com/09e24494c31187ce21823cf6b82890e38492282e65e1f45025b010e35b7ba7bc/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f762f72656c656173652f70666f727265742f70662d61727469636c652d657874726163746f72)[![GitHub Tag](https://camo.githubusercontent.com/f0258d1e5e2e6ef3511cd560bb5ee80e3feef9584e47c009b5f34a546fffa7df/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f762f7461672f70666f727265742f70662d61727469636c652d657874726163746f72)](https://camo.githubusercontent.com/f0258d1e5e2e6ef3511cd560bb5ee80e3feef9584e47c009b5f34a546fffa7df/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f762f7461672f70666f727265742f70662d61727469636c652d657874726163746f72)[![GitHub commit activity](https://camo.githubusercontent.com/ea1bc6a4dc048bb2477a0f486180ddf5d286f257ebbef22b9dafd207662d996a/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f636f6d6d69742d61637469766974792f792f70666f727265742f70662d61727469636c652d657874726163746f72)](https://camo.githubusercontent.com/ea1bc6a4dc048bb2477a0f486180ddf5d286f257ebbef22b9dafd207662d996a/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f636f6d6d69742d61637469766974792f792f70666f727265742f70662d61727469636c652d657874726163746f72)[![Packagist 
Downloads](https://camo.githubusercontent.com/9ef185fd9ceecd304f5a351cb93a12d1ba524d3de485bff67c5913aa4327e944/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f64742f70666f727265742f70662d61727469636c652d657874726163746f72)](https://packagist.org/packages/pforret/pf-article-extractor)[![PHP](https://camo.githubusercontent.com/d147c5bc4bc4956647b06838470b687d65dbf54cc4a8f098a14b8ecb66ff6148/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f7068702d2532333737374242342e7376673f6c6f676f3d706870266c6f676f436f6c6f723d7768697465)](https://camo.githubusercontent.com/d147c5bc4bc4956647b06838470b687d65dbf54cc4a8f098a14b8ecb66ff6148/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f7068702d2532333737374242342e7376673f6c6f676f3d706870266c6f676f436f6c6f723d7768697465)[![GitHub License](https://camo.githubusercontent.com/19e795194d73111f0407a94ade3d96e5aaefd1ae810f7f0a4848fd926970eabe/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f6c6963656e73652f70666f727265742f70662d61727469636c652d657874726163746f72)](https://camo.githubusercontent.com/19e795194d73111f0407a94ade3d96e5aaefd1ae810f7f0a4848fd926970eabe/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f6c6963656e73652f70666f727265742f70662d61727469636c652d657874726163746f72)

[![](assets/unsplash.squeeze.jpg)](assets/unsplash.squeeze.jpg)

Boilerplate Removal and Fulltext Extraction from HTML pages.

A rewrite of `dotpack/php-boiler-pipe` for PHP 8.2 and up, with tests.

Installation
------------

```
composer require pforret/pf-article-extractor
```

Usage
-----

```
use Pforret\PfArticleExtractor\ArticleExtractor;

$articleData = ArticleExtractor::getArticle($html);
/*
 * $articleData = Pforret\PfArticleExtractor\Formats\ArticleContentsDTO Object
(
    [title] => Film Podcast: Wicked Little Letters Named Film of the Month
    [content] => UK Film Club was back in March with a new episode of their film podcast. (...)
    [date] =>
    [images] => Array
        (
            [0] => https://static.wixstatic.com/media/.../b19cd0_dde0d59546f84127865267f43994f39b~mv2.jpg
        )

    [links] => Array
        (
            [0] => https://www.chrisolson.co.uk/
            (...)
        )

)

 */
```
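
The dump above suggests that the returned object exposes `title`, `content`, `images` and `links` as readable properties. Below is a minimal sketch of consuming them, assuming that holds. `summarizeArticle` is a hypothetical helper, not part of the package, and it accepts any object so that it can be tried without the library installed. The `date` field is left out because its type is not visible in the dump.

```php
<?php

/**
 * Hypothetical helper (not part of the package): builds a short plain-text
 * summary from an object with title, content, images and links properties.
 */
function summarizeArticle(object $article, int $maxChars = 80): string
{
    // Missing or null properties fall back to empty values.
    $title = trim((string) ($article->title ?? ''));
    $content = trim((string) ($article->content ?? ''));

    // Cut long content to an excerpt, counting characters rather than bytes.
    $excerpt = mb_strlen($content) > $maxChars
        ? rtrim(mb_substr($content, 0, $maxChars)) . '...'
        : $content;

    $imageCount = count($article->images ?? []);
    $linkCount = count($article->links ?? []);

    return sprintf(
        "%s\n%s\n(%d images, %d links)",
        $title !== '' ? $title : '(untitled)',
        $excerpt,
        $imageCount,
        $linkCount
    );
}
```

With the package installed, the call would look like `echo summarizeArticle(ArticleExtractor::getArticle($html));`.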

Under the hood
--------------

- The package accepts a full HTML page as input.
- It walks the DOM tree to locate the main article content.
- It removes boilerplate content (headers, footers, sidebars, ...).
- It extracts the remaining main article content.
- It tries to extract the title, date, images and links from the article.
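
The steps above can be illustrated with PHP's built-in DOM extension. This is a rough sketch and not the package's actual algorithm: the function name `sketchExtractArticle`, the list of boilerplate tags and the "block with the most paragraph text wins" heuristic are all assumptions made for illustration.

```php
<?php

/**
 * Illustrative sketch only, NOT the package's implementation.
 * Removes common boilerplate elements, then keeps the block whose direct
 * <p> children hold the most text.
 */
function sketchExtractArticle(string $html): array
{
    $empty = ['title' => '', 'content' => '', 'images' => [], 'links' => []];
    if (trim($html) === '') {
        return $empty; // DOMDocument::loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Collect libxml parse warnings (e.g. for unknown tags) instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    $titleNode = $xpath->query('//title')->item(0);
    $title = $titleNode ? trim($titleNode->textContent) : '';

    // Step 1: drop elements that are usually boilerplate (assumed tag list).
    $boilerplate = '//script | //style | //nav | //header | //footer | //aside | //form';
    foreach (iterator_to_array($xpath->query($boilerplate)) as $node) {
        $node->parentNode?->removeChild($node);
    }

    // Step 2: pick the container with the most text in its direct <p> children.
    $best = null;
    $bestLength = 0;
    foreach ($xpath->query('//article | //main | //section | //div | //body') as $candidate) {
        $length = 0;
        foreach ($xpath->query('./p', $candidate) as $p) {
            $length += strlen(trim($p->textContent));
        }
        if ($length > $bestLength) {
            $best = $candidate;
            $bestLength = $length;
        }
    }
    if ($best === null) {
        return ['title' => $title] + $empty;
    }

    // Step 3: collect paragraph text, image sources and link targets.
    $paragraphs = [];
    foreach ($xpath->query('./p', $best) as $p) {
        $paragraphs[] = trim($p->textContent);
    }
    $images = [];
    foreach ($xpath->query('.//img[@src]', $best) as $img) {
        $images[] = $img->getAttribute('src');
    }
    $links = [];
    foreach ($xpath->query('.//a[@href]', $best) as $a) {
        $links[] = $a->getAttribute('href');
    }

    return [
        'title' => $title,
        'content' => implode("\n\n", $paragraphs),
        'images' => $images,
        'links' => $links,
    ];
}
```

A real extractor needs far more than this, for example text-density scoring, character-encoding detection and date parsing, which is where a dedicated package earns its keep.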

Right now it is tested with example pages from:

- Blogger
- Drupal
- Jekyll
- Mkdocs
- Wix
- WordPress

Similar packages
----------------

- [beautifulsoup4](https://pypi.org/project/beautifulsoup4/) - Python, MIT
- [html-text](https://pypi.org/project/html-text/) - Python, MIT
- [kohlschutter/boilerpipe](https://github.com/kohlschutter/boilerpipe) - Java, Apache 2.0
- [fivefilters/readability.php](https://github.com/fivefilters/readability.php) - PHP, GPL-3.0
- [miso-belica/jusText](https://github.com/miso-belica/jusText) - Python, BSD2
- [codelucas/newspaper](https://github.com/codelucas/newspaper) - Python, Apache


### Community

Maintainer: [Peter Forret](/maintainers/pforret) ([@pforret](https://github.com/pforret))


### Code Quality

Tests: PHPUnit

Code Style: Laravel Pint
