Active library · [Parsing & Serialization](/categories/parsing)

markuspoerschke/extractum
=========================

Extract information from web pages.

Version 1.0.3 (4y ago) · MIT license · PHP ^7.4 || ^8.0 · [6 PRs](https://github.com/markuspoerschke/extractum/pulls)

Since Dec 23 · Pushed 1y ago · 2 watchers

[Source](https://github.com/markuspoerschke/extractum) · [Packagist](https://packagist.org/packages/markuspoerschke/extractum)


Extractum
=========

*Extractum* is a PHP library that extracts information from web pages.

Getting Started
---------------

### Installation

```
composer require markuspoerschke/extractum
```

### Usage

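```php
<?php
require __DIR__ . '/vendor/autoload.php'; // Composer's autoloader

$uri = 'https://www.example.com/';
$html = file_get_contents($uri);

if ($html === false) {
    throw new RuntimeException("Could not fetch {$uri}");
}

$extractor = new Extractum\Extractor();
$essence = $extractor->extract($html, $uri); // an Extractum\Essence object
```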

Extracted Information
---------------------

The extracted information is returned as an object of type `Extractum\Essence`.
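
Two entries in the table that follows describe normalization: `text` has line breaks and unnecessary spaces removed, and `language` is a two-character code read from the `html` tag. As an illustration only, the sketch below computes comparable values with PHP's built-in `preg_replace` and `DOMDocument`. It is not Extractum's implementation; the helper names `normalizeText` and `languageCode` are hypothetical, and taking the first two characters of the `lang` attribute is an assumption about how the code is derived.

```php
<?php

/**
 * Hypothetical helper, not part of Extractum: collapses every run of
 * whitespace (spaces, tabs, line breaks) into one space and trims the ends.
 */
function normalizeText(string $text): string
{
    // The /u flag treats the input as UTF-8; preg_replace returns null on
    // invalid UTF-8, in which case the input is only trimmed.
    $collapsed = preg_replace('/\s+/u', ' ', $text);

    return trim($collapsed ?? $text);
}

/**
 * Hypothetical helper, not part of Extractum: returns the first two
 * characters of the <html> element's lang attribute in lower case
 * (assumption: "de-DE" becomes "de"), or null if there is no usable value.
 */
function languageCode(string $html): ?string
{
    // DOMDocument::loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return null;
    }

    $document = new DOMDocument();

    // Collect libxml warnings about imperfect real-world markup instead of
    // printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $loaded = $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded || $document->documentElement === null) {
        return null;
    }

    $lang = trim($document->documentElement->getAttribute('lang'));

    return strlen($lang) >= 2 ? strtolower(substr($lang, 0, 2)) : null;
}

echo normalizeText("  Hello,\n\n   world!  "), PHP_EOL; // Hello, world!
echo languageCode('<html lang="de-DE"><body><h1>Hallo</h1></body></html>'), PHP_EOL; // de
```

Reading the attribute through `DOMDocument` rather than a regular expression keeps the sketch independent of attribute order and quoting style.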

| Property | Description |
| --- | --- |
| `date` | The date when the web page was published. |
| `description` | Usually the meta description or another excerpt. |
| `image` | The URL of the preview image, usually defined as an Open Graph attribute. |
| `language` | The two-character language code of the `html` tag. |
| `links` | All links within the main content. |
| `parsedDate` | A `DateTimeImmutable` object if `date` |
| `text` | Unformatted text of the main content, with all line breaks and unnecessary spaces removed. |
| `title` | The web page's title, usually the content of the first `h1` tag. |

License
-------

This package is released under the [MIT license](LICENSE).

### Health Score

34 (Low; better than 75% of packages)

- Maintenance: 29. Infrequent updates; may be unmaintained.
- Popularity: 19. Limited adoption so far.
- Community: 13. Small or concentrated contributor base.
- Maturity: 65. Established project with proven stability.
- Bus factor: 1. Top contributor holds 88.5% of commits, a single point of failure.

How is this calculated?

- **Maintenance (25%)**: Last commit recency, latest release date, and issue-to-star ratio. Uses a 2-year decay window.
- **Popularity (30%)**: Total and monthly downloads, GitHub stars, and forks. Logarithmic scaling prevents top-heavy scores.
- **Community (15%)**: Contributors, dependents, forks, watchers, and maintainers. Measures real ecosystem engagement.
- **Maturity (30%)**: Project age, version count, PHP version support, and release stability.
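
The listed sub-scores are consistent with the overall score being the weighted mean of the four categories. Assuming that is how the figure is derived (the page does not state the formula explicitly), the arithmetic works out as:

$$
0.25 \cdot 29 + 0.30 \cdot 19 + 0.15 \cdot 13 + 0.30 \cdot 65 = 7.25 + 5.70 + 1.95 + 19.50 = 34.4 \approx 34
$$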

### Release Activity

- Cadence: every ~328 days
- Total: 5
- Last release: 705 days ago

### Community

Maintainers: [markuspoerschke](/maintainers/markuspoerschke)

Top contributors:

- dependabot[bot]: 753 commits
- dependabot-preview[bot]: 63 commits
- [markuspoerschke](https://github.com/markuspoerschke): 31 commits
- [markuspoerschke-bot](https://github.com/markuspoerschke-bot): 4 commits

Tags: extractor, hacktoberfest, html-parser, information-extraction, readability, scraper

### Code Quality

- Tests: PHPUnit
- Static analysis: Psalm
- Code style: PHP CS Fixer
- Type coverage: Yes

### Alternatives

- [craftcms/cms](/packages/craftcms-cms): Craft CMS
- [spatie/crawler](/packages/spatie-crawler): Crawl all internal links found on a website
- [drupal/drupal-extension](/packages/drupal-drupal-extension): Drupal extension for Behat
- [drupal/core-dev](/packages/drupal-core-dev): require-dev dependencies from drupal/drupal; use in addition to drupal/core-recommended to run tests from drupal/core.
- [blackfire/player](/packages/blackfire-player): A powerful web crawler and web scraper with Blackfire support
- [chameleon-system/chameleon-base](/packages/chameleon-system-chameleon-base): The Chameleon System core.

PHPackages © 2026

