dealnews/metadata
=================

Extracts metadata from web pages using oEmbed, OpenGraph, Twitter Cards, HTML scraping, and similar sources.

[Source](https://github.com/dealnews/metadata) | [Packagist](https://packagist.org/packages/dealnews/metadata)

DealNews Metadata Extractor
===========================

A PHP library for extracting rich metadata from web pages using multiple strategies: oEmbed, OpenGraph, JSON-LD, Twitter Cards, and HTML fallbacks.

Features
--------

- **Multi-Source Extraction**: Pulls metadata from oEmbed, OpenGraph, JSON-LD, Twitter Cards, and standard HTML
- **Priority-Based Merging**: Higher-quality sources take precedence (oEmbed → OpenGraph → JSON-LD → Twitter → HTML)
- **Flexible Input**: Fetch URLs directly or process pre-fetched HTML content
- **Typed Value Object**: Returns structured `Metadata` object with typed properties
- **Known Provider Support**: Fast-path for popular platforms (YouTube, Vimeo, Twitter/X, Instagram, etc.)
- **oEmbed Discovery**: Automatic discovery via HTML link tags
- **Configurable Error Handling**: Choose between exceptions or graceful degradation

Requirements
------------

- PHP 8.2 or higher
- ext-dom (included with PHP)
- ext-json (included with PHP)
- Guzzle 7.0+ (for HTTP requests)

Installation
------------

```
composer require dealnews/metadata
```

Basic Usage
-----------

### Extract from URL

```
use DealNews\Metadata\MetadataExtractor;

$extractor = new MetadataExtractor();
$metadata = $extractor->extract('https://example.com/article');

echo $metadata->title;        // "Article Title"
echo $metadata->description;  // "Article description..."
echo $metadata->image_url;    // "https://example.com/image.jpg"
echo $metadata->author;       // "John Doe"
```

### Extract from HTML

```
$extractor = new MetadataExtractor();
$html = '<html><head><title>My Page</title>...</head></html>';
$metadata = $extractor->extract($html, false);  // false = not a URL
```

### Configuration Options

```
$extractor = new MetadataExtractor([
    'throw_on_http_error' => true,   // Throw exceptions on HTTP failures
    'http_timeout'        => 15,     // Request timeout in seconds
    'user_agent'          => 'MyBot/1.0',  // Custom user agent
]);
```

Metadata Fields
---------------

The `Metadata` object contains the following properties:

| Property | Type | Description |
|----------|------|-------------|
| `title` | `?string` | Page title |
| `description` | `?string` | Page description |
| `url` | `?string` | Canonical URL |
| `image_url` | `?string` | Primary image URL |
| `image_width` | `?int` | Image width in pixels |
| `image_height` | `?int` | Image height in pixels |
| `type` | `?string` | Content type (article, video, etc.) |
| `site_name` | `?string` | Name of the website/publisher |
| `author` | `?string` | Author name |
| `published_time` | `?string` | Publication date/time (ISO 8601) |
| `modified_time` | `?string` | Last modified date/time (ISO 8601) |
| `oembed_html` | `?string` | Embedded HTML from oEmbed |
| `oembed_type` | `?string` | oEmbed type (video, photo, rich, link) |

All fields are nullable and will be `null` if not found.

Extraction Priority
-------------------

The library runs extractors in this order and merges results:

1. **oEmbed** (provider registry + discovery)
2. **OpenGraph** (og:\* meta tags)
3. **JSON-LD** (schema.org structured data)
4. **Twitter Cards** (twitter:\* meta tags)
5. **HTML** (title, meta description, canonical link)

Later extractors only fill fields that are still `null` - they won't overwrite data from higher-priority sources.
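
The fill-only-if-null rule can be illustrated with a small standalone sketch. This is not the library's internal code: the function name `merge_by_priority` and the plain-array representation of extractor results are assumptions made for illustration only.

```php
<?php
declare(strict_types=1);

/**
 * Merge field arrays in priority order: the first non-null value wins.
 * Hypothetical helper, not part of dealnews/metadata.
 *
 * @param array<int, array<string, mixed>> $sources Highest priority first.
 * @return array<string, mixed>
 */
function merge_by_priority(array $sources): array
{
    $merged = [];
    foreach ($sources as $fields) {
        foreach ($fields as $name => $value) {
            // Only fill a field that has not been set by a higher-priority source.
            if ($value !== null && !isset($merged[$name])) {
                $merged[$name] = $value;
            }
        }
    }
    return $merged;
}

// OpenGraph found a title but no description; plain HTML found both.
$open_graph = ['title' => 'OG Title', 'description' => null];
$html       = ['title' => 'HTML Title', 'description' => 'From meta tag'];

$result = merge_by_priority([$open_graph, $html]);
// The title comes from OpenGraph, the description from the HTML fallback.
```

Because null values are never stored, a lower-priority source can still supply a field that a higher-priority source left empty.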

oEmbed Support
--------------

### Supported Providers

The library includes built-in support for popular oEmbed providers:

- YouTube
- Vimeo
- Twitter/X
- Instagram
- Facebook
- TikTok
- SoundCloud
- Spotify

### Discovery

For sites not in the registry, the library automatically looks for oEmbed discovery links:

```
<link rel="alternate" type="application/json+oembed" href="https://example.com/oembed?url=..." />
```

**Note**: oEmbed endpoints may require API keys or have usage limits. These are the caller's responsibility to manage.
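
As a rough sketch of what discovery involves, the snippet below scans a document for `<link rel="alternate">` tags carrying one of the two oEmbed MIME types. The function name `find_oembed_links` is hypothetical and this is not the library's own implementation; it only assumes ext-dom, which the library already requires.

```php
<?php
declare(strict_types=1);

/**
 * Return oEmbed endpoint URLs advertised by <link> tags in an HTML document.
 * Hypothetical helper, not part of dealnews/metadata.
 *
 * @return string[]
 */
function find_oembed_links(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $types     = ['application/json+oembed', 'text/xml+oembed'];
    $endpoints = [];
    foreach ($doc->getElementsByTagName('link') as $link) {
        $rel  = strtolower($link->getAttribute('rel'));
        $type = strtolower($link->getAttribute('type'));
        $href = $link->getAttribute('href');
        if ($rel === 'alternate' && in_array($type, $types, true) && $href !== '') {
            $endpoints[] = $href;
        }
    }
    return $endpoints;
}

$html = '<html><head>'
    . '<link rel="alternate" type="application/json+oembed" '
    . 'href="https://example.com/oembed?url=https%3A%2F%2Fexample.com%2Fpost">'
    . '<link rel="stylesheet" href="/style.css">'
    . '</head><body></body></html>';

$endpoints = find_oembed_links($html);
// Only the oEmbed link is returned; the stylesheet link is ignored.
```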

Error Handling
--------------

### Graceful Degradation (Default)

By default, the library returns partial results on errors:

```
$extractor = new MetadataExtractor();
$metadata = $extractor->extract('https://nonexistent.example.com');
// Returns empty Metadata object, no exception thrown
```

### Strict Mode

Enable exceptions for HTTP errors:

```
$extractor = new MetadataExtractor(['throw_on_http_error' => true]);

try {
    $metadata = $extractor->extract('https://nonexistent.example.com');
} catch (\GuzzleHttp\Exception\GuzzleException $e) {
    // Handle HTTP error
}
```

Edge Cases
----------

### Relative URLs

Image and canonical URLs are resolved against the base URL when possible:

```
// HTML: <link rel="canonical" href="/page">
// Base URL: https://example.com/other
// Result: https://example.com/page
```
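
The resolution step might look roughly like the sketch below. `resolve_url` is a hypothetical helper, not the library's API, and it is deliberately simplified: it handles absolute, protocol-relative, root-relative, and path-relative references, but does not normalize `.` or `..` segments.

```php
<?php
declare(strict_types=1);

/**
 * Resolve a possibly relative URL against an absolute base URL.
 * Hypothetical, simplified helper: dot segments are not normalized.
 */
function resolve_url(string $base, string $relative): string
{
    // A reference with its own scheme is already absolute.
    if (is_string(parse_url($relative, PHP_URL_SCHEME))) {
        return $relative;
    }

    $parts = parse_url($base);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $relative; // Cannot resolve without an absolute base.
    }

    $origin = $parts['scheme'] . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Protocol-relative: reuse the base scheme.
    if (str_starts_with($relative, '//')) {
        return $parts['scheme'] . ':' . $relative;
    }
    // Root-relative: replace the whole base path.
    if (str_starts_with($relative, '/')) {
        return $origin . $relative;
    }
    // Path-relative: replace everything after the last "/" of the base path.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $relative;
}

$canonical = resolve_url('https://example.com/other', '/page');
$image     = resolve_url('https://example.com/blog/post', 'img/cover.jpg');
```

With the base URL from the example above, `$canonical` is `https://example.com/page`.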

### Multiple JSON-LD Blocks

The library handles pages with multiple `<script type="application/ld+json">` blocks and `@graph` structures.
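
To show what handling this involves, here is a minimal sketch that gathers every JSON-LD node on a page into one flat list, whether it appears as a single object, a top-level array, or inside an `@graph` container. The function name `collect_json_ld_nodes` is an assumption for illustration; the library's actual extractor may work differently.

```php
<?php
declare(strict_types=1);

/**
 * Collect every JSON-LD node in a document, flattening @graph containers
 * and top-level arrays. Hypothetical helper, not part of dealnews/metadata.
 *
 * @return array<int, array<string, mixed>>
 */
function collect_json_ld_nodes(string $html): array
{
    $doc = new DOMDocument();
    // Suppress libxml warnings caused by imperfect markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = [];
    foreach ($doc->getElementsByTagName('script') as $script) {
        if (strtolower($script->getAttribute('type')) !== 'application/ld+json') {
            continue;
        }
        $data = json_decode($script->textContent, true);
        if (!is_array($data)) {
            continue; // Skip malformed or scalar JSON.
        }
        if (isset($data['@graph']) && is_array($data['@graph'])) {
            $items = $data['@graph'];   // Graph container: use its members.
        } elseif (array_is_list($data)) {
            $items = $data;             // Top-level array of nodes.
        } else {
            $items = [$data];           // Single node.
        }
        foreach ($items as $item) {
            if (is_array($item)) {
                $nodes[] = $item;
            }
        }
    }
    return $nodes;
}

$html = '<html><head>'
    . '<script type="application/ld+json">{"@type":"Article","headline":"Hello"}</script>'
    . '<script type="application/ld+json">{"@graph":['
    . '{"@type":"Person","name":"Jane"},{"@type":"WebSite","name":"Example"}]}</script>'
    . '</head><body></body></html>';

$nodes = collect_json_ld_nodes($html);
// One node from the first block plus two from the @graph in the second.
```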

### User-Agent Headers

Some sites block requests without proper User-Agent headers. The library includes a default:

```
Mozilla/5.0 (compatible; MetadataBot/1.0)
```

Customize if needed:

```
$extractor = new MetadataExtractor([
    'user_agent' => 'MyCustomBot/2.0',
]);
```

### Character Encoding

The library uses PHP's DOMDocument for HTML parsing, which handles most encoding issues automatically via libxml.

Development
-----------

### Running Tests

```
composer install
./vendor/bin/phpunit tests/
```

### Code Coverage

```
./vendor/bin/phpunit tests/ --coverage-html coverage/
```

License
-------

BSD-3-Clause. See LICENSE file for details.

Contributing
------------

This is a DealNews internal library. For issues or questions, contact the development team.
