givetwice/product-info-fetcher
==============================

Given a product URL, returns structured data about that product, including name, description, and price.

Version 0.4.2 · MIT license · Requires PHP ^8.4 · [Source](https://github.com/GiveTwice/product-info-fetcher) · [Packagist](https://packagist.org/packages/givetwice/product-info-fetcher)

Product Info Fetcher
====================

[![Latest Version on Packagist](https://camo.githubusercontent.com/7530f6abb424eab9794e8db26a8859623e3b0ddb4ae7f3d79bcccb8c36d3d716/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f762f6769766574776963652f70726f647563742d696e666f2d666574636865722e7376673f7374796c653d666c61742d737175617265)](https://packagist.org/packages/givetwice/product-info-fetcher)[![Tests](https://camo.githubusercontent.com/16e5b01092630c964029413a829496931488d85130d4867e198f0eef06b8100d/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f616374696f6e732f776f726b666c6f772f7374617475732f4769766554776963652f70726f647563742d696e666f2d666574636865722f72756e2d74657374732e796d6c3f6272616e63683d6d61696e266c6162656c3d7465737473267374796c653d666c61742d737175617265)](https://github.com/GiveTwice/product-info-fetcher/actions/workflows/run-tests.yml)[![Total Downloads](https://camo.githubusercontent.com/b65df48bc8349f3ab0946a5e411676d5d526b123692bb287b64c49419527541d/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f64742f6769766574776963652f70726f647563742d696e666f2d666574636865722e7376673f7374796c653d666c61742d737175617265)](https://packagist.org/packages/givetwice/product-info-fetcher)

A PHP package that fetches product information from any URL and returns structured data. It parses JSON-LD structured data, Open Graph meta tags, and HTML image elements to extract product details like name, description, price, and images.

Installation
------------

```
composer require givetwice/product-info-fetcher
```

Usage
-----

### Basic

```
use GiveTwice\ProductInfoFetcher\ProductInfoFetcher;

$product = (new ProductInfoFetcher('https://example.com/product'))
    ->fetchAndParse();

// Core fields
echo $product->name;          // "iPhone 15 Pro"
echo $product->description;   // "The latest iPhone with A17 Pro chip"
echo $product->priceInCents;  // 99900
echo $product->priceCurrency; // "USD"
echo $product->url;           // "https://example.com/product"
echo $product->imageUrl;      // "https://example.com/images/iphone.jpg"
print_r($product->allImageUrls); // ["https://...", "https://..."] (all found images)

// Additional fields
echo $product->brand;         // "Apple"
echo $product->sku;           // "IPHONE15PRO-256"
echo $product->gtin;          // "0194253392200"
echo $product->availability?->value; // "InStock" (ProductAvailability::InStock)
echo $product->condition?->value;    // "New" (ProductCondition::New)
echo $product->rating;        // 4.8
echo $product->reviewCount;   // 1250

// For display purposes
echo $product->getFormattedPrice(); // "USD 999.00"
```

### Pricing

Prices are stored as integers in cents to avoid floating-point precision issues. This follows the same approach used by payment systems like Stripe.

```
$product->priceInCents;  // 139099 (integer)
$product->priceCurrency; // "EUR" (ISO 4217 currency code)

// For display
$product->getFormattedPrice(); // "EUR 1390.99"

// For calculations (no floating-point issues)
$total = $product->priceInCents * $quantity;
$displayPrice = number_format($total / 100, 2);
```
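
To illustrate why integer cents are safer than floats, here is a small self-contained sketch. `formatCents()` is a hypothetical helper written for this example; it is not the package's `getFormattedPrice()` and assumes a non-negative amount in a currency with two decimal places.

```php
<?php

// Floats cannot represent most decimal fractions exactly.
var_dump(0.1 + 0.2 === 0.3); // bool(false)

// The same sum expressed in integer cents is exact.
var_dump(10 + 20 === 30); // bool(true)

/**
 * Hypothetical formatter (not the package's getFormattedPrice()).
 * Assumes a non-negative amount and a two-decimal currency.
 */
function formatCents(int $cents, string $currency): string
{
    // intdiv() and % keep the whole calculation in integer arithmetic.
    return sprintf('%s %d.%02d', $currency, intdiv($cents, 100), $cents % 100);
}

echo formatCents(139099, 'EUR'), PHP_EOL; // EUR 1390.99
```

Because no float is ever created, totals such as `$priceInCents * $quantity` stay exact until the moment they are formatted for display.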

The parser normalizes various price formats:

- String prices: `"999.00"` → `99900`
- Integer prices (treated as whole currency units): `1479` → `147900`
- European format: `"1.234,56"` → `123456`
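
A minimal sketch of how such normalization could work, covering only the three cases listed above. `toCents()` is a hypothetical helper, not the package's parser, and its separator heuristic (a final `.` or `,` followed by one or two digits is the decimal mark) is an assumption made for this example.

```php
<?php

/**
 * Hypothetical helper: converts a raw price value to integer cents.
 * Returns null when the input contains no digits.
 */
function toCents(int|float|string $raw): ?int
{
    if (is_int($raw)) {
        // Integers are treated as whole currency units: 1479 → 147900.
        return $raw * 100;
    }

    if (is_float($raw)) {
        return (int) round($raw * 100);
    }

    // Keep only digits and separators, dropping currency symbols and spaces.
    $clean = preg_replace('/[^\d.,]/', '', $raw);
    if ($clean === null || preg_match('/\d/', $clean) !== 1) {
        return null;
    }

    // Assumption: a final separator followed by 1-2 digits is the decimal mark.
    if (preg_match('/^(.*)[.,](\d{1,2})$/', $clean, $matches) === 1) {
        $whole = $matches[1];
        $fraction = str_pad($matches[2], 2, '0'); // "5" becomes "50"
    } else {
        $whole = $clean;
        $fraction = '00';
    }

    // Any separators left in the whole part are thousands separators.
    $whole = str_replace(['.', ','], '', $whole);

    return (int) $whole * 100 + (int) $fraction;
}

var_dump(toCents('999.00'));   // int(99900)
var_dump(toCents(1479));       // int(147900)
var_dump(toCents('1.234,56')); // int(123456)
```

A heuristic like this cannot tell `"1,234"` (one thousand two hundred thirty-four) from a three-decimal price, so real-world input may need currency-aware rules.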

### Availability & Condition

The `availability` and `condition` fields return enum instances:

```
use GiveTwice\ProductInfoFetcher\Enum\ProductAvailability;
use GiveTwice\ProductInfoFetcher\Enum\ProductCondition;

// Availability values:
//   ProductAvailability::InStock
//   ProductAvailability::OutOfStock
//   ProductAvailability::PreOrder
//   ProductAvailability::BackOrder
//   ProductAvailability::Discontinued

// Condition values:
//   ProductCondition::New
//   ProductCondition::Used
//   ProductCondition::Refurbished
//   ProductCondition::Damaged

// Usage
if ($product->availability === ProductAvailability::InStock) {
    // Product is available
}

// Get string value
$product->availability?->value; // "InStock"
```
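
Since `availability` may be null (hence the `?->` above), display code usually needs a branch for the unknown case. The sketch below uses a stand-in enum named `Availability` so that it runs on its own; only the `"InStock"` backing value is confirmed by the example above, the other backing values and all labels are assumptions.

```php
<?php

// Stand-in for the package's ProductAvailability enum, redeclared so this
// sketch is self-contained. Backing values other than "InStock" are assumed.
enum Availability: string
{
    case InStock = 'InStock';
    case OutOfStock = 'OutOfStock';
    case PreOrder = 'PreOrder';
    case BackOrder = 'BackOrder';
    case Discontinued = 'Discontinued';
}

/**
 * Hypothetical helper: maps an availability (or null) to a display label.
 */
function availabilityLabel(?Availability $availability): string
{
    // match is exhaustive here: every enum case plus null is covered.
    return match ($availability) {
        Availability::InStock => 'In stock',
        Availability::OutOfStock => 'Out of stock',
        Availability::PreOrder => 'Available for pre-order',
        Availability::BackOrder => 'On back order',
        Availability::Discontinued => 'Discontinued',
        null => 'Availability unknown',
    };
}

echo availabilityLabel(Availability::InStock), PHP_EOL; // In stock
echo availabilityLabel(null), PHP_EOL;                  // Availability unknown
```

Using `match` instead of a chain of `if` statements means an unhandled case raises an `UnhandledMatchError` rather than silently falling through.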

### Multiple Images

The `imageUrl` field contains the primary image (first found). The `allImageUrls` array contains all unique images found across all sources (JSON-LD, meta tags, and HTML image elements):

```
$product->imageUrl;      // Primary image (first found)
$product->allImageUrls;  // All unique images from all sources

// Example: different resolutions from different sources
// [0] "http://example.com/product-370x370.jpg"  (from JSON-LD)
// [1] "//example.com/product-big.jpg"           (from og:image)
```

This is useful when sources provide different image sizes or when you want fallback options.
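
The example list above contains a protocol-relative URL (`//example.com/...`). If your application needs absolute URLs, a post-processing step along these lines may help. `normalizeImageUrls()` is a hypothetical helper for this example, and defaulting to `https` is an assumption.

```php
<?php

/**
 * Hypothetical helper: makes protocol-relative URLs absolute and removes
 * duplicates that only appear after normalization.
 *
 * @param list<string> $urls
 * @return list<string>
 */
function normalizeImageUrls(array $urls, string $scheme = 'https'): array
{
    $normalized = array_map(
        static fn (string $url): string => str_starts_with($url, '//')
            ? $scheme . ':' . $url
            : $url,
        $urls,
    );

    // array_unique keeps the first occurrence; array_values reindexes from 0.
    return array_values(array_unique($normalized));
}

print_r(normalizeImageUrls([
    'http://example.com/product-370x370.jpg',
    '//example.com/product-big.jpg',
    'https://example.com/product-big.jpg',
]));
```

In this run the second and third entries collapse into a single `https://example.com/product-big.jpg`, leaving two URLs.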

### With Options

```
$product = (new ProductInfoFetcher('https://example.com/product'))
    ->setUserAgent('MyApp/1.0 (https://myapp.com)')
    ->setTimeout(10)
    ->setConnectTimeout(5)
    ->setAcceptLanguage('nl-BE,nl;q=0.9,en;q=0.8')
    ->fetchAndParse();
```

### Custom Headers

For sites with stricter bot detection, you can add extra HTTP headers to mimic a real browser:

```
$product = (new ProductInfoFetcher('https://example.com/product'))
    ->withExtraHeaders([
        'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
        'Cache-Control' => 'no-cache',
        'DNT' => '1',
        'Sec-CH-UA' => '"Google Chrome";v="131", "Chromium";v="131"',
        'Sec-CH-UA-Mobile' => '?0',
        'Sec-CH-UA-Platform' => '"macOS"',
        'Sec-Fetch-Dest' => 'document',
        'Sec-Fetch-Mode' => 'navigate',
        'Sec-Fetch-Site' => 'none',
        'Sec-Fetch-User' => '?1',
        'Upgrade-Insecure-Requests' => '1',
    ])
    ->fetchAndParse();
```

Extra headers are merged with defaults and can override them. Multiple `withExtraHeaders()` calls can be chained.
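
The merge semantics can be pictured with the sketch below. `mergeHeaders()` and the default header values are hypothetical; whether the package compares header names case-insensitively is not stated in this README, so that part is an assumption based on how HTTP treats header names.

```php
<?php

/**
 * Hypothetical illustration of "merge with defaults, later values win".
 * Header names are compared in lowercase because HTTP header names are
 * case-insensitive; the most recent spelling of a name is kept.
 *
 * @param array<string, string> $defaults
 * @param array<string, string> ...$extraSets one array per chained call
 * @return array<string, string>
 */
function mergeHeaders(array $defaults, array ...$extraSets): array
{
    $byLowerName = [];

    foreach ([$defaults, ...$extraSets] as $set) {
        foreach ($set as $name => $value) {
            $byLowerName[strtolower($name)] = [$name, $value];
        }
    }

    $merged = [];
    foreach ($byLowerName as [$name, $value]) {
        $merged[$name] = $value;
    }

    return $merged;
}

print_r(mergeHeaders(
    ['User-Agent' => 'Default/1.0', 'Accept' => 'text/html'], // assumed defaults
    ['accept' => '*/*'],                                      // overrides Accept
    ['DNT' => '1'],                                           // second chained call
));
```

Each extra array stands for one `withExtraHeaders()` call, applied in the order the calls were chained.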

### HTTP Proxy

Route requests through an HTTP proxy:

```
$product = (new ProductInfoFetcher('https://example.com/product'))
    ->viaProxy('http://proxy.example.com:3128')
    ->fetchAndParse();

// With authentication
$product = (new ProductInfoFetcher('https://example.com/product'))
    ->viaProxy('http://username:password@proxy.example.com:3128')
    ->fetchAndParse();
```

The proxy is used for both regular HTTP requests (via Guzzle/cURL) and headless browser requests (via Chrome's `--proxy-server` flag). Proxy authentication is handled automatically.
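
Chrome's `--proxy-server` flag is generally given only the server address, so credentials embedded in a proxy URL have to be separated out first. The sketch below shows one way to do that with PHP's `parse_url()`. `splitProxyUrl()` is a hypothetical helper; the package's internals may differ.

```php
<?php

/**
 * Hypothetical helper: splits a proxy URL into a server address and
 * optional credentials.
 *
 * @return array{server: string, username: ?string, password: ?string}
 */
function splitProxyUrl(string $proxyUrl): array
{
    $parts = parse_url($proxyUrl);

    if ($parts === false || !isset($parts['host'])) {
        throw new InvalidArgumentException("Invalid proxy URL: {$proxyUrl}");
    }

    // Assumption: default to http when no scheme is given.
    $server = ($parts['scheme'] ?? 'http') . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $server .= ':' . $parts['port'];
    }

    return [
        'server' => $server,
        // Credentials in URLs are percent-encoded, so decode them.
        'username' => isset($parts['user']) ? rawurldecode($parts['user']) : null,
        'password' => isset($parts['pass']) ? rawurldecode($parts['pass']) : null,
    ];
}

print_r(splitProxyUrl('http://username:password@proxy.example.com:3128'));
```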

### Headless Browser

Some sites use advanced bot protection (Akamai, Cloudflare) that blocks simple HTTP requests. For these sites, you can use a headless Chrome browser via Puppeteer to fetch the page.

**Prefer Headless (recommended for known-protected sites):**

```
$product = (new ProductInfoFetcher('https://example.com/product'))
    ->preferHeadless()
    ->fetchAndParse();
```

Use `preferHeadless()` when you know the site has bot protection. This skips the HTTP request entirely and uses headless Chrome directly. This is more efficient because attempting an HTTP request first can get your IP flagged, causing subsequent headless requests to also fail.

**Fallback Mode:**

```
$product = (new ProductInfoFetcher('https://example.com/product'))
    ->enableHeadlessFallback()
    ->fetchAndParse();
```

When enabled, the fetcher will:

1. First attempt a normal HTTP request
2. If blocked with a 403 status, automatically retry using a headless Chrome browser
3. Parse the resulting HTML as usual

Use `enableHeadlessFallback()` when you're unsure if the site has bot protection and want to try the faster HTTP request first.
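
The two modes can be summarized in a short sketch. Everything here is hypothetical: `fetchWithFallback()` and the two callables stand in for the real HTTP client and headless browser, and each callable is assumed to return an array with `status` and `body` keys.

```php
<?php

/**
 * Hypothetical sketch of the decision flow, not the package's code.
 *
 * @param callable(string): array{status: int, body: string} $httpFetch
 * @param callable(string): array{status: int, body: string} $headlessFetch
 */
function fetchWithFallback(
    string $url,
    callable $httpFetch,
    callable $headlessFetch,
    bool $preferHeadless = false,
): string {
    if ($preferHeadless) {
        // Prefer mode: skip the plain HTTP request entirely.
        return $headlessFetch($url)['body'];
    }

    $response = $httpFetch($url);

    if ($response['status'] === 403) {
        // Fallback mode: retry with the headless browser when blocked.
        return $headlessFetch($url)['body'];
    }

    return $response['body'];
}

// Usage with fake fetchers that record the order of calls.
$calls = [];
$http = function (string $url) use (&$calls): array {
    $calls[] = 'http';
    return ['status' => 403, 'body' => 'blocked'];
};
$headless = function (string $url) use (&$calls): array {
    $calls[] = 'headless';
    return ['status' => 200, 'body' => '<html>ok</html>'];
};

echo fetchWithFallback('https://example.com/product', $http, $headless), PHP_EOL;
echo implode(' -> ', $calls), PHP_EOL; // http -> headless
```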

**Requirements:**

To use either headless mode (`preferHeadless()` or `enableHeadlessFallback()`), you need Node.js 18+ and Puppeteer installed:

```
npm install puppeteer@^23.0 puppeteer-extra@^3.3 puppeteer-extra-plugin-stealth@^2.11
```

On Linux servers, you may also need system dependencies for headless Chrome:

```
# Ubuntu 24.04+
apt-get install -y libnss3 libatk1.0-0t64 libatk-bridge2.0-0t64 \
  libcups2t64 libdrm2 libxkbcommon0 libxcomposite1 \
  libxdamage1 libxfixes3 libxrandr2 libgbm1 libasound2t64 \
  libpango-1.0-0 libcairo2

# Ubuntu 22.04 and earlier
apt-get install -y libnss3 libatk1.0-0 libatk-bridge2.0-0 \
  libcups2 libdrm2 libxkbcommon0 libxcomposite1 \
  libxdamage1 libxfixes3 libxrandr2 libgbm1 libasound2
```

**Custom Paths:**

```
$product = (new ProductInfoFetcher('https://example.com/product'))
    ->enableHeadlessFallback()
    ->setNodeBinary('/usr/local/bin/node')
    ->setChromePath('/usr/bin/chromium')
    ->fetchAndParse();
```

### Separate Fetch and Parse

```
$fetcher = new ProductInfoFetcher('https://example.com/product');
$fetcher->fetch();
$product = $fetcher->parse();
```

### Parse Existing HTML

```
$product = (new ProductInfoFetcher())
    ->setHtml($html)
    ->parse();
```

### Access as Array

```
$product = (new ProductInfoFetcher($url))->fetchAndParse();

$data = $product->toArray();
// [
//     'name' => 'iPhone 15 Pro',
//     'description' => 'The latest iPhone...',
//     'url' => 'https://example.com/product',
//     'priceInCents' => 99900,
//     'priceCurrency' => 'USD',
//     'imageUrl' => 'https://example.com/image.jpg',
//     'allImageUrls' => ['https://...', 'https://...'],
//     'brand' => 'Apple',
//     'sku' => 'IPHONE15PRO-256',
//     'gtin' => '0194253392200',
//     'availability' => 'InStock',
//     'condition' => 'New',
//     'rating' => 4.8,
//     'reviewCount' => 1250,
// ]
```

### Check Completeness

```
if ($product->isComplete()) {
    // Product has name and description
}
```

How It Works
------------

The package attempts to extract product information in the following order:

1. **JSON-LD** - Looks for `<script type="application/ld+json">` with `@type: Product` or `@type: ProductGroup`
2. **Meta Tags** - Falls back to Open Graph (`og:`), Twitter Cards (`twitter:`), and standard meta tags
3. **HTML Images** - Extracts product images directly from `<img>` elements using common patterns (Amazon's `landingImage`, product image classes, data attributes)

If the first parser returns complete data (name and description), the fetcher returns that result immediately. Otherwise, it merges results from multiple parsers. Images from all three sources are always combined.
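
One way to picture this merge is "earlier parsers win, later parsers only fill gaps, images are pooled". The sketch below is an assumption about the strategy, not the package's code; `mergeParserResults()` and the array shape (`name`, `description`, `images`) are hypothetical.

```php
<?php

/**
 * Hypothetical sketch of the merge strategy.
 *
 * @param array{name?: ?string, description?: ?string, images?: list<string>} ...$results
 *        parser results in priority order (JSON-LD, meta tags, HTML images)
 * @return array{name: ?string, description: ?string, images: list<string>}
 */
function mergeParserResults(array ...$results): array
{
    $merged = ['name' => null, 'description' => null, 'images' => []];

    foreach ($results as $result) {
        // ??= only assigns when the field is still null, so earlier parsers win.
        $merged['name'] ??= $result['name'] ?? null;
        $merged['description'] ??= $result['description'] ?? null;

        // Images are pooled from every parser.
        $merged['images'] = [...$merged['images'], ...($result['images'] ?? [])];
    }

    $merged['images'] = array_values(array_unique($merged['images']));

    return $merged;
}

print_r(mergeParserResults(
    ['name' => 'iPhone 15 Pro', 'description' => null, 'images' => ['a.jpg']],
    ['name' => 'iPhone 15 Pro | Shop', 'description' => 'The latest iPhone', 'images' => ['a.jpg', 'b.jpg']],
    ['name' => null, 'description' => null, 'images' => ['c.jpg']],
));
```

When the first result already has both a name and a description, later results cannot change those fields, which gives the same outcome as returning early while still pooling the images.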

### Supported Structures

- **schema.org Product** - Standard product markup including `offers`, `brand`, `sku`, `gtin`, `aggregateRating`
- **schema.org ProductGroup** - Product variants (e.g., bol.com) with `hasVariant[]`
- **Open Graph** - `og:title`, `og:description`, `og:image`, `product:price:amount`, `product:price:currency`, `product:availability`, `product:condition`

Both short (`"@type": "Product"`) and full URL (`"@type": "http://schema.org/Product"`) formats are supported for all schema.org types.
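
As a rough illustration of accepting both `@type` formats, the sketch below locates a Product or ProductGroup object in a page's JSON-LD. `findProductJsonLd()` and `isProductType()` are hypothetical names, the code needs PHP's DOM extension, and it ignores cases such as `@graph` wrappers or array-valued `@type`. Accepting an `https://` prefix is an extra assumption.

```php
<?php

/**
 * Hypothetical check: true for "Product" / "ProductGroup" in short or
 * full-URL form (http://schema.org/...; https is accepted as an assumption).
 */
function isProductType(mixed $type): bool
{
    if (!is_string($type)) {
        return false;
    }

    $short = preg_replace('#^https?://schema\.org/#', '', $type);

    return in_array($short, ['Product', 'ProductGroup'], true);
}

/**
 * Hypothetical sketch: returns the first JSON-LD product object, or null.
 */
function findProductJsonLd(string $html): ?array
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML() rejects empty input
    }

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate invalid markup
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('script') as $script) {
        if ($script->getAttribute('type') !== 'application/ld+json') {
            continue;
        }

        $data = json_decode($script->textContent, true);
        if (!is_array($data)) {
            continue;
        }

        // A block may hold one object or a list of objects.
        $candidates = array_is_list($data) ? $data : [$data];

        foreach ($candidates as $candidate) {
            if (is_array($candidate) && isProductType($candidate['@type'] ?? null)) {
                return $candidate;
            }
        }
    }

    return null;
}

$html = '<html><head><script type="application/ld+json">'
    . '{"@type":"http://schema.org/Product","name":"iPhone 15 Pro"}'
    . '</script></head><body></body></html>';

echo findProductJsonLd($html)['name'] ?? 'not found', PHP_EOL; // iPhone 15 Pro
```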

### Meta Tag Fallback Chain

When JSON-LD is unavailable, the parser tries multiple sources:

- **name**: `og:title` → `twitter:title` → `<title>`
- **description**: `og:description` → `twitter:description` → `<meta name="description">`
- **image**: `og:image` → `twitter:image` → HTML image elements
- **url**: `<link rel="canonical">` → `og:url`
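
Each chain above amounts to "take the first source that has a non-empty value". A minimal sketch, assuming the tag values have already been extracted into an array; `firstAvailable()` is a hypothetical helper and the `title` key is a placeholder for whatever the last fallback source is.

```php
<?php

/**
 * Hypothetical helper: returns the first non-empty value among the keys.
 *
 * @param array<string, ?string> $tags values already extracted from the page
 * @param list<string>           $keys keys in priority order
 */
function firstAvailable(array $tags, array $keys): ?string
{
    foreach ($keys as $key) {
        // Missing keys, nulls, and whitespace-only values all count as empty.
        $value = trim($tags[$key] ?? '');
        if ($value !== '') {
            return $value;
        }
    }

    return null;
}

$tags = [
    'og:title' => '',
    'twitter:title' => 'iPhone 15 Pro',
    'title' => 'iPhone 15 Pro | Example Shop',
];

echo firstAvailable($tags, ['og:title', 'twitter:title', 'title']), PHP_EOL; // iPhone 15 Pro
```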

### HTML Image Extraction

For sites without structured data or meta tags (e.g., Amazon), the package extracts images directly from HTML:

- **Amazon pattern**: `<img id="landingImage">` with `data-old-hires` for high-res images
- **Common IDs**: `main-image`, `product-image`, `hero-image`
- **Common classes**: `product-image`, `main-image`, `gallery-image`
- **Data attributes**: `data-zoom-image`, `data-large-image`, `data-src`

High-resolution images are prioritized when available.

Testing
-------

```
composer test
```

Changelog
---------

Please see [CHANGELOG](CHANGELOG.md) for more information on what has changed recently.

Contributing
------------

Please see [CONTRIBUTING](https://github.com/spatie/.github/blob/main/CONTRIBUTING.md) for details.

Security Vulnerabilities
------------------------

Please review [our security policy](../../security/policy) on how to report security vulnerabilities.

Credits
-------

- [GiveTwice](https://github.com/GiveTwice)
- [All Contributors](../../contributors)

License
-------

The MIT License (MIT). Please see [License File](LICENSE.md) for more information.



