andhirto/php-goose
==================

Readability / HTML Content / Article Extractor & Web Scraping library written in PHP

Version 1.1.4 (4y ago) · Apache-2.0 · PHP >=7.1.0

[Source](https://github.com/andhirto/php-goose) · [Packagist](https://packagist.org/packages/andhirto/php-goose) · [Docs](https://github.com/scotteh/php-goose)

PHP Goose - Article Extractor
=============================

[![Scrutinizer Code Quality](https://camo.githubusercontent.com/53ef7691635f31f18085e04e4f6d66841874b850dff313417091b7d98a51b6e1/68747470733a2f2f7363727574696e697a65722d63692e636f6d2f672f73636f747465682f7068702d676f6f73652f6261646765732f7175616c6974792d73636f72652e706e673f623d6d6173746572)](https://scrutinizer-ci.com/g/scotteh/php-goose/?branch=master)

Intro
-----

PHP Goose is a port of [Goose](https://github.com/GravityLabs/goose/), which was originally developed in Java and later converted to Scala by [GravityLabs](https://github.com/GravityLabs/). Portions have also been ported from the Python port, [python-goose](https://github.com/grangier/python-goose). Its mission is to take any news article or article-type web page and extract not only the main body of the article but also its metadata and the most probable image candidate.

The goal is to get the purest possible extraction from the beginning of the article, in order to serve Flipboard/Pulse-type applications that need to show the first snippet of a web article along with an image.

Goose will try to extract the following information:

- Main text of an article
- Main image of the article
- Any YouTube/Vimeo videos embedded in the article
- Meta Description
- Meta tags
- Publish Date

The PHP version was rewritten by:

- Andrew Scott

Requirements
------------

- PHP 7.1 or later
- PSR-4 compatible autoloader

The older 0.x versions with PHP 5.5+ support are still available under [releases](https://github.com/scotteh/php-goose/releases).

Install
-------

This library is designed to be installed via [Composer](https://getcomposer.org/doc/).

Add the dependency to your project's composer.json.

```
{
  "require": {
    "scotteh/php-goose": "^1.0"
  }
}
```

Download composer.phar.

```
curl -sS https://getcomposer.org/installer | php
```

Install the library.

```
php composer.phar install
```

Autoloading
-----------

This library requires an autoloader. If you aren't already using one, you can include [Composer's autoloader](https://getcomposer.org/doc/01-basic-usage.md#autoloading).

```
require('vendor/autoload.php');
```

Usage
-----

```
use \Goose\Client as GooseClient;

$goose = new GooseClient();
$article = $goose->extractContent('http://url.to/article');

$title = $article->getTitle();
$metaDescription = $article->getMetaDescription();
$metaKeywords = $article->getMetaKeywords();
$canonicalLink = $article->getCanonicalLink();
$domain = $article->getDomain();
$tags = $article->getTags();
$links = $article->getLinks();
$videos = $article->getVideos();
$articleText = $article->getCleanedArticleText();
$entities = $article->getPopularWords();
$image = $article->getTopImage();
$allImages = $article->getAllImages();
```
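
As a rough sketch of how the getters above might be combined, the helper below collects a few fields into a plain array, for example to build a preview card. The function name `summarizeArticle` and the 200-character snippet length are illustrative assumptions and not part of PHP Goose. It also assumes that `getTitle()`, `getMetaDescription()` and `getCleanedArticleText()` return strings or null.

```php
<?php

/**
 * Hypothetical helper (not part of PHP Goose): collects a few fields
 * from an extracted article into a plain array.
 *
 * $article is expected to expose the getters shown in the usage example.
 * Assumption: those getters return strings or null.
 */
function summarizeArticle($article, int $snippetLength = 200): array
{
    $text = trim((string) $article->getCleanedArticleText());

    // Prefer the multibyte-safe function when the mbstring extension is loaded.
    $snippet = function_exists('mb_substr')
        ? mb_substr($text, 0, $snippetLength)
        : substr($text, 0, $snippetLength);

    return [
        'title'       => trim((string) $article->getTitle()),
        'description' => trim((string) $article->getMetaDescription()),
        // First part of the body, e.g. for a preview card.
        'snippet'     => $snippet,
        'has_text'    => $text !== '',
    ];
}

// Possible usage with the client from the example above:
//   $article = $goose->extractContent('http://url.to/article');
//   $summary = summarizeArticle($article);
```

Keeping the helper duck-typed (no class type hint) means it can be exercised with a simple stub object, without fetching a real page.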

Configuration
-------------

All configuration options are optional. The default (fallback) values are shown below.

```
use \Goose\Client as GooseClient;

$goose = new GooseClient([
    // Language - Selects common word dictionary
    //   Supported languages (ISO 639-1):
    //     ar, cs, da, de, en, es, fi, fr, hu, id, it, ja,
    //     ko, nb, nl, no, pl, pt, ru, sv, vi, zh
    'language' => 'en',
    // Minimum image size (bytes)
    'image_min_bytes' => 4500,
    // Maximum image size (bytes)
    'image_max_bytes' => 5242880,
    // Minimum image width (pixels)
    'image_min_width' => 120,
    // Minimum image height (pixels)
    'image_min_height' => 120,
    // Fetch best image
    'image_fetch_best' => true,
    // Fetch all images
    'image_fetch_all' => false,
    // Guzzle configuration - All values are passed directly to Guzzle
    //   See: http://guzzle.readthedocs.io/en/stable/request-options.html
    'browser' => [
        'timeout' => 60,
        'connect_timeout' => 30
    ]
]);
```
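
One way to keep per-site overrides separate from the defaults is to merge them before constructing the client. This is a minimal sketch: `buildGooseConfig` is a hypothetical helper name, and the default values are simply copied from the listing above. It relies only on PHP's built-in `array_replace_recursive`, so nested `browser` options are merged key by key rather than replaced wholesale.

```php
<?php

/**
 * Hypothetical helper (not part of PHP Goose): merges user overrides
 * into the documented default configuration.
 */
function buildGooseConfig(array $overrides = []): array
{
    // Defaults as listed in the Configuration section.
    $defaults = [
        'language'         => 'en',
        'image_min_bytes'  => 4500,
        'image_max_bytes'  => 5242880,
        'image_min_width'  => 120,
        'image_min_height' => 120,
        'image_fetch_best' => true,
        'image_fetch_all'  => false,
        'browser'          => [
            'timeout'         => 60,
            'connect_timeout' => 30,
        ],
    ];

    // Recursive replace keeps nested keys that are not overridden.
    return array_replace_recursive($defaults, $overrides);
}

$config = buildGooseConfig([
    'language' => 'de',
    'browser'  => ['timeout' => 10],
]);
// $config['browser'] now has timeout = 10 and still has connect_timeout = 30.
// The result can then be passed to the client: new \Goose\Client($config);
```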

Licensing
---------

PHP Goose is licensed by Gravity.com under the Apache 2.0 license, see the LICENSE file for more details.

### Health Score

31 (Low). Better than 68% of packages.

- Maintenance: 20. Infrequent updates; may be unmaintained.
- Popularity: 9. Limited adoption so far.
- Community: 18. Small or concentrated contributor base.
- Maturity: 69. Established project with proven stability.
- Bus Factor: 2. Two contributors hold 50%+ of commits.

How is this calculated?

- **Maintenance (25%)**: Last commit recency, latest release date, and issue-to-star ratio. Uses a 2-year decay window.
- **Popularity (30%)**: Total and monthly downloads, GitHub stars, and forks. Logarithmic scaling prevents top-heavy scores.
- **Community (15%)**: Contributors, dependents, forks, watchers, and maintainers. Measures real ecosystem engagement.
- **Maturity (30%)**: Project age, version count, PHP version support, and release stability.
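
The overall score shown above appears consistent with a plain weighted sum of the four sub-scores. The sketch below checks that arithmetic. The function name `healthScore` is hypothetical, and treating the score as a simple rounded weighted sum is an assumption, since the page does not spell out the exact formula.

```php
<?php

/**
 * Hypothetical reconstruction: weighted sum of sub-scores (each 0-100).
 * The weights are the percentages quoted on the page.
 */
function healthScore(array $scores): int
{
    $weights = [
        'maintenance' => 0.25,
        'popularity'  => 0.30,
        'community'   => 0.15,
        'maturity'    => 0.30,
    ];

    $total = 0.0;
    foreach ($weights as $name => $weight) {
        // Missing sub-scores are treated as 0.
        $total += $weight * ($scores[$name] ?? 0);
    }

    return (int) round($total);
}

// Sub-scores listed for this package:
// 0.25*20 + 0.30*9 + 0.15*18 + 0.30*69 = 5 + 2.7 + 2.7 + 20.7 = 31.1
echo healthScore([
    'maintenance' => 20,
    'popularity'  => 9,
    'community'   => 18,
    'maturity'    => 69,
]); // prints 31
```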

### Release Activity

- Cadence: every ~103 days (recently: every ~70 days)
- Total: 26
- Last Release: 1652d ago
- Major Versions: 0.6.4 → 1.0.0 (2018-02-14)

PHP version history (3 changes):

- 0.1.0: PHP >=5.4.0
- 0.4.0: PHP >=5.5.0
- 1.0.0: PHP >=7.1.0

### Community

Maintainers

- [andhirto](/maintainers/andhirto)

---

Top Contributors

- [squallstar](https://github.com/squallstar) (22 commits)
- [andhirto](https://github.com/andhirto) (4 commits)
- [shaneiseminger](https://github.com/shaneiseminger) (2 commits)
- [r-da](https://github.com/r-da) (2 commits)
- [cdubz](https://github.com/cdubz) (2 commits)
- [dependabot-preview\[bot\]](<https://github.com/dependabot-preview[bot]>) (1 commit)
- [elmariachi111](https://github.com/elmariachi111) (1 commit)
- [FaZeRs](https://github.com/FaZeRs) (1 commit)
- [jeroenseegers](https://github.com/jeroenseegers) (1 commit)
- [lucascvs](https://github.com/lucascvs) (1 commit)
- [mhugot](https://github.com/mhugot) (1 commit)
- [oliverhermanni](https://github.com/oliverhermanni) (1 commit)
- [peter279k](https://github.com/peter279k) (1 commit)
- [psefranek](https://github.com/psefranek) (1 commit)
- [rbatukaev](https://github.com/rbatukaev) (1 commit)
- [samwilson](https://github.com/samwilson) (1 commit)
- [sters](https://github.com/sters) (1 commit)
- [tfevens](https://github.com/tfevens) (1 commit)
- [anare](https://github.com/anare) (1 commit)
- [treeleaf](https://github.com/treeleaf) (1 commit)

---

Tags

http, content, website, text, extractor, scraper, scraping, readability

### Code Quality

Tests: PHPUnit

### Alternatives

- [spatie/crawler](/packages/spatie-crawler): Crawl all internal links found on a website
- [duzun/hquery](/packages/duzun-hquery): An extremely fast web scraper that parses megabytes of HTML in a blink of an eye. No dependencies. PHP5+
- [graham-campbell/guzzle-factory](/packages/graham-campbell-guzzle-factory): Provides A Simple Guzzle Factory With Good Defaults
- [middlewares/negotiation](/packages/middlewares-negotiation): Middleware to implement content negotiation
- [laurentvw/scrapher](/packages/laurentvw-scrapher): A web scraper for PHP to easily extract data from web pages
- [ptlis/conneg](/packages/ptlis-conneg): Tools for performing content negotiation.
