scotteh/php-goose
=================

Readability / HTML content / article extractor & web scraping library written in PHP

Latest release: 1.1.1 (2y ago) · License: Apache-2.0 · Requires PHP >=7.1.0

[Source](https://github.com/scotteh/php-goose) · [Packagist](https://packagist.org/packages/scotteh/php-goose)

PHP Goose - Article Extractor
=============================

Note
----

This repository has been archived as of 2023-09-05.

Intro
-----

PHP Goose is a port of [Goose](https://github.com/GravityLabs/goose/), originally developed in Java and later converted to Scala by [GravityLabs](https://github.com/GravityLabs/). Portions have also been ported from the Python port [python-goose](https://github.com/grangier/python-goose). Its mission is to take any news article or article-type web page and extract not only the main body of the article but also its metadata and the most probable image candidate.

The goal is to get the purest possible extraction from the beginning of the article, to serve Flipboard/Pulse-style applications that need to show the first snippet of a web article along with an image.

Goose will try to extract the following information:

- Main text of an article
- Main image of article
- Any YouTube/Vimeo movies embedded in article
- Meta Description
- Meta tags
- Publish Date

The PHP version was rewritten by:

- Andrew Scott

Requirement
-----------

- PHP 7.1 or later
- PSR-4 compatible autoloader

The older 0.x versions with PHP 5.5+ support are still available under [releases](https://github.com/scotteh/php-goose/releases).

Install
-------

This library is designed to be installed via [Composer](https://getcomposer.org/doc/).

Add the dependency to your project's composer.json.

```
{
  "require": {
    "scotteh/php-goose": "^1.0"
  }
}

```

Download composer.phar.

```
curl -sS https://getcomposer.org/installer | php
```

Install the library.

```
php composer.phar install
```

Autoloading
-----------

This library requires an autoloader. If you aren't already using one, you can include [Composer's autoloader](https://getcomposer.org/doc/01-basic-usage.md#autoloading).

```
require('vendor/autoload.php');
```

Usage
-----

```
use \Goose\Client as GooseClient;

$goose = new GooseClient();
$article = $goose->extractContent('http://url.to/article');

$title = $article->getTitle();
$metaDescription = $article->getMetaDescription();
$metaKeywords = $article->getMetaKeywords();
$canonicalLink = $article->getCanonicalLink();
$domain = $article->getDomain();
$tags = $article->getTags();
$links = $article->getLinks();
$videos = $article->getVideos();
$articleText = $article->getCleanedArticleText();
$entities = $article->getPopularWords();
$image = $article->getTopImage();
$allImages = $article->getAllImages();
```
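
Downstream code usually wants to guard against a page where little or nothing could be extracted. The following is a minimal sketch, not part of the library: `buildPreview()` and its array keys are hypothetical names, and it assumes that `getTitle()`, `getCleanedArticleText()`, and `getTopImage()` may return null or an empty value in that case. It accepts any object exposing those three methods, so the helper itself does not depend on Goose.

```php
<?php

/**
 * Build a small preview record (title, first snippet, image flag)
 * from an extracted article.
 *
 * Hypothetical helper: $article is expected to be the object returned
 * by extractContent(), typed as `object` to keep the sketch standalone.
 * Assumption: the getters may return null when nothing was found.
 */
function buildPreview(object $article, int $maxWords = 40): array
{
    $text = trim((string) $article->getCleanedArticleText());

    // Split on whitespace runs; the /u flag treats the text as UTF-8.
    // preg_split() returns false on invalid UTF-8, so fall back to [].
    $words = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    $snippet = implode(' ', array_slice($words, 0, $maxWords));
    if (count($words) > $maxWords) {
        $snippet .= '...';
    }

    return [
        'title'     => (string) $article->getTitle(),
        'snippet'   => $snippet,
        'has_image' => $article->getTopImage() !== null,
    ];
}
```

With the `$article` from the example above, `buildPreview($article, 25)` would give an array holding the title, roughly the first 25 words of the cleaned text, and a flag indicating whether a top image was found.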

Configuration
-------------

All config options are optional. The default (fallback) values are shown below.

```
use \Goose\Client as GooseClient;

$goose = new GooseClient([
    // Language - Selects common word dictionary
    //   Supported languages (ISO 639-1):
    //     ar, cs, da, de, en, es, fi, fr, hu, id, it, ja,
    //     ko, nb, nl, no, pl, pt, ru, sv, vi, zh
    'language' => 'en',
    // Minimum image size (bytes)
    'image_min_bytes' => 4500,
    // Maximum image size (bytes)
    'image_max_bytes' => 5242880,
    // Minimum image width (pixels)
    'image_min_width' => 120,
    // Minimum image height (pixels)
    'image_min_height' => 120,
    // Fetch best image
    'image_fetch_best' => true,
    // Fetch all images
    'image_fetch_all' => false,
    // Guzzle configuration - All values are passed directly to Guzzle
    //   See: http://guzzle.readthedocs.io/en/stable/request-options.html
    'browser' => [
        'timeout' => 60,
        'connect_timeout' => 30
    ]
]);
```
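
When only a few options differ from the defaults, it can be convenient to keep the overrides in one place. The sketch below is one way to do that; `gooseOptions()` is a hypothetical helper, not a library function. It spells out the nested `browser` array in full because this README does not say whether the library merges a partial `browser` array with its own defaults.

```php
<?php

/**
 * Merge caller overrides onto the fallback values listed in this README.
 * Hypothetical helper; the name is not part of the library.
 */
function gooseOptions(array $overrides = []): array
{
    $defaults = [
        'language'         => 'en',
        'image_min_bytes'  => 4500,
        'image_max_bytes'  => 5242880,
        'image_min_width'  => 120,
        'image_min_height' => 120,
        'image_fetch_best' => true,
        'image_fetch_all'  => false,
        'browser'          => [
            'timeout'         => 60,
            'connect_timeout' => 30,
        ],
    ];

    // array_replace_recursive() keeps untouched nested keys, such as
    // browser.connect_timeout, when only browser.timeout is overridden.
    return array_replace_recursive($defaults, $overrides);
}

$options = gooseOptions([
    'language' => 'de',
    'browser'  => ['timeout' => 10],
]);

// $options would then be passed to the constructor shown above:
// $goose = new \Goose\Client($options);
```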

Licensing
---------

PHP Goose is licensed by Gravity.com under the Apache 2.0 license. See the LICENSE file for more details.


###  Release Activity

- Cadence: every ~148 days (recently: every ~463 days)
- Total: 23
- Last release: 1032d ago
- Major versions: 0.6.4 → 1.0.0 (2018-02-14)

PHP version history (3 changes):

- 0.1.0: PHP >=5.4.0
- 0.4.0: PHP >=5.5.0
- 1.0.0: PHP >=7.1.0
