michaeluno/php-simple-web-scraper
=================================

A PHP application that runs on Heroku and dumps web site output, including JavaScript-generated content.

Version 1.4.4, MIT license, requires PHP >=5.6.20

[Source](https://github.com/michaeluno/php-simple-web-scraper) | [Packagist](https://packagist.org/packages/michaeluno/php-simple-web-scraper)

PHP Simple Web Scraper
======================

A PHP application for Heroku that dumps web site output, including JavaScript-generated content.

 [![](_asset/image/screenshot.jpg "screenshot")](_asset/image/screenshot.jpg)

Demo
----

Visit [here](https://php-simple-web-scraper.herokuapp.com/). If the server is sleeping, it takes several seconds to wake up.

Usage
-----

### Basic Usage

Send an HTTP request to the app with the `url` query parameter set to the URL-encoded target address.

```
http(s)://{app-address}/?url={encoded target url}

```

#### Example

```
http(s)://{app-address}/?url=https%3A%2F%2Fgithub.com

```
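The encoding step can also be done programmatically. The following is a minimal Python sketch, not part of the application itself; the app address `https://example-scraper.herokuapp.com` and the helper name `scraper_url` are placeholders assumed for illustration.

```python
from urllib.parse import quote

# Hypothetical app address; replace it with the address of your own deployment.
APP_ADDRESS = "https://example-scraper.herokuapp.com"


def scraper_url(target_url, app_address=APP_ADDRESS):
    """Return the app URL that requests a dump of target_url."""
    # safe="" makes quote() encode ':' and '/' too, matching the example above.
    return f"{app_address}/?url={quote(target_url, safe='')}"


print(scraper_url("https://github.com"))
# https://example-scraper.herokuapp.com/?url=https%3A%2F%2Fgithub.com
```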

### Parameters

#### output

Determines the output type. Accepted values are `html`, `json`, and `screenshot`.

##### *html* (default)

The HTML source code of the target web site. JavaScript-generated content is also retrieved and dumped.

##### *json*

`output=json`

The HTTP response data as JSON. Useful for cross-domain communication with JSONP.

###### Example

```
http(s)://{app-address}/?url=https%3A%2F%2Fgithub.com&output=json

```

##### *screenshot*

`output=screenshot`

An image of the rendered site. The default format is JPEG.

###### Example

```
http(s)://{app-address}/?url=https%3A%2F%2Fgithub.com&output=screenshot

```

#### file-type

When `screenshot` is given for the `output` parameter, the output file type can be set with the `file-type` parameter. Default: `jpg`.

It accepts the following values: `pdf`, `png`, `jpg`, `jpeg`, `bmp`, `ppm`.

#### width

When `screenshot` is given for the `output` parameter, `width` sets the screenshot image width.

#### height

When `screenshot` is given for the `output` parameter, `height` sets the screenshot image height. Leave it unset to get full height. The default minimum height is `720` pixels.

###### Example

```
http(s)://{app-address}/?url=https%3A%2F%2Fgithub.com&output=screenshot&file-type=png

```
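The screenshot parameters can be combined in one request. Below is a hedged Python sketch that assembles such a URL and rejects file types outside the list given above; the function name `screenshot_url` and the app address passed to it are assumptions for illustration, and the app's own validation may differ.

```python
from urllib.parse import urlencode

# File types listed in the `file-type` section.
SCREENSHOT_FILE_TYPES = {"pdf", "png", "jpg", "jpeg", "bmp", "ppm"}


def screenshot_url(app_address, target_url, file_type="jpg", width=None, height=None):
    """Build a request URL for a screenshot of target_url."""
    if file_type not in SCREENSHOT_FILE_TYPES:
        raise ValueError(f"unsupported file type: {file_type}")
    params = {"url": target_url, "output": "screenshot", "file-type": file_type}
    # Only send width/height when given; leaving height unset requests full height.
    if width is not None:
        params["width"] = width
    if height is not None:
        params["height"] = height
    return f"{app_address}/?{urlencode(params)}"


# Hypothetical app address, used only for illustration.
print(screenshot_url("https://example-scraper.herokuapp.com", "https://github.com",
                     file_type="png", width=1280))
```

Leaving `height` out of the query follows the note above that an unset height produces a full-height capture.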

#### user-agent

Sets a custom user agent. By default, the user agent of the client accessing the app is used; specify a value with this parameter to override it.

If `random` is given, the user-agent will be randomly assigned.

##### Example

To set the user agent to `Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:57.0) Gecko/20100102 Firefox/57.0`:

```
http(s)://{app-address}/?url=https%3A%2F%2Fwww.whatismybrowser.com%2Fdetect%2Fwhat-http-headers-is-my-browser-sending&user-agent=Mozilla/5.0%20(Windows%20NT%206.1;%20Win64;%20x64;%20rv:57.0)%20Gecko/20100102%20Firefox/57.0

```

```
http(s)://{app-address}/?url=https%3A%2F%2Fwww.whatismybrowser.com%2Fdetect%2Fwhat-http-headers-is-my-browser-sending&user-agent=random

```
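The first example URL above encodes only the spaces in the user agent string and leaves characters such as `/`, `;`, `(` and `)` literal. Browsers generally tolerate this, but encoding every reserved character is the more defensive choice. The Python sketch below shows one way to do that; the helper name `user_agent_param` is hypothetical, and it assumes the app receives standard percent-decoded query values, as PHP provides by default.

```python
from urllib.parse import quote, unquote

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:57.0) "
    "Gecko/20100102 Firefox/57.0"
)


def user_agent_param(user_agent="random"):
    """Return a `user-agent=...` query fragment with reserved characters encoded."""
    # safe="" also encodes '/', which quote() would otherwise leave as-is.
    return "user-agent=" + quote(user_agent, safe="")


encoded = user_agent_param(USER_AGENT)
print(encoded)

# Decoding the value restores the original string exactly.
assert unquote(encoded.split("=", 1)[1]) == USER_AGENT

# The special value `random` needs no encoding.
print(user_agent_param())  # user-agent=random
```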

#### load-images

Decides whether to load images. By default, this is disabled for the `html` and `json` output types and enabled for the `screenshot` output type.

Accepts a boolean value: `true` or `false`, or `1` or `0`.

##### Example

```
http(s)://{app-address}/?url=https%3A%2F%2Fgithub.com&load-images=true

```

#### output-encoding

Sets the character encoding used for the output. Default: `utf8`.

#### cache-lifespan

Sets how long, in seconds, a cached result is retained. All requests are cached for 20 minutes by default. To skip the cached result or renew the cache, pass `0`. Default: `1200`.
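As an illustration, the Python sketch below adds or replaces the `cache-lifespan` parameter on an existing request URL. The helper name `with_cache_lifespan` and the app address are assumptions made for the example, not part of the application.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_cache_lifespan(request_url, seconds):
    """Return request_url with `cache-lifespan` set to the given number of seconds."""
    if seconds < 0:
        raise ValueError("cache lifespan must be zero or positive")
    parts = urlsplit(request_url)
    # Drop any existing cache-lifespan value so the parameter appears only once.
    query = [(key, value) for key, value in parse_qsl(parts.query)
             if key != "cache-lifespan"]
    query.append(("cache-lifespan", str(seconds)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical app address; 0 asks for a fresh, uncached result.
url = "https://example-scraper.herokuapp.com/?url=https%3A%2F%2Fgithub.com"
print(with_cache_lifespan(url, 0))
# https://example-scraper.herokuapp.com/?url=https%3A%2F%2Fgithub.com&cache-lifespan=0
```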

#### headers

Sets custom HTTP headers. Accepts the value as an array.

##### Example

To set the `DNT` header:

```
http(s)://{app-address}/?url=https%3A%2F%2Fwww.whatismybrowser.com%2Fdetect%2Fwhat-http-headers-is-my-browser-sending&headers[DNT]=1

```
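PHP turns bracketed query keys such as `headers[DNT]` into array entries, which is why the value can be passed this way. The Python sketch below generates such fragments from a dictionary; the helper name `header_params` and the `Accept-Language` header are illustrative assumptions, and how the app forwards each header is not covered here.

```python
from urllib.parse import quote


def header_params(headers):
    """Encode a dict of headers as `headers[Name]=value` query fragments."""
    return "&".join(
        # Brackets stay literal, as in the example above; names and values are encoded.
        f"headers[{quote(name, safe='')}]={quote(str(value), safe='')}"
        for name, value in headers.items()
    )


print(header_params({"DNT": 1}))
# headers[DNT]=1
print(header_params({"DNT": 1, "Accept-Language": "en-US,en;q=0.5"}))
```

Append the result to the request URL after the `url` parameter, joined with `&`.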

#### method

The HTTP request method. Default: `GET`. Accepts the following values:

- OPTIONS
- GET
- HEAD
- POST
- PUT
- DELETE
- PATCH

When using `POST`, pass the data to send with the `data` request key. The program reads `$_REQUEST[ 'data' ]` to get the POST data.

##### Example

```
http(s)://{app-address}/?url=http%3A%2F%2Fhttpbin.org%2Fpost&method=POST&data[foo]=bar

```
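A request like the one above can be assembled with a check against the accepted methods. This is a sketch under assumptions: the function name `request_url` and the app address are hypothetical, and the method check mirrors the documented list rather than the app's actual code.

```python
from urllib.parse import urlencode

# Methods listed in the `method` section.
ALLOWED_METHODS = {"OPTIONS", "GET", "HEAD", "POST", "PUT", "DELETE", "PATCH"}


def request_url(app_address, target_url, method="GET", data=None):
    """Build a request URL with an explicit HTTP method and optional data fields."""
    method = method.upper()
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported method: {method}")
    pairs = [("url", target_url), ("method", method)]
    # Each data field becomes a `data[field]` key, as in the example above.
    pairs += [(f"data[{field}]", value) for field, value in (data or {}).items()]
    # urlencode() percent-encodes the brackets (%5B, %5D); PHP decodes key names
    # before building arrays, so this is equivalent to the literal form.
    return f"{app_address}/?{urlencode(pairs)}"


print(request_url("https://example-scraper.herokuapp.com",
                  "http://httpbin.org/post", "POST", {"foo": "bar"}))
```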

Run as Heroku Application
-------------------------

This is a Heroku application, meant to be deployed to a [Heroku](https://dashboard.heroku.com/) application instance.

### Requirements

- Heroku account
- [Heroku CLI](https://devcenter.heroku.com/articles/heroku-command-line)
- Git

### Steps to Deploy

#### a) Quick Deploy

You may simply use the following button to deploy this application:

[![Deploy](https://camo.githubusercontent.com/3da7ea007288c7a8b14c8c84f0836d66ba8f82bd2f3b72fb7e0e72e19a16d28f/68747470733a2f2f7777772e6865726f6b7563646e2e636f6d2f6465706c6f792f627574746f6e2e706e67)](https://heroku.com/deploy)

#### b) Manual Deploy

1. Clone this repository to your local machine. Create a directory and, from there, type the following in a console window.

```
git clone https://github.com/michaeluno/php-simple-web-scraper.git

```

This will download the repository files.

2. Change the working directory to the cloned one.

```
cd php-simple-web-scraper

```

3. Log in to Heroku from the Heroku CLI.

```
heroku login

```

4. Create a new Heroku app.

```
heroku create

```

This prints something like the following, with a random app name. In the example below, `glacial-basin-46381` is the app name.

```
https://glacial-basin-46381.herokuapp.com/ | https://git.heroku.com/glacial-basin-46381.git

```

5. Type the following. Replace `{heroku-app-name}` with the app name given in the previous step.

```
heroku git:remote -a {heroku-app-name}

```

6. Upload the files to Heroku.

```
git push heroku master

```

7. Open the app in your browser.

```
heroku open

```
