PHPackages

Active · Library · [HTTP & Networking](/categories/http)

parseword/massfetcher
=====================

A multithreaded utility to retrieve a page or file from numerous websites

v1.0.1 (6y ago) · 15 stars · Apache-2.0 · PHP >= 7.1

Since Feb 1 · Pushed 6y ago · 2 watchers

[Source](https://github.com/parseword/massfetcher) · [Packagist](https://packagist.org/packages/parseword/massfetcher) · [Docs](https://github.com/parseword/massfetcher/) · [RSS](/packages/parseword-massfetcher/feed) · Synced 3d ago


MassFetcher
===========


MassFetcher is a multithreaded HTTP GET request utility. Give it a path to request, and a giant list of domains to request it from. Retrieved files are saved to disk (subject to configuration parameters). You may find MassFetcher useful if you want to perform various types of web analysis:

- Gauge the average size of web index pages
- Determine the popularity of specific code libraries, meta tags, etc.
- Inspect lots of `ads.txt` files looking for new ad networks to block
- Find out how quickly (or not) a proposal like `/.well-known/security.txt` is being implemented

MassFetcher will go get the data; doing something with it is up to you.
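As a sketch of that "doing something with it" step, here is one way to post-process fetched `ads.txt` contents in PHP, tallying which exchange domains appear. The function name and parsing details are illustrative, not part of MassFetcher; the line format follows the IAB ads.txt convention of `<exchange domain>, <account id>, <DIRECT|RESELLER>[, <cert authority id>]`.

```php
<?php
// Illustrative post-processing (not part of MassFetcher): tally the exchange
// domains declared in the contents of a fetched ads.txt file.
function exchangesInAdsTxt(string $contents): array
{
    $tally = [];
    foreach (preg_split('/\R/', $contents) as $line) {
        $line = trim(preg_replace('/#.*$/', '', $line)); // strip comments
        if ($line === '' || strpos($line, ',') === false) {
            continue; // skip blank lines and variables like CONTACT=...
        }
        $exchange = strtolower(trim(explode(',', $line)[0]));
        $tally[$exchange] = ($tally[$exchange] ?? 0) + 1;
    }
    return $tally;
}
```

Running this over every file under the output directory and merging the tallies would rank exchanges by prevalence across the crawled sites.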

Requirements
------------


- PHP >= 7.1, with
- The `pthreads` extension, either compiled-in or enabled as a module, and
- The `curl` extension, either compiled-in or enabled as a module
- Composer

Installation
------------


Clone this repository to a new directory and then run `composer install`. This will pull in the dependency (a logger) and set up the autoloader.

Copy `config.php-dist` to `config.php`.

Usage
-----


Configure your settings inside `config.php`. Here you can set the target URI path you want to request, along with a bunch of options to modify MassFetcher's behavior. The options are explained in the comments.
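For illustration only, a filled-in `config.php` might end up looking something like this. The option names below are hypothetical; the real ones, with their meanings, are documented in the comments of `config.php-dist`.

```php
<?php
// Hypothetical settings -- consult config.php-dist for the real option names.
$config = [
    'target_path'  => '/ads.txt', // URI path to request from every host
    'thread_count' => 64,         // number of concurrent fetcher threads
    'output_dir'   => 'data',     // where retrieved files are written
    'strict_match' => true,       // only save when the final URI basename matches
];
```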

Supply your list of target hosts in a file called `domains.txt`. The [Alexa Top 1M list](http://s3.amazonaws.com/alexa-static/top-1m.csv.zip) may come in handy, but do some small test runs first!

Run `php fetcher.php` to execute MassFetcher.

Retrieved files will be saved to a directory (defaults to `data`) in a series of hierarchical subdirectories.
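A hierarchical layout typically shards hosts by their leading characters so that no single directory accumulates hundreds of thousands of files. The scheme below is an assumption for illustration; MassFetcher's actual layout may differ.

```php
<?php
// Map a hostname to a sharded output path (illustrative scheme only),
// e.g. example.com -> data/e/x/example.com/ads.txt
function outputPathFor(string $host, string $file, string $baseDir = 'data'): string
{
    $h = strtolower($host);
    return sprintf('%s/%s/%s/%s/%s', $baseDir, $h[0], $h[1], $h, ltrim($file, '/'));
}
```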

The repository ships with a sample `domains.txt` containing 100 hostnames, a config that will request `/ads.txt` from all of them, and the logger set to debug level. You should probably run once using these defaults, then examine the `output.log` file to see what's going on under the hood.

Resources and Performance
-------------------------


Performance will vary depending upon your hardware, internet connection, and configuration settings. Broadly speaking, with 64 threads I've averaged around 1,000 requests per minute from various commodity cloud instances.

MassFetcher may use significantly more bandwidth and disk space than you expect. Due to error pages, redirects, and oddly-configured servers, you're going to get plenty of junk data.

For instance, suppose you request `/ads.txt`:

- telegram.org replies with "200 OK" but sends their index page instead.
- booking.com properly sends a 404 response, but it weighs in at a hefty 300KB.
- whatsapp.com redirects to its 600KB index page.

Some of MassFetcher's settings can help mitigate junk data. In particular, the strict filename matching option will only write a fetched file to disk if the final destination URI, after all redirects, has the same base filename that you requested.
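That check can be pictured as comparing basenames once redirects have settled, using the final URL that curl reports (in practice available via `curl_getinfo($ch, CURLINFO_EFFECTIVE_URL)`). The function below is a sketch with an illustrative name, not MassFetcher's actual code.

```php
<?php
// Strict filename matching: only keep the response if the basename of the
// post-redirect URL equals the basename of the path originally requested.
function passesStrictMatch(string $requestedPath, string $effectiveUrl): bool
{
    $wanted = basename((string) parse_url($requestedPath, PHP_URL_PATH));
    $got    = basename((string) parse_url($effectiveUrl, PHP_URL_PATH));
    return $wanted !== '' && $wanted === $got;
}
```

Under this rule, whatsapp.com's redirect to its index page would be discarded, while a host that actually serves `/ads.txt` (even after a redirect to, say, a CDN URL ending in `ads.txt`) would be kept.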

You should do some small test runs whenever you change configuration, before launching into an enormous fetch job.

### Health Score

**25** (Low): better than 37% of packages

- Maintenance: 20 (infrequent updates; may be unmaintained)
- Popularity: 5 (limited adoption so far)
- Community: 8 (small or concentrated contributor base)
- Maturity: 56 (maturing project, gaining track record)
- Bus Factor: 1 (top contributor holds 100% of commits, a single point of failure)

How is this calculated?

**Maintenance (25%)** — Last commit recency, latest release date, and issue-to-star ratio. Uses a 2-year decay window.

**Popularity (30%)** — Total and monthly downloads, GitHub stars, and forks. Logarithmic scaling prevents top-heavy scores.

**Community (15%)** — Contributors, dependents, forks, watchers, and maintainers. Measures real ecosystem engagement.

**Maturity (30%)** — Project age, version count, PHP version support, and release stability.
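Applying those four weights to the sub-scores shown above reproduces the overall score. A sketch (the site's exact formula may include normalization not shown here):

```php
<?php
// Weighted overall health score from the four component sub-scores.
// Weights are the percentages stated above, kept as integers to avoid FP drift.
$weights = ['maintenance' => 25, 'popularity' => 30, 'community' => 15, 'maturity' => 30];
$scores  = ['maintenance' => 20, 'popularity' => 5,  'community' => 8,  'maturity' => 56];

$total = 0;
foreach ($weights as $component => $weight) {
    $total += $weight * $scores[$component];
}
$overall = $total / 100; // (500 + 150 + 120 + 1680) / 100 = 24.5
echo round($overall);    // displays as 25
```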

### Release Activity

- Cadence: every ~144 days
- Total releases: 2
- Last release: 2515d ago

### Community

Maintainers: [parseword](/maintainers/parseword)

Top contributors: [parseword](https://github.com/parseword) (5 commits)

Tags: `http` · `http-requests` · `multithreaded` · `php7` · `pthreads` · `spider` · `phpspider`

### Embed Badge

![Health badge](/badges/parseword-massfetcher/health.svg)

```
[![Health](https://phpackages.com/badges/parseword-massfetcher/health.svg)](https://phpackages.com/packages/parseword-massfetcher)
```

### Alternatives

- [khr/php-mcurl-client](/packages/khr-php-mcurl-client): a wrapper around PHP's multi-curl HTTP client for PHP 5.3+, enabling parallel requests and asynchronous code
- [hannesvdvreken/guzzle-debugbar](/packages/hannesvdvreken-guzzle-debugbar): a Guzzle middleware that logs requests to debugbar's timeline

PHPackages © 2026

