
agentsquidflaps/web-scraper
===========================

Scrape website sitemaps for HTML elements

Version 1.0.0 (5y ago) · MIT · PHP >=7.2 · [Source](https://github.com/agentsquidflaps/web-scraper) · [Packagist](https://packagist.org/packages/agentsquidflaps/web-scraper)

Getting started
===============


Install
-------


```
composer require agentsquidflaps/web-scraper
```

Requirements
------------


- PHP 7.2 or greater
- ext-json
- ext-simplexml
- symfony/dom-crawler 4 or greater
- symfony/css-selector 4 or greater
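The platform requirements above can be checked before installing. The following is a minimal sketch; `missingRequirements()` is a hypothetical helper written for illustration and is not part of the package. It covers only the PHP version and the two extensions, since the Symfony components are resolved by Composer.

```php
<?php
// Hypothetical helper (not part of the package): returns the platform
// requirements from the list above that the current runtime does not meet.
function missingRequirements(): array
{
    $missing = [];

    if (version_compare(PHP_VERSION, '7.2.0', '<')) {
        $missing[] = 'PHP >= 7.2';
    }

    foreach (['json', 'simplexml'] as $extension) {
        if (!extension_loaded($extension)) {
            $missing[] = 'ext-' . $extension;
        }
    }

    return $missing;
}

$missing = missingRequirements();
echo $missing === [] ? "All platform requirements met\n" : 'Missing: ' . implode(', ', $missing) . "\n";
```

Composer performs the same checks during installation, so a script like this is mainly useful for diagnosing a deployment target ahead of time.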

### Documentation


See below for basic usage.

Usage
-----


Basic usage...

```
// Assumes the package's WebScraper class has been imported with a `use` statement.
$data = (new WebScraper())
    ->setSitemaps([
        'https://www.yoursite.com/sitemap.xml',
    ])
    ->getData();
```

...this returns the HTML of every page listed in your sitemap, in JSON format.
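To illustrate what reading a sitemap involves, here is a minimal sketch that extracts page URLs from sitemap XML using SimpleXML (one of the listed requirements). `sitemapUrls()` is a hypothetical helper, not the package's API, and the package's own parsing may differ.

```php
<?php
// Hypothetical helper (not part of the package): returns the <loc> value
// of every <url> entry in a sitemap document.
function sitemapUrls(string $xml): array
{
    // Collect parse errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $sitemap = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($sitemap === false) {
        return [];
    }

    $urls = [];
    foreach ($sitemap->url as $url) {
        $urls[] = trim((string) $url->loc);
    }

    return $urls;
}

$example = <<<'XML'
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.yoursite.com/</loc></url>
  <url><loc>https://www.yoursite.com/about</loc></url>
</urlset>
XML;

print_r(sitemapUrls($example));
```

Each URL found this way would then be fetched and its HTML collected.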

You can also target specific elements on a page...

```
$data = (new WebScraper())
    ->setSitemaps([
        'https://www.yoursite.com/sitemap.xml',
    ])
    ->setElements([
        '.btn',
        'table',
    ])
    ->getData();
```

...and instead of returning the whole page, it returns only the elements on each page that match the selectors provided.
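The idea of element matching can be sketched without any Composer dependencies. The package depends on symfony/dom-crawler and symfony/css-selector, which presumably handle CSS selectors such as `.btn`; the sketch below swaps in PHP's built-in DOM extension (assumed to be installed) and uses hand-written XPath equivalents of `.btn` and `table`. `matchElements()` is a hypothetical helper, not the package's API.

```php
<?php
// Hypothetical helper (not part of the package): returns the outer HTML of
// every node matched by the given XPath expressions.
function matchElements(string $html, array $expressions): array
{
    $document = new DOMDocument();

    // Real-world HTML is rarely perfect, so keep libxml warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);
    $matches = [];
    foreach ($expressions as $expression) {
        foreach ($xpath->query($expression) as $node) {
            $matches[] = $document->saveHTML($node);
        }
    }

    return $matches;
}

$html = '<html><body><a class="btn primary" href="/buy">Buy</a><p>Text</p>'
      . '<table><tr><td>1</td></tr></table></body></html>';

$matches = matchElements($html, [
    // XPath equivalent of the CSS selector ".btn"
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' btn ')]",
    // XPath equivalent of the CSS selector "table"
    '//table',
]);

print_r($matches);
```

A CSS-selector library saves you from writing these XPath expressions by hand, which is likely why the package requires one.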

### Save file


You can also save the data to a file. To do so just...

```
(new WebScraper())
    ->setSitemaps([
        'https://www.yoursite.com/sitemap.xml',
    ])
    ->setFileLocation('somewhere.json')
    ->saveData();
```

### Formats


You can also output the data in different formats. Supported formats are currently JSON, Array and CSV.

```
(new WebScraper())
    ->setSitemaps([
        'https://www.yoursite.com/sitemap.xml',
    ])
    ->setFileLocation('somewhere.csv')
    ->setFormat(WebScraper::FORMAT_CSV)
    ->saveData();
```
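To show how the JSON and CSV formats differ for the same data, here is a minimal sketch using only built-in PHP functions. The row layout (a URL followed by matched HTML) is an assumption made for illustration; the structure the package actually emits is not documented here, and `toJson()` and `toCsv()` are hypothetical helpers.

```php
<?php
// Hypothetical helper (not part of the package): encodes rows as JSON.
function toJson(array $rows): string
{
    $json = json_encode($rows, JSON_UNESCAPED_SLASHES);
    if ($json === false) {
        throw new RuntimeException(json_last_error_msg());
    }

    return $json;
}

// Hypothetical helper (not part of the package): encodes rows as CSV by
// writing to an in-memory stream and reading the result back.
function toCsv(array $rows): string
{
    $handle = fopen('php://temp', 'r+');
    foreach ($rows as $row) {
        // Delimiter, enclosure and escape are passed explicitly.
        fputcsv($handle, $row, ',', '"', '\\');
    }
    rewind($handle);
    $csv = stream_get_contents($handle);
    fclose($handle);

    return $csv;
}

// Assumed row layout: [page URL, matched HTML].
$rows = [
    ['https://www.yoursite.com/', '<table></table>'],
];

echo toJson($rows), "\n";
echo toCsv($rows);
```

Writing either string to disk with `file_put_contents()` would give the kind of file that `saveData()` is described as producing.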

### Disabling verify peer


Peer verification is highly recommended, but you can disable it when fetching the URLs to scrape. This can be useful if the URLs in the sitemap have invalid or missing SSL certificates.

```
(new WebScraper())
    ->setSitemaps([
        'http://www.yoursite.com/sitemap.xml',
    ])
    ->setFileLocation('somewhere.csv')
    ->setFormat(WebScraper::FORMAT_CSV)
    ->setVerifyPeerEnabled(false)
    ->saveData();
```
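For context, this is roughly what disabling peer verification means at the PHP level. The sketch below builds a stream context with the standard `ssl` options; how the package implements `setVerifyPeerEnabled()` internally is not shown in this README, so treat this as an assumption. `sslContext()` is a hypothetical helper.

```php
<?php
// Hypothetical helper (not part of the package): builds a stream context
// that enables or disables TLS certificate verification.
function sslContext(bool $verifyPeer)
{
    return stream_context_create([
        'ssl' => [
            'verify_peer' => $verifyPeer,
            'verify_peer_name' => $verifyPeer,
        ],
    ]);
}

$context = sslContext(false);
print_r(stream_context_get_options($context));

// The context would then be passed to a fetch call, for example:
// $html = file_get_contents('https://www.yoursite.com/', false, $context);
```

With verification off, the connection is still encrypted but the server's identity is not checked, so it is best limited to hosts you control.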


### Community

Maintainer: [Timothy Norris](/maintainers/agentsquidflaps) ([@agentsquidflaps](https://github.com/agentsquidflaps))

Tags: php, scraper, sitemaps

Tests: PHPUnit

