agentsquidflaps/web-scraper
===========================

Scrape website sitemaps for HTML elements

Version 1.0.0 · MIT license · Requires PHP >=7.2

[Source](https://github.com/agentsquidflaps/web-scraper) · [Packagist](https://packagist.org/packages/agentsquidflaps/web-scraper)

Getting started
===============

Install
-------

```
composer require agentsquidflaps/web-scraper
```

Requirements
------------

- PHP 7.2 or greater
- ext-json
- ext-simplexml
- symfony/dom-crawler 4 or greater
- symfony/css-selector 4 or greater
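If you want to confirm an environment meets these requirements before installing, a quick preflight check can help. The sketch below is hypothetical (the function name `missingRequirements` is illustrative, not part of the package); it checks the PHP version and extensions, and only checks that the Symfony classes are present via Composer's autoloader, not that they are version 4 or greater.

```php
<?php
// Hypothetical preflight check based on the requirements list; not part of the package.
// Symfony checks test presence only, not the ">= 4" version constraint.
function missingRequirements(): array
{
    $missing = [];
    if (version_compare(PHP_VERSION, '7.2.0', '<')) {
        $missing[] = 'PHP >= 7.2';
    }
    foreach (['json', 'simplexml'] as $extension) {
        if (!extension_loaded($extension)) {
            $missing[] = "ext-$extension";
        }
    }
    // These classes are only visible once Composer's autoloader has been required.
    $classes = [
        'symfony/dom-crawler' => 'Symfony\Component\DomCrawler\Crawler',
        'symfony/css-selector' => 'Symfony\Component\CssSelector\CssSelectorConverter',
    ];
    foreach ($classes as $package => $class) {
        if (!class_exists($class)) {
            $missing[] = $package;
        }
    }
    return $missing;
}

print_r(missingRequirements());
```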

### Documentation

Please see below for basic usage, or see the project documentation for more information.

Usage
-----

Basic usage...

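```php
<?php
require __DIR__ . '/vendor/autoload.php'; // Composer autoloader
// Also import WebScraper from the package's namespace (see the package source).
(new WebScraper())->setSitemaps([
    'https://www.yoursite.com/sitemap.xml'
])->getData();
```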

...this outputs the HTML of every page listed in your sitemap, in JSON format.
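How the package reads sitemaps internally isn't described here. As a rough illustration of what a sitemap-driven scraper has to consume, the sketch below uses SimpleXML (one of the listed requirements) to pull page URLs out of a sitemap. It assumes the standard sitemaps.org `<urlset>`/`<url>`/`<loc>` layout, and `extractSitemapUrls` is a hypothetical helper, not part of the package's API.

```php
<?php
// Hypothetical helper, not part of agentsquidflaps/web-scraper.
// Assumes a sitemaps.org-style document: <urlset><url><loc>...</loc></url></urlset>.
function extractSitemapUrls(string $xml): array
{
    $sitemap = simplexml_load_string($xml);
    if ($sitemap === false) {
        throw new InvalidArgumentException('Sitemap is not valid XML');
    }

    $urls = [];
    foreach ($sitemap->url as $url) {
        // Each <url> entry carries the page address in its <loc> child.
        $urls[] = trim((string) $url->loc);
    }
    return $urls;
}

$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.yoursite.com/</loc></url>
  <url><loc>https://www.yoursite.com/about</loc></url>
</urlset>
XML;

print_r(extractSitemapUrls($xml));
```

Each URL found this way would then be fetched and its HTML collected.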

You can also target specific elements on a page...

```php
(new WebScraper())->setSitemaps([
    'https://www.yoursite.com/sitemap.xml'
])
    ->setElements([
        '.btn',
        'table'
    ])
    ->getData();
```

...and instead of the whole page, it returns only the elements on each page that match the selectors provided.
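The package lists symfony/dom-crawler and symfony/css-selector as dependencies, which accept CSS selectors such as `.btn` directly. To show the idea without those dependencies, the sketch below swaps in PHP's built-in DOM extension with hand-written XPath equivalents of `.btn` and `table`. The helper `extractElements` is hypothetical, and the output shape (a flat list of outer-HTML strings) is an assumption, not the package's documented format.

```php
<?php
// Illustrative only: uses ext-dom + XPath instead of the package's Symfony CSS selectors.
function extractElements(string $html, array $xpaths): array
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings (e.g. for HTML5 tags) instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $matches = [];
    foreach ($xpaths as $query) {
        foreach ($xpath->query($query) as $node) {
            // saveHTML($node) serialises just that element, including its own tag.
            $matches[] = $doc->saveHTML($node);
        }
    }
    return $matches;
}

$html = '<html><body><a class="btn primary" href="/go">Go</a><p>Text</p>'
      . '<table><tr><td>1</td></tr></table></body></html>';

$selectors = [
    // CSS '.btn': any element whose class list contains the token "btn".
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' btn ')]",
    // CSS 'table': every <table> element.
    '//table',
];

print_r(extractElements($html, $selectors));
```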

### Save file

You can also save the data to a file. To do so, set a file location and call `saveData()`...

```php
(new WebScraper())->setSitemaps([
    'https://www.yoursite.com/sitemap.xml'
])
    ->setFileLocation('somewhere.json')
    ->saveData();
```
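Conceptually, saving as JSON amounts to encoding the scraped data and writing it to the configured path. The sketch below assumes a hypothetical data shape (page URL mapped to a list of HTML fragments); the structure the package actually writes may differ, and `saveAsJson` is an illustrative name only.

```php
<?php
// Hypothetical data shape: page URL => list of scraped HTML fragments.
// The package's real output structure may differ.
function saveAsJson(array $data, string $path): void
{
    $json = json_encode($data, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES);
    if ($json === false) {
        throw new RuntimeException('JSON encoding failed: ' . json_last_error_msg());
    }
    if (file_put_contents($path, $json) === false) {
        throw new RuntimeException("Could not write to $path");
    }
}

$data = [
    'https://www.yoursite.com/' => ['<a class="btn">Go</a>'],
];
$path = sys_get_temp_dir() . '/somewhere.json';
saveAsJson($data, $path);
echo file_get_contents($path), PHP_EOL;
```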

### Formats

You can also output the data in different formats. Supported formats are currently JSON, Array and CSV.

```php
(new WebScraper())->setSitemaps([
    'https://www.yoursite.com/sitemap.xml'
])
    ->setFileLocation('somewhere.csv')
    ->setFormat(WebScraper::FORMAT_CSV)
    ->saveData();
```
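Because HTML fragments often contain commas and quotes, CSV output needs proper quoting. The sketch below flattens the same hypothetical URL-to-fragments shape into one row per fragment using `fputcsv`, which handles the quoting. The `url`/`html` column layout and the `saveAsCsv` name are assumptions for illustration; the package's actual CSV columns are not documented here.

```php
<?php
// Hypothetical flattening: one CSV row per (page URL, HTML fragment) pair.
function saveAsCsv(array $data, string $path): void
{
    $handle = fopen($path, 'w');
    if ($handle === false) {
        throw new RuntimeException("Could not open $path for writing");
    }
    // Separator, enclosure and escape are passed explicitly to avoid relying on defaults.
    fputcsv($handle, ['url', 'html'], ',', '"', '\\'); // header row
    foreach ($data as $url => $fragments) {
        foreach ($fragments as $fragment) {
            // fputcsv encloses fields containing commas or quotes and doubles inner quotes.
            fputcsv($handle, [$url, $fragment], ',', '"', '\\');
        }
    }
    fclose($handle);
}

$data = [
    'https://www.yoursite.com/' => [
        '<a class="btn">Go</a>',
        '<table><tr><td>1, 2</td></tr></table>',
    ],
];
$path = sys_get_temp_dir() . '/somewhere.csv';
saveAsCsv($data, $path);
echo file_get_contents($path);
```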

### Disabling verify peer

Peer verification can be turned off when fetching the URLs to scrape (although leaving it on is highly recommended). This can be useful if the URLs in the sitemap are served with invalid or missing SSL certificates.

```php
(new WebScraper())->setSitemaps([
    'http://www.yoursite.com/sitemap.xml'
])
    ->setFileLocation('somewhere.csv')
    ->setFormat(WebScraper::FORMAT_CSV)
    ->setVerifyPeerEnabled(false)
    ->saveData();
```
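For context, one common way to relax TLS checks in plain PHP is a stream context with the `ssl` options `verify_peer` and `verify_peer_name`. Whether the package implements `setVerifyPeerEnabled(false)` this way is an assumption; the helper `createScrapeContext` below is hypothetical and only shows what such a toggle typically controls.

```php
<?php
// Illustrative only: a stream context whose SSL options mirror a "verify peer" toggle.
// The package may implement setVerifyPeerEnabled() differently.
function createScrapeContext(bool $verifyPeer)
{
    return stream_context_create([
        'ssl' => [
            // Check the server certificate against trusted certificate authorities...
            'verify_peer' => $verifyPeer,
            // ...and check that the certificate matches the requested host name.
            'verify_peer_name' => $verifyPeer,
        ],
    ]);
}

$context = createScrapeContext(false);
// The context would then be passed to a fetch, for example:
// $html = file_get_contents('https://www.yoursite.com/', false, $context);
var_dump(stream_context_get_options($context)['ssl']);
```

Disabling both checks means a man-in-the-middle could serve arbitrary content, which is why keeping verification on is the safer default.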

### Community

Maintainer: Timothy Norris ([@agentsquidflaps](https://github.com/agentsquidflaps))

### Alternatives

- [vdb/php-spider](/packages/vdb-php-spider): A configurable and extensible PHP web spider
- [dusterio/link-preview](/packages/dusterio-link-preview): Link preview generation for PHP with Laravel support
- [crwlr/crawler](/packages/crwlr-crawler): Web crawling and scraping library.
- [jansenfelipe/cnpj-gratis](/packages/jansenfelipe-cnpj-gratis): With this package you can look up CNPJs free of charge, directly on the Receita Federal website.
- [godbout/dash-docset-builder](/packages/godbout-dash-docset-builder): Dash (LOVE) Docset Builder in PHP (LOVE).
- [jansenfelipe/cep-gratis](/packages/jansenfelipe-cep-gratis): With this package you can look up CEPs (postal codes) free of charge.
