jbsommeling/scannr
==================

A Laravel package that crawls websites to detect broken links, redirect chains, HTTPS downgrades, and more. Includes JavaScript rendering support for SPAs.

v1.0.6 (1mo ago) · MIT · PHP ^8.4 · CI passing · since Mar 17

[Source](https://github.com/JBSommeling/scannr) · [Packagist](https://packagist.org/packages/jbsommeling/scannr)

[![Latest Stable Version](https://camo.githubusercontent.com/7b3b767fced2e43e0d90f22044230b5e32d17a13f232b5e0a937768143611797/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f762f6a62736f6d6d656c696e672f7363616e6e722e737667)](https://packagist.org/packages/jbsommeling/scannr)[![Total Downloads](https://camo.githubusercontent.com/bcfaafa281400acd4a5b8e6e84352675a808c33142627e05cb6c5c220c4f0cae/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f64742f6a62736f6d6d656c696e672f7363616e6e722e737667)](https://packagist.org/packages/jbsommeling/scannr)[![Tests](https://camo.githubusercontent.com/aad598b5595b31804f5cea6b1a3a856ba88a5057d7a8e2376a7f6a3deaec5403/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f616374696f6e732f776f726b666c6f772f7374617475732f4a42536f6d6d656c696e672f7363616e6e722f74657374732e796d6c3f6c6162656c3d7465737473)](https://github.com/JBSommeling/scannr/actions)[![License](https://camo.githubusercontent.com/7784bc1a9f9840447403556bcc9a0c7719123f3e40f088c1024b8e1d812392dc/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f6c2f6a62736f6d6d656c696e672f7363616e6e722e737667)](https://packagist.org/packages/jbsommeling/scannr)[![PHP Version Require](https://camo.githubusercontent.com/31dca127e256a2fa8b97bf62b6849bc16a5a02c32f386cf189aca40fe152c310/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f646570656e64656e63792d762f6a62736f6d6d656c696e672f7363616e6e722f706870)](https://packagist.org/packages/jbsommeling/scannr)[![Code Size](https://camo.githubusercontent.com/dbb127773a6c27b5aa5f91b433eac7f368ea69d86bb1e842f18b818e68de2407/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f6c616e6775616765732f636f64652d73697a652f4a42536f6d6d656c696e672f7363616e6e72)](https://github.com/JBSommeling/scannr)[![GitHub 
stars](https://camo.githubusercontent.com/efa165d77b67af4567bc0c7bfb8975cb562c3da892524c0bf1c3b0dd8f353b3f/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f73746172732f4a42536f6d6d656c696e672f7363616e6e722e737667)](https://github.com/JBSommeling/scannr/stargazers)

Scannr
======

A Laravel package that crawls websites to detect broken links, redirect chains, HTTPS downgrades, and more. Includes JavaScript rendering support for SPAs. Use it as a **dev dependency** in your Laravel project or as a **Docker-based GitHub Action** in your CI/CD pipeline.

Features
--------

- **BFS Crawling** — Systematically crawls websites level by level
- **Broken Link Detection** — Identifies links returning 4xx/5xx status codes
- **Multi-Element Scanning** — Extracts URLs from `<a>`, `<link>`, `<script>`, `<img>`, media elements, download triggers, and inline JS handlers
- **JavaScript Rendering** — Scan SPAs (React, Vue, Angular) with headless browser support via `--js` flag
- **Smart JS Detection** — Automatically detect SPAs and enable JS rendering only when needed via `--smart-js` flag
- **Sitemap Integration** — Discover URLs from XML, HTML, or plain text sitemaps (including sitemap index files)
- **robots.txt Support** — Respects `Disallow` rules and `Crawl-delay` directives
- **Redirect Chain Tracking** — Follows and reports redirect chains, including loop detection
- **HTTPS Downgrade Detection** — Warns when redirects downgrade from HTTPS to HTTP
- **Link Flag System** — Every link is tagged with observation flags (discovery method, platform traits, technical anomalies, validation outcome)
- **Integrity Scoring** — Weight-based scoring model that rates your site's link health (A–F)
- **Quality Gates** — Fail CI/CD when critical issues are found or the integrity score drops below a configurable threshold
- **Rate Limit Backoff** — Automatic exponential backoff on HTTP 429 responses
- **Noise URL Filtering** — Hides XML namespaces, CDN preconnect hints, and JS framework error docs
- **Multiple Output Formats** — Table, JSON, or CSV output
- **Domain Validation** — Warns when the scanned URL doesn't match your `APP_URL`
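
The level-by-level crawl can be pictured with a small breadth-first sketch. It is written in Python for illustration and is not Scannr's PHP implementation. The `bfs_crawl` function and the in-memory `site` map (standing in for fetched pages) are assumptions made for the example. The two limits mirror the `--depth` and `--max` options described under CLI Usage.

```python
from collections import deque

def bfs_crawl(start, links, max_depth=3, max_urls=300):
    """Visit URLs breadth-first, level by level.

    `links` maps a URL to the URLs found on that page; it replaces real
    HTTP fetching so the sketch stays self-contained.
    """
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue and len(order) < max_urls:
        url, depth = queue.popleft()
        order.append((url, depth))
        if depth == max_depth:
            continue  # pages at the depth limit are recorded but not expanded
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order

# Hypothetical site structure used only for this example.
site = {
    "/": ["/about", "/blog"],
    "/about": ["/team", "/"],
    "/blog": ["/blog/post-1"],
    "/team": ["/careers"],
}

for url, depth in bfs_crawl("/", site, max_depth=2):
    print(depth, url)
```

With `max_depth=2`, `/careers` is never visited because it is only reachable from a page that already sits at the depth limit.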

Responsible Use
---------------

Scannr is intended for scanning websites you own or have explicit permission to test.

Users are responsible for ensuring their usage complies with:

- Website terms of service
- Applicable laws and regulations
- robots.txt directives

Do not use Scannr to:

- Crawl websites without permission
- Overload servers or bypass rate limits
- Access protected or private content

The authors are not responsible for misuse of this tool.

---

Installation
------------

### As a Laravel Dev Dependency

```
composer require jbsommeling/scannr
```

The package auto-discovers its service provider. To publish the configuration:

```
php artisan vendor:publish --tag=scannr-config
```

To publish the migration (for `--queue` support):

```
php artisan vendor:publish --tag=scannr-migrations
php artisan migrate
```

### JavaScript Rendering (Optional)

For `--js` and `--smart-js` support, install Puppeteer:

```
npm install puppeteer
```

---

CLI Usage
---------

### Basic Scan

```
php artisan site:scan https://example.com
```

### Command Signature

```
php artisan site:scan {url} [options]

```

### Options

| Option | Default | Description |
| --- | --- | --- |
| `--depth=N` | 3 | Maximum crawl depth from the starting URL |
| `--max=N` | 300 | Maximum number of URLs to scan |
| `--timeout=N` | 5 | Request timeout in seconds |
| `--format=FORMAT` | table | Output format: `table`, `json`, or `csv` |
| `--status=FILTER` | all | Filter results: `all`, `ok`, or `broken` |
| `--filter=TYPE` | all | Filter by element type: `all`, `a`, `link`, `script`, `img`, `media`, `form` |
| `--scan-elements=TYPES` | all | Element types to scan: `all`, or comma-separated (e.g., `a,img`) |
| `--sitemap` | false | Use sitemap.xml to discover URLs before crawling |
| `--js` | false | Enable JavaScript rendering for SPA/React sites |
| `--smart-js` | false | Auto-detect SPAs and enable JS rendering when needed |
| `--no-robots` | false | Ignore robots.txt rules |
| `--advanced` | false | Show noise URLs (XML namespaces, CDN hints, etc.) |
| `--strip-params=PARAMS` | (none) | Additional tracking parameters to strip (comma-separated) |
| `--delay-min=N` | config | Minimum delay between requests in milliseconds |
| `--delay-max=N` | config | Maximum delay between requests in milliseconds |
| `--queue` | false | Dispatch scan as a background job |
| `--fail-on-critical` | false | Fail with exit code 1 if critical issues are found |
| `--min-rating=GRADE` | none | Minimum acceptable rating: `excellent`, `good`, `needs_attention`, `none` |

### Examples

```
# Show only broken links
php artisan site:scan https://example.com --status=broken

# Deep scan with sitemap discovery
php artisan site:scan https://example.com --depth=5 --max=1000 --sitemap

# Scan a React SPA with JSON output
php artisan site:scan https://my-spa.com --js --format=json

# Only scan images and anchors
php artisan site:scan https://example.com --scan-elements=a,img --status=broken

# Throttle requests (200–500ms random delay between each request)
php artisan site:scan https://example.com --delay-min=200 --delay-max=500

# Fail if any critical issues found or rating drops below "Good"
php artisan site:scan https://example.com --fail-on-critical --min-rating=good

# Strict quality gate: require "Excellent" rating
php artisan site:scan https://example.com --fail-on-critical --min-rating=excellent
```
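
The throttling example above asks for a random pause between `--delay-min` and `--delay-max`, and the feature list mentions exponential backoff on HTTP 429. The Python sketch below shows one common way to compute both delays. It is not taken from Scannr's source: the function names, the 1000 ms base, and the 60000 ms cap are assumptions chosen for illustration.

```python
import random

def request_delay_ms(delay_min, delay_max):
    """Pick a random pause between two requests, in milliseconds."""
    return random.uniform(delay_min, delay_max)

def backoff_delay_ms(attempt, base_ms=1000, cap_ms=60000):
    """Delay after the n-th consecutive HTTP 429 response (attempt starts at 1).

    The delay doubles on every attempt and is capped at `cap_ms`.
    Both default values are assumptions of this sketch.
    """
    return min(cap_ms, base_ms * 2 ** (attempt - 1))

print(request_delay_ms(200, 500))                   # some value in [200, 500]
print([backoff_delay_ms(n) for n in range(1, 5)])   # [1000, 2000, 4000, 8000]
```

Capping the backoff keeps a long run of 429 responses from stalling the scan indefinitely.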

### Example Output

#### Table (default)

```
Site Scan: https://www.example.com
========================================

  Robots.txt Crawl-delay: 1s (using 1000ms-1000ms delay)
  Robots.txt: respecting 4 Disallow/Allow rule(s)
 11/30 [▓▓▓▓▓▓▓▓▓▓░░░░░░░░░░░░░░░░░░]  36% 30/30 [▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓] 100%

  🟢 Site Integrity Score: 100.0 / 100  —  Excellent

    Link Integrity:        100.0 / 100
    Security Hygiene:      100.0 / 100
    Technical Hygiene:     100.0 / 100
    Redirect Health:       100.0 / 100
    Link Verifiability:    100.0 / 100

  Critical Issues:       0
  Broken Links:          0
  Warnings:              0
  Manual Verification:   0

Summary:
  Total scanned:     11
  Pages scanned:     2
  Internal links:    1
  Assets scanned:    9
  External links:    1
  Working (2xx):     11
  Redirects:         0
  Broken:            0
  Timeouts:          0

+----------------------------------------------+--------------------------+----------+--------+----------+
| URL                                          | Source                   | Element  | Status | Type     |
+----------------------------------------------+--------------------------+----------+--------+----------+
| https://www.example.com                      | start                    | <a>      | 200    | internal |
| https://example.com/favicon.svg              | https://www.example.com  | <link>   | 200    | internal |
| https://example.com/favicon.ico              | https://www.example.com  | <link>   | 200    | internal |
| https://example.com/assets/app-Dk29a1xC.css  | https://www.example.com  | <link>   | 200    | internal |
| https://example.com/assets/app-Bx7L3f2q.js   | https://www.example.com  | <script> | 200    | internal |
| https://fonts.googleapis.com/css2?family=... | https://www.example.com  | <link>   | 200    | external |
+----------------------------------------------+--------------------------+----------+--------+----------+

```

#### JSON (`--format=json`)

```
{
    "summary": {
        "totalScanned": 8,
        "ok": 8,
        "redirects": 0,
        "broken": 0,
        "timeouts": 0,
        "redirectChainCount": 0,
        "totalRedirectHops": 0,
        "httpsDowngrades": 0,
        "criticalCount": 0,
        "warningCount": 0,
        "lowConfidenceCount": 0,
        "pagesScanned": 2,
        "internalLinks": 1,
        "assetsScanned": 6,
        "externalLinks": 1
    },
    "integrityScore": {
        "overallScore": 100,
        "grade": "Excellent",
        "components": {
            "linkIntegrity": 100,
            "securityHygiene": 100,
            "technicalHygiene": 100,
            "redirectHealth": 100,
            "linkVerifiability": 100
        }
    },
    "links": [
        {
            "url": "https://www.example.com",
            "sourcePage": "start",
            "status": "200",
            "type": "internal",
            "sourceElement": "a",
            "analysis": {
                "flags": [],
                "severity": "info",
                "confidence": "high",
                "verification": "none"
            },
            "redirect": {
                "chain": [],
                "isLoop": false,
                "hasHttpsDowngrade": false
            },
            "network": {
                "retryAfter": null
            }
        }
    ],
    "brokenLinks": []
}
```
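
For downstream processing, a report with this shape can be read with any JSON parser. The Python sketch below uses a trimmed copy of the example above and touches only fields that appear in it. The `summarize` helper is hypothetical and not part of Scannr.

```python
import json

# Trimmed copy of the example report above (one link kept).
REPORT = """
{
    "summary": {"totalScanned": 8, "broken": 0, "criticalCount": 0},
    "integrityScore": {"overallScore": 100, "grade": "Excellent"},
    "links": [
        {
            "url": "https://www.example.com",
            "status": "200",
            "redirect": {"chain": [], "isLoop": false, "hasHttpsDowngrade": false}
        }
    ],
    "brokenLinks": []
}
"""

def summarize(report_text):
    """Pull the headline numbers and redirect problems out of a report."""
    report = json.loads(report_text)
    links = report["links"]
    return {
        "grade": report["integrityScore"]["grade"],
        "score": report["integrityScore"]["overallScore"],
        "broken": report["summary"]["broken"],
        "redirect_loops": [l["url"] for l in links if l["redirect"]["isLoop"]],
        "https_downgrades": [l["url"] for l in links if l["redirect"]["hasHttpsDowngrade"]],
    }

print(summarize(REPORT))
```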

---

GitHub Action
-------------

Use Scannr in your CI/CD pipeline to catch broken links on every deploy.

### Basic Usage

```
name: Scan for broken links
on:
  push:
    branches: [main]
  schedule:
    - cron: '0 6 * * 1'  # Weekly, Mondays at 06:00 UTC

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: JBSommeling/scannr@v1
        with:
          url: https://example.com
```

### All Options

```
- uses: JBSommeling/scannr@v1
  with:
    url: https://example.com       # Required
    depth: 5                        # Max crawl depth (default: 3)
    max: 500                        # Max URLs to scan (default: 300)
    timeout: 10                     # Request timeout in seconds (default: 5)
    format: table                   # Output: table, json, csv (default: table)
    status: broken                  # Filter: all, ok, broken (default: all)
    filter: all                     # Element type filter (default: all)
    scan-elements: a,img            # Elements to scan (default: all)
    sitemap: true                   # Use sitemap.xml (default: false)
    js: true                        # JS rendering for SPAs (default: false)
    smart-js: true                  # Auto-detect SPAs (default: false)
    no-robots: false                # Ignore robots.txt (default: false)
    advanced: false                 # Show noise URLs (default: false)
    strip-params: ref,tracker       # Extra tracking params to strip
    delay-min: 200                  # Min delay between requests in ms
    delay-max: 500                  # Max delay between requests in ms
    fail-on-broken: true            # Fail step on broken links (default: true)
    fail-on-critical: true          # Fail step on critical issues (default: true)
    min-rating: good                # Minimum integrity rating (default: good)
```

### Examples

**Scan a SPA with JavaScript rendering:**

```
- uses: JBSommeling/scannr@v1
  with:
    url: https://my-react-app.com
    js: true
    status: broken
```

**Get JSON output for downstream processing:**

```
- uses: JBSommeling/scannr@v1
  id: scan
  with:
    url: https://example.com
    format: json
    fail-on-broken: false

- name: Process results
  if: steps.scan.outputs.exit-code != '0'
  run: echo "Broken links detected!"
```

**Deep weekly audit with sitemap:**

```
- uses: JBSommeling/scannr@v1
  with:
    url: https://example.com
    depth: 5
    max: 1000
    sitemap: true
    smart-js: true
```

**Quality gate — require "Excellent" rating:**

```
- uses: JBSommeling/scannr@v1
  with:
    url: https://example.com
    fail-on-critical: true
    min-rating: excellent
```

**Relaxed quality gate — only fail on critical issues:**

```
- uses: JBSommeling/scannr@v1
  with:
    url: https://example.com
    fail-on-critical: true
    min-rating: none
```
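
The interaction between `fail-on-critical` and `min-rating` in the two quality-gate examples can be sketched as a small decision function. This is an illustrative Python model, not Scannr's code: the rating names come from the option list, but their relative order, the handling of unlisted ratings, and the `gate_exit_code` name are assumptions.

```python
# Ratings from weakest to strongest. The names come from --min-rating;
# their relative order is an assumption of this sketch.
RATINGS = ["needs_attention", "good", "excellent"]

def gate_exit_code(rating, critical_count, min_rating="none", fail_on_critical=False):
    """Return 0 when the gate passes and 1 when it fails."""
    if fail_on_critical and critical_count > 0:
        return 1
    if min_rating == "none":
        return 0  # no rating threshold configured
    key = rating.strip().lower().replace(" ", "_")
    if key not in RATINGS:
        return 1  # unlisted rating: treated here as below every threshold
    return 0 if RATINGS.index(key) >= RATINGS.index(min_rating) else 1

print(gate_exit_code("Excellent", 0, min_rating="excellent", fail_on_critical=True))  # 0
print(gate_exit_code("Good", 0, min_rating="excellent", fail_on_critical=True))       # 1
print(gate_exit_code("Good", 2, min_rating="none", fail_on_critical=True))            # 1
```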

---

URL Restriction
---------------

Scannr is designed for scanning **your own websites**, not arbitrary third-party sites.

### Local Dev Dependency

When running via `php artisan site:scan`, Scannr compares the scan URL's domain against your `APP_URL` environment variable. If they don't match, a warning is displayed:

```
⚠ The scan URL domain (other-site.com) does not match your APP_URL (https://myapp.com).
  Scannr is intended for scanning your own websites.

```

### GitHub Action (.scannr.yml)

In CI/CD, you can create a `.scannr.yml` file in your repository root to declare which domains are allowed:

```
# .scannr.yml
allowed_domains:
  - example.com
  - staging.example.com
  - www.example.com
```

When `.scannr.yml` is present, the GitHub Action will **block** scans of URLs that don't match any listed domain.

If no `.scannr.yml` exists, the URL input is trusted (since you control your own workflow file).
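
The allow-list check can be pictured as a host comparison. The Python sketch below assumes exact, case-insensitive host matching, which the README does not confirm. Because the example file lists `www.example.com` and `staging.example.com` separately, exact matching seems a reasonable reading. The `is_allowed` helper is hypothetical.

```python
from urllib.parse import urlsplit

def is_allowed(url, allowed_domains):
    """Return True when the URL's host appears in the allow-list.

    Exact host matching is an assumption of this sketch; subdomains
    are not implied by their parent domain.
    """
    host = (urlsplit(url).hostname or "").lower()
    return host in {domain.lower() for domain in allowed_domains}

allowed = ["example.com", "staging.example.com", "www.example.com"]

print(is_allowed("https://www.example.com/pricing", allowed))  # True
print(is_allowed("https://other-site.com", allowed))           # False
```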

---

Configuration
-------------

Publish the config file to customize scanner behavior:

```
php artisan vendor:publish --tag=scannr-config
```

This creates `config/scannr.php` with settings for:

- **Tracking Parameters** — URL params to strip during normalization (utm\_\*, fbclid, etc.)
- **User Agent** — The User-Agent header sent with requests
- **Rate Limiting** — Delay between requests, backoff on 429 responses
- **Hard Limits** — Maximum caps for depth and URL count
- **JavaScript Rendering** — Custom paths to Node, npm, and Chrome binaries
- **Noise URL Detection** — Namespace domains, preconnect hints, framework error patterns
- **Integrity Scoring** — Penalty weights, confidence multipliers, grade thresholds
- **External Platforms** — Domains with known bot protection (LinkedIn, GitHub, etc.)
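
To illustrate the tracking-parameter setting, the sketch below removes `utm_*` and `fbclid` parameters from a URL before it is compared with others. It is a Python illustration of the idea, not the package's normalization code. The `strip_tracking` helper and the wildcard matching via `fnmatchcase` are assumptions.

```python
from fnmatch import fnmatchcase
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Patterns taken from the examples in the list above.
TRACKING_PATTERNS = ["utm_*", "fbclid"]

def strip_tracking(url, patterns=TRACKING_PATTERNS):
    """Drop query parameters whose names match any tracking pattern."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not any(fnmatchcase(name, pattern) for pattern in patterns)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(strip_tracking("https://example.com/page?utm_source=x&id=7&fbclid=abc"))
# https://example.com/page?id=7
```

Extra names, such as those passed with `--strip-params`, could be modeled by appending them to the pattern list, for example `strip_tracking(url, TRACKING_PATTERNS + ["ref"])`.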

---

Testing
-------

```
composer test
```

---

License
-------

This package is open-sourced software licensed under the [MIT license](https://opensource.org/licenses/MIT).

Last updated: 2026-06-13

