PHPackages                             sandermuller/laravel-queue-insights - PHPackages - PHPackages  [Skip to content](#main-content)[PHPackages](/)[Directory](/)[Categories](/categories)[Trending](/trending)[Leaderboard](/leaderboard)[Changelog](/changelog)[Analyze](/analyze)[Collections](/collections)[Log in](/login)[Sign up](/register)

1. [Directory](/)
2. /
3. [Queues &amp; Workers](/categories/queues)
4. /
5. sandermuller/laravel-queue-insights

ActiveLibrary[Queues &amp; Workers](/categories/queues)

sandermuller/laravel-queue-insights
===================================

Self-hosted, driver-agnostic queue observability for Laravel. Per-class throughput, durations, failures, and live depth/in-flight/delayed metrics with a Livewire dashboard.

0.22.0(2w ago)52.4k—0.9%[1 PRs](https://github.com/SanderMuller/laravel-queue-insights/pulls)MITPHPPHP ^8.3CI passing

Since Apr 26Pushed 4d ago1 watchersCompare

[ Source](https://github.com/SanderMuller/laravel-queue-insights)[ Packagist](https://packagist.org/packages/sandermuller/laravel-queue-insights)[ Docs](https://github.com/SanderMuller/laravel-queue-insights)[ RSS](/packages/sandermuller-laravel-queue-insights/feed)WikiDiscussions main Synced 1w ago

READMEChangelog (10)Dependencies (36)Versions (28)Used By (0)

Laravel Queue Insights
======================

[](#laravel-queue-insights)

[![Latest Version on Packagist](https://camo.githubusercontent.com/d6d91bf8f5dff98d1808eb16acbc84e2db912aee53cfd529c36649a9d2d3d965/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f762f73616e6465726d756c6c65722f6c61726176656c2d71756575652d696e7369676874732e7376673f7374796c653d666c61742d737175617265)](https://packagist.org/packages/sandermuller/laravel-queue-insights)[![GitHub Tests Action Status](https://camo.githubusercontent.com/2b00c98605b74fbfd03b8f3a2faac271bca89c2290eec3dd816c58f0cf14de04/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f616374696f6e732f776f726b666c6f772f7374617475732f73616e6465726d756c6c65722f6c61726176656c2d71756575652d696e7369676874732f72756e2d74657374732e796d6c3f6272616e63683d6d61696e266c6162656c3d7465737473267374796c653d666c61742d737175617265)](https://github.com/sandermuller/laravel-queue-insights/actions?query=workflow%3Arun-tests+branch%3Amain)[![GitHub PHPStan Action Status](https://camo.githubusercontent.com/0aca841b715bcf0a4394e12159b287c7fbf175c1e4e44be507c2110f699a94f6/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f616374696f6e732f776f726b666c6f772f7374617475732f73616e6465726d756c6c65722f6c61726176656c2d71756575652d696e7369676874732f7068707374616e2e796d6c3f6272616e63683d6d61696e266c6162656c3d7068707374616e267374796c653d666c61742d737175617265)](https://github.com/sandermuller/laravel-queue-insights/actions?query=workflow%3Aphpstan+branch%3Amain)[![Total Downloads](https://camo.githubusercontent.com/6a226bc674b3636cc530df90224666bb50130ac932c7334e70aadc183e4622a0/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f64742f73616e6465726d756c6c65722f6c61726176656c2d71756575652d696e7369676874732e7376673f7374796c653d666c61742d737175617265)](https://packagist.org/packages/sandermuller/laravel-queue-insights)[![License](https://camo.githubusercontent.com/d05106b5af6f95e80b8a0a31a29083cad7d66ba1e729388331f493628a2506bd/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f6c2f73616e6465726d756c6c65722f6c61726176656c2d71756575652d696e7369676874732e7376673f7374796c653d666c61742d737175617265)](LICENSE)[![Laravel Compatibility](https://camo.githubusercontent.com/e79e3106800fd0406ec3a27ad0a6eec9abdb545e5425ee3160de42549239251d/68747470733a2f2f62616467652e6c61726176656c2e636c6f75642f62616467652f73616e6465726d756c6c65722f6c61726176656c2d71756575652d696e7369676874733f7374796c653d666c6174)](https://packagist.org/packages/sandermuller/laravel-queue-insights)

Self-hosted, driver-agnostic queue observability for Laravel.

[![Queue Insights dashboard](screenshot.png)](screenshot.png)

Contents
--------

[](#contents)

- [Live demo](#live-demo)
- [Features](#features)
- [Requirements](#requirements)
- [Install](#install)
- [Payload capture](#payload-capture)
- [Dashboard](#dashboard)
- [Running workers](#running-workers)
- [Ops runbook](#ops-runbook)
- [Alerting](#alerting)
- [Horizon supervisor auto-discovery](#horizon-supervisor-auto-discovery)
- [Connection aliasing](#connection-aliasing)
- [Prometheus](#prometheus)
- [Scheduler observability](#scheduler-observability)
- [Testing](#testing)
- [Upgrading](#upgrading)
- [Changelog](#changelog)
- [Contributing](#contributing)
- [Security Vulnerabilities](#security-vulnerabilities)
- [Credits](#credits)
- [License](#license)

Live demo
---------

[](#live-demo)

[**queue-insights-demo.laravel.cloud**](https://queue-insights-demo.laravel.cloud) — public preview hosted on Laravel Cloud, seeded with realistic fixtures.

Features
--------

[](#features)

- **Driver-agnostic depth, in-flight, delayed counts** per queue — SQS, Redis, database.
- **Pending &amp; delayed-job inspector** per queue, event-captured into Redis (same view across drivers). Optional payload capture under a separate budget so the per-row hash math doesn't pin the completed-stream sanitiser settings.
- **Batched jobs** — per-batch progress, counts, cancelled state, per-item rollup linking back to job modals.
- **Chained-job visibility** — `↳ Next` chip + Chain modal section, plus opportunistic backward `↰ From {parent}` lineage.
- **Job initiator** — every job records where it was dispatched from: a coarse origin (HTTP route / artisan command / scheduled task, propagated into nested dispatches via `Context`) and an opt-in `file:line` call site. Surfaced in the completed / pending / failed modals.
- **Wait time** per queue (p50 / p95) and per job — enqueue → pickup gap.
- **24h throughput sparkline** + headline stats (jobs/min, past hour, max p95 wait + runtime). Optional 7th tile: total Redis bytes consumed by the package's keyspace.
- **Queues grouped *Needs attention* vs *Healthy*** so a broken queue can't hide in a long list.
- **Per-class metrics** — 24h processed / failed, avg + max duration, last run.
- **Recent completed + failed lists** with shared filter row (connection, queue, class, date range), per-page dropdown (10 / 25 / 50 / 100), all persisted in the URL. Failed rows surface runtime alongside Completed (computed via a 30 d `failed-runtime:{uuid}` side-key written when the worker's `start:` stamp survives to `JobFailed`).
- **Global queue + class scope across every section.** Click a queue row in the Overview section's queues tables or a class row on the Classes section to scope Failed, Completed, Pending, Classes, and Silenced lists in one move. URL-shareable (`?qk={conn}:{queue}`, `?ck={fqcn}`); inline scope strip above the section panes shows the active scope with per-chip clear; click an already-selected row to toggle off. Scoping a silenced class auto-reveals its rows on Failed + Completed.
- **Retry badge** — pending, in-flight, and completed rows render an orange `retry N` chip with hover tooltip when the worker has picked the job up more than once. Backed by `attempts` stamped on the `pending:{uuid}` hash at `JobProcessing`.
- **Retry failed jobs** from the dashboard, single or bulk — gated, rate-limited, audit-logged.
- **Markdown export** of failed-job details for AI-assisted triage or trackers.
- **Alerting** — nine detectors (depth, stalled, oldest-pending, stuck-inflight, failure-rate, slow-p95, snapshot-errored, backlog-growing, connection-drift) with per-rule cooldown + `log` / `slack` / `mail` / `sentry` channels + typed events.
- **Prometheus** — opt-in `/metrics` (text + OpenMetrics), fail-closed auth, per-class cardinality control, optional scheduler metrics families, plus a `prometheus-push` command for short-lived workers.
- **Scheduler observability** — opt-in. Captures every `Illuminate\Console\Events\Scheduled*` into per-task definition snapshots + per-run records (start/finish/exit/runtime/host/output), exposes a lazy-loaded dashboard panel with per-task + per-run drilldown modals (host-distribution chart, correlated-jobs section, exception block, output viewer, markdown export), ships a missed/hung sweeper, and routes scheduler alerts through the same `QueueAlertNotification` pipeline as queue alerts (log / slack / mail / sentry; per-domain channel block) — typed `ScheduledTaskMissed` / `ScheduledTaskHung` / `ScheduledTaskFailed` events still fire alongside.
- **Horizon integration** — supervisor queue auto-discovery from `horizon.environments`, `horizon.silenced` merged into our suppression filter, operator-declared `connection_aliases` collapses dispatcher/worker connection drift onto a canonical key.
- **Light / dark / system theme** with a tri-state toggle in the header. Persists per operator; default follows OS `prefers-color-scheme`.
- **12h / 24h clock toggle** in the header (12h / auto / 24h). `auto` follows browser locale + OS 24-hour preference. Persists per operator.
- **Standalone Livewire + Blade** — no Filament or Nova coupling.
- **Small, bounded Redis footprint** — auto-evicting, no external observability service required.
- **Redis Cluster compatible** — opt-in (`QUEUE_INSIGHTS_REDIS_CLUSTER`); hash-tag pinning keeps the keyspace on one slot so the package's multi-key Lua + pipelines stay CROSSSLOT-legal.

Requirements
------------

[](#requirements)

- PHP 8.3+
- Laravel 11, 12, or 13
- Redis (for insights storage)
- `livewire/livewire` 3 or 4 (only if you use the bundled dashboard route).

CI runs against three Livewire resolver legs: Livewire 3.0, Livewire 3 latest, and Livewire 4 latest. Coverage is PHP-side only. The JS and Alpine paths aren't browser-tested, so do a smoke render in your own staging before upgrading the host.

Install
-------

[](#install)

```
composer require sandermuller/laravel-queue-insights
php artisan vendor:publish --tag=queue-insights-config
```

The service provider auto-discovers.

### Run the scheduler

[](#run-the-scheduler)

Every dashboard tile, alert detector, and Prometheus gauge reads from snapshots written by `php artisan queue-insights:snapshot`. The package auto-registers it on Laravel's scheduler with `->everyMinute()->withoutOverlapping()` — you just need a host that runs `php artisan schedule:work` (or the equivalent `* * * * * cd /path && php artisan schedule:run` cron).

To opt out and wire it yourself, set `queue-insights.schedule.enabled = false` and add `Schedule::command('queue-insights:snapshot')` to your own kernel.

`snapshots[]` lists the queues to capture. Static config plus Horizon autodiscovery (when `laravel/horizon` is installed) cover most setups — see the published `config/queue-insights.php` for the shape and the Horizon section below.

### Optional environment knobs

[](#optional-environment-knobs)

VarDefaultPurpose`QUEUE_INSIGHTS_REDIS``default`Laravel Redis connection name the package writes to. Point at a dedicated DB on shared Redis.`QUEUE_INSIGHTS_KEY_PREFIX``qm:{APP_ENV}:`Prefix for every Redis key the package writes. See [Key-prefix strategies](#key-prefix-strategies).Subsystems each carry their own `.enabled` switch (`dashboard.enabled`, `pending.enabled`, `alerts.enabled`, `prometheus.enabled`, `scheduler.enabled`, `batches.enabled`, `initiator.enabled`) — flip those individually rather than reaching for a global kill switch.

Payload capture
---------------

[](#payload-capture)

Off by default. Laravel payloads embed serialized and sometimes encrypted job state, and a regex over JSON keys can't sanitize that safely.

Three modes via `QUEUE_INSIGHTS_CAPTURE_PAYLOADS`:

ModeBehavior`off` *(default)*No payload persisted.`metadata``displayName`, `maxTries`, `timeout`, `backoff` only. No user data, no serialized command body.`full`Raw body after a sanitizer pass. Apps with sensitive jobs MUST bind a custom `PayloadSanitizer` that understands their job shape.Read [`SECURITY.md`](SECURITY.md) before enabling `full`.

### Pending payload capture (separate budget)

[](#pending-payload-capture-separate-budget)

The completed-stream `capture.payloads` setting controls what's persisted on **completed and failed** rows. Pending and in-flight rows have their own knob because the memory math differs structurally — completed-stream entries are MAXLEN-trimmed (`N × bytes`), but pending hashes fan out as `max_per_queue × queues × TTL`, which on a 10k-row × 10-queue host is ~400 MB at 4 KB/row.

```
# .env
QUEUE_INSIGHTS_PENDING_CAPTURE_PAYLOADS=metadata    # off | metadata | full
QUEUE_INSIGHTS_PENDING_INCLUDE_COMMAND_BODY=false   # opt in to persist data.command bytes
```

Defaults: `off` (no payload fields written), `4096` byte cap per pending hash (a quarter of the completed-stream cap), and `data.command` omitted even under `full` until the host explicitly opts in. The same `capture.redact_keys` regex list is applied either way.

Dashboard
---------

[](#dashboard)

Mounts at `/queue-insights` when `dashboard.enabled=true` and `livewire/livewire` is installed. Define the `viewQueueInsights` Gate in your app:

```
// app/Providers/AuthServiceProvider.php
Gate::define('viewQueueInsights', fn ($user) => $user->isAdmin());
```

### Multi-connection scoping

[](#multi-connection-scoping)

When you monitor more than one queue connection (e.g. a multi-tenant app with one connection per tenant, or a mixed `sqs` + `redis` setup), the dashboard exposes connection as a **first-class navigation axis**, not a filter dropdown:

- `/queue-insights` — un-scoped, every monitored connection aggregated into one view.
- `/queue-insights/{connection}` — scoped to a single connection. Every panel narrows: queue rows, alerts strip, snapshot watchdog, pending/delayed/in-flight inspectors, batches, recent completed/failed lists, headline stats (jobs / min, throughput sparkline, p95 wait, max runtime), per-class metrics, and the alert-rules panel's depth thresholds.

A tab strip above the headline cards renders one tab per allowed connection plus an "All" tab. The strip auto-suppresses when only one connection is monitored.

The `{connection}` segment is constrained to the union of `snapshots.*.connection` and any Horizon-autodiscovered supervisor connections — typos 404 instead of mounting an empty dashboard. Pre-alias legacy URLs (`/queue-insights/redis` when `aliases.redis = redis-staging` is published) resolve to the canonical scope.

#### Per-connection authorisation (optional)

[](#per-connection-authorisation-optional)

Add the `viewQueueInsightsConnection` Gate to authorise per connection:

```
// app/Providers/AuthServiceProvider.php
Gate::define('viewQueueInsightsConnection', function ($user, string $connection): bool {
    return $user->canAccessTenant($connection);
});
```

When defined, the dashboard:

- 403s direct visits to `/queue-insights/{connection}` the user can't access.
- Hides denied connections from the tab strip.
- Renames the "All" tab to "All allowed" with a tooltip listing only the connections the user can already open (denied tenants are never named).

If the gate isn't defined, every monitored connection is reachable to anyone who passes `viewQueueInsights` — same behaviour as pre-spec versions.

#### Audit log carries scope

[](#audit-log-carries-scope)

Every retry log line (`queue-insights.retry`) includes `scope_connection` alongside the existing filter snapshot, so retries that span tenants are distinguishable from scoped retries.

#### Upgrade note — per-connection class metrics need traffic to warm

[](#upgrade-note--per-connection-class-metrics-need-traffic-to-warm)

Per-connection class counters (`processed:{class}:{connection}:{bucket}`, `failed:{class}:{connection}:{bucket}`, `duration:{class}:{connection}`, `last_run:{class}:{connection}`, `classes:{connection}` zset) are dual-written alongside the existing aggregate keys. Aggregate dashboards (`/queue-insights`) render correctly from second 0 after upgrade. Scoped views (`/queue-insights/{connection}`) for per-class p95 / throughput / 24h totals fill in as new events flow — the first hour after deploy will show `0` for class counts on a scoped view. Aggregate keys are unchanged so rolling back the package version is safe.

#### Known limitations under scope

[](#known-limitations-under-scope)

These v1 gaps surface only on the connection-scoped routes; the un-scoped dashboard is unaffected.

- **Heterogeneous batches are first-write-wins.** A `Bus::batch([...])` whose member jobs span multiple connections is indexed under the connection that dispatched the FIRST job. Other connections' scoped views won't see the batch. The detail/items view under scope reads `qi:batch-uuid-conn:{uuid}` (a dedicated uuid → connection side-key written when the job is queued, lifetime = `batches.ttl_seconds`) so cross-connection member uuids stay filtered even after the member has been processed/failed. Members past `batches.ttl_seconds` from queue time pass through. Operators relying on heterogeneous batches can fall back to the un-scoped view, which still shows every batch.
- **Recent completed list under a class drilldown post-filters by connection.** When the operator selects a class on a scoped view (`?class=App\\Foo` on `/queue-insights/redis`) the read routes to the per-class stream and post-filters rows by their `connection` field. The class stream caps at 1000 entries so the post-filter is cheap, but in extreme traffic skews a class drilldown may show fewer rows than the un-scoped class view. The plain scoped Recent completed list (no class drilldown) reads the dedicated `qi:completed:connection:{c}` stream and is unaffected.
- **Per-connection counter dual-write isn't atomic.** Aggregate and per-connection counters are written as separate Redis commands. A listener crash mid-write can leave the per-connection counter behind aggregate; later traffic re-fills it. Same best-effort guarantee the package's existing listeners offer; never produces phantom data.

### Retry permissions (write actions)

[](#retry-permissions-write-actions)

Retrying a failed job is a write action and needs its own Gate, separate from the read-only `viewQueueInsights`:

```
Gate::define('retryFailedJobs', fn ($user) => $user->isAdmin());
```

Without that Gate, the Retry button stays hidden in the failed-job modal, the bulk Retry button stays hidden above the failed-jobs table, and direct calls to the underlying Livewire methods (`retryFailed`, `retryFailedBulk`) return 403.

The retry path uses Laravel's first-party `queue:retry` Artisan command, so it's idempotent against an already-retried row and works regardless of queue driver.

Guards on the retry path:

- 30 retries per minute, per user.
- The server rejects a bulk retry when the matching set is over 100 rows. The UI shows a "narrow to retry" hint instead of the action button.
- The server also rejects a bulk retry when no filter is set, so you can't accidentally one-click retry every failed job.
- Every retry writes an `info`-level log line with channel `queue-insights.retry`, including the user id, the active filter set, and `scope_connection` (the multi-connection scope, when set). Forward that to your audit log.

### Retry workflow

[](#retry-workflow)

To triage a failed job:

1. Open the dashboard and find the row in the **Recent failed** list.
2. Optional: narrow with the inline filter toolbar above the list — connection, queue, class, or date range. The URL updates as you change a field, so the filtered view is shareable.
3. Click any row to open the failed-job modal. You'll see the exception, stack trace, payload, and metadata.
4. To retry one job, click *Retry* in the modal header. The button flips to a red "Confirm retry?" for two seconds; click again to fire. The modal closes and a green banner confirms dispatch. If `queue:retry` exits non-zero, you get a red banner instead of a misleading success.
5. To retry several at once, set at least one filter. A *Retry N jobs* button appears next to the section heading, with the same two-click confirm pattern. Anything matching more than 100 rows shows a *N matches · narrow to retry* hint instead of an action button.

A failed retry never leaves the dashboard in a half-broken state. The row is either re-dispatched (and removed from `failed_jobs`) or left alone.

### Filtering &amp; scoping

[](#filtering--scoping)

There are two layers. **Global scope** (queue + class) is set by clicking a row in the queues tables (Overview section) or on the Classes section and applies to every list pane — Failed, Completed, Pending, Silenced. **Per-pane filters** narrow within a section on top of the active scope.

#### Global scope

[](#global-scope)

AxisSet byCleared byQuery-string keyClassclicking a class row on the **Classes** sectionclicking the same row again, or the chip's `×``ck`Queueclicking the connection/queue cell in the **Overview** section's queues tablesclicking the same row again, or the chip's `×``qk`Active scope renders as an inline `Filtering by queue=… · class=…` strip above the section panes with a per-chip clear button. URL-shareable so a paste into chat preserves the operator's view.

When the active class scope IS a class in `queue-insights.silenced`, both Failed and Completed auto-reveal silenced rows so the lists don't read empty after the click. The "Show silenced" checkbox on each pane stays available for an explicit override.

#### Per-pane filters

[](#per-pane-filters)

Both *Recent completed* and *Recent failed* have an always-visible filter toolbar above the list. Each field binds to a short query-string key, so a narrowed view is shareable and bookmarkable.

Connection, Queue, and Class are populated as `` dropdowns from the configured queues (snapshots + Horizon autodiscovery) and the 24h class roster — no free-text typos. The Class dropdown on both panes binds to the global `?ck=` (same prop the Classes tab toggles), so picking a class on either pane scopes the other automatically.

#### Recent failed filter

[](#recent-failed-filter)

FieldQuery-string keyMatch semanticsConnection`fc`Exact (`connection` column)Queue`fq`Exact (`queue` column)Class`ck`Anchored prefix substring on `payload.displayName`, case-insensitiveFrom`ffrom``failed_at >=  00:00:00`To`fto``failed_at =  00:00:00`To`cto``processed_at  1`). The retry stamp is written by the `JobProcessing` listener via `MarkInFlight.lua` and ages out with the pending hash.

The data is **event-captured into Redis**, not peeked from the queue driver. The `JobQueued` listener stamps a per-uuid hash + per-queue sorted set into the package's Redis namespace; `JobProcessing` / `JobProcessed` / `JobFailed` clean up. Driver-agnostic by design — works for SQS, where there's no way to peek individual messages without consuming them, alongside Redis and database queues.

Bounded storage:

- ~500 bytes per pending job (uuid + class FQCN + connection + queue + queued\_at + available\_at).
- Per-queue cap (`pending.max_per_queue`, default 10000) enforced via `ZREMRANGEBYRANK` — when the cap is hit, the lowest-score (earliest `available_at`) entry is dropped first.
- TTL safety net (`pending.ttl_seconds`, default 86400 = 24h) drops orphans whose cleanup listener never fired (worker crash, raw `Queue::push()` outside Laravel's event flow).

The dashboard compares the tracked count against the snapshot's `depth + delayed` — when they diverge by more than `pending.gap_warn_threshold` (default 5), a `+N gap` badge appears on the toggle and a banner inside the inspector body warns that the lists are a sample, not a complete enumeration. Read the queue counters above for totals when the gap is non-zero. Gap usually points to one of:

- A worker crashed mid-pickup and the `JobProcessing` listener didn't fire (TTL eventually cleans).
- Jobs are being pushed via raw `Queue::push()` outside Laravel's standard dispatch (no `JobQueued` event raised).
- The `pending.max_per_queue` cap kicked in on a high-volume queue (more jobs in the queue than the tracked sample).

To opt out (memory-bounded production), set `QUEUE_INSIGHTS_PENDING_ENABLED=false`. The listener writes become no-ops, the inspector toggle disappears, and existing keys age out via TTL.

### Batches

[](#batches)

The dashboard renders a top-level **Batches** section above the Queues panel for jobs dispatched via `Bus::batch([...])->dispatch()`. Each row shows the batch name (or `Batch ` when unnamed), a progress bar driven by Laravel's authoritative `Bus::findBatch()` counts, and a counts triplet (`processed/total · failed · pending`). Cancelled batches show a red `cancelled` chip; finished + no-failures show a gray `finished` chip; jobs that fail when `allowFailures()` is off render `cancelled (first failure)` even before Laravel stamps `cancelled_at`.

Expanding a row reveals the per-uuid item list in enqueue order, with a status icon (✓ processed / ✗ failed / ⌛ pending) per item. Clicking a completed item opens the existing completed-job modal (by stream id); clicking a failed item opens the failed-job modal (by `failed_jobs.id`). The expand state is URL-shareable (`?batch=`).

Every completed, failed, and pending row that belongs to a batch carries a small batch chip — clicking it opens the batch modal directly. The chip also renders inside the completed/failed/pending modal heroes, so an operator drilling into a single job can jump to its batch in one click. Inside an item modal that was opened from a batch, a `← Back to batch` button in the header returns you to the batch view without losing context (item modals stack visually on top of the batch modal).

The data is **event-captured into Redis** alongside Laravel's own `BatchRepository`. The `JobQueued` listener writes the following keys per batched job:

- `qi:batches:index` (sorted set) — recent batchIds, ordered by first-seen unix timestamp. Used to enumerate batches without `SCAN`. Score-pruned on every enqueue (no whole-key TTL) so the head doesn't accumulate forever.
- `qi:batches:index:{connection}` (sorted set) — per-connection roster, populated first-write-wins via Lua so a heterogeneous batch lands on exactly one connection. Same score-pruning as the aggregate index. Read by `/queue-insights/{connection}` scoped views.
- `qi:batch:{id}:connection` (string) — single arbiter for first-write-wins. The atomic `SET … NX` on this key gates the per-connection ZADD inside `BatchClaimConnection.lua`. TTL is refreshed on every subsequent JobQueued for the same batch so the pointer doesn't age out under continued traffic.
- `qi:batch-uuid-conn:{uuid}` (string) — uuid → connection side-key written for every batched job. Survives the JobProcessed/JobFailed pending-hash deletion so the heterogeneous-batch detail-view scope filter keeps working after members have run.
- `qi:batch:{id}:uuids` (list) — RPUSH-ordered uuids in the batch. Bounded per batch by `batches.max_uuids_per_batch` (default 5000, best-effort under heavy concurrent dispatch).
- `qi:batch:uuid:{uuid}` (string) — reverse lookup uuid → batchId, used to render the per-row chip on completed jobs.

`RecordJobProcessed` and `RecordJobFailed` add two more per-uuid index keys (`qi:uuid-completed:{uuid}` and `qi:uuid-failed:{uuid}`) so the per-item rollup can route clicks into the existing modal flows.

Bounded storage:

- ~50 bytes per uuid (`qi:batch:{id}:uuids` entry + `qi:batch:uuid:{uuid}` reverse pointer + index entry, amortised per batch).
- TTL on every per-batch key (`batches.ttl_seconds`, default 604800 = 7d). Self-pruning on the index via `ZREMRANGEBYSCORE` on each enqueue; per-batch keys age out via Redis EXPIRE.
- Authoritative counts (`pending_jobs`, `processed_jobs`, `failed_jobs`, `progress`, `finished_at`, `cancelled_at`) come from `Bus::findBatch()` on every render — the captured keys exist only to enumerate batches and resolve uuid → display row, NOT to count.

**Retry caveat.** `queue:retry` and `queue:retry-batch` use `Queue::pushRaw()`, which does NOT fire `JobQueued`, so a retried job won't refresh as a fresh pending entry in the per-item rollup. The retry will still flow through `JobProcessed` (which DOES fire), so a successful retry overwrites `qi:uuid-failed:{uuid}` with `qi:uuid-completed:{uuid}` and the row flips from ✗ to ✓ within one poll cycle.

To opt out, set `QUEUE_INSIGHTS_BATCHES_ENABLED=false`. The listener writes become no-ops, the Batches section disappears, and chips stop rendering on existing rows.

### Chained jobs

[](#chained-jobs)

Jobs dispatched through `Bus::chain([...])->dispatch()` (or `$job->chain([...])`) carry the remaining chain inside the serialized command body. The dashboard renders that forward chain context in two places:

- **List rows** — completed and failed rows that have a follow-up job render a small `↳ NextJob (+N)` chip, where the leaf-class name shows the immediate next job and `+N` counts the further-down-chain jobs after it. Hover reveals the full FQCN and the total chained count.
- **Modal Chain section** — the completed and failed modals include a `Chain` block with the next job's FQCN, the `+N more chained` count, and the chain's queue/connection (when set on the job). The block is clickable: it swaps the modal into a "Chained jobs" detail view that lists every chained link in order with per-link connection/queue, and a `← Back` button (or `Esc`) returns to the job view. Drilling into a single chained job inside the **failed-job modal** also surfaces its constructor properties (extracted from the serialized payload, framework internals filtered out) — same renderer used by the parent job's payload section. The completed-modal chain view stays metadata-only since the slim chain summary persisted on the stream entry doesn't retain user-bound data.

For **failed jobs** the source is `failed_jobs.payload.data.command` — Laravel always persists this column, so chain context renders regardless of the package's `capture.payloads` setting. For **completed jobs** the listener writes a JSON-encoded `chain` field (a list of `{class, connection, queue}` per chained link, typically ~80–300 bytes) onto each completed-stream entry at the time the job runs, also independent of `capture.payloads`. Per-link `connection`/`queue` overrides set on individual jobs are preserved — the displayed route reflects what Laravel will actually dispatch to. Encrypted jobs (`ShouldBeEncrypted`) carry an opaque base64 blob in `data.command`, so the chip and section are silently omitted for those rows — no error, just no signal.

**Backward chain visibility — `↰ From {parent}`.** As the parent enters processing, the package drops a short-lived **claim ticket** into Redis (per-shape FIFO list keyed by connection/queue/next-class/tail-fingerprint, default 60 s TTL). When the next link's `JobQueued` fires inside `CallQueuedHandler::call()`, the listener pops a ticket and stamps the parent's UUID onto the child's lineage hash. The completed-modal then renders `↰ From {uuid}` above the existing `↳ Next` row, and the failed-job markdown export gains a `**Parent:** \`{uuid}` ({class})` line so AI-assisted triage can trace upstream of the failure point.

- **Disable** via `QUEUE_INSIGHTS_CHAIN_LINEAGE=false` (or `chain_lineage.enabled = false`). Both write and read sides short-circuit at the listener entry — zero Redis writes, zero overhead.
- **Encrypted parents (`ShouldBeEncrypted`) are silently skipped on both sides** — the serialized command body is opaque base64, so neither the parent's chain context nor the child's tail can be decoded. The child renders without a parent attribution; document this limitation if you mix encrypted chains with the dashboard.
- **Cross-worker collision tolerance.** Two parents with identical chain shape (same connection/queue/next-class/remaining-tail) running concurrently on different workers can attribute their children to each other in dispatch order rather than dispatch identity. Within a single worker chain dispatch is synchronous, so attribution is exact. Acceptable for an observability tool — see `internal/specs/backward-chain-lineage.md` §3 for the full collision model.
- **Class label is best-effort.** `qi:class:{uuid}` (TTL = `chain_lineage.lineage_ttl_seconds`, default 7 d) is the index that hydrates a parent UUID to a class name in the markdown export and modal. Past that horizon the UUID still renders, just without `(ClassName)`.
- **Click-through to the parent's modal is not in v1** — the lineage row is plain text plus a copy-to-clipboard button. Resolving a UUID to its target surface (completed stream id vs failed\_jobs id) is a follow-up.

`queue:retry` re-runs a failed job through the normal worker path, so the eventual completed-stream entry of a retried chained job will still carry the correct `chain` field — the retry doesn't lose chain visibility. Backward lineage is keyed by uuid and survives the retry too: the existing `qi:lineage:{uuid}` is never overwritten with null.

### Job initiator

[](#job-initiator)

Where `↰ From {parent}` answers "which *job* ran before this one", the job initiator answers "which *request, command, or scheduled task* started the work" — and, optionally, the exact line of code that dispatched it. Both surface as `Origin` and `Dispatched from` rows in the completed-, pending-, and failed-job modals, and in the failed-job markdown export.

- **Origin** — the coarse entry point: `http:{route}` for a job dispatched during a request, `artisan:{command}` inside a console command, `schedule:{task}` for one dispatched by a scheduled task. Origin rides Laravel's `Context`, so it's serialized into the job payload and **propagates into nested dispatches** — a job dispatched by another job inherits the root origin. Jobs dispatched outside any of those (tinker, a bare daemon) carry no origin.
- **Call site** — the exact `file:line` the `dispatch()` ran from, so two code paths that dispatch the same job class stay distinguishable. Opt-in: it costs one bounded `debug_backtrace()` per dispatch, so it's **off by default**.

```
// config/queue-insights.php
'initiator' => [
    'enabled' => env('QUEUE_INSIGHTS_INITIATOR', true),
    'capture_origin' => true,
    'capture_call_site' => false,  // opt in for file:line precision
],
```

Origin capture is automatic — the package appends an HTTP middleware to the `web` / `api` groups and listens on `CommandStarting` plus the scheduler lifecycle. Coverage is best-effort: requests through custom route groups, and dispatches that run before the group middleware, carry no origin. Disable the whole feature with `QUEUE_INSIGHTS_INITIATOR=false` — listeners and the middleware become no-ops.

### Customising row markup

[](#customising-row-markup)

The dashboard's queue, completed, and failed lists are each rendered through a Blade partial, plus a shared filter-form partial. They're publishable — a host that wants to swap a row's columns or restyle the filter chrome can publish the partials and edit them in place without forking the whole `dashboard.blade.php` view:

```
php artisan vendor:publish --tag=queue-insights-views
```

PartialWhat it renders`partials/queue-row.blade.php`One row in the Queues list (Needs attention + Healthy groups)`partials/completed-row.blade.php`One row in Recent completed`partials/failed-list-row.blade.php`One row in Recent failed`partials/batch-row.blade.php`One row in the Batches section (header + per-item rollup)`partials/batch-chip.blade.php`The small chip rendered on rows that belong to a batch`partials/filter-form.blade.php`The collapsible 5-field filter form (used by both completed + failed)`partials/stat-tile.blade.php`One tile in the headline-stats panel beside the throughput sparklineIf you only want to override one row layout, leave the others unpublished — Blade will fall back to the package's bundled version for those.

### Failure context

[](#failure-context)

When a job or scheduled task fails, the package captures a snapshot of the surrounding context so you can debug it without re-running anything:

- **`Context` snapshot** — the visible [Laravel `Context`](https://laravel.com/docs/context) facade at failure time (request id, user id, tenant, trace id — whatever your app puts there, including context added *during* the job). Captured on **any** queue driver.
- **Environment** — worker `host`, `pid`, app `env`, and an optional `release`/deploy identifier.

It shows in the failed-job and scheduled-run modals, and — most usefully — in the **Copy as Markdown** export, so a failure pasted into an AI agent or issue tracker is self-describing ("user 4821, tenant acme, trace abc123, release 2.4.0"). Both the `JobFailedAlert` and scheduler `ScheduledTaskFailed` events also carry the snapshot for host listeners. For scheduled tasks, the **root-cause inner exception** (deepest `getPrevious()`) is captured discretely so it survives stack-trace truncation.

Context **values are redacted by key name** through the same `capture.redact_keys` list as payloads (`password`, `token`, `secret`, `api_?key`, `authorization` by default) before storage — the export is paste-safe. Hidden context is never captured.

```
// config/queue-insights.php
'failure_context' => [
    'enabled' => env('QUEUE_INSIGHTS_FAILURE_CONTEXT', true),
    'capture_app_context' => true,
    'context_keys' => [],            // [] = all visible keys (sanitized); or restrict to a list
    'capture_environment' => true,
    'release_resolver' => null,      // null → env('APP_VERSION'); a string config-key; or a callable
    'max_value_bytes' => 2048,
    'ttl_seconds' => 604800,
],
```

Note

Capturing is gated on `failure_context.enabled` (on by default — it's cheap, since failures are rare, and sanitized). Set `QUEUE_INSIGHTS_FAILURE_CONTEXT=false` to disable entirely.

### Embedding the dashboard inside an admin layout

[](#embedding-the-dashboard-inside-an-admin-layout)

Disable the bundled route and mount the Livewire component yourself:

```
// config/queue-insights.php
'dashboard' => ['enabled' => false, /* ... */],
```

```
{{-- resources/views/admin/queue-insights.blade.php --}}
@extends('admin.layout')

@section('content')
    @livewire('queue-insights-dashboard')
@endsection
```

To embed a connection-scoped view, pass the scope as a mount param:

```
@livewire('queue-insights-dashboard', ['connection' => $tenant->queueConnection])
```

The component validates the connection against the configured roster (snapshots + Horizon autodiscovery; 404s on mismatch) and runs `viewQueueInsightsConnection` defensively, same as the bundled route — so this is safe to render in publicly-reachable views.

### Dark mode

[](#dark-mode)

The dashboard ships with a tri-state theme toggle (sun / monitor / moon) in the header — `light`, `dark`, and `system` (follows `prefers-color-scheme`, default). The header itself stays Horizon-dark in both modes by design; the rest of the chrome flips between light and dark surfaces.

Persistence lives in `localStorage['qi-theme']`. A blocking inline script in `` resolves the preference before first paint, so there's no flash of incorrect theme. The toggle survives `wire:navigate` morphs without leaking listeners.

```
'dashboard' => [
    'theme' => [
        // Default true; set to false to revert to the always-light look.
        'enabled' => env('QUEUE_INSIGHTS_DARK_MODE', true),
    ],
    'clock' => [
        // Default true. Tri-state header control (12h / auto / 24h) persisted
        // client-side via localStorage['qi-clock']. `auto` follows browser locale.
        'enabled' => env('QUEUE_INSIGHTS_CLOCK_TOGGLE', true),
    ],
    'redis_memory' => [
        // Default false. Opt-in 7th headline tile summing MEMORY USAGE across
        // every key under `key_prefix`. SCAN cost scales with keyspace size —
        // measure before enabling on multi-thousand-key hosts.
        'enabled' => env('QUEUE_INSIGHTS_REDIS_MEMORY_TILE', false),
        'cache_ttl' => 60,
    ],
],
```

Operators on system-dark hosts (terminal, IDE, Linear) get a coherent dark dashboard; operators on light hosts see the same look they had before. Disable via `QUEUE_INSIGHTS_DARK_MODE=false` in `.env` if needed — the inline script, color-scheme meta, and toggle component all skip emission and the dashboard reverts to the pre-feature always-light rendering.

### Custom payload sanitizer

[](#custom-payload-sanitizer)

The default `KeyRedactingSanitizer` can't see inside PHP-serialized `data.command` bodies. Apps with sensitive jobs should bind their own:

```
// app/Providers/AppServiceProvider.php
use SanderMuller\QueueInsights\Contracts\PayloadSanitizer;

$this->app->bind(PayloadSanitizer::class, YourSanitizer::class);
```

Running workers
---------------

[](#running-workers)

`php artisan queue-insights:work` is a thin parent supervisor that reads `queue-insights.snapshots`, groups entries by connection, and spawns one `queue:work` subprocess per connection with `--queue=q1,q2,...` (Laravel's built-in priority list). For hosts running Laravel Horizon, use `horizon` instead — `queue-insights:work` is the alternative for projects without Horizon. (Horizon autodiscovery feeds the *dashboard*, but does not start workers from `horizon.environments` via this command.)

```
# Boot every monitored connection. One process per (connection, queue list).
php artisan queue-insights:work

# Restrict to one connection, e.g. when running per-connection systemd units.
# Both forms compose; they accept repeated flags AND comma-separated values.
php artisan queue-insights:work --connection=sqs
php artisan queue-insights:work --connection=sqs,redis
php artisan queue-insights:work --connection=sqs --connection=redis

# All `queue:work` flags forward verbatim to every child.
php artisan queue-insights:work --tries=5 --timeout=90 --memory=256 --max-jobs=1000
```

The supervisor owns argv assembly + signal forwarding + exit-code propagation. SIGTERM/SIGINT/SIGQUIT received by the parent are forwarded to every live child; after `queue-insights.work.shutdown_grace_seconds` (default 120) any survivors get SIGKILL with a stderr warning. Parent exit code is the **first** non-zero child's, or `128 + signum` for signal-initiated stops (Bash convention — lets systemd / supervisord distinguish operator-stop from supervisor-crash).

Output is line-prefixed with `[{connection}]` so `journalctl` / `docker logs` consumers can `grep` by connection without log shipping.

### Non-goals

[](#non-goals)

This is **not** a Horizon replacement. The command is intentionally bounded to "one command, every monitored queue, one process group." Out of scope:

- **Auto-restart on crash** — host process manager owns liveness (systemd `Restart=on-failure`, supervisord, docker `restart: unless-stopped`).
- **Worker pool sizing / autoscaler** — one process per connection. Operators who want N workers per connection run N units with `--connection=X`.
- **Worker-liveness Redis keys + dashboard panel** — the existing `snapshot_command_dead` watchdog covers the snapshotter; no `qi:workers:*` heartbeat.
- **Cross-connection priority** — not possible while children are separate processes. Within-connection priority works (comma-list `--queue=q1,q2,q3`).
- **Per-queue flag overrides** — every child gets the same `--tries`, `--timeout`, etc. Per-queue sizing requires separate `--connection=X` units.

### Runtime requirements

[](#runtime-requirements)

- Requires the `pcntl` extension. POSIX hosts without it (and Windows generally) refuse to boot — the supervisor would otherwise orphan its children on shutdown.
- `queue:restart` works transparently — children share Laravel's global `illuminate:queue:restart` cache key reader.
- Pre-deploy ritual is unchanged: run `php artisan queue:restart` after a deploy, every child picks it up independently.

### `shutdown_grace_seconds` tuning

[](#shutdown_grace_seconds-tuning)

The default 120s covers `--timeout=60` + 20s SQS long-poll + headroom. The window must be **strictly greater than** the largest child `--timeout` plus driver poll latency (SQS long-poll = 20s, redis BLPOP up to 5s) — otherwise SIGKILL races a still-draining job. Bump it if you raise `--timeout`.

```
// config/queue-insights.php
'work' => [
    'shutdown_grace_seconds' => 120,
],
```

Ops runbook
-----------

[](#ops-runbook)

### Console commands

[](#console-commands)

CommandWhen to run`queue-insights:snapshot`Auto-registered every minute on Laravel's scheduler when `schedule.enabled=true` (default). Captures depth / in-flight / delayed for every queue in `snapshots[]`. Run manually for one-off captures or when you've opted out of auto-registration.`queue-insights:work`Long-running supervisor that boots one `queue:work` per `snapshots[]` connection. Use when not running Horizon. See [Running workers](#running-workers).`queue-insights:purge-pending {connection} {queue}`One-shot cleanup of orphan pending entries on a single (connection, queue) pair (workers that crashed mid-pickup, raw `Queue::push()` outside Laravel's event flow). Default dry-run; pass `--force` to mutate. Refuses to scrub the live default queue unless `--allow-live-queue` is set. Not online-safe — quiesce dispatch first.`queue-insights:migrate-aliases`One-shot migration after publishing `connection_aliases` — rewrites pending/inflight zsets onto the canonical name without waiting for `pending.ttl_seconds` to drain. Default dry-run; `--force` to mutate. See [Connection aliasing](#connection-aliasing).`queue-insights:prometheus-push`One-shot collect + PUT to a Pushgateway, for short-lived workers / CLI scripts. See [Push gateway](#push-gateway-short-lived-workers--cli).`queue-insights:schedule:list`Print the captured scheduler-task snapshot table. Read-only. Requires `scheduler.enabled`.`queue-insights:schedule:sweep`Detect missed + hung scheduler runs; dispatch their typed events. Auto-registered on Laravel's scheduler with `->everyMinute()->onOneServer()->withoutOverlapping()` when `scheduler.enabled` and `scheduler.sweeper.enabled` are both true. Run manually for one-off sweeps.### Dashboard signals

[](#dashboard-signals)

SignalMeaning`—` on in-flight / delayedDriver can't produce the metric (Null / sync), or the live cache expired (&gt;90s since the last successful snapshot).`stale` badgeNo snapshot ran in the last 2 minutes.`error` badgeLast snapshot run failed for this queue. Hover for the error message (10-minute TTL).`no snapshot yet`The command has never completed successfully against this queue.### Driver-specific quirks

[](#driver-specific-quirks)

- SQS values are AWS approximations. `GetQueueUrl` is cached for 1h in Redis; the first run per new queue name costs one extra API call.
- Redis reads `LLEN queues:{name}` plus `ZCARD` on `:reserved` and `:delayed`. Matches Laravel's own queue key convention.
- Database depth includes rows whose reservation has expired (crashed workers leave their jobs poppable again). Matches `DatabaseQueue::getNextAvailableJob()` exactly.

### Key-prefix strategies

[](#key-prefix-strategies)

- Shared Redis (multi-tenant, or multiple apps or envs on the same Redis): keep the default `QUEUE_INSIGHTS_KEY_PREFIX=qm:{APP_ENV}:`. Safe against collision.
- Dedicated Redis: override to `QUEUE_INSIGHTS_KEY_PREFIX=qm:` to drop the env segment and shorten every key.

### Redis Cluster

[](#redis-cluster)

Queue Insights issues multi-key Lua scripts and pipelines (atomic counter pairs, the pending → in-flight transition, batched dashboard reads). Redis Cluster rejects any multi-key command whose keys span hash slots with `CROSSSLOT` — so on a cluster-mode Redis those writes silently fail (the listeners catch and log; the dashboard reads error).

To run against Redis Cluster:

1. **Set `QUEUE_INSIGHTS_REDIS_CLUSTER=true`.** This wraps `key_prefix` in a Redis hash tag (`{qm:env:}…`), so every key the package writes hashes to a single slot and multi-key ops become CROSSSLOT-legal. If you have already placed your own `{…}` tag in `QUEUE_INSIGHTS_KEY_PREFIX`, it is left as-is.
2. **Configure the matching connection as a real cluster connection** in `config/database.php` (a `clusters` block, or `options.cluster`) so the client follows `MOVED` redirects — a plain connection pointed at a cluster endpoint will not. See [UPGRADING.md](UPGRADING.md) for a copy-paste `clusters` block.

Trade-off: hash-tag pinning co-locates the entire Queue Insights keyspace on **one** cluster slot — i.e. one node. That is intentional and fine for a bounded observability keyspace (capped streams, TTL'd keys), but it means Queue Insights does not shard across the cluster. If that keyspace is large enough to matter, point `redis_connection` at a standalone (non-cluster) Redis instead.

Alerting
--------

[](#alerting)

Enable via `QUEUE_INSIGHTS_ALERTS_ENABLED=true`. Nine detectors run every snapshot tick (≈ every minute) against live Redis state:

RuleScopeFires when`depth`per-queue`live:depth` ≥ a configured threshold`stalled`per-queuedepth ≥ `min_depth` AND no worker pickups in `idle_seconds``oldest_pending`per-queuethe oldest runnable pending job has been waiting `seconds` (skips not-yet-due delayed jobs)`stuck_inflight`per-queuethe longest-running in-flight job has been executing `seconds``failure_rate`per-class`failed / (processed + failed)` ≥ `ratio` over the **current hour bucket** AND total ≥ `min_jobs``slow_p95`per-classper-class p95 duration ≥ `class_threshold_ms[$class]` (opt-in per class)`snapshot_errored`per-queuethe snapshot driver threw on the most recent tick (auto-clears on next success / 10-min TTL)`backlog_growing`per-queueleast-squares depth slope over the recent samples ≥ `min_slope_per_minute` (opt-in, warms up after `min_samples` samples)`connection_drift`globalpending rows present under a Laravel queue connection that isn't the configured canonical for that queue (opt-in, default off — see [Connection aliasing](#connection-aliasing))A dashboard-only watchdog (`snapshot_command_dead`) renders a top-level red banner when `live:depth` keys are absent for every configured queue — i.e. the snapshot command itself has been silent for ≥ 90 s.

Cooldown applies to **outbound notifications only** (key: `alert:cooldown:{rule}:{c}:{q}`, TTL `cooldown_seconds`). The dashboard always reflects live state.

### Alert on individual job failures (`job_failed`)

[](#alert-on-individual-job-failures-job_failed)

The nine rules above are poll-driven. `job_failed` is a tenth, **event-driven** rule (opt-in, default off): it fires once on a job's **final** failure (Laravel's `JobFailed` — i.e. retries exhausted), the same trigger as [spatie/laravel-failed-job-monitor](https://github.com/spatie/laravel-failed-job-monitor) — so you don't need both. On top of a bare per-failure ping it adds per-class **cooldown**, **silencing** (`queue-insights.silenced`), and the same multi-channel routing (Slack / mail / Sentry / log) as every other rule. Because the only signal is the event, it works on **any** queue driver — no Redis snapshot required.

```
'job_failed' => ['enabled' => true, 'severity' => 'warning', 'notify' => true],
```

A typed `SanderMuller\QueueInsights\Events\JobFailedAlert` event is dispatched (cooldown-gated, silencing-filtered) carrying the job class, connection, queue, uuid, and the live exception — subscribe to forward it anywhere.

**`vs failure_rate`:** pick `job_failed` for "tell me about every failure", `failure_rate` for "tell me when a class is failing *a lot*" (a ratio over the hour bucket). They're complementary; enabling both gives an alert per incident *and* a trend alert.

Important

Unlike the poll-driven rules (which notify from the snapshot command), `job_failed` notifies **synchronously inside the worker**. With Slack/mail/Sentry enabled, the first failure of each class per cooldown window blocks the worker on that network call. For high-failure-volume apps, set `'notify' => false` to keep the `JobFailedAlert` event firing while skipping the package's synchronous channels, and dispatch your own queued notification from a listener.

### Config example

[](#config-example)

```
// config/queue-insights.php
'alerts' => [
    'enabled' => env('QUEUE_INSIGHTS_ALERTS_ENABLED', false),
    'cooldown_seconds' => 900,

    'rules' => [
        'depth' => [
            'enabled' => true,
            // Multiple thresholds matching the same (connection, queue) →
            // highest matching severity wins per tick.
            'thresholds' => [
                ['connection' => 'sqs', 'queue' => 'work', 'depth' => 1000, 'severity' => 'warning'],
                ['connection' => 'sqs', 'queue' => 'work', 'depth' => 5000, 'severity' => 'critical'],
            ],
        ],
        'stalled' => ['enabled' => true, 'idle_seconds' => 120, 'min_depth' => 1, 'severity' => 'critical'],
        'oldest_pending' => ['enabled' => true, 'seconds' => 600, 'severity' => 'warning'],
        'stuck_inflight' => ['enabled' => true, 'seconds' => 300, 'severity' => 'warning'],
        'failure_rate' => ['enabled' => true, 'min_jobs' => 20, 'ratio' => 0.10, 'severity' => 'warning'],
        // Per-job failure alert (event-driven, opt-in). See "Alert on
        // individual job failures" below. `notify => false` keeps the
        // JobFailedAlert event but skips this rule's package channels.
        'job_failed' => ['enabled' => false, 'severity' => 'warning', 'notify' => true],
        'slow_p95' => [
            'enabled' => false,
            'class_threshold_ms' => ['App\\Jobs\\GenerateReport' => 30_000],
            'severity' => 'warning',
        ],
        'snapshot_errored' => ['enabled' => true, 'severity' => 'warning'],
        'backlog_growing' => [
            'enabled' => false,
            'min_slope_per_minute' => 50.0,
            'min_samples' => 5,
            'severity' => 'warning',
        ],
        'connection_drift' => ['enabled' => false, 'severity' => 'warning'],
    ],

    'channels' => [
        'log' => ['enabled' => true, 'level' => 'warning'],
        'slack' => ['enabled' => false, 'webhook_url' => env('QUEUE_INSIGHTS_SLACK_WEBHOOK')],
        'mail' => ['enabled' => false, 'to' => ['ops@example.com']],
        'sentry' => ['enabled' => false],
    ],
],
```

> **Heads up — `oldest_pending` / `stuck_inflight` need pending tracking.**Both detectors read `pending-zset:*` / `inflight-zset:*` populated by the `RecordJobQueued` / `RecordJobProcessing` listeners. With `pending.enabled = false` they short-circuit at runtime and a one-off boot warning lists which rules were tripped. Either re-enable pending tracking or disable those rules.

### Notification channels

[](#notification-channels)

The package ships four channels out of the box:

- **`log`** — zero-dep, on by default; one structured log line per issue at the configured level (`alerts.channels.log.level`).
- **`slack`** — `Http::post` to a Slack-compatible incoming webhook (works with Slack, Mattermost, Rocket.Chat). Block Kit payload with severity-coloured attachment; falls back to plain `text` if the receiver rejects Block Kit. Set `QUEUE_INSIGHTS_SLACK_WEBHOOK` and `alerts.channels.slack.enabled = true`. `QUEUE_INSIGHTS_SLACK_CHANNEL` (queue alerts) and `QUEUE_INSIGHTS_SCHEDULER_SLACK_CHANNEL` (scheduler alerts) are optional display labels surfaced in the dashboard's alert-rules panel — they don't override the webhook's destination, since Slack incoming-webhooks bind the channel server-side at creation time.
- **`mail`** — uses Laravel's first-party mail channel; subject prefix `[Queue Insights] {severity}: {rule} on {target}`. Recipients from `alerts.channels.mail.to` (array of addresses).
- **`sentry`** — captures each issue into your application's existing Sentry project as a grouped event. No DSN config here: the channel uses whatever Sentry hub the host has initialised. Recommended setup is [`sentry/sentry-laravel`](https://github.com/getsentry/sentry-laravel) with `SENTRY_LARAVEL_DSN` set (any initialised `sentry/sentry` hub works too); then set `alerts.channels.sentry.enabled = true`. Severity maps fixed — `critical → error`, `warning → warning` — and events fingerprint per `[queue-insights, rule, target]` so Sentry groups one issue per rule+target instead of opening a new one each snapshot tick. Tags (`queue_insights.rule`/`severity`/`connection`/`queue`/`job_class`) and the full issue context (as a `queue-insights` context block) ride along.

`slack`, `mail`, and `sentry` feature-detect their underlying dependency (`Illuminate\Http\Client\Factory`, `mail.manager`, and — for sentry — a *bound* Sentry hub client, not merely the loaded SDK) — if it's missing the channel is silently skipped, and the dashboard's alert-rules panel shows the reason (sentry's row reads `SDK not installed` when the package is absent, or `hub not configured` when the SDK is present but no DSN/hub is initialised). Because sentry requires a live client, a misconfigured scheduler-sentry-only setup falls back to the queue-side channels rather than dropping the alert.

### Adding more channels (Discord, Teams, PagerDuty, Telegram, …)

[](#adding-more-channels-discord-teams-pagerduty-telegram-)

The package emits a `SanderMuller\QueueInsights\Alerts\Notifications\QueueAlertNotification` and routes it through `SanderMuller\QueueInsights\Alerts\Notifications\QueueInsightsNotifiable`, exactly as Spatie's alerting packages and Horizon do. To add a destination:

1. Install the matching `laravel-notification-channels/*` package (`discord`, `microsoft-teams`, `pagerduty`, `telegram`, `vonage`, …).
2. Extend `QueueAlertNotification` to add the channel to `via()` and a `to{Channel}()` method, OR override `QueueInsightsNotifiable` and add `routeNotificationFor{Channel}()`.
3. Bind your override in your `AppServiceProvider`:

    ```
    $this->app->bind(QueueAlertNotification::class, MyQueueAlertNotification::class);
    $this->app->bind(QueueInsightsNotifiable::class, MyNotifiable::class);
    ```

### Typed events (always fire)

[](#typed-events-always-fire)

Each rule fires a typed event regardless of which channels are enabled — host apps can hook `Event::listen(...)` for custom routing:

- `QueueDepthExceeded` (existing — added trailing nullable `?string $severity`)
- `QueueStalled`, `OldestPendingAging`, `StuckInFlight`, `SnapshotErrored`
- `JobClassFailureRateExceeded`, `JobClassP95Exceeded`
- `BacklogGrowing`

### Active-rules panel

[](#active-rules-panel)

The dashboard footer renders a read-only summary of `alerts.rules` + `alerts.channels` so operators can verify what's monitored without SSH'ing into the server. Edit the config file to change anything — there is no runtime mutation surface.

### Migrating from the 0.x `alerts.thresholds` shape

[](#migrating-from-the-0x-alertsthresholds-shape)

The pre-1.0 config exposed a single flat `alerts.thresholds` list. It is still honoured (legacy wins over `alerts.rules.depth.thresholds`) and emits a one-off boot warning. To migrate:

```
 'alerts' => [
     'enabled' => true,
     'cooldown_seconds' => 900,
-    'thresholds' => [
-        ['connection' => 'sqs', 'queue' => 'work', 'depth' => 1000],
-    ],
+    'rules' => [
+        'depth' => [
+            'enabled' => true,
+            'thresholds' => [
+                ['connection' => 'sqs', 'queue' => 'work', 'depth' => 1000, 'severity' => 'warning'],
+            ],
+        ],
+    ],
 ],
```

Note: Laravel's `mergeConfigFrom` is a shallow merge, so hosts that published `config/queue-insights.php` before this version will not pick up the new nested defaults under `alerts.rules.*` automatically — copy the new keys from the package config when migrating.

### Silencing noisy jobs

[](#silencing-noisy-jobs)

Mirrors Horizon's `horizon.silenced` knob: list job-class FQCNs whose **failures** should be suppressed from the dashboard's Failed list, the headline failed-tile, the throughput sparkline's failed series, the `failure_rate` alert detector, and outbound notifications.

```
'silenced' => [
    App\Jobs\IntermittentlyFailingJob::class,
    App\Jobs\ThirdPartyApiSometimesFlakes::class,
],

// Glob fallback for whole namespaces or related classes. Exact `silenced`
// entries are matched first; `silenced_patterns` is `Str::is`-style and
// matches case-insensitively, same as `silenced`.
'silenced_patterns' => [
    'App\\Jobs\\Reports\\*',
    'App\\Jobs\\*Sync',
],
```

Counter writes (`qi:processed:{class}:{bucket}`, `qi:failed:{class}:{bucket}`, `qi:classes`) are preserved — silencing is a read-side filter only, so removing a class from the list immediately re-surfaces its history without any backfill. The class rows table keeps showing throughput / p95 / max for silenced classes with a muted `silenced` badge so you can still triage them.

SurfaceBehaviour under silencingFailed list (Failed tab)Hidden by default. The "Show silenced" checkbox on the failed-pane filter form reveals them; URL-shareable as `?fs=1`.Headline `failed_past_hour` + throughput sparkline failed seriesSilenced classes excluded. Processed series stays exact.`failure_rate` alert detectorReturns null for silenced classes — no event, no notification, no cooldown burned.`slow_p95` alert detectorUnchanged — silencing is a failure-noise filter, not a perf filter. Exclude noisy classes from `class_threshold_ms` if you want their perf alerts muted too.Class rows tableRow stays, marked with a muted `silenced` badge inline next to the FQCN. Operators still see throughput / p95 / max for silenced classes.Modal-by-uuid + chain-lineage click-through + batch-detail itemsNOT filtered. Silencing is a list-level filter; uuid-addressed lookups always resolve so a batched member or chain parent stays clickable.`qi:failed:{class}:{bucket}` Redis counters + `qi:classes` zsetStill written by the listeners. Silencing is reversible without losing history.The bulk-retry uuid collector inherits the same SQL exclusion path — bulk-retry actions on the default-filter view never queue silenced classes for retry. Toggle "Show silenced" first if you want them in the bulk set.

#### Horizon-silenced jobs

[](#horizon-silenced-jobs)

When `laravel/horizon` is installed, entries from Horizon's own `config('horizon.silenced')` are automatically merged into the same filter set — operator-edited `config/horizon.php` entries and upstream packages writing to it at boot (e.g. [spatie/laravel-health](https://github.com/spatie/laravel-health)'s `silence_health_queue_job` flag, which adds `Spatie\Health\Jobs\HealthQueueJob`) take effect without a duplicate `queue-insights.silenced` entry. Merge is read-only; we never write back to Horizon's config.

Horizon supervisor auto-discovery
---------------------------------

[](#horizon-supervisor-auto-discovery)

When `laravel/horizon` is installed, the dashboard Queues panel + pending/in-flight aggregation surface every Horizon supervisor's `{connection, queue}` from `horizon.environments` without hand-listing each one under `snapshots[]`. Static snapshot entries still win on collision (deduped on canonical `{connection, queue}`). Resolution mirrors Horizon's own `ProvisioningPlan` — `Str::is` glob match on env keys, recursive merge with `horizon.defaults` for supervisors that only override `processes`/`tries`/`balance`.

```
// config/queue-insights.php — defaults (no operator action needed for most hosts):
'horizon' => [
    'autodiscover' => env('QUEUE_INSIGHTS_HORIZON_AUTODISCOVER', true),
    'environment' => env('QUEUE_INSIGHTS_HORIZON_ENV'), // null = app()->environment()
],
```

`autodiscover` is tri-state:

ValueBehaviour`false`Never autodiscover — static `snapshots[]` only.`true` (default)Autodiscover **only when Horizon's service provider is loaded** in the running app — i.e. Horizon is actually the queue runtime here.`'force'`Autodiscover from `config/horizon.php` regardless of whether the provider is loaded.The `true` gate matters on Vapor and similar setups: Horizon may be *configured* (`config/horizon.php` defines supervisors) while jobs actually run on SQS and Horizon's provider is excluded (`composer.json` `extra.laravel.dont-discover` + conditional registration). There, `true` correctly skips those supervisor queues — they'd never receive a snapshot. Set `QUEUE_INSIGHTS_HORIZON_AUTODISCOVER=force` if you genuinely want config-derived rows without the provider loaded. Set `QUEUE_INSIGHTS_HORIZON_ENV` when running multiple Horizon environments off the same Laravel `APP_ENV`.

When `autodiscover='force'` is set but the Horizon provider isn't loaded in the running app, the dashboard surfaces a top-level "Horizon not running" banner so operators don't read empty supervisor rows as a healthy state.

Connection aliasing
-------------------

[](#connection-aliasing)

Use when one physical queue store is reached via multiple Laravel queue connection names — e.g. a `redis` connection for dispatchers + a `redis-staging` connection for Horizon workers, both pointing at the same Redis DB. Without aliasing, `JobQueued::$connectionName` (the dispatcher's name) and `JobProcessing::$connectionName` (the worker's name) diverge across every connection-keyed keyspace (pending/inflight zsets, per-class rosters + counters, Prometheus labels). Pending rows orphan; the dashboard panel scoped to the worker connection shows zero pending for a queue that's actively draining.

Publish the alias map to collapse both sides onto a canonical name:

```
// config/queue-insights.php
'connection_aliases' => [
    'redis' => 'redis-staging',
    'redis-staging' => 'redis-staging',
],
```

Rules (enforced by `ConfigValidator::validateConnectionAliases`):

- identity mappings (`A => A`) allowed
- transitive chains (`A => B, B => C, B !== C`) rejected — flatten manually
- mutual cycles (`A => B, B => A`) rejected

Affects every connection-keyed Redis key and the `connection` label on every Prometheus metric. See [`UPGRADING.md`](UPGRADING.md) for the Prometheus relabel rule.

Prometheus
----------

[](#prometheus)

Enable via `QUEUE_INSIGHTS_PROMETHEUS_ENABLED=true`. Mounts `GET /metrics` (path configurable) exposing queue-insights state in Prometheus 0.0.4 text format — or OpenMetrics 1.0.0 when the scraper sends `Accept: application/openmetrics-text` (Prometheus negotiates this automatically). Default-off; adoption is opt-in.

Auth is **fail-closed**: the package's default middleware refuses with `403` unless `prometheus.token` (preferred for shared infra) or `prometheus.allow_ips` (CIDR list) is configured. There is no silent open default.

```
# .env
QUEUE_INSIGHTS_PROMETHEUS_ENABLED=true
QUEUE_INSIGHTS_PROMETHEUS_TOKEN=long-random-string
```

```
# prometheus.yml
scrape_configs:
  - job_name: laravel-queue-insights
    metrics_path: /metrics
    bearer_token: long-random-string
    static_configs:
      - targets: ['app.example.com']
```

### Metric catalogue

[](#metric-catalogue)

MetricTypeLabelsNotes`queue_insights_queue_depth`gauge`connection`, `queue`Mirrors snapshot loop output. Pair with `queue_insights_snapshot_alive`.`queue_insights_inflight_jobs`gauge`connection`, `queue``ZCARD inflight-zset`.`queue_insights_pending_jobs`gauge`connection`, `queue`Runnable now (`available_at  now`).`queue_insights_oldest_pending_age_seconds`gauge`connection`, `queue`0 when empty.`queue_insights_oldest_inflight_age_seconds`gauge`connection`, `queue`0 when empty.`queue_insights_jobs_processed_total`counter`class`, `connection`True monotonic INCR — safe for `rate()` / `increase()`.`queue_insights_jobs_failed_total`counter`class`, `connection`Same.`queue_insights_job_duration_count_total`counter`class`, `connection`Mean = `rate(sum) / rate(count)` Prometheus-side.`queue_insights_job_duration_sum_seconds_total`counter`class`, `connection`Seconds (HINCRBY `sum_ms` ÷ 1000).`queue_insights_job_duration_max_seconds`gauge`class`, `connection`Lifetime max. Use `max_over_time()` for windowed maxima.`queue_insights_alert_active`gauge`rule`, `connection`, `queue`, `severity` (+ `class` for class-scoped rules)Always 1 when present; absent series = no alert. Use `OR on() vector(0)` Grafana-side to render gaps as 0.`queue_insights_snapshot_alive`gauge`connection`, `queue`1/0. **Use this in alerts**, not `_age_seconds`.`queue_insights_snapshot_age_seconds`gauge`connection`, `queue`**Omitted** when the snapshot key is absent (so alerts can use `absent(...)` cleanly instead of clamping to 0).`queue_insights_snapshot_errors_total`counter`connection`, `queue`Monotonic INCR — paired with the existing 10-min `snapshot:error:*` boolean.`queue_insights_exporter_collect_duration_seconds`gauge(none)Wall-clock seconds of the previous collect cycle.Per-class metrics (`*_processed_total`, `*_failed_total`, duration aggregates) are **opt-in by class** to bound cardinality. Default `class_filter.mode = allow_list` with empty `classes` → no per-class metrics emitted. Three modes:

```
// config/queue-insights.php
'prometheus' => [
    'class_filter' => [
        // 'allow_list'        — only emit for the FQCNs in `classes` (DEFAULT)
        // 'allow_all'         — emit for every class on classes:{connection}
        // 'top_n_by_recency'  — top N most-recently-seen per connection (recency, NOT throughput)
        'mode' => 'allow_list',
        'classes' => [
            App\Jobs\GenerateReport::class,
            App\Jobs\SyncCustomer::class,
        ],
        'top_n' => 50,
    ],
],
```

A two-tier cache (per-request memoise + 5 s Redis cache, key `prom:cache:rendered:{flavour}`) bounds thunder-herd when multiple Prometheus replicas scrape concurrently. Set `prometheus.cache_ttl_seconds = 0` to disable both layers for instant reads.

Each metric family has its own toggle under `prometheus.metrics.*` (default-on) — disable any family the host doesn't need to keep the scrape body lean.

### Scheduler metrics

[](#scheduler-metrics)

When `scheduler.enabled = true` AND each per-family toggle below is set, the exporter emits scheduler-side families. **Default OFF** — adoption is opt-in per family (mirrors the per-class queue metrics stance).

MetricTypeLabelsNotes`queue_insights_scheduled_task_runs_total`counter`task`, `status`Status: `success` (= `total_runs - total_failed`), `failed`, `skipped`. Hung + missed are separate families below.`queue_insights_scheduled_task_runtime_sum_seconds_total`counter`task`Lifetime runtime sum, seconds. Pair with `queue_insights_scheduled_task_runs_total` for mean: `rate(sum) / rate(runs_total{status=~"success|failed"})`. Sample omitted until the first finished run.`queue_insights_scheduled_task_last_run_timestamp`gauge`task`, `status`Unix ts (seconds) of last run per status. Page on `time() - queue_insights_scheduled_task_last_run_timestamp{status="success"} > N`. Sample omitted when no run of that status exists.`queue_insights_scheduled_task_hung_total`counter`task`Detections from `HungTaskReconciler`.`queue_insights_scheduled_task_missed_total`counter`task`Detections from `MissedRunReconciler`.`queue_insights_scheduled_task_in_flight`gauge`task`1 when the task is mid-run (Started without Finished/Failed). Sample omitted when not running.`queue_insights_scheduled_snapshot_age_seconds`gauge(none)Seconds since the schedule snapshot was last rewritten on app boot. **Omitted when never written** (alerts use `absent(...)` cleanly).`queue_insights_scheduled_sweeper_age_seconds`gauge(none)Seconds since `MissedRunReconciler` last completed a tick. Alert on `> 2 × sweeper.sweep_seconds`.Toggle each family independently:

```
'prometheus' => [
    'metrics' => [
        // ...
        'scheduler_runs_total' => true,
        'scheduler_runtime_sum' => true,
        'scheduler_last_run_timestamp' => true,
        'scheduler_hung_total' => true,
        'scheduler_missed_total' => true,
        'scheduler_in_flight' => true,
        'scheduler_snapshot_age' => true,
        'scheduler_sweeper_age' => true,
    ],

    // Per-task cardinality control. Task rosters are typically  [
        'mode' => 'allow_all',  // | allow_list
        'tasks' => [],          // taskKey list, used when mode = allow_list
    ],
],
```

`runtime_max_seconds` is intentionally NOT shipped in v1 — would need a Lua HSET-IF-GREATER write path. Operators who need lifetime max can compute `max_over_time(queue_insights_scheduled_task_runtime_sum_seconds_total[N])` Prometheus-side as a coarse proxy, or run the per-task duration sparkline in the dashboard for exact values.

### Push gateway (short-lived workers / CLI)

[](#push-gateway-short-lived-workers--cli)

For processes that exit before any scrape can land, `php artisan queue-insights:prometheus-push` does a one-shot collect + PUT to a configured Pushgateway. Long-running workers should be **scraped, not pushed** — push-mode is for CLI scripts and scheduled tasks where pull-mode can't reach the process.

```
# .env
QUEUE_INSIGHTS_PUSHGATEWAY_URL=https://pushgateway.example/metrics
QUEUE_INSIGHTS_PUSHGATEWAY_JOB=laravel-queue-insights
QUEUE_INSIGHTS_PUSHGATEWAY_INSTANCE=worker-01   # required for clustered hosts
```

The command **fails closed** when `pushgateway.instance` is unset and `--accept-shared-grouping` is not passed: clustered hosts that share a `job` label without distinct `instance` values silently overwrite each other's pushed metrics. Pass `--accept-shared-grouping` once you've confirmed single-replica semantics, or set `instance` per-replica.

```
php artisan queue-insights:prometheus-push                           # PUT metrics
php artisan queue-insights:prometheus-push --delete                  # DELETE the grouping
php artisan queue-insights:prometheus-push --accept-shared-grouping  # opt out of the instance guard
```

Exit codes mirror Symfony Console convention: `0` success, `1` Pushgateway HTTP failure, `2` config error (missing URL / unset instance without override).

Scheduler observability
-----------------------

[](#scheduler-observability)

Enable via `QUEUE_INSIGHTS_SCHEDULER_ENABLED=true`. Off by default — existing queue-insights users opt in.

When on, the package listens on Laravel's `Illuminate\Console\Events\Scheduled*` events and records:

- **Per-task definition snapshots** — cron expression, command summary, queue connection, `runInBackground`, `withoutOverlapping`, `onOneServer`. Snapshot is hash-stable; a `php artisan schedule:list`-style render is rebuilt from these.
- **Per-run records** — `Starting`, `Finished` (exit code + runtime), `Failed` (exception class + message), `Skipped` (reason), `BackgroundTaskFinished` (parent process exits before the child; the run is closed off the running pointer). Output capture is configurable: `off` / `metadata` (exit code only) / `full` (stdout/stderr after the bound `PayloadSanitizer` pass + byte cap).
- **Counters + 24h aggregates** — per-task processed / failed / skipped / hung / missed counts and rolling p95 runtime.

```
# .env
QUEUE_INSIGHTS_SCHEDULER_ENABLED=true
QUEUE_INSIGHTS_SCHEDULER_CAPTURE=metadata   # off | metadata | full
QUEUE_INSIGHTS_SCHEDULER_ALERTS_ENABLED=false
```

### Dashboard panel

[](#dashboard-panel)

When the dashboard is mounted and `scheduler.dashboard.enabled = true`, a lazy-loaded **Scheduled tasks** panel renders below the queue panes. Empty-state copy guides first-time hosts; the panel hides itself when scheduler observability is disabled. Gate via the existing `viewQueueInsights` ability, or define a narrower `viewScheduleInsights` Gate to gate scheduler reads independently.

Click a row in the **Tasks** card to open the per-task drilldown — cron expression + flag pills, 24h tile grid, host-distribution bar (suppressed for single-host tasks), recent-runs table scoped to the task. Click a row in **Recent runs** to open the per-run drilldown — exception block (failed runs), output viewer (full-capture only; closure tasks render an "output capture not supported" hint), skip-reason explainer, correlated-jobs section listing every job uuid the run dispatched (click-through opens the queue-side modal). Both modals are URL-bound (`?s_tk=` + `?s_rid=`) so deep-links round-trip; aged-out runs render an "Expired" empty state. A markdown-export copy button on the run modal hands the full context to AI agents or trackers.

The package rebuilds the snapshot on `app->booted` from the live `Schedule::events()`. Hosts that pre-seed the snapshot keys themselves (custom import script, fixture seeder, etc.) can opt out with `QUEUE_INSIGHTS_SCHEDULER_SNAPSHOT_REBUILD=false` to keep their own data on every boot.

### CLI

[](#cli)

```
php artisan queue-insights:schedule:list    # snapshot table: cron, command, last run, counters
php artisan queue-insights:schedule:sweep   # one-off sweep: flag missed + hung runs, dispatch events
```

Run the sweep on its own short cron (`* * * * *`) — the sweeper's own work is detect-only; it does not poll Redis on hot-path tick events.

### Missed + hung detection

[](#missed--hung-detection)

A run is **missed** when the cron expression's next-fire timestamp passes without a `Starting` event landing inside `sweeper.drift_seconds` (default 90 s). A run is **hung** when no `Finished` / `Failed` event arrives within `expected_runtime + hung.grace_seconds` (default 300 s); expected runtime is the rolling p95 from aggregates and falls back to `grace_seconds` alone for tasks with fewer than `hung.min_runs_for_p95` (default 10) recorded runs.

When `scheduler.alerts.enabled = true`, missed/hung/failed detections dispatch typed events with per-`(taskKey, rule)` cooldown (`scheduler.alerts.cooldown_seconds`, default 900). Cooldown gates the **event dispatch itself** — when an alert is suppressed by cooldown, no event fires. Host listeners on `ScheduledTaskFailed` / `Missed` / `Hung` therefore only see the leading edge of an alerting condition; subsequent ticks within the cooldown window are silent until cooldown expires.

Notifications additionally require the package-wide `alerts.enabled` master switch to be on — typed events fire under `scheduler.alerts.enabled` alone, but log / slack / mail emission is gated on **both** flags so a host running with `alerts.enabled=false` for queue alerts doesn't suddenly start paging on scheduler events after upgrade.

```
SanderMuller\QueueInsights\Events\ScheduledTaskMissed   { taskKey, task, expectedAtMs }
SanderMuller\QueueInsights\Events\ScheduledTaskHung     { taskKey, runId, task?, … }
SanderMuller\QueueInsights\Events\ScheduledTaskFailed   { taskKey, runId, task, … }

```

Scheduler alerts route through the same `QueueAlertNotification` pipeline as queue alerts — `log` / `slack` / `mail` / `sentry` channels, Spatie-style notifiable, host-extensible. Operators get one mental model and one set of channels to wire.

#### Per-domain channel routing

[](#per-domain-channel-routing)

Populate `scheduler.alerts.channels` to send scheduler alerts to a different Slack channel / mail recipient list / log channel. When the scheduler block has at least one channel explicitly enabled, scheduler-scoped issues read it; otherwise they fall back to `alerts.channels`. Single-list installs (only `alerts.channels` populated) Just Work without any extra config:

```
'scheduler' => [
    'alerts' => [
        'enabled' => env('QUEUE_INSIGHTS_SCHEDULER_ALERTS_ENABLED', false),
        'cooldown_seconds' => 900,
        'channels' => [
            'slack' => [
                'enabled' => true,
                'webhook_url' => env('QUEUE_INSIGHTS_SCHEDULER_SLACK_WEBHOOK'),
                'channel' => '#cron-watch',
            ],
        ],
    ],
],
```

Scheduler-scoped Slack payloads carry a `Run URL` field that deep-links into the dashboard's per-run modal (`?s_rid={taskKey}:{runId}`). Missed runs link to the per-task modal (`?s_tk={taskKey}`) instead.

The typed `ScheduledTaskFailed` / `ScheduledTaskMissed` / `ScheduledTaskHung` events keep firing alongside the notification path, so existing host listeners stay wired. The cooldown key namespace moved from `sched:alert:cooldown:*` to `alert:cooldown:scheduled_task_*:task:{taskKey}` for parity with queue-side alerts — see UPGRADING for the one-shot Redis cleanup.

### External heartbeat

[](#external-heartbeat)

In-process detection cannot catch a fully-dead scheduler (`schedule:run` not running at all). The sweeper command **POSTs out** to an operator-supplied heartbeat URL after every successful tick — a Healthchecks.io / Cronitor / Oh Dear / Sentry Crons / Better Stack ping endpoint. Configure the destination URL and the receiving SaaS alerts when posts go silent:

```
'scheduler' => [
    'heartbeat' => [
        'enabled' => true,
        'url' => env('QUEUE_INSIGHTS_SCHEDULER_HEARTBEAT_URL'),
    ],
],
```

Payload is a small JSON body (`host_id`, `timestamp`, `tasks_swept`); the sweeper times the request out at 5 s and logs a warning on failure rather than blocking. The host owns the receiving uptime monitor; the package owns the outbound POST.

### Retention

[](#retention)

Per-run records age out at `scheduler.retention.run_ttl_seconds` (default 7 d). The recent-runs index is capped at `runs_index_max` entries (default 10 000). Per-run job zsets (`qi:sched:run-jobs:{runId}`) are capped at `run_jobs_max` (default 5 000) so a fan-out task that dispatches a very large number of jobs cannot grow the index unbounded — oldest by score evicted first.

Testing
-------

[](#testing)

```
composer test
```

Runs the Pest suite via Orchestra Testbench. `composer qa` additionally runs Rector, Pint, and PHPStan.

Upgrading
---------

[](#upgrading)

See [UPGRADING.md](UPGRADING.md) for migration steps between minor versions. Patch releases never require manual steps.

Changelog
---------

[](#changelog)

See [CHANGELOG.md](CHANGELOG.md) and the [GitHub releases page](https://github.com/SanderMuller/laravel-queue-insights/releases). The changelog is updated automatically on release publish — do not edit by hand.

Contributing
------------

[](#contributing)

Issues and pull requests welcome at [github.com/SanderMuller/laravel-queue-insights](https://github.com/SanderMuller/laravel-queue-insights). Please run `composer qa` and `composer test` before opening a PR.

Security Vulnerabilities
------------------------

[](#security-vulnerabilities)

Please review [our security policy](SECURITY.md) on how to report security vulnerabilities.

Credits
-------

[](#credits)

- [Sander Muller](https://github.com/SanderMuller)
- [All Contributors](https://github.com/SanderMuller/laravel-queue-insights/contributors)

License
-------

[](#license)

MIT. See [`LICENSE`](LICENSE).

###  Health Score

49

—

FairBetter than 94% of packages

Maintenance98

Actively maintained with recent releases

Popularity28

Limited adoption so far

Community9

Small or concentrated contributor base

Maturity49

Maturing project, gaining track record

 Bus Factor1

Top contributor holds 97.8% of commits — single point of failure

How is this calculated?**Maintenance (25%)** — Last commit recency, latest release date, and issue-to-star ratio. Uses a 2-year decay window.

**Popularity (30%)** — Total and monthly downloads, GitHub stars, and forks. Logarithmic scaling prevents top-heavy scores.

**Community (15%)** — Contributors, dependents, forks, watchers, and maintainers. Measures real ecosystem engagement.

**Maturity (30%)** — Project age, version count, PHP version support, and release stability.

###  Release Activity

Cadence

Every ~1 days

Total

25

Last Release

17d ago

### Community

Maintainers

![](https://www.gravatar.com/avatar/abeb7bd51fd77656b4133bdcdf4eca99cc8b495b83ca72f63674ff15f2b72a39?d=identicon)[SanderMuller](/maintainers/SanderMuller)

---

Top Contributors

[![SanderMuller](https://avatars.githubusercontent.com/u/9074391?v=4)](https://github.com/SanderMuller "SanderMuller (271 commits)")[![dependabot[bot]](https://avatars.githubusercontent.com/in/29110?v=4)](https://github.com/dependabot[bot] "dependabot[bot] (6 commits)")

---

Tags

laravelMetricsqueuesqsdashboardhorizonobservability

###  Code Quality

TestsPest

Static AnalysisPHPStan, Rector

Code StyleLaravel Pint

Type Coverage Yes

### Embed Badge

![Health badge](/badges/sandermuller-laravel-queue-insights/health.svg)

```
[![Health](https://phpackages.com/badges/sandermuller-laravel-queue-insights/health.svg)](https://phpackages.com/packages/sandermuller-laravel-queue-insights)
```

###  Alternatives

[laravel/horizon

Dashboard and code-driven configuration for Laravel queues.

4.1k91.3M277](/packages/laravel-horizon)[spatie/laravel-health

Monitor the health of a Laravel application

88011.3M149](/packages/spatie-laravel-health)[laravel/pulse

Laravel Pulse is a real-time application performance monitoring tool and dashboard for your Laravel application.

1.7k14.1M120](/packages/laravel-pulse)[laravel/ai

The official AI SDK for Laravel.

9782.1M153](/packages/laravel-ai)[illuminate/queue

The Illuminate Queue package.

20432.2M1.5k](/packages/illuminate-queue)[tallstackui/tallstackui

TallStackUI is a powerful suite of Blade components that elevate your workflow of Livewire applications.

719160.4k12](/packages/tallstackui-tallstackui)

PHPackages © 2026

[Directory](/)[Categories](/categories)[Trending](/trending)[Changelog](/changelog)[Analyze](/analyze)
