SurvosPastPerfectBundle
=======================

Symfony bundle for harvesting [PastPerfect Online](https://www.pastperfectonline.com/) collections.

Streams listing pages and detail pages into JSONL files with built-in resume support via [survos/jsonl-bundle](https://github.com/survos/jsonl-bundle) sidecars. Raw HTML pages are cached to disk so re-runs are fast.

Site discovery (finding all `*.pastperfectonline.com` tenants) is delegated to [survos/site-discovery-bundle](https://github.com/survos/site-discovery-bundle).

**Requirements:** PHP 8.4+, Symfony 8.0+. Uses `Dom\HTMLDocument` (PHP 8.4 native HTML5 API) — no external DOM library.

---

Installation
------------

```
composer require survos/past-perfect-bundle survos/site-discovery-bundle
```

Register if not using Flex:

```
// config/bundles.php
return [
    Survos\SiteDiscoveryBundle\SurvosSiteDiscoveryBundle::class => ['all' => true],
    Survos\PastPerfectBundle\SurvosPastPerfectBundle::class      => ['all' => true],
];
```

Optional — install `survos/import-bundle` for field profiling and CSV export:

```
composer require survos/import-bundle
```

---

Configuration
-------------

```
# config/packages/survos_past_perfect.yaml
survos_past_perfect:
    throttle:   1.0                          # seconds between uncached HTTP requests
    cache_dir:  var/pastperfect              # raw HTML cache location
    user_agent: "SurvosPastPerfectBundle Harvester"
```
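
How `throttle` and `cache_dir` interact can be illustrated with a minimal sketch. This is not the bundle's implementation: the class name `ThrottledCachedFetcher`, the injected `$fetch` closure, and the `sha1`-based cache file naming are all assumptions made for illustration. The only behaviour taken from this README is that the delay applies to uncached requests and that raw HTML is kept on disk.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical illustration only; not a class shipped by the bundle.
 * Serves cached HTML when available, otherwise waits out the throttle
 * interval, fetches, and stores the response on disk.
 */
final class ThrottledCachedFetcher
{
    private float $lastRequestAt = 0.0;

    public function __construct(
        private readonly string $cacheDir,
        private readonly float $throttleSeconds,
        private readonly \Closure $fetch, // fn(string $url): string, performs the real HTTP GET
    ) {
    }

    public function get(string $url): string
    {
        // Assumption: one file per URL, named by hash. The bundle's layout may differ.
        $path = $this->cacheDir . '/' . sha1($url) . '.html';

        if (is_file($path)) {
            $cached = file_get_contents($path);
            if ($cached !== false) {
                return $cached; // cache hit: no HTTP request, no delay
            }
        }

        // Sleep only for the part of the interval that has not already elapsed.
        $wait = $this->throttleSeconds - (microtime(true) - $this->lastRequestAt);
        if ($wait > 0) {
            usleep((int) round($wait * 1_000_000));
        }

        $html = ($this->fetch)($url);
        $this->lastRequestAt = microtime(true);

        if (!is_dir($this->cacheDir)) {
            mkdir($this->cacheDir, 0777, true);
        }
        file_put_contents($path, $html);

        return $html;
    }
}
```

With this shape, a second run over the same URLs never sleeps, which matches the note above that re-runs are fast once the HTML is cached.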

---

Typical workflow
----------------

```
# 1. Find all PPO tenant sites (via Internet Archive CDX)
bin/console pastperfect:discover-registry

# 2. Validate each site is live (dispatches Messenger messages)
bin/console pastperfect:probe-registry

# 3. Harvest the listing index for one site
bin/console pastperfect:harvest-listing https://fauquierhistory.pastperfectonline.com

# 4. Fetch and parse all detail pages (HTML cached locally)
bin/console pastperfect:harvest-details \
    var/pastperfect/fauquierhistory/fauquierhistory-listing.jsonl

# 5. Review the field landscape (requires survos/import-bundle)
bin/console import:profile:report \
    var/pastperfect/fauquierhistory/fauquierhistory-details.jsonl.profile.json
```

---

Commands
--------

### `pastperfect:discover-registry`

Queries the Internet Archive CDX API for all `*.pastperfectonline.com` tenant hostnames and writes a registry listing JSONL. Delegates to `CdxDiscoveryService` from `survos/site-discovery-bundle`.

```
bin/console pastperfect:discover-registry [options]
```

| Option | Default | Description |
|---|---|---|
| `--output` | `var/pastperfect/registry-listing.jsonl` | Output path |
| `--force` | false | Re-discover even if file is already complete |
| `--limit` | 0 | Stop after N sites (0 = unlimited). **Use during development.** |

```
# Test with 5 sites
bin/console pastperfect:discover-registry --limit=5

# Full discovery (slow — CDX pages take 10–30 s each)
bin/console pastperfect:discover-registry
```

Each record:

```
{
  "slug":          "fauquierhistory",
  "host":          "fauquierhistory.pastperfectonline.com",
  "base_url":      "https://fauquierhistory.pastperfectonline.com",
  "discovered_via": "internet_archive_cdx",
  "validated":     false,
  "validated_at":  null
}
```
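
Because the registry is plain JSONL, it can be read without the bundle. The sketch below assumes only the record shape shown above. The function names `readJsonl` and `unvalidatedBaseUrls` are hypothetical helpers, not part of this bundle or of survos/jsonl-bundle, which may provide its own reader.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: yields one decoded record per non-empty line.
 *
 * @return Generator<int, array<string, mixed>>
 */
function readJsonl(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    try {
        while (($line = fgets($handle)) !== false) {
            $line = trim($line);
            if ($line === '') {
                continue; // tolerate blank lines
            }
            yield json_decode($line, true, 512, JSON_THROW_ON_ERROR);
        }
    } finally {
        fclose($handle);
    }
}

/**
 * Returns the base_url of every record whose "validated" flag is not true.
 *
 * @param iterable<array<string, mixed>> $records
 * @return list<string>
 */
function unvalidatedBaseUrls(iterable $records): array
{
    $urls = [];
    foreach ($records as $record) {
        if (($record['validated'] ?? false) !== true) {
            $urls[] = $record['base_url'];
        }
    }

    return $urls;
}
```

For example, `unvalidatedBaseUrls(readJsonl('var/pastperfect/registry-listing.jsonl'))` would list the sites that still need probing.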

---

### `pastperfect:probe-registry`

Reads the registry listing and dispatches one `ProbeRegistrySiteMessage` per site. Each message validates the host is a live PPO site and writes the result to a probed registry JSONL.

Configure Symfony Messenger to handle the messages asynchronously, or leave unrouted for synchronous handling:

```
# config/packages/messenger.yaml
framework:
    messenger:
        routing:
            Survos\PastPerfectBundle\Message\ProbeRegistrySiteMessage: async
            Survos\PastPerfectBundle\Message\ProbeItemMessage:         async
```

```
bin/console pastperfect:probe-registry [listingFile] [options]
```

| Argument | Default | Description |
|---|---|---|
| `listingFile` | `var/pastperfect/registry-listing.jsonl` | Input listing JSONL |

| Option | Default | Description |
|---|---|---|
| `--output` | `{dir}/registry-probed.jsonl` | Output path |
| `--force` | false | Re-probe all sites |
| `--limit` | 0 | Stop after N sites |

---

### `pastperfect:harvest-listing`

Fetches the AdvancedSearch listing pages, parses all record links, and writes a listing JSONL. Resumes automatically on re-run (sidecar tracks progress).

```
bin/console pastperfect:harvest-listing <baseUrl> [options]
```

| Argument | Description |
|---|---|
| `baseUrl` | e.g. `https://fauquierhistory.pastperfectonline.com` |

| Option | Default | Description |
|---|---|---|
| `--output-dir` | `var/pastperfect` | Output directory |
| `--force` | false | Re-harvest even if complete |

Output: `{output-dir}/{site}/{site}-listing.jsonl`

Each record:

```
{
  "source": "pastperfectonline",
  "site":   "fauquierhistory",
  "type":   "webobject",
  "id":     "AC429E12-B023-4E3D-BEC0-693892645021",
  "url":    "https://fauquierhistory.pastperfectonline.com/webobject/AC429E12-..."
}
```
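
The fields of a listing record can all be derived from the record URL. The sketch below shows one way to do that, assuming the `https://{site}.pastperfectonline.com/{type}/{id}` layout inferred from the example above. The function `listingRecordFromUrl` is hypothetical and is not the bundle's parser.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: builds a listing record from a record URL,
 * or returns null when the URL does not match the assumed layout.
 *
 * @return array{source: string, site: string, type: string, id: string, url: string}|null
 */
function listingRecordFromUrl(string $url): ?array
{
    $suffix = '.pastperfectonline.com';
    $host   = parse_url($url, PHP_URL_HOST);
    $path   = parse_url($url, PHP_URL_PATH);

    if (!is_string($host) || !is_string($path) || !str_ends_with($host, $suffix)) {
        return null;
    }

    // Assumed path layout: /{type}/{id}
    $segments = explode('/', trim($path, '/'));
    if (count($segments) !== 2 || $segments[0] === '' || $segments[1] === '') {
        return null;
    }

    return [
        'source' => 'pastperfectonline',
        'site'   => substr($host, 0, -strlen($suffix)),
        'type'   => $segments[0],
        'id'     => $segments[1],
        'url'    => $url,
    ];
}
```

Returning `null` for anything unexpected keeps navigation links and off-site URLs out of the listing rather than guessing at their meaning.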

---

### `pastperfect:harvest-details`

Reads the listing JSONL, fetches each detail page (cached under `{cache_dir}/{site}/detail/`), parses catalog fields, and writes a flat details JSONL. A `.profile.json` is written automatically after every run.

```
bin/console pastperfect:harvest-details <listingFile> [options]
```

| Argument | Description |
|---|---|
| `listingFile` | Path to listing JSONL from `harvest-listing` |

| Option | Default | Description |
|---|---|---|
| `--output-dir` | same dir as listing file | Output directory |
| `--force` | false | Re-fetch all, ignoring cache |
| `--profile-only` | false | Only (re-)profile the existing JSONL |
| `--limit` | 0 | Stop after N records |

Output files:

- `{site}-details.jsonl` — one flat record per item
- `{site}-details.jsonl.profile.json` — field profile

```
# Full harvest (cached HTML means re-runs are fast)
bin/console pastperfect:harvest-details \
    var/pastperfect/fauquierhistory/fauquierhistory-listing.jsonl

# Test with 10 records
bin/console pastperfect:harvest-details \
    var/pastperfect/fauquierhistory/fauquierhistory-listing.jsonl --limit=10

# Re-profile without any HTTP requests
bin/console pastperfect:harvest-details \
    var/pastperfect/fauquierhistory/fauquierhistory-listing.jsonl --profile-only
```

---

About rights and licensing
--------------------------

PastPerfect Online has **no dedicated rights or license field** in its catalog schema. The only rights signal is the footer copyright notice on each page, e.g.:

> © Fauquier Historical Society 2021

This is captured as `rights_notice` in every detail record. It is a **site-level** attribution, not item-level.

Government-operated PPO sites may publish under a permissive or CC license, but PPO itself provides no mechanism to declare this. Rights decisions must be made at the application level, using `rights_notice` plus out-of-band knowledge about the institution.
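
As an illustration of application-level handling, the sketch below splits a `rights_notice` into holder and year. It assumes the `© {holder} {year}` shape of the single example above; other sites may format their footer differently, in which case the function returns `null`. The name `parseRightsNotice` is hypothetical and not part of the bundle.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: extracts the copyright holder and year from a
 * footer notice such as "© Fauquier Historical Society 2021".
 *
 * @return array{holder: string, year: int}|null null when the notice has another shape
 */
function parseRightsNotice(string $notice): ?array
{
    // The /u flag makes the pattern UTF-8 aware so "©" is matched as one character.
    if (preg_match('/^©\s*(?<holder>.+?)\s+(?<year>\d{4})$/u', trim($notice), $m) !== 1) {
        return null;
    }

    return [
        'holder' => $m['holder'],
        'year'   => (int) $m['year'],
    ];
}
```

The result identifies who asserts copyright over the site, not the license of any individual item, so it is a starting point for a rights decision rather than the decision itself.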

---

Analysing harvested fields
--------------------------

```
# Review field landscape (requires survos/import-bundle)
bin/console import:profile:report \
    var/pastperfect/fauquierhistory/fauquierhistory-details.jsonl.profile.json

# Sort by distinct value count
bin/console import:profile:report ... --sort=distinct

# Show fields that look like delimited lists
bin/console import:profile:report ... --only=split

# Re-generate the profile at any time
bin/console pastperfect:harvest-details fauquierhistory-listing.jsonl --profile-only
```

---

Messenger messages
------------------

Two messages are available for async processing:

| Message | Dispatched by | Handler |
|---|---|---|
| `ProbeRegistrySiteMessage` | `pastperfect:probe-registry` | `ProbeRegistrySiteHandler` |
| `ProbeItemMessage` | (dispatch manually) | `ProbeItemHandler` |

Route to any Symfony Messenger transport (Doctrine, AMQP, Redis, etc.). Leave unrouted for synchronous in-process handling (useful for small sites or testing).

---

License
-------

MIT
