
survos/dataset-bundle
=====================

Shared data directory conventions and path utilities for dataset-driven apps (`APP_DATA_DIR`).

Version 2.7.2, MIT license, requires PHP ^8.4.

[Source](https://github.com/survos/dataset-bundle) | [Packagist](https://packagist.org/packages/survos/dataset-bundle)

Survos Data Bundle
==================

`survos/data-bundle` centralizes dataset filesystem conventions for dataset-driven Symfony applications.

Despite the historical name, this bundle is not the owner of shared semantic metadata contracts. It manages where dataset files, provider metadata, Pixie databases, run artifacts, cache files, and related JSONL outputs live.

For shared vocabulary and typed metadata contracts, use `survos/data-contracts`.

Scope
-----

This bundle provides:

- `DataPaths`: root-level path resolution under `APP_DATA_DIR`
- `DatasetPaths`: dataset-scoped path helpers
- dataset metadata loading and ensuring
- `DatasetInfo` / `Provider` registry entities
- provider snapshot encoding
- dataset context helpers for console/import workflows
- commands for browsing, diagnosing, and resolving dataset paths

This bundle does not provide:

- Dublin Core vocabulary constants
- collection-object DTO contracts
- metadata claim storage
- AI workflow execution
- media upload, IIIF, or mediary publishing
- import/normalize/profile logic

Relationship to Other Packages
------------------------------

- `survos/data-contracts`: shared metadata vocabulary and DTO contracts.
- `survos/data-bundle`: dataset paths, provider storage, and dataset registry.
- `survos/import-bundle`: import/convert workflows that may ask this bundle for dataset paths.
- `survos/ai-workflow-bundle`: task execution in apps that own subject context.
- claims bundle: tracked metadata assertions with provenance and confidence.
- `survos/media-bundle`: media identity and mediary publishing.

Keep the dependency direction honest: a package that only needs `DcTerms`, `ContentType`, or metadata DTOs should require `survos/data-contracts` directly. Do not require this bundle just to get vocabulary classes.

Core Idea
---------

All dataset work lives under a single root directory:

```
APP_DATA_DIR=/absolute/path/to/data/root
```

The bundle avoids repository-relative paths and gives services and commands one place to ask for canonical locations.

Example layout:

```
$APP_DATA_DIR/
  work/
    /
      00_meta/
        dataset.json
      10_extract/
        obj.jsonl
      20_normalize/
        obj.jsonl
      21_profile/
        obj.profile.json
      30_terms/
        *.jsonl
  pixie/
    tenants/
      .db
    template/
    exports/
  runs/
  cache/

```
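As a rough illustration of the idea, the following standalone sketch resolves a stage directory from a root and a dataset key. `SketchPaths` and its `stageDir()` method are hypothetical and are not part of this bundle. The stage names come from the example layout, and placing the dataset key directly under `work/` is an assumption.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for illustration; this is not the bundle's DataPaths class.
final class SketchPaths
{
    /** Stage directory names taken from the example layout. */
    private const STAGES = [
        'meta' => '00_meta',
        'extract' => '10_extract',
        'normalize' => '20_normalize',
        'profile' => '21_profile',
        'terms' => '30_terms',
    ];

    public function __construct(private readonly string $root)
    {
        // Reject repository-relative roots: only absolute paths are accepted.
        if ($root === '' || $root[0] !== '/') {
            throw new InvalidArgumentException('The data root must be an absolute path.');
        }
    }

    /** Assumption: each dataset lives at <root>/work/<datasetKey>. */
    public function datasetDir(string $datasetKey): string
    {
        return rtrim($this->root, '/') . '/work/' . trim($datasetKey, '/');
    }

    public function stageDir(string $datasetKey, string $stage): string
    {
        if (!isset(self::STAGES[$stage])) {
            throw new InvalidArgumentException("Unknown stage: $stage");
        }

        return $this->datasetDir($datasetKey) . '/' . self::STAGES[$stage];
    }
}

$paths = new SketchPaths('/srv/data');
echo $paths->stageDir('dc/tb09jw350', 'normalize'), PHP_EOL;
// prints "/srv/data/work/dc/tb09jw350/20_normalize"
```

Keeping the stage names in one constant is what lets callers ask for a location by meaning rather than by assembling strings themselves.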

Installation
------------

```
composer require survos/data-bundle
```

Set the root directory:

```
export APP_DATA_DIR=/absolute/path/to/data/root
```

Usage
-----

Inject `DataPaths` for root and dataset path resolution:

```
use Survos\DataBundle\Service\DataPaths;

final class SomeService
{
    public function __construct(
        private readonly DataPaths $paths,
    ) {
    }
}
```

Common dataset paths:

```
$paths->datasetDir('dc/tb09jw350');
$paths->extractDir('dc/tb09jw350');
$paths->extractFile('dc/tb09jw350');
$paths->normalizeDir('dc/tb09jw350');
$paths->normalizeFile('dc/tb09jw350');
$paths->profileDir('dc/tb09jw350');
$paths->profileFile('dc/tb09jw350');
$paths->termsDir('dc/tb09jw350');
```

Pixie paths:

```
$paths->pixieTenantDb('larco');
```

Operational directories:

```
$paths->runsDir;
$paths->cacheDir;
```

Commands
--------

Current command names retain the historical `data:*` prefix:

```
bin/console data:path dc/tb09jw350 20_normalize
bin/console data:head dc/tb09jw350 20_normalize --limit=5
bin/console data:diag dc/tb09jw350
bin/console data:browse
bin/console data:scan-datasets
```

These may eventually move to `dataset:*` aliases when the bundle is renamed.

Directory Creation
------------------

Ensure global roots exist:

```
$paths->ensureRootDirs();
```

Ensure standard dataset stage directories exist:

```
$paths->ensureDatasetDirs('dc/tb09jw350');
```
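For readers curious what "ensuring" stage directories might involve, here is a minimal sketch under stated assumptions. `sketchEnsureDatasetDirs()` is a hypothetical helper, not the bundle's `ensureDatasetDirs()`. The stage names come from the example layout, while the `work/` prefix and the `0775` mode are assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper for illustration; the bundle's own method may differ.
 *
 * @return list<string> the stage directories that now exist
 */
function sketchEnsureDatasetDirs(string $root, string $datasetKey): array
{
    $stages = ['00_meta', '10_extract', '20_normalize', '21_profile', '30_terms'];
    $dirs = [];

    foreach ($stages as $stage) {
        $dir = rtrim($root, '/') . '/work/' . trim($datasetKey, '/') . '/' . $stage;

        // The trailing is_dir() check tolerates another process creating the directory first.
        if (!is_dir($dir) && !mkdir($dir, 0775, true) && !is_dir($dir)) {
            throw new RuntimeException("Cannot create directory: $dir");
        }
        $dirs[] = $dir;
    }

    return $dirs;
}

// Use a throwaway root under the system temp directory for the demonstration.
$root = sys_get_temp_dir() . '/sketch-data-' . bin2hex(random_bytes(4));
$created = sketchEnsureDatasetDirs($root, 'dc/tb09jw350');
echo count($created), " stage directories ready under $root", PHP_EOL;
```

Calling the helper a second time is harmless, since existing directories are skipped.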

Atomic File Writes
------------------

For small metadata files:

```
$paths->atomicWrite($path, $contents);
```

The write uses a temporary file in the same directory followed by an atomic rename.
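The temp-file-then-rename pattern can be sketched as follows. `sketchAtomicWrite()` is a hypothetical standalone function and is not the bundle's implementation; the error handling and temp-file prefix are assumptions. The atomicity of `rename()` is assumed for a POSIX filesystem where source and target sit on the same volume.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper for illustration; the bundle's own atomicWrite() may differ.
 */
function sketchAtomicWrite(string $path, string $contents): void
{
    $dir = dirname($path);

    // Create the temp file next to the target so rename() stays on one filesystem.
    $tmp = tempnam($dir, '.tmp-');
    if ($tmp === false) {
        throw new RuntimeException("Cannot create a temporary file in $dir");
    }

    // Write everything to the temp file, then swap it into place in one step.
    if (file_put_contents($tmp, $contents) === false || !rename($tmp, $path)) {
        @unlink($tmp); // best-effort cleanup of the leftover temp file
        throw new RuntimeException("Cannot write $path");
    }
}

$target = sys_get_temp_dir() . '/sketch-dataset.json';
sketchAtomicWrite($target, '{"name":"demo"}');
echo file_get_contents($target), PHP_EOL;
// prints {"name":"demo"}
```

Readers of the target path see either the old file or the complete new one, never a partially written file. Note that `tempnam()` creates the file with restrictive permissions, so a real implementation may need to adjust the mode after the rename.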

Design Principles
-----------------

- Dataset path conventions are centralized.
- Paths are requested by meaning (dataset and stage) rather than assembled from ad hoc strings.
- Dataset/provider storage concerns stay separate from semantic metadata contracts.
- Import, AI workflow, claims, and media publishing remain in their own packages.
- The bundle should stay boring and infrastructure-focused.

Future Rename
-------------

The better long-term name is `survos/dataset-bundle`. See [`docs/rename-to-dataset-bundle.md`](docs/rename-to-dataset-bundle.md).

Maintainer: [tacman](https://github.com/tacman)

