acdh-oeaw/arche-metadata-crawler
================================

Script and library for checking and generating ARCHE metadata in ACDH schema

Version 0.16.2 · License: MIT · PHP ^8.1 · [2 issues](https://github.com/acdh-oeaw/arche-metadata-crawler/issues)

[Source](https://github.com/acdh-oeaw/arche-metadata-crawler) · [Packagist](https://packagist.org/packages/acdh-oeaw/arche-metadata-crawler)

Metadata Crawler
================

[![Latest Stable Version](https://camo.githubusercontent.com/4b0c8b4c690ab1e653639eae77c63d247533d08c8ce185181cc1c95659ed7190/68747470733a2f2f706f7365722e707567782e6f72672f616364682d6f6561772f61726368652d6d657461646174612d637261776c65722f762f737461626c65)](https://packagist.org/packages/acdh-oeaw/arche-metadata-crawler)[![Build status](https://github.com/acdh-oeaw/arche-metadata-crawler/workflows/test/badge.svg?branch=master)](https://github.com/acdh-oeaw/arche-metadata-crawler/workflows/test/badge.svg?branch=master)[![Coverage Status](https://camo.githubusercontent.com/b2b07b9fcf38101e1f7270d721ea27ff957375ac3b4925d9621045cf15ae8e56/68747470733a2f2f636f766572616c6c732e696f2f7265706f732f6769746875622f616364682d6f6561772f61726368652d6d657461646174612d637261776c65722f62616467652e7376673f6272616e63683d6d6173746572)](https://coveralls.io/github/acdh-oeaw/arche-metadata-crawler?branch=master)[![License](https://camo.githubusercontent.com/cbecd397cafdeaa356195f035f1105ace7b288823da9bc207316b9e9ab74b820/68747470733a2f2f706f7365722e707567782e6f72672f616364682d6f6561772f61726368652d6d657461646174612d637261776c65722f6c6963656e7365)](https://packagist.org/packages/acdh-oeaw/arche-metadata-crawler)

Functionality
-------------

A set of scripts used for metadata curation during ARCHE ingestions, covering:

- Merging the metadata of a collection from inputs in [various formats](docs/metadata_formats.md)
- Validating the merged metadata
- Generating XLSX metadata templates based on the current ontology (see the *horizontal* metadata files in the [metadata formats description](docs/metadata_formats.md#horizontal-metadata-file))

Installation
------------

### Locally

- Install PHP and [composer](https://getcomposer.org/)
- Run:
    ```
    composer require acdh-oeaw/arche-metadata-crawler
    ```

### As a docker image

- Install [docker](https://www.docker.com/).
- Run the `acdhch/arche-ingest` image, mounting your data directory into it:
    ```
    docker run --rm -ti --entrypoint bash -u `id -u`:`id -g` \
               -v pathToYourDataDir:/data \
               acdhch/arche-ingest
    ```
- Run the scripts, e.g.:
    ```
    arche-create-metadata-template /data all
    ```

    and
    ```
    arche-crawl-meta \
      /data/metadata \
      /data/merged.ttl \
      /ARCHE/staging/GlaserDiaries_16674/data \
      https://id.acdh.oeaw.ac.at/glaserdiaries
    ```

    - If you need the [file-checker](https://github.com/acdh-oeaw/repo-file-checker), run it with `arche-filechecker`.

### On ACDH Cluster

Nothing to be done; it is already installed there.

Usage
-----

(For a full walk-through using arche-ingestion@acdh-cluster and the Wollmilchsau test collection, see [the walk-through](docs/walktrough.md).)

### On ACDH Cluster

First, open the arche-ingestion workload console as described [here](https://github.com/acdh-oeaw/arche-ingest/blob/master/docs/acdh-cluster.md).

Then:

- Generate and validate the metadata:
    - Run the `arche-crawl-meta` script:
        ```
        /ARCHE/vendor/bin/arche-crawl-meta \
           \
         --filecheckerReportDir  \
           \
           \
           \
          2>&1 | tee
        ```

        e.g.:
        ```
        /ARCHE/vendor/bin/arche-crawl-meta \
          /ARCHE/staging/GustavMahlerArchiv_22334/metadata \
          --filecheckerReportDir /ARCHE/staging/GustavMahlerArchiv_22334/checkReports/2024_04_08_09_19_24 \
          /ARCHE/staging/GustavMahlerArchiv_22334/scriptFiles/metadata.ttl \
          /ARCHE/staging/GustavMahlerArchiv_22334/data \
          https://id.acdh.oeaw.ac.at/GustavMahlerArchiv \
          2>&1 | tee /ARCHE/staging/GustavMahlerArchiv_22334/scriptFiles/metadata.log
        ```

        - If you want to skip the checks (which speeds up the process significantly), add the `--noCheck` parameter, e.g.:
            ```
            /ARCHE/vendor/bin/arche-crawl-meta \
              /ARCHE/staging/GustavMahlerArchiv_22334/metadata \
              --filecheckerReportDir /ARCHE/staging/GustavMahlerArchiv_22334/checkReports/2024_04_08_09_19_24 \
              /ARCHE/staging/GustavMahlerArchiv_22334/scriptFiles/metadata.ttl \
              /ARCHE/staging/GustavMahlerArchiv_22334/data \
              https://id.acdh.oeaw.ac.at/GustavMahlerArchiv \
              --noCheck \
              2>&1 | tee /ARCHE/staging/GustavMahlerArchiv_22334/scriptFiles/metadata.log
            ```
- Create metadata templates:
    ```
    /ARCHE/vendor/bin/arche-create-metadata-template \
       \
      all
    ```

    e.g. to create templates in the current directory:
    ```
    /ARCHE/vendor/bin/arche-create-metadata-template . all
    ```
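
The `--filecheckerReportDir` argument in the GustavMahlerArchiv example points to a timestamped directory under `checkReports`. Below is a minimal bash sketch for picking the newest one automatically. It assumes reports are stored as `<collection dir>/checkReports/<YYYY_MM_DD_HH_MM_SS>`, as in that example; `latest_report_dir` is a hypothetical helper, not part of this package.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with arche-metadata-crawler):
# print the newest file-checker report directory of a collection.
# Assumes the layout <collectionDir>/checkReports/<YYYY_MM_DD_HH_MM_SS>,
# as in the GustavMahlerArchiv example.
latest_report_dir() {
  local collection_dir="${1%/}"
  local latest=""
  local d
  for d in "$collection_dir"/checkReports/*/; do
    [ -d "$d" ] || continue   # the glob matched nothing
    d="${d%/}"                # drop the trailing slash added by the glob
    # Fixed-width timestamps sort chronologically as plain strings.
    if [[ -z "$latest" || "$d" > "$latest" ]]; then
      latest="$d"
    fi
  done
  [ -n "$latest" ] || return 1  # no report directory found
  printf '%s\n' "$latest"
}
```

After sourcing it, you could pass e.g. `--filecheckerReportDir "$(latest_report_dir /ARCHE/staging/GustavMahlerArchiv_22334)"` to `arche-crawl-meta`. A plain string comparison is enough here only because the timestamp format is fixed-width with the most significant field first.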

### Locally

- Generating and validating the metadata:
    ```
    vendor/bin/arche-crawl-meta \
      --filecheckerReportDir pathToDirectoryWithFilecheckerOutput \
      pathToInputMetadataDir \
      mergedMetadataFilePath \
      pathToCollectionData \
      pathToTargetMetadataFile
    ```

    e.g.:
    ```
    vendor/bin/arche-crawl-meta \
      --filecheckerReportDir reports/2024_03_01_12_45_23 \
      metaDir \
      metadata.ttl \
      `pwd`/data \
      https://id.acdh.oeaw.ac.at/myCollection
    ```
- Creating metadata templates:
    ```
    vendor/bin/arche-create-metadata-template \
       \
      all
    ```

    e.g. to create templates in the current directory:
    ```
    vendor/bin/arche-create-metadata-template . all
    ```
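
When the same collection is processed repeatedly, it can help to wrap the local call in a small bash function that keeps the argument order used in the examples and, like the cluster example, saves the output to a log file via `tee`. This is only a sketch: `crawl_collection`, `CRAWL_META` and `NO_CHECK` are hypothetical names introduced here, not options of the package; only `--filecheckerReportDir` and `--noCheck` come from this README.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the local arche-crawl-meta call.
# CRAWL_META (assumption) points to the script; defaults to the composer bin dir.
CRAWL_META="${CRAWL_META:-vendor/bin/arche-crawl-meta}"

# Usage: crawl_collection metaDir mergedTtl dataDir collectionId reportDir
# Set NO_CHECK=1 (hypothetical variable) to append the --noCheck parameter.
crawl_collection() {
  local meta_dir="$1" merged_ttl="$2" data_dir="$3" collection_id="$4" report_dir="$5"
  # Log next to the merged metadata file, e.g. metadata.ttl -> metadata.log
  local log="${merged_ttl%.ttl}.log"
  local -a args=(--filecheckerReportDir "$report_dir" "$meta_dir" "$merged_ttl" "$data_dir" "$collection_id")
  if [ "${NO_CHECK:-0}" = 1 ]; then
    args+=(--noCheck)
  fi
  # Show the output and keep a copy, as with "2>&1 | tee" on the cluster.
  "$CRAWL_META" "${args[@]}" 2>&1 | tee "$log"
  # Return the crawler's exit status rather than tee's.
  return "${PIPESTATUS[0]}"
}
```

For example, `crawl_collection metaDir metadata.ttl "$(pwd)/data" https://id.acdh.oeaw.ac.at/myCollection reports/2024_03_01_12_45_23` runs the same command as the local example and additionally writes its output to `metadata.log`.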

Remarks:

- To get a list of all available parameters, run:
    ```
    vendor/bin/arche-crawl-meta --help
    vendor/bin/arche-create-metadata-template --help
    ```

### As a docker container

- Generating and validating the metadata:
    Run a container that mounts your host directory under `/mnt` and overrides the entrypoint with `arche-crawl-meta`:
    ```
    docker run \
      --rm -u `id -u`:`id -g`\
      -v pathInHost:/mnt \
      --entrypoint arche-crawl-meta \
      acdhch/arche-ingest \
      --filecheckerReportDir pathToDirectoryWithFilecheckerOutput \
      pathToInputMetadataDir \
      mergedMetadataFilePath \
      pathToCollectionData \
      pathToTargetMetadataFile
    ```

    e.g. with paths relative to the current working directory:
    ```
    docker run \
      --rm -u `id -u`:`id -g`\
      -v `pwd`:/mnt \
      --entrypoint arche-crawl-meta \
      acdhch/arche-ingest \
      --filecheckerReportDir /mnt/reports/2024_03_01_12_45_23 \
      /mnt/metaDir \
      /mnt/metadata.ttl \
      /mnt/data \
      https://id.acdh.oeaw.ac.at/myCollection
    ```
- Creating metadata templates:
    Run a container that mounts the directory where the templates should be created under `/mnt` and overrides the entrypoint with `arche-create-metadata-template`:
    ```
    docker run \
      --rm -u `id -u`:`id -g`\
      -v pathToDirectoryWhereTemplateShouldBeCreated:/mnt \
      --entrypoint arche-create-metadata-template \
      acdhch/arche-ingest \
      /mnt all
    ```

    e.g. to create the templates in the current directory:
    ```
    docker run \
      --rm -u `id -u`:`id -g` \
      -v `pwd`:/mnt \
      --entrypoint arche-create-metadata-template \
      acdhch/arche-ingest \
      /mnt all
    ```
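
Inside the container, the scripts only see the mounted directory, so every path passed to them has to be given as it appears under `/mnt`, not as a host path. Below is a minimal bash sketch of that translation. `to_container_path` is a hypothetical helper, and it assumes its first argument is the host directory mounted with `-v ...:/mnt`, as in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an absolute host path to the path seen inside
# the container when host_root is mounted with "-v host_root:/mnt".
to_container_path() {
  local host_root="${1%/}" host_path="$2"
  case "$host_path" in
    "$host_root")
      printf '/mnt\n' ;;
    "$host_root"/*)
      # Strip the mounted prefix and re-root the remainder under /mnt.
      printf '/mnt/%s\n' "${host_path#"$host_root"/}" ;;
    *)
      echo "not below the mounted directory $host_root: $host_path" >&2
      return 1 ;;
  esac
}
```

For example, with `-v "$(pwd)":/mnt`, `to_container_path "$(pwd)" "$(pwd)/reports/2024_03_01_12_45_23"` prints `/mnt/reports/2024_03_01_12_45_23`, the `--filecheckerReportDir` value used in the example above. Paths outside the mounted directory are rejected because the container cannot see them.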

### Community

- Maintainer: [zozlak](https://github.com/zozlak) (107 commits)
- Tags: arche

### Code Quality

- Tests: PHPUnit
- Static analysis: PHPStan
- Type coverage: yes
