acdh-oeaw/arche-ingest
======================

A set of sample ARCHE ingestion scripts

Latest version: 1.6.33 | License: MIT | Language: PHP

[![Latest Stable Version](https://camo.githubusercontent.com/44519c79c1313d55257232ebf636edb94c1a802d041cbf50ecb8b007244b3888/68747470733a2f2f706f7365722e707567782e6f72672f616364682d6f6561772f61726368652d696e676573742f762f737461626c65)](https://packagist.org/packages/acdh-oeaw/arche-ingest)[![Build status](https://github.com/acdh-oeaw/arche-ingest/workflows/phpunit/badge.svg?branch=master)](https://github.com/acdh-oeaw/arche-ingest/workflows/phpunit/badge.svg?branch=master)[![Coverage Status](https://camo.githubusercontent.com/11074b779b730b44de575e85e970c90df74b78c2967a25982c44f76c804b30c6/68747470733a2f2f636f766572616c6c732e696f2f7265706f732f6769746875622f616364682d6f6561772f61726368652d696e676573742f62616467652e7376673f6272616e63683d6d6173746572)](https://coveralls.io/github/acdh-oeaw/arche-ingest?branch=master)[![License](https://camo.githubusercontent.com/a001d6e4ace5895492a88cce8abe9c3d4ac9ba8035939b184dcc49d9b53e0e41/68747470733a2f2f706f7365722e707567782e6f72672f616364682d6f6561772f61726368652d696e676573742f6c6963656e7365)](https://packagist.org/packages/acdh-oeaw/arche-ingest)

A collection of ARCHE ingestion script templates
================================================

The REST API provided by ARCHE is rather low-level from the point of view of real-world data ingestion. To make ingestion simpler, the [arche-lib-ingest](https://github.com/acdh-oeaw/arche-lib-ingest) library has been developed. While it provides a convenient high-level data ingestion API, it is still only a library, which requires you to write your own ingestion script.

This repository aims to close this gap: it provides a set of data ingestion scripts (built on top of [arche-lib-ingest](https://github.com/acdh-oeaw/arche-lib-ingest)) which can be used by people with almost no programming skills.

Scripts provided
----------------

There are two script variants provided:

- **Console script variant**, where parameters are passed through the command line.
    The benefit of this variant is ease of use, especially in CI/CD workflows.
    - `bin/arche-import-metadata` imports metadata from an RDF file
    - `bin/arche-import-binary` (re)ingests a single resource's binary content (to be used when file name and/or location changed)
    - `bin/arche-delete-resource` removes a given repository resource (allows recursion, etc.)
    - `bin/arche-delete-triples` removes metadata triples specified in the ttl file (but doesn't remove repository resources)
    - `bin/arche-update-redmine` updates a Redmine issue describing the data curation/ingestion process (see a dedicated section at the bottom of the README)
- **Template variant**, where you adjust execution parameters and/or the way the script works by editing its content.
    The benefit of this variant is that you can treat the adjusted script as documentation of the ingestion process and/or adapt it to your particular needs.
    - `add_metadata_sample.php` adds metadata triples specified in the ttl file preserving all existing metadata of repository resources
    - `delete_metadata_sample.php` removes metadata triples specified in the ttl file (but doesn't remove repository resources)
    - `delete_resource_sample.php` removes a given repository resource (allows recursion, etc.)
    - `import_binary_sample.php` imports binary data from the disk
    - `import_metadata_sample.php` imports metadata from an RDF file
    - `reimport_single_binary.php` reingests a single resource's binary content (to be used when file name and/or location changed)

Installation & Usage
------------------------

### Runtime environment

You need [PHP](https://www.php.net/) and [Composer](https://getcomposer.org/).

You can also use the `acdhch/arche-ingest` [Docker](https://www.docker.com/) image (the `{pathToDirectoryWithFilesToIngest}` will be available at the `/data` location inside the Docker container):

```
docker run \
  --rm \
  -ti \
  --name arche-ingest \
  -v {pathToDirectoryWithFilesToIngest}:/data \
  acdhch/arche-ingest
```

### Console script variant

- Install with:
    ```
    composer require acdh-oeaw/arche-ingest
    ```
- Update regularly with:
    ```
    composer update --no-dev
    ```
- Run with:
    ```
    vendor/bin/{scriptOfYourChoice} {parametersGoHere}
    ```

    e.g.:
    ```
    vendor/bin/arche-import-metadata --concurrency 4 myRdf.ttl https://arche.acdh.oeaw.ac.at/api myLogin myPassword
    ```

    - To get the list of available parameters, run:
        ```
        vendor/bin/{scriptOfYourChoice} --help
        ```

        e.g.:
        ```
        vendor/bin/arche-import-metadata --help
        ```

#### Running inside GitHub Actions

Do not store your ARCHE credentials in the workflow configuration file. Use repository secrets instead (see example below).

A fragment of your workflow's YAML config may look like this:

```
    - name: ingestion dependencies
      run: |
        composer require acdh-oeaw/arche-ingest
    - name: ingest arche
      run: |
        vendor/bin/arche-import-metadata myRdfFile.ttl https://arche-curation.acdh-dev.oeaw.ac.at/api ${{secrets.ARCHE_LOGIN}} ${{secrets.ARCHE_PASSWORD}}
        vendor/bin/arche-update-redmine --token ${{ secrets.REDMINE_TOKEN }} https://redmine.acdh.oeaw.ac.at 1234 'Upload AIP to Curation Instance (Minerva)'
```
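
GitHub Actions expands a secret that has not been defined to an empty string, so a misconfigured secret would silently reach the script as an empty argument. Below is a minimal POSIX shell sketch of a fail-fast check, assuming the secrets are exposed to the step as the exported environment variables `ARCHE_LOGIN` and `ARCHE_PASSWORD` (e.g. through the step's `env:` section). The `require_env` helper is hypothetical and not part of arche-ingest:

```shell
#!/bin/sh
# require_env VAR_NAME... (hypothetical helper)
# Returns 1 and prints a message to stderr for the first variable that is
# unset or empty; returns 0 if all of them have a value.
# printenv only sees exported variables, which is the case for variables
# defined in a workflow step's env: section.
require_env() {
  for var_name in "$@"; do
    if [ -z "$(printenv "$var_name")" ]; then
      echo "Missing required environment variable: $var_name" >&2
      return 1
    fi
  done
  return 0
}

# Hypothetical usage inside a workflow step:
#   require_env ARCHE_LOGIN ARCHE_PASSWORD &&
#     vendor/bin/arche-import-metadata myRdfFile.ttl https://arche-curation.acdh-dev.oeaw.ac.at/api "$ARCHE_LOGIN" "$ARCHE_PASSWORD"
```

Checking before the ingestion starts gives a clear error message in the job log instead of a failed login attempt against the ARCHE instance.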

#### Running on ACDH Cluster

First, get the arche-ingestion workload console as described in the [ACDH cluster documentation](https://github.com/acdh-oeaw/arche-ingest/blob/master/docs/acdh-cluster.md).

Then:

- Run `screen -S mySessionName`
- Go to your ingestion directory
- Run scripts using `{scriptName}`, e.g.:
    ```
    arche-import-metadata myRdf.ttl https://arche.acdh.oeaw.ac.at/api myLogin myPassword
    ```
- If the script takes a long time to run, you can safely leave the console by pressing `CTRL+a` followed by `d` and then typing `exit`.
    - To get back to the script, log in to `repo-ingestion@hephaistos` again and run:
        ```
        screen -r mySessionName
        ```

### Template variant

- Clone this repository.
- Run:
    ```
    composer update --no-dev
    ```
- Adjust the script of your choice.
    - Available parameters are provided at the beginning of the script.
    - Don't adjust anything below the `// NO CHANGES NEEDED BELOW THIS LINE` line unless you consider yourself a programmer and would like to change the way the script works.
- Run the script with:
    ```
    php -f {scriptOfYourChoice}
    ```

    - Consider reading input from a file and/or saving output to a log file, e.g. with:
        ```
        php -f import_metadata_sample.php < inputData 2>&1 | tee logFile
        ```

        (see the section below for hints on the input file format)

### Long runs

If you are performing time-consuming operations, e.g. a large data ingestion, consider running the scripts in a way that they won't stop when you close the terminal or log out (so that, when they run on a remote machine, you can turn your own computer off).

You can use `nohup` or `screen` for that, e.g.:

- `nohup`: run with:
    ```
    # console script variant
    nohup vendor/bin/arche-import-metadata --concurrency 4 myRdf.ttl https://arche.acdh.oeaw.ac.at/api myLogin myPassword > logFile 2>&1 &
    # template variant
    nohup php -f import_metadata_sample.php < input > logFile 2>&1 &
    ```

    - If you want to run template script variants that way, you **have to** prepare the input data file.
        It should look as follows:
        ```
        {arche instance API URL}
        yes
        {login}
        {password}
        ```

        e.g.:
        ```
        https://arche-dev.acdh-dev.oeaw.ac.at
        yes
        myLogin
        myPassword
        ```
- `screen`:
    - Start a `screen` session with:
        ```
        screen -S mySessionName
        ```
    - Then run your commands as usual.
    - Hit `CTRL+a` followed by `d` to leave the `screen` session.
    - You can get back to the `screen` session with:
        ```
        screen -r mySessionName
        ```
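
A minimal sketch of preparing the template input file from the shell, assuming the four-line layout described above (API URL, `yes`, login, password). The `write_input_file` function is a hypothetical helper, not part of this repository; it creates the file with owner-only permissions because the file holds a password:

```shell
#!/bin/sh
# write_input_file FILE API_URL LOGIN PASSWORD (hypothetical helper)
# Writes the answers a template script reads from standard input,
# one per line: API URL, "yes", login, password.
write_input_file() {
  # umask 077 inside a subshell: a newly created file is readable only by
  # you (an already existing file keeps its current permissions).
  ( umask 077 && printf '%s\n' "$2" yes "$3" "$4" > "$1" )
}

write_input_file input https://arche-dev.acdh-dev.oeaw.ac.at myLogin myPassword

# Then start the template script in the background, as shown above:
#   nohup php -f import_metadata_sample.php < input > logFile 2>&1 &
```

Generating the file this way avoids stray blank lines or missing line breaks, which would shift the answers the script reads.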

Reporting errors
----------------

Create a subtask of the Redmine issue [\#17641](https://redmine.acdh.oeaw.ac.at/issues/17641).

- Provide the exact location of the ingestion script (including the script file itself) and any other information that may be required to replicate the problem.
- Assign Mateusz and Norbert as watchers.

Using arche-update-redmine in a GitHub workflow
-----------------------------------------------

The basic idea is to execute each data processing step in the following way:

- note down the step name, so it can be reported if the step fails
- perform the step
- call `arche-update-redmine`

and to have a separate on-failure job step which calls `arche-update-redmine` to note the failure of the recorded step.

Remarks:

- As a good practice, the GitHub job URL should be included in the Redmine issue note. For that, a dedicated environment variable is set up.
- Redmine access credentials are, of course, stored as a repository secret.
- The way you store the main Redmine issue ID doesn't matter, as it is not secret. Store it any way you want (here it is simply hardcoded in the workflow as an environment variable).

```
name: sample

on:
  push: ~

jobs:
  dockerhub:
    runs-on: ubuntu-latest
    env:
      REDMINE_ID: 21085
    steps:
    - uses: actions/checkout@v4
    - name: init
      run: |
        composer require acdh-oeaw/arche-ingest
        echo "RUN_URL=$GITHUB_SERVER_URL/$GITHUB_REPOSITORY/actions/runs/$GITHUB_RUN_ID" >> $GITHUB_ENV
    - name: virus scan
      run: |
        echo 'STEP=Virus Scan' >> $GITHUB_ENV
        ...perform the virus scan...
        vendor/bin/arche-update-redmine --token ${{ secrets.REDMINE_TOKEN }} --append "$RUN_URL" $REDMINE_ID 'Virus scan'
    - name: repo-filechecker
      run: |
        echo 'STEP=Run repo-file-checker' >> $GITHUB_ENV
        ...run the repo-filechecker...
        vendor/bin/arche-update-redmine --token ${{ secrets.REDMINE_TOKEN }} --append "$RUN_URL" $REDMINE_ID 'Run repo-file-checker'
    - name: check3
      run: |
        echo 'STEP=Upload AIP to Curation Instance (Minerva)' >> $GITHUB_ENV
        ...perform the ingestion...
        vendor/bin/arche-update-redmine --token ${{ secrets.REDMINE_TOKEN }} --append "$RUN_URL" $REDMINE_ID 'Upload AIP to Curation Instance (Minerva)'
    - name: on failure
      if: ${{ failure() }}
      run: |
        vendor/bin/arche-update-redmine --token ${{ secrets.REDMINE_TOKEN }} --append "$RUN_URL" --statusCode 1 $REDMINE_ID "$STEP"

```
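
Outside GitHub Actions, e.g. in a plain shell script, the same pattern (remember the step name, run the step, report the outcome, report the remembered step on failure) can be sketched as below. `notify` and `run_step` are hypothetical helpers: `notify` only prints, and in a real run it could call `vendor/bin/arche-update-redmine` with the arguments used in the workflow above, adding `--statusCode 1` on failure:

```shell
#!/bin/sh
# notify STEP_NAME STATUS (hypothetical stand-in that only prints)
# A real version could call vendor/bin/arche-update-redmine instead,
# passing --statusCode 1 when STATUS is 1, as the on-failure step above does.
notify() {
  echo "step='$1' status=$2"
}

# run_step STEP_NAME COMMAND [ARGS...]
# Remembers the step name, runs the command and reports the outcome;
# returns 1 if the command failed.
run_step() {
  STEP="$1"
  shift
  if "$@"; then
    notify "$STEP" 0
  else
    notify "$STEP" 1
    return 1
  fi
}

# Demo: the second step fails, so the third one is never started.
run_step 'Virus scan' true &&
  run_step 'Run repo-file-checker' false &&
  run_step 'Upload AIP to Curation Instance (Minerva)' true
```

Chaining the steps with `&&` mirrors the workflow: after the first failure no further steps run, and the only report sent for that step is the failure one.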
