
acdh-oeaw/repo-file-checker
===========================

Script for checking if files fulfill ARCHE repository ingestion requirements

Latest version: 3.19.0 · License: MIT · Requires PHP ^8.1 · [Source](https://github.com/acdh-oeaw/repo-file-checker) · [Packagist](https://packagist.org/packages/acdh-oeaw/repo-file-checker)

repo-filechecker
================


[![Latest Stable Version](https://camo.githubusercontent.com/8d6b733dda9befc3c38833a1ad06c4020c75517ea488fc3ad7a5be948267269a/68747470733a2f2f706f7365722e707567782e6f72672f616364682d6f6561772f7265706f2d66696c652d636865636b65722f762f737461626c65)](https://packagist.org/packages/acdh-oeaw/repo-file-checker)[![Build Status](https://github.com/acdh-oeaw/repo-file-checker/actions/workflows/test.yml/badge.svg?branch=master)](https://github.com/acdh-oeaw/repo-file-checker/actions/workflows/test.yml)[![Coverage Status](https://camo.githubusercontent.com/031d8918a7e66d230dd2ff4708f4b0f85e4bfcf16aadf928af7dc2ebb3a7e38d/68747470733a2f2f636f766572616c6c732e696f2f7265706f732f6769746875622f616364682d6f6561772f7265706f2d66696c652d636865636b65722f62616467652e7376673f6272616e63683d6d6173746572)](https://coveralls.io/github/acdh-oeaw/repo-file-checker?branch=master)[![License](https://camo.githubusercontent.com/ea34a140e5312d7dcc49575614607edcbabd7494c5aa6cc0826340af84423aba/68747470733a2f2f706f7365722e707567782e6f72672f616364682d6f6561772f7265706f2d66696c652d636865636b65722f6c6963656e7365)](https://packagist.org/packages/acdh-oeaw/repo-file-checker)

Functionality
-------------


- Analyzes the data structure and creates a json/ndjson output providing:
    - Files list
    - Directory list
    - File type list
    - Errors list
- Can also create HTML reports from the generated JSON file.
- When run as a docker container, performs antivirus check on files.

### Implemented error checks


- File and directory names don't contain forbidden characters.
- File extension matches the MIME type detected from the file content (the MIME-extension mapping is based on the [PRONOM database](http://www.nationalarchives.gov.uk/aboutapps/pronom), with some tuning because content-based MIME type recognition is not fully reliable).
- MIME type of a file must be accepted by ARCHE (as reported by [arche-assets](https://github.com/acdh-oeaw/arche-assets/)).
- Text files don't contain the [byte order mark](https://en.wikipedia.org/wiki/Byte_order_mark).
- [BagIt](https://en.wikipedia.org/wiki/BagIt) archives are correct (based on checks performed by the [whikloj/bagittools](https://github.com/whikloj/BagItTools) library; BagIt archives can be uncompressed or zip/tar gz/tar bz2 files).
- ZIP, XLSX, DOCX, ODS, ODT and PDF files aren't password protected.
    - To avoid memory limit problems only files up to a configuration-determined size are checked.
- XML files provide an XML declaration and a schema declaration, and validate against the schema.
- Image files aren't corrupted.
- No duplicated files (compared by hash).
- No filename conflicts on case-insensitive filesystems.
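
Two of the checks above are simple enough to reproduce by hand when investigating a report. The sketch below is only an illustration and not the filechecker's own implementation: the function names `has_utf8_bom` and `find_case_conflicts` are hypothetical, and the BOM test assumes UTF-8 (other encodings use different marker bytes).

```shell
# Hypothetical helper: succeeds if the file starts with the UTF-8 byte order mark (EF BB BF).
has_utf8_bom() {
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# Hypothetical helper: reads one path per line from stdin and prints
# (in lower case) every path that occurs more than once when letter case is ignored.
find_case_conflicts() {
    tr '[:upper:]' '[:lower:]' | sort | uniq -d
}

# Example usage (the directory name is a placeholder):
# find ./data -type f | find_case_conflicts
```

An empty output from `find_case_conflicts` means no two paths differ only by letter case.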

Installation
------------


### Locally


The filechecker depends on the presence of some external tools in your system (e.g. gdal), so running it locally can be a painful experience. If you want to try:

- Install PHP and [composer](https://getcomposer.org/)
- Run:
    ```
    composer require acdh-oeaw/repo-file-checker
    ```
- Install any other missing software based on errors you get while running the filechecker.

### As a docker image


- Install [docker](https://www.docker.com/).

### On ACDH Cluster


Nothing to be done. It is installed there already.

Usage
-----


### General remarks


- You can test whether the check was successful by reading the exit code of the `arche-filechecker` command. `0` indicates a successful check; a non-zero value indicates that at least one error was found.
- To get a list of all available parameters run:
    ```
    vendor/bin/arche-filechecker --help
    ```
- If you have [bagit](https://en.wikipedia.org/wiki/BagIt) files, place them into a folder called `bagit` and also compress them into a tgz file.
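
The exit code convention described above can be used in a script as follows. This is a minimal sketch: `run_check` is a hypothetical wrapper, not part of the filechecker, and the directories in the usage comment are placeholders.

```shell
# Hypothetical wrapper: runs the given command and reports the outcome
# based on its exit code (0 = no errors found, non-zero = at least one error).
run_check() {
    if "$@"; then
        echo "check passed"
    else
        local status=$?
        echo "check failed (exit code ${status})"
        return "${status}"
    fi
}

# Example usage:
# run_check vendor/bin/arche-filechecker --csv --html ./data ./reports
```

Because the wrapper returns the original exit code, it can be used directly in a CI step or chained with `&&`.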

### On ACDH cluster


First, open the arche-ingestion workload console as described [here](https://github.com/acdh-oeaw/arche-ingest/blob/master/docs/acdh-cluster.md).

Then:

- filechecker
    ```
    arche-filechecker --csv --html directoryToBeProcessed directoryToWriteReportsInto
    ```
- virus scan
    ```
    clamscan --infected --recursive directoryToScan
    ```

### Locally


```
vendor/bin/arche-filechecker --csv --html directoryToBeProcessed directoryToWriteReportsInto
```

### As a docker container


- Consider downloading fresh signatures for the antivirus software.
    - If you're running inside a CI/CD workflow, store the downloaded database in a cache so that you don't put unnecessary load on the server hosting the signatures. For example, on GitHub Actions you can update the database using the following build steps:
        ```
        - name: cache AV database
          id: avdb
          uses: actions/cache@v4
          with:
            path: ~/avdb
            key: constant
        - name: refresh AV database
          run: |
            chmod 777 ~/avdb
            docker run --rm -v ~/avdb:/var/lib/clamav --entrypoint freshclam acdhch/arche-ingest --foreground
        ```
    - On localhost (just adjust the path to the directory with the virus signatures):
        ```
        mkdir -p -m 777 ~/avdb
        docker run --rm -v ~/avdb:/var/lib/clamav --entrypoint freshclam acdhch/arche-ingest --foreground
        ```
- To run a virus check:
    ```
    docker run \
      --rm \
      -v pathToVirusSignaturesDirectory:/var/lib/clamav \
      -v pathToDirectoryToBeProcessed:/data \
      --entrypoint clamscan \
      acdhch/arche-ingest \
      --recursive --infected /data
    ```

    e.g.
    ```
    docker run \
      --rm \
      -v ~/avdb:/var/lib/clamav \
      -v `pwd`:/data \
      --entrypoint clamscan \
      acdhch/arche-ingest \
      --recursive --infected /data
    ```
- To run the filechecker:
    ```
    docker run \
      --rm -u $UID \
      -v pathToDirectoryToBeProcessed:/data \
      -v pathToReportsDir:/reports \
      --entrypoint arche-filechecker \
      acdhch/arche-ingest \
      --csv --html /data /reports
    ```

    e.g.
    ```
    docker run \
      --rm --user $UID \
      -v /ARCHE/staging/testWollmilchsau/checkReports:/reports \
      -v /ARCHE/staging/testWollmilchsau/data:/data \
      --entrypoint arche-filechecker \
      acdhch/arche-ingest \
      --csv --html /data /reports
    ```
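
Docker interprets a `-v` source such as `data` (a bare name without a leading slash) as a named volume rather than a directory, so the mount paths should be absolute. The sketch below shows one way to build the filechecker command from relative directory names. `filechecker_cmd` is a hypothetical helper that only prints the command for inspection; it does not run docker and does not quote paths containing spaces.

```shell
# Hypothetical helper: resolves both directories to absolute paths and prints
# the docker command documented above. Fails if either directory does not exist.
filechecker_cmd() {
    local data reports
    data=$(cd "$1" && pwd) || return 1
    reports=$(cd "$2" && pwd) || return 1
    printf 'docker run --rm -u %s -v %s:/data -v %s:/reports --entrypoint arche-filechecker acdhch/arche-ingest --csv --html /data /reports\n' \
        "$(id -u)" "$data" "$reports"
}

# Example usage (directory names are placeholders):
# filechecker_cmd ./data ./reports
```

Printing the command first makes it easy to verify the mounts before starting a long-running check.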

Remarks:

- If you're processing data in parts, you can save some time by running the container in daemonized mode. That way you avoid loading the virus signatures database, which takes 2-5 seconds, every time you run the check. In the daemonized setup:
    - Run the container with:
        ```
        docker run \
          --rm -d \
          --name filechecker \
          -v `pwd`/MY_REPORTS_DIR:/reports \
          -v `pwd`/MY_DATA_DIR:/data \
          -v ~/.cvdupdate/database/:/var/lib/clamav \
          -e DAEMONIZE=1 \
          acdhch/arche-filechecker
        ```
    - Wait a few seconds for the AV software to load the virus database (you can look at the docker logs to check if it's ready).
    - Perform the checks with:
        ```
        # virus check
        docker exec filechecker clamdscan --infected --recursive /data
        # filechecker check
        docker exec --user $UID filechecker /opt/filechecker/bin/arche-filechecker --csv --html /data /reports
        ```
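
Instead of waiting a fixed number of seconds for the daemon, a script can poll until a probe command succeeds. The following is a minimal sketch: `wait_until` is a hypothetical helper, and the choice of probe is an assumption. Any command that fails until the antivirus daemon accepts connections would do, for example a `clamdscan` run on a single small file.

```shell
# Hypothetical helper: retries a command until it succeeds.
# Usage: wait_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as the command succeeds, 1 if all attempts fail.
wait_until() {
    local attempts=$1 delay=$2 i=0
    shift 2
    while [ "$i" -lt "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example usage (the probed file is a placeholder):
# wait_until 30 1 docker exec filechecker clamdscan /data/someSmallFile
```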

Test Files:
-----------


Test files are stored in the `tests/data` folder.
