discoverygarden/islandora\_hocr
===============================

Islandora hOCR
==============

Introduction
------------

Adds the hOCR derivative functionality.

Installation
------------

Install as usual; see the [Drupal module installation documentation](https://www.drupal.org/docs/extending-drupal/installing-modules) for further information.

This module contains a migration that creates a media use term for use in common Islandora configurations. Enabling the module exposes the `islandora_hocr_media_uses` migration, which generates a media use term with the URI `https://discoverygarden.ca/use#hocr`.

```
# Flow might be something like:
drush en islandora_hocr
drush migrate:import islandora_hocr_media_uses
```

Configuration
-------------

Configuration is performed via environment variables.
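As a rough illustration of the environment-variable approach, the sketch below reads `ISLANDORA_HOCR_SNIPPETS` and falls back to the documented default of `20`. The helper name `hocr_snippet_count()` is hypothetical and not part of the module; how the module itself reads the variable is not shown here.

```php
<?php

/**
 * Hypothetical helper (not part of the module): resolve the snippet count.
 *
 * Falls back to 20 when ISLANDORA_HOCR_SNIPPETS is unset or is not a
 * non-negative integer.
 */
function hocr_snippet_count(): int {
  $raw = getenv('ISLANDORA_HOCR_SNIPPETS');
  // getenv() returns FALSE when the variable is not set.
  if ($raw === FALSE || preg_match('/^\d+$/', $raw) !== 1) {
    return 20;
  }
  return (int) $raw;
}
```

Setting the variable in the web server or container environment (for example `ISLANDORA_HOCR_SNIPPETS=35`) would then change the value returned by this helper.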

| Variable | Default | Description |
| --- | --- | --- |
| `ISLANDORA_HOCR_SNIPPETS` | `20` | Number of snippets per document to include in the response. |

### Derivatives

An action must be created and configured to generate an hOCR derivative. The action must also be triggered by a context in order for the derivative to be made. Refer to the [official Islandora docs](https://islandora.github.io/documentation/concepts/derivatives/) for more information.

### Solr

We expect to make use of the [Solr OCR Highlighting Plugin](https://dbmdz.github.io/solr-ocrhighlighting/). The particulars of its installation are ultimately up to the environment into which it is being installed.

We have a single environment variable to allow the path of the library on the Solr instance to be specified, such that we can add its path to the configset for Solr:

- `SOLR_HOCR_PLUGIN_PATH`: A path resolvable by Solr to the directory containing the OCR Highlighting Plugin JAR.
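To make the role of `SOLR_HOCR_PLUGIN_PATH` concrete, here is a minimal sketch of turning the variable into a `<lib/>` directive of the kind Solr accepts in `solrconfig.xml`. The function name is hypothetical and the module's actual mechanism for adding the path to the configset may differ. The sketch assumes that the directive is simply left out when the variable is unset.

```php
<?php

/**
 * Hypothetical helper (not part of the module): build a <lib/> directive.
 *
 * @return string|null
 *   The directive, or NULL when SOLR_HOCR_PLUGIN_PATH is unset or empty.
 */
function hocr_solr_lib_directive(): ?string {
  $path = getenv('SOLR_HOCR_PLUGIN_PATH');
  if ($path === FALSE || $path === '') {
    return NULL;
  }
  // Escape the path so it is safe inside an XML attribute.
  $dir = htmlspecialchars($path, ENT_QUOTES | ENT_XML1, 'UTF-8');
  // Load every JAR found in the given directory.
  return sprintf('<lib dir="%s" regex=".*\.jar" />', $dir);
}
```

The path must be resolvable from the Solr instance itself, not from the Drupal host, since Solr is what loads the JAR.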

There are a couple of config entities included:

- the `islandora_hocr` field type to perform tokenization
- the "Select w/ HOCR highlighting" `/select_ocr` request handler.

### HOCR Indexing

For `node` entities, we have added the ability to index HOCR from related media, making use of the [Solr OCR Highlighting Plugin](https://dbmdz.github.io/solr-ocrhighlighting/0.8.3/).

As an example, you might add the `islandora_hocr_field:content` property to be indexed in Solr via the Search API Solr config, as `islandora_hocr_field`, as a `Fulltext ("islandora_hocr")` field.

Something of an aside, but the `islandora_hocr_field:uri` property is presently prototypical. The Solr OCR Highlighting Plugin has another character filter which handles processing paths into the contents of the files; however, when components communicate over the network, such access might not always be possible, particularly should access control enter into the equation. As such, we presently expect the full page-level OCR document to be pushed for each page.

Usage
-----

Assuming indexing is configured as above, with an `islandora_hocr_field`, you might programmatically perform a Search API query with something like:

```
$index = \Drupal\search_api\Entity\Index::load('default_solr_index');
$query = $index->query();

// The search term(s).
$query->keys('bravo');
// Additional conditions, as desired.
$query->addCondition('type', 'islandora_object');
// Activate our highlighting behaviour.
$query->setOption('islandora_hocr_properties', [
  'islandora_hocr_field' => [],
]);

// Perform the query.
$results = $query->execute();

// Get the additionally-populated property info, so we can identify what fields from the highlighted results correspond to which property.
$info = $results->getQuery()->getOption('islandora_hocr_properties');
// This should be an associative array mapping language codes to Solr fields,
// which can then be found in the $highlights below.
$language_fields = $info['islandora_hocr_field']['language_fields'];

// Process each result item.
foreach ($results as $result) {
  // Highlighting info can be acquired from the items. The format here is the
  // same as the format from https://dbmdz.github.io/solr-ocrhighlighting/0.8.3/query/#response-format
  // for the given item/document.
  $highlights = $result->getExtraData('islandora_hocr_highlights');
}
```
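Once `$highlights` and `$language_fields` are in hand, the snippet text can be pulled out per language. The sketch below is an assumption-laden illustration: the function name is hypothetical, and the array shape (Solr field name mapping to a `snippets` list whose entries carry a `text` key) is inferred from the plugin's documented response format rather than confirmed against this module. The sample field name and text are made up.

```php
<?php

/**
 * Hypothetical helper (not part of the module): collect snippet text.
 *
 * @param array $highlights
 *   Assumed shape: [solr_field => ['snippets' => [['text' => ...], ...]]].
 * @param array $language_fields
 *   Associative array mapping language codes to Solr field names.
 *
 * @return array
 *   Language code => list of snippet strings; languages without snippets
 *   are omitted.
 */
function hocr_snippet_text(array $highlights, array $language_fields): array {
  $out = [];
  foreach ($language_fields as $langcode => $solr_field) {
    // Missing fields or missing 'snippets' keys are treated as empty.
    foreach ($highlights[$solr_field]['snippets'] ?? [] as $snippet) {
      if (isset($snippet['text'])) {
        $out[$langcode][] = $snippet['text'];
      }
    }
  }
  return $out;
}

// Illustrative data only.
$sample = [
  'ocr_en_field' => [
    'snippets' => [
      ['text' => 'alpha <em>bravo</em> charlie', 'score' => 1.5],
    ],
    'numTotal' => 1,
  ],
];
$texts = hocr_snippet_text($sample, ['en' => 'ocr_en_field', 'fr' => 'ocr_fr_field']);
```

Inside the loop above, this could be called as `hocr_snippet_text($highlights ?? [], $language_fields)`, guarding against items that carry no highlighting data.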

Troubleshooting/Issues
----------------------

Having problems or solved one? Contact [discoverygarden](http://support.discoverygarden.ca).

### Known issues

- [Solr Cloud Package](https://dbmdz.github.io/solr-ocrhighlighting/0.8.3/installation/#for-solrcloud-users-installation-as-a-solr-package) (in)compatibility: The path to the library could be omitted; however, the conditional inclusion of prefixes in the config entities is problematic.

Maintainers/Sponsors
--------------------

Current maintainers:

- [discoverygarden](http://www.discoverygarden.ca)

Sponsor:

- [CTDA: Connecticut Digital Archive](https://lib.uconn.edu/find/connecticut-digital-archive/)

License
-------

[GPLv3](http://www.gnu.org/licenses/gpl-3.0.txt)
