bluem/teishredder
=================

Indexing and retrieval of data from TEI-XML documents

Version 1.0 · BSD-2 license · PHP >=5.3.0

[Source](https://github.com/BlueM/TEIShredder) · [Packagist](https://packagist.org/packages/bluem/teishredder)


TEIShredder Overview
====================

What is it?
-----------

TEIShredder is a set of PHP classes for indexing TEI XML documents and retrieving information on the text structure (physical and logical), contained elements, named entities etc. The information extracted from the source document is saved in a relational database, i.e. it is a form of XML shredding, hence the name.

TEIShredder is based on code that was written for a scholarly project called “Sandrart.net” ([www.sandrart.net](http://www.sandrart.net), a cooperation project between Goethe-Universität Frankfurt am Main, Germany, and the Kunsthistorisches Institut, Florence, Italy, funded by the Deutsche Forschungsgemeinschaft \[DFG\]), but was modified to make it a stand-alone project/library. Meanwhile, the project’s original code has been dropped in favor of TEIShredder.

System requirements
-------------------

- PHP 5.3 with the standard extensions enabled
- Relational database; tested with MySQL, PostgreSQL and SQLite. CREATE statements for these three databases can be found in the “create-\*.sql” files.

Using it
========

Getting started
---------------

For a first quick test, open a shell on a Unix-oid system (Mac OS X, Linux, BSD, …), “cd” to the top TEIShredder directory and execute …

```
sed 's/<prefix>//g' create-sqlite.sql | sqlite3 test.sqlite
```

…, which means: “Take the contents of ‘create-sqlite.sql’ in this directory, remove &lt;prefix&gt; from the tables’ names and create an empty SQLite database called ‘test.sqlite’ in this directory which contains these tables”.

Then you can run “test.php”, which takes an input XML file from the “test” directory, indexes it and saves the result in that database. It then displays information on the data that has been collected, for instance the number and titles of the volumes in the TEI document, occurrences of sections, named entities etc. (If you are more familiar with MySQL or PostgreSQL, or don’t have an sqlite3 executable at hand, you can of course use MySQL/PgSQL instead by changing the PDO constructor in “test.php”.)
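As a rough sketch of what switching databases could look like: the DSN formats below are the standard PDO ones, but the host, database name and credentials are placeholders, and the helper function `exampleDsn()` is hypothetical. It is not part of TEIShredder or “test.php”.

```php
<?php
// Hypothetical helper (not part of TEIShredder): builds a PDO DSN string
// for one of the three databases TEIShredder has been tested with.
function exampleDsn($driver, $database, $host = 'localhost')
{
    switch ($driver) {
        case 'sqlite':
            // $database is the path to the SQLite file, e.g. "test.sqlite"
            return 'sqlite:' . $database;
        case 'mysql':
            return 'mysql:host=' . $host . ';dbname=' . $database;
        case 'pgsql':
            return 'pgsql:host=' . $host . ';dbname=' . $database;
        default:
            throw new InvalidArgumentException('Unsupported driver: ' . $driver);
    }
}

// SQLite needs no credentials:
//   $db = new PDO(exampleDsn('sqlite', 'test.sqlite'));
// MySQL and PostgreSQL take a user name and password (placeholders here):
//   $db = new PDO(exampleDsn('mysql', 'teishredder'), 'user', 'secret');
//   $db = new PDO(exampleDsn('pgsql', 'teishredder'), 'user', 'secret');
echo exampleDsn('sqlite', 'test.sqlite'), "\n";
```

For MySQL or PostgreSQL, the tables would have to be created from the matching “create-\*.sql” file first, with the prefix placeholder handled in the same way as for SQLite.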

If you like, you can now inspect the database’s contents. For instance, you can view the elements that were indexed by executing ...

```
sqlite3 test.sqlite 'SELECT * FROM element'
```

... at the shell.

TEI != TEI
----------

TEI can be used in many different ways. In my eyes, this is one of the very appealing features of TEI, but on the other hand, it makes developing generic tools much harder or even impossible. TEIShredder is a generic tool insofar as it processes plain TEI, but it has certain expectations of the markup. Therefore, TEIShredder will most likely not be able to process your unmodified TEI document; you may need to pre-process the document (for instance, using XSLT or [XMLTransformer](https://github.com/BlueM/XMLTransformer)) to match these expectations.

Conventions / expectations
--------------------------

- If there are multiple volumes, each one must be enclosed by a &lt;text&gt; block inside a &lt;group&gt; element.
- The main title of a volume is enclosed by a &lt;titlePart&gt; element. It is expected that each volume has a title, i.e. has a &lt;titlePart&gt; element.
- There must not be more than one &lt;titlePart&gt; element in each volume. If there are two or more, you should pre-process the XML and/or subclass the chunker class to make it ignore the unwanted &lt;titlePart&gt; elements when indexing.
- Text structure is encoded by nested &lt;div&gt; elements with &lt;head&gt; containing the section title.
- There is no special handling of columns, but only generic handling of &lt;milestone /&gt; elements. As the TEI Lite documentation suggests, columns should be encoded as &lt;milestone unit="column" \[n="..."\] /&gt;. Whenever TEIShredder encounters a &lt;milestone /&gt; element (regardless of whether it represents a column or some other change in a reference system), the values of @unit and @n (concatenated by "-", if both are present) will be saved together with the XML segment that follows this element.
- TEIShredder expects any element that should be indexed to have an @xml:id attribute, which means that elements without one will not be indexed. (Indexing such an element would be useless, as it could not be addressed, anyway.)
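To illustrate the last two conventions, here is a minimal sketch using PHP’s DOM extension on a made-up TEI fragment. It is not TEIShredder’s implementation: the function names `milestoneLabel()` and `indexableElements()` and the sample XML are assumptions for illustration only.

```php
<?php
// Made-up TEI fragment: a <div> and a <p> with @xml:id, a <p> without one,
// and two column milestones (one with @n, one without).
$xml = <<<XML
<text>
  <body>
    <div xml:id="sec-1">
      <head>First section</head>
      <milestone unit="column" n="2"/>
      <p>Not indexed: no xml:id.</p>
      <p xml:id="p-1">Indexed paragraph.</p>
      <milestone unit="column"/>
    </div>
  </body>
</text>
XML;

// Joins @unit and @n with "-" if both are present.
// 'strlen' is used as the filter so that a value such as n="0" is kept.
function milestoneLabel(DOMElement $milestone)
{
    $parts = array_filter(
        array($milestone->getAttribute('unit'), $milestone->getAttribute('n')),
        'strlen'
    );
    return implode('-', $parts);
}

// Returns a map of xml:id => element name for all elements carrying @xml:id.
// Elements without the attribute are skipped, as they could not be addressed.
function indexableElements(DOMDocument $doc)
{
    $xmlNs = 'http://www.w3.org/XML/1998/namespace';
    $result = array();
    foreach ($doc->getElementsByTagName('*') as $element) {
        if ($element->hasAttributeNS($xmlNs, 'id')) {
            $result[$element->getAttributeNS($xmlNs, 'id')] = $element->localName;
        }
    }
    return $result;
}

$doc = new DOMDocument();
$doc->loadXML($xml);

print_r(indexableElements($doc));
foreach ($doc->getElementsByTagName('milestone') as $milestone) {
    echo milestoneLabel($milestone), "\n";
}
```

In this fragment, only the &lt;div&gt; and the second &lt;p&gt; would qualify for indexing, and the two milestones yield the labels “column-2” and “column”.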

Database schema and performance
-------------------------------

The statements in the “create-\*.sql” files are only a guess regarding what might work for you. For instance, if you have named entities in a TEI document, but the identifiers are strictly numeric, it might help to set the datatype of column “identifier” in the entity table to an integer. And, as usual, indexes are extremely important. If you find that a query runs longer than, let’s say, 20 or 30 milliseconds, you should take a close look at your database’s EXPLAIN output for the underlying SQL query. Also, you might want to add foreign key constraints (for example between page.volume and volume.number) if you think your application might benefit from it.
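The effect of an index on the query plan can be checked with a few lines of PHP. The following is only a sketch against an in-memory SQLite database: the table is a simplified stand-in, not the schema from the “create-\*.sql” files. Only the table name “entity” and the column “identifier” are taken from the text above, while the “xmlid” column, the index name and the helper `queryPlan()` are made up for this example.

```php
<?php
// In-memory SQLite database with a simplified stand-in for the entity table.
// "xmlid" and the index name are assumptions, not the real schema.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE entity (xmlid TEXT, identifier INTEGER)');

// Returns the "detail" column of SQLite's EXPLAIN QUERY PLAN output.
function queryPlan(PDO $db, $sql)
{
    $details = array();
    foreach ($db->query('EXPLAIN QUERY PLAN ' . $sql) as $row) {
        $details[] = $row['detail'];
    }
    return implode("\n", $details);
}

$sql = 'SELECT * FROM entity WHERE identifier = 42';

// Without an index, SQLite has to scan the whole table.
$before = queryPlan($db, $sql);

// With an index on the filtered column, it can search the index instead.
$db->exec('CREATE INDEX entity_identifier_idx ON entity (identifier)');
$after = queryPlan($db, $sql);

echo "Before: $before\nAfter:  $after\n";
```

MySQL and PostgreSQL use a plain `EXPLAIN` with a different output format, so the helper above would need to be adapted for them.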

Maintainer: Carsten Blüm ([@BlueM](https://github.com/BlueM))
