silverstripe/textextraction
===========================

Text Extraction API for SilverStripe CMS (mostly used with 'fulltextsearch' module)

5.0.1 (10mo ago) · 9183.8k ↓24.9% · 24 · [2 issues](https://github.com/silverstripe/silverstripe-textextraction/issues) · 5 · BSD-3-Clause · PHP · PHP ^8.3 · CI passing

Since Apr 30 · Pushed 5mo ago · 11 watchers

[Source](https://github.com/silverstripe/silverstripe-textextraction) · [Packagist](https://packagist.org/packages/silverstripe/textextraction) · [Docs](http://silverstripe.org) · [RSS](/packages/silverstripe-textextraction/feed) · Synced 1mo ago

Text extraction module
======================

[![CI](https://github.com/silverstripe/silverstripe-textextraction/actions/workflows/ci.yml/badge.svg)](https://github.com/silverstripe/silverstripe-textextraction/actions/workflows/ci.yml)[![Silverstripe supported module](https://camo.githubusercontent.com/9b7e93d393a01f6d3091fb30983b870aa863ef076858115faaa1c74b995854ec/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f73696c7665727374726970652d737570706f727465642d3030373143342e737667)](https://www.silverstripe.org/software/addons/silverstripe-commercially-supported-module-list/)

Provides a text extraction API for file content that can hook into different extractor engines, depending on which engines are available and on the format of the file being parsed. The output is always returned as a string of the file content.

Via the `FileTextExtractable` extension, this logic can be used to cache the extracted content on a `DataObject` subclass (usually `File`).

The module supports text extraction on the following file formats:

- HTML (built-in)
- PDF (with XPDF or Solr)
- Microsoft Word, Excel, PowerPoint (Solr)
- OpenOffice (Solr)
- CSV (Solr)
- RTF (Solr)
- EPUB (Solr)
- Many others (Tika)

Read more in the [documentation](https://docs.silverstripe.org/en/optional_features/text-extraction).
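
Because the engine used depends on what is available on the server, it can be useful to check for an external tool before relying on it. The following is a minimal sketch, assuming XPDF exposes its PDF-to-text converter as a `pdftotext` command on the `PATH`. The helper names `have_cmd` and `report_tool` are hypothetical and are not part of this module.

```shell
#!/bin/sh
# have_cmd NAME: succeeds if NAME resolves to a command on PATH.
# (Hypothetical helper for illustration, not part of the module.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# report_tool NAME: prints one line saying whether NAME is available.
report_tool() {
    if have_cmd "$1"; then
        echo "$1: available"
    else
        echo "$1: not found"
    fi
}

# Assumption: XPDF provides its converter under the name `pdftotext`.
report_tool pdftotext
```

Pass any other command name to `report_tool` to check for a different tool in the same way.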

Installation
------------

```
composer require silverstripe/textextraction
```

Bugtracker
----------

Bugs are tracked in the issues section of this repository. Before submitting an issue please read over existing issues to ensure yours is unique.

If the issue does look like a new bug:

- Create a new issue
- Describe the steps required to reproduce your issue, and the expected outcome. Unit tests, screenshots and screencasts can help here.
- Describe your environment in as much detail as possible: Silverstripe version, browser, PHP version, operating system, and any installed Silverstripe modules.

Please report security issues to  directly. Please don't file security issues in the bugtracker.

Development and contribution
----------------------------

If you would like to contribute to the module, please raise a pull request and discuss it with the module maintainers.

### Health Score

- Overall: 61 (Fair), better than 99% of packages
- Maintenance: 59. Moderate activity, may be stable
- Popularity: 43. Moderate usage in the ecosystem
- Community: 37. Small or concentrated contributor base
- Maturity: 92. Battle-tested with a long release history
- Bus Factor: 3. 3 contributors hold 50%+ of commits

How is this calculated?

**Maintenance (25%)**: Last commit recency, latest release date, and issue-to-star ratio. Uses a 2-year decay window.

**Popularity (30%)**: Total and monthly downloads, GitHub stars, and forks. Logarithmic scaling prevents top-heavy scores.

**Community (15%)**: Contributors, dependents, forks, watchers, and maintainers. Measures real ecosystem engagement.

**Maturity (30%)**: Project age, version count, PHP version support, and release stability.

### Release Activity

- Cadence: every ~70 days (recently: every ~33 days)
- Total: 56
- Last Release: 167d ago

Major Versions

- 3.4.x-dev → 4.0.0-beta1 (2023-01-22)
- 3.5.0 → 4.0.0-rc1 (2023-03-29)
- 3.x-dev → 4.0.0 (2023-04-27)
- 4.1.1 → 5.0.0-alpha1 (2024-12-02)
- 4.x-dev → 5.0.1 (2025-07-23)

PHP version history (5 changes)

- 2.0.0: PHP >=5.3.2
- 3.3.0: PHP ^7.3 || ^8.0
- 3.4.0: PHP ^7.4 || ^8.0
- 4.0.0-beta1: PHP ^8.1
- 5.0.0-alpha1: PHP ^8.3

### Community

Maintainers

![](https://avatars.githubusercontent.com/u/654636?v=4)[Aaron Carlino](/maintainers/unclecheese)[@unclecheese](https://github.com/unclecheese)

![](https://www.gravatar.com/avatar/b0cba8b534e20e6ab4fff555a97b237a18436ebca1446fc0b29c8a8b504038b9?d=identicon)[GuySartorelli](/maintainers/GuySartorelli)

![](https://avatars.githubusercontent.com/u/111025?v=4)[Ingo Schommer](/maintainers/chillu)[@chillu](https://github.com/chillu)

![](https://www.gravatar.com/avatar/a25bc04c5720a36869d5a39c6449dde7eb43e19b7c8e666d5f632d6a9ab440b1?d=identicon)[emteknetnz](/maintainers/emteknetnz)

![](https://www.gravatar.com/avatar/afbb3dcc9ef29c1a6eedd6addcae5fce9ab1271915a85a4c349301b71237368d?d=identicon)[silverstripe-machine01](/maintainers/silverstripe-machine01)

![](https://www.gravatar.com/avatar/be6648e60fbab6f70bfc34dd8c14259562d28a47510a934ea9c01fe98633f3c2?d=identicon)[sminnee](/maintainers/sminnee)

![](https://avatars.githubusercontent.com/u/1168676?v=4)[Maxime Rainville](/maintainers/maxime-rainville)[@maxime-rainville](https://github.com/maxime-rainville)

---

Top Contributors

- [robbieaverill](https://github.com/robbieaverill) (49 commits)
- [emteknetnz](https://github.com/emteknetnz) (43 commits)
- [GuySartorelli](https://github.com/GuySartorelli) (38 commits)
- [chillu](https://github.com/chillu) (30 commits)
- [dhensby](https://github.com/dhensby) (13 commits)
- [github-actions[bot]](https://github.com/github-actions[bot]) (12 commits)
- [NightJar](https://github.com/NightJar) (4 commits)
- [assertchris](https://github.com/assertchris) (4 commits)
- [michalkleiner](https://github.com/michalkleiner) (3 commits)
- [camfindlay](https://github.com/camfindlay) (2 commits)
- [ichaber](https://github.com/ichaber) (2 commits)
- [sabina-talipova](https://github.com/sabina-talipova) (2 commits)
- [ScopeyNZ](https://github.com/ScopeyNZ) (1 commit)
- [ishannz](https://github.com/ishannz) (1 commit)
- [jakedaleweb](https://github.com/jakedaleweb) (1 commit)
- [jnv](https://github.com/jnv) (1 commit)
- [lozcalver](https://github.com/lozcalver) (1 commit)
- [martinhipp](https://github.com/martinhipp) (1 commit)

---

Tags

hacktoberfest, pdf, silverstripe, fulltext

### Code Quality

- Tests: PHPUnit
- Code Style: PHP_CodeSniffer

### Alternatives

- [silverstripe/framework](/packages/silverstripe-framework): The SilverStripe framework
- [silverstripe/cms](/packages/silverstripe-cms): The SilverStripe Content Management System
- [silverstripe/admin](/packages/silverstripe-admin): SilverStripe admin interface
- [aspose-cloud/aspose-words-cloud](/packages/aspose-cloud-aspose-words-cloud): Open, generate, edit, split, merge, compare and convert Word documents. Integrate Cloud API into your solutions to manipulate documents. Convert PDF to Word (DOC, DOCX, ODT, RTF and HTML) and in the opposite direction.
- [symbiote/silverstripe-pdfrendition](/packages/symbiote-silverstripe-pdfrendition): A module that makes use of the Flying Saucer XHTML renderer project to create PDFs from XHTML pages.
- [yetiforce/yetiforcepdf](/packages/yetiforce-yetiforcepdf): Library that generate pdf files from html.

