ttree/contentinsight
====================

Tools to extract basic content inventory information from an existing website

[Source](https://github.com/ttreeagency/ContentInsight) | [Packagist](https://packagist.org/packages/ttree/contentinsight)

Ttree Content Insight
=====================

This TYPO3 Flow package provides a CLI tool to extract a content inventory, as CSV, from an existing website.

This package is under development and is considered beta. It requires Flow 2.3.

Features
--------

- Extract website structure and basic metadata
- Support crawling presets
- Flexible report building (includes a CSV report builder, and you can register your own report builder)
- Skip URIs with regular expressions
- Sort inventory based on document tree structure

Todos
-----

- Generate human-readable page IDs (such as 1, 1.1, 1.2, 2, 2.1, 2.2, ...)
- Update report / multiple index support
- Get analytics data from Google Analytics

Configuration
-------------

See `Configuration/Settings.yaml` for the detailed configuration.

By default, this package caches all raw HTTP requests for one day. You can change this setting in your own `Settings.yaml` and `Caches.yaml`.

Base Preset
-----------

The base preset is automatically merged with all presets. You can enable or disable any property with the setting `presets.[preset_name].properties.[property_name].enabled`.

```
Ttree:
  ContentInsight:
    presets:
      '*':
        properties:
          'pageTitle':
            enabled: TRUE
          'navigationTitle':
            enabled: TRUE
```
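As a rough illustration of what merging the base preset into a named preset could look like, the following Python sketch overlays a named preset onto the `'*'` preset with a recursive dictionary merge. The `merge_presets` helper, the sample `custom_preset`, and the precedence rule (the named preset wins) are assumptions made for illustration; the package itself is written in PHP and its actual merge logic is not shown in this README.

```python
from copy import deepcopy


def merge_presets(base: dict, override: dict) -> dict:
    """Recursively overlay `override` on `base` (hypothetical helper).

    Assumption: values from the named preset win over the base preset.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_presets(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# The '*' base preset from the configuration above.
base_preset = {
    "properties": {
        "pageTitle": {"enabled": True},
        "navigationTitle": {"enabled": True},
    }
}

# A hypothetical named preset: disables one property, enables another.
custom_preset = {
    "properties": {
        "navigationTitle": {"enabled": False},
        "metaDescription": {"enabled": True},
    }
}

effective = merge_presets(base_preset, custom_preset)
enabled = sorted(
    name for name, options in effective["properties"].items() if options.get("enabled")
)
print(enabled)
```

In this sketch the base preset is left untouched, so the same `'*'` definition can be merged into any number of named presets.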

Custom Preset
-------------

You can define custom presets to crawl different kinds of information. With the `class` setting you can use your own processor implementation to get information from the current URI. Your processor must implement `Ttree\ContentInsight\CrawlerProcessor\ProcessorInterface`:

```
Ttree:
  ContentInsight:
    presets:
      'custom':
        properties:
          'pageTitle':
            class: 'Your\Package\CrawlerProcessor\PageTitleProcessor'
          'metaDescription':
            enabled: TRUE
          'metaKeywords':
            enabled: TRUE
          'firstLevelHeader':
            enabled: TRUE
```

How to build a report?
----------------------

The package supports CSV reporting, but you can register your own report builder. Check the `Settings.yaml`:

```
Ttree:
  ContentInsight:
    presets:
      'custom':
        reportConfigurations:
          'csv':
            enabled: TRUE
            renderType: 'Csv'
            renderTypeOptions:
              displayColumnHeaders: TRUE
            reportPath: '%FLOW_PATH_DATA%Reports/Ttree.ContentInsight'
            reportPrefix: 'content-inventory-report'
            properties:
              'id':
                label: 'ID'
              'pageTitle':
                label: 'Page Title'
              'navigationTitle':
                label: 'Navigation Title'
              'externalLink':
                label: 'External Link'
                postProcessor: 'Boolean'
              'currentUri':
                label: 'URL'
              'metaDescription':
                label: 'Meta Description'
              'metaKeywords':
                label: 'Meta Keywords'
              'firstLevelHeaderCount':
                label: 'Main Header Count (H1)'
              'firstLevelHeaderContent':
                label: 'Main Header Content (H1)'
              'remark':
                label: 'Crawling Remark'
```

The keys in the `properties` section must match the keys produced by the `CrawlerProcessor` objects.
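The key-matching rule can be pictured with a short Python sketch: each configured property key is looked up in the record produced for a page, and its `label` becomes the column header. The `build_csv` function, the sample record, and the choice to leave unmatched keys empty are assumptions for illustration, not the package's CSV report builder.

```python
import csv
import io


def build_csv(records, properties, display_column_headers=True):
    """Render crawled records as CSV text (illustrative only).

    `properties` maps a property key to its options, mirroring the
    `properties` section of a report configuration.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    if display_column_headers:
        writer.writerow([options["label"] for options in properties.values()])
    for record in records:
        # Assumption: a key the crawler did not produce yields an empty cell.
        writer.writerow([record.get(key, "") for key in properties])
    return buffer.getvalue()


report_properties = {
    "pageTitle": {"label": "Page Title"},
    "currentUri": {"label": "URL"},
    "metaDescription": {"label": "Meta Description"},
}

# Hypothetical crawler output for one page; 'metaDescription' is missing.
crawled = [{"pageTitle": "Products", "currentUri": "http://www.domain.com/products"}]

report = build_csv(crawled, report_properties)
print(report)
```

A property key that no processor produces simply ends up as an empty column here, which is why mismatched keys are easy to overlook.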

The position of each column can be specified with the following syntax: `position: '<position-string>'`. The `<position-string>` supports one of the following forms:

```
    start (<weight>)
    end (<weight>)
    before <key> (<weight>)
    after <key> (<weight>)
```
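A simplified way to picture how such position strings could order the report columns is sketched below in Python. It assumes that `before` and `after` are followed by the key of another column, and it ignores anything in parentheses. The `order_columns` function is hypothetical and is not the sorter used by the package or by Flow.

```python
def order_columns(properties):
    """Return property keys in display order (simplified, hypothetical sorter).

    Supported position strings: 'start', 'end', 'before <key>', 'after <key>'.
    Anything after those tokens, such as a value in parentheses, is ignored.
    """
    start, middle, end, relative = [], [], [], []
    for key, options in properties.items():
        tokens = options.get("position", "").split()
        if not tokens:
            middle.append(key)  # no position: keep the declared order
        elif tokens[0] == "start":
            start.append(key)
        elif tokens[0] == "end":
            end.append(key)
        elif tokens[0] in ("before", "after") and len(tokens) > 1:
            relative.append((key, tokens[0], tokens[1]))
        else:
            raise ValueError(f"Unsupported position for {key!r}: {options['position']!r}")

    ordered = start + middle + end
    for key, keyword, target in relative:
        # Limitation of this sketch: the target column must already be placed.
        index = ordered.index(target)
        ordered.insert(index if keyword == "before" else index + 1, key)
    return ordered


columns = order_columns({
    "pageTitle": {"label": "Page Title", "position": "after id"},
    "currentUri": {"label": "URL"},
    "id": {"label": "ID", "position": "start"},
    "remark": {"label": "Crawling Remark", "position": "end"},
})
print(columns)
```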

### Example

```
Ttree:
  ContentInsight:
    presets:
      'custom':
        reportConfigurations:
          'csv':
            enabled: TRUE
            renderType: 'Csv'
            renderTypeOptions:
              displayColumnHeaders: TRUE
            reportPath: '%FLOW_PATH_DATA%Reports/Ttree.ContentInsight'
            reportPrefix: 'content-inventory-report'
            properties:
              'id':
                label: 'ID'
                position: ''
              'pageTitle':
                label: 'Page Title'
                position: ''
```

For a single crawling preset you can register multiple reports if required. For each property you can register a post processor if you need to manipulate the property in the report; see `BooleanPostProcessor` for a basic example.
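The idea of a post processor can be sketched in a few lines of Python: a named function transforms a raw property value just before it is written to the report. The registry, the `render_cell` helper, and the "Yes"/"No" output are assumptions for illustration; the README does not document what `BooleanPostProcessor` actually returns.

```python
def boolean_post_processor(value):
    """Render a truthy/falsy value as text (assumed behaviour, for illustration)."""
    return "Yes" if value else "No"


# Hypothetical registry keyed by the `postProcessor` name used in the settings.
POST_PROCESSORS = {"Boolean": boolean_post_processor}


def render_cell(value, options):
    """Apply the configured post processor, if any, to a single report value."""
    name = options.get("postProcessor")
    if name is None:
        return value
    return POST_PROCESSORS[name](value)


print(render_cell(True, {"label": "External Link", "postProcessor": "Boolean"}))
print(render_cell("Products", {"label": "Page Title"}))
```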

How to skip specific URIs?
--------------------------

You can define invalid URI patterns in your crawling presets:

```
Ttree:
  ContentInsight:
    presets:
      'custom':
        invalidUriPatterns:
          'javascript':
            pattern: '@^javascript\:void\(0\)$@'
          'mailto':
            pattern: '@^mailto\:.*@'
          'anchor':
            pattern: '@^#.*@'
            message: 'Link to anchor'
```

If the pattern has a `message`, all URLs matching the pattern will be logged. By default the crawler skips those URLs silently.
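To make the skip-and-log behaviour concrete, here is a small Python sketch that checks a URI against the three patterns above. The `@` delimiters from the PHP-style patterns are dropped because Python's `re` module does not use delimiters. The `check_uri` helper and its return shape are hypothetical and are not part of the package.

```python
import re

# Patterns from the preset above, without the '@' delimiters.
INVALID_URI_PATTERNS = {
    "javascript": {"pattern": r"^javascript\:void\(0\)$"},
    "mailto": {"pattern": r"^mailto\:.*"},
    "anchor": {"pattern": r"^#.*", "message": "Link to anchor"},
}


def check_uri(uri, patterns=INVALID_URI_PATTERNS):
    """Return (skip, message) for a URI (hypothetical helper).

    `skip` is True when any pattern matches; `message` is the text to log,
    or None when the URI should be skipped silently.
    """
    for options in patterns.values():
        if re.search(options["pattern"], uri):
            return True, options.get("message")
    return False, None


for uri in ["mailto:info@domain.com", "#top", "http://www.domain.com/products"]:
    skip, message = check_uri(uri)
    if skip and message:
        print(f"skip and log: {uri} ({message})")
    elif skip:
        print(f"skip without logging: {uri}")
    else:
        print(f"crawl: {uri}")
```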

Usage
-----

To get the complete website inventory:

```
# flow contentinventor:extract --base-url http://www.domain.com

```

Or, to limit the crawler to a part of the website:

```
# flow contentinventor:extract --base-url http://www.domain.com/products

```

You can also select a crawling preset:

```
# flow contentinventor:extract --base-url http://www.domain.com/products --preset default

```

### Community

Maintainers: [ttree](https://github.com/ttree)

Top contributors: [dfeyer](https://github.com/dfeyer) (70 commits), [lcherpit](https://github.com/lcherpit) (9 commits)
