b13/ai-bots-love-markdown
=========================

Serve page content as Markdown for AI bots

Version 0.6.1 · License: GPL-2.0-or-later · Requires PHP ^8.2 · [Source](https://github.com/b13/ai-bots-love-markdown) · [Packagist](https://packagist.org/packages/b13/ai-bots-love-markdown)

AI Bots Love Markdown - Serve TYPO3 pages as Markdown for AI crawlers
=====================================================================


This TYPO3 extension provides an alternative Markdown representation of your pages for AI bots and crawlers.

This makes content accessible not just for humans and screen readers, but also for AI systems that consume and process web content.

Features
--------


- Converts any TYPO3 page to Markdown on-the-fly via content negotiation
- Works automatically with your existing page templates - no TypoScript configuration needed
- Extracts content from the `<main>` element (or `<body>` as fallback)
- Automatic `<link rel="alternate">` tag for Markdown discovery
- Strips navigation, header, footer, and other non-content elements
- Includes page metadata (title, dates, description, categories) as YAML front matter

Installation
------------


Use `composer req b13/ai-bots-love-markdown` or install it via TYPO3's Extension Manager.

After installation, add the site set to your site configuration:

```
dependencies:
  - b13/ai-bots-love-markdown
```

Configuration
-------------


### Site Settings


The extension provides the following settings that can be configured per site:

| Setting | Default | Description |
|---|---|---|
| `ai_bots_love_markdown.enableContentNegotiation` | `true` | Enable content negotiation via `Accept: text/markdown` header |
| `ai_bots_love_markdown.enableDiscoveryTag` | `true` | Add `<link>` tag to HTML pages for Markdown discovery |
| `ai_bots_love_markdown.pageTypeSuffix` | `ai-bots-love.md` | PageType suffix for the Markdown link |
| `ai_bots_love_markdown.pageTypeTypeNum` | `2026` | PageType typeNum for the Markdown link |
| `ai_bots_love_markdown.removeElements` | `script style nav footer aside form iframe noscript` | Space-separated HTML tags stripped from the Markdown output. `<header>` is intentionally not included, because article-level `<header>` regions often hold the H1 |
| `ai_bots_love_markdown.excludedDoktypes` | `3,4,6,7,199,254,255` | Comma-separated page doktypes that never produce a Markdown alternate response. The default covers TYPO3 system doktypes (external link, shortcut, mountpoint, sysfolder, recycler, etc.) |
| `ai_bots_love_markdown.cacheable` | `true` | Allow CDN / reverse-proxy caching of Markdown responses. Disable to force every hit through to the origin, which is required when you want to count AI-bot deliveries accurately (e.g. via `b13/ai-bot-tracker`). When `false`, `Cache-Control: private, no-store` is set on Markdown responses |

To override these settings, add them to your site's `settings.yaml`:

```
ai_bots_love_markdown.enableContentNegotiation: true
ai_bots_love_markdown.enableDiscoveryTag: false
ai_bots_love_markdown.pageTypeTypeNum: 1778074315
ai_bots_love_markdown.pageTypeSuffix: 'foo.md'
ai_bots_love_markdown.removeElements: 'script style nav footer aside form iframe noscript header'
ai_bots_love_markdown.excludedDoktypes: '3,4,6,7,199,254,255,37,38,41'
ai_bots_love_markdown.cacheable: false
```

### Per-page opt-out


Editors can disable the Markdown alternate for individual pages via the page property **Disable Markdown version** (TCA field `pages.markdown_version`, default on, rendered inverted in the backend so the toggle reads as "disable"). When the toggle is turned on, the `<link>` discovery tag is stripped from the HTML response, and any direct request via the `.md` suffix or `Accept: text/markdown` returns the regular HTML.

### Caching and content negotiation


Markdown responses share the URL of the HTML page when `Accept: text/markdown` is used. To prevent reverse proxies from serving a cached HTML response to a Markdown requester (or vice versa), the middleware adds a `Vary: Accept` header to every response that goes through a site with content negotiation enabled — regardless of whether the current request asked for Markdown.

By default (`cacheable: true`), TYPO3's page-level `Cache-Control` is preserved on the Markdown response, so the CDN may cache. If you need every Markdown delivery to reach the origin — typically to track bot consumption — set `cacheable: false` and the response is sent with `Cache-Control: private, no-store`.
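
To illustrate why `Vary: Accept` matters here, the following minimal Python sketch shows how a shared cache might derive its cache key. The function `cache_key` and its simplified header handling are assumptions made for illustration; they are not part of this extension or of any particular CDN.

```python
def cache_key(url: str, request_headers: dict[str, str], vary: str) -> tuple:
    """Build a cache key from the URL plus every request header named in Vary."""
    lowered = {name.lower(): value.strip() for name, value in request_headers.items()}
    # Normalise the Vary header into a sorted list of lower-case header names.
    varied_names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    varied = tuple((name, lowered.get(name, "")) for name in varied_names)
    return (url, varied)


url = "https://example.com/my-page/"
html_req = {"Accept": "text/html"}
md_req = {"Accept": "text/markdown"}

# Without Vary, both requests collapse onto the same cache entry.
same_entry = cache_key(url, html_req, vary="") == cache_key(url, md_req, vary="")
# With Vary: Accept, the HTML and Markdown representations get separate entries.
separate_entries = cache_key(url, html_req, vary="Accept") != cache_key(url, md_req, vary="Accept")
```

Real caches normalise `Accept` values in different ways, so the sketch only demonstrates the principle: the header value becomes part of the key.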

Usage
-----


### Access Methods


1. **Accept header**: Request any page with `Accept: text/markdown` header

    ```
    curl -H "Accept: text/markdown" https://example.com/my-page/

    ```
2. **URL suffix**: Append `/ai-bots-love.md` to any page URL

    ```
    https://example.com/ai-bots-love.md

    ```
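
The two access methods above can also be expressed from a client's point of view. This is a minimal Python sketch using only the standard library; the helper names `markdown_url` and `markdown_request` are hypothetical, the suffix default mirrors the documented `pageTypeSuffix`, and page URLs with query strings are not handled. No request is actually sent.

```python
import urllib.request


def markdown_url(page_url: str, suffix: str = "ai-bots-love.md") -> str:
    """Append the Markdown suffix to a page URL (assumes no query string)."""
    return page_url.rstrip("/") + "/" + suffix


def markdown_request(page_url: str) -> urllib.request.Request:
    """Prepare a request for the same URL that asks for Markdown via the Accept header."""
    return urllib.request.Request(page_url, headers={"Accept": "text/markdown"})


suffix_variant = markdown_url("https://example.com/my-page/")
header_variant = markdown_request("https://example.com/my-page/")
```

Passing `header_variant` to `urllib.request.urlopen` would perform the content-negotiated request.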

### Output Format


The extension outputs Markdown with YAML front matter, extracting metadata from HTML meta tags with fallback to TYPO3 page record data:

```
---
title: "From og:title > <title> > page record"
url: "From canonical URL > request URL"
description: "From og:description > meta description > page record"
image: "From og:image (if present)"
author: "From meta author > page record"
date: 2024-01-15
modified: 2024-01-20
keywords:
  - From meta keywords
categories:
  - From TYPO3 categories
type: "From og:type (if present)"
site: "From og:site_name (if present)"
locale: "From og:locale (if present)"
---

# Page Title

Content converted to Markdown...
```

**Metadata Priority:**

- `title`: og:title → `<title>` tag → page record
- `url`: canonical link → request URL
- `description`: og:description → meta description → page record
- `image`: og:image (if present)
- `author`: meta author → page record
- `date`/`modified`: page record (crdate/tstamp)
- `keywords`: meta keywords (if present)
- `categories`: TYPO3 `sys_category` (from database)
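
The priority chains above amount to "take the first source that has a value". A minimal Python sketch of that idea follows; the function names and the dictionary shapes for the meta tags and the page record are assumptions for illustration, not the extension's actual data structures.

```python
from typing import Optional


def first_present(*candidates: Optional[str]) -> Optional[str]:
    """Return the first candidate that is not None and not blank."""
    for value in candidates:
        if value is not None and value.strip():
            return value.strip()
    return None


def resolve_title(meta: dict, html_title: Optional[str], page_record: dict) -> Optional[str]:
    """og:title, then the <title> tag, then the page record."""
    return first_present(meta.get("og:title"), html_title, page_record.get("title"))


def resolve_description(meta: dict, page_record: dict) -> Optional[str]:
    """og:description, then meta description, then the page record."""
    return first_present(
        meta.get("og:description"),
        meta.get("description"),
        page_record.get("description"),
    )


title = resolve_title({"og:title": "  "}, "My Page | Example", {"title": "My Page"})
description = resolve_description({}, {"description": "From the page record"})
```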

### Auto-Discovery


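The extension automatically adds a `<link rel="alternate">` tag to all HTML pages for Markdown discovery. The tag has roughly the following shape (the attribute values shown here are illustrative):

```
<link rel="alternate" type="text/markdown" href="https://example.com/my-page/ai-bots-love.md">
```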

### Opt-in Behavior


Pages are only converted to Markdown if they contain the `<link rel="alternate">` discovery tag. This means:

- Pages without the site set enabled won't be converted
- You can disable conversion for specific pages by disabling the discovery tag

Technical Details
-----------------


### How It Works


1. A PSR-15 middleware intercepts requests with `Accept: text/markdown` header or `/ai-bots-love.md` suffix
2. The normal page rendering proceeds (your existing templates, TypoScript, etc.)
3. The middleware checks if the response contains the markdown alternate link tag
4. If present, it extracts the `<main>` content (or `<body>` as fallback)
5. The HTML is converted to Markdown using league/html-to-markdown
6. Navigation, header, footer, and other non-content elements are stripped
7. Page metadata is added as YAML front matter

### Response Headers


The Markdown response includes:

- `Content-Type: text/markdown; charset=utf-8`
- `X-Robots-Tag: noindex` (to prevent indexing of Markdown version)

### Best Practices for Templates


For best results, wrap your main content in a `<main>` element:

```
<body>
    <header>...</header>
    <nav>...</nav>

    <main>...</main>
</body>
```

### Markdown Include / Exclude Markers


Beyond the `<main>` / `<body>` selection and the configurable `removeElements` list, you can mark template regions explicitly. This gives editors and integrators fine-grained control over what ends up in the Markdown representation, without affecting the HTML page that humans see.

Two bundled Fluid partials do the work. They are auto-registered through the site set, so no `partialRootPaths` setup is required; just `<f:render>` them.

#### `MarkdownInclude` — explicitly mark the content region


Wraps the region that should become the Markdown body. Use this when `<main>`-based detection is not good enough, e.g. there is no `<main>` element, or `<main>` contains too much (sidebars, related teasers, in-page navigation).

```
<f:render partial="MarkdownInclude" contentAs="content">
    … the real article content …
</f:render>
```

When include markers are present, **everything outside them is ignored** for the Markdown output, including content that would otherwise be picked up from `<main>` or `<body>`.

#### `MarkdownExclude` — cut a region out of the Markdown


Wraps a region that should *not* appear in the Markdown output even though it lives inside the content area, such as teasers, related-article boxes, breadcrumbs, CTAs, cookie notices, or share buttons.

```
<f:render partial="MarkdownExclude" contentAs="content">
    Read also: …
</f:render>
```

Exclude regions can be **nested**, and they can sit inside an include region. Each `MarkdownExclude` block is removed as a whole; the surrounding content is kept.

#### How content is selected


When a Markdown response is built, the content region is resolved in this priority order (see `MarkdownResponseMiddleware`):

1. **Include markers**: the content between the opening and closing `MarkdownInclude` markers, if both are present.
2. **`<main>` element**: the contents of the first `<main>…</main>`.
3. **`<body>` element**: the contents of `<body>…</body>` as a fallback.
4. **Whole document**: last resort if none of the above match.

After the region is selected, every `MarkdownExclude` block inside it is removed (depth-aware, so nested excludes are handled correctly), and any residual markers are stripped before the HTML is converted to Markdown.
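
The selection order and the depth-aware exclude removal can be sketched as follows. This is an illustrative Python sketch, not the code of `MarkdownResponseMiddleware`: the four marker strings are hypothetical placeholders, the regex-based tag matching is a simplification, and two behaviours are assumptions of the sketch (the include region runs from the first opening to the last closing marker, and an unclosed exclude drops everything after it).

```python
import re

# Hypothetical marker strings; the real ones are emitted by the bundled partials.
INC_OPEN, INC_CLOSE = "<!--md-include-->", "<!--/md-include-->"
EXC_OPEN, EXC_CLOSE = "<!--md-exclude-->", "<!--/md-exclude-->"


def select_region(html: str) -> str:
    """Pick the content region: include markers, then <main>, then <body>, then everything."""
    start, end = html.find(INC_OPEN), html.rfind(INC_CLOSE)
    if start != -1 and end > start:
        return html[start + len(INC_OPEN):end]
    for tag in ("main", "body"):
        match = re.search(rf"<{tag}\b[^>]*>(.*?)</{tag}>", html, re.DOTALL | re.IGNORECASE)
        if match:
            return match.group(1)
    return html


def drop_excludes(html: str) -> str:
    """Remove exclude blocks, tracking nesting depth; stray markers are dropped too."""
    pattern = re.escape(EXC_OPEN) + "|" + re.escape(EXC_CLOSE)
    kept, depth, pos = [], 0, 0
    for marker in re.finditer(pattern, html):
        if depth == 0:
            kept.append(html[pos:marker.start()])  # text outside any exclude block
        if marker.group() == EXC_OPEN:
            depth += 1
        elif depth > 0:
            depth -= 1
        pos = marker.end()
    if depth == 0:
        kept.append(html[pos:])
    return "".join(kept)


page = (
    "<body><nav>menu</nav><main>A" + EXC_OPEN + "B" + EXC_OPEN + "C"
    + EXC_CLOSE + "D" + EXC_CLOSE + "E</main></body>"
)
result = drop_excludes(select_region(page))
```

Counting depth instead of matching the first closing marker is what keeps the outer block intact when excludes are nested: in `page`, the inner closing marker does not end the outer exclude.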

#### The markers themselves


The partials emit plain HTML comments around their content:

Each partial emits one opening and one closing marker comment, so there are four markers in total: one pair for `MarkdownInclude` and one pair for `MarkdownExclude`. Both partials only emit markers when their `content` is non-empty, so an empty region produces no markers at all.

Because the markers are ordinary HTML comments, they **survive TYPO3's page cache** — they are baked into the cached page body together with your content. That is intentional: the Markdown conversion needs them on every request, including fully-cached ones.

#### Markers never reach human visitors


Markers must not be visible to regular visitors. The `StripMarkdownMarkersMiddleware` removes all four markers from every `text/html` response before it leaves the server. (Markdown responses already have the markers stripped during conversion, so they are skipped by the content-type check.)

Note on robustness: marker stripping shortens the body, so the middleware also drops the stale `Content-Length` header — leaving it in place would otherwise cause an HTTP/2 `RST_STREAM` / `INTERNAL_ERROR` in browsers because the declared length no longer matches the bytes on the wire.
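
A minimal Python sketch of this behaviour, under stated assumptions: the function `strip_markers`, the plain-dict header representation, and the marker string used in the example are all hypothetical, and the sketch drops `Content-Length` for every processed HTML response rather than trying to recompute it.

```python
def strip_markers(
    headers: dict[str, str], body: str, markers: list[str]
) -> tuple[dict[str, str], str]:
    """Strip marker comments from HTML responses and drop the stale Content-Length."""
    content_type = next(
        (value for name, value in headers.items() if name.lower() == "content-type"), ""
    )
    # Only text/html responses are processed; Markdown responses pass through untouched.
    if not content_type.lower().startswith("text/html"):
        return headers, body
    for marker in markers:
        body = body.replace(marker, "")
    # The body may now be shorter, so the declared length can no longer be trusted.
    cleaned = {name: value for name, value in headers.items() if name.lower() != "content-length"}
    return cleaned, body


html_headers, html_body = strip_markers(
    {"Content-Type": "text/html; charset=utf-8", "Content-Length": "17"},
    "<p><!--m-->Hi</p>",
    ["<!--m-->"],
)
```

Dropping the header lets the server or proxy fall back to its own framing instead of advertising a length that no longer matches the body.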

#### Debugging which regions your templates emit


To see the markers in the rendered HTML (e.g. to verify your partials are wired up correctly), append `?markdown-markers=1` to any page URL. The stripping middleware then leaves the markers in the HTML response.

This bypass is gated for safety: it only works for **authenticated backend users** or in **non-production** application contexts, so the markers are never disclosed to anonymous crawlers on a production site.
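
The gating rule can be read as a small predicate. The Python sketch below only restates the rule described above; the function name and its parameters are hypothetical and do not reflect how the middleware obtains the backend user or the application context.

```python
def markers_visible(query: dict[str, str], backend_user_logged_in: bool, is_production: bool) -> bool:
    """Leave markers in the HTML only when explicitly requested and permitted."""
    if query.get("markdown-markers") != "1":
        return False
    # Allowed for authenticated backend users, or anywhere outside production.
    return backend_user_logged_in or not is_production


anonymous_on_production = markers_visible({"markdown-markers": "1"}, False, True)
editor_on_production = markers_visible({"markdown-markers": "1"}, True, True)
```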

### Events


The extension dispatches three PSR-14 events. Listen via the `#[AsEventListener]` attribute or by registering a listener in your extension's `Configuration/Services.yaml`. All events live under the `B13\AiBotsLoveMarkdown\Event\` namespace.

| Event | Fired | Use case |
|---|---|---|
| `BuildHtmlMarkdownConverterEvent` | Before HTML → Markdown conversion runs | Add custom node converters, override `HtmlConverter` options |
| `AfterFrontMatterForPageIsCreatedEvent` | After the YAML front-matter array is assembled from meta tags and the page record, before serialisation | Add, remove, or replace front-matter entries (e.g. enrich with domain-specific keys) |
| `AfterMarkdownConversionEvent` | After Markdown content has been built, before the response is returned | Side effects on every Markdown delivery (e.g. `b13/ai-bot-tracker` writes a tracking row from this event) |

See [Extending the YAML front matter](#extending-the-yaml-front-matter) below for a full `AfterFrontMatterForPageIsCreatedEvent` listener example.

### Extending the YAML front matter


The default front matter is built from HTML meta tags and the page record. To add domain-specific keys (e.g. seminar data, product attributes, event dates), listen to the `AfterFrontMatterForPageIsCreatedEvent`. Listeners receive the assembled data array **before** it is rendered to YAML and may add, remove, or replace entries.

```
