PHPackages                             talan-hdf/semantic-suggestion - PHPackages - PHPackages  [Skip to content](#main-content)[PHPackages](/)[Directory](/)[Categories](/categories)[Trending](/trending)[Leaderboard](/leaderboard)[Changelog](/changelog)[Analyze](/analyze)[Collections](/collections)[Log in](/login)[Sign up](/register)

1. [Directory](/)
2. /
3. [Utility &amp; Helpers](/categories/utility)
4. /
5. talan-hdf/semantic-suggestion

ActiveTypo3-cms-extension[Utility &amp; Helpers](/categories/utility)

talan-hdf/semantic-suggestion
=============================

TYPO3 extension for suggesting semantically related pages using advanced NLP and TF-IDF analysis

4.0.0(6mo ago)102.6k5[1 issues](https://github.com/friteuseb/semantic_suggestion/issues)GPL-2.0-or-laterPHPPHP ^8.1

Since Jul 30Pushed 5mo ago5 watchersCompare

[ Source](https://github.com/friteuseb/semantic_suggestion)[ Packagist](https://packagist.org/packages/talan-hdf/semantic-suggestion)[ RSS](/packages/talan-hdf-semantic-suggestion/feed)WikiDiscussions master Synced today

READMEChangelog (10)Dependencies (3)Versions (49)Used By (0)

TYPO3 Extension: Semantic Suggestion
====================================

[](#typo3-extension-semantic-suggestion)

[![TYPO3 12](https://camo.githubusercontent.com/08afacc49187e63c796f7d1c4401d0f0563bab574d9c525312b2827acb09a7c5/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f5459504f332d31322d6f72616e67652e737667)](https://get.typo3.org/version/12)[![TYPO3 13](https://camo.githubusercontent.com/2cf6570821614808899422f68a66a381a2de1dd0746ba9cdba6155def1f4f396/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f5459504f332d31332d6f72616e67652e737667)](https://get.typo3.org/version/13)[![TYPO3 14](https://camo.githubusercontent.com/382ff45949671f1b9c4431781f1961eb04a15376fe22523e03a9893c6d4ec278/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f5459504f332d31342d6f72616e67652e737667)](https://get.typo3.org/version/14)[![Latest Stable Version](https://camo.githubusercontent.com/fab1044462f78310d9707b1d97a6d9455a2db1e5fcff7abcd469a3e367507db9/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f762f74616c616e2d6864662f73656d616e7469632d73756767657374696f6e2e737667)](https://packagist.org/packages/talan-hdf/semantic-suggestion)[![License](https://camo.githubusercontent.com/d5d771a72866da8488b61e15da1a007d218b236012bed3ef32607fb82a1a4ae9/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f6c2f74616c616e2d6864662f73656d616e7469632d73756767657374696f6e2e737667)](https://packagist.org/packages/talan-hdf/semantic-suggestion)

> Elevate your TYPO3 website with intelligent, AI-powered content recommendations using advanced NLP technology.

Introduction
------------

[](#introduction)

The Semantic Suggestion extension revolutionizes the way related content is presented on TYPO3 websites. Moving beyond traditional category-based functionalities, this extension employs advanced **NLP (Natural Language Processing)** and **TF-IDF (Term Frequency-Inverse Document Frequency)** analysis to create genuinely relevant content connections.

Since version 2.0.0, similarity scores are **stored in a dedicated database table** (`tx_semanticsuggestion_similarities`) instead of the TYPO3 cache. A **Scheduler task** handles calculation and storage, ensuring persistence and performance.

**🚀 NEW in v3.0.0**: Enhanced with [nlp\_tools](https://github.com/cywolf/nlp_tools) integration for professional-grade text analysis, featuring intelligent language detection for TYPO3 12/13 multilingual sites and optimized German language support.

### Key Benefits:

[](#key-benefits)

- **🎯 Highly Relevant Links**: TF-IDF vectorization with advanced NLP creates genuinely relevant content connections
- **🌍 Multilingual Intelligence**: Smart language detection for TYPO3 12/13 sites with automatic content analysis
- **🇩🇪 German Language Optimized**: Professional stemming and handling of compound words (Automobilindustrie ↔ Automobil)
- **📈 Increased User Engagement**: Keep visitors on your site longer by offering truly related content
- **🔍 Semantic Cocoon**: Contributes to a high-quality semantic network, enhancing SEO and navigation
- **⚡ Intelligent Automation**: Reduces manual linking work while improving internal link quality by 30-50%

### Performance Considerations

[](#performance-considerations)

- The similarity calculation process performed by the Scheduler task can take time, especially on sites with a large number of pages (&gt;500 pages might require 30s or more depending on the server).
- Displaying suggestions and statistics (reading from the database) is optimized.
- Use the backend module to assess the performance and relevance of suggestions for your specific setup.

---

🆕 What's New in Version 3.0.0
-----------------------------

[](#-whats-new-in-version-300)

### 🧠 Advanced NLP Integration

[](#-advanced-nlp-integration)

- **TF-IDF Vectorization**: Professional-grade similarity calculation replacing basic cosine similarity
- **Intelligent Language Detection**: Hybrid approach using TYPO3 configuration + content analysis
- **Advanced Text Processing**: Stemming, stop word removal, and text normalization via nlp\_tools

### 🌐 Enhanced Multilingual Support

[](#-enhanced-multilingual-support)

- **TYPO3 12/13 Compatibility**: Uses modern Context API and Site Configuration
- **Smart Language Detection**:
    - **Priority 1**: Uses TYPO3 site language configuration (`de_DE.UTF-8` → `de`)
    - **Priority 2**: Falls back to intelligent content analysis when TYPO3 config is uncertain
    - **Priority 3**: Confidence scoring prevents misdetection of mixed-language content
- **Multi-site Ready**: Supports complex multilingual TYPO3 setups

### 🇩🇪 German Language Excellence

[](#-german-language-excellence)

- **Professional Stemming**: Uses Wamania\\Snowball for German morphology
- **Compound Word Support**: Recognizes relationships between "Automobil" and "Automobilindustrie"
- **Umlaut Handling**: Perfect support for ä, ö, ü, ß characters
- **German Stop Words**: Optimized list including der, die, das, ein, eine, mit, von, etc.

### 📊 Enhanced Algorithm

[](#-enhanced-algorithm)

- **TF-IDF Vectors**: More accurate similarity scoring than traditional word counting
- **Confidence Thresholds**: Prevents false positives in language detection
- **Fallback Mechanisms**: Graceful degradation if NLP processing fails
- **Performance Caching**: TF-IDF vectors and stemming results are cached

Table of Contents
-----------------

[](#table-of-contents)

- [Introduction](#introduction)
- [Features](#features)
- [Requirements](#requirements)
- [Installation](#installation)
- [Language Detection System](#language-detection-system)
- [Configuration](#configuration)
    - [Scheduler Task Configuration](#scheduler-task-configuration)
    - [TypoScript Settings](#typoscript-settings)
    - [NLP Configuration](#nlp-configuration)
    - [Configuration Interaction](#configuration-interaction)
- [Usage (Frontend)](#usage-frontend)
- [Bootstrap Package Integration](#bootstrap-package-integration)
- [Backend Module](#backend-module)
- [Scheduler Task](#scheduler-task-1)
- [Similarity Logic (TF-IDF Enhanced)](#similarity-logic-tf-idf-enhanced)
- [Multilingual Sites Setup](#multilingual-sites-setup)
- [Display Customization](#display-customization)
- [Debugging &amp; Performance](#debugging--performance)
- [Migration from v2.x](#migration-from-v2x)
- [Contributing](#contributing)
- [License](#license)
- [Support](#support)

Features
--------

[](#features)

### 🔧 Core Functionality

[](#-core-functionality)

- **Advanced TF-IDF Analysis**: Professional semantic similarity calculation using vectorization
- **Database Storage**: Persistent similarity scores in `tx_semanticsuggestion_similarities` table
- **Automated Processing**: Scheduler task for background calculation and updates
- **Frontend Integration**: Clean API for displaying suggestions (title, media, excerpt)
- **Backend Analytics**: Comprehensive statistics and performance metrics

### 🌐 Multilingual &amp; NLP Features

[](#-multilingual--nlp-features)

- **Intelligent Language Detection**: Hybrid TYPO3 + content analysis approach
- **Professional Text Processing**: Stemming, stop word removal, text normalization
- **German Language Excellence**: Optimized for compound words and umlauts
- **Multi-site Support**: Handles complex TYPO3 multilingual configurations
- **Confidence Scoring**: Prevents false language detection in mixed content

### ⚙️ Configuration &amp; Performance

[](#️-configuration--performance)

- **Highly Configurable**: TypoScript settings for display and Scheduler for analysis scope
- **Performance Optimized**: TF-IDF vector caching and intelligent fallbacks
- **Flexible Exclusions**: Page-level exclusions for analysis and/or display
- **Debug Mode**: Comprehensive logging for development and troubleshooting

Requirements
------------

[](#requirements)

- **TYPO3**: 12.4.0 - 14.99.99
- **PHP**: 8.1 or higher
- **Dependencies**:
    - `cywolf/nlp-tools` ^2.0 (automatically installed via Composer)
    - `wamania/php-stemmer` (for advanced stemming support)

Installation
------------

[](#installation)

**Composer Installation (recommended)**1. Install the extension with dependencies: ```
    composer require talan-hdf/semantic-suggestion
    composer require cywolf/nlp-tools
    ```
2. Activate both extensions in the TYPO3 Extension Manager.
3. Clear TYPO3 cache: `./vendor/bin/typo3 cache:flush`
4. (Optional) Run unit tests to verify installation: ```
    ./vendor/bin/phpunit --configuration phpunit.xml.dist --testsuite unit
    ```

**Manual Installation**1. Download the extension from the [TER](https://extensions.typo3.org/extension/semantic_suggestion) or GitHub.
2. Upload the archive to `typo3conf/ext/`.
3. Activate the extension in the Extension Manager.

Language Detection System
-------------------------

[](#language-detection-system)

### 🧠 How Language Detection Works

[](#-how-language-detection-works)

The extension uses a **hybrid approach** that combines TYPO3's multilingual configuration with intelligent content analysis:

#### Detection Priority (Cascade System):

[](#detection-priority-cascade-system)

1. **🎯 Priority 1: TYPO3 Site Configuration** (for TYPO3 12/13)

    ```
    // Uses TYPO3's Context API and Site Configuration
    $site = $request->getAttribute('site');
    $siteLanguage = $site->getLanguageById($languageId);
    $locale = $siteLanguage->getLocale(); // e.g., "de_DE.UTF-8"
    return strtolower(substr($locale->getLanguageCode(), 0, 2)); // → "de"
    ```
2. **🔍 Priority 2: Content Analysis** (when TYPO3 config is uncertain)

    - Extracts character trigrams from text content
    - Compares against language profiles built from stop words
    - Uses confidence scoring to prevent false positives
3. **⚖️ Priority 3: Confidence Verification**

    ```
    // If confidence difference < 30%, trust TYPO3 context
    if (($firstScore - $secondScore) / $firstScore < 0.3) {
        return $this->getTypo3LanguageContext();
    }
    ```

### 🌍 Multilingual Site Examples

[](#-multilingual-site-examples)

#### Example 1: Well-configured TYPO3 Site

[](#example-1-well-configured-typo3-site)

```
# site/config.yaml
languages:
  - languageId: 0
    locale: 'en_US.UTF-8'  # → nlp_tools detects "en"
  - languageId: 1
    locale: 'de_DE.UTF-8'  # → nlp_tools detects "de"
  - languageId: 2
    locale: 'fr_FR.UTF-8'  # → nlp_tools detects "fr"
```

**Result**: nlp\_tools uses TYPO3 configuration directly via Site API

#### Example 2: Mixed Content Detection

[](#example-2-mixed-content-detection)

```
Page with:
- sys_language_uid = 0 (English)
- But content: "Die deutsche Automobilindustrie ist sehr wichtig"

Detection results:
- Content analysis: 85% German confidence
- TYPO3 context: English
- Final decision: German (high confidence overrides TYPO3)

```

#### Example 3: Uncertain Content

[](#example-3-uncertain-content)

```
Detection scores: French: 45%, German: 42%
Confidence difference: (45-42)/45 = 6.7% < 30%
→ Falls back to TYPO3 language context

```

### 🚀 nlp\_tools Integration

[](#-nlp_tools-integration)

Yes, the extension **fully integrates with nlp\_tools**:

```
// In PageAnalysisService.php
protected function getCurrentLanguage(): string
{
    // Uses nlp_tools LanguageDetectionService
    $detectedLanguage = $this->languageDetector->detectLanguage('');
    return $detectedLanguage; // Smart hybrid detection
}
```

The language detection leverages:

- **LanguageDetectionService**: Trigram analysis and confidence scoring
- **TextAnalysisService**: Advanced text processing and stemming
- **TextVectorizerService**: TF-IDF vectorization for similarity calculation

### 🌍 Language Support Matrix

[](#-language-support-matrix)

All languages benefit from nlp\_tools integration, but with different optimization levels:

#### 🥇 **EXPERT Level Support** (Maximum Optimization)

[](#-expert-level-support-maximum-optimization)

LanguageCodeStemmingStop WordsTF-IDFImprovement🇩🇪 **German**`de`✅ Advanced (Wamania\\Snowball)✅ Specialized✅ Full**+40-50%****German Features**:

- Professional compound word handling (`Automobilindustrie` ↔ `Automobil`)
- Perfect umlaut support (`ä, ö, ü, ß`)
- Specialized stop words (`der, die, das, ein, eine, mit, von, zu...`)

#### 🥈 **ADVANCED Level Support** (Professional Optimization)

[](#-advanced-level-support-professional-optimization)

LanguageCodeStemmingStop WordsTF-IDFImprovement🇫🇷 French`fr`✅ Wamania\\Snowball✅ Specialized✅ Full**+30-40%**🇬🇧 English`en`✅ Wamania\\Snowball✅ Specialized✅ Full**+30-40%**🇪🇸 Spanish`es`✅ Wamania\\Snowball✅ Specialized✅ Full**+30-40%**#### 🥉 **STANDARD Level Support** (TF-IDF Optimization)

[](#-standard-level-support-tf-idf-optimization)

LanguageCodeStemmingStop WordsTF-IDFImprovement🇮🇹 Italian`it`❌ Basic tokenization⚠️ Generic✅ Full**+20-30%**🇵🇹 Portuguese`pt`❌ Basic tokenization⚠️ Generic✅ Full**+20-30%**🇳🇱 Dutch`nl`❌ Basic tokenization⚠️ Generic✅ Full**+20-30%****All Others**`*`❌ Basic tokenization⚠️ Generic✅ Full**+20-25%**### 📊 **What Every Language Gets**

[](#-what-every-language-gets)

**✅ All languages benefit from**:

1. **TF-IDF Vectorization**: Professional semantic similarity calculation
2. **Intelligent Language Detection**: Hybrid TYPO3 + content analysis
3. **Advanced Text Cleaning**: Unicode normalization, accent removal
4. **Performance Caching**: TF-IDF vectors and processing results cached

```
// ALL languages get TF-IDF processing
$tfidfResult = $textVectorizer->createTfIdfVectors([$text1, $text2], $language);
$similarity = $textVectorizer->cosineSimilarity($vector1, $vector2);

// Only de/fr/en/es get advanced stemming
if (in_array($language, ['de', 'fr', 'en', 'es'])) {
    $stemmedWords = $textAnalyzer->stem($text, $language);
} else {
    $stemmedWords = $textAnalyzer->tokenize($text); // Basic tokenization
}
```

### 🎯 **Language-Specific Configuration Examples**

[](#-language-specific-configuration-examples)

#### 🇩🇪 German Sites (Maximum Performance)

[](#-german-sites-maximum-performance)

```
# Scheduler Configuration: minimumSimilarity = 0.15
plugin.tx_semanticsuggestion_suggestions.settings {
    enableStemming = 1                # CRITICAL for compound words
    proximityThreshold = 0.25         # Lower threshold (compound matching)
    minTextLength = 50                # German compound words in short text

    analyzedFields {
        title = 2.0                   # German titles contain key compounds
        keywords = 2.5                # German keywords very specific
        content = 1.0                 # Standard weight
        description = 1.2             # Meta descriptions useful
        abstract = 1.3                # German abstracts well-structured
    }
}

```

#### 🇫🇷🇬🇧🇪🇸 French/English/Spanish Sites

[](#-frenchenglishspanish-sites)

```
# Scheduler Configuration: minimumSimilarity = 0.2
plugin.tx_semanticsuggestion_suggestions.settings {
    enableStemming = 1                # Advanced stemming available
    proximityThreshold = 0.3          # Standard TF-IDF threshold
    minTextLength = 50                # Standard minimum length

    analyzedFields {
        title = 1.5                   # Titles important but less compound
        keywords = 2.0                # Keywords still highly relevant
        content = 1.0                 # Main content
        description = 1.0             # Meta descriptions
        abstract = 1.1                # Abstracts helpful
    }
}

```

#### 🇮🇹🇵🇹🇳🇱 Other Languages (Italian, Portuguese, Dutch, etc.)

[](#-other-languages-italian-portuguese-dutch-etc)

```
# Scheduler Configuration: minimumSimilarity = 0.25
plugin.tx_semanticsuggestion_suggestions.settings {
    enableStemming = 0                # No advanced stemming, but TF-IDF still helps
    proximityThreshold = 0.35         # Higher threshold (less precise without stemming)
    minTextLength = 100               # Longer text needed for TF-IDF accuracy

    analyzedFields {
        title = 1.8                   # Rely more on titles without stemming
        keywords = 2.2                # Keywords become more important
        content = 0.8                 # Content less reliable without stemming
        description = 1.0             # Standard weight
        abstract = 1.0                # Standard weight
    }
}

```

**Note**: Even without stemming, languages like Italian and Portuguese still get **significant improvements** from TF-IDF vectorization compared to basic word counting.

Configuration
-------------

[](#configuration)

🎯 **NEW in v3.1**: **Storage vs Display Quality Configuration** for clarity! Separate `qualityLevel` parameters for storage (Scheduler) and display (TypoScript) with clear explanations.

### 🚀 Storage vs Display Configuration (v3.1+)

[](#-storage-vs-display-configuration-v31)

The configuration separates **what gets stored** (Scheduler) from **what gets displayed** (TypoScript) for maximum flexibility:

```
🎯 CONFIGURATION FLOW:
┌─────────────────────┐    ┌──────────────────┐    ┌─────────────────────┐    ┌──────────────────┐
│ Storage QualityLevel│───▶│  Storage: direct │───▶│ Display QualityLevel│───▶│  Display Filter   │
│ (Scheduler Task)    │    │  (exact match)   │    │ (TypoScript)        │    │  (quality)       │
│ 0.3 → stores ≥0.3   │    │                  │    │ 0.4 → shows ≥0.4    │    │                  │
└─────────────────────┘    └──────────────────┘    └─────────────────────┘    └──────────────────┘
   CONTROLS DATABASE         SAVES SIMILARITIES      CONTROLS FRONTEND       USER SEES RESULTS
   STORAGE EFFICIENCY        FOR PRECISION           DISPLAY QUALITY         FILTERED SUGGESTIONS

```

**Benefits:**

- ✅ **Clear separation** of storage vs display logic
- ✅ **Flexible filtering** (display can be stricter than storage)
- ✅ **Performance optimization** (store broad, display selective)
- ✅ **Backward compatibility** with legacy configurations
- ✅ **Self-explanatory** values (higher = more selective)

### Legacy Configuration Hierarchy (v3.0 and earlier)

[](#legacy-configuration-hierarchy-v30-and-earlier)

> **⚠️ DEPRECATED**: This section documents the old system for reference. **Use unified `qualityLevel` instead!**

Click to view legacy configuration detailsUnderstanding the configuration hierarchy is **critical** for proper setup. The extension uses a **two-tier system** where Scheduler settings **always take precedence** over TypoScript settings:

```
🔄 CONFIGURATION FLOW:
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐    ┌──────────────────┐
│   Scheduler     │───▶│    Database      │───▶│   TypoScript    │───▶│   Frontend       │
│   Settings      │    │    Storage       │    │   Settings      │    │   Display        │
└─────────────────┘    └──────────────────┘    └─────────────────┘    └──────────────────┘
     LEVEL 1                 LEVEL 2                LEVEL 3               LEVEL 4
   (Analysis &              (Persistent            (Display &             (User Sees
    Storage)                 Data)                 Filtering)             Results)

```

#### **🎯 Level 1: Scheduler Configuration (Master Control)**

[](#-level-1-scheduler-configuration-master-control)

- **Controls**: What gets analyzed and stored in database
- **Authority**: Absolute - cannot be overridden by TypoScript
- **Key Settings**:
    - `startPageId`: Defines analysis scope
    - `excludePages`: Pages **never analyzed** (permanent exclusion)
    - `minimumSimilarity`: Minimum score to **store** in database
    - `recursiveExclusion`: How exclusions are applied

#### **🔍 Level 2: Database Storage (Persistent Data)**

[](#-level-2-database-storage-persistent-data)

- **Contains**: Only similarities ≥ `minimumSimilarity` from Scheduler
- **Source**: Generated by Scheduler task execution
- **Limitation**: TypoScript cannot access data that was never stored

#### **🎨 Level 3: TypoScript Configuration (Display Filter)**

[](#-level-3-typoscript-configuration-display-filter)

- **Controls**: What gets displayed from existing database data
- **Limitation**: Can only filter/limit existing data, cannot create new data
- **Key Settings**:
    - `proximityThreshold`: Minimum score to **display** (must be ≥ Scheduler `minimumSimilarity`)
    - `maxSuggestions`: Limit displayed results
    - `excludePages`: Pages excluded from **display only** (still analyzed/stored)

#### **👁️ Level 4: Frontend Display (Final Output)**

[](#️-level-4-frontend-display-final-output)

- **Shows**: Final filtered and limited results
- **Source**: Database data filtered by TypoScript settings

#### **⚖️ Priority Rules**

[](#️-priority-rules)

1. **Scheduler ALWAYS wins**:

    ```
    Scheduler excludePages = "42,56"
    TypoScript excludePages = "" (empty)
    → Pages 42,56 will NEVER appear (not analyzed at all)

    ```
2. **TypoScript cannot override Scheduler thresholds**:

    ```
    Scheduler minimumSimilarity = 0.5
    TypoScript proximityThreshold = 0.3
    → Impossible! No similarity < 0.5 exists in database

    ```
3. **TypoScript can only be MORE restrictive**:

    ```
    Scheduler minimumSimilarity = 0.3 ✅
    TypoScript proximityThreshold = 0.5 ✅
    → Works: Shows only similarities ≥ 0.5 from stored data ≥ 0.3

    ```

### Scheduler Task Configuration

[](#scheduler-task-configuration)

> **🔧 CRITICAL**: This is the **primary configuration** that controls what gets analyzed and stored in the database. TypoScript cannot override these settings!

Create a **"Semantic Suggestion: Generate Similarities"** task in the TYPO3 Scheduler module with these settings:

> **⚠️ WARNING**: Choose your `minimumSimilarity` carefully - it cannot be lowered retroactively without re-running the entire analysis!

- **`startPageId`** (required): The UID of the root page from which the analysis will begin. This defines the scope of the analysis for this task run. Each task execution is linked to a Start Page ID (stored as `root_page_id` in the DB).

    - Example: `1` (for site root page)
- **`excludePages`** (optional): Comma-separated list of page UIDs that will **not be analyzed**, and their similarities will **not be stored**.

    - Example: `42,56,78`
- **`minimumSimilarity`** (required): Threshold (0.0 to 1.0) below which a pair of similar pages will **not be saved** to the database. This controls storage efficiency.

    - Example: `0.3` (saves only pairs with similarity ≥ 30%)

**Recommended scheduling:**

- Frequency: Daily or weekly
- Execution time: During off-peak hours (e.g., 2:00 AM)

### TypoScript Settings

[](#typoscript-settings)

> **🎨 DISPLAY FILTER**: These settings control the **frontend display** and the **analysis algorithm details**. They can only filter/limit what was already stored by the Scheduler.

> **❌ CONSTRAINT**: `proximityThreshold` MUST be ≥ Scheduler `minimumSimilarity` (otherwise no suggestions will display)

Define them in your TypoScript Setup file under `plugin.tx_semanticsuggestion_suggestions.settings`.

#### Constants (constants.typoscript)

[](#constants-constantstyposcript)

```
plugin.tx_semanticsuggestion_suggestions.settings {
    # --- Frontend Display Settings ---
    # ⚠️ IMPORTANT: Must be ≥ Scheduler minimumSimilarity
    proximityThreshold = 0.25    # TF-IDF optimized threshold (was 0.5 in v2.x)
    maxSuggestions = 3           # Maximum number of suggestions to display
    excerptLength = 100          # Max length of the text excerpt
    excludePages =               # Pages to exclude from DISPLAY only (comma-separated UIDs)

    # --- Analysis Algorithm Settings (Used by Scheduler task) ---
    recencyWeight = 0.2          # Weight of recency in the final score (0.0 to 1.0)

    # Fields analyzed and their weights (TF-IDF enhanced)
    analyzedFields {
        title = 1.5              # Page titles are highly relevant
        description = 1.0        # Meta descriptions
        keywords = 2.0           # Explicit keywords have high weight
        abstract = 1.2           # Page abstracts/summaries
        content = 1.0            # Main page content (can be noisy)
    }

    # --- NLP & Language Settings (v3.0+) ---
    enableStemming = 1           # Enable advanced stemming (especially for German)
    defaultLanguage = en         # Fallback language
    minTextLength = 50           # Minimum text length for analysis
    confidenceThreshold = 0.3    # Language detection confidence threshold

    # --- Debugging ---
    debugMode = 0                # Enable debug logs (0 or 1)
}

```

#### Setup (setup.typoscript)

[](#setup-setuptyposcript)

```
# Main plugin configuration
plugin.tx_semanticsuggestion_suggestions {
    settings {
        proximityThreshold = {$plugin.tx_semanticsuggestion_suggestions.settings.proximityThreshold}
        maxSuggestions = {$plugin.tx_semanticsuggestion_suggestions.settings.maxSuggestions}
        excludePages = {$plugin.tx_semanticsuggestion_suggestions.settings.excludePages}
        excerptLength = {$plugin.tx_semanticsuggestion_suggestions.settings.excerptLength}
        recencyWeight = {$plugin.tx_semanticsuggestion_suggestions.settings.recencyWeight}
        debugMode = {$plugin.tx_semanticsuggestion_suggestions.settings.debugMode}

        analyzedFields {
            title = {$plugin.tx_semanticsuggestion_suggestions.settings.analyzedFields.title}
            description = {$plugin.tx_semanticsuggestion_suggestions.settings.analyzedFields.description}
            keywords = {$plugin.tx_semanticsuggestion_suggestions.settings.analyzedFields.keywords}
            abstract = {$plugin.tx_semanticsuggestion_suggestions.settings.analyzedFields.abstract}
            content = {$plugin.tx_semanticsuggestion_suggestions.settings.analyzedFields.content}
        }
    }
    view {
        # Paths to your Fluid templates if you wish to customize them
        templateRootPaths.10 = EXT:your_extension/Resources/Private/Templates/
        partialRootPaths.10 = EXT:your_extension/Resources/Private/Partials/
        layoutRootPaths.10 = EXT:your_extension/Resources/Private/Layouts/
    }
}

# Reusable TypoScript object
lib.semantic_suggestion = USER
lib.semantic_suggestion {
    userFunc = TYPO3\CMS\Extbase\Core\Bootstrap->run
    extensionName = SemanticSuggestion
    pluginName = Suggestions
    vendorName = TalanHdf

    view =< plugin.tx_semanticsuggestion_suggestions.view
    persistence =< plugin.tx_semanticsuggestion_suggestions.persistence
    settings =< plugin.tx_semanticsuggestion_suggestions.settings
}

# Content element integration
tt_content.list.20.semanticsuggestion_suggestions =< plugin.tx_semanticsuggestion_suggestions

```

### NLP Configuration

[](#nlp-configuration)

#### Advanced Language &amp; Text Processing Settings

[](#advanced-language--text-processing-settings)

```
plugin.tx_semanticsuggestion_suggestions {
    settings {
        # 🧠 NLP Features (NEW in v3.0)
        enableStemming = 1               # Enable advanced stemming (German compound words)
        defaultLanguage = en             # Fallback language
        minTextLength = 50               # Minimum text length for analysis
        confidenceThreshold = 0.3        # Language detection confidence threshold

        # 🌐 Language Mapping (TYPO3 12/13)
        languageMapping {
            0 = en    # Language UID 0 → English
            1 = fr    # Language UID 1 → French
            2 = de    # Language UID 2 → German
            3 = es    # Language UID 3 → Spanish
            4 = it    # Language UID 4 → Italian
            5 = pt    # Language UID 5 → Portuguese
        }

        # 🇩🇪 German Language Optimization
        proximityThreshold = 0.25        # Lower threshold for TF-IDF (more sensitive)

        # 📊 TF-IDF Algorithm Settings
        analyzedFields {
            title = 1.5      # German titles often contain key compound words
            keywords = 2.0   # German keywords are highly indicative
            content = 1.0    # Base content weight
            description = 1.0
            abstract = 1.2
        }
    }
}

```

#### Multilingual Site Example Configuration

[](#multilingual-site-example-configuration)

**For a German/English bilingual site:**

```
# constants.typoscript
plugin.tx_semanticsuggestion_suggestions.settings {
    # 🇩🇪 German language optimization
    enableStemming = 1                    # Critical for compound words
    proximityThreshold = 0.25             # Lower threshold for German compound matching
    confidenceThreshold = 0.3

    # Language detection mapping
    languageMapping {
        0 = en                           # TYPO3 language UID 0 = English
        1 = de                           # TYPO3 language UID 1 = German
    }

    # German-optimized field weights
    analyzedFields {
        title = 2.0                      # German titles contain key compounds
        keywords = 2.5                   # German keywords are very specific
        content = 1.0                    # Standard content weight
    }
}

# IMPORTANT: Create separate Scheduler tasks:
# Task 1: English content (startPageId = 1, minimumSimilarity = 0.3)
# Task 2: German content (startPageId = 10, minimumSimilarity = 0.25)

```

### Configuration Interaction

[](#configuration-interaction)

- **Analysis Scope**: Defined by the Scheduler task's `startPageId`.
- **DB Storage**: Controlled by the Scheduler task's `minimumSimilarity` and `excludePages`.
- **Similarity Calculation**: Performed by the `PageAnalysisService` (called by the Scheduler task), which uses the TypoScript settings `analyzedFields` and `recencyWeight`.
- **Frontend Display**: Reads from the DB and filters/limits based on the TypoScript settings `proximityThreshold`, `maxSuggestions`, `excludePages`.
- **Backend Display**: Reads from the DB (based on the selected `root_page_id`) and filters based on the TypoScript `proximityThreshold`.

**Key Points:**

- The `proximityThreshold` (TypoScript) cannot display suggestions with a score lower than the `minimumSimilarity` (Scheduler) because they were not saved. For the TypoScript setting to be effective, it must be ≥ the Scheduler threshold.
- A page excluded in the Scheduler will never be analyzed/stored. A page excluded *only* in TypoScript will be analyzed/stored (if not excluded in Scheduler) but not displayed. It's often simpler to keep the `excludePages` lists synchronized.
- You can create **multiple Scheduler tasks** with different `startPageId` values to analyze different sections of the site.

### Unified Configuration Setup

[](#unified-configuration-setup)

#### 1. Scheduler Task Configuration (Simplified)

[](#1-scheduler-task-configuration-simplified)

Create a **"Semantic Suggestion: Generate Similarities"** task with these settings:

- **`startPageId`** (required): Root page UID for analysis scope
- **`qualityLevel`** (required): Quality threshold (0.1-1.0) for suggestions
    - **Storage**: Automatically set to `qualityLevel - 0.1` (broad data collection)
    - **Display**: Uses `qualityLevel` directly (quality suggestions to users)
- **`excludePages`** (optional): Pages to exclude from both analysis and display
- **`recursiveExclusion`** (optional): Apply exclusions recursively

**Recommended Quality Levels:**

```
🇩🇪 German sites: 0.25 (compound word optimization)
🌍 Standard sites: 0.30 (balanced quality/quantity)
📚 Content sites: 0.35 (higher quality threshold)
💎 Premium sites: 0.40 (very selective suggestions)

```

#### 2. TypoScript Configuration (Simplified)

[](#2-typoscript-configuration-simplified)

```
plugin.tx_semanticsuggestion_suggestions {
    settings {
        # 🎯 UNIFIED QUALITY CONTROL
        qualityLevel = 0.3              # Single parameter for all quality control

        # Display Settings
        maxSuggestions = 3              # Number of suggestions to show
        excludePages =                  # Additional pages to exclude from display
        excerptLength = 100             # Text excerpt length

        # NLP Settings (v3.0+)
        enableStemming = 1              # Enable advanced text processing
        defaultLanguage = en            # Fallback language
        debugMode = 0                   # Debug logging
    }
}

```

### Configuration Examples

[](#configuration-examples)

#### Basic Setup (Most Common)

[](#basic-setup-most-common)

```
# Single quality level controls everything
qualityLevel = 0.3

# Result:
# - Storage threshold: 0.2 (collects broad range)
# - Display threshold: 0.3 (shows quality suggestions)
# - No configuration conflicts possible

```

#### German Site Optimization

[](#german-site-optimization)

```
# Lower threshold for German compound words
qualityLevel = 0.25
enableStemming = 1

# Result optimized for compound words like:
# "Automobil" ↔ "Automobilindustrie"

```

#### High-Quality Site

[](#high-quality-site)

```
# Higher threshold for premium content
qualityLevel = 0.4

# Result:
# - Storage threshold: 0.3 (good range)
# - Display threshold: 0.4 (only excellent suggestions)

```

### Migration from Legacy Configuration

[](#migration-from-legacy-configuration)

#### Automatic Migration

[](#automatic-migration)

The extension automatically migrates old configurations:

```
Legacy v3.0:
  Scheduler: minimumSimilarity = 0.4
  TypoScript: proximityThreshold = 0.5

Auto-migrated to v3.1:
  qualityLevel = 0.5 (migrated from proximityThreshold)
  Internal storage = 0.4 (preserved)

```

#### Manual Migration Guide

[](#manual-migration-guide)

1. **Identify your old `proximityThreshold`** from TypoScript
2. **Set `qualityLevel`** to that value
3. **Remove old parameters**:
    - Delete `proximityThreshold` from TypoScript
    - Delete `minimumSimilarity` from Scheduler (auto-computed)
    - Merge duplicate `excludePages` lists

**Example Migration:**

```
# OLD v3.0 configuration
plugin.tx_semanticsuggestion_suggestions.settings {
    proximityThreshold = 0.35          # OLD: Display threshold
    excludePages = 42,56,78            # OLD: Display exclusions only
}

# NEW v3.1 configuration
plugin.tx_semanticsuggestion_suggestions.settings {
    qualityLevel = 0.35               # NEW: Unified quality control
    excludePages = 42,56,78           # NEW: Unified exclusions (analysis + display)
}

# Scheduler task OLD: minimumSimilarity = 0.25, excludePages = ""
# Scheduler task NEW: qualityLevel = 0.35 (automatically computes storage = 0.25)

```

### Legacy Configuration Constraints &amp; Common Pitfalls

[](#legacy-configuration-constraints--common-pitfalls)

Understanding these technical constraints will save you hours of debugging:

#### **🚫 Critical Constraints**

[](#-critical-constraints)

##### **1. Threshold Hierarchy Rule**

[](#1-threshold-hierarchy-rule)

```
❌ WRONG Configuration:
Scheduler: minimumSimilarity = 0.7
TypoScript: proximityThreshold = 0.3
→ Result: NO suggestions displayed (none stored below 0.7)

✅ CORRECT Configuration:
Scheduler: minimumSimilarity = 0.3
TypoScript: proximityThreshold = 0.7
→ Result: Shows high-quality suggestions from broader stored data

```

**Rule**: `proximityThreshold` ≥ `minimumSimilarity` (or equal)

##### **2. Exclusion Scope Difference**

[](#2-exclusion-scope-difference)

```
❌ PROBLEMATIC:
Scheduler: excludePages = "" (empty)
TypoScript: excludePages = "42,56,78"
→ Result: Pages 42,56,78 are analyzed/stored but never displayed (wasted processing)

✅ EFFICIENT:
Scheduler: excludePages = "42,56,78"
TypoScript: excludePages = "" (empty or same list)
→ Result: Pages 42,56,78 are never processed (faster, cleaner)

```

##### **3. TF-IDF Score Ranges (NEW in v3.0)**

[](#3-tf-idf-score-ranges-new-in-v30)

```
⚠️ TF-IDF produces LOWER scores than v2.x cosine similarity:
v2.x typical range: 0.3-0.9
v3.0 TF-IDF range: 0.05-0.4

❌ Legacy Configuration:
minimumSimilarity = 0.8  → NO results with TF-IDF

✅ TF-IDF Optimized:
minimumSimilarity = 0.15  → Good range for TF-IDF
proximityThreshold = 0.25  → Quality suggestions

```

#### **📊 Language-Specific Constraints**

[](#-language-specific-constraints)

##### **German Language (Compound Words)**

[](#german-language-compound-words)

```
🇩🇪 German sites need LOWER thresholds due to compound word stemming:

❌ Standard Configuration:
proximityThreshold = 0.5  → Misses compound relationships

✅ German Optimized:
minimumSimilarity = 0.15
proximityThreshold = 0.25
enableStemming = 1
→ Captures "Automobil" ↔ "Automobilindustrie" relationships

```

##### **Multi-language Sites**

[](#multi-language-sites)

```
⚠️ CONSTRAINT: Each language needs separate analysis

❌ Single Task Configuration:
Task 1: startPageId = 1 (analyzing both English + German pages)
→ Result: Language mixing, poor similarity quality

✅ Multi-Task Configuration:
Task 1: startPageId = 1 (English root) - minimumSimilarity = 0.3
Task 2: startPageId = 10 (German root) - minimumSimilarity = 0.25
→ Result: Optimized per-language analysis

```

#### **⚡ Performance Constraints**

[](#-performance-constraints)

##### **Large Site Thresholds**

[](#large-site-thresholds)

```
Sites with >500 pages:

❌ Permissive Configuration:
minimumSimilarity = 0.05  → Database bloat (millions of records)
maxSuggestions = 10  → Slow frontend queries

✅ Performance Optimized:
minimumSimilarity = 0.25  → Quality storage
proximityThreshold = 0.4  → Fast display
maxSuggestions = 3  → Quick queries

```

#### **🔧 Validation Rules Checklist**

[](#-validation-rules-checklist)

Before deploying, verify these constraints:

- **Threshold Check**: `proximityThreshold` ≥ `minimumSimilarity`
- **Exclusion Sync**: Scheduler `excludePages` includes all TypoScript exclusions
- **TF-IDF Adjustment**: Thresholds lowered from v2.x values (≤ 0.4 typically)
- **Language Separation**: Each language has its own Scheduler task
- **Performance Test**: Task execution time acceptable for your server
- **Storage Monitoring**: Database table `tx_semanticsuggestion_similarities` size reasonable

#### **🚨 Warning Signs of Misconfiguration**

[](#-warning-signs-of-misconfiguration)

Watch for these symptoms:

SymptomLikely CauseSolution**Zero suggestions displayed**`proximityThreshold` &gt; all stored similaritiesLower `proximityThreshold` or `minimumSimilarity`**Poor suggestion quality**Threshold too lowRaise `proximityThreshold`**Missing expected pages**Pages excluded in SchedulerCheck `excludePages` settings**Scheduler timeouts**Large site with low thresholdRaise `minimumSimilarity`**Mixed language results**Single task for multilingual siteCreate per-language tasksUsage (Frontend)
----------------

[](#usage-frontend)

Integrate the plugin into your Fluid templates to display suggestions:

```

```

Or include it directly in TypoScript:

```
# Include on a page
page.10 =< lib.semantic_suggestion

# Or in a content element
lib.myContent = COA
lib.myContent {
    10 =< lib.semantic_suggestion
}

```

The plugin will read relevant suggestions for the current page from the database, applying filters defined in the TypoScript settings (`proximityThreshold`, `maxSuggestions`, `excludePages`).

Bootstrap Package Integration
-----------------------------

[](#bootstrap-package-integration)

The extension provides **optional automatic integration** with Bootstrap Package templates. When enabled, semantic suggestions will appear automatically after the main content on all pages.

### Quick Setup (Bootstrap Package Users)

[](#quick-setup-bootstrap-package-users)

**Step 1**: Enable the integration in your TypoScript Constants:

```
plugin.tx_semanticsuggestion_suggestions.settings {
    overrideBootstrapTemplates = 1
}

```

Or via the **Constant Editor** in TYPO3 Backend:

- Go to **Template** module
- Select **Constant Editor**
- Choose category **semantic\_suggestion** → **Template Integration**
- Enable **Override Bootstrap Package Templates**

**Step 2**: Clear the TYPO3 cache:

```
./vendor/bin/typo3 cache:flush
```

That's it! Semantic suggestions will now appear on all Bootstrap Package page layouts.

### Supported Page Layouts

[](#supported-page-layouts)

The following Bootstrap Package page layouts are supported:

LayoutDescription`Default.html`Standard single-column layout`Simple.html`Minimal layout without footer`2Columns.html`Two-column layout (75/25)`2Columns2575.html`Two-column layout (25/75)`2Columns5050.html`Two-column layout (50/50)`2ColumnsOffsetRight.html`Two-column with right offset`3Columns.html`Three-column layout`None.html`Empty layout (content only)`SpecialFeature.html`Feature page layout`SpecialStart.html`Start/landing page layout`SubnavigationLeft.html`Layout with left subnavigation`SubnavigationRight.html`Layout with right subnavigation`SubnavigationLeft2Columns.html`Left subnav with 2 columns`SubnavigationRight2Columns.html`Right subnav with 2 columns### Placement of Suggestions

[](#placement-of-suggestions)

Suggestions are placed:

- **After** the main content area (colPos 0)
- **Before** the bottom content area (colPos 9)
- Inside a `` wrapper

### Manual Integration (Non-Bootstrap Package Users)

[](#manual-integration-non-bootstrap-package-users)

If you use a **custom template package** or **Fluid Styled Content**, add the marker manually to your page template:

```
Semantic Suggestion - Related content based on semantic similarity

```

**Example placement in a custom template:**

```

```

### Styling the Suggestions

[](#styling-the-suggestions)

The suggestions are wrapped in a `section-semantic-suggestion` class. You can customize the styling:

```
.section-semantic-suggestion {
    margin-top: 2rem;
    padding: 1.5rem;
    background-color: #f8f9fa;
    border-radius: 0.5rem;
}

.section-semantic-suggestion h3 {
    margin-bottom: 1rem;
    font-size: 1.25rem;
}

.section-semantic-suggestion .suggestion-item {
    margin-bottom: 1rem;
    padding: 1rem;
    background: white;
    border-radius: 0.25rem;
    box-shadow: 0 1px 3px rgba(0,0,0,0.1);
}
```

### Disabling for Specific Pages

[](#disabling-for-specific-pages)

If you need to disable suggestions on specific pages while keeping the Bootstrap Package integration active:

1. **Via TypoScript conditions:**

```
[page["uid"] == 42]
    lib.semantic_suggestion >
[END]

```

2. **Via page exclusions:**

```
plugin.tx_semanticsuggestion_suggestions.settings {
    excludePages = 42,56,78
}

```

Backend Module
--------------

[](#backend-module)

[![Backend Module](Documentation/Medias/backend_module.png)](Documentation/Medias/backend_module.png)

A backend module ("Semantic Suggestion" under "Web") allows visualizing the results of the analyses stored in the database.

### Features

[](#features-1)

- **Analysis Selection**: Choose which analysis to view (based on the `startPageId` / `root_page_id` of executed Scheduler tasks).
- **Detailed Statistics**: Most similar pairs, score distribution, pages with the most links, language statistics.
- **Configuration Overview**: Reminder of the main parameters used (display threshold, etc.).
- **Performance Metrics**: Module load time, number of stored pairs for the selected analysis.

Scheduler Task
--------------

[](#scheduler-task)

The **"Semantic Suggestion: Generate Similarities"** task is essential for the extension's operation.

- **Role**: Calculates similarities between pages (using `PageAnalysisService`) and saves relevant results (above the `minimumSimilarity` threshold) to the `tx_semanticsuggestion_similarities` table.
- **Configuration**: Set the `startPageId`, `excludePages`, and `minimumSimilarity` via the Scheduler interface.
- **Frequency**: Schedule its execution regularly (e.g., daily, weekly) during off-peak hours to keep suggestions up-to-date without impacting site performance.

Similarity Logic (TF-IDF Enhanced)
----------------------------------

[](#similarity-logic-tf-idf-enhanced)

### 🔄 Processing Pipeline

[](#-processing-pipeline)

1. **🎯 Language Detection** (NEW in v3.0)

    ```
    Text Input → TYPO3 Site Context → Content Analysis → Language: "de"

    ```
2. **📝 Advanced Text Processing** (via nlp\_tools)

    ```
    Raw Text → Stop Words Removal → Stemming (German) → Clean Tokens

    Example:
    "Die deutschen Automobilhersteller"
    → "deutsch automobilherstell" (stemmed)

    ```
3. **🔢 TF-IDF Vectorization** (NEW in v3.0)

    ```
    [Text1, Text2] → TF-IDF Vectors → Cosine Similarity Score

    Traditional: Simple word counting
    TF-IDF: Professional semantic analysis

    ```
4. **💾 Smart Storage &amp; Display**

    ```
    Similarity Score + Recency Boost → Database → Frontend Filter
    Scheduler threshold: 0.3 → Display threshold: 0.5

    ```

### 📊 Algorithm Improvements

[](#-algorithm-improvements)

#### Before (v2.x): Basic Cosine Similarity

[](#before-v2x-basic-cosine-similarity)

```
// Simple word frequency comparison
$similarity = $dotProduct / ($magnitude1 * $magnitude2);
```

#### After (v3.0): TF-IDF + NLP

[](#after-v30-tf-idf--nlp)

```
// Professional semantic analysis
$tfidfResult = $this->textVectorizer->createTfIdfVectors([$text1, $text2], $language);
$similarity = $this->textVectorizer->cosineSimilarity($vector1, $vector2);

// + German stemming, stop word removal, confidence scoring
```

**Performance Impact**: 30-50% better accuracy, especially for German compound words.

Multilingual Sites Setup
------------------------

[](#multilingual-sites-setup)

### 🌍 TYPO3 12/13 Site Configuration

[](#-typo3-1213-site-configuration)

The extension automatically detects languages from your TYPO3 site configuration:

```
# config/sites/main/config.yaml
languages:
  -
    languageId: 0
    title: 'English'
    hreflang: 'en-US'
    locale: 'en_US.UTF-8'    # → Auto-detected as "en"
    iso-639-1: 'en'
  -
    languageId: 1
    title: 'Deutsch'
    hreflang: 'de-DE'
    locale: 'de_DE.UTF-8'    # → Auto-detected as "de"
    iso-639-1: 'de'
  -
    languageId: 2
    title: 'Français'
    hreflang: 'fr-FR'
    locale: 'fr_FR.UTF-8'    # → Auto-detected as "fr"
    iso-639-1: 'fr'
```

### 🎯 Per-Language Scheduler Tasks

[](#-per-language-scheduler-tasks)

For optimal performance, create separate Scheduler tasks for each language:

```
Task 1: "Generate Similarities - English"
- startPageId: 1 (English root)
- minimumSimilarity: 0.3

Task 2: "Generate Similarities - German"
- startPageId: 2 (German root)
- minimumSimilarity: 0.25  # Lower for German (compound words)

Task 3: "Generate Similarities - French"
- startPageId: 3 (French root)
- minimumSimilarity: 0.3

```

### 🔧 Language-Specific Tuning

[](#-language-specific-tuning)

#### German Language Optimization

[](#german-language-optimization)

```
plugin.tx_semanticsuggestion_suggestions.settings {
    # German compound words need lower thresholds
    proximityThreshold = 0.25

    # Enable stemming for better compound word detection
    enableStemming = 1

    # German titles are often highly descriptive
    analyzedFields {
        title = 2.0      # Higher weight for German titles
        keywords = 2.5   # German keywords are very specific
    }
}

```

### 🚨 Troubleshooting Multilingual Sites

[](#-troubleshooting-multilingual-sites)

#### Problem: Language always detected as "en"

[](#problem-language-always-detected-as-en)

**Solution**: Check your TypoScript language mapping:

```
plugin.tx_semanticsuggestion_suggestions.settings {
    languageMapping {
        0 = en
        1 = de    # Make sure this matches your site language UID
        2 = fr
    }
}

```

#### Problem: Mixed language content not detected properly

[](#problem-mixed-language-content-not-detected-properly)

**Enable debug mode** to see detection process:

```
plugin.tx_semanticsuggestion_suggestions.settings {
    debugMode = 1
}

```

**Check logs** for:

```
Language detected via nlp_tools: de
TF-IDF similarity calculated: language=de, vocabularySize=156
Confidence verification: firstScore=0.85, secondScore=0.45, using content analysis

```

Display Customization
---------------------

[](#display-customization)

Modify the appearance of suggestions by overriding the plugin's Fluid template (`List.html`). Configure the paths to your custom templates in TypoScript (see Configuration section).

Debugging &amp; Performance
---------------------------

[](#debugging--performance)

### 🔍 Debug Mode

[](#-debug-mode)

Enable comprehensive logging to troubleshoot language detection and similarity calculation:

```
plugin.tx_semanticsuggestion_suggestions.settings {
    debugMode = 1
}

```

**Debug logs show**:

```
[INFO] Language detected via nlp_tools: de
[DEBUG] TF-IDF similarity calculated: page1=123, page2=456, similarity=0.75, vocabularySize=245
[DEBUG] German stemming applied: "Automobilindustrie" → "automobilindustr"
[DEBUG] Confidence verification: using TYPO3 context due to low confidence difference
[ERROR] Failed to create TF-IDF vectors: text too short, using fallback calculation

```

### ⚡ Performance Monitoring

[](#-performance-monitoring)

**Backend Module** shows performance metrics:

- TF-IDF processing time per page pair
- Language detection accuracy
- Vocabulary size per language
- Cache hit rates for stemming/vectorization

**Scheduler Task** execution time:

- **Small sites** (&lt;100 pages): ~10-30 seconds
- **Medium sites** (100-500 pages): ~1-5 minutes
- **Large sites** (&gt;500 pages): ~5-30 minutes

### 🚀 Performance Optimization

[](#-performance-optimization)

#### Recommended Settings for Large Sites (&gt;500 pages)

[](#recommended-settings-for-large-sites-500-pages)

```
# Scheduler Configuration: minimumSimilarity = 0.3 (storage optimization)
plugin.tx_semanticsuggestion_suggestions.settings {
    # Performance optimizations
    minTextLength = 100               # Skip short content (faster processing)
    proximityThreshold = 0.4          # Higher display threshold (faster queries)
    maxSuggestions = 3                # Limit results (faster rendering)

    # Selective field analysis (reduce processing time)
    analyzedFields {
        title = 2.0                   # Titles are fast to process
        keywords = 2.0                # Keywords are lightweight
        content = 0.5                 # Reduce content weight (can be slow)
        description = 1.0             # Meta descriptions are fast
        abstract = 0                  # Disable if not commonly used
    }

    # Conservative language settings
    confidenceThreshold = 0.4         # Higher confidence reduces processing
    enableStemming = 1                # Keep enabled (cached results)
}

# SCHEDULER OPTIMIZATION for large sites:
# Split into multiple tasks:
# Task 1: Pages 1-100 (daily)
# Task 2: Pages 101-200 (every 2 days)
# Task 3: Pages 201+ (weekly)

```

#### Cache Configuration

[](#cache-configuration)

Ensure TYPO3 cache is properly configured for optimal performance:

```
# Clear cache after configuration changes
./vendor/bin/typo3 cache:flush

# Monitor cache effectiveness
./vendor/bin/typo3 cache:listGroups
```

Configuration Troubleshooting
-----------------------------

[](#configuration-troubleshooting)

### 🔍 Step-by-Step Diagnosis

[](#-step-by-step-diagnosis)

When suggestions aren't working as expected, follow this systematic approach:

#### **Step 1: Verify Scheduler Task Execution**

[](#step-1-verify-scheduler-task-execution)

```
# Check if task has run successfully
./vendor/bin/typo3 scheduler:run

# Check TYPO3 logs for task errors
tail -f var/log/typo3_*.log | grep -i semantic
```

**Expected Output:**

```
[INFO] Starting similarity generation task, startPageId: 1, minimumSimilarity: 0.3
[INFO] Similarity generation task completed successfully

```

#### **Step 2: Verify Database Content**

[](#step-2-verify-database-content)

```
-- Check if similarities are stored
SELECT COUNT(*) as total_similarities
FROM tx_semanticsuggestion_similarities;

-- Check score distribution
SELECT
    ROUND(similarity_score, 1) as score_range,
    COUNT(*) as count,
    sys_language_uid
FROM tx_semanticsuggestion_similarities
GROUP BY ROUND(similarity_score, 1), sys_language_uid
ORDER BY score_range DESC;
```

**Healthy Output Example:**

```
score_range | count | sys_language_uid
0.4         | 15    | 0
0.3         | 42    | 0
0.2         | 128   | 0
0.1         | 203   | 0

```

#### **Step 3: Test Configuration Hierarchy**

[](#step-3-test-configuration-hierarchy)

```
# Enable debug mode temporarily
plugin.tx_semanticsuggestion_suggestions.settings {
    debugMode = 1
}

```

**Check Debug Logs:**

```
tail -f typo3temp/logs/semantic_suggestion.log
```

### 🚨 Common Problems &amp; Solutions

[](#-common-problems--solutions)

#### **Problem 1: "No suggestions displayed anywhere"**

[](#problem-1-no-suggestions-displayed-anywhere)

**Symptoms:**

- Frontend shows empty suggestions list
- Backend module shows 0 similar pairs

**Diagnosis Checklist:**

```
✓ Scheduler task executed successfully?
✓ Database contains similarities?
✓ proximityThreshold ≤ stored similarities?
✓ Current page has stored similarities?

```

**Solutions:**

1. **Threshold Too High**

    ```
    ❌ Current: proximityThreshold = 0.8
    ✅ Fix: proximityThreshold = 0.3 (or lower)

    # Or check what's actually in database:
    SELECT MAX(similarity_score) FROM tx_semanticsuggestion_similarities;

    ```
2. **Scheduler Never Ran**

    ```
    # Manual execution
    ./vendor/bin/typo3 scheduler:run

    # Check task configuration
    SELECT * FROM tx_scheduler_task WHERE classname LIKE '%Similarities%';
    ```
3. **Wrong Root Page**

    ```
    ❌ Current: startPageId = 1, viewing page = 42
    ✅ Fix: startPageId = 1, ensure page 42 is child of page 1

    # Verify page tree relationship
    SELECT pid, title FROM pages WHERE uid = 42;

    ```

#### **Problem 2: "Suggestions displayed but poor quality"**

[](#problem-2-suggestions-displayed-but-poor-quality)

**Symptoms:**

- Suggestions shown but irrelevant
- Mixed languages in suggestions
- Very low similarity scores

**Solutions:**

1. **TF-IDF Score Adjustment (v3.0+)**

    ```
    ❌ Legacy: proximityThreshold = 0.7
    ✅ TF-IDF: proximityThreshold = 0.25

    # TF-IDF scores are naturally lower

    ```
2. **Language Separation Required**

    ```
    ❌ Single task: Pages 1-100 (mixed EN/DE content)
    ✅ Multi-task:
    - Task 1: English pages (1-50)
    - Task 2: German pages (51-100)

    ```
3. **Field Weight Optimization**

    ```
    # Increase weight of reliable fields
    analyzedFields {
        title = 2.0        # Titles are usually accurate
        keywords = 2.5     # Keywords are intentional
        content = 0.5      # Content can be noisy
    }

    ```

#### **Problem 3: "Missing expected pages in suggestions"**

[](#problem-3-missing-expected-pages-in-suggestions)

**Symptoms:**

- Page A should suggest Page B (they're clearly related)
- Page B exists in database but not suggested to Page A

**Diagnosis:**

```
-- Check if relationship exists in database
SELECT similarity_score
FROM tx_semanticsuggestion_similarities
WHERE page_id = A AND similar_page_id = B;

-- Check if excluded somewhere
SELECT exclude_pages FROM tx_scheduler_task WHERE classname LIKE '%Similarities%';
```

**Solutions:**

1. **Page Excluded in Scheduler**

    ```
    ❌ Scheduler excludePages = "42,56,78" (contains Page B)
    ✅ Remove Page B from exclusions, re-run Scheduler

    ```
2. **Similarity Below Threshold**

    ```
    -- Find actual similarity score
    SELECT similarity_score FROM tx_semanticsuggestion_similarities
    WHERE page_id = A AND similar_page_id = B;

    -- If score = 0.22 but proximityThreshold = 0.3
    -- Lower the threshold or improve content similarity
    ```
3. **Text Content Insufficient**

    ```
    Check if Page A or B has minimal text content:
    - Minimum 50 characters required
    - Pure image pages won't generate similarities
    - Check 'minTextLength' setting

    ```

#### **Problem 4: "Scheduler task timeouts or fails"**

[](#problem-4-scheduler-task-timeouts-or-fails)

**Symptoms:**

- Task shows "Failed" status
- PHP timeout errors in logs
- Task takes &gt;5 minutes

**Solutions:**

1. **Increase Processing Limits**

    ```
    # In Scheduler task or php.ini
    ini_set('max_execution_time', 300);  // 5 minutes
    ini_set('memory_limit', '512M');
    ```
2. **Reduce Analysis Scope**

    ```
    ❌ Current: startPageId = 1 (1000+ pages)
    ✅ Split:
    - Task 1: startPageId = 1 (pages 1-100)
    - Task 2: startPageId = 101 (pages 101-200)

    ```
3. **Increase Threshold**

    ```
    ❌ Current: minimumSimilarity = 0.05 (stores everything)
    ✅ Optimized: minimumSimilarity = 0.25 (quality only)

    ```

#### **Problem 5: "Mixed language suggestions"**

[](#problem-5-mixed-language-suggestions)

**Symptoms:**

- English page suggests German pages
- Suggestions ignore language boundaries

**Solutions:**

1. **Enable Language Mapping**

    ```
    plugin.tx_semanticsuggestion_suggestions.settings {
        languageMapping {
            0 = en
            1 = de
            2 = fr
        }
    }

    ```
2. **Check Site Configuration**

    ```
    # site/config.yaml should have proper locales
    languages:
      - languageId: 0
        locale: 'en_US.UTF-8'  # ← Must be properly formatted
      - languageId: 1
        locale: 'de_DE.UTF-8'  # ← Must be properly formatted
    ```
3. **Separate Scheduler Tasks**

    ```
    Instead of: One task analyzing mixed language tree
    Use: One task per language branch

    ```

### 🔧 Quick Fix Commands

[](#-quick-fix-commands)

```
# Clear all caches
./vendor/bin/typo3 cache:flush

# Re-run all scheduler tasks
./vendor/bin/typo3 scheduler:run

# Check database table size
echo "SELECT COUNT(*) FROM tx_semanticsuggestion_similarities;" | mysql your_db

# Reset extension configuration (emergency)
./vendor/bin/typo3 configuration:remove --path="EXTENSIONS/semantic_suggestion"
./vendor/bin/typo3 extension:deactivate semantic_suggestion
./vendor/bin/typo3 extension:activate semantic_suggestion
```

### 📋 Pre-Deployment Checklist

[](#-pre-deployment-checklist)

Before going live, verify:

- **Scheduler Tasks**: All tasks run successfully without timeouts
- **Database Check**: `tx_semanticsuggestion_similarities` contains expected data
- **Threshold Validation**: `proximityThreshold` ≥ `minimumSimilarity`
- **Language Testing**: Each language shows appropriate suggestions
- **Performance Test**: Frontend loads suggestions in &lt;200ms
- **Content Quality**: Manual review of suggestion relevance
- **Exclusion Review**: All intentionally excluded pages work correctly

🛡️ Final Configuration Validation Checklist
-------------------------------------------

[](#️-final-configuration-validation-checklist)

Use this comprehensive checklist to ensure your configuration is optimal:

### ✅ **Scheduler Configuration Validation**

[](#-scheduler-configuration-validation)

- **startPageId exists** and is accessible: `SELECT title FROM pages WHERE uid = [startPageId];`
- **minimumSimilarity appropriate for TF-IDF**: Between 0.1 and 0.4 (not v2.x values like 0.8)
- **excludePages list verified**: All UIDs exist and are intentionally excluded
- **Task execution successful**: Check task history and logs for errors
- **Multilingual separation**: Each language has its own task (recommended)

### ✅ **TypoScript Configuration Validation**

[](#-typoscript-configuration-validation)

- **Threshold hierarchy respected**: `proximityThreshold ≥ minimumSimilarity`
- **TF-IDF thresholds updated**: Not using v2.x legacy values (&gt;0.5)
- **Language settings match site**: `languageMapping` corresponds to TYPO3 language UIDs
- **Field weights optimized**: Higher weights for title/keywords, lower for content
- **Performance settings**: `maxSuggestions` and `minTextLength` appropriate for site size

### ✅ **Database &amp; Storage Validation**

[](#-database--storage-validation)

```
-- Verify data exists and has reasonable distribution
SELECT
    sys_language_uid,
    MIN(similarity_score) as min_score,
    MAX(similarity_score) as max_score,
    AVG(similarity_score) as avg_score,
    COUNT(*) as total_pairs
FROM tx_semanticsuggestion_similarities
GROUP BY sys_language_uid;

-- Check for unexpected language mixing
SELECT DISTINCT root_page_id, sys_language_uid, COUNT(*) as pairs
FROM tx_semanticsuggestion_similarities
GROUP BY root_page_id, sys_language_uid;
```

### ✅ **Frontend Integration Validation**

[](#-frontend-integration-validation)

- **Plugin displays correctly**: `` works
- **Suggestions appear on pages**: Test on multiple pages with different content
- **Language separation working**: German pages don't show English suggestions
- **Exclusions effective**: Excluded pages don't appear in suggestions
- **Performance acceptable**: Page load time impact &lt;100ms

### ✅ **NLP &amp; Language Detection Validation**

[](#-nlp--language-detection-validation)

- **nlp\_tools dependency installed**: `composer show cywolf/nlp-tools`
- **Stemming working** (German sites): Debug logs show stemmed words
- **Language detection accurate**: Pages analyzed in correct language
- **Site configuration proper**: Locales formatted as `de_DE.UTF-8` (not just `de`)

### 🚨 **Red Flags to Watch For**

[](#-red-flags-to-watch-for)

Warning SignQuick TestSolution**Zero suggestions anywhere**`SELECT COUNT(*) FROM tx_semanticsuggestion_similarities;`Lower thresholds or check task**Very low similarity scores** (all &lt;0.1)Check debug logs for TF-IDF failuresVerify text length and language detection**Mixed language results**Test DE page shows EN suggestionsSeparate scheduler tasks**Poor suggestion quality**Manual review of top suggestionsAdjust field weights or thresholds**Slow performance**Frontend timing &gt;500msIncrease thresholds or reduce maxSuggestions### 🎯 **Configuration Quality Score**

[](#-configuration-quality-score)

Rate your setup (aim for 80%+ before going live):

**Basic Setup (50 points)**

- Scheduler task runs (20 pts)
- Database contains data (15 pts)
- Frontend shows suggestions (15 pts)

**Optimization (30 points)**

- TF-IDF thresholds optimized (10 pts)
- Language separation implemented (10 pts)
- Performance &lt;200ms (10 pts)

**Advanced Features (20 points)**

- German stemming active (5 pts)
- Debug logging configured (5 pts)
- Exclusions properly managed (5 pts)
- Field weights customized (5 pts)

**Score: \_\_\_ / 100**

> **Target**: 80+ for production deployment **Minimum**: 60+ for staging/testing

Migration from v2.x
-------------------

[](#migration-from-v2x)

### 🔄 Automatic Migration

[](#-automatic-migration)

The extension includes automatic fallback mechanisms:

1. **TF-IDF Processing**: Falls back to old cosine similarity if nlp\_tools fails
2. **Language Detection**: Falls back to TypoScript mapping if site detection fails
3. **Text Processing**: Falls back to basic stop word removal if stemming fails

### 📋 Migration Checklist

[](#-migration-checklist)

1. **Install nlp\_tools dependency**:

    ```
    composer require cywolf/nlp-tools
    ```
2. **Update TypoScript configuration** (add new NLP settings):

    ```
    plugin.tx_semanticsuggestion_suggestions.settings {
        enableStemming = 1
        languageMapping {
            0 = en
            1 = de
            # ... your language mapping
        }
    }

    ```
3. **Clear TYPO3 cache**:

    ```
    ./vendor/bin/typo3 cache:flush
    ```
4. **Regenerate similarities** (run Scheduler task):

    - All existing similarities will be recalculated with TF-IDF
    - German content should see immediate improvement
    - Check debug logs to verify language detection
5. **Monitor performance**:

    - Check backend module for accuracy improvements
    - Compare similarity scores before/after migration
    - Verify multilingual sites work correctly

### 🆘 Rollback Plan

[](#-rollback-plan)

If you experience issues, you can disable advanced features:

```
plugin.tx_semanticsuggestion_suggestions.settings {
    enableStemming = 0           # Disable stemming
    debugMode = 1                # Enable debug logging
    minTextLength = 200          # Increase minimum text length
}

```

The extension will automatically fall back to v2.x behavior while maintaining database compatibility.

Contributing
------------

[](#contributing)

Contributions are welcome! Fork the repository, create a branch, make your changes, and submit a Pull Request.

License
-------

[](#license)

This project is licensed under the GNU General Public License v2.0 or later. See the [LICENSE](LICENSE) file.

Support
-------

[](#support)

**Contact**: Wolfangel Cyril () **Bugs &amp; Features**: [GitHub Issues](https://github.com/friteuseb/semantic_suggestion/issues)**Documentation &amp; Updates**: [GitHub Repository](https://github.com/friteuseb/semantic_suggestion)

###  Health Score

46

—

FairBetter than 92% of packages

Maintenance68

Regular maintenance activity

Popularity27

Limited adoption so far

Community17

Small or concentrated contributor base

Maturity62

Established project with proven stability

 Bus Factor1

Top contributor holds 53.1% of commits — single point of failure

How is this calculated?**Maintenance (25%)** — Last commit recency, latest release date, and issue-to-star ratio. Uses a 2-year decay window.

**Popularity (30%)** — Total and monthly downloads, GitHub stars, and forks. Logarithmic scaling prevents top-heavy scores.

**Community (15%)** — Contributors, dependents, forks, watchers, and maintainers. Measures real ecosystem engagement.

**Maturity (30%)** — Project age, version count, PHP version support, and release stability.

###  Release Activity

Cadence

Every ~16 days

Total

32

Last Release

192d ago

Major Versions

1.5.2.x-dev → 2.0.02025-03-27

2.3.0 → 3.1.02025-09-18

1.1.0-beta → 3.2.02025-10-08

3.3.0 → 4.0.02025-12-24

### Community

Maintainers

![](https://avatars.githubusercontent.com/u/1471455?v=4)[Cyril Wolfangel](/maintainers/friteuseb)[@friteuseb](https://github.com/friteuseb)

---

Top Contributors

[![friteuseb](https://avatars.githubusercontent.com/u/1471455?v=4)](https://github.com/friteuseb "friteuseb (136 commits)")[![cyril-wolfangelTalanCOP](https://avatars.githubusercontent.com/u/220198517?v=4)](https://github.com/cyril-wolfangelTalanCOP "cyril-wolfangelTalanCOP (108 commits)")[![Gloumi](https://avatars.githubusercontent.com/u/6616874?v=4)](https://github.com/Gloumi "Gloumi (6 commits)")[![VincentFoulon80](https://avatars.githubusercontent.com/u/26797476?v=4)](https://github.com/VincentFoulon80 "VincentFoulon80 (5 commits)")[![colinatkins](https://avatars.githubusercontent.com/u/3305205?v=4)](https://github.com/colinatkins "colinatkins (1 commits)")

### Embed Badge

![Health badge](/badges/talan-hdf-semantic-suggestion/health.svg)

```
[![Health](https://phpackages.com/badges/talan-hdf-semantic-suggestion/health.svg)](https://phpackages.com/packages/talan-hdf-semantic-suggestion)
```

###  Alternatives

[friendsoftypo3/content-blocks

TYPO3 CMS Content Blocks - Content Types API | Define reusable components via YAML

103519.9k53](/packages/friendsoftypo3-content-blocks)[wazum/sluggi

TYPO3 extension for URL slug management with inline editing, auto-sync, locking, access control, and redirects

40529.5k](/packages/wazum-sluggi)[typo3/cms-redirects

TYPO3 CMS Redirects - Create manual redirects, list existing redirects and automatically createredirects on slug changes.

167.4M80](/packages/typo3-cms-redirects)[typo3/cms-form

TYPO3 CMS Form - Flexible TYPO3 frontend form framework that comes with a backend editor interface.

147.6M266](/packages/typo3-cms-form)[typo3/cms-sys-note

TYPO3 CMS System Notes - Records with messages which can be placed on any page and contain instructions or other information related to a page or section.

116.3M38](/packages/typo3-cms-sys-note)[typo3/cms-impexp

TYPO3 CMS Import/Export - Tool for importing and exporting records using XML or the custom T3D format.

106.8M86](/packages/typo3-cms-impexp)

PHPackages © 2026

[Directory](/)[Categories](/categories)[Trending](/trending)[Changelog](/changelog)[Analyze](/analyze)
