

yaroslavpopovic/laravel-text-chunker
====================================

Flexible, strategy-based text chunking for Laravel — fork of droath/laravel-text-chunker with Laravel 13 support.

v2.0.0 · MIT · PHP ^8.3


[Source](https://github.com/yaroslavpopovic/laravel-text-chunker) · [Packagist](https://packagist.org/packages/yaroslavpopovic/laravel-text-chunker) · [Docs](https://github.com/droath/laravel-text-chunker)


Laravel Text Chunker
====================

[![Latest Version on Packagist](https://camo.githubusercontent.com/ec76f783d7a3e447d8802b703e29f60edc302159ae17910f8acf923d8542260f/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f762f64726f6174682f6c61726176656c2d746578742d6368756e6b65722e7376673f7374796c653d666c61742d737175617265)](https://packagist.org/packages/droath/laravel-text-chunker)[![GitHub Tests Action Status](https://camo.githubusercontent.com/782f562b9a69fdfa704ff32292c84c3b8da7e4e060026ab33a6042e3c4b245fe/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f616374696f6e732f776f726b666c6f772f7374617475732f64726f6174682f6c61726176656c2d746578742d6368756e6b65722f72756e2d74657374732e796d6c3f6272616e63683d6d61696e266c6162656c3d7465737473267374796c653d666c61742d737175617265)](https://github.com/droath/laravel-text-chunker/actions?query=workflow%3Arun-tests+branch%3Amain)[![GitHub Code Style Action Status](https://camo.githubusercontent.com/04baf95a4447bb9b38fc7cc9b3a5d17bbc55dd6c9329d694fd68a9bccf6e782d/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f616374696f6e732f776f726b666c6f772f7374617475732f64726f6174682f6c61726176656c2d746578742d6368756e6b65722f7068707374616e2e796d6c3f6272616e63683d6d61696e266c6162656c3d636f64652532307374796c65267374796c653d666c61742d737175617265)](https://github.com/droath/laravel-text-chunker/actions?query=workflow%3A%22Fix+PHP+code+style+issues%22+branch%3Amain)[![Total Downloads](https://camo.githubusercontent.com/69087039cf8595d841465a813b377c83e487c63570530dab4207c9216d058eea/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f64742f64726f6174682f6c61726176656c2d746578742d6368756e6b65722e7376673f7374796c653d666c61742d737175617265)](https://packagist.org/packages/droath/laravel-text-chunker)

A Laravel package that provides flexible, strategy-based text chunking capabilities for AI/LLM applications. Split text into smaller segments using character count, token count, sentence boundaries, or markdown-aware strategies with a fluent, Laravel-friendly API.

Perfect for:

- Optimizing API calls to LLM providers like OpenAI by chunking text to fit token limits
- Implementing RAG (Retrieval-Augmented Generation) systems with context-aware chunks
- Preserving markdown structure when splitting documentation or content
- Creating custom text splitting logic for domain-specific needs

Requirements
------------

- PHP 8.3 or higher
- Laravel 11.x or 12.x (this fork adds Laravel 13 support)

Installation
------------

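Install the package via Composer. The command below installs the upstream package; to install this fork, require `yaroslavpopovic/laravel-text-chunker` instead: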

```
composer require droath/laravel-text-chunker
```

The package will automatically register itself via Laravel's auto-discovery.

### Configuration

Optionally, publish the configuration file:

```
php artisan vendor:publish --tag="text-chunker-config"
```

This will create a `config/text-chunker.php` file where you can customize default settings:

```
return [
    // Default strategy to use when none is specified
    'default_strategy' => 'character',

    // Strategy-specific configurations
    'strategies' => [
        'token' => [
            // Default OpenAI model for token encoding
            'model' => 'gpt-4',
        ],
        'sentence' => [
            // Abbreviations that should not trigger sentence breaks
            'abbreviations' => ['Dr', 'Mr', 'Mrs', 'Ms', 'Prof', 'Sr', 'Jr'],
        ],
    ],

    // Register custom strategies here
    'custom_strategies' => [
        // 'my-strategy' => \App\TextChunking\MyCustomStrategy::class,
    ],
];
```

Basic Usage
-----------

### Character-Based Chunking

Split text at exact character count boundaries:

```
use Droath\TextChunker\Facades\TextChunker;

$text = "Your long text content here...";

$chunks = TextChunker::strategy('character')
    ->size(100)
    ->chunk($text);

foreach ($chunks as $chunk) {
    echo "Chunk {$chunk->index}: {$chunk->text}\n";
    echo "Position: {$chunk->start_position} to {$chunk->end_position}\n";
}
```
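To make the position fields concrete, here is a minimal plain-PHP sketch of character chunking. It is an illustration only, not the package's implementation: the function name `sketchCharacterChunks` is hypothetical, and the only things borrowed from the package are the field names and the inclusive-start, exclusive-end convention for positions.

```php
<?php

declare(strict_types=1);

/**
 * Illustrative only: split $text into pieces of at most $size characters
 * and record where each piece sits in the original string.
 * Hypothetical helper, not part of the package.
 *
 * @return list<array{index: int, text: string, start_position: int, end_position: int}>
 */
function sketchCharacterChunks(string $text, int $size): array
{
    if ($size < 1) {
        throw new InvalidArgumentException('Chunk size must be at least 1.');
    }

    $chunks = [];
    $length = mb_strlen($text);

    for ($start = 0, $index = 0; $start < $length; $start += $size, $index++) {
        $piece = mb_substr($text, $start, $size);

        $chunks[] = [
            'index' => $index,
            'text' => $piece,
            'start_position' => $start,                   // inclusive
            'end_position' => $start + mb_strlen($piece), // exclusive
        ];
    }

    return $chunks;
}

// Three chunks: "abcd" [0, 4), "efgh" [4, 8), "ij" [8, 10)
$chunks = sketchCharacterChunks('abcdefghij', 4);
```

The last chunk is shorter than the requested size whenever the text length is not a multiple of it.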

### Token-Based Chunking

Split text by OpenAI token count (useful for keeping requests within a model's token limit):

```
use Droath\TextChunker\Facades\TextChunker;

$text = "Your long text content here...";

$chunks = TextChunker::strategy('token')
    ->size(500) // 500 tokens per chunk
    ->chunk($text);

// Use different OpenAI model for encoding
$chunks = TextChunker::strategy('token', ['model' => 'gpt-3.5-turbo'])
    ->size(500)
    ->chunk($text);
```

**Supported Models:**

- `gpt-4`
- `gpt-3.5-turbo`
- `text-davinci-003`
- And other models supported by the tiktoken library

### Sentence-Based Chunking

Split text at sentence boundaries:

```
use Droath\TextChunker\Facades\TextChunker;

$text = "First sentence. Second sentence. Third sentence.";

$chunks = TextChunker::strategy('sentence')
    ->size(2) // 2 sentences per chunk
    ->chunk($text);

// Custom abbreviations
$chunks = TextChunker::strategy('sentence', [
        'abbreviations' => ['Dr', 'Mr', 'Mrs', 'Ph.D']
    ])
    ->size(3)
    ->chunk($text);
```
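The role of the abbreviation list can be sketched in plain PHP. This is a simplified illustration under stated assumptions, not the package's algorithm: the function name `sketchSentenceChunks` is hypothetical, sentences are assumed to end in `.`, `!` or `?` followed by whitespace, and whitespace between sentences is normalized to a single space.

```php
<?php

declare(strict_types=1);

/**
 * Illustrative only: group sentences into chunks of $perChunk sentences,
 * without breaking after a listed abbreviation such as "Dr.".
 * Hypothetical helper, not part of the package.
 *
 * @param list<string> $abbreviations Abbreviations without the trailing period.
 * @return list<string>
 */
function sketchSentenceChunks(string $text, int $perChunk, array $abbreviations): array
{
    if ($perChunk < 1) {
        throw new InvalidArgumentException('Need at least 1 sentence per chunk.');
    }

    // Naive split: break after ., ! or ? when followed by whitespace.
    $parts = preg_split('/(?<=[.!?])\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];

    $sentences = [];
    $buffer = '';

    foreach ($parts as $part) {
        $buffer = $buffer === '' ? $part : $buffer . ' ' . $part;

        // If the fragment ends in a listed abbreviation, the period was not
        // a sentence end, so keep accumulating.
        if (preg_match('/(\S+)\.$/u', $buffer, $m) === 1
            && in_array($m[1], $abbreviations, true)) {
            continue;
        }

        $sentences[] = $buffer;
        $buffer = '';
    }

    if ($buffer !== '') {
        $sentences[] = $buffer; // trailing fragment that ended in an abbreviation
    }

    return array_map(
        static fn (array $group): string => implode(' ', $group),
        array_chunk($sentences, $perChunk)
    );
}

// Without 'Dr' in the list, "Dr." would be counted as its own sentence.
$chunks = sketchSentenceChunks('Dr. Smith arrived. He sat down. Then he left.', 2, ['Dr', 'Mr', 'Mrs']);
```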

### Markdown-Aware Chunking

Preserve markdown structure when chunking:

```
use Droath\TextChunker\Facades\TextChunker;

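// Illustrative markdown input
$markdown = <<<'MD'
# Title

Some paragraph text.

- Item one
- Item two
MD;

// The 'markdown' strategy key is assumed by analogy with the other strategies
$chunks = TextChunker::strategy('markdown')
    ->size(100) // Target size in characters
    ->chunk($markdown);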

// Markdown elements (code blocks, headers, lists, blockquotes, horizontal rules)
// are never split in the middle, even if they exceed the chunk size
```

Advanced Features
-----------------

### Overlap for Context Preservation

Add percentage-based overlap between chunks to maintain context (ideal for RAG systems):

```
use Droath\TextChunker\Facades\TextChunker;

$text = "Your long text content here...";

$chunks = TextChunker::strategy('character')
    ->size(100)
    ->overlap(20) // 20% overlap between chunks
    ->chunk($text);

// Each chunk will include 20% of the previous chunk's content
```

Overlap works with all strategies:

- **Character strategy**: 20% of characters overlap
- **Token strategy**: 20% of tokens overlap
- **Sentence strategy**: 20% of sentences overlap (rounded)
- **Markdown strategy**: 20% overlap while preserving element boundaries
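
As a worked example of the arithmetic for the character case, the sketch below treats the overlap percentage as a share of the chunk size: with a size of 10 and 20% overlap, 2 characters are shared and each chunk starts 8 characters after the previous one. This is an assumption about how such overlap can be computed, not the package's implementation, and the function name `sketchOverlappingChunks` is hypothetical.

```php
<?php

declare(strict_types=1);

/**
 * Illustrative only: character chunks where each chunk repeats the tail
 * of the previous one. Hypothetical helper, not part of the package.
 *
 * @return list<string>
 */
function sketchOverlappingChunks(string $text, int $size, int $overlapPercent): array
{
    if ($size < 1 || $overlapPercent < 0 || $overlapPercent >= 100) {
        throw new InvalidArgumentException('Need size >= 1 and 0 <= overlap < 100.');
    }

    // Number of characters shared between neighbouring chunks (rounded down).
    $shared = intdiv($size * $overlapPercent, 100);

    // Distance between the starts of neighbouring chunks.
    $step = $size - $shared;

    $chunks = [];
    $length = mb_strlen($text);

    for ($start = 0; $start < $length; $start += $step) {
        $chunks[] = mb_substr($text, $start, $size);

        if ($start + $size >= $length) {
            break; // this chunk already reaches the end of the text
        }
    }

    return $chunks;
}

// size 10, 20% overlap: 2 shared characters, step of 8
// yields "abcdefghij", "ijklmnopqr", "qrst"
$chunks = sketchOverlappingChunks('abcdefghijklmnopqrst', 10, 20);
```

A larger overlap keeps more context across chunk boundaries at the cost of producing more chunks for the same text.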

### Chunk Value Objects

Each chunk is returned as an immutable value object with metadata:

```
$chunks = TextChunker::strategy('character')->size(100)->chunk($text);

foreach ($chunks as $chunk) {
    $chunk->text;             // The chunk text content
    $chunk->index;            // Zero-based index (0, 1, 2, ...)
    $chunk->start_position;   // Character offset in original text (inclusive)
    $chunk->end_position;     // Character offset in original text (exclusive)
}
```
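The sketch below shows one way PHP can express such an immutable value object, and how an inclusive start and exclusive end recover the chunk text from the original string. `ExampleChunk` is a hypothetical stand-in written for this illustration; only the four property names come from the listing above, and the package's actual class may be built differently.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a chunk object; not the package's class.
final readonly class ExampleChunk
{
    public function __construct(
        public string $text,
        public int $index,
        public int $start_position, // inclusive
        public int $end_position,   // exclusive
    ) {}
}

$source = 'Hello, world!';
$chunk = new ExampleChunk(text: 'world', index: 1, start_position: 7, end_position: 12);

// Length is end minus start because the end offset is exclusive.
$recovered = mb_substr(
    $source,
    $chunk->start_position,
    $chunk->end_position - $chunk->start_position
);

// Writing to a readonly property throws an Error.
try {
    $chunk->text = 'changed';
    $mutated = true;
} catch (Error $e) {
    $mutated = false;
}
```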

### Using the Manager Directly

Instead of the facade, you can inject the manager:

```
use Droath\TextChunker\TextChunkerManager;

class MyService
{
    public function __construct(
        protected TextChunkerManager $chunker
    ) {}

    public function processText(string $text): array
    {
        return $this->chunker
            ->strategy('token')
            ->size(500)
            ->overlap(10)
            ->chunk($text);
    }
}
```

Custom Strategies
-----------------

Create your own chunking strategies by implementing the `ChunkerStrategyInterface`:

### Step 1: Create Strategy Class

```
