ksanyok/text-humanize
=====================

Zero-dependency PHP library for algorithmic text humanization — transforms machine-generated text into natural prose


TextHumanize
============


### The most advanced open-source text naturalization engine


**Transform AI-generated text into clearer, more natural prose — with proprietary PHANTOM™, ASH™, and SentenceValidator™ technologies**

**Reduce built-in AI-like style signals · 25 languages · 38-stage adaptive pipeline · 100% offline · Zero dependencies**

**External AI detector results are not guaranteed.** TextHumanize improves style, readability, and internal risk signals; it is not a bypass guarantee.

[![Python 3.9+](https://camo.githubusercontent.com/ab527689c28f0b061f17a149c01e47af10f2d7aaee6731e8bdb0850ba0fd0aa1/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f707974686f6e2d332e392b2d3337373641422e7376673f6c6f676f3d707974686f6e266c6f676f436f6c6f723d7768697465)](https://www.python.org/downloads/)![TypeScript](https://camo.githubusercontent.com/e57d6d9ba66a844a8e33f1b092e91db683baf626aeeadb67a9659b144f18b458/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f547970655363726970742d352e782d3331373843362e7376673f6c6f676f3d74797065736372697074266c6f676f436f6c6f723d7768697465)[![PHP 8.1+](https://camo.githubusercontent.com/91c80da0012793f4e9798108fcb32049324a407f159b77e02726231b577e0054/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f7068702d382e312b2d3737374242342e7376673f6c6f676f3d706870266c6f676f436f6c6f723d7768697465)](https://www.php.net/) [![CI](https://github.com/ksanyok/TextHumanize/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/ksanyok/TextHumanize/actions/workflows/ci.yml)[![Tests](https://camo.githubusercontent.com/c61976b70dc296771d37f4f0a383b6836ceafe531e4c891a68f03292c6bfea06/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f74657374732d323236392532307061737365642d3265613434662e7376673f6c6f676f3d707974657374266c6f676f436f6c6f723d7768697465)](https://github.com/ksanyok/TextHumanize/actions/workflows/ci.yml) ![Zero 
Dependencies](https://camo.githubusercontent.com/8f55ef5aea13011a3bf62bc50776a147c57b71f1b11cb50252d61b48e5c88d7d/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f646570656e64656e636965732d7a65726f2d627269676874677265656e2e737667)[![PyPI](https://camo.githubusercontent.com/cf361c45a6106112922d48573810a98512a03cfc1cee3bda2afb76cc17b575c9/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f707970692d76302e33332e302d3337373541392e7376673f6c6f676f3d70797069266c6f676f436f6c6f723d7768697465)](https://pypi.org/project/texthumanize/)[![License](https://camo.githubusercontent.com/1a2baa04eb96a588dcce9bf0bdd0a08813bb0e10421b39c7a1e2737b57a0f70a/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f6c6963656e73652d4475616c2532302846726565253230253242253230436f6d6d65726369616c292d626c75652e737667)](LICENSE)

**240,000+ lines of code** · **131 Python modules** · **38-stage pipeline** · **25 languages + universal** · **2,269 tests**

**3 proprietary technologies:** PHANTOM™ (gradient-guided internal score optimization) · ASH™ (adaptive signature humanization) · SentenceValidator™ (interstage quality gate)

[Quick Start](#-quick-start) · [Proprietary Technologies](#-proprietary-technologies) · [Before & After](#-before--after-examples) · [Features](#-feature-matrix) · [Benchmarks](#-performance--benchmarks) · [AI Detection](#-ai-detection-engine) · [API Reference](#-api-reference) · [Documentation](https://ksanyok.github.io/TextHumanize/) · [Live Demo](https://texthumanize.link/) · [License](#-license--pricing)

---

Table of Contents
-----------------


- [Why TextHumanize?](#-why-texthumanize)
- [Proprietary Technologies](#-proprietary-technologies)
- [Installation](#-installation)
- [Quick Start](#-quick-start)
- [Private Offline Workflow](#-private-offline-workflow)
- [Before & After Examples](#-before--after-examples)
- [Feature Matrix](#-feature-matrix)
- [Comparison with Competitors](#-comparison-with-competitors)
- [Processing Pipeline](#-processing-pipeline-38-stages)
- [AI Detection Engine](#-ai-detection-engine)
- [API Reference](#-api-reference)
- [Profiles & Presets](#-profiles--style-presets)
- [Language Support](#-language-support)
- [NLP Infrastructure](#-nlp-infrastructure)
- [SEO Mode](#-seo-mode)
- [Readability Metrics](#-readability-metrics)
- [Paraphrasing Engine](#-paraphrasing-engine)
- [Tone Analysis & Adjustment](#-tone-analysis--adjustment)
- [Watermark Detection & Cleaning](#-watermark-detection--cleaning)
- [Content Spinning](#-content-spinning)
- [Coherence Analysis](#-coherence-analysis)
- [Morphological Engine](#-morphological-engine)
- [Stylistic Fingerprinting](#-stylistic-fingerprinting)
- [Auto-Tuner](#-auto-tuner-feedback-loop)
- [Plugin System](#-plugin-system)
- [Using Individual Modules](#-using-individual-modules)
- [CLI Reference](#-cli-reference)
- [REST API Server](#-rest-api-server)
- [Async API](#-async-api)
- [Performance & Benchmarks](#-performance--benchmarks)
- [Architecture](#-architecture)
- [TypeScript / JavaScript Port](#-typescript--javascript-port)
- [PHP Library](#-php-library)
- [Testing & Quality](#-testing--quality)
- [Security & Limits](#-security--limits)
- [Responsible Use](#-responsible-use)
- [For Business & Enterprise](#-for-business--enterprise)
- [FAQ & Troubleshooting](#-faq--troubleshooting)
- [What's New in v0.33.0](#-whats-new-in-v0330)
- [Contributing](#-contributing)
- [Limitations](#-limitations)
- [Support the Project](#-support-the-project)
- [License & Pricing](#-license--pricing)

---

TextHumanize is a **pure-algorithmic text processing engine** that transforms AI-generated drafts into clearer, more natural prose. Three proprietary technologies — **PHANTOM™** (gradient-guided optimization against TextHumanize's own detector), **ASH™** (adaptive signature humanization), and **SentenceValidator™** (interstage quality control) — drive a 38-stage pipeline that reduces built-in AI-like style signals while preserving meaning. No neural networks, no API keys, no internet — just 235K+ lines of finely tuned rules, dictionaries, and statistical methods.

> **Honest note:** TextHumanize is a style-normalization tool, not an AI-detection bypass tool. It reduces AI-like patterns (formulaic connectors, uniform sentence length, bureaucratic vocabulary) but does not guarantee that processed text will pass external AI detectors. Quality of humanization varies by language and text type. See [Limitations](#-limitations) below.

**Built-in toolkit:** AI Detection (3 detectors) · Paraphrasing · Tone Analysis · Watermark Cleaning · Content Spinning · Coherence Analysis · Readability Scoring · Stylistic Fingerprinting · Auto-Tuner · Perplexity Analysis · Plagiarism Detection · Grammar Check · Morphology Engine · Neural LM · **Async API** · **SSE Streaming**

**Platforms:** Python (full — 131 modules) · TypeScript/JavaScript (core) · PHP (full)

**For business:** SaaS integration · REST API with SSE streaming · Docker deployment · Bulk processing · Custom dictionaries · On-prem enterprise · White-label ready

**Languages:** 🇬🇧 EN · 🇷🇺 RU · 🇺🇦 UK · 🇩🇪 DE · 🇫🇷 FR · 🇪🇸 ES · 🇵🇱 PL · 🇧🇷 PT · 🇮🇹 IT · 🇳🇱 NL · 🇸🇪 SV · 🇨🇿 CS · 🇷🇴 RO · 🇭🇺 HU · 🇩🇰 DA · 🇸🇦 AR · 🇨🇳 ZH · 🇯🇵 JA · 🇰🇷 KO · 🇹🇷 TR · 🇮🇳 HI · 🇻🇳 VI · 🇹🇭 TH · 🇮🇩 ID · 🇮🇱 HE · 🌍 **any language** via universal processor

---

🚀 Why TextHumanize?
-------------------


> **Problem:** Machine-generated text has uniform sentence lengths, bureaucratic vocabulary, formulaic connectors, and low stylistic diversity — reducing readability, engagement, and brand authenticity.

> **Solution:** TextHumanize algorithmically normalizes text style while preserving meaning. Configurable intensity, deterministic output, full change reports. No cloud APIs, no rate limits, no data leaks.

| Advantage | Details |
|---|---|
| 🚀 **Blazing fast** | 300–500 ms for a paragraph; full article in 1–2 seconds |
| 🔒 **100% private** | All processing is local — your text never leaves your machine |
| 🎯 **Precise control** | Intensity 0–100, 9 profiles, 9 idiolect presets, keyword preservation, max change ratio |
| 🌍 **25 languages** | Deep support for EN/RU/UK/DE; dictionaries for 25 languages; statistical processor for any other |
| 📦 **Zero dependencies** | Pure Python stdlib — no pip packages, no model downloads, starts in <100 ms |
| 🔁 **Reproducible** | Seed-based PRNG — same input + same seed = identical output |
| 🧠 **3-layer AI detection** | 18-metric heuristic + 35-feature logistic regression + MLP neural detector — no ML framework required |
| 🔌 **Plugin system** | Register custom hooks at any of 38 pipeline stages |
| 📊 **Full analytics** | Readability (6 indices), coherence, plagiarism, stylometric fingerprint, content health score |
| 🎭 **Tone control** | Analyze and adjust formality across 7 levels |
| 📚 **2,944 dictionary entries** | EN 1,733 + RU 1,345 + UK 1,042 + DE 874 + FR 718 + ES 749 + more |
| 🏢 **Enterprise-ready** | Dual license, 2,257+ tests, CI/CD, REST API, Docker, on-prem deployment |
| 🛡️ **Secure by design** | Input limits, zero network calls, linear-time regex, no eval/exec |
| 📝 **Full auditability** | Every call returns `change_ratio`, `quality_score`, `similarity`, `explain()` report |

---

Proprietary Technologies
------------------------

TextHumanize includes three original, proprietary technologies not found in any other open-source library:

### PHANTOM™ — Gradient-Guided Text Optimization Engine


**`phantom.py` — 2,943 lines** | An open-source text naturalizer that uses numerical gradient optimization against TextHumanize's own AI detector.

```
Input Text → ORACLE (gradient analysis) → SURGEON (32 surgical ops) → FORGE (iterative optimization) → Output

```

- **ORACLE** computes numerical gradients through the MLP detector via central differences (~70 forward passes, ~1.4ms), producing per-feature contribution analysis and ranked gap reports
- **SURGEON** executes 32 feature-targeted surgical text operations guided by Oracle gradients — rank-based magnitude scheduling focuses effort on highest-impact features first
- **FORGE** runs an iterative optimization loop with combined score tracking, stall detection, adaptive budget escalation, text expansion limits, and post-iteration cleanup
- **Result:** 100% internal pass rate on TextHumanize's built-in detector benchmark (15/15 texts across EN, RU, UK). Processing time: 0.7–1.4s. External detectors use different models and are not guaranteed.

```
from texthumanize import humanize, humanize_until_human

result = humanize("AI text...", lang="en", phantom=True)  # Enable PHANTOM™
result = humanize_until_human("AI text...", lang="en")     # Auto-iterates against the built-in score
```
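
The central-difference gradient that ORACLE is described as computing can be illustrated with a toy scorer. This is a minimal sketch, not code from `phantom.py`: `toy_score`, its weights, and the three-element feature vector are all hypothetical stand-ins for the MLP detector. Each feature costs two forward passes, which would be consistent with roughly 70 passes for a 35-feature input.

```python
import math
from typing import Callable, List


def toy_score(features: List[float]) -> float:
    """Hypothetical stand-in for a detector: logistic score in (0, 1)."""
    weights = [1.5, -2.0, 0.5]  # illustrative values, not real model weights
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def central_diff_gradient(
    f: Callable[[List[float]], float], x: List[float], eps: float = 1e-4
) -> List[float]:
    """Estimate df/dx_i as (f(x + eps*e_i) - f(x - eps*e_i)) / (2 * eps)."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        # Two forward passes per feature.
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


features = [0.2, 0.4, 0.1]
grad = central_diff_gradient(toy_score, features)

# Rank feature indices by absolute gradient: the first one moves the score most.
ranking = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
print(grad, ranking)
```

Ranking by absolute gradient mirrors the "highest-impact features first" idea in the SURGEON description; how the library actually schedules its operations is not shown here.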

### ASH™ — Adaptive Signature Humanization


**`ash_engine.py` + `signature_transfer.py` + `perplexity_sculptor.py`** | Statistically transforms text to match real human writing signatures.

```
AI Text → Feature Extraction → Human Profile Matching → Signature Transfer → Perplexity Sculpting → Human-like Text

```

- **Human Profiles** — statistical fingerprints of real human writing per language and corpus profile (sentence length distribution, connector variety, hedge words, colloquial turns, punctuation habits)
- **Signature Transfer** — morphs AI text's statistical signature toward the target human profile
- **Perplexity Sculpting** — adjusts word-level perplexity to match human perplexity distribution curves
- **Metric Gaps** — identifies and systematically closes the gap between AI and human writing on 35+ features

```
from texthumanize import ASHEngine, list_corpus_profiles

print(list_corpus_profiles()["support"]["aliases"])
ash = ASHEngine(lang="en", pipeline_profile="support")
result = ash.humanize("AI text...", preset="balanced")
```
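
The "metric gaps" idea can be sketched with three simple features. This is an illustration under stated assumptions, not the ASH implementation: `HUMAN_PROFILE` and its target values are made up for the example, and the feature set is a small subset of the 35+ features mentioned above.

```python
import re
import statistics

# Hypothetical target values for illustration only; not taken from the library.
HUMAN_PROFILE = {
    "mean_sentence_len": 17.0,
    "sentence_len_stdev": 8.0,
    "type_token_ratio": 0.70,
}


def extract_features(text: str) -> dict:
    """Compute a few simple stylistic features from raw text."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    words = re.findall(r"\w+", text.lower())
    return {
        "mean_sentence_len": statistics.mean(lengths),
        "sentence_len_stdev": statistics.pstdev(lengths),
        "type_token_ratio": len(set(words)) / len(words),
    }


def metric_gaps(text: str, profile: dict) -> dict:
    """Signed gap (target minus measured) for each feature in the profile."""
    feats = extract_features(text)
    return {name: profile[name] - feats[name] for name in profile}


sample = "The system is fast. The system is safe. The system is good."
gaps = metric_gaps(sample, HUMAN_PROFILE)
print(gaps)
```

The sample text has three four-word sentences, so its length deviation is zero and every gap is positive. The sketch assumes non-empty input; `statistics.mean` raises on an empty list.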

### SentenceValidator™ — Interstage Quality Gate


**`sentence_validator.py` — 350 lines** | Catches and eliminates artifacts between pipeline stages in real-time.

```
Stage N → SentenceValidator (10 checks) → Stage N+1 → SentenceValidator (10 checks) → ...

```

- **10 checks per sentence:** duplicate words (`the the`), broken contractions (`do n't`), orphaned punctuation, double conjunctions (`and and`), dangling conjunctions, unterminated parentheses, triple+ character repeats, fragment chains, conjunction chains, empty sentences
- **7 validation checkpoints** between pipeline stages — catches artifacts the moment they appear
- **Language-aware** — recognizes conjunctions in EN, RU, UK, DE, FR, ES
- **Final sanitization** — post-pipeline cleanup removes residual artifacts that survive all stages
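
A few of the checks listed above can be approximated with regular expressions. This is a hedged sketch of five of the ten checks; the function name `validate_sentence` and the issue labels are hypothetical, and the real `sentence_validator.py` may use different rules.

```python
import re
from typing import List


def validate_sentence(sentence: str) -> List[str]:
    """Return the names of failed checks for one sentence."""
    issues = []
    if not sentence.strip():
        issues.append("empty_sentence")
    # Same word twice in a row, e.g. "the the" or "and and".
    if re.search(r"\b(\w+)\s+\1\b", sentence, flags=re.IGNORECASE):
        issues.append("duplicate_word")
    # Contraction split by a space, e.g. "do n't".
    if re.search(r"\b\w+\s+n't\b", sentence):
        issues.append("broken_contraction")
    # More opening than closing parentheses.
    if sentence.count("(") > sentence.count(")"):
        issues.append("unterminated_parenthesis")
    # A letter repeated three or more times, e.g. "sooo" (ellipses are ignored).
    if re.search(r"([^\W\d_])\1{2,}", sentence):
        issues.append("char_repeat")
    return issues


print(validate_sentence("the the cat sat."))
print(validate_sentence("I do n't know (yet."))
print(validate_sentence("This is fine."))
```

Running a validator like this after each stage, rather than once at the end, is what makes it possible to attribute an artifact to the stage that introduced it.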

---

📦 Installation
---------------

```
pip install texthumanize
```

**From source:**

```
git clone https://github.com/ksanyok/TextHumanize.git
cd TextHumanize && pip install -e .
```

> **Tip:** Pin your version for production: `pip install texthumanize==0.33.0`

**PHP / TypeScript**

```
# PHP
cd php/ && composer install

# TypeScript
cd js/ && npm install
```

---

⚡ Quick Start
-------------


```
from texthumanize import (
    humanize, analyze, detect_ai, detect_ai_explain, quality_score_report, explain,
)

# 1. Humanize text
result = humanize(
    "Furthermore, it is important to note that this approach facilitates optimization.",
    lang="en",
    seed=42,
)
print(result.text)           # Normalized text
print(result.change_ratio)   # 0.50 — proportion of text changed
print(result.quality_score)  # Quality metric
print(result.metrics_after["humanize_explain"]["remaining_risks"])

# 2. Control with profiles and intensity
text = "Furthermore, it is important to note that this approach facilitates optimization."
result = humanize(text, lang="en", profile="web", intensity=70)
strict = humanize(text, lang="en", quality_gate="strict")
minimal = humanize(text, lang="en", minimal=True)

# 3. AI Detection — 3-layer ensemble
ai = detect_ai("Text to check for AI generation.", lang="en")
print(f"AI: {ai['score']:.0%} | {ai['verdict']} | Confidence: {ai['confidence']:.0%}")

# 3b. Explainable AI audit
audit = detect_ai_explain("Furthermore, it is important to note...", lang="en")
print(audit["highlighted_spans"])

# 3c. Unified TextHumanize Quality Score (0..1 + letter grade)
quality = quality_score_report(result.text, original=result.original, lang="en")
print(f"Quality: {quality['score']:.2f} ({quality['grade']}) — {quality['verdict']}")
print(quality["strengths"], quality["recommendations"][:1])

# 4. Text analysis
report = analyze("Text to analyze.", lang="en")
print(f"Artificiality: {report.artificiality_score:.1f}/100")

# 5. Full change report
print(explain(result))
```
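
The `seed=42` argument above relates to the reproducibility claim (same input plus same seed gives identical output). The sketch below shows how a seeded private generator produces that property in general; `SYNONYMS` and `vary` are hypothetical and are not part of the TextHumanize API.

```python
import random

# Hypothetical replacement table for illustration only.
SYNONYMS = {
    "furthermore": ["also", "besides"],
    "facilitates": ["helps", "eases"],
}


def vary(text: str, seed: int) -> str:
    """Replace known words with a randomly chosen synonym, reproducibly."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    out = []
    for word in text.split():
        core = word.rstrip(",.")
        tail = word[len(core):]  # keep trailing punctuation
        options = SYNONYMS.get(core.lower())
        out.append(rng.choice(options) + tail if options else word)
    return " ".join(out)


source = "Furthermore, this approach facilitates optimization."
first = vary(source, seed=42)
second = vary(source, seed=42)
print(first)
print(first == second)
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the result independent of any other code that draws random numbers in the same process.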

Private Offline Workflow
------------------------


For privacy-sensitive content, use the local audit → safe cleanup → strict humanize → audit pattern. It keeps processing offline, preserves critical terms, and records review metrics without using cloud APIs.

```
python examples/private_offline_workflow.py
```

The example uses `backend="local"`, `quality_gate="strict"`, `minimal=True`, brand/identifier preservation, and a socket guard that raises if any code tries to open a network connection. See the full [Private Offline Workflow guide](https://ksanyok.github.io/TextHumanize/getting-started/private-offline-workflow/).
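
A socket guard of the kind described can be built with the standard library alone. The following is a minimal sketch of the general technique, assuming a context manager named `no_network` and an exception named `NetworkAccessError` (both hypothetical); the guard in `examples/private_offline_workflow.py` may be implemented differently.

```python
import socket
from contextlib import contextmanager


class NetworkAccessError(RuntimeError):
    """Raised when guarded code tries to create a socket."""


@contextmanager
def no_network():
    """Temporarily replace socket.socket so any connection attempt fails loudly."""
    original = socket.socket

    def blocked(*args, **kwargs):
        raise NetworkAccessError("network access is disabled in this block")

    socket.socket = blocked
    try:
        yield
    finally:
        socket.socket = original  # always restore the real class


with no_network():
    try:
        socket.socket()
        blocked_ok = False
    except NetworkAccessError:
        blocked_ok = True

print(blocked_ok)
```

Because the guard patches the module attribute, it only catches code that looks up `socket.socket` at call time; extensions that open connections without going through Python's `socket` module are not covered.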

### All Features at a Glance


```
from texthumanize import (
    # Core humanization
    humanize, humanize_batch, humanize_chunked, humanize_ai,
    humanize_batch_stream, humanize_until_human, humanize_sentences, humanize_stream,
    humanize_variants,
    # AI detection
    detect_ai, detect_ai_explain, detect_ai_batch, detect_ai_sentences,
    detect_ai_mixed, audit_report, quality_score_report,
    # NLP tools
    paraphrase, analyze_tone, adjust_tone,
    detect_watermarks, clean_watermarks, watermark_report,
    # Media (image/audio/video) watermark & provenance forensics
    detect_media_watermarks, clean_media_watermarks,
    spin, spin_variants,
    analyze_coherence, full_readability,
    # Advanced
    build_author_profile, compare_fingerprint,
    detect_ab, evasion_resistance, adversarial_calibrate,
    anonymize_style,
    # Infrastructure
    AutoTuner, BenchmarkSuite, STYLE_PRESETS,
)

# Paraphrasing — syntactic transforms
print(paraphrase("The system works efficiently.", lang="en"))

# Tone — 7-level formality scale
tone = analyze_tone("Please submit the documentation.", lang="en")
casual = adjust_tone("It is imperative to proceed.", target="casual", lang="en")

# Watermarks — detect and remove hidden characters
clean = clean_watermarks("Te\u200bxt wi\u200bth hid\u200bden chars")
wm = watermark_report("Te\u200bxt wi\u200bth hid\u200bden chars", lang="en")

# Spinning — generate N variants
variants = spin_variants("Original text.", count=5, lang="en")

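# Batch + chunked processing
results = humanize_batch(["Text 1", "Text 2"], lang="en", max_workers=4)
large_doc = "Длинный документ. " * 1000  # placeholder for a long Russian text ("Long document.")
result = humanize_chunked(large_doc, chunk_size=3000, lang="ru")
texts = ["Text 1", "Text 2"]  # placeholder inputs
for item in humanize_batch_stream(texts, lang="en", memory_limit_mb=128):
    print(item["index"], item["result"].text)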

# Iterative humanization — keep rewriting until AI score drops
result = humanize_until_human("AI text", lang="en", target_score=0.35)

# Streaming — process paragraphs as they arrive
for chunk in humanize_stream("Long text...", lang="en"):
    print(chunk, end="", flush=True)

# Stylistic fingerprinting
profile = build_author_profile("Author's sample text...", lang="en")
similarity = compare_fingerprint("New text", profile)

# Style anonymization
anon = anonymize_style("Text with distinctive style", lang="en")

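# Async API (`await` must run inside a coroutine, so wrap the calls)
import asyncio
from texthumanize import async_humanize, async_detect_ai

async def main():
    result = await async_humanize("Text to process", lang="en")
    ai = await async_detect_ai("Text to check", lang="en")

asyncio.run(main())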
```
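
The watermark example above passes text containing `\u200b` (zero-width space). A standard-library sketch of that one cleaning step follows; the character set and function names are assumptions for illustration and cover only a fraction of what `clean_watermarks` is described as handling.

```python
# Common invisible code points; an illustrative subset, not the library's list.
ZERO_WIDTH = {
    "\u200b",  # zero-width space
    "\u200c",  # zero-width non-joiner
    "\u200d",  # zero-width joiner
    "\u2060",  # word joiner
    "\ufeff",  # zero-width no-break space (BOM)
}


def count_zero_width(text: str) -> int:
    """Count invisible characters from the set above."""
    return sum(ch in ZERO_WIDTH for ch in text)


def strip_zero_width(text: str) -> str:
    """Remove invisible characters from the set above."""
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)


marked = "Te\u200bxt wi\u200bth hid\u200bden chars"
print(count_zero_width(marked))
print(strip_zero_width(marked))
```

Blanket removal is a simplification: the zero-width joiner and non-joiner are legitimate in some scripts and in emoji sequences, so a production cleaner needs to be script-aware.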

---

🔄 Before & After Examples
--------------------------

### English


**Before (AI-generated, AI score: 94%):**

> Furthermore, it is important to note that the implementation of cloud computing facilitates the optimization of business processes. Additionally, the utilization of microservices constitutes a significant advancement. Moreover, the integration of artificial intelligence into the workflow enhances decision-making processes and contributes to overall organizational efficiency.

**After (TextHumanize, profile="web", intensity=60, AI score: 23%):**

> Also, importantly, the implementation of cloud computing helps the tuning of business processes. Up a major advancement, additionally, the use of microservices makes. And, the merge of artificial intelligence into the workflow enhances decision-making processes; and, contributes to overall organizational speed.

```
AI score: 94% → 23%  (reduction: 71 percentage points)

```

### Russian


**Before (AI score: 80%):**

> Необходимо отметить, что данная методология обеспечивает существенное повышение эффективности рабочих процессов. Кроме того, внедрение инновационных технологий способствует оптимизации функционирования организации. Более того, использование искусственного интеллекта позволяет значительно улучшить процесс принятия решений.

**After (AI score: 5%):**

> Важно — что данная метод даёт существенное повышение эффективности рабочих процессов! Впрочем, смотрите, внедрение инновационных технологий помогает оптимизации функционирования организации, значительно, к тому же, использование искусственного интеллекта позволяет улучшить процесс принятия решений.

```
AI score: 80% → 5%  (reduction: 75 percentage points)

```

### Ukrainian


**Before (AI score: 75%):**

> Необхідно зазначити, що дана методологія забезпечує суттєве підвищення ефективності робочих процесів. Крім того, впровадження інноваційних технологій сприяє оптимізації функціонування організації. Більш того, використання штучного інтелекту дозволяє значно покращити процес прийняття рішень.

**After (AI score: 17%):**

> Важливо, що ця метод дає суттєве підвищення ефективності робочих процесів; в принципі, впровадження інноваційних технологій веде до оптимізації функціонування організації. До того ж, використання штучного інтелекту дає змогу сильно покращити процес прийняття рішень.

```
AI score: 75% → 17%  (reduction: 58 percentage points)

```

### AI Score Reduction Summary

| Language | Before | After | Reduction | Mode |
|---|---|---|---|---|
| **English** | 94% | 2% | **-92pp** | web/70 |
| **English** | 94% | 23% | **-71pp** | web/60 |
| **Russian** | 80% | 5% | **-75pp** | web/50 |
| **Ukrainian** | 75% | 17% | **-58pp** | web/50 |

> **Built-in AI detector scores.** Results measured with TextHumanize's 3-layer ensemble (heuristic + statistical + MLP neural). External detectors may produce different results.

### Profile Comparison (EN, intensity=50)

| Profile | Change Ratio | Quality | AI Score After |
|---|---|---|---|
| `web` | 0.50 | 0.20 | **27%** 🟢 |
| `chat` | 0.61 | 0.20 | **27%** 🟢 |
| `marketing` | 0.48 | 0.25 | **27%** 🟢 |
| `seo` | 0.48 | 0.25 | 33% 🟢 |
| `formal` | 0.48 | 0.24 | 29% 🟢 |
| `academic` | 0.48 | 0.24 | 29% 🟢 |

> **Input AI score: 94%** — all profiles bring it below 35%.

---

🧩 Feature Matrix
----------------

| Category | Feature | Python | JS | PHP |
|---|---|---|---|---|
| **Core** | `humanize()` — 38-stage pipeline | ✅ | ✅ | ✅ |
| | `humanize_batch()` — parallel processing | ✅ | — | ✅ |
| | `humanize_chunked()` — large text support | ✅ | — | ✅ |
| | `humanize_ai()` — three-tier AI + rules | ✅ | — | — |
| | `humanize_until_human()` — iterative | ✅ | — | — |
| | `humanize_sentences()` — per-sentence | ✅ | — | — |
| | `humanize_stream()` — streaming | ✅ | — | — |
| | `humanize_variants()` — N output variants | ✅ | — | — |
| | `analyze()` — artificiality scoring | ✅ | ✅ | ✅ |
| | `explain()` — change report | ✅ | — | ✅ |
| **AI Detection** | `detect_ai()` — 3-layer ensemble | ✅ | ✅ | ✅ |
| | `detect_ai_batch()` — batch detection | ✅ | — | — |
| | `detect_ai_sentences()` — per-sentence | ✅ | — | — |
| | `detect_ai_mixed()` — mixed content | ✅ | — | — |
| | `StatisticalDetector` — 35-feature LR | ✅ | — | — |
| | `NeuralAIDetector` — MLP (pure Python) | ✅ | — | — |
| **NLP** | `paraphrase()` — syntactic transforms | ✅ | — | ✅ |
| | `POSTagger` — rule-based POS (4 langs) | ✅ | — | — |
| | `HMMTagger` — Viterbi HMM tagger | ✅ | — | — |
| | `CJKSegmenter` — zh/ja/ko segmentation | ✅ | — | — |
| | `SyntaxRewriter` — 8+ sentence transforms | ✅ | — | — |
| | `WordLanguageModel` — perplexity (14 langs) | ✅ | — | — |
| | `NeuralPerplexity` — LSTM char-level LM | ✅ | — | — |
| | `CollocEngine` — PMI scoring + replacement guard | ✅ | — | — |
| | `MorphologyEngine` — 4 languages | ✅ | — | — |
| | `WordVec` — lightweight word vectors | ✅ | — | — |
| **Tone** | `analyze_tone()` — formality analysis | ✅ | — | ✅ |
| | `adjust_tone()` — 7-level adjustment | ✅ | — | ✅ |
| **Watermarks** | `detect_watermarks()` — 6 types | ✅ | — | ✅ |
| | `clean_watermarks()` — removal | ✅ | — | ✅ |
| **Spinning** | `spin()` / `spin_variants()` | ✅ | — | ✅ |
| **Analysis** | `analyze_coherence()` — paragraph flow | ✅ | — | ✅ |
| | `full_readability()` — 6 indices | ✅ | — | ✅ |
| | `check_grammar()` — rule-based (9 langs) | ✅ | — | — |
| | `uniqueness_score()` — plagiarism check | ✅ | — | — |
| | `content_health()` — composite 0–100 | ✅ | — | — |
| | `semantic_similarity()` — TF-IDF cosine | ✅ | — | — |
| | `sentence_readability()` — per-sentence | ✅ | — | — |
| | Stylistic fingerprinting | ✅ | — | — |
| **Quality** | `BenchmarkSuite` — 6-dimension scoring | ✅ | — | — |
| | `FingerprintRandomizer` — anti-detection | ✅ | — | — |
| | `QualityGate` — CI/CD content check | ✅ | — | — |
| **Advanced** | Idiolect presets (9 personas) | ✅ | — | — |
| | Auto-Tuner (feedback loop) | ✅ | — | — |
| | AI backend (OpenAI/Ollama/OSS) | ✅ | — | — |
| | Custom dictionary overlays | ✅ | — | — |
| | Domain dictionaries (SaaS/ecommerce/etc.) | ✅ | — | — |
| | Dictionary trainer (corpus) | ✅ | — | — |
| | Neural network training loop | ✅ | — | — |
| | Dashboard (HTML reports) | ✅ | — | — |
| | Plugin system | ✅ | — | ✅ |
| | REST API (OpenAPI + SSE) | ✅ | — | — |
| | SSE streaming | ✅ | — | — |
| | CLI (15+ commands) | ✅ | — | — |
| **Languages** | Full dictionary support | 14 | 2 | 14 |
| | Universal processor | ✅ | ✅ | ✅ |

---

⚔️ Comparison with Competitors
------------------------------

### vs. Online Humanizers & GPT/LLM Rewriting

| Criterion | TextHumanize | Online Humanizers | GPT/LLM Rewriting |
|---|---|---|---|
| Works offline | ✅ | ❌ | ❌ |
| Privacy | ✅ 100% local | ❌ Third-party servers | ❌ Cloud API |
| Speed | **~300 ms/paragraph** | 2–10 sec (network) | ~500 chars/sec |
| Cost per 1M chars | **$0** | $10–50/month | $15–60 (GPT-4) |
| API key required | No | Yes | Yes |
| Deterministic | ✅ Seed-based | ❌ | ❌ |
| Languages | **25 + universal** | 1–3 | 10+ but expensive |
| Built-in AI detector | ✅ 3-layer ensemble | ❌ or basic | ❌ |
| Max change control | ✅ `max_change_ratio` | ❌ | ❌ Unpredictable |
| Open source | ✅ | ❌ | ❌ |
| Self-hosted | ✅ Docker / pip | ❌ | ❌ |
| Audit trail | ✅ `explain()` | ❌ | ❌ |

### vs. Other Open-Source Libraries

| Feature | TextHumanize | Typical Alternatives |
|---|---|---|
| Pipeline stages | **38** | 2–4 |
| Languages | **25 + universal** | 1–2 |
| AI detection | ✅ 3-layer (18 + 35 + MLP) | ❌ |
| Python tests | **2,248** | 10–50 |
| Codebase size | **240,000+ lines** | 500–2K |
| Platforms | Python + JS + PHP | Single |
| Plugin system | ✅ | ❌ |
| Tone analysis | ✅ 7 levels | ❌ |
| REST API | ✅ OpenAPI + SSE | ❌ |
| Readability metrics | ✅ 6 indices | 0–1 |
| Morphological engine | ✅ 4 languages | ❌ |
| Neural components | MLP + LSTM + HMM | ❌ |
| Content spinning | ✅ spintax | ❌ |
| Stylistic fingerprinting | ✅ | ❌ |
| Grammar checker | ✅ 9 languages | ❌ |
| Plagiarism detection | ✅ n-gram | ❌ |

### vs. AI Detectors (GPTZero, Originality.ai)

| Feature | TextHumanize | GPTZero | Originality.ai |
|---|---|---|---|
| Price | **Free** | From $10/mo | From $14.95/mo |
| Works offline | ✅ | ❌ | ❌ |
| Self-hosted | ✅ | ❌ | ❌ |
| Per-sentence detection | ✅ | ✅ | ✅ |
| Mixed-content detection | ✅ | ✅ | ❌ |
| Combined humanize + detect | ✅ | ❌ | ❌ |
| Custom training | ✅ `dict_trainer` | ❌ | ❌ |
| API | ✅ REST + SSE | ✅ REST | ✅ REST |
| Batch detection | ✅ | ✅ (paid) | ✅ (paid) |
| CI/CD quality gate | ✅ `quality_gate.py` | ❌ | ❌ |

---

🔧 Processing Pipeline (38 Stages)
---------------------------------


```
Input Text
  │
  ├── ASH™ Pre-Processing (3 stages) ──
  ├─ [A1] ASH Signature Analysis     Analyze input statistical fingerprint
  ├─ [A2] ASH Feature Extraction     Extract 35+ features for adaptive tuning
  ├─ [A3] ASH Intensity Calibration  Auto-calibrate intensity per-feature
  │
  ├── Core Pipeline (28 stages) ──
  ├─ [0]  Watermark Cleaning         Remove zero-width chars, homoglyphs, invisible Unicode
  ├─ [1]  Segmentation               Protect URLs, code blocks, emails, brand terms
  ├─ [2]  Typography                 Normalize quotes, dashes, spaces (profile-aware)
  ├─ [2c] CJK Segmentation           Word segmentation for Chinese/Japanese/Korean
  ├─ [3]  Debureaucratization        Replace official/formulaic phrases with natural ones
  ├─ [4]  Structure Diversification   Vary sentence patterns, replace AI connectors
  ├─ [5]  Repetition Reduction       Remove tautology, vary repeated words
  ├─ [6]  Liveliness Injection       Add conversational markers, colloquialisms
  ├─ [7]  Semantic Paraphrasing      Voice transforms, clause reordering, nominalization reversal
  ├─ [7b] Syntax Rewriting           Active↔passive, fronting, cleft, conditional inversion
  │       └─ ✓ SentenceValidator checkpoint
  ├─ [8]  Tone Harmonization         Align vocabulary register to target profile
  ├─ [9]  Universal Processing       Language-agnostic statistical transforms
  ├─ [10] Naturalization             Core 3,444-line rule engine: AI-word swap, burstiness
  │       └─ ✓ SentenceValidator checkpoint
  ├─ [10a] Paraphrase Engine         MWE decomposition, hedging, perspective rotation
  │        └─ ✓ SentenceValidator checkpoint
  ├─ [10a½] Sentence Restructuring   Contractions, register mixing, rhetorical questions
  │         └─ ✓ SentenceValidator checkpoint
  ├─ [10b] Word LM Quality Gate      Bigram/trigram naturalness check (advisory)
  ├─ [10c] Entropy Injection         Increase statistical burstiness and entropy
  │        └─ ✓ SentenceValidator checkpoint
  ├─ [11] Readability Optimization   Split/merge sentences to match profile length targets
  ├─ [12] Grammar Correction         Grammar polish with safety gates (25 languages)
  │       └─ ✓ SentenceValidator checkpoint
  ├─ [13] Coherence Repair           Transitional phrases, paragraph flow repair
  │       └─ ✓ SentenceValidator checkpoint
  ├─ [13a] Entropy Injection (2nd)   Final entropy pass for high-intensity processing
  ├─ [13b] Fingerprint Randomizer    Anti-stylometric diversification
  ├─ [14] Validation                 Change ratio check, keyword preservation, AI regression guard
  ├─ [14a] Final Sanitization        Double conjunction, dangling conjunction, chain residue cleanup
  │
  ├── Post-Pipeline (8 stages) ──
  ├─ [P1] Detector-in-the-loop       Score check, up to 3 retry iterations
  ├─ [P2] LLM-assisted rewrite       Optional, if backend configured
  ├─ [P3] Regression guard           Hard constraint enforcement
  ├─ [P4] PHANTOM™ optimization      Gradient-guided internal score refinement (optional)
  │
  ├── ASH™ Post-Processing (3 stages) ──
  ├─ [A4] ASH Signature Transfer     Apply target human signature
  ├─ [A5] ASH Perplexity Sculpting   Match human perplexity distribution
  ├─ [A6] ASH Final Verification     Verify output matches target profile
  │
  └─ Output

```

- **Adaptive intensity:** Auto-reduces processing for already-natural text.
- **Graduated retry:** Retries at lower intensity if the change ratio exceeds the limit.
- **SentenceValidator™:** 7 interstage checkpoints catch artifacts between stages (10 checks per sentence).
- **Tier system:** Tier 1 languages (EN/RU/UK/DE) get all 38 stages. Tier 2 (FR/ES/IT/PL/PT/NL/SV/CS/RO/HU/DA) get ~30. Tier 3 (AR/ZH/JA/KO/TR/HI/VI/TH/ID/HE) get ~20 + universal.
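
The graduated-retry behavior can be sketched as a loop that lowers intensity until the change ratio fits the limit. Everything here is an assumption made for illustration: `toy_rewrite` is a hypothetical rewriter, the step size is arbitrary, and the change ratio is approximated with `difflib` rather than the library's own metric.

```python
import difflib
from typing import Callable, Tuple


def change_ratio(original: str, rewritten: str) -> float:
    """Approximate share of changed words via difflib's similarity ratio."""
    matcher = difflib.SequenceMatcher(None, original.split(), rewritten.split())
    return 1.0 - matcher.ratio()


def rewrite_with_retry(
    text: str,
    rewrite: Callable[[str, int], str],
    intensity: int = 80,
    max_change_ratio: float = 0.5,
    step: int = 20,
) -> Tuple[str, int]:
    """Retry at lower intensity until the change ratio is within the limit."""
    while intensity > 0:
        candidate = rewrite(text, intensity)
        if change_ratio(text, candidate) <= max_change_ratio:
            return candidate, intensity
        intensity -= step
    return text, 0  # nothing acceptable: keep the original


def toy_rewrite(text: str, intensity: int) -> str:
    """Hypothetical rewriter: uppercases the first `intensity` percent of words."""
    words = text.split()
    n = len(words) * intensity // 100
    return " ".join(w.upper() if i < n else w for i, w in enumerate(words))


source = "one two three four five six seven eight nine ten"
result, used = rewrite_with_retry(source, toy_rewrite, intensity=80, max_change_ratio=0.5)
print(used, result)
```

With ten words and a 0.5 limit, intensities 80 and 60 change too much and the loop settles on 40. Returning the untouched input when no intensity qualifies is one possible fallback; the library's actual fallback is not documented in this section.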

---

🧠 AI Detection Engine
---------------------


Three independent detectors combined into a single score:

### Architecture


```
              ┌──────────────────────────────┐
              │       Input Text             │
              └──────────┬───────────────────┘
                         │
          ┌──────────────┼──────────────────┐
          ▼              ▼                  ▼
  ┌───────────────┐ ┌────────────────┐ ┌──────────────┐
  │  Heuristic    │ │  Statistical   │ │    Neural     │
  │  Detector     │ │  Detector      │ │   Detector    │
  │  (18 metrics) │ │  (35 features) │ │  (MLP, pure)  │
  └───────┬───────┘ └───────┬────────┘ └──────┬───────┘
          │                 │                  │
          └─────────────────┼──────────────────┘
                            ▼
              ┌──────────────────────────────┐
              │    Weighted Ensemble          │
              │  + Strong-signal detector     │
              │  + Majority voting            │
              └──────────────────────────────┘
                            │
                            ▼
              Score (0–100%), Verdict, Confidence

```
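
The combination step in the diagram can be sketched as a weighted average followed by a vote. This is an illustration only: the weights and the agreement-based confidence are hypothetical, since the source does not state how the real ensemble is weighted. The verdict bands are the ones listed under "Verdicts" in this README.

```python
from typing import Dict, Tuple

# Hypothetical weights; the real ensemble weights are not given in the source.
WEIGHTS = {"heuristic": 0.3, "statistical": 0.3, "neural": 0.4}


def verdict_for(score: float) -> str:
    """Map a 0..1 score to the documented verdict bands."""
    if score < 0.35:
        return "human_written"
    if score < 0.65:
        return "mixed"
    return "ai_generated"


def ensemble(scores: Dict[str, float]) -> Tuple[float, str, float]:
    """Combine detector scores into (score, verdict, confidence)."""
    combined = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    verdict = verdict_for(combined)
    # Voting expressed as agreement: share of detectors whose own verdict matches.
    agree = sum(verdict_for(s) == verdict for s in scores.values())
    return combined, verdict, agree / len(scores)


print(ensemble({"heuristic": 0.9, "statistical": 0.8, "neural": 0.7}))
print(ensemble({"heuristic": 0.2, "statistical": 0.5, "neural": 0.9}))
```

In the second call the detectors disagree, so the combined score lands in the `mixed` band and the agreement-based confidence drops to one third.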

### 18 Heuristic Metrics

| # | Metric | What It Measures |
|---|---|---|
| 1 | **Entropy** | Character/word-level Shannon entropy |
| 2 | **Burstiness** | Sentence/paragraph length variability (humans vary, AI doesn't) |
| 3 | **Vocabulary** | TTR, MATTR, Yule's K, hapax legomena ratio |
| 4 | **Zipf** | Fit to Zipf's law distribution |
| 5 | **Stylometry** | Function word patterns, punctuation fingerprint |
| 6 | **AI Patterns** | Formulaic phrases ("it is important to note", "furthermore") |
| 7 | **Punctuation** | Punctuation distribution profile |
| 8 | **Coherence** | Paragraph uniformity (too-uniform = AI) |
| 9 | **Grammar** | Grammatical "perfection" level (too-perfect = AI) |
| 10 | **Openings** | Sentence-opening diversity |
| 11 | **Readability** | Consistency of readability scores across sentences |
| 12 | **Rhythm** | Syllable patterns, sentence length rhythm |
| 13 | **Perplexity** | N-gram predictability |
| 14 | **Discourse** | Discourse structure (topic sentences, markers) |
| 15 | **Semantic Repetition** | Cross-paragraph semantic overlap |
| 16 | **Entity** | Specificity of named entities and examples |
| 17 | **Voice** | Passive vs. active voice ratio |
| 18 | **Topic Sentence** | Topic-sentence-per-paragraph pattern |

### 35-Feature Statistical Detector (Logistic Regression)

| Category | Features |
|----------|----------|
| Lexical (4) | Type-token ratio, hapax ratio, avg word length, word length variance |
| Sentence (3) | Mean sentence length, length variance, length skewness |
| Vocabulary (3) | Yule's K, Simpson's diversity, vocabulary richness |
| N-gram (3) | Bigram/trigram repetition rates, unique bigram ratio |
| Entropy (3) | Character entropy, word entropy, bigram entropy |
| Burstiness (2) | Sentence burstiness, vocabulary burstiness |
| Structural (3) | Paragraph count, avg paragraph length, list/bullet ratio |
| Punctuation (5) | Comma, semicolon, dash, question, exclamation rates |
| AI Pattern (1) | AI pattern rate (**strongest single feature**, weight −2.10) |
| Perplexity (2) | Word frequency rank variance, Zipf fit residual |
| Readability (2) | Syllables/word, Flesch score normalized |
| Discourse (3) | Starter diversity, conjunction rate, transition word rate |
| Rhythm (1) | Consecutive length difference variance |

### Neural MLP Detector

Feed-forward neural network entirely in pure Python (no PyTorch, no TensorFlow). Pre-trained weights shipped as compressed JSON (54 KB).
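
To make "pure Python MLP" concrete, here is a minimal forward-pass sketch using only the standard library. The layer sizes, the ReLU/sigmoid choice, the toy weights, and the JSON layout (`w1`, `b1`, `w2`, `b2`) are assumptions for illustration; the format of the shipped weight file is not described in this README.

```python
import json
import math


def relu(values: list[float]) -> list[float]:
    return [max(0.0, v) for v in values]


def dense(inputs: list[float], weights: list[list[float]], biases: list[float]) -> list[float]:
    """One fully connected layer; `weights` holds one row per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def mlp_predict(features: list[float], model: dict) -> float:
    """Input -> hidden layer (ReLU) -> single output unit (sigmoid)."""
    hidden = relu(dense(features, model["w1"], model["b1"]))
    (logit,) = dense(hidden, model["w2"], model["b2"])
    return sigmoid(logit)


# Toy weights (2 features -> 2 hidden units -> 1 output), parsed from JSON text.
toy_model = json.loads(
    '{"w1": [[1.0, -1.0], [0.5, 0.5]], "b1": [0.0, 0.0],'
    ' "w2": [[1.0, 1.0]], "b2": [-1.0]}'
)
probability = mlp_predict([1.0, 1.0], toy_model)
```

Because every operation is a plain list comprehension, a network like this needs no third-party packages, at the cost of being much slower than a vectorized implementation.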

### Verdicts

| Score | Verdict | Meaning |
|-------|---------|---------|
| < 35% | `human_written` | Likely written by a human |
| 35–65% | `mixed` | Mixed content or uncertain |
| ≥ 65% | `ai_generated` | Likely AI-generated |

### Detection Modes

```
from texthumanize import (
    detect_ai,
    detect_ai_batch,
    detect_ai_mixed,
    detect_ai_sentences,
)

# Single text
result = detect_ai("Text to check.", lang="en")
print(f"{result['score']:.0%} — {result['verdict']}")

# Per-sentence detection
for s in detect_ai_sentences(text, lang="en"):
    print(f"{'🤖' if s['label'] == 'ai' else '👤'} [{s['score']:.0%}] {s['text'][:80]}")

# Mixed-content detection (human + AI paragraphs)
report = detect_ai_mixed(text, lang="en")
for segment in report['segments']:
    print(f"{segment['label']}: {segment['text'][:60]}")

# Batch detection
results = detect_ai_batch(["Text 1", "Text 2", "Text 3"], lang="en")

# Offline detector benchmark: human vs raw/lightly/heavily edited AI
from texthumanize import detector_benchmark, index_eval_corpus, load_eval_corpus

corpus = load_eval_corpus(include_metadata=True)
print(corpus["license"]["id"])  # CC0-1.0

support_fixtures = load_eval_corpus(
    languages=["en"],
    domains=["support"],
    length_buckets=["lt_300"],
    sources=["text-humanize-authored-synthetic"],
)
print([sample["id"] for sample in support_fixtures])
print(index_eval_corpus()["counts"]["domain"])

report = detector_benchmark(languages=["en", "ru", "uk"])
print(report["per_language"]["en"]["avg_score_by_label"])

# Contributor data packs: AI markers, synonyms, collocations, watermark samples
from texthumanize import list_contributor_packs, load_contributor_pack

print(list_contributor_packs().keys())
support_synonyms = load_contributor_pack("synonyms", domains=["support"])
print(support_synonyms["entries"][0]["replacements"])
```

```
# Build a manual AI-marker review table from the licensed corpus
python scripts/update_marker_packs.py \
  --review-out marker_pack_review.md \
  --candidates-out marker_pack_candidates.json

# After a reviewer marks rows as approved, merge only those rows
python scripts/update_marker_packs.py --apply-reviewed marker_pack_review.md
```

---

📖 API Reference
---------------

### `humanize(text, lang, **kwargs) → HumanizeResult`

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `text` | `str` | - | Input text (max 1 MB) |
| `lang` | `str` | - | Language code: `en`, `ru`, `uk`, `de`, etc. |
| `profile` | `str` | `"web"` | Processing profile: `chat`, `web`, `seo`, `docs`, `formal`, `academic`, `marketing`, `social`, `email`, plus intent aliases `seo_article`, `landing_page`, `product_description`, `support_reply`, `legal`, `social_post` |
| `intensity` | `int` | `50` | Aggressiveness 0–100 |
| `seed` | `int` | `None` | PRNG seed for reproducibility |
| `preserve` | `dict` | `{}` | Protect code, URLs, email, dates, prices, ids, quotes, named entities, brand terms |
| `minimal` | `bool` | `False` | Only humanize AI-flagged sentences |
| `max_change_ratio` | `float` | `None` | Maximum allowed proportion of change (0.0–1.0) |
| `constraints` | `dict` | `{}` | Advanced constraints (`keep_keywords`, etc.) |
| `quality_gate` | `str` | `None` | Use `"strict"` to rollback on similarity, grammar, or readability regression |
| `backend` | `str` | `None` | LLM backend: `"openai"`, `"ollama"`, `"oss"`, `"auto"` |

**Returns `HumanizeResult`:**

| Field | Type | Description |
|-------|------|-------------|
| `.text` | `str` | Processed text |
| `.change_ratio` | `float` | Proportion of text changed (0.0–1.0) |
| `.quality_score` | `float` | Quality metric |
| `.similarity` | `float` | Semantic similarity to original |
| `.metrics_after["humanize_explain"]` | `dict` | Top 5 change reasons, top 5 remaining risks, sentence-level risk deltas |
| `.metrics_after["anti_overhumanize"]` | `dict` | Final guard report for stacked fillers, repeated discourse markers, and excessive `!` / `?` punctuation |
| `.stages` | `list` | Stages applied with timing |

### Other Humanization Modes

```
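from texthumanize import (
    humanize_batch,
    humanize_chunked,
    humanize_sentences,
    humanize_stream,
    humanize_until_human,
    humanize_variants,
)

# Batch: parallel processing with thread pool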
results = humanize_batch(texts, lang="en", max_workers=4)
bounded = humanize_batch(texts, lang="en", memory_limit_mb=128)

# Chunked — split large documents
result = humanize_chunked(large_doc, chunk_size=3000, lang="ru", memory_limit_mb=128)

# Until human — loop until AI score drops below threshold
result = humanize_until_human(text, lang="en", target_score=0.35, max_iterations=5)

# Streaming — paragraph by paragraph
for chunk in humanize_stream(text, lang="en", memory_limit_mb=128):
    print(chunk, end="", flush=True)

# Variants — generate N different versions
variants = humanize_variants(text, lang="en", count=5)

# Sentences — humanize each sentence individually
results = humanize_sentences(text, lang="en")
```
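
The "until human" mode is conceptually a bounded retry loop. The sketch below shows that loop shape with stand-in callables; `rewrite_until`, `toy_score`, and `toy_rewrite` are hypothetical names for this example, and the real `humanize_until_human` may choose candidates differently.

```python
from typing import Callable


def rewrite_until(
    text: str,
    rewrite: Callable[[str], str],
    score: Callable[[str], float],
    target_score: float = 0.35,
    max_iterations: int = 5,
) -> tuple[str, float]:
    """Apply `rewrite` until `score(text)` drops below `target_score`."""
    best_text, best_score = text, score(text)
    for _ in range(max_iterations):
        if best_score < target_score:
            break
        candidate = rewrite(best_text)
        candidate_score = score(candidate)
        # Keep a candidate only if it actually lowers the score.
        if candidate_score < best_score:
            best_text, best_score = candidate, candidate_score
    return best_text, best_score


# Stand-ins for a detector and a humanizer, for demonstration only.
def toy_score(text: str) -> float:
    words = text.lower().split()
    hits = sum(word.strip(",.") == "furthermore" for word in words)
    return hits / max(len(words), 1)


def toy_rewrite(text: str) -> str:
    return text.replace("Furthermore, ", "", 1)


final_text, final_score = rewrite_until(
    "Furthermore, it works. Furthermore, it scales.",
    toy_rewrite,
    toy_score,
    target_score=0.1,
)
```

Bounding the loop with `max_iterations` matters because a rewrite step is not guaranteed to reach the target score.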

### `detect_ai(text, lang) → dict`

| Field | Description |
|-------|-------------|
| `score` | AI probability (0.0–1.0) |
| `verdict` | `"human_written"`, `"mixed"`, or `"ai_generated"` |
| `confidence` | Confidence level (0.0–1.0) |
| `metrics` | Individual metric scores (18 heuristic + 35 statistical) |
| `combined_score` | Weighted average of all detectors |

### Other Core Functions

| Function | Description |
|----------|-------------|
| `analyze(text, lang)` | Returns `AnalysisReport` with artificiality score, sentence stats |
| `explain(result)` | Human-readable change report |
| `paraphrase(text, lang)` | Syntactic paraphrasing (voice transforms, connector shuffling) |
| `analyze_tone(text, lang)` | Tone analysis (formality, style) |
| `adjust_tone(text, target, lang)` | Adjust formality to 7 levels |
| `detect_ai_explain(text, lang)` | Explainable AI detector report with spans and suggested actions |
| `audit_report(text, lang)` | Combined AI + watermark audit JSON |
| `quality_score_report(text, original=None, lang)` | Unified Quality Score (0..1 + letter grade) across 7 dimensions |
| `detect_media_watermarks(path_or_bytes)` | Image/audio/video AI-watermark & provenance audit (C2PA, generator signatures, stego) |
| `clean_media_watermarks(path_or_bytes, output=...)` | Strip provenance/metadata from PNG/JPEG/WebP/WAV (honest about neural watermarks) |
| `detect_watermarks(text)` | Detect 6 types of invisible watermarks |
| `clean_watermarks(text)` | Remove all detected watermarks |
| `watermark_report(text, lang)` | Unified Unicode + statistical watermark report |
| `spin(text, lang)` | Generate a single spun variant |
| `spin_variants(text, count, lang)` | Generate N spun variants |
| `analyze_coherence(text, lang)` | Paragraph flow analysis |
| `full_readability(text, lang)` | 6 readability indices |
| `build_author_profile(text, lang)` | Stylometric fingerprint |
| `compare_fingerprint(text, profile)` | Compare text to an author profile |
| `anonymize_style(text, lang)` | Stylometric anonymization |
| `check_grammar(text, lang)` | Grammar check (9 languages) |
| `uniqueness_score(text)` | N-gram uniqueness |
| `content_health(text, lang)` | Composite quality score 0–100 |

---

🎭 Profiles & Style Presets
------------------------------

### Processing Profiles

| Profile | Use Case | Sentence Length | Colloquialisms | Default Intensity |
|---------|----------|-----------------|----------------|-------------------|
| `chat` | Messaging, social media | 8–18 words | High | 80 |
| `web` | Blog posts, articles | 10–22 words | Medium | 60 |
| `seo` | SEO content (keyword-safe) | 12–25 words | None | 40 |
| `docs` | Technical documentation | 12–28 words | None | 50 |
| `formal` | Legal, official | 15–30 words | None | 30 |
| `academic` | Research papers | 15–30 words | None | 25 |
| `marketing` | Sales, promo copy | 8–20 words | Medium | 70 |
| `social` | Social media posts | 6–15 words | High | 85 |
| `email` | Business emails | 10–22 words | Medium | 50 |

### Style Presets (9 Idiolects)

| Preset | Sentences | Vocabulary | Style |
|--------|-----------|------------|-------|
| 🎓 `student` | Short–medium | Simple | Conversational, informal |
| ✍️ `copywriter` | Varied (short bursts + long) | Dynamic | Energetic, varied rhythm |
| 🔬 `scientist` | Long, complex | Technical | Formal, precise, cautious hedging |
| 📰 `journalist` | Medium, diverse | Clear | Neutral, fact-oriented |
| 💬 `blogger` | Short, punchy | Informal | Questions, exclamations, personal |
| 🧑‍💼 `editor` / `редактор` | Medium | Tight | Clear, restrained, polished |
| 🚀 `founder` / `основатель` | Varied | Direct | Confident, personal, strategic |
| 🧠 `expert` / `эксперт` | Medium–long | Domain-heavy | Practical, evidence-led |
| 🎧 `support` / `поддержка` | Short | Simple | Helpful, calm, service-oriented |

```
from texthumanize import STYLE_PRESETS, humanize

result = humanize(text, lang="en", profile="seo", intensity=40,
                  constraints={"keep_keywords": ["API", "cloud"]})

result = humanize(text, lang="en", target_style="редактор")
```

### Intensity Levels

| Range | Effect | Use Case |
|-------|--------|----------|
| 0–20 | Minimal: typography and watermarks only | Already-natural text |
| 21–40 | Light: connectors and basic synonym swap | SEO, formal content |
| 41–60 | Moderate: structure + paraphrasing | Blog posts, web content |
| 61–80 | Aggressive: syntax rewriting + entropy | Chat, social media |
| 81–100 | Maximum: all transforms at full power | Heavy AI text |

---

🌍 Language Support
------------------

### Language Tiers

| Tier | Languages | Detection | Humanization | Syntax Rewriting |
|------|-----------|-----------|--------------|------------------|
| **1** | EN, RU, UK, DE | ✅ Full | ✅ Full 38-stage | ✅ |
| **2** | FR, ES, IT, PL, PT | ✅ Good | ✅ 15-stage | ❌ |
| **3** | AR, ZH, JA, KO, TR | ✅ Basic | ✅ 10-stage + universal | ❌ |
| 0 | Any other language | ✅ Statistical | ✅ Universal processor | ❌ |

### Dictionary Coverage

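| Language | Code | Synonyms | Bureaucratic | AI Connectors | Sentence Starters | Colloquial | Collocations |
|----------|------|----------|--------------|---------------|-------------------|------------|--------------|
| English | `en` | 431 | 645 | 152 | 75 | 127 | 1,578 |
| Russian | `ru` | 269 | 486 | 100 | 73 | 102 | 408 |
| Ukrainian | `uk` | 243 | 338 | 75 | 46 | 86 | 38 |
| German | `de` | 138 | 361 | 65 | 54 | 88 | 125 |
| French | `fr` | 141 | 224 | 61 | 49 | 86 | 128 |
| Spanish | `es` | 166 | 230 | 60 | 49 | 78 | 126 |
| Polish | `pl` | 159 | 247 | 60 | 46 | 78 | 34 |
| Portuguese | `pt` | 163 | 204 | 60 | 51 | 79 | 36 |
| Italian | `it` | 168 | 231 | 63 | 49 | 79 | 38 |
| Arabic | `ar` | 126 | 139 | 65 | 40 | 59 | - |
| Chinese | `zh` | 127 | 137 | 51 | 38 | 59 | - |
| Japanese | `ja` | 120 | 123 | 66 | 41 | 59 | - |
| Korean | `ko` | 118 | 120 | 67 | 39 | 59 | - |
| Turkish | `tr` | 119 | 122 | 67 | 43 | 59 | - |

**Universal processor** works for any language using statistical methods: burstiness injection, perplexity normalization, sentence length variation, punctuation diversification.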
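
Burstiness can be quantified in more than one way. The sketch below assumes one common choice, the coefficient of variation of sentence lengths, and uses a naive regex sentence splitter; both the formula and the function names are illustrative assumptions rather than the library's implementation.

```python
import re
from statistics import mean, pstdev


def sentence_lengths(text: str) -> list[int]:
    """Word count per sentence, splitting on ., ! or ? followed by whitespace."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def burstiness(text: str) -> float:
    """Coefficient of variation of sentence lengths (0.0 = perfectly uniform)."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return pstdev(lengths) / mean(lengths)


uniform = "One two three. Four five six. Seven eight nine."
varied = "Stop. This sentence is quite a bit longer than the first one. Why?"
print(f"uniform: {burstiness(uniform):.2f}, varied: {burstiness(varied):.2f}")
```

Because the measure only needs sentence boundaries and word counts, it works without language-specific dictionaries, which is the property a language-agnostic processor relies on.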

---

🧬 NLP Infrastructure
--------------------

TextHumanize includes a full NLP stack — all implemented in pure Python with **zero external dependencies:**

| Module | Component | Description |
|--------|-----------|-------------|
| `pos_tagger.py` | **POS Tagger** (1,917 lines) | Rule-based part-of-speech tagger with suffix/prefix rules for EN/RU/UK/DE |
| `hmm_tagger.py` | **HMM Tagger** (642 lines) | Viterbi-decoding Hidden Markov Model for POS tagging |
| `cjk_segmenter.py` | **CJK Segmenter** (1,277 lines) | Forward/backward max-match Chinese, particle-stripping Korean, character-type Japanese |
| `morphology.py` | **Morphology Engine** (811 lines) | Suffix-based stemming and inflection for RU/UK/EN/DE |
| `collocation_engine.py` | **Collocation Engine** (224 lines) | PMI-based collocation scoring for context-aware synonym selection |
| `word_lm.py` | **Word Language Model** (435 lines) | Bigram/trigram with compressed frequency data for 25 languages |
| `neural_lm.py` | **Neural Char-Level LM** (391 lines) | LSTM-based character language model for perplexity scoring |
| `neural_engine.py` | **Neural Primitives** (610 lines) | Feed-forward net, LSTM cell, embeddings, HMM, layer norm, GELU, all in stdlib |
| `neural_paraphraser.py` | **Seq2Seq Paraphraser** (752 lines) | Encoder-decoder with Bahdanau attention for neural paraphrasing |
| `word_embeddings.py` | **Word Vectors** (399 lines) | Hash-based + cluster embeddings, cosine similarity, nearest neighbors |
| `sentence_split.py` | **Smart Splitter** (338 lines) | Abbreviation-aware sentence splitting (Mr./Dr./URLs/decimals) |
| `lang_detect.py` | **Language Detector** (328 lines) | Character trigram profiling for 25 languages |
| `context.py` | **Contextual Synonyms** (320 lines) | Word sense disambiguation via context windows and topic detection |
| `grammar.py` | **Grammar Checker** (360 lines) | Rule-based grammar for 9 languages (agreement, articles, punctuation) |

> **Total NLP infrastructure:** ~8,800 lines of code, zero pip dependencies.

---

🔍 SEO Mode
----------

TextHumanize includes a dedicated SEO workflow to humanize content without harming search rankings:

```
result = humanize(text, lang="en", profile="seo", intensity=40,
                  constraints={"keep_keywords": ["cloud computing", "API", "microservices"]})
```

| Feature | How It Works |
|---------|--------------|
| **Keyword preservation** | `preserve` and `keep_keywords` lists are never modified |
| **Low intensity** | SEO profile defaults to 40%: gentle transformations |
| **No keyword stuffing** | Does not add or repeat keywords |
| **Structure preservation** | Heading hierarchy (H1–H6) preserved |
| **Meta-safe** | Avoids changing first-paragraph introductions (critical for SEO) |
| **Max change control** | `max_change_ratio=0.3` ensures minimal disruption |

---

📊 Readability Metrics
---------------------

`full_readability()` returns 6 reading metrics:

| Index | Range | What It Measures |
|-------|-------|------------------|
| **Flesch Reading Ease** | 0–100 | Higher = easier (60–70 is ideal for web) |
| **Flesch-Kincaid Grade** | 0–18 | US school grade level |
| **Coleman-Liau Index** | 0–18 | Based on characters (not syllables) |
| **Automated Readability Index** | 0–14 | Character and word counts |
| **SMOG Grade** | 0–18 | Polysyllabic word density |
| **Gunning Fog** | 0–20 | Complex words + sentence length |

**Grade interpretation:**

| Grade | Audience |
|-------|----------|
| 5–6 | General public, social media |
| 7–8 | Web content, blog posts |
| 9–10 | Magazine articles |
| 11–12 | Academic papers |
| 13+ | Technical/legal documents |

```
from texthumanize import full_readability

report = full_readability("Your text here.", lang="en")
print(f"Flesch: {report['flesch_reading_ease']:.1f}")
print(f"Grade: {report['flesch_kincaid_grade']:.1f}")
```
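
For reference, the two Flesch indices are simple functions of words per sentence and syllables per word. The sketch below uses the standard published formulas; the vowel-group syllable counter is a rough English-only heuristic assumed for the example, and `flesch_scores` is a hypothetical helper, not part of the library.

```python
import re


def count_syllables(word: str) -> int:
    """Rough English heuristic: count groups of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_scores(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade) for non-empty text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)

    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return reading_ease, grade


ease, grade = flesch_scores("The cat sat on the mat.")
print(f"Flesch: {ease:.1f}, Grade: {grade:.1f}")
```

Very short, monosyllabic text can score above 100 on Reading Ease or below 0 on grade level with the raw formulas, so implementations often clamp results to the ranges in the table.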

---

✍️ Paraphrasing Engine
----------------------

Rule-based syntactic paraphrasing — no LLM, no API, deterministic:

| Transform | Example |
|-----------|---------|
| Active → Passive | "The team built the app" → "The app was built by the team" |
| Passive → Active | "The report was written by John" → "John wrote the report" |
| Clause reordering | "After analyzing data, we decided…" → "We decided… after analyzing data" |
| Nominalization reversal | "The implementation of X" → "Implementing X" |
| Connector shuffling | "Furthermore, X. Additionally, Y." → "What's more, X. Also, Y." |
| MWE decomposition | "take into account" → "consider" |
| Hedging injection | "X is true" → "X appears to be true" |
| Perspective rotation | "Users need X" → "X is needed by users" |

```
from texthumanize import paraphrase

result = paraphrase("The implementation of the new system facilitates optimization.", lang="en")
print(result)  # "Implementing the new system helps optimize."
```

---

🎭 Tone Analysis & Adjustment
--------------------------------

7-level formality scale with marker-based detection:

| Level | Name | Example Markers |
|-------|------|-----------------|
| 1 | `slang` | "ya", "gonna", "lol", contractions |
| 2 | `casual` | "pretty much", "kind of", first person |
| 3 | `neutral` | Balanced register |
| 4 | `professional` | "regarding", "in accordance with" |
| 5 | `formal` | "henceforth", "notwithstanding" |
| 6 | `academic` | "thus", "consequently", passive voice |
| 7 | `legal` | "hereinafter", "whereas", "pursuant to" |

```
from texthumanize import analyze_tone, adjust_tone

tone = analyze_tone("Please submit the documentation.", lang="en")
print(f"Formality: {tone['formality']}")  # "professional"

casual = adjust_tone("It is imperative to proceed immediately.", target="casual", lang="en")
print(casual)  # "We should probably get going on this."
```

---

🛡️ Watermark Detection & Cleaning
-------------------------------------

Detects and removes 6 types of invisible text watermarks:

| Type | How It Hides | Detection Method |
|------|--------------|------------------|
| **Zero-width characters** | U+200B, U+200C, U+200D, U+FEFF | Unicode category scanning |
| **Homoglyph substitution** | Latin 'a' → Cyrillic 'а' | Confusable character mapping |
| **Invisible Unicode** | U+2060, U+2061–U+2064 | Codepoint range check |
| **Directional markers** | RTL/LTR overrides | Bidirectional control detection |
| **Soft hyphens** | U+00AD | Pattern matching |
| **Tag characters** | U+E0001–U+E007F | Unicode block scanning |

```
from texthumanize import detect_watermarks, clean_watermarks

report = detect_watermarks("Te\u200bxt with hid\u200bden marks")
print(f"Found: {report.total_watermarks} watermarks of {len(report.types)} types")

clean = clean_watermarks("Te\u200bxt with hid\u200bden marks")
print(clean)  # "Text with hidden marks"
```
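
To show what a code-point based check looks like, here is a minimal standard-library sketch covering four of the six watermark types (zero-width, invisible, soft hyphen, tag characters). The code points come from the table in this section; `find_hidden` and `strip_hidden` are hypothetical helpers, and homoglyph and directional-marker handling is deliberately left out.

```python
import unicodedata

# Code points listed in the watermark table of this section.
ZERO_WIDTH = {0x200B, 0x200C, 0x200D, 0xFEFF}
INVISIBLE = set(range(0x2060, 0x2065))      # U+2060 through U+2064
SOFT_HYPHEN = {0x00AD}
TAG_CHARS = set(range(0xE0001, 0xE0080))    # U+E0001 through U+E007F
HIDDEN = ZERO_WIDTH | INVISIBLE | SOFT_HYPHEN | TAG_CHARS


def find_hidden(text: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for every hidden code point in `text`."""
    return [
        (i, unicodedata.name(ch, f"U+{ord(ch):04X}"))
        for i, ch in enumerate(text)
        if ord(ch) in HIDDEN
    ]


def strip_hidden(text: str) -> str:
    """Drop every hidden code point, leaving visible characters untouched."""
    return "".join(ch for ch in text if ord(ch) not in HIDDEN)


sample = "Te\u200bxt with hid\u200bden marks"
print(find_hidden(sample))
print(strip_hidden(sample))
```

Stripping by code point is safe for the characters above, but it would be wrong for homoglyphs, which must be mapped back to their look-alike letters rather than removed.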

---

🔄 Content Spinning
------------------

Generate multiple unique variants with spintax support:

```
from texthumanize import spin, spin_variants

# Single variant
variant = spin("The system provides efficient processing.", lang="en")

# Multiple variants
variants = spin_variants("Original text here.", count=5, lang="en")
for i, v in enumerate(variants):
    print(f"Variant {i+1}: {v}")
```

The spinner uses language-pack synonyms, contextual substitution, and sentence restructuring to produce each variant.
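
Spintax is commonly written as `{option a|option b}`, optionally nested. The sketch below assumes that syntax, since the exact format TextHumanize accepts is not shown in this section, and `expand_spintax` is a hypothetical helper rather than the library's spinner.

```python
import random
import re

# Matches an innermost group: braces with no other braces inside.
_GROUP = re.compile(r"\{([^{}]*)\}")


def expand_spintax(template: str, rng: random.Random) -> str:
    """Resolve {a|b} groups, innermost first, until none remain."""
    while True:
        match = _GROUP.search(template)
        if match is None:
            return template
        choice = rng.choice(match.group(1).split("|"))
        template = template[: match.start()] + choice + template[match.end():]


template = "The {system|tool} is {fast|{very|quite} quick}."
rng = random.Random(42)  # a fixed seed makes the variant reproducible
print(expand_spintax(template, rng))
```

Passing an explicit `random.Random` instance keeps variants reproducible for a given seed, in the same spirit as the `seed` parameter of `humanize()`.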

---

🔗 Coherence Analysis
--------------------

Measure paragraph-level text flow:

```
from texthumanize import analyze_coherence

report = analyze_coherence(text, lang="en")
print(f"Overall coherence: {report['score']:.2f}")
for issue in report.get('issues', []):
    print(f"  ⚠️ {issue['type']}: {issue['description']}")
```

| Metric | What It Measures |
|--------|------------------|
| Paragraph similarity | TF-IDF cosine between adjacent paragraphs |
| Transition quality | Presence and appropriateness of connective phrases |
| Topic continuity | Keyword overlap between sections |
| Reference chains | Pronoun and entity co-reference tracking |

---

🔠 Morphological Engine
----------------------

Rule-based morphology for 4 languages — lemmatization, inflection, declension:

```
from texthumanize import MorphologyEngine, get_morphology

morph = get_morphology("ru")

# Lemmatize
lemma = morph.lemmatize("процессов")     # → "процесс"

# Get forms
forms = morph.get_forms("оптимизация")   # → ["оптимизации", "оптимизацию", ...]
```

| Language | Operations | Suffix Rules |
|----------|------------|--------------|
| Russian | Lemmatization, declension, conjugation | 200+ suffix patterns |
| Ukrainian | Lemmatization, declension | 180+ suffix patterns |
| English | Lemmatization, pluralization | 150+ rules |
| German | Lemmatization, compound splitting | 120+ rules |

---

🎨 Stylistic Fingerprinting
--------------------------

Extract and compare author stylometric profiles:

```
from texthumanize import build_author_profile, compare_fingerprint, anonymize_style

# Build a profile from samples
profile = build_author_profile("Author's writing sample...", lang="en")
print(f"Avg sentence: {profile.avg_sentence_length:.1f} words")
print(f"Vocabulary richness: {profile.vocabulary_richness:.2f}")

# Compare new text to a known author
similarity = compare_fingerprint("New text to attribute", profile)
print(f"Match: {similarity:.0%}")

# Anonymize style — normalize distinctive patterns
anon = anonymize_style("Text with distinctive style markers", lang="en")
```

**Fingerprint dimensions:** Mean sentence length, length variance, vocabulary richness, function word distribution, punctuation profile, discourse marker usage, passive voice ratio, average word length.

---

🎛️ Auto-Tuner (Feedback Loop)
-----------------------------

Automatically optimize intensity and profile based on feedback:

```
from texthumanize import AutoTuner

tuner = AutoTuner()

# Process and get feedback
result = tuner.suggest(text, lang="en")

# Provide feedback — was the result good?
tuner.feedback(result, score=0.8)  # 0.0 = bad, 1.0 = perfect

# Next suggestion will adapt
result2 = tuner.suggest(another_text, lang="en")
```

The tuner uses Bayesian-like optimization to find ideal `(intensity, profile)` combinations for your content type.

---

🔌 Plugin System
---------------

Register custom hooks at any of 20 pipeline stages:

```
from texthumanize import Pipeline, humanize

# Function hook
def add_disclaimer(text: str, lang: str) -> str:
    return text + "\n\n---\nProcessed by TextHumanize."

Pipeline.register_hook(add_disclaimer, after="naturalization")
result = humanize("Your text here.", lang="en")
Pipeline.clear_plugins()
```

**Available hook points:** `watermark` → `segmentation` → `typography` → `debureaucratization` → `structure` → `repetitions` → `liveliness` → `paraphrasing` → `syntax_rewriting` → `tone` → `universal` → `naturalization` → `paraphrase_engine` → `sentence_restructuring` → `entropy_injection` → `readability` → `grammar` → `coherence` → `validation` → `restore`
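
The hook mechanism can be pictured as a registry keyed by stage name. The toy pipeline below is a self-contained sketch of that idea; `MiniPipeline` and its two stages are invented for the example and do not reflect how `Pipeline` is implemented internally.

```python
from collections import defaultdict
from typing import Callable

Stage = Callable[[str, str], str]  # (text, lang) -> text


class MiniPipeline:
    """Toy pipeline: ordered named stages plus hooks that run after a stage."""

    def __init__(self, stages: list[tuple[str, Stage]]) -> None:
        self.stages = stages
        self.hooks: dict[str, list[Stage]] = defaultdict(list)

    def register_hook(self, hook: Stage, after: str) -> None:
        if after not in {name for name, _ in self.stages}:
            raise ValueError(f"unknown stage: {after}")
        self.hooks[after].append(hook)

    def run(self, text: str, lang: str) -> str:
        for name, stage in self.stages:
            text = stage(text, lang)
            # Hooks see the text exactly as the named stage left it.
            for hook in self.hooks[name]:
                text = hook(text, lang)
        return text


pipeline = MiniPipeline([
    ("typography", lambda text, lang: text.replace("  ", " ")),
    ("naturalization", lambda text, lang: text.strip()),
])
pipeline.register_hook(lambda text, lang: text + " [checked]", after="naturalization")
output = pipeline.run(" Hello  world ", "en")
```

Validating the stage name at registration time surfaces typos immediately instead of silently never running the hook.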

---

🧪 Using Individual Modules
--------------------------

Every module is independently importable:

```
# POS Tagging
from texthumanize.pos_tagger import POSTagger
tagger = POSTagger("en")
tags = tagger.tag("The cat sat on the mat".split())

# CJK Segmentation
from texthumanize.cjk_segmenter import CJKSegmenter
seg = CJKSegmenter()
words = seg.segment("自然言語処理は面白い", lang="ja")

# Collocation scoring
from texthumanize.collocation_engine import CollocEngine, replacement_is_natural
engine = CollocEngine("en")
score = engine.pmi("heavy", "rain")
best = engine.best_synonym("important", ["crucial", "key"], ["decision"])
safe = replacement_is_natural("heavy", "large", ["rain"], lang="en")  # False

# Perplexity
from texthumanize.word_lm import WordLanguageModel
lm = WordLanguageModel("en")
ppl = lm.word_perplexity("The cat sat on the mat")

# Grammar checking
from texthumanize.grammar import check_grammar
issues = check_grammar("He go to the store yesterday.", lang="en")

# Uniqueness / plagiarism
from texthumanize.uniqueness import uniqueness_score, compare_texts
score = uniqueness_score("Text to check")
sim = compare_texts("Original", "Modified version")

# Content health score
from texthumanize.health_score import content_health
report = content_health("Your article text...", lang="en")
print(f"Health: {report.score}/100")

# Custom dictionary overlay
from texthumanize.dictionaries import load_dict, update_dict
update_dict("en", {"bureaucratic": {"utilize": "use", "facilitate": "help"}})

# Domain dictionaries
from texthumanize import domain_terms_for_text, humanize
sample = "ARR and churn rate improved after onboarding."
terms = domain_terms_for_text(sample, domains="saas")
result = humanize(sample, lang="en", preserve={"domains": ["saas"]})
```

---

💻 CLI Reference
---------------

```
# Basic humanization
texthumanize input.txt -l en -p web -i 70 -o output.txt

# AI detection
texthumanize detect input.txt -l en
texthumanize detect input.txt -l en --json
texthumanize explain input.txt -l en --json
texthumanize audit input.txt -l en --json

# With all analysis
texthumanize input.txt -l en --analyze --explain --detect-ai
texthumanize input.txt -l en --quality-gate strict
texthumanize input.txt -l en --fail-under-quality 0.65
texthumanize input.txt -l en --minimal

# Paraphrasing
texthumanize input.txt -l en --paraphrase -o out.txt

# Tone adjustment
texthumanize input.txt -l en --tone casual
texthumanize input.txt -l en --tone-analyze

# Watermark detection
texthumanize input.txt --watermarks
texthumanize watermark input.txt --json

# Content spinning
texthumanize input.txt -l en --spin --variants 5

# Coherence & readability
texthumanize input.txt -l en --coherence --readability

# Start REST API server
texthumanize dummy --api --port 8080

# Train neural detector
texthumanize train --samples 1000 --epochs 50 --output weights/

# Run benchmarks
texthumanize benchmark --json
texthumanize benchmark --json --fail-under-quality 0.60

# Pipe mode
echo "Text to humanize" | texthumanize - -l en

# Keyword preservation
texthumanize input.txt -l en --keep "API,cloud" --brand "TextHumanize"

# Verbose mode with report
texthumanize input.txt -l en --verbose --report report.json
texthumanize input.txt -l en --report report.html
```

### CLI Flags

| Flag | Description |
|------|-------------|
| `-l`, `--lang` | Language code (required) |
| `-p`, `--profile` | Processing profile |
| `-i`, `--intensity` | Intensity 0–100 |
| `-o`, `--output` | Output file path |
| `--seed` | PRNG seed for reproducibility |
| `--keep` | Comma-separated keywords to preserve |
| `--brand` | Brand terms to never modify |
| `--max-change` | Maximum change ratio (0.0–1.0) |
| `--analyze` | Print analysis report |
| `--explain` | Print change explanation |
| `--detect-ai` | Run AI detection |
| `--audit` | Combined AI + watermark audit JSON |
| `--paraphrase` | Paraphrase mode |
| `--tone` | Adjust tone to target level |
| `--tone-analyze` | Analyze current tone |
| `--watermarks` | Detect watermarks |
| `--watermark-report` | Unified watermark JSON report |
| `--quality-gate` | `off` or `strict` post-processing guard |
| `--fail-under-quality` | Exit with code 2 if `quality_score` or benchmark average is below threshold |
| `--minimal` / `--only-flagged` | Only humanize AI-flagged sentences |
| `--spin` | Spin mode |
| `--variants N` | Number of spin variants |
| `--coherence` | Coherence analysis |
| `--readability` | Readability metrics |
| `--api` | Start REST API server |
| `--port` | API server port (default: 8080) |
| `--verbose` | Detailed output |
| `--report` | Save JSON report, or HTML when the path ends with `.html` |
| `--json` | JSON output format |

---

🌐 REST API Server
-----------------

Zero-dependency HTTP server with rate limiting and CORS:

```
python -m texthumanize.api --port 8080
```

For FastAPI deployments, see `examples/fastapi_integration.py`. It includes request body limits, text and batch size limits, per-request timeouts, structured error envelopes with request ids, and `/v1/humanize/batch`.

OpenAPI 3.1 schema is available at `GET /openapi.json` for client generation, contract tests, and API gateway import.

### Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/humanize` | Full humanization |
| `POST` | `/detect-ai` | AI detection (single or batch) |
| `POST` | `/analyze` | Text metrics |
| `POST` | `/paraphrase` | Paraphrase text |
| `POST` | `/tone/analyze` | Tone analysis |
| `POST` | `/tone/adjust` | Tone adjustment |
| `POST` | `/watermarks/detect` | Detect watermarks |
| `POST` | `/watermarks/clean` | Remove watermarks |
| `POST` | `/spin` | Content spinning |
| `POST` | `/spin/variants` | Spin N variants |
| `POST` | `/coherence` | Coherence analysis |
| `POST` | `/readability` | Readability metrics |
| `POST` | `/sse/humanize` | SSE streaming humanization |
| `GET` | `/health` | Health check |
| `GET` | `/openapi.json` | OpenAPI 3.1 schema |
| `GET` | `/` | API documentation index |
| `OPTIONS` | `*` | CORS preflight |

**Rate limit:** 10 req/s per IP, burst 20 · **Max body:** 5 MB
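
A limit expressed as "10 req/s, burst 20" maps naturally onto a token bucket. The sketch below shows that algorithm with the documented numbers as defaults; whether the bundled server uses a token bucket internally is an assumption, and `TokenBucket` / `allow_request` are hypothetical names.

```python
import time
from typing import Optional


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `burst`."""

    def __init__(self, rate: float = 10.0, burst: int = 20) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.updated = time.monotonic()

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# One bucket per client IP.
buckets: dict[str, TokenBucket] = {}


def allow_request(ip: str, now: Optional[float] = None) -> bool:
    return buckets.setdefault(ip, TokenBucket()).allow(now)
```

A client that gets `False` back would typically receive an HTTP 429 response; the optional `now` argument exists only to make the behaviour easy to test with a fake clock.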

### Example

```
# Humanize
curl -X POST http://localhost:8080/humanize \
  -H "Content-Type: application/json" \
  -d '{"text": "Your text here.", "lang": "en", "profile": "web", "intensity": 70}'

# AI detection
curl -X POST http://localhost:8080/detect-ai \
  -H "Content-Type: application/json" \
  -d '{"text": "Text to check.", "lang": "en"}'

# SSE streaming
curl -N http://localhost:8080/sse/humanize \
  -H "Content-Type: application/json" \
  -d '{"text": "Long text...", "lang": "en"}'
```

### Python Client

```
import requests

resp = requests.post("http://localhost:8080/humanize", json={
    "text": "Your text",
    "lang": "en",
    "profile": "web"
})
print(resp.json()["text"])
```

---

⚡ Async API
-----------

Native `asyncio` support for all public functions:

```
import asyncio
from texthumanize import async_humanize, async_detect_ai, async_analyze
from texthumanize import async_paraphrase, async_humanize_batch, async_detect_ai_batch

async def main():
    result = await async_humanize("Text to process", lang="en", seed=42)
    print(result.text)

    ai = await async_detect_ai("Text to check", lang="en")
    print(f"AI: {ai['score']:.0%}")

    # Parallel batch
    results = await async_humanize_batch(["Text 1", "Text 2"], lang="en")

asyncio.run(main())
```

---

📈 Performance & Benchmarks
------------------------------

All benchmarks were run on Apple Silicon (M-series), Python 3.12, single thread, after warm-up. See the public [Benchmark Methodology](https://ksanyok.github.io/TextHumanize/benchmark-methodology/) for corpus labels, quality dimensions, latency and tracemalloc peak-memory reporting rules, and detector limitations.

### Speed

| Function | Text Size | Avg Latency |
|----------|-----------|-------------|
| `humanize()` | ~30 words | **~5 s** |
| `humanize()` | ~80 words | **~10 s** |
| `humanize(phantom=True)` | ~80 words | **~12 s** |
| `detect_ai()` | ~30 words | **~1 s** |
| `detect_ai()` | ~80 words | **~3 s** |
| `paraphrase()` | ~80 words | **< 1 ms** |
| `analyze_tone()` | ~80 words | **< 1 ms** |
| `analyze()` | ~80 words | **~80 ms** |

### AI Score Reduction

```
┌──────────────────────────────────────────────────────────┐
│  TextHumanize v0.33.0 — AI Score Benchmark              │
├──────────────────────────────────────────────────────────┤
│  EN (web/50):    94% → 27%    (reduction: -67pp)        │
│  EN (web/60):    94% → 23%    (reduction: -71pp)        │
│  EN (web/70):    94% →  2%    (reduction: -92pp)        │
├──────────────────────────────────────────────────────────┤
│  RU (web/50):    80% →  5%    (reduction: -75pp)        │
│  UK (web/50):    75% → 17%    (reduction: -58pp)        │
├──────────────────────────────────────────────────────────┤
│  Best result:    EN web/70 — 94% → 2%  (-92pp)          │
└──────────────────────────────────────────────────────────┘

```

### Properties

| Property | Value |
|----------|-------|
| Cold start | **< 100 ms** |
| LRU cache hit | **11× faster** than cold |
| External network calls | 0 (offline-first) |
| Deterministic (same seed) | ✅ Always |
| Pipeline timeout | 30 s (configurable) |
| API rate limiting | 10 req/s per IP, burst 20 |
| Max input size | 1 MB |
| Memory per call | 4–200 KB |

> **Run benchmarks yourself:**
>
> ```
> python benchmarks/run_benchmark.py
> texthumanize benchmark --json
> python scripts/profile_hot_paths.py --sizes 1000,10000,100000 --json
> ```
>
> The hot-path profiler reports p50/p95 latency plus p50/p95 Python allocation peaks from `tracemalloc`; use `--no-memory` when you need latency-only runs.
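
For readers who want the same kind of numbers for their own code, the measurement can be approximated with the standard library alone. This is a sketch under stated assumptions, not the contents of `scripts/profile_hot_paths.py`: `profile_call` and its report keys are hypothetical, and percentiles use `statistics.quantiles` with the inclusive method.

```python
import statistics
import time
import tracemalloc
from typing import Callable


def profile_call(func: Callable[[], object], runs: int = 20) -> dict[str, float]:
    """Measure latency and peak Python allocations over `runs` calls (runs >= 2)."""
    latencies_ms: list[float] = []
    peaks_kb: list[float] = []
    for _ in range(runs):
        tracemalloc.start()
        started = time.perf_counter()
        func()
        latencies_ms.append((time.perf_counter() - started) * 1000.0)
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        peaks_kb.append(peak / 1024.0)

    def percentile(values: list[float], pct: int) -> float:
        # quantiles(n=100) returns the 99 cut points P1..P99.
        return statistics.quantiles(values, n=100, method="inclusive")[pct - 1]

    return {
        "latency_p50_ms": percentile(latencies_ms, 50),
        "latency_p95_ms": percentile(latencies_ms, 95),
        "peak_p50_kb": percentile(peaks_kb, 50),
        "peak_p95_kb": percentile(peaks_kb, 95),
    }


report = profile_call(lambda: sorted(range(1000)), runs=10)
print(report)
```

Tracing allocations slows the measured call down, which is one reason to separate latency-only runs from memory runs.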

---

🏗️ Architecture
---------------

```
texthumanize/                        # 131 Python modules, 240,000+ lines
├── core.py                          # Facade: 28+ public functions (2,391 lines)
├── pipeline.py                      # 38-stage pipeline + adaptive intensity (1,553 lines)
├── sentence_validator.py            # SentenceValidator™: interstage quality gate (350 lines)
├── phantom.py                       # PHANTOM™: gradient-guided internal score engine (2,943 lines)
├── api.py                           # REST API server, OpenAPI + SSE
├── async_api.py                     # Async wrappers for all functions (200 lines)
├── cli.py                           # CLI (15+ commands) (1,492 lines)
├── exceptions.py                    # Exception hierarchy (77 lines)
│
├── ── Detection & Analysis ──
├── detectors.py                     # AI detector: 18 heuristic metrics (2,441 lines)
├── statistical_detector.py          # 35-feature logistic regression (1,149 lines)
├── neural_detector.py               # MLP neural detector, pure Python (1,094 lines)
├── analyzer.py                      # Artificiality scoring + readability (506 lines)
│
├── ── NLP Infrastructure ──
├── neural_engine.py                 # NN primitives: MLP, LSTM, HMM (610 lines)
├── neural_lm.py                     # LSTM character-level LM (391 lines)
├── neural_paraphraser.py            # Seq2Seq with Bahdanau attention (752 lines)
├── pos_tagger.py                    # Rule-based POS tagger, 4 langs (1,917 lines)
├── hmm_tagger.py                    # Viterbi HMM POS tagger (642 lines)
├── cjk_segmenter.py                 # Chinese/Japanese/Korean segmenter (1,277 lines)
├── morphology.py                    # Morphological engine, 4 langs (1,015 lines)
├── word_lm.py                       # Word-level language model (435 lines)
├── word_embeddings.py               # Lightweight word vectors (399 lines)
├── collocation_engine.py            # PMI collocation scoring (224 lines)
├── sentence_split.py                # Smart sentence splitter (338 lines)
├── lang_detect.py                   # Trigram-based language detection (328 lines)
├── context.py                       # WSD — contextual synonyms (320 lines)
│
├── ── Pipeline Stages ──
├── watermark.py                     # Watermark detection & cleaning (524 lines)
├── segmenter.py                     # URL/code/brand protection (308 lines)
├── normalizer.py                    # Typography normalization (199 lines)
├── decancel.py                      # Debureaucratization (332 lines)
├── structure.py                     # Sentence diversification (319 lines)
├── repetitions.py                   # Repetition reduction (229 lines)
├── liveliness.py                    # Colloquialism injection (171 lines)
├── paraphraser_ext.py               # Semantic paraphrasing (887 lines)
├── syntax_rewriter.py               # Syntax rewriting: 8+ transforms (2,516 lines)
├── tone_harmonizer.py               # Tone alignment (98 lines)
├── universal.py                     # Language-agnostic processor (384 lines)
├── naturalizer.py                   # Core naturalization engine (3,444 lines)
├── paraphrase_engine.py             # MWE, hedging, perspective (1,152 lines)
├── sentence_restructurer.py         # Deep sentence transforms (1,385 lines)
├── entropy_injector.py              # Burstiness + entropy injection (1,187 lines)
├── readability_opt.py               # Readability optimization (274 lines)
├── grammar_fix.py                   # Grammar correction (72 lines)
├── coherence_repair.py              # Coherence repair (446 lines)
├── fingerprint_randomizer.py        # Anti-fingerprint diversification (408 lines)
├── validator.py                     # Quality validation (170 lines)
│
├── ── Extended Features ──
├── tone.py                          # Tone analysis & adjustment (547 lines)
├── paraphrase.py                    # Standalone paraphrasing API (406 lines)
├── spinner.py                       # Content spinning + spintax (370 lines)
├── coherence.py                     # Coherence analysis (357 lines)
├── grammar_guard.py                 # Grammar Guard with safety gates (616 lines)
├── grammar.py                       # Grammar checker, 25 langs (360 lines)
├── uniqueness.py                    # Plagiarism detection (226 lines)
├── health_score.py                  # Composite content health (188 lines)
├── semantic.py                      # Semantic similarity (145 lines)
├── fingerprint.py                   # Author fingerprinting (371 lines)
├── stylistic.py                     # Stylistic analysis + presets (721 lines)
├── plagiarism.py                    # Plagiarism N-gram check (271 lines)
├── diff_report.py                   # HTML/JSON diff reports (277 lines)
│
├── ── Infrastructure ──
├── autotune.py                      # Auto-tuner feedback loop (259 lines)
├── benchmark_suite.py               # Quality benchmarks (401 lines)
├── training.py                      # Neural training loop (1,264 lines)
├── dict_trainer.py                  # Corpus-based dictionary trainer (293 lines)
├── quality_gate.py                  # CI/CD content quality gate (280 lines)
├── dashboard.py                     # HTML dashboard reports (229 lines)
├── dictionaries.py                  # Custom dictionary overlays (174 lines)
├── ai_backend.py                    # LLM backend: OpenAI/Ollama/OSS (931 lines)
├── ai_markers.py                    # AI marker management (528 lines)
├── gptzero.py                       # GPTZero API integration (372 lines)
├── cache.py                         # Thread-safe LRU cache (93 lines)
│
├── ── Data ──
├── _colloc_data.py                  # PMI collocations (455 lines)
├── _replacement_data.py             # AI-word replacements (957 lines)
├── _word_freq_data.py               # Word frequency data (1,532 lines)
├── weights/                         # Pre-trained model weights (472 KB)
│   ├── detector_weights.json.zb85   # MLP detector (54 KB)
│   └── lm_weights.json.zb85        # LSTM LM (418 KB)
│
└── lang/                            # 25 language packs
    ├── en.py · ru.py · uk.py · de.py (Tier 1 — full pipeline)
    ├── fr.py · es.py · pl.py · pt.py · it.py · nl.py · sv.py · cs.py · ro.py · hu.py · da.py (Tier 2)
    └── ar.py · zh.py · ja.py · ko.py · tr.py · hi.py · vi.py · th.py · id_.py · he.py (Tier 3)

```
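
The tree lists `pipeline.py` as a staged pipeline and `sentence_validator.py` as an interstage quality gate. As a rough illustration of how those two ideas can fit together, here is a minimal sketch. It is not the library's code: the stage functions, `gate`, and `run_pipeline` are hypothetical names invented for this example, and the acceptance rule is a toy one.

```python
import random
from typing import Callable, List

# Hypothetical stage signature for this sketch: text in, text out,
# with a shared seeded PRNG for any random choices.
Stage = Callable[[str, random.Random], str]


def collapse_spaces(text: str, rng: random.Random) -> str:
    """Normalize runs of whitespace to single spaces."""
    return " ".join(text.split())


def swap_marker(text: str, rng: random.Random) -> str:
    """Replace one stiff connector with a seeded choice of plainer ones."""
    return text.replace("Furthermore,", rng.choice(["Also,", "Besides,"]))


def drop_everything(text: str, rng: random.Random) -> str:
    """A deliberately bad stage, used to show the gate rejecting output."""
    return ""


def gate(before: str, after: str) -> bool:
    """Toy quality gate: reject empty output or a loss of more than half the words."""
    return bool(after.strip()) and len(after.split()) * 2 >= len(before.split())


def run_pipeline(text: str, stages: List[Stage], seed: int = 0) -> str:
    rng = random.Random(seed)  # one seeded PRNG keeps runs reproducible
    for stage in stages:
        candidate = stage(text, rng)
        if gate(text, candidate):
            text = candidate  # accept this stage's output
        # otherwise keep the previous text and continue with the next stage
    return text


stages = [collapse_spaces, swap_marker, drop_everything]
first = run_pipeline("Furthermore,  the   results are clear.", stages, seed=42)
second = run_pipeline("Furthermore,  the   results are clear.", stages, seed=42)
print(first == second)  # same seed, same input, same output
```

The gate runs after every stage rather than once at the end, so a single bad transform is discarded without losing the work of earlier stages.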

**Design principles:**

| Principle | Implementation |
|---|---|
| **Modular** | Each stage is a standalone class; every module is independently importable |
| **Zero dependencies** | Pure Python stdlib, no pip packages at all |
| **Declarative rules** | Language packs are data-only (dicts), no logic in lang files |
| **Idempotent** | Running the pipeline twice won't double-transform text |
| **Safe defaults** | Works out-of-the-box with sensible profiles |
| **Lazy imports** | PEP 562 lazy loading: only imports what you use |
| **Deterministic** | Seed-based PRNG for reproducible output |
| **Extensible** | Plugin hooks at 38 stages, custom dictionaries, AI backend |

---
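
The lazy-imports principle refers to PEP 562, which lets a module define a module-level `__getattr__` that is called only when normal attribute lookup fails. The sketch below shows the general mechanism, not TextHumanize's actual `__init__.py`: `make_lazy_module` and the name-to-module mapping are hypothetical, and the module is built with `types.ModuleType` only so the example runs as a single file. In a real package the `__getattr__` function would sit at the top level of `__init__.py`.

```python
import importlib
import types


def make_lazy_module(name, lazy_map):
    """Build a module whose attributes are imported on first access.

    lazy_map maps an attribute name to the module that provides it.
    """
    module = types.ModuleType(name)
    loaded = []  # records which names were actually resolved

    def __getattr__(attr):
        # Called only when normal lookup on the module fails (PEP 562).
        try:
            target = lazy_map[attr]
        except KeyError:
            raise AttributeError(
                f"module {name!r} has no attribute {attr!r}"
            ) from None
        value = getattr(importlib.import_module(target), attr)
        setattr(module, attr, value)  # cache so later lookups skip this hook
        loaded.append(attr)
        return value

    module.__getattr__ = __getattr__
    module._loaded = loaded
    return module


# Stdlib modules stand in for the package's own submodules here.
demo = make_lazy_module("demo_pkg", {"sqrt": "math", "dumps": "json"})
print(demo._loaded)   # nothing resolved yet
print(demo.sqrt(9))   # resolves and caches "sqrt" on first use
print(demo._loaded)
```

Caching the resolved attribute on the module means the hook fires once per name, so repeated access costs the same as an ordinary attribute lookup.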

🟦 TypeScript / JavaScript Port
------------------------------

Core TextHumanize functionality in TypeScript for Node.js and browsers:

```
import { humanize, detectAi, analyze } from 'texthumanize';

const result = humanize("Text to process", { lang: "en", profile: "web" });
console.log(result.text);
console.log(result.changeRatio);

const ai = detectAi("Text to check", { lang: "en" });
console.log(`AI: ${(ai.score * 100).toFixed(0)}%`);
```

| Feature | Status |
|---|---|
| `humanize()` | ✅ |
| `detectAi()` | ✅ |
| `analyze()` | ✅ |
| Language packs: EN, RU | ✅ |
| Universal processor | ✅ |

```
cd js/ && npm install && npm test
```

---

🐘 PHP Library
-------------

Full-featured PHP port with Composer support:

```
