tekintian/html-cleaner
======================

A PHP tool for cleaning HTML generated by the Typora editor. It removes redundant whitespace and unneeded attributes and simplifies the HTML structure while preserving the content.

Version 1.0.0 · MIT license · PHP >=7.2.0 · [Source](https://github.com/tekintian/html-cleaner) · [Packagist](https://packagist.org/packages/tekintian/html-cleaner)

Features
--------

- **Remove Inline Styles**: Eliminates all inline `style` attributes
- **Clean Typora Attributes**: Removes Typora-specific attributes like `cid`, `mdtype`, `md-inline`, etc.
- **Optimize Class Attributes**: Filters out `md-*` classes while preserving useful ones
- **Remove Empty Tags**: Cleans up empty `span` and `div` tags
- **Simplify Tag Structure**: Optimizes nested tag structures
- **Clean Whitespace**: Removes redundant spaces and normalizes whitespace
- **Process Code Blocks**: Preserves language classes and removes `br` tags in `pre` tags
- **External Link Handling**: Adds `target="_blank"` to external links
- **Auto Tag Links**: Automatically adds links to specified keywords
- **Environment Aware**: Smart debug output control based on environment (dev, testing, prod)
- **Configurable Behavior**: Environment variables for customizing behavior

Installation
------------

### Composer Installation

```
composer require tekintian/html-cleaner
```

### Manual Installation

Download the package and include the autoloader:

```
require_once 'vendor/autoload.php';
```

Usage
-----

### Basic Usage

```
use tekintian\HtmlCleaner\HtmlCleaner;

// Clean HTML string
$dirtyHtml = '<p style="color: red;" cid="n1" mdtype="paragraph">Content</p>'; // illustrative Typora-style markup
$cleanHtml = HtmlCleaner::clean($dirtyHtml);

// Clean HTML file
$cleanedHtml = HtmlCleaner::cleanFile('input.html', 'output.html');
```

### With Custom Tag Links

```
use tekintian\HtmlCleaner\HtmlCleaner;

$tagLinks = [
    'PHP' => 'https://www.php.net/manual/en/',
    'JavaScript' => 'https://developer.mozilla.org/en-US/docs/Web/JavaScript',
    'Python' => 'https://docs.python.org/3/',
];

$cleanedHtml = HtmlCleaner::clean($html, $tagLinks);
```
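
To make the keyword-to-URL mapping concrete, here is a minimal standalone sketch of how such linking could work. It is not the library's implementation: `sketchAddTagLinks` is a hypothetical helper, and it assumes that only the first occurrence of each keyword is linked and that text already inside an `<a>` element is left alone.

```php
<?php
// Hypothetical helper (not part of tekintian/html-cleaner): links the first
// occurrence of each keyword that appears in text outside of existing links.
function sketchAddTagLinks(string $html, array $tagLinks): string
{
    if ($tagLinks === []) {
        return $html;
    }

    // Build one alternation pattern, longest keywords first so that
    // a longer keyword wins over a shorter one that is its prefix.
    $keywords = array_map('strval', array_keys($tagLinks));
    usort($keywords, function ($a, $b) {
        return strlen($b) - strlen($a);
    });
    $quoted = array_map(function ($k) {
        return preg_quote($k, '/');
    }, $keywords);
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/';

    // Odd indexes hold tags, even indexes hold the text between them.
    $parts = preg_split('/(<[^>]+>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $linked = [];
    $insideAnchor = false;

    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            if (preg_match('/^<a\b/i', $part)) {
                $insideAnchor = true;
            } elseif (preg_match('/^<\/a\s*>/i', $part)) {
                $insideAnchor = false;
            }
            continue;
        }
        if ($insideAnchor || $part === '') {
            continue;
        }
        $parts[$i] = preg_replace_callback($pattern, function ($m) use ($tagLinks, &$linked) {
            $keyword = $m[1];
            if (isset($linked[$keyword])) {
                return $keyword; // already linked once, leave as plain text
            }
            $linked[$keyword] = true;
            $url = htmlspecialchars($tagLinks[$keyword], ENT_QUOTES);
            return '<a href="' . $url . '">' . $keyword . '</a>';
        }, $part);
    }

    return implode('', $parts);
}

echo sketchAddTagLinks('<p>PHP and PHP again</p>', ['PHP' => 'https://www.php.net/manual/en/']), PHP_EOL;
// prints: <p><a href="https://www.php.net/manual/en/">PHP</a> and PHP again</p>
```

Matching is case-sensitive here, and tags are split with a simple regex, so attribute values containing `>` would confuse this sketch.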

### Using Individual Processing Methods

```
use tekintian\HtmlCleaner\HtmlCleaner;

// Custom processing pipeline
$html = HtmlCleaner::unifiedAttributeProcessing($dirtyHtml);
$html = HtmlCleaner::removeEmptyTags($html);
$html = HtmlCleaner::simplifyTags($html);
$html = HtmlCleaner::cleanWhitespace($html);

// Skip specific steps if not needed
// $html = HtmlCleaner::removeUselessAttributes($html);
// $html = HtmlCleaner::processPreTags($html);

// Apply custom processing between steps
$html = str_replace('<br>', '<br />', $html); // illustrative; substitute your own search and replacement

$cleanedHtml = $html;
```

### Debug Output Control

```
// Set APP_DEBUG environment variable to control debug output
putenv('APP_DEBUG=true'); // Shows debug output
// putenv('APP_DEBUG=false'); // No debug output (default)

// Alternative: Use APP_ENV for backward compatibility
putenv('APP_ENV=dev'); // Also shows debug output

use tekintian\HtmlCleaner\HtmlCleaner;

$html = HtmlCleaner::clean($dirtyHtml);
// With debug enabled: Shows processing progress
// With debug disabled: Silent operation
```

API Reference
-------------

### Main Methods

#### `HtmlCleaner::clean(string $html, array|null $tagLinks = null): string`

Cleans HTML content and returns the cleaned version.

**Parameters:**

- `$html`: HTML content to clean
- `$tagLinks`: Optional tag link configuration array `[keyword => URL]`

**Returns:** Cleaned HTML content

#### `HtmlCleaner::cleanFile(string $inputFile, string|null $outputFile = null, array|null $tagLinks = null): string`

Cleans an HTML file and saves the result.

**Parameters:**

- `$inputFile`: Input file path
- `$outputFile`: Output file path (auto-generated if null)
- `$tagLinks`: Optional tag link configuration array `[keyword => URL]`

**Returns:** Cleaned HTML content

**Throws:** Exception if file operations fail

### Individual Processing Methods

#### `HtmlCleaner::unifiedAttributeProcessing(string $html): string`

Processes HTML attributes in a single unified pass instead of several separate loops.

**Parameters:**

- `$html`: HTML content to process

**Returns:** HTML content with processed attributes
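
As a rough illustration of what a single-pass attribute cleanup can look like, the sketch below visits each opening tag once and applies several rules from the feature list (inline `style`, a few Typora attributes, `md-*` classes) together. `sketchCleanAttributes` is a hypothetical name, and the attribute list is an assumption taken from the features above, not the library's actual rule set.

```php
<?php
// Hypothetical sketch (not the library's code): clean attributes of every
// opening tag in one pass over the document.
function sketchCleanAttributes(string $html): string
{
    $tagPattern = '/<([a-zA-Z][a-zA-Z0-9]*)(\s[^<>]*?)?(\/?)>/';

    return preg_replace_callback($tagPattern, function ($m) {
        $attrs = $m[2] ?? '';

        // Rule 1: drop inline styles and some Typora-specific attributes.
        $attrs = preg_replace(
            '/\s+(?:style|cid|mdtype|md-inline)\s*=\s*(?:"[^"]*"|\'[^\']*\')/i',
            '',
            $attrs
        );

        // Rule 2: keep only class names that do not start with "md-".
        $attrs = preg_replace_callback('/\s+class\s*=\s*"([^"]*)"/i', function ($c) {
            $keep = array_filter(preg_split('/\s+/', trim($c[1])), function ($name) {
                return $name !== '' && strpos($name, 'md-') !== 0;
            });
            // Remove the attribute entirely when no class survives.
            return $keep ? ' class="' . implode(' ', $keep) . '"' : '';
        }, $attrs);

        return '<' . $m[1] . rtrim($attrs) . $m[3] . '>';
    }, $html);
}

echo sketchCleanAttributes('<p style="color:red" cid="n1" class="md-end-block note">Hi</p>'), PHP_EOL;
// prints: <p class="note">Hi</p>
```

Regex-based tag matching breaks on attribute values that contain `<` or `>`; a DOM parser is the safer choice for untrusted input.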

#### `HtmlCleaner::removeEmptyTags(string $html): string`

Removes empty span and div tags from HTML content.

**Parameters:**

- `$html`: HTML content to process

**Returns:** HTML content with empty tags removed
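
One detail worth seeing in code is that removing an empty tag can leave its parent empty, so the removal has to repeat until nothing changes. The following is a minimal sketch under that assumption; `sketchRemoveEmptyTags` is a hypothetical helper and does not reflect how the library implements this step.

```php
<?php
// Hypothetical sketch: remove empty <span> and <div> elements, repeating
// until a pass removes nothing so that nested empties are also cleaned up.
function sketchRemoveEmptyTags(string $html): string
{
    // An opening span/div, optional whitespace, then the matching closing tag.
    $pattern = '/<(span|div)\b[^>]*>\s*<\/\1>/i';

    do {
        $html = preg_replace($pattern, '', $html, -1, $count);
    } while ($count > 0);

    return $html;
}

echo sketchRemoveEmptyTags('<div><span> </span></div><p>Text<span></span></p>'), PHP_EOL;
// prints: <p>Text</p>
```

Only literal whitespace counts as empty here; an element containing `&nbsp;` is kept.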

#### `HtmlCleaner::simplifyTags(string $html): string`

Simplifies tag structure by optimizing nested tags.

**Parameters:**

- `$html`: HTML content to process

**Returns:** Simplified HTML content

#### `HtmlCleaner::cleanAllTagSpaces(string $html): string`

Cleans all redundant spaces within HTML tags.

**Parameters:**

- `$html`: HTML content to process

**Returns:** HTML content with cleaned tag spaces

#### `HtmlCleaner::removeUselessAttributes(string $html): string`

Removes useless attributes from HTML content.

**Parameters:**

- `$html`: HTML content to process

**Returns:** HTML content with useless attributes removed

#### `HtmlCleaner::cleanWhitespace(string $html): string`

Cleans whitespace characters and normalizes formatting.

**Parameters:**

- `$html`: HTML content to process

**Returns:** HTML content with cleaned whitespace
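
Whitespace normalization has to avoid code blocks, where spacing is significant. The sketch below shows one way to do that: it collapses runs of whitespace to a single space everywhere except inside `<pre>` elements. `sketchCleanWhitespace` is a hypothetical helper, and the exact normalization rules the library applies may differ.

```php
<?php
// Hypothetical sketch: collapse whitespace outside <pre> blocks only.
function sketchCleanWhitespace(string $html): string
{
    // Odd indexes hold complete <pre>...</pre> blocks, even indexes the rest.
    $parts = preg_split('/(<pre\b.*?<\/pre>)/is', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            continue; // leave code formatting untouched
        }
        // Collapse any run of spaces, tabs, and newlines into one space.
        $parts[$i] = preg_replace('/\s+/', ' ', $part);
    }

    return trim(implode('', $parts));
}

$input = "<p>  Hello \n  world </p>\n\n<pre>a\n  b</pre>\n<p>x</p>";
echo sketchCleanWhitespace($input), PHP_EOL;
```

The indentation inside the `<pre>` block survives, while the paragraph text is reduced to single spaces.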

#### `HtmlCleaner::processPreTags(string $html): string`

Processes pre tags, preserves language class names and removes br tags.

**Parameters:**

- `$html`: HTML content to process

**Returns:** HTML content with processed pre tags
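
A small sketch of the `br` handling may help. It leaves the opening `<pre>` tag untouched, so a language class such as `language-php` is preserved, and it replaces each `<br>` inside the block with a newline. Replacing rather than deleting is an assumption made here so that line breaks survive; the library may simply remove the tags. `sketchProcessPreTags` is a hypothetical name.

```php
<?php
// Hypothetical sketch: turn <br> tags inside <pre> blocks into newlines
// while keeping the opening tag (and its class attribute) unchanged.
function sketchProcessPreTags(string $html): string
{
    return preg_replace_callback('/(<pre\b[^>]*>)(.*?)(<\/pre>)/is', function ($m) {
        // Matches <br>, <br/> and <br /> in any letter case.
        $body = preg_replace('/<br\s*\/?>/i', "\n", $m[2]);
        return $m[1] . $body . $m[3];
    }, $html);
}

echo sketchProcessPreTags('<pre class="language-php">echo 1;<br>echo 2;</pre>'), PHP_EOL;
```

`<br>` tags outside of `<pre>` elements are not affected.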

Processing Steps
----------------

The cleaner performs the following operations in sequence:

1. **Unified Attribute Processing**: Combines multiple attribute processing loops
2. **Empty Tag Removal**: Removes empty `span` and `div` tags
3. **Tag Structure Simplification**: Optimizes nested tag structures
4. **Space Cleaning**: Removes redundant spaces in tags
5. **Useless Attribute Removal**: Eliminates empty and unnecessary attributes
6. **Whitespace Normalization**: Cleans up whitespace characters
7. **Pre Tag Processing**: Handles code blocks and language classes
8. **External Link Processing**: Adds `target="_blank"` to external links
9. **Tag Link Addition**: Automatically adds links to specified keywords
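
Step 8 can be sketched as follows. This is an illustration only: `sketchMarkExternalLinks` is a hypothetical helper, the current host is passed in as a parameter rather than read from the environment, and the `rel="noopener noreferrer"` attribute is an addition made here as a common safety measure, not something the README says the library does.

```php
<?php
// Hypothetical sketch: add target="_blank" to absolute links whose host
// differs from the current host.
function sketchMarkExternalLinks(string $html, string $currentHost): string
{
    return preg_replace_callback('/<a\b([^>]*)>/i', function ($m) use ($currentHost) {
        $attrs = $m[1];

        // Relative URLs, fragments, and mailto: links are treated as internal.
        if (!preg_match('/\shref\s*=\s*"(https?:\/\/[^"]+)"/i', $attrs, $href)) {
            return $m[0];
        }

        $host = parse_url($href[1], PHP_URL_HOST);
        if (!is_string($host) || strcasecmp($host, $currentHost) === 0) {
            return $m[0]; // same host, leave unchanged
        }

        // Respect an explicit target that is already present.
        if (preg_match('/\starget\s*=/i', $attrs)) {
            return $m[0];
        }

        return '<a' . $attrs . ' target="_blank" rel="noopener noreferrer">';
    }, $html);
}

echo sketchMarkExternalLinks('<a href="https://example.org/x">ext</a>', 'dev.tekin.cn'), PHP_EOL;
// prints: <a href="https://example.org/x" target="_blank" rel="noopener noreferrer">ext</a>
```

Subdomains are compared literally in this sketch, so `www.example.org` and `example.org` count as different hosts.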

Configuration
-------------

### Environment Variables

- `APP_DEBUG`: Set to `true` or `1` to enable debug output (recommended)
- `APP_ENV`: Environment mode (dev, testing, prod) - also controls debug output for backward compatibility
- `HTML_ADD_TAG_LINK`: Set to `true` to enable automatic tag linking
- `HTTP_HOST`: Current host for external link detection (default: 'dev.tekin.cn')
- `REMOVE_HTML_SPAN`: Set to `true` to remove all span tags (aggressive mode)

### Debug Control Behavior

| Setting | Debug Output | Use Case |
| --- | --- | --- |
| `APP_DEBUG=true` or `1` | ✅ Enabled | Development and debugging |
| `APP_ENV=dev` or `testing` | ✅ Enabled | Backward compatibility |
| `APP_DEBUG=false` or unset | ❌ Disabled | Production deployment |

**Note:** `APP_DEBUG` takes precedence over `APP_ENV` for debug control.
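
The precedence rule can be expressed in a few lines. The function below is a sketch of the documented behavior, not the library's code: `sketchIsDebugEnabled` is a hypothetical name, and it assumes that an unset or empty `APP_DEBUG` falls through to `APP_ENV`, while any explicit `APP_DEBUG` value decides the outcome on its own.

```php
<?php
// Hypothetical sketch of the documented precedence: APP_DEBUG wins when
// it is set; otherwise APP_ENV=dev or APP_ENV=testing enables debug output.
function sketchIsDebugEnabled(): bool
{
    $debug = getenv('APP_DEBUG');
    if ($debug !== false && $debug !== '') {
        // An explicit APP_DEBUG value is final, whatever APP_ENV says.
        return in_array(strtolower($debug), ['true', '1'], true);
    }

    $env = getenv('APP_ENV');
    return $env !== false && in_array(strtolower($env), ['dev', 'testing'], true);
}

putenv('APP_DEBUG=false');
putenv('APP_ENV=dev');
var_dump(sketchIsDebugEnabled()); // bool(false): APP_DEBUG overrides APP_ENV
```

Calling `putenv('APP_DEBUG')` without a value unsets the variable, after which `APP_ENV=dev` alone would enable debug output in this sketch.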

### Customizing External Link Detection

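To customize external link detection, set the `HTTP_HOST` environment variable, or override the `getCurrentHost()` method in a subclass. A `private static` method cannot be overridden, so the subclass approach works only if the base class declares the method as non-private and calls it through late static binding (`static::getCurrentHost()`):

```
use tekintian\HtmlCleaner\HtmlCleaner;

class CustomHtmlCleaner extends HtmlCleaner {
    // Match the visibility used by the base class; private would not override it
    protected static function getCurrentHost() {
        return 'your-domain.com'; // Custom host for external link detection
    }
}
```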

Performance
-----------

The tool is optimized for performance:

- **Efficient Regex Patterns**: Uses optimized regular expressions
- **Single Pass Processing**: Combines multiple operations where possible
- **Memory Efficient**: Processes large files with minimal memory usage

Examples
--------

### Before Cleaning

```

    Title

```

### After Cleaning

```

    Title

```

File Structure
--------------

```
html-cleaner/
├── HtmlCleaner.php          # Main cleaner class
├── index.php                # Usage example
├── readme.md               # English documentation
├── readme_zh.md            # Chinese documentation
└── tests/                  # Test files
    ├── 1.html              # Original HTML file
    ├── f1.html             # Template file
    ├── final_cleaned.html  # Cleaned HTML
    └── f1_final.html       # Final template with cleaned content

```

Testing
-------

### Running Tests

The project includes comprehensive unit tests to ensure code quality and functionality. To run the tests:

```
# Install dependencies (if not already installed)
composer install

# Run all tests
./vendor/bin/phpunit tests/

# Run specific test file
./vendor/bin/phpunit tests/HtmlCleanerTest.php

# Run tests with detailed output
./vendor/bin/phpunit --verbose tests/
```

### Test Coverage

The test suite covers:

- **Basic HTML Cleaning**: Core functionality testing
- **Debug Output Control**: Environment-based debug behavior
- **Individual Processing Methods**: Each public method has dedicated tests
- **File Operations**: File input/output handling
- **Environment Variables**: Configuration-based behavior
- **Complex HTML Structures**: Advanced HTML processing scenarios
- **Error Handling**: Exception and edge case testing

### Test Environment

Tests are configured to run in PHP 7.2+ environments and include:

- **Environment Management**: Proper setup and teardown of environment variables
- **Output Buffering**: Testing debug output behavior
- **File System Operations**: Temporary file creation and cleanup
- **Mock Data**: Comprehensive test cases with various HTML inputs

### Continuous Integration

To integrate testing into your development workflow:

```
# Example GitHub Actions configuration
name: PHP Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup PHP
        uses: shivammathur/setup-php@v2
        with:
          php-version: '7.2'
      - name: Install dependencies
        run: composer install --prefer-dist --no-progress
      - name: Execute tests
        run: ./vendor/bin/phpunit tests/
```

### Test Examples

See the `tests/` directory for complete test implementations, including:

- `HtmlCleanerTest.php`: Main test class with 23 test methods
- Test files demonstrating various HTML cleaning scenarios
- Examples of custom processing pipelines using individual methods

Browser Compatibility
---------------------

The cleaned HTML is compatible with all modern browsers and maintains semantic structure.

SEO Benefits
------------

- **Reduced File Size**: Smaller HTML files load faster
- **Clean Markup**: Search engines can better understand content structure
- **Semantic HTML**: Preserves meaningful tag structure

License
-------

This project is open source and available under the MIT License.

Support
-------

For issues and feature requests, please visit the [GitHub repository](https://github.com/tekintian/html-cleaner).

Contributing
------------

Contributions are welcome! Please feel free to submit pull requests or open issues for discussion.

Changelog
---------

### Version 1.1

- Added comprehensive English documentation
- Improved code comments and documentation
- Enhanced tag link functionality
- Better external link detection

### Version 1.0

- Initial release with core cleaning functionality
