cloudstudio/laravel-html-crawler
================================

A Laravel package for cleaning and transforming HTML content with a fluent interface

Version v1.0.2 · MIT license · PHP ^8.2

[Source](https://github.com/cloudstudio/laravel-html-crawler) · [Packagist](https://packagist.org/packages/cloudstudio/laravel-html-crawler)

Laravel HTML Crawler
====================

A Laravel package for cleaning and transforming HTML content. It provides a fluent interface to remove unwanted elements like CSS, scripts, and more, with options to preserve specific elements and even convert the cleaned HTML to Markdown.

Features
--------

- **Remove CSS** (inline styles and `<style>` blocks)
- **Remove JavaScript** (inline scripts and `<script>` blocks)
- **Preserve allowed tags** through a configurable list or helper methods
- **Convert to Markdown** for quick text transformations
- **Custom Regex Patterns** to remove specific parts of the HTML
- **Whitespace Normalization** with an option to preserve newlines

Installation
------------

Install the package using Composer:

```
composer require cloudstudio/laravel-html-crawler
```

The package will automatically register itself in Laravel.

To publish the configuration file, run:

```
php artisan vendor:publish --provider="CloudStudio\HtmlCrawler\HtmlCrawlerServiceProvider"
```

Usage
-----

### 1. Basic HTML Cleaning

By default, the package removes disallowed tags, that is, any tag that has not been explicitly allowed:

```
use CloudStudio\HtmlCrawler\Facades\HtmlCrawler;

$html = 'Hello World';
$cleanHtml = HtmlCrawler::fromHtml($html)->clean();

// Expected output: "Hello World"
```

### 2. Preserving Allowed Tags

You can explicitly specify which tags to preserve:

#### Using `setAllowedTags`

```
use CloudStudio\HtmlCrawler\Facades\HtmlCrawler;

$html = '<p>Hello <a href="#">World</a></p>';
$cleanHtml = HtmlCrawler::fromHtml($html)
    ->setAllowedTags(['p', 'a'])
    ->clean();

// Expected output: '<p>Hello <a href="#">World</a></p>'
```

#### Using Helper Methods

The package offers helper methods to preserve groups of tags:

```
use CloudStudio\HtmlCrawler\Facades\HtmlCrawler;

$html = '<p>Hello <a href="#">World</a></p>';
$cleanHtml = HtmlCrawler::fromHtml($html)
    ->keepParagraphs()   // Preserves <p> tags
    ->keepLinks()        // Preserves <a> tags
    ->clean();

// Expected output: '<p>Hello <a href="#">World</a></p>'
```
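To illustrate what an allow-list of tags does, here is a minimal sketch in plain PHP built on `strip_tags()`. It is not taken from the package's source; `keepOnlyTags()` is a hypothetical helper, and whether the package uses `strip_tags()` internally is an assumption.

```php
<?php

/**
 * Hypothetical helper (not part of the package): keep only the listed tags.
 *
 * @param string[] $allowedTags Tag names without angle brackets, e.g. ['p', 'a'].
 */
function keepOnlyTags(string $html, array $allowedTags): string
{
    // Since PHP 7.4, strip_tags() accepts the allowed tags as an array of names.
    return strip_tags($html, $allowedTags);
}

$html = '<div><p>Hello <a href="#">World</a> <b>!</b></p></div>';

// <div> and <b> are dropped (their text stays); <p> and <a> survive with attributes.
echo keepOnlyTags($html, ['p', 'a']), PHP_EOL;
// prints: <p>Hello <a href="#">World</a> !</p>
```

Note that an allow-list removes the tags themselves but keeps the text between them, which is why `!` remains in the output.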

### 3. Handling Scripts

#### Removing `<script>` by Default

By default, `<script>` blocks are removed:

```
use CloudStudio\HtmlCrawler\Facades\HtmlCrawler;

$html = '<script>alert("x")</script>Test';
$cleanHtml = HtmlCrawler::fromHtml($html)->clean();

// Expected output: "Test"
```

#### Preserving `<script>` with `keepScripts()`

If you wish to keep `<script>` blocks, use the `keepScripts()` method:

```
use CloudStudio\HtmlCrawler\Facades\HtmlCrawler;

$html = '<script>alert("x")</script>Test';
$cleanHtml = HtmlCrawler::fromHtml($html)
    ->keepScripts()
    ->clean();

// Expected output: '<script>alert("x")</script>Test'
```
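Script removal needs its own step because stripping tags alone leaves the script body behind as text. The sketch below shows the difference in plain PHP. `removeScriptBlocks()` is a hypothetical helper and the regular expression is an assumption, not the package's actual pattern.

```php
<?php

/**
 * Hypothetical helper: drop <script> elements together with their contents.
 */
function removeScriptBlocks(string $html): string
{
    // "i" = case-insensitive tag names, "s" = let "." match newlines in the script body.
    $result = preg_replace('#<script\b[^>]*>.*?</script\s*>#is', '', $html);

    if ($result === null) {
        // preg_replace() returns null on failure (for example, a backtrack limit).
        throw new RuntimeException('Script removal failed: ' . preg_last_error_msg());
    }

    return $result;
}

$html = '<script>alert("x")</script><p>Test</p>';

echo strip_tags($html), PHP_EOL;                     // prints: alert("x")Test
echo strip_tags(removeScriptBlocks($html)), PHP_EOL; // prints: Test
```

A regex like this is adequate for tidying content, but it should not be treated as a security sanitizer for untrusted HTML.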

### 4. Handling CSS

By default, `<style>` blocks and CSS links are removed. To preserve them, use `keepCss()`:

```
use CloudStudio\HtmlCrawler\Facades\HtmlCrawler;

$html = '<style>.text { color: red; }</style>Styled text';
$cleanHtml = HtmlCrawler::fromHtml($html)
    ->keepCss()
    ->clean();

// Expected output: '<style>.text { color: red; }</style>Styled text'
```

### 5. Using a Custom Regex Pattern

If you need to remove specific parts of the HTML using a regular expression:

```
use CloudStudio\HtmlCrawler\Facades\HtmlCrawler;

$html = '<div class="remove">Remove me</div>Keep me';
$pattern = '/<div class="remove">.*?<\/div>/s';
$cleanHtml = HtmlCrawler::fromHtml($html)
    ->useCustomPattern($pattern)
    ->clean();

// Expected output: 'Keep me'
```
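When writing a removal pattern, two details tend to matter: lazy versus greedy quantifiers, and what happens when the pattern itself is invalid. The following plain-PHP sketch demonstrates both with `preg_replace()`. `removeByPattern()` is a hypothetical helper, and how `useCustomPattern()` applies the pattern internally is not confirmed here.

```php
<?php

/**
 * Hypothetical helper: remove every match of $pattern, failing loudly on a bad pattern.
 */
function removeByPattern(string $html, string $pattern): string
{
    // "@" silences the warning for an invalid pattern; the null result is handled below.
    $result = @preg_replace($pattern, '', $html);

    if ($result === null) {
        throw new InvalidArgumentException('Pattern failed: ' . preg_last_error_msg());
    }

    return $result;
}

$html = '<div class="ad">one</div>Keep<div class="ad">two</div>';

// Lazy ".*?" stops at the nearest closing tag, so each block is matched separately.
echo removeByPattern($html, '/<div class="ad">.*?<\/div>/s'), PHP_EOL; // prints: Keep

// Greedy ".*" runs from the first opening tag to the last closing tag and removes everything.
echo removeByPattern($html, '/<div class="ad">.*<\/div>/s'), PHP_EOL;  // prints an empty line
```

The `s` modifier is worth adding whenever the content to remove may span several lines, since `.` does not match newlines without it.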

### 6. Converting to Markdown

You can convert the cleaned HTML to Markdown:

```
use CloudStudio\HtmlCrawler\Facades\HtmlCrawler;

$html = '<h1>Title</h1><p>Paragraph text</p>';
$markdown = HtmlCrawler::fromHtml($html)
    ->withMarkdown()
    ->clean();
```
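To give a rough idea of what HTML-to-Markdown conversion produces, here is a deliberately tiny sketch that handles only headings and paragraphs. It is an illustration, not the package's converter: `tinyHtmlToMarkdown()` is a hypothetical function, and the package's real output may differ in spacing and in the elements it supports.

```php
<?php

/**
 * Hypothetical, minimal converter: headings and paragraphs only.
 */
function tinyHtmlToMarkdown(string $html): string
{
    // <h1>..<h6> become lines prefixed with one to six "#" characters.
    $md = preg_replace_callback(
        '#<h([1-6])\b[^>]*>(.*?)</h\1>#is',
        fn (array $m): string => str_repeat('#', (int) $m[1]) . ' ' . trim(strip_tags($m[2])) . "\n\n",
        $html
    ) ?? $html;

    // Paragraphs become their content followed by a blank line.
    $md = preg_replace('#<p\b[^>]*>(.*?)</p>#is', "$1\n\n", $md) ?? $md;

    // Any remaining tags are dropped; surrounding whitespace is trimmed.
    return trim(strip_tags($md));
}

echo tinyHtmlToMarkdown('<h1>Title</h1><p>Paragraph text</p>'), PHP_EOL;
// prints:
// # Title
//
// Paragraph text
```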

### 7. Handling Newlines

Control how newlines are handled in the HTML:

```
use CloudStudio\HtmlCrawler\Facades\HtmlCrawler;

$html = "Line 1\nLine 2";
$cleanHtml = HtmlCrawler::fromHtml($html)
    ->preserveNewlines(false)  // Set to false to replace newlines with spaces
    ->clean();

// Expected output: "Line 1 Line 2"
```
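The two newline modes can be pictured as two different whitespace-collapsing rules. The sketch below is a plain-PHP approximation under that assumption. `normalizeWhitespace()` is a hypothetical helper, and the package's exact normalization rules may differ.

```php
<?php

/**
 * Hypothetical helper: collapse whitespace, optionally keeping line breaks.
 */
function normalizeWhitespace(string $text, bool $preserveNewlines): string
{
    if ($preserveNewlines) {
        // Collapse runs of spaces and tabs only; newlines are left in place.
        $text = preg_replace('/[ \t]+/', ' ', $text) ?? $text;
    } else {
        // Collapse every run of whitespace, newlines included, into one space.
        $text = preg_replace('/\s+/', ' ', $text) ?? $text;
    }

    return trim($text);
}

$input = "Line 1\nLine  2";

var_dump(normalizeWhitespace($input, false)); // string(13) "Line 1 Line 2"
var_dump(normalizeWhitespace($input, true) === "Line 1\nLine 2"); // bool(true)
```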

### 8. Loading HTML from a URL

You can also load HTML directly from a URL:

```
use CloudStudio\HtmlCrawler\Facades\HtmlCrawler;

$cleanHtml = HtmlCrawler::fromUrl('https://example.com')
    ->clean();

// Output: the cleaned HTML content retrieved from the URL.
```

Configuration
-------------

The package includes a configuration file that allows you to define default options. After publishing the configuration file, you will find it at `config/html-crawler.php`:

```
return [
    'preserve_newlines'   => true,
    'allowed_tags'        => [],
    'convert_to_markdown' => false,
    'remove_scripts'      => true,
    'remove_styles'       => true,
];
```

You can modify these values according to your needs.
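As an example, a customized `config/html-crawler.php` might look like the sketch below. The keys are the ones listed above; the values are illustrative, and it is an assumption that `allowed_tags` takes bare tag names in the same form as `setAllowedTags()`.

```php
<?php

// config/html-crawler.php (illustrative values, not the package defaults)
return [
    'preserve_newlines'   => false,       // collapse newlines into spaces
    'allowed_tags'        => ['p', 'a'],  // assumed format: bare tag names
    'convert_to_markdown' => false,
    'remove_scripts'      => true,
    'remove_styles'       => true,
];
```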

Troubleshooting
---------------

If you encounter the error:

```
BindingResolutionException: Target class [config] does not exist.
```

make sure your tests are running in a Laravel environment using **orchestra/testbench**. For package testing, install Testbench with:

```
composer require --dev orchestra/testbench
```

Then, set up your base test case to extend Testbench (see the package documentation for more details).

Testing
-------

To run the tests, you can use:

```
./vendor/bin/pest
```

Or, if you use PHPUnit:

```
./vendor/bin/phpunit
```

Changelog
---------

Please see the [CHANGELOG](CHANGELOG.md) for detailed information on recent changes.

Contributing
------------

Please refer to [CONTRIBUTING](.github/CONTRIBUTING.md) for details on how to contribute to this package.

Security Vulnerabilities
------------------------

Please review [our security policy](../../security/policy) on how to report security vulnerabilities.

Credits
-------

- [Cloud Studio](https://github.com/cloudstudio)
- [All Contributors](../../contributors)

License
-------

This package is open-sourced software licensed under the [MIT license](LICENSE.md).

### Community

Maintainer: Toni Soriano ([@cloudstudio](https://github.com/cloudstudio))

Tags: laravel, html, crawler, markdown, cleaner, cloudstudio

