

kreuzberg-dev/kreuzcrawl
========================

High-performance web crawling engine

Version 0.2.0 (1mo ago) · License: Elastic-2.0 · Rust · PHP ^8.4 · CI passing · [8 issues](https://github.com/kreuzberg-dev/kreuzcrawl/issues)

Listed since Apr 14 · Last pushed 1w ago

[Source](https://github.com/kreuzberg-dev/kreuzcrawl) · [Packagist](https://packagist.org/packages/kreuzberg-dev/kreuzcrawl) · [RSS](/packages/kreuzberg-dev-kreuzcrawl/feed)


Kreuzcrawl
==========


 [ ![Bindings](https://camo.githubusercontent.com/23b2c7873e51d39aafa797a5571f07783c505f083f976594e38024d10eb4e433/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f42696e64696e67732d616c65662532302544372539302d303037656336) ](https://github.com/kreuzberg-dev/alef) [ ![Rust](https://camo.githubusercontent.com/535eb964ca9929a988157682ae836cfeea61060f9a3940b239a75ff512df3c4e/68747470733a2f2f696d672e736869656c64732e696f2f6372617465732f762f6b7265757a637261776c3f6c6162656c3d5275737426636f6c6f723d303037656336) ](https://crates.io/crates/kreuzcrawl) [ ![Python](https://camo.githubusercontent.com/208369df6123c61a5aaa5ed8e794757fb1d88acef02fc8f46ad7049d2b1adc46/68747470733a2f2f696d672e736869656c64732e696f2f707970692f762f6b7265757a637261776c3f6c6162656c3d507974686f6e26636f6c6f723d303037656336) ](https://pypi.org/project/kreuzcrawl/) [ ![Node.js](https://camo.githubusercontent.com/fb7e7bab93de2575a2fe232c78b6ef8cb649f88d6741e198cba676d77e0c0708/68747470733a2f2f696d672e736869656c64732e696f2f6e706d2f762f406b7265757a626572672f6b7265757a637261776c3f6c6162656c3d4e6f64652e6a7326636f6c6f723d303037656336) ](https://www.npmjs.com/package/@kreuzberg/kreuzcrawl) [ ![WASM](https://camo.githubusercontent.com/39690b0bbfa94d9e70b7a6876af158e695f2e80b509c6d23c20e4a28bd7beb79/68747470733a2f2f696d672e736869656c64732e696f2f6e706d2f762f406b7265757a626572672f6b7265757a637261776c2d7761736d3f6c6162656c3d5741534d26636f6c6f723d303037656336) ](https://www.npmjs.com/package/@kreuzberg/kreuzcrawl-wasm) [ ![Java](https://camo.githubusercontent.com/91573cc6fa79683c55dfc591dcccfc80d3f782464bb3ee2ce0e07e74fbdb737e/68747470733a2f2f696d672e736869656c64732e696f2f6d6176656e2d63656e7472616c2f762f6465762e6b7265757a626572672e6b7265757a637261776c2f6b7265757a637261776c3f6c6162656c3d4a61766126636f6c6f723d303037656336) ](https://central.sonatype.com/artifact/dev.kreuzberg.kreuzcrawl/kreuzcrawl) [ 
![Go](https://camo.githubusercontent.com/463f535ca921a6e7ea4ef5ea59b1aa1bc5ad798254f16297b79e72b6a5759dfa/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f762f7461672f6b7265757a626572672d6465762f6b7265757a637261776c3f6c6162656c3d476f26636f6c6f723d303037656336) ](https://pkg.go.dev/github.com/kreuzberg-dev/kreuzcrawl/packages/go) [ ![C#](https://camo.githubusercontent.com/b2e1f35d1888d3737b1f03ff9f51d4ee8016500228180e63bea35ff4d5b7df04/68747470733a2f2f696d672e736869656c64732e696f2f6e756765742f762f4b7265757a637261776c3f6c6162656c3d4325323326636f6c6f723d303037656336) ](https://www.nuget.org/packages/Kreuzcrawl/) [ ![PHP](https://camo.githubusercontent.com/4445599c703fcd2fd027ebe9bea9fd8deccbe84c707349d1cef3216211c0ffb8/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f762f6b7265757a626572672d6465762f6b7265757a637261776c3f6c6162656c3d50485026636f6c6f723d303037656336) ](https://packagist.org/packages/kreuzberg-dev/kreuzcrawl) [ ![Ruby](https://camo.githubusercontent.com/74501aee6fb561cdbe6ebaa4e6e9a082582ae58a6cafdfe012832f57292a9967/68747470733a2f2f696d672e736869656c64732e696f2f67656d2f762f6b7265757a637261776c3f6c6162656c3d5275627926636f6c6f723d303037656336) ](https://rubygems.org/gems/kreuzcrawl) [ ![Elixir](https://camo.githubusercontent.com/c6c1dfd6f8176afc194757301a079f48953cc82b49e651c1ca2c165d4840af16/68747470733a2f2f696d672e736869656c64732e696f2f686578706d2f762f6b7265757a637261776c3f6c6162656c3d456c6978697226636f6c6f723d303037656336) ](https://hex.pm/packages/kreuzcrawl) [ ![Dart](https://camo.githubusercontent.com/343f315fb525767e27a8246b0170d5c9266cefa0a04af846b412675cdd12dd58/68747470733a2f2f696d672e736869656c64732e696f2f7075622f762f6b7265757a637261776c3f6c6162656c3d4461727426636f6c6f723d303037656336) ](https://pub.dev/packages/kreuzcrawl) [ 
![Kotlin](https://camo.githubusercontent.com/12701a67aabe2e27428afc0beb9c4f9a0c262877fc9689b80c4f13c077b889cb/68747470733a2f2f696d672e736869656c64732e696f2f6d6176656e2d63656e7472616c2f762f6465762e6b7265757a626572672e6b7265757a637261776c2e616e64726f69642f6b7265757a637261776c2d616e64726f69643f6c6162656c3d4b6f746c696e26636f6c6f723d303037656336) ](https://central.sonatype.com/artifact/dev.kreuzberg.kreuzcrawl.android/kreuzcrawl-android) [ ![Swift](https://camo.githubusercontent.com/b8e3c9990c66f8b878644a15e0ffd0fba03df291119bca044b94033c83d27246/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f53776966742d53504d2d303037656336) ](https://github.com/kreuzberg-dev/kreuzcrawl/tree/main/packages/swift) [ ![Zig](https://camo.githubusercontent.com/376558d850c9af32b9d23b0211489a0b5c979af703256fa0b7c3e917bd591c53/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f5a69672d7061636b6167652d303037656336) ](https://github.com/kreuzberg-dev/kreuzcrawl/tree/main/packages/zig) [ ![C FFI](https://camo.githubusercontent.com/41b9c58c3810775402965a3a7652da9832ea83391c1ff255081e5551900db8f2/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f432d4646492d303037656336) ](https://github.com/kreuzberg-dev/kreuzcrawl/releases) [ ![Docker](https://camo.githubusercontent.com/7d83f05278efa79f40cf820ee19ebf22bd638c94cb23b09bb354e7d1e3282496/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f446f636b65722d676863722e696f2d3030376563363f6c6f676f3d646f636b6572266c6f676f436f6c6f723d7768697465) ](https://github.com/kreuzberg-dev/kreuzcrawl/pkgs/container/kreuzcrawl) [ ![License](https://camo.githubusercontent.com/00b4a61fb0a3d4fff141828717c6e38d927d2fa415c522df67c109f8f24982b9/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4c6963656e73652d456c61737469632d2d322e302d303037656336) ](https://github.com/kreuzberg-dev/kreuzcrawl/blob/main/LICENSE) [ 
![Documentation](https://camo.githubusercontent.com/5e23b2ac231fb484faec37ba9eb56e877d313fd7c17f07c632668e5dd57af564/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f446f63732d6b7265757a637261776c2d303037656336) ](https://docs.kreuzcrawl.kreuzberg.dev)

 [ ![Kreuzcrawl](https://raw.githubusercontent.com/kreuzberg-dev/kreuzcrawl/main/docs/assets/docs_top_banner.svg) ](https://kreuzberg.dev)

 [ ![Join Discord](https://camo.githubusercontent.com/c3d59355bb5f7fc8224936ba897a06b87e5ba6f52bd08dfd2f7f78bf669c550d/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f446973636f72642d436861742d3030376563363f6c6f676f3d646973636f7264266c6f676f436f6c6f723d7768697465) ](https://discord.gg/xt9WY3GnKR)

High-performance Rust web crawling engine for structured data extraction. Scrape, crawl, and map websites with native bindings for 14 languages — same engine, identical results across every runtime.

Key Features
------------


- **Structured extraction** — Text, metadata, links, images, assets, JSON-LD, Open Graph, hreflang, favicons, headings, and response headers
- **Markdown conversion** — Clean Markdown output with citations, document structure, and fit-content mode
- **Concurrent crawling** — Depth-first, breadth-first, or best-first traversal with configurable depth, page limits, and concurrency
- **14 languages** — Rust, Python, Node.js/TypeScript, Ruby, Go, Java, Kotlin (Android), C#, PHP, Elixir, Dart, Swift, Zig, and WebAssembly
- **Smart filtering** — BM25 relevance scoring, URL include/exclude patterns, robots.txt compliance, and sitemap discovery
- **Browser rendering** — Optional headless browser for JavaScript-heavy SPAs with WAF detection and bypass
- **Batch operations** — Scrape or crawl hundreds of URLs concurrently with partial failure handling
- **Streaming** — Real-time crawl events via async streams for progress tracking
- **Authentication** — HTTP Basic, Bearer token, and custom header auth with persistent cookie jars
- **Rate limiting** — Per-domain request throttling with configurable delays
- **Asset download** — Download, deduplicate, and filter images, documents, and other linked assets
- **MCP server** — Model Context Protocol integration for AI agents
- **REST API** — HTTP server with OpenAPI spec
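
The traversal strategies named under **Concurrent crawling** differ mainly in how the crawl frontier is ordered. The plain-Python sketch below illustrates that idea; it is not kreuzcrawl's implementation or API. The `crawl_order` function, its parameters, and the in-memory `links` dict (a stand-in for fetching a page and extracting its links) are all hypothetical, and concurrency is left out for clarity.

```python
import heapq
from collections import deque
from typing import Callable, Dict, List, Optional


def crawl_order(start: str, links: Dict[str, List[str]], strategy: str = "bfs",
                max_depth: int = 2, max_pages: int = 10,
                score: Optional[Callable[[str], float]] = None) -> List[str]:
    """Return URLs in visit order. `links` stands in for fetch + link extraction."""
    if strategy not in ("bfs", "dfs", "best_first"):
        raise ValueError(f"unknown strategy: {strategy}")
    if strategy == "best_first" and score is None:
        raise ValueError("best_first needs a score function")

    tie = 0  # insertion counter: keeps heap order stable for equal scores
    heap = []  # best-first frontier of (-score, tie, depth, url)
    queue = deque()  # BFS/DFS frontier of (depth, url)

    def push(depth: int, url: str) -> None:
        nonlocal tie
        if strategy == "best_first":
            tie += 1
            heapq.heappush(heap, (-score(url), tie, depth, url))
        else:
            queue.append((depth, url))

    seen = {start}
    push(0, start)
    order: List[str] = []
    while len(order) < max_pages and (heap or queue):
        if strategy == "best_first":
            _, _, depth, url = heapq.heappop(heap)  # highest score first
        elif strategy == "bfs":
            depth, url = queue.popleft()  # FIFO: shallowest pages first
        else:
            depth, url = queue.pop()  # LIFO: follow one branch to the bottom
        order.append(url)
        if depth >= max_depth:
            continue  # depth limit: record the page but do not expand it
        children = links.get(url, [])
        if strategy == "dfs":
            children = list(reversed(children))  # so the first link is explored first
        for child in children:
            if child not in seen:
                seen.add(child)
                push(depth + 1, child)
    return order
```

With the toy site `{"/": ["/a", "/b"], "/a": ["/a1"], "/b": ["/b1"]}`, BFS visits `/, /a, /b, /a1, /b1` while DFS visits `/, /a, /a1, /b, /b1`; `max_depth` and `max_pages` cut the walk short in the way the configurable limits above describe.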

**[Documentation](https://docs.kreuzcrawl.kreuzberg.dev)** | **[API Reference](https://docs.kreuzcrawl.kreuzberg.dev/reference/api-rust/)**
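
**Smart filtering** lists BM25 relevance scoring. As a rough illustration of what such a score measures, here is textbook Okapi BM25 in plain Python. It is not kreuzcrawl's scorer: the whitespace tokenizer, the `bm25_scores` name, and the common defaults `k1 = 1.5`, `b = 0.75` are assumptions.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each document against the query (lowercased whitespace tokens)."""
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs or 1.0  # average document length
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms (low df) get a larger inverse document frequency.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency saturates via k1; b normalizes by document length.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Pages that repeat query terms score higher, with diminishing returns controlled by `k1`, while `b` penalizes pages that are long relative to the average, so a short focused page can outrank a longer one with the same matches.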

Installation
------------


| Language | Package | Install |
|---|---|---|
| **[Python](https://github.com/kreuzberg-dev/kreuzcrawl/tree/main/packages/python)** | [kreuzcrawl](https://pypi.org/project/kreuzcrawl/) | `pip install kreuzcrawl` |
| **[Node.js](https://github.com/kreuzberg-dev/kreuzcrawl/tree/main/crates/kreuzcrawl-node)** | [@kreuzberg/kreuzcrawl](https://www.npmjs.com/package/@kreuzberg/kreuzcrawl) | `npm install @kreuzberg/kreuzcrawl` |
| **[Rust](https://github.com/kreuzberg-dev/kreuzcrawl/tree/main/crates/kreuzcrawl)** | [kreuzcrawl](https://crates.io/crates/kreuzcrawl) | `cargo add kreuzcrawl` |
| **[Go](https://github.com/kreuzberg-dev/kreuzcrawl/tree/main/packages/go)** | [pkg.go.dev](https://pkg.go.dev/github.com/kreuzberg-dev/kreuzcrawl/packages/go) | `go get github.com/kreuzberg-dev/kreuzcrawl/packages/go` |
| **[Java](https://github.com/kreuzberg-dev/kreuzcrawl/tree/main/packages/java)** | [Maven Central](https://central.sonatype.com/artifact/dev.kreuzberg.kreuzcrawl/kreuzcrawl) | See [README](https://github.com/kreuzberg-dev/kreuzcrawl/tree/main/packages/java) |
| **[C#](https://github.com/kreuzberg-dev/kreuzcrawl/tree/main/packages/csharp)** | [NuGet](https://www.nuget.org/packages/Kreuzcrawl/) | `dotnet add package Kreuzcrawl` |
| **[Ruby](https://github.com/kreuzberg-dev/kreuzcrawl/tree/main/packages/ruby)** | [kreuzcrawl](https://rubygems.org/gems/kreuzcrawl) | `gem install kreuzcrawl` |
| **[PHP](https://github.com/kreuzberg-dev/kreuzcrawl/tree/main/packages/php)** | [kreuzberg-dev/kreuzcrawl](https://packagist.org/packages/kreuzberg-dev/kreuzcrawl) | `composer require kreuzberg-dev/kreuzcrawl` |
| **[Elixir](https://github.com/kreuzberg-dev/kreuzcrawl/tree/main/packages/elixir)** | [kreuzcrawl](https://hex.pm/packages/kreuzcrawl) | `{:kreuzcrawl, "~> 0.2"}` |
| **[WASM](https://github.com/kreuzberg-dev/kreuzcrawl/tree/main/crates/kreuzcrawl-wasm)** | [@kreuzberg/kreuzcrawl-wasm](https://www.npmjs.com/package/@kreuzberg/kreuzcrawl-wasm) | `npm install @kreuzberg/kreuzcrawl-wasm` |
| **[C FFI](https://github.com/kreuzberg-dev/kreuzcrawl/tree/main/crates/kreuzcrawl-ffi)** | [GitHub Releases](https://github.com/kreuzberg-dev/kreuzcrawl/releases) | C header + shared library |
| **[CLI](https://github.com/kreuzberg-dev/kreuzcrawl/tree/main/crates/kreuzcrawl-cli)** | [crates.io](https://crates.io/crates/kreuzcrawl-cli) | `cargo install kreuzcrawl-cli` |
| **CLI (Homebrew)** | [kreuzberg-dev/tap](https://github.com/kreuzberg-dev/homebrew-tap) | `brew install kreuzberg-dev/tap/kreuzcrawl` |

Quick Start
-----------


**Python** — [Full docs](https://github.com/kreuzberg-dev/kreuzcrawl/tree/main/packages/python)

```python
from kreuzcrawl import create_engine, scrape

engine = create_engine()
result = scrape(engine, "https://example.com")

print(result.metadata.title)
print(result.markdown.content)
print(len(result.links))
```

**Node.js / TypeScript** — [Full docs](https://github.com/kreuzberg-dev/kreuzcrawl/tree/main/crates/kreuzcrawl-node)

```typescript
import { createEngine, scrape } from "@kreuzberg/kreuzcrawl";

const engine = createEngine();
const result = await scrape(engine, "https://example.com");

console.log(result.metadata.title);
console.log(result.markdown.content);
console.log(result.links.length);
```

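**Rust** — [Full docs](https://github.com/kreuzberg-dev/kreuzcrawl/tree/main/crates/kreuzcrawl)

```rust
// Assumes a Tokio runtime (tokio with the "macros" and "rt-multi-thread" features).
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let engine = kreuzcrawl::create_engine(None)?;
    let result = kreuzcrawl::scrape(&engine, "https://example.com").await?;

    println!("{}", result.metadata.title);
    println!("{}", result.markdown.content);
    println!("{}", result.links.len());
    Ok(())
}
```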

**Go** — [Full docs](https://github.com/kreuzberg-dev/kreuzcrawl/tree/main/packages/go)

```go
engine, err := kcrawl.CreateEngine()
if err != nil {
	log.Fatal(err)
}
result, err := kcrawl.Scrape(engine, "https://example.com")
if err != nil {
	log.Fatal(err)
}

fmt.Println(result.Metadata.Title)
fmt.Println(result.Markdown.Content)
fmt.Println(len(result.Links))
```

**Java** — [Full docs](https://github.com/kreuzberg-dev/kreuzcrawl/tree/main/packages/java)

```java
var engine = Kreuzcrawl.createEngine(null);
var result = Kreuzcrawl.scrape(engine, "https://example.com");

System.out.println(result.metadata().title());
System.out.println(result.markdown().content());
System.out.println(result.links().size());
```

**C#** — [Full docs](https://github.com/kreuzberg-dev/kreuzcrawl/tree/main/packages/csharp)

```csharp
var engine = KreuzcrawlLib.CreateEngine(null);
var result = await KreuzcrawlLib.Scrape(engine, "https://example.com");

Console.WriteLine(result.Metadata.Title);
Console.WriteLine(result.Markdown.Content);
Console.WriteLine(result.Links.Count);
```

**Ruby** — [Full docs](https://github.com/kreuzberg-dev/kreuzcrawl/tree/main/packages/ruby)

```ruby
engine = Kreuzcrawl.create_engine(nil)
result = Kreuzcrawl.scrape(engine, "https://example.com")

puts result.metadata.title
puts result.markdown.content
puts result.links.length
```

**PHP** — [Full docs](https://github.com/kreuzberg-dev/kreuzcrawl/tree/main/packages/php)

```php
$engine = Kreuzcrawl::createEngine(null);
$result = Kreuzcrawl::scrape($engine, "https://example.com");

// PHP_EOL keeps each value on its own line.
echo $result->metadata->title, PHP_EOL;
echo $result->markdown->content, PHP_EOL;
echo count($result->links), PHP_EOL;
```

**Elixir** — [Full docs](https://github.com/kreuzberg-dev/kreuzcrawl/tree/main/packages/elixir)

```elixir
{:ok, engine} = Kreuzcrawl.create_engine(nil)
{:ok, result} = Kreuzcrawl.scrape(engine, "https://example.com")

IO.puts(result.metadata.title)
IO.puts(result.markdown.content)
IO.puts(length(result.links))
```
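
The examples above scrape a single URL. **Batch operations** describes scraping many URLs concurrently with partial failure handling; the generic `asyncio` sketch below shows that pattern. It is not kreuzcrawl's batch API: `scrape_batch` and the `fetch` callable are hypothetical stand-ins for whatever performs one scrape.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple


async def scrape_batch(
    urls: List[str],
    fetch: Callable[[str], Awaitable[str]],
    concurrency: int = 8,
) -> Tuple[Dict[str, str], Dict[str, BaseException]]:
    """Fetch all URLs concurrently, returning (successes, failures) keyed by URL."""
    semaphore = asyncio.Semaphore(concurrency)  # at most `concurrency` fetches in flight

    async def guarded(url: str) -> str:
        async with semaphore:
            return await fetch(url)

    # return_exceptions=True returns each failure as a value instead of raising it,
    # so the call still yields every other URL's result.
    outcomes = await asyncio.gather(*(guarded(u) for u in urls), return_exceptions=True)

    succeeded: Dict[str, str] = {}
    failed: Dict[str, BaseException] = {}
    for url, outcome in zip(urls, outcomes):
        if isinstance(outcome, BaseException):
            failed[url] = outcome
        else:
            succeeded[url] = outcome
    return succeeded, failed
```

Keeping successes and failures in separate maps lets a caller retry or log only the failed URLs without discarding the pages that did load.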

Platform Support
----------------


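| Language | Linux x86\_64 | Linux aarch64 | macOS ARM64 | Windows x64 |
|---|---|---|---|---|
| Python | ✅ | ✅ | ✅ | ✅ |
| Node.js | ✅ | ✅ | ✅ | ✅ |
| WASM | ✅ | ✅ | ✅ | ✅ |
| Ruby | ✅ | ✅ | ✅ | — |
| Elixir | ✅ | ✅ | ✅ | ✅ |
| Go | ✅ | ✅ | ✅ | ✅ |
| Java | ✅ | ✅ | ✅ | ✅ |
| C# | ✅ | ✅ | ✅ | ✅ |
| PHP | ✅ | ✅ | ✅ | ✅ |
| Rust | ✅ | ✅ | ✅ | ✅ |
| C (FFI) | ✅ | ✅ | ✅ | ✅ |
| CLI | ✅ | ✅ | ✅ | ✅ |

Architecture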
------------


```
Your Application (Python, Node.js, Ruby, Java, Go, C#, PHP, Elixir, ...)
    │
Language Bindings (PyO3, NAPI-RS, Magnus, ext-php-rs, Rustler, cgo, Panama, P/Invoke)
    │
Rust Core Engine (async, concurrent, SIMD-optimized)
    │
    ├── HTTP Client (reqwest + tower middleware stack)
    ├── HTML Parser (html5ever + lol_html)
    ├── Markdown Converter (html-to-markdown-rs)
    ├── Content Extraction (metadata, JSON-LD, Open Graph, readability)
    ├── Link Discovery (robots.txt, sitemaps, anchor analysis)
    └── Browser Rendering (optional headless Chrome/Firefox)

```
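
The **Link Discovery** stage honors robots.txt, and **Smart filtering** adds URL include/exclude patterns. One way to combine those checks before a URL enters the frontier is sketched below with Python's standard `urllib.robotparser`. This is not kreuzcrawl's filter: `make_url_filter`, the use of regular expressions for patterns, the rule that exclusions win over inclusions, and the `example-crawler` user agent are all assumptions.

```python
import re
from typing import Callable, Iterable
from urllib.robotparser import RobotFileParser


def make_url_filter(robots_txt: str, include: Iterable[str] = (), exclude: Iterable[str] = (),
                    user_agent: str = "example-crawler") -> Callable[[str], bool]:
    """Build a predicate url -> bool combining robots.txt rules with include/exclude regexes."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())  # parse an already-fetched robots.txt body
    include_res = [re.compile(p) for p in include]
    exclude_res = [re.compile(p) for p in exclude]

    def allowed(url: str) -> bool:
        if include_res and not any(r.search(url) for r in include_res):
            return False  # an include list exists and nothing matched
        if any(r.search(url) for r in exclude_res):
            return False  # assumption: exclusions always win over inclusions
        return robots.can_fetch(user_agent, url)  # finally, honor robots.txt

    return allowed
```

Running the cheap pattern checks first means robots.txt rules are consulted only for URLs the crawl would actually want.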

Contributing
------------


Contributions are welcome! See our [Contributing Guide](https://github.com/kreuzberg-dev/kreuzcrawl/blob/main/CONTRIBUTING.md).

Part of Kreuzberg.dev
---------------------


- [Kreuzberg](https://github.com/kreuzberg-dev/kreuzberg) — document intelligence: text, tables, metadata from 90+ formats with optional OCR.
- [Kreuzberg Cloud](https://github.com/kreuzberg-dev/kreuzberg-cloud) — managed extraction API with SDKs, dashboards, and observability.
- [html-to-markdown](https://github.com/kreuzberg-dev/html-to-markdown) — fast, lossless HTML→Markdown engine.
- [liter-llm](https://github.com/kreuzberg-dev/liter-llm) — universal LLM API client with native bindings for 14 languages and 143 providers.
- [tree-sitter-language-pack](https://github.com/kreuzberg-dev/tree-sitter-language-pack) — tree-sitter grammars and code-intelligence primitives.
- [alef](https://github.com/kreuzberg-dev/alef) — the polyglot binding generator that produces all per-language bindings.
- [Discord](https://discord.gg/xt9WY3GnKR) — community, roadmap, announcements.

License
-------


[Elastic License 2.0](https://github.com/kreuzberg-dev/kreuzcrawl/blob/main/LICENSE)

Links
-----


- [Documentation](https://docs.kreuzcrawl.kreuzberg.dev)
- [API Reference](https://docs.kreuzcrawl.kreuzberg.dev/reference/api-rust/)
- [GitHub](https://github.com/kreuzberg-dev/kreuzcrawl)
- [Issues](https://github.com/kreuzberg-dev/kreuzcrawl/issues)
- [Discord](https://discord.gg/xt9WY3GnKR)

Maintainer: [Na'aman Hirschfeld](https://github.com/Goldziher) (@Goldziher)

