probellm/probellm
=================

A PHP testing framework for LLM-powered agents. Built on top of PHPUnit/Pest.

v1.0.0 (3mo ago) · MIT · PHP ^8.4 · Since Feb 11

[Source](https://github.com/Avead556/probellm) · [Packagist](https://packagist.org/packages/probellm/probellm)


ProbeLLM
========

 A PHP testing framework for LLM-powered agents.
 Built on top of PHPUnit / Pest.

 [Installation](#installation) · [Quick Start](#quick-start) · [Features](#features) · [ElevenLabs](#elevenlabs-convai) · [Cassettes](#cassette-system) · [Providers](#providers) · [License](#license)

---

Why ProbeLLM?
-------------

Testing LLM agents is hard. Responses are non-deterministic, API calls are slow and expensive, and tool-calling flows require multi-turn orchestration.

ProbeLLM solves this with:

- **Fluent DSL** for writing multi-turn dialog tests
- **ElevenLabs ConvAI** simulation testing with evaluation criteria, tool mocks, and dynamic variables
- **Cassette record/replay** so tests run offline, fast, and deterministic
- **LLM-as-judge** assertions for evaluating response quality with natural language criteria
- **Tool calling** support with auto-resolution of `tool_call_id`
- **Multimodal attachments** (images, PDFs, audio) via local files or URLs
- **PHPUnit attributes** for declarative test configuration

Installation
------------

```
composer require probellm/probellm --dev
```

Requires PHP 8.4+ and ext-curl.

Quick Start
-----------

```
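use ProbeLLM\AgentTestCase;
use ProbeLLM\Attributes\AgentSystem;
use ProbeLLM\Attributes\AgentModel;
use ProbeLLM\Attributes\AgentReplayMode;
use ProbeLLM\DSL\AnswerExpectations;
// LLMProvider and OpenAICompatibleProvider must also be imported;
// their namespaces are not shown in this README.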

#[AgentSystem('You are a helpful assistant. Always respond in valid JSON.')]
#[AgentModel('gpt-4o')]
#[AgentReplayMode]
class MyAgentTest extends AgentTestCase
{
    protected function resolveProvider(): LLMProvider
    {
        return OpenAICompatibleProvider::openAI(getenv('OPENAI_API_KEY'));
    }

    public function test_greeting(): void
    {
        $this->dialog()
            ->user('Return JSON with key "greeting" and value "hello".')
            ->answer(function (AnswerExpectations $a) {
                $a->assertJson()
                  ->assertJsonPath('$.greeting', equals: 'hello');
            });
    }
}
```

The first run calls the real API and records cassettes automatically. Subsequent runs replay the recorded responses, so they are fast and make no API calls:

```
./vendor/bin/phpunit
```

Features
--------

### Multi-turn Dialogs

Chain `.user()` / `.answer()` / `.toolResult()` calls to test full conversation flows:

```
$this->dialog()
    ->user('Return JSON: {"count": 1}')
    ->answer(function (AnswerExpectations $a) {
        $a->assertJson()
          ->assertJsonPath('$.count', equals: 1);
    })
    ->user('Now increment the count.')
    ->answer(function (AnswerExpectations $a) {
        $a->assertJsonPath('$.count', equals: 2);
    });
```

### JSON Assertions

```
->answer(function (AnswerExpectations $a) {
    $a->assertJson()                                          // valid JSON
      ->assertJsonPath('$.name', equals: 'Alice')             // exact match
      ->assertJsonPath('$.bio', contains: 'engineer')         // substring
      ->assertJsonPath('$.bio', notContains: 'manager')       // negative substring
      ->assertJsonPath('$.items[0].id', notEmpty: true);      // nested array access
})
```
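To make the path syntax concrete, here is a minimal sketch of how a path such as `$.items[0].id` could be resolved against a decoded response. `jsonPathGet()` is a hypothetical helper written for this illustration, not part of ProbeLLM, and it only handles dotted keys and numeric indexes:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not ProbeLLM's implementation): resolve a simple
 * path like "$.items[0].id" against decoded JSON data.
 */
function jsonPathGet(array $data, string $path): mixed
{
    if (!str_starts_with($path, '$')) {
        throw new InvalidArgumentException('Path must start with "$"');
    }

    // Split the path into ".key" and "[index]" segments.
    preg_match_all('/\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]/', substr($path, 1), $segments, PREG_SET_ORDER);

    $current = $data;
    foreach ($segments as $segment) {
        // Group 2 is set only for "[index]" segments.
        $key = (isset($segment[2]) && $segment[2] !== '') ? (int) $segment[2] : $segment[1];
        if (!is_array($current) || !array_key_exists($key, $current)) {
            throw new OutOfBoundsException("Path {$path} not found");
        }
        $current = $current[$key];
    }

    return $current;
}

$response = json_decode('{"name":"Alice","items":[{"id":7}]}', true, 512, JSON_THROW_ON_ERROR);

echo jsonPathGet($response, '$.name'), "\n";        // Alice
echo jsonPathGet($response, '$.items[0].id'), "\n"; // 7
```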

### Tool Calling

Define tools via `ToolContract`, assert on calls and arguments:

```
#[AgentTools(SearchTool::class)]
public function test_agent_searches(): void
{
    $this->dialog()
        ->user('Search for "PHP 8.4 features".')
        ->answer(function (AnswerExpectations $a) {
            $a->assertToolCalled('search')
              ->assertToolArgs('search', function (array $args) {
                  self::assertStringContainsString('PHP', $args['query']);
              });
        })
        ->toolResult('search', [
            'results' => [['title' => 'PHP 8.4 Released', 'url' => 'https://php.net']],
        ])
        ->answer(function (AnswerExpectations $a) {
            self::assertNotEmpty($a->lastMessage());
        });
}
```
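Note that `toolResult('search', ...)` above is given only the tool name; the matching `tool_call_id` is resolved automatically. The sketch below shows one way such a lookup could work. It assumes the OpenAI-style chat message shape (`tool_calls[].id`, `tool_calls[].function.name`), and `buildToolResultMessage()` is a hypothetical helper, not ProbeLLM's actual code:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: find the pending call for $toolName in the last
 * assistant message and build the matching "tool" role message.
 */
function buildToolResultMessage(array $assistantMessage, string $toolName, array $result): array
{
    foreach ($assistantMessage['tool_calls'] ?? [] as $call) {
        if (($call['function']['name'] ?? null) === $toolName) {
            return [
                'role' => 'tool',
                'tool_call_id' => $call['id'],
                'content' => json_encode($result, JSON_THROW_ON_ERROR),
            ];
        }
    }

    throw new RuntimeException("No pending call to tool '{$toolName}'");
}

// Assumed shape of an assistant message that requested a tool call.
$assistant = [
    'role' => 'assistant',
    'tool_calls' => [[
        'id' => 'call_123',
        'type' => 'function',
        'function' => ['name' => 'search', 'arguments' => '{"query":"PHP 8.4 features"}'],
    ]],
];

$message = buildToolResultMessage($assistant, 'search', ['results' => []]);
echo $message['tool_call_id'], "\n"; // call_123
```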

### Multimodal Attachments

Send images, PDFs, and audio files alongside user messages:

```
use ProbeLLM\DTO\Attachment;

$this->dialog()
    ->userWithAttachments('What is in this image?', [
        '/path/to/photo.png',                                    // local file
        Attachment::fromUrl('https://example.com/img.jpg'),      // URL
        Attachment::fromBase64($data, 'image/jpeg'),             // base64
    ])
    ->answer(function (AnswerExpectations $a) {
        $a->assertByPrompt('The response describes the contents of the image');
    });
```

Supported types: `image/*`, `application/pdf`, `audio/*`.
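As a rough illustration of the supported-type rule, a MIME type could be checked like this before sending. `isSupportedAttachmentType()` is a hypothetical function for this sketch; how ProbeLLM itself validates attachments is not documented here:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical check mirroring the list above:
 * image/*, application/pdf, audio/*.
 */
function isSupportedAttachmentType(string $mime): bool
{
    $mime = strtolower(trim($mime));

    return str_starts_with($mime, 'image/')
        || str_starts_with($mime, 'audio/')
        || $mime === 'application/pdf';
}

var_dump(isSupportedAttachmentType('image/png'));       // bool(true)
var_dump(isSupportedAttachmentType('application/pdf')); // bool(true)
var_dump(isSupportedAttachmentType('text/plain'));      // bool(false)
```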

### LLM-as-Judge

Use natural language criteria to evaluate responses:

```
->answer(function (AnswerExpectations $a) {
    $a->assertJson()
      ->assertByPrompt('The response contains a healthy breakfast suggestion')
      ->assertByPrompt('No excessive sugar is recommended');
})
```

Judge model and temperature can be configured per-call, per-method, or per-class:

```
// Per-call override
$a->assertByPrompt('Criteria here', model: 'gpt-4o', temperature: 0.1);

// Via attributes
#[JudgeModel('gpt-4o-mini')]
#[JudgeTemperature(0.0)]
```

### PHPUnit Attributes

Declarative configuration at class or method level:

| Attribute | Scope | Description |
| --- | --- | --- |
| `#[AgentSystem('...')]` | Class / Method | System prompt |
| `#[AgentSystemFile('path')]` | Class / Method | System prompt from file |
| `#[AgentModel('gpt-4o')]` | Class / Method | Model name |
| `#[AgentTemperature(0.7)]` | Class / Method | Sampling temperature |
| `#[AgentTools(SearchTool::class)]` | Class / Method | Enable tool calling |
| `#[AgentReplayMode]` | Class / Method | Enforce cassette-only mode |
| `#[JudgeModel('gpt-4o-mini')]` | Class / Method | Judge model |
| `#[JudgeTemperature(0.0)]` | Class / Method | Judge temperature |
| `#[ElevenLabsAgentId('agent_...')]` | Class / Method | ElevenLabs agent ID |
| `#[ElevenLabsAgentId(env: 'VAR')]` | Class / Method | Agent ID from env variable |
| `#[ElevenLabsTurnsLimit(20)]` | Class / Method | Max simulation turns |

Method-level attributes override class-level. Multiple `#[AgentSystem]` and `#[AgentSystemFile]` are concatenated.
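The override and concatenation rules can be sketched with plain PHP reflection. Everything below is illustrative: `DemoSystem`, `DemoModel`, `DemoTest`, and `resolveConfig()` are stand-ins defined for this example, and the concatenation order (class first, then method) and newline separator are assumptions rather than documented ProbeLLM behavior:

```php
<?php
declare(strict_types=1);

// Stand-in attributes for this sketch only (not the ProbeLLM classes).
#[Attribute(Attribute::TARGET_CLASS | Attribute::TARGET_METHOD | Attribute::IS_REPEATABLE)]
final class DemoSystem
{
    public function __construct(public string $prompt) {}
}

#[Attribute(Attribute::TARGET_CLASS | Attribute::TARGET_METHOD)]
final class DemoModel
{
    public function __construct(public string $name) {}
}

#[DemoSystem('You are a helpful assistant.')]
#[DemoModel('gpt-4o')]
final class DemoTest
{
    #[DemoSystem('Always respond in valid JSON.')]
    #[DemoModel('gpt-4o-mini')]
    public function test_json(): void {}
}

function resolveConfig(string $class, string $method): array
{
    $classRef = new ReflectionClass($class);
    $methodRef = $classRef->getMethod($method);

    // System prompts: collect from class and method, then concatenate.
    $prompts = [];
    foreach ([$classRef, $methodRef] as $ref) {
        foreach ($ref->getAttributes(DemoSystem::class) as $attribute) {
            $prompts[] = $attribute->newInstance()->prompt;
        }
    }

    // Model: a method-level attribute wins over the class-level one.
    $modelAttribute = $methodRef->getAttributes(DemoModel::class)[0]
        ?? $classRef->getAttributes(DemoModel::class)[0]
        ?? null;

    return [
        'system' => implode("\n", $prompts),
        'model' => $modelAttribute?->newInstance()->name,
    ];
}

$config = resolveConfig(DemoTest::class, 'test_json');
echo $config['model'], "\n"; // gpt-4o-mini
```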

ElevenLabs ConvAI
-----------------

Test ElevenLabs conversational AI agents using the [simulate-conversation API](https://elevenlabs.io/docs/conversational-ai/customization/personality/simulated-conversations). ProbeLLM sends a simulated user against your agent and lets you assert on the resulting transcript, tool calls, evaluations, and workflow transfers.

### Setup

```
use ProbeLLM\ElevenLabsTestCase;
use ProbeLLM\Attributes\ElevenLabsAgentId;
use ProbeLLM\Attributes\ElevenLabsTurnsLimit;
use ProbeLLM\DSL\ElevenLabsExpectations;

#[ElevenLabsAgentId(env: 'ELEVENLABS_AGENT_ID')]
#[ElevenLabsTurnsLimit(20)]
class MyVoiceAgentTest extends ElevenLabsTestCase
{
    // Uses ELEVENLABS_API_KEY env var automatically.
    // Override resolveElevenLabsProvider() for custom setup.
}
```

### Simulation Scenario

```
public function test_greeting(): void
{
    $this->elevenLabs()
        ->withDynamicVariable('companyName', 'Acme Corp')
        ->withUserPrompt('You just called the company, wait for the greeting')
        ->withTurnsLimit(4)
        ->withEvaluation('greeting', 'Agent greeted the user and mentioned the company name')
        ->run(function (ElevenLabsExpectations $e) {
            $e->assertMinTurns(2)
                ->assertAllEvaluationsPassed()
                ->assertByPrompt('The agent greeted the user politely');
        });
}
```

### Dynamic Variables

Pass `{{placeholder}}` values that your agent's prompt references:

```
$this->elevenLabs()
    ->withDynamicVariable('companyName', 'Acme Corp')
    ->withDynamicVariable('agentName', 'Sarah')
    ->withDynamicVariables([
        'businessHours' => '9am-5pm',
        'maxDiscount' => 15,
    ])
```
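The substitution itself happens on the agent side, but its effect can be pictured with a small sketch. `renderTemplate()` is a hypothetical function for illustration only, and the sample prompt text is invented:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical illustration of {{placeholder}} substitution.
 * Unknown placeholders are left untouched.
 */
function renderTemplate(string $template, array $variables): string
{
    $rendered = preg_replace_callback(
        '/\{\{\s*(\w+)\s*\}\}/',
        static function (array $match) use ($variables): string {
            return array_key_exists($match[1], $variables)
                ? (string) $variables[$match[1]]
                : $match[0];
        },
        $template
    );

    // preg_replace_callback returns null only on a regex error.
    return $rendered ?? $template;
}

echo renderTemplate(
    'Welcome to {{companyName}}, discounts up to {{maxDiscount}}%.',
    ['companyName' => 'Acme Corp', 'maxDiscount' => 15]
), "\n";
// Welcome to Acme Corp, discounts up to 15%.
```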

### Tool Mocks

Mock tool responses so the agent's tools return predetermined data during simulation:

```
$this->elevenLabs()
    ->withToolMock('Create_order', ['status' => 'success', 'request_id' => 'REQ-001'])
    ->withToolMock('Transfer-to-number', ['status' => 'transferred'])
```

### Evaluation Criteria

Define criteria that ElevenLabs evaluates against the conversation:

```
$this->elevenLabs()
    ->withEvaluation('data_collected', 'Agent collected name, phone, and address')
    ->withEvaluation('lead_created', 'Agent used the Create_order tool')
    ->run(function (ElevenLabsExpectations $e) {
        $e->assertAllEvaluationsPassed();      // all criteria passed
        $e->assertEvaluation('data_collected'); // specific criterion passed
        $e->assertEvaluationFailed('some_id');  // specific criterion failed
        $e->assertEvaluationCount(2);           // expected number of results
    });
```

### ElevenLabs Assertions

#### Tool assertions

```
$e->assertToolCalled('Create_order')           // tool was called at least once
  ->assertToolNotCalled('Dangerous_tool')        // tool was NOT called
  ->assertToolCalledTimes('Create_order', 1)    // exact call count
  ->assertToolExecuted('Create_order')          // called AND executed
  ->assertToolCallCount(2)                       // total tool calls
  ->assertNoToolsCalled()                        // no tools called at all
  ->assertToolCallParam('Create_order', 'name', 'John')  // param value
  ->assertToolCallParamContains('Create_order', 'address', 'Maple')
  ->assertToolCallHasParam('Create_order', 'phone')
  ->assertToolArgs('Create_order', function (array $args) {
      self::assertArrayHasKey('name', $args);
  });
```

#### Transcript assertions

```
$e->assertTranscriptContains('hello')            // full transcript contains string
  ->assertTranscriptNotContains('error')          // full transcript does NOT contain
  ->assertTranscriptMatchesRegex('/\d{3}-\d{4}/')
  ->assertAgentSaid('How can I help')             // only agent messages
  ->assertAgentNeverSaid('I am an AI')
  ->assertFirstAgentMessage('Welcome')
  ->assertLastAgentMessage('Goodbye')
  ->assertTranscriptRole(0, 'agent')              // role at index
  ->assertTranscriptContent(0, 'exact text')      // content at index
  ->assertMinTurns(4)                             // at least N entries
  ->assertMaxTurns(20);                           // at most N entries
```

#### Workflow / transfer assertions

```
$e->assertAgentHandled('agent_abc123')            // agent appeared in transcript
  ->assertTransferredToAgent('agent_xyz789')       // conversation transferred
  ->assertWorkflowNodeReached('node_qualifier')
  ->assertAgentCount(2);                           // number of unique agents
```

#### Analysis assertions

```
$e->assertCallSuccessful()                        // analysis.call_successful = "success"
  ->assertTranscriptSummaryContains('booked');     // analysis.transcript_summary contains
```

#### LLM Judge

```
$e->assertByPrompt('The agent collected all required information before creating the request');
```

Requires a judge provider. Override `resolveJudgeProvider()` in your test case or set `LLM_API_KEY` / `LLM_BASE_URL` env vars (auto-configured in `ElevenLabsTestCase`).

Cassette System
---------------

Cassettes record LLM responses to JSON files in `tests/cassettes/`. Each cassette is keyed by a SHA-256 hash of all inputs (system prompt, messages, model, temperature, tools, test name, turn index), so any change to an input produces a new key.
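A minimal sketch of such a key derivation follows. `cassetteKey()` is a hypothetical function; the field names and the JSON serialization are assumptions made for illustration, and ProbeLLM's actual hashing scheme may differ:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: hash every input of a turn into one key.
 */
function cassetteKey(
    string $systemPrompt,
    array $messages,
    string $model,
    float $temperature,
    array $tools,
    string $testName,
    int $turnIndex
): string {
    $payload = json_encode([
        'system' => $systemPrompt,
        'messages' => $messages,
        'model' => $model,
        'temperature' => $temperature,
        'tools' => $tools,
        'test' => $testName,
        'turn' => $turnIndex,
    ], JSON_THROW_ON_ERROR);

    return hash('sha256', $payload);
}

$messages = [['role' => 'user', 'content' => 'Hi']];

$first = cassetteKey('You are helpful.', $messages, 'gpt-4o', 0.0, [], 'test_greeting', 0);
$same = cassetteKey('You are helpful.', $messages, 'gpt-4o', 0.0, [], 'test_greeting', 0);
$changed = cassetteKey('You are helpful.', $messages, 'gpt-4o', 0.7, [], 'test_greeting', 0);

var_dump($first === $same);    // bool(true): identical inputs, same cassette
var_dump($first === $changed); // bool(false): temperature changed, new key
```

This is why editing a system prompt or switching models triggers a fresh recording instead of replaying a stale response.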

**Decision logic per turn:**

| Cassette exists? | Replay mode? | Result |
| --- | --- | --- |
| Yes | - | Load from cassette |
| No | Yes | Call API, save cassette |
| No | No | Call API (no caching) |

ElevenLabs simulations use the same cassette system. The hash is computed from agent ID, user prompt, first message, tool mocks, evaluation criteria, turns limit, dynamic variables, and test name.
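The per-turn decision logic above can be read as a small function. The sketch below simply transcribes the three outcomes as written in this section; `TurnAction` and `decideTurn()` are hypothetical names, not ProbeLLM APIs:

```php
<?php
declare(strict_types=1);

// Hypothetical enum naming the three outcomes listed in this section.
enum TurnAction: string
{
    case LoadCassette = 'load from cassette';
    case CallAndSave = 'call API, save cassette';
    case CallOnly = 'call API (no caching)';
}

function decideTurn(bool $cassetteExists, bool $replayMode): TurnAction
{
    if ($cassetteExists) {
        // An existing cassette is used regardless of replay mode.
        return TurnAction::LoadCassette;
    }

    return $replayMode ? TurnAction::CallAndSave : TurnAction::CallOnly;
}

echo decideTurn(true, false)->value, "\n";  // load from cassette
echo decideTurn(false, true)->value, "\n";  // call API, save cassette
echo decideTurn(false, false)->value, "\n"; // call API (no caching)
```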

Providers
---------

### OpenAI-compatible (OpenAI, OpenRouter, Groq, Together, Ollama, etc.)

```
protected function resolveProvider(): LLMProvider
{
    return new OpenAICompatibleProvider(
        apiKey: getenv('LLM_API_KEY'),
        baseUrl: 'https://api.openai.com/v1',
    );

    // Or use factory methods:
    // return OpenAICompatibleProvider::openAI(getenv('OPENAI_API_KEY'));
    // return OpenAICompatibleProvider::openRouter(getenv('OPENROUTER_API_KEY'));
}
```

### Anthropic (Claude)

```
protected function resolveProvider(): LLMProvider
{
    return new AnthropicProvider(apiKey: getenv('ANTHROPIC_API_KEY'));
}
```

### ElevenLabs ConvAI

```
protected function resolveElevenLabsProvider(): ElevenLabsConvaiProvider
{
    return new ElevenLabsProvider(apiKey: getenv('ELEVENLABS_API_KEY'));
}
```

### Separate judge provider

```
protected function resolveJudgeProvider(): ?LLMProvider
{
    return new AnthropicProvider(apiKey: getenv('ANTHROPIC_API_KEY'));
}
```

Exception Hierarchy
-------------------

All exceptions extend `ProbeLLMException` (which extends `RuntimeException`), so you can catch them granularly or broadly:

| Exception | When |
| --- | --- |
| `CassetteMissingException` | Replay mode, cassette not found |
| `ProviderException` | HTTP/curl errors from LLM API |
| `InvalidResponseException` | Invalid JSON from provider or judge |
| `ToolResolutionException` | Tool class issues, missing `tool_call_id` |
| `ConfigurationException` | Missing ext-curl, file not found, no provider configured |

Environment Variables
---------------------

| Variable | Description |
| --- | --- |
| `LLM_API_KEY` | API key for your LLM provider |
| `LLM_BASE_URL` | Provider endpoint (default: `https://api.openai.com/v1`) |
| `ELEVENLABS_API_KEY` | API key for ElevenLabs ConvAI |
| `ELEVENLABS_AGENT_ID` | Default agent ID for ElevenLabs tests |

License
-------

MIT


