kevinpijning/pest-plugin-prompt
===============================

Pest plugin to evaluate prompts

Pest Plugin for Prompt Testing
==============================

[![Tests](https://camo.githubusercontent.com/d50cf80ffecc9c5364927f2a5c301cd4ea563232e811db50f068e8ec52a9543e/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f616374696f6e732f776f726b666c6f772f7374617475732f6b6576696e70696a6e696e672f706573742d706c7567696e2d70726f6d70742f74657374732e796d6c3f6c6162656c3d7465737473267374796c653d666c61742d737175617265)](https://github.com/kevinpijning/pest-plugin-prompt/actions)[![PHP Version](https://camo.githubusercontent.com/d841888a73bb1cee58e76fe55cc2e0dd1dfda0b1f0a00c16a2d2416b9e05393a/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f7068702d762f6b6576696e70696a6e696e672f706573742d706c7567696e2d70726f6d70743f7374796c653d666c61742d737175617265)](https://packagist.org/packages/kevinpijning/pest-plugin-prompt)[![License](https://camo.githubusercontent.com/b9d3ed1d5977bf5ef7f4937e1392cf2768756710387575be0fa41cc07fe664fe/68747470733a2f2f696d672e736869656c64732e696f2f7061636b61676973742f6c2f6b6576696e70696a6e696e672f706573742d706c7567696e2d70726f6d70743f7374796c653d666c61742d737175617265)](https://github.com/kevinpijning/pest-plugin-prompt/blob/main/LICENSE)[![Pest](https://camo.githubusercontent.com/3709629e67aa25a2131d580ec9fe26c96cd59580bea390194a432b5179492463/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f506573742d342e302b2d6666363962343f7374796c653d666c61742d737175617265)](https://pestphp.com)

**Test your AI prompts with confidence using Pest's elegant syntax.**

This plugin brings LLM prompt testing to your Pest test suite, powered by [promptfoo](https://www.promptfoo.dev/) under the hood. Write fluent, expressive tests for evaluating AI model prompts using the familiar Pest API you already love.

Table of Contents
-----------------

- [Why Use This Plugin?](#why-use-this-plugin)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Quick Start](#quick-start)
- [Documentation](#documentation)
    - [Core Functions](#core-functions)
        - [`prompt()`](#prompt)
        - [`provider()`](#provider)
        - [`assertion()`](#assertion)
    - [Evaluation Methods](#evaluation-methods)
        - [`describe()`](#describe)
        - [`usingProvider()`](#usingprovider)
        - [`alwaysExpect()`](#alwaysexpect)
        - [`expect()`](#expect)
        - [`and()`](#and)
        - [`to()` and `group()`](#to-and-group)
    - [Assertion Methods](#assertion-methods)
    - [Provider Configuration](#provider-configuration)
        - [`id()`](#id)
        - [`label()`](#label)
        - [`temperature()`](#temperature)
        - [`maxTokens()`](#maxtokens)
        - [`topP()`](#topp)
        - [`frequencyPenalty()`](#frequencypenalty)
        - [`presencePenalty()`](#presencepenalty)
        - [`stop()`](#stop)
        - [`config()`](#config)
    - [Usage Examples](#usage-examples)
        - [Basic Example](#basic-example)
        - [Multiple Prompts](#multiple-prompts)
        - [Multiple Providers](#multiple-providers)
        - [Multiple Test Cases](#multiple-test-cases)
        - [Provider Configuration](#provider-configuration-1)
        - [Global Provider Registration](#global-provider-registration)
        - [Advanced Assertions](#advanced-assertions)
        - [LLM-Based Evaluation](#llm-based-evaluation)
        - [Complex Example](#complex-example)
    - [CLI Options](#cli-options)
        - [`--output`](#--output)
- [Credits & License](#credits--license)

Why Use This Plugin?
--------------------

- **Test prompts against multiple LLM providers** - Compare OpenAI, Anthropic, and more in a single test
- **Validate responses with content assertions** - Check for specific text, JSON validity, HTML structure, and more
- **Use LLM-based evaluation** - Judge responses with natural language rubrics using AI itself
- **Familiar Pest-style fluent API** - Feels natural if you're already using Pest
- **Automatic cleanup** - Temporary files are managed for you
- **Battle-tested** - Built on promptfoo's proven evaluation framework

Prerequisites
-------------

Before you begin, make sure you have:

- **PHP 8.3** or higher
- **Pest 4.0** or higher
- **Node.js and npm** - Required for promptfoo execution via `npx`
- **API keys for LLM providers** - You'll need keys for the providers you want to test

### Setting up API Keys

Set environment variables for the providers you'll use:

```
export OPENAI_API_KEY="your-openai-key-here"
export ANTHROPIC_API_KEY="your-anthropic-key-here"
```

If you're using Laravel or a similar framework with `.env` file support, you can add them there instead.

For more provider options and configuration, check out [promptfoo's provider documentation](https://www.promptfoo.dev/docs/providers/).

Installation
------------

Install the plugin via Composer:

```
composer require kevinpijning/pest-plugin-prompt --dev
```

The plugin automatically registers with Pest via package discovery - no additional configuration needed!

Quick Start
-----------

Here's the simplest possible example to get you started:

```
test('greeting prompt works correctly', function () {
    prompt('You are a helpful assistant. Greet {{name}} warmly.')
        ->usingProvider('openai:gpt-4o-mini')
        ->expect(['name' => 'Alice'])
        ->toContain('Alice');
});
```

**What's happening here?**

1. We create a prompt with variable interpolation using `{{name}}`
2. We specify OpenAI's GPT-4o-mini as our LLM provider
3. We test with the variable `name` set to "Alice"
4. We assert that the response contains "Alice"

When you run this test, the plugin will:

- Send the prompt to OpenAI with "Alice" substituted for `{{name}}`
- Receive the response
- Verify that "Alice" appears in the response
- Pass or fail the test accordingly
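
To make the substitution step concrete, here is a minimal sketch in plain PHP of how `{{name}}` placeholders might be filled in. It is illustrative only: the real rendering is done by promptfoo, whose template engine supports far more than this, and `renderPrompt()` is a hypothetical helper that is not part of the plugin.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the plugin): replace {{key}} placeholders
 * with values from $vars. Unknown placeholders are left untouched.
 */
function renderPrompt(string $template, array $vars): string
{
    return preg_replace_callback(
        '/\{\{\s*(\w+)\s*\}\}/',
        static fn (array $m): string => array_key_exists($m[1], $vars)
            ? (string) $vars[$m[1]]
            : $m[0],
        $template
    );
}

// Prints: You are a helpful assistant. Greet Alice warmly.
echo renderPrompt('You are a helpful assistant. Greet {{name}} warmly.', ['name' => 'Alice']), PHP_EOL;
```

The string printed here corresponds to what the provider would receive for the `['name' => 'Alice']` test case.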

Documentation
-------------

### Core Functions

#### `prompt()`

Create a new evaluation with one or more prompts. Use `{{variable}}` syntax for variable interpolation.

```
// Single prompt
prompt('You are a helpful assistant.');

// Multiple prompts (tested against each other)
prompt(
    'You are a helpful assistant.',
    'You are a professional assistant.'
);

// With variables
prompt('Greet {{name}} warmly.');
```

#### `provider()`

Register a global provider, similar to a Pest dataset, that can be reused across multiple tests. Providers registered with this function can be referenced by name in `usingProvider()`.

```
// Register a simple provider
provider('openai-gpt4')->id('openai:gpt-4');

// Register with full configuration
provider('custom-openai')
    ->id('openai:gpt-4')
    ->label('Custom OpenAI')
    ->temperature(0.7)
    ->maxTokens(2000);

// Use in tests
prompt('Hello')
    ->usingProvider('custom-openai')
    ->expect()
    ->toContain('Hi');
```

#### `assertion()`

Register a reusable assertion group by name. Groups can be defined fluently or with a callback that receives the `TestCase` (and optional parameters), and then reused via `to()` / `group()` or magic `toXxx` methods.

```
// Fluent group definition
assertion('be nice')
    ->toBeJudged('friendly')
    ->toContain('please');

prompt('Explain {{topic}}.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect(['topic' => 'AI'])
    ->to('be nice');

// Callback group with arguments
assertion('be kind', function (TestCase $tc, string $tone): void {
    $tc->toBeJudged("response is {$tone} and helpful")
        ->toContain($tone);
});

prompt('Explain {{topic}}.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect(['topic' => 'AI'])
    ->to('be kind', ['tone' => 'friendly']);

// Magic method equivalent of to('be nice') / to('be kind', ...)
prompt('Explain {{topic}}.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect(['topic' => 'AI'])
    ->toBeNice()
    ->toBeKind(['tone' => 'friendly']);
```

### Evaluation Methods

#### `describe()`

Add a description to your evaluation for better test output and debugging.

```
prompt('You are a helpful assistant.')
    ->describe('Tests basic assistant greeting')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toContain('Hello');
```

#### `usingProvider()`

Specify which LLM provider(s) to use for evaluation. You can pass provider IDs, `Provider` instances, callables, or registered provider names.

```
// Single provider by ID
prompt('Hello')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toContain('Hi');

// Multiple providers (compares responses)
prompt('What is 2+2?')
    ->usingProvider('openai:gpt-4o-mini', 'anthropic:claude-3')
    ->expect()
    ->toContain('4');

// Provider instance
$provider = provider()
    ->id('openai:gpt-4')
    ->temperature(0.7);

prompt('Hello')
    ->usingProvider($provider)
    ->expect()
    ->toContain('Hi');

// Use default provider (openai:gpt-4o-mini)
prompt('Hello')
    ->expect()
    ->toContain('Hi');
```

#### `alwaysExpect()`

Set default assertions and variables that apply to **all** test cases in the evaluation. This is useful when you want to ensure certain conditions are met for every test case without repeating the assertions.

```
prompt('Translate {{message}} to {{language}}.')
    ->usingProvider('openai:gpt-4o-mini')
    ->alwaysExpect(['message' => 'Hello World!'])
    ->toBeJudged('the language is always a friendly variant')
    ->toBeJudged('the source and output language are always mentioned in the response')
    ->expect(['language' => 'es'])
    ->toContain('hola')
    ->toBeJudged('Contains the translation of Hello world! in spanish');
```

**With callback:**

You can pass an optional callback function to configure the default test case:

```
prompt('Translate {{message}} to {{language}}.')
    ->usingProvider('openai:gpt-4o-mini')
    ->alwaysExpect(
        ['message' => 'Hello World!'],
        function (TestCase $testCase) {
            $testCase
                ->toBeJudged('the language is always a friendly variant')
                ->toBeJudged('the source and output language are always mentioned in the response');
        }
    )
    ->expect(['language' => 'es'])
    ->toContain('hola');
```

**Key points:**

- `alwaysExpect()` returns a `TestCase` instance that supports all assertion methods
- Assertions added via `alwaysExpect()` apply to every test case in the evaluation
- Default variables can be set and will be merged with test case variables
- You can chain multiple assertions after `alwaysExpect()` or use a callback
- The default test case is separate from regular test cases and won't appear in the `testCases()` array
- If `alwaysExpect()` is called multiple times, subsequent calls will execute the callback on the existing default test case

**Use cases:**

- Ensure all responses meet quality standards (e.g., "always be professional")
- Set common variables that apply to all tests
- Enforce safety checks across all test cases
- Apply format requirements universally (e.g., "always contain JSON")
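
The merging of default and per-test variables can be pictured with plain PHP arrays. This is a sketch under one assumption: that a test case's own variables take precedence when the same key appears in both places. Verify the plugin's actual behaviour before relying on it. `mergeVariables()` is a hypothetical name used only for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: combine default variables with test-case variables.
 * Assumption: test-case values override defaults on key collisions.
 */
function mergeVariables(array $defaults, array $testCase): array
{
    // For string keys, array_merge() lets later arrays overwrite earlier ones.
    return array_merge($defaults, $testCase);
}

$vars = mergeVariables(
    ['message' => 'Hello World!'], // as passed to alwaysExpect()
    ['language' => 'es']           // as passed to expect()
);

print_r($vars);
```

Under that assumption, both `{{message}}` and `{{language}}` are available when the prompt template is rendered for the test case.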

#### `expect()`

Create a test case with variables that will be substituted into your prompt template.

```
prompt('Greet {{name}} warmly.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect(['name' => 'Alice'])
    ->toContain('Alice');

// Multiple variables
prompt('{{greeting}}, {{name}}!')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect(['greeting' => 'Hello', 'name' => 'Bob'])
    ->toContain('Hello')
    ->toContain('Bob');

// Empty variables (no substitution)
prompt('You are a helpful assistant.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toContain('assistant');
```

**With callback:**

You can pass an optional callback function that receives the created `TestCase` instance. This is useful for grouping multiple assertions or applying conditional logic.

```
prompt('Greet {{name}} warmly.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect(['name' => 'Alice'], function (TestCase $testCase) {
        $testCase
            ->toContain('Alice')
            ->toContain('Hello')
            ->toBeJudged('response is friendly and welcoming');
    });

// Using arrow function
prompt('Translate {{text}} to {{language}}.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect(
        ['text' => 'Hello', 'language' => 'Spanish'],
        fn (TestCase $tc) => $tc
            ->toContain('Hola')
            ->toBeJudged('translation is accurate')
    );
```

#### `and()`

Chain multiple test cases for the same evaluation. Each call to `and()` creates a new test case with different variables.

```
prompt('Greet {{name}} warmly.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect(['name' => 'Alice'])
    ->toContain('Alice')
    ->and(['name' => 'Bob'])
    ->toContain('Bob')
    ->and(['name' => 'Charlie'])
    ->toContain('Charlie');
```

**With callback:**

You can pass an optional callback function that receives the newly created `TestCase`:

```
prompt('Greet {{name}} warmly.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect(['name' => 'Alice'])
    ->toContain('Alice')
    ->and(['name' => 'Bob'], function (TestCase $testCase) {
        $testCase
            ->toContain('Bob')
            ->toBeJudged('response is warm and friendly');
    })
    ->and(['name' => 'Charlie'], fn (TestCase $tc) => $tc->toContain('Charlie'));
```

#### `to()` and `group()`

Group multiple assertions together using a callback or invokable class. `to()` and `group()` are aliases of each other: each executes the callback with the current test case, allowing you to organize assertions logically.

**Using callbacks:**

```
prompt('Explain {{topic}} in detail.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect(['topic' => 'quantum computing'])
    ->to(function (TestCase $testCase) {
        $testCase
            ->toContain('quantum')
            ->toContain('computing')
            ->toBeJudged('explanation is clear and accurate')
            ->toHaveLatency(2000);
    });

// Using group() (same as to())
prompt('Analyze {{data}}.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect(['data' => 'sales figures'])
    ->group(function (TestCase $testCase) {
        $testCase
            ->toContain('analysis')
            ->toBeJudged('analysis is thorough');
    });

// Chaining multiple groups
prompt('Review {{document}}.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect(['document' => 'contract'])
    ->to(fn (TestCase $tc) => $tc->toContain('terms'))
    ->group(fn (TestCase $tc) => $tc->toBeJudged('review is comprehensive'))
    ->to(fn (TestCase $tc) => $tc->toHaveLatency(1500));
```

**Using invokable classes:**

You can also pass an invokable class (a class with an `__invoke()` method) to reuse assertion logic across multiple tests.

```
// Define an invokable class
class QualityAssertions
{
    public function __invoke(TestCase $testCase): void
    {
        $testCase
            ->toBeJudged('response is professional and accurate')
            ->toHaveLatency(2000)
            ->not->toBeRefused();
    }
}

// Use the class by FQN
prompt('Explain {{topic}}.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect(['topic' => 'AI'])
    ->to(QualityAssertions::class);

// Or use an instance
prompt('Explain {{topic}}.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect(['topic' => 'AI'])
    ->to(new QualityAssertions);

// Works with group() too
prompt('Analyze {{data}}.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect(['data' => 'metrics'])
    ->group(QualityAssertions::class);
```

**Key points:**

- `to()` and `group()` are functionally identical - use whichever reads better in your context
- Accepts a callable, an invokable object, or the FQN (fully qualified name) of an invokable class
- The callback/invokable receives the current `TestCase` instance
- Useful for organizing related assertions together
- Can be chained multiple times
- Works with all assertion methods

**Use cases:**

- Group related assertions for better code organization
- Apply conditional logic based on test case variables
- Reuse assertion patterns across multiple test cases with invokable classes
- Create reusable assertion libraries for common quality checks

### Assertion Methods

- [`toContain()`](#tocontain)
- [`toContainAll()`](#tocontainall)
- [`toContainAny()`](#tocontainany)
- [`toContainJson()`](#tocontainjson)
- [`toContainHtml()`](#tocontainhtml)
- [`toContainSql()`](#tocontainsql)
- [`toContainXml()`](#tocontainxml)
- [`toEqual()`](#toequal)
- [`toBe()`](#tobe)
- [`toBeJudged()`](#tobejudged)
- [`startsWith()`](#startswith)
- [`toMatchRegex()`](#tomatchregex)
- [`toBeJson()`](#tobejson)
- [`toEqualJson()`](#toequaljson)
- [`toMatchJsonStructure()`](#tomatchjsonstructure)
- [`toHaveJsonFragment()`](#tohavejsonfragment)
- [`toHaveJsonFragments()`](#tohavejsonfragments)
- [`toHaveJsonPath()`](#tohavejsonpath)
- [`toHaveJsonPaths()`](#tohavejsonpaths)
- [`toHaveJsonType()`](#tohavejsontype)
- [`toBeHtml()`](#tobehtml)
- [`toBeSql()`](#tobesql)
- [`toBeXml()`](#tobexml)
- [`toBeSimilar()`](#tobesimilar)
- [`toHaveLevenshtein()`](#tohavelevenshtein)
- [`toHaveRougeN()`](#tohaverougen)
- [`toHaveFScore()`](#tohavefscore)
- [`toHavePerplexity()`](#tohaveperplexity)
- [`toHavePerplexityScore()`](#tohaveperplexityscore)
- [`toHaveCost()`](#tohavecost)
- [`toHaveLatency()`](#tohavelatency)
- [`toHaveValidFunctionCall()`](#tohavevalidfunctioncall)
- [`toHaveValidOpenaiFunctionCall()`](#tohavevalidopenaifunctioncall)
- [`toHaveValidOpenaiToolsCall()`](#tohavevalidopenaitoolscall)
- [`toHaveToolCallF1()`](#tohavetoolcallf1)
- [`toHaveFinishReason()`](#tohavefinishreason)
- [`toBeClassified()`](#tobeclassified)
- [`toBeScoredByPi()`](#tobescoredbypi)
- [`toBeRefused()`](#toberefused)
- [`toPassJavascript()`](#topassjavascript)
- [`toPassPython()`](#topasspython)
- [`toPassWebhook()`](#topasswebhook)
- [`toHaveTraceSpanCount()`](#tohavetracespancount)
- [`toHaveTraceSpanDuration()`](#tohavetracespanduration)
- [`toHaveTraceErrorSpans()`](#tohavetraceerrorspans)
- [`not` Modifier](#not-modifier)

#### `toContain()`

Assert that the response contains specific text. Case-insensitive by default.

```
prompt('What is the capital of France?')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toContain('Paris');

// Case-sensitive matching
prompt('What is the capital of France?')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toContain('Paris', strict: true);

// With threshold (similarity score, 0.0 to 1.0)
prompt('Explain quantum computing.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toContain('quantum', threshold: 0.8);

// With custom options
prompt('What is AI?')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toContain('artificial intelligence', options: ['normalize' => true]);
```
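
The difference between the default and `strict: true` modes comes down to case sensitivity. The following plain-PHP sketch shows the idea; `responseContains()` is a hypothetical helper and not how the plugin or promptfoo implements the check.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: substring check that is case-insensitive by default
 * and case-sensitive when $strict is true.
 */
function responseContains(string $response, string $needle, bool $strict = false): bool
{
    return $strict
        ? str_contains($response, $needle)       // exact-case match
        : stripos($response, $needle) !== false; // case-insensitive (ASCII case folding)
}

$response = 'The capital of France is paris.';

var_dump(responseContains($response, 'Paris'));               // bool(true)
var_dump(responseContains($response, 'Paris', strict: true)); // bool(false)
```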

#### `toContainAll()`

Assert that the response contains all of the specified strings.

```
prompt('Describe a healthy meal.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toContainAll(['protein', 'vegetables', 'grains']);

// Case-sensitive
prompt('Describe a healthy meal.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toContainAll(['Protein', 'Vegetables'], strict: true);

// With threshold
prompt('Describe a healthy meal.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toContainAll(['protein', 'vegetables'], threshold: 0.9);
```

#### `toContainAny()`

Assert that the response contains at least one of the specified strings.

```
prompt('What is the weather like?')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toContainAny(['sunny', 'rainy', 'cloudy']);

// Case-sensitive
prompt('What is the weather like?')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toContainAny(['Sunny', 'Rainy'], strict: true);
```

#### `toContainJson()`

Assert that the response contains valid JSON.

```
prompt('Return user data as JSON: name, age, email')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toContainJson();
```

#### `toContainHtml()`

Assert that the response contains valid HTML.

```
prompt('Generate an HTML list of fruits')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toContainHtml();
```

#### `toContainSql()`

Assert that the response contains valid SQL.

```
prompt('Write a SQL query to select all users')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toContainSql();
```

#### `toContainXml()`

Assert that the response contains valid XML.

```
prompt('Generate XML for a product catalog')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toContainXml();
```

#### `toEqual()`

Assert that the response exactly equals the expected value. This is useful for deterministic outputs where you expect an exact match. You can also check whether it matches the expected JSON format.

```
prompt('Calculate 335 + 85. Return only the number.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toEqual(420);
```

#### `toBe()`

This is a convenience alias of `toEqual()`.

```
prompt('Calculate 335 + 85. Return only the number.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toBe(420);
```

#### `toBeJudged()`

Use an LLM to evaluate the response against a natural language rubric. This is useful for subjective quality checks.

```
prompt('Explain quantum computing to a beginner.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toBeJudged('The explanation should be clear, accurate, and use simple language.');

// With threshold (minimum score 0.0 to 1.0)
prompt('Write a product description.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toBeJudged('The description should be persuasive and highlight key features.', threshold: 0.8);

// With custom options
prompt('Write a product description.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toBeJudged('Should be professional and engaging.', options: ['provider' => 'openai:gpt-4']);
```

#### `startsWith()`

Assert that the response starts with a specific prefix.

```
prompt('Generate a greeting.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->startsWith('Hello');

// Case-sensitive
prompt('Generate a greeting.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->startsWith('Hello', strict: true);

// With threshold
prompt('Generate a greeting.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->startsWith('Hello', threshold: 0.9);
```

#### `toMatchRegex()`

Assert that the response matches a regular expression pattern.

```
prompt('Generate a phone number.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toMatchRegex('/\d{3}-\d{3}-\d{4}/');

// With threshold
prompt('Generate a phone number.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toMatchRegex('/\d{3}-\d{3}-\d{4}/', threshold: 0.9);
```

#### `toBeJson()`

Assert that the response is valid JSON (not just contains JSON).

```
prompt('Return user data as JSON: name, age, email')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toBeJson();

// With JSON schema validation
prompt('Return user data as JSON.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toBeJson([
        'type' => 'object',
        'properties' => [
            'name' => ['type' => 'string'],
            'age' => ['type' => 'number'],
        ],
        'required' => ['name', 'age'],
    ]);
```

#### `toEqualJson()`

Assert that the JSON output exactly equals the expected value. Object key order is ignored, but array order is preserved. This is similar to Laravel's `assertExactJson()`.

```
prompt('Extract the person info from: {{text}}')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect(['text' => 'John is 30 years old'])
    ->toEqualJson([
        'name' => 'John',
        'age' => 30,
    ]);

// Works with nested structures
prompt('Extract address info.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toEqualJson([
        'user' => [
            'name' => 'John',
            'address' => [
                'city' => 'Amsterdam',
            ],
        ],
    ]);
```
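
The rule "object key order is ignored, but array order is preserved" can be sketched in plain PHP by sorting the keys of JSON objects recursively while leaving lists untouched. This is one possible way to implement such a comparison, not necessarily what the plugin does; `canonicalize()` and `jsonEquals()` are hypothetical names.

```php
<?php

declare(strict_types=1);

/** Recursively sort the keys of associative arrays; leave list order untouched. */
function canonicalize(mixed $value): mixed
{
    if (!is_array($value)) {
        return $value;
    }

    // array_map() with a single array preserves its keys.
    $value = array_map('canonicalize', $value);

    if (!array_is_list($value)) {
        ksort($value);
    }

    return $value;
}

/** Hypothetical helper: compare a JSON string with an expected PHP array. */
function jsonEquals(string $actualJson, array $expected): bool
{
    $actual = json_decode($actualJson, true, 512, JSON_THROW_ON_ERROR);

    return canonicalize($actual) === canonicalize($expected);
}

var_dump(jsonEquals('{"age": 30, "name": "John"}', ['name' => 'John', 'age' => 30])); // bool(true)
var_dump(jsonEquals('{"tags": ["b", "a"]}', ['tags' => ['a', 'b']]));                  // bool(false)
```

Note that the strict comparison in this sketch also distinguishes `30` from `30.0` and `"30"`.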

#### `toMatchJsonStructure()`

Assert that the JSON output contains all expected keys. This validates structure without checking values, similar to Laravel's `assertJsonStructure()`.

```
// Simple key validation
prompt('Return user data as JSON.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toMatchJsonStructure(['name', 'age', 'email']);

// Nested structure validation
prompt('Return user with address.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toMatchJsonStructure([
        'name',
        'address' => ['street', 'city', 'country'],
    ]);

// Array items with wildcard (*)
prompt('Return a list of users.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toMatchJsonStructure([
        'users' => [
            '*' => ['id', 'name', 'email'],
        ],
    ]);
```
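
For readers unfamiliar with this style of structure definition, the following plain-PHP sketch shows how nested keys and the `*` wildcard could be interpreted. It is modelled loosely on the Laravel-style convention described above and is an assumption about the semantics, not the plugin's implementation; `matchesStructure()` is a hypothetical helper.

```php
<?php

declare(strict_types=1);

/** Hypothetical helper: check that $data contains every key named in $structure. */
function matchesStructure(array $structure, mixed $data): bool
{
    if (!is_array($data)) {
        return false;
    }

    foreach ($structure as $key => $value) {
        if ($key === '*' && is_array($value)) {
            // Wildcard: every item of the list must match the nested structure.
            foreach ($data as $item) {
                if (!matchesStructure($value, $item)) {
                    return false;
                }
            }
        } elseif (is_array($value)) {
            // Nested structure: the key must exist and its value must match.
            if (!array_key_exists($key, $data) || !matchesStructure($value, $data[$key])) {
                return false;
            }
        } elseif (!array_key_exists($value, $data)) {
            // Plain entry: $value is the name of a required key.
            return false;
        }
    }

    return true;
}

$data = json_decode('{"users": [{"id": 1, "name": "Ann", "email": "ann@example.com"}]}', true);

var_dump(matchesStructure(['users' => ['*' => ['id', 'name', 'email']]], $data)); // bool(true)
var_dump(matchesStructure(['users' => ['*' => ['id', 'phone']]], $data));         // bool(false)
```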

#### `toHaveJsonFragment()`

Assert that the JSON output contains specific key-value pairs. Similar to Laravel's `assertJsonFragment()`.

```
prompt('Extract person info from: {{text}}')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect(['text' => 'John Doe is 30 years old'])
    ->toHaveJsonFragment(['name' => 'John Doe'])
    ->toHaveJsonFragment(['age' => 30]);

// Works with nested values
prompt('Extract user with address.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHaveJsonFragment([
        'address' => ['city' => 'Amsterdam'],
    ]);
```

#### `toHaveJsonFragments()`

Assert that the JSON output contains all specified fragments.

```
prompt('Extract person info from: {{text}}')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect(['text' => 'Jane Smith is 25 years old and lives in Berlin'])
    ->toHaveJsonFragments([
        ['name' => 'Jane Smith'],
        ['age' => 25],
        ['city' => 'Berlin'],
    ]);
```

#### `toHaveJsonPath()`

Assert that a value exists at a specific JSON path. Supports dot notation, numeric array indices, and wildcards.

```
// Check path exists
prompt('Return user with address.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHaveJsonPath('name')
    ->toHaveJsonPath('address.city');

// Check path has specific value
prompt('Extract person info.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHaveJsonPath('name', 'John Doe')
    ->toHaveJsonPath('address.city', 'Amsterdam');

// Array index access
prompt('Return list of users.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHaveJsonPath('users.0.name')
    ->toHaveJsonPath('users.1.name', 'Jane');

// Wildcard for all array items
prompt('Return list of users.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHaveJsonPath('users.*.name')
    ->toHaveJsonPath('users.*.status', 'active');
```
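
The path syntax (dot notation, numeric indices, `*` wildcards) can be illustrated with a small resolver in plain PHP. This is a sketch of one plausible interpretation, in which a wildcard path counts as missing if any item lacks it; `valuesAtPath()` is a hypothetical helper and not part of the plugin.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: return every value found at $path, or null when the
 * path is missing (for a wildcard: missing in any item).
 */
function valuesAtPath(mixed $data, string $path): ?array
{
    $current = [$data];

    foreach (explode('.', $path) as $segment) {
        $next = [];

        foreach ($current as $node) {
            if (!is_array($node)) {
                return null;
            }

            if ($segment === '*') {
                // Fan out over every item of the array.
                array_push($next, ...array_values($node));
            } elseif (array_key_exists($segment, $node)) {
                $next[] = $node[$segment];
            } else {
                return null;
            }
        }

        $current = $next;
    }

    return $current;
}

$data = json_decode(
    '{"users": [{"name": "John", "status": "active"}, {"name": "Jane", "status": "active"}]}',
    true
);

echo json_encode(valuesAtPath($data, 'users.1.name')), PHP_EOL;   // ["Jane"]
echo json_encode(valuesAtPath($data, 'users.*.status')), PHP_EOL; // ["active","active"]
echo json_encode(valuesAtPath($data, 'users.0.email')), PHP_EOL;  // null
```

An existence check then reduces to "the result is not `null`", and a value check to "every returned value equals the expected one".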

#### `toHaveJsonPaths()`

Assert that multiple JSON paths exist, optionally with expected values.

```
// Check paths exist (array of strings)
prompt('Return user data.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHaveJsonPaths(['name', 'email', 'address.city']);

// Check paths with values (associative array)
prompt('Extract person info from: {{text}}')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect(['text' => 'Grace Lee is 28 years old and lives in Seoul'])
    ->toHaveJsonPaths([
        'name' => 'Grace Lee',
        'age' => 28,
        'city' => 'Seoul',
    ]);

// Mix of existence and value checks with wildcards
prompt('Return users list.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHaveJsonPaths([
        'users.*.name',
        'users.*.type' => 'customer',
    ]);
```

#### `toHaveJsonType()`

Assert that the value at a JSON path has the expected type. Supports: `string`, `number`, `boolean`, `array`, `object`, `null`.

```
// Basic type validation
prompt('Return user data.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHaveJsonType('name', 'string')
    ->toHaveJsonType('age', 'number')
    ->toHaveJsonType('active', 'boolean');

// Nested path type validation
prompt('Return user with address.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHaveJsonType('address', 'object')
    ->toHaveJsonType('address.city', 'string');

// Array and wildcard type validation
prompt('Return list of users.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHaveJsonType('users', 'array')
    ->toHaveJsonType('users.*.name', 'string')
    ->toHaveJsonType('users.*.age', 'number');
```
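
The six type names map onto PHP values roughly as in the sketch below. It assumes the JSON is decoded with objects kept as `stdClass`, so that `{}` and `[]` remain distinguishable; `jsonTypeOf()` is a hypothetical helper for illustration only.

```php
<?php

declare(strict_types=1);

/** Hypothetical helper: name the JSON type of a decoded value. */
function jsonTypeOf(mixed $value): string
{
    return match (true) {
        is_string($value) => 'string',
        is_int($value), is_float($value) => 'number',
        is_bool($value) => 'boolean',
        is_null($value) => 'null',
        $value instanceof stdClass => 'object',
        is_array($value) => 'array',
        default => throw new InvalidArgumentException('Not a decoded JSON value'),
    };
}

// Decode without the associative flag so objects stay distinct from arrays.
$user = json_decode('{"name": "Ann", "age": 28, "active": true, "tags": [], "address": {}}');

foreach (['name', 'age', 'active', 'tags', 'address'] as $key) {
    echo $key, ': ', jsonTypeOf($user->{$key}), PHP_EOL;
}
```

JSON has a single `number` type, which is why both PHP integers and floats map to it.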

#### `toBeHtml()`

Assert that the response is valid HTML.

```
prompt('Generate an HTML list of fruits')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toBeHtml();
```

#### `toBeSql()`

Assert that the response is valid SQL (not just contains SQL).

```
prompt('Write a SQL query to select all users')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toBeSql();

// With authority list (allowed SQL operations)
prompt('Write a SQL query.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toBeSql(['SELECT', 'INSERT']);
```

#### `toBeXml()`

Assert that the response is valid XML.

```
prompt('Generate XML for a product catalog')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toBeXml();
```

#### `toBeSimilar()`

Assert that the response is semantically similar to the expected value using embedding similarity.

```
prompt('Explain artificial intelligence.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toBeSimilar('AI is the simulation of human intelligence by machines');

// With threshold (default is 0.75)
prompt('Explain artificial intelligence.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toBeSimilar('AI explanation', threshold: 0.8);

// With custom embedding provider
prompt('Explain artificial intelligence.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toBeSimilar('AI explanation', provider: 'huggingface:sentence-similarity:model');

// Multiple expected values
prompt('Explain artificial intelligence.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toBeSimilar(['AI explanation', 'Machine intelligence', 'Artificial intelligence definition']);
```
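
For intuition about what the threshold means, the sketch below computes cosine similarity between two vectors, which is the usual measure for comparing embeddings. It assumes cosine similarity is what is applied here; the vectors are made-up placeholders, since real embeddings come from the embedding provider and have hundreds of dimensions.

```php
// Cosine similarity between two equal-length numeric vectors.
// Assumption: both arrays are lists with matching indexes.
function cosineSimilarity(array $a, array $b): float
{
    if ($a === [] || count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must be non-empty and of equal length.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;

    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }

    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0; // a zero vector has no direction to compare
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

// Placeholder embeddings for the response and the expected text.
$response = [0.2, 0.7, 0.1];
$expected = [0.25, 0.65, 0.05];

$passes = cosineSimilarity($response, $expected) >= 0.75; // 0.75 is the documented default
```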

#### `toHaveLevenshtein()`

Assert that the Levenshtein (edit) distance between the response and expected value is below a threshold.

```
prompt('Spell the word "hello".')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHaveLevenshtein('hello', threshold: 2.0);
```
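
PHP ships a built-in `levenshtein()` function, so the metric is easy to explore locally. The sketch below is only an illustration of the distance itself; `withinEditDistance()` is a hypothetical helper, and treating the threshold as inclusive is an assumption.

```php
// Assumption: the check is inclusive, i.e. it passes when distance <= threshold.
function withinEditDistance(string $response, string $expected, int $threshold): bool
{
    // levenshtein() counts byte-level insertions, deletions and substitutions,
    // so a multibyte character may count as more than one edit.
    return levenshtein($response, $expected) <= $threshold;
}

$missingLetter = withinEditDistance('helo', 'hello', 2);    // one deletion away
$caseAndBang   = withinEditDistance('Hello!', 'hello', 2);  // one substitution plus one insertion
$unrelated     = withinEditDistance('goodbye', 'hello', 2); // needs far more than two edits
```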

#### `toHaveRougeN()`

Assert that the ROUGE-N score is above a threshold.

```
prompt('Summarize this article.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHaveRougeN(1, 'Expected summary', threshold: 0.7);

// ROUGE-2
prompt('Summarize this article.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHaveRougeN(2, 'Expected summary', threshold: 0.6);
```
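
As a rough guide to choosing thresholds, ROUGE-N measures n-gram overlap between the response and a reference text. The sketch below implements the recall-oriented variant with naive whitespace tokenization; both choices are assumptions, and the scorer actually used may tokenize or normalize differently.

```php
// Split text into lowercase word n-grams (naive whitespace tokenization).
function ngrams(string $text, int $n): array
{
    $tokens = preg_split('/\s+/', strtolower(trim($text)), -1, PREG_SPLIT_NO_EMPTY);
    $grams = [];

    for ($i = 0; $i + $n <= count($tokens); $i++) {
        $grams[] = implode(' ', array_slice($tokens, $i, $n));
    }

    return $grams;
}

// ROUGE-N recall: overlapping n-grams divided by n-grams in the reference.
function rougeNRecall(string $candidate, string $reference, int $n): float
{
    $referenceCounts = array_count_values(ngrams($reference, $n));
    $candidateCounts = array_count_values(ngrams($candidate, $n));
    $total = array_sum($referenceCounts);

    if ($total === 0) {
        return 0.0; // reference is shorter than n tokens
    }

    $overlap = 0;
    foreach ($referenceCounts as $gram => $count) {
        // Clip matches so a repeated n-gram is not counted more often than it occurs.
        $overlap += min($count, $candidateCounts[$gram] ?? 0);
    }

    return $overlap / $total;
}

$reference = 'the cat sat on the mat';
$candidate = 'the cat lay on the mat';

$rouge1 = rougeNRecall($candidate, $reference, 1); // 5 of 6 unigrams match
$rouge2 = rougeNRecall($candidate, $reference, 2); // 3 of 5 bigrams match
```

The example shows why ROUGE-2 thresholds are typically set lower than ROUGE-1 thresholds: a single changed word breaks two bigrams but only one unigram.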

#### `toHaveFScore()`

Assert that the F-score is above a threshold.

```
prompt('Extract entities from the text.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHaveFScore('Expected entities', threshold: 0.8);
```

#### `toHavePerplexity()`

Assert that the perplexity is below a threshold.

```
prompt('Generate coherent text.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHavePerplexity(threshold: 10.0);
```
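
Perplexity is conventionally defined as the exponential of the negative mean token log-probability, so lower values mean the model was more confident in its own output. The sketch below assumes natural-log probabilities and assumes the provider returns per-token log-probabilities at all; the values used are illustrative.

```php
// Perplexity = exp(-(1/N) * sum(log p_i)) over the N generated tokens.
function perplexity(array $tokenLogProbs): float
{
    if ($tokenLogProbs === []) {
        throw new InvalidArgumentException('At least one token log-probability is required.');
    }

    $meanLogProb = array_sum($tokenLogProbs) / count($tokenLogProbs);

    return exp(-$meanLogProb);
}

// Illustrative values: every token was predicted with probability 0.5,
// which yields a perplexity of 2 (as uncertain as a fair coin flip per token).
$coinFlip = perplexity([log(0.5), log(0.5), log(0.5)]);

// A fully certain model (probability 1 for every token) has perplexity 1.
$certain = perplexity([0.0, 0.0]);
```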

#### `toHavePerplexityScore()`

Assert that the normalized perplexity score is below a threshold.

```
prompt('Generate coherent text.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHavePerplexityScore(threshold: 0.5);
```

#### `toHaveCost()`

Assert that the inference cost is below a maximum threshold.

```
prompt('Generate a short response.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHaveCost(0.01);
```

#### `toHaveLatency()`

Assert that the response latency is below a maximum threshold (in milliseconds).

```
prompt('Generate a quick response.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHaveLatency(1000);
```

#### `toHaveValidFunctionCall()`

Assert that the response contains a valid function call matching the provided schema.

```
prompt('Call the weather function.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHaveValidFunctionCall([
        'type' => 'object',
        'properties' => [
            'name' => ['type' => 'string'],
            'arguments' => ['type' => 'object'],
        ],
    ]);
```

#### `toHaveValidOpenaiFunctionCall()`

Assert that the response contains a valid OpenAI function call.

```
prompt('Call the weather function.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHaveValidOpenaiFunctionCall();
```

#### `toHaveValidOpenaiToolsCall()`

Assert that the response contains valid OpenAI tool calls.

```
prompt('Use the available tools.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHaveValidOpenaiToolsCall();
```

#### `toHaveToolCallF1()`

Assert that the F1 score comparing actual vs expected tool calls is above a threshold.

```
prompt('Call the weather and time functions.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHaveToolCallF1(['weather', 'time'], threshold: 0.8);
```
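
To see how the threshold behaves, the sketch below computes F1 over sets of tool names. It assumes the comparison is set-based (call order, duplicates, and arguments are ignored); `toolCallF1()` is a hypothetical helper, not the plugin's implementation.

```php
// F1 = harmonic mean of precision and recall over unique tool names.
function toolCallF1(array $actual, array $expected): float
{
    $actual = array_unique($actual);
    $expected = array_unique($expected);
    $hits = count(array_intersect($actual, $expected));

    if ($hits === 0) {
        return 0.0; // nothing in common (also covers empty inputs)
    }

    $precision = $hits / count($actual);   // share of made calls that were expected
    $recall = $hits / count($expected);    // share of expected calls that were made

    return 2 * $precision * $recall / ($precision + $recall);
}

$exact   = toolCallF1(['weather', 'time'], ['weather', 'time']);           // 1.0
$missing = toolCallF1(['weather'], ['weather', 'time']);                   // about 0.667
$extra   = toolCallF1(['weather', 'time', 'search'], ['weather', 'time']); // 0.8
```

Under these assumptions, a threshold of 0.8 tolerates one unexpected extra call but not a missing one.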

#### `toHaveFinishReason()`

Assert that the model stopped for the expected reason. You can use either a string or the `FinishReason` enum.

**Standard Finish Reasons:**

- `stop`: Natural completion (reached end of response, stop sequence matched)
- `length`: Token limit reached (`max_tokens` exceeded, context length reached)
- `content_filter`: Content filtering triggered due to safety policies
- `tool_calls`: Model made function/tool calls

```
use KevinPijning\Prompt\Enums\FinishReason;

// Using string
prompt('Generate a response.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHaveFinishReason('stop');

// Using enum
prompt('Generate a response.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHaveFinishReason(FinishReason::Stop);

// Check for tool calls
prompt('Use available tools.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHaveFinishReason(FinishReason::ToolCalls);
```

**Convenience Methods:**

For each finish reason, there's a dedicated convenience method:

```
// Natural completion
prompt('Generate a response.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHaveFinishReasonStop();

// Token limit reached
prompt('Generate a very long response.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHaveFinishReasonLength();

// Content filter triggered
prompt('Generate harmful content.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHaveFinishReasonContentFilter();

// Tool calls made
prompt('Use available tools.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHaveFinishReasonToolCalls();
```
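
The string and enum forms are interchangeable because each enum case corresponds to one of the standard reason strings. The stand-in below illustrates that mapping with a string-backed enum; `FinishReasonSketch` and `normalizeFinishReason()` are hypothetical, and the assumption that the real `FinishReason` enum is backed by these exact strings is not confirmed here.

```php
// Hypothetical stand-in for the plugin's FinishReason enum.
enum FinishReasonSketch: string
{
    case Stop = 'stop';
    case Length = 'length';
    case ContentFilter = 'content_filter';
    case ToolCalls = 'tool_calls';
}

// Reduce either form to the plain reason string before comparing.
function normalizeFinishReason(string|FinishReasonSketch $reason): string
{
    return $reason instanceof FinishReasonSketch ? $reason->value : $reason;
}

$same = normalizeFinishReason('tool_calls') === normalizeFinishReason(FinishReasonSketch::ToolCalls);

// tryFrom() returns null for strings that are not a known reason,
// which is one advantage of the enum form: typos surface early.
$unknown = FinishReasonSketch::tryFrom('stopped');
```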

#### `toBeClassified()`

Assert that a Hugging Face classifier assigns the expected class a score above a threshold.

```
// Sentiment analysis
prompt('Write a positive review.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toBeClassified(
        'huggingface:text-classification:distilbert-base-uncased-finetuned-sst-2-english',
        'POSITIVE',
        threshold: 0.8
    );

// Hate speech detection
prompt('Write a friendly message.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toBeClassified(
        'huggingface:text-classification:facebook/roberta-hate-speech-dynabench-r4-target',
        'nothate',
        threshold: 0.9
    );
```

#### `toBeScoredByPi()`

Use Pi Labs' preference scoring model as an alternative to LLM-as-a-judge.

```
prompt('Write a helpful response.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toBeScoredByPi('Is the response not apologetic and provides a clear, concise answer?', threshold: 0.8);
```

#### `toBeRefused()`

Assert that the LLM output indicates the model refused to perform the requested task.

```
prompt('Write harmful content.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toBeRefused();

// Ensure model does NOT refuse safe requests
prompt('What is 2+2?')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->not->toBeRefused();
```

#### `toPassJavascript()`

Assert that a custom JavaScript function validates the output.

```
prompt('Generate a response longer than 10 characters.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toPassJavascript('return output.length > 10;');
```

#### `toPassPython()`

Assert that a custom Python function validates the output.

```
prompt('Generate a response longer than 10 characters.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toPassPython('return len(output) > 10');
```

#### `toPassWebhook()`

Assert that the webhook at the given URL responds with a JSON body of `{"pass": true}`.

```
prompt('Generate a response.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toPassWebhook('https://example.com/validate');
```
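
On the receiving side, the endpoint only has to answer with a JSON object whose `pass` key is a boolean. The sketch below shows one way such a handler could look in plain PHP. It assumes the request body is JSON with the model output under an `output` key; that payload shape, the function name, and the length rule are all illustrative assumptions.

```php
// Hypothetical handler: derive pass/fail from the raw JSON body of the POST request.
function handleValidationRequest(string $rawBody): string
{
    $payload = json_decode($rawBody, true);

    // Assumption: the model output arrives under the "output" key.
    $output = is_array($payload) ? ($payload['output'] ?? null) : null;

    // Example rule: the output must be a string longer than 10 characters.
    $pass = is_string($output) && strlen(trim($output)) > 10;

    return json_encode(['pass' => $pass]);
}

// In a real endpoint this would be wired up roughly as:
//   header('Content-Type: application/json');
//   echo handleValidationRequest(file_get_contents('php://input'));

$ok = handleValidationRequest('{"output":"A sufficiently long answer"}');
$tooShort = handleValidationRequest('{"output":"short"}');
```

Keeping the decision logic in a pure function makes the webhook itself testable without making HTTP requests.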

#### `toHaveTraceSpanCount()`

Assert that the number of trace spans matching the given patterns falls within the `min` and `max` bounds.

```
prompt('Process the request.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHaveTraceSpanCount(['pattern1', 'pattern2'], min: 1, max: 5);
```

#### `toHaveTraceSpanDuration()`

Assert that trace span durations meet percentile and max duration thresholds.

```
prompt('Process the request.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->toHaveTraceSpanDuration(['pattern1'], percentile: 0.95, maxDuration: 1000.0);
```

#### `toHaveTraceErrorSpans()`

Detect errors in traces by status codes, attributes, and messages.

```
prompt('Process the request.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->not->toHaveTraceErrorSpans();
```

#### `not` Modifier

Negate any assertion by using the `not` modifier.

```
prompt('Write a happy birthday message.')
    ->usingProvider('openai:gpt-4o-mini')
    ->expect()
    ->not->toContain('condolences');
```

### Provider Configuration

When creating or configuring providers, you can use these methods:

#### `id()`

Set the provider identifier (e.g., `'openai:gpt-4'`, `'anthropic:claude-3'`).

```
provider()
    ->id('openai:gpt-4o-mini');
```

#### `label()`

Set a custom label for the provider (useful in test output).

```
provider()
    ->id('openai:gpt-4')
    ->label('OpenAI GPT-4 Production');
```

#### `temperature()`

Control randomness in responses (0.0 to 1.0). Lower values make responses more deterministic.

```
provider()
    ->id('openai:gpt-4')
    ->temperature(0.7);
```

#### `maxTokens()`

Set the maximum number of tokens to generate.

```
provider()
    ->id('openai:gpt-4')
    ->maxTokens(2000);
```

#### `topP()`

Set the nucleus sampling parameter (0.0 to 1.0).

```
provider()
    ->id('openai:gpt-4')
    ->topP(0.9);
```

#### `frequencyPenalty()`

Penalize tokens in proportion to how often they have already appeared (-2.0 to 2.0).

```
provider()
    ->id('openai:gpt-4')
    ->frequencyPenalty(0.5);
```

#### `presencePenalty()`

Penalize tokens that have already appeared in the text so far, regardless of how often (-2.0 to 2.0).

```
provider()
    ->id('openai:gpt-4')
    ->presencePenalty(0.3);
```

#### `stop()`

Set the sequences at which generation should stop.

```
provider()
    ->id('openai:gpt-4')
    ->stop(["\n", 'Human:', 'AI:']); // double quotes so "\n" is a real newline
```

#### `config()`

Set custom configuration options for the provider. Accepts either an array (replaces config) or a closure (receives current config for merging).

```
// Replace config with array
provider()
    ->id('openai:gpt-4')
    ->config([
        'apiKey' => 'custom-key',
        'baseURL' => 'https://api.example.com',
    ]);

// Merge config with closure
provider()
    ->id('openai:gpt-4')
    ->config(['existing' => 'value'])
    ->config(fn (array $config) => [...$config, 'apiKey' => 'custom-key']);
```
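
The difference between the two forms is easiest to see in isolation. The stand-in class below mimics the semantics described above (an array replaces, a closure receives the current config and returns the new one); `ConfigHolderSketch` is hypothetical and is not the plugin's `Provider` class.

```php
// Hypothetical stand-in that mirrors the documented replace/merge behavior.
final class ConfigHolderSketch
{
    private array $config = [];

    public function config(array|Closure $config): static
    {
        // A closure gets the current config and decides what to keep;
        // a plain array simply overwrites whatever was set before.
        $this->config = $config instanceof Closure ? $config($this->config) : $config;

        return $this;
    }

    public function toArray(): array
    {
        return $this->config;
    }
}

// Two array calls: the second replaces the first.
$replaced = (new ConfigHolderSketch())
    ->config(['existing' => 'value'])
    ->config(['apiKey' => 'custom-key'])
    ->toArray();

// Array followed by closure: the closure spreads the old keys into the new array.
$merged = (new ConfigHolderSketch())
    ->config(['existing' => 'value'])
    ->config(fn (array $config) => [...$config, 'apiKey' => 'custom-key'])
    ->toArray();
```

Spreading arrays with string keys requires PHP 8.1 or later, which is covered by this package's PHP ^8.3 requirement.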

#### Extending Provider

The `Provider` class uses Pest's `Extendable` trait, allowing you to add custom methods:

```
// Register a custom extension
provider()->extend('withJsonMode', function (Provider $provider): void {
    $provider->config(fn (array $config) => [
        ...$config,
        'response_format' => ['type' => 'json_object'],
    ]);
});

// Use the extension
provider()
    ->id('openai:gpt-4')
    ->withJsonMode()
    ->temperature(0.7);

// Create presets
provider()->extend('preset', function (Provider $provider, string $name): void {
    match ($name) {
        'creative' => $provider->temperature(0.9)->topP(0.95),
        'precise' => $provider->temperature(0.1)->topP(0.1),
        default => null,
    };
});

provider()
    ->id('openai:gpt-4')
    ->preset('creative')
    ->maxTokens(1000);
```

### Usage Examples

#### Basic Example

```
test('assistant greets user correctly', function () {
    prompt('You are a helpful assistant. Greet {{name}} warmly.')
        ->usingProvider('openai:gpt-4o-mini')
        ->expect(['name' => 'Alice'])
        ->toContain('Alice');
});
```

#### Multiple Prompts

Test multiple prompt variations against the same test cases.

```
test('prompt variations work', function () {
    prompt(
        'You are a helpful assistant.',
        'You are a professional assistant.',
        'You are a friendly assistant.'
    )
        ->usingProvider('openai:gpt-4o-mini')
        ->expect()
        ->toContain('assistant');
});
```

#### Multiple Providers

Compare responses across different LLM providers.

```
test('providers give consistent answers', function () {
    prompt('What is 2+2?')
        ->usingProvider('openai:gpt-4o-mini', 'anthropic:claude-3')
        ->expect()
        ->toContain('4');
});
```

#### Multiple Test Cases

Test the same prompt with different variable values.

```
test('greeting works for different names', function () {
    prompt('Greet {{name}} warmly.')
        ->usingProvider('openai:gpt-4o-mini')
        ->expect(['name' => 'Alice'])
        ->toContain('Alice')
        ->and(['name' => 'Bob'])
        ->toContain('Bob')
        ->and(['name' => 'Charlie'])
        ->toContain('Charlie');
});
```
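
Each test case supplies values for the `{{placeholder}}` variables in the prompt. As a mental model only, the substitution can be sketched in a few lines of PHP. The real rendering is delegated to promptfoo's templating, which supports far more than plain substitution, so `renderPrompt()` below is a hypothetical simplification.

```php
// Hypothetical simplification: replace {{name}}-style placeholders with values.
function renderPrompt(string $template, array $vars): string
{
    return preg_replace_callback(
        '/\{\{\s*(\w+)\s*\}\}/',
        // Unknown variables are left untouched so they are easy to spot.
        static fn (array $match): string => (string) ($vars[$match[1]] ?? $match[0]),
        $template
    ) ?? $template;
}

$forAlice = renderPrompt('Greet {{name}} warmly.', ['name' => 'Alice']);
$forBob   = renderPrompt('Greet {{ name }} warmly.', ['name' => 'Bob']);
$unset    = renderPrompt('Greet {{name}} warmly.', []);
```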

#### Default Test Cases

Use `alwaysExpect()` to set assertions that apply to all test cases.

```
test('all translations meet quality standards', function () {
    prompt('Translate {{message}} to {{language}} in the style {{style}}.')
        ->usingProvider('openai:gpt-4o-mini')
        ->alwaysExpect(['style' => 'friendly'])
        ->toBeJudged('the translation is always accurate and natural')
        ->toBeJudged('the response is always in a friendly tone')
        ->expect(['message' => 'Hello', 'language' => 'es'])
        ->toContain('hola')
        ->expect(['message' => 'Goodbye', 'language' => 'fr'])
        ->toContain('au revoir');
});
```

#### Provider Configuration

Configure providers with specific parameters.

```
test('creative writing with high temperature', function () {
    $creativeProvider = provider()
        ->id('openai:gpt-4')
        ->temperature(0.9)
        ->maxTokens(500);

    prompt('Write a creative story about {{topic}}.')
        ->usingProvider($creativeProvider)
        ->expect(['topic' => 'space exploration'])
        ->toContain('space');
});
```

#### Global Provider Registration

Register providers once and reuse them across tests.

```
provider('openai-gpt4')
    ->id('openai:gpt-4')
    ->temperature(0.7)
    ->maxTokens(2000);

test('uses registered provider', function () {
    prompt('Hello')
        ->usingProvider('openai-gpt4')
        ->expect()
        ->toContain('Hi');
});
```

#### Advanced Assertions

Combine multiple assertion types.

```
test('response meets multiple criteria', function () {
    prompt('Generate a user profile as JSON with name, email, and age.')
        ->usingProvider('openai:gpt-4o-mini')
        ->expect()
        ->toContainJson()
        ->toContainAll(['name', 'email', 'age'])
        ->toBeJudged('The JSON should be well-structured and include all required fields.');
});
```

#### LLM-Based Evaluation

Use AI to evaluate response quality.

```
test('response quality meets standards', function () {
    prompt('Explain machine learning to a beginner.')
        ->usingProvider('openai:gpt-4o-mini')
        ->expect()
        ->toBeJudged('The explanation should be clear, accurate, use simple language, and include examples.', threshold: 0.85);
});
```

#### Structured JSON Output Testing

Test structured JSON outputs from LLMs, particularly useful with OpenAI's Responses API and structured output features.

```
// Register a provider with structured output schema
provider('person-extractor', static fn (Provider $provider): Provider => $provider
    ->id('openai:responses:gpt-4o-mini')
    ->config([
        'response_format' => [
            'name' => 'person_info',
            'type' => 'json_schema',
            'strict' => true,
            'schema' => [
                'type' => 'object',
                'properties' => [
                    'name' => ['type' => 'string'],
                    'age' => ['type' => 'number'],
                    'city' => ['type' => 'string'],
                ],
                'required' => ['name', 'age', 'city'],
                'additionalProperties' => false,
            ],
        ],
    ]));

test('extracts person info with full validation', function () {
    prompt('Extract the person info from this text: {{text}}')
        ->describe('Testing structured JSON output')
        ->usingProvider('person-extractor')
        ->expect(['text' => 'John Doe is 30 years old and lives in Amsterdam.'])
        // Validate structure
        ->toMatchJsonStructure(['name', 'age', 'city'])
        // Validate specific values
        ->toHaveJsonFragment(['name' => 'John Doe', 'city' => 'Amsterdam'])
        // Validate types
        ->toHaveJsonType('name', 'string')
        ->toHaveJsonType('age', 'number')
        // Validate exact match
        ->toEqualJson([
            'name' => 'John Doe',
            'age' => 30,
            'city' => 'Amsterdam',
        ]);
});

// Testing array outputs with nested structures
provider('people-extractor', static fn (Provider $provider): Provider => $provider
    ->id('openai:responses:gpt-4o-mini')
    ->config([
        'response_format' => [
            'name' => 'people_list',
            'type' => 'json_schema',
            'strict' => true,
            'schema' => [
                'type' => 'object',
                'properties' => [
                    'people' => [
                        'type' => 'array',
                        'items' => [
                            'type' => 'object',
                            'properties' => [
                                'name' => ['type' => 'string'],
                                'role' => ['type' => 'string'],
                            ],
                            'required' => ['name', 'role'],
                        ],
                    ],
                ],
                'required' => ['people'],
            ],
        ],
    ]));

test('extracts multiple people with array validation', function () {
    prompt('Extract all people from: {{text}}')
        ->usingProvider('people-extractor')
        ->expect(['text' => 'The team has Mike (developer) and Sarah (designer).'])
        // Validate array structure with wildcard
        ->toMatchJsonStructure([
            'people' => [
                '*' => ['name', 'role'],
            ],
        ])
        // Validate array item access
        ->toHaveJsonPath('people.0.name')
        ->toHaveJsonPath('people.1.name')
        // Validate all items have specific type
        ->toHaveJsonType('people', 'array')
        ->toHaveJsonType('people.*.name', 'string')
        ->toHaveJsonType('people.*.role', 'string');
});
```

#### Complex Example

A comprehensive example showing multiple features together.

```
// Register global providers
provider('support-gpt4')
    ->id('openai:gpt-4')
    ->temperature(0.3);

provider('support-claude')
    ->id('anthropic:claude-3')
    ->temperature(0.3);

test('customer service prompt evaluation', function () {
    // Test multiple prompts across multiple providers
    prompt(
        'You are a customer support agent. Help the customer with: {{issue}}',
        'As a support agent, assist with: {{issue}}'
    )
        ->describe('Customer service prompt evaluation')
        ->usingProvider('support-gpt4', 'support-claude')
        ->expect(['issue' => 'refund request'])
        ->toContainAll(['refund', 'help'], strict: false)
        ->toBeJudged('Response should be professional, empathetic, and helpful.', threshold: 0.8)
        ->and(['issue' => 'product question'])
        ->toContainAny(['product', 'feature', 'specification'])
        ->toBeJudged('Response should accurately answer the product question.');
});
```
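
When budgeting API cost for a test like this, it helps to count how many model calls it implies. The sketch below assumes every prompt is evaluated against every provider for every test case, which is the usual layout of a promptfoo evaluation; `evaluationMatrix()` is a hypothetical helper and does not reflect how the plugin schedules runs internally.

```php
// Build the full prompt x provider x test-case grid.
function evaluationMatrix(array $prompts, array $providers, array $testCases): array
{
    $runs = [];

    foreach ($prompts as $prompt) {
        foreach ($providers as $provider) {
            foreach ($testCases as $vars) {
                $runs[] = ['prompt' => $prompt, 'provider' => $provider, 'vars' => $vars];
            }
        }
    }

    return $runs;
}

$runs = evaluationMatrix(
    [
        'You are a customer support agent. Help the customer with: {{issue}}',
        'As a support agent, assist with: {{issue}}',
    ],
    ['support-gpt4', 'support-claude'],
    [['issue' => 'refund request'], ['issue' => 'product question']],
);

echo count($runs); // 8 (2 prompts x 2 providers x 2 test cases)
```

Under that assumption, LLM-graded assertions such as `toBeJudged()` add further model calls on top of these eight.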

### CLI Options

#### `--output`

Save promptfoo evaluation results to a directory. Useful for debugging and analysis.

```
# Use default output directory (prompt-tests-output/)
vendor/bin/pest --output

# Specify custom output directory
vendor/bin/pest --output=my-results/

# Alternative syntax
vendor/bin/pest --output my-results/
```

The output directory will contain HTML reports and JSON data from promptfoo evaluations.

Parallel Test Support
---------------------

This plugin supports parallel test execution with Pest's `--parallel` flag. Cache isolation and merging are handled automatically.

```
vendor/bin/pest --parallel
```

Credits & License
---------------------

**Created by:** Kevin Pijning

**Built on the shoulders of giants:**

- [Pest](https://pestphp.com) - The elegant PHP testing framework
- [promptfoo](https://www.promptfoo.dev/) - LLM evaluation framework
- [Symfony Components](https://symfony.com) - Process and YAML handling

**License:** MIT License

See the [LICENSE](LICENSE.md) file for full details.

---

**Ready to start testing your prompts?** Install the plugin and write your first test in under a minute. Happy testing!
