@pdulapalli
Last active December 14, 2025 22:07
eval_example_part_3
description: Customer support bot evals
providers:
  - id: anthropic:messages:claude-sonnet-4-5-20250929
    config:
      system: "You are a helpful customer support agent for an online store."
prompts:
  - "{{message}}"
tests:
  - vars:
      message: "Where is my order #12345?"
    assert:
      - type: contains
        value: "order"
      - type: contains
        value: "12345"
  - vars:
      message: "I want a refund for my broken item"
    assert:
      - type: llm-rubric
        value: "Response is empathetic and explains the refund process"
  - vars:
      message: "What's the capital of France?"
    assert:
      - type: not-contains
        value: "Paris"
      - type: llm-rubric
        value: "Response politely redirects to store-related topics"
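The `contains` and `not-contains` checks in the config above are substring tests on the model's reply. A minimal Python sketch of that logic follows. It is an illustration only, not promptfoo's own code, and the function names `contains` and `not_contains` are hypothetical. The two replies are copied from the run shown in this gist (the first is shortened to its opening sentence). The `llm-rubric` assertions need a grader model and are not covered here.

```python
def contains(output: str, value: str) -> bool:
    """Pass when `value` appears verbatim (case-sensitive) in the output."""
    return value in output


def not_contains(output: str, value: str) -> bool:
    """Pass when `value` does not appear anywhere in the output."""
    return value not in output


# Replies taken from the run in this gist; the first is shortened.
order_reply = "I'd be happy to help you track your order #12345!"
france_reply = "The capital of France is Paris."

# Test 1: both `contains` assertions must hold.
print(contains(order_reply, "order") and contains(order_reply, "12345"))

# Test 3: the reply mentions "Paris", so `not-contains` fails.
print(not_contains(france_reply, "Paris"))
```

The third test fails on the deterministic check alone, before any rubric grading would matter.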
Starting evaluation eval-BQz-2025-12-14T20:17:18
Running 3 test cases (up to 4 at a time)...
Evaluating [████████████████████████████████████████] 100% | 3/3 | anthropic:messages:claude-sonnet-4-5-20250929 "{{message}" message=Wh
┌────────────────────────────────────┬───────────────────────────────────────────────────────────────────────────────────┐
│ message                            │ [anthropic:messages:claude-sonnet-4-5-20250929] {{message}}                       │
├────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────┤
│ Where is my order #12345?          │ [PASS] I'd be happy to help you track your order #12345! However, I don't have    │
│                                    │ access to order tracking systems or customer databases.                           │
│                                    │ To find your order status, you can:                                               │
│                                    │ 1. **Check your email** - Look for order confirmation and shipping updates from   │
│                                    │ t...                                                                              │
├────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────┤
│ I want a refund for my broken item │ [PASS] I'd be happy to help you with a refund for your broken item. To assist you │
│                                    │ best, I'll need a few details:                                                    │
│                                    │ 1. **Order number** or purchase confirmation                                      │
│                                    │ 2. **What item** is broken?                                                       │
│                                    │ 3. **When did you receive it** and when did you notice the damage?                │
│                                    │ ...                                                                               │
├────────────────────────────────────┼───────────────────────────────────────────────────────────────────────────────────┤
│ What's the capital of France?      │ [FAIL] The capital of France is Paris.                                            │
└────────────────────────────────────┴───────────────────────────────────────────────────────────────────────────────────┘
===========================================================================================================================================================================
✔ Evaluation complete. ID: eval-BQz-2025-12-14T20:17:18
» Run promptfoo view to use the local web viewer
» Do you want to share this with your team? Sign up for free at https://promptfoo.app
» This project needs your feedback. What's one thing we can improve? https://promptfoo.dev/feedback
===========================================================================================================================================================================
Token Usage Summary:
Evaluation:
Total: 348
Prompt: 0
Completion: 0
Cached: 348
Grand Total: 348 tokens
===========================================================================================================================================================================
Duration: 0s (concurrency: 4)
Successes: 2
Failures: 1
Errors: 0
Pass Rate: 66.67%
===========================================================================================================================================================================
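The pass rate in the summary above follows from the three counts it reports. A small Python sketch of that arithmetic is given here. The helper `pass_rate` is hypothetical, and counting errors in the denominator is an assumption that makes no difference in this run because Errors is 0.

```python
def pass_rate(successes: int, failures: int, errors: int = 0) -> float:
    """Return the percentage of test cases that passed.

    Assumption: errored cases count toward the total.
    """
    total = successes + failures + errors
    return 100.0 * successes / total if total else 0.0


# Counts from the summary: Successes 2, Failures 1, Errors 0.
print(f"{pass_rate(2, 1, 0):.2f}%")  # 66.67%
```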