How It Works
Evaluations follow a simple structure (see the sketch after this list):

- Data: Test cases with inputs and expected outputs
- Task: The AI operation being evaluated
- Scorers: Functions that grade the output
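
For illustration, a minimal eval following that structure might look like the sketch below. The shape of the `evalite()` call follows Evalite's standard API; the stubbed task and the `Levenshtein` scorer from `autoevals` are placeholders, not necessarily what this project uses.

```ts
import { evalite } from "evalite";
import { Levenshtein } from "autoevals";

evalite("Capital cities", {
  // Data: test cases with inputs and expected outputs
  data: async () => [
    { input: "What is the capital of France?", expected: "Paris" },
  ],
  // Task: the AI operation being evaluated (stubbed here; replace with a real model call)
  task: async (input) => "Paris",
  // Scorers: functions that grade the output
  scorers: [Levenshtein],
});
```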
Running Evaluations
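Assuming the standard Evalite CLI, the suite is typically run with `npx evalite` (or `npx evalite watch` to re-run on file changes); check this project's package.json scripts for the exact command it wires up.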
Writing Evaluations
Create files with the `.eval.ts` extension in the `evals/` directory. See existing evals for examples.
Use `runCoreChatAgentEval` from `lib/ai/eval-agent.ts` to wrap the core chat agent for evaluation contexts, as in the sketch below.
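
A minimal sketch of such a file follows. The signature of `runCoreChatAgentEval` is assumed here (a prompt string in, the agent's final text out), as are the relative import path and the `Levenshtein` scorer; adapt them to the actual export and to whichever scorers the project uses.

```ts
// evals/core-chat.eval.ts
import { evalite } from "evalite";
import { Levenshtein } from "autoevals";
import { runCoreChatAgentEval } from "../lib/ai/eval-agent";

evalite("Core chat agent", {
  data: async () => [
    {
      input: "In one sentence, what does a unit test cover?",
      expected: "A unit test covers a single function or module in isolation.",
    },
  ],
  // Assumed signature: user prompt in, agent's final text out.
  task: async (input) => runCoreChatAgentEval(input),
  scorers: [Levenshtein],
});
```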
Custom Scorers
Scorers receive the output, the expected value, and the input, and return a number between 0 and 1. Evalite also provides built-in scorers like `exactMatch`, `answerCorrectness`, and `answerRelevancy`.
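
As a sketch, a custom scorer might be defined with Evalite's `createScorer` helper; the name and matching logic below are illustrative only.

```ts
import { createScorer } from "evalite";

// Full credit when the expected string appears in the output, none otherwise.
const containsExpected = createScorer<string, string>({
  name: "Contains Expected",
  description: "Checks whether the expected value appears in the output.",
  scorer: ({ output, expected }) => {
    if (!expected) return 0;
    return output.includes(expected) ? 1 : 0;
  },
});
```

It can then be listed in an eval's `scorers` array alongside the built-ins.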