ChatJS uses Evalite to evaluate AI agent outputs. Evaluations help you measure response quality, catch regressions, and iterate on prompts with confidence.

How It Works

Evaluations follow a simple structure:
  1. Data: Test cases with inputs and expected outputs
  2. Task: The AI operation being evaluated
  3. Scorers: Functions that grade the output
Results are stored in a local SQLite database and displayed in a web UI.
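
For example, a minimal eval file tying the three parts together might look like this (the test case, stubbed task, and Levenshtein scorer are illustrative, not taken from the ChatJS codebase):

import { evalite } from "evalite";
import { Levenshtein } from "autoevals";

evalite("Capital cities", {
  // Data: test cases with inputs and expected outputs
  data: async () => [
    { input: "What is the capital of France?", expected: "Paris" },
  ],
  // Task: the AI operation being evaluated (stubbed here;
  // a real eval would call the model or agent)
  task: async (input) => {
    return "Paris";
  },
  // Scorers: functions that grade the output against the expected value
  scorers: [Levenshtein],
});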

Running Evaluations

# Watch mode (re-runs on file changes)
bun eval:dev

# View results UI
bun eval:serve

Writing Evaluations

Create files with the .eval.ts extension in the evals/ directory; see the existing evals for examples. Use runCoreChatAgentEval from lib/ai/eval-agent.ts to wrap the core chat agent so it can be invoked from eval files.
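
A sketch of what such a file might look like, assuming runCoreChatAgentEval accepts a user message and resolves to the agent's final text (check lib/ai/eval-agent.ts for the actual signature, and adjust the import path to match):

import { evalite } from "evalite";
import { Levenshtein } from "autoevals";
// Assumed relative import path from evals/ to lib/ai/eval-agent.ts
import { runCoreChatAgentEval } from "../lib/ai/eval-agent";

evalite("Core chat agent: greetings", {
  data: async () => [
    { input: "Hi there!", expected: "Hello! How can I help you today?" },
  ],
  task: async (input) => {
    // Assumption: takes the user message, returns the agent's response text
    return runCoreChatAgentEval(input);
  },
  scorers: [Levenshtein],
});

Saved as evals/greeting.eval.ts, this should be picked up automatically by bun eval:dev.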

Custom Scorers

Scorers receive the input, the output, and the expected value, and return a score between 0 and 1. Ready-made scorers such as ExactMatch, AnswerCorrectness, and AnswerRelevancy are available from the autoevals package, which Evalite is designed to work with.
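
A sketch of a custom scorer using Evalite's createScorer helper (the name and matching logic are illustrative):

import { createScorer } from "evalite";

const mentionsExpected = createScorer<string, string>({
  name: "Mentions Expected",
  description: "1 if the output contains the expected string, else 0.",
  scorer: ({ input, output, expected }) => {
    // Guard against test cases that omit an expected value
    if (!expected) return 0;
    return output.toLowerCase().includes(expected.toLowerCase()) ? 1 : 0;
  },
});

Pass it alongside any ready-made scorers, e.g. scorers: [mentionsExpected, Levenshtein].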