🖌️ Create Evaluators
To create a new evaluator, click 'Evaluators' in the left panel, then click the 'Create evaluator' button. Fill out the form with the details of the desired evaluator (a rough sketch of these fields follows the list):

- The name of the evaluator.
- A description of what the evaluator does.
- The evaluator type (the supported types are listed in the table at the end of this page).
- The criterion you created earlier, to link to the evaluator.
- The LLM configuration that powers the evaluator.
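Taken together, the form collects a handful of fields. The sketch below is purely illustrative; the field names are assumptions made for this example, not the platform's exact schema.

```python
# Illustrative sketch of the information the 'Create evaluator' form collects.
# Field names are assumptions for this example, not the platform's exact schema.
evaluator_config = {
    "name": "helpfulness-judge",                           # Name of the evaluator
    "description": "Judges whether answers are helpful.",  # What the evaluator does
    "type": "Criterion",                                   # One of the evaluator types (see the table below)
    "criterion": "helpfulness",                            # The criterion created earlier
    "llm_config": "gpt-4",                                 # The LLM configuration powering the judge
}
```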
To use an LLM as a judge, you first need to create a criterion:

1. Click on 'Create criterion'.
2. Describe your criterion in natural language, detailing what good performance looks like. The LLM will use this description as the standard to evaluate your data. For example: "The response should answer the question accurately, rely only on the provided context, and avoid speculation."
3. Click on 'Create' to save the criterion.
We currently support models from OpenAI, Anthropic, and open-source models through Replicate.
Note that only evaluators with input and output variables can be triggered for online evaluation in production.
To automatically evaluate an inference, you can add triggers on the evaluator's detail page:

1. Select the evaluator you just created.
2. Navigate to the 'Triggers' section within the evaluator's detail page and click on 'Add tag'.
3. Ensure that the tags specified in the evaluator's triggers are included when you stream your inference data.

Follow the full API documentation for detailed instructions on how to implement triggers in your inference stream.
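As a rough sketch of what this looks like in practice, the snippet below attaches trigger tags to an inference payload. The endpoint URL, headers, and field names are assumptions made for illustration; the full API documentation is the authoritative reference.

```python
import requests

# Hypothetical example: the endpoint, headers, and field names are placeholders,
# not the platform's actual API. See the full API documentation for the real format.
payload = {
    "input": "What is the capital of France?",
    "output": "The capital of France is Paris.",
    # Include the same tags that are configured in the evaluator's triggers
    # so this inference is picked up for online evaluation.
    "tags": ["production", "geography-qa"],
}

response = requests.post(
    "https://api.example.com/v1/inferences",  # placeholder URL
    json=payload,
    headers={"Authorization": "Bearer <YOUR_API_KEY>"},
    timeout=10,
)
response.raise_for_status()
```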
The following evaluator types are available:

| Name | Description | Variables |
|---|---|---|
| Criterion | Evaluates a model based on a custom criterion. Use this evaluator when you have a custom criterion to evaluate the LLM output without needing a ground truth reference. | Input, Prediction |
| Labeled Criterion | Evaluates a model based on a custom criterion, with a reference label. Use this evaluator when you have a custom criterion and would like to evaluate an output against a reference label. | Input, Prediction, Reference |
| Question Answering | Evaluates whether the prediction correctly answers the question posed in the input, compared against a reference label. | Input, Prediction, Reference |
| Chain of Thought Question Answering | Given an input question, determines whether the prediction is correct using a step-by-step reasoning process, compared against a reference label. | Input, Prediction, Reference |
| Context Question Answering | Evaluates whether the prediction correctly answers the question posed in the input, using the context provided by the reference. | Input, Prediction, Reference |
| Score String | Scores the output on a scale of 1 to 10 based on a custom criterion. Use this evaluator when you have a custom criterion to evaluate the LLM output without needing a ground truth reference. | Input, Prediction |
| Labeled Score String | Gives the prediction a score between 1 and 10 based on a ground truth reference label. Use this evaluator when you have a custom criterion to evaluate the LLM output against a ground truth reference. | Input, Prediction, Reference |
| Pairwise String | Predicts the preferred prediction between two models. When you have two predictions generated for the same input, this evaluator helps you choose the preferred one based on your custom criterion. | Input, Prediction, Prediction_b |
| Labeled Pairwise String | Predicts the preferred prediction between two models based on a ground truth reference label. When you have two predictions generated for the same input, this evaluator helps you choose the preferred one based on both your ground truth reference and custom criterion. | Input, Prediction, Prediction_b, Reference |
| Pairwise String Distance | Compares two predictions using string edit distances. | Prediction, Prediction_b |
| String Distance | Compares a prediction to a reference answer using string edit distances. | Prediction, Reference |
| Embedding Distance | Compares a prediction to a reference answer using embedding distances. | Prediction, Reference |
| Exact Match | Compares a prediction to a reference answer using exact matching. | Prediction, Reference |
| Regex Match | Compares a prediction to a reference answer using regular expressions. | Prediction, Reference |
| JSON Validity | Checks if a prediction is valid JSON. | Prediction |
| JSON Equality | Tests if a prediction is equal to a reference JSON. | Prediction, Reference |
| JSON Edit Distance | Computes a distance between two canonicalized JSON strings. Available algorithms include Damerau-Levenshtein, Levenshtein, Jaro, Jaro-Winkler, Hamming, and Indel. | Prediction, Reference |
| JSON Schema Validation | Checks if a prediction is valid JSON according to a JSON schema. | Prediction |
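The non-LLM evaluators at the bottom of the table are deterministic checks. The sketch below is not the platform's implementation; it is a minimal illustration of the kind of comparison three of them perform: string edit distance (Levenshtein), regex matching, and JSON validity.

```python
import json
import re


def levenshtein(prediction: str, reference: str) -> int:
    """Levenshtein edit distance, one of the metrics a String Distance evaluator can use."""
    prev = list(range(len(reference) + 1))
    for i, a in enumerate(prediction, 1):
        curr = [i]
        for j, b in enumerate(reference, 1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (a != b),  # substitution
            ))
        prev = curr
    return prev[-1]


def regex_match(prediction: str, reference_pattern: str) -> bool:
    """Regex Match: does the prediction match the reference pattern?"""
    return re.search(reference_pattern, prediction) is not None


def json_validity(prediction: str) -> bool:
    """JSON Validity: can the prediction be parsed as JSON?"""
    try:
        json.loads(prediction)
        return True
    except json.JSONDecodeError:
        return False


print(levenshtein("The capital is Paris", "The capital is Paris."))  # 1
print(regex_match("Order #12345 has shipped", r"#\d{5}"))            # True
print(json_validity('{"city": "Paris"}'))                            # True
```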