🖌️Create Evaluators
To create a new evaluator, click on 'Evaluators' in the left panel, then click the 'Create evaluator' button. Fill out the form to create the desired evaluator. This includes:
Name of the evaluator.
Description of what the evaluator does.
The evaluator type.
The criterion you created earlier.
The LLM configuration that powers the evaluator.
Evaluator Type
| Name | Description | Variables | Use LLM | Use Criterion | Use Prompt |
|---|---|---|---|---|---|
| Criterion | Evaluates a model based on a custom criterion. Use this evaluator when you have a custom criterion to evaluate the LLM output without needing a ground truth reference. | Input, Prediction | | | |
| Labeled Criterion | Evaluates a model based on a custom criterion, with a reference label. Use this evaluator when you have a custom criterion and would like to evaluate an output against a reference label. | Input, Prediction, Reference | | | |
| Question Answering | Evaluates if the prediction answers the question posed in the input correctly, compared against a reference label. | Input, Prediction, Reference | | | |
| Chain of Thought Question Answering | Given an input question, determines if the prediction is correct using a step-by-step reasoning process, compared against a reference label. | Input, Prediction, Reference | | | |
| Context Question Answering | Evaluates if the prediction answers the question posed in the input correctly, using the context provided by the reference. | Input, Prediction, Reference | | | |
| Score String | Evaluates the output on a scale of 1 to 10 based on a custom criterion. Use this evaluator when you have a custom criterion to evaluate the LLM output without needing a ground truth reference. | Input, Prediction | | | |
| Labeled Score String | Gives a score between 1 and 10 to a prediction based on a ground truth reference label. Use this evaluator when you have a custom criterion to evaluate the LLM output compared to a ground truth reference. | Input, Prediction, Reference | | | |
| Pairwise String | Predicts the preferred prediction from two models. When you have two predictions generated for the same input, this evaluator helps you choose the preferred one based on your custom criterion. | Input, Prediction, Prediction_b | | | |
| Labeled Pairwise String | Predicts the preferred prediction from two models based on a ground truth reference label. When you have two predictions generated for the same input, this evaluator helps you choose the preferred one based on both your ground truth reference and custom criterion. | Input, Prediction, Prediction_b, Reference | | | |
| Pairwise String Distance | Compares two predictions using string edit distances. | Prediction, Prediction_b | | | |
| String Distance | Compares a prediction to a reference answer using string edit distances. | Prediction, Reference | | | |
| Embedding Distance | Compares a prediction to a reference answer using embedding distances. | Prediction, Reference | | | |
| Exact Match | Compares a prediction to a reference answer using exact matching. | Prediction, Reference | | | |
| Regex Match | Compares a prediction to a reference answer using regular expressions. | Prediction, Reference | | | |
| JSON Validity | Checks if a prediction is valid JSON. | Prediction | | | |
| JSON Equality | Tests if a prediction is equal to a reference JSON. | Prediction, Reference | | | |
| JSON Edit Distance | Computes a distance between two canonicalized JSON strings. Available algorithms include: Damerau-Levenshtein, Levenshtein, Jaro, Jaro-Winkler, Hamming, and Indel. | Prediction, Reference | | | |
| JSON Schema Validation | Checks if a prediction is valid JSON according to a JSON schema. | Prediction | | | |
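To make the deterministic (non-LLM) evaluator types more concrete, here is a minimal sketch of roughly what a few of them compute. The function names and the use of Python's standard library (including `difflib` as a stand-in for an edit-distance metric) are illustrative assumptions, not the platform's implementation.

```python
# Illustrative sketch of a few deterministic evaluator checks.
# These are assumptions for explanation only, not the platform's code.
import difflib
import json
import re


def exact_match(prediction: str, reference: str) -> bool:
    """Exact Match: the prediction must equal the reference verbatim."""
    return prediction.strip() == reference.strip()


def regex_match(prediction: str, pattern: str) -> bool:
    """Regex Match: the prediction must match the reference regular expression."""
    return re.search(pattern, prediction) is not None


def json_validity(prediction: str) -> bool:
    """JSON Validity: the prediction must parse as JSON."""
    try:
        json.loads(prediction)
        return True
    except json.JSONDecodeError:
        return False


def string_distance(prediction: str, reference: str) -> float:
    """String Distance (approximate): 0.0 = identical, 1.0 = completely different.
    Uses difflib's similarity ratio as a stand-in for an edit-distance metric."""
    return 1.0 - difflib.SequenceMatcher(None, prediction, reference).ratio()


print(exact_match("42", "42"))                       # True
print(regex_match("Order #12345 shipped", r"#\d+"))  # True
print(json_validity('{"status": "ok"}'))             # True
print(round(string_distance("kitten", "sitting"), 2))
```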
LLM as a judge
To use an LLM as a judge, you need to create a criterion and add an LLM configuration.
Create a criterion
Click on 'Create criterion'.
Describe your criterion in natural language, detailing what good performance looks like. The LLM will use this description as the standard to evaluate your data. For example: "The response answers the question accurately, uses only information present in the provided context, and contains no marketing language."
Click on 'Create' to save the criterion.
Add LLM config
We currently support models from OpenAI and Anthropic, as well as open-source models through Replicate.
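The exact configuration fields are set in the UI and depend on the provider; conceptually, an LLM config pairs a provider and model with generation settings. The field names below are illustrative assumptions, not the product's actual schema.

```python
# Illustrative only: these field names are assumptions for the example,
# not the product's actual configuration schema.
llm_config = {
    "provider": "openai",   # or "anthropic", or an open-source model via Replicate
    "model": "gpt-4o",
    "temperature": 0.0,     # low temperature keeps judgments consistent across runs
    "max_tokens": 512,
}
```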
Smart Trigger an online Evaluator
Note that only evaluators with input and output (prediction) variables can be triggered for online evaluation in production.
Add Trigger Logic
To automatically evaluate an inference, you can add triggers on the evaluator's detail page.
Select the evaluator you just created.
Navigate to the 'Triggers' section within the evaluator's detail page, and click on 'Add tag'.
Implement Trigger in Inference Stream
Ensure that the tags specified in the evaluators' triggers are included when you stream your inference data.
Follow the full API documentation for detailed instructions on how to implement triggers in your inference stream.
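As a purely hypothetical sketch of what including the trigger tags can look like when inference data is sent over HTTP: the endpoint URL, payload fields, and authentication header below are placeholders invented for this example; the real schema is defined in the API documentation.

```python
# Hypothetical example only: the endpoint URL, payload fields, and "tags" key
# are placeholders, not the product's real API. Consult the API documentation
# for the actual schema and authentication.
import requests

API_URL = "https://api.example.com/v1/inferences"   # placeholder endpoint
API_KEY = "YOUR_API_KEY"                            # placeholder credential

record = {
    "input": "What is the capital of France?",
    "prediction": "Paris is the capital of France.",
    # Include the same tags you configured on the evaluator's trigger,
    # so the online evaluator fires for this inference.
    "tags": ["production", "geo-qa"],
}

response = requests.post(
    API_URL,
    json=record,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=10,
)
response.raise_for_status()
```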