🖌️ Create Evaluators

To create a new Evaluator, click 'Evaluators' in the left panel, then click the 'Create evaluator' button. Fill out the form to define the desired evaluator. This includes:

  • Name of the evaluator.

  • Description of what the evaluator does.

  • Select the evaluator type.

  • Link the criterion you created earlier.

  • Select the LLM configuration to power the evaluator.

Evaluator Types

Each evaluator type is listed below with a description and the variables it requires. Depending on the type, an evaluator may also use an LLM, a criterion, or a prompt. A short illustrative sketch of the deterministic (non-LLM) checks follows the list.

Criterion
Evaluates a model based on a custom criterion. Use this evaluator when you have a custom criterion to evaluate the LLM output without needing a ground truth reference.
Variables: Input, Prediction

Labeled Criterion
Evaluates a model based on a custom criterion, with a reference label. Use this evaluator when you have a custom criterion and would like to evaluate an output against a reference label.
Variables: Input, Prediction, Reference

Question Answering
Evaluates whether the prediction correctly answers the question posed in the input, compared against a reference label.
Variables: Input, Prediction, Reference

Chain of Thought Question Answering
Given an input question, determines whether the prediction is correct using a step-by-step reasoning process, compared against a reference label.
Variables: Input, Prediction, Reference

Context Question Answering
Evaluates whether the prediction correctly answers the question posed in the input, using the context provided by the reference.
Variables: Input, Prediction, Reference

Score String
Scores the output on a scale of 1 to 10 based on a custom criterion. Use this evaluator when you have a custom criterion to evaluate the LLM output without needing a ground truth reference.
Variables: Input, Prediction

Labeled Score String
Gives a score between 1 and 10 to a prediction based on a ground truth reference label. Use this evaluator when you have a custom criterion to evaluate the LLM output against a ground truth reference.
Variables: Input, Prediction, Reference

Pairwise String
Predicts the preferred prediction between two models. When you have two predictions generated for the same input, this evaluator helps you choose the preferred one based on your custom criterion.
Variables: Input, Prediction, Prediction_b

Labeled Pairwise String
Predicts the preferred prediction between two models based on a ground truth reference label. When you have two predictions generated for the same input, this evaluator helps you choose the preferred one based on both your ground truth reference and your custom criterion.
Variables: Input, Prediction, Prediction_b, Reference

Pairwise String Distance
Compares two predictions using string edit distances.
Variables: Prediction, Prediction_b

String Distance
Compares a prediction to a reference answer using string edit distances.
Variables: Prediction, Reference

Embedding Distance
Compares a prediction to a reference answer using embedding distances.
Variables: Prediction, Reference

Exact Match
Compares a prediction to a reference answer using exact matching.
Variables: Prediction, Reference

Regex Match
Compares a prediction to a reference answer using regular expressions.
Variables: Prediction, Reference

JSON Validity
Checks whether a prediction is valid JSON.
Variables: Prediction

JSON Equality
Tests whether a prediction is equal to a reference JSON.
Variables: Prediction, Reference

JSON Edit Distance
Computes a distance between two canonicalized JSON strings. Available algorithms include Damerau-Levenshtein, Levenshtein, Jaro, Jaro-Winkler, Hamming, and Indel.
Variables: Prediction, Reference

JSON Schema Validation
Checks whether a prediction is valid JSON according to a JSON schema.
Variables: Prediction
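The non-LLM evaluator types above are deterministic checks. As a rough, non-authoritative sketch of what they compute (not the platform's implementation), the snippet below implements a few of them in Python; the function names and the use of the `rapidfuzz` and `jsonschema` libraries are assumptions made purely for illustration.

```python
# Illustrative sketch only; not the platform's implementation.
# Assumes `pip install rapidfuzz jsonschema`.
import json
import re

from jsonschema import ValidationError, validate
from rapidfuzz.distance import DamerauLevenshtein, Levenshtein


def exact_match(prediction: str, reference: str) -> bool:
    """Exact Match: strict equality after trimming whitespace."""
    return prediction.strip() == reference.strip()


def regex_match(prediction: str, reference_pattern: str) -> bool:
    """Regex Match: does the prediction match the reference pattern?"""
    return re.search(reference_pattern, prediction) is not None


def string_distance(prediction: str, reference: str) -> float:
    """String Distance: normalized edit distance in [0, 1] (0 = identical)."""
    return Levenshtein.normalized_distance(prediction, reference)


def json_validity(prediction: str) -> bool:
    """JSON Validity: does the prediction parse as JSON?"""
    try:
        json.loads(prediction)
        return True
    except json.JSONDecodeError:
        return False


def _canonicalize(raw: str) -> str:
    """Parse and re-serialize JSON with sorted keys and no extra whitespace."""
    return json.dumps(json.loads(raw), sort_keys=True, separators=(",", ":"))


def json_edit_distance(prediction: str, reference: str) -> float:
    """JSON Edit Distance: edit distance between canonicalized JSON strings."""
    return DamerauLevenshtein.normalized_distance(
        _canonicalize(prediction), _canonicalize(reference)
    )


def json_schema_validation(prediction: str, schema: dict) -> bool:
    """JSON Schema Validation: is the prediction valid against the schema?"""
    try:
        validate(json.loads(prediction), schema)
        return True
    except (json.JSONDecodeError, ValidationError):
        return False


print(exact_match("Paris.", "Paris."))                          # True
print(json_edit_distance('{"a": 1, "b": 2}', '{"b":2,"a":1}'))  # 0.0
```

Normalized distances fall in [0, 1], with 0 meaning the two canonicalized strings are identical.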

LLM as a judge

To use an LLM as a judge, you need to create a criterion and add an LLM config to power the evaluator.

Create a criterion

  • Click on 'Create criterion'.

  • Describe your criterion in natural language, detailing what good performance looks like. The LLM will use this description as the standard to evaluate your data (a rough illustration follows these steps).

  • Click on 'Create' to save the criterion.
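For intuition about how the criterion is used: once saved, its text becomes the standard an LLM judge grades against. The sketch below is only a rough illustration under assumed choices (the OpenAI Python SDK, the gpt-4o-mini model, and a home-made prompt and Y/N answer format); it is not the platform's actual judge prompt or code.

```python
# Rough illustration of LLM-as-a-judge with a custom criterion.
# The prompt, model name, and output format below are assumptions for
# illustration; the platform's own judge prompt will differ.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CRITERION = (
    "Conciseness: the response answers the question directly, "
    "without filler or repetition."
)


def judge(input_text: str, prediction: str) -> str:
    """Ask an LLM whether the prediction satisfies the criterion."""
    prompt = (
        "You are grading an LLM output against a criterion.\n"
        f"Criterion: {CRITERION}\n\n"
        f"Input: {input_text}\n"
        f"Prediction: {prediction}\n\n"
        "Does the prediction satisfy the criterion? Answer 'Y' or 'N', "
        "followed by a one-sentence justification."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # stand-in; in practice the linked LLM config decides
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content


print(judge("What is the capital of France?", "Paris."))
```

A labeled variant of this idea would additionally include the Reference in the grading prompt.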

Add LLM config

We currently support models from OpenAI and Anthropic, as well as open-source models through Replicate.

Smart Trigger an online Evaluator

Note that only evaluators with input and output variables can be triggered for online evaluation in production.

Add Trigger Logic

To automatically evaluate an inference, you can add Triggers in the Evaluator detail page.

  • Select the evaluator you just created.

  • Navigate to the 'Triggers' section within the evaluator's detail page, and click on 'Add tag'.

Implement Trigger in Inference Stream

  • Ensure that the tags specified in the evaluators' triggers are included when you stream your inference data (an illustrative sketch follows this list).

  • Refer to the full API documentation for detailed instructions on implementing triggers in your inference stream.
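The request format is defined by the API documentation referenced above. Purely as orientation, the hypothetical sketch below shows the general idea of attaching trigger tags to an inference record; the endpoint URL, header, and field names are invented placeholders, not the real API.

```python
# Hypothetical sketch: the endpoint, header, and field names are invented
# placeholders. Use the schema from the API documentation instead.
import os

import requests

API_URL = "https://api.example.com/v1/inferences"  # placeholder endpoint

payload = {
    "input": "What is the capital of France?",
    "prediction": "Paris.",
    # Must include the tag(s) configured in the evaluator's trigger so that
    # the online evaluation fires for this inference.
    "tags": ["production", "concise-answers"],
}

response = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {os.environ['API_KEY']}"},  # placeholder auth
    timeout=10,
)
response.raise_for_status()
```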
