# Create Evaluators

To create a new evaluator, click 'Evaluators' in the left panel, then click the 'Create evaluator' button. Fill out the form with the following details:

* The evaluator's name.
* A description of what the evaluator does.
* The evaluator type.
* The criterion you created earlier.
* The LLM configuration that powers the evaluator.

### Evaluator Type

<table data-full-width="true"><thead><tr><th width="178">Name</th><th width="354">Description</th><th width="132">Variables</th><th width="102" data-type="checkbox">Use LLM</th><th width="133" data-type="checkbox">Use Criterion</th><th data-type="checkbox">Use Prompt</th></tr></thead><tbody><tr><td>Criterion</td><td>Evaluates a model based on a custom criterion. Use this evaluator when you have a custom criterion to evaluate the LLM output without needing a ground truth reference.</td><td>Input<br>Prediction</td><td>true</td><td>true</td><td>true</td></tr><tr><td>Labeled Criterion</td><td>Evaluates a model based on a custom criterion, with a reference label. Use this evaluator when you have a custom criterion and would like to evaluate an output against a reference label.</td><td>Input<br>Prediction<br>Reference</td><td>true</td><td>true</td><td>true</td></tr><tr><td>Question Answering</td><td>Evaluates whether the prediction correctly answers the question posed in the input, compared against a reference label.</td><td>Input<br>Prediction<br>Reference</td><td>true</td><td>false</td><td>true</td></tr><tr><td>Chain of Thought Question Answering</td><td>Given an input question, determines whether the prediction is correct using a step-by-step reasoning process, compared against a reference label.</td><td>Input<br>Prediction<br>Reference</td><td>true</td><td>false</td><td>true</td></tr><tr><td>Context Question Answering</td><td>Evaluates whether the prediction correctly answers the question posed in the input, using the context provided by the reference.</td><td>Input<br>Prediction<br>Reference</td><td>true</td><td>false</td><td>true</td></tr><tr><td>Score String</td><td>Scores the output on a scale of 1 to 10 based on a custom criterion. Use this evaluator when you have a custom criterion to evaluate the LLM output without needing a ground truth reference.</td><td>Input<br>Prediction</td><td>true</td><td>true</td><td>true</td></tr><tr><td>Labeled Score String</td><td>Gives a score between 1 and 10 to a prediction based on a ground truth reference label. Use this evaluator when you have a custom criterion to evaluate the LLM output compared to a ground truth reference.</td><td>Input<br>Prediction<br>Reference</td><td>true</td><td>true</td><td>true</td></tr><tr><td>Pairwise String</td><td>Predicts the preferred prediction from two models. When you have two predictions generated for the same input, this evaluator helps you choose the preferred one based on your custom criterion.</td><td>Input<br>Prediction<br>Prediction_b</td><td>true</td><td>true</td><td>true</td></tr><tr><td>Labeled Pairwise String</td><td>Predicts the preferred prediction from two models based on a ground truth <strong>reference</strong> label. When you have two predictions generated for the same input, this evaluator helps you choose the preferred one based on both your ground truth reference and custom criterion.</td><td>Input<br>Prediction<br>Prediction_b<br>Reference</td><td>true</td><td>true</td><td>true</td></tr><tr><td>Pairwise String Distance</td><td>Compares two predictions using string edit distances.</td><td>Prediction<br>Prediction_b</td><td>false</td><td>false</td><td>false</td></tr><tr><td>String Distance</td><td>Compares a prediction to a reference answer using string edit distances.</td><td>Prediction<br>Reference</td><td>false</td><td>false</td><td>false</td></tr><tr><td>Embedding Distance</td><td>Compares a prediction to a reference answer using embedding distances.</td><td>Prediction<br>Reference</td><td>false</td><td>false</td><td>false</td></tr><tr><td>Exact Match</td><td>Compares a prediction to a reference answer using exact matching.</td><td>Prediction<br>Reference</td><td>false</td><td>false</td><td>false</td></tr><tr><td>Regex Match</td><td>Compares a prediction to a reference answer using regular expressions.</td><td>Prediction<br>Reference</td><td>false</td><td>false</td><td>false</td></tr><tr><td>JSON Validity</td><td>Checks whether a prediction is valid JSON.</td><td>Prediction</td><td>false</td><td>false</td><td>false</td></tr><tr><td>JSON Equality</td><td>Tests whether a prediction is equal to a reference JSON.</td><td>Prediction<br>Reference</td><td>false</td><td>false</td><td>false</td></tr><tr><td>JSON Edit Distance</td><td>Computes a distance between two canonicalized JSON strings. Available algorithms include Damerau-Levenshtein, Levenshtein, Jaro, Jaro-Winkler, Hamming, and Indel.</td><td>Prediction<br>Reference</td><td>false</td><td>false</td><td>false</td></tr><tr><td>JSON Schema Validation</td><td>Checks whether a prediction is valid JSON according to a JSON schema.</td><td>Prediction</td><td>false</td><td>false</td><td>false</td></tr></tbody></table>
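To build intuition for the deterministic (non-LLM) evaluator types in the table, here is a minimal Python sketch of the kind of checks they perform. This is illustrative only, not the platform's actual implementation: the function names are our own, and the string distance uses Python's built-in `difflib` ratio rather than any specific edit-distance algorithm listed above.

```python
import difflib
import json
import re

def exact_match(prediction: str, reference: str) -> bool:
    """Exact Match: true only if prediction and reference are identical."""
    return prediction == reference

def regex_match(prediction: str, reference_pattern: str) -> bool:
    """Regex Match: true if the prediction matches the reference pattern."""
    return re.search(reference_pattern, prediction) is not None

def string_distance(prediction: str, reference: str) -> float:
    """String Distance: normalized dissimilarity, 0.0 = identical."""
    return 1.0 - difflib.SequenceMatcher(None, prediction, reference).ratio()

def json_validity(prediction: str) -> bool:
    """JSON Validity: true if the prediction parses as JSON."""
    try:
        json.loads(prediction)
        return True
    except json.JSONDecodeError:
        return False
```

Because these checks need no LLM, criterion, or prompt, they are cheap to run at scale; the LLM-backed types in the table trade that cost for judgment on open-ended criteria.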

### LLM as a judge

To use an LLM as a judge, you need to create a criterion and add an LLM configuration:

#### Create a criterion

* Click on '**Create criterion**'.
* Describe your criterion in natural language, detailing what good performance looks like. The LLM will use this description as the standard to evaluate your data.
* Click on '**Create**' to save the criterion.
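As an illustration (this example criterion is our own, not from the product), a criterion description might read:

```
The response should answer the user's question directly, cite only facts
present in the provided context, and stay under 150 words. Penalize
responses that speculate or include unsupported claims.
```

The more concretely the description defines "good" and "bad" behavior, the more consistently the LLM judge can apply it.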

<figure><img src="https://2399115798-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FWsdxCCToJ5K2nk2ZZvqc%2Fuploads%2FkTJDQfoMEn5kNkvHfj2u%2FScreenshot%202024-09-26%20at%202.27.45%E2%80%AFPM.png?alt=media&#x26;token=df9c4723-94fc-48f1-878c-6a4ea03ba89c" alt="" width="375"><figcaption></figcaption></figure>

#### Add LLM config

We currently support models from OpenAI and Anthropic, as well as open-source models through Replicate.

<figure><img src="https://2399115798-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FWsdxCCToJ5K2nk2ZZvqc%2Fuploads%2FvxCKRBvEblGykv1Q31z8%2FScreenshot%202024-09-26%20at%202.25.36%E2%80%AFPM.png?alt=media&#x26;token=e9d4bf2c-24f4-4d75-b8ca-ba4baf640ead" alt="" width="225"><figcaption></figcaption></figure>

## Smart Trigger an online Evaluator

{% hint style="info" %}
Note that only evaluators with **input** and **output** variables can be triggered for online evaluation in production.
{% endhint %}

### Add Trigger Logic

To automatically evaluate an inference, you can add **Triggers** in the Evaluator detail page.

* Select the evaluator you just created.

<figure><img src="https://2399115798-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FWsdxCCToJ5K2nk2ZZvqc%2Fuploads%2FmWuQuj1O7HVybP6l8bAr%2FScreenshot%202024-09-26%20at%201.35.12%E2%80%AFPM.png?alt=media&#x26;token=8bd14bdc-4a0d-404b-a4cc-d1e2bbc8c207" alt="" width="563"><figcaption></figcaption></figure>

* Navigate to the 'Triggers' section within the evaluator's detail page, and click on '**Add tag**'.

<figure><img src="https://2399115798-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FWsdxCCToJ5K2nk2ZZvqc%2Fuploads%2FxZlH7QpIQo50yq1DqAyb%2FScreenshot%202024-09-26%20at%201.36.57%E2%80%AFPM.png?alt=media&#x26;token=6860af0f-a1f1-4ca8-a726-ec4ba4ea1858" alt="" width="375"><figcaption></figcaption></figure>

### Implement Trigger in Inference Stream

* Ensure that the tags specified in the evaluators' triggers are included when you stream your inference data.
* Follow the full [API documentation](https://app.ownlayer.com/docs#tag/inferences) for detailed instructions on how to implement triggers in your inference stream.
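As a rough sketch of the idea (the endpoint and all field names below are hypothetical assumptions, not the real API — refer to the linked documentation for the actual request shape), the key point is that each streamed inference record must carry the same tags you configured in the evaluator's trigger:

```python
import json

def build_inference_payload(model, input_text, output_text, tags):
    """Bundle one inference record with the trigger tags attached.

    All field names here are hypothetical; the real schema is defined in
    the Ownlayer API documentation.
    """
    return {
        "model": model,
        "input": input_text,
        "output": output_text,
        # Must include the tags configured in the evaluator's trigger,
        # otherwise the online evaluation will not fire.
        "tags": list(tags),
    }

payload = build_inference_payload(
    model="gpt-4o",
    input_text="What is the capital of France?",
    output_text="Paris",
    tags=["production", "qa-check"],
)
body = json.dumps(payload)  # serialized request body to send to the API
```

If the tags on an incoming inference match a trigger, the associated evaluator runs on that inference automatically.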
