autoevals
AutoEvals is a tool to quickly and easily evaluate AI model outputs.
Quickstart
Example
autoevals.llm
LLMClassifier Objects
An LLM-based classifier that wraps OpenAILLMClassifier and provides a standard way to apply chain of thought, parse the output, and score the result.
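The three steps named above (apply chain of thought, parse the output, score the result) can be illustrated with a minimal sketch. This is not the library's implementation: the `classify` function, the `COT_SUFFIX` prompt text, the `Answer:` output convention, and the `fake_model` stand-in are all hypothetical names chosen for illustration, and no real LLM is called.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Hypothetical chain-of-thought instruction appended to every prompt.
COT_SUFFIX = (
    "\nThink step by step, then finish with a line of the form "
    "'Answer: <choice>'."
)


def classify(
    prompt: str,
    choice_scores: Dict[str, float],
    complete: Callable[[str], str],
) -> Tuple[Optional[float], str]:
    """Ask a model for a reasoned choice and map that choice to a score.

    `complete` is any callable that takes a prompt and returns the model's
    text. Returns (score, full response); score is None when no known
    choice can be parsed from the final line.
    """
    response = complete(prompt + COT_SUFFIX)
    # Parse step: look for the final "Answer: X" line.
    match = re.search(r"Answer:\s*(\S+)\s*$", response.strip())
    if match is None:
        return None, response
    # Score step: translate the chosen label into a number.
    return choice_scores.get(match.group(1)), response


def fake_model(prompt: str) -> str:
    """Stand-in for an LLM call so the sketch runs offline."""
    return "The output restates the expected fact.\nAnswer: A"


score, rationale = classify(
    "Is the output factual?", {"A": 1.0, "B": 0.0}, fake_model
)
print(score)
```

Keeping the reasoning text alongside the score makes it possible to inspect why a given choice was selected.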
Battle Objects
Test whether an output performs the instructions better than the original (expected) value.
ClosedQA Objects
Test whether an output answers the input using knowledge built into the model. You can specify criteria to further constrain the answer.
Humor Objects
Test whether an output is funny.
Factuality Objects
Test whether an output is factual, compared to an original (expected) value.
Possible Objects
Test whether an output is a possible solution to the challenge posed in the input.
Security Objects
Test whether an output is malicious.
Sql Objects
Test whether a SQL query is semantically the same as a reference (output) query.
Summary Objects
Test whether an output is a better summary of the input than the original (expected) value.
Translation Objects
Test whether an output is as good a translation of the input in the specified language as an expert (expected) value.
autoevals.string
Levenshtein Objects
A simple scorer that uses the Levenshtein distance to compare two strings.
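As a rough illustration of how an edit distance can be turned into a score, the sketch below computes the Levenshtein distance with dynamic programming and normalizes it by the longer string's length. The normalization and the helper names (`levenshtein_distance`, `levenshtein_score`) are assumptions for illustration; the text above does not state the library's exact formula.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter string inside
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def levenshtein_score(output: str, expected: str) -> float:
    """Hypothetical helper: map the distance to a similarity in [0, 1]."""
    longest = max(len(output), len(expected))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - levenshtein_distance(output, expected) / longest


print(levenshtein_score("kitten", "sitting"))
```

Dividing by the longer length keeps the score between 0 (nothing in common) and 1 (identical strings).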
LevenshteinScorer
Kept for backward compatibility.
EmbeddingSimilarity Objects
A simple scorer that uses cosine similarity to compare two strings.
__init__
Create a new EmbeddingSimilarity scorer.
Arguments:
prefix: A prefix to prepend to the prompt. This is useful for specifying the domain of the inputs.
model: The model to use for the embedding distance. Defaults to "text-embedding-ada-002".
expected_min: The minimum expected score. Defaults to 0.7. Values below this will be scored as 0, and values between this and 1 will be scaled linearly.
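The effect of expected_min can be sketched in a few lines: compute a cosine similarity, score anything below the threshold as 0, and stretch the remaining range linearly onto 0 to 1. The function names and the toy vectors here are hypothetical; a real scorer would obtain the vectors from the embedding model rather than take them as arguments.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # similarity is undefined for a zero vector; treat as 0
    return dot / (norm_u * norm_v)


def scale_similarity(similarity: float, expected_min: float = 0.7) -> float:
    """Score values below expected_min as 0 and scale the rest linearly."""
    if expected_min >= 1:
        raise ValueError("expected_min must be less than 1")
    scaled = (similarity - expected_min) / (1 - expected_min)
    return min(max(scaled, 0.0), 1.0)


# Toy vectors standing in for real embeddings.
similarity = cosine_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 2.5])
print(scale_similarity(similarity))
```

With the default of 0.7, a raw similarity of 0.85 lands halfway through the scaled range, so it scores about 0.5.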
autoevals.number
NumericDiff Objects
A simple scorer that compares numbers by normalizing their difference.
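One plausible way to normalize a numeric difference is to divide the absolute difference by the sum of the magnitudes, which bounds the result between 0 and 1. The sketch below assumes that normalization; the text above does not give the library's exact formula, and `numeric_diff_score` is a hypothetical name.

```python
def numeric_diff_score(output: float, expected: float) -> float:
    """Return 1.0 for equal numbers, falling toward 0.0 as they diverge."""
    if output == expected:
        return 1.0  # also covers the case where both values are zero
    # |a - b| <= |a| + |b|, so the ratio always lies in [0, 1].
    return 1.0 - abs(expected - output) / (abs(expected) + abs(output))


print(numeric_diff_score(1, 3))
```

Because the denominator grows with the inputs, the score reflects relative rather than absolute error: 100 versus 101 scores far higher than 1 versus 2.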
autoevals.json
JSONDiff Objects
A simple scorer that compares JSON objects, using a customizable comparison method for strings (defaults to Levenshtein) and numbers (defaults to NumericDiff).
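A recursive comparison with pluggable leaf scorers might look like the sketch below. It is an illustration under stated assumptions, not the library's algorithm: the averaging rules for objects and arrays, the `json_diff_score` name, and the `exact_match` default (used here in place of Levenshtein and NumericDiff to keep the example self-contained) are all hypothetical.

```python
from typing import Any, Callable

Scorer = Callable[[Any, Any], float]


def exact_match(a: Any, b: Any) -> float:
    """Fallback leaf scorer: 1.0 when equal, otherwise 0.0."""
    return 1.0 if a == b else 0.0


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def json_diff_score(
    output: Any,
    expected: Any,
    string_scorer: Scorer = exact_match,
    number_scorer: Scorer = exact_match,
) -> float:
    """Average leaf similarities over two parsed JSON values."""
    if isinstance(output, dict) and isinstance(expected, dict):
        keys = set(output) | set(expected)
        if not keys:
            return 1.0
        # A key missing on one side is compared as None against the other.
        return sum(
            json_diff_score(output.get(k), expected.get(k),
                            string_scorer, number_scorer)
            for k in keys
        ) / len(keys)
    if isinstance(output, list) and isinstance(expected, list):
        longest = max(len(output), len(expected))
        if longest == 0:
            return 1.0
        # Unpaired trailing items contribute 0 to the average.
        return sum(
            json_diff_score(o, e, string_scorer, number_scorer)
            for o, e in zip(output, expected)
        ) / longest
    if isinstance(output, str) and isinstance(expected, str):
        return string_scorer(output, expected)
    if _is_number(output) and _is_number(expected):
        return number_scorer(output, expected)
    return exact_match(output, expected)


print(json_diff_score({"a": 1, "b": "x"}, {"a": 1, "b": "y"}))
```

Passing a fuzzier `string_scorer` or `number_scorer` lets near-matches at the leaves earn partial credit without changing the traversal.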
ValidJSON Objects
A binary scorer that evaluates the validity of JSON output, optionally validating against a JSON Schema definition (see https://json-schema.org/learn/getting-started-step-by-step#create).
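A binary validity check of this kind can be sketched with the standard library's `json` module, plus the third-party `jsonschema` package when a schema is supplied. The `valid_json_score` name is hypothetical, and the sketch accepts any JSON value that parses (including bare numbers and strings), which may be looser than the library's own rule.

```python
import json
from typing import Optional


def valid_json_score(output: str, schema: Optional[dict] = None) -> int:
    """Return 1 if output parses as JSON (and matches schema), else 0."""
    try:
        parsed = json.loads(output)
    except (TypeError, ValueError):  # json.JSONDecodeError subclasses ValueError
        return 0
    if schema is None:
        return 1
    # Imported lazily so the schema-free path needs only the standard library.
    import jsonschema

    try:
        jsonschema.validate(instance=parsed, schema=schema)
    except jsonschema.ValidationError:
        return 0
    return 1


print(valid_json_score('{"name": "Ada"}'))
print(valid_json_score('{"name": '))
```

When a schema is given, a document that parses but violates the schema also scores 0, so the result stays strictly binary.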