Datasets let you collect data from production, staging, and evaluations, or add it manually, and then use that data to run evaluations and track improvements over time.

For example, you can use Datasets to:

  • Store evaluation test cases for your eval script instead of managing large JSONL or CSV files
  • Log all production generations to assess quality manually or using model-graded evals
  • Store user-reviewed (👍, 👎) generations to find new test cases

In Braintrust, datasets have a few key properties:

  • Integrated. Datasets are integrated with the rest of the Braintrust platform, so you can use them in evaluations, explore them in the playground, and log to them from your staging/production environments.
  • Versioned. Every insert, update, and delete is versioned, so you can pin evaluations to a specific version of the dataset, rewind to a previous version, and track changes over time.
  • Scalable. Datasets are stored in a modern cloud data warehouse, so you can collect as much data as you want without worrying about storage or performance limits.
  • Secure. If you run Braintrust in your cloud environment, datasets are stored in your warehouse and never touch our infrastructure.
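
To make the "Versioned" property concrete, here is a toy model of the idea: every write is appended to a log and gets a version number, and any earlier state can be rebuilt by replaying the log up to that version. This is only a sketch under that assumption. `VersionedStore` and its methods are hypothetical names, and it says nothing about how Braintrust actually stores data.

```typescript
// Toy model only: an append-only event log, not Braintrust's storage engine.
type DatasetEvent =
  | { kind: "upsert"; id: string; value: unknown }
  | { kind: "delete"; id: string };

class VersionedStore {
  private events: DatasetEvent[] = [];

  // Each write appends an event and returns the new version number.
  upsert(id: string, value: unknown): number {
    this.events.push({ kind: "upsert", id, value });
    return this.events.length;
  }

  delete(id: string): number {
    this.events.push({ kind: "delete", id });
    return this.events.length;
  }

  // Replays the first `version` events to rebuild the records as of that version.
  // Defaults to the latest version.
  snapshot(version: number = this.events.length): Map<string, unknown> {
    const records = new Map<string, unknown>();
    for (const event of this.events.slice(0, version)) {
      if (event.kind === "upsert") {
        records.set(event.id, event.value);
      } else {
        records.delete(event.id);
      }
    }
    return records;
  }
}
```

Pinning an evaluation to a version then corresponds to always reading `snapshot(v)` for a fixed `v`, regardless of later writes.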

Creating a dataset

Records in a dataset are stored as JSON objects, and each record has three top-level fields:

  • input is a set of inputs that you could use to recreate the example in your application. For example, if you're logging examples from a question-answering model, the input might be the question.
  • expected (optional) is the output of your model. For example, if you're logging examples from a question-answering model, this might be the answer. When you run evaluations, this value is available as the expected field; it does not need to be the ground truth.
  • metadata (optional) is a set of key-value pairs that you can use to filter and group your data. For example, if you're logging examples from a question-answering model, the metadata might include the knowledge source that the question came from.
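
The three fields above can be written down as a TypeScript type, which is handy for checking records before you insert them. This is a minimal sketch: `DatasetRecord` and `validateRecord` are hypothetical names for illustration, not SDK exports, and the check assumes that input is required because only the other two fields are marked optional.

```typescript
// Hypothetical type for illustration; not the SDK's own definition.
interface DatasetRecord {
  input: unknown;
  expected?: unknown;
  metadata?: Record<string, unknown>;
}

// Returns a list of problems; an empty list means the value has the expected shape.
function validateRecord(value: unknown): string[] {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return ["record must be a JSON object"];
  }
  const problems: string[] = [];
  const record = value as Record<string, unknown>;
  if (!("input" in record)) {
    problems.push("missing required field: input");
  }
  const metadata = record.metadata;
  if (
    metadata !== undefined &&
    (typeof metadata !== "object" || metadata === null || Array.isArray(metadata))
  ) {
    problems.push("metadata must be an object of key-value pairs");
  }
  return problems;
}

// Example record for a question-answering model.
const exampleRecord: DatasetRecord = {
  input: "What is the capital of France?",
  expected: "Paris",
  metadata: { source: "geography-faq" },
};
```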

Datasets are created automatically when you initialize them in the SDK.

Inserting records

You can use the SDK to initialize and insert into a dataset:

import { initDataset } from "braintrust";

// Initializing the dataset creates it if it does not exist yet.
const dataset = initDataset("My App", { dataset: "My Dataset" });
for (let i = 0; i < 10; i++) {
  // insert() returns the id of the new record.
  const id = dataset.insert({
    input: i,
    expected: { result: i + 1, error: null },
    metadata: { foo: i % 2 },
  });
  console.log("Inserted record with id", id);
}
// Print a summary of the dataset.
console.log(await dataset.summarize());

Updating records

In the above example, each insert() statement returns an id. This id can be used to update the record later:

dataset.insert({
  id,
  input: i,
  expected: { result: i + 1, error: "Timeout" },
  metadata: { foo: i % 2 },
});

Deleting records

You can also delete records by id:

await dataset.delete(id);

Viewing a dataset

You can view a dataset in the Braintrust UI by navigating to the project and then clicking on the dataset.

[Screenshot: Dataset Viewer]

From the UI, you can filter records, create new ones, edit values, and delete records. You can also copy records between datasets and from experiments into datasets. This feature is commonly used to collect interesting or anomalous examples into a golden dataset.

Using a dataset in an evaluation

You can use a dataset in an evaluation by initializing a dataset, iterating through its records, and logging the records' ids to link them to the evaluation:

import { initDataset, init } from "braintrust";

// Stand-in for your application logic.
function myApp(input: any) {
  return `output of input ${input}`;
}
// Stand-in scorer; replace it with a real comparison of output and expected.
function myScore(output: any, rowExpected: any) {
  return Math.random();
}

const dataset = initDataset("My App", { dataset: "My Dataset" });
const experiment = init("My App", {
  experiment: "My Experiment",
  dataset: dataset,
});
for await (const row of dataset) {
  const output = myApp(row.input);
  const closeness = myScore(output, row.expected);
  experiment.log({
    input: row.input,
    output,
    expected: row.expected,
    scores: { closeness },
    // Link this result to its dataset record by id.
    datasetRecordId: row.id,
  });
}
console.log(await experiment.summarize());
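
The myScore placeholder above returns a random number. As a rough sketch of what a deterministic scorer could look like, the function below returns 1 when the output is structurally equal to the expected value and 0 otherwise. The names `deepEqual` and `exactMatch` are hypothetical and not part of the SDK, and the comparison assumes both values are plain JSON data.

```typescript
// Structural equality for JSON-like values (objects, arrays, primitives, null).
function deepEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true;
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) {
    return false;
  }
  if (Array.isArray(a) !== Array.isArray(b)) return false;
  const aObj = a as Record<string, unknown>;
  const bObj = b as Record<string, unknown>;
  const aKeys = Object.keys(aObj);
  const bKeys = Object.keys(bObj);
  if (aKeys.length !== bKeys.length) return false;
  return aKeys.every(
    (key) =>
      Object.prototype.hasOwnProperty.call(bObj, key) && deepEqual(aObj[key], bObj[key]),
  );
}

// Hypothetical scorer: 1 for an exact structural match, 0 otherwise.
function exactMatch(output: unknown, expected: unknown): number {
  return deepEqual(output, expected) ? 1 : 0;
}
```

Because key order is ignored, `{ result: 1, error: null }` and `{ error: null, result: 1 }` count as a match.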

Logging from your application

To log to a dataset from your application, call insert() from the SDK. Braintrust logs are queued and sent asynchronously, so logging stays off your application's critical path.
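
To illustrate what "queued and sent asynchronously" means in practice, here is a toy sketch of a logger whose log() call only appends to an in-memory queue, while batches are sent later in the background. It is not the SDK's implementation: `BackgroundLogger` and the `send` callback are hypothetical, and retries, batching limits, and shutdown handling are left out.

```typescript
// Toy sketch of background logging; `send` stands in for a network request.
class BackgroundLogger<T> {
  private queue: T[] = [];
  private pending: Promise<void> = Promise.resolve();
  private send: (batch: T[]) => Promise<void>;

  constructor(send: (batch: T[]) => Promise<void>) {
    this.send = send;
  }

  // Returns immediately; the caller never waits on the network.
  log(record: T): void {
    this.queue.push(record);
    if (this.queue.length === 1) {
      // Schedule one drain for everything queued before it runs.
      this.pending = this.pending
        .then(() => this.drain())
        .catch((err) => console.warn("Failed to send batch", err));
    }
  }

  // Sends everything currently queued as a single batch.
  private async drain(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    if (batch.length > 0) {
      await this.send(batch);
    }
  }

  // Resolves once every record logged so far has been handed to `send`.
  flush(): Promise<void> {
    return this.pending;
  }
}
```

A request handler would call `log()` and return right away; only code that must guarantee delivery, such as a shutdown hook, needs to wait on `flush()`.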

Since the SDK uses API keys, we recommend logging from a privileged environment (for example, a backend server) rather than directly from client applications.

This example walks through how to track 👍/👎 from user feedback:

import { initDataset, Dataset } from "braintrust";
class MyApplication {
  private dataset: Dataset | undefined = undefined;
  async initApp() {
    this.dataset = await initDataset("My App", { dataset: "logs" });
  }
  async logUserExample(input: any, expected: any, userId: string, orgId: string, thumbsUp: boolean) {
    if (this.dataset) {
      // Store the feedback alongside the example so it can be filtered later.
      this.dataset.insert({
        input,
        expected,
        metadata: { userId, orgId, thumbsUp },
      });
    } else {
      console.warn("Must initialize application before logging");
    }
  }
}
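
Once feedback is stored in metadata, records with a 👎 are natural candidates for new test cases. The sketch below shows one way to pick them out, working on plain objects that have already been fetched. `FeedbackRecord` and `thumbsDownByOrg` are hypothetical names, and the metadata shape is assumed to match the userId, orgId, and thumbsUp fields logged in the example.

```typescript
// Hypothetical record shape, mirroring the metadata logged in the example.
interface FeedbackRecord {
  input: unknown;
  expected: unknown;
  metadata: { userId: string; orgId: string; thumbsUp: boolean };
}

// Groups thumbs-down records by organization so they can be reviewed together.
function thumbsDownByOrg(records: FeedbackRecord[]): Map<string, FeedbackRecord[]> {
  const grouped = new Map<string, FeedbackRecord[]>();
  for (const record of records) {
    if (record.metadata.thumbsUp) continue; // keep only negative feedback
    const bucket = grouped.get(record.metadata.orgId) ?? [];
    bucket.push(record);
    grouped.set(record.metadata.orgId, bucket);
  }
  return grouped;
}
```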