Logging

This guide walks through how to log real-world interactions in your application. We encourage you to use the feature to track user behavior, debug customer issues, and incorporate new patterns into your evaluations. Ultimately, logging effectively is a critical component to developing high quality AI applications.

Before proceeding, make sure to read the quickstart guide and setup an API key.

Logging Screenshot

Writing logs

To write to braintrust, simply wrap the code you wish to log. Braintrust will automatically capture and log information behind the scenes.

import { initLogger, traced } from "braintrust";
 
const logger = initLogger({
  projectName: "My Project",
  apiKey: process.env.BRAINTRUST_API_KEY,
});
 
async function someLLMFunction(input: string) {
  return traced(async (span) => {
    const output = invokeLLM(input);
    span.log({ input, output });
  });
}
 
export async function POST(req: Request) {
  return traced(async (span) => {
    return await someLLMFunction(req.body);
  });
}

For full details, refer to the tracing guide, which describes how to log traces to braintrust.

Viewing logs

To view logs, navigate to the "Logs" tab in the appropriate project in the Braintrust UI. Logs are automatically updated in real-time as new traces are logged.

You can filter logs by tags, time range, and arbitrary subfields using Braintrust Query Language syntax. Here are a few examples of common filters:

Description	Syntax
Logs older than the past day	`created < CURRENT_DATE - INTERVAL 1 DAY`
Logs with a `user_id` field equal to `1234`	`metadata.user_id = '1234'`
Logs with a `Factuality` score greater than `0.5`	`scores.Factuality > 0.5`

💡

BTQL is currently in beta. To enable it, visit your profile settings, and enable "Use Braintrust query language for search filters".

User feedback

Braintrust supports logging user feedback, which can take multiple forms:

A score for a specific span, e.g. the output of a request could be 👍 (corresponding to 1) or 👎 (corresponding to 0), or a document retrieved in a vector search might be marked as relevant or irrelevant on a scale of 0->1.
An expected value, which gets saved in the expected field of a span, alongside input and output. This is a great place to store corrections.
A comment, which is a free-form text field that can be used to provide additional context.
Additional metadata fields, which allow you to track informationa about the feedback, like the user_id or session_id.

Each time you submit feedback, you can specify one or more of these fields using the logFeedback() / log_feedback() method, which simply needs you to specify the span_id corresponding to the span you want to log feedback for, and the feedback fields you want to update.

The following example shows how to log feedback within a simple API endpoint.

import { initLogger } from "braintrust";
 
const logger = initLogger({
  projectName: "My Project",
  apiKey: process.env.BRAINTRUST_API_KEY,
});
 
export async function POST(req: Request) {
  return logger.traced(async (span) => {
    const { body } = req;
    const result = await someLLMFunction(body);
    span.log({ input: body, output: result });
    return {
      result,
      requestId: span.id,
    };
  });
}
 
export async function POSTFeedback(req: Request) {
  logger.logFeedback({
    id: req.body.requestId,
    scores: {
      correctness: req.body.score,
    },
    comment: req.body.comment,
    metadata: {
      user_id: req.user.id,
    },
  });
}

Collecting multiple scores

Often, you want to collect multiple scores for a single span. For example, multiple users might provide independent feedback on a single document. Although each score and expected value is logged separately, each update overwrites the previous value. Instead, to capture multiple scores, you should create a new span for each submission, and log the score in the scores field. When you view and use the trace, Braintrust will automatically average the scores for you in the parent span(s).

import { initLogger } from "braintrust";
 
const logger = initLogger({
  projectName: "My Project",
  apiKey: process.env.BRAINTRUST_API_KEY,
});
 
export async function POST(req: Request) {
  return logger.traced(async (span) => {
    const { body } = req;
    const result = await someLLMFunction(body);
    span.log({ input: body, output: result });
    return {
      result,
      requestId: span.export(),
    };
  });
}
 
export async function POSTFeedback(req: Request) {
  logger.traced(
    async (span) => {
      logger.logFeedback({
        id: span.id, // Use the newly created span's id, instead of the original request's id
        comment: req.body.comment,
        scores: {
          correctness: req.body.score,
        },
        metadata: {
          user_id: req.user.id,
        },
      });
    },
    {
      parent: req.body.requestId,
      name: "feedback",
    }
  );
}

Tags and queues

Braintrust supports curating logs by adding tags, and then filtering on them in the UI. Tags naturally flow between logs, to datasets, and even to experiments, so you can use them to track various kinds of data across your application, and track how they change over time.

Add tags

Configuring tags

Tags are configured at the project level, and in addition to a name, you can also specify a color and description. To configure tags, navigate to the "Configuration" tab in a project, where you can add, modify, and delete tags.

Configure tags

Adding tags in the SDK

You can also add tags to logs using the SDK. To do so, simply specify the tags field when you log data.

import { initLogger } from "braintrust";
 
const logger = initLogger({
  projectName
  apiKey: process.env.BRAINTRUST_API_KEY,
});
 
export async function POST(req: Request) {
  return logger.traced(async (span) => {
    const { body } = req;
    const result = await someLLMFunction(body);
    span.log({ input: body, output: result, tags: ["user-action"] });
    return {
      result,
      requestId: span.span_id,
    };
  });
}

‼️

Tags can only be applied to top-level spans, e.g those created via traced() or logger.startSpan()/ logger.start_span(). You cannot apply tags to subspans (those created from another span), because they are properties of the whole trace, not individual spans.

You can also apply tags while capturing feedback via the logFeedback() / log_feedback() method.

 
import { initLogger } from "braintrust";
 
const logger = initLogger({
  projectName
  apiKey: process.env.BRAINTRUST_API_KEY,
});
 
export async function POSTFeedback(req: Request) {
  logger.logFeedback({
    id: span.id, // Use the newly created span's id, instead of the original request's id
    comment: req.body.comment,
    scores: {
      correctness: req.body.score,
    },
    metadata: {
      user_id: req.user.id,
    },
    tags: ["user-feedback"],
  });
}

Filtering by tags

To filter by tags, simply select the tags you want to filter by in the UI.

Filter by tags

Using tags to create queues

You can also use tags to create queues, which are a way to organize logs into groups. Queues are useful for tracking logs you want to look at later, or for organizing logs into different categories. To create a queue, you should create two tags: one for the queue, and one to indicate that the event is no longer in the queue. For example, you might create a triage tag, and a triaged tag.

As you're reviewing logs, simply add the triage tag to the logs you want to review later. To see the logs in the queue, filter by the triage tag. You can add an additional label, like -tags:triaged to exclude logs that have been marked as done.

💡

-tags:triaged is not formal syntax in Braintrust, but our AI search knows to look for it, or terms like it, to exclude logs with the triaged tag.

Triaged

Implementation considerations

Data model

Each log entry is associated with an organization and a project. If you do not specify a project name or id in initLogger()/init_logger(), the SDK will create and use a project named "Global".
Although logs are associated with a single project, you can still use them in evaluations or datasets that belong to any project.
Like evaluation experiments, log entries contain optional input, output, expected, scores, metadata, and metrics fields. These fields are optional, but we encourage you to use them to provide context to your logs.
Logs are indexed automatically to enable efficient search. When you load logs, Braintrust automatically returns the most recently updated log entries first. You can also search by arbitrary subfields, e.g. metadata.user_id = '1234'. Currently, inequality filters, e.g. scores.accuracy > 0.5 do not use an index.

Initializing

The initLogger()/init_logger() method initializes the logger. Unlike the experiment init() method, the logger lazily initializes itself, so that you can call initLogger()/init_logger() at the top of your file (in module scope). The first time you log() or start a span, the logger will log into Braintrust and retrieve/initialize project details.

Flushing

The SDK can operate in two modes: either it sends log statements to the server after each request, or it buffers them in memory and sends them over in batches. Batching reduces the number of network requests and makes the log() command as fast as possible. Each SDK flushes logs to the server as fast as possible, and attempts to flush any outstanding logs when the program terminates.

You can enable background batching by setting the asyncFlush / async_flush flag to true in initLogger()/init_logger(). When async flush mode is on, you can use the .flush() method to manually flush any outstanding logs to the server.

// In the JS SDK, `asyncFlush` is false by default.
const logger = initLogger({
  projectName: "My Project",
  apiKey: process.env.BRAINTRUST_API_KEY,
  asyncFlush: true,
});
 
...
 
// Some function that is called while cleaning up resources
async function cleanup() {
    await logger.flush();
}

Serverless environments

The asyncFlush / async_flush flag controls whether or not logs are flushed when a trace completes. This flag should be set to false in serverless environments where the process may halt as soon as the request completes. By default, asyncFlush is set to false in the Typescript SDK, since most Typescript applications are serverless, and True in Python.

const logger = initLogger({
  projectName: "My Project",
  apiKey: process.env.BRAINTRUST_API_KEY,
  asyncFlush: false,
});

Evaluations Tracing