Anthropic model provider configuration and integration guide
Anthropic provides access to Claude models, including Claude Sonnet 4.5, Claude Opus 4.1, and other cutting-edge language models. Braintrust integrates seamlessly with Anthropic through direct API access, the wrapAnthropic wrapper for automatic tracing, and proxy support.
This guide covers manual instrumentation. For quicker setup, use auto-instrumentation.
Set your Anthropic API key and Braintrust API key as environment variables:
.env
```
ANTHROPIC_API_KEY=<your-anthropic-api-key>
BRAINTRUST_API_KEY=<your-braintrust-api-key>
# For organizations on the EU data plane, use https://api-eu.braintrust.dev
# For self-hosted deployments, use your data plane URL
# BRAINTRUST_API_URL=<your-braintrust-api-url>
```
API keys are stored as one-way cryptographic hashes, never in plaintext.
Install the braintrust and @anthropic-ai/sdk packages.
```typescript
import Anthropic from "@anthropic-ai/sdk";
import { wrapAnthropic, initLogger } from "braintrust";

// Initialize the Braintrust logger
const logger = initLogger({
  projectName: "My Project", // Your project name
  apiKey: process.env.BRAINTRUST_API_KEY,
});

// Wrap the Anthropic client with the Braintrust logger
const client = wrapAnthropic(
  new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY }),
);

// All API calls are automatically logged
const result = await client.messages.create({
  model: "claude-sonnet-4-5-20250929",
  max_tokens: 1024,
  messages: [{ role: "user", content: "What is machine learning?" }],
});
```
Each traced Anthropic call logs metrics to the span based on what the API returns. Token counts are always present; other fields appear only when the relevant feature is in use.
| Metric | Description |
| --- | --- |
| prompt_tokens | Total input tokens (including cached and cache-creation tokens) |
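As a hedged illustration of how prompt_tokens aggregates the cache-related fields, consider a usage payload in the shape Anthropic's Messages API returns (the numbers below are made up, not real API output):

```typescript
// Hypothetical usage payload in the shape returned by Anthropic's Messages API
const usage = {
  input_tokens: 20,
  cache_read_input_tokens: 1500,
  cache_creation_input_tokens: 0,
  output_tokens: 120,
};

// prompt_tokens, as logged on the span, counts cached and cache-creation
// tokens alongside uncached input tokens
const promptTokens =
  usage.input_tokens +
  usage.cache_read_input_tokens +
  usage.cache_creation_input_tokens;
```

Here most of the prompt was served from the cache, so prompt_tokens (1520) is far larger than the uncached input_tokens alone.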
When Claude uses server-side tools, Braintrust records the provider’s tool usage counters dynamically:
| Metric pattern | Description |
| --- | --- |
| server_tool_use_<field_name> | Server-side tool usage counts returned by Anthropic. Examples include server_tool_use_web_search_requests, server_tool_use_web_fetch_requests, and server_tool_use_code_execution_requests. |
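The dynamic naming can be sketched as a simple mapping from the API's server_tool_use usage block to metric names (the counts below are invustrated values, and the mapping itself is a simplified sketch of the behavior described above, not Braintrust internals):

```typescript
// Invented server_tool_use counts in the shape Anthropic returns in usage
const serverToolUse: Record<string, number> = {
  web_search_requests: 3,
  web_fetch_requests: 1,
};

// Each field becomes a span metric prefixed with server_tool_use_
const metrics = Object.fromEntries(
  Object.entries(serverToolUse).map(([field, count]) => [
    `server_tool_use_${field}`,
    count,
  ]),
);
```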
The following metadata fields are also logged when the API returns them:
Evaluations distill the non-deterministic outputs of Anthropic models into an effective feedback loop that enables you to ship more reliable, higher quality products. The Braintrust Eval function is composed of a dataset of user inputs, a task, and a set of scorers. To learn more about evaluations, see the Experiments guide.
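As a minimal sketch of those three pieces, the dataset, task, and scorer below are illustrative stand-ins (in a real eval, the task would call the wrapped Anthropic client, and you would pass all three to Braintrust's Eval function):

```typescript
// Dataset of user inputs with expected outputs (illustrative)
const data = () => [{ input: "What is 2 + 2?", expected: "4" }];

// Task: a deterministic stub standing in for a call to the wrapped
// Anthropic client, to keep the sketch self-contained
const task = async (input: string) =>
  input.includes("2 + 2") ? "4" : "unknown";

// Scorer: compares the task output against the expected value
const exactMatch = ({
  output,
  expected,
}: {
  output: string;
  expected: string;
}) => ({
  name: "exact_match",
  score: output === expected ? 1 : 0,
});
```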
You can use Anthropic models to score the outputs of other AI systems. For example, the LLMClassifierFromSpec scorer can grade the relevance of an AI system's outputs. Install the autoevals package to use LLMClassifierFromSpec.
Create a scorer with LLMClassifierFromSpec to grade the relevance of the output. You can then include relevanceScorer as a scorer in your Eval function (see above).
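A sketch of the classifier spec such a scorer consumes is shown below; the prompt wording and choice scores are illustrative, not an exact spec from this guide:

```typescript
// Illustrative classifier spec. Passing this to autoevals'
// LLMClassifierFromSpec("Relevance", relevanceSpec) would produce a scorer.
const relevanceSpec = {
  prompt: `You are assessing how relevant an answer is to a question.

Question: {{input}}
Answer: {{output}}

Select one:
(A) Fully relevant
(B) Partially relevant
(C) Irrelevant`,
  choice_scores: { A: 1, B: 0.5, C: 0 },
  use_cot: true,
};
```

The choice_scores map each classifier choice to a numeric score, so the LLM's categorical judgment becomes a metric you can track across experiments.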
Anthropic models support system prompts for better instruction following.
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const response = await client.messages.create({
  model: "claude-sonnet-4-5-20250929",
  max_tokens: 1024,
  system: "You are a helpful assistant that responds in JSON format.",
  messages: [{ role: "user", content: "What is the capital of France?" }],
});
```
Anthropic supports prompt caching to reduce costs and latency for repeated content. When you use prompt caching, Braintrust automatically captures cache read and creation token counts as span metrics. If the API returns a cache creation breakdown (ephemeral 5-minute vs. 1-hour), those are captured as separate metrics too — see the full list in Trace automatically.
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

const response = await client.messages.create({
  model: "claude-sonnet-4-5-20250929",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: "You are an AI assistant analyzing the following document...",
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: "Summarize the key points." }],
});
```
You can also access Anthropic models through the Braintrust gateway, which provides a unified interface for multiple providers. Use any supported provider’s SDK to call Anthropic models.
```typescript
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://gateway.braintrust.dev/v1",
  apiKey: process.env.BRAINTRUST_API_KEY,
});

const response = await client.chat.completions.create({
  model: "claude-sonnet-4-5-20250929",
  messages: [{ role: "user", content: "What is a proxy?" }],
  seed: 1, // A seed activates the proxy's cache
});
```
The Braintrust gateway supports structured outputs for Anthropic models.
```typescript
import { OpenAI } from "openai";
import { zodResponseFormat } from "openai/helpers/zod";
import { z } from "zod";

const client = new OpenAI({
  baseURL: "https://gateway.braintrust.dev/v1",
  apiKey: process.env.BRAINTRUST_API_KEY,
});

// Define a Zod schema for the response
const ResponseSchema = z.object({
  name: z.string(),
  age: z.number(),
});

const completion = await client.beta.chat.completions.parse({
  model: "claude-sonnet-4-5-20250929",
  messages: [
    { role: "system", content: "Extract the person's name and age." },
    { role: "user", content: "My name is John and I'm 30 years old." },
  ],
  // zodResponseFormat converts the Zod schema into the JSON schema
  // response_format the API expects
  response_format: zodResponseFormat(ResponseSchema, "person"),
});
```