DatasetPipeline(...) in a TypeScript or Python file, then run it with the bt datasets pipeline CLI command.
Dataset pipelines require bt CLI v0.10.0 or later, plus the braintrust SDK for the language you write the pipeline in: TypeScript SDK v3.16.0 or later, or Python SDK v0.23.0 or later.
Define a pipeline
A pipeline definition has three parts:
- source: Which project to read from, an optional SQL filter, and whether to operate on individual spans or entire traces (scope).
- transform: A function that receives a source span or trace and returns one or more dataset rows, or nothing to skip it.
- target: The dataset to write to. Braintrust creates the dataset if it does not exist.
The transform function returns a single row, a list of rows, or null/None to skip the source. A row accepts the standard dataset fields: input, expected, metadata, tags, and id. When you omit id, the row inherits the source span or trace ID.
When scope is "span", the transform receives the span’s id, input, output, expected, and metadata, along with the full trace. When scope is "trace", it receives only the trace.
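As a rough illustration of the return shapes described above, here is a minimal span-scoped transform sketch. The SourceSpan and DatasetRow types and the toDatasetRow name are hypothetical stand-ins written for this example, and passing the span as a single object argument is an assumption; the real SDK types and call signature may differ.

```typescript
// Hypothetical types for illustration only; not the braintrust SDK's types.
interface SourceSpan {
  id: string;
  input?: unknown;
  output?: unknown;
  expected?: unknown;
  metadata?: Record<string, unknown>;
}

interface DatasetRow {
  id?: string;
  input?: unknown;
  expected?: unknown;
  metadata?: Record<string, unknown>;
  tags?: string[];
}

// A transform may return one row, a list of rows, or null to skip the source.
type TransformResult = DatasetRow | DatasetRow[] | null;

// Skip spans that logged no output; otherwise emit one row that falls back to
// the logged output when the span has no expected value.
function toDatasetRow(span: SourceSpan): TransformResult {
  if (span.output === undefined || span.output === null) {
    return null; // nothing useful to record, so skip this span
  }
  return {
    input: span.input,
    expected: span.expected ?? span.output,
    metadata: { ...span.metadata, origin: "production-logs" },
    tags: ["from-logs"],
    // id is omitted on purpose, so the row would inherit the source span ID.
  };
}
```

Returning null for spans without output keeps incomplete logs out of the dataset. Returning an array instead would let one span fan out into several rows.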
Run a pipeline
Run the full pipeline in one shot. The --limit flag controls how many source spans or traces to discover, and --window sets how far back to look, which defaults to the last day (1d). For the full list of flags, see the bt datasets pipeline CLI reference.
The run command discovers source refs, transforms them, and inserts the resulting rows into the target dataset.
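The discover, transform, and insert sequence can be pictured with a small in-memory model. This is only a conceptual sketch, not the CLI's implementation: runSketch, SourceRef, OutputRow, and RowTransform are hypothetical names, the limit parameter merely mirrors the role of --limit, and "inserting" here just appends to an array.

```typescript
// Hypothetical stand-in types; not the braintrust SDK's types.
interface SourceRef {
  id: string;
  input?: unknown;
  output?: unknown;
}

interface OutputRow {
  id?: string;
  input?: unknown;
  expected?: unknown;
}

type RowTransform = (source: SourceRef) => OutputRow | OutputRow[] | null;

// Conceptual model of a one-shot run over in-memory data.
function runSketch(
  candidates: SourceRef[],
  transform: RowTransform,
  limit: number,
): OutputRow[] {
  // 1. Discover: keep at most `limit` sources.
  const discovered = candidates.slice(0, limit);

  const inserted: OutputRow[] = [];
  for (const source of discovered) {
    // 2. Transform: a source yields one row, several rows, or nothing.
    const result = transform(source);
    if (result === null) continue;
    const rows = Array.isArray(result) ? result : [result];

    // 3. Insert: append the rows to the in-memory "target".
    inserted.push(...rows);
  }
  return inserted;
}

const insertedRows = runSketch(
  [
    { id: "s1", input: "hi", output: "hello" },
    { id: "s2", input: "no output logged" },
    { id: "s3", input: "bye", output: "goodbye" },
  ],
  (s) => (s.output === undefined ? null : { input: s.input, expected: s.output }),
  2,
);
console.log(insertedRows.length); // 1: s2 is skipped and s3 is beyond the limit
```

Note that the limit applies to discovered sources, not to output rows, so a run can write fewer rows than the limit when the transform skips sources, or more when it returns lists.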
Staged workflow
For larger jobs, or when you want to inspect or edit rows before writing them, split the run into three stages: pull, transform, and push. Artifacts are written to bt-sync/ by default: pull writes pulled.jsonl and transform writes transformed.jsonl. You can inspect or edit transformed.jsonl before running push. Use --root to change the directory, or --out and --in to override individual artifact paths.
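Before running push, a quick scripted check over transformed.jsonl can catch rows you would not want in the dataset. The sketch below assumes each non-empty line of the file is one JSON object holding a dataset row; the exact on-disk schema is not described here, so treat the StagedRow type and both helper functions as hypothetical.

```typescript
// Assumption: one JSON object per non-empty line, shaped like a dataset row.
interface StagedRow {
  id?: string;
  input?: unknown;
  expected?: unknown;
  metadata?: Record<string, unknown>;
  tags?: string[];
}

// Parse JSON Lines text into rows, ignoring blank lines.
function parseJsonl(text: string): StagedRow[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as StagedRow);
}

// Return the 1-based positions of rows that have no `input` field.
function rowsMissingInput(rows: StagedRow[]): number[] {
  const missing: number[] = [];
  rows.forEach((row, index) => {
    if (row.input === undefined) missing.push(index + 1);
  });
  return missing;
}

// Inline sample standing in for the contents of transformed.jsonl.
const sample = [
  '{"id": "a", "input": "What is 2 + 2?", "expected": "4"}',
  "",
  '{"id": "b", "expected": "this row has no input"}',
].join("\n");

// The second parsed row lacks an input, so its position (2) is reported.
console.log(rowsMissingInput(parseJsonl(sample)));
```

In practice you would load the text from bt-sync/transformed.jsonl (for example with Node's fs.readFileSync) instead of the inline sample, then fix or delete the flagged lines before pushing.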
For the full set of flags, including source and target overrides and concurrency controls, see the bt datasets pipeline CLI reference.
Next steps
- Browse the bt datasets pipeline command reference for every flag.
- Learn other ways to create datasets from uploads, the SDK, or production logs.
- Use datasets in evaluations once your rows are in place.