Deploy the Braintrust data plane in your AWS account using the Braintrust Terraform module. This is the recommended way to self-host Braintrust on AWS.

Braintrust recommends deploying in a dedicated AWS account. AWS enforces account-level Lambda concurrency limits, and since Braintrust’s API runs on Lambda, sharing an account with other workloads can lead to throttling and service disruptions. A dedicated account also aligns with AWS best practices for workload isolation and security.
To test infrastructure provisioning before committing to production-sized resources, use the sandbox example. It uses minimal instance sizes and has deletion protection disabled for easy teardown. It is not suitable for performance or load testing.

1. Configure the Terraform module

The Braintrust Terraform module contains all the necessary resources for a self-hosted Braintrust data plane.
  1. Copy the entire contents of the examples/braintrust-data-plane directory from the terraform-aws-braintrust-data-plane repository into your own repository.
  2. In provider.tf, configure your AWS account and region. Supported regions: us-east-1, us-east-2, us-west-2, eu-west-1, ca-central-1, and ap-southeast-2. If you require support for a different region, contact Braintrust.
  3. In terraform.tf, set up your remote backend (typically S3 and DynamoDB).
  4. In main.tf, customize the Braintrust deployment settings. The defaults are suitable for a large production-sized deployment. Adjust them based on your needs, but keep in mind the hardware requirements.
    Each deployment must have a unique deployment_name within the same AWS account (max 18 characters). The default is "braintrust"; change this if you have multiple deployments. Resource names (IAM roles, RDS instances, S3 buckets) are prefixed with this value and will collide if duplicated.
    Brainstore instances require instance types with local NVMe storage for caching (e.g., c8gd, c5d, m5d, i3, i4i families). Generic instance types without local storage (t3, m5, c5) are not supported and will fail at plan time.
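As a starting point, a minimal main.tf might look like the sketch below. The deployment_name input is documented above; the Brainstore instance-type variable name is an assumption here, so check the module's input variables for the exact name:

```hcl
module "braintrust-data-plane" {
  source = "github.com/braintrustdata/terraform-aws-braintrust-data-plane"

  # Must be unique within the AWS account, max 18 characters
  deployment_name = "braintrust"

  # Variable name is an assumption -- verify against the module's inputs.
  # Brainstore nodes need local NVMe storage, so use a family such as
  # c8gd/c5d/m5d/i3/i4i, not t3/m5/c5.
  brainstore_instance_type = "c8gd.4xlarge"

  # ... other configuration ...
}
```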

2. Initialize AWS account

If you’re using a new AWS account, run the create-service-linked-roles.sh script to create all necessary IAM service-linked roles for the deployment:
./scripts/create-service-linked-roles.sh

3. Configure Brainstore license

Your deployment includes Brainstore, a high-performance query engine for real-time trace ingestion. Brainstore requires a license key.
  1. Go to Settings > Data plane.
    Only organization owners can access this page. If you don’t see your data plane configuration, contact Braintrust to enable self-hosting.
  2. Copy your Brainstore license.
  3. Pass the key to Terraform. The recommended approach is to store the license key in AWS Secrets Manager and reference it using a Terraform data source:
    data "aws_secretsmanager_secret_version" "brainstore_license" {
      secret_id = "braintrust/brainstore-license-key"
    }
    
    Then pass data.aws_secretsmanager_secret_version.brainstore_license.secret_string as the brainstore_license_key value in the module. Alternatively, you can pass the key without storing it in Secrets Manager:
    • Set TF_VAR_brainstore_license_key=your-key in your environment.
    • Pass it via command line: terraform apply -var 'brainstore_license_key=your-key'.
    • Add it to an uncommitted terraform.tfvars or .auto.tfvars file.
    Do not commit the license key to your git repository.
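Putting the recommended Secrets Manager approach together, the module wiring might look like this (the secret name matches the data source example above; the module label follows the examples later in this guide):

```hcl
data "aws_secretsmanager_secret_version" "brainstore_license" {
  secret_id = "braintrust/brainstore-license-key"
}

module "braintrust-data-plane" {
  source = "github.com/braintrustdata/terraform-aws-braintrust-data-plane"

  # Read the license key at plan time instead of committing it to git
  brainstore_license_key = data.aws_secretsmanager_secret_version.brainstore_license.secret_string

  # ... other configuration ...
}
```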

4. Deploy the module

Initialize and apply the Terraform configuration:
terraform init
terraform apply
The first terraform apply may fail with transient errors such as ASG health check timeouts (while instances are still booting) or Lambda rate limits. Re-running terraform apply resolves these.
This will create all necessary AWS resources including:
  • Two isolated VPCs:
    • Main VPC: Hosts Braintrust services (API, database, Redis, Brainstore)
    • Quarantine VPC: Runs user-defined functions (scorers, tools) in network isolation. The quarantine VPC creates ~30 Lambda functions across multiple runtimes and is required for most production use cases.
  • Lambda functions for the Braintrust API
  • Public CloudFront endpoint and API Gateway
  • EC2 Auto-scaling group for Brainstore
  • PostgreSQL database, Redis cache, and S3 buckets
  • KMS key for encryption

5. Get your API URL

After the deployment completes, get your API URL from the Terraform outputs:
terraform output
You should see output similar to:
api_url = "https://dx6atff6gocr6.cloudfront.net"
Save this URL. You’ll need it to configure your Braintrust organization.

6. Configure your organization

Connect your Braintrust organization to your newly deployed data plane.
Changing your live organization’s API URL can disrupt access for existing users. If you are testing, create a new Braintrust organization for your data plane instead of updating your live environment.
  1. Go to Settings > Data plane.
    Only organization owners can access this page.
  2. In the API URL area, select Edit.
  3. Enter the API URL from the previous step.
  4. Leave the other fields blank.
  5. If your deployment is accessed through a VPN or is otherwise on a private network (not accessible from the public internet), enable Data plane is on a private network. This enables Chrome’s Local Network Access permission handling, which is required for browser access to private network resources. When enabled, Chrome will prompt users to grant permission for the Braintrust UI to access your self-hosted data plane. See Grant browser permissions for details.
  6. Select Save.
The UI will automatically test the connection to your new data plane. Verify that the ping to each endpoint is successful.

Debug issues

If you encounter issues, you can use the dump-logs.sh script to collect logs:
./scripts/dump-logs.sh <deployment_name> [--minutes N] [--service <svc1,svc2,...|all>]
For example, to dump 60 minutes of logs for the bt-sandbox deployment, run:
./scripts/dump-logs.sh bt-sandbox
This will save logs for all services to a logs-<deployment_name> directory, which you can share with the Braintrust team for debugging.

Customize the deployment

Use an existing VPC

To deploy into an existing VPC instead of creating a new one, set create_vpc = false and provide your VPC and subnet IDs:
module "braintrust-data-plane" {
  source = "github.com/braintrustdata/terraform-aws-braintrust-data-plane"

  create_vpc = false

  existing_vpc_id              = "vpc-xxxxxxxxx"
  existing_private_subnet_1_id = "subnet-xxxxxxxxx"
  existing_private_subnet_2_id = "subnet-xxxxxxxxx"
  existing_private_subnet_3_id = "subnet-xxxxxxxxx"
  existing_public_subnet_1_id  = "subnet-xxxxxxxxx"

  # ... other configuration ...
}
Your existing VPC must have:
  • At least 3 private subnets across different availability zones
  • At least 1 public subnet
  • Internet and NAT gateways with properly configured route tables
The module manages its own security groups. To also use an existing quarantine VPC, set existing_quarantine_vpc_id and the corresponding existing_quarantine_private_subnet_*_id variables.

Use custom tags

To apply custom tags to all resources, pass the custom_tags parameter to the Braintrust module:
module "braintrust-data-plane" {
  source = "github.com/braintrustdata/terraform-aws-braintrust-data-plane"

  custom_tags = {
    Environment = "production"
    Team        = "ml-platform"
    CostCenter  = "engineering"
  }

  # ... other configuration ...
}
These tags will be applied to all resources including Brainstore EC2 instances, volumes, and ENIs. The deployment name variable automatically prefixes resource names and applies a BraintrustDeploymentName tag across all resources.
Use the custom_tags parameter instead of the AWS provider’s default_tags configuration. Due to a Terraform limitation, default_tags are not applied to resources that use launch templates, such as Brainstore instances.

Redis instance sizing

Important for AWS: Avoid using burstable Redis instances (t-family instances like cache.t4g.micro) in production. These instances use CPU credits that can be exhausted during high-load periods, leading to performance throttling. Instead, use non-burstable instances like cache.r7g.large, cache.r6g.medium, or cache.r5.large for predictable performance. Even if these instances seem oversized initially, they provide consistent performance without the risk of CPU credit exhaustion.
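A minimal sketch of the non-burstable configuration, assuming the module exposes a redis_instance_type input (check the module's variables for the exact name):

```hcl
module "braintrust-data-plane" {
  source = "github.com/braintrustdata/terraform-aws-braintrust-data-plane"

  # Variable name is an assumption -- verify against the module's inputs.
  # An r-family node avoids the CPU-credit throttling of t-family nodes.
  redis_instance_type = "cache.r7g.large"

  # ... other configuration ...
}
```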

VPC connectivity

To connect Braintrust’s VPC to other internal resources (like an LLM gateway), use one of the following approaches:
  • Create a VPC Endpoint Service for your internal resource, then create a VPC Interface Endpoint inside the Braintrust “Quarantine” VPC
  • Set up VPC peering with the Braintrust “Quarantine” VPC
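The VPC Endpoint Service approach can be sketched as follows. The NLB reference is your own resource, and the quarantine VPC/subnet outputs shown here are hypothetical names, so check `terraform output` for what the module actually exposes:

```hcl
# Expose the internal LLM gateway (fronted by your NLB) as an endpoint service
resource "aws_vpc_endpoint_service" "llm_gateway" {
  acceptance_required        = false
  network_load_balancer_arns = [aws_lb.llm_gateway.arn]
}

# Interface endpoint inside the Braintrust quarantine VPC.
# The module output names below are assumptions for illustration.
resource "aws_vpc_endpoint" "llm_gateway" {
  vpc_id            = module.braintrust-data-plane.quarantine_vpc_id
  service_name      = aws_vpc_endpoint_service.llm_gateway.service_name
  vpc_endpoint_type = "Interface"
  subnet_ids        = module.braintrust-data-plane.quarantine_private_subnet_ids
}
```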

Lambda memory limits

The API Handler and AI Proxy Lambda functions default to 10240 MB (the Lambda maximum). You can reduce these to lower costs in environments with tighter memory quotas, though Braintrust recommends keeping the defaults for production workloads.
module "braintrust-data-plane" {
  source = "github.com/braintrustdata/terraform-aws-braintrust-data-plane"

  api_handler_memory_limit = 10240  # default, valid range 1–10240 MB
  ai_proxy_memory_limit    = 10240  # default, valid range 1–10240 MB

  # ... other configuration ...
}
The brainstore_wal_footer_version variable controls the WAL footer format written by Brainstore. It defaults to "" (unset) and should not be changed outside of a planned upgrade sequence.
Do not set brainstore_wal_footer_version without following the upgrade guide. Setting it at the same time as a version bump can cause Brainstore nodes still rolling out to fail to read the new WAL format.
See Enable efficient WAL format in the v2.0 upgrade guide for the correct migration steps.

KMS encryption

When kms_key_arn is configured, all managed S3 buckets (Brainstore, code-bundle, and Lambda responses) enforce blocked_encryption_types = ["NONE"], preventing unencrypted object uploads. This policy is applied automatically as of v4.5.0 — upgrading from an earlier version will include this change in your terraform plan.
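Enabling this is a single input on the module; the key ARN below is a placeholder for your own customer-managed key:

```hcl
module "braintrust-data-plane" {
  source = "github.com/braintrustdata/terraform-aws-braintrust-data-plane"

  # Placeholder ARN -- substitute your own customer-managed KMS key.
  # Managed S3 buckets will then reject unencrypted object uploads.
  kms_key_arn = "arn:aws:kms:us-east-1:123456789012:key/your-key-id"

  # ... other configuration ...
}
```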

AI Proxy CORS headers

As of v4.5.0, the x-bt-use-gateway header is included in the AI Proxy Lambda function URL CORS allowed headers. Browser clients can send this header to control gateway routing without triggering a CORS preflight rejection. No configuration is required.

Next steps