The Problem
You’ve built your LLM application. It works.
Now you want better observability, load balancing, or caching.
Most solutions require:
- Rewriting your API calls
- Learning new SDKs
- Refactoring working code
- Testing everything again
We built Bifrost to be different: drop it in, change one URL, done.
OpenAI-Compatible API
Bifrost speaks OpenAI’s API format.
If your code works with OpenAI, it works with Bifrost.
Before
from openai import OpenAI

client = OpenAI(api_key="sk-...")

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)
After
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/openai",  # Only change
    api_key="sk-..."                          # Your actual API key
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)
One line changed. That’s it.
Works With Every Major Framework
Because Bifrost is OpenAI-compatible, it works with any framework that supports OpenAI.
LangChain
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(
    openai_api_base="http://localhost:8080/langchain",
    openai_api_key="sk-..."
)
LlamaIndex
from llama_index.llms import OpenAI
llm = OpenAI(
    api_base="http://localhost:8080/openai",
    api_key="sk-..."
)
LiteLLM
import litellm
response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    base_url="http://localhost:8080/litellm"
)
Anthropic SDK
import anthropic
client = anthropic.Anthropic(
    base_url="http://localhost:8080/anthropic",
    api_key="sk-ant-..."
)
Same pattern everywhere: change the base URL, keep everything else.
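Under the hood, every one of these SDKs just sends OpenAI-format HTTP requests to whatever base URL you point it at, which is why any OpenAI-compatible client works. A minimal sketch with Python's requests library, assuming the /openai path from the examples above; the full path shown is the one the OpenAI SDK would derive from that base URL, so adjust it if your deployment differs:

import requests

# Standard OpenAI chat-completions payload, sent straight to Bifrost's OpenAI-compatible endpoint.
resp = requests.post(
    "http://localhost:8080/openai/chat/completions",
    headers={"Authorization": "Bearer sk-..."},
    json={
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])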
Multiple Providers, One Interface
Bifrost routes to multiple providers through the same API.
Configuration
{
  "providers": [
    {
      "name": "openai",
      "api_key": "sk-...",
      "models": ["gpt-4", "gpt-4o-mini"]
    },
    {
      "name": "anthropic",
      "api_key": "sk-ant-...",
      "models": ["claude-sonnet-4", "claude-opus-4"]
    },
    {
      "name": "azure",
      "api_key": "...",
      "endpoint": "https://your-resource.openai.azure.com"
    }
  ]
}
Your code
# OpenAI
response = client.chat.completions.create(
    model="gpt-4",  # Routes to OpenAI
    messages=[...]
)

# Anthropic (same code structure)
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",  # Routes to Anthropic
    messages=[...]
)
Switch providers by changing the model name.
No refactoring required.
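As a sketch of what that looks like in practice, the provider becomes a plain string argument. The ask helper below is hypothetical and only exists to show that nothing else in the call changes between providers:

from openai import OpenAI

# One client, pointed at Bifrost; only the model string varies per call.
client = OpenAI(base_url="http://localhost:8080/openai", api_key="sk-...")

def ask(model: str, prompt: str) -> str:
    """Send the same request shape to whichever provider the model name routes to."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(ask("gpt-4", "Hello"))                      # routes to OpenAI
print(ask("anthropic/claude-sonnet-4", "Hello"))  # routes to Anthropic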
Built-In Observability Integration
Bifrost integrates with observability platforms out of the box.
Maxim AI
{
  "plugins": [
    {
      "name": "maxim",
      "config": {
        "api_key": "your-maxim-key",
        "repo_id": "your-repo-id"
      }
    }
  ]
}
Every request is automatically traced to the Maxim dashboard.
Zero instrumentation code.
Prometheus
{
  "metrics": {
    "enabled": true,
    "port": 9090
  }
}
Metrics exposed at /metrics.
Plug into your existing Prometheus setup.
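A quick sanity check that the exporter is up, assuming the metrics listener runs on the configured port 9090 (if your deployment serves /metrics on the main 8080 port instead, adjust the URL):

import requests

# Fetch the Prometheus exposition text and show a few sample series.
body = requests.get("http://localhost:9090/metrics", timeout=5).text
samples = [line for line in body.splitlines() if line and not line.startswith("#")]
print("\n".join(samples[:10]))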
OpenTelemetry
{
  "otel": {
    "enabled": true,
    "endpoint": "http://your-collector:4318"
  }
}
Standard OTLP export to any OpenTelemetry collector.
Framework-Specific Integrations
Claude Code
Update your Claude Code config:
{
  "baseURL": "http://localhost:8080/openai",
  "provider": "anthropic"
}
All Claude Code requests now flow through Bifrost.
Track token usage and costs, and cache responses automatically.
LibreChat
Add to librechat.yaml:
custom:
  - name: "Bifrost"
    apiKey: "dummy"
    baseURL: "http://localhost:8080/v1"
    models:
      default: ["openai/gpt-4o"]
Universal model access across all configured providers.
MCP (Model Context Protocol) Support
Bifrost supports MCP for tool calling and context management.
Configure MCP servers
{
  "mcp": {
    "servers": [
      {
        "name": "filesystem",
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem"]
      },
      {
        "name": "brave-search",
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-brave-search"],
        "env": {
          "BRAVE_API_KEY": "your-key"
        }
      }
    ]
  }
}
Your LLM calls automatically gain access to MCP tools.
No manual tool definitions required.
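From the client side nothing changes. Assuming the MCP servers above are configured and Bifrost supplies their tools to the model as described, a plain chat completion can exercise them; the prompt below is only an illustration:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/openai", api_key="sk-...")

# No tools are defined here; the MCP servers configured in Bifrost provide them.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "List the files in the current directory."}],
)
print(response.choices[0].message.content)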
Deployment Options
Docker
docker run -p 8080:8080 \
  -e OPENAI_API_KEY=sk-... \
  maximhq/bifrost:latest
Docker Compose
services:
  bifrost:
    image: maximhq/bifrost:latest
    ports:
      - "8080:8080"
    environment:
      - OPENAI_API_KEY=sk-...
    volumes:
      - ./data:/app/data
Kubernetes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bifrost
spec:
  replicas: 3
  selector:
    matchLabels:
      app: bifrost
  template:
    metadata:
      labels:
        app: bifrost
    spec:
      containers:
        - name: bifrost
          image: maximhq/bifrost:latest
          ports:
            - containerPort: 8080
Terraform examples are available in the docs.
Real Integration Example
Before (Direct OpenAI)
import openai
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent
openai.api_key = "sk-..."
llm = ChatOpenAI(model="gpt-4")
agent = initialize_agent(tools, llm)
# No observability
# No caching
# No load balancing
# No failover
After (Through Bifrost)
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent
llm = ChatOpenAI(
    model="gpt-4",
    openai_api_base="http://localhost:8080/langchain"
)
agent = initialize_agent(tools, llm)
# Automatic observability ✓
# Semantic caching ✓
# Multi-key load balancing ✓
# Provider failover ✓
One line changed. All features enabled.
Migration Checklist
1. Install Bifrost
docker run -p 8080:8080 -v $(pwd)/data:/app/data maximhq/bifrost
2. Add API keys
- Visit http://localhost:8080
- Add your provider keys
3. Update base URL
OpenAI SDK:
client = OpenAI(base_url="http://localhost:8080/openai")
LangChain:
openai_api_base = "http://localhost:8080/langchain"
4. Test one request
Verify it works and check the dashboard (a sample request is sketched after this checklist).
5. Deploy
Everything else stays the same.
Total migration time: ~10 minutes.
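For step 4, here is a minimal test, assuming you added an OpenAI key in step 2 (gpt-4o-mini is just a cheap model from the earlier provider config):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/openai", api_key="sk-...")

# A single small request; if this prints a reply, traffic is flowing through Bifrost.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say 'ok' if you can read this."}],
)
print(response.choices[0].message.content)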
Try It Yourself
git clone https://github.com/maximhq/bifrost
cd bifrost
docker compose up
Full integration examples for LangChain, LiteLLM, and more are available in the GitHub repo.
The Bottom Line
Bifrost integrates with your existing stack in minutes:
- OpenAI-compatible API (works everywhere)
- Change one URL, keep all your code
- Multi-provider support through one interface
- Built-in observability with zero instrumentation
No refactoring. No new SDKs. Just drop it in.
Built by the team at Maxim AI — we also build evaluation and observability tools for production AI agents.