The Problem
You’ve built your LLM application. It works.
Now you want better observability, load balancing, or caching.
Most solutions require:
- Rewriting your API calls
- Learning new SDKs
- Refactoring working code
- Testing everything again
We built Bifrost to be different: drop it in, change one URL, done.
OpenAI-Compatible API
Bifrost speaks OpenAI’s API format.
If your code works with OpenAI, it works with Bifrost.
Before
from openai import OpenAI

client = OpenAI(api_key="sk-...")

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)
After
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/openai",  # Only change
    api_key="sk-..."                          # Your actual API key
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)
One line changed. That’s it.
Works With Every Major Framework
Because Bifrost is OpenAI-compatible, it works with any framework that supports OpenAI.
LangChain
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(
    openai_api_base="http://localhost:8080/langchain",
    openai_api_key="sk-..."
)
LlamaIndex
from llama_index.llms import OpenAI
llm = OpenAI(
    api_base="http://localhost:8080/openai",
    api_key="sk-..."
)
LiteLLM
import litellm
response = litellm.completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    base_url="http://localhost:8080/litellm"
)
Anthropic SDK
import anthropic
client = anthropic.Anthropic(
    base_url="http://localhost:8080/anthropic",
    api_key="sk-ant-..."
)
Same pattern everywhere: change the base URL, keep everything else.
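Under the hood, every one of these SDKs just sends OpenAI-format HTTP requests to whatever base URL you point it at, which is why any OpenAI-compatible client works. A minimal sketch with Python's requests library, assuming the /openai path from the examples above; the full path shown is the one the OpenAI SDK would derive from that base URL, so adjust it if your deployment differs:

import requests

# Standard OpenAI chat-completions payload, sent straight to Bifrost's OpenAI-compatible endpoint.
resp = requests.post(
    "http://localhost:8080/openai/chat/completions",
    headers={"Authorization": "Bearer sk-..."},
    json={
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])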
Multiple Providers, One Interface
Bifrost routes to multiple providers through the same API.
Configuration
{
  "providers": [
    {
      "name": "openai",
      "api_key": "sk-...",
      "models": ["gpt-4", "gpt-4o-mini"]
    },
    {
      "name": "anthropic",
      "api_key": "sk-ant-...",
      "models": ["claude-sonnet-4", "claude-opus-4"]
    },
    {
      "name": "azure",
      "api_key": "...",
      "endpoint": "https://your-resource.openai.azure.com"
    }
  ]
}
Your code
# OpenAI
response = client.chat.completions.create(
    model="gpt-4",  # Routes to OpenAI
    messages=[...]
)

# Anthropic (same code structure)
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",  # Routes to Anthropic
    messages=[...]
)
Switch providers by changing the model name.
No refactoring required.
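As a sketch of what that looks like in practice, the provider becomes a plain string argument. The ask helper below is hypothetical and only exists to show that nothing else in the call changes between providers:

from openai import OpenAI

# One client, pointed at Bifrost; only the model string varies per call.
client = OpenAI(base_url="http://localhost:8080/openai", api_key="sk-...")

def ask(model: str, prompt: str) -> str:
    """Send the same request shape to whichever provider the model name routes to."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(ask("gpt-4", "Hello"))                      # routes to OpenAI
print(ask("anthropic/claude-sonnet-4", "Hello"))  # routes to Anthropic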
Built-In Observability Integration
Bifrost integrates with observability platforms out of the box.
Maxim AI
{
  "plugins": [
    {
      "name": "maxim",
      "config": {
        "api_key": "your-maxim-key",
        "repo_id": "your-repo-id"
      }
    }
  ]
}
Every request is automatically traced to the Maxim dashboard.
Zero instrumentation code.
Prometheus
{
  "metrics": {
    "enabled": true,
    "port": 9090
  }
}
Metrics exposed at /metrics.
Plug into your existing Prometheus setup.
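A quick sanity check that the exporter is up, assuming the metrics listener runs on the configured port 9090 (if your deployment serves /metrics on the main 8080 port instead, adjust the URL):

import requests

# Fetch the Prometheus exposition text and show a few sample series.
body = requests.get("http://localhost:9090/metrics", timeout=5).text
samples = [line for line in body.splitlines() if line and not line.startswith("#")]
print("\n".join(samples[:10]))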
OpenTelemetry
{
  "otel": {
    "enabled": true,
    "endpoint": "http://your-collector:4318"
  }
}
Standard OTLP export to any OpenTelemetry collector.
Framework-Specific Integrations
Claude Code
Update your Claude Code config:
{
  "baseURL": "http://localhost:8080/openai",
  "provider": "anthropic"
}
All Claude Code requests now flow through Bifrost.
Track token usage and costs, and cache responses automatically.
LibreChat
Add to librechat.yaml:
custom:
  - name: "Bifrost"
    apiKey: "dummy"
    baseURL: "http://localhost:8080/v1"
    models:
      default: ["openai/gpt-4o"]
Universal model access across all configured providers.
MCP (Model Context Protocol) Support
Bifrost supports MCP for tool calling and context management.
Configure MCP servers
{
  "mcp": {
    "servers": [
      {
        "name": "filesystem",
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem"]
      },
      {
        "name": "brave-search",
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-brave-search"],
        "env": {
          "BRAVE_API_KEY": "your-key"
        }
      }
    ]
  }
}
Your LLM calls automatically gain access to MCP tools.
No manual tool definitions required.
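From the client side nothing changes. Assuming the MCP servers above are configured and Bifrost supplies their tools to the model as described, a plain chat completion can exercise them; the prompt below is only an illustration:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/openai", api_key="sk-...")

# No tools are defined here; the MCP servers configured in Bifrost provide them.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "List the files in the current directory."}],
)
print(response.choices[0].message.content)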
Deployment Options
Docker
docker run -p 8080:8080 \
  -e OPENAI_API_KEY=sk-... \
  maximhq/bifrost:latest
Docker Compose
services:
  bifrost:
    image: maximhq/bifrost:latest
    ports:
      - "8080:8080"
    environment:
      - OPENAI_API_KEY=sk-...
    volumes:
      - ./data:/app/data
Kubernetes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bifrost
spec:
  replicas: 3
  selector:
    matchLabels:
      app: bifrost
  template:
    metadata:
      labels:
        app: bifrost
    spec:
      containers:
        - name: bifrost
          image: maximhq/bifrost:latest
          ports:
            - containerPort: 8080
Terraform examples are available in the docs.
Real Integration Example
Before (Direct OpenAI)
import openai
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent
openai.api_key = "sk-..."
llm = ChatOpenAI(model="gpt-4")
agent = initialize_agent(tools, llm)
# No observability
# No caching
# No load balancing
# No failover
After (Through Bifrost)
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent
llm = ChatOpenAI(
    model="gpt-4",
    openai_api_base="http://localhost:8080/langchain"
)
agent = initialize_agent(tools, llm)
# Automatic observability ✓
# Semantic caching ✓
# Multi-key load balancing ✓
# Provider failover ✓
One line changed. All features enabled.
Migration Checklist
1. Install Bifrost
docker run -p 8080:8080 -v $(pwd)/data:/app/data maximhq/bifrost
2. Add API keys
- Visit http://localhost:8080
- Add your provider keys
3. Update base URL
OpenAI SDK:
client = OpenAI(base_url="http://localhost:8080/openai")
LangChain:
openai_api_base = "http://localhost:8080/langchain"
4. Test one request
Verify it works and check the dashboard (a sample request is sketched after this checklist).
5. Deploy
Everything else stays the same.
Total migration time: ~10 minutes.
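For step 4, here is a minimal test, assuming you added an OpenAI key in step 2 (gpt-4o-mini is just a cheap model from the earlier provider config):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/openai", api_key="sk-...")

# A single small request; if this prints a reply, traffic is flowing through Bifrost.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say 'ok' if you can read this."}],
)
print(response.choices[0].message.content)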
Try It Yourself
git clone https://github.com/maximhq/bifrost
cd bifrost
docker compose up
Full integration examples for LangChain, LiteLLM, and more are available in the GitHub repo.
The Bottom Line
Bifrost integrates with your existing stack in minutes:
- OpenAI-compatible API (works everywhere)
- Change one URL, keep all your code
- Multi-provider support through one interface
- Built-in observability with zero instrumentation
No refactoring. No new SDKs. Just drop it in.
Built by the team at Maxim AI — we also build evaluation and observability tools for production AI agents.