DEV Community

Takashi Fujita


🛠️ A Practical Meta-MCP Architecture for Claude Code: Compressing 60+ Tools into Just Two

Introduction

When using multiple MCP servers in Claude Code (e.g., Notion, Playwright, Chrome DevTools),
the tool definitions alone can consume 36.6k tokens of context.

To reduce this overhead, I built a meta MCP server called Code Mode, which exposes only two tools:

  • search_docs: search API definitions of child MCP servers
  • execute_code: run TypeScript that calls MCP bindings

This reduced the token usage from 36.6k → 4.4k tokens (−88%).

This post explains the approach, the architecture, and the practical pitfalls I encountered when implementing it.


Background: Why a Meta-MCP?

The design is inspired by Anthropic's engineering post:

🔗 Code execution with MCP
https://www.anthropic.com/engineering/code-execution-with-mcp

Instead of giving the LLM a long list of explicit tool definitions,
you give it just enough machinery to:

  1. look up available API bindings, and
  2. execute code that uses them.

For this experiment, I connected three child MCP servers:

  • Notion MCP
  • Playwright MCP
  • Chrome DevTools MCP

Playwright and Chrome DevTools overlap, but I wanted to compare their behavior.
(Yes, this inflates the original token count, but the reduction effect remains the same.)


The MCP Context Problem

Here's what Claude Code loads when these MCP servers are enabled:

MCP tools: 36.6k tokens (18.3%)
└ chrome-devtools (26 tools)
└ playwright (22 tools)
└ notion (15 tools)

More than 60 tools are injected into the conversation context every time.

On a 200k context window, 18% was being consumed by tool definitions alone,
leaving less room for code, logs, or iterative reasoning.


The Code Mode Approach

Code Mode exposes only two tools:

Code Mode MCP
├─ search_docs: query API schemas of child MCP servers
└─ execute_code: run TypeScript that calls MCP bindings

The LLM workflow becomes:

  1. search_docs → find the API
  2. execute_code → perform the operation

search_docs

// List all bindings
await search_docs({})
// => notion, playwright, chrome, ...

// Search by keyword
await search_docs({ query: "navigate" })
// => returns schema for playwright.browser_navigate
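
To make the lookup behaviour concrete, here is a minimal sketch of how a keyword search over child tool docs could work. It is not the actual Code Mode implementation: the ToolDoc shape, the searchDocs helper, the descriptions, and the Notion entry are all hypothetical, and a real index would be built from each child server's tool list.

```typescript
// Hypothetical shape for one documented tool; not the actual Code Mode type.
interface ToolDoc {
  server: string;      // binding name, e.g. "playwright"
  name: string;        // tool name on that server
  description: string; // illustrative text, not the real schema
}

// Illustrative entries only.
const docs: ToolDoc[] = [
  { server: "playwright", name: "browser_navigate", description: "Navigate to a URL" },
  { server: "playwright", name: "browser_snapshot", description: "Capture a page snapshot" },
  { server: "notion", name: "example_search", description: "Hypothetical Notion search tool" },
];

// No query: list binding names. With a query: list matching "server.tool" entries.
function searchDocs(index: ToolDoc[], query?: string): string[] {
  if (!query) {
    return Array.from(new Set(index.map((d) => d.server)));
  }
  const q = query.toLowerCase();
  return index
    .filter((d) => `${d.name} ${d.description}`.toLowerCase().includes(q))
    .map((d) => `${d.server}.${d.name}`);
}

console.log(searchDocs(docs).join(", "));             // playwright, notion
console.log(searchDocs(docs, "navigate").join(", ")); // playwright.browser_navigate
```

Returning only names for the empty query keeps the first response small; the full schema is fetched only for the tools the model actually asks about.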

execute_code

The core of Code Mode.

await execute_code({
  code: `
    await playwright.browser_navigate({ url: "https://example.com" });
    const snapshot = await playwright.browser_snapshot({});
    console.log(snapshot);
  `,
})

All available bindings (playwright, notion, chrome, etc.)
are injected into the execution environment automatically.
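
One way such bindings could be injected is with a Proxy that turns any property access into a relayed tool call. This is a sketch under assumptions, not the author's code: createBinding, the CallTool signature, and the fake relay are hypothetical names introduced for illustration.

```typescript
// Relay function supplied by the host; this signature is an assumption.
type CallTool = (server: string, tool: string, args: unknown) => Promise<unknown>;
type Binding = Record<string, (args?: unknown) => Promise<unknown>>;

// Builds an object where binding.some_tool(args) forwards to callTool.
function createBinding(server: string, callTool: CallTool): Binding {
  return new Proxy({} as Binding, {
    get(_target, prop) {
      // Ignore symbols and "then" so the binding is never mistaken for a promise.
      if (typeof prop !== "string" || prop === "then") return undefined;
      return (args: unknown = {}) => callTool(server, prop, args);
    },
  });
}

// Fake relay that records calls instead of contacting a real server.
const recorded: string[] = [];
const fakeRelay: CallTool = async (server, tool, args) => {
  recorded.push(`${server}.${tool} ${JSON.stringify(args)}`);
  return { ok: true };
};

const playwright = createBinding("playwright", fakeRelay);
playwright.browser_navigate({ url: "https://example.com" }).then(() => {
  console.log(recorded);
});
```

With a Proxy, the sandbox does not need a generated method per tool, so adding a child server does not change the injected code.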


Implementation Notes

Choosing Deno for the Execution Environment

execute_code needs to run arbitrary TypeScript generated by the LLM,
but inside a sandbox.

Deno was the best fit because:

  1. TypeScript runs without transpilation
  2. Worker API is built-in and behaves like browser Web Workers
  3. Permission model is strong and easy to restrict
  4. The MCP TypeScript SDK works in Deno

Here's how the sandbox worker is created:

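// Spawn the sandbox worker with every Deno permission denied.
const worker = new Worker(workerUrl, {
  type: "module",
  deno: {
    permissions: {
      net: false,   // no network access
      read: false,  // no file reads
      write: false, // no file writes
      env: false,   // no environment variables
      run: false,   // no subprocesses
      ffi: false,   // no native libraries
    },
  },
});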

Execution Constraints

Item       Constraint
Timeout    30 seconds
Allowed    console.log, basic JS/TS, MCP bindings
Forbidden  fetch, file access, external network
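
The timeout row can be sketched as a race between the running code and a timer. This is an illustration only: runWithTimeout and its onTimeout callback are hypothetical names, and in a real host the callback is where worker.terminate() would go.

```typescript
// Rejects if `task` has not settled within `ms` milliseconds.
// `onTimeout` is the hook where a real host would terminate the worker.
function runWithTimeout<T>(task: Promise<T>, ms: number, onTimeout: () => void): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => {
      onTimeout();
      reject(new Error(`Execution timed out after ${ms} ms`));
    }, ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([task, timeout]).finally(() => clearTimeout(timer));
}

// Demo with a task that never settles and a short limit instead of 30 seconds.
const neverSettles = new Promise<string>(() => {});
runWithTimeout(neverSettles, 50, () => console.log("terminating worker"))
  .catch((err: Error) => console.log(err.message));
```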

Everything must go through MCP tool calls, which are relayed to the parent process.
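
A relay like this is commonly built as a small request/response protocol over postMessage. The sketch below assumes message shapes (ToolRequest, ToolResponse) and a ToolRelayClient class that are purely illustrative and not taken from the Code Mode source; the post callback stands in for the worker's postMessage.

```typescript
// Message shapes are assumptions for illustration, not the actual protocol.
interface ToolRequest { id: number; server: string; tool: string; args: unknown }
interface ToolResponse { id: number; result?: unknown; error?: string }

type Pending = { resolve: (value: unknown) => void; reject: (error: Error) => void };

class ToolRelayClient {
  private nextId = 1;
  private pending = new Map<number, Pending>();
  private post: (msg: ToolRequest) => void;

  // `post` stands in for the worker's postMessage to the parent.
  constructor(post: (msg: ToolRequest) => void) {
    this.post = post;
  }

  // Sends a request and returns a promise that settles when the reply arrives.
  call(server: string, tool: string, args: unknown): Promise<unknown> {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.post({ id, server, tool, args });
    });
  }

  // Invoked from the worker's message handler when the parent replies.
  handleResponse(msg: ToolResponse): void {
    const entry = this.pending.get(msg.id);
    if (!entry) return;
    this.pending.delete(msg.id);
    if (msg.error !== undefined) entry.reject(new Error(msg.error));
    else entry.resolve(msg.result);
  }
}

// Loopback demo: the "parent" answers each request immediately.
const client: ToolRelayClient = new ToolRelayClient((req) => {
  client.handleResponse({ id: req.id, result: `called ${req.server}.${req.tool}` });
});
client.call("playwright", "browser_snapshot", {}).then((r) => console.log(r));
```

Matching replies by id lets several tool calls be in flight at once, which matters when the generated code uses Promise.all.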


Connecting Child MCP Servers

Architecture:

Claude Code
↓ stdio
Code Mode (search_docs / execute_code)
├─ Notion MCP (SSE via mcp-remote)
├─ Chrome DevTools MCP (stdio)
└─ Playwright MCP (stdio)

If a child server fails to connect:

  • It is omitted from search_docs
  • Any attempt to call its bindings results in Unknown tool

Currently, reconnection is manual (/mcp restart).
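
The failure behaviour described here can be sketched with Promise.allSettled: connect every child in parallel, keep the ones that succeed, and reject calls to the rest. The ChildServer interface, connectAll, callTool, and the exact error message format are assumptions made for this example.

```typescript
// Hypothetical handle for a connected child server.
interface ChildServer {
  name: string;
  callTool(tool: string, args: unknown): Promise<unknown>;
}
type Connector = () => Promise<ChildServer>;

// Connect all children in parallel and keep only those that succeed.
async function connectAll(connectors: Record<string, Connector>): Promise<Map<string, ChildServer>> {
  const names = Object.keys(connectors);
  const results = await Promise.allSettled(names.map((n) => connectors[n]()));
  const connected = new Map<string, ChildServer>();
  results.forEach((r, i) => {
    if (r.status === "fulfilled") connected.set(names[i], r.value);
    else console.error(`skipping ${names[i]}: ${String(r.reason)}`);
  });
  return connected;
}

// Calls to a server that never connected fail with an "Unknown tool" error.
async function callTool(servers: Map<string, ChildServer>, server: string, tool: string, args: unknown): Promise<unknown> {
  const child = servers.get(server);
  if (!child) throw new Error(`Unknown tool: ${server}.${tool}`);
  return child.callTool(tool, args);
}

// Demo with fake connectors: one succeeds, one fails.
connectAll({
  playwright: async () => ({ name: "playwright", callTool: async () => "ok" }),
  notion: async () => { throw new Error("OAuth token expired"); },
}).then((servers) => console.log(Array.from(servers.keys())));
```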


Pitfall: "os error 35" When Nesting MCP Servers

When Code Mode launched another MCP server via StdioClientTransport,
I repeatedly hit:

Resource temporarily unavailable (os error 35)

Root Cause

StdioClientTransport defaults to:

stderr: "inherit"

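Since the meta server itself communicates with Claude Code via stdio,
a child spawned with inherited stderr shares the parent's stderr handle, and that sharing interferes with the parent's own I/O.
(On macOS, os error 35 is EAGAIN, "Resource temporarily unavailable".)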

Fix

Always use:

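import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";
// In Deno, the specifier may need an npm: prefix or an import-map entry.

const transport = new StdioClientTransport({
  command: "npx",
  args: ["-y", "@playwright/mcp@latest"],
  stderr: "pipe",  // REQUIRED for nested MCP
})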

Do this for all child MCP servers.
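
One way to avoid forgetting the option is to build every child's transport options through a single helper. The sketch below is self-contained and does not import the SDK: ChildTransportOptions is a local stand-in type that mirrors the option names in the snippet above, childTransportOptions is a hypothetical helper, and the Chrome DevTools package name is an assumption.

```typescript
// Local stand-in that mirrors the option names used above; not the SDK's own type.
interface ChildTransportOptions {
  command: string;
  args: string[];
  stderr: "pipe";
}

// Every child goes through this helper, so stderr is always piped.
function childTransportOptions(command: string, args: string[]): ChildTransportOptions {
  return { command, args, stderr: "pipe" };
}

const children: Record<string, ChildTransportOptions> = {
  playwright: childTransportOptions("npx", ["-y", "@playwright/mcp@latest"]),
  // Package name below is an assumption; substitute whatever your setup uses.
  chrome: childTransportOptions("npx", ["-y", "chrome-devtools-mcp@latest"]),
};

for (const [name, options] of Object.entries(children)) {
  console.log(`${name}: stderr=${options.stderr}`);
}
```

Each options object can then be passed to the StdioClientTransport constructor for that child.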


Reducing the Tool Definition Size

Originally, the execute_code tool description included:

  • all available bindings
  • code examples
  • server lists

It reached 1.3k tokens by itself.

After simplification:

description: "Execute TypeScript code that uses child MCP servers. Use search_docs to inspect available bindings."

→ 687 tokens.

The tool definition size now stays constant regardless of how many child MCP servers exist.
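
To illustrate why the definition size does not depend on the number of child servers, here is a sketch of the two definitions in the usual MCP tool shape (name, description, inputSchema). The execute_code description is the simplified one quoted above; the search_docs description, the input schemas, and the buildToolDefinitions helper are assumptions.

```typescript
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string }>;
    required?: string[];
  };
}

// The child list is deliberately unused: definitions never enumerate bindings.
function buildToolDefinitions(_childServers: string[]): ToolDefinition[] {
  return [
    {
      name: "search_docs",
      description: "Search API definitions of child MCP servers.",
      inputSchema: { type: "object", properties: { query: { type: "string" } } },
    },
    {
      name: "execute_code",
      description:
        "Execute TypeScript code that uses child MCP servers. Use search_docs to inspect available bindings.",
      inputSchema: { type: "object", properties: { code: { type: "string" } }, required: ["code"] },
    },
  ];
}

const oneChild = JSON.stringify(buildToolDefinitions(["notion"]));
const threeChildren = JSON.stringify(buildToolDefinitions(["notion", "playwright", "chrome"]));
console.log(oneChild.length === threeChildren.length); // true
```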


Results

Metric            Before   After
MCP tool tokens   36.6k    4.4k
Number of tools   63       2
Context usage     18.3%    2.2%

The difference in conversational "breathing room" is substantial.
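
For anyone who wants to check the figures, the percentages follow directly from the token counts and the 200k context window mentioned earlier. The helper names below are just for this example.

```typescript
const CONTEXT_WINDOW = 200_000; // tokens, as stated in the post

// Share of the context window taken by a given number of tokens, in percent.
function contextShare(tokens: number, windowSize: number = CONTEXT_WINDOW): number {
  return (tokens / windowSize) * 100;
}

// Relative reduction from `before` to `after`, in percent.
function reduction(before: number, after: number): number {
  return ((before - after) / before) * 100;
}

const before = 36_600;
const after = 4_400;
console.log(contextShare(before).toFixed(1));     // 18.3
console.log(contextShare(after).toFixed(1));      // 2.2
console.log(reduction(before, after).toFixed(0)); // 88
```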


Limitations

1. Less intuitive for an LLM

Normal MCP tools are self-descriptive.
Code Mode requires:

  1. searching docs
  2. writing code

LLMs need a short introduction before they can use it effectively.


2. Hardcoded configuration

Code Mode is not a plug-and-play npm package.

  • Child server paths are hardcoded
  • Users must adjust the code to their environment

This is an approach, not a product.


3. Weak reconnection handling

Once a child MCP disconnects (OAuth expiry, browser closure, etc.):

  • There is no auto-retry
  • Manual /mcp restart is required

This can be improved.
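
One possible direction, sketched here rather than implemented in Code Mode, is to wrap each child connection in a retry loop with exponential backoff. The connectWithRetry helper, its defaults, and the flaky connector are hypothetical.

```typescript
// Retry `connect` up to `attempts` times, doubling the delay after each failure.
async function connectWithRetry<T>(
  connect: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        // Wait baseDelayMs, then 2x, then 4x, and so on.
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}

// Demo: a fake connector that fails twice before succeeding.
let tries = 0;
const flaky = async (): Promise<string> => {
  tries++;
  if (tries < 3) throw new Error("connection refused");
  return "connected";
};
connectWithRetry(flaky, 3, 10).then((r) => console.log(r, "after", tries, "attempts"));
```

A retry like this only helps with transient failures such as a closed browser being reopened; an expired OAuth token would still need a fresh login.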


Conclusion

  • MCP tool definitions consume significant context
  • A meta MCP server can compress dozens of tools into two
  • Deno provides a robust sandbox for executing TypeScript safely
  • stderr: "pipe" is essential to avoid I/O conflicts in nested MCP setups
  • The approach works well but isn't universal or beginner-friendly

The full implementation (Deno + MCP SDK) is available here:

👉 https://gist.github.com/tgfjt/a3716f7b1651d7fd7df2d769efdf644e

Hope this helps anyone experimenting with multi-MCP setups or context-efficient tooling!

