Ten tools that gave Claude Code a brain
Claude Code kept hitting token limits on a 9-platform OTT monorepo. The CLAUDE.md file had grown
to 105k characters — 29 workflow documents, schema docs, and API docs all inlined — and the model
was saturating mid-session with no way to inspect live infrastructure state. The fix was to give it
tools instead of text.
The result is 10 custom MCP servers: TypeScript processes that Claude Code spawns on demand via
stdio transport, each connecting to a specific layer of the stack — Postgres, BullMQ, MinIO, Docker,
Prometheus, Grafana, feature flags, environment secrets, semantic code search, and project task state.
Together they reduced CLAUDE.md from 105k to 32k characters while adding live infrastructure
visibility that never existed before.
All 10 servers are wired into .ruler/mcp.json so every Claude Code session inside the monorepo loads
the full toolset automatically. Several servers point their env vars at Tailscale mesh IPs so a
session on the development iMac can reach Docker services on a remote Linux machine without SSH
tunnels.
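An entry in .ruler/mcp.json might look like the following sketch; the server name, file path, and Tailscale IP are placeholders for illustration, not the actual configuration:

```json
{
  "mcpServers": {
    "bullmq-mcp": {
      "command": "node",
      "args": ["tools/bullmq-mcp/dist/index.js"],
      "env": {
        "REDIS_HOST": "100.64.0.7"
      }
    }
  }
}
```

The env block is where a session on the iMac gets pointed at a remote machine's Tailscale IP instead of localhost.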
The iMac had 16 GB of RAM and all stateful Docker services — Postgres, Dragonfly, MinIO, Temporal,
NATS, Prometheus, Grafana, Flipt — lived on two machines accessible only over Tailscale SSH. Inlining
every piece of context into CLAUDE.md was the first solution attempted, and 105k characters was
where it broke. Third-party MCP servers (filesystem, sqlite) are generic — they cannot understand a
BullMQ queue topology, OTT task state, or the per-app environment variable requirements of a
monorepo with 6 distinct service boundaries.
The servers also had to be environment-variable–driven so DOCKER_HOST, DB_HOST, and REDIS_HOST
could be pointed at remote Tailscale IPs at session start with no changes to server source code.
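A minimal sketch of that pattern, assuming hypothetical helper names (resolveTarget and parseDockerHost are illustrative, not the actual server code):

```typescript
// Hypothetical helpers showing the env-driven host pattern; the same server
// source works locally or against a Tailscale peer with no code changes.
function resolveTarget(envVar: string, fallback: string): string {
  const value = process.env[envVar]?.trim();
  return value ? value : fallback;
}

// dockerode accepts DOCKER_HOST=ssh://user@host directly; this parser just
// illustrates the shape of the value being injected at session start.
function parseDockerHost(raw: string): { protocol: "ssh" | "local"; user?: string; host: string } {
  const match = raw.match(/^ssh:\/\/(?:([^@]+)@)?(.+)$/);
  if (match) return { protocol: "ssh", user: match[1], host: match[2] };
  return { protocol: "local", host: raw };
}
```

With this shape, swapping DB_HOST from localhost to a Tailscale IP is a one-line change in mcp.json, never a source edit.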
The debugging loop for failing transcoding jobs — browser, SSH terminal, editor — had to collapse
entirely into the coding session.
All 10 servers use StdioServerTransport — each is a long-running child process spawned by Claude
Code, communicating over stdin/stdout, then connecting out to its target service. Zero port conflicts,
no auth layer, and the server lifecycle is tied to the Claude Code session itself.
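Under the hood, stdio transport is newline-delimited JSON-RPC 2.0 on stdin/stdout. A toy sketch of the framing (the real servers rely on the MCP SDK's StdioServerTransport, which also performs the initialize handshake):

```typescript
// Toy sketch: one JSON-RPC 2.0 message per line, written to stdout and read
// from stdin. No listening socket means no port conflicts and no auth surface.
interface JsonRpcMessage {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: unknown;
}

function encodeMessage(msg: JsonRpcMessage): string {
  return JSON.stringify(msg) + "\n"; // one message per line
}

function decodeMessages(chunk: string): JsonRpcMessage[] {
  return chunk.split("\n").filter(Boolean).map((line) => JSON.parse(line));
}
```

Because the pipe is inherited from the parent process, the transport disappears the moment Claude Code exits, which is exactly the lifecycle coupling described above.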
Storage is per-server and purpose-fit. ott-context-mcp uses a local SQLite database with 17 tables
in WAL mode. code-embeddings-mcp stores 768-dimensional vectors in self-hosted Qdrant, generated
locally by Ollama's nomic-embed-text model. drizzle-studio-mcp connects directly to Postgres and
renders results as ASCII box-drawing tables in Claude's response. Tailscale bridging works by passing
DOCKER_HOST=ssh://user@<tailscale-ip> to dockerode, which tunnels Docker API calls over SSH
automatically.
The code-embeddings server uses a BatchProcessor (50 items, 5 concurrent), a 10k-entry LRU
EmbeddingCache, and a RateLimiter to avoid overwhelming Ollama. Incremental hash-based change
detection means re-indexing only touches modified files.
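The incremental step can be sketched as a pure content-hash comparison; contentHash and filesToReindex are hypothetical names, assuming SHA-256 over file contents:

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch of incremental change detection: a file is re-embedded
// only when its content hash differs from the hash stored at last index time.
function contentHash(source: string): string {
  return createHash("sha256").update(source).digest("hex");
}

function filesToReindex(
  current: Map<string, string>, // path -> file content now
  indexed: Map<string, string>, // path -> hash recorded at last index
): string[] {
  const changed: string[] = [];
  for (const [path, content] of current) {
    if (indexed.get(path) !== contentHash(content)) changed.push(path);
  }
  return changed;
}
```

Unchanged files never reach Ollama at all, which is what keeps re-indexing cheap on a large monorepo.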
stdio instead of HTTP transport. Zero port conflicts, no auth surface, and the server dies cleanly with the session. HTTP would be necessary for a shared team setup; for solo use it is pure overhead.
Environment-variable host injection for Tailscale. Every server reads its target from env vars
rather than hardcoded IPs. dockerode parses DOCKER_HOST=ssh://user@host natively — no manual
tunnel setup required.
SQLite for ott-context-mcp. Postgres already runs on a remote machine. Adding a network
dependency to what must be a fast, always-available local tool was the wrong tradeoff. SQLite with
WAL mode handles the concurrent read/write pattern of multiple tool calls per session without blocking.
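A minimal set of pragmas for that setup, assuming defaults consistent with the description above (the busy_timeout value is illustrative):

```sql
-- Write-ahead logging: readers no longer block the writer
PRAGMA journal_mode = WAL;
-- Wait on a locked database instead of failing immediately (illustrative value)
PRAGMA busy_timeout = 5000;
-- NORMAL is considered safe under WAL and reduces fsync stalls
PRAGMA synchronous = NORMAL;
```

journal_mode persists in the database file, so it only needs to be set once; the others are per-connection.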
Local Ollama, not a cloud embedding API. Zero cost per embedding call, no data leaving the
machine, and nomic-embed-text 768-dim is sufficient for TypeScript/TSX code search. The embedding
cache and incremental indexing absorb the latency tradeoff.
Pattern learning as a first-class tool. The learn_pattern tool writes structured records —
mistake, correction, severity, package scope, optional auto-fix script — to SQLite. Claude Code calls
it at the end of a debugging session, creating a durable correction library that persists across
context resets — unlike unreliable "remember this" prompts.
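The record shape described above might be modeled as follows; PatternRecord and learnPattern are illustrative names, and the upsert-on-(packageScope, mistake) behavior is an assumption, not confirmed by the source:

```typescript
// Illustrative shape for a learn_pattern record, following the fields the
// text lists: mistake, correction, severity, package scope, optional auto-fix.
interface PatternRecord {
  mistake: string;
  correction: string;
  severity: "low" | "medium" | "high";
  packageScope: string;
  autoFixScript?: string;
}

function learnPattern(store: PatternRecord[], record: PatternRecord): PatternRecord[] {
  // Assumed behavior: replace any earlier record for the same mistake in the
  // same package, so repeated sessions refine the correction, not duplicate it.
  const kept = store.filter(
    (r) => !(r.packageScope === record.packageScope && r.mistake === record.mistake),
  );
  return [...kept, record];
}
```

In the real server the store is a SQLite table rather than an array, but the structured-record idea is the same: corrections survive context resets because they live outside the context window.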
CLAUDE.md reduction as a KPI. The 69% reduction from 105k to 32k characters was the stated
design target. Workflow docs load on demand via workflow_get; context budget is spent on code.
CLAUDE.md shrank from 105k to 32k characters — roughly 18,250 tokens saved per session —
eliminating context-limit interruptions that had been routine. Infra diagnostics that previously
required a browser or SSH terminal now happen inside the coding session: when a transcoding job
fails, Claude Code can inspect the BullMQ queue, check Docker logs, query Prometheus, and suggest a
fix without switching context. The ott-context-mcp pattern-learning system means recurring error
classes are flagged before they repeat. The code-embeddings-mcp enables natural-language code
navigation across a monorepo too large to hold in context — at zero per-query cost using local
Ollama inference.