Ten tools that gave Claude Code a brain
Claude Code kept hitting token limits on a 9-platform OTT monorepo. The CLAUDE.md file had grown
to 105k characters — 29 workflow documents, schema docs, and API docs all inlined — and the model
was saturating mid-session with no way to inspect live infrastructure state. The fix was to give it
tools instead of text.
The result is 10 custom MCP servers: TypeScript processes that Claude Code spawns on demand via
stdio transport, each connecting to a specific layer of the stack — Postgres, BullMQ, MinIO, Docker,
Prometheus, Grafana, feature flags, environment secrets, semantic code search, and project task state.
Together they reduced CLAUDE.md from 105k to 32k characters while adding live infrastructure
visibility that never existed before.
All 10 servers are wired into .ruler/mcp.json so every Claude Code session inside the monorepo loads
the full toolset automatically. Several servers point their env vars at Tailscale mesh IPs so a
session on the development iMac can reach Docker services on a remote Linux machine without SSH
tunnels.
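An entry in .ruler/mcp.json might look like the following sketch; the server name, file path, and Tailscale IP are placeholders for illustration, not the actual configuration:

```json
{
  "mcpServers": {
    "bullmq-mcp": {
      "command": "node",
      "args": ["tools/bullmq-mcp/dist/index.js"],
      "env": {
        "REDIS_HOST": "100.64.0.7"
      }
    }
  }
}
```

The env block is where a session on the iMac gets pointed at a remote machine's Tailscale IP instead of localhost.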
The iMac had 16 GB of RAM and all stateful Docker services — Postgres, Dragonfly, MinIO, Temporal,
NATS, Prometheus, Grafana, Flipt — lived on two machines accessible only over Tailscale SSH. Inlining
every piece of context into CLAUDE.md was the first solution attempted, and 105k characters was
where it broke. Third-party MCP servers (filesystem, sqlite) are generic — they cannot understand a
BullMQ queue topology, OTT task state, or the per-app environment variable requirements of a
monorepo with 6 distinct service boundaries.
The servers also had to be environment-variable–driven so DOCKER_HOST, DB_HOST, and REDIS_HOST
could be pointed at remote Tailscale IPs at session start with no changes to server source code.
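A minimal sketch of that pattern, assuming hypothetical helper names (resolveTarget and parseDockerHost are illustrative, not the actual server code):

```typescript
// Hypothetical helpers showing the env-driven host pattern; the same server
// source works locally or against a Tailscale peer with no code changes.
function resolveTarget(envVar: string, fallback: string): string {
  const value = process.env[envVar]?.trim();
  return value ? value : fallback;
}

// dockerode accepts DOCKER_HOST=ssh://user@host directly; this parser just
// illustrates the shape of the value being injected at session start.
function parseDockerHost(raw: string): { protocol: "ssh" | "local"; user?: string; host: string } {
  const match = raw.match(/^ssh:\/\/(?:([^@]+)@)?(.+)$/);
  if (match) return { protocol: "ssh", user: match[1], host: match[2] };
  return { protocol: "local", host: raw };
}
```

With this shape, swapping DB_HOST from localhost to a Tailscale IP is a one-line change in mcp.json, never a source edit.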
The debugging loop for failing transcoding jobs — browser, SSH terminal, editor — had to collapse
entirely into the coding session.
All 10 servers use StdioServerTransport — each is a long-running child process spawned by Claude
Code, communicating over stdin/stdout, then connecting out to its target service. Zero port conflicts,
no auth layer, and the server lifecycle is tied to the Claude Code session itself.
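Under the hood, stdio transport is newline-delimited JSON-RPC 2.0 on stdin/stdout. A toy sketch of the framing (the real servers rely on the MCP SDK's StdioServerTransport, which also performs the initialize handshake):

```typescript
// Toy sketch: one JSON-RPC 2.0 message per line, written to stdout and read
// from stdin. No listening socket means no port conflicts and no auth surface.
interface JsonRpcMessage {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: unknown;
}

function encodeMessage(msg: JsonRpcMessage): string {
  return JSON.stringify(msg) + "\n"; // one message per line
}

function decodeMessages(chunk: string): JsonRpcMessage[] {
  return chunk.split("\n").filter(Boolean).map((line) => JSON.parse(line));
}
```

Because the pipe is inherited from the parent process, the transport disappears the moment Claude Code exits, which is exactly the lifecycle coupling described above.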
Storage is per-server and purpose-fit. ott-context-mcp uses a local SQLite database with 17 tables
in WAL mode. code-embeddings-mcp stores 768-dimensional vectors in self-hosted Qdrant, generated
locally by Ollama's nomic-embed-text model. drizzle-studio-mcp connects directly to Postgres and
renders results as ASCII box-drawing tables in Claude's response. Tailscale bridging works by passing
DOCKER_HOST=ssh://user@<tailscale-ip> to dockerode, which tunnels Docker API calls over SSH
automatically.
The code-embeddings server uses a BatchProcessor (50 items, 5 concurrent), a 10k-entry LRU
EmbeddingCache, and a RateLimiter to avoid overwhelming Ollama. Incremental hash-based change
detection means re-indexing only touches modified files.
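The incremental step can be sketched as a pure content-hash comparison; contentHash and filesToReindex are hypothetical names, assuming SHA-256 over file contents:

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch of incremental change detection: a file is re-embedded
// only when its content hash differs from the hash stored at last index time.
function contentHash(source: string): string {
  return createHash("sha256").update(source).digest("hex");
}

function filesToReindex(
  current: Map<string, string>, // path -> file content now
  indexed: Map<string, string>, // path -> hash recorded at last index
): string[] {
  const changed: string[] = [];
  for (const [path, content] of current) {
    if (indexed.get(path) !== contentHash(content)) changed.push(path);
  }
  return changed;
}
```

Unchanged files never reach Ollama at all, which is what keeps re-indexing cheap on a large monorepo.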
stdio instead of HTTP transport. Zero port conflicts, no auth surface, and the server dies cleanly with the session. HTTP would be necessary for a shared team setup; for solo use it is pure overhead.
Environment-variable host injection for Tailscale. Every server reads its target from env vars
rather than hardcoded IPs. dockerode parses DOCKER_HOST=ssh://user@host natively — no manual
tunnel setup required.
SQLite for ott-context-mcp. Postgres already runs on a remote machine. Adding a network
dependency to what must be a fast, always-available local tool was the wrong tradeoff. SQLite with
WAL mode handles the concurrent read/write pattern of multiple tool calls per session without blocking.
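A minimal set of pragmas for that setup, assuming defaults consistent with the description above (the busy_timeout value is illustrative):

```sql
-- Write-ahead logging: readers no longer block the writer
PRAGMA journal_mode = WAL;
-- Wait on a locked database instead of failing immediately (illustrative value)
PRAGMA busy_timeout = 5000;
-- NORMAL is considered safe under WAL and reduces fsync stalls
PRAGMA synchronous = NORMAL;
```

journal_mode persists in the database file, so it only needs to be set once; the others are per-connection.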
Local Ollama, not a cloud embedding API. Zero cost per embedding call, no data leaving the
machine, and nomic-embed-text 768-dim is sufficient for TypeScript/TSX code search. The embedding
cache and incremental indexing absorb the latency tradeoff.
Pattern learning as a first-class tool. The learn_pattern tool writes structured records —
mistake, correction, severity, package scope, optional auto-fix script — to SQLite. Claude Code calls
it at the end of a debugging session, creating a durable correction library that persists across
context resets — unlike unreliable "remember this" prompts.
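The record shape described above might be modeled as follows; PatternRecord and learnPattern are illustrative names, and the upsert-on-(packageScope, mistake) behavior is an assumption, not confirmed by the source:

```typescript
// Illustrative shape for a learn_pattern record, following the fields the
// text lists: mistake, correction, severity, package scope, optional auto-fix.
interface PatternRecord {
  mistake: string;
  correction: string;
  severity: "low" | "medium" | "high";
  packageScope: string;
  autoFixScript?: string;
}

function learnPattern(store: PatternRecord[], record: PatternRecord): PatternRecord[] {
  // Assumed behavior: replace any earlier record for the same mistake in the
  // same package, so repeated sessions refine the correction, not duplicate it.
  const kept = store.filter(
    (r) => !(r.packageScope === record.packageScope && r.mistake === record.mistake),
  );
  return [...kept, record];
}
```

In the real server the store is a SQLite table rather than an array, but the structured-record idea is the same: corrections survive context resets because they live outside the context window.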
CLAUDE.md reduction as a KPI. The 69% reduction from 105k to 32k characters was the stated
design target. Workflow docs load on demand via workflow_get; context budget is spent on code.
CLAUDE.md shrank from 105k to 32k characters — roughly 18,250 tokens saved per session —
eliminating context-limit interruptions that had been routine. Infra diagnostics that previously
required a browser or SSH terminal now happen inside the coding session: when a transcoding job
fails, Claude Code can inspect the BullMQ queue, check Docker logs, query Prometheus, and suggest a
fix without switching context. The ott-context-mcp pattern-learning system means recurring error
classes are flagged before they repeat. The code-embeddings-mcp enables natural-language code
navigation across a monorepo too large to hold in context — at zero per-query cost using local
Ollama inference.