Compounding Agent Swarms: Multi-Agent Architectures That Scale Without Breaking the Bank
A swarm of intelligent agents is not inherently better than one. The difference is what happens after they talk to each other.
This is part six of the AI to Web3 series. We have built five independent agent capabilities for Hydra: LangGraph orchestration (Article 1), n8n execution (Article 2), RAG knowledge (Article 3), LangFuse observability (Article 4), and a fine-tuned DeFi specialist (Article 5).
Today we coordinate them into a swarm — and we do it without letting the cost spiral out of control.
Why we are writing about this
68% of new DeFi protocols launched in Q1 2026 include at least one autonomous AI agent, according to BlockEden's Q1 2026 DeFi report. Multi-agent systems are no longer experimental. They are infrastructure.
But there is a trap. Gartner predicts 40% of agentic AI projects will be canceled by 2027 due to escalating costs and inadequate risk controls. Research shows multi-agent systems cost 4.8x more than single agents for the same task when poorly designed. The teams that succeed are the ones that design cost into the architecture from the start — not as an afterthought.
The framework landscape in 2026
The multi-agent ecosystem has consolidated. The frameworks worth knowing:
LangGraph — graph-based state machines for complex, branching agent workflows. Durable execution, time-travel debugging, human-in-the-loop. The S-tier choice for production systems with explicit control requirements. This is Hydra's orchestration backbone.
CrewAI — role-based agent teams (role, goal, backstory). The fastest path from idea to working multi-agent demo. Uses LiteLLM internally, so per-agent model assignment is first-class. Best for prototyping and well-defined hierarchical tasks.
AutoGen AG2 — conversational emergence. Agents talk to each other iteratively until they converge on a solution. Strong for open-ended research and code generation where you do not know the steps in advance.
Google ADK — hierarchical agent trees with native A2A protocol support. The right choice for Google Cloud shops or systems that need to federate with external agents via A2A.
OpenAI Agents SDK — explicit handoffs between agents. Clean mental model, built-in guardrails and tracing. Best for OpenAI-ecosystem teams.
Mastra — TypeScript-native graph-based workflows with .network() routing. The LangGraph equivalent for the JavaScript/TypeScript stack.
Strands Agents (AWS) — four collaboration patterns (agents-as-tools, swarms, agent graphs, workflows) optimized for Amazon Nova models. Cost-competitive due to Nova's pricing.
The protocol stack: MCP and A2A
The agent communication layer has standardized under the Agentic AI Foundation (AAIF), launched in December 2025 by OpenAI, Anthropic, Google, Microsoft, AWS, and Block.
MCP (Model Context Protocol) — Anthropic's standard, now AAIF-governed. Handles agent-to-tool communication: how an agent connects to databases, APIs, file systems, and services. Every major framework supports it.
A2A (Agent-to-Agent Protocol) — Google's contribution. Handles peer discovery (Agent Cards at /.well-known/agent-card.json), task delegation between agents, and multi-agent workflows via JSON-RPC 2.0 over HTTP. When agents need to talk to each other rather than to tools, A2A is the protocol.
ACP (Agent Commerce Protocol) — IBM/BeeAI, now Linux Foundation. Pricing, offers, and transaction state for agent-to-agent commerce. Relevant when agents are billing each other for services.
For Hydra: MCP for tool access (Sentinel's vector store, Executor's n8n webhook), A2A for inter-agent coordination in the swarm.
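Peer discovery over A2A starts with the Agent Card. As a hedged sketch, here is roughly what Hydra's Oracle might publish at /.well-known/agent-card.json — the field names mirror the A2A Agent Card shape, but the values, URL, and skill entries are hypothetical:

```python
import json

# Illustrative A2A Agent Card for Hydra's Oracle agent. Field names follow
# the A2A specification's card shape; the skill entries are hypothetical.
agent_card = {
    "name": "hydra-oracle",
    "description": "Off-chain market intelligence for the Hydra swarm",
    "url": "https://oracle.hydra.internal/a2a",
    "version": "1.0.0",
    "capabilities": {"streaming": False, "pushNotifications": False},
    "skills": [
        {
            "id": "macro-signals",
            "name": "Macro signal triage",
            "description": "Filters web and governance signals for portfolio relevance",
        }
    ],
}

print(json.dumps(agent_card, indent=2))
```

A peer agent fetches this card, inspects `skills`, and then delegates tasks via JSON-RPC 2.0 calls to the advertised `url`.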
Orchestration patterns
The right pattern depends on the task structure. The five patterns that appear most in production:
Orchestrator-Worker. A lead agent decomposes a goal into subtasks and delegates to specialist workers. The orchestrator synthesizes their outputs. This is the dominant pattern for complex, multi-domain problems — and it is what Hydra uses.
Hierarchical. A tree structure: root agent manages sub-agents recursively. Each level handles a different scope. Useful for enterprise workflows with clear authority structures.
Swarm (P2P). Decentralized mesh — agents share a context object and pick up tasks from a queue based on their capabilities. Useful for parallelizable workloads without a clear orchestration hierarchy.
Critic-Refiner. A generator agent produces output; an evaluator agent critiques it; the generator revises. Iterate until the evaluator is satisfied. Used in Hydra's Oracle agent to validate signals before passing them to the Strategist.
Agents-as-Tools. Specialized agents are wrapped as callable tools and invoked by a primary agent as needed. The simplest composition pattern — no explicit orchestration graph required.
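Stripped of any framework, the Critic-Refiner loop above is only a few lines. The generator and critic below are deterministic toy stand-ins for LLM calls, not Hydra code:

```python
def critic_refiner(generate, critique, max_rounds=3):
    """Generic critic-refiner loop: generate, critique, revise until accepted."""
    draft, feedback = generate(feedback=None), None
    for _ in range(max_rounds):
        ok, feedback = critique(draft)
        if ok:
            break
        draft = generate(feedback=feedback)
    return draft

# Toy stand-ins: the "generator" folds the critic's feedback into its draft,
# and the "critic" accepts once the draft mentions evidence.
def toy_generate(feedback):
    base = "signal: pool fees rising"
    return base if feedback is None else base + "; " + feedback

def toy_critique(draft):
    return ("evidence" in draft, "add evidence")

result = critic_refiner(toy_generate, toy_critique)
print(result)  # signal: pool fees rising; add evidence
```

Swap the stand-ins for two model calls and you have the shape of Hydra's Oracle validation loop; `max_rounds` caps cost when the critic never converges.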
The cost problem — and how to solve it
This deserves direct treatment because it is where most multi-agent projects fail.
The baseline reality: a single production agent costs $7,050–$21,100/month. A multi-agent system naively running everything on frontier models costs 4.8x more. That is not a rounding error.
The solution is model tiering — assigning different models to different agent roles based on the reasoning requirements of each role.
The model cost ladder (April 2026)
| Tier | Model | Input / Output per 1M tokens | Use for |
|---|---|---|---|
| Free | Fine-tuned Qwen 7B via Ollama | $0 (electricity only) | Domain-specialist analysis |
| Budget | DeepSeek V3.2 | $0.28 / $0.42 | High-volume retrieval, formatting, extraction |
| Budget | GPT-5 Nano | $0.05 / $0.40 | Classification, routing, structured tool calls |
| Mid-tier | Claude Sonnet 4.6 | $3.00 / $15.00 | Complex reasoning, planning, security analysis |
| Frontier | Claude Opus 4.6 | $5.00 / $25.00 | Mission-critical synthesis (use sparingly) |
Three more levers stack on top of tiering:

Prompt caching — Anthropic, OpenAI, and Google all offer up to a 90% discount on repeated system prompts. The Strategist and Guardian have long system prompts — cache them.

Semantic caching with Redis — for high-repetition queries (current pool fees, token prices), run a vector search over cached responses before triggering new retrieval. Reported cost reductions reach 73% for such workloads.

Batch APIs — ~50% discount for non-time-sensitive analysis. The Analyst's deep research runs are not time-critical — batch them.
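The semantic-cache lookup is just nearest-neighbor search over cached query embeddings; Redis (e.g. via RedisVL) provides it off the shelf. Below is a minimal in-memory sketch with a toy character-frequency "embedding" standing in for a real embedding model — the class, threshold, and example queries are all illustrative, not a real Redis integration:

```python
import math

def toy_embed(text):
    # Stand-in for a real embedding model: letter-frequency vector
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

class SemanticCache:
    """Return a cached answer when a query is close enough to a previous one."""
    def __init__(self, threshold=0.9):
        # threshold is tuned for the toy embedding; real systems tune it
        # against their actual embedding model
        self.entries = []  # (embedding, answer) pairs
        self.threshold = threshold

    def get(self, query):
        emb = toy_embed(query)
        for cached_emb, answer in self.entries:
            if cosine(emb, cached_emb) >= self.threshold:
                return answer  # cache hit: skip the LLM/retrieval call
        return None

    def put(self, query, answer):
        self.entries.append((toy_embed(query), answer))

cache = SemanticCache()
cache.put("current USDC/ETH pool fee", "0.05%")
print(cache.get("what is the current USDC/ETH pool fee"))  # near-duplicate, hits cache
```

A production version swaps `toy_embed` for a real embedding model and the linear scan for a Redis vector index; the control flow stays the same.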
OpenRouter as the gateway
OpenRouter provides a single OpenAI-compatible API to 300+ models from 60+ providers. LangChain's first-party langchain-openrouter package integrates directly with LangGraph:
```bash
pip install langchain-openrouter
```

```python
from langchain_openrouter import ChatOpenRouter

# Each agent gets its own model — LangGraph instantiates per node
strategist_llm = ChatOpenRouter(
    model="anthropic/claude-sonnet-4-6",
    openrouter_provider={"sort": "cost", "allow_fallbacks": True},
)
sentinel_llm = ChatOpenRouter(model="deepseek/deepseek-chat-v3-2")
oracle_llm = ChatOpenRouter(model="openai/gpt-5-nano")
```
OpenRouter routes by availability and cost, not by task difficulty. It is a proxy, not an intelligent router. For the 80% of queries that are routine, this is sufficient. For the 20% requiring escalation, you layer a difficulty classifier on top.
Difficulty-based routing
```python
# hydra/router.py
from langchain_core.language_models import BaseLanguageModel
from langchain_openrouter import ChatOpenRouter

# Cheap classifier — GPT-5 Nano is fast and accurate enough for this
classifier_llm = ChatOpenRouter(model="openai/gpt-5-nano")

async def route_by_complexity(task: str) -> BaseLanguageModel:
    """
    Pre-screens task complexity and returns the appropriate model.
    The classifier itself uses the cheapest capable model.
    """
    result = await classifier_llm.ainvoke(
        "Rate the reasoning complexity of this DeFi analysis task on a scale of 1-5. "
        f"Return only the number.\n\nTask: {task}"
    )
    try:
        score = int(result.content.strip())
    except ValueError:
        score = 3  # non-numeric classifier output; fall back to mid-tier
    if score >= 4:
        return ChatOpenRouter(model="anthropic/claude-sonnet-4-6")
    elif score >= 2:
        return ChatOpenRouter(model="deepseek/deepseek-chat-v3-2")
    else:
        # Route to local model — zero cost
        from langchain_community.llms import Ollama
        return Ollama(model="hydra-analyst", base_url="http://localhost:11434")
```
The CASTER paper (January 2026) formalizes this "Context-Aware Strategy for Task Efficient Routing" approach in graph-based multi-agent systems and reports up to 72.4% cost reduction without quality degradation.
Compounding: how agent swarms get smarter over time
The "compounding" effect is the actual differentiator of multi-agent systems — not parallelization (though that helps), but the way agent outputs feed back as context into other agents across cycles.
In a naive implementation, each agent runs once and outputs a result. In a compounding swarm:
- The Sentinel retrieves current pool state and tags it with a confidence score
- The Analyst evaluates the pool state using the fine-tuned model, producing a risk assessment
- The risk assessment is stored in the shared state and also in the knowledge base (vector store)
- On the next cycle, the Sentinel retrieves not just raw protocol data but also the Analyst's previous assessment — it has learned from the last cycle
- The Strategist observes that the Analyst's assessment has been consistent for 3 cycles and increases its decision confidence accordingly
Agent memory compounds. Risk patterns become learned context. The system's effective intelligence grows with each cycle, without retraining any model.
This requires the Mem0 memory layer (introduced in Article 3) and explicit state management in the LangGraph graph.
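One of those compounding steps — the Strategist raising its confidence after three consistent Analyst cycles — is mechanical once assessments live in shared state. A minimal sketch; the labels, base confidence, and boost values are illustrative rather than Hydra's actual schema:

```python
def update_confidence(history, base_confidence=0.6, boost=0.1, window=3):
    """
    Compounding rule: if the Analyst's risk label has been stable for the
    last `window` cycles, the Strategist raises its decision confidence.
    `history` holds past risk labels, newest last.
    """
    if len(history) >= window and len(set(history[-window:])) == 1:
        return min(1.0, round(base_confidence + boost, 2))
    return base_confidence

# Three consecutive "low" assessments raise confidence; a mixed run does not
print(update_confidence(["low", "low", "low"]))     # 0.7
print(update_confidence(["low", "medium", "low"]))  # 0.6
```

In the real graph, `history` would be read from the Mem0 layer or the checkpointed LangGraph state rather than passed in directly.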
Hydra — Article 6 contribution: the Oracle agent and full swarm coordination
The Oracle is the last information agent — it provides macro context (market sentiment, protocol news, governance signals) using Agentic RAG over web and social sources. The Strategist now has three information streams: Sentinel (on-chain), Oracle (off-chain), Analyst (domain reasoning).
```python
# hydra/oracle.py
import httpx
from langchain_openrouter import ChatOpenRouter

from hydra.orchestrator import HydraState
from hydra.router import route_by_complexity

# Budget model for web signal triage — high volume, low complexity
oracle_llm = ChatOpenRouter(model="openai/gpt-5-nano")

async def oracle_node(state: HydraState) -> HydraState:
    """
    Critic-refiner pattern: fetches web signals, evaluates quality,
    passes only high-confidence signals to the Strategist.
    """
    positions = [pos["protocol"] for pos in state["portfolio"].get("positions", [])]
    queries = [f"{protocol} security news risk governance 2026" for protocol in positions]
    raw_signals = []
    async with httpx.AsyncClient() as client:
        for query in queries:
            # Firecrawl for web research (via n8n webhook in production)
            response = await client.get(
                "http://localhost:5678/webhook/hydra-research",
                params={"q": query},
            )
            raw_signals.extend(response.json().get("results", []))
    # Critic step: filter signals by relevance and recency
    critique_prompt = (
        f"Filter these signals for relevance to portfolio positions {positions}. "
        f"Return only signals with material impact. Signals:\n{raw_signals}"
    )
    relevant_signals = await oracle_llm.ainvoke(critique_prompt)
    return {**state, "signals": state["signals"] + [
        {"source": "oracle", "content": relevant_signals.content, "confidence": 0.75}
    ]}

# Updated Strategist with full swarm coordination
async def strategist_node(state: HydraState) -> HydraState:
    synthesis_task = (
        f"Portfolio has {len(state['portfolio'].get('positions', []))} positions. "
        f"Received {len(state['signals'])} signals. Produce portfolio-level decisions."
    )
    # Route to the right model based on complexity
    strategist_llm = await route_by_complexity(synthesis_task)
    response = await strategist_llm.ainvoke(
        f"Portfolio: {state['portfolio']}\n\n"
        f"Signals from Sentinel (on-chain), Oracle (off-chain), and Analyst:\n{state['signals']}\n\n"
        f"Identified risks: {state['risks']}\n\n"
        f"Propose specific portfolio adjustments with rationale."
    )
    decisions = _parse_decisions(response.content)
    return {**state, "decisions": decisions}
```
# Updated Strategist with full swarm coordination
async def strategist_node(state: HydraState) -> HydraState:
from hydra.router import route_by_complexity
synthesis_task = (
f"Portfolio has {len(state['portfolio'].get('positions', []))} positions. "
f"Received {len(state['signals'])} signals. Produce portfolio-level decisions."
)
# Route to the right model based on complexity
strategist_llm = await route_by_complexity(synthesis_task)
response = await strategist_llm.ainvoke(
f"Portfolio: {state['portfolio']}\n\n"
f"Signals from Sentinel (on-chain), Oracle (off-chain), and Analyst:\n{state['signals']}\n\n"
f"Identified risks: {state['risks']}\n\n"
f"Propose specific portfolio adjustments with rationale."
)
decisions = _parse_decisions(response.content)
return {**state, "decisions": decisions}
The final LangGraph graph — all six agents coordinated:
```python
# hydra/orchestrator.py (final Article 6 version)
from langgraph.graph import StateGraph, START, END

from hydra.sentinel import sentinel_node
from hydra.oracle import oracle_node, strategist_node
from hydra.analyst import analyst_node
from hydra.executor import executor_node
from hydra.observer import get_langfuse_callback

def build_hydra_graph(checkpointer=None):
    graph = StateGraph(HydraState)
    # Information gathering — run in parallel
    graph.add_node("sentinel", sentinel_node)  # on-chain RAG
    graph.add_node("oracle", oracle_node)      # off-chain RAG, critic-refiner
    graph.add_node("analyst", analyst_node)    # fine-tuned DeFi specialist
    # Orchestration and execution
    graph.add_node("strategist", strategist_node)  # synthesis + routing
    graph.add_node("executor", executor_node)      # n8n webhook → on-chain
    # Parallel information gathering from START
    graph.add_edge(START, "sentinel")
    graph.add_edge(START, "oracle")
    # Analyst waits for Sentinel signals
    graph.add_edge("sentinel", "analyst")
    # Strategist waits for all three
    graph.add_edge("analyst", "strategist")
    graph.add_edge("oracle", "strategist")
    # Conditional execution gate
    graph.add_conditional_edges(
        "strategist",
        lambda s: "executor" if s["decisions"] and s["human_approved"] else END,
        {"executor": "executor", END: END},
    )
    graph.add_edge("executor", END)
    return graph.compile(checkpointer=checkpointer)
```
Hydra cost architecture at scale
With model tiering, caching, and the fine-tuned local model in place, the estimated monthly cost for a moderately active Hydra instance (100 decision cycles/day):
| Agent | Model | Est. monthly cost |
|---|---|---|
| Strategist | Sonnet 4.6 (with prompt caching) | ~$120 |
| Sentinel | DeepSeek V3.2 | ~$15 |
| Analyst | Fine-tuned Qwen 7B via Ollama | ~$5 (electricity) |
| Oracle | GPT-5 Nano (with semantic cache) | ~$8 |
| Executor | GPT-5 Nano (structured calls) | ~$3 |
| Total | | ~$151/month |
Compared to running every agent on Sonnet 4.6: ~$2,700/month. 94% cost reduction from model tiering alone.
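As a sanity check, the total and the headline reduction follow directly from the per-agent estimates in the table:

```python
# Per-agent monthly estimates from the cost table above
tiered = {"strategist": 120, "sentinel": 15, "analyst": 5, "oracle": 8, "executor": 3}
all_sonnet = 2700  # naive baseline: every agent on Sonnet 4.6

total = sum(tiered.values())
reduction = 1 - total / all_sonnet
print(f"${total}/month, {reduction:.0%} cheaper than all-frontier")
```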
The stack so far
| Layer | Technology | Status |
|---|---|---|
| Orchestration | LangGraph 1.1 | Done — Article 1 |
| Automation | n8n 2.0 | Done — Article 2 |
| Knowledge | pgvector + LlamaIndex + GraphRAG | Done — Article 3 |
| Observability | LangFuse (self-hosted) | Done — Article 4 |
| Specialization | Fine-tuned Qwen 7B via Ollama | Done — Article 5 |
| Coordination | Multi-agent swarm + OpenRouter routing | Done — this article |
| Security | SOAR + Guardian | Article 7 |
| Resilience | Structured logging · Tenacity retries · LangFuse self-hosted | Article 8 |
Next in this series: SOAR capabilities. A compounding swarm of DeFi agents is a high-value target, so Article 7 adds the Guardian: an autonomous security agent that detects exploits, simulates transactions before signing, and triggers incident response. It also lays out the full Hydra architecture proposal.
AI to Web3 series — building Hydra, a sovereign multi-agent DeFi intelligence mesh:
1 — LangChain orchestration · 2 — n8n execution · 3 — RAG at scale · 4 — LLM observability · 5 — Fine-tuning · 6 — Agent swarms · 7 — SOAR · 8 — Production resilience
Get weekly intel — courtesy of intel.hyperdrift.io