Compounding Agent Swarms: Multi-Agent Architectures That Scale Without Breaking the Bank
A swarm of intelligent agents is not inherently better than one. The difference is what happens after they talk to each other.
This is part six of the AI to Web3 series. We have built five independent agent capabilities for Hydra: LangGraph orchestration (Article 1), n8n execution (Article 2), RAG knowledge (Article 3), LangFuse observability (Article 4), and a fine-tuned DeFi specialist (Article 5).
Today we coordinate them into a swarm — and we do it without letting the cost spiral out of control.
Why we are writing about this
68% of new DeFi protocols launched in Q1 2026 include at least one autonomous AI agent, according to BlockEden's Q1 2026 DeFi report. Multi-agent systems are no longer experimental. They are infrastructure.
But there is a trap. Gartner predicts 40% of agentic AI projects will be canceled by 2027 due to escalating costs and inadequate risk controls. Research shows multi-agent systems cost 4.8x more than single agents for the same task when poorly designed. The teams that succeed are the ones that design cost into the architecture from the start — not as an afterthought.
The framework landscape in 2026
The multi-agent ecosystem has consolidated. The frameworks worth knowing:
LangGraph — graph-based state machines for complex, branching agent workflows. Durable execution, time-travel debugging, human-in-the-loop. The S-tier choice for production systems with explicit control requirements. This is Hydra's orchestration backbone.
CrewAI — role-based agent teams (role, goal, backstory). The fastest path from idea to working multi-agent demo. Uses LiteLLM internally, so per-agent model assignment is first-class. Best for prototyping and well-defined hierarchical tasks.
AutoGen AG2 — conversational emergence. Agents talk to each other iteratively until they converge on a solution. Strong for open-ended research and code generation where you do not know the steps in advance.
Google ADK — hierarchical agent trees with native A2A protocol support. The right choice for Google Cloud shops or systems that need to federate with external agents via A2A.
OpenAI Agents SDK — explicit handoffs between agents. Clean mental model, built-in guardrails and tracing. Best for OpenAI-ecosystem teams.
Mastra — TypeScript-native graph-based workflows with .network() routing. The LangGraph equivalent for the JavaScript/TypeScript stack.
Strands Agents (AWS) — four collaboration patterns (agents-as-tools, swarms, agent graphs, workflows) optimized for Amazon Nova models. Cost-competitive due to Nova's pricing.
The protocol stack: MCP and A2A
The agent communication layer has standardized under the Agentic AI Foundation (AAIF), launched in December 2025 by OpenAI, Anthropic, Google, Microsoft, AWS, and Block.
MCP (Model Context Protocol) — Anthropic's standard, now AAIF-governed. Handles agent-to-tool communication: how an agent connects to databases, APIs, file systems, and services. Every major framework supports it.
A2A (Agent-to-Agent Protocol) — Google's contribution. Handles peer discovery (Agent Cards at /.well-known/agent-card.json), task delegation between agents, and multi-agent workflows via JSON-RPC 2.0 over HTTP. When agents need to talk to each other rather than to tools, A2A is the protocol.
ACP (Agent Commerce Protocol) — IBM/BeeAI, now Linux Foundation. Pricing, offers, and transaction state for agent-to-agent commerce. Relevant when agents are billing each other for services.
For Hydra: MCP for tool access (Sentinel's vector store, Executor's n8n webhook), A2A for inter-agent coordination in the swarm.
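Peer discovery over A2A starts with the Agent Card. As a hedged sketch, here is roughly what Hydra's Oracle might publish at /.well-known/agent-card.json — the field names mirror the A2A Agent Card shape, but the values, URL, and skill entries are hypothetical:

```python
import json

# Illustrative A2A Agent Card for Hydra's Oracle agent. Field names follow
# the A2A specification's card shape; the skill entries are hypothetical.
agent_card = {
    "name": "hydra-oracle",
    "description": "Off-chain market intelligence for the Hydra swarm",
    "url": "https://oracle.hydra.internal/a2a",
    "version": "1.0.0",
    "capabilities": {"streaming": False, "pushNotifications": False},
    "skills": [
        {
            "id": "macro-signals",
            "name": "Macro signal triage",
            "description": "Filters web and governance signals for portfolio relevance",
        }
    ],
}

print(json.dumps(agent_card, indent=2))
```

A peer agent fetches this card, inspects `skills`, and then delegates tasks via JSON-RPC 2.0 calls to the advertised `url`.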
Orchestration patterns
The right pattern depends on the task structure. The five patterns that appear most in production:
Orchestrator-Worker. A lead agent decomposes a goal into subtasks and delegates to specialist workers. The orchestrator synthesizes their outputs. This is the dominant pattern for complex, multi-domain problems — and it is what Hydra uses.
Hierarchical. A tree structure: root agent manages sub-agents recursively. Each level handles a different scope. Useful for enterprise workflows with clear authority structures.
Swarm (P2P). Decentralized mesh — agents share a context object and pick up tasks from a queue based on their capabilities. Useful for parallelizable workloads without a clear orchestration hierarchy.
Critic-Refiner. A generator agent produces output; an evaluator agent critiques it; the generator revises. Iterate until the evaluator is satisfied. Used in Hydra's Oracle agent to validate signals before passing them to the Strategist.
Agents-as-Tools. Specialized agents are wrapped as callable tools and invoked by a primary agent as needed. The simplest composition pattern — no explicit orchestration graph required.
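Stripped of any framework, the Critic-Refiner loop above is only a few lines. The generator and critic below are deterministic toy stand-ins for LLM calls, not Hydra code:

```python
def critic_refiner(generate, critique, max_rounds=3):
    """Generic critic-refiner loop: generate, critique, revise until accepted."""
    draft, feedback = generate(feedback=None), None
    for _ in range(max_rounds):
        ok, feedback = critique(draft)
        if ok:
            break
        draft = generate(feedback=feedback)
    return draft

# Toy stand-ins: the "generator" folds the critic's feedback into its draft,
# and the "critic" accepts once the draft mentions evidence.
def toy_generate(feedback):
    base = "signal: pool fees rising"
    return base if feedback is None else base + "; " + feedback

def toy_critique(draft):
    return ("evidence" in draft, "add evidence")

result = critic_refiner(toy_generate, toy_critique)
print(result)  # signal: pool fees rising; add evidence
```

Swap the stand-ins for two model calls and you have the shape of Hydra's Oracle validation loop; `max_rounds` caps cost when the critic never converges.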
The cost problem — and how to solve it
This deserves direct treatment because it is where most multi-agent projects fail.
The baseline reality: a single production agent costs $7,050–$21,100/month. A multi-agent system naively running everything on frontier models costs 4.8x more. That is not a rounding error.
The solution is model tiering — assigning different models to different agent roles based on the reasoning requirements of each role.
The model cost ladder (April 2026)
| Tier | Model | Input / Output per 1M tokens | Use for |
|---|---|---|---|
| Free | Fine-tuned Qwen 7B via Ollama | $0 (electricity only) | Domain-specialist analysis |
| Budget | DeepSeek V3.2 | $0.28 / $0.42 | High-volume retrieval, formatting, extraction |
| Budget | GPT-5 Nano | $0.05 / $0.40 | Classification, routing, structured tool calls |
| Mid-tier | Claude Sonnet 4.6 | $3.00 / $15.00 | Complex reasoning, planning, security analysis |
| Frontier | Claude Opus 4.6 | $5.00 / $25.00 | Mission-critical synthesis (use sparingly) |
Three more levers stack on top of tiering:

Prompt caching — Anthropic, OpenAI, and Google all offer up to a 90% discount on repeated system prompts. The Strategist and Guardian have long system prompts — cache them.

Semantic caching with Redis — for high-repetition queries (current pool fees, token prices), run a vector search over cached responses before triggering new retrieval. Reported cost reductions reach 73% for such workloads.

Batch APIs — ~50% discount for non-time-sensitive analysis. The Analyst's deep research runs are not time-critical — batch them.
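The semantic-cache lookup is just nearest-neighbor search over cached query embeddings; Redis (e.g. via RedisVL) provides it off the shelf. Below is a minimal in-memory sketch with a toy character-frequency "embedding" standing in for a real embedding model — the class, threshold, and example queries are all illustrative, not a real Redis integration:

```python
import math

def toy_embed(text):
    # Stand-in for a real embedding model: letter-frequency vector
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

class SemanticCache:
    """Return a cached answer when a query is close enough to a previous one."""
    def __init__(self, threshold=0.9):
        # threshold is tuned for the toy embedding; real systems tune it
        # against their actual embedding model
        self.entries = []  # (embedding, answer) pairs
        self.threshold = threshold

    def get(self, query):
        emb = toy_embed(query)
        for cached_emb, answer in self.entries:
            if cosine(emb, cached_emb) >= self.threshold:
                return answer  # cache hit: skip the LLM/retrieval call
        return None

    def put(self, query, answer):
        self.entries.append((toy_embed(query), answer))

cache = SemanticCache()
cache.put("current USDC/ETH pool fee", "0.05%")
print(cache.get("what is the current USDC/ETH pool fee"))  # near-duplicate, hits cache
```

A production version swaps `toy_embed` for a real embedding model and the linear scan for a Redis vector index; the control flow stays the same.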
OpenRouter as the gateway
OpenRouter provides a single OpenAI-compatible API to 300+ models from 60+ providers. LangChain's first-party langchain-openrouter package integrates directly with LangGraph:
```bash
pip install langchain-openrouter
```

```python
from langchain_openrouter import ChatOpenRouter

# Each agent gets its own model — LangGraph instantiates per node
strategist_llm = ChatOpenRouter(
    model="anthropic/claude-sonnet-4-6",
    openrouter_provider={"sort": "cost", "allow_fallbacks": True},
)
sentinel_llm = ChatOpenRouter(model="deepseek/deepseek-chat-v3-2")
oracle_llm = ChatOpenRouter(model="openai/gpt-5-nano")
```
OpenRouter routes by availability and cost, not by task difficulty. It is a proxy, not an intelligent router. For the 80% of queries that are routine, this is sufficient. For the 20% requiring escalation, you layer a difficulty classifier on top.
Difficulty-based routing
```python
# hydra/router.py
from langchain_core.language_models import BaseLanguageModel
from langchain_openrouter import ChatOpenRouter

# Cheap classifier — GPT-5 Nano is fast and accurate enough for this
classifier_llm = ChatOpenRouter(model="openai/gpt-5-nano")

async def route_by_complexity(task: str) -> BaseLanguageModel:
    """
    Pre-screens task complexity and returns the appropriate model.
    The classifier itself uses the cheapest capable model.
    """
    result = await classifier_llm.ainvoke(
        "Rate the reasoning complexity of this DeFi analysis task on a scale of 1-5. "
        f"Return only the number.\n\nTask: {task}"
    )
    try:
        score = int(result.content.strip())
    except ValueError:
        score = 3  # non-numeric classifier output; fall back to mid-tier
    if score >= 4:
        return ChatOpenRouter(model="anthropic/claude-sonnet-4-6")
    elif score >= 2:
        return ChatOpenRouter(model="deepseek/deepseek-chat-v3-2")
    else:
        # Route to local model — zero cost
        from langchain_community.llms import Ollama
        return Ollama(model="hydra-analyst", base_url="http://localhost:11434")
```
The CASTER paper (January 2026) formalizes this "Context-Aware Strategy for Task Efficient Routing" approach in graph-based multi-agent systems and reports up to 72.4% cost reduction without quality degradation.
Compounding: how agent swarms get smarter over time
The "compounding" effect is the actual differentiator of multi-agent systems — not parallelization (though that helps), but the way agent outputs feed back as context into other agents across cycles.
In a naive implementation, each agent runs once and outputs a result. In a compounding swarm:
- The Sentinel retrieves current pool state and tags it with a confidence score
- The Analyst evaluates the pool state using the fine-tuned model, producing a risk assessment
- The risk assessment is stored in the shared state and also in the knowledge base (vector store)
- On the next cycle, the Sentinel retrieves not just raw protocol data but also the Analyst's previous assessment — it has learned from the last cycle
- The Strategist observes that the Analyst's assessment has been consistent for 3 cycles and increases its decision confidence accordingly
Agent memory compounds. Risk patterns become learned context. The system's effective intelligence grows with each cycle, without retraining any model.
This requires the Mem0 memory layer (introduced in Article 3) and explicit state management in the LangGraph graph.
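One of those compounding steps — the Strategist raising its confidence after three consistent Analyst cycles — is mechanical once assessments live in shared state. A minimal sketch; the labels, base confidence, and boost values are illustrative rather than Hydra's actual schema:

```python
def update_confidence(history, base_confidence=0.6, boost=0.1, window=3):
    """
    Compounding rule: if the Analyst's risk label has been stable for the
    last `window` cycles, the Strategist raises its decision confidence.
    `history` holds past risk labels, newest last.
    """
    if len(history) >= window and len(set(history[-window:])) == 1:
        return min(1.0, round(base_confidence + boost, 2))
    return base_confidence

# Three consecutive "low" assessments raise confidence; a mixed run does not
print(update_confidence(["low", "low", "low"]))     # 0.7
print(update_confidence(["low", "medium", "low"]))  # 0.6
```

In the real graph, `history` would be read from the Mem0 layer or the checkpointed LangGraph state rather than passed in directly.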
Hydra — Article 6 contribution: the Oracle agent and full swarm coordination
The Oracle is the last information agent — it provides macro context (market sentiment, protocol news, governance signals) using Agentic RAG over web and social sources. The Strategist now has three information streams: Sentinel (on-chain), Oracle (off-chain), Analyst (domain reasoning).
```python
# hydra/oracle.py
import httpx
from langchain_openrouter import ChatOpenRouter

from hydra.orchestrator import HydraState
from hydra.router import route_by_complexity

# Budget model for web signal triage — high volume, low complexity
oracle_llm = ChatOpenRouter(model="openai/gpt-5-nano")

async def oracle_node(state: HydraState) -> HydraState:
    """
    Critic-refiner pattern: fetches web signals, evaluates quality,
    passes only high-confidence signals to the Strategist.
    """
    positions = [pos["protocol"] for pos in state["portfolio"].get("positions", [])]
    queries = [f"{protocol} security news risk governance 2026" for protocol in positions]
    raw_signals = []
    async with httpx.AsyncClient() as client:
        for query in queries:
            # Firecrawl for web research (via n8n webhook in production)
            response = await client.get(
                "http://localhost:5678/webhook/hydra-research",
                params={"q": query},
            )
            raw_signals.extend(response.json().get("results", []))
    # Critic step: filter signals by relevance and recency
    critique_prompt = (
        f"Filter these signals for relevance to portfolio positions {positions}. "
        f"Return only signals with material impact. Signals:\n{raw_signals}"
    )
    relevant_signals = await oracle_llm.ainvoke(critique_prompt)
    return {**state, "signals": state["signals"] + [
        {"source": "oracle", "content": relevant_signals.content, "confidence": 0.75}
    ]}

# Updated Strategist with full swarm coordination
async def strategist_node(state: HydraState) -> HydraState:
    synthesis_task = (
        f"Portfolio has {len(state['portfolio'].get('positions', []))} positions. "
        f"Received {len(state['signals'])} signals. Produce portfolio-level decisions."
    )
    # Route to the right model based on complexity
    strategist_llm = await route_by_complexity(synthesis_task)
    response = await strategist_llm.ainvoke(
        f"Portfolio: {state['portfolio']}\n\n"
        f"Signals from Sentinel (on-chain), Oracle (off-chain), and Analyst:\n{state['signals']}\n\n"
        f"Identified risks: {state['risks']}\n\n"
        f"Propose specific portfolio adjustments with rationale."
    )
    decisions = _parse_decisions(response.content)
    return {**state, "decisions": decisions}
```
# Updated Strategist with full swarm coordination
async def strategist_node(state: HydraState) -> HydraState:
from hydra.router import route_by_complexity
synthesis_task = (
f"Portfolio has {len(state['portfolio'].get('positions', []))} positions. "
f"Received {len(state['signals'])} signals. Produce portfolio-level decisions."
)
# Route to the right model based on complexity
strategist_llm = await route_by_complexity(synthesis_task)
response = await strategist_llm.ainvoke(
f"Portfolio: {state['portfolio']}\n\n"
f"Signals from Sentinel (on-chain), Oracle (off-chain), and Analyst:\n{state['signals']}\n\n"
f"Identified risks: {state['risks']}\n\n"
f"Propose specific portfolio adjustments with rationale."
)
decisions = _parse_decisions(response.content)
return {**state, "decisions": decisions}
The final LangGraph graph — all six agents coordinated:
```python
# hydra/orchestrator.py (final Article 6 version)
from langgraph.graph import StateGraph, START, END

from hydra.sentinel import sentinel_node
from hydra.oracle import oracle_node, strategist_node
from hydra.analyst import analyst_node
from hydra.executor import executor_node
from hydra.observer import get_langfuse_callback

def build_hydra_graph(checkpointer=None):
    graph = StateGraph(HydraState)
    # Information gathering — run in parallel
    graph.add_node("sentinel", sentinel_node)  # on-chain RAG
    graph.add_node("oracle", oracle_node)      # off-chain RAG, critic-refiner
    graph.add_node("analyst", analyst_node)    # fine-tuned DeFi specialist
    # Orchestration and execution
    graph.add_node("strategist", strategist_node)  # synthesis + routing
    graph.add_node("executor", executor_node)      # n8n webhook → on-chain
    # Parallel information gathering from START
    graph.add_edge(START, "sentinel")
    graph.add_edge(START, "oracle")
    # Analyst waits for Sentinel signals
    graph.add_edge("sentinel", "analyst")
    # Strategist waits for all three
    graph.add_edge("analyst", "strategist")
    graph.add_edge("oracle", "strategist")
    # Conditional execution gate
    graph.add_conditional_edges(
        "strategist",
        lambda s: "executor" if s["decisions"] and s["human_approved"] else END,
        {"executor": "executor", END: END},
    )
    graph.add_edge("executor", END)
    return graph.compile(checkpointer=checkpointer)
```
Hydra cost architecture at scale
With model tiering, caching, and the fine-tuned local model in place, the estimated monthly cost for a moderately active Hydra instance (100 decision cycles/day):
| Agent | Model | Est. monthly cost |
|---|---|---|
| Strategist | Sonnet 4.6 (with prompt caching) | ~$120 |
| Sentinel | DeepSeek V3.2 | ~$15 |
| Analyst | Fine-tuned Qwen 7B via Ollama | ~$5 (electricity) |
| Oracle | GPT-5 Nano (with semantic cache) | ~$8 |
| Executor | GPT-5 Nano (structured calls) | ~$3 |
| Total | | ~$151/month |
Compared to running every agent on Sonnet 4.6: ~$2,700/month. 94% cost reduction from model tiering alone.
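As a sanity check, the total and the headline reduction follow directly from the per-agent estimates in the table:

```python
# Per-agent monthly estimates from the cost table above
tiered = {"strategist": 120, "sentinel": 15, "analyst": 5, "oracle": 8, "executor": 3}
all_sonnet = 2700  # naive baseline: every agent on Sonnet 4.6

total = sum(tiered.values())
reduction = 1 - total / all_sonnet
print(f"${total}/month, {reduction:.0%} cheaper than all-frontier")
```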
The stack so far
| Layer | Technology | Status |
|---|---|---|
| Orchestration | LangGraph 1.1 | Done — Article 1 |
| Automation | n8n 2.0 | Done — Article 2 |
| Knowledge | pgvector + LlamaIndex + GraphRAG | Done — Article 3 |
| Observability | LangFuse (self-hosted) | Done — Article 4 |
| Specialization | Fine-tuned Qwen 7B via Ollama | Done — Article 5 |
| Coordination | Multi-agent swarm + OpenRouter routing | Done — this article |
| Security | SOAR + Guardian | Article 7 |
| Resilience | Structured logging · Tenacity retries · LangFuse self-hosted | Article 8 |
Next in this series: SOAR capabilities. A compounding swarm of DeFi agents is a high-value target, so Article 7 adds the Guardian: an autonomous security agent that detects exploits, simulates transactions before signing, and triggers incident response. It also lays out the full Hydra architecture proposal.
AI to Web3 series — building Hydra, a sovereign multi-agent DeFi intelligence mesh:
1 — LangChain orchestration · 2 — n8n execution · 3 — RAG at scale · 4 — LLM observability · 5 — Fine-tuning · 6 — Agent swarms · 7 — SOAR · 8 — Production resilience
Get weekly intel — courtesy of intel.hyperdrift.io