AI Agents: From ReAct to Multi-Agent Systems
An agent is what happens when an LLM stops answering once and starts acting repeatedly in the world. This guide traces the control loops, tool use, and guardrails that separate a demo agent from a dependable one.
Defining the paradigm
An agent = LLM + memory + tools + a control loop
A language model answers a single question in a single forward pass. An agent runs a loop: it perceives the current state (via context), reasons about the next action, executes that action through a tool, observes the result, and iterates until the task is complete or a stopping condition is met. The agent's power comes from composing LLM reasoning with deterministic tool execution — each component doing what it does best. Visualize as a circular flow diagram: Observe → Think → Act → Observe, with the LLM at the center and tools as external boxes it dispatches to.
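The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions — `think` is a stand-in for an LLM call, and all names (`AgentState`, `run_agent`, the `search` tool) are hypothetical, not any framework's API:

```python
# Minimal sketch of the Observe → Think → Act loop (all names hypothetical).
from dataclasses import dataclass, field

@dataclass
class AgentState:
    goal: str
    history: list = field(default_factory=list)  # (action, observation) pairs so far

def think(state: AgentState):
    # Stand-in for the LLM: decide the next action from the full history.
    if not state.history:
        return ("search", state.goal)
    return ("finish", state.history[-1])

def act(action, tools):
    name, arg = action
    return tools[name](arg)  # deterministic tool execution

def run_agent(goal, tools, max_steps=10):
    state = AgentState(goal)
    for _ in range(max_steps):  # stopping condition: hard step limit
        action = think(state)   # Think
        if action[0] == "finish":
            return action[1]
        observation = act(action, tools)  # Act
        state.history.append((action, observation))  # Observe
    return None  # budget exhausted without finishing

result = run_agent("Q3 revenue", {"search": lambda q: f"results for {q!r}"})
```

A real agent replaces `think` with a model call over the serialized history; the control-flow skeleton stays the same.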
Memory
Agents maintain context across steps via four memory types: in-context (the current prompt window), external (vector store of past interactions), episodic (structured log of prior actions and results), and semantic (knowledge graph of learned facts).
Tools
Deterministic functions the agent can invoke: search APIs, code executors, database queries, web browsers, file systems, calculators. Each tool has a typed schema the LLM uses to decide when and how to call it. Tools are the agent's "hands" in the world.
Orchestration
The control loop that manages the Observe-Think-Act cycle. May be implemented as a simple while loop (LangChain AgentExecutor), a state machine (LangGraph), or a message-passing event bus (AutoGen). Choice affects debuggability and reliability.
Single-agent: 1 LLM + tools
Multi-agent: N LLMs + roles
The foundational pattern
ReAct: interleaved reasoning and acting
ReAct (Yao et al., 2022) is the simplest complete agent pattern: at each step, the model generates a Thought (free-form reasoning about what to do next), an Action (a tool call with arguments), and an Observation (the tool's return value). This interleaving prevents the model from acting without reasoning and keeps the entire decision trace visible for debugging. ReAct is the default agent pattern in LangChain and most frameworks.
Thought
Free-form natural language reasoning. "I need to check the current price before placing the order. I should call the pricing API." The thought is never executed — it just improves action selection by making the model reason explicitly before committing.
Action
A structured tool call: search("Q3 earnings Apple"), python_repl("import pandas as pd; pd.read_csv('/reports/q3.csv').describe()"), or file_read("/reports/q3.csv"). The LLM generates JSON matching the tool's schema. The framework executes it deterministically.
Observation
The tool's return value, injected back into context. The model now has the tool result available in its next reasoning step. Truncate long observations (e.g., web page content) before injection — they can overflow the context window.
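One Thought–Action–Observation cycle can be sketched as follows. The text format, tool, and function names here are illustrative assumptions — production frameworks use structured tool-calling APIs rather than regex parsing:

```python
# Hedged sketch of one ReAct step: parse a Thought/Action pair from model
# output, execute the tool, and truncate the Observation before re-injection.
import json
import re

MODEL_OUTPUT = """Thought: I need the current price before ordering.
Action: {"tool": "get_price", "args": {"symbol": "AAPL"}}"""

TOOLS = {"get_price": lambda symbol: 189.98}  # deterministic stand-in

def react_step(text, tools, max_obs_len=500):
    thought = re.search(r"Thought: (.*)", text).group(1)
    call = json.loads(re.search(r"Action: (\{.*\})", text, re.S).group(1))
    result = tools[call["tool"]](**call["args"])
    observation = str(result)[:max_obs_len]  # truncate long tool output
    return thought, call, observation

thought, call, obs = react_step(MODEL_OUTPUT, TOOLS)
# `obs` is appended to the prompt so the next Thought can use it.
```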
An agent trajectory is a sequence of observations o_t, actions a_t, and rewards r_t. The policy π(a_t | o_1, a_1, …, o_t) maps the full history of observations and actions to the next action — implemented by the LLM over its context window.
Scaling with specialization
Multi-agent systems: orchestrator + specialized workers
A single LLM has a finite context window and cannot reliably track long-horizon tasks with many parallel subtasks. Multi-agent systems decompose the problem: an orchestrator plans and delegates; specialized worker agents handle narrow subtasks (research, coding, critique, summarization). Each agent can have its own tools, memory, and even a different underlying model. The orchestrator aggregates results and decides what to do next. Visualize as a hub-and-spoke: orchestrator at the center dispatching to worker nodes, with results flowing back as messages.
LangGraph pattern
Models agent state as a typed graph: nodes = agent functions, edges = conditional transitions, state = shared typed dict. Explicit state machine with cycle detection and streaming makes it debuggable. The current production-grade standard for complex agentic workflows.
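The graph idea can be shown with a hand-rolled state machine. To be clear, this is not LangGraph's actual API — it is a minimal sketch of the same concepts: nodes as functions over a shared typed state, edges as conditional transitions, and a cycle guard:

```python
# Hand-rolled sketch of a typed state graph (not LangGraph's API).
from typing import Callable, Optional, TypedDict

class State(TypedDict):
    draft: str
    approved: bool

def write(state: State) -> State:
    return {"draft": state["draft"] + " revised", "approved": False}

def review(state: State) -> State:
    return {**state, "approved": "revised" in state["draft"]}

NODES: dict[str, Callable[[State], State]] = {"write": write, "review": review}

def route(node: str, state: State) -> Optional[str]:
    # Conditional edges: loop back to "write" until the reviewer approves.
    if node == "write":
        return "review"
    return None if state["approved"] else "write"

def run(state: State, start: str = "write", max_steps: int = 10) -> State:
    node = start
    for _ in range(max_steps):  # cycle guard against infinite transitions
        state = NODES[node](state)
        next_node = route(node, state)
        if next_node is None:
            break
        node = next_node
    return state

final = run({"draft": "v1", "approved": False})
```

The explicit transition table is what makes this style debuggable: every state change is a named node, and the route function is the only place control flow lives.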
AutoGen / CrewAI pattern
Agents communicate via natural language messages in a shared conversation thread. Easier to prototype; harder to reason about state deterministically. Good for creative or open-ended tasks where rigid state machines limit emergent behavior.
Task Decomposition
Step 1: The orchestrator receives a complex user request and breaks it into a dependency graph of subtasks. Tasks with no dependencies run in parallel; dependent tasks wait. Running independent subtasks concurrently reduces total latency compared to serial execution.
Agent Dispatch
Step 2: Each subtask is routed to the best-suited worker agent. Worker agents are specialized: a researcher agent has web search + RAG tools; a coder agent has a Python REPL + code sandbox; a critic agent has no tools but evaluates outputs.
Parallel Execution
Step 3: Worker agents run their ReAct loops independently. Long-running tasks execute concurrently while the orchestrator monitors progress and handles failures. Each worker maintains its own short-term context; shared state goes to a message bus.
Aggregation & Verification
Step 4: The orchestrator collects worker outputs and evaluates them for consistency. A dedicated critic agent checks factual consistency across workers. Human-in-the-loop gates (approval checkpoints) can be inserted before high-stakes actions.
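The four steps above can be sketched with a thread pool standing in for concurrent agent runs. Worker names, tasks, and the fixed decomposition are hypothetical — in practice each worker would be its own ReAct loop and decomposition would be done by the orchestrator LLM:

```python
# Sketch of decompose → dispatch → parallel run → aggregate (names hypothetical).
from concurrent.futures import ThreadPoolExecutor

WORKERS = {  # each would be a full ReAct agent in a real system
    "researcher": lambda task: f"findings for {task}",
    "coder": lambda task: f"script for {task}",
}

def decompose(request):
    # Step 1: these subtasks have no dependencies, so both can run in parallel.
    return [("researcher", "market data"), ("coder", "analysis notebook")]

def orchestrate(request):
    subtasks = decompose(request)
    with ThreadPoolExecutor() as pool:  # Steps 2–3: dispatch and run concurrently
        futures = {pool.submit(WORKERS[w], t): w for w, t in subtasks}
        results = {futures[f]: f.result() for f in futures}
    # Step 4: a critic agent would check cross-worker consistency here.
    return results

out = orchestrate("analyze Q3 market")
```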
Production engineering
Making agents reliable: the four failure modes
Agents fail in ways that single-inference systems never do. A failed tool call partway through a 20-step task can be catastrophic — unlike a bad single response that the user can retry. Production agents need explicit failure handling, cost control, and observability at every step.
Planning failures
Agent pursues a wrong plan for many steps before realizing it. Fix: require explicit plan generation before execution, then validate the plan against task requirements. Add reflection steps: "Check if the current trajectory is still on track."
Tool call failures
Tool returns an error or unexpected format; agent ignores it and hallucinates a plausible result. Fix: typed tool schemas with explicit error handling. Provide fallback tools. Log every tool call with input, output, and latency.
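The fix described above can be sketched as a defensive tool wrapper. The schema format and field names are illustrative assumptions — the point is that errors come back as explicit observations the agent must react to, and every call is logged:

```python
# Sketch of defensive tool execution: validate args against a typed schema,
# surface errors as observations instead of silent failures, log every call.
import time

SCHEMA = {"query": str, "limit": int}  # hypothetical tool schema

def validate(args, schema):
    for key, typ in schema.items():
        if not isinstance(args.get(key), typ):
            raise TypeError(f"{key} must be {typ.__name__}")

def call_tool(fn, args, schema):
    start = time.monotonic()
    try:
        validate(args, schema)
        result, ok = fn(**args), True
    except Exception as exc:
        # The agent sees the error explicitly rather than a hallucinated result.
        result, ok = f"TOOL_ERROR: {exc}", False
    log = {"input": args, "output": result, "latency_s": time.monotonic() - start}
    return result, ok, log

search = lambda query, limit: [query] * limit
good, ok, _ = call_tool(search, {"query": "apple", "limit": 2}, SCHEMA)
bad, bad_ok, _ = call_tool(search, {"query": "apple", "limit": "2"}, SCHEMA)
```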
Infinite loops
Agent gets stuck in a retry loop — tool fails, agent retries with same arguments indefinitely. Fix: max_iterations hard limit, exponential backoff, circuit breaker pattern. Log loop detection signals (repeated identical tool calls).
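The loop guards above can be sketched directly; the thresholds here are illustrative defaults, not recommendations from any framework:

```python
# Sketch of loop guards: exponential backoff schedule and a circuit-breaker
# signal that fires when recent tool calls are identical (thresholds illustrative).
def backoff_delays(base=1.0, retries=4):
    # Delay before each retry doubles: 1, 2, 4, 8 seconds.
    return [base * 2 ** i for i in range(retries)]

def is_stuck(call_log, window=3):
    # Loop-detection signal: the last `window` tool calls are all identical.
    return len(call_log) >= window and len(set(call_log[-window:])) == 1

calls = [("search", "q"), ("search", "q"), ("search", "q")]
stuck = is_stuck(calls)  # trips the circuit breaker before a fourth retry
```

In a full agent, `is_stuck` would run after every tool call, and tripping it would abort the run or force the model to change strategy rather than retry.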
Cost explosion
A 20-step agent using GPT-4o at each step with large contexts costs $1+ per run. Unexpected loops multiply this 10x. Fix: token budget per run, model routing (use GPT-4o-mini for simple tool calls, GPT-4o for complex reasoning), cost alarms.
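Budgeting and routing can be sketched as follows. The prices and model names are assumptions for illustration, not quoted rates — check your provider's current pricing:

```python
# Sketch of per-run token budgeting and model routing (prices hypothetical).
PRICE_PER_1K = {"gpt-4o": 0.005, "gpt-4o-mini": 0.00015}  # assumed $/1K tokens

def pick_model(step_kind):
    # Route cheap, routine steps to the small model; reserve the large one.
    return "gpt-4o" if step_kind == "complex_reasoning" else "gpt-4o-mini"

class Budget:
    def __init__(self, max_tokens):
        self.max_tokens, self.used = max_tokens, 0

    def charge(self, tokens, model):
        if self.used + tokens > self.max_tokens:
            raise RuntimeError("token budget exhausted; abort run")
        self.used += tokens
        return tokens / 1000 * PRICE_PER_1K[model]  # cost of this step

budget = Budget(max_tokens=50_000)
cost = budget.charge(2_000, pick_model("tool_call"))
```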
Max iterations: 10–30 steps
Avg. agent cost: $0.05–$2 / run
Checklist
- Instrument every agent run with a full trace: step index, thought, action, tool name, token count, latency, cost.
- Set max_iterations (20 default) and max_cost_per_run budgets and handle graceful degradation.
- Require human approval before any irreversible action: sending emails, writing to databases, making purchases.
- Test agent robustness with adversarial inputs: missing tool responses, contradictory search results, malformed JSON returns.
- Version your tool schemas alongside your agent prompts — schema changes break running agents silently.
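The first checklist item can be sketched as a per-step trace record. Field names follow the checklist; the storage backend is deliberately left open — in practice this would ship to a tracing pipeline rather than stdout:

```python
# Sketch of a per-step agent trace record (fields follow the checklist above).
import json
import time

def trace_step(step, thought, action, tool, tokens, latency_s, cost_usd):
    record = {"step": step, "thought": thought, "action": action,
              "tool": tool, "tokens": tokens, "latency_s": latency_s,
              "cost_usd": cost_usd, "ts": time.time()}
    print(json.dumps(record))  # stand-in for shipping to a log pipeline
    return record

rec = trace_step(1, "check price first", {"symbol": "AAPL"},
                 "get_price", 812, 0.4, 0.004)
```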