AI Agents: From ReAct to Multi-Agent Systems
An agent is what happens when an LLM stops answering once and starts acting repeatedly in the world. This guide traces the control loops, tool use, and guardrails that separate a demo agent from a dependable one.
Defining the paradigm
An agent = LLM + memory + tools + a control loop
A language model answers a single question in a single forward pass. An agent runs a loop: it perceives the current state (via context), reasons about the next action, executes that action through a tool, observes the result, and iterates until the task is complete or a stopping condition is met. The agent's power comes from composing LLM reasoning with deterministic tool execution — each component doing what it does best. Visualize as a circular flow diagram: Observe → Think → Act → Observe, with the LLM at the center and tools as external boxes it dispatches to.
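The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions — `think` is a stand-in for an LLM call, and all names (`AgentState`, `run_agent`, the `search` tool) are hypothetical, not any framework's API:

```python
# Minimal sketch of the Observe → Think → Act loop (all names hypothetical).
from dataclasses import dataclass, field

@dataclass
class AgentState:
    goal: str
    history: list = field(default_factory=list)  # (action, observation) pairs so far

def think(state: AgentState):
    # Stand-in for the LLM: decide the next action from the full history.
    if not state.history:
        return ("search", state.goal)
    return ("finish", state.history[-1])

def act(action, tools):
    name, arg = action
    return tools[name](arg)  # deterministic tool execution

def run_agent(goal, tools, max_steps=10):
    state = AgentState(goal)
    for _ in range(max_steps):  # stopping condition: hard step limit
        action = think(state)   # Think
        if action[0] == "finish":
            return action[1]
        observation = act(action, tools)  # Act
        state.history.append((action, observation))  # Observe
    return None  # budget exhausted without finishing

result = run_agent("Q3 revenue", {"search": lambda q: f"results for {q!r}"})
```

A real agent replaces `think` with a model call over the serialized history; the control-flow skeleton stays the same.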
Memory
Agents maintain context across steps via four memory types: in-context (the current prompt window), external (vector store of past interactions), episodic (structured log of prior actions and results), and semantic (knowledge graph of learned facts).
Tools
Deterministic functions the agent can invoke: search APIs, code executors, database queries, web browsers, file systems, calculators. Each tool has a typed schema the LLM uses to decide when and how to call it. Tools are the agent's "hands" in the world.
Orchestration
The control loop that manages the Observe-Think-Act cycle. May be implemented as a simple while loop (LangChain AgentExecutor), a state machine (LangGraph), or a message-passing event bus (AutoGen). Choice affects debuggability and reliability.
Single-agent: 1 LLM + tools
Multi-agent: N LLMs + roles
The foundational pattern
ReAct: interleaved reasoning and acting
ReAct (Yao et al., 2022) is the simplest complete agent pattern: at each step, the model generates a Thought (free-form reasoning about what to do next), an Action (a tool call with arguments), and an Observation (the tool's return value). This interleaving prevents the model from acting without reasoning and keeps the entire decision trace visible for debugging. ReAct is the default agent pattern in LangChain and most frameworks.
Thought
Free-form natural language reasoning. "I need to check the current price before placing the order. I should call the pricing API." The thought is never executed — it just improves action selection by making the model reason explicitly before committing.
Action
A structured tool call: search("Q3 earnings Apple"), python_repl("import pandas as pd; pd.read_csv('/reports/q3.csv').describe()"), or file_read("/reports/q3.csv"). The LLM generates JSON matching the tool's schema. The framework executes it deterministically.
Observation
The tool's return value, injected back into context. The model now has the tool result available in its next reasoning step. Truncate long observations (e.g., web page content) before injection — they can overflow the context window.
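One Thought–Action–Observation cycle can be sketched as follows. The text format, tool, and function names here are illustrative assumptions — production frameworks use structured tool-calling APIs rather than regex parsing:

```python
# Hedged sketch of one ReAct step: parse a Thought/Action pair from model
# output, execute the tool, and truncate the Observation before re-injection.
import json
import re

MODEL_OUTPUT = """Thought: I need the current price before ordering.
Action: {"tool": "get_price", "args": {"symbol": "AAPL"}}"""

TOOLS = {"get_price": lambda symbol: 189.98}  # deterministic stand-in

def react_step(text, tools, max_obs_len=500):
    thought = re.search(r"Thought: (.*)", text).group(1)
    call = json.loads(re.search(r"Action: (\{.*\})", text, re.S).group(1))
    result = tools[call["tool"]](**call["args"])
    observation = str(result)[:max_obs_len]  # truncate long tool output
    return thought, call, observation

thought, call, obs = react_step(MODEL_OUTPUT, TOOLS)
# `obs` is appended to the prompt so the next Thought can use it.
```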
An agent trajectory is a sequence of observations o_t, actions a_t, and rewards r_t. The policy π(a_t | o_1, a_1, …, o_t) maps the full history of observations and actions to the next action — implemented by the LLM over its context window.
Scaling with specialization
Multi-agent systems: orchestrator + specialized workers
A single LLM has a finite context window and cannot reliably track long-horizon tasks with many parallel subtasks. Multi-agent systems decompose the problem: an orchestrator plans and delegates; specialized worker agents handle narrow subtasks (research, coding, critique, summarization). Each agent can have its own tools, memory, and even a different underlying model. The orchestrator aggregates results and decides what to do next. Visualize as a hub-and-spoke: orchestrator at the center dispatching to worker nodes, with results flowing back as messages.
LangGraph pattern
Models agent state as a typed graph: nodes = agent functions, edges = conditional transitions, state = shared typed dict. Explicit state machine with cycle detection and streaming makes it debuggable. The current production-grade standard for complex agentic workflows.
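The graph idea can be shown with a hand-rolled state machine. To be clear, this is not LangGraph's actual API — it is a minimal sketch of the same concepts: nodes as functions over a shared typed state, edges as conditional transitions, and a cycle guard:

```python
# Hand-rolled sketch of a typed state graph (not LangGraph's API).
from typing import Callable, Optional, TypedDict

class State(TypedDict):
    draft: str
    approved: bool

def write(state: State) -> State:
    return {"draft": state["draft"] + " revised", "approved": False}

def review(state: State) -> State:
    return {**state, "approved": "revised" in state["draft"]}

NODES: dict[str, Callable[[State], State]] = {"write": write, "review": review}

def route(node: str, state: State) -> Optional[str]:
    # Conditional edges: loop back to "write" until the reviewer approves.
    if node == "write":
        return "review"
    return None if state["approved"] else "write"

def run(state: State, start: str = "write", max_steps: int = 10) -> State:
    node = start
    for _ in range(max_steps):  # cycle guard against infinite transitions
        state = NODES[node](state)
        next_node = route(node, state)
        if next_node is None:
            break
        node = next_node
    return state

final = run({"draft": "v1", "approved": False})
```

The explicit transition table is what makes this style debuggable: every state change is a named node, and the route function is the only place control flow lives.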
AutoGen / CrewAI pattern
Agents communicate via natural language messages in a shared conversation thread. Easier to prototype; harder to reason about state deterministically. Good for creative or open-ended tasks where rigid state machines limit emergent behavior.
Task Decomposition
Step 1: The orchestrator receives a complex user request and breaks it into a dependency graph of subtasks. Tasks with no dependencies run in parallel; dependent tasks wait. Running independent subtasks concurrently reduces total latency compared to serial execution.
Agent Dispatch
Step 2: Each subtask is routed to the best-suited worker agent. Worker agents are specialized: a researcher agent has web search + RAG tools; a coder agent has a Python REPL + code sandbox; a critic agent has no tools but evaluates outputs.
Parallel Execution
Step 3: Worker agents run their ReAct loops independently. Long-running tasks execute concurrently while the orchestrator monitors progress and handles failures. Each worker maintains its own short-term context; shared state goes to a message bus.
Aggregation & Verification
Step 4: The orchestrator collects worker outputs and evaluates them for consistency. A dedicated critic agent checks factual consistency across workers. Human-in-the-loop gates (approval checkpoints) can be inserted before high-stakes actions.
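The four steps above can be sketched with a thread pool standing in for concurrent agent runs. Worker names, tasks, and the fixed decomposition are hypothetical — in practice each worker would be its own ReAct loop and decomposition would be done by the orchestrator LLM:

```python
# Sketch of decompose → dispatch → parallel run → aggregate (names hypothetical).
from concurrent.futures import ThreadPoolExecutor

WORKERS = {  # each would be a full ReAct agent in a real system
    "researcher": lambda task: f"findings for {task}",
    "coder": lambda task: f"script for {task}",
}

def decompose(request):
    # Step 1: these subtasks have no dependencies, so both can run in parallel.
    return [("researcher", "market data"), ("coder", "analysis notebook")]

def orchestrate(request):
    subtasks = decompose(request)
    with ThreadPoolExecutor() as pool:  # Steps 2–3: dispatch and run concurrently
        futures = {pool.submit(WORKERS[w], t): w for w, t in subtasks}
        results = {futures[f]: f.result() for f in futures}
    # Step 4: a critic agent would check cross-worker consistency here.
    return results

out = orchestrate("analyze Q3 market")
```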
Production engineering
Making agents reliable: the four failure modes
Agents fail in ways that single-inference systems never do. A failed tool call partway through a 20-step task can be catastrophic — unlike a bad single response that the user can retry. Production agents need explicit failure handling, cost control, and observability at every step.
Planning failures
Agent pursues a wrong plan for many steps before realizing it. Fix: require explicit plan generation before execution, then validate the plan against task requirements. Add reflection steps: "Check if the current trajectory is still on track."
Tool call failures
Tool returns an error or unexpected format; agent ignores it and hallucinates a plausible result. Fix: typed tool schemas with explicit error handling. Provide fallback tools. Log every tool call with input, output, and latency.
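The fix described above can be sketched as a defensive tool wrapper. The schema format and field names are illustrative assumptions — the point is that errors come back as explicit observations the agent must react to, and every call is logged:

```python
# Sketch of defensive tool execution: validate args against a typed schema,
# surface errors as observations instead of silent failures, log every call.
import time

SCHEMA = {"query": str, "limit": int}  # hypothetical tool schema

def validate(args, schema):
    for key, typ in schema.items():
        if not isinstance(args.get(key), typ):
            raise TypeError(f"{key} must be {typ.__name__}")

def call_tool(fn, args, schema):
    start = time.monotonic()
    try:
        validate(args, schema)
        result, ok = fn(**args), True
    except Exception as exc:
        # The agent sees the error explicitly rather than a hallucinated result.
        result, ok = f"TOOL_ERROR: {exc}", False
    log = {"input": args, "output": result, "latency_s": time.monotonic() - start}
    return result, ok, log

search = lambda query, limit: [query] * limit
good, ok, _ = call_tool(search, {"query": "apple", "limit": 2}, SCHEMA)
bad, bad_ok, _ = call_tool(search, {"query": "apple", "limit": "2"}, SCHEMA)
```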
Infinite loops
Agent gets stuck in a retry loop — tool fails, agent retries with same arguments indefinitely. Fix: max_iterations hard limit, exponential backoff, circuit breaker pattern. Log loop detection signals (repeated identical tool calls).
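The loop guards above can be sketched directly; the thresholds here are illustrative defaults, not recommendations from any framework:

```python
# Sketch of loop guards: exponential backoff schedule and a circuit-breaker
# signal that fires when recent tool calls are identical (thresholds illustrative).
def backoff_delays(base=1.0, retries=4):
    # Delay before each retry doubles: 1, 2, 4, 8 seconds.
    return [base * 2 ** i for i in range(retries)]

def is_stuck(call_log, window=3):
    # Loop-detection signal: the last `window` tool calls are all identical.
    return len(call_log) >= window and len(set(call_log[-window:])) == 1

calls = [("search", "q"), ("search", "q"), ("search", "q")]
stuck = is_stuck(calls)  # trips the circuit breaker before a fourth retry
```

In a full agent, `is_stuck` would run after every tool call, and tripping it would abort the run or force the model to change strategy rather than retry.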
Cost explosion
A 20-step agent using GPT-4o at each step with large contexts costs $1+ per run. Unexpected loops multiply this 10x. Fix: token budget per run, model routing (use GPT-4o-mini for simple tool calls, GPT-4o for complex reasoning), cost alarms.
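Budgeting and routing can be sketched as follows. The prices and model names are assumptions for illustration, not quoted rates — check your provider's current pricing:

```python
# Sketch of per-run token budgeting and model routing (prices hypothetical).
PRICE_PER_1K = {"gpt-4o": 0.005, "gpt-4o-mini": 0.00015}  # assumed $/1K tokens

def pick_model(step_kind):
    # Route cheap, routine steps to the small model; reserve the large one.
    return "gpt-4o" if step_kind == "complex_reasoning" else "gpt-4o-mini"

class Budget:
    def __init__(self, max_tokens):
        self.max_tokens, self.used = max_tokens, 0

    def charge(self, tokens, model):
        if self.used + tokens > self.max_tokens:
            raise RuntimeError("token budget exhausted; abort run")
        self.used += tokens
        return tokens / 1000 * PRICE_PER_1K[model]  # cost of this step

budget = Budget(max_tokens=50_000)
cost = budget.charge(2_000, pick_model("tool_call"))
```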
Max iterations: 10–30 steps
Avg. agent cost: $0.05–$2 / run
Checklist
- Instrument every agent run with a full trace: step index, thought, action, tool name, token count, latency, cost.
- Set max_iterations (20 default) and max_cost_per_run budgets and handle graceful degradation.
- Require human approval before any irreversible action: sending emails, writing to databases, making purchases.
- Test agent robustness with adversarial inputs: missing tool responses, contradictory search results, malformed JSON returns.
- Version your tool schemas alongside your agent prompts — schema changes break running agents silently.
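The first checklist item can be sketched as a per-step trace record. Field names follow the checklist; the storage backend is deliberately left open — in practice this would ship to a tracing pipeline rather than stdout:

```python
# Sketch of a per-step agent trace record (fields follow the checklist above).
import json
import time

def trace_step(step, thought, action, tool, tokens, latency_s, cost_usd):
    record = {"step": step, "thought": thought, "action": action,
              "tool": tool, "tokens": tokens, "latency_s": latency_s,
              "cost_usd": cost_usd, "ts": time.time()}
    print(json.dumps(record))  # stand-in for shipping to a log pipeline
    return record

rec = trace_step(1, "check price first", {"symbol": "AAPL"},
                 "get_price", 812, 0.4, 0.004)
```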