What Agents Actually Are
Build the mental model. Kill the confusion.
The One-Paragraph Explanation
Everyone's using the word agent now. And look, i get it. It's the hot term. But most of the time, when people say agent, they mean very different things, and the confusion is... actually making it harder to understand what's happening.
So let me try to cut through it.
An agent is a system that can take actions autonomously to achieve a goal. Not just answer questions (that's a chatbot). Not just suggest code edits or research a topic (that's a copilot). An agent receives a goal, breaks it into steps, uses tools to execute those steps, and adapts when things go wrong. The key word is autonomy. The system is making decisions, not just responding to prompts.
That's the simplest version. But like most simple explanations, it hides the part that actually matters.
The loop.
The loop is the whole thing
Almost every agent, from the simplest task runner to the most ambitious autonomous system, reduces to the same control loop. A goal comes in. The system plans or picks a next step. It acts, usually by calling a tool. It observes what happened. It evaluates whether that helped, whether it violated constraints, whether it created new problems. Then it updates its state, revises the plan, and decides whether to keep going or hand off to a human. Repeat until done, blocked, or stuck.
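Here's that loop as a minimal sketch. Everything in it (the plan/act/evaluate callables, the state dict, the step budget) is a placeholder for whatever your stack actually provides, not any real framework:

```python
from typing import Callable

MAX_STEPS = 20  # hard stop: agent loops without a budget can run away

def run_agent(
    goal: str,
    plan: Callable[[dict], str],           # picks the next step from state
    act: Callable[[str], str],             # executes the step, usually via a tool
    evaluate: Callable[[dict, str], str],  # returns "done" | "blocked" | "continue"
) -> dict:
    """Minimal control loop: plan -> act -> observe -> evaluate -> update."""
    state = {"goal": goal, "history": [], "status": "running"}
    for _ in range(MAX_STEPS):
        step = plan(state)                            # plan / pick next step
        observation = act(step)                       # act and observe what happened
        state["history"].append((step, observation))  # update working state
        verdict = evaluate(state, observation)        # helped? broke a constraint?
        if verdict in ("done", "blocked"):
            state["status"] = verdict                 # finish, or hand off to a human
            return state
    state["status"] = "budget_exhausted"              # explicit stop, not a silent spin
    return state
```

Note the two things demo code usually skips: a hard step budget and an explicit hand-off when the loop is blocked.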
If you want to understand why agents fail, start here. Most failures don't happen because the model is dumb. They're failures of step selection, tool choice, verification, or state management inside this loop. i keep seeing teams blame the model when the real problem is that their loop has no error recovery, no stop conditions, no way to back out of a bad path.
Once you internalize this, the whole field starts to make more sense.
The Agent Stack
i think the clearest way to understand agents is as a stack. Five layers, each one making the loop more capable (and potentially more dangerous).
Layer 1: Foundation Model. The brain. GPT-5, Claude, Gemini, Grok, whatever. This is the reasoning engine that powers everything above it. Without a capable foundation model, nothing else works.
Layer 2: Reasoning & Planning. The ability to break a goal into steps, figure out what to do next, and adjust when a step fails. This is the hardest unsolved problem in agents right now. i'll go deeper in chapter 2.
Layer 3: Memory. What separates a chatbot from an agent. Short-term memory (the current context window), long-term memory (what happened in previous sessions), and episodic memory (learning from past runs). Without memory, every interaction starts from zero. Agents need a working state for the current run: what's happened, what's pending, what's true right now.
Layer 4: Tools. The hands. APIs, web browsers, code interpreters, databases, file systems. Tools are how agents interact with the real world. Function calling is the mechanism that makes this possible (there's a concrete sketch just below, after Layer 5).
Layer 5: Actions. The actual work. Sending an email, deploying code, making a trade, filing a document. This is where agents cross from interesting demo to being actually useful.
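To make Layer 4 concrete: here's roughly what a single tool definition looks like, in the JSON-schema shape OpenAI-style function calling uses (Anthropic's format is close, with slightly different field names). The send_email tool itself is a made-up example:

```python
# A made-up send_email tool, declared in the OpenAI-style "tools" shape.
# The model never runs this code. It emits a call like
#   {"name": "send_email", "arguments": "{\"to\": ...}"}
# and your runtime executes it, then feeds the result back into the loop.
send_email_tool = {
    "type": "function",
    "function": {
        "name": "send_email",
        "description": "Send an email on the user's behalf.",
        "parameters": {
            "type": "object",
            "properties": {
                "to": {"type": "string", "description": "Recipient address"},
                "subject": {"type": "string"},
                "body": {"type": "string"},
            },
            "required": ["to", "subject", "body"],
        },
    },
}
```

The schema is doing double duty: it tells the model what the tool can do, and it constrains the arguments the model is allowed to pass.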
Most agents aren't agents
A lot of what gets called agents is really automation with a model attached.
- Workflow automation: fixed steps, deterministic routing. Low variance, high reliability, usually great ROI.
- Agentic workflows: some steps fixed, some chosen by a model. Medium variance; needs monitoring and guardrails.
- Open-ended agents: the model chooses most steps, tools, and subgoals. High variance; strong containment required.
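Here's a toy sketch of the first two shapes, with trivial stand-in handlers (nothing here is a real framework). The only thing that changes is who picks the next step:

```python
# Trivial stand-ins; the point is who decides the next step, not the handlers.
HANDLERS = {
    "reply":    lambda ticket: f"replied to {ticket}",
    "refund":   lambda ticket: f"refunded {ticket}",
    "escalate": lambda ticket: f"escalated {ticket}",
}

def workflow_automation(ticket: str) -> str:
    # Fixed steps, deterministic routing: the code decides everything.
    return HANDLERS["reply"](ticket)

def agentic_workflow(ticket: str, choose) -> str:
    # A model chooses one step, but only from a whitelisted menu --
    # that bounded menu is the guardrail keeping variance "medium".
    action = choose(ticket, options=list(HANDLERS))
    return HANDLERS[action](ticket)

# An open-ended agent would let the model invent steps, pick tools, and set
# its own subgoals. No dict lookup saves you there; you need containment.
```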
If you're building for production, you usually want to start with workflow automation, then introduce agentic steps where human-written logic becomes too brittle or too expensive. i've been saying this for a while now, and i keep seeing teams skip straight to fully autonomous agents and pay for it later.
Autonomy is a spectrum, not a switch
Not everything labeled an agent is actually autonomous. i think about autonomy as a spectrum.
At the low end you have autocomplete. Your phone keyboard, GitHub Copilot's inline suggestions. Zero autonomy. The system is just predicting the next token based on what you've typed.
Next up: copilots. ChatGPT, Claude in chat mode, Cursor. These can do multi-step reasoning but only when you tell them to. They wait for instructions, execute, and hand the result back. They don't initiate.
Then: task agents. Devin, Replit Agent, customer support bots. Given a clear task, they can break it into steps, use tools, and execute with minimal supervision. They operate within guardrails but make real decisions. The latest one going viral in my circles is OpenClaw, a personal AI assistant you can message.
At the high end: autonomous agents. Systems that operate continuously, set their own sub-goals, and handle unexpected situations. Very few of these work reliably today. Most demos in this space are... optimistic.
Knowing where a product sits on this spectrum tells you a lot about how to evaluate it.
[Figure: the autonomy spectrum, from autocomplete to autonomous agents]
Why Now?
i get asked this a lot. Agents as a concept have been around forever. So why did they suddenly become real in 2025-2026?
What changed is that all the ingredients got good enough at the same time.
Context windows exploded. Claude went from 8K to 1M tokens. GPT-5 hit 400K. This matters because agents need to hold entire codebases, conversation histories, and tool outputs in working memory.
Tool use got reliable. Function calling went from a flaky experiment to something you could build production systems on. Models got dramatically better at knowing when to use a tool and what arguments to pass.
Reasoning models arrived. o1, o3, Claude's extended thinking. Models that can actually plan and reason through multi-step problems instead of just generating plausible-sounding text.
Cost collapsed. GPT-5 level capability went from $60/million tokens to under $1 for some models (especially the open source models from China). This made it economically viable to have agents that call the model hundreds of times per task.
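A quick back-of-envelope, with made-up but plausible numbers, shows why the price drop matters for agents specifically:

```python
# Made-up but plausible numbers: an agent task that makes 200 model calls
# averaging ~5K tokens each burns about a million tokens per task.
calls_per_task = 200
tokens_per_call = 5_000
total_tokens = calls_per_task * tokens_per_call  # 1,000,000 tokens

print(total_tokens / 1e6 * 60)  # 60.0 -> $60 per task at $60/M: a demo economy
print(total_tokens / 1e6 * 1)   # 1.0  -> $1 per task at $1/M: actually shippable
```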
Each of these alone wouldn't have been enough. Together, they created the conditions for agents to actually work.
Hype vs. Reality
Here's where i try to be honest about what's working and what isn't.
Working well right now:
- Coding tasks with clear specifications (Cursor, Devin, Claude Code)
- Customer support triage and response (Intercom, Sierra)
- Research and synthesis (Perplexity, deep research tools)
- Structured data extraction and processing
- Simple multi-step workflows with well-defined APIs
Getting there but unreliable:
- Complex multi-step reasoning across many tools
- Anything requiring long-running sessions (hours, not minutes)
- Tasks that need common sense about the physical world
- Collaborative multi-agent systems
Still 1-2 years out:
- Truly autonomous agents that run for days or weeks
- Agents that reliably handle edge cases without human oversight
- General-purpose agents that can do anything
- Agents that learn and improve from their own experience in production
i'm maybe 70% confident in these timelines. The space moves fast, and i've been surprised before. But the pattern i see is: every six months, the "working well" list grows by 2-3 items, and the "getting there" list shifts down one notch.
The biggest gap right now is between demos and production. Almost anything can be demoed. Making it work 1000 times in a row without breaking? That's where most companies are stuck. More on this in chapter 4.