How Agents Work Under the Hood
Technical intuition without requiring a CS degree.
I used to think agents were mostly a model problem.
Better model. Bigger context. Better prompting. Done.
Then I started looking at systems that actually work in production and... yeah. The model matters, but the system matters more than people want to admit, especially once tools and real permissions enter the picture.
Reasoning & Planning
This is the hardest problem in the agent space. And honestly, the more I dig into it, the less clean the answers get.
The core question is: how does a model go from "here's a goal" to "here are the steps I need to take, in order, adapting as I go"?
The first time I really understood why this is hard, I thought about driving with a GPS that only shows the next turn.
You can get surprisingly far that way. You take the next turn, then the next, then the next. But you can also end up somewhere step 2 looked correct and step 8 turned out to be a dead end, and now you've burned 30 minutes and gotten lost.
That's agents today: a lot of local correctness, not enough global foresight.
There are several approaches, each with real tradeoffs.
ReAct (Reasoning + Acting). The most common pattern. The model thinks out loud ("I need to find the user's email first, then draft the message"), takes an action (calls a tool), observes the result, and repeats. It's simple, it works for straightforward tasks, and most agent frameworks use some version of this under the hood.
The problem with ReAct is that it's greedy. It makes one decision at a time without considering the full plan. So it can get stuck in loops, or go down a path that seemed right at step 2 but turns out to be a dead end at step 8.
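Stripped down, the loop looks something like this. A minimal sketch, assuming hypothetical `call_model` and `run_tool` callables standing in for your LLM client and tool dispatcher:

```python
from typing import Callable

def react_loop(goal: str,
               call_model: Callable[[list[str]], dict],
               run_tool: Callable[[str, dict], str],
               max_steps: int = 10) -> str:
    """One decision at a time: think, act, observe, repeat."""
    transcript = [f"Goal: {goal}"]
    for _ in range(max_steps):
        # The model reasons over everything so far and returns either
        # a tool call ({"thought", "tool", "args"}) or a final answer.
        decision = call_model(transcript)
        transcript.append(f"Thought: {decision['thought']}")

        if "final_answer" in decision:
            return decision["final_answer"]

        # Act: execute the chosen tool, then feed the result back
        # so the next decision can adapt to what actually happened.
        observation = run_tool(decision["tool"], decision["args"])
        transcript.append(f"Observation: {observation}")

    return "Stopped: hit the step limit without finishing."
```

Notice there's no plan anywhere in that loop. Every decision is made with only the transcript so far, which is exactly why it's greedy.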
Chain-of-thought and tree-of-thought. Chain-of-thought is when the model reasons step by step before acting. Tree-of-thought takes it further by exploring multiple reasoning paths and selecting the best one. In theory, this should produce better plans. In practice, it's more expensive and slower, and models often converge on the same path anyway.
Planning-then-executing vs. interleaved. Some systems plan everything upfront, then execute the plan. Others plan and execute in alternating steps. The tradeoff is reliability vs. adaptability. Upfront planning breaks when the world is unpredictable. Interleaved planning is slower but handles surprises better.
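A rough sketch of the two shapes, with `plan`, `decide_next_step`, and `execute` as hypothetical stand-ins for model calls and tool execution:

```python
def plan_then_execute(goal, plan, execute):
    # Commit to a full plan once, then run it. Reliable when the
    # world is predictable; brittle when step 3 changes the facts.
    steps = plan(goal)
    return [execute(step) for step in steps]

def interleaved(goal, decide_next_step, execute, max_steps=20):
    # Re-plan after every observation. Slower and more model calls,
    # but the agent can react when a step surprises it.
    history = []
    for _ in range(max_steps):
        step = decide_next_step(goal, history)
        if step is None:          # the model decides it's done
            break
        history.append((step, execute(step)))
    return history
```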
The honest answer is that no one has solved planning well. It’s a bundle of failure modes:
• decomposition failure: the agent splits the task wrong and misses a prerequisite
• ordering failure: right steps, wrong sequence
• tool selection failure: wrong tool, or right tool with wrong arguments
• verification failure: it doesn’t check whether the step worked
• objective drift: it optimizes for looking done instead of being done
It's the biggest bottleneck for making agents reliable, and it's where a lot of the research effort is concentrated right now.
Memory
People talk about memory like it’s one thing. It isn’t.
The simplest distinction that actually matches reality is state vs memory.
State is what the agent must not forget during the current run. What it has already done. What it is trying next. Which files it edited. What the constraints are. What the user said was non-negotiable. What failed. What succeeded.
State has to be correct right now. If it’s wrong, you get repetition, skipped steps, and agents that swear they already emailed someone they never emailed.
Memory is what persists across runs. Things you want the agent to reuse later: preferences, stable facts, past incidents, patterns that tend to work.
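A minimal sketch of the split, with illustrative fields (nothing here is a specific framework's API):

```python
from dataclasses import dataclass, field

@dataclass
class RunState:
    """Must be exactly right during this run. Never lossy."""
    goal: str
    completed_steps: list[str] = field(default_factory=list)
    pending_steps: list[str] = field(default_factory=list)
    files_edited: set[str] = field(default_factory=set)
    hard_constraints: list[str] = field(default_factory=list)  # the user's non-negotiables
    failures: list[str] = field(default_factory=list)

class Memory:
    """Persists across runs. Can be lossy; retrieval just needs to be useful."""
    def __init__(self):
        self._facts: list[str] = []   # preferences, stable facts, past incidents

    def remember(self, fact: str) -> None:
        self._facts.append(fact)

    def recall(self, query: str) -> list[str]:
        # Naive keyword match; a real system would use embeddings (see RAG below).
        return [f for f in self._facts if query.lower() in f.lower()]
```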
Memory can be lossy. State cannot. Memory is what separates a chatbot from an agent. Without it, every interaction starts from zero. There are three types that matter.
Short-term memory (the context window). This is what the model can see right now. Everything in the current conversation, all the tool outputs, the system prompt. When people talk about 200K token context windows, this is what they mean. Bigger context = more the agent can hold in its head at once.
But context windows have limits. Even 200K tokens runs out fast when you're processing codebases or long conversations. And the model's attention degrades for information in the middle of long contexts (the lost in the middle problem).
Long-term memory (RAG, vector databases). Information stored outside the context window that the agent can retrieve when needed. This is usually implemented with vector databases. The agent embeds its memories as vectors, and when it needs to remember something, it searches for semantically similar memories.
The challenge with RAG is retrieval quality. The agent needs to know what to search for, and the retrieval system needs to return relevant results. When either fails, the agent either misses important context or gets confused by irrelevant information.
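A minimal sketch of that retrieval loop, assuming a hypothetical `embed` function that maps text to a fixed-length vector:

```python
import numpy as np

class VectorMemory:
    def __init__(self, embed):
        self.embed = embed            # any embedding model: text -> vector
        self.texts: list[str] = []
        self.vectors: list[np.ndarray] = []

    def add(self, text: str) -> None:
        self.texts.append(text)
        self.vectors.append(self.embed(text))

    def search(self, query: str, k: int = 3) -> list[str]:
        # Cosine similarity between the query and every stored memory,
        # then return the k closest ones.
        q = self.embed(query)
        scores = [
            float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
            for v in self.vectors
        ]
        top = np.argsort(scores)[::-1][:k]
        return [self.texts[i] for i in top]
```

The whole thing lives or dies on that `search` call: if the query is phrased badly or the embeddings don't capture what matters, the agent retrieves the wrong memories with full confidence.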
Episodic memory (learning from experience). This is the frontier. Can an agent learn from its previous runs? Remember that this approach worked last time, or that this client prefers formal language? Very few systems do this well today, but it's arguably the most important type of memory for production agents.
Tool Use
This is the part that makes agents actually useful. A model that can think but can't act is just a chatbot. Tools are what give agents hands.
The mechanism is function calling. You define a set of tools (each with a name, description, and parameter schema), the model decides when to use one and generates the arguments, and your code executes the actual function call and returns the result.
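In code, the shape is roughly this. The schema format loosely mirrors what most function-calling APIs expect, but field names vary by provider, and `get_weather` is just an illustrative tool:

```python
import json

# A tool is a schema the model can see plus a function your code runs.
TOOLS = {
    "get_weather": {
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
        "fn": lambda city: f"Sunny in {city}",   # the actual implementation
    },
}

def dispatch(tool_call_json: str) -> str:
    """The model generates the name and arguments; your code executes them."""
    call = json.loads(tool_call_json)  # e.g. {"name": "get_weather", "arguments": {"city": "Oslo"}}
    tool = TOOLS[call["name"]]
    return tool["fn"](**call["arguments"])
```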
It sounds simple. In practice, the hard parts are:
• Tool selection. When you have 50 tools available, the model needs to pick the right one. This is where good tool descriptions matter more than you'd think.
• Argument generation. The model needs to generate valid arguments. A wrong file path, a malformed API call, a missing required field. These are the most common failure modes.
• Error handling. What happens when a tool call fails? Good agents retry with different arguments, try alternative tools, or ask for clarification. Bad agents loop forever or give up.
This is also where standards start to matter. Anthropic's MCP is interesting because it's trying to make tool definitions portable across ecosystems. I don't know if MCP ultimately wins, but the direction is correct: the tool layer is becoming an ecosystem, and standards reduce friction.
Multi-Agent Systems
When one agent isn't enough, you use multiple agents that coordinate. This is one of the most hyped areas and also one of the most underwhelming in practice.
The idea is appealing. One agent researches, another writes, a third reviews. Just like a team of humans. And for simple pipelines (agent A produces output, agent B refines it), this works okay.
But real multi-agent coordination is genuinely hard. Agents need to communicate, share context, handle disagreements, and avoid stepping on each other's work. Most multi-agent frameworks today are essentially pipelines with extra steps, not true collaborative systems.
Where multi-agent does shine today is:
• parallel attempts: run three independent solutions, pick the best
• critique and verification: one agent proposes, another attacks
• specialization: one agent knows the codebase, another is a test runner, another is policy enforcement
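The first of those, parallel attempts, is simple enough to sketch. Assume `solve` is one independent agent run and `score` is whatever verifier you trust (tests, a judge model, a rubric):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_attempts(task, solve, score, n=3):
    """Run n independent attempts at the same task and keep the best one."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(lambda _: solve(task), range(n)))
    return max(candidates, key=score)
```

The catch is the `score` function: if you can't verify which attempt is best, running three of them just triples your cost.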
It's still early days, but I'm bullish on multi-agent as an architecture.
Guardrails & Safety
How do you stop an agent from going rogue? This is not a theoretical question. An agent with access to your email, your code repo, and your bank account could cause real damage if it makes a bad decision.
The main approaches:
Sandboxing. Run the agent in an isolated environment where it can't cause permanent damage. This is how most coding agents work. The agent writes code in a sandbox, and only approved changes get merged.
Human-in-the-loop. Require human approval for high-stakes actions. Send the email draft before the agent actually sends it. Review the code diff before it gets committed. This is the most reliable safety mechanism but also the most friction-intensive.
Output filtering. Check the agent's planned actions against a set of rules before executing them. No sending emails to external addresses, no deleting production data, no spending more than $X. This is the guardrails approach.
Kill switches. The ability to immediately stop an agent that's going off the rails. Sounds obvious, but a surprising number of agent deployments don't have one.
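The output-filtering piece is the easiest to make concrete. A minimal sketch, with made-up rule thresholds and an illustrative action shape:

```python
# Check a planned action against hard rules *before* executing it.
ALLOWED_EMAIL_DOMAIN = "yourcompany.com"
MAX_SPEND_USD = 100

def check_action(action: dict) -> tuple[bool, str]:
    """Return (allowed, reason). Block first, ask questions later."""
    if action["type"] == "send_email" and not action["to"].endswith("@" + ALLOWED_EMAIL_DOMAIN):
        return False, f"external recipient blocked: {action['to']}"
    if action["type"] == "delete" and action.get("environment") == "production":
        return False, "deletes in production require human approval"
    if action["type"] == "purchase" and action["amount_usd"] > MAX_SPEND_USD:
        return False, f"spend over ${MAX_SPEND_USD} requires human approval"
    return True, "ok"

# Usage: every planned action goes through the gate before dispatch.
allowed, reason = check_action({"type": "send_email", "to": "someone@gmail.com"})
if not allowed:
    print("Blocked:", reason)   # escalate to a human instead of executing
```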
The honest take: safety is still more art than science. Every deployment needs a thoughtful combination of these approaches based on the risk profile of what the agent can do. There's no universal solution yet.
That’s the under-the-hood picture as I see it right now.
If you’re building or investing, here’s the simplest practical question I keep coming back to: where does this system get its reliability from?
Better planning helps. Better models help. But the teams that win tend to win because they built the state management, verification, and guardrails that force the agent to behave.