Getting Started
Practical on-ramps depending on who you are.
This chapter is simple on purpose. If you read the first four chapters and you still feel like agents are abstract, the fix is to touch one. Build one. Run one. Break one.
The fastest way to understand agents is to watch the loop fail, then make it fail less.
If You're a Developer
Your first month with agents
You probably want to build immediately. Good. Start smaller than you think you should.
Your first project should be a single-tool agent that you can understand end to end in an afternoon. The goal is learning the loop.
Example Project: Data cleanup agent for CSVs.
- Input: a messy CSV.
- Tool: a local code runner.
- Output: a cleaned CSV plus a diff report of what changed.
- Success: predictable transformations, no silent corruption.
- Two tests: missing columns (should stop), inconsistent date formats (should normalize and report).
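To make that concrete, here's a minimal sketch of the cleanup tool such an agent would call. The required columns and date formats are hypothetical, picked for illustration only; the point is the shape: validate hard, normalize, and report every change.

```python
import csv
import io
from datetime import datetime

REQUIRED_COLUMNS = {"id", "date", "amount"}  # hypothetical schema, for illustration
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %b %Y"]  # formats we attempt, in order

def normalize_date(value: str) -> str:
    """Try each known format; return ISO 8601 or raise ValueError."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")

def clean_csv(raw: str) -> tuple[str, list[str]]:
    """Clean a CSV string; return (cleaned_csv, diff_report).

    Stops hard on missing columns; normalizes dates and reports each change.
    """
    reader = csv.DictReader(io.StringIO(raw))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        # Test 1: missing columns should stop, not silently continue.
        raise ValueError(f"missing required columns: {sorted(missing)}")

    report = []
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for i, row in enumerate(reader, start=2):  # row 1 is the header
        original = row["date"]
        normalized = normalize_date(original)
        if normalized != original:
            # Test 2: inconsistent dates are normalized and reported.
            report.append(f"row {i}: date {original!r} -> {normalized!r}")
        row["date"] = normalized
        writer.writerow(row)
    return out.getvalue(), report
```

Small enough to understand end to end, and both tests fall out naturally: feed it a CSV without a `date` column and it stops; feed it mixed date formats and the diff report tells you exactly what it changed.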
Pick one framework and learn it well. I'd recommend starting with one of these three:
- Claude's tool use API directly. If you just want to understand how agents work at the lowest level, start here. Define tools, send messages, handle tool calls. No framework overhead. The Anthropic docs are solid.
- LangGraph. If you need state machines, branching logic, and human-in-the-loop control. The learning curve is steeper, but it handles complex workflows better than anything else I've used.
- CrewAI. If you want to prototype multi-agent systems fast. Great for getting something working in a day. Starts to creak under production load, but for learning and prototyping, it's the fastest path.
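To give a feel for the first option, here's a rough sketch of the raw tool-use loop. The message and tool-schema shapes follow Anthropic's tool-use docs, but the model name and the `get_weather` tool are placeholders; check the current documentation before relying on details.

```python
# Create the client separately with:  client = anthropic.Anthropic()
# (reads ANTHROPIC_API_KEY from the environment)

TOOLS = [{
    "name": "get_weather",  # hypothetical example tool
    "description": "Get current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

def run_tool(name: str, args: dict) -> str:
    """Dispatch a model-requested tool call to a local implementation."""
    if name == "get_weather":
        return f"Sunny in {args['city']}"  # stub implementation
    raise ValueError(f"unknown tool: {name}")

def run_agent(client, user_message: str,
              model: str = "claude-sonnet-4-20250514"):  # placeholder model name
    """Loop: send messages, execute any tool calls, stop on a plain-text reply."""
    messages = [{"role": "user", "content": user_message}]
    while True:
        response = client.messages.create(
            model=model, max_tokens=1024, tools=TOOLS, messages=messages)
        if response.stop_reason != "tool_use":
            return response.content[0].text  # model answered; loop is done
        # Echo the assistant turn, then return one tool_result per tool_use.
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": [
            {"type": "tool_result", "tool_use_id": b.id,
             "content": run_tool(b.name, b.input)}
            for b in response.content if b.type == "tool_use"]})
```

That while loop is the whole agent. Everything a framework adds (state, branching, supervision) is scaffolding around it.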
Common mistakes I see developers make:
- Over-engineering the prompt. Your first prompt should be 5 lines. Add complexity only when the agent fails at something specific.
- Not logging everything. You will need to debug agent runs. Log every tool call, every model response, every decision point. Future you will thank present you.
- Ignoring error handling. Tool calls fail. APIs time out. Models hallucinate. Build retry logic and fallbacks from day one.
- Testing on happy paths only. The demo works. Production won't. Test with messy inputs, partial data, and adversarial cases early.
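The logging and retry points above fit in a dozen lines. A sketch (delays and names are illustrative): every attempt gets logged, failures back off exponentially, and the final failure is re-raised so a fallback path can catch it.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")

def call_with_retry(fn, *args, attempts=3, base_delay=1.0, **kwargs):
    """Run a flaky tool call with exponential backoff, logging every attempt."""
    for attempt in range(1, attempts + 1):
        try:
            result = fn(*args, **kwargs)
            log.info("call %s attempt %d: ok", fn.__name__, attempt)
            return result
        except Exception as exc:
            log.warning("call %s attempt %d failed: %s", fn.__name__, attempt, exc)
            if attempt == attempts:
                raise  # out of retries; surface the error to a fallback path
            time.sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
```

Wrap every external call (tool, API, model) in something like this from day one, and the logs double as your debugging trail.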
If You're a Founder
The agent space is wide open. But the easy picks have been picked, and the hard problems are genuinely hard.
Where I see real opportunities right now:
- Vertical agents in regulated industries. Healthcare, legal, finance. The compliance requirements create a natural moat. A coding agent can be swapped easily. A medical coding agent that understands ICD-10 billing rules and has been validated against real claims? That's sticky.
- Agent infrastructure. The picks-and-shovels play. Monitoring, evaluation, deployment, orchestration. Every agent builder needs these, and most are rolling their own right now. There's room for dominant platform plays here.
- Domain-specific data flywheels. (Back to the closed loop from chapter 4.) If you can get to production fast and start generating proprietary training data, you build a moat that's hard to replicate. The model layer is commoditizing. The data layer is not.
What's been tried and remains genuinely hard:
- General-purpose autonomous agents. They are exciting and hard to ship. The environment is too broad, the failure surface is too large, and the product becomes a support nightmare.
- AI-first companies without a distribution advantage. If your entire value prop is "we use AI," you're competing with every other startup saying the same thing. The winners have distribution first, AI second.
A useful defensibility test:
- Do you own unique data from real workflows?
- Are you deeply integrated into the workflow, so switching is painful?
- Have you closed the reliability gap for a narrow use case?
If you have two of these, you're in a better position than most.
If You're an Investor
The agent market is confusing from the outside. Here's how I'd navigate it.
The three-layer market map
I like to map the space into three layers.
- Foundation layer. The model providers (OpenAI, Anthropic, Google, etc.). These are mostly spoken for. You're not investing here unless you're writing very large checks.
- Platform layer. Frameworks, orchestration tools, infrastructure. This is where there's real competition and where platform winners will emerge. Look for strong developer adoption and usage growth.
- Application layer. Vertical agents solving specific problems. This is the biggest opportunity set. Hundreds of verticals, each with different requirements and competitive dynamics.
Due diligence questions I'd ask any agent company:
- What's your success rate over the last 30 days on real tasks, and how do you measure it?
- What are your top 5 failure categories by frequency?
- How does your agent get better over time? Is the loop closed?
- What happens when the underlying model gets updated? How dependent are you on a specific provider?
- What are the unit economics per task, including error handling and human supervision?
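That last question can be made concrete with a back-of-the-envelope model. All the numbers below are illustrative assumptions, not benchmarks; the point is that retries and human intervention, not the raw API bill, usually dominate the cost per successful task.

```python
def cost_per_successful_task(model_cost, success_rate, retry_rate,
                             intervention_rate, human_cost_per_intervention):
    """Fully loaded cost per *successful* task (illustrative model).

    model_cost: API spend per attempt
    retry_rate: mean extra attempts per task
    intervention_rate: share of tasks a human must touch
    """
    compute = model_cost * (1 + retry_rate)
    human = intervention_rate * human_cost_per_intervention
    return (compute + human) / success_rate

# Illustrative numbers only -- plug in the company's real figures.
example = cost_per_successful_task(
    model_cost=0.05, success_rate=0.9, retry_rate=0.3,
    intervention_rate=0.1, human_cost_per_intervention=2.0)
```

With these made-up inputs, the human-intervention term is roughly three times the compute term, which is why the intervention rate is the number to push on in diligence.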
What separates winners from the rest. In my experience, it's not the team with the best model integration. It's the team that understands the domain deeply enough to handle the last 10% of edge cases that trip everyone else up.
If You're a Business Leader
You're probably being pitched agent solutions weekly at this point. Here's how to separate signal from noise.
Where agents can help your company today
Four places agents can actually help now, not in two years:
- Customer support triage. Routing tickets, drafting initial responses, handling common questions. This is the most proven agent use case. If you have a support team handling more than 500 tickets/month, an agent can probably help.
- Internal knowledge base Q&A. An agent that can answer employee questions about policies, processes, and documentation. Works well when you have good source material.
- Data processing and reporting. Extracting information from documents, generating reports from structured data, monitoring dashboards. Agents are surprisingly good at this.
- Code review and testing. If you have a development team, AI-assisted code review catches real bugs and speeds up the review cycle.
How to evaluate vendors:
- Ask for a pilot with your actual data, not their demo data
- Measure success rate, intervention rate, and time saved per task
- Ask what happens when it fails, and force them to show you
- Understand the pricing model. Per-task, per-seat, or per-outcome makes a huge difference
- Talk to their existing customers, not their references
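The three pilot metrics above are cheap to compute if the vendor logs per-task outcomes. A sketch, with hypothetical field names you'd map onto whatever the vendor actually records:

```python
def pilot_metrics(tasks):
    """Summarize a pilot from per-task records.

    Each record: {"succeeded": bool, "human_touched": bool,
                  "agent_minutes": float, "baseline_minutes": float}.
    Field names are hypothetical -- map them onto the vendor's real logs.
    """
    n = len(tasks)
    return {
        "success_rate": sum(t["succeeded"] for t in tasks) / n,
        "intervention_rate": sum(t["human_touched"] for t in tasks) / n,
        "avg_minutes_saved": sum(
            t["baseline_minutes"] - t["agent_minutes"] for t in tasks) / n,
    }
```

If a vendor can't hand you the per-task records this function needs, that itself is a signal about how closely they watch their own product.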
What to pilot first. Pick a high-volume, medium-stakes task where the cost of errors is low. Customer support triage is the classic starting point for a reason.
If You're Just Curious
Weekend challenge
Five things you can do this weekend (about 3.5 hours total) to actually experience agents firsthand:
- Use Claude Code or Cursor for a coding task. Even if you're not a developer, set up a simple project and watch an agent write code, fix bugs, and iterate. It's the most tangible way to understand what agents can and can't do.
- Try a deep research tool. Perplexity Pro or ChatGPT's deep research. Give it a genuine research question you've been meaning to answer. Evaluate the quality of the output vs. what you'd find in an hour of Googling.
- Build a simple agent with zero code. Tools like Zapier AI or Make.com let you create simple agent workflows without writing code. Automate something in your actual life.
- Read one technical paper. The ReAct paper is a great starting point. It's readable, foundational, and will give you the vocabulary to understand everything else in the space.
- Follow the builders. The best way to stay current is to follow the people actually building agents. Check our reading list in chapter 6 for who to follow and why.