The Reading List
Curated further reading, tiered by depth. Papers, posts, people, and the COT archive.
The Essential 10
If you read nothing else, read these. This is the foundation for how agents actually behave in the real world, plus the few older papers that still explain the core loop.
- LLM Powered Autonomous Agents (Lilian Weng, 2023)
The best single overview of planning, memory, and tool use. It also helps you see why "agent" is a system design problem, not a prompt.
- Building Effective Agents (Anthropic, 2024)
Practical patterns that survive contact with production. Also unusually clear about when agents are the wrong tool.
- Model Context Protocol (Anthropic, 2024)
The moment tools start looking like an ecosystem. If MCP keeps spreading, it changes how you think about distribution, permissions, and integration lock-in.
- How We Built Our Multi-Agent Research System (Anthropic, 2025)
A real example of multi-agent done as engineering, not as theater. Especially useful for evaluation and coordination patterns.
- Function Calling Guide (OpenAI, updated 2025)
The details matter here. Schemas, strictness, and tool definitions end up deciding whether your agent is robust or flaky.
- Structured Outputs (OpenAI, 2024)
Boring name, huge practical impact. This is how you reduce "agent made up the JSON" failures in real systems.
- τ-bench: A Benchmark for Tool-Agent-User Interaction (Yao et al., 2024)
One of the best benchmarks for what hurts in production: multi-turn tool use, rule following, inconsistency across retries.
- Introducing SWE-bench Verified (OpenAI, 2024)
Not because it's perfect, but because it's a cleaner measurement discipline than most agent evals. Worth reading to calibrate what scores do and do not mean.
- Mitigating Prompt Injections in Browser Use (Anthropic, 2025)
Every serious agent ends up touching untrusted text. This is the best "you will get burned this way" overview from a lab.
By Topic
Curated reading lists organized by major area. Each list is ordered from accessible to deep.
Agent Architecture and Reasoning
- Building Effective Agents (Anthropic): The practical starting point.
- LLM Powered Autonomous Agents (Lilian Weng): Comprehensive technical overview.
- ReAct Paper (Yao et al.): The foundational reasoning + acting loop.
- τ-bench (Yao et al.): Rule following under tool constraints.
- A Practical Guide to Building Agents (OpenAI): Product and engineering playbook for shipping safely.
- AI Agents Whitepaper (Google): Framework for understanding agent architectures and deployment patterns.
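For readers who want the reasoning + acting loop in concrete form before opening the ReAct paper, here is a minimal sketch. It is an illustration of the general thought, action, observation cycle, not the paper's implementation. The `scripted_model` stub, the `TOOLS` registry, and the transcript format are all assumptions made up for this example; a real agent would replace the stub with a model call.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry: plain Python functions keyed by name.
TOOLS: Dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split(","))),
}

OBS_PREFIX = "Observation: "


def scripted_model(transcript: List[str]) -> Tuple[str, str, str]:
    """Stand-in for an LLM call. Returns (thought, action, argument).

    This stub follows a fixed script so the loop can run offline:
    call the "add" tool once, then finish with whatever it observed.
    """
    observations = [line for line in transcript if line.startswith(OBS_PREFIX)]
    if not observations:
        return ("I need to add the numbers.", "add", "2,3")
    answer = observations[-1][len(OBS_PREFIX):]
    return ("I have the result.", "finish", answer)


def react_loop(question: str, model=scripted_model, max_steps: int = 5) -> str:
    """Alternate model steps and tool calls until the model finishes."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, arg = model(transcript)
        transcript.append(f"Thought: {thought}")
        if action == "finish":
            return arg
        tool = TOOLS.get(action)
        observation = tool(arg) if tool else f"unknown tool: {action}"
        transcript.append(f"Action: {action}[{arg}]")
        transcript.append(f"{OBS_PREFIX}{observation}")
    # A hard step budget keeps a confused model from looping forever.
    raise RuntimeError("step budget exhausted without a final answer")


print(react_loop("What is 2 + 3?"))
```

The step budget and the "unknown tool" observation are the two pieces of this sketch that tend to matter most once a real model is in the loop.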
Tool Use and the Tools Ecosystem
- Function Calling Guide (OpenAI): The reference implementation for tool definitions and schemas.
- Structured Outputs (OpenAI): Reducing JSON hallucination in agent tool calls.
- Toolformer (Schick et al., 2023): The research root of self-taught tool use.
- Model Context Protocol (MCP) (Anthropic): The emerging standard for connecting agents to tools and data sources.
- Claude Tool Use Guide (Anthropic): Practical guide to implementing tool use.
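To make the point about schemas and strictness concrete, here is a small sketch of checking a model's tool-call arguments against a tool definition before executing anything. The `GET_WEATHER` definition, its field names, and the `validate_arguments` helper are hypothetical and written for this example; they follow a generic JSON-Schema-like shape and are not any vendor's exact API. The checker only handles flat objects with string and integer fields.

```python
import json
from typing import Any, Dict, List

# Hypothetical tool definition in a JSON-Schema-like shape.
GET_WEATHER: Dict[str, Any] = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "days": {"type": "integer"},
        },
        "required": ["city"],
        "additionalProperties": False,
    },
}

_PY_TYPES = {"string": str, "integer": int}


def validate_arguments(tool: Dict[str, Any], raw_json: str) -> List[str]:
    """Return a list of problems; an empty list means the call is acceptable."""
    try:
        args = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]

    schema = tool["parameters"]
    props = schema["properties"]
    problems: List[str] = []

    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required argument: {name}")

    for name, value in args.items():
        if name not in props:
            if not schema.get("additionalProperties", True):
                problems.append(f"unexpected argument: {name}")
            continue
        expected = _PY_TYPES[props[name]["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"{name} should be {props[name]['type']}")
    return problems


print(validate_arguments(GET_WEATHER, '{"city": "Oslo", "days": 3}'))
print(validate_arguments(GET_WEATHER, '{"town": "Oslo"}'))
```

Returning a list of problems rather than raising on the first one makes it easy to feed every error back to the model in a single retry message.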
Multi-Agent Systems
Multi-agent coordination is still early. These are useful as patterns and prototypes, not as evidence that the problem is solved.
- Multi-Agent Research System (Anthropic): The most practical reference for real multi-agent engineering.
- AutoGen: Enabling Next-Gen LLM Applications (Wu et al.): Multi-agent conversation framework from Microsoft.
- MetaGPT: Multi-Agent Collaborative Framework (Hong et al.): Agents taking on software engineering roles.
- LangGraph Documentation (LangChain): State machine-based agent orchestration.
- CrewAI Documentation (CrewAI): Role-based multi-agent prototyping.
Evaluation and Reliability
Repeated-run reliability is the thing to optimize for, not single-shot pass rates. These resources cover both what to measure and why most benchmarks mislead.
- SWE-bench Verified (OpenAI): A cleaner measurement discipline than most agent evals.
- τ-bench (Yao et al.): Pass^k and consistency across retries. The production-relevant angle.
- AgentBench (Liu et al.): Multi-dimensional evaluation of LLM agents across diverse environments.
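The gap between single-shot pass rates and repeated-run reliability is easy to see with a little arithmetic. The sketch below compares pass@k (at least one of k attempts succeeds) with pass^k (all k attempts succeed), using the standard combinatorial estimators for a task that was run n times with c successes. The function names and the example numbers are assumptions chosen for illustration, not results from any benchmark.

```python
from math import comb


def _check(n: int, c: int, k: int) -> None:
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k trials succeeds.

    n is the number of recorded trials, c the number that succeeded.
    """
    _check(n, c, k)
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that all k trials succeed (pass^k)."""
    _check(n, c, k)
    return comb(c, k) / comb(n, k)


# Illustrative numbers only: a task solved in 6 of 8 recorded runs.
n, c = 8, 6
for k in (1, 2, 4):
    print(k, round(pass_at_k(n, c, k), 3), round(pass_hat_k(n, c, k), 3))
```

With these made-up numbers the single-run success rate is 0.75, pass@4 is 1.0, and pass^4 is about 0.214: the same agent looks perfect or unreliable depending on which question you ask.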
Security and Safety
- Practices for Governing Agentic AI Systems (OpenAI, Dec 2023): The clearest framework for agent safety: delegation boundaries, monitoring, and principal-agent problems.
- Mitigating Prompt Injections in Browser Use (Anthropic, 2025): Every agent that touches untrusted text needs this.
- Prompt Injection Series (Simon Willison): The builder's intuition for prompt injection. Practical, hands-on, and regularly updated.
Robotics and Embodied AI
A separate lane from software agents. Included for completeness, but the engineering challenges are fundamentally different.
- Voyager (Wang et al., 2023): An agent that plays Minecraft by writing and reusing its own code. Skill libraries and curriculum learning.
- Generative Agents: Interactive Simulacra of Human Behavior (Park et al., 2023): The Stanford "AI town" paper. Memory and reflection architecture for long-running agents.
- RT-2: Vision-Language-Action Models (Google DeepMind): Transferring web knowledge to robot control.
- Open X-Embodiment (Open X-Embodiment Collaboration): The largest robot learning dataset and why cross-embodiment transfer matters.
- pi-zero: A Vision-Language-Action Model (Physical Intelligence): Foundation models for robot control.
The Economics of AI Agents
- Generative AI's Act Two (Sequoia Capital): Where value accrues in AI: from models to applications.
- Who Owns the Generative AI Platform? (a16z): The infrastructure vs. application layer economics.
- Rise of AI Agent Infrastructure (Madrona): The picks-and-shovels layer and where platform winners emerge.
- AI Voice Agents: 2025 Update (a16z): Voice-first agent interfaces and where the market is heading.
- Enterprise Automation Architecture (Menlo Ventures): How agents fit into enterprise automation stacks.
- AI Agents Investment Thesis (Eximius Ventures): A VC framework for evaluating agent startups.
- Enterprise Spending Transform (ARK Invest): How enterprise AI spending is shifting toward agentic systems.
- How People Create and Destroy Value with Gen AI (BCG / Harvard): Hard data on when AI agents help vs. hurt performance.
Deep Essays
Valuable for context and vision, but not the quickest path to understanding agents as systems. Read these after the essentials.
- Situational Awareness: The Decade Ahead (Leopold Aschenbrenner, 2024): 165 pages on where AI capability is heading. The scaling arguments inform everything about what agents will be able to do.
- Machines of Loving Grace (Dario Amodei): The most optimistic serious case for what AI could do for humanity. Covers health, poverty, governance, and work.
- The Model is the Product (Vintage Data): Why the model layer is becoming the product layer, and what that means for agent companies trying to build moats.
- The Goldilocks Zone (Not Boring): Finding the sweet spot between too early and too late in AI. Timing, ambition, and where agents fit.
Follow These People
Researchers, builders, and thinkers worth following. Each one has shaped how I think about some aspect of the agent space.
Researchers
- Lilian Weng (@lilianweng): Her blog posts are the best technical overviews in the field. Every new post is mandatory reading.
- Shunyu Yao (@ShunyuYao12): First author of the ReAct and Tree of Thoughts papers. Foundational agent research.
- Harrison Chase (@hwchase17): Whether or not you use LangChain, he's consistently ahead of the curve on agent patterns and architectures.
- Jim Fan (@DrJimFan): Working on foundation agents and embodied AI. His threads on general-purpose agent architectures are excellent.
Builders
- Swyx (@swyx): Named the AI Engineer role and consistently surfaces the best thinking on applied AI.
- Simon Willison (@simonw): The most practical, hands-on AI blogger. His explorations of tool use, prompt injection, and agent capabilities are gold.
- Andrej Karpathy (@karpathy): His explanations of AI concepts are the clearest in the industry. Required viewing for anyone entering the space.
- Matt Shumer (@mattshumer_): Shares practical lessons from production agent deployments.
Thinkers
- Ethan Mollick (@emollick): The best bridge between AI capabilities and practical business implications.
- Ben Thompson (@benthompson): The sharpest analysis of how AI changes business strategy. His frameworks for aggregation theory apply directly to agent platforms.
- Leopold Aschenbrenner (@leopoldasch): "Situational Awareness" is the most important piece on where AI capability is heading.
The COT Archive
Index of all Chain of Thought newsletter issues and deep dives. This is the bridge between the handbook and the weekly newsletter. Subscribe at agents.chainofthought.xyz to get new issues delivered weekly.
Deep Dives
Long-form investigations into specific companies, technologies, and trends. These are the pieces that inform most of the handbook's analysis.
- The Closed Loop Advantage in AI (Feb 2026): Why execution lives inside closed systems, not inside the model.
- The Dexterity Stack: Why Robots Lose to Towels (Jan 2026): Intelligence lives at the point of contact.
- The 2026 AI Playbook (Dec 2025): 40 critical insights from 1,436 AI podcasts.
- K-Scale: The Team That Tried to Beat Tesla (Dec 2025): The fragile economics of open-source robots.
- TinyFish: AI to Read the Unreadable Web (Oct 2025): Turning the internet's messiest pages into structured data.
- Decart: Generating Worlds (Sep 2025): How to make a God Machine 10x cheaper.
- The Watt Moment: Manus AI (Sep 2025): $90M ARR in 5 months. What makes it tick?
- Sierra: Enterprise AI Agents (Aug 2025): The unexpected delight of being understood by a machine.
The Secret Agent (Weekly)
Weekly newsletter covering the sharpest stories from the world of AI agents. Five curated stories per issue, designed to keep you ahead of the curve. Currently at issue #33.
Get the Weekly Brief
Weekly AI agents intel for 13,000+ readers. Subscribe and get the 2026 AI Playbook (PDF) free.