The Leap From Answering to Doing
Until now, most of the AI we have discussed in this book works in a simple pattern: you ask, it answers. You provide input, it generates output. The interaction is one-and-done, like sending a text message and getting a reply.
AI agents are fundamentally different. An agent does not just answer your question — it goes out and does things on your behalf. It can browse the web, write and run code, send emails, search databases, make calculations, and chain all of these actions together to accomplish complex goals. If a chatbot is like a reference librarian who answers your questions, an agent is like a personal assistant who takes your request and handles the entire task from start to finish.
This chapter explores what makes agents different, how they work, what they can do today, and where their limitations lie.
What Makes an Agent Different From a Chatbot
A chatbot is reactive. It waits for your input, generates a response, and stops. If you ask it to plan a trip, it gives you suggestions. If you ask it to book a hotel, it cannot actually book anything — it can only tell you about hotels.
An agent is proactive. Given the same trip-planning request, an agent could search for flights, compare prices, check your calendar for conflicts, find hotels near your meeting location, and present you with a complete itinerary — or even book it for you, if you grant it the authority to do so.
The key differences are:
Autonomy. An agent can take multiple steps without waiting for your input at each stage. You define the goal, and the agent figures out the steps needed to achieve it.
Tool use. Agents can use external tools — search engines, calculators, databases, APIs, code interpreters, and more. This gives them capabilities far beyond what a language model can do with text generation alone.
Persistence. While a chatbot interaction is typically stateless (each conversation starts fresh), agents can maintain state across interactions, remembering what they have done, what worked, and what still needs to be done.
Decision-making. Agents make choices about what to do next based on the results of their previous actions. If their first approach does not work, they can try a different strategy.
The Agent Loop: Perceive, Think, Act
At the heart of every AI agent is a loop that repeats until the task is complete. This loop has three phases, and understanding it helps demystify how agents work.
Perceive
The agent takes in information about its current situation. This might include:
- The original task you gave it
- The results of its previous actions
- Error messages if something went wrong
- New information it has discovered along the way
Think of this as the agent "looking around" to understand where it is and what it knows.
Think
The agent uses a language model to reason about what to do next. Based on its current perception, it considers its options and decides on the next action. This is where the language model's reasoning capabilities come into play — the agent is essentially asking itself, "Given everything I know so far, what should I do next to make progress toward the goal?"
This thinking step might involve:
- Breaking the overall goal into sub-tasks
- Evaluating which sub-task to tackle next
- Considering whether its previous action succeeded or failed
- Deciding to try a different approach if the current one is not working
Act
The agent executes its chosen action. This is where tool use comes in. The action might be:
- Searching the web for information
- Running a piece of code
- Querying a database
- Calling an external API
- Writing content to a file
- Sending a message
After the action completes, the loop restarts. The agent perceives the result of its action, thinks about what to do next, and acts again. This continues until the task is complete or the agent determines it cannot make further progress.
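The perceive-think-act cycle can be sketched in a few lines of Python. This is an illustrative skeleton, not a real framework: `llm_decide` stands in for a language model call (here replaced by a tiny hard-coded policy so the sketch runs), and the tool is a toy function.

```python
def llm_decide(history, tools):
    # Stand-in for the "think" step: a real agent would send the full
    # history to a language model and parse its chosen action. A tiny
    # hard-coded policy keeps this sketch runnable.
    if not any(line.startswith("search ->") for line in history):
        return {"tool": "search", "input": "Italian restaurants near hotel"}
    return {"tool": "finish", "answer": "Top results compiled"}

def run_tool(action, tools):
    # "Act": look up the named tool and execute it with the model's input.
    return tools[action["tool"]](action["input"])

def run_agent(goal, tools, max_steps=10):
    history = [f"Goal: {goal}"]                  # perceive: initial state
    for _ in range(max_steps):
        action = llm_decide(history, tools)      # think: choose next action
        if action["tool"] == "finish":           # the model decided it is done
            return action["answer"]
        result = run_tool(action, tools)         # act: run the chosen tool
        history.append(f"{action['tool']} -> {result}")  # perceive the result
    return "Stopped: step limit reached"

tools = {"search": lambda query: f"8 results for {query!r}"}
print(run_agent("Find Italian restaurants", tools))
```

Note the safety valve: real agent systems cap the number of iterations (here `max_steps`) so a confused agent cannot loop forever.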
A Concrete Example
Let us walk through a realistic agent interaction to make this concrete.
Your request: "Find the three most popular Italian restaurants within walking distance of my hotel in downtown Chicago, check if any of them have availability for two people this Saturday at 7pm, and create a comparison with ratings, price range, and distance."
Agent loop, iteration 1:
- Perceive: User wants restaurant recommendations near a Chicago hotel
- Think: I need to know the hotel location first, then search for Italian restaurants nearby
- Act: Search for the user's hotel location (if stored in memory) or ask for clarification
Agent loop, iteration 2:
- Perceive: Hotel is at 123 Michigan Avenue. Now I need Italian restaurants nearby
- Think: I should search for highly-rated Italian restaurants within walking distance
- Act: Use a restaurant search tool to find Italian restaurants near that address, sorted by rating
Agent loop, iteration 3:
- Perceive: Found eight Italian restaurants within a half mile. Need to narrow to top three
- Think: I will select the three highest-rated ones and check availability
- Act: Check reservation availability for Restaurant A on Saturday at 7pm for two people
Agent loop, iterations 4-5:
- Similar reservation checks for Restaurants B and C
Agent loop, iteration 6:
- Perceive: Have all the information now — ratings, prices, distances, availability
- Think: I should compile this into a clear comparison
- Act: Generate a formatted comparison table and present it to the user
The user gave one instruction. The agent executed six or more steps autonomously, using multiple tools, making decisions at each step, and adapting to the information it discovered along the way.
Tool Use and Function Calling
Tools are what give agents their superpowers. Without tools, a language model can only generate text. With tools, it can interact with the real world.
How Tool Use Works
In a tool-use system, the AI model is given a list of available tools along with descriptions of what each tool does, what inputs it requires, and what outputs it returns. When the model decides it needs to use a tool, it generates a structured request — essentially filling in the tool's input fields — and the system executes the tool and returns the result.
For example, a model might have access to a weather tool described as:
- Tool name: get_weather
- Description: Returns current weather conditions for a given city
- Input: city name (text)
- Output: temperature, conditions, humidity, wind speed
When the user asks "Should I bring an umbrella to my meeting in Seattle tomorrow?" the model recognizes it needs weather data and generates a call to get_weather with "Seattle" as the input. It receives the result (say, 52 degrees, rainy, 85% humidity) and uses that information to craft its response: "Yes, you should definitely bring an umbrella. Seattle is expected to have rain tomorrow with 85% humidity."
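Spelled out as a machine-readable definition, the weather tool above might look like the following. The exact schema format varies across providers; this sketch follows the common JSON-schema style:

```json
{
  "name": "get_weather",
  "description": "Returns current weather conditions for a given city",
  "parameters": {
    "type": "object",
    "properties": {
      "city": {"type": "string", "description": "City name, e.g. Seattle"}
    },
    "required": ["city"]
  }
}
```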
Function Calling
Function calling is the specific technical mechanism that makes tool use possible. The AI model outputs a structured function call, typically in JSON format, that the system can parse and execute. Rather than merely writing about using a tool in plain text, the model produces a machine-readable instruction that the system actually carries out.
This distinction matters because it makes tool use reliable and precise: the system executes exactly the call the model specified, with exactly the arguments it provided, instead of inferring intent from free-form prose.
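A sketch of the parse-and-dispatch step, assuming the model emits JSON of the form shown below (the exact wire format differs between providers, and the weather tool here is a stub):

```python
import json

def get_weather(city):
    # Stub tool: a real implementation would call a weather API.
    return {"temp_f": 52, "conditions": "rain", "humidity": 85}

# Registry mapping tool names to the functions that implement them.
TOOLS = {"get_weather": get_weather}

# What the model emits: machine-readable JSON, not free-form prose.
model_output = '{"name": "get_weather", "arguments": {"city": "Seattle"}}'

call = json.loads(model_output)                     # parse the structured call
result = TOOLS[call["name"]](**call["arguments"])   # dispatch to the tool
print(result)  # this result is fed back to the model on its next turn
```

Because the call is structured data, the system can also validate it (is the tool name known? are the arguments well-formed?) before executing anything.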
Common Agent Tools
Modern AI agents typically have access to some combination of:
Web search. Searching the internet for current information, much like you would use a search engine.
Code execution. Writing and running code in languages like Python or JavaScript. This is enormously powerful because it means the agent can do anything a computer program can do — data analysis, file manipulation, calculations, web scraping, and more.
File operations. Reading, writing, and modifying files on a computer system. A coding agent might read your source code, understand it, make changes, and write the updated files.
API calls. Interacting with external services — sending emails through an email API, creating calendar events, posting to social media, querying databases, or interacting with any other service that provides an API.
Browser control. Some agents can operate a web browser, clicking buttons, filling out forms, and navigating websites the way a human would. This allows them to interact with websites that do not have APIs.
Memory and State
One of the most important differences between a simple chatbot and a sophisticated agent is memory. Agents need to remember things across multiple steps and, in some cases, across multiple conversations.
Short-Term Memory: The Context Window
The most basic form of agent memory is the context window — the running record of the current conversation, including all previous steps, tool results, and decisions. This allows the agent to build on what it has already done without repeating itself.
However, context windows have limits. Even the largest models can only hold a certain amount of information in their context. For long-running tasks that generate lots of intermediate results, the agent may need strategies for summarizing or compressing its history to stay within these limits.
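One simple compression strategy, sketched below under toy assumptions: keep the original goal and the most recent steps verbatim, and collapse everything in between into a summary line. A real agent would typically ask the model itself to write that summary; here we just count the dropped steps.

```python
def compress_history(history, max_items=6):
    # Keep the goal (first entry) and the most recent steps; collapse
    # the middle of the history into a single placeholder summary.
    if len(history) <= max_items:
        return history
    dropped = len(history) - max_items + 1
    summary = f"[{dropped} earlier steps summarized]"
    return [history[0], summary] + history[-(max_items - 2):]

history = ["Goal: plan trip"] + [f"step {i}" for i in range(9)]
print(compress_history(history))
```

The trade-off is real: whatever detail the summary discards is gone, so agents must choose carefully what to preserve.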
Working Memory: Scratchpads and Notes
Many agent systems give the agent a "scratchpad" — a place to write notes to itself. As the agent works through a complex task, it can jot down intermediate findings, track which sub-tasks are complete, and record important information it will need later. This is analogous to the notes you might take while working on a complex project.
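A scratchpad can be as simple as a structure the agent reads and writes between loop iterations. This minimal sketch (all names illustrative) tracks notes and completed sub-tasks:

```python
class Scratchpad:
    """Working memory the agent updates as it goes (illustrative sketch)."""
    def __init__(self):
        self.notes = []
        self.done = set()

    def note(self, text):
        self.notes.append(text)      # jot down an intermediate finding

    def complete(self, subtask):
        self.done.add(subtask)       # mark a sub-task as finished

    def remaining(self, subtasks):
        # What is left to do, in the originally planned order.
        return [t for t in subtasks if t not in self.done]

pad = Scratchpad()
pad.note("Hotel is at 123 Michigan Avenue")
pad.complete("find_hotel")
print(pad.remaining(["find_hotel", "search_restaurants", "check_availability"]))
```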
Long-Term Memory: Persistent Storage
The most advanced agents have access to persistent memory that survives beyond a single conversation. This might include:
- User preferences. The agent remembers that you prefer window seats on flights, that you are vegetarian, or that you like detailed technical explanations.
- Past interactions. The agent can recall that you asked about a specific topic last week and build on that previous conversation.
- Learned procedures. If the agent figured out a complex process for a task, it can store that process and reuse it next time.
Long-term memory is typically implemented using databases — often vector databases, similar to those used in RAG systems (covered in the previous chapter). The agent stores memories as embeddings and retrieves relevant ones when they might be useful for the current task.
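The store-and-retrieve pattern can be sketched as follows. A real system would compare dense embedding vectors from an embedding model; word overlap stands in for that similarity score here so the example stays self-contained, and the stored memories are invented for illustration.

```python
def similarity(a, b):
    # Stand-in for embedding similarity: real systems compare dense
    # vectors; word overlap (Jaccard) keeps this sketch runnable.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

class Memory:
    def __init__(self):
        self.items = []

    def store(self, text):
        self.items.append(text)      # persist a memory across conversations

    def recall(self, query, top_k=2):
        # Retrieve the stored memories most relevant to the current task.
        ranked = sorted(self.items, key=lambda m: similarity(query, m),
                        reverse=True)
        return ranked[:top_k]

mem = Memory()
mem.store("User prefers a window seat on a flight")
mem.store("User is vegetarian")
mem.store("Project deadline is March 15")
print(mem.recall("book a flight seat for the user", top_k=1))
```

The key idea carries over to real vector databases: memories are written once, then surfaced later only when the current task makes them relevant.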
Multi-Agent Systems
Some of the most sophisticated AI applications use not just one agent but multiple agents working together. Each agent might have different specializations, tools, and roles, and they coordinate to accomplish complex tasks.
How Multi-Agent Systems Work
Think of a multi-agent system like a small team in an office. You might have:
- A project manager agent that breaks a large task into sub-tasks and assigns them
- A researcher agent that searches for and synthesizes information
- A writer agent that produces polished content
- A reviewer agent that checks the work for errors and quality
Each agent focuses on what it does best, and work passes among them in a structured workflow. The project manager might receive a request to write a market analysis report, break it into research tasks, hand those to the researcher, pass the findings to the writer, and then send the draft to the reviewer for quality checks.
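The office-team analogy above can be sketched as a simple pipeline, with each "agent" as a function standing in for a model call with its own role prompt. All the names and strings here are illustrative placeholders:

```python
def researcher(task):
    # Stand-in for a research agent: would search and synthesize sources.
    return f"findings on {task}"

def writer(findings):
    # Stand-in for a writer agent: would draft polished prose.
    return f"draft report using {findings}"

def reviewer(draft):
    # Stand-in for a reviewer agent: would check quality and flag issues.
    return f"approved: {draft}"

def project_manager(request):
    # Breaks the request into stages and routes work through the team.
    findings = researcher(request)
    draft = writer(findings)
    return reviewer(draft)

print(project_manager("EV market analysis"))
```

Real multi-agent frameworks add message passing, retries, and parallel execution on top of this basic hand-off structure.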
Benefits of Multi-Agent Systems
Specialization. Each agent can be optimized for its specific role — the researcher might use a model with strong analytical capabilities, while the writer might use one tuned for creative output.
Parallelism. Multiple agents can work simultaneously on different parts of a task, reducing the total time to completion.
Quality control. Having separate agents for creation and review introduces a check-and-balance system, similar to how human teams use peer review.
Challenges of Multi-Agent Systems
Coordination overhead. Agents need to communicate effectively, and miscommunication between agents can lead to errors or wasted effort.
Compounding errors. If one agent makes a mistake and passes incorrect information to the next, errors can cascade through the system.
Complexity. Multi-agent systems are harder to build, debug, and maintain than single-agent systems. The additional sophistication is only justified for genuinely complex tasks.
Current Capabilities and Limitations
It is important to have a realistic picture of what AI agents can and cannot do today. The technology is impressive but far from perfect.
What Agents Can Do Well
Coding tasks. AI coding agents are among the most mature applications. They can write new code, debug existing code, refactor programs, write tests, and navigate complex codebases. Professional developers increasingly use AI agents as pair programming partners.
Research and analysis. Agents that can search the web, read documents, and synthesize information are effective for many research tasks — from market research to competitive analysis to literature reviews.
Data processing. Agents that can write and execute code are remarkably good at data tasks: cleaning datasets, running analyses, creating visualizations, and generating reports.
Routine workflows. For well-defined, repeatable processes — processing invoices, classifying documents, generating reports from templates — agents can handle high volumes reliably.
What Agents Still Struggle With
Long-horizon planning. Agents are good at tasks that take minutes to complete. Tasks that require sustained effort over hours or days — with many interdependent steps — remain challenging. Agents can lose track of their overall plan, get stuck in loops, or make early decisions that cause problems much later.
Recovering from errors gracefully. When something unexpected happens — a website is down, a file is in an unexpected format, a tool returns an error — agents sometimes struggle to adapt. They may retry the same failing approach or give up too easily.
Knowing when to ask for help. Good human assistants know when they are in over their head and ask for guidance. AI agents sometimes plow ahead with incorrect assumptions rather than pausing to ask the user for clarification.
Understanding nuance and context. Agents can follow explicit instructions well but sometimes miss implied requirements or social context. An agent asked to "clean up this email before I send it to the CEO" might fix grammar and spelling but miss the fact that the tone is inappropriately casual for the audience.
Security and safety. Giving an agent the ability to take real-world actions — sending emails, modifying files, making purchases — means mistakes have real consequences. A bug in an agent's reasoning could lead to unintended emails, deleted files, or unauthorized transactions.
Real-World Agent Applications
Despite the limitations, AI agents are being deployed successfully in a growing number of applications.
Software Development
Coding agents are the most widely adopted category. Tools like GitHub Copilot, Claude Code, and similar products act as AI pair programmers that can understand a codebase, implement features, fix bugs, and write tests. Some teams report that AI agents handle a significant percentage of routine coding tasks, freeing developers to focus on architecture and design.
Customer Service
Advanced customer service systems use agents that can not only answer questions but also take actions: processing refunds, updating account information, scheduling appointments, and escalating complex issues to human representatives. The key is careful design of what the agent can and cannot do autonomously.
Personal Productivity
AI agents that manage calendars, draft emails, organize files, and handle routine administrative tasks are becoming increasingly common. These agents learn user preferences over time and become more effective the longer they are used.
Business Process Automation
Agents are being used to automate complex business processes that previously required multiple manual steps — from processing loan applications to managing supply chain logistics to conducting initial screening of job candidates.
Scientific Research
Research agents can search literature, analyze data, generate hypotheses, and even design experiments. While they are not replacing scientists, they are accelerating the pace of research by handling time-consuming tasks like literature reviews and data analysis.
The Safety Question
As agents become more capable and autonomous, safety becomes increasingly important. Several concerns are worth understanding:
Alignment. How do you ensure an agent's actions align with your actual intentions, especially for complex goals where the right approach is ambiguous? An agent told to "increase engagement on our social media" might take approaches you would not approve of if it interprets the goal too literally.
Scope control. Agents need clear boundaries on what they can and cannot do. An agent with access to your email should not send messages without your approval (unless you have specifically authorized it). An agent with access to your files should not delete things without confirmation.
Transparency. Users need to be able to understand what an agent did and why. Good agent systems maintain logs of all actions taken, tools used, and reasoning applied, so users can review and audit the agent's behavior.
Reversibility. Ideally, an agent's actions should be reversible when possible. If an agent makes a mistake, you should be able to undo it. This argues for designing agent systems with confirmation steps for high-impact actions and undo capabilities wherever feasible.
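A confirmation gate for high-impact actions can be sketched as a thin wrapper around tool execution. The tool names and the shape of the `confirm` callback are illustrative assumptions, not a real framework's API:

```python
# Actions with real-world consequences require explicit user approval.
HIGH_IMPACT = {"send_email", "delete_file", "make_purchase"}

def execute(action, confirm):
    # Gate high-impact actions behind a user confirmation callback;
    # low-impact actions run autonomously. The log supports auditing.
    log = []
    if action["tool"] in HIGH_IMPACT and not confirm(action):
        log.append(f"blocked: {action['tool']}")
        return "awaiting user approval", log
    log.append(f"ran: {action['tool']}")
    return "done", log

status, log = execute({"tool": "send_email"}, confirm=lambda a: False)
print(status)  # the email is held until the user approves it
```

The same wrapper is a natural place to implement the transparency and reversibility goals above: every decision is logged, and nothing irreversible happens without a human in the loop.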
Looking Ahead
AI agents represent one of the most active areas of development in AI, and the field is evolving rapidly. Several trends are shaping the future:
More capable tool use. Agents are gaining access to more and better tools, expanding the range of tasks they can handle. The trend is toward agents that can interact with virtually any software system or service.
Better planning. Research on agent planning is improving the ability of agents to handle complex, multi-step tasks without losing track of the overall goal.
Human-agent collaboration. The most effective deployments are not fully autonomous but collaborative — agents that handle routine aspects of a task while checking in with humans for important decisions. This pattern leverages the strengths of both humans and AI.
Standardization. Frameworks and protocols for building agents are maturing, making it easier for developers to create reliable agent systems. This will accelerate adoption across industries.
The bottom line is that AI agents represent the next major step in what AI can do for people. They are moving AI from a tool you consult to a collaborator that works alongside you. The technology is not yet mature enough to trust with fully autonomous operation in high-stakes situations, but for an expanding range of tasks, agents are becoming remarkably useful partners.
See This in the News
Building AI agents is no longer restricted to researchers at major AI labs — the tools and frameworks are becoming accessible to a much wider audience. For a practical look at how agent development is being democratized, read "Building AI Agents: No PhD Required" on AIWire.