This is the first in a series of posts on building an AI agent. For part one, we'll examine the structure of an agent and put together a strategy to build a (very) basic one of our own.
What is an agent?
Today, most generative AI interactions follow the same pattern: you enter a prompt, and the model creates a response based on your input. To get another answer, you provide another prompt. A human (you) always initiates the process.
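That single-turn pattern can be sketched in a few lines of Python. Here, `call_model` is a hypothetical stand-in for any real LLM API; the point is the shape of the interaction, not the model behind it.

```python
# Minimal sketch of the single-turn pattern: one prompt in, one response out,
# with a human driving every exchange.

def call_model(prompt: str) -> str:
    """Placeholder for a real LLM API call (hypothetical stand-in)."""
    return f"Response to: {prompt}"

def single_turn(prompt: str) -> str:
    # The model answers exactly once; getting another answer
    # requires the human to supply another prompt.
    return call_model(prompt)

print(single_turn("Summarize the history of robotics."))
```

There is no loop here and no memory between calls — which is exactly what an agent adds.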
Agents work differently. The idea is that they can plan and act independently of people. Rather than a single-use prompt, you provide a goal - to research a topic or take an action - and the agent creates a plan and gets to work. Think JARVIS from Iron Man - a piece of software that can problem-solve, take direction, conduct research, and do simple tasks.
In the ideal case - and to be clear, we are currently very far from that ideal - agents can plan, evaluate, and adapt to achieve their goals in the best way possible. It's potentially a far more flexible setup than traditional automation, which uses triggers or logic to execute fixed, rigid actions. We can imagine a world of small human teams leveraging much larger teams of AI agents to get more done. Some simple, contrived examples would be an agent that schedules your meetings or one that runs your online store.
Of course, the examples of agents we have today are still pretty limited. While LLMs have made planning and task management much more capable, we’re still developing the systems to translate tasks into actions. While there are plenty of academic examples, like Stanford’s 8-bit Westworld experiment, practical examples like Shopify Sidekick or Harvey AI are pretty narrowly tailored. One of the best use cases might be internet research, as that's a task that can be done completely through a computer program.
In principle, AI agents can be applied to nearly any task, at least where no higher-level strategic or creative functions are needed. Since agents use LLMs to reason, they're limited by the capabilities of the model you're using. As models improve, agents may pick up on more nuance and understand more clearly the ins and outs of reaching a goal, making them suitable for more complex tasks in the future.
If you want to play with some existing agent projects, you can try Auto-GPT, AgentGPT, babyagi, or JARVIS.
Anatomy of an agent
After researching and testing a handful of existing LLM agents, I've found a common structure to how they operate. While many features can be added on top of this skeleton, it's a great foundation to start with.
1. Set a goal. The program starts by taking an initial prompt from the user and saving it as the goal. Sometimes, the program will also confirm that it understands what to do and will get approval from the user.

2. Create a plan. With a goal in mind, we next use the agent to generate a list of tasks. This is where good prompting begins to play a part - high-level goals lead to high-level task lists, which are not particularly useful.

3. Execute the next task. Most of the agent's time is spent in a loop: execute the next task, analyze the results, repeat. But "execute the next task" can be roughly broken up into three steps.

   a. Gather information. Before taking an action, we want to gather all of the relevant information. That could mean searching the internet, querying a database, or retrieving notes/context from other tasks.

   b. Take action. Eventually, we'll need to take action. That could mean calling another program or model or using a third-party API.

   c. Store data. Once we're done with the task, it'll help to keep track of the results. On top of that, any context we have for the task, like new relevant information, will be useful to store.

4. Process the results. After finishing the task, we want to analyze the results. We may decide to create additional tasks or move to the next one.

5. Reprioritize the task list. If we're creating new tasks, we'll evaluate our new task list and decide if we need to reprioritize.

6. Return to step 3. And that's the loop! We keep going until we run out of tasks or accomplish our goal.
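The whole skeleton above fits in a short Python sketch. This is an illustration, not a working agent: the `llm` function is a hypothetical placeholder for a real model call, the plan parsing assumes one task per line, and the reprioritization step is left as a comment where a real agent would make another model call.

```python
# Sketch of the agent skeleton: set a goal, plan, then loop over
# execute / process / reprioritize until the task queue is empty.
from collections import deque

def llm(prompt: str) -> str:
    """Placeholder for a real LLM API call (hypothetical stand-in)."""
    return "1. Research the topic\n2. Summarize findings"

def make_plan(goal: str) -> deque:
    # Step 2: ask the model for a task list and parse one task per line.
    raw = llm(f"Break this goal into concrete tasks:\n{goal}")
    return deque(line.split(". ", 1)[-1] for line in raw.splitlines() if line.strip())

def execute(task: str, memory: list) -> str:
    # Step 3a, gather information: reuse notes stored from earlier tasks.
    context = "\n".join(memory)
    # Step 3b, take action: here just another model call; in a real agent
    # this could be a tool, a program, or a third-party API.
    result = llm(f"Context:\n{context}\n\nDo this task: {task}")
    # Step 3c, store data: keep the result around for future tasks.
    memory.append(f"{task} -> {result}")
    return result

def run_agent(goal: str, max_steps: int = 10) -> list:
    # Step 1: the goal comes in as the initial prompt.
    tasks, memory = make_plan(goal), []
    steps = 0
    while tasks and steps < max_steps:   # Step 6: loop until done.
        task = tasks.popleft()
        execute(task, memory)            # Step 3: execute the next task.
        # Steps 4-5: a real agent would ask the model whether the result
        # warrants new tasks, then reorder the queue accordingly.
        steps += 1
    return memory

print(run_agent("Write a report on solar energy"))
```

The `max_steps` budget is a practical safeguard: without it, an agent that keeps generating new tasks can loop forever, a failure mode the early agent projects listed above are known for.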