This is the first in a series of posts on building an AI agent. For part one, we'll examine the structure of an agent and put together a strategy to build a (very) basic one of our own.
What is an agent?
Today, most generative AI interactions follow a similar pattern. You enter a prompt, and the model creates a response based on your input. To get another answer, you have to provide another prompt. A human (you) is always initiating the process.
Agents work differently. The idea is that they can plan and act independently of people. Rather than a single-use prompt, you provide a goal - to research a topic or take an action. Then, the agent creates a plan and gets to work. Think JARVIS from Iron Man - a piece of software that can problem solve, take direction, conduct research, and do simple tasks.
In the ideal case - and to be clear, we are currently very far from here - agents can plan, evaluate, and adapt to achieve their goals in the best way possible. It's potentially a far more flexible setup than traditional automation, which uses triggers or logic to execute fixed, rigid actions. We can imagine a world of small human teams leveraging much larger teams of AI agents to get more done. Some simple, contrived examples would be scheduling meetings using agents or running your online store via an agent.
Of course, the examples of agents we have today are still pretty limited. While LLMs have made planning and task management much more capable, we’re still developing the systems to translate tasks into actions. While there are plenty of academic examples, like Stanford’s 8-bit Westworld experiment, practical examples like Shopify Sidekick or Harvey AI are pretty narrowly tailored. One of the best use cases might be internet research, as that's a task that can be done completely through a computer program.
In principle, AI agents can be applied to nearly any task that doesn't demand higher-level strategic or creative thinking. But since they use LLMs to reason, they're limited by the capabilities of whatever model you're using. As models improve, agents should pick up on more nuance and better understand the ins and outs of reaching a goal, making them suitable for more complex tasks in the future.
If you want to play with some existing agent projects, you can try Auto-GPT, AgentGPT, babyagi, or JARVIS.
Anatomy of an agent
After researching and testing out a handful of existing LLM agents, I've found a common structure to how the agents operate. While many features can be added on top of this skeleton, it's a great foundation to start with.
1. Set a goal. The program starts by taking an initial prompt from the user and saving it as the goal. Sometimes, the program will also confirm that it understands what to do and will get approval from the user.
2. Create a plan. With a goal in mind, we next use the agent to generate a list of tasks. This is where good prompting begins to play a part - high-level goals lead to high-level task lists, which are not particularly useful.
3. Execute the next task. Most of the agent's time is spent in a loop: execute the next task, analyze the results, repeat. But "execute the next task" can be roughly broken up into three steps:
   - Gather information. Before doing an action, we want to gather all of the relevant information. That could mean searching the internet, querying a database, or retrieving notes/context from other tasks.
   - Take action. Eventually, we'll need to take action. That could mean calling another program or model, or using a third-party API.
   - Store data. Once we're done with the task, it'll help to keep track of the results. On top of that, any context we have for the task, like new relevant information, will be useful to store.
4. Process the results. After finishing the task, we want to analyze the results. We may decide to create additional tasks or move on to the next one.
5. Reprioritize the task list. If we're creating new tasks, we'll evaluate our new task list and decide if we need to reprioritize.
6. Return to step 3. And that's the loop! We keep going until we run out of tasks or accomplish our goal.
There are a few things to note about this outline. First, this loop often runs forever: if you ask ChatGPT to make a list of tasks more granular or whether a plan should have more steps, it will happily generate more to-dos. And in fact, we're going to design an agent that will run forever (until it’s killed); not ideal for professional software, but perfectly fine for a tutorial.
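If you did want a guardrail against that, one simple option is to cap the number of loop iterations. Here's a hypothetical sketch using the Agent class we're about to build (run_with_cap and MAX_STEPS are my own names - nothing later in the tutorial uses them):

MAX_STEPS = 25  # arbitrary cap on how many tasks the agent may execute

def run_with_cap(agent, max_steps: int = MAX_STEPS):
    # Same loop as the agent's run() method below, but with a hard stop
    agent.create_plan()
    for _ in range(max_steps):
        if not agent.tasks:
            break
        task = agent.get_next_task()
        result = agent.execute_task(task)
        agent.process_result(task, result)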
Second, for our basic agent, we won’t have any internet access. That means no gathering information or using other software - we’re just building a planning agent for now. In the future, we’ll look at adding internet access. And finally, we’re going to “store data” locally: for a more robust system we’d want some sort of database, but our V1 will just keep everything in memory.
With that in mind, let's outline an Agent class that we'll fill in. We're going to be using ChatGPT for this example, so we'll also include our OpenAI key (and don't forget to run pip install openai). Note that this tutorial uses the legacy openai.ChatCompletion interface from pre-1.0 versions of the library - if you're on a newer version, you may need pip install "openai<1" for the code to run as written.
from typing import List

import openai

openai.api_key = "YOUR_OPENAI_KEY"


class Agent:
    goal = ""
    tasks = []
    results = []
    model = "gpt-3.5-turbo"

    def __init__(self, model: str = ""):
        pass

    def set_goal(self, goal: str):
        pass

    def create_plan(self) -> List[str]:
        pass

    def get_next_task(self) -> str:
        pass

    def execute_task(self, task: str) -> str:
        pass

    def process_result(self, task: str, result: str) -> List[str]:
        pass

    def run(self):
        pass
Filling in the gaps
Some of these methods are pretty straightforward, like __init__, get_next_task, and set_goal:
def __init__(self, model: str = ""):
    # Use instance-level lists so separate Agent objects don't share state
    self.tasks = []
    self.results = []
    if model:
        self.model = model

def get_next_task(self) -> str:
    # Tasks are executed in order, from the front of the list
    return self.tasks.pop(0)

def set_goal(self, goal: str):
    self.goal = goal
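One quick aside: list.pop(0) is O(n), since Python has to shift every remaining element forward. That's irrelevant at this scale, but if the task list ever grew large, collections.deque would be the idiomatic choice:

from collections import deque

tasks = deque(["research permits", "estimate startup costs"])
next_task = tasks.popleft()  # O(1), unlike list.pop(0)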
But the remaining methods will require using ChatGPT to power the agent. The first step is putting together our list of tasks.
def create_plan(self) -> List[str]:
    prompt = (
        f"You are an AI assistant working towards a goal: {self.goal}. "
        "Return a list of tasks to be completed to achieve the goal. "
        "Respond with one task per line, formatted as a list with dashes. "
        "For example:\n\n"
        "- First task\n"
        "- Second task\n\n"
        "Each line must begin with a dash followed by a space. "
        'If the list is empty, write, "There are no additional tasks." '
        "Do not include any headers or formatting in your response. "
    )
    messages = [{
        "role": "system",
        "content": prompt
    }]
    response = openai.ChatCompletion.create(
        model=self.model,
        messages=messages,
        max_tokens=1024,
        temperature=0.2
    )
    task_content = response["choices"][0]["message"]["content"]
    tasks = task_content.split("\n")
    for task in tasks:
        # Strip the leading dash and any stray whitespace from each line
        task = task.replace("- ", "", 1).strip()
        if not task or task == "There are no additional tasks.":
            continue
        self.tasks.append(task)
    return self.tasks
This is the first step of our agent, and it comes with some pretty big caveats. This code is super fragile: there's no error handling for OpenAI errors or timeouts, we aren't checking that our prompt fits within the max token length, and we aren't checking that the output is formatted correctly at all! Plus, the prompt itself is pretty messy; it would benefit from a more refined template.
Of course, these challenges are all solvable. But I wanted to point them out in case you're hitting errors running this yourself. We'll look at making our agent more robust in a later post, but in the meantime, you can check out ChatGPT function calling, LangChain, or Microsoft Guidance to learn more about adding more rigid structure to LLM responses.
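As a small taste of what that hardening could look like, here's a hypothetical parse_task_list helper (my name, not part of the tutorial code) that only trusts lines matching the format we asked for:

from typing import List

def parse_task_list(content: str) -> List[str]:
    # Keep only lines matching the "- task" format we requested;
    # headers, chatter, and anything else the model added get discarded
    tasks = []
    for line in content.split("\n"):
        line = line.strip()
        if line.startswith("- ") and len(line) > 2:
            tasks.append(line[2:].strip())
    return tasks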
We're going to take a similar approach to task execution: we're creating a prompt with our goal and our next task. For additional context, we also include the previous task and result so the agent knows where we're coming from. We could also include the previous tasks or a summary of prior results - but keep in mind that ChatGPT has a default limit of 4096 tokens, so providing the extra context will risk causing an error (not to mention costing more per task).
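If you want to check a prompt's length before sending it, OpenAI's tiktoken library (a separate pip install tiktoken) can count tokens for you - a quick sketch:

import tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
prompt = "You are an AI assistant working towards a goal: ..."
prompt_tokens = len(encoding.encode(prompt))

# The prompt and the response share the 4096-token context window,
# so leave room for max_tokens worth of output
if prompt_tokens + 1024 > 4096:
    print("Prompt is too long - trim the context before calling the API")

With that caveat flagged, here's the task execution method: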
def execute_task(self, task: str) -> str:
    prompt = f"You are an AI assistant working towards a goal: {self.goal}. "
    if self.results:
        # Include the previous task and result as lightweight context
        last_result = self.results[-1]
        prompt += (
            f'Your previous task was "{last_result[0]}", '
            f'and your previous result was "{last_result[1]}". '
        )
    prompt += (
        f"Your next task is: {task}. "
        "Perform this task."
    )
    messages = [{
        "role": "system",
        "content": prompt
    }]
    response = openai.ChatCompletion.create(
        model=self.model,
        messages=messages,
        max_tokens=1024,
        temperature=0.2
    )
    result = response["choices"][0]["message"]["content"]
    return result
With additional development, we would use vector embeddings (see the last tutorial) to store our progress and summarize what's been done so far.
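Short of embeddings, even a crude form of memory would help. For example, a hypothetical helper method (not part of our agent) could pass along the last few task/result pairs:

def recent_context(self, n: int = 3) -> str:
    # A cheap stand-in for real memory: just the last n task/result pairs
    lines = []
    for task, result in self.results[-n:]:
        lines.append(f"Task: {task}\nResult: {result}")
    return "\n\n".join(lines)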
After completing a task, we want to handle the output. We're going to store the task and the result (to reference in a future execution step), and then ask the LLM to use the result to reprioritize the list of tasks. We're going to replace our task list entirely with the new output, which isn't the best approach but is a simple one to start with.
def format_tasks(self) -> str:
    return "\n".join([f"- {t}" for t in self.tasks])

def process_result(self, task: str, result: str) -> List[str]:
    self.results.append((task, result))
    prompt = (
        f"You are an AI assistant working towards a goal: {self.goal}. "
        f'Your previous task was "{task}", '
        f'and your previous result was "{result}". '
        f'Currently, your remaining tasks are: {self.format_tasks()}.\n'
        "Given this information, update the list of tasks. "
        "Add more tasks if you think they are needed, "
        "and reorder the list from highest priority to lowest priority. "
        "Respond with one task per line, formatted as a list with dashes. "
        "For example:\n\n"
        "- First task\n"
        "- Second task\n\n"
        "Each line must begin with a dash followed by a space. "
        'If you feel the goal has been completed, write "There are no additional tasks." '
        "Do not include any headers or formatting in your response. "
    )
    messages = [{
        "role": "system",
        "content": prompt
    }]
    response = openai.ChatCompletion.create(
        model=self.model,
        messages=messages,
        max_tokens=2048,
        temperature=0.2
    )
    task_content = response["choices"][0]["message"]["content"]
    # Replace the task list entirely with the reprioritized version
    self.tasks = []
    for task in task_content.split("\n"):
        task = task.replace("- ", "", 1).strip()
        if not task or task == "There are no additional tasks.":
            continue
        self.tasks.append(task)
    return self.tasks
In writing the process_result method, we're providing ChatGPT with a list of the tasks that we still have to do. In designing the prompt, I decided to create a helper method, format_tasks, which will also come in handy for debugging.
As I mentioned above, because of the nature of LLMs, ChatGPT will nearly always return more tasks. A more sophisticated agent would have a separate way of checking whether the goal was "completed" or not, rather than just generating more conversational responses.
We've been doing a lot of repetition when it comes to the OpenAI API calls - it would be better to refactor these methods to share a single helper. As a rough sketch, a call_openai method (my name for it, and the retry logic is my own addition - nothing the rest of the tutorial depends on) might look like this:
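import time  # at the top of agent.py

def call_openai(self, prompt: str, max_tokens: int = 1024) -> str:
    # Send a single system prompt to the chat API and return the text reply,
    # retrying with exponential backoff if the API errors out
    for attempt in range(3):
        try:
            response = openai.ChatCompletion.create(
                model=self.model,
                messages=[{"role": "system", "content": prompt}],
                max_tokens=max_tokens,
                temperature=0.2
            )
            return response["choices"][0]["message"]["content"]
        except openai.error.OpenAIError:
            time.sleep(2 ** attempt)
    raise RuntimeError("OpenAI call failed after 3 attempts")

But we have all of the major pieces, so we're ready to put together our run method: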
def run(self):
    print("\nCREATING PLAN...\n")
    self.create_plan()
    print(self.format_tasks())

    while len(self.tasks):
        task = self.get_next_task()
        print(f"\nNEXT TASK: {task}\n")
        result = self.execute_task(task)
        print(f"\nRESULT: {result}\n")
        self.process_result(task, result)
        print("\nUPDATED PLAN:\n")
        print(self.format_tasks())

    print("\nTASKS FINISHED.\n")
This maps pretty cleanly to the outline we first created. But how well does it work?
Running our agent
It's pretty straightforward to run our agent. If we take the code above and add it to a file named agent.py, we can add a little bit of boilerplate. Then we can run python agent.py and watch the fireworks!
if __name__ == "__main__":
    goal = "Create a business plan for a boba food truck."
    agent = Agent()
    agent.set_goal(goal)
    agent.run()
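And since the constructor accepts a model name, swapping in a different model is a one-line change (assuming your API key has access to it):

agent = Agent(model="gpt-4")
agent.set_goal(goal)
agent.run()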
Here's what I got from the first few task executions of the agent. It's a little too long to include here, but I've got a screenshot to give you an idea.
Next steps
After running our super simple agent a few times, I’m a bit… underwhelmed. It's essentially doing a lot of brainstorming on repeat, and it isn't particularly good at "making progress" toward the goal. Looking forward, there are some pretty obvious improvements that we can make to the program, and some questions to explore.
Narrower use case. Having an agent that can "do anything" just means it's pretty terrible at everything. Instead, we should focus on a narrower use case, like "writing a research report" or "setting up a new codebase."
Better goal checking. This agent runs in an endless loop, but how would we prevent that? If we have a narrower use case, how can we be better at understanding when we're "done"?
Long-term memory. In some of my tests, it was clear that ChatGPT lost the thread at some points. We've tried using LlamaIndex as a sort of short-term memory for LLMs - how well would that work here? When should we provide additional context for the agent?
Internet access. Everything gets more interesting with internet access! Whether that's to do research or to give ChatGPT the ability to call other software, we can do much more once we upgrade our agent with Wifi. Of course, that also introduces more issues: If we're pulling or pushing third-party data, how can we manage hallucinations?
Better prompts/user input. The out-of-the-box prompts will be far too general for real-world usage. It would be better to tailor them to specific roles, or to incorporate user input. The prompts I used here were rough first drafts - after some consideration, I already have some ideas for improving them.
Better code quality. And as I said before, we can refactor the OpenAI API calls and add plenty of safeguards and error handling.
Was this tutorial helpful? Where would you want to see this project go next? Leave a comment on Substack!