

This is the second in a series of posts on building an AI agent. Last time, we built a very rudimentary agent using ChatGPT as the core LLM. While we created some useful architecture, it ultimately couldn't do much beyond high-level planning and endless task generation.
#AgentGoals
At the end of our last tutorial, it was clear that our first agent was fairly limited. There were quite a few obvious improvements to make - and I decided to tackle three in particular:
Narrower use case. Having an agent that can "do anything" just means it's pretty terrible at everything. Instead, we should focus on a narrower use case, like "writing a research report" or "setting up a new codebase."
Internet access. Everything gets more interesting with internet access! Whether that's to do research or to give ChatGPT the ability to call other software, we can do much more once we upgrade our agent with Wifi.
Better code quality. We can refactor the OpenAI API calls and add plenty of safeguards and error handling.
Taking all three together, I chose a much narrower goal for our agent's V2: internet research. We're going to reuse the architecture of our existing agent but with a different outcome:
Given a research topic or question, generate a list of sub-topics or questions.
For each sub-topic:
Search the internet for a list of relevant links.
Read and summarize the contents of each link.
Create an overall sub-topic summary.
Write a final answer based on all of the individual sub-topics.
In particular, we can answer questions about topics that rely on factual data or data more recent than 2021. A simple example would be, "Who won the 2023 Super Bowl?" Out of the box, ChatGPT responds with:
As an AI developed by OpenAI, I don't have real-time data or future predictions. As of my last update in October 2021, I can't provide the winner of the 2023 Super Bowl. Please check the most recent sources for this information.
I want to do better than this. Let's get started!
Paying off tech debt
The first thing to do is to refactor our agent. We want some basic scaffolding to extend for our research agent (and any future agents). Most of these methods should be pretty familiar from part one, but there are a couple of new ones.
from typing import List

import openai


class Agent:
    goal = ""
    tasks = []
    results = []
    answer = ""
    model = "gpt-3.5-turbo-16k"

    def get_next_task(self) -> str:
        """ Return the next task to be completed. """

    def set_goal(self, goal: str):
        """ Set the goal of the agent. """

    def create_completion(
        self,
        prompt: str,
        max_tokens: int = 1024,
        temperature: float = 0.2,
    ) -> str:
        """ Create a completion using the agent's model. """

    def create_plan(self) -> List[str]:
        """ Create a plan to achieve the agent's goal. """

    def execute_task(self, task: str) -> str:
        """ Execute a task to achieve the agent's goal. """

    def process_result(self, task: str, result: str) -> List[str]:
        """ Process the result of a task to achieve the agent's goal. """

    def finish_plan(self) -> str:
        """ Process all of the results once all tasks are completed. """

    def run(self):
        """ Run the agent until all tasks are completed. """
The first new method here is create_completion, which is a nice wrapper around the OpenAI API. We're using our own for simplicity, but you can achieve a similar effect (with more features) by using something like LangChain. For now, we're just going to wrap the API call and add some error checking. The other new method is finish_plan, which we'll come back to in a bit.
def create_completion(
    self,
    prompt: str,
    max_tokens: int = 1024,
    temperature: float = 0.2,
) -> str:
    """ Create a completion using the agent's model. """
    messages = [{"role": "system", "content": prompt}]
    try:
        response = openai.ChatCompletion.create(
            model=self.model,
            messages=messages,
            max_tokens=max_tokens,
            temperature=temperature,
        )
    except openai.error.InvalidRequestError:
        # Log the error
        return ""
    return response["choices"][0]["message"]["content"]
We also need to rewrite our prompts - instead of high-level goals, they should focus on research.
def create_plan(self) -> List[str]:
    """ Create a plan to achieve the agent's goal. """
    prompt = (
        f"You are an AI assistant researching a topic/question: {self.goal}. "
        "Return a list of sub-topics or queries to properly research the topic. "
        "These sub-topics will be passed to a search engine to read and analyze the results. "
        "Respond with one sub-topic/query per line, formatted as a list with dashes. "
        "For example:\n\n"
        "- First query\n"
        "- Second query\n\n"
        "Each line must begin with a dash followed by a space. "
        "Do not include any headers or formatting in your response. "
    )
    response = self.create_completion(prompt)
    # Parse the dash-prefixed lines back into a list of sub-topics
    self.tasks = [line[2:].strip() for line in response.splitlines() if line.startswith("- ")]
    return self.tasks
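With the plan prompt in place, kicking off the agent looks something like this (the topic here is just an example):

agent = Agent()
agent.set_goal("What is Retrieval Augmented Generation (RAG)?")
subtopics = agent.create_plan()
for subtopic in subtopics:
    print(subtopic)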
Now, "goals" are "topics," and "tasks" are "sub-topics." But the meat of our changes needs to take place inside of execute_task
. As a reminder, here's what needs to happen for each sub-topic:
Search the internet for a list of relevant links.
Read and summarize the contents of each link.
Create an overall sub-topic summary.
Each step can be broken out into its own method, starting with internet access.
Just Google it
At first, I struggled with how to actually connect the agent to the internet. Asking ChatGPT for a list of links will result in hallucinations, and any valid URLs would be from 2021 or earlier. Ultimately, I wanted the agent to do its research the same way that I would - by using Google.
But "Google it" is far easier said than done for a piece of software. Scraping Google results is not only incredibly complex: think answer boxes, relevant questions, maps results, knowledge graph results, etc. It's also a great way to get your IP blacklisted. The "proper" approach would be to use Google Cloud's Programmable Search Engine to create a Custom Search JSON API that you can then use to pull structured results.
That's a ton of work for a simple demo. It was much, much easier to sign up for SerpAPI - a service that feeds your query into a search engine and returns the results parsed as JSON. Their free tier offers 100 searches per month, which will work for now. If I wanted to run the research agent at scale, I'd have to either pony up for SerpAPI or make the Google Cloud integration work.
Here's what a quick and dirty integration with SerpAPI looks like (if you're following along at home, don't forget to run pip install requests):
def get_search_urls(self, topic: str, engine: str) -> list:
    """ Search Google for a topic and return a list of URLs and metadata. """
    import requests

    url = "https://serpapi.com/search.json"
    params = {
        "engine": engine,
        "q": topic,
        "api_key": "YOUR_SERPAPI_KEY",
        "location": "United States",
        "hl": "en",
    }
    response = requests.get(url, params=params)
    # Guard against responses that have no organic results
    return response.json().get("organic_results", [])
Here are the (truncated) results for the query "Who won the 2023 Super Bowl?":
[
    {
        'link': 'https://en.wikipedia.org/wiki/Super_Bowl_LVII',
        'position': 1,
        'source': 'Wikipedia',
        'title': 'Super Bowl LVII'
    },
    {
        'link': 'https://www.usatoday.com/story/sports/nfl/super-bowl/2023/02/13/super-bowl-57-winners-losers-patrick-mahomes-eagles-defense/11243745002/',
        'position': 2,
        'source': 'USA Today',
        'title': 'Super Bowl 57 winners, losers: Patrick Mahomes breaking ...'
    },
    {
        'link': 'https://www.espn.com/nfl/game/_/gameId/401438030',
        'position': 3,
        'source': 'ESPN',
        'title': 'Chiefs 38-35 Eagles (Feb 12, 2023) Final Score'
    },
    {
        'link': 'https://www.southwestjournal.com/all-super-bowl-winners-1967-to-2023/',
        'position': 4,
        'source': 'Southwest Journal',
        'title': 'All Super Bowl winners 1967 to 2023: From Legends to ...'
    },
]
This gives us a list of reasonably relevant links for a given query - but how do we read and analyze those links?
Read 'em and weep
Since we're already using ChatGPT, you might be tempted to keep using it to read the search results - I know I was. The problem is that each search result is a web page filled with thousands of HTML tags, JavaScript code, and special characters. Even a modest site can easily consume ChatGPT's entire 16K or 32K context window, before even accounting for extra prompts or the response.
Fortunately, parsing webpages is not a new problem. newspaper (installed via pip install newspaper3k) is a fantastic Python library that easily retrieves, parses, and even summarizes the contents of a URL. It's crazy simple to get the contents of our search results:
def retrieve_url(self, topic: str, url: str) -> str:
    """ Retrieve the contents of a URL. """
    from newspaper import Article, ArticleException

    try:
        article = Article(url)
        article.download()
        article.parse()
    except ArticleException:
        # Skip pages that fail to download or parse (paywalls, timeouts, etc.)
        return ""
    return article.text
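Since the library advertises summarization too, it's worth knowing that newspaper can also produce an extractive summary and keywords via nlp(). Here's a quick sketch (it requires nltk's punkt data to be downloaded first):

from newspaper import Article

article = Article("https://en.wikipedia.org/wiki/Super_Bowl_LVII")
article.download()
article.parse()
article.nlp()  # needs nltk.download('punkt') the first time
print(article.summary)   # extractive summary of the page
print(article.keywords)  # list of keywords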
This just leaves our overall task execution method, where we're compiling the article summaries into a single output. Once again, we're revising the prompt but keeping the arguments and return value the same.
def execute_task(self, task: str) -> str:
    """ Execute a task to research a subtopic. """
    search_urls = self.get_search_urls(task, "google")
    articles = []
    for search_url in search_urls:
        url = search_url["link"]
        article = self.retrieve_url(task, url)
        if article:
            articles.append(article)
    # Note the explicit "+" - without it, Python's implicit string
    # concatenation would make .join() use the instructions as a separator
    prompt = (
        f"You are an AI assistant researching a topic: {task}. "
        "Below are the contents of the top Google search results. "
        "Create a write-up of the topic using the articles below. "
        "Only use the data from the articles, do not include any other information.\n\n"
        + "\n\n".join(articles)
    )
    result = self.create_completion(prompt, max_tokens=4096, temperature=0.2)
    return result
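Note that step two of our plan called for summarizing each link. The version above passes full article text straight through, which can blow past the context window; a per-article summarization step might look something like this (summarize_article is a hypothetical helper, not part of the original code):

def summarize_article(self, topic: str, text: str) -> str:
    """ Condense one article down to the facts relevant to the topic. """
    prompt = (
        f"You are an AI assistant researching a topic: {topic}. "
        "Summarize the following article in a few paragraphs, "
        "keeping only facts relevant to the topic.\n\n"
        + text[:8000]  # rough truncation to stay inside the context window
    )
    return self.create_completion(prompt, max_tokens=512)

You'd then append summarize_article(task, article) instead of the raw article text in execute_task.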
The best laid plans
Earlier, I mentioned that there was a second new method, finish_plan. That's because I've altered the strategy of the agent slightly. Before, we were using process_result to analyze the task result and refresh our tasks (leading to an endless list of TODOs). Now, process_result internally keeps track of the answers, but doesn't do anything with them:
def process_result(self, task: str, result: str) -> List[str]:
    """ Process the result of a task to research a topic. """
    self.results.append((task, result))
    return self.tasks
I made this change to actually finish our task list. Otherwise, ChatGPT will keep creating work for us to do. But it means we need a step that looks at all of our results together and assembles the final answer we want.
def finish_plan(self) -> str:
    """ Process all of the results once all tasks are completed. """
    self.answer = ""
    for task, result in self.results:
        prompt = (
            f"You are an AI assistant researching a topic/question: {self.goal}. "
            "Here is the result of your research so far:\n\n"
            "====================\n\n"
            f"{self.answer}\n\n"
            "====================\n\n"
            "Given the information below, rewrite the report. "
            "Include any of the new information that you think is relevant.\n\n"
            "====================\n\n"
            f"{result}"
        )
        self.answer = self.create_completion(prompt, max_tokens=4096, temperature=0.2)
    return self.answer
A simple (but bad) approach would be to send all of our article summaries to ChatGPT and ask it to write a summary. The problem is that we only have so many tokens to use, and the article content might easily go over that limit. Instead, we're starting with an empty answer variable and repeatedly showing it to ChatGPT along with new information. It should add the new information and re-write the answer.
Report card
How did our agent do? Here's what happens when we run it with our original query, "Who won the 2023 Super Bowl?":
The Kansas City Chiefs emerged as the winners of the Super Bowl LVII, the championship game of the National Football League (NFL) for the 2022 season. They defeated the Philadelphia Eagles with a score of 38-35 in a game that took place on February 12, 2023, at State Farm Stadium in Glendale, Arizona. The Chiefs made a comeback in the second half after trailing 24-14 at halftime, with the game-winning field goal kicked by Harrison Butker. The Chiefs' quarterback, Patrick Mahomes, was named the Super Bowl Most Valuable Player (MVP). The game was the most-watched program in American television history, with an average of 115.1 million viewers. The halftime show, headlined by Rihanna, peaked at 118.7 million viewers.
This works... surprisingly well! As a longer test, I asked the question "What is Retrieval Augmented Generation (RAG)?" and changed the prompt to ask for Markdown. Here's a brief snippet:
# Retrieval-Augmented Generation (RAG): An In-depth Analysis
Retrieval-Augmented Generation (RAG) is a sophisticated technique employed in the field of natural language processing. It amalgamates the strengths of retrieval-based models and generative models to augment the quality and relevance of the text generated. RAG is designed to address the limitations of foundation models, which are typically trained offline on general domain corpora, making them less effective for domain-specific tasks.
## How RAG Works
RAG operates by using a retrieval-based model to extract pertinent information from a knowledge base or a set of documents based on a given query or context. This information is then used as input or additional context for the generative model. This allows the generative model to leverage the accuracy and specificity of the retrieval-based model to produce more relevant and accurate text.
Retrieval models are designed to retrieve relevant information from a given set of documents or a knowledge base. They use techniques like information retrieval or semantic search to identify the most relevant pieces of information based on a given query. Generative models, on the other hand, are designed to generate new content based on a given prompt or context. These models use a large amount of training data to learn the patterns and structures of natural language.
The language is stilted - undoubtedly because of the words "research" and "report" being thrown around, as well as the technical topic and search results. But that's very fixable with some prompt trial and error.
Next Steps
Better prompts. I'm going to keep experimenting with these prompts - they're okay, but far from excellent. I think the output could be quite impressive with the right prompt engineering. For example, right now it's generating too much text for a simple question, but probably not enough for a research report. How could we tweak the prompts to make it more relevant to the type of answer we want?
Better task management. We're still generating a ton of tasks for what might be a short answer. It would be better to figure out how to narrow down the initial task list so that we're only researching what we need - nobody needs 100 Google search results to figure out who won the Super Bowl.
Token management. Between parts one and two, I've been conspicuously avoiding talking about token management, but it's a real issue. For these tutorial examples, we can avoid the problem by simply using a model with a bigger context window. Unfortunately, that's a half-baked (and expensive) solution. Intelligently shrinking our prompt or summarizing pieces as we go might be a better approach. OpenAI also offers tiktoken, a library to calculate how many tokens a given prompt will use (see the sketch after this list).
Long-term memory. While we can print out individual steps or save the final result to a text file, we're losing most of the intermediate data. Plus, we're sort of ham-fisting our way around ChatGPT's limited memory by providing the existing text and revising it at each step. Previously, we've experimented with using embeddings as long-term memory, which might be an option here.
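For instance, here's a sketch of counting tokens with tiktoken before sending a prompt (assuming pip install tiktoken):

import tiktoken

# Look up the tokenizer that matches our model family
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

def num_tokens(text: str) -> int:
    """ Count how many tokens a string will consume. """
    return len(encoding.encode(text))

prompt = "You are an AI assistant researching a topic..."
print(num_tokens(prompt))  # check against the model's context window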
Was this tutorial helpful? Where would you want to see this project go next? Leave a comment below!