The State of AI Engineering
Notes from the first AI Engineer Summit.
For the last three days, I've been at the inaugural AI Engineer Summit, with over 500 attendees and over two dozen speakers. It was an absolutely jam-packed conference, and my brain is still processing much of what I saw and heard.
Despite the conference being the first of its kind, there were still some major announcements:
Replit launched two new coding models, with the second being on par with CodeLlama in various categories.
GitHub talked about its Copilot revenue for the first time - it's now making $100 million in annual recurring revenue from its AI code completion tool.
And AutoGPT revealed its $12 million investment from Redpoint Ventures.
But the conference was much more than new models and funding - it was an exploration of what builders are dealing with at the cutting edge, and what might be possible if we can solve some key challenges. It deeply reinforced my belief in the idea of "AI Engineering" being different from what has come before:
Software engineering will spawn a new subdiscipline, specializing in applications of AI and wielding the emerging stack effectively, just as “site reliability engineer”, “devops engineer”, “data engineer” and “analytics engineer” emerged.
The emerging (and least cringe) version of this role seems to be: AI Engineer.
– "The Rise of the AI Engineer"
The talks from both days were livestreamed and are available via YouTube if you want to dive deeper. If you're more of a text person, I've got recaps from all of the talks and workshops. But here are a few of my key takeaways:
We are so early.
One of the most eye-opening things was how raw a lot of this is. The technology, the design patterns, the libraries, the research, the QA - all of it. As much as it might feel like some folks are already miles ahead, the reality is that we're just starting to figure out what's possible. A few areas where that felt particularly relevant:
Prompting. I kept hearing from speakers how much of a difference prompting makes. The right words in the right order can move the needle more than anything else for lots of different tasks. We're still hacking to get what we want, like begging the LLM to output JSON or threatening to take a human life if it doesn't. Plus, most AI engineers don't even have an agreed-upon prompt management strategy! It's a mix of external tools, internal tools, and spreadsheets.
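The "begging for JSON" hack above usually gets paired with a defensive parser, since even a well-prompted model sometimes wraps its answer in prose. Here's a minimal sketch of that pattern; the prompt wording and the simulated replies are illustrative, not from any specific tool:

```python
import json

# A hypothetical prompt that "begs" the model for JSON - an explicit
# shape plus a "JSON only" instruction tends to beat the instruction alone.
PROMPT = """Extract the product name and price from the text below.
Respond with ONLY valid JSON, no prose, in exactly this shape:
{"name": "<string>", "price": <number>}

Text: "The Widget Pro is on sale for $49."
"""

def parse_llm_json(raw: str) -> dict:
    """Salvage JSON from a model reply that may include extra prose."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Fall back to the first {...} span - a common real-world hack.
        start, end = raw.find("{"), raw.rfind("}")
        if start != -1 and end > start:
            return json.loads(raw[start:end + 1])
        raise

# Simulated model replies: one clean, one wrapped in chatty prose.
clean = '{"name": "Widget Pro", "price": 49}'
chatty = ('Sure! Here is the JSON you asked for:\n'
          '{"name": "Widget Pro", "price": 49}\n'
          'Let me know if you need anything else.')

assert parse_llm_json(clean) == parse_llm_json(chatty)
```

It's a hack all the way down - which is exactly the point: in the absence of structured-output guarantees, everyone is writing some version of this salvage code.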
Evals. The prompting problem is compounded by the fact that we don't have good QA systems figured out yet. And given the non-deterministic nature of LLMs, tweaking prompts and models just seems like trying to run A/B tests without doing quantitative measurements. How do we know if the changes we're making are actually working? Without evals (and there were some great suggestions for how to get started), the only alternative is to do a "vibe check" on your results to see if your changes worked - that seems a little insane.
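Getting past the "vibe check" doesn't require heavy infrastructure. A starting point is just a fixed list of test cases with pass/fail checks, run against whatever prompt or model you're iterating on. This is a generic sketch (the `model` function is a stand-in for a real LLM call, with one answer deliberately wrong to show a failing case):

```python
# A minimal eval harness: fixed (input, check) cases run against the
# model function you're tweaking, producing a comparable pass rate.
def model(prompt: str) -> str:
    # Placeholder for a real LLM API call.
    canned = {
        "Capital of France?": "Paris",
        "2 + 2 = ?": "4",
        "Largest planet?": "Saturn",  # deliberately wrong
    }
    return canned.get(prompt, "")

EVAL_CASES = [
    ("Capital of France?", lambda out: "Paris" in out),
    ("2 + 2 = ?", lambda out: "4" in out),
    ("Largest planet?", lambda out: "Jupiter" in out),
]

def run_evals(model_fn, cases):
    results = [(prompt, check(model_fn(prompt))) for prompt, check in cases]
    passed = sum(ok for _, ok in results)
    return passed / len(results), results

score, results = run_evals(model, EVAL_CASES)
print(f"pass rate: {score:.0%}")
```

Once you have a number like this, "did my prompt change help?" becomes an A/B test you can actually read, instead of a gut feeling.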
UX. When you have a ChatGPT hammer, everything looks like a Copilot nail. It was incredibly refreshing to see new approaches to AI UX; chatbots have a time and place, but they probably shouldn't be the default mode of engaging with AI. How do we build different interfaces to engage with all of humanity's knowledge? Not only that - but better UX is also the key to building a moat. GitHub and Midjourney have built data collection and feedback directly into their UX, and have been improving faster than their competitors as a result.
Guardrails. If you've used an LLM for any serious amount of time, you'll know it hallucinates. But LLM issues go deeper than that. If you're using it to call other software, you might get bad data; if you're using it to generate brand-specific content, it might decide to mention a competitor. There isn't a fundamental way of preventing this right now, but there are a variety of approaches (some using other ML models) to try and catch these problems before they get to the end-user.
Mind the hype.
With everything being so new, it's also difficult to know (from the outside) what's real and what's hot air. Take two of the most talked-about topics: agents and RAG (retrieval augmented generation).
With agents, there's a lot of promise - after all, it's the ultimate goal of AI in many ways. We'd love to have Rosie from The Jetsons or Iron Man's Jarvis take care of our tasks without further thought. But we're having a hard time getting today's agents to complete more than the most basic tasks. And even when they do, they usually have a 60-70% success rate at best.
Meanwhile, RAG - a technique to give LLMs "long-term memory" by surfacing relevant documents and adding them to a prompt - has blown up in recent months. But beyond simple demos, we're still figuring out the best practices here. One thing I learned was that RAG is much more successful when the document containing the right answer appears first in the prompt - and when it's buried in the middle of the context, RAG can be worse than having no documents at all!
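That ordering detail is cheap to act on: rank retrieved documents by relevance and put the best match first. A minimal sketch, where the scores are stand-ins for real retrieval (e.g. embedding similarity) and the prompt wording is my own:

```python
# Sketch of RAG prompt assembly: sort retrieved documents by a relevance
# score so the best match leads the context, since answers buried in the
# middle of a long prompt are often missed.
def build_rag_prompt(question: str, docs: list[tuple[float, str]]) -> str:
    # Sort descending by score so the highest-scoring document comes first.
    ordered = [text for _, text in sorted(docs, reverse=True)]
    context = "\n\n".join(f"[Doc {i + 1}] {d}" for i, d in enumerate(ordered))
    return (
        "Answer using only the documents below.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

docs = [
    (0.42, "The summit had over two dozen speakers."),
    (0.91, "GitHub Copilot generates $100M in annual recurring revenue."),
    (0.17, "RAG adds retrieved documents to the prompt."),
]
prompt = build_rag_prompt("How much revenue does Copilot generate?", docs)
# The highest-scoring document now appears before the others.
assert prompt.index("[Doc 1] GitHub Copilot") < prompt.index("two dozen")
```

Real pipelines add chunking, reranking, and context-length budgeting on top, but the core lesson - position matters - fits in a sort call.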
There is real value here.
But it's not all bad. Many are wondering whether this wave of AI apps will figure out actual business models or fizzle out as the hype subsides. While many will likely not make it, GitHub has demonstrated that there is real value to be created (and captured) with generative AI.
GitHub Copilot is now a) profitable and b) generating $100 million in ARR. That's a big deal. Over a million developers have tried the tool, and by GitHub's measurements, it has made them 55% faster. As compute gets cheaper and models improve, code generation will become more ubiquitous and profitable.
There's also plenty of value to be created with tiny projects - you don't have to be GitHub or OpenAI to make something people want. Many big-name projects started out as open-source experiments built on nights and weekends. If you're at the cutting edge, a lot of this may seem obvious or pedestrian, but 99.99% of people don't know how this stuff works, let alone how to build with it, so solving tiny problems can lead to big impacts.
It's only going to get faster.
The conference started with the idea of a "1000x engineer." It's a play on the "10x engineer" idea: a programmer so good that they're 10x more productive than the average. With AI, we may have multiple avenues of stacking 10x improvements:
Software engineers enhanced by AI tools.
Software engineers building AI products to 10x others.
AI products that replace software engineers entirely.
And as each of these approaches gets better, the speed of improvements and innovations will keep getting faster (at least for a while). Twelve months ago, not many people were paying attention to GPT-3, and we had a handful of new models being released and discussed each year. Now, a dozen or two models are being uploaded to HuggingFace every week.
The phrase "Cambrian explosion" kept being used, and with good reason. It's impossible to keep up with the latest news articles, research papers, model releases, product launches, and infrastructure improvements. The "state of the art" changes from month to month.
I'm not sure what AI Engineering will look like a year from now - we might have solved the major issues we're facing today, or we might not. It felt like the speakers were at least in agreement on what the major issues were, which is a great thing - it means more focus and more effort will go into solving them.
Yet, as overwhelming as it all might seem, now is still the best time to get started. Let’s get to work.
Artificial Ignorance is reader-supported. If you found this interesting or insightful, consider becoming a free or paid subscriber.