Exactly one year ago, Humanloop published an interview between its CEO Raza Habib and Sam Altman. The interview notes were taken down, but the Internet never forgets, and you can still find a copy via the Wayback Machine.
In those interview notes, Altman discussed OpenAI's roadmap for the rest of 2023 and early 2024.
Sam shared what he saw as OpenAI’s provisional near-term roadmap for the API.
2023:
Cheaper and faster GPT-4 — This is their top priority. In general, OpenAI’s aim is to drive “the cost of intelligence” down as far as possible and so they will work hard to continue to reduce the cost of the APIs over time.
Longer context windows — Context windows as high as 1 million tokens are plausible in the near future.
Finetuning API — The finetuning API will be extended to the latest models but the exact form for this will be shaped by what developers indicate they really want.
A stateful API — When you call the chat API today, you have to repeatedly pass through the same conversation history and pay for the same tokens again and again. In the future there will be a version of the API that remembers the conversation history.
2024:
Multimodality — This was demoed as part of the GPT-4 release but can’t be extended to everyone until after more GPUs come online.
Here's how that roadmap has played out in the twelve months since:
Cheaper and faster GPT-4: check. This launched last November with GPT-4 Turbo.
Longer context windows: (mostly) check. GPT-4 Turbo shipped with a 128K-token context window, which isn't quite the million tokens Altman predicted. That said, longer windows have been on OpenAI's radar for months, even before Gemini launched the first million-token context window in February.
Finetuning API: check. GPT-3.5 Turbo, the model behind the free tier of ChatGPT, has been fine-tunable since August, and additional fine-tuning tools for developers were released in April.
A stateful API: check. This arrived as the Assistants API, which persists conversation threads and system instructions server-side (a minimal sketch of the difference follows this list).
Multimodality: check. While GPT-4 with Vision went into general availability in April, it was quickly followed by GPT-4o, which natively handles text, audio, images, and video.
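To make the stateful-API item concrete, here's a minimal sketch of the stateless pattern Altman was describing, using the OpenAI Python SDK: each turn re-sends (and re-bills) the entire conversation history, whereas the Assistants API keeps the thread server-side so only the new message goes over the wire. The model name and helper function are illustrative, not OpenAI's reference code.

```python
# Sketch of the "stateless" Chat Completions pattern: the client keeps the
# history and re-sends all of it on every call, paying for those tokens again.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
history = [{"role": "system", "content": "You are a helpful assistant."}]

def ask(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-4-turbo",   # illustrative model name
        messages=history,      # the full conversation goes over the wire each turn
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply
```

With the Assistants API, by contrast, the thread lives on OpenAI's side: the client creates a thread once, appends only the new user message each turn, and runs the assistant against the stored thread instead of replaying the whole history itself.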
What strikes me is not just that nearly everything Altman listed has come to pass, but also that it was all on the roadmap a full year ago. And that's before counting the major releases he didn't mention: GPTs, native apps, Sora, and a whole lot more.
Last week gave us another example of the timescale on which OpenAI operates. After legal requests from Scarlett Johansson, the company published a timeline of how it created ChatGPT's voices. From start to finish, the voice features took nearly nine months.
In September of 2023, we introduced voice capabilities to give users another way to interact with ChatGPT.
...
In early 2023, to identify our voice actors, we had the privilege of partnering with independent, well-known, award-winning casting directors and producers.
...
On May 10, 2023, the casting agency and our casting directors issued a call for talent. In under a week, they received over 400 submissions from voice and screen actors.
Of course, OpenAI is far from alone in taking months to design and release a feature. Meta, which has the closest thing to infinite money to spend on AI R&D, took six months to train Llama 2, and Google's Gemini builds on research breakthroughs made months or years earlier.
I bring all this up for a couple of reasons.
One: let them cook. If you're keeping up with this space, you're used to seeing bombshell announcements every other week. Maybe that will continue, maybe it won't. But know that those announcements took months, if not years, to come together. Microsoft and OpenAI have been collaborating since 2019; Microsoft has had that long to start dreaming up ideas like the Copilot key on new Windows machines.
Likewise, while the internet was convinced that Google was asleep at the wheel, it was working on Gemini, the first production model capable of handling a one-million-token context window. And despite some hiccups, Google is both continuing to improve the model's capabilities and releasing smaller, faster versions to boot.
We must remember that getting to new fundamental research breakthroughs takes time. As tech consumers, we've been spoiled by yearly releases from Apple and Google showcasing new hardware and software. But research doesn't work that way, and yes, a lot of this stuff is still bottlenecked by research. Most people don't realize it took three years to get from GPT-3 to GPT-4. Expecting a similar jump in capability, from GPT-4 to GPT-5, in just a year seems bonkers.
Two: the future is already here. There's a quote that's used a lot when it comes to technology: "The future is already here – it's just not evenly distributed." It's attributed to William Gibson, author of Neuromancer and Count Zero. And it's true of many things; for example, most of my friends outside of tech have no idea that you can, today, call a self-driving Waymo as if it's an Uber and have it take you across San Francisco.
In the case of AI, it's not only that the future isn't evenly distributed; much of it is still waiting for distribution. Enormous capabilities are still in development, visible only to the researchers and engineers working on them. Even just considering OpenAI, we know about Sora and Voice Engine, two completely new modalities unavailable to the public. And plenty of other models are in the pipeline from OpenAI, Anthropic, DeepMind, Midjourney, Stability, and others. These companies are training dozens, if not hundreds, of different models to test various parameters and datasets.
Of course, the companies sitting on these capabilities have plenty of reasons not to launch them yet: safety considerations, product decisions, and maximizing PR impact. But even if the overall pace of launches slows down, I suspect we will see the big AI labs continue to share the breakthroughs that exist today in the coming months and years.
I think about Moore's Law and that particular roadmap all the time. I wonder if there will be a similar roadmap for AI that makes itself evident over the next few years, kind of a self-fulfilling prophecy for generative AI.