Welcome to a new experiment - the first ever Model Memo! The last 60 days have seen a crazy number of foundation model releases, not to mention a deluge of open-source variants of DeepSeek's R1. But in talking with friends and colleagues, most people just want to know: is the model good?
The answer, of course, it almost always "it depends." Each model is going to shine in certain areas, and fall flat in others. But I wanted to offer this recap as a paid bonus, for readers who haven't had a chance to try the latest and greatest, and mostly just want to know if they should switch their daily driver.
In the last 45 days alone, we've seen a flurry of next-generation language models and features. Some have been focused on scaling up existing systems, others are continuing the trend of "reasoning" models, and still others are experimenting with new agentic UX.
Key Themes
Several New SOTA Models: Multiple foundation models debuted with significantly improved reasoning and knowledge – OpenAI's GPT-4.5, Anthropic's Claude 3.7, xAI's Grok 3, and Google's Gemini 2.0 – each pushing state-of-the-art performance in different areas.
Agentic AI Features: Major platforms introduced agent-like tools (OpenAI's Operator and Deep Research, Perplexity's Deep Research, Anthropic's Claude Code) that enable AI to browse the web, use tools, or write and execute code on behalf of users, signaling a trend toward AI that acts as well as chats.
Focus on Reasoning & Accuracy: A key theme is improved reasoning and lower hallucination rates. Models like GPT-4.5 were tuned to hallucinate less, and hybrid "thinking" modes (Claude 3.7's extended mode, Grok 3's RL-trained reasoning) achieve breakthrough results on hard benchmarks.
Premium Access & Pricing Shifts: The most powerful models now come at a premium. OpenAI's GPT-4.5 is initially limited to $200/month Pro subscribers and carries very high API costs, and xAI's Grok 3 led to a doubling of X Premium+ pricing (to $50/mo). At the same time, some new features, like Deep Research, are offered free (with limits).