Grok 3
xAI released Grok 3, the latest family of models to incorporate reasoning capabilities and top the benchmarks.
Why it matters:
Grok 3's benchmark scores beat out o1 and R1, though they fall short of the unreleased o3. Early reviews are positive, and the model has been doing well on the LMSYS Chatbot Arena.
The company is also baking new features into its standalone Grok chatbot, such as Deep Search, a "Deep Research" competitor. Currently, the latest Grok models are behind an X Premium+ subscription.
While xAI is still following in OpenAI's footsteps, it's remarkable how quickly they have built a serious AI lab. Their data center alone - a staggering 200,000 H100s - suggests xAI is trying to win through sheer brute-force computing.
Elsewhere in frontier models:
Microsoft unveiled Muse AI, a groundbreaking generative AI model that creates game environments based on visual inputs and controller actions.
Rabbit demonstrated its generalist Android agent controlling tablet apps, building on its earlier web agent technology.
Mistral launched Mistral Saba, a 24B-parameter model specifically trained for Arabic language and cultural content.
And Perplexity open-sourced R1 1776, a modified version of DeepSeek R1 that reportedly removes Chinese censorship elements.
Humanoids
Meta is making a major push into humanoid robotics, focusing on the underlying AI, sensors, and software platform that could power robots made by others.
The big picture:
Many are pinning their hopes on humanoids as the next big technology wave - Nvidia's CEO has called it a multitrillion-dollar opportunity, and Elon Musk predicts we'll see ten billion humanoid robots by 2040.
We're still a long way away - promising demos from companies like Figure and Unitree have been released, but nothing is available for consumer purchase yet. That said, the underlying software will continue to get better. Just this week, Microsoft Research introduced Magma, an integrated foundation model with improved spatial awareness.
As the industry grows, Meta aims to become the "Android" of the ecosystem - offering its software for free while setting industry standards from which it can benefit. However, it may have competition from, unsurprisingly, Apple.
Elsewhere in the FAANG free-for-all:
Meta announced LlamaCon, its first developer conference focused on generative AI, scheduled for April 29.
Apple plans to enhance Vision Pro with Apple Intelligence and new features, while its ambitious Siri overhaul faces technical challenges and possible delays.
And Google launched a Gemini-powered AI tool to assist biomedical scientists in hypothesis generation and research acceleration.
Kids these days
As if the threat of automation weren't enough, the programming industry has a new worry: junior developers can't actually code.
Between the lines:
To some extent, I agree with the core premise: AI tools let junior devs copy/paste generated code blocks and error messages back and forth without thinking about how the code works or how it might break.
As a whole, the industry is headed towards a difficult spot: after years of programmer shortages, job openings have hit a five-year low. And it's even bleaker for juniors - many companies don't want to train junior devs, while companies like OpenAI and Meta are banking on replacing mid-level engineers as soon as this year.
For what it's worth, senior developers aren't entirely replaceable just yet. OpenAI released SWE-Lancer, a benchmark consisting of real-world freelance coding tasks - and the latest models barely cracked 40% completion.
Elsewhere in AI anxiety:
Anthropic asks a court to nix a DOJ remedy that would block Google from investing in AI startups, saying it would harm competition.
Sam Altman and OpenAI's board weigh governance changes, including special voting rights for its nonprofit board, to deter hostile bids.
NIST plans to cut 497 people, including most staff at its Chips for America program, leaving the AI Safety Institute's future uncertain.
And the UK recast its AI Safety Institute as an AI Security Institute, dropping its focus on bias concerns, following JD Vance's criticism of "hand-wringing about safety".
Things happen
Le Chat tops 1M downloads in just 14 days. HP acquires Humane's assets for $116M. Fiverr unveils tools for gig workers to configure AI models trained on their work. Former OpenAI CTO Mira Murati launches AI startup Thinking Machines Lab. Please Stop Inviting AI Notetakers to Meetings. Investors plan massive AI data center in South Korea. How militaries and startups use AI to safeguard deep-sea infrastructure. Anthropic expects to burn $3B in 2025. NYT greenlights select AI tools for staff use. Match is using AI to detect men's "off-color" messages on Tinder. Walmart's tech and AI investments help it take on Amazon. US states consider "algorithmic discrimination" bills similar to EU's AI Act. Sources describe disputes at Google over AI efforts. Detecting AI agent use and abuse. DeepSeek's rise turns spotlight on Hangzhou as China's AI hub. Perplexity releases Deep Research feature more widely. Dell nears $5B+ deal with xAI for servers. How Altman blindsided Musk with $500B Stargate deal. South Korea suspends DeepSeek app downloads over data rules. AI killed the tech interview - now what? Nvidia launches AI platform to teach ASL. EU organizations partner to develop open-source European LLMs. Google removes Gemini support from main iOS app. Q&A with Satya Nadella on Microsoft's AGI plan and more. ChatGPT crosses 400M weekly active users. Zuckerberg lobbies senators on AI at US Capitol. Meta leads charge against EU's AI Act. ASE Technology opens new plant in Malaysia. EU digital chief says AI regulation reduction not due to US pressure. The Generative AI Con. Woman gets banned for AI voice clone saying "arse". My LLM codegen workflow.
Last week's roundup
Plus whatever the hell this is from Clone Robotics.
What are the downsides to not understanding the kernel? I see this happening everywhere, not just in CS and related fields - like nobody knows how to write cursive any more, and pretty soon maybe nobody will remember how to make letters with a pencil. Before that, we sort of phased out memorizing long passages, etc.
With each of these, and over a long enough time frame, the trade-off was worth it; but in the shorter term, I reckon there were some very disruptive moments. I'm guessing the lack of folks understanding how these core languages actually work will lead to some surprises, but also that we'll get past this awkward, temporary phase where it hurts us more than it helps us.