Diminishing returns: Training
OpenAI, Google, and Anthropic are discovering that bigger isn't always better, as their latest AI models fall short of expectations despite massive investments in training.
The big picture:
Foundation models are hitting roadblocks with OpenAI's Orion, Google's Gemini, and Anthropic's Claude 3.5 Opus all experiencing disappointing results relative to their costs.
The main challenge seems to be a lack of high-quality training data, but the industry’s massive hype has painted it into a corner, as "10x more GPUs" doesn't mean "a 10x better product."
For what it's worth, new approaches do seem necessary - whether that's something like OpenAI's o1, or a non-transformer architecture entirely.
Elsewhere in OpenAI:
OpenAI has made the ChatGPT desktop app for Windows available to all users, while the macOS version can now read code from some developer-focused apps.
Sources say the company plans to launch a new AI agent, codenamed Operator, which can use a computer to take actions on a person's behalf.
A policy blueprint suggests the US and its neighbors should form a "North American Compact for AI" to compete with China in talent, financing, and more.
And OpenAI published a new "Student's Guide to Writing with ChatGPT".
Diminishing returns: Benchmarks
AI companies are also developing new internal benchmarks, as existing public benchmarks are quickly becoming obsolete.
Why it matters:
New AI models consistently score above 90% on public benchmarks, making them less useful for distinguishing models by performance. These new internal benchmarks, however, are raising concerns about transparency and comparability.
While new public benchmarks like SWE-bench Verified and FrontierMath are emerging, the lack of standardized testing makes it increasingly difficult for businesses and consumers to make meaningful comparisons between different AI models.
As we shift from chatbots to agents, we'll need even more complex evaluation systems: things like sandbox environments or multi-stage problem-solving scenarios instead of simple multiple-choice questions.
Elsewhere in the FAANG free-for-all:
NASA partnered with Microsoft to create Earth Copilot, an AI chatbot that uses NASA's geospatial information to answer questions about Earth.
Google released a free Gemini app and a Gemini-powered Vids app for iOS, allowing users to interact via text, voice, or camera and use Gemini Live.
YouTube is testing a feature allowing select creators to use AI to restyle licensed songs for their Shorts using text prompts.
And Apple updated Final Cut Pro for macOS and iPadOS, adding AI closed captions and Vision Pro spatial video editing capabilities.
Art-ificial intelligence
A portrait of Alan Turing created by Ai-Da, a humanoid robot artist, sold for $1.1 million at Sotheby's - nearly ten times its estimated value.
Between the lines:
The artwork, depicting Turing as an "AI God," was created through a complex process involving multiple AI systems, 3D printing, and human assistance - challenging traditional notions of artistic creation.
Ai-Da, named after computing pioneer Ada Lovelace and operated by a team of 30 people, blurs the lines between human and machine-made art.
While not the first AI artwork to sell at auction, the substantial price and symbolic subject matter suggest growing market acceptance of AI-generated art, even as it raises questions about creativity, authorship, and artistic value.
Elsewhere in AI absurdity:
Fake AI-generated albums targeting real artists are flooding Spotify to profit from royalty fraud, exploiting a distribution process based on the honor system.
Absurd AI content about Elon Musk fixing America has gone viral on Facebook.
Anthropic is collaborating with the US DOE to ensure Claude models don't share dangerous information about nuclear energy.
An AI chatbot added to a mushroom foraging Facebook group immediately gave tips for cooking dangerous mushrooms.
And O2 has unveiled Daisy, an AI granny designed to waste scammers' time.
Things happen
François Chollet is leaving Google after nearly a decade. A Q&A with Gwern Branwen on anonymity, intelligence, and AGI timelines. High-end AI chips are creating a winner-takes-all trend in the chip sector. Elon Musk's supercomputer freaked out AI rivals. Tech companies are leasing more office space as AI demand grows. SoftBank plans to build an AI supercomputer in Japan. Ilya Sutskever says we're back in the age of wonder and discovery. Taiwan's chip production is on track to increase 22% YoY in 2024. US private data center construction spending has grown to nearly $30B/year. AI adoption rates have stalled in the US at ~33%. Baidu unveils smart glasses powered by its Ernie LLM. AI startup founders hope for lighter regulations under Trump. Alibaba claims its Qwen2.5-Coder-32B-Instruct matches GPT-4. Spotify's CTO on AI-generated music and recommendations. Nvidia B200 GPU and Google Trillium TPU debut on MLPerf Training benchmark. Ecosia and Qwant launch European Search Perspective. US orders TSMC to halt shipments of advanced chips to China. AI makes tech debt more expensive. Claude AI to process secret government data through Palantir deal. The barriers to AI engineering are crumbling fast. AI hype is cooling, according to a new survey. AlphaFold3 is now open source. You're probably not testing your AI well enough. A look at diagrams AI can and cannot generate. Facebook Research releases "Watermark Anything". Researchers detail RoboPAIR, an algorithm to bypass LLM safeguards. Concerns rise about AI-fabricated scientific data. AI disinfo has amplified satire and false narratives since August. Study finds 7% to 17% of CS peer review sentences were written by LLMs. Chegg has lost 500K+ subscribers since ChatGPT's launch. Greg Brockman has returned to OpenAI.