AI Roundup 093: Diminishing returns

November 15, 2024.


Diminishing returns: Training

OpenAI, Google, and Anthropic are discovering that bigger isn't always better, as their latest AI models fall short of expectations despite massive investments in training.

The big picture:

Foundation models are hitting roadblocks, with OpenAI's Orion, Google's Gemini, and Anthropic's Claude 3.5 Opus all delivering disappointing results relative to their costs.
The main challenge seems to be a lack of high-quality training data, and the industry's massive hype has painted it into a corner: "10x more GPUs" doesn't mean "a 10x better product."
For what it's worth, new approaches do seem necessary - whether that's something like OpenAI's o1, or a non-transformer architecture entirely.

Elsewhere in OpenAI:

OpenAI has made the ChatGPT desktop app for Windows available to all users, while the macOS version can now read code from some developer-focused apps.
Sources say the company plans to launch a new AI agent, codenamed Operator, which can use a computer to take actions on a person's behalf.
A policy blueprint suggests the US and its neighbors should form a "North American Compact for AI" to compete with China in talent, financing, and more.
And OpenAI published a new "Student's Guide to Writing with ChatGPT".

Diminishing returns: Benchmarks

AI companies are also developing new internal benchmarks, as existing public benchmarks are quickly becoming obsolete.

Why it matters:

New AI models consistently score above 90% on public benchmarks, making them less useful for distinguishing models by performance. These new internal benchmarks, however, are raising concerns about transparency and comparability.
While new public benchmarks like SWE-bench Verified and FrontierMath are emerging, the lack of standardized testing makes it increasingly difficult for businesses and consumers to make meaningful comparisons between different AI models.
As we shift from chatbots to agents, we'll need even more complex evaluation systems: things like sandbox environments or multi-stage problem-solving scenarios instead of simple multiple-choice questions.

Elsewhere in the FAANG free-for-all:

NASA partnered with Microsoft to create Earth Copilot, an AI chatbot that uses NASA's geospatial information to answer questions about Earth.
Google released a free Gemini app and a Gemini-powered Vids app for iOS, allowing users to interact via text, voice, or camera and use Gemini Live.
YouTube is testing a feature allowing select creators to use AI to restyle licensed songs for their Shorts using text prompts.
And Apple updated Final Cut Pro for macOS and iPadOS, adding AI closed captions and Vision Pro spatial video editing capabilities.

Art-ificial intelligence

A portrait of Alan Turing created by Ai-Da, a humanoid robot artist, sold for $1.1 million at Sotheby's - nearly ten times its estimated value.

Between the lines:

The artwork, depicting Turing as an "AI God," was created through a complex process involving multiple AI systems, 3D printing, and human assistance - challenging traditional notions of artistic creation.
Ai-Da, named after computing pioneer Ada Lovelace and operated by a team of 30 people, blurs the lines between human and machine-made art.
While not the first AI artwork to sell at auction, the substantial price and symbolic subject matter suggest growing market acceptance of AI-generated art, even as it raises questions about creativity, authorship, and artistic value.

Elsewhere in AI absurdity:

Fake AI-generated albums targeting real artists are flooding Spotify to profit from royalty fraud, exploiting a distribution process based on the honor system.
Absurd AI content about Elon Musk fixing America has gone viral on Facebook.
Anthropic is collaborating with the US Department of Energy to ensure Claude models don't share dangerous nuclear-related information.
An AI chatbot added to a mushroom foraging Facebook group immediately gave tips for cooking dangerous mushrooms.
And O2 has unveiled Daisy, an AI granny designed to waste scammers' time.