AI Roundup 104: Deep Research

February 7, 2025.

Housekeeping

I'll be attending HumanX, the AI conference that’s set to redefine the future of technology. It takes place March 10-13, 2025 at the Fontainebleau Las Vegas, where the brightest minds in AI will gather to shape what’s next!

I’m excited to extend an exclusive offer to attend HumanX 2025. Register now with code HX25p_artificialignorance and save $250 on general admission.

If you end up getting a ticket, let me know! I'd love to do a reader meetup at the event.

Deep Research

In the aftermath of the DeepSeek hype cycle, OpenAI released a few new models and products late last week.

Here's the latest:

o3-mini is the company's latest and most cost-effective reasoning model. Despite significantly outperforming GPT-4o on coding benchmarks, it costs less than half as much per token.
Deep Research is a reasoning agent that performs multi-step research on the internet for complex tasks. It's powered by a version of o3 optimized for web browsing and data analysis, and early reviews have been very positive.
And the company also unveiled its first-ever rebrand, complete with a bespoke font, new logo, and updated color palette.

Elsewhere in frontier models:

Google expands its AI offerings by making Gemini 2.0 broadly available and introducing three new models: 2.0 Flash, 2.0 Flash-Lite, and 2.0 Pro Experimental.
ByteDance researchers showcase OmniHuman-1, a new system that can generate entire deepfake videos from a single reference image and audio clip.
Anthropic introduces Constitutional Classifiers, a protective layer for LLMs designed to prevent model jailbreaking and monitor harmful content.
And researchers from Stanford and UW claim they created an AI reasoning model, s1, for less than $50 by distilling Gemini 2.0.

The agent awakens

GitHub Copilot, one of the most widely used AI coding tools (if not the most widely used), announced a major upgrade: Agent mode.

Why it matters:

"Agent mode" enables Copilot to iterate on its own code, self-heal errors, and complete subtasks autonomously. Vision for Copilot lets the tool work with mockups and UI designs.
While other coding tools have had similar features for a few months, Copilot's reach is massive - the product has over 1.3 million paying users as of early 2025.
GitHub clearly has bigger ambitions: Project Padawan, coming later this year, will turn Copilot into an autonomous team member, handling entire workflows from issue assignment to PR creation and review responses.

Elsewhere in FAANG free-for-all:

Google is testing an AI Mode in Search powered by Gemini 2.0 that enables users to ask exploratory questions and receive AI-generated responses.
AWS is developing automated reasoning technology that uses mathematical logic to prevent AI hallucinations.
Meta is restructuring teams by combining Facebook and Messenger units while reorganizing its AI group ahead of planned layoffs.
And Microsoft has created an Advanced Planning Unit within its AI division to study the broader implications of artificial intelligence on society, health, and work.

Law and order

This week, the EU AI Act gained some teeth as new prohibitions (and penalties) on specific types of AI systems took effect.

The big picture:

The EU now bans several specific AI applications, including emotion tracking in workplaces, manipulative "dark patterns" for financial gain, and unverified criminal behavior prediction by police.
The penalties for violations are notably steep: up to €35 million or 7% of global annual revenue, whichever is higher, surpassing even the GDPR's ceiling of €20 million or 4%.
As the US moves away from AI regulation (and regulation in general), the EU stands in stark contrast - sending a clear message that AI compliance will be taken seriously.

Elsewhere in AI geopolitics:

The upcoming AI Action Summit in Paris will focus on open source, clean energy, and AI principles rather than new regulations.
Former FTC Chair Lina Khan suggests that DeepSeek's breakthroughs demonstrate insufficient competition among US AI companies and the potential for foreign startups to outpace the US.
And DOGE is using AI tools in Microsoft Azure to analyze sensitive financial data for potential budget cuts.

Elsewhere in AI anxiety:

The UK became the first country to introduce new laws making it illegal to possess, create, or distribute AI tools designed to produce CSAM.
Security researchers found that DeepSeek's R1 failed to detect or block malicious prompts, and its restrictions could be easily bypassed.
Google has removed language from its AI Principles that previously prohibited AI applications likely to cause overall harm.
And Meta has outlined risky AI systems it won't release, including those that could aid in cybersecurity, chemical, and biological attacks.
