Housekeeping
I'll be attending HumanX, the AI conference that’s set to redefine the future of technology. Taking place March 10-13, 2025, at The Fontainebleau Las Vegas, this forum is where the brightest minds in AI will gather to shape what’s next!
I’m excited to extend an exclusive offer to attend HumanX 2025. Register now with code HX25p_artificialignorance and save $250 on general admission.
If you end up getting a ticket, let me know! I'd love to do a reader meetup at the event.
Deep Research
In the aftermath of the DeepSeek hype cycle, OpenAI released a few new models and products late last week.
Here's the latest:
o3-mini is the company's latest and most cost-effective reasoning model. Despite significantly outperforming GPT-4o on coding benchmarks, it costs less than half as much per token.
Deep Research is a reasoning agent that performs multi-step research on the internet for complex tasks. It's powered by a version of o3 optimized for web browsing and data analysis, and early reviews have been very positive.
And the company unveiled its first-ever rebrand, complete with a bespoke font, new logo, and updated color palette.
Elsewhere in frontier models:
Google expands its AI offerings by making Gemini 2.0 broadly available and rolling out three new models: 2.0 Flash, 2.0 Flash-Lite, and 2.0 Pro Experimental.
ByteDance researchers showcase OmniHuman-1, a new system that can generate entire deepfake videos from a single reference image and audio clip.
Anthropic introduces Constitutional Classifiers, a protective layer for LLMs designed to prevent model jailbreaking and monitor harmful content.
And researchers from Stanford and UW claim they created an AI reasoning model, s1, for less than $50 by distilling from Gemini 2.0.
The agent awakens
GitHub Copilot, one of the most widely used AI coding tools (if not the most widely used), announced a major upgrade: Agent mode.
Why it matters:
"Agent mode" enables Copilot to iterate on its own code, self-heal errors, and complete subtasks autonomously. Vision for Copilot lets the tool work with mockups and UI designs.
While other coding tools have had similar features for a few months, Copilot's reach is massive - the product has over 1.3 million paying users as of early 2025.
GitHub clearly has bigger ambitions: Project Padawan, coming later this year, will turn Copilot into an autonomous team member, handling entire workflows from issue assignment to PR creation and review responses.
Elsewhere in the FAANG free-for-all:
Google is testing an AI Mode in Search powered by Gemini 2.0 that enables users to ask exploratory questions and receive AI-generated responses.
AWS is developing automated reasoning technology that uses mathematical logic to prevent AI hallucinations.
Meta is restructuring teams by combining Facebook and Messenger units while reorganizing its AI group ahead of planned layoffs.
And Microsoft has created an Advanced Planning Unit within its AI division to study the broader implications of artificial intelligence on society, health, and work.
Law and order
This week, the EU AI Act gained some teeth as new prohibitions (and punishments) on specific types of AI systems took effect.
The big picture:
The EU now bans several specific AI applications, including emotion tracking in workplaces, manipulative "dark patterns" for financial gain, and unverified criminal behavior prediction by police.
The penalties for violations are notably steep - up to €35 million or 7% of global annual revenue, whichever is higher, surpassing even the GDPR's substantial fines.
As the US moves away from AI regulation (and regulation in general), the EU stands in stark contrast - sending a clear message that AI compliance will be taken seriously.
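To put the penalty ceiling in perspective, here is a rough back-of-the-envelope sketch. It assumes the cap for prohibited practices is the higher of the two figures above (the flat €35 million or 7% of global annual revenue). The function name and the revenue figures are hypothetical, chosen only for illustration, and this is arithmetic, not legal advice.

```python
# Rough sketch of the maximum-fine ceiling for prohibited AI practices.
# Assumption: the cap is the higher of a flat EUR 35 million or 7% of
# global annual revenue. All amounts are whole euros (integers).

FIXED_CAP_EUR = 35_000_000
REVENUE_SHARE_PERCENT = 7


def max_fine_eur(global_annual_revenue_eur: int) -> int:
    """Return the upper bound on the fine for a given annual revenue."""
    revenue_based_cap = global_annual_revenue_eur * REVENUE_SHARE_PERCENT // 100
    return max(FIXED_CAP_EUR, revenue_based_cap)


if __name__ == "__main__":
    # Hypothetical companies, for illustration only.
    for revenue in (100_000_000, 500_000_000, 2_000_000_000):
        print(f"revenue EUR {revenue:,} -> ceiling EUR {max_fine_eur(revenue):,}")
```

Under these assumptions, the flat €35 million figure is the binding ceiling for any company with less than €500 million in annual revenue. Above that, the 7% figure takes over and keeps scaling with company size.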
Elsewhere in AI geopolitics:
The upcoming AI Action Summit in Paris will focus on open source, clean energy, and AI principles rather than new regulations.
Former FTC Chair Lina Khan suggests that DeepSeek's breakthroughs demonstrate insufficient US competition and the potential for foreign startups to outpace the US.
And DOGE is using AI tools in Microsoft Azure to analyze sensitive financial data for potential budget cuts.
Elsewhere in AI anxiety:
The UK became the first country to introduce new laws making it illegal to possess, create, or distribute AI tools designed to produce CSAM.
Security researchers found that DeepSeek's R1 failed to detect or block malicious prompts, and its restrictions could be easily bypassed.
Google has removed language from its AI Principles that previously prohibited AI applications likely to cause overall harm.
And Meta has outlined risky AI systems it won't release, including those that could aid in cybersecurity, chemical, and biological attacks.
Things happen
DOGE is developing a custom AI chatbot for the US GSA. OpenAI co-founder John Schulman joins Mira Murati's stealth AI startup. DeepSeek suspends API credit top-ups due to server capacity shortages. AI-restored Beatles song wins Grammy. MLCommons and Hugging Face release massive speech dataset for AI research. OpenAI is expected to air its first Super Bowl ad this weekend. Mistral releases mobile apps and updates AI assistant. DeepSeek's R1 ranks lowest in cybersecurity among leading AI models. Microsoft AI CEO poaches three former Google DeepMind colleagues. SoftBank commits $3B annually to use OpenAI's tech. Robotics startup Figure exits OpenAI deal to focus on in-house AI. Adobe adds contract intelligence to Acrobat's AI Assistant. DeepSeek tops most downloaded mobile apps list. Add "fucking" to your Google searches to neutralize AI summaries. How to train an AI image model on yourself. A look at Future You, an AI-based research tool. Why does AI slop feel so bad to read? OpenAI's Operator is more demo than product. DeepSeek is supercharging the debate over AI transparency. Replace your CEO with an LLM. DeepSeek's rise shows why China's top AI talent is skipping Silicon Valley. A profile of incoming White House OSTP director Michael Kratsios. AI adoption has outpaced PCs and the internet, but productivity gains remain elusive. US man pleads guilty to cyberstalking campaign using AI chatbots. Sam Altman says OpenAI "on the wrong side of history" regarding open-sourcing. Amazon to preview Alexa generative AI revamp. Google's X spins out Heritable Agriculture.
The following report provides insights into s1 and DeepSeek-R1 that you may find valuable:
From Brute Force to Brain Power: How Stanford's s1 Surpasses DeepSeek-R1
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5130864