AI Roundup 127: ChatGPT Agent
Date: 2025-07-18
ChatGPT Agent
OpenAI launched ChatGPT Agent, a new tool that can complete multi-step tasks like planning dates, shopping, and creating presentations. It combines capabilities from the existing Operator and Deep Research products.
The big picture:
OpenAI's approach gives the AI access to a "virtual computer" with browsers, terminals, and multiple tools, but includes safeguards like asking permission before irreversible actions and restricting financial transactions.
While impressive for complex research tasks like analyzing thousands of support emails or conducting UX audits, it still has clear limitations: agents can theoretically save users hours of work, but they're designed for background tasks rather than real-time interaction.
The real battle is between different philosophies: companies like The Browser Company and Perplexity are building AI directly into browsers, while OpenAI wants to abstract the browser away entirely. Whoever wins gets to intermediate between users and the entire web.
Elsewhere in frontier models:
Google's Veo 3 video generation model launches on the Gemini API with eight-second videos costing $6, making it one of the most expensive AI video options.
Mistral releases Voxtral, its first open-source AI audio model family, and adds new features to its Le Chat chatbot, including a "deep research" mode, native multilingual reasoning, and advanced image editing.
And Moonshot's Kimi K2 uses a 1T-parameter MoE architecture with 32B active parameters and outperforms models like GPT-4.1 and DeepSeek-V3 on key benchmarks.
Elsewhere in OpenAI:
An advisory board convened by OpenAI recommends the company remain a nonprofit because AI is "too consequential" to be governed by a corporation alone.
OpenAI aims to integrate a checkout system into ChatGPT to ensure users complete transactions within the platform, with merchants paying a commission.
AI researchers from OpenAI, Google DeepMind, Anthropic, and others recommend "further research into chain-of-thought monitorability" for AI safety.
Sam Altman announces another delay for OpenAI's open-weight model, which was slated to be released next week, to allow for further safety testing.
And a former OpenAI engineer details his experience working at the company, including its culture, codebase structure, Python, Azure, rapid growth, and the Codex launch.
Deal Breaker
After weeks of rumors and all-but-confirmed reporting, OpenAI's $3B acquisition of Windsurf fell through.
Between the lines:
It first came out on Friday that Windsurf's CEO, cofounder, and key R&D staff were headed to DeepMind for $2.4B - to the rest of the team's chagrin. But by Monday, Cognition announced it would be acquiring what's left of the company - though many details are still unknown here.
While lots of reporting has pointed to the OpenAI/Microsoft licensing agreement as the main sticking point, it's unclear whether Windsurf walked away from the deal or Microsoft ultimately killed it.
This is yet another "reverse-acquihire" in which Big Tech companies sidestep regulatory scrutiny by taking talent and technology but not acquiring companies outright - a pattern we've seen with Scale AI, Character.AI, and Inflection.
And this consolidation pattern suggests the AI coding assistant market is quickly narrowing to a few dominant players, with smaller startups either getting absorbed or stripped for parts rather than competing independently.
Elsewhere in the FAANG free-for-all:
Google Search's AI Mode gets Gemini 2.5 Pro and new deep research capabilities while expanding its AI-powered business calling feature to all US users.
Google's Big Sleep AI agent, which was used to find unknown software vulnerabilities, recently discovered a critical SQLite flaw that was at risk of being exploited.
Amazon launches Kiro, an IDE that aims to bridge the gap between rapidly vibe-coded prototypes and production systems with specs, testing, and documentation.
Google adds featured notebooks to NotebookLM from publications such as The Economist and The Atlantic, as well as professors, authors, researchers, and nonprofits.
And Microsoft's Copilot struggles to make headway against rival AI assistants, with its mobile app having 79M downloads compared to ChatGPT's 900M+.
Elsewhere in the war for AI talent:
Meta hired AI researchers Mark Lee and Tom Gunter from Apple's Foundation Models team for its superintelligence lab.
OpenAI researchers Jason Wei and Hyung Won Chung are joining Meta's new superintelligence lab after working on o3 and deep research models.
The Superintelligence team has reportedly discussed abandoning Meta's open-source model Behemoth in favor of developing a closed model.
Meta completed a deal to acquire AI voice startup PlayAI with the entire team joining next week.
And Anthropic rehired Boris Cherny and Cat Wu, who developed Claude Code, just two weeks after they left for Cursor maker Anysphere.
Grok Waifu
Just days after the "MechaHitler" incident, xAI launched Grok-powered AI companions on iOS. The most viral of these 3D animated characters is "Ani," a sexually explicit AI girlfriend who engages in risqué conversations.
Why it matters:
AI companions are nothing new - Character.AI has been around for a while - but Grok appears to be attempting to capitalize on growing interest while major competitors haven't yet launched any comparable 3D avatars.
That may be because of the potential risks - OpenAI and Anthropic have both been publicly reluctant to work on features that can foster parasocial relationships, and Character.AI is already being sued by parents after its platform allegedly encouraged children to harm themselves and their families.
The timing is particularly questionable given that xAI just spent the last week doing damage control for Grok's antisemitic behavior and that the Grok iOS app is rated for children twelve and older, raising concerns about the company's AI safety priorities.
Elsewhere in AI anxiety:
A US judge ruled that three authors suing Anthropic can bring a class action on behalf of all US writers whose books Anthropic allegedly pirated to train its AI.
About 75% of S&P 500-listed firms have updated their official risk disclosures to detail AI-related risk factors in the past year.
Hugging Face users have uploaded to the platform over 5,000 AI image models that Civitai previously banned for generating nonconsensual sexual content of real people.
AI is being used by patients and doctors for diagnoses and treatment recommendations, though experts warn of confident answers that are completely wrong.
An analysis of 85 AI "nudify" websites found they average 18.5M monthly visitors and may make up to $36M annually combined while relying on Big Tech's services.
And the White House is preparing an executive order requiring AI companies with federal contracts to be neutral and unbiased to combat what officials see as "woke AI."
Things happen
Autonomous tractors and fruit-picking robots are transforming agriculture. The Gates Foundation and others are spending $1B over 15 years on AI tools for public defenders. Tesla rolls out Grok to some vehicles with AMD chips. Iran and Israel used AI content for psychological warfare during their 12-day war. Terminal-based AI tools are surprisingly gaining ground on traditional code editors. Apple faces executive succession challenges as Cook's direct reports near retirement. Payment processors are pushing AI porn off its biggest platforms. Perplexity's CEO thinks users may eventually pay thousands for a single prompt. Nvidia's rise is reminiscent of dot-com era titans like Cisco. Swedish PM pulls AI campaign tool after it was used to ask Hitler for support. Scale AI is laying off 200 employees and stopping work with 500 contractors. WindBorne uses weather balloons and AI to improve forecasting as NOAA cuts loom. In the Microsoft-OpenAI deal, "sufficient AGI" means $100B+ in profits. The hyperpersonalized AI slop silo machine is here. TSMC is speeding up construction of its Arizona plants by several quarters. OpenAI will use Google Cloud Platform for ChatGPT and its API. 7% of Steam games now disclose they use generative AI. A DOGE employee leaked a private xAI API key on GitHub. Goldman Sachs will augment its workforce with Devin, an AI software engineer. China pledged $8.5B for young AI startups to close the gap with the US. The AI bubble today is bigger than the IT bubble in the 1990s. Stanford study finds LLMs struggle with mental health questions but could support therapists. Delta plans to use AI to individually determine 20% of its fares by 2025. Anthropic adds Canva integration to let users manage designs in Claude. Reasons for writers to reject slop. The DOD awards contracts with $200M ceilings to OpenAI, Google, Anthropic, and xAI. The real future of AI is ordering mid chicken at Bojangles. Investors are floating a deal valuing Anthropic at $100B+. 
Claude Code users paying $200 a month are hitting unexpected usage limits. How Jensen Huang persuaded Trump to sell AI chips to China. The media's pivot to AI is not real and not going to work. Google announces a Pixel event for August 20 in NYC. Hypercapitalism and the AI talent wars. Anthropic launches Claude for Financial Services to help analysts with market research. Code highlighting extension for Cursor AI used for $500k theft. LLM Daydreaming. AWS rolls out Amazon Bedrock AgentCore to help businesses deploy AI agents. How Sam Altman quietly maneuvered around Elon Musk in dealing with Trump. Nvidia plans to resume H20 sales to China after US assurances. In Beijing, Jensen Huang says Nvidia will accelerate the recovery of its China chip sales. 60% of managers rely on AI to make decisions about their employees. My favorite use-case for AI is writing logs. Perplexity was valued at $18B in a $100M extension round. Ex-Google researchers unveil Asimov, an AI agent that reads codebases to help engineering teams. SpaceX agreed to invest $2B in xAI as part of its $5B fundraise. AI is already transforming Hollywood as studios experiment with new tools. Mira Murati's Thinking Machines Lab raised a $2B seed at a $12B valuation. The startup behind Manus shut down its entire China team to minimize geopolitical risks.
LLM Inevitabilism. Google Cloud is competing well against AWS in AI, attracting OpenAI and others.
Good info on the Windsurf fiasco - what a mess
I'm going to dive in this week w/agents on Jippity. I mainly just want to figure out the limitations within the scope of the work I want to do; I've heard the same advice you've given here from many commentators in the loop - very consistent to calibrate expectations on this one appropriately.
I've certainly run into loads of walls with o3 pro, but there are also a small handful of truly good use cases there.