Gemini 1.5
Barely a week after releasing Gemini Ultra to consumers, Google announced Gemini 1.5 - a new model with improved performance and impressively large context windows.
Between the lines:
Gemini 1.5 Pro is reportedly on par with Gemini 1.0 Ultra (if this is extremely confusing, don't worry - you're not alone). And we already know Gemini 1.0 Ultra is about as good as GPT-4.
The model's standard context window is 128,000 tokens, though some early testers will have access to a 1 million token context version. Google has also had successful retrieval results on up to 10 million tokens - a staggering amount of data.
These developments could mean a significant capability improvement over GPT-4. However, this is mostly just an announcement for now - the model is being previewed with a limited number of developers and enterprise customers.
Elsewhere in the FAANG free-for-all:
Leaked internal Google documents detail Goose, a Gemini-based coding model designed for internal use.
Apple is reportedly testing a Copilot-like tool for Xcode, and is exploring other AI features for macOS.
And Nvidia's market cap surpassed Amazon's and Alphabet's, as the chip maker debuts new GPUs and LLM apps that run locally on its hardware.
Elsewhere in foundation models:
Meta releases V-JEPA, a model that learns from video by predicting the parts that have been masked out.
Stability AI shows off Stable Cascade, an improved image generator built on a new architecture.
And Amazon details BASE TTS, the largest text-to-speech model yet.
Sora
Mere hours after Gemini 1.5's announcement, OpenAI unveiled Sora, a new text-to-video model. It can create up to a minute of 1080p video, and the example clips are shockingly good. It can handle detailed prompts and render scenes that would otherwise take enormous effort to film or animate.
Why it matters:
In addition to creating new videos from scratch, Sora can animate still images, extend videos forwards or backwards in time, transform existing videos with prompts, and connect two videos.
But the creation of videos is kind of a red herring - the model is actually being trained to understand and simulate the real world. As a result, it demonstrates a remarkable ability to model space and motion.
Sora's outputs are far more photorealistic than DALL-E's, and more so than any other AI video tool I've seen. We're about to hit a point where AI-made videos are indistinguishable from the real thing, and it's worth pondering the immediate consequences of that.
Elsewhere in OpenAI:
OpenAI is testing a "memory" feature that lets ChatGPT (and custom GPTs) remember conversation details over time.
Andrej Karpathy, one of OpenAI's founding members, has left the company.
OpenAI has been denied its trademark application for "GPT."
The company has shut down accounts associated with state-affiliated actors from Russia, China, North Korea, and Iran over potential malicious activity.
Sarah Silverman's lawsuit against OpenAI has been mostly dismissed.
And the FT looks at OpenAI's business model as the company works to more than double its revenue by 2025.
Patent pending
The US Patent and Trademark Office (USPTO) has said that AI cannot be named as a patent inventor - only a natural person who has made a "significant contribution" to the invention can be.
Between the lines:
The definition of "significant contribution" really matters. For example, an inventor who asks ChatGPT to design a critical part of a remote-control car would not be eligible for a patent.
That said, the USPTO acknowledges that extensive, problem-specific prompting may itself count as a significant contribution. And more broadly, it doesn't require applicants to disclose when they used AI.
This decision makes sense to me - it's in keeping with the perception of AI as a tool, like a camera or a microscope. It would be silly to share patent credit with a camera or to ban patent applications that happened to use a microscope.
Elsewhere in AI regulation:
Congressional aides are pessimistic that a wide-ranging AI bill will pass Congress this year.
The FTC is looking to extend an existing rule against impersonating businesses to also cover impersonation of individuals, including deepfakes.
And the EU proposes criminalizing AI-generated child sexual abuse material and deepfakes.
Things happen
Calling US lawmakers with the AI voices of kids killed in mass shootings. Slack's new AI features to summarize threads, channels, and questions. Cohere's nonprofit research lab releases Aya, an open-source multilingual LLM. A look at Adobe's careful approach to Firefly, emphasizing brand safety and IP protections. The unsettling scourge of obituary spam. How AI is transforming the business of advertising. Scientists aghast at bizarre AI rat with huge genitals. AI copyright lawsuits could make the whole industry go extinct. McDonald's is making job applicants take weird AI personality tests. Antagonistic AI. The University of Michigan is selling student recordings to train AI. Microsoft to invest €3.2 billion in German AI infrastructure. Police report shows how a high school deepfake nightmare unfolded. AI needs so much power that old coal plants are sticking around. Automating ableism. AI is starting to threaten white-collar jobs. Goody-2, the world's 'Most Responsible' AI chatbot. Nvidia CEO: "Every country needs sovereign AI." Azure is now growing much faster than AWS, thanks to its relationship with OpenAI. Watermarking the future. Airbnb plans to use AI to create the "ultimate concierge."
Yeah, Sora and Gemini 1.5 were by far the biggest news this week, and both landed on the same day, just hours apart.