The future is hear
Last week I did a deep dive into some voice-generating software, and it was shocking how good it was. I wrote:
In earlier drafts of this post, I was tempted to include audio samples of my cloned voice. But as I learned more about the space, it became pretty obvious that that was a bad idea. Just last week, a journalist broke into his own bank account using an AI-cloned voice. Today, scammers send phishing emails from "a family member stranded overseas" - what if they could also leave a voicemail?
I wasn't expecting to be right so soon.
The man calling Ruth Card sounded just like her grandson Brandon. So when he said he was in jail, with no wallet or cellphone, and needed cash for bail, Card scrambled to do whatever she could to help.
“It was definitely this feeling of . . . fear,” she said. “That we’ve got to help him right now.”
Card, 73, and her husband, Greg Grace, 75, dashed to their bank in Regina, Saskatchewan, and withdrew 3,000 Canadian dollars ($2,207 in U.S. currency), the daily maximum.
They hurried to a second branch for more money. But a bank manager pulled them into his office: Another patron had gotten a similar call and learned the eerily accurate voice had been faked, Card recalled the banker saying. The man on the phone probably wasn’t their grandson.
This is one of the more distressing cases of AI fraud, but it's far from the only one. A viral Instagram photographer recently confessed that his posts weren't, in fact, photographs:
Soon after [Jos] Avery's Instagram feed launched in October, positive comments about his fake photos began pouring in. "All I can say is: Your art is somehow unique, very unique, also very precious; you are actually telling paramount stories to the viewer using your cams," wrote one commenter four weeks ago. "Setting novel highlights in contemporary photography IMHO! Your work is a great delight to mind and soul."
Up until very recently, when asked, Avery was either vague about how he created the images or told people his works were actual photographs, even going so far as to describe which kind of camera he used to create them ("a Nikon D810 with 24-70mm lens"). But guilt began to build as his popularity grew.
Back in January, CNET came under fire when it was discovered to be publishing AI-generated articles without clear disclosure.
CNET is the subject of a swirling controversy around the use of AI in publishing, and it’s Jaffe’s team that’s been at the center of it all. Last week, Futurism reported that the website had been quietly publishing articles written using artificial intelligence tools. Over 70 articles have appeared with the byline “CNET Money Staff” since November, but an editorial note about a robot generating those stories was only visible if readers did a little clicking around.
It wasn’t just readers that were confused about what stories on CNET involve the use of AI. Beyond the small CNET Money team, few people at the outlet know specific details about the AI tools — or the human workflow around them — that outraged readers last week.
Look, I don't want to cry wolf here. Long before Midjourney and Stable Diffusion, Photoshop gave millions the ability to trick people with the push of a few buttons. A 1938 radio broadcast of War of the Worlds caused panic when listeners believed Martians were actually invading. We have long grappled with technology moving faster than our collective literacy about it.
The thing that feels a little bit different now is the capability and readiness of bad actors: the industrialization of technology abuse. When Photoshop came out, there weren't organized teams of hackers or internet fraudsters ready to add it to their workflows. But now, these tools can be deployed at scale before most people are even aware they exist.
FAANG free-for-all: Microsoft
We've talked about ChatGPT and its role in the battle between Microsoft and the rest of big tech[1]. My belief is that the focus on search is missing the point. Rather, the value will be in the millions of slightly different workflows that people and businesses have. And over time, as AI is more tightly coupled with existing productivity tools (and as it improves), those with integrated AI stand to win over those without.
Clearly, Microsoft plans to be part of the former. They're adding AI to Dynamics 365:
In Dynamics 365, Microsoft’s launching what it calls Copilot (borrowing branding from GitHub’s Copilot service), which — broadly speaking — aims to automate some of the more repetitive sales and customer service tasks.
For example, in Dynamics 365 Sales and Viva Sales, Copilot can help write email responses to customers and create an email summary of a Teams meeting in Outlook. The meeting summary pulls in details from the seller’s CRM, such as product and pricing information, Lamanna says, and combines them with insights from the recorded Teams call.
They're powering image captions for Reddit:
Now, as a part of Microsoft’s broader, ongoing effort to commercialize its AI research, [the AI system] Florence is arriving as a part of an update to the Vision APIs in Azure Cognitive Services. The Florence-powered Microsoft Vision Services launches today in preview for existing Azure customers, with capabilities ranging from automatic captioning, background removal and video summarization to image retrieval.
...
Montgomery says that Reddit will use the new Florence-powered APIs to generate captions for images on its platform, creating “alt text” so users with vision challenges can better follow along in threads.
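For a sense of what that looks like in practice, here's a minimal sketch of calling the Florence-powered captioning feature through the Image Analysis preview REST API. The endpoint shape, api-version, and response fields are from the preview announcement and may change; the resource name and key are placeholders.

```python
# Minimal sketch: caption an image with the Florence-powered Image Analysis
# preview API in Azure Cognitive Services. Endpoint and key are placeholders,
# and the api-version reflects the preview at the time of writing.
import requests

AZURE_ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
AZURE_KEY = "<your-key>"

resp = requests.post(
    f"{AZURE_ENDPOINT}/computervision/imageanalysis:analyze",
    params={"api-version": "2023-02-01-preview", "features": "caption"},
    headers={
        "Ocp-Apim-Subscription-Key": AZURE_KEY,
        "Content-Type": "application/json",
    },
    json={"url": "https://example.com/photo.jpg"},
)
resp.raise_for_status()
caption = resp.json()["captionResult"]
print(caption["text"], caption["confidence"])
```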
And those are just the products actually in use. On the R&D front:
They've opened up a waitlist for Microsoft Designer, an image-generating AI.
They demoed a "ChatGPT for robotics", which lets you use natural language to describe tasks to a robot. For example, "build a tower only using the yellow blocks" (sketched below).
They've announced a multi-modal LLM, which is like ChatGPT but able to work with text, images, and more. Apparently, it can “analyze images for content, solve visual puzzles, perform visual text recognition, pass visual IQ tests, and understand natural language instructions.”
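The pattern behind that robotics demo, per Microsoft's paper, is to describe a small library of robot control functions to the model and let it plan in terms of them. Here's a minimal sketch of that prompting pattern; the function library and prompt are hypothetical stand-ins, not Microsoft's actual API.

```python
# A minimal sketch of the "ChatGPT for robotics" prompting pattern:
# tell the model which robot functions exist, then hand it a task in
# plain English. The function library here is hypothetical.
SYSTEM_PROMPT = """You control a robot arm. You may ONLY call these functions:
  get_blocks() -> list of (block_id, color, position)
  pick_up(block_id)
  place_on(block_id, target)  # target is a block_id or "table"
Respond with a sequence of function calls and nothing else."""

task = "Build a tower only using the yellow blocks."

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": task},
]

# Send `messages` to a chat model, then have a human review the returned
# plan before executing it on real hardware; Microsoft's paper keeps a
# person on the loop for exactly this reason.
print(messages)
```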
Clearly Microsoft has been working on this for a long time - it's not like they've built all this in the last few months. But it is certainly impressive to see a giant, stodgy company reinvent itself over the past few years.
FAANG free-for-all: Google
Of course, it takes two to tango. For as much as Microsoft has been firing on all AI cylinders, Google has dropped the ball[2]. Back in 2016, we had articles about Google "remaking itself as a 'machine learning first' company." CEO Sundar Pichai touted the "AI-first era" they were entering.
Which made sense! They helped define the modern machine-learning tech stack. Their research group regularly put out cutting-edge papers. They owned Google Brain and DeepMind[3], and planned to bake machine learning into their products to extend their lead.
But now, the perception from the outside is that Google squandered its lead. And people are wondering how, exactly, that happened:
More than two years ago, a pair of Google researchers started pushing the company to release a chatbot built on technology more powerful than anything else available at the time. The conversational computer program they had developed could confidently debate philosophy and banter about its favorite TV shows, while improvising puns about cows and horses.
The researchers, Daniel De Freitas and Noam Shazeer, told colleagues that chatbots like theirs, supercharged by recent advances in artificial intelligence, would revolutionize the way people searched the internet and interacted with computers, according to people who heard the remarks.
They pushed Google to give access to the chatbot to outside researchers, tried to get it integrated into the Google Assistant virtual helper and later asked for Google to make a public demo available.
Google executives rebuffed them at multiple turns, saying in at least one instance that the program didn’t meet company standards for the safety and fairness of AI systems, the people said. The pair quit in 2021 to start their own company to work on similar technologies, telling colleagues that they had been frustrated they couldn’t get their AI tool at Google out to the public.
Part of the problem is that Google was the dominant player in ML, not to mention a tech giant at a time when the zeitgeist turned against tech. So any moves it made were subject to intense scrutiny.
In 2018, the company unveiled Duplex, an AI assistant that could call restaurants to make reservations or find out opening hours. The demos sounded shockingly good, and the AI would even throw in verbal tics like "um" and "uh," or take pauses. People freaked out. Also in 2018, Google employees very publicly protested Project Maven, a Pentagon contract that (among other things) intended to use AI to improve drone accuracy.
In 2020, Google parted ways with Timnit Gebru, a prominent AI ethics researcher[4], after she refused to retract a paper on the biases and risks of software like LaMDA, technology similar to what powers ChatGPT.
In the end, Google seems to have ended up a bit gun-shy. Which honestly might have been a good call, given the size of the company. I've said before that it's wild to watch Microsoft discover the sharp corners of ChatGPT in real time, with millions of users. Having startups push the envelope limits the blast radius of any experiment gone awry.
Unfortunately, the market doesn't reward you for being a responsible steward of new technology. From Bloomberg:
Senior management has declared a “code red” that comes with a directive that all of its most important products—those with more than a billion users—must incorporate generative AI within months, according to a person with knowledge of the matter. In an early example, the company announced in March that creators on its YouTube video platform would soon be able to use the technology to virtually swap outfits.
...
Google’s code red seems to have scrambled its risk-reward calculations in ways that concern some experts in the field. Emily Bender, a professor of computational linguistics at the University of Washington, says Google and other companies hopping onto the generative AI trend may not be able to steer their AI products away “from the most egregious examples of bias, let alone the pervasive but slightly subtler cases.” The spokesperson says Google’s efforts are governed by its AI principles, a set of guidelines announced in 2018 for developing the technology responsibly, adding that the company is still taking a cautious approach.
The current pace of AI development seems to involve a lot more "move fast and break things" than it used to.
CG(A)I
I don't have anything to say about this video, it's just really cool[5].
No, that's a lie. I've got something to say[6].
The jury's still out on whether generative AI will replace artists. It's certainly appropriating their styles, and artists are suing to stop that. But it's too early to know whether digital artists will go the way of the dodo. There's still going to be value in human-created art[7].
That said, if there is an area that's likely to be disrupted by generative AI, VFX seems like a good candidate. It's a notoriously cost-conscious industry that looks for any and all ways to shave expenses; companies relocate on a regular basis to chase tax breaks. And it's a commercial enterprise, which means artistic value matters less than the project budget. I would not be shocked to see AI-generated VFX quickly become the norm. Some companies are already working on it, via AI-generated music videos and natural-language VFX.
On a similar note, Corridor Digital recently created an anime sequence using rough live-action footage, green screens, and AI.
Many people are dunking on the quality of the final output, but to me that's shortsighted. Does it look visually stunning? Not really, especially if you're a fan of anime. But Corridor Digital created something interesting and valuable to them at a fraction of the usual cost.
Back in 2011, the average 30-minute anime episode cost hundreds of thousands of dollars. And now it's doable with $500 of equipment and the right know-how. We now have capabilities that used to only be possible with the biggest of budgets.
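The per-frame core of that workflow is plain image-to-image diffusion. Here's a minimal sketch using the open-source diffusers library; Corridor's actual pipeline involved a custom-trained model and plenty of manual cleanup, so treat the model choice and settings here as illustrative.

```python
# Minimal sketch: restyle one live-action frame as anime via img2img.
# Model choice and settings are illustrative, not Corridor's pipeline.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

frame = Image.open("frames/frame_0001.png").convert("RGB")
stylized = pipe(
    prompt="anime style, cel shading, clean line art",
    image=frame,
    strength=0.4,       # low strength preserves the actor's pose and framing
    guidance_scale=7.5,
).images[0]
stylized.save("stylized/frame_0001.png")
# Repeat per frame; keeping strength low (and the seed fixed) reduces flicker.
```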
Things happen
Roastedby.ai – “talk trash, have fun”. “I Coaxed ChatGPT Into a Deeply Unsettling BDSM Relationship.” Slack launches ChatGPT bot. Discord tests ChatGPT-powered bot, Clyde. Anthropic’s core views on AI safety. AI teaches itself to use an API. Rumors of GPT-4. Bing Chat has a secret “celebrity impersonation” mode. Noam Chomsky: The False Promise of ChatGPT.
I guess it’s, uh, MAANA now? GAMMA? MAMAN?
At least, that’s how it seems from the outside. In reailty, they employ tens of thousands of very smart people, and the narrative of “Google can’t innovate, their downfall is right around the corner” is a little too convenient for me.
DeepMind, to its credit, put out some really impressive AI demos, including beating world-class professional Go and StarCraft 2 players. But nothing available to consumers.
It’s a bit of a she-said, megacorporation-said situation. Gebru has claimed she was fired, but Google has claimed she wasn’t (and that her research wasn’t sufficiently rigorous).
Some context here - the video clips are from Studio Paranormal. It’s circulated online as being a product of Stable Diffusion, but Studio Paranormal haven’t officially said whether that’s correct.
¯\_(ツ)_/¯
The existence of modern art is all the evidence I need here. If you look at a $10,000 painting and think "I could do that," then why does it matter if a computer can do it too?