AI has a CSAM problem
LAION-5B is a massive image dataset that companies like Stability AI use to train image generation models such as Stable Diffusion. This week, Stanford researchers found that over one thousand images in the dataset were instances of child sexual abuse material (CSAM).
Between the lines:
The relative percentage of CSAM images is small - there are over five billion images in the entire dataset.
Even so, the dataset has been taken down "out of an abundance of caution" - as it's unclear whether distributing the dataset as a whole might constitute distributing illegal materials.
But this will be an ongoing problem for companies that train models on scraped public internet content - there is far too much CSAM on platforms like Facebook and Twitter, despite attempts to moderate it.
Elsewhere in problematic AI content:
Twitter is showing ads for apps that use AI to "undress" women in photos, while TikTok and Meta are blocking search terms related to said "nudify" apps.
Etsy, despite its anti-porn policy, has merchants selling AI-generated porn, including deepfaked images of dozens of celebrities.
And Facebook is currently flooded with AI-generated images that are poor imitations of genuine artists' work.
A peek at Apple's AI strategy
Apple recently released a new paper that hints at how the company is thinking about AI. In a nutshell: LLMs optimized to run with far less memory (e.g., on an iPhone).
Why it matters:
Apple has remained out of the spotlight with AI this year - although the company has done plenty of work with machine learning, it has conspicuously avoided any splashy "AI" announcements.
Earlier (unconfirmed) reports suggest that the company is working on LLM-based projects internally, with mixed messaging on how far along those projects are.
That said, the company is uniquely positioned to launch a privacy-first, smartphone-native AI assistant - unlike Microsoft or Google, it has extensive expertise in both consumer hardware and chip design.
Elsewhere in the FAANG free-for-all:
Microsoft Copilot can now compose songs, including lyrics and voices, via a new integration with AI music app Suno.
Google plans to limit election-related searches that Bard responds to starting in 2024, in preparation for elections worldwide.
And OpenAI has suspended ByteDance (the parent company of TikTok) after reports that ByteDance used ChatGPT's API to develop its own LLM, dubbed "Project Seed."
People are worried about AI job loss
An ongoing theme this year has been the fear of job loss due to AI. It's clear that the workplace will be significantly affected - the big questions are to what extent and how quickly.
By the numbers:
A recent survey from ResumeBuilder indicates that 37% of business leaders used AI to replace workers in 2023 - though most big brands haven't yet created any AI-specific roles.
Some firms, like Deloitte, are trying to avoid over-hiring in the first place by using AI to automate repetitive tasks (and ideally avoid mass layoffs down the line).
Meanwhile, according to Asana's "State of AI at Work" report, 36% of employees use AI weekly, and 29% of their work is replaceable by AI.
Elsewhere in AI anxiety:
OpenAI updated its safety guidelines for AI, saying that the board can hold back a new model's release even over objections from leadership (including the CEO).
Amazon's AI-generated product review summaries can sometimes misrepresent products or exaggerate negative feedback, hurting sales.
And Rite Aid has been banned from using facial recognition systems for five years as part of a settlement with the FTC.
Things happen
Only three states have enacted laws concerning deepfakes in political campaigns. Bill Gates: "AI is about to supercharge innovation." Creators and porn stars alike turn to AI doppelgangers to keep fans entertained. South Korea's AI industry wants to compete with US giants. Midjourney v6 starts rolling out. UK Supreme Court rules AI can't be a "patent inventor." OpenAI's publishing deals are a hedge against a future walled-garden internet. New AI beats humans in a marble-rolling Labyrinth game. AI fake news is creating a "misinformation superspreader." Anthropic adds expanded legal protections to its API. Details on Stability AI's Membership program. How RAND Corporation helped craft Biden's AI EO. CeruleanAI is helping save coral reefs. ChatGPT conversations can now be archived. Nvidia CEO: "We bet the farm on AI and no one knew it." AI is owned by Big Tech. Interviews with Andrew Bosworth, Andrew Ng, and Fei-Fei Li.
My prediction for 2024 is that we'll see a proliferation of smaller language (and vision) models and active research into efficient attention mechanisms or alternatives. We've already seen the latter progress this year.