How do you regulate a superintelligence?
A bottom-up look at building oversight for ChatGPT.
What do deadly picnics, crash test dummies, and dreams of Hitler have in common?
The world is scrambling to try and regulate AI right now. Last week, Sam Altman testified in front of the US Senate. He acknowledged the risks and dangers of AI, and welcomed the idea of oversight.
AI regulation in the US is becoming more and more likely. Congress feels it missed the boat on regulating social media and wants to make up for lost time. Even OpenAI is at the point of suggesting governance rules for superintelligence.
So let's consider what that could look like, using ChatGPT as an example.
A big ol' disclaimer
Fortunately for everyone involved, I am not responsible for any sort of policymaking. While I have a technical background, I do not consider myself an expert in machine learning or public policy. Also, it's worth noting my bias here: I'm an American, and only familiar with existing US laws and agencies.
Anatomy of a ChatGPT
To illustrate how we might regulate AI, let’s take apart an example product - specifically ChatGPT.
Building a large language model like ChatGPT is a multi-stage process. At a high level, that looks something like this:
Gather training data for the model.
Design the model architecture.
Train the language model.
Evaluate and fine-tune the language model.
Refine the model with human feedback.
Productize the model for consumer use.
We’ll look at each step in more detail and identify where regulation could be effective. Along the way, there are historical examples that are worth drawing inspiration from. None of them are perfect analogies, but perfect shouldn’t be the enemy of good.
Remember that this walkthrough only applies to a single product - ChatGPT. Some, but not all, of the steps apply to other machine learning models. A comprehensive approach would need to be tailored to multiple use cases.
1. Training data
The first step in building a language model is collecting the training data. Training data consists of millions or billions of examples we want the model to learn from. In ChatGPT's case, that's human language2. ChatGPT uses a combination of Common Crawl3, Wikipedia, millions of online news articles, and thousands of books as a training dataset.
Having high-quality training data is extremely important. If the model learns from wrong or inconsistent examples, the final output will be worthless. This could present a regulation opportunity.
Limit training data to standardized, pre-approved training datasets.
Audit models using training data above a certain size threshold.
Audit all training datasets for copyrighted/harmful materials.
Developers must publish summaries/disclosures of training data.
Require licenses to compile a dataset above a certain size.
In practice, I'm skeptical about our ability to regulate training data. Most of these ideas seem heavy-handed, considering most LLMs are trained on publicly available text. Banning language model datasets would effectively ban crawling the internet. Plus, organizations like LAION and EleutherAI make it even easier to build your own LLM by providing curated datasets.
It’s also worth noting that the data requirements for fine-tuning LLMs are rapidly coming down. New experiments demonstrate that we can use finished models like ChatGPT to fine-tune new models with far less data. It’s a bit like the 4-minute mile: now that we know it can be done, we’re finding easier and easier ways to do it.
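To make the data-quality point concrete, here's a minimal sketch of the kind of filtering and deduplication pass that dataset curators run over raw crawled text. The function name and thresholds are illustrative, not any real pipeline's:

```python
# Toy cleaning pass over raw crawled documents: drop near-empty pages
# and deduplicate exact repeats. The word-count threshold is illustrative.

def clean_corpus(raw_documents, min_words=50):
    seen_fingerprints = set()
    cleaned = []
    for doc in raw_documents:
        text = doc.strip()
        if len(text.split()) < min_words:
            continue  # too short to be useful training signal
        fingerprint = hash(text)
        if fingerprint in seen_fingerprints:
            continue  # exact duplicate of a document we already kept
        seen_fingerprints.add(fingerprint)
        cleaned.append(text)
    return cleaned

docs = ["word " * 100, "word " * 100, "too short"]
print(len(clean_corpus(docs)))  # 1 — the duplicate and the short page are dropped
```

Real pipelines (like OpenAI's filtering of Common Crawl) use far more sophisticated heuristics - near-duplicate detection, quality classifiers, language identification - but the shape is the same: raw web text in, a smaller curated set out.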
2. Model architecture
The next step in training an LLM is designing the model architecture. This means all the technical implementation details: tokenizers, layers, attention heads, and decoders. Most companies using LLMs will use off-the-shelf models - this step mostly applies to researchers and advanced AI companies.
Registration for models above a certain size/capability/impact threshold.
Approval for new/significantly modified architectures by independent third parties.
Developers must publish descriptions/summaries of their models.
Outright bans of certain model architectures.
Additional support/funding for model transparency research.
One of the biggest challenges with regulating models is how much of a black box they are. Neural networks are “just” billions of numbers that use math to create coherent outputs. But there is research to make AI models more transparent, which could use additional support.
However, the speed of progress makes it difficult to set hard numerical limits. Every week, we see comparable capabilities achieved with fewer and fewer resources, so size thresholds that define “state-of-the-art” today are unlikely to stay that way.
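To see what a "size threshold" would even be measuring, here's a back-of-envelope parameter count from architecture hyperparameters. The formula is a standard approximation for decoder-only transformers; the example values roughly match GPT-3's published figures:

```python
# Rough parameter count for a decoder-only transformer.
# Approximation: embedding table + per-layer attention (4 * d^2)
# + per-layer feed-forward network (8 * d^2, assuming a 4x expansion).

def approx_params(vocab_size, d_model, n_layers):
    embedding = vocab_size * d_model
    per_layer = 4 * d_model**2 + 8 * d_model**2  # attention + feed-forward
    return embedding + n_layers * per_layer

# Hyperparameters roughly matching GPT-3's published configuration:
total = approx_params(vocab_size=50_000, d_model=12_288, n_layers=96)
print(f"{total:,}")  # ≈ 174.6 billion, on the order of GPT-3's reported 175B
```

The catch for regulators: nothing in this arithmetic says how *capable* the model is. A smaller model trained longer on better data can outperform a bigger one, which is exactly why fixed numerical thresholds age badly.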
In the early stages of model development, most of the harms are still theoretical. That means most of the actions involve either registration or disclosures. While not a perfect analogy, we can look at how a different industry works with disclosures: food labels.
Consumer labeling: The picnic that killed a President
Food labeling in the US began in the 1850s, after widespread fears of food poisoning. One highly publicized case was President Zachary Taylor, who fell ill and died after eating raw fruit and milk at a picnic.
In the 170+ years since, we've created hundreds of laws and multiple agencies to manage food safety. The USDA and FDA help create and enforce rules for production, handling, and sale. Standards like health inspections and sell-by dates exist thanks to these laws. And behind the scenes, there’s an entire industry dedicated to regulating what goes into what we eat.
One of the most visible standards is the Nutrition Facts label, intended to help customers make better dietary choices. The label includes serving size, calories, percent daily values, nutrients, and an ingredients list.
Margaret Mitchell has proposed an equivalent for machine learning models: model cards. Some of the suggested features for a model card include:
Model details: basic information about the model.
Person/organization developing the model.
Model date and version.
Model type. For example: linear regression, decision tree, neural network, etc.
Training algorithms, features, and parameters.
Usage guidelines: the envisioned use cases of the model.
Primary users and use cases.
Out-of-scope use cases.
Metrics: the potential real-world impacts of the model.
Performance metrics. How well the model scores on accuracy, precision, recall, or other benchmarks.
Fairness metrics. A measure of the model's fairness or bias.
Datasets: details on the data used to create the model.
Training data. The size, source, or other aspects of the training data. Any preprocessing steps and any known biases or limitations of the data.
Evaluation data. Like training data, but a look at the dataset(s) used to test the model's performance.
Ethical considerations: an overview of the considerations taken into account during the model's development. This could include privacy and data consent, fairness in model outcomes, and potential model abuse.
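In practice, a model card is just structured metadata that ships alongside the model. A minimal sketch following the field categories above - every value here is a hypothetical placeholder, not any real model's card:

```python
import json

# A minimal model card as structured metadata, mirroring the
# categories proposed above. All field values are hypothetical.
model_card = {
    "model_details": {
        "organization": "Example AI Lab",
        "version": "1.0",
        "date": "2023-05-01",
        "type": "transformer language model",
    },
    "usage_guidelines": {
        "intended_uses": ["drafting assistance", "summarization"],
        "out_of_scope": ["medical advice", "legal advice"],
    },
    "metrics": {
        "performance": {"benchmark": "example-benchmark", "accuracy": 0.87},
        "fairness": {"toxicity_rate": 0.02},
    },
    "datasets": {
        "training": "filtered web crawl plus licensed books (hypothetical)",
        "evaluation": "held-out split, never used during training",
    },
    "ethical_considerations": "privacy review and abuse testing performed",
}

# Serialize for publication alongside the model weights.
print(json.dumps(model_card, indent=2))
```

Like a Nutrition Facts label, the value isn't in any single field - it's in having a standard format that auditors, journalists, and downstream developers can all read the same way.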
3. Model training

Once you've gathered the training data and designed the model, the next step is to train it. This means running the neural network billions of times to learn from the training examples. For large models, the compute costs are enormous - by some estimates, it costs almost $5 million just to train ChatGPT.
Limit the amount of CPU/GPU time spent on training a model. This could be enforced at the cloud platform or data center level.
Limit the dollar amount spent on training a model.
Register training sessions with an independent third party, with documentation on training data and architectures used.
Require certification before model training and recertification before significant retraining.
The high cost of training for state-of-the-art models lends itself well to gatekeeping. But as we've discussed, setting a numerical threshold is difficult: the compute required to train a capable model drops weekly as new techniques are developed.
That being said, this is one of the key control points for cutting-edge models. Training is a costly, one-time step5. It would be hard for US companies to hide massive training runs - only so many GPUs are available, particularly supercomputer GPUs.
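Back-of-envelope math shows why training is such a natural chokepoint. A sketch with purely illustrative round numbers - not OpenAI's actual figures:

```python
# Back-of-envelope training cost: GPUs * days * 24 hours * hourly rate.
# All figures are hypothetical round numbers for illustration.

def training_cost(n_gpus, days, dollars_per_gpu_hour):
    return n_gpus * days * 24 * dollars_per_gpu_hour

# e.g. 1,000 GPUs running for two months at $3 per GPU-hour:
cost = training_cost(n_gpus=1_000, days=60, dollars_per_gpu_hour=3.0)
print(f"${cost:,.0f}")  # $4,320,000 — the same order as the ~$5M estimate above
```

A training run at this scale means reserving thousands of data-center GPUs for months, which is visible to cloud providers, chip vendors, and utility companies alike - exactly the kind of observable activity a registration regime could hook into.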
4. Evaluation and fine-tuning
Once a language model has been trained, it must be tested and fine-tuned for its intended use cases. For a language model, that could mean asking it to complete a sentence and score its accuracy. For other models, it could mean testing it on image recognition or audio generation tasks.
Regardless of the use case, most models have an evaluation dataset different from the training dataset. It’s important that these are kept separate - otherwise, the model has already seen the answers to the test!
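That separation is usually enforced with a held-out split, carved off before training ever starts. A minimal sketch:

```python
import random

# Split a dataset into training and evaluation sets BEFORE training,
# so the model is never scored on examples it has already seen.

def train_eval_split(examples, eval_fraction=0.1, seed=42):
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n_eval = int(len(shuffled) * eval_fraction)
    return shuffled[n_eval:], shuffled[:n_eval]

train, evaluation = train_eval_split(list(range(1000)))
print(len(train), len(evaluation))  # 900 100
assert not set(train) & set(evaluation)  # no leakage between the two sets
```

An auditor checking for "teaching to the test" would verify exactly this property: that none of the evaluation examples appear anywhere in the training set.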
Create industry-wide evaluation standards/requirements.
Testing and certification done by an independent third party.
Require retesting after significant retraining.
Model developers must document/publish their fine-tuning process.
Once the model has been trained, we move past hard and fast rules and get into fuzzier, qualitative territory. In the case of fine-tuning, there’s an opportunity to do “QA testing” once the models are finished. Agencies or auditors could perform independent testing of models using a non-public methodology to avoid gaming - something like the Consumer Price Index (CPI).
Considering how many developments are now being made by companies instead of researchers, adding industry oversight seems like a potential option. We can learn from the multiple agencies that are in charge of governing another vertical: motor vehicles.
Industry oversight: From airbags to AI
At a high level, cars and trucks are a great case study. They're complex machines, necessary for modern society, and have a huge amount of oversight. There are multiple agencies involved, including the National Highway Traffic Safety Administration (NHTSA), the Environmental Protection Agency (EPA), and the Department of Motor Vehicles (DMV).
We didn’t create this framework overnight - it took nearly 100 years. But the end result is an industry that works quite well at improving vehicle safety, given how many miles we drive every day. What that looks like in practice:
Design: The NHTSA sets and enforces rules around car design. The Federal Motor Vehicle Safety Standards (FMVSS) cover construction materials and safety features.
Safety & performance: Post-design, the FMVSS also requires various safety and performance tests. They include crash protection, rollover resistance, tire/brake performance, and airbag/seatbelt performance.
Emissions: Outside of the NHTSA, the EPA sets vehicle emissions and fuel efficiency standards.
Infrastructure: Over the decades, we've set up shared laws and spaces to use cars. You can’t park wherever you feel like, nor can you drive at any speed you want.
Licensing: To drive a car, you must pass multiple tests and practice for dozens of hours. You also need to register your car with the state and buy insurance.
Recalls: If a car model has a defect that poses a safety risk, the NHTSA can issue a recall.
Some members of Congress have proposed a new AI oversight agency. Others believe existing agencies like the FTC should be responsible. But having a centralized organization to set standards seems like an effective way to protect consumers.
Consider the current mashup of AI and motor vehicles: self-driving cars. While self-driving cars are an amazing advancement, new vehicles still have to follow existing traffic and safety laws. And while initial testing was a bit of a wild west, the NHTSA stepped in to provide guidelines and best practices for developers.
5. Providing human feedback
One step somewhat unique to ChatGPT (and other chatbots) is RLHF: reinforcement learning from human feedback. Out of the box, most large language models are not particularly useful to consumers. They're autocomplete on steroids. So if you say, "Write me a poem," the LLM will interpret that as a sentence to finish rather than a task to perform.
RLHF is what takes the model from autocomplete to assistant. It's a process where humans rate responses, and those ratings help fine-tune the model to give more helpful answers.
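At the heart of RLHF is a reward model trained on those human ratings. The standard objective in published RLHF work is a pairwise loss: push the score of the response the rater preferred above the score of the one they rejected. A toy version in pure Python - the scores here are stand-ins for a real reward model's outputs:

```python
import math

# Pairwise preference loss used to train RLHF reward models:
# loss = -log(sigmoid(score_preferred - score_rejected)).
# The loss shrinks as the preferred response is scored higher.

def preference_loss(score_preferred, score_rejected):
    margin = score_preferred - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A reward model that agrees with the human rater incurs a small loss:
print(round(preference_loss(2.0, 0.0), 3))  # 0.127
# One that disagrees incurs a large loss:
print(round(preference_loss(0.0, 2.0), 3))  # 2.127
```

Everything regulators might care about - which responses raters were shown, what instructions the raters were given, who the raters were - lives upstream of this loss function, which is why documentation of the alignment criteria matters so much.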
Create industry-wide alignment standards/requirements.
Post-alignment testing and certification done by an independent third party.
Accreditation/training for the workers giving feedback.
Model developers must document/publish their alignment criteria.
RLHF is a very new process, so there isn’t much known about how it can be tested or standardized. But OpenAI has found that it leads to better instruction following, fewer hallucinations, and even slightly less toxicity.
Unfortunately, we don’t know much today about how RLHF is being done inside of companies. OpenAI published some early papers, but has since stopped sharing its research. Anthropic, which has long touted its “Constitutional AI” approach, only recently revealed its principles. There aren’t any shared standards or expectations around preventing misinformation or bias. But as fuzzy as this process is, it does strike me as one of the key ways to try and “get things right” for language models.
6. Productization

The last step is taking the model from a prototype to a product. This is standard web or mobile application development: building UIs, APIs, and other ways of letting users interact with the AI model.
Approval and registration before publicly launching AI products.
Ban certain types of model interactions, such as internet access.
Require ongoing risk assessment and mitigation reports.
Investigate and report any model harms, and potentially recall products.
Make product developers liable for damages from AI models.
Software harms are unique by virtue of how fast they can scale. Unlike physical product failures, which tend to happen over time, software flaws can be exploited and leveraged in a matter of minutes.
So once a model has been productized, it’s difficult to prevent damage other than via outright bans. While we can use investigations and liability to incentivize companies, the cat is mostly out of the bag at that point.
If we aren’t able to enact regulation before the productizing step, then it would likely be up to companies to self-regulate how they build models instead. If I’m honest, I don’t have a ton of faith that they would do a good job. But, the scientific community offers some hope with how they tackled the advancement of genetic editing.
Self-regulation: Jennifer Doudna’s Hitler dream
In 2015, Jennifer Doudna had a dream about Hitler that would change the trajectory of genetic engineering research. Doudna, a professor of biochemistry and the inventor of CRISPR, had just made an incredible breakthrough. But she was still troubled:
“I ended up having several dreams that were very intense… where I walked into a room, and a colleague said: ‘I want to introduce you to someone; they want to know about CRISPR.’ And I realized with horror that it was Adolf Hitler. And he leaned over and said: ‘So, tell me all about how it works.’ I remember waking up from that dream, and I was shaking. And I thought: ‘Oh, my gosh. What have I done?’”
– Jennifer Doudna, UC Berkeley Professor of biochemistry
After the nightmares, she publicly called for a halt to human gene editing. She organized meetings with peers and colleagues about the dangers of unleashing this technology. Eventually, that led to the 2015 International Summit on Human Gene Editing.
The summit, composed of the best in the field, decided on a self-imposed moratorium on gene editing until further risk evaluations could be done. And this moratorium has held up, with international bodies helping to enforce the agreement.
It’s worth examining the differences between the machine learning and biomedical fields. While both require significant study and training, there’s a large gap when it comes to formal ethical expectations.
All research with human subjects must pass an ethics review board.
Scientific journals often have ethical guidelines for the research they publish.
While not all researchers are doctors, all doctors take the Hippocratic Oath.
I’m not convinced that AI is going to have its “CRISPR moment” anytime soon6. AI breakthroughs are now being made by corporations, not academic institutions. Even if they wanted to pause development, there’s a duty to shareholders to pursue the additional profits.
But like genetic engineering, AI is going to need international cooperation on top of domestic regulation. We’re still getting started, but there is some promising legislation to keep an eye on.
What regulation exists today?
In the US, not much has made it into law. However, members of Congress have proposed nearly two dozen potential bills since 2020 that would impact AI development. Many are targeted at Big Tech more broadly, but some are AI-specific. They cover the creation of new agencies, model transparency, data protection, product design, public benefit, and market competition.
Without bipartisan effort, they’re unlikely to move forward. But there are a few to watch:
Algorithmic Accountability Act: Requires generative AI models involved in “critical decisions” to assess impacts and perform ongoing testing.
Digital Services Oversight and Safety Act: Mandates companies to develop risk mitigation plans for harmful content and discriminatory behaviors.
On the international stage, there’s somewhat more momentum. Ongoing efforts include:
The OECD’s AI principles.
The EU AI Act.
The EU AI Act is by far the most comprehensive and impactful legislation. The most recent updates include the following provisions:
Registration for “high-risk” or foundational AI models.
Third-party risk testing with to-be-determined benchmarks. Retesting would be required after substantial training changes.
Launching a model would require a permitting and review process.
Fines of up to 4% of a company’s gross world revenue, or €20 million for individuals.
Open-source models would not be exempt.
Companies based in the US would not be exempt (if they allow EU access).
Making sense of the options
In looking at the process of building ChatGPT, we’ve touched on three different approaches to regulation: consumer disclosures, industry oversight, and self-regulation.
Truthfully, I don’t see consumer disclosures as being very impactful. While they would help thoughtful regulators do their jobs, most people I know don’t pay too much attention to food labels. And that’s with names and numbers they understand - once we start documenting billions of parameters or dozens of attention heads, most people will be clueless.
Self-regulation, on the other hand, feels too good to be true. The world is now keenly aware of AI’s potential and its commercial applications, and we’re in an arms race that isn’t likely to defuse itself. And we’ve seen the results of the last decade-plus of social media self-regulation. But there may be an opportunity to develop international standards that help bring companies in line. The EU AI Act may become the de facto world standard without competing legislation.
When it comes to the US, industry oversight seems like the most likely strategy. It won’t be the only one - we’ll likely see elements of many approaches combined. But we have examples of how that could work well. Cars continue to add features and technology, but do so in a safe way. There is neither a moratorium on new car development, nor a push to remove all speed limits.
Regardless of what laws get passed, we won't get it right on the first try. It will undoubtedly get political. Many will call it anti-innovative or un-American. But to continue the vehicle metaphor, I like thinking about policy as a train station, not an airport. There might not be a direct route to your final destination, but that doesn't mean you should abandon the trip entirely.
Instead, take the route that moves you in the right direction and then decide on the next step of the journey. And understand that any approach to regulate something as complex and impactful as AI is going to take a lot of time.
2. For other models, that might mean images or audio samples.
3. Common Crawl is an open repository of internet data that's free for anyone to use. But one of the issues with using Common Crawl is the data quality - OpenAI has said they clean and filter the dataset before using it.
5. Recent advancements like LoRA are making it easier to tweak models without significant retraining, so enforcements at the training step wouldn’t be a silver bullet.
6. The closest we’ve come to this is the open letter calling for a 6-month halt on new LLM research; so far, it hasn’t had much impact.