Many LLM projects, whether you're chatting with a PDF or building an agent, start off by using OpenAI's models. It makes sense - ChatGPT is the most well-known LLM right now, and OpenAI has made their APIs extremely easy to work with.
At some point, though, you may find yourself wanting to experiment beyond the base models - maybe because of cost or latency, or because you have a more niche use case. One of the most popular ways to do that is to fine-tune your own version of a foundation model; many people have built fine-tuned versions of Llama 2, Meta's openly available model.
But fine-tuning open-source models often means digging into dense machine learning libraries, then finding a platform to deploy and host the result. So for now, let's explore an easy way to fine-tune the base version of ChatGPT - that is, GPT-3.5 Turbo. We're going to target a very specific goal: I want to make ChatGPT talk like a pirate, without having to prompt it to do so.
Along the way, we'll touch on how fine-tuning works, and how to evaluate whether it's the right fit for you.
What is fine-tuning?
Modern LLMs are, generally speaking, pre-trained - that's the "P" in "GPT". Before being released by OpenAI, the model weights of ChatGPT were trained on billions of pieces of text in order to better predict human language.
But often, that general-purpose pre-training isn't as good at specific use cases as we would like. That's where fine-tuning comes in: given a (relatively) small number of examples, we can tweak the model weights slightly and create a tailored version of the model. ChatGPT was fine-tuned on examples of conversation and dialogue, which helped create its signature assistant style. Code Llama, a variant of Llama 2, was fine-tuned on code-generation examples.
As a developer, you can do additional fine-tuning to customize ChatGPT for your specific use case - usually to “bake in” a lot of additional context, or change the domain of the model to something other than what it was originally trained for.
Fine-tuning works by providing a curated set of brand-new training examples (and, ideally, some test examples to evaluate against). Each example is an input-output pair showing how the model should respond to a given prompt. The pre-trained model is then trained on the new data: its weights are updated to better predict the training examples.
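For GPT-3.5 Turbo, each training example is one line of JSON in OpenAI's chat format: a list of messages ending with the assistant reply you want the model to learn. A pirate-flavored example (with contents I've invented for illustration) might look like this:

```json
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What's the capital of France?"}, {"role": "assistant", "content": "Arr, that be Paris, matey!"}]}
```

Notice that the system message stays generic: the piratical tone lives only in the assistant replies, which is exactly the behavior we want to bake into the weights.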
In a production setting, the fine-tuning process requires substantial hardware and expertise - which is why we're taking a much leaner approach.
The GPU-poor man's approach
To do our fine-tuning, we're going to use a cloud platform to train and host our model. There are a few reasons why:
I don't have access to tons of RAM or GPUs. I am GPU-poor.
Writing PyTorch code can be intimidating for beginners, and is beyond the scope of this tutorial.
Even if I fine-tuned a model locally, I would still need to deploy it to a cloud platform in order to interact with it.
Luckily, OpenAI has APIs that make it really straightforward. For open-source models, there are many companies trying to provide this same service - if you've got a platform that you've tried and enjoyed, leave a reply or a comment!
But one of the reasons I like OpenAI is that their approach to fine-tuning is dead simple:
1. Install the OpenAI Python library (`pip install openai`).
2. Get your API key.
3. Create a training data file.
4. Create a fine-tuning job from the file.
5. Wait for the fine-tuning job to complete.
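Here's a minimal sketch of steps 4 and 5, assuming the v1 `openai` Python client and that step 3 produced a file called `pirate_data.jsonl` (a name I've made up) in the chat format shown earlier:

```python
import time

from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Step 4: upload the training file from step 3, then start a fine-tuning job.
training_file = client.files.create(
    file=open("pirate_data.jsonl", "rb"),  # hypothetical file name
    purpose="fine-tune",
)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)

# Step 5: poll until the job reaches a terminal status.
while True:
    job = client.fine_tuning.jobs.retrieve(job.id)
    if job.status in ("succeeded", "failed", "cancelled"):
        break
    time.sleep(30)

print(job.status, job.fine_tuned_model)
```

When the job succeeds, `job.fine_tuned_model` contains the id of your new custom model.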
That's it! You can now use the fine-tuned model to create completions.
Here's what that looks like in 4 lines of code:
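(A minimal sketch, assuming the v1 `openai` client; the model id below is a placeholder for whatever your job reports as its `fine_tuned_model`.)

```python
from openai import OpenAI

client = OpenAI()

# The model id is a placeholder; use the fine_tuned_model id your job reports.
response = client.chat.completions.create(
    model="ft:gpt-3.5-turbo-0613:personal::abc123",
    messages=[{"role": "user", "content": "How's the weather looking today?"}],
)
print(response.choices[0].message.content)  # should come back in pirate-speak
```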