Recently, a post, "The Rise of the AI Engineer," crystallized a lot of things that I've been thinking about with AI and programming. The entire post is worth a read, but here is the most relevant bit:
I think software engineering will spawn a new subdiscipline, specializing in applications of AI and wielding the emerging stack effectively, just as “site reliability engineer”, “devops engineer”, “data engineer” and “analytics engineer” emerged.
The emerging (and least cringe) version of this role seems to be: AI Engineer.
Personally, I strongly identify with this concept. I have a lot of experience as a full-stack developer and relatively little with machine learning. I haven't spent thousands on Google Colab or had my name attached to arxiv papers. But I've built more prototypes for myself in the last six months than I had in the previous six years.
I believe the foundation models from OpenAI, Stability AI, and others are a new platform to build on. Compare AI with the adoption of mobile phones: when they first arrived, businesses made smaller website layouts and called it a day. Now though, every company that can afford it hires an iOS/Android engineer to build a custom app. Something similar is happening in AI. Existing companies are trying to figure out how to tailor their applications to integrate LLMs, and new companies that could not have existed before are springing into existence.
So I asked myself - what would an AI engineering curriculum look like? From foundational concepts to key projects, what would it take to excel in this emerging discipline? Because while the nascent field is small, I have a sneaking suspicion it will keep getting bigger.
Multiple tracks
Right now, I believe the field is large enough to support multiple tracks for AI engineering, separated by medium. The largest is "text," which focuses on large language models and their use cases. Much of human language relies on text, so, naturally, there are more use cases for it.
But there is another "multi-modal" track as well. Foundation models for images, speech, and potentially videos and music offer new opportunities. Mixing and matching these models, along with LLMs, requires additional skill sets. And over time, the boundaries will likely blur as products like ChatGPT begin working with images and audio. But each foundation model offers the chance to specialize as tools and techniques build up around it.
I'll focus on text and LLMs for the rest of this post, but the ideas can all be translated over to different mediums.
The syllabus
Most people still consider AI Engineering as a form of either Machine Learning or Data Engineering, so they recommend the same prerequisites. But I guarantee you that none of the highly effective AI Engineers I named above have done the equivalent work of the Andrew Ng Coursera courses, nor do they know PyTorch, nor do they know the difference between a Data Lake or Data Warehouse.
– “The Rise of the AI Engineer”
As with any curriculum, there are foundational building blocks that you need to master. But the basics here seem more analogous to web development than machine learning - simple concepts that work up to models/frameworks, tools, and projects. A web developer would start with the basics of HTML/JS, then learn a framework, then learn third-party libraries and tools, and then complete a full-stack project. In my view, there’s more emphasis here on Python and JS scripting than on matrix multiplication.
Concepts:
Large language models: what a large language model is and how it works. At a high level (i.e., without going into transformer architecture), how they're built and trained, and most importantly, their limitations. It's critical to understand a) that they are not magic or conscious and b) that they hallucinate regularly.
Embeddings: what embeddings are and why they're useful. Consider the different types of projects you can build with embeddings, including search, clustering, and summarization (there's a short sketch after this list).
RLHF: what RLHF is and the impact it's had on LLMs. A look at how OpenAI has used instruction tuning to create a much more useful model and how Anthropic uses Constitutional AI to take this to the next level.
Prompt engineering: best practices for prompting LLMs. A review of techniques like chain-of-thought and personas and the difference between "completions" and "chat completions."
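To make the embeddings concept concrete, here's a minimal sketch of semantic search: embed a handful of documents, embed a query, and rank by cosine similarity. It assumes the pre-1.0 `openai` Python package, an `OPENAI_API_KEY` environment variable, and placeholder documents; treat the model name as one reasonable choice, not the only one.

```python
import os
import openai
import numpy as np

openai.api_key = os.environ["OPENAI_API_KEY"]

documents = [
    "Our refund policy allows returns within 30 days.",
    "Support is available by email on weekdays.",
    "Shipping takes 5-7 business days within the US.",
]

def embed(texts):
    # One API call can embed a whole batch of strings.
    response = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return [np.array(item["embedding"]) for item in response["data"]]

def search(query, doc_vectors):
    # Cosine similarity between the query vector and each document vector.
    q = embed([query])[0]
    scores = [float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d))) for d in doc_vectors]
    return sorted(zip(scores, documents), reverse=True)

doc_vectors = embed(documents)
print(search("How long do I have to return an item?", doc_vectors)[0])
```

The same pattern (embed, store, compare) underpins clustering and retrieval; a vector database just replaces the in-memory list once you have more than a few thousand documents.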
Models:
GPT-4. As GPT-4 is still the most advanced model, it should be mastered first. Considering it can also call functions and will soon be able to browse the web via the API, it has the most potential for applications (see the function-calling sketch after this list).
Claude/Bard. Beyond OpenAI's model, it's also worth looking at one or more competitor models to know their capabilities and how difficult a technology switch would be. The top two GPT-4 competitors are Claude and Bard, but that's likely to change.
LLaMa (and its descendants). There is a Cambrian explosion of open-source LLMs, largely due to the leak of Meta's LLaMa. While their usage is still a legal gray area, Meta plans to release a commercially usable version soon. While not as advanced, open-source models are an important option for many businesses that can’t send sensitive data to OpenAI.
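Function calling is easier to grasp in code. Below is a hedged sketch of a single chat-completion request that exposes one tool to GPT-4; the `get_weather` function, its schema, and the model string are illustrative only, and it assumes the pre-1.0 `openai` package.

```python
import json
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# Describe a function the model may "call" (it returns arguments; your code runs the function).
functions = [
    {
        "name": "get_weather",  # hypothetical function for illustration
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
]

response = openai.ChatCompletion.create(
    model="gpt-4-0613",
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    functions=functions,
    function_call="auto",
)

message = response["choices"][0]["message"]
if message.get("function_call"):
    # The model asks us to run the function; parse its arguments and execute it ourselves.
    args = json.loads(message["function_call"]["arguments"])
    print("Model wants get_weather with:", args)
```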
Tools:
LangChain/Guidance. Tools for building abstractions on top of LLMs. They allow for more complex prompt workflows and make it easier to switch between different models (see the sketch after this list).
LlamaIndex. A framework for integrating data sources with an LLM. It can connect to APIs, documents, and databases for retrieval and querying.
Pinecone/Weaviate. Popular options for a fully-managed vector database. Used with AI applications and compatible with OpenAI, Anthropic, Cohere, and Hugging Face embeddings.
Whisper/DALL-E/ElevenLabs. Additional APIs that can enable transcription, image-generation, and voice-generation. Useful for building multi-modal applications such as smart assistants.
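As one example of the abstraction these tools provide, here's a hedged LangChain sketch: a prompt template plus a chat model, where switching providers is roughly a one-line change. The imports reflect the LangChain 0.0.x layout and the APIs in this space change quickly, so treat the class names as a snapshot; it also expects an `OPENAI_API_KEY` environment variable.

```python
from langchain.chat_models import ChatOpenAI  # or ChatAnthropic for Claude
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# A reusable prompt with a single variable.
prompt = PromptTemplate(
    input_variables=["product"],
    template="Write a one-sentence tagline for a product called {product}.",
)

# Swapping ChatOpenAI for another chat model class is the main change needed to switch providers.
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.7)

chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(product="an AI-powered email summarizer"))
```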
Projects:
Document chatbot: A chatbot that can take files and PDFs and answer questions about their contents.
Email summarizer: A browser extension that can summarize Gmail threads.
ChatGPT plugin: A plugin that provides ChatGPT access to a third-party API.
Basic agent: A simple agent that can break a high-level task into sub-tasks, then execute them individually (sketched after this list).
Smart assistant: A smart assistant capable of listening to questions and generating spoken answers.
Model fine-tuning: As an optional project, fine-tune a language model such as GPT-3 for a specific use case.
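To give a feel for the basic agent project, here's a rough sketch of the core loop: one call asks the model to decompose a goal into sub-tasks, then a second set of calls executes each sub-task with the earlier results as context. It assumes the pre-1.0 `openai` package, uses a placeholder goal, and skips all the retries and error handling a real agent needs (for instance, the model may not return clean JSON).

```python
import json
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

def chat(prompt, system="You are a helpful assistant."):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": prompt}],
    )
    return response["choices"][0]["message"]["content"]

goal = "Research and write a short blog post about vector databases."

# Step 1: ask the model to break the goal into sub-tasks, returned as JSON.
plan = chat(f"Break this goal into 3-5 sub-tasks. Reply with a JSON array of strings only.\nGoal: {goal}")
subtasks = json.loads(plan)

# Step 2: execute each sub-task, feeding earlier results back in as context.
results = []
for task in subtasks:
    context = "\n".join(results)
    results.append(chat(f"Context so far:\n{context}\n\nComplete this sub-task: {task}"))

print(results[-1])
```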
Key philosophies
Build, don't train. For AI engineering, the value lies in building atop existing models, not training new ones from scratch. Find products that work out of the box rather than spending time and resources on in-house models. That being said, it may not be possible to avoid training entirely, depending on your use case. Fine-tuning LLMs and diffusion models can add value in narrow scenarios, and it's becoming ever easier with consumer-grade hardware. But the focus should be on writing application code, not running training cycles.
BYOK: bring your own keys. One of the best features, in my opinion, of recent AI applications is the ability to supply my own API keys. This doesn't make sense for every kind of application, but when it does, I appreciate this approach for multiple reasons (there's a minimal backend sketch after these points):
It's clear upfront which models are supported under the hood, and ideally, users can choose between models.
Customers are aligned on usage costs (no more "buy AI credits"), and the business doesn't have to eat the cost of running the model.
Users get the benefits of portability and history; though not every API supports this yet, in the future I expect more APIs to let users keep track of their history and share previous completions.
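Here's a minimal sketch of what BYOK can look like on the backend: the server stores no key of its own and simply forwards the user-supplied key on each request. It calls the OpenAI chat completions endpoint directly with `requests`; the function name and the plumbing around it are hypothetical.

```python
import requests

def complete_with_user_key(user_api_key: str, prompt: str) -> str:
    # The user's own key is sent as the bearer token, so usage is billed to their account.
    response = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {user_api_key}"},
        json={
            "model": "gpt-3.5-turbo",
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
```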
Find your AI stack. Because everything is so new, there isn't a battle-tested tech stack like LAMP or MEAN for AI engineering. So for now, developers ought to figure out what the components of their stack should be (e.g., vector database/embedding model/language model/prompt framework), then understand what tools are available for each layer of the stack. A potentially controversial opinion here - prefer closed-source products first and only use open-source if necessary. The best models are still proprietary (GPT-4, Midjourney, ElevenLabs), which means the best applications should be built on top of them.
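As an illustration only (these are example choices, not recommendations), the outcome of that exercise can be as simple as writing down one option per layer:

```python
# One possible snapshot of an AI stack, layer by layer (illustrative choices).
AI_STACK = {
    "language_model": "gpt-4",                    # hosted LLM
    "embedding_model": "text-embedding-ada-002",  # embeddings for search/retrieval
    "vector_database": "pinecone",                # fully managed vector store
    "prompt_framework": "langchain",              # prompt/workflow layer
}
```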
Stay nimble. Above all, the most important thing when building is to stay nimble. As difficult as it is to keep up, the space is moving so fast that working with any given model can feel like building on quicksand. But until the dust settles, it's better to stay agile and keep integrations with the various large language models lightweight. And if you have the resources, have a migration plan to avoid vendor lock-in.
Open questions
This is my shot at a first draft, but I'm still considering several questions - and I'd love your thoughts and answers!
What do you agree or disagree with?
What concepts or tools should be included?
What projects would you want to see?
Would this be worth turning into an actual course?