3 Comments
Daniel Nest

I've actually been wondering just how difficult it was to "plug into" the underlying LLM model from a coding perspective.

In my simple brain it was about "just" making some boxes for input that send it to the model and "just" some UI to make the output formatted to whatever the interface/app is supposed to do.

So I kind of assumed switching which model you communicate with (Claude, GPT, or other) was a relatively simple tweak. Sounds like it's a lot more complex and the degree of lock-in isn't trivial.

By the way, I'm quite sure Midjourney isn't built on top of Stable Diffusion. There was a brief period late last year when Midjourney V3 had a "--beta" parameter that generated images using a separate Stable Diffusion-based model. Then Emad (Stability AI) miscommunicated / oversold what was happening, which is where the confusion came from. They also dropped SD altogether for V4 and onwards. (More good discussion about this in the comments here: https://www.reddit.com/r/StableDiffusion/comments/10liqip/if_midjourney_runs_stable_diffusion_why_is_its/)

Then again, since the MJ model is a black box, it's hard to have a definitive read on it.

Charlie Guo

Originally, most LLMs had the same input/output. Since they’re completing text, you would just provide the starting text and they would provide the completion. And swapping models was very straightforward.
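To illustrate the point (a minimal sketch with hypothetical model names and payload fields, loosely modeled on completion-style APIs): since every model took raw text in and returned a continuation, swapping providers was often just a matter of changing a model name and endpoint, with the prompt untouched.

```python
# Completion-era sketch: both "providers" accept the same shape of
# request, so the prompt-building code is provider-agnostic.
# Model names and fields here are illustrative, not exact API specs.

def build_completion_request(model: str, prompt: str) -> dict:
    """Build a text-completion payload: a raw prompt string plus a model name."""
    return {"model": model, "prompt": prompt, "max_tokens": 100}

prompt = "The capital of France is"
req_a = build_completion_request("text-davinci-003", prompt)
req_b = build_completion_request("some-other-model", prompt)

# Only the model name differs; the prompt payload is identical,
# which is why swapping models used to be nearly free.
```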

Chatbots are a different paradigm though. Now ChatGPT has system, user and assistant messages, whereas Claude requires you to format your prompt like "Human: ... \n\nAssistant:" and parse the output. I haven't found an official Bard API, but the unofficial one has a simple prompt/response format. And then adding functions creates another wrinkle. It would not be the most Herculean task to switch now, but with the speed that OpenAI ships, it's almost guaranteed to get harder in the coming months.
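The divergence described above can be sketched like this (a rough illustration, not exact API specs): the same conversation has to be reshaped into role-tagged message lists for ChatGPT but flattened into one "Human:/Assistant:" string for Claude's original completion-style API.

```python
# The same (system, user) conversation, shaped per provider.
# Formats follow the description above; field names are illustrative.

def to_openai_messages(system: str, user: str) -> list:
    """ChatGPT-style: a list of role-tagged message dicts."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

def to_claude_prompt(system: str, user: str) -> str:
    """Claude's legacy style: one flat string that must end in 'Assistant:'."""
    return f"{system}\n\nHuman: {user}\n\nAssistant:"

msgs = to_openai_messages("You are a helpful assistant.", "Hi!")
prompt = to_claude_prompt("You are a helpful assistant.", "Hi!")
```

Abstracting over these two shapes (plus function-calling payloads) is exactly the kind of adapter layer that makes "just swap the model" less trivial than it sounds.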

I did not know that about Midjourney! All of the reporting I had seen suggested it was an SD fork, but I guess V4 and V5 have been totally different. Honestly that’s even more impressive.

Daniel Nest

Ah, that's an interesting insight. So I wasn't too far off in my original assumption. Curious that things are getting more complicated rather than simpler - but I guess with all the added capabilities, the simple input/output text box is no longer sufficient.
