Discussion about this post

Michael Spencer

My LinkedIn feed is now full of arcane, obscure open-source models I'll likely never hear about again. Since when did leaderboards become a thing engineers actually cared about?

Daniel Nest

Great clarification regarding the Chatbot Arena. It pops up in my feed regularly (most recently with the Gemini Pro news), but I never tried digging further, so I never realised it measures interactions with real humans in a "non-lab" scenario. Looking more closely, I can see that Bard "only" has 3K votes compared to GPT-4's tens of thousands (for every measured version). With so few votes, Bard's confidence interval is much wider, so we may see Bard slip as more votes roll in.

I've now heard from a few people that Bard has improved a lot lately. So I tried it yesterday and didn't quite have an "Aha!" moment of seeing major leaps forward. Then again, I rarely use Bard, so my point of reference isn't that great. (ChatGPT Plus with GPT-4 Turbo did give me better Mermaid code when I compared its output with Bard's.)

I appreciate the mention, by the way!
