Wow Development Quality Assurance's Post

Despite all the talk of slowing AI progress, models keep leapfrogging. Google's Gemini Exp-1121 is now #1 almost across the board. A few thoughts on the latest Chatbot Arena leaderboard results:

📈 Our baseline expectation for AI advances has become the staggering pace of the last two years. If progress is slower than that, it doesn't mean that it has stopped, or even that it is slow, just that it is possibly not breathtaking.

🐸 The last year has seen rapid leapfrogging of models on the leaderboards, with OpenAI, Anthropic, and Google trading places at the top, and lots of jostling between the top open source models. Every leapfrog is an advance in capabilities.

✨ OpenAI's o1-preview will be followed by the full o1 model, which may be the same as its Orion model, reportedly due for release next month. Anthropic's Dario Amodei has not committed to releasing its mooted lead model Claude Opus 3.5, but a release is likely. Llama 4 is due out soon.

🪐 Grok is already doing very well, with xAI on the verge of releasing a consumer app (news just out - link in comments) and putting its record-breaking build of 100K GPUs into production.

⚡ Chinese open source model Yi-Lightning (from Kai-Fu Lee's 01.AI) is doing very well, with many other Chinese open source models such as Qwen and DeepSeek performing strongly a little further down the table.

📉 Claude Sonnet 3.5 - recently at the top of the leaderboard - is now just tenth, showing how fast things move. It's a fabulous model in many ways, but on some measures, many are beating it.

Note that if you want to give the top Gemini model a whirl, you can do it for free in Google's AI Studio.

More digging into the pace of LLM advances coming up.

Ross Dawson

Futurist | Board advisor | Global keynote speaker | Humans + AI Leader | Bestselling author | Podcaster | LinkedIn Top Voice | Founder: AHT Group - Informivity - Bondi Innovation

