Wow Development Quality Assurance’s Post


17,799 followers

Despite all the talk of slowing AI progress, models keep leapfrogging. Google's Gemini Exp-1121 is now #1 almost across the board. A few thoughts on the latest Chatbot Arena leaderboard results:

📈 Our baseline expectation for AI advances has become the staggering pace of the last two years. If progress is slower, that doesn't mean it has stopped, or even that it is slow; it is possibly just not breathtaking.

🐸 The last year has seen rapid leapfrogging of models on the leaderboards, with OpenAI, Anthropic, and Google trading places at the top, and lots of jostling among the top open source models. Every leapfrog is an advance in capabilities.

✨ OpenAI's o1-preview will be followed by the full o1 model, which may be the same as its Orion model, reportedly due for release next month. Anthropic's Dario Amodei has not committed to releasing its mooted flagship model, Claude 3.5 Opus, but a release is likely. Llama 4 is due out soon.

🪐 Grok is already doing very well, with xAI on the verge of releasing a consumer app (news just out, link in comments) and putting its record-breaking 100K-GPU build into production.

⚡ Chinese open source model Yi-Lightning (from Kai-Fu Lee's 01.AI) is doing very well, with many other Chinese open source models such as Qwen and DeepSeek performing strongly a little further down the table.

📉 Claude 3.5 Sonnet, recently at the top of the leaderboard, is now just tenth, showing how fast things move. It's a fabulous model in many ways, but on some measures many are beating it.

Note that if you want to give the top Gemini model a whirl, you can do it for free in Google's AI Studio. More digging into the pace of LLM advances coming up.

Ross Dawson

3w Edited


More Relevant Posts

Ross Dawson

Futurist | Board advisor | Global keynote speaker | Humans + AI Leader | Bestselling author | Podcaster | LinkedIn Top Voice | Founder: AHT Group - Informivity - Bondi Innovation
3w Edited
14 Comments
Sanskar Sharma

Digital Marketer | SEO | AI | Helping businesses grow online through AI and Marketing
2mo
Is OpenAI's December AI Model Launch Just a Hoax? Sam Altman Says 'Fake News': What's Really Going On?

The buzz surrounding OpenAI’s potential launch of its next-generation AI model has everyone speculating about what’s next in the tech landscape. December is being thrown around as the launch month, but Sam Altman, OpenAI's CEO, has dismissed these claims as “fake news.” It’s a classic case of mixed signals in the tech world, where anticipation can often blur the lines between fact and fiction.

There’s no denying that AI is rapidly evolving, and with each iteration, the capabilities seem to expand exponentially. The excitement surrounding new models is palpable, especially considering the advancements we've seen in recent years. However, it raises an interesting question: how much of this speculation is grounded in reality, and how much is just hype?

In my opinion, the best approach is to keep a critical eye on announcements and not get swept away by the frenzy. While it's easy to get caught up in the excitement of new tech, history has shown us that delays and changes in plans are quite common in the industry. So, while we might be eager to see what OpenAI rolls out next, perhaps it's wise to temper our expectations until we have something concrete.

Ultimately, whether the December launch is a reality or just another rumor, the conversation around AI's future is crucial. It keeps us engaged, curious, and informed about the rapid advancements that could reshape our world.

What are your thoughts on this? Are you excited about the possibilities, or do you think we should hold off on the celebrations until there’s official confirmation?

Follow for tech and AI updates.
Antonin Carette

Head of R&D and Senior Software Developer
5mo
While everyone is excited about new (VERY) large LLMs, I have become more skeptical than before about their impact and how they will evolve over the coming months and years... It seems the only way for LLMs to improve now is to get BIGGER to get a tiny bit stronger [and I personally think this is getting a bit ridiculous now].

Looking at the latest OpenAI discussions and releases, I suspect this is why OpenAI is moving more in the direction of optimising for price, latency, and so on with their newer model(s). Presumably, they may have found that the next models can't really get all that much better than what we're approaching now. Once you've reached the inflection point, you can compete on the price/performance ratio, and not only on performance.

However, looking at the conclusion of Meta's latest paper on LLaMA 3.1, it seems they have found new ideas for future versions of LLaMA models.

I personally think that working on smaller solutions that deliver exactly the same results/performance would allow more innovation and creativity, reduce energy costs, and enable "AI/ML on device" instead of cloud-only solutions. Also, reverse-engineering and understanding small, very good solutions might provide more understanding and more experimentation (something I think is, unfortunately, sorely missing from the latest AI/ML research articles I have read).

Only the coming months will tell us whether we are slowly approaching saturation or not.
Amir Hartman

| Helping leaders embrace AI, and organizations innovate with AI | President Enterprise AI Solutions | Keynote Speaker | Author of "Leadership in the Loop: AI Readiness for Today’s Leaders" |
1mo
🌟 Big advancements in AI are here! Google's new Gemini-Exp-1114 model topped the Chatbot Arena leaderboard with a score of 1344, outperforming competitors like OpenAI's GPT-4o in various tasks, including math and creative writing. 🎉 I recently had the chance to try it out, and I’m really impressed by the improvements in reasoning and response quality. While I’m not sure it’s enough to make me switch just yet, it’s definitely worth exploring! Want to see what all the buzz is about? You can access Gemini-Exp-1114 for free on Google AI Studio. Simply sign up, head over to the preview section, and start experimenting with this state-of-the-art model. Don’t miss the chance to explore its impressive capabilities, and let me know what you think and what you can create! 💡 #AI #Gemini

Google drops new Gemini model and it goes straight to the top of the LLM leaderboard

tomsguide.com
Skunpoj Thanarojsophon

Generative AI, RAG, Web App Dev, DevOps, Cloud Infra, ICO and Startups. Assistant Director @ Bank of Thailand
7mo
New AI search engine Upend emerges from stealth, powered by 100 LLMs https://lnkd.in/gGP7vBEr

New AI search engine Upend emerges from stealth, powered by 100 LLMs

https://venturebeat.com
Artificiality

291 followers
7mo
Facts & Figures on AI and Complex Change: May 26, 2024

- 22%: Percentage increase in OpenAI's app revenue on the day of the GPT-4o launch (GPT-4o is free, except on mobile). (AppFigures)
- $50,000,000: Annual licensing payment expected from OpenAI to News Corp for the next five years for content from the Wall Street Journal, Barron's, MarketWatch, the New York Post, etc. (Wall Street Journal)
- $10,000,000: Annual licensing payment expected from OpenAI to Axel Springer for the next three years. (Wall Street Journal)
- $5,000,000-$10,000,000: Annual licensing payment expected from OpenAI to the FT for an unknown number of years. (Wall Street Journal)
- 262%: Percentage increase in NVIDIA's quarterly revenue compared with the same quarter of the prior year. (NVIDIA)
- 40%: Percentage of NVIDIA's data center revenue attributable to inference (i.e., prompt and response). (SeekingAlpha)
- 54%: Percentage of job applications in the UK from men with an 'AI in business' course credit that received an interview invitation. (Anglia Ruskin University)
- 28%: Percentage of job applications in the UK from men without an 'AI in business' course credit that received an interview invitation. (Anglia Ruskin University)
- 54%: Percentage of job applications in the UK from men with an 'AI in business' course credit that received an interview invitation. (Anglia Ruskin University)
- 32%: Percentage of job applications in the UK from men without an 'AI in business' course credit that received an interview invitation. (Anglia Ruskin University)
- 24%: Wage premium for jobs that require AI specialist skills in some markets. (PwC)
- 3.5x: Multiple by which jobs that require specialist AI skills have grown faster than all jobs since 2012. (PwC)
- 4.8x: Multiple by which labor productivity has grown in AI-exposed sectors versus other sectors. (PwC)
- 27%: Percentage by which jobs are growing more slowly in AI-exposed occupations. (PwC)
- 200%: Percentage reduction in false positives in Mastercard's detection of fraudulent transactions on potentially compromised cards. (Mastercard)
- 300%: Percentage increase in the speed at which Mastercard identifies merchants at risk from, or compromised by, fraudsters. (Mastercard)
- $14,000,000,000: Worldwide revenue from generative AI in 2020. (Bloomberg via World Economic Forum)
- $1,304,000,000,000: Estimated worldwide revenue from generative AI in 2032. (Bloomberg via World Economic Forum)
- 25%: Percentage of webpages that existed at some point between 2013 and 2023 that are no longer accessible. (Pew Research)
- 38%: Percentage of webpages from 2013 that are no longer accessible. (Pew Research)

#ai #artificialintelligence #generativeai #airesearch #complexity #chatgpt #complexchange #changemanagement #futureofwork #artificiality #mindforourminds #intimacyeconomy #agenticweb

Facts & Figures about AI and Complex Change

artificiality.world

1 Comment
Robert Blumofe
7mo
Both OpenAI and Google have recently announced updates to their respective generative AI platforms, and with all the buzz around the releases, I want to share my high-level impressions of the latest and greatest. As best as I can tell, the focus has shifted to enhancing the user interfaces, while the AI models themselves show minimal, if any, improvement. The new user interfaces are certainly exciting: with multi-modal input and output, they are elegant and fun. Even without significant advancements in the underlying LLMs, these changes will make interacting with AI tools more intuitive than ever. I wonder, though, how much of this is a great demo and fun, and how much is genuinely useful? I suppose time will tell. What did you think of the latest updates? Let me know in the comments.

4 Comments
Dr Bhavin Parekh

Project Fellow@ DVIKAR@Gujarat University | PhD in Medicine (Cambridge University, UK)
3w
In the fast-moving world of AI, competition is heating up, and nowhere is this more evident than in the battle over advanced reasoning models. In just the past few days, three new AI models from Chinese developers (DeepSeek R1 from High-Flyer Capital Management, Marco-o1 from Alibaba, and OpenMMLab’s hybrid model) have entered the fray, challenging OpenAI’s o1-preview in performance and accessibility. These releases highlight how quickly open-source innovation is catching up to proprietary giants like OpenAI, whose o1-preview model set a new benchmark for complex reasoning tasks when it was released in mid-September. With OpenAI expected to unveil its next release as early as next week, the pressure is mounting on it to prove its dominance isn’t slipping. https://lnkd.in/dBg-qu3Z

OpenAI faces critical test as Chinese models close the gap in AI leadership

https://venturebeat.com
Slashdot Media

18,345 followers
4mo
Google Gemini 1.5 Pro Leaps Ahead In AI Race, Challenging GPT-4o: An anonymous reader quotes a report from VentureBeat:

Google launched its latest artificial intelligence powerhouse, Gemini 1.5 Pro, today, making the experimental "version 0801" available for early testing and feedback through Google AI Studio and the Gemini API. This release marks a major leap forward in the company's AI capabilities and has already sent shockwaves through the tech community.

The new model has quickly claimed the top spot on the prestigious LMSYS Chatbot Arena leaderboard (built with Gradio), boasting an impressive Elo score of 1300. This achievement puts Gemini 1.5 Pro ahead of formidable competitors like OpenAI's GPT-4o (Elo: 1286) and Anthropic's Claude-3.5 Sonnet (Elo: 1271), potentially signaling a shift in the AI landscape.

Simon Tokumine, a key figure in the Gemini team, celebrated the release in a post on X.com, describing it as "the strongest, most intelligent Gemini we've ever made." Early user feedback supports this claim, with one Redditor calling the model "insanely good" and expressing hope that its capabilities won't be scaled back.

"A standout feature of the 1.5 series is its expansive context window of up to two million tokens, far surpassing many competing models," adds VentureBeat. "This allows Gemini 1.5 Pro to process and reason about vast amounts of information, including lengthy documents, extensive code bases, and extended audio or video content."

Read more of this story at Slashdot.
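The Elo figures quoted above are easier to read as head-to-head odds. Below is a minimal sketch, assuming the textbook Elo expected-score formula E = 1 / (1 + 10^((R_B - R_A) / 400)); Chatbot Arena fits its ratings with its own statistical procedure, so treat the results as an approximation. The names `elo_expected_score` and `RATINGS` are hypothetical, for illustration only.

```python
def elo_expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard Elo logistic model.

    Ignoring ties, this is roughly the probability that A is preferred
    over B in a single head-to-head comparison.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Elo scores as quoted in the VentureBeat report.
RATINGS = {
    "Gemini 1.5 Pro (0801)": 1300,
    "GPT-4o": 1286,
    "Claude 3.5 Sonnet": 1271,
}


def main() -> None:
    leader = "Gemini 1.5 Pro (0801)"
    for rival, rating in RATINGS.items():
        if rival == leader:
            continue
        # Expected share of head-to-head votes won by the leader.
        p = elo_expected_score(RATINGS[leader], rating)
        print(f"{leader} vs {rival}: expected score {p:.3f}")


if __name__ == "__main__":
    main()
```

Under this assumption, the 14-point gap to GPT-4o works out to an expected score of about 0.520 and the 29-point gap to Claude 3.5 Sonnet to about 0.542, i.e. only slightly better than a coin flip in any single head-to-head vote.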
