Aleksa Gordić on LinkedIn: I invite you to update your beliefs on the progress of AI asap. OpenAI's…

Founder mode | x-Google DeepMind / Microsoft


I invite you to update your beliefs on the progress of AI asap. OpenAI's o3 can be considered a first AGI by various measures.

Software engineering skills:
* 71.7% on SWE-bench, an extremely challenging benchmark that closely mimics a software engineer's work (much more closely than HumanEval et al.)
* It achieved an Elo of 2727 on Codeforces!! I assume only a few people seeing this post are anywhere close to this range, and my audience is very much a biased sample full of PhDs. This is top 200 humans! :')

Math skills:
* Went from 2% to 25.2% on FrontierMath!! I would bet almost everyone here is closer to 0 than to double digits. This benchmark was compiled by some of the best mathematicians in the world (including Fields Medalists). From Terence Tao himself: "These are extremely challenging... I think they will resist AIs for several years at least."

And on ARC-AGI, arguably the best benchmark for measuring true intelligence, it scores higher than 85%, above the median human.

Progress on ARC-AGI:
* GPT-2 (2019): 0%
* GPT-3 (2020): 0%
* GPT-4 (2023): 2%
* GPT-4o (2024): 5%
* o1-preview (2024): 21%
* o1 high (2024): 32%
* o1 Pro (2024): ~50%
* o3 tuned low (2024): 76%
* o3 tuned high (2024): 87%

The model still fails on some tasks that are easy for humans, but let's not use that to discount this achievement. Be aware of Moravec's paradox, the mirror self-awareness test we use for animals, and all the failure modes of trying to evaluate what is essentially an alien intelligence.

Crazy time.

15 Comments

Eteimorde Youdiowei

AI Engineer | tinkering with Lightweight Models

I wonder where o2 is.

Takudzwa Makusha

Senior Machine learning engineer at The Shoprite Group of Companies

If humans score so high on ARC-AGI, why are we talking about AGI? This simply means that o3 is able to do tasks that AI had a blind spot on, like learning to walk, which is very easy for humans.