I invite you to update your beliefs on the progress of AI ASAP. OpenAI's o3 can be considered a first AGI by various measures.

Software engineering skills:
* 71.7% on SWE-bench, an extremely challenging benchmark that mimics a software engineer's work much more closely than HumanEval et al.
* It achieved an Elo of 2727 on Codeforces!! I assume that only a few people seeing this post are anywhere close to this range, and my audience is very much a biased sample full of PhDs. This is top 200 humans! :')

Math skills:
* Went from 2% to 25.2% on FrontierMath!! I would bet almost everyone here is closer to 0 than to double digits. This benchmark was compiled by some of the best mathematicians in the world (including Fields medalists). From Terence Tao himself: "These are extremely challenging... I think they will resist AIs for several years at least."

And on arguably the best benchmark measuring true intelligence, ARC-AGI, it achieves higher than 85%, which is higher than a median human.

Progress on ARC-AGI:
* GPT-2 (2019): 0%
* GPT-3 (2020): 0%
* GPT-4 (2023): 2%
* GPT-4o (2024): 5%
* o1-preview (2024): 21%
* o1 high (2024): 32%
* o1 Pro (2024): ~50%
* o3 tuned low (2024): 76%
* o3 tuned high (2024): 87%

The model still fails on some tasks that are easy for humans, but let's not use that to discount this achievement. Be aware of Moravec's paradox, the X self-awareness test we use for animals, and all the failure modes of trying to evaluate what is essentially an alien intelligence.

Crazy times.
If a human scores so high on ARC-AGI, why are we talking about AGI? This simply means that o3 is able to do tasks that AI had a blind spot on, like learning to walk, which is very easy for humans.
Aleksa Gordić, let me know if you get access for the safety check. Would love to understand what it can do.
+2200 on CF is crazy. I didn't expect they could achieve it that fast.
I tested LLMs on code optimization tasks a while ago and they sucked... But their Codeforces rank is scary.
I'm afraid after o3... The model is phenomenal.
This is top 200 among the humans who do Codeforces.
Thanks, Aleksa Gordić, this is a very informative post about this big milestone.
I wonder where o2 is.