Aleksa Gordić’s Post

Aleksa Gordić

Founder mode | x-Google DeepMind / Microsoft

I invite you to update your beliefs on the progress of AI asap. OpenAI's o3 can be considered a first AGI by various measures.

Software engineering skills:
* 71.7% on SWE-bench, an extremely challenging SWE benchmark that closely mimics a software engineer's work (much more closely than HumanEval et al.)
* It achieved an Elo of 2727 on Codeforces!! I assume that maybe only a few people seeing this post are anywhere close to this range. And my audience is very much a biased sample full of PhDs. This is top 200 humans! :')

Math skills:
* Went from 2% -> 25.2% on FrontierMath!! I would bet almost everyone here is closer to 0 than to double digits. This benchmark was compiled by some of the best mathematicians in the world (including Fields medalists). From Terence Tao himself: "These are extremely challenging... I think they will resist AIs for several years at least."

And on arguably the best benchmark measuring true intelligence, ARC-AGI, it achieves higher than 85%, higher than a median human.

Progress on ARC-AGI:
GPT-2 (2019): 0%
GPT-3 (2020): 0%
GPT-4 (2023): 2%
GPT-4o (2024): 5%
o1-preview (2024): 21%
o1 high (2024): 32%
o1 Pro (2024): ~50%
o3 tuned low (2024): 76%
o3 tuned high (2024): 87%

The model still fails on some tasks that are easy for humans, but let's not use that to discount this achievement. Be aware of Moravec's paradox and the X self-awareness test we use for animals and all the failure modes of trying to eval what is essentially an alien intelligence. Crazy time.

Eteimorde Youdiowei

AI Engineer | tinkering with Lightweight Models

2h

I wonder where o2 is.

Takudzwa Makusha

Senior Machine learning engineer at The Shoprite Group of Companies

2h

If a human scores so high on ARC-AGI, why are we talking about AGI? This simply means that o3 is able to do tasks that AI had a blind spot on, like learning to walk, which is very easy for humans.

Aleksa Gordić, let me know if you get access for the safety check. Would love to understand what it can do.

Vahid Sanei

Senior Software Engineer at RMS

3h

+2200 on CF is crazy. I didn't expect they could achieve it that fast.

Momal Ijaz

ML Engineer | Helping AI/ML enthusiasts grow on LinkedIn.

2h

I tested LLMs on code optimization tasks a while ago and they sucked... But their Codeforces rank is scary

Devanshu Litoria

Generative AI@Adobe - playing with LLMs right now

5h

I'm afraid after o3... The model is phenomenal

Ivan Vandot

Glorified System Administrator

2h

This is top 200 among the humans who do Codeforces.

Alex Alvaro

Senior Renewals Account Representative at OpenText

50m

Thanks, Aleksa Gordić, this is a very informative post about this big milestone.
