Archishman Sengupta’s Post

SDE - 1 Fullstack @StackWealth (YC S21) | ICPC Regionalist | 3x YC dev

AGI is almost here. OpenAI's o3 high-compute model scored 87.5% on the ARC-AGI Semi-Private Eval. The key thing about this benchmark is that it cannot be learned in advance, because every test has new conditions; models had been stuck at 30-55%. o3 essentially solves AIME (>90%), GPQA (~90%), and ARC-AGI (~90%), and it gets a quarter of FrontierMath.

Semi-private v1 scores:
GPT-2 (2019): 0%
GPT-3 (2020): 0%
GPT-4 (2023): 2%
GPT-4o (2024): 5%
o1-preview (2024): 21%
o1 high (2024): 32%
o1 Pro (2024): ~50%
o3 tuned low (2024): 76%
o3 tuned high (2024): 88%

Again, ARC-AGI is a really, really hard challenge: easier for humans but extremely hard for LLMs. This is a historic moment we are witnessing as we go deeper into building more intelligent beings. Truly exciting times.
