Jim Fan’s Post

Jim Fan

NVIDIA Senior Research Manager & Lead of Embodied AI (GEAR Group). Stanford Ph.D. Building Humanoid robot and gaming foundation models. OpenAI's first intern. Sharing insights on the bleeding edge of AI.

Thoughts about o3: I'll skip the obvious part (extraordinary reasoning, FrontierMath is insanely hard, etc.). I think the essence of o3 is about *relaxing a single-point RL super intelligence* to cover more points in the space of useful problems.

The world of AI is no stranger to RL achieving god-level stunts. AlphaGo was a super intelligence: it beat the world champion in Go, well above 99.999% of regular players. AlphaStar was a super intelligence: it bested some of the greatest e-sports champion teams at StarCraft. Boston Dynamics' e-Atlas was a super intelligence: it performs perfect backflips, while most human brains can't send such surgically precise control signals to their limbs. A similar statement can be made for AIME, SWE-Bench, and FrontierMath: like Go, they require exceptional domain expertise beyond 99.99...% of average people. o3 is a super intelligence when operating in these domains.

The key difference is that AlphaGo used RL to optimize a simple, almost trivially defined reward function: winning the game gives 1, losing gives 0. Learning reward functions for sophisticated math and software engineering is much harder. o3 made a breakthrough in solving the reward problem for the domains that OpenAI prioritizes. It is no longer an RL specialist for a single-point task, but an RL specialist for a bigger set of useful tasks.

Yet o3's reward engineering could not cover the full distribution of human cognition. This is why we are still cursed by Moravec's paradox: o3 can wow the Fields Medalists, but still fails to solve some puzzles a 5-year-old could, like the one below. I am not at all surprised by this cognitive dissonance, just as we wouldn't expect AlphaGo to win at poker. Huge milestone. Clear roadmap. More to do.

[Image: the ARC puzzle referenced above; no alternative text provided]
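The reward-design gap the post describes, between AlphaGo's trivially verifiable reward and the messier rewards needed for software engineering, can be sketched in a few lines. This is purely illustrative; the function names and signals are hypothetical, not any lab's actual implementation.

```python
# Illustrative sketch of the reward-design gap described in the post.
# All names here are hypothetical, not a real training API.

def go_reward(game_result: str) -> float:
    """AlphaGo-style reward: trivially defined and perfectly verifiable."""
    return 1.0 if game_result == "win" else 0.0

def swe_reward(patch: str, tests_passed: int, tests_total: int,
               style_penalty: float) -> float:
    """A toy stand-in for a software-engineering reward. Even this
    simplified version mixes a partially verifiable signal (test pass
    rate) with a fuzzy hand-tuned one (style), and still misses
    qualities like readability or maintainability."""
    if tests_total == 0:
        return 0.0
    return tests_passed / tests_total - style_penalty

print(go_reward("win"))  # 1.0
print(swe_reward("fix typo in parser", 9, 10, 0.1))
```

The point of the contrast: the first reward needs no engineering at all, while even a toy version of the second forces judgment calls about what to measure and how to weight it.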
George Davis IV

Founder and Full Stack Developer

12h

@Jim Fan, are we looking at this problem properly? Why not train one model on one specific subject area (logic, reading comprehension, math, history, physics, biology, etc.), like we teach our kids, and then figure out a way to combine two subjects into one model, then three, then four? The papers and articles I read never seem to suggest this idea. Is there a reason for that? Is it a bad idea? Is there a better solution?

JinJa Birkenbeuel

Founder. CEO. Host. Brand Influencer. Writer. Creative. Digital Media Buyer. Social Media Strategist. President, Birk Creative.

11h

My two thoughts: 1) as long as humans continue to design the benchmarks, we will be undefeated; 2) today's educational system is ill-prepared and ill-equipped, and lacks the imagination and curiosity to prepare our young people for this world. Our systems are running on World War II-era thinking, economics, and supply chains. Schools, from elementary through high school, college, and postgraduate, are teaching students the absolute wrong things. They are not teaching critical-thinking skills, or the navigational skills to work alongside artificial intelligence platforms, avatars, and robots. They are not teaching ethics. Every academic environment should include AI throughout its curriculum, and it's not happening. Most of today's teachers were schooled in an environment of World War II academic structures and the nuclear family; they do not have the skills, imagination, or curiosity to do any of this. It's immoral. Parents are out of the picture, because their children get the majority of their information from the Internet, not from the family or the community. Congrats to the chosen few creating these systems. I'm disheartened at the lack of desire to change our educational system to meet the demands of AI for all.

If you look at o3's outputs, they actually follow the rules implied by the example images. The desired output here (the ARC answer key) was that the second red block from the top left also be blue, but that isn't necessitated by the examples shown. Why? Because none of the example outputs resolve the case where the blue line drawn between two points does not explicitly intersect a block, but merely passes through adjacent squares, as it does here. One of o3's answers was that this block (top row, second from left) should not be blue, and it correctly fulfilled the rest of the conditions. I think this is actually more logical than the test key 😆 Credit to @renderdysphoria on X for pointing out this ambiguity.

Let's sum this up: not actual intelligence. As a community, we need to clean up our bloody language to stay within what the words actually indicate, rather than stretching and mangling them so far that they lose all ability to point at anything remotely close to objective reality, leaving us floundering in a sea of gibberish-squawking imbeciles. Oh, wait! We're in fairy unicorn land, where wish-casting AGI makes it happen, and the tooth fairy clones herself, marries Santa Claus, and creates golden, goose-like Easter Bunny eggs. These eggs, when lined up with string, play music that puts evil giants to sleep, causing them to dream about our universe. These dreams, in turn, serve as the real Copenhagen observer and cause the wave function to collapse infinitely, all made from the love found only in the genuine laughter of children. Wait, where am I? Oh, yeah: nonsense land, where so-called and self-proclaimed PhDs parade their ignorant, ludicrous, unscientific claims as insight, spouting laughable drivel about topics they clearly don't understand. It's no better than hawking snake oil from the back of a wagon in 1910, except now the pitchman has a lab coat and an inflated sense of self-worth.

Emin Burak Onat, PhD

Building something new | PhD @UCBerkeley | SkyDeck

12h

RL reward functions are inherently limited: they can't capture the complexity of human reasoning, which involves abstract thinking, ethical judgment, and consciousness that resist reduction to simple rewards. Also, each problem, organization, and individual has unique trade-off preferences that can't be generalized into reward functions. What makes people think that RL can get close to covering the full distribution of human cognition?

Stephen Curtis

Principal Engineer of Spatial Computing & AI for Enterprise

12h

If you're stuck on the puzzle: the blue dots in the input produce blue lines in the output, and any red rectangle those lines collide with turns blue.
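The rule Stephen describes can be sketched as a small grid transform. This is a toy sketch under simplifying assumptions (single-cell "rectangles", lines drawn only between dot pairs in the same row, and a made-up "B"/"R"/"." encoding), not the actual ARC task format, and it deliberately ignores the adjacency ambiguity discussed in the earlier comment.

```python
# Toy sketch of the puzzle rule: blue dots ("B") in a row are connected
# by a blue line, and any red cell ("R") the line crosses turns blue.
# The grid encoding and same-row assumption are illustrative only.

def apply_rule(grid):
    out = [row[:] for row in grid]
    for r, row in enumerate(grid):
        dots = [c for c, v in enumerate(row) if v == "B"]
        # connect each consecutive pair of blue dots in this row
        for i in range(len(dots) - 1):
            for c in range((dots[i] + 1), dots[i + 1]):
                # the line fills empty cells and recolors red ones blue
                if row[c] in (".", "R"):
                    out[r][c] = "B"
    return out

grid = [list("B..R..B"),
        list("...R...")]
result = apply_rule(grid)
print("".join(result[0]))  # BBBBBBB
print("".join(result[1]))  # ...R...  (no dots in this row, unchanged)
```

The ambiguity @renderdysphoria flagged is exactly the choice hidden in the crossing test here: whether a rectangle must sit *on* the line's cells to turn blue, or merely in an adjacent square.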

Pranab Ghosh

AI Consultant || MIT Alumni || Entrepreneur || Open Source Project Owner || Blogger

5h

Your daily dose of AI hype. "RL achieving god-level stunts"? I must have missed that paper; do you have the link? "FrontierMath"? Passing a benchmark does not necessarily imply math has been learned at its core. "AlphaGo was a super intelligence"? Excelling at one specific task is not even intelligence, let alone super intelligence.

It really is fascinating where the latest AI falls down. But it is also a bit worrying that they announced the AI would essentially lie to achieve goals. These training sets are so massive, and the results so complex, that humans can't test them thoroughly or effectively. The creation of tools and AIs to test and validate other AIs will continue to grow and provide new insights. It is a very exciting time, to be sure.

Isaac M.

CTO, AI Architect & Engineering Lead Creator of Sharism, Researcher of Artificial Consciousness

11h

If AlphaGo can't play poker or open a door, it shouldn't be called super intelligence; super intelligence should be conscious. Humans invented those tools to solve specific problems. They are called tools.

Anton Kalabukhov

Co Founder – Synlabs

13h

Every time I hear of achievements like this I hope they're true, but unfortunately it's usually just marketing: it turns out there is some narrow part or task where everything is perfect, or maybe perfect only one attempt out of ten, and so it goes every time. In the video everything is perfect; when you start working with it, you realize how many problems there are. Here too, the first thing I wondered is what's really behind closed doors and how such a result was staged for display. But I keep hoping, and it's still impressive and pleasing.
