Ivan Chan’s Post

Ivan Chan

AI Copywriter

The Inference Scaling Revolution: A Shift in Focus

**The emergence of OpenAI Strawberry (o1) marks a significant shift in the paradigm of large language models (LLMs).** By prioritizing inference-time scaling, o1 challenges the conventional wisdom that larger models are always better. This shift is rooted in the understanding that reasoning, rather than memorization, is the key to effective AI.

**1. Decoupling Reasoning from Knowledge:**
* **Efficiency:** A smaller "reasoning core" can be more efficient, as it focuses solely on logical processes.
* **Flexibility:** This decoupling allows for easier integration of external tools and knowledge bases.
* **Reduced Pre-training:** By reducing the reliance on pre-trained knowledge, the computational cost of training can be significantly lowered.

**2. Inference-Time Scaling:**
* **Iterative Refinement:** Similar to AlphaGo's MCTS, LLMs can explore multiple strategies and scenarios to find optimal solutions.
* **Learning from Experience:** By simulating various situations, the model can learn and improve its reasoning abilities over time.
* **Computational Efficiency:** While inference-time scaling requires additional compute, it can be more efficient than scaling model parameters, especially for specific tasks.

**3. The Inference Scaling Law:**
* **Recent Discoveries:** Recent research papers by Brown et al. and Snell et al. highlight the effectiveness of scaling inference compute.
* **OpenAI's Foresight:** Given their early success with o1, it is likely that OpenAI has been exploring this approach for some time.

**4. Challenges of Productionization:**
* **Decision-Making:** Determining when to stop searching, setting appropriate reward functions, and deciding when to call external tools are complex challenges.
* **Computational Cost:** Balancing the benefits of inference-time scaling with the computational costs of external processes is crucial.
* **Lack of Detail:** OpenAI's research post has not provided sufficient information about their specific approaches to these challenges.

**5. A Data Flywheel:**
* **Continuous Improvement:** By using the search traces as training data, o1 can continuously refine its reasoning core.
* **Positive and Negative Feedback:** The training data includes both positive and negative examples, providing a rich learning environment.
* **Similarities to AlphaGo:** The process is analogous to how AlphaGo's value network improved through MCTS-generated training data.

In conclusion, **o1 represents a paradigm shift in LLM development.** By focusing on inference-time scaling and decoupling reasoning from knowledge, it opens up new possibilities for creating more efficient and capable AI systems. The challenges of productionization will require further research and development, but the potential benefits are significant.
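
To make the repeated-sampling flavor of inference-time scaling concrete, here is a minimal Python sketch of a best-of-N loop with an external verifier. Everything in it is hypothetical: `generate_candidate` and `verify` are stand-ins for an LLM call and an external checker (e.g. a code verifier), not OpenAI's actual method.

```python
import random

def generate_candidate(prompt: str) -> str:
    """Hypothetical stand-in for one stochastic LLM sample."""
    return f"candidate answer to {prompt!r} (seed={random.random():.3f})"

def verify(candidate: str) -> float:
    """Hypothetical external verifier (e.g. run unit tests, check the math).
    Returns a score in [0, 1]; random here purely for illustration."""
    return random.random()

def best_of_n(prompt: str, n: int = 16) -> str:
    """Spend inference compute on n samples and keep the best-scoring one.
    More samples -> more compute -> higher chance a good answer appears."""
    candidates = [generate_candidate(prompt) for _ in range(n)]
    return max(candidates, key=verify)

if __name__ == "__main__":
    print(best_of_n("What is 17 * 23?", n=8))
```

The design point: compute moves from parameters to samples. A fixed small model queried n times against a verifier trades inference compute for accuracy, which is the pattern the repeated-sampling papers measure.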

Jim Fan

NVIDIA Senior Research Manager & Lead of Embodied AI (GEAR Lab). Stanford Ph.D. Building Humanoid Robots and Physical AI. OpenAI's first intern. Sharing insights on the bleeding edge of AI.

OpenAI Strawberry (o1) is out! We are finally seeing the paradigm of inference-time scaling popularized and deployed in production. As Sutton said in the Bitter Lesson, there are only two techniques that scale indefinitely with compute: learning and search. It's time to shift focus to the latter.

1. You don't need a huge model to perform reasoning. Lots of parameters are dedicated to memorizing facts, in order to perform well on benchmarks like trivia QA. It is possible to factor out reasoning from knowledge, i.e. a small "reasoning core" that knows how to call tools like a browser and a code verifier. Pre-training compute may be decreased.

2. A huge amount of compute is shifted to serving inference instead of pre-/post-training. LLMs are text-based simulators. By rolling out many possible strategies and scenarios in the simulator, the model will eventually converge to good solutions. The process is a well-studied problem, like AlphaGo's Monte Carlo Tree Search (MCTS).

3. OpenAI must have figured out the inference scaling law a long time ago, which academia is only recently discovering. Two papers came out on arXiv a week apart last month:
- Large Language Monkeys: Scaling Inference Compute with Repeated Sampling. Brown et al. find that DeepSeek-Coder increases from 15.9% with one sample to 56% with 250 samples on SWE-Bench, beating Sonnet-3.5.
- Scaling LLM Test-Time Compute Optimally Can Be More Effective than Scaling Model Parameters. Snell et al. find that PaLM 2-S beats a 14x larger model on MATH with test-time search.

4. Productionizing o1 is much harder than nailing the academic benchmarks. For reasoning problems in the wild, how do you decide when to stop searching? What's the reward function? The success criterion? When should tools like a code interpreter be called in the loop? How do you factor in the compute cost of those CPU processes? Their research post didn't share much.

5. Strawberry easily becomes a data flywheel. If the answer is correct, the entire search trace becomes a mini dataset of training examples, which contain both positive and negative rewards. This in turn improves the reasoning core for future versions of GPT, similar to how AlphaGo's value network, used to evaluate the quality of each board position, improves as MCTS generates more and more refined training data.
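
As a rough illustration of point 5, here is a minimal sketch of the data-flywheel idea: every scored rollout, right or wrong, becomes a labeled training example. `sample_reasoning_trace` is a hypothetical stand-in for an LLM rollout, and the reward scheme is an assumption for illustration, not OpenAI's pipeline.

```python
import json
import random
from typing import List, Tuple

def sample_reasoning_trace(prompt: str) -> Tuple[List[str], str]:
    """Hypothetical: roll out one chain of reasoning steps plus a final answer."""
    steps = [f"step {i} for {prompt!r}" for i in range(random.randint(2, 5))]
    return steps, random.choice(["42", "wrong"])

def flywheel(prompt: str, gold: str, n_rollouts: int = 32) -> List[dict]:
    """Turn every rollout into a training example: traces ending in the
    correct answer get a positive reward, incorrect ones a negative reward,
    so the reasoning core can learn from both kinds of trace."""
    dataset = []
    for _ in range(n_rollouts):
        steps, answer = sample_reasoning_trace(prompt)
        reward = 1.0 if answer == gold else -1.0
        dataset.append({"prompt": prompt, "trace": steps,
                        "answer": answer, "reward": reward})
    return dataset

if __name__ == "__main__":
    examples = flywheel("What is 6 * 7?", gold="42")
    print(json.dumps(examples[0], indent=2))
```

Both correct and incorrect traces land in the dataset with opposite rewards, mirroring how AlphaGo's value network trained on the outcomes of both won and lost self-play games.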
