🌐 Teaching models to navigate the web with play and exploration:

📑 Paper: https://lnkd.in/eysAffnX
💻 Code: https://lnkd.in/e7UFUExz

"WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning"

🤖 "Large language models (LLMs) have shown remarkable potential as autonomous agents, particularly in web-based tasks. However, existing LLM web agents heavily rely on expensive proprietary LLM APIs, while open LLMs lack the necessary decision-making capabilities.

🔄 "This paper introduces WebRL, a self-evolving online curriculum reinforcement learning framework designed to train high-performance web agents using open LLMs. WebRL addresses three key challenges in building LLM web agents, including the scarcity of training tasks, sparse feedback signals, and policy distribution drift in online learning. Specifically, WebRL incorporates:

1️⃣ a self-evolving curriculum that generates new tasks from unsuccessful attempts
2️⃣ a robust outcome-supervised reward model (ORM)
3️⃣ adaptive reinforcement learning strategies to ensure consistent improvements

📈 "We apply WebRL to transform open Llama-3.1 and GLM-4 models into proficient web agents. On WebArena-Lite, WebRL improves the success rate of Llama-3.1-8B from 4.8% to 42.4%, and from 6.1% to 43% for GLM-4-9B. These open models significantly surpass the performance of GPT-4-Turbo (17.6%) and GPT-4o (13.9%) and outperform previous state-of-the-art web agents trained on open LLMs (AutoWebGLM, 18.2%).

🎯 "Our findings demonstrate WebRL's effectiveness in bridging the gap between open and proprietary LLM-based web agents, paving the way for more accessible and powerful autonomous web interaction systems."
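Not from the paper itself, but as a rough mental model of the three ingredients the abstract names (tasks generated from failures, an ORM success signal, an adaptive policy update), here is a minimal Python sketch of a self-evolving curriculum loop. Every hook (run_agent, orm_score, update_policy, propose_tasks) is a hypothetical placeholder, not WebRL's actual API.

```python
from typing import Callable, List

def self_evolving_curriculum(
    policy,
    seed_tasks: List[str],
    run_agent: Callable,      # placeholder: (policy, task) -> trajectory
    orm_score: Callable,      # placeholder: (task, trajectory) -> success score in [0, 1]
    update_policy: Callable,  # placeholder: (policy, rollouts) -> updated policy
    propose_tasks: Callable,  # placeholder: (failed tasks) -> new candidate tasks
    n_rounds: int = 10,
):
    """Toy loop: roll out tasks, score outcomes with an ORM, update the policy,
    and grow the task pool from unsuccessful attempts."""
    task_pool = list(seed_tasks)
    for _ in range(n_rounds):
        rollouts, failures = [], []
        for task in task_pool:
            traj = run_agent(policy, task)        # agent interacts with the web environment
            reward = orm_score(task, traj)        # outcome-supervised reward model gives a success signal
            rollouts.append((task, traj, reward))
            if reward < 0.5:
                failures.append(task)
        policy = update_policy(policy, rollouts)  # adaptive RL update on the collected rollouts
        # self-evolving curriculum: derive new tasks from the unsuccessful attempts
        task_pool = failures + propose_tasks(failures)
    return policy
```

The real system also has to cope with policy distribution drift during online learning; in this toy sketch that concern is hidden inside the update_policy hook.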
Kentauros AI
Software Development
Wilmington, Delaware · 167 followers
See us on the web: https://www.kentauros.ai/ Come talk with us on Discord: https://discord.gg/hhaq7XYPS6
About us
Build, deploy, and share AI agents with ease on the AgentSea platform.
- Website
- https://kentauros.ai
- Industry
- Software Development
- Company size
- 2-10 employees
- Headquarters
- Wilmington, Delaware
- Type
- Privately Held
- Founded
- 2023
Locations
- Primary: Wilmington, Delaware 19808, US
Updates
-
Great article on different training strategies: https://lnkd.in/eEX4MH7x...
New LLM Pre-training and Post-training Paradigms
magazine.sebastianraschka.com
-
A fantastic counterpoint to the question of whether reasoning models can really reason, and to the prevailing industry wisdom around test-time compute models. Incredibly well written and worth a read. https://lnkd.in/dB9h-2fW
The Problem with Reasoners | Aidan McLaughlin
aidanmclaughlin.notion.site
-
Phi-4 technical report on the 14B-parameter small wonder of a model that punches well above its weight. The paper is a testament to synthetic data, strong organic data, and careful data cleaning. People often assume there is some magical technique behind AI advances, but as Dario Amodei said in his Lex Fridman interview, more often than not it is some "improvement in infrastructure that lets us train longer and more reliably, or better data or better ways to clean data."

Phi-4 is "a 14-billion parameter model that further advances performance of small language models by introducing innovative synthetic data generation methods for reasoning-focused tasks, by optimizing the training curriculum and data mixture, and by introducing new techniques in post-training.

"Synthetic data constitutes the bulk of the training data for phi-4 and is generated using a diverse array of techniques, including multi-agent prompting, self-revision workflows, and instruction reversal.

"These methods enable the construction of datasets that induce stronger reasoning and problem-solving abilities in the model, addressing some of the weaknesses in traditional unsupervised datasets. Synthetic data in phi-4 also plays a crucial role in post-training, where techniques such as rejection sampling and a novel approach to Direct Preference Optimization (DPO) are employed to refine the model's outputs."

https://lnkd.in/dsekkNmu
Phi-4 Technical Report | alphaXiv
alphaxiv.org
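One of those post-training ingredients, rejection sampling, is easy to picture: draw several candidate answers per prompt, keep only the ones an external checker accepts, and train on the survivors. The sketch below is a generic illustration of that idea with my own placeholder hooks (generate, is_correct), not the phi-4 team's pipeline.

```python
from typing import Callable, List, Tuple

def rejection_sample_dataset(
    prompts: List[str],
    generate: Callable[[str], str],          # placeholder: one model completion per call
    is_correct: Callable[[str, str], bool],  # placeholder verifier, e.g. a unit test or exact match
    samples_per_prompt: int = 8,
) -> List[Tuple[str, str]]:
    """Keep only (prompt, answer) pairs whose answers pass the verifier."""
    kept = []
    for prompt in prompts:
        for _ in range(samples_per_prompt):
            candidate = generate(prompt)
            if is_correct(prompt, candidate):
                kept.append((prompt, candidate))
                break  # one verified answer per prompt is enough for SFT-style data
    return kept
```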
-
A great, quick video on using the new Gemini 2.0 real-time API. Amazing stuff that was sci-fi only last year. https://lnkd.in/dmH4SzQm
Gemini 2.0 - How to use the Live Bidirectional API
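For anyone who would rather read code than watch: a minimal text-only sketch using the google-genai Python SDK's Live API as documented around the Gemini 2.0 launch. The model name, config keys, and session methods (connect/send/receive) are assumptions based on that launch-era quickstart and may have changed, so check the current docs before relying on it.

```python
# Minimal text-only sketch of the Gemini 2.0 Live (bidirectional) API.
# Assumes the google-genai SDK's launch-era surface: client.aio.live.connect,
# session.send, session.receive. Names and model IDs may have changed since.
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")   # placeholder key
config = {"response_modalities": ["TEXT"]}      # audio is also supported

async def main() -> None:
    async with client.aio.live.connect(model="gemini-2.0-flash-exp", config=config) as session:
        await session.send(input="Describe bidirectional streaming in one sentence.", end_of_turn=True)
        async for response in session.receive():
            if response.text:
                print(response.text, end="")

asyncio.run(main())
```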
-
A fantastic little project to rewrite SQLite in Rust. https://lnkd.in/d7V-XqV3
Introducing Limbo: A complete rewrite of SQLite in Rust
turso.tech
-
The Veo 2 video platform from Google looks incredible. But will tomorrow's games be almost entirely procedurally generated?

At Kentauros, we think many folks have gotten a bit ahead of themselves in thinking that AI will just magically generate incredible stories, characters you connect with deeply and emotionally, twisting mysterious plots, and powerful multi-layered storytelling. It will happen, but it will take many, many years. People won't simply prompt a masterpiece like Arcane into existence, because they don't understand shot composition, a unique style, how to layer on character depth, pitch-perfect dialogue, narrative flow, or musical composition, to name a few. In short, regular folks don't know what looks and sounds good, even if they recognize it once it's *already* been created. Big difference. Even with a super magical and perfect AI video editor, you still need to know how to craft a good shot and tell a great story.

This will be just like the self-publishing e-book revolution. We just got a lot more books. Most of them were sub-par, but there were some breakthroughs, like Wool/Silo, that are true masterpieces. The same will happen with video and film-making. The people who will benefit most from these tools will be artists with an amazing eye for story, character, shots, details, plot, and emotional resonance. In a decade, the vast majority of artists will just be using these tools and not thinking twice about it, the way folks went from physically cutting film to digital editing.

https://lnkd.in/g5qvyk-k
Veo 2 demo | Flamingos
https://www.youtube.com/
-
Finally, the amazing Allen Institute releases the corpus of training data and training scripts behind their breakthrough Molmo model (a truly open multimodal model), which has tremendous accuracy returning point data on objects (among other things) because it was trained on 2 million custom image pairs.

We desperately need more truly open models, meaning an open model, open data, and open training scripts, but with an increasingly hostile regulatory environment for open source, the Allen Institute is one of the few teams brave enough to do it. We need more open source champions like them, because this is how machine learning really advances. It's a real loss for the world when we have only closed models that tell us nothing about the architecture and how it was trained.

https://lnkd.in/dPZEKSi7
PixMo - a allenai Collection
huggingface.co
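If you want to poke at the released data yourself, the Hugging Face datasets library is the quickest route. A minimal sketch, assuming the pointing subset in the linked collection is published as allenai/pixmo-points (swap in whichever PixMo subset you actually want):

```python
# Minimal sketch: stream a few rows of a PixMo subset from the Hugging Face Hub.
# "allenai/pixmo-points" is an assumed dataset ID from the linked collection;
# substitute the subset (captions, points, docs, ...) you want to inspect.
from datasets import load_dataset

ds = load_dataset("allenai/pixmo-points", split="train", streaming=True)

for example in ds.take(3):
    # Field names differ per subset, so inspect the keys before relying on them.
    print(sorted(example.keys()))
```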
-
From Dylan Foster (Principal Researcher in AI/ML/RL Theory @ Microsoft Research NE/NYC, previously @ MIT and Cornell), on Bluesky: "Given a high-quality verifier, language model accuracy can be improved by scaling inference-time compute (e.g., w/ repeated sampling). When can we expect similar gains without an external verifier?"

Self-Improvement in Language Models: The Sharpening Mechanism
arxiv.org/abs/2412.01951

What is a verifier in test-time compute? Check out this excellent little tutorial video: https://lnkd.in/d2nkjPE4
Self-Improvement in Language Models: The Sharpening Mechanism
arxiv.org
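To make the verifier-plus-repeated-sampling recipe concrete: best-of-n sampling draws several candidates and keeps the one an external verifier scores highest. This is a generic sketch of that recipe with placeholder callables, not the paper's sharpening mechanism (which asks what happens when the external verifier goes away).

```python
from typing import Callable, List

def best_of_n(
    prompt: str,
    sample: Callable[[str], str],         # placeholder: one model completion per call
    verify: Callable[[str, str], float],  # placeholder: external verifier scoring (prompt, answer)
    n: int = 16,
) -> str:
    """Scale inference-time compute by drawing n candidates and returning the best-scored one."""
    candidates: List[str] = [sample(prompt) for _ in range(n)]
    return max(candidates, key=lambda ans: verify(prompt, ans))
```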