🌐 Teaching models to navigate the web with play and exploration:

📑 Paper: https://lnkd.in/eysAffnX
💻 Code: https://lnkd.in/e7UFUExz

"WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning"

🤖 "Large language models (LLMs) have shown remarkable potential as autonomous agents, particularly in web-based tasks. However, existing LLM web agents heavily rely on expensive proprietary LLM APIs, while open LLMs lack the necessary decision-making capabilities.

🔄 "This paper introduces WebRL, a self-evolving online curriculum reinforcement learning framework designed to train high-performance web agents using open LLMs. WebRL addresses three key challenges in building LLM web agents, including the scarcity of training tasks, sparse feedback signals, and policy distribution drift in online learning. Specifically, WebRL incorporates:

1️⃣ a self-evolving curriculum that generates new tasks from unsuccessful attempts
2️⃣ a robust outcome-supervised reward model (ORM)
3️⃣ adaptive reinforcement learning strategies to ensure consistent improvements

📈 "We apply WebRL to transform open Llama-3.1 and GLM-4 models into proficient web agents. On WebArena-Lite, WebRL improves the success rate of Llama-3.1-8B from 4.8% to 42.4%, and from 6.1% to 43% for GLM-4-9B. These open models significantly surpass the performance of GPT-4-Turbo (17.6%) and GPT-4o (13.9%) and outperform previous state-of-the-art web agents trained on open LLMs (AutoWebGLM, 18.2%).

🎯 "Our findings demonstrate WebRL's effectiveness in bridging the gap between open and proprietary LLM-based web agents, paving the way for more accessible and powerful autonomous web interaction systems."
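Not from the paper itself, but as a rough mental model of the three ingredients the abstract names (tasks generated from failures, an ORM success signal, an adaptive policy update), here is a minimal Python sketch of a self-evolving curriculum loop. Every hook (run_agent, orm_score, update_policy, propose_tasks) is a hypothetical placeholder, not WebRL's actual API.

```python
from typing import Callable, List

def self_evolving_curriculum(
    policy,
    seed_tasks: List[str],
    run_agent: Callable,      # placeholder: (policy, task) -> trajectory
    orm_score: Callable,      # placeholder: (task, trajectory) -> success score in [0, 1]
    update_policy: Callable,  # placeholder: (policy, rollouts) -> updated policy
    propose_tasks: Callable,  # placeholder: (failed tasks) -> new candidate tasks
    n_rounds: int = 10,
):
    """Toy loop: roll out tasks, score outcomes with an ORM, update the policy,
    and grow the task pool from unsuccessful attempts."""
    task_pool = list(seed_tasks)
    for _ in range(n_rounds):
        rollouts, failures = [], []
        for task in task_pool:
            traj = run_agent(policy, task)        # agent interacts with the web environment
            reward = orm_score(task, traj)        # outcome-supervised reward model gives a success signal
            rollouts.append((task, traj, reward))
            if reward < 0.5:
                failures.append(task)
        policy = update_policy(policy, rollouts)  # adaptive RL update on the collected rollouts
        # self-evolving curriculum: derive new tasks from the unsuccessful attempts
        task_pool = failures + propose_tasks(failures)
    return policy
```

The real system also has to cope with policy distribution drift during online learning; in this toy sketch that concern is hidden inside the update_policy hook.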
Kentauros AI
Software Development
Wilmington, Delaware · 167 followers
See us on the web: https://www.kentauros.ai/ Come talk with us on Discord: https://discord.gg/hhaq7XYPS6
About us
Build, deploy, and share AI agents with ease on the AgentSea platform.
- Website
- https://kentauros.ai
- Industry
- Software Development
- Company size
- 2-10 employees
- Headquarters
- Wilmington, Delaware
- Type
- Privately Held
- Founded
- 2023
Locations
- Primary: Wilmington, Delaware 19808, US
Updates
-
Great article on different training strategies: https://lnkd.in/eEX4MH7x...
New LLM Pre-training and Post-training Paradigms
magazine.sebastianraschka.com
-
A fantastic counterpoint to the question of whether reasoning models can really reason, and to the prevailing industry wisdom around test-time compute models. Incredibly well written and worth a read. https://lnkd.in/dB9h-2fW
The Problem with Reasoners | Aidan McLaughlin
aidanmclaughlin.notion.site
-
Phi-4 technical report on the 14B-parameter small wonder of a model that punches well above its weight. The paper is a testament to synthetic data, strong organic data, and careful data cleaning. People often assume there is some magical technique behind AI advances, but as Dario Amodei said in his Lex Fridman interview, more often than not it is some "improvement in infrastructure that lets us train longer and more reliably, or better data or better ways to clean data."

Phi-4 is "a 14-billion parameter model that further advances performance of small language models by introducing innovative synthetic data generation methods for reasoning-focused tasks, by optimizing the training curriculum and data mixture, and by introducing new techniques in post-training.

"Synthetic data constitutes the bulk of the training data for phi-4 and is generated using a diverse array of techniques, including multi-agent prompting, self-revision workflows, and instruction reversal.

"These methods enable the construction of datasets that induce stronger reasoning and problem-solving abilities in the model, addressing some of the weaknesses in traditional unsupervised datasets. Synthetic data in phi-4 also plays a crucial role in post-training, where techniques such as rejection sampling and a novel approach to Direct Preference Optimization (DPO) are employed to refine the model's outputs."

https://lnkd.in/dsekkNmu
Phi-4 Technical Report | alphaXiv
alphaxiv.org
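One of those post-training ingredients, rejection sampling, is easy to picture: draw several candidate answers per prompt, keep only the ones an external checker accepts, and train on the survivors. The sketch below is a generic illustration of that idea with my own placeholder hooks (generate, is_correct), not the phi-4 team's pipeline.

```python
from typing import Callable, List, Tuple

def rejection_sample_dataset(
    prompts: List[str],
    generate: Callable[[str], str],          # placeholder: one model completion per call
    is_correct: Callable[[str, str], bool],  # placeholder verifier, e.g. a unit test or exact match
    samples_per_prompt: int = 8,
) -> List[Tuple[str, str]]:
    """Keep only (prompt, answer) pairs whose answers pass the verifier."""
    kept = []
    for prompt in prompts:
        for _ in range(samples_per_prompt):
            candidate = generate(prompt)
            if is_correct(prompt, candidate):
                kept.append((prompt, candidate))
                break  # one verified answer per prompt is enough for SFT-style data
    return kept
```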
-
A great, quick video on using the new Gemini 2.0 real-time API. Amazing stuff that was sci-fi only last year. https://lnkd.in/dmH4SzQm
Gemini 2.0 - How to use the Live Bidirectional API
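For anyone who would rather read code than watch: a minimal text-only sketch using the google-genai Python SDK's Live API as documented around the Gemini 2.0 launch. The model name, config keys, and session methods (connect/send/receive) are assumptions based on that launch-era quickstart and may have changed, so check the current docs before relying on it.

```python
# Minimal text-only sketch of the Gemini 2.0 Live (bidirectional) API.
# Assumes the google-genai SDK's launch-era surface: client.aio.live.connect,
# session.send, session.receive. Names and model IDs may have changed since.
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")   # placeholder key
config = {"response_modalities": ["TEXT"]}      # audio is also supported

async def main() -> None:
    async with client.aio.live.connect(model="gemini-2.0-flash-exp", config=config) as session:
        await session.send(input="Describe bidirectional streaming in one sentence.", end_of_turn=True)
        async for response in session.receive():
            if response.text:
                print(response.text, end="")

asyncio.run(main())
```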
-
A fantastic little project to rewrite SQLite in Rust. https://lnkd.in/d7V-XqV3
Introducing Limbo: A complete rewrite of SQLite in Rust
turso.tech
-
The Veo 2 video platform from Google looks incredible. But will tomorrow's games be almost entirely procedurally generated?

At Kentauros, we think many folks have gotten a bit ahead of themselves in thinking that AI will just magically generate incredible stories, characters you connect with deeply and emotionally, twisting mysterious plots, and powerful multi-layered storytelling. It will happen, but it will take many, many years. People won't simply prompt a masterpiece like Arcane into existence, because they don't understand shot composition, a unique style, how to layer on character depth, pitch-perfect dialogue, narrative flow, or musical composition, to name a few. In short, regular folks don't know what looks and sounds good, even if they recognize it once it's *already* been created. Big difference. Even with a super magical and perfect AI video editor, you still need to know how to craft a good shot and tell a great story.

This will be just like the self-publishing e-book revolution. We just got a lot more books. Most of them were sub-par, but there were some breakthroughs, like Wool/Silo, that are true masterpieces. The same will happen with video and film-making. The people who will benefit most from these tools will be artists with an amazing eye for story, character, shots, details, plot, and emotional resonance. In a decade, the vast majority of artists will just be using these tools and not thinking twice about it, the way folks went from physically cutting film to digital editing.

https://lnkd.in/g5qvyk-k
Veo 2 demo | Flamingos
https://www.youtube.com/
-
Finally, the amazing Allen Institute releases the corpus of training data and training scripts behind their breakthrough Molmo model (a truly open multimodal model), which has tremendous accuracy returning point data on objects (among other things) because it was trained on 2 million custom image pairs.

We desperately need more truly open models, meaning an open model, open data, and open training scripts, but with an increasingly hostile regulatory environment for open source, the Allen Institute is one of the few teams brave enough to do it. We need more open source champions like them, because this is how machine learning really advances. It's a real loss for the world when we have only closed models that tell us nothing about the architecture and how it was trained.

https://lnkd.in/dPZEKSi7
PixMo - a allenai Collection
huggingface.co
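If you want to poke at the released data yourself, the Hugging Face datasets library is the quickest route. A minimal sketch, assuming the pointing subset in the linked collection is published as allenai/pixmo-points (swap in whichever PixMo subset you actually want):

```python
# Minimal sketch: stream a few rows of a PixMo subset from the Hugging Face Hub.
# "allenai/pixmo-points" is an assumed dataset ID from the linked collection;
# substitute the subset (captions, points, docs, ...) you want to inspect.
from datasets import load_dataset

ds = load_dataset("allenai/pixmo-points", split="train", streaming=True)

for example in ds.take(3):
    # Field names differ per subset, so inspect the keys before relying on them.
    print(sorted(example.keys()))
```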
-
From Dylan Foster (Principal Researcher in AI/ML/RL Theory @ Microsoft Research NE/NYC, previously @ MIT and Cornell), on Bluesky: "Given a high-quality verifier, language model accuracy can be improved by scaling inference-time compute (e.g., w/ repeated sampling). When can we expect similar gains without an external verifier?"

Self-Improvement in Language Models: The Sharpening Mechanism
arxiv.org/abs/2412.01951

What is a verifier in test-time compute? Check out this excellent little tutorial video: https://lnkd.in/d2nkjPE4
Self-Improvement in Language Models: The Sharpening Mechanism
arxiv.org
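To make the verifier-plus-repeated-sampling recipe concrete: best-of-n sampling draws several candidates and keeps the one an external verifier scores highest. This is a generic sketch of that recipe with placeholder callables, not the paper's sharpening mechanism (which asks what happens when the external verifier goes away).

```python
from typing import Callable, List

def best_of_n(
    prompt: str,
    sample: Callable[[str], str],         # placeholder: one model completion per call
    verify: Callable[[str, str], float],  # placeholder: external verifier scoring (prompt, answer)
    n: int = 16,
) -> str:
    """Scale inference-time compute by drawing n candidates and returning the best-scored one."""
    candidates: List[str] = [sample(prompt) for _ in range(n)]
    return max(candidates, key=lambda ans: verify(prompt, ans))
```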