“We’ve achieved peak data and there’ll be no more. We have to deal with the data that we have. There’s only one internet.” - Ilya Sutskever

Agentic is the buzzword. 🐝

“Just as evolution found a new scaling pattern for hominid brains, AI might similarly discover new approaches to scaling beyond how pre-training works today.” https://lnkd.in/ehSG7t9r
So, a serious question: what exactly, in what he has just said, leads anyone to think this is a good investment bet, whether in time, people, attention, or money? Progress was based almost entirely on data scaling, and that has dried up. But they 'might' find other approaches. 'Agentic' is somehow put forward as the solution: both as a significant value add and, mysteriously, as some sort of fix for the other intrinsic problems such as reliability. Don't worry, I'm sure they'll fix all that (despite having actively avoided doing so until now). No details as to how, of course, and don't be tempted to think that just maybe that's for a very good reason. Don't buy the product, right? Buy the people. I'm sure it will be fine. That nice man off the telly said it would ...
I am confused about why Ilya says we have reached the peak of data. Why is the internet the only data source for learning? Isn't the world around us full of data to access through cameras, audio, and touch? The only reason we aren't able to process the world's data is that we are compute bound and/or learning-algorithm bound.
Sutskever's assertion that we've hit "peak data" is provocative and challenges the prevailing paradigm in AI development. For years, the mantra has been "bigger data, better models." If we're truly at the point of diminishing returns from simply shoveling more data into the maw of AI, then a fundamental shift in approach is necessary. The idea of AI becoming "agentic" is intriguing and perhaps a bit unsettling. It suggests a move away from passive learning machines towards systems that can actively seek out information and even manipulate their environment to achieve goals. This raises questions about control, ethics, and the very nature of intelligence. Also, a small reminder: our brains didn't just get bigger over time; they developed new structures and organizational principles that allowed for higher-level cognition. Similarly, AI may need to evolve beyond brute-force pattern recognition to achieve truly human-like intelligence. Thank you for sharing, Michael.
All he is saying is that the low signal task design of self-supervised language modelling (in pre-training) is approaching obsolescence. More sophisticated agentic task design is proving dramatically more data efficient. We're not running out of data, we are realizing we need more sophisticated ways to extract supervision from the data.
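To make concrete what "extracting supervision from the data" means in pre-training, here is a minimal sketch (the toy corpus and variable names are illustrative, not from the talk) of how self-supervised language modelling derives its training signal: no human labels are needed, because each position's target is simply the next token in the sequence.

```python
# Toy illustration of self-supervised next-token supervision:
# the "labels" are just the inputs shifted by one position.
corpus = "the cat sat on the mat".split()

# Build a tiny vocabulary and encode the corpus as token ids.
vocab = {word: i for i, word in enumerate(sorted(set(corpus)))}
ids = [vocab[word] for word in corpus]

inputs = ids[:-1]   # the model sees tokens 0 .. n-2
targets = ids[1:]   # and must predict tokens 1 .. n-1

# Each (input, target) pair is one low-signal supervision example,
# manufactured for free from raw text.
pairs = list(zip(inputs, targets))
print(pairs)
```

The point of the comment is that this per-token signal is cheap but weak; agentic task designs aim to get richer feedback from the same underlying data.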
It's not only a buzzword; it's also changing AI adoption for SMEs: more guardrails, more role and focus, limited hallucinations and strong accuracy. It's the same trend as the industrial revolution of the 1900s: people got specialized in daily tasks through the production chain, and AI will too. Quality and productivity will skyrocket from now on.
Michael Jackson, thank you for sharing these insightful perspectives on data and AI. 🧠
I envisioned this from the totally opposite side of science. He's a total genius. And big up for this biology parallel; he is just on another level. Not to mention an internet clogged with AI-generated data, indistinguishable in its syntax from human writing and reasoning. A scary future, I must say.
There is data on the internet... and then there is knowledge in the human mind. Ultimately these AI tools need to learn from humans. That is why in the sciences you have theory and practicum: the former is in books, and the latter is the "learn by doing" part, which is hard to replicate.
I’ve always felt it presumptuous that AIs must be trained (like humans) on human generated content, and that it—like language—is inefficient. The idea that synthetic data is inferior to “real” data may yet be disproven. Good.
CEO / Board Member at Porto Digital
This idea that we have reached the peak of data is both revealing and absurd. Revealing in the sense that it makes it clear that the models have a fundamental flaw—they are incapable of truly learning. If they could, a few dozen books would be enough to achieve an average level of knowledge, true intelligence—that is, the ability to solve challenges at a human level—instead of constantly pretending. Absurd in the sense that claiming we’ve reached the peak of data suggests that we, as humans, will stop being creative and producing content. Just during the time I wrote this comment, hours of video, thousands of pages of text, and hundreds of songs were likely created.