Latency matters, and on-chip inference helps tremendously. Take a look at this LLaMA 3 + Groq demo now! #groq #inference #llama3 #opensource
Mind-blowing real-time LLaMA 3 + Groq demo. Now extrapolate this five years out: even the technology we have today will change the app landscape dramatically, and that's before counting the daily advances happening in AI.

It's well known that latency shapes user experience. Think of Google's early research showing how even a slight increase in latency leads to significantly fewer searches (see https://lnkd.in/df26CZam). So even just speeding up existing capabilities will raise the quality of future apps.

This real-time table manipulation is a great proxy for everything we'll be able to do: imagine curating your Notion tables like this instead of clicking through the GUI and Googling how to do X. A rough sketch of what that could look like in code follows below.

If you want to learn what makes Groq tick, I had their head of silicon talk about their chips, LPUs: https://lnkd.in/dgx3kmCb

And here I had Thomas Scialom, PhD, a LLaMA 2 author, talk about LLaMA 2: https://lnkd.in/deibURjY

I'll try to bring Thomas back on over the next few days to share insights behind building LLaMA 3!
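To make the "edit a table in plain English" idea concrete, here is a minimal sketch of calling a Llama 3 model on a fast inference endpoint and asking it to rewrite a small table. The endpoint URL, model id, and GROQ_API_KEY environment variable are assumptions for illustration (not from the demo); check the provider's docs for current values.

```python
# Minimal sketch: natural-language table editing via a low-latency LLM endpoint.
# Assumes an OpenAI-compatible chat completions API hosted by Groq and a Llama 3
# model id; both are placeholders, verify them against the provider's documentation.
import os
import requests

API_URL = "https://api.groq.com/openai/v1/chat/completions"  # assumed endpoint
MODEL = "llama3-70b-8192"  # assumed model id

table = """| project | owner | status |
|---------|-------|--------|
| Atlas   | Ana   | done   |
| Borei   | Ben   | open   |"""

instruction = "Add a 'priority' column, mark open items as high, and sort open items first."

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"},
    json={
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "You edit markdown tables. Reply with the edited table only."},
            {"role": "user", "content": f"{instruction}\n\n{table}"},
        ],
        "temperature": 0,
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

The point of the sketch is the interaction pattern, not the specific API: one round trip that returns fast enough to feel like direct manipulation instead of a batch job.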
Exciting!