The rapid evolution of artificial intelligence (AI) is transforming edge computing, and Sharad Chole, Co-founder and Chief Scientist at Expedera, discusses the implications. Expedera, a neural network IP provider, focuses on neural processing units (NPUs) for edge devices, emphasizing low-power operation, bandwidth optimization, and cost efficiency. In our latest episode of Ask the Experts, Sharad shared his insights on the challenges and opportunities of deploying AI inference workloads at the edge.
The Exponential Growth in AI Model Complexity
Sharad began by noting the exponential growth in AI model sizes, from hundreds of millions to billions and now trillions of parameters. This explosive increase poses significant challenges, especially when deploying these complex models on edge devices with limited resources.
Overcoming Challenges in Edge AI: Memory and Bandwidth
Memory and bandwidth management emerged as central themes in Sharad’s discussion. For edge devices to perform AI inference efficiently, they need advanced memory management techniques that handle data processing without overwhelming system resources. Sharad emphasized the role of quantization techniques, which shrink model weights and reduce the computational load of AI models, making them more suitable for edge deployment. He also categorized AI applications into human task replacement, supervised agents, and tools, noting that the industry is increasingly focused on supervised agents and tools for practical deployment.
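To make the quantization idea concrete, here is a minimal sketch of post-training symmetric int8 weight quantization, one common technique of the kind Sharad describes. The helper names and array values are illustrative assumptions, not Expedera’s implementation.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: map float weights into [-127, 127]."""
    scale = np.max(np.abs(weights)) / 127.0  # one scale factor for the whole tensor
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights, e.g. for accuracy checks."""
    return q.astype(np.float32) * scale

# Illustrative example: int8 storage is 4x smaller than float32.
w = np.random.randn(1024, 1024).astype(np.float32)
q, scale = quantize_int8(w)
err = np.max(np.abs(w - dequantize(q, scale)))
print(f"storage: {w.nbytes} B -> {q.nbytes} B, max error: {err:.4f}")
```

The 4x reduction in weight storage translates directly into lower memory capacity and bandwidth demands, which is why quantization is so central to edge inference.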
The Road Ahead for AI at the Edge
Sharad concluded by outlining the critical challenges that lie ahead for AI hardware, particularly the need for efficient memory and bandwidth management for both training and inference. As AI continues to grow in complexity, so too will the demands on hardware.
For those interested in learning more about Expedera’s work in advancing edge AI technology, Sharad invites readers to visit Expedera’s website and connect with him on LinkedIn.
Watch the full video interview below or skip down the page to read the key takeaways.
Expert
- Sharad Chole, Co-founder and Chief Scientist, Expedera
Key Takeaways
- Neural Processing Units: Expedera, founded in 2018, specializes in Neural Processing Units (NPUs) for edge devices, focusing on low-power, bandwidth-optimized, and cost-effective solutions; its IP is instantiated in more than 10 million customer devices already deployed.
- AI Model Challenges: The rapid growth in AI model sizes, such as Stable Diffusion and LLMs, presents significant challenges for edge deployment, particularly in managing memory and optimizing model weights through techniques like quantization and knowledge distillation.
- Multimodal AI Complexity: Multimodal AI, which integrates text, audio, video, and other media, increases the complexity and memory demands of models, necessitating advanced methods like cross-attention layers to handle diverse data inputs efficiently (see the sketch after this list).
- AI Workloads: AI applications fall into three broad classes: human task replacement, supervised agents, and unsupervised tools, with the latter two showing more immediate practical use, especially in tasks like translation and voice command processing.
- AI Hardware Challenges: The primary challenges for AI hardware include managing the high bandwidth and interconnect needs of large models and ensuring cost-effective, scalable memory solutions for inference, with a focus on balancing capacity and cost.
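As a concrete illustration of the cross-attention mechanism mentioned in the multimodal takeaway above, here is a minimal single-head sketch in NumPy. The token counts, dimensions, and random projections are illustrative assumptions, not a production multimodal architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def cross_attention(text_tokens, image_tokens, d_k=64, seed=0):
    """Single-head cross-attention: text queries attend over image keys/values."""
    rng = np.random.default_rng(seed)
    d_text, d_img = text_tokens.shape[-1], image_tokens.shape[-1]
    W_q = rng.standard_normal((d_text, d_k)) / np.sqrt(d_text)
    W_k = rng.standard_normal((d_img, d_k)) / np.sqrt(d_img)
    W_v = rng.standard_normal((d_img, d_k)) / np.sqrt(d_img)
    Q = text_tokens @ W_q                    # (n_text, d_k)
    K = image_tokens @ W_k                   # (n_img, d_k)
    V = image_tokens @ W_v                   # (n_img, d_k)
    attn = softmax(Q @ K.T / np.sqrt(d_k))   # (n_text, n_img): each text token weights image tokens
    return attn @ V                          # text tokens enriched with visual context

# Illustrative shapes: 8 text tokens (dim 512) attending over 16 image patches (dim 768).
out = cross_attention(np.random.randn(8, 512), np.random.randn(16, 768))
print(out.shape)  # (8, 64)
```

Note the memory implication Sharad raises: the attention map and the key/value projections for every modality must be kept resident, so each added modality grows the model’s working set, not just its parameter count.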
Key Quote
One thing to point out here is why we are going towards larger models. And this is very interesting to me, it’s because of scaling laws. So, there is a point at which the models start exhibiting interesting capabilities – it’s not just predicting the next token based on what is similar in the corpus, but it’s actually understanding your question and context, and you can ask it more complex questions.
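The scaling laws Sharad references are often written as a power law in parameter count. As one illustrative form (following the widely cited Kaplan et al., 2020 formulation, not a figure from the interview):

```latex
% Power-law scaling of test loss L with parameter count N (Kaplan et al., 2020).
% N_c and \alpha_N are empirically fitted constants; the exponent shown is illustrative.
L(N) \approx \left( \frac{N_c}{N} \right)^{\alpha_N}, \qquad \alpha_N \approx 0.076
```

Loss falls smoothly as models grow, and past certain scales, models begin to exhibit the qualitatively new capabilities Sharad describes.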