This weekend I enjoyed the Fall vibes while reading the interesting paper by NVIDIA researchers, "LLM Pruning and Distillation in Practice: The Minitron Approach". The paper presents an innovative approach to compressing large language models (LLMs) by combining pruning and distillation techniques. Key takeaways:
1- Mistral-NeMo-Minitron-8B outperforms Llama 3.1 8B using 40x fewer training tokens!
2- Llama-3.1-Minitron 4B performs competitively with its teacher model, Llama 3.1 8B.
3- Width pruning outperforms depth pruning when optimizing for parameter efficiency.
This is exciting because compressing LLMs opens the door to more efficient deployment in resource-constrained environments, making AI more accessible! https://lnkd.in/e_s47Ph7
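To make the width-pruning idea concrete, here is a minimal, hypothetical sketch in PyTorch. It is not NVIDIA's implementation; the importance score (mean absolute activation over a small calibration batch), the `width_prune_mlp` helper, and the layer sizes are all illustrative assumptions, just showing how a hidden dimension can be shrunk by keeping the most important units.

```python
# Hypothetical sketch: width pruning of one MLP block (fc_in -> ReLU -> fc_out).
# Importance here is a simple activation-based proxy, not the paper's exact criterion.
import torch
import torch.nn as nn


def width_prune_mlp(fc_in: nn.Linear, fc_out: nn.Linear,
                    calib_batch: torch.Tensor, keep: int):
    """Shrink the hidden dimension of an MLP by keeping the top-`keep` units."""
    with torch.no_grad():
        # Importance proxy: mean absolute activation over a calibration batch.
        hidden = torch.relu(fc_in(calib_batch))          # (batch, hidden)
        importance = hidden.abs().mean(dim=0)            # (hidden,)
        keep_idx = importance.topk(keep).indices.sort().values

        # Build smaller layers that retain only the selected hidden units.
        new_in = nn.Linear(fc_in.in_features, keep)
        new_out = nn.Linear(keep, fc_out.out_features)
        new_in.weight.copy_(fc_in.weight[keep_idx])
        new_in.bias.copy_(fc_in.bias[keep_idx])
        new_out.weight.copy_(fc_out.weight[:, keep_idx])
        new_out.bias.copy_(fc_out.bias)
    return new_in, new_out


# Toy usage: shrink a 4096-wide hidden layer to 2048 units.
fc1, fc2 = nn.Linear(1024, 4096), nn.Linear(4096, 1024)
calib = torch.randn(32, 1024)  # stand-in for real calibration data
small_fc1, small_fc2 = width_prune_mlp(fc1, fc2, calib, keep=2048)
```

In practice the pruned model is then recovered with a short distillation-based continued pretraining stage, which is where the "40x fewer tokens" figure comes from.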
"Great insights! The combination of pruning and distillation is definitely a game-changer for optimizing LLMs, especially when the Minitron approach shows such significant gains in efficiency. The fact that Mistral-NeMo-Minitron-8B can outperform larger models with 40x fewer training tokens is impressive. It’s exciting to think how this could enable broader AI deployment in real-world applications where resources are limited. The potential impact on democratizing access to AI technology is huge!"
Impressive work simplifying complex models. Their insights raise intriguing possibilities. How might width pruning impact knowledge retention? Eager to discuss further.
Technical Fellow @ Walmart | AI & Relevance
> 1- Mistral-NeMo-Minitron-8B outperforms Llama 3.1 8B using 40x fewer training tokens!
We have to be careful here: I really don't like the way they phrase this - their 8B-parameter model is of course based on first pruning down from a much bigger model, which was not pretrained on 40x fewer tokens than Llama 3.1 8B. It's only the post-pruning continued pretraining that is so small. There's no free lunch: you still have to start with a heavily token-intensive pretrained model.
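For readers less familiar with that post-pruning stage: the recovery training distills the pruned student against the original teacher's output distribution. Below is a hedged sketch of a logit-level distillation loss; the temperature and shapes are illustrative assumptions, not the paper's exact settings.

```python
# Hedged sketch of logit-level knowledge distillation for the short
# post-pruning continued-pretraining stage. Values are illustrative only.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """Forward KL(teacher || student) over the token distributions."""
    t_probs = F.softmax(teacher_logits / temperature, dim=-1)
    s_logprobs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(s_logprobs, t_probs, reduction="batchmean") * temperature ** 2


# Toy usage with random logits of shape (batch * seq_len, vocab).
student = torch.randn(8, 32000, requires_grad=True)
teacher = torch.randn(8, 32000)
loss = distillation_loss(student, teacher)
loss.backward()
```

The token savings apply only to this recovery stage; the teacher's original, token-intensive pretraining is still a prerequisite, which is exactly the caveat raised above.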