Founder of DeepLearning.AI; Managing General Partner of AI Fund; Exec Chairman of Landing AI
LLMs can take gigabytes of memory to store, which limits what can be run on consumer hardware. But quantization can dramatically compress models, making a wider selection of models available to developers. You can often reduce model size by 4x or more while maintaining reasonable performance. In our new short course Quantization Fundamentals taught by Hugging Face's Younes Belkada and Marc Sun, you'll:
- Learn how to quantize nearly any open source model
- Use int8 and bfloat16 (Brain float 16) data types to load and run LLMs using PyTorch and the Hugging Face Transformers library
- Dive into the technical details of linear quantization to map 32-bit floats to 8-bit integers
As models get bigger and bigger, quantization becomes more important for making models practical and accessible. Please check out the course here: https://lnkd.in/g66yNW8W
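For a concrete sense of what the course covers, here is a minimal sketch of loading a model in bfloat16 and in int8 with PyTorch and the Transformers library. The model ID is an illustrative stand-in (any causal LM on the Hugging Face Hub would do), and the int8 path assumes the bitsandbytes and accelerate packages plus a CUDA GPU:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "facebook/opt-350m"  # illustrative stand-in for any Hub causal LM

# bfloat16 halves the float32 footprint while keeping float32's dynamic range.
model_bf16 = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
)

# int8 loading cuts the footprint to roughly a quarter of float32.
# Assumes bitsandbytes and accelerate are installed and a CUDA GPU is available.
model_int8 = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

print(f"bf16: {model_bf16.get_memory_footprint() / 1e9:.2f} GB")
print(f"int8: {model_int8.get_memory_footprint() / 1e9:.2f} GB")
```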
#Me: I'm not signing up for any #AI courses for a while.
#LinkedIn: Dr. Andrew Ng just launched a new AI course.
#Me: He releases a new AI course almost every week. I'll sit this one out.
#Andrew Ng: I'm excited to introduce #Quantization Fundamentals...
#Me: Ah, compressing large LLMs so they are more accessible, hmm...
#DrNg: ...with Hugging Face
#Me: Why didn't you say so? Where's the link?!
#quantizationfundamentals #opensource #huggingface #nextelligence
Quantization is a game changer for deploying large language models like Mistral 7B on consumer hardware. This course offers invaluable techniques to harness powerful AI models even with limited resources.
Quantization is becoming increasingly relevant as LLMs grow larger and more resource-intensive, helping to make models practical and accessible.
Futuristic as it may sound, quantization of a neural network model is key to reducing dependence on high-end hardware.
This is a great explanation of model quantization techniques that can save millions of 💵 when implementing LLMs.
#LLM #ConversationalAI #GenAI #Chatbots #contactcenter
🪝 “Diving into the Depths of Efficiency: The Quantum Leap in LLMs!”
As Andrew Ng notes, in the relentless pursuit of computational excellence we've reached a pivotal milestone: the quantization of Large Language Models. This isn't just an incremental improvement; it's a transformative approach that redefines the boundaries of AI efficiency and performance.
While the quantization of Large Language Models (LLMs) heralds a new era of AI efficiency, it’s not without its challenges. Here are some hurdles we’re facing in this quantum leap:
📌 Precision Trade-Offs: The process of quantization involves reducing the precision of the model’s parameters, which can lead to a loss in accuracy. Balancing model size with performance is a delicate act.
📌 Hardware Compatibility: Not all hardware is ready to support quantized models. Ensuring compatibility and performance across diverse platforms is a significant challenge.
📌 Optimization Complexity: Quantization requires sophisticated optimization techniques. Developing algorithms that can effectively compress models without losing essential information is a complex task.
📌 Data Sensitivity: Some tasks are more sensitive to quantization than others. Identifying and mitigating the impact on such tasks is crucial for maintaining the integrity of the model’s output.
📌 Resource Intensity: Although quantization aims to reduce resource usage, the initial process of quantizing and fine-tuning LLMs can be resource-intensive.
Despite these challenges, the potential benefits of quantized LLMs are too significant to ignore. As we continue to innovate and overcome these obstacles, we move closer to creating AI that’s not only powerful but also sustainable and inclusive.
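The precision trade-off above is easy to see concretely. Below is a minimal, self-contained sketch of asymmetric linear quantization in PyTorch: map a float tensor onto int8 with a scale and zero point, map it back, and measure the round-trip error. The function names and random tensor are illustrative, not code from the course:

```python
import torch

def linear_quantize(x: torch.Tensor, qmin: int = -128, qmax: int = 127):
    # Asymmetric per-tensor linear quantization: pick a scale and zero point
    # so the float range [x.min(), x.max()] maps onto the int8 range [qmin, qmax].
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(torch.clamp(qmin - torch.round(x.min() / scale), qmin, qmax))
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax).to(torch.int8)
    return q, scale.item(), zero_point

def linear_dequantize(q: torch.Tensor, scale: float, zero_point: int) -> torch.Tensor:
    # Map int8 codes back to approximate floats: x_hat = scale * (q - zero_point).
    return scale * (q.to(torch.float32) - zero_point)

w = torch.randn(4, 4)  # stand-in for a layer's float32 weight tensor
q, scale, zp = linear_quantize(w)
w_hat = linear_dequantize(q, scale, zp)
print("max abs round-trip error:", (w - w_hat).abs().max().item())
```

Production schemes refine this basic recipe (per-channel scales, calibration data, outlier handling), but the core float-to-int8 mapping is the same.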
#AIChallenges #Quantization #LLMs #MachineLearning #AI
Data Scientist | 3 years of experience | Machine Learning and Deep Learning Developer | Passionate about solving complex business problems with ML and LLMs
In the era of super large language models, understanding LLM quantization is essential. I highly recommend diving into the topic of one-bit quantization as well.
#LLM #Quantization #AI #DataScience #MachineLearning