Yi Wang’s Post

Principal Engineer at Apple

6mo

Applying your server-side deep learning skills on Apple devices may be easier than you think. Yunfei Cheng and I set out to gauge that learning curve by comparing MLX kernels running on the Metal GPUs of Apple Silicon chips with PyTorch kernels running on CUDA GPUs. The image below shows how self-attention (scaled dot-product attention, SDPA) and linear projection scale on the M1 Max, M2 Ultra, A100, and H100. The x-y plane represents the beam shapes used in our Recurrent Drafting work (https://lnkd.in/dvrvUwbU). All of these kernels follow a similar scaling trend as the beam shape grows. Interestingly, the performance gap between CUDA and Metal is considerably smaller for SDPA than for linear projection. For example, linear projection showed a roughly 100x difference between the M1 Max and the H100, whereas SDPA showed only a 25x difference on the same hardware.
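A rough FLOP count helps explain why the two kernels scale differently as the beam grows. The sketch below is a back-of-the-envelope model, not the benchmark code behind the image. It assumes a dense d_model × d_model projection and counts only the two matrix multiplications inside SDPA. The function names, the model and context sizes, and the mapping of a beam shape to beam_width × beam_length query tokens are all hypothetical.

```python
def linear_projection_flops(num_tokens: int, d_in: int, d_out: int) -> int:
    """FLOPs for y = x @ W with x: (num_tokens, d_in) and W: (d_in, d_out).

    Each output element needs d_in multiplies and d_in adds, i.e. 2 * d_in FLOPs.
    """
    return 2 * num_tokens * d_in * d_out


def sdpa_flops(num_queries: int, num_keys: int, num_heads: int, head_dim: int) -> int:
    """FLOPs for softmax(Q K^T / sqrt(d)) V, counting only the two matmuls.

    Q K^T:      (q, d) x (d, k) -> 2 * q * k * d FLOPs per head
    weights @ V: (q, k) x (k, d) -> 2 * q * k * d FLOPs per head
    Softmax and scaling are ignored (they are lower order).
    """
    return num_heads * 4 * num_queries * num_keys * head_dim


def beam_tokens(beam_width: int, beam_length: int) -> int:
    # Hypothetical mapping: every drafted token in every beam is one query row.
    return beam_width * beam_length


if __name__ == "__main__":
    d_model, heads, context = 4096, 32, 1024  # hypothetical model and context sizes
    head_dim = d_model // heads
    for width, length in [(1, 4), (4, 4), (8, 8), (16, 16)]:
        n = beam_tokens(width, length)
        lin = linear_projection_flops(n, d_model, d_model)
        # Assumption (upper bound): each beam token attends to the shared context
        # plus all n beam tokens.
        att = sdpa_flops(n, context + n, heads, head_dim)
        print(f"beam {width}x{length}: linear {lin:.3e} FLOPs, SDPA {att:.3e} FLOPs")
```

Under these assumptions, linear-projection FLOPs double when the number of beam tokens doubles. SDPA FLOPs grow with the product of query and key counts, so they rise faster than linearly once the beam is no longer small relative to the context.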

Yunfei Cheng

Machine Learning Engineer @ Apple

6mo Edited

How well does MLX on Metal handle machine learning workloads? Yi Wang and I benchmarked two fundamental operations, scaled dot-product attention (SDPA) and linear projection. We ran them on the M1 Max and M2 Ultra with MLX, and on the A100 and H100 with PyTorch. One surprise is how close the M2 Ultra comes to the A100. The benchmarks also reveal distinct scaling trends: linear projection latency grows linearly with input size, while SDPA latency grows faster than linearly because of its higher computational complexity. Interestingly, the performance gap is much narrower for SDPA than for linear projection. For instance, linear projection shows a nearly 100x difference between the M1 Max and the H100, whereas SDPA shows only a 25x difference on the same hardware. These findings highlight the significant potential of on-device machine learning, and we look forward to further performance gains, particularly as Metal advances.
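Ratios such as "100x" and "25x" come from comparing per-kernel latencies, and asynchronous GPU frameworks make those easy to mismeasure. Below is a minimal, framework-agnostic timing harness. It assumes each kernel is wrapped in a zero-argument callable that only returns once the work is done. With MLX's lazy evaluation that means calling mx.eval on the output; with PyTorch on CUDA it means calling torch.cuda.synchronize(). The helper names and stand-in workloads are hypothetical, and this is not the authors' benchmark code.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(kernel: Callable[[], object], warmup: int = 3, iters: int = 20) -> float:
    """Median wall-clock latency of `kernel` in milliseconds.

    `kernel` must block until its work is finished. Asynchronous GPU frameworks
    need an explicit sync inside the callable, otherwise only launch time is measured.
    """
    for _ in range(warmup):  # warm up caches, compilation paths, and allocators
        kernel()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        kernel()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)  # the median is robust to occasional outliers


def slowdown(latency_ms: float, baseline_ms: float) -> float:
    """How many times slower `latency_ms` is than `baseline_ms` (e.g. M1 Max vs H100)."""
    return latency_ms / baseline_ms


if __name__ == "__main__":
    # Stand-in CPU workloads; replace them with real MLX or PyTorch kernel calls.
    def small() -> int:
        return sum(i * i for i in range(10_000))

    def large() -> int:
        return sum(i * i for i in range(100_000))

    t_small, t_large = median_latency_ms(small), median_latency_ms(large)
    print(f"{t_small:.3f} ms vs {t_large:.3f} ms -> {slowdown(t_large, t_small):.1f}x")
```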


More Relevant Posts

Shailendra Prajapati

Data Scientist @ Compunnel India | Machine Learning | IoT | Azure | Technical Writer
3w
Mastering Hyperparameter Tuning for Optimized Machine Learning Models

Hyperparameter tuning is the secret sauce that turns a good machine learning model into a great one. By tuning parameters such as learning rate, tree depth, or number of layers, you can maximize performance and accuracy.

Key Highlights from the Article:
1. What is Hyperparameter Tuning? A method for optimizing the non-learnable parameters of a machine learning model. It affects training speed, convergence, and overall accuracy.
2. Techniques for Tuning: Grid Search systematically explores parameter combinations. Random Search samples hyperparameters at random for efficiency. Bayesian Optimization explores intelligently so that fewer iterations are needed.
3. Practical Steps with Code: Learn how to implement tuning with libraries such as Scikit-learn, TensorFlow, or PyTorch, and see real-world examples of hyperparameter tuning in action.
4. Challenges: Tuning is time-consuming on large datasets, and excessive tuning risks overfitting.
https://lnkd.in/gkBMQ4vc

Additional Resources:
Tools for Automation: Optuna, Ray Tune https://lnkd.in/gExMuTnF
Code Examples: Explore hyperparameter optimization on GitHub. https://lnkd.in/g3n_SQbp

#MachineLearning #HyperparameterTuning #AI #DataScience #MLModels #OptimizationTips

Hyperparameter Tuning:

blog.devops.dev
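Here is a minimal sketch contrasting grid search and random search. The validation_loss function is a hypothetical stand-in for "train a model, return its loss on held-out data". Bayesian optimization is not shown. In practice, scikit-learn's GridSearchCV and RandomizedSearchCV, or tools like Optuna, run this loop for you.

```python
import itertools
import math
import random


def validation_loss(learning_rate: float, depth: int) -> float:
    """Hypothetical stand-in for training and scoring a model.

    Its minimum is at learning_rate = 1e-2, depth = 6.
    """
    return (math.log10(learning_rate) + 2) ** 2 + 0.1 * (depth - 6) ** 2


def grid_search(grid: dict) -> tuple:
    """Evaluate every combination in `grid`; return (best_params, best_loss)."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        loss = validation_loss(**params)
        if best is None or loss < best[1]:
            best = (params, loss)
    return best


def random_search(n_trials: int, seed: int = 0) -> tuple:
    """Sample `n_trials` random configurations; return (best_params, best_loss)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {
            "learning_rate": 10 ** rng.uniform(-4, 0),  # log-uniform in [1e-4, 1]
            "depth": rng.randint(2, 10),
        }
        loss = validation_loss(**params)
        if best is None or loss < best[1]:
            best = (params, loss)
    return best


if __name__ == "__main__":
    grid = {"learning_rate": [1e-4, 1e-3, 1e-2, 1e-1], "depth": [2, 4, 6, 8]}
    print("grid:  ", grid_search(grid))
    print("random:", random_search(n_trials=16))
```

Random search samples the learning rate on a log scale. This is a common choice because useful learning rates span several orders of magnitude.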
Dr.Mohammed Arshad

UAE 🇦🇪 Govt Emp - 9 yrs |HealthcareTech,FinTech,Hospitality | Dy Manager- Database Administration Top Voice | Applications’ Architect | Lead DBA -MS SQL Server/NoSQL(MongoDB)/PostgreSQL/Oracle/Sybase | Data Analytics
7mo
“Sharing is Caring” Hi #AI Community!!! #sharingiscaring “Send me an invite and get your skills endorsed straightaway as pleasantries!” “#Knowledge Booster” 📚 Expand your knowledge base with this helpful article. Keep #Learning!! 👍 NOTE: This is NOT a paid advertisement. #Robotics #artificialintelligence #dataanalytics #datascience #machinelearning #deeplearning #linkedinfamily #knowledgesharing #machinelearningalgorithms #neuralnetworks #deeplearningai #computervision Join me on my way into an exciting world of Data/Analytics/AI 😊 🍁 Let’s connect now!

Eric Feuilleaubois (Ph.D)

Deep Learning / ADAS / Autonomous Parking chez VALEO // Curator of Deep_In_Depth news feed on BlueSky
7mo

TD3-BST: A Machine Learning Algorithm to Adjust the Strength of Regularization Dynamically Using Uncertainty Model https://buff.ly/4dquTaX

TD3-BST: A Machine Learning Algorithm to Adjust the Strength of Regularization Dynamically Using Uncertainty Model

https://www.marktechpost.com

mian ahtisham

Article writer and calligraphy artist
8mo
Unveiling the Power of Baseline Models in Machine Learning: Complex architectures and cutting-edge techniques rightly capture the imagination in machine learning (ML), and they often steal the spotlight. Yet the foundation of any successful model is the humble baseline. Baseline models are easy to overlook, but they are the starting point on which more advanced solutions are built. In this article, we delve into the world of baseline models, demystify their purpose, and explore why they are essential in ML development pipelines. Read more: https://lnkd.in/dEsmakVr

Unveiling the Power of Baseline Models in Machine Learning

secret-scribes.blogspot.com
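The simplest classification baseline predicts the majority class for every input. The class and data below are a hypothetical minimal sketch. scikit-learn ships an equivalent as sklearn.dummy.DummyClassifier(strategy="most_frequent").

```python
from collections import Counter


class MajorityClassBaseline:
    """Predicts the most frequent training label for every input."""

    def fit(self, X, y):
        # X is ignored on purpose: the baseline only looks at the label distribution.
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def accuracy(y_true, y_pred) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical imbalanced labels: 90% of the examples are class 0.
    y_train = [0] * 90 + [1] * 10
    y_test = [0] * 45 + [1] * 5
    baseline = MajorityClassBaseline().fit(list(range(100)), y_train)
    print(accuracy(y_test, baseline.predict(range(50))))  # 0.9
```

On imbalanced data like this, a model that reports about 90% accuracy has not yet beaten the baseline.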
Sandro V.

Product Manager | M.Sc. CompSci @ Georgia Tech | JLPT N3 | ServiceNow
6mo
Considering the price ratio (1 : 14.3) and the estimated TDP ratio (1 : 20) of the M1 Max versus workstations with the H100, these results look promising and make me believe I will see ubiquitous robotics in my lifetime.
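These ratios can be combined with the benchmark's slowdowns for a rough efficiency comparison. The sketch below makes several assumptions:
- performance is inversely proportional to latency;
- TDP stands in for actual power draw;
- the quoted figures apply directly: price 1:14.3, TDP 1:20, and the M1 Max running roughly 100x slower than the H100 for linear projection and 25x slower for SDPA.

The function name is hypothetical.

```python
def relative_efficiency(slowdown: float, cost_ratio: float) -> float:
    """Performance per unit of cost of the cheaper device, relative to the pricier one.

    slowdown:   how many times slower the cheaper device is (e.g. 100 for "100x").
    cost_ratio: how many times more the pricier device costs (dollars or watts).
    A result of 1.0 means equal performance per dollar (or per watt).
    """
    return cost_ratio / slowdown


if __name__ == "__main__":
    price_ratio, tdp_ratio = 14.3, 20.0  # estimates quoted in the post (M1 Max : H100 workstation)
    for op, slow in [("linear projection", 100.0), ("SDPA", 25.0)]:
        per_dollar = relative_efficiency(slow, price_ratio)
        per_watt = relative_efficiency(slow, tdp_ratio)
        print(f"{op}: perf/$ {per_dollar:.2f}, perf/W {per_watt:.2f}")
```

Under those assumptions the M1 Max reaches about 0.14x (per dollar) and 0.20x (per watt) of the H100's efficiency for linear projection. For SDPA it reaches about 0.57x and 0.80x, so the narrower SDPA gap carries most of the efficiency argument.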
Modaai

154 followers
6mo
Machine learning can potentially automate tedious tasks and increase productivity by 50% or more in some cases. Yet, there are challenges that arise and shut down ML pilots before takeoff. In the following article, CTO Lior Gavish examines the four most common reasons Machine Learning pilots fail to take off and how to avoid these pitfalls. Read more: https://lnkd.in/g5EcnQhT Source: Techopedia #machinelearning #optimization

4 Common Machine Learning Pitfalls and How To Avoid Them

https://www.techopedia.com
Ricardo Galante

Advanced Analytics & Artificial Intelligence Advisor | SAS Iberia | Data Science & Artificial Intelligence Lecturer
9mo
How Bayes’ Theorem is Applied in Machine Learning: Bayes’ theorem tells us how to gradually update our knowledge of something as we gather more evidence about it.

How Bayes’ Theorem is Applied in Machine Learning - KDnuggets

kdnuggets.com
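Here is a minimal worked example of that gradual updating. It uses hypothetical numbers for a binary hypothesis H (a patient has a condition) and a piece of evidence E (a positive test). The posterior from one observation becomes the prior for the next. This assumes the observations are conditionally independent given H, and the function names are illustrative only.

```python
def posterior(prior: float, likelihood: float, false_positive_rate: float) -> float:
    """P(H | E) by Bayes' theorem for a binary hypothesis H.

    prior               = P(H)
    likelihood          = P(E | H)
    false_positive_rate = P(E | not H)
    """
    # P(E) by the law of total probability
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


def update_sequentially(prior: float, observations) -> float:
    """Apply Bayes' theorem once per observation; each posterior becomes the next prior.

    Assumes the observations are conditionally independent given H.
    """
    belief = prior
    for likelihood, false_positive_rate in observations:
        belief = posterior(belief, likelihood, false_positive_rate)
    return belief


if __name__ == "__main__":
    # Hypothetical test: 1% base rate, 90% sensitivity, 5% false-positive rate.
    positive_test = (0.9, 0.05)
    print(round(update_sequentially(0.01, [positive_test]), 3))                 # 0.154
    print(round(update_sequentially(0.01, [positive_test, positive_test]), 3))  # 0.766
```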
iCONIFERz

1,630 followers
6d
Machine learning model optimization is a dynamic and critical aspect […]

Latest Trends in Machine Learning Model Optimization

iconiferz.com
Giuseppe Canale CISSP

cybersecurity | AI | ML | coding | database | art of phish founder | occasioni.it founder | secondlife.it founder.
1w
L2 Regularization: Beyond Overfitting in Machine Learning 💥💥 GET FULL SOURCE CODE AT THIS LINK 👇👇 👉 https://lnkd.in/d4BDUrAi

L2 regularization is a popular approach in machine learning for preventing overfitting and improving model generalization; applied to linear regression, it is known as Ridge Regression. Its benefits go beyond this. It reduces the impact of collinearity and stabilizes the coefficients, which improves model performance. Unlike L1 regularization, however, it shrinks coefficients toward zero without setting them exactly to zero, so it does not perform feature selection on its own.

In this post, we dive deeper into L2 regularization. We discuss its mathematical foundation, the motivation for adding the L2 penalty term to the cost function, and the effect of different regularization strengths. We also compare L2 with L1 regularization in terms of feature selection and model performance.

To explore L2 regularization further, study the following resources:
1. Andrew Ng's Machine Learning course: https://lnkd.in/duN88raz
2. "The Elements of Statistical Learning: Data Mining, Inference, and Prediction" by Trevor Hastie, Robert Tibshirani, and Jerome Friedman.

Additional Resources:
- "An Introduction to Statistical Learning" available at: https://lnkd.in/dxi9TXn8
- The scikit-learn documentation on Ridge Classifier and Ridge Regression: https://lnkd.in/dHMvMa6D

Find this and all other slideshows for free on our website: https://lnkd.in/d4BDUrAi

#STEM #MachineLearning #L2Regularization #RidgeRegression #Overfitting #FeatureSelection #ModelPerformance #DataMining #MachineLearningAlgorithms #Technology #AI #AIAlgorithms https://lnkd.in/d8RgunGf

L2 Regularization: Beyond Overfitting in Machine Learning

https://www.youtube.com/
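The sketch below uses ridge regression's closed form, w = (XᵀX + λI)⁻¹Xᵀy. It is specialised to two features with no intercept so the 2×2 inverse can be written out by hand. The data are hypothetical: the second feature is a noisy copy of the first, which shows the coefficient-stabilising effect on collinear inputs.

```python
def ridge_2d(X, y, lam):
    """Closed-form ridge regression for two features and no intercept.

    Computes w = (X^T X + lam * I)^(-1) X^T y with the 2x2 inverse written out.
    lam = 0 reduces to ordinary least squares.
    """
    a = sum(x0 * x0 for x0, _ in X) + lam  # (X^T X)[0][0] + lam
    b = sum(x0 * x1 for x0, x1 in X)       # (X^T X)[0][1] == (X^T X)[1][0]
    d = sum(x1 * x1 for _, x1 in X) + lam  # (X^T X)[1][1] + lam
    g0 = sum(x0 * t for (x0, _), t in zip(X, y))  # (X^T y)[0]
    g1 = sum(x1 * t for (_, x1), t in zip(X, y))  # (X^T y)[1]
    det = a * d - b * b
    # [[a, b], [b, d]]^(-1) = [[d, -b], [-b, a]] / det
    return ((d * g0 - b * g1) / det, (a * g1 - b * g0) / det)


if __name__ == "__main__":
    # Hypothetical near-collinear data: feature 2 is roughly a copy of feature 1.
    X = [(1.0, 1.01), (2.0, 1.98), (3.0, 3.02), (4.0, 3.99)]
    y = [2.1, 3.9, 6.2, 7.8]
    for lam in (0.0, 1.0, 10.0):
        w0, w1 = ridge_2d(X, y, lam)
        print(f"lam={lam:>4}: w = ({w0:.2f}, {w1:.2f})")
```

With lam = 0 (plain least squares) the two weights come out large and of opposite sign, roughly (-6.9, 8.9). With lam = 1 they shrink to two similar, moderate values of roughly 0.98 each.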
Giuseppe Canale CISSP

cybersecurity | AI | ML | coding | database | art of phish founder | occasioni.it founder | secondlife.it founder.
5d
Machine Learning: Understanding Hinge Loss and Its Role in SVM Optimization 💥💥 GET FULL SOURCE CODE AT THIS LINK 👇👇 👉 https://lnkd.in/dbiSDKN9

Machine learning has become a cornerstone of modern technology, allowing systems to learn from data and make predictions or decisions. Here we dive into the concept of hinge loss and its significance in Support Vector Machine (SVM) optimization.

Hinge loss is a popular loss function in machine learning, particularly for SVMs. The goal of an SVM is to find the hyperplane that separates data points of different classes with the largest margin. The margin is the distance between the hyperplane and the closest data points of each class, and a large margin is associated with good generalization. Hinge loss also enables soft-margin SVMs. These allow some data points to fall inside the margin or on the wrong side of the hyperplane, but penalize each such violation with a loss. Minimizing the hinge loss together with a regularization term therefore yields a hyperplane that separates the data well while keeping the margin large. Understanding the mathematics and meaning of hinge loss is essential to taking full advantage of the SVM's robustness and versatility.
Those seeking to delve deeper into this topic are encouraged to explore the following resources:
- [SVM: A Review](https://lnkd.in/dE8EP8fb)
- [On Support Vector Methods for Function Approximation, Regression Estimation, and Signal Processing](https://lnkd.in/dyKubkxV)

Find this and all other slideshows for free on our website: https://lnkd.in/dbiSDKN9

#STEM #Programming #MachineLearning #SupportVectorMachines #SVM #HingeLoss #Optimization #DataScience #Technology #AI #predictiveanalytics #datasciencecommunity https://lnkd.in/dQay8vEb

Machine Learning: Understanding Hinge Loss and Its Role in SVM Optimization

https://www.youtube.com/
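The sketch below minimises the soft-margin linear SVM objective, (λ/2)‖w‖² + mean hinge loss, with full-batch subgradient descent. This is one simple way to minimise hinge loss, not how libraries such as libsvm solve SVMs. The toy data, learning rate, and λ are hypothetical.

```python
def hinge_loss(label: int, score: float) -> float:
    """max(0, 1 - y * f(x)) for a label y in {-1, +1} and a raw score f(x)."""
    return max(0.0, 1.0 - label * score)


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on lam/2 * ||w||^2 + mean hinge loss."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 term
        grad_b = 0.0                     # the bias is not regularized
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1.0:  # inside the margin or misclassified: hinge term is active
                for j in range(dim):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x) -> int:
    """Sign of the decision function w . x + b, as a label in {-1, +1}."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0 else -1


if __name__ == "__main__":
    X = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
    y = [1, 1, 1, -1, -1, -1]
    w, b = train_linear_svm(X, y)
    print([predict(w, b, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

Points with y·f(x) ≥ 1 contribute nothing to the gradient. Only points inside the margin or on the wrong side (the soft-margin violations) push the hyperplane.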

5,460 followers


More from this author

Kubernetes for Distributed Machine Learning
