⚡ NEW AMD ROCm™ 6.3 Release - Power Your AI & HPC Workflows with the Latest from AMD! ⚡

For all AI Enthusiasts, Data Scientists, Developers, and HPC Professionals — your toolkit just got a serious upgrade! 🌐

✨ What’s New in ROCm 6.3? ✨
✅ SGLang for ROCm – Supercharge GenAI models with up to 6X faster inference on LLMs. Speed like never before!
✅ Re-engineered FlashAttention-2 – 3X speedups on backward passes with ultra-efficient forward passes for lightning-fast AI workloads.
✅ AMD Fortran Compiler – Modernize legacy code with GPU acceleration, letting you process larger datasets faster and more efficiently.
✅ Enhanced Computer Vision Libraries – From media & entertainment to autonomous systems, vision-based AI just got smarter.

🔗 Highlights here - https://lnkd.in/gKPdS-Mc
🔗 Full release notes - https://lnkd.in/gryfqu7r

#ai #hpc #amd #rocm #instinct #sglang #pytorch #tensorflow #generativeai #ml #datascience #machinelearning
Ronak Shah’s Post
More Relevant Posts
-
New multithreading mode for AI: according to a study published by the University of California, Riverside, an AI accelerator, CPU, and GPU may be usable simultaneously through simultaneous and heterogeneous multithreading (SHMT). The paper reports that this technique can double performance and halve power consumption, for roughly four times greater efficiency overall. It is still a proof of concept, however, so don't get overly enthusiastic; the work is in its early stages. #ai #multithreading #SHMT #gpu #cpu https://lnkd.in/e3xPmt2S
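To make the SHMT idea concrete, here is a minimal pure-Python sketch of the scheduling concept: a workload is split across several "devices" in proportion to an assumed relative throughput, and each share runs concurrently. The device names, throughput numbers, and the squaring workload are all hypothetical illustrations, not taken from the paper.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical relative throughputs of three compute units (illustrative only).
DEVICES = {"cpu": 1.0, "gpu": 4.0, "npu": 3.0}

def partition(n_items, devices):
    """Split n_items across devices proportionally to their throughput."""
    total = sum(devices.values())
    shares, assigned = {}, 0
    names = list(devices)
    for i, name in enumerate(names):
        if i < len(names) - 1:
            count = round(n_items * devices[name] / total)
        else:
            count = n_items - assigned  # last device takes the remainder
        shares[name] = count
        assigned += count
    return shares

def run_shmt_style(data, devices):
    """Run each device's share in its own thread, SHMT-style (conceptually)."""
    shares = partition(len(data), devices)
    results, start = [], 0
    with ThreadPoolExecutor(max_workers=len(devices)) as pool:
        futures = []
        for name, count in shares.items():
            chunk = data[start:start + count]
            start += count
            # Each "device" just squares its chunk here; a real SHMT system
            # would dispatch to genuinely different hardware back ends.
            futures.append(pool.submit(lambda c: [x * x for x in c], chunk))
        for f in futures:          # collect in submit order to keep ordering
            results.extend(f.result())
    return results
```

Because results are collected in submission order, the output matches a purely sequential run, which is what a real SHMT runtime must also guarantee.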
-
I believe we recently saw the launch of one of the biggest breakthroughs in AI. No, I am not talking about Sora by OpenAI; I am talking about a new chip design that could help AI break past the computation limits of large language models. It's called the LPU (Language Processing Unit™), a new type of end-to-end processing unit system. Its Tensor Streaming Processor (TSP) architecture offers significant advantages over traditional GPUs and CPUs in computational density. It was launched by https://groq.com/ I recently got developer access to the Groq API for the Mixtral-8x7b model, built a UI, and deployed it on Hugging Face for a personal project. Just look at the amazing speed. It's lightning fast. Great times ahead. #deeplearning #largelanguagemodels #machinelearning #newbeginnings
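Claims like "lightning fast" usually come down to decode throughput, measured in tokens per second. Here is a small, self-contained sketch of how one might measure it; `fake_stream` is a hypothetical stand-in for a real streaming client response (e.g. chunks from an LLM API), not Groq's actual SDK.

```python
import time

def tokens_per_second(stream):
    """Measure decode throughput of any iterable that yields tokens."""
    count = 0
    start = time.perf_counter()
    for _ in stream:
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")

def fake_stream(n_tokens, delay_s):
    """Hypothetical stand-in for a streaming LLM response: yields one
    'token' every delay_s seconds, simulating per-token decode latency."""
    for _ in range(n_tokens):
        time.sleep(delay_s)
        yield "tok"
```

For example, `tokens_per_second(fake_stream(50, 0.002))` simulates a model that emits a token every 2 ms; swapping in a real client's stream gives the real figure.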
-
CPU vs. GPU in AI: A Quick Comparison

In the world of AI, the choice between CPU and GPU can significantly impact the efficiency and speed of your models. Here’s a quick breakdown:

💻 CPU (Central Processing Unit): The workhorse of general computing. Excellent for tasks requiring complex computations and low-level operations. Ideal for smaller datasets and sequential processing.

🎮 GPU (Graphics Processing Unit): Built for parallel processing with thousands of cores. Accelerates the training of deep learning models. Perfect for large-scale data and tasks that require heavy computational power.

When to use what? CPUs are best for simpler, low-scale AI tasks where sequential processing and versatility are key. GPUs shine in deep learning and large-scale AI applications, where parallelism and speed are crucial.

Choosing the right processor can transform your AI projects, so it’s essential to align your hardware with your project needs.

#AI #MachineLearning #DeepLearning #TechTalk #DataScience #NVIDIA Image Source : https://lnkd.in/g6T_qKvG
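One way to reason about the CPU/GPU crossover described above is Amdahl's law: speedup = 1 / ((1 - p) + p/n) for a workload whose parallel fraction is p, run on n cores. The sketch below uses it to pick a device; the core counts and the per-core GPU penalty factor are made-up illustrative numbers, not measurements of any real hardware.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Amdahl's law: overall speedup when the parallel fraction of a
    workload runs on n_cores and the rest stays sequential."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_cores)

def choose_device(parallel_fraction, cpu_cores=8, gpu_cores=4096,
                  gpu_core_penalty=0.1):
    """Toy device chooser. Each GPU core is modeled as individually slower
    than a CPU core (gpu_core_penalty is a made-up scaling factor), but
    there are many more of them, so highly parallel workloads still win."""
    cpu = amdahl_speedup(parallel_fraction, cpu_cores)
    gpu = gpu_core_penalty * amdahl_speedup(parallel_fraction, gpu_cores)
    return "gpu" if gpu > cpu else "cpu"
```

With these toy numbers, a 99%-parallel workload (e.g. big matrix multiplies) lands on the GPU, while a half-sequential one stays on the CPU, matching the rule of thumb in the post.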
-
A technical presentation on how to optimize GPUs, focusing on microbenchmark latencies across GPU architecture, instructions, and pipeline capacities. https://lnkd.in/gSvQzd4Y #ArcCompute #ArcHPC #HPC #microbenchmarks #nvidia
Uncovering GPU Potential: Addressing Optimization Challenges in AI and HPC
https://vimeo.com/
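A classic microbenchmark of the kind such talks discuss is a pointer chase: each load's address depends on the previous load, so the hardware cannot overlap accesses and the measurement isolates latency. The pure-Python sketch below shows the shape of the technique only; real latency microbenchmarks are written in C or CUDA, and here the number mostly reflects interpreter overhead.

```python
import random
import time

def pointer_chase_ns(n=1 << 16, rounds=200_000):
    """Estimate nanoseconds per dependent access via a pointer chase.
    A random permutation ensures each step jumps unpredictably, defeating
    prefetchers on real hardware; in CPython the figure is illustrative."""
    perm = list(range(n))
    random.shuffle(perm)          # random permutation = one long cycle of hops
    idx = 0
    start = time.perf_counter()
    for _ in range(rounds):
        idx = perm[idx]           # the next load depends on this load's result
    elapsed = time.perf_counter() - start
    return elapsed / rounds * 1e9
```

Varying `n` so the array fits in L1, L2, or main memory is how the C/CUDA version of this benchmark maps out a cache hierarchy's latencies.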
-
Your GPU infrastructure could be performing far better than you might imagine, and here's how. Listen to our CTO, Michael Buchel, as he walks you through the hidden challenges your GPU infrastructure faces, and how they can be solved.
-
Fantastic presentation on GPU optimization. #GPU #AI #Architecture #Pipelines #FMA #SIMD #InstructionSets #Cache #Code #Optimize #Optimization #ARCCompute
-
Building and deploying a #GenAI application at #scale (#RAG with #finetuning on raw PubMed text -- 120 million text chunks) is now a weekend job with #NeuralDB running on spare #CPU (AMD) cycles. No NVIDIA #GPUs in the loop. Save your #GPU cycles.
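As a rough illustration of CPU-only retrieval over text chunks, here is a tiny inverted-index sketch. This is not NeuralDB's actual algorithm (NeuralDB uses learned, hash-based retrieval), and the sample chunks are invented; it only shows that the retrieval side of a RAG pipeline needs no GPU at all.

```python
from collections import defaultdict

def build_index(chunks):
    """Map each word to the set of chunk ids containing it (CPU-only)."""
    index = defaultdict(set)
    for i, chunk in enumerate(chunks):
        for word in chunk.lower().split():
            index[word].add(i)
    return index

def query(index, chunks, text, k=2):
    """Score chunks by how many query terms they share; return the top k."""
    scores = defaultdict(int)
    for word in text.lower().split():
        for i in index.get(word, ()):
            scores[i] += 1
    top = sorted(scores, key=lambda i: (-scores[i], i))[:k]
    return [chunks[i] for i in top]
```

A real system would retrieve the top-k chunks this way (at far larger scale and with smarter scoring) and then feed them to the generator model as context.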
Build & Deploy Perpetually-improving Medical Q&A Engine at Scale (120M Chunks) with NeuralDB (No…
medium.com
-
AI has to run on something! Here's an interesting development in GPU alternatives for running AI. Maybe one day this will challenge Nvidia, whose stock splits soon! https://lnkd.in/eiwAfiYx
How a simple circuit could offer an alternative to energy-intensive GPUs
technologyreview.com
-
🚀 GPU vs. CPU vs. TPU: The Ultimate Silicon Showdown 🧠💻

CPU: The Swiss Army knife 🔪 of computing! It's a general-purpose processor with a few powerful cores 🏋️♂️ designed for sequential tasks like running your OS or juggling applications. Perfect for versatility, but it taps out on parallelism.

GPU: The parallel processing wizard 🎩✨! With thousands of cores, it excels at handling data in bulk, ideal for matrix operations in deep learning 🧬 and rendering graphics 🎨. Its SIMD (Single Instruction, Multiple Data) architecture allows it to process multiple data streams simultaneously—perfect for neural network training and video rendering.

TPU: The specialized monk 🧘♂️! Built by Google, TPUs are ASICs (Application-Specific Integrated Circuits) 🛠️ crafted for one mission: accelerating TensorFlow computations. Using systolic arrays, they crunch massive matrix multiplications with jaw-dropping efficiency 😲. Faster than GPUs for ML tasks but of little use outside AI.

In summary:
CPU: "I do everything decently, but slowly in parallel." 🐢
GPU: "I parallelize like a boss, bring on the matrices!" ⚡
TPU: "Matrix multiplications? My life's purpose." 🤓

#TechHumor #AI #DeepLearning
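The systolic arrays mentioned under TPU can be sketched in a few lines: picture a grid of multiply-accumulate (MAC) cells, with operands streaming through as one wavefront per step. This pure-Python model keeps the accumulation pattern of a systolic matrix multiply but skips the cycle-accurate data movement between neighboring cells that real hardware performs.

```python
def systolic_matmul(A, B):
    """Conceptual model of a systolic-array matrix multiply: at each step,
    one 'wavefront' of A-values (from the left) and B-values (from the top)
    reaches the grid, and every cell accumulates a partial product."""
    n, k, m = len(A), len(A[0]), len(B[0])
    C = [[0] * m for _ in range(n)]       # one accumulator per MAC cell
    for step in range(k):                 # k wavefronts stream through
        for i in range(n):
            a = A[i][step]                # value flowing in from the left
            for j in range(m):
                C[i][j] += a * B[step][j] # value flowing in from the top
    return C
```

The point of the hardware layout is that these k wavefronts pipeline through the grid with no instruction fetch or memory traffic per MAC, which is where the TPU's efficiency on large matrix multiplications comes from.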
-
Mirage: A Multi-Level Tensor Algebra Super-Optimizer that Automates GPU Kernel Generation for PyTorch Applications

Automated GPU kernel generation for enhanced performance: with the rise of artificial intelligence, demand for efficient GPUs keeps increasing. Writing optimized GPU kernels by hand is complex; Mirage automates this process.

Benefits of Mirage: Mirage simplifies GPU kernel generation, speeding up AI applications. It reduces latency by 15-20% compared to manual coding and offers 1.2x-2.5x faster performance than human-written code.

Usage of Mirage: using Mirage is straightforward, requiring only a few lines of code compared to traditional methods. It optimizes computations on GPUs, enhancing productivity and correctness in AI tasks.

Categories of GPU optimization: Mirage integrates techniques such as normalization, low-rank adaptation, gated MLP, and attention variants, tailored for AI applications.

List of useful links: http://t.me/itinai https://lnkd.in/e7mfJSf5

#AIDevelopment #MachineLearning #ArtificialIntelligence #Productivity #CognitiveComputing #FutureOfWork #AI #DeepLearning #Robotics https://lnkd.in/dB95mdw6
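At its core, a super-optimizer picks the fastest of several provably equivalent implementations. The toy below shows only that measure-and-select step, using two equivalent Python summation functions; it is not Mirage's API, which additionally generates the candidate GPU kernels by searching a multi-level tensor algebra.

```python
import time

def superoptimize(candidates, *args, trials=50):
    """Toy super-optimizer: check that all candidate functions agree on the
    given inputs, time each one over several trials, keep the fastest."""
    reference = candidates[0](*args)
    best_fn, best_t = None, float("inf")
    for fn in candidates:
        assert fn(*args) == reference, "candidate is not equivalent"
        start = time.perf_counter()
        for _ in range(trials):
            fn(*args)
        t = time.perf_counter() - start
        if t < best_t:
            best_fn, best_t = fn, t
    return best_fn

def sum_loop(xs):
    """Naive candidate: explicit accumulation loop."""
    total = 0
    for x in xs:
        total += x
    return total

def sum_builtin(xs):
    """Optimized candidate: C-implemented builtin."""
    return sum(xs)
```

A real kernel super-optimizer works the same way in spirit, but its candidates are generated transformations of a tensor program and its equivalence check is formal rather than a spot test on one input.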