Exploring and running large language models (LLMs) locally has never been easier, thanks to tools like LM Studio, which provides valuable insights into model size, prompt tokens, RAM and CPU usage, and more: https://lmstudio.ai/ 👍 #ai
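If you'd rather script against it, LM Studio can also serve any loaded model through an OpenAI-compatible local endpoint (default http://localhost:1234/v1). A minimal sketch, assuming the openai Python package and a model already loaded in LM Studio; the model name below is a placeholder:

```python
from openai import OpenAI

# LM Studio's local server ignores the API key, but the client requires one.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

resp = client.chat.completions.create(
    model="local-model",  # placeholder; use the identifier shown in LM Studio
    messages=[{"role": "user", "content": "Say hello from my laptop!"}],
)
print(resp.choices[0].message.content)
```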
-
Finally started my journey into deep RL with Stable Baselines and learnt that the GPU is way slower than the CPU. Will blame the framework and the GPU (c). Follow me for more AI advice
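To be fair, this is expected for small MLP policies: the network is tiny, so kernel launches and host-device transfers dominate and the CPU wins. A quick sketch to reproduce the comparison on your own machine, assuming Stable Baselines3 and Gymnasium are installed:

```python
import time

from stable_baselines3 import PPO

# Time a short PPO run on CPU vs GPU; with a tiny MlpPolicy the CPU
# usually finishes first because the GPU overhead never pays off.
for device in ("cpu", "cuda"):
    model = PPO("MlpPolicy", "CartPole-v1", device=device, verbose=0)
    start = time.time()
    model.learn(total_timesteps=10_000)
    print(f"{device}: {time.time() - start:.1f}s")
```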
-
RAG without GPU: How to build a Financial Analysis Model with Qdrant, Langchain, and GPT4All x… https://lnkd.in/dFcRKZfN
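A minimal sketch of the shape of such a CPU-only pipeline, assuming langchain-community, qdrant-client, and gpt4all are installed; the sample texts and the GGUF filename are placeholders, not the article's exact code:

```python
from langchain.chains import RetrievalQA
from langchain_community.embeddings import GPT4AllEmbeddings
from langchain_community.llms import GPT4All
from langchain_community.vectorstores import Qdrant

# Toy corpus standing in for chunked financial filings.
texts = [
    "Q3 revenue grew 12% year over year, driven by cloud subscriptions.",
    "Operating margin declined two points on higher data-center spend.",
]

# In-memory Qdrant collection; point location/url at a real server in production.
store = Qdrant.from_texts(texts, GPT4AllEmbeddings(), location=":memory:")

# Local CPU-only LLM; the .gguf path is a placeholder for your downloaded model.
llm = GPT4All(model="./mistral-7b-instruct.Q4_0.gguf")

qa = RetrievalQA.from_chain_type(llm=llm, retriever=store.as_retriever())
print(qa.invoke("What drove revenue growth in Q3?"))
```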
-
Check out Nat Friedman's "craigslist for GPU clusters": https://gpulist.ai/ GPU thirst is a spectacle to behold. https://lnkd.in/gBtPrTpB
Nat Friedman (@natfriedman) on X
twitter.com
-
A new post-training quantization paradigm for diffusion models that quantizes both the weights and activations of FLUX.1 to 4 bits, achieving a 3.5× memory reduction and an 8.7× latency reduction on a 16GB laptop 4090 GPU. https://lnkd.in/grF6jiBv
SVDQuant: Accurate 4-Bit Quantization Powers 12B FLUX on a 16GB 4090 Laptop with 3x Speedup
hanlab.mit.edu
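The core trick is to absorb weight outliers into a small high-precision low-rank branch (via SVD) and 4-bit-quantize only the residual. A toy numpy illustration of that decomposition (my own sketch of the idea, not the paper's implementation, which also handles activations):

```python
import numpy as np

def svd_quant(W: np.ndarray, rank: int = 16):
    """Split W into a high-precision low-rank part plus an INT4 residual."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    L = (U[:, :rank] * S[:rank]) @ Vt[:rank]   # low-rank branch, kept in fp32
    R = W - L                                  # residual to quantize
    scale = np.abs(R).max() / 7.0              # symmetric INT4 range [-8, 7]
    Rq = np.clip(np.round(R / scale), -8, 7).astype(np.int8)
    return L, Rq, scale

W = np.random.randn(256, 256).astype(np.float32)
L, Rq, scale = svd_quant(W)
err = np.abs(W - (L + Rq * scale)).mean()
print(f"mean abs reconstruction error: {err:.4f}")
```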
-
Attention is the core building block for LLMs, but it can be slow and inefficient. FlashAttention offers a faster, exact implementation designed for cutting-edge hardware!
𝗙𝗹𝗮𝘀𝗵𝗔𝘁𝘁𝗲𝗻𝘁𝗶𝗼𝗻 𝘃𝟭: Tiling and recomputation for ~7x speedup.
𝗙𝗹𝗮𝘀𝗵𝗔𝘁𝘁𝗲𝗻𝘁𝗶𝗼𝗻 𝘃𝟮: Reduced ops, reaching 70% utilization of A100 GPUs.
𝗙𝗹𝗮𝘀𝗵𝗔𝘁𝘁𝗲𝗻𝘁𝗶𝗼𝗻 𝘃𝟯: Async execution and low-precision arithmetic, achieving 75% utilization of H100 GPUs.
Learn more about FlashAttention in my latest article: https://lnkd.in/gEW2EaaX
#FlashAttention #LLMs #GPUComputing
FlashAttention — one, two, three!
medium.com
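If you just want to use it, PyTorch's built-in scaled dot-product attention can dispatch to a FlashAttention kernel on supported GPUs. A hedged usage sketch, assuming PyTorch 2.3+ with a CUDA device:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# (batch, heads, seq_len, head_dim) in fp16 on the GPU
q, k, v = (torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16)
           for _ in range(3))

# Restrict SDPA to the FlashAttention backend; errors if this GPU lacks it.
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)

print(out.shape)  # torch.Size([1, 8, 1024, 64])
```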
-
🚀 New Blog Post: Fundamentals of Transformer Inference Estimations ✍️
As transformer models expand in size and application, understanding how to estimate and optimize inference on GPUs becomes crucial. In my latest blog, I cover the fundamental factors influencing transformer inference performance, including:
🔍 Arithmetic Intensity: Learn how to measure operations per byte for efficiency
🚀 Bound Scenarios: Insight into compute-bound, memory-bound, and overhead-bound estimations
💡 Cost and Latency Estimations: A step-by-step guide for calculating token generation time
⚙️ Performance Boost: Practical tips on batching, quantization, tensor parallelism, and speculative decoding
As a bonus, you will find an estimator app that compares GPU costs and performance based on different model parameters. Best viewed in a web browser.
👉 Read the full post here: https://lnkd.in/dBjpcMpM
#transformers #genai #llm #deeplearning #fundamentals
Transformer Inference Estimations: Arithmetic Intensity, Throughput and Cost Optimization
yadavsaurabh.com
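For a flavor of these estimates: during decoding, every generated token must stream all model weights from HBM, so a memory-bound lower bound on per-token latency is simply weight bytes divided by memory bandwidth. A back-of-the-envelope sketch with illustrative numbers (my assumptions, not figures from the article):

```python
# 7B model in FP16 on an A100-80GB-class GPU (~2 TB/s HBM bandwidth).
params = 7e9
bytes_per_param = 2          # FP16
hbm_bandwidth = 2.0e12       # bytes/second

weight_bytes = params * bytes_per_param
latency = weight_bytes / hbm_bandwidth   # each token re-reads all weights
print(f"~{latency * 1e3:.0f} ms/token, ~{1 / latency:.0f} tokens/s")
# -> ~7 ms/token, ~143 tokens/s (best case; ignores compute, KV cache, overhead)
```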
-
Join me for an in-person, hands-on workshop in Malaga next week, on July 9th at 11 AM! The topic is "Deploying LLMs with NVIDIA GPUs on OCI Compute Bare Metal", where I will guide you and answer questions while you practice deploying Large Language Models on OCI Compute using NVIDIA GPUs on your own. We will talk about the inference and implementation side of things. You can still register using this link: https://lnkd.in/dhk8SurN
-
🖥️ CPU vs GPU: What's the Difference?
Both CPU (Central Processing Unit) and GPU (Graphics Processing Unit) are vital in computing, but they serve different roles:
💡 CPU: General-purpose processor, best for sequential tasks like running the OS, web browsing, and complex logic. Few cores (4-16), optimized for single-thread performance. Lower latency, great for multitasking with complex decisions.
💡 GPU: Specialized for parallel tasks like image processing, video rendering, and AI. Hundreds or thousands of cores for executing tasks simultaneously. High throughput, perfect for massive parallelism.
🔑 Key Difference: CPU = Sequential, complex tasks 🧠 GPU = Parallel, data-heavy tasks 🎨
#TechTalk #Computing #CPU #GPU #AI #Graphics #Technology #ParallelComputing
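A toy way to feel the difference, assuming PyTorch with a CUDA device: one big matrix multiply is massively parallel, so the GPU should win by a wide margin here (the opposite of the tiny-network RL case above):

```python
import time
import torch

x = torch.randn(4096, 4096)

for device in ("cpu", "cuda"):
    y = x.to(device)
    if device == "cuda":
        torch.cuda.synchronize()  # wait for the host-to-device copy
    start = time.time()
    z = y @ y                     # one large, highly parallel matmul
    if device == "cuda":
        torch.cuda.synchronize()  # GPU kernels run async; wait before timing
    print(f"{device}: {time.time() - start:.4f}s")
```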
-
A very good explanation of how GPUs work in this thread. Perhaps it's known to most of my readers, but I really like the clarity of the explanation here: https://lnkd.in/gehQmNMf Also, please check https://lnkd.in/g8k3d_i5
Juan Linietsky (@reduzio) on X
x.com
-
This Supermicro SYS-751GE-TNRT-NV1 stands as a formidable GPU platform, delivering unparalleled power right at your desk. Boasting a staggering 2 petaflops of AI performance, it enables data scientists, analytics engineers, and other professionals to harness AI capabilities for diverse workloads. The incorporation of liquid cooling not only ensures optimal performance for CPUs and GPUs but also mitigates the noise levels usually associated with high-powered systems. Get a quote: https://lnkd.in/gC9QvCmc #ai #liquidcooling #serversolutions #aiworkloads
Keep IT Simple! (6mo): If CLI is your cup of tea, then ... 😀