what should pretraining compute be spent on?
4 years ago: mostly parameters
2 years ago: an even split of parameters and data
now: mostly data
you might be wondering why this has changed so much; here's a brief intellectual history of pretraining recipes and how they have evolved over the past 4 years: https://lnkd.in/gSSQse8X
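A rough way to see the shift is the compute-optimal rule of thumb from the scaling-laws literature (training FLOPs ≈ 6·N·D, and roughly 20 training tokens per parameter). The sketch below is my own back-of-the-envelope illustration, not something taken from the linked post:

```python
# Back-of-the-envelope compute-optimal split, assuming the common approximations
# C ≈ 6 * N * D (training FLOPs) and D ≈ 20 * N (Chinchilla-style tokens per parameter).
import math

def compute_optimal_split(flops_budget: float, tokens_per_param: float = 20.0):
    """Return (params, tokens) that roughly exhaust a training FLOPs budget."""
    n_params = math.sqrt(flops_budget / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

params, tokens = compute_optimal_split(1e23)
print(f"~{params / 1e9:.0f}B parameters, ~{tokens / 1e9:.0f}B tokens")
# -> roughly 29B parameters and 577B tokens for a 1e23-FLOP budget
```

Raising tokens_per_param well beyond 20 models the more recent, data-heavy recipes the post alludes to.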
Trevor Chow’s Post
More Relevant Posts
-
How LLMs Work in 3 Sentences:
1) During training, LLMs digest sequential inputs of text data, which are converted into numerical representations called embeddings.
2) These embeddings capture the semantic meaning and contextual relationships between words, facilitating the model's understanding of language.
3) Through iterative adjustments of internal parameters via back-propagation, LLMs refine their ability to predict the next word or sequence of words in a given context.
Just as a chef combines ingredients like a bun, a burger, and cheese to make a cheeseburger, a large language model turns your words into embeddings and mixes them to generate useful responses.
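A minimal sketch of that pipeline in PyTorch (token ids to embeddings to next-token prediction to back-propagation); the sizes and layers are toy assumptions and attention is omitted, so this illustrates the flow rather than a real LLM:

```python
import torch
import torch.nn as nn

vocab_size, embed_dim = 100, 32
embedding = nn.Embedding(vocab_size, embed_dim)    # step 1: token ids -> embeddings
head = nn.Linear(embed_dim, vocab_size)            # scores every possible next token

tokens = torch.tensor([[5, 17, 42, 8]])            # a toy "sentence" of token ids
hidden = embedding(tokens)                         # step 2: word representations (a real LLM adds attention here)
logits = head(hidden[:, :-1, :])                   # predict token t+1 from token t
loss = nn.functional.cross_entropy(                # step 3: how wrong were the predictions?
    logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1)
)
loss.backward()                                    # back-propagation: gradients used to adjust parameters
```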
-
Advanced Recipes for Contrastive Learning via #TowardsAI → https://bit.ly/3NbafQ5
Advanced Recipes for Contrastive Learning
https://towardsai.net
-
This week I want to share an excellent blog post by someone else: it explores, in a very nice and intuitive way, how to run distributed training in JAX. https://lnkd.in/ebkZR9sn
Exploring Parallel Strategies with Jax
astralord.github.io
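As a companion to the linked post, here is a minimal data-parallel training step in JAX using pmap; the toy model, shapes, and learning rate are my own assumptions, not code from the post:

```python
import functools
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    pred = x @ params["w"]                 # toy linear model
    return jnp.mean((pred - y) ** 2)

@functools.partial(jax.pmap, axis_name="batch")   # run one replica per local device
def train_step(params, x, y):
    grads = jax.grad(loss_fn)(params, x, y)
    grads = jax.lax.pmean(grads, axis_name="batch")   # all-reduce gradients across devices
    return jax.tree_util.tree_map(lambda p, g: p - 0.01 * g, params, grads)

n_dev = jax.local_device_count()
params = {"w": jnp.ones((4, 1))}
params = jax.device_put_replicated(params, jax.local_devices())  # copy params to every device
x = jnp.ones((n_dev, 8, 4))   # leading axis = one batch shard per device
y = jnp.ones((n_dev, 8, 1))
params = train_step(params, x, y)
```

Each device sees one shard of the batch, and the pmean keeps the replicas' parameters identical after every step.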
-
🚀 Demystifying DDP & FSDP for AI Developers 🚀
As AI models grow in scale, techniques like Distributed Data Parallel (DDP) and Fully Sharded Data Parallel (FSDP) have become essential tools for efficient distributed training. But understanding how these techniques work can be a challenge, especially when diving into the theoretical concepts behind them.
In my latest Medium article, I break down these advanced concepts in a simple but theoretically grounded way, aiming to make them accessible to developers at all levels. If you’ve ever felt overwhelmed by DDP and FSDP or struggled to grasp their inner workings, this article is for you!
🔍 What’s Inside?
1. A clear, step-by-step explanation of Distributed Data Parallel and Fully Sharded Data Parallel.
2. How these methods optimize training and scale models effectively.
3. Insights into the underlying theory, presented in an intuitive way.
Whether you're an AI researcher, a developer, or just someone interested in the mechanics behind large-scale deep learning, I hope this article helps you navigate the complexity of DDP and FSDP with confidence!
#MachineLearning #DeepLearning #DistributedTraining #DDP #FSDP #TechExplained #ModelTraining #LLMs
Distributed Training Demystified: A Beginner’s Guide to DDP & FSDP
link.medium.com
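To complement the article, a hedged sketch of how the two wrappers are applied in PyTorch. It assumes a single node launched with torchrun (so the rank doubles as the local GPU index) and uses a toy model; treat it as illustrative, not as the article's code:

```python
# DDP keeps a full model replica per GPU and all-reduces gradients;
# FSDP shards parameters, gradients, and optimizer state across GPUs.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group(backend="nccl")   # torchrun supplies rank/world-size env vars
local_rank = dist.get_rank()              # single-node assumption: rank == GPU index
torch.cuda.set_device(local_rank)

model = torch.nn.Sequential(              # toy stand-in for a real network
    torch.nn.Linear(1024, 4096), torch.nn.ReLU(), torch.nn.Linear(4096, 1024)
).cuda()

ddp_model = DDP(model, device_ids=[local_rank])  # replicate the model, sync gradients

# For models too large to replicate, wrap with FSDP instead of DDP:
# fsdp_model = FSDP(model)                       # shard params; gather them only when needed
```

Launched with something like `torchrun --nproc_per_node=<num_gpus> train.py`; the surrounding training loop stays largely the same, mainly the wrapper changes.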
-
How LLM training has shifted its focus: from parameters alone, to balancing parameters and data, to today's primary focus on data.
Three Kuhnian Revolutions in ML Training
blog.moonglow.ai
-
Building a search application has never been so easy 👇 Marqo has got you covered from training, to inference, to storage. You don't need to calculate the vectors yourself, simply select the model you want to use. Marqo supports hundreds of embedding models out of the box, as well as custom weights, and models fine-tuned with Marqtune (Marqo's Generalized Contrastive Learning framework). Find out more: https://lnkd.in/dKRiyzqs
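For a concrete feel, a rough sketch with Marqo's Python client; exact method and parameter names can vary across client and server versions, so check Marqo's docs rather than treating this as authoritative:

```python
import marqo

mq = marqo.Client(url="http://localhost:8882")

# Pick an embedding model at index creation; Marqo computes the vectors for you.
mq.create_index("products", model="hf/e5-base-v2")

mq.index("products").add_documents(
    [{"title": "Trail running shoes", "description": "Lightweight, grippy sole"}],
    tensor_fields=["title", "description"],
)

results = mq.index("products").search("shoes for muddy trails")
```

Embedding inference happens server-side, which is the "you don't need to calculate the vectors yourself" part.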
-
The most common goal that my search clients express is a desire to improve their ranking. This post sketches out some challenges of obtaining labeled training data to implement machine-learned ranking, aka learning to rank (LTR).
Where Do LTR Labels Come From?
dtunkelang.medium.com
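As a small illustration of where such labels can come from, two common sources are explicit human judgments and implicit click signals; the field names, grading scale, and threshold below are my own assumptions, not the article's:

```python
# Explicit labels: human-rated graded relevance for (query, document) pairs.
explicit_judgments = [
    {"query": "running shoes", "doc_id": "sku-123", "grade": 3},  # highly relevant
    {"query": "running shoes", "doc_id": "sku-987", "grade": 0},  # not relevant
]

# Implicit labels: a noisy relevance proxy derived from engagement logs.
def implicit_label(impressions: int, clicks: int, min_impressions: int = 50):
    """Click-through rate as a weak label; skip pairs with too little traffic."""
    if impressions < min_impressions:
        return None  # not enough evidence to trust the signal
    return clicks / impressions

print(implicit_label(impressions=200, clicks=30))  # 0.15
```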
-
Our recent study found that user training is a game-changer when it comes to AI-infused products. Learn more by downloading the full report! bit.ly/sparqaistudy
Co-Founder @ Snowpilot (YC S24), Ex-Microsoft
good stuff!! 💡