Training modern machine learning models for personalization relies on good negative sampling. It turns out that model performance on popular, mid-popularity, and tail items depends on both the negative sampling method and your training data. More juicy details and open-source code are in this in-depth paper by Arushi Prakash, Ph.D. and Dimitris Berberidis, recently published at an ACM RecSys 2024 workshop. https://lnkd.in/gTqt94Jd https://lnkd.in/g8nMgvzt
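To make the idea concrete, here is a minimal sketch of two common negative-sampling strategies (uniform vs. popularity-weighted). The function name, toy item counts, and the alpha exponent are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def sample_negatives(positive_items, item_counts, k=5, strategy="uniform", alpha=0.75, rng=None):
    """Sample k negative item ids for each positive interaction.

    positive_items: array of positive item ids (one per training example)
    item_counts: interaction count per item (index = item id)
    strategy: "uniform" treats all items equally; "popularity" samples
              proportionally to count**alpha, over-representing head items.
    Collisions with the positive item are ignored here for brevity.
    """
    rng = rng or np.random.default_rng(0)
    n_items = len(item_counts)
    if strategy == "uniform":
        probs = np.full(n_items, 1.0 / n_items)
    else:
        weights = item_counts.astype(float) ** alpha
        probs = weights / weights.sum()
    return rng.choice(n_items, size=(len(positive_items), k), p=probs)

# Toy usage: 3 positives over a 6-item catalog with a skewed popularity profile.
item_counts = np.array([100, 50, 20, 5, 2, 1])
positives = np.array([0, 2, 5])
print(sample_negatives(positives, item_counts, strategy="popularity"))
```

Which of the two behaves better on tail items is exactly the kind of question the paper studies empirically.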
-
Alignment is an incredibly complex machine learning challenge every machine learning engineer should understand. I wrote about what it is and what it means for you. Most importantly: it takes place throughout the entire machine learning pipeline, not just during training. I also linked to some excellent related articles by Devansh Devansh and Nathan Lambert. Learn about alignment here: https://lnkd.in/gX8YfUH7
Alignment: Understanding the Multi-Billion Dollar Opportunity within Machine Learning
societysbackend.com
-
10 Common Machine Learning Algorithms - what each one does and its real-world applications: https://lnkd.in/dEqmZPg5
10 Common Machine Learning Algorithms
aiforproduct.substack.com
-
I had the joy of reading a recent preprint, "Questionable practices in machine learning" by Leech et al. It echoes many of the sentiments I've had for the past couple of years. I wrote some thoughts about it here: https://lnkd.in/e_HkFY9n
paper thoughts – questionable research practices (QRPs) in machine learning
beckham.nz
-
A new arXiv paper "CatLIP" from Apple introduces an efficient method for pre-training vision models on large-scale image-text datasets to achieve CLIP-level performance. CatLIP is a weakly supervised pre-training method for vision models using web-scale image-text data, which recasts pre-training as a classification task. This reformulation addresses computational challenges of contrastive learning used in CLIP.

Key Findings:
- Efficiency: CatLIP is 2.7x faster to pre-train than CLIP on web-scale data while achieving similar downstream accuracy on ImageNet-1k (e.g., 84.3% vs. 84.4% top-1 for ViT B/16).
- Performance on smaller datasets: Unlike CLIP, CatLIP benefits from longer training on smaller datasets like CC3M, with accuracy improving as pre-training epochs increase.
- Scaling properties: CatLIP's performance scales well with larger models (ViT B/16 < ViT L/16 < ViT H/16) and bigger pre-training datasets (CC3M < DataComp-1.3B), similar to CLIP.
- Data-efficient transfer learning: Initializing the classifier with embeddings from CatLIP's pre-trained classifier enables more data-efficient transfer learning, especially with limited labeled data.
- Generalization to complex tasks: Representations learned with CatLIP generalize well to various downstream tasks, such as multi-label classification, semantic segmentation, and object detection, matching or slightly outperforming CLIP.

In summary, CatLIP provides a more computationally efficient alternative to CLIP for pre-training on web-scale image-text data, while maintaining similar or better performance on a range of downstream computer vision tasks. The strong performance and efficiency of CatLIP make it a viable option for learning high-quality visual representations from large-scale weakly-labeled data.

Paper: https://lnkd.in/g9U7jYMV
Github: https://lnkd.in/gNJHauCd
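A rough sketch of the reformulation described above: image-text pre-training recast as multi-label classification with a plain classification loss instead of a contrastive one. The dummy backbone, vocabulary size, and caption-to-label-id mapping are simplified placeholders, not the paper's actual pipeline:

```python
import torch
import torch.nn as nn

class ClassificationPretrainHead(nn.Module):
    """Vision backbone + linear classifier over a caption-derived label vocabulary."""
    def __init__(self, backbone: nn.Module, embed_dim: int, vocab_size: int):
        super().__init__()
        self.backbone = backbone
        self.classifier = nn.Linear(embed_dim, vocab_size)

    def forward(self, images):
        feats = self.backbone(images)      # (B, embed_dim) pooled image features
        return self.classifier(feats)      # (B, vocab_size) multi-label logits

def multilabel_targets(label_ids_per_image, vocab_size):
    """Binary target vector per image, built from the label ids of its caption."""
    targets = torch.zeros(len(label_ids_per_image), vocab_size)
    for i, ids in enumerate(label_ids_per_image):
        targets[i, ids] = 1.0
    return targets

# Toy usage with a dummy backbone; real training would use a ViT on web-scale data.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
model = ClassificationPretrainHead(backbone, embed_dim=128, vocab_size=1000)
images = torch.randn(4, 3, 32, 32)
labels = [[3, 17], [42], [3, 999], [7]]    # hypothetical ids extracted from each caption
logits = model(images)
loss = nn.BCEWithLogitsLoss()(logits, multilabel_targets(labels, 1000))
loss.backward()
```

The point of the reformulation is that this loss needs no large in-batch negative comparisons, which is where the contrastive objective spends much of its compute.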
-
The kernel trick in SVMs is misunderstood most of the time: the simple mathematical insight behind it is often ignored by learners. For data that is not linearly separable, mapping it into a higher-dimensional space can make it linearly separable. But explicitly transforming points into higher dimensions is problematic: it is time consuming, and the right mapping functions have to be found. At the end of the day, if you want a hyperplane separating a set of points, all you need are the dot products between the points. So even in the higher-dimensional space, all we need are the dot products between all pairs of points. Kernel functions are functions that return the dot product between any two points as if they had been mapped into the higher-dimensional space, without actually transforming them. So a kernel function does not transform the data into higher dimensions. It takes two points in the original space and returns the dot product of their images in the higher-dimensional space. It gives you exactly what you need as the end outcome, the dot products / similarities between points in the higher-dimensional space, from which you can find the separating hyperplane. And this is far less computationally expensive.
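A small numerical check of that claim, using the degree-2 polynomial kernel K(x, y) = (x·y)^2 and its explicit feature map (the helper names here are just for illustration):

```python
import numpy as np

def poly2_kernel(x, y):
    """Degree-2 polynomial kernel: dot product in the implicit feature space."""
    return np.dot(x, y) ** 2

def phi(x):
    """Explicit degree-2 feature map for 2-D input: [x1^2, sqrt(2)*x1*x2, x2^2]."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x = np.array([1.0, 3.0])
y = np.array([2.0, -1.0])

print(poly2_kernel(x, y))          # kernel, computed entirely in the original 2-D space
print(np.dot(phi(x), phi(y)))      # same value, via the explicit 3-D mapping
```

Both prints give the same number (1.0 here), even though the kernel computation never leaves the original 2-D space.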
-
"Learning with Noisy Foundation Models" Hao Chen et al. Foundation models are pre-trained on large-scale datasets before being fine-tuned for specific downstream tasks using additional sample data. However, since the pre-trainign datasets are difficult to access, this posses a potential risk of having both noisy labels and data during initial training, leading to problems in model generalization and even security risks. The authors of this paper address for first time this issue through extensive experimentation on synthetic noisy datasets, showting that even slight noise can have deteriorating effects on out-of-domain (OOD) performance for any model type, architecture, training objectives, and downstream applications. This is because the pre-training noise shapes the feature space differently. Therefore, they propose a new tuning method, NMTune, that is able to improve this feature space and mitigate the effects of noise to improve model generalization. The results of this research showcase very important considerations when handling LLM models, as most downstream tasks involve some form of fine-tuning of foundation models which have already been trained on data that is largely unavailable for the public. Not only this can introduce potential biases and/or risks during inference, but also deteriorate the model's performance during OOD predictions. #llm #foundationmodel #noisydata #finetuning #researchpaper https://lnkd.in/geE2biuf
Learning with Noisy Foundation Models
arxiv.org
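Not the paper's code, just a minimal sketch of the kind of synthetic (symmetric) label-noise injection such experiments rely on; the noise rate and label set are placeholders:

```python
import numpy as np

def inject_label_noise(labels, num_classes, noise_rate=0.1, rng=None):
    """Return a copy of `labels` where a fraction `noise_rate` of entries is
    replaced by a uniformly random *different* class (symmetric label noise)."""
    rng = rng or np.random.default_rng(0)
    noisy = labels.copy()
    flip = rng.random(len(labels)) < noise_rate
    for i in np.where(flip)[0]:
        choices = [c for c in range(num_classes) if c != labels[i]]
        noisy[i] = rng.choice(choices)
    return noisy

# Toy usage: corrupt roughly 10% of 1,000 labels drawn from 5 classes,
# then pre-train on the noisy labels and compare OOD performance downstream.
clean = np.random.default_rng(1).integers(0, 5, size=1000)
noisy = inject_label_noise(clean, num_classes=5, noise_rate=0.1)
print("actual corruption rate:", np.mean(clean != noisy))
```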
-
Last week I had the privilege of taking part in "Machine Learning approaches for complexity", part of the Erice International School on Complexity. There were many interesting lectures and discussions, both on applications of machine learning and on ideas that were new to me around sparsity and theoretical approaches. I spoke about Fourier analytic Barron space theory and the gbtoolbox. Some slides have been uploaded, including mine, and can be found at https://lnkd.in/gwvrvVAp Check out the slides if you want to understand how the toolbox works, a bit about Fourier analytic Barron space theory, or some examples of the toolbox in practice. One of the most important examples I showed was with the Digits dataset. Digits is like an easier MNIST: it consists of 4.5-bit 8-by-8 handwritten digit images instead of 8-bit 28-by-28 ones. In that example, the Barron-E initialized network had sensitivity and accuracy close to those of the trained network. The next version of the gbtoolbox is being worked on, and I will announce its release when it is ready.
Erice International School on Complexity: the XVIII Course "Machine Learning approaches for complexity"
indico.lucas.lu.se
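For anyone unfamiliar with the Digits dataset mentioned in the post above, a quick way to inspect it via scikit-learn's bundled copy (this is unrelated to the gbtoolbox itself):

```python
from sklearn.datasets import load_digits

# scikit-learn ships the UCI handwritten-digits set: 8x8 grayscale images with
# integer pixel values in 0..16, a much smaller cousin of 28x28, 8-bit MNIST.
digits = load_digits()
print(digits.images.shape)                        # (1797, 8, 8)
print(digits.images.min(), digits.images.max())   # 0.0 16.0
print(digits.target[:10])                         # class labels 0..9
```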
-
Check out the difference; I've included the code as well, with examples for better explanations :-) Day 3 of machine learning. Hope you like it. Best regards
Day 3 _ Model vs Instance Based Model
https://ingoampt.com
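Not from the linked article, just a quick illustrative contrast of the two ideas in scikit-learn: a model-based learner fits parameters and can discard the training data, while an instance-based learner keeps the training points and compares against them at prediction time.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression      # model-based: learns coefficients
from sklearn.neighbors import KNeighborsRegressor      # instance-based: memorizes training points

X, y = make_regression(n_samples=200, n_features=3, noise=5.0, random_state=0)

model_based = LinearRegression().fit(X, y)                      # keeps only coef_ and intercept_
instance_based = KNeighborsRegressor(n_neighbors=5).fit(X, y)   # keeps X, y internally

print(model_based.predict(X[:2]))
print(instance_based.predict(X[:2]))
```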
-
I recently completed a machine learning course and earned a certification, enhancing my skills in developing predictive models and applying advanced techniques to solve complex problems.
Research Scientist at Criteo
Or Bayes ;)