Training modern machine learning models for personalization relies on good negative sampling. It turns out that model performance on popular, mid-popularity, and tail items depends on both the negative sampling method and your training data. More juicy details and open-source code are in this in-depth paper by Arushi Prakash, Ph.D. and Dimitris Berberidis, recently published at an ACM RecSys 2024 workshop. https://lnkd.in/gTqt94Jd https://lnkd.in/g8nMgvzt
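To make the idea concrete, here is a minimal sketch of two common negative-sampling strategies (uniform vs. popularity-weighted). The function name, toy item counts, and the alpha exponent are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def sample_negatives(positive_items, item_counts, k=5, strategy="uniform", alpha=0.75, rng=None):
    """Sample k negative item ids for each positive interaction.

    positive_items: array of positive item ids (one per training example)
    item_counts: interaction count per item (index = item id)
    strategy: "uniform" treats all items equally; "popularity" samples
              proportionally to count**alpha, over-representing head items.
    Collisions with the positive item are ignored here for brevity.
    """
    rng = rng or np.random.default_rng(0)
    n_items = len(item_counts)
    if strategy == "uniform":
        probs = np.full(n_items, 1.0 / n_items)
    else:
        weights = item_counts.astype(float) ** alpha
        probs = weights / weights.sum()
    return rng.choice(n_items, size=(len(positive_items), k), p=probs)

# Toy usage: 3 positives over a 6-item catalog with a skewed popularity profile.
item_counts = np.array([100, 50, 20, 5, 2, 1])
positives = np.array([0, 2, 5])
print(sample_negatives(positives, item_counts, strategy="popularity"))
```

Which of the two behaves better on tail items is exactly the kind of question the paper studies empirically.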
-
Alignment is an incredibly complex machine learning challenge every machine learning engineer should understand. I wrote about what it is and what it means for you. Most importantly: it takes place throughout the entire machine learning pipeline, not just during training. I also linked to some excellent related articles by Devansh Devansh and Nathan Lambert. Learn about alignment here: https://lnkd.in/gX8YfUH7
Alignment: Understanding the Multi-Billion Dollar Opportunity within Machine Learning
societysbackend.com
-
10 Common Machine Learning Algorithms - what each one does and its real-world applications: https://lnkd.in/dEqmZPg5
10 Common Machine Learning Algorithms
aiforproduct.substack.com
-
I had the joy of reading a recent preprint, "Questionable practices in machine learning" by Leech et al. It echoes many of the sentiments I've had for the past couple of years. I wrote some thoughts about it here: https://lnkd.in/e_HkFY9n
paper thoughts – questionable research practices (QRPs) in machine learning
beckham.nz
-
A new arXiv paper "CatLIP" from Apple introduces an efficient method for pre-training vision models on large-scale image-text datasets to achieve CLIP-level performance. CatLIP is a weakly supervised pre-training method for vision models using web-scale image-text data, which recasts pre-training as a classification task. This reformulation addresses computational challenges of contrastive learning used in CLIP.

Key Findings:
- Efficiency: CatLIP is 2.7x faster to pre-train than CLIP on web-scale data while achieving similar downstream accuracy on ImageNet-1k (e.g., 84.3% vs. 84.4% top-1 for ViT B/16).
- Performance on smaller datasets: Unlike CLIP, CatLIP benefits from longer training on smaller datasets like CC3M, with accuracy improving as pre-training epochs increase.
- Scaling properties: CatLIP's performance scales well with larger models (ViT B/16 < ViT L/16 < ViT H/16) and bigger pre-training datasets (CC3M < DataComp-1.3B), similar to CLIP.
- Data-efficient transfer learning: Initializing the classifier with embeddings from CatLIP's pre-trained classifier enables more data-efficient transfer learning, especially with limited labeled data.
- Generalization to complex tasks: Representations learned with CatLIP generalize well to various downstream tasks, such as multi-label classification, semantic segmentation, and object detection, matching or slightly outperforming CLIP.

In summary, CatLIP provides a more computationally efficient alternative to CLIP for pre-training on web-scale image-text data, while maintaining similar or better performance on a range of downstream computer vision tasks. The strong performance and efficiency of CatLIP make it a viable option for learning high-quality visual representations from large-scale weakly-labeled data.

Paper: https://lnkd.in/g9U7jYMV
Github: https://lnkd.in/gNJHauCd
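A rough sketch of the reformulation described above: image-text pre-training recast as multi-label classification with a plain classification loss instead of a contrastive one. The dummy backbone, vocabulary size, and caption-to-label-id mapping are simplified placeholders, not the paper's actual pipeline:

```python
import torch
import torch.nn as nn

class ClassificationPretrainHead(nn.Module):
    """Vision backbone + linear classifier over a caption-derived label vocabulary."""
    def __init__(self, backbone: nn.Module, embed_dim: int, vocab_size: int):
        super().__init__()
        self.backbone = backbone
        self.classifier = nn.Linear(embed_dim, vocab_size)

    def forward(self, images):
        feats = self.backbone(images)      # (B, embed_dim) pooled image features
        return self.classifier(feats)      # (B, vocab_size) multi-label logits

def multilabel_targets(label_ids_per_image, vocab_size):
    """Binary target vector per image, built from the label ids of its caption."""
    targets = torch.zeros(len(label_ids_per_image), vocab_size)
    for i, ids in enumerate(label_ids_per_image):
        targets[i, ids] = 1.0
    return targets

# Toy usage with a dummy backbone; real training would use a ViT on web-scale data.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
model = ClassificationPretrainHead(backbone, embed_dim=128, vocab_size=1000)
images = torch.randn(4, 3, 32, 32)
labels = [[3, 17], [42], [3, 999], [7]]    # hypothetical ids extracted from each caption
logits = model(images)
loss = nn.BCEWithLogitsLoss()(logits, multilabel_targets(labels, 1000))
loss.backward()
```

The point of the reformulation is that this loss needs no large in-batch negative comparisons, which is where the contrastive objective spends much of its compute.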
-
The kernel trick in SVMs is misunderstood most of the time: the simple mathematical insight behind it is often ignored by learners. For data that is not linearly separable, mapping it into a higher-dimensional space can make it linearly separable. But explicitly transforming points into higher dimensions is problematic: it is time consuming, and the right mapping functions have to be found. At the end of the day, if you want a hyperplane separating a set of points, all you need are the dot products between the points. So even in the higher-dimensional space, all we need are the dot products between all pairs of points. Kernel functions are functions that return the dot product between any two points as if they had been mapped into the higher-dimensional space, without actually transforming them. So a kernel function does not transform the data into higher dimensions. It takes two points in the original space and returns the dot product of their images in the higher-dimensional space. It gives you exactly what you need as the end outcome, the dot products / similarities between points in the higher-dimensional space, from which you can find the separating hyperplane. And this is far less computationally expensive.
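A small numerical check of that claim, using the degree-2 polynomial kernel K(x, y) = (x·y)^2 and its explicit feature map (the helper names here are just for illustration):

```python
import numpy as np

def poly2_kernel(x, y):
    """Degree-2 polynomial kernel: dot product in the implicit feature space."""
    return np.dot(x, y) ** 2

def phi(x):
    """Explicit degree-2 feature map for 2-D input: [x1^2, sqrt(2)*x1*x2, x2^2]."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x = np.array([1.0, 3.0])
y = np.array([2.0, -1.0])

print(poly2_kernel(x, y))          # kernel, computed entirely in the original 2-D space
print(np.dot(phi(x), phi(y)))      # same value, via the explicit 3-D mapping
```

Both prints give the same number (1.0 here), even though the kernel computation never leaves the original 2-D space.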
-
"Learning with Noisy Foundation Models" Hao Chen et al. Foundation models are pre-trained on large-scale datasets before being fine-tuned for specific downstream tasks using additional sample data. However, since the pre-trainign datasets are difficult to access, this posses a potential risk of having both noisy labels and data during initial training, leading to problems in model generalization and even security risks. The authors of this paper address for first time this issue through extensive experimentation on synthetic noisy datasets, showting that even slight noise can have deteriorating effects on out-of-domain (OOD) performance for any model type, architecture, training objectives, and downstream applications. This is because the pre-training noise shapes the feature space differently. Therefore, they propose a new tuning method, NMTune, that is able to improve this feature space and mitigate the effects of noise to improve model generalization. The results of this research showcase very important considerations when handling LLM models, as most downstream tasks involve some form of fine-tuning of foundation models which have already been trained on data that is largely unavailable for the public. Not only this can introduce potential biases and/or risks during inference, but also deteriorate the model's performance during OOD predictions. #llm #foundationmodel #noisydata #finetuning #researchpaper https://lnkd.in/geE2biuf
Learning with Noisy Foundation Models
arxiv.org
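Not the paper's code, just a minimal sketch of the kind of synthetic (symmetric) label-noise injection such experiments rely on; the noise rate and label set are placeholders:

```python
import numpy as np

def inject_label_noise(labels, num_classes, noise_rate=0.1, rng=None):
    """Return a copy of `labels` where a fraction `noise_rate` of entries is
    replaced by a uniformly random *different* class (symmetric label noise)."""
    rng = rng or np.random.default_rng(0)
    noisy = labels.copy()
    flip = rng.random(len(labels)) < noise_rate
    for i in np.where(flip)[0]:
        choices = [c for c in range(num_classes) if c != labels[i]]
        noisy[i] = rng.choice(choices)
    return noisy

# Toy usage: corrupt roughly 10% of 1,000 labels drawn from 5 classes,
# then pre-train on the noisy labels and compare OOD performance downstream.
clean = np.random.default_rng(1).integers(0, 5, size=1000)
noisy = inject_label_noise(clean, num_classes=5, noise_rate=0.1)
print("actual corruption rate:", np.mean(clean != noisy))
```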
-
Last week I had the privilege of taking part in "Machine Learning approaches for complexity", part of the Erice International School on Complexity. There were many interesting lectures and discussions, both on applications of machine learning and on ideas that were new to me around sparsity and theoretical approaches. I spoke about Fourier analytic Barron space theory and the gbtoolbox. Some slides have been uploaded, including mine, and can be found at https://lnkd.in/gwvrvVAp Check out the slides if you want to understand how the toolbox works, a bit about Fourier analytic Barron space theory, or some examples of the toolbox in practice. One of the most important examples I showed was with the Digits dataset. Digits is like an easier MNIST: it consists of 4.5-bit 8-by-8 handwritten digit images instead of 8-bit 28-by-28 ones. In that example, the Barron-E initialized network had sensitivity and accuracy close to those of the trained network. The next version of the gbtoolbox is being worked on, and I will announce its release when it is ready.
Erice International School on Complexity: the XVIII Course "Machine Learning approaches for complexity"
indico.lucas.lu.se
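For anyone unfamiliar with the Digits dataset mentioned in the post above, a quick way to inspect it via scikit-learn's bundled copy (this is unrelated to the gbtoolbox itself):

```python
from sklearn.datasets import load_digits

# scikit-learn ships the UCI handwritten-digits set: 8x8 grayscale images with
# integer pixel values in 0..16, a much smaller cousin of 28x28, 8-bit MNIST.
digits = load_digits()
print(digits.images.shape)                        # (1797, 8, 8)
print(digits.images.min(), digits.images.max())   # 0.0 16.0
print(digits.target[:10])                         # class labels 0..9
```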
-
Check out the difference; I've included the code as well, with examples for better explanations :-) Day 3 of machine learning. Hope you like it. Best regards
Day 3 _ Model vs Instance Based Model
https://ingoampt.com
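Not from the linked article, just a quick illustrative contrast of the two ideas in scikit-learn: a model-based learner fits parameters and can discard the training data, while an instance-based learner keeps the training points and compares against them at prediction time.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression      # model-based: learns coefficients
from sklearn.neighbors import KNeighborsRegressor      # instance-based: memorizes training points

X, y = make_regression(n_samples=200, n_features=3, noise=5.0, random_state=0)

model_based = LinearRegression().fit(X, y)                      # keeps only coef_ and intercept_
instance_based = KNeighborsRegressor(n_neighbors=5).fit(X, y)   # keeps X, y internally

print(model_based.predict(X[:2]))
print(instance_based.predict(X[:2]))
```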
-
I recently completed a machine learning course and earned a certification, enhancing my skills in developing predictive models and applying advanced techniques to solve complex problems.
Research Scientist at Criteo
Or Bayes ;)