Fedor Borisyuk’s Post

Principal Staff Software Engineer

2mo

This week we are presenting our paper "LiNR: Model Based Neural Retrieval on GPUs at LinkedIn" accepted at CIKM 2024 (https://lnkd.in/gUrWqRcD). Please stop by and say hi to Aman Gupta, who will be there in person :) We discuss our experiences and challenges in creating scalable, differentiable search indexes using TensorFlow and PyTorch at production scale. In LiNR, both items and model weights are integrated into the model binary. Viewing index construction as a form of model training, we describe scaling our system for large indexes, incorporating full scans and efficient filtering. We believe LiNR represents one of the industry's first live-updated model-based retrieval indexes at production scale. Talented co-authors include Fedor Borisyuk, Qingquan Song, Mingzhou Zhou, Ganesh Parameswaran, Madhulekha Arun, Siva P., Tugrul Bingol, Zhoutao Pei, Stanley(Kuang) Lee, Lu Z., Hugh Shao, Syed Ali Naqvi, Sen Zhou, Aman Gupta

LiNR: Model Based Neural Retrieval on GPUs at LinkedIn

arxiv.org
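
To make the phrase "full scans and efficient filtering" concrete, here is a minimal sketch in plain Python. It is not LinkedIn's implementation: the function name `full_scan_top_k`, the `allowed` filter set, and the toy item vectors are all assumptions made for illustration. The idea shown is only the general one: score every item against the query, drop items that fail a filter, and keep the k best.

```python
import heapq


def full_scan_top_k(query, items, k, allowed=None):
    """Score every item against the query and return the k best (id, score) pairs.

    `items` maps a hypothetical item id to its embedding vector.
    `allowed` is an optional set of ids that pass an attribute filter.
    """
    scored = []
    for item_id, vector in items.items():
        # Filtering: skip items that fail the (hypothetical) attribute filter.
        if allowed is not None and item_id not in allowed:
            continue
        # Full scan: every remaining item gets an exact dot-product score.
        score = sum(q * v for q, v in zip(query, vector))
        scored.append((item_id, score))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy data, made up for this example.
items = {"a": [1.0, 0.0], "b": [0.5, 0.5], "c": [0.0, 1.0], "d": [0.9, 0.1]}
query = [1.0, 0.2]

top = full_scan_top_k(query, items, k=2)
filtered = full_scan_top_k(query, items, k=2, allowed={"b", "c"})
print([item_id for item_id, _ in top])
print([item_id for item_id, _ in filtered])
```

On a GPU the same pattern would typically be expressed as one matrix multiplication over all item embeddings, a mask for the filter, and a top-k operation, which is what makes an exhaustive scan practical at large index sizes.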

4 Comments

Jiaqi Zhai

Recommendations @ Meta

1mo

Congrats Fedor and glad to see strong performance of learned similarities / MoL across more use cases!

4 Reactions

Hongyi Ma

Software Engineer at LinkedIn

2mo

Haha, congrats Fedor Borisyuk, it seems the coffee/tea you had at Ebar is working perfectly!

1 Reaction

Dhyey Mavani

Prev. AI/ML, DBs @ LinkedIn, Amazon | Computer Science, Math (Honors), Statistics (Honors) @ Amherst College | Ex-CSRMP Fellow @ Google | Ex-Quant Fellow @ Jane Street, D.E. Shaw

2mo

Many congratulations to the team! Thanks for your guidance Fedor Borisyuk! It was my pleasure to be able to work with the team to build out the PyTorch inference engine support at LinkedIn for this OON use case outlined in the paper! Siva P. Pratik Dixit Dhritiman Das Vishal Shah Qingquan Song Syed Ali Naqvi

1 Reaction


More Relevant Posts

Zac Policzer

Distributed systems engineer, Closeted node.js lover, Constantly embarrassing
2mo Edited
Super excited about this. I designed Venice CDC for #venicedb about two years ago and now it's powering AI on GPUs at LinkedIn! The team working on LiNR talks about it in the linked paper, which they are presenting at #CIKM this week. Check it out! #ai #gpu

Venice

843 followers
2mo
In this advanced use case, #VeniceDB is used to merge batch and stream inputs, and to transport the data to serving nodes which load it into GPUs, thus updating their ML model in near real time. Check out the paper for more details!

TuringPost

5,089 followers
6mo
Introducing Mamba, an innovative selective state space model that addresses the limitations of both Transformers and traditional SSMs. The Mamba architecture integrates selective SSMs into a neural network, removing attention and MLP blocks. Let's explore its key features! 👇

▪️ Selective SSMs: Selective SSMs are a special type of building block used in Mamba neural networks. They allow the network to focus on specific parts of the input sequence, improving efficiency and performance in tasks like language modeling and audio analysis.

▪️ Simplified structure: Unlike traditional models that use a mix of different blocks, Mamba has a single, efficient block called the "Mamba block." These Mamba blocks are all the same and stacked together, making the model easier to understand and faster to run.

▪️ Hardware-aware algorithm: Regular SSMs are powerful but can be slow on hardware like GPUs. Mamba addresses this by using a hardware-aware parallel algorithm. It skips convolutions and uses a scan operation to better manage memory, making Mamba faster and more efficient.

For more fascinating information about Mamba, read our article: https://lnkd.in/gKeH5BYB
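
As a rough illustration of the "selective" idea and of the scan operation mentioned above, here is a toy sketch in plain Python. It is a deliberate simplification and not Mamba's actual parameterization or GPU kernel: the state is a single scalar, and the `gates` function (a sigmoid of the input) is a hypothetical choice made only for this example. What it shows is a recurrence whose coefficients depend on the input, and the fact that the same result can be obtained by composing steps with an associative operator, which is the property a parallel scan relies on.

```python
import math
from itertools import accumulate


def gates(x):
    """Hypothetical input-dependent coefficients; this dependence is the 'selective' part."""
    keep = 1.0 / (1.0 + math.exp(-x))  # a_t in (0, 1): how much past state to keep
    return keep, 1.0 - keep            # (a_t, b_t)


def recurrent(xs):
    """Reference loop: h_t = a_t * h_{t-1} + b_t * x_t, starting from h = 0."""
    h, out = 0.0, []
    for x in xs:
        a, b = gates(x)
        h = a * h + b * x
        out.append(h)
    return out


def combine(left, right):
    """Compose two steps of the form h -> a * h + u; this operator is associative."""
    a1, u1 = left
    a2, u2 = right
    return a1 * a2, a2 * u1 + u2


def scan(xs):
    """Same outputs as `recurrent`, written as a prefix scan over (a_t, b_t * x_t) pairs."""
    steps = []
    for x in xs:
        a, b = gates(x)
        steps.append((a, b * x))
    return [u for _, u in accumulate(steps, combine)]


xs = [0.5, -1.0, 2.0, 0.0, 1.5]
loop_out = recurrent(xs)
scan_out = scan(xs)
print(all(math.isclose(p, q) for p, q in zip(loop_out, scan_out)))
```

Here `accumulate` still runs left to right; the point is only that `combine` is associative, so a parallel prefix-scan algorithm could evaluate the same composition in fewer sequential steps.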
Dawood Sarfraz

Machine Learning | Deep Learning | Computer Vision | Natural Language Processing | LLMs
8mo Edited
Why do we need a library like PyTorch at all, given that NumPy already provides data structures and utilities for working with multi-dimensional numeric data? There are two main reasons:

Automatic differentiation: One of the key features of PyTorch is its automatic differentiation capability through the autograd module. This feature allows PyTorch to automatically compute gradients of the loss function with respect to model parameters, facilitating gradient-based optimization methods like gradient descent. While NumPy provides array operations, it doesn't have built-in support for automatic differentiation, which is essential for training neural networks efficiently.

GPU acceleration: PyTorch integrates with CUDA, NVIDIA's parallel computing platform, enabling GPU acceleration for numerical computations. This allows PyTorch to leverage the computational power of GPUs for training deep neural networks, resulting in significant speedups compared to CPU-only implementations. While NumPy-compatible GPU libraries such as CuPy exist, PyTorch provides a more streamlined approach to GPU programming, especially for deep learning tasks.
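
To show what automatic differentiation means in practice, here is a tiny reverse-mode sketch in plain Python. It is not PyTorch's autograd implementation; the `Value` class and its methods are hypothetical names written for this example, and it handles only scalar addition and multiplication. In PyTorch the corresponding workflow would use tensors created with `requires_grad=True` and a call to `loss.backward()`.

```python
class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=(), grad_fns=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents    # Values this one was computed from
        self._grad_fns = grad_fns  # maps upstream gradient to each parent's share

    def __add__(self, other):
        return Value(self.data + other.data, (self, other),
                     (lambda g: g, lambda g: g))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    def backward(self):
        # Build a topological order so each node is processed after all its users.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, grad_fn in zip(node._parents, node._grad_fns):
                parent.grad += grad_fn(node.grad)  # chain rule, accumulated


# loss = (w * x + b) ** 2, written with the two supported operations.
w, x, b = Value(3.0), Value(2.0), Value(1.0)
y = w * x + b
loss = y * y
loss.backward()
print(loss.data, w.grad, x.grad, b.grad)
```

With plain NumPy arrays, the derivatives `2 * y * x`, `2 * y * w`, and `2 * y` would have to be derived and coded by hand for every model; recording the computation graph is what removes that step.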
David Ruau

Business strategic alliances, Drug Discovery AI, EMEA at NVIDIA
2w
We recently announced a new math library called cuEquivariance. I am particularly happy to see it coming out, as this is an important set of operations used in equivariant neural networks (ENNs) such as DiffDock, MACE and NequIP (just the tip of the iceberg; search perplexity.ai to see where they are used). Simply put, this technique allows ENNs to recognise objects such as molecules in different orientations. The NVIDIA team of Mario Geiger, Emine Kucukbenli and Becca Z. released cuEquivariance to facilitate the construction of high-performance ENNs. Blogpost: https://lnkd.in/eYUqXuQq Github: https://lnkd.in/e696TwUJ #nvidia #chemistry #neuralnetwork #materialscience #molecularscience #protein #RNA #graphanalysis

Accelerate Drug and Material Discovery with New Math Library NVIDIA cuEquivariance | NVIDIA Technical Blog

developer.nvidia.com

Mudith Nahata

ML Engineer @ Transpoze.AI | EdTech | Deep Learning | NLP | Computer Vision | Solved 350+ Problems
8mo Edited
* If you are an ML engineer and you are not using Keras, here are some incredible benchmarks! ✨
Description:
* In the linked benchmarks, Keras is faster than PyTorch and produces output more quickly. ⚡
* Keras is a Python-based deep learning API that runs on top of the TensorFlow machine learning platform and fully supports GPUs. ⚡
* All the benchmarks were run on GPU; I would expect similar results on TPU (Tensor Processing Unit). ⚡
* To use a TPU, select a TPU runtime (for example, in Colab). Once you've connected to the runtime, you need to use a TPUClusterResolver to automatically detect the TPU on any supported platform. ⚡
* Once you've set it up, the TPU workflow will be similar to implementing multi-GPU training on a single machine. The main difference is that the distribution strategy used is TPUStrategy. ⚡
* Performance benchmarks: Benchmarks comparing TPUs and GPUs on similar tasks often show TPUs excelling in tasks optimized for their architecture, offering faster training times and more efficient processing. ⚡
🔗 https://lnkd.in/gcqhGKmF #AI #ML #DeepLearning #DataScience
Binghao Ng, Ph.D.

Quantitative Problem Solver
1mo
Fascinated by the expressiveness of Scala and inspired by Fabio Labella's talk on using fs2 for control flow, I decided to see if I could apply fs2's functional streams to a simple machine learning workflow. I also used Storch, a Scala library implementing the ever-familiar PyTorch API, to enable the training of neural networks with GPUs. Blog: https://lnkd.in/gxpV_UDn Repo: https://lnkd.in/gJRDyqPw #scala #functionalprogramming #storch #fs2
Haris Ahmad

Senior Machine Learning Engineer at Bosch Global Software Technologies | Data Science | Generative AI | LLMs | 2x GCP Certified
8mo Edited
💡 Learnings 💡 I have recently been working with LLMs and generative AI and ran into some observations and challenges. 📚
🔸 While working with LLMs on GPUs in Jupyter notebooks, I had a situation where a model was loaded on the GPU and I needed to clear the GPU memory without stopping the kernel. Unfortunately, after multiple attempts and some research, I found no way to release all the memory occupied by the model. Strange!
🔸 After digging deeper, and out of curiosity, I tried the same thing with just a simple PyTorch tensor: loading it onto the GPU and then releasing it. Surprisingly, that also failed to release all memory from the GPU.
➡️ I noticed this behaviour in Jupyter notebooks, where the memory is occupied once and can't be released until the kernel is restarted! Pretty weird. PyTorch experts out there, feel free to correct or enlighten me if there's a way to solve this!
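
One possible contributor, offered as an assumption rather than a diagnosis of the case above, is that an object cannot be freed while anything still refers to it, and notebooks can hold extra references (for example, IPython keeps cell outputs in `Out` and `_`). The plain-Python sketch below uses a hypothetical `FakeModel` class and a `history` dict as stand-ins for a GPU model and a notebook output cache. In PyTorch, once no references remain, `gc.collect()` followed by `torch.cuda.empty_cache()` returns cached blocks to the driver, but the CUDA context itself stays allocated until the process exits, so reported GPU usage does not drop to zero without a kernel restart.

```python
import gc
import weakref


class FakeModel:
    """Stand-in for a large object such as a model whose tensors live on the GPU."""


model = FakeModel()
tracker = weakref.ref(model)      # observes the object without keeping it alive

history = {"Out[1]": model}       # hypothetical stand-in for a notebook output cache

del model
alive_after_del = tracker() is not None    # the cache still references the object

history.clear()
gc.collect()
alive_after_clear = tracker() is not None  # no references remain, so it can be freed

print(alive_after_del, alive_after_clear)
```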
Sanjai Arvinth

Data Scientist | Student @KCE, 3rd year | Machine Learning | Artificial Intelligence |
1mo
Evaluating the notable variations in deep learning task performance between CPUs and GPUs. This review shows how GPUs transform training times and model efficiency by offering remarkable speedups. Examine the differences between popular models such as LSTM, MobileNetV2, ResNet50, and others to see why GPUs are crucial to current artificial intelligence processes. #DeepLearning #AI #MachineLearning #GPUs #CPUs #TechInnovation #PerformanceAnalysis #NeuralNetworks #DataScience #ComputingPower #TechTrends #MLModels #ArtificialIntelligence #Efficiency #TechInsights

CPU vs GPU for Deep Learning

link.medium.com

Muluken Zemed

Front End Developer
6mo
My latest research on "A hybrid convolutional neural network and support vector machine classifier for Amharic character recognition" has just been published with @SpringerNature in Neural Computing & Applications Journal. Read here: https://rdcu.be/dJGl0