Combining fuzzy search and prefix search in the same query while keeping latency down is harder than you might think! It will be exciting to see applications with high query rates using this. The latest advances in embedding at multiple resolutions and multi-phase ranking will enable new use cases in many organizations:
The latest Vespa newsletter is out - highlights:

👉 RAG
Vespa now provides LLM inference support, so you can implement a retrieval-augmented generation (RAG) application entirely as a Vespa application. We have added a new sample application demonstrating RAG end-to-end on Vespa:
- Generation using an external LLM like OpenAI
- Running an LLM locally inside the Vespa application on CPU
- Running an LLM inside the Vespa application on Vespa Cloud on GPU

👉 Fuzzy Search with Prefix Match
A prefix search matches “Edvard Grieg” with the query “Edvard Gr”. A fuzzy search matches “Edvard Grieg” with “Edward Grieg”. From Vespa 8.337 you can combine the two, so “Edward Gr” also matches “Edvard Grieg”. Very powerful for query completion! (A query sketch follows below.)

👉 Pyvespa
Lots of new features, including a notebook that demonstrates how the mixedbread.ai rerank models (cross-encoders) can be used for global-phase reranking in Vespa.

👉 Vector search performance
Up to 9x faster distance calculations! Improvements for euclidean, angular, hamming, and dotproduct distance metrics, as well as for HNSW indexing.

👉 Embeddings
Since Vespa 8.329, you can embed the data _once_ at multiple resolutions. Store low-res vectors in memory and hi-res vectors on disk to optimize for cost, then use two-phase ranking for low-latency search with high precision. (See the schema sketch at the end.)

Get started using Vespa with LlamaIndex! Check out the new Vespa Vector Store demo notebook.

Deep dive into this and more features, like 10x faster data migration, in the latest Vespa Newsletter:
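To make the fuzzy-plus-prefix idea concrete, here is a minimal sketch of issuing such a query through pyvespa. It assumes a running Vespa application with a string attribute field named "artist"; the field name, endpoint, and edit-distance setting are illustrative assumptions, not part of the newsletter.

```python
# Sketch: combine fuzzy matching and prefix matching in one YQL query.
# Assumes a local Vespa app with a string attribute field "artist".
from vespa.application import Vespa

app = Vespa(url="http://localhost", port=8080)

# {maxEditDistance: 1} tolerates one edit ("Edward" -> "Edvard");
# {prefix: true} lets the query match as a prefix ("Gr" -> "Grieg").
yql = (
    'select * from sources * where artist contains '
    '({maxEditDistance: 1, prefix: true}fuzzy("Edward Gr"))'
)

response = app.query(yql=yql, hits=10)
for hit in response.hits:
    print(hit["fields"].get("artist"))
```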
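And here is a hedged pyvespa sketch of the multi-resolution embedding pattern: a compact int8 vector kept in memory with an HNSW index for the cheap first phase, and a full-resolution vector stored as a paged (on-disk) attribute that is only read for the hits that reach the second phase. Field names, tensor dimensions, and ranking expressions are assumptions for illustration, not the newsletter's exact setup.

```python
# Sketch: low-res vector in memory for phase one, hi-res vector on disk
# (paged attribute) for second-phase reranking. Names are hypothetical.
from vespa.package import (
    ApplicationPackage, Field, HNSW, RankProfile, SecondPhaseRanking,
)

package = ApplicationPackage(name="multires")
package.schema.add_fields(
    # Low-res vector: in-memory attribute with an HNSW index for fast ANN.
    Field(
        name="lowres_embedding",
        type="tensor<int8>(x[64])",
        indexing=["attribute", "index"],
        ann=HNSW(distance_metric="hamming"),
    ),
    # Full-res vector: paged, so it lives on disk and is only fetched
    # for the few candidates that make it to the second phase.
    Field(
        name="fullres_embedding",
        type="tensor<float>(x[512])",
        indexing=["attribute"],
        attribute=["paged"],
    ),
)
package.schema.add_rank_profile(
    RankProfile(
        name="two_phase",
        inputs=[
            ("query(q_lowres)", "tensor<int8>(x[64])"),
            ("query(q_full)", "tensor<float>(x[512])"),
        ],
        # Cheap first phase over the in-memory low-res vectors ...
        first_phase="closeness(field, lowres_embedding)",
        # ... then an exact dot product on the full-res vectors,
        # but only for the top 100 hits per node.
        second_phase=SecondPhaseRanking(
            expression="sum(query(q_full) * attribute(fullres_embedding))",
            rerank_count=100,
        ),
    )
)
```

The design point is the cost split: the memory footprint is dominated by the small low-res vectors, while the expensive high-precision vectors stay on disk and are touched only for the reranked candidates.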