Combining fuzzy search and prefix search in the same query while keeping latency down is harder than you might think! It will be exciting to see applications with high query rates using this. The latest advances in embedding at multiple resolutions and multi-phase ranking will enable new use cases in many organizations:
The latest Vespa newsletter is out - highlights:

👉 RAG
Vespa now provides LLM inference support, so you can implement a retrieval-augmented generation (RAG) application entirely as a Vespa application. We have added a new sample application demonstrating RAG end-to-end on Vespa:
- Generation using an external LLM like OpenAI
- Running an LLM locally inside the Vespa application on CPU
- Running an LLM inside the Vespa application on Vespa Cloud on GPU

👉 Fuzzy Search with Prefix Match
A prefix search matches “Edvard Grieg” with the query “Edvard Gr”. A fuzzy search matches “Edvard Grieg” with “Edward Grieg”. From Vespa 8.337 you can combine the two, so “Edward Gr” also matches “Edvard Grieg”. Very powerful for query completion! (A query sketch follows below.)

👉 Pyvespa
Lots of new features, including a notebook that demonstrates how the mixedbread.ai rerank models (cross-encoders) can be used for global-phase reranking in Vespa.

👉 Vector search performance
Up to 9x faster distance calculations! Improvements for euclidean, angular, hamming, and dotproduct distance metrics, as well as for HNSW indexing.

👉 Embeddings
Since Vespa 8.329, you can embed the data _once_ at multiple resolutions. Store low-res vectors in memory and hi-res vectors on disk to optimize for cost, then use two-phase ranking for low-latency search with high precision. (See the schema sketch at the end.)

Get started using Vespa with LlamaIndex! Check out the new Vespa Vector Store demo notebook.

Deep dive into this and more features, like 10x faster data migration, in the latest Vespa Newsletter:
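To make the fuzzy-plus-prefix idea concrete, here is a minimal sketch of issuing such a query through pyvespa. It assumes a running Vespa application with a string attribute field named "artist"; the field name, endpoint, and edit-distance setting are illustrative assumptions, not part of the newsletter.

```python
# Sketch: combine fuzzy matching and prefix matching in one YQL query.
# Assumes a local Vespa app with a string attribute field "artist".
from vespa.application import Vespa

app = Vespa(url="http://localhost", port=8080)

# {maxEditDistance: 1} tolerates one edit ("Edward" -> "Edvard");
# {prefix: true} lets the query match as a prefix ("Gr" -> "Grieg").
yql = (
    'select * from sources * where artist contains '
    '({maxEditDistance: 1, prefix: true}fuzzy("Edward Gr"))'
)

response = app.query(yql=yql, hits=10)
for hit in response.hits:
    print(hit["fields"].get("artist"))
```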
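And here is a hedged pyvespa sketch of the multi-resolution embedding pattern: a compact int8 vector kept in memory with an HNSW index for the cheap first phase, and a full-resolution vector stored as a paged (on-disk) attribute that is only read for the hits that reach the second phase. Field names, tensor dimensions, and ranking expressions are assumptions for illustration, not the newsletter's exact setup.

```python
# Sketch: low-res vector in memory for phase one, hi-res vector on disk
# (paged attribute) for second-phase reranking. Names are hypothetical.
from vespa.package import (
    ApplicationPackage, Field, HNSW, RankProfile, SecondPhaseRanking,
)

package = ApplicationPackage(name="multires")
package.schema.add_fields(
    # Low-res vector: in-memory attribute with an HNSW index for fast ANN.
    Field(
        name="lowres_embedding",
        type="tensor<int8>(x[64])",
        indexing=["attribute", "index"],
        ann=HNSW(distance_metric="hamming"),
    ),
    # Full-res vector: paged, so it lives on disk and is only fetched
    # for the few candidates that make it to the second phase.
    Field(
        name="fullres_embedding",
        type="tensor<float>(x[512])",
        indexing=["attribute"],
        attribute=["paged"],
    ),
)
package.schema.add_rank_profile(
    RankProfile(
        name="two_phase",
        inputs=[
            ("query(q_lowres)", "tensor<int8>(x[64])"),
            ("query(q_full)", "tensor<float>(x[512])"),
        ],
        # Cheap first phase over the in-memory low-res vectors ...
        first_phase="closeness(field, lowres_embedding)",
        # ... then an exact dot product on the full-res vectors,
        # but only for the top 100 hits per node.
        second_phase=SecondPhaseRanking(
            expression="sum(query(q_full) * attribute(fullres_embedding))",
            rerank_count=100,
        ),
    )
)
```

The design point is the cost split: the memory footprint is dominated by the small low-res vectors, while the expensive high-precision vectors stay on disk and are touched only for the reranked candidates.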