Snowflake has released the #SnowflakeArctic embed family of models and open-sourced them under the Apache 2.0 license. On the Massive Text Embedding Benchmark (MTEB) Retrieval Leaderboard, the 334-million-parameter Arctic embed model is the only one to surpass an average retrieval performance of 55.9. The five models, ranging from x-small (xs) to large (l), are available for immediate use on Hugging Face. Companies pair private datasets with #LLMs as part of a Retrieval Augmented Generation (#RAG) or #semanticsearch service, and these impressive embedding models put into practice the technical know-how, search capabilities, and research and development that Snowflake acquired with Neeva in May. Snowflake Copilot is integrated into Snowflake Cortex, Snowflake's #AI service, and can be used for a variety of tasks such as question answering and summarization. With these announcements, Snowflake is breaking ground on open LLMs and showing that it continues to innovate for its customers on the AI journey.
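For anyone who wants to try the family right away, a minimal retrieval sketch is below. It assumes the sentence-transformers package and the Snowflake/snowflake-arctic-embed-m checkpoint on Hugging Face; the query prefix shown follows the common convention for retrieval-tuned encoders, so verify the exact prompt text on the model card.

```python
# Minimal sketch: embed a query and a few passages with an Arctic embed model,
# then rank passages by cosine similarity. Assumes `pip install sentence-transformers`.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m")

# Retrieval-tuned encoders typically expect a prefix on the query side only;
# check the model card for the exact wording.
query_prefix = "Represent this sentence for searching relevant passages: "
query = "How do I share data securely between accounts?"
passages = [
    "Secure data sharing lets you share selected objects with other accounts.",
    "A virtual warehouse is a cluster of compute resources.",
    "Time Travel lets you query data as it existed at a point in the past.",
]

q_emb = model.encode(query_prefix + query, normalize_embeddings=True)
p_emb = model.encode(passages, normalize_embeddings=True)

# With normalized vectors, the dot product equals cosine similarity.
scores = p_emb @ q_emb
for score, passage in sorted(zip(scores, passages), reverse=True):
    print(f"{score:.3f}  {passage}")
```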
William McKnight’s Post
More Relevant Posts
Wanted to share the article in Towards AI about Volga, the project I've been working on: https://lnkd.in/eZdNSWUX. Volga (https://lnkd.in/efFNx38g) is an open-source feature engine built for real-time AI/ML. It aims to remove the dependency on complex, multi-part compute layers in real-time ML systems (e.g., Spark + Flink) and on managed feature platforms like Tecton, Fennel, and Chalk. Please share your feedback, especially if you have experience setting up feature infrastructure/pipelines for real-time ML or working with managed feature platforms. #ai #ml #mlops #spark #flink #fennel #tecton #chalk
❄ Snowflake 𝐋𝐚𝐮𝐧𝐜𝐡𝐞𝐬 𝐀𝐫𝐜𝐭𝐢𝐜 𝐄𝐦𝐛𝐞𝐝 – 𝐓𝐡𝐞 𝐖𝐨𝐫𝐥𝐝'𝐬 𝐇𝐢𝐠𝐡𝐞𝐬𝐭 𝐑𝐚𝐧𝐤𝐞𝐝 𝐏𝐫𝐚𝐜𝐭𝐢𝐜𝐚𝐥 𝐓𝐞𝐱𝐭 𝐄𝐦𝐛𝐞𝐝𝐝𝐢𝐧𝐠 𝐌𝐨𝐝𝐞𝐥 𝐟𝐨𝐫 𝐑𝐞𝐭𝐫𝐢𝐞𝐯𝐚𝐥 𝐔𝐬𝐞 𝐂𝐚𝐬𝐞𝐬!
🚀 We're thrilled to introduce the 𝐒𝐧𝐨𝐰𝐟𝐥𝐚𝐤𝐞 𝐀𝐫𝐜𝐭𝐢𝐜 𝐞𝐦𝐛𝐞𝐝 - our latest family of state-of-the-art text embedding models, now available! These models are designed to enhance retrieval performance efficiently and are open-sourced under the Apache 2.0 license.
🔍 𝐊𝐞𝐲 𝐇𝐢𝐠𝐡𝐥𝐢𝐠𝐡𝐭𝐬:
• 𝐏𝐞𝐫𝐟𝐨𝐫𝐦𝐚𝐧𝐜𝐞: Our largest model outperforms others with a smaller size, offering a significant advantage in retrieval tasks.
• 𝐀𝐜𝐜𝐞𝐬𝐬𝐢𝐛𝐢𝐥𝐢𝐭𝐲: Models range from x-small to large, catering to diverse enterprise needs.
• 𝐄𝐟𝐟𝐢𝐜𝐢𝐞𝐧𝐜𝐲: Optimized for lower latency and cost, making them ideal for enterprise search applications.
💡 These models integrate seamlessly into your systems, providing top-notch retrieval capabilities with minimal setup. Available now on Hugging Face and soon in Snowflake Cortex.
📈 𝐒𝐭𝐚𝐲 𝐀𝐡𝐞𝐚𝐝: Leverage these models to power up your applications with advanced AI capabilities.
🔗 Read more about this release: https://lnkd.in/dnv25qwp
#Snowflake #AI #MachineLearning #DataScience #TechNews
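Whichever size you pick, the serving side looks the same once embeddings are computed: first-stage retrieval is a nearest-neighbour search over normalized vectors. Below is a toy NumPy sketch of that step; the corpus, query vectors, and dimensions are made up purely for illustration.

```python
import numpy as np

def top_k(query_embs: np.ndarray, doc_embs: np.ndarray, k: int = 5) -> np.ndarray:
    """Return the indices of the k best documents for each query.
    Both inputs are assumed L2-normalized, so dot product == cosine similarity.
    query_embs: (num_queries, dim), doc_embs: (num_docs, dim)."""
    scores = query_embs @ doc_embs.T                      # (num_queries, num_docs)
    top = np.argpartition(-scores, kth=k - 1, axis=1)[:, :k]
    # argpartition does not order within the top-k block, so sort it by score.
    row = np.arange(scores.shape[0])[:, None]
    order = np.argsort(-scores[row, top], axis=1)
    return top[row, order]

# Tiny fake corpus: 4 documents and 2 queries in a 3-dimensional space.
docs = np.array([[1.0, 0, 0], [0, 1.0, 0], [0, 0, 1.0], [0.6, 0.8, 0]])
queries = np.array([[1.0, 0, 0], [0, 0.7071, 0.7071]])
print(top_k(queries, docs, k=2))
```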
👋 Hello, Generative AI Enthusiasts! Ready to level up? It's time to Snowflake! ❄️✨
Why choose Generative AI with Snowflake? 🤔
1️⃣ 𝐅𝐮𝐥𝐥𝐲 𝐇𝐨𝐬𝐭𝐞𝐝 𝐋𝐋𝐌𝐬: Snowflake handles everything! No infrastructure worries—just pure AI power. 💪
2️⃣ 𝐙𝐞𝐫𝐨 𝐒𝐞𝐭𝐮𝐩: Jump right into AI without the hassle. Get started in seconds! ⏱️
3️⃣ 𝐃𝐚𝐭𝐚 𝐒𝐭𝐚𝐲𝐬 𝐒𝐚𝐟𝐞: Keep your data within Snowflake—no need to move it around. Secure and simple! 🔒
4️⃣ 𝐍𝐨 𝐄𝐱𝐭𝐫𝐚 𝐏𝐲𝐭𝐡𝐨𝐧 𝐌𝐨𝐝𝐮𝐥𝐞𝐬: Forget about dependency management and security concerns like Snyk scores. 🚫🐍
5️⃣ 𝐒𝐐𝐋-𝐥𝐢𝐤𝐞 𝐒𝐢𝐦𝐩𝐥𝐢𝐜𝐢𝐭𝐲: Leverage AI as easily as using 𝐂𝐎𝐔𝐍𝐓(1) or 𝐒𝐔𝐌() — just another SQL function! 📊
6️⃣ 𝐄𝐟𝐟𝐢𝐜𝐢𝐞𝐧𝐭 𝐕𝐞𝐜𝐭𝐨𝐫 𝐒𝐭𝐨𝐫𝐚𝐠𝐞: Store your vector embeddings directly in Snowflake! No need for additional vector databases like OpenSearch, Pinecone, or FAISS. Everything you need is right here. 📦
7️⃣ 𝐀𝐥𝐥-𝐢𝐧-𝐎𝐧𝐞 𝐋𝐋𝐌 𝐀𝐜𝐜𝐞𝐬𝐬: Get access to LLMs trained by top researchers at 𝐌𝐢𝐬𝐭𝐫𝐚𝐥, Reka, 𝐌𝐞𝐭𝐚, 𝐆𝐨𝐨𝐠𝐥𝐞, and more, including 𝐒𝐧𝐨𝐰𝐟𝐥𝐚𝐤𝐞 𝐀𝐫𝐜𝐭𝐢𝐜 — all in one place. 🌍
Ready to kickstart your Snowflake AI journey? 🚀 Check out my recent video to dive in! 🎥 https://lnkd.in/dRMRv75m
#snowflake #genai #LetItSnow #CortexAI
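To make the "just another SQL function" point concrete, here is a rough sketch that drives Cortex from the Python connector. The function and model names (SNOWFLAKE.CORTEX.COMPLETE, EMBED_TEXT_768, the snowflake-arctic and snowflake-arctic-embed-m models) reflect the Cortex documentation at the time of writing, and the documents table below is a hypothetical example, so check your account's docs and region availability before relying on any of them.

```python
# Rough sketch, not production code: call Cortex LLM functions from plain SQL.
# Assumes `pip install snowflake-connector-python` and a role with Cortex access.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="<warehouse>", database="<db>", schema="<schema>",
)
cur = conn.cursor()

# 1) LLM completion as an ordinary SQL expression.
cur.execute(
    "SELECT SNOWFLAKE.CORTEX.COMPLETE("
    "'snowflake-arctic', 'Summarize why in-database vector search matters, in one sentence.')"
)
print(cur.fetchone()[0])

# 2) Embed a text column and keep the vectors next to the source rows
#    (the `documents` table and its columns are hypothetical).
cur.execute("""
    CREATE OR REPLACE TABLE doc_embeddings AS
    SELECT id,
           body,
           SNOWFLAKE.CORTEX.EMBED_TEXT_768('snowflake-arctic-embed-m', body) AS embedding
    FROM documents
""")
cur.close()
conn.close()
```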
Databricks’ Retrieval Augmented Generation (RAG) tooling has been a massive success since its launch in December last year, with bespoke RAG applications proliferating on the Databricks Data Intelligence Platform. So major updates are always big news, especially when they're aimed directly at the enterprise AI market. I'm pleased to see this focus. General-purpose #LLMs have a number of weaknesses in the #EnterpriseDataManagement context, and those weaknesses demand specific remedies. Has anyone tested out the new RAG features yet? #AI #GenAI
Production-Quality RAG Applications with Databricks
databricks.com
One of the primary factors that drew me to Databricks was its foundation on open-source principles. The founders were pioneers in developing some of the world's most popular open-source data technologies, including Apache Spark, Delta Lake, and MLflow. I firmly believe in the transformative potential of open technology to drive innovation at an accelerated pace. That's why I am particularly thrilled about our investment in Mistral AI, an open and portable generative AI solution tailored for developers and businesses alike. Our customers can leverage Mistral AI's open models on the Databricks Data Intelligence Platform, empowering them to build and deploy generative AI applications. This strategic integration of open technology exemplifies our commitment to fostering innovation and enabling our customers to unlock new possibilities in the realm of AI-driven solutions. https://lnkd.in/gpZ6nQFW
Databricks invests in Mistral and brings its AI models to data intelligence platform
https://venturebeat.com
BigQuery & Vertex AI Conquer IMDB Reviews (Project Included!) Big thanks to Professor Wael Damra for giving me such an assignment! I'm thrilled to share that I successfully analyzed movie reviews from the IMDB dataset using BigQuery and Vertex AI. I've even included a document outlining the complete work process! Here's a breakdown of what I achieved: Setting Up: Established a Vertex AI connection within BigQuery, granted access, and created a remote model linked to the text-bison LLM. (Details in the document) Sentiment Analysis: Used ml.generate_text to analyze sentiment (positive/negative) for specific movies. I documented time, settings, and results for each analysis, identified sentiment label match rates, and reviewed mismatched labels to provide my own assessment. (Full details in the document.) Keyword Extraction: Leveraging ml.generate_text, I extracted the top 3 keywords for each review (excluding specific words and converting plurals to singular). I then used SPLIT/UNNEST and OVER functions to identify the top 5 most frequent keywords per movie. (See the document for the complete process) #BigQuery #VertexAI #TextAnalysis #MachineLearning #IMDB Let's connect and explore the exciting world of data warehousing and AI!
Every database today is a vector store/database, but business requirements for search are far more complex than what ANN retrieval alone can solve! This is the future of semantic search: a multi-stage retrieval and ranking mechanism involving
- First stage: BM25, dense, or sparse vector retrieval
- Mid stage: a learning-to-rank mechanism or rescoring
- Final ranking: AI-based models like Cohere's Rerank 3
If your search requirements are complex and domain specific, there is no simple solution; a multi-stage system like this is the only true answer, with Elasticsearch the leader by far, offering all the relevant capabilities. Read more about Elasticsearch adding support for Cohere’s Rerank 3 model at https://lnkd.in/gNntYH7p.
Also, don't forget: Retrieval Augmented Generation (RAG) works best when the relevance of the search results is on point, something only a multi-stage retrieval platform like this can achieve across disparate, diverse datasets (which is generally the business requirement for RAG use cases).
#vectorsearch #vectordb #elasticsearch #rag
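A rough sketch of the first and final stages is below (the mid-stage learning-to-rank step is omitted). It assumes the elasticsearch and cohere Python packages, a local cluster, and a hypothetical kb-articles index; the rerank model name should be verified against Cohere's current documentation.

```python
# Sketch of a two-stage pipeline: BM25 candidates from Elasticsearch,
# then semantic reranking with Cohere Rerank. Not production code.
import cohere
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")   # placeholder endpoint
co = cohere.Client("<COHERE_API_KEY>")        # placeholder key

query = "how do I rotate service credentials?"

# Stage 1: lexical (BM25) retrieval of a generous candidate set.
# Index and field names are hypothetical.
hits = es.search(index="kb-articles", query={"match": {"body": query}}, size=50)
candidates = [h["_source"]["body"] for h in hits["hits"]["hits"]]

# Final stage: rerank the candidates with a purpose-built relevance model.
response = co.rerank(
    model="rerank-english-v3.0",
    query=query,
    documents=candidates,
    top_n=5,
)
for r in response.results:
    print(round(r.relevance_score, 3), candidates[r.index][:80])
```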
Successfully harnessing the potential of #AI is critical to successful #EnterpriseDataManagement today - and it will be table stakes in the future. So it’s exciting to hear that our partners Snowflake have released their own #LLM, an enterprise-grade large language model. The claims being made about the speed and cost of training Arctic LLM should have anyone who uses data at scale paying close attention. TechCrunch has some questions here, but I think that the stated goal of having “an API that our customers can use so that business users can directly talk to data” - in the words of CEO Sridhar Ramaswamy - is a prospect nobody should be ignoring. #MasterDataManagement #GenAI #DataStrategy
Snowflake releases a flagship generative AI model of its own | TechCrunch
https://techcrunch.com
🌟 Introduction to Snowflake Arctic
Snowflake has released a large language model (LLM), "Arctic," tailored for enterprise AI. It's open-source and focuses on cost-effective training. Unlike consumer-focused LLMs, this one targets businesses that need LLMs for internal tasks or clients.
🏢 Enterprise-Centric Design
Arctic is designed to solve the problem of high costs and resource demands for building custom LLMs. It excels at enterprise-related tasks like SQL generation, coding, and instruction-following. Its open-source nature, under the Apache 2.0 license, allows access to its weights and code.
💻 Open-Source Data and Recipes
Snowflake is sharing data recipes and research insights, contributing to the open-source AI community. Arctic is available on Hugging Face, a key platform for open-source AI models. The open-source nature allows users to train custom models affordably.
📊 Cost-Effective Training
Arctic is a cost-effective model with a training budget under $2 million. This is significantly lower than other models, like GPT-4, which reportedly cost around $60 million to train. This reduction in cost encourages innovation and access to AI tools for smaller entities.
🧠 Unique Dense-Hybrid Transformer Architecture
Arctic uses a unique architecture with 480 billion parameters across 128 fine-grained experts. This approach aims to increase training efficiency and improve model quality without raising compute costs. It merges dense and mixture-of-experts models, showcasing a new approach to AI.
🌐 Enterprise Intelligence and Open Research
Snowflake introduces the term "Enterprise Intelligence" to capture the abilities required by enterprise clients, such as SQL, coding, and instruction-following. The focus on open research insights aims to give back to the community, encouraging innovation and collective knowledge growth.
#OpenSourceAI #SnowflakeArctic #EnterpriseAI #LLM #AIInnovation #CostEffectiveAI #ApacheLicense #MixtureOfExperts #AICommunity #EnterpriseIntelligence
BIG win for Open Source AI | Snowflake Arctic 128 Experts MoE, "Cookbook" create world-class models
https://www.youtube.com/
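To make the mixture-of-experts idea in the post above concrete, here is a deliberately tiny routing sketch. It is a toy illustration of top-k expert routing in general, not Arctic's dense-hybrid architecture, its parameter counts, or its actual code.

```python
# Toy illustration of mixture-of-experts routing (NOT Arctic's implementation).
# Each token is sent to its top-k experts; only those experts' weights are used,
# which is why an MoE can hold many parameters while keeping per-token compute low.
import numpy as np

rng = np.random.default_rng(0)
d_model, num_experts, top_k = 16, 8, 2   # Arctic reportedly uses 128 experts; 8 keeps the toy small

experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(num_experts)]
router = rng.standard_normal((d_model, num_experts)) * 0.1

def moe_layer(x: np.ndarray) -> np.ndarray:
    """x: (tokens, d_model) -> (tokens, d_model), mixing each token's top-k experts."""
    out = np.zeros_like(x)
    for t, token in enumerate(x):
        logits = token @ router                  # router score per expert
        chosen = np.argsort(-logits)[:top_k]     # indices of the top-k experts
        gate = np.exp(logits[chosen])
        gate /= gate.sum()                       # softmax over the chosen experts only
        for g, e in zip(gate, chosen):
            out[t] += g * (token @ experts[e])
    return out

tokens = rng.standard_normal((4, d_model))
print(moe_layer(tokens).shape)   # (4, 16)
```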
Introduction: Text embedding plays a crucial role in modern AI workloads, particularly in the context of enterprise search and retrieval systems... #AI #AImodels #AITools #ArtificialIntelligence #changing #Embedding #flakesnow #game #guidance #impact #Integration #models #RAG #Snowflake #Text #textembedding #Usecases
How Snowflake Text Embedding Models Are Changing the Game
https://technicalterrence.com