Was a bit surprised to see Wired touting RAG as "taking hold in Silicon Valley" like it's the hot new thing - RAG has been around for ages (at least in LLM terms), and I actually think its application space is currently shrinking as context windows grow. Their "reduce AI hallucinations" framing is not the whole story. Yes, RAG can definitely help, in the sense that asking a broad question of the base model (with no input other than the prompt) is likely to result in hallucinations, whereas feeding in concrete data (with RAG or otherwise) will ground the response in that data and therefore produce fewer of them. But RAG is really just one tool for feeding in that data - and, yes, for massive search tasks (like broad legal research requests across entire databases), RAG may be the best (if not the only) way to do it. For smaller tasks (like analyzing a single document), though, it's often better to feed the input in directly, which the larger context windows of the newest models increasingly allow. A few months ago (around the time Opus was released) people were asking whether RAG would be obsolete - that was an overreaction, but it still doesn't fit this narrative of RAG as some hot new thing in the LLM world. Maybe I missed the pendulum swinging back the other way? Not sure. https://lnkd.in/eUEwBmFf
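The RAG-vs-long-context tradeoff above can be sketched in a few lines. This is a toy illustration only: the keyword-overlap scorer, the sample passages, and the character-count stand-in for a "context window" are all assumptions, not any real retrieval stack.

```python
def score(query: str, passage: str) -> int:
    """Toy relevance score: count query words that appear in the passage."""
    return len(set(query.lower().split()) & set(passage.lower().split()))

def retrieve_top_k(query: str, passages: list[str], k: int = 1) -> list[str]:
    """RAG path: select only the k most relevant passages for the prompt."""
    return sorted(passages, key=lambda p: score(query, p), reverse=True)[:k]

def build_prompt(query: str, passages: list[str], context_limit: int) -> str:
    """If everything fits in the context window, skip retrieval entirely."""
    full_context = "\n\n".join(passages)
    if len(full_context) <= context_limit:
        context = full_context  # long-context path: just feed it all in
    else:
        context = "\n\n".join(retrieve_top_k(query, passages, k=1))  # RAG path
    return f"Context:\n{context}\n\nQuestion: {query}"

passages = [
    "The lease terminates on 31 December 2025.",
    "Rent is payable monthly in advance.",
]
print(build_prompt("When does the lease terminate?", passages, context_limit=40))
```

With a small limit only the best-scoring passage is included; raise `context_limit` past the total size and the retrieval step disappears, which is exactly the shrinking-application-space point.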
J. Michael Dockery’s Post
-
We've recently been exploring the power of structured outputs in AI, and the results have been great! Though OpenAI is struggling to deliver on par with the expectations set by their announcements, their latest release promises 100% compliance with user-provided JSON schemas, which I believe is valuable. Why does this matter? JSON has become extremely important for ensuring reliability with LLMs, and we've been leveraging this to build a suite of internal AI Agents. These agents aren't just tools; they're our testing ground for cutting-edge AI techniques and new research, helping us automate, manage, and streamline our daily tasks. Until now, we were working with a mix of Claude 3.5 Sonnet, GPT-4o-mini, and Llama 3.1-70b. Each has its strengths, but we were stuck on a crucial problem: reliably calling multiple functions and generating multiple messages without pinging the model repeatedly. Claude 3.5 Sonnet still works great but is pretty expensive for now. Enter structured outputs. With this new capability, we've created a custom type that not only lets us handle multiple tasks and messages in one go but does so with a level of reliability and efficiency we've never seen before. Below you will see a screenshot of Kaira, a personal assistant I coded: she helps me battle procrastination, knows about me, manages tasks, schedules meetings, and keeps me on track. _ PS - I'm also building VoxEstate, a platform dedicated to equipping the real estate industry with state-of-the-art technology.
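A "custom type" like the one described might look like the following sketch: one structured reply carrying several messages and several function calls, so a single model round-trip covers multiple tasks. The field names and schema here are my assumptions, not OpenAI's API; with the real API you would express an equivalent shape as the JSON schema passed via the structured-outputs option.

```python
import json
from dataclasses import dataclass

@dataclass
class FunctionCall:
    name: str        # which tool/function the agent wants to invoke
    arguments: dict  # its arguments, as parsed JSON

@dataclass
class AgentTurn:
    messages: list[str]        # everything to say to the user this turn
    calls: list[FunctionCall]  # every function to invoke this turn

def parse_turn(raw: str) -> AgentTurn:
    """Parse and validate one model reply; raise ValueError if malformed."""
    data = json.loads(raw)
    if not isinstance(data.get("messages"), list) or not isinstance(data.get("calls"), list):
        raise ValueError("reply must contain 'messages' and 'calls' arrays")
    calls = [FunctionCall(c["name"], c["arguments"]) for c in data["calls"]]
    return AgentTurn(messages=data["messages"], calls=calls)

# A hand-written example of what a schema-constrained model might emit:
raw = '''{
  "messages": ["Scheduled your 3pm meeting.", "I also added a reminder."],
  "calls": [
    {"name": "schedule_meeting", "arguments": {"time": "15:00"}},
    {"name": "add_reminder", "arguments": {"minutes_before": 10}}
  ]
}'''
turn = parse_turn(raw)
print(len(turn.messages), len(turn.calls))  # two messages, two calls, one round-trip
```

The value of guaranteed schema compliance is that `parse_turn` stops being defensive glue code: if the model can only emit this shape, the `ValueError` branch becomes dead code.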
-
🔥 Taming the Chaos of LLM Outputs with Ollama’s Structured Outputs 🛠️ If you’ve ever worked with LLMs, you know the pain of dealing with inconsistent, messy outputs. Parsing through unstructured text or unpredictable data is a time-sink nobody enjoys. That’s why I’m so excited about Ollama’s Structured Outputs: it’s been a game-changer for making AI responses reliable and ready to use. I recently wrote about how this feature lets you define a JSON schema to structure AI outputs exactly how you need them. Whether it’s parsing data, analyzing images, or building APIs, this is the tool I didn’t know I needed. In the blog, I share: how Ollama’s Structured Outputs work; real-world use cases (and trust me, there are a lot!); and an example from a Car Management System I built that extracts details from descriptions and images. Check it out: https://lnkd.in/e6ZT6vaj #AI #Ollama #JSON #LLM
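As a rough sketch of the pattern, here is a JSON schema of the kind you would hand to Ollama's structured-outputs feature, plus a local validation step. The car fields are invented for illustration, and the "model reply" is hand-written since no Ollama server is running here; in real use you would send the schema with the chat request and validate the actual response.

```python
import json

# Schema you'd attach to the request so the model is constrained to this shape.
car_schema = {
    "type": "object",
    "properties": {
        "make":  {"type": "string"},
        "model": {"type": "string"},
        "year":  {"type": "integer"},
    },
    "required": ["make", "model", "year"],
}

def validate_car(reply: str) -> dict:
    """Minimal check that a JSON reply matches car_schema (types + presence)."""
    car = json.loads(reply)
    for field, spec in car_schema["properties"].items():
        expected = {"string": str, "integer": int}[spec["type"]]
        if not isinstance(car.get(field), expected):
            raise ValueError(f"field {field!r} missing or wrong type")
    return car

# Stand-in for what a schema-constrained model would emit:
reply = '{"make": "Toyota", "model": "Corolla", "year": 2019}'
print(validate_car(reply))
```

The win over free-text prompting ("please answer in JSON") is that the structure is enforced at generation time rather than hoped for and re-prompted on failure.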
Bringing Structure to LLMs with Ollama’s Structured Outputs
gpt-labs.ai
-
We just released our benchmarks of Meta's Llama 3 model. See the full results at https://www.vals.ai. Some key takeaways: - On our datasets, Llama 3 70B performed better overall than any other open-source model. - Llama 3 70B showed extremely high accuracy on LegalBench, placing 2nd (ahead of GPT). However, this dataset is public -- we're investigating further to see if there's evidence of pre-training on the evaluation set. - On our proprietary datasets, it performed very well, but not better than Opus or GPT-4. This may be a more accurate picture of its performance. Additional Findings: - Llama 3 70B was 5th overall on our Contract Law dataset. On the Corporate Finance dataset, it was 6th, behind the larger DBRX but still above Mixtral. On TaxEval it was third overall, beating out Sonnet and GPT 3.5, and more than 10% higher than the next open-source model. These proprietary datasets likely paint a more accurate picture of the model's performance: strongly competitive with the GPT 3.5 / Sonnet / Gemini Pro 1.0 tier and far superior to other open-source alternatives. - Much like its larger counterpart, the Llama 3 8B model showed unreasonably strong performance on LegalBench. On the other datasets, it still showed impressive quality, usually in line with the Llama 70B and Mixtral models. - Generally, Llama 3 had low latency and high throughput across all our tests. Inference was done with Together AI.
Vals.ai: LegalBench
vals.ai
-
Agents, Agents, Agents. Imagine having your personal army of 1,000 AI Agents tirelessly working for you, 24/7. This is definitely the future. When is it going to happen? That's not the right question to ask. The truth is, the future is closer than you think. The earlier you adopt, the earlier you gain a significant advantage over those who wait. 𝐖𝐡𝐚𝐭 𝐚𝐫𝐞 𝐀𝐈 𝐀𝐠𝐞𝐧𝐭𝐬? You can think of AI agents as self-driving systems. Unlike traditional systems that follow a fixed set of rules, an AI agent uses a large language model (LLM) to reason through a problem, create a plan to solve it, and execute that plan with the help of a set of tools. 𝐖𝐡𝐚𝐭 𝐜𝐚𝐧 𝐀𝐈 𝐀𝐠𝐞𝐧𝐭𝐬 𝐝𝐨 𝐭𝐨𝐝𝐚𝐲? Today, these agents are mostly used for background research, combining LLMs with tools like Google Search, Wikipedia, and other sources of proprietary data. Now, imagine those 1,000 agents working together. This will be achieved through multi-agent LLM systems, which involve multiple single-agent LLM-based systems interacting and collaborating to solve complex tasks. These systems prioritize diverse agent roles, inter-agent communication, and collective decision-making. This is exactly where I asked myself these questions: - How can multi-agent systems impact businesses? - How can we integrate multi-agent systems into our existing products? - How can we ensure that these systems are secure, robust, and scalable? These questions led me to enroll in the Maven course "Architecting Multi-agent LLM systems" by 🟢 Amir Feizpour. Throughout this live course, I'll be posting my insights and learnings as I seek answers to these questions. I'm Mohsin Iqbal Click my name + follow + 🔔
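The reason/plan/execute loop described above can be sketched as follows. This is a toy: the "LLM" planner is a canned lookup and the tools are stubs, since the point is the control flow (model picks tools, runtime executes them, results feed back as observations), not a real agent framework.

```python
def web_search(query: str) -> str:
    """Stub for a real search tool."""
    return f"Top result for '{query}': ..."

def calculator(expression: str) -> str:
    """Stub calculator; eval is used only on trusted toy input here."""
    return str(eval(expression))

TOOLS = {"web_search": web_search, "calculator": calculator}

def fake_llm_plan(task: str) -> list[tuple[str, str]]:
    """Stand-in for the LLM's planning step: task -> ordered tool calls."""
    if "revenue" in task:
        return [("web_search", "ACME 2023 revenue"), ("calculator", "12 * 4")]
    return []

def run_agent(task: str) -> list[str]:
    observations = []
    for tool_name, tool_input in fake_llm_plan(task):  # plan
        result = TOOLS[tool_name](tool_input)          # execute with a tool
        observations.append(result)                    # observe, feed back
    return observations

print(run_agent("Estimate ACME's quarterly revenue"))
```

A multi-agent system is, in essence, this loop instantiated many times with different roles, plus a message-passing layer so agents can hand observations to one another.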
-
S&P Global, a renowned financial intelligence provider, quietly announced on Wednesday the debut of S&P AI Benchmarks by Kensho. This unique solution seeks to set a new standard for assessing the performance of large language models (LLMs) in complex financial and quantitative applications. The benchmarking tool, created by S&P Global's AI-focused business Kensho, evaluates an LLM's ability to perform tasks such as mathematical reasoning and data extraction from financial documents, and to exhibit domain-specific expertise. The results are then displayed on a leaderboard, giving a clear picture of each model's capabilities. https://lnkd.in/gbt2Tqi2 #techarticle #technologyinnovation
S&P Global launches groundbreaking AI benchmark for financial industry
https://venturebeat.com
-
As with most things in technology development, there are tradeoffs to scaling large AI models. Those tradeoffs mean we will end up in the middle of the popular narratives, not all the way on one side. Holding several ideas in my head around scaling rather than jumping to one extreme: 1. Scaling is very likely to keep lowering test loss. 2. Lower test loss is not tied to anything specific. 3. Bigger models store more information, and tiny changes in probability result in big perceived differences. 4. AGI is still whatever you want it to be. Mostly, scaling is an economics story at this point -- see how long they can stay solvent. Read about it here: https://lnkd.in/dJ9vgi65
How scaling changes model behavior
interconnects.ai
-
We are expanding S&P AI Benchmarks to provide researchers, financial professionals, and business leaders with more tools to objectively evaluate #AI systems like #LLMs. 🚀 Our newest benchmark assesses long-document QA, ranking how well an LLM can answer questions about extremely long and complex documents, like SEC filings. The rankings can help professionals understand the relative performance of LLM applications and corresponding #RAG architectures for specific financial and business use cases. Learn more and check out the new leaderboard!
We’re expanding S&P AI Benchmarks to cover Long-document QA
blog.kensho.com
-
LLMs are temporary ☝🏻 But ... your business's value is permanent. Let me explain. In the rapidly evolving world of large language models (LLMs), it's easy to get caught up in the hype surrounding the latest and greatest models from tech giants like: ✓ OpenAI, ✓ Anthropic, ✓ Google, ✓ Meta, and ✓ Microsoft. However, the harsh truth is that your choice of LLM is unlikely to be the key to your business's success or your unfair advantage. These tech giants are the dominant players in the LLM space, with deep pockets and vast resources. Creating, training, and deploying LLMs securely is incredibly expensive and resource-intensive. That's the IP they're building for themselves, so that businesses like yours end up using them and paying usage fees for that IP. One of the primary sources of your unfair advantage is YOUR DATA and how you leverage it. While you may need an LLM to power certain aspects of your product, the specific model you choose will never be your unfair advantage. You don't own the intellectual property (IP) of these LLMs ... You own the value you create around them for your customers. 👉🏻 For example, if you create a chatbot that summarizes medical documents, your IP lies in the fine-tuning, customization, and architectural decisions that enable efficient document retrieval and summarization – not the LLM itself. New versions of LLMs are released at a rapid pace, sometimes monthly, and you must anticipate and adapt to these changes. Build your solution modularly so you can easily swap out LLMs if a better-suited model becomes available for your use case. 💡 Work smarter, not harder. ____________________________________ 👋🏻 Hi, I'm Svetlana and I help business leaders apply AI to solve complex business challenges. ↳ If you've enjoyed this post, you'll love my newsletter Subscribe: https://lnkd.in/gpzZHYYf
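The "build modularly so you can swap LLMs" advice above can be sketched with a thin adapter layer. The provider classes here are fakes with invented names; in a real system each one would wrap a vendor SDK behind the same interface, so changing models is a one-line configuration change rather than a rewrite.

```python
from abc import ABC, abstractmethod

class LLMClient(ABC):
    """The one interface your business logic is allowed to depend on."""
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class FakeOpenAIClient(LLMClient):
    def complete(self, prompt: str) -> str:
        return f"[gpt] {prompt}"       # a real adapter would call the vendor SDK

class FakeAnthropicClient(LLMClient):
    def complete(self, prompt: str) -> str:
        return f"[claude] {prompt}"    # same interface, different backend

REGISTRY = {"openai": FakeOpenAIClient, "anthropic": FakeAnthropicClient}

def summarize_document(text: str, provider: str = "openai") -> str:
    """Business logic: depends on LLMClient, never on a specific vendor."""
    llm = REGISTRY[provider]()  # the only place the provider choice appears
    return llm.complete(f"Summarize: {text}")

print(summarize_document("Patient presents with...", provider="anthropic"))
```

The design choice is that the provider name lives in config, not in code paths: when a better-suited model ships, you add one adapter class and flip the registry key.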
-
These are some great points on how to keep your wits about you in the current LLM hype wave.
-
I'm feeling a little saucy today and have had this conversation a few times already this week, so I'll put it here. LLMs are played out. Dead. Over. There are 4-5 companies with the financial and data resources to build a general LLM, and they've already done it. And each of those organizations craves yet more data to train and adjust its model. LLMs are now feature-rich and data-poor. Most of these generic LLMs do 90% of the job for the vast majority of layperson applications, but never come close to perfection. The future is Small Language Models (SLMs), where researchers build a full predictive model from scratch for the purpose of doing one thing specifically well 100% of the time. Deep, but not wide. These models will be built on highly curated, well-understood data alongside the subject-matter expertise that UNDERSTANDS both the problem and the solution. If I've said it once, I've said it a hundred times: the data IS the AI. Allie K. Miller