Teaching LLMs When to Retrieve Information for Improved QA

Labruna et al. propose Adaptive Retrieval LLM (ADAPT-LLM), which teaches an LLM to decide dynamically, on a per-question basis, whether to retrieve context via an IR system. This approach is particularly relevant to the recently introduced Med-Gemini models, which leverage web search integration to improve clinical reasoning capabilities.

The key steps in ADAPT-LLM are:

1. Prompt the LLM with just the question. If it answers correctly, parametric memory suffices; if not, flag the question as requiring retrieval.
2. Construct a training set from an open-domain QA dataset. For correctly answered questions, provide only the question and answer. For the others, add a <RET> token to indicate that retrieval is needed, along with the question, answer, and gold passage.
3. Fine-tune the LLM on this dataset so it learns when to answer directly and when to request retrieval via <RET>.

At inference, ADAPT-LLM either answers directly or outputs <RET>, which triggers retrieval of a relevant passage that is appended to the question prompt for generating the final answer.

Experiments used Llama-2 (7B) fine-tuned on NQ and SQuAD, with Contriever for retrieval. On PopQA, ADAPT-LLM outperformed the never-retrieve and always-retrieve baselines and matched a popularity-score-based approach. When ADAPT-LLM chose to retrieve, accuracy was significantly higher with context than without; when it answered directly, it achieved high accuracy from parametric memory alone. Retrieval quality was a key bottleneck.

The adaptive retrieval approach in ADAPT-LLM aligns well with the uncertainty-guided search strategy employed by Med-Gemini-L 1.0 for complex clinical reasoning tasks. By selectively integrating web search results based on model uncertainty, Med-Gemini-L 1.0 achieved state-of-the-art performance on benchmarks such as MedQA. The success of both ADAPT-LLM and Med-Gemini highlights the importance of judiciously leveraging external information to complement the parametric knowledge of large language models, improving question answering in both open-domain and specialized medical settings.

Furthermore, one could envision extending this framework so that the language model is trained to generate intermediate annotations or structured representations of the retrieved information it deems most pertinent to the query, akin to the note-taking process human experts use when researching a topic. This could help the model distill and synthesize relevant knowledge from the retrieved context, leading to more accurate and informative final responses.
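
To make steps 1 and 2 concrete, here is a minimal sketch of how the adaptive training set could be assembled. The function names, prompt templates, the exact-match check, and the qa_dataset/answer_directly objects are illustrative assumptions, not the authors' released code.

```python
# Sketch of ADAPT-LLM-style training-set construction (steps 1-2 above).
# Assumes qa_dataset is an iterable of dicts with 'question', 'answer',
# and 'gold_passage', and answer_directly(question) queries the *base* LLM
# with the question only and returns its unassisted answer string.

RET_TOKEN = "<RET>"

def build_adaptive_training_set(qa_dataset, answer_directly):
    examples = []
    for item in qa_dataset:
        prediction = answer_directly(item["question"])
        # Simple exact-match check used here for illustration; the paper's
        # evaluation of correctness may differ.
        if prediction.strip().lower() == item["answer"].strip().lower():
            # Parametric memory suffices: train the model to answer directly.
            examples.append({
                "prompt": f"Question: {item['question']}\nAnswer:",
                "target": item["answer"],
            })
        else:
            # Teach the model to emit <RET> for this question...
            examples.append({
                "prompt": f"Question: {item['question']}\nAnswer:",
                "target": RET_TOKEN,
            })
            # ...and to answer correctly once the gold passage is provided.
            examples.append({
                "prompt": (f"Question: {item['question']}\n"
                           f"Context: {item['gold_passage']}\nAnswer:"),
                "target": item["answer"],
            })
    return examples
```

The resulting prompt/target pairs can then be fed to any standard supervised fine-tuning pipeline (step 3).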
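
And a corresponding sketch of the inference-time flow: answer directly, or retrieve when the model emits <RET>. The generate and retrieve_top_passage callables stand in for the fine-tuned LLM and an IR system such as Contriever; their signatures are assumptions made for illustration.

```python
# Sketch of ADAPT-LLM-style inference: answer directly, or retrieve on <RET>.

RET_TOKEN = "<RET>"

def adaptive_answer(question, generate, retrieve_top_passage):
    # generate(prompt) -> completion string from the fine-tuned model.
    # retrieve_top_passage(question) -> most relevant passage string.
    first_pass = generate(f"Question: {question}\nAnswer:").strip()
    if first_pass != RET_TOKEN:
        # The model judged its parametric memory sufficient.
        return first_pass
    # The model asked for retrieval: fetch a passage and re-prompt with context.
    passage = retrieve_top_passage(question)
    return generate(f"Question: {question}\nContext: {passage}\nAnswer:").strip()
```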