Teaching LLMs When to Retrieve Information for Improved QA

Labruna et al. propose Adaptive Retrieval LLM (ADAPT-LLM), which teaches an LLM to decide dynamically, on a per-question basis, whether to retrieve context via an IR system. This approach is particularly relevant to the recently introduced Med-Gemini models, which leverage web search integration to improve clinical reasoning capabilities.

The key steps in ADAPT-LLM are:

1. Prompt the LLM with just the question. If it answers correctly, parametric memory suffices; if not, flag the question as requiring retrieval.
2. Construct a training set from an open-domain QA dataset. For correctly answered questions, provide only the question and answer. For the others, add a <RET> token to indicate that retrieval is needed, along with the question, answer, and gold passage.
3. Fine-tune the LLM on this dataset so it learns when to answer directly and when to request retrieval via <RET>.

At inference, ADAPT-LLM either answers directly or outputs <RET>, which triggers retrieval of a relevant passage that is appended to the question prompt for generating the final answer.

Experiments used Llama-2 (7B) fine-tuned on NQ and SQuAD, with Contriever for retrieval. On PopQA, ADAPT-LLM outperformed the never-retrieve and always-retrieve baselines and matched a popularity-score-based approach. When ADAPT-LLM chose to retrieve, accuracy was significantly higher with context than without; when it answered directly, it achieved high accuracy from parametric memory alone. Retrieval quality was a key bottleneck.

The adaptive retrieval approach in ADAPT-LLM aligns well with the uncertainty-guided search strategy employed by Med-Gemini-L 1.0 for complex clinical reasoning tasks. By selectively integrating web search results based on model uncertainty, Med-Gemini-L 1.0 achieved state-of-the-art performance on benchmarks such as MedQA. The success of both ADAPT-LLM and Med-Gemini highlights the importance of judiciously leveraging external information to complement the parametric knowledge of large language models, improving question answering in both open-domain and specialized medical settings.

Furthermore, one could envision extending this framework so that the language model is trained to generate intermediate annotations or structured representations of the retrieved information it deems most pertinent to the query, akin to the note-taking process human experts use when researching a topic. This could help the model distill and synthesize relevant knowledge from the retrieved context, leading to more accurate and informative final responses.
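
To make steps 1 and 2 concrete, here is a minimal sketch of how the adaptive training set could be assembled. The function names, prompt templates, the exact-match check, and the qa_dataset/answer_directly objects are illustrative assumptions, not the authors' released code.

```python
# Sketch of ADAPT-LLM-style training-set construction (steps 1-2 above).
# Assumes qa_dataset is an iterable of dicts with 'question', 'answer',
# and 'gold_passage', and answer_directly(question) queries the *base* LLM
# with the question only and returns its unassisted answer string.

RET_TOKEN = "<RET>"

def build_adaptive_training_set(qa_dataset, answer_directly):
    examples = []
    for item in qa_dataset:
        prediction = answer_directly(item["question"])
        # Simple exact-match check used here for illustration; the paper's
        # evaluation of correctness may differ.
        if prediction.strip().lower() == item["answer"].strip().lower():
            # Parametric memory suffices: train the model to answer directly.
            examples.append({
                "prompt": f"Question: {item['question']}\nAnswer:",
                "target": item["answer"],
            })
        else:
            # Teach the model to emit <RET> for this question...
            examples.append({
                "prompt": f"Question: {item['question']}\nAnswer:",
                "target": RET_TOKEN,
            })
            # ...and to answer correctly once the gold passage is provided.
            examples.append({
                "prompt": (f"Question: {item['question']}\n"
                           f"Context: {item['gold_passage']}\nAnswer:"),
                "target": item["answer"],
            })
    return examples
```

The resulting prompt/target pairs can then be fed to any standard supervised fine-tuning pipeline (step 3).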
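
And a corresponding sketch of the inference-time flow: answer directly, or retrieve when the model emits <RET>. The generate and retrieve_top_passage callables stand in for the fine-tuned LLM and an IR system such as Contriever; their signatures are assumptions made for illustration.

```python
# Sketch of ADAPT-LLM-style inference: answer directly, or retrieve on <RET>.

RET_TOKEN = "<RET>"

def adaptive_answer(question, generate, retrieve_top_passage):
    # generate(prompt) -> completion string from the fine-tuned model.
    # retrieve_top_passage(question) -> most relevant passage string.
    first_pass = generate(f"Question: {question}\nAnswer:").strip()
    if first_pass != RET_TOKEN:
        # The model judged its parametric memory sufficient.
        return first_pass
    # The model asked for retrieval: fetch a passage and re-prompt with context.
    passage = retrieve_top_passage(question)
    return generate(f"Question: {question}\nContext: {passage}\nAnswer:").strip()
```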