Top LLM Papers of the Week (November Week 1, 2024)

[1] Multi-expert Prompting

Multi-expert Prompting is designed to enhance LLM safety, reliability, and usefulness. It significantly outperforms Expert Prompting and comparable baselines, achieving state-of-the-art truthfulness by beating the best baseline by 8.69% with ChatGPT. Multi-expert Prompting is efficient, explainable, and highly adaptable to diverse scenarios. [Tweet] and [Paper]
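
For a concrete feel of the idea, here is a minimal sketch of multi-expert-style prompting: sample several expert personas, answer once per persona, then aggregate. The `call_llm` helper and the prompt templates are illustrative placeholders, not the paper's exact method.

```python
# Minimal sketch of multi-expert prompting (illustrative, not the paper's exact templates).

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for any chat-completion client.
    raise NotImplementedError("plug in your LLM client here")

def multi_expert_answer(question: str, n_experts: int = 3) -> str:
    # 1. Ask the model to propose distinct expert roles suited to the question.
    roles_text = call_llm(
        f"List {n_experts} distinct expert roles best suited to answer:\n{question}"
    )
    roles = [line.strip("- ").strip() for line in roles_text.splitlines() if line.strip()]

    # 2. Answer the question once per expert persona.
    answers = [
        call_llm(f"You are {role}. Answer concisely:\n{question}")
        for role in roles[:n_experts]
    ]

    # 3. Aggregate: merge agreed points and resolve conflicts into one answer.
    combined = "\n\n".join(f"[{r}]\n{a}" for r, a in zip(roles, answers))
    return call_llm(
        "Combine the following expert answers into a single reliable answer, "
        "resolving any disagreements.\n\n"
        f"Question: {question}\n\n{combined}"
    )
```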


[2] Adaptive Filtering for RAG Systems

This paper presents E2E-AFG, an end-to-end model with adaptive filtering for RAG systems. Adaptive filtering lets the model focus on relevant content while reducing the influence of irrelevant information, yielding more accurate answers. E2E-AFG consistently outperforms baseline models across all tasks. [Tweet] and [Paper]
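
As a rough illustration of the filtering idea (not E2E-AFG's actual architecture), one can score retrieved passages with an off-the-shelf cross-encoder and drop the low-relevance ones before generation:

```python
# Illustrative passage filtering for RAG; E2E-AFG instead learns the filter
# end-to-end, jointly with answer generation.
from sentence_transformers import CrossEncoder

scorer = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def filter_passages(question: str, passages: list[str], threshold: float = 0.0) -> list[str]:
    # Score (question, passage) pairs; higher = more relevant.
    scores = scorer.predict([(question, p) for p in passages])
    # Keep only passages judged relevant, so the generator sees less noise.
    return [p for p, s in zip(passages, scores) if s > threshold]
```

The "end-to-end" part of the paper is precisely that this relevance judgment is learned jointly with generation rather than bolted on as above.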


"Top LLM Papers of the Week" newsletter is read by over 20k+ AI Researchers, Engineers and Developers. If you would like to promote with us, contact Kalyan KS


[3] Benchmarking Library for MoEs in LLMs

Mixture-of-Experts (MoE) architectures play an important role in the development of more efficient and effective LLMs. This work introduces LibMoE, a comprehensive and modular framework that streamlines the research, training, and evaluation of MoE algorithms. LibMoE is built on three core principles: (i) modular design, (ii) efficient training, and (iii) comprehensive evaluation. Thanks to its modular design, LibMoE will be invaluable for researchers working with MoEs. [Tweet] and [Paper]
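
To make the MoE building block concrete, below is a minimal top-k-gated MoE layer in plain PyTorch. This is a generic sketch of the kind of component a framework like LibMoE standardizes, not LibMoE's actual API.

```python
# Minimal sparse MoE layer with top-k gating (generic sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, dim: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        self.gate = nn.Linear(dim, n_experts)
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        logits = self.gate(x)                             # (tokens, n_experts)
        weights, idx = logits.topk(self.k, dim=-1)        # route each token to k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```

Real MoE training adds load-balancing losses, capacity limits, and fused kernels on top of this routing; those are exactly the variations a benchmarking framework is meant to make easy to swap and compare.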


[4] RAGViz Tool

This paper presents RAGViz, a RAG diagnosis tool that visualizes how attentive the generated tokens are to the retrieved documents. The tool includes a built-in user interface, a retrieval index, and a Large Language Model (LLM) backbone, and provides two main functionalities: (1) token- and document-level attention visualization, and (2) generation comparison upon context document addition and removal. RAGViz operates efficiently, with a median query time of about 5 seconds on a moderate GPU node. [Tweet] and [Paper]
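
Under the hood, this kind of visualization comes from ordinary attention weights. Here is a rough sketch (with an arbitrary small model and hypothetical document-span bookkeeping, not RAGViz's actual implementation) of aggregating a token's attention over retrieved-document spans via Hugging Face's output_attentions flag:

```python
# Sketch: sum a token's attention mass over each retrieved document's span.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_attentions=True)

def doc_attention(prompt: str, doc_spans: dict[str, tuple[int, int]]) -> dict[str, float]:
    """doc_spans maps a doc id to its (start, end) token range in the prompt (hypothetical bookkeeping)."""
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        attn = model(ids).attentions          # tuple of (1, heads, seq, seq), one per layer
    # Average over layers and heads: rows attend, columns are attended to.
    avg = torch.stack(attn).mean(dim=(0, 2))[0]   # (seq, seq)
    last_row = avg[-1]                             # attention from the last token
    return {doc: last_row[s:e].sum().item() for doc, (s, e) in doc_spans.items()}
```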


[5] Shortcut Learning in LLMs (Survey)

This paper provides a comprehensive survey of shortcut learning in In-Context Learning (ICL). It explores in detail the types of shortcuts in ICL tasks, their causes, available benchmarks, and strategies for mitigating them. Based on these observations, it summarizes the unresolved issues in existing research and outlines the future research landscape of shortcut learning. [Tweet] and [Paper]


[6] Quantization Techniques for LLMs (Survey)

This paper provides a comprehensive analysis of quantization techniques, with a particular focus on their application to LLMs. It covers the mathematical theory of quantization, followed by a review of common quantization methods and how they are implemented. The paper also examines several prominent quantization methods applied to LLMs, detailing their algorithms and performance outcomes. [Tweet] and [Paper]
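
As a refresher on the core formula most of these methods build on, here is a small worked example of uniform affine (asymmetric) quantization in NumPy: floats in [min, max] map to b-bit integers via a scale and a zero-point, then map back on dequantization.

```python
# Worked example of uniform affine quantization (the basic scheme most methods refine).
import numpy as np

def quantize(x: np.ndarray, bits: int = 8):
    qmin, qmax = 0, 2**bits - 1
    scale = float(x.max() - x.min()) / (qmax - qmin)      # float step per integer level
    zero_point = int(round(qmin - float(x.min()) / scale))  # integer representing 0.0
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int32)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    return scale * (q.astype(np.float32) - zero_point)

x = np.array([-1.5, -0.2, 0.0, 0.7, 2.1], dtype=np.float32)
q, s, z = quantize(x)
print(q, dequantize(q, s, z))   # per-element reconstruction error is bounded by scale/2
```

Symmetric variants, per-channel scales, and error-compensating methods like GPTQ are refinements of this basic map.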


[7] Survey of Small Language Models

Despite their proficiency in various tasks, LLMs face limitations due to their large parameter counts and computational demands. Small Language Models (SLMs) are increasingly favored for their low inference latency, cost-effectiveness, efficient development, and easy customization and adaptability. This paper presents a comprehensive survey of SLMs, covering techniques for improving their performance: training from scratch, fine-tuning, knowledge distillation, quantization, and leveraging LLM-enhancing technologies to optimize SLMs. [Tweet] and [Paper]


[8] 4-bit Activations for 1-bit LLMs

This paper introduces BitNet a4.8, which enables 4-bit activations for 1-bit LLMs. BitNet a4.8 employs a hybrid quantization-and-sparsification strategy to mitigate the quantization errors introduced by outlier channels. Results demonstrate that BitNet a4.8 achieves performance comparable to BitNet b1.58 with equivalent training costs, while being faster at inference thanks to 4-bit (INT4/FP4) kernels. [Tweet] and [Paper]
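
As a toy illustration of the hybrid idea (not BitNet a4.8's actual kernels), one can quantize most activation channels to INT4 while routing the few high-magnitude outlier channels through a higher-precision side path:

```python
# Toy sketch: INT4-quantize inlier activation channels, keep outliers in high precision.
import torch

def hybrid_act_quant(x: torch.Tensor, outlier_frac: float = 0.01):
    # Identify outlier channels by their maximum magnitude. x: (tokens, channels)
    ch_max = x.abs().amax(dim=0)
    k = max(1, int(outlier_frac * x.shape[1]))
    mask = torch.zeros(x.shape[1], dtype=torch.bool)
    mask[ch_max.topk(k).indices] = True

    # Symmetric INT4 quantization for the inlier channels (range [-8, 7]).
    inliers = x[:, ~mask]
    scale = (inliers.abs().amax() / 7).clamp(min=1e-8)
    q4 = torch.clamp(torch.round(inliers / scale), -8, 7)

    # Outlier channels bypass quantization via a sparse high-precision path.
    return q4 * scale, x[:, mask], mask
```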


[9] OpenCoder LLM

This paper introduces OpenCoder, a top-tier code LLM that achieves performance comparable to leading models. The authors identify the key ingredients for building a top-tier code LLM: (1) code-optimized heuristic rules for data cleaning and methods for data deduplication, (2) recall of text corpora related to code, and (3) high-quality synthetic data in both the annealing and supervised fine-tuning stages. [Tweet] and [Paper]
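
To give a flavor of ingredient (1), here is a minimal file-level deduplication sketch based on hashing normalized code. OpenCoder's actual pipeline is considerably more elaborate; this shows only the basic idea.

```python
# Minimal exact-deduplication sketch for a code corpus.
import hashlib
import re

def normalize(code: str) -> str:
    # Collapse whitespace and case so trivial formatting variants collide.
    return re.sub(r"\s+", " ", code).strip().lower()

def deduplicate(files: list[str]) -> list[str]:
    seen, unique = set(), []
    for code in files:
        digest = hashlib.sha256(normalize(code).encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(code)
    return unique
```

Production pipelines typically pair this with near-duplicate detection (e.g., MinHash) to catch fuzzier matches.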

Do subscribe to the newsletter so that you won't miss interesting updates related to Generative AI, LLMs, and RAG.

Kalyan KS, Research Scientist (NLP) at Akmmus AI Labs
