🚀 Sergey Shchegrikovich’s Post


CTO at ListAlpha | CRM for private equity funds

Reducing hallucinations in LLMs using probabilities

One of the obstacles to wider adoption of LLMs is hallucination, when an LLM generates a response that contains wrong facts. A user loses trust at that moment, and developers need to fix it. Many approaches, such as ReAct or CoT, rely on static or predefined rules to deal with hallucinations. However, models served via API also return probabilities for each token alongside the text of the response. These probabilities are a numerical representation of how sure the model is about its answer. When an LLM returns a token with low probability, it is a sign of a potential hallucination.

The 'Active Retrieval Augmented Generation' paper proposes a RAG system driven by token probabilities. The solution is called Forward-Looking Active REtrieval augmented generation (FLARE). Don't confuse it with Faithful Logic-Aided Reasoning and Exploration (also FLARE).

FLARE focuses on when and what to retrieve during generation. We call for external information only when the model does not have enough knowledge, which we can check by looking at the probabilities of the generated tokens. And when we do request external information, we should request exactly what the model needs. This is achieved by generating a temporary next sentence, which is then used as the query for external information.

FLARE starts by retrieving an initial set of documents and generating the first sentence. The second sentence is generated as a temporary one, and we analyse its token probabilities. If the probabilities are high, we accept the sentence and generate the next temporary sentence. If the probabilities are low, we send a search query for external documents and regenerate the second sentence with the new set of documents. The process ends when the model stops producing new tokens.

Interestingly, this approach can be extended to a few sentences or even whole paragraphs, and it can also focus on specific parts of a sentence. The 'Mitigating Entity-Level Hallucination in Large Language Models' paper applies the approach to named entities.

That paper proposes Dynamic Retrieval Augmentation based on hallucination Detection (DRAD). It works by splitting a sentence into entities and calculating the probability and entropy for each entity. If these values signal uncertainty (low probability, high entropy), the entity is marked as hallucinated and requires replacement. The text before the hallucination is safe, so we only need to replace text starting from the first detected hallucination. At that point, the Self-correction based on External Knowledge (SEK) step executes a search query and replaces the entity in question with the correct value. If the LLM generates 'Albert Einstein was born in Berlin', DRAD will detect that 'Berlin' is a hallucination, and SEK will replace it with 'Ulm'. All of these techniques trigger the call for external documents at the ideal time, when the information is really needed.
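To make the FLARE loop concrete, here is a minimal Python sketch. It assumes the OpenAI chat completions API with logprobs=True (as in the logprobs cookbook linked in the references); the retrieve() stub, the 0.6 probability threshold, and the sentence-by-sentence prompting are illustrative choices, not the paper's exact implementation.

# Minimal sketch of FLARE-style confidence-gated retrieval.
# Assumptions: OpenAI chat completions with logprobs=True; retrieve() is a
# placeholder for any search index or vector store.
import math
from openai import OpenAI

client = OpenAI()
PROB_THRESHOLD = 0.6  # tokens below this probability count as "uncertain"


def generate_sentence(prompt: str):
    """Generate one tentative next sentence and return (text, per-token probabilities)."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        logprobs=True,
        stop=["."],     # crude sentence boundary
        max_tokens=64,
    )
    choice = resp.choices[0]
    tokens = choice.logprobs.content or []
    probs = [math.exp(t.logprob) for t in tokens]
    return choice.message.content or "", probs


def retrieve(query: str) -> str:
    """Hypothetical retrieval stub - plug in your own search backend here."""
    return f"[context for: {query}]"


def flare_answer(question: str, max_sentences: int = 5) -> str:
    answer = ""
    context = retrieve(question)  # initial retrieval for the first sentence
    for _ in range(max_sentences):
        prompt = (f"Context:\n{context}\n\nQuestion: {question}\n"
                  f"Answer so far: {answer}\nNext sentence:")
        sentence, probs = generate_sentence(prompt)
        if not sentence.strip():
            break  # the model stopped producing new tokens
        if probs and min(probs) < PROB_THRESHOLD:
            # Low-confidence token(s): use the tentative sentence as a search
            # query, fetch fresh evidence, then regenerate the same sentence.
            context = retrieve(sentence)
            sentence, _ = generate_sentence(
                f"Context:\n{context}\n\nQuestion: {question}\n"
                f"Answer so far: {answer}\nNext sentence:")
        answer += sentence.strip() + ". "
    return answer.strip()


if __name__ == "__main__":
    print(flare_answer("Where was Albert Einstein born?"))

Note that the actual FLARE paper refines the query step: it either masks the low-probability tokens out of the tentative sentence or asks the model to generate questions targeting the uncertain span; the sketch simply reuses the whole tentative sentence as the query.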

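A companion sketch for the entity-level idea behind DRAD: per-entity probability/entropy checks, keeping the safe prefix, and an SEK-style regeneration step. The Entity dataclass, the thresholds, and the function names here are hypothetical; entity spans would normally come from an NER step, and the retrieval/regeneration calls are placeholders for your search backend and LLM.

# Minimal sketch of DRAD-style entity-level hallucination detection.
# Per-token probabilities and entropies are whatever your LLM API exposes
# (e.g. logprobs + top_logprobs); here they are hardcoded for the demo.
from dataclasses import dataclass
from typing import List, Optional

PROB_THRESHOLD = 0.5     # entity is suspect if its least likely token falls below this
ENTROPY_THRESHOLD = 2.0  # ...or if the model's token distribution is this uncertain


@dataclass
class Entity:
    text: str
    token_probs: List[float]      # probability of each token in the entity
    token_entropies: List[float]  # entropy of the model's distribution at each token
    start_char: int               # where the entity begins in the generated text


def is_hallucinated(entity: Entity) -> bool:
    """Flag an entity when the model was unsure about any of its tokens."""
    return (min(entity.token_probs) < PROB_THRESHOLD
            or max(entity.token_entropies) > ENTROPY_THRESHOLD)


def detect_hallucinated_entity(entities: List[Entity]) -> Optional[Entity]:
    """Return the first suspect entity; the text before it is kept as-is."""
    for entity in entities:
        if is_hallucinated(entity):
            return entity
    return None


def correct_with_retrieval(generated: str, suspect: Entity) -> str:
    """SEK-style correction: keep the safe prefix, retrieve evidence, regenerate.
    The bracketed strings stand in for real retrieve() and regenerate() calls."""
    safe_prefix = generated[: suspect.start_char]
    evidence = f"[retrieved docs for: {safe_prefix.strip()}]"
    return safe_prefix + f"<regenerated from {evidence}>"


if __name__ == "__main__":
    generated = "Albert Einstein was born in Berlin"
    entities = [
        Entity("Albert Einstein", [0.97, 0.95], [0.2, 0.3], 0),
        Entity("Berlin", [0.31], [2.8], 28),
    ]
    suspect = detect_hallucinated_entity(entities)
    if suspect:
        print(correct_with_retrieval(generated, suspect))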
🚀 Sergey Shchegrikovich

CTO at ListAlpha | CRM for private equity funds

1mo

Originally published here with a review of DRAGIN: https://shchegrikovich.substack.com/p/reducing-hallucinations-in-llms-using
References:
1. https://arxiv.org/abs/2407.09417 - Mitigating Entity-Level Hallucination in Large Language Models
2. https://arxiv.org/abs/2305.06983 - Active Retrieval Augmented Generation
3. https://arxiv.org/abs/2403.10081 - DRAGIN: Dynamic Retrieval Augmented Generation based on the Information Needs of Large Language Models
4. https://cookbook.openai.com/examples/using_logprobs - Using logprobs

Pranab Ghosh

AI Consultant || MIT Alumni || Entrepreneur || Open Source Project Owner || Blogger

1mo

A token with high probability can also be part of a hallucination, because LLM output only has local structure and coherence.

Dinakar .R

CloudIDSS for Build & Advisory in value based Transformations w/ A4AFE ™

1mo

Given that it also depends on the quality of the training data, the primary responsibility must lie with the LLM vendors.

Kostiantyn W.

Rust, AI, Web, CG for Web for Entertainment, Education, and Fintech

1mo

Great insights, Sergey! Given your background with high-load applications, what challenges would you anticipate in implementing a dynamic retrieval system like DRAD at scale?

Nissim Matatov

Data Scientist and Machine Learning Engineer

1mo

It's the same as using feature importance in classic ML to evaluate real feature importance - it depends on model quality. An LLM can be weak and still hallucinate, and the dependency on data is still big. But I like seeing the link to classic ML.


