Reducing hallucinations in LLMs using probabilities

One of the obstacles to wider adoption of LLMs is hallucination: the model generates a response that contains wrong facts. A user loses trust at that moment, and developers need to fix it. Many approaches deal with hallucinations through static or predefined rules, such as ReAct or CoT. However, models accessed via API also return probabilities for each token alongside the text of the response. These probabilities are a numerical representation of how confident the model is in its answer. When an LLM returns a token with low probability, it is a sign of potential hallucination.

The paper 'Active Retrieval Augmented Generation' proposes a RAG system built on token probabilities, called Forward-Looking Active REtrieval augmented generation (FLARE). Don't confuse it with Faithful Logic-Aided Reasoning and Exploration, which shares the acronym. FLARE focuses on when and what to retrieve during generation: we call for external information only when the model does not have enough knowledge, which we can check by looking at the probabilities of the generated tokens. When we do request external information, we should request exactly what the model needs. This is achieved by generating a temporary next sentence and using it as the query for external documents. FLARE starts by retrieving an initial set of documents and generating the first sentence. The next sentence is generated as a temporary one, and we analyze its token probabilities. If the probabilities are high, we accept the sentence and generate the next temporary one. If they are low, we send a search query for external documents and regenerate the sentence with the new set of documents. The process ends when the model stops producing new tokens. Interestingly, this approach can be extended from single sentences to several sentences or even paragraphs, and it can also focus on specific parts of a sentence.

The paper 'Mitigating Entity-Level Hallucination in Large Language Models' applies the same idea to named entities. It proposes Dynamic Retrieval Augmentation based on hallucination Detection (DRAD). DRAD splits a sentence into entities and calculates the probability and entropy for each one. If the model's confidence in an entity falls below a threshold (low probability, high entropy), the entity is marked as hallucinated and requires replacement. The text before the hallucination is considered safe, so replacement only starts from the first identified hallucination. At that point, the Self-correction based on External Knowledge (SEK) algorithm executes a search query and replaces the entity in question with the correct value. For example, if the LLM generates 'Albert Einstein was born in Berlin', DRAD will detect that Berlin is a hallucination, and SEK will replace it with Ulm.

All of these techniques trigger the call for external documents at the ideal moment, when the information is really needed.
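To make the FLARE loop concrete, here is a minimal Python sketch built on the OpenAI chat completions API with logprobs enabled (see the logprobs cookbook in the references). The retrieve callable, the model name and the 0.6 threshold are illustrative placeholders rather than values from the paper.

```python
import math

from openai import OpenAI

client = OpenAI()

# Illustrative threshold: accept a sentence only if every token has probability >= 0.6.
CONFIDENCE_THRESHOLD = 0.6


def generate_sentence(prompt: str) -> tuple[str, float]:
    """Generate one candidate sentence and return it with its lowest token probability."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model that returns logprobs
        messages=[{"role": "user", "content": prompt}],
        logprobs=True,
        max_tokens=60,
    )
    choice = response.choices[0]
    tokens = choice.logprobs.content or []
    # logprobs are natural logs, so exp() turns them back into probabilities.
    min_prob = min((math.exp(t.logprob) for t in tokens), default=0.0)
    return choice.message.content, min_prob


def flare_step(context: str, retrieve) -> str:
    """One FLARE-style iteration: keep the temporary sentence if confident, otherwise retrieve and regenerate."""
    temporary, min_prob = generate_sentence(context)
    if min_prob >= CONFIDENCE_THRESHOLD:
        return temporary  # the model is confident enough, accept the sentence
    # A low-probability token was found: use the temporary sentence itself as the search query.
    documents = retrieve(temporary)
    augmented = (
        f"{context}\n\nRelevant documents:\n{documents}\n\n"
        "Continue the answer with one more sentence:"
    )
    regenerated, _ = generate_sentence(augmented)
    return regenerated
```

A full generation loop would call flare_step repeatedly, appending each accepted sentence to the context, until the model stops producing new tokens.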
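The entity-level check from DRAD can be sketched on top of the same logprobs. The snippet below assumes the token list returned by the response above and uses spaCy for named-entity recognition; the 0.5 threshold is illustrative, and for brevity it scores only the average probability per entity rather than the probability-and-entropy criterion from the paper.

```python
import math
from dataclasses import dataclass

import spacy

nlp = spacy.load("en_core_web_sm")  # small English NER model

# Illustrative threshold: entities whose tokens average below this probability are suspect.
ENTITY_PROB_THRESHOLD = 0.5


@dataclass
class EntityConfidence:
    text: str
    label: str
    mean_prob: float


def score_entities(text: str, token_logprobs: list) -> list[EntityConfidence]:
    """Average the token probabilities over every named entity in the generated text."""
    # Reconstruct character offsets for each generated token so they can be matched to entity spans.
    spans, offset = [], 0
    for t in token_logprobs:  # items from response.choices[0].logprobs.content
        spans.append((offset, offset + len(t.token), math.exp(t.logprob)))
        offset += len(t.token)

    scored = []
    for ent in nlp(text).ents:
        probs = [p for start, end, p in spans if start < ent.end_char and end > ent.start_char]
        if probs:
            scored.append(EntityConfidence(ent.text, ent.label_, sum(probs) / len(probs)))
    return scored


def hallucinated_entities(text: str, token_logprobs: list) -> list[EntityConfidence]:
    """Entities below the threshold are candidates for SEK-style replacement from retrieved documents."""
    return [e for e in score_entities(text, token_logprobs) if e.mean_prob < ENTITY_PROB_THRESHOLD]
```

On the Einstein example, 'Berlin' would come back with a low score, and regeneration would start from that entity onward, keeping the text before it.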
A token with high probability can also produce a hallucination, because LLM output only has local structure and coherence.
Since this also depends on the quality of the training data, the primary responsibility must lie with the LLM vendors.
Great insights, Sergey! Given your background with high-load applications, what challenges would you anticipate in implementing a dynamic retrieval system like DRAD at scale?
It's the same as using feature importance in classic ML to evaluate real feature importance - it depends on model quality. An LLM can be weak and still hallucinate, and the dependency on data is still big. But I like seeing the link to classic ML.
Originally published here with a review of DRAGIN - https://shchegrikovich.substack.com/p/reducing-hallucinations-in-llms-using

References:
1. https://arxiv.org/abs/2407.09417 - Mitigating Entity-Level Hallucination in Large Language Models
2. https://arxiv.org/abs/2305.06983 - Active Retrieval Augmented Generation
3. https://arxiv.org/abs/2403.10081 - DRAGIN: Dynamic Retrieval Augmented Generation based on the Information Needs of Large Language Models
4. https://cookbook.openai.com/examples/using_logprobs - Using logprobs