As engineering leaders integrate LLMs, robust tracking and monitoring are crucial for:
🔐 Mitigating risks (security, privacy, compliance)
💰 Optimizing costs and performance
🔍 Enabling debugging and model management

🌟 Monitoring LLMs with woven/instrumented agents and AOP:
- 🤖 Agents monitor LLM calls from separate scripts
- 🧩 AOP intercepts calls without changing the original code
- 🤝 Engineers focus on LLM development while AOP handles the monitoring
- 🌐 Works with any LLM framework and follows OpenTracing standards
- 📈 Integrates with Jaeger for end-to-end tracing and visualization
- 🔍 Tracks prompts, arguments, and replies

Benefits of AOP for LLM applications:
- Separation of Concerns
- Non-invasive Instrumentation
- Centralized Monitoring
- Consistency and Standardization
- Flexibility and Extensibility

💡 Best Practices for Planning Agentic Systems:
1. ⚠️ Watch out for vague sub-tasks, chain-of-thought (CoT) failures, and expensive tree-of-thoughts (ToT) prompting
2. ✅ Use a planner with a feedback loop
3. 🧩 Decouple components for production-grade agents
4. 🎯 Use domain-specific models
5. 🌊 Provide continuous memory support for planning
6. 🔍 Explore symbolic planning and external memory

🚀 Combining agent monitoring with agentic planning gives you reliable, efficient, and optimized LLM applications with comprehensive tracking, shared responsibility, and ethical oversight. 🌟
#LLMMonitoring #AOP #AgenticSystems #OpenTracing #Jaeger #LLM #LLMGuardRails #LangChain

Engineers just write the code below, and the AOP magic kicks in (see the video):

from langchain.chains import LLMChain

llm_chain = LLMChain(prompt=prompt, llm=llm)  # assumes `prompt` and `llm` are already defined
question = "What is the capital of France?"
result = llm_chain({"question": question})

See the attached demo.
#Gonnect #LLMMonitoring #AOP #AgenticSystems
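Not part of the original post: a minimal sketch of what such AOP-style interception could look like, assuming the OpenTracing-compatible jaeger_client package and a running Jaeger agent. The name trace_llm_call is hypothetical, not a LangChain or Jaeger API; the point is that the chain code above stays untouched while a separate aspect records prompts and replies as spans.

from jaeger_client import Config

# Initialize an OpenTracing tracer that reports spans to Jaeger.
tracer = Config(
    config={"sampler": {"type": "const", "param": 1}, "logging": True},
    service_name="llm-monitor",
).initialize_tracer()

def trace_llm_call(chain):
    # Return a wrapper that records each call as a span without modifying the chain.
    def wrapper(inputs):
        with tracer.start_active_span("llm_chain.call") as scope:
            scope.span.set_tag("llm.prompt", str(inputs))
            result = chain(inputs)
            scope.span.log_kv({"llm.reply": str(result)})
            return result
    return wrapper

traced_chain = trace_llm_call(llm_chain)      # the aspect, applied at the call site
result = traced_chain({"question": question})

In a real setup the wrapping would live in its own module (the "aspect"), so application code never imports the tracer directly; that is the separation of concerns the post describes.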
Gaurav Malhotra’s Post
More Relevant Posts
-
LLMs generate the answer one token at a time, and the second token isn't known until the first token has been generated. Knowing this is crucial for understanding why LLMs produce nonsensical answers when they try to explain the unexplainable. The screenshot below illustrates the problem perfectly. The user asks whether a number is prime or not. The LLM generates the first word, which is "No" in this case. That's it. There's no way back. It will continue generating a logical explanation of a wrong answer, which is impossible to do correctly. The chance of generating a wrong first token is never zero, so situations like this are inevitable. And there's nothing to laugh at if you understand how LLMs work. Source: https://lnkd.in/gsZGaC-j
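For readers who want to see the mechanism rather than take it on faith, here is a toy greedy-decoding loop with a small open model (GPT-2 via Hugging Face transformers). It is illustrative only and not how ChatGPT is served, but it makes the key point concrete: each new token is chosen from a distribution conditioned on the prompt plus the tokens already emitted, and once appended it is never revised.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("Is 3307 a prime number?", return_tensors="pt").input_ids
for _ in range(20):                                        # generate 20 tokens, one at a time
    logits = model(input_ids).logits                       # scores for the next token only
    next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
    input_ids = torch.cat([input_ids, next_id], dim=-1)    # committed, no way back

print(tokenizer.decode(input_ids[0]))

If the first generated token happens to be "No", every later token is conditioned on that "No", which is exactly the failure mode the screenshot shows.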
-
Like all neural networks, regardless of how complex they may be, LLMs (yes, they're based on Transformers, which are a particular type of neural network) are great approximators of continuous functions - that's why they handle the nuances and fluidity of the probabilistic world so well. That's also why they handle so badly (not to say that they don't handle it at all!) the world of discrete (in the mathematical sense) problems, such as number theory, of which reasoning about prime numbers is just one instance. In other words, LLMs can solve arbitrarily complex polynomial approximation problems, but it's unrealistic to expect them to solve discrete combinatorial problems (otherwise, wouldn't we effectively be proving P = NP?). I continue to argue that the road to AGI needs both continuous and discrete ways of representing knowledge and reasoning. Food for thought.
-
Correction: ChatGPT seems to use beam search to overcome the deficiency described in the original post, so my initial reading of the post, or the original post itself, may not be entirely correct. Like older sequence-to-sequence LSTM/RNN models, GPT can use beam search to get globally relevant answers. So the problem shown is more likely related to the model's domain knowledge than to the prediction algorithm.
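Whether ChatGPT actually decodes with beam search is not public, so treat the following purely as an illustration of the concept: in the Hugging Face transformers API, switching from greedy decoding to beam search is a single argument, and beam search keeps several candidate continuations alive instead of committing to the single most likely next token.

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("Is 3307 a prime number?", return_tensors="pt")

greedy = model.generate(**inputs, max_new_tokens=20)               # one path, token by token
beamed = model.generate(**inputs, max_new_tokens=20, num_beams=5)  # 5 candidate paths kept

print(tokenizer.decode(greedy[0], skip_special_tokens=True))
print(tokenizer.decode(beamed[0], skip_special_tokens=True))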
-
This is an essential point, but it has an easy solution. Sure, LLMs generate their answers one word at a time and can't look back at what they're writing as they're writing it. But the key phrase here is "as they're writing it". You can simply ask them to reflect on their answer after they've provided it. Just follow all questions like this with "Are you sure? I can't check it myself". That forces it to look at its answer, and in my experience it nearly always self-corrects. The best illustration of this for me was when GPT-4 first launched: I asked it to "give me 10 highly cited papers on the link between income and happiness". GPT-3.5 flat out invented 6 of them! But GPT-4 got 8 correct. The 2 it got wrong were because it accidentally swapped the author names or got the journal wrong. But when I simply followed up with "Are all these references correct?", it easily corrected both mistakes.
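A minimal sketch of that "Are you sure?" follow-up, using the classic LangChain chat interface; the exact imports and model name depend on your LangChain version, so treat them as assumptions. The second turn feeds the model its own first answer and asks it to re-check.

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

chat = ChatOpenAI(model_name="gpt-4", temperature=0)

question = HumanMessage(content="Give me 10 highly cited papers on the link between income and happiness.")
first = chat([question])                     # initial answer, possibly with wrong references

followup = HumanMessage(content="Are all these references correct? I can't check them myself.")
second = chat([question, first, followup])   # the model reviews its own output

print(second.content)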
-
An LLM is not what you think it is. It works by predicting each token from the previous ones, so it can be wrong, and badly wrong at that. The only real fix would be to verify the prediction at each token-generation step. #llm #LanguageModel #AI #chatgpt #generativeAI
-
Well, ardent proponents of LLMs would say, "Change your prompt / do prompt engineering". LLMs internally are auto-regressive machines. Tokens are generated sequentially based on probability. There's a small but non-zero chance that the model picks an incorrect token, such as the first one. The second token is based on conditional probability given the first one and the context. So it's entirely possible to get an utterly nonsensical response from them. Funnily enough, if you add "a" and modify the prompt to "Is 3307 a prime?", you can get a more sensible response, highlighting the importance of domain knowledge and the miles still to cover before we can expect LLMs to fully grasp nuanced queries.
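The "conditional probability given the first one and the context" can be written out explicitly. This is the standard autoregressive factorization, not anything specific to this thread:

P(y_1, \dots, y_T \mid x) = \prod_{t=1}^{T} P(y_t \mid x, y_1, \dots, y_{t-1})

A single unlucky draw of y_1 (for example "No") then conditions every later factor, which is exactly the failure mode in the screenshot.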
-
This is critical to how LLMs operate. The tokens generated by the LLM depend on the ones before them, so if the first token is incorrect, the model will keep generating the wrong response. I am sharing links to two quick experiments that I performed. I would have loved to share the images and video clips, but the following links will do:
1. 3307 is not a prime number https://lnkd.in/dBi-y9ic
2. 3307 is a prime number https://lnkd.in/dipkmPam
How do you prevent such hallucinations? CoT, or Chain-of-Thought, is one of the processes used to ground the LLM and make it question its assumptions. In the first instance, I asked the LLM to check its assumptions and gave it the information that prime numbers cannot be divided by primes smaller than itself. This made it reevaluate its statement and provide the correct answer. 😀
FYI, anthropomorphism is the attribution of human traits, emotions, or intentions to non-human entities, including animals, deities, and objects. The term is often used to describe how people perceive inanimate objects or animals as having human-like qualities or characteristics.
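Not from the thread, but worth spelling out: the discrete check itself is trivial to compute exactly, which is why grounding the LLM with a deterministic tool (or a CoT step that mirrors this logic) works so well. Trial division up to the square root settles the question; the helper name is mine.

import math

def is_prime(n: int) -> bool:
    # Deterministic trial-division primality test for small n.
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True

print(is_prime(3307))  # True: 3307 has no divisor between 2 and its square root (~57)

An agent that calls a verifier like this, instead of trusting its own first token, cannot be argued into the wrong answer.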
-
That makes a lot of sense, and here is why. The statement "cannot be divided" is problematic, because primes in fact can be divided; the result just has pieces smaller than one whole (that is, .something). 5 metres divided by 2 is 2.5; we can't say we can't divide when we use those numbers, right? If you still can't make the connection, remember those jug-and-water videos where parents teach kids how to communicate like code: if they miss a step, the water goes all over the place. Any child with a calculator in the class would be able to understand that prime numbers can be divided, but the teacher said "because I said so." Today we say "I said so" to AI; tomorrow it's "A robot may not injure a human being or, through inaction, allow a human being to come to harm." "But this is not an injury, because I said so..." AI showed us that we can't use a teacher's (or parent's) authority to correct mistaken information. Today the opportunity is fixing the definition; tomorrow it will be something else. #ai #future #education #communication
-
This is not what I would use an LLM for, but Andriy provides a great example of a larger principle. In real life, you would normally perform steps like:
🔍📋 ask the model to verify its answers or to show the steps of its reasoning process
✅🔄 have the model cross-check its initial response before finalising it
🤝🤖🔢 use ensemble methods, in which multiple models generate answers independently and a final decision is made by consensus or majority voting (a sketch follows below)
None - NONE - of these will eliminate the problem entirely. "Generative AI is therefore useless" - well, now re-paste the prompt 1,000 times into different models and then ask 1,000 average working humans the same problem. One thing I notice in my discussions is the assumption that workforces are filled with 100% accurate, never-copy-and-paste, perfectly reliable human beings. Generative AI lifts boats in the water.
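A hedged sketch of the majority-voting idea from the list above. The specific model and the normalize helper are assumptions for illustration; the essence is simply sampling several independent answers at non-zero temperature and keeping the most common one.

from collections import Counter
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

chat = ChatOpenAI(model_name="gpt-4", temperature=1.0)   # temperature > 0 so samples differ
question = HumanMessage(content="Is 3307 a prime number? Answer Yes or No only.")

def normalize(text: str) -> str:
    # Collapse a free-text answer to "yes"/"no" for voting (hypothetical helper).
    return "yes" if text.strip().lower().startswith("yes") else "no"

votes = Counter(normalize(chat([question]).content) for _ in range(5))
answer, count = votes.most_common(1)[0]
print(f"majority answer: {answer} ({count}/5 votes)")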
-
E X A C T L Y - it's ruining the "AI" brand, when there are tons of interesting non-LLM machine learning uses that will get tarred along with these hucksters. Also, a shameless plug for anyone who wants to learn more: our episode on how these ML and LLM solutions work: https://lnkd.in/g6erDD99