Are you a Product Manager, Engineering Leader, or ML Enthusiast building LLM applications or looking for practical ways to evaluate them effectively? In my latest article, Practical Guidance for Evaluating Large Language Model (LLM) Products, I cover key topics such as why evaluation systems matter for LLM applications, how LLM evaluation differs from traditional ML evaluation, and methodologies for assessing relevance and faithfulness. I also dive into engineering considerations for optimizing business outcomes and allocating resources efficiently. I welcome your opinions and discussions; let's share insights on this ever-evolving field! https://lnkd.in/da6879Fd #LLM #LargeLanguageModel #MLEvaluation #MachineLearning #AI
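To make the faithfulness piece concrete, here is a minimal LLM-as-judge sketch. It is my own illustration, not the article's implementation: it assumes the `openai` Python package, an `OPENAI_API_KEY` in the environment, and an illustrative 1-5 scale.

```python
# Minimal LLM-as-judge faithfulness check -- an illustrative sketch only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge_faithfulness(context: str, answer: str) -> int:
    """Score 1 (contradicted) to 5 (fully supported) via a judge model."""
    prompt = (
        "Rate from 1 (contradicted) to 5 (fully supported) how faithful the "
        f"answer is to the context.\n\nContext:\n{context}\n\n"
        f"Answer:\n{answer}\n\nReply with a single digit."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    # Assumes the judge complies with the "single digit" instruction.
    return int(resp.choices[0].message.content.strip()[0])
```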
Haifeng Zhao’s Post
More Relevant Posts
-
Whether you're selecting models, fine-tuning, or moving to production, having a robust evaluation strategy is important for LLM applications. In this new article we discuss various strategies for figuring out if an LLM application is working correctly (or not), covering:
* Scaling evaluation from manual to automated testing
* Data sourcing strategies, including "Correct by Construction" (sketched below)
* Key metrics, from traditional NLP to LLM-as-Judge approaches
* Practical implementation considerations
https://lnkd.in/gsSEqdZQ Thanks to my co-authors Angeline Yasodhara and Rodrigo Ceballos Lentini #LLM #AI #MLOps #SoftwareEngineering #GenerativeAI
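As promised in the list above, a tiny sketch of the "Correct by Construction" idea: generate test inputs where the ground truth is known because you planted it yourself. The template and field names below are my own illustration, not the article's code.

```python
# "Correct by Construction" sketch: plant the answer, so ground truth is free.
import random

CITIES = ["Oslo", "Lima", "Osaka", "Accra"]

def make_case(seed: int) -> dict:
    rng = random.Random(seed)
    city = rng.choice(CITIES)
    return {
        "document": f"Trip report: the 2023 offsite was held in {city}.",
        "question": "Where was the 2023 offsite held?",
        "expected": city,  # correct by construction -- we inserted it
    }

cases = [make_case(i) for i in range(100)]
# Any answer other than cases[i]["expected"] is a failure by construction.
```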
Measurably Correct: Strategies for LLM Evaluation.
medium.com
-
Level up your LLM game: choosing the right metrics for success
Building effective Large Language Models (LLMs) requires a data-driven approach. But choosing the right metrics to evaluate success can be tricky! This post breaks it down for you:
Why it matters: measuring LLM performance is crucial, whether you're in research or production. It helps you:
* Deliver the user experience you envisioned (production)
* Validate your LLM's capabilities (research)
Picking your champions: the best metrics depend on your LLM's purpose:
* Classical tasks (classification): look to libraries like torchmetrics and sklearn.metrics (see the sketch below).
* Generative tasks:
  * RAG: the RAGAS library is your friend!
  * Code generation: evaluate both execution accuracy and efficiency.
* Limited ground truth? Build an LLM evaluator on a smaller dataset for broader assessment.
Want to dive deeper? Stay tuned for a comprehensive metrics list by task category! #LLMs #MachineLearning #AI #Metrics
P.S. Feel free to share your favorite LLM evaluation tricks in the comments! https://lnkd.in/gMagVxna
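For the classical-task bullet, a minimal sklearn.metrics example. The labels are toy data; in practice y_true comes from your labeled eval set and y_pred from the model.

```python
# Classification-style LLM eval with sklearn.metrics (toy labels).
from sklearn.metrics import accuracy_score, f1_score

y_true = ["positive", "negative", "neutral", "positive"]
y_pred = ["positive", "neutral", "neutral", "positive"]  # model outputs

print(accuracy_score(y_true, y_pred))             # fraction exactly right
print(f1_score(y_true, y_pred, average="macro"))  # macro-averaged F1
```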
Evaluating LLM systems: Metrics, challenges, and best practices
medium.com
-
Diving into the world of Large Language Models (LLMs), it's fascinating to see how much goes into optimizing them for performance and scalability. Techniques like prompt engineering, model pruning, and even load balancing are essential to make sure these models run efficiently in real-world applications. These models have incredible power and potential, and that shows just how important it is to master these optimization strategies. Learning this isn't just about building something cool; it's about making sure it works at scale. #AI #MachineLearning #LLMs #DeveloperLife https://lnkd.in/epHddQZg
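To give the load-balancing point some flavor, here is a toy round-robin sketch over model replicas. The endpoint URLs and response schema are hypothetical placeholders, not from the article.

```python
# Toy round-robin load balancer over hypothetical LLM replica endpoints.
from itertools import cycle
import requests

ENDPOINTS = cycle([
    "http://llm-replica-1:8000/generate",  # placeholder URLs
    "http://llm-replica-2:8000/generate",
])

def generate(prompt: str) -> str:
    url = next(ENDPOINTS)  # rotate replicas to spread request load
    resp = requests.post(url, json={"prompt": prompt}, timeout=30)
    resp.raise_for_status()
    return resp.json()["text"]  # assumed response schema
```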
Optimizing Your LLM for Performance and Scalability - KDnuggets
kdnuggets.com
-
Unleash the Power of LLMs: Tailoring General Models for Specialized Tasks
Fine-tuning Large Language Models is the key to optimizing AI performance for domain-specific tasks, from legal contracts to sentiment analysis. #AI #LLM #Finetuning https://lnkd.in/dsd-H5T8
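For a taste of what fine-tuning looks like in practice, a hedged sketch using Hugging Face PEFT with LoRA; the base model and hyperparameters are illustrative choices, not the article's recipe.

```python
# Parameter-efficient fine-tuning sketch with PEFT/LoRA (illustrative only).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in base model
config = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"],
                    task_type="CAUSAL_LM")
model = get_peft_model(base, config)   # only small adapter weights will train
model.print_trainable_parameters()     # typically well under 1% of the model
```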
Unleashing the Power of LLMs: Fine-tuning for Tailored Perfection
nikhilakki.in
-
"Generative AI and large language models (LLMs) like GPT-4, Llama, and Claude have pathed a new era of AI-driven applications and use cases. However, evaluating LLMs can often feel daunting or confusing with many complex libraries and methodologies, It can easily get overwhelming. LLM Evaluation doesn't need to be complicated. You don't need complex pipelines, databases or infrastructure components to get started building an effective evaluation pipeline." https://lnkd.in/dk9FiAVH.
LLM Evaluation doesn't need to be complicated
philschmid.de
-
The supply of quality, real-world data used to train generative A.I. models appears to be dwindling as digital publishers increasingly restrict access to their public data. That means the advancement of large language models like OpenAI’s GPT-4 and Google’s Gemini could hit a wall once the A.I.s have scraped all the remaining data on the internet. To address the growing A.I. training data crisis, some experts are considering synthetic data as a potential alternative. Read more: https://lnkd.in/exifvztU By Aaron Mok
Can Synthetic Data Help Solve Generative A.I.’s Training Data Crisis?
https://observer.com
-
Selective State Space Models (SSMs) offer a promising alternative to large language models like GPT, addressing key challenges such as computational inefficiency on long sequences and high energy use. SSMs process long data sequences efficiently and focus on the most relevant information, making them faster and more resource-efficient. This adaptability makes SSMs ideal for tasks like legal document review, e-discovery, and legal research. Models like Mamba, which are built on SSMs, appear to outperform traditional models in handling extensive data, offering significant improvements in performance and practicality for legal tech applications. Read more about SSMs in my recent blog post: #ssm #llms #legaltech #artificialintelligence
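To make the "selective" part concrete, here is a toy NumPy sketch of the core recurrence. It is my own illustration only: real Mamba adds discretization, vector-valued channels, and a hardware-aware parallel scan.

```python
# Toy selective state-space recurrence: B_t and C_t depend on the input,
# which is what lets the model emphasize relevant tokens.
import numpy as np

def selective_ssm(x, d_state=4, seed=0):
    """x: (T,) input sequence -> (T,) outputs via a linear recurrence."""
    rng = np.random.default_rng(seed)
    A = np.exp(-np.abs(rng.normal(size=d_state)))  # decay in (0,1): stable
    W_B = rng.normal(size=d_state) * 0.1
    W_C = rng.normal(size=d_state) * 0.1
    h = np.zeros(d_state)
    ys = []
    for x_t in x:
        B_t = W_B * x_t            # "selective": B_t depends on the input
        C_t = W_C * x_t            # ...and so does the readout C_t
        h = A * h + B_t * x_t      # O(1) memory per step, O(T) total work
        ys.append(C_t @ h)
    return np.array(ys)

print(selective_ssm(np.linspace(-1.0, 1.0, 16)))
```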
Selective State Space Models, GPT but better?
ryanmcdonough.co.uk
-
https://lnkd.in/dP9YXYp7 This article explains how to use an LLM (Large Language Model) to chunk a document based on the concept of an "idea". I use OpenAI's gpt-4o model for the example, but the same approach can be applied with any other LLM, such as those from Hugging Face, Mistral, and others. Everyone can access this article for free.
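A hedged sketch of what idea-based chunking can look like with the OpenAI API; the prompt wording and JSON schema here are my assumptions, not necessarily the article's code.

```python
# Idea-based chunking sketch: ask the model for chunk boundaries as JSON.
import json
from openai import OpenAI

client = OpenAI()

def chunk_by_idea(document: str) -> list[str]:
    prompt = (
        "Split the document into coherent chunks, one per distinct idea. "
        'Return JSON: {"chunks": ["..."]}.\n\n' + document
    )
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # force parseable output
        temperature=0,
    )
    return json.loads(resp.choices[0].message.content)["chunks"]
```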
Efficient Document Chunking Using LLMs: Unlocking Knowledge One Block at a Time
towardsdatascience.com
-
🔆 Exciting approach to evaluating LLMs on factuality: DeepMind's Search-Augmented Factuality Evaluator (SAFE):
🖌 Automated evaluation: employs an LLM to assess the factuality of long-form text generated by LLMs.
🖌 Fact verification: breaks down text into individual claims and uses Google Search to verify their accuracy.
🖌 Independent operation: reduces the reliance on human annotators for evaluating LLM outputs.
🖌 Higher agreement rate: demonstrates superior agreement with human judgments compared to individual human annotators.
🖌 F1@K metric: extends the traditional F1 score to measure the overall factuality of long-form responses, balancing precision and recall based on desired response length (sketched below).
🖌 Open access: the LongFact dataset and SAFE code are available on GitHub for further research and development.
🖌 Potential for LLMs: showcases the ability of LLMs to not only generate content but also evaluate and improve the quality of their own outputs.
You can read more about it here - https://lnkd.in/gcXrzzJ6 #LLM #GenAI #AI #ML #Learning #Datascience
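The F1@K idea is easy to sketch: precision is the fraction of claims that are supported, and recall treats K as the number of supported facts a user wants in the response. This is my reading of the paper, a sketch rather than the official implementation.

```python
# F1@K sketch: balances claim-level precision against a length target K.
def f1_at_k(supported: int, total_claims: int, k: int) -> float:
    if supported == 0:
        return 0.0
    precision = supported / total_claims  # supported share of claims made
    recall = min(supported / k, 1.0)      # capped: K supported facts suffice
    return 2 * precision * recall / (precision + recall)

print(f1_at_k(supported=18, total_claims=20, k=64))  # ~0.43
```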
Long-form factuality in large language models
arxiv.org
-
How to Interpret GPT2-Small: mechanistic interpretability of repeated-token prediction, by Shuyang Xiang
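One way to reproduce the repeated-token setup, using TransformerLens (my tooling choice; the article's own code may differ):

```python
# Repeated-token probe on GPT2-small with TransformerLens -- a sketch only.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")   # GPT2-small
seq = torch.randint(1000, 5000, (1, 20))            # random token ids
repeated = torch.cat([seq, seq], dim=1)             # [A B C ... A B C ...]

logits = model(repeated)                            # (1, 40, vocab)
log_probs = logits[:, :-1].log_softmax(-1)
nll = -log_probs.gather(-1, repeated[:, 1:, None]).squeeze(-1)

# If the model exploits the repetition, loss should drop on the second copy.
print("first copy :", nll[0, :19].mean().item())
print("second copy:", nll[0, 20:].mean().item())
```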
How to Interpret GPT2-Small
towardsdatascience.com
Fascinating insights on LLM evaluations. Assessing relevance and faithfulness is critical. Haifeng Zhao