✅ Generate evaluation datasets in 5 minutes.
✅ Evaluate agent quality without waiting for subject-matter experts to label data.
✅ Quickly identify and fix low-quality outputs.
Getting evals right is the first step toward building a working, production agent system. We help automate the *building* of evals!
Samuel Tan’s Post
More Relevant Posts
-
I created a system that lets me use o1-preview to rate the human-level quality of results from lesser models. I give it:
* The input
* The prompt
* The output
Then I use o1 to assess how well the lesser model did. The system is flexible, so you can swap in other models or upgrade the rating rubric. https://lnkd.in/gZ5RZPS4
Using the Smartest AI to Rate Other AI
danielmiessler.com
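The judge loop described above can be sketched in a few lines. This is a minimal illustration, assuming the OpenAI Python SDK and an API key in the environment; the rubric wording and the `judge` helper name are my own, not the author's actual implementation.

```python
# Minimal LLM-as-judge sketch: a stronger model rates a weaker model's output.
# Assumptions: OpenAI Python SDK installed, OPENAI_API_KEY set; model name
# and rubric text are illustrative.

def build_judge_prompt(task_input: str, prompt: str, output: str) -> str:
    """Assemble the rating request from the three pieces the post lists:
    the input, the prompt, and the lesser model's output."""
    return (
        "Rate the following model output for human-level quality on a 1-10 scale.\n\n"
        f"Input:\n{task_input}\n\n"
        f"Prompt given to the model:\n{prompt}\n\n"
        f"Model output:\n{output}\n\n"
        "Reply with a score and a one-paragraph justification."
    )

def judge(task_input: str, prompt: str, output: str, model: str = "o1-preview") -> str:
    """Send the assembled rating request to the judge model."""
    from openai import OpenAI  # pip install openai
    client = OpenAI()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": build_judge_prompt(task_input, prompt, output)}],
    )
    return resp.choices[0].message.content
```

Because the judge model is a parameter, upgrading the rater (as the post suggests) is just a string change.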
-
SR11-7 Principles and GenAI Model Validation. On a fireside chat today with Sri Satish Ambati, the founder and CEO of H2O.ai. A fun conversation on my favorite subject: model validation and model risk, this time specifically for GenAI. Drawing on the experience of predictive models, the principles of SR11-7 are still applicable. The evaluation metrics are different from predictive models (e.g., hallucination, toxicity, fairness), but the elements are similar.
Conceptual Soundness:
- Data suitability and quality (unfortunately, mostly neglected in much LLM training)
- Prompt design and testing (the analog of variable selection in predictive models)
- Interpretability: yes, we need to understand and evaluate the embeddings, and this can be done easily using dimensionality reduction, clustering, and visualization
- Benchmarking
Outcome Analysis:
- Identification of performance weaknesses (prompts, or clusters of prompts and their responses)
- Reliability/uncertainty of outcomes
- Robustness to prompt perturbation
- Resilience: performance under distribution drift
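One of the Outcome Analysis items, robustness to prompt perturbation, can be made concrete with a small harness. This is a toy sketch under my own assumptions: the perturbation (uppercasing one word) and the exact-match consistency score are illustrative stand-ins for the richer paraphrase suites a real validation would use, and `model` is any callable from prompt to response.

```python
# Toy robustness-to-prompt-perturbation check: perturb the prompt slightly
# and measure how often the model's answer stays the same as the baseline.
# The perturbation and scoring here are illustrative, not a validated suite.
import random

def perturb(prompt: str, rng: random.Random) -> str:
    """Apply a trivial surface perturbation: uppercase one word.
    Real validation suites use paraphrases, typos, reorderings, etc."""
    words = prompt.split()
    i = rng.randrange(max(len(words) - 1, 1))
    words[i] = words[i].upper()
    return " ".join(words)

def robustness_score(model, prompt: str, n: int = 5, seed: int = 0) -> float:
    """Fraction of perturbed prompts whose response matches the baseline.
    `model` is any callable str -> str (an API wrapper in practice)."""
    rng = random.Random(seed)
    baseline = model(prompt)
    matches = sum(model(perturb(prompt, rng)) == baseline for _ in range(n))
    return matches / n
```

A score well below 1.0 flags a prompt (or cluster of prompts) as a performance weakness worth investigating.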
-
Discover how AI-powered service models are revolutionizing troubleshooting processes in the tech industry! 🚀
AI-Powered Service Models Speed Troubleshooting
https://thenewstack.io
-
RouteLLM: How I Route to The Best Model to Cut API Costs via #TowardsAI → https://bit.ly/46mwWte
RouteLLM: How I Route to The Best Model to Cut API Costs
https://towardsai.net
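The cost-cutting idea behind routing can be sketched in a few lines: send easy queries to a cheap model and hard ones to a strong one. This is a hedged illustration with a toy difficulty heuristic and assumed model names; RouteLLM itself uses a learned router, not a rule like this.

```python
# Toy model router: cheap model for easy queries, strong model for hard ones.
# Model names and the difficulty heuristic are illustrative assumptions;
# RouteLLM's actual router is learned from preference data.

CHEAP_MODEL = "gpt-4o-mini"   # assumed name for the inexpensive model
STRONG_MODEL = "gpt-4o"       # assumed name for the expensive model

def choose_model(query: str, length_threshold: int = 40) -> str:
    """Route long or code-bearing queries to the strong model,
    everything else to the cheap one."""
    hard = len(query.split()) > length_threshold or "```" in query
    return STRONG_MODEL if hard else CHEAP_MODEL
```

Even a crude rule like this shows the shape of the savings: the expensive model is only billed for the fraction of traffic that needs it.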
-
I’ve been working pretty deeply in the context area, around Tools. At Astral we define two classes of Tools: Generative tools and Workflow tools. At implementation, a tool is a service that either enhances the context (Generative) or performs operations (Workflow) for the agent. Since each Agent in a multi-agent system (MAS) should have access to any tool, we are still in a world where traditional distributed architectures apply: the Agents and the Tools are simply services. https://lnkd.in/eV_SYN_B
Multi Agent Systems for Supply Chain: White Paper
start.astralinsights.ai
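The two tool classes can be sketched as a small service interface. All class and method names here are hypothetical illustrations of the Generative/Workflow split described above, not Astral's actual API.

```python
# Sketch of the two tool classes: a tool is a service the agent calls, which
# either enhances context (Generative) or performs an operation (Workflow).
# Names and payload shapes are hypothetical, not Astral's implementation.
from abc import ABC, abstractmethod

class Tool(ABC):
    """A tool is simply a service exposed to agents."""
    @abstractmethod
    def invoke(self, request: dict) -> dict: ...

class GenerativeTool(Tool):
    """Enhances the agent's context, e.g. retrieving relevant documents."""
    def invoke(self, request: dict) -> dict:
        query = request["query"]
        # Illustrative: return context snippets for the agent's prompt.
        return {"context": [f"document matching {query!r}"]}

class WorkflowTool(Tool):
    """Performs an operation on behalf of the agent, e.g. placing an order."""
    def invoke(self, request: dict) -> dict:
        return {"status": "executed", "action": request["action"]}
```

Because both classes share one service interface, the same distributed-systems machinery (discovery, load balancing, retries) covers Agents and Tools alike.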
-
If you are wondering about the intersection of Regulation and AI in finance, a great place to start is to understand the impact of SR11-7. Learn more about how Protect AI can help you meet these compliance elements, and build more #secureai by implementing #MLSecOps.
A geek who can speak: Co-creator of PiML, SVP Risk & Technology H2O.ai, Retired EVP-Head of Wells Fargo MRM
-
🔍 Debug faster with AI-driven log analysis! 🚀 Identify issues quickly using Logz.io—scan logs in seconds, visualize trends, and surface exceptions effortlessly. Ready to streamline your troubleshooting? 🌟 #LogManagement #AI #TechInnovation #observability
Log Management - Logz.io
logz.io
-
GenAI/LLM combined with a process (termed "RAG") of including the documents and data "context" you already have in your enterprise can be immensely effective at unlocking the power of GenAI. Like almost everything, though, the process of moving from testing to production gets more complicated. In the referenced article, the author, Wenqi, did an amazing job cataloging these points of challenge. I highly recommend the read. It does treat RAG as somewhat homogeneous with respect to data inclusion; in the production world there are reasons to combine traditional stores of structured and unstructured data with the advantages of vector DBs. That enhancement does not change how important Wenqi's suggestions are, or how clearly she labels the points in the process you and your engineers need to be aware of. https://lnkd.in/eiJB9EiZ
12 RAG Pain Points and Proposed Solutions
towardsdatascience.com
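The hybrid retrieval suggested above, combining a traditional store with a vector DB, can be sketched with in-memory stand-ins. Everything here is a toy assumption: both "stores" are plain dicts and the merge is a simple union, where a production system would query real databases and rerank.

```python
# Toy hybrid retrieval: combine keyword search over a traditional store with
# similarity search over a vector store, then merge the candidate sets.
# Both stores are in-memory stand-ins, not real database APIs.

def keyword_search(docs: dict[str, str], query: str) -> set[str]:
    """Traditional-store stand-in: return docs sharing any term with the query."""
    terms = set(query.lower().split())
    return {doc_id for doc_id, text in docs.items()
            if terms & set(text.lower().split())}

def vector_search(embeddings: dict[str, list[float]],
                  query_vec: list[float], k: int = 2) -> set[str]:
    """Vector-store stand-in: top-k doc ids by dot-product similarity."""
    scored = sorted(embeddings,
                    key=lambda d: -sum(a * b for a, b in zip(embeddings[d], query_vec)))
    return set(scored[:k])

def hybrid_retrieve(docs, embeddings, query, query_vec) -> set[str]:
    """Union the two candidate sets; production systems rerank the union."""
    return keyword_search(docs, query) | vector_search(embeddings, query_vec)
```

The point of the sketch is the shape, not the scoring: structured/keyword stores catch exact identifiers that embeddings miss, and the vector side catches semantic matches.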
-
🚀 LLM-Based Autonomous Agents: Profiling Module Following our last post on the architecture of LLM-based autonomous agents, today we're diving into the Profiling Module. 🎯 Role Definition: Embeds specific role profiles (coder, analyst, expert) into prompts to guide behavior. 🧠 Behavior Shaping: Influences how the agent processes information and makes decisions, ensuring consistency and relevance. 📅 Tools: Which tools the agent will use to achieve the desired results. By embedding detailed role profiles, the Profiling Module ensures personalized, efficient, and consistent interactions, enabling agents to perform tasks with human-like decision-making. Stay tuned for more insights on enhancing autonomous agent capabilities!
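The Profiling Module described above can be sketched as a small prompt-assembly step. The profile fields and wording are illustrative assumptions about how a role profile gets embedded into the prompt, not a specific framework's API.

```python
# Sketch of a Profiling Module: embed a role profile (role, goals, tools)
# into the prompt that guides the agent's behavior. Field names and prompt
# wording are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class RoleProfile:
    role: str                                   # e.g. "coder", "analyst", "expert"
    goals: list[str] = field(default_factory=list)
    tools: list[str] = field(default_factory=list)

def apply_profile(profile: RoleProfile, task: str) -> str:
    """Prepend the role profile to the task, shaping how the agent
    processes information and which tools it reaches for."""
    lines = [f"Act as: {profile.role}."]
    if profile.goals:
        lines.append("Your goals: " + "; ".join(profile.goals))
    if profile.tools:
        lines.append("Available tools: " + ", ".join(profile.tools))
    lines.append(f"Task: {task}")
    return "\n".join(lines)
```

Keeping the profile as data rather than hard-coded prompt text is what makes the behavior shaping consistent across interactions: the same profile yields the same framing for every task.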