Different perspectives on "dealing with agent-based solutions"

People either:
- do not care about "the Metrics": just connect, run and deploy
- do care about "the Metrics" and spend effort on them
- do care about "the Experience" and "the Dimension" of the problem the solution solves, versus not caring and just building and deploying

So what is "the Metrics"? (For the problem we solve, not generic benchmarks.)
- The usual RAG evaluation? Or do agents that plan / ReAct need a different measurement, since not all agents are RAG?
- Conventional scores such as accuracy (weighted where needed) or precision, comparing the expected solution with the agent's solution at the end of its actions?
- Confidence / conformal-style scores for agents when they are used for prediction?

And what is "the Experience"?
- Is a Human in the Loop (HITL) needed?
- How often and where do we use it? Does it leave the user annoyed or enabled?
- Does the HITL design benefit the system, or is it just wasted effort?

And what is "the Dimension" of the solution?
- Say I want to solve the problem "Why did I receive a lower redemption amount than what I had invested in mutual funds?" (A common user may not know the details, but it is a real problem, and an agent is put in place to solve it.)
- Suppose the agent returns information retrieved from an article but does not move to the next layer of action, which is checking the context/details of the user's account (provided that flow of actions was put in place when the agent was defined). The article-based output is still relevant, but the agent did not finish the entire loop.
- Or the agent planned wrongly and called another agent for help... and so it goes on.
- Ultimately, only a partial solution is attempted. You can face the same thing in agentic code editors / app builders (Replit, v0, ...): they fail to nail the right error / module.
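One way to make the "partial solution" problem measurable is to compare the steps an agent actually ran against the flow it was expected to finish. The sketch below is only an illustration: the step names (`retrieve_article`, `fetch_account_details`, `explain_redemption_gap`, `call_tax_agent`) and the `score_trajectory` helper are hypothetical, and the greedy in-order match is one simple choice among many.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class CompletionReport:
    completed: list[str]   # expected steps the agent reached, in order
    missing: list[str]     # expected steps the agent never reached
    unexpected: list[str]  # steps the agent ran that were not in the flow
    score: float           # fraction of expected steps completed


def score_trajectory(expected: list[str], actual: list[str]) -> CompletionReport:
    """Greedy in-order match of expected steps against the agent's actual steps."""
    completed: list[str] = []
    pos = 0
    for step in expected:
        try:
            idx = actual.index(step, pos)  # search only after the last match
        except ValueError:
            continue  # step not reached; keep looking for later ones
        completed.append(step)
        pos = idx + 1
    missing = [s for s in expected if s not in completed]
    unexpected = [s for s in actual if s not in expected]
    score = len(completed) / len(expected) if expected else 1.0
    return CompletionReport(completed, missing, unexpected, score)


# Hypothetical flow for the mutual-fund redemption question.
expected_flow = ["retrieve_article", "fetch_account_details", "explain_redemption_gap"]
# The agent stopped at the article and then called the wrong agent.
agent_run = ["retrieve_article", "call_tax_agent"]

report = score_trajectory(expected_flow, agent_run)
print(report.missing)     # steps that show the loop was not finished
print(report.unexpected)  # detours such as a wrongly called agent
```

A score like this says nothing about whether the final answer was good; it only shows how far through the intended loop the agent got, which is the gap described above.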
There is one more issue, a knock-on effect of the engineer's bias. Take the case where the agent has to build a regression model. The bias starts when the engineer fails to set the right expectations and instructions, which need to be mutually exclusive and collectively exhaustive. For example, the engineer fails to guide how to plan: checking distributions, tuning hyper-parameters, or even selecting the right feature engineering. The how and why may not be given explicitly, but they need to be triggered implicitly.

Using agents in place of deterministic decision making might still be challenging in the enterprise context.

Shouldn't we focus on everything? Especially the Experience and Metrics parts.
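On the "confidence / conformal" question for agents used for prediction, such as the regression case: a minimal sketch of split conformal prediction, assuming the agent's model has already been fitted and we hold back a calibration set it never saw. The residual values and the helper names `conformal_quantile` and `prediction_interval` are made up for illustration, and the coverage guarantee relies on calibration and new data being exchangeable.

```python
import math


def conformal_quantile(residuals: list[float], alpha: float) -> float:
    """Finite-sample-corrected (1 - alpha) quantile of absolute residuals."""
    n = len(residuals)
    scores = sorted(abs(r) for r in residuals)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the score to use
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return scores[k - 1]


def prediction_interval(point_prediction: float, q: float) -> tuple[float, float]:
    """Symmetric interval around the model's point prediction."""
    return (point_prediction - q, point_prediction + q)


# Hypothetical calibration residuals (actual minus predicted) on held-out data.
calibration_residuals = [0.5, -1.0, 0.2, 1.5, -0.3, 0.8, -0.6, 0.1, 1.1]

q = conformal_quantile(calibration_residuals, alpha=0.25)  # target: 75% coverage
low, high = prediction_interval(10.0, q)
print(q, low, high)
```

The width of the interval gives a per-prediction signal that can be reported alongside accuracy, or used to decide when to hand over to a human.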