Avi Chawla’s Post

Co-founder @ Daily Dose of Data Science (120k readers) | Follow to learn about Data Science, Machine Learning Engineering, and best practices in the field.

I don't rely on Accuracy in multiclass classification settings to measure model improvement 🧩

Consider probabilistic multiclass classification models. Using "Accuracy" as a signal of model improvement can be deceptive. It can mislead you into thinking that you are not making any progress, when in fact you are making good progress... ...but "Accuracy" is not reflecting that (YET).

The problem is that Accuracy only checks whether the top prediction is correct. During iterative model building, the model might not yet assign the highest probability to the true label... ...but it might be quite confident in placing the true label among its top "k" output probabilities.

Thus, a "top-k accuracy score" can be a much better indicator of whether your model improvement efforts are translating into meaningful gains in predictive performance.

For instance, if top-3 accuracy increases from 75% to 90%, it is clear that the improvement technique was effective:
- Earlier, the correct label was in the top 3 predictions only 75% of the time.
- Now, the correct label is in the top 3 predictions 90% of the time.

This lets you direct your engineering efforts in the right direction.

Of course, this should ONLY be used to assess model improvement efforts, because true predictive power is still determined by traditional Accuracy.

As depicted in the image below:
- "Top-k Accuracy" may continue to increase during model iterations. This reflects improvement in performance.
- Accuracy, however, may stay the same during successive improvements. Nonetheless, we can be confident that the model is getting better and better.

For a more visual explanation, check out this issue: https://lnkd.in/dP_h8SFM
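A minimal sketch of the idea, using scikit-learn's `top_k_accuracy_score`. The labels and predicted probabilities below are made up for illustration; they show a model whose plain accuracy looks weak even though the true label is usually near the top of its ranking:

```python
import numpy as np
from sklearn.metrics import top_k_accuracy_score

# True class labels (3-class problem) — illustrative toy data.
y_true = np.array([0, 1, 2, 2])

# Each row: predicted probabilities for classes 0, 1, 2.
y_score = np.array([
    [0.50, 0.30, 0.20],  # true class 0 ranked 1st
    [0.40, 0.35, 0.25],  # true class 1 ranked 2nd
    [0.20, 0.50, 0.30],  # true class 2 ranked 2nd
    [0.60, 0.30, 0.10],  # true class 2 ranked 3rd
])

# k=1 is plain accuracy: only the top prediction counts.
acc = top_k_accuracy_score(y_true, y_score, k=1)
# k=2 credits the model when the true label is in its top 2.
top2 = top_k_accuracy_score(y_true, y_score, k=2)

print(acc, top2)  # 0.25 0.75
```

Plain accuracy says the model is right only 25% of the time, while top-2 accuracy reveals it already ranks the true label highly 75% of the time — exactly the kind of progress signal the post describes.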
--
👉 Get a Free Data Science PDF (530+ pages) by subscribing to my daily newsletter today: https://lnkd.in/gzfJWHmu
--
👉 Over to you: What are some other ways to assess model improvement efforts?

[Image: line chart comparing Accuracy and Top-k Accuracy across model iterations]
Gia N. · Sr Data Scientist · 4mo

This metric is actually interesting. Btw, would it make more sense to use another metric, like F1, to assess improvement instead?
