Avi Chawla’s Post

Co-founder @ Daily Dose of Data Science (120k readers) | Follow to learn about Data Science, Machine Learning Engineering, and best practices in the field.

9mo

Evaluating model improvements with Accuracy can be misleading 🧩

The efficacy of a model improvement step is best determined using performance metrics. However, improving probabilistic multiclass-classification models using "Accuracy" as a signal can be deceptive.

In other words, it is possible that we are actually making good progress in improving the model...
...but “Accuracy” is not reflecting that (YET).

The problem arises because Accuracy only checks if the prediction is correct or not. And during iterative model building, the model might not be predicting the true label with the highest probability...
...but it might be quite confident in placing the true label in the top "k" output probabilities.

Thus, using a "top-k accuracy score" can be a much better indicator to assess whether my model improvement efforts are translating into meaningful enhancements in predictive performance or not.

For instance, if top-3 accuracy increases from 75% to 90%, it is clear that the improvement technique was effective:
- Earlier, the correct prediction was in the top 3 labels only 75% of the time.
- But now, the correct prediction is in the top 3 labels 90% of the time.

Thus, one can effectively direct the engineering efforts in the right direction.

Of course, what I am saying should ONLY be used to assess the model improvement efforts. This is because true predictive power will be determined using traditional Accuracy. So make sure you are gradually progressing on the Accuracy front too.

As depicted in the image below:
- It is expected that “Top-k Accuracy” may continue to increase during model iterations. This reflects improvement in performance.
- Accuracy, however, may stay the same during successive improvements. Nonetheless, we can be confident that the model is getting better and better.

For a more visual explanation, check out this issue: https://lnkd.in/dP_h8SFM.
--
👉 Get a Free Data Science PDF (550+ pages) with 320+ posts by subscribing to my daily newsletter today: https://lnkd.in/gzfJWHmu
--
👉 Over to you: What are some other ways to assess model improvement efforts?
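The top-k idea in the post above can be sketched in a few lines of plain Python. This is only an illustration: the function name `top_k_accuracy`, the two "model versions" and all probabilities are made-up assumptions, not data from the post. It shows a case where top-1 accuracy stays flat between iterations while top-2 accuracy rises.

```python
def top_k_accuracy(probs, labels, k):
    """Fraction of samples whose true label is among the k highest-probability classes."""
    hits = 0
    for row, label in zip(probs, labels):
        # Class indices sorted by predicted probability, highest first.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical data: 4 samples, 4 classes, true labels given as class indices.
labels = [0, 1, 2, 3]

# Predicted probabilities from an earlier model iteration (assumed values).
probs_v1 = [
    [0.50, 0.30, 0.15, 0.05],  # true class 0 is ranked 1st
    [0.40, 0.10, 0.30, 0.20],  # true class 1 is ranked 4th
    [0.40, 0.30, 0.10, 0.20],  # true class 2 is ranked 4th
    [0.10, 0.20, 0.30, 0.40],  # true class 3 is ranked 1st
]

# Predicted probabilities after an improvement step (assumed values).
probs_v2 = [
    [0.50, 0.30, 0.15, 0.05],  # true class 0 is ranked 1st
    [0.40, 0.35, 0.15, 0.10],  # true class 1 is now ranked 2nd
    [0.40, 0.15, 0.35, 0.10],  # true class 2 is now ranked 2nd
    [0.10, 0.20, 0.30, 0.40],  # true class 3 is ranked 1st
]

for name, probs in [("v1", probs_v1), ("v2", probs_v2)]:
    print(name,
          "top-1:", top_k_accuracy(probs, labels, 1),
          "top-2:", top_k_accuracy(probs, labels, 2))
# v1 top-1: 0.5 top-2: 0.5
# v2 top-1: 0.5 top-2: 1.0
```

In practice, scikit-learn provides `sklearn.metrics.top_k_accuracy_score` for the same computation; the hand-rolled version here just keeps the example dependency-free.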

3 Comments

Avi Chawla

Co-founder @ Daily Dose of Data Science (120k readers) | Follow to learn about Data Science, Machine Learning Engineering, and best practices in the field.

9mo

Find a more vivid explanation here: https://www.blog.dailydoseofds.com/p/how-to-reliably-improve-probabilistic

1 Reaction

Kevin Ruiz

Data Scientist & Business Intelligence Strategist | Legal Operations Specialist | Driving Impact with AI, Machine Learning, and Advanced Analytics | Advocate for Ethical AI Practices

9mo

Thank you, this is important.

Carlos Hernández

Chemical Engineer-Data Analyst JR-Chemical Analyst

9mo

Thanks for posting


More Relevant Posts

Avi Chawla

Co-founder @ Daily Dose of Data Science (120k readers) | Follow to learn about Data Science, Machine Learning Engineering, and best practices in the field.
4mo
I don't rely on Accuracy in multiclass classification settings to measure model improvement 🧩

Consider probabilistic multiclass classification models. Using "Accuracy" as a signal to measure model improvement can be deceptive. It can mislead you into thinking that you are not making any progress in improving the model.

In other words, it is possible that we are actually making good progress in improving the model...
...but “Accuracy” is not reflecting that (YET).

The problem arises because Accuracy only checks if the prediction is correct or not. And during iterative model building, the model might not be predicting the true label with the highest probability...
...but it might be quite confident in placing the true label in the top "k" output probabilities.

Thus, using a "top-k accuracy score" can be a much better indicator to assess whether my model improvement efforts are translating into meaningful enhancements in predictive performance or not.

For instance, if top-3 accuracy increases from 75% to 90%, it is clear that the improvement technique was effective:
- Earlier, the correct prediction was in the top 3 labels only 75% of the time.
- But now, the correct prediction is in the top 3 labels 90% of the time.

Thus, one can effectively direct the engineering efforts in the right direction.

Of course, what I am saying should ONLY be used to assess the model improvement efforts. This is because true predictive power will be determined using traditional Accuracy.

As depicted in the image below:
- It is expected that “Top-k Accuracy” may continue to increase during model iterations. This reflects improvement in performance.
- Accuracy, however, may stay the same during successive improvements. Nonetheless, we can be confident that the model is getting better and better.

For a more visual explanation, check out this issue: https://lnkd.in/dP_h8SFM.
--
👉 Get a Free Data Science PDF (530+ pages) by subscribing to my daily newsletter today: https://lnkd.in/gzfJWHmu
--
👉 Over to you: What are some other ways to assess model improvement efforts?
Tichaona Mutomba

Aspiring Data Science Student || Data Science and Analytics || Credit risk Analytics
1mo
In the drive to build high-performing models, developers often concentrate on model development and prediction accuracy, leaving critical data issues like class imbalance, class overlap, noise, and heavy-tailed distributions unaddressed, even though these issues are key to robust real-world performance. Class imbalance, where some classes are underrepresented, leads models to overlook minority events, while class overlap makes it difficult to distinguish between similar categories. Noise, such as mislabeled data or outliers, can mislead model training, and heavy-tailed distributions, often found in risk data, skew predictions by giving undue weight to extreme values. Tackling these data challenges through techniques like resampling, robust loss functions, noise reduction, and transformations is essential to create models that are not only accurate but also resilient, fair, and effective across diverse real-world applications. #classimbalance #noise #machinelearning #classoverlap

Handling imbalanced data: 7 innovative techniques for successful analysis | Data Science Dojo

datasciencedojo.com

Vinay Banka
6mo
Overview of the 10 Machine Learning Algorithms

Here’s an overview of the algorithms I’ll cover.

1. Linear Regression
Used For: Regression
Description: Linear regression draws a straight line, called a regression line, between the variables. This line goes approximately through the middle of the data points, thus minimizing the estimation error. It shows the predicted value of the dependent variable based on the value of the independent variables.
Evaluation Metrics:
- Mean Squared Error (MSE): The average of the squared errors, the error being the difference between actual and predicted values. The lower the value, the better the algorithm performance.
- R-Squared: The proportion of the variance in the dependent variable that is explained by the independent variables. Aim for a value as close to 1 as possible.
🔗https://lnkd.in/ggGxt_MY 🔗https://lnkd.in/gBqW6Ti2

2. Logistic Regression
Used For: Classification
Description: It uses a logistic function to map the data values to a probability, which is then translated to a binary category, i.e., 0 or 1, using a threshold that is usually set at 0.5. This makes the algorithm well suited to predicting binary outcomes, such as YES/NO, TRUE/FALSE, or 0/1.
Evaluation Metrics:
- Accuracy: The ratio between correct and total predictions. The closer to 1, the better.
- Precision: The model's accuracy on its positive predictions, expressed as the ratio between correct positive predictions and all positive predictions made. The closer to 1, the better.
- Recall: The model's coverage of the positive class, expressed as the ratio between correct positive predictions and all observations that actually belong to the positive class. The closer to 1, the better.
- F1 Score: The harmonic mean of the model’s recall and precision. The closer to 1, the better.
🔗https://lnkd.in/gBSSxUzh

3. Decision Trees
Used For: Regression & Classification
Description: Decision trees are algorithms that use a hierarchical, tree-like structure to predict a value or a class. The root node represents the whole dataset, which then branches into decision nodes, branches, and leaves based on the variable values.
Evaluation Metrics:
- Accuracy, precision, recall, and F1 score -> for classification
- MSE, R-squared -> for regression

4. Naive Bayes
Used For: Classification
Description: This is a family of classification algorithms that apply Bayes’ theorem together with the "naive" assumption that features are independent of each other within a class.
Evaluation Metrics: Accuracy, precision, recall, and F1 score.
🔗https://lnkd.in/gsVkEsQA

See more machine learning algorithms on the 2nd page (uploaded).
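As a small worked example of the two regression metrics listed above, here is a minimal Python sketch. The function names and the four data points are assumptions chosen so the arithmetic is easy to follow by hand.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical actual and predicted values.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

# Squared errors are 0.25, 0, 0.25, 0, so MSE = 0.5 / 4 = 0.125.
print("MSE:", mean_squared_error(y_true, y_pred))

# Total sum of squares around the mean (2.5) is 5.0, so R^2 = 1 - 0.5 / 5.0 = 0.9.
print("R-squared:", round(r_squared(y_true, y_pred), 6))
```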

Mean Squared Error (MSE)

https://statisticsbyjim.com
Vinay Banka
6mo
5. K-Nearest Neighbors (KNN)
Used For: Regression & Classification
Description: It calculates the distance between a test point and the training data, then selects the k nearest training points. For classification, the test point is assigned to the class that holds the majority among those k ‘neighbors’. For regression, the predicted value is the average of the k chosen training points.
Evaluation Metrics:
- Accuracy, precision, recall, and F1 score -> for classification
- MSE, R-squared -> for regression
🔗https://lnkd.in/gMxK2vSy

6. Support Vector Machines (SVM)
Used For: Regression & Classification
Description: This algorithm draws a hyperplane to separate different classes of data. The hyperplane is positioned at the largest possible distance from the nearest points of every class. The greater the distance of a data point from the hyperplane, the more confidently it is assigned to its class. For regression, the idea is adapted: the model fits a function that keeps as many data points as possible within a margin around it.
Evaluation Metrics:
- Accuracy, precision, recall, and F1 score -> for classification
- MSE, R-squared -> for regression
Hyperplane: 🔗https://lnkd.in/gstszcfb

7. Random Forest
Used For: Regression & Classification
Description: The random forest algorithm uses an ensemble of decision trees, which together make up a decision forest. The algorithm’s prediction is based on the predictions of many decision trees. Data will be assigned to the class that receives the most votes. For regression, the predicted value is the average of all the trees’ predicted values.
Evaluation Metrics:
- Accuracy, precision, recall, and F1 score -> for classification
- MSE, R-squared -> for regression
🔗https://lnkd.in/geNDkSDE

8. Gradient Boosting
Used For: Regression & Classification
Description: These algorithms use an ensemble of weak models, with each subsequent model recognizing and correcting the previous model's errors. This process is repeated until the error (loss function) is minimized.
Evaluation Metrics:
- Accuracy, precision, recall, and F1 score -> for classification
- MSE, R-squared -> for regression
🔗https://lnkd.in/gMAvVYHV
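The KNN description above (majority vote for classification, averaging for regression) can be sketched as follows. This is a minimal illustration, assuming Euclidean distance; the helper names and the toy training points are hypothetical.

```python
import math
from collections import Counter


def k_nearest(train_points, train_values, query, k):
    """Return the values attached to the k training points closest to the query."""
    # Pair each training value with its Euclidean distance to the query point.
    by_distance = sorted(
        zip(train_points, train_values),
        key=lambda pair: math.dist(pair[0], query),
    )
    return [value for _, value in by_distance[:k]]


def knn_classify(train_points, train_labels, query, k):
    """Assign the query to the majority class among its k nearest neighbors."""
    neighbors = k_nearest(train_points, train_labels, query, k)
    return Counter(neighbors).most_common(1)[0][0]


def knn_regress(train_points, train_targets, query, k):
    """Predict the average target value of the k nearest neighbors."""
    neighbors = k_nearest(train_points, train_targets, query, k)
    return sum(neighbors) / len(neighbors)


# Hypothetical 2-D training data with two well-separated classes.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
classes = ["a", "a", "a", "b", "b", "b"]

print(knn_classify(points, classes, (2, 2), k=3))  # a
print(knn_classify(points, classes, (6, 5), k=3))  # b

# Hypothetical 1-D regression data: the three nearest targets to x=2 are 2, 1 and 3.
xs = [(1,), (2,), (3,), (10,)]
ys = [1.0, 2.0, 3.0, 10.0]
print(knn_regress(xs, ys, (2,), k=3))  # 2.0
```

Sorting the full training set for every query is fine for a toy example; real implementations typically use spatial indexes to avoid that cost.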
Haiqing Hua

I share news from Chinese website (you can use google translate please also subscribe my YouTube Channel) | Ideologist | Poet | Futurist | Educator | Technologist | Business Analyst | Data Analyst | Realtor |
9mo
Autocorrelation, also known as serial correlation, is a statistical concept that measures the degree of similarity between a time series and a lagged version of itself over successive time intervals. In simpler terms, it examines the correlation between observations at different time points within the same series.

Autocorrelation is a critical concept in time series analysis and has several important implications:

1. **Identifying Patterns**: Autocorrelation helps in identifying patterns or dependencies present in a time series. For example, positive autocorrelation at lag 1 indicates that an observation is positively correlated with the preceding observation, suggesting persistence in the series, such as a trend.

2. **Modeling Assumptions**: Many time series models, such as autoregressive (AR) and moving average (MA) models, assume certain levels of autocorrelation. Understanding the autocorrelation structure of a series is essential for selecting appropriate models and making valid inferences.

3. **Model Diagnostic**: Autocorrelation plots (ACF plots) are commonly used in model diagnostics to identify potential violations of modeling assumptions. Significant autocorrelation at specific lags in the ACF plot of the residuals may indicate that the model does not adequately capture the temporal dependencies in the data.

4. **Forecasting Accuracy**: Autocorrelation information can be leveraged to improve forecasting accuracy. Models that account for autocorrelation patterns often outperform naive models, especially for time series data with strong autocorrelation.

5. **Inference and Hypothesis Testing**: Autocorrelation affects the standard errors of parameter estimates in time series models. Ignoring autocorrelation can lead to biased standard errors and invalid hypothesis tests. Techniques such as Newey-West standard errors or autocorrelation-robust inference are used to address this issue.

Autocorrelation is commonly measured using the autocorrelation function (ACF) or the autocorrelation coefficient. The ACF gives the autocorrelation values at different lags and is usually shown as a plot, while the autocorrelation coefficient quantifies the strength and direction of autocorrelation at a specific lag.

Overall, autocorrelation is a fundamental concept in time series analysis, providing valuable insights into the temporal structure of data and guiding the selection and evaluation of time series models.
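To make the autocorrelation coefficient concrete, here is a minimal Python sketch of the usual sample estimator at a given lag. The function name and the two example series are assumptions for illustration.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation coefficient of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    # Total variation of the series around its mean (the lag-0 term).
    variation = sum((x - mean) ** 2 for x in series)
    # Co-variation between each observation and the one `lag` steps later.
    covariation = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariation / variation


# Hypothetical series: a steady upward trend and a strictly alternating signal.
trend = [1, 2, 3, 4, 5, 6, 7, 8]
alternating = [1, -1, 1, -1, 1, -1, 1, -1]

print(autocorrelation(trend, 1))        # 0.625  (positive: neighbors move together)
print(autocorrelation(alternating, 1))  # -0.875 (negative: neighbors flip sign)
print(autocorrelation(trend, 0))        # 1.0    (a series is perfectly correlated with itself)
```

Evaluating the function over lags 0, 1, 2, ... gives the values that an ACF plot displays.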
Ashwini D

Aspiring Data Scientist with Strong Proficiency in Machine Learning, Deep Learning and NLP | Actively Seeking Learning and Growth Opportunities
5mo Edited
Day 21: Classification Metrics

Accuracy: The ratio of correct predictions to total predictions.
Formula = (True Positives + True Negatives) / Total Predictions

Type I and Type II Errors:
- Type I Error (False Positive): Incorrectly predicting a positive outcome when it is actually negative.
- Type II Error (False Negative): Incorrectly predicting a negative outcome when it is actually positive.

When Accuracy Can Mislead: Accuracy can be misleading in cases of imbalanced datasets. For example, if we're trying to identify terrorists among a large population of normal people, where only a tiny fraction are actual terrorists, a model that predicts everyone as normal could achieve very high accuracy. However, this model fails to identify any terrorists, making it ineffective for our goal.

Precision: The proportion of predicted positives that are truly positive. Useful when the cost of false positives is high.
Formula = True Positives / (True Positives + False Positives)

Recall: The proportion of actual positives that are correctly classified. Important when the cost of false negatives is high.
Formula = True Positives / (True Positives + False Negatives)

F1 Score: The harmonic mean of precision and recall. It balances precision and recall, and is especially useful when classes are imbalanced.
Formula = 2 × (Precision × Recall) / (Precision + Recall)

Multiclass Classification: In multiclass classification, we extend these metrics to multiple classes:
- Macro Precision: The unweighted average of the per-class precision values.
- Weighted Precision: The average of the per-class precision values, weighted by the number of true instances for each class.

The classification report combines all these metrics, providing a comprehensive evaluation of the model's performance across all classes. It's an essential tool for understanding your model's strengths and weaknesses.
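The formulas above, and the way accuracy can mislead on imbalanced data, can be checked with a short Python sketch. The function name and the 1,000-sample dataset (10 positives) are hypothetical; returning 0.0 when a denominator is zero is one common convention, assumed here.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # Type I errors
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # Type II errors

    accuracy = (tp + tn) / len(pairs)
    # Assumed convention: report 0.0 when a ratio is undefined.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical imbalanced dataset: 10 positives among 1,000 samples.
y_true = [1] * 10 + [0] * 990

# Model A predicts "negative" for everyone.
all_negative = [0] * 1000
# Model B finds 8 of the 10 positives but raises 12 false alarms.
some_recall = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 978

print(classification_metrics(y_true, all_negative))
# accuracy 0.99, but precision, recall and F1 are all 0.0
print(classification_metrics(y_true, some_recall))
# accuracy 0.986 (slightly lower), precision 0.4, recall 0.8
```

Model B scores lower on accuracy than the trivial Model A, yet it is the only one of the two that detects any positives, which is the situation the post warns about.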

Ronaldo Teixeira

Data Analyst | Econometrician | Senior Statistician | Economic Specialist at Upwork | Agricultural and Resource Economics | Quantitative Research | Impact Evaluation | Survey CTO Programmer| Research Consultant at OMR
3w
Challenges and Solutions in Logistic Regression Modeling: A Practical Case in R

While fitting a logistic regression model to predict the probability of default (Default), I encountered some common challenges that can undermine the reliability of the results. Using a fictitious dataset with variables such as Age, Income, and Credit History, the model produced extremely large coefficients with p-values equal to 1, indicating that none of the variables were statistically significant. Additionally, the predicted probabilities ranged from near-zero values (~2.14e-11) to exactly 1, suggesting issues with overfitting or perfect separation in the data.

Identified Issues:
1️⃣ Small sample size: With only 5 observations, the model was unstable and unreliable. A common rule of thumb for logistic regression is at least 10-20 events (observations in the less frequent outcome class) per explanatory variable.
2️⃣ Collinearity among variables: Variables like Age and Income may be highly correlated, leading to imprecise coefficient estimates.
3️⃣ Different variable magnitudes: Variables with drastically different scales can hinder the model's convergence.

Implemented Solutions:
- Normalization of variables: I standardized Age and Income to bring their scales to a comparable level.
- Collinearity check: Using the Variance Inflation Factor (VIF), I evaluated the degree of correlation among variables and made adjustments where necessary.
- Model re-estimation: After addressing these issues, I refitted the model and recalculated the predicted probabilities to ensure the results were more stable and interpretable.

Key Takeaways: Logistic regression is a powerful tool, but its reliability depends on data quality and proper preprocessing. Challenges like small sample sizes, collinearity, and scale differences can significantly impact model performance. Tools like R provide robust methods to diagnose and address these issues, making it easier to derive meaningful insights.

Have you faced similar challenges while building models? Please share them with me, and thank you!
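The standardization and collinearity checks described above can be sketched as follows. Note the assumptions: this is Python rather than the R workflow the post refers to, the Age and Income values are invented, and the VIF shortcut used here only holds for the two-predictor case, where VIF = 1 / (1 - r²) with r the Pearson correlation between the two predictors.

```python
import math
import statistics


def standardize(values):
    """Z-score: subtract the mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical 5-observation dataset (values are made up).
age = [25, 35, 45, 55, 65]
income = [30000, 42000, 50000, 61000, 75000]

age_z = standardize(age)
income_z = standardize(income)
print("standardized age:", [round(z, 3) for z in age_z])

# A VIF well above ~10 is often read as a sign of problematic collinearity.
print("VIF:", round(vif_two_predictors(age, income), 1))
```

Standardizing does not change the correlation between the two variables, so it helps with scale and convergence but leaves the collinearity itself in place; that has to be handled separately, for example by dropping or combining predictors.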
Justin Rashidi

Co-Founder & Head of Consumer Strategy at SeedX, Inc. | Forbes 30 Under 30 | Helping Organizations Drive Growth Through Data-Driven Innovation and Transformation
1mo
Media Mix Models (MMM) are making a comeback. But honestly, they are a monster to implement and fine-tune. There are a lot of off-the-shelf solutions, but it is hit or miss whether their "model" will be accurate for your business.

If you are new to the data science world and thinking about implementing a data warehouse for advanced analytics, you probably should not start with MMMs. Where should you start? Causal impact analysis.

Now, what is it? Causal impact analysis uses Bayesian modeling to better understand the impact of new "interventions" on an outcome. For example, when you launch new channels or make changes in an organization, what is the impact on revenue or other core KPIs? This method is better than a standard A/B test because it considers historical (time-series) data and allows you to isolate macro events that could be happening across your entire business.

Now, this sounds complicated, and a lot of information is written for other data scientists, but I am more of an explain-it-to-me-like-I-am-five kind of guy. Bayesian modeling will take all of your historical data (the more the better) and predict what it thinks will happen if nothing changes. You then introduce the new variable you want to test, and Bayesian modeling will give you the impact it had against what it predicted.

The benefit is that, since you are running this on a small test group, all other marketing efforts can continue as normal. This helps you understand the true impact of the variable you are introducing without the confounding effects you could experience in a standard A/B test. And since you are using your historical data, it is more accurate.

The best part is that this is significantly cheaper to set up and run and can quickly be used to validate assumptions. From here, you can start to graduate to MMMs, where you take your tests and feed them back into your main model to help fine-tune. But if you are new, just start with causal impact analysis.
Michael Stroud

Data Scientist - Associate | Data Analyst | Python | SQL | Google Cloud | ETL | BigQuery | Data Pipelines | Cloud Composer | Looker | Git | Scikit-Learn
5mo
Understanding the Confusion Matrix for Model Performance 📊 For data analysts and machine learning engineers, the Confusion Matrix is more than just a tool—it's a roadmap to model improvement. By breaking down true positives, false positives, true negatives, and false negatives, it provides a clear snapshot of your model’s accuracy and areas for improvement. One key insight from the Confusion Matrix is the ability to identify specific types of errors your model is making. For example, a high number of false positives might indicate that your model is overly sensitive and is incorrectly labeling negative samples as positive. Conversely, a high number of false negatives suggests that your model might be missing out on detecting positive instances. Armed with this detailed information, you can adjust your model's threshold or consider other techniques such as resampling your data, feature engineering, or trying different algorithms to improve your model's overall performance. The Confusion Matrix not only guides model refinement but also aids in explaining model reliability to stakeholders who may not have a technical background, fostering better decision-making and trust in your predictive systems. In what ways has understanding the Confusion Matrix improved your model performance? Share your experiences and insights below! #MachineLearning #DataScience
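The point above about adjusting the model's threshold can be illustrated with a small Python sketch that recomputes the confusion matrix cells at different cut-offs. The function name, labels and scores are all hypothetical.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are predicted positive."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for actual, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and actual == 1:
            counts["TP"] += 1
        elif predicted == 1 and actual == 0:
            counts["FP"] += 1
        elif predicted == 0 and actual == 0:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return counts


# Hypothetical true labels and model scores for 8 samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.55, 0.8]

for threshold in (0.35, 0.5, 0.65):
    print(threshold, confusion_counts(y_true, scores, threshold))
# 0.35 {'TP': 4, 'FP': 2, 'TN': 2, 'FN': 0}
# 0.5 {'TP': 3, 'FP': 2, 'TN': 2, 'FN': 1}
# 0.65 {'TP': 2, 'FP': 1, 'TN': 3, 'FN': 2}
```

Raising the threshold trades false positives for false negatives, which is why the right cut-off depends on which kind of error is more costly for the application.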
Muhammad Umer Naseem

Data Scientist | Machine Learning | Deep Learning
4mo Edited
𝗨𝗻𝗱𝗲𝗿𝘀𝘁𝗮𝗻𝗱𝗶𝗻𝗴 𝗟𝗶𝗻𝗲𝗮𝗿 𝗥𝗲𝗴𝗿𝗲𝘀𝘀𝗶𝗼𝗻 𝗮𝗻𝗱 𝗜𝘁𝘀 𝗟𝗶𝗺𝗶𝘁𝗮𝘁𝗶𝗼𝗻𝘀:

Linear regression is a fundamental tool in data science, but it's not without its challenges:

𝗠𝘂𝗹𝘁𝗶𝗰𝗼𝗹𝗹𝗶𝗻𝗲𝗮𝗿𝗶𝘁𝘆: When predictor variables are highly correlated, it can distort the coefficient estimates and reduce the model's reliability.

𝗦𝗺𝗮𝗹𝗹 𝗦𝗮𝗺𝗽𝗹𝗲 𝗦𝗶𝘇𝗲: If the number of samples is less than the number of variables, ordinary least squares has no unique solution, and the model becomes unstable and overfits the data.

To address these issues, we turn to more advanced techniques:

𝗥𝗶𝗱𝗴𝗲 𝗥𝗲𝗴𝗿𝗲𝘀𝘀𝗶𝗼𝗻: Adds a penalty to the model that shrinks the coefficients, mitigating the impact of multicollinearity.

𝗟𝗮𝘀𝘀𝗼 𝗥𝗲𝗴𝗿𝗲𝘀𝘀𝗶𝗼𝗻: Goes a step further by performing feature selection, setting some coefficients to zero, thus simplifying the model.

𝗣𝗮𝗿𝘁𝗶𝗮𝗹 𝗟𝗲𝗮𝘀𝘁 𝗦𝗾𝘂𝗮𝗿𝗲𝘀 (𝗣𝗟𝗦): Finds a set of components that maximize the covariance between the predictors and the response, which is especially useful when predictors are highly collinear.

𝗣𝗿𝗶𝗻𝗰𝗶𝗽𝗮𝗹 𝗖𝗼𝗺𝗽𝗼𝗻𝗲𝗻𝘁 𝗔𝗻𝗮𝗹𝘆𝘀𝗶𝘀 (𝗣𝗖𝗔): Reduces dimensionality by transforming the predictors into a set of uncorrelated components, retaining most of the original variability with fewer variables.

These techniques are powerful tools in the data scientist's toolkit, allowing us to build more robust and interpretable models. 🌟

#DataScience #MachineLearning #LinearRegression #RidgeRegression #LassoRegression #PLS #PCA #FeatureSelection #BigData
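The difference between ridge shrinkage and lasso selection can be seen in the simplest possible setting: one predictor and no intercept, where both estimators have a closed form. The sketch below is only an illustration under those assumptions (the function names, data and penalty values are made up), and it shows the shrinkage behavior, not multicollinearity itself.

```python
def ols_slope(x, y):
    """Least-squares slope for y ≈ beta * x (no intercept)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def ridge_slope(x, y, lam):
    """Minimizer of sum((y - beta*x)^2) + lam * beta^2: the slope is shrunk toward 0."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)


def soft_threshold(value, lam):
    """Move value toward 0 by lam, clipping at exactly 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_slope(x, y, lam):
    """Minimizer of 0.5 * sum((y - beta*x)^2) + lam * |beta|."""
    xy = sum(a * b for a, b in zip(x, y))
    return soft_threshold(xy, lam) / sum(a * a for a in x)


# Hypothetical data lying exactly on y = 2x.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]

print(ols_slope(x, y))           # 2.0
print(ridge_slope(x, y, 14.0))   # 1.0  (shrunk, but never exactly zero for finite lam)
print(lasso_slope(x, y, 14.0))   # 1.0  (shrunk)
print(lasso_slope(x, y, 30.0))   # 0.0  (a large enough penalty removes the feature)
```

Ridge only approaches zero as the penalty grows, while lasso reaches exactly zero once the penalty exceeds the predictor's correlation with the response, which is what makes it usable for feature selection.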

