I don't rely on Accuracy in multiclass classification settings to measure model improvement 🧩

Consider probabilistic multiclass classification models. Using "Accuracy" as the signal of model improvement can be deceptive: it can mislead you into thinking you are making no progress, when the model is in fact getting meaningfully better...

...but "Accuracy" is not reflecting that (YET).

The problem arises because Accuracy only checks whether the top prediction is correct. During iterative model building, the model may not yet assign the highest probability to the true label...

...but it may already be quite confident in placing the true label among its top "k" output probabilities.

Thus, a "top-k accuracy score" can be a much better indicator of whether model improvement efforts are translating into meaningful gains in predictive performance.

For instance, if top-3 accuracy increases from 75% to 90%, the improvement technique was clearly effective:
- Earlier, the correct label was in the top 3 predictions only 75% of the time.
- Now, the correct label is in the top 3 predictions 90% of the time.

This lets you direct the engineering efforts in the right direction.

Of course, top-k accuracy should ONLY be used to assess model improvement efforts. True predictive power is still determined using traditional Accuracy.

As depicted in the image below:
- "Top-k Accuracy" is expected to keep increasing during model iterations. This reflects improvement in performance.
- Accuracy, however, may stay the same during successive improvements. Nonetheless, we can be confident that the model is getting better and better.

For a more visual explanation, check out this issue: https://lnkd.in/dP_h8SFM.

--
👉 Get a Free Data Science PDF (530+ pages) by subscribing to my daily newsletter today: https://lnkd.in/gzfJWHmu

--
👉 Over to you: What are some other ways to assess model improvement efforts?
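To make this concrete, here is a minimal sketch (not from the original post) of tracking top-k accuracy next to plain accuracy with scikit-learn's `top_k_accuracy_score`; the synthetic dataset and logistic-regression model are assumptions used purely for illustration.

```python
# Minimal sketch (assumed toy data/model): comparing accuracy vs. top-3 accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, top_k_accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic 10-class problem (hypothetical data, just for illustration)
X, y = make_classification(n_samples=5000, n_features=20, n_informative=10,
                           n_classes=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

proba = model.predict_proba(X_test)   # class probabilities, shape (n_samples, n_classes)
y_pred = proba.argmax(axis=1)         # top-1 prediction

print("Accuracy      :", accuracy_score(y_test, y_pred))
print("Top-3 accuracy:", top_k_accuracy_score(y_test, proba, k=3))
```

Across model iterations, the second number can move long before the first one does, which is the point the post makes.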
Avi Chawla’s Post
More Relevant Posts
-
Autocorrelation, also known as serial correlation, is a statistical concept that measures the degree of similarity between a time series and a lagged version of itself over successive time intervals. In simpler terms, it examines the correlation between observations at different time points within the same series.

Autocorrelation is a critical concept in time series analysis and has several important implications:

1. **Identifying Patterns**: Autocorrelation helps in identifying patterns or dependencies present in a time series. For example, positive autocorrelation at lag 1 indicates that an observation is positively correlated with the preceding observation, suggesting the presence of a trend.

2. **Modeling Assumptions**: Many time series models, such as autoregressive (AR) and moving average (MA) models, assume certain levels of autocorrelation. Understanding the autocorrelation structure of a series is essential for selecting appropriate models and making valid inferences.

3. **Model Diagnostics**: Autocorrelation plots (ACF plots) are commonly used in model diagnostics to identify potential violations of modeling assumptions. Significant autocorrelation at specific lags in the ACF plot may indicate that the model does not adequately capture the temporal dependencies in the data.

4. **Forecasting Accuracy**: Autocorrelation information can be leveraged to improve forecasting accuracy. Models that account for autocorrelation patterns often outperform naive models, especially for time series data with strong autocorrelation.

5. **Inference and Hypothesis Testing**: Autocorrelation affects the standard errors of parameter estimates in time series models. Ignoring autocorrelation can lead to biased parameter estimates and invalid hypothesis tests. Techniques such as Newey-West standard errors or autocorrelation-robust inference are used to address this issue.

Autocorrelation is commonly measured using the autocorrelation function (ACF) or the autocorrelation coefficient. The ACF is a plot of the autocorrelation values at different lags, while the autocorrelation coefficient quantifies the strength and direction of autocorrelation at specific lags.

Overall, autocorrelation is a fundamental concept in time series analysis, providing valuable insights into the temporal structure of data and guiding the selection and evaluation of time series models.
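As a small illustration (not part of the original post), here is a minimal sketch of estimating autocorrelation with statsmodels and pandas; the random-walk series is an assumed toy example chosen only because it is strongly autocorrelated by construction.

```python
# Minimal sketch (assumed synthetic series): estimating autocorrelation at several lags.
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=500))   # a random walk: strongly autocorrelated by construction

# Autocorrelation function values for lags 0..10 (what an ACF plot visualizes)
acf_values = acf(series, nlags=10)
for lag, value in enumerate(acf_values):
    print(f"lag {lag:2d}: {value:+.3f}")

# Lag-1 autocorrelation via pandas as a cross-check
print("pandas lag-1 autocorr:", pd.Series(series).autocorr(lag=1))
```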
-
𝗠𝗮𝗰𝗵𝗶𝗻𝗲 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝗔𝗹𝗴𝗼𝗿𝗶𝘁𝗵𝗺𝘀 𝗧𝗶𝗺𝗲 𝗖𝗼𝗺𝗽𝗹𝗲𝘅𝗶𝘁𝘆

𝗟𝗶𝗻𝗲𝗮𝗿 𝗥𝗲𝗴𝗿𝗲𝘀𝘀𝗶𝗼𝗻
→ Training Time Complexity: O(n * p^2 + p^3) for the closed-form (normal-equation) solution; roughly O(n * p) per gradient-descent pass
→ Prediction Time Complexity: O(p)
🟢 Scales well with large datasets, where n is the number of data points and p is the number of features.

𝗟𝗼𝗴𝗶𝘀𝘁𝗶𝗰 𝗥𝗲𝗴𝗿𝗲𝘀𝘀𝗶𝗼𝗻
→ Training Time Complexity: O(n * p * i)
→ Prediction Time Complexity: O(p)
🟢 Involves iterative updates (i = iterations), making it slower to train than linear regression but efficient for binary classification.

𝗞-𝗡𝗲𝗮𝗿𝗲𝘀𝘁 𝗡𝗲𝗶𝗴𝗵𝗯𝗼𝗿𝘀 (𝗞-𝗡𝗡)
→ Training Time Complexity: O(1)
→ Prediction Time Complexity: O(n * p)
🟠 Training is instant, but prediction time grows with dataset size n, as it calculates the distance to every point.

𝗦𝘂𝗽𝗽𝗼𝗿𝘁 𝗩𝗲𝗰𝘁𝗼𝗿 𝗠𝗮𝗰𝗵𝗶𝗻𝗲𝘀 (𝗦𝗩𝗠)
→ Training Time Complexity: O(n^2 * p) (up to O(n^3) for kernel methods)
→ Prediction Time Complexity: O(s * p) (s = number of support vectors)
🔴 Computationally heavy to train, especially with kernels, but effective in high-dimensional spaces.

𝗗𝗲𝗰𝗶𝘀𝗶𝗼𝗻 𝗧𝗿𝗲𝗲𝘀
→ Training Time Complexity: O(n * p * log(n))
→ Prediction Time Complexity: O(log(n))
🟢 Efficient for both training and prediction, well-suited for non-linear data.

𝗥𝗮𝗻𝗱𝗼𝗺 𝗙𝗼𝗿𝗲𝘀𝘁
→ Training Time Complexity: O(k * n * p * log(n)) (k = number of trees)
→ Prediction Time Complexity: O(k * log(n))
🟠 Reduces overfitting relative to a single decision tree, but training and prediction cost grow with the number of trees.

𝗡𝗮𝗶𝘃𝗲 𝗕𝗮𝘆𝗲𝘀
→ Training Time Complexity: O(n * p)
→ Prediction Time Complexity: O(p)
🟢 Simple and very fast, both in training and prediction, making it ideal for high-dimensional datasets.

𝗡𝗲𝘂𝗿𝗮𝗹 𝗡𝗲𝘁𝘄𝗼𝗿𝗸𝘀
→ Training Time Complexity: O(i * n * p) (i = iterations)
→ Prediction Time Complexity: O(p)
🔴 Highly dependent on the architecture and number of layers, with significant training times.

→ Note: These time complexities are a general guideline; actual performance varies with data, implementation, and hardware.
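As a rough, hedged illustration of the training-versus-prediction trade-off above (instant K-NN training but costly prediction, versus slower-to-train but fast-to-predict logistic regression), here is a minimal timing sketch; the dataset sizes are arbitrary assumptions and wall-clock results will vary with hardware.

```python
# Minimal sketch (assumed toy data): contrasting fit vs. predict cost for K-NN and logistic regression.
import time
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=20000, n_features=50, random_state=0)
X_query = X[:2000]  # points to score at prediction time

for name, model in [("KNN", KNeighborsClassifier(n_neighbors=5)),
                    ("LogReg", LogisticRegression(max_iter=1000))]:
    t0 = time.perf_counter(); model.fit(X, y); t_fit = time.perf_counter() - t0
    t0 = time.perf_counter(); model.predict(X_query); t_pred = time.perf_counter() - t0
    print(f"{name:7s} fit: {t_fit:.3f}s   predict: {t_pred:.3f}s")
```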
-
In the drive to build high-performing models, developers often concentrate on model development and prediction accuracy, leaving critical data issues such as class imbalance, class overlap, noise, and heavy-tailed distributions unaddressed, even though these issues are key to robust real-world performance. Class imbalance, where some classes are underrepresented, leads models to overlook minority events, while class overlap makes it difficult to distinguish between similar categories. Noise, such as mislabeled data or outliers, can mislead model training, and heavy-tailed distributions, often found in risk data, skew predictions by giving undue weight to extreme values. Tackling these data challenges through techniques like resampling, robust loss functions, noise reduction, and transformations is essential to create models that are not only accurate but also resilient, fair, and effective across diverse real-world applications. #classimbalance #noise #machinelearning #classoverlap
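As an illustrative sketch (not from the original post), here are two common ways to address class imbalance in scikit-learn, reweighting the loss via `class_weight` and oversampling the minority class; the synthetic imbalanced dataset is an assumption for demonstration.

```python
# Minimal sketch (assumed toy data): two simple ways to handle class imbalance in scikit-learn.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Imbalanced binary problem: ~95% negatives, ~5% positives (hypothetical)
X, y = make_classification(n_samples=10000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Option 1: reweight the loss so minority-class errors cost more
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te), digits=3))

# Option 2: oversample the minority class before fitting
X_min, y_min = X_tr[y_tr == 1], y_tr[y_tr == 1]
X_min_up, y_min_up = resample(X_min, y_min, n_samples=int((y_tr == 0).sum()), random_state=0)
X_bal = np.vstack([X_tr[y_tr == 0], X_min_up])
y_bal = np.concatenate([y_tr[y_tr == 0], y_min_up])
clf2 = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
print(classification_report(y_te, clf2.predict(X_te), digits=3))
```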
-
𝗨𝗻𝗱𝗲𝗿𝘀𝘁𝗮𝗻𝗱𝗶𝗻𝗴 𝗟𝗶𝗻𝗲𝗮𝗿 𝗥𝗲𝗴𝗿𝗲𝘀𝘀𝗶𝗼𝗻 𝗮𝗻𝗱 𝗜𝘁𝘀 𝗟𝗶𝗺𝗶𝘁𝗮𝘁𝗶𝗼𝗻𝘀

Linear regression is a fundamental tool in data science, but it's not without its challenges:

𝗠𝘂𝗹𝘁𝗶𝗰𝗼𝗹𝗹𝗶𝗻𝗲𝗮𝗿𝗶𝘁𝘆: When predictor variables are highly correlated, it can distort the coefficient estimates and reduce the model's reliability.

𝗦𝗺𝗮𝗹𝗹 𝗦𝗮𝗺𝗽𝗹𝗲 𝗦𝗶𝘇𝗲: If the number of samples is less than the number of variables, the model can become unstable and overfit the data.

To address these issues, we turn to more advanced techniques:

𝗥𝗶𝗱𝗴𝗲 𝗥𝗲𝗴𝗿𝗲𝘀𝘀𝗶𝗼𝗻: Adds a penalty to the model that shrinks the coefficients, mitigating the impact of multicollinearity.

𝗟𝗮𝘀𝘀𝗼 𝗥𝗲𝗴𝗿𝗲𝘀𝘀𝗶𝗼𝗻: Goes a step further by performing feature selection, setting some coefficients to zero, thus simplifying the model.

𝗣𝗮𝗿𝘁𝗶𝗮𝗹 𝗟𝗲𝗮𝘀𝘁 𝗦𝗾𝘂𝗮𝗿𝗲𝘀 (𝗣𝗟𝗦): Focuses on finding a set of components that explain the maximum variance in both the predictors and the response, especially useful when predictors are highly collinear.

𝗣𝗿𝗶𝗻𝗰𝗶𝗽𝗮𝗹 𝗖𝗼𝗺𝗽𝗼𝗻𝗲𝗻𝘁 𝗔𝗻𝗮𝗹𝘆𝘀𝗶𝘀 (𝗣𝗖𝗔): Reduces dimensionality by transforming the predictors into a set of uncorrelated components, retaining most of the original variability with fewer variables.

These techniques are powerful tools in the data scientist's toolkit, allowing us to build more robust and interpretable models. 🌟

#DataScience #MachineLearning #LinearRegression #RidgeRegression #LassoRegression #PLS #PCA #FeatureSelection #BigData
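To make the first two remedies concrete, here is a minimal sketch (an addition, not part of the original post) comparing OLS, Ridge, and Lasso coefficients on deliberately collinear toy data; the data-generating process and penalty strengths are assumptions chosen only to make the effect visible.

```python
# Minimal sketch (assumed collinear toy data): OLS vs. Ridge vs. Lasso coefficients.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)            # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.5, size=n)   # only x1 truly matters

for name, model in [("OLS", LinearRegression()),
                    ("Ridge", Ridge(alpha=1.0)),
                    ("Lasso", Lasso(alpha=0.1))]:
    model.fit(X, y)
    print(f"{name:5s} coefficients: {np.round(model.coef_, 2)}")
# OLS coefficients tend to blow up in opposite directions under collinearity;
# Ridge shrinks them toward each other, and Lasso typically zeroes one out.
```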
-
5. K-Nearest Neighbors (KNN)
Used For: Regression & Classification
Description: It calculates the distance between the test point and the k nearest data points from the training data. For classification, the test point is assigned to the class most common among those k neighbors. For regression, the predicted value is the average of the k chosen training points.
Evaluation Metrics:
Accuracy, precision, recall, and F1 score -> for classification
MSE, R-squared -> for regression
🔗https://lnkd.in/gMxK2vSy

6. Support Vector Machines (SVM)
Used For: Regression & Classification
Description: This algorithm draws a hyperplane that separates the classes, positioned at the largest possible distance (margin) from the nearest points of each class. The further a data point lies from the hyperplane on its class's side, the more confidently it is assigned to that class. For regression (SVR), the algorithm instead fits a function so that as many points as possible lie within a small margin around it, penalizing only larger deviations.
Evaluation Metrics:
Accuracy, precision, recall, and F1 score -> for classification
MSE, R-squared -> for regression
Hyperplane: 🔗https://lnkd.in/gstszcfb

7. Random Forest
Used For: Regression & Classification
Description: The random forest algorithm uses an ensemble of decision trees, which together form a decision forest. The algorithm's prediction is based on the predictions of many decision trees: data is assigned to the class that receives the most votes. For regression, the predicted value is the average of all the trees' predicted values.
Evaluation Metrics:
Accuracy, precision, recall, and F1 score -> for classification
MSE, R-squared -> for regression
🔗https://lnkd.in/geNDkSDE

8. Gradient Boosting
Used For: Regression & Classification
Description: These algorithms use an ensemble of weak models, with each subsequent model recognizing and correcting the previous model's errors. This process is repeated until the error (loss function) is minimized.
Evaluation Metrics:
Accuracy, precision, recall, and F1 score -> for classification
MSE, R-squared -> for regression
🔗https://lnkd.in/gMAvVYHV
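As a quick, illustrative addition (not from the original post), here is a minimal sketch that fits the four models above on the same toy dataset and compares them with a shared metric; the dataset and hyperparameters are assumptions for demonstration only.

```python
# Minimal sketch (assumed toy data): comparing the four classifiers above with a shared metric.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf"),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name:18s} F1 = {f1_score(y_te, model.predict(X_te)):.3f}")
```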
-
Uncertainty pervades every stage of the #ML pipeline, including:
- Data preprocessing
- Feature engineering
- Model training
- Model prediction
- Model deployment

In data-driven ML methods, uncertainty arises from:
- The intrinsic ambiguity in the data
- Variations in sampling
- Flawed models
- Errors in model approximations

Even in rule-based or symbolic ML systems, the complexity of the market introduces uncertainty, affecting any conclusions drawn.

Addressing uncertainty is crucial not as a theoretical idea but as a practical requirement, as #trading systems must generate profit despite working with imperfect information.

For now, the most effective way I have found to use total uncertainty is by:
- Modeling it to switch off the system
- Initiating a process of optimization
- Using it as a metric of performance quality
- Filtering and asset selection

To understand it, a quick reminder of some concepts:
- Aleatoric (statistical) uncertainty: What will a random sample drawn from a probability distribution be?
- Epistemic (systematic) uncertainty: What is the relevant probability distribution?
- Total uncertainty: The sum of aleatoric and epistemic uncertainty when the two are independent

The basic protocol is: PnL → Conformal prediction → Switch-off (or optimization)

Some considerations:
- Although robust and stochastic optimization methods are often used to handle optimization under Bayesian uncertainty, they don't effectively predict epistemic uncertainty with sufficient accuracy
- By integrating models that account for epistemic uncertainty, we can improve the optimization process by considering the gaps in information or data
- It is necessary to adapt the typical CP method to fit this phase of the system development
- The rule is simple: If PnL < Lower interval: Switch off, else: continue

Aleatoric uncertainty arises from the inherent randomness or noise in the data:
- This type of uncertainty can't be reduced by collecting more data
- Conformal prediction helps quantify this uncertainty by constructing prediction intervals that account for the variability in the data
- To handle it with conformal prediction we have some key methods:
  • Nonconformity measure: To capture the deviations of the closed trades from the PnL's benchmark
  • Prediction intervals: For this application we are interested only in the lower interval

👇👇👇 Continue in the comments 👇👇👇
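The following is a loose, hypothetical sketch of the "PnL → conformal prediction → switch-off" rule described above, not the author's actual implementation; the nonconformity measure (downside deviation from a historical-mean benchmark), the calibration window, and the miscoverage level alpha are all assumptions made for illustration.

```python
# Hypothetical sketch of a PnL-based switch-off rule using a split-conformal-style lower bound.
# The nonconformity measure, window sizes and alpha below are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(42)
calibration_pnl = rng.normal(loc=0.05, scale=1.0, size=500)   # past per-trade PnL (assumed)
recent_pnl = rng.normal(loc=-0.4, scale=1.0, size=20)          # live trades to monitor (assumed)

alpha = 0.10                          # miscoverage level (assumption)
benchmark = calibration_pnl.mean()    # PnL benchmark (assumption: historical mean)

# Nonconformity: downside deviation of each calibration trade from the benchmark
scores = benchmark - calibration_pnl
n = len(scores)
k = int(np.ceil((n + 1) * (1 - alpha)))        # finite-sample conformal rank
q = np.sort(scores)[min(k, n) - 1]             # conformal quantile of the nonconformity scores
lower_bound = benchmark - q                    # one-sided lower prediction bound on per-trade PnL

live_mean = recent_pnl.mean()
print(f"lower bound: {lower_bound:.3f}, recent mean PnL: {live_mean:.3f}")
if live_mean < lower_bound:
    print("Switch off the system (or trigger re-optimization).")
else:
    print("Continue trading.")
```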
-
Linear regression (LR) and generalized linear models (GLM) are two great tools in data science, but understanding their differences is key to mastering their application and maximizing analytical insights.

While both model relationships between variables, they differ in assumptions, formulations, and handling of data variability. Understanding these differences is crucial for effective analysis.

LR assumes a normally distributed, continuous response variable with constant variance. GLM relaxes these assumptions, accommodating various response-variable types and distributions through link functions.

Adjustment to data variability: LR struggles with heteroscedasticity and non-normal errors, while GLM handles them by choosing appropriate error distributions and link functions. GLM also specializes in modeling categorical and count data, which LR cannot handle well.

LR and GLM offer distinct advantages in statistical modeling. By understanding their differences and how they adjust to data variability, analysts can make more informed decisions and draw reliable conclusions from their data.

- You can check this post from Daily Dose of Data Science by Avi Chawla for a deeper understanding and a great reading resource!
Generalized Linear Models (GLMs): The Supercharged Linear Regression (dailydoseofds.com)
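As an illustrative aside (not from the linked article), here is a minimal sketch contrasting OLS with a Poisson GLM in statsmodels for count data; the simulated counts and the true coefficients are assumptions used only to show the two APIs side by side.

```python
# Minimal sketch (assumed toy count data): OLS vs. a Poisson GLM in statsmodels.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=300)
X = sm.add_constant(x)
y_counts = rng.poisson(lam=np.exp(0.5 + 1.2 * x))   # count response (hypothetical)

ols = sm.OLS(y_counts, X).fit()                                 # assumes Gaussian errors, constant variance
glm = sm.GLM(y_counts, X, family=sm.families.Poisson()).fit()   # log link, Poisson variance

print("OLS coefficients:", np.round(ols.params, 3))
print("GLM coefficients:", np.round(glm.params, 3))  # on the log scale, close to (0.5, 1.2)
```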
-
Linear Regression is one of the most important tools in a Data Scientist's toolbox. Here's everything you need to know in 3 minutes.

1. OLS regression aims to find the best-fitting linear equation that describes the relationship between the dependent variable (often denoted as Y) and independent variables (denoted as X1, X2, ..., Xn).

2. OLS does this by minimizing the sum of the squares of the differences between the observed dependent variable values and those predicted by the linear model. These differences are called "residuals."

3. "Best fit" in the context of OLS means that the sum of the squares of the residuals is as small as possible. Mathematically, it's about finding the values of β0, β1, ..., βn that minimize this sum.

4. Slopes (β1, β2, ..., βn): These coefficients represent the change in the dependent variable for a one-unit change in the corresponding independent variable, holding other variables constant.

5. R-squared (R²): This statistic measures the proportion of variance in the dependent variable that is predictable from the independent variables. It ranges from 0 to 1, with higher values indicating a better fit of the model to the data.

6. t-Statistics and p-Values: For each coefficient, the t-statistic and its associated p-value test the null hypothesis that the coefficient is equal to zero (no effect). A small p-value (< 0.05) suggests that you can reject the null hypothesis.

7. Confidence Intervals: These intervals provide a range of plausible values for each coefficient (usually at the 95% confidence level).

Understanding and interpreting these outputs is crucial for assessing the quality of the model, understanding the relationships between variables, and making predictions or conclusions based on the model.
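For readers who want to see these outputs in practice, here is a minimal sketch (an addition, not part of the original post) that fits an OLS model with statsmodels and reads off the coefficients, R², t-statistics, p-values, and confidence intervals; the simulated data are an assumption for illustration.

```python
# Minimal sketch (assumed toy data): fitting OLS and reading off the quantities described above.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))                                   # two predictors (hypothetical)
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=200)  # known true coefficients

results = sm.OLS(y, sm.add_constant(X)).fit()

print(results.params)       # beta_0 (intercept) and slopes beta_1, beta_2
print(results.rsquared)     # R-squared
print(results.tvalues)      # t-statistics per coefficient
print(results.pvalues)      # p-values for H0: coefficient = 0
print(results.conf_int())   # 95% confidence intervals by default
# results.summary() prints all of the above in a single table
```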
-
Understanding the Confusion Matrix for Model Performance 📊

For data analysts and machine learning engineers, the Confusion Matrix is more than just a tool—it's a roadmap to model improvement. By breaking down true positives, false positives, true negatives, and false negatives, it provides a clear snapshot of your model's accuracy and areas for improvement.

One key insight from the Confusion Matrix is the ability to identify the specific types of errors your model is making. For example, a high number of false positives might indicate that your model is overly sensitive and is incorrectly labeling negative samples as positive. Conversely, a high number of false negatives suggests that your model might be missing out on detecting positive instances.

Armed with this detailed information, you can adjust your model's threshold or consider other techniques such as resampling your data, feature engineering, or trying different algorithms to improve your model's overall performance.

The Confusion Matrix not only guides model refinement but also aids in explaining model reliability to stakeholders who may not have a technical background, fostering better decision-making and trust in your predictive systems.

In what ways has understanding the Confusion Matrix improved your model performance? Share your experiences and insights below!

#MachineLearning #DataScience
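As a small illustrative addition (not from the original post), here is a minimal sketch of extracting the four cells of a binary confusion matrix with scikit-learn; the toy labels and predictions are assumptions.

```python
# Minimal sketch (assumed toy predictions): reading FP/FN counts from a confusion matrix.
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]   # hypothetical ground-truth labels
y_pred = [0, 1, 0, 0, 1, 0, 1, 0, 1, 1]   # hypothetical model outputs

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TN={tn}  FP={fp}  FN={fn}  TP={tp}")
# Many false positives -> the model over-calls the positive class (consider raising the threshold);
# many false negatives -> the model misses positives (consider lowering the threshold or resampling).
```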
Sr Data Scientist
This metric is actually interesting. Btw, would it make more sense to use another metric, like F1, to assess improvement instead?