Many predictive models shine in their initial training, but when you start using them, they flop. One of the key reasons is overfitting. Hear more about it below ▶️
I think it's always better to explain overfitting through the bias-variance trade-off, since it is not always about a lack of data (in many real-world problems, such as cyber and finance, temporality plays a major role). Cool video!
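To make the temporal point concrete, here is a minimal sketch (assuming scikit-learn is available; the arrays are hypothetical stand-ins) of a time-ordered split, which avoids leaking future information into training the way a random split would:

```python
# Minimal sketch (assumes scikit-learn): when data are temporal,
# random shuffling leaks future information into the training folds.
# TimeSeriesSplit keeps every validation fold strictly later in time.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(100).reshape(-1, 1)        # stand-in for time-ordered features
y = np.random.RandomState(0).rand(100)   # stand-in targets

tscv = TimeSeriesSplit(n_splits=5)
for fold, (train_idx, val_idx) in enumerate(tscv.split(X)):
    # Training indices always precede validation indices.
    print(f"fold {fold}: train up to {train_idx[-1]}, "
          f"validate {val_idx[0]}..{val_idx[-1]}")
```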
Zohar Bronfman - Thanks! BTW - in addition to ensuring a sufficiently large dataset and a meaningful percentage of positive targets, I would also recommend holding out enough subsets of the data to facilitate cross-validation. This helps ensure that the model generalizes well to unseen data. Thanks for sharing.
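One way to get both the class balance and the cross-validation subsets at once is stratified k-fold splitting, which keeps the percentage of positive targets roughly constant in every fold. A minimal sketch, assuming scikit-learn and a hypothetical binary-target dataset:

```python
# Minimal sketch (assumes scikit-learn): StratifiedKFold preserves the
# positive-class proportion across all cross-validation folds.
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.RandomState(0)
X = rng.rand(200, 4)                    # hypothetical feature matrix
y = (rng.rand(200) < 0.2).astype(int)   # ~20% positive targets

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    # Each validation fold keeps roughly the same positive rate.
    print(f"fold {fold}: positive rate in val = {y[val_idx].mean():.2f}")
```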
Zohar Bronfman Overfitting remains a critical challenge in predictive modeling. Great post, and your thoughts on it are spot on.
AI expert with 20 years industry experience and PhD in machine learning
Good introduction, and good point about the risk of class imbalance in training data. But I wouldn't want people to think there's a magic amount of training data that will save them from overfitting. What "too little training data" means depends on the complexity of your model: the more complex the model is (i.e., the more parameters it has), the more training data is required to avoid overfitting. With today's deep learning models, thousands of training samples may be totally inadequate. When in doubt, cross-validate. It's the best way to get a handle on the extent of overfitting (because there will always be some).
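As a concrete illustration of the "when in doubt, cross-validate" advice, here is a minimal sketch (assuming scikit-learn; the data is synthetic) that reads off the extent of overfitting as the gap between training accuracy and cross-validated accuracy:

```python
# Minimal sketch (assumes scikit-learn): compare training accuracy with
# cross-validated accuracy; a large gap is a direct read on overfitting.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.rand(300, 10)
y = (X[:, 0] + 0.3 * rng.randn(300) > 0.5).astype(int)  # noisy labels

model = DecisionTreeClassifier(random_state=0)  # unconstrained depth: high capacity
model.fit(X, y)
train_acc = model.score(X, y)                   # near 1.0: the tree memorizes
cv_acc = cross_val_score(model, X, y, cv=5).mean()

print(f"train accuracy:     {train_acc:.2f}")
print(f"5-fold CV accuracy: {cv_acc:.2f}")      # noticeably lower -> overfitting
```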