One of the most common questions I get is "𝐌𝐲 𝐩𝐫𝐞𝐝𝐢𝐜𝐭𝐢𝐯𝐞 𝐦𝐨𝐝𝐞𝐥 𝐢𝐬𝐧'𝐭 𝐰𝐨𝐫𝐤𝐢𝐧𝐠 𝐰𝐞𝐥𝐥 𝐞𝐧𝐨𝐮𝐠𝐡...𝐰𝐡𝐚𝐭 𝐬𝐡𝐨𝐮𝐥𝐝 𝐈 𝐝𝐨?" If model performance is disappointing, there are three main levers we can pull to try to improve it.

🔷 The first lever is changing the data the model uses. We can add more features or transform the features we've already included. In my experience, this is the most powerful of the three.

🔷 Another lever is changing the type of model or the type of feature selection. If a regression model isn't working well, we can try a decision tree, for example. We can also use a penalized regression model that performs feature selection automatically during fitting.

🔷 The third lever is tuning the hyperparameters. A hyperparameter is like a setting knob on a model: it controls how the model learns, so different hyperparameter values can produce models with very different results.

Any combination of these three levers may be used to improve model performance. Depending on what data is accessible, it may not be feasible to add more features, so the data scientist may have to rely on model selection and hyperparameter tuning to improve the quality of predictions.
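Here's a minimal sketch of what pulling all three levers can look like with scikit-learn. The synthetic dataset, pipeline steps, and parameter grid are illustrative choices, not a recipe:

```python
# Lever 1: change the data (polynomial/interaction terms, rescaling).
# Lever 2: change the model (penalized Lasso regression, which zeroes out
#          unhelpful coefficients -- feature selection during fitting).
# Lever 3: tune hyperparameters (polynomial degree, penalty strength).
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Lasso
from sklearn.tree import DecisionTreeRegressor

# Synthetic data stands in for whatever your real features are.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([
    ("poly", PolynomialFeatures(include_bias=False)),   # lever 1
    ("scale", StandardScaler()),                         # lever 1
    ("model", Lasso(max_iter=10000)),                     # lever 2
])

grid = GridSearchCV(                                      # lever 3
    pipe,
    param_grid={"poly__degree": [1, 2], "model__alpha": [0.01, 0.1, 1.0]},
    cv=5,
)
grid.fit(X_train, y_train)
print("best params:", grid.best_params_)
print("held-out R^2:", grid.score(X_test, y_test))

# Swapping the model family entirely (e.g. a decision tree) is just a
# different estimator -- another flavor of lever 2.
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_train, y_train)
print("tree held-out R^2:", tree.score(X_test, y_test))
```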
“Working well” can mean different things to different people. I was working on a predictive model of a surgical complication with a rate of a few percent. Using calibration curves, I was able to show that predicted and actual rates agreed well up to 20-25 percent, but not beyond. While I was initially disappointed, collaborators were delighted - the model could identify who was at relatively high risk!
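For anyone who hasn't built one, a calibration curve just bins predicted probabilities and compares each bin's mean prediction to the observed event rate. A minimal sketch, assuming a binary outcome and scikit-learn; the synthetic imbalanced data and logistic model are placeholders, not the surgical model described above:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.calibration import calibration_curve

# Imbalanced outcome (~5% positives) to mimic a low-rate complication.
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
probs = clf.predict_proba(X_test)[:, 1]

# Quantile bins: compare observed event rate to mean predicted risk per bin.
frac_pos, mean_pred = calibration_curve(y_test, probs, n_bins=10,
                                        strategy="quantile")
for p, f in zip(mean_pred, frac_pos):
    print(f"predicted {p:.2f}  observed {f:.2f}")
```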
Agree with #1 especially. Getting the data right - even just cleaner - can make a big difference.