Isaac Pacheco Rojas’ Post

Data Analytics/Science Engineer

Linear regression (LR) and generalized linear models (GLMs) are two great tools in data science, but understanding their differences is key to mastering their application and maximizing analytical insight. While both model relationships between variables, they differ in their assumptions, formulations, and handling of data variability.

- Assumptions: LR assumes a normally distributed, continuous response variable with constant variance. GLMs relax these assumptions, accommodating many response variable types and distributions through link functions.
- Adjustment to data variability: LR struggles with heteroscedasticity and non-normal errors, while GLMs handle them through an appropriate choice of error distribution and link function.
- Response types: GLMs also specialize in modeling categorical and count data, which LR cannot handle well.

By understanding these differences and how each model adjusts to data variability, analysts can make more informed decisions and draw reliable conclusions from their data.

You can check this post from Daily Dose of Data Science by Avi Chawla for a deeper understanding and a great reading resource!!
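The contrast above can be sketched in code. This is a minimal, numpy-only illustration (the data-generating process and function names are hypothetical, not from the article): it simulates count data whose mean is exponential in the predictors, fits plain least squares, and then fits a Poisson GLM with a log link via iteratively reweighted least squares (IRLS), the standard GLM fitting algorithm.

```python
# Illustrative sketch: LR vs. a Poisson GLM on count data.
# Assumed/hypothetical: the simulated data and the helper name fit_poisson_irls.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 0.8])

# Count outcome: the conditional mean is exp(X @ beta), so a log link fits.
y = rng.poisson(np.exp(X @ beta_true))

# Ordinary least squares: assumes a linear mean and constant variance,
# neither of which holds for this count outcome.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

def fit_poisson_irls(X, y, n_iter=25):
    """Poisson GLM (log link) via iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)            # inverse link: mean on original scale
        W = mu                           # Poisson weights: Var(y|x) = mu
        z = X @ beta + (y - mu) / mu     # working response on the link scale
        WX = X * W[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ z)  # weighted LS step
    return beta

beta_glm = fit_poisson_irls(X, y)
```

The GLM recovers the coefficients on the log-link scale, while OLS can only fit the best linear approximation to an exponential mean.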

Generalized Linear Models (GLMs): The Supercharged Linear Regression

dailydoseofds.com

Avi Chawla

Co-founder @ Daily Dose of Data Science (120k readers) | Follow to learn about Data Science, Machine Learning Engineering, and best practices in the field.

8mo

Wonderful breakdown, Isaac Pacheco Rojas. Glad you loved the article. This is how we can break down the assumptions of LR (depicted in the image):
- First, it assumes that the conditional distribution of Y given X is Gaussian.
- Next, it assumes a very specific form for the mean of that Gaussian: the mean should always be a linear combination of the features (or predictors).
- Lastly, it assumes a constant variance for the conditional distribution P(Y|X) across all levels of X.
Generalized linear models (GLMs) relax all of these assumptions, which makes them more adaptable to real-world datasets.

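The three assumptions listed above can be made concrete with a small numpy sketch (the simulated data is hypothetical): we generate Y|X as Gaussian with a linear mean and a constant variance, then check that OLS recovers both the coefficients and the single noise scale.

```python
# Illustrative sketch of the three LR assumptions: Gaussian Y|X,
# linear conditional mean, constant variance. Simulated data is assumed.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
X = np.column_stack([np.ones(n), rng.uniform(-2, 2, size=n)])
beta = np.array([1.0, 2.0])
sigma = 0.5                       # one variance for all levels of X

# Y | X ~ Normal(X @ beta, sigma^2): all three assumptions hold here.
y = X @ beta + rng.normal(scale=sigma, size=n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat
sigma_hat = resid.std(ddof=2)     # residual scale estimates the constant sigma
```

When any one of the three assumptions fails (e.g., the variance grows with X, or Y is a count), this recovery breaks down, which is exactly the gap GLMs fill.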
Toh Au Yu

Assistant Economist (Fast-Track Economist Scheme) | Incoming Master of Science in Statistics Student | Economics and Statistics Graduate

4mo

Actually, for the usual OLS regression, heteroskedasticity can be easily dealt with by simply using heteroskedasticity-robust standard errors. Another way is to use feasible generalized least squares (FGLS, which is different from GLM!) instead. Non-normality of errors is also not usually an issue for statistical inference if you have a large sample size as your estimators will still be asymptotically normally distributed (although it is true that GLMs can give a better fit, especially if the outcome variable is discrete). An additional fun fact is that OLS regression is just a special case of a GLM where the link function is an identity function and where the conditional distribution of the outcome variable (conditioned on the independent variables) is a normal distribution.
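Both points in this comment can be sketched with numpy alone (the simulated heteroskedastic data is hypothetical): first, HC1 "sandwich" robust standard errors for OLS, and second, that a Gaussian GLM with an identity link collapses to one unweighted least-squares step, i.e., it reproduces OLS exactly.

```python
# Illustrative sketch: (1) HC1 heteroskedasticity-robust standard errors,
# (2) OLS as the Gaussian/identity-link special case of a GLM.
# The data-generating process below is assumed for illustration.
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x = rng.uniform(0, 2, size=n)
X = np.column_stack([np.ones(n), x])
# Heteroskedastic errors: the noise scale grows with x.
y = 1.0 + 3.0 * x + rng.normal(scale=0.5 + x, size=n)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
XtX_inv = np.linalg.inv(X.T @ X)

# Classical standard errors (assume constant error variance).
s2 = resid @ resid / (n - 2)
se_classic = np.sqrt(np.diag(XtX_inv) * s2)

# HC1 robust (sandwich) standard errors: (X'X)^-1 X'diag(e^2)X (X'X)^-1,
# with the n/(n-k) small-sample correction.
meat = X.T @ (X * (resid**2)[:, None])
cov_hc1 = XtX_inv @ meat @ XtX_inv * n / (n - 2)
se_hc1 = np.sqrt(np.diag(cov_hc1))

# Gaussian GLM with identity link: the IRLS step has unit weights and a
# working response equal to y, so one step is exactly the OLS solution.
beta_glm = np.linalg.solve(X.T @ X, X.T @ y)
```

The robust and classical standard errors diverge under heteroskedasticity, while the GLM coefficients match OLS to machine precision, matching both claims in the comment.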

Alejandro Pereira

AI & Software Consulting 🤖 | Let's Build Together 🛠️

8mo

Very interesting, haven’t seen this breakdown before

Carlos Gutierrez

Proven systems to monetize your tech skills with freelance

8mo

Great to have a visual version too!


