Isaac Pacheco Rojas’ Post

Data Analytics/Science Engineer

Linear regression (LR) and generalized linear models (GLMs) are two great tools in data science, but understanding their differences is key to mastering their application and maximizing analytical insight. While both model relationships between variables, they differ in their assumptions, formulations, and handling of data variability.

- Assumptions: LR assumes a normally distributed, continuous response variable with constant variance. GLMs relax these assumptions, accommodating many response variable types and distributions through link functions.
- Adjustment to data variability: LR struggles with heteroscedasticity and non-normal errors, while GLMs handle them through an appropriate choice of error distribution and link function.
- Response types: GLMs also specialize in modeling categorical and count data, which LR cannot handle well.

By understanding these differences and how each model adjusts to data variability, analysts can make more informed decisions and draw reliable conclusions from their data.

You can check this post from Daily Dose of Data Science by Avi Chawla for a deeper understanding and a great reading resource!!
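The contrast above can be sketched in code. This is a minimal, numpy-only illustration (the data-generating process and function names are hypothetical, not from the article): it simulates count data whose mean is exponential in the predictors, fits plain least squares, and then fits a Poisson GLM with a log link via iteratively reweighted least squares (IRLS), the standard GLM fitting algorithm.

```python
# Illustrative sketch: LR vs. a Poisson GLM on count data.
# Assumed/hypothetical: the simulated data and the helper name fit_poisson_irls.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 0.8])

# Count outcome: the conditional mean is exp(X @ beta), so a log link fits.
y = rng.poisson(np.exp(X @ beta_true))

# Ordinary least squares: assumes a linear mean and constant variance,
# neither of which holds for this count outcome.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

def fit_poisson_irls(X, y, n_iter=25):
    """Poisson GLM (log link) via iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)            # inverse link: mean on original scale
        W = mu                           # Poisson weights: Var(y|x) = mu
        z = X @ beta + (y - mu) / mu     # working response on the link scale
        WX = X * W[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ z)  # weighted LS step
    return beta

beta_glm = fit_poisson_irls(X, y)
```

The GLM recovers the coefficients on the log-link scale, while OLS can only fit the best linear approximation to an exponential mean.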

Generalized Linear Models (GLMs): The Supercharged Linear Regression

dailydoseofds.com

Avi Chawla

Co-founder @ Daily Dose of Data Science (120k readers) | Follow to learn about Data Science, Machine Learning Engineering, and best practices in the field.

8mo

Wonderful breakdown, Isaac Pacheco Rojas. Glad you loved the article. This is how we can break down the assumptions of LR (depicted in the image):
- First, it assumes that the conditional distribution of Y given X is Gaussian.
- Next, it assumes a very specific form for the mean of that Gaussian: the mean should always be a linear combination of the features (or predictors).
- Lastly, it assumes a constant variance for the conditional distribution P(Y|X) across all levels of X.
Generalized linear models (GLMs) relax all of these assumptions, which makes them more adaptable to real-world datasets.

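The three assumptions listed above can be made concrete with a small numpy sketch (the simulated data is hypothetical): we generate Y|X as Gaussian with a linear mean and a constant variance, then check that OLS recovers both the coefficients and the single noise scale.

```python
# Illustrative sketch of the three LR assumptions: Gaussian Y|X,
# linear conditional mean, constant variance. Simulated data is assumed.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
X = np.column_stack([np.ones(n), rng.uniform(-2, 2, size=n)])
beta = np.array([1.0, 2.0])
sigma = 0.5                       # one variance for all levels of X

# Y | X ~ Normal(X @ beta, sigma^2): all three assumptions hold here.
y = X @ beta + rng.normal(scale=sigma, size=n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat
sigma_hat = resid.std(ddof=2)     # residual scale estimates the constant sigma
```

When any one of the three assumptions fails (e.g., the variance grows with X, or Y is a count), this recovery breaks down, which is exactly the gap GLMs fill.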
Toh Au Yu

Assistant Economist (Fast-Track Economist Scheme) | Incoming Master of Science in Statistics Student | Economics and Statistics Graduate

4mo

Actually, for the usual OLS regression, heteroskedasticity can be easily dealt with by simply using heteroskedasticity-robust standard errors. Another way is to use feasible generalized least squares (FGLS, which is different from GLM!) instead. Non-normality of errors is also not usually an issue for statistical inference if you have a large sample size as your estimators will still be asymptotically normally distributed (although it is true that GLMs can give a better fit, especially if the outcome variable is discrete). An additional fun fact is that OLS regression is just a special case of a GLM where the link function is an identity function and where the conditional distribution of the outcome variable (conditioned on the independent variables) is a normal distribution.
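Both points in this comment can be sketched with numpy alone (the simulated heteroskedastic data is hypothetical): first, HC1 "sandwich" robust standard errors for OLS, and second, that a Gaussian GLM with an identity link collapses to one unweighted least-squares step, i.e., it reproduces OLS exactly.

```python
# Illustrative sketch: (1) HC1 heteroskedasticity-robust standard errors,
# (2) OLS as the Gaussian/identity-link special case of a GLM.
# The data-generating process below is assumed for illustration.
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x = rng.uniform(0, 2, size=n)
X = np.column_stack([np.ones(n), x])
# Heteroskedastic errors: the noise scale grows with x.
y = 1.0 + 3.0 * x + rng.normal(scale=0.5 + x, size=n)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
XtX_inv = np.linalg.inv(X.T @ X)

# Classical standard errors (assume constant error variance).
s2 = resid @ resid / (n - 2)
se_classic = np.sqrt(np.diag(XtX_inv) * s2)

# HC1 robust (sandwich) standard errors: (X'X)^-1 X'diag(e^2)X (X'X)^-1,
# with the n/(n-k) small-sample correction.
meat = X.T @ (X * (resid**2)[:, None])
cov_hc1 = XtX_inv @ meat @ XtX_inv * n / (n - 2)
se_hc1 = np.sqrt(np.diag(cov_hc1))

# Gaussian GLM with identity link: the IRLS step has unit weights and a
# working response equal to y, so one step is exactly the OLS solution.
beta_glm = np.linalg.solve(X.T @ X, X.T @ y)
```

The robust and classical standard errors diverge under heteroskedasticity, while the GLM coefficients match OLS to machine precision, matching both claims in the comment.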

Alejandro Pereira

AI & Software Consulting 🤖 | Let's Build Together 🛠️

8mo

Very interesting, haven’t seen this breakdown before

Carlos Gutierrez

Proven systems to monetize your tech skills with freelance

8mo

Great to have a visual version too!


