Interpreting The Basic Outputs (SPSS) of Multiple Linear Regression
Chuda Dhakal
Institute of Agriculture and Animal Science
Abstract: Regression analysis is one of the most important tools available to researchers, but it can also be a complex, cumbersome and expensive undertaking, especially when it comes to obtaining the estimates correctly and interpreting them fully. We perceive a need for a more inclusive and thoughtful interpretation of multiple regression results generated through SPSS. The objective of this study is to demonstrate an in-depth interpretation of the basic multiple regression outputs, using a simulated example from the social sciences. The paper describes the steps for obtaining multiple regression output in SPSS (Version 20) and then interprets the resulting output in detail: the coefficients; the Model Summary table (R², adjusted R², and the standard error of the estimate); the statistical significance of the model from the ANOVA table; and the statistical significance of the individual predictors from the Coefficients table. Both the statistical and the substantive significance of the derived multiple regression model are explained. Care has been taken throughout so that the explanation can serve as a template for researchers working with their own real-life data. Because every effort has been made to interpret the basic multiple regression outputs from SPSS clearly, researchers in any field should find it easier to use multiple regression for better prediction of their outcome variable.

Keywords: Multiple regression, Regression outputs, R squared, Adjusted R squared, Standard error, Multicollinearity
Considering that none of the eight assumptions mentioned earlier has been violated, regression output for the given data was generated through the following steps in SPSS (Version 20).

Click 'Analyse', 'Regression', 'Linear'. Select 'hours per week' in the dependent variable box and 'no of children', 'wife's years of education' and 'husband's years of education' in the independent variable box. Select 'Enter' as the method [the default method for multiple linear regression analysis]. Click 'Statistics' and, in addition to the default selections 'Model fit' and 'Estimates', select 'R squared change', 'Confidence intervals', 'Part and partial correlations' and 'Collinearity diagnostics', then click 'Continue'. The outputs generated by this procedure (the Model Summary, ANOVA and Coefficients tables) are discussed and interpreted systematically in the results and discussion section that follows. At times the same output table is replicated in the discussion so that the relevant results are easy to see close by.
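For readers who wish to reproduce the same analysis outside SPSS, the following is a minimal sketch of fitting an equivalent model with Python's statsmodels package; the SPSS menu steps above remain the procedure used in this paper. The file name and column names are hypothetical placeholders and would need to match the actual data set.

```python
# Minimal sketch (not the SPSS procedure itself): fitting the same
# three-predictor linear regression with statsmodels.
# "housework.csv" and the column names are hypothetical placeholders.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("housework.csv")

model = smf.ols(
    "hours_per_week ~ no_of_children + wife_years_educ + husband_years_educ",
    data=df,
).fit()

# Covers the same ground as the SPSS Model Summary, ANOVA and Coefficients tables:
print(model.summary())   # R-squared, adjusted R-squared, F-test, coefficients, t, Sig.
```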
3. Results and Discussion

Our research question for the multiple linear regression is: Can we explain the outcome variable, hours per week that a husband spends on house work, with the given independent variables no of children, wife's years of education and husband's years of education?

Determining how well the model fits
The first table of interest is the Model Summary (Table 2). This table provides R, R², adjusted R², and the standard error of the estimate, which can be used to determine how well the regression model fits the data.

Table 2: Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .925a   .856       .803                .547
a. Predictors: (Constant), husband's years of education, wife's years of education, no of children

The "R" column reports the multiple correlation coefficient, R, which can be considered one measure of the quality of the prediction of the dependent variable, in this case hours per week. The value of .925 in this example indicates a good level of prediction. The "R Square" (R²) column reports the coefficient of determination, the proportion of variance in the dependent variable that is explained by the independent variables; the value of .856 means that the three predictors together account for 85.6% of the variability in hours per week.

According to Frost (2017), two caveats apply to R²: small R-squared values are not always a problem, and high R-squared values are not necessarily good. For an outcome variable such as human behaviour, which is very hard to predict, a high R-squared is almost impossible, yet that does not make a model for such an outcome useless. A good model can have a low R² value; on the other hand, a biased model can have a high R² value, and a variety of other circumstances can artificially inflate R².

Interpreting the "Adjusted R Square" (adj. R²) correctly is another important part of reporting the results. The value of .803 (Model Summary table) in this example indicates that about 80.3% of the variation in the outcome variable is explained by the predictors retained in the model, after adjusting for the number of predictors. A large discrepancy between R-squared and adjusted R-squared indicates a poor fit of the model. Adding a useless variable to a model decreases adjusted R-squared, whereas adding a useful variable increases it. Adjusted R² is always less than or equal to R², because it adjusts for the number of terms in the model. Since R² never decreases as more terms are added, a model can appear to fit better simply because it has more terms; adjusted R² penalizes this and keeps the fit statistic from being completely misleading.

Stephanie (2018) cautions about how to differentiate between R² and adjusted R². R² shows how well the data points fit the regression line under the assumption that every variable in the model explains variation in the dependent variable, which need not be true; adjusted R² shows the percentage of variation explained only by the independent variables that actually affect the dependent variable. In addition, Interpreting and Applying a Multiple Regression Model (n.d.) notes that adjusted R² is intended to "control for" overestimates of the population R² resulting from small samples, high collinearity or small subject-to-variable ratios, and that its perceived utility varies greatly across research areas and over time.
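The adjusted R² reported in Table 2 can be verified directly from R², the sample size and the number of predictors. A short check using the values from this example (n = 12, since the ANOVA total has 11 degrees of freedom, and k = 3 predictors):

```python
# Adjusted R-squared recomputed from R-squared, the sample size n and the
# number of predictors k (values from Table 2 and the ANOVA table: n = 12, k = 3).
r2, n, k = 0.856, 12, 3
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj_r2, 3))  # 0.802, matching the reported .803 up to rounding of R-squared
```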
The standard error of the estimate (.55 in this example) is a measure of the precision of the model: it is the standard deviation of the residuals, and it shows how wrong one could expect to be when using the regression model to predict or estimate the dependent variable. As R² increases, the standard error decreases. On average, predictions of hours per week from this model will be off by about .55 hours, which is not a negligible amount given the scale of hours per week; hence the standard error is wished to be as small as possible. The standard error is also used to construct confidence intervals for the predicted values.

Correlated predictors (multicollinearity) can inflate the standard errors of the estimated regression coefficients. However, even in the presence of multicollinearity the regression can still be precise if the "magnified" standard errors remain small enough.

Statistical significance of the model
The F-ratio in the ANOVA (Table 3) tests whether the overall regression model is a good fit for the data. The table shows that the independent variables statistically significantly predict the dependent variable, F (3, 8) = 15.907, p (.001) < .05; i.e., the regression model is a good fit to the data.

Table 3: ANOVAa
Model          Sum of Squares   df   Mean Square   F        Sig.
1  Regression  14.274           3    4.758         15.907   .001b
   Residual    2.393            8    .299
   Total       16.667           11
a. Dependent Variable: hours per week
b. Predictors: (Constant), husband's years of education, wife's years of education, no. of children
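As a quick arithmetic check, the mean squares, the F-ratio and the standard error of the estimate all follow from the sums of squares and degrees of freedom in Table 3:

```python
import math

# Recomputing the ANOVA quantities in Table 3 from the sums of squares.
ss_regression, df_regression = 14.274, 3
ss_residual, df_residual = 2.393, 8

ms_regression = ss_regression / df_regression    # 4.758
ms_residual = ss_residual / df_residual          # 0.299
f_ratio = ms_regression / ms_residual            # about 15.9, as reported

# The standard error of the estimate in Table 2 is the square root of the
# residual mean square: sqrt(0.299) is approximately 0.547.
print(round(ms_regression, 3), round(ms_residual, 3), round(f_ratio, 2),
      round(math.sqrt(ms_residual), 3))
```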
Statistical significance of the independent variables
The test of statistical significance for each independent variable asks whether its unstandardized (or standardized) coefficient is equal to 0 (zero) in the population (i.e. for each coefficient, H0: β = 0 versus Ha: β ≠ 0 is tested). If p < .05, the coefficient is statistically significantly different from 0 (zero). These significance tests are useful for investigating whether each explanatory variable needs to be in the model, given that the others are already there.
Table 4: Coefficientsa
Model                          B       Std. Error   Beta     t        Sig.   Zero-order   Partial   Part    Tolerance   VIF
(Constant)                     2.021   1.681                 1.203    .263
No. of children                .367    .185         .348     1.984    .082   .759         .574      .266    .584        1.711
Wife's years of education      .271    .080         .491     3.386    .010   .641         .767      .454    .853        1.173
Husband's years of education   -.211   .081         -.425    -2.584   .032   -.653        -.675     -.346   .663        1.509
a. Dependent Variable: hours per week. B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient; Zero-order, Partial and Part are correlations; Tolerance and VIF are collinearity statistics.
The t-values and corresponding p-values appear in the "t" and "Sig." columns of Table 4. In this example, the tests tell us that wife's years of education (p = .010 < .05) and husband's years of education (p = .032 < .05) are significant, but no of children is not (p = .082 > .05). This means that the explanatory variable no of children is no longer useful in the model once the other two variables are already in it; in other words, with wife's years of education and husband's years of education in the model, no of children adds no substantial further contribution to explaining hours per week.

Like the standard error of the model fit discussed above, the standard errors of the coefficients in the regression output are also wished to be as small as possible: each one reflects how wrong one could be when estimating the corresponding coefficient. For instance, in this example the standard error .080 is small relative to the coefficient .271 of wife's years of education.

Estimated model coefficients
The general form of the equation to predict hours per week from no of children, wife's years of education, and husband's years of education is:

Predicted hours per week = 2.021 + 0.367 (no of children) + 0.271 (wife's years of education) – 0.211 (husband's years of education)

This is obtained from Table 5 below:
Table 5: Coefficientsa (replicated for ease of reference)
Model                          B       Std. Error   Beta     t        Sig.   Zero-order   Partial   Part    Tolerance   VIF
(Constant)                     2.021   1.681                 1.203    .263
No. of children                .367    .185         .348     1.984    .082   .759         .574      .266    .584        1.711
Wife's years of education      .271    .080         .491     3.386    .010   .641         .767      .454    .853        1.173
Husband's years of education   -.211   .081         -.425    -2.584   .032   -.653        -.675     -.346   .663        1.509
a. Dependent Variable: hours per week
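To illustrate how the estimated equation is used, the sketch below computes a predicted value for one hypothetical household; the input values are invented purely for illustration.

```python
# Applying the estimated equation; the example inputs are hypothetical.
def predict_hours_per_week(no_of_children, wife_years_educ, husband_years_educ):
    return (2.021
            + 0.367 * no_of_children
            + 0.271 * wife_years_educ
            - 0.211 * husband_years_educ)

# e.g. 2 children, wife with 14 years of education, husband with 12 years
print(round(predict_hours_per_week(2, 14, 12), 2))  # about 4.02 hours per week
```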
The constant, 2.021, is the predicted value of the dependent variable, hours per week, when all independent variables equal zero: no of children = 0, wife's years of education = 0 and husband's years of education = 0. That is, we would expect a husband to spend an average of 2.021 hours per week on house work when all predictor variables take the value 0. For this reason, this is only a meaningful interpretation if it is reasonable for the predictors to take the value 0 in practice.
Besides, Karen (2018) notes that the data set should also contain predictor values at or near 0. If both of these conditions fail, the constant term (the y-intercept) of the regression line has no meaningful interpretation. However, because the y-intercept places the regression line in the right place, it is always retained when presenting the regression model. In this example, it is easy to see that no of children is sometimes 0 in the data set, but wife's years of education and husband's years of education are never close to 0, so our intercept has no real interpretation.

Unstandardized coefficients indicate how much the dependent variable changes with an independent variable when all other independent variables are held constant. The regression coefficient gives the expected change in the dependent variable (here: hours per week) for a one-unit increase in the independent variable. Referring to the coefficients in Table 5 above, the unstandardized coefficient for no of children is 0.367: for every one-child increase in no of children, hours per week is expected to increase by 0.367 hours. Each one-year increase in husband's years of education, by contrast, corresponds to a reduction (hence the negative sign of the coefficient) of 0.211 hours in hours per week.

Standardized coefficients, given in the "Beta" column, are also called beta weights. A beta weight measures how many standard deviations the outcome variable changes when the predictor variable increases by one standard deviation, with the other variables in the model held constant. Beta weights are useful for ranking the predictor variables by their contribution (irrespective of sign) to explaining the outcome variable. In this case, wife's years of education is the highest contributing predictor of hours per week (.491), followed by husband's years of education (-.425). However, as Stephanie (2018) explains, this ranking is reliable only when the model is specified correctly and there is no multicollinearity among the predictors.
Zero-order, partial and part correlation

Table 6: Coefficientsa (replicated for ease of reference)
Model                          B       Std. Error   Beta     t        Sig.   Zero-order   Partial   Part    Tolerance   VIF
(Constant)                     2.021   1.681                 1.203    .263
No. of children                .367    .185         .348     1.984    .082   .759         .574      .266    .584        1.711
Wife's years of education      .271    .080         .491     3.386    .010   .641         .767      .454    .853        1.173
Husband's years of education   -.211   .081         -.425    -2.584   .032   -.653        -.675     -.346   .663        1.509
a. Dependent Variable: hours per week
Zero-order correlations are the bivariate correlations between the predictors and the dependent variable. Hence .759 in this example is the direct association of no of children with hours per week, ignoring the other two predictor variables that may or may not be influencing the dependent variable.

When the effects of the other two independent variables on both no of children and hours per week are accounted for (held constant), the correlation becomes weaker, .574; this is the partial correlation. For the same predictor, the part correlation, .266, is the correlation between no of children and hours per week after the effect of the other two independent variables has been removed from no of children.

From a causal perspective, this means (in this example) that if we change no of children, we change the other variables too. When we model the total effect of no of children on hours per week, we therefore have to account both for the direct effect (which appears to be strong) and for the indirect effect of no of children operating through the other variables, which in turn influence hours per week. When the strong direct effect is combined with these indirect effects, the overall impact ends up "weaker".

Because part correlations are computed with the effect of the other predictors excluded, they are helpful for judging whether using multiple regression was beneficial, i.e. for estimating how much predictive ability was gained by combining the predictors in the model. In this example, the part coefficients of determination (which SPSS does not produce directly) are (.266)², (.454)² and (-.346)² for no of children, wife's years of education and husband's years of education, respectively. These unique contributions of the predictors add up to approximately (7.1 + 20.6 + 12.0) = 39.7% of the variation in the outcome variable. This percentage is well below the R-squared value of the model (85.6%), meaning that (85.6 − 39.7 = 45.9) about 46% of the predictive work is shared (overlapping) among the predictors, which is not bad at all; it indicates that the combination of variables has worked quite well.
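The partial and part correlations reported by SPSS can also be recovered from the t-values in the Coefficients table together with R² and the residual degrees of freedom, using pr = t / sqrt(t² + df_res) and sr = t × sqrt((1 - R²) / df_res). A short check against Table 6:

```python
import math

# Partial and part (semipartial) correlations recovered from the t-values in
# Table 6, together with R-squared and the residual degrees of freedom.
r2, df_residual = 0.856, 8
t_values = {
    "no of children": 1.984,
    "wife's years of education": 3.386,
    "husband's years of education": -2.584,
}

for name, t in t_values.items():
    partial = t / math.sqrt(t ** 2 + df_residual)
    part = t * math.sqrt((1 - r2) / df_residual)
    print(f"{name}: partial = {partial:.3f}, part = {part:.3f}")
# Closely reproduces the Partial and Part columns of Table 6
# (.574/.266, .767/.454 and -.675/-.346), up to rounding of the inputs.
```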
The information in Table 6 above also allows us to check for multicollinearity. A common rule of thumb is that any predictor with VIF > 10 should be examined for a possible multicollinearity problem (Dhakal, 2016). In our multiple linear regression model, VIF should be below 10 (or, equivalently, Tolerance above 0.1) for all variables, and it is.
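For readers reproducing the analysis in Python rather than SPSS, the Tolerance and VIF columns of Table 6 could be computed as sketched below; the file name and column names are the same hypothetical placeholders used in the earlier fitting example.

```python
# Sketch: computing VIF and Tolerance for each predictor.  The file name and
# column names are hypothetical placeholders, as in the earlier example.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("housework.csv")
predictors = df[["no_of_children", "wife_years_educ", "husband_years_educ"]]
X = sm.add_constant(predictors)

for i, name in enumerate(predictors.columns, start=1):  # column 0 of X is the constant
    vif = variance_inflation_factor(X.values, i)
    print(name, round(vif, 3), round(1 / vif, 3))  # VIF and Tolerance = 1/VIF
```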
4. Summary

Putting all of the above together, the results could be written up as follows:

A multiple regression was run to predict the hours per week a husband spends on house work from no of children, wife's years of education and husband's years of education. The model statistically significantly predicted hours per week, F (3, 8) = 15.907, p (.001) < .05, R² = 0.856. Out of three only