The Multiple Linear Regression Model: Version: 30-10-2023, 16:07
1 Introduction
The multiple linear regression model and its estimation using ordinary
least squares (OLS) is doubtless the most widely used tool in econometrics.
It allows us to estimate the relation between a dependent variable and a set
of explanatory variables. Prototypical examples in econometrics are:
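\[
\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_N \end{pmatrix}}_{N \times 1}
=
\underbrace{\begin{pmatrix}
1 & x_{11} & \cdots & x_{1K} \\
1 & x_{21} & \cdots & x_{2K} \\
1 & x_{31} & \cdots & x_{3K} \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_{N1} & \cdots & x_{NK}
\end{pmatrix}}_{N \times (K+1)}
\underbrace{\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_K \end{pmatrix}}_{(K+1) \times 1}
+
\underbrace{\begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ \vdots \\ u_N \end{pmatrix}}_{N \times 1}
\]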
OLS1: Linearity
\[ y_i = x_i'\beta + u_i \quad \text{and} \quad E[u_i] = 0 \]
3 Short Guides to Microeconometrics
OLS2: Independence
\[ \{x_i, y_i\}_{i=1}^{N} \ \text{i.i.d. (independent and identically distributed)} \]
OLS2 means that the observations are independently and identically
distributed. This assumption is in practice guaranteed by random sampling.
OLS3: Exogeneity
a) $u_i | x_i \sim N(0, \sigma_i^2)$
b) $u_i \perp\!\!\!\perp x_i$ (independent)
c) $E[u_i | x_i] = 0$ (mean independent)
d) $Cov[x_i, u_i] = 0$ (uncorrelated)
OLS3a assumes that the error term is normally distributed conditional
on the explanatory variables. OLS3b means that the error term is
independent of the explanatory variables. OLS3c states that the mean of
the error term is independent of the explanatory variables. OLS3d means
that the error term and the explanatory variables are uncorrelated. Either
OLS3a or OLS3b implies OLS3c and OLS3d. OLS3c implies OLS3d.
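The last implication can be checked with the law of iterated expectations.
A short derivation, written for a single regressor $x_i$ to keep the
notation light and using $E[u_i] = 0$ from OLS1:

\[
\begin{aligned}
Cov[x_i, u_i] &= E[x_i u_i] - E[x_i] \, E[u_i] \\
              &= E\big[ x_i \, E[u_i | x_i] \big] - E[x_i] \cdot 0 \\
              &= E[x_i \cdot 0] = 0 .
\end{aligned}
\]

The converse does not hold in general: uncorrelated errors need not be
mean independent of the regressors.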
OLS5: Identifiability
\[ E[x_i x_i'] = Q_{XX} \ \text{is positive definite and finite} \]
\[ rank(X) = K + 1 < N \]
OLS5 assumes that the regressors are not perfectly collinear, i.e. no
variable is a linear combination of the others. For example, there can only
be one constant. Intuitively, OLS5 means that every explanatory variable
adds information. OLS5 also assumes that all regressors (except the
constant) have strictly positive variance, both in expectation and in
the sample, and not too many extreme values.
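The rank condition can be checked numerically. The following is a minimal
sketch in plain Python, not part of the original handout; the function
name rank and the two small design matrices are made up for illustration.
The second matrix contains two constant columns and therefore violates
OLS5.

```python
def rank(matrix, tol=1e-10):
    """Rank of a matrix (list of rows) by Gaussian elimination with pivoting."""
    rows = [list(map(float, r)) for r in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        # pick the largest remaining entry in column c as pivot
        pivot = max(range(r, n_rows), key=lambda i: abs(rows[i][c]))
        if abs(rows[pivot][c]) < tol:
            continue  # no pivot in this column: it depends on earlier columns
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][c] / rows[r][c]
            for j in range(c, n_cols):
                rows[i][j] -= factor * rows[r][j]
        r += 1
        if r == n_rows:
            break
    return r


x = [1, 2, 3, 4, 5]                 # hypothetical regressor values
X_ok = [[1, xi] for xi in x]        # constant and x
X_bad = [[1, 2, xi] for xi in x]    # two constants: second column = 2 * first

print(rank(X_ok) == len(X_ok[0]))   # full column rank, OLS5 holds
print(rank(X_bad) == len(X_bad[0])) # rank deficient, OLS5 fails
```

With a rank-deficient X, the matrix X'X is singular and the coefficients
are not separately identified.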
\[ S(\beta) = \sum_{i=1}^{N} (y_i - x_i'\beta)^2 = (y - X\beta)'(y - X\beta) \to \min_{\beta} \]
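Setting the derivative of S(β) with respect to β to zero gives the normal
equations X'Xb = X'y, which have a unique solution when X has full column
rank (OLS5). The sketch below solves them in plain Python; the helper
names solve and ols_fit and the noise-free data are assumptions chosen
for illustration, not the handout's own implementation.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    # augmented matrix [a | b]
    m = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) < 1e-12:
            raise ValueError("X'X is singular: regressors are perfectly collinear")
        m[c], m[pivot] = m[pivot], m[c]
        for i in range(n):
            if i != c:
                factor = m[i][c] / m[c][c]
                for j in range(c, n + 1):
                    m[i][j] -= factor * m[c][j]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_fit(x_rows, y):
    """OLS coefficients (b_0, ..., b_K); a constant is prepended to each row."""
    X = [[1.0] + list(row) for row in x_rows]
    k = len(X[0])
    xtx = [[sum(r[p] * r[q] for r in X) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(X, y)) for p in range(k)]
    return solve(xtx, xty)  # solves X'X b = X'y


# Hypothetical data generated without noise from y = 1 + 2*x1 - 3*x2
x_rows = [(0, 1), (1, 0), (2, 2), (3, 1), (4, 3)]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in x_rows]
beta_hat = ols_fit(x_rows, y)
print([round(b, 6) for b in beta_hat])
```

Because the example data contain no error term, the fitted coefficients
reproduce the generating values up to floating-point error. In applied
work one would rely on a numerically robust library routine instead of
hand-written elimination.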
4 Goodness-of-fit
[Figure: data points together with the lines E(y|x) and OLS; horizontal axis x, vertical axis y.]
Assuming OLS1, OLS2, OLS3a, OLS4, and OLS5, the following properties
can be established for finite, i.e. even small, samples.
\[ E[\hat\beta | X] = \beta \]
with variance $V[\hat\beta | X] = \sigma^2 (X'X)^{-1}$ under homoscedasticity (OLS4a)
and $V[\hat\beta | X] = \sigma^2 (X'X)^{-1} X' \Omega X (X'X)^{-1}$ under known
heteroscedasticity.
\[ \hat{V}(\hat\beta | X) = \hat\sigma^2 (X'X)^{-1} \]
with
\[ \hat\sigma^2 = \frac{\hat{u}'\hat{u}}{N - K - 1} . \]
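For a single regressor plus a constant (K = 1), the diagonal elements of
the estimated variance matrix have simple closed forms. The sketch below
computes them in plain Python; the function name simple_ols_se and the
five data points are made up for illustration.

```python
import math


def simple_ols_se(x, y):
    """OLS estimates, error variance and standard errors for y = b0 + b1*x + u."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # residual sum of squares divided by N - K - 1, here N - 2
    sigma2_hat = sum(u ** 2 for u in resid) / (n - 2)
    # diagonal elements of sigma2_hat * (X'X)^{-1} for K = 1
    se_b0 = math.sqrt(sigma2_hat * (1 / n + x_bar ** 2 / sxx))
    se_b1 = math.sqrt(sigma2_hat / sxx)
    return b0, b1, sigma2_hat, se_b0, se_b1


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, sigma2_hat, se_b0, se_b1 = simple_ols_se(x, y)
print(round(b0, 4), round(b1, 4), round(sigma2_hat, 4), round(se_b0, 4), round(se_b1, 4))
```

These standard errors are only valid under homoscedasticity (OLS4a).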
\[ t = \frac{\hat\beta_k - q}{\widehat{se}[\hat\beta_k]} \sim t_{N-K-1} \]
interval is given by
\[ \left[ \hat\beta_k - t_{(1-\alpha/2),(N-K-1)} \, \widehat{se}[\hat\beta_k] \; , \; \hat\beta_k + t_{(1-\alpha/2),(N-K-1)} \, \widehat{se}[\hat\beta_k] \right] \]
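As a worked instance, the t-statistic and the 95% confidence interval can
be recomputed from the estimate and standard error that the R output
further below reports for the coefficient on weight. This is a sketch
under one loudly stated assumption: the critical value is hard-coded from
a t table, because Python's standard library has no t quantile function.

```python
# Values copied from the R output for the coefficient on weight
beta_hat = -0.006567   # estimate
se_hat = 0.001166      # standard error
df = 71                # N - K - 1 = 74 - 2 - 1
q = 0.0                # hypothesised value under H0

t_stat = (beta_hat - q) / se_hat

# Assumption: 0.975 quantile of the t distribution with 71 degrees of
# freedom, taken from a t table (approximately 1.9939).
t_crit = 1.9939

ci_lower = beta_hat - t_crit * se_hat
ci_upper = beta_hat + t_crit * se_hat
reject_h0 = abs(t_stat) > t_crit   # two-sided test at the 5% level

print(round(t_stat, 3), round(ci_lower, 6), round(ci_upper, 6), reject_h0)
```

Small differences from the reported t value and confint() limits arise
because the inputs here are the rounded figures printed in the output.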
Assuming OLS1, OLS2, OLS3d, OLS4a or OLS4b, and OLS5 the following
properties can be established for large samples.
\[ \operatorname{plim} \hat\beta = \beta \]
where $Q_{XX} = E[x_i x_i']$ and $Q_{X \Omega X} = E[u_i^2 x_i x_i']$ is assumed positive
definite (see handout on “Heteroskedasticity in the Linear Model”).
\[ \widehat{Avar}[\hat\beta] = \hat\sigma^2 (X'X)^{-1} \]
Note: In practice we can almost never be sure that the errors are
homoscedastic and should therefore always use robust standard errors.
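To illustrate the difference, the sketch below compares the conventional
standard error of a slope coefficient with a heteroscedasticity-robust
one for the case of a single regressor. It assumes the basic White (HC0)
form of the sandwich estimator, in which the unknown error variances are
replaced by squared residuals; this choice, the function name and the
data are illustrative assumptions, not taken from this handout.

```python
import math


def slope_standard_errors(x, y):
    """Conventional and White (HC0) standard errors of b1 in y = b0 + b1*x + u."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d ** 2 for d in dx)
    b1 = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # homoscedasticity-only formula: sigma2_hat / Sxx
    se_conventional = math.sqrt(sum(u ** 2 for u in resid) / (n - 2) / sxx)
    # sandwich formula for the slope: sum((x_i - x_bar)^2 * u_i^2) / Sxx^2
    se_robust = math.sqrt(sum(d ** 2 * u ** 2 for d, u in zip(dx, resid))) / sxx
    return b1, se_conventional, se_robust


# Hypothetical data in which the residual spread differs across x
x = [-1, -1, 1, 1]
y = [0, 0, 1, 3]
b1, se_conventional, se_robust = slope_standard_errors(x, y)
print(b1, se_conventional, round(se_robust, 4))
```

Statistical packages typically apply finite-sample corrections to this
basic form, so their robust standard errors can differ slightly from the
values computed here.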
9 Asymptotic Tests
\[ z = \frac{\hat\beta_k - q}{\widehat{se}[\hat\beta_k]} \overset{A}{\sim} N(0, 1) \]
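The asymptotic test only needs the standard normal distribution, which is
available in Python's standard library. The sketch below applies it to
the coefficient on displacement, using the estimate and the (conventional,
not robust) standard error printed in the R output further below, purely
for illustration.

```python
from statistics import NormalDist

# Values copied from the R output for the coefficient on displacement
beta_hat = 0.005281
se_hat = 0.009870
q = 0.0  # hypothesised value under H0

z = (beta_hat - q) / se_hat

std_normal = NormalDist()  # mean 0, standard deviation 1
# two-sided p-value from the standard normal distribution
p_value = 2 * (1 - std_normal.cdf(abs(z)))
# two-sided 5% critical value: the 0.975 quantile of N(0, 1)
z_crit = std_normal.inv_cdf(0.975)
reject_h0 = abs(z) > z_crit

print(round(z, 3), round(p_value, 3), round(z_crit, 3), reject_h0)
```

The resulting p-value is close to the value of 0.594 that R reports from
the t distribution with 71 degrees of freedom.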
The t-test, F -test and confidence interval for small samples depend on the
normality assumption OLS3a (see Table 1). This assumption is strong and
unlikely to be satisfied. The asymptotic z-test, Wald test and the
confidence interval for large samples rely on much weaker assumptions.
Although most statistical software packages report the small sample results
by default, we would typically prefer the large sample approximations. In
practice, small sample and asymptotic tests and confidence intervals are
already very similar for relatively small samples, i.e. for (N − K) > 30.
Large sample tests also have the advantage that they can be based on
heteroscedasticity robust standard errors.
Implementation in Stata 17
The multiple linear regression model is estimated by OLS with the regress
command. For example,
webuse auto.dta
regress mpg weight displacement
F-tests for one or more restrictions are calculated with the post-estimation
command test. For example,
test weight
[Figure: annotated Stata regression output; the surviving labels mark N-1, the t-test and the p-value for a single-coefficient null hypothesis.]
Implementation in R 4.3.1
The multiple linear regression model is estimated by OLS with the lm()
function. For example,
library(haven)
auto <- read_dta("http://www.stata-press.com/data/r17/auto.dta")
ols <- lm(mpg~weight+displacement, data=auto)
summary(ols)
confint(ols)
F-tests for one or more restrictions are calculated with the function
waldtest() from the package lmtest, here combined with the covariance
estimator from the package sandwich
library(lmtest)
library(sandwich)
waldtest(ols, "weight", vcov=sandwich)
Call:
lm(formula = mpg ~ weight + displacement, data = auto)

Residuals:
    Min      1Q  Median      3Q     Max
-6.9654 -2.0618 -0.5368  0.9775 13.8371

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  40.084522   2.020110  19.843  < 2e-16 ***
weight       -0.006567   0.001166  -5.631 3.35e-07 ***
displacement  0.005281   0.009870   0.535    0.594
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.456 on 71 degrees of freedom
Multiple R-squared:  0.6529,  Adjusted R-squared:  0.6432
F-statistic: 66.79 on 2 and 71 DF,  p-value: < 2.2e-16

> confint(ols)
                    2.5 %       97.5 %
(Intercept)  36.056536844 44.112507832
weight       -0.008892523 -0.004241688
displacement -0.014398605  0.024960155

[Annotations in the original figure label the coefficient estimates and their standard errors, the t-test and p-value for a single coefficient, the estimate of σ with N-K-1 degrees of freedom, R2 and adjusted R2, the F-test and p-value for both slope coefficients being zero with K and N-K-1 degrees of freedom, and the 95%-confidence interval.]