
Short Guides to Microeconometrics
Kurt Schmidheiny
University of Basel, Fall 2023

The Multiple Linear Regression Model

1 Introduction

The multiple linear regression model and its estimation using ordinary
least squares (OLS) is doubtless the most widely used tool in econometrics.
It allows us to estimate the relation between a dependent variable and a
set of explanatory variables. Prototypical examples in econometrics are:

• Wage of an employee as a function of her education and her work
  experience (the so-called Mincer equation).

• Price of a house as a function of its number of bedrooms and its age
  (an example of hedonic price regressions).

The dependent variable is an interval variable, i.e. its values represent
a natural order and differences of two values are meaningful. In practice,
this means that the variable needs to be observed with some precision
and that all observed values are far from ranges which are theoretically
excluded. Wages, for example, strictly speaking do not qualify, as they
cannot take values with more than two decimal digits (cents) nor negative
values. In practice, monthly wages in dollars in a sample of full-time
workers are perfectly fine for OLS, whereas wages measured in three wage
categories (low, middle, high) for a sample that includes unemployed
workers (with zero wages) call for other estimation tools.

Version: 30-10-2023, 16:07



2 The Econometric Model

The multiple linear regression model assumes a linear (in parameters)
relationship between a dependent variable $y_i$ and a set of explanatory
variables $x_i' = (x_{i0}, x_{i1}, \ldots, x_{iK})$. $x_{ik}$ is also called an independent
variable, a covariate or a regressor. The first regressor $x_{i0} = 1$ is a constant
unless otherwise specified.

Consider a sample of $N$ observations $i = 1, \ldots, N$. Every single observation $i$ follows

$$y_i = x_i'\beta + u_i$$

where $\beta$ is a $(K+1)$-dimensional column vector of parameters, $x_i'$ is a
$(K+1)$-dimensional row vector and $u_i$ is a scalar called the error term.
The whole sample of $N$ observations can be expressed in matrix notation,

$$y = X\beta + u$$

where $y$ is an $N$-dimensional column vector, $X$ is an $N \times (K+1)$ matrix
and $u$ is an $N$-dimensional column vector of error terms, i.e.
$$
\underset{N \times 1}{\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_N \end{pmatrix}}
=
\underset{N \times (K+1)}{\begin{pmatrix}
1 & x_{11} & \cdots & x_{1K} \\
1 & x_{21} & \cdots & x_{2K} \\
1 & x_{31} & \cdots & x_{3K} \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_{N1} & \cdots & x_{NK}
\end{pmatrix}}
\underset{(K+1) \times 1}{\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_K \end{pmatrix}}
+
\underset{N \times 1}{\begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ \vdots \\ u_N \end{pmatrix}}
$$

The data generation process (dgp) is fully described by a set of
assumptions. Several of the following assumptions are formulated in
different alternatives. Different sets of assumptions will lead to different
properties of the OLS estimator.

OLS1: Linearity

$y_i = x_i'\beta + u_i$ and $E[u_i] = 0$

OLS1 assumes that the functional relationship between dependent and
explanatory variables is linear in parameters, that the error term enters
additively and that the parameters are constant across individuals $i$.

OLS2: Independence

$\{x_i, y_i\}_{i=1}^{N}$ i.i.d. (independent and identically distributed)

OLS2 means that the observations are independently and identically dis-
tributed. This assumption is in practice guaranteed by random sampling.

OLS3: Exogeneity

a) $u_i \mid x_i \sim N(0, \sigma_i^2)$
b) $u_i \perp\!\!\!\perp x_i$ (independent)
c) $E[u_i \mid x_i] = 0$ (mean independent)
d) $\mathrm{Cov}[x_i, u_i] = 0$ (uncorrelated)

OLS3a assumes that the error term is normally distributed conditional
on the explanatory variables. OLS3b means that the error term is in-
dependent of the explanatory variables. OLS3c states that the mean of
the error term is independent of the explanatory variables. OLS3d means
that the error term and the explanatory variables are uncorrelated. Either
OLS3a or OLS3b implies OLS3c and OLS3d. OLS3c implies OLS3d.

OLS4: Error Variance

a) $V[u_i \mid x_i] = \sigma^2 < \infty$ (homoscedasticity)
b) $V[u_i \mid x_i] = \sigma_i^2 = g(x_i) < \infty$ (conditional heteroscedasticity)

OLS4a (homoscedasticity) means that the variance of the error term is
a constant. OLS4b (conditional heteroscedasticity) allows the variance of
the error term to depend on the explanatory variables.

OLS5: Identifiability

$E[x_i x_i'] = Q_{XX}$ is positive definite and finite
$\mathrm{rank}(X) = K + 1 < N$

OLS5 assumes that the regressors are not perfectly collinear, i.e. no
variable is a linear combination of the others. For example, there can only
be one constant. Intuitively, OLS5 means that every explanatory variable
adds additional information. OLS5 also assumes that all regressors (but
the constant) have strictly positive variance both in expectation and in
the sample, and that there are not too many extreme values.

3 Estimation with OLS

Ordinary least squares (OLS) minimizes the squared distances between
the observed and the predicted dependent variable $y$:

$$S(\beta) = \sum_{i=1}^{N} (y_i - x_i'\beta)^2 = (y - X\beta)'(y - X\beta) \rightarrow \min_{\beta}$$

The resulting OLS estimator of $\beta$ is:

$$\hat{\beta} = (X'X)^{-1} X'y$$

Given the OLS estimator, we can predict the dependent variable by
$\hat{y}_i = x_i'\hat{\beta}$ and the error term by $\hat{u}_i = y_i - x_i'\hat{\beta}$. $\hat{u}_i$ is called the residual.
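As a minimal illustration, the estimator can also be computed directly by
matrix algebra in R. The sketch below assumes the auto data used in the
implementation sections at the end of this guide; the objects defined here
(y, X, betahat, uhat) are reused in the sketches of later sections.

library(haven)
auto <- read_dta("http://www.stata-press.com/data/r17/auto.dta")
y <- auto$mpg
X <- cbind(1, auto$weight, auto$displacement)       # constant in first column
betahat <- solve(t(X) %*% X) %*% t(X) %*% y         # (X'X)^{-1} X'y
uhat <- y - X %*% betahat                           # residuals
coef(lm(mpg ~ weight + displacement, data = auto))  # same estimates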

4 Goodness-of-fit

The goodness-of-fit of an OLS regression can be measured as

$$R^2 = 1 - \frac{SSR}{SST} = \frac{SSE}{SST}$$

where $SST = \sum_{i=1}^{N}(y_i - \bar{y})^2$ is the total sum of squares and $SSR = \sum_{i=1}^{N}\hat{u}_i^2$
the residual sum of squares. $SSE = \sum_{i=1}^{N}(\hat{y}_i - \bar{y})^2$ is called
the explained sum of squares if the regression contains a constant and
therefore $\bar{y} = \bar{\hat{y}}$. In this case, $R^2$ lies by definition between 0 and 1 and
reports the fraction of the sample variation in $y$ that is explained by the xs.

[Figure 1: The linear regression model with one regressor ($\beta_0 = -2$,
$\beta_1 = 0.5$, $\sigma^2 = 1$, $x \sim \mathrm{uniform}(0, 10)$, $u \sim N(0, \sigma^2)$). The plot shows the
data, the conditional expectation $E(y|x)$ and the fitted OLS line.]
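The dgp of Figure 1 is easily simulated; a short sketch, where the seed
and the sample size are arbitrary choices, not taken from the figure:

set.seed(123)                  # arbitrary seed
N <- 100                       # arbitrary sample size
x <- runif(N, 0, 10)           # x ~ uniform(0, 10)
u <- rnorm(N, 0, 1)            # u ~ N(0, 1), i.e. sigma^2 = 1
ysim <- -2 + 0.5 * x + u       # beta0 = -2, beta1 = 0.5
coef(lm(ysim ~ x))             # estimates close to (-2, 0.5)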
Note: $R^2$ increases by construction with every (also irrelevant) additional
regressor and is therefore not a good criterion for the selection of
regressors. The adjusted $R^2$ is a modified version that does not necessarily
increase with additional regressors:

$$\text{adj. } R^2 = 1 - \frac{N-1}{N-K-1} \cdot \frac{SSR}{SST}.$$
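Both measures are reported by standard software but are easily computed
by hand; a minimal sketch, assuming the fitted model ols from the R
implementation section below:

SST <- sum((auto$mpg - mean(auto$mpg))^2)    # total sum of squares
SSR <- sum(resid(ols)^2)                     # residual sum of squares
N <- nrow(auto); K <- length(coef(ols)) - 1  # K regressors plus constant
R2 <- 1 - SSR / SST
adjR2 <- 1 - (N - 1) / (N - K - 1) * SSR / SST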

5 Small Sample Properties

Assuming OLS1, OLS2, OLS3a, OLS4, and OLS5, the following properties
can be established for finite, i.e. even small, samples.

• The OLS estimator of $\beta$ is unbiased:

$$E[\hat{\beta} \mid X] = \beta$$

• The OLS estimator is (multivariate) normally distributed:

$$\hat{\beta} \mid X \sim N\left(\beta, V[\hat{\beta} \mid X]\right)$$

with variance $V[\hat{\beta} \mid X] = \sigma^2 (X'X)^{-1}$ under homoscedasticity (OLS4a)
and $V[\hat{\beta} \mid X] = \sigma^2 (X'X)^{-1} X'\Omega X (X'X)^{-1}$ under known heteroscedasticity
(OLS4b). Under homoscedasticity (OLS4a) the variance $V$ can be
unbiasedly estimated as

$$\hat{V}[\hat{\beta} \mid X] = \hat{\sigma}^2 (X'X)^{-1}$$

with

$$\hat{\sigma}^2 = \frac{\hat{u}'\hat{u}}{N - K - 1}$$

(a computational sketch follows this list).

• Gauß-Markov theorem: under homoscedasticity (OLS4a),
$\hat{\beta}$ is BLUE (best linear unbiased estimator).
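Continuing the matrix-algebra sketch from Section 3, the unbiased
variance estimate and the classical standard errors can be computed as:

sigma2hat <- sum(uhat^2) / (length(y) - ncol(X))  # divides by N - K - 1
Vhat <- sigma2hat * solve(t(X) %*% X)             # classical variance
se <- sqrt(diag(Vhat))                            # standard errors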

6 Tests in Small Samples

Assume OLS1, OLS2, OLS3a, OLS4a, and OLS5.


A simple null hypothesis of the form $H_0: \beta_k = q$ is tested with the
t-test. If the null hypothesis is true, the t-statistic

$$t = \frac{\hat{\beta}_k - q}{\widehat{se}[\hat{\beta}_k]} \sim t_{N-K-1}$$

follows a t-distribution with $N - K - 1$ degrees of freedom. The standard
error $\widehat{se}[\hat{\beta}_k]$ is the square root of the element in the $(k+1)$-th row and
$(k+1)$-th column of $\hat{V}[\hat{\beta} \mid X]$. For example, to perform a two-sided test of
$H_0$ against the alternative hypothesis $H_A: \beta_k \neq q$ at the 5% significance
level, we calculate the t-statistic and compare its absolute value to the
0.975-quantile of the t-distribution. With $N = 30$ and $K = 2$, $H_0$ is
rejected if $|t| > 2.052$.
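The critical value can be looked up directly; for instance:

qt(0.975, df = 30 - 2 - 1)   # two-sided 5% critical value: 2.052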
A null hypothesis of the form $H_0: R\beta = q$ with $J$ linear restrictions
is jointly tested with the F-test. If the null hypothesis is true, the F-statistic

$$F = \frac{(R\hat{\beta} - q)' \left[R\, \hat{V}[\hat{\beta} \mid X]\, R'\right]^{-1} (R\hat{\beta} - q)}{J} \sim F_{J,\, N-K-1}$$

follows an F-distribution with $J$ numerator degrees of freedom and $N -
K - 1$ denominator degrees of freedom. For example, to perform a two-sided
test of $H_0$ against the alternative hypothesis $H_A: R\beta \neq q$ at the
5% significance level, we calculate the F-statistic and compare it to the
0.95-quantile of the F-distribution. With $N = 30$, $K = 2$ and $J = 2$, $H_0$
is rejected if $F > 3.35$. We cannot perform one-sided F-tests.
Only under homoscedasticity (OLS4a), the F-statistic can also be
computed as

$$F = \frac{(SSR_{restricted} - SSR)/J}{SSR/(N-K-1)} = \frac{(R^2 - R^2_{restricted})/J}{(1 - R^2)/(N-K-1)} \sim F_{J,\, N-K-1}$$

where $SSR_{restricted}$ and $R^2_{restricted}$ are, respectively, estimated by restricted
least squares which minimizes $S(\beta)$ s.t. $R\beta = q$. Exclusionary
restrictions of the form $H_0: \beta_k = 0, \beta_m = 0, \ldots$ are a special case of
$H_0: R\beta = q$. In this case, restricted least squares is simply estimated as
a regression where the explanatory variables $k, m, \ldots$ are excluded.
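A sketch of this restricted-vs-unrestricted computation, assuming the
auto data from the implementation sections and testing the exclusion of
both regressors:

ols_u <- lm(mpg ~ weight + displacement, data = auto)  # unrestricted
ols_r <- lm(mpg ~ 1, data = auto)                      # restricted
SSR_u <- sum(resid(ols_u)^2); SSR_r <- sum(resid(ols_r)^2)
J <- 2; N <- nrow(auto); K <- 2
F <- ((SSR_r - SSR_u) / J) / (SSR_u / (N - K - 1))     # F-statistic: 66.79
F > qf(0.95, J, N - K - 1)                             # TRUE: reject H0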

7 Confidence Intervals in Small Samples

Assuming OLS1, OLS2, OLS3a, OLS4a, and OLS5, we can construct
confidence intervals for a particular coefficient $\beta_k$. The $(1-\alpha)$ confidence
interval is given by

$$\left[\hat{\beta}_k - t_{(1-\alpha/2),(N-K-1)}\, \widehat{se}[\hat{\beta}_k]\;,\;\; \hat{\beta}_k + t_{(1-\alpha/2),(N-K-1)}\, \widehat{se}[\hat{\beta}_k]\right]$$

where $t_{(1-\alpha/2),(N-K-1)}$ is the $(1-\alpha/2)$ quantile of the t-distribution with
$N - K - 1$ degrees of freedom. For example, the 95% confidence interval
with $N = 30$ and $K = 2$ is $\left[\hat{\beta}_k - 2.052\, \widehat{se}[\hat{\beta}_k]\;,\; \hat{\beta}_k + 2.052\, \widehat{se}[\hat{\beta}_k]\right]$.

8 Asymptotic Properties of the OLS Estimator

Assuming OLS1, OLS2, OLS3d, OLS4a or OLS4b, and OLS5, the following
properties can be established for large samples.

• The OLS estimator is consistent:

$$\mathrm{plim}\; \hat{\beta} = \beta$$

• The OLS estimator is asymptotically normally distributed under
OLS4a as

$$\sqrt{N}(\hat{\beta} - \beta) \xrightarrow{d} N\left(0,\, \sigma^2 Q_{XX}^{-1}\right)$$

and under OLS4b as

$$\sqrt{N}(\hat{\beta} - \beta) \xrightarrow{d} N\left(0,\, Q_{XX}^{-1} Q_{X\Omega X} Q_{XX}^{-1}\right)$$

where $Q_{XX} = E[x_i x_i']$ and $Q_{X\Omega X} = E[u_i^2 x_i x_i']$ is assumed positive
definite (see handout on "Heteroskedasticity in the Linear Model").

• The OLS estimator is approximately normally distributed,

$$\hat{\beta} \overset{A}{\sim} N\left(\beta,\, \mathrm{Avar}[\hat{\beta}]\right)$$

where the asymptotic variance $\mathrm{Avar}[\hat{\beta}]$ can be consistently estimated
under OLS4a (homoscedasticity) as

$$\widehat{\mathrm{Avar}}[\hat{\beta}] = \hat{\sigma}^2 (X'X)^{-1}$$

with $\hat{\sigma}^2 = \hat{u}'\hat{u}/N$, and under OLS4b (heteroscedasticity) as the robust
or Eicker-Huber-White estimator (see handout on "Heteroscedasticity
in the Linear Model")

$$\widehat{\mathrm{Avar}}[\hat{\beta}] = (X'X)^{-1} \left(\sum_{i=1}^{N} \hat{u}_i^2\, x_i x_i'\right) (X'X)^{-1}.$$
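The robust estimator is simple to compute by hand. Continuing the
matrix-algebra sketch from Section 3 (this corresponds to the HC0 variant
of the sandwich estimator):

XtXinv <- solve(t(X) %*% X)
meat <- t(X) %*% (X * as.vector(uhat)^2)   # sum of uhat_i^2 * x_i x_i'
Avar <- XtXinv %*% meat %*% XtXinv         # sandwich formula
sqrt(diag(Avar))                           # robust standard errors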

Note: In practice we can almost never be sure that the errors are
homoscedastic and should therefore always use robust standard errors.

9 Asymptotic Tests

Assume OLS1, OLS2, OLS3d, OLS4a or OLS4b, and OLS5.


A simple null hypothesis of the form $H_0: \beta_k = q$ is tested with the
z-test. If the null hypothesis is true, the z-statistic

$$z = \frac{\hat{\beta}_k - q}{\widehat{se}[\hat{\beta}_k]} \overset{A}{\sim} N(0, 1)$$

follows approximately the standard normal distribution. The standard
error $\widehat{se}[\hat{\beta}_k]$ is the square root of the element in the $(k+1)$-th row and
$(k+1)$-th column of $\widehat{\mathrm{Avar}}[\hat{\beta}]$. For example, to perform a two-sided
test of $H_0$ against the alternative hypothesis $H_A: \beta_k \neq q$ at the 5%
significance level, we calculate the z-statistic and compare its absolute
value to the 0.975-quantile of the standard normal distribution. $H_0$ is
rejected if $|z| > 1.96$.
A null hypothesis of the form $H_0: R\beta = q$ with $J$ linear restrictions is
jointly tested with the Wald test. If the null hypothesis is true, the Wald
statistic

$$W = (R\hat{\beta} - q)' \left[R\, \widehat{\mathrm{Avar}}[\hat{\beta}]\, R'\right]^{-1} (R\hat{\beta} - q) \overset{A}{\sim} \chi^2_J$$

follows approximately a $\chi^2$-distribution with $J$ degrees of freedom. For
example, to perform a test of $H_0$ against the alternative hypothesis $H_A:
R\beta \neq q$ at the 5% significance level, we calculate the Wald statistic and
compare it to the 0.95-quantile of the $\chi^2$-distribution. With $J = 2$, $H_0$ is
rejected if $W > 5.99$. We cannot perform one-sided Wald tests.
Under OLS4a (homoscedasticity) only, the Wald statistic can also be
computed as

$$W = \frac{SSR_{restricted} - SSR}{SSR/N} = \frac{R^2 - R^2_{restricted}}{(1 - R^2)/N} \overset{A}{\sim} \chi^2_J$$

where $SSR_{restricted}$ and $R^2_{restricted}$ are, respectively, estimated by
restricted least squares which minimizes $S(\beta)$ s.t. $R\beta = q$. Exclusionary
restrictions of the form $H_0: \beta_k = 0, \beta_m = 0, \ldots$ are a special case of
$H_0: R\beta = q$. In this case, restricted least squares is simply estimated as
a regression where the explanatory variables $k, m, \ldots$ are excluded.

Note: the Wald statistic can also be calculated as

$$W = J \cdot F \overset{A}{\sim} \chi^2_J$$

where $F$ is the small sample F-statistic. This formulation differs by a
factor $(N - K - 1)/N$ but has the same asymptotic distribution.
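A sketch of the robust Wald test of $H_0: \beta_1 = 0$ and $\beta_2 = 0$, continuing
the matrix-algebra example with the robust Avar computed above:

R <- rbind(c(0, 1, 0),     # selects the coefficient on weight
           c(0, 0, 1))     # selects the coefficient on displacement
q <- c(0, 0)
d <- R %*% betahat - q
W <- t(d) %*% solve(R %*% Avar %*% t(R)) %*% d
W > qchisq(0.95, df = 2)   # TRUE: reject H0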

10 Confidence Intervals in Large Samples

Assuming OLS1, OLS2, OLS3d, OLS5, and OLS4a or OLS4b, we can
construct confidence intervals for a particular coefficient $\beta_k$. The $(1-\alpha)$
confidence interval is given by

$$\left[\hat{\beta}_k - z_{(1-\alpha/2)}\, \widehat{se}[\hat{\beta}_k]\;,\;\; \hat{\beta}_k + z_{(1-\alpha/2)}\, \widehat{se}[\hat{\beta}_k]\right]$$

where $z_{(1-\alpha/2)}$ is the $(1-\alpha/2)$ quantile of the standard normal
distribution. For example, the 95% confidence interval is $\left[\hat{\beta}_k - 1.96\, \widehat{se}[\hat{\beta}_k]\;,\;
\hat{\beta}_k + 1.96\, \widehat{se}[\hat{\beta}_k]\right]$.
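For instance, a large-sample 95% confidence interval for the coefficient
on weight (a hypothetical choice), using the robust Avar from the sketch
in Section 8:

betahat[2] + c(-1, 1) * 1.96 * sqrt(Avar[2, 2])   # robust 95% CI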

11 Small Sample vs. Asymptotic Properties

The t-test, F-test and confidence interval for small samples depend on the
normality assumption OLS3a (see Table 1). This assumption is strong and
unlikely to be satisfied. The asymptotic z-test, Wald test and the
confidence interval for large samples rely on much weaker assumptions.
Although most statistical software packages report the small sample results
by default, we would typically prefer the large sample approximations. In
practice, small sample and asymptotic tests and confidence intervals are
very similar already for relatively small samples, i.e. for $(N - K) > 30$.
Large sample tests also have the advantage that they can be based on
heteroscedasticity-robust standard errors.

12 More Known Issues

Non-linear functional form: The true relationship between the dependent
variable and the explanatory variables is often not linear and thus in
violation of assumption OLS1. The multiple linear regression model allows
for many forms of non-linear relationships by transforming both dependent
and explanatory variables. See the handout on "Functional Form in
the Linear Model" for details.
Aggregate regressors: Some explanatory variables may be constant
within groups (clusters) of individual observations. For example, wages of
individual workers are regressed on state-level unemployment rates. This
is a violation of the independence across individual observations (OLS2).
In this case, the usual standard errors will be too small and t-statistics too
large by a factor of up to $\sqrt{M}$, where $M$ is the average number of individual
observations per group (cluster), for example, the average number of
workers per state. Cluster-robust standard errors will provide asymptotically
consistent standard errors for the usual OLS point estimates. See
the handout on "Clustering in the Linear Model" for more details and
generalizations.
Omitted variables: Omitting explanatory variables in the regression
generally violates the exogeneity assumption (OLS3) and leads to biased
and inconsistent estimates of the coefficients for the included variables.
This omitted-variable bias does not occur if the omitted variables are
uncorrelated with all included explanatory variables.
The Multiple Linear Regression Model 12

Irrelevant regressors: Including irrelevant explanatory variables, i.e.
variables which do not have an effect on the dependent variable, does not
lead to biased or inconsistent estimates of the coefficients for the other
included variables. However, including too many irrelevant regressors may
lead to very imprecise estimates, i.e. very large standard errors, in small
datasets.
Reverse causality: A reverse causal effect of the dependent variable
on one or several explanatory variables is a violation of the exogeneity
assumption (OLS3) and leads to biased and inconsistent estimates. See
the handout on "Instrumental Variables" for a potential solution.

Measurement error: Imprecise measurement of the explanatory variables
is a violation of OLS3 and leads to biased and inconsistent estimates.
See the handout on "Instrumental Variables" for a potential solution.
Multicollinearity: Perfectly correlated explanatory variables violate
the identifiability assumption (OLS5) and their effects cannot be estimated
separately. The effects of highly but not perfectly correlated variables
can in principle be separately estimated. However, the estimated
coefficients will be very imprecise, i.e. the standard errors will be very
large. If variables are (almost) perfectly correlated in all conceivable states
of the world, there is no theoretical meaning of separate effects. If
multicollinearity is only a feature of a specific sample, collecting more data
may provide the necessary variation to estimate separate effects.
Heterogeneous effects: OLS1 assumes that the parameters $\beta_k$ are
constant across individuals $i$. However, in reality, effects $\beta_{ik}$ likely differ
across $i$, i.e. the effects are heterogeneous, and researchers seek to estimate
an average treatment effect $ATE_k = E(\beta_{ik})$. Unfortunately, the
OLS estimator $\hat{\beta}_k$ is in general not an unbiased estimator for $ATE_k$. An
exception is the regression of a dependent variable $y_i$ on a single dummy
variable $D_i$ which takes value 1 for the treated group and 0 for the control
group: $y_i = \beta_0 + \beta_1 D_i + u_i$. $\hat{\beta}_1$ is then the difference between the average
of the treated and the average of the control group and an unbiased
estimator for the $ATE$ provided that $D_i$ is independent of $u_i$ (OLS3b).
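This equivalence is easy to verify on simulated data; a minimal sketch
with hypothetical numbers (a constant effect $\beta_1 = 2$ for simplicity):

set.seed(1)
D <- rbinom(200, 1, 0.5)              # hypothetical treatment dummy
y <- 1 + 2 * D + rnorm(200)           # dgp with beta0 = 1, beta1 = 2
coef(lm(y ~ D))[2]                    # OLS estimate of beta1
mean(y[D == 1]) - mean(y[D == 0])     # identical difference in means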

Implementation in Stata 17

The multiple linear regression model is estimated by OLS with the regress
command. For example,
webuse auto.dta
regress mpg weight displacement

regresses the mileage of a car (mpg) on weight and displacement (see the
annotated output below). A constant is automatically added if not
suppressed by the option noconst

regress mpg weight displacement, noconst

Estimation based on a subsample is performed as


regress mpg weight displacement if weight>3000

where only cars heavier than 3000 lb are considered. Transformations of


variables are included with new variables
generate logmpg = log(mpg)
generate weight2 = weight^2
regress logmpg weight weight2 displacement

The Eicker-Huber-White covariance is reported with the option vce(robust)


regress mpg weight displacement, vce(robust)

F-tests for one or more restrictions are calculated with the post-estimation
command test. For example
test weight

tests $H_0: \beta_1 = 0$ against $H_A: \beta_1 \neq 0$, and


test weight displacement

tests $H_0: \beta_1 = 0$ and $\beta_2 = 0$ against $H_A: \beta_1 \neq 0$ or $\beta_2 \neq 0$.

New variables with residuals and fitted values are generated by
predict uhat if e(sample), resid
predict mpghat if e(sample)
In the original handout this output is annotated; the annotations identify
$SSE$, $SSR$ and $SST$ in the ANOVA block, $\hat{\sigma}$ (Root MSE), $R^2$, adj. $R^2$,
and the F-test of $H_0: \beta_1 = 0$ and $\beta_2 = 0$ with its p-value, as well as, for
each coefficient, the estimate $\hat{\beta}_k$, its standard error $\widehat{se}(\hat{\beta}_k)$, the t-test of
$H_0: \beta_k = 0$ with p-value, and the 95% confidence interval:

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(2, 71)        =     66.79
       Model |  1595.40969         2  797.704846   Prob > F        =    0.0000
    Residual |  848.049768        71  11.9443629   R-squared       =    0.6529
-------------+----------------------------------   Adj R-squared   =    0.6432
       Total |  2443.45946        73  33.4720474   Root MSE        =    3.4561

------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |  -.0065671   .0011662    -5.63   0.000    -.0088925   -.0042417
displacement |   .0052808   .0098696     0.54   0.594    -.0143986    .0249602
       _cons |   40.08452    2.02011    19.84   0.000     36.05654    44.11251
------------------------------------------------------------------------------

Implementation in R 4.3.1

The multiple linear regression model is estimated by OLS with the lm()
function. For example,
library(haven)
auto <- read_dta("http://www.stata-press.com/data/r17/auto.dta")
ols <- lm(mpg~weight+displacement, data=auto)
summary(ols)
confint(ols)

regresses the mileage of a car (mpg) on weight and displacement.


A constant is automatically added if not suppressed by -1
lm(mpg~weight+displacement-1, data=auto)

Estimation based on a subsample is performed as


lm(mpg~weight+displacement, subset=(weight>3000), data=auto)

where only cars heavier than 3000 lb are considered.


Transformations of variables can be directly included in the formula

lm(log(mpg)~weight+I(weight^2)+displacement, data=auto)

where arithmetic transformations of explanatory variables must be wrapped in I().


The Eicker-Huber-White covariance is reported after estimation with
library(sandwich)
library(lmtest)
coeftest(ols, vcov=sandwich)

F-tests for one or more restrictions are calculated with the command
waldtest which also uses the two packages sandwich and lmtest
waldtest(ols, "weight", vcov=sandwich)

tests $H_0: \beta_1 = 0$ against $H_A: \beta_1 \neq 0$ with the Eicker-Huber-White covariance, and


waldtest(ols, .~.-weight-displacement, vcov=sandwich)

tests $H_0: \beta_1 = 0$ and $\beta_2 = 0$ against $H_A: \beta_1 \neq 0$ or $\beta_2 \neq 0$.


Residuals and fitted values, respectively, are stored in vectors by
uhat <- resid(ols)
mpghat <- fitted(ols)
In the original handout this output is annotated; the annotations identify,
for each coefficient, the estimate $\hat{\beta}_k$, its standard error $\widehat{se}(\hat{\beta}_k)$ and the
t-test of $H_0: \beta_k = 0$ with p-value, as well as $\hat{\sigma}$, the degrees of freedom
$N - K - 1$, $R^2$, adj. $R^2$, the F-test of $H_0: \beta_1 = 0$ and $\beta_2 = 0$ with its
p-value, and the 95% confidence intervals:

> summary(ols)

Call:
lm(formula = mpg ~ weight + displacement, data = auto)

Residuals:
    Min      1Q  Median      3Q     Max
-6.9654 -2.0618 -0.5368  0.9775 13.8371

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)  40.084522   2.020110  19.843  < 2e-16 ***
weight       -0.006567   0.001166  -5.631 3.35e-07 ***
displacement  0.005281   0.009870   0.535    0.594
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.456 on 71 degrees of freedom
Multiple R-squared:  0.6529,    Adjusted R-squared:  0.6432
F-statistic: 66.79 on 2 and 71 DF,  p-value: < 2.2e-16

> confint(ols)
                     2.5 %       97.5 %
(Intercept)  36.056536844 44.112507832
weight       -0.008892523 -0.004241688
displacement -0.014398605  0.024960155

References

Introductory textbooks

Stock, James H. and Mark W. Watson (2020), Introduction to Econometrics, 4th Global ed., Pearson. Chapters 4-9.

Wooldridge, Jeffrey M. (2009), Introductory Econometrics: A Modern Approach, 4th ed., Cengage Learning. Chapters 2-8.

Advanced textbooks

Cameron, A. Colin and Pravin K. Trivedi (2005), Microeconometrics: Methods and Applications, Cambridge University Press. Sections 4.1-4.4.

Wooldridge, Jeffrey M. (2002), Econometric Analysis of Cross Section and Panel Data, MIT Press. Chapters 4.1-4.23.

Companion textbooks

Angrist, Joshua D. and Jörn-Steffen Pischke (2009), Mostly Harmless Econometrics: An Empiricist's Companion, Princeton University Press. Chapter 3.

Kennedy, Peter (2008), A Guide to Econometrics, 6th ed., Blackwell Publishing. Chapters 3-11, 14.
