Chapter 7: ANOVA


ANOVA method

Multiple comparisons methods following ANOVA.

PROBABILITY AND STATISTICS


CHAPTER 7: ANALYSIS OF VARIANCE (ANOVA)

Dr. Phan Thi Huong

HoChiMinh City University of Technology


Faculty of Applied Science, Department of Applied Mathematics
Email: [email protected]

HCM city — 2021.

Dr. Phan Thi Huong Probability and Statistics



INTRODUCTION

EXAMPLE 1
A manufacturer of paper used for making grocery bags is interested
in improving the tensile strength of the product. Product
engineering thinks that tensile strength is a function of the
hardwood concentration in the pulp and that the range of
hardwood concentrations of practical interest is between 5 and
20%. A team of engineers responsible for the study decides to
investigate four levels of hardwood concentration: 5%, 10%, 15%,
and 20%. They decide to make up six test specimens at each
concentration level, using a pilot plant. All 24 specimens are tested
on a laboratory tensile tester, in random order. The data from this
experiment are shown in the Table below.


Hardwood                         Observations
Concentration (%)    1    2    3    4    5    6    Totals    Averages
 5                   7    8   15   11    9   10        60       10.00
10                  12   17   13   18   19   15        94       15.67
15                  14   18   19   17   16   18       102       17.00
20                  19   25   22   23   18   20       127       21.17
                                                      383       15.96
TABLE 1: Tensile Strength of Paper (psi)

Question: does the hardwood concentration affect the tensile strength of the bags?
Statistical problem: compare the mean tensile strength across the 4 hardwood concentration groups.
The experiment is carried out in random order (a completely randomized design).
⇒ we need a statistical technique called ANOVA.


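THE ANALYSIS OF VARIANCE - THE DEFINITIONS


The levels of the factor are sometimes called treatments.
The response for each of the k treatments is a random variable.
The observed data would appear as shown in the table below.

\[
\begin{array}{c|cccc|c|c}
\text{Treatment} & \multicolumn{4}{c|}{\text{Observations}} & \text{Totals} & \text{Averages} \\
\hline
1 & y_{11} & y_{12} & \cdots & y_{1n} & y_{1\cdot} & \bar{y}_{1\cdot} \\
2 & y_{21} & y_{22} & \cdots & y_{2n} & y_{2\cdot} & \bar{y}_{2\cdot} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
k & y_{k1} & y_{k2} & \cdots & y_{kn} & y_{k\cdot} & \bar{y}_{k\cdot} \\
\hline
 & & & & & y_{\cdot\cdot} & \bar{y}_{\cdot\cdot}
\end{array}
\]

where

\[
y_{i\cdot} = \sum_{j=1}^{n} y_{ij}, \qquad \bar{y}_{i\cdot} = y_{i\cdot}/n, \qquad i = 1, 2, \ldots, k
\]

\[
y_{\cdot\cdot} = \sum_{i=1}^{k} \sum_{j=1}^{n} y_{ij}, \qquad \bar{y}_{\cdot\cdot} = y_{\cdot\cdot}/N, \qquad N = kn
\]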
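As a small illustration of these definitions, the sketch below computes the treatment totals, treatment averages, grand total, and grand average for the tensile strength data of Table 1. The variable names (`data`, `totals`, `averages`, and so on) are chosen here for illustration and do not come from the slides.

```python
# Tensile strength data from Table 1, keyed by hardwood concentration (%).
data = {
    5:  [7, 8, 15, 11, 9, 10],
    10: [12, 17, 13, 18, 19, 15],
    15: [14, 18, 19, 17, 16, 18],
    20: [19, 25, 22, 23, 18, 20],
}

k = len(data)                       # number of treatments
n = len(next(iter(data.values())))  # observations per treatment (balanced design)
N = k * n                           # total number of observations

totals = {level: sum(obs) for level, obs in data.items()}   # y_i.
averages = {level: totals[level] / n for level in data}     # ybar_i.
grand_total = sum(totals.values())                          # y_..
grand_average = grand_total / N                             # ybar_..

for level in data:
    print(f"{level:>2}%  total = {totals[level]:>3}  average = {averages[level]:.2f}")
print(f"grand total = {grand_total}, grand average = {grand_average:.2f}")
```

The printed totals and averages should agree with the last two columns of Table 1.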


THE ANALYSIS OF VARIANCE - THE MODELS

Consider the model:

\[
Y_{ij} = \mu + \tau_i + \varepsilon_{ij} \qquad (1)
\]

where $i = 1, 2, \ldots, k$ and $j = 1, 2, \ldots, n$. In the formula,

- $\mu$ is a parameter common to all treatments, called the overall mean,
- $\tau_i$ is a parameter associated with the ith treatment, called the ith treatment effect,
- and $\varepsilon_{ij}$ is a random error component.


The model is also written as

\[
Y_{ij} = \mu_i + \varepsilon_{ij}, \qquad
\begin{cases}
i = 1, 2, \ldots, k \\
j = 1, 2, \ldots, n
\end{cases}
\qquad (2)
\]

where $\mu_i = \mu + \tau_i$ is the mean of the ith treatment.

⇒ each treatment defines a population that has mean $\mu_i$.
⇒ if $\varepsilon_{ij} \sim N(0, \sigma^2)$, each treatment can be thought of as a normal population with mean $\mu_i$ and variance $\sigma^2$.
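To make the model concrete, the following sketch simulates observations from $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$ with normal errors. The parameter values (`mu`, `tau`, `sigma`) and the helper name `simulate` are hypothetical, chosen only for illustration; they are not estimates from the slides.

```python
import random


def simulate(mu, tau, sigma, n, seed=0):
    """Draw n observations per treatment from Y_ij = mu + tau_i + eps_ij,
    with eps_ij independent N(0, sigma^2)."""
    rng = random.Random(seed)
    return [[mu + t + rng.gauss(0.0, sigma) for _ in range(n)] for t in tau]


# Hypothetical parameter values, for illustration only.
mu = 16.0
tau = [-6.0, 0.0, 1.0, 5.0]   # treatment effects, chosen to sum to zero
sigma = 2.0

samples = simulate(mu, tau, sigma, n=20000)
sample_means = [sum(row) / len(row) for row in samples]

# With many observations, each sample mean should be close to mu_i = mu + tau_i.
for t, m in zip(tau, sample_means):
    print(f"mu_i = {mu + t:5.1f}   sample mean = {m:6.2f}")
```

Each row of `samples` plays the role of one treatment population with mean $\mu_i$ and common variance $\sigma^2$.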


THE ANALYSIS OF VARIANCE - THE ASSUMPTIONS

The assumptions of the ANOVA for the fixed-effects, single-factor model:

- $\sum_{i=1}^{k} \tau_i = 0$.
- The populations are normally distributed.
- The populations have equal variances, i.e. $\varepsilon_{ij} \sim N(0, \sigma^2)$.
- The samples are random and independent.


THE ANALYSIS OF VARIANCE - THE HYPOTHESES

The null hypothesis:

\[
H_0: \tau_1 = \tau_2 = \cdots = \tau_k = 0
\]

Changing the levels of the factor has no effect on the mean response.

The alternative hypothesis:

\[
H_1: \tau_i \neq 0 \text{ for at least one } i
\]

There is a difference in mean response between at least two levels of the factor.


THE ANALYSIS OF VARIANCE - THE VARIATION

The ANOVA partitions the total variability in the sample data into two component parts.

The sum of squares identity is

\[
\sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{\cdot\cdot})^2
= n \sum_{i=1}^{k} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2
+ \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i\cdot})^2
\]

or

\[
SST = SSB + SSE
\]
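The identity can be checked numerically. The sketch below computes the three sums of squares directly from their definitions for the Table 1 data and compares SST with SSB + SSE; the variable names are illustrative.

```python
# Table 1 data: one list per hardwood concentration (5%, 10%, 15%, 20%).
groups = [
    [7, 8, 15, 11, 9, 10],
    [12, 17, 13, 18, 19, 15],
    [14, 18, 19, 17, 16, 18],
    [19, 25, 22, 23, 18, 20],
]

k = len(groups)
n = len(groups[0])
grand_mean = sum(sum(g) for g in groups) / (k * n)   # ybar_..
group_means = [sum(g) / n for g in groups]           # ybar_i.

# Total, between-treatment, and within-treatment sums of squares.
sst = sum((y - grand_mean) ** 2 for g in groups for y in g)
ssb = n * sum((m - grand_mean) ** 2 for m in group_means)
sse = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)

print(f"SST = {sst:.2f}, SSB = {ssb:.2f}, SSE = {sse:.2f}")
print(f"SSB + SSE = {ssb + sse:.2f}")
```

Up to floating-point rounding, the last line reproduces SST.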


THE ANALYSIS OF VARIANCE - THE TOTAL VARIATION


SST describes the total variability in the data:

\[
SST = \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{\cdot\cdot})^2
\]

A computational formula:

\[
SST = \sum_{i=1}^{k} \sum_{j=1}^{n} y_{ij}^2 - \frac{y_{\cdot\cdot}^2}{N}
\]
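As a worked instance of the computational formula, applied to the Table 1 data (24 observations with grand total 383):

```latex
\[
SST = \left(7^2 + 8^2 + \cdots + 20^2\right) - \frac{383^2}{24}
    = 6625 - 6112.04 = 512.96
\]
```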

THE ANALYSIS OF VARIANCE - THE VARIATION BETWEEN TREATMENT MEANS

SSB describes the variability between treatment means:

\[
SSB = n \sum_{i=1}^{k} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2
\]

A computational formula:

\[
SSB = \sum_{i=1}^{k} \frac{y_{i\cdot}^2}{n} - \frac{y_{\cdot\cdot}^2}{N}
\]
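As a worked instance, using the treatment totals 60, 94, 102, and 127 from Table 1 (n = 6, N = 24):

```latex
\[
SSB = \frac{60^2 + 94^2 + 102^2 + 127^2}{6} - \frac{383^2}{24}
    = 6494.83 - 6112.04 = 382.79
\]
```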

THE ANALYSIS OF VARIANCE - THE VARIATION WITHIN TREATMENTS

SSE describes the variability of the observations within treatments:

\[
SSE = \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i\cdot})^2
\]

A computational formula:

\[
SSE = SST - SSB
\]
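As a worked instance: for the Table 1 data the computational formulas give SST = 512.96 and SSB = 382.79, so

```latex
\[
SSE = SST - SSB = 512.96 - 382.79 = 130.17
\]
```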

THE ANALYSIS OF VARIANCE - THE MEAN SQUARES

The mean square for treatments: $MSB = \dfrac{SSB}{k-1}$.
The mean square for error: $MSE = \dfrac{SSE}{k(n-1)}$.
The expected value of the treatment sum of squares is

\[
E(SSB) = (k-1)\sigma^2 + n \sum_{i=1}^{k} \tau_i^2
\]

and the expected value of the error sum of squares is

\[
E(SSE) = k(n-1)\sigma^2
\]

⇒ if $H_0$ is true, $MSB$ is an unbiased estimator of $\sigma^2$.

⇒ $MSE$ is an unbiased estimator of $\sigma^2$ regardless of whether or not $H_0$ is true.
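As a worked instance for the Table 1 data (k = 4, n = 6), where the sums of squares work out to SSB = 382.79 and SSE = 130.17:

```latex
\[
MSB = \frac{382.79}{4 - 1} = 127.60, \qquad
MSE = \frac{130.17}{4(6 - 1)} = 6.51
\]
```

The treatment mean square is far larger than the error mean square, which is what we would expect to see if the treatment effects are not all zero.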


THE ANALYSIS OF VARIANCE - THE ANOVA F-TEST

\[
\begin{cases}
H_0: \tau_1 = \tau_2 = \cdots = \tau_k = 0 \\
H_1: \tau_i \neq 0 \text{ for at least one } i
\end{cases}
\]

The test statistic:

\[
F_0 = \frac{MSB}{MSE} = \frac{SSB/(k-1)}{SSE/[k(n-1)]} \qquad (3)
\]

Under $H_0$, $F_0$ has a Fisher (F) distribution with $k-1$ and $k(n-1)$ degrees of freedom, $F_0 \sim f_{k-1,\,k(n-1)}$.
Given $\alpha$, we reject $H_0$ if $f_0 > f_{k-1,\,k(n-1),\,\alpha}$, where $f_0$ is the observed value of $F_0$.
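The test can be sketched in a few lines for a balanced design. The function name `one_way_anova` is illustrative, and the critical value 3.10 is the upper 5% point of the F distribution with (3, 20) degrees of freedom read from a standard F table; it is an input to this sketch, not something the code computes.

```python
def one_way_anova(groups):
    """Return (f0, df_treatments, df_error) for a balanced one-way layout,
    using the computational formulas for SST and SSB."""
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal sample sizes")
    N = k * n
    grand_total = sum(sum(g) for g in groups)
    correction = grand_total ** 2 / N                 # y_..^2 / N
    sst = sum(y * y for g in groups for y in g) - correction
    ssb = sum(sum(g) ** 2 for g in groups) / n - correction
    sse = sst - ssb
    msb = ssb / (k - 1)
    mse = sse / (k * (n - 1))
    return msb / mse, k - 1, k * (n - 1)


# Table 1 data: one list per hardwood concentration (5%, 10%, 15%, 20%).
groups = [
    [7, 8, 15, 11, 9, 10],
    [12, 17, 13, 18, 19, 15],
    [14, 18, 19, 17, 16, 18],
    [19, 25, 22, 23, 18, 20],
]

f0, df1, df2 = one_way_anova(groups)

# Tabulated critical value f_{3, 20, 0.05} (assumed from a standard F table).
F_CRIT_05_3_20 = 3.10

print(f"f0 = {f0:.2f} with ({df1}, {df2}) degrees of freedom")
print("reject H0" if f0 > F_CRIT_05_3_20 else "fail to reject H0")
```

For these data the observed statistic is well above the critical value, so the sketch reports that $H_0$ is rejected at the 5% level.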


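\[
\begin{array}{l|c|c|c|c}
\text{Source of variation} & SS & df & MS & F \\
\hline
\text{Treatments} & SSB & k - 1 & MSB & f_0 = \dfrac{MSB}{MSE} \\
\text{Error} & SSE & k(n - 1) & MSE & \\
\hline
\text{Total} & SST & kn - 1 & &
\end{array}
\]

TABLE 2: Analysis of Variance for a Single-Factor Experiment, Fixed-Effects Model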
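For illustration, filling in this layout with the values obtained from the Table 1 tensile strength data (k = 4, n = 6) gives:

```latex
\[
\begin{array}{l|c|c|c|c}
\text{Source of variation} & SS & df & MS & F \\
\hline
\text{Hardwood concentration} & 382.79 & 3 & 127.60 & f_0 = 19.61 \\
\text{Error} & 130.17 & 20 & 6.51 & \\
\hline
\text{Total} & 512.96 & 23 & &
\end{array}
\]
```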


MULTIPLE COMPARISONS METHODS FOLLOWING ANOVA.

When the null hypothesis $H_0: \tau_1 = \tau_2 = \cdots = \tau_k = 0$ is rejected in the ANOVA, we know that some of the treatment or factor-level means are different. However, the ANOVA does not identify which means are different.
To identify which pairs of treatment means are different, we use multiple comparisons methods. Here we describe a very simple one, Fisher's least significant difference (LSD) method.
The Fisher LSD method compares all pairs of means with the null hypotheses $H_0: \mu_i = \mu_j$ (for all $i \neq j$).


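THEOREM 2.1
If the ANOVA assumptions are satisfied, then

\[
T = \frac{(\bar{Y}_{i\cdot} - \bar{Y}_{j\cdot}) - (\mu_i - \mu_j)}{\sqrt{\dfrac{2\,MSE}{n}}}
\]

follows a Student t distribution with $k(n-1)$ degrees of freedom.

THE FISHER LSD METHOD FOR CONFIDENCE INTERVALS.

A $100(1-\alpha)\%$ CI for $\mu_i - \mu_j$ is given by

\[
\bar{y}_{i\cdot} - \bar{y}_{j\cdot} - t_{k(n-1),\,\alpha/2} \sqrt{\frac{2\,MSE}{n}}
\;\le\; \mu_i - \mu_j \;\le\;
\bar{y}_{i\cdot} - \bar{y}_{j\cdot} + t_{k(n-1),\,\alpha/2} \sqrt{\frac{2\,MSE}{n}}
\]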
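The interval can be sketched for the tensile strength data, here for the 10% and 15% hardwood concentration levels. The helper name `lsd_interval` is illustrative, and the value 2.086 is the upper 2.5% point of the t distribution with 20 degrees of freedom read from a standard t table; it is an input to this sketch rather than something the code computes.

```python
import math

# Table 1 data, keyed by hardwood concentration (%).
groups = {
    5:  [7, 8, 15, 11, 9, 10],
    10: [12, 17, 13, 18, 19, 15],
    15: [14, 18, 19, 17, 16, 18],
    20: [19, 25, 22, 23, 18, 20],
}

k = len(groups)
n = 6
means = {level: sum(obs) / n for level, obs in groups.items()}

# MSE = SSE / [k(n - 1)], pooled over all four treatments.
sse = sum((y - means[level]) ** 2 for level, obs in groups.items() for y in obs)
mse = sse / (k * (n - 1))

# Tabulated t_{20, 0.025} (assumed from a standard t table).
T_CRIT = 2.086


def lsd_interval(i, j):
    """95% Fisher LSD confidence interval for mu_i - mu_j."""
    diff = means[i] - means[j]
    half_width = T_CRIT * math.sqrt(2 * mse / n)
    return diff - half_width, diff + half_width


low, high = lsd_interval(10, 15)
print(f"95% CI for mu_10 - mu_15: ({low:.2f}, {high:.2f})")
```

The resulting interval contains zero, so at the 5% level these data do not show a difference in mean tensile strength between the 10% and 15% concentrations.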



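THE FISHER LSD METHOD FOR HYPOTHESIS TESTS.
Consider the hypotheses:

\[
H_0: \mu_i - \mu_j = 0 \qquad \text{versus} \qquad H_1: \mu_i - \mu_j \neq 0
\]

The test statistic value (a t-test):

\[
t_0 = \frac{\bar{y}_{i\cdot} - \bar{y}_{j\cdot}}{\sqrt{\dfrac{2\,MSE}{n}}}
\]

In particular, $H_0$ is rejected when

\[
|\bar{y}_{i\cdot} - \bar{y}_{j\cdot}| > t_{k(n-1),\,\alpha/2} \sqrt{\frac{2\,MSE}{n}}
\]

where $LSD = t_{k(n-1),\,\alpha/2} \sqrt{\dfrac{2\,MSE}{n}}$ is called the least significant difference.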
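The rejection rule can be applied to every pair of treatments at once. The sketch below does this for the tensile strength data; the tabulated value 2.086 for the upper 2.5% point of the t distribution with 20 degrees of freedom is assumed from a standard t table, and the names `lsd` and `results` are illustrative.

```python
import math
from itertools import combinations

# Table 1 data, keyed by hardwood concentration (%).
groups = {
    5:  [7, 8, 15, 11, 9, 10],
    10: [12, 17, 13, 18, 19, 15],
    15: [14, 18, 19, 17, 16, 18],
    20: [19, 25, 22, 23, 18, 20],
}

k = len(groups)
n = 6
means = {level: sum(obs) / n for level, obs in groups.items()}
sse = sum((y - means[level]) ** 2 for level, obs in groups.items() for y in obs)
mse = sse / (k * (n - 1))

T_CRIT = 2.086                              # t_{20, 0.025}, from a t table (assumed)
lsd = T_CRIT * math.sqrt(2 * mse / n)       # least significant difference

# Compare |ybar_i - ybar_j| with the LSD for every pair of levels.
results = {}
for i, j in combinations(groups, 2):
    diff = abs(means[i] - means[j])
    results[(i, j)] = diff > lsd
    verdict = "different" if results[(i, j)] else "not significantly different"
    print(f"{i:>2}% vs {j:>2}%: |diff| = {diff:5.2f}  ->  {verdict}")
print(f"LSD = {lsd:.2f}")
```

With these data, only the 10% and 15% pair has an absolute difference below the LSD; every other pair is declared different at the 5% level.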


EXAMPLE


For the tensile strength experiment of Example 1 (data in Table 1):

(A) Does the hardwood concentration affect the tensile strength of the bags?
(B) Find the confidence interval for the difference in mean tensile strength of the bags between the hardwood concentration levels 10% and 15%.
(C) Interpret the multiple comparison result.
