University of Gondar
College of Medicine and Health Science
Department of Epidemiology and Biostatistics
December, 2018
Analysis of variance
Standard deviation and Standard error
$\hat{p} = \dfrac{x}{n}$
Example
Some BLUE (best linear unbiased) estimators
Interval Estimation
$\left[\bar{x} - z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}},\;\; \bar{x} + z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}}\right]$
$\left[\hat{p} - z_{\alpha/2}\cdot\sqrt{\hat{p}(1-\hat{p})/n},\;\; \hat{p} + z_{\alpha/2}\cdot\sqrt{\hat{p}(1-\hat{p})/n}\right]$
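As a quick illustration of these two interval formulas, here is a minimal Python sketch; the inputs (x̄ = 50, σ = 10, n = 100 for the mean, and p̂ = 0.4, n = 400 for the proportion) are made-up values, not taken from these notes.

```python
import math

def mean_ci_z(xbar, sigma, n, z=1.96):
    # CI for a mean when sigma is known: xbar +/- z * sigma / sqrt(n)
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

def prop_ci_z(p_hat, n, z=1.96):
    # CI for a proportion: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Made-up inputs for illustration only
print(mean_ci_z(50, 10, 100))   # -> (48.04, 51.96)
print(prop_ci_z(0.4, 400))      # -> (0.352, 0.448)
```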
Interval estimation
Confidence intervals…
The 95% confidence interval is calculated in such a way that, under the
conditions assumed for the underlying distribution, the interval will contain the
true population parameter 95% of the time.
Loosely speaking, you might interpret a 95% confidence interval as one which
you are 95% confident contains the true parameter.
A 90% CI is narrower than a 95% CI, since we are only 90% certain that the interval
includes the population parameter.
On the other hand, a 99% CI will be wider than a 95% CI; the extra width means
that we can be more certain that the interval will contain the population
parameter. But to obtain higher confidence from the same sample, we must
be willing to accept a larger margin of error (a wider interval).
Confidence intervals…
Confidence interval for a single mean
CI = $\bar{x} \pm z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}}$ (when the population standard deviation σ is known)
Standard normal table (area between 0 and z):
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
Confidence interval ……
[Figure: density f(t) of the t distribution with 10 degrees of freedom; the critical values ±1.372 and ±2.228 are marked on the t axis, and each shaded tail beyond ±2.228 has area 0.025.]

Critical values of the t distribution:

df    t0.100  t0.050  t0.025  t0.010  t0.005
9     1.383   1.833   2.262   2.821   3.250
10    1.372   1.812   2.228   2.764   3.169
11    1.363   1.796   2.201   2.718   3.106
12    1.356   1.782   2.179   2.681   3.055
13    1.350   1.771   2.160   2.650   3.012
14    1.345   1.761   2.145   2.624   2.977
15    1.341   1.753   2.131   2.602   2.947
16    1.337   1.746   2.120   2.583   2.921
17    1.333   1.740   2.110   2.567   2.898
18    1.330   1.734   2.101   2.552   2.878
19    1.328   1.729   2.093   2.539   2.861
20    1.325   1.725   2.086   2.528   2.845
21    1.323   1.721   2.080   2.518   2.831
22    1.321   1.717   2.074   2.508   2.819
23    1.319   1.714   2.069   2.500   2.807
24    1.318   1.711   2.064   2.492   2.797
25    1.316   1.708   2.060   2.485   2.787
26    1.315   1.706   2.056   2.479   2.779
27    1.314   1.703   2.052   2.473   2.771
28    1.313   1.701   2.048   2.467   2.763
29    1.311   1.699   2.045   2.462   2.756
30    1.310   1.697   2.042   2.457   2.750
40    1.303   1.684   2.021   2.423   2.704
60    1.296   1.671   2.000   2.390   2.660
120   1.289   1.658   1.980   2.358   2.617
∞     1.282   1.645   1.960   2.326   2.576

Whenever σ is not known (and the population is assumed normal), the correct
distribution to use is the t distribution with n-1 degrees of freedom.
Note, however, that for large degrees of freedom, the t distribution is
approximated well by the Z distribution.
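The remark that the t distribution is well approximated by Z for large degrees of freedom is easy to verify numerically. The sketch below assumes SciPy is available; it prints the two-sided 95% critical value of t for several df next to z(0.025) = 1.960.

```python
from scipy import stats

# Two-sided 95% critical values: upper 2.5% point of t versus z(0.025) = 1.960
for df in (10, 30, 120, 1000):
    print(f"df = {df:5d}   t = {stats.t.ppf(0.975, df):.3f}")
print(f"z (normal)      {stats.norm.ppf(0.975):.3f}")
```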
Point and Interval Estimation of the Population Proportion (p)
$\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n} = \dfrac{0.295}{16} = 0.01844$
A stock market analyst wants to estimate the average return on a
certain stock. A random sample of 15 days yields an average
(annualized) return of $\bar{x} = 10.37$ and a standard deviation of s =
3.5. Assuming a normal population of returns, give a 95% confidence
interval for the average return on this stock.
From the t table, the critical value of t for df = (n - 1) = (15 - 1) = 14
and a right-tail area of 0.025 is $t_{0.025} = 2.145$.

The corresponding confidence interval (interval estimate) is:
$\bar{x} \pm t_{0.025}\cdot\frac{s}{\sqrt{n}} = 10.37 \pm 2.145 \times \frac{3.5}{\sqrt{15}} = 10.37 \pm 1.94 = (8.43,\ 12.31)$
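For readers who want to check this example by computer, here is a minimal sketch (assuming SciPy is installed) that reproduces the interval from the summary figures given above.

```python
import math
from scipy import stats

xbar, s, n = 10.37, 3.5, 15
t_crit = stats.t.ppf(0.975, n - 1)      # t_{0.025, 14} = 2.145
margin = t_crit * s / math.sqrt(n)      # about 1.94
print(xbar - margin, xbar + margin)     # about (8.43, 12.31)
```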
Example 3:
$\hat{P} = \dfrac{X}{n} = \dfrac{160}{400} = 0.4, \qquad \sigma_{\hat{P}} = \sqrt{\dfrac{\hat{P}(1-\hat{P})}{n}} = \sqrt{\dfrac{0.4(0.6)}{400}} = 0.0245$
Hence, we are about 98% confident that the true proportion of people in
the population who participate in sports is between 34.3% and 45.7%.
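The interval quoted above can be verified with a few lines of Python; the only assumption added here is the two-sided 98% critical value z ≈ 2.326.

```python
import math

x, n = 160, 400
p_hat = x / n                              # 0.40
se = math.sqrt(p_hat * (1 - p_hat) / n)    # 0.0245
z = 2.326                                  # two-sided 98% critical value
print(p_hat - z * se, p_hat + z * se)      # about (0.343, 0.457)
```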
HYPOTHESIS TESTING
Introduction
Researchers are interested in answering many types of research questions.
1. Identify the null hypothesis H0 and the alternative hypothesis HA.
2. Choose α. The value should be small, usually less than 10%. It is important to consider the consequences of both types of errors.
3. Select the test statistic and determine its value from the sample data. This value is called the observed value of the test statistic. Remember that a t statistic is usually appropriate for a small number of samples; for a larger number of samples, a z statistic can work well if the data are normally distributed.
4. Compare the observed value of the statistic to the critical value obtained for the chosen α.
5. Make a decision.
6. State the conclusion.
Test Statistics
Because of random variation, even an unbiased sample may not
accurately represent the population as a whole.
As a result, it is possible that any observed differences or
associations may have occurred by chance.
A test statistic is a value we can compare with the known distribution
of what we expect when the null hypothesis is true.
The general formula of the test statistic is:
$\text{Test statistic} = \dfrac{\text{Observed value} - \text{Hypothesized value}}{\text{Standard error}}$
The distributions most often used are the Normal distribution, Student's t distribution,
and the Chi-square distribution.
Critical value
The critical value separates the critical region from the noncritical region
for a given level of significance.
Decision making
Case 2: $H_0: \mu = \mu_0 \ (\mu \le \mu_0)$ versus $H_A: \mu > \mu_0$
$z_{cal} = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$, and $z_{tabulated} = z_{\alpha}$ for a one-tailed test.
Decision: if $z_{cal} > z_{tab}$, reject $H_0$; if $z_{cal} \le z_{tab}$, do not reject $H_0$.

Case 3: $H_0: \mu = \mu_0 \ (\mu \ge \mu_0)$ versus $H_A: \mu < \mu_0$
Decision: if $z_{cal} < -z_{tab}$, reject $H_0$; if $z_{cal} \ge -z_{tab}$, do not reject $H_0$.
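To make the decision rules above concrete, here is a minimal Python sketch of a one-tailed z test for a single mean; the sample figures (x̄ = 52, μ0 = 50, σ = 8, n = 64) are hypothetical, not from these notes.

```python
import math

def one_sample_z(xbar, mu0, sigma, n, z_tab=1.645, tail="upper"):
    # One-tailed z test for a single mean with sigma known.
    # z_tab is z_alpha for the chosen one-tailed alpha (1.645 for alpha = 0.05).
    z_cal = (xbar - mu0) / (sigma / math.sqrt(n))
    if tail == "upper":          # H_A: mu > mu0 -> reject if z_cal > z_tab
        reject = z_cal > z_tab
    else:                        # H_A: mu < mu0 -> reject if z_cal < -z_tab
        reject = z_cal < -z_tab
    return z_cal, reject

# Hypothetical numbers: xbar = 52, mu0 = 50, sigma = 8, n = 64
print(one_sample_z(52, 50, 8, 64))   # -> (2.0, True): reject H0
```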
The P-Value
When the p-value is less than 0.05, we often say that the
result is statistically significant.
Hypothesis testing for single population mean
To test the hypothesis we need to follow these steps:
Step 1: State the hypotheses
Ho: P = Po = 0.3
Ha: P ≠ Po (P ≠ 0.3)
Step 2: Fix the level of significance (α = 0.05)
Step 3: Compute the calculated and tabulated value of the test statistic
$z_{cal} = \dfrac{\hat{p} - P_0}{\sqrt{\dfrac{P_0(1-P_0)}{n}}} = \dfrac{0.175 - 0.3}{\sqrt{\dfrac{0.3(0.7)}{947}}} = \dfrac{-0.125}{0.0149} = -8.39$
$z_{tab} = 1.96$
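The Step 3 arithmetic can be reproduced with a short sketch using only the figures given above.

```python
import math

p_hat, p0, n = 0.175, 0.3, 947
se0 = math.sqrt(p0 * (1 - p0) / n)   # standard error under H0, about 0.0149
z_cal = (p_hat - p0) / se0           # about -8.39
print(z_cal, abs(z_cal) > 1.96)      # True -> reject H0 at the 5% level
```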
Example……
If the sample size is small (if np < 5 or n(1 - p) < 5), then use the Student's
t statistic for the tabulated value of the test statistic.
Comparing Two Population Means;
Independent Samples, Vars Known cont’d…
Confidence interval for $\mu_1 - \mu_2$ (variances known):
$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
Comparing Two Population Means;
Ind’t Samples, Vars Known cont’d…
$z = \dfrac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$, where $D_0 = (\mu_1 - \mu_2)_0$
Hypothesis testing for two sample means
The steps to test a hypothesis about the difference of two means are the same
as for a single mean.
Step 1: State the hypotheses
Ho: µ1 - µ2 = 0
vs
HA: µ1 - µ2 ≠ 0, HA: µ1 - µ2 < 0, or HA: µ1 - µ2 > 0
Step 2: Significance level (α)
Step 3: Test statistic
$z_{cal} = \dfrac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
Hypothesis …
$H_0: \mu_1 - \mu_2 = 0$
$H_A: \mu_1 - \mu_2 \neq 0$
SOLUTION
$z_{cal} = \dfrac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{(4.3 - 3.4) - 0}{\sqrt{\dfrac{2.9^2}{12} + \dfrac{3.5^2}{15}}} = \dfrac{0.9}{\sqrt{1.5178}} = \dfrac{0.9}{1.23} \approx 0.73$
$z_{\alpha/2} = z_{0.025} = 1.96$
Since $|z_{cal}| = 0.73 < 1.96$, we do not reject $H_0$.
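The arithmetic in this solution can be checked with a few lines of Python, using only the summary values given above.

```python
import math

xbar, ybar = 4.3, 3.4
sigma1, sigma2 = 2.9, 3.5        # treated as known population SDs here
n1, n2 = 12, 15

se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)   # sqrt(1.5178), about 1.23
z_cal = ((xbar - ybar) - 0) / se                  # about 0.73
print(z_cal, abs(z_cal) > 1.96)                   # False -> do not reject H0
```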
Comparing Two Population Means;
case-2: Independent Samples, Variances Unknown
Comparing Two Population Means;
Ind’t Samples, Vars Unknown cont’d…
Pooled variance:
$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

Test statistic:
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$, where $D_0 = (\mu_1 - \mu_2)_0$
Comparing Two Population Means;
Ind’t Samples, Vars Unknown cont’d…
Confidence interval:
$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}(n_1 + n_2 - 2)\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$

Test statistic:
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
Comparing Two Population Means;
Ind’t Samples, Vars Unknown cont’d…
When the two variances are not assumed equal, the test statistic is
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
with approximate degrees of freedom
$df = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$
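As a sketch of the two approaches on the last few slides (pooled variance versus separate variances with the approximate df), using made-up summary statistics rather than data from these notes:

```python
import math

def pooled_t(x1bar, x2bar, s1, s2, n1, n2, d0=0.0):
    # Two-sample t statistic with pooled variance (equal variances assumed)
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = (x1bar - x2bar - d0) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2                      # statistic and its df

def separate_t(x1bar, x2bar, s1, s2, n1, n2, d0=0.0):
    # Two-sample t statistic without assuming equal variances,
    # with the approximate df formula from the slide above
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (x1bar - x2bar - d0) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Made-up summary statistics for illustration only
print(pooled_t(10.2, 9.1, 2.0, 2.3, 20, 25))
print(separate_t(10.2, 9.1, 2.0, 2.3, 20, 25))
```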
Comparing Two Population Means;
Case-3: Paired/matched/repeated sampling
Paired sampling cont’d…
Hypothesis testing for two proportions
Confidence interval for $p_1 - p_2$:
$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$

Test statistic:
$z_{cal} = \dfrac{(\hat{p}_1 - \hat{p}_2) - (\pi_1 - \pi_2)}{\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}}$
Small sample size
Comparing Two Population Proportions cont’d…
A.
Where ≈ 0.005.
$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = 0.019 \pm 1.96 \times 0.0035$
The 95% CI for the difference is (0.012, 0.026).
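A small sketch of this interval calculation follows; since the underlying counts are not given in the notes, the values below (x1 = 200 of n1 = 5000 versus x2 = 105 of n2 = 5000) are hypothetical and were chosen only so that the difference and standard error roughly match the 0.019 and 0.0035 quoted above.

```python
import math

def two_prop_ci(x1, n1, x2, n2, z=1.96):
    # CI for p1 - p2 using the unpooled standard error
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Hypothetical counts, chosen only to roughly match the figures quoted above
print(two_prop_ci(200, 5000, 105, 5000))   # about (0.012, 0.026)
```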
ANOVA
Introduction
In the case of the two independent samples t-test, we have one continuous
dependent variable (interval/ratio data) and one categorical independent
variable with only two groups.
Conducting multiple t-tests can also lead to severe inflation of the Type I
error rate (false positives) and is not recommended.
ANOVA uses data from all groups at the same time to estimate standard
errors, which can increase the power of the analysis.
Assumptions of One-Way ANOVA
One-way model: the data are deviations from the treatment means, the $A_i$'s:
$X_{ij} = \mu + A_i + \varepsilon_{ij}$
[Figure: observations in groups G-1 and G-2 plotted around their treatment means A1 and A2; the sum of the squared vertical deviations from the group means = SSe.]
Decomposing the total variability
Total SS $= \sum_{i=1}^{n}\sum_{j=1}^{a}(x_{ij} - \bar{x}_{..})^2 = \sum_i\sum_j x_{ij}^2 - \dfrac{(\sum_i\sum_j x_{ij})^2}{na}$ = SST
Within SS $= \sum_{i=1}^{n}\sum_{j=1}^{a}(x_{ij} - \bar{x}_{.j})^2 = \sum_i\sum_j x_{ij}^2 - \sum_j\dfrac{(\sum_i x_{ij})^2}{n}$ = SSW
Between SS $= \sum_{i=1}^{n}\sum_{j=1}^{a}(\bar{x}_{.j} - \bar{x}_{..})^2 = \sum_j\dfrac{(\sum_i x_{ij})^2}{n} - \dfrac{(\sum_i\sum_j x_{ij})^2}{na}$ = SSB
This assumes each of the 'a' groups has equal size, 'n'.
Source of variation   df       SS             MS                 F
Between groups        a - 1    SSB = A - CF   MSB = SSB/(a-1)    MSB/MSW
Within groups         na - a   SSW = T - A    MSW = SSW/(na-a)
Total                 na - 1   SST = T - CF
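To make the decomposition concrete, here is a minimal NumPy sketch for equal group sizes; the data matrix (a = 3 groups of n = 4 observations) is made up for illustration.

```python
import numpy as np

# Made-up data: a = 3 groups (columns), n = 4 observations per group
x = np.array([[23., 28., 31.],
              [25., 30., 29.],
              [22., 27., 33.],
              [24., 29., 32.]])
n, a = x.shape

grand_mean = x.mean()
group_means = x.mean(axis=0)

sst = ((x - grand_mean) ** 2).sum()                 # total SS
ssw = ((x - group_means) ** 2).sum()                # within-group SS
ssb = n * ((group_means - grand_mean) ** 2).sum()   # between-group SS

msb = ssb / (a - 1)
msw = ssw / (n * a - a)
f_stat = msb / msw
print(sst, ssb + ssw)    # SST equals SSB + SSW
print(f_stat)            # F = MSB / MSW
```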
(From the example's ANOVA table, Total row: df = 21, SS = 55232.)
Since the P-value is less than 0.05, the null hypothesis is rejected.
Pair-wise comparisons of group means: post hoc tests (multiple comparisons)
The main explanation for the difference between the groups that was
identified in the ANOVA is therefore the difference between groups I and II.
Which post hoc method shall I use?
The post hoc tests differ from one another in how they calculate
the p value for the mean difference between groups.
Least Significant Difference (LSD) is the most liberal of the post hoc
tests and has a high Type I error rate; it is simply a series of multiple t-tests.
Thank You!