STATISTICAL METHODS Units-345notes BTech S.srinivasa Rao
Unit – 3: Sampling Distribution: population and sample, sampling distribution of the mean (σ known),
sampling distribution of the mean (σ unknown), sampling distribution of the variance: Chi-square and F-
distributions.
Unit – 4:
Estimation and Test of Hypothesis of Means: Point estimation, interval estimation, test of hypothesis,
hypothesis concerning one mean, hypothesis concerning two means, matched pair comparisons.
Unit – 5: Estimation and Test of Hypothesis of Variances and Proportions: Estimation of variance,
hypothesis concerning one variance, hypothesis concerning two variances, estimation of proportion,
hypothesis concerning one proportion, hypothesis concerning several proportions.
Population: A collection of objects (collection of numbers, measurements, observations etc) is called a
Population
Examples:
(1) Heights of the students in a University
(2) Marks obtained by the students of SSC in Mathematics
(3) Scores of candidates obtained in a competitive exam
(4) The set of outcomes when a coin is tossed 1000 times
(5) Collection of all even numbers
Parameters: The statistical measures (Mean, Median, Variance etc.) about a population are called
Parameters.
The Mean, Variance and Standard deviation of a population are respectively denoted by the symbols
$\mu$, $\sigma^2$ and $\sigma$
Statistics: The statistical measures (Mean, Median, Variance etc.) about a sample are called statistics.
The Mean, Variance and Standard deviation of a sample are respectively denoted by the symbols
$\bar{x}$, $s^2$ and $s$
Examples:
(1) For the population of ‘the Heights of the students in a University’, the heights of the students in class
of 40 is a sample
(2) For the population of ‘the Marks obtained by the students of SSC in Mathematics’, the marks of the
students of a particular school is a sample
(3) For the population of ‘the Scores of candidates obtained in a competitive exam’, the scores of the
candidates from a particular college is a sample
(4) For the population of ‘the set of outcomes when a coin is tossed 1000 times’, a collection of 10 outcomes
is a sample
(5) For the population of ‘the Collection of all even numbers’, a collection of 20 even numbers is a sample
Random Sample: A sample of size n taken from a population is called a random sample if every possible
choice of n objects from the population has the same probability of being selected.
Large Sampling: Collecting large samples from a given population is called large sampling
Small Sampling: Collecting small samples from a given population is called small sampling
Sampling with replacement: Collecting samples in which the objects may repeat, from a given population
is called sampling with replacement
Sampling without replacement: Collecting samples in which the objects do not repeat, from a given
population is called sampling without replacement
Number of Samples:
(i) The number of samples of size n without replacement, taken from a finite population of size N, is $\binom{N}{n} = {}^{N}C_{n}$
(ii) The number of samples of size n with replacement, taken from a finite population of size N, is $N^{n}$
(iii) The number of samples of size n with or without replacement, taken from an infinite population, is infinite
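The two finite counts can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `number_of_samples` and the values N = 4, n = 2 are assumptions chosen for illustration, not part of the notes.

```python
import math

def number_of_samples(N, n, replace):
    """Count the samples of size n from a finite population of size N."""
    # With replacement: N^n. Without replacement: N choose n (order ignored).
    return N ** n if replace else math.comb(N, n)

without_replacement = number_of_samples(4, 2, replace=False)
with_replacement = number_of_samples(4, 2, replace=True)
print(without_replacement, with_replacement)
```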
Examples:
This frequency distribution is called the sampling distribution of the mean (SDM)
Note:
Here drawing a sample of size $n = 2$ without replacement from the above population is a random
experiment with the possible outcomes $\{1,2\}, \{1,3\}, \{1,4\}, \{2,3\}, \{2,4\}, \{3,4\}$
That is, the sample space $S = \{\{1,2\}, \{1,3\}, \{1,4\}, \{2,3\}, \{2,4\}, \{3,4\}\}$
If $\bar{X}$ is the random variable which gives the mean of the sample, then the range of $\bar{X}$ is
$\{1.5, 2, 2.5, 3, 3.5\}$
(2) Consider the finite population $\{1, 2, 3, 4\}$ of size $N = 4$
The number of samples of size $n = 2$ with replacement is given by $N^{n} = 4^{2} = 16$
The means of these 16 samples are:
1,   1.5, 2,   2.5,
1.5, 2,   2.5, 3,
2,   2.5, 3,   3.5,
2.5, 3,   3.5, 4
Note:
(1) The value of $\frac{N-n}{N-1}$ is called the finite population correction factor
(2) The standard deviation of $\bar{X}$ is called the Standard error of the mean; that is,
Standard error (SE), $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
(3) The value of $0.6745\,\frac{\sigma}{\sqrt{n}}$ is called the Probable error of the mean; that is,
Probable error (PE) $= 0.6745\,\frac{\sigma}{\sqrt{n}}$
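Chebyshev’s Theorem:
If $\bar{X}$ is the random variable which gives the mean of a random sample of size n, taken from a population
having mean $\mu$ and variance $\sigma^2$, then $P(|\bar{X}-\mu| < k) \ge 1 - \frac{\sigma^2}{nk^2}$, where k is a positive constant.
Note:
1. $P(|\bar{X}-\mu| < k) \ge 1 - \frac{\sigma^2}{nk^2}$ or $P(\mu - k < \bar{X} < \mu + k) \ge 1 - \frac{\sigma^2}{nk^2}$
2. $P(|\bar{X}-\mu| \ge k) \le \frac{\sigma^2}{nk^2}$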
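Variance of the population, $\sigma^2 = \frac{1^2+2^2+3^2+4^2}{4} - (2.5)^2 = 7.5 - 6.25 = 1.25$
The means of the above 6 samples are respectively 1.5, 2, 2.5, 2.5, 3, 3.5
Mean, $\mu_{\bar{X}} = \frac{1.5+2+2.5+2.5+3+3.5}{6} = \frac{15}{6} = 2.5$
Variance, $\sigma^2_{\bar{X}} = \frac{2.25+4+6.25+6.25+9+12.25}{6} - (2.5)^2 = \frac{40}{6} - 6.25 = 6.6667 - 6.25 = 0.4167$
(v) Verification:
We have $\mu_{\bar{X}} = 2.5 = \mu$ and $\frac{\sigma^2}{n}\cdot\frac{N-n}{N-1} = \frac{1.25}{2}\cdot\frac{4-2}{4-1} = 0.4167 = \sigma^2_{\bar{X}}$
Therefore, $\mu_{\bar{X}} = \mu$ and $\sigma^2_{\bar{X}} = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$ are verified.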
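Variance of the population, $\sigma^2 = \frac{1^2+2^2+3^2+4^2}{4} - (2.5)^2 = 7.5 - 6.25 = 1.25$
The means of the 16 samples (with replacement) are:
1,   1.5, 2,   2.5,
1.5, 2,   2.5, 3,
2,   2.5, 3,   3.5,
2.5, 3,   3.5, 4
(iv) Mean and Variance of the Sampling distribution of the mean (SDM)
Mean, $\mu_{\bar{X}} = \frac{\sum x f}{\sum f} = \frac{1(1)+1.5(2)+2(3)+2.5(4)+3(3)+3.5(2)+4(1)}{16} = \frac{40}{16} = 2.5$
Variance, $\sigma^2_{\bar{X}} = \frac{\sum x^2 f}{\sum f} - \mu_{\bar{X}}^2 = \frac{1^2(1)+1.5^2(2)+2^2(3)+2.5^2(4)+3^2(3)+3.5^2(2)+4^2(1)}{16} - (2.5)^2 = \frac{110}{16} - 6.25 = 6.875 - 6.25 = 0.625$
(v) Verification:
We have $\mu_{\bar{X}} = 2.5 = \mu$ and $\frac{\sigma^2}{n} = \frac{1.25}{2} = 0.625 = \sigma^2_{\bar{X}}$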
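The two verifications for the population {1, 2, 3, 4} with n = 2 can be repeated by brute-force enumeration. The sketch below is an illustration only; the function name `sdm_moments` is hypothetical, and exact fractions are used so that the comparison with $\sigma^2/n$ and $\frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$ is free of rounding.

```python
from fractions import Fraction
from itertools import combinations, product

def sdm_moments(population, n, replace):
    """Return (mean, variance) of the sampling distribution of the mean."""
    # Enumerate every possible sample of size n.
    if replace:
        samples = product(population, repeat=n)
    else:
        samples = combinations(population, n)
    means = [Fraction(sum(s), n) for s in samples]
    mu = sum(means) / len(means)
    var = sum((m - mu) ** 2 for m in means) / len(means)
    return mu, var

population = [1, 2, 3, 4]
N, n = len(population), 2
mu = Fraction(sum(population), N)                      # population mean, 5/2
sigma2 = sum((x - mu) ** 2 for x in population) / N    # population variance, 5/4

mu_wr, var_wr = sdm_moments(population, n, replace=True)
mu_wor, var_wor = sdm_moments(population, n, replace=False)

print(float(mu_wr), float(var_wr))     # with replacement
print(float(mu_wor), float(var_wor))   # without replacement
```

With replacement the variance agrees with $\sigma^2/n$; without replacement it agrees with $\sigma^2/n$ multiplied by the finite population correction factor.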
(8) A normal population has mean 0.1 and standard deviation 2.1. Find the probability that the mean of a
sample of size 900 will be negative
$P(Z < -0.94) = F(-0.94) = 1 - F(0.94) = 1 - 0.8264 = 0.1736$
(10) If a 1- gallon can of paint covers on an average 513 square feet with a standard deviation of 31.5
square feet, what is the probability that the mean area covered by a sample of 40 of these 1- gallon
cans will be anywhere from 510 to 520 square feet
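A possible numerical treatment of problem (10), assuming the central limit theorem applies with standard error $\sigma/\sqrt{n}$: the helper `phi` below is a hypothetical name for the standard normal distribution function F(z), built from the error function rather than read from the normal table, so the result may differ slightly from a table-based answer.

```python
import math

def phi(z):
    """Standard normal distribution function F(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma, n = 513.0, 31.5, 40
se = sigma / math.sqrt(n)        # standard error of the mean

z_low = (510.0 - mu) / se
z_high = (520.0 - mu) / se
prob = phi(z_high) - phi(z_low)  # P(510 < sample mean < 520)

print(round(z_low, 2), round(z_high, 2), round(prob, 4))
```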
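(11) For a large sample of size n, verify that there is a 50-50 chance that the mean of a random sample
from an infinite population with standard deviation $\sigma$ differs from $\mu$ by less than $0.6745\,\frac{\sigma}{\sqrt{n}}$
We have to prove that $P\left(|\bar{X}-\mu| < 0.6745\,\frac{\sigma}{\sqrt{n}}\right) = 0.5$
Consider, $P\left(|\bar{X}-\mu| < 0.6745\,\frac{\sigma}{\sqrt{n}}\right) = P\left(\left|\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\right| < 0.6745\right)$
$= P(|Z| < 0.6745)$
$= P(-0.6745 < Z < 0.6745)$
$= F(0.6745) - F(-0.6745)$
$\approx F(0.67) - [1 - F(0.67)]$
$= 0.7486 - (1 - 0.7486)$
$= 0.497 \approx 0.5$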
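(12) The mean breaking strength of copper wire is 575 lbs, with a standard deviation of 8.3 lbs. How
large a sample must be used in order that there will be one chance in 100 that the mean breaking
strength of the sample is less than 572 lbs?
Here $\mu = 575$ and $\sigma = 8.3$. By the central limit theorem, $Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}}$
We have to find the value of n such that $P(\bar{X} < 572) = \frac{1}{100}$
Now $P(\bar{X} < 572) = 0.01$
$\Rightarrow P\left(\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < \frac{572-\mu}{\sigma/\sqrt{n}}\right) = 0.01$
$\Rightarrow P\left(Z < \frac{572-575}{8.3/\sqrt{n}}\right) = 0.01$
$\Rightarrow P\left(Z < -\frac{3\sqrt{n}}{8.3}\right) = 0.01$
$\Rightarrow F\left(-\frac{3\sqrt{n}}{8.3}\right) = 0.01$
$\Rightarrow 1 - F\left(\frac{3\sqrt{n}}{8.3}\right) = 0.01$
$\Rightarrow F\left(\frac{3\sqrt{n}}{8.3}\right) = 0.99$
$\Rightarrow \frac{3\sqrt{n}}{8.3} = 2.33$
$\Rightarrow n = \left(\frac{2.33 \times 8.3}{3}\right)^2 = 41.55 \approx 42$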
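The last step of problem (12) can be reproduced numerically. The sketch assumes the table value z = 2.33 for F(z) = 0.99 used in the solution; the variable names are illustrative only.

```python
import math

mu, sigma = 575.0, 8.3
threshold = 572.0
z = 2.33   # table value with F(2.33) approximately 0.99

# Solve (mu - threshold) * sqrt(n) / sigma = z for n, then round up
# because the sample size must be a whole number.
n_exact = (z * sigma / (mu - threshold)) ** 2
n = math.ceil(n_exact)

print(round(n_exact, 2), n)
```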
(13)
Exercise:
(1) Define (i) Population (ii) Sample (iii) Large sample (iv) Small sample (v) Random sample
(2) Define (i) Sampling (ii) Sampling Distribution of the Mean (iii) finite population correction factor
(iv) Standard error of the mean (v) Probable error of the mean
(3) State (i) Chebyshev’s Theorem (ii) Central Limit Theorem
(4) Write all possible samples of size 3 without replacement from the population $\{1, 2, 3, 4\}$.
Also compute the means of all these samples
(5) Write all possible samples of size 3 without replacement from the population $\{1, 2, 3, 4\}$.
Also compute the means of all these samples
(6) Find the finite population correction factor for $n = 5$ and $N = 1000$
(7) Find the finite population correction factor for $n = 25$ and $N = 1000$
(8) What is the effect on standard error, if a sample is taken from an infinite population and its size is
decreased from 800 to 200?
(9) What is the effect on standard error, if a sample is taken from an infinite population and its size is
increased from 300 to 2700?
(10) A population consists of the 4 numbers 3, 7, 11, 15
(i) Find the Mean and Variance of the population
(ii) Write all possible samples of size 2 with replacement
(iii) Write the Sampling distribution of the mean (SDM)
(iv) Find Mean and Variance of the Sampling distribution of the mean (SDM)
(v) Verify the results in (iv) with suitable formula
(11) A population consists of the 4 numbers 3, 7, 11, 15
(i) Find the Mean and Variance of the population
(ii) Write all possible samples of size 2 without replacement
(iii) Write the Sampling distribution of the mean (SDM)
(iv) Find Mean and Variance of the Sampling distribution of the mean (SDM)
(v) Verify the results in (iv) with suitable formula
(12) If $\bar{X}$ is the mean of a random sample of size n, taken from the population $\{1, 2, 3, \ldots, N\}$, find the
mean and variance of the Sampling distribution of the mean (SDM)
(13) Construct sampling distribution of means for the population 5, 10, 12, 18 by drawing samples of size
2 without replacement. Find (i) Mean of the population (ii) Standard deviation of the population (iii)
Mean of the sampling distribution of means (iv) Standard deviation of the sampling distribution of
means
(14) A random sample of size 100 is taken from a normal population with mean 76 and standard deviation
16. Find the probability that the mean of the sample will (i) exceed 77 (ii) fall between 75 and 78
(15) A random sample of size 81 is taken from a normal population with mean 65 and standard deviation
10. Find the probability that the mean of the sample will (i) exceed 67 (ii) fall between 66 and 68
(16) A random sample of size 36 is taken from a normal population with mean 155 and standard deviation
15. Find the probability that the mean of the sample will be (i) less than 157 (ii) in between 153 and
158 (iii) greater than 160
(17) A sample of size 400 is taken from an infinite population with Standard deviation 16. Find the
Standard error and Probable error of the mean
(18) If the mean of breaking strength of copper wire is 676 lbs, with a standard deviation of 12 lbs. How
large a sample must be used in order that there will be one chance in 100 that the mean breaking
strength of the sample is less than 672 lbs
(19) The mean of certain normal population is equal to the standard error of the mean of samples of size
64. Find the probability that the mean of the sample size 36 will be negative
(20)
Sampling Distribution of the Mean with unknown $\sigma$ (SDM with unknown $\sigma$):
If we do not know the value of $\sigma$, then we cannot use the Central limit theorem $Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}}$.
In this case we use the sample standard deviation $s$ in place of the population standard deviation $\sigma$, so that we
have a random variable different from $Z$. This new random variable is denoted by $t$; that is, $t = \frac{\bar{X}-\mu}{s/\sqrt{n}}$.
The probability distribution corresponding to this random variable $t$ is called the t-distribution with parameter
$n - 1$. This parameter is known as the degrees of freedom, denoted by $\nu$; that is, $\nu = n - 1$.
Properties of t - distribution:
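t - Notation:
If $0 < \alpha < 1$ then $t_{\alpha}$ is a point on the t – axis such that $P(t > t_{\alpha}) = \alpha$ or $P(t < t_{\alpha}) = 1 - \alpha$
That is, the area between the t – axis and the curve from $t_{\alpha}$ to $\infty$ is $\alpha$
(or the area between the t – axis and the curve from $-\infty$ to $t_{\alpha}$ is $1 - \alpha$)
Note:
(1) $t_{1} = -\infty$, $t_{0} = \infty$, $t_{1/2} = 0$
(2) $t_{\alpha} + t_{1-\alpha} = 0$ or $t_{\alpha} = -t_{1-\alpha}$ or $t_{1-\alpha} = -t_{\alpha}$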
Testing a claim using t-distribution: To test a given claim using the t-distribution, we follow the rule given
below.
(i) If $|t| \le t_{0.005}$ then the claim is accepted
(ii) If $|t| > t_{0.005}$ then the claim is rejected
t - Table:
In this table the values of $t_{\alpha}$ are available for different values of $\alpha$ and $\nu$
Example:
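Note: The pdf of the t-distribution is given by $f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi\nu}\;\Gamma\left(\frac{\nu}{2}\right)}\left(1+\frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$, $-\infty < t < \infty$
For $\nu = 1$ we have $f(t) = \frac{1}{\pi}(1+t^2)^{-1}$ for $-\infty < t < \infty$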
Problems:
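(1) Find (i) $t_{0.05}$ when $\nu = 12$ (ii) $t_{0.01}$ when $\nu = 8$ (iii) $t_{0.995}$ when $\nu = 10$
(2) (i) Find $t_{0.975}$ when $\nu = 13$ (ii) Find $t_{0.99}$ when $\nu = 10$ (iii) Find $t_{0.95}$ when $\nu = 11$
(3) Find (i) $P(t < 2.365)$ when $\nu = 7$ (ii) $P(t > 1.318)$ when $\nu = 24$
(iii) $P(t > -2.567)$ when $\nu = 17$ (iv) $P(-1.356 < t < 2.179)$ when $\nu = 12$
(i) When $\nu = 7$, $P(t < 2.365) = P(t < t_{0.025}) = 1 - P(t > t_{0.025}) = 1 - 0.025 = 0.975$
(ii) When $\nu = 24$, $P(t > 1.318) = P(t > t_{0.1}) = 0.1$
(iii) When $\nu = 17$, $P(t > -2.567) = P(t > -t_{0.01}) = 1 - P(t < -t_{0.01}) = 1 - P(t > t_{0.01}) = 1 - 0.01 = 0.99$
(iv) When $\nu = 12$, $P(-1.356 < t < 2.179) = P(-t_{0.1} < t < t_{0.025}) = 1 - (0.1 + 0.025) = 0.875$
Alternatively, $P(-t_{0.1} < t < t_{0.025}) = P(t > -t_{0.1}) - P(t > t_{0.025}) = P(t > t_{0.9}) - P(t > t_{0.025}) = 0.9 - 0.025 = 0.875$
(Figures: shaded areas under the t curve for (i) $P(t < t_{0.025})$ (ii) $P(t > t_{0.1})$ (iii) $P(t > -t_{0.01})$ (iv) $P(-t_{0.1} < t < t_{0.025})$)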
(4) A random sample of size 25 from a normal population has mean $\bar{x} = 47.5$ and standard deviation
$s = 8.4$. Does this information tend to support or refute the claim that the mean of the population is
$\mu = 42.1$?
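One way to carry out the computation for problem (4), assuming the rule stated earlier (compare $|t|$ with $t_{0.005}$) and the standard table value $t_{0.005} = 2.797$ for $\nu = 24$. The variable names are illustrative, and the critical value is typed in from the table rather than computed.

```python
import math

n, xbar, s = 25, 47.5, 8.4
mu0 = 42.1                  # claimed population mean

# t statistic with nu = n - 1 = 24 degrees of freedom
t = (xbar - mu0) / (s / math.sqrt(n))

t_crit = 2.797              # t_0.005 for nu = 24, read from a t table
reject = abs(t) > t_crit

print(round(t, 3), "reject the claim" if reject else "accept the claim")
```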
Note: In place of $\bar{x} = 0.515$, if we take $\bar{x} = 0.48$ then $|t| = 4.0754 > t_{0.005}$ and therefore we
need to reject the claim.
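(6) The t distribution with 1 degree of freedom is given by $f(t) = \frac{1}{\pi}(1+t^2)^{-1}$ for $-\infty < t < \infty$. Verify
the value given for $t_{0.05}$ when $\nu = 1$ in the table.
From the tables, when $\nu = 1$, $t_{0.05} = 6.314$
Now $P(t > 6.314) = \int_{6.314}^{\infty} f(t)\,dt = \frac{1}{\pi}\int_{6.314}^{\infty} \frac{1}{1+t^2}\,dt = \frac{1}{\pi}\left[\tan^{-1} t\right]_{6.314}^{\infty}$
$= \frac{1}{\pi}\left[\tan^{-1}(\infty) - \tan^{-1}(6.314)\right]$
$= \frac{1}{\pi}\left[\frac{\pi}{2} - 1.4137\right]$
$= \frac{7}{22}\left[\frac{11}{7} - 1.4137\right]$
$= 0.05018 \approx 0.05$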
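The same check can be done without the approximation $\pi \approx 22/7$. The sketch below evaluates the closed form $\frac{1}{\pi}\left[\frac{\pi}{2} - \tan^{-1}(t)\right]$ obtained in the solution; the function name `t1_upper_tail` is hypothetical.

```python
import math

def t1_upper_tail(t):
    """P(T > t) for the t distribution with 1 degree of freedom."""
    # Integral of 1 / (pi * (1 + u^2)) from t to infinity.
    return (math.pi / 2 - math.atan(t)) / math.pi

p = t1_upper_tail(6.314)
print(round(p, 4))
```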
(7)
Exercise:
(1) (i) Find $t_{0.025}$ when $\nu = 14$ (ii) Find $t_{0.01}$ when $\nu = 10$ (iii) Find $t_{0.995}$ when $\nu = 7$
(2) (i) Find $t_{0.99}$ when $\nu = 6$ (ii) Find $t_{0.975}$ when $\nu = 24$ (iii) Find $t_{0.975}$ when $\nu = 19$
(3) Find (i) P(t 2.821) when 9 (ii) P(t 2.947) when 15
(iii) P(t 1.729) when 19 (iv) P(1.714 t 2.069) when 23
(4) Find k for a sample of size 24 from a normal population such that (i) P(2.069 t k ) 0.965
(ii) P(k t 2.807) 0.095 (iii) P(k t k ) 0.9
(5) A random sample of size 25 from a normal population has mean $\bar{x} = 45.4$ and standard deviation
$s = 9.7$. Does this information tend to support or refute the claim that the mean of the population is
$\mu = 40$?
(6) A process for making certain ball bearings is under control if the diameters of the bearings have a
mean of 0.5 cm. What can you say about the process if a random sample of 10 of these bearings has a
mean diameter of 0.506 cm and standard deviation of 0.004 cm? ($n = 10$, $t_{0.005} = 3.25$, $t = 4.7434$;
reject the claim)
(7) The tensile strength (1,000 psi) of a new composite can be modeled as a normal distribution. A
random sample of 25 specimens has mean $\bar{x} = 45.3$ and standard deviation $s = 7.9$. Does this
information tend to support or refute the claim that the mean of the population is 40.5?
(8) The process of making concrete in a mixer is under control if the rotations per minute of the mixer
have a mean of 22 rpm. What can we say about this process if a sample of 20 of these mixers has a
mean rpm of 22.75 rpm and a standard deviation of 3 rpm?
(9) A manufacturer of fuses claims that with 20% overload, the fuses will blow in 12.4 minutes on the
average. To test this claim, a sample of 20 of the fuses was subjected to a 20% overload, and the
times it took them to blow had a mean of 10.63 minutes and standard deviation of 2.48 minutes. If it
can be assumed that the data constitute a random sample from a normal population, do they tend to
support or refute the manufacturer’s claim
(10) The t distribution with 1 degree of freedom is given by $f(t) = \frac{1}{\pi}(1+t^2)^{-1}$ for $-\infty < t < \infty$. Verify
the value given for $t_{0.1}$ when $\nu = 1$ in the table.
The frequency distribution of the variances of all random samples of fixed size, taken from a population is
called the Sampling Distribution of the Variance (SDV). It is denoted by S 2 .
Examples:
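The variance $= \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$ or $\frac{1}{n}\sum_{i=1}^{n} x_i^2 - (\bar{x})^2$
The variances of these 6 samples are respectively given as below.
$\frac{1}{2}(1^2 + 2^2) - (1.5)^2 = 2.5 - 2.25 = 0.25$
$\frac{1}{2}(1^2 + 3^2) - 2^2 = 5 - 4 = 1$
$\frac{1}{2}(1^2 + 4^2) - (2.5)^2 = 8.5 - 6.25 = 2.25$
$\frac{1}{2}(2^2 + 3^2) - (2.5)^2 = 6.5 - 6.25 = 0.25$
$\frac{1}{2}(2^2 + 4^2) - 3^2 = 10 - 9 = 1$
$\frac{1}{2}(3^2 + 4^2) - (3.5)^2 = 12.5 - 12.25 = 0.25$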
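Theorem: If $S^2$ is the variance of a random sample of size n, taken from a
normal population having variance $\sigma^2$, then $\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\sigma^2}$
is a random variable having
the Chi-square distribution with the parameter $\nu = n - 1$.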
$\chi^2$ Distribution: A continuous random variable having the probability density function given by
$f(x) = \frac{1}{2^{\nu/2}\,\Gamma\left(\frac{\nu}{2}\right)}\, x^{\frac{\nu}{2}-1} e^{-\frac{x}{2}}$ for $x > 0$ and $\nu > 0$, is called the Chi-square ($\chi^2$) random variable with
parameter $\nu$ known as degrees of freedom. The probability distribution is called the $\chi^2$ Distribution.
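Note:
(i) $\Gamma(n) = \int_{0}^{\infty} e^{-x} x^{n-1}\,dx = 2\int_{0}^{\infty} e^{-x^2} x^{2n-1}\,dx$
(ii) $\Gamma(1) = 1$ and $\Gamma(n) = (n-1)!$ for $n = 2, 3, 4, \ldots$
(iii) If $\nu = 2$, then $f(x) = \frac{1}{2^{1}\,\Gamma(1)} e^{-\frac{x}{2}} = \frac{1}{2} e^{-\frac{x}{2}}$ for $x > 0$
(iv) If $\nu = 4$, then $f(x) = \frac{1}{2^{2}\,\Gamma(2)}\, x e^{-\frac{x}{2}} = \frac{1}{4}\, x e^{-\frac{x}{2}}$ for $x > 0$
(v) If $\nu = 6$, then $f(x) = \frac{1}{2^{3}\,\Gamma(3)}\, x^2 e^{-\frac{x}{2}} = \frac{1}{16}\, x^2 e^{-\frac{x}{2}}$ for $x > 0$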
Properties of $\chi^2$ Distribution:
(i) The curve given by the probability density function is called the $\chi^2$ curve.
(ii) The $\chi^2$ curve lies in the 1st quadrant
(iii) It is not symmetric about any axis.
(iv) It depends on the value of $\nu$
(v) $P(\chi^2 \ge 0) = 1$; that is, the area between the $\chi^2$ – axis and
the curve from 0 to $\infty$ is 1
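$\chi^2_{\alpha}$ Notation:
If $0 < \alpha < 1$ then $\chi^2_{\alpha}$ is a point on the $\chi^2$ – axis such that $P(\chi^2 > \chi^2_{\alpha}) = \alpha$
That is, the area between the $\chi^2$ – axis and the curve from $\chi^2_{\alpha}$ to $\infty$ is $\alpha$
$\chi^2$ - Table:
In this table the values of $\chi^2_{\alpha}$ are available for different values of $\alpha$ and $\nu$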
Example:
Testing a claim using $\chi^2$-distribution: To test a given claim using the $\chi^2$-distribution, we follow the rule
given below.
(i) If $\chi^2 \le \chi^2_{0.005}$ then the claim is accepted
(ii) If $\chi^2 > \chi^2_{0.005}$ then the claim is rejected
Problems:
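(1) Find (i) $\chi^2_{0.025}$ when $\nu = 12$ (ii) $\chi^2_{0.05}$ when $\nu = 8$ (iii) $\chi^2_{0.99}$ when $\nu = 16$
Solution:
(i) When $\nu = 12$, $\chi^2_{0.025} = 23.337$
(ii) When $\nu = 8$, $\chi^2_{0.05} = 15.507$
(iii) When $\nu = 16$, $\chi^2_{0.99} = 5.812$
(2) Find (i) $P(\chi^2 < 12.833)$ when $\nu = 5$ (ii) $P(\chi^2 > 3.325)$ when $\nu = 9$ (iii) $P(\chi^2 < 13.12)$ when
$\nu = 25$
Solution:
(i) When $\nu = 5$, $P(\chi^2 < 12.833) = P(\chi^2 < \chi^2_{0.025}) = 1 - P(\chi^2 > \chi^2_{0.025}) = 1 - 0.025 = 0.975$
(ii) When $\nu = 9$, $P(\chi^2 > 3.325) = P(\chi^2 > \chi^2_{0.95}) = 0.95$
(iii) When $\nu = 25$, $P(\chi^2 < 13.12) = P(\chi^2 < \chi^2_{0.975}) = 1 - P(\chi^2 > \chi^2_{0.975}) = 1 - 0.975 = 0.025$
(3) Find (i) $P(2.088 < \chi^2 < 16.919)$ when $\nu = 9$ (ii) $P(7.564 < \chi^2 < 35.718)$ when $\nu = 17$
Solution:
(i) When $\nu = 9$, $P(2.088 < \chi^2 < 16.919) = P(\chi^2_{0.99} < \chi^2 < \chi^2_{0.05})$
$= P(\chi^2 > \chi^2_{0.99}) - P(\chi^2 > \chi^2_{0.05})$
$= 0.99 - 0.05 = 0.94$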
(5) The claim that the variance of a normal population is $\sigma^2 = 18.5$ is rejected if the standard deviation of a
random sample of size 25 exceeds 5.7559. What is the probability that the claim will be rejected even
though $\sigma^2 = 18.5$?
Solution:
Here $n = 25$, $\nu = n - 1 = 24$ and $\sigma^2 = 18.5$
Given that the claim is rejected if $S > 5.7559$
The probability that the claim will be rejected is given by
$P(S > 5.7559) = P(S^2 > 33.1304)$
$= P\left(\frac{(n-1)S^2}{\sigma^2} > \frac{33.1304\,(n-1)}{\sigma^2}\right)$
$= P\left(\chi^2 > \frac{33.1304 \times 24}{18.5}\right)$
$= P(\chi^2 > 42.9799) = P(\chi^2 > \chi^2_{0.01}) = 0.01$
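The table lookup in problem (5) can be cross-checked numerically. The sketch below uses a standard closed form for the upper tail of the Chi-square distribution that holds only when $\nu$ is even, $P(\chi^2 > x) = e^{-x/2}\sum_{k=0}^{\nu/2-1}\frac{(x/2)^k}{k!}$; this formula and the function name `chi2_upper_tail_even` are not from the notes and are introduced here purely for illustration.

```python
import math

def chi2_upper_tail_even(x, nu):
    """P(chi-square > x) when the degrees of freedom nu are even."""
    if nu <= 0 or nu % 2:
        raise ValueError("this closed form needs a positive even nu")
    half = x / 2.0
    term, total = 1.0, 0.0
    # Accumulate sum_{k=0}^{nu/2 - 1} (x/2)^k / k!
    for k in range(nu // 2):
        total += term
        term *= half / (k + 1)
    return math.exp(-half) * total

n, sigma2, s_limit = 25, 18.5, 5.7559
chi2_value = (n - 1) * s_limit ** 2 / sigma2     # about 42.98
p = chi2_upper_tail_even(chi2_value, n - 1)      # nu = 24 is even

print(round(chi2_value, 2), round(p, 3))
```

For odd $\nu$ this shortcut does not apply, and the $\chi^2$ table (or a statistics library) is needed instead.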
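(6) A random sample of 10 observations is taken from a normal population having the variance
$\sigma^2 = 42.5$. Find the approximate probability of obtaining a sample standard deviation between 3.14
and 8.94.
Solution: Here $n = 10$ and $\nu = n - 1 = 9$. Also $\sigma^2 = 42.5$, and we know that $\chi^2 = \frac{(n-1)S^2}{\sigma^2}$
The probability of obtaining a sample standard deviation between 3.14 and 8.94 is given by
$P(3.14 < S < 8.94) = P(9.8596 < S^2 < 79.9236)$
$= P\left(\frac{9.8596\,(n-1)}{\sigma^2} < \frac{(n-1)S^2}{\sigma^2} < \frac{79.9236\,(n-1)}{\sigma^2}\right)$
$= P\left(\frac{9.8596 \times 9}{42.5} < \chi^2 < \frac{79.9236 \times 9}{42.5}\right)$
$= P(2.0879 < \chi^2 < 16.9249)$
$= P(\chi^2_{0.99} < \chi^2 < \chi^2_{0.05})$
$= P(\chi^2 > \chi^2_{0.99}) - P(\chi^2 > \chi^2_{0.05})$
$= 0.99 - 0.05 = 0.94$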
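(7) The Chi-square distribution with 4 degrees of freedom is given by $f(x) = \frac{1}{4}\,x e^{-\frac{x}{2}}$ for $x > 0$ and $f(x) = 0$ for $x \le 0$.
Find the probability that the variance of a random sample with $\sigma = 12$ will exceed 180.
Solution: Here $\nu = 4$ and $\sigma = 12$
The probability that the variance of a random sample will exceed 180 is given by
$P(S^2 > 180) = P\left(\frac{(n-1)S^2}{\sigma^2} > \frac{180\,(n-1)}{\sigma^2}\right)$
$= P\left(\chi^2 > \frac{180 \times 4}{144}\right)$
$= P(\chi^2 > 5)$
$= \int_{5}^{\infty} f(x)\,dx = \frac{1}{4}\int_{5}^{\infty} x e^{-\frac{x}{2}}\,dx$
$= \frac{1}{4}\left[-2x e^{-\frac{x}{2}} - 4 e^{-\frac{x}{2}}\right]_{5}^{\infty}$
$= \frac{1}{4}\left[0 + 10 e^{-\frac{5}{2}} + 4 e^{-\frac{5}{2}}\right]$
$= \frac{7}{2}\, e^{-\frac{5}{2}} \approx 0.2873$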
(8)
Exercise:
(1) Find (i) $\chi^2_{0.05}$ when $\nu = 6$ (ii) $\chi^2_{0.99}$ when $\nu = 10$ (iii) $\chi^2_{0.025}$ when $\nu = 14$ (iv) $\chi^2_{0.01}$ when $\nu = 21$
(2) Determine the probabilities: (i) P( 2 27.688) when 13 (ii) P( 2 18.475) when 7
(iii) P( 2 14.256) when 29
(3) Find (i) $P(12.461 < \chi^2 < 48.278)$ when $\nu = 28$ (ii) $P(28.685 < \chi^2 < 29.141)$ when $\nu = 14$
(4) A random sample of 12 observations is taken from a normal population having the variance
$\sigma^2 = 42.5$. Find the approximate probability of obtaining a sample variance between 10.057 and
84.691. (Hint: $P(10.057 < s^2 < 84.691) = P(2.6029 < \chi^2 < 21.92) = P(\chi^2_{0.995} < \chi^2 < \chi^2_{0.025}) = 0.97$)
(5) The claim that the variance of a normal population is $\sigma^2 = 4$ is rejected if the variance of a random
sample of size 9 exceeds 7.7535. What is the probability that the claim will be rejected even though
$\sigma^2 = 4$?
(6) A random sample of 15 observations is taken from a normal population having variance $\sigma^2 = 90.25$.
Find the approximate probability of obtaining a sample standard deviation between 7.25 and 10.75.
(7) A manufacturer claims that any of his lists of items cannot have variance more than 1. A sample of
25 items has a variance of 1.2. Test whether the claim of the manufacturer is correct.
(8) The Chi-square distribution with 2 degrees of freedom is given by $f(x) = \frac{1}{2}\, e^{-\frac{x}{2}}$ for $x > 0$ and $f(x) = 0$ for $x \le 0$.
Find the probability that the standard deviation of a random sample with $\sigma = 10$ will exceed $10\sqrt{2}$.
(Hint: $P(S > 10\sqrt{2}) = P(S^2 > 200) = P(\chi^2 > 4) = \int_{4}^{\infty} f(x)\,dx$)
2 2
4
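F Distribution: A continuous random variable having the probability density function given by
$f(x) = \frac{\Gamma\left(\frac{\nu_1+\nu_2}{2}\right)\left(\frac{\nu_1}{\nu_2}\right)^{\frac{\nu_1}{2}} x^{\frac{\nu_1}{2}-1}}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)\left(1+\frac{\nu_1}{\nu_2}x\right)^{\frac{\nu_1+\nu_2}{2}}}$ for $x > 0$, $\nu_1 > 0$ and $\nu_2 > 0$, is called the F random variable
with parameters $\nu_1$ and $\nu_2$ known as degrees of freedom. The probability distribution is called the F
Distribution.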
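Note:
(i) $\Gamma(n) = \int_{0}^{\infty} e^{-x} x^{n-1}\,dx = 2\int_{0}^{\infty} e^{-x^2} x^{2n-1}\,dx$
(ii) $\Gamma(1) = 1$ and $\Gamma(n) = (n-1)!$ for $n = 2, 3, 4, \ldots$
(iii) If $\nu_1 = 2$ and $\nu_2 = 2$, then $f(x) = \frac{1}{(1+x)^2}$ for $x > 0$
(iv) If $\nu_1 = 4$ and $\nu_2 = 4$, then $f(x) = \frac{6x}{(1+x)^4}$ for $x > 0$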
Properties of F Distribution:
(i) The curve given by the probability density function is called the F curve.
(ii) The F curve lies in the 1st quadrant
(iii) It is not symmetric about any axis.
(iv) It depends on the values of $\nu_1$ and $\nu_2$
(v) $P(F \ge 0) = 1$; that is, the area between the F – axis and
the curve from 0 to $\infty$ is 1
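$F_{\alpha}$ Notation:
If $0 < \alpha < 1$ then $F_{\alpha}$ is a point on the F – axis such that $P(F > F_{\alpha}) = \alpha$
That is, the area between the F – axis and the curve from $F_{\alpha}$ to $\infty$ is $\alpha$.
The value of $F_{\alpha}$ corresponding to $\nu_1$ and $\nu_2$ is denoted by $F_{\alpha}(\nu_1, \nu_2)$
Note: $F_{\alpha}(\nu_1, \nu_2) = \frac{1}{F_{1-\alpha}(\nu_2, \nu_1)}$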
Theorem: If $S_1^2$ and $S_2^2$ are the variances of independent random samples of sizes $n_1$ and $n_2$, taken from
two normal populations having the same variance, then $F = \frac{S_1^2}{S_2^2}$ is a random variable having the
F distribution with the parameters $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$.
Note: If $F = \frac{S_2^2}{S_1^2}$, then the parameters are in the order $\nu_2 = n_2 - 1$ and $\nu_1 = n_1 - 1$
$F_{0.05}$ Tables: In this table the values of $F_{0.05}(\nu_1, \nu_2)$ are available for different values of $\nu_1$ and $\nu_2$
$F_{0.01}$ Tables: In this table the values of $F_{0.01}(\nu_1, \nu_2)$ are available for different values of $\nu_1$ and $\nu_2$
Example:
Problems:
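(1) For an F distribution find
(i) $F_{0.05}$ with $\nu_1 = 7$ and $\nu_2 = 15$
(ii) $F_{0.01}$ with $\nu_1 = 25$ and $\nu_2 = 19$
(iii) $F_{0.95}$ with $\nu_1 = 19$ and $\nu_2 = 25$
(iv) $F_{0.99}$ with $\nu_1 = 22$ and $\nu_2 = 12$
Solution:
(i) $F_{0.05}(7, 15) = 2.71$
(ii) $F_{0.01}(25, 19) = 2.91$
(iii) $F_{0.95}(19, 25) = \frac{1}{F_{1-0.95}(25, 19)} = \frac{1}{F_{0.05}(25, 19)} = \frac{1}{2.11} = 0.4739$
(iv) $F_{0.99}(22, 12) = \frac{1}{F_{1-0.99}(12, 22)} = \frac{1}{F_{0.01}(12, 22)} = \frac{1}{3.12} = 0.3205$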
(2) If two independent random samples of size $n_1 = 7$ and $n_2 = 13$ are taken from a normal population,
what is the probability that the variance of the first sample will be at least three times as large as that
of the second sample?
Solution:
Here $n_1 = 7$ and $n_2 = 13$, so $\nu_1 = 6$ and $\nu_2 = 12$
Therefore, the probability that the variance of the first sample will be at least three times as large as
that of the second sample is given by
$P(S_1^2 \ge 3S_2^2) = P\left(\frac{S_1^2}{S_2^2} \ge 3\right) = P(F \ge 3) = P(F \ge F_{0.05}) = 0.05$, since $F_{0.05}(6, 12) = 3$
(3) If two independent random samples of size $n_1 = 8$ and $n_2 = 8$ are taken from a normal population,
what is the probability that the variance of the first sample will be at least seven times as large as that
of the second sample?
Solution:
Here $n_1 = 8$ and $n_2 = 8$, so $\nu_1 = 7$ and $\nu_2 = 7$
Therefore, the probability that the variance of the first sample will be at least seven times as large as
that of the second sample is given by
$P(S_1^2 \ge 7S_2^2) = P\left(\frac{S_1^2}{S_2^2} \ge 7\right) = P(F \ge 7) = P(F \ge F_{0.01}) = 0.01$, since $F_{0.01}(7, 7) = 6.99 \approx 7$
(4) If two independent random samples of size $n_1 = 13$ and $n_2 = 26$ are taken from a normal population,
what is the probability that the variance of the second sample will be at least 2.5 times as large as that
of the first sample?
Solution:
Here $n_1 = 13$ and $n_2 = 26$, so $\nu_1 = 12$ and $\nu_2 = 25$
Therefore, the probability that the variance of the second sample will be at least 2.5 times as large as
that of the first sample is given by
$P(S_2^2 \ge 2.5\,S_1^2) = P\left(\frac{S_2^2}{S_1^2} \ge 2.5\right) = P(F \ge 2.5) = P(F \ge F_{0.05}) = 0.05$, since $F_{0.05}(25, 12) = 2.5$
(5)
(6)
Problems:
(1) For an F distribution find
(i) $F_{0.05}$ with $\nu_1 = 20$ and $\nu_2 = 10$
(ii) $F_{0.01}$ with $\nu_1 = 20$ and $\nu_2 = 5$
(iii) $F_{0.95}$ with $\nu_1 = 15$ and $\nu_2 = 12$
(iv) $F_{0.95}$ with $\nu_1 = 12$ and $\nu_2 = 15$
(v) $F_{0.99}$ with $\nu_1 = 5$ and $\nu_2 = 20$
(vi) $F_{0.99}$ with $\nu_1 = 20$ and $\nu_2 = 5$
(2) If two independent random samples of size $n_1 = 26$ and $n_2 = 8$ are taken from a normal population,
what is the probability that the variance of the second sample will be at least 2.4 times as large as that
of the first sample?
(3) If two independent random samples of size $n_1 = 9$ and $n_2 = 16$ are taken from a normal population,
what is the probability that the variance of the first sample will be at least four times as large as that of
the second sample?
(4) If two independent random samples of size $n_1 = 13$ and $n_2 = 7$ are taken from a normal population,
what is the probability that the variance of the first sample will be at least four times as large as that of
the second sample?
B.Tech IV Semester (2020 Batch)
STATISTICAL METHODS (20BM1109)
(For CSE-DS)
Unit – 4: Estimation and Test of Hypothesis of Means
(Point estimation, interval estimation, test of hypothesis, hypothesis concerning one mean, hypothesis
concerning two means, matched pair comparisons)
Point Estimation: Estimating a population parameter in terms of a single numerical value is called Point
Estimation.
Estimator: If a statistic $\hat{\theta}$ is used to estimate a population parameter $\theta$, then $\hat{\theta}$ is called an estimator for $\theta$
Unbiased Estimator: A statistic $\hat{\theta}$ is called an unbiased estimator for a population parameter $\theta$ if the
mean of the sampling distribution of the statistic is $\theta$; that is, $\mu_{\hat{\theta}} = \theta$ or $E(\hat{\theta}) = \theta$
For example,
(i) We know that $\mu_{\bar{X}} = \mu$ or $E(\bar{X}) = \mu$
Therefore, the statistic $\bar{X}$ is an unbiased estimator for the population parameter $\mu$
In other words, the sample mean is an unbiased estimator for the population mean
(ii) Similarly,
$\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$ is an unbiased estimator for the population variance $\sigma^2$
(iii) If X is a Binomial random variable with parameters n and p, then we have $E\left(\frac{X}{n}\right) = \frac{np}{n} = p$
Therefore, the statistic $\frac{X}{n}$ is an unbiased estimator for the population parameter p
Here p is called the proportion and $\frac{X}{n}$ is called the sample proportion
More efficient unbiased estimator: Let $\theta$ be a population parameter and $\hat{\theta}_1$, $\hat{\theta}_2$ be two statistics such that
(i) $\hat{\theta}_1$ and $\hat{\theta}_2$ are unbiased estimators for $\theta$; that is, $E(\hat{\theta}_1) = \theta$ and $E(\hat{\theta}_2) = \theta$
(ii) The variance of the sampling distribution of the statistic $\hat{\theta}_1$ is less than that of the statistic $\hat{\theta}_2$; that is,
$V(\hat{\theta}_1) < V(\hat{\theta}_2)$
Then $\hat{\theta}_1$ is called a more efficient unbiased estimator than $\hat{\theta}_2$ for $\theta$
Maximum Error of the Mean:
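Note: For a large sample of size n, verify that the probability is $1 - \alpha$ that the mean of a random sample
from an infinite population with standard deviation $\sigma$ differs from $\mu$ by less than the maximum error $E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$;
That is, $P(|\bar{X} - \mu| < E) = 1 - \alpha$
Consider,
$P(|\bar{X} - \mu| < E) = P\left(|\bar{X} - \mu| < z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$
$= P\left(\left|\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right| < z_{\alpha/2}\right)$
$= P(|Z| < z_{\alpha/2}) = P(-z_{\alpha/2} < Z < z_{\alpha/2})$
$= F(z_{\alpha/2}) - F(-z_{\alpha/2}) = F(z_{\alpha/2}) - [1 - F(z_{\alpha/2})]$
$= \left(1 - \frac{\alpha}{2}\right) - \left[1 - \left(1 - \frac{\alpha}{2}\right)\right]$
$= 1 - \alpha$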
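Let $\bar{x}$ be the mean of a random sample of size n, taken from a population having mean $\mu$ and variance $\sigma^2$
and E be the maximum error of the mean. Then
$\bar{x} + E$ and $\bar{x} - E$ are called the Upper and Lower confidence limits of the Mean with probability $1 - \alpha$ or
$(1 - \alpha)100\%$ confidence.
And $(\bar{x} - E,\ \bar{x} + E)$ is called the confidence interval for the Mean.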
Note:
(1)* $z_{0.01} = 2.33$
(2)* $z_{0.05} = 1.645$
(3)* $z_{0.005} = 2.575$
(4)* $z_{0.025} = 1.96$
Examples:
(1) What is the maximum error one can expect with probability 0.90 when using the mean of a random
sample of size $n = 64$ to estimate the mean of a population with $\sigma^2 = 2.56$?
The Maximum error of the Mean, $E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 1.645 \times \frac{1.6}{\sqrt{64}} = 0.329$
(2) A sample of size 10 with standard deviation 0.03 was taken from a population. Find the maximum
error with 95% confidence
The Maximum error of the Mean, $E = t_{\alpha/2}\frac{s}{\sqrt{n}} = 2.262 \times \frac{0.03}{\sqrt{10}} = 0.0215$
(3) The Mean and standard deviation of a sample of size $n = 50$ are 11,795 and 14,054 respectively, find the
95% confidence interval for the mean
The Maximum error of the Mean, $E = z_{\alpha/2}\frac{s}{\sqrt{n}} = 1.96 \times \frac{14054}{\sqrt{50}} = 3895.57$
Lower confidence limit, $\bar{x} - E = 11795 - 3895.57 = 7899.43$
Upper confidence limit, $\bar{x} + E = 11795 + 3895.57 = 15690.57$
Therefore, confidence interval $(\bar{x} - E,\ \bar{x} + E) = (7899.43,\ 15690.57)$
(4) The Mean and Variance of a sample of size $n = 300$ are 54 and 225 respectively, find the 95%
confidence interval for the mean
The Maximum error of the Mean, $E = z_{\alpha/2}\frac{s}{\sqrt{n}} = 1.96 \times \frac{15}{\sqrt{300}} = 1.6974$
Lower confidence limit, $\bar{x} - E = 54 - 1.6974 = 52.3026$
Upper confidence limit, $\bar{x} + E = 54 + 1.6974 = 55.6974$
Therefore, confidence interval $(\bar{x} - E,\ \bar{x} + E) = (52.3026,\ 55.6974)$
(5) The Mean and standard deviation of a sample of size $n = 100$ are 155 and 16 respectively, find the 99%
confidence interval for the population mean
The Maximum error of the Mean, $E = z_{\alpha/2}\frac{s}{\sqrt{n}} = 2.575 \times \frac{16}{\sqrt{100}} = 4.12$
Lower confidence limit, $\bar{x} - E = 155 - 4.12 = 150.88$
Upper confidence limit, $\bar{x} + E = 155 + 4.12 = 159.12$
Therefore, confidence interval $(\bar{x} - E,\ \bar{x} + E) = (150.88,\ 159.12)$
(6) The Mean and standard deviation of a sample of size $n = 23$ are 68 and 10 respectively, find the 99%
confidence interval for the population mean
The Maximum error of the Mean, $E = t_{\alpha/2}\frac{s}{\sqrt{n}} = 2.819 \times \frac{10}{\sqrt{23}} = 5.878$
Lower confidence limit, $\bar{x} - E = 68 - 5.878 = 62.122$
Upper confidence limit, $\bar{x} + E = 68 + 5.878 = 73.878$
Therefore, confidence interval $(\bar{x} - E,\ \bar{x} + E) = (62.122,\ 73.878)$
(7) Ten bearings made by a certain process have a mean diameter of 0.506 cm with a standard deviation
of 0.004 cm. Assuming the data may be taken as a random sample from a normal distribution,
construct a 95% confidence interval for the actual average diameter of the bearings.
The Maximum error of the Mean, $E = t_{\alpha/2}\frac{s}{\sqrt{n}} = 2.262 \times \frac{0.004}{\sqrt{10}} = 0.0029$
Lower confidence limit, $\bar{x} - E = 0.506 - 0.0029 = 0.5031$
Upper confidence limit, $\bar{x} + E = 0.506 + 0.0029 = 0.5089$
Therefore, confidence interval $(\bar{x} - E,\ \bar{x} + E) = (0.5031,\ 0.5089)$
(8) Five independent measurements of the flash point of Diesel oil gave the values 144, 147, 146, 142,
144. Assuming normality, determine a 99% confidence interval for the mean
Sample mean, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{144 + 147 + 146 + 142 + 144}{5} = 144.6$
Sample variance, $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
$= \frac{(144-144.6)^2 + (147-144.6)^2 + (146-144.6)^2 + (142-144.6)^2 + (144-144.6)^2}{4}$
$= \frac{0.36 + 5.76 + 1.96 + 6.76 + 0.36}{4} = \frac{15.2}{4} = 3.8$
Sample standard deviation, $s = \sqrt{3.8} = 1.95$
(SD mode; 144M+147M+…144M+; shift2; 1=….; for clearing data: on shift mode 3 = =)
The Maximum error of the Mean, $E = t_{\alpha/2}\frac{s}{\sqrt{n}} = 4.604 \times \frac{1.95}{\sqrt{5}} = 4.02$
Lower confidence limit, $\bar{x} - E = 144.6 - 4.02 = 140.58$
Upper confidence limit, $\bar{x} + E = 144.6 + 4.02 = 148.62$
Therefore, confidence interval $(\bar{x} - E,\ \bar{x} + E) = (140.58,\ 148.62)$
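The small-sample interval of example (8) can also be obtained from the raw data with Python's standard `statistics` module. This is a sketch under the same assumptions as the worked solution: the table value $t_{0.005} = 4.604$ for $\nu = 4$ is typed in, not computed. Because s is not rounded to 1.95 before multiplying, the limits come out marginally narrower (roughly 140.59 to 148.61).

```python
import math
import statistics

data = [144, 147, 146, 142, 144]
n = len(data)

xbar = statistics.mean(data)     # sample mean
s = statistics.stdev(data)       # sample SD with the n - 1 divisor

t_half_alpha = 4.604             # t_0.005 for nu = 4, from the t table
E = t_half_alpha * s / math.sqrt(n)   # maximum error of the mean

lower, upper = xbar - E, xbar + E
print(round(xbar, 1), round(s, 2), round(E, 2))
print(round(lower, 2), round(upper, 2))
```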
(9) The dean of a college wants to use the mean of a random sample to estimate the average amount of
time students take to get from one class to the next, and she wants to be able to assert with 99%
confidence that the error is at most 0.25 minute. If it can be presumed from experience that $\sigma = 1.40$
minutes, how large a sample will she have to take?
The Maximum error of the Mean, $E = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
Therefore, $n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{2.575 \times 1.4}{0.25}\right)^2 = (14.42)^2 = 207.9364 \approx 208$
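The sample-size formula $n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$ used in example (9) is easy to evaluate directly. In the sketch below the variable names are illustrative, and the result is rounded up because a sample size must be a whole number.

```python
import math

z_half_alpha = 2.575   # z_0.005, for 99% confidence
sigma = 1.40           # presumed population standard deviation (minutes)
E = 0.25               # largest error to be tolerated (minutes)

n_exact = (z_half_alpha * sigma / E) ** 2
n = math.ceil(n_exact)  # round up to the next whole observation

print(round(n_exact, 4), n)
```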
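(10) $E\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2\right] = \frac{1}{n}E\left[\sum_{i=1}^{n}(X_i - \bar{X})^2\right]$
$= \frac{1}{n}E\left[\sum_{i=1}^{n}\left(X_i^2 - 2X_i\bar{X} + \bar{X}^2\right)\right]$
$= \frac{1}{n}E\left[\sum_{i=1}^{n}X_i^2 - \sum_{i=1}^{n}2X_i\bar{X} + \sum_{i=1}^{n}\bar{X}^2\right]$
$= \frac{1}{n}E\left[\sum_{i=1}^{n}X_i^2 - 2\bar{X}\sum_{i=1}^{n}X_i + \bar{X}^2\sum_{i=1}^{n}1\right]$
$= \frac{1}{n}E\left[\sum_{i=1}^{n}X_i^2 - 2\bar{X}\,(n\bar{X}) + n\bar{X}^2\right]$
$= \frac{1}{n}E\left[\sum_{i=1}^{n}X_i^2 - 2n\bar{X}^2 + n\bar{X}^2\right]$
$= \frac{1}{n}E\left[\sum_{i=1}^{n}X_i^2 - n\bar{X}^2\right]$
$= \frac{1}{n}\left[\sum_{i=1}^{n}E(X_i^2) - nE(\bar{X}^2)\right]$
$= \frac{1}{n}\left[n(\sigma^2 + \mu^2) - n(\sigma_{\bar{X}}^2 + \mu_{\bar{X}}^2)\right]$
$= \frac{1}{n}\left[n(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right)\right]$
$= \frac{n-1}{n}\,\sigma^2$
Therefore, $\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$ is not an unbiased estimator for $\sigma^2$
But $E\left[\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2\right] = \sigma^2$
Therefore, $\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$ is an unbiased estimator for $\sigma^2$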
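The result of (10) can be checked exactly on a small case. The sketch below reuses the population {1, 2, 3, 4} with samples of size 2 drawn with replacement (so the observations are independent, as the derivation assumes); this choice of population and the helper name `sum_sq_dev` are assumptions made only for illustration.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]
N, n = len(population), 2
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance, 5/4

def sum_sq_dev(sample):
    """Sum of squared deviations of a sample from its own mean."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)

# All N^n equally likely samples drawn with replacement.
samples = list(product(population, repeat=n))

# Averages of the two competing estimators over every possible sample.
mean_divisor_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
mean_divisor_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print(mean_divisor_n, mean_divisor_n_minus_1, sigma2)
```

The divisor n gives an average of $\frac{n-1}{n}\sigma^2$, while the divisor $n - 1$ gives exactly $\sigma^2$.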
(11)
(12)
Exercise:
(1) What is the maximum error one can expect with probability 0.99 when using the mean of a random
sample of size $n = 64$ to estimate the mean of a population with $\sigma^2 = 2.56$?
(2) What is the maximum error one can expect with probability 0.98 when using the mean of a random
sample of size $n = 15$ with $s^2 = 1.96$ to estimate the mean of a population?
(3) A sample of size 49 with mean 60 was taken from a population whose S.D. is 10. Find the 95%
confidence interval for the population mean
(4) A random sample of size $n = 100$ is taken from a population with $\sigma = 5$. Given that the sample mean is
$\bar{x} = 21.6$, construct a 95% confidence interval for the population mean
(5) A random sample of size 81 was taken from a population whose variance is 20.25, and the sample mean is 32.
Construct a 98% confidence interval for the true mean
(6) The Mean and standard deviation of a sample of size $n = 18$ are 75 and 16 respectively, find the 98%
confidence interval for the population mean
(7) Nine bearings made by a certain process have a mean diameter of 0.506cm with a standard deviation
of 0.004cm. Assuming the data may be taken as a random sample from a normal distribution,
construct a 99% confidence interval for the actual average diameter of the bearings.
(8) In an air pollution study, the following amounts of suspended benzene soluble organic matter (in
micrograms per cubic meter) were obtained at an experiment station for eight different samples of air:
2.2, 1.8, 3.1, 2.0, 2.4, 2.0, 2.1 and 1.2. Construct a 95% confidence interval for the corresponding true
mean
Examples:
1. Average height of the students in a Class is 5.8ft; that is, $\mu = 5.8$
2. Average height of the students in a Class is at most 6.2ft; that is, $\mu \le 6.2$
3. Average height of the students in a Class is at least 5.1ft; that is, $\mu \ge 5.1$
4. 15
5. 20
6. Variance of the marks obtained by the students of SSC in Mathematics is 18; that is, $\sigma^2 = 18$
7. 2 8
8. 2 8
Classification of Hypotheses: Hypotheses are classified into two types: (i) Simple Hypothesis (ii) Composite Hypothesis
Simple Hypothesis: A hypothesis which gives complete information about a population parameter is
called a simple hypothesis. In other words, a hypothesis which contains the symbol '=' is called a simple
hypothesis.
1. 15 is composite
2. $\mu = 15$ is simple
3. 20 is composite
4. $\sigma^2 = 18$ is simple
5. 2 8 is composite
Testing of Hypothesis: Verifying the validity of a hypothesis using given sample data is called Testing
of Hypothesis
Errors in Testing of Hypothesis: If a hypothesis is true and accepted, or a hypothesis is false and rejected,
then in either case there is no error in the decision. Otherwise, there is an error in the decision, and this error
is of two types: (i) Type I Error (ii) Type II Error
Type I Error: If a hypothesis is true but rejected, then the error in rejecting it is called a Type I error.
The probability of committing a Type I error is called the Level of Significance (LoS) and it is denoted by $\alpha$
Type II Error: If a hypothesis is false but accepted, then the error in accepting it is called a Type II
error. The probability of committing a Type II error is denoted by $\beta$
Null Hypothesis (NH): In the testing of a hypothesis, a hypothesis which is formulated for the sake of
rejection, under the assumption that it is true, is called the Null hypothesis. It is denoted by $H_0$
Note: Usually a null hypothesis is simple
Alternative Hypothesis (AH): In the testing of a hypothesis, a hypothesis which is not the null hypothesis
is called the Alternative hypothesis and it is denoted by $H_1$
Note: Usually an alternative hypothesis is composite
Examples:
1. Null hypothesis H 0 : 15
Alternative hypothesis H1 : 15
2. Null hypothesis H 0 : 15
Alternative hypothesis H1 : 15
3. Null hypothesis H 0 : 15
Alternative hypothesis H1 : 15
4. Null hypothesis H 0 : 15
Alternative hypothesis H1 : 15
5. Null hypothesis H 0 : 15
Alternative hypothesis H1 : 15
Classification of Tests of Hypothesis: Tests of hypothesis are classified into two types: (i) One Tailed Test
(OTT) (ii) Two Tailed Test (TTT)
One Tailed Tests are further classified into two types: (i) Left One Tailed Test (LOTT) (ii) Right One Tailed
Test (ROTT)
Left One Tailed Test (LOTT): In the testing of a hypothesis, if the alternative hypothesis $H_1$ contains the
symbol '<', then the test is called a Left One Tailed Test
Right One Tailed Test (ROTT): In the testing of a hypothesis, if the alternative hypothesis $H_1$ contains
the symbol '>', then the test is called a Right One Tailed Test
Two Tailed Test (TTT): In the testing of a hypothesis, if the alternative hypothesis $H_1$ contains the
symbol '≠', then the test is called a Two Tailed Test
Examples:
1. In a testing of a hypothesis, if
Null hypothesis $H_0 : \mu = 15$
Alternative hypothesis $H_1 : \mu < 15$
Then it is a One Tailed Test, specifically a Left One Tailed Test
2. In a testing of a hypothesis, if
Null hypothesis $H_0 : \mu = 15$
Alternative hypothesis $H_1 : \mu > 15$
Then it is a One Tailed Test, specifically a Right One Tailed Test
3. In a testing of a hypothesis, if
Null hypothesis $H_0 : \mu = 15$
Alternative hypothesis $H_1 : \mu \ne 15$
Then it is a Two Tailed Test
Critical region: The region under a probability curve in which $H_0$ is rejected is called the Critical region
Guidelines for formulating $H_0$ and $H_1$: When the goal of an experiment is to establish an assertion, the
negation of the assertion should be taken as the Null hypothesis $H_0$. The assertion becomes the Alternative
hypothesis $H_1$. In detail we have the following.
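Note:
(1) $z_{0.01} = 2.33$
(2) $z_{0.05} = 1.645$
(3) $z_{0.005} = 2.575$
(4) $z_{0.025} = 1.96$
$H_1$                 Reject $H_0$ if
$\mu < \mu_0$         $Z < -z_{\alpha}$
$\mu > \mu_0$         $Z > z_{\alpha}$
$\mu \ne \mu_0$       $|Z| > z_{\alpha/2}$
Test statistics and Critical regions for tests of Hypotheses for ONE MEAN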
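The large-sample criteria for one mean can be sketched in Python as follows. The function name and the `tail` argument are illustrative choices, not part of the notes. Critical values are computed with `statistics.NormalDist` instead of the rounded table values in the Note, so they differ slightly in the third decimal.

```python
import math
from statistics import NormalDist

def z_test_one_mean(xbar, mu0, sigma, n, alpha, tail):
    """Return (Z, reject) for H0: mu = mu0.

    tail is 'left' (H1: mu < mu0), 'right' (H1: mu > mu0)
    or 'two' (H1: mu != mu0).
    """
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()
    if tail == "left":
        reject = z < -std_normal.inv_cdf(1 - alpha)
    elif tail == "right":
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif tail == "two":
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return z, reject

# Aptitude-test data: n = 45, mu0 = 73.2, sigma = 8.6, xbar = 76.7, alpha = 0.01
z1, reject1 = z_test_one_mean(76.7, 73.2, 8.6, 45, 0.01, "right")
# Sample of 400 items: mu0 = 38, s = 10 used for sigma, xbar = 40, alpha = 0.05
z5, reject5 = z_test_one_mean(40, 38, 10, 400, 0.05, "two")
print(round(z1, 2), reject1, round(z5, 2), reject5)
```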
Examples:
(1) According to the norms established for a mechanical aptitude test, persons who are 18 years old should
average 73.2 with a standard deviation of 8.6. If 45 randomly selected persons of that age averaged
76.7, test the null hypothesis $\mu = 73.2$ against the alternative hypothesis $\mu > 73.2$ at the 0.01 level of
significance
Here $n = 45$ (large sample), $\mu = 73.2$, $\sigma = 8.6$, $\bar{x} = 76.7$ and $\alpha = 0.01$
(i) Null Hypothesis $H_0 : \mu = 73.2$
(ii) Alternative Hypothesis $H_1 : \mu > 73.2$
(iii) Level of Significance : $\alpha = 0.01$
(iv) Test statistic : $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
(v) Criterion : Reject $H_0$ if $Z > z_{\alpha}$, where $z_{\alpha} = z_{0.01} = 2.33$
(vi) Calculation : $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \dfrac{76.7 - 73.2}{8.6/\sqrt{45}} = 2.73$ and $z_{\alpha} = z_{0.01} = 2.33$
(vii) Decision: Since $Z > z_{\alpha}$, reject $H_0$ based on the sample data at $\alpha = 0.01$
(3) It is claimed that a random sample of 49 tires has mean life of 15200 km. This sample was drawn
from a population whose mean is 15150 km and standard deviation 1200 km. Test the claim at 0.05
level of significance
(5) A sample of 400 items is taken from a population whose standard deviation is 10. The mean of the
sample is 40. Test whether the sample has come from a population with mean 38 at 0.05 level of
significance
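We have to test $\mu = 38$
Here $n = 400$ (large sample), $\mu = 38$, $s = 10$, $\bar{x} = 40$ and $\alpha = 0.05$
(i) Null Hypothesis $H_0 : \mu = 38$
(ii) Alternative Hypothesis $H_1 : \mu \ne 38$
(iii) Level of Significance : $\alpha = 0.05$
(iv) Test statistic : $Z = \dfrac{\bar{X} - \mu}{s/\sqrt{n}}$
(v) Criterion : Reject $H_0$ if $|Z| > z_{\alpha/2}$, where $z_{\alpha/2} = z_{0.025} = 1.96$
(vi) Calculation : $Z = \dfrac{\bar{X} - \mu}{s/\sqrt{n}} = \dfrac{40 - 38}{10/\sqrt{400}} = 4$, $|Z| = 4$ and $z_{\alpha/2} = z_{0.025} = 1.96$
(vii) Decision: Since $|Z| > z_{\alpha/2}$, reject $H_0$ based on the sample data at $\alpha = 0.05$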
(6) A trucking firm is suspicious of the claim that the average life time of certain tires is at least 28,000
miles. To check the claim, the firm puts 40 of these tires on its trucks and gets a mean life time of
27,463 miles with a standard deviation of 1,348 miles. What can it conclude if the probability of
Type I error is to be at most 0.01?
(7) The specifications for a certain kind of ribbon call for a mean breaking strength of 180 pounds. If five
pieces of the ribbon have a mean breaking strength of 169.5 pounds with a standard deviation of 5.7
pounds, test the null hypothesis $\mu = 180$ pounds against the alternative hypothesis $\mu < 180$ pounds at
the 0.01 level of significance. Assume that the population distribution is normal
(8) A random sample of 6 steel beams has a mean compressive strength of 58,392 psi with standard
deviation 648 psi. Use this information and the level of significance 0.05, test whether the true
average compressive strength of the steel from which this sample came is 58,000 psi. Assume
normality
(9) A sample of 26 bulbs gives a mean life of 990 hours with a standard deviation of 20 hours. The
manufacturer claims that the mean life of bulbs is at least 1000 hours. Is the sample not up to the
standard at the 0.05 level of significance?
(10) The heights of 10 males of a given locality are found to be 70, 67, 62, 68, 61, 68, 70, 64, 64, 66 inches.
Is it reasonable to believe that the average height is greater than 64 inches at the 5% level of significance?
Sample variance, $s^2 = \dfrac{(70-66)^2 + (67-66)^2 + (62-66)^2 + (68-66)^2 + \cdots + (64-66)^2 + (66-66)^2}{9} = \dfrac{90}{9} = 10$
Sample standard deviation, $s = \sqrt{10} = 3.162$
( SD mode, 70M+67M+…66M+,shift2, 1=….for clearing data: on shift mode 3 ==)
(11) A random sample of size 16 values from a normal population showed a mean 53 and sum of squares
of deviations from the mean equals to 150. Can this sample be regarded as taken from the population
having mean 56 at 5% level of significance?
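Here $n = 16$ (small sample), $\nu = n - 1 = 15$, $\mu = 56$, $\bar{x} = 53$, $\sum_{i=1}^{n}(x_i - \bar{x})^2 = 150$ and $\alpha = 0.05$
Sample variance, $s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \dfrac{150}{15} = 10$
Sample standard deviation, $s = \sqrt{10} = 3.162$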
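Once $s$ is known, the small-sample statistic $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$ follows directly. The sketch below is only an illustration: it evaluates $t$ for the two small-sample examples (the heights of 10 males, and the sample of 16 values). The critical values are standard t-table entries supplied here as assumptions, since the standard library has no t-distribution.

```python
import math
from statistics import mean, stdev

def t_statistic(xbar, mu0, s, n):
    """One-sample t statistic with n - 1 degrees of freedom."""
    return (xbar - mu0) / (s / math.sqrt(n))

# Example (11): n = 16, xbar = 53, mu0 = 56, sum of squared deviations = 150
s11 = math.sqrt(150 / 15)
t11 = t_statistic(53, 56, s11, 16)
T_CRIT_11 = 2.131  # assumed table value t_{0.025} with 15 d.f. (two-tailed, alpha = 0.05)
print(round(t11, 3), abs(t11) > T_CRIT_11)

# Example (10): raw heights, H1: mu > 64
heights = [70, 67, 62, 68, 61, 68, 70, 64, 64, 66]
t10 = t_statistic(mean(heights), 64, stdev(heights), len(heights))
T_CRIT_10 = 1.833  # assumed table value t_{0.05} with 9 d.f. (right-tailed)
print(round(t10, 3), t10 > T_CRIT_10)
```

`statistics.stdev` uses the divisor $n - 1$, which matches the sample standard deviation used in these notes.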
(1) Define (i) Hypothesis (ii) Simple hypothesis (iii) Composite hypothesis (iv) Null hypothesis
(v) Alternative hypothesis
(2) Explain the errors in the testing a hypothesis
(3) Write the guidelines for formulating H 0 and H 1
(4) Explain the procedure for testing a hypothesis
(5) Write the test statistic and criteria for the testing a hypothesis concerning one mean large and small
samples
(6) According the norms established for a mechanical aptitude test, persons who are 18 years old should
average 76.4 with a standard deviation of 9.2. If 49 randomly selected persons of that age averaged
70.2. Test the null hypothesis 76.4 against the alternative hypothesis 76.4 at the 0.01 level
of significance
(7) Tests performed with a random sample of 40 diesel engines produced by large manufacturer show
that they have mean thermal efficiency of 31.4% with a standard deviation of 1.6%. At the 0.01 level
of significance, test the null hypothesis 32.3% against the alternative hypothesis 32.3%
(8) In 64 randomly selected hours of production, the mean and the standard deviation of the number of
acceptable pieces produced by a automatic stamping machine are x 1038 and s 146 . At the 0.05
level of significance, does this enable us to reject the null hypothesis 1000 against the alternative
hypothesis 1000
(9) A company producing computers states that the mean life time of its computers is 1600 hours. Test
the claim at 0.01 level of significance against alternative hypothesis H1 : 1600 hrs, if 100
computers produced by this company has mean life time of 1570 hours with SD of 120 hours
(10) In a labor-management discussion it was brought up those workers at a certain large plant take on
average 32.6 minutes to get to work. If a random sample of 60 workers took on the average 34.1
minutes with a standard deviation of 6.1 minutes, can we reject the null hypothesis 32.6 in a
favor of the alternative hypothesis 32.6 at the (i) 0.05 (ii) 0.01 level of significance
(11) An ambulance service claims that it takes on the average less than 10 minutes to reach its destination
in emergency calls. A sample of 36 calls has a mean of 9 minutes and the variance of 16 minutes.
Test the significance at the 0.05 level. Hint: $n = 36$, $H_0 : \mu = 10$, $H_1 : \mu < 10$
(12) A sample of 64 students has a mean weight of 70 kgs. Can this be regarded as a sample from a
population with mean weight 56 kgs and standard deviation 25 kgs. Test at level of significance 0.05
Hint: $n = 64$, $H_0 : \mu = 56$, $H_1 : \mu \ne 56$
(13) A sample of 400 individuals is found to have a mean height of 67.47 inches. Can it be reasonably
regarded as a sample from a large population with mean height of 67.39 inches and standard
deviation 1.30 inches? Hint: $n = 400$, $H_0 : \mu = 67.39$, $H_1 : \mu \ne 67.39$
(14) A lady stenographer claims that she can take dictation at the rate of 120 words per min. Can we reject
her claim on the basis of 100 trials in which she demonstrates a mean of 116 words with a standard
deviation of 15 words Hint n 100, H 0 : 120, H1 : 120
(15) A company is making engine parts with axle diameter of 0.7 inch. A random sample of 10 parts
shows a mean diameter of 0.742 inch with a standard deviation of 0.04 inch. Test whether the work
meets the specification at the 0.05 level of significance. Hint: $n = 10$, $H_0 : \mu = 0.7$, $H_1 : \mu \ne 0.7$
(16) A machine is designed to produce insulating washers for electrical devices of average thickness of
0.025 cm. A random sample of 10 washers was found to have a thickness of 0.024 cm with a standard
deviation of 0.002 cm. Test the significance of the deviation at the 0.05 level.
Hint: $n = 10$, $H_0 : \mu = 0.025$, $H_1 : \mu \ne 0.025$
(17) The average breaking strength of certain kind of steel rods is 18.5 thousands of pounds. To test this, a
sample of 14 steel rods of this kind was tested. The mean and standard deviation are respectively
obtained as 17.85 and 1.955 thousands of pounds. Test at 0.05 level of significance.
Hint n 14, H 0 : 18.5, H1 : 18.5
(18) A random sample from a company’s very extensive files shows that the orders for a certain kind of
machinery were filled, respectively in 10, 12, 19, 14, 15, 18, 11 and 13 days. Use the level of
significance 0.01 to test the claim that on the average such orders are filled in 10.5 days. Choose the
alternative hypothesis so that rejection of null hypothesis implies that it takes longer than indicated.
Hint: $n = 8$, $H_0 : \mu = 10.5$, $H_1 : \mu > 10.5$, $\bar{x} = 14$, $s = 3.207$
(19) A manufacturer of certain kind of electric bulbs claims that his bulbs have a mean life of 25 months.
A random sample of 6 bulbs gave the life months 24, 26, 30, 20, 20 and 18. Can you accept the
manufacturer’s claim at 5% level of significance?
Hint: $n = 6$, $H_0 : \mu = 25$, $H_1 : \mu \ne 25$, $\bar{x} = 23$, $s = 4.52$
(20) A random sample of 8 envelops is taken from a letter box of a post office and their weights in grams
are found to be 12.1, 11.9, 12.4, 12.3, 11.9, 12.1, 12.4 and 12.1. Does this sample indicate at 1% level
of significance that the average weight of envelops received the post office is 12.35 grams.
Hint: $n = 8$, $H_0 : \mu = 12.35$, $H_1 : \mu \ne 12.35$, $\bar{x} = 12.15$, $s = 0.2$
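Test statistics and Critical regions for tests of Hypotheses for ONE & TWO means
Null hypothesis $H_0 : \mu_1 - \mu_2 = \delta$
Case (i): For large samples ($n_1 \ge 30$, $n_2 \ge 30$),
Test statistic : $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - \delta}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
Criterion :
$H_1$                          Reject $H_0$ if
$\mu_1 - \mu_2 < \delta$       $Z < -z_{\alpha}$
$\mu_1 - \mu_2 > \delta$       $Z > z_{\alpha}$
$\mu_1 - \mu_2 \ne \delta$     $|Z| > z_{\alpha/2}$
Note: If $n_1 \ge 30$, $n_2 \ge 30$ and $\sigma_1, \sigma_2$ are not known, then we can use $s_1, s_2$ in place of $\sigma_1, \sigma_2$ respectively
Case (ii): For small samples ($n_1 < 30$, $n_2 < 30$) and $\sigma_1, \sigma_2$ unknown,
Test statistic : $t = \dfrac{(\bar{X}_1 - \bar{X}_2) - \delta}{\sqrt{\dfrac{\sigma_p^2}{n_1} + \dfrac{\sigma_p^2}{n_2}}}$ with $\nu = n_1 + n_2 - 2$
where, $\sigma_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ or $\sigma_p^2 = \dfrac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}$
Criterion :
$H_1$                          Reject $H_0$ if
$\mu_1 - \mu_2 < \delta$       $t < -t_{\alpha}$
$\mu_1 - \mu_2 > \delta$       $t > t_{\alpha}$
$\mu_1 - \mu_2 \ne \delta$     $|t| > t_{\alpha/2}$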
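The pooled-variance statistic for two small samples can be sketched as follows. The function names are illustrative, and `delta` defaults to 0 for the usual test of equal means. The sample call uses the IQ data from the notes ($n_1 = 16$, $n_2 = 14$, means 107 and 112, standard deviations 10 and 8).

```python
import math

def pooled_variance(n1, s1, n2, s2):
    """sigma_p^2 = ((n1-1)s1^2 + (n2-1)s2^2) / (n1+n2-2)."""
    return ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)

def two_sample_t(xbar1, xbar2, n1, s1, n2, s2, delta=0.0):
    """Pooled t statistic for H0: mu1 - mu2 = delta (n1+n2-2 d.f.)."""
    sp2 = pooled_variance(n1, s1, n2, s2)
    return (xbar1 - xbar2 - delta) / math.sqrt(sp2 / n1 + sp2 / n2)

sp2 = pooled_variance(16, 10, 14, 8)
t = two_sample_t(107, 112, 16, 10, 14, 8)
print(round(sp2, 4), round(t, 4))
```

The resulting $|t|$ is then compared with the tabulated $t_{\alpha}$ or $t_{\alpha/2}$ for $n_1 + n_2 - 2$ degrees of freedom.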
Examples:
(1) The means of two large samples of sizes 1000 and 2000 members are 67.5 inches and 68.0 inches
respectively. Test whether the samples are drawn from the same population of standard deviation 2.5
inches at 0.05 level of significance
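Here $n_1 = 1000$, $n_2 = 2000$ (large samples), $\bar{x}_1 = 67.5$, $\bar{x}_2 = 68$, $\sigma = 2.5$ and $\alpha = 0.05$
We need to check $\mu_1 = \mu_2$ or $\mu_1 - \mu_2 = 0$
(i) Null Hypothesis $H_0 : \mu_1 - \mu_2 = 0$
(ii) Alternative Hypothesis $H_1 : \mu_1 - \mu_2 \ne 0$
(iii) Level of Significance : $\alpha = 0.05$
$Z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{67.5 - 68}{\sqrt{\dfrac{(2.5)^2}{1000} + \dfrac{(2.5)^2}{2000}}} = -5.16$, $|Z| = 5.16$
And $z_{\alpha/2} = z_{0.025} = 1.96$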
(2) Samples of students were drawn from two universities, and the means and standard deviations of their
weights (in kg) were calculated as given below. Make a large-sample test of the significance of the
difference between the means at the 0.05 level of significance
Here $n_1 = 400$, $n_2 = 100$ (large samples), $\bar{x}_1 = 55$, $\bar{x}_2 = 57$, $s_1 = 10$, $s_2 = 15$ and $\alpha = 0.05$
(3) A random sample of 1000 men from North India shows that mean wage is Rs.5/- per day with
standard deviation Rs.1.5/-. A sample of 1500 men from South India given mean wage is Rs.4.5/- per
day with standard deviation Rs.2/-. Does the mean rate of wages vary between the regions? Choose
the level of significance 0.01
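$Z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{5 - 4.5}{\sqrt{\dfrac{(1.5)^2}{1000} + \dfrac{2^2}{1500}}} = 7.1307$, $|Z| = 7.1307$
And $z_{\alpha/2} = z_{0.005} = 2.575$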
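$Z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{72 - 70}{\sqrt{\dfrac{8^2}{32} + \dfrac{6^2}{36}}} = 1.1547$
And $z_{\alpha} = z_{0.05} = 1.645$
(vii) Decision: Since $Z < z_{\alpha}$, accept $H_0$ based on the sample data at $\alpha = 0.05$
That is, the performance of boys and girls is the same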
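(5) The IQs of 16 students from one area of a city showed a mean of 107 with a standard deviation of 10,
while the IQs of 14 students from another area of the city showed a mean of 112 with a standard
deviation of 8. Is there a significant difference between the IQs of the two groups at the level of
significance 0.05?
Here $n_1 = 16$, $n_2 = 14$ (small samples), $\nu = n_1 + n_2 - 2 = 28$, $\bar{x}_1 = 107$, $\bar{x}_2 = 112$, $s_1 = 10$, $s_2 = 8$ and $\alpha = 0.05$
Test statistic : $t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_p^2}{n_1} + \dfrac{\sigma_p^2}{n_2}}}$ with $\nu = n_1 + n_2 - 2$
(v) Criterion : Reject $H_0$ if $|t| > t_{\alpha/2}$, where $t_{\alpha/2} = t_{0.025} = 2.048$
(vi) Calculation :
$\sigma_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \dfrac{15(10)^2 + 13(8)^2}{28} = \dfrac{2332}{28} = 83.29$
$t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_p^2}{n_1} + \dfrac{\sigma_p^2}{n_2}}} = \dfrac{107 - 112}{\sqrt{\dfrac{83.29}{16} + \dfrac{83.29}{14}}} = -1.4971$, $|t| = 1.4971$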
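$\bar{x}_1 = \dfrac{42 + 39 + 48 + 60 + 41}{5} = \dfrac{230}{5} = 46$
$\bar{x}_2 = \dfrac{38 + 42 + 56 + 64 + 68 + 69 + 62}{7} = \dfrac{399}{7} = 57$
Sum of the squares of the deviations of 1st sample, $\sum (x_1 - \bar{x}_1)^2 = 16 + 49 + 4 + 196 + 25 = 290$
Test statistic : $t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_p^2}{n_1} + \dfrac{\sigma_p^2}{n_2}}}$ with $\nu = n_1 + n_2 - 2$
(v) Criterion : Reject $H_0$ if $t < -t_{\alpha}$, where $t_{\alpha} = t_{0.05} = 1.812$
(vi) Calculation :
$t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_p^2}{n_1} + \dfrac{\sigma_p^2}{n_2}}} = \dfrac{46 - 57}{\sqrt{\dfrac{121.6}{5} + \dfrac{121.6}{7}}} = -1.7036$
And $t_{\alpha} = t_{0.05} = 1.812$
(vii) Decision: Since $t > -t_{\alpha}$, accept $H_0$ based on the sample data at $\alpha = 0.05$
That is, there is no significant increase in the weights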
(7) Two independent samples of size 8 and 7 has the following values
Sample I 11 11 13 11 15 9 12 14
Sample II 9 11 10 13 9 8 10
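$\sum (x_2 - \bar{x}_2)^2 = 1 + 1 + 0 + 9 + 1 + 4 + 0 = 16$
And $\sigma_p^2 = \dfrac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2} = \dfrac{26 + 16}{13} = 3.2308$
(i) Null Hypothesis $H_0 : \mu_1 - \mu_2 = 0$
(ii) Alternative Hypothesis $H_1 : \mu_1 - \mu_2 \ne 0$
(iii) Level of Significance : $\alpha = 0.05$
$t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_p^2}{n_1} + \dfrac{\sigma_p^2}{n_2}}} = \dfrac{12 - 10}{\sqrt{\dfrac{3.2308}{8} + \dfrac{3.2308}{7}}} = 2.15$ and $|t| = 2.15$
And $t_{\alpha/2} = t_{0.025} = 2.16$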
(8) The means and standard deviations of two samples of sizes 200 and 100 are respectively given by 60
and 50, 8 and 12. Find the (i) standard error (ii) maximum error (iii) 95% confidence interval of the
difference between the means
Here $n_1 = 200$, $n_2 = 100$ (large samples), $\bar{x}_1 = 60$, $\bar{x}_2 = 50$, $s_1 = 8$, $s_2 = 12$
Also, confidence = 95%, probability $1 - \alpha = 0.95$ and so $\alpha = 0.05$
Now, $\alpha/2 = 0.025$ and $z_{\alpha/2} = z_{0.025} = 1.96$
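$\sigma_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \dfrac{15(10)^2 + 13(8)^2}{28} = \dfrac{2332}{28} = 83.2857$
Standard error, $SE = \sqrt{\dfrac{\sigma_p^2}{n_1} + \dfrac{\sigma_p^2}{n_2}} = \sqrt{\dfrac{83.2857}{16} + \dfrac{83.2857}{14}} = 3.3398$
Maximum error, $E = t_{\alpha/2}\sqrt{\dfrac{\sigma_p^2}{n_1} + \dfrac{\sigma_p^2}{n_2}} = (2.048)\sqrt{\dfrac{83.29}{16} + \dfrac{83.29}{14}} = 6.8399$
Lower confidence limit, $(\bar{x}_1 - \bar{x}_2) - E = (107 - 112) - 6.8399 = -11.8399$
Upper confidence limit, $(\bar{x}_1 - \bar{x}_2) + E = (107 - 112) + 6.8399 = 1.8399$
Therefore, confidence interval $= \left((\bar{x}_1 - \bar{x}_2) - E,\ (\bar{x}_1 - \bar{x}_2) + E\right) = (-11.8399,\ 1.8399)$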
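For the large-sample data of Example (8) ($n_1 = 200$, $n_2 = 100$, means 60 and 50, standard deviations 8 and 12), the same three quantities can be sketched as below. This assumes the usual large-sample standard error $\sqrt{s_1^2/n_1 + s_2^2/n_2}$ with $z_{0.025} = 1.96$. It is an illustration, not the notes' own working, and the function name is hypothetical.

```python
import math

def diff_means_interval(xbar1, s1, n1, xbar2, s2, n2, z_half_alpha):
    """Large-sample standard error, maximum error and confidence
    interval for mu1 - mu2 (s1, s2 used in place of sigma1, sigma2)."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    e = z_half_alpha * se
    diff = xbar1 - xbar2
    return se, e, (diff - e, diff + e)

se, e, (low, high) = diff_means_interval(60, 8, 200, 50, 12, 100, 1.96)
print(round(se, 4), round(e, 4), round(low, 4), round(high, 4))
```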
Solution: Suppose that the average losses of working hours before and after a safety program are $\mu_1$ and
$\mu_2$ respectively. The safety program being effective means that fewer working hours are lost after it,
and hence we have to test $\mu_1 > \mu_2$ or $\mu_1 - \mu_2 > 0$. If $\mu$ is the average of the differences of the loss of
working hours, then we have to test $\mu > 0$.
Confidence Interval:
Here $n = 10$ (small sample), $\nu = n - 1 = 9$, $\mu = 0$, $\bar{x} = 5.2$ and $s = 4.077$
Also, confidence = 90%, probability $1 - \alpha = 0.90$ and so $\alpha = 0.1$
Now, $\alpha/2 = 0.05$ and $t_{\alpha/2} = t_{0.05} = 1.833$
The Maximum error of the Mean, $E = t_{\alpha/2}\,\dfrac{s}{\sqrt{n}} = 1.833 \times \dfrac{4.077}{\sqrt{10}} = 2.3632$
Lower confidence limit, $\bar{x} - E = 5.2 - 2.3632 = 2.8368$
Upper confidence limit, $\bar{x} + E = 5.2 + 2.3632 = 7.5632$
Therefore, confidence interval $= (\bar{x} - E,\ \bar{x} + E) = (2.8368,\ 7.5632)$
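The matched-pair interval is an ordinary one-sample t interval applied to the differences. A minimal sketch using the summary values of this example, with the table value $t_{0.05} = 1.833$ for 9 degrees of freedom passed in as an argument (the function name is illustrative):

```python
import math

def mean_interval_t(xbar, s, n, t_half_alpha):
    """Maximum error and confidence interval for a mean from a small sample."""
    e = t_half_alpha * s / math.sqrt(n)
    return e, (xbar - e, xbar + e)

# Differences (before - after): n = 10, mean 5.2, standard deviation 4.077
e, (low, high) = mean_interval_t(5.2, 4.077, 10, 1.833)
print(round(e, 4), round(low, 4), round(high, 4))
```

Because the whole interval lies above 0, it is consistent with a positive mean reduction in lost working hours at the 90% confidence level.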
Exercise:
(1) Write the test statistic and criteria for the testing a hypothesis concerning two means large and small
samples
(2) A sample of 100 electric bulbs produced by manufacturer A showed a mean life time of 1190 hours
and a standard deviation of 90 hours. A sample of 75 bulbs produced by manufacturer B showed a
mean life time of 1230 hours, with a standard deviation of 120 hours. Is there a significant difference
between the mean life time of two brands at the level of significance 0.05
(3) A company claims that its light bulbs are superior to those of its main competitor. If a study showed
that a sample of $n_1 = 40$ of its bulbs has a mean life time of 647 hours of continuous use with a
standard deviation of 27 hours, while a sample of $n_2 = 40$ bulbs made by its main competitor had a
mean life time of 638 hours of continuous use with a standard deviation of 31 hours, does this
substantiate the claim at the 0.05 level of significance?
Hint: $n_1 = 40$, $n_2 = 40$, $H_0 : \mu_1 - \mu_2 = 0$, $H_1 : \mu_1 - \mu_2 > 0$, $Z = 1.38$, $z_{0.05} = 1.645$, accept $H_0$
(4) The means of two random samples of sizes 8 and 7 are 1234 and 1036 respectively. The standard
deviations of these two samples are 36 and 40 respectively. Is there a significant difference between
the means at the level of significance 0.05
Hint: $n_1 = 8$, $n_2 = 7$, $H_0 : \mu_1 - \mu_2 = 0$, $H_1 : \mu_1 - \mu_2 \ne 0$, $t = 9.39$, $t_{0.025} = 2.160$, reject $H_0$
(5) The means of two random samples of sizes 9 and 7 are 196.42 and 198.82 respectively. The sum of
the squares deviations from the mean are 26.94 and 18.73 respectively. Can the samples be
considered to have been drawn from the same population? Use the 0.01 level of significance
Hint: $n_1 = 9$, $n_2 = 7$, $H_0 : \mu_1 - \mu_2 = 0$, $H_1 : \mu_1 - \mu_2 \ne 0$, $|t| = 2.63$, $t_{0.005} = 2.977$, accept $H_0$
(6) Measuring specimens of nylon yarn taken from two spinning machines, it was found that 8 specimens
from the first machine had a mean denier of 9.67 with a standard deviation of 1.81 while 10
specimens from second machine had a mean denier of 7.43 with a standard deviation of 1.48.
Assuming that the populations sampled are normal and have the same variance, test the null hypothesis
$\mu_1 - \mu_2 = 1.5$ against the alternative hypothesis $\mu_1 - \mu_2 > 1.5$ at the 0.05 level of significance.
(7) The following random samples are measurements of the heat-producing capacity (in millions of
calories per ton) of specimens of coal from two mines
Horse A 28 30 32 33 33 29 34
Horse B 29 30 30 24 27 29
Test whether the two horses have the same running capacity or not at the level of significance 0.05
(9) In a study of the effectiveness of physical exercise in weight reduction program, a group of 11
persons engaged in a prescribed program of physical exercise for a month showed the following
results.
Weight before 209 178 169 212 180 192 159 180 170 153 183
Weight after 196 171 160 207 177 190 128 196 164 152 179
Use the level of significance 0.01 test whether the prescribed program is effective.
(10)
B.Tech IV Semester (2020 Batch)
STATISTICAL METHODS (20BM1109)
(For CSE-DS)
(Estimation of variance, hypothesis concerning one variance, hypothesis concerning two variances,
estimation of proportion, hypothesis concerning one proportion, hypothesis concerning several
proportions)
Proportion:
Let $p, q$ be the success and failure probabilities of an event in a trial. Let the trial be conducted any
number of times. Then the collection of all successes and failures of the event is a population known as a
Binomial population. For this population, $p$ is called the Proportion or True proportion. If $x$ is the number
of successes in $n$ trials, then $\dfrac{x}{n}$ is called the Sample Proportion and it is denoted by $P$; that is, $P = \dfrac{x}{n}$
Example:
Let a coin be tossed 10 times and ‘getting head $H$’ be the event. Suppose that the outcomes in these 10
tosses are $H, H, H, T, T, H, T, H, H, T$ respectively. Now the collection of all the successes and failures of
the event is a population; that is, Population $= \{S, S, S, F, F, S, F, S, S, F\}$.
For this population, Proportion $p = \dfrac{1}{2}$
(i) If we collect the first five outcomes $S, S, S, F, F$, then it is a sample of size $n = 5$
For this sample, sample proportion $P = \dfrac{x}{n} = \dfrac{3}{5}$
(ii) If we collect the first two outcomes $S, S$, then it is a sample of size $n = 2$
For this sample, sample proportion $P = \dfrac{x}{n} = \dfrac{2}{2} = 1$
(iii) If we collect the first four outcomes $S, S, S, F$, then it is a sample of size $n = 4$
For this sample, sample proportion $P = \dfrac{x}{n} = \dfrac{3}{4}$
(iv) If we collect the last two outcomes $S, F$, then it is a sample of size $n = 2$
For this sample, sample proportion $P = \dfrac{x}{n} = \dfrac{1}{2}$
Examples:
(1) Among 900 people in a state 90 are found to be rice eaters. Construct 99% confidence interval for the
true proportion p
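Here $n = 900$, $x = 90$ and $\dfrac{x}{n} = \dfrac{90}{900} = 0.1$
Also, confidence = 99%, probability $1 - \alpha = 0.99$ and so $\alpha = 0.01$
Now, $\alpha/2 = 0.005$ and $z_{\alpha/2} = z_{0.005} = 2.575$
The Maximum error of the proportion, $E = z_{\alpha/2}\sqrt{\dfrac{\frac{x}{n}\left(1 - \frac{x}{n}\right)}{n}} = 2.575\sqrt{\dfrac{0.1(1 - 0.1)}{900}} = 0.02575$
Lower confidence limit, $\dfrac{x}{n} - E = 0.1 - 0.02575 = 0.07425$
Upper confidence limit, $\dfrac{x}{n} + E = 0.1 + 0.02575 = 0.12575$
Therefore, confidence interval $= \left(\dfrac{x}{n} - E,\ \dfrac{x}{n} + E\right) = (0.07425,\ 0.12575)$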
(2) In a random sample of 400 industrial accidents, it was found that 231 were due at least partially to
unsafe working conditions. Construct 99% confidence interval for the true proportion using the large
sample confidence interval formula
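Here $n = 400$, $x = 231$ and $\dfrac{x}{n} = \dfrac{231}{400} = 0.5775$
Also, confidence = 99%, probability $1 - \alpha = 0.99$ and so $\alpha = 0.01$
Now, $\alpha/2 = 0.005$ and $z_{\alpha/2} = z_{0.005} = 2.575$
The Maximum error of the proportion, $E = z_{\alpha/2}\sqrt{\dfrac{\frac{x}{n}\left(1 - \frac{x}{n}\right)}{n}} = 2.575\sqrt{\dfrac{0.5775(1 - 0.5775)}{400}} = 0.0636$
Lower confidence limit, $\dfrac{x}{n} - E = 0.5775 - 0.0636 = 0.5139$
Upper confidence limit, $\dfrac{x}{n} + E = 0.5775 + 0.0636 = 0.6411$
Therefore, confidence interval $= \left(\dfrac{x}{n} - E,\ \dfrac{x}{n} + E\right) = (0.5139,\ 0.6411)$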
(3) If x 36 of n 100 persons interviewed are familiar with the tax incentives for installing certain
energy saving devices construct a 95% confidence interval for the corresponding true proportion
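Here $n = 100$, $x = 36$ and $\dfrac{x}{n} = \dfrac{36}{100} = 0.36$
Also, confidence = 95%, probability $1 - \alpha = 0.95$ and so $\alpha = 0.05$
Now, $\alpha/2 = 0.025$ and $z_{\alpha/2} = z_{0.025} = 1.96$
The Maximum error of the proportion, $E = z_{\alpha/2}\sqrt{\dfrac{\frac{x}{n}\left(1 - \frac{x}{n}\right)}{n}} = 1.96\sqrt{\dfrac{0.36(1 - 0.36)}{100}} = 0.0941$
Lower confidence limit, $\dfrac{x}{n} - E = 0.36 - 0.0941 = 0.2659$
Upper confidence limit, $\dfrac{x}{n} + E = 0.36 + 0.0941 = 0.4541$
Therefore, confidence interval $= \left(\dfrac{x}{n} - E,\ \dfrac{x}{n} + E\right) = (0.2659,\ 0.4541)$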
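The three worked intervals follow the same recipe, which can be sketched as a single function. The name `proportion_interval` is illustrative, and the critical values are the table values quoted in the examples.

```python
import math

def proportion_interval(x, n, z_half_alpha):
    """Large-sample maximum error and confidence interval for a true proportion."""
    p_hat = x / n
    e = z_half_alpha * math.sqrt(p_hat * (1 - p_hat) / n)
    return e, (p_hat - e, p_hat + e)

# (x, n, z_{alpha/2}) for Examples (1), (2) and (3)
cases = [(90, 900, 2.575), (231, 400, 2.575), (36, 100, 1.96)]
results = [proportion_interval(x, n, z) for x, n, z in cases]
for (x, n, z), (e, (low, high)) in zip(cases, results):
    print(x, n, round(e, 4), round(low, 4), round(high, 4))
```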
(4) Find the sample size needed to estimate the true proportion of defective items with at least 95%
confidence and a maximum error of 0.04, if the true proportion does not exceed 0.12.
(5) What is the size of the smallest sample required to estimate an unknown proportion to within a
maximum error of 0.06 with at least 95% confidence?
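Both questions can be answered by solving the maximum-error formula for $n$. The sketch below assumes the standard results $n = p(1-p)\left(\dfrac{z_{\alpha/2}}{E}\right)^2$ when an upper bound $p \le \tfrac{1}{2}$ is known, and $n = \dfrac{1}{4}\left(\dfrac{z_{\alpha/2}}{E}\right)^2$ when nothing is known about $p$. The function name and the `p_bound` parameter are illustrative.

```python
import math

def sample_size_for_proportion(z_half_alpha, max_error, p_bound=None):
    """Sample size for estimating a proportion to within max_error.

    p_bound: known upper bound on p (assumed <= 1/2); if None, the
    worst case p = 1/2 is used.
    """
    p = 0.5 if p_bound is None else p_bound
    return math.ceil(p * (1 - p) * (z_half_alpha / max_error) ** 2)

n4 = sample_size_for_proportion(1.96, 0.04, p_bound=0.12)  # Example (4)
n5 = sample_size_for_proportion(1.96, 0.06)                # Example (5)
print(n4, n5)
```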
Exercise:
(1) In a sample survey conducted in a large city, 136 of 400 persons answered ‘yes’ to the question of
whether their city's public transportation is adequate. With 99% confidence, what can we say about
the maximum error, if $\dfrac{x}{n} = \dfrac{136}{400} = 0.34$ is used as an estimate of the corresponding true proportion?
(2) If the sample size is $n = 400$ and the sample proportion of successes is $P = 0.578$, construct a 98% confidence interval
for the proportion $p$
(3) Among 900 people in a state 90 are found to be blind. Construct 98% confidence interval for the true
proportion
(4) A random sample of 500 apples was taken from, a large consignment and 60 were found to be bad.
Obtain the 98% confidence limits for the percentage number of bad apples in the consignment
(5) What is the size of the smallest sample required to estimate an unknown proportion to within a
maximum error of 0.08 with at least 98% confidence
(6) Find the sample size if the true proportion does not exceed 0.2 to estimate the true Proportion of
defective items with at least 95% confidence with error 0.05
Test of Hypothesis – One Proportion:
$H_1$             Reject $H_0$ if
$p < p_0$         $Z < -z_{\alpha}$
$p > p_0$         $Z > z_{\alpha}$
$p \ne p_0$       $|Z| > z_{\alpha/2}$
Examples:
(1) An airline claims that only 6% of all lost luggage is never found. If, in a random sample, 17 of 200
pieces of lost luggage are not found, test the null hypothesis $p = 0.06$ against the alternative hypothesis
$p > 0.06$ at the 0.05 level of significance
(3) In a big city 325 men out of 600 men were found to be smokers. Does this information support the
conclusion that the majority of men in this city are smokers
(5) A coin is tossed 960 times and head turns up 183 times. Is the coin biased? Use the 0.05 level of
significance
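The statistic for these examples is not shown in the notes as extracted. The sketch below assumes the usual large-sample form $Z = \dfrac{x - np_0}{\sqrt{np_0(1-p_0)}}$ and evaluates it for the three example data sets. Reading "majority" as $H_1 : p > 0.5$, and the function name, are assumptions made here.

```python
import math

def z_one_proportion(x, n, p0):
    """Large-sample statistic Z = (x - n*p0) / sqrt(n*p0*(1 - p0))."""
    return (x - n * p0) / math.sqrt(n * p0 * (1 - p0))

z_luggage = z_one_proportion(17, 200, 0.06)  # Example (1), H1: p > 0.06
z_smokers = z_one_proportion(325, 600, 0.5)  # Example (3), H1: p > 0.5 (assumed reading)
z_coin = z_one_proportion(183, 960, 0.5)     # Example (5), H1: p != 0.5
print(round(z_luggage, 4), round(z_smokers, 4), round(z_coin, 4))
```

Each value is then compared with $z_{\alpha}$ or $z_{\alpha/2}$ according to the form of $H_1$.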
Exercise:
(1) Write the test statistic and criteria for the testing a hypothesis concerning one proportion and two
proportions
(2) A manufacturer of submersible pumps claims that at most 30% of the pumps require repairs within
the first five years of operation. If a random sample of 120 of these pumps includes 47 which
required repairs within the first five years of operation, test the null hypothesis $p = 0.3$ against the
alternative hypothesis $p > 0.3$ at the 0.05 level of significance
(3) To test an ambulance service’s claim that at least 40% of its calls are life-threatening emergencies, a random
sample was taken from its files, and it was found that only 49 of 150 calls were life-threatening
emergencies. Can the null hypothesis $p = 0.4$ be rejected against the alternative hypothesis $p < 0.4$ at
the 0.01 level of significance?
(4) In a random sample of 600 cars making a right turn at a certain intersection, 157 pulled into the
wrong lane. Test the null hypothesis that actually 30% of all drivers make this mistake at the given
intersection, using the alternative hypothesis p 0.3 and the level of significance 0.01
(5) In a random sample of 160 workers exposed to a certain amount of radiation, 24 experienced some ill
effects. Test the null hypothesis p 0.18 versus the alternative hypothesis p 0.18 at the 0.01 level
(6) A coin is tossed 960 times and head turns up 450 times. Is the coin biased? Use the 0.05 level of
significance. Hint: $n = 960$, $x = 450$, $H_0 : p = 0.5$, $H_1 : p \ne 0.5$
(7) A coin is tossed 400 times and head turns up 216 times. Is the coin biased? Use the 0.05 level of
significance. Hint: $n = 400$, $x = 216$, $H_0 : p = 0.5$, $H_1 : p \ne 0.5$
(8) A die is thrown 900 times and it falls with 5 upwards 185 times. Is the die biased? Use the 0.01 level
of significance. Hint: $n = 900$, $x = 185$, $H_0 : p = \dfrac{1}{6}$, $H_1 : p \ne \dfrac{1}{6}$
Test of Hypothesis – Two Proportions:
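$H_1$                   Reject $H_0$ if
$p_1 - p_2 < 0$         $Z < -z_{\alpha}$
$p_1 - p_2 > 0$         $Z > z_{\alpha}$
$p_1 - p_2 \ne 0$       $|Z| > z_{\alpha/2}$
Note: $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$ is known as the proportion by pooling
Upper and Lower confidence limits: $\left(\dfrac{x_1}{n_1} - \dfrac{x_2}{n_2}\right) \pm E$
Confidence Interval: $\left(\left(\dfrac{x_1}{n_1} - \dfrac{x_2}{n_2}\right) - E,\ \left(\dfrac{x_1}{n_1} - \dfrac{x_2}{n_2}\right) + E\right)$
Where, $E = z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n_1} + \dfrac{\hat{p}(1 - \hat{p})}{n_2}}$ and $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$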
Examples:
(1) A manufacturer of electronic equipment subjects samples of two competing brands of transistors to
an accelerated performance test. If 45 of 180 transistors of the first kind and 34 of 120 transistors of
the second kind fail the test, what can he conclude at the level of significance 0.05 about the
difference between the corresponding true proportions?
(2) A study shows that 16 of 200 tractors produced on one assembly line required extensive adjustments
before they could be shipped, while the same was true for 14 of 400 tractors produced on another
assembly line. At the 0.01 level of significance, does this support the claim that the second line does
superior work?
(3) A machine puts out 9 imperfect articles in a sample of 200 articles. After the machine is overhauled it
puts out 5 imperfect articles in a sample of 700 articles. Test at 5% level of significance, whether the
machine is improved.
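The statistic for two proportions is likewise not shown in the notes as extracted. The sketch below assumes the usual pooled form, $Z = \dfrac{x_1/n_1 - x_2/n_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, which uses the pooled proportion $\hat{p}$ from the Note. The function name is illustrative.

```python
import math

def z_two_proportions(x1, n1, x2, n2):
    """Z for H0: p1 - p2 = 0 using the pooled proportion."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

z_transistors = z_two_proportions(45, 180, 34, 120)  # Example (1)
z_tractors = z_two_proportions(16, 200, 14, 400)     # Example (2)
z_machine = z_two_proportions(9, 200, 5, 700)        # Example (3)
print(round(z_transistors, 4), round(z_tractors, 4), round(z_machine, 4))
```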
Exercise:
(1) Random samples of 400 men and 600 women were asked whether they would like to have a flyover
near their residence. 200 men and 325 women were in favor of the proposal. Test the hypothesis that
proportions of men and women in favor of the proposal are same, at 5% level of significance.
Hint: $n_1 = 400$, $x_1 = 200$, $n_2 = 600$, $x_2 = 325$, $H_0 : p_1 - p_2 = 0$, $H_1 : p_1 - p_2 \ne 0$
(2) One method of seeding clouds was successful in 57 of 150 attempts while another method was
successful in 33 of 100 attempts. At the 0.05 level of significance, can we conclude that the first
method is better than second?
Hint: $n_1 = 150$, $x_1 = 57$, $n_2 = 100$, $x_2 = 33$, $H_0 : p_1 - p_2 = 0$, $H_1 : p_1 - p_2 > 0$
(3)
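Let $n_1, n_2, n_3, \ldots, n_k$ be the sizes of $k$ samples taken from $k$ populations respectively.
Let $x_1, x_2, x_3, \ldots, x_k$ be the numbers of successes in these $k$ samples respectively.
Let $n = n_1 + n_2 + n_3 + \cdots + n_k$ and $x = x_1 + x_2 + x_3 + \cdots + x_k$.
Now all these values can be tabulated as follows
             Sample 1      Sample 2      ...   Sample k      Total
Successes    $x_1$         $x_2$         ...   $x_k$         $x$
Failures     $n_1 - x_1$   $n_2 - x_2$   ...   $n_k - x_k$   $n - x$
Total        $n_1$         $n_2$         ...   $n_k$         $n$
In the above table, each entry in the $(i, j)$th cell is called an Observed frequency and it is denoted by $O_{ij}$ for
$i = 1, 2$ and $j = 1, 2, 3, \ldots, k$; that is,
$O_{11} = x_1,\ O_{12} = x_2,\ \ldots,\ O_{1k} = x_k,$
$O_{21} = n_1 - x_1,\ O_{22} = n_2 - x_2,\ \ldots,\ O_{2k} = n_k - x_k$
The Expected frequency of the $(i, j)$th cell is
$e_{ij} = \dfrac{(i\text{th row total}) \times (j\text{th column total})}{n}$ for $i = 1, 2$ and $j = 1, 2, 3, \ldots, k$; that is,
$e_{11} = \dfrac{x\,n_1}{n},\ e_{12} = \dfrac{x\,n_2}{n},\ \ldots,\ e_{1k} = \dfrac{x\,n_k}{n},$
$e_{21} = \dfrac{(n - x)\,n_1}{n},\ e_{22} = \dfrac{(n - x)\,n_2}{n},\ \ldots,\ e_{2k} = \dfrac{(n - x)\,n_k}{n}$
(1) Null Hypothesis $H_0 : p_1 = p_2 = p_3 = \cdots = p_k$
(2) Alternative Hypothesis $H_1$ : Not all $p_1, p_2, p_3, \ldots, p_k$ are equal
(3) Level of Significance : $\alpha$
(4) Test statistic : $\chi^2 = \sum_{i=1}^{2}\sum_{j=1}^{k}\dfrac{(O_{ij} - e_{ij})^2}{e_{ij}}$ with $\nu = k - 1$
(5) Criterion : Reject $H_0$ if $\chi^2 > \chi^2_{\alpha}$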
Examples:
(1) Samples of three kinds of materials, subjected to extreme temperature changes, produced the results
shown in the following table
Use the 0.05 level of significance to test whether, under the stated conditions, the probability of
crumbling is the same for the three kinds of materials
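Here the data is in 2 rows and 3 columns; that is the number of samples, $k = 3$
Therefore, $\nu = k - 1 = 2$
Expected frequency, $e_{ij} = \dfrac{(i\text{th row total}) \times (j\text{th column total})}{n}$ for $i = 1, 2$ and $j = 1, 2, 3$
$e_{11} = \dfrac{90 \times 120}{300} = 36,\ e_{12} = \dfrac{90 \times 80}{300} = 24,\ e_{13} = \dfrac{90 \times 100}{300} = 30,$
$e_{21} = \dfrac{210 \times 120}{300} = 84,\ e_{22} = \dfrac{210 \times 80}{300} = 56,\ e_{23} = \dfrac{210 \times 100}{300} = 70$
Now
\[
\begin{aligned}
\chi^2 &= \sum_{i=1}^{2}\sum_{j=1}^{k}\frac{(O_{ij} - e_{ij})^2}{e_{ij}} \\
&= \frac{(O_{11} - e_{11})^2}{e_{11}} + \frac{(O_{12} - e_{12})^2}{e_{12}} + \frac{(O_{13} - e_{13})^2}{e_{13}}
 + \frac{(O_{21} - e_{21})^2}{e_{21}} + \frac{(O_{22} - e_{22})^2}{e_{22}} + \frac{(O_{23} - e_{23})^2}{e_{23}} \\
&= \frac{(41 - 36)^2}{36} + \frac{(27 - 24)^2}{24} + \frac{(22 - 30)^2}{30}
 + \frac{(79 - 84)^2}{84} + \frac{(53 - 56)^2}{56} + \frac{(78 - 70)^2}{70} \\
&= 0.6944 + 0.375 + 2.1333 + 0.2976 + 0.1607 + 0.9143 \\
&= 4.5753
\end{aligned}
\]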
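The computation above can be checked with a short sketch that builds the expected frequencies from the row and column totals. The function name is illustrative. The critical value 5.991 ($\chi^2_{0.05}$ with 2 degrees of freedom) is a standard table value supplied here as an assumption, since the notes' own criterion step for this example is not shown.

```python
def chi_square_statistic(observed):
    """Chi-square statistic for a table of observed frequencies,
    with e_ij = (row total * column total) / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            chi2 += (o - e) ** 2 / e
    return chi2

# Observed frequencies read from the calculation above
materials = [[41, 27, 22], [79, 53, 78]]
chi2 = chi_square_statistic(materials)
CHI2_CRIT = 5.991  # assumed table value: chi-square_{0.05}, 2 d.f.
print(round(chi2, 4), chi2 > CHI2_CRIT)
```

Summing unrounded terms gives a value that differs from 4.5753 only in the fourth decimal place.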
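(2) Four methods are under development for making discs of a superconducting material. Fifty discs are
made by each method and they are checked for superconductivity when cooled with liquid nitrogen.
Test whether there is any significant difference between the proportions of superconductors
produced, at the 0.05 level of significance.
The observed frequencies are $O_{11} = 31,\; O_{12} = 42,\; O_{13} = 22,\; O_{14} = 25$ (row total 120) and
$O_{21} = 19,\; O_{22} = 8,\; O_{23} = 28,\; O_{24} = 25$ (row total 80); each column total is 50, and $n = 200$.
Here the data are in 2 rows and 4 columns; that is, the number of samples is $k = 4$.
Therefore, $\nu = k - 1 = 3$.
Expected frequency, $e_{ij} = \dfrac{(i\text{th row total}) \times (j\text{th column total})}{n}$ for $i = 1, 2$ and $j = 1, 2, 3, 4$:
$e_{11} = e_{12} = e_{13} = e_{14} = \dfrac{120 \times 50}{200} = 30,$
$e_{21} = e_{22} = e_{23} = e_{24} = \dfrac{80 \times 50}{200} = 20$
Now
$\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{k} \dfrac{(O_{ij} - e_{ij})^2}{e_{ij}}$
$= \dfrac{(O_{11} - e_{11})^2}{e_{11}} + \dfrac{(O_{12} - e_{12})^2}{e_{12}} + \dfrac{(O_{13} - e_{13})^2}{e_{13}} + \dfrac{(O_{14} - e_{14})^2}{e_{14}} + \dfrac{(O_{21} - e_{21})^2}{e_{21}} + \dfrac{(O_{22} - e_{22})^2}{e_{22}} + \dfrac{(O_{23} - e_{23})^2}{e_{23}} + \dfrac{(O_{24} - e_{24})^2}{e_{24}}$
$= \dfrac{(31 - 30)^2}{30} + \dfrac{(42 - 30)^2}{30} + \dfrac{(22 - 30)^2}{30} + \dfrac{(25 - 30)^2}{30} + \dfrac{(19 - 20)^2}{20} + \dfrac{(8 - 20)^2}{20} + \dfrac{(28 - 20)^2}{20} + \dfrac{(25 - 20)^2}{20}$
$= 0.0333 + 4.8 + 2.1333 + 0.8333 + 0.05 + 7.2 + 3.2 + 1.25$
$= 19.4999$
Since $\chi^2 = 19.4999$ exceeds the tabulated value $\chi^2_{0.05} = 7.815$ for $\nu = 3$, $H_0$ is rejected; the proportions of
superconductors are not the same for the four methods.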
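The criterion step for the two examples above can be checked with a small lookup, sketched here in Python. The dictionary `CHI2_CRIT_05` and the function `reject_h0` are names assumed for this illustration; the critical values are copied from a standard chi-square table rather than computed, and the statistics 4.5753 and 19.4999 are the ones obtained in the two examples.

```python
# Upper-tail critical values chi^2_{0.05} for 1 to 4 degrees of freedom,
# copied from a standard chi-square table (not computed here).
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def reject_h0(chi2, dof, table=CHI2_CRIT_05):
    """Criterion: reject H0 if chi2 exceeds the tabulated chi^2_alpha."""
    return chi2 > table[dof]


# Example (1): three materials, nu = 2
print(reject_h0(4.5753, 2))    # False -> H0 is not rejected
# Example (2): four methods, nu = 3
print(reject_h0(19.4999, 3))   # True  -> H0 is rejected
```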
Exercise:
(1) The following data come from a study in which random samples of the employees of three
government agencies were asked questions about their pension plan:
Use the 0.01 level of significance to test the null hypothesis that the actual proportions of employees
favoring the pension plan are the same.
(2) Tests are made on the proportion of defective castings produced by 5 different molds. If there were
14 defectives among 100 castings made with Mold I, 33 defectives among 200 castings made with
Mold II, 21 defectives among 180 castings made with Mold III, 17 defectives among 120 castings
made with Mold IV, and 25 defectives among 150 castings made with Mold V, use the 0.01 level of
significance to test whether the true proportion of defectives is the same for each mold.
(3) The following table gives the classification of 100 workers according to gender and nature of work.
Test whether the nature of work is independent of the gender of the worker at the 0.05 level of
significance.
Examples:
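(1) Suppose that the refractive indices of 20 pieces of glass (randomly selected from a large shipment
purchased by the optical firm) have a variance of $1.20 \times 10^{-4}$. Construct a 95% confidence interval for $\sigma$, the
standard deviation of the population.
Lower confidence limit for $\sigma^2$ is given by $\dfrac{(n - 1)s^2}{\chi^2_{\alpha/2}} = \dfrac{19 \times 1.20 \times 10^{-4}}{32.852} = 0.000069$
Upper confidence limit for $\sigma^2$ is given by $\dfrac{(n - 1)s^2}{\chi^2_{1 - \alpha/2}} = \dfrac{19 \times 1.20 \times 10^{-4}}{8.907} = 0.000256$
Taking square roots of these limits gives the 95% confidence interval for $\sigma$: $0.0083 < \sigma < 0.0160$.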
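The arithmetic of this example can be reproduced with a short Python sketch. The function name `variance_interval` is an assumption made for illustration; the chi-square values 32.852 and 8.907 (19 degrees of freedom) are the table values used in the example and are passed in rather than computed.

```python
import math


def variance_interval(n, s2, chi2_upper, chi2_lower):
    """Confidence limits for sigma^2.

    chi2_upper -> tabulated chi^2_{alpha/2}
    chi2_lower -> tabulated chi^2_{1 - alpha/2}
    (Hypothetical helper; table values must be supplied by the caller.)
    """
    lower = (n - 1) * s2 / chi2_upper
    upper = (n - 1) * s2 / chi2_lower
    return lower, upper


var_low, var_high = variance_interval(20, 1.20e-4, 32.852, 8.907)
print(var_low, var_high)                        # limits for sigma^2
print(math.sqrt(var_low), math.sqrt(var_high))  # limits for sigma
```

The square roots of the two limits give the interval for the standard deviation, which is what the example asks for.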
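$H_1$ : Reject $H_0$ if
$\sigma^2 < \sigma_0^2$ : $\chi^2 < \chi^2_{1 - \alpha}$
$\sigma^2 > \sigma_0^2$ : $\chi^2 > \chi^2_{\alpha}$
$\sigma^2 \ne \sigma_0^2$ : $\chi^2 < \chi^2_{1 - \alpha/2}$ or $\chi^2 > \chi^2_{\alpha/2}$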
If $\sigma_1^2 \ne \sigma_2^2$ then $F = \dfrac{S_1^2}{S_2^2}$; reject $H_0$ if $F > F_{\alpha/2}(n_1 - 1,\; n_2 - 1)$
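As a rough illustration of the one-variance criteria listed above, the sketch below computes the statistic $\chi^2 = (n - 1)s^2 / \sigma_0^2$ and applies the two-sided rule. The function names are assumptions made for this sketch, and the hypothesized value $\sigma_0^2 = 1.00 \times 10^{-4}$ is invented purely for the demonstration; the critical values 8.907 and 32.852 are the 19-degree-of-freedom table values quoted in the confidence-interval example.

```python
def chi_square_statistic(n, s2, sigma0_sq):
    """Test statistic for H0: sigma^2 = sigma0^2, with n - 1 degrees of freedom."""
    return (n - 1) * s2 / sigma0_sq


def reject_two_sided(chi2, crit_lower, crit_upper):
    """Two-sided criterion: reject H0 if chi2 < chi^2_{1-alpha/2}
    or chi2 > chi^2_{alpha/2} (both taken from a table)."""
    return chi2 < crit_lower or chi2 > crit_upper


# n = 20 and s^2 = 1.20e-4 as in the glass example; sigma0^2 is hypothetical.
chi2_value = chi_square_statistic(20, 1.20e-4, 1.00e-4)
decision = reject_two_sided(chi2_value, 8.907, 32.852)
print(chi2_value, decision)
```

For a one-sided alternative only one of the two comparisons is used, with $\chi^2_{1-\alpha}$ or $\chi^2_{\alpha}$ in place of the $\alpha/2$ values.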
Exercise:
(1) If 12 determinations of the specific heat of iron have a standard deviation of 0.0086, test the null hypothesis
$\sigma = 0.01$ for such determinations. Use the alternative hypothesis $\sigma \ne 0.01$ and the level of significance
$\alpha = 0.01$.
(2) The security department of a large office building wants to test the null hypothesis that $\sigma$ = 2.0 minutes for
the time it takes a guard to walk his round against the alternative hypothesis that $\sigma \ne$ 2.0 minutes. What can it
conclude at the 0.01 level of significance if a random sample of size n = 30 yields s = 1.8 minutes?
(3) A random sample of 6 steel beams has a mean compressive strength of 58,392 psi with standard
deviation 648 psi. Using this information and the 0.05 level of significance, test the null hypothesis
$\sigma = 600$ psi against the alternative hypothesis $\sigma > 600$ psi.
(4) It is desired to determine whether there is less variability in the silver plating done by company 1 than in that
done by company 2. If independent random samples of size 12 of the two companies' work yield $s_1$ = 0.035 mil
and $s_2$ = 0.062 mil, test the null hypothesis $\sigma_1^2 = \sigma_2^2$ against the alternative hypothesis $\sigma_1^2 < \sigma_2^2$ at the 0.05
level of significance.
(5) The following random samples are measurements of the heat-producing capacity (in millions of
calories per ton) of specimens of coal from two mines
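4 Two Means (Small Samples)
  $E = t_{\alpha/2} \sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}$
  Confidence interval: $\left(\bar{x}_1 - \bar{x}_2 - E,\; \bar{x}_1 - \bar{x}_2 + E\right)$
  where $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
  or $s_p^2 = \dfrac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}$
5 One Proportion
  $E = z_{\alpha/2} \sqrt{\dfrac{\frac{x}{n}\left(1 - \frac{x}{n}\right)}{n}}$
  Confidence interval: $\left(\dfrac{x}{n} - E,\; \dfrac{x}{n} + E\right)$
6 Two Proportions
  $E = z_{\alpha/2} \sqrt{\dfrac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}}$
  Confidence interval: $\left(\dfrac{x_1}{n_1} - \dfrac{x_2}{n_2} - E,\; \dfrac{x_1}{n_1} - \dfrac{x_2}{n_2} + E\right)$
7 One Variance
  Confidence interval: $\left(\dfrac{(n - 1)s^2}{\chi^2_{\alpha/2}},\; \dfrac{(n - 1)s^2}{\chi^2_{1 - \alpha/2}}\right)$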
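The proportion intervals in rows 5 and 6 can be sketched as follows. The function names and the sample counts are hypothetical, chosen only to exercise the formulas, and $z_{\alpha/2} = 1.96$ is the standard normal table value for a 95% interval.

```python
import math

Z_95 = 1.96  # z_{alpha/2} for a 95% confidence level, from the normal table


def one_proportion_interval(x, n, z=Z_95):
    """Large-sample interval (x/n - E, x/n + E) for one proportion."""
    p_hat = x / n
    e = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - e, p_hat + e


def two_proportion_interval(x1, n1, x2, n2, z=Z_95):
    """Large-sample interval for p1 - p2, using p_hat_i = x_i / n_i."""
    p1, p2 = x1 / n1, x2 / n2
    e = z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1 - p2 - e, p1 - p2 + e


# Hypothetical counts: 36 successes in 100 trials, 48 successes in 200 trials.
print(one_proportion_interval(36, 100))
print(two_proportion_interval(36, 100, 48, 200))
```

Both functions return the pair (lower limit, upper limit), so the maximum error $E$ is half the width of the returned interval.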
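5 One Proportion
  Test statistic: $Z = \dfrac{x - np}{\sqrt{np(1 - p)}}$ or $Z = \dfrac{\frac{x}{n} - p}{\sqrt{\dfrac{p(1 - p)}{n}}}$, where $p = p_0$ is the value specified by $H_0$
  $H_1: p < p_0$ : reject $H_0$ if $Z < -z_{\alpha}$
  $H_1: p > p_0$ : reject $H_0$ if $Z > z_{\alpha}$
  $H_1: p \ne p_0$ : reject $H_0$ if $|Z| > z_{\alpha/2}$
6 Two Proportions
  Test statistic: $Z = \dfrac{\dfrac{x_1}{n_1} - \dfrac{x_2}{n_2}}{\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n_1} + \dfrac{\hat{p}(1 - \hat{p})}{n_2}}}$ where $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$
  $H_1: p_1 < p_2$ : reject $H_0$ if $Z < -z_{\alpha}$
  $H_1: p_1 > p_2$ : reject $H_0$ if $Z > z_{\alpha}$
  $H_1: p_1 \ne p_2$ : reject $H_0$ if $|Z| > z_{\alpha/2}$
7 Several Proportions
  Test statistic: $\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{k} \dfrac{(O_{ij} - e_{ij})^2}{e_{ij}}$ with $\nu = k - 1$
  $H_0: p_1, p_2, p_3, \ldots, p_k$ are equal
  $H_1: p_1, p_2, p_3, \ldots, p_k$ are not all equal
  Reject $H_0$ if $\chi^2 > \chi^2_{\alpha}$
8 One Variance
  Test statistic: $\chi^2 = \dfrac{(n - 1)S^2}{\sigma_0^2}$ with $\nu = n - 1$
  $H_1: \sigma^2 < \sigma_0^2$ : reject $H_0$ if $\chi^2 < \chi^2_{1 - \alpha}$
  $H_1: \sigma^2 > \sigma_0^2$ : reject $H_0$ if $\chi^2 > \chi^2_{\alpha}$
  $H_1: \sigma^2 \ne \sigma_0^2$ : reject $H_0$ if $\chi^2 < \chi^2_{1 - \alpha/2}$ or $\chi^2 > \chi^2_{\alpha/2}$
9 Two Variances
  $H_1: \sigma_1^2 < \sigma_2^2$ : $F = \dfrac{s_2^2}{s_1^2}$ (provided $s_2^2 \ge s_1^2$), with $(\nu_2, \nu_1)$; reject $H_0$ if $F > F_{\alpha}(\nu_2, \nu_1)$
  $H_1: \sigma_1^2 > \sigma_2^2$ : $F = \dfrac{s_1^2}{s_2^2}$ (provided $s_1^2 \ge s_2^2$), with $(\nu_1, \nu_2)$; reject $H_0$ if $F > F_{\alpha}(\nu_1, \nu_2)$
  $H_1: \sigma_1^2 \ne \sigma_2^2$ : $F = \dfrac{s_M^2}{s_m^2}$ (provided $s_M^2 \ge s_m^2$), with $(\nu_M, \nu_m)$; reject $H_0$ if $F > F_{\alpha/2}(\nu_M, \nu_m)$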
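For the two-sided case of the two-variance test, the larger sample variance goes in the numerator. A minimal Python sketch of that bookkeeping is given below; the function name and the sample figures are hypothetical, and the degrees of freedom are taken as $n - 1$ for each sample, as in the $F(n_1 - 1, n_2 - 1)$ notation used earlier in these notes.

```python
def f_statistic_two_sided(s1, n1, s2, n2):
    """Return (F, (nu_M, nu_m)) with the larger sample variance on top.

    s1, s2 -> sample standard deviations
    n1, n2 -> sample sizes (degrees of freedom are n - 1)
    """
    var1, var2 = s1 ** 2, s2 ** 2
    if var1 >= var2:
        return var1 / var2, (n1 - 1, n2 - 1)
    return var2 / var1, (n2 - 1, n1 - 1)


# Hypothetical samples: s1 = 2.0 from n1 = 10, s2 = 3.0 from n2 = 16.
f_value, dof = f_statistic_two_sided(2.0, 10, 3.0, 16)
print(f_value, dof)  # 2.25 (15, 9)
```

The value of `f_value` is then compared with the tabulated $F_{\alpha/2}(\nu_M, \nu_m)$ for the degrees of freedom returned in `dof`.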