STATISTICAL METHODS Units-345notes BTech S.srinivasa Rao

B.Tech IV Semester (2020 Batch)

STATISTICAL METHODS (20BM1109)
(For CSE (DS))
Units – 3, 4 & 5
Unit – 3: Sampling Distribution: population and sample, sampling distribution of the mean (σ known),
sampling distribution of the mean (σ unknown), sampling distribution of the variance: Chi-square and F-
distributions.
Unit – 4: Estimation and Test of Hypothesis of Means: Point estimation, interval estimation, test of hypothesis,
hypothesis concerning one mean, hypothesis concerning two means, matched pair comparisons.
Unit – 5: Estimation and Test of Hypothesis of Variances and Proportions: Estimation of variance,
hypothesis concerning one variance, hypothesis concerning two variances, estimation of proportion,
hypothesis concerning one proportion, hypothesis concerning several proportions.
Unit – 3: Statistics: Sampling Distribution

1. Sampling Distribution of the Mean (σ known)
2. Sampling Distribution of the Mean (σ unknown , t-distribution)
3. Sampling Distribution of the Variance (Chi-square and F-distributions)
Unit – 4: Statistics: Estimation and Test of Hypothesis of Means

1. Estimation: Mean
2. Test of hypothesis
3. Test of hypothesis: One Mean
4. Test of hypothesis: Two Means
Unit – 5: Statistics: Estimation and Test of Hypothesis of Variances and Proportions

1. Estimation: Proportions
2. Test of hypothesis: One Proportion
3. Test of hypothesis: Several Proportions
4. Estimation: Variance
5. Test of hypothesis: One Variance
6. Test of hypothesis: Two Variances
Sampling Distribution: population and sample, sampling distribution of the mean (σ known), sampling
distribution of the mean (σ unknown), sampling distribution of the variance: Chi-square and F-
distributions.
Population: A collection of objects (collection of numbers, measurements, observations etc) is called a
Population
The number of objects in a population is called its Size and is denoted by N

If N is finite then the population is called Finite population
If N is infinite then the population is called Infinite population
Examples:
(1) Heights of the students in a University
(2) Marks obtained by the students of SSC in Mathematics
(3) Scores of candidates obtained in a competitive exam
(4) The set of outcomes when a coin is tossed 1000 times
(5) Collection of all even numbers
Parameters: The statistical measures (Mean, Median, Variance etc.) about a population are called
Parameters.
The Mean, Variance and Standard deviation of a population are respectively denoted by the symbols
$\mu$, $\sigma^2$ and $\sigma$
Sample: A finite sub collection from a population is called a Sample

The number of objects in a sample is called its Size and is denoted by n
If $n \ge 30$ then the sample is called a large sample
If $n < 30$ then the sample is called a small sample
Statistics: The statistical measures (Mean, Median, Variance etc.) about a sample are called statistics.
The Mean, Variance and Standard deviation of a sample are respectively denoted by the symbols
$\bar{x}$, $s^2$ and $s$
Examples:
(1) For the population of ‘the Heights of the students in a University’, the heights of the students in class
of 40 is a sample
(2) For the population of ‘the Marks obtained by the students of SSC in Mathematics’, the marks of the
students of a particular school is a sample
(3) For the population of ‘the Scores of candidates obtained in a competitive exam’, the scores of the
candidates from a particular college is a sample
(4) For the population of ‘the set of outcomes when a coin tossed 1000 times’, a collection of 10 outcomes
is a sample
(5) For the population of ‘the Collection of all even numbers’, a collection of 20 even numbers is a sample
Random Sample: A sample of size n taken from a population is called a random sample if every possible
choice of n objects from the population has the same probability of being selected.
Sampling: Collecting samples from a given population is called sampling
Large Sampling: Collecting large samples from a given population is called large sampling
Small Sampling: Collecting small samples from a given population is called small sampling
Sampling with replacement: Collecting samples in which the objects may repeat, from a given population
is called sampling with replacement
Sampling without replacement: Collecting samples in which the objects do not repeat, from a given
population is called sampling without replacement
Number of Samples:
(i) The number of samples of size n without replacement, taken from a finite population of size N, is ${}^{N}C_{n}$
(ii) The number of samples of size n with replacement, taken from a finite population of size N, is $N^n$
(iii) The number of samples of size n with or without replacement, taken from an infinite population, is $\infty$
Examples:
(1) Consider the finite population $\{1, 2, 3, 4\}$ of size $N = 4$

The number of samples of size $n = 2$ without replacement is given by ${}^{N}C_{n} = {}^{4}C_{2} = 6$
The samples are $\{1, 2\}, \{1, 3\}, \{1, 4\}, \{2, 3\}, \{2, 4\}, \{3, 4\}$

The means of these 6 samples are respectively 1.5, 2, 2.5, 2.5, 3, 3.5
The frequency distribution of these sample means is given as follows
Sample mean $\bar{x}$ :  1.5   2   2.5   3   3.5
Frequency $f$ :          1     1   2     1   1
This frequency distribution is called the sampling distribution of the mean (SDM)
Note:
Here drawing a sample of size $n = 2$ without replacement from the above population is a random
experiment with the possible outcomes $\{1, 2\}, \{1, 3\}, \{1, 4\}, \{2, 3\}, \{2, 4\}, \{3, 4\}$
That is, the sample space is $S = \{\{1, 2\}, \{1, 3\}, \{1, 4\}, \{2, 3\}, \{2, 4\}, \{3, 4\}\}$
If $\bar{X}$ is the random variable which gives the mean of the sample, then the range of $\bar{X}$ is
$\{1.5, 2, 2.5, 3, 3.5\}$
The number of samples of size $n = 2$ with replacement is given by $N^n = 4^2 = 16$
The samples are
(1, 1), (1, 2), (1, 3), (1, 4),
(2, 1), (2, 2), (2, 3), (2, 4),
(3, 1), (3, 2), (3, 3), (3, 4),
(4, 1), (4, 2), (4, 3), (4, 4)
The means of these 16 samples are respectively given as follows
1, 1.5, 2, 2.5,
1.5, 2, 2.5, 3,
2, 2.5, 3, 3.5,
2.5, 3, 3.5, 4
Sample mean $\bar{x}$ :  1   1.5   2   2.5   3   3.5   4
Frequency $f$ :          1   2     3   4     3   2     1
This frequency distribution is the sampling distribution of the mean (SDM)
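The two frequency tables above can be reproduced by brute-force enumeration. The following is a minimal sketch, assuming Python with only the standard library; the function name `sdm` is illustrative and not part of the notes.

```python
from collections import Counter
from itertools import combinations, product


def sdm(population, n, replacement):
    """Frequency table {sample mean: frequency} over all samples of size n."""
    if replacement:
        samples = product(population, repeat=n)   # ordered, repeats allowed: N**n samples
    else:
        samples = combinations(population, n)     # unordered, no repeats: C(N, n) samples
    counts = Counter(sum(s) / n for s in samples)
    return dict(sorted(counts.items()))


population = [1, 2, 3, 4]
sdm_without = sdm(population, 2, replacement=False)
sdm_with = sdm(population, 2, replacement=True)
print(sdm_without)
print(sdm_with)
```

The first dictionary corresponds to the 6 samples drawn without replacement, the second to the 16 samples drawn with replacement.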
Sampling Distribution of the Mean (SDM):

The frequency distribution of the means of all random samples of fixed size, taken from a population, is
called the Sampling Distribution of the Mean (SDM). It is denoted by $\bar{X}$.
Sampling Distribution of the Mean with known $\sigma$ (SDM with known $\sigma$):

Theorem: If $\bar{X}$ is the random variable which gives the mean of a random sample of size n, taken from a
population having mean $\mu$ and variance $\sigma^2$, then
(i) Mean of the SDM $\bar{X}$ is given by $\mu_{\bar{X}} = \mu$ or $E(\bar{X}) = \mu$
(ii) For an infinite population or sampling with replacement,
Variance of the SDM $\bar{X}$ is given by $\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$ or $V(\bar{X}) = \frac{\sigma^2}{n}$
(iii) For a finite population of size N and sampling without replacement,
Variance of the SDM $\bar{X}$ is given by $\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}\left(\frac{N - n}{N - 1}\right)$
Note:
(1) The value of $\frac{N - n}{N - 1}$ is called the finite population correction factor
(2) The standard deviation of $\bar{X}$ is called the Standard error of the mean; that is,
Standard error (SE), $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
(3) The value of $0.6745\,\frac{\sigma}{\sqrt{n}}$ is called the Probable error of the mean; that is,
Probable error (PE) $= 0.6745\,\frac{\sigma}{\sqrt{n}}$
Chebyshev’s Theorem:
If $\bar{X}$ is the random variable which gives the mean of a random sample of size n, taken from a population
having mean $\mu$ and variance $\sigma^2$, then $P(|\bar{X} - \mu| < k) \ge 1 - \frac{\sigma^2}{nk^2}$, where k is a positive constant.
Note:
1. $P(|\bar{X} - \mu| < k) \ge 1 - \frac{\sigma^2}{nk^2}$ or $P(\mu - k < \bar{X} < \mu + k) \ge 1 - \frac{\sigma^2}{nk^2}$
2. $P(|\bar{X} - \mu| \ge k) \le \frac{\sigma^2}{nk^2}$
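Central Limit Theorem:

If $\bar{X}$ is the random variable which gives the mean of a random sample of size n, taken from a population
having mean $\mu$ and variance $\sigma^2$, then $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is a random variable whose distribution
approaches the standard normal distribution as $n \to \infty$.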
Examples:
(1) Find the finite population correction factor for $n = 10$ and $N = 1000$
Finite population correction factor, $\frac{N - n}{N - 1} = \frac{1000 - 10}{1000 - 1} = 0.991$
(2) Find the finite population correction factor for $n = 5$ and $N = 100$
Finite population correction factor, $\frac{N - n}{N - 1} = \frac{100 - 5}{100 - 1} = 0.9596$
(3) What is the effect on standard error, if a sample is taken from an infinite population and its size is
increased from 400 to 900?
Here $n_1 = 400$ and $n_2 = 900$
Initially, the Standard Error $SE_1 = \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n_1}} = \frac{\sigma}{\sqrt{400}}$
After the sample size is increased, the Standard Error $SE_2 = \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n_2}} = \frac{\sigma}{\sqrt{900}}$
Now consider $\frac{SE_2}{SE_1} = \sqrt{\frac{400}{900}} = \frac{2}{3}$
Therefore, the Standard Error is decreased to $\frac{2}{3}$ times its original value
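(4) What is the effect on standard error, if a sample is taken from an infinite population and its size is
decreased from 100 to 25?
Here $n_1 = 100$ and $n_2 = 25$
Initially, the Standard Error $SE_1 = \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n_1}} = \frac{\sigma}{\sqrt{100}}$
After the sample size is decreased, the Standard Error $SE_2 = \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n_2}} = \frac{\sigma}{\sqrt{25}}$
Now consider $\frac{SE_2}{SE_1} = \sqrt{\frac{100}{25}} = 2$
Therefore, the Standard Error is increased to 2 times its original value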
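The correction factor and the standard-error ratios in these examples are simple enough to check numerically. A small sketch, assuming Python; the helper names `fpc` and `standard_error` are illustrative only, and $\sigma$ is set to 1 because it cancels in the ratio.

```python
import math


def fpc(N, n):
    """Finite population correction factor (N - n) / (N - 1)."""
    return (N - n) / (N - 1)


def standard_error(sigma, n):
    """Standard error of the mean for an infinite population."""
    return sigma / math.sqrt(n)


fpc_1 = fpc(1000, 10)
fpc_2 = fpc(100, 5)
# sigma cancels in the ratio SE2 / SE1, so any positive value works here
ratio_up = standard_error(1.0, 900) / standard_error(1.0, 400)
ratio_down = standard_error(1.0, 25) / standard_error(1.0, 100)
print(round(fpc_1, 3), round(fpc_2, 4), round(ratio_up, 4), round(ratio_down, 4))
```

Because the standard error depends on $1/\sqrt{n}$, changing the sample size by a factor c changes the standard error by a factor $1/\sqrt{c}$.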
(5) *A population consists of the 4 numbers 1, 2, 3, 4

(i) Find the Mean and Variance of the population
(ii) Write all possible samples of size 2 without replacement
(iii) Write the Sampling distribution of the mean (SDM)
(iv) Find Mean and Variance of the Sampling distribution of the mean (SDM)
(v) Verify the results in (iv) with suitable formula
(i) Mean and Variance of the population:

Mean, $\mu = \frac{1 + 2 + 3 + 4}{4} = \frac{10}{4} = 2.5$
Variance, $\sigma^2 = \frac{1}{N}\sum x^2 - \mu^2 = \frac{1 + 4 + 9 + 16}{4} - (2.5)^2 = \frac{30}{4} - 6.25 = 7.5 - 6.25 = 1.25$
(ii) Samples of size 2 without replacement:

The number of samples is ${}^{N}C_{n} = {}^{4}C_{2} = 6$
The samples are $\{1, 2\}, \{1, 3\}, \{1, 4\}, \{2, 3\}, \{2, 4\}, \{3, 4\}$

(iii) The Sampling distribution of the mean (SDM):
The means of the above 6 samples are respectively 1.5, 2, 2.5, 2.5, 3, 3.5
Sample mean $\bar{x}$ :  1.5   2   2.5   3   3.5
Frequency $f$ :          1     1   2     1   1
which is the sampling distribution of the mean (SDM)

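(iv) Mean and Variance of the Sampling distribution of the mean (SDM)
Mean, $\mu_{\bar{X}} = \frac{1}{6}\sum \bar{x} = \frac{1.5 + 2 + 2.5 + 2.5 + 3 + 3.5}{6} = \frac{15}{6} = 2.5$
Variance, $\sigma_{\bar{X}}^2 = \frac{1}{6}\sum \bar{x}^2 - \mu_{\bar{X}}^2 = \frac{2.25 + 4 + 6.25 + 6.25 + 9 + 12.25}{6} - (2.5)^2 = \frac{40}{6} - 6.25 = 6.6667 - 6.25 = 0.4167$
(v) Verification:
We have $\mu_{\bar{X}} = 2.5 = \mu$
and $\frac{\sigma^2}{n}\left(\frac{N - n}{N - 1}\right) = \frac{1.25}{2}\left(\frac{4 - 2}{4 - 1}\right) = 0.4167 = \sigma_{\bar{X}}^2$
Therefore, $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}\left(\frac{N - n}{N - 1}\right)$ are verified.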
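(6) *A population consists of the 4 numbers 1, 2, 3, 4

(i) Find the Mean and Variance of the population
(ii) Write all possible samples of size 2 with replacement
(iii) Write the Sampling distribution of the mean (SDM)
(iv) Find Mean and Variance of the Sampling distribution of the mean (SDM)
(v) Verify the results in (iv) with suitable formula
(i) Mean and Variance of the population:

Mean, $\mu = \frac{1 + 2 + 3 + 4}{4} = \frac{10}{4} = 2.5$
Variance, $\sigma^2 = \frac{1}{N}\sum x^2 - \mu^2 = \frac{1 + 4 + 9 + 16}{4} - (2.5)^2 = \frac{30}{4} - 6.25 = 7.5 - 6.25 = 1.25$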
(ii) Samples of size 2 with replacement:

The number of samples of size $n = 2$ with replacement is given by $N^n = 4^2 = 16$
The samples are
(1, 1), (1, 2), (1, 3), (1, 4),
(2, 1), (2, 2), (2, 3), (2, 4),
(3, 1), (3, 2), (3, 3), (3, 4),
(4, 1), (4, 2), (4, 3), (4, 4)
(iii) The Sampling distribution of the mean (SDM):
The means of the above 16 samples are respectively
1, 1.5, 2, 2.5,
1.5, 2, 2.5, 3,
2, 2.5, 3, 3.5,
2.5, 3, 3.5, 4
Sample mean $\bar{x}$ :  1   1.5   2   2.5   3   3.5   4
Frequency $f$ :          1   2     3   4     3   2     1
which is the sampling distribution of the mean (SDM)
(iv) Mean and Variance of the Sampling distribution of the mean (SDM)
Mean, $\mu_{\bar{X}} = \frac{\sum \bar{x} f}{\sum f} = \frac{1(1) + 1.5(2) + 2(3) + 2.5(4) + 3(3) + 3.5(2) + 4(1)}{16} = \frac{40}{16} = 2.5$
Variance, $\sigma_{\bar{X}}^2 = \frac{\sum \bar{x}^2 f}{\sum f} - \mu_{\bar{X}}^2 = \frac{1^2(1) + 1.5^2(2) + 2^2(3) + 2.5^2(4) + 3^2(3) + 3.5^2(2) + 4^2(1)}{16} - (2.5)^2$
$= \frac{110}{16} - 6.25 = 6.875 - 6.25 = 0.625$
(v) Verification:
We have $\mu_{\bar{X}} = 2.5 = \mu$ and $\frac{\sigma^2}{n} = \frac{1.25}{2} = 0.625 = \sigma_{\bar{X}}^2$
Therefore, $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$ are verified.
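The verification in examples (5) and (6) can also be carried out by enumerating every sample. The sketch below assumes Python and uses the population-style variance (divisor equal to the number of values), as the worked solutions do; the helper name `mean_var` is illustrative.

```python
from itertools import combinations, product


def mean_var(values):
    """Mean and variance (divisor = number of values) of a list of numbers."""
    values = list(values)
    m = sum(values) / len(values)
    return m, sum(v * v for v in values) / len(values) - m * m


population = [1, 2, 3, 4]
N, n = len(population), 2
mu, sigma2 = mean_var(population)

# All sample means, without and with replacement
means_wor = [sum(s) / n for s in combinations(population, n)]
means_wr = [sum(s) / n for s in product(population, repeat=n)]

m_wor, v_wor = mean_var(means_wor)
m_wr, v_wr = mean_var(means_wr)

print(m_wor, round(v_wor, 4), round(sigma2 / n * (N - n) / (N - 1), 4))
print(m_wr, round(v_wr, 4), round(sigma2 / n, 4))
```

In both cases the mean of the SDM equals the population mean, and the enumerated variance agrees with the corresponding formula from the theorem.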
(7) A random sample of size 100 is taken from an infinite population having the mean 76 and the variance
256. What is the probability that the sample mean will be between 75 and 78?
Here $n = 100$, $\mu = 76$, $\sigma^2 = 256$ and $\sigma = 16$
By the central limit theorem, $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
$P(75 < \bar{X} < 78) = P\left(\frac{75 - \mu}{\sigma/\sqrt{n}} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{78 - \mu}{\sigma/\sqrt{n}}\right) = P\left(\frac{75 - 76}{16/\sqrt{100}} < Z < \frac{78 - 76}{16/\sqrt{100}}\right)$
$= P(-0.63 < Z < 1.25)$
$= F(1.25) - F(-0.63)$
$= F(1.25) - [1 - F(0.63)]$
$= 0.8944 - (1 - 0.7357) = 0.6301$
(8) A normal population has mean 0.1 and standard deviation 2.1. Find the probability that the mean of a
sample of size 900 will be negative
Here $n = 900$, $\mu = 0.1$ and $\sigma = 2.1$
By the central limit theorem, $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
$P(\bar{X} < 0) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{0 - \mu}{\sigma/\sqrt{n}}\right) = P\left(Z < \frac{0 - 0.1}{2.1/\sqrt{900}}\right)$
$= P(Z < -1.43)$
$= F(-1.43)$
$= 1 - F(1.43) = 1 - 0.9236 = 0.0764$
(9) A random sample of size 64 is taken from a normal population with mean 51.4 and standard deviation
6.8. Find the probability that the mean of the sample will (i) exceed 52.9 (ii) fall between 50.5 and
52.3 (iii) be less than 50.6
Here $n = 64$, $\mu = 51.4$ and $\sigma = 6.8$
By the central limit theorem, $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
(i) Exceed 52.9:
$P(\bar{X} > 52.9) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > \frac{52.9 - \mu}{\sigma/\sqrt{n}}\right) = P\left(Z > \frac{52.9 - 51.4}{6.8/\sqrt{64}}\right)$
$= P(Z > 1.76) = 1 - P(Z \le 1.76)$
$= 1 - F(1.76) = 1 - 0.9608 = 0.0392$
(ii) Fall between 50.5 and 52.3:
$P(50.5 < \bar{X} < 52.3) = P\left(\frac{50.5 - \mu}{\sigma/\sqrt{n}} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{52.3 - \mu}{\sigma/\sqrt{n}}\right) = P\left(\frac{50.5 - 51.4}{6.8/\sqrt{64}} < Z < \frac{52.3 - 51.4}{6.8/\sqrt{64}}\right)$
$= P(-1.06 < Z < 1.06)$
$= F(1.06) - F(-1.06)$
$= F(1.06) - [1 - F(1.06)]$
$= 2F(1.06) - 1 = 2(0.8554) - 1 = 0.7108$
(iii) Less than 50.6:
$P(\bar{X} < 50.6) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{50.6 - \mu}{\sigma/\sqrt{n}}\right) = P\left(Z < \frac{50.6 - 51.4}{6.8/\sqrt{64}}\right)$
$= P(Z < -0.94)$
$= F(-0.94)$
$= 1 - F(0.94) = 1 - 0.8264 = 0.1736$
(10) If a 1-gallon can of paint covers on average 513 square feet with a standard deviation of 31.5
square feet, what is the probability that the mean area covered by a sample of 40 of these 1-gallon
cans will be anywhere from 510 to 520 square feet?
Here $n = 40$, $\mu = 513$ and $\sigma = 31.5$
By the central limit theorem, $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
$P(510 < \bar{X} < 520) = P\left(\frac{510 - \mu}{\sigma/\sqrt{n}} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{520 - \mu}{\sigma/\sqrt{n}}\right) = P\left(\frac{510 - 513}{31.5/\sqrt{40}} < Z < \frac{520 - 513}{31.5/\sqrt{40}}\right)$
$= P(-0.6 < Z < 1.4)$
$= F(1.4) - F(-0.6)$
$= F(1.4) - [1 - F(0.6)]$
$= 0.9192 - (1 - 0.7258) = 0.645$
(11) For a large sample of size n, verify that there is a 50-50 chance that the mean of a random sample
from an infinite population with standard deviation $\sigma$ differs from $\mu$ by less than $0.6745\,\frac{\sigma}{\sqrt{n}}$
We have to prove that $P\left(|\bar{X} - \mu| < 0.6745\,\frac{\sigma}{\sqrt{n}}\right) = 0.5$
Consider $P\left(|\bar{X} - \mu| < 0.6745\,\frac{\sigma}{\sqrt{n}}\right) = P\left(\left|\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right| < 0.6745\right)$
$= P(|Z| < 0.6745)$
$= P(-0.6745 < Z < 0.6745)$
$= F(0.6745) - F(-0.6745)$
$\approx F(0.67) - [1 - F(0.67)]$
$= 0.7486 - (1 - 0.7486)$
$= 0.4972 \approx 0.5$
(12) The mean breaking strength of copper wire is 575 lbs, with a standard deviation of 8.3 lbs. How
large a sample must be used in order that there will be one chance in 100 that the mean breaking
strength of the sample is less than 572 lbs?
Here $\mu = 575$ and $\sigma = 8.3$. By the central limit theorem, $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
We have to find the value of n such that $P(\bar{X} < 572) = \frac{1}{100}$
Now $P(\bar{X} < 572) = 0.01$
$\Rightarrow P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{572 - \mu}{\sigma/\sqrt{n}}\right) = 0.01$
$\Rightarrow P\left(Z < \frac{572 - 575}{8.3/\sqrt{n}}\right) = 0.01$
$\Rightarrow P\left(Z < -\frac{3\sqrt{n}}{8.3}\right) = 0.01$
$\Rightarrow F\left(-\frac{3\sqrt{n}}{8.3}\right) = 0.01$
$\Rightarrow 1 - F\left(\frac{3\sqrt{n}}{8.3}\right) = 0.01$
$\Rightarrow F\left(\frac{3\sqrt{n}}{8.3}\right) = 0.99$
$\Rightarrow \frac{3\sqrt{n}}{8.3} = 2.33$
$\Rightarrow n = \left(\frac{2.33 \times 8.3}{3}\right)^2 = 41.55 \approx 42$
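The table look-ups in examples (7) to (12) can be cross-checked numerically. This is a rough sketch, assuming Python; it evaluates the standard normal CDF $F(z)$ through `math.erf` instead of a printed table, so results differ slightly from the worked answers, which round z to two decimals. The names `phi` and `prob_mean_between` are illustrative.

```python
import math


def phi(z):
    """Standard normal CDF F(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_mean_between(lo, hi, mu, sigma, n):
    """P(lo < sample mean < hi) using Z = (Xbar - mu) / (sigma / sqrt(n))."""
    se = sigma / math.sqrt(n)
    return phi((hi - mu) / se) - phi((lo - mu) / se)


p7 = prob_mean_between(75, 78, mu=76, sigma=16, n=100)    # example (7)
p8 = phi((0 - 0.1) / (2.1 / math.sqrt(900)))              # example (8)
p10 = prob_mean_between(510, 520, mu=513, sigma=31.5, n=40)  # example (10)
# Example (12): smallest n with 3 * sqrt(n) / 8.3 >= 2.33 (2.33 is the table value used above)
n12 = math.ceil((2.33 * 8.3 / 3) ** 2)
print(round(p7, 4), round(p8, 4), round(p10, 4), n12)
```

Small differences in the third decimal place against the worked answers come only from the rounding of z before the table look-up.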
Exercise:
(1) Define (i) Population (ii) Sample (iii) Large sample (iv) Small sample (v) Random sample
(2) Define (i) Sampling (ii) Sampling Distribution of the Mean (iii) finite population correction factor
(iv) Standard error of the mean (v) Probable error of the mean
(3) State (i) Chebyshev’s Theorem (ii) Central Limit Theorem
(4) Write all possible samples of size 3 without replacement from the population $\{1, 2, 3, 4\}$.
Also compute the means of all these samples
(5) Write all possible samples of size 3 without replacement from the population $\{1, 2, 3, 4\}$.
Also compute the means of all these samples
decreased from 800 to 200?
increased from 300 to 2700?
(10) A population consists of the 4 numbers 3, 7, 11, 15
(ii) Write all possible samples of size 2 with replacement
(11) A population consists of the 4 numbers 3, 7, 11, 15
(ii) Write all possible samples of size 2 without replacement
(12) If $\bar{X}$ is the mean of a random sample of size n, taken from the population $\{1, 2, 3, \ldots, N\}$, find the
mean and variance of the Sampling distribution of the mean (SDM)
(13) Construct the sampling distribution of means for the population $\{5, 10, 12, 18\}$ by drawing samples of size
2 without replacement. Find (i) Mean of the population (ii) Standard deviation of the population (iii)
Mean of the sampling distribution of means (iv) Standard deviation of the sampling distribution of
means
(14) A random sample of size 100 is taken from a normal population with mean 76 and standard deviation
16. Find the probability that the mean of the sample will (i) exceed 77 (ii) fall between 75 and 78
10. Find the probability that the mean of the sample will (i) exceed 67 (ii) fall between 66 and 68
15. Find the probability that the mean of the sample will be (i) less than 157 (ii) in between 153 and
158 (iii) greater than 160
(17) A sample of size 400 is taken from an infinite population with Standard deviation 16. Find the
Standard error and Probable error of the mean
(18) If the mean of breaking strength of copper wire is 676 lbs, with a standard deviation of 12 lbs. How
large a sample must be used in order that there will be one chance in 100 that the mean breaking
strength of the sample is less than 672 lbs
(19) The mean of certain normal population is equal to the standard error of the mean of samples of size
64. Find the probability that the mean of the sample size 36 will be negative
Sampling Distribution of the Mean with unknown $\sigma$ (SDM with unknown $\sigma$):
If we do not know the value of $\sigma$, then we cannot use the central limit theorem statistic $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$.
In this case we use the sample standard deviation $s$ in place of the population standard deviation $\sigma$, so that we
have a random variable different from $Z$. This new random variable is denoted by $t$; that is, $t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$.
The probability distribution corresponding to this random variable $t$ is called the t-distribution with parameter
$n - 1$. This parameter is known as the degrees of freedom, denoted by $\nu$; that is, $\nu = n - 1$.
Note: In the t-distribution, for the sample $x_1, x_2, x_3, \ldots, x_n$,

(i) Sample mean $\bar{x}$ is given by $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
(ii) Sample variance is given by $s^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
Properties of t-distribution:
(1) The curve given by the t-distribution is called the t-curve

(2) The t-curve is continuous and lies above the X-axis (or t-axis)
(3) The curve is symmetric about the Y-axis
(4) The t-curve is similar to the Z-curve (standard normal curve)
(5) The area between the t-axis and the curve from $-\infty$ to $\infty$ is 1 unit
(6) The mean of t is 0 and the variance of t is greater than 1; that is, $\mu_t = 0$ and $\sigma_t^2 > 1$
(7) As $n \to \infty$, the variance of t tends to 1; that is, $\sigma_t^2 \to 1$. In other words, as $n \to \infty$, $t \to Z$
(8) If n is large ($n \ge 30$), then we can write Z in place of t; that is, $Z = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
(9) $P(t \le -a) = P(t \ge a)$
(10) $P(t \le 0) = P(t \ge 0) = \frac{1}{2}$
t - Notation:
If   0 then t  is a point on t – axis such that P(t  t )   or P(t  t )  
That is, the area between t – axis and the curve from t to  is 
(or the area between t – axis and the curve from   to  t is  )
Note:
(1) t1   , t0   , t 1  0
2
(2) t  t1  0 or  t  t1 or t  t1
Testing a claim using t-distribution: To test a given claim using the t-distribution, we follow the rule given
below.
(i) If $|t| \le t_{0.005}$ then the claim is accepted
(ii) If $|t| > t_{0.005}$ then the claim is rejected
t - Table:
In this table the values of t  are available for different values of  and 
Example:
(i) For   7 or n  8 and   0.025 , t  t0.025  2.365

(ii) For   10 or n  11 and   0.05 , t  t0.05  1.812
(iii) For   15 or n  16 and   0.005 , t  t0.005  2.947
 1

 t2  2
1  
 
Note: The pdf of t -distribution is given by f (t )  
1  
 ,  
2 2
1
For   1 we have f (t )  (1  t 2 ) 1 for    t  

Problems:
(1) Find (i) $t_{0.05}$ when $\nu = 12$ (ii) $-t_{0.01}$ when $\nu = 8$ (iii) $t_{0.995}$ when $\nu = 10$
(i) When $\nu = 12$, $t_{0.05} = 1.782$

(ii) When $\nu = 8$, $-t_{0.01} = -2.896$
(iii) When $\nu = 10$, $t_{0.995} = -t_{1-0.995} = -t_{0.005} = -3.169$ (since $t_\alpha = -t_{1-\alpha}$)
(2) (i) Find $t_{0.975}$ when $\nu = 13$ (ii) Find $-t_{0.99}$ when $\nu = 10$ (iii) Find $t_{0.95}$ when $\nu = 11$
(i) When $\nu = 13$, $t_{0.975} = -t_{1-0.975} = -t_{0.025} = -2.160$ (since $t_\alpha = -t_{1-\alpha}$)

(ii) When $\nu = 10$, $-t_{0.99} = t_{1-0.99} = t_{0.01} = 2.764$ (since $-t_\alpha = t_{1-\alpha}$)
(iii) When $\nu = 11$, $t_{0.95} = -t_{1-0.95} = -t_{0.05} = -1.796$ (since $t_\alpha = -t_{1-\alpha}$)
(3) Find (i) $P(t < 2.365)$ when $\nu = 7$ (ii) $P(t > 1.318)$ when $\nu = 24$
(iii) $P(t > -2.567)$ when $\nu = 17$ (iv) $P(-1.356 < t < 2.179)$ when $\nu = 12$
(i) When $\nu = 7$, $P(t < 2.365) = P(t < t_{0.025}) = 1 - P(t \ge t_{0.025}) = 1 - 0.025 = 0.975$
(ii) When $\nu = 24$, $P(t > 1.318) = P(t > t_{0.1}) = 0.1$
(iii) When $\nu = 17$, $P(t > -2.567) = P(t > -t_{0.01}) = 1 - P(t \le -t_{0.01}) = 1 - P(t \ge t_{0.01}) = 1 - 0.01 = 0.99$
(iv) When $\nu = 12$, $P(-1.356 < t < 2.179) = P(-t_{0.1} < t < t_{0.025}) = 1 - (0.1 + 0.025) = 0.875$
Alternatively, $P(-t_{0.1} < t < t_{0.025}) = P(t > -t_{0.1}) - P(t > t_{0.025}) = P(t > t_{0.9}) - P(t > t_{0.025}) = 0.9 - 0.025 = 0.875$
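The tail areas read from the t-table in problem (3) can be approximated by integrating the pdf numerically. This is only a sketch under stated assumptions: it uses Python's standard library, writes the normalizing constant with gamma functions (equivalent to the Beta-function form, since $B(\frac{1}{2}, \frac{\nu}{2}) = \sqrt{\pi}\,\Gamma(\frac{\nu}{2})/\Gamma(\frac{\nu+1}{2})$), and integrates with Simpson's rule. The function names are illustrative.

```python
import math


def t_pdf(t, nu):
    """Density of the t-distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + t * t / nu) ** (-(nu + 1) / 2)


def t_cdf(x, nu, steps=2000):
    """P(t <= x): 1/2 (by symmetry) plus the Simpson's-rule area from 0 to x."""
    h = x / steps                       # negative h handles x < 0 correctly
    area = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + area * h / 3


p_i = t_cdf(2.365, 7)                          # problem (3)(i), table answer 0.975
p_ii = 1 - t_cdf(1.318, 24)                    # problem (3)(ii), table answer 0.1
p_iii = 1 - t_cdf(-2.567, 17)                  # problem (3)(iii), table answer 0.99
p_iv = t_cdf(2.179, 12) - t_cdf(-1.356, 12)    # problem (3)(iv), table answer 0.875
print(round(p_i, 3), round(p_ii, 3), round(p_iii, 3), round(p_iv, 3))
```

The integration starts at 0 and relies on the symmetry property $P(t \le 0) = \frac{1}{2}$, which avoids integrating over an infinite range.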
(4) A random sample of size 25 from a normal population has mean $\bar{x} = 47.5$ and standard deviation
$s = 8.4$. Does this information tend to support or refute the claim that the mean of the population is
$\mu = 42.1$?
Here $n = 25$ (small sample) and $\nu = n - 1 = 24$

Also $\bar{x} = 47.5$, $s = 8.4$, $\mu = 42.1$ and $t_{0.005} = 2.797$
We know that the given claim is accepted if and only if $|t| \le t_{0.005}$
Now $t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{47.5 - 42.1}{8.4/\sqrt{25}} = 3.2143$ and $|t| = 3.2143$
Since $|t| > t_{0.005}$, reject the claim
(5) A process for making certain ball bearings is under control if the diameters of the bearings have a
mean of 0.5 cm. What can you say about the process if a random sample of 12 of these bearings has a
mean diameter of 0.515 cm and standard deviation of 0.017 cm?
Here $n = 12$ (small sample) and $\nu = n - 1 = 11$

Also $\bar{x} = 0.515$, $s = 0.017$, $\mu = 0.5$ and $t_{0.005} = 3.106$
We know that the given claim is accepted (the process is under control) if and only if $|t| \le t_{0.005}$
Now $t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{0.515 - 0.5}{0.017/\sqrt{12}} = 3.0566$ and $|t| = 3.0566$
Since $|t| \le t_{0.005}$, accept the claim; that is, the process is under control
Note: In place of $\bar{x} = 0.515$, if we take $\bar{x} = 0.48$ then $t = -4.0754$, $|t| > t_{0.005}$ and therefore we
need to reject the claim.
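The accept/reject rule used in problems (4) and (5) is mechanical once the table value is known. A minimal sketch, assuming Python; the critical values $t_{0.005}$ are passed in from the t-table rather than computed, and the function names are illustrative.

```python
import math


def t_statistic(xbar, mu, s, n):
    """t = (xbar - mu) / (s / sqrt(n)) with n - 1 degrees of freedom."""
    return (xbar - mu) / (s / math.sqrt(n))


def decide(t, t_crit):
    """Rule from the notes: accept the claim if and only if |t| <= t_0.005."""
    return "accept" if abs(t) <= t_crit else "reject"


t4 = t_statistic(47.5, 42.1, 8.4, 25)        # problem (4), table value 2.797
t5 = t_statistic(0.515, 0.5, 0.017, 12)      # problem (5), table value 3.106
t5_note = t_statistic(0.48, 0.5, 0.017, 12)  # the note after problem (5)

print(round(t4, 4), decide(t4, 2.797))
print(round(t5, 4), decide(t5, 3.106))
print(round(t5_note, 4), decide(t5_note, 3.106))
```

Taking the absolute value makes the rule two-sided, which is why the negative statistic in the note also leads to rejection.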
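(6) The t distribution with 1 degree of freedom is given by $f(t) = \frac{1}{\pi}(1 + t^2)^{-1}$ for $-\infty < t < \infty$. Verify
the value given for $t_{0.05}$ when $\nu = 1$ in the table.
From the tables, when $\nu = 1$, $t_{0.05} = 6.314$
Now $P(t > 6.314) = \int_{6.314}^{\infty} f(t)\,dt = \frac{1}{\pi}\int_{6.314}^{\infty} \frac{1}{1 + t^2}\,dt = \frac{1}{\pi}\left[\tan^{-1} t\right]_{6.314}^{\infty}$
$= \frac{1}{\pi}\left[\tan^{-1}(\infty) - \tan^{-1}(6.314)\right] = \frac{1}{\pi}\left(\frac{\pi}{2} - 1.4137\right)$
$= \frac{7}{22}\left(\frac{11}{7} - 1.4137\right) = 0.05018 \approx 0.05$ (taking $\pi \approx \frac{22}{7}$)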
Exercise:
(1) (i) Find t 0.025 when   14 (ii) Find  t 0.01 when   10 (iii) Find t 0.995 when   7
(2) (i) Find t 0.99 when   6 (ii) Find t 0.975 when   24 (iii) Find  t 0.975 when   19
(3) Find (i) P(t  2.821) when   9 (ii) P(t  2.947) when   15
(iii) P(t  1.729) when   19 (iv) P(1.714  t  2.069) when   23
(4) Find k for a sample of size 24 from a normal population such that (i) P(2.069  t  k )  0.965
(ii) P(k  t  2.807)  0.095 (iii) P(k  t  k )  0.9
(5) A random sample of size 25 from a normal population has mean x  45.4 and the standard deviation
s  9.7 . Does this information tend to support or refute the claim that the mean of the population is
  40
(6) A process for making certain ball bearings is under control if the diameters of the bearings have a
mean of 0.5 cm. What can you say about the process if a random sample of 10 of these bearings has a
mean diameter of 0.506 cm and standard deviation of 0.004 cm. ( n  10 , t 0.005  3.25 , t  4.7434
and reject the claim)
(7) The tensile strength (1,000 psi) of a new composite can be modeled as a normal distribution. A
random sample of size 25 specimens has mean x  45.3 and standard deviation s  7.9 . Does this
information tend to support or refute the claim that the mean of the population is   40.5 ?
(8) The process of making concrete in a mixer is under control if the rotations per minute of the mixer
have a mean of 22 rpm. What can we say about this process if a sample of 20 of these mixers has a
mean rpm of 22.75 rpm and a standard deviation of 3 rpm?
(9) A manufacturer of fuses claims that with 20% overload, the fuses will blow in 12.4 minutes on the
average. To test this claim, a sample of 20 of the fuses was subjected to a 20% overload, and the
times it took them to blow had a mean of 10.63 minutes and standard deviation of 2.48 minutes. If it
can be assumed that the data constitute a random sample from a normal population, do they tend to
support or refute the manufacturer’s claim
(10) The t distribution with 1 degree of freedom is given by $f(t) = \frac{1}{\pi}(1 + t^2)^{-1}$ for $-\infty < t < \infty$. Verify
the value given for $t_{0.1}$ when $\nu = 1$ in the table.
Sampling Distribution of the Variance (SDV):
The frequency distribution of the variances of all random samples of fixed size, taken from a population, is
called the Sampling Distribution of the Variance (SDV). It is denoted by $S^2$.
Examples:
(1) Consider the finite population $\{1, 2, 3, 4\}$ of size $N = 4$ and samples of size $n = 2$ without replacement

The number of samples is ${}^{N}C_{n} = {}^{4}C_{2} = 6$
The samples are $\{1, 2\}, \{1, 3\}, \{1, 4\}, \{2, 3\}, \{2, 4\}, \{3, 4\}$

The means of these 6 samples are respectively 1.5, 2, 2.5, 2.5, 3, 3.5
The variance is $\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$ or $\frac{1}{n}\sum_{i=1}^{n} x_i^2 - (\bar{x})^2$
The variances of these 6 samples are respectively given as below.
$\frac{1}{2}(1^2 + 2^2) - (1.5)^2 = 2.5 - 2.25 = 0.25$
$\frac{1}{2}(1^2 + 3^2) - (2)^2 = 5 - 4 = 1$
$\frac{1}{2}(1^2 + 4^2) - (2.5)^2 = 8.5 - 6.25 = 2.25$
$\frac{1}{2}(2^2 + 3^2) - (2.5)^2 = 6.5 - 6.25 = 0.25$
$\frac{1}{2}(2^2 + 4^2) - (3)^2 = 10 - 9 = 1$
$\frac{1}{2}(3^2 + 4^2) - (3.5)^2 = 12.5 - 12.25 = 0.25$
The frequency distribution of these sample variances is given as follows
Sample variance $s^2$ :  0.25   1   2.25
Frequency $f$ :          3      2   1
which is the sampling distribution of the Variance (SDV)
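The sampling distribution of the variance in this example can be enumerated in the same way as the SDM. A short sketch, assuming Python; note that it follows the example in dividing by n (not by n - 1 as in the sample-variance formula given for the t-distribution), and the helper name `variance_n` is illustrative.

```python
from collections import Counter
from itertools import combinations


def variance_n(sample):
    """Variance with divisor n, as used in the worked example above."""
    m = sum(sample) / len(sample)
    return sum(x * x for x in sample) / len(sample) - m * m


population = [1, 2, 3, 4]
sdv = Counter(variance_n(s) for s in combinations(population, 2))
print(dict(sorted(sdv.items())))
```

Switching the divisor to n - 1 would double every variance in this example (since n = 2) but leave the frequencies unchanged.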

Theorem: If $S^2$ is the random variable which gives the variance of a random sample of size n, taken from a
normal population having variance $\sigma^2$, then $\chi^2 = \frac{(n - 1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\sigma^2}$ is a random variable having
the Chi-square distribution with the parameter $\nu = n - 1$.
$\chi^2$-Distribution: A continuous random variable having the probability density function given by
$f(x) = \frac{1}{2^{\nu/2}\,\Gamma\left(\frac{\nu}{2}\right)}\,x^{\frac{\nu}{2} - 1} e^{-\frac{x}{2}}$ for $x > 0$ and $\nu > 0$, is called the Chi-square ($\chi^2$) random variable with
parameter $\nu$ known as degrees of freedom. The probability distribution is called the $\chi^2$-Distribution.
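Note:
(i) $\Gamma(n) = \int_0^{\infty} e^{-x} x^{n-1}\,dx = 2\int_0^{\infty} e^{-x^2} x^{2n-1}\,dx$
(ii) $\Gamma(1) = 1$ and $\Gamma(n) = (n - 1)!$ for $n = 2, 3, 4, \ldots$
(iii) If $\nu = 2$, then $f(x) = \frac{1}{2^1\,\Gamma(1)}\,e^{-\frac{x}{2}} = \frac{1}{2}\,e^{-\frac{x}{2}}$ for $x > 0$
(iv) If $\nu = 4$, then $f(x) = \frac{1}{2^2\,\Gamma(2)}\,x e^{-\frac{x}{2}} = \frac{1}{4}\,x e^{-\frac{x}{2}}$ for $x > 0$
(v) If $\nu = 6$, then $f(x) = \frac{1}{2^3\,\Gamma(3)}\,x^2 e^{-\frac{x}{2}} = \frac{1}{16}\,x^2 e^{-\frac{x}{2}}$ for $x > 0$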
Properties of $\chi^2$-Distribution:
(i) The curve given by the probability density function is called the $\chi^2$-curve.
(ii) The $\chi^2$-curve lies in the 1st quadrant
(iii) It is not symmetric about any axis; the curve is skewed to the right.
(iv) It depends on the value of $\nu$
(v) $P(\chi^2 > 0) = 1$; that is, the area between the $\chi^2$-axis and
the curve from 0 to $\infty$ is 1
$\chi^2_\alpha$-Notation:
If $0 < \alpha < 1$ then $\chi^2_\alpha$ is a point on the $\chi^2$-axis such that $P(\chi^2 > \chi^2_\alpha) = \alpha$
That is, the area between the $\chi^2$-axis and the curve from $\chi^2_\alpha$ to $\infty$ is $\alpha$
$\chi^2_\alpha$-Table:
In this table the values of $\chi^2_\alpha$ are available for different values of $\alpha$ and $\nu$
Example:
(i) For $\nu = 6$ or $n = 7$ and $\alpha = 0.025$, $\chi^2_\alpha = \chi^2_{0.025} = 14.449$

(ii) For $\nu = 10$ or $n = 11$ and $\alpha = 0.05$, $\chi^2_\alpha = \chi^2_{0.05} = 18.307$
(iii) For $\nu = 15$ or $n = 16$ and $\alpha = 0.99$, $\chi^2_\alpha = \chi^2_{0.99} = 5.229$
Testing a claim using $\chi^2$-distribution: To test a given claim using the $\chi^2$-distribution, we follow the rule
given below.
(i) If $\chi^2 \le \chi^2_{0.005}$ then the claim is accepted
(ii) If $\chi^2 > \chi^2_{0.005}$ then the claim is rejected
Problems:
(1) Find (i) $\chi^2_{0.025}$ when $\nu = 12$ (ii) $\chi^2_{0.05}$ when $\nu = 8$ (iii) $\chi^2_{0.99}$ when $\nu = 16$
Solution:
(i) When $\nu = 12$, $\chi^2_{0.025} = 23.337$
(ii) When $\nu = 8$, $\chi^2_{0.05} = 15.507$
(iii) When $\nu = 16$, $\chi^2_{0.99} = 5.812$
(2) Find (i) $P(\chi^2 < 12.833)$ when $\nu = 5$ (ii) $P(\chi^2 > 3.325)$ when $\nu = 9$ (iii) $P(\chi^2 < 13.12)$ when
$\nu = 25$
Solution:
(i) When $\nu = 5$, $P(\chi^2 < 12.833) = P(\chi^2 < \chi^2_{0.025}) = 1 - P(\chi^2 \ge \chi^2_{0.025}) = 1 - 0.025 = 0.975$
(ii) When $\nu = 9$, $P(\chi^2 > 3.325) = P(\chi^2 > \chi^2_{0.95}) = 0.95$
(iii) When $\nu = 25$, $P(\chi^2 < 13.12) = P(\chi^2 < \chi^2_{0.975}) = 1 - P(\chi^2 \ge \chi^2_{0.975}) = 1 - 0.975 = 0.025$
(3) Find (i) $P(2.088 < \chi^2 < 16.919)$ when $\nu = 9$ (ii) $P(7.564 < \chi^2 < 35.718)$ when $\nu = 17$
Solution:
(i) When $\nu = 9$, $P(2.088 < \chi^2 < 16.919) = P(\chi^2_{0.99} < \chi^2 < \chi^2_{0.05})$
$= P(\chi^2 > \chi^2_{0.99}) - P(\chi^2 > \chi^2_{0.05})$
$= 0.99 - 0.05 = 0.94$
(ii) When $\nu = 17$, $P(7.564 < \chi^2 < 35.718) = P(\chi^2_{0.975} < \chi^2 < \chi^2_{0.005})$
$= P(\chi^2 > \chi^2_{0.975}) - P(\chi^2 > \chi^2_{0.005})$
$= 0.975 - 0.005 = 0.97$
(4) The claim that the variance of a normal population is $\sigma^2 = 21.3$ is rejected if the variance of a random
sample of size 15 exceeds 39.74. What is the probability that the claim will be rejected even though
$\sigma^2 = 21.3$?
Solution:
Here $n = 15$, $\nu = n - 1 = 14$ and $\sigma^2 = 21.3$
Given that the claim is rejected if $S^2 > 39.74$
The probability that the claim will be rejected is given by
$P(S^2 > 39.74) = P\left(\frac{(n - 1)S^2}{\sigma^2} > \frac{(n - 1)}{\sigma^2} \times 39.74\right)$
$= P\left(\chi^2 > \frac{39.74 \times 14}{21.3}\right)$
$= P(\chi^2 > 26.120) = P(\chi^2 > \chi^2_{0.025}) = 0.025$
(5) The claim that the variance of a normal population is $\sigma^2 = 18.5$ is rejected if the standard deviation of a
random sample of size 25 exceeds 5.7559. What is the probability that the claim will be rejected even
though $\sigma^2 = 18.5$?
Solution:
Here $n = 25$, $\nu = n - 1 = 24$ and $\sigma^2 = 18.5$
Given that the claim is rejected if $S > 5.7559$
The probability that the claim will be rejected is given by
$P(S > 5.7559) = P(S^2 > 33.1304)$
$= P\left(\frac{(n - 1)S^2}{\sigma^2} > \frac{(n - 1)}{\sigma^2} \times 33.1304\right)$
$= P\left(\chi^2 > \frac{33.1304 \times 24}{18.5}\right)$
$= P(\chi^2 > 42.9799) = P(\chi^2 > \chi^2_{0.01}) = 0.01$
(6) A random sample of 10 observations is taken from a normal population having the variance
$\sigma^2 = 42.5$. Find the approximate probability of obtaining a sample standard deviation between 3.14
and 8.94.
Solution: Here $n = 10$ and $\nu = n - 1 = 9$. Also $\sigma^2 = 42.5$, and we know that $\chi^2 = \frac{(n - 1)S^2}{\sigma^2}$

The probability of obtaining a sample standard deviation between 3.14 and 8.94 is given by
$P(3.14 < S < 8.94) = P(9.8596 < S^2 < 79.9236)$
$= P\left(\frac{(n - 1)}{\sigma^2} \times 9.8596 < \frac{(n - 1)S^2}{\sigma^2} < \frac{(n - 1)}{\sigma^2} \times 79.9236\right)$
$= P\left(\frac{9.8596 \times 9}{42.5} < \chi^2 < \frac{79.9236 \times 9}{42.5}\right)$
$= P(2.0879 < \chi^2 < 16.9249)$
$= P(\chi^2_{0.99} < \chi^2 < \chi^2_{0.05})$
$= P(\chi^2 > \chi^2_{0.99}) - P(\chi^2 > \chi^2_{0.05})$
$= 0.99 - 0.05 = 0.94$
(7) The Chi-square distribution with 4 degrees of freedom is given by $f(x) = \frac{1}{4}\,x e^{-\frac{x}{2}}$ for $x > 0$ and $f(x) = 0$ for $x \le 0$.
Find the probability that the variance of a random sample with $\sigma = 12$ will exceed 180.
Solution: Here $\nu = 4$ and $\sigma = 12$
The probability that the variance of a random sample will exceed 180 is given by
$P(S^2 > 180) = P\left(\frac{(n - 1)S^2}{\sigma^2} > \frac{(n - 1)}{\sigma^2} \times 180\right) = P\left(\chi^2 > \frac{180 \times 4}{144}\right) = P(\chi^2 > 5)$
$= \int_5^{\infty} f(x)\,dx = \frac{1}{4}\int_5^{\infty} x e^{-\frac{x}{2}}\,dx = \frac{1}{4}\left[-2x e^{-\frac{x}{2}} - 4 e^{-\frac{x}{2}}\right]_5^{\infty}$
$= \frac{1}{4}\left[0 + 10 e^{-\frac{5}{2}} + 4 e^{-\frac{5}{2}}\right] = \frac{7}{2}\,e^{-\frac{5}{2}} \approx 0.2873$
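For an even number of degrees of freedom the upper-tail area of the Chi-square distribution has a finite closed form, $P(\chi^2 > x) = e^{-x/2}\sum_{k=0}^{\nu/2 - 1} \frac{(x/2)^k}{k!}$, which follows from repeated integration by parts as in problem (7). The sketch below, assuming Python, uses that identity to cross-check problems (4), (5) and (7); it does not cover odd $\nu$, and the function name is illustrative.

```python
import math


def chi2_upper_tail_even(x, nu):
    """P(chi-square > x) for an even number of degrees of freedom nu."""
    if nu <= 0 or nu % 2 != 0:
        raise ValueError("this closed form only applies to even, positive nu")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(nu // 2))


p4 = chi2_upper_tail_even(39.74 * 14 / 21.3, 14)     # problem (4), table answer 0.025
p5 = chi2_upper_tail_even(5.7559 ** 2 * 24 / 18.5, 24)  # problem (5), table answer 0.01
p7 = chi2_upper_tail_even(180 * 4 / 144, 4)          # problem (7), exact (7/2) e^(-5/2)
print(round(p4, 4), round(p5, 4), round(p7, 4))
```

For $\nu = 4$ the sum reduces to $e^{-x/2}\left(1 + \frac{x}{2}\right)$, which at $x = 5$ gives the same $\frac{7}{2}e^{-5/2}$ obtained by direct integration.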
Exercise:
(1) Find (i)  02.05 when   6 (ii)  02.99 when   10 (iii)  02.025 when   14 (iv)  02.01 when   21
(2) Determine the probabilities: (i) P(  2  27.688) when   13 (ii) P(  2  18.475) when   7
(iii) P(  2  14.256) when   29
(3) Find (i) $P(12.461 < \chi^2 < 48.278)$ when $\nu = 28$ (ii) $P(28.685 < \chi^2 < 29.141)$ when $\nu = 14$
(4) A random sample of 12 observations is taken from a normal population having the variance $\sigma^2 = 42.5$. Find the approximate probability of obtaining a sample variance between 10.057 and 84.691. (Hint: $P(10.057 < s^2 < 84.691) = P(2.6029 < \chi^2 < 21.92) = P(\chi^2_{0.995} < \chi^2 < \chi^2_{0.025}) = 0.97$)
(5) The claim that the variance of a normal population is $\sigma^2 = 4$ is rejected if the variance of a random sample of size 9 exceeds 7.7535. What is the probability that the claim will be rejected even though $\sigma^2 = 4$?
(6) A random sample of 15 observations is taken from a normal population having variance $\sigma^2 = 90.25$. Find the approximate probability of obtaining a sample standard deviation between 7.25 and 10.75.
(7) A manufacturer claims that any of his lists of items cannot have variance more than 1. A sample of 25 items has a variance of 1.2. Test whether the claim of the manufacturer is correct.
(8) The Chi-square distribution with 2 degrees of freedom is given by
$$f(x) = \begin{cases} \frac{1}{2} e^{-x/2}, & x > 0 \\ 0, & x \le 0 \end{cases}$$
Find the probability that the standard deviation of a random sample with $\sigma = 10$ will exceed $10\sqrt{2}$.
(Hint: $P(S > 10\sqrt{2}) = P(S^2 > 200) = P(\chi^2 > 4) = \int_4^\infty f(x)\,dx$)
F-Distribution: A continuous random variable having the probability density function given by
$$f(x) = \frac{\Gamma\left(\frac{\nu_1 + \nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\frac{\nu_1}{2}} \frac{x^{\frac{\nu_1}{2} - 1}}{\left(1 + \frac{\nu_1}{\nu_2}x\right)^{\frac{\nu_1 + \nu_2}{2}}}$$
for $x > 0$, $\nu_1 > 0$ and $\nu_2 > 0$, is called the F-random variable with parameters $\nu_1$ and $\nu_2$, known as degrees of freedom. The probability distribution is called the F-Distribution.
Note:
(i) $\Gamma(n) = \int_0^\infty e^{-x} x^{n-1}\,dx = 2\int_0^\infty e^{-x^2} x^{2n-1}\,dx$
(ii) $\Gamma(1) = 1$ and $\Gamma(n) = (n-1)!$ for $n = 2, 3, 4, \ldots$
(iii) If $\nu_1 = 2$ and $\nu_2 = 2$, then $f(x) = \frac{1}{(1+x)^2}$ for $x > 0$
(iv) If $\nu_1 = 4$ and $\nu_2 = 4$, then $f(x) = \frac{6x}{(1+x)^4}$ for $x > 0$
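As a quick check of the special cases (iii) and (iv), the general density can be coded directly from the definition and compared with the simplified forms. This is a minimal sketch; the function name f_pdf and the test points 1.5 and 0.7 are arbitrary choices made for this illustration.

```python
import math

def f_pdf(x, nu1, nu2):
    """Density of the F-distribution with (nu1, nu2) degrees of freedom."""
    if x <= 0:
        return 0.0  # the density is defined only for x > 0
    const = math.gamma((nu1 + nu2) / 2) / (
        math.gamma(nu1 / 2) * math.gamma(nu2 / 2)
    )
    return (
        const
        * (nu1 / nu2) ** (nu1 / 2)
        * x ** (nu1 / 2 - 1)
        / (1 + nu1 * x / nu2) ** ((nu1 + nu2) / 2)
    )

# Case (iii): nu1 = nu2 = 2 should reduce to 1 / (1 + x)^2
x = 1.5
case_iii = (f_pdf(x, 2, 2), 1 / (1 + x) ** 2)

# Case (iv): nu1 = nu2 = 4 should reduce to 6x / (1 + x)^4
y = 0.7
case_iv = (f_pdf(y, 4, 4), 6 * y / (1 + y) ** 4)

print(case_iii, case_iv)
```

Each printed pair should contain two numbers that agree up to floating-point rounding.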
Properties of F-Distribution:
(i) The curve given by the probability density function is called the F-curve.
(ii) The F-curve lies in the 1st quadrant.
(iii) It is not symmetric about any axis; the curve is skewed to the right.
(iv) It depends on the values of $\nu_1$ and $\nu_2$.
(v) $P(F > 0) = 1$; that is, the area between the F-axis and the curve from 0 to $\infty$ is 1.
F-Notation:
If $\alpha > 0$, then $F_\alpha$ is a point on the F-axis such that $P(F > F_\alpha) = \alpha$.
That is, the area between the F-axis and the curve from $F_\alpha$ to $\infty$ is $\alpha$.
The value of $F_\alpha$ corresponding to $\nu_1$ and $\nu_2$ is denoted by $F_\alpha(\nu_1, \nu_2)$.
Note: $F_\alpha(\nu_1, \nu_2) = \frac{1}{F_{1-\alpha}(\nu_2, \nu_1)}$
Theorem: If $S_1^2$ and $S_2^2$ are the variances of independent random samples of sizes $n_1$ and $n_2$, taken from two normal populations having the same variance, then $F = \frac{S_1^2}{S_2^2}$ is a random variable having the F-distribution with the parameters $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$.
Note: If $F = \frac{S_2^2}{S_1^2}$, then the parameters are in the order $\nu_2 = n_2 - 1$ and $\nu_1 = n_1 - 1$; that is, the degrees of freedom of the numerator come first.
$F_{0.05}$ Tables: In this table the values of $F_{0.05}(\nu_1, \nu_2)$ are available for different values of $\nu_1$ and $\nu_2$.
$F_{0.01}$ Tables: In this table the values of $F_{0.01}(\nu_1, \nu_2)$ are available for different values of $\nu_1$ and $\nu_2$.
Example:
(i) For $\nu_1 = 10$, $\nu_2 = 6$ and $\alpha = 0.05$, $F_{0.05}(\nu_1, \nu_2) = F_{0.05}(10, 6) = 4.06$
(ii) For $\nu_1 = 6$, $\nu_2 = 10$ and $\alpha = 0.95$, $F_{0.95}(\nu_1, \nu_2) = F_{0.95}(6, 10) = \frac{1}{F_{1-0.95}(10, 6)} = \frac{1}{F_{0.05}(10, 6)} = \frac{1}{4.06}$
(iii) For $\nu_1 = 9$, $\nu_2 = 13$ and $\alpha = 0.01$, $F_{0.01}(\nu_1, \nu_2) = F_{0.01}(9, 13) = 4.19$
(iv) For $\nu_1 = 13$, $\nu_2 = 9$ and $\alpha = 0.99$, $F_{0.99}(\nu_1, \nu_2) = F_{0.99}(13, 9) = \frac{1}{F_{1-0.99}(9, 13)} = \frac{1}{F_{0.01}(9, 13)} = \frac{1}{4.19}$
Problems:
(1) For an F-distribution find
(i) $F_{0.05}$ with $\nu_1 = 7$ and $\nu_2 = 15$
(ii) $F_{0.01}$ with $\nu_1 = 25$ and $\nu_2 = 19$
(iii) $F_{0.95}$ with $\nu_1 = 19$ and $\nu_2 = 25$
(iv) $F_{0.99}$ with $\nu_1 = 22$ and $\nu_2 = 12$
Solution:
(i) $F_{0.05}(7, 15) = 2.71$
(ii) $F_{0.01}(25, 19) = 2.91$
(iii) $F_{0.95}(19, 25) = \frac{1}{F_{1-0.95}(25, 19)} = \frac{1}{F_{0.05}(25, 19)} = \frac{1}{2.11} = 0.4739$
(iv) $F_{0.99}(22, 12) = \frac{1}{F_{1-0.99}(12, 22)} = \frac{1}{F_{0.01}(12, 22)} = \frac{1}{3.12} = 0.3205$
(2) If two independent random samples of size $n_1 = 7$ and $n_2 = 13$ are taken from a normal population, what is the probability that the variance of the first sample will be at least three times as large as that of the second sample?
Solution:
Here $n_1 = 7$ and $n_2 = 13$, so $\nu_1 = 6$ and $\nu_2 = 12$.
Therefore, the probability that the variance of the first sample will be at least three times as large as that of the second sample is given by
$$P(S_1^2 \ge 3S_2^2) = P\left(\frac{S_1^2}{S_2^2} \ge 3\right) = P(F \ge 3) = P(F \ge F_{0.05}) = 0.05, \quad \text{since } F_{0.05}(6, 12) = 3$$
(3) If two independent random samples of size $n_1 = 8$ and $n_2 = 8$ are taken from a normal population, what is the probability that the variance of the first sample will be at least seven times as large as that of the second sample?
Solution:
Here $n_1 = 8$ and $n_2 = 8$, so $\nu_1 = 7$ and $\nu_2 = 7$.
Therefore, the probability that the variance of the first sample will be at least seven times as large as that of the second sample is given by
$$P(S_1^2 \ge 7S_2^2) = P\left(\frac{S_1^2}{S_2^2} \ge 7\right) = P(F \ge 7) = P(F \ge F_{0.01}) = 0.01, \quad \text{since } F_{0.01}(7, 7) = 6.99 \approx 7$$
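(4) If two independent random samples of size $n_1 = 13$ and $n_2 = 26$ are taken from a normal population, what is the probability that the variance of the second sample will be at least 2.5 times as large as that of the first sample?
Solution:
Here $n_1 = 13$ and $n_2 = 26$, so $\nu_1 = 12$ and $\nu_2 = 25$.
Since the ratio is now $F = \frac{S_2^2}{S_1^2}$, the parameters are taken in the order $(\nu_2, \nu_1) = (25, 12)$.
Therefore, the probability that the variance of the second sample will be at least 2.5 times as large as that of the first sample is given by
$$P(S_2^2 \ge 2.5S_1^2) = P\left(\frac{S_2^2}{S_1^2} \ge 2.5\right) = P(F \ge 2.5) = P(F \ge F_{0.05}) = 0.05, \quad \text{since } F_{0.05}(25, 12) = 2.5$$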
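The three variance-ratio probabilities worked out above, with degrees of freedom (6, 12), (7, 7) and (25, 12), were read from the F tables. They can also be approximated by integrating the F density numerically. The sketch below uses composite Simpson's rule, which is a choice made for this illustration and not the method of these notes; the helper names are hypothetical, and the code assumes $\nu_1 \ge 2$ so that the density is finite at $x = 0$.

```python
import math

def f_density(x, nu1, nu2):
    """F density for x >= 0 (assumes nu1 >= 2 so the value at 0 is finite)."""
    const = math.gamma((nu1 + nu2) / 2) / (
        math.gamma(nu1 / 2) * math.gamma(nu2 / 2)
    )
    return (
        const
        * (nu1 / nu2) ** (nu1 / 2)
        * x ** (nu1 / 2 - 1)
        / (1 + nu1 * x / nu2) ** ((nu1 + nu2) / 2)
    )

def f_upper_tail(c, nu1, nu2, steps=2000):
    """Approximate P(F > c) as 1 minus the Simpson integral of the density on [0, c]."""
    if nu1 < 2:
        raise ValueError("this sketch assumes nu1 >= 2")
    if steps % 2 != 0:
        raise ValueError("steps must be even for Simpson's rule")
    h = c / steps
    total = f_density(0.0, nu1, nu2) + f_density(c, nu1, nu2)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * f_density(i * h, nu1, nu2)
    return 1.0 - total * h / 3.0

p2 = f_upper_tail(3.0, 6, 12)    # first sample variance at least 3 times the second
p3 = f_upper_tail(7.0, 7, 7)     # first sample variance at least 7 times the second
p4 = f_upper_tail(2.5, 25, 12)   # second sample variance at least 2.5 times the first

print(round(p2, 3), round(p3, 3), round(p4, 3))
```

The results should be close to the table-based answers 0.05, 0.01 and 0.05; small differences arise because the table values are rounded.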
Problems:
(1) For an F-distribution find
(i) $F_{0.05}$ with $\nu_1 = 20$ and $\nu_2 = 10$
(ii) $F_{0.01}$ with $\nu_1 = 20$ and $\nu_2 = 5$
(iii) $F_{0.95}$ with $\nu_1 = 15$ and $\nu_2 = 12$
(iv) $F_{0.95}$ with $\nu_1 = 12$ and $\nu_2 = 15$
(v) $F_{0.99}$ with $\nu_1 = 5$ and $\nu_2 = 20$
(vi) $F_{0.99}$ with $\nu_1 = 20$ and $\nu_2 = 5$
what is the probability that the variance of the second sample will be at least 2.4 times as large as that
of the first sample?
what is the probability that the variance of the first sample will be at least four times as large as that of
the second sample?
what is the probability that the variance of the first sample will be at least four times as large as that of
the second sample?
Unit – 4: Estimation and Test of Hypothesis of Means
(Point estimation, interval estimation, test of hypothesis, hypothesis concerning one mean, hypothesis
concerning two means, matched pair comparisons)
Estimation: Estimating a population parameter using sample data is called Estimation.
It is of two types: (i) Point Estimation (ii) Interval Estimation
Point Estimation: Estimating a population parameter in terms of a single numerical value is called Point Estimation.
Interval Estimation: Estimating a population parameter in terms of an interval is called Interval Estimation.
Estimator: If a statistic $\hat{\theta}$ is used to estimate a population parameter $\theta$, then $\hat{\theta}$ is called an estimator for $\theta$.
Unbiased Estimator: A statistic $\hat{\theta}$ is called an unbiased estimator for a population parameter $\theta$ if the mean of the sampling distribution of the statistic $\hat{\theta}$ is $\theta$; that is, $\mu_{\hat{\theta}} = \theta$ or $E(\hat{\theta}) = \theta$.
For example,
(i) We know that $\mu_{\bar{X}} = \mu$ or $E(\bar{X}) = \mu$.
Therefore, the statistic $\bar{X}$ is an unbiased estimator for the population parameter $\mu$.
In other words, the sample mean is an unbiased estimator for the population mean.
(ii) Similarly, $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$ is an unbiased estimator for the population variance $\sigma^2$.
(iii) If $X$ is a Binomial random variable with parameters $n$ and $p$, then we have $E\left(\frac{X}{n}\right) = \frac{E(X)}{n} = \frac{np}{n} = p$.
Therefore, the statistic $\frac{X}{n}$ is an unbiased estimator for the population parameter $p$.
Here $p$ is called the proportion and $\frac{X}{n}$ is called the sample proportion.
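Example (iii) can be verified by direct summation over the Binomial probabilities. The sketch below is only an illustration; the function name and the values $n = 12$, $p = 0.3$ are arbitrary choices, not taken from the notes.

```python
import math

def expected_sample_proportion(n, p):
    """E[X/n] for X ~ Binomial(n, p), summed directly from the probability function."""
    return sum(
        (x / n) * math.comb(n, x) * p ** x * (1 - p) ** (n - x)
        for x in range(n + 1)
    )

# Hypothetical values chosen for illustration
n, p = 12, 0.3
mean_of_proportion = expected_sample_proportion(n, p)
print(mean_of_proportion)
```

The printed value should agree with $p$ up to floating-point rounding, which is what unbiasedness of $X/n$ means.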
More efficient unbiased estimator: Let $\theta$ be a population parameter and $\hat{\theta}_1$, $\hat{\theta}_2$ be two statistics such that
(i) $\hat{\theta}_1$ and $\hat{\theta}_2$ are unbiased estimators for $\theta$; that is, $E(\hat{\theta}_1) = \theta$ and $E(\hat{\theta}_2) = \theta$
(ii) The variance of the sampling distribution of the statistic $\hat{\theta}_1$ is less than that of the statistic $\hat{\theta}_2$; that is, $V(\hat{\theta}_1) < V(\hat{\theta}_2)$
Then $\hat{\theta}_1$ is called a more efficient unbiased estimator than $\hat{\theta}_2$ for $\theta$.
Maximum Error of the Mean:
Case (i): For large samples ($n \ge 30$) or $\sigma$ known,
$E = z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$ is called the maximum error of the Mean with the probability $1 - \alpha$.
Case (ii): For small samples ($n < 30$) and $\sigma$ unknown,
$E = t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$ is called the maximum error of the Mean with the probability $1 - \alpha$, where $t_{\alpha/2}$ is taken with $\nu = n - 1$ degrees of freedom.
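The two cases can be sketched in code as follows. This is an illustration under stated assumptions: the function names are hypothetical, $z_{\alpha/2}$ is computed with the standard library's NormalDist instead of being read from the table, and $t_{\alpha/2}$ is passed in as a table value because the standard library has no t-distribution.

```python
import math
from statistics import NormalDist

def max_error_z(sigma, n, confidence):
    """Case (i): E = z_{alpha/2} * sigma / sqrt(n), with z computed numerically."""
    alpha = 1 - confidence
    z_half_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return z_half_alpha * sigma / math.sqrt(n)

def max_error_t(s, n, t_half_alpha):
    """Case (ii): E = t_{alpha/2} * s / sqrt(n); t_half_alpha comes from the t table."""
    return t_half_alpha * s / math.sqrt(n)

# sigma = 1.6, n = 64, probability 0.90
e_large = max_error_z(1.6, 64, 0.90)

# s = 0.03, n = 10, 95% confidence, table value t_{0.025} = 2.262 for nu = 9
e_small = max_error_t(0.03, 10, 2.262)

print(round(e_large, 3), round(e_small, 4))
```

Because the computed $z_{\alpha/2}$ carries more decimal places than the table value, results may differ from hand calculations in the last digit.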
Note: For a large sample of size $n$, the probability is $1 - \alpha$ that the mean of a random sample from an infinite population with standard deviation $\sigma$ differs from $\mu$ by less than the maximum error $E$; that is, $P(|\bar{X} - \mu| < E) = 1 - \alpha$.
Consider,
$$P(|\bar{X} - \mu| < E) = P\left(|\bar{X} - \mu| < z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = P\left(\left|\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right| < z_{\alpha/2}\right)$$
$$= P(|Z| < z_{\alpha/2}) = P(-z_{\alpha/2} < Z < z_{\alpha/2})$$
$$= F(z_{\alpha/2}) - F(-z_{\alpha/2}) = F(z_{\alpha/2}) - \left[1 - F(z_{\alpha/2})\right]$$
$$= \left(1 - \frac{\alpha}{2}\right) - \left[1 - \left(1 - \frac{\alpha}{2}\right)\right]$$
$$= 1 - \alpha$$
Here $F$ denotes the distribution function of the standard normal variable $Z$.
Confidence Interval - One Mean:
Let $\bar{x}$ be the mean of a random sample of size $n$, taken from a population having mean $\mu$ and variance $\sigma^2$, and $E$ be the maximum error of the mean. Then
$\bar{x} \pm E$ are called the Upper and Lower confidence limits of the Mean $\mu$ with the probability $1 - \alpha$ or $(1 - \alpha)100\%$ confidence.
And the interval $(\bar{x} - E, \bar{x} + E)$ is called the Confidence Interval of the Mean $\mu$ with the probability $1 - \alpha$ or $(1 - \alpha)100\%$ confidence.
Note: (i) For large samples ($n \ge 30$) or $\sigma$ known,
Upper and Lower confidence limits: $\bar{x} \pm E$
Confidence Interval: $(\bar{x} - E, \bar{x} + E)$ where $E = z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$
(ii) For small samples ($n < 30$) and $\sigma$ unknown,
Upper and Lower confidence limits: $\bar{x} \pm E$
Confidence Interval: $(\bar{x} - E, \bar{x} + E)$ where $E = t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$
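The probability $1 - \alpha$ attached to the interval rests on the fact that a standard normal variable lies between $-z_{\alpha/2}$ and $z_{\alpha/2}$ with probability $1 - \alpha$. The short sketch below checks this numerically with the standard library; the function name coverage is made up for this illustration.

```python
from statistics import NormalDist

def coverage(alpha):
    """P(-z_{alpha/2} < Z < z_{alpha/2}) for a standard normal Z."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_half_alpha = std_normal.inv_cdf(1 - alpha / 2)
    return std_normal.cdf(z_half_alpha) - std_normal.cdf(-z_half_alpha)

for alpha in (0.10, 0.05, 0.01):
    print(alpha, coverage(alpha))
```

Each printed probability should equal $1 - \alpha$ up to rounding.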
Maximum Error & Confidence Interval for ONE MEAN
S.No. | Case | Maximum Error | Confidence Interval | Probability / Confidence
1 | One Mean, Large Samples ($n \ge 30$) or $\sigma$ known | $E = z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$ | $(\bar{x} - E, \bar{x} + E)$ | $1 - \alpha$ / $(1 - \alpha)100\%$
2 | One Mean, Small Samples ($n < 30$) and $\sigma$ unknown | $E = t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$ | $(\bar{x} - E, \bar{x} + E)$ | $1 - \alpha$ / $(1 - \alpha)100\%$
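Row 1 of the table can be turned into a small function. This is a sketch with a hypothetical function name; it computes $z_{\alpha/2}$ numerically rather than using the rounded table value, so the limits can differ slightly from hand calculation. The sample figures used are $n = 50$, $\bar{x} = 11795$ and $s = 14054$ with 95% confidence.

```python
import math
from statistics import NormalDist

def confidence_interval_large(xbar, sd, n, confidence):
    """Large-sample interval (xbar - E, xbar + E) with E = z_{alpha/2} * sd / sqrt(n)."""
    alpha = 1 - confidence
    z_half_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    e = z_half_alpha * sd / math.sqrt(n)
    return xbar - e, xbar + e

lower, upper = confidence_interval_large(11795, 14054, 50, 0.95)
print(round(lower, 2), round(upper, 2))
```

With the table value $z_{0.025} = 1.96$ the limits are 7899.43 and 15690.57; the computed limits should agree with these to within a fraction of a unit.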
Note:
(1)* $z_{0.01} = 2.33$
(2)* $z_{0.05} = 1.645$
(3)* $z_{0.005} = 2.575$
(4)* $z_{0.025} = 1.96$
Examples:
(1) What is the maximum error one can expect with probability 0.90 when using the mean of a random sample of size $n = 64$ to estimate the mean of a population with $\sigma^2 = 2.56$?
Here $n = 64 \ge 30$ (large sample), $\sigma^2 = 2.56$ and $\sigma = 1.6$.
Also, probability $1 - \alpha = 0.90$ and so $\alpha = 0.10$.
Now, $\frac{\alpha}{2} = 0.05$ and $z_{\alpha/2} = z_{0.05} = 1.645$.
The Maximum error of the Mean, $E = z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) = 1.645\left(\frac{1.6}{\sqrt{64}}\right) = 0.329$
(2) A sample of size 10 with standard deviation 0.03 was taken from a population. Find the maximum error with 95% confidence.
Here $n = 10 < 30$ (small sample), $s = 0.03$ and $\nu = n - 1 = 9$.
Also, confidence = 95%, probability $1 - \alpha = 0.95$ and so $\alpha = 0.05$.
Now, $\frac{\alpha}{2} = 0.025$ and $t_{\alpha/2} = t_{0.025} = 2.262$.
The Maximum error of the Mean, $E = t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right) = 2.262\left(\frac{0.03}{\sqrt{10}}\right) = 0.0215$
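(3) The Mean and standard deviation of a sample of size $n = 50$ are 11,795 and 14,054 respectively. Find a 95% confidence interval for the mean.
Here $n = 50 \ge 30$ (large sample), $\bar{x} = 11795$ and $s = 14054$.
Also, confidence = 95%, probability $1 - \alpha = 0.95$ and so $\alpha = 0.05$.
Now, $\frac{\alpha}{2} = 0.025$ and $z_{\alpha/2} = z_{0.025} = 1.96$.
The Maximum error of the Mean, $E = z_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right) = 1.96\left(\frac{14054}{\sqrt{50}}\right) = 3895.57$
Lower confidence limit, $\bar{x} - E = 11795 - 3895.57 = 7899.43$
Upper confidence limit, $\bar{x} + E = 11795 + 3895.57 = 15690.57$
Therefore, confidence interval $(\bar{x} - E, \bar{x} + E) = (7899.43, 15690.57)$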
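(4) The Mean and Variance of a sample of size $n = 300$ are 54 and 225 respectively. Find a 95% confidence interval for the mean.
Here $n = 300 \ge 30$ (large sample), $\bar{x} = 54$, $s^2 = 225$ and $s = 15$.
Also, confidence = 95%, probability $1 - \alpha = 0.95$ and so $\alpha = 0.05$.
Now, $\frac{\alpha}{2} = 0.025$ and $z_{\alpha/2} = z_{0.025} = 1.96$.
The Maximum error of the Mean, $E = z_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right) = 1.96\left(\frac{15}{\sqrt{300}}\right) = 1.6974$
Lower confidence limit, $\bar{x} - E = 54 - 1.6974 = 52.3026$
Upper confidence limit, $\bar{x} + E = 54 + 1.6974 = 55.6974$
Therefore, confidence interval $(\bar{x} - E, \bar{x} + E) = (52.3026, 55.6974)$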
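(5) The Mean and standard deviation of a sample of size $n = 100$ are 155 and 16 respectively. Find a 99% confidence interval for the population mean.
Here $n = 100 \ge 30$ (large sample), $\bar{x} = 155$ and $s = 16$.
Also, confidence = 99%, probability $1 - \alpha = 0.99$ and so $\alpha = 0.01$.
Now, $\frac{\alpha}{2} = 0.005$ and $z_{\alpha/2} = z_{0.005} = 2.575$.
The Maximum error of the Mean, $E = z_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right) = 2.575\left(\frac{16}{\sqrt{100}}\right) = 4.12$
Therefore, confidence interval $(\bar{x} - E, \bar{x} + E) = (150.88, 159.12)$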
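(6) Here $n = 23 < 30$ (small sample), $\bar{x} = 68$, $s = 10$ and $\nu = n - 1 = 22$.
Now, $\frac{\alpha}{2} = 0.005$ and $t_{\alpha/2} = t_{0.005} = 2.819$.
The Maximum error of the Mean, $E = t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right) = 2.819\left(\frac{10}{\sqrt{23}}\right) = 5.878$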
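(7) Ten bearings made by a certain process have a mean diameter of 0.506 cm with a standard deviation of 0.004 cm. Assuming the data may be taken as a random sample from a normal distribution, construct a 95% confidence interval for the actual average diameter of the bearings.
Here $n = 10 < 30$ (small sample), $\bar{x} = 0.506$, $s = 0.004$ and $\nu = n - 1 = 9$.
Also, confidence = 95%, probability $1 - \alpha = 0.95$ and so $\alpha = 0.05$.
Now, $\frac{\alpha}{2} = 0.025$ and $t_{\alpha/2} = t_{0.025} = 2.262$.
The Maximum error of the Mean, $E = t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right) = 2.262\left(\frac{0.004}{\sqrt{10}}\right) = 0.0029$
Lower confidence limit, $\bar{x} - E = 0.506 - 0.0029 = 0.5031$
Upper confidence limit, $\bar{x} + E = 0.506 + 0.0029 = 0.5089$
Therefore, confidence interval $(\bar{x} - E, \bar{x} + E) = (0.5031, 0.5089)$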
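(8) Five independent measurements of the flash point of Diesel oil gave the values 144, 147, 146, 142, 144. Assuming normality, determine a 99% confidence interval for the mean.
Here $n = 5 < 30$ (small sample) and $\nu = n - 1 = 4$.
Also, confidence = 99%, probability $1 - \alpha = 0.99$ and so $\alpha = 0.01$.
Now, $\frac{\alpha}{2} = 0.005$ and $t_{\alpha/2} = t_{0.005} = 4.604$.
Sample mean, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{144 + 147 + 146 + 142 + 144}{5} = 144.6$
Sample variance, $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
$$= \frac{(144 - 144.6)^2 + (147 - 144.6)^2 + (146 - 144.6)^2 + (142 - 144.6)^2 + (144 - 144.6)^2}{4}$$
$$= \frac{0.36 + 5.76 + 1.96 + 6.76 + 0.36}{4} = \frac{15.2}{4} = 3.8$$
Sample standard deviation, $s = \sqrt{3.8} = 1.95$
(SD mode; 144M+147M+…144M+; shift2; 1=….; for clearing data: on shift mode 3 = =)
The Maximum error of the Mean, $E = t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right) = 4.604\left(\frac{1.95}{\sqrt{5}}\right) = 4.02$
Therefore, confidence interval $(\bar{x} - E, \bar{x} + E) = (140.58, 148.62)$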
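(9) The dean of a college wants to use the mean of a random sample to estimate the average amount of time students take to get from one class to the next, and she wants to be able to assert with 99% confidence that the error is at most 0.25 minute. If it can be presumed from experience that $\sigma = 1.40$ minutes, how large a sample will she have to take?
Here $\sigma = 1.4$ and maximum error $E = 0.25$.
Also, confidence = 99%, probability $1 - \alpha = 0.99$ and so $\alpha = 0.01$.
Now, $\frac{\alpha}{2} = 0.005$ and $z_{\alpha/2} = z_{0.005} = 2.575$.
The Maximum error of the Mean is $E = z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$; solving for $n$ gives
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{2.575 \times 1.4}{0.25}\right)^2 = (14.42)^2 = 207.9364 \approx 208$$
The sample size is rounded up to the next integer, so she will have to take a sample of 208 students.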
1 n 2 1  n 2

(10) E   X i  X    E    X i     X     
 n i 1  n  i 1 
1 
 2 
 E    X i     2 X i    X     X     
n
2
n  i 1 
1  n 2
E    X i     2  X i    X      X    
n n

2
n  i 1 i 1 i 1 
1  n 
E    X i     2X     X i     X    1
n n

2 2
n  i 1 i 1 i 1 
1  n 2
 E    X i     2X   n X  n    n X    
2
n  i 1 
1  n 2
 E    X i     2n  X     n  X    
2 2
n  i 1 
1  n 2
 E    X i     n X    
2
n  i 1 

1 n
 
n  i 1
   
E  X i     n E X    
2 2 


1
n

n 2  n  X2 
1

 n 2   2  
 X2 
2 

n  n 
n 1 2
 
n
2
Therefore,  X i  X  is not an unbiased estimator for  2
1 n 2
n i 1
 1 n
But E   X i  X 2    2
 n  1 i 1 
X i  X 2 is an unbiased estimator for  2
n
1
Therefore, 
n  1 i 1
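The result of derivation (10) can be confirmed exactly on a tiny example by listing every possible sample. The sketch below assumes a hypothetical population taking the values 0, 1, 2 with equal probability and samples of size $n = 3$ drawn independently; exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

values = [0, 1, 2]   # hypothetical population values, each with probability 1/3
n = 3                # sample size (independent draws)

mu = Fraction(sum(values), len(values))
sigma2 = sum((Fraction(v) - mu) ** 2 for v in values) / len(values)

def mean_of_estimator(divisor):
    """Average of sum((x_i - xbar)^2) / divisor over all equally likely samples."""
    samples = list(product(values, repeat=n))
    total = Fraction(0)
    for sample in samples:
        xbar = Fraction(sum(sample), n)
        sum_sq = sum((Fraction(x) - xbar) ** 2 for x in sample)
        total += sum_sq / divisor
    return total / len(samples)

biased = mean_of_estimator(n)        # divisor n
unbiased = mean_of_estimator(n - 1)  # divisor n - 1

print(sigma2, biased, unbiased)
```

The divisor-$n$ estimator should average to $\frac{n-1}{n}\sigma^2$, while the divisor-$(n-1)$ estimator should average to $\sigma^2$ exactly.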
Exercise:
sample of size $n = 64$ to estimate the mean of a population with $\sigma^2 = 2.56$
sample of size $n = 15$ with $s^2 = 1.96$ to estimate the mean of a population
(3) A sample of size 49 with mean 60 was taken from a population whose S.D. is 10. Find 95%
confidence interval for population mean
(4) A random sample of size $n = 100$ is taken from a population with $\sigma = 5$. Given that the sample mean is $\bar{x} = 21.6$, construct a 95% confidence interval for the population mean
(5) Random sample of size 81 was taken from a population; whose variance is 20.25 and mean is 32.
Construct 98% confidence interval for true mean
(7) Nine bearings made by a certain process have a mean diameter of 0.506cm with a standard deviation
of 0.004cm. Assuming the data may be taken as a random sample from a normal distribution,
construct a 99% confidence interval for the actual average diameter of the bearings.
(8) In an air pollution study, the following amounts of suspended benzene soluble organic matter (in
micrograms per cubic meter) were obtained at an experiment station for eight different samples of air:
2.2, 1.8, 3.1, 2.0, 2.4, 2.0, 2.1 and 1.2. Construct a 95% confidence interval for the corresponding true
mean
Hypothesis: A definite statement about a population parameter is called a Hypothesis.
Examples:
1. Average height of the students in a Class is 5.8 ft; that is, $\mu = 5.8$
2. Average height of the students in a Class is at most 6.2 ft; that is, $\mu \le 6.2$
3. Average height of the students in a Class is at least 5.1 ft; that is, $\mu \ge 5.1$
4.   15
5.   20
6. Variance of the marks obtained by the students of SSC in Mathematics is 18; that is, $\sigma^2 = 18$
7.  2  8
8.  2  8
Classification of Hypothesis: Hypotheses are classified into two types: (i) Simple Hypothesis (ii) Composite Hypothesis
Simple Hypothesis: A hypothesis which gives complete information about a population parameter is called a simple hypothesis. In other words, a hypothesis which contains the symbol '$=$' is called a simple hypothesis.
Composite Hypothesis: A hypothesis which is not simple is called a composite hypothesis.

Examples:
1.   15 is composite
2. $\mu = 15$ is simple
3.   20 is composite
4. $\sigma^2 = 18$ is simple
5.  2  8 is composite
Testing of Hypothesis: Verifying the validity of a hypothesis using given sample data is called Testing of Hypothesis.
Errors in Testing of Hypothesis: If a hypothesis is true and accepted, or a hypothesis is false and rejected, then in either case there is no error in the decision. Otherwise, there is an error in the decision, and this error is of two types: (i) Type I Error (ii) Type II Error
Type I Error: If a hypothesis is true but rejected, then the error made in rejecting it is called Type I error. The probability of committing a Type I error is called the Level of Significance (LoS) and it is denoted by $\alpha$.
Type II Error: If a hypothesis is false but accepted, then the error made in accepting it is called Type II error. The probability of committing a Type II error is denoted by $\beta$.
Null Hypothesis (NH): In the testing of a hypothesis, a hypothesis which is formulated for the sake of rejection, under the assumption that it is true, is called the Null hypothesis. It is denoted by $H_0$.
Note: Usually a null hypothesis is simple
Alternative Hypothesis (AH): In the testing of a hypothesis, a hypothesis which is not the null hypothesis is called the Alternative hypothesis and it is denoted by $H_1$.
Note: Usually an alternative hypothesis is composite
Examples:
1. Null hypothesis H 0 :   15
Alternative hypothesis H1 :   15
Alternative hypothesis H1 :   15
Alternative hypothesis H1 :   15
4. Null hypothesis H 0 :   15
5. Null hypothesis H 0 :   15
Classification of Tests of Hypothesis: Tests of hypothesis are classified into two types: (i) One Tailed Test (OTT) (ii) Two Tailed Test (TTT)
One Tailed Tests are further classified into two types: (i) Left One Tailed Test (LOTT) (ii) Right One Tailed Test (ROTT)
Left One Tailed Test (LOTT): In the testing of a hypothesis, if the alternative hypothesis $H_1$ contains the symbol '$<$', then the test is called a Left One Tailed Test.
Right One Tailed Test (ROTT): In the testing of a hypothesis, if the alternative hypothesis $H_1$ contains the symbol '$>$', then the test is called a Right One Tailed Test.
Two Tailed Test (TTT): In the testing of a hypothesis, if the alternative hypothesis $H_1$ contains the symbol '$\ne$', then the test is called a Two Tailed Test.
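Examples:
In the testing of a hypothesis, let the Null hypothesis be $H_0: \mu = 15$.
1. If the Alternative hypothesis is $H_1: \mu < 15$, then it is a One Tailed Test, specifically a Left One Tailed Test.
2. If the Alternative hypothesis is $H_1: \mu > 15$, then it is a One Tailed Test, specifically a Right One Tailed Test.
3. If the Alternative hypothesis is $H_1: \mu \ne 15$, then it is a Two Tailed Test.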
Critical region: The region under a probability curve in which $H_0$ is rejected is called the Critical region.
Guidelines for formulating $H_0$ and $H_1$: When the goal of an experiment is to establish an assertion, the negation of the assertion should be taken as the Null hypothesis $H_0$. The assertion becomes the Alternative hypothesis $H_1$. In detail we have the following.
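S.No. | Claim / Assertion | $H_0$ | $H_1$ | Type of test
1 | $\mu < \mu_0$ or $\mu \ge \mu_0$ | $\mu = \mu_0$ | $\mu < \mu_0$ | LOTT
2 | $\mu > \mu_0$ or $\mu \le \mu_0$ | $\mu = \mu_0$ | $\mu > \mu_0$ | ROTT
3 | $\mu \ne \mu_0$ or $\mu = \mu_0$ | $\mu = \mu_0$ | $\mu \ne \mu_0$ | TTT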
Procedure for testing a hypothesis: To test a hypothesis we follow these steps.
(1) Null Hypothesis: Formulate the Null Hypothesis $H_0$
(2) Alternative Hypothesis: Formulate the Alternative Hypothesis $H_1$
(3) Level of Significance: Specify the Level of Significance (LoS) $\alpha$
(4) Test statistic: Specify the test statistic or test formula
(5) Criterion: Specify the criterion for testing $H_0$ against $H_1$
(6) Calculation: Calculate the test statistic value using sample data
(7) Decision: Decide whether to reject $H_0$ or not
Note:
(1)* $z_{0.01} = 2.33$
(2)* $z_{0.05} = 1.645$
(3)* $z_{0.005} = 2.575$
(4)* $z_{0.025} = 1.96$
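Test of Hypothesis – One Mean:
Case (i): For large samples ($n \ge 30$),
(1) Null Hypothesis $H_0: \mu = \mu_0$
(2) Alternative Hypothesis $H_1$: Any one of the following: $\mu < \mu_0$, $\mu > \mu_0$, $\mu \ne \mu_0$
(3) Level of Significance: $\alpha$
(4) Test statistic: $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$, where $\mu$ takes the value $\mu_0$ specified by $H_0$
(5) Criterion:
$H_1$ | Reject $H_0$ if
$\mu < \mu_0$ | $Z < -z_\alpha$
$\mu > \mu_0$ | $Z > z_\alpha$
$\mu \ne \mu_0$ | $\lvert Z \rvert > z_{\alpha/2}$
Note: If $n \ge 30$ and $\sigma$ is not known, then we can use $s$ in place of $\sigma$.
Case (ii): For small samples ($n < 30$),
(1) Null Hypothesis $H_0: \mu = \mu_0$
(2) Alternative Hypothesis $H_1$: Any one of the following: $\mu < \mu_0$, $\mu > \mu_0$, $\mu \ne \mu_0$
(3) Level of Significance: $\alpha$
(4) Test statistic: $t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$, $\nu = n - 1$, where $\mu$ takes the value $\mu_0$ specified by $H_0$
(5) Criterion:
$H_1$ | Reject $H_0$ if
$\mu < \mu_0$ | $t < -t_\alpha$
$\mu > \mu_0$ | $t > t_\alpha$
$\mu \ne \mu_0$ | $\lvert t \rvert > t_{\alpha/2}$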
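Test statistics and Critical regions for tests of Hypotheses for ONE MEAN
S.No | Test of Hypothesis | Test Statistic | $H_1$ | Reject $H_0$ if
1 | One Mean, Large Samples | $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ | $\mu < \mu_0$ | $Z < -z_\alpha$
  |  |  | $\mu > \mu_0$ | $Z > z_\alpha$
  |  |  | $\mu \ne \mu_0$ | $\lvert Z \rvert > z_{\alpha/2}$
2 | One Mean, Small Samples | $t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$, $\nu = n - 1$ | $\mu < \mu_0$ | $t < -t_\alpha$
  |  |  | $\mu > \mu_0$ | $t > t_\alpha$
  |  |  | $\mu \ne \mu_0$ | $\lvert t \rvert > t_{\alpha/2}$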
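The large-sample row of the table can be sketched as a small function. This is an illustration only: the function name and the labels "less", "greater" and "two-sided" for the three alternatives are choices made here, and the critical values are computed with the standard library's NormalDist rather than read from the table. The three calls use the data of worked Examples (1), (3) and (6) that follow.

```python
import math
from statistics import NormalDist

def z_test_one_mean(xbar, mu0, sd, n, alpha, alternative):
    """Return (Z, reject) for a large-sample test of H0: mu = mu0.

    alternative is "less" (H1: mu < mu0), "greater" (H1: mu > mu0)
    or "two-sided" (H1: mu != mu0).
    """
    z = (xbar - mu0) / (sd / math.sqrt(n))
    std_normal = NormalDist()
    if alternative == "less":
        reject = z < -std_normal.inv_cdf(1 - alpha)
    elif alternative == "greater":
        reject = z > std_normal.inv_cdf(1 - alpha)
    elif alternative == "two-sided":
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")
    return z, reject

# Aptitude test: n = 45, sigma = 8.6, xbar = 76.7, H1: mu > 73.2, alpha = 0.01
z1, reject1 = z_test_one_mean(76.7, 73.2, 8.6, 45, 0.01, "greater")

# Tires: n = 49, sigma = 1200, xbar = 15200, H1: mu != 15150, alpha = 0.05
z3, reject3 = z_test_one_mean(15200, 15150, 1200, 49, 0.05, "two-sided")

# Trucking firm: n = 40, s = 1348, xbar = 27463, H1: mu < 28000, alpha = 0.01
z6, reject6 = z_test_one_mean(27463, 28000, 1348, 40, 0.01, "less")

print(round(z1, 2), reject1, round(z3, 4), reject3, round(z6, 2), reject6)
```

For small samples the same structure applies with $t$ in place of $Z$, but the critical value must come from the t table since the standard library does not provide the t-distribution.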
Examples:
(1) According to the norms established for a mechanical aptitude test, persons who are 18 years old should average 73.2 with a standard deviation of 8.6. If 45 randomly selected persons of that age averaged 76.7, test the null hypothesis $\mu = 73.2$ against the alternative hypothesis $\mu > 73.2$ at the 0.01 level of significance.
Here $n = 45 \ge 30$ (large sample), $\mu = 73.2$, $\sigma = 8.6$, $\bar{x} = 76.7$ and $\alpha = 0.01$.
(i) Null Hypothesis $H_0: \mu = 73.2$
(ii) Alternative Hypothesis $H_1: \mu > 73.2$
(iii) Level of Significance: $\alpha = 0.01$
(iv) Test statistic: $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
(v) Criterion: Reject $H_0$ if $Z > z_\alpha$, where $z_\alpha = z_{0.01} = 2.33$
(vi) Calculation: $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{76.7 - 73.2}{8.6/\sqrt{45}} = 2.73$ and $z_\alpha = z_{0.01} = 2.33$
(vii) Decision: Since $Z > z_\alpha$, reject $H_0$ based on the sample data at $\alpha = 0.01$.
Note: (a) If $\bar{x} = 75.7$, then $Z = 1.95$; therefore $Z < z_\alpha$ and we accept $H_0$ at $\alpha = 0.01$.
(b) If $\bar{x} = 75.7$ and $\alpha = 0.05$, then $Z = 1.95$ and $z_{0.05} = 1.645$; therefore $Z > z_\alpha$ and we reject $H_0$.
(2) In a labor-management discussion it was brought up that workers at a certain large plant take on average 32.6 minutes to get to work. If a random sample of 60 workers took on the average 33.8 minutes with a standard deviation of 6.1 minutes, can we reject the null hypothesis $\mu = 32.6$ in favor of the alternative hypothesis $\mu > 32.6$ at the 0.05 level of significance?
Here $n = 60 \ge 30$ (large sample), $\mu = 32.6$, $s = 6.1$, $\bar{x} = 33.8$ and $\alpha = 0.05$.
(i) Null Hypothesis $H_0: \mu = 32.6$
(ii) Alternative Hypothesis $H_1: \mu > 32.6$
(iii) Level of Significance: $\alpha = 0.05$
(iv) Test statistic: $Z = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
(v) Criterion: Reject $H_0$ if $Z > z_\alpha$, where $z_\alpha = z_{0.05} = 1.645$
(vi) Calculation: $Z = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{33.8 - 32.6}{6.1/\sqrt{60}} = 1.5238$ and $z_\alpha = z_{0.05} = 1.645$
(vii) Decision: Since $Z < z_\alpha$, accept $H_0$ based on the sample data at $\alpha = 0.05$.
(3) It is claimed that a random sample of 49 tires has a mean life of 15200 km. This sample was drawn from a population whose mean is 15150 km and standard deviation 1200 km. Test the claim at the 0.05 level of significance.
Here $n = 49 \ge 30$ (large sample), $\mu = 15150$, $\sigma = 1200$, $\bar{x} = 15200$ and $\alpha = 0.05$.
(i) Null Hypothesis $H_0: \mu = 15150$
(ii) Alternative Hypothesis $H_1: \mu \ne 15150$
(iii) Level of Significance: $\alpha = 0.05$
(iv) Test statistic: $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
(v) Criterion: Reject $H_0$ if $|Z| > z_{\alpha/2}$, where $z_{\alpha/2} = z_{0.025} = 1.96$
(vi) Calculation: $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{15200 - 15150}{1200/\sqrt{49}} = 0.2917$, $|Z| = 0.2917$ and $z_{\alpha/2} = z_{0.025} = 1.96$
(vii) Decision: Since $|Z| < z_{\alpha/2}$, accept $H_0$ based on the sample data at $\alpha = 0.05$.
Note: (i) If $\bar{x} = 15300$, then $Z = 0.875$; accept $H_0$. (ii) If $\bar{x} = 15900$, then $Z = 4.375$; reject $H_0$.
(4) An ambulance service claims that it takes on the average less than 10 minutes to reach its destination in emergency calls. A sample of 36 calls has a mean of 11 minutes and a variance of 16. Test the significance at the 0.05 level.
Here $n = 36 \ge 30$ (large sample), $\mu = 10$, $\bar{x} = 11$, $s^2 = 16$, $s = 4$ and $\alpha = 0.05$.
(i) Null Hypothesis $H_0: \mu = 10$
(ii) Alternative Hypothesis $H_1: \mu < 10$
(iii) Level of Significance: $\alpha = 0.05$
(iv) Test statistic: $Z = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
(v) Criterion: Reject $H_0$ if $Z < -z_\alpha$, where $-z_\alpha = -z_{0.05} = -1.645$
(vi) Calculation: $Z = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{11 - 10}{4/\sqrt{36}} = 1.5$ and $-z_\alpha = -z_{0.05} = -1.645$
(vii) Decision: Since $Z > -z_\alpha$, accept $H_0$ based on the sample data at $\alpha = 0.05$.
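(5) A sample of 400 items is taken from a population whose standard deviation is 10. The mean of the sample is 40. Test whether the sample has come from a population with mean 38 at the 0.05 level of significance.
We have to test $\mu = 38$.
Here $n = 400 \ge 30$ (large sample), $\mu = 38$, $\sigma = 10$, $\bar{x} = 40$ and $\alpha = 0.05$.
(i) Null Hypothesis $H_0: \mu = 38$
(ii) Alternative Hypothesis $H_1: \mu \ne 38$
(iii) Level of Significance: $\alpha = 0.05$
(iv) Test statistic: $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
(v) Criterion: Reject $H_0$ if $|Z| > z_{\alpha/2}$, where $z_{\alpha/2} = z_{0.025} = 1.96$
(vi) Calculation: $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{40 - 38}{10/\sqrt{400}} = 4$, $|Z| = 4$ and $z_{\alpha/2} = z_{0.025} = 1.96$
(vii) Decision: Since $|Z| > z_{\alpha/2}$, reject $H_0$ based on the sample data at $\alpha = 0.05$.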
(6) A trucking firm is suspicious of the claim that the average life time of certain tires is at least 28,000 miles. To check the claim, the firm puts 40 of these tires on its trucks and gets a mean life time of 27,463 miles with a standard deviation of 1,348 miles. What can it conclude if the probability of Type I error is to be at most 0.01?
Here $n = 40 \ge 30$ (large sample), $\mu = 28000$, $\bar{x} = 27463$, $s = 1348$ and $\alpha = 0.01$.
Being suspicious of the claim $\mu \ge 28000$ means suspecting that $\mu < 28000$.
That is, we need to check the hypothesis $\mu < 28000$.
(i) Null Hypothesis $H_0: \mu = 28000$
(ii) Alternative Hypothesis $H_1: \mu < 28000$
(iii) Level of Significance: the probability of Type I error is at most 0.01, so we test at the maximum, $\alpha = 0.01$
(iv) Test statistic: $Z = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
(v) Criterion: Reject $H_0$ if $Z < -z_\alpha$, where $-z_\alpha = -z_{0.01} = -2.33$
(vi) Calculation: $Z = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{27463 - 28000}{1348/\sqrt{40}} = -2.52$ and $-z_\alpha = -z_{0.01} = -2.33$
(vii) Decision: Since $Z < -z_\alpha$, reject $H_0$ based on the sample data at $\alpha = 0.01$.
That is, the trucking firm's suspicion is supported by the sample data.
(7) The specifications for a certain kind of ribbon call for a mean breaking strength of 180 pounds. If five pieces of the ribbon have a mean breaking strength of 169.5 pounds with a standard deviation of 5.7 pounds, test the null hypothesis $\mu = 180$ pounds against the alternative hypothesis $\mu < 180$ pounds at the 0.01 level of significance. Assume that the population distribution is normal.
Here $n = 5 < 30$ (small sample), $\nu = n - 1 = 4$, $\mu = 180$, $\bar{x} = 169.5$, $s = 5.7$ and $\alpha = 0.01$.
(i) Null Hypothesis $H_0: \mu = 180$
(ii) Alternative Hypothesis $H_1: \mu < 180$
(iii) Level of Significance: $\alpha = 0.01$
(iv) Test statistic: $t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
(v) Criterion: Reject $H_0$ if $t < -t_\alpha$, where $-t_\alpha = -t_{0.01} = -3.747$
(vi) Calculation: $t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{169.5 - 180}{5.7/\sqrt{5}} = -4.119$ and $-t_\alpha = -t_{0.01} = -3.747$
(vii) Decision: Since $t < -t_\alpha$, reject $H_0$ based on the sample data at $\alpha = 0.01$.
(8) A random sample of 6 steel beams has a mean compressive strength of 58,392 psi with standard deviation 648 psi. Using this information and the level of significance 0.05, test whether the true average compressive strength of the steel from which this sample came is 58,000 psi. Assume normality.
Here $n = 6 < 30$ (small sample), $\nu = n - 1 = 5$, $\mu = 58000$, $\bar{x} = 58392$, $s = 648$ and $\alpha = 0.05$.
(i) Null Hypothesis $H_0: \mu = 58000$
(ii) Alternative Hypothesis $H_1: \mu \ne 58000$
(iii) Level of Significance: $\alpha = 0.05$
(iv) Test statistic: $t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
(v) Criterion: Reject $H_0$ if $|t| > t_{\alpha/2}$, where $t_{\alpha/2} = t_{0.025} = 2.571$
(vi) Calculation: $t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{58392 - 58000}{648/\sqrt{6}} = 1.48179$, $|t| = 1.48179$ and $t_{\alpha/2} = t_{0.025} = 2.571$
(vii) Decision: Since $|t| < t_{\alpha/2}$, accept $H_0$ based on the sample data at $\alpha = 0.05$.
(9) A sample of 26 bulbs gives a mean life of 990 hours with a standard deviation of 20 hours. The manufacturer claims that the mean life of bulbs is at least 1000 hours. Is the sample not up to the standard at the level of significance 0.05?
Here $n = 26 < 30$ (small sample), $\nu = n - 1 = 25$, $\mu = 1000$, $\bar{x} = 990$, $s = 20$ and $\alpha = 0.05$.
Not up to the standard means that $\mu < 1000$.
That is, we need to check the hypothesis $\mu < 1000$.
(i) Null Hypothesis $H_0: \mu = 1000$
(ii) Alternative Hypothesis $H_1: \mu < 1000$
(iii) Level of Significance: $\alpha = 0.05$
(iv) Test statistic: $t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
(v) Criterion: Reject $H_0$ if $t < -t_\alpha$, where $-t_\alpha = -t_{0.05} = -1.708$
(vi) Calculation: $t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{990 - 1000}{20/\sqrt{26}} = -2.5495$ and $-t_\alpha = -t_{0.05} = -1.708$
(vii) Decision: Since $t < -t_\alpha$, reject $H_0$ based on the sample data at $\alpha = 0.05$.
That is, the sample is not up to the standard.
(10) The heights of 10 males of a given locality are found to be 70, 67, 62, 68, 61, 68, 70, 64, 64, 66 inches. Is it reasonable to believe that the average height is greater than 64 inches at the 5% level of significance?
Here $n = 10 < 30$ (small sample), $\nu = n - 1 = 9$, $\mu = 64$ and $\alpha = 0.05$.
Sample mean, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{70 + 67 + 62 + 68 + 61 + 68 + 70 + 64 + 64 + 66}{10} = \frac{660}{10} = 66$
Sample variance, $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
$$= \frac{(70 - 66)^2 + (67 - 66)^2 + (62 - 66)^2 + (68 - 66)^2 + \cdots + (64 - 66)^2 + (66 - 66)^2}{9} = \frac{90}{9} = 10$$
Sample standard deviation, $s = \sqrt{10} = 3.162$
( SD mode, 70M+67M+…66M+,shift2, 1=….for clearing data: on shift mode 3 ==)
(i) Null Hypothesis $H_0: \mu = 64$
(ii) Alternative Hypothesis $H_1: \mu > 64$
(iii) Level of Significance: $\alpha = 0.05$
(iv) Test statistic: $t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
(v) Criterion: Reject $H_0$ if $t > t_\alpha$, where $t_\alpha = t_{0.05} = 1.833$
(vi) Calculation: $t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{66 - 64}{3.162/\sqrt{10}} = 2$ and $t_\alpha = t_{0.05} = 1.833$
(vii) Decision: Since $t > t_\alpha$, reject $H_0$ based on the sample data at $\alpha = 0.05$.
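(11) A random sample of 16 values from a normal population showed a mean of 53 and a sum of squares of deviations from the mean equal to 150. Can this sample be regarded as taken from the population having mean 56 at the 5% level of significance?
Here $n = 16 < 30$ (small sample), $\nu = n - 1 = 15$, $\mu = 56$, $\bar{x} = 53$, $\sum_{i=1}^{n}(x_i - \bar{x})^2 = 150$ and $\alpha = 0.05$.
Sample variance, $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{150}{15} = 10$
Sample standard deviation, $s = \sqrt{10} = 3.162$
(i) Null Hypothesis $H_0: \mu = 56$
(ii) Alternative Hypothesis $H_1: \mu \ne 56$
(iii) Level of Significance: $\alpha = 0.05$
(iv) Test statistic: $t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
(v) Criterion: Reject $H_0$ if $|t| > t_{\alpha/2}$, where $t_{\alpha/2} = t_{0.025} = 2.131$
(vi) Calculation: $t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{53 - 56}{3.162/\sqrt{16}} = -3.795$, $|t| = 3.795$ and $t_{\alpha/2} = t_{0.025} = 2.131$
(vii) Decision: Since $|t| > t_{\alpha/2}$, reject $H_0$ based on the sample data at $\alpha = 0.05$.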
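(12) A machine is expected to produce nails of length 7 cm. A random sample of 8 nails was found to measure 7.2, 7.3, 7.1, 6.9, 6.8, 6.5, 6.9 and 6.5 cm respectively. On the basis of this sample, what can we say about the reliability of the machine at the 1% level of significance?
Here $n = 8 < 30$ (small sample), $\nu = n - 1 = 7$, $\mu = 7$ and $\alpha = 0.01$.
Sample mean, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{7.2 + 7.3 + 7.1 + 6.9 + 6.8 + 6.5 + 6.9 + 6.5}{8} = \frac{55.2}{8} = 6.9$
Sample variance, $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
$$= \frac{1}{7}\left[(7.2 - 6.9)^2 + (7.3 - 6.9)^2 + (7.1 - 6.9)^2 + (6.9 - 6.9)^2 + (6.8 - 6.9)^2 + (6.5 - 6.9)^2 + (6.9 - 6.9)^2 + (6.5 - 6.9)^2\right]$$
$$= \frac{0.09 + 0.16 + 0.04 + 0 + 0.01 + 0.16 + 0 + 0.16}{7} = \frac{0.62}{7} = 0.0886$$
Sample standard deviation, $s = \sqrt{0.0886} = 0.2977$
(i) Null Hypothesis $H_0: \mu = 7$
(ii) Alternative Hypothesis $H_1: \mu \ne 7$
(iii) Level of Significance: $\alpha = 0.01$
(iv) Test statistic: $t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
(v) Criterion: Reject $H_0$ if $|t| > t_{\alpha/2}$, where $t_{\alpha/2} = t_{0.005} = 3.499$
(vi) Calculation: $t = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{6.9 - 7}{0.2977/\sqrt{8}} = -0.95$, $|t| = 0.95$ and $t_{\alpha/2} = t_{0.005} = 3.499$
(vii) Decision: Since $|t| < t_{\alpha/2}$, accept $H_0$ based on the sample data at $\alpha = 0.01$.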
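When the raw observations are given, as in Examples (10) and (12), the sample mean, the sample standard deviation and the t statistic can be computed directly from the data. The sketch below is an illustration with a hypothetical function name; the critical values 1.833 and 3.499 are the t-table values quoted in those examples and are passed in as constants because the standard library has no t-distribution.

```python
import math
import statistics

def t_statistic(data, mu0):
    """t = (xbar - mu0) / (s / sqrt(n)), with s computed using divisor n - 1."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divisor n - 1)
    return (xbar - mu0) / (s / math.sqrt(n))

# Example (10): H1: mu > 64, table value t_{0.05} = 1.833 for nu = 9
heights = [70, 67, 62, 68, 61, 68, 70, 64, 64, 66]
t10 = t_statistic(heights, 64)
reject10 = t10 > 1.833

# Example (12): H1: mu != 7, table value t_{0.005} = 3.499 for nu = 7
nails = [7.2, 7.3, 7.1, 6.9, 6.8, 6.5, 6.9, 6.5]
t12 = t_statistic(nails, 7)
reject12 = abs(t12) > 3.499

print(round(t10, 3), reject10, round(t12, 3), reject12)
```

The decisions should match the worked solutions: $H_0$ is rejected in Example (10) and accepted in Example (12).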
Exercise:
(1) Define (i) Hypothesis (ii) Simple hypothesis (iii) Composite hypothesis (iv) Null hypothesis
(v) Alternative hypothesis
(2) Explain the errors in the testing of a hypothesis
(3) Write the guidelines for formulating $H_0$ and $H_1$
(4) Explain the procedure for testing a hypothesis
(5) Write the test statistic and criteria for testing a hypothesis concerning one mean, for large and small samples
(6) According to the norms established for a mechanical aptitude test, persons who are 18 years old should
average 76.4 with a standard deviation of 9.2. If 49 randomly selected persons of that age averaged
70.2, test the null hypothesis $\mu = 76.4$ against the alternative hypothesis $\mu < 76.4$ at the 0.01 level
of significance
(7) Tests performed with a random sample of 40 diesel engines produced by a large manufacturer show
that they have a mean thermal efficiency of 31.4% with a standard deviation of 1.6%. At the 0.01 level
of significance, test the null hypothesis $\mu = 32.3\%$ against the alternative hypothesis $\mu \ne 32.3\%$
(8) In 64 randomly selected hours of production, the mean and the standard deviation of the number of
acceptable pieces produced by an automatic stamping machine are $\bar{x} = 1038$ and $s = 146$. At the 0.05
level of significance, does this enable us to reject the null hypothesis $\mu = 1000$ against the alternative
hypothesis $\mu > 1000$?
(9) A company producing computers states that the mean life time of its computers is 1600 hours. Test
the claim at the 0.01 level of significance against the alternative hypothesis $H_1 : \mu < 1600$ hrs, if 100
computers produced by this company have a mean life time of 1570 hours with an SD of 120 hours
(10) In a labor-management discussion it was brought up that workers at a certain large plant take on
average 32.6 minutes to get to work. If a random sample of 60 workers took on the average 34.1
minutes with a standard deviation of 6.1 minutes, can we reject the null hypothesis $\mu = 32.6$ in
favor of the alternative hypothesis $\mu > 32.6$ at the (i) 0.05 (ii) 0.01 level of significance?
(11) An ambulance service claims that it takes on the average less than 10 minutes to reach its destination
in emergency calls. A sample of 36 calls has a mean of 9 minutes and a variance of 16.
Test the significance at the 0.05 level (Hint: $n = 36$, $H_0 : \mu = 10$, $H_1 : \mu < 10$)
(12) A sample of 64 students has a mean weight of 70 kg. Can this be regarded as a sample from a
population with mean weight 56 kg and standard deviation 25 kg? Test at the 0.05 level of significance
(Hint: $n = 64$, $H_0 : \mu = 56$, $H_1 : \mu \ne 56$)
(13) A sample of 400 individuals is found to have a mean height of 67.47 inches. Can it be reasonably
regarded as a sample from a large population with mean height of 67.39 inches and standard
deviation 1.30 inches? (Hint: $n = 400$, $H_0 : \mu = 67.39$, $H_1 : \mu \ne 67.39$)
(14) A lady stenographer claims that she can take dictation at the rate of 120 words per minute. Can we reject
her claim on the basis of 100 trials in which she demonstrates a mean of 116 words with a standard
deviation of 15 words? (Hint: $n = 100$, $H_0 : \mu = 120$, $H_1 : \mu < 120$)
(15) A company is making engine parts with an axle diameter of 0.7 inch. A random sample of 10 parts
shows a mean diameter of 0.742 inch with a standard deviation of 0.04 inch. Test whether the work
meets the specification at the 0.05 level of significance. (Hint: $n = 10$, $H_0 : \mu = 0.7$, $H_1 : \mu \ne 0.7$)
(16) A machine is designed to produce insulating washers for electrical devices with an average thickness of
0.025 cm. A random sample of 10 washers was found to have a mean thickness of 0.024 cm with a standard
deviation of 0.002 cm. Test the significance of the deviation at the 0.05 level.
(Hint: $n = 10$, $H_0 : \mu = 0.025$, $H_1 : \mu \ne 0.025$)
(17) The average breaking strength of a certain kind of steel rod is 18.5 thousand pounds. To test this, a
sample of 14 steel rods of this kind was tested. The mean and standard deviation obtained are
17.85 and 1.955 thousand pounds respectively. Test at the 0.05 level of significance.
(Hint: $n = 14$, $H_0 : \mu = 18.5$, $H_1 : \mu \ne 18.5$)
(18) A random sample from a company's very extensive files shows that the orders for a certain kind of
machinery were filled, respectively, in 10, 12, 19, 14, 15, 18, 11 and 13 days. Use the 0.01 level of
significance to test the claim that on the average such orders are filled in 10.5 days. Choose the
alternative hypothesis so that rejection of the null hypothesis implies that it takes longer than indicated.
(Hint: $n = 8$, $H_0 : \mu = 10.5$, $H_1 : \mu > 10.5$, $\bar{x} = 14$, $s = 3.207$)
(19) A manufacturer of a certain kind of electric bulb claims that his bulbs have a mean life of 25 months.
A random sample of 6 bulbs gave lives of 24, 26, 30, 20, 20 and 18 months. Can you accept the
manufacturer's claim at the 5% level of significance?
(Hint: $n = 6$, $H_0 : \mu = 25$, $H_1 : \mu \ne 25$, $\bar{x} = 23$, $s = 4.52$)
(20) A random sample of 8 envelopes is taken from a letter box of a post office and their weights in grams
are found to be 12.1, 11.9, 12.4, 12.3, 11.9, 12.1, 12.4 and 12.1. Does this sample indicate at the 1% level
of significance that the average weight of envelopes received at the post office is 12.35 grams?
(Hint: $n = 8$, $H_0 : \mu = 12.35$, $H_1 : \mu \ne 12.35$, $\bar{x} = 12.15$, $s = 0.2$)
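Test of Hypothesis – Two Means:
Case (i): For large samples ($n_1 \ge 30$, $n_2 \ge 30$),
(i) Null Hypothesis $H_0 : \mu_1 - \mu_2 = \delta$
(ii) Alternative Hypothesis $H_1$ : Any one of the following $\mu_1 - \mu_2 < \delta$, $\mu_1 - \mu_2 > \delta$, $\mu_1 - \mu_2 \ne \delta$
(iii) Level of Significance : $\alpha$
(iv) Test statistic : $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - \delta}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
(v) Criterion :
$H_1$ | Reject $H_0$ if
$\mu_1 - \mu_2 < \delta$ | $Z < -z_\alpha$
$\mu_1 - \mu_2 > \delta$ | $Z > z_\alpha$
$\mu_1 - \mu_2 \ne \delta$ | $|Z| > z_{\alpha/2}$
Note: If $n_1 \ge 30$, $n_2 \ge 30$ and $\sigma_1$, $\sigma_2$ are not known, then we can use $s_1$, $s_2$ in place of $\sigma_1$, $\sigma_2$ respectively
Case (ii): For small samples ($n_1 < 30$, $n_2 < 30$) and $\sigma_1$, $\sigma_2$ unknown,
(1) Null Hypothesis $H_0 : \mu_1 - \mu_2 = \delta$
(2) Alternative Hypothesis $H_1$ : Any one of the following $\mu_1 - \mu_2 < \delta$, $\mu_1 - \mu_2 > \delta$, $\mu_1 - \mu_2 \ne \delta$
(3) Level of Significance : $\alpha$
(4) Test statistic : $t = \dfrac{(\bar{X}_1 - \bar{X}_2) - \delta}{\sqrt{\dfrac{\sigma_p^2}{n_1} + \dfrac{\sigma_p^2}{n_2}}}$ with $\nu = n_1 + n_2 - 2$
where $\sigma_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
or $\sigma_p^2 = \dfrac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}$
(5) Criterion :
$H_1$ | Reject $H_0$ if
$\mu_1 - \mu_2 < \delta$ | $t < -t_\alpha$
$\mu_1 - \mu_2 > \delta$ | $t > t_\alpha$
$\mu_1 - \mu_2 \ne \delta$ | $|t| > t_{\alpha/2}$
Note: (i) $\sigma_p^2$ is known as the variance by pooling
(ii) If $n_1 + n_2 - 2 \ge 30$, then we can use $t_\alpha \approx z_\alpha$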
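The two test statistics can be written as small functions, which may help when checking hand calculations. This is a sketch only: the function names are illustrative, the default `delta=0.0` corresponds to testing equality of the means, and the sample values in the usage lines are taken from worked examples (1) and (5) of this unit.

```python
import math

def z_two_means(x1, x2, sd1, sd2, n1, n2, delta=0.0):
    """Large-sample statistic: ((x1 - x2) - delta) / sqrt(sd1^2/n1 + sd2^2/n2)."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return ((x1 - x2) - delta) / se

def pooled_variance(s1, s2, n1, n2):
    """Variance by pooling: ((n1-1)s1^2 + (n2-1)s2^2) / (n1 + n2 - 2)."""
    return ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)

def t_two_means(x1, x2, s1, s2, n1, n2, delta=0.0):
    """Small-sample statistic with nu = n1 + n2 - 2 degrees of freedom."""
    sp2 = pooled_variance(s1, s2, n1, n2)
    se = math.sqrt(sp2 / n1 + sp2 / n2)
    return ((x1 - x2) - delta) / se

z = z_two_means(67.5, 68.0, 2.5, 2.5, 1000, 2000)   # example (1)
t = t_two_means(107, 112, 10, 8, 16, 14)            # example (5)
print(f"Z = {z:.4f}, t = {t:.4f}")
```

The critical values $z_\alpha$ and $t_\alpha$ are still read from tables and compared by hand according to the criterion for the chosen $H_1$.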
Confidence Interval - Two Means:
Case (i): For large samples ($n_1 \ge 30$, $n_2 \ge 30$),
with the probability $1 - \alpha$ or $(1 - \alpha)100\%$ confidence,
Upper and Lower confidence limits: $(\bar{x}_1 - \bar{x}_2) \pm E$
Confidence Interval: $\left((\bar{x}_1 - \bar{x}_2) - E,\; (\bar{x}_1 - \bar{x}_2) + E\right)$ where $E = z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
Case (ii): For small samples ($n_1 < 30$, $n_2 < 30$),
with the probability $1 - \alpha$ or $(1 - \alpha)100\%$ confidence,
Upper and Lower confidence limits: $(\bar{x}_1 - \bar{x}_2) \pm E$
Confidence Interval: $\left((\bar{x}_1 - \bar{x}_2) - E,\; (\bar{x}_1 - \bar{x}_2) + E\right)$ where $E = t_{\alpha/2}\sqrt{\dfrac{\sigma_p^2}{n_1} + \dfrac{\sigma_p^2}{n_2}}$ with $\nu = n_1 + n_2 - 2$
and $\sigma_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
or $\sigma_p^2 = \dfrac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}$
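Test statistics and Critical regions for tests of Hypotheses for ONE & TWO means
S.No | Test of Hypothesis | Test Statistic | $H_1$ | Reject $H_0$ if
1 | One Mean, Large Samples | $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ | $\mu < \mu_0$ | $Z < -z_\alpha$
  |  |  | $\mu > \mu_0$ | $Z > z_\alpha$
  |  |  | $\mu \ne \mu_0$ | $|Z| > z_{\alpha/2}$
2 | One Mean, Small Samples | $t = \dfrac{\bar{X} - \mu}{s/\sqrt{n}}$, $\nu = n - 1$ | $\mu < \mu_0$ | $t < -t_\alpha$
  |  |  | $\mu > \mu_0$ | $t > t_\alpha$
  |  |  | $\mu \ne \mu_0$ | $|t| > t_{\alpha/2}$
3 | Two Means, Large Samples | $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - \delta}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$ | $\mu_1 - \mu_2 < \delta$ | $Z < -z_\alpha$
  |  |  | $\mu_1 - \mu_2 > \delta$ | $Z > z_\alpha$
  |  |  | $\mu_1 - \mu_2 \ne \delta$ | $|Z| > z_{\alpha/2}$
4 | Two Means, Small Samples | $t = \dfrac{(\bar{X}_1 - \bar{X}_2) - \delta}{\sqrt{\sigma_p^2/n_1 + \sigma_p^2/n_2}}$, $\nu = n_1 + n_2 - 2$ | $\mu_1 - \mu_2 < \delta$ | $t < -t_\alpha$
  |  |  | $\mu_1 - \mu_2 > \delta$ | $t > t_\alpha$
  |  |  | $\mu_1 - \mu_2 \ne \delta$ | $|t| > t_{\alpha/2}$
In rows 1 and 2, $\mu$ in the test statistic takes the value $\mu_0$ specified by $H_0$.
where, $\sigma_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
or $\sigma_p^2 = \dfrac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}$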
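Maximum Error & Confidence Interval
S.No. | Case | Maximum Error | Confidence Interval | Probability / Confidence
1 | Two Means, Large Samples | $E = z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$ | $\left((\bar{x}_1 - \bar{x}_2) - E,\; (\bar{x}_1 - \bar{x}_2) + E\right)$ | $1 - \alpha$ / $(1 - \alpha)100\%$
2 | Two Means, Small Samples | $E = t_{\alpha/2}\sqrt{\dfrac{\sigma_p^2}{n_1} + \dfrac{\sigma_p^2}{n_2}}$ | $\left((\bar{x}_1 - \bar{x}_2) - E,\; (\bar{x}_1 - \bar{x}_2) + E\right)$ | $1 - \alpha$ / $(1 - \alpha)100\%$
where $\sigma_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
or $\sigma_p^2 = \dfrac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}$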
Examples:
(1) The means of two large samples of sizes 1000 and 2000 members are 67.5 inches and 68.0 inches
respectively. Test whether the samples are drawn from the same population of standard deviation 2.5
inches at the 0.05 level of significance
Here $n_1 = 1000$, $n_2 = 2000$ (large samples), $\bar{x}_1 = 67.5$, $\bar{x}_2 = 68$, $\sigma = 2.5$ and $\alpha = 0.05$
We need to check $\mu_1 = \mu_2$ or $\mu_1 - \mu_2 = 0$
(i) Null Hypothesis $H_0 : \mu_1 - \mu_2 = 0 \; (= \delta)$
(ii) Alternative Hypothesis $H_1 : \mu_1 - \mu_2 \ne 0$
(iii) Level of Significance : $\alpha = 0.05$
(iv) Test statistic : $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - \delta}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
(v) Criterion : Reject $H_0$ if $|Z| > z_{\alpha/2}$
(vi) Calculation :
$Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - \delta}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{67.5 - 68}{\sqrt{\dfrac{(2.5)^2}{1000} + \dfrac{(2.5)^2}{2000}}} = -5.16$, $|Z| = 5.16$
And $z_{\alpha/2} = z_{0.025} = 1.96$
(vii) Decision: Since $|Z| > z_{\alpha/2}$, reject $H_0$ based on the sample data at $\alpha = 0.05$
That is, the samples are not drawn from the same population
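(2) Samples of students were drawn from 2 Universities and, from their weights in kg, the means and
standard deviations were calculated as given below. Use a large-sample test to check the significance of the
difference between the means at the 0.05 level of significance
             | Mean | SD | Sample size
University A | 55   | 10 | 400
University B | 57   | 15 | 100
Here $n_1 = 400$, $n_2 = 100$ (large samples), $\bar{x}_1 = 55$, $\bar{x}_2 = 57$, $s_1 = 10$, $s_2 = 15$ and $\alpha = 0.05$
(i) Null Hypothesis $H_0 : \mu_1 - \mu_2 = 0 \; (= \delta)$
(ii) Alternative Hypothesis $H_1 : \mu_1 - \mu_2 \ne 0$
(iii) Level of Significance : $\alpha = 0.05$
(iv) Test statistic : $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - \delta}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
(v) Criterion : Reject $H_0$ if $|Z| > z_{\alpha/2}$
(vi) Calculation :
$Z = \dfrac{55 - 57}{\sqrt{\dfrac{10^2}{400} + \dfrac{15^2}{100}}} = -1.2649$, $|Z| = 1.2649$
And $z_{\alpha/2} = z_{0.025} = 1.96$
(vii) Decision: Since $|Z| < z_{\alpha/2}$, accept $H_0$ based on the sample data at $\alpha = 0.05$
That is, there is no significant difference between the means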
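(3) A random sample of 1000 men from North India shows that the mean wage is Rs. 5/- per day with a
standard deviation of Rs. 1.5/-. A sample of 1500 men from South India gives a mean wage of Rs. 4.5/- per
day with a standard deviation of Rs. 2/-. Does the mean rate of wages vary between the regions? Choose
the level of significance 0.01
Here $n_1 = 1000$, $n_2 = 1500$ (large samples), $\bar{x}_1 = 5$, $\bar{x}_2 = 4.5$, $s_1 = 1.5$, $s_2 = 2$ and $\alpha = 0.01$
(i) Null Hypothesis $H_0 : \mu_1 - \mu_2 = 0 \; (= \delta)$
(ii) Alternative Hypothesis $H_1 : \mu_1 - \mu_2 \ne 0$
(iii) Level of Significance : $\alpha = 0.01$
(iv) Test statistic : $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - \delta}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
(v) Criterion : Reject $H_0$ if $|Z| > z_{\alpha/2}$
(vi) Calculation :
$Z = \dfrac{5 - 4.5}{\sqrt{\dfrac{(1.5)^2}{1000} + \dfrac{2^2}{1500}}} = 7.1307$, $|Z| = 7.1307$
And $z_{\alpha/2} = z_{0.005} = 2.575$
(vii) Decision: Since $|Z| > z_{\alpha/2}$, reject $H_0$ based on the sample data at $\alpha = 0.01$
That is, the mean rate of wages varies between the regions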
(4) The average marks scored by 32 boys is 72 with a standard deviation of 8, while that for 36 girls is
70 with a standard deviation of 6. Does this indicate that the boys perform better than the girls at the
0.05 level of significance?
Here $n_1 = 32$, $n_2 = 36$ (large samples), $\bar{x}_1 = 72$, $\bar{x}_2 = 70$, $s_1 = 8$, $s_2 = 6$ and $\alpha = 0.05$
(i) Null Hypothesis $H_0 : \mu_1 - \mu_2 = 0 \; (= \delta)$
(ii) Alternative Hypothesis $H_1 : \mu_1 > \mu_2$ or $\mu_1 - \mu_2 > 0$
(iii) Level of Significance : $\alpha = 0.05$
(iv) Test statistic : $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - \delta}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
(v) Criterion : Reject $H_0$ if $Z > z_\alpha$
(vi) Calculation :
$Z = \dfrac{72 - 70}{\sqrt{\dfrac{8^2}{32} + \dfrac{6^2}{36}}} = 1.1547$
And $z_\alpha = z_{0.05} = 1.645$
(vii) Decision: Since $Z < z_\alpha$, accept $H_0$ based on the sample data at $\alpha = 0.05$
That is, the performance of boys and girls is the same
(5) The IQs of 16 students from one area of a city showed a mean of 107 with a standard deviation of 10,
while the IQs of 14 students from another area of the city showed a mean of 112 with a standard
deviation of 8. Is there a significant difference between the IQs of the two groups at the 0.05 level of
significance?
Here $n_1 = 16$, $n_2 = 14$ (small samples), $\nu = n_1 + n_2 - 2 = 28$, $\bar{x}_1 = 107$, $\bar{x}_2 = 112$, $s_1 = 10$, $s_2 = 8$ and $\alpha = 0.05$
(i) Null Hypothesis $H_0 : \mu_1 - \mu_2 = 0 \; (= \delta)$
(ii) Alternative Hypothesis $H_1 : \mu_1 - \mu_2 \ne 0$
(iii) Level of Significance : $\alpha = 0.05$
(iv) Test statistic : $t = \dfrac{(\bar{X}_1 - \bar{X}_2) - \delta}{\sqrt{\dfrac{\sigma_p^2}{n_1} + \dfrac{\sigma_p^2}{n_2}}}$ where $\sigma_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
(v) Criterion : Reject $H_0$ if $|t| > t_{\alpha/2}$
(vi) Calculation :
$\sigma_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \dfrac{15(10)^2 + 13(8)^2}{28} = \dfrac{2332}{28} = 83.29$
$t = \dfrac{107 - 112}{\sqrt{\dfrac{83.29}{16} + \dfrac{83.29}{14}}} = -1.4971$, $|t| = 1.4971$
And $t_{\alpha/2} = t_{0.025} = 2.048$
(vii) Decision: Since $|t| < t_{\alpha/2}$, accept $H_0$ based on the sample data at $\alpha = 0.05$
That is, there is no significant difference between the IQs of the two groups
(6) A group of 5 patients treated with medicine 'A' weigh 42, 39, 48, 60 and 41 kg. A second group of 7
patients from the same hospital treated with medicine 'B' weigh 38, 42, 56, 64, 68, 69 and 62 kg. Do
you agree with the claim that medicine 'B' increases the weight significantly at the 0.05 level of
significance?
Here $n_1 = 5$, $n_2 = 7$ (small samples), $\nu = n_1 + n_2 - 2 = 10$ and $\alpha = 0.05$
$\bar{x}_1 = \frac{1}{5}(42 + 39 + 48 + 60 + 41) = \frac{230}{5} = 46$
$\bar{x}_2 = \frac{1}{7}(38 + 42 + 56 + 64 + 68 + 69 + 62) = \frac{399}{7} = 57$
Sum of the squares of the deviations of the 1st sample, $\sum (x_1 - \bar{x}_1)^2 = 16 + 49 + 4 + 196 + 25 = 290$
Sum of the squares of the deviations of the 2nd sample,
$\sum (x_2 - \bar{x}_2)^2 = 361 + 225 + 1 + 49 + 121 + 144 + 25 = 926$
And $\sigma_p^2 = \dfrac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2} = \dfrac{290 + 926}{10} = 121.6$
(i) Null Hypothesis $H_0 : \mu_1 - \mu_2 = 0 \; (= \delta)$
(ii) Alternative Hypothesis $H_1 : \mu_1 - \mu_2 < 0$
(iii) Level of Significance : $\alpha = 0.05$
(iv) Test statistic : $t = \dfrac{(\bar{X}_1 - \bar{X}_2) - \delta}{\sqrt{\dfrac{\sigma_p^2}{n_1} + \dfrac{\sigma_p^2}{n_2}}}$ where $\sigma_p^2 = \dfrac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}$
(v) Criterion : Reject $H_0$ if $t < -t_\alpha$ and $-t_\alpha = -t_{0.05} = -1.812$
(vi) Calculation :
$t = \dfrac{46 - 57}{\sqrt{\dfrac{121.6}{5} + \dfrac{121.6}{7}}} = -1.7036$
And $-t_\alpha = -t_{0.05} = -1.812$
(vii) Decision: Since $t > -t_\alpha$, accept $H_0$ based on the sample data at $\alpha = 0.05$
That is, there is no significant increase in the weights
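When the raw observations are given, as in the medicine example, the pooled variance can be computed directly from the sums of squared deviations. The following is a sketch under that assumption; the function name `pooled_t_from_data` is illustrative, and the critical value 1.812 is the table value of $t_{0.05}$ for 10 degrees of freedom.

```python
import math
import statistics

def pooled_t_from_data(a, b, delta=0.0):
    """Return (t, pooled variance, degrees of freedom) for two independent samples."""
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.mean(a), statistics.mean(b)
    ss1 = sum((x - m1) ** 2 for x in a)     # sum of squared deviations, sample 1
    ss2 = sum((x - m2) ** 2 for x in b)     # sum of squared deviations, sample 2
    df = n1 + n2 - 2
    sp2 = (ss1 + ss2) / df
    t = ((m1 - m2) - delta) / math.sqrt(sp2 / n1 + sp2 / n2)
    return t, sp2, df

medicine_a = [42, 39, 48, 60, 41]
medicine_b = [38, 42, 56, 64, 68, 69, 62]
t, sp2, df = pooled_t_from_data(medicine_a, medicine_b)

T_CRITICAL = 1.812                  # t_{0.05}, 10 degrees of freedom, from a t-table
reject_h0 = t < -T_CRITICAL         # left-tailed test, H1: mu1 - mu2 < 0
print(f"t={t:.4f}  pooled variance={sp2:.1f}  df={df}  reject H0: {reject_h0}")
```

For a left-tailed alternative the statistic is compared with the negative critical value, not its absolute value.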
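(7) Two independent samples of sizes 8 and 7 have the following values
Sample I  | 11 11 13 11 15 9 12 14
Sample II | 9 11 10 13 9 8 10
Is the difference between the means of the samples significant?
Here $n_1 = 8$, $n_2 = 7$ (small samples), $\nu = n_1 + n_2 - 2 = 13$
Since $\alpha$ is not given, we can take $\alpha = 0.05$
$\bar{x}_1 = \frac{1}{8}(11 + 11 + 13 + 11 + 15 + 9 + 12 + 14) = \frac{96}{8} = 12$
$\bar{x}_2 = \frac{1}{7}(9 + 11 + 10 + 13 + 9 + 8 + 10) = \frac{70}{7} = 10$
$\sum (x_1 - \bar{x}_1)^2 = 1 + 1 + 1 + 1 + 9 + 9 + 0 + 4 = 26$
$\sum (x_2 - \bar{x}_2)^2 = 1 + 1 + 0 + 9 + 1 + 4 + 0 = 16$
And $\sigma_p^2 = \dfrac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2} = \dfrac{26 + 16}{13} = 3.2308$
(i) Null Hypothesis $H_0 : \mu_1 - \mu_2 = 0 \; (= \delta)$
(ii) Alternative Hypothesis $H_1 : \mu_1 - \mu_2 \ne 0$
(iii) Level of Significance : $\alpha = 0.05$
(iv) Test statistic : $t = \dfrac{(\bar{X}_1 - \bar{X}_2) - \delta}{\sqrt{\dfrac{\sigma_p^2}{n_1} + \dfrac{\sigma_p^2}{n_2}}}$ where $\sigma_p^2 = \dfrac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}$
(v) Criterion : Reject $H_0$ if $|t| > t_{\alpha/2}$
(vi) Calculation :
$t = \dfrac{12 - 10}{\sqrt{\dfrac{3.2308}{8} + \dfrac{3.2308}{7}}} = 2.15$ and $|t| = 2.15$
And $t_{\alpha/2} = t_{0.025} = 2.16$
(vii) Decision: Since $|t| < t_{\alpha/2}$, accept $H_0$ based on the sample data at $\alpha = 0.05$
That is, there is no significant difference in the means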
(8) The means and standard deviations of two samples of sizes 200 and 100 are respectively 60
and 50, 8 and 12. Find the (i) standard error (ii) maximum error (iii) 95% confidence interval of the
difference between the means
Here $n_1 = 200$, $n_2 = 100$ (large samples), $\bar{x}_1 = 60$, $\bar{x}_2 = 50$, $s_1 = 8$, $s_2 = 12$
Now, $\alpha/2 = 0.025$ and $z_{\alpha/2} = z_{0.025} = 1.96$
Standard error, $SE = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} = \sqrt{\dfrac{64}{200} + \dfrac{144}{100}} = 1.3266$
Maximum error, $E = z_{\alpha/2}\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} = (1.96)\sqrt{\dfrac{64}{200} + \dfrac{144}{100}} = 2.6$
Lower confidence limit, $(\bar{x}_1 - \bar{x}_2) - E = (60 - 50) - 2.6 = 7.4$
Upper confidence limit, $(\bar{x}_1 - \bar{x}_2) + E = (60 - 50) + 2.6 = 12.6$
Therefore, confidence interval $\left((\bar{x}_1 - \bar{x}_2) - E,\; (\bar{x}_1 - \bar{x}_2) + E\right) = (7.4,\; 12.6)$
(9) The means and standard deviations of two samples of sizes 16 and 14 are respectively 107
and 112, 10 and 8. Find the (i) standard error (ii) maximum error (iii) 95% confidence interval of the
difference between the means
Here $n_1 = 16$, $n_2 = 14$ (small samples), $\nu = n_1 + n_2 - 2 = 28$, $\bar{x}_1 = 107$, $\bar{x}_2 = 112$, $s_1 = 10$, $s_2 = 8$
Now, $\alpha/2 = 0.025$ and $t_{\alpha/2} = t_{0.025} = 2.048$
$\sigma_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \dfrac{15(10)^2 + 13(8)^2}{28} = \dfrac{2332}{28} = 83.2857$
Standard error, $SE = \sqrt{\dfrac{\sigma_p^2}{n_1} + \dfrac{\sigma_p^2}{n_2}} = \sqrt{\dfrac{83.2857}{16} + \dfrac{83.2857}{14}} = 3.3398$
Maximum error, $E = t_{\alpha/2}\sqrt{\dfrac{\sigma_p^2}{n_1} + \dfrac{\sigma_p^2}{n_2}} = (2.048)\sqrt{\dfrac{83.29}{16} + \dfrac{83.29}{14}} = 6.8399$
Lower confidence limit, $(\bar{x}_1 - \bar{x}_2) - E = (107 - 112) - 6.8399 = -11.8399$
Upper confidence limit, $(\bar{x}_1 - \bar{x}_2) + E = (107 - 112) + 6.8399 = 1.8399$
Therefore, confidence interval $\left((\bar{x}_1 - \bar{x}_2) - E,\; (\bar{x}_1 - \bar{x}_2) + E\right) = (-11.8399,\; 1.8399)$
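The standard error, maximum error and confidence interval in examples (8) and (9) follow one pattern, sketched below. The function name `ci_diff_means` is illustrative. The caller passes the variances (sample variances for large samples, the pooled variance twice for small samples) and the critical value read from the z- or t-table.

```python
import math

def ci_diff_means(x1, x2, var1, var2, n1, n2, critical):
    """Return (standard error, maximum error, (lower, upper)) for mu1 - mu2."""
    se = math.sqrt(var1 / n1 + var2 / n2)
    e = critical * se
    diff = x1 - x2
    return se, e, (diff - e, diff + e)

# Example (8): large samples, z_{0.025} = 1.96
se8, e8, ci8 = ci_diff_means(60, 50, 8 ** 2, 12 ** 2, 200, 100, critical=1.96)

# Example (9): small samples, pooled variance, t_{0.025} = 2.048 with 28 d.f.
sp2 = (15 * 10 ** 2 + 13 * 8 ** 2) / 28
se9, e9, ci9 = ci_diff_means(107, 112, sp2, sp2, 16, 14, critical=2.048)

print(f"(8) SE={se8:.4f} E={e8:.4f} CI=({ci8[0]:.4f}, {ci8[1]:.4f})")
print(f"(9) SE={se9:.4f} E={e9:.4f} CI=({ci9[0]:.4f}, {ci9[1]:.4f})")
```

An interval that contains 0, as in example (9), agrees with accepting $H_0 : \mu_1 - \mu_2 = 0$ in the corresponding two-sided test.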
Paired sample t-test or Paired t-test or Matched pair comparison test:

In the two-sample tests above (two means, small or large samples), the samples are independent.
These tests cannot be used for 'before and after' kind of data, where the observations come in pairs, such as
data recorded 'before and after a training program', 'before and after taking a particular medicine' or
'before and after attending a class'. In this case, we work with the differences of the paired data and
look upon these differences as a sample from the population of such differences. If this sample is small, we
use the one sample t-test (one mean small sample test); otherwise, we use the corresponding large sample
test (one mean large sample test). The one sample t-test applied to paired data in this way is called the paired
sample t-test
(10) The following are the average weekly losses of working hours due to accidents in 10 industrial plants
before and after a certain safety program was put into operation: 45 and 36, 73 and 60, 46 and 44, 124
and 119, 33 and 35, 57 and 51, 83 and 77, 34 and 29, 26 and 24, 17 and 11. Use the 0.05 level of
significance to test whether the safety program is effective. Also find the 90% confidence interval
for the mean improvement in loss of working hours.
Solution: Suppose that the average losses of working hours before and after the safety program are $\mu_1$ and
$\mu_2$ respectively. The safety program is effective if fewer working hours are lost after it,
and hence we have to test $\mu_1 > \mu_2$ or $\mu_1 - \mu_2 > 0$. If $\mu$ is the average of the differences of the loss of
working hours, then we have to test $\mu > 0$.
Here the differences are 9, 13, 2, 5, -2, 6, 6, 5, 2 and 6
$n = 10$ (small sample), $\nu = n - 1 = 9$, $\mu = 0$ and $\alpha = 0.05$
Sample mean, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{9 + 13 + 2 + 5 - 2 + 6 + 6 + 5 + 2 + 6}{10} = \frac{52}{10} = 5.2$
Sample variance, $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
$= \frac{1}{9}\left[(9-5.2)^2 + (13-5.2)^2 + (2-5.2)^2 + (5-5.2)^2 + (-2-5.2)^2 + (6-5.2)^2 + (6-5.2)^2 + (5-5.2)^2 + (2-5.2)^2 + (6-5.2)^2\right]$
$= 16.6222$
Sample standard deviation, $s = 4.077$
(i) Null Hypothesis $H_0 : \mu = 0$
(ii) Alternative Hypothesis $H_1 : \mu > 0$
(iii) Level of Significance : $\alpha = 0.05$
(iv) Test statistic : $t = \dfrac{\bar{X} - \mu}{s/\sqrt{n}}$
(v) Criterion : Reject $H_0$ if $t > t_\alpha$ and $t_\alpha = t_{0.05} = 1.833$
(vi) Calculation :
$t = \dfrac{\bar{X} - \mu}{s/\sqrt{n}} = \dfrac{5.2 - 0}{4.077/\sqrt{10}} = 4.033$ And $t_\alpha = t_{0.05} = 1.833$
(vii) Decision: Since $t > t_\alpha$, reject $H_0$ based on the sample data at $\alpha = 0.05$
That is, the safety program is effective
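Confidence Interval:
Here $n = 10$ (small sample), $\nu = n - 1 = 9$, $\bar{x} = 5.2$ and $s = 4.077$
For 90% confidence, $\alpha/2 = 0.05$ and $t_{\alpha/2} = t_{0.05} = 1.833$
Maximum error, $E = t_{\alpha/2}\dfrac{s}{\sqrt{n}} = 1.833 \times \dfrac{4.077}{\sqrt{10}} = 2.3632$
Therefore, confidence interval $(\bar{x} - E,\; \bar{x} + E) = (5.2 - 2.3632,\; 5.2 + 2.3632) = (2.8368,\; 7.5632)$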
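The paired test of the safety-program example reduces to a one-sample t-test on the differences, which the following sketch reproduces. Variable names are illustrative, and the critical value 1.833 is the table value of $t_{0.05}$ with 9 degrees of freedom; it serves both the right-tailed test at $\alpha = 0.05$ and the 90% confidence interval.

```python
import math
import statistics

before = [45, 73, 46, 124, 33, 57, 83, 34, 26, 17]
after = [36, 60, 44, 119, 35, 51, 77, 29, 24, 11]

# Work with the differences of the paired observations.
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)               # divides by n - 1

t = (d_bar - 0) / (s_d / math.sqrt(n))      # H0: mean difference = 0
T_CRITICAL = 1.833                          # t_{0.05}, 9 d.f., from a t-table
reject_h0 = t > T_CRITICAL                  # right-tailed test, H1: mean difference > 0

max_error = T_CRITICAL * s_d / math.sqrt(n)
ci_90 = (d_bar - max_error, d_bar + max_error)
print(f"mean diff={d_bar:.1f}  s={s_d:.3f}  t={t:.3f}  reject H0: {reject_h0}")
print(f"90% CI = ({ci_90[0]:.4f}, {ci_90[1]:.4f})")
```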
Exercise:
(1) Write the test statistic and criteria for testing a hypothesis concerning two means, for large and small
samples
(2) A sample of 100 electric bulbs produced by manufacturer A showed a mean life time of 1190 hours
and a standard deviation of 90 hours. A sample of 75 bulbs produced by manufacturer B showed a
mean life time of 1230 hours, with a standard deviation of 120 hours. Is there a significant difference
between the mean life times of the two brands at the 0.05 level of significance?
(3) A company claims that its light bulbs are superior to those of its main competitor. If a study showed
that a sample of $n_1 = 40$ of its bulbs has a mean life time of 647 hours of continuous use with a
standard deviation of 27 hours, while a sample of $n_2 = 40$ bulbs made by its main competitor had a
mean life time of 638 hours of continuous use with a standard deviation of 31 hours, does this
substantiate the claim at the 0.05 level of significance?
(Hint: $n_1 = 40$, $n_2 = 40$, $H_0 : \mu_1 - \mu_2 = 0$, $H_1 : \mu_1 - \mu_2 > 0$, $Z = 1.38$, $z_{0.05} = 1.645$, accept $H_0$)
(4) The means of two random samples of sizes 8 and 7 are 1234 and 1036 respectively. The standard
deviations of these two samples are 36 and 40 respectively. Is there a significant difference between
the means at the 0.05 level of significance?
(Hint: $n_1 = 8$, $n_2 = 7$, $H_0 : \mu_1 - \mu_2 = 0$, $H_1 : \mu_1 - \mu_2 \ne 0$, $t = 9.39$, $t_{0.025} = 2.160$, reject $H_0$)
(5) The means of two random samples of sizes 9 and 7 are 196.42 and 198.82 respectively. The sums of
the squared deviations from the mean are 26.94 and 18.73 respectively. Can the samples be
considered to have been drawn from the same population? Use the 0.01 level of significance
(Hint: $n_1 = 9$, $n_2 = 7$, $H_0 : \mu_1 - \mu_2 = 0$, $H_1 : \mu_1 - \mu_2 \ne 0$, $|t| = 2.63$, $t_{0.005} = 2.977$, accept $H_0$)
(6) Measuring specimens of nylon yarn taken from two spinning machines, it was found that 8 specimens
from the first machine had a mean denier of 9.67 with a standard deviation of 1.81 while 10
specimens from the second machine had a mean denier of 7.43 with a standard deviation of 1.48.
Assuming that the populations sampled are normal and have the same variance, test the null hypothesis
$\mu_1 - \mu_2 = 1.5$ against the alternative hypothesis $\mu_1 - \mu_2 > 1.5$ at the 0.05 level of significance.
(7) The following random samples are measurements of the heat-producing capacity (in millions of
calories per ton) of specimens of coal from two mines
Mine 1 8,260 8,130 8,350 8,070 8,340

Mine 2 7,950 7,890 7,900 8,140 7,920 7,840
Use 0.01 level of significance, to test whether the difference between the means of these two samples,
is significant
(8) Two horses A and B were tested according to the time (in seconds) to run a particular race with the
following results
Horse A 28 30 32 33 33 29 34
Horse B 29 30 30 24 27 29
Test whether the two horses have the same running capacity or not at the level of significance 0.05
(9) In a study of the effectiveness of physical exercise in weight reduction program, a group of 11
persons engaged in a prescribed program of physical exercise for a month showed the following
results.
Weight before 209 178 169 212 180 192 159 180 170 153 183
Weight after 196 171 160 207 177 190 128 196 164 152 179
Use the level of significance 0.01 test whether the prescribed program is effective.
Unit – 5: Estimation and Test of Hypothesis of Variances and Proportions
(Estimation of variance, hypothesis concerning one variance, hypothesis concerning two variances,
estimation of proportion, hypothesis concerning one proportion, hypothesis concerning several
proportions)
Proportion:
Let $p$, $q$ be the success and failure probabilities of an event in a trial. Let the trial be conducted any
number of times. Then the collection of all successes and failures of the event is a population known as a
Binomial population. For this population, $p$ is called the Proportion or True proportion. If $x$ is the number
of successes in $n$ trials, then $\frac{x}{n}$ is called the Sample Proportion and it is denoted by $P$; that is, $P = \frac{x}{n}$
Example:
Let a coin be tossed 10 times and 'getting head H' be the event. Suppose that the outcomes in these 10
tosses are H, H, H, T, T, H, T, H, H, T respectively. Now the collection of all the successes and failures of
the event is a population; that is, Population = {S, S, S, F, F, S, F, S, S, F}.
For this population, Proportion $p = \frac{1}{2}$
(i) If we collect the 1st five outcomes S, S, S, F, F, then it is a sample of size $n = 5$
For this sample, sample proportion $P = \frac{x}{n} = \frac{3}{5}$
(ii) If we collect the 1st two outcomes S, S, then it is a sample of size $n = 2$
For this sample, sample proportion $P = \frac{x}{n} = \frac{2}{2} = 1$
(iii) If we collect the 1st four outcomes S, S, S, F, then it is a sample of size $n = 4$
For this sample, sample proportion $P = \frac{x}{n} = \frac{3}{4}$
(iv) If we collect the last two outcomes S, F, then it is a sample of size $n = 2$
For this sample, sample proportion $P = \frac{x}{n} = \frac{1}{2}$
Maximum Error of the Proportion:
(i) $E = z_{\alpha/2}\sqrt{\dfrac{\frac{x}{n}\left(1 - \frac{x}{n}\right)}{n}}$ is the maximum error of the Proportion with the probability $1 - \alpha$
(ii) If we know the value of $p$, then the sample size is given by $n = p(1 - p)\left(\dfrac{z_{\alpha/2}}{E}\right)^2$
(iii) If we do not know the value of $p$, then the sample size is given by $n = \dfrac{1}{4}\left(\dfrac{z_{\alpha/2}}{E}\right)^2$
Confidence Interval – One Proportion:
With the probability $1 - \alpha$ or $(1 - \alpha)100\%$ confidence,
Upper and Lower confidence limits: $\dfrac{x}{n} \pm E$
Confidence Interval: $\left(\dfrac{x}{n} - E,\; \dfrac{x}{n} + E\right)$ where, $E = z_{\alpha/2}\sqrt{\dfrac{\frac{x}{n}\left(1 - \frac{x}{n}\right)}{n}}$
Examples:
(1) Among 900 people in a state, 90 are found to be rice eaters. Construct a 99% confidence interval for the
true proportion $p$
Here $n = 900$, $x = 90$ and $\frac{x}{n} = \frac{90}{900} = 0.1$
Now, $\alpha/2 = 0.005$ and $z_{\alpha/2} = z_{0.005} = 2.575$
The maximum error of the proportion, $E = z_{\alpha/2}\sqrt{\dfrac{\frac{x}{n}\left(1 - \frac{x}{n}\right)}{n}} = 2.575\sqrt{\dfrac{0.1(1 - 0.1)}{900}} = 0.02575$
Lower confidence limit, $\frac{x}{n} - E = 0.1 - 0.02575 = 0.07425$
Upper confidence limit, $\frac{x}{n} + E = 0.1 + 0.02575 = 0.12575$
Therefore, confidence interval $\left(\frac{x}{n} - E,\; \frac{x}{n} + E\right) = (0.07425,\; 0.12575)$
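(2) In a random sample of 400 industrial accidents, it was found that 231 were due at least partially to
unsafe working conditions. Construct a 99% confidence interval for the true proportion using the large
sample confidence interval formula
Here $n = 400$, $x = 231$ and $\frac{x}{n} = \frac{231}{400} = 0.5775$
Now, $\alpha/2 = 0.005$ and $z_{\alpha/2} = z_{0.005} = 2.575$
The maximum error of the proportion, $E = z_{\alpha/2}\sqrt{\dfrac{\frac{x}{n}\left(1 - \frac{x}{n}\right)}{n}} = 2.575\sqrt{\dfrac{0.5775(1 - 0.5775)}{400}} = 0.0636$
Lower confidence limit, $\frac{x}{n} - E = 0.5775 - 0.0636 = 0.5139$
Upper confidence limit, $\frac{x}{n} + E = 0.5775 + 0.0636 = 0.6411$
Therefore, confidence interval $\left(\frac{x}{n} - E,\; \frac{x}{n} + E\right) = (0.5139,\; 0.6411)$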
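(3) If $x = 36$ of $n = 100$ persons interviewed are familiar with the tax incentives for installing certain
energy saving devices, construct a 95% confidence interval for the corresponding true proportion
Here $n = 100$, $x = 36$ and $\frac{x}{n} = \frac{36}{100} = 0.36$
Now, $\alpha/2 = 0.025$ and $z_{\alpha/2} = z_{0.025} = 1.96$
The maximum error of the proportion, $E = z_{\alpha/2}\sqrt{\dfrac{\frac{x}{n}\left(1 - \frac{x}{n}\right)}{n}} = 1.96\sqrt{\dfrac{0.36(1 - 0.36)}{100}} = 0.0941$
Lower confidence limit, $\frac{x}{n} - E = 0.36 - 0.0941 = 0.2659$
Upper confidence limit, $\frac{x}{n} + E = 0.36 + 0.0941 = 0.4541$
Therefore, confidence interval $\left(\frac{x}{n} - E,\; \frac{x}{n} + E\right) = (0.2659,\; 0.4541)$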
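The confidence-interval examples for one proportion share a single computation, sketched here. The function name `proportion_ci` is illustrative, and the z-values are the table values used in the examples (2.575 for 99%, 1.96 for 95%).

```python
import math

def proportion_ci(x, n, z):
    """Return (sample proportion, maximum error, (lower, upper)) for a large sample."""
    p_hat = x / n
    e = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, e, (p_hat - e, p_hat + e)

p1, e1, ci1 = proportion_ci(90, 900, z=2.575)    # example (1), 99% confidence
p2, e2, ci2 = proportion_ci(231, 400, z=2.575)   # example (2), 99% confidence
p3, e3, ci3 = proportion_ci(36, 100, z=1.96)     # example (3), 95% confidence

for label, e, ci in [("(1)", e1, ci1), ("(2)", e2, ci2), ("(3)", e3, ci3)]:
    print(f"{label} E={e:.4f}  CI=({ci[0]:.4f}, {ci[1]:.4f})")
```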
(4) Find the sample size needed to estimate the true proportion of defective items with at least 95%
confidence and maximum error 0.04, if the true proportion does not exceed 0.12.
Here the maximum error of the proportion, $E = 0.04$ and $p = 0.12$
Now, $\alpha/2 = 0.025$ and $z_{\alpha/2} = z_{0.025} = 1.96$
For known proportion, the sample size $n = p(1 - p)\left(\dfrac{z_{\alpha/2}}{E}\right)^2$
$= 0.12(1 - 0.12)\left(\dfrac{1.96}{0.04}\right)^2 = 253.55 \approx 254$
(5) What is the size of the smallest sample required to estimate an unknown proportion to within a
maximum error of 0.06 with at least 95% confidence?
Here the maximum error of the proportion, $E = 0.06$
Now, $\alpha/2 = 0.025$ and $z_{\alpha/2} = z_{0.025} = 1.96$
For unknown proportion, the sample size $n = \dfrac{1}{4}\left(\dfrac{z_{\alpha/2}}{E}\right)^2 = \dfrac{1}{4}\left(\dfrac{1.96}{0.06}\right)^2 = 266.7 \approx 267$
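Both sample-size formulas can be combined in one helper, sketched below. The function name `sample_size` is illustrative. The sketch assumes that the normal critical value is obtained from `statistics.NormalDist` in the Python standard library (about 1.95996 for 95% confidence) instead of the rounded table value 1.96, and that the result is always rounded up to the next whole number.

```python
import math
from statistics import NormalDist

def sample_size(max_error, confidence, p=None):
    """Smallest n for estimating a proportion to within max_error.

    If p is None the proportion is treated as unknown and p(1 - p) is
    replaced by its largest possible value, 1/4.
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)      # z_{alpha/2}
    pq = 0.25 if p is None else p * (1 - p)
    return math.ceil(pq * (z / max_error) ** 2)

n_known = sample_size(0.04, 0.95, p=0.12)    # example (4)
n_unknown = sample_size(0.06, 0.95)          # example (5)
print(n_known, n_unknown)
```

Rounding up rather than to the nearest integer keeps the maximum error at or below the target.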
 
Exercise:
(1) In a sample survey conducted in a large city, 136 of 400 persons answered 'yes' to the question of
whether their city's public transportation is adequate. With 99% confidence, what can we say about
the maximum error, if $\frac{x}{n} = \frac{136}{400} = 0.34$ is used as an estimate of the corresponding true proportion?
(2) If the sample size is $n = 400$ and the sample proportion of successes is 0.578, then construct a 98% confidence interval
for the proportion $p$
(3) Among 900 people in a state 90 are found to be blind. Construct 98% confidence interval for the true
proportion
(4) A random sample of 500 apples was taken from, a large consignment and 60 were found to be bad.
Obtain the 98% confidence limits for the percentage number of bad apples in the consignment
(5) What is the size of the smallest sample required to estimate an unknown proportion to within a
maximum error of 0.08 with at least 98% confidence
(6) Find the sample size if the true proportion does not exceed 0.2 to estimate the true Proportion of
defective items with at least 95% confidence with error 0.05
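Test of Hypothesis – One Proportion:
(1) Null Hypothesis $H_0 : p = p_0$
(2) Alternative Hypothesis $H_1$ : Any one of the following $p < p_0$, $p > p_0$, $p \ne p_0$
(3) Level of Significance : $\alpha$
(4) Test statistic : $Z = \dfrac{x - np}{\sqrt{np(1 - p)}}$ or $Z = \dfrac{\frac{x}{n} - p}{\sqrt{\dfrac{p(1 - p)}{n}}}$, where $p = p_0$ is the value specified by $H_0$
(5) Criterion :
$H_1$ | Reject $H_0$ if
$p < p_0$ | $Z < -z_\alpha$
$p > p_0$ | $Z > z_\alpha$
$p \ne p_0$ | $|Z| > z_{\alpha/2}$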
Examples:
(1) An airline claims that only 6% of all lost luggage is never found. If, in a random sample, 17 of 200
pieces of lost luggage are not found, test the null hypothesis $p = 0.06$ against the alternative hypothesis
$p > 0.06$ at the 0.05 level of significance
Here $n = 200$, $x = 17$, $p = 0.06$ and $\alpha = 0.05$
(i) Null Hypothesis $H_0 : p = 0.06$
(ii) Alternative Hypothesis $H_1 : p > 0.06$
(iii) Level of Significance : $\alpha = 0.05$
(iv) Test statistic : $Z = \dfrac{x - np}{\sqrt{np(1 - p)}}$
(v) Criterion : Reject $H_0$ if $Z > z_\alpha$
(vi) Calculation :
$Z = \dfrac{x - np}{\sqrt{np(1 - p)}} = \dfrac{17 - 200(0.06)}{\sqrt{200(0.06)(1 - 0.06)}} = 1.4887$
And $z_\alpha = z_{0.05} = 1.645$
(vii) Decision: Since $Z < z_\alpha$, accept $H_0$ based on the sample data at $\alpha = 0.05$

(2) In a study designed to investigate whether certain detonators used with explosives in coal mining
meet the requirement that at least 90% will ignite the explosive when charged, it is found that 174 of
200 detonators function properly. Test the null hypothesis $p = 0.9$ against the alternative hypothesis
$p < 0.9$ at the 0.05 level of significance
Here $n = 200$, $x = 174$, $p = 0.9$ and $\alpha = 0.05$
(i) Null Hypothesis $H_0 : p = 0.9$
(ii) Alternative Hypothesis $H_1 : p < 0.9$
(iii) Level of Significance : $\alpha = 0.05$
(iv) Test statistic : $Z = \dfrac{x - np}{\sqrt{np(1 - p)}}$
(v) Criterion : Reject $H_0$ if $Z < -z_\alpha$
(vi) Calculation :
$Z = \dfrac{x - np}{\sqrt{np(1 - p)}} = \dfrac{174 - 200(0.9)}{\sqrt{200(0.9)(1 - 0.9)}} = -1.4142$
And $-z_\alpha = -z_{0.05} = -1.645$
(vii) Decision: Since $Z > -z_\alpha$, accept $H_0$ based on the sample data at $\alpha = 0.05$
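(3) In a big city 325 men out of 600 men were found to be smokers. Does this information support the
conclusion that the majority of men in this city are smokers?
Here $n = 600$, $x = 325$ and $\alpha = 0.05$ (assumed)
We have to check the hypothesis $p > 0.5$
(i) Null Hypothesis $H_0 : p = 0.5$
(ii) Alternative Hypothesis $H_1 : p > 0.5$
(iii) Level of Significance : $\alpha = 0.05$
(iv) Test statistic : $Z = \dfrac{x - np}{\sqrt{np(1 - p)}}$
(v) Criterion : Reject $H_0$ if $Z > z_\alpha$
(vi) Calculation :
$Z = \dfrac{x - np}{\sqrt{np(1 - p)}} = \dfrac{325 - 600(0.5)}{\sqrt{600(0.5)(1 - 0.5)}} = 2.0412$
And $z_\alpha = z_{0.05} = 1.645$
(vii) Decision: Since $Z > z_\alpha$, reject $H_0$ based on the sample data at $\alpha = 0.05$
That is, the majority of men in this city are smokers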
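(4) In a large consignment of oranges, a random sample of 64 oranges revealed that 14 oranges were bad.
Is it reasonable to assume that 20% of the oranges were bad at the 5% level of significance?
Here $n = 64$, $x = 14$ and $\alpha = 0.05$
We have to check the hypothesis $p = \frac{20}{100} = 0.2$
(i) Null Hypothesis $H_0 : p = 0.2$
(ii) Alternative Hypothesis $H_1 : p \ne 0.2$
(iii) Level of Significance : $\alpha = 0.05$
(iv) Test statistic : $Z = \dfrac{x - np}{\sqrt{np(1 - p)}}$
(v) Criterion : Reject $H_0$ if $|Z| > z_{\alpha/2}$
(vi) Calculation :
$Z = \dfrac{x - np}{\sqrt{np(1 - p)}} = \dfrac{14 - 64(0.2)}{\sqrt{64(0.2)(1 - 0.2)}} = 0.375$, $|Z| = 0.375$
And $z_{\alpha/2} = z_{0.025} = 1.96$
(vii) Decision: Since $|Z| < z_{\alpha/2}$, accept $H_0$ based on the sample data at $\alpha = 0.05$
That is, it is reasonable to assume that 20% of the oranges were bad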
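(5) A coin is tossed 960 times and head turns up 183 times. Is the coin biased? Use the 0.05 level of
significance
Here $n = 960$, $x = 183$ and $\alpha = 0.05$
If a coin is unbiased, then the probability of getting a Head in a single toss is $p = 0.5$
So we have to check the hypothesis $p = 0.5$
(i) Null Hypothesis $H_0 : p = 0.5$
(ii) Alternative Hypothesis $H_1 : p \ne 0.5$
(iii) Level of Significance : $\alpha = 0.05$
(iv) Test statistic : $Z = \dfrac{x - np}{\sqrt{np(1 - p)}}$
(v) Criterion : Reject $H_0$ if $|Z| > z_{\alpha/2}$
(vi) Calculation :
$Z = \dfrac{x - np}{\sqrt{np(1 - p)}} = \dfrac{183 - 960(0.5)}{\sqrt{960(0.5)(1 - 0.5)}} = -19.1713$, $|Z| = 19.1713$
And $z_{\alpha/2} = z_{0.025} = 1.96$
(vii) Decision: Since $|Z| > z_{\alpha/2}$, reject $H_0$ based on the sample data at $\alpha = 0.05$
That is, the coin is biased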
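The Z statistic for one proportion is easy to verify by script. In this sketch the function name `z_one_proportion` is illustrative, and the critical values come from `statistics.NormalDist` in the Python standard library, which gives about 1.6449 and 1.9600 where the table values are 1.645 and 1.96.

```python
import math
from statistics import NormalDist

def z_one_proportion(x, n, p0):
    """Z = (x - n*p0) / sqrt(n*p0*(1 - p0)) for testing H0: p = p0."""
    return (x - n * p0) / math.sqrt(n * p0 * (1 - p0))

z_05 = NormalDist().inv_cdf(0.95)     # one-tailed critical value at alpha = 0.05
z_025 = NormalDist().inv_cdf(0.975)   # two-tailed critical value at alpha = 0.05

z_luggage = z_one_proportion(17, 200, 0.06)    # example (1), H1: p > 0.06
z_smokers = z_one_proportion(325, 600, 0.5)    # example (3), H1: p > 0.5
z_coin = z_one_proportion(183, 960, 0.5)       # example (5), H1: p != 0.5

print(f"luggage: Z={z_luggage:.4f}, reject H0: {z_luggage > z_05}")
print(f"smokers: Z={z_smokers:.4f}, reject H0: {z_smokers > z_05}")
print(f"coin:    Z={z_coin:.4f}, reject H0: {abs(z_coin) > z_025}")
```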
Exercise:
(1) Write the test statistic and criteria for testing a hypothesis concerning one proportion and two
proportions
(2) A manufacturer of submersible pumps claims that at most 30% of the pumps require repairs within
the first five years of operation. If a random sample of 120 of these pumps includes 47 which
required repairs within the first five years of operation, test the null hypothesis $p = 0.3$ against the
alternative hypothesis $p > 0.3$ at the 0.05 level of significance
(3) To check an ambulance service's claim that at least 40% of its calls are life-threatening emergencies, a random
sample was taken from its files, and it was found that only 49 of 150 calls were life-threatening
emergencies. Can the null hypothesis $p = 0.4$ be rejected against the alternative hypothesis $p < 0.4$ at
the 0.01 level of significance?
(4) In a random sample of 600 cars making a right turn at a certain intersection, 157 pulled into the
wrong lane. Test the null hypothesis that actually 30% of all drivers make this mistake at the given
intersection, using the alternative hypothesis p  0.3 and the level of significance 0.01
(5) In a random sample of 160 workers exposed to a certain amount of radiation, 24 experienced some ill
effects. Test the null hypothesis p  0.18 versus the alternative hypothesis p  0.18 at the 0.01 level
significance Hint n  960, x  450, H 0 : p  0.5, H1 : p  0.5 
significance Hint n  400, x  216, H 0 : p  0.5, H1 : p  0.5 
(8) A die is thrown 900 times and it falls with 5 upwards 185 times. Is the die biased? Use the 0.01 level
of significance (Hint: $n = 900$, $x = 185$, $H_0 : p = \frac{1}{6}$, $H_1 : p \ne \frac{1}{6}$)
Test of Hypothesis – Two Proportions:
(1) Null Hypothesis $H_0 : p_1 - p_2 = 0$
(2) Alternative Hypothesis $H_1$ : Any one of the following $p_1 - p_2 < 0$, $p_1 - p_2 > 0$, $p_1 - p_2 \ne 0$
(3) Level of Significance : $\alpha$
(4) Test statistic : $Z = \dfrac{\dfrac{x_1}{n_1} - \dfrac{x_2}{n_2}}{\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n_1} + \dfrac{\hat{p}(1 - \hat{p})}{n_2}}}$ where, $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$
(5) Criterion :
$H_1$ | Reject $H_0$ if
$p_1 - p_2 < 0$ | $Z < -z_\alpha$
$p_1 - p_2 > 0$ | $Z > z_\alpha$
$p_1 - p_2 \ne 0$ | $|Z| > z_{\alpha/2}$
Note: $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$ is known as the proportion by pooling
Confidence Interval - Two Proportions:
Upper and Lower confidence limits: $\left(\dfrac{x_1}{n_1} - \dfrac{x_2}{n_2}\right) \pm E$
Confidence Interval: $\left(\left(\dfrac{x_1}{n_1} - \dfrac{x_2}{n_2}\right) - E,\; \left(\dfrac{x_1}{n_1} - \dfrac{x_2}{n_2}\right) + E\right)$
Where, $E = z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n_1} + \dfrac{\hat{p}(1 - \hat{p})}{n_2}}$ and $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$
Examples:
(1) A manufacturer of electronic equipment subjects samples of two competing brands of transistors to
an accelerated performance test. If 45 of 180 transistors of the first kind and 34 of 120 transistors of
the second kind fail the test, what can he conclude at the level of significance $\alpha = 0.05$ about the
difference between the corresponding sample proportions?
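Here $n_1 = 180$, $x_1 = 45$, $n_2 = 120$, $x_2 = 34$ and $\alpha = 0.05$

We have to check the hypothesis $p_1 - p_2 = 0$
(i) Null Hypothesis $H_0: p_1 - p_2 = 0$

(ii) Alternative Hypothesis $H_1: p_1 - p_2 \neq 0$
(iii) Level of significance : $\alpha = 0.05$
(iv) Test statistic : $Z = \dfrac{\frac{x_1}{n_1} - \frac{x_2}{n_2}}{\sqrt{\frac{\hat{p}(1 - \hat{p})}{n_1} + \frac{\hat{p}(1 - \hat{p})}{n_2}}}$ where, $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$
(v) Criterion : Reject $H_0$ if $|Z| > z_{\alpha/2}$
(vi) Calculation :
$\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2} = \dfrac{45 + 34}{180 + 120} = \dfrac{79}{300} = 0.2633$
$Z = \dfrac{\frac{x_1}{n_1} - \frac{x_2}{n_2}}{\sqrt{\frac{\hat{p}(1 - \hat{p})}{n_1} + \frac{\hat{p}(1 - \hat{p})}{n_2}}} = \dfrac{\frac{45}{180} - \frac{34}{120}}{\sqrt{\frac{0.2633(1 - 0.2633)}{180} + \frac{0.2633(1 - 0.2633)}{120}}}$
$= \dfrac{0.25 - 0.2833}{\sqrt{\frac{0.194}{180} + \frac{0.194}{120}}} = \dfrac{-0.0333}{\sqrt{0.0011 + 0.0016}} = -0.641$
$|Z| = 0.641$
And $z_{\alpha/2} = z_{0.025} = 1.96$
(vii) Decision: Since $|Z| < z_{\alpha/2}$, accept $H_0$ based on the sample data at $\alpha = 0.05$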
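The arithmetic of Example (1) can be checked with a short script. This is a minimal sketch: the helper name `two_proportion_z` is hypothetical, and the critical value 1.96 is taken from the normal table as in the example.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: p1 - p2 = 0, using the pooled proportion."""
    p_hat = (x1 + x2) / (n1 + n2)                       # pooled proportion
    se = sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))  # denominator of Z
    return (x1 / n1 - x2 / n2) / se


z = two_proportion_z(45, 180, 34, 120)
z_crit = 1.96  # z_{0.025}, read from the normal table
print(f"Z = {z:.4f}, reject H0: {abs(z) > z_crit}")
```

With unrounded intermediate values the statistic is about -0.642; the value -0.641 in the worked example comes from rounding the intermediate terms. The decision is the same either way.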
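(2) A study shows that 16 of 200 tractors produced on one assembly line required extensive adjustments
before they could be shipped, while the same was true for 14 of 400 tractors produced on another
assembly line. At the 0.01 level of significance, does this support the claim that the second line does
superior work?
Here $n_1 = 200$, $x_1 = 16$, $n_2 = 400$, $x_2 = 14$ and $\alpha = 0.01$

Second line does superior work means that tractors produced by the 2nd line required fewer adjustments;
that is, $p_1 > p_2$ or $p_1 - p_2 > 0$
So we have to check the hypothesis $p_1 - p_2 > 0$
(i) Null Hypothesis $H_0: p_1 - p_2 = 0$

(ii) Alternative Hypothesis $H_1: p_1 - p_2 > 0$
(iii) Level of significance : $\alpha = 0.01$
(iv) Test statistic : $Z = \dfrac{\frac{x_1}{n_1} - \frac{x_2}{n_2}}{\sqrt{\frac{\hat{p}(1 - \hat{p})}{n_1} + \frac{\hat{p}(1 - \hat{p})}{n_2}}}$ where, $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$
(v) Criterion : Reject $H_0$ if $Z > z_{\alpha}$
(vi) Calculation :
$\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2} = \dfrac{16 + 14}{200 + 400} = \dfrac{30}{600} = 0.05$
$Z = \dfrac{\frac{x_1}{n_1} - \frac{x_2}{n_2}}{\sqrt{\frac{\hat{p}(1 - \hat{p})}{n_1} + \frac{\hat{p}(1 - \hat{p})}{n_2}}} = \dfrac{\frac{16}{200} - \frac{14}{400}}{\sqrt{\frac{0.05(1 - 0.05)}{200} + \frac{0.05(1 - 0.05)}{400}}} = 2.3841$
And $z_{\alpha} = z_{0.01} = 2.33$
(vii) Decision: Since $Z > z_{\alpha}$, reject $H_0$ based on the sample data at $\alpha = 0.01$
That is, the second line does superior work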
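(3) A machine puts out 9 imperfect articles in a sample of 200 articles. After the machine is overhauled it
puts out 5 imperfect articles in a sample of 700 articles. Test at the 5% level of significance whether the
machine is improved.
Here $n_1 = 200$, $x_1 = 9$, $n_2 = 700$, $x_2 = 5$ and $\alpha = 0.05$

The machine is improved means that the machine puts out fewer imperfect articles; that is,
$p_1 > p_2$ or $p_1 - p_2 > 0$
So we have to check the hypothesis $p_1 - p_2 > 0$
(i) Null Hypothesis $H_0: p_1 - p_2 = 0$

(ii) Alternative Hypothesis $H_1: p_1 - p_2 > 0$
(iii) Level of significance : $\alpha = 0.05$
(iv) Test statistic : $Z = \dfrac{\frac{x_1}{n_1} - \frac{x_2}{n_2}}{\sqrt{\frac{\hat{p}(1 - \hat{p})}{n_1} + \frac{\hat{p}(1 - \hat{p})}{n_2}}}$ where, $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$
(v) Criterion : Reject $H_0$ if $Z > z_{\alpha}$
(vi) Calculation :
$\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2} = \dfrac{9 + 5}{200 + 700} = \dfrac{14}{900} = 0.0156$
$Z = \dfrac{\frac{x_1}{n_1} - \frac{x_2}{n_2}}{\sqrt{\frac{\hat{p}(1 - \hat{p})}{n_1} + \frac{\hat{p}(1 - \hat{p})}{n_2}}} = \dfrac{\frac{9}{200} - \frac{5}{700}}{\sqrt{\frac{0.0156(1 - 0.0156)}{200} + \frac{0.0156(1 - 0.0156)}{700}}}$
$= \dfrac{0.045 - 0.0071}{\sqrt{\frac{0.0154}{200} + \frac{0.0154}{700}}} = \dfrac{0.0379}{\sqrt{\frac{0.0154}{200} + \frac{0.0154}{700}}} = 3.8091$
And $z_{\alpha} = z_{0.05} = 1.645$
(vii) Decision: Since $Z > z_{\alpha}$, reject $H_0$ based on the sample data at $\alpha = 0.05$
That is, the machine is improved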
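The right-tailed tests in Examples (2) and (3) can also be decided with a p-value instead of a table lookup: for $H_1: p_1 - p_2 > 0$ the p-value is $1 - \Phi(Z)$, and $H_0$ is rejected when it is smaller than $\alpha$, which is equivalent to $Z > z_{\alpha}$. The sketch below assumes this p-value approach, which the notes do not use, and the function name `right_tailed_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def right_tailed_test(x1, n1, x2, n2, alpha):
    """Return (Z, p-value, reject H0?) for H0: p1 - p2 = 0 against H1: p1 - p2 > 0."""
    p_hat = (x1 + x2) / (n1 + n2)                       # pooled proportion
    se = sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    p_value = 1 - NormalDist().cdf(z)                   # area to the right of Z
    return z, p_value, p_value < alpha


tractors = right_tailed_test(16, 200, 14, 400, alpha=0.01)  # Example (2)
machine = right_tailed_test(9, 200, 5, 700, alpha=0.05)     # Example (3)
for name, (z, p, reject) in (("tractors", tractors), ("machine", machine)):
    print(f"{name}: Z = {z:.4f}, p-value = {p:.5f}, reject H0: {reject}")
```

Both tests reject $H_0$, in agreement with the decisions reached above from the critical values 2.33 and 1.645.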
Exercise:
(1) Random samples of 400 men and 600 women were asked whether they would like to have a flyover
near their residence. 200 men and 325 women were in favor of the proposal. Test the hypothesis that
the proportions of men and women in favor of the proposal are the same, at the 5% level of significance.
(Hint: $n_1 = 400$, $x_1 = 200$, $n_2 = 600$, $x_2 = 325$, $H_0: p_1 - p_2 = 0$, $H_1: p_1 - p_2 \neq 0$)
(2) One method of seeding clouds was successful in 57 of 150 attempts while another method was
successful in 33 of 100 attempts. At the 0.05 level of significance, can we conclude that the first
method is better than the second?
(Hint: $n_1 = 150$, $x_1 = 57$, $n_2 = 100$, $x_2 = 33$, $H_0: p_1 - p_2 = 0$, $H_1: p_1 - p_2 > 0$)
Test of Hypothesis – Several Proportions:
Let $n_1, n_2, n_3, \ldots, n_k$ be the sizes of $k$ samples taken from $k$ populations respectively.
Let $x_1, x_2, x_3, \ldots, x_k$ be the numbers of successes in these $k$ samples respectively.
Let $n = n_1 + n_2 + n_3 + \cdots + n_k$ and $x = x_1 + x_2 + x_3 + \cdots + x_k$.
Now all these values can be tabulated as follows
                     Sample 1       Sample 2       ...   Sample k       Total

No. of successes     $x_1$          $x_2$          ...   $x_k$          $x$
No. of failures      $n_1 - x_1$    $n_2 - x_2$    ...   $n_k - x_k$    $n - x$
Total                $n_1$          $n_2$          ...   $n_k$          $n$
In the above table, the entry in the $(i, j)$th cell is called the Observed frequency and is denoted by $O_{ij}$ for
$i = 1, 2$ and $j = 1, 2, 3, \ldots, k$; that is,
$O_{11} = x_1$, $O_{12} = x_2$, $\ldots$, $O_{1k} = x_k$,
$O_{21} = n_1 - x_1$, $O_{22} = n_2 - x_2$, $\ldots$, $O_{2k} = n_k - x_k$
And the Expected frequency of the $(i, j)$th cell is given by
$e_{ij} = \dfrac{(i\text{th row total}) \times (j\text{th column total})}{n}$ for $i = 1, 2$ and $j = 1, 2, 3, \ldots, k$; that is,
$e_{11} = \dfrac{x\,n_1}{n}$, $e_{12} = \dfrac{x\,n_2}{n}$, $\ldots$, $e_{1k} = \dfrac{x\,n_k}{n}$,
$e_{21} = \dfrac{(n - x)\,n_1}{n}$, $e_{22} = \dfrac{(n - x)\,n_2}{n}$, $\ldots$, $e_{2k} = \dfrac{(n - x)\,n_k}{n}$
(1) Null Hypothesis $H_0: p_1 = p_2 = p_3 = \cdots = p_k$
(2) Alternative Hypothesis $H_1$: Not all $p_1, p_2, p_3, \ldots, p_k$ are equal
(3) Level of significance : $\alpha$
(4) Test statistic : $\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{k} \dfrac{(O_{ij} - e_{ij})^2}{e_{ij}}$ with $\nu = k - 1$
(5) Criterion : Reject $H_0$ if $\chi^2 > \chi^2_{\alpha}$
Examples:
(1) Samples of three kinds of materials, subjected to extreme temperature changes, produced the results
shown in the following table
                   Material A   Material B   Material C   Total

Crumbled               41           27           22          90
Remained intact        79           53           78         210
Total                 120           80          100         300
Use the 0.05 level of significance to test whether, under the stated conditions, the probability of
crumbling is the same for the three kinds of materials
Here the data is in 2 rows and 3 columns; that is, the number of samples $k = 3$
Therefore, $\nu = k - 1 = 2$
(i) Null Hypothesis $H_0$: $p_1, p_2, p_3$ are all equal or $p_1 = p_2 = p_3$

(ii) Alternative Hypothesis $H_1$: $p_1, p_2, p_3$ are not all equal
(iii) Level of significance : $\alpha = 0.05$
(iv) Test statistic : $\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{k} \dfrac{(O_{ij} - e_{ij})^2}{e_{ij}}$ with $\nu = k - 1$
(v) Criterion : Reject $H_0$ if $\chi^2 > \chi^2_{\alpha}$ and $\chi^2_{\alpha} = \chi^2_{0.05} = 5.991$
(vi) Calculation :
Observed frequency, $O_{ij}$ = entry in the $(i, j)$th cell, for $i = 1, 2$ and $j = 1, 2, 3$

$O_{11} = 41$, $O_{12} = 27$, $O_{13} = 22$,
$O_{21} = 79$, $O_{22} = 53$, $O_{23} = 78$
Expected frequency, $e_{ij} = \dfrac{(i\text{th row total}) \times (j\text{th column total})}{n}$ for $i = 1, 2$ and $j = 1, 2, 3$

$e_{11} = \dfrac{90 \times 120}{300} = 36$, $e_{12} = \dfrac{90 \times 80}{300} = 24$, $e_{13} = \dfrac{90 \times 100}{300} = 30$,
$e_{21} = \dfrac{210 \times 120}{300} = 84$, $e_{22} = \dfrac{210 \times 80}{300} = 56$, $e_{23} = \dfrac{210 \times 100}{300} = 70$
Now $\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{k} \dfrac{(O_{ij} - e_{ij})^2}{e_{ij}}$
$= \dfrac{(O_{11} - e_{11})^2}{e_{11}} + \dfrac{(O_{12} - e_{12})^2}{e_{12}} + \dfrac{(O_{13} - e_{13})^2}{e_{13}} + \dfrac{(O_{21} - e_{21})^2}{e_{21}} + \dfrac{(O_{22} - e_{22})^2}{e_{22}} + \dfrac{(O_{23} - e_{23})^2}{e_{23}}$
$= \dfrac{(41 - 36)^2}{36} + \dfrac{(27 - 24)^2}{24} + \dfrac{(22 - 30)^2}{30} + \dfrac{(79 - 84)^2}{84} + \dfrac{(53 - 56)^2}{56} + \dfrac{(78 - 70)^2}{70}$
$= 0.6944 + 0.375 + 2.1333 + 0.2976 + 0.1607 + 0.9143$
$= 4.5753$
And $\chi^2_{\alpha} = \chi^2_{0.05} = 5.991$

(vii) Decision: Since $\chi^2 < \chi^2_{\alpha}$, accept $H_0$ based on the sample data at $\alpha = 0.05$
(2) Four methods are under development for making discs of a superconducting material. Fifty discs are
made by each method and they are checked for superconductivity when cooled with liquid nitrogen.
                    Method I   Method II   Method III   Method IV   Total

Superconductors        31         42           22           25       120
Failures               19          8           28           25        80
Total                  50         50           50           50       200
Test whether there is any significant difference between the proportions of superconductors
produced, at the 0.05 level of significance.
Here the data is in 2 rows and 4 columns; that is, the number of samples $k = 4$
Therefore, $\nu = k - 1 = 3$
(i) Null Hypothesis $H_0$: $p_1, p_2, p_3, p_4$ are all equal or $p_1 = p_2 = p_3 = p_4$

(ii) Alternative Hypothesis $H_1$: $p_1, p_2, p_3, p_4$ are not all equal
(iii) Level of significance : $\alpha = 0.05$
(iv) Test statistic : $\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{k} \dfrac{(O_{ij} - e_{ij})^2}{e_{ij}}$ with $\nu = k - 1$
(v) Criterion : Reject $H_0$ if $\chi^2 > \chi^2_{\alpha}$ and $\chi^2_{\alpha} = \chi^2_{0.05} = 7.815$
(vi) Calculation :
Observed frequency, $O_{ij}$ = entry in the $(i, j)$th cell, for $i = 1, 2$ and $j = 1, 2, 3, 4$
$O_{11} = 31$, $O_{12} = 42$, $O_{13} = 22$, $O_{14} = 25$,
$O_{21} = 19$, $O_{22} = 8$, $O_{23} = 28$, $O_{24} = 25$
Expected frequency, $e_{ij} = \dfrac{(i\text{th row total}) \times (j\text{th column total})}{n}$ for $i = 1, 2$ and $j = 1, 2, 3, 4$

$e_{11} = \dfrac{120 \times 50}{200} = 30$, $e_{12} = \dfrac{120 \times 50}{200} = 30$, $e_{13} = \dfrac{120 \times 50}{200} = 30$, $e_{14} = \dfrac{120 \times 50}{200} = 30$,
$e_{21} = \dfrac{80 \times 50}{200} = 20$, $e_{22} = \dfrac{80 \times 50}{200} = 20$, $e_{23} = \dfrac{80 \times 50}{200} = 20$, $e_{24} = \dfrac{80 \times 50}{200} = 20$
Now $\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{k} \dfrac{(O_{ij} - e_{ij})^2}{e_{ij}}$
$= \dfrac{(O_{11} - e_{11})^2}{e_{11}} + \dfrac{(O_{12} - e_{12})^2}{e_{12}} + \dfrac{(O_{13} - e_{13})^2}{e_{13}} + \dfrac{(O_{14} - e_{14})^2}{e_{14}}$
$+ \dfrac{(O_{21} - e_{21})^2}{e_{21}} + \dfrac{(O_{22} - e_{22})^2}{e_{22}} + \dfrac{(O_{23} - e_{23})^2}{e_{23}} + \dfrac{(O_{24} - e_{24})^2}{e_{24}}$
$= \dfrac{(31 - 30)^2}{30} + \dfrac{(42 - 30)^2}{30} + \dfrac{(22 - 30)^2}{30} + \dfrac{(25 - 30)^2}{30} + \dfrac{(19 - 20)^2}{20} + \dfrac{(8 - 20)^2}{20} + \dfrac{(28 - 20)^2}{20} + \dfrac{(25 - 20)^2}{20}$
$= 0.0333 + 4.8 + 2.1333 + 0.8333 + 0.05 + 7.2 + 3.2 + 1.25$
$= 19.4999$
And $\chi^2_{\alpha} = \chi^2_{0.05} = 7.815$

(vii) Decision: Since $\chi^2 > \chi^2_{\alpha}$, reject $H_0$ based on the sample data at $\alpha = 0.05$
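Both worked examples above follow the same recipe, so the $\chi^2$ statistic can be computed by one small function. The sketch below is illustrative only: the name `chi_square_proportions` is hypothetical, and the critical values still have to be read from the chi-square table.

```python
def chi_square_proportions(successes, sizes):
    """Chi-square statistic and degrees of freedom for H0: p1 = p2 = ... = pk."""
    n = sum(sizes)          # grand total
    x = sum(successes)      # total number of successes
    chi2 = 0.0
    for x_j, n_j in zip(successes, sizes):
        e_success = x * n_j / n          # expected frequency e_1j
        e_failure = (n - x) * n_j / n    # expected frequency e_2j
        chi2 += (x_j - e_success) ** 2 / e_success
        chi2 += ((n_j - x_j) - e_failure) ** 2 / e_failure
    return chi2, len(sizes) - 1


materials = chi_square_proportions([41, 27, 22], [120, 80, 100])     # Example (1)
discs = chi_square_proportions([31, 42, 22, 25], [50, 50, 50, 50])   # Example (2)
print(f"materials: chi2 = {materials[0]:.4f}, nu = {materials[1]}")
print(f"discs:     chi2 = {discs[0]:.4f}, nu = {discs[1]}")
```

Without intermediate rounding the statistics are about 4.5754 and 19.5, against 4.5753 and 19.4999 in the hand calculations; the small differences come from rounding each term to four decimals.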
Exercise:
(1) The following data come from a study in which random samples of the employees of three
government agencies were asked questions about their pension plan:
                            Agency 1   Agency 2   Agency 3

For the pension plan           67         84        109
Against the pension plan       33         66         41
Use the 0.01 level of significance to test the null hypothesis that the actual proportions of employees
favoring the pension plan are the same.
(2) Tests are made on the proportion of defective castings produced by 5 different molds. If there were
14 defectives among 100 castings made with Mold I, 33 defectives among 200 castings made with
Mold II, 21 defectives among 180 castings made with Mold III, 17 defectives among 120 castings
made with Mold IV, and 25 defectives among 150 castings made with Mold V, use the 0.01 level of
significance to test whether the true proportion of defectives is the same for each mold.
(3) The following table gives the classification of 100 workers according to gender and nature of work.
Test whether the nature of work is independent of the gender of the worker at the 0.05 level of
significance.
           Stable   Unstable   Total

Males        40        20        60
Females      10        30        40
Total        50        50       100
ESTIMATION AND TEST OF HYPOTHESIS OF VARIANCES
Confidence Interval – One Variance:

Lower and Upper confidence limits are respectively given by $\dfrac{(n - 1)s^2}{\chi^2_{\alpha/2}}$ and $\dfrac{(n - 1)s^2}{\chi^2_{1 - \alpha/2}}$
Confidence Interval is given by $\left(\dfrac{(n - 1)s^2}{\chi^2_{\alpha/2}},\ \dfrac{(n - 1)s^2}{\chi^2_{1 - \alpha/2}}\right)$
Examples:
(1) Suppose that the refractive indices of 20 pieces of glass (randomly selected from a large shipment
purchased by the optical firm) have a variance of $1.20 \times 10^{-4}$. Construct a 95% confidence interval for $\sigma$, the
standard deviation of the population.
Here $n = 20$, $\nu = n - 1 = 19$ and $s^2 = 1.20 \times 10^{-4}$

Now, $\dfrac{\alpha}{2} = 0.025$ and $1 - \dfrac{\alpha}{2} = 0.975$
From the tables, $\chi^2_{\alpha/2} = \chi^2_{0.025} = 32.852$ and $\chi^2_{1 - \alpha/2} = \chi^2_{0.975} = 8.907$
Lower confidence limit for $\sigma^2$ is given by $\dfrac{(n - 1)s^2}{\chi^2_{\alpha/2}} = \dfrac{19 \times 1.20 \times 10^{-4}}{32.852} = 0.000069$
Upper confidence limit for $\sigma^2$ is given by $\dfrac{(n - 1)s^2}{\chi^2_{1 - \alpha/2}} = \dfrac{19 \times 1.20 \times 10^{-4}}{8.907} = 0.000256$
Therefore, the lower confidence limit for $\sigma$ is $\sqrt{0.000069} = 0.0083$

And the upper confidence limit for $\sigma$ is $\sqrt{0.000256} = 0.0160$
Hence the confidence interval for $\sigma$ is $(0.0083,\ 0.0160)$
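The example above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `variance_ci` and its parameter names are hypothetical, and the two chi-square values are passed in as read from the table ($\nu = 19$) instead of being computed.

```python
from math import sqrt


def variance_ci(n, s2, chi2_alpha_half, chi2_one_minus_alpha_half):
    """Confidence interval for sigma^2 given the two tabulated chi-square values."""
    lower = (n - 1) * s2 / chi2_alpha_half             # divide by the larger table value
    upper = (n - 1) * s2 / chi2_one_minus_alpha_half   # divide by the smaller table value
    return lower, upper


var_lo, var_hi = variance_ci(20, 1.20e-4, 32.852, 8.907)
sd_lo, sd_hi = sqrt(var_lo), sqrt(var_hi)   # limits for sigma are the square roots
print(f"sigma^2: ({var_lo:.6f}, {var_hi:.6f})")
print(f"sigma:   ({sd_lo:.4f}, {sd_hi:.4f})")
```

Taking square roots preserves the order of the limits, which is why the interval for $\sigma$ follows directly from the interval for $\sigma^2$.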
Test of Hypothesis – One Variance:
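(1) Null Hypothesis $H_0: \sigma^2 = \sigma_0^2$

(2) Alternative Hypothesis $H_1$: Any one of the following: $\sigma^2 < \sigma_0^2$, $\sigma^2 > \sigma_0^2$, $\sigma^2 \neq \sigma_0^2$
(3) Level of significance : $\alpha$
(4) Test statistic : $\chi^2 = \dfrac{(n - 1)S^2}{\sigma_0^2}$, $\nu = n - 1$
(5) Criterion :
    H1                              Reject H0 if
    $\sigma^2 < \sigma_0^2$         $\chi^2 < \chi^2_{1 - \alpha}$
    $\sigma^2 > \sigma_0^2$         $\chi^2 > \chi^2_{\alpha}$
    $\sigma^2 \neq \sigma_0^2$      $\chi^2 < \chi^2_{1 - \alpha/2}$ or $\chi^2 > \chi^2_{\alpha/2}$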
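The criterion table above can be written as a small decision helper. The sketch below is illustrative: the function names, the labels `"less"`, `"greater"` and `"two-sided"`, and the sample figures ($n = 20$, $s^2 = 1.5$, $\sigma_0^2 = 1.0$) are all hypothetical, and the critical values are supplied from the chi-square table.

```python
def chi_square_statistic(n, s2, sigma0_sq):
    """Test statistic (n - 1) S^2 / sigma_0^2 with nu = n - 1."""
    return (n - 1) * s2 / sigma0_sq


def reject_h0(chi2, alternative, lower_crit=None, upper_crit=None):
    """Apply the criterion table; critical values come from the chi-square table."""
    if alternative == "less":        # H1: sigma^2 < sigma_0^2, lower_crit = chi^2_{1-alpha}
        return chi2 < lower_crit
    if alternative == "greater":     # H1: sigma^2 > sigma_0^2, upper_crit = chi^2_{alpha}
        return chi2 > upper_crit
    if alternative == "two-sided":   # lower_crit = chi^2_{1-alpha/2}, upper_crit = chi^2_{alpha/2}
        return chi2 < lower_crit or chi2 > upper_crit
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")


# Hypothetical data: n = 20, s^2 = 1.5, H0: sigma^2 = 1.0 against H1: sigma^2 > 1.0, alpha = 0.05.
chi2 = chi_square_statistic(20, 1.5, 1.0)
decision = reject_h0(chi2, "greater", upper_crit=30.144)  # chi^2_{0.05} for nu = 19
print(f"chi2 = {chi2:.2f}, reject H0: {decision}")
```

Here $\chi^2 = 28.5$ does not exceed 30.144, so $H_0$ is not rejected for this hypothetical sample.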
Test of Hypothesis – Two Variances:
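(1) Null Hypothesis $H_0: \sigma_1^2 = \sigma_2^2$

(2) Alternative Hypothesis $H_1$: Any one of the following: $\sigma_1^2 < \sigma_2^2$, $\sigma_1^2 > \sigma_2^2$, $\sigma_1^2 \neq \sigma_2^2$
(3) Level of significance : $\alpha$
(4) Test statistic & Criterion :
    H1                                Test Statistic                                      Reject H0 if

    $\sigma_1^2 < \sigma_2^2$         $F = \dfrac{S_2^2}{S_1^2}$                           $F > F_{\alpha}(n_2 - 1,\ n_1 - 1)$
    $\sigma_1^2 > \sigma_2^2$         $F = \dfrac{S_1^2}{S_2^2}$                           $F > F_{\alpha}(n_1 - 1,\ n_2 - 1)$
    $\sigma_1^2 \neq \sigma_2^2$      If $S_1^2 < S_2^2$ then $F = \dfrac{S_2^2}{S_1^2}$   $F > F_{\alpha/2}(n_2 - 1,\ n_1 - 1)$
                                      If $S_1^2 > S_2^2$ then $F = \dfrac{S_1^2}{S_2^2}$   $F > F_{\alpha/2}(n_1 - 1,\ n_2 - 1)$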
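Choosing the numerator and the degrees of freedom is the step most easily mixed up in this test, so the table above is sketched here as a function. The name `f_statistic`, the alternative labels and the sample values are hypothetical; the critical value $F_{\alpha}$ or $F_{\alpha/2}$ for the returned degrees of freedom must still be looked up in the F table.

```python
def f_statistic(s1_sq, n1, s2_sq, n2, alternative):
    """Return (F, numerator df, denominator df) following the criterion table."""
    if alternative == "less":         # H1: sigma1^2 < sigma2^2
        return s2_sq / s1_sq, n2 - 1, n1 - 1
    if alternative == "greater":      # H1: sigma1^2 > sigma2^2
        return s1_sq / s2_sq, n1 - 1, n2 - 1
    if alternative == "two-sided":    # larger sample variance goes in the numerator
        if s1_sq >= s2_sq:
            return s1_sq / s2_sq, n1 - 1, n2 - 1
        return s2_sq / s1_sq, n2 - 1, n1 - 1
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")


# Hypothetical samples: s1^2 = 4.0 with n1 = 10, s2^2 = 9.0 with n2 = 16.
f_value, df_num, df_den = f_statistic(4.0, 10, 9.0, 16, "two-sided")
print(f"F = {f_value:.2f} with ({df_num}, {df_den}) degrees of freedom")
```

For the two-sided alternative the statistic is never below 1, which is why only the upper critical value $F_{\alpha/2}$ is needed.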
Exercise:
(1) If 12 determinations of the specific heat of iron have a standard deviation of 0.0086, test the null hypothesis
  0.01 for such determinations. Use the alternative hypothesis   0.01 and the level of significance
  0.01
(2) The security department of a large office building wants to test the null hypothesis that  = 2.0 minutes for
the time it takes a guard to walk his round against the alternative hypothesis that   2.0 minutes. What can it
conclude at the 0.01 level of significance if a random sample of size n = 30 yields s = 1.8 minutes?
(3) A random sample of 6 steel beams has a mean compressive strength of 58,392 psi with standard
deviation 648 psi. Use this information and the level of significance 0.05, test the null hypothesis
  600 psi against the alternative hypothesis   600
(4) It is desired to determine whether there is less variability in the silver plating done by company 1 than in that
done by company 2. If independent random samples of size 12 of the two companies' work yield $s_1 = 0.035$ mil
and $s_2 = 0.062$ mil, test the null hypothesis $\sigma_1^2 = \sigma_2^2$ against the alternative hypothesis $\sigma_1^2 < \sigma_2^2$ at the 0.05
level of significance.
(5) The following random samples are measurements of the heat-producing capacity (in millions of
calories per ton) of specimens of coal from two mines
Mine 1    8,260   8,130   8,350   8,070   8,340

Mine 2    7,950   7,890   7,900   8,140   7,920   7,840
Use the 0.02 level of significance to test whether it is reasonable to assume that the variances of the two
populations are equal.
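Maximum Error & Confidence Interval with probability $1 - \alpha$
S.No.  Parameter                      Maximum Error                                                                                  Confidence Interval

1      One Mean, Large Samples        $E = z_{\alpha/2}\left(\dfrac{\sigma}{\sqrt{n}}\right)$                                         $(\bar{x} - E,\ \bar{x} + E)$
2      One Mean, Small Samples        $E = t_{\alpha/2}\left(\dfrac{s}{\sqrt{n}}\right)$                                              $(\bar{x} - E,\ \bar{x} + E)$
3      Two Means, Large Samples       $E = z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$                      $((\bar{x}_1 - \bar{x}_2) - E,\ (\bar{x}_1 - \bar{x}_2) + E)$
4      Two Means, Small Samples       $E = t_{\alpha/2}\sqrt{\dfrac{\sigma_p^2}{n_1} + \dfrac{\sigma_p^2}{n_2}}$                      $((\bar{x}_1 - \bar{x}_2) - E,\ (\bar{x}_1 - \bar{x}_2) + E)$
5      One Proportion                 $E = z_{\alpha/2}\sqrt{\dfrac{\frac{x}{n}\left(1 - \frac{x}{n}\right)}{n}}$                     $\left(\dfrac{x}{n} - E,\ \dfrac{x}{n} + E\right)$
6      Two Proportions                $E = z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n_1} + \dfrac{\hat{p}(1 - \hat{p})}{n_2}}$  $\left(\left(\dfrac{x_1}{n_1} - \dfrac{x_2}{n_2}\right) - E,\ \left(\dfrac{x_1}{n_1} - \dfrac{x_2}{n_2}\right) + E\right)$
7      One Variance                                                                                                                  $\left(\dfrac{(n - 1)s^2}{\chi^2_{\alpha/2}},\ \dfrac{(n - 1)s^2}{\chi^2_{1 - \alpha/2}}\right)$
In row 4, $\sigma_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ or $\sigma_p^2 = \dfrac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}$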
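Test statistics and Critical regions for tests of Hypotheses

S.No  Test of Hypothesis          Test Statistic                                                                                                   H1                              Reject H0 if

1     One Mean,                   $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$                                                                    $\mu < \mu_0$                   $Z < -z_{\alpha}$
      Large Samples                                                                                                                                $\mu > \mu_0$                   $Z > z_{\alpha}$
                                                                                                                                                   $\mu \neq \mu_0$                $|Z| > z_{\alpha/2}$
2     One Mean,                   $t = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$, $\nu = n - 1$                                                          $\mu < \mu_0$                   $t < -t_{\alpha}$
      Small Samples                                                                                                                                $\mu > \mu_0$                   $t > t_{\alpha}$
                                                                                                                                                   $\mu \neq \mu_0$                $|t| > t_{\alpha/2}$
3     Two Means,                  $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - \delta}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$            $\mu_1 - \mu_2 < \delta$        $Z < -z_{\alpha}$
      Large Samples                                                                                                                                $\mu_1 - \mu_2 > \delta$        $Z > z_{\alpha}$
                                                                                                                                                   $\mu_1 - \mu_2 \neq \delta$     $|Z| > z_{\alpha/2}$
4     Two Means,                  $t = \dfrac{(\bar{X}_1 - \bar{X}_2) - \delta}{\sqrt{\frac{\sigma_p^2}{n_1} + \frac{\sigma_p^2}{n_2}}}$,           $\mu_1 - \mu_2 < \delta$        $t < -t_{\alpha}$
      Small Samples               $\nu = n_1 + n_2 - 2$                                                                                            $\mu_1 - \mu_2 > \delta$        $t > t_{\alpha}$
                                                                                                                                                   $\mu_1 - \mu_2 \neq \delta$     $|t| > t_{\alpha/2}$
      where, $\sigma_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ or $\sigma_p^2 = \dfrac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}$
5     One Proportion              $Z = \dfrac{x - n p_0}{\sqrt{n p_0 (1 - p_0)}}$ or $Z = \dfrac{\frac{x}{n} - p_0}{\sqrt{\frac{p_0 (1 - p_0)}{n}}}$ $p < p_0$                       $Z < -z_{\alpha}$
                                                                                                                                                   $p > p_0$                       $Z > z_{\alpha}$
                                                                                                                                                   $p \neq p_0$                    $|Z| > z_{\alpha/2}$
6     Two Proportions             $Z = \dfrac{\frac{x_1}{n_1} - \frac{x_2}{n_2}}{\sqrt{\frac{\hat{p}(1 - \hat{p})}{n_1} + \frac{\hat{p}(1 - \hat{p})}{n_2}}}$ $p_1 - p_2 < 0$       $Z < -z_{\alpha}$
                                  where, $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$                                                                   $p_1 - p_2 > 0$                 $Z > z_{\alpha}$
                                                                                                                                                   $p_1 - p_2 \neq 0$              $|Z| > z_{\alpha/2}$
7     Several Proportions         $\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{k} \dfrac{(O_{ij} - e_{ij})^2}{e_{ij}}$                                      $H_0$: $p_1, p_2, p_3, \ldots, p_k$ are equal      $\chi^2 > \chi^2_{\alpha}$
                                  with $\nu = k - 1$                                                                                               $H_1$: $p_1, p_2, p_3, \ldots, p_k$ are not all equal
8     One Variance                $\chi^2 = \dfrac{(n - 1)S^2}{\sigma_0^2}$                                                                         $\sigma^2 < \sigma_0^2$         $\chi^2 < \chi^2_{1 - \alpha}$
                                  with $\nu = n - 1$                                                                                               $\sigma^2 > \sigma_0^2$         $\chi^2 > \chi^2_{\alpha}$
                                                                                                                                                   $\sigma^2 \neq \sigma_0^2$      $\chi^2 < \chi^2_{1 - \alpha/2}$ or $\chi^2 > \chi^2_{\alpha/2}$
9     Two Variances               $F = \dfrac{s_2^2}{s_1^2}$ (provided $s_2^2 > s_1^2$), with $(\nu_2, \nu_1)$                                      $\sigma_1^2 < \sigma_2^2$       $F > F_{\alpha}(\nu_2, \nu_1)$
                                  $F = \dfrac{s_1^2}{s_2^2}$ (provided $s_1^2 > s_2^2$), with $(\nu_1, \nu_2)$                                      $\sigma_1^2 > \sigma_2^2$       $F > F_{\alpha}(\nu_1, \nu_2)$
                                  $F = \dfrac{s_M^2}{s_m^2}$ (provided $s_M^2 > s_m^2$), with $(\nu_M, \nu_m)$                                      $\sigma_1^2 \neq \sigma_2^2$    $F > F_{\alpha/2}(\nu_M, \nu_m)$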
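As a quick numerical illustration of the maximum-error formulas for one mean (large samples) and one proportion, the sketch below computes $E$ and the corresponding confidence intervals. The function names and the sample figures are hypothetical, and $z_{\alpha/2}$ is obtained from the standard library instead of the normal table.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(alpha):
    """z_{alpha/2}: the point with area alpha/2 to its right under the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def mean_ci_large_sample(x_bar, sigma, n, alpha):
    """(x_bar - E, x_bar + E) with E = z_{alpha/2} * sigma / sqrt(n)."""
    e = z_critical(alpha) * sigma / sqrt(n)
    return x_bar - e, x_bar + e


def proportion_ci(x, n, alpha):
    """(x/n - E, x/n + E) with E = z_{alpha/2} * sqrt((x/n)(1 - x/n) / n)."""
    p = x / n
    e = z_critical(alpha) * sqrt(p * (1 - p) / n)
    return p - e, p + e


# Hypothetical data: x_bar = 50, sigma = 10, n = 100; and 40 successes in 100 trials.
mean_interval = mean_ci_large_sample(50, 10, 100, alpha=0.05)
prop_interval = proportion_ci(40, 100, alpha=0.05)
print(f"mean:       ({mean_interval[0]:.2f}, {mean_interval[1]:.2f})")
print(f"proportion: ({prop_interval[0]:.3f}, {prop_interval[1]:.3f})")
```

For the small-sample rows the same structure applies with $t_{\alpha/2}$ in place of $z_{\alpha/2}$, but the t value has to come from the t table because the standard library has no t-distribution.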


Interval Estimation: Estimating a population parameter in terms of an interval is called Interval