© Ncert Not To Be Republished: Measures of Dispersion


CHAPTER

Measures of Dispersion

Studying this chapter should enable you to: know the limitations of averages; appreciate the need for measures of dispersion; enumerate various measures of dispersion; calculate the measures and compare them; and distinguish between absolute and relative measures.

1. INTRODUCTION
In the previous chapter, you studied how to sum up the data into a single representative value. However, that value does not reveal the variability present in the data. In this chapter you will study those measures which seek to quantify the variability of the data.

Three friends, Ram, Rahim and Maria, are chatting over a cup of tea. During the course of their conversation, they start talking about their family incomes. Ram tells them that there are four members in his family and the average income per member is Rs 15,000. Rahim says that the average income is the same in his family, though the number of members is six. Maria says that there are five members in her family, out of which one is not working. She calculates that the average income in her family, too, is Rs 15,000. They are a little surprised since they know that Maria's father is earning a huge salary. They go into details and gather the following data:


Family Incomes (Rs)
Sl. No.          Ram       Rahim     Maria
1.               12,000    7,000     0
2.               14,000    10,000    7,000
3.               16,000    14,000    8,000
4.               18,000    17,000    10,000
5.               –         20,000    50,000
6.               –         22,000    –
Total income     60,000    90,000    75,000
Average income   15,000    15,000    15,000

Do you notice that although the average is the same, there are considerable differences in individual incomes? It is quite obvious that averages try to tell only one aspect of a distribution, i.e., a representative size of the values. To understand it better, you need to know the spread of values also. You can see that in Ram's family, differences in incomes are comparatively lower. In Rahim's family, differences are higher, and in Maria's family they are the highest. Knowledge of only the average is insufficient. If you have another value which reflects the quantum of variation in values, your understanding of a distribution improves considerably. For example, per capita income gives only the average income. A measure of dispersion can tell you about income inequalities, thereby improving the understanding of the relative standards of living enjoyed by different strata of society.

Dispersion is the extent to which values in a distribution differ from the average of the distribution. To quantify the extent of the variation, there are certain measures, namely: (i) Range (ii) Quartile Deviation (iii) Mean Deviation (iv) Standard Deviation

Apart from these measures, which give a numerical value, there is a graphic method for estimating dispersion. Range and Quartile Deviation measure the dispersion by calculating the spread within which the values lie. Mean Deviation and Standard Deviation calculate the extent to which the values differ from the average.

2. MEASURES BASED UPON SPREAD OF VALUES

Range
Range (R) is the difference between the largest (L) and the smallest value (S) in a distribution. Thus,
$$R = L - S$$
A higher value of Range implies higher dispersion, and vice versa.
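The Family Incomes table gives a quick check of the definition. Below is a minimal Python sketch (our own illustration, not part of the chapter; the function name `value_range` is hypothetical) that computes the average and R = L - S for each family.

```python
# Incomes (Rs) of each family, copied from the Family Incomes table.
family_incomes = {
    "Ram": [12_000, 14_000, 16_000, 18_000],
    "Rahim": [7_000, 10_000, 14_000, 17_000, 20_000, 22_000],
    "Maria": [0, 7_000, 8_000, 10_000, 50_000],
}

def value_range(values):
    """Range R = L - S: largest value minus smallest value."""
    return max(values) - min(values)

for name, incomes in family_incomes.items():
    average = sum(incomes) / len(incomes)
    print(f"{name}: average = {average:,.0f}, range = {value_range(incomes):,}")
```

All three averages are Rs 15,000, while the ranges (6,000, 15,000 and 50,000) increase in the same order as the differences described for Ram's, Rahim's and Maria's families.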


STATISTICS FOR ECONOMICS

Activities
Look at the following values: 20, 30, 40, 50, 200.
Calculate the Range.
What is the Range if the value 200 is not present in the data set?
If 50 is replaced by 150, what will be the Range?

Range: Comments
Range is unduly affected by extreme values. It is not based on all the values: as long as the minimum and maximum values remain unaltered, any change in the other values does not affect the Range. It cannot be calculated for an open-ended frequency distribution.
Open-ended distributions are those in which either the lower limit of the lowest class or the upper limit of the highest class, or both, are not specified.
Notwithstanding some limitations, Range is understood and used frequently because of its simplicity. For example, we see the maximum and minimum temperatures of different cities almost daily on our TV screens and form judgments about the temperature variations in them.

Activity
Collect data about the 52-week high/low of 10 shares from a newspaper. Calculate the range of share prices. Which stock is the most volatile and which is the most stable?

Quartile Deviation
The presence of even one extremely high or low value in a distribution can reduce the utility of Range as a measure of dispersion. Thus, you may need a measure which is not unduly affected by outliers. In such a situation, if the entire data is divided into four equal parts, each containing 25% of the values, we get the values of the Quartiles and the Median. (You have already read about these in Chapter 5.) The upper and lower quartiles (Q3 and Q1, respectively) are used to calculate the Inter-Quartile Range, which is Q3 - Q1. The Inter-Quartile Range is based upon the middle 50% of the values in a distribution and is, therefore, not affected by extreme values. Half of the Inter-Quartile Range is called the Quartile Deviation. Thus:
$$\text{Q.D.} = \frac{Q_3 - Q_1}{2}$$
Q.D. is therefore also called the Semi-Inter-Quartile Range.

Example 1
Calculation of Range and Q.D. for ungrouped data
Calculate Range and Q.D. of the following observations:
20, 25, 29, 30, 35, 39, 41, 48, 51, 60 and 70
Range is clearly 70 - 20 = 50.
For Q.D., we need to calculate the values of Q3 and Q1.
Q1 is the size of the $\left(\frac{n+1}{4}\right)$th value. With n = 11, Q1 is the size of the 3rd value. As the values are already arranged in ascending order, it can be seen that Q1, the 3rd value, is 29. [What will you do if these values are not in order?]
Similarly, Q3 is the size of the $\left(\frac{3(n+1)}{4}\right)$th value, i.e., the 9th value, which is 51. Hence Q3 = 51, and
$$\text{Q.D.} = \frac{Q_3 - Q_1}{2} = \frac{51 - 29}{2} = 11$$
Do you notice that Q.D. is the average difference of the Quartiles from the median?

Activity
Calculate the median and check whether the above statement is correct.

Example 2
Calculation of Range and Q.D. for a frequency distribution
For the following distribution of marks scored by a class of 40 students, calculate the Range and Q.D.

TABLE 6.1
Class intervals (CI)   No. of students (f)
0–10                   5
10–20                  8
20–40                  16
40–60                  7
60–90                  4
Total                  40

Range is just the difference between the upper limit of the highest class and the lower limit of the lowest class. So Range is 90 - 0 = 90. For Q.D., first calculate cumulative frequencies as follows:

Class intervals (CI)   Frequencies (f)   Cumulative frequencies (c.f.)
0–10                   5                 5
10–20                  8                 13
20–40                  16                29
40–60                  7                 36
60–90                  4                 40
n = 40

Q1 is the size of the $\left(\frac{n}{4}\right)$th value in a continuous series. Thus it is the size of the 10th value. The class containing the 10th value is 10–20. Hence Q1 lies in class 10–20. Now, to calculate the exact value of Q1, the following formula is used:
$$Q_1 = L + \frac{\frac{n}{4} - c.f.}{f} \times i$$
where L = 10 (lower limit of the relevant Quartile class), c.f. = 5 (value of c.f. for the class preceding the Quartile class), i = 10 (interval of the Quartile class), and f = 8 (frequency of the Quartile class). Thus,
$$Q_1 = 10 + \frac{10 - 5}{8} \times 10 = 16.25$$

Similarly, Q3 is the size of the $\left(\frac{3n}{4}\right)$th value, i.e., the 30th value, which lies in class 40–60. Now, using the formula for Q3, its value can be calculated as follows:
$$Q_3 = L + \frac{\frac{3n}{4} - c.f.}{f} \times i$$
$$Q_3 = 40 + \frac{30 - 29}{7} \times 20 = 42.857 \text{ (approx.)}$$
$$\text{Q.D.} = \frac{42.857 - 16.25}{2} = 13.30 \text{ (approx.)}$$
In individual and discrete series, Q1 is the size of the $\left(\frac{n+1}{4}\right)$th value, but in a continuous distribution, it is the size of the $\left(\frac{n}{4}\right)$th value. Similarly, for Q3 and the median also, n is used in place of n + 1.

If the entire group is divided into two equal halves and the median calculated for each half, you will have the median of better students and the median of weak students. These medians differ from the median of the entire group by 13.30 on average. Similarly, suppose you have data about the incomes of people of a town. The median income of all people can be calculated. Now, if all people are divided into two equal groups of rich and poor, the medians of both groups can be calculated. Quartile Deviation will tell you the average difference between the medians of these two groups, belonging to rich and poor, from the median of the entire group. Quartile Deviation can generally be calculated for open-ended distributions and is not unduly affected by extreme values.
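The two quartile procedures worked through in Examples 1 and 2 can be sketched in Python as follows. This is our own illustration: the function names are hypothetical, and the linear interpolation for a fractional position in ungrouped data is an assumption (Example 1 never needs it). The sketch also assumes at least three observations.

```python
def quartile_individual(values, k):
    """Q_k (k = 1 or 3) for individual observations: the k(n+1)/4-th value.

    Assumption: a fractional position is interpolated linearly between the
    two neighbouring ordered values. Requires n >= 3.
    """
    ordered = sorted(values)
    pos = k * (len(ordered) + 1) / 4      # 1-based position of Q_k
    whole = int(pos)
    frac = pos - whole
    if frac == 0:
        return ordered[whole - 1]
    return ordered[whole - 1] + frac * (ordered[whole] - ordered[whole - 1])


def quartile_grouped(classes, freqs, k):
    """Q_k = L + (k*n/4 - c.f.) / f * i for a continuous series."""
    n = sum(freqs)
    target = k * n / 4                    # uses n, not n + 1
    cum = 0                               # c.f. of classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if cum + f >= target:             # quartile class found
            return lower + (target - cum) / f * (upper - lower)
        cum += f
    raise ValueError("position exceeds total frequency")


# Example 1: ungrouped observations
obs = [20, 25, 29, 30, 35, 39, 41, 48, 51, 60, 70]
q1, q3 = quartile_individual(obs, 1), quartile_individual(obs, 3)
print("Example 1:", q1, q3, (q3 - q1) / 2)

# Example 2 / Table 6.1: marks of 40 students
classes = [(0, 10), (10, 20), (20, 40), (40, 60), (60, 90)]
students = [5, 8, 16, 7, 4]
g1, g3 = quartile_grouped(classes, students, 1), quartile_grouped(classes, students, 3)
print("Example 2:", round(g1, 3), round(g3, 3), round((g3 - g1) / 2, 3))
```

The grouped version walks the cumulative frequencies exactly as the c.f. column is used in the text, stopping at the first class whose c.f. reaches the required position.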

3. MEASURES OF DISPERSION FROM AVERAGE

Recall that dispersion was defined as the extent to which values differ from their average. Range and Quartile Deviation do not attempt to calculate how far the values are from their average. Yet, by calculating the spread of values, they do give a good idea about the dispersion. Two measures which are based upon deviations of the values from their average are Mean Deviation and Standard Deviation.
Since the average is a central value, some deviations are positive and some are negative. If these are added as they are, the sum will not reveal anything. In fact, the sum of deviations from the Arithmetic Mean is always zero. Look at the following two sets of values.
Set A: 5, 9, 16
Set B: 1, 9, 20
You can see that values in Set B are farther from the average and hence more dispersed than values in Set A. Calculate the deviations from the Arithmetic Mean and sum them up. What do you notice? Repeat the same with the Median. Can you comment upon the quantum of variation from the calculated values?
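The claim that deviations from the Arithmetic Mean sum to zero can be checked with a short Python sketch (our own illustration; `deviations` is a hypothetical helper). It handles only the mean part of the exercise and leaves the median part to the reader. It also prints the sum of absolute deviations, which does separate the two sets.

```python
set_a = [5, 9, 16]
set_b = [1, 9, 20]

def deviations(values, centre):
    """Signed deviations of each value from a chosen centre."""
    return [x - centre for x in values]

for name, values in [("Set A", set_a), ("Set B", set_b)]:
    mean = sum(values) / len(values)
    devs = deviations(values, mean)
    # Signed deviations cancel out; absolute deviations do not.
    print(name, "mean:", mean, "sum of deviations:", sum(devs),
          "sum of |deviations|:", sum(abs(d) for d in devs))
```

Both sets have mean 10 and a zero sum of deviations, yet the absolute deviations total 12 for Set A and 20 for Set B, which is the idea Mean Deviation builds on.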


Mean Deviation tries to overcome this problem by ignoring the signs of deviations, i.e., it considers all deviations positive. For standard deviation, the deviations are first squared and averaged, and then the square root of the average is found. We shall now discuss them separately in detail.

Mean Deviation
Suppose a college is proposed for students of five towns A, B, C, D and E, which lie in that order along a road. Distances of towns in kilometres from town A and the number of students in these towns are given below:
Town                        A    B     C     D     E    Total
Distance from town A (km)   0    2     6     14    18
No. of students             90   150   100   200   80   620
Now, if the college is situated in town A, 150 students from town B will have to travel 2 kilometres each (a total of 300 kilometres) to reach the college. The objective is to find a location so that the average distance travelled by students is minimum. You may observe that the students will have to travel more, on average, if the college is situated at town A or E. If, on the other hand, it is somewhere in the middle, they are likely to travel less. The average distance travelled is calculated by Mean Deviation, which is simply the arithmetic mean of the absolute differences of the values from their average. The average used is either the arithmetic mean or the median. (Since the mode is not a stable average, it is not used to calculate Mean Deviation.)

Activities
Calculate the total distance to be travelled by students if the college is situated at town A, at town C or at town E, and also if it is exactly half way between A and E. Decide where, in your opinion, the college should be established, if there is only one student in each town. Does it change your answer?
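To check your answers to the activity, here is a minimal Python sketch (our own illustration; `total_distance` and `average_distance` are hypothetical names). The average distance per student is the mean deviation of the students' positions about the chosen site.

```python
# Distance from town A (km) and number of students, from the table above.
towns = {"A": (0, 90), "B": (2, 150), "C": (6, 100), "D": (14, 200), "E": (18, 80)}

def total_distance(site_km, towns):
    """Total km travelled by all students if the college is site_km from town A."""
    return sum(students * abs(dist - site_km) for dist, students in towns.values())

def average_distance(site_km, towns):
    """Average km per student, i.e. the mean deviation about site_km."""
    n = sum(students for _, students in towns.values())
    return total_distance(site_km, towns) / n

candidates = {"town A": 0, "town C": 6, "town E": 18, "half way between A and E": 9}
for label, site in candidates.items():
    print(f"{label}: total {total_distance(site, towns)} km, "
          f"average {average_distance(site, towns):.2f} km")
```

Trying every town as a site shows that the total is smallest at town C, which is also where the median student (the 310th of 620, counting from town A) lives. This anticipates the later remark that Mean Deviation is least when taken from the median.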

Calculation of Mean Deviation from Arithmetic Mean for ungrouped data
Direct Method
Steps:
(i) The A.M. of the values is calculated.
(ii) The difference between each value and the A.M. is calculated. All differences are considered positive. These are denoted as |d|.
(iii) The A.M. of these differences (called deviations) is the Mean Deviation, i.e.,
$$\text{M.D.} = \frac{\sum |d|}{n}$$

Example 3
Calculate the Mean Deviation of the following values: 2, 4, 7, 8 and 9.
The A.M. = $\frac{\sum X}{n} = \frac{30}{5} = 6$
X       |d|
2       4
4       2
7       1
8       2
9       3
Total   30    12
$$\text{M.D.}(\bar{x}) = \frac{12}{5} = 2.4$$
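The three steps of the direct method map onto a few lines of Python. This is a minimal sketch of our own (the name `mean_deviation` is hypothetical); passing a different `centre` lets the same function measure deviations from the median later on.

```python
def mean_deviation(values, centre=None):
    """M.D. = sum(|X - centre|) / n; centre defaults to the arithmetic mean."""
    n = len(values)
    if centre is None:
        centre = sum(values) / n                    # step (i): the A.M.
    abs_devs = [abs(x - centre) for x in values]    # step (ii): |d|
    return sum(abs_devs) / n                        # step (iii): A.M. of |d|

values = [2, 4, 7, 8, 9]
print(mean_deviation(values))   # 2.4 (Example 3)
```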

Assumed Mean Method
Mean Deviation can also be calculated by calculating deviations from an assumed mean. This method is adopted especially when the actual mean is a fractional number. (Take care that the assumed mean is close to the true mean.) In such cases, the following formula is used:
$$\text{M.D.}(\bar{x}) = \frac{\sum |d| + (\bar{x} - A_{\bar{x}})(\sum f_B - \sum f_A)}{n}$$
where $\sum |d|$ is the sum of absolute deviations taken from the assumed mean, $\bar{x}$ is the actual mean, $A_{\bar{x}}$ is the assumed mean used to calculate deviations, $\sum f_B$ is the number of values below the actual mean, including the actual mean, and $\sum f_A$ is the number of values above the actual mean.

For the values in Example 3, suppose the value 7 is taken as the assumed mean. M.D. can be calculated as under:
Example 4
X       |d| = |X - 7|
2       5
4       3
7       0
8       1
9       2
Total   11
Here $\bar{x} = 6$, $\sum f_B = 2$ (the values 2 and 4) and $\sum f_A = 3$ (the values 7, 8 and 9). Substituting the values in the above formula:
$$\text{M.D.}(\bar{x}) = \frac{11 + (6 - 7)(2 - 3)}{5} = \frac{12}{5} = 2.4$$

Mean Deviation from median for ungrouped data
Direct Method
Using the values in Example 3, M.D. from the Median can be calculated as follows:
(i) Calculate the median, which is 7.
(ii) Calculate the absolute deviations from the median; denote them as |d|.
(iii) Find the average of these absolute deviations. It is the Mean Deviation.
Example 5
X       |d| = |X - Median|
2       5
4       3
7       0
8       1
9       2
Total   11
M.D. from the Median is thus,
$$\text{M.D.}(\text{Median}) = \frac{\sum |d|}{n} = \frac{11}{5} = 2.2$$
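Examples 4 and 5 can be reproduced with the sketch below (our own illustration; the function names are hypothetical). One caveat that we derived from the formula, not stated in the chapter: the assumed-mean short-cut is exact only when no value lies strictly between the assumed mean and the actual mean, which is why the assumed mean should be kept close to the true mean. The last line shows the formula going wrong for A = 9, because 7 and 8 lie between 6 and 9.

```python
from statistics import median

def md_assumed_mean(values, assumed):
    """Short-cut M.D. about the mean, with deviations taken from an assumed mean A:
    [sum|X - A| + (mean - A)(f_B - f_A)] / n, where f_B counts values <= mean
    and f_A counts values > mean."""
    n = len(values)
    mean = sum(values) / n
    sum_abs_d = sum(abs(x - assumed) for x in values)
    f_b = sum(1 for x in values if x <= mean)
    f_a = n - f_b
    return (sum_abs_d + (mean - assumed) * (f_b - f_a)) / n

def md_about_median(values):
    """Direct method: mean of |X - Median|."""
    med = median(values)
    return sum(abs(x - med) for x in values) / len(values)

values = [2, 4, 7, 8, 9]
print(md_assumed_mean(values, 7))   # Example 4: assumed mean 7
print(md_about_median(values))      # Example 5
print(md_assumed_mean(values, 9))   # not 2.4: values 7 and 8 lie between 6 and 9
```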

Short-cut method
To calculate Mean Deviation from the median by the short-cut method, a value (A) is used to calculate the deviations and the following formula is applied:
$$\text{M.D.}(\text{Median}) = \frac{\sum |d| + (\text{Median} - A)(\sum f_B - \sum f_A)}{n}$$
where A = the constant from which deviations are calculated. (Other notations are the same as given in the assumed mean method, with the Median in place of the actual mean.)

Mean Deviation from Mean for Continuous distribution
TABLE 6.2 Profits of companies
Profits (Rs in lakhs)   Number of companies (frequencies)
10–20                   5
20–30                   8
30–50                   16
50–70                   8
70–80                   3
Total                   40

Steps:
(i) Calculate the mean of the distribution.
(ii) Calculate the absolute deviations |d| of the class mid-points from the mean.
(iii) Multiply each |d| value by its corresponding frequency to get f|d| values. Sum them up to get Σf|d|.
(iv) Apply the following formula:
$$\text{M.D.}(\bar{x}) = \frac{\sum f|d|}{\sum f}$$

Mean Deviation of the distribution in Table 6.2 can be calculated as follows:
Example 6
C.I.     f     m.p.   |d|     f|d|
10–20    5     15     25.5    127.5
20–30    8     25     15.5    124.0
30–50    16    40     0.5     8.0
50–70    8     60     19.5    156.0
70–80    3     75     34.5    103.5
Total    40                   519.0
Here the mean is $\frac{\sum fm}{\sum f} = \frac{1620}{40} = 40.5$, so |d| = |m.p. - 40.5|.
$$\text{M.D.}(\bar{x}) = \frac{\sum f|d|}{\sum f} = \frac{519}{40} = 12.975$$

Mean Deviation from Median
The procedure to calculate Mean Deviation from the median is the same as in the case of M.D. from Mean, except that the deviations are taken from the median. Consider the distribution in Table 6.3.
TABLE 6.3
Class intervals   Frequencies
20–30             5
30–40             10
40–60             20
60–80             9
80–90             6
Total             50
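The grouped-data calculation can be sketched in Python as follows (our own illustration; all function and variable names are hypothetical). It applies the four steps to Table 6.2 with the mean as centre, as in Example 6, and to Table 6.3 with the median as centre, as in Example 7, which follows. The median uses the usual interpolation formula for a continuous series.

```python
def midpoints(classes):
    """Class mid-points m = (lower + upper) / 2."""
    return [(lo + hi) / 2 for lo, hi in classes]

def grouped_mean(classes, freqs):
    """Arithmetic mean of a grouped series: sum(f*m) / sum(f)."""
    return sum(f * m for f, m in zip(freqs, midpoints(classes))) / sum(freqs)

def grouped_median(classes, freqs):
    """Median = L + (n/2 - c.f.) / f * i for a continuous series."""
    n = sum(freqs)
    cum = 0
    for (lo, hi), f in zip(classes, freqs):
        if cum + f >= n / 2:              # median class reached
            return lo + (n / 2 - cum) / f * (hi - lo)
        cum += f
    raise ValueError("no classes given")

def grouped_mean_deviation(classes, freqs, centre):
    """M.D. = sum(f * |m - centre|) / sum(f)."""
    return sum(f * abs(m - centre)
               for f, m in zip(freqs, midpoints(classes))) / sum(freqs)

table_6_2 = ([(10, 20), (20, 30), (30, 50), (50, 70), (70, 80)], [5, 8, 16, 8, 3])
table_6_3 = ([(20, 30), (30, 40), (40, 60), (60, 80), (80, 90)], [5, 10, 20, 9, 6])

mean_6_2 = grouped_mean(*table_6_2)        # 40.5
median_6_3 = grouped_median(*table_6_3)    # 50.0
print("Example 6:", grouped_mean_deviation(*table_6_2, mean_6_2))
print("Example 7:", grouped_mean_deviation(*table_6_3, median_6_3))
```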

Example 7
Here n = 50, so the median is the size of the 25th value, which lies in class 40–60 (c.f. of the preceding class = 15). Hence Median = 40 + (25 - 15)/20 × 20 = 50, and the deviations are taken from 50.
C.I.     f     m.p.   |d|    f|d|
20–30    5     25     25     125
30–40    10    35     15     150
40–60    20    50     0      0
60–80    9     70     20     180
80–90    6     85     35     210
Total    50                  665
$$\text{M.D.}(\text{Median}) = \frac{\sum f|d|}{\sum f} = \frac{665}{50} = 13.3$$

Mean Deviation: Comments
Mean Deviation is based on all values. A change in even one value will affect it. It is least when calculated from the median, i.e., it will be higher (never lower) if calculated from the mean. However, it ignores the signs of deviations and cannot be calculated for open-ended distributions.

Standard Deviation
Standard Deviation is the positive square root of the mean of squared deviations from the mean. So if there are five values x1, x2, x3, x4 and x5, first their mean is calculated. Then deviations of the values from the mean are calculated. These deviations are then squared. The mean of these squared deviations is the variance. The positive square root of the variance is the standard deviation. (Note that Standard Deviation is calculated on the basis of the mean only.)

Calculation of Standard Deviation for ungrouped data
Four alternative methods are available for the calculation of standard deviation of individual values. All these methods result in the same value of standard deviation. These are: (i) Actual Mean Method (ii) Assumed Mean Method (iii) Direct Method (iv) Step-Deviation Method

Actual Mean Method
Suppose you have to calculate the standard deviation of the following values: 5, 10, 25, 30, 50
Example 8
X       d      d²
5       -19    361
10      -14    196
25      +1     1
30      +6     36
50      +26    676
Total   0      1270
The following formula is used:
$$\sigma = \sqrt{\frac{\sum d^2}{n}}$$
$$\sigma = \sqrt{\frac{1270}{5}} = \sqrt{254} = 15.937$$
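A minimal Python sketch (our own illustration, with hypothetical function names) of the actual mean method used in Example 8. For comparison it also includes the assumed-mean, direct and step-deviation formulas described next, all of which should return the same value. The last line illustrates the chapter's later remark that Standard Deviation is independent of origin but not of scale.

```python
import math

def sd_actual_mean(xs):
    """sigma = sqrt(sum(d^2) / n), d = X - mean (Example 8)."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)

def sd_assumed_mean(xs, a):
    """sigma = sqrt(sum(d^2)/n - (sum(d)/n)^2), d = X - A."""
    n = len(xs)
    d = [x - a for x in xs]
    return math.sqrt(sum(v * v for v in d) / n - (sum(d) / n) ** 2)

def sd_direct(xs):
    """sigma = sqrt(sum(X^2)/n - mean^2): deviations taken from zero."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum(x * x for x in xs) / n - mean ** 2)

def sd_step_deviation(xs, a, c):
    """sigma = c * sqrt(sum(d'^2)/n - (sum(d')/n)^2), d' = (X - A) / c.
    a = 0 is equivalent to dividing the values themselves by c."""
    n = len(xs)
    d = [(x - a) / c for x in xs]
    return c * math.sqrt(sum(v * v for v in d) / n - (sum(d) / n) ** 2)

values = [5, 10, 25, 30, 50]
results = {
    "actual mean": sd_actual_mean(values),
    "assumed mean (A = 25)": sd_assumed_mean(values, 25),
    "direct": sd_direct(values),
    "step deviation (values / 5)": sd_step_deviation(values, 0, 5),
    "step deviation ((X - 25) / 5)": sd_step_deviation(values, 25, 5),
}
for method, sigma in results.items():
    print(f"{method}: {sigma:.3f}")

# Origin and scale: adding a constant leaves sigma unchanged; doubling doubles it.
print(round(sd_actual_mean([x + 100 for x in values]), 3),
      round(sd_actual_mean([x * 2 for x in values]), 3))
```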

Do you notice the value from which deviations have been calculated in the above example? Is it the Actual Mean?

Assumed Mean Method
For the same values, deviations may be calculated from any arbitrary value $A_{\bar{x}}$ such that $d = X - A_{\bar{x}}$. Taking $A_{\bar{x}} = 25$, the computation of the standard deviation is shown below:
Example 9
X       d      d²
5       -20    400
10      -15    225
25      0      0
30      +5     25
50      +25    625
Total   -5     1275
The sum of deviations from a value other than the actual mean is not equal to zero.
The following formula is used:
$$\sigma = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2}$$
$$\sigma = \sqrt{\frac{1275}{5} - \left(\frac{-5}{5}\right)^2} = \sqrt{254} = 15.937$$
Standard Deviation is not affected by the value of the constant from which deviations are calculated. The value of the constant does not figure in the standard deviation formula. Thus, Standard Deviation is independent of origin.

Direct Method
Standard Deviation can also be calculated from the values directly, i.e., without taking deviations (this amounts to taking deviations from zero), as shown below:
Example 10
X       X²
5       25
10      100
25      625
30      900
50      2500
Total   120    4150
The following formula is used:
$$\sigma = \sqrt{\frac{\sum X^2}{n} - \bar{X}^2}$$
where $\bar{X} = \frac{120}{5} = 24$. Substituting the values,
$$\sigma = \sqrt{\frac{4150}{5} - (24)^2} = \sqrt{254} = 15.937$$

Step-deviation Method
If the values are divisible by a common factor, they can be so divided and standard deviation can be calculated from the resultant values as follows:
Example 11
Since all the five values are divisible by a common factor 5, we divide and get the following values:
x       x'     d       d²
5       1      -3.8    14.44
10      2      -2.8    7.84
25      5      +0.2    0.04
30      6      +1.2    1.44
50      10     +5.2    27.04
Total          24      0       50.80
Here $x' = \frac{x}{c}$, where c = common factor, and $d = x' - \bar{x}'$ with $\bar{x}' = \frac{24}{5} = 4.8$. (Steps in the calculation are the same as in the actual mean method.) The following formula is used to calculate standard deviation:
$$\sigma = \sqrt{\frac{\sum d^2}{n}} \times c$$
Substituting the values,
$$\sigma = \sqrt{\frac{50.80}{5}} \times 5 = \sqrt{10.16} \times 5 = 15.937$$
Standard Deviation is not independent of scale. Thus, if the values or deviations are divided by a common factor, the value of the common factor is used in the formula to get the value of Standard Deviation.

Alternatively, instead of dividing the values by a common factor, the deviations can be divided by a common factor. Standard Deviation can be calculated as shown below:
Example 12
x       d      d'     d'²
5       -20    -4     16
10      -15    -3     9
25      0      0      0
30      +5     +1     1
50      +25    +5     25
Total                 -1     51
Deviations have been calculated from an arbitrary value 25. A common factor of 5 has been used to divide the deviations.
$$\sigma = \sqrt{\frac{\sum d'^2}{n} - \left(\frac{\sum d'}{n}\right)^2} \times c$$
$$\sigma = \sqrt{\frac{51}{5} - \left(\frac{-1}{5}\right)^2} \times 5 = \sqrt{10.16} \times 5 = 15.937$$

Standard Deviation in Continuous frequency distribution
Like ungrouped data, S.D. can be calculated for grouped data by any of the following methods: (i) Actual Mean Method (ii) Assumed Mean Method (iii) Step-Deviation Method

Actual Mean Method
For the values in Table 6.2, Standard Deviation can be calculated as follows:
Example 13
(1) CI   (2) f   (3) m   (4) fm   (5) d    (6) fd    (7) fd²
10–20    5       15      75       -25.5    -127.5    3251.25
20–30    8       25      200      -15.5    -124.0    1922.00
30–50    16      40      640      -0.5     -8.0      4.00
50–70    8       60      480      +19.5    +156.0    3042.00
70–80    3       75      225      +34.5    +103.5    3570.75
Total    40              1620              0         11790.00
The following steps are required:
1. Calculate the mean of the distribution:
$$\bar{x} = \frac{\sum fm}{\sum f} = \frac{1620}{40} = 40.5$$
2. Calculate deviations of mid-values from the mean so that $d = m - \bar{x}$ (Col. 5).
3. Multiply the deviations by their corresponding frequencies to get fd values (Col. 6). [Note that Σfd = 0.]
4. Calculate fd² values by multiplying fd values with d values (Col. 7). Sum these up to get Σfd².
5. Apply the formula as under:
$$\sigma = \sqrt{\frac{\sum fd^2}{n}} = \sqrt{\frac{11790}{40}} = \sqrt{294.75} = 17.168$$
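The grouped-data formula can be sketched in Python with a single function (our own illustration; `sd_grouped` and its parameters `a` and `c` are hypothetical names). Choosing a = the actual mean and c = 1 gives the actual mean method of Example 13; the assumed mean and step-deviation methods described next (Examples 14 and 15) correspond to a = 40, with c = 1 and c = 5 respectively, and should give the same 17.168.

```python
import math

# Table 6.2: profits of companies (Rs in lakhs)
classes = [(10, 20), (20, 30), (30, 50), (50, 70), (70, 80)]
freqs = [5, 8, 16, 8, 3]
mids = [(lo + hi) / 2 for lo, hi in classes]

def sd_grouped(mids, freqs, a=0.0, c=1.0):
    """sigma = c * sqrt(sum(f d'^2)/N - (sum(f d')/N)^2), d' = (m - a) / c.

    a = actual mean, c = 1  -> actual mean method (sum f d' is 0)
    a = assumed mean, c = 1 -> assumed mean method
    a = assumed mean, c > 1 -> step-deviation method
    """
    total = sum(freqs)
    d = [(m - a) / c for m in mids]
    s1 = sum(f * v for f, v in zip(freqs, d))       # sum f d'
    s2 = sum(f * v * v for f, v in zip(freqs, d))   # sum f d'^2
    return c * math.sqrt(s2 / total - (s1 / total) ** 2)

mean = sum(f * m for f, m in zip(freqs, mids)) / sum(freqs)   # 40.5
print("Example 13:", round(sd_grouped(mids, freqs, a=mean), 3))
print("Example 14:", round(sd_grouped(mids, freqs, a=40), 3))
print("Example 15:", round(sd_grouped(mids, freqs, a=40, c=5), 3))
```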

Assumed Mean Method
For the values in Example 13, standard deviation can be calculated by taking deviations from an assumed mean (say 40) as follows:
Example 14
(1) CI   (2) f   (3) m   (4) d   (5) fd   (6) fd²
10–20    5       15      -25     -125     3125
20–30    8       25      -15     -120     1800
30–50    16      40      0       0        0
50–70    8       60      +20     +160     3200
70–80    3       75      +35     +105     3675
Total    40                      +20      11800
The following steps are required:
1. Calculate mid-points of classes (Col. 3).
2. Calculate deviations of mid-points from an assumed mean such that $d = m - A_{\bar{x}}$ (Col. 4). Assumed Mean = 40.
3. Multiply values of d with corresponding frequencies to get fd values (Col. 5). (Note that the total of this column is not zero since deviations have been taken from the assumed mean.)
4. Multiply fd values (Col. 5) with d values (Col. 4) to get fd² values (Col. 6). Find Σfd².
5. Standard Deviation can be calculated by the following formula:
$$\sigma = \sqrt{\frac{\sum fd^2}{n} - \left(\frac{\sum fd}{n}\right)^2}$$
$$\sigma = \sqrt{\frac{11800}{40} - \left(\frac{20}{40}\right)^2} = \sqrt{294.75} = 17.168$$

Step-deviation Method
In case the values of deviations are divisible by a common factor, the calculations can be simplified by the step-deviation method, as in the following example.
Example 15
(1) CI   (2) f   (3) m   (4) d   (5) d'   (6) fd'   (7) fd'²
10–20    5       15      -25     -5       -25       125
20–30    8       25      -15     -3       -24       72
30–50    16      40      0       0        0         0
50–70    8       60      +20     +4       +32       128
70–80    3       75      +35     +7       +21       147
Total    40                               +4        472
Steps required:
1. Calculate class mid-points (Col. 3) and deviations from an arbitrarily chosen value, just like in the assumed mean method. In this example, deviations have been taken from the value 40 (Col. 4).
2. Divide the deviations by a common factor denoted as c; c = 5 in the above example. The values so obtained are d' values (Col. 5).
3. Multiply d' values with the corresponding f values (Col. 2) to obtain fd' values (Col. 6).
4. Multiply fd' values with d' values to get fd'² values (Col. 7).
5. Sum up the values in Col. 6 and Col. 7 to get Σfd' and Σfd'².
6. Apply the following formula.

$$\sigma = \sqrt{\frac{\sum fd'^2}{\sum f} - \left(\frac{\sum fd'}{\sum f}\right)^2} \times c$$
$$\sigma = \sqrt{\frac{472}{40} - \left(\frac{4}{40}\right)^2} \times 5 = \sqrt{11.8 - 0.01} \times 5 = \sqrt{11.79} \times 5 = 17.168$$

Standard Deviation: Comments
Standard Deviation, the most widely used measure of dispersion, is based on all values. Therefore, a change in even one value affects the value of standard deviation. It is independent of origin but not of scale. It is also useful in certain advanced statistical problems.

5. ABSOLUTE AND RELATIVE MEASURES OF DISPERSION

All the measures described so far are absolute measures of dispersion. They calculate a value which, at times, is difficult to interpret. For example, consider the following two data sets:
Set A   500      700      1000
Set B   100000   120000   130000
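For the two sets above, a minimal Python sketch (our own illustration; the function names are hypothetical) compares the absolute Range with its relative counterparts: the Coefficient of Range, (L - S)/(L + S), and the Coefficient of Variation, both of which are defined in the paragraphs that follow. It uses `statistics.pstdev` (the divide-by-n standard deviation used throughout this chapter) and `statistics.fmean`, which needs Python 3.8 or later.

```python
import statistics

set_a = [500, 700, 1000]            # daily sales of an ice-cream vendor
set_b = [100000, 120000, 130000]    # daily sales of a departmental store

def coefficient_of_range(xs):
    """(L - S) / (L + S): a unit-free counterpart of Range."""
    largest, smallest = max(xs), min(xs)
    return (largest - smallest) / (largest + smallest)

def coefficient_of_variation(xs):
    """Standard Deviation / Arithmetic Mean * 100, using the divide-by-n S.D."""
    return statistics.pstdev(xs) / statistics.fmean(xs) * 100

for name, xs in [("Set A", set_a), ("Set B", set_b)]:
    print(f"{name}: range = {max(xs) - min(xs)}, "
          f"coeff. of range = {coefficient_of_range(xs):.3f}, "
          f"C.V. = {coefficient_of_variation(xs):.1f}%")
```

The absolute Range is far larger for Set B, but both relative measures come out larger for Set A, matching the argument in the text.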

Suppose the values in Set A are the daily sales recorded by an ice-cream vendor, while Set B has the daily sales of a big departmental store. Range for Set A is 500, whereas for Set B it is 30,000. The value of Range is much higher in Set B. Can you say that the variation in sales is higher for the departmental store? It can be easily observed that the highest value in Set A is double the smallest value, whereas for Set B it is only 30% higher. Thus absolute measures may give misleading ideas about the extent of variation, especially when the averages differ significantly.

Another weakness of absolute measures is that they give the answer in the units in which the original values are expressed. Consequently, if the values are expressed in kilometres, the dispersion will also be in kilometres. However, if the same values are expressed in metres, an absolute measure will give the answer in metres and the value of dispersion will appear to be 1000 times larger.

To overcome these problems, relative measures of dispersion can be used. Each absolute measure has a relative counterpart. Thus, for Range, there is the Coefficient of Range, which is calculated as follows:
$$\text{Coefficient of Range} = \frac{L - S}{L + S}$$
where L = Largest value and S = Smallest value.
Similarly, for Quartile Deviation, it is the Coefficient of Quartile Deviation, which can be calculated as follows:
$$\text{Coefficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
where Q3 = 3rd Quartile and Q1 = 1st Quartile.
For Mean Deviation, it is the Coefficient of Mean Deviation:
$$\text{Coefficient of Mean Deviation} = \frac{\text{M.D.}(\bar{x})}{\bar{x}} \quad \text{or} \quad \frac{\text{M.D.}(\text{Median})}{\text{Median}}$$
Thus, if Mean Deviation is calculated on the basis of the Mean, it is divided by the Mean. If the Median is used to calculate Mean Deviation, it is divided by the Median.
For Standard Deviation, the relative measure is called the Coefficient of Variation, calculated as below:
$$\text{Coefficient of Variation} = \frac{\text{Standard Deviation}}{\text{Arithmetic Mean}} \times 100$$
It is usually expressed in percentage terms and is the most commonly used relative measure of dispersion. Since relative measures are free from the units in which the values have been expressed, they can be compared even across different groups having different units of measurement.

7. LORENZ CURVE
The measures of dispersion discussed so far give a numerical value of dispersion. A graphical measure called the Lorenz Curve is available for estimating dispersion. You may have heard of statements like 'top 10% of the people of a country earn 50% of the national income while top 20% account for 80%'. An idea about income disparities is given by such figures. The Lorenz Curve uses the information expressed in a cumulative manner to indicate the degree of variability. It is especially useful in comparing the variability of two or more distributions. Given below are the monthly incomes of employees of a company.
TABLE 6.4
Incomes (Rs)      Number of employees
0–5,000           5
5,000–10,000      10
10,000–20,000     18
20,000–40,000     10
40,000–50,000     7
Example 16
Columns: (1) Income limits (Rs); (2) Mid-points; (3) Cumulative mid-points; (4) Cumulative mid-points as percentages; (5) No. of employees (frequencies); (6) Cumulative frequencies; (7) Cumulative frequencies as percentages
(1)             (2)      (3)       (4)     (5)   (6)   (7)
0–5,000         2,500    2,500     2.5     5     5     10
5,000–10,000    7,500    10,000    10.0    10    15    30
10,000–20,000   15,000   25,000    25.0    18    33    66
20,000–40,000   30,000   55,000    55.0    10    43    86
40,000–50,000   45,000   100,000   100.0   7     50    100

Construction of the Lorenz Curve
The following steps are required:
1. Calculate class mid-points and find cumulative totals as in Col. 3 in Example 16, given above.
2. Calculate cumulative frequencies as in Col. 6.
3. Express the grand totals of Col. 3 and Col. 6 as 100, and convert the cumulative totals in these columns into percentages, as in Col. 4 and Col. 7.
4. Now, on graph paper, take the cumulative percentages of the variable (incomes) on the Y-axis and the cumulative percentages of frequencies (number of employees) on the X-axis, as in Figure 6.1. Thus each axis will have values from 0 to 100.
5. Draw a line joining co-ordinate (0, 0) with (100, 100). This is called the line of equal distribution, shown as line OC in Figure 6.1.
6. Plot the cumulative percentages of the variable against the corresponding cumulative percentages of frequency. Join these points to get the curve OAC.

Studying the Lorenz Curve
OC is called the line of equal distribution, since it would imply a situation like: top 20% of people earn 20% of total income, and top 60% earn 60% of total income. The farther the curve OAC from this line, the greater is the variability present in the distribution. If there are two or more curves, the one which is the farthest from line OC has the highest dispersion.
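The plotting coordinates of steps 1 to 3 and 6 can be generated with the Python sketch below (our own illustration; `lorenz_points` is a hypothetical name). It follows the steps literally, cumulating the class mid-points as in Col. 3, and prints the vertical gap between each point and the line OC. A common variant, not used in Example 16 and not shown here, cumulates f × mid-point instead, as an estimate of each class's total income.

```python
# Table 6.4: monthly incomes (Rs) of employees, as used in Example 16
classes = [(0, 5000), (5000, 10000), (10000, 20000), (20000, 40000), (40000, 50000)]
employees = [5, 10, 18, 10, 7]

def lorenz_points(classes, freqs):
    """Return (x, y) points: x = cumulative % of frequencies (Col. 7),
    y = cumulative % of class mid-points (Col. 4), starting at the origin O."""
    mids = [(lo + hi) / 2 for lo, hi in classes]            # Col. 2
    cum_mid, cum_f = [], []
    running_mid, running_f = 0, 0
    for m, f in zip(mids, freqs):
        running_mid += m
        running_f += f
        cum_mid.append(running_mid)                            # Col. 3
        cum_f.append(running_f)                                # Col. 6
    xs = [100 * c / cum_f[-1] for c in cum_f]                  # Col. 7
    ys = [100 * c / cum_mid[-1] for c in cum_mid]              # Col. 4
    return [(0.0, 0.0)] + list(zip(xs, ys))

for x, y in lorenz_points(classes, employees):
    # On the line of equal distribution OC, y would equal x.
    print(f"{x:5.1f}% of employees, {y:5.1f}% of income, gap from OC = {x - y:4.1f}")
```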

8. CONCLUSION

Although Range is the simplest to calculate and understand, it is unduly affected by extreme values. Q.D. is not affected by extreme values as it is based on only the middle 50% of the data. However, it is more difficult to interpret. M.D. and S.D. are both based upon deviations of values from their average. M.D. calculates the average of deviations from the average but ignores the signs of deviations and therefore appears to be unmathematical. Standard Deviation attempts to calculate the average deviation from the mean. Like M.D., it is based on all values, and it is also applied in more advanced statistical problems. It is the most widely used measure of dispersion.


Recap
A measure of dispersion improves our understanding of the behaviour of an economic variable.
Range and Quartile Deviation are based upon the spread of values.
M.D. and S.D. are based upon deviations of values from the average.
Measures of dispersion could be Absolute or Relative.
Absolute measures give the answer in the units in which data are expressed. Relative measures are free from these units, and consequently can be used to compare different variables.
A graphic method, which estimates the dispersion from the shape of a curve, is called the Lorenz Curve.

EXERCISES

1. A measure of dispersion is a good supplement to the central value in understanding a frequency distribution. Comment.
2. Which measure of dispersion is the best and how?
3. Some measures of dispersion depend upon the spread of values whereas some calculate the variation of values from a central value. Do you agree?
4. In a town, 25% of the persons earned more than Rs 45,000 whereas 75% earned more than Rs 18,000. Calculate the absolute and relative values of dispersion.
5. The yield of wheat and rice per acre for 10 districts of a state is as under:
District   1    2    3    4    5    6    7    8    9    10
Wheat      12   10   15   19   21   16   18   9    25   10
Rice       22   29   12   23   18   15   12   34   18   12
Calculate for each crop:
(i) Range
(ii) Q.D.
(iii) Mean Deviation about Mean
(iv) Mean Deviation about Median
(v) Standard Deviation
(vi) Which crop has greater variation?
(vii) Compare the values of different measures for each crop.
6. In the previous question, calculate the relative measures of variation and indicate the value which, in your opinion, is more reliable.
7. A batsman is to be selected for a cricket team. The choice is between X and Y on the basis of their five previous scores, which are:
X   25   85   40   80   120
Y   50   70   65   45   80
Which batsman should be selected if we want (i) a higher run getter, or (ii) a more reliable batsman in the team?
8. To check the quality of two brands of light bulbs, their life in burning hours was estimated as under for 100 bulbs of each brand.
Life (in hrs)   No. of bulbs, Brand A   No. of bulbs, Brand B
0–50            15                      2
50–100          20                      8
100–150         18                      60
150–200         25                      25
200–250         22                      5
Total           100                     100
(i) Which brand gives higher life? (ii) Which brand is more dependable?
9. The average daily wage of 50 workers of a factory was Rs 200 with a Standard Deviation of Rs 40. Each worker is given a raise of Rs 20. What is the new average daily wage and standard deviation? Have the wages become more or less uniform?
10. If, in the previous question, each worker is given a hike of 10% in wages, how are the Mean and Standard Deviation values affected?
11. Calculate the Mean Deviation about Mean and Standard Deviation for the following distribution.
Classes       20–40   40–80   80–100   100–120   120–140   Total
Frequencies   3       6       20       12        9         50
12. The sum of 10 values is 100 and the sum of their squares is 1090. Find the Coefficient of Variation.