CHP 2 Mat161


CH 2 PRESENTING AND SUMMARIZING DATA

Data can be organized and displayed graphically using many different types of tables, charts and
graphs. Graphical displays help us see the important characteristics of data sets. In this chapter,
we will learn how to organize and summarize quantitative and qualitative data graphically.

2.1 Data Listing


The purpose of listing a set of data is to simplify its use. By listing, the identity of each value (or
item) in the data set is preserved. Two methods of listing are dotplots and stem-and-leaf plots.

Dotplots
A dotplot consists of dots plotted along a scale of values, where each position on the scale
represents a value or category in the data. The number of dots at each position corresponds to the
number of times that value or item occurs in the data set. Dotplots are suitable for both
quantitative and qualitative data.

Example: The following data shows the number of students who were absent from a Statistics
course lecture, conducted 40 times in a particular semester.

0 2 3 0 4 1 2 3 2 0
1 2 3 4 5 5 2 3 0 4
1 2 4 1 6 1 3 4 5 2
2 3 4 5 3 3 2 4 5 2

Construct a dotplot for the number of absentees in a lecture.

Solution:
The variable of interest: The number of absentees in a lecture
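
The dotplot itself can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `text_dotplot` and the use of the letter "o" for a dot are assumptions made here, not part of the original notes. It counts how often each value occurs and stacks one dot per occurrence above that value.

```python
from collections import Counter

# Number of absentees recorded at each of the 40 lectures (from the example).
absentees = [
    0, 2, 3, 0, 4, 1, 2, 3, 2, 0,
    1, 2, 3, 4, 5, 5, 2, 3, 0, 4,
    1, 2, 4, 1, 6, 1, 3, 4, 5, 2,
    2, 3, 4, 5, 3, 3, 2, 4, 5, 2,
]

counts = Counter(absentees)  # value -> number of times it occurs


def text_dotplot(counts):
    """Return the dotplot as text rows: stacked dots, then the scale of values."""
    values = range(min(counts), max(counts) + 1)
    tallest = max(counts.values())
    rows = []
    # Build the plot from the top row down; a dot appears where the count is high enough.
    for level in range(tallest, 0, -1):
        rows.append(" ".join("o" if counts[v] >= level else " " for v in values))
    rows.append(" ".join(str(v) for v in values))  # horizontal scale
    return rows


print("\n".join(text_dotplot(counts)))
```

The tallest column sits above the value 2, which occurs 10 times in the data set.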

Stem-and-Leaf Plots
A stem-and-leaf plot combines sorting and graphing of a data set. It can only be applied to
quantitative data. Each number in a data set is separated into two parts: the stem and the leaf.
The leftmost digit(s) become the stem and the rightmost digit(s), the leaf.

Example: The following stem-and-leaf plot depicts the ages of staff members at an
administrative office. The ages are arranged in increasing order, from 20 to 61 years.
Stem (tens) | Leaves (units)
2 | 0 0 3 3 4 5 5 6 8      (data values are 20, 20, 23, …, 28)
3 | 0 1 1 2 5 5 6 8 9 9
4 | 0 0 0 3 3 3 4 5 6 7 7 7 8 9
5 | 1 1 1 2
6 | 0 1

How to plot: First, determine the stem values. The stems start with the leftmost digit(s) of the
minimum data value and increase consecutively up to the leftmost digit(s) of the maximum data
value. Align the stem values vertically and draw a vertical line next to them. Put the
corresponding leaf values for each stem to the right of the vertical line.

Example: The following are ages of 30 patients who had their first heart attack.
65 40 63 67 75 79 85 45 90 60
55 67 86 90 49 78 76 54 67 89
56 45 50 85 67 90 83 72 98 55
Construct a stem-and-leaf display.

Solution:
The variable of interest: Ages of patients who had their first heart attack

Stem | Leaf
4 | 0 5 5 9
5 | 0 4 5 5 6
6 | 0 3 5 7 7 7 7
7 | 2 5 6 8 9
8 | 3 5 5 6 9
9 | 0 0 0 8

Key: 4|0 means a patient aged 40 years
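
The plotting procedure above can be sketched in Python as follows. The function name `stem_and_leaf` is a label chosen here for illustration, and the sketch assumes two-digit values, so that the tens digit is the stem and the units digit is the leaf.

```python
from collections import defaultdict

# Ages of the 30 patients who had their first heart attack (from the example).
ages = [
    65, 40, 63, 67, 75, 79, 85, 45, 90, 60,
    55, 67, 86, 90, 49, 78, 76, 54, 67, 89,
    56, 45, 50, 85, 67, 90, 83, 72, 98, 55,
]


def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (units digits)."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    # List every stem from the smallest to the largest, even one with no leaves.
    return {stem: plot[stem] for stem in range(min(plot), max(plot) + 1)}


display = stem_and_leaf(ages)
for stem, leaves in display.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Sorting the values first means the leaves on each stem come out in increasing order, as in the display above.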

2.1 Data Grouping


Grouping or classifying data into different non-overlapping classes helps us see and assess
important characteristics of a large mass of data. For qualitative data, a class is a name from the
data set. For quantitative data, a class is an interval of numbers if the data is of the continuous
type or an individual number if the data is discrete.

Frequency Distribution
A frequency distribution groups data values into different classes (categories) and records the
number of data points that fall into each category.

Example: Twenty-five army inductees were given a blood test to determine their blood type.
The resulting data set consists of twenty-five data values, each being blood type A, B, AB, or O.
Grouping the data points into the four categories of blood type, we have the following frequency
distribution table:

Blood Type    Number of army inductees
A             5
B             7
O             9
AB            4
Total         25

Frequency Distribution (Qualitative Data) : Blood Type

Example: A survey was conducted by an economist to determine the monthly income of
families in rural areas. 70 families from a village were randomly chosen and the results are as
follows:

Monthly Income (in RM)    Number of families
RM499 or less             5
RM500 – RM599             10
RM600 – RM699             15
RM700 – RM799             20
RM800 – RM899             15
RM900 or more             5
Total                     70

Frequency Distribution (Quantitative Data): Monthly Income of Rural Families

To construct a frequency distribution for quantitative data, we need to understand the following
definitions:

i. Class Limit
The smallest and largest values that can fall in a given class interval are referred to as the class
limits. Classes without the lower limit or upper limit are known as open classes.

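Example: (Refer to the frequency distribution of rural family incomes above.)
Lower limits of the four closed classes: 500, 600, 700 and 800
Upper limits of the four closed classes: 599, 699, 799 and 899
The first and last classes are open classes, since each has only one stated limit.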

ii. Class Boundary


A class boundary is a value shared by two consecutive classes. It is the midpoint between the
upper limit of a class and the lower limit of the next class after it. Therefore, an upper boundary
of a class is also the lower boundary for the next class after it. Class boundaries are also known
as real class limits.

Example: (Refer to the above frequency distribution of rural family incomes.) The lower and
upper boundaries of the second class (RM500 - RM599) are 499.5 and 599.5 respectively.

iii. Class Width


The difference between the upper boundary and the lower boundary of a class gives the class
width, i.e. Class width = Upper boundary – Lower boundary. A class width is also called a class
size.

Example: Consider the third class from the above example, i.e. class RM600 – RM699.
Class width = 699.5 – 599.5 = 100

iv. Class Midpoint


A class midpoint is obtained by dividing the sum of the two class limits (or the two class
boundaries) by 2.
Class midpoint = (upper limit + lower limit) / 2
               = (upper boundary + lower boundary) / 2

Example: Consider the third class from the above example, i.e. class RM600 – RM699. The
class midpoint is:
(699 + 600) / 2 = 649.5
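
The three definitions above (boundary, width and midpoint) can be checked with a short Python sketch. The function name `class_summary` and its `gap` parameter are assumptions introduced here. The sketch assumes incomes are recorded to the nearest RM1, so consecutive class limits are 1 unit apart and each boundary lies 0.5 beyond a limit.

```python
def class_summary(lower_limit, upper_limit, gap=1):
    """Return the boundaries, width and midpoint of one class.

    `gap` is the distance between the upper limit of a class and the lower
    limit of the next class (assumed to be 1 for whole-number data).
    """
    half = gap / 2
    lower_boundary = lower_limit - half
    upper_boundary = upper_limit + half
    return {
        "boundaries": (lower_boundary, upper_boundary),
        "width": upper_boundary - lower_boundary,
        "midpoint": (lower_limit + upper_limit) / 2,
    }


third = class_summary(600, 699)  # the class RM600 – RM699
print(third)
```

For the class RM600 – RM699 this gives boundaries 599.5 and 699.5, a class width of 100 and a midpoint of 649.5.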

How to Construct a Frequency Distribution

(1) Determine the number of classes


The following formula can be used to determine a number of classes that is suitable for
the size of the data set.
Sturges' Formula
k = 1 + 3.3 log n ,
where k = number of classes, n = number of data points, and log is the base-10 logarithm.
The result is rounded to a whole number.

Example: A data set has n = 30 data points. A suitable number of classes is:

k = 1 + 3.3 log 30
k = 5.8745 ≈ 6 classes
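
As a quick check of the arithmetic, the formula can be evaluated in Python. The function name `sturges_classes` is a label chosen here, and rounding to the nearest whole number is an assumption that matches the worked examples in these notes.

```python
import math


def sturges_classes(n):
    """Return (exact k, rounded k) for k = 1 + 3.3 log10(n)."""
    k = 1 + 3.3 * math.log10(n)
    return k, round(k)


exact, classes = sturges_classes(30)
print(f"k = {exact:.4f}, use {classes} classes")
```

For n = 30 the exact value is about 5.8745, which rounds to 6 classes.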

(2) Determine the class width (or size)


Class width = (Largest value – Smallest value) / Number of classes

(3) Determine the lower limit of the first class (the starting point)
Any number that is equal to or less than the smallest value in the data set can be used as the
lower limit of the first class.

Example: The following data are the weights (in gm) of 50 mice used in an experiment to
investigate the effect of a deficiency in a certain vitamin.
135 90 115 118 121 137 132 120 104 125
119 115 101 129 87 108 110 133 135 126
127 103 110 126 118 88 104 137 120 95
146 126 119 119 105 132 126 118 100 113
106 125 117 102 145 129 124 113 94 148
Construct a frequency distribution table, showing the class boundaries and class midpoints in it.

Solution: By following the above procedure, determine:

(1) the number of classes: k = 1 + 3.3 log 50
k = 6.607 ≈ 7

(2) the class width: (148 – 87) / 7 = 8.714 ≈ 9

(3) the lower limit of the first class: 87

Classes are recorded in the first column of the frequency table. Class boundaries and class
midpoints are also recorded. The frequencies represent the number of mice that belong to each of
the seven different classes/categories of weights.

Weight (gm)    Class boundary    Class midpoint    Tally           Frequency, f
87 – 95        86.5 – 95.5       91                ||||            5
96 – 104       95.5 – 104.5      100               |||| |          6
105 – 113      104.5 – 113.5     109               |||| ||         7
114 – 122      113.5 – 122.5     118               |||| |||| ||    12
123 – 131      122.5 – 131.5     127               |||| ||||       10
132 – 140      131.5 – 140.5     136               |||| ||         7
141 – 149      140.5 – 149.5     145               |||             3
Total frequency                                                    50

Note: each |||| group in the Tally column is a bundle of five (four strokes crossed by a fifth).

Frequency Table: Weights of mice in the vitamin-deficiency experiment.
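
The whole construction can be reproduced with a short Python sketch. It uses k = 7 classes and a starting point of 87, as in the solution above. Rounding the class width up with `math.ceil` is an assumption made here so that the classes cover the full range, and the dictionary keys are illustrative names only.

```python
import math

# Weights (in gm) of the 50 mice from the example.
weights = [
    135, 90, 115, 118, 121, 137, 132, 120, 104, 125,
    119, 115, 101, 129, 87, 108, 110, 133, 135, 126,
    127, 103, 110, 126, 118, 88, 104, 137, 120, 95,
    146, 126, 119, 119, 105, 132, 126, 118, 100, 113,
    106, 125, 117, 102, 145, 129, 124, 113, 94, 148,
]

k = 7                                                  # number of classes (step 1)
width = math.ceil((max(weights) - min(weights)) / k)   # 61 / 7 = 8.714, rounded up to 9
start = min(weights)                                   # lower limit of the first class: 87

table = []
for i in range(k):
    lower = start + i * width        # lower class limit
    upper = lower + width - 1        # upper class limit (whole-number data)
    table.append({
        "limits": (lower, upper),
        "boundaries": (lower - 0.5, upper + 0.5),
        "midpoint": (lower + upper) / 2,
        "frequency": sum(lower <= w <= upper for w in weights),
    })

for row in table:
    print(row)
```

The frequencies produced are 5, 6, 7, 12, 10, 7 and 3, which agree with the table.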

Relative Frequency and Percentage Table


A relative frequency table is obtained by simply replacing the frequencies of classes with their
relative frequencies. Multiplying the relative frequencies by 100 gives the percentages. The
relative frequency and percentage of each class are computed as follows:

Relative frequency = Frequency of the class / Sum of all frequencies = f / Σf

Percentage = (Relative frequency) × 100

Example: The relative frequency and percentage table for the data on weights of the
vitamin-deficient mice is as follows:

Weight (gm)    Relative Frequency    Percentage
87 – 95        0.10                  10
96 – 104       0.12                  12
105 – 113      0.14                  14
114 – 122      0.24                  24
123 – 131      0.20                  20
132 – 140      0.14                  14
141 – 149      0.06                  6
Total          1.00                  100

Relative Frequency and Percentage Table: Weights of mice in the vitamin-deficiency experiment.
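
The two formulas can be applied to the class frequencies with a minimal Python sketch. The variable names are illustrative, and the percentage is computed directly from the frequency to avoid small floating-point rounding effects.

```python
# Class frequencies from the frequency table of mice weights.
frequencies = [5, 6, 7, 12, 10, 7, 3]
total = sum(frequencies)  # sum of all frequencies

relative = [f / total for f in frequencies]           # f / (sum of f)
percentages = [100 * f / total for f in frequencies]  # relative frequency x 100

for f, r, p in zip(frequencies, relative, percentages):
    print(f"{f:>3}  {r:.2f}  {p:.0f}")
```

The relative frequencies add up to 1 and the percentages to 100, which is a useful check on any such table.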

Cumulative Frequency Table


A cumulative frequency distribution gives the total number of data points that fall below or
above the boundary of each class, starting with the lower boundary of the first class and ending
with the upper boundary of the last class. In a “more than” cumulative frequency distribution,
we add up the frequencies of classes that exceed a class boundary. In a “less than” cumulative
frequency distribution table, we add up the frequencies of the classes that are less than a class
boundary.

Example: Using the frequency distribution of weights of mice, prepare a (i) “more than” and
(ii) “less than” cumulative frequency distribution for the weights of mice.

Solution:
Weight (gm) Cumulative Frequency
> 86.5 50
> 95.5 45
> 104.5 39
> 113.5 32
> 122.5 20
> 131.5 10
> 140.5 3
> 149.5 0

A “more than” cumulative frequency distribution

Weight (gm) Cumulative Frequency


< 86.5 0
< 95.5 5
< 104.5 11
< 113.5 18
< 122.5 30
< 131.5 40
< 140.5 47
< 149.5 50

A “less than” cumulative frequency distribution
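
Both cumulative tables can be generated from the class frequencies with a short Python sketch. The variable names are chosen here for illustration. A running total gives the “less than” counts, and subtracting each of these from the total gives the “more than” counts.

```python
from itertools import accumulate

# Class boundaries and class frequencies from the frequency table of mice weights.
boundaries = [86.5, 95.5, 104.5, 113.5, 122.5, 131.5, 140.5, 149.5]
frequencies = [5, 6, 7, 12, 10, 7, 3]

# "Less than": running total of frequencies, starting from 0 at the first boundary.
less_than = [0] + list(accumulate(frequencies))

# "More than": whatever is not below a boundary lies above it.
total = less_than[-1]
more_than = [total - c for c in less_than]

for b, lt, mt in zip(boundaries, less_than, more_than):
    print(f"{b:>6}  less than: {lt:>2}  more than: {mt:>2}")
```

At every boundary the two counts add up to 50, the total number of mice.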

2.2 Graphical Presentation

Graphical displays can reveal the main characteristics of a data set at a glance. The bar chart and
the pie chart are two types of graphs that can be used to display qualitative data. For quantitative
data, the most commonly used graphical displays are histograms, frequency polygons and
cumulative frequency graphs or ogives.

Bar Chart
A bar chart is a graph made up of bars whose heights represent the frequencies of the respective
categories in a frequency table.

Example: Draw a bar chart for the following frequency distribution:

Job Sector                Number of students
Information Technology    44
Engineering               30
Medical                   15
Others                    11

Solution:

Pie Charts
A pie chart is a circular representation of a frequency table, in which each category is
represented by a sector of the circle. The size of each sector should be proportional to the
relative frequency or percentage of its category. The right proportion is given by the angle of the
sector, computed as follows:

Angle of each sector = (sector frequency / total frequency) × 360°

Example: Draw a pie chart for the above frequency distribution.

Solution:
Job Sector                Frequency    Percentage    Angle at the centre of circle
Information Technology    44           44%           158.4°
Engineering               30           30%           108°
Medical                   15           15%           54°
Others                    11           11%           39.6°
Total                     100          100%          360°
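
The percentages and sector angles can be computed with a minimal Python sketch. The dictionary and variable names are illustrative assumptions, not part of the original notes.

```python
# Frequency of each job sector (from the example).
sectors = {
    "Information Technology": 44,
    "Engineering": 30,
    "Medical": 15,
    "Others": 11,
}

total = sum(sectors.values())

# Percentage and angle of each sector: (sector frequency / total frequency) x 100 or x 360.
percentages = {name: 100 * f / total for name, f in sectors.items()}
angles = {name: 360 * f / total for name, f in sectors.items()}

for name in sectors:
    print(f"{name}: {percentages[name]:.1f}%, {angles[name]:.1f} degrees")
```

The angles should add up to 360°, which gives a quick check before drawing the chart.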

[Pie chart of job sectors: Information Technology 44.0%, Engineering 30.0%, Medical 15.0%, Others 11.0%]

Histogram
A histogram is usually constructed for a set of grouped, continuous data. The continuity of the
data values is depicted by bars that are adjacent to each other, without any gaps in between. To
ensure that there are no gaps between adjacent bars, we use class boundaries in the
construction of histograms.
The width of the bars in a histogram represents the width of the classes in a frequency
distribution and these values are plotted on the horizontal axis. Class frequencies, relative
frequencies, or percentages are plotted on the vertical axis and represented by the heights of the
bars.
Note: Histograms cannot be used in connection with frequency distributions that have open
classes, and they must be used with extreme care when class intervals are not equal. In cases
where class intervals are not equal, it is best to represent the class frequencies by the area of the
bars instead of their heights.

Example: Draw a histogram for the following distribution of mice weight:

Weight (gram)    Class Boundary    Frequency
87 – 95          86.5 – 95.5       5
96 – 104         95.5 – 104.5      6
105 – 113        104.5 – 113.5     7
114 – 122        113.5 – 122.5     12
123 – 131        122.5 – 131.5     10
132 – 140        131.5 – 140.5     7
141 – 149        140.5 – 149.5     3

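Solution:
[Histogram: frequency (vertical axis, 0 to 14) plotted against the class boundaries 86.5, 95.5, 104.5, 113.5, 122.5, 131.5, 140.5 and 149.5 of the weights of mice (horizontal axis); bar heights 5, 6, 7, 12, 10, 7 and 3.]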

Frequency Polygon
A frequency polygon is constructed by plotting class frequencies against the midpoints of the
classes. The points are then connected by straight lines, thus forming a polygon. To close the
frequency polygon, an additional class with zero frequency is added to each end of the
distribution. A frequency polygon is usually drawn on a histogram, as shown in the diagram
below.

[Figure: frequency polygon drawn over the histogram of the weights of mice; frequency on the vertical axis (0 to 14), class boundaries 86.5 to 149.5 on the horizontal axis.]
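
The points that make up the frequency polygon can be listed with a short Python sketch. The variable names are illustrative. The sketch assumes equal class widths of 9, so the two extra zero-frequency classes have midpoints one class width beyond each end.

```python
# Class midpoints and frequencies from the frequency table of mice weights.
midpoints = [91, 100, 109, 118, 127, 136, 145]
frequencies = [5, 6, 7, 12, 10, 7, 3]
width = 9  # common class width

# Add a zero-frequency class at each end so the polygon closes on the horizontal axis.
xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
ys = [0] + frequencies + [0]

vertices = list(zip(xs, ys))  # points to be joined by straight lines, left to right
print(vertices)
```

The polygon therefore starts at (82, 0) and ends at (154, 0).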

Ogive
A cumulative frequency polygon or ogive is obtained by plotting the cumulative frequency of
values that are less than an upper class boundary against the upper class boundary. The points are
then joined together by straight lines. If relative cumulative frequencies or percentages had been
used, we would call the graph a relative frequency ogive or a percentage ogive.

Example: Draw an ogive based on a “less than” cumulative frequency distribution of the mice
weight.
Weight (gram) Cumulative Frequency
Less than 86.5 0
95.5 5
104.5 11
113.5 18
122.5 30
131.5 40
140.5 47
149.5 50

Solution:
[Figure: “less than” ogive; cumulative frequency (vertical axis, 0 to 60) plotted against the upper class boundaries 86.5 to 149.5 of the weights of mice (horizontal axis).]

Use of Ogive:
From an ogive, we can estimate the number of data points with values that are less than or more
than a given or specified value. Besides that, we can also estimate the data value at the centre
location (median value) or the data value at any other locations.

Example: Based on the “more than” ogive, estimate the
(i) number of mice with weight less than 130 gram and
(ii) mice weight which is exceeded by fifty percent of all the mice in the sample.

Solution:
(i) [Figure: a “more than” cumulative frequency ogive]

From the ogive, about 12 mice weigh more than 130 gm; thus, about 38 mice weigh less than
130 gm.

(ii) (50/100) × 50 = 25

Therefore, 25 mice weigh more than about 118 gm.
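
Because the ogive joins its points with straight lines, reading a value off the graph amounts to linear interpolation between two class boundaries. The Python sketch below illustrates this under that assumption. The function names `count_more_than` and `value_exceeded_by` are labels chosen here, and the results are estimates just as the graph readings are.

```python
# "More than" cumulative frequency distribution of the mice weights.
boundaries = [86.5, 95.5, 104.5, 113.5, 122.5, 131.5, 140.5, 149.5]
more_than = [50, 45, 39, 32, 20, 10, 3, 0]


def count_more_than(x):
    """Estimate how many mice weigh more than x by interpolating along the ogive."""
    for i in range(len(boundaries) - 1):
        x0, x1 = boundaries[i], boundaries[i + 1]
        if x0 <= x <= x1:
            y0, y1 = more_than[i], more_than[i + 1]
            return y0 + (x - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError("x lies outside the class boundaries")


def value_exceeded_by(count):
    """Estimate the weight that is exceeded by `count` mice."""
    for i in range(len(boundaries) - 1):
        y0, y1 = more_than[i], more_than[i + 1]
        if y1 <= count <= y0:
            x0, x1 = boundaries[i], boundaries[i + 1]
            return x0 + (y0 - count) * (x1 - x0) / (y0 - y1)
    raise ValueError("count lies outside the cumulative frequencies")


print(count_more_than(130))     # about 11.7, i.e. roughly 12 mice
print(value_exceeded_by(25))    # about 118.75 gm
```

Interpolation gives about 11.7 mice above 130 gm and a weight of about 118.75 gm exceeded by 25 mice, in line with the graph readings of 12 mice and 118 gm.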
