CHP 2 Mat161
Data can be organized and displayed graphically using many different types of tables, charts and
graphs. Graphical displays help us see the important characteristics of data sets. In this chapter,
we will learn how to organize and summarize quantitative and qualitative data graphically.
Dotplots
A dotplot consists of dots plotted along a scale of values, where each position on the scale
represents a data value or category. The number of dots at each position corresponds to the
number of times that value or item occurs in the data set. Dotplots are suitable for both
quantitative and qualitative data.
Example: The following data shows the number of students who were absent from a Statistics
course lecture, conducted 40 times in a particular semester.
0 2 3 0 4 1 2 3 2 0
1 2 3 4 5 5 2 3 0 4
1 2 4 1 6 1 3 4 5 2
2 3 4 5 3 3 2 4 5 2
Solution:
The variable of interest: The number of absentees in a lecture
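The counting behind a dotplot can be sketched in Python. This is a minimal illustration, assuming the 40 values above; the names absences, counts and text_dotplot are chosen here for illustration only. The dots are printed sideways (one row per value) instead of being stacked above a horizontal axis.

```python
from collections import Counter

# Absentee counts for the 40 lectures, copied from the example above.
absences = [
    0, 2, 3, 0, 4, 1, 2, 3, 2, 0,
    1, 2, 3, 4, 5, 5, 2, 3, 0, 4,
    1, 2, 4, 1, 6, 1, 3, 4, 5, 2,
    2, 3, 4, 5, 3, 3, 2, 4, 5, 2,
]

# Count how many times each value occurs in the data set.
counts = Counter(absences)

def text_dotplot(counts):
    """Return one line per value: the value followed by one dot per occurrence."""
    lines = []
    for value in range(min(counts), max(counts) + 1):
        # A Counter returns 0 for a value that never occurs.
        lines.append(f"{value} | " + "." * counts[value])
    return lines

for line in text_dotplot(counts):
    print(line)
```

The longest row of dots marks the most frequent number of absentees.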
Stem-and-Leaf Plots
A stem-and-leaf plot combines sorting and graphing of a data set. It can only be applied to
quantitative data. Each number in a data set is separated into two parts: the stem and the leaf.
The leftmost digit(s) become the stem and the rightmost digit(s) become the leaf.
Example: The following stem-and-leaf plot depicts the ages of staff members at an
administrative office. The ages are arranged in increasing order, from 20 to 61 years.
Stem (tens)   Leaves (units)
2             0 0 3 3 4 5 5 6 8              (data values are 20, 20, 23, …, 28)
3             0 1 1 2 5 5 6 8 9 9
4             0 0 0 3 3 3 4 5 6 7 7 7 8 9
5             1 1 1 2
6             0 1
How to plot: First, determine the stem values. The stems start at the leftmost digit(s) of the
minimum data value and increase consecutively up to the leftmost digit(s) of the maximum data
value. Align the stem values vertically and draw a vertical line next to them. Enter the
corresponding leaf values for each stem to the right of the vertical line.
Example: The following are ages of 30 patients who had their first heart attack.
65 40 63 67 75 79 85 45 90 60
55 67 86 90 49 78 76 54 67 89
56 45 50 85 67 90 83 72 98 55
Construct a stem-and-leaf display.
Solution:
The variable of interest: Ages of patients who had their first heart attack
Stem Leaf
4 0 5 5 9
5 0 4 5 5 6
6 0 3 5 7 7 7 7
7 2 5 6 8 9
8 3 5 5 6 9
9 0 0 0 8
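The construction above can be sketched in Python. This is an illustrative sketch only, assuming two-digit whole-number ages so that the tens digit is the stem and the units digit is the leaf; the function name stem_and_leaf is hypothetical.

```python
from collections import defaultdict

# Ages of the 30 patients, copied from the example above.
ages = [
    65, 40, 63, 67, 75, 79, 85, 45, 90, 60,
    55, 67, 86, 90, 49, 78, 76, 54, 67, 89,
    56, 45, 50, 85, 67, 90, 83, 72, 98, 55,
]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (units digit)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 67 -> stem 6, leaf 7
        plot[stem].append(leaf)
    # Keep every stem from the smallest to the largest, even one with no leaves.
    return {s: plot.get(s, []) for s in range(min(plot), max(plot) + 1)}

plot = stem_and_leaf(ages)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Sorting the values first means the leaves on each stem come out in increasing order, as in the display above.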
data set. For quantitative data, a class is an interval of numbers if the data is of the continuous
type or an individual number if the data is discrete.
Frequency Distribution
A frequency distribution groups data values into different classes (categories) and records the
number of data points that fall into each category.
Example: Twenty-five army inductees were given a blood test to determine their blood type.
The resulting data set consists of twenty-five data values, each being blood type A, B, AB, or O.
Grouping the data points into the four categories of blood type, we have the following frequency
distribution table:
To construct a frequency distribution for quantitative data, we need to understand the following
definitions:
i. Class Limit
The smallest and largest values that can fall in a given class interval are referred to as the class
limits. Classes without the lower limit or upper limit are known as open classes.
Example: (Refer to the frequency distribution of rural family incomes above)
Lower limit for classes: 500
Upper limit for classes: 899
Example: (Refer to the above frequency distribution of rural family incomes.) The lower and
upper boundaries of the second class (RM500 - RM599) are 499.5 and 599.5 respectively.
Example: Consider the third class from the above example, i.e. class RM600 – RM699.
Class width = 100
Example: Consider the third class from the above example, i.e. class RM600 – RM699. The
class midpoint is (600 + 699) / 2 = 649.5.
Example: A data set has n = 30 data points. A suitable number of classes is:
k = 1 + 3.3 log 30
k = 5.8745 ≈ 6 classes
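The calculation can be checked with a short Python sketch. It assumes that log in the formula is the base-10 logarithm, which reproduces the value 5.8745 above; the function name number_of_classes is illustrative.

```python
import math

def number_of_classes(n):
    """Return the unrounded value k = 1 + 3.3 * log10(n)."""
    return 1 + 3.3 * math.log10(n)

k = number_of_classes(30)
print(round(k, 4))  # 5.8745
print(round(k))     # 6
```

The result is only a guide: the value of k is rounded to a whole number of classes.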
(3) Determine the lower limit of the first class (the starting point)
Any number that is equal to or less than the smallest value in the data set can be used as the
lower limit of the first class.
Example: The following data are the weights (in gm) of 50 mice used in an experiment to
investigate the effect of a deficiency in a certain vitamin.
135 90 115 118 121 137 132 120 104 125
119 115 101 129 87 108 110 133 135 126
127 103 110 126 118 88 104 137 120 95
146 126 119 119 105 132 126 118 100 113
106 125 117 102 145 129 124 113 94 148
Construct a frequency distribution table, showing the class boundaries and class midpoints in it.
Classes are recorded in the first column of the frequency table. Class boundaries and class
midpoints are also recorded. The frequencies represent the number of mice that belong to each of
the seven different classes/categories of weights.
Weight (gm)   Class boundary   Class midpoint   Tally            Frequency, f
87 – 95       86.5 – 95.5      91               ||||             5
96 – 104      95.5 – 104.5     100              |||| |           6
105 – 113     104.5 – 113.5    109              |||| ||          7
114 – 122     113.5 – 122.5    118              |||| |||| ||     12
123 – 131     122.5 – 131.5    127              |||| ||||        10
132 – 140     131.5 – 140.5    136              |||| ||          7
141 – 149     140.5 – 149.5    145              |||              3
Total frequency                                                  50
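The table can be reproduced with a short Python sketch. It is an illustration only, assuming weights recorded to the nearest whole gram (so each boundary lies 0.5 beyond a class limit), a first lower limit of 87, a class width of 9 and seven classes, as in the table; the function name frequency_table is hypothetical.

```python
# Weights of the 50 mice (in gm), copied from the example above.
weights = [
    135, 90, 115, 118, 121, 137, 132, 120, 104, 125,
    119, 115, 101, 129, 87, 108, 110, 133, 135, 126,
    127, 103, 110, 126, 118, 88, 104, 137, 120, 95,
    146, 126, 119, 119, 105, 132, 126, 118, 100, 113,
    106, 125, 117, 102, 145, 129, 124, 113, 94, 148,
]

def frequency_table(values, first_lower, width, k):
    """Build k classes with whole-number limits, starting at first_lower.

    Each row is (lower limit, upper limit, lower boundary,
    upper boundary, midpoint, frequency).
    """
    rows = []
    for i in range(k):
        lower = first_lower + i * width
        upper = lower + width - 1
        freq = sum(1 for v in values if lower <= v <= upper)
        rows.append((lower, upper, lower - 0.5, upper + 0.5,
                     (lower + upper) / 2, freq))
    return rows

table = frequency_table(weights, first_lower=87, width=9, k=7)
for lower, upper, lb, ub, mid, f in table:
    print(f"{lower} - {upper}   {lb} - {ub}   {mid}   {f}")
```

Each data value falls in exactly one class, so the frequencies add up to the 50 mice in the sample.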
Example: The relative frequency and percentage table for the data on weights of mice lacking in
vitamin is as follows:
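As a minimal sketch, the relative frequency of a class is its frequency divided by the total frequency, and the percentage is the relative frequency multiplied by 100. The Python below applies this to the class frequencies of the mice weights; the variable names are illustrative.

```python
# Classes and frequencies from the frequency distribution of mice weights.
classes = ["87 - 95", "96 - 104", "105 - 113", "114 - 122",
           "123 - 131", "132 - 140", "141 - 149"]
frequencies = [5, 6, 7, 12, 10, 7, 3]

total = sum(frequencies)

# Relative frequency = class frequency / total frequency.
relative = [f / total for f in frequencies]

# Percentage = relative frequency x 100.
percentages = [100 * f / total for f in frequencies]

for name, f, r, p in zip(classes, frequencies, relative, percentages):
    print(f"{name:10}  f = {f:2}  relative = {r:.2f}  percentage = {p:.0f}%")
```

The relative frequencies add up to 1 and the percentages add up to 100.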
In a “more than” cumulative frequency distribution table, we add up the frequencies of the
classes that exceed a class boundary. In a “less than” cumulative frequency distribution table,
we add up the frequencies of the classes that are less than a class boundary.
Example: Using the frequency distribution of weights of mice, prepare a (i) “more than” and
(ii) “less than” cumulative frequency distribution for the weights of mice.
Solution:
Weight (gm) Cumulative Frequency
> 86.5 50
> 95.5 45
> 104.5 39
> 113.5 32
> 122.5 20
> 131.5 10
> 140.5 3
> 149.5 0
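Both cumulative distributions can be obtained from the class frequencies with running totals. The Python below is a sketch under the assumption that the class boundaries and frequencies are those of the mice weights; the names less_than and more_than are illustrative.

```python
from itertools import accumulate

# Class boundaries and frequencies from the mice weights distribution.
boundaries = [86.5, 95.5, 104.5, 113.5, 122.5, 131.5, 140.5, 149.5]
frequencies = [5, 6, 7, 12, 10, 7, 3]

total = sum(frequencies)

# "Less than": running total of the frequencies below each boundary.
less_than = [0] + list(accumulate(frequencies))

# "More than": whatever is not below a boundary lies above it.
more_than = [total - c for c in less_than]

for b, lt, mt in zip(boundaries, less_than, more_than):
    print(f"{b:6}  less than: {lt:2}  more than: {mt:2}")
```

At every boundary the two cumulative frequencies add up to the total frequency of 50.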
Graphical displays can reveal the main characteristics of a data set at a glance. The bar chart and
the pie chart are two types of graphs that can be used to display qualitative data. For quantitative
data, the most commonly used graphical displays are histograms, frequency polygons and
cumulative frequency graphs or ogives.
Bar Chart
A bar chart is a graph made up of bars whose heights represent the frequencies of the respective
categories in a frequency table.
Example: Draw a bar chart for the following frequency distribution:
Solution:
Pie Charts
A pie chart is a circular representation of a frequency table in which each category is
represented by a sector of the circle. The size of each sector is drawn in proportion to the
relative frequency or percentage of its category. The right proportion is given by the angle of
the sector, computed as follows:
Angle of sector = (frequency of category / total frequency) × 360°
Solution:
Job Sector               Frequency   Percentage   Angle at the centre of circle
Information Technology   44          44.0%        158.4°
Engineering              30          30.0%        108°
Medical                  15          15.0%        54°
Others                   11          11.0%        39.6°
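The sector angles can be checked with a short Python sketch. It assumes that the four values in the table are frequencies, so that each angle is the category's share of the total multiplied by 360°; the dictionary name sectors is illustrative.

```python
# Values from the job sector table, treated here as frequencies (an assumption).
sectors = {
    "Information Technology": 44,
    "Engineering": 30,
    "Medical": 15,
    "Others": 11,
}

total = sum(sectors.values())

# Angle of sector = (frequency / total frequency) x 360 degrees.
angles = {name: f * 360 / total for name, f in sectors.items()}

for name, angle in angles.items():
    print(f"{name:24} {angle:6.1f} degrees")
```

The angles of all sectors add up to 360°, which is a quick check on the arithmetic.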
[Figure: pie chart of job sectors. Visible labels: engineering 30.0%, medical 15.0%, others.]
Histogram
A histogram is usually constructed for a set of grouped, continuous data. The continuity of the
data values is depicted by bars that are adjacent to each other, without any gaps in between. To
ensure that there are no gaps between adjacent bars, we use class boundaries in the
construction of histograms.
The width of the bars in a histogram represents the width of the classes in a frequency
distribution and these values are plotted on the horizontal axis. Class frequencies, relative
frequencies, or percentages are plotted on the vertical axis and represented by the heights of the
bars.
Note: Histograms cannot be used in connection with frequency distributions that have open
classes, and they must be used with extreme care when class intervals are not equal. In cases
where class intervals are not equal, it is best to represent the class frequencies by the area of the
bars instead of their heights.
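One way to make the area of each bar represent its class frequency is to plot frequency divided by class width as the bar height. The Python below is a minimal sketch of that idea; the boundaries and frequencies are made up for illustration and do not come from any data set in this chapter.

```python
# Hypothetical classes with unequal widths (values invented for illustration).
boundaries = [0, 10, 20, 40, 80]
frequencies = [4, 6, 10, 8]

# Width of each class = upper boundary - lower boundary.
widths = [hi - lo for lo, hi in zip(boundaries, boundaries[1:])]

# Bar height = frequency / class width, so that height x width = frequency.
heights = [f / w for f, w in zip(frequencies, widths)]

for lo, hi, f, h in zip(boundaries, boundaries[1:], frequencies, heights):
    print(f"{lo:2} to {hi:2}  frequency = {f:2}  bar height = {h:.2f}")
```

With these heights, a wide class no longer looks more important than it is merely because its bar covers more of the horizontal axis.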
Solution:
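[Figure: histogram of the weights of mice. Horizontal axis: weights of mice, marked at the class
boundaries 86.5, 95.5, 104.5, 113.5, 122.5, 131.5, 140.5, 149.5. Vertical axis: frequency.
Bar heights: 5, 6, 7, 12, 10, 7, 3.]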
Frequency Polygon
A frequency polygon is constructed by plotting class frequencies against the midpoints of the
classes. The points are then connected by straight lines, thus forming a polygon. To close the
frequency polygon, an additional class with zero frequency is added to each end of the
distribution. A frequency polygon is usually drawn on a histogram, as shown in the diagram
below.
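[Figure: frequency polygon drawn on the histogram of the weights of mice. Horizontal axis:
weights of mice, marked at the class boundaries 86.5, 95.5, 104.5, 113.5, 122.5, 131.5, 140.5,
149.5. Vertical axis: frequency. Bar heights: 5, 6, 7, 12, 10, 7, 3.]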
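The points of the frequency polygon can be listed with a short Python sketch. It assumes the class midpoints, frequencies and class width of 9 from the mice weights distribution; the variable names are illustrative.

```python
# Class midpoints and frequencies from the mice weights distribution.
midpoints = [91, 100, 109, 118, 127, 136, 145]
frequencies = [5, 6, 7, 12, 10, 7, 3]
width = 9  # class width, e.g. 95.5 - 86.5

# Add one extra class with zero frequency at each end to close the polygon.
xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
ys = [0] + frequencies + [0]

# Joining these points in order with straight lines gives the polygon.
points = list(zip(xs, ys))
for x, y in points:
    print(x, y)
```

The first and last points lie on the horizontal axis, which is what closes the polygon.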
Ogive
A cumulative frequency polygon or ogive is obtained by plotting the cumulative frequency of
values that are less than an upper class boundary against the upper class boundary. The points are
then joined together by straight lines. If relative cumulative frequencies or percentages had been
used, we would call the graph a relative frequency ogive or a percentage ogive.
Example: Draw an ogive based on the “less than” cumulative frequency distribution of the
weights of mice.
Weight (gram) Cumulative Frequency
Less than 86.5 0
95.5 5
104.5 11
113.5 18
122.5 30
131.5 40
140.5 47
149.5 50
Solution:
[Figure: “less than” ogive of the weights of mice. Horizontal axis: weights of mice, marked at
the class boundaries 86.5, 95.5, 104.5, 113.5, 122.5, 131.5, 140.5, 149.5. Vertical axis:
cumulative frequency, from 0 to 60.]
Use of Ogive:
From an ogive, we can estimate the number of data points with values less than or more than a
specified value. We can also estimate the data value at the centre location (the median) or at
any other location.
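Reading a value off an ogive amounts to linear interpolation between two neighbouring points, since the points are joined by straight lines. The Python below is a sketch of that reading, assuming the “less than” cumulative frequencies of the mice weights; the function names count_below and value_at are hypothetical.

```python
# Points of the "less than" ogive for the weights of mice.
boundaries = [86.5, 95.5, 104.5, 113.5, 122.5, 131.5, 140.5, 149.5]
less_than = [0, 5, 11, 18, 30, 40, 47, 50]

def count_below(x):
    """Estimate how many data values are less than x."""
    for i in range(len(boundaries) - 1):
        lo, hi = boundaries[i], boundaries[i + 1]
        if lo <= x <= hi:
            c_lo, c_hi = less_than[i], less_than[i + 1]
            # Move along the straight line joining the two ogive points.
            return c_lo + (x - lo) / (hi - lo) * (c_hi - c_lo)
    raise ValueError("x lies outside the range of the class boundaries")

def value_at(count):
    """Estimate the data value below which `count` observations lie."""
    for i in range(len(less_than) - 1):
        c_lo, c_hi = less_than[i], less_than[i + 1]
        if c_lo <= count <= c_hi and c_hi > c_lo:
            lo, hi = boundaries[i], boundaries[i + 1]
            return lo + (count - c_lo) / (c_hi - c_lo) * (hi - lo)
    raise ValueError("count lies outside the range of cumulative frequencies")

total = less_than[-1]
print(round(count_below(130), 1))           # estimated number of mice below 130 gm
print(round(total - count_below(130), 1))   # estimated number of mice above 130 gm
print(value_at(total / 2))                  # estimated median weight
```

Estimates read from a hand-drawn ogive will differ slightly from these, because they depend on how accurately the graph is drawn and read.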
Example: Based on the “more than” ogive, estimate
(i) the number of mice with weight less than 130 gm, and
(ii) the weight which is exceeded by fifty percent of all the mice in the sample.
Solution:
(i) From the “more than” cumulative frequency ogive, 12 mice weigh more than 130 gm; thus,
38 mice weigh less than 130 gm.
(ii) 50% × 50 = 25