General Mathematics VCE Units 3&4 Second Edition
Acknowledgements xii
4 Data transformation
4A The squared transformation
4B The log transformation 221
4C The reciprocal transformation 230
4D Choosing and applying the appropriate transformation 239
Review of Chapter 4 245
11B Interpreting transition matrices 555
11C Transition matrices – using recursion 559
11D Transition matrices – using the rule $S_{n+1} = TS_n + B$ 570
11E Leslie matrices 574
Review of Chapter 11 588
12 Revision: Matrices
12A 596
Exam 1 type questions
Glossary 765
Answers 774
The textbook and its accompanying digital resources provide comprehensive coverage of the
assumed knowledge and skills required.
General Mathematics Units 3 and 4 provide the following four Areas of Study:
Data analysis
Recursion and financial modelling
Matrices and their applications
Networks and decision mathematics
Other key features of Cambridge General Mathematics VCE Units 3&4 Second Edition
For each topic within a chapter:
The topic starts with a clear outline of its Learning intentions
Exam 1-style questions are included at the end of every exercise.
The textbook is supported by an extensive range of online teacher and student resources, as
outlined in the ‘How to use this resource’ section on the following pages.
The TI-Nspire calculator examples and instructions have been completed by Peter Flynn,
and those for the Casio ClassPad by Mark Jelinek, and we thank them for their helpful
Overview of the downloadable PDF
9 The convenience of a downloadable PDF textbook has been retained for times when
users cannot go online.
10 PDF annotation and search features are
Overview of the Interactive Textbook
The Interactive Textbook (ITB) is an online HTML version of the print textbook powered
by the HOTmaths platform, included with the print book or available as a separate purchase.
11 The material is formatted for on-screen use with a convenient and easy-to-use navigation
system and links to all resources.
12 Workspaces for all questions, which can be enabled or disabled by the teacher, allow
students to enter working and answers online and to save them. Input is by typing, with the
help of a symbol palette, handwriting and drawing on tablets, or by uploading images of
writing or drawing done on paper.
13 Self-assessment tools enable students to check answers, mark their own work, and rate
their confidence level in their work. This helps develop responsibility for learning and
communicates progress and performance to the teacher. Student accounts can be linked to
the learning management system used by the teacher in the Online Teaching Suite, so that
teachers can review student self-assessment and provide feedback or adjust marks.
14 All worked examples have video versions to encourage independent learning.
15 Worked solutions are included and can be enabled or disabled in the student ITB accounts
by the teacher.
16 An expanded and revised set of Desmos interactives and activities based on embedded
graphics calculator and geometry tool windows demonstrate key concepts and enable
students to visualise the mathematics.
17 The Desmos graphics calculator, scientific calculator, and geometry tool are also
embedded for students to use for their own calculations and exploration.
18 Revision of prior knowledge is provided with links to diagnostic tests and Year 10
HOTmaths lessons.
19 Quick quizzes containing automarked multiple-choice questions have been thoroughly
expanded and revised, enabling students to check their understanding.
20 Definitions pop up for key terms in the text, and are also provided in a dictionary.
21 Messages from the teacher assign tasks and tests.
22 The HOTmaths learning management system with class and student analytics and reports,
and communication tools.
23 Teacher’s view of a student’s working and self-assessment, which enables them to modify
the student’s self-assessed marks, and respond where students flag that they had difficulty.
24 A HOTmaths-style test generator.
25 An expanded and revised suite of chapter tests, assignments and sample investigations.
26 Editable curriculum grids and teaching programs.
27 A brand-new Exam Generator, allowing the creation of customised printable and online
trial exams (see below for more).
Custom exams can model end-of-year exams, or target specific topics or types of questions
that students may be having difficulty with.
Features include:
The author and publisher wish to thank the following sources for permission to reproduce
Every effort has been made to trace and acknowledge copyright. The publisher apologises for
any accidental infringement and welcomes information that would redress this situation.
Chapter questions
• What is categorical data?
• What is numerical data?
• How do we summarise and display categorical data?
• How do we use the distribution of a categorical variable to answer
statistical questions?
• What is a dot plot?
• What is a stem plot?
• What is a histogram?
• What is a boxplot?
• What are summary statistics, and how do we choose which ones to use?
• How do we use the distribution of a numerical variable to answer statistical
questions?
• What is the normal distribution?
• What is the 68-95-99.7% rule and why is it useful?
• What are standardised values and why are they useful?
We can think of data as factual information about a person, object or situation which has
been collected and recorded. In General Mathematics Units 1&2 we learned a range of
statistical procedures to help us analyse such sets of data. In this chapter, we will review
and extend our knowledge of those procedures for data which has been collected from a
single variable, which is called univariate data.
1A Types of data
Learning intentions
• To be able to classify data as categorical or numerical.
• To be able to further classify categorical data as nominal or ordinal.
• To be able to further classify numerical data as discrete or continuous.
A group of university students were asked to complete a survey, and the information
collected from eight of these students is shown in the following table:
Since the answers to each of the questions in the survey will vary from student to student,
each question defines a different variable, namely:
The values we collect about each of these variables are called data.
Types of variables
Variables come in two general types, categorical and numerical:
Categorical variables
Data generated by a categorical variable can be used to organise individuals into one of
several groups or categories that characterise this quality or attribute.
For example, a ‘C’ in the Study mode column indicates that the student studies on-campus,
while a ‘3’ in the Fitness level column indicates that their fitness level is low.
Categorical variables can be further classified as one of two types: nominal and ordinal.
Nominal variables
The variable Study mode is an example of a nominal variable because the values of the
variable, on-campus or online, simply name the group to which the students belong.
Ordinal variables
Ordinal variables have data values that can be used to both name and order.
The variable Fitness level is an example of an ordinal variable. The data generated by
this variable contains two pieces of information. First, each data value can be used to
group the students by fitness level. Second, it allows us to logically order these groups
according to their fitness level – in this case, as ‘low’, ‘medium’ or ‘high’.
Numerical variables
Numerical variables have data values which are quantities, generally arising from
counting or measuring.
For example, a ‘179’ in the Height column indicates that the person is 179 cm tall, while an
‘82’ in the Pulse rate column indicates that they have a pulse rate of 82 beats/minute.
Numerical variables can be further classified as one of two types: discrete and continuous.
Discrete variables
Discrete variables are those which may take on only a countable number of distinct
values such as 0, 1, 2, 3, 4, . . .
Discrete variables often arise when the situation involves counting. The number
of mobile phones in a house is an example of a discrete variable.
As a guide, discrete variables arise when we ask the question ‘How many?’
Continuous variables
Continuous variables are ones which take an infinite number of possible values, and
are often associated with measuring rather than counting.
Thus, even though we might record a person’s height as 179 cm, their height could be any
value from 178.5 cm up to (but not including) 179.5 cm. We have just rounded to 179 cm for
convenience, or to match the accuracy of the measuring device.
As a guide, continuous variables arise when we ask the question ‘How much?’
Numerical or categorical?
Deciding whether data are numerical or categorical is not an entirely trivial exercise. Two
things that can help your decision-making are:
1 Numerical data can always be used to perform arithmetic computations. This is not the
case with categorical data. For example, it makes sense to calculate the average weight of
a group of individuals, but not the average house number in a street. This is a good test to
apply when in doubt.
2 It is not the variable name alone that determines whether data are numerical or
categorical; it is also the way the data are recorded. For example, if the data for variable
weight are recorded in kilograms, they are numerical. However, if the data are recorded as
‘underweight’, ‘normal weight’, ‘overweight’, they are categorical.
a Discrete, as the number of chocolate chips will only take whole number values.
b Continuous, as the data can take any value, limited only by the accuracy to which the
time can be measured.
ISBN 978-1-009-11041-9 © Peter Jones et al 2023 Cambridge University Press
Photocopying is restricted under law and this material must not be transferred to another party.
c Ordinal, as the numbers in this data do not represent quantities; they represent each
person’s level of approval of the coach.
d Nominal, as the numbers in this data again do not represent quantities; they represent
e Ordinal, as each student’s weight has been recorded into three categories which can
be ordered.
Exercise 1A
5 The variables weight (light, medium, heavy) and age (under 25 years, 25-40 years, over
40 years) are:
A a nominal and an ordinal variable respectively
B an ordinal and a nominal variable respectively
C both nominal variables
D both ordinal variables
E both continuous variables
6 Data relating to the following five variables were collected from a group of university
students:
study mode (1 = on campus, 2 = online)
study load (1 = full-time, 2 = part-time)
number of contact hours per week
number of subjects needed to complete degree
The number of these variables that are discrete numerical variables is:
A 1 B 2 C 3 D 4 E 5
A group of 11 preschool children were asked to choose between chocolate and vanilla
ice-cream (C = chocolate, V = vanilla):
Construct a frequency table (including percentage frequencies) to display the data.
Explanation
1 Set up a table as shown.
2 Count up the number of chocolate (6) and vanilla (5), and record in the Number column.
3 Add to find the total number, 11 (6 + 5).
4 Convert the frequencies into percentage frequencies, e.g. percentage chocolate
= 6/11 × 100% = 54.5%.
5 Finally, total the percentages and record.
Solution
Flavour      Number   Percentage
chocolate       6        54.5
vanilla         5        45.5
Total          11       100.0
Note that the total should always equal the total number of observations – in this case, 11,
and that the percentages should add to 100%. However, if percentages are rounded to one
decimal place a total of 99.9 or 100.1 is sometimes obtained. This is due to rounding error.
Totalling the count and percentages helps check on your tallying and percentaging.
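The percentaging in the ice-cream example can also be checked with a short script. This is a minimal Python sketch, not the textbook's method: it starts from the counts 6 and 5 (the individual responses are not reproduced here), and the helper name `frequency_table` is our own.

```python
def frequency_table(counts, decimals=1):
    """Return (category, number, percentage) rows followed by a total row.

    counts: dict mapping each category to its frequency (hypothetical helper).
    """
    total = sum(counts.values())
    rows = []
    for category, number in counts.items():
        # Percentage frequency = number / total x 100%, rounded for display.
        rows.append((category, number, round(number / total * 100, decimals)))
    # Totalling the rounded percentages may give 99.9 or 100.1 (rounding error).
    pct_total = round(sum(pct for _, _, pct in rows), decimals)
    rows.append(("Total", total, pct_total))
    return rows


for category, number, pct in frequency_table({"chocolate": 6, "vanilla": 5}):
    print(f"{category:<10}{number:>7}{pct:>12.1f}")
```

The rows it prints match the table above: 54.5% chocolate, 45.5% vanilla and a total of 11 children and 100.0%.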
Explanation
a The data enable us to both group the countries by climate type and put these groups in
some sort of natural order according to the ‘warmth’ of the different climate types. The
variable is ordinal.
Solution: Ordinal
b 1 Label the horizontal axis with the variable name, Climate type. Mark the scale off into
three equal intervals and label them ‘Cold’, ‘Mild’ and ‘Hot’.
[Figure: bar chart of climate type, with categories Cold, Mild and Hot]
required to identify which segment represents which category (see opposite). The
segmented bar chart opposite was formed from the climate data used in Example 3. In a
percentage segmented bar chart, the lengths of each segment in the bar are determined by
the percentages. When this is done, the height of the bar is 100%.
Explanation
1 In a segmented bar chart, the horizontal axis has no label.
2 Label the vertical axis ‘Percentage’. Scale the axis to allow for a maximum of 100 (%),
and mark the scale in tens.
3 Draw a single bar of height 100. Divide the bar into three by inserting dividing lines at
13.0% and 73.9% (13.0% + 60.9%).
4 Shade (or colour) the segments differently.
5 Insert a legend to identify each shaded segment by climate type.
[Figure: percentage segmented bar chart of climate type, with segments Cold, Mild and Hot;
vertical axis Percentage from 0 to 100]
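The dividing lines in step 3 are cumulative percentages. The minimal Python sketch below computes them from counts of 3 cold, 14 mild and 6 hot countries; these counts are our own back-calculation from the reported 13.0%, 60.9% and 26.1% of 23 countries, and `segment_boundaries` is a hypothetical helper. Working from counts avoids adding up percentages that have already been rounded.

```python
def segment_boundaries(counts, decimals=1):
    """Cumulative percentage at the top of each segment, from the bottom up."""
    total = sum(counts.values())
    running = 0
    boundaries = []
    for category, number in counts.items():
        running += number  # countries in this segment and all segments below it
        boundaries.append((category, round(running / total * 100, decimals)))
    return boundaries


climate = {"Cold": 3, "Mild": 14, "Hot": 6}
print(segment_boundaries(climate))
```

The output places the lines at 13.0 and 73.9, and the top of the bar at 100.0.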
The mode
One of the features of a data set that is quickly revealed with a frequency table or a bar
chart is the mode or modal category.
In a bar chart, the mode is given by the category with the tallest bar or longest segment.
For the previous bar charts, the modal category is clearly ‘mild’. That is, for the countries
considered, the most frequently occurring climate type is ‘mild’.
Modes are particularly important in ‘popularity’ polls, for example, in answering questions
such as ‘Which is the most watched TV station between 6:00 p.m. and 8:00 p.m.?’ or ‘When
is the time a supermarket is in peak demand: morning, afternoon or night?’
Note, however, that the mode is only of real interest when a single category stands out
from the others.
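Finding the modal category amounts to picking the category with the largest count. A minimal Python sketch follows (the helper name `modal_categories` is our own, and the climate counts are back-calculated from the percentages above). It returns a list, so a tie between categories, where the mode is of little interest, shows up as more than one answer.

```python
def modal_categories(counts):
    """Return every category that has the largest count (hypothetical helper)."""
    largest = max(counts.values())
    return [category for category, number in counts.items() if number == largest]


print(modal_categories({"Cold": 3, "Mild": 14, "Hot": 6}))  # ['Mild']
```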
The climate types of 23 countries were classified as being ‘cold’, ‘mild’ or ‘hot’. The
majority of the countries, 60.9%, were found to have a mild climate. Of the remaining
countries, 26.1% were found to have a hot climate, while 13.0% were found to have a
cold climate.
Exercise 1B
Example 3 2 The following data identify the state of residence of a group of people where
1 = Victoria, 2 = South Australia and 3 = Western Australia.
2 1 1 1 3 1 3 1 1 3 3
a Is the variable state of residence categorical or numerical?
b Construct a frequency table (with both numbers and percentages) to show the
distribution of state of residence for this group of people.
c Construct a bar chart of the percentaged frequency table.
b Copy and complete the table giving the percentages correct to the nearest whole number.
c Display the data in the form of a percentage segmented bar chart.
b Use the information in the frequency table to complete the following report
describing the distribution of student responses to the question, ‘How often do you
play sport?’
When students were asked the question, ‘How often do you play sport?’,
the dominant response was ‘Sometimes’, given by ___% of the students. Of
the remaining students, ___% of the students responded that they played sport
___, while ___% said that they played sport ___.
[Figure: chart of responses, with categories ‘almost none of the time’, ‘some of the time’,
‘most of the time’ and ‘almost all of the time’]
The percentage of people who chose the modal response to this question is closest to:
A 30% B 43% C 50% D 57% E 70%
Frequency tables can also be used to organise numerical data. For a discrete variable which
only takes a small number of values the process is the same as that for categorical data, as
shown in the following example.
The numbers of bedrooms in each of the 24 properties sold in a certain area over a
one-month period are as follows:
2 3 4 3 3 4 3 4 4 1 3 2 1 2 2 2 4 5 3 4 4 5 3 4
Construct a table for these data showing both frequency and percentage frequency, giving
the values of the percentage frequency rounded to one decimal place.
Explanation
1 Find the maximum and the minimum values in the data set. Here the minimum is 1 and
the maximum is 5.
2 Construct a table as shown, including all the values between the minimum and the
maximum.
3 Count the number of 1s, 2s, etc. in the data set. Record these values in the number
column and add the frequencies.
Solution
Number of     Frequency
bedrooms      Number      %
1                2        8.3
2                5       20.8
3                7       29.2
4                8       33.3
5                2        8.3
Total           24       99.9
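The same table can be produced in Python. This is a minimal sketch with variable names of our own choosing: `collections.Counter` does the tallying, and looping from the minimum to the maximum value mirrors step 2 by keeping any value with a zero count in the table.

```python
from collections import Counter

bedrooms = [2, 3, 4, 3, 3, 4, 3, 4, 4, 1, 3, 2,
            1, 2, 2, 2, 4, 5, 3, 4, 4, 5, 3, 4]

counts = Counter(bedrooms)  # tally of each data value
total = len(bedrooms)

# One row per value from the minimum to the maximum, including zero counts.
table = []
for value in range(min(bedrooms), max(bedrooms) + 1):
    number = counts[value]  # Counter returns 0 for values that never occur
    table.append((value, number, round(number / total * 100, 1)))

for value, number, pct in table:
    print(f"{value:>5}{number:>8}{pct:>8.1f}")
# Adding the rounded percentages gives 99.9, not 100, because of rounding error.
print(f"Total{total:>8}{sum(pct for _, _, pct in table):>8.1f}")
```

Running it reproduces the frequencies 2, 5, 7, 8 and 2 and the 99.9% total shown in the table.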
The choice of intervals can vary but there are some guidelines.
A division which results in about 5 to 15 groups is preferred.
Choose an interval width that is easy for the reader to interpret such as 10 units, 100 units
or 1000 units (depending on the data).
By convention, the beginning of the interval is given the appropriate exact value, rather
than the end. As a result, intervals of 0–49, 50–99, 100–149 would be preferred over the
intervals 1–50, 51–100, 101–150 etc.
When we then organise the data into a frequency table using these data intervals we call this
table a grouped frequency table.
The data below give the average hours worked per week in 23 countries.
35.0 48.0 45.0 43.0 38.2 50.0 39.8 40.7 40.0 50.0 35.4 38.8
40.2 45.0 45.0 40.0 43.0 48.8 43.3 53.1 35.6 44.1 34.8
Construct a grouped frequency table with five intervals.
Explanation
1 Set up a table as shown. Use five intervals: 30.0–34.9, 35.0–39.9, . . . , 50.0–54.9.
2 List these intervals, in ascending order, under Average hours worked.
3 Count the number of countries whose average working hours fall into each of the
intervals. Record these values in the ‘Number’ column.
4 Convert the counts into percentages and record in the ‘Percentage’ column.
5 Total the number and percentage columns.
Solution
Average           Frequency
hours worked      Number    Percentage
30.0–34.9            1          4.3
35.0–39.9            6         26.1
40.0–44.9            8         34.8
45.0–49.9            5         21.7
50.0–54.9            3         13.0
Total               23         99.9
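The grouping can be sketched in Python as follows. This is an illustration, not a prescribed method: the names `START`, `WIDTH` and `N_INTERVALS` are our own, and the interval labels assume the data are recorded to one decimal place, as they are here.

```python
hours = [35.0, 48.0, 45.0, 43.0, 38.2, 50.0, 39.8, 40.7, 40.0, 50.0, 35.4, 38.8,
         40.2, 45.0, 45.0, 40.0, 43.0, 48.8, 43.3, 53.1, 35.6, 44.1, 34.8]

START = 30.0   # beginning of the first interval
WIDTH = 5.0    # interval width
N_INTERVALS = 5

counts = [0] * N_INTERVALS
for x in hours:
    # x belongs to interval k when START + k*WIDTH <= x < START + (k+1)*WIDTH.
    k = int((x - START) // WIDTH)
    counts[k] += 1

total = len(hours)
for k, number in enumerate(counts):
    low = START + k * WIDTH
    # Labels like 30.0-34.9 assume data recorded to one decimal place.
    label = f"{low:.1f}-{low + WIDTH - 0.1:.1f}"
    print(f"{label:<12}{number:>4}{number / total * 100:>8.1f}")
```

The counts come out as 1, 6, 8, 5 and 3, matching the grouped frequency table.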
Explanation
1 Label the horizontal axis with the variable name, Average hours worked. Mark the
scale using the start of each interval: 30, 35, . . .
1 Start a new document by pressing [ctrl] + [N] (or [home] > New). If prompted to save an
existing document, move the cursor to No and press [enter].
2 Select Add Lists & Spreadsheet.
Enter the data into a list named marks.
a Move the cursor to the name cell of column A and type in marks as the list variable.
Press [enter].
b Move the cursor down to row 1, type in the first data value and press [enter].
Continue until all the data have been entered. Press [enter] after each entry.
3 Statistical graphing is done through the Data & Statistics application. Press
[ctrl] + [I] (or alternatively press [ctrl]) and select Add Data & Statistics.
a Press [tab] [enter] (or click on the Click to add variable box on the x-axis) to show the
list of variables. Select marks. Press [enter] to paste marks to that axis.
b A dot plot is displayed as the default. To change the plot to a histogram, press
[menu] > Plot Type > Histogram. The histogram shown opposite has a column (or bin)
width of 2, and a starting point (alignment) of 3. See Step 5 below for instructions on
how to change the appearance of a histogram.
4 Data analysis
a Move the cursor over any column; a { will
appear and the column data will be displayed
as shown opposite.
b To view other column data values, move the
cursor to another column.
Note: If you click on a column, it will be selected.
Hint: If you accidentally move a column or data point, [ctrl] + [esc] [enter] will undo the move.
5 Change the histogram column (bin) width to 4 and the starting point to 2.
a Press [ctrl] + [menu] to access the context menu as shown (below left).
Hint: Pressing [ctrl] + [menu] [enter] with the cursor on the histogram gives you a context
menu that relates only to histograms. You can access the commands through
[menu] > Plot Properties.
b Select Bin Settings > Equal Bin Width.
c In the settings menu (below right) change the Width to 4 and the Starting Point
(Alignment) to 2 as shown. Press [enter].
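The Width and Starting Point (Alignment) settings decide which column each data value falls into. The Python sketch below imitates them under our own assumption that each bin contains its left edge but not its right edge (check your calculator's convention). `bin_counts` is a hypothetical helper, and the usage reuses the average-hours data from the grouped frequency table example.

```python
import math


def bin_counts(data, start, width):
    """Count values in the bins [start, start + width), [start + width, start + 2*width), ...

    Hypothetical helper imitating a histogram's 'Width' and 'Starting Point'
    settings; assumes each bin includes its left edge but not its right edge.
    """
    if start > min(data):
        raise ValueError("the starting point must not exceed the smallest data value")
    n_bins = math.floor((max(data) - start) / width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[math.floor((x - start) / width)] += 1
    return counts


hours = [35.0, 48.0, 45.0, 43.0, 38.2, 50.0, 39.8, 40.7, 40.0, 50.0, 35.4, 38.8,
         40.2, 45.0, 45.0, 40.0, 43.0, 48.8, 43.3, 53.1, 35.6, 44.1, 34.8]

print(bin_counts(hours, start=30, width=5))   # [1, 6, 8, 5, 3]
print(bin_counts(hours, start=30, width=10))  # [7, 13, 3]
```

Doubling the width merges neighbouring columns, so the same data produce a coarser picture. This is why it is worth experimenting with the bin settings before describing a distribution's shape.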
1 From the application menu
screen, locate the built-in
Statistics application.
Tap to open.
Tapping from the icon panel
(just below the touch screen) will
display the application menu if it
is not already visible.
2 Enter the data into a list named marks.
The screen should look like the one shown above right.
3 Set up the calculator to plot a
statistical graph.
a Tap from the toolbar. This
opens the Set StatGraphs dialog box.
b Complete the dialog box as
given below.
Draw: select On.
Type: select Histogram.
XList: select main\marks.
Freq: leave as 1.
c Tap Set to confirm your selections.
Note: To make sure only this graph is drawn, select SetGraph from the menu bar at the top and
confirm that there is a tick only beside StatGraph1 and no others.
4 To plot the graph:
a Tap in the toolbar.
b Complete the Set Interval
dialog box as follows.
HStart: type 2 (i.e. the
starting point of the
first interval)
HStep: type 4 (i.e. the
interval width).
Tap OK to display the histogram.
Note: The screen is split into two halves, with the graph displayed in the bottom half, as shown
above. Tapping from the icon panel allows the graph to fill the entire screen. Tap again to
return to half-screen size.
5 Tapping from the toolbar
places a marker (+) at the
top of the first column of the
histogram (see opposite) and
tells us that:
a the first interval begins at 2 (xc = 2)
b for this interval, the frequency is 1 (Fc = 1).
To find the frequencies
and starting points of the
other intervals, use the cursor key arrow ( ) to move from interval to interval.
The purpose of constructing a histogram is to help understand the key features of the data
distribution. These features are:
How are the data distributed? Is the histogram peaked? That is, do some data values tend to
occur much more frequently than others, or is the histogram relatively flat, showing that all
values in the distribution occur with approximately the same frequency?
Symmetric distributions
If a histogram is single-peaked, does the histogram region tail off evenly on either side of the
peak? If so, the distribution is said to be symmetric (see Histogram 1).
[Figure: Histogram 1 (single peak, with lower tail, peak and upper tail labelled) and
Histogram 2 (two peaks)]
A single-peaked symmetric distribution is characteristic of the data that derive from
measuring variables such as intelligence test scores, weights of oranges, or any other data for
which the values vary evenly around some central value.
The double-peaked distribution (histogram 2) is symmetric about the dip between the two
peaks. A histogram that has two distinct peaks indicates a bimodal (two modes) distribution.
A bimodal distribution often indicates that the data have come from two different
populations. For example, if we were studying the distance the discus is thrown by Olympic-
level discus throwers, we would expect a bimodal distribution if both male and female
throwers were included in the study.
Skewed distributions
Sometimes a histogram tails off primarily in one direction. If a histogram tails off to the
right, we say that it is positively skewed (Histogram 3). The distribution of salaries of
workers in a large organisation tends to be positively skewed. Most workers earn a similar
salary with some variation above or below this amount, but a few earn more, and even fewer,
such as senior managers, earn much more. The distribution of house prices also tends to be
positively skewed.
[Figure: Histogram 3 (peak with a long upper tail, +ve skew) and Histogram 4 (peak with a
long lower tail, −ve skew)]
If a histogram tails off to the left, we say that it is negatively skewed (Histogram 4). The
distribution of age at death tends to be negatively skewed. Most people die in old age, a few
in middle age and fewer still in childhood.
Histograms 5 to 7 display the distribution of test scores for three different classes taking
the same subject. They are identical in shape, but differ in where they are located along the
axis. In statistical terms we say that the distributions are ‘centred’ at different points along
the axis. But what do we mean by the centre of a distribution?
[Figure: Histograms 5 to 7; horizontal axis from 50 to 150]
This is an issue we will return to in more detail later in the chapter. For the present we will
take the centre to be the middle of the distribution.
The middle of a symmetric distribution is reasonably easy to locate by eye. Looking at
histograms 5 to 7, it would be reasonable to say that the centre or middle of each distribution
lies roughly halfway between the extremes; half the observations would lie above this point
and half below. Thus we might estimate that histogram 5 (yellow) is centred at about 60,
histogram 6 (light blue) at about 100, and histogram 7 (dark blue) at about 140.
For skewed distributions, it is more difficult to estimate the middle of a distribution by eye.
The middle is not halfway between the extremes because, in a skewed distribution, the
scores tend to bunch up at one end.
However, if we imagine a cardboard cut-out of the histogram, then the middle lies on the
line that divides the area of the histogram in half.
[Figure: skewed histogram with a vertical line dividing the area of the histogram in half]
If the histogram is single-peaked, is it narrow? This would indicate that most of the data
values in the distribution are tightly clustered in a small region. Or is the peak broad? This
would indicate that the data values are more widely spread out. Histograms 9 and 10 are
both single-peaked. Histogram 9 has a broad peak, indicating that the data values are not
very tightly clustered about the centre of the distribution. In contrast, Histogram 10 has a
narrow peak, indicating that the data values are tightly clustered around the centre of the
distribution.
[Figure: Histogram 9 and Histogram 10; horizontal axes from 2 to 22]
Outliers are any data values that stand out from the main body of data. These are data values
that are atypically high or low. See, for example, Histogram 11, which shows an outlier. In
this case it is a data value that is atypically low compared to the rest of the data values.
In the histogram shown there appears to be an outlier, a data value which is lower than the
rest of the data values. Such values should be checked; they may indicate an unusually low
value, but they may also indicate an error in the data.
[Figure: Histogram 11, showing the main body of data and an outlier]
Explanation
1 Determine the shape of the distribution.
Solution: The distribution is clearly negatively skewed, with a long lower tail.
2 Locate the (approximate) centre of the distribution, the value that seems to divide the
area of the histogram in half.
Solution: The centre of the distribution is around 38–39 weeks.
3 Consider the spread of the distribution: are the majority of the values close to the
centre, or quite spread out?
Solution: The data values range from 25 to 42 weeks, but most of the data values are
close to the centre, in the range of 36–41 weeks.
4 Can we identify any outliers?
Solution: Those values which are less than 34 weeks seem to be small in comparison to
the rest of the data, but which values are outliers cannot be determined from the histogram.
We can see from this example that it is very difficult to give exact values for centre and
spread, and to clearly identify outliers, from the histogram. We will return to this example
later in the chapter once we have discussed which measures of centre and spread are
appropriate for this distribution, and when we have an exact definition of an outlier.
Exercise 1C
Constructing a frequency table for discrete numerical data taking a small number of values
Example 6 1 The number of times a sample of 20 people bought take-away food over a one-week
period is as follows:
0 5 3 0 1 0 2 4 3 1 0 2 1 2 1 5 3 0 0 4
a Construct a frequency table for the data, including both the frequency and
percentage frequency.
b What percentage of people bought take-away food more than 3 times?
c What is the mode of this distribution?
2 The number of chocolate chips per biscuit in a sample of 40 biscuits was found to be as follows:
2 5 4 4 5 4 6 4 4 4 5 4 4 5 6 6 5 5 4 6
4 5 5 4 5 4 6 4 6 4 5 4 5 4 6 5 5 6 4 6
a Construct a frequency table for the data, including both the frequency and
percentage frequency.
b What percentage of biscuits contained three or less chocolate chips?
c What is the mode of this distribution?
ii 25–29 words?
averages of cricketers playing for a district team.
[Figure: histogram; horizontal axis Batting average, from 0 to 55]
a How many players have their averages recorded in this histogram?
b How many of these cricketers had a batting average:
i 20 or more?
ii less than 15?
iii at least 20 but less than 30?
iv of 45?
c What percentage of these cricketers had a batting average:
i 50 or more? ii at least 20 but less than 40?
8 The numbers of children in the families of 25 VCE students are listed below.
1 6 2 5 5 3 4 1 2 7 3 4 5
3 1 3 2 1 4 4 3 9 4 3 3
a Use a CAS calculator to construct a histogram so that the column width is one and
the first column starts at 0.5.
b What is the starting point for the fourth column and what is the count?
c Redraw the histogram so that the column width is two and the first column starts
at 0.
d i What is the count in the interval from 6 to less than 8?
ii What actual data value(s) does this interval include?
[Figure: Histograms A to F]
[Figure: histogram for Question 11; horizontal axis from 40 000 to 150 000]
11 The number of people in the company who earn from $65,000 to less than $70,000 per
year is equal to:
A 20 B 30 C 50 D 32 E 62
Dot plots and stem plots are two simple plots used to display numerical data. They are
generally constructed by hand (that is, without using a calculator), from a data set that is
reasonably small.
A dot plot consists of a number line with each data point marked by a dot. When several
data points have the same value, the points are stacked on top of each other.
While some CAS calculators will construct a stem plot, stem plots were designed to be a quick
and easy way of ordering and displaying a small data set by hand.
For example, consider the marks obtained by 17 VCE students on a statistics test.
2 12 13 9 18 17 7 16 12 10 16 14 11 15 16 15 17
If we use the process described in Example 11 to form a stem plot, we end up with a
‘bunched-up’ plot like the one below.
0 | 2 7 9
1 | 0 1 2 2 3 4 5 5 6 6 6 7 7 8
We can solve this problem by ‘splitting’ the stems.
Generally the stem is split into halves or fifths as shown below.
Single stem (Key: 1|6 = 16)
0 | 2 7 9
1 | 0 1 2 2 3 4 5 5 6 6 6 7 7 8

Stem split into halves (Key: 1|6 = 16)
0 | 2
0 | 7 9
1 | 0 1 2 2 3 4
1 | 5 5 6 6 6 7 7 8

Stem split into fifths (Key: 1|6 = 16)
0 |
0 | 2
0 |
0 | 7
0 | 9
1 | 0 1
1 | 2 2 3
1 | 4 5 5
1 | 6 6 6 7 7
1 | 8
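The splitting rule can be written as a short Python sketch. The function `stem_plot` and its `parts` parameter are our own illustration, and the sketch assumes non-negative whole-number data with the tens digit as the stem. Splitting into halves sends leaves 0–4 and 5–9 to separate rows. Splitting into fifths uses leaves 0–1, 2–3, 4–5, 6–7 and 8–9.

```python
from collections import defaultdict


def stem_plot(data, parts=1):
    """Return stem-plot lines, splitting each stem into `parts` rows (1, 2, 5 or 10).

    Hypothetical helper: assumes non-negative integers, stem = tens digit.
    """
    if 10 % parts != 0:
        raise ValueError("parts must be 1, 2, 5 or 10")
    size = 10 // parts  # number of possible leaf digits per row
    rows = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        rows[(stem, leaf // size)].append(leaf)
    lines = []
    for stem in range(min(data) // 10, max(data) // 10 + 1):
        for part in range(parts):
            leaves = " ".join(str(leaf) for leaf in rows[(stem, part)])
            lines.append(f"{stem} | {leaves}".rstrip())  # empty rows show as "0 |"
    return lines


marks = [2, 12, 13, 9, 18, 17, 7, 16, 12, 10, 16, 14, 11, 15, 16, 15, 17]
for line in stem_plot(marks, parts=5):
    print(line)
```

With `parts=1`, `2` and `5`, it reproduces the single-stem, halves and fifths plots shown above for the 17 test marks.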
Splitting the stems is useful when there are only a few different values for the stem.
Exercise 1D
Constructing a dot plot
Example 10 1 The following data give the number of rooms in 11 houses.
4 6 7 7 8 4 4 8 8 7 8
a Is the variable number of rooms discrete or continuous?
b Construct a dot plot.
2 The following data give the number of children in the families of 14 VCE students:
1 6 2 5 5 3 4 4 2 7 3 4 3 4
a Is the variable number of children discrete or continuous?
b Construct a dot plot.
c Write down the value of the mode. What does the mode represent in the context of
the data?
3 The following data give the average life expectancies in years of 13 countries.
76 75 74 74 73 73 75 71 72 75 75 78 72
4 Describe the shape of each of the following distributions (negatively skewed, positively
skewed, or approximately symmetric).
[Figure: two dot plots, a and b, each on a scale from 0 to 8]
9 The stem plot below shows the ages, in years, of all the people attending a meeting.
Age (years) Key: 2|0 represents 20
2 | 2 3
2 | 5 6 6 7 7 8 9
3 | 0 1 3 3 4 4 4 4
3 | 5 5 5 6 7 7 7 8 8 8 9 9
4 | 0 2 3 3 4 4
4 | 5 5 6 8
5 | 0
a How many people attended the meeting?
b What is the shape of the distribution of ages?
c How many of these people were less than 33 years old?
Many numerical variables that we deal with in statistics have values that range over
several orders of magnitude. For example, the populations of countries range from a few
thousand, to hundreds of thousands, to millions, to hundreds of millions, to more than 1 billion.
Constructing a histogram on an ordinary (linear) scale that effectively locates every country on the plot is impossible.
One way to solve this problem is to use a scale that spreads out the countries with small
populations and ‘pulls in’ the countries with huge populations.
A scale that will do this is called a logarithmic scale (or, more commonly, a log scale).
Consider the numbers:
0.01, 0.1, 1, 10, 100, 1000, 10 000, 100 000, 1 000 000
Such numbers can be written more compactly as:
$10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}, 10^{3}, 10^{4}, 10^{5}, 10^{6}$
In fact, if we make it clear we are only talking about powers of 10, we can merely
write down the powers:
−2, −1, 0, 1, 2, 3, 4, 5, 6
These powers are called the logarithms of the numbers or ‘logs’ for short.
When we use logarithms to write numbers as powers of 10, we say we are working with
logarithms to the base 10. We can indicate this by writing $\log_{10}$.
$\log_{10} x$
If $\log_{10} x = b$, then $10^{b} = x$.
Logarithmic transformation
A logarithmic transformation involves changing the scale on the horizontal axis from $x$
to $\log_{10}(x)$, and replacing each of the data values with its logarithm.
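In code, the transformation is a single function applied to every value. This Python sketch uses the standard library's `math.log10`; the helper name `log_transform` is our own. Logarithms are defined only for positive numbers, so `math.log10` raises a `ValueError` for zero or negative data.

```python
import math

def log_transform(values):
    """Replace each data value x with log10(x); x must be positive."""
    return [math.log10(x) for x in values]

# The powers of 10 from the text become their exponents.
numbers = [0.01, 0.1, 1, 10, 100, 1000, 10_000, 100_000, 1_000_000]
print([round(v, 10) for v in log_transform(numbers)])
# [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# 10 kg, 100 kg and 1000 kg are equally spaced on the log scale.
print(log_transform([10, 100, 1000]))   # [1.0, 2.0, 3.0]
```

The last line shows why, in the body-weight example that follows, the intervals from 10 to 100 kg and from 100 to 1000 kg have equal widths on the log scale: each covers exactly one unit.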
For example, the histogram below displays the body weights (in kg) of a number of animal
species. Because the animals represented in this data set have weights ranging from around
1 kg to 90 tonnes (a dinosaur), most of the data are bunched up at one end of the scale and
much detail is missing. The distribution of weights is highly positively skewed, with an
[Figure: histogram of Body weight (kg), horizontal axis from 0 to 90 000, vertical axis Percentage]
[Figure: histogram of log(body weight), horizontal axis from −2 to 6, vertical axis Percentage]
With the weights plotted on a log scale, the distribution is now approximately symmetric, with no outliers, and the histogram is considerably more informative.
We can now see that the percentage of animals with weights between 10 and 100 kg is similar to the percentage of animals with weights between 100 and 1000 kg.
Explanation
a Open a calculator screen, type log(45) and press enter. Write down the answer correct to two significant figures.
b If the log of a number is 2.7125, then the number is $10^{2.7125}$. Enter the expression $10^{2.7125}$ and press enter. Write down the answer correct to the nearest whole number.
Solution
a $\log 45 = 1.65\ldots = 1.7$ (to 2 sig. figs)
b $10^{2.7125} = 515.82\ldots = 516$ (to the nearest whole number)
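The same two calculations can be checked in Python. `math.log10` takes the log, and raising 10 to a power undoes it. A small helper of our own, `round_sig`, rounds to a given number of significant figures. This is a sketch for checking answers, not a replacement for the calculator steps above.

```python
import math

def round_sig(x, sig_figs):
    """Round x to the given number of significant figures."""
    if x == 0:
        return 0.0
    decimals = sig_figs - 1 - math.floor(math.log10(abs(x)))
    return round(x, decimals)

# a  Take the log of 45 and round to two significant figures.
log_45 = math.log10(45)               # 1.6532...
print(round_sig(log_45, 2))           # 1.7

# b  Undo a log of 2.7125 by raising 10 to that power.
number = 10 ** 2.7125                 # 515.82...
print(round(number))                  # 516
print(round(math.log10(number), 4))   # 2.7125 (back where we started)
```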
a What body weight (in kg) is represented by the number 4 on the log scale?
b How many of these animals have body weights more than 10 000 kg?
[Figure: histogram of log(body weight), horizontal axis from −2 to 6]
c The weight of a cat is 3.3 kg. Use your calculator to determine the log of its weight
correct to two significant figures.
d Determine the weight (in kg) of the animal with a log(body weight) of 3.4 (the
elephant). Write your answer correct to the nearest whole number.
Explanation
a If the log of a number is 4, then the number is $10^{4} = 10\,000$.
Solution
a $10^{4} = 10\,000$ kg
1 a Start a new document by pressing ctrl + N.
b Select Add Lists & Spreadsheet.
Enter the data into a column named weight.
Exercise 1E
a The brain weight (in g) of a mouse is 0.4 g. What value would be plotted on the log scale?
b The brain weight (in g) of an African elephant is 5712 g. What is the log of this brain weight (to two significant figures)?
[Figure: histogram of log(brain weight), horizontal axis from −2 to 6]
c What brain weight (in g) is represented by the number 2 on the log scale?
d What brain weight (in g) is represented by the number –1 on the log scale?
e Use the histogram to determine the number of these animals with brain weights:
i 1000 g or more ii from 1 to less than 100 g iii 1 g or more
emissions (in thousands of metric tons) for 239 different countries, plotted on a $\log_{10}$ scale.
[Figure: histogram, horizontal axis (log scale) from 1.00 to 8.00]
Based on this histogram, the percentage of countries with carbon dioxide emissions (in
thousands of metric tons) from 10 000 to less than 100 000 is equal to:
A 21 B 25 C 26 D 50 E 60
6 The following histogram shows the amount spent by tourists from several countries in one year (spending), plotted on a $\log_{10}$ scale.
[Figure: histogram, horizontal axis (log scale) from 6.00 to 13.00]
The number of countries where tourists spent from $100 000 000 to less than
$1 000 000 000 per year is equal to:
A 12 B 25 C 33 D 37 E 51
The median
The median is the middle value in an ordered data set.
For $n$ data values, the median is located at the $\left(\dfrac{n+1}{2}\right)$th position.
If n is odd, the median will be the middle data value.
If n is even, the median will be the average of the two middle data values.
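The (n + 1)/2 rule translates directly into code. The sketch below is a minimal Python version; the function name `median` is our own. Python's standard `statistics.median` gives the same results and could be used instead. Here it is applied to both data sets from the next example.

```python
def median(values):
    """Median using the (n + 1)/2 position rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # odd n: the single value at position (n + 1)/2, counting from 1
        return ordered[(n + 1) // 2 - 1]
    # even n: average the two values either side of position (n + 1)/2
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(median([2, 9, 1, 8, 3, 5, 3, 8, 1]))       # 3
print(median([10, 1, 3, 4, 8, 6, 10, 1, 2, 9]))  # 5.0
```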
Order each of the following data sets, locate the median, and then write down its value.
a 2 9 1 8 3 5 3 8 1 b 10 1 3 4 8 6 10 1 2 9
Explanation Solution
a For an odd number of data values, the median will be the middle data value.
1 Write down the data set in order.
   1 1 2 3 3 5 8 8 9
2 Locate the middle data value by eye or use the rule.
   Median is the $\left(\dfrac{9+1}{2}\right)$th, or fifth, value.
3 Write down the median.
   Median = 3
b For an even number of data values, the
median will be the average of the two
middle data values.
Determine the median age of these cricketers and mark its location on the dot plot.
Explanation Solution
The median value is the middle
data value in the dot plot.
1 Locate the middle data value
(or use the rule) and identify it
on the dot plot.
[Figure: dot plot of Age (years), scale from 17 to 30]
2 Write down its value. Median = 22 years
The stem plot below displays the maximum temperature (in °C) for 12 days in January.
Determine the median maximum temperature for these 12 days.
Key: 0|8 = 8°C
1 | 8 9 9
2 | 0 2 5 7 8 9 9
3 | 1 3
Explanation
1 For an even number of data values, as in this example, the median will be the average of the two middle data values.
2 Locate the two middle data values in the stem plot by eye (or use the rule) and identify them on the stem plot.
3 Determine the median by finding the average of these two data values.
Solution
The two middle data values are 25 and 27.
$M = \dfrac{25 + 27}{2} = 26$°C
Having found the median value in a dot plot or stem plot, we now look at ways of doing the
same with the first measure of spread, the range.
The range
The range
The range, R, is the simplest measure of spread of a distribution. It is the difference
between the largest and smallest values in the data set.
R = largest data value − smallest data value
Explanation
1 Identify the lowest and highest values in the stem plot and write them down.
2 Substitute into the rule for the range and evaluate.
Solution
Key: 0|8 = 8°C
1 | 8 9 9
2 | 0 2 5 7 8 9 9
3 | 1 3
Lowest = 18, highest = 33
Range = 33 − 18 = 15°C
Because the range depends only on the two extreme values in the data, it is not always an
informative measure of spread. For example, one or other of these two values might be an
outlier. Furthermore, any data with the same highest and lowest values will have the same
range, irrespective of the way in which the data are spread out in between.
A more refined measure of spread that overcomes these limitations of the range is the
interquartile range (IQR).
Example 18 Finding the IQR from an ordered stem plot when n is even
Find the interquartile range of the weights of the 18 cats whose weights are displayed in the ordered stem plot below.
Weight (kg) Key: 1|2 represents 1.2 kg
1 | 9
2 | 1 3 5 8
3 | 0 0 4 9 9
4 | 0 4 5 8
5 | 0 3
6 | 3 4
Explanation Solution
1 There are 18 values in total. This means that there are nine values in the lower ‘half’ and nine in the upper ‘half’.
2 The median of the lower half (Q1) is the middle of the lower nine values, which is the 5th value from the bottom.
   Lower half: 1.9 2.1 2.3 2.5 2.8 3.0 3.0 3.4 3.9
   Q1 = 2.8
3 The median of the upper half (Q3) is the middle of the upper nine values, which is the 5th value from the top.
   Upper half: 3.9 4.0 4.4 4.5 4.8 5.0 5.3 6.3 6.4
   Q3 = 4.8
4 Calculate the IQR.
   IQR = Q3 − Q1 = 4.8 − 2.8 = 2.0 kg
Example 19 Finding the IQR from an ordered stem plot when n is odd
The stem plot shows the life expectancy (in years) for 23 countries. Find the IQR for life expectancies.
Key: 5|2 = 52 years
5 | 2
5 | 5 6
6 | 4
6 | 6 6 7 9
7 | 1 2 2 3 3 4 4 4 4
7 | 5 5 6 6 7 7
Explanation Solution
1 Since there are 23 values, the median is the 12th value from either end, which is 73. Mark the value 73 on the stem plot.
2 Since n is odd, the median value is excluded when finding the quartiles. This leaves 11 values below the median and 11 values above the median.
Q1 = midpoint of the bottom 11 data values
Q3 = midpoint of the top 11 data values.
Write these values down.
   Q1 = 66, Q3 = 75
3 Calculate the IQR.
   IQR = Q3 − Q1 = 75 − 66 = 9 years
To check that these quartiles are correct, write the data values in order, and mark the median and the quartiles. If correct, the median and the two quartiles divide the data set into four equal groups.
52 55 56 64 66 | Q1 = 66 | 67 69 71 72 72 | Q2 (= M) = 73 | 73 74 74 74 74 | Q3 = 75 | 75 76 76 77 77
(5 values)                (5 values)                          (5 values)                (5 values)
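The quartile method used in Examples 18 and 19 can be sketched in Python. The method is: split the ordered data into lower and upper halves, leaving the median out when n is odd, then take the median of each half. The function names are illustrative only. Other software may use different quartile conventions. For example, Python's `statistics.quantiles` interpolates by default, so its quartiles can differ slightly from the ones found by hand here.

```python
def median_of_sorted(xs):
    """Median of a list that is already in ascending order."""
    mid = len(xs) // 2
    if len(xs) % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    When n is odd the median is left out of both halves (Example 19);
    when n is even the data split into two equal halves (Example 18).
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[len(ordered) - half:]
    return median_of_sorted(lower), median_of_sorted(upper)

cat_weights = [1.9, 2.1, 2.3, 2.5, 2.8, 3.0, 3.0, 3.4, 3.9,
               3.9, 4.0, 4.4, 4.5, 4.8, 5.0, 5.3, 6.3, 6.4]
life_expectancy = [52, 55, 56, 64, 66, 66, 67, 69, 71, 72, 72, 73,
                   73, 74, 74, 74, 74, 75, 75, 76, 76, 77, 77]

for data in (cat_weights, life_expectancy):
    q1, q3 = quartiles(data)
    iqr = round(q3 - q1, 2)                        # rounding hides floating-point noise
    data_range = round(max(data) - min(data), 2)   # R = largest - smallest
    print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}, range = {data_range}")
```

For the cat weights this gives Q1 = 2.8, Q3 = 4.8 and IQR = 2.0 kg. For the life expectancies it gives Q1 = 66, Q3 = 75 and IQR = 9 years. Both match the hand calculations.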
[Figure: histogram of internet hours per week, horizontal axis from 10 to 100]
Explanation Solution
1 Since there are 23 values, we can locate which interval contains the median by adding the number of values in each interval, moving from left to right.
M = the 12th value from the bottom (adding from left to right)
   There is 1 value in the interval 10–20 (total 1), 2 values in the interval 40–50 (total 3), 4 values in the interval 50–60 (total 7) and 5 values in the interval 60–70 (total 12). Thus the median is in the interval 60–70.
2 Similarly:
Q1 = the 6th value from the bottom (adding from left to right)
   Q1 is in the interval 50–60
Q3 = the 6th value from the top (adding from right to left)
   Q3 is in the interval 80–90
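The counting-up procedure in this solution is a running total over the histogram bars. This Python sketch uses a hypothetical function name, `interval_containing`. It uses only the bar counts read off in the solution, from 10 up to 70 hours. The remaining bars are not reproduced here, so Q3 is not computed. Counting from the top works the same way with both lists reversed.

```python
def interval_containing(counts, edges, k):
    """Return the class interval holding the k-th value from the bottom.

    counts[i] is the number of values in the interval from edges[i]
    up to (but not including) edges[i + 1].
    """
    running_total = 0
    for i, count in enumerate(counts):
        running_total += count
        if running_total >= k:
            return edges[i], edges[i + 1]
    raise ValueError("k is larger than the number of values supplied")

# Bars from 10 to 70 hours, as read off in the solution
edges = [10, 20, 30, 40, 50, 60, 70]
counts = [1, 0, 0, 2, 4, 5]
print(interval_containing(counts, edges, 12))  # (60, 70): the median
print(interval_containing(counts, edges, 6))   # (50, 60): Q1
```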
Why is the IQR a more useful measure of spread than the range?
The IQR is a measure of spread of a distribution that includes the middle 50% of
observations. Since the upper 25% and lower 25% of observations are discarded, the
interquartile range is generally not affected by the presence of outliers.
The mean
The mean of a set of data is what most people call the ‘average’. The mean of a set of data
is given by:
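$\text{mean} = \dfrac{\text{sum of data values}}{\text{total number of data values}}$
For example, consider the set of data:
2 3 3 4
[Figure: dot plot of the data on a scale from 0 to 6, with the mean marked]
The mean of this set of data is given by:
$\text{mean} = \dfrac{2 + 3 + 3 + 4}{4} = \dfrac{12}{4} = 3$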
From a pictorial point of view, the mean is the balance point of a distribution (see above).
Note that in this case, the mean and the median coincide; the balance point of the
distribution is also the point that splits the distribution in half. That is, there are two
data points to the left of the mean and two to the right. This is a general characteristic of
symmetric distributions.
However, consider the data set:
2 3 3 8
The median remains at M = 3, but:
$\text{mean} = \dfrac{2 + 3 + 3 + 8}{4} = \dfrac{16}{4} = 4$
[Figure: dot plot of the data on a scale from 0 to 8, with the median M marked]
Note that the mean is affected by changing the largest data value but that the median is not.
Some notation
Because the rule for the mean is relatively simple, it is easy to write in words. However, later
you will meet other rules for calculating statistical quantities that are extremely complicated
and hard to write out in words.
To overcome this problem, we introduce a shorthand notation that enables complex
statistical formulas to be written out in a compact form. In this notation, we use:
the Greek capital letter sigma, Σ, as a shorthand way of writing ‘sum of’
a lower case x to represent a data value
a lower case x with a bar, x̄ (pronounced ‘x bar’), to represent the mean of the data values
an n to represent the total number of data values.
The rule for calculating the mean then becomes: $\bar{x} = \dfrac{\sum x}{n}$
Explanation Solution
a n is the number of data values.
   n = 8
b $\sum x$ is the sum of the data values.
   $\sum x = 38 + 36 + 35 + 43 + 46 + 64 + 48 + 25 = 335$
c x̄ is the mean. It is defined by $\bar{x} = \dfrac{\sum x}{n}$.
   $\bar{x} = \dfrac{335}{8} = 41.875 = 41.9$ (to one decimal place)
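Written in code, the rule x̄ = Σx/n is simply `sum(values) / len(values)`. The Python sketch below uses a helper, `mean`, of our own and `statistics.median` from the standard library. It checks the two balance-point data sets and the eight values in the worked example. It also shows that changing the largest value from 4 to 8 moves the mean but not the median.

```python
from statistics import median

def mean(values):
    """x-bar = (sum of x) / n."""
    return sum(values) / len(values)

balanced = [2, 3, 3, 4]
stretched = [2, 3, 3, 8]   # largest value changed from 4 to 8
print(mean(balanced), median(balanced))    # 3.0 3.0
print(mean(stretched), median(stretched))  # 4.0 3.0

values = [38, 36, 35, 43, 46, 64, 48, 25]
print(len(values), sum(values), round(mean(values), 1))  # 8 335 41.9
```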
CAS 3: How to calculate the mean and standard deviation using the
TI-Nspire CAS
The following are the heights (in cm) of a group of women.
176 160 163 157 168 172 173 169
Determine the mean and standard deviation of the women’s heights. Give your answers
correct to two decimal places.
1 Start a new document by pressing ctrl + N.
2 Select Add Lists & Spreadsheet.
Enter the data into a list named height, as shown.
3 Statistical calculations can be done in either the Lists & Spreadsheet application or the Calculator application (used here).
Press ctrl + I and select Add Calculator.
a Press menu > Statistics > Stat Calculations > One-Variable Statistics. Press enter to accept the Num of Lists as 1.
b To complete this screen, use the arrow keys and enter to paste in the list name height.
4 Write down the answers to the required degree of accuracy (i.e. two decimal places).
   The mean height of the women is x̄ = 167.25 cm and the standard deviation is s = 6.67 cm.
Notes: a The sample standard deviation is $s_x$.
b Use the up and down arrows to scroll through the results screen to obtain additional statistics.
CAS 3: How to calculate the mean and standard deviation using the ClassPad
The following are the heights (in cm) of a group of women.
176 160 163 157 168 172 173 169
Determine the mean and standard deviation of the women’s heights correct to two
decimal places.
1 Open the Statistics application
and enter the data into the
column labelled height.
2 To calculate the mean and standard deviation, select Calc from the menu and then One-Variable from the drop-down menu to open the Set Calculation dialog box.
3 Complete the dialog box as shown.
XList: select main\height.
Freq: leave as 1.
4 Tap OK to confirm your selections
and calculate the required statistics,
as shown.
5 Write down the answers to two decimal places.
   The mean height of the women is x̄ = 167.25 cm.
   The standard deviation is $s_x$ = 6.67 cm.
Notes: a The value of the standard deviation is given by $s_x$.
b Use the side-bar arrows to scroll through the results screen to obtain additional statistics (e.g. the median, Q3 and the maximum value) if required.
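Without a CAS calculator, Python's standard `statistics` module gives the same two values. `statistics.stdev` computes the sample standard deviation, dividing by n − 1. This is the $s_x$ value reported by both calculators. `statistics.pstdev` divides by n instead and gives about 6.24 cm here, so it is not the value wanted.

```python
import statistics

heights = [176, 160, 163, 157, 168, 172, 173, 169]   # cm

x_bar = statistics.mean(heights)
s_x = statistics.stdev(heights)    # sample standard deviation: divides by n - 1

print(f"mean = {x_bar:.2f} cm, standard deviation = {s_x:.2f} cm")
# mean = 167.25 cm, standard deviation = 6.67 cm
```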
Exercise 1F
2 The prices of nine second-hand mountain bikes advertised for sale were as follows.
$650 $3500 $750 $500 $1790 $1200 $2950 $430 $850
What is the median price of these bikes?
[Figure: dot plot, Number of times visited, scale from 0 to 6]
7 The stem plot displays the test scores for 20 students.
Key: 1|0 = 10
1 | 0 2
1 | 5 5 6 9
2 | 3 3 4
2 | 5 7 9 9 9
3 | 0 1 2 4
3 | 5 9
a Describe the shape of the distribution.
b Determine the median, M.
c Determine the quartiles, Q1 and Q3.
d Calculate the IQR and the range, R.
9 The stem plot displays the university participation rates (%) in 17 countries.
Key: 0|1 = 1
0 | 1 3 8 9
1 | 2 3 7
2 | 0 1 2 5 6 6
3 | 0 6 7
4 |
5 | 5
a Determine the median, M.
b Determine the quartiles Q1 and Q3.
c Calculate the IQR and the range, R.
Example 20 10 The histogram shows the time taken to complete a complex task by a group of students.
Find possible values for the median and quartiles of this distribution.
[Figure: histogram, time taken to complete task (minutes), scale from 55 to 115]
11 A group of 195 people were asked to record (to one decimal place) the average number
of hours they spent on email each week over a 10-week period. The data are shown in
the following histogram:
[Figure: histogram, hours spent per week on email, scale from 0 to 35]
13 Calculate the mean and locate the median and modal value(s) of the following scores.
a 1 3 2 1 2 6 4 5 4 3 2
b 3 12 5 4 3 2 6 5 4 5 5 6
14 The temperature of a hospital patient (in degrees Celsius) taken at 6-hourly intervals
over 2 days was as follows.
35.6 36.5 37.2 35.5 36.0 36.5 35.5 36.0
a Calculate the patient’s mean and median temperature over the 2-day period.
b What do these values tell you about the distribution of the patient’s temperature?
15 The amounts (in dollars) spent by seven customers at a corner store were:
0.90 0.80 2.15 16.55 1.70 0.80 2.65
a Calculate the mean and median amount spent by the customers.
b Does the mean or the median give the better indication of the typical amount spent by
customers? Explain your answer.
16 For which of the following distributions might you question using the mean as a
measure of the centre of the distribution? Justify your selection.
[Figure: three histograms: a Age distribution in a country; b Urban car accident rates; c Blood cholesterol levels]
17 The stem plot shows the distribution of weights (in kg) of 22 footballers.
Weight (kg)
6 | 9
7 | 0 2
7 | 6 6 7 8
8 | 0 0 1 2 3 3 4
8 | 5 5 5 6
9 | 1 2
9 | 8
10 | 3
a Name the shape of the distribution. Which measure of centre, the mean or the median, do you think would best indicate the typical weight of these footballers?
b Determine both the mean and median to check your prediction.
19 Without using the statistical capabilities of your calculator, write down the mean and
standard deviation of the following six data values: 7.1 7.1 7.1 7.1 7.1 7.1
20 For which of the following variables does it not make sense to calculate a mean or
standard deviation?
a Speed (in km/h) b Sex c Age (in years)
d Post code e Neck circumference (in cm)
f Weight (underweight, normal, overweight)
22 Calculate the mean and standard deviation for the variables in the table.
Give answers to the nearest whole number for cars and TVs, and one decimal place for
alcohol consumption.
23 The median, M, of the number of take away coffees bought is equal to:
A 10.5 B 18.5 C 20.5 D 21 E 26
24 The interquartile range, IQR, of the number of take away coffees bought is equal to:
A 10.5 B 15.0 C 15.5 D 18.5 E 20.5
Five-number summary
A listing of the median, M, the quartiles Q1 and Q3 , and the smallest and largest data
values of a distribution, written in the order
minimum, Q1 , M, Q3 , maximum
is known as a five-number summary.
The five-number summary is the starting point for constructing one of the most useful
graphical tools in data analysis, the boxplot.
The boxplot
The boxplot (or box-and-whisker plot) is a graphical display of a five-number summary. The
essential features of a boxplot are summarised below.
The boxplot
A boxplot is a graphical display of a five-number summary.
[Figure: boxplot labelled minimum, Q1, M, Q3 and maximum, with a whisker at each end and 25% of the data values in each of the four sections]
In a boxplot:
a box extends from Q1 to Q3 , locating the middle 50% of the data values
the median is shown by a vertical line drawn within the box
lines (called whiskers) are extended out from the lower and upper ends of the box to
the smallest and largest data values of the data set respectively
25% of the data values are from the minimum to Q1
25% of the data values are from Q1 to the median M
25% of the data values are from the median M to Q3
25% of the data values are from Q3 to the maximum
The stem plot shows the distribution of life expectancies (in years) in 23 countries.
Key: 5|2 = 52 years
5 | 2
5 | 5 6
6 | 4
6 | 6 6 7 9
7 | 1 2 2 3 3 4 4 4 4
7 | 5 5 6 6 7 7
The five-number summary for these data is:
minimum 52
first quartile (Q1) 66
median (M) 73
third quartile (Q3) 75
maximum 77
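A five-number summary can be produced in a few lines of Python. This sketch uses a function name of our own. It finds the quartiles by the same halving method used earlier, excluding the median when n is odd, and reproduces the life expectancy summary above.

```python
def five_number_summary(values):
    """Return (minimum, Q1, M, Q3, maximum).

    Q1 and Q3 are the medians of the lower and upper halves of the
    ordered data, with the median left out of both halves when n is odd.
    """
    ordered = sorted(values)
    n = len(ordered)

    def middle(xs):
        k = len(xs) // 2
        return xs[k] if len(xs) % 2 == 1 else (xs[k - 1] + xs[k]) / 2

    half = n // 2
    return (ordered[0], middle(ordered[:half]), middle(ordered),
            middle(ordered[n - half:]), ordered[-1])

life_expectancy = [52, 55, 56, 64, 66, 66, 67, 69, 71, 72, 72, 73,
                   73, 74, 74, 74, 74, 75, 75, 76, 76, 77, 77]
print(five_number_summary(life_expectancy))   # (52, 66, 73, 75, 77)
```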
By modifying the boxplots, we can decide which explanation is most likely, but firstly we
need a more exact definition of an outlier.
Defining outliers
An outlier in a distribution is any data point that lies more than 1.5 interquartile ranges
below the first quartile or more than 1.5 interquartile ranges above the third quartile.
To be more informative, the boxplot can be modified so that the outliers are plotted
individually in the boxplot with a dot or cross, and the whiskers now extend only to the largest
and smallest data values that are not outside these limits.
An example of a boxplot with outliers is shown opposite.
[Figure: boxplot with outliers, scale from 0 to 60]
Three of the data values, 30, 40 and 60, are possible outliers.
To display outliers on a boxplot, we must first determine the location of what we call the
upper and lower fences. These are imaginary lines drawn one and a half interquartile ranges
(or box widths) above and below the box ends, as shown in the following diagram. Data
values outside these fences are then classified as possible outliers and plotted separately.
[Figure: boxplot showing the lower and upper fences, with outliers plotted beyond them]
When drawing a boxplot, any observation identified as an outlier is shown by a dot. The
whiskers end at the smallest and largest values that are not classified as outliers.
While we have used a five-number summary as the starting point for our introduction to
boxplots, in practice the starting point for constructing a boxplot is raw data. Constructing a
boxplot from raw data is a task for your CAS calculator.
Interpreting boxplots
Constructing a boxplot is not an end in itself. The prime reason to construct boxplots is to
help us answer statistical questions. To do this, you need to know how to read values from a
boxplot and use them to determine statistics such as the median, the interquartile range and
the range. We also use boxplots to identify possible outliers.
Explanation Solution
a The median (the vertical line in the box)
   M = 36
b Quartiles Q1 and Q3 (end points of box)
   Q1 = 30, Q3 = 44
c Interquartile range (IQR = Q3 − Q1)
   IQR = Q3 − Q1 = 44 − 30 = 14
d Minimum and maximum values (extremes)
   Min = 4, Max = 92
e The values of the possible outliers (dots)
   4, 70, 84 and 92 are possible outliers
f Upper fence (given by Q3 + 1.5 × IQR)
   Upper fence = Q3 + 1.5 × IQR = 44 + 1.5 × 14 = 65
   Any value above 65 is an outlier.
g Lower fence (given by Q1 − 1.5 × IQR)
   Lower fence = Q1 − 1.5 × IQR = 30 − 1.5 × 14 = 9
   Any value below 9 is an outlier.
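The fence calculation in parts f and g is easy to automate. The following Python sketch uses hypothetical function names. It computes the fences from Q1 and Q3 and flags values lying strictly outside them, using the numbers from this example. A value exactly on a fence is not an outlier, because the definition requires it to lie more than 1.5 IQRs beyond the box.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence): 1.5 IQRs beyond each end of the box."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def possible_outliers(values, q1, q3):
    """Values lying strictly below the lower fence or above the upper fence."""
    lower, upper = fences(q1, q3)
    return [x for x in values if x < lower or x > upper]

print(fences(30, 44))                                  # (9.0, 65.0)
print(possible_outliers([4, 36, 70, 84, 92], 30, 44))  # [4, 70, 84, 92]
```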
Once we know the location of the quartiles, we can use the boxplot to estimate percentages.
Explanation Solution
a 54 is the first quartile (Q1 ); 25% of values are less than Q1 . a 25%
b 55 is the median or second quartile (Q2 ); 50% of values are less b 50%
than Q2 .
c 59 is the third quartile (Q3 ); 75% of values are less than Q3 . c 75%
d 75% of values are less than 59 and 25% are greater than 59. d 25%
e As 75% of values are less than 59 and 25% are less than 54, 50% of e 50%
values are between 54 and 59.
f As 100% of values are less than 86 and 25% of values are less than f 75%
54, 75% of values are between 54 and 86.
A symmetric distribution
A symmetric distribution tends to be centred on its
median and have values evenly spread around the
median. As a result, its boxplot will also be symmetric,
its median is close to the middle of the box and its
whiskers are approximately equal in length.
Describe the distribution represented by the boxplot in terms of shape, centre and spread.
Give appropriate values.
[Figure: boxplot on a scale from 0 to 50]
The distribution is positively skewed with no outliers. The distribution is centred at 10,
the median value. The spread of the distribution, as measured by the IQR, is 16 and, as
measured by the range, 45.
Describe the distribution represented by the boxplot in terms of shape and outliers,
centre and spread. Give appropriate values.
[Figure: boxplot on a scale from 0 to 50]
The distribution is symmetric but with outliers. The distribution is centred at 41, the
median value. The spread of the distribution, as measured by the IQR, is 5.5 and, as
measured by the range, 37. There are four outliers: 10, 15, 20 and 25.
The boxplot shows the gestation period (completed weeks) for a sample of 1000 babies
born in Australia one year. Describe the distribution of gestation period in terms of shape,
centre, spread and outliers.
[Figure: boxplot of Gestation period (weeks), scale from 25 to 43]
The distribution of gestational period is negatively skewed with several outliers. The
distribution is centred at 39 weeks, the median value. The range of the distribution is 17
weeks, but the interquartile range is only 2 weeks. Any gestational period of less than 35
weeks is considered unusual, with outliers at 25, 26, 27, 28, 29, 30, 31, 32, 33 and
34 weeks.
Exercise 1G
2 Construct a five-number summary for the stem plot opposite.
Key: 13|6 = 136
13 | 6 7
14 | 3 6 8 8 9
15 | 2 5 8 8 8
16 | 4 5 5 6 7 9
17 | 8 8 9
18 | 2 9
3 3 7 8 9 12 13 15 17 20 21
22 25 26 26 26 27 30 36 37 55
a Show that the five-number summary for these data is:
Min = 3, Q1 = 10.5, M = 21, Q3 = 26.5, Max = 55
b Show that the upper fence is equal to 50.5.
c Explain why this boxplot will show at least one outlier.
d Construct a boxplot showing the outlier.
b When the data were originally entered, a value of 31 was incorrectly entered as 35.
Would the 31 be shown as an outlier when the error is corrected? Explain your answer.
[Figure: Boxplots 1 to 4 and Histograms A to D]
Example 26 15 Describe the distributions represented by the following boxplots in terms of shape,
centre, spread and outliers (if any). Give appropriate values.
[Figure: four boxplots, each on a scale from 0 to 45 or 0 to 50]
17 The percentage of students who scored more than 24 marks is closest to:
A 15% B 25% C 50% D 75% E 100%
mean 398
minimum 20
first quartile (Q1 ) 120
median (M) 273
third quartile (Q3 ) 650
maximum 1650
20 The largest five values in the data set are 1100 g, 1250 g, 1550 g, 1600 g and
1650 g. Which of these are outliers?
We know that the interquartile range is the spread of the middle 50% of the data set. Can we
find some similar way in which to interpret the standard deviation?
It turns out we can, but we need to restrict ourselves to symmetric distributions that have
an approximate bell shape. Again, while this may sound very restrictive, many of the data
distributions we work with in statistics (but not all) can be well approximated by this type of
distribution. In fact, it is so common that it is called the normal distribution.
To give you an understanding of what this rule means in practice, it is helpful to view this
rule graphically.
68-95-99.7% Rule
Since the normal distribution is symmetrical, and 100% of the observations are within the
normal curve, we can use the 68–95–99.7% rule to allocate percentages to the tails of the
distribution in each instance.
Since around 68% of the data values will lie within one standard deviation (SD) of the mean,
we can also say that around 16% of values lie in each of the tails.
[Figure: normal curve with 68% in the middle and 16% in each tail]
Since around 95% of the data values will lie within two standard deviations of the mean,
we can also say that around 2.5% of values lie in each of the tails.
[Figure: normal curve with 95% in the middle and 2.5% in each tail]
Putting together all of this information, we can then allocate percentages of the data which
fall into each section of the normal curve, as shown in the following diagram:
The distribution of delivery times for pizzas made by House of Pizza is approximately
normal, with a mean of 25 minutes and a standard deviation of 5 minutes.
a What percentage of pizzas have delivery times of between 15 and 35 minutes?
b What percentage of pizzas have delivery times of greater than 30 minutes?
c In 1 month, House of Pizza delivers 2000 pizzas. Approximately how many of these pizzas
are delivered in less than 10 minutes?
Explanation Solution
a 1 Sketch, scale and label a normal distribution curve with a mean of 25 and a standard deviation of 5.
[Figure: normal curve with mean = 25 and SD = 5, Delivery time axis from 10 to 40]
Standard scores
The 68–95–99.7% rule makes the standard deviation a natural measuring stick for normally
distributed data.
For example, a person who obtained a score of 112 on an IQ test with a mean of 100
and a standard deviation of 15 has an IQ score less than one standard deviation from the
mean. Her score is typical of the group as a whole, as it lies well within the middle 68% of
scores. In contrast, a person who scores 133 stands out; her score is more than two standard
deviations from the mean and this puts her in the top 2.5%.
Because of the additional insight provided by relating the standard deviations to percentages,
it is common to transform data into a new set of units that show the number of standard
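deviations a data value lies from the mean of the distribution. This is called standardising,
and these transformed data values are called standardised or z-scores. A data value x from a
distribution with mean x̄ and standard deviation s has the standardised score $z = \dfrac{x - \bar{x}}{s}$, so: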
a positive z-score indicates that the actual score it represents lies above the mean
a z-score of zero indicates that the actual score is equal to the mean
a negative z-score indicates that the actual score lies below the mean.
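The standardising formula z = (x − x̄)/s and its rearrangement x = x̄ + z × s can be written as two short Python functions. The rearrangement is used later to recover an actual score from a z-score. The function names are our own. The values are the heights and class-test figures from the examples in this section.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def actual_score(z, mean, sd):
    """Undo standardising: x = mean + z * sd."""
    return mean + z * sd

# Heights: mean 160 cm, standard deviation 8 cm
print([z_score(h, 160, 8) for h in (172, 150, 160)])   # [1.5, -1.25, 0.0]

# Class test: mean 34, standard deviation 4, Joe's z = -1.5
print(actual_score(-1.5, 34, 4))                       # 28.0
```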
The heights of a group of young women have a mean of x̄ = 160 cm and a standard
deviation of s = 8 cm. Determine the standard or z-scores of a woman who is:
a 172 cm tall b 150 cm tall c 160 cm tall.
Explanation Solution
1 Write down the data value (x), the mean (x̄) and the standard deviation (s).
2 Substitute the values into the formula $z = \dfrac{x - \bar{x}}{s}$ and evaluate.
a x = 172, x̄ = 160, s = 8
   $z = \dfrac{x - \bar{x}}{s} = \dfrac{172 - 160}{8} = \dfrac{12}{8} = 1.5$
b x = 150, x̄ = 160, s = 8
   $z = \dfrac{x - \bar{x}}{s} = \dfrac{150 - 160}{8} = -\dfrac{10}{8} = -1.25$
c x = 160, x̄ = 160, s = 8
   $z = \dfrac{x - \bar{x}}{s} = \dfrac{160 - 160}{8} = \dfrac{0}{8} = 0$
Another student studying the same two subjects obtained a mark of 55 for both
Psychology and Statistics. Does this mean that she performed equally well in both
subjects? Use standardised marks to help you arrive at your conclusion.
Explanation Solution
1 Write down her mark (x), the mean (x̄) and the standard deviation (s) for each subject and compute a standardised score for both subjects.
   Psychology: x = 55, x̄ = 65, s = 10
   $z = \dfrac{x - \bar{x}}{s} = \dfrac{55 - 65}{10} = \dfrac{-10}{10} = -1$
   Statistics: x = 55, x̄ = 60, s = 5
   $z = \dfrac{x - \bar{x}}{s} = \dfrac{55 - 60}{5} = \dfrac{-5}{5} = -1$
2 Write down your conclusion.
   Yes, her standardised score, z = −1, was the same for both subjects. In both subjects she finished in the bottom 16%.
Suppose the weight of a certain species of bird is normally distributed with a mean of 42
grams and a standard deviation of 3 grams.
a If a bird selected at random from this population has a standardised weight of z = −1,
what percentage of birds in this population weigh more than this bird?
b Approximately what percentage of birds would weigh between 39 and 48 grams?
Explanation Solution
a Locate z = −1 on the graph above.
   We can see that the percentage of the distribution below z = −1 is 16%, so the percentage above z = −1 is 84%.
b 1 Substitute the values of x into the formula $z = \dfrac{x - \bar{x}}{s}$ and evaluate.
   x̄ = 42, s = 3
   x = 39: $z = \dfrac{39 - 42}{3} = -1$
   x = 48: $z = \dfrac{48 - 42}{3} = 2$
2 Locate z = −1 and z = 2 on the graph above, and determine the percentage of the distribution between these values.
   The percentage of the distribution between z = −1 and z = 2 is 81.5%.
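The 68–95–99.7% rule gives percentages only at whole-number z-scores, so it can be stored as a small lookup table. The Python sketch below uses hypothetical names throughout. It encodes the percentage below each z-score from −3 to 3 and reproduces the bird-weight answers above. Applied to the House of Pizza example, it gives 95% for part a, 16% for part b and about 3 pizzas for part c. It deliberately rejects z-scores that are not whole numbers, since the rule says nothing about them.

```python
# Cumulative percentage of a normal distribution lying below each
# whole-number z-score, built from the 68-95-99.7% rule
# (e.g. 50 - 34 = 16 below z = -1 and 50 - 47.5 = 2.5 below z = -2).
PERCENT_BELOW = {-3: 0.15, -2: 2.5, -1: 16.0, 0: 50.0,
                 1: 84.0, 2: 97.5, 3: 99.85}

def percent_below(z):
    if z not in PERCENT_BELOW:
        raise ValueError("the rule only covers z = -3, -2, ..., 3")
    return PERCENT_BELOW[z]

def percent_above(z):
    return 100 - percent_below(z)

def percent_between(z_low, z_high):
    return percent_below(z_high) - percent_below(z_low)

# Birds: mean 42 g, standard deviation 3 g
z_39 = (39 - 42) / 3        # -1.0
z_48 = (48 - 42) / 3        # 2.0
print(percent_above(z_39))          # 84.0
print(percent_between(z_39, z_48))  # 81.5

# House of Pizza: mean 25 min, standard deviation 5 min, 2000 deliveries
print(percent_between((15 - 25) / 5, (35 - 25) / 5))     # 95.0
print(percent_above((30 - 25) / 5))                      # 16.0
print(round(2000 * percent_below((10 - 25) / 5) / 100))  # 3
```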
A class test (out of 50) has a mean mark of x̄ = 34 and a standard deviation of s = 4.
Joe’s standardised test mark was z = −1.5. What was Joe’s actual mark?
Explanation Solution
1 Write down the mean (x̄), the standard deviation (s) and Joe’s standardised score (z).
   x̄ = 34, s = 4, z = −1.5
2 Write down the rule for calculating the actual score x = x̄ + z × s
and substitute these values into the rule. = 34 + (−1.5) × 4 = 28
Joe’s actual mark was 28.
If we know something about the percentages associated with a normal distribution, we can
use this information to find the values of the mean, or standard deviation, or both.
In the following example we are given the value of the mean, and one percentage associated
with the distribution. From this we can determine the value of the standard deviation.
Example 33 Finding the value of the standard deviation given the mean and one percentage
Suppose the heights of red flowering gum trees have a mean of 10.2 metres, and 2.5% of
these trees grow to more than 11.4 metres tall. Assuming that the heights of these trees
are approximately normally distributed, what is the standard deviation of the height of the
red flowering gum trees?
Explanation Solution
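1 In a normal distribution, 95% of values lie within two standard deviations of the mean, so 2.5% lie more than two standard deviations above it. Since 2.5% of the trees are taller than 11.4 metres, this height corresponds to a z-score of 2.
x̄ = 10.2, z = 2
2 Write down the rule for calculating the actual score and substitute these values into the rule.
x = x̄ + z × s
11.4 = 10.2 + 2 × s
3 Solve for s.
2 × s = 11.4 − 10.2 = 1.2, so s = 0.6
The standard deviation of the height of the red flowering gum trees is 0.6 metres.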
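Finding a missing standard deviation amounts to solving x = x̄ + z × s for s. The sketch below does this, and also shows the simultaneous-equation version needed when two values and their z-scores are known (as in question 14 of Exercise 1H). The function names are illustrative, and the two-value call uses hypothetical numbers, not data from the text.

```python
def sd_from_one_value(x: float, mean: float, z: float) -> float:
    """Solve x = mean + z * s for s, given the mean and one (x, z) pair."""
    if z == 0:
        raise ValueError("z must be non-zero to solve for s")
    return (x - mean) / z

def mean_and_sd_from_two_values(x1: float, z1: float, x2: float, z2: float):
    """Solve x1 = mean + z1 * s and x2 = mean + z2 * s simultaneously."""
    if z1 == z2:
        raise ValueError("the two z-scores must differ")
    s = (x2 - x1) / (z2 - z1)
    return x1 - z1 * s, s

# Red flowering gum trees: 2.5% are taller than 11.4 m, so z = 2 at 11.4 m
print(round(sd_from_one_value(11.4, 10.2, 2), 2))  # 0.6

# Hypothetical: 16% of scores exceed 70 (z = 1) and 2.5% exceed 80 (z = 2)
print(mean_and_sd_from_two_values(70, 1, 80, 2))  # (60.0, 10.0)
```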
Example 34 Finding the value of the standard deviation given the mean and two
Exercise 1H
3 The distribution of times taken for walkers to complete a circuit in a park is normal,
with a mean time of 14 minutes and a standard deviation of 3 minutes.
a What percentage of walkers complete the circuit in:
i more than 11 minutes? ii less than 14 minutes?
iii between 14 and 20 minutes?
b In a week, 1000 walkers complete the circuit. Approximately how many will take
less than 8 minutes?
10 To be considered for a special training program, applicants are required to sit for an
aptitude test. Suppose that 2000 people sit for the test, and their scores on the aptitude
test are approximately normally distributed with a mean of 45 and a standard deviation
of 2. People who score more than 49 are selected for the special training program.
People who are not chosen for the training program, but score more than 47, are invited
to resit the aptitude test at a later date.
a What percentage of people who sat for the test are eligible for the training program?
b Approximately how many people would be invited to resit the aptitude test at a later date?
13 The weights of bananas from a certain grower are approximately normally distributed.
If the standard deviation of the weight of these bananas is 5 g, and 16% of the bananas
weigh less than 96 g, what is the mean weight of the bananas?
Example 34 14 The birth weights of babies are known to be approximately normally distributed. If
16% of babies weigh more than 4.0 kg, and 0.15% of babies weigh more than 5.0 kg,
estimate the mean and standard deviation of this distribution. Give your answers to one
decimal place.
16 The body weights of a large group of 14-year-old girls have a mean of 54 kg and a
standard deviation of 10.0 kg.
a Kate weighs 56 kg. Determine her standardised weight.
b Lani has a standardised weight of −0.75. Determine her actual weight.
c Find:
i the percentage of these girls who weigh more than 74 kg
ii the percentage of these girls who weigh between 54 and 64 kg
iii the percentage of these girls who have standardised weights less than −1
iv the percentage of these girls who have standardised weights greater than −2.
17 Suppose that IQ scores are normally distributed with a mean of 100 and a standard deviation of 15.
a What percentage of people have an IQ:
i greater than 115?
ii less than 70?
b To be allowed to join an elite club, a potential member must have an IQ in the
top 2.5% of the population. What IQ score would be necessary to join this club?
c One student has a standardised score of 2.2. What is their actual score?
18 The heights of women are normally distributed with a mean of 160 cm and a standard deviation of 8 cm.
a What percentage of women would be:
i taller than 152 cm?
ii shorter than 176 cm?
b What height would put a woman among the tallest 0.15% of the population?
c What height would put a woman among the shortest 2.5% of the population?
d One woman has a standardised height of −1.2. What is her actual height? Give your answer to one decimal place.
19 The percentage of students in the state who scored more than 71 marks is closest to:
A 0.15% B 2.5% C 5% D 15% E 0.3%
20 The top 2.5% of the state are to be awarded a distinction. What would be the lowest
mark required to gain a distinction in this exam?
A 36 B 43 C 57 D 64 E 71
23 The table below shows Miller’s swimming times (in seconds) for 50 metres in each of butterfly, backstroke, breaststroke and freestyle. Also shown are the mean and standard deviation of the times recorded for all of the swimmers in his swimming club. In how many of these swimming styles is he in the fastest 2.5% of swimmers at his swimming club?
Style Miller’s time Mean Standard deviation
Butterfly 38.8 46.2 3.2
Breaststroke 51.4 55.1 4.1
Backstroke 53.5 48.3 2.5
Freestyle 33.3 38.2 2.3
A 0 B 1 C 2 D 3 E 4
Univariate data Univariate data are generated when each observation involves recording information about a single variable.
Types of data Data can be classified as numerical or categorical.
Numerical variables Numerical variables have data values which are quantities. Numerical variables come in two types: discrete and continuous. Discrete variables are those which may take on only a countable number of distinct values, such as 0, 1, 2, 3, 4, . . ., and are often associated with counting. Continuous variables are ones which take an infinite number of possible values, and are often associated with measuring rather than counting.
Frequency table A frequency table lists the values a variable takes, along with how
often (frequently) each value occurs. Frequency can be recorded as:
the number of times a value occurs – e.g. the number of Year 12
students in the data set is 32.
the percentage of times a value occurs – e.g. the percentage of Year 12 students in the data set is 45.5%.
Bar chart Bar charts are used to display the frequency distribution of categorical
data. Each value of the variable is represented by a bar showing the
frequency, or the percentage frequency.
Segmented bar chart A segmented bar chart is like a bar chart, but the bars are stacked one on top of another to give a single bar with several segments.
Mode, modal category The mode (or modal interval) is the value of a variable (or the interval) that occurs most frequently.
Describing the distribution of a numerical variable The distribution of a numerical variable can be described in terms of:
shape: symmetric or skewed (positive or negative)
outliers: values that seem unusually small or large
centre: the median or mean
spread: the IQR, range or standard deviation.
Dot plot A dot plot consists of a number line with each data point marked by a
dot. A dot plot is particularly suitable for displaying a small data set of
discrete numerical data.
Stem plot The stem plot is particularly suitable for displaying small to medium-sized data sets of numerical data. It shows each data value separated into two parts: the leading digits, which make up the stem of the number, and its last digit, which is called the leaf.
Log scales Log scales can be used to transform a skewed histogram to symmetry.
Summary statistics Summary statistics are numerical values for special features of a data distribution such as centre and spread.
Mean The mean (x̄) is a summary statistic that can be used to locate the centre of a symmetric distribution. The value of the mean is determined from the formula: x̄ = Σx / n, where Σx is the sum of the data values and n is the number of data values.
Range The range (R) is the difference between the smallest and the largest data
values. It is the simplest measure of spread.
Standard deviation The standard deviation (s) is a summary statistic that measures the spread of the data values around the mean. The value of the standard deviation is determined from the formula: s = √( Σ(x − x̄)² / (n − 1) )
Median The median (M) is a summary statistic that can be used to locate the
centre of a distribution. It is the midpoint of a distribution, so that
50% of the data values are less than this value and 50% are more. It is
sometimes denoted as Q2.
Quartiles Quartiles are summary statistics that divide an ordered data set into
four equal groups.
Interquartile range The interquartile range (IQR) gives the spread of the middle 50% of data values in an ordered data set. It is found by evaluating IQR = Q3 − Q1.
Five-number summary The median, the first quartile, the third quartile, along with the minimum and the maximum values in a data set, are known as a five-number summary.
Outliers Outliers are data values that appear to stand out from the rest of the data set. They are values that are less than the lower fence or more than the upper fence, where lower fence = Q1 − 1.5 × IQR and upper fence = Q3 + 1.5 × IQR.
The normal distribution Data distributions that have a bell shape can be modelled by a normal distribution.
Standardised scores The value of the standardised score gives the distance and direction of a data value from the mean in terms of standard deviations.
The rule for calculating a standardised score is:
standardised score = (actual score − mean) / standard deviation
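Several of the summary statistics defined above can be computed together. This is a minimal sketch under stated assumptions: Q1 and Q3 are taken as the medians of the lower and upper halves of the ordered data (leaving out the middle value when n is odd), which we assume matches the convention used in this book; the fences use Q1 − 1.5 × IQR and Q3 + 1.5 × IQR; the data set is hypothetical; and the function names are illustrative.

```python
import math
from statistics import median

def mean(data):
    """x-bar = (sum of the data values) / n."""
    return sum(data) / len(data)

def sample_sd(data):
    """s = sqrt(sum((x - x-bar)^2) / (n - 1))."""
    x_bar = mean(data)
    return math.sqrt(sum((x - x_bar) ** 2 for x in data) / (len(data) - 1))

def five_number_summary(data):
    """Return (Min, Q1, M, Q3, Max) using the halves method for the quartiles."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    upper = values[half + 1:] if n % 2 else values[half:]  # skip middle value if n is odd
    return values[0], median(lower), median(values), median(upper), values[-1]

def fences(q1, q3):
    """Lower and upper fences: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [2, 4, 4, 5, 7, 8, 9, 30]  # hypothetical data set
mn, q1, m, q3, mx = five_number_summary(data)
lower, upper = fences(q1, q3)
outliers = [x for x in data if x < lower or x > upper]
print(mean(data), (mn, q1, m, q3, mx), outliers)  # 8.625 (2, 4.0, 6.0, 8.5, 30) [30]
```

A CAS calculator may report slightly different quartiles if it uses another quartile convention, so treat this as an illustration rather than a replacement for the calculator.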
Skills checklist
Download this checklist from the Interactive Textbook, then print it and fill it out to check your skills.
1F 21 I can calculate the mean and standard deviation using the CAS calculator.
1G 23 I can construct a boxplot with outliers from data using the CAS calculator.
1G 27 I can use boxplots to describe a distribution with outliers.
1H 34 I can solve for the values of the mean and standard deviation.
Multiple-choice questions
Data pertaining to the following five variables was collected about secondhand cars:
number of seats
age (1 = less than 2 years, 2 = from 30,000-60,000km, 3 = more than 80,000km)
mileage (in kilometres)
The percentage segmented bar chart shows the distribution of hair colour for 200 students.
[Figure: percentage segmented bar chart, 'Hair colour'; vertical axis 0% to 100%; key includes Brown, Black, Red and Other]
3 The number of students with brown hair is closest to:
A 4 B 34 C 57 D 72 E 114
4 The most common hair colour is:
A black B blonde C brown D red E other
[Figure: histogram of test scores; horizontal axis 'Test score' from 6 to 28]
5 The number of students in the class who obtained a test score less than 14 is:
A 4 B 10 C 14 D 16 E 28
9 Find the number with log equal to 2.314; give the answer to the nearest whole number.
A 2 B 21 C 206 D 231 E 20606
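The following information relates to Questions 10 and 11.
[Figure: histogram of per capita CO2 emissions (tonnes, log scale) for 192 countries in 2011; horizontal axis from −1.0 to 2.0, percentage on the vertical axis]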
10 Australia’s per capita CO2 emissions in 2011 were 16.8 tonnes. In which column of the
histogram would Australia be located?
A −0.5 to <0.0 B 0.0 to <0.5 C 0.5 to <1.0 D 1.0 to <1.5 E 1.5 to < 2.0
11 The percentage of countries with per capita CO2 emissions of under 10 tonnes is
closest to:
A 14% B 17% C 31% D 69% E 88%
The stem plot opposite displays the distribution of the marks obtained by 25 students.
Key: 1|5 means 15 marks
0 | 2
1 | 5 9 9 9
2 | 0 4 4 5 5 8 8 8
3 | 0 3 5 5 6 8 9
4 | 1 2 3 5
5 |
6 | 0
15 The shape of the data distribution displayed by this stem plot is best described as:
A approximately symmetric B approximately symmetric with an outlier
C negatively skewed with an outlier D negatively skewed
E positively skewed with an outlier
[Figure: four boxplots, each drawn on a scale from 50 to 80; the questions below refer to boxplot D]
22 For the data represented by boxplot D, the percentage of data values greater than 65 is:
A 2.5% B 25% C 50% D 75% E 100%
23 To be an outlier in boxplot D, a score must be:
A either less than 52.5 or greater than 72.5 B greater than 72.5
C either less than 55 or greater than 70 D greater than 70
E less than 55
24 The mean (x̄) and standard deviation (s) for the following set of test marks
1 1 10 15 16 25 8 10 12
are closest to:
A 7.1, 10.9 B 7.5, 10.9 C 10.9, 7.1 D 10.9, 7.5 E 10.8, 7.5
25 It would not be appropriate to determine the mean and standard deviation of a group of
A ages B phone numbers C heights
D weights E family sizes
26 The median is a more appropriate measure of the centre of a distribution than the mean
when the distribution is:
A symmetric B symmetric with no outliers C bell-shaped
D clearly skewed E normal
27 The mean length of 10 garden stakes is 180.5 cm, and the standard deviation is 2.9 cm.
If the length of each garden stake is reduced by exactly 5 cm, the mean and standard
deviation of the 10 stakes will be:
A 175.5 cm and 2.4 cm B 180.5 cm and 2.4 cm C 175.5 cm and 2.9 cm
D 175.5 cm and 3.4 cm E 185.5 cm and 2.9 cm
28 A student’s mark on a test is 50. The mean mark for their class is 55 and the standard
deviation is 2.5. Their standard score is:
A −2.5 B −2.0 C 0 D 2 E 2.5
29 The percentage of trips each week that take 78 minutes or more is:
A 16% B 34% C 50% D 68% E 84%
30 The number of trips each week that take between 70 and 82 minutes is approximately:
A 4 B 32 C 68 D 127 E 163
32 A standardised time for a trip is z = −0.25. The actual time (in minutes) is:
A 77 B 77.25 C 77.75 D 78.25 E 79
33 The time of a bus trip has a standardised time of z = 2.1. This time is:
A very much below average B just below average C around average
D just above average E very much above average
34 The table shows the time taken to run one kilometre (in minutes) by three runners. To be invited to join the athletics team, the standardised score for their time needs to be no more than 0.5.
Runner Time (mins)
Albie 7.5
Lincoln 4.9
Wendy 8.0
If the mean running time for one kilometre is 7.0 minutes and the standard deviation is 1.2 minutes, who will be invited to join the athletics team?
A Only Wendy B Lincoln and Albie C Only Lincoln
D Albie and Wendy E All three runners
c Use the percentages to construct a percentage segmented bar chart for the data.
d Write a short report describing the distribution of responses.
histogram to answer the following questions.
[Figure: histogram; horizontal axis 'Amount ($)' from 90 to 140]
a How many students:
i were surveyed?
ii spent from $100 to less than $105 per month?
b What is the mode?
c How many students spent $110 or more per month?
d What percentage spent less than $100 per month?
e i What is the shape of the distribution displayed by the histogram?
ii In which interval is the median of the distribution?
iii In which interval is the upper quartile of the distribution (Q3 )?
3 The amount of weight lost in one week by 32 people who participated in a weight loss
program was recorded and displayed in the ordered stem plot below.
Weight loss (kg) Key: 2|0 represents 2.0
1 | 5 5 7 8 9 9
2 | 2 2 2 3 3 4 4
2 | 5 6 6 7 7 8
3 | 0 1 3 3 4
3 | 5 5 5 7
4 | 1 2 2
5 | 0
a Describe the shape of the distribution.
b Determine the median weight loss. Give your answer to 2 decimal places.
c Find the value of the interquartile range. Give your answer to 2 decimal places.
d What percentage of this group had a weight loss of more than 3.5 kg? Give your
answer to 2 decimal places.
e Is the weight loss of 5.0 kg an outlier for this data set? Justify your answer.
4 The systolic blood pressure (measured in mmHg) for a group of 2000 people
was measured. The results are summarised in the five-number summary below:
Min = 75, Q1 = 110, M = 125, Q3 = 140, Max = 180
a Use the five-number summary to construct a simple boxplot.
b Indicate on your plot where the lower and upper fences would be, and hence if there
would be any outliers.
c Assume that the distribution of systolic blood pressure for this sample of 2000
people is approximately normally distributed, with a mean of 128 mmHg and a
standard deviation of 20 mmHg.
i Approximately what percentage of people have a systolic blood pressure between 108 mmHg and 148 mmHg?
ii If a person has a blood pressure three standard deviations below the mean, what would be their actual blood pressure?
iii Of the 2000 people measured, how many could we expect to have a blood
pressure three standard deviations below the mean?
iv Of the 2000 people measured, how many actually did have a blood pressure
three standard deviations below the mean?
5 The hand span in centimetres of 200 women was recorded and displayed in a dot plot.
a Write down the modal hand span, in centimetres, for this group of 200 women.
b The mean hand span for this group of 200 women is 17.9 cm, and the standard
deviation is 1.1 cm. Use the information in the dot plot to determine the percentage
of women in this group who had an actual hand span more than two standard
deviations above or below the mean. Round your answer to one decimal place.
c The five-number summary for this sample of hand spans, in centimetres, is given below:
Min = 15.0, Q1 = 17.0, M = 18.0, Q3 = 18.5, Max = 21.5
Use the five-number summary to construct a boxplot showing outliers.
Chapter questions
What are bivariate data?
What are explanatory and response variables?
What are two-way frequency tables and how do we interpret them?
How do we construct and interpret segmented bar charts from two-way frequency tables?
How do we construct and interpret parallel dot plots?
How do we construct and interpret back-to-back stem plots?
How do we construct and interpret parallel boxplots?
What is a scatterplot, how is it constructed and what does it tell us?
What do we mean when we describe the association between two numerical variables in terms of direction, form and strength?
What is the difference between observation and experimentation?
What is the difference between association and causation?
In this chapter we begin our study of bivariate data, that is, data recorded on two variables for each subject.
So far you have learned how to display, describe and compare the distributions of single
variables. In the process you learned how to use data to answer questions like ‘What is the
favourite colour of prep-grade students?’ or ‘How do the weights of tuna fish vary?’ In each
case we concentrated on investigating the statistical variables individually.
However, questions like ‘Does the new treatment for headache work more quickly than
the old treatment?’, ‘Are city voters more likely to vote for the Greens party than country
voters?’ or ‘Can we predict a student’s test score from the time (in hours) they spent
studying for the test?’ cannot be answered by considering variables separately. All of these
questions relate to situations where the two variables are linked in some way (associated) so
that they vary together. The data generated in these circumstances is called bivariate data.
Analysing bivariate data requires a new set of statistical tools. Developing and applying
these tools is the subject of the next four chapters.
Does the new treatment for headache work more quickly than the old treatment?
The two variables in this question are type of treatment, a categorical variable taking
the values ‘new’ and ‘old’, and time taken for the headache to be relieved, a numerical
variable, measured in minutes. Thus, investigation of a question like this can be classified
as investigating the association between a categorical variable and a numerical variable.
Are city voters more likely to vote for the Greens party than country voters?
This question involves two variables, place of residence, which is a categorical variable
taking the values ‘city’ and ‘country’, and vote for the Greens, which also is a categorical
variable taking the values ‘yes’ and ‘no’. Investigation of a question like this can be
classified as investigating the association between two categorical variables.
Can we predict a student’s test score (%) from time (in hours) spent studying for the test?
The variable test score is a numerical variable, as is time spent studying for the test.
Investigation of a question like this can be classified as investigating the association
between two numerical variables.
As discussed in Chapter 1, categorical variables can be further classified as nominal or
ordinal, and numerical variables can be further classified as discrete or continuous.
For each of the following questions, determine whether it involves investigating an association between:
one numerical variable and one categorical variable, or
two categorical variables, or
two numerical variables.
a Are younger people (age measured in years) more likely to believe in astrology
(measured as ‘yes’ or ‘no’) than older people?
b Do people who weigh more (weight measured in kg) tend to have higher blood
pressure (blood pressure measured in mmHg)?
c Are people who have a driver’s licence (measured as ‘yes’ or ‘no’) more likely to be in
favour of lowering the driving age (measured as ‘yes’ or ‘no’)?
a One numerical variable (age) and one categorical variable (belief in astrology)
b Two numerical variables (weight and blood pressure)
c Two categorical variables (have a driver’s licence and support for lowering the driving age)
explaining voting preference. In this situation place of residence is the explanatory variable
and vote for Greens is the response variable.
It is important to be able to identify the explanatory and response variables before you
explore the association between them. Consider the following examples.
We wish to investigate the question, ‘Does the time it takes a student to get to school
depend on their mode of transport?’ The variables here are time and mode of transport.
Which is the response variable (RV) and which is the explanatory variable (EV)?
Explanation Solution
In asking the question in this way we are suggesting EV: mode of transport
that a student’s mode of transport might explain the RV: time
differences we observe in the time it takes students to
get to school.
Can we predict people’s height (in cm) from their wrist measurement? The variables in
this investigation are height and wrist measurement. Which is the response variable (RV)
and which is the explanatory variable (EV)?
Explanation Solution
Since we wish to predict height from wrist circumference, we are using wrist measurement as the predictor or explanatory variable. Height is then the response variable.
EV: wrist measurement
RV: height
It is important to note that, in Example 3, we could have asked the question the other way
around; that is, ‘Can we predict people’s wrist measurement from their height?’ In that case
height would be the explanatory variable, and wrist measurement would be the response
variable. The way we ask our statistical question is an important factor when there is no
obvious explanatory variable.
Note: The explanatory variable is sometimes called the independent variable (IV) and the response variable
the dependent variable (DV).
Exercise 2A
3 The following pairs of variables are related. In each case identify which is likely
to be the explanatory variable and which is the response variable, and the level of
measurement of each variable (categorical or numerical). The variable names are
a exercise level (1 = light, 2 = moderate, 3 = a lot) and age (years).
b years of education (years) and salary level ($ per annum).
c comfort level (0 = uncomfortable, 1 = comfortable) and temperature (°C).
d time of year (summer, autumn, winter, spring) and incidence of hay fever (1 = never, 2 = sometimes, 3 = regularly).
e age group (less than 25, 25–40, more than 40) and musical taste (classical, rock,
rap, country, indie, dance, jazz).
f AFL team supported and state of residence.
5 The variables weight (light, medium, heavy) and height (less than 160 cm, 160–175 cm, over 175 cm) are:
A both nominal variables
B both ordinal variables
C a nominal and an ordinal variable respectively
D an ordinal and a nominal variable respectively
E both continuous variables
6 Researchers believe that reaction time might be lower in cold temperatures. They
devise an experiment where reaction time in seconds is measured at three different
temperature levels (1 = less than 8°C, 2 = from 8°C to 18°C, 3 = more than 18°C).
The response variable, and its classification are:
A reaction time, categorical B temperature, categorical
C reaction time, numerical D temperature, numerical
E temperature, ordinal
If two variables are related or linked in some way, we say they are associated. To begin the investigation of an association between two categorical variables, we create a contingency table, also called a two-way frequency table. It is called a two-way frequency table because it summarises data from two variables.
The first thing to note is that these two variables, attitude to gun control (for or against)
and residence (city or country), are both categorical variables. Categorical data are usually
presented in the form of a frequency table.
Suppose we continue until we have interviewed a sample of 100 people, and we find that there are 58 who live in the country and 42 who live in the city. We can present this result in a frequency table as shown below.
Residence Frequency
Country 58
City 42
Total 100
From this table, we can see that there were more country than city people in our sample.
Suppose also that, when we record the attitude to gun control, we find 62 ‘for’ and 38 ‘against’ gun control. Again, we could present these results in a frequency table as shown below.
Attitude to gun control Frequency
For 62
Against 38
Total 100
From this table, we can see that more people in the sample were for gun control than against
gun control. However, we cannot tell from the information contained in the tables whether
attitude to gun control depends on the residence of the person. To do this we need to
construct a two-way frequency table, which gives both the attitude to gun control and the
residence for each person in the sample.
We begin by counting the number of people in the sample who are:
from the country and for gun control
from the city and for gun control
from the country and against gun control
from the city and against gun control.
Suppose again from our sample of 100 people we find the following frequencies:
32 country people are for gun control
30 city people are for gun control
26 country people are against gun control
12 city people are against gun control.
In two-way frequency tables, it is conventional to let the categories of the response variable label the rows of the table and the categories of the explanatory variable label the columns of the table. Following this convention, we can create the following two-way frequency table.
Attitude to gun control Country City
For 32 30
Against 26 12
To complete the table, it is usual to calculate the row and column sums, as shown below.
Attitude to gun control Country City Total
For 32 30 62 Row sum
Against 26 12 38 Row sum
Total 58 42 100
Column sum Column sum
The entries in the body of the table (32, 30, 26 and 12) are called the cells of the table. It is the numbers in these cells that we look at when investigating the relationship between the two variables.
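Building a two-way frequency table from raw observations is a matter of counting each (explanatory, response) combination and then adding the row and column sums. A minimal sketch, assuming each observation is stored as a (residence, attitude) pair; the function name `two_way_table` is illustrative, and the 100 observations are rebuilt from the four counts given above.

```python
from collections import Counter

def two_way_table(observations, response_values, explanatory_values):
    """Tally (explanatory, response) pairs into a two-way table.

    Rows are labelled by the response variable and columns by the
    explanatory variable, following the convention described above.
    """
    counts = Counter(observations)
    table = {r: {e: counts[(e, r)] for e in explanatory_values} for r in response_values}
    row_sums = {r: sum(table[r].values()) for r in response_values}
    col_sums = {e: sum(table[r][e] for r in response_values) for e in explanatory_values}
    return table, row_sums, col_sums, len(observations)

# Rebuild the sample of 100 people from the four cell counts in the text
observations = (
    [("Country", "For")] * 32 + [("City", "For")] * 30
    + [("Country", "Against")] * 26 + [("City", "Against")] * 12
)
table, row_sums, col_sums, total = two_way_table(
    observations, ["For", "Against"], ["Country", "City"]
)
print(table)     # {'For': {'Country': 32, 'City': 30}, 'Against': {'Country': 26, 'City': 12}}
print(row_sums)  # {'For': 62, 'Against': 38}
print(col_sums)  # {'Country': 58, 'City': 42}
print(total)     # 100
```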
The following data were obtained when a sample of ten Year 9 students were asked if they intended to go to university (university). The gender of the student was also recorded.
Consider again the two-way frequency table created to investigate the association between
place of residence and attitude to gun control. This table tells us that more country people
are in favour of gun control than city people. But is this just due to the fact that there were
more country people in the sample? To help us answer this question we need to express the
frequencies in each cell as percentage frequencies.
Attitude to gun control Country City
For 55.2% 71.4%
Against 44.8% 28.6%
Total 100.0% 100.0%
If the variables attitude to gun control and residence were not associated, we would expect
approximately equal percentages of country people and city people to be ‘for’ gun control.
Finding a single row in the two-way frequency distribution in which percentages are clearly
different is sufficient to identify a relationship between the variables.
We could have also arrived at this conclusion by focusing our attention on the percentages
‘against’ gun control. We might report our findings as follows.
In this sample of 100 people, a higher percentage of city people were for gun control than
country people: 71.4% to 55.2%. This indicates that a person’s attitude to gun control is
associated with their place of residence.
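Column percentages are found by dividing each cell by its column total, so that every column adds to 100%. A minimal sketch, assuming the table is stored as a nested dictionary with response values as rows; `column_percentages` is an illustrative name. Rounding to one decimal place reproduces the percentages in the table above.

```python
def column_percentages(table, decimals=1):
    """Convert a two-way table of counts (rows = response, columns = explanatory)
    into column percentages, so that each column sums to 100%."""
    rows = list(table)
    cols = list(table[rows[0]])
    col_totals = {c: sum(table[r][c] for r in rows) for c in cols}
    return {
        r: {c: round(100 * table[r][c] / col_totals[c], decimals) for c in cols}
        for r in rows
    }

counts = {"For": {"Country": 32, "City": 30}, "Against": {"Country": 26, "City": 12}}
print(column_percentages(counts))
# {'For': {'Country': 55.2, 'City': 71.4}, 'Against': {'Country': 44.8, 'City': 28.6}}
```

Comparing the ‘For’ row (55.2% against 71.4%) is then the same step as in the report above.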
We will now consider a two-way percentage frequency table that shows no evidence of a
relationship. Consider the following table that summarises responses to the question ‘Should
mobile phones be banned in cinemas?’ These responses were obtained from 100 students
in Year 10 and Year 12 – we are interested in investigating whether there is an association
between these variables.
Year level
Should mobile phones be banned in cinemas? Year 10 Year 12
Yes 87.9% 86.8%
No 12.1% 13.2%
Total 100.0% 100.0%
When we look across the first row of the table, we see that the percentages in favour are very
similar. In this case, we might report our findings as follows.
In this sample of 100 Year 10 and Year 12 students, we see that the percentage of Year 10
and Year 12 students in support of banning mobile phones in cinemas is similar: 87.9%
to 86.8%. This indicates that a person’s support for banning mobile phones in cinemas is
not associated with their year level.
University Male Female Total
Yes 50 54 104
No 55 41 96
Total 105 95 200
Explanation Solution
1 Determine the column percentages and complete the table as shown.
Gender
University Male Female
Yes 47.6% 56.8%
No 52.4% 43.2%
Total 100.0% 100.0%
2 Select an appropriate row to compare the male and female percentages.
We can see from the top row that a greater proportion of females than males (56.8% compared with 47.6%) were intending to go to university.
3 Construct a report.
Report: In this sample of 200 Year 9 students, a greater proportion of females than males (56.8% compared with 47.6%) were intending to go to university. There is an association between gender and intention to go to university.
Two-way frequency tables for categorical variables taking more than two values
The table below displays the smoking status for a group of adults (smoker, past smoker, never smoked) by educational level (Year 9 or less, Year 10 or 11, Year 12, university). This is still a two-way frequency table because it involves two variables. Smoking status can take three values and level of education can take four, so we call this a 3 × 4 table.
Again, we look for an association between variables by comparing the percentages across
one of the rows. The following report has been prepared using the percentages in the
‘Smoker’ row.
From Table 3.3 we see that the percentage of smokers steadily decreases with education
level, from 33.9% for Year 9 or below to 18.4% for university. This indicates that
smoking is associated with level of education.
Explanation Solution
a Age is a possible explanation for the Age group is the EV.
level of interest in sport, but interest in
sport cannot explain age.
b If we look across all rows, we can There is an association between the level
see that the percentages are different of interest in sport and age. A high level
for each age group. Select one row to of interest in sport is seen to decrease
compare and discuss – here we have steadily across the age categories from
chosen ‘high’. 56.5% for under 18 years, 50.2% for 19–25
years, 40.7% for 26–35 years to, at its
lowest, 35% for 36–50 years.
[Figure: horizontal axis 'Level of education' with categories Yr 9 or less, Yr 10 or 11, Yr 12 and University]
Construct a segmented bar chart to display the association between interest in sport and age group displayed in the table in Example 6.
1 Since age group is the EV, this variable will label the horizontal axis.
2 The vertical axis should be scaled from 0% to 100%, in intervals of 10%.
3 There will be a bar for each value of age group, that is, a bar for each column of the table.
4 Mark off the segments of each bar using the column percentages, colouring each value of interest in sport in the same colour in every bar.
5 Add a key showing which colour has been assigned to each value of interest in sport.
[Figure: percentaged segmented bar chart of interest in sport (key: Low, Medium, High) by age group (years): under 18, 18–25, 26–35, 36–50]
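Step 4, marking off the segments, amounts to placing segment boundaries at the cumulative column percentages. The text-only sketch below illustrates this; it draws each bar horizontally with one letter per category and uses the gun control column percentages from earlier in the chapter, because the full interest-in-sport table is not reproduced here. The function name and the 50-character bar width are illustrative assumptions.

```python
def segmented_bar(column_percentages: dict, width: int = 50) -> str:
    """Text sketch of one bar in a percentaged segmented bar chart.

    Each segment is drawn with the first letter of its category. Boundaries
    are placed at the cumulative percentages, so the bar has exactly `width`
    characters when the percentages add to 100.
    """
    bar, cumulative, drawn = "", 0.0, 0
    for category, pct in column_percentages.items():
        cumulative += pct
        boundary = round(width * cumulative / 100)
        bar += category[0] * (boundary - drawn)
        drawn = boundary
    return bar

# Column percentages for attitude to gun control (For / Against) by residence
bars = {"Country": {"For": 55.2, "Against": 44.8}, "City": {"For": 71.4, "Against": 28.6}}
for residence, pcts in bars.items():
    print(f"{residence:<8}|{segmented_bar(pcts)}|")
```

Using cumulative boundaries, rather than rounding each segment separately, keeps every bar the same total length, just as every bar on a percentaged chart reaches 100%.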
The percentaged segmented bar chart allows an easier visual comparison of the percentages
than does the percentaged two-way table, and can be used to investigate the association
between two categorical variables, as shown in the following example.
The percentaged segmented bar chart below shows the association between preferred
holiday (country or coast) and age group (under 40, 40 or over) for a sample of 800
visitors to a travel website.
[Figure: percentaged segmented bar chart of preferred holiday (key: country, coast) by age group: under 40, 40 and over]
Does the percentaged segmented bar chart support the contention that there is an association between preferred holiday and age group?
Explanation: If we look across all rows, we can see that the heights of the segments, and thus the percentages, are different for each age group. Select one row to compare and discuss – here we have chosen ‘coast’.
Solution: There is an association between holiday preference and age. Those aged under forty are more likely to choose a coastal holiday (75%) than those aged forty or over (60%).
Exercise 2B
Example 5 2 The following data were obtained when a sample of 30 adults were asked if they
supported reducing university fees. They were also classified by their age group: 17–18
years, 19–25 years, or 26 years or more. The results are given in the table below.
Age group Reduce fees Age group Reduce fees Age group Reduce fees
17–18 Yes 26 or more Yes 26 or more No
19–25 Yes 17–18 Yes 19–25 Yes
26 or more No 19–25 Yes 17–18 No
17–18 Yes 17–18 Yes 26 or more Yes
19–25 Yes 17–18 Yes 17–18 No
26 or more Yes 26 or more No 26 or more Yes
17–18 Yes 19–25 Yes 19–25 Yes
19–25 No 26 or more Yes 17–18 Yes
26 or more No 17–18 No 19–25 No
19–25 No 17–18 Yes 26 or more Yes
a Identify which variable is the explanatory variable and which is the response variable.
b Create a two-way frequency table from these data, with the values of the
explanatory variable labelling the columns.
c Calculate the column percentages for the table.
Example 7 6 It was suggested that students in Dr Evans’ mathematics class would achieve higher
grades than students in Dr Smith’s mathematics class. The following table shows the
results for each class that year.
Exam grade Dr Evans Dr Smith Total
Fail 2 3 5
Pass 11 20 31
Credit or above 5 9 14
Total 18 32 50
Example 8 7 Are those people who are satisfied with their job more likely to be satisfied with their
life? Data collected from a survey of 200 adults are summarised in the following
percentaged segmented bar chart.
[Figure: percentaged segmented bar chart of satisfaction with life (satisfied, dissatisfied) by satisfaction with job (satisfied, dissatisfied)]
Does the data support the contention that people who are satisfied with their job are more likely to be satisfied with their life? Write a brief report quoting appropriate percentages.
8 Researchers predicted that using a special pillow would be more effective in curing
snoring than treatment with drugs. The association between the outcome of treatment
and type of treatment is shown in the following percentaged segmented bar chart.
[Figure: percentaged segmented bar chart of outcome of treatment (categories include no cure and partial cure) by type of treatment (drug, pillow)]
a Identify which variable is the explanatory variable and which is the response variable.
b Does the data support the contention that the special pillow is more effective at treating snoring than the drug treatment? Write a brief report quoting appropriate percentages.
9 As part of the General Social Survey conducted in the US, respondents were asked to
say whether they found life exciting, pretty routine or dull. Their marital status was
also recorded as married, widowed, divorced, separated or never married. The results
are organised into a table as shown.
Tertiary qualification
Happy with life Yes No Total
Yes 116 138 254
No 12 34 46
Total 128 172 300
10 The percentage of participants in the study who do not have a tertiary education is
closest to:
A 57.3% B 80.2% C 54.3% D 11.3% E 19.8%
11 Of those people in the study who did not have a tertiary education, the percentage who
are happy with their lives is closest to:
A 57.3% B 80.2% C 54.3% D 11.3% E 19.8%
12 The data in the table supports the contention that there is an association between
tertiary qualifications and happiness because:
A 84.7% of people are happy.
B more people without a tertiary qualification are happy than people with a tertiary qualification.
C 90.6% of people with a tertiary qualification are happy, compared to 80.2% of those
without a tertiary qualification.
D 54.3% of happy people do not have a tertiary qualification.
E 57.3% of people do not have a tertiary qualification, compared to 42.7% who do.
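Each percentage in Questions 10 to 12 comes from dividing a cell of the table by the appropriate total. Below is a minimal Python sketch of the column-percentage calculation. The dictionary layout and the function name `column_percentages` are illustrative choices; neither the textbook nor a calculator provides them.

```python
# Two-way frequency table from the tertiary qualification study above.
# Columns hold the explanatory variable (tertiary qualification: Yes/No);
# each column records the counts for the response variable (happy with life).
table = {
    "Yes": {"Happy": 116, "Not happy": 12},  # has a tertiary qualification
    "No": {"Happy": 138, "Not happy": 34},   # no tertiary qualification
}

def column_percentages(freq):
    """Convert each column of a two-way frequency table into percentages
    of that column's total, so that columns of different sizes can be compared."""
    result = {}
    for column, counts in freq.items():
        total = sum(counts.values())
        result[column] = {row: 100 * n / total for row, n in counts.items()}
    return result

grand_total = sum(sum(counts.values()) for counts in table.values())
no_tertiary_total = sum(table["No"].values())
print(f"No tertiary qualification: {100 * no_tertiary_total / grand_total:.1f}% of participants")

for column, pcts in column_percentages(table).items():
    parts = ", ".join(f"{row} {p:.1f}%" for row, p in pcts.items())
    print(f"Tertiary qualification = {column}: {parts}")
```

The association shows up when the ‘Happy’ percentages are compared across the two columns: 90.6% for those with a tertiary qualification against 80.2% for those without.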
In the previous section, we learned how to identify and describe associations between two
categorical variables. In this section, we will learn to identify and describe associations
between a numerical variable and a categorical variable. Suppose, for example, we wish to
investigate the association between attendance at a revision class, and test score. Here we
can actually identify two variables. One is the variable test score, a numerical variable, and
the other is the variable attended revision class, which is a categorical variable taking the
values ‘yes’ or ‘no’.
The outcome of such an investigation will be a brief written report that compares the
distribution of the numerical variable across two or more groups, the number of groups
equal to the number of values which the categorical variable can take. The starting point for
these investigations will be, as always, a graphical display of the data. Here our options are
parallel dot plots, back-to-back stem plots or parallel boxplots.
Using a graphical display of the data, as well as the values of the relevant summary statistics,
we can compare the distributions of the numerical variable for each value of the categorical
variable according to:
shape, centre and spread.
If any of these are noticeably different for differing values of the categorical variable we
will conclude that the two variables are associated. Because it is often difficult to clearly
identify the shape of a distribution with a small amount of data, we usually confine ourselves
to comparing centre and spread, using the medians and IQRs, when using dot plots and
back-to-back stem plots.
The parallel dot plots below display the distribution of the number of sit-ups performed
by 15 people before and after they had completed a gym program.
[Figure: parallel dot plots of the number of sit-ups before and after the gym program (scale 22 to 54)]
Do the parallel dot plots support the contention that the number of sit-ups performed is
associated with completing the gym program? Write a brief explanation that compares the distributions.
Here the numerical variable sit-ups is the response variable and the categorical variable
gym program, taking the values ‘before’ and ‘after’, is the explanatory variable.
1 Locate the median number of sit-ups performed before and after the gym program,
M = 26 and M = 32 sit-ups respectively.
2 Determine the IQR of sit-ups performed before and after the gym program, IQR = 6
and IQR = 10 sit-ups respectively.
3 There are reasonable differences in both the median and IQR of the number of sit-ups
performed before and after the gym program, evidence that the number of sit-ups
performed is associated with completing the gym program. Report your conclusion,
backed up by a brief explanation.
The median number of sit-ups performed after attending the gym program (M = 32) is
higher than the median number of sit-ups performed before attending the gym program
(M = 26). The variability in the number of sit-ups has also increased from IQR = 6 to
IQR = 10. Thus we can conclude that the number of sit-ups performed is associated
with completing the gym program.
Example 10 Using a back-to-back stem plot to identify and describe associations
The back-to-back stem plot below displays the distribution of life expectancy (in years)
for the same 13 countries in 1970 and 2010.
           1970 |   | 2010
Key: 2|5 = 52 years (1970); 6|7 = 67 years (2010)
              8 | 3 |
          9 4 2 | 5 | 8
9 9 9 8 7 7 1 0 | 6 | 9
              4 | 7 | 1 2 4 4 6 8 9 9
                | 8 | 0 0 0
Do the back-to-back stem plots support the contention that life expectancy has changed
between these two time periods?
Here the numerical variable life expectancy is the response variable and the variable year
is the explanatory variable. While year can be considered a numerical variable, because
it is only taking two values (1970 and 2010) we are treating it as a categorical variable in
this example.
1 Determine the median life expectancies for 1970 and 2010. You should find them to be
67 and 76 years, respectively.
2 Determine the quartiles, and hence the values of the IQR for 1970 and 2010. You
should find them to be 12.5 years and 8 years respectively.
3 These differences in median and IQR between 1970 and 2010 are sufficient to
conclude that the distribution of life expectancy had changed over this time period.
Report your conclusion, supported by a brief explanation.
Report: There is an association between year and life expectancy. The median life
expectancy has increased between 1970 and 2010, from M = 67 years to M = 76
years. Over the same period life expectancy has also become less variable (IQR in
1970 = 12.5 years; IQR in 2010 = 8 years).
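The medians and IQRs in Example 10 can be checked with a short Python sketch. The two lists are our reading of the stem plot: the top row is read as leaf 8, stem 3, that is, 38 years. Any smallest 1970 value below 52 would give the same median and IQR. Here the quartiles are the medians of the lower and upper halves, with the median left out when the number of values is odd. Other software may use a different quartile rule and report slightly different values. The function name `quartiles` is a hypothetical helper.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3). Q1 and Q3 are the medians of the lower and
    upper halves of the ordered data; when the number of values is odd,
    the median itself is left out of both halves."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 == 1 else data[half:]
    return median(lower), median(data), median(upper)

# Life expectancies (years) as read from the back-to-back stem plot in Example 10.
life_1970 = [38, 52, 54, 59, 60, 61, 67, 67, 68, 69, 69, 69, 74]
life_2010 = [58, 69, 71, 72, 74, 74, 76, 78, 79, 79, 80, 80, 80]

for year, data in (("1970", life_1970), ("2010", life_2010)):
    q1, m, q3 = quartiles(data)
    print(f"{year}: M = {m}, Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}")
```

For these data the sketch gives M = 67 and IQR = 12.5 for 1970, and M = 76 and IQR = 8 for 2010. These match the values quoted in the report.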
Use the following parallel boxplots to compare the pulse rates (in beats/minute) for a
group of 70 male students and 90 female students.
Solution
Here the numerical variable resting pulse rate is the response variable and the categorical
variable gender is the explanatory variable.
1 Compare the medians: The median for females is about 72, which is higher than that
for males, which is about 65.
2 Compare the spread: The IQR for females is 15, which is more than the IQR for
males, which is 10.
There is an association between resting pulse rate and gender. On average, the resting
pulse rate for males is lower (median: male = 65, female = 72) and less variable than that
for females (IQR: male = 10, female = 15). The distributions of resting pulse rates for
both male and female students were approximately symmetric. One male was found to
have an extremely low pulse rate of 40, while another had an extremely high pulse rate of
Example 12 Comparing distributions across more than two groups using parallel boxplots
Use the parallel boxplots below to compare the salary distribution for workers in a certain
industry across four different age groups: 20–29 years, 30–39 years, 40–49 years and
50–65 years.
[Figure: parallel boxplots of salary for the age groups 20–29, 30–39, 40–49 and 50–65 years]
Here the numerical variable salary is the response variable and the categorical variable
age group is the explanatory variable.
1 Compare the medians: The median salary increased from $64 000 for 20−29 year-olds
to $72 000 for 50−65 year-olds.
2 Compare the IQRs: The IQR increased from around $12 000 for 20−29-year-olds to
around $20 000 for 50−65-year-olds.
3 Comparing the shapes: The shape of the distribution of salaries changes with the age
group, from symmetric to positively skewed.
4 Locate the outliers: There are no outliers in the 20–29 and 30–39 age groups. Outliers begin to appear at $110 000 for the 40–49 age group, and at $119 000, $126 000 and $140 000 for the 50–65 age group.
5 Write the report comparing the distributions.
In this industry there is an association between salary and age group. The median salaries increase across the age groups, from $64 000 for 20–29-year-olds to $72 000 for 50–65-year-olds. The salaries also become more variable, with the IQR increasing from around $12 000 for 20–29-year-olds to around $20 000 for 50–65-year-olds. The shape of the distribution of salaries changes with age group, from symmetric for 20–29-year-olds to progressively more positively skewed as age increases. There are no outliers in the 20–29 and 30–39 age groups. Outliers begin to appear at $110 000 for the 40–49 age group, and at $119 000, $126 000 and $140 000 for the 50–65 age group.
Exercise 2C
Example 9 1 Data was collected to compare the number of days spent away from home (number of days away) by 21 tourists from each of Japan and Australia (country of origin). The data collected is displayed in the parallel dot plots below.
[Figure: parallel dot plots of the number of days away (0 to 50 days) for tourists from Japan and Australia]
Example 10 2 The back-to-back stem plot shown compares the distribution of the age of patients (in years) admitted to a small hospital during one week, and their gender.
[Figure: back-to-back stem plot of age of patients (years), Females | Males; key: 0|4 = 40 years (females), 4|0 = 40 years (males)]
a Classify each variable as categorical or numerical.
b Do the back-to-back plots support the contention that the age of the patients is associated with their gender? Write a brief explanation that compares these distributions in terms of centre and spread.
3 The following back-to-back stem plot displays the distributions of the number of hours
per week spent online by a group of students, and their year level.
Example 11 4 The parallel boxplots show the distribution of ages of 45 men and 38 women when first married.
[Figure: parallel boxplots of age at marriage (years) for women (n = 38) and men (n = 45)]
[Figure: parallel boxplots of pulse rate (beats per minute) by gender]
a Identify each of the variables and classify as categorical or numerical.
b Use the boxplots to compare these distributions, and draw an appropriate conclusion
about the association between gender and pulse rate.
The data in the following boxplots was collected to investigate the association between
smoking and the birth weight of babies.
[Figure: parallel boxplots of weight of baby at birth (lbs) for smokers and non-smokers]
8 The information in the boxplots supports the contention that there is an association
between smoking and weight of baby at birth because:
A The IQRs of birthweight for both groups are approximately the same.
B The median birthweight for smokers is more than the median birthweight for non-smokers.
C The IQRs of birthweight for both groups are very different.
D The median birthweight for smokers is less than the median birthweight for non-smokers.
E Both distributions are approximately symmetric.
In this section, we will learn to identify and describe associations between two numerical
variables. Suppose, for example, we wish to investigate the association between university
participation rate (the EV) and average hours worked (the RV) in nine countries. The
starting point for this investigation is again a graphical display of the data. Here the appropriate display is a scatterplot. The data for 9 countries are shown below.
The scatterplot
A scatterplot is a plot which enables us to display bivariate data when both of the
variables are numerical.
In a scatterplot, each point represents a single case.
When constructing a scatterplot, it is conventional to use the vertical or y-axis for the
response variable (RV) and the horizontal or x-axis for the explanatory variable (EV).
The scatterplot below left shows the point for a country for which the university participation rate is 26% and average hours worked is 35. The scatterplot below right is the completed scatterplot when each of the remaining countries is plotted.
[Figure: left, the single point (26, 35); right, the completed scatterplot for all nine countries. Horizontal axis: Participation rate (%); vertical axis: Hours worked]
Test 1 10 18 13 6 8 5 12 15 15
Test 2 12 20 11 9 6 6 12 13 17
1 Start a new
document by
pressing / + N .
2 Select Add Lists &
Spreadsheet. Enter
the data into lists
named test1 and test2.
3 Press / + I and select Add Data & Statistics.
4 a Click on Click to add variable on the x-axis
and select the explanatory variable test1.
b Click on Click to add variable on the y-axis and
select the response variable test2. A scatterplot is
displayed. The plot is scaled automatically.
Test 1 10 18 13 6 8 5 12 15 15
Test 2 12 20 11 9 6 6 12 13 17
1 Open the Statistics application
and enter the data into the columns
named test1 and test2.
2 Tap to open the Set
StatGraphs dialog box and
complete as given below.
Draw: select On.
Type: select Scatter ( ¤ ).
XList: select main\test1( ¤ ).
YList: select main\test2( ¤ ).
Freq: leave as 1.
Mark: leave as square.
Tap Set to confirm your selections.
3 Tap in the toolbar at the top of
the screen to plot the scatterplot in
the bottom half of the screen.
4 To obtain a full-screen plot, tap
from the icon panel.
Note: If you have more than one graph
on your screen, tap the data screen, select
StatGraph and turn off any unwanted graphs.
Exercise 2D
1 The scatterplot opposite has been constructed to investigate the association between the airspeed (in km/h) of
[Figure: scatterplot with Airspeed (km/h) on the vertical axis]
ISBN 978-1-009-11041-9
that has 300 seats? © Peter Jones et al 2023 Cambridge University Press
Photocopying is restricted under law and this material must not be transferred to another party.
2D 2D Investigating associations between two numerical variables 133
Minimum temperature (x) 17.7 19.8 23.3 22.4 22.0 22.0
Maximum temperature (y) 29.4 34.0 34.5 35.0 36.9 36.4
The table above shows the maximum and minimum temperatures (in ◦ C) during a hot
week in Melbourne.
a Enter the data into your calculator, naming the variables mintemp and maxtemp.
b Construct a scatterplot with minimum temperature as the EV.
Balls faced 29 16 19 62 13 40 16 9 28 26 6
Runs scored 27 8 21 47 3 15 13 2 15 10 2
The table above shows the number of runs scored and the number of balls faced
by batsmen in a 1-day international cricket match. Use a calculator to construct an
appropriate scatterplot.
Temperature (◦C) 0 10 50 75 100 150
Diameter (cm) 2.00 2.02 2.11 2.14 2.21 2.28
The table above shows the changing diameter of a metal ball as it is heated. Use a
calculator to construct an appropriate scatterplot, with temperature as the EV.
Number in theatre 87 102 118 123 135 137
Time (minutes) 0 5 10 15 20 25
The table above shows the number of people in a movie theatre at 5-minute intervals
after the advertisements started. Use a calculator to construct an appropriate scatterplot.
Having found a clear pattern, we need to be able to describe the association clearly. The three features we look for in the pattern of points are direction, form and strength.
between the variables height and age for this group of footballers. However, there is a possible outlier for height: a footballer who is 201 cm tall.
[Figure: scatterplot of height (cm) against age (years) for a group of footballers]
[Figure: scatterplot of weight (kg) against height (cm)]
opposite). The two variables are associated. If the points in the scatterplot trend upwards as we go from left to right, we say there is a positive association between the variables. In this example, the positive association means that taller players tend to be heavier. In this scatterplot, there are no outliers.
[Figure: scatterplot of hours worked against participation rate (%)]
If the points in the scatterplot trend downwards as we go from left to right, we say there is a negative association between the variables. In this example, the negative association means that countries with higher university participation rates tend to work fewer hours. In this scatterplot, there are no outliers.
Direction of an association
Two variables have a positive association when the value of the response variable
tends to increase as the value of the explanatory variable increases.
Two variables have a negative association when the value of the response variable
tends to decrease as the value of the explanatory variable increases.
Two variables have no association when there is no consistent change in the value of
the response variable when the values of the explanatory variable increase.
[Figure: three scatterplots: a reaction time against dose (mg); b diameter (cm) against age (years); c height of daughter against height of mother]
a Explanation: There is a clear pattern in the scatterplot. The points in the scatterplot trend downwards from left to right.
Solution: The direction of the association is negative. Reaction times tend to decrease as the drug dose increases.
b Explanation: There is no pattern in the scatterplot of diameter against age.
Solution: There is no association between diameter and age.
c Explanation: There is a clear pattern in the scatterplot. The points in the scatterplot trend upwards from left to right.
Solution: The direction of the association is positive. Taller mothers tend to have taller daughters.
The next feature that interests us in an association is its general form. Do the points in a
scatterplot tend to follow a linear pattern or a curved pattern? If the scatterplot has a linear
form then we say that the association between the variables is linear.
For example, both of the scatterplots below can be described as having a linear form; that is, the points can be thought of as scattered around a straight line. (The dotted lines have been added to the graphs to make it easier to see the linear form.)
[Figure: two scatterplots with a linear form: velocity (m/s) against time (s); average working hours against university participation (%)]
By contrast, consider the scatterplot opposite, plotting 5
A scatterplot is said to have a linear form when the points tend to follow a straight line.
A scatterplot is said to have a non-linear form when the points tend to follow a curved line.
[Figure: two scatterplots: a height of daughter against height of mother; b a scatterplot with number of weeks on a diet on the horizontal axis]
a Explanation: There is a clear pattern. The points in the scatterplot can be imagined to be scattered around a straight line.
Solution: The association is linear.
b Explanation: There is a clear pattern. The points in the scatterplot can be imagined to be scattered around a curved line rather than a straight line.
Solution: The association is non-linear.
The strength of an association is a measure of how much scatter there is in the scatterplot.
When there is a strong association between the variables, there is only a small amount of
scatter in the plot, and a pattern is clearly seen.
As the amount of scatter in the plot increases, the pattern becomes less clear. This indicates
that the association is less strong. In the examples below, we might say that there is a
moderate association between the variables.
138 Chapter 2 Investigating associations between two variables
As the amount of scatter increases further, the pattern becomes even less clear. The
scatterplots below are examples of weak association between the variables.
An association is classified as:
Strong if the points on the scatterplot tend to be tightly clustered about a trend line.
Moderate if the points on the scatterplot tend to be broadly clustered about a trend line.
Weak if the points on the scatterplot tend to be loosely clustered about a trend line.
When no pattern can be seen we say that there is no association.
[Figure: two scatterplots, a and b]
a Explanation: The points are loosely clustered.
Solution: The association is weak.
b Explanation: The points are tightly clustered.
Solution: The association is strong.
Exercise 2E
Assessing the direction of an association from the variables
1 For each of the following pairs of variables:
a Indicate whether you expect an association to exist between the variables.
b If associated, say which variable you would expect to be the EV and which would
be the RV, and whether you would expect the variables to be positively or negatively associated.
i intelligence and height ii level of education and salary level
iii salary and tax paid iv frustration and aggression
v population density and distance from the city centre
vi time using social media and time spent studying
[Figure: four scatterplots, i to iv; axis labels include Smoking rate, Aptitude test score, Age (months), CO level, Calf measurement and Age (years)]
The strength of a linear association is an indication of how closely the points in the
scatterplot fit a straight line. If the points in the scatterplot lie exactly on a straight line,
we say that there is a perfect linear association. If there is no fit at all we say there is no
association. In general, we have an imperfect fit, as seen in all of the scatterplots to date.
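To measure the strength of a linear relationship, a statistician called Karl Pearson developed a correlation coefficient, r, which has the following properties.
[Figure: scatterplots illustrating r = 0 (no linear association), r = +1 (perfect positive linear association) and r = −1 (perfect negative linear association)]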
If there is a less than perfect linear association, then the correlation coefficient, r, has a value
between –1 and +1, or –1 < r < +1. The scatterplots below show approximate values of r
for linear associations of varying strengths.
1 Start a new document
by pressing / + N.
2 Select Add Lists &
Enter the data into
lists named income
and co2.
3 Press / + I and select Add Calculator.
Using the correlation matrix command: type in corrmat(income, co2) and press
Alternatively: Press k 1 C to access the Catalog, scroll down to corrMat( and press
·. Complete the command by typing in income, co2 and press ·.
The value of the correlation coefficient is r = 0.8342 . . . or 0.83 (2 d.p.)
The following data show the per capita income (in $’000) and the per capita carbon
dioxide emissions (in tonnes) of 11 countries.
Determine the value of Pearson’s correlation coefficient rounded to two decimal places.
Income ($’000) 8.9 23.0 7.5 8.0 18.0 16.7 5.2 12.8 19.1 16.4 21.7
CO2 (tonnes) 7.5 12.0 6.0 1.8 7.7 5.7 3.8 5.7 11.0 9.7 9.9
1 Open the Statistics application
2 Enter the data into the columns:
Income in List1
CO2 in List2.
3 Select Calc>Regression>Linear
Reg from the menu bar.
4 Press .
5 Tap OK to confirm your selections.
The value of the correlation
coefficient is
r = 0.818 . . . or 0.82 (to 2 d.p.).
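The calculator result above can be reproduced from the definition of Pearson's correlation coefficient, r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² × Σ(y − ȳ)²]. This is equivalent to 1/(n − 1) times the sum of the products of the standardised x and y values. The Python sketch below is for illustration only; `pearson_r` is a hypothetical helper name, not a calculator command.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's correlation coefficient for paired data:
    r = sum((x - x_bar)(y - y_bar)) / sqrt(sum((x - x_bar)^2) * sum((y - y_bar)^2))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired lists with at least two values")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Per capita income ($'000) and CO2 emissions (tonnes) for the 11 countries above.
income = [8.9, 23.0, 7.5, 8.0, 18.0, 16.7, 5.2, 12.8, 19.1, 16.4, 21.7]
co2 = [7.5, 12.0, 6.0, 1.8, 7.7, 5.7, 3.8, 5.7, 11.0, 9.7, 9.9]
print(f"r = {pearson_r(income, co2):.3f}")
```

Run on the income and CO2 data, the sketch gives r ≈ 0.818, or 0.82 to two decimal places. This agrees with the ClassPad result.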
Classify the strength of each of the following linear associations using the previous table:
a r = 0.35 b r = −0.507
c r = 0.992 d r = −0.159
a Explanation: The value 0.35 is more than 0.25 and less than 0.5. That is, 0.25 ≤ r < 0.5.
Solution: weak, positive
b Explanation: The value –0.507 is more than –0.75 and less than –0.5. That is, –0.75 < r ≤ –0.5.
Solution: moderate, negative
c Explanation: The value 0.992 is more than 0.75 and less than 1. That is, 0.75 ≤ r ≤ 1.
Solution: strong, positive
d Explanation: The value –0.159 is more than –0.25 and less than 0.25. That is, –0.25 < r < 0.25.
Solution: no association
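The classification rule used in this example can be written as a small Python function. The cut-offs for the size of r (0.25, 0.5 and 0.75) are taken from the worked solutions above. The table those solutions refer to is not reproduced here, so treat them as the example's convention. `classify_strength` is a hypothetical name.

```python
def classify_strength(r):
    """Classify a linear association from Pearson's r, using the cut-offs
    from the worked example:
      |r| >= 0.75         strong
      0.5 <= |r| < 0.75   moderate
      0.25 <= |r| < 0.5   weak
      |r| < 0.25          no association"""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size < 0.25:
        return "no association"
    if size < 0.5:
        strength = "weak"
    elif size < 0.75:
        strength = "moderate"
    else:
        strength = "strong"
    direction = "positive" if r > 0 else "negative"
    return f"{strength}, {direction}"

for r in (0.35, -0.507, 0.992, -0.159):
    print(r, "->", classify_strength(r))
```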
If you use the value of the correlation coefficient as a measure of the strength of an
association, you should ensure that:
1 the variables are numeric
2 the association is linear
3 there are no outliers in the data (the correlation coefficient can give a misleading
indication of the strength of the linear association if there are outliers present)
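The Python sketch below shows why the third condition matters. It uses a small made-up data set (hypothetical, not from the textbook). Five points lie exactly on a straight line, and then one outlier is added to them. `pearson_r` is a hypothetical helper that implements Pearson's formula.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's correlation coefficient for paired data."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: five points lying exactly on the line y = x ...
x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5]
print(f"without outlier: r = {pearson_r(x, y):.3f}")

# ... and the same five points plus one outlying point at (6, 0).
print(f"with outlier:    r = {pearson_r(x + [6], y + [0]):.3f}")
```

A single outlying point drags r from 1 down to about 0.14, even though five of the six points still lie exactly on a line.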
Exercise 2F
Basic ideas
1 The scatterplots of three sets of related variables are shown.
x 2 3 6 3 6 x̄ = 4, s x = 1.871
y 1 6 5 4 9 ȳ = 5, sy = 2.915
b This table shows the number of runs scored and balls faced by batsmen in a cricket
match. Runs scored and balls faced are linearly associated. Use your calculator to
show that r = 0.8782, correct to four decimal places.
Batsman 1 2 3 4 5 6 7 8 9 10 11
Runs scored 27 8 21 47 3 15 13 2 15 10 2
Balls faced 29 16 19 62 13 40 16 9 28 26 6
c This table shows the hours worked and university participation rate (%) in six
countries. Hours worked and university participation rate are linearly associated.
Use your calculator to show that r = −0.6727, correct to four decimal places.
If two variables are associated, it is possible to estimate the value of one variable from
that of the other. For example, people’s weights and heights are associated. Thus, given a
person’s height, we can roughly predict their weight. The degree to which we can make such
predictions depends on the value of r. If there is a perfect linear association (r = 1) between
two variables, we can make an exact prediction.
For example, when you buy cheese by the gram there is an exact association between the
weight of the cheese and the amount you pay (r = 1). At the other end of the scale, there is
no association between an adult’s height and their IQ (r ≈ 0). So knowing an adult’s height
will not enable you to predict their IQ any better than guessing.
If the correlation between weight and height is r = 0.8, find the value of the coefficient of
determination. Express your answer as a percentage.
The coefficient of determination = r² = 0.8² = 0.64 = 64%
Note: We have converted the coefficient of determination into a percentage (64%) as this is the most useful
form when we come to interpreting the coefficient of determination.
We now know how to calculate the coefficient of determination, but what does it tell us?
In the previous example we found the coefficient of determination between height and weight to be 0.64 (or 64%). Interpret this value in terms of the variables weight and height.
The coefficient of determination tells us that 64% of the variation in people’s weight is
explained by the variation in their height.
The level of carbon monoxide (CO) in the air measured at the roadside, and the traffic
volume at the same location are linearly related, with r = +0.985. Determine the value
of the coefficient of determination, write it in percentage terms and interpret. In this
relationship, traffic volume is the explanatory variable.
The coefficient of determination is:
r² = (0.985)² = 0.9702
Written as a percentage: 0.9702 × 100 = 97.0% rounded to one decimal place.
Therefore, 97.0% of the variation in carbon monoxide levels in the air can be explained
by the variation in traffic volume.
Clearly, traffic volume is a very good predictor of carbon monoxide levels in the air. Thus,
knowing the traffic volume enables us to predict carbon monoxide levels with a high degree
of accuracy. This is not the case with the next example.
Scores on tests of verbal and mathematical ability are linearly related with correlation coefficient r = +0.275. Determine the value of the coefficient of determination, write it in percentage terms, and interpret. In this relationship, verbal ability is the explanatory variable.
The coefficient of determination is:
r² = (0.275)² = 0.0756
Written as a percentage: 0.0756 × 100 = 7.6% rounded to one decimal place.
Therefore, only 7.6% of the variation observed in scores on the mathematical ability test
can be explained by the variation in scores obtained on the verbal ability test.
Clearly, scores on the verbal ability test are not good predictors of the scores on the
mathematical ability test; 92.4% of the variation in mathematical ability is explained by
other factors.
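The coefficient of determination calculations in the last three examples all follow one pattern: square r and multiply by 100. Below is a minimal Python sketch, with `coefficient_of_determination` as a hypothetical helper name.

```python
def coefficient_of_determination(r):
    """Return r squared, expressed as a percentage."""
    return 100 * r ** 2

examples = [
    ("weight explained by height", 0.8),
    ("CO level explained by traffic volume", 0.985),
    ("maths score explained by verbal score", 0.275),
]
for description, r in examples:
    pct = coefficient_of_determination(r)
    print(f"{description}: r = {r}, r^2 = {pct:.1f}%, not explained = {100 - pct:.1f}%")
```

The last column is the percentage of variation left unexplained, 100% − r². For the verbal and mathematical ability example this is 92.4%.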
Given the value of the coefficient of determination we can reverse the calculation and find
the value of the correlation coefficient. However, since the square root of a number can be
positive or negative, we need more information to be able to do this correctly, such as a scatterplot of the data.
1 Explanation: Since we know the value of the coefficient of determination (= r²), we need to find the square root of this value to find r.
Solution: r² = 0.5210, ∴ r = ±√0.5210 = ±0.7218
2 Explanation: There are two solutions, one positive and the other negative. Use the scatterplot to decide which applies.
Solution: The scatterplot indicates a negative association.
3 Explanation: Write down your answer.
Solution: ∴ r = −0.7218
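Taking √(r²) only gives the size of r, so the direction has to come from elsewhere; in this example it comes from the scatterplot. The Python sketch below makes that extra input explicit. `r_from_determination` and its `direction` argument are hypothetical names chosen for illustration.

```python
from math import sqrt

def r_from_determination(r_squared, direction):
    """Recover Pearson's r from r^2. The square root gives only the size
    of r, so the direction of the association (read from a scatterplot)
    must also be supplied: 'positive' or 'negative'."""
    if not 0 <= r_squared <= 1:
        raise ValueError("r^2 must lie between 0 and 1")
    if direction not in ("positive", "negative"):
        raise ValueError("direction must be 'positive' or 'negative'")
    size = sqrt(r_squared)
    return size if direction == "positive" else -size

print(f"r = {r_from_determination(0.5210, 'negative'):.4f}")
```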
Exercise 2G
4 The value of the correlation coefficient, r (rounded to two decimal places), is closest to:
A 0.16 B 0.40 C 0.63 D –0.40 E –0.63
5 The percentage of variation in time explained by the variation in the number of training
sessions is closest to:
A 39.7% B 63.0% C 15.8% D 37.0% E 60.3%
6 The percentage of variation in time NOT explained by the variation in the number of
training sessions is closest to:
A 39.7% B 63.0% C 15.8% D 37.0% E 60.3%
7 Suppose that in a certain industry the correlation between years spent studying and
income for employees is 0.73, and the correlation between age and income is 0.45.
Given this information, which one of the following statements is true?
A Older employees tend to have spent more years studying.
B The correlation between age and years spent studying is 0.32.
C Age explains a higher percentage of the variation in income than years spent studying.
D Years spent studying explains a higher percentage of the variation in income than age.
E Together age and years spent studying explain 100% of the variation in income.
Recently there has been interest in the strong association between the number of Nobel
prizes a country has won and the number of IKEA stores in that country (r = 0.82). This
strong association is evident in the scatterplot below. Here country flags are used to
represent the data points.
[Figure: scatterplot of Nobel laureates per 10 million population against IKEA stores per 10 million population (r = 0.82), with country flags as the data points]
Does this mean that one way to increase the number of Australian Nobel prize winners is to
build more IKEA stores?
Almost certainly not, but this association highlights the problem of assuming that a strong
correlation between two variables indicates the association between them is causal.
To help you with this concept, you should watch the video ‘The Question of Causation’,
which can be accessed through the link below. It is well worth 15 minutes of your time.
Establishing causality
To establish causality, you need to conduct an experiment. In an experiment, the value of
the explanatory variable is deliberately manipulated, while all other possible explanatory
variables are kept constant or controlled. A simplified version of an experiment is displayed below.
[Diagram: Randomly allocate a group of students to two groups → Group 1: Treatment 1 (lesson on time series); Group 2: Treatment 2 (lesson on Shakespeare) → Give test on time series]
In this experiment, a class of students is randomly allocated into two groups. Random
allocation ensures that both groups are as similar as possible.
Next, group 1 is given a lesson on time series (treatment 1), while group 2 is given a lesson
on Shakespeare (treatment 2). Both lessons are given under the same classroom conditions.
When both groups are given a test on time series the next day, group 1 does better than
group 2.
We then conclude that this was because the students in group 1 were given a lesson on time series.
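Random allocation can be sketched in a few lines of Python. This minimal illustration assumes a hypothetical class list of 10 students and a helper named `randomly_allocate`. It uses only the standard `random` module: it shuffles a copy of the list and splits it in half, so chance, not choice, decides who receives each treatment.

```python
import random


def randomly_allocate(students, seed=None):
    """Shuffle a copy of the class list and split it into two groups.

    Random allocation spreads differences between students (ability,
    interest, prior knowledge) across both groups by chance.
    """
    rng = random.Random(seed)   # seed only makes the example repeatable
    shuffled = list(students)   # copy, so the original list is unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical class of 10 students
class_list = [f"student_{i}" for i in range(1, 11)]
group_1, group_2 = randomly_allocate(class_list, seed=1)
print("Treatment 1 (time series):", group_1)
print("Treatment 2 (Shakespeare):", group_2)
```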
[Diagram: Temperature causes Sunscreen and causes Fainting, producing the observed association between them]
Unfortunately, being able to attribute an association to a single third variable is the exception
rather than the rule. More often than not, the situation is more complex.
Confounding variables
Statistics show that crime rates and unemployment rates in a city are strongly correlated. Can
you then conclude that a decrease in unemployment will lead to a decrease in crime rates?
It might, but other possible causal explanations could be found. For example, these data
were collected during an economic downturn. Perhaps the state of the economy caused the
problem. See the diagram below.
[Diagram: the Economy and unemployment as two possible causes (?) of the observed association with crime rates]
In this situation, we have at least two possible causal explanations for the observed
association, but we have no way of disentangling their separate effects. When this happens,
the effects of the two possible explanatory variables are said to be confounded, because we
have no way of knowing which is the actual cause of the association.
It turns out that there is a strong correlation (r = 0.99) between the consumption of
margarine and the divorce rate in the American state of Maine. Can we conclude that eating
margarine causes people in Maine to divorce?
A better explanation is that this association is purely coincidental.
Occasionally, it is almost impossible to identify any feasible confounding variables to
explain a particular association. In these cases we often conclude that the association is
‘spurious’ and that it has just happened by chance. We call this coincidence.
However suggestive a strong association may be, this alone does not provide sufficient
evidence for you to conclude that two variables are causally related. Unless the association
is totally spurious and devoid of meaning, it will always be possible to find at least one
variable ‘lurking’ in the background that could explain the association.
Exercise 2H
2 There is a clear positive correlation between the number of churches in a town and
the amount of alcohol consumed by its inhabitants. Does this mean that religion is
encouraging people to drink? What common cause might counter this conclusion?
3 There is a strong positive correlation between the amount of ice-cream consumed and
the number of drownings each day. Does this mean that eating ice-cream at the beach
is dangerous? What common cause might explain this association?
4 The number of days a patient stays in hospital is positively correlated with the number
of beds in the hospital. Can it be said that bigger hospitals encourage patients to stay
longer than necessary just to keep their beds occupied? What common cause might
counter this conclusion?
5 Suppose we found a high correlation between smoking rates and heart disease across
a group of countries. Can we conclude that smoking causes heart disease? What
confounding variable(s) could equally explain this correlation?
6 There is a strong correlation between cheese consumption and the number of people
who died after becoming tangled in their bed sheets. What do you think is the most
likely explanation for this correlation?
7 There is a strong positive correlation between the number of fire trucks attending a
house fire and the amount of damage caused by the fire. Is the amount of damage
in a house fire caused by the fire trucks? What common cause might explain this association?
2I Which graph?
When investigating associations your first decision is choosing an appropriate graph to
display and understand the data you have been given. This decision depends on the type
of variables involved – that is, whether they are both categorical, one categorical and one
numerical, or both numerical.
The following guidelines might help you make your decision. They are guidelines only,
because in some instances there may be more than one suitable graph.
Type of variables
Response variable    Explanatory variable                 Graph
Categorical          Categorical                          Segmented bar chart
Numerical            Categorical                          Parallel boxplots, parallel dot plots
Numerical            Categorical (two categories only)    Back-to-back stem plot, parallel dot plots or parallel boxplots
Numerical            Numerical                            Scatterplot
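The guideline table can also be expressed as a small lookup function. This is a sketch only: the name `suggest_graphs` and its string arguments are hypothetical. As the guidelines note, more than one graph may be suitable, so the function returns a list. The table has no row for a categorical response with a numerical explanatory variable, so the sketch raises an error in that case rather than guess.

```python
from typing import List, Optional


def suggest_graphs(response: str, explanatory: str,
                   n_categories: Optional[int] = None) -> List[str]:
    """Return suitable graphs for an association, following the guideline table.

    response, explanatory: "categorical" or "numerical".
    n_categories: number of categories of a categorical explanatory variable.
    """
    if response == "categorical" and explanatory == "categorical":
        return ["segmented bar chart"]
    if response == "numerical" and explanatory == "categorical":
        if n_categories == 2:
            # Two categories only: a back-to-back stem plot also works
            return ["back-to-back stem plot", "parallel dot plots",
                    "parallel boxplots"]
        return ["parallel boxplots", "parallel dot plots"]
    if response == "numerical" and explanatory == "numerical":
        return ["scatterplot"]
    raise ValueError("The guideline table has no row for this combination")


# Weight (numerical RV) predicted from height (numerical EV)
print(suggest_graphs("numerical", "numerical"))
```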
Exercise 2I
1 Which graphical display (parallel boxplots, parallel dot plots, back-to-back stem plot, a
segmented bar chart or a scatterplot) would be appropriate to display the relationships
between the following? There may be more than one appropriate graph.
a vegetarian (yes, no) and sex (male, female)
b mark obtained on a statistics test and time spent studying (in hours)
c number of hours spent at the beach each year and state of residence
d number of CDs purchased per year and income (in dollars)
e runs scored in a cricket game and number of ‘overs’ faced
f attitude to compulsory sport in school (agree, disagree, no opinion) and school type
(government, independent)
g income level (high, medium, low) and place of residence (urban, rural)
h number of cigarettes smoked per day and sex (male, female)
Key ideas and chapter summary
Bivariate data: Bivariate data are generated when information about two variables is recorded for each subject.
Two-way frequency tables: Two-way frequency tables are used as the starting point for investigating the association between two categorical variables.
Segmented bar charts: A segmented bar chart can be used to graphically display the information contained in a two-way frequency table. It is a useful tool for identifying relationships between two categorical variables.
Correlation and causation: A correlation between two variables does not automatically imply that the association is causal. Alternative non-causal explanations for the association include a common response to a common third variable, a confounding variable or simply coincidence.
Skills checklist
Download this checklist from the Interactive Textbook, then print it and fill it out to check your skills.
2C 8 I can use parallel dot plots to identify and describe the association between a
numerical variable and a categorical variable.
2C 9 I can use back-to-back stem plots to display and describe the association
between a numerical variable and a categorical variable.
2C 10 I can use parallel boxplots to display and describe the association between a
numerical variable and a categorical variable.
See Example 13, Example 14, Example 15, and Exercise 2E Question 2
Multiple-choice questions
The information in the following frequency table relates to Questions 1 to 4.
Plays sport Male Female
Yes 68 79
No 34
Total 102 175
[Percentaged segmented bar chart: reading for leisure (yes, no) by Year level (Year 10, Year 12)]
5 The percentage of Year 12 students who do not read for leisure is closest to:
A 10% B 25% C 30% D 75% E 90%
6 The results could be summarised in a two-way frequency table. Which of the following
frequency tables could match the percentaged segmented bar chart?
Year level
Read   Year 10   Year 12
Yes    31        45
No     47        66

Year level
Read   Year 10   Year 12
Yes    45        11
No     135       99

Year level
Read   Year 10   Year 12
Yes    75        90
No     25        10

Year level
Read   Year 10   Year 12
Yes    75        25
No     90        10

Year level
Read   Year 10   Year 12
Yes    40        8
No     38        5
8 The stem plots display the time taken (in minutes) for two groups of 12 people to
solve a complex puzzle. Before commencing the puzzle, the people were divided
into two groups and assigned a different activity. Group A were asked to exercise
vigorously for 10 minutes, while Group B were asked to meditate for 10 minutes.
Group A Group B
1|2 = 21 minutes 9 0 5 6 7 1|6 = 16 minutes
3 1 1 2 4
8 6 5 1 5 5 6
4 4 3 1 2 0 1 2 3
8 5 2
The information in the stem plots supports the contention that there is an association
between time and activity because:
A The median time for Group A is more than the median time for Group B.
B The ranges of times for both groups are approximately equal.
C The median time for Group B is more than the median time for Group A.
D Both distributions are approximately symmetric.
E Both distributions are negatively skewed.
The information in the following parallel boxplots relates to Questions 9 and 10.
[Parallel boxplots: battery life for brand A and brand B, on a scale from 10 to 60]
9 The variables battery life and brand are:
A both categorical variables
B a categorical and a numerical variable respectively
C a numerical and a categorical variable respectively
D both numerical variables
E neither a numerical nor a categorical variable
10 Which of the following statements (there may be more than one) support the contention
that battery life and brand are related?
I the median battery life for brand A is clearly higher than for brand B
II battery lives for brand B are more variable than for brand A
III the distribution of battery lives for brand A is symmetrical with outliers but
positively skewed for brand B
A I only B II only C III only D I and II only E I, II and III
11 The association between weight at age 21 (in kg) and weight at birth (in kg) is to be
investigated. The variables weight at age 21 and weight at birth are:
A both categorical variables
B a categorical and a numerical variable respectively
C a numerical and a categorical variable respectively
D both numerical variables
E neither numerical nor categorical variables
13 The association between weight at age 21 and weight at birth for a group of males is
found to be positive and linear, with a correlation coefficient of r = 0.58. For males, the
percentage of variation in weight at age 21 explained by the variation in weight at birth
is closest to:
A 0.34% B 24% C 34% D 58% E 76%
14 The variables response time to a drug and drug dosage are linearly associated, with
r = −0.9. From this information, we can conclude that:
A response times are –0.9 times the drug dosage
B response times decrease with decreased drug dosage
C response times decrease with increased drug dosage
D response times increase with increased drug dosage
E response times are 81% of the drug dosage
15 The birth weight and weight at age 21 of eight women are given in the table below.
Birth weight (kg) 1.9 2.4 2.6 2.7 2.9 3.2 3.4 3.6
Weight at 21 (kg) 47.6 53.1 52.2 56.2 57.6 59.9 55.3 56.7
The value of the correlation coefficient is closest to:
A 0.536 B 0.6182 C 0.7863 D 0.8232 E 0.8954
The correlation coefficient between heart weight and body weight in a group of mice is
r = 0.765.
18 Given that heart weight and body weight of mice are strongly correlated (r = 0.765),
we can conclude that:
A increasing the body weights of mice will decrease their heart weights
B increasing the body weights of mice will increase their heart weights
C increasing the body weights of mice will not change their heart weights
D heavier mice tend to have lighter hearts
E heavier mice tend to have heavier hearts
19 We wish to investigate the association between the variables weight (in kg) of young
children and level of nutrition (poor, adequate, good). The most appropriate graphical
display would be:
A a histogram B parallel boxplots C a segmented bar chart
D a scatterplot E a back-to-back stem plot
21 There is a strong linear positive correlation (r = 0.85) between the amount of garbage
recycled and salary level.
From this information, we can conclude that:
A the amount of garbage recycled can be increased by increasing people’s salaries
B the amount of garbage recycled can be increased by decreasing people’s salaries
C increasing the amount of garbage you recycle will increase your salary
D people on high salaries tend to recycle less garbage
E people on high salaries tend to recycle more garbage
22 There is a strong linear positive correlation (r = 0.95) between the marriage rate in
Kentucky and the number of people who drown falling out of a fishing boat.
From this information, the most likely conclusion we can draw from this correlation is:
A reducing the number of marriages in Kentucky will decrease the number of people
who drown falling out of a fishing boat
B increasing the number of marriages in Kentucky will increase the number of people
who drown falling out of a fishing boat
C this correlation is just coincidence, and changing the marriage rate will not affect
the number of people drowning in Kentucky in any way
D only married people in Kentucky drown falling out of a fishing boat
E stopping people from going fishing will reduce the marriage rate in Kentucky
a What are the variables shown in the table? Are they categorical or numerical?
b Determine the response and explanatory variables.
c How many drivers under the age of 30 had more than one accident?
d Convert the table values to percentages by calculating the column percentages.
e Use these percentages to comment on the statement: ‘Of drivers who had an
accident in the past year, younger drivers (age < 30) are more likely than older
drivers (age ≥ 30) to have had more than one accident.’
[Parallel boxplots: Conversation test score against Completed weeks of course (0 weeks, 6 weeks, 12 weeks)]
a The two variables are Completed weeks of course and Conversation test score.
Which is numerical and which is categorical?
b Use the boxplots to compare these distributions, and draw an appropriate conclusion
about the association between the number of weeks of the course completed and the
score in the conversation test. Quote appropriate statistics in your response.
3 The data below give the hourly pay rates (in dollars per hour) of 10 production-line
workers along with their years of experience on initial appointment.
Rate ($ /h) 22.57 25.78 28.84 27.37 27.23 24.64 28.95 33.35 29.68 33.99
Experience (years) 1 1 2 2 3 4 5 6 8 12
a Determine which variable is the explanatory variable and which is the response variable.
b Use a CAS calculator to construct a scatterplot of the data.
c Comment on direction, outliers, form and strength of any association revealed.
d Determine the value of the correlation coefficient (r) rounded to three decimal places.
e Determine the value of the coefficient of determination (r²), giving your answer as a
percentage rounded to one decimal place, and interpret.
4 In a study of the effects of meditation on the quality of sleep, a sample of 500 people
were asked to rate the quality of their sleep as ‘good’, ‘OK’ or ‘poor’ before and
after participating in a meditation course. Their responses are shown in the segmented bar chart below.
[Percentaged segmented bar chart: Quality of sleep (Good, OK, Poor) before and after the Meditation course]
a What percentage of people rated the quality of their sleep as ‘good’ before they
participated in the course?
b Does the segmented bar chart support the contention that for these people their
quality of sleep is associated with participation in the course? Justify your answer
by quoting appropriate percentages.
Chapter questions
• What is linear regression?
• What is a residual?
• What is a least squares line of best fit?
• How do you find the equation of the least squares line using summary statistics?
• How do you find the equation of the least squares line using technology?
• How do you interpret the intercept and slope of the least squares line?
• How do you use the equation of the least squares line to make predictions?
• How do you use the coefficient of determination in a regression analysis?
• What is a residual plot and how is it used?
• How do you report a regression analysis?
The process of modelling an association with a straight line is known as linear regression
and the resulting line is often called the regression line.
The equation of a line relating two variables x and y is of the form
y = a + bx
where a and b are constants. When the equation is written in this form:
a represents the y-coordinate of the point where the line crosses the y-axis (the y-intercept)
b represents the slope of the line.
In order to summarise any particular (x, y) data set, numerical values for a and b are needed
that will ensure the line passes close to the data. There are several ways in which the values
of a and b can be found.
The easiest way to fit a line to bivariate data is to construct a scatterplot and draw the line
‘by eye’. We do this by placing a ruler on the scatterplot so that it seems to follow the
general trend of the data. You can then use the ruler to draw a straight line. Unfortunately,
unless the points are very tightly clustered around a straight line, the results you get by using
this method will differ a lot from person to person.
The more mathematical approach to fitting a straight line to data is to use the least squares
method. This method assumes that the variables are linearly related, and works best when
there are no clear outliers in the data.
Some terminology
To explain the least squares method, we need to define several terms.
The assumptions for fitting a least squares line to data are the same as for using the
correlation coefficient, r. These are that:
the data are numerical, the association is linear and there are no clear outliers.
Note: The formula for the slope of the least squares regression line, $b = \dfrac{r s_y}{s_x}$, can be used to find the value of the correlation coefficient (r) when the slope is known. Rearranging it, the correlation coefficient (r) is given by
$$r = \dfrac{b\,s_x}{s_y}$$
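As a check on the rearranged formula, the Python sketch below uses the height (x) and weight (y) statistics of Example 1, which follows. It first computes a slope with b = r s_y / s_x and then recovers r with r = b s_x / s_y. The function name `correlation_from_slope` is an illustrative choice. Because s_x and s_y are always positive, b and r always have the same sign.

```python
def correlation_from_slope(slope: float, s_x: float, s_y: float) -> float:
    """Rearranged slope formula: r = b * s_x / s_y."""
    return slope * s_x / s_y


# Height (x) and weight (y) statistics from Example 1
s_x, s_y, r_given = 7.444, 7.594, 0.8502

slope = r_given * s_y / s_x          # b = r * s_y / s_x
r_back = correlation_from_slope(slope, s_x, s_y)
print(round(slope, 4), round(r_back, 4))  # 0.8673 0.8502
```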
If you do not correctly decide which is the explanatory variable (the x-variable) and
which is the response variable (the y-variable) before you start calculating the equation of
the least squares regression line, you will get the wrong answer.
Example 1 Determining the equation of the least squares regression line using
summary statistics and the correlation coefficient
The height and weight of 11 people have been recorded, and the values of the following
statistics determined:
height weight
mean 173.3 cm 65.45 kg
standard deviation 7.444 cm 7.594 kg
correlation coefficient r = 0.8502
Use the formula to determine the equation of the least squares regression line that enables
weight to be predicted from height. Calculate the values of the slope and intercept
rounded to two decimal places.
Explanation Solution
1 Identify and write down the explanatory variable (EV) and the response variable (RV). Label as x and y, respectively.
   EV: height (x)
   RV: weight (y)
2 Write down the given information.
   $\bar{x} = 173.3$, $s_x = 7.444$
   $\bar{y} = 65.45$, $s_y = 7.594$
   $r = 0.8502$
3 Calculate the slope.
   Slope: $b = \dfrac{r s_y}{s_x} = \dfrac{0.8502 \times 7.594}{7.444}$
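The slope from step 3 and the intercept a = ȳ − b x̄ can both be computed in a minimal Python sketch. The helper `least_squares_from_summary` is hypothetical; it simply applies the two formulas to the summary statistics given above.

```python
def least_squares_from_summary(mean_x, s_x, mean_y, s_y, r):
    """Intercept a and slope b of y = a + b x from summary statistics."""
    b = r * s_y / s_x           # slope
    a = mean_y - b * mean_x     # intercept: the line passes through (mean_x, mean_y)
    return a, b


# Example 1: height (x) and weight (y)
a, b = least_squares_from_summary(mean_x=173.3, s_x=7.444,
                                  mean_y=65.45, s_y=7.594, r=0.8502)
print(f"weight = {a:.2f} + {b:.2f} × height")  # weight = -84.86 + 0.87 × height
```

If you round b to 0.87 before calculating the intercept, you get about −85.32 instead. It is therefore safer to keep full calculator precision until the final step.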
Example 2 Determining the correlation coefficient using the slope of the least
squares regression line
Use the following information to find the value of the correlation coefficient r, rounded to
three significant figures.
Explanation Solution
1 Identify and write down the explanatory variable (EV) and the response variable (RV). Label as x and y, respectively.
   EV: hours studied (x)
   RV: exam score (y)
Height (x) 177 182 167 178 173 184 162 169 164 170 180
Weight (y) 74 75 62 63 64 74 57 55 56 68 72
Determine and graph the equation of the least squares regression line that will enable
weight to be predicted from height. Write the intercept and slope rounded to three
significant figures.
1 Start a new document by pressing ctrl + N.
2 Select Add Lists & Spreadsheet. Enter the data
into lists named height and weight, as shown.
3 Identify the explanatory variable (EV) and
the response variable (RV).
EV: height
RV: weight
Note: In saying that we want to predict weight from
height, we are implying that height is the EV.
4 Press ctrl + I and select Add Data & Statistics.
Construct a scatterplot with height (EV) on the
horizontal (or x-) axis and weight (RV) on the
vertical (or y-) axis.
Press menu > Settings and click the Diagnostics
box. Select OK to activate this feature for all
future documents. This will show the coefficient
of determination (r²) whenever a regression is performed.
Height (x) 177 182 167 178 173 184 162 169 164 170 180
Weight (y) 74 75 62 63 64 74 57 55 56 68 72
Determine and graph the equation of the least squares regression line that will enable
weight to be predicted from height. Write the intercept and slope rounded to three
significant figures.
1 Open the Statistics application
and enter the
data into columns labelled height
and weight.
2 Tap to open the Set
StatGraphs dialog box and
complete as shown.
Tap Set to confirm your selections.
3 Tap in the toolbar at the top of
the screen to plot the scatterplot in
the bottom half of the screen.
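The same least squares line can also be computed directly from the data, without a CAS calculator. This Python sketch assumes the formulas b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b x̄, which are equivalent to b = r s_y / s_x. The function name `least_squares` is an illustrative choice, and only the standard `statistics` module is used.

```python
from statistics import mean


def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line y = a + b x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx            # slope
    a = y_bar - b * x_bar      # intercept
    return a, b


height = [177, 182, 167, 178, 173, 184, 162, 169, 164, 170, 180]
weight = [74, 75, 62, 63, 64, 74, 57, 55, 56, 68, 72]

a, b = least_squares(height, weight)   # height is the EV, so it goes first
print(f"weight = {a:.3g} + {b:.3g} × height")  # weight = -84.8 + 0.867 × height
```

The summary statistics in Example 1 appear to be these same data, rounded. That rounding is why the intercept found from them (about −84.86) differs slightly from the one computed here.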
Exercise 3A
Basic ideas
1 What is a residual?
3 Write down the three assumptions we make about the association we are modelling
when we fit a least squares line to bivariate data.
x y
mean 10.65 19.91
standard deviation 5.162 6.619
correlation coefficient r = 0.7818
5 We wish to find the equation of the least squares regression line that enables pollution
level beside a freeway to be predicted from traffic volume.
a Which is the response variable (RV) and which is the explanatory variable (EV)?
b Use the formula to determine the equation of the least squares regression line that
enables the pollution level to be predicted from the traffic volume where:
Write the equation in terms of pollution level and traffic volume with the intercept
and slope rounded to two significant figures.
6 We wish to find the equation of the least squares regression line that enables life
expectancy in a country to be predicted from birth rate.
a Which is the response variable (RV) and which is the explanatory variable (EV)?
b Use the formula to determine the equation of the least squares regression line that
enables life expectancy to be predicted from birth rate, where:
Write the equation in terms of life expectancy and birth rate with the y-intercept and
slope rounded to two significant figures.
x y
mean 12.51 10.65
standard deviation 4.796 5.162
least squares equation y = 16.72 − 0.4847x
8 The equation of the least squares regression line that enables distance travelled by a car
(in 1000s of km) to be predicted from its age (in years) was found to be:
distance = 15.62 + 11.08 × age
a Which is the response variable (RV) and which is the explanatory variable (EV)?
b Use the following information to find the value of the correlation coefficient r,
rounded to three significant figures.
distance age
mean 78.0 5.63
standard deviation 42.6 3.64
9 The following questions relate to the formulas used to calculate the slope and intercept
of the least squares regression line.
a A least squares line is calculated and the slope is found to be negative. What does
this tell us about the sign of the correlation coefficient?
b The correlation coefficient is zero. What does this tell us about the slope of the least
squares regression line?
c The correlation coefficient is zero. What does this tell us about the intercept of the
least squares regression line?
Using a CAS calculator to determine the equation of the least squares line from data
10 The table shows the number of sit-ups and push-ups performed by six students.
Sit-ups (x) 52 15 22 42 34 37
Push-ups (y) 37 26 23 51 31 45
Let the number of sit-ups be the explanatory (x) variable. Use your calculator to show
that the equation of the least squares regression line is:
push-ups = 16.5 + 0.566 × sit-ups (rounded to three significant figures)
11 The table shows average hours worked and university participation rates (%) in six
Hours 35.0 43.0 38.2 39.8 35.6 34.8
Rate 26 20 36 25 37 55
Use your calculator to show that the equation of the least squares regression line that
enables participation rates to be predicted from hours worked is:
rate = 130 − 2.6 × hours (rounded to two significant figures)
12 The table shows the number of runs scored and balls faced by batsmen in a cricket
Runs (y) 27 8 21 47 3 15 13 2 15 10 2
Balls faced (x) 29 16 19 62 13 40 16 9 28 26 6
a Use your calculator to show that the equation of the least squares regression line
enabling runs scored to be predicted from balls faced is:
y = −2.6 + 0.73x
b Rewrite the regression equation in terms of the variables involved.
13 The table below shows the number of TVs and cars owned (per 1000 people) in six
15 The statistical analysis of the set of bivariate data involving variables x and y resulted
in the information displayed in the table below:
x y
mean 32.5 88.1
standard deviation 3.42 6.84
least squares equation y = −2.56 + 1.45x
Using this information, the value of the correlation coefficient r for this set of bivariate
data is closest to:
A 0.73 B 0.34 C 0.50 D 0.53 E 0.78
16 A retailer recorded the number of ice creams sold and the day’s maximum temperature
over 8 consecutive Saturdays one summer.
Temperature (◦ C) 22 25 36 34 21 28 41 31
Number of ice creams sold 145 155 200 198 150 179 230 180
The equation of the least squares regression line fitted to the data is closest to:
A number of ice-creams = 4.08 + 58.2× temperature
B number of ice-creams = −12.9 + 0.237× temperature
C number of ice-creams = 58.2 + 4.08× temperature
D temperature = 3.57 + 72.3× number of ice-creams
E temperature = −12.8 + 0.237× number of ice-creams
Suppose, for example, that we wish to investigate the nature of the association between the
price of a secondhand car and its age. The ultimate aim is to find a mathematical model that
will enable the price of a secondhand car to be predicted from its age.
The age (in years) and price (in dollars) of a selection of secondhand cars of the same brand
and model have been collected and are recorded in a table (shown).
Age (years) Price (dollars) Age (years) Price (dollars) Age (years) Price (dollars)
1 32 500 3 22 000 5 18 400
1 30 500 4 22 000 6 6 500
2 25 600 4 23 000 7 6 400
3 20 000 4 19 200 7 8 500
3 24 300 5 16 000 8 4 200
We start our investigation of the association between price and age by constructing a
scatterplot and using it to describe the association in terms of strength, direction and form.
In this analysis, age is the explanatory variable.
From the scatterplot, we see that there is a strong, negative, linear association between the price of the car and its age. There are no clear outliers. The correlation coefficient is r = −0.9643.
[Scatterplot: Price (dollars) against Age (years)]
The equation of the least squares regression line from these data is:
price = 35 100 − 3940 × age
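The intercept, slope and correlation coefficient quoted for the car data can be checked with a short Python sketch. It uses the fifteen (age, price) pairs from the table above, with age as the explanatory variable, and applies the standard formulas for b, a and r. The variable names are illustrative.

```python
import math
from statistics import mean

# Car data from the table: age (years) is the EV, price (dollars) is the RV
age = [1, 1, 2, 3, 3, 3, 4, 4, 4, 5, 5, 6, 7, 7, 8]
price = [32500, 30500, 25600, 20000, 24300, 22000, 22000, 23000, 19200,
         16000, 18400, 6500, 6400, 8500, 4200]

x_bar, y_bar = mean(age), mean(price)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, price))
s_xx = sum((x - x_bar) ** 2 for x in age)
s_yy = sum((y - y_bar) ** 2 for y in price)

b = s_xy / s_xx                    # slope
a = y_bar - b * x_bar              # intercept
r = s_xy / math.sqrt(s_xx * s_yy)  # correlation coefficient

print(f"intercept = {a:.0f}, slope = {b:.0f}")
print(f"r = {r:.4f}, r^2 = {r * r:.1%}")
```

Rounded to three significant figures, the intercept (about 35 134) and slope (about −3935) give price = 35 100 − 3940 × age. The value r² ≈ 0.93 corresponds to the 93% quoted in the report on these cars.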
Consider again the least squares regression line relating the age of a car to its price:
price = 35100 − 3940 × age
The two key values in this mathematical model are the intercept (35100) and the slope
(−3940). The interpretation of these values is discussed in the following example.
price = 35 100 − 3940 × age
a Interpret the slope in terms of the variables price and age.
b Interpret the intercept in terms of the variables price and age.
[Scatterplot: Price (dollars) against Age (years), with the regression line]
Explanation Solution
a The slope predicts the average change (increase/decrease) in the price for each 1-year increase in the age. Because the slope is negative, it will be a decrease.
   On average, for each additional year of age the price of these cars decreases by $3940.
b The intercept predicts the value of the price of the car when age equals 0; that is, when the car is new.
   On average, the price of these cars when new was $35 100.
The equation of a regression line that enables the price of a second-hand car to be
predicted from its age is:
price = 35 100 − 3940 × age
Use this equation to predict the price of a car that is 5.5 years old.
Explanation Solution
There are two ways this can be done.
One is to draw a vertical arrow at age = 5.5 up to the regression line and then across to the price axis as shown, to get an answer of around $14 000.
A more accurate answer is obtained by substituting age = 5.5 into the equation to obtain $13 430, as shown below.
   price = 35 100 − 3940 × 5.5
         = $13 430
[Graph: regression line of Price (dollars) against Age (years), with the point (5.5, 13 430) marked]
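Predictions from the regression equation can be scripted. In this sketch the helper names `predict_price` and `is_interpolation` are hypothetical. The data range of 1 to 8 years comes from the table of car ages. A prediction inside that range is an interpolation, and one outside it is an extrapolation.

```python
def predict_price(age_years: float) -> float:
    """Predicted price ($) from the regression line price = 35 100 - 3940 x age."""
    return 35_100 - 3_940 * age_years


def is_interpolation(x: float, x_min: float = 1, x_max: float = 8) -> bool:
    """True when x lies inside the range of ages in the data (1 to 8 years)."""
    return x_min <= x <= x_max


for years in (5.5, 12):
    kind = "interpolation" if is_interpolation(years) else "extrapolation"
    print(f"age {years}: predicted price ${predict_price(years):,.0f} ({kind})")
```

The 12-year prediction comes out negative. This shows that extrapolating well beyond the data can give results that make no sense.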
a amount spent on education: r² = 0.26² = 6.8%
   student : teacher ratio: r² = (−0.38)² = 14.4%
b The variable student : teacher ratio explains 14.4% of the variation in educational
attainment, making it a more important explanatory variable than the amount spent on
education which explains only 6.8%.
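Squaring each correlation coefficient and expressing it as a percentage takes only a couple of lines. This Python sketch assumes the two correlation coefficients used in the solution above: 0.26 for the amount spent on education and −0.38 for the student : teacher ratio, each correlated with educational attainment.

```python
# Correlation of each explanatory variable with educational attainment
correlations = {
    "amount spent on education": 0.26,
    "student : teacher ratio": -0.38,
}

for variable, r in correlations.items():
    # Squaring removes the sign, so a negative r can still explain a lot
    print(f"{variable}: r^2 = {r ** 2:.1%}")

# The variable with the larger r^2 explains more of the variation
best = max(correlations, key=lambda v: correlations[v] ** 2)
print("Explains more variation:", best)
```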
Residual plot
A residual plot is a graph of the residuals (plotted on the vertical axis) against the
explanatory variable (plotted on the horizontal axis), where:
Residual value = actual data value − predicted data value
The actual price of the 6-year-old car is $6500. Calculate the residual when its price is
predicted using the regression equation: price = 35100 − 3940 × age
Explanation Solution
1 Write down the actual price.
   Actual price: $6500
2 Determine the predicted price using the least squares regression equation: price = 35 100 − 3940 × age
   Predicted price = 35 100 − 3940 × 6 = $11 460
3 Determine the residual.
   Residual = actual − predicted
            = $6500 − $11 460
            = −$4960
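The residual for every car can be found in the same way. The Python sketch below uses the fifteen (age, price) pairs from the car table and the rounded equation price = 35 100 − 3940 × age. For comparison, it then refits the unrounded least squares line to show that its residuals average to zero. The variable names are illustrative.

```python
from statistics import mean

age = [1, 1, 2, 3, 3, 3, 4, 4, 4, 5, 5, 6, 7, 7, 8]
price = [32500, 30500, 25600, 20000, 24300, 22000, 22000, 23000, 19200,
         16000, 18400, 6500, 6400, 8500, 4200]


def predicted_price(years):
    """Rounded regression equation: price = 35 100 - 3940 x age."""
    return 35_100 - 3_940 * years


# Residual = actual value - predicted value
residuals = [p - predicted_price(x) for x, p in zip(age, price)]
print("Residual for the 6-year-old car:", residuals[age.index(6)])  # -4960

# Refit the unrounded least squares line and check its residuals average zero
x_bar, y_bar = mean(age), mean(price)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, price)) / sum(
    (x - x_bar) ** 2 for x in age)
a = y_bar - b * x_bar
exact_residuals = [y - (a + b * x) for x, y in zip(age, price)]

print(f"Mean residual (rounded equation): {mean(residuals):.2f}")
print(f"Mean residual (exact least squares line): {mean(exact_residuals):.2e}")
```

With the rounded equation the mean residual is about $55 rather than zero, because the coefficients were rounded. The mean is zero only for the unrounded least squares line.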
plot. Because the mean of the residuals is always zero, we will construct the horizontal axis for the plot at zero (indicated by the red line) as shown.
[Residual plot: residuals against Age (years)]
[Scatterplot of y against x with the line y = 6 − x, and the corresponding residual plot]
By contrast, the relationship shown in the following scatterplot is clearly non-linear. Fitting
a straight line to the data results in the residual plot shown. While there is some random
behaviour, there is also a clearly identifiable curve in the residual plot.
[Scatterplot of y against x showing a clearly non-linear relationship, and the residual plot from fitting a straight line]
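A curved pattern in a residual plot can be reproduced with a small made-up data set. In this Python sketch the data follow y = x² exactly, a hypothetical and clearly non-linear relationship. Fitting a least squares line and listing the residuals gives positive values at both ends and negative values in the middle, which is the curve a residual plot would reveal.

```python
from statistics import mean

xs = list(range(11))
ys = [x ** 2 for x in xs]   # hypothetical curved (non-linear) data

# Fit the least squares line y = a + b x
x_bar, y_bar = mean(xs), mean(ys)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

# Residual = actual - predicted; the signs run +, -, + (a curve, not random scatter)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print([round(e) for e in residuals])
# [15, 6, -1, -6, -9, -10, -9, -6, -1, 6, 15]
```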
Which of the following residual plots would call into question the assumption of linearity
in a regression analysis? Give reasons for your answers.
[Four residual plots, A to D, each showing residuals (between −1 and 1) against x (from 0 to 10)]
Explanation Solution
Examine each plot, looking for a pattern or structure in the residuals.
   Plot A – residuals look random, so the linearity assumption is met.
   Plot B – there is a clear curve in the residuals, so the linearity assumption is not met.
   Plot C – residuals look random, so the linearity assumption is met.
   Plot D – there is a clear curve in the residuals, so the linearity assumption is not met.
Construct a report to describe the association between the price and age of secondhand cars.
From the scatterplot we see that there is a strong, negative, linear association between the
price of a second-hand car and its age, r = −0.964. There are no obvious outliers.
The equation of the least squares regression line is: price = 35 100 − 3940 × age.
The slope of the regression line predicts that, on average, the price of these
second-hand cars decreased by $3940 each year.
The intercept predicts that, on average, the price of these cars when new was $35 100.
The coefficient of determination indicates that 93% of the variation in the price of these
second-hand cars is explained by the variation in their age.
The lack of a clear pattern in the residual plot confirms the assumption of a linear
association between the price and the age of these second-hand cars.
Exercise 3B
Some basics
1 Use the line on the scatterplot opposite to determine the equation of the regression line in
terms of the variables, mark and days absent. Give the intercept correct to the nearest whole number.
[Scatterplot: mark against Days absent, with a regression line]
3 The following regression equation can be used to predict a company’s weekly sales ($)
from their weekly online advertising expenditure ($).
sales = 575 + 4.85 × expenditure
a Write down the value of the intercept, and interpret this value in the context of the
variables in the equation.
b Write down the value of the slope, and interpret this value in the context of the
variables in the equation.
5 When preparing between 25 and 100 meals, a hospital’s cost (in dollars) is given by the
equation:
cost = 487.50 + 6.70 × meals
Use this equation to predict the cost (to the nearest dollar) of preparing the following
meals. Are you interpolating or extrapolating?
a 0 meals b 80 meals c 110 meals
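Question 5 can be checked by evaluating the cost equation directly and flagging whether each number of meals lies inside the stated range of 25 to 100 meals. The Python sketch below does this; the function names are our own, and a half-up rounding helper is used because Python’s built-in round() sends halves to the nearest even number.

```python
import math

# Cost equation from Question 5, stated to hold for 25 to 100 meals.
INTERCEPT = 487.50   # dollars
SLOPE = 6.70         # dollars per meal
MIN_MEALS, MAX_MEALS = 25, 100


def predict_cost(meals: float) -> float:
    """Predicted cost (dollars) from the regression equation."""
    return INTERCEPT + SLOPE * meals


def nearest_dollar(amount: float) -> int:
    """Round half up (the built-in round() would send 1224.5 to 1224)."""
    return math.floor(amount + 0.5)


def prediction_type(meals: float) -> str:
    """Inside the range of the data is interpolation; outside it is extrapolation."""
    if MIN_MEALS <= meals <= MAX_MEALS:
        return "interpolating"
    return "extrapolating"


for meals in (0, 80, 110):
    print(meals, nearest_dollar(predict_cost(meals)), prediction_type(meals))
```

Only the 80-meal prediction lies within the range of the data, so it is the only one that can be treated as reliable.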
6 For males with heights from 150 cm to 190 cm, the equation relating a son’s
height (in cm) to his father’s height (in cm) is:
son’s height = 83.9 + 0.525 × father’s height
Use this equation to predict (to the nearest cm) the adult height of a male whose father
is the following heights. State, with a reason, how reliable your predictions are in each case.
a 170 cm tall b 200 cm tall c 155 cm tall
9 For a 100 km trip, the equation of a regression line that enables fuel consumption of a
car (in litres) to be predicted from its weight (kg) is:
fuel consumption = −0.1 + 0.01 × weight
a Use this equation to predict (to one decimal place) the fuel consumption of a car
which weighs 980 kg.
b This car has an actual fuel consumption of 8.9 litres. What is the residual value for
this data point?
ISBN 978-1-009-11041-9 © Peter Jones et al 2023 Cambridge University Press
Photocopying is restricted under law and this material must not be transferred to another party.
3B 3B Using the least squares regression line to model a relationship 191
12 In an investigation of the association between the food energy content (in calories) and
the fat content (in g) in a standard-sized packet of chips, the least squares regression
line was found to be:
energy content = 27.8 + 14.7 × fat content, r² = 0.7569
a Write down the value of the intercept, and interpret this value in the context of the
variables in the equation.
b Write down the value of the slope, and interpret this value in the context of the
variables in the equation.
c Interpret the value of the coefficient of determination in terms of the variables
energy content and fat content.
d Use this equation to predict the energy content of a packet of chips which contains 8
grams of fat.
e If the actual energy content of a packet of chips containing 8 grams of fat is 132
calories, what is the value of the residual?
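Parts d and e of Question 12 come down to substituting into the equation and then applying residual = actual value − predicted value. A minimal Python sketch of that arithmetic (variable names are our own):

```python
# Least squares line from Question 12:
#   energy content = 27.8 + 14.7 x fat content,  r^2 = 0.7569
intercept = 27.8       # calories
slope = 14.7           # calories per gram of fat
r_squared = 0.7569

fat = 8                                   # grams of fat in the packet
predicted = intercept + slope * fat       # predicted energy content (calories)
actual = 132                              # observed energy content (calories)
residual = actual - predicted             # residual = actual value - predicted value

print(f"predicted energy content: {predicted:.1f} calories")
print(f"residual: {residual:.1f} calories")
print(f"variation in energy content explained by fat content: {r_squared:.2%}")
```

The negative residual shows that the line over-predicts the energy content of this packet.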
13 In an investigation of the association between the success rate (%) of sinking a putt and
the distance from the hole (in cm) of amateur golfers, the least squares regression line
was found to be:
success rate = 98.5 − 0.278 × distance, r² = 0.497
a Write down the slope of this regression equation and interpret it.
b Use the equation to predict the success rate when a golfer is 90 cm from the hole.
c At what distance (in metres) from the hole does the regression equation predict an
amateur golfer to have a 0% success rate of sinking the putt?
d Calculate the value of r, rounded to three decimal places.
e Write down the value of the coefficient of determination in percentage terms and interpret it.
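Parts b to d of Question 13 each take one line of arithmetic. The Python sketch below (our own variable names) shows them, including the step that is easy to miss in part d: r takes the sign of the slope, so the square root of r² must be given a negative sign here. The part c answer is an extrapolation, since it lies outside the distances that were actually measured.

```python
import math

# Least squares line from Question 13: success rate = 98.5 - 0.278 x distance
intercept = 98.5      # success rate (%) at distance 0
slope = -0.278        # change in success rate (%) per cm
r_squared = 0.497

# (b) predicted success rate 90 cm from the hole
success_90 = intercept + slope * 90

# (c) distance at which the predicted success rate is 0%: solve 0 = a + b * d
zero_distance_m = (-intercept / slope) / 100   # convert cm to metres

# (d) |r| = sqrt(r^2), and r takes the sign of the slope (negative here)
r = math.copysign(math.sqrt(r_squared), slope)

print(f"(b) {success_90:.1f}%   (c) {zero_distance_m:.2f} m   (d) r = {r:.3f}")
```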
14 The scatterplot opposite shows the pay rate (dollars per hour) paid by a company
to workers with different years of work experience. Using a calculator, the equation
of the least squares regression line is found to be:
y = 18.56 + 0.289x with r = 0.967
[Figure: scatterplot of pay rate ($) against experience (years)]
a Is it appropriate to fit a least squares regression line to the data? Why?
b Work out the coefficient of determination.
c What percentage of the variation in a person’s pay rate can be explained by the
variation in their work experience?
d Write down the equation of the least squares line in terms of the variables pay rate
and years of experience.
e Interpret the y-intercept in terms of the variables pay rate and years of experience.
What does the y-intercept tell you?
f Interpret the slope in terms of the variables pay rate and years of experience. What
does the slope of the regression line tell you?
g Use the least squares regression equation to:
i predict the hourly wage of a person with 8 years of experience
ii determine the residual value if the person’s actual hourly wage is $21.20.
h The residual plot for this regression analysis is shown opposite. Does the residual plot
support the initial assumption that the relationship between pay rate and years of
experience is linear? Explain your answer.
[Figure: residual plot against experience (years)]
15 The scatterplot opposite shows scores on a hearing test against age. In analysing the data, a
[Figure: scatterplot of hearing test score against age, and a residual plot]
Does the residual plot support the initial assumption that the relationship between hearing
test score and age is essentially linear? Explain your answer.
drug, the response time (in minutes) was
measured for different drug doses (in mg). A least
[Figure: scatterplot of response time against drug dose, and a residual plot]
Regression equation: y = a + bx
a = 55.8947
b = −9.30612
r² = 0.901028
r = −0.949225
Use this information to complete the following report. Call the two variables drug dose
and response time. In this analysis drug dose is the explanatory variable.
From the scatterplot we see that there is a strong relationship between
response time and ____: r = ____. There are no obvious outliers.
The equation of the least squares regression line is:
response time = ____ + ____ × drug dose
The slope of the regression line predicts that, on average, response time
increases/decreases by ____ minutes for a 1-milligram increase in drug dose.
The y-intercept of the regression line predicts that, on average, the response time
when no drug is administered is ____ minutes.
The coefficient of determination indicates that ____% of the
variation in ____ is explained by the variation in ____.
The residual plot shows a ____, calling into question the use of a linear equation
to describe the relationship between response time and drug dose.
bone) length and radius (the short thicker bone
[Figure: scatterplot and residual plot for this investigation]
Regression equation: y = a + bx
a = −7.24946
b = 0.739556
r² = 0.975291
r = 0.987568
Use the format of the report given in the previous question to summarise the findings of
this investigation. Call the two variables femur length and radius length.
19 The table below shows the life expectancy in years and the percentage of government
expenditure which is spent on health (health) in 10 countries.
Health 17.3 10.3 4.7 6.0 20.1 6.0 13.2 7.7 10.1 17.5
Life expectancy (years) 82 76 68 69 83 75 76 76 75 75
A least squares line that enables a country’s life expectancy to be predicted from
its expenditure on health is fitted to the data. The number of times that a country’s
predicted life expectancy is greater than its actual life expectancy is:
A 3 B 4 C 5 D 6 E 7
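A prediction that is greater than the actual value gives a negative residual, so Question 19 can be checked by fitting the line and counting negative residuals. A minimal Python sketch using the least squares formulas for the slope and intercept (the helper name is our own):

```python
health = [17.3, 10.3, 4.7, 6.0, 20.1, 6.0, 13.2, 7.7, 10.1, 17.5]   # EV (%)
life = [82, 76, 68, 69, 83, 75, 76, 76, 75, 75]                    # RV (years)


def least_squares(x, y):
    """Return (a, b) for the least squares line y = a + bx."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


a, b = least_squares(health, life)
residuals = [yi - (a + b * xi) for xi, yi in zip(health, life)]

# Negative residual <=> predicted value greater than actual value
over_predicted = sum(1 for e in residuals if e < 0)
print(f"life expectancy = {a:.2f} + {b:.3f} x health")
print("predicted > actual for", over_predicted, "countries")
```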
20 In a study of the association between the length in cm and weight in grams of a certain
species of fish the following least squares line was obtained:
weight = −329 + 23.3 × length
Which one of the following is a conclusion that can be made from this least squares line?
A On average, the weight of the fish increased by 23.3 grams for each centimetre
increase in length.
B On average, the length of the fish increased by 23.3 cm for each one gram increase
in weight.
C On average, the weight of the fish decreased by 329 grams for each
centimetre increase in length.
D The equation cannot be correct as the weight of the fish can never be negative.
E The weight of the fish in grams can be determined by subtracting 305.7 from their
Birth rate 30 38 38 43 34 42 31 32 26 34
Life expectancy (years) 66 54 43 42 49 45 64 61 61 66
1 Write down the explanatory variable (EV) and response variable (RV). Use
the variable names birth and life.
EV: birth
RV: life
2 Start a new document by pressing ctrl + N.
Select Add Lists & Spreadsheet.
Enter the data into the lists named birth
and life, as shown.
Inspect the plot and write your conclusion.
The random residual plot suggests linearity.
Note: When you performed a regression analysis earlier, the residuals were calculated automatically
and stored in list3. The residual plot is a scatterplot with list3 on the vertical axis and birth on the
horizontal axis.
Exercise 3C
1 The table below shows the scores obtained by nine students on two tests. We want to
be able to predict test B scores from test A scores.
Use your calculator to perform each of the following steps of a regression analysis.
a Construct a scatterplot. Name the variables test a and test b.
b Determine the equation of the least squares line along with the values of r and r².
2 The table below shows the number of careless errors made on a test by nine students.
Also given are their test scores. We want to be able to predict test score from the
number of careless errors made.
Test score 18 15 9 12 11 19 11 14 16
Careless errors 0 2 5 6 4 1 8 3 1
Use your calculator to perform each of the following steps of a regression analysis.
a Construct a scatterplot. Name the variables score and errors.
b Determine the equation of the least squares line along with the values of r and r².
Write answers rounded to three significant figures.
c Display the regression line on the scatterplot.
d Obtain a residual plot.
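The calculator output for Question 2 can be cross-checked with a short Python sketch. It computes the slope and intercept from sums of squares about the means (equivalent to using $b = r s_y/s_x$ and $a = \bar{y} - b\bar{x}$); the variable names are our own, and it is meant as a check, not as a replacement for the CAS steps.

```python
import math

# Exercise 3C, Question 2: predict test score (RV) from careless errors (EV)
errors = [0, 2, 5, 6, 4, 1, 8, 3, 1]
score = [18, 15, 9, 12, 11, 19, 11, 14, 16]

n = len(errors)
x_bar = sum(errors) / n
y_bar = sum(score) / n

# Sums of squares and cross-products about the means
s_xx = sum((x - x_bar) ** 2 for x in errors)
s_yy = sum((y - y_bar) ** 2 for y in score)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(errors, score))

b = s_xy / s_xx                    # slope
a = y_bar - b * x_bar              # intercept
r = s_xy / math.sqrt(s_xx * s_yy)  # Pearson's correlation coefficient

# Three significant figures, as the question asks
print(f"score = {a:.3g} + ({b:.3g}) x errors")
print(f"r = {r:.3g}, r^2 = {r * r:.3g}")
```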
3 How well can we predict an adult’s weight from their birth weight? The weights of 12
adults were recorded, along with their birth weights. The results are shown.
Birth weight (kg) 1.9 2.4 2.6 2.7 2.9 3.2 3.4 3.4 3.6 3.7 3.8 4.1
Adult weight (kg) 47.6 53.1 52.2 56.2 57.6 59.9 55.3 58.5 56.7 59.9 63.5 61.2
a In this investigation, which would be the RV and which would be the EV?
b Construct a scatterplot.
c Use the scatterplot to:
i comment on the association between adult weight and birth weight in terms of
direction, outliers, form and strength
ii estimate the value of Pearson’s correlation coefficient, r.
d Determine the equation of the least squares regression line, the coefficient of
determination and the value of Pearson’s correlation coefficient, r. Write answers
rounded to three significant figures.
e Interpret the coefficient of determination in terms of adult weight and birth weight.
f Interpret the slope in terms of adult weight and birth weight.
g Use the regression equation to predict the weight of an adult with a birth weight of:
i 3.0 kg ii 2.5 kg iii 3.9 kg.
Give answers correct to one decimal place.
h It is generally considered that birth weight is a ‘good’ predictor of adult weight. Do
you think the data support this contention? Explain.
i Construct a residual plot and use it to comment on the appropriateness of assuming
that adult weight and birth weight are linearly associated.
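A residual plot is just the residuals plotted against the EV. The Python sketch below (names are our own) computes the points for the residual plot in part i. It also illustrates a property of any least squares line with an intercept: the residuals sum to zero, which is why a residual plot is centred on zero.

```python
# Exercise 3C, Question 3: adult weight (RV) against birth weight (EV), in kg
birth = [1.9, 2.4, 2.6, 2.7, 2.9, 3.2, 3.4, 3.4, 3.6, 3.7, 3.8, 4.1]
adult = [47.6, 53.1, 52.2, 56.2, 57.6, 59.9, 55.3, 58.5, 56.7, 59.9, 63.5, 61.2]

n = len(birth)
x_bar = sum(birth) / n
y_bar = sum(adult) / n
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(birth, adult)) / sum(
    (x - x_bar) ** 2 for x in birth
)
a = y_bar - b * x_bar

# Residual = actual value - predicted value; these are the points of the residual plot
residuals = [y - (a + b * x) for x, y in zip(birth, adult)]
for x, e in zip(birth, residuals):
    print(f"birth weight {x:.1f} kg   residual {e:+.2f} kg")

print("sum of residuals:", round(sum(residuals), 10))
```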
Bivariate data Bivariate data are data in which each observation involves recording
information about two variables for the same person or thing. An
example would be the heights and weights of the children in a
Linear regression The process of fitting a line to data is known as linear regression. The
association can then be described by a rule of the form y = a + bx
In this equation:
y is the response variable
x is the explanatory variable
a is the y-intercept
b is the slope of the line.
Residuals The vertical distance from a data point to the straight line is called a
residual: residual value = data value − predicted value.
Least squares method The least squares method is one way of finding the equation of a
regression line. It minimises the sum of the squares of the residuals. It
works best when there are no outliers.
Determining the values of a and b from the formulas The equation of the least squares regression line is given by y = a + bx, where:
the slope (b) is given by $b = r\dfrac{s_y}{s_x}$
the intercept (a) is then given by $a = \bar{y} - b\bar{x}$
r is the correlation coefficient
$s_x$ and $s_y$ are the standard deviations of x and y
$\bar{x}$ and $\bar{y}$ are the mean values of x and y.
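The two formulas above translate directly into code. A minimal Python sketch (the function name is our own), applied to the femur length and height summary statistics that appear in Question 4 of the chapter review:

```python
def least_squares_from_summary(x_mean, y_mean, s_x, s_y, r):
    """Return (a, b) for y = a + bx, using b = r * s_y / s_x and a = y_mean - b * x_mean."""
    b = r * s_y / s_x
    a = y_mean - b * x_mean
    return a, b


# Femur length (x, cm) and height (y, cm) summary statistics
a, b = least_squares_from_summary(
    x_mean=24.246, y_mean=166.092, s_x=1.873, s_y=10.086, r=0.9939
)
print(f"height = {a:.3g} + {b:.3g} x femur length")
```

The slope must be calculated first, because the intercept formula uses it.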
Interpreting the intercept and slope For the regression line y = a + bx:
the slope (b) tells us on average the change in the response variable
(y) for each one-unit increase or decrease in the explanatory variable (x)
the intercept (a) tells us on average the value of the response variable
(y) when the explanatory variable (x) equals 0.
Consider, for example, the regression line
cost = 1.20 + 0.06 × number of pages
The slope of the regression line tells us that on average the cost of a
textbook increases by 6 cents ($0.06) for each additional page.
The intercept of the line tells us that, on average, a book with no
pages costs $1.20 (this might be the cost of the cover).
Interpolation and extrapolation Predicting within the range of the values of the explanatory variable is
called interpolation, and will give a reliable prediction.
Predicting outside the range of the values of the explanatory variable is
called extrapolation, and will give an unreliable prediction.
Residual plots Residual plots can be used to test the linearity assumption by plotting
the residuals against the EV.
A residual plot that appears to be a random collection of points
clustered around zero supports the linearity assumption.
A residual plot that shows a clear pattern indicates that the association is
not linear.
Skills checklist
Download this checklist from the Interactive Textbook, then print it and fill it out to check
your skills.
3A 1 I can determine the equation of the least squares regression line using the formulas.
3A 2 I can determine the correlation coefficient from the slope of the least squares
regression line using the formula.
3A 3 I can determine the equation of the least squares regression line using a CAS calculator.
3C 10 I can use a CAS calculator to generate all of the analyses required for a
regression analysis.
Multiple-choice questions
1 When using a least squares line to model a relationship displayed in a scatterplot, one
key assumption is that:
A there are two variables B the variables are related
C the variables are linearly related D r² > 0.5
E the correlation coefficient is positive
4 Given that b = 1.328, $s_x = 1.871$ and $s_y = 3.391$, the correlation coefficient, r, is closest to:
A 0.357 B 0.598 C 0.733 D 0.773 E 1.33
On average, for each additional km/hr of speed, the distance taken to come to a stop
A decreased by 1.14 metres B decreased by 0.79 metres
C increased by 1.14 metres D increased by 0.95 metres
E increased by 0.79 metres
7 The least squares regression line y = 8 − 9x predicts that, when x = 5, the value of y is:
A −45 B −37 C 37 D 45 E 53
10 Using a least squares regression line, the predicted value of a data value is 78.6. The
residual value is –5.4. The actual data value is:
A 73.2 B 84.0 C 88.6 D 94.6 E 424.4
11 The equation of the least squares line plotted on the scatterplot opposite is closest to:
[Figure: scatterplot of y against x with a least squares line]
A y = 8.7 − 0.9x
B y = 8.7 + 0.9x
C y = 0.9 − 8.7x
D y = 0.9 + 8.7x
E y = 8.7 − 0.1x
The following information relates to Questions 13 to 16.
Weight (in kg) can be predicted from height (in cm) from the regression line:
weight = −96 + 0.95 × height, with r = 0.79
15 Noting that the value of the correlation coefficient is r = 0.79, we can say that:
A 62% of the variation in weight can be explained by the variation in height
B 79% of the variation in weight can be explained by the variation in height
C 88% of the variation in weight can be explained by the variation in height
D 79% of the variation in height can be explained by the variation in weight
E 95% of the variation in height can be explained by the variation in weight
16 A person of height 179 cm weighs 82 kg. If the regression equation is used to predict
their weight, then the residual will be closest to:
A –8 kg B 3 kg C 8 kg D 9 kg E 74 kg
The following information relates to Questions 17 to 21.
The scatterplot shows the association between a student’s mark on a test
and the number of days absent during the term.
[Figure: scatterplot of mark against days absent, with a least squares line]
18 The median days absent for this group of students is closest to:
A 2 B 3 C 4 D 55 E 62.5
20 There were two students who were absent for 2 days that term. The values of the
residuals for these students are:
A 0 B 10 C 60 and 80 D −10 and 10 E −10
21 Using the graph of the least squares line, we predict that a student who is absent for 4
days would receive a mark of about:
A 48 B 51 C 62 D 65 E 67
22 The table below shows the weight in grams and the length in cm for a certain species of fish.
Length (cm) 13.5 14.3 16.3 17.5 18.4 19.0 19.0 19.8 21.2 23.0
Weight (g) 55 60 90 120 150 140 170 145 200 273
A least squares line that enables a fish’s weight to be predicted from its length is
fitted to the data. The number of times that a fish’s predicted weight is greater than
its actual weight is:
A 3 B 4 C 5 D 6 E 7
23 The value of the correlation coefficient r for these data is equal to 0.965. The
percentage of variation in fish weight which is not explained by the length of the fish is
closest to:
A 96.5% B 93.1% C 9.3% D 6.9% E 3.5%
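Questions 22 and 23 can be checked with a short Python sketch (names are our own). It confirms the quoted value of r from the data, then uses the fact that the proportion of variation not explained by the EV is 1 − r².

```python
import math

# Question 22 data: length (cm) is the EV, weight (g) is the RV
length = [13.5, 14.3, 16.3, 17.5, 18.4, 19.0, 19.0, 19.8, 21.2, 23.0]
weight = [55, 60, 90, 120, 150, 140, 170, 145, 200, 273]

n = len(length)
x_bar = sum(length) / n
y_bar = sum(weight) / n
s_xx = sum((x - x_bar) ** 2 for x in length)
s_yy = sum((y - y_bar) ** 2 for y in weight)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(length, weight))
r = s_xy / math.sqrt(s_xx * s_yy)

r_given = 0.965
explained = r_given ** 2          # coefficient of determination
unexplained = 1 - explained       # proportion of variation NOT explained by length

print(f"r from the data = {r:.3f}")
print(f"unexplained variation = {unexplained:.1%}")
```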
a Determine to the nearest whole number:
i the median age of these aircraft.
ii the mean and standard deviation of the airspeed of these aircraft.
To investigate the association between the number of seats and airspeed, a least squares
line is fitted to the data. The response variable in this investigation is airspeed.
b Determine and write down the equation of the least squares line in terms of number
of seats and airspeed. Round the intercept and slope to 3 significant figures.
c Determine and write down the percentage of variation in the airspeed that is
explained by the number of seats. Write the answer rounded to 1 decimal place.
2 In an investigation of the relationship between the hours of sunshine (per year) and
days of rain (per year) for 25 cities, the least squares regression line was found to be:
hours of sunshine = 2850 − 6.88 × days of rain, with r² = 0.484
Use this information to complete the following sentences.
a In this regression equation, the explanatory variable is ____.
b The slope is ____ and the intercept is ____.
c The regression equation predicts that a city that has 120 days of rain per year will
have ____ hours of sunshine per year.
d The slope of the regression line predicts that the hours of sunshine per year will
____ by ____ hours for each additional day of rain.
e r = ____, correct to three significant figures.
f ____% of the variation in sunshine hours can be explained by the variation in ____.
g One of the cities used to determine the regression equation had 142 days of rain and
1390 hours of sunshine.
i The regression equation predicts that it has ____ hours of sunshine.
ii The residual value for this city is ____ hours.
h Using a regression line to make predictions within the range of data used to
determine the regression equation is called ____.
4 We wish to find the equation of the least squares regression line that will enable height
(in cm) to be predicted from femur (thigh bone) length (in cm).
a Which is the RV and which is the EV?
b Use the summary statistics shown to determine the equation of the least squares regression
line that will enable height to be predicted from femur length.
Statistic femur length (cm) height (cm)
mean 24.246 166.092
standard deviation 1.873 10.086
correlation coefficient r = 0.9939
Write the equation in terms of height and femur length. Give the slope and intercept
rounded to three significant figures.
c Interpret the slope of the regression equation in terms of height and femur length.
d Determine the value of the coefficient of determination and interpret in terms of
height and femur length.
Arm span is also associated with height. A least squares regression line that can be
used to model this association is:
height = 0.498 + 0.926 × arm span
In determining this equation, the summary statistics displayed in the table
were also calculated.
Statistic arm span height
mean 169.615 166.092
standard deviation 10.761 10.086
e Determine the percentage of the variation in height explained by the variation in
arm span. Write the answer as a percentage rounded to one decimal place.
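Part e only needs the slope formula rearranged, $r = b\,s_x/s_y$, and then squared. A Python sketch of that arithmetic, using the statistics given (variable names are our own):

```python
# Question 4(e): height = 0.498 + 0.926 x arm span
b = 0.926          # slope
s_x = 10.761       # standard deviation of arm span (EV)
s_y = 10.086       # standard deviation of height (RV)

# Rearranging b = r * s_y / s_x gives r = b * s_x / s_y
r = b * s_x / s_y
r_squared = r ** 2

print(f"r = {r:.4f}")
print(f"variation in height explained by arm span = {r_squared:.1%}")
```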
scatterplot is shown below.
[Figure: scatterplot of height (cm) against age (years)]
Height (cm) 86.5 95.5 103.0 109.8 116.4 122.4 128.2 133.8 139.6 145.0
Age (years) 2 3 4 5 6 7 8 9 10 11
The task is to determine the equation of a least squares regression line that can be used
to predict height from age.
a In this analysis, which would be the RV and which would be the EV?
b Use the scatterplot to describe the association between age and height in terms of
strength and direction.
c Use your calculator to confirm that the equation of the least squares regression
line is: height = 76.64 + 6.366 × age and r = 0.9973.
d i Use the regression line to show that the predicted height of a one-year-old is 83.0
cm, rounded to 3 significant figures.
ii In making this prediction are you extrapolating or interpolating?
e Interpret the slope of the least squares line in terms of height and age.
f Determine the percentage of the variation in height of these children explained by
their age. Round your answer to 1 decimal place.
g Use the least squares regression equation to:
i predict the height of the 10-year-old child in this sample
ii determine the residual value for this child.
h i Confirm that the residual plot for this analysis is shown opposite.
[Figure: residual plot against age (years)]
ii Explain why this residual plot suggests that a linear equation is not the most
appropriate model for this association.
6 The heart rate (in beats/minute) was measured and recorded for a group of 13 students.
The students then completed the same set of exercises and had their heart rate measured
again immediately on completion. The scatterplot below shows the students’ heart
rate after exercise plotted against their heart rate before exercise, with a least squares
regression line fitted. Also shown is the residual plot for this line.
[Figure: scatterplot of heart rate after exercise against heart rate before exercise (beats/min), with least squares line; residual plot]
a Describe the association between heart rate before exercise and heart rate after
exercise in terms of strength, direction and form.
The equation of the least squares line is:
heart rate after exercise = 85.671 + 0.561 × heart rate before exercise
b i Use the equation to predict heart rate after exercise when heart rate before
exercise is 100 beats/minute. Round to the nearest whole number.
ii Are you extrapolating or interpolating?
c The person with a heart rate of 122 beats/minute after exercise had a heart rate of 76
beats/minute before exercise. If the least squares line is used to predict this person’s
heart rate after exercise, determine the value of the residual. Give your answer
rounded to one decimal place.
d i What assumption about the form of the association can be tested using a residual plot?
ii Referring to the residual plot, explain why this assumption is satisfied.
Chapter questions
• What is a squared transformation and when is it used?
• What is a log transformation and when is it used?
• What is a reciprocal transformation and when is it used?
• How do I interpret a least squares line fitted to transformed data?
• How do I use a least squares line fitted to transformed data for prediction?
• How do I use a residual plot to assess the effectiveness of a data transformation?
• How do I use the coefficient of determination to assess the effectiveness of
a data transformation?
You may recall from your study of Variation in General Mathematics 12 that
a non-linear association could be transformed into a linear association using
data transformation. The transformations introduced were the squared, log and
reciprocal transformations. In this chapter we consider the effect of each of these
three transformations when applied to one axis only (either x or y, but not both), using
them to linearise scatterplots. This is the first step towards solving problems involving
non-linear associations.
y²: Spreads out the high y-values relative to the lower y-values,
leaving the x-values unchanged.
This has the effect of straightening out curves like the one shown.
[Figure: example curve straightened by the y² transformation]
The following example shows how the x-squared transformation works in practice.
A base jumper leaps from the top of a cliff, 1560 metres above the valley floor. The
scatterplot below shows the height (in metres) of the base jumper above the valley floor
every second, for the first 10 seconds of the jump.
[Figure: scatterplot of height (metres) against time (seconds)]
a Apply a squared transformation to the variable time, and determine the least squares
regression line for the transformed data.
b Use the least squares equation to predict to the nearest metre the height of the base
jumper after 3.4 seconds.
a Applying the squared transformation involves changing the scale on the time axis to
time².
From the plot opposite we see that the association between height and time² is now linear.
[Figure: scatterplot of height (metres) against time²]
Now that we have a linearised scatterplot, we can use a least squares line to model the
association between height and time².
The equation of this line is:
height = 1560 − 4.90 × time²
b Like any regression line, we can use its equation to make predictions. After 3.4
seconds, we predict that the height of the base jumper is:
height = 1560 − 4.90 × 3.4² = 1503 m (to nearest m)
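The same analysis can be mirrored in a short Python sketch, assuming only the time and height data table given for this example in the CAS instructions. It squares the time values, fits a least squares line by the usual formulas and substitutes 3.4² for time². The names are our own, and the sketch is a check rather than a replacement for the CAS steps.

```python
# Base jumper data: time (s) is the EV, height (m) is the RV
time = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
height = [1560, 1555, 1540, 1516, 1482, 1438, 1383, 1320, 1246, 1163, 1070]

# x-squared transformation: replace time by time^2, leave height unchanged
time_sq = [t ** 2 for t in time]

n = len(time_sq)
x_bar = sum(time_sq) / n
y_bar = sum(height) / n
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(time_sq, height)) / sum(
    (x - x_bar) ** 2 for x in time_sq
)
a = y_bar - b * x_bar
print(f"height = {a:.0f} + ({b:.2f}) x time^2")

# Predict at t = 3.4 s: square the time first, then substitute
prediction = a + b * 3.4 ** 2
print(f"predicted height after 3.4 s: {prediction:.0f} m")
```

Substituting 3.4 itself in place of time² is a common slip; the line was fitted on the squared scale, so the input must be squared as well.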
The next example shows how the y-squared transformation works in practice.
a Apply a squared transformation to the variable yield,
and determine the least squares regression line for the
transformed data.
b Use the least squares equation to predict the yield of a plant given 6.5 mL of fertiliser,
giving your answer to 1 decimal place.
[Figure: scatterplot of yield against fertiliser (mL)]
a Applying the y-squared transformation involves changing the scale on the y-axis to
yield².
From the plot opposite we see that the association between yield² and fertiliser is now
linear, so we can use a least squares line to model the association between
yield² and fertiliser.
[Figure: scatterplot of yield² against fertiliser (mL)]
The equation of this line is:
yield² = 4.45 + 2.29 × fertiliser
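Part b follows the same pattern as the base jumper example, with one extra step: the line predicts yield², so the square root must be taken to return to the original scale. A Python sketch of that calculation (variable names are our own):

```python
import math

# Fitted line on the transformed scale: yield^2 = 4.45 + 2.29 x fertiliser
fertiliser = 6.5                          # mL
yield_sq = 4.45 + 2.29 * fertiliser       # prediction of yield^2, not of yield
predicted_yield = math.sqrt(yield_sq)     # undo the squared transformation
print(f"yield^2 = {yield_sq:.3f}, predicted yield = {predicted_yield:.1f}")
```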
Time 0 1 2 3 4 5 6 7 8 9 10
Height 1560 1555 1540 1516 1482 1438 1383 1320 1246 1163 1070
a Construct a scatterplot displaying height (the RV) against time (the EV).
b Apply an x-squared transformation and fit a least squares line to the transformed data.
c Use the regression line to predict the height of the base jumper after 3.4 seconds.
1 Start a new document by pressing ctrl + N.
2 Select Add Lists & Spreadsheet.
Enter the data into lists named time and height, as shown.
Time 0 1 2 3 4 5 6 7 8 9 10
Height 1560 1555 1540 1516 1482 1438 1383 1320 1246 1163 1070
a Construct a scatterplot displaying height (the RV) against time (the EV).
b Apply an x-squared transformation and fit a least squares line to the transformed data.
c Use the regression line to predict the height of the base jumper after 3.4 seconds.
1 In the Statistics application enter the data into lists named
time and height.
2 Name the third list timesq (short for time squared).
3 Place the cursor in the calculation cell at the bottom of the
third column and type time^2. This will calculate the values
of time².
Let time be the explanatory variable (x) and height the
response variable (y).
Exercise 4A
[Figure: scatterplot of y against x]
a Linearise the scatterplot by applying an x-squared transformation and fit a least
squares line to the transformed data.
b Give its equation.
c Use the equation to predict the value of y when x = –2.
Diameter (metres) 0.50 0.70 0.85 1.00 1.10
Number of people 1 2 3 4 5
[Figure: scatterplot of number of people against diameter (metres)]
a Apply the squared transformation to the variable diameter and determine the least
squares regression line for the transformed data. Diameter is the EV.
Write the slope and intercept of this line, rounded to one decimal place, in the
spaces provided.
number = ____ + ____ × diameter²
b Use the equation to predict the number of people who can be sheltered by an
umbrella of diameter 1.3 m. Give your answer rounded to the nearest person.
7 The time (in minutes) taken for a local anaesthetic to take effect is associated with
the amount administered (in units). To investigate this association a researcher
collected the following data.
Amount 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5
Time 3.7 3.6 3.4 3.3 3.2 3.0 2.9 2.7 2.5 2.3 2.1
The association between the variables amount and time is non-linear, as can be seen
from the scatterplot below. A squared transformation applied to the variable time will
linearise the scatterplot.
a Apply the squared transformation to the variable time and fit a least
[Figure: scatterplot of time (minutes) against amount]
A student uses the data in the table below to construct the scatterplot shown:
x 1 2 3 4 5 6 7 8
y 264 239 234 208 182 164 98 49
[Figure: scatterplot of y against x]
9 A y² transformation could also be used to linearise this association. A least squares line
is fitted to the transformed data, with y² as the response variable, and the equation of
the least squares line is:
y² = 79973 − 9533.4x
Using this equation, the value of y when x = 4 is closest to:
A 205 B 208 C 247 D 531 E 41839
10 The association between the cost of a certain precious stone (in $) and its weight (in
mg) is non-linear. A squared transformation was applied to the explanatory variable
weight, and a least squares line fitted to the transformed data. The equation of the least
squares line is:
cost = 2370 + 0.238 × weight²
Using this equation, the cost of a precious stone weighing 75 mg is closest to:
A $2389 B $3709 C $2689 D $7995 E $177 768
You will recall from Chapter 1 that the shape of a highly skewed single variable distribution
could be changed to become more symmetric by changing the scale from x to log₁₀ x.
When applied to bivariate data, the effect of the logarithmic transformation is, again,
to compress the upper end of the scale on either the x- or the y-axis, potentially linearising
a non-linear association. The effect of applying the log₁₀ x and log₁₀ y transformations
(separately) to a scatterplot is illustrated graphically below.
log₁₀ y: Compresses larger y-values relative to the smaller y-values.
This has the effect of straightening out curves like the one shown.
[Figure: example curve straightened by the log₁₀ y transformation]
The general wealth of a country, often measured by its Gross Domestic Product (GDP),
is one of several variables associated with lifespan in different countries. However, the
association is not linear, as can be seen in the scatterplot below which plots lifespan (in
years) against GDP per person (in dollars) for 13 different countries.
The scatterplot shows that there is a strong positive association between the lifespan
and GDP.
[Figure: scatterplot of lifespan (years) against GDP per person ($)]
a Apply a log transformation to the variable GDP, and determine the least squares
regression line for the transformed data.
b Use the least squares equation to predict the lifespan of a country with a GDP of
$20 000 per person, giving your answer rounded to one decimal place.
a Applying the log x transformation involves changing the scale on the x-axis to log(GDP).
When we make this change, we see that the association between the variables lifespan and
log(GDP) is linear. See the plot opposite.
[Figure: scatterplot of lifespan (years) against log(GDP)]
Note: On the plot, when log(GDP) = 4, the actual GDP is 10⁴ or $10 000.
We can now fit a least squares line to model the association between the variables
lifespan and log(GDP).
The numbers of cases of a very infectious disease were recorded over a 12 day period.
The association is not linear, as can be seen in the scatterplot below which plots cases
against days.
a Apply a log transformation to the variable cases, and determine the least squares
regression line for the transformed data.
b Use the least squares equation to predict the number of cases on day 13.
[Figure: scatterplot of cases against day]
a Applying the log y transformation involves changing the scale on the y-axis to log (cases).
When we make this change, we see from the plot that the association between the variables
log (cases) and day is linear.
Note: On the plot, when log (cases) = 3, the actual number of cases is 10^3 or 1000.
[Scatterplot: log (cases) against day]
We can now fit a least squares line to model the association between the variables
log (cases) and day.
The equation of this line is:
log (cases) = 1.046 + 0.227 × day
b Using this equation, on day 13 the number of cases is predicted as:
log (cases) = 1.046 + 0.227 × 13 = 3.997
To find the number of cases, we use the calculator to evaluate 10^3.997 = 9931 cases (to
the nearest whole number).
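The back-transformation in part b can be checked with a short Python sketch. The function name `predict_cases` is hypothetical; the intercept 1.046 and slope 0.227 are the values of the fitted line log (cases) = 1.046 + 0.227 × day. The key step is that a prediction made on the log10 scale must be raised as a power of 10 to return to the original units.

```python
def predict_cases(day, intercept=1.046, slope=0.227):
    """Predict cases from the line fitted to log10(cases) against day."""
    log_cases = intercept + slope * day  # prediction on the log10 scale
    return 10 ** log_cases               # undo the log10 transformation

print(round(predict_cases(13)))  # 9931
```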
1 Start a new document by pressing ctrl + N.
2 Select Add Lists & Spreadsheet.
Enter the data into lists named lifespan and gdp.
3 Name column C as lgdp (short for log (GDP)).
Now calculate the values of log (GDP) and store
them in the list named lgdp.
4 Move the cursor to the formula cell below the lgdp heading.
We need to enter the expression = log(gdp).
To do this, press = then type in log(gdp). Pressing
enter calculates and displays the values of lgdp.
1 In the Statistics application enter the data into lists named
Lifespan and GDP.
2 Name the third list logGDP.
3 Place the cursor in the calculation cell at the bottom of the
third column and type log (GDP).
Let GDP be the explanatory variable (x) and lifespan the
response variable (y).
6 Write the equation in terms of lifespan and log (GDP).
lifespan = 54.3 + 5.59 × log (GDP)
7 Substitute 20 000 for GDP in the equation.
lifespan = 54.3 + 5.59 × log 20 000 = 78.3 years
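The substitution in step 7 can be reproduced in Python as a check on the calculator result. This is a sketch only: `predict_lifespan` is a hypothetical name, and the coefficients 54.3 and 5.59 are the rounded values of the fitted line. Note that "log" here means log10, so `math.log10` is used rather than the natural logarithm `math.log`.

```python
import math

def predict_lifespan(gdp):
    """lifespan = 54.3 + 5.59 x log10(GDP), using the rounded coefficients."""
    return 54.3 + 5.59 * math.log10(gdp)

print(round(predict_lifespan(20_000), 1))  # 78.3
```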
Exercise 4B
1.5 2 2
2 3 1.5
3 3 1
4 3.5 0
5 4 0 1 2 3 4 5 6 7
Time (minutes)
6 3.5
7 3.9
7 3.6
A log transformation can be applied to the variable time to linearise the scatterplot.
a Apply the log transformation to the variable time and fit a least squares line to the
transformed data. log (time) is the EV.
Write the slope and intercept of this line, rounded to two significant figures, in the
spaces provided.
b Use the equation to predict the level of performance (rounded to one decimal place)
for a person who spends 2.5 minutes practising the task.
7 The table below shows the number of internet users signing up with a new internet
service provider for each of the first nine months of their first year of operation.
A scatterplot of the data is also shown.
Month    1   2   3   4   5   6   7   8   9
Number  24  32  35  44  60  61  78  92  118
[Scatterplot: Number against Month]
The association between number and month is non-linear.
a Apply the log transformation to the variable number and fit a least squares line to
the transformed data. Month is the EV.
Write the slope and intercept of this line, rounded to four significant figures, in the
spaces provided.
log (number) = ____ + ____ × month
b Use the equation to predict the number of internet users after 10 months. Give your
answer to the nearest whole number.
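The CAS steps for question 7 can be mirrored in Python as a way of checking your working. This is a sketch, not the calculator's own method: `least_squares` is a hypothetical helper implementing the standard least squares formulas (slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², intercept = ȳ − slope × x̄). The response variable number is transformed with log10 first, and the line is then fitted to the transformed data.

```python
import math

def least_squares(x, y):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

month = [1, 2, 3, 4, 5, 6, 7, 8, 9]
number = [24, 32, 35, 44, 60, 61, 78, 92, 118]

# Transform the response variable first, then fit the line to the transformed data.
log_number = [math.log10(n) for n in number]
a, b = least_squares(month, log_number)
print(f"log (number) = {a:.4g} + {b:.4g} x month")
```

To predict in part b, evaluate a + b × 10 and then raise 10 to that power, exactly as in the infectious disease example.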
9 The association between the power of a car (in horsepower) and the time it takes to
accelerate from 0 to 100 km/hr (in seconds) is non-linear. A log transformation was
applied to the explanatory variable horsepower, and a least squares line fitted to the
transformed data. The equation of the least squares line is:
time = 42.7 − 13.9 × log(horsepower)
Using this equation, the time it would take for a car with 180 horsepower to accelerate
from 0 to 100 km/hr is closest to:
A 29.5 seconds B 65 seconds C 28.8 seconds D 11.3 seconds E 11.4 seconds
10 The price of shares in a newly formed technology company has increased
non-linearly since the company was formed 12 months ago. A log transformation was
applied to the maximum share price each month (share price), and a least squares line
fitted to the transformed data, with month as the explanatory variable. The equation of
the least squares line is:
log (share price) = 1.39 + 0.050 × month
Using this equation, the maximum monthly share price in month 14 is closest to:
A $123.03 B $2.09 C $8.08 D $20.16 E $25.25
The reciprocal transformation stretches the lower end and compresses the upper end of the
scale on either the x- or y-axis, to a greater extent than the log transformation.
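One way to compare how strongly each transformation compresses the upper end of a scale is to divide the gap between two large values by the gap between two small values, before and after transforming. The Python sketch below does this for the made-up values 1, 2, 4 and 8 (chosen only for illustration); `spread_ratio` is a hypothetical helper. On the original scale the upper gap is 4 times the lower gap, after log10 the two gaps are equal, and after the reciprocal the upper gap is only a quarter of the lower gap. For positive values, the reciprocal also reverses the order of the data.

```python
import math

def spread_ratio(transform, low=(1, 2), high=(4, 8)):
    """Gap between two large values divided by the gap between two small values,
    measured after applying `transform`. A smaller ratio means the upper end of
    the scale has been compressed more relative to the lower end."""
    high_gap = abs(transform(high[1]) - transform(high[0]))
    low_gap = abs(transform(low[1]) - transform(low[0]))
    return high_gap / low_gap

transformations = {
    "none": lambda v: v,
    "log10": math.log10,
    "reciprocal": lambda v: 1 / v,
}
for name, f in transformations.items():
    print(f"{name:>10}: {spread_ratio(f):.2f}")

# The reciprocal also reverses the order of positive values
print([1 / v for v in (1, 2, 4, 8)])  # [1.0, 0.5, 0.25, 0.125]
```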
The following example shows how the 1/x transformation works in practice.
After embarking on a new healthy eating and exercise plan, Ben recorded his weekly
weight loss over a 10-week period. The association is not linear, as can be seen in the
scatterplot below, which plots weekly weight loss in kg against length of diet in weeks.
The scatterplot shows that there is a strong negative association between weekly weight
loss and length of diet.
[Scatterplot: weekly weight loss (kg) against length of diet (weeks)]
a Applying the 1/x transformation involves changing the scale on the x-axis to 1/(length
of diet).
When we make this change, we see from the scatterplot that the association between
weekly weight loss and 1/(length of diet) is linear.
The following example shows how the 1/y transformation works in practice.
The scatterplot shows that there is a strong negative association between the width of the
sticky labels and their lengths, but it is clearly non-linear.
a Apply a reciprocal transformation to the variable width, and determine the least
squares regression line for the transformed data.
b Use the least squares equation to predict the width of a sticky label which is 5 cm long,
giving your answer to two decimal places.
a Applying the 1/y transformation involves changing the scale on the y-axis from width
to 1/(width).
When we make this change, we see from the scatterplot that the association between
1/width and length is linear.
We can now fit a least squares line to model the association between 1/width and length.
The equation of this line is:
1/width = 0.015 + 0.086 × length
[Scatterplot: 1/width against Length (cm)]
b For a sticky label of length 5 cm, we would predict that:
1/width = 0.015 + 0.086 × 5 = 0.445
so width = 1/0.445 = 2.25 cm (to two decimal places).
1 Start a new document by pressing ctrl + N.
2 Select Add Lists & Spreadsheet.
Enter the data into lists named length and width.
3 Name column C as recipwidth (short for 1/width).
Calculate the values of recipwidth.
Move the cursor to the formula cell below the
recipwidth heading. Type in =1/width. Press enter
to calculate the values of recipwidth.
1 Open the Statistics application and enter the data into lists
named length and width.
2 Name the third list recwidth (short for reciprocal width).
3 Place the cursor in the calculation cell at the bottom of the
third column and type 1/width. This will calculate all the
reciprocal values of the width.
Let length be the explanatory variable (x) and width the
response variable (y).
4 Construct a scatterplot of 1/width against length.
Tap and complete the Set StatGraphs dialog box as shown.
Tap to view the scatterplot. The plot is now clearly linear.
5 Fit a regression line to the transformed data.
Go to Calc, Regression, Linear Reg.
Complete the Set Calculation dialog box as shown and tap OK.
This generates the regression results.
Note: The y in the linear equation corresponds to the transformed variable 1/width;
that is, 1/y.
Tap OK a second time to plot and display the line on the scatterplot.
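Whichever transformation is applied to the response variable, a prediction from the fitted line comes out on the transformed scale and has to be converted back to the original units. The Python sketch below collects the three inverse operations in one hypothetical helper, `back_transform`; the transformation labels are made up for this example. The usage line reproduces the sticky label prediction from the line 1/width = 0.015 + 0.086 × length.

```python
import math

def back_transform(predicted, transformation):
    """Convert a prediction made on a transformed y-scale back to the original units."""
    if transformation == "log y":
        return 10 ** predicted        # inverse of log10(y)
    if transformation == "1/y":
        return 1 / predicted          # the reciprocal is its own inverse
    if transformation == "y squared":
        return math.sqrt(predicted)   # assumes the original y values are positive
    raise ValueError(f"unknown transformation: {transformation}")

# Sticky label example: 1/width = 0.015 + 0.086 x length, at length 5 cm
recip_width = 0.015 + 0.086 * 5
print(round(back_transform(recip_width, "1/y"), 2))  # 2.25
```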
Exercise 4C
Fuel consumption (km/litre)  7.3  12.6  7.1  6.3  10.1  10.5  14.6  10.9  7.7
Horsepower                   125   75  110  138    88    80    70   100  103
[Scatterplot: Horsepower against Fuel consumption (km/litre)]
a Apply the reciprocal transformation to the variable fuel consumption and fit a least
squares line to the transformed data. Horsepower is the RV.
Write the intercept and slope of this line in the boxes provided, rounded to three
significant figures.
horsepower = ____ + ____ × 1/(fuel consumption)
b Use the equation to predict the horsepower of a car with a fuel consumption of
9 km/litre.
6 Ten students were given an opportunity to practise a complex matching task as often as
they liked before they were assessed. The number of times they practised the task and
the number of errors they made when assessed are given in the table.
Times Errors
1 14 12
2 9 10
2 11 8
4 5
5 4 2
6 4 0
0 2 4 6 8 10
7 3 Times
7 3
9 2
a Apply the reciprocal transformation to the variable errors and determine the least
squares regression line, with the number of times the task was practised as the EV.
Write the intercept and slope of this line in the boxes provided, rounded to two
significant figures.
1/errors = ____ + ____ × times
b Use the equation to predict the number of errors made when the task is practised
six times.
8 0.58 .50
11 0.43 .40
14 0.39 .30
22 0.24 .20
26 0.19 .10
35 0.13
41 0.10 0 5 10 15 20 25 30 35 40 45 50
50 0.12
8 The association between score on a problem solving test (score) and the number of
attempts a person has at the test (attempts) is non-linear. A reciprocal transformation
was applied to the explanatory variable attempts, and a least squares line fitted to the
transformed data. The equation of the least squares line is:
score = 50 − 22.8 × 1/attempts
Using this equation, the score that a person achieves on their fourth attempt is closest to:
A 6.8 B 27.2 C 55.7 D 41.2 E 44.3
9 The price of shares in a newly formed technology company has increased
non-linearly since the company was formed 12 months ago. A reciprocal transformation
was applied to the maximum share price each month (share price), and a least squares
line fitted to the transformed data, with month as the explanatory variable. The
equation of the least squares line is:
1/(share price) = 0.0349 − 0.00215 × month
Using this equation, the maximum monthly share price in month 14 is closest to:
A $2.18 B $28.78 C $208.33 D 0.48 cents E 48 cents
The types of scatterplots that can be transformed by the squared, log or reciprocal
transformations can be fitted together into what we call the circle of transformations.
[Figure: the circle of transformations]
The scatterplot has a consistently increasing trend, so the circle of transformations applies.
Comparing the scatterplot to those in the circle of transformations, we see that the x², 1/y
and log y transformations all have the potential to linearise this scatterplot. All of these
transformations have been applied in turn, and the resulting scatterplots and residual plots
are shown in the following table.
[Table of plots: for each of the x², 1/y and log y transformations, the transformed
scatterplot and its residual plot. x²: Age of tree (years) against diameter squared.
1/y: 1/y against Diameter (cm). log y: log y against Diameter (cm).]
Applying each of these transformations in turn, we can see from the residual plots that
both the x² and the log y transformations have been quite effective in linearising the
association between the age of the tree and its diameter. There still seems to be a curve in
the residual plot after the 1/y transformation, so that transformation has been less effective.
To help choose the best transformation, we can also compare the values of r², the
coefficient of determination.
For the x² transformation, r² = 92.7%
For the 1/y transformation, r² = 75.7%
For the log y transformation, r² = 90.2%
Both the x² and log y transformations have very high explanatory power, and
either would seem to be acceptable. When more than one transformation does a
reasonable job of linearising the association, and they have similar values of r², the
transformation that is easier to interpret in terms of the variables is preferred. In this
case diameter² makes more sense, in that it tells us that the age of the tree relates to
the cross-sectional area of the tree. The log transformation does not have an equivalent
meaningful interpretation.
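For a least squares line, r² is the square of Pearson's correlation coefficient r, so it can be computed directly from the transformed data. The Python sketch below uses made-up data that follow y = 2 + 3x² exactly, so the x² transformation should give r² = 100%, while fitting a line to the untransformed data gives a lower value. `r_squared` is a hypothetical helper, not a calculator function.

```python
def r_squared(x, y):
    """Coefficient of determination of the least squares line: the square of Pearson's r."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy ** 2 / (sxx * syy)

# Hypothetical data that follow y = 2 + 3x^2 exactly (made up for illustration)
x = [1, 2, 3, 4, 5, 6]
y = [2 + 3 * xi ** 2 for xi in x]

candidates = {
    "none": x,
    "x squared": [xi ** 2 for xi in x],
}
for name, transformed_x in candidates.items():
    print(f"{name}: r^2 = {r_squared(transformed_x, y):.1%}")
```

As in the tree example, r² is only one input to the choice: the residual plot and the interpretability of the transformed variable matter too.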
We can now fit a least squares line to model the association between age and diameter².
The equation of this line is:
age = 5.098 + 0.091 × diameter²
At this stage you might find it helpful to use the interactive ‘Data transformation’ (accessible
through the Interactive Textbook) to see how these different transformations can be used to
linearise scatterplots.
Exercise 4D
Example 7 1 The scatterplots below are non-linear. For each, identify the transformations x², log x,
1/x, y², log y, 1/y or none that might be used to linearise the plot.
[Scatterplots a, b, c and d, each with a horizontal axis from 0 to 10 and a vertical axis from 0 to 5]
2 The data below gives the yield in kilograms and length in metres of 12 commercial
potato plots.
a Construct a scatterplot showing the association between yield in kilograms (the RV)
and length of the plot in metres (the EV).
b Fit a least squares regression line to the data. Write down the equation in terms of
the variables in the question, giving the values of the intercept and slope rounded to
four significant figures.
c Construct a residual plot, and comment on whether the linearity assumption has
been met.
d Use the circle of transformations to select which transformations could be
considered in order to linearise the association.
e Using an appropriate transformation, recommend a regression model for the
association between yield and length of the plot. Write down the equation in terms
of the transformed variables, giving the values of the intercept and slope rounded to
four significant figures.
f What is the value of r² for the recommended model? Give your answer as a
percentage rounded to one decimal place.
3 In order to investigate the association between the average number of cigarettes per day
per smoker (smoking) and the cost of cigarettes in $ per cigarette (cost) for a group of
countries the following data was collected.
cost ($) 0.67 0.75 0.80 0.92 1.00 1.08 1.17 1.25 1.30 1.40
smoking 16.7 15.5 14.8 13.4 12.5 12.0 11.1 10.9 10.3 9.5
a Construct a scatterplot showing the association between smoking (in cigarettes/day)
(the RV) and cost ($/cigarette) (the EV).
b Fit a least squares regression line to the data. Write down the equation in terms of
the variables in the question, giving the values of the intercept and slope rounded to
four significant figures.
c Construct a residual plot, and comment on whether the linearity assumption has
been met.
d Use the circle of transformations to select which transformations could be
considered in order to linearise the association.
e Using an appropriate transformation, recommend a regression model for the
association between smoking and cost. Write down the equation in terms of the
transformed variables, giving the values of the intercept and slope rounded to four
significant figures.
f What is the value of r² for the recommended model? Give your answer as a
percentage rounded to one decimal place.
4 The following data shows the population density in people per hectare (density) and the
distance from the centre of the city in km (distance) for a large city.
density 307.58 294.67 283.93 270.82 234.93 175.08 101.56 49.80
distance 0 2 4 6 8 10 12 14
Key ideas and chapter summary
Squared The squared transformation stretches out the upper end of the scale on
transformation an axis.
Logarithmic The logarithmic transformation compresses the upper end of the scale
transformation on an axis.
Reciprocal The reciprocal transformation compresses the upper end of the scale
transformation on an axis but to a greater extent than the log transformation.
Residual plots Residual plots are used to assess the effectiveness of a data transformation.
Skills checklist
Download this checklist from the Interactive Textbook, then print it and fill it out to check
your skills.
Multiple-choice questions
1 Select the statement that correctly completes the sentence:
‘The effect of a squared transformation is to . . . ’
A stretch the high values in the data B maintain the distance between values
C stretch the low values in the data D compress the high values in the data
E reverse the order of the data values
5 The association between two variables y and x, as shown in the scatterplot, is non-linear.
[Scatterplot: y against x, with x from 0 to 10 and y from 0 to 5]
Which of the following sets of transformations could possibly linearise this association?
A log y, 1/y, log x, 1/x B y², x²
C y², log x, 1/x D log y, 1/y, x²
E ax + b
7 The following data were collected for two related variables x and y.
x 1 2 3 4 5 6 7 8 9 10 11
y 7 8.6 8.9 8.8 9.9 9.7 10.4 10.5 10.7 11.2 11.1
A scatterplot indicates a non-linear association. The data is linearised using a log
x transformation and a least squares line is then fitted. The equation of this line is
closest to:
A y = 7.52 + 0.37 log x B y = 0.37 + 7.52 log x
C y = −1.71 + 0.25 log x D y = 3.86 + 7.04 log x
E y = 7.04 + 3.86 log x
8 A student uses the data in the table below to construct the scatterplot shown.
x  1     2     3     4     5     6      7      8      9       10
y  2030  1265  8265  5654  6893  43265  67890  87803  113062  286370
[Scatterplot: y against x, with x from 0 to 10 and y from 0 to 300 000]
9 The association between the total weight of produce picked from a vegetable garden
and its width is non-linear. An x2 transformation is used to linearise the data.
When a least squares line is fitted to the data, its y-intercept is 10 and its slope is 5.
Assuming that weight is the response variable, the equation of this line is:
A (weight)² = 10 + 5 × width B width = 10 + 5 × (weight)²
C width = 5 + 10 × (weight)² D weight = 10 + 5 × (width)²
E (weight)² = 5 + 10 × weight
10 A model that describes the association between the hours spent studying for an exam
and the mark achieved is:
mark = 20 + 40 × log (hours)
From this model, we would predict that a student who studies for 20 hours would score
a mark (to the nearest whole number) of:
A 80 B 78 C 180 D 72 E 140
Written response questions
1 The table below shows the age in years (age) and the length in metres (length), for a
group of 18 dugongs. A scatterplot of the data is also shown.
age   length   age    length
1.0   1.80     8.0    2.47
1.5   1.85     8.5    2.19
1.5   1.87     9.0    2.26
1.5   1.77     9.5    2.40
2.5   2.02     9.5    2.39
4.0   2.27     10.0   2.41
5.0   2.15     12.0   2.50
5.0   2.26     12.0   2.32
7.0   2.35     13.0   2.43
[Scatterplot: age (years) against length (metres)]
The scatterplot shows that the association is clearly non-linear. A reciprocal
transformation can be applied to the variable age to linearise the association.
a Apply the reciprocal transformation to the data and use the transformed data
to determine the equation of a least squares line that enables 1/age to be predicted
from length. Write the values of the intercept and slope in the appropriate boxes
provided. Round to four significant figures.
1/age = ____ − ____ × length
b The association can also be linearised by applying a log transformation to the
variable age. When this is done, and a least squares line fitted to the transformed
data, the resulting equation is:
Use this equation to predict the age of a dugong with a length of 2.00 metres. Round
the answer to one decimal place.
2 The table below shows the percentage of people who can read (literacy rate) and the
gross domestic product (GDP), in dollars/person, for a selection of 14 countries. A
scatterplot of the data is also shown.
The scatterplot can be linearised by using a log x transformation.
GDP literacy rate
2677 72 80
a Apply the log transformation to the variable GDP, and fit a least squares line to the
transformed data. Write down its equation in terms of the variables literacy rate and
log (GDP). Give the slope and intercept rounded to three significant figures.
b Verify that the log transformation has linearised the association by constructing a
residual plot.
c Use the regression equation to predict the literacy rate of a country with a GDP of
$10 000 to the nearest percent.
d Find the value of the residual when the regression equation is used to predict the
literacy rate when the GDP is equal to $19 860. Give your answer rounded to two
significant figures.
3 Measurements of the distance travelled (metres) and time taken (seconds) were made
on a falling body. The data are given in the table below.
time 0 1 2 3 4 5 6
distance 0 5.2 18.0 42.0 79.0 128.0 168.0
4 Is the infant mortality rate in a country associated with the number of doctors in
that country? The data below gives infant mortality rate in deaths per 1000 births
(mortality) and the number of doctors per 100 000 of population (doctors) for 14 countries.
mortality 12 13 12 10 10 7 111
doctors 192 222 154 182 179 204 61
mortality 15 10 20 54 75 121 71
doctors 270 271 357 79 59 27 52
Chapter questions
• What is time series data?
• How do we construct a time series plot?
• How do we recognise features such as trend, seasonality and irregular fluctuations?
• How do we smooth a time series plot using moving means?
• How do we smooth a time series plot using moving medians?
• How do we calculate and interpret seasonal indices?
• How do we calculate and interpret a trend line?
• How do we make forecasts of future values?
In this chapter we will focus on a special case of numerical bivariate data, called time
series data. In time series data the explanatory variable is always a measure of time (for
example hour, day, month or year), and we are concerned with understanding how the
response variable is changing over time.
Since time series data is just a special kind of bivariate numerical data, in which
the explanatory variable is time, we will begin by drawing a scatterplot of the data. In
this instance, the scatterplot is called a time series plot, with time always placed on the
horizontal axis. A time series plot differs from a normal scatterplot in that, in general, the
points are joined by line segments in time order. An example of a time series plot, of the
road accident fatality data, is given below.
[Time series plot: road accident fatalities in Australia against year]
Looking at the time series plot, we can readily see a clear trend of decreasing road fatalities,
which is good news for drivers, as this provides some evidence that the many efforts being
made to reduce the road toll across Australia have been effective.
Maximum temperature was recorded each day for a week in a certain town. Construct a
time series plot of the data.
1 In a time series plot, time (day in this case) is always the explanatory variable (EV)
and is plotted on the horizontal axis.
Day is the EV – this will label the horizontal axis. Temperature is the RV – this will label
the vertical axis.
2 Determine the scales for each axis.
A horizontal scale from 0–7 with intervals of 1 for each day would be suitable.
Temperature ranges from 20–36. A vertical scale from 15–40 with intervals of 5 would be
suitable.
3 Set up the axes, and then plot all seven data points as for a scatterplot.
[Plot: Temp (°C) against day, Mon to Sun]
4 Complete the graph by joining consecutive data points with line segments.
[Time series plot: Temp (°C) against day, Mon to Sun]
Most real-world time series data come in the form of large data sets that are best plotted with
the aid of a spreadsheet or statistical package. The availability of the data in electronic form
via the internet greatly helps this process. However, in this chapter, most of the time series
data sets are relatively small and can be readily plotted using a CAS calculator.
year 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
birth 1.926 1.920 1.855 1.826 1.814 1.752 1.741 1.740 1.657 1.580
1 Start a new document by pressing ctrl + N.
2 Select Add Lists & Spreadsheet. Enter the data
into lists named year and birth.
1 Open the Statistics application and enter the data into the
columns named year and birth. Your screen should look
like the one shown.
We would always expect to see irregular or random fluctuations in a time series, and it is
common to see one or more of the other features as well.
Examining a time series plot we can often see a general upward or downward movement
over time. This indicates a long-term change over time that we call a trend.
The tendency for values in a time series to generally increase or decrease over a
significant period of time is called a trend.
One way of identifying trends on a time series graph is to draw a line that ignores the
fluctuations, but which reflects the overall increasing or decreasing nature of the plot. These
lines are called trend lines.
Trend lines have been drawn on the time series plots below to indicate an increasing trend
(line slopes upwards) and a decreasing trend (line slopes downwards).
[Two time series plots against Time, each with a trend line: one sloping upwards and one
sloping downwards]
Sometimes, different trends are apparent in a time series for different time periods.
Consider the time series plot of the Australian annual birth rates over the years from 1931
to 1990, shown below. Comment on the trend shown in the plot.
[Time series plot: Australian birth rate, 1931–1990, with three trend lines labelled trend 1,
trend 2 and trend 3]
There are three distinct trends, which can be seen by drawing trend lines on the plot. Each
of these trends can be explained by changing socioeconomic circumstances.
Trend 1: Between 1940 and 1961 the birth rate in Australia grew quite dramatically.
Those in the armed services came home from the Second World War, and the economy
grew quickly. This rapid increase in the Australian birth rate during this period is known
as the ‘Baby Boom’.
Trend 2: From about 1962 until 1980 the birth rate declined very rapidly. Birth control
methods became more effective, and women started to think more about careers. This
period is sometimes referred to as the ‘Baby Bust’.
Trend 3: During the 1980s, and beyond, the birth rate continued to decline slowly for a
complex range of social and economic reasons.
The term cycle refers to variations in time series that in general last longer than a year.
These variations may not be of a regular height and they may not repeat at regular intervals.
Cycles are recurrent movements in a time series, generally over a period greater than one year.
Sunspots are darker, cooler areas on the surface of the sun. The following plot shows the
sunspot activity for the period 1945 to 2016. Comment on the cycles shown in the plot.
[Time series plot: sunspot activity, 1945–2016]
The recurrent pattern in the number of sunspots can be seen clearly from the time series
plot. Looking at the plot the years of lowest sunspot activity look to be at approximately
1954, 1964, 1975, 1986, 1996, 2008.
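The approximate length of the sunspot cycle can be estimated from the years of lowest activity read from the plot. The Python sketch below simply takes the differences between consecutive minima and averages them; the years are the approximate values listed above, so the result is only a rough estimate.

```python
minima = [1954, 1964, 1975, 1986, 1996, 2008]  # approximate years read from the plot
gaps = [later - earlier for earlier, later in zip(minima, minima[1:])]
print(gaps)                    # [10, 11, 11, 10, 12]
print(sum(gaps) / len(gaps))   # 10.8
```

The cycle is not perfectly regular (the gaps range from 10 to 12 years), which is typical of cycles as opposed to seasonality.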
Many business indicators, such as interest rates or unemployment figures, also vary in
cycles, but their periods are usually less regular.
Cycles with calendar-related periods of one year or less are of special interest and are
referred to as seasonality.
Seasonality is present when there is a periodic movement in a time series that is related
to a calendar-related period – for example a year, a month or a week.
Seasonal movements tend to be more predictable than other time series features, and occur
because of variations in the weather, such as ice-cream sales increasing in the summer, or
institutional factors, like the increase in the number of unemployed people at the end of the
school year.
The plot below shows the total percentage of hotel rooms occupied in Australia by
quarter, over the years 2012–2016. Comment on the seasonality shown in the plot.
[Time series plot: Room occupancy rate (%) by quarter, March 2012 to June 2016]
The regular peaks and troughs in the plot that occur at the same time each year signal the
presence of seasonality. In this case, the demand for accommodation is at its lowest in
the June quarter and highest in the December quarter.
This time series plot reveals both seasonality and trend in the demand for hotel rooms.
The upward sloping trend line signals the presence of a general increasing trend. This
tells us that, even though demand for accommodation has fluctuated from quarter to
quarter, demand for hotel accommodation has increased over time.
Structural change
A structural change in a time series is a sudden change in the pattern of the time series at a
point in time.
Structural change
Structural change is present when there is a sudden change in the established pattern of a
time series plot.
The time series plot below shows the power bill for a rental house (in kWh) for the
12 months of a year. Comment on any structural change in the plot.
The plot reveals an abrupt change in power usage between June and July. During this period,
monthly power use suddenly decreases from around 300 kWh per month from January to
June to around 175 kWh for the rest of the year. This is an example of structural change
that can probably be explained by a change in circumstances, for example, from a family
with children to a person living alone.
Structural change is also displayed in the birth rate time series plot we saw earlier. This
revealed three quite distinct trends during the period 1931–1990. These reflect significant
external events (like a war) or changes in social and economic circumstances.
Outliers are individual values that stand out from the general body of data.
The time series plot below shows the daily power bill for a house (in kWh) for a fortnight.
Comment on any outliers in the plot.
[Time series plot: Electricity use (kWh) against day, days 0 to 14]
For this household, daily electricity use follows a regular pattern that, although
fluctuating, averages about 10 kWh per day. In terms of daily power use, day 4 is a clear
outlier, with less than 2 kWh of electricity used. A follow-up investigation found that, on
this day, the house was without power for 18 hours due to a storm, so much less power
was used than normal.
Exercise 5A
2 Researchers recorded the number of penguins present on a remote island each month
for 12 months. Construct a time series plot of the data.
3 The following table shows the maximum temperature in Melbourne during one week in
March. Construct a time series plot of the data.
Example 3 5 Complete the table below by indicating which of the listed features are present in each
Example 4 of the time series plots shown below.
Feature                   Plot A   Plot B   Plot C
Irregular fluctuations
Increasing trend
Decreasing trend
Cycles
Seasonality
[Time series plots A, B and C, 2011–2015]
Example 5 6 Complete the table below by indicating which of the listed features are present in each
of the time series plots shown below.
Feature                   Plot A   Plot B   Plot C
Irregular fluctuations
Structural change
Increasing trend
Decreasing trend
Seasonality
[Time series plots A, B and C, 2011–2015]
Example 6 7 Complete the table below by indicating which of the listed features are present in each
of the time series plots shown below.
Feature                   Plot A   Plot B   Plot C
Irregular fluctuations
Structural change
Increasing trend
Decreasing trend
Outliers
[Time series plots A, B and C, 2014–2023]
9 The data below shows the population (in millions) in Australia over the period 2012–2021.
Year 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
Population 22.73 23.13 23.48 23.82 24.19 24.60 24.98 25.37 25.69 25.97
10 The table below shows the motor vehicle theft rate per 100 000 cars in Australia from
2003 to 2018.
Year 2003 2004 2005 2006 2007 2008 2009 2010
Theft rate 500.9 442.4 398.3 367.2 337.6 320.0 274.2 214.8
Year 2011 2012 2013 2014 2015 2016 2017 2018
Theft rate 220.0 228.4 204.2 191.0 194.5 231.0 213.3 214.1
12 The time series plot below shows the number of overseas arrivals (millions of people
per month) in Australia from November 2011 until December 2021. Describe the
features of the plot.
[Time series plot: overseas arrivals (millions of people per month) by month, 2015–2021]
and females over the period 1945–92.
[Time series plot: Smokers (percentage) for males and females, 1945–1995]
i Describe any trends in the time series plot.
ii Did the difference in smoking rates increase or decrease over the period 1945–92?
b The table below shows the smoking rates for females and males aged 15 years at
several time points from 2000–2018 (smoking rate data is not collected every year).
Year 2000 2005 2007 2010 2011 2012 2013 2018 2014 2015 2018
Female 22.4 18.9 19.9 17.9 15.4 16.6 14.4 15.6 13.5 14.5 13.6
Male 26.7 22.9 24.9 22.9 19.1 21.9 18.0 20.7 17.0 19.7 18.7
i Use a CAS calculator to construct time series plots of the male and female data.
ii Describe any trends in the time series plot.
iii Did the difference in smoking rates change over the period 2000–2018?
[Time series plot, 2013–2019]
The time series plot is best described as having
A seasonality only B irregular fluctuations only
C seasonality with irregular fluctuations
D an increasing trend with irregular fluctuations
E an increasing trend with seasonality and irregular fluctuations
15 The time series plot below shows the annual sales (in $ millions) for a car sales
A time series plot can incorporate many of the sources of variation previously mentioned:
trend, cycles, seasonality, structural change, outliers and irregular fluctuations. One effect
of the irregular fluctuations and seasonality can be to obscure an underlying trend. The
technique of smoothing can sometimes be used to overcome this problem.
In this section we consider moving mean smoothing, which involves replacing individual
data points in the time series with the mean of the data point and some adjacent points. The
simplest method is to smooth over a small odd number of data points – for example, three or
five, but any number of points can be used.
The three-moving mean
To use three-moving mean smoothing, replace each data value with the mean of that
value and the one on each side. That is, if $y_1$, $y_2$ and $y_3$ are sequential data values, then:
$$\text{smoothed } y_2 = \frac{y_1 + y_2 + y_3}{3}$$
The first and last points in the time series do not have values on each side, so they cannot be smoothed.
These definitions can be readily extended for moving means involving more points.
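Written in the same notation, the five-moving mean and the general case of smoothing over an odd number of points are shown below. The symbol k (the number of values taken on each side, so the window holds 2k + 1 values) is introduced here for illustration only; with this rule the first k and the last k values in a series cannot be smoothed.

```latex
\text{smoothed } y_3 = \frac{y_1 + y_2 + y_3 + y_4 + y_5}{5}
\qquad
\text{smoothed } y_t = \frac{1}{2k+1} \sum_{i=t-k}^{t+k} y_i
```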
The table below gives the temperature (°C) recorded at a weather station at 9.00 a.m. each
day for a week.
Day               Mon   Tue   Wed   Thu   Fri   Sat   Sun
Temperature (°C)  18.1  24.8  26.4  13.9  12.7  14.2  24.9
a Calculate the three-moving mean smoothed temperature for Tuesday.
b Calculate the five-moving mean smoothed temperature for Thursday.
Explanation Solution
a 1 Write down the three temperatures centred on Tuesday.
      18.1, 24.8, 26.4
  2 Find their mean and write down your answer.
      Mean = (18.1 + 24.8 + 26.4)/3 = 23.1
      The three-moving mean smoothed temperature for Tuesday is 23.1 °C.
b 1 Write down the five temperatures centred on Thursday.
      24.8, 26.4, 13.9, 12.7, 14.2
  2 Find their mean and write down your answer.
      Mean = (24.8 + 26.4 + 13.9 + 12.7 + 14.2)/5 = 18.4
      The five-moving mean smoothed temperature for Thursday is 18.4 °C.
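The calculation above can be checked with a minimal Python sketch. It assumes the week runs Monday to Sunday, so the seven temperatures quoted across these worked examples (18.1, 24.8, 26.4, 13.9, 12.7, 14.2, 24.9) can be held in one list; the helper name `smoothed_at` is hypothetical.

```python
# Temperatures (°C) at 9.00 a.m., assumed to run Monday to Sunday
temps = [18.1, 24.8, 26.4, 13.9, 12.7, 14.2, 24.9]


def smoothed_at(values, i, window):
    """Mean of the `window` values centred on position i (window must be odd)."""
    if window % 2 == 0:
        raise ValueError("use an odd window; even windows need centring")
    half = window // 2
    if i - half < 0 or i + half >= len(values):
        raise IndexError("not enough values on each side to smooth this point")
    return sum(values[i - half:i + half + 1]) / window


tuesday = smoothed_at(temps, 1, 3)   # Monday, Tuesday, Wednesday
thursday = smoothed_at(temps, 3, 5)  # Tuesday to Saturday
print(round(tuesday, 1), round(thursday, 1))  # 23.1 18.4
```

Python counts positions from 0, so Tuesday is position 1 and Thursday is position 3.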
The next step is to extend these computations to smooth all terms in the time series.
The following table gives the number of births per month over a calendar year in a
country hospital. Use the three-moving mean and the five-moving mean methods,
rounded to one decimal place, to complete the table.
Complete the calculations as shown below.
Month      Number of births  3-moving mean             5-moving mean
January    10                –                         –
February   12                (10 + 12 + 6)/3 = 9.3     –
March      6                 (12 + 6 + 5)/3 = 7.7      (10 + 12 + 6 + 5 + 22)/5 = 11.0
April      5                 (6 + 5 + 22)/3 = 11.0     (12 + 6 + 5 + 22 + 18)/5 = 12.6
May        22                (5 + 22 + 18)/3 = 15.0    (6 + 5 + 22 + 18 + 13)/5 = 12.8
June       18                (22 + 18 + 13)/3 = 17.7   (5 + 22 + 18 + 13 + 7)/5 = 13.0
July       13                (18 + 13 + 7)/3 = 12.7    (22 + 18 + 13 + 7 + 9)/5 = 13.8
August     7                 (13 + 7 + 9)/3 = 9.7      (18 + 13 + 7 + 9 + 10)/5 = 11.4
September  9                 (7 + 9 + 10)/3 = 8.7      (13 + 7 + 9 + 10 + 8)/5 = 9.4
October    10                (9 + 10 + 8)/3 = 9.0      (7 + 9 + 10 + 8 + 15)/5 = 9.8
November   8                 (10 + 8 + 15)/3 = 11.0    –
December   15                –                         –
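The whole table can be reproduced with a short Python sketch that slides the window along the series. The function name `moving_mean` is hypothetical; positions that cannot be smoothed are returned as `None`, matching the dashes in the table.

```python
births = [10, 12, 6, 5, 22, 18, 13, 7, 9, 10, 8, 15]  # January to December


def moving_mean(values, window):
    """Odd-window moving mean; positions without enough neighbours are None."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    smoothed = [None] * len(values)
    for i in range(half, len(values) - half):
        smoothed[i] = sum(values[i - half:i + half + 1]) / window
    return smoothed


def rounded(xs):
    """Round to one decimal place, leaving the unsmoothed positions as None."""
    return [None if x is None else round(x, 1) for x in xs]


three = moving_mean(births, 3)
five = moving_mean(births, 5)
print(rounded(three))
print(rounded(five))
```

The five-moving mean loses two values at each end, compared with one for the three-moving mean, which is the data loss mentioned in the note below the plot.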
The result of this smoothing can be seen in the plot below, which shows the raw data,
the data smoothed with a three-moving mean and the data smoothed with a five-moving mean.
[Time series plot: Number of births by month, showing the raw data, the 3-moving mean and the 5-moving mean]
Note: In the process of smoothing, data points are lost at the beginning and end of the time series.
Smoothing with centring involves taking a two-moving mean of the already smoothed
values so that they line up with the original data values. Smoothing with centring is only
required when smoothing using an even number of data values, for example 2-moving mean
or 4-moving mean smoothing.
In practice, we do not have to draw such a diagram to perform these calculations. The
purpose of doing so is to show how the centring process works. Calculating a two-moving
mean with centring is illustrated in the following example.
The temperatures (°C) recorded at a weather station at 9 a.m. each day for a week are
displayed in the table.
Calculate the two-moving mean smoothed temperature with centring for Tuesday.
Explanation Solution
1 For two-mean smoothing with centring, write down the three data values centred on Tuesday.
    18.1 24.8 26.4
2 Calculate the mean of the first two values (mean 1). Calculate the mean of the second two values (mean 2).
    mean 1 = (18.1 + 24.8)/2 = 21.45
    mean 2 = (24.8 + 26.4)/2 = 25.60
3 The centred mean is then the average of mean 1 and mean 2.
    Centred mean = (mean 1 + mean 2)/2 = (21.45 + 25.60)/2 = 23.525
4 Write down your answer, rounded to one decimal place.
    The two-moving mean smoothed temperature for Tuesday is 23.5 °C.
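Writing the centring step out algebraically shows why the result lines up with an original data value: averaging the two means that straddle a point gives the outer values half the weight of the inner ones. The sketch below uses the same $y_1, y_2, \ldots$ notation as the three-moving mean definition.

```latex
\text{smoothed } y_2
  = \frac{1}{2}\left(\frac{y_1 + y_2}{2} + \frac{y_2 + y_3}{2}\right)
  = \frac{y_1 + 2y_2 + y_3}{4}

\text{smoothed } y_3
  = \frac{1}{2}\left(\frac{y_1 + y_2 + y_3 + y_4}{4} + \frac{y_2 + y_3 + y_4 + y_5}{4}\right)
  = \frac{y_1 + 2y_2 + 2y_3 + 2y_4 + y_5}{8}
```

For Tuesday, (18.1 + 2 × 24.8 + 26.4)/4 = 94.1/4 = 23.525, matching the worked solution.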
The process of smoothing with centring across more data values is the same as two-mean
smoothing except that the means are determined in larger groups. This process is illustrated
in the following example with groups of four and six.
The table below gives the temperature (°C) recorded at a weather station at 9.00 a.m. each
day for a week.
a Calculate the four-moving mean smoothed temperature with centring for Thursday.
b Calculate the six-moving mean smoothed temperature with centring for Thursday.
Explanation Solution
a 1 For four-mean smoothing with centring, write down the five data values centred on Thursday.
      24.8 26.4 13.9 12.7 14.2
  2 Calculate the mean of the first four values (mean 1) and the mean of the last four values (mean 2).
      mean 1 = (24.8 + 26.4 + 13.9 + 12.7)/4 = 19.45
      mean 2 = (26.4 + 13.9 + 12.7 + 14.2)/4 = 16.8
  3 The centred mean is then the average of mean 1 and mean 2.
      centred mean = (mean 1 + mean 2)/2 = (19.45 + 16.8)/2 = 18.125
  4 Write down your answer.
      The four-mean smoothed temperature centred on Thursday is 18.1 °C (to 1 d.p.).
b 1 For six-mean smoothing with centring, write down the seven data values centred on Thursday.
      18.1 24.8 26.4 13.9 12.7 14.2 24.9
  2 Calculate the mean of the first six values (mean 1) and the mean of the last six values (mean 2).
      mean 1 = (18.1 + 24.8 + 26.4 + 13.9 + 12.7 + 14.2)/6 = 18.35
      mean 2 = (24.8 + 26.4 + 13.9 + 12.7 + 14.2 + 24.9)/6 = 19.4833...
  3 The centred mean is then the average of mean 1 and mean 2.
      centred mean = (mean 1 + mean 2)/2 = (18.35 + 19.4833...)/2 = 18.917
  4 Write down your answer.
      The six-mean smoothed temperature centred on Thursday is 18.9 °C (to 1 d.p.).
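The same two-means-then-average procedure works for any even window. The Python sketch below is one way to write it, assuming the Monday-to-Sunday temperatures from these examples; the name `centred_mean_at` is hypothetical.

```python
temps = [18.1, 24.8, 26.4, 13.9, 12.7, 14.2, 24.9]  # Monday to Sunday (assumed)


def centred_mean_at(values, i, window):
    """Even-window moving mean with centring at position i."""
    if window % 2 != 0:
        raise ValueError("centring is only needed for an even window")
    half = window // 2
    if i - half < 0 or i + half >= len(values):
        raise IndexError("not enough values on each side of this point")
    # mean 1 covers positions i - half .. i + half - 1
    mean1 = sum(values[i - half:i + half]) / window
    # mean 2 covers positions i - half + 1 .. i + half
    mean2 = sum(values[i - half + 1:i + half + 1]) / window
    return (mean1 + mean2) / 2


print(round(centred_mean_at(temps, 1, 2), 3))  # Tuesday, two-moving mean
print(round(centred_mean_at(temps, 3, 4), 3))  # Thursday, four-moving mean
print(round(centred_mean_at(temps, 3, 6), 3))  # Thursday, six-moving mean
```

An even window of size 2h needs h values on each side of the point, so the six-moving mean for Thursday uses all seven days.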
The next step is to extend these computations to smooth all terms in the time series. This
process is illustrated using four-moving mean smoothing in the following example. Setting
up and using a table like the one shown in the example will help keep track of the process.
The following table gives the number of births per month over a calendar year in a
country hospital. Use the four moving mean with centring method to complete the table.
Complete the calculations as shown below.
Month      Number of births  4-moving mean                  4-moving mean with centring
January    10
February   12
                             (10 + 12 + 6 + 5)/4 = 8.25
March      6                                                (8.25 + 11.25)/2 = 9.75
                             (12 + 6 + 5 + 22)/4 = 11.25
April      5                                                (11.25 + 12.75)/2 = 12
                             (6 + 5 + 22 + 18)/4 = 12.75
May        22                                               (12.75 + 14.5)/2 = 13.625
                             (5 + 22 + 18 + 13)/4 = 14.5
June       18                                               (14.5 + 15)/2 = 14.75
                             (22 + 18 + 13 + 7)/4 = 15
July       13                                               (15 + 11.75)/2 = 13.375
                             (18 + 13 + 7 + 9)/4 = 11.75
August     7                                                (11.75 + 9.75)/2 = 10.75
                             (13 + 7 + 9 + 10)/4 = 9.75
September  9                                                (9.75 + 8.5)/2 = 9.125
                             (7 + 9 + 10 + 8)/4 = 8.5
October    10                                               (8.5 + 10.5)/2 = 9.5
                             (9 + 10 + 8 + 15)/4 = 10.5
November   8
December   15
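The table can be generated in two steps, exactly as laid out above: first the ordinary 4-moving means (which fall between months), then the average of each neighbouring pair (which lands on a month). This is a minimal Python sketch; `centred_moving_mean` is a hypothetical name.

```python
births = [10, 12, 6, 5, 22, 18, 13, 7, 9, 10, 8, 15]  # January to December


def centred_moving_mean(values, window):
    """Even-window moving mean with centring; unsmoothable positions are None."""
    if window % 2 != 0:
        raise ValueError("window must be even")
    # Step 1: ordinary moving means; means[j] covers values[j:j + window]
    means = [sum(values[j:j + window]) / window
             for j in range(len(values) - window + 1)]
    # Step 2: average neighbouring pairs so each result sits on a data point
    half = window // 2
    smoothed = [None] * len(values)
    for j in range(len(means) - 1):
        smoothed[j + half] = (means[j] + means[j + 1]) / 2
    return smoothed


print(centred_moving_mean(births, 4))
```

With a window of 4, two months are lost at each end (January, February, November and December), as in the table.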
Exercise 5B
Note: A CAS calculator may be used to construct the time series plots.
2 The table below gives the temperature (°C) recorded at a weather station at 3.00 p.m.
each day for a week.
t 1 2 3 4 5 6 7 8 9
y 10 12 8 4 12 8 10 18 2
3-moving mean y – –
5-moving mean y – – – –
Day 1 2 3 4 5 6 7 8 9 10
Temperature (°C) 24 27 28 40 22 23 22 21 25 26
3-moving mean 26.3 30.0 22.3 22.7 24.0
5-moving mean 28.2 27 23.4
5 The value of the Australian dollar in US dollars (exchange rate) over 10 days is given below.
Day 1 2 3 4 5 6 7 8 9 10
Exchange rate 0.743 0.754 0.737 0.751 0.724 0.724 0.712 0.735 0.716 0.711
3-moving mean 0.745 0.747 0.733 0.721 0.721
5-moving mean 0.738 0.730 0.722
Example 10 7 Use the time series data in the table in Question 6 to find:
a the four-mean smoothed y-value centred at t = 3
b the four-mean smoothed y-value centred at t = 6
c the six-mean smoothed y-value centred at t = 3
d the six-mean smoothed y-value centred at t = 6
8 The table below gives the minimum daily temperature (°C) recorded at a weather
station over a 10-day period.
Day 1 2 3 4 5 6 7 8 9 10
Temperature (°C) 8.9 3.5 11.6 14.1 12.5 13.3 6.4 8.5 9.1 4.5
Month Jan Feb Mar Apr May Jun July Aug Sep Oct Nov Dec
Number of complaints 10 12 6 5 22 18 13 7 9 10 8 15
2-moving mean 10.0 7.3 9.5 16.8 17.8 12.8 9.0 9.3 10.3
10 The table below gives the amount of rain (in mm) recorded each month at a weather
Month Apr May Jun July Aug Sep Oct Nov Dec
Rainfall (mm) 21.4 40.5 52.3 42.1 58.9 79.9 81.5 54.3 50.0
4-moving mean 43.8 53.4 67.1 67.5
The numbers of emails he received on Thursday, Friday and Saturday are not shown.
The five-mean smoothed number of emails he received on Friday is 39.
The three-mean smoothed number of emails he received on Friday is:
A 36 B 39 C 40 D 42 E 45
12 The table shows the closing price (price) of a company’s shares on the stock market
over a 10 day period.
Day 1 2 3 4 5 6 7 8 9 10
Price ($) 0.99 1.05 1.10 1.25 1.29 1.37 2.42 1.95 2.05 2.35
The six-mean smoothed with centring closing share price on Day 6 is closest to:
A $1.56 B $1.62 C $1.64 D $1.88 E $1.72
The time series plot below shows the amount that Arnold saved each month (in dollars) over
a 12 month period.
[Time series plot: Amount saved ($) by month]
13 If he saved a total of $831 over the period from May to September, the five-mean
smoothed amount that he saved in July is closest to:
A $277 B $190 C $182 D $152 E $166
14 If seven-mean smoothing is used to smooth this time series plot, the number of
smoothed data points would be:
A 3 B 4 C 5 D 6 E 7
Another simple and convenient method of smoothing a time series is to use moving median
smoothing. The advantage of this method is that it can be done directly on the graph without
needing to know the exact values of each data point. However, before smoothing a time
series plot graphically using moving medians we will first need to know how to locate
medians graphically.
Note that, in this course, median smoothing is restricted to smoothing over an odd number of points, so
centring is not required.
Construct a three-median smoothed plot of the time series plot shown opposite.
[Time series plot: Number of births by month, raw data]
Explanation Solution
1 Locate on the time series plot the median of the first three points (Jan, Feb, Mar).
    [Plot: raw data with the first 3-median point marked at the middle month and the middle number of births]
2 Continue this process by moving onto the next three points to be smoothed (Feb, Mar, Apr).
  Mark their medians on the graph, and continue the process until you run out of groups of three.
    [Plot: raw data with the 3-median points marked]
3 Join the median points with line segments – see opposite.
    [Plot: raw data and the 3-median smoothed plot]
Construct a five-median smoothed plot of the time series plot shown opposite.
[Time series plot: Number of births by month, raw data]
Note: The starting point for median smoothing is a time series plot and you smooth directly onto the plot.
Copies of the plots in this section can be accessed through the skillsheet icon in the Interactive Textbook.
Explanation Solution
1 Locate on the time series plot the median of the first five points (Jan, Feb, Mar, Apr, May), as shown.
    [Plot: raw data with the first 5-median point marked at the middle month and the middle number of births]
2 Then move onto the next five points to be smoothed (Feb, Mar, Apr, May, Jun). Repeat
  the process until you run out of groups of five points. The five-median points are then
  joined up with line segments to give the final smoothed plot, as shown.
    [Plot: raw data and the 5-median smoothed plot]
Note: The five-median smoothed plot is much smoother than the three-median smoothed plot.
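The course does median smoothing graphically, but the idea is easy to check numerically. The sketch below applies three- and five-moving medians to the births figures from the moving mean examples; that these plots show those same figures is an assumption, and `moving_median` is a hypothetical name. It uses `statistics.median` from the Python standard library.

```python
from statistics import median

births = [10, 12, 6, 5, 22, 18, 13, 7, 9, 10, 8, 15]  # January to December


def moving_median(values, window):
    """Odd-window moving median; positions without enough neighbours are None."""
    if window % 2 == 0:
        raise ValueError("median smoothing here uses an odd window")
    half = window // 2
    smoothed = [None] * len(values)
    for i in range(half, len(values) - half):
        smoothed[i] = median(values[i - half:i + half + 1])
    return smoothed


print(moving_median(births, 3))
print(moving_median(births, 5))
```

Because a median ignores how extreme the largest value is, the 22 births in May pull the five-median series up far less than they pull the five-moving mean.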
Exercise 5C
Note: Copies of all plots in this section can be accessed through the skillsheet icon in the Interactive Textbook.
[Time series plots c and d]
(in °C) in a city over a period of 10 consecutive days.
[Time series plot: Temperature (°C) by day, days 0–10]
Use three-median smoothing to determine the smoothed temperature for:
a day 4 b day 8
Example 13 4 Use the time series plot in Question 2 to find the five-median smoothed
temperature for:
a day 4 b day 8
Australian dollar in US dollars (the exchange rate) over a period of 10 consecutive days in 2009.
[Time series plot: Exchange rate by Day, days 0–10]
Use five-moving median smoothing to graphically smooth the plot and comment on the smoothed plot.
6 Use the graphical approach to smooth the time series plot below using:
a three-moving median smoothing b five-moving median smoothing.
[Time series plot: Number of whales (’000s), 1920–1985]
a Find the median value of the percentage growth in GDP over the 13 year period.
b Smooth the time series graph using:
i three-moving median smoothing ii five-moving median smoothing.
c What conclusions can be drawn about the variation in GDP growth from these
smoothed time series plots?
The time series plot below shows the amount that Lulu saved each month (to the nearest $)
over a 12 month period.
[Time series plot: Amount saved ($) by month, Jan–Dec]
8 During the year shown in the time series plot the median monthly amount Lulu saved is
closest to:
A $180 B $155 C $130 D $190 E $200
5D Seasonal indices
Learning intentions
• To be able to interpret the meaning of seasonal indices.
• To be able to seasonally adjust data using seasonal indices.
• To be able to calculate seasonal indices from time series data.
When the data is considered to have a seasonal component, it is often necessary to remove
this component so any underlying trend is clearer. The process of removing the seasonal
component is called deseasonalising the data. To do this we need to calculate seasonal
indices. Seasonal indices tell us how a particular season (generally a day, month or quarter)
compares to the average season.
Month           Jan  Feb  Mar  Apr  May   Jun   Jul  Aug  Sept  Oct   Nov  Dec  Total
Seasonal index  1.1  1.2  1.1  1.0  0.95  0.95  0.9  0.9  0.85  0.85  1.1  1.1  12.0
Key fact 1
Seasonal indices are calculated so that their average is 1. This means that the sum of the
seasonal indices equals the number of seasons. Thus, if the seasons are months, the seasonal
indices add to 12. If the seasons are quarters, then the seasonal indices add to 4, and so on.
Key fact 2
Seasonal indices tell us how a particular season (generally a day, month or quarter) compares
to the average season.
For example:
The seasonal index for unemployment for the month of February is 1.2 or 120%.
This tells us that February unemployment figures tend to be 20% higher than the monthly
average. Remember, the average seasonal index is 1 or 100%.
The seasonal index for August is 0.90 or 90%.
This tells us that the August unemployment figures tend to be only 90% of the monthly
average. Alternatively, August unemployment figures are 10% lower than the monthly average.
Suppose that the seasonal indices (SI) for electricity usage in Esse’s home are as shown in
the table:
a The seasonal index for Winter is 1.26. This tells us that Esse’s electricity usage in
Winter is typically 26% higher than the average season.
b The seasonal index for Spring is 0.64. This tells us that Esse’s electricity usage in
Spring is typically 36% lower than the average season.
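Both key facts can be checked with a few lines of Python. This sketch assumes the monthly table above is the unemployment table the examples refer to (its February and August values match), and the helper name `compare_to_average` is hypothetical.

```python
# Monthly seasonal indices from the table above (assumed to be for unemployment)
indices = {"Jan": 1.1, "Feb": 1.2, "Mar": 1.1, "Apr": 1.0, "May": 0.95, "Jun": 0.95,
           "Jul": 0.9, "Aug": 0.9, "Sep": 0.85, "Oct": 0.85, "Nov": 1.1, "Dec": 1.1}

# Key fact 1: the indices average 1, so they sum to the number of seasons
total = sum(indices.values())
print(round(total, 2), len(indices))  # 12.0 12


def compare_to_average(si):
    """Percentage by which a season sits above (+) or below (-) the average season."""
    return (si - 1) * 100


# Key fact 2: February and August from the table, Winter and Spring from Esse's home
for label, si in [("Feb", indices["Feb"]), ("Aug", indices["Aug"]),
                  ("Winter", 1.26), ("Spring", 0.64)]:
    print(label, round(compare_to_average(si)))
```

A positive result means the season typically sits above the average season and a negative result means it sits below it.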
Deseasonalising data
Time series data are deseasonalised using the relationship:
$$\text{deseasonalised figure} = \frac{\text{actual figure}}{\text{seasonal index}}$$
The rule for determining deseasonalised data values can also be used to reseasonalise data –
that is, convert a deseasonalised value into an actual data value.
Reseasonalising data
Time series data are reseasonalised using the rule:
actual figure = deseasonalised figure × seasonal index
The seasonal indices (SI) for cold drink sales for Imogen’s kiosk are as shown in the table.
Summer Autumn Winter Spring
1.75   0.66   0.46   1.13
a If the actual cold drink sales last summer totalled $21 653, what is the
deseasonalised sales figure for that time period?
b If the deseasonalised cold drink sales last spring totalled $10 870, what were the actual
sales for that time period?
Explanation Solution
a To deseasonalise we divide by the seasonal index for Summer (1.75).
    Deseasonalised sales = 21 653/1.75 = $12 373.14
b To find the actual sales we multiply by the seasonal index for Spring (1.13).
    Actual sales = 10 870 × 1.13 = $12 283.10
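The two rules are inverses of each other, which a short Python sketch makes explicit. The function names are hypothetical; the seasonal indices and sales figures are those of Imogen’s kiosk.

```python
def deseasonalise(actual, seasonal_index):
    """Deseasonalised figure = actual figure / seasonal index."""
    return actual / seasonal_index


def reseasonalise(deseasonalised, seasonal_index):
    """Actual figure = deseasonalised figure x seasonal index."""
    return deseasonalised * seasonal_index


# Seasonal indices for cold drink sales at Imogen's kiosk
si = {"Summer": 1.75, "Autumn": 0.66, "Winter": 0.46, "Spring": 1.13}

summer_deseasonalised = deseasonalise(21653, si["Summer"])
spring_actual = reseasonalise(10870, si["Spring"])
print(f"{summer_deseasonalised:.2f}")  # 12373.14
print(f"{spring_actual:.2f}")          # 12283.10
```

Deseasonalising a figure and then reseasonalising it with the same index returns the original figure.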
Consider the table below which gives the seasonal indices for heater sales at a discount
Explanation Solution
a 1 Insert the seasonal index for summer into the rule.
      In general for summer:
      deseasonalised sales = actual sales/seasonal index
                           = actual sales/0.65
                           = (1/0.65) × actual sales
                           = 1.538 × actual sales
  2 Convert 1.538 into a percentage increase or decrease. Write the answer in a sentence.
      Multiplying the actual sales by 1.538 is the equivalent of increasing the actual sales by 53.8%.
      To correct for seasonality, the actual sales should be increased by 53.8%.
b 1 Insert the seasonal index for winter into the rule.
      In general for winter:
      deseasonalised sales = actual sales/seasonal index
                           = actual sales/1.35
                           = (1/1.35) × actual sales
                           = 0.741 × actual sales
  2 Convert 0.741 into a percentage increase or decrease. Write the answer in a sentence.
      Multiplying the actual sales by 0.741 is the equivalent of decreasing the actual sales by
      (100% − 74.1%) = 25.9%.
      To correct for seasonality, the actual sales should be decreased by 25.9%.
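The percentage correction depends only on the seasonal index: deseasonalising multiplies by 1/SI, so the change is (1/SI − 1) × 100%. A minimal Python sketch, with the hypothetical helper name `seasonal_correction`:

```python
def seasonal_correction(seasonal_index):
    """Percentage change (+ increase, - decrease) that deseasonalises an actual figure."""
    return (1 / seasonal_index - 1) * 100


# Heater sales seasonal indices for summer and winter from the example
for season, si in [("summer", 0.65), ("winter", 1.35)]:
    change = seasonal_correction(si)
    direction = "increased" if change > 0 else "decreased"
    print(f"{season}: actual sales should be {direction} by {abs(change):.1f}%")
```

For the heater sales this reports an increase of 53.8% for summer and a decrease of 25.9% for winter. Any season with an index above 1 is corrected downwards, and any season below 1 is corrected upwards.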
Explanation Solution
1 The seasons are quarters. Write the formula in terms of quarters.
    Seasonal index = value for season/seasonal average
2 Find the quarterly average for the year.
    Quarterly average = (920 + 1085 + 1241 + 446)/4 = 923
3 The seasonal index (SI) for each quarter is the ratio of that quarter’s sales to the average quarter.
    SI(Summer) = 920/923 = 0.997
    SI(Autumn) = 1085/923 = 1.176
    SI(Winter) = 1241/923 = 1.345
    SI(Spring) = 446/923 = 0.483
4 Check that the seasonal indices sum to 4 (the number of seasons).
    Check: 0.997 + 1.176 + 1.345 + 0.483 = 4.001
    The slight difference is due to rounding.
5 Write out your answers as a table of the seasonal indices.
    Seasonal indices
    Summer Autumn Winter Spring
    0.997  1.176  1.345  0.483
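The calculation for a single year can be sketched in Python as follows; `seasonal_indices` is a hypothetical name, and the quarterly figures are the year 1 values from the worked solution, in the order Summer, Autumn, Winter, Spring.

```python
def seasonal_indices(values):
    """Seasonal index for each season = value for season / seasonal average."""
    average = sum(values) / len(values)
    return [v / average for v in values]


year1 = [920, 1085, 1241, 446]  # Summer, Autumn, Winter, Spring
si = seasonal_indices(year1)
print([round(x, 3) for x in si])
print(round(sum(si), 3))
```

Because the sketch keeps the indices unrounded, they sum to 4 up to floating-point error; the 4.001 in the worked solution comes from rounding each index to three decimal places before adding.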
The next example illustrates how seasonal indices are calculated with 3 years’ data. While
the process looks more complicated, we just repeat what we did in the previous example
three times and average the results for each season at the end.
Suppose that Mikki has 3 years of data, as shown. Use the data to calculate seasonal
indices, rounded to two decimal places.
The strategy is as follows:
Calculate the seasonal indices for years 1, 2 and 3 separately. As we already have the
seasonal indices for year 1 in the previous example we will save ourselves some time
by simply quoting the result.
Average the three sets of seasonal indices to obtain a single set of seasonal indices.
Explanation Solution
1 Write down the result for year 1.
    Year 1 seasonal indices:
    Summer Autumn Winter Spring
    0.997  1.176  1.345  0.483
2 Now calculate the seasonal indices for year 2.
  a The seasons are quarters. Write the formula in terms of quarters.
      Seasonal index = value for quarter/quarterly average
  b Find the quarterly average for the year.
      Quarterly average = (1035 + 1180 + 1356 + 541)/4 = 1028
  c Work out the seasonal index (SI) for each time period.
      SI(Summer) = 1035/1028 = 1.007
      SI(Autumn) = 1180/1028 = 1.148
      SI(Winter) = 1356/1028 = 1.319
      SI(Spring) = 541/1028 = 0.526
  d Check that the seasonal indices sum to 4.
      Check: 1.007 + 1.148 + 1.319 + 0.526 = 4.000
  e Write out your answers as a table of the seasonal indices.
      Year 2 seasonal indices:
      Summer Autumn Winter Spring
      1.007  1.148  1.319  0.526
3 Now calculate the seasonal indices for year 3.
  a Find the quarterly average for the year.
      Quarterly average = (1299 + 1324 + 1450 + 659)/4 = 1183
  b Work out the seasonal index (SI) for each time period.
      SI(Summer) = 1299/1183 = 1.098
      SI(Autumn) = 1324/1183 = 1.119
      SI(Winter) = 1450/1183 = 1.226
      SI(Spring) = 659/1183 = 0.557
  c Check that the seasonal indices sum to 4.
      Check: 1.098 + 1.119 + 1.226 + 0.557 = 4.000
  d Write out your answers as a table of the seasonal indices.
      Year 3 seasonal indices:
      Summer Autumn Winter Spring
      1.098  1.119  1.226  0.557
4 Find the 3-year averaged seasonal indices by averaging the seasonal indices for each season.
    Final seasonal indices:
    SI(Summer) = (0.997 + 1.007 + 1.098)/3 = 1.03
    SI(Autumn) = (1.176 + 1.148 + 1.119)/3 = 1.15
    SI(Winter) = (1.345 + 1.319 + 1.226)/3 = 1.30
    SI(Spring) = (0.483 + 0.526 + 0.557)/3 = 0.52
5 Check that the seasonal indices sum to 4.
    Check: 1.03 + 1.15 + 1.30 + 0.52 = 4.00
6 Write out your answers as a table of the seasonal indices.
    Summer Autumn Winter Spring
    1.03   1.15   1.30   0.52
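The whole three-year procedure fits in a few lines of Python. This sketch (with the hypothetical name `seasonal_indices`) computes each year’s indices and then averages them season by season; `zip(*yearly)` groups the same season from each year together. Unlike the worked solution, it averages unrounded indices rather than indices rounded to three decimal places, but the results agree to two decimal places.

```python
def seasonal_indices(values):
    """Seasonal index for each season = value for season / seasonal average."""
    average = sum(values) / len(values)
    return [v / average for v in values]


# Quarterly sales (Summer, Autumn, Winter, Spring) from the worked solution
years = [
    [920, 1085, 1241, 446],   # year 1
    [1035, 1180, 1356, 541],  # year 2
    [1299, 1324, 1450, 659],  # year 3
]

yearly = [seasonal_indices(y) for y in years]
# Average each season's index across the three years
final = [sum(season) / len(season) for season in zip(*yearly)]
print([round(x, 2) for x in final])
```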
The quarterly sales figures for Mikki’s shop over a 3-year period are given below.
Explanation Solution
1 To deseasonalise each sales figure in the table, divide by the appropriate seasonal index.
  For example, for summer, divide the figures in the ‘Summer’ column by 1.03.
  Round results to the nearest whole number.
    Deseasonalised Summer sales:
    Year 1: 920/1.03 = 893
    Year 2: 1035/1.03 = 1005
    Year 3: 1299/1.03 = 1261
2 Repeat for the other seasons. Deseasonalised sales figures
Year Summer Autumn Winter Spring
1 893 943 955 858
2 1005 1026 1043 1040
3 1261 1151 1115 1267
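The deseasonalised table can be produced in one pass over the actual sales, dividing each figure by the index for its column. A minimal Python sketch using the quarterly figures and seasonal indices from the examples above:

```python
seasons = ["Summer", "Autumn", "Winter", "Spring"]
si = {"Summer": 1.03, "Autumn": 1.15, "Winter": 1.30, "Spring": 0.52}
sales = {
    1: [920, 1085, 1241, 446],
    2: [1035, 1180, 1356, 541],
    3: [1299, 1324, 1450, 659],
}

# Divide each actual figure by its season's index, then round to a whole number
deseasonalised = {
    year: [round(value / si[s]) for value, s in zip(row, seasons)]
    for year, row in sales.items()
}
for year, row in deseasonalised.items():
    print(year, row)
```

Python’s built-in `round` rounds exact halves to the nearest even number; none of these quotients ends in exactly .5, so the results match ordinary rounding.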
Why deseasonalise?
The purpose of removing the seasonality component of a time series is generally so that
any trend in the time series is clearer. Consider again the actual customer data, and the
deseasonalised customer data from Example 18, both of which are shown in the following
time series plots.
[Time series plots: Actual customers by quarter (left) and Deseasonalised customers by quarter (right)]
It is hard to see from the first plot whether there has been any growth in Mikki’s business,
but the deseasonalised plot reveals a clear underlying trend in the data.
It is common to deseasonalise time series data before you fit a trend line. We will consider
this further in the next section.
Exercise 5D
Month Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Sales ($’000s) 9.6 10.5 8.6 7.1 6.0 5.4 6.4 7.2 8.3 7.4
Seasonal index 1.2 1.3 1.1 1.0 1.0 0.9 0.8 0.7 0.9 1.0 1.1
Example 15 2 a Find the deseasonalised sales figure (in $’000s) for March, giving your answer
rounded to one decimal place.
b Find the deseasonalised sales figure (in $’000s) for June, giving your answer
rounded to one decimal place.
3 a The deseasonalised sales figure (in $’000s) for August is 5.6. Find the actual sales
(in $’000s), giving your answer rounded to one decimal place.
b The deseasonalised sales figure (in $’000s) for April is 6.9. Find the actual sales (in
$’000s), giving your answer rounded to one decimal place.
Example 16 4 a By what percentage should the sales in August be increased or decreased in order to
correct for seasonality? Give your answer as a percentage rounded to one decimal place.
b By what percentage should the sales in February be increased or decreased in order
to correct for seasonality? Give your answer as a percentage rounded to one decimal place.
5 The table below shows the quarterly newspaper sales (in $’000s) of a corner store.
Also shown are the seasonal indices for newspaper sales for the first, second and third quarters.
Quarter 1 Quarter 2 Quarter 3 Quarter 4
Sales 1060 1868 1642
Seasonal index 0.8 0.7 1.3
7 The number of waiters employed by a restaurant chain in each quarter of 1 year, along
with some seasonal indices that have been calculated from the previous years’ data, are
given in the following table.
9 The table below records the monthly visitors (in ’000s) to a museum over one year.
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
12 13 14 17 18 15 9 10 8 11 15 20
Use the data to determine the seasonal indices for the 12 months. Give your results
rounded to two decimal places.
Example 18 10 The table below records the monthly sales (in $’000s) for a shop over a two-year period.
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
22 19 25 23 20 18 20 15 14 11 23 30
21 20 23 25 22 17 19 17 16 11 25 31
Use the data to determine the seasonal indices for the 12 months. Give your results
rounded to two decimal places.
Example 19 11 The daily number of cars carried on a car ferry service each day over a two-week
period, together with the daily seasonal indices, are shown in the table below:
a Use the seasonal indices to deseasonalise the data, rounding the answers to the
nearest whole number.
b Use your calculator to construct a time series plot of the deseasonalised data.
12 The numbers of retail job vacancies advertised on an online job board each quarter in
each of three consecutive years are shown in the following table.
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Seasonal index 1.36 1.19 1.05 1.01 0.93 0.82 0.75 0.68 0.87 0.9 1.18 1.26
Number of customers 934 836 736 716 649 554 541 598 626 826 873
13 To correct the number of customers in May for seasonality, the actual number of
customers should be:
A increased by 93.0% B decreased by 93.0% C decreased by 7.0%
D decreased by 7.5% E increased by 7.5%
14 To correct the number of customers in November for seasonality, the actual number of
customers should be:
A increased by 18.0% B decreased by 84.7% C increased by 15.3%
D decreased by 15.3% E decreased by 18.0%
15 If the deseasonalised number of customers for August is 700, the actual number of
customers in that month is closest to:
A 1029 B 768 C 570 D 607 E 476
16 The table below records the monthly average electricity cost (in dollars) for a home.
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
223 190 253 236 201 189 203 153 143 111 235 307
17 The table below shows the room occupancy rate for a chain of hotels over the summer,
autumn, winter and spring quarters for the years 2020–2022.
The table below shows the number of female students in Victoria enrolled in at least one
subject in the Mathematics learning area at year 12 over the period 2010–18. Fit a trend
line to the data, and interpret the slope.
Year 2010 2011 2012 2013 2014 2015 2016 2017 2018
Number 76260 78707 79797 79952 78237 80858 81587 83820 84069
Explanation Solution
1 Construct a time series plot
Using a trend line fitted to a time series plot to make predictions about future values is
known as trend line forecasting.
Example 21 Forecasting
How many female students in Victoria do we predict will be enrolled in at least one
subject in the Mathematics learning area at year 12 in 2026 if the same increasing trend
continues? Give your answer rounded to the nearest whole number.
Explanation Solution
Substitute 2026 in the equation determined using least squares regression, and round
to the nearest whole number.
    number of female students = −1 633 580 + 851.017 × year
                              = −1 633 580 + 851.017 × 2026
                              = 90 580 to the nearest whole number.
Note: As with any prediction involving extrapolation, the results obtained when predicting well beyond
the range of the data should be treated with caution.
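The least squares line can be computed directly from the enrolment table with the usual formulas, slope = Sxy/Sxx and intercept = ȳ − slope × x̄. The sketch below writes these out by hand rather than relying on a particular library; the function name `least_squares` is hypothetical.

```python
years = [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018]
number = [76260, 78707, 79797, 79952, 78237, 80858, 81587, 83820, 84069]


def least_squares(x, y):
    """Return (intercept, slope) of the least squares line y = a + b x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


a, b = least_squares(years, number)
print(round(a), round(b, 3))
print(round(a + b * 2026))
```

With unrounded coefficients the forecast for 2026 is about 90 577. The 90 580 in the worked solution comes from substituting the rounded coefficients, a reminder that rounding the intercept matters when the x-values are as large as calendar years.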
The deseasonalised quarterly sales data from Mikki’s shop are shown below.
Quarter 1 2 3 4 5 6 7 8 9 10 11 12
Sales 893 943 955 858 1005 1026 1043 1040 1261 1151 1115 1267
3 Write down the equation of the least squares line with the intercept and slope rounded to 4
  significant figures.
    Deseasonalised sales = 838.0 + 32.07 × quarter
4 Interpret the slope in terms of the variables involved.
    Over the 3-year period, on average sales at Mikki’s shop increased by 32.07 sales per quarter.
What sales do we predict for Mikki’s shop in the winter of year 4? (Because many items
have to be ordered well in advance, retailers often need to make such decisions.)
Explanation Solution
1 Substitute the appropriate value for the time period in the equation for the trend line.
  Since summer year 1 is quarter 1, then winter year 4 is quarter 15.
    Deseasonalised sales = 838.0 + 32.07 × quarter
                         = 838.0 + 32.07 × 15
                         = 1319.05
2 To obtain the actual predicted sales figure, reseasonalise the predicted value by
  multiplying this value by the seasonal index for winter, 1.30.
    Actual sales prediction for winter of year 4 = 1319.05 × 1.30
                                                 = 1714.765
                                                 = 1715 (to the nearest whole number)
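The full forecasting chain (fit a trend line to the deseasonalised data, forecast the deseasonalised value, then reseasonalise) can be sketched in Python as follows. The `least_squares` helper is hypothetical and repeats the textbook formulas; the sales figures and the winter index of 1.30 come from Mikki’s examples.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the least squares line y = a + b x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


quarters = list(range(1, 13))
sales = [893, 943, 955, 858, 1005, 1026, 1043, 1040, 1261, 1151, 1115, 1267]
intercept, slope = least_squares(quarters, sales)

winter_index = 1.30
quarter = 15  # winter of year 4, counting summer of year 1 as quarter 1
deseasonalised_forecast = intercept + slope * quarter
actual_forecast = deseasonalised_forecast * winter_index  # reseasonalise
print(round(intercept, 1), round(slope, 2))
print(round(actual_forecast))
```

Using unrounded coefficients gives an actual forecast of about 1714.7, which still rounds to the 1715 found in the worked solution.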
Exercise 5E
Year 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019
Number 488 490 510 538 569 569 595 619 632 645
The time series plot of the data is shown below.
[Time series plot: Commencing university students by year]
a Comment on the
b Fit a least squares regression trend line to the data, giving
2 The table below shows the percentage of total retail sales that were made in department
stores over an 11-year period:
Year 1 2 3 4 5 6 7 8 9 10 11
Sales (%) 12.3 12.0 11.7 11.5 11.0 10.5 10.6 10.7 10.4 10.0 9.4
c Fit a trend line to the time series plot, find its equation and interpret the slope. Give
your answer rounded to 3 significant figures.
d Use the trend line to forecast the percentage of retail sales which will be made by
department stores in year 15.
3 The median ages of mothers in Australia over the years 2010–2020 are shown below.
Year 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
Age 30.7 30.7 30.7 30.8 30.8 30.9 31.1 31.2 31.3 31.4 31.5
a Fit a least squares regression trend line to the data, and interpret the slope. (Give the
values of the coefficients rounded to 3 significant figures.)
b Use the trend line to forecast the median age of mothers in
Australia in 2030. Explain why this prediction is not likely to be reliable.
4 The average weekly earnings (in dollars) in Australia during the period 2014–2021 are
given in the following table.
Quarter 1 2 3 4
Seasonal index 0.90 0.81 1.11 1.18
6 The quarterly seasonal indices for the sales of boogie boards in a surf shop are as follows.
Seasonal index 1.13 0.47 0.62 1.77
The actual sales of the boogie boards over a 2-year period are given in the table.
a Use the seasonal indices to calculate the deseasonalised sales figures for this period
to the nearest whole number.
b Use a CAS calculator to plot the actual sales figures and the deseasonalised sales
figures for this period and comment on the plot.
c Fit a trend line to the deseasonalised sales data. Write the slope and intercept
rounded to three significant figures.
d Use the relationship calculated in c, together with the seasonal indices, to forecast
the sales for the first quarter of year 4 (you will need to reseasonalise here).
The predicted number of actual visitors for the April-June quarter in 2025 is closest to:
A 42070 B 42356 C 38544 D 46545 E 37501
8 An electrical goods retailer knows that the sales of air conditioners are seasonal. A
least squares regression line has been fitted to the data collected by the retailer in 2021
and 2022, and the equation is:
deseasonalised number of air conditioners = 197 + 1.2 × month
where month number one is January 2021.
The monthly seasonal indices for air conditioner sales are shown in the table below.
Month Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Seasonal index 1.36 1.19 1.05 1.01 0.93 0.82 0.75 0.68 0.87 0.90 1.18 1.26
The predicted number of actual sales for November 2025 is closest to:
A 268 B 227 C 299 D 333 E 316
Key ideas and chapter summary
Time series data Time series data are a special case of bivariate data, where the
explanatory variable is the time at which the values of the response
variable were recorded.
Time series plot A time series plot is a bivariate plot where the values of the response
variable are plotted in time order. Points in a time series plot are joined
by line segments.
Cycles Cycles are present when there is a periodic movement in a time series.
The period is the time it takes for one complete up and down movement
in the time series plot. This term is generally reserved for periodic
movements with a period greater than one year.
Outliers Outliers are present when there are individual values that stand out
from the general body of data.
Moving mean smoothing In moving mean smoothing, each original data value is replaced by
the mean of itself and a number of data values on either side. When
smoothing over an even number of data points, centring is required to
ensure the smoothed mean is centred on the chosen point in time.
Seasonal indices Seasonal indices are used to quantify the seasonal variation in a time series.
Deseasonalise The process of accounting for the effects of seasonality in a time series
is called deseasonalisation.
Reseasonalise The process of converting deseasonalised data back into its original form is
called reseasonalisation.
Trend line forecasting Trend line forecasting uses the equation of a trend line to make
predictions about the future.
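The trend line forecasting and reseasonalising steps above can be sketched in a few lines of Python. This is a minimal illustration, not a method quoted from the text. `month_number` and `forecast_actual` are hypothetical helper names. The figures come from Question 8: the trend line is 197 + 1.2 × month, month 1 is January 2021, and the November seasonal index is 1.18.

```python
def month_number(year, month, start_year):
    """Month number when month 1 is January of start_year (month: 1 = Jan ... 12 = Dec)."""
    return (year - start_year) * 12 + month


def forecast_actual(intercept, slope, time, seasonal_index):
    """Forecast a deseasonalised value from the trend line, then reseasonalise it."""
    deseasonalised = intercept + slope * time   # value read from the trend line
    return deseasonalised * seasonal_index      # reseasonalise: multiply by the index


# Question 8: deseasonalised sales = 197 + 1.2 x month; seasonal index for November = 1.18
t = month_number(2025, 11, start_year=2021)
prediction = forecast_actual(197, 1.2, t, 1.18)
print(t, round(prediction))
```

The trend line is fitted to deseasonalised data. Its output must therefore be multiplied by the seasonal index before it is reported as an actual value.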
Skills checklist
Download this checklist from the Interactive Textbook, then print it and fill it out to check
your skills.
5A I can identify outliers in a time series plot.
Multiple-choice questions
1 The time series plot below shows quarterly house sales for a real estate agency over a
three-year period.
[Time series plot: House sales by quarter]
2 The time series plot below shows the annual profit (in $000) for a manufacturing
[Time series plot: Annual profit ($000), 1995 to 2020]
C seasonality with irregular fluctuations
D increasing trend with an outlier
E increasing trend with a structural change
Time period 1 2 3 4 5 6
Data value 2.3 3.4 4.4 2.7 5.1 3.7
5 The centred two-mean smoothed value for time period 5 is closest to:
A 2.7 B 3.6 C 3.9 D 4.0 E 4.2
6 The centred four-mean smoothed value for time period 4 is closest to:
A 2.7 B 3.6 C 3.9 D 4.1 E 4.2
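A centred moving mean can be sketched as follows. The sketch assumes the data are stored in time order and that time periods are numbered from 1. `centred_moving_mean` is a hypothetical name. For an even number of points, the function averages the two k-point means that straddle the chosen period, which is the centring step described in the key ideas.

```python
def centred_moving_mean(values, k, t):
    """k-mean smoothed value at time period t (periods numbered from 1).

    Odd k: the mean of the k values centred on period t.
    Even k: the mean of the two k-point means that straddle period t (centring).
    """
    i = t - 1                 # list position of period t
    half = k // 2
    if i - half < 0 or i + half >= len(values):
        raise ValueError("not enough data on both sides of period t")
    if k % 2 == 1:
        return sum(values[i - half : i + half + 1]) / k
    earlier = sum(values[i - half : i + half]) / k        # periods t - k/2 to t + k/2 - 1
    later = sum(values[i - half + 1 : i + half + 1]) / k  # periods t - k/2 + 1 to t + k/2
    return (earlier + later) / 2


data = [2.3, 3.4, 4.4, 2.7, 5.1, 3.7]     # time periods 1 to 6
print(centred_moving_mean(data, 2, 5))     # centred two-mean at period 5
print(centred_moving_mean(data, 4, 4))     # centred four-mean at period 4
```

The same function also handles the six-mean questions on share prices. For those, pass the closing prices as `values` and use `k=6`.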
[Time series plot: Room occupancy rate (%)]
9 The seasonal indices for the number of customers at a restaurant are as follows.
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1.0 p 1.1 0.9 1.0 1.0 1.2 1.1 1.1 1.1 1.0 0.7
The value of p is:
A 0.5 B 0.7 C 0.8 D 1.0 E 1.2
10 The table shows the closing price (price) of a company’s shares on the stock market
over a 10-day period.
Day 1 2 3 4 5 6 7 8 9 10
Price ($) 2.85 2.80 2.78 2.40 2.80 3.15 3.42 3.95 4.05 3.35
The centred six-mean smoothed closing share price on Day 5 is closest to:
A $2.80 B $2.89 C $2.91 D $2.99 E $3.08
11 If five-mean smoothing was used to smooth this time series, the number of smoothed
values would be:
A 5 B 6 C 7 D 8 E 9
12 Suppose that Lyn spent a total of $427 on dining out over the period from January to
March, and then another $230 over the period April-May. The five-mean smoothed
amount that she spent in March is closest to:
A $115 B $129 C $131 D $142 E $329
13 The number of bathing suits sold one summer is 432. The deseasonalised number is
closest to:
A 432 B 240 C 778 D 540 E 346
14 The deseasonalised number of bathing suits sold one winter was 380. The actual
number was closest to:
A 114 B 133 C 152 D 380 E 1267
15 The seasonal index for spring tells us that, over time, the number of bathing suits sold
in spring tends to be:
A 50% less than the seasonal average
B 15% less than the seasonal average
C the same as the seasonal average
D 15% more than the seasonal average
E 50% more than the seasonal average
16 To correct for seasonality, the actual number of bathing suits sold in autumn should be:
A reduced by 50% B reduced by 40% C increased by 40%
D increased by 150% E increased by 250%
17 The number of visitors to an information centre each quarter was recorded for one year.
The results are tabulated below.
Quarter Summer Autumn Winter Spring
Visitors 1048 677 593 998
Using this data, the seasonal index for autumn is estimated to be closest to:
A 0.25 B 1.0 C 1.22 D 0.82 E 0.21
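For a single year of data, a seasonal index is the value for a season divided by the mean over all seasons. The sketch below uses hypothetical helper names. It applies this definition to the visitor counts in Question 17, and it shows deseasonalising (divide by the index) and reseasonalising (multiply by the index). It covers only the one-year case and does not handle data spanning several years.

```python
def seasonal_indices(values):
    """Seasonal index = value for the season / mean value over all seasons (one year)."""
    seasonal_mean = sum(values) / len(values)
    return [v / seasonal_mean for v in values]


def deseasonalise(actual, index):
    """Remove the seasonal effect: divide the actual value by its seasonal index."""
    return actual / index


def reseasonalise(deseasonalised, index):
    """Restore the seasonal effect: multiply by the seasonal index."""
    return deseasonalised * index


visitors = [1048, 677, 593, 998]      # summer, autumn, winter, spring (Question 17)
indices = seasonal_indices(visitors)
print([round(i, 2) for i in indices])
print(round(sum(indices), 6))         # the indices always sum to the number of seasons
```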
18 Using this trend line, the percentage change in enrolments from the previous year
forecast for 2026 is:
A 24.98 B -11.05 C 1.73 D 12.11 E 24.62
20 Suppose that the seasonal indices for the wholesale price of petrol are:
Year 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018
CO2 18.2 17.6 17.3 17.0 16.4 15.8 15.8 15.9 15.7 15.5
2 The table below shows the annual inflation rates in Australia and China for the period
Year 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
Inflation Australia (%) 2.9 3.3 1.7 2.5 2.5 1.5 1.3 2.0 1.9 1.6 0.9
Inflation China (%) 3.3 5.4 2.6 2.6 2.0 1.4 2.0 1.6 2.1 2.9 2.4
[Time series plot: Inflation Australia (%) and Inflation China (%) by year]
a i Find the equation of the least squares line which allows inflation to be predicted
from year for China.
ii Draw the least squares line on the time series plot.
b i Find the equation of the least squares line which allows inflation to be predicted
from year for Australia.
ii Draw the least squares line on the time series plot.
c Explain why the equations of the least squares lines predict that the inflation rate for
China will always remain higher than the inflation rate for Australia.
d Find the two-mean centred smoothed inflation rate for Australia for 2015.
3 The table below shows the number of dolphins spotted in a bay over each of the four
seasons for the years 2020-2021.
a Use the data in the table to find seasonal indices. Give your answers rounded to two
decimal places.
b The number of dolphins spotted in each of the four seasons in 2022 is shown in the
table below.
Use the seasonal indices from part a to deseasonalise the data. Round your answers
to the nearest whole number.
6A Exam 1 style questions: Univariate data
Use the following information to answer Questions 1–3.
The following table shows the data collected from a sample of five senior students at a
secondary college. The variables in the table are:
campus – the campus they attend (C = city, R = regional)
time – the time in minutes each student took to get to school that day
transport – mode of transport (1 = walked or rode a bike, 2 = car, 3 = public transport)
number – number of siblings at the school
postcode – postcode of place of residence.
3 The number of regional students who used public transport to get to school is:
A 0 B 1 C 2 D 3 E 4
4 Consider this box-and-whisker plot. Which one of the following statements is true?
[Boxplot: scale 0 to 70]
work, selecting from the alternatives ‘strongly agree’, ‘agree’, ‘neither agree nor
disagree’, ‘disagree’ and ‘strongly disagree’. Their responses are summarised in the bar
chart opposite.
[Bar chart: percentage of responses to ‘I enjoy my work’: Strongly agree, Agree, Neither agree nor disagree, Disagree, Strongly disagree]
The percentage of people who chose the modal response to this question is closest to:
A 18% B 20% C 27% D 29% E 31%
of a histogram.
[Histogram: Test score, 0 to 60]
6 The distribution of test scores is best described as:
A positively skewed
B negatively skewed
C symmetric
D symmetrically skewed
E symmetric with a clear outlier
7 Displayed in the form of a boxplot, the distribution of test scores would look like:
[Options A to E: five boxplots, each drawn on a scale from 0 to 60]
8 Students who scored 50 or more on the test were awarded an A on the test. The
percentage of students who were awarded an A is closest to:
A 10% B 11% C 21% D 30% E 33%
9 The number of students who scored at least 30 but under 45 marks is:
A 4 B 7 C 19 D 11 E 20
percentage of countries spending $US10,000,000 or more on health is equal to:
[Histogram: log(health expenditure), 6.0 to 9.0]
minutes. The temperature of the oven is then measured. The temperatures of 300 ovens tested
in this way were recorded and the data displayed using the percentage frequency histogram
shown.
[Percentage frequency histogram: oven temperature, 174 to 188]
Use the following information to answer Questions 19–24.
The lengths of a sample of 1000 ants of a particular species are approximately normally
distributed with a mean of 4.8 mm and a standard deviation of 1.2 mm.
19 From this information it can be concluded that around 95% of the lengths of the ants
should lie between:
A 2.4 mm and 6.0 mm B 2.4 mm and 7.2 mm C 3.6 mm and 6.0 mm
D 3.6 mm and 7.2 mm E 4.8 mm and 7.2 mm
20 The standardised ant length of z = −1.2 corresponds to an actual ant length of:
A 2.40 mm B 3.36 mm C 4.2 mm D 5.0 mm E 6.24 mm
21 The percentage of ants with lengths less than 3.6 mm is closest to:
A 2.5% B 5% C 16% D 32% E 95%
22 The percentage of ants with lengths less than 6.0 mm is closest to:
A 5% B 16% C 32% D 68% E 84%
23 The percentage of ants with lengths greater than 3.6 mm and less than 7.2 mm is
closest to:
A 2.5% B 18.5% C 68% D 81.5% E 97.5%
24 In the sample of 1000 ants, the number with a length between 2.4 mm and 4.8 mm is closest to:
A 3 B 50 C 475 D 975 E 997
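The percentages in Questions 19 to 24 follow from the 68–95–99.7% rule. Below is a small sketch that assumes the rule's rounded percentages. The function names are hypothetical.

```python
# 68-95-99.7% rule: percentage of a normal distribution within k standard deviations of the mean
WITHIN = {1: 68.0, 2: 95.0, 3: 99.7}


def interval(mean, sd, k):
    """Values k standard deviations either side of the mean."""
    return mean - k * sd, mean + k * sd


def value_from_z(z, mean, sd):
    """Convert a standardised score back to an actual value: x = mean + z * sd."""
    return mean + z * sd


def percent_below(k):
    """Percentage lying more than k standard deviations below the mean (one tail)."""
    return (100 - WITHIN[k]) / 2


mean, sd = 4.8, 1.2                    # ant lengths (mm), Questions 19 to 24
print(interval(mean, sd, 2))           # range containing about 95% of lengths
print(value_from_z(-1.2, mean, sd))
print(percent_below(1))                # percentage shorter than mean - 1 sd
```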
26 A class of students sat for a biology test and a legal studies test. Each test had a
possible maximum score of 100 marks. The table below shows the mean and standard
deviation of the marks obtained in these tests.
Biology Legal Studies
Class mean 54 78
Class standard deviation 15 5
The data in the following table was collected to investigate the association between a
person’s age and their satisfaction with their career choice.
Age group
Satisfied with career choice? Under 35 35 or more Total
Yes 136 136 272
No 42 86 128
Total 178 222 400
3 Of those people aged under 35, the percentage who are satisfied with their career
choice is closest to:
A 23.6% B 34.0% C 50.0% D 61.3% E 76.4%
4 The data in the table supports the contention that there is an association between age
group and satisfaction with career choice because:
A 68.0% of people are satisfied with their career choice, compared to 32.0% who are not.
B The number of people satisfied with their career choice aged under 35 is the same as
the number aged 35 or more who are satisfied with their career choice.
C 76.4% of people aged under 35 are satisfied with their career choice, which is more
than the 61.3% of those aged 35 or more who are satisfied with their career choice.
D 50.0% of people who are satisfied with their career choice are aged under 35.
E 67.2% of people who are not satisfied with their career choice are aged 35 or more.
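Associations in a two-way table are judged by percentaging within each category of the explanatory variable, which here is age group. The sketch below reproduces the percentage of satisfied people in each age group. It uses a hypothetical helper, `percentage_within_groups`.

```python
# Counts from the career-satisfaction table, grouped by the explanatory variable (age group)
counts = {
    "Under 35": {"Yes": 136, "No": 42},
    "35 or more": {"Yes": 136, "No": 86},
}


def percentage_within_groups(table):
    """Convert each group's counts to percentages of that group's total."""
    result = {}
    for group, row in table.items():
        total = sum(row.values())
        result[group] = {category: 100 * n / total for category, n in row.items()}
    return result


pct = percentage_within_groups(counts)
for group, row in pct.items():
    print(group, round(row["Yes"], 1))   # prints "Under 35 76.4" then "35 or more 61.3"
```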
and 2000.
The aim is to investigate the association between pay rate and year.
[Parallel boxplots: Pay rate in 1980, 1990 and 2000]
Which one of the following statements is not true?
A The IQRs of pay rate in 1990 and 2000 are approximately the same.
B The median pay rate is lower in 1980 than the median pay rate in 1990 and 2000.
C The IQR of pay rate in 1980 is more than the IQR of pay rate in 1990 and 2000.
D The pay rate in 75% of the countries in 1980 was less than the median pay rate in
E All three distributions are approximately symmetric.
[Percentaged segmented bar chart: percentage right-handed and left-handed for dyslexia diagnosis Yes and No]
6 The percentage of students diagnosed with dyslexia who are left-handed is closest to:
A 10% B 20% C 30% D 80% E 90%
7 The results could be summarised in a two-way frequency table. Which one of the following
frequency tables could match the percentaged segmented bar chart?
A Dyslexia
Dominant hand Yes No
Left 80 90
Right 20 10
B Dyslexia
Dominant hand Yes No
Left 80 20
Right 90 10
C Dyslexia
Dominant hand Yes No
Left 20 80
Right 10 90
D Dyslexia
Dominant hand Yes No
Left 20 10
Right 30 40
E Dyslexia
Dominant hand Yes No
Left 10 10
Right 40 90
9 For which one of the following pairs of variables would it be appropriate to construct a
A eye colour (blue, green, brown, other) and country of birth
B weight in kg and blood pressure in mmHg
C number of cups of coffee drunk each day and stress level (high, medium, low)
D age in years and football team
E time spent watching TV each week in hours and educational level (primary,
secondary, tertiary)
11 The association pictured in the scatterplot in the previous question is best described as:
A strong, positive, linear B strong, negative, linear C weak, negative, linear
D strong, negative, non-linear with an outlier E strong, negative, non-linear
For the association between computer ownership (computers/1000 people) and car
ownership (cars/1000 people) the coefficient of determination is equal to 0.8464.
13 If the people who own more cars also tend to own more computers, then the value of
the correlation coefficient, r (rounded to two decimal places) is closest to:
A 0.64 B 0.72 C 0.85 D 0.92 E 0.96
16 A back-to-back stem plot is a useful tool for displaying the association between:
A weight (kg) and handspan (cm)
B height (cm) and age (years)
C handspan (cm) and eye colour (brown, blue, green)
D height in centimetres and sex (female, male)
E meat consumption (kg/person) population and country of residence
17 To explore the association between owning an electric car (yes or no) and age group
(under 25 years, 25-44 years, 45 years or more), it would be best to use the data
collected to construct:
18 The association between the time taken to walk 5 km (in minutes) and fitness level
(below average, average, above average) is best displayed using:
A a histogram B a scatterplot C a time series plot
D parallel box plots E a back-to-back stem plot
The slope of the least squares regression line which would allow their score in Year 12
to be predicted from their score in Year 11 is closest to:
A 0.35 B 0.68 C 1.3 D 1.7 E 3.36
2 The statistical analysis of the set of bivariate data involving variables x and y resulted
in the information displayed in the table below:
x y
mean 123.5 38.7
standard deviation 4.65 4.78
least squares equation y = −140 + 0.475x
Using this information, the value of the correlation coefficient r for this set of bivariate
data is closest to:
A 0.73 B 0.34 C 0.46 D 0.49 E 0.97
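The slope of a least squares line is b = r × s_y / s_x, so r can be recovered from the slope and the two standard deviations. Below is a sketch with hypothetical function names. The second helper covers the related case in Question 13, where r is found from the coefficient of determination and the direction of the association.

```python
import math


def r_from_slope(slope, sd_x, sd_y):
    """Least squares slope b = r * s_y / s_x, so r = b * s_x / s_y."""
    return slope * sd_x / sd_y


def r_from_r_squared(r_squared, positive_association):
    """r is the square root of r^2, with the same sign as the association."""
    r = math.sqrt(r_squared)
    return r if positive_association else -r


print(round(r_from_slope(0.475, 4.65, 4.78), 2))   # 0.46
print(round(r_from_r_squared(0.8464, True), 2))    # 0.92
```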
The following data relate to Questions 3 and 4.
Number of hot dogs sold 190 168 146 155 150 170 185
Temperature (°C) 10 15 20 15 17 12 10
3 The equation of the least squares regression line fitted to the data is closest to:
A number of hot dogs sold = 227 − 4.31 × temperature
B number of hot dogs sold = 48.4 − 0.206 × temperature
C number of hot dogs sold = 4.31 + 227 × temperature
D number of hot dogs sold = 0.206 − 48.4 × temperature
E number of hot dogs sold = 227 + 4.31 × temperature
test is plotted against the time they reported studying for the test.
A least squares regression line has been determined for the data and is also displayed
on the scatterplot. The equation for the least squares regression line is:
number of errors = 8.8 − 0.12 × study time
and the coefficient of determination is 0.8198.
[Scatterplot with least squares line: Number of errors (0 to 7) against Study time (minutes), 0 to 70]
5 The least squares regression line predicts that a student reporting a study time of 35
minutes would make:
A 4.3 errors B 4.6 errors C 4.8 errors D 5.0 errors E 13.0 errors
6 The student who reported a study time of 10 minutes made six errors. The predicted
score for this student would have a residual of:
A −7.6 B −1.6 C 0 D 1.6 E 7.6
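Predictions and residuals come directly from the equation of the least squares line, where residual = actual value − predicted value. Below is a minimal sketch with hypothetical function names, using the study time line above.

```python
def predict(intercept, slope, x):
    """Value on the least squares line at x."""
    return intercept + slope * x


def residual(actual, intercept, slope, x):
    """Residual = actual value - predicted value."""
    return actual - predict(intercept, slope, x)


# number of errors = 8.8 - 0.12 x study time
print(round(predict(8.8, -0.12, 35), 1))       # 4.6
print(round(residual(6, 8.8, -0.12, 10), 1))   # -1.6
```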
7 Which one of the following statements relating to the regression line is not true?
A The slope of the regression line is −0.12.
B The equation predicts that a student who spends 40 minutes studying will make
around four errors.
C The least squares line does not pass through the origin.
D On average, a student who does not study for the test will make around 8.8 errors.
E The explanatory variable in the regression equation is number of errors.
8 This regression line predicts that, on average, the number of errors made:
A decreases by 0.82 for each extra minute spent studying
B decreases by 0.12 for each extra minute spent studying
ISBN 978-1-009-11041-9 © Peter Jones et al 2023 Cambridge University Press
Photocopying is restricted under law and this material must not be transferred to another party.
Revision 322 Chapter 6 Revision: Data analysis
9 Given that the coefficient of determination is 0.8198, we can say that close to:
A 18% of the variation in the number of errors made can be explained by the variation
in the time spent studying
B 33% of the variation in the number of errors made can be explained by the variation
in the time spent studying
C 67% of the variation in the number of errors made can be explained by the variation
in the time spent studying
D 82% of the variation in the number of errors made can be explained by the variation
in the time spent studying
E 95% of the variation in the number of errors made can be explained by the variation
in the time spent studying
10 The average rainfall and temperature range
[Scatterplot with least squares line: average rainfall against Temperature range (°C), 0 to 20]
A least squares regression line has been fitted to the data, as shown. The equation of
this line is closest to:
A average rainfall = 210 − 11 × temperature range
B average rainfall = 210 + 11 × temperature range
C average rainfall = 18 − 0.08 × temperature range
D average rainfall = 18 + 0.08 × temperature range
E average rainfall = 250 − 13 × temperature range
B Around 43.8% of the variation in score on the English aptitude test is explained by
the variation in score on the Mathematics aptitude test.
C Together the scores on the Mathematics and English aptitude tests explain 100% of
the variation in university entrance score.
D The correlation between the scores on the Mathematics and English aptitude tests is
E The score on English aptitude tests is more strongly associated with the university
entrance score than is the score on the Mathematics aptitude test.
A student uses the data in the table below to construct the scatterplot shown:
x y
1 132
2 120
3 117
4 104
5 91
6 82
7 49
8 24
[Scatterplot: y against x]
13 A y² transformation could also be used to linearise this association. A least squares line
is fitted to the transformed data, with y² as the response variable, and the equation of
the least squares line is
y² = 20076 − 2397.2x
Using this equation, the predicted value of y when x = 2 is closest to:
A 102 B 124 C 120 D 10487 E 15282
14 The following data were collected for two related variables x and y.
x 0.4 0.5 1.1 1.1 1.2 1.6 1.7 2.3 2.4 3.4 3.5 4.3 4.7 5.3
y 5.8 4.7 3.3 5.5 4.2 3.4 2.3 2.8 1.8 1.3 1.9 1.2 1.6 0.9
A scatterplot indicates a non-linear relationship. The data is linearised using a 1/y
transformation. A least squares line is then fitted to the transformed data.
The equation of this line is closest to:
A 1/y = 0.08 + 0.16x B 1/y = 0.16 + 0.08x C 1/y = −0.08x + 5.23x
D 1/y = 5.23 − 0.08x E 1/y = 1.44 + 1.96x
15 The equation of a least squares line that has been fitted to transformed data is:
population = 58 170 + 43.17 × year²
Using this equation, the predicted value of population when year = 10 is closest to:
A 9.2 B 9.9 C 10.6 D 62 417 E 62 487
16 The equation of a least squares line that has been fitted to transformed data is:
weight² = 52 + 0.78 × area
Using this equation, the predicted value of weight when area = 8.8 is closest to:
A −7.7 B ±7.7 C 7.7 D ±58 E 58
17 The equation of a least squares regression line that has been fitted to transformed
data is: log(number) = 1.31 + 0.083 × month
Using this equation, the predicted value of number when month = 6 is closest to:
A 1.8 B 6.0 C 18 D 64 E 650
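When a line has been fitted to transformed data, each prediction must be back-transformed. The sketch below uses hypothetical function names and checks Questions 13, 15, 16 and 17. It assumes that `log` means the base-10 logarithm, and it takes the positive square root because the quantities involved are positive.

```python
import math


def from_y_squared(a, b, x):
    """Line y^2 = a + b*x: take the positive square root of the predicted y^2."""
    return math.sqrt(a + b * x)


def from_x_squared(a, b, x):
    """Line y = a + b*x^2: square the explanatory value before substituting."""
    return a + b * x ** 2


def from_log_y(a, b, x):
    """Line log10(y) = a + b*x: undo the base-10 log."""
    return 10 ** (a + b * x)


print(round(from_y_squared(20076, -2397.2, 2)))   # Question 13
print(round(from_x_squared(58170, 43.17, 10)))    # Question 15
print(round(from_y_squared(52, 0.78, 8.8), 1))    # Question 16
print(round(from_log_y(1.31, 0.083, 6)))          # Question 17
```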
2 The time series plot shows the percentage of the population aged 65 years or more in
Australia and New Zealand over the years from 2010 to 2019.
[Time series plot: percentage of population aged 65 years or more, Australia and New Zealand, 2010 to 2019]
From the plot, it can be concluded that over the interval 2010–2018, the difference in
the percentage of the population aged 65 years or more in the two countries has shown:
A a decreasing trend B an increasing trend
C seasonal variation D a 5-year cycle
E no trend
Use the information in the table below to answer Questions 3 to 6.
t 1 2 3 4 5 6 7 8 9 10
y 4 5 4 4 8 6 9 10 9 12
The numbers of customers on Wednesday, Thursday and Friday are not shown. The
five-mean smoothed number of customers on Thursday is 38.
The three-mean smoothed number of customers on Thursday is:
A 27 B 29 C 30 D 38 E 55
8 The table shows the closing price (price) of a company’s shares on the stock market
over a 9-day period.
Day 1 2 3 4 5 6 7 8 9
Price ($) 1.30 1.15 1.10 1.25 1.29 1.37 2.42 1.95 2.55
The centred six-mean smoothed closing share price on Day 5 is closest to:
A $1.43 B $1.50 C $1.56 D $1.68 E $1.81
Use the following information to answer Questions 9 and 10.
The table below records the monthly electricity cost (in dollars) for an apartment over one
calendar year.
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
123 90 153 136 101 129 153 143 95 61 85 107
9 Based on this information, the seasonal index for September is closest to:
A 1.00 B 0.78 C 1.25 D 0.83 E 0.87
10 Using data collected over several years, the seasonal index for December was
determined to be 0.90. To correct the cost of electricity for seasonality in December,
the actual cost should be
A decreased by 11.1% B decreased by 9.0% C decreased by 10.0%
D increased by 10.0% E increased by 11.1%
Quarter 1 2 3 4
Sales ($’000s) 1200 1000 800 1200
Seasonal index 1.1 0.9 0.8
13 The deseasonalised sales (in dollars) for a company in June were $91 564. The
seasonal index for June is 1.45.
The actual sales for June were closest to:
A $41 204 B $61 043 C $63 148 D $91 564 E $132 768
14 Sales for a major department store are reported quarterly. The seasonal index for the
third quarter is 0.85. This means that sales for the third quarter are typically:
A 85% below the quarterly average for the year
B 15% below the quarterly average for the year
C 15% above the quarterly average for the year
D 18% above the quarterly average for the year
E 18% below the quarterly average for the year
over a 12-month period.
[Time series plot: Number of calls against Month number, 1 to 12]
Study mode Female Male
On campus
e Data was collected from a total of 120 students. The percentaged segmented bar chart
shows the study mode preferences for students from each of the three courses.
[Percentaged segmented bar chart: study mode (Online, On campus) for Business, Health and Social Science]
i What percentage of Business students chose online?
ii Does the percentaged segmented bar chart support the contention that the choice
of study mode (on-campus or online) is associated with course? Justify your
answer by quoting appropriate percentages.
2 The histogram and the boxplot below show the distribution of the distances the 120
students surveyed in Question 1 live from the campus.
[Histogram and boxplot: Distance (km)]
a Describe the shape of the distribution of distance, including the values of outliers.
b Approximately how many of these students lived from 4 to 5 km from the campus?
c i Determine the values of the upper and lower fences for the boxplot.
ii Use the fences to explain why a distance of 1 km would not be shown as an outlier.
d The boxplots compare the distance students live from the
[Parallel boxplots: Distance (km) by study mode, including an ‘online’ group]
3 To address the question "Can we predict a person’s height from their armspan?", height
(in cm) and armspan (in cm) measurements were collected from 60 students in Years
11 and 12. Of the 60 students, 30 were male and 30 were female. The following scatterplot
shows the data collected, with least squares regression lines fitted for males and females.
[Scatterplot: height (cm) against armspan (cm), with least squares regression lines for males and females]
The equation of the least squares regression line for females is:
height = −4.199 + 1.028 × armspan
The equation of the least squares regression line for males is:
height = 31.705 + 0.815 × armspan
a Interpret the slope of the regression equation in terms of height and armspan for
In determining this equation, the summary statistics displayed in the table were also
calculated.
Statistic armspan (cm) height (cm)
mean (females) 164.0 164.5
standard deviation (females) 6.319 8.083
mean (males) 178.1 177.0
standard deviation (males) 9.832 9.583
b i Determine the value of the coefficient of determination for females and interpret
in terms of height and armspan. Give your answer as a percentage rounded to
one decimal place.
ii Determine the value of the coefficient of determination for males and interpret in
terms of height and armspan. Give your answer as a percentage rounded to one
decimal place.
6E Exam 2 style questions 331
iii Explain why armspan is a better predictor of height for males than for females,
quoting appropriate statistics.
c i Use the least squares regression line to predict the difference in height between
males and females who both have armspans of 160 cm. Who is taller and by
how much?
ii Use the least squares regression line to predict the difference in height between
males and females who both have armspans of 190 cm. Who is taller and by
how much?
iii Are the predictions made in c i and c ii reliable? Explain.
4 The average student PISA mathematics scores (score) for OECD countries, as well
as the expenditure per primary school child in those countries in $US per capita
(expenditure), are shown in the following scatterplot.
[Scatterplot: PISA mathematics score against Expenditure primary ($US per capita), 0 to 25 000]
a Describe the association between score and expenditure in terms of form and
b Which transformations could be used in order to linearise the association?
c A least squares regression line, with expenditure as the explanatory variable, was
fitted to the data, and the following residual plot constructed.
i A residual plot can be used to test an assumption about the nature of the
association between two numerical variables. What is this assumption?
ii Does the residual plot support this assumption? Explain your answer.
d A log10 transformation was applied to the variable expenditure to linearise the association.
When a least squares line was fitted to the transformed data, it was found to have an
intercept of 12.99 and a slope of 120.6.
i Write down the equation of this least squares line.
ii Use the equation from d i to predict the PISA mathematics score for a country
which has an expenditure of $US10 000 per capita. Round your answer to the
nearest whole number.
a Determine:
i the 5-median smoothed value for Month 10, rounded to the nearest $000.
ii the 7-median smoothed value for Month 9, rounded to the nearest $000.
b The following table gives the value of bitcoin in Australian dollars at the beginning
of each month for the first six months of 2021.
Month Jan Feb Mar Apr May Jun
Bitcoin ($) 29391.78 33543.77 58726.68 57836.01 36681.74 33524.98
Find the centred two-mean smoothed value of bitcoin for the month of March,
rounding your answer to the nearest cent.
c A least squares regression line was fitted to the monthly bitcoin data for 2021 (January
2021 is month 1), giving the following equation:
Bitcoin = 36382.73 + 1525.799 × month
i Write down the value of the slope to the nearest cent, and interpret in terms of
the variables in the question.
ii Use the equation to predict the value of bitcoin in January 2024. Give your
answer rounded to the nearest dollar.
iii How reliable is the prediction made in cii?
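Parts b and c each involve a step that is easy to get wrong: centring a two-mean, and converting a date to the month number used by the trend line. Below is a sketch with hypothetical helper names. It assumes January 2021 is month 1, as stated in part c.

```python
def centred_two_mean(before, current, after):
    """Average of the two 2-means either side of the current month."""
    return ((before + current) / 2 + (current + after) / 2) / 2


def month_number(year, month, first_year=2021):
    """Month 1 is January of first_year; month runs from 1 (Jan) to 12 (Dec)."""
    return (year - first_year) * 12 + month


def bitcoin_trend(month):
    # Trend line from part c: Bitcoin = 36382.73 + 1525.799 x month
    return 36382.73 + 1525.799 * month


print(round(centred_two_mean(33543.77, 58726.68, 57836.01), 3))   # March (Feb, Mar, Apr)
m = month_number(2024, 1)                                          # January 2024
print(m, round(bitcoin_trend(m)))
```

January 2024 lies well outside the 2021 data, so part c iii is really asking about extrapolation.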
Chapter objectives
• What is a sequence?
• How do we generate a sequence of numbers from a starting value and a rule?
• How do we identify particular terms in a sequence?
• What is recursion?
• What are linear growth and decay?
• How can recurrence relations be used to model simple interest, flat rate
depreciation and unit cost depreciation on assets?
• How can recurrence relations be used to model compound interest and
reducing-balance depreciation on assets?
• How can the CAS calculator be used to find the length of time or the
necessary interest rate required for an investment or loan to reach a
particular value?
• How can investments and loans be compared using effective interest rates?
In this chapter, the notions of a sequence and a recurrence relation are introduced, as well
as the concepts of linear growth and decay and geometric growth and decay.
Taken together, these ideas are applied to financial situations including investments,
loans and the depreciation of assets to investigate how much interest must be paid on a
loan, how much interest an investment earns or how much an asset depreciates, under
different assumptions.
A list of numbers, written down in succession, is called a sequence. Each of the numbers
in a sequence is called a term. We write the terms of a sequence as a list, separated by
commas. If a sequence continues indefinitely, or if there are too many terms in the sequence
to write them all, we use an ellipsis, ‘. . . ’.
Sequences may be generated either randomly or by recursion using a rule. For example, the sequence
1, 3, 5, 7, 9, . . .
has a definite pattern.
The sequence of numbers has a starting value of 1. We add 2 to this number to generate the
next term, 3. Then, add 2 again to generate the next term, 5, and so on.
The rule is ‘add 2 to each term’.
+2 +2 +2 +2
1 3 5 7 9 ...
Write down the first five terms of the sequence with a starting value of 6 and the rule ‘add
4 to the previous term’.
Explanation Solution
1 Write down the starting value. 6
2 Apply the rule (add 4) to generate the 6 + 4 = 10
next term.
3 Calculate three more terms. 10 + 4 = 14
14 + 4 = 18
18 + 4 = 22
4 Write your answer. The first five terms are 6, 10, 14, 18, 22.
Write down the first five terms of the sequence with a starting value of 5 and the rule
‘double the number and then subtract 3’.
Explanation Solution
1 Write down the starting value. 5
2 Apply the rule (double 5, then subtract 3) to generate the next term. 5 × 2 − 3 = 7
3 Calculate three more terms. 7 × 2 − 3 = 11
11 × 2 − 3 = 19
19 × 2 − 3 = 35
4 Write your answer. The first five terms are 5, 7, 11, 19, 35.
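Generating terms from a starting value and a rule can be sketched as a loop that repeatedly applies the rule to the latest term. `first_terms` is a hypothetical helper. The two calls reproduce the worked examples above.

```python
def first_terms(start, rule, count=5):
    """Apply `rule` repeatedly, starting from `start`, to list `count` terms."""
    terms = [start]
    while len(terms) < count:
        terms.append(rule(terms[-1]))   # next term is generated from the current term
    return terms


print(first_terms(6, lambda t: t + 4))       # [6, 10, 14, 18, 22]
print(first_terms(5, lambda t: 2 * t - 3))   # [5, 7, 11, 19, 35]
```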
Use a calculator to generate the first five terms of the sequence with a starting value of 5
and the rule ‘double and then subtract 3’.
Explanation Solution
Steps TI-Nspire
1 Start with a blank computation screen.
2 Type 5 and press enter (TI-Nspire) or EXE (ClassPad).
3 Next type ×2 − 3 and press enter (or EXE)
to generate the next term in the sequence.
The computation generating this value is
shown as ‘5·2−3’ on the TI-Nspire and
‘ans × 2 − 3’ on the ClassPad (here ‘ans’
represents the answer to the previous calculation).
5 State your answer. The first five terms are 5, 7, 11, 19, 35.
Recurrence relations
A recurrence relation is a mathematical rule that we can use to generate a sequence. It has
two parts:
1 a starting value: the value of the first term in the sequence
2 a rule: that can be used to generate the next term from the current term.
For example, in words, a recurrence relation that can be used to generate the sequence:
10, 15, 20, . . .
can be written as follows:
1 Start with 10.
2 To obtain the next term, add 5 to the current term.
A more compact way of communicating this information is to translate this rule into
symbolic form. We do this by defining a subscripted variable. Here we will use the variable
Vₙ, but the V can be replaced by any letter of the alphabet.
Let Vₙ be the term in the sequence after n applications of the rule, called iterations.
In words In symbols
Starting value = 10 V₀ = 10
Next term = current term + 5 Vₙ₊₁ = Vₙ + 5
Using this definition, we can write a formal recurrence relation where the starting value is
defined, followed by the rule for generating the next term.
V₀ = 10, Vₙ₊₁ = Vₙ + 5
Note: Because of the way we defined Vₙ, the starting value of n is 0. At the start there have been no
applications of the rule.
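The subscript convention can be mirrored in code by counting applications of the rule: V₀ is the starting value, and Vₙ is the value after n iterations. Below is a sketch with a hypothetical helper, `term`.

```python
def term(v0, rule, n):
    """Return V_n, the value after n applications (iterations) of `rule`, starting from V_0 = v0."""
    value = v0
    for _ in range(n):          # n = 0 returns the starting value itself
        value = rule(value)
    return value


def add_five(v):
    return v + 5                # the rule V_{n+1} = V_n + 5


print([term(10, add_five, n) for n in range(4)])   # [10, 15, 20, 25]
```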
Write down the first five terms of the sequence defined by the recurrence relation
V₀ = 29, Vₙ₊₁ = Vₙ − 4
Explanation Solution
1 Write down the starting value. V₀ = 29
2 Use the rule to find the next term, V₁. V₁ = V₀ − 4
= 29 − 4
= 25
Use your calculator to generate this sequence and determine how many terms at the start
of the sequence are positive.
Explanation Solution
1 Start with a blank computation screen.
2 Type 300 and press enter (or EXE).
3 Next type ×0.5 − 9 and press enter (or EXE) to generate the next term in the sequence.
4 Continue to press enter (or EXE) until the first negative term appears.
5 Write your answer.
Calculator screen:
300 300.
300 · 0.5 − 9 141.
141 · 0.5 − 9 61.5
61.5 · 0.5 − 9 21.75
21.75 · 0.5 − 9 1.875
1.875 · 0.5 − 9 −8.0625
The first five terms of the sequence are positive.
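The calculator procedure above, which keeps applying the rule until a negative term appears, can be sketched as follows. The helper name and the `max_terms` safety limit are assumptions. The limit stops the loop for sequences that never become negative.

```python
def count_leading_positive(start, rule, max_terms=1000):
    """Count the terms at the start of a sequence that are positive.

    Stops at the first term that is zero or negative, or after max_terms terms
    (a safety limit for sequences that never turn negative).
    """
    count, value = 0, start
    while value > 0 and count < max_terms:
        count += 1
        value = rule(value)
    return count


print(count_leading_positive(300, lambda v: 0.5 * v - 9))   # 5
```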
Explanation Solution
1 Write the name for each term under its value in the sequence.
3, 9, 15, 21, 27, 33
V₀ V₁ V₂ V₃ V₄ V₅
2 Read the value of each required term. V₁ = 9, V₄ = 27, V₅ = 33
Exercise 7A
Example 5 5 Using your calculator, write down the first five terms of the sequence generated by each
of the recurrence relations below.
a A0 = 12, An+1 = 6An − 15 b Y0 = 20, Yn+1 = 3Yn + 25
c V0 = 2, Vn+1 = 4Vn + 3 d H0 = 64, Hn+1 = 0.25Hn − 1
e G0 = 48 000, Gn+1 = Gn − 3000 f C0 = 25 000, Cn+1 = 0.9Cn − 550
Example 6 6 Consider the following recurrence relations. Find the required term for each.
a A0 = 2, An+1 = An + 2. Find A2 .
b B0 = 11, Bn+1 = Bn − 3. Find B4.
c C0 = 1, Cn+1 = 3Cn . Find C3.
d D0 = 3, Dn+1 = 2Dn + 1. Find D5.
10 How many terms of the sequence formed from the recurrence relation below are negative?
Y0 = 30, Yn+1 = 1.2Yn + 2
Linear growth means a value increases by the same amount in each unit of time. For
example, if you have $300 in your bank account and you add $20 each week, then your
savings will grow linearly. Similarly, linear decay means a value decreases by the
same amount in each unit of time. An example is the depreciation of a new car by a constant
amount each year.
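[Figure: graphs of Vn against n (n = 0 to 5) illustrating linear growth and linear decay]
For example, the recurrence relation Q0 = 20, Qn+1 = Qn − 2 (rule: ‘subtract 2’) generates the linear decay sequence 20, 18, 16, . . .
In a recurrence relation of the form Vn+1 = Vn + D (linear growth) or Vn+1 = Vn − D (linear decay), we refer to D as the common difference. Graphing the terms of the sequence gives dots that lie on a straight line (do not join the dots). An upward slope indicates growth and a downward slope reveals decay.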
For each of the following recurrence relations, list the first four terms and graph the
corresponding points.
a V0 = 2, Vn+1 = Vn + 5
b W0 = 20, Wn+1 = Wn − 3
Explanation Solution
a From the rule, the starting value is 2. The rule is ‘add 5’. The first four terms are 2, 7, 12, 17.
The corresponding points can then be graphed. [Graph: points (0, 2), (1, 7), (2, 12), (3, 17)]
b From the rule, the starting value is 20. The rule is ‘subtract 3’. The first four terms are 20, 17, 14, 11.
The corresponding points can then be graphed. [Graph: points (0, 20), (1, 17), (2, 14), (3, 11)]
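To see why these points lie on a straight line, the short Python sketch below (our own illustration, with hypothetical helper names) lists the (n, Vn) points and checks that consecutive terms always differ by the same amount, the common difference. Here D is treated as a signed number, negative for decay, which is a convention of the sketch rather than of the text.

```python
def points(v0, d, count):
    """Return (n, Vn) pairs for V0 = v0, Vn+1 = Vn + d (d < 0 gives linear decay)."""
    pts, v = [], v0
    for n in range(count):
        pts.append((n, v))
        v = v + d
    return pts


def common_difference(pts):
    """Return the constant gap between consecutive terms, or None if the gap varies."""
    gaps = {b[1] - a[1] for a, b in zip(pts, pts[1:])}
    return gaps.pop() if len(gaps) == 1 else None


growth = points(2, 5, 4)    # V0 = 2, Vn+1 = Vn + 5
decay = points(20, -3, 4)   # W0 = 20, Wn+1 = Wn - 3
print(growth)  # [(0, 2), (1, 7), (2, 12), (3, 17)]
print(decay)   # [(0, 20), (1, 17), (2, 14), (3, 11)]
print(common_difference(growth), common_difference(decay))  # 5 -3
```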
Cheryl invests $5000 in an investment account that pays 4.8% per annum simple interest.
Model this simple investment using a recurrence relation of the form:
V0 = the principal, Vn+1 = Vn + D, where D = (r/100) × V0.
Let Vn be the value of the investment after n years.
Explanation Solution
1 Write down the value of V0. V0 = 5000
2 Write down the interest rate r and use it to determine the value of D = (r/100) × V0. r = 4.8, D = (4.8/100) × 5000 = 240
3 Use the values of V0 and D to write down the recurrence relation. V0 = 5000, Vn+1 = Vn + 240
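The two calculations in this example, finding D = (r/100) × V0 and then stepping the recurrence relation forward, can be sketched in Python as follows. The function names are hypothetical, and the rate is entered as a percentage, as in the text.

```python
def annual_interest(principal, rate_percent):
    """D = (r/100) x V0: the fixed amount of simple interest added each year."""
    return principal * rate_percent / 100


def simple_interest_values(principal, rate_percent, years):
    """Return [V0, V1, ..., V_years] for V0 = principal, Vn+1 = Vn + D."""
    d = annual_interest(principal, rate_percent)
    values = [principal]
    for _ in range(years):
        values.append(values[-1] + d)  # add the same interest every year
    return values


print(annual_interest(5000, 4.8))                # 240.0
print(simple_interest_values(5000, 4.8, 3)[-1])  # 5720.0
```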
Once we have a recurrence relation, we can use it to determine the value of an investment
after a given number of years.
Explanation Solution
a Calculate V0, V1, V2 and V3.
V0 = 5000
V1 = 5000 + 240 = 5240
V2 = 5240 + 240 = 5480
V3 = 5480 + 240 = 5720
Thus, after three years, the value of Cheryl’s investment is $5720.
b i On a blank calculation screen, type 5000 and press ENTER (or EXE).
ii Type +240 and press ENTER (or EXE) until the value of the investment first exceeds the required amount.
iii Count the number of times that 240 was added. Write your answer.
5000                 5000.
5000. + 240          5240.
5240. + 240          5480.
5480. + 240          5720.
5720. + 240          5960.
5960. + 240          6200.
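Counting calculator presses can be mirrored by counting loop iterations. In the sketch below, `years_to_exceed` is a hypothetical helper, and the $6000 target is a hypothetical value chosen only for illustration; it is not taken from the example.

```python
def years_to_exceed(v0, d, target):
    """Count how many times D is added before the value first exceeds target.

    Mirrors pressing ENTER (or EXE) and counting the presses. Requires d > 0,
    otherwise the value would never exceed the target.
    """
    if d <= 0:
        raise ValueError("d must be positive for simple interest growth")
    years, value = 0, v0
    while value <= target:
        value += d
        years += 1
    return years, value


# Cheryl: V0 = 5000, D = 240; the 6000 target is hypothetical
print(years_to_exceed(5000, 240, 6000))  # (5, 6200)
```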
The value of some large items decreases over time. This is called depreciation.
Businesses take into account the impact of depreciation by tracking the likely value of an
asset at a point in time, called the future value. At some point in time or at a particular
value, called the scrap value, the item will be sold or disposed of as it is no longer useful to
the business.
There are a number of techniques for estimating the future value of an asset. Two of them,
flat rate depreciation and unit cost depreciation, can be modelled using linear decay
recurrence relations.
A new car was purchased for $24 000 in 2014. The car depreciates by 20% of its
purchase price each year. Model the depreciating value of this car using a recurrence
relation of the form:
V0 = initial value, Vn+1 = Vn − D, where D = (r/100) × V0
Let Vn be the value of the car after n years of depreciation.
Explanation Solution
1 Write down the value of V0. Here, V0 is the value of the car when new. V0 = 24 000
2 Write down the annual rate of depreciation, r, and use it to determine the value of D = (r/100) × V0. r = 20, D = (20/100) × 24 000 = 4800
3 Use the values of V0 and D to write down the recurrence relation. V0 = 24 000, Vn+1 = Vn − 4800
Once we have a recurrence relation, we can use it to determine things such as the value of an
asset after a given number of years of flat rate depreciation.
Explanation Solution
a i Write down the recurrence relation. V0 = 24 000, Vn+1 = Vn − 4800
ii On a blank calculation screen, type 24 000 and press ENTER (or EXE).
iii Type −4800 and press ENTER (or EXE) twice to obtain the value of the car after 2 years’ depreciation. Write your answer.
24000                24000.
24000. − 4800        19200.
19200. − 4800        14400.
14400. − 4800        9600.
9600. − 4800         4800.
4800. − 4800         0.
a $14 400
b i Continue pressing ENTER (or EXE) until the car has no value.
ii Write your answer. b In 2019 (the car has no value after 5 years of depreciation)
c Use the amount of depreciation and the initial value. c (4800/24 000) × 100% = 20%
The percentage depreciation rate is 20%.
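All three parts of this example can be reproduced with a short loop. This is a sketch with a hypothetical helper name; clamping the value at zero is our own assumption (it makes no difference here, because 4800 divides 24 000 exactly). The purchase year 2014 comes from the example.

```python
def flat_rate_schedule(v0, d):
    """Return [V0, V1, ...] for Vn+1 = Vn - D, stopping once the value reaches zero."""
    if d <= 0:
        raise ValueError("the annual depreciation D must be positive")
    values = [v0]
    while values[-1] > 0:
        values.append(max(values[-1] - d, 0))  # assume the value cannot fall below zero
    return values


schedule = flat_rate_schedule(24000, 4800)
print(schedule)                 # [24000, 19200, 14400, 9600, 4800, 0]
print(schedule[2])              # 14400, the value after 2 years
years_to_zero = len(schedule) - 1
print(years_to_zero, 2014 + years_to_zero)  # 5 2019
print(4800 * 100 / 24000)       # 20.0, the percentage depreciation rate
```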
A professional gardener purchased a lawn mower for $270. The mower depreciates in
value by $3.50 each time it is used.
a Model the depreciating value of this mower using a recurrence relation of the form:
V0 = initial value, Vn+1 = Vn − D
b Use the model to determine the value of the mower after it has been used three times.
c How many times can the mower be used until its depreciated value is first less
than $250?
Explanation Solution
a 1 Write down the value of V0. Here, V0 is the value of the mower when new. V0 = 270
2 Write down the unit cost rate of depreciation, D. D = 3.50
3 Write your answer. V0 = 270, Vn+1 = Vn − 3.50
b 1 Write down the recurrence relation. V0 = 270, Vn+1 = Vn − 3.50
2 On a blank calculation screen, type 270 and press ENTER (or EXE). Type −3.50 and press ENTER (or EXE) three times to obtain the value of the mower after three mows.
270                  270.
270. − 3.5           266.5
266.5 − 3.5          263.
263. − 3.5           259.5
259.5 − 3.5          256.
256. − 3.5           252.5
252.5 − 3.5          249.
3 Write your answer. After three uses, the value of the mower is $259.50.
c Continue pressing ENTER (or EXE) until the value first drops below $250, and count the number of uses. The value first falls below $250 after the mower has been used 6 times.
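Unit cost depreciation follows the same pattern, except that n counts uses rather than years. A minimal sketch with a hypothetical helper, using the mower's values ($270, $3.50 per use, $250 threshold):

```python
def uses_until_below(v0, d, threshold):
    """Count uses until the value first drops below threshold, for Vn+1 = Vn - D."""
    if d <= 0:
        raise ValueError("the depreciation per use D must be positive")
    uses, value = 0, v0
    while value >= threshold:
        value -= d  # one more use of the mower
        uses += 1
    return uses, value


print(270 - 3 * 3.5)                     # 259.5, the value after three uses
print(uses_until_below(270, 3.5, 250))   # (6, 249.0)
```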
Exercise 7B
b Calculate the value of D using the interest rate and the rule D = (r/100) × V0 to find the
amount of interest paid each year.
c Model this simple investment using a recurrence relation of the form
V0 = starting value, Vn+1 = Vn + D
3 Huang invests $41 000 in an account that pays 6.2% per annum simple interest.
a Let Hn be the value of Huang’s investment after n years. State the value of H0 .
b Find the amount, in dollars, that Huang will receive each year from the investment.
c Complete the recurrence relation, in terms of H0 , Hn+1 and Hn , that would model the
investment over time. Write your answers in the boxes below.
H0 = , Hn+1 = Hn +
Example 9 4 The following recurrence relation can be used to model a simple interest investment of
$2000, paying interest at the rate of 3.8% per annum.
V0 = 2000, Vn+1 = Vn + 76
In the recurrence relation, Vn is the value of the investment after n years.
a Use the recurrence relation to show that the value of the investment after 3 years is $2228.
b Use your calculator to determine how many years it takes for the value of the
investment to first be worth more than $3000.
5 The following recurrence relation can be used to model a simple interest loan of $7000
with interest charged at the rate of 7.4% per annum.
V0 = 7000, Vn+1 = Vn + 518
In the recurrence relation, Vn is the value of the loan after n years.
a Use the recurrence relation to find the value of the loan after 1, 2 and 3 years.
b Use your calculator to determine how many years it takes for the value of the loan to
first have a value of more than $10 000.
6 The following recurrence relation can be used to model a simple interest investment. In
the recurrence relation, Vn is the value of the investment after n years.
V0 = 15 000, Vn+1 = Vn + 525
a i What is the principal of this investment?
ii How much interest is earned each year?
iii Calculate 525 as a percentage of 15 000 to find the annual interest rate of this investment.
b State how many years it takes for the value of the investment to first exceed $30 000.
a Calculate the value of D using the depreciation rate and the rule D = (r/100) × V0 to find the
amount of depreciation each year.
b Model this depreciation using a recurrence relation of the form
C0 = starting value, Cn+1 = Cn − D
8 Wendy purchases a new chair for her dental surgery for $2800. The chair depreciates
by 8% of its purchase price each year.
a Show that the total amount, in dollars, that Wendy’s chair will depreciate by each
year is $224.
b Let Wn be the value of Wendy’s chair after n years. State the value of W0.
c Complete the recurrence relation, in terms of W0, Wn+1 and Wn, that would model
the value of the chair over time by filling in the boxes below.
W0 = , Wn+1 = Wn +
Example 11 9 The following recurrence relation can be used to model the depreciation of a computer
with purchase price $2500 and annual depreciation of $400.
V0 = 2500, Vn+1 = Vn − 400
In the recurrence relation, Vn is the value of the computer after n years.
a Use the recurrence relation to find the value of the computer after 1, 2 and 3 years.
b Use your calculator recursively to determine how many years it takes for the value
of the computer to first be worth less than $1000.
10 The following recurrence relation can be used to model the depreciation of a car
purchased for $23 000 and depreciated at 3.5% of its original value each year.
V0 = 23 000, Vn+1 = Vn − 805
In the recurrence relation, Vn is the value of the car after n years.
a Use the recurrence relation to find the value of the car after 1, 2 and 3 years.
b Determine how many years it takes for the value of the car to first be worth less
than $10 000.
11 The following recurrence relation can be used to model the depreciation of a television.
In the recurrence relation, Vn is the value of the television after n years.
V0 = 1500, Vn+1 = Vn − 102
a i What is the purchase price of this television?
ii What is the depreciation of the television each year?
iii What is the annual percentage depreciation of the television?
b Use your calculator to determine the value of the television after 8 years.
c If the owner of the television decides to discard the television once it is first worth
less than $100, determine how long the owner will own the television before
discarding it.
13 The following recurrence relation can be used to model the depreciation of a printer
with purchase price $450 and depreciation of 5 cents for every page printed.
V0 = 450, Vn+1 = Vn − 0.05
In the recurrence relation, Vn is the value of the printer after n pages are printed.
a Write the first five terms of the sequence.
b Use your calculator to find the value of the printer after 20 pages are printed.
14 The following recurrence relation can be used to model the depreciation of a delivery
van with purchase price $48 000, depreciating by $200 for every 1000 kilometres travelled.
V0 = 48 000, Vn+1 = Vn − 200
In the recurrence relation, Vn is the value of the delivery van after n lots of 1000
kilometres are travelled.
a Use the recurrence relation to find the value of the van after 1000, 2000 and 3000 kilometres.
b Use your calculator to determine the value of the van after 15 000 kilometres.
c Use your calculator to determine how many kilometres it takes for the value of the
van to reach $43 000.
15 Jasmine owns a cafe that sells juices. The commercial blender, purchased for $1440,
depreciates in value using the unit cost method.
The rate of depreciation is $0.02 per juice that is produced. The recurrence relation that
models the year-to-year value, in dollars, of the blender is
B0 = 1440, Bn+1 = Bn − 144
a Calculate the number of juices that the blender produces each year.
b Determine how many juices the blender can produce before its value becomes 0.
c Use your calculator to find the value of the blender after 36 000 juices have been produced.
d The recurrence relation above could also represent the value of the blender
depreciating at a flat rate. What annual flat rate percentage of depreciation is
17 The value of a tandoori oven is depreciated using the flat rate method and can be
modelled using the following recurrence relation where Tn is the value of the oven after
n years.
T0 = 4500, Tn+1 = Tn − 405
The annual depreciation rate is closest to
A 8% B 8.5% C 9% D 9.5% E 10%
18 Jane purchased a motorbike for $5500. She will depreciate the value of her motorbike
by a flat rate of 10% of the purchase price per annum.
A recurrence relation that Jane can use to determine the value of the motorbike, Vn ,
after n years is
A V0 = 5500, Vn+1 = Vn + 550
B V0 = 5500, Vn+1 = Vn − 550
C V0 = 5500, Vn+1 = 0.9Vn
D V0 = 5500, Vn+1 = 1.1Vn
E V0 = 5500, Vn+1 = 0.2(Vn − 550)
While we can generate as many terms of a sequence as we like through repeated addition
and subtraction, the process can be tedious and so instead a rule can be used.
Consider the example of investing $2000 in a simple interest investment paying 5% per
annum. If we let Vn be the value of the investment after n years, we can use the following
recurrence relation to model this investment:
V0 = 2000, Vn+1 = Vn + 100
Using this recurrence relation we can write out the sequence of terms generated as follows:
V0 = 2000 = V0 + 0 × 100 (no interest paid yet)
V1 = V0 + 100 = V0 + 1 × 100 (after 1 year of interest paid)
V2 = V1 + 100 = (V0 + 100) + 100 = V0 + 2 × 100 (after 2 years of interest paid)
V3 = V2 + 100 = (V0 + 2 × 100) + 100 = V0 + 3 × 100 (after 3 years of interest paid)
V4 = V3 + 100 = (V0 + 3 × 100) + 100 = V0 + 4 × 100 (after 4 years of interest paid)
and so on.
Following this pattern, after n years of interest has been added, we can write:
Vn = 2000 + n × 100
This rule can be used to determine the value after n iterations in the sequence. For example,
using this rule, the value of the investment after 15 years would be:
V15 = 2000 + 15 × 100 = $3500
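The point of the rule is that it gives the same answer as repeated addition without generating every term first. The Python sketch below (hypothetical function names) compares the two. D is treated as a signed number here, negative for decay, which is a convention of the sketch; the text writes decay as Vn = V0 − nD with D positive.

```python
def by_recurrence(v0, d, n):
    """Apply Vn+1 = Vn + D n times, starting from V0."""
    v = v0
    for _ in range(n):
        v += d
    return v


def by_rule(v0, d, n):
    """Evaluate the rule Vn = V0 + n * D directly."""
    return v0 + n * d


print(by_recurrence(2000, 100, 15), by_rule(2000, 100, 15))  # 3500 3500
```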
Write down a rule for Vn for each of the following recurrence relations. Calculate V10 for
each case.
a V0 = 8, Vn+1 = Vn + 3
b V0 = 400, Vn+1 = Vn − 12
c V0 = 30, Vn+1 = Vn − 7
Explanation Solution
a 1 Identify the starting value. V0 = 8
2 Identify the common difference, D. D=3
3 Write the rule for Vn , noting that this Vn = 8 + 3n
is an example of linear growth.
4 Calculate V10 . V10 = 8 + 3 × 10 = 38
b 1 Identify the starting value. V0 = 400
2 Identify the common difference, D. D = 12
3 Write the rule for Vn , noting that this Vn = 400 − 12n
is an example of linear decay.
4 Calculate V10 . V10 = 400 − 12 × 10 = 280
c 1 Identify the starting value. V0 = 30
2 Identify the common difference, D. D=7
3 Write the rule for Vn , noting that this Vn = 30 − 7n
is an example of linear decay.
4 Calculate V10 . V10 = 30 − 7 × 10 = −40
These general rules can be applied to simple interest investments and loans, flat rate
depreciation and unit cost depreciation.
Amie invests $3000 in a simple interest investment with interest paid at the rate of 6.5%
per year.
Use a rule to find the value of the investment after 10 years.
Explanation Solution
1 Identify the starting value. V0 = 3000
2 Identify the common difference, D. D = (6.5/100) × 3000 = 195
3 Write the rule for Vn , noting that this is Vn = 3000 + 195n
an example of linear growth.
4 Calculate V10 . V10 = 3000 + 195 × 10 = 4950
The following recurrence relation can be used to model a simple interest investment:
V0 = 3000, Vn+1 = Vn + 260
where Vn is the value of the investment after n years.
a What is the principal of the investment? How much interest is added each year?
b Write down the rule for the value of the investment after n years.
c Use a rule to find the value of the investment after 15 years.
d Use a rule to find when the value of the investment first exceeds $10 000.
Explanation Solution
a These values can be read directly from the recurrence relation. a Principal: $3000; amount of interest = $260 each year
b Start with the general rule Vn = V0 + nD and substitute V0 = 3000 and D = 260. b Vn = 3000 + n × 260 = 3000 + 260n
c Substitute n = 15 into the rule to calculate V15. c V15 = 3000 + 260 × 15 = $6900
d Substitute Vn = 10 000 into the rule, and solve for n. Write your conclusion.
d 10 000 = 3000 + 260n
so 7000 = 260n
or n = 7000/260 = 26.92 . . . years
The value of the investment will first exceed $10 000 after 27 years.
Note: Because the interest is only paid into the account after a whole number of years, any decimal answer will need to be rounded up to the next whole number.
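The rounding-up step can be written out explicitly. In this sketch (hypothetical helper name), taking floor(exact) + 1 rather than a plain round-up is our own choice: it also covers the case where the target is reached exactly, since the value must then wait one more year to exceed it.

```python
import math


def first_year_exceeding(v0, d, target):
    """Smallest whole number of years n with V0 + n*D > target (assumes D > 0).

    Interest is only added at the end of each whole year, so the exact
    solution of V0 + n*D = target is moved up to the next whole number.
    """
    exact = (target - v0) / d
    return math.floor(exact) + 1


print(round((10000 - 3000) / 260, 2))          # 26.92
print(first_year_exceeding(3000, 260, 10000))  # 27
```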
A photocopier costs $6000 when new. Its value depreciates at the flat rate of 17.5% per
year. Write a rule and use this to find its value after 4 years.
Explanation Solution
1 Identify the starting value. V0 = 6000
2 Identify the common difference, D. D = (17.5/100) × 6000 = 1050
3 Write the rule for Vn , noting that this is Vn = 6000 − 1050n
an example of linear decay.
4 Calculate V4 . V4 = 6000 − 1050 × 4 = 1800
The value after 4 years is $1800.
The following recurrence relation can be used to model the flat rate of depreciation of a
set of office furniture:
V0 = 12 000, Vn+1 = Vn − 1200
where Vn is the value of the furniture after n years.
a What is the initial value of the furniture? How much does the furniture decrease by
each year?
b Write down the rule for the value of the furniture after n years.
c Use a rule to find the value of the furniture after 6 years.
d How long does it take for the furniture’s value to decrease to zero?
Explanation Solution
a These values can be read directly from the recurrence relation. a Initial value: $12 000; depreciation = $1200 each year
b Start with the general rule Vn = V0 − nD and substitute V0 = 12 000 and D = 1200. b Vn = 12 000 − n × 1200 = 12 000 − 1200n
c Use the rule to calculate V6. c V6 = 12 000 − 1200 × 6 = $4800
d Substitute Vn = 0, and solve for n. Write your conclusion. d 0 = 12 000 − n × 1200, so n = 10
The value of the furniture will depreciate to zero after 10 years.
A hairdryer in a salon was purchased for $850. The value of the hairdryer depreciates
by 25 cents for every hour it is in use.
Let Vn be the value of the hairdryer after n hours of use.
a Write down a rule to find the value of the hairdryer after n hours of use.
b What is the value of the hairdryer after 50 hours of use?
c On average, the salon will use the hairdryer for 17 hours each week. How many weeks
will it take for the value of the hairdryer to halve?
d The hairdryer has a scrap value of $100 before it is disposed of. Find the number of
hours of use before this occurs.
Explanation Solution
a 1 Identify the values of V0 and D. V0 = 850 and D = 0.25
2 Write down the rule for the value of Vn = 850 − 0.25n
the hairdryer after n hours of use.
b 1 Decide the value of n and substitute After 50 hours of use, n = 50.
into the rule. V50 = 850 − 0.25 × 50
V50 = 837.50
2 Write your answer. After 50 hours of use, the hairdryer has a
value of $837.50.
c 1 Halving the value of the hairdryer Solve Vn = 425
means it will have a value of $425.
2 Write down the rule, with the value 425 = 850 − 0.25n
of the hairdryer, Vn = 425.
3 Solve the equation for n. 0.25n = 850 − 425
0.25n = 425
n = 1700
4 Divide by 17 as the hairdryer is used for 17 hours each week. Number of weeks = 1700 ÷ 17 = 100
5 Write your answer. After 100 weeks, the hairdryer is expected
to halve in value.
d Solve for Vn = 100. 100 = 850 − 0.25n
0.25n = 750
n = 3000
Write your answer. The hairdryer can be used for 3000 hours
before it reaches its scrap value.
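Each part of the hairdryer example uses the rule Vn = V0 − nD, either evaluated for a given n or rearranged to n = (V0 − Vn)/D. A minimal sketch, with hypothetical function names and the values $850 and $0.25 per hour from the example:

```python
def unit_cost_value(v0, cost_per_unit, units):
    """Vn = V0 - n * (cost per unit) for unit cost depreciation."""
    return v0 - units * cost_per_unit


def units_to_reach(v0, cost_per_unit, value):
    """Rearrange value = V0 - n * cost_per_unit to n = (V0 - value) / cost_per_unit."""
    return (v0 - value) / cost_per_unit


print(unit_cost_value(850, 0.25, 50))        # 837.5
hours_to_halve = units_to_reach(850, 0.25, 425)
print(hours_to_halve, hours_to_halve / 17)   # 1700.0 100.0
print(units_to_reach(850, 0.25, 100))        # 3000.0
```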
Exercise 7C
3 Anthony borrows $12 000 from a bank at an annual simple interest rate of 7.2%.
a Let Vn be the value of the loan after n years. State the starting value, V0 .
b Determine how much interest is charged each year in dollars.
c Write down a rule for the value of the loan, Vn , after n years.
d Use your rule to find how much Anthony will owe the bank after 9 years.
Example 15 4 The value of a simple interest loan after n years, Vn , can be calculated using the
rule Vn = 8000 + 512n.
a What is the principal of this loan?
b How much interest is charged every year in dollars?
c Use the rule to find:
i the value of the loan after 12 years
ii when the value of the loan first doubles in value.
5 The value of a simple interest investment after n years, Vn , can be calculated using the
rule Vn = 2000 + 70n.
a What is the principal of this investment?
b How much interest is earned every year in dollars?
c Use the rule to find:
i the value of the investment after 6 years
ii when the value of the initial investment will first double in value.
7 A machine costs $7000 new and depreciates at a flat rate of 17.5% per annum. The
machine will be written off when its value is $875.
a State the starting value.
b Determine the annual depreciation in dollars.
c Write down a rule for the value of the machine, Vn , after n years.
d Determine the number of full years that the machine will be used (that is, has a
value greater than zero).
Example 17 8 The value of a sewing machine after n years, Vn , can be calculated from the
rule Vn = 1700 − 212.5n.
a What is the purchase price of the sewing machine?
b By how much is the value of the sewing machine depreciated each year in dollars?
c Use the rule to find the value of the sewing machine after 4 years.
d Find its value after 7 years.
e Determine the number of years it takes for the sewing machine to be worth nothing.
11 A car is valued at $35 400 at the start of the year, and at $25 700 at the end of that year.
During that year, the car travelled 25 000 kilometres.
a Find the total depreciation of the car in that year in dollars.
b Find the depreciation per kilometre for this car.
c Using V0 = 35 400, write down a rule for the value of the car, Vn , after n kilometres.
d How many kilometres have been travelled if the car has a value of $6688?
12 A printing machine costing $110 000 has a scrap value of $2500 after it has printed
4 million pages.
a Find:
i the unit cost of using the machine
ii the value of the machine after printing 1.5 million pages
iii the annual depreciation of the machine if it prints 750 000 pages per year.
b Find the value of the machine after 5 years if it prints, on average, 750 000 pages
per year.
c How many pages has the machine printed by the time the value of the machine is
$70 053?
For each recurrence relation, state the rule, find the first 6 terms and then plot each point
on a graph.
a V0 = 1, Vn+1 = 3Vn
b V0 = 8, Vn+1 = 0.5Vn
Explanation Solution
a 1 Convert to words. Starting value = 1; next value = 3 × current value
2 Multiply each term by 3 to find the next term. 1, 3, 9, 27, 81, 243
3 Plot each of the points on the axes. [Graph: Vn against n, for n = 0 to 5]
b 1 Convert to words. Starting value = 8; next value = 0.5 × current value
2 Multiply each term by 0.5 to find the next term. 8, 4, 2, 1, 0.5, 0.25
3 Plot each of the points on the axes. [Graph: Vn against n, for n = 0 to 5]
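Geometric sequences are generated in the same way as linear ones, except that each term is multiplied, rather than added to, by a fixed amount. A minimal Python sketch (hypothetical function name) that reproduces both sequences in this example:

```python
def geometric_terms(v0, r, count):
    """Return the first `count` terms of V0 = v0, Vn+1 = r * Vn (count >= 1)."""
    terms = [v0]
    for _ in range(count - 1):
        terms.append(terms[-1] * r)  # multiply the current term by r
    return terms


print(geometric_terms(1, 3, 6))    # [1, 3, 9, 27, 81, 243]
print(geometric_terms(8, 0.5, 6))  # [8, 4.0, 2.0, 1.0, 0.5, 0.25]
```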
As can be seen from the previous example, the first recurrence relation generates a sequence
whose successive terms grow geometrically, while the second recurrence relation generates a
sequence whose terms decay geometrically.
For compound interest, each year’s value is the previous value multiplied by R = 1 + r/100,
where r% is the annual interest rate. This gives the recurrence relation V0 = principal,
Vn+1 = R × Vn, which we can use to model and investigate the growth of an investment over
time. Compound interest loans and investments often accrue interest over periods of less than
a year, which we will consider at the end of this chapter.
The following recurrence relation can be used to model a compound interest investment
of $2000 paying interest at the rate of 7.5% per annum.
V0 = 2000, Vn+1 = 1.075 × Vn
In the recurrence relation, Vn is the value of the investment after n years.
a Use the recurrence relation to show that the value of the investment after 3 years is $2484.59.
b Determine when the value of the investment will first exceed $2500.
Explanation Solution
a 1 Write down the principal, V0. V0 = 2000
2 Use the recurrence relation to calculate V1, V2 and V3 and round to the nearest cent.
V1 = 1.075 × 2000 = 2150
V2 = 1.075 × 2150 = 2311.25
V3 = 1.075 × 2311.25 = 2484.59
b 1 Type ‘2000’. Press ENTER (or EXE).
2 Type ×1.075.
3 Count how many times you press ENTER (or EXE) until the term value is greater than 2500.
2000                       2000.
2000. · 1.075              2150.
2150. · 1.075              2311.25
2311.25 · 1.075            2484.59375
2484.59375 · 1.075         2670.93828125
4 Write your answer. The investment will first exceed $2500 after 4 years.
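The counting in part b can be sketched in Python as follows. The helper name is hypothetical, and the check R > 1 is our own guard, since a decaying sequence would never exceed the target.

```python
def years_to_exceed_compound(v0, r_factor, target):
    """Count applications of Vn+1 = R * Vn until the value first exceeds target (R > 1)."""
    if r_factor <= 1:
        raise ValueError("R must be greater than 1 for growth")
    years, value = 0, v0
    while value <= target:
        value *= r_factor  # one more year of compound interest
        years += 1
    return years, value


years, value = years_to_exceed_compound(2000, 1.075, 2500)
print(years, round(value, 2))            # 4 2670.94
print(round(2000 * 1.075 ** 3, 2))       # 2484.59, the value after 3 years
```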
A sofa was purchased for $7500 and is depreciating at a reducing balance rate of
8.4% per annum. Write down a recurrence relation where Vn is the value of the sofa after
n years.
Explanation Solution
1 Identify the value of V0 . V0 = 7500
2 Calculate the value of R. The depreciation rate is 8.4% per annum, so R = 1 − 8.4/100 = 0.916.
3 Write your answer. V0 = 7500, Vn+1 = 0.916 × Vn
The following recurrence relation can be used to model the value of office furniture with a
purchase price of $9600, depreciating at a reducing-balance rate of 7% per annum.
V0 = 9600, Vn+1 = 0.93 × Vn
In the recurrence relation, Vn is the value of the office furniture after n years.
a Use the recurrence relation to find the value of the office furniture, correct to the
nearest cent, after 1, 2 and 3 years.
b If the office furniture was initially purchased in 2023, at the end of which year will the
value of the office furniture first be less than $7000?
Explanation Solution
a 1 Write down the purchase price of the V0 = 9600
furniture, V0 .
2 Use the recurrence relation to V1 = 0.93 × 9600 = 8928
calculate V1 , V2 and V3 . Use your V2 = 0.93 × 8928 = 8303.04
calculator if you wish. V3 = 0.93 × 8303.04 = 7721.83
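b 1 Type 9600 and press ENTER (or EXE).
2 Type ×0.93 and press ENTER (or EXE) to generate each new term.
3 Count how many times you press ENTER (or EXE) until the value is less than 7000. (To the nearest cent, V4 = 7181.30 and V5 = 6678.61.)
4 Write your answer. The value of the furniture first drops below $7000 after 5 years. Thus, it is first worth less than $7000 at the end of 2028.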
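For reducing balance depreciation, R = 1 − r/100 and the value is multiplied by R each year. The sketch below uses hypothetical helper names; the range check on the rate and the positive-target check are our own safeguards. It reproduces the sofa's R and the office furniture answer.

```python
def reducing_balance_factor(rate_percent):
    """R = 1 - r/100 for reducing balance depreciation at r% per annum."""
    if not 0 < rate_percent < 100:
        raise ValueError("rate must be between 0 and 100 per cent")
    return 1 - rate_percent / 100


def years_to_fall_below(v0, rate_percent, target):
    """Count years of Vn+1 = R x Vn until the value is first less than target."""
    if target <= 0:
        raise ValueError("target must be positive")
    r = reducing_balance_factor(rate_percent)
    years, value = 0, v0
    while value >= target:
        value *= r
        years += 1
    return years, value


print(round(reducing_balance_factor(8.4), 3))  # 0.916 (the sofa example)
years, value = years_to_fall_below(9600, 7, 7000)
print(years, round(value, 2), 2023 + years)    # 5 6678.61 2028
```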
Exercise 7D
Example 19 1 Generate and graph the first five terms of the sequences defined by the recurrence relations below.
a V0 = 2, Vn+1 = 2Vn
b V0 = 3, Vn+1 = 3Vn
c V0 = 100, Vn+1 = 0.1Vn
3 A loan of $20 000 is charged compounding interest at the rate of 6.3% per annum.
A recurrence relation that can be used to model the value of the loan after n years is
shown below.
V0 = 20 000, Vn+1 = 1.063Vn
In the recurrence relation, Vn is the value of the loan after n years.
a Use the recurrence relation to show that the value of the loan after 3 years is
$24 023.14.
b Determine how many years it takes for the value of the loan to first exceed $30 000.
5 Jay takes out a loan of $18 000 at a compounding interest rate of 9.4% per annum.
a State the principal (starting value).
b Determine R using R = 1 + r/100.
c Let Vn be the value of the loan after n years. Write down a recurrence relation for
this loan.
d Use the recurrence relation to find the value of the loan after 4 years.
e When will the loan first be valued at more than $25 000?
7 Let Mn be the value of a minibus after n years. Write down a recurrence relation for a
minibus that was initially valued at $28 600 and is depreciated at a reducing-balance
rate of 7.4% per annum.
Example 22 8 Office furniture was purchased new for $18 000. It will be depreciated using a reducing
balance depreciation method with an annual depreciation rate of 4.5%. Let Vn be the
value of the furniture after n years.
a Write a recurrence relation to model the value of the furniture, Vn .
b Use the recurrence relation to find the value of the furniture after each of the first 5
years. Write the values of the terms of the sequence correct to the nearest cent.
c What is the value of the furniture after 3 years?
d What is the total depreciation of the furniture after 5 years?
9 A wedding gown was purchased new for $4000. The value of the wedding gown
depreciates using a reducing balance depreciation method with an annual depreciation
rate of 4.1%. Let Wn be the value of the wedding dress after n years.
a Write a recurrence relation to model the value of the wedding dress, Wn .
b Calculate the value of the wedding dress after three years.
c Determine the total amount of depreciation of the dress after five years.
10 A new computer server was purchased for $13 420. The value of the computer server
depreciates using a reducing-balance depreciation method with an annual depreciation
rate of 11.2%. Let Sn be the value of the server after n years.
a Write a recurrence relation to model the value of the server, Sn.
b Use the recurrence relation to find the value of the server after each of the first 5
years. Write the values of the terms of the sequence correct to the nearest cent.
c What is the value of the server after 5 years?
d What is the depreciation of the server in the third year?
ISBN 978-1-009-11041-9 © Peter Jones et al 2023 Cambridge University Press
Photocopying is restricted under law and this material must not be transferred to another party.
13 Mana invests $28 000 at an interest rate of 6.2% per annum, compounding annually.
Her investment will first be more than double its original value after
A 1 year B 2 years
C 10 years D 11 years
E 12 years
As with linear growth and decay, we can derive a rule to calculate any term in a geometric
sequence directly.
Assume $2000 is invested in a compound interest investment paying 5% per annum,
compounding yearly. Let Vn be the value of the investment after n years, giving the
following recurrence relation to model this investment:
V0 = 2000, Vn+1 = 1.05Vn
Using this recurrence relation we can write out the sequence of terms generated as follows:
V0 = 2000
V1 = 1.05V0
V2 = 1.05V1 = 1.05(1.05V0) = 1.05²V0
V3 = 1.05V2 = 1.05(1.05²V0) = 1.05³V0
V4 = 1.05V3 = 1.05(1.05³V0) = 1.05⁴V0
and so on.
Following this pattern, after n years of interest are added, we have:
Vn = 1.05ⁿV0
With this rule, we can now find the value of the investment for any specific year. For
example, using this rule, the value of the investment after 18 years would be:
V18 = 1.05¹⁸ × 2000 = $4813.24 (to the nearest cent)
In general, the recurrence relation V0 = initial value, Vn+1 = R × Vn has the rule Vn = Rⁿ × V0.
Exactly the same rule works for both growth and decay, noting that R > 1 is used for
growth and 0 < R < 1 for decay.
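A quick way to convince yourself that the rule Vn = Rⁿ × V0 agrees with the recurrence relation is to compute both. A minimal Python sketch with hypothetical function names:

```python
def geometric_by_rule(v0, r, n):
    """Evaluate the rule Vn = R**n * V0 directly."""
    return r ** n * v0


def geometric_by_recurrence(v0, r, n):
    """Apply Vn+1 = R * Vn n times, starting from V0."""
    v = v0
    for _ in range(n):
        v *= r
    return v


print(round(geometric_by_rule(2000, 1.05, 18), 2))  # 4813.24
print(geometric_by_rule(5, 4, 6))                   # 20480
print(geometric_by_rule(10, 0.5, 6))                # 0.15625
```

Floating-point arithmetic can make the two methods differ in the last few decimal places, so results are rounded to the nearest cent before being compared or reported.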
Write down a rule for the value of Vn in terms of n for each of the following. Use the rule
to find the value of V6 .
a V0 = 5, Vn+1 = 4Vn
b V0 = 10, Vn+1 = 0.5Vn
Explanation Solution
a 1 V0 = 5, R = 4 Vn = 4ⁿ × 5
2 Substitute n = 6. V6 = 4⁶ × 5 = 20480
b 1 V0 = 10, R = 0.5 Vn = 0.5ⁿ × 10
2 Substitute n = 6. V6 = 0.5⁶ × 10 = 0.15625
The rule for the value of the investment after n years, Vn , is shown below.
Vn = 1.09ⁿ × 10 000
a State how much money was initially invested.
b Find the annual interest rate for this investment.
c Find the value of the investment after 4 years, correct to the nearest cent.
d Find the amount of interest earned over the first 4 years, correct to the nearest cent.
e Find the amount of interest earned in the fourth year, correct to the nearest cent.
f Determine if the investor has doubled their money within 10 years.
Explanation Solution
a Recall the form of the direct rule, Vn = (1 + r/100)ⁿ × V0. Read off V0. a $10 000
b Since R = 1.09 = 1 + r/100, r = 9. b The annual interest rate is 9%.
c 1 Substitute n = 4 into the rule for the value of the investment. V4 = 1.09⁴ × 10 000 = 14 115.816 . . .
2 Write your answer, rounded to the After 4 years, the value of the investment is
nearest cent. $14 115.82, correct to the nearest cent.
d To find the total interest earned in Amount of interest
4 years, subtract the principal from the = $14 115.82 − $10 000
value of the investment after 4 years. = $4115.82
After 4 years, the amount of interest earned
is $4115.82.
e 1 Calculate V3 to the nearest cent. V3 = 1.09³ × 10 000 = 12 950.29 (nearest cent)
2 Calculate V4 − V3. V4 − V3 = 14 115.82 − 12 950.29 = 1165.53
3 Write your answer. Interest of $1165.53 was earned in the fourth year.
Note: An alternative method is to calculate 9% of V3.
f Calculate V10 and compare this to double the principal. We require Vn = 2 × V0 = 20 000.
Note V10 = 1.09¹⁰ × 10 000 = 23 673.64.
Since 23 673.64 > 20 000, the investor has doubled their money within 10 years.
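Parts c to f of this example can be checked in a few lines. In this sketch, the names `PRINCIPAL`, `R` and `investment_value` are our own; rounding each value to the nearest cent before subtracting follows the working above.

```python
PRINCIPAL = 10_000
R = 1.09


def investment_value(n):
    """Vn = 1.09**n * 10 000, rounded to the nearest cent."""
    return round(R ** n * PRINCIPAL, 2)


v3, v4 = investment_value(3), investment_value(4)
print(v4)                                   # 14115.82
print(round(v4 - PRINCIPAL, 2))             # 4115.82, total interest over 4 years
print(round(v4 - v3, 2))                    # 1165.53, interest earned in year 4
print(investment_value(10) >= 2 * PRINCIPAL)  # True
```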
With reducing balance depreciation, the value of an asset declines over time. The value of R
can be found using the formula R = 1 − r/100, where r% is the annual depreciation rate.
A machine costs $9500 to buy, and decreases in value with reducing balance depreciation
of 20% each year. A recurrence relation that can be used to model the value of the
machine after n years, Vn , is shown below.
V0 = 9500, Vn+1 = 0.8 × Vn
a Write down the rule for the value of the machine after n years.
b Use the rule to find the value of the machine after 8 years. Write your answer, correct
to the nearest cent.
c Calculate the total depreciation of the machine after 8 years.
Explanation Solution
a 1 Write down the values of \(V_0\) and \(R\). \(V_0 = 9500\), \(R = 1 - \frac{20}{100} = 0.8\)
2 Write down the rule. \(V_n = R^n \times V_0\), so \(V_n = 0.8^n \times 9500\)
b 1 Substitute \(n = 8\) into the rule. \(V_8 = 0.8^8 \times 9500 = 1593.835\ldots\)
2 Write your answer, rounding as required. After 8 years, the value of the machine is $1593.84, correct to the nearest cent.
c To find the total depreciation after 8 years, subtract the value of the machine after 8 years from the original value of the machine. Write your answer.
Depreciation = $9500 − $1593.84 = $7906.16
After 8 years, the machine has depreciated by $7906.16.
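A short Python sketch of the same reducing balance calculation. The helper name `reducing_balance_value` is hypothetical; it builds \(R = 1 - \frac{r}{100}\) from the depreciation rate and then applies \(V_n = R^n \times V_0\).

```python
def reducing_balance_value(v0: float, rate_percent: float, n: int) -> float:
    """Value after n years of reducing balance depreciation: V_n = (1 - r/100)**n * V_0."""
    R = 1 - rate_percent / 100
    return R ** n * v0

purchase_price = 9500
v8 = round(reducing_balance_value(purchase_price, 20, 8), 2)   # nearest cent
total_depreciation = round(purchase_price - v8, 2)
print(v8, total_depreciation)
```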
How many years will it take for an investment of $5000, paying compound interest at 6%
per annum, to grow above $8000? Write your answer correct to the nearest year.
Explanation Solution
1 Write down the values of \(V_0\), \(V_n\) and \(R\). \(V_0 = 5000\), \(V_n = 8000\), \(R = 1 + \frac{6}{100} = 1.06\)
2 Substitute into the rule for the particular term of a sequence. \(V_n = R^n \times V_0\), so \(8000 = 1.06^n \times 5000\)
3 Solve this equation for \(n\) using a CAS.
solve(8000 = (1.06)^n · 5000, n)
n = 8.06611354799
4 Write your answer, rounding up as interest is paid at the end of the year. The value of the investment will grow above $8000 after 9 years. After 8 years, the value is only $7969.24.
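Without a CAS solve command, the equation \(8000 = 1.06^n \times 5000\) can be solved with logarithms: \(n = \frac{\log(V_n / V_0)}{\log R}\). This Python sketch assumes, as the example does, that interest is credited only at the end of each whole year, so the answer is rounded up with `math.ceil` and then checked either side.

```python
import math

V0, target, R = 5000, 8000, 1.06

n_exact = math.log(target / V0) / math.log(R)   # about 8.066
years = math.ceil(n_exact)                      # interest is paid yearly, so round up

# Check the answer by evaluating the rule either side of it.
value_before = R ** (years - 1) * V0            # after 8 years
value_after = R ** years * V0                   # after 9 years
print(round(n_exact, 4), years, round(value_before, 2), round(value_after, 2))
```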
An industrial weaving company purchased a new loom at a cost of $56 000. It has
an estimated value of $15 000 after 10 years of operation. If the value of the loom is
depreciated using a reducing balance method, what is the annual rate of depreciation?
Write your answer correct to one decimal place.
Explanation Solution
1 Write down the values of \(V_0\), \(V_n\), \(R\) and \(n\). \(V_0 = 56\,000\), \(V_n = 15\,000\), \(n = 10\), \(R = 1 - \frac{r}{100}\)
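Rearranging the rule \(V_n = R^n \times V_0\) gives \(R = \left(\frac{V_n}{V_0}\right)^{1/n}\), and for reducing balance depreciation \(r = 100(1 - R)\). A minimal Python sketch of that rearrangement for the loom, offered as a numerical check rather than as the textbook's CAS method:

```python
V0, Vn, n = 56_000, 15_000, 10

R = (Vn / V0) ** (1 / n)   # annual multiplier, from V_n = R**n * V_0
rate = 100 * (1 - R)       # depreciation rate r, since R = 1 - r/100
print(round(rate, 1))
```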
Exercise 7E
8 After how many years would an investment of $200 invested at 4.75% per annum,
compounding annually, first exceed a value of $20 000?
Example 27 9 An investment of $1000 has grown to $1601.03 after 12 years invested at r% per
annum compound interest. Find the value of r to the nearest whole number.
10 What annual reducing balance depreciation rate would cause the value of a car to drop
from $8000 to $6645 in 3 years? Give your answer to the nearest percent.
11 How much money must you deposit in a compounding interest investment at a rate of
6.8% per annum if you require $12 000 in 4 years’ time? Round your answer to the
nearest cent.
14 Amber invests $15 000 at an interest rate of 5.8% per annum, compounding annually.
After how many years will her investment first be more than double its original value?
A 1 B 2 C 10 D 12 E 13
ISBN 978-1-009-11041-9 © Peter Jones et al 2023 Cambridge University Press
Photocopying is restricted under law and this material must not be transferred to another party.
7F Interest rates over different time periods and effective interest rates
Compound interest rates are usually quoted as annual rates, or interest rates per annum. This
rate is called the nominal interest rate for the investment or loan. Despite this, interest can
be calculated and paid according to a different time period, such as monthly. The time period over which compound interest is calculated and paid is called the compounding period.
An investment account will pay interest at the rate of 4.68% per annum. Convert this
interest rate to each of the following rates:
a monthly b fortnightly c quarterly.
Explanation Solution
a Divide by \(p = 12\). Monthly interest rate = \(\frac{4.68}{12}\) = 0.39%
b Divide by \(p = 26\). Fortnightly interest rate = \(\frac{4.68}{26}\) = 0.18%
c Divide by \(p = 4\). Quarterly interest rate = \(\frac{4.68}{4}\) = 1.17%
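The conversion is a single division by the number of compounding periods per year, \(p\). A Python sketch; the dictionary `PERIODS_PER_YEAR` and the helper `period_rate` are illustrative names only, and the period counts follow the conventions used in this chapter.

```python
# Number of compounding periods per year (the conventions used in this chapter).
PERIODS_PER_YEAR = {"monthly": 12, "fortnightly": 26, "quarterly": 4}

def period_rate(annual_rate_percent: float, p: int) -> float:
    """Interest rate (in %) for one compounding period: the annual rate divided by p."""
    return annual_rate_percent / p

rates = {name: round(period_rate(4.68, p), 2) for name, p in PERIODS_PER_YEAR.items()}
print(rates)
```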
Brian borrows $5000 from a bank. He will pay interest at the rate of 4.5% per annum.
Let Vn be the value of the loan after n compounding periods.
Write down a recurrence relation to model the value of Brian’s loan if interest is
a yearly b quarterly c monthly.
Explanation Solution
a 1 Define the variable \(V_n\). The compounding period is yearly. Let \(V_n\) be the value of Brian’s loan after \(n\) years.
2 Determine the value of \(R\), where \(r = 4.5\) and \(p = 1\). The interest rate is 4.5% per annum, so \(R = 1 + \frac{4.5}{100 \times 1} = 1.045\)
3 Write the recurrence relation. \(V_0 = 5000\), \(V_{n+1} = 1.045V_n\)
b 1 Define the variable \(V_n\). The compounding period is quarterly. Let \(V_n\) be the value of Brian’s loan after \(n\) quarters.
2 Determine the value of \(R\), where \(r = 4.5\) and \(p = 4\). The interest rate is 4.5% per annum, so \(R = 1 + \frac{4.5}{100 \times 4} = 1.01125\)
3 Write the recurrence relation. \(V_0 = 5000\), \(V_{n+1} = 1.01125V_n\)
c 1 Define the variable \(V_n\). The compounding period is monthly. Let \(V_n\) be the value of Brian’s loan after \(n\) months.
2 Determine the value of \(R\), where \(r = 4.5\) and \(p = 12\). The interest rate is 4.5% per annum, so \(R = 1 + \frac{4.5}{100 \times 12} = 1.00375\)
3 Write the recurrence relation. \(V_0 = 5000\), \(V_{n+1} = 1.00375V_n\)
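All three parts apply the same formula, \(R = 1 + \frac{r}{100 \times p}\), with a different \(p\). A Python sketch with a hypothetical helper, `growth_multiplier`, that prints each recurrence relation for Brian's loan:

```python
def growth_multiplier(r: float, p: int) -> float:
    """Growth multiplier R = 1 + r / (100 * p) for annual rate r% compounded p times a year."""
    return 1 + r / (100 * p)

for label, p in [("yearly", 1), ("quarterly", 4), ("monthly", 12)]:
    R = growth_multiplier(4.5, p)
    print(f"{label}: V0 = 5000, V(n+1) = {R:g} * V(n)")
```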
A principal value of $10 000 is invested in an account earning compound interest monthly
at the rate of 9% per annum.
Let Vn be the value of the investment after n months.
a Calculate the growth multiplier, R.
b Write down a recurrence relation for the value of the investment after n months.
c Write down a rule for the value of the investment after n months.
d Use this rule to find the value of the investment after 4 years.
Explanation Solution
a Since interest compounds monthly, \(p = 12\). \(R = 1 + \frac{9}{100 \times 12} = 1.0075\)
b Substitute \(V_0\) and \(R\) to form the recurrence relation. \(V_0 = 10\,000\), \(V_{n+1} = 1.0075V_n\)
c Substitute \(R = 1.0075\) and \(V_0 = 10\,000\) into the rule to find the rule for \(V_n\). \(V_n = 1.0075^n \times 10\,000\)
d Substitute \(n = 48\) (4 years = 48 months) into the rule. \(V_{48} = 1.0075^{48} \times 10\,000\) = $14 314.05
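The rule and the recurrence relation should agree. This Python sketch applies \(V_{n+1} = 1.0075V_n\) forty-eight times and compares the result with the direct rule \(V_{48} = 1.0075^{48} \times 10\,000\).

```python
R, V0 = 1.0075, 10_000

# Apply the recurrence relation V(n+1) = R * V(n) 48 times (48 months = 4 years).
v = V0
for _ in range(48):
    v = R * v

# Evaluate the direct rule V_n = R**n * V_0 for n = 48.
v_rule = R ** 48 * V0

print(round(v, 2), round(v_rule, 2))
```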
Brooke would like to borrow $20 000 that she will repay entirely after one year. She is
deciding between two loan options:
option A: 5.95% per annum, compounding weekly
option B: 6% per annum, compounding quarterly.
a Calculate the effective interest rate for each loan option.
b Which loan option is better and why?
Explanation Solution
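A sketch of the comparison in Python, using the effective interest rate rule \(r_{\text{eff}} = \left[\left(1 + \frac{r}{100 \times n}\right)^n - 1\right] \times 100\%\), with \(n = 52\) for weekly and \(n = 4\) for quarterly compounding. The function name `effective_rate` is illustrative; because Brooke is borrowing, the option with the lower effective rate is the cheaper one.

```python
def effective_rate(nominal_percent: float, periods_per_year: int) -> float:
    """Effective annual rate (%) for a nominal annual rate compounded periods_per_year times."""
    R = 1 + nominal_percent / (100 * periods_per_year)
    return (R ** periods_per_year - 1) * 100

option_a = effective_rate(5.95, 52)   # 5.95% p.a., compounding weekly
option_b = effective_rate(6.0, 4)     # 6% p.a., compounding quarterly

# For a loan, the option with the lower effective rate costs less interest.
better = "A" if option_a < option_b else "B"
print(f"A: {option_a:.3f}%  B: {option_b:.3f}%  lower cost: option {better}")
```

Since the whole loan is repaid after one year, the interest Brooke pays under each option is $20 000 multiplied by that option's effective rate (as a decimal), so comparing effective rates compares the actual costs.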
While the effective interest rate can be calculated manually, the CAS calculator can also be
used to quickly perform the calculation. To do this, the nominal interest rate and the number
of compounding periods in a year are required.
Marissa has $10 000 to invest. She chooses an account that will earn compounding
interest at the rate of 4.5% per annum, compounding monthly.
Use a CAS calculator to find the effective interest rate for this investment, correct to three
decimal places.
Explanation Solution
Using the TI-Nspire CAS
1 Press menu and then select
8: Finance
5: Interest Conversion
2: Effective interest rate
to paste in the eff(. . . ) command.
The parameters of this function are
eff(nominal rate, number of times the interest compounds each year).
2 Enter the nominal rate (4.5) and the number of times the interest compounds each year (12) into the function, separated by a comma. Press enter to get the effective interest rate.
Using the Casio ClassPad
1 Select Interactive, Financial, Interest Conversion, ConvEff.
2 Enter the number of times the interest compounds each year (12) and the nominal rate (4.5) as shown.
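As a cross-check on the CAS result, the effective rate can also be found by simulating one year of monthly compounding on Marissa's $10 000 and expressing the interest earned as a percentage of the principal. This Python simulation only illustrates what the effective rate means; it is not how either calculator computes it internally.

```python
principal = 10_000
R = 1 + 4.5 / (100 * 12)   # monthly growth multiplier

balance = principal
for _ in range(12):
    balance *= R           # interest credited at the end of each month

interest = balance - principal
effective_percent = interest / principal * 100
print(round(effective_percent, 3))
```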
Exercise 7F
Interest rate conversions
Example 28 1 Convert each of the annual interest rates below to an interest rate for the given time
period. Write your answers correct to two decimal places.
a 4.8% per annum to monthly b 8.3% per annum to quarterly
c 10.4% per annum to fortnightly d 7.4% per annum to weekly
e 12.7% per annum to daily
Example 30 4 A principal value of $20 000 is invested in an account earning compound interest of
6% per annum, compounding monthly.
a Write down a recurrence relation for the value of the investment, Vn , after n months.
b Write down a rule for Vn in terms of n.
c Use this rule to find the value of the investment after 5 years (60 months). Round
your answer to the nearest cent.
6 Wayne invests $7600 with a bank. He will be paid interest at the rate of 6% per annum,
compounding monthly. Let Vn be the value of the investment after n months.
a Write a recurrence relation to model Wayne’s investment.
b Write down a rule for Vn in terms of n.
c How much is Wayne’s investment worth after 5 months? Round your answer to the
nearest cent.
d After how many months will Wayne’s investment first double in value?
7 Jessica borrows $3500 from a bank. She will be charged compound interest at the rate
of 8% per annum, compounding quarterly. Let Vn be the value of the loan after
n quarters.
a Write a recurrence relation to model the value of Jessica’s loan.
b If Jessica pays back everything she owes to the bank after 1 year, how much money
will she pay? Round your answer to the nearest cent.
9 Stella borrows $25 000 from a bank and pays nominal compound interest of
7.94% per annum.
a Calculate the effective rate for the current loan when interest compounds fortnightly,
correct to two decimal places.
b Calculate the effective rate for this loan when interest compounds monthly, correct
to two decimal places.
c Should Stella choose fortnightly or monthly compounding?
10 Luke is considering a loan of $35 000. His bank has two compound interest rate options:
A: 8.3% per annum, compounding monthly
B: 7.8% per annum, compounding weekly.
a Calculate the effective interest rate for each of the loan options. Round your answers
to two decimal places.
b Calculate the amount of interest Luke would pay in the first year for each of the loan
options. Round your answers to the nearest cent.
c Which loan should Luke choose and why?
11 Sharon is considering investing $140 000. Her bank has two compound interest
investment options:
A: 5.3% per annum, compounding monthly
B: 5.5% per annum, compounding quarterly.
a Calculate the effective interest rate for each of the investment options. Round your
answers to two decimal places.
b Calculate the amount of interest Sharon would earn in the first year for each of the
investment options. Give your answer to the nearest dollar.
c Which investment option should Sharon choose and why?
13 An account increases by 7% in one year when interest compounds monthly. Find the
annual interest rate correct to 2 decimal places.
15 An amount of $4700 is invested, earning compound interest at the rate of 6.8% per
annum, compounding quarterly. The effective annual interest rate is closest to
A 6.80% B 6.97% C 6.98% D 7.02% E 7.03%
17 Maya invested $25 000 in an account at her bank with interest compounding monthly.
After one year, the balance of Maya’s account was $26 253.
The difference between the rate of interest per annum used by her bank and the
effective annual rate of interest for Maya’s investment is closest to
A 0.112% B 0.2% C 4.89%
D 4.9% E 5.012%
Key ideas and chapter summary
Recurrence relation: A relation that enables the value of the next term in a sequence to be obtained from one or more current terms. Examples include ‘to find the next term, add two to the current term’ and ‘to find the next term, multiply the current term by three and subtract five’.
Balance: The balance is the value of a loan or investment at any time during the loan or investment period.
Interest: The fee that is added to a loan, or the payment received for investing money, is called the interest.
Simple interest: Simple interest is a fixed amount of interest that is paid at regular time intervals. Simple interest is an example of linear growth.
Scrap value: Scrap value is the value at which an item is ‘written off’ or considered no longer useful or usable.
Flat rate depreciation: Flat rate depreciation is a constant amount that is subtracted from the value of an item at regular time intervals. It is an example of linear decay.
Unit cost depreciation: Unit cost depreciation is depreciation that is calculated based on units of use rather than time. Unit cost depreciation is an example of linear decay.
Compounding period: Interest rates are usually quoted as annual rates (per annum). Interest is sometimes calculated more often than once a year, for example each quarter, month, fortnight, week or day. The time period for the calculation of interest is called the compounding period.
Reducing balance depreciation: When the value of an item decreases by a fixed percentage of its value after each time period, it is said to be depreciating using a reducing balance method. Reducing balance depreciation is an example of geometric decay.
Nominal interest rate: A nominal interest rate is an annual interest rate for a loan or investment.
Effective interest rate: The effective interest rate is the interest earned or charged by an investment or loan in one year, written as a percentage of the original amount invested or borrowed. Effective interest rates allow loans or investments with different compounding periods to be compared. Effective interest rates can be calculated using the rule
\( r_{\text{eff}} = \left[\left(1 + \frac{r}{100 \times n}\right)^{n} - 1\right] \times 100\% \)
where \(r\) is the nominal annual interest rate and \(n\) is the number of compounding periods in 1 year.
Skills checklist
Download this checklist from the Interactive Textbook, then print it and fill it out to check your skills.
7A 5 I can name terms in a sequence.
7B 7 I can model simple interest loans and investments with a recurrence relation.
7E 24 I can calculate the value and total depreciation of an asset after a period of
reducing balance depreciation.
Multiple-choice questions
1 Consider the following recurrence relation
V0 = 5, Vn+1 = Vn − 3
The sequence generated by this recurrence relation is
A 5, 15, 45, 135, 405, . . . B 5, 8, 11, 14, 17, . . .
C 5, 2, −1, −4, −7, . . . D 5, 15, 45, 135, 405, . . .
E 5, −15, 45, −135, 405, . . .
4 Brian has two trees in his backyard. Every month, he will plant three more trees.
A recurrence relation for the number of trees, \(T_n\), in Brian’s backyard after \(n\) months is
A \(T_0 = 2\), \(T_{n+1} = 3T_n\)
B \(T_0 = 2\), \(T_{n+1} = 3T_n + 3\)
C \(T_0 = 2\), \(T_{n+1} = T_n + 3\)
D \(T_0 = 2\), \(T_{n+1} = T_n - 3\)
E \(T_0 = 2\), \(T_{n+1} = 3T_n - 3\)
5 A graph that shows the value of a simple interest investment of $1000, earning interest
of $5 per month is
[Graphs A to E: \(M_n\) plotted against \(n\), for \(n = 0\) to 5]
6 A car is depreciated using a unit cost depreciation method. It was purchased for
$18 990 and, after travelling a total of 20 000 kilometres, it has an estimated value of
$15 990. The depreciation amount, per kilometre, is
A $0.15 B $0.80 C $0.95 D $6.67 E $3000
7 Arthur invests $2000 with a bank. He will be paid simple interest at the rate of 5.1%
per annum. If Vn is the value of Arthur’s investment after n years, a recurrence relation
for Arthur’s investment is
A V0 = 2000, Vn+1 = Vn + 5.1
B V0 = 2000, Vn+1 = 5.1Vn
C V0 = 2000, Vn+1 = 0.051Vn + 102
D V0 = 2000, Vn+1 = Vn + 102
E V0 = 2000, Vn+1 = 1.051Vn + 2000
10 The recurrence relation that generates a sequence of numbers representing the value of
a car n years after it was purchased is
V0 = 18 000, Vn+1 = Vn − 1098
The car had a purchase price of $18 000 and is being depreciated using
A flat rate depreciation at 6.1% of its value per annum
B flat rate depreciation at $6.10 per kilometre travelled
C flat rate depreciation at $1098 per kilometre travelled
D unit cost depreciation at $6.10 per kilometre travelled
E unit cost depreciation at $1098 per kilometre travelled
11 A computer is depreciated using a flat rate depreciation method. It was purchased for
$2800 and depreciates at the rate of 8% per annum. The amount of depreciation after
4 years is
A $224 B $448 C $794 D $896 E $1904
12 Sandra invests $6000 in an account that pays interest at the rate of 4.57% per annum,
compounding annually. The number of years it takes for the investment to exceed
$8000 is
A 5 B 6 C 7 D 8 E 9
13 The value of a machine is depreciating by 8% every year. The initial value is 2700.
A recurrence relation model for the value of the machine after n years, Pn , is
A P0 = 2700, Pn+1 = 1.8 × Pn
B P0 = 2700, Pn+1 = 1.08 × Pn
C P0 = 2700, Pn+1 = 0.92 × Pn
D P0 = 2700, Pn+1 = 1 + 8 × Pn
E P0 = 2700, Pn+1 = 1.08 + Pn
14 An investment of $50 000 is compounding annually over a number of years. The graph
that best represents the value of the investment at the end of each year is
[Graphs A to E: value of the investment plotted against year]
15 An item is depreciated using a reducing balance depreciation method. The value of the
item after n years, Vn , is modelled by the recurrence relation
V0 = 4500, Vn+1 = 0.86Vn
The rule for the value of the item after n years is
A \(V_n = 0.86^n \times 4500\)
B \(V_n = 1.86^n \times 4500\)
C \(V_n = (1 + 0.86)^n \times 4500\)
D \(V_n = 0.86 \times n \times 4500\)
E \(V_n = (1 - 0.86)^n \times 4500\)
17 The interest rate on a compound interest loan is 12.6% per annum, compounding
monthly. The value of the loan after n months, Vn, is modelled by the recurrence relation
V0 = 400, Vn+1 = R × Vn
The value of the growth multiplier, R, in this recurrence relation is
A 0.874 B 1.00 C 1.0105 D 1.126 E 2.05
18 An amount of $2000 is invested, earning compound interest at the rate of 5.4% per
annum, compounding quarterly. The effective annual interest rate is closest to
A 5.2% B 5.3% C 5.4% D 5.5% E 5.6%
19 A car was purchased for $74 500. It depreciates in value at a rate of 8.5% per year,
using a reducing balance depreciation method. The total depreciation of the car over
5 years is closest to
A $4439
B $26 718
C $37 522
D $47 782
E $112 022
20 Sam invested $6500 at 8.75% per annum with interest compounding monthly. If the
investment now amounts to $13 056, for how many years was it invested?
A 5 B 7 C 8 D 9 E 96
Written response questions
1 Jack borrows $20 000 from a bank and is charged simple interest at the rate of 9.4% per
annum. Let Vn be the value of the loan after n years.
a Write down a recurrence relation for the value of Jack’s loan after n years.
b Use the recurrence relation to model how much Jack will need to pay the bank after
5 years.
The bank decides to change the loan to a compound interest loan on a yearly basis,
with an annual interest rate of 9.4%. Let Wn be the value of the loan after n years.
c Write a recurrence relation to model the value of Jack’s loan.
d Write a rule for Wn in terms of n.
e Use the rule to find the value of the loan after 5 years. Round your answer to the
nearest cent.
2 Ilana uses a personal loan to buy a dress costing $300. Interest is charged at 18% per
annum, compounding monthly.
If she repays the loan fully after 6 months, how much will she pay? Round your answer
to the nearest cent.
3 Kelly bought her current car 5 years ago for $22 500.
Let Vn be the value of Kelly’s car after n years.
a If Kelly uses a flat rate depreciation of 12% per annum:
i write down a recurrence relation for the value of Kelly’s car after n years
ii use the recurrence relation to find the current value of Kelly’s car.
b If Kelly uses reducing balance depreciation at 16% per annum:
i write down a recurrence relation for the value of Kelly’s car after n years
ii use the recurrence relation to find the current value of Kelly’s car using reducing
balance depreciation. Round your answer to the nearest cent.
c On the same axes, sketch a graph of the value of Kelly’s car against the number of
years for both flat rate and reducing balance depreciation.
4 A commercial cleaner bought a new vacuum cleaner for $650. The value of the
vacuum cleaner decreases by $10 for every 50 offices that it cleans.
a How much does the value of the vacuum cleaner depreciate when one office is cleaned?
b Give a recurrence relation for the value of the vacuum cleaner, Vn , after n offices
have been cleaned.
c The cleaner has a contract to clean 10 offices, 5 nights a week for 40 weeks in a
year. What is the value of the vacuum cleaner after 1 year?
7 On the birth of his granddaughter, a man invests a sum of money at a rate of 11.65%
per annum, compounding twice per year.
On her 21st birthday he gives all of the money in the account to his granddaughter.
If she receives $2529.14, how much did her grandfather initially invest? Round your
answer to the nearest cent.
8 Geoff invests $18 000 in an investment account. After 2 years the investment account
contains $19 300.
If the account pays r% interest per annum, compounding quarterly, find the value of r,
to one decimal place.
Chapter objectives
• How can we combine both linear and geometric growth/decay?
• How do we model a compound interest investment where additional payments are made?
• How can recurrence relations be used to model reducing balance loans?
• How can recurrence relations be used to model annuities?
• What are amortisation tables and how can they be used?
• How can a finance solver be used to analyse reducing balance loans, annuities and investments with additional payments?
• What are interest-only loans?
• What are perpetuities?
Often loans and investments are more complex than described in the previous chapter.
In particular, loans are often paid off through regular payments, investments may have
additional contributions made throughout their life and interest rates may change. In
this chapter, we analyse investments with additional payments, reducing balance loans,
annuities, interest-only loans and perpetuities using our already developed tool of
recurrence relations as well as new tools such as amortisation tables and the Finance
Solver found on a CAS.
In the previous chapter, recurrence relations were used to model financial situations
with linear and geometric growth/decay such as simple and compound interest and the
depreciation of assets. Recurrence relations can also be used to model situations that involve
elements of both linear and geometric growth/decay.
There are several examples in finance that involve both geometric and linear growth or
decay. For example, an investment with compound interest grows geometrically over time
but might also have linear growth if regular additions are made. Alternatively, a personal
loan may be paid off with regular payments rather than at the conclusion of the loan period.
In general, a recurrence relation of the form
V0 = starting value, Vn+1 = R × Vn ± D
can be used to model situations that involve both geometric and linear growth/decay.
Write down the first five terms of the sequence generated by the recurrence relation
V0 = 3, Vn+1 = 4Vn − 1
Explanation Solution
1 Write down the starting value. 3
2 Apply the rule (multiply by 4, then 3 × 4 − 1 = 11
subtract 1) to generate four more terms. 11 × 4 − 1 = 43
43 × 4 − 1 = 171
171 × 4 − 1 = 683
3 Write your answer. The first five terms are 3, 11, 43, 171, 683
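Generating terms by hand is the same "multiply, then add" loop a calculator runs. A Python sketch with a hypothetical helper, `first_terms`, for any recurrence relation of the form \(V_{n+1} = R \times V_n + D\) (here \(R = 4\) and \(D = -1\)):

```python
def first_terms(v0: float, multiplier: float, addend: float, count: int) -> list:
    """First `count` terms of V0 = v0, V(n+1) = multiplier * V(n) + addend."""
    terms = [v0]
    while len(terms) < count:
        terms.append(multiplier * terms[-1] + addend)
    return terms

print(first_terms(3, 4, -1, 5))
```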
Fred has saved $5000 and invests this in a compound interest account paying 4% per
annum, compounding yearly. He also adds an extra $1000 each year.
Model this investment using a recurrence relation of the form
V0 = the principal, Vn+1 = RVn + D
where Vn is the value of the investment after n years.
Explanation Solution
1 Write down the value of \(V_0\) and \(D\), where \(D\) is the amount added each year. \(V_0 = 5000\) and \(D = 1000\)
2 Determine the value of \(R\) using \(R = 1 + \frac{r}{100 \times p}\), where \(r = 4\) and \(p = 1\) because interest compounds annually. \(R = 1 + \frac{4}{100 \times 1} = 1.04\)
3 Use the values of \(V_0\), \(R\) and \(D\) to write down the recurrence relation. \(V_0 = 5000\), \(V_{n+1} = 1.04V_n + 1000\)
When interest compounds at intervals other than a year, we need to find the interest rate for
the compounding period which is calculated based on the following:
12 equal months in every year (even though some months have different numbers of days)
4 quarters in every year (a quarter is equal to 3 months)
26 fortnights in a year (even though there are slightly more than this)
52 weeks in a year (even though there are slightly more than this)
365 days in a year (ignore the existence of leap years).
Nor invests $1200 and plans to add an extra $50 each month. The account pays interest at
a rate of 3% per annum, compounding monthly.
Model this investment using a recurrence relation of the form
V0 = the principal, Vn+1 = RVn + D
where Vn is the value of the investment after n months.
Explanation Solution
1 Write down the value of \(V_0\) and \(D\), where \(D\) is the amount added each month. \(V_0 = 1200\) and \(D = 50\)
2 Determine the value of \(R\) using the formula \(R = 1 + \frac{r}{100 \times p}\), where \(r = 3\) and \(p = 12\) because interest compounds monthly. \(R = 1 + \frac{3}{100 \times 12} = 1.0025\)
3 Use the values of \(V_0\), \(R\) and \(D\) to write down the recurrence relation. \(V_0 = 1200\), \(V_{n+1} = 1.0025V_n + 50\)
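The recurrence relation can be run forward to see the balance month by month. This Python sketch computes Nor's first three months; the ordering in the comment (interest first, then the $50 deposit) is simply what \(V_{n+1} = 1.0025V_n + 50\) encodes.

```python
V0, D = 1200, 50
R = 1 + 3 / (100 * 12)     # 3% p.a. compounding monthly gives 1.0025

balances = [V0]
for _ in range(3):
    # interest is added first, then the extra $50 deposit
    balances.append(R * balances[-1] + D)

for month, value in enumerate(balances):
    print(f"V{month} = {value:.2f}")
```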
Once we have a recurrence relation, we can use it to determine the value of the investment
after a given number of periods once interest has been paid and extra payments have been
added to the principal. This value can be plotted on a graph so that we can see the impact of
making additional payments over time.
Explanation Solution
a Note that V0 = 400. The initial investment was $400.
b Perform the calculations. V0 = $400
V1 = 1.005 × 400 + 30 = $432
V2 = 1.005 × 432 + 30 = $464.16
V3 = 1.005 × 464.16 + 30 = $496.48
The value of Albert’s investment is
c i Either continue performing the calculations or use your CAS as follows.
ii Type 400 and press enter (or EXE).
iii Type × 1.005 + 30 and press enter (or EXE) six more times. The screen shows:
400
400 × 1.005 + 30 = 432
432 × 1.005 + 30 = 464.16
464.16 × 1.005 + 30 = 496.48
496.4808 × 1.005 + 30 = 528.96
528.9632 × 1.005 + 30 = 561.61
561.6080 × 1.005 + 30 = 594.42
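The same six steps as a Python loop. The values \(V_0 = 400\), \(R = 1.005\) and \(D = 30\) are taken from the calculations in the solution; the printed list should match the calculator screen to the nearest cent.

```python
V0, R, D = 400, 1.005, 30

values = [V0]
for _ in range(6):
    values.append(R * values[-1] + D)   # V(n+1) = 1.005 V(n) + 30

print([round(v, 2) for v in values])
```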
From the example above, it is clear that making additional payments on a regular basis makes the investment grow faster than it would from interest alone. In particular, making additional payments early is beneficial, as compound interest is earned on each additional payment for longer. An example of this type of investment might be saving for retirement.
Sometimes we are given a recurrence relation and asked to determine the annual interest rate. To do this, we use the formula \(R = 1 + \frac{r}{100 \times p}\) and solve for \(r\).
Determine the annual interest rates for each of the following investments.
a Consider an investment given by the recurrence relation
Explanation Solution
a Solve \(R = 1 + \frac{r}{100 \times p}\) for \(r\), where \(R = 1.005\) and \(p = 12\) because interest is compounded monthly. Solving \(1.005 = 1 + \frac{r}{100 \times 12}\) gives \(r = 6\). Thus, the annual interest rate is 6%.
b Solve \(R = 1 + \frac{r}{100 \times p}\) for \(r\), where \(R = 1.012\) and \(p = 4\) because interest is compounded quarterly. Solving \(1.012 = 1 + \frac{r}{100 \times 4}\) gives \(r = 4.8\). Thus, the annual interest rate is 4.8%.
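Rearranging \(R = 1 + \frac{r}{100 \times p}\) gives \(r = 100 \times p \times (R - 1)\), which needs no equation solver. A Python sketch with an illustrative helper, `annual_rate`:

```python
def annual_rate(R: float, p: int) -> float:
    """Annual interest rate r (%) from R = 1 + r / (100 * p), i.e. r = 100 * p * (R - 1)."""
    return 100 * p * (R - 1)

print(round(annual_rate(1.005, 12), 4))  # monthly compounding
print(round(annual_rate(1.012, 4), 4))   # quarterly compounding
```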
Exercise 8A
3 Jane has already saved $300 000 and plans to add an extra $50 000 per year to an
investment account immediately after the interest payment is calculated. The account
pays interest of 5.2% per annum, compounding annually.
Let Vn be the value of the investment after n years.
a State the initial value of the investment, V0 .
b State the amount added each year, D.
c Determine the value of the growth multiplier, R.
d Model this investment using a recurrence relation of the form
V0 = the principal, Vn+1 = RVn + D
Example 3 4 Henry invests $3500 and plans to add an extra $150 per month after the interest is
calculated. The account pays interest of 3.6% per annum, compounding monthly.
Let Vn be the value of the investment after n months.
a Determine the value of the growth multiplier, R.
b Model this investment using a recurrence relation of the form
V0 = the principal, Vn+1 = RVn + D
5 Lois invests $1700 and plans to add an extra $100 per quarter. The account pays
interest of 3.2% per annum, compounding quarterly.
Let Vn be the value of the investment after n quarters.
a Model this investment using a recurrence relation of the form
V0 = the principal, Vn+1 = RVn + D
6 Sarah invests $1500 at 7.3% per annum, compounding daily. She plans to add an extra
$4 to her investment each day, immediately after the interest is calculated.
Let Vn be the value of the investment after n days.
Write down a recurrence relation to model Sarah’s investment.
7 Rachel invests $24 000 at 6% per annum, compounding monthly. She plans to add an
extra $500 to her investment each month.
Let Vn be the value of the investment after n months.
Write down a recurrence relation to model Rachel’s investment and determine the value
of the investment after six months. Round your answer to the nearest cent.
9 A compound interest investment with regular quarterly additions to the principal can be
modelled by the recurrence relation
V0 = 20 000, Vn+1 = 1.025Vn + 2000
where Vn is the value of the investment after n quarters.
a What is the principal of this investment?
b How much is added to the principal each quarter?
c Use your calculator to determine the balance of the investment after three quarterly
payments have been made. Round your answer to the nearest cent.
d Plot the value of the investment after 0, 1, 2 and 3 quarters on a graph.
Example 5 10 Consider the compound interest investment with regular annual additions to the
principal given by the recurrence relation
V0 = 2000, Vn+1 = 1.08Vn + 1000
where Vn is the value of the investment after n years.
Determine the annual interest rate for the investment.
11 Consider the compound interest investment with regular quarterly additions to the
principal given by the recurrence relation:
V0 = 20 000, Vn+1 = 1.025Vn + 2000
where Vn is the value of the investment after n quarters.
Determine the annual interest rate for the investment.
14 Consider the following five recurrence relations representing the value of an asset after
n years, Vn .
V0 = 10 000, Vn+1 = Vn + 1500
V0 = 10 000, Vn+1 = Vn − 1500
V0 = 10 000, Vn+1 = 1.15Vn − 1500
V0 = 10 000, Vn+1 = 1.125Vn − 1500
V0 = 10 000, Vn+1 = 1.25Vn − 1500
How many of these recurrence relations indicate that the value of an asset is
A 0 B 1 C 2 D 3 E 4
Flora borrows $8000 at an interest rate of 13% per annum, compounding annually. She
makes yearly payments of $2100.
Construct a recurrence relation to model this loan, in the form
V0 = the principal, Vn+1 = RVn − D
where Vn is the balance of the loan after n years.
Explanation Solution
1 State V0 and D. V0 = 8000 and D = 2100
2 Determine the value of \(R\) using the formula \(R = 1 + \frac{r}{100 \times p}\), where \(r = 13\) and \(p = 1\). \(R = 1 + \frac{13}{100 \times 1} = 1.13\)
3 Use the values of \(V_0\), \(R\) and \(D\) to write down the recurrence relation. \(V_0 = 8000\), \(V_{n+1} = 1.13V_n - 2100\)
Alyssa borrows $1000 at an interest rate of 15% per annum, compounding monthly. She
makes monthly payments of $250.
Construct a recurrence relation to model this loan, in the form
V0 = the principal, Vn+1 = RVn − D
where Vn is the balance of the loan after n months.
Explanation Solution
1 State V0 and D. V0 = 1000 and D = 250
2 Determine the value of \(R\) using the formula \(R = 1 + \frac{r}{100 \times p}\), where \(r = 15\) and \(p = 12\). \(R = 1 + \frac{15}{100 \times 12} = 1.0125\)
3 Use the values of \(V_0\), \(R\) and \(D\) to write down the recurrence relation. \(V_0 = 1000\), \(V_{n+1} = 1.0125V_n - 250\)
Once we have a recurrence relation, we can use it to determine things such as the balance of
a loan after a given number of payments.
Explanation Solution
a i Write down the recurrence relation. \(V_0 = 1000\), \(V_{n+1} = 1.0125V_n - 257.85\)
ii Type ‘1000’ and press enter (or EXE).
iii Type ‘× 1.0125 − 257.85’ and press enter (or EXE) 4 times. The screen shows:
1000
1000 × 1.0125 − 257.85 = 754.65
754.65 × 1.0125 − 257.85 = 506.23
506.2331 × 1.0125 − 257.85 = 254.71
254.7110 × 1.0125 − 257.85 = 0.044927
Balance: $0.04 (to the nearest cent).
b Read the third line of the calculator screen. Balance: $506.23 (to the nearest cent).
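A Python sketch of the same four calculator steps for the reducing balance loan \(V_{n+1} = 1.0125V_n - 257.85\). The list `history` holds \(V_0\) to \(V_4\), so `history[2]` is the third line of the screen.

```python
V0, R, payment = 1000, 1.0125, 257.85

balance = V0
history = [balance]
for _ in range(4):
    balance = R * balance - payment   # V(n+1) = 1.0125 V(n) - 257.85
    history.append(balance)

print([round(b, 2) for b in history])
```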
An annuity is an investment where compound interest is earned and money is withdrawn
from the investment by the individual in the form of regular payments. The calculations used
to model the values of reducing balance loans and annuities are identical. The value of the
annuity represents how much money is left in the investment.
An annuity can be modelled with a recurrence relation. Once we have a recurrence relation,
we can use it to determine things such as the value of the annuity after a given number of
payments have been received.
Modelling an annuity
Let Vn be the value of the annuity after n payments have been made. Then
V0 = principal, Vn+1 = RVn − D
where \(D\) is the payment made each period, \(R = 1 + \frac{r}{100 \times p}\) is the growth multiplier, \(r\) is the annual interest rate and \(p\) is the number of compounding periods per year.
Reza invests $12 000 in an annuity that earns interest at the rate of 6% per annum,
compounding monthly, providing him with a monthly income of $2035.
a Model this annuity using a recurrence relation of the form
V0 = the principal, Vn+1 = RVn − D
Explanation Solution
a i State the value of V0 and D. V0 = 12 000 and D = 2035
ii Determine the value of R using the R=1+ = 1.005
r 100 × 12
formula R = 1 + .
100 × p
iii Use the values of V0 , R and D to V0 = 12 000, Vn+1 = 1.005Vn − 2035
write down the recurrence relation.
b i Type 12000 and press · or .
ii Type × 1.005-2035 and press · or four times to obtain the screen:
12000 × 1.005 − 2035 → 10025
10025 × 1.005 − 2035 → 8040.125
8040.125 × 1.005 − 2035 → 6045.326
6045.326 × 1.005 − 2035 → 4040.55
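Because the calculations for annuities and reducing balance loans are identical, the same kind of loop reproduces Reza's annuity values. A minimal sketch, assuming each iteration corresponds to one monthly payment of $2035; `annuity_values` is an illustrative name:

```python
def annuity_values(principal: float, r: float, payment: float, n: int) -> list[float]:
    """Values V0..Vn of an annuity with V(n+1) = R * V(n) - D."""
    values = [principal]
    for _ in range(n):
        # interest is added, then the regular payment is withdrawn
        values.append(r * values[-1] - payment)
    return values


vals = annuity_values(12_000, 1.005, 2035, 4)
for n, value in enumerate(vals):
    print(f"after {n} payments: ${value:,.3f}")
```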
Exercise 8B
Example 7 2 Jackson borrows $2000 at an interest rate of 6% per annum, compounding monthly.
The loan will be repaid by making monthly payments of $339.
Let Vn be the balance of the loan after n months.
a State V0 and D.
b Determine the value of R, using the formula \(R = 1 + \frac{r}{100 \times p}\).
c Use the values of V0, R and D to write down the recurrence relation in the form: V0 = the principal, Vn+1 = RVn − D
3 Benjamin borrows $10 000 at an interest rate of 12% per annum, compounding
quarterly. The loan will be repaid with quarterly payments of $2600.
Let Bn be the balance of the loan after n quarters.
a Model this loan using a recurrence relation of the form: B0 = the principal, Bn+1 = RBn − D
4 Write a recurrence relation to model a loan of $3500 borrowed at 4.8% per annum,
compounding monthly, with payments of $280 per month.
Let Vn be the balance of the loan after n months.
5 Write a recurrence relation to model a loan of $150 000 borrowed at 3.64% per annum,
compounding fortnightly, with payments of $650 per fortnight.
Let Vn be the balance of the loan after n fortnights.
6 Consider a loan of $235 000 borrowed at 3.65% per annum, compounding daily, with
payments of $150 per day.
Let Vn be the balance of the loan after n days.
a Write a recurrence relation to model this loan.
b Find the value of the loan after 3 days. Round your answer to the nearest cent.
10 Sandra invests $750 000 in an annuity paying interest at the rate of 5.4% per annum,
compounding monthly. She receives a payment of $4100 per month until the annuity is exhausted.
Let Vn be the value of the annuity after n payments have been received.
a State the value of V0 and D.
b Determine the value of the growth multiplier, R.
c Use your values of V0 , D and R to model this annuity using a recurrence relation of
the form:
V0 = the principal, Vn+1 = RVn − D
11 Helen invests $40 000 in an annuity paying interest at the rate of 6% per annum,
compounding quarterly. She receives a payment of $10 380 each quarter.
Let Vn be the value of the annuity after n quarters.
a Model this annuity using a recurrence relation of the form: V0 = the principal, Vn+1 = RVn − D
14 Jeff invests $1 000 000 in an annuity and receives a regular monthly payment.
The balance of the annuity, in dollars, after n months, An , can be modelled by a
recurrence relation of the form
A0 = 1 000 000, An+1 = 1.0024An − 4000
a State the initial balance of the annuity.
b State the payment that Jeff receives each month.
c Calculate the annual compound interest rate.
d Calculate the balance of this annuity after two months.
15 Esme invests $100 000 in an annuity and receives a regular monthly payment.
The balance of the annuity, in dollars, after n months, En , can be modelled by the
recurrence relation
E0 = 100 000, En+1 = 1.0055En − 18 400
a What monthly payment does Esme receive?
b Find the annual interest rate for this annuity.
c At some point in the future, the annuity will have a balance that is lower than the
monthly payment amount. What is the balance of the annuity when it first falls
below the monthly payment amount? Round your answer to the nearest cent.
17 Tim invests $3800 in an annuity and receives a regular monthly payment of $480.
The balance of the annuity, in dollars, after n months, Tn, can be modelled by a
recurrence relation of the form
T0 = 3800, Tn+1 = 1.002Tn − 480
The balance of the annuity after three months is closest to
A $3327 B $3328 C $2854 D $2379 E $2380
8C Amortisation tables
Learning intentions
• To be able to apply the amortisation process.
• To be able to construct an amortisation table.
• To be able to analyse an amortisation table for a reducing balance loan.
• To be able to read and interpret an amortisation table for an annuity to find the interest
• To be able to interpret and construct an amortisation table for a compound interest investment with additions to the principal.
Amortisation tables provide additional information for each period, rather than just the
balance after each payment.
We can also represent this information in table format, showing the impact of a payment,
interest and the subsequent reduction of the principal to give a new balance.
When this analysis is repeated, the results can be summarised in an amortisation table. The
amortisation table shows all of the details that explain how the new balance was calculated.
Note that the first line shows the initial value of the loan as the balance when no payments
have been made.
The first three payments for Alyssa’s loan are shown below.
Flora borrows $20 000 at an interest rate of 8% per annum, compounding annually. She
makes annual payments of $2500.
a State the principal of the loan.
b Calculate the initial interest charged on the principal.
c Determine the impact of the first annual payment to find the principal reduction.
d Calculate the new balance.
e Complete the row in the table below with your calculations.
Explanation Solution
a Read the principal from the question or recurrence relation. The principal is $20 000.
b Calculate the interest paid. Interest paid = 8% of $20 000 = $1600
c Principal reduction = payment − interest. Principal reduction = 2500 − 1600 = $900
d New balance = balance owing − principal reduction. New balance = 20 000 − 900 = $19 100
e Place each of the numbers from the calculations into the relevant boxes.
Interest Principal reduction Balance
1600.00 900.00 19 100.00
Flora borrows $20 000 at an interest rate of 8% per annum, compounding annually. She
makes annual payments of $2500.
Construct an amortisation table for Flora’s reducing balance loan for the first three payments.
Repeat the calculations from Example 10, rounding all numbers to the nearest cent.
Once the new balance has been calculated, repeat the process for the first three payments.
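The row-by-row process can be automated. The sketch below is one possible implementation, assuming interest is rounded to the nearest cent each period (halves rounded up) before the principal reduction is found; it uses Python's `decimal` module so that cent values are exact. The names `amortisation_rows` and `to_cents` are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def to_cents(amount: Decimal) -> Decimal:
    """Round to the nearest cent, rounding halves up as in hand calculation."""
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


def amortisation_rows(principal: str, annual_rate: str, periods_per_year: int,
                      payment: str, n_payments: int):
    """Yield (payment number, payment, interest, principal reduction, balance)."""
    rate = Decimal(annual_rate) / (100 * periods_per_year)  # rate per period
    balance = Decimal(principal)
    pay = Decimal(payment)
    zero = Decimal("0.00")
    yield 0, zero, zero, zero, to_cents(balance)
    for n in range(1, n_payments + 1):
        interest = to_cents(balance * rate)   # interest on the unpaid balance
        reduction = pay - interest            # principal reduction
        balance = balance - reduction         # new balance
        yield n, pay, interest, reduction, balance


rows = list(amortisation_rows("20000", "8", 1, "2500.00", 3))
print("Payment number  Payment  Interest  Principal reduction    Balance")
for n, pay, interest, reduction, balance in rows:
    print(f"{n:>14}  {pay:>7}  {interest:>8}  {reduction:>19}  {balance:>9}")
```

Changing the arguments reproduces other tables in this section; for example, `amortisation_rows("10000", "8", 4, "2700.00", 3)` gives the rows of the quarterly business loan in the next example.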
A business borrows $10 000 at a rate of 8% per annum, compounding quarterly. The loan
is to be repaid by making quarterly payments of $2700.00. The amortisation table for this
loan is shown below.
Payment number Payment Interest Principal reduction Balance
0 0.00 0.00 0.00 10 000.00
1 2700.00 (a) 2500.00 7500.00
2 2700.00 150.00 (b) 4950.00
3 2700.00 99.00 2601.00 (c)
Explanation Solution
a Use \(\frac{r}{100 \times p}\), where r = 8 and p = 4 since interest is calculated quarterly. Interest paid = \(\frac{8}{100 \times 4}\) × 10 000 = $200
Interest paid = 2% × unpaid balance
Alternatively, note that $2700 is paid and the principal was reduced by $2500. Or, Interest paid = $2700 − $2500 = $200
b Principal reduction = payment − interest. Principal reduction = 2700.00 − 150.00 = $2550.00
c New balance = balance owing − principal reduction. Balance of the loan after 3 payments = 4950 − 2601 = $2349
Consider the following amortisation table for an annuity after 3 monthly payments.
a State the principal of the annuity and the amount of interest paid in the first month.
b Calculate the monthly interest rate.
c Find the value of A and B.
Explanation Solution
a Read off from the table. Principal: $12 000, Interest: $60
b Calculate \(\frac{\text{Interest}}{\text{Principal}} \times 100\). \(\frac{60}{12\,000} \times 100\) = 0.5% per month
c A is the interest due on $7709.30. A: \(\frac{0.5}{100}\) × 7709.30 = 38.55
B is the principal reduction after the third payment. B: 2200 − 38.55 = 2161.45
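The three parts of this example reduce to a few lines of arithmetic. A minimal sketch using the values from the worked solution ($12 000 principal, $60 first-month interest, $2200 payment, $7709.30 balance); the variable names are illustrative:

```python
from decimal import Decimal, ROUND_HALF_UP


def to_cents(x: Decimal) -> Decimal:
    """Round a dollar amount to the nearest cent."""
    return x.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


principal = Decimal("12000")
first_interest = Decimal("60")
payment = Decimal("2200")

# b: monthly rate as a percentage of the balance
monthly_rate_pct = first_interest / principal * 100

# c: A is the interest on the balance $7709.30, B the principal reduction
a_value = to_cents(monthly_rate_pct / 100 * Decimal("7709.30"))
b_value = payment - a_value
print(f"rate = {monthly_rate_pct}% per month, A = {a_value}, B = {b_value}")
```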
Consider the following amortisation table for a compound interest investment with
monthly additions to the principal. Assume that interest compounds monthly.
Complete two additional lines for the table corresponding to payment 4 and payment 5.
Begin by calculating the monthly interest rate from the table: interest ÷ previous balance × 100 = 0.25%.
Now we can calculate the line associated with payment 4. The interest paid is calculated
on the balance $1359.40:
Interest = 0.25% × 1359.40 = $3.40
The principal increases by the interest and the additional payment:
Principal increase = interest + payment = 3.40 + 50 = $53.40
Thus, the new balance becomes:
New balance = previous balance + principal increase = 1359.40 + 53.40 = $1412.80
Repeating gives the following two lines of the table.
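Repeating the calculation is mechanical, so a short loop can produce both lines. A minimal sketch, assuming the $1359.40 balance, the 0.25% monthly rate and the $50 monthly addition used in the working above, with interest rounded to the nearest cent each month; `next_line` is a hypothetical helper:

```python
from decimal import Decimal, ROUND_HALF_UP


def to_cents(x: Decimal) -> Decimal:
    """Round a dollar amount to the nearest cent."""
    return x.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def next_line(balance: Decimal, rate_pct: Decimal, addition: Decimal):
    """Return (interest, principal increase, new balance) for one month."""
    interest = to_cents(balance * rate_pct / 100)
    increase = interest + addition
    return interest, increase, balance + increase


balance = Decimal("1359.40")      # balance after payment 3
rate_pct = Decimal("0.25")        # monthly rate (%)
addition = Decimal("50.00")       # regular monthly addition
new_lines = []
for payment_number in (4, 5):
    interest, increase, balance = next_line(balance, rate_pct, addition)
    new_lines.append((payment_number, addition, interest, increase, balance))
    print(payment_number, addition, interest, increase, balance)
```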
Exercise 8C
f Complete the next two rows of the amortisation table corresponding to payment 2
and 3 for Walter.
2 Ellie borrows $12 000 at an interest rate of 6% per annum, compounding monthly. She
makes regular repayments of $300 per month.
a State the principal of the loan.
b Calculate \(\frac{r}{100 \times p}\), where r is the annual interest rate and p is the number of
compounding periods each year.
c Calculate the interest charged on the principal in the first month.
d Determine the impact of the first monthly payment to find the principal reduction.
e Calculate the new balance after the first month.
f Complete the row in an amortisation table corresponding to payment 1.
g Complete the next two rows of the amortisation table corresponding to payment 2
and 3 for Ellie.
3 Anna borrows $36 000 at an interest rate of 8% per annum, compounding quarterly.
She makes regular repayments of $1000 per quarter.
a State the principal of the loan.
b Calculate \(\frac{r}{100 \times p}\), where r is the annual interest rate and p is the number of
compounding periods each year.
c Construct an amortisation table corresponding to the first three payments for the loan.
5 The amortisation table for a loan with quarterly payments is shown below.
6 Ada has a reducing balance loan with an interest rate of 3.6% per annum, compounding
monthly. She makes monthly payments of $1800 as shown in the amortisation table below.
Payment number Payment Interest Principal reduction Balance
0 0.00 0.00 0.00 460 000.00
1 1800.00 1380.00 420.00 459 580.00
2 1800.00 1378.74 A 459 158.74
3 1800.00 B
11 Four lines of an amortisation table for an annuity investment are shown below.
The interest rate for this investment remains constant, but the amount of the additional
payment may vary.
Example 15 Finding the final payment for a reducing balance loan or annuity
Consider the following amortisation table for a reducing balance loan of $20 000 with an
interest rate of 8% per annum, compounding annually. Regular payments of $5009.12 are
made for the first four years as shown in the amortisation table.
Example 16 Finding the total payment made and total interest paid
Consider the following amortisation table for a reducing balance loan of $10 000 with
an interest rate of 8% per annum, compounding quarterly. Three quarterly payments of
$2626 are made.
Payment number Payment Interest Principal reduction Balance
0 0.00 0.00 0.00 10 000.00
1 2626.00 200.00 2426.00 7574.00
2 2626.00 151.48 2474.52 5099.48
3 2626.00 101.99 2524.01 2575.47
4 A B C 0.00
a Complete the amortisation table corresponding to payment four such that the final
payment ensures that the balance is 0.
b Calculate the total payment made for the loan.
c Calculate the total interest paid on the loan.
Explanation Solution
a Follow the previous example B: Interest = × $2575.47 = $51.51
100 × 4
by first finding the interest
C: Principal reduction = $2575.47
applied to the loan (B), the
principal reduction (C) and A: Final payment = 2575.47 + 51.51
the final adjusted payment = 2626.98
(A). Payment Interest Principal reduction
2626.98 51.51 2575.47
b Add up all of the payments Payments = 2626 × 3 + 2626.98 = $10 504.98
made over the four quarters.
c Subtract the principal from the Interest = 10 504.98 − 10 000 = $504.98
total payments. OR
Alternatively, we can add up Interest = 200 + 151.48 + 101.99 + 51.51 = $504.98
the interest column.
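The final-payment adjustment and both ways of finding the total interest can be checked in a few lines. A minimal sketch using the figures from the table; rounding interest to the nearest cent is an assumption that matches the table's entries:

```python
from decimal import Decimal, ROUND_HALF_UP


def to_cents(x: Decimal) -> Decimal:
    """Round a dollar amount to the nearest cent."""
    return x.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


quarterly_rate = Decimal(8) / (100 * 4)       # 2% per quarter
regular_payment = Decimal("2626.00")
balance_before_final = Decimal("2575.47")     # balance after payment 3

interest_b = to_cents(balance_before_final * quarterly_rate)
reduction_c = balance_before_final            # the final payment clears the balance
final_payment_a = reduction_c + interest_b

total_payments = 3 * regular_payment + final_payment_a
total_interest = total_payments - Decimal("10000")
interest_column = sum(
    [Decimal("200.00"), Decimal("151.48"), Decimal("101.99"), interest_b]
)
print(final_payment_a, total_payments, total_interest, interest_column)
```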
Plot a graph of the interest and principal reduction on the same graph.
For this loan, we can plot a graph for each payment period to show that the amount of interest paid each payment (blue dots) declines while the amount of principal paid increases (red dots).
[Graph: Amount ($) against Payment (n); interest shown as blue dots, principal reduction as red dots.]
Note that for a loan, the graph above shows how the amount of interest paid for each
payment (blue dots) decreases with the payment number, while the amount of principal paid
off increases (red dots). This is because the balance is decreasing and so the interest is being
calculated on a lower balance each period.
For a compound interest investment, we would expect to see the interest earned each period
increase because the balance increases each time a payment is made.
Exercise 8D
4 Charlie invests $4500 into an annuity with an interest rate of 5.4% per annum,
compounding monthly. He receives monthly payments of $760 for five months.
Calculate the value of the sixth payment that Charlie receives to ensure the annuity is
completely exhausted.
Finding the total payment made/received and the total interest paid/earned
Example 16 5 Consider the following amortisation table for an investment of $11 000 with an interest
rate of 4.8% per annum, compounding quarterly. Regular quarterly payments of $1200
are added each quarter as shown below for the first four quarters.
Payment number Payment Interest Principal increase Balance
0 0.00 0.00 0.00 11 000.00
1 1200.00 132.00 1332.00 12 332.00
2 1200.00 147.98 1347.98 13 679.98
3 1200.00 164.16 1364.16 15 044.14
4 1200.00 A B 16 424.67
a Find the value of A.
b Find the value of B.
c Find the total interest earned on the investment in the first four quarters.
6 Consider the following amortisation table for a reducing balance loan of $4000 with an
interest rate of 6% per annum, compounding monthly. A monthly payment of $344.14
is made for the first 11 months.
Payment number Payment Interest Principal reduction Balance
9 344.14 6.81 337.33 1023.71
10 344.14 5.12 339.02 684.69
11 344.14 3.42 340.72 343.97
12 A B C 0.00
7 Consider the final two lines of the amortisation table for an annuity of $30 000 with
an interest rate of 3.6% per annum, compounding quarterly. A quarterly payment of
$3903.50 is made for the first 7 quarters.
Payment number Payment Interest Principal reduction Balance
7 3903.50 69.32 3834.18 3868.37
8 A B C 0.00
8 Tania invests $12 000 in an annuity with an interest rate of 6.6% per annum,
compounding monthly. She receives regular monthly payments of $3040 per month
for three months followed by a final payment in the fourth month. Calculate the total
payment and the interest for the annuity.
Plot a graph of the interest and principal reduction on the same graph for the first six
10 The amortisation table below charts the growth of a compound interest investment with
regular additions made to the principal each month.
Plot a graph of the interest and principal increase on the same graph.
12 Ned borrows $15 000 at an interest rate of 6% per annum, compounding monthly. He
makes regular monthly payments of $3796.99 for three months followed by a final
payment in the fourth month.
The total interest that Ned pays on the loan is closest to
A $56 B $75 C $188 D $300 E $900
While the techniques used so far in this chapter are useful for performing a small number
of calculations, they are tedious over a long period. For example, a typical home loan may
involve monthly payments over 30 years. CAS calculators have a Finance Solver that allows
these lengthy calculations to be performed with ease.
1 Press ctrl + N
1 Tap Financial from the main
menu screen.
2 Select the compound interest
solver by tapping on Compound
Interest from the solver screen.
4 Tap Format and confirm that the setting for ‘Odd Period’ is
set to ‘off’ and ‘Payment Date’ is set to ‘End of period’.
5 When using Finance Solver to solve loan problems, there
will be one unknown quantity. To find its value, tap its
entry field and tap Solve.
In the example shown, tapping Solve will solve for Pmt.
Recall that a compound interest investment with regular additions to the principal is an
investment where the balance increases through both the interest earned and the additional payments.
Lars invests $500 000 at 5.5% per annum, compounding monthly. He makes a regular
deposit of $500 per month into the account. What is the value of his investment after
5 years? Round your answer to the nearest cent.
Explanation Solution
1 Open Finance Solver and enter the information below, as shown opposite.
N: 60 (5 years)
I%: 5.5
PV: −500 000 (you give this to the bank)
PMT: −500 (you give this to the bank)
FV: to be determined
Pp/Y: 12 payments per year
Cp/Y: 12 compounding periods per year
2 Solve for FV and write your answer, rounding to the nearest cent. Note that this is positive as the bank will give this money to you.
After 5 years, Lars’ investment will be worth $692 292.30.
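Finance Solver hides the calculation. For end-of-period payments with the same number of payments and compounding periods per year (Pp/Y = Cp/Y), we assume it solves the standard time-value-of-money relation \(PV(1+i)^N + PMT\frac{(1+i)^N - 1}{i} + FV = 0\), where i is the interest rate per period. The sketch below uses that relation and the same sign convention (money you give to the bank is negative); `future_value` is an illustrative name, not a calculator command.

```python
def future_value(n: int, annual_rate_pct: float, pv: float, pmt: float,
                 periods_per_year: int = 12) -> float:
    """FV from PV*(1+i)**N + PMT*((1+i)**N - 1)/i + FV = 0.

    Assumes payments at the end of each period, Pp/Y = Cp/Y and a non-zero rate.
    """
    i = annual_rate_pct / (100 * periods_per_year)  # rate per period
    growth = (1 + i) ** n
    return -(pv * growth + pmt * (growth - 1) / i)


fv = future_value(60, 5.5, -500_000, -500)
print(f"FV = ${fv:,.2f}")
```

Stepping through Vn+1 = RVn + 500 sixty times gives the same value, which is a useful check on the sign convention.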
Andrew borrows $20 000 at an interest rate of 7.25% per annum, compounding monthly.
This loan will be repaid over 4 years with regular payments of $481.25 each month for 47
months followed by a final payment to fully repay the loan.
a How much does Andrew owe after 3 years? Round your answer to the nearest cent.
b What is the final payment amount that Andrew must make to fully repay the loan
within 4 years (48 months)? Round your answer to the nearest cent.
Explanation Solution
a 1 Open Finance Solver and enter the following:
3 Write your answer, correct to the nearest cent. Andrew owes $5554.36.
b 1 Enter the information below.
N: 48 (number of months in 4 years)
I%: 7.25 (annual interest rate)
PV: 20000
Pmt or PMT: −481.25 (the payment amount is negative)
Pp/Y: 12 (monthly payments)
Cp/Y: 12 (interest compounds monthly)
2 Solve for the unknown future value (FV). On the:
TI-Nspire CAS: Move the cursor to the FV entry box and press · to solve.
ClassPad: Tap on the FV entry box and tap Solve.
The amount 0.1079… (11 cents) now appears in the FV entry box.
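The same relation, written as a balance owing, explains both screens. A minimal sketch assuming end-of-period monthly payments; `balance_owing` is a hypothetical helper. A negative result means the payments have overshot the loan, which is how the calculator's positive FV of about 0.1079 should be read when all 48 payments are $481.25.

```python
def balance_owing(principal: float, annual_rate_pct: float, payment: float,
                  n: int, periods_per_year: int = 12) -> float:
    """Loan balance after n end-of-period payments (negative means overpaid)."""
    i = annual_rate_pct / (100 * periods_per_year)
    growth = (1 + i) ** n
    return principal * growth - payment * (growth - 1) / i


after_36 = balance_owing(20_000, 7.25, 481.25, 36)
after_48 = balance_owing(20_000, 7.25, 481.25, 48)
print(f"after 3 years: ${after_36:,.2f}")
print(f"after 48 regular payments: ${after_48:,.4f}")
```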
Charlie invests $300 000 into an annuity, paying 5% interest per annum, compounding
monthly. Over the next ten years, Charlie receives a payment of $3182 per month from
the annuity for each month except the final month.
a Find the value of the annuity after five years. Round your answer to the nearest cent.
b Find the final payment from the annuity. Round your answer to the nearest cent.
Explanation Solution
a 1 Open Finance Solver and enter the following:
N: 60 (number of monthly payments in 5 years)
I%: 5.00 (annual interest rate)
PV: −300000 (negative to indicate that this is
money paid by Charlie to the bank)
Pmt or PMT: 3182 (positive to indicate that the
bank is paying back to Charlie)
Pp/Y: 12 (monthly payments)
Cp/Y: 12 (interest compounds monthly)
2 Solve for the unknown future value (FV). On the:
TI-Nspire CAS: Move the cursor to the FV entry box and press · to solve.
ClassPad: Tap on the FV entry box and tap Solve.
The FV entry box now shows 168 612.… (the value of the annuity after five years).
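For an annuity, PMT is positive because the payment is received. A minimal sketch, assuming the same end-of-period relation as before (our assumption about what the Finance Solver computes, not a documented calculator formula); `future_value` is an illustrative name:

```python
def future_value(n, annual_rate_pct, pv, pmt, periods_per_year=12):
    """Solve PV*(1+i)**N + PMT*((1+i)**N - 1)/i + FV = 0 for FV (end-of-period payments)."""
    i = annual_rate_pct / (100 * periods_per_year)
    growth = (1 + i) ** n
    return -(pv * growth + pmt * (growth - 1) / i)


# PV is negative (Charlie pays the bank); PMT is positive (Charlie receives it).
value_after_5_years = future_value(60, 5, -300_000, 3182)
print(f"${value_after_5_years:,.2f}")
```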
Exercise 8E
Determining the value of an investment with regular additions made to the principal
using a financial solver
Example 18 1 Wanda invested $20 000 at 7.1% per annum, compounding annually. She makes a
regular deposit of $6000 per year into the account.
a State whether the PV is positive or negative.
b State whether the PMT is positive or negative.
c Find the value of the investment after 10 years. Round your answer to the nearest cent.
d Find the value of the investment after 30 years. Round your answer to the nearest cent.
2 Ingrid invested $20 000 at 4.9% per annum, compounding monthly. She makes a
regular deposit of $380 per month into the account.
a Find the value of the investment after 5 months. Round your answer to the nearest cent.
b Find the value of the investment after 3 years. Round your answer to the nearest cent.
4 Suzanne borrows $25 000 at an interest rate of 7.8% per annum, compounding
monthly. She repays the loan with regular payments of $1200 per month and then a
final payment to bring the balance to zero.
a How much does Suzanne owe after 3 months? Round your answer to the nearest cent.
b How much does Suzanne owe after 1 year? Round your answer to the nearest cent.
5 Rachel borrows $240 000 at an interest rate of 8.3% per annum, compounding
quarterly. She makes 119 regular payments of $5442.90 each quarter followed by a
final payment. Rachel repays the loan over thirty years.
a How much does Rachel owe after 6 years? Round your answer to the nearest cent.
b What is the final payment that Rachel must make to fully repay the loan in 30 years?
Round your answer to the nearest cent.
6 David borrows $50 000 for a new car at an interest rate of 4.6% per annum, compounding weekly.
He repays the loan over 3 years with regular payments of $343.27 per week except for
the final payment.
a How much does David owe after 1 year? Round your answer to the nearest cent.
b What is the final payment that David must make to fully repay the loan in three
years? Round your answer to the nearest cent.
8 Eliza invests $20 000 into an annuity, paying 7.2% per annum, compounding monthly.
The annuity regularly pays $1732.37 per month, for eleven months followed by a final
payment to exhaust the annuity.
a Find the value of the annuity after three months. Round your answer to the nearest cent.
b What is the final payment made to Eliza so that the value of the annuity is zero after
12 months? Round your answer to the nearest cent.
9 Ezra is going backpacking around Europe and has invested $15 000 into an annuity for
this trip. The annuity pays 6.8% per annum, compounding weekly for one year. The
annuity pays $298.57 per week for each week except for the final payment.
a Find the value of the annuity after twenty-six weeks. Round your answer to the
nearest cent.
b What is the final payment made to Ezra so that the value of the annuity is zero after
1 year? Round your answer to the nearest cent.
12 Benjamin invests $75 000 in an annuity, paying 7.3% per annum, compounding monthly.
Benjamin receives a payment of $2326 each month from the annuity.
The value of the annuity, correct to the nearest cent after 2 years is
A $26842.05 B $26842.06 C $69356.55 D $69356.56 E $85153.41
ISBN 978-1-009-11041-9 © Peter Jones et al 2023 Cambridge University Press
Photocopying is restricted under law and this material must not be transferred to another party.
432 Chapter 8 Reducing balance loans, annuities and investments
As well as finding the future value or balance of a loan, annuity or investment, we can
also use the financial solver to find the interest rate, regular payment or length of the loan,
annuity or investment.
Example 21 Finding the interest rate for an investment with additional payments
Mingjia puts $20 000 into a compound interest investment where interest compounds
monthly. She adds $50 per month. She wants her investment to reach $40 000 in 10 years.
Find the annual interest rate required for this to occur. Round your answer to two decimal places.
Explanation Solution
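1 Open finance solver and enter the following:
N: 120 (10 years)
PV: −20000
PMT: −50 (monthly addition)
FV: 40000 (target value)
Pp/Y: 12 (monthly payments)
Cp/Y: 12 (interest compounds monthly)
Solve for I%.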
When using a financial solver, rounding is very important so it is always a good idea to
check your answer in case you need to round up or down accordingly.
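A calculator finds I% numerically because the rate cannot, in general, be isolated algebraically. As an illustration only (we do not know which numerical method the calculators use), bisection on the same end-of-period relation gives the rate for Mingjia's investment; the helper names are hypothetical.

```python
def future_value(n, annual_rate_pct, pv, pmt, periods_per_year=12):
    """Solve PV*(1+i)**N + PMT*((1+i)**N - 1)/i + FV = 0 for FV."""
    i = annual_rate_pct / (100 * periods_per_year)
    growth = (1 + i) ** n
    return -(pv * growth + pmt * (growth - 1) / i)


def solve_rate(n, pv, pmt, target, low=0.01, high=20.0, tol=1e-10):
    """Bisection for the annual rate (%) at which FV equals target.

    Assumes FV increases with the rate, which holds for a savings plan
    with negative PV and PMT, and that the root lies between low and high.
    """
    while high - low > tol:
        mid = (low + high) / 2
        if future_value(n, mid, pv, pmt) < target:
            low = mid    # rate too small
        else:
            high = mid   # rate large enough
    return (low + high) / 2


rate = solve_rate(120, -20_000, -50, 40_000)
print(f"I% = {rate:.2f}")
```

Following the advice above, substitute the rounded rate back into `future_value` to confirm that it actually reaches $40 000.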
Example 22 Finding the regular monthly payment and time taken for an
investment with additions to the principal
Winston puts $20 000 into an investment, paying 5.1% interest per annum, compounding monthly.
a If Winston wants his investment to be worth at least $40 000 in 5 years, what is the
minimum he will need to add each month?
b If Winston invests $1000 each month immediately after interest is calculated, what is
the minimum number of months required for his investment to at least triple in value?
Explanation Solution
a 1 Open finance solver and enter the following:
N: 60 (5 years)
I%: 5.1 (annual interest rate)
PV: −20000
FV: 40000 (the target value after 5 years)
Pp/Y: 12 (monthly payments)
Cp/Y: 12 (interest compounds monthly)
2 Solve for Pmt or PMT.
Note: The sign of Pmt or PMT is negative, because it
is money that Winston invests.
3 Write your answer, noting that $208.34 per month is insufficient, as it gives a balance of only $39 999.89…, so round up. Winston will add $208.35 each month to the investment.
b 1 Change the payment Pmt or PMT to −1000 and the FV to 60 000 and solve for N.
2 Write your answer, noting that 34 months gives an FV of only $59 598.147…, so we need to round up. Winston’s investment will take 35 months to triple in value.
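Solving for Pmt and for N can be mimicked in the same way. A minimal sketch, assuming the end-of-period relation used in the earlier sketches; rounding the payment up to the next cent and searching month by month mirror the "round up" reasoning in the solution. The helper names are hypothetical.

```python
import math


def future_value(n, annual_rate_pct, pv, pmt, periods_per_year=12):
    """Solve PV*(1+i)**N + PMT*((1+i)**N - 1)/i + FV = 0 for FV."""
    i = annual_rate_pct / (100 * periods_per_year)
    growth = (1 + i) ** n
    return -(pv * growth + pmt * (growth - 1) / i)


def payment_for_target(n, annual_rate_pct, pv, target, periods_per_year=12):
    """Solve the same relation for PMT (negative = money added by the investor)."""
    i = annual_rate_pct / (100 * periods_per_year)
    growth = (1 + i) ** n
    return -(target + pv * growth) * i / (growth - 1)


exact = payment_for_target(60, 5.1, -20_000, 40_000)
monthly_addition = math.ceil(-exact * 100) / 100   # round the size UP to the next cent

months = 0
while future_value(months, 5.1, -20_000, -1000) < 60_000:
    months += 1
print(monthly_addition, months)
```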
Example 23 Determining the payment amount, total repayment and total amount
of interest paid for a reducing balance loan
Sipho borrows $10 000 to be repaid in 59 equal monthly payments followed by a 60th
payment of less than one dollar more than the regular payment. Interest is charged at the
rate of 8% per annum, compounding monthly.
a Find the regular monthly payment amount. Round your answer to the nearest cent.
b Find the final payment. Round your answer to the nearest cent.
c Find the total of the repayments on the loan. Round your answer to the nearest cent.
d Find the total amount of interest paid. Round your answer to the nearest cent.
Explanation Solution
a 1 Open Finance Solver and enter the following:
N: 60 (number of monthly payments in 5
years, assuming 60 equal payments)
I%: 8 (annual interest rate)
PV: 10000
FV: 0 (the balance will be zero when the
loan is repaid)
Pp/Y: 12 (monthly payments)
Cp/Y: 12 (interest compounds monthly)
2 Solve for the unknown payment (Pmt or PMT). On the:
TI-Nspire: Move the cursor to the Pmt
entry box and press · to solve.
ClassPad: Tap on the PMT entry box and
tap Solve.
The amount −202.7639… now appears in the
Pmt or PMT entry box.
Note: The sign of the payment is negative to indicate
that this is money Sipho is giving back to the lender.
3 Write your answer. Sipho repays $202.76 as the
regular payment.
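For a loan, the regular payment can also be found directly by rearranging the balance formula with a final balance of zero. A minimal sketch under the same end-of-period assumption; `loan_payment` is a hypothetical helper:

```python
def loan_payment(principal, annual_rate_pct, n, periods_per_year=12):
    """Regular payment that reduces the balance to zero after n payments.

    Rearranges principal*(1+i)**n - payment*((1+i)**n - 1)/i = 0.
    """
    i = annual_rate_pct / (100 * periods_per_year)
    growth = (1 + i) ** n
    return principal * i * growth / (growth - 1)


payment = loan_payment(10_000, 8, 60)
print(f"{payment:.4f}")
```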
Example 24 Finding the interest rate, time taken and regular payment for an annuity
Joe invests $200 000 into an annuity, with interest compounding monthly.
a What interest rate would allow Joe to withdraw $2500 each month for 10 years?
Round your answer to one decimal place.
b Assume the interest rate is 5% per annum and that Joe receives a regular monthly
payment of $3000. For how many months will Joe receive a regular payment?
c Assume that the interest rate is 5% per annum and that Joe wishes to be paid monthly
payments for 10 years. How much will he regularly receive each month?
d If Joe receives the regular monthly payment found in part c for 119 months, what will
his final payment be? Round your answer to the nearest cent.
Explanation Solution
a 1 Open finance solver and enter the following:
N: 120 (10 years)
PV: −200000
PMT: 2500
FV: 0 (exhausted after 10 years)
Pp/Y: 12 (monthly payments)
Cp/Y: 12 (interest compounds monthly)
Solve for I%.
2 Write your answer, rounding down as we are Joe will receive a regular
only counting regular payments. payment for 78 months.
c 1 Open the finance solver on your calculator
and enter the information below, as shown.
N: 120 (10 years)
I%: 5 (annual interest rate)
PV: −200000
FV: 0 (the annuity will be exhausted after
10 years)
Pp/Y: 12 (monthly payments)
Cp/Y: 12 (interest compounds monthly)
2 Solve for Pmt or PMT.
Note: The sign of Pmt or PMT is positive, because it
is money received.
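Parts b and c can be reproduced with closed-form versions of the same relation. A minimal sketch, assuming end-of-period monthly payments; the helpers are hypothetical and are not calculator functions. Rounding the number of payments down matches the reasoning in part b.

```python
import math


def annuity_payment(pv, annual_rate_pct, n, periods_per_year=12):
    """Regular payment that exhausts an annuity of value pv after n payments."""
    i = annual_rate_pct / (100 * periods_per_year)
    return pv * i / (1 - (1 + i) ** -n)


def regular_payment_count(pv, annual_rate_pct, payment, periods_per_year=12):
    """Whole number of regular payments before the balance can no longer cover one.

    Solves pv*(1+i)**n - payment*((1+i)**n - 1)/i = 0 for n and rounds down.
    Assumes payment > pv*i, otherwise the annuity would never run out.
    """
    i = annual_rate_pct / (100 * periods_per_year)
    n_exact = -math.log(1 - pv * i / payment) / math.log(1 + i)
    return math.floor(n_exact)


print(regular_payment_count(200_000, 5, 3000))       # part b
print(round(annuity_payment(200_000, 5, 120), 2))    # part c
```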
Exercise 8F
Example 22 2 Jemima puts $30 000 into an investment that compounds monthly.
a What annual interest rate would allow Jemima’s investment to double after 5 years if
she invests an additional $400 each month for 5 years? Round your answer to one
decimal place.
Assume the investment pays 3.2% per annum, compounding monthly.
b i If she wants her investment to be worth at least $40 000 in 1 year, what is the
minimum she will need to add to the investment each month? Round your
answer to the nearest cent.
Note: You will need to round your answer up so that it reaches $40 000.
ii If she invests an additional monthly payment of at least $1000, what is the
minimum number of months that it will take for the investment to first reach
$100 000?
3 Kelven puts $7500 into an investment, paying 4.7% per annum, compounding monthly.
He makes regular additional contributions to the investment each month. After one
year, Kelven’s investment is worth $13 991.15 to the nearest cent.
a Find the amount of Kelven’s regular monthly payment. Round your answer correct
to the nearest cent.
b i Find the amount that Kelven invested in the first year through monthly
ii Find the increase in the value of the investment in the first year.
iii Hence, find how much interest was earned in the first year. Round your answer
correct to the nearest cent.
c Given Kelven’s monthly payment found in a, how many months will it take for
Kelven’s investment to be worth at least $20 000?
5 A building society offers $240 000 loans at an interest rate of 10.25% per annum, compounding
monthly for a 30 year period.
a If payments are $2200 per month, calculate the amount still owing on the loan after
12 years. Round your answer to the nearest cent.
b If the loan has a regular monthly payment of $2150.64 for the first 359 payments, find:
i the final payment, rounding your answer to the nearest cent.
ii the total amount repaid, rounding your answer to the nearest cent.
iii the total amount of interest paid, rounding your answer to the nearest cent.
6 Rahul borrows $17 000 at an interest rate of 6.8% per annum, compounding monthly.
Rahul wishes to pay off the reducing balance loan in 30 months by making equal
payments for 29 months followed by a final payment that is as close to the regular
payment as possible.
a Find the regular monthly payment. Round your answer to the nearest cent.
b Find the final payment of the loan. Round your answer to the nearest cent.
c Find the total of the repayments of the loan.
d Find the total amount of interest that Rahul has paid.
7 Cale borrows $140 000 at an interest rate of 8.6% per annum, compounding quarterly.
Cale makes regular equal quarterly payments except for the final payment which is as
close to the regular payment as possible.
a If Cale pays off the loan in 10 years with 39 regular quarterly payments of
i find the final payment. Round your answer to the nearest cent.
ii find the total amount that Cale repays.
b Rounding each of your answers to the nearest cent, if Cale pays off the loan in 15
years, find
i the regular quarterly payment.
ii the final payment.
iii the total cost of repaying the loan.
8 Lorenzo borrows $250 000 at 5.2% per annum, compounding fortnightly. Lorenzo
makes 649 equal fortnightly payments followed by a final payment which is as close to
the regular payment as possible. Find the total cost of repaying the loan. Round your
answer to the nearest cent.
9 Joan takes out a loan of $50 000 with an interest rate of 4.9% per annum, compounding
monthly. She makes regular monthly payments for 23 months of $2191.33 followed by
a single final payment. How much interest does Joan pay in total over the duration of
the two year loan?
11 Sophia invests $300 000 into an annuity, paying 4.3% interest per annum, compounding
quarterly. She wishes to receive a payment of at least $5000 every quarter.
For how many quarters will Sophia receive at least $5000?
12 Kai invests $500 000 in an annuity. The annuity earns interest at the rate of 4.7% per
annum, compounding monthly. The balance of Kai’s annuity at the end of the first year
of the investment is $474 965.28.
a What monthly payment did Kai receive? Round your answer to the nearest cent.
b How much interest would Kai’s annuity earn in the first year? Round your answer to
the nearest cent.
14 Lachlan borrows $480 000 to buy an apartment using a reducing balance loan that
compounds monthly.
Lachlan makes regular monthly payments of $3075.72 followed by a final payment of
$3075.53. If the loan is paid out fully in 20 years, the annual interest rate is closest to
A 0.3875% B 0.0465% C 4.65% D 11.55% E 14.67%
15 Audrey invests $85 000 in an annuity, paying 6.3% per annum, compounding monthly.
Audrey receives a regular monthly payment from the annuity.
If the value of the annuity after one year is $71 983.41, the amount of interest earned in
the first year is closest to
A $1500 B $4983 C $5355 D $13 017 E $31 016
Sometimes the conditions of a reducing balance loan can change, requiring the regular
repayment to increase or decrease for the loan to be repaid in full. Similarly, a change in
the interest rate can also alter the payment received from an annuity or the balance of a
compound interest investment. A financial solver on the CAS calculator can help
to solve for the regular payment or the new balance after a change has occurred.
Derek invests $50 000 into a compound interest investment paying 6.1% per annum,
compounding annually. Derek invests an additional $8000 per year immediately after
interest is calculated.
After five years, Derek increases his additional investment to $10 000 per year.
Calculate the value of Derek’s investment after twelve years (in total).
Explanation Solution
1 Open finance solver and enter the following:
N: 5 (5 years before the change)
I%: 6.1 (annual interest rate)
PV: -50000 (value of initial investment)
PMT: -8000 (additional amount added)
Pp/Y: 1 (annual payment)
Cp/Y: 1 (interest compounds annually)
Solve for FV.
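The two-stage calculation in this example can also be checked by stepping through the recurrence relation Vn+1 = RVn + D one year at a time. The Python sketch below is an illustration, not a calculator procedure. It assumes, as the example states, that each addition is made immediately after interest is credited, and the function name `grow` is hypothetical.

```python
def grow(balance, rate_percent, periods_per_year, addition, n_periods):
    """Apply V_{n+1} = R * V_n + addition for n_periods compounding periods."""
    growth = 1 + rate_percent / (100 * periods_per_year)  # growth multiplier R
    for _ in range(n_periods):
        balance = growth * balance + addition
    return balance


# Stage 1: five years at 6.1% p.a. with $8000 added each year
after_5 = grow(50_000, 6.1, 1, 8_000, 5)

# Stage 2: the balance carries over; seven more years with $10 000 added each year
after_12 = grow(after_5, 6.1, 1, 10_000, 7)

print(f"Balance after 5 years:  ${after_5:,.2f}")
print(f"Balance after 12 years: ${after_12:,.2f}")
```

The important design point is that stage 2 starts from the stage 1 balance, which corresponds to using the stage 1 future value as the new present value.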
Adrian borrows $150 000 for 25 years at an interest rate of 6.8% per annum, compounding
monthly.
For the first three years, Adrian repays $1041.11 each month.
After 3 years, the interest rate rises to 7.2% per annum. Adrian still wishes to pay off
the loan in 25 years so makes 263 monthly payments of $1076.18 followed by a final payment.
Calculate the final payment to ensure the loan is fully repaid at the end of 25 years.
Round your answer to the nearest cent.
Explanation Solution
1 Open the finance solver on your calculator
and enter the information below:
N: 36 (number of monthly payments in
3 years)
I%: 6.8 (annual interest rate)
PV: 150000 (initial value of loan)
Pmt: −1041.11 (monthly repayments)
Pp/Y: 12 (monthly payments)
Cp/Y: 12 (interest compounds monthly)
Note: You can enter N as 3 × 12 (3 years of monthly
payments). The finance solver will calculate this as
36 for you.
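The same two-stage idea applies to Adrian's loan. A minimal Python sketch, assuming the usual convention that the final (300th) payment must cover the remaining balance plus one more month of interest, so it equals R multiplied by the balance after 299 payments. The helper `balance_after` is a hypothetical name used only here.

```python
def balance_after(balance, rate_percent, periods_per_year, payment, n_payments):
    """Apply V_{n+1} = R * V_n - payment for n_payments periods."""
    growth = 1 + rate_percent / (100 * periods_per_year)
    for _ in range(n_payments):
        balance = growth * balance - payment
    return balance


# Stage 1: 36 monthly payments of $1041.11 at 6.8% p.a.
after_36 = balance_after(150_000, 6.8, 12, 1041.11, 36)

# Stage 2: 263 monthly payments of $1076.18 at the new rate of 7.2% p.a.
after_299 = balance_after(after_36, 7.2, 12, 1076.18, 263)

# The final payment clears the balance plus one more month of interest.
final_payment = round(after_299 * (1 + 7.2 / (100 * 12)), 2)

print(f"Balance after 3 years: ${after_36:,.2f}")
print(f"Final payment:         ${final_payment:,.2f}")
```

Counting the payments (36 + 263 + 1 = 300 months) confirms the loan runs for exactly 25 years.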
A similar analysis can be used for both annuities and investments with additions to the principal.
Exercise 8G
2 Peta invests $20 000 into a compound interest investment paying 4.8% per annum,
compounding monthly.
Peta invests an additional $200 per month immediately after interest is calculated.
After ten years, Peta increases her additional investment to $500 per month.
Calculate the value of Peta’s investment after twenty years (in total). Round your
answer to the nearest cent.
3 Jarrod opens an account with an initial balance of $0 that pays interest at a rate of
6% per annum, compounding monthly.
He makes monthly deposits of $500 to the account for 10 years.
After 10 years of making deposits, Jarrod withdraws the balance and places it in an
annuity, also with an annual interest rate of 6%, compounding monthly. He withdraws
$500 each month from the account.
a How much does he invest in the annuity after the initial 10 years?
b How much will remain in the annuity after 10 years? Round your answer to the
nearest cent.
5 A couple negotiates a 25-year mortgage of $500 000 at a fixed rate of 7.5% per annum
compounding monthly for the first seven years.
The monthly repayment amount of $3694.96 is paid each month for seven years.
After seven years, the interest rate rises to 8.5% per annum. The couple now pay
$3959.44 each month.
Calculate the value of the loan after a further seven years at the higher interest rate.
Round your answer to the nearest cent.
6 Zian borrows $750 000 for a new home at an interest rate of 8.5% per annum,
compounding monthly.
For the first five years, he only pays the interest so the value of the loan remains at
$750 000.
a Calculate Zian’s monthly repayments.
After five years the interest rate increases to 9.4%. Zian must now pay more each
month in order to pay the loan in full within the original 30 years. He does this by
making 299 regular monthly repayments followed by a final payment which is as close
to the regular payment as possible.
b Calculate the new regular monthly payment amount Zian must make.
c Find the final payment.
d Calculate the total amount that Zian pays over 30 years.
e How much interest will Zian pay over the lifetime of the loan?
8 Ethan invests $125 000 into an annuity from which he receives a regular monthly
payment of $850. The interest rate for the annuity is 5.4% per annum, compounding monthly.
a Let Vn be the balance of the annuity after n monthly payments. Write a recurrence
relation written in terms of V0 , Vn+1 and Vn to model the value of this annuity from
month to month.
b After two years, the interest rate for this annuity will fall to 4.1%.
So that Ethan will continue to receive a monthly payment of $850 for the following
18 years, he will add an extra one-off amount to the annuity at this time.
Determine the minimum value of the one-off addition. Give your answer to the
nearest dollar.
9 Sameep deposits $150 000 into a savings account earning 6% per annum, compounding
monthly, for 10 years. He makes no withdrawals or deposits during that time.
a Let Sn be the balance of Sameep’s investment after n months. Write a recurrence
relation to model this investment.
b What is the value of this account after 10 years? Round your answer to the nearest cent.
After 10 years Sameep withdraws the money and invests the full amount into an
annuity. He will require this investment to provide monthly withdrawals of $2600.
c What is the minimum annual interest rate required if Sameep’s investment is to be
exhausted after 10 additional years? Round your answer to two decimal places.
11 When Jessica starts working, she sets up an investment account with an initial balance
of $1000. Each month she deposits $200 into the account.
The account has an annual interest rate of 4.9% compounding monthly.
a Find the balance of the investment after one year. Round your answer to the nearest cent.
b How much interest has the investment earned in the first year? Round your answer
to the nearest cent.
After three years, the interest rate increases to 6% per annum and Jessica will increase
her monthly deposit to $350 per month.
c Find the balance of Jessica’s investment account after two years at the higher
interest rate. Round your answer to the nearest cent.
d How many payments of $350 will Jessica need to make until her investment first
exceeds $35 000?
13 Thirty years ago, Irene invested a sum of money in an account earning interest at the
rate of 3.1% per annum, compounding monthly.
After 10 years, the interest rate changed.
For the next twenty years, the account earned interest at the rate of 2.7% per annum,
compounding monthly.
The balance of her account today is $876 485.10.
The sum of money that Irene originally invested is closest to
A $360 300 B $375 000 C $390 000 D $511 100 E $670 000
14 Calvin plans to retire from his work in 12 years’ time and hopes to have $800 000 in an
annuity investment at that time.
The present value of this annuity investment is $227 727.96, where the interest rate is
3.6% per annum, compounding monthly.
To make this investment grow faster, Calvin adds $2500 at the end of each month.
Two years from now, Calvin expects the interest rate to fall to 3.3% per annum,
compounding monthly, and to remain at this level until he retires.
When the interest rate changes, Calvin must change his monthly payment if he wishes
to make his retirement goal.
The value of his new monthly payment will be closest to
A $1950 B $2500 C $2560 D $2600 E $2630
8H Interest-only loans
Learning intentions
• To be able to find the regular payment amount for an interest-only loan with and
without a financial solver.
• To be able to find the amount borrowed for an interest-only loan.
• To be able to find the interest rate for an interest-only loan.
In an interest-only loan, the borrower repays only the interest that is charged. As a result,
the balance of the loan remains the same for the duration of the loan. To understand how
this happens, consider a loan of $1000 with an interest rate of 5% per annum, compounding
yearly. The interest that is charged after 1 year will be 5% of $1000, or $50. If the borrower
only repays $50, the value of the loan will still be $1000.
The recurrence relation V0 = 1000, Vn+1 = 1.05Vn − D can be used to model this loan.
The table below shows the balance of the loan over a 4-year period for three different
payment amounts: D = 40, D = 50 and D = 60.
        D = 40                  D = 50                  D = 60
Rule    V0 = 1000,              V0 = 1000,              V0 = 1000,
        Vn+1 = 1.05Vn − 40      Vn+1 = 1.05Vn − 50      Vn+1 = 1.05Vn − 60
V0      1000                    1000                    1000
V1      1010                    1000                    990
V2      1020.50                 1000                    979.50
V3      1031.525                1000                    968.475
V4      1043.101...             1000                    956.898...
        The amount owed keeps   The amount owed stays   The amount owed keeps
        increasing.             constant.               decreasing.
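The table can be regenerated by applying the recurrence relation Vn+1 = 1.05Vn − D four times for each payment amount. A short Python sketch, where the function name `loan_balances` is hypothetical:

```python
def loan_balances(v0, growth, payment, years):
    """Return [V0, V1, ..., V_years] for V_{n+1} = growth * V_n - payment."""
    values = [v0]
    for _ in range(years):
        values.append(growth * values[-1] - payment)
    return values


tables = {d: loan_balances(1000, 1.05, d, 4) for d in (40, 50, 60)}

for d, values in tables.items():
    print(f"D = {d}: " + ", ".join(f"{v:.3f}" for v in values))
```

Only D = 50, the interest charged each year on $1000, keeps every value at 1000.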
Jane borrows $50 000 to buy some shares. Jane negotiates an interest-only loan at an
interest rate of 9% per annum, compounding monthly. What is the monthly amount Jane
will be required to pay?
Explanation Solution
Calculation method
1 Use the rule \( D = \frac{r}{100 \times p} \times V_0 \), where \( V_0 \) is the amount
borrowed: \( V_0 = 50\,000 \).
2 Calculate the interest payable, where \( r = 9 \) and \( p = 12 \):
\( D = \frac{9}{100 \times 12} \times 50\,000 \)
3 Evaluate the rule for these values and write your answer:
\( D = 375 \)
Jane will need to repay $375 every month on this interest-only loan.
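The rule D = r/(100 × p) × V0 is a single multiplication, so it translates directly into code. A minimal Python sketch with a hypothetical function name, which also checks that paying exactly D leaves the balance unchanged after one month:

```python
def interest_only_payment(principal, rate_percent, periods_per_year):
    """D = r / (100 * p) * V0: the interest charged in one compounding period."""
    return rate_percent / (100 * periods_per_year) * principal


payment = interest_only_payment(50_000, 9, 12)
growth = 1 + 9 / (100 * 12)

# Paying exactly D leaves the balance unchanged after one month
balance_after_one_month = growth * 50_000 - payment

print(f"Monthly payment: ${payment:.2f}")
print(f"Balance after one month: ${balance_after_one_month:,.2f}")
```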
Finance solver method
A loan at 6% per annum, compounding monthly, requires payments of $440 each month.
If the loan is an interest-only loan, what is the principal?
Explanation Solution
1 Use the formula \( D = \frac{r}{100 \times p} \times V_0 \), where D = 440, r = 6 and
p = 12, and solve for the principal:
\( 440 = \frac{6}{100 \times 12} \times V_0 \)
\( V_0 = 88\,000 \)
2 Write the answer.
The principal is $88 000.
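Rearranging D = r/(100 × p) × V0 gives V0 = 100pD/r for the principal and r = 100pD/V0 for the annual interest rate, which covers the other two learning intentions of this section. A minimal Python sketch; both function names are hypothetical:

```python
def interest_only_principal(payment, rate_percent, periods_per_year):
    """Rearranged rule V0 = D * 100 * p / r."""
    return payment * 100 * periods_per_year / rate_percent


def interest_only_rate(payment, principal, periods_per_year):
    """Rearranged rule r = D * 100 * p / V0 (annual rate as a percentage)."""
    return payment * 100 * periods_per_year / principal


principal = interest_only_principal(440, 6, 12)
print(f"Principal: ${principal:,.2f}")

# Working back from the principal recovers the original 6% p.a.
print(f"Annual rate: {interest_only_rate(440, principal, 12):.2f}%")
```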
Exercise 8H
2 In order to invest in the stockmarket, Jamie takes out an interest-only loan of $50 000.
If the interest on the loan is 8.4% per annum compounding monthly, find his monthly
payment amount.
3 Robert takes out an interest-only loan for $220 000 at an interest rate of 5.46% per
annum, compounding fortnightly. Find the fortnightly payment.
4 Frannie borrows $180 000 at an interest rate of 4.95% per annum, compounding
quarterly. If Frannie only pays the interest, find the total payments made over a five-year
period.
5 Jackson takes out an interest-only loan of $30 000 from the bank to buy a painting. He
hopes to resell it at a profit in 12 months’ time. The interest on the loan is 9.25% per
annum, compounding monthly. He makes monthly payments on the loan.
a Find the total amount that Jackson pays in 12 months.
b How much will he need to sell the painting for in order not to lose money?
6 Ric takes out an interest-only loan of $600 000 to buy an investment property. The
interest on the loan is 5.11% per year, compounding monthly.
a Calculate Ric’s monthly repayments if he only pays the interest.
b Ric sells the property after 10 years. Calculate the total interest paid on the loan.
c How much must Ric sell the property for if he wishes to make a profit of at least
$100 000?
7 Mindy borrows $35 000 for 20 years at 6.24% per annum, compounding monthly. For
the first five years, Mindy pays interest only.
a Calculate the monthly repayments that Mindy makes for the first two years.
b State the balance of the loan after five years.
c For the next 179 months, Mindy pays $300 per month followed by a smaller
payment to fully repay the loan. Find this final repayment. Round your answer to
the nearest cent.
d Find the total amount that Mindy paid for the duration of the 20-year loan.
9 An interest-only loan with an interest rate of 6.6% per annum, compounding monthly,
requires a monthly payment of $88. What is the principal?
10 Yianni took out an interest-only loan with an interest rate of 4.2% per annum,
compounding monthly. Over a two-year period, Yianni paid $2352 in total. Find the
principal of the loan.
12 An interest-only loan of $12 000 compounds monthly and requires monthly payments
of $36. What is the annual interest rate?
13 Leo takes out an interest-only loan of $35 000 which compounds monthly and requires
monthly payments. Over a two-year period, Leo pays a total of $3360. What is the
annual interest rate?
14 Svetlana borrows $320 000 on an interest-only loan at an interest rate of 4.92% per
annum, compounding monthly for the first 5 years. Following this, the interest rate changes.
a Calculate the monthly repayments that Svetlana makes for the first five years.
b Calculate the total amount that Svetlana paid during the first five years.
For the next five years, Svetlana pays $86 400 on the interest-only loan.
c State the total interest that she paid during the second half of the loan.
d Calculate the monthly repayment that she made during the second half of the loan.
e Hence, find the annual interest rate during the second half of the loan.
8I Perpetuities
Learning intentions
• To be able to calculate the regular payment from a perpetuity.
• To be able to calculate the investment required to establish a perpetuity.
• To be able to calculate the interest rate of a perpetuity.
Recall that an annuity involves money being deposited in an investment and then withdrawn
over time in the form of regular payments. In our earlier analysis, we considered the case
where the withdrawals were made to exhaust the annuity over a given time frame. That is,
the value of the annuity eventually reached zero.
If the regular payments are smaller than the interest received, the annuity will continue
to grow. If the payments received are exactly the same as the interest earned in each
compounding period, the annuity will maintain its value indefinitely. This type of annuity
is called a perpetuity and the payments that are equal to the interest earned can be
made forever (or in perpetuity). Perpetuities have the same relationship to annuities as
interest-only loans have to reducing balance loans.
Modelling perpetuities
Let Vn be the value of the perpetuity after n payments have been made. Then
V0 = principal, Vn+1 = RVn − D
where \( R = 1 + \frac{r}{100 \times p} \) is the growth multiplier, r is the annual interest
rate, p is the number of compounding periods per year and D is the regular payment per
compounding period, which is equal to the interest earned, given by
\( D = \frac{r}{100 \times p} \times V_0 \)
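To see why paying exactly the interest keeps the balance fixed, substitute the expressions for R and D into the recurrence relation for the first payment:

```latex
\begin{aligned}
V_1 &= R V_0 - D \\
    &= \left(1 + \frac{r}{100 \times p}\right) V_0 - \frac{r}{100 \times p} \times V_0 \\
    &= V_0 + \frac{r}{100 \times p} \times V_0 - \frac{r}{100 \times p} \times V_0 \\
    &= V_0
\end{aligned}
```

The same argument applied to each later payment gives \( V_{n+1} = V_n \) for every n, so the perpetuity never runs down. The identical algebra explains why an interest-only loan keeps a constant balance.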
Elizabeth invests her superannuation payout of $500 000 into a perpetuity that will
provide a monthly income.
If the interest rate for the perpetuity is 6% per annum, compounding monthly, what monthly payment will
Elizabeth receive?
Explanation Solution
1 Find the monthly interest:
\( D = \frac{r}{100 \times p} \times V_0 = \frac{6}{100 \times 12} \times 500\,000 = 2500 \)
2 Write your answer, rounding as required.
Elizabeth will receive $2500 every month from her investment.
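A quick way to confirm that $2500 is sustainable is to run the recurrence relation for several years and check that the balance does not move. A minimal Python sketch, assuming monthly compounding (p = 12, as in the solution); the function name is hypothetical:

```python
def perpetuity_payment(principal, rate_percent, periods_per_year):
    """D = r / (100 * p) * V0: the interest earned in one compounding period."""
    return rate_percent / (100 * periods_per_year) * principal


principal = 500_000
payment = perpetuity_payment(principal, 6, 12)
growth = 1 + 6 / (100 * 12)

balance = principal
for _ in range(10 * 12):  # ten years of monthly payments
    balance = growth * balance - payment

print(f"Monthly payment: ${payment:,.2f}")
print(f"Balance after ten years: ${balance:,.2f}")
```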
Calculate how much money will need to be invested in a perpetuity account, earning
interest of 4.8% per annum, compounding monthly, if $300 will be withdrawn every month.
Explanation Solution
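1 Use the rule \( D = \frac{r}{100 \times p} \times V_0 \) to write down an equation that
can be solved for \( V_0 \):
\( 300 = \frac{4.8}{100 \times 12} \times V_0 \)
\( V_0 = \frac{300 \times 100 \times 12}{4.8} = 75\,000 \)
2 Write your answer.
$75 000 will need to be invested in the perpetuity account.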
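The rearranged rule V0 = 100pD/r can be checked by computing one month's interest on the result, which should equal the withdrawal. A minimal Python sketch with a hypothetical function name:

```python
def perpetuity_investment(payment, rate_percent, periods_per_year):
    """Rearranged rule V0 = D * 100 * p / r."""
    return payment * 100 * periods_per_year / rate_percent


required = perpetuity_investment(300, 4.8, 12)
monthly_interest = 4.8 / (100 * 12) * required  # should equal the $300 withdrawal

print(f"Required investment: ${required:,.2f}")
print(f"Interest earned in one month: ${monthly_interest:,.2f}")
```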
A university mathematics faculty has $30 000 to invest. It intends to award an annual
mathematics prize of $1500 with the interest earned from investing this money in a perpetuity.
What is the minimum interest rate that will allow this prize to be awarded indefinitely?
Explanation Solution
We will consider just one compounding
period because all compounding periods
will be identical.
Calculation method
1 Use the rule \( D = \frac{r}{100 \times p} \times V_0 \) and solve the equation for r:
\( 1500 = \frac{r}{100 \times 1} \times 30\,000 \)
\( 1500 = r \times 300 \)
\( r = \frac{1500}{300} = 5 \)
2 Write your answer.
The minimum annual interest rate to award this prize indefinitely is 5%.
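Why is 5% the minimum? A rate any lower earns less than $1500 each year, so every prize eats into the fund. A minimal Python sketch illustrating this with annual compounding (p = 1). The function names, the 20-year horizon and the comparison rate of 4.9% are illustrative choices, not part of the example:

```python
def perpetuity_rate(payment, principal, periods_per_year):
    """Rearranged rule r = D * 100 * p / V0 (annual rate as a percentage)."""
    return payment * 100 * periods_per_year / principal


def fund_after(fund, rate_percent, prize, years):
    """Apply V_{n+1} = R * V_n - prize with annual compounding."""
    growth = 1 + rate_percent / 100
    for _ in range(years):
        fund = growth * fund - prize
    return fund


r_min = perpetuity_rate(1500, 30_000, 1)
print(f"Minimum rate: {r_min:.2f}%")

# At r_min the fund holds its value; just below it, the fund slowly shrinks
print(f"After 20 years at {r_min}%: ${fund_after(30_000, r_min, 1500, 20):,.2f}")
print(f"After 20 years at 4.9%: ${fund_after(30_000, 4.9, 1500, 20):,.2f}")
```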
Financial solver
Exercise 8I
2 Craig wins $1 000 000 in a lottery and decides to place it in a perpetuity that pays
5.76% per annum interest, compounding monthly.
a What monthly payment does he receive?
b How much interest does he earn in the first year?
3 Donna sold her cafe business for $720 000 and invested this amount in a perpetuity.
The perpetuity earns interest at a rate of 3.6% per annum. Interest is calculated and
paid monthly.
a What monthly payment will Donna receive from this investment?
b After three years, the interest rate for the perpetuity increases. Describe whether
Donna’s monthly payment will increase, decrease or stay the same.
5 Barbara wishes to start a scholarship that will reward the top mathematics student each
quarter with a $600 prize.
If the interest on the initial investment averages 4.8% per annum, compounding
quarterly, how much should be invested?
6 Omar inherits $920 000 and splits the money between a perpetuity and an annuity investment.
The perpetuity pays $2340 each month based on an interest rate of 5.2% per annum
that compounds monthly.
The annuity investment has an interest rate of 4.8% per annum that compounds monthly.
a Calculate how much Omar invested in the perpetuity.
b State how much is initially invested in the annuity investment.
c Omar realises that he only needs $2000 from his perpetuity each month and so he
adds $340 as an additional payment into the annuity investment each month. Find the
value of the annuity investment after three years.
8 Benjamin has $12 000 to invest in a perpetuity to provide a prize of $750 each
year. What is the minimum interest rate that he requires in order to pay the prize in
perpetuity if interest compounds annually?
Analysis of perpetuities
10 Marco invests $350 000 in a perpetuity from which he will receive a regular monthly
payment of $1487.50.
The perpetuity earns interest at the rate of 5.1% per annum.
a Determine the total amount, in dollars, that Marco will receive after one year of
monthly payments.
ISBN 978-1-009-11041-9 © Peter Jones et al 2023 Cambridge University Press
Photocopying is restricted under law and this material must not be transferred to another party.
b Write down the value of the perpetuity after Marco has received one year of
monthly payments.
c Let Mn be the value of Marco’s perpetuity after n months. Write down a recurrence
relation in terms of M0 , Mn+1 and Mn , that would model the value of this perpetuity
over time.
11 Zihan invests $200 000 in a perpetuity from which he will receive a regular payment.
There are two options available:
Option A: Earn interest at a rate of 3.6% per annum, compounding monthly with a
regular monthly payment.
Option B: Earn interest at a rate of 3.8% per annum, compounding annually with a
regular annual payment.
a Calculate the monthly payment from Option A.
b Calculate the annual payment from Option B.
c Determine which option pays more over one year.
d Let Zn be the value, after n payments, of the perpetuity option that pays more over the
course of a year. Write down a recurrence relation in terms of Z0 , Zn+1 and Zn that
would model the value of this perpetuity over time.
13 Aaliyah invests $120 000 in a perpetuity from which she will receive a regular
monthly payment. The perpetuity has a compound interest rate of 5.2% per annum and
compounds monthly. The amount that Aaliyah will receive from the perpetuity in the
first two years is closest to
A $520 B $624 C $6240 D $12 480 E $149 760
Key ideas and chapter summary
Reducing balance loan
A reducing balance loan is a loan that attracts compound interest but is reduced in value
by making regular payments. Each payment partly pays the interest that has been added
and partly reduces the value of the loan.
Amortisation
An amortising loan is one that is paid back with periodic payments. An amortising
investment is one that is exhausted by regular withdrawals.
Amortisation of reducing balance loans tracks the distribution of each periodic payment,
in terms of the interest paid and the reduction in the value of the loan.
Amortisation of an annuity tracks the source of each withdrawal, in terms of the interest
earned and the reduction in the value of the annuity.
Interest-only loan
An interest-only loan is a loan where the regular payments made are equal in value to the
interest charged. Interest-only loans have the same value after each payment is made.
Skills checklist
Download this checklist from the Interactive Textbook, then print it and fill it out to check
your skills.
8C 11 I can analyse an amortisation table for an annuity to find the interest rate.
8C 12 I can interpret and construct an amortisation table for a compound interest
investment with additional payments.
8D 13 I can find the final payment for a reducing balance loan or annuity.
8D 14 I can find the total payment made and the total interest paid.
8E 16 I can determine the value of an investment with regular additions made to the
principal using a financial solver.
8E 17 I can determine the balance and final payment of a reducing balance loan after
a given number of payments.
8F 19 I can find the interest rate for an investment with additional payments.
8F 20 I can find the regular monthly payment and the time taken for an investment
with additions to the principal.
8F 21 I can determine the payment amount, total cost and total amount of interest
paid for a reducing balance loan.
8F 22 I can find the interest rate, time taken and regular payment for an annuity.
8G 23 I can find the value of an investment when the regular payment changes.
8G 24 I can analyse the impact of a change in the interest rate on a reducing balance
loan, an annuity and an investment.
8H 25 I can find the repayment amount for an interest-only loan with and without a
finance solver.
Multiple-choice questions
1 An investment of $18 000, earning compound interest at the rate of 6.8% per annum,
compounding yearly, and with regular additions of $2500 every year can be modelled
with a recurrence relation. If Vn is the value of the investment after n years, the
recurrence relation is
A V0 = 18000, Vn+1 = 1.006Vn − 2500 B V0 = 2500, Vn+1 = 1.068Vn − 18 000
C V0 = 18 000, Vn+1 = 1.068Vn + 2500 D V0 = 18000, Vn+1 = 1.068Vn − 2500
E V0 = 2500, Vn+1 = 1.006Vn − 18000
2 Let Vn be the value of an investment after n months. The investment is modelled by the
recurrence relation V0 = 25 000, Vn+1 = 1.007Vn − 400. The annual interest rate for
this investment is
A 0.084% B 0.7% C 2.8% D 8.4% E 36.4%
Questions 4 and 5 relate to the following information.
A loan of $28 000 is charged interest at the rate of 6.4% per annum, compounding monthly.
It is repaid with regular monthly payments of $1200.
4 Correct to the nearest cent, the value of the loan after 5 months is
A $21 611.35 B $22 690.33 C $23 763.59 D $24 831.16 E $31 363.91
5 Following 24 regular payments of $1200, a final payment is made to fully repay the
loan. The final payment on the loan, correct to the nearest cent, will be
A $1125.41 B $1131.41 C $1175.20 D $1181.47 E $1200
7 Paula borrows $12 000 from a bank, to be repaid over 5 years. Interest of 12% per
annum is charged monthly on the amount of money owed. If Paula makes regular
monthly payments of $266.90, then the amount she owes at the end of the second year
is closest to
A $2880 B $5590 C $6410 D $8040 E $9120
8 Ayush invests $12 000 in an annuity from which he receives a regular monthly payment
of $239. The annuity earns interest of 7.2% per annum, compounding monthly.
The balance of the annuity after three months is closest to
A $11 495 B $11 496 C $11 665 D $12 938 E $12 939
9 James invests $50 000 in an annuity from which he receives a regular monthly payment
of $925.30.
The balance of the annuity, in dollars, after n months, Jn , can be modelled by a
recurrence relation of the form
J0 = 50 000, Jn+1 = 1.0035Jn − 925.30
The balance of the annuity after six months is closest to
A $45 289 B $45 458 C $45 459 D $56 659 E $56 660
Questions 10–13 refer to the following amortisation table for a reducing balance loan.
13 Assuming that payments are made monthly and interest compounds monthly, the
annual interest rate on the loan is
A 0.4% B 0.48% C 4% D 4.8% E 16%
14 Amir borrows $1500 in a reducing balance loan at a rate of 3.6% per annum,
compounding quarterly.
She makes regular repayments of $383.45 each quarter for three quarters followed by a
final payment.
To pay out her loan fully, her final payment is
A $0.01 B $0.10 C $383.35 D $383.45 E $383.55
15 Monthly withdrawals of $220 are made from an account that has an opening balance of
$35 300, invested at 7% per annum, compounding monthly. The balance of the account
after 1 year is closest to
A $32 660 B $33 500 C $35 125 D $35 211 E $40 578
16 Tilly invests $5000 in an account that pays interest at the rate of 3.9% per annum,
compounding annually.
She makes an additional payment of $1200 each year.
The number of years that it will take the investment to first reach a balance of $20 000 is
A 1 B 2 C 9 D 10 E 13
17 Twenty years ago, Oscar invested $65 000 in an account earning interest at the rate of
2.8% per annum, compounding monthly.
After 10 years, he made a one-off payment of $20 000 to the account.
For the next 10 years, the account earned interest at the rate of 3.2% per annum,
compounding monthly. The balance of the account today is closest to
A $65 000 B $85 975.39 C $105 975.39 D $113 719.51 E $145 879.51
18 The monthly payment on an interest-only loan of $175 000, at an interest rate of 5.9%
per annum, compounding monthly, is closest to
A $198 B $397 C $860 D $1117 E $2581
19 A scholarship will be set up to provide an annual prize of $400 to the best Mathematics
student in a school. The scholarship is paid for by investing an amount of money into a
perpetuity, paying interest of 3.4% per annum, compounding annually. The amount that
needs to be invested to provide this scholarship is closest to
A $400 B $800 C $1176 D $11 764 E $11 765
20 Pham invests $74 000 in a perpetuity from which he will receive a regular monthly payment.
The perpetuity has a compound interest rate of 4.8% per annum and compounds
monthly. The amount that Pham will receive from the perpetuity in the first two years
is closest to
A $296 B $3233 C $3552 D $7104 E $77 756.64
c If Samantha deposited her money into an annuity and withdrew $2000 per month,
after how many months is the investment first below $100 000?
d If Samantha deposited her money into an annuity and withdrew $4000 per month for
each month until the final month:
i How many regular payments of $4000 would she receive?
ii What would be the value of her last withdrawal?
3 A loan of $10 000 is to be repaid over 5 years with 19 equal quarterly payments of
$656.72 followed by a final payment. Interest is charged at the rate of 11% per annum
compounding quarterly. Find:
a the final payment. Round your answer to the nearest cent.
b the sum of all repayments, to the nearest dollar.
c the total amount of interest paid, to the nearest dollar.
4 The Andersons were offered a $24 800 loan to pay for a new car. Their loan is to be
repaid in equal monthly payments of $750, except for the last month when less than
this will be required to fully pay out the loan. The interest rate is 10.8% per annum,
compounding monthly.
a Find the number of months needed to repay this loan.
b Calculate the amount of the final payment. Round your answer to the nearest cent.
c Calculate the total interest that is paid on the loan.
5 Elsa borrowed $100 000 at 9.6% per annum, compounding quarterly. The loan was to
be repaid over 25 years with 99 equal quarterly payments followed by a final payment
that is as close to the regular payment as possible.
a How much of the first quarterly payment went towards paying off the principal?
b Elsa inherits some money and decides to terminate the loan after 10 years by paying
what is owing in a lump sum. How much will this lump sum be?
6 Helene won $750 000 in a lottery. She decides to place the money in an investment
account that pays 4.5% per annum interest, compounding monthly.
a How much will Helene have in the investment account after 10 years?
b After 10 years, Helene withdraws the money from the investment account and
places it in an annuity. The annuity pays 3.5% per annum, compounding monthly.
Helene receives $6000 per month from the annuity. For how many months will she
receive $6000?
c Helene’s accountant suggests that rather than purchase an annuity she places
the money in a perpetuity so that she will be able to leave some money to her
grandchildren. If she places $1 100 000 into a perpetuity that pays 3.6% per annum
compounding monthly, how much is the monthly payment that Helene will receive?
Recursion and financial modelling
4 The value of a commercial fridge, purchased for $9000, is depreciated by 10% per
annum using a reducing balance method.
Recursive calculations can determine the value of the fridge after n years, Vn .
Which one of the following recursive calculations is not correct?
A V0 = 9000
B V1 = 0.9 × 9000
C V2 = 0.9 × 8100
D V3 = 0.9 × 7290
E V4 = 0.9 × 6560
6 Luke has purchased a caravan for $75 000.
He depreciates the value of the caravan using the reducing balance method.
For the first two years of reducing balance depreciation, the annual depreciation rate
was 12%.
Luke then changed the annual depreciation to d per cent.
After three more years of reducing balance depreciation, the value of the caravan was
The changed depreciation rate, d per cent, is closest to
A 9% B 9.5% C 10% D 10.5% E 11.5%
7 Tran invests $375 000 in an account that pays interest at the rate of 3.6% per annum,
compounding monthly.
He makes additional payments of $400 each month into this account.
The value of Tran’s account, in dollars, after n months, Tn , can be modelled by the
recurrence relation shown below.
T0 = 375 000, Tn+1 = 1.003Tn + 400
The balance of Tran’s account first exceeds $650 000 at the end of month
A 15 B 16 C 143 D 144 E 145
The interest rate for Witter’s loan changed after one of these repayments had been made.
The first repayment with the lower interest rate was repayment number
A 14 B 15 C 16 D 17 E 18
12 Alicia invested some money in a perpetuity from which she receives a payment of
$675.10 each quarter.
The perpetuity pays interest of 4.3% per annum, compounding quarterly.
How much money did Alicia invest in the perpetuity?
A $6280 B $15 700 C $18 840 D $62 800 E $188 400
14 Lucy borrows $180 000 in an interest-only loan.
Interest is calculated and paid monthly.
If Lucy pays $1080 per month, then the annual interest rate is
A 0.06% B 0.6% C 0.72% D 6% E 7.2%
15 Maya borrowed $32 000 to buy a car and was charged interest at the rate of 10.4% per
annum, compounding monthly.
For the first year of the loan, Maya made monthly repayments of $380.
For the second year of the loan, Maya made monthly repayments of $510.
The total amount of interest that Maya paid over this two-year period was closest to
A $3050 B $3268 C $4362 D $6318 E $10 680
2 Tina spent $25 000 on a new coffee machine for her cafe.
The value of the coffee machine will depreciate by $0.50 per hour of use.
The recurrence relation below can be used to model the value of the coffee machine,
Vn , after n years.
V0 = 25 000, Vn+1 = Vn − 936
a Use recursion to show that the value of the coffee machine after three years is
$22 192.
b Tina uses the coffee machine all 52 weeks of the year for the same number of hours
each week.
For how many hours each week is the coffee machine used?
4 Millie takes out a reducing balance loan of $240 000. The interest rate for the loan is
3.6% per annum, compounding fortnightly.
Millie pays $1350 per fortnight and decides to do so for all payments except the final
payment which will be lower.
a How many of Millie’s payments will be exactly $1350?
b After seven years of repayments, Millie decides to pay the remaining balance of her loan.
How much will Millie need to pay?
c In which of the years will Millie pay the most interest?
5 Hugh bought a motor scooter to deliver food for a local restaurant.
He paid $4450 for the scooter.
The value of Hugh’s scooter depreciated by a fixed amount for each evening shift that
he completed.
After 20 evening shifts, the value of the scooter decreased by $50.
a What was the value of Hugh’s scooter after 20 evening shifts?
b Write a calculation that shows that the value of Hugh’s scooter depreciated by $2.50
per evening shift.
c The value of Hugh’s scooter after n evening shifts can be determined using a rule.
Complete the rule below by writing the appropriate numbers in the boxes provided.
Hn = ☐ − ☐ × n
d Using the rule, find the value of Hugh’s scooter after 200 evening shifts.
e The value of the scooter continues to depreciate by $2.50 per evening delivery shift.
After how many shifts will the value of Hugh’s scooter first fall below $3000?
Chapter questions
• What is a matrix?
• How is the order of a matrix defined?
• How are the positions of the elements of a matrix specified?
• How do we use matrices to represent information and solve practical problems?
• How do we use matrices to represent a network diagram?
• What are the rules for adding and subtracting matrices?
• How do we multiply a matrix by a scalar?
• What is the method for multiplying a matrix by another matrix?
• How do we form and use permutation, communication and dominance matrices?
• How can your CAS calculator be used to do matrix operations?
Matrix algebra was first studied in England in the middle of the nineteenth century.
Matrices are now used in many areas of science and business: for example, in physics,
medical research, encryption and internet search engines.
In this chapter we will show how addition and multiplication of matrices can be defined
and how matrices can be used to describe the relationship between people, businesses
and sporting teams.
In Chapter 11 we will see how matrices can be used to represent networks.
If we extract the numbers from the table and enclose them in square brackets, we form a
matrix. We might call this matrix D (for data matrix). We use capital letters A, B, C, etc. to
name matrices.
Order of a matrix
Order of a matrix = number of rows × number of columns
Row matrices
Matrices come in many shapes and sizes. For example, from this same set of data, we could
have formed the matrix called K
\(K = \begin{bmatrix} 173 & 64 & 18 & 90 \end{bmatrix}\)
This matrix has been formed from just one row of the data: the data values for Kate.
Because it only contains one row of numbers, it is called a row matrix (or row vector). It is
a 1 × 4 matrix: one row by four columns. It contains 1 × 4 = 4 elements.
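Counting rows and columns is easy to automate. A minimal Python sketch, assuming a matrix is stored as a list of rows (this representation and the name `order` are our own choices, not the textbook's):

```python
def order(matrix):
    """Return (number of rows, number of columns)."""
    rows = len(matrix)
    cols = len(matrix[0])
    # Every row of a matrix must have the same number of elements
    assert all(len(row) == cols for row in matrix), "rows must have equal length"
    return rows, cols

K = [[173, 64, 18, 90]]  # the row matrix for Kate
rows, cols = order(K)
elements = rows * cols   # 1 x 4 = 4 elements
```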
Column matrices
Equally, we could form a matrix called H (for height matrix). This matrix is formed from just one column of the data, the heights of the students.
Because it only contains one column of numbers, it is called a column matrix (or column vector). This is an 8 × 1 matrix: eight rows by one column. It contains 8 × 1 = 8 elements.
Example 1
State the order of each of the following matrices.
1 1 4 4 5
1 5
4 4 3 4 6
a 3 0 b 1 5 8 9 0 c d
4 4 3 2 1
7 6
9 9 1 0 7
Note: The transpose of a 3 × 2 matrix is a 2 × 3 matrix because the rows and columns are switched.
b Write down the matrix \(\begin{bmatrix} 0 & 1 & 2 \\ 3 & 2 & 5 \end{bmatrix}^T\).
c If \(A = \begin{bmatrix} 2 & 0 & 1 \end{bmatrix}\), write down the matrix \(A^T\).
Explanation Solution
a The transpose of the matrix is obtained by
7 8
switching (interchanging) its rows and columns.
4 1
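b The symbol T is an instruction to transpose the matrix.
\(\begin{bmatrix} 0 & 1 & 2 \\ 3 & 2 & 5 \end{bmatrix}^T = \begin{bmatrix} 0 & 3 \\ 1 & 2 \\ 2 & 5 \end{bmatrix}\)
c The symbol T is an instruction to transpose matrix A.
\(A^T = \begin{bmatrix} 2 & 0 & 1 \end{bmatrix}^T = \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}\)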
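Transposing amounts to reading the columns of a matrix as the rows of the new one. A minimal Python sketch using lists of rows (the name `transpose` is illustrative); `zip(*matrix)` groups the first elements of every row, then the second elements, and so on:

```python
def transpose(matrix):
    """Swap rows and columns: the element in row i, column j moves to row j, column i."""
    return [list(column) for column in zip(*matrix)]

part_b = transpose([[0, 1, 2], [3, 2, 5]])  # 2 x 3 becomes 3 x 2
part_c = transpose([[2, 0, 1]])             # row matrix becomes column matrix
```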
Square matrices
As a final example, we could form a matrix we call M (for males). This matrix contains only the data for the males. As this matrix has four rows and four columns, it is a 4 × 4 matrix. It contains 4 × 4 = 16 elements.
\(M = \begin{bmatrix} 173 & 57 & 18 & 86 \\ 179 & 58 & 19 & 82 \\ 195 & 84 & 18 & 71 \\ 184 & 74 & 22 & 78 \end{bmatrix}\)
A matrix with an equal number of rows and columns is called a square matrix.
For each of the matrices below, write down its type, order and the number of elements.
Diagonal matrices
A square matrix has two diagonals. In the matrix
\(\begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 0 & 1 & 2 \\ 3 & 4 & 5 & 6 \end{bmatrix}\)
one diagonal runs downwards from left to right (1, 6, 1, 6) and the other runs downwards from right to left (4, 7, 0, 3).
In practice, the diagonal going downwards from left to right turns out to be more important than the other diagonal, so we give it a special name: the leading diagonal.
A square matrix is called a diagonal matrix if all of the elements off the leading diagonal are
zero. The elements on the leading diagonal may or may not be zero.
1 0 0 0
2 0 0
2 0
0 1 0 0 6 0 0
The matrices opposite are all diagonal matrices:
0 1 0 0 0
0 0 3
0 0 0 6
Identity matrices
Diagonal matrices in which each element in the diagonal is 1 are of special importance.
They are called identity or unit matrices and have their own name and symbol (I).
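Every order of square matrix has its own identity matrix, three of which are shown below.
\(I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad I = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}\)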
Symmetric matrices
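A symmetric matrix is a square matrix that is unchanged by transposition (switching rows and columns). In a symmetric matrix, the elements above the leading diagonal are a mirror image of the elements below the diagonal. Three are shown.
\(\begin{bmatrix} 2 & 3 \\ 3 & 1 \end{bmatrix}, \quad \begin{bmatrix} 2 & 3 & 4 \\ 3 & 1 & 5 \\ 4 & 5 & 3 \end{bmatrix}, \quad \begin{bmatrix} 1 & 2 & 4 & 6 \\ 2 & 1 & 5 & 7 \\ 4 & 5 & 3 & 8 \\ 6 & 7 & 8 & 5 \end{bmatrix}\)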
Triangular matrices
Triangular matrices come in two types:
1 An upper triangular matrix is a square matrix in which all elements below the leading
diagonal are zeros.
2 A lower triangular matrix is a square matrix in which all elements above the leading
diagonal are zeros.
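Examples of triangular matrices are shown.
Upper triangular matrix: \(\begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix}\)
Lower triangular matrix: \(\begin{bmatrix} 1 & 0 & 0 & 0 \\ 3 & 2 & 0 & 0 \\ 6 & 5 & 4 & 0 \\ 0 & 9 & 8 & 7 \end{bmatrix}\)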
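Each of the matrix types above is defined by a condition on the positions of its elements, so each can be checked element by element. A Python sketch using lists of rows and 0-based indices i (row) and j (column); all function names are our own:

```python
def is_square(m):
    """Same number of rows as columns (every row has len(m) elements)."""
    return all(len(row) == len(m) for row in m)

def is_diagonal(m):
    """Square, and every element off the leading diagonal is zero."""
    n = len(m)
    return is_square(m) and all(m[i][j] == 0 for i in range(n) for j in range(n) if i != j)

def is_identity(m):
    """Diagonal, with every leading-diagonal element equal to 1."""
    return is_diagonal(m) and all(m[i][i] == 1 for i in range(len(m)))

def is_symmetric(m):
    """Unchanged by transposition: element (i, j) equals element (j, i)."""
    n = len(m)
    return is_square(m) and all(m[i][j] == m[j][i] for i in range(n) for j in range(n))

def is_upper_triangular(m):
    """Square, with zeros everywhere below the leading diagonal (j < i)."""
    n = len(m)
    return is_square(m) and all(m[i][j] == 0 for i in range(n) for j in range(i))

def is_lower_triangular(m):
    """Square, with zeros everywhere above the leading diagonal (j > i)."""
    n = len(m)
    return is_square(m) and all(m[i][j] == 0 for i in range(n) for j in range(i + 1, n))
```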
Explanation Solution
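a All the elements below the leading diagonal are 0: \(\begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}\), \(\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\) and \(\begin{bmatrix} 1 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 3 \end{bmatrix}\)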
Some notation
In some situations, we talk about a matrix and its elements without having specific numbers
in mind. We can do this as follows.
For the matrix A, which has m rows and n columns, we write:
\(a_{21}\) represents the element in the second row and the first column
\(a_{12}\) represents the element in the first row and the second column
\(a_{22}\) represents the element in the second row and the second column
\(a_{mn}\) represents the element in the mth row and the nth column.
In general, \(a_{ij}\) represents the element in the ith row and the jth column.
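The subscript notation counts rows and columns from 1, while Python lists count from 0, so a lookup has to subtract 1 from each index. A small sketch; the helper `element` and the 2 × 2 matrix are hypothetical:

```python
def element(matrix, i, j):
    """Return a_ij, using the textbook's convention: row i, column j, counted from 1."""
    return matrix[i - 1][j - 1]  # shift to Python's 0-based indices

A = [[1, 5],
     [-1, 2]]  # hypothetical matrix for illustration
a12 = element(A, 1, 2)  # first row, second column
a21 = element(A, 2, 1)  # second row, first column
```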
Explanation Solution
a \(a_{12}\) is the element in the first row and the second column of A. \(a_{12} = 5\)
b \(a_{21}\) is the element in the second row and the first column of A. \(a_{21} = -1\)
c \(a_{33}\) is the element in the third row and the third column of A. \(a_{33} = 6\)
d \(b_{31}\) is the element in the third row and the first column of B. \(b_{31} = 1\)
In some instances, there is a rule connecting the value of each element of a matrix with its row and column number. In such circumstances, it is possible to construct the matrix knowing this rule and the order of the matrix.
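Building a matrix from a rule means evaluating the rule once for every row number i and column number j. A Python sketch, assuming the rule is passed in as a function of i and j (both counted from 1); the rule \(a_{ij} = i + 2j\) below is a hypothetical example:

```python
def from_rule(rule, m, n):
    """Build the m x n matrix whose element a_ij equals rule(i, j), with i and j starting at 1."""
    return [[rule(i, j) for j in range(1, n + 1)] for i in range(1, m + 1)]

# Hypothetical rule a_ij = i + 2j for a 2 x 3 matrix
example = from_rule(lambda i, j: i + 2 * j, 2, 3)
```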
1 Press ctrl + N . Select Add Calculator.
2 Press t and use the cursor arrows to
highlight the matrix template shown. Press ·.
Note: Math Templates can also be accessed by pressing
5 When you type A (or a) it will paste in the matrix \(\begin{bmatrix} 2 & 3 & 0 \\ 1 & 4 & 2 \end{bmatrix}\). Press · to display.
1 a Open the Main ( ) application Press to display
the soft keyboard.
b Select the keyboard.
2 Tap the 2 × 2 matrix icon, followed by the 1 × 2
matrix icon. This will add a third column and create a
2 × 3 matrix.
3 Type the values into the matrix template.
Note: Tap at each new position to enter the new value or use the
black cursor key on the hard keyboard to navigate to the new position.
Exercise 10A
Order of a matrix
Example 1 1 State the order of each of the following matrices.
2 6
18 3 4 4
1 5 9
a b 7 6 12 c 3 4 d 7 e 0 3 3
3 0 4
11 8 1 0 3 3
4 A matrix has 12 elements. What are its possible orders? (There are six.)
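A matrix with r rows and c columns has r × c elements, so the possible orders correspond to the factor pairs of the number of elements. A short sketch (the function name is our own):

```python
def possible_orders(n_elements):
    """All (rows, columns) pairs whose product is n_elements."""
    return [(r, n_elements // r) for r in range(1, n_elements + 1) if n_elements % r == 0]

orders_12 = possible_orders(12)
```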
a the upper triangular matrices b the identity matrix
c the diagonal matrices d the symmetric matrices.
Example 5 8 Complete the sentences below that relate to the following matrices.
2 6
3 4 3 1
0 1 −1 0
A = 2 1 1 0 2 B = 2 C = D = E = 0 −1 0
−1 2 1 3
1 2 0 4
4 −4
17 Consider the matrix A, where \(A = \begin{bmatrix} 5 & 8 & 11 \\ 7 & 10 & 13 \\ 9 & 12 & 15 \end{bmatrix}\). The element in row i and column j of matrix A is \(a_{ij}\). The elements in matrix A are determined by the rule
A \(a_{ij} = 4 + j\)  B \(a_{ij} = 2i + 3j\)  C \(a_{ij} = i + j + 3\)
D \(a_{ij} = 3i + 2j\)  E \(a_{ij} = 2i - j + 2\)
18 The matrix \(\begin{bmatrix} 1 & 3 & 5 & 0 \\ 3 & 4 & 1 & 0 \\ 5 & 1 & 2 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}\) is an example of
19 The element in row i and column j of matrix A is \(a_{ij}\). The elements in matrix A are determined using the rule \(a_{ij} = 2i + j\). Matrix A could be
3 3 4 5
3 4
A B 3 4 5 C 4 D [5] E 5 6 6
5 4
6 7 8 7
At the start of this chapter we used a matrix to store numerical information in a data table.
Matrices can also be used to carry codes that encrypt credit-card numbers for internet
transmission or to carry all the information needed to solve sets of simultaneous equations.
A less obvious application is using matrices to represent network diagrams.
Explanation Solution
1 Draw a blank (2 × 3) matrix.
Label the rows M for male and F for female.
Label the columns W for weights, A for aerobics
and F for fitness.
2 Fill in the elements of the matrix row by row,
starting at the top left-hand corner of the table.
M 16 104 86
F 75 34 94
Convert the 16-digit credit card number: 4454 8178 1029 3161 into a 2 × 8 matrix, listing
the digits in pairs, one under the other. Ignore the spaces.
Explanation Solution
1 Write out the sequence of numbers. 4454 8178 1029 3161
Note: Writing the number down in groups of four (as on
the credit card) helps you keep track of the figures.
2 List the digits in pairs, one under the other, filling the matrix column by column.
\(\begin{bmatrix} 4 & 5 & 8 & 7 & 1 & 2 & 3 & 6 \\ 4 & 4 & 1 & 8 & 0 & 9 & 1 & 1 \end{bmatrix}\)
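Listing the digits in pairs, one under the other, puts the 1st, 3rd, 5th, ... digits in the top row and the 2nd, 4th, 6th, ... digits in the bottom row. A Python sketch of that pattern; the function name `card_to_matrix` is hypothetical:

```python
def card_to_matrix(card_number: str):
    """Arrange the digits of a card number as a 2 x 8 matrix, in pairs one under the other."""
    digits = [int(ch) for ch in card_number if ch.isdigit()]  # ignore the spaces
    top = digits[0::2]     # 1st, 3rd, 5th, ... digits
    bottom = digits[1::2]  # 2nd, 4th, 6th, ... digits
    return [top, bottom]

matrix = card_to_matrix("4454 8178 1029 3161")
```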
Explanation Solution
1 Draw a blank 4 × 4 matrix, labelling the rows and columns A, B, C, D to indicate the points.
The diagram opposite shows the roads connecting four towns: town A, town B, town C and town D. This diagram has been represented by a 4 × 4 matrix, M. The elements show the number of roads between each pair of towns.
\(M = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 2 & 1 \\ 0 & 2 & 0 & 3 \\ 0 & 1 & 3 & 0 \end{bmatrix}\) (rows and columns in the order A, B, C, D)
a In the matrix M, \(m_{24} = 1\). What does this tell us?
b In the matrix M, \(m_{34} = 3\). What does this tell us?
c In the matrix M, \(m_{41} = 0\). What does this tell us?
d What is the sum of the elements in row 3 of matrix M and what does this tell us?
e What is the sum of all the elements of matrix M and what does this tell us?
a There is one road between town B and town D.
b There are three roads between town D and town C.
c There is no road between town D and town A.
d 5: the total number of roads between town C and the other towns in the network.
e 14: the total number of different ways you can travel between towns.
Note: For each road, there are two ways you can travel; for example, from town A to town B (\(m_{12} = 1\)) and from town B to town A (\(m_{21} = 1\)).
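Parts d and e are row and grand-total sums, which are quick to check. A Python sketch with M stored as a list of rows (order A, B, C, D); the variable names are our own. Because each road appears twice in the matrix (once in each direction, as the note explains), halving the grand total gives the number of roads:

```python
M = [
    [0, 1, 0, 0],  # A
    [1, 0, 2, 1],  # B
    [0, 2, 0, 3],  # C
    [0, 1, 3, 0],  # D
]

row_3_sum = sum(M[2])                # roads between town C and the other towns
total = sum(sum(row) for row in M)   # ways to travel: each road counted in both directions
roads = total // 2                   # number of distinct roads in the network
```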
Exercise 10B
2 The table below gives the yearly car sales for two car dealers.
Car sales       Small  Medium  Large
Honest Joe's     24     32      11
Super Deals      32     34       9
Use the table to:
a construct a matrix to display the numbers in the table. What is its order?
b construct a row matrix to display the numbers in the table relating to Honest Joe’s.
What is its order?
c construct a column matrix to display the numerical information in the table relating
to small cars. What is its order? What does the sum of its elements tell you?
4 At a certain school there are 200 girls and 110 boys in Year 7. The numbers of girls
and boys in the other year levels are 180 and 117 in Year 8, 135 and 98 in Year 9,
110 and 89 in Year 10, 56 and 53 in Year 11, and 28 and 33 in Year 12. Summarise this
information in a matrix.
Example 8 5 Convert the 16-digit credit card number 3452 8279 0020 3069 into a 2 × 8 matrix. List
the digits in pairs, one under the other. Ignore any spaces.
6 The statistics for five members of a basketball team are recorded as follows:
Player A points 21, rebounds 5, assists 5
Player B points 8, rebounds 2, assists 3
Player C points 4, rebounds 1, assists 1
Player D points 14, rebounds 8, assists 6
Player E points 0, rebounds 1, assists 2
Express this information in a 5 × 3 matrix.
8 The diagram opposite shows the roads interconnecting three towns: town 1, town 2 and town 3. Represent this diagram with a 3 × 3 matrix where the elements represent the number of roads between each pair of towns.
Example 10 9 The network diagram opposite shows a friendship network between five girls: girl 1 to girl 5.
This network has been represented by a 5 × 5 matrix, F, using the rule:
element = 1 if the pair of girls are friends
element = 0 if the pair of girls are not friends
10 a The diagram below shows a ‘food web’ for polar bears (P), seals (S) and cod (C).
The matrix F below has been set up to represent the information in the diagram.
\(F = \begin{bmatrix} 0 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}\) (rows and columns in the order P, S, C)
i What does the ‘1’ in column C, row P, of matrix F represent?
ii What does the column of zeroes in matrix F represent?
b The diagram below shows a ‘food web’ for polar bears (P), seals (S), cod (C) and zooplankton (Z).
Complete the matrix W, with rows and columns in the order P, S, C, Z, to represent the information in the diagram.
12 A small chain of delicatessens with shop names Allbright, Bunchof, Crisp, Delic and Elite (A, B, C, D, E) each sell the products Feta, Goatmilk, Haloumi, Insalata and Jam (F, G, H, I, J). The number of weekly sales of each product at each of the shops is shown in the matrix below (rows F to J, columns A to E).
\(P = \begin{bmatrix} 34 & 40 & 52 & 106 & 27 \\ 42 & 154 & 38 & 55 & 68 \\ 136 & 145 & 11 & 44 & 77 \\ 136 & 147 & 43 & 72 & 111 \\ 139 & 140 & 66 & 56 & 145 \end{bmatrix}\)
Find which product had the highest weekly sales at any single one of the shops. The name of this product and the shop is
A Feta at Allbright  B Goatmilk at Bunchof  C Jam at Elite
D Insalata at Bunchof  E Haloumi at Allbright
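Finding the largest element of a matrix, together with its row and column labels, is a single scan over every element. A Python sketch, assuming (as the answer options suggest) that rows are products F to J and columns are shops A to E:

```python
products = ["Feta", "Goatmilk", "Haloumi", "Insalata", "Jam"]  # rows F, G, H, I, J
shops = ["Allbright", "Bunchof", "Crisp", "Delic", "Elite"]    # columns A, B, C, D, E
P = [
    [34, 40, 52, 106, 27],
    [42, 154, 38, 55, 68],
    [136, 145, 11, 44, 77],
    [136, 147, 43, 72, 111],
    [139, 140, 66, 56, 145],
]

# Compare (sales, product, shop) triples; max picks the triple with the largest sales
best = max((P[i][j], products[i], shops[j])
           for i in range(len(products)) for j in range(len(shops)))
```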
Example 11
Given that
2 a 2 10
8 b+1 20 7
find the value of a and the value of b.
a = 10 and b + 1 = 7 which implies b = 6.
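Explanation Solution
1 As the two matrices have the same order, 2 × 3, they can be added.
\(A + B = \begin{bmatrix} 2 & 3 & 0 \\ 1 & 4 & 2 \end{bmatrix} + \begin{bmatrix} 1 & 2 & 3 \\ 2 & -2 & 1 \end{bmatrix} = \begin{bmatrix} 2+1 & 3+2 & 0+3 \\ 1+2 & 4+(-2) & 2+1 \end{bmatrix} = \begin{bmatrix} 3 & 5 & 3 \\ 3 & 2 & 3 \end{bmatrix}\)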
Likewise, if we have two matrices of the same order (same number of rows and columns),
we can subtract the two matrices by subtracting their corresponding elements.
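Explanation Solution
1 As the two matrices have the same order, 2 × 3, they can be subtracted.
\(A - B = \begin{bmatrix} 2 & 3 & 0 \\ 1 & 4 & 2 \end{bmatrix} - \begin{bmatrix} 1 & 2 & 3 \\ 2 & -2 & 1 \end{bmatrix} = \begin{bmatrix} 2-1 & 3-2 & 0-3 \\ 1-2 & 4-(-2) & 2-1 \end{bmatrix} = \begin{bmatrix} 1 & 1 & -3 \\ -1 & 6 & 1 \end{bmatrix}\)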
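Explanation Solution
Multiplying a matrix by a number has the effect of multiplying each element by that number.
\(3\begin{bmatrix} 2 & 3 & 0 \\ 1 & 4 & 2 \end{bmatrix} = \begin{bmatrix} 6 & 9 & 0 \\ 3 & 12 & 6 \end{bmatrix}\)
\(0.5\begin{bmatrix} 4 & -4 \\ -2 & 6 \end{bmatrix} = \begin{bmatrix} 2 & -2 \\ -1 & 3 \end{bmatrix}\)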
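Addition, subtraction and scalar multiplication all work element by element, so they can be sketched in a few lines of Python. The function names are our own; subtraction is written as adding −1 times the second matrix:

```python
def add(A, B):
    """Add corresponding elements; A and B must have the same order."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrices must have the same order")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(k, A):
    """Multiply every element of A by the number k."""
    return [[k * x for x in row] for row in A]

def subtract(A, B):
    """A - B = A + (-1)B."""
    return add(A, scale(-1, B))

A = [[2, 3, 0], [1, 4, 2]]
B = [[1, 2, 3], [2, -2, 1]]
```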
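The zero matrix
If \(X = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\) and \(Y = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\), then \(X - Y = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} - \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = O\).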
A matrix of any order with all zeros is known as a zero matrix. The symbol O is used to
represent a zero matrix. The matrices below are all examples of zero matrices.
0 0
0 0
, O = 0 , O = 0 0
O = [0], O =
0 0 0
0 0
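\(3G - 2H = 3 \times \begin{bmatrix} 6 & 0 \\ -4 & 2 \end{bmatrix} - 2 \times \begin{bmatrix} 9 & 0 \\ -6 & 3 \end{bmatrix} = \begin{bmatrix} 18 & 0 \\ -12 & 6 \end{bmatrix} - \begin{bmatrix} 18 & 0 \\ -12 & 6 \end{bmatrix}\)
\(= \begin{bmatrix} 18-18 & 0-0 \\ -12-(-12) & 6-6 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}\)
∴ 3G − 2H = O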
CAS 2: How to add, subtract and scalar multiply matrices using the
TI-Nspire CAS
If \(A = \begin{bmatrix} 2 & 3 & 0 \\ 1 & 4 & 2 \end{bmatrix}\) and \(B = \begin{bmatrix} 1 & 0 & 3 \\ 2 & -2 & 1 \end{bmatrix}\), find:
a A + B  b A − B  c 3A − 2B
1 Press ctrl + N . Select Add Calculator.
2 Enter and store the matrices A and B into
your calculator.
a To determine A + B, type a + b.
Press · to evaluate.
b To determine A − B, type a − b.
Press · to evaluate.
c To determine 3A − 2B, type 3a − 2b.
Press · to evaluate.
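CAS 2: How to add, subtract and scalar multiply matrices with the ClassPad
If \(A = \begin{bmatrix} 2 & 3 & 0 \\ 1 & 4 & 2 \end{bmatrix}\) and \(B = \begin{bmatrix} 1 & 0 & 3 \\ 2 & -2 & 1 \end{bmatrix}\), find:
a A + B  b A − B  c 3A − 2B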
1 Enter the matrices A and B into your calculator using the h keyboard.
The sales data for two used car dealers, Honest Joe's and Super Deals, are displayed below.
                        2014                      2015
Car sales       Small  Medium  Large     Small  Medium  Large
Honest Joe's     24     32      11        26     38      16
Super Deals      32     34       9        35     41      12
a Construct two matrices, A and B, to represent the sales data for 2014 and 2015 separately.
\(A = \begin{bmatrix} 24 & 32 & 11 \\ 32 & 34 & 9 \end{bmatrix}\)  \(B = \begin{bmatrix} 26 & 38 & 16 \\ 35 & 41 & 12 \end{bmatrix}\)
b Construct a new matrix C = A + B. What does this matrix represent?
\(C = A + B = \begin{bmatrix} 24 & 32 & 11 \\ 32 & 34 & 9 \end{bmatrix} + \begin{bmatrix} 26 & 38 & 16 \\ 35 & 41 & 12 \end{bmatrix} = \begin{bmatrix} 50 & 70 & 27 \\ 67 & 75 & 21 \end{bmatrix}\)
Exercise 10C
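\(D = \begin{bmatrix} 0 & 1 \\ -1 & 2 \end{bmatrix}\)  \(E = \begin{bmatrix} 1 & 0 \\ 2 & -1 \end{bmatrix}\)  \(F = \begin{bmatrix} 0 & 1 & 4 \\ 3 & 2 & 1 \end{bmatrix}\)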
Example 16 7 The number of DVDs sold in a company’s city, suburban and country stores for each
3-month period in a year is shown in the table.
a Construct four 3 × 1 matrices A, B, C, and D that show the sales in each of the
three-month periods during the year.
b Evaluate A + B + C + D. What does the sum A + B + C + D represent?
8 The numbers of females and males enrolled in three different gym programs, Weights, Aerobics and Fitness, for 2014 and 2015 are shown in the table.
                             2014                            2015
Gym membership   Weights  Aerobics  Fitness     Weights  Aerobics  Fitness
Females            16       104       86          24       124       100
Males              75        34       94          70        41        96
ISBN 978-1-009-11041-9 © Peter Jones et al 2023 Cambridge University Press
Photocopying is restricted under law and this material must not be transferred to another party.
498 Chapter 10 Matrices 10C
a Construct two matrices, A and B, which represent the gym memberships for 2014
and 2015 separately.
b Construct a new matrix C = A + B. What does this matrix represent?
c Construct a new matrix D = B − A. What does this matrix represent? What does the
negative element in this matrix represent?
d The manager of the gym wants to double her 2015 membership by 2018. Construct
a new matrix E that would show the membership in 2018 if she succeeds with her
plan. Evaluate.
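10 If \(M = \begin{bmatrix} 0 & 2 \\ -3 & 1 \end{bmatrix}\) and \(N = \begin{bmatrix} 0 & 4 \\ 3 & 0 \end{bmatrix}\), then 2M − 2N =
A \(\begin{bmatrix} 0 & 0 \\ -9 & 2 \end{bmatrix}\)  B \(\begin{bmatrix} 0 & -2 \\ -6 & 1 \end{bmatrix}\)  C \(\begin{bmatrix} 0 & -4 \\ -12 & 2 \end{bmatrix}\)  D \(\begin{bmatrix} 0 & 4 \\ 12 & -2 \end{bmatrix}\)  E \(\begin{bmatrix} 0 & 2 \\ 6 & -1 \end{bmatrix}\)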
The process of multiplying two matrices involves both multiplication and addition. The
process can be illustrated using Australian Rules football scores.
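Explanation Solution
a Write down the matrix product. Under each matrix, write down its order (rows × columns).
\(AB = \begin{bmatrix} 6 & 0 \\ -4 & 2 \end{bmatrix}\begin{bmatrix} 3 & 1 \end{bmatrix}\); not defined
order: (2 × 2) (1 × 2)
The number of columns in A (2) does not equal the number of rows in B (1), so AB is not defined.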
Once we know that two matrices can be multiplied, we can use the order of the two matrices
to determine the order of the resulting matrix.
2 0 2
For example, if A = and B = 3, then:
3 1 4
2 0 2
AB = × 3 is defined and will be of order 2 × 1.
3 1 4
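\(A = \begin{bmatrix} 6 & 0 \\ -4 & 2 \end{bmatrix}\), \(B = \begin{bmatrix} 3 & 1 \end{bmatrix}\) and \(C = \begin{bmatrix} 1 \\ -1 \end{bmatrix}\)
Explanation Solution
1 Write down the matrix product. Under each matrix, write down its order rows × columns.
2 The order of the product matrix is given by rows in matrix 1 × columns in matrix 2.
3 Write down the order.
a \(BA = \begin{bmatrix} 3 & 1 \end{bmatrix}\begin{bmatrix} 6 & 0 \\ -4 & 2 \end{bmatrix}\); order: (1 × 2) (2 × 2); order of BA 1 × 2
b \(BC = \begin{bmatrix} 3 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ -1 \end{bmatrix}\); order: (1 × 2) (2 × 1); order of BC 1 × 1
c \(AC = \begin{bmatrix} 6 & 0 \\ -4 & 2 \end{bmatrix}\begin{bmatrix} 1 \\ -1 \end{bmatrix}\); order: (2 × 2) (2 × 1); order of AC 2 × 1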
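The test for whether a product is defined, and the order of the result, only needs the two orders: the inner numbers must match and the outer numbers give the answer. A small Python sketch (the function name is our own):

```python
def product_order(order_a, order_b):
    """Order of AB given the orders (rows, columns) of A and B, or None if AB is not defined."""
    rows_a, cols_a = order_a
    rows_b, cols_b = order_b
    if cols_a != rows_b:       # the inner numbers must match
        return None
    return (rows_a, cols_b)    # the outer numbers give the order of the product

ab = product_order((2, 2), (1, 2))  # AB in the previous example
ba = product_order((1, 2), (2, 2))
bc = product_order((1, 2), (2, 1))
ac = product_order((2, 2), (2, 1))
```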
Explanation Solution
1 Write down the matrix product and, above each matrix, write down its order. Use this information to determine whether the matrix product is defined and its order.
\(AB = \begin{bmatrix} 1 & 3 & 2 \end{bmatrix}\begin{bmatrix} 2 \\ 4 \\ 1 \end{bmatrix}\); orders (1 × 3) (3 × 1)
Defined: the number of columns in A equals the number of rows in B.
The order of AB is 1 × 1.
2 To determine the matrix product:
a multiply each element in the row matrix by the corresponding element in the column matrix
b add the results
c write down your answer.
\(AB = [1 \times 2 + 3 \times 4 + 2 \times 1] = [16]\)
∴ AB = [16]
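The row-by-column rule extends to any pair of compatible matrices: element (i, j) of AB is the sum of the products of row i of A with column j of B. A minimal Python sketch using lists of rows; `matmul` is our own name and it raises an error when the product is not defined:

```python
def matmul(A, B):
    """Row-by-column product of A (m x n) and B (n x p), giving an m x p matrix."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("AB is not defined: columns of A must equal rows of B")
    p = len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(n)) for j in range(p)] for row in A]

AB = matmul([[1, 3, 2]], [[2], [4], [1]])  # the example above
```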
Explanation Solution
1 Write down the matrix product 2×2 2×1
and, above each matrix, write down
1 0
its order. Use this information to AB =
2 3 3
determine whether the matrix product
is defined and its order. Defined: the number of columns in A
equals the number of rows in B.
The order of AB is 2 × 1.
1 Press ctrl + N . Select Add Calculator.
2 Enter and store the matrices C and D into your calculator.
3 To calculate matrix CD, type c × d. Press · to evaluate.
Note: You must put a multiplication sign between the c and d.
Example 21 Using matrix multiplication to sum the rows and columns of a matrix
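Explanation Solution
a i To sum the rows of a 3 × 3 matrix, post-multiply by a 3 × 1 summing matrix.
\(\begin{bmatrix} 1 & 3 & 0 \\ 2 & 6 & 7 \\ 3 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 4 \\ 15 \\ 4 \end{bmatrix}\)
ii To sum the columns of a 3 × 3 matrix, pre-multiply by a 1 × 3 summing matrix.
\(\begin{bmatrix} 1 & 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 3 & 0 \\ 2 & 6 & 7 \\ 3 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 6 & 9 & 8 \end{bmatrix}\)
b To sum the columns of a 2 × 5 matrix, pre-multiply by a 1 × 2 summing matrix.
\(\begin{bmatrix} 1 & 1 \end{bmatrix}\begin{bmatrix} 2 & 5 & -1 & -3 & 4 \\ 0 & 6 & 2 & -2 & 3 \end{bmatrix} = \begin{bmatrix} 2 & 11 & 1 & -5 & 7 \end{bmatrix}\)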
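Summing matrices are just matrices of 1s, so the row and column totals above can be reproduced with an ordinary matrix product. A Python sketch; the compact `matmul` below pairs each row of the first matrix with each column of the second (via `zip(*B)`) and is our own helper:

```python
def matmul(A, B):
    """Row-by-column product of A and B (assumes the product is defined)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

M = [[1, 3, 0], [2, 6, 7], [3, 0, 1]]
row_sums = matmul(M, [[1], [1], [1]])   # post-multiply by a 3 x 1 summing matrix
column_sums = matmul([[1, 1, 1]], M)    # pre-multiply by a 1 x 3 summing matrix
wide_sums = matmul([[1, 1]], [[2, 5, -1, -3, 4], [0, 6, 2, -2, 3]])
```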
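\(T = \begin{bmatrix} 20 & 40 \end{bmatrix}\) (columns: Walk, Run). Matrix T gives the times (in minutes) a person spent walking and running in a training session.
Compute the matrix product TE and show that it gives the total energy consumed during the training session.
\(T \times E = \begin{bmatrix} 20 & 40 \end{bmatrix}\begin{bmatrix} 25 \\ 40 \end{bmatrix} = [20 \times 25 + 40 \times 40] = [2100]\)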
The total energy consumed is:
20 minutes × 25 kJ/minute + 40 minutes × 40 kJ/minute = 2100 kJ
This is the value given by the matrix product T E.
You could work out the energy consumed on the training run for one person just as quickly
without using matrices. However, the advantage of using a matrix formulation is that, with
the aid of a calculator, you could almost as quickly have worked out the energy consumed by
10 or more runners, all with different times spent walking and running.
Matrix powers
Now that we can multiply matrices, we can also determine powers of a square matrix (a matrix can only be multiplied by itself if it is square). This is an important tool when we meet communication and dominance matrices in the next section and transition matrices in the next chapter.
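Explanation Solution
1 Write down the matrices.
\(A = \begin{bmatrix} 1 & 0 \\ 2 & -1 \end{bmatrix}, \quad B = \begin{bmatrix} -1 & 1 \\ 2 & 1 \end{bmatrix}, \quad C = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}\)
3 Type in each of the expressions as written, and press to evaluate. Write down your answer.
a \(2A + B^2 - 2C = \begin{bmatrix} 5 & -2 \\ 2 & -1 \end{bmatrix}\)
b \((2A + B)^2 - C^2 = \begin{bmatrix} 6 & -1 \\ -1 & 5 \end{bmatrix}\)
c \(AB^2 - 3C^2 = \begin{bmatrix} 0 & -3 \\ 3 & -9 \end{bmatrix}\)
Note: For CAS calculators you must use a multiplication sign between a and b² in the last example, otherwise it will be read as the variable (ab)².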
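A matrix power is repeated multiplication, starting from the identity matrix. The sketch below reproduces the three answers above; `matmul`, `identity`, `mat_pow` and `combine` are our own helper names, and `mat_pow` assumes a square matrix and a whole-number power:

```python
def matmul(A, B):
    """Row-by-column product of A and B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def mat_pow(A, k):
    """A^k for a square matrix A and whole number k >= 0, by repeated multiplication."""
    result = identity(len(A))
    for _ in range(k):
        result = matmul(result, A)
    return result

def combine(*terms):
    """Linear combination k1*M1 + k2*M2 + ... of same-order matrices, given (k, M) pairs."""
    rows, cols = len(terms[0][1]), len(terms[0][1][0])
    return [[sum(k * M[i][j] for k, M in terms) for j in range(cols)] for i in range(rows)]

A = [[1, 0], [2, -1]]
B = [[-1, 1], [2, 1]]
C = [[0, 1], [1, 1]]

part_a = combine((2, A), (1, mat_pow(B, 2)), (-2, C))                       # 2A + B^2 - 2C
part_b = combine((1, mat_pow(combine((2, A), (1, B)), 2)), (-1, mat_pow(C, 2)))  # (2A + B)^2 - C^2
part_c = combine((1, matmul(A, mat_pow(B, 2))), (-3, mat_pow(C, 2)))        # AB^2 - 3C^2
```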
Exercise 10D
Matrix multiplication
Example 17 1 The questions below relate to the following five matrices.
3 0 1
A= 1 3 B = C = 1 0 −1 D = E = 1
1 −1 2
a Which of the following matrix products are defined?
i AB ii BA iii AC iv CE
v EC vi EA vii DB viii CD
Example 19 b Compute the following products by hand.
Example 20 i AB ii CE iii DB iv AD
c Enter the five matrices into your calculator and compute the following matrix
i AB ii EC iii AB − 3CE iv 2AD + 3B
The results of the competition are summarised in the results matrix R (rows Team 1 to Team 6). Work out the final points score for each team by forming the matrix product RP.
\(R = \begin{bmatrix} 4 & 1 & 0 \\ 3 & 1 & 1 \\ 3 & 0 & 2 \\ 1 & 2 & 2 \\ 1 & 1 & 3 \\ 0 & 1 & 4 \end{bmatrix}\)
9 Four people complete a training session in which they walked,
25 Walk
jogged and ran at various times.
E = 40 Jog
The energy consumed in kJ/minute when walking, jogging or
65 Run
running is listed in the energy matrix opposite.
The time spent in each activity (in minutes) W J R
by four people is summarised
20 30 Person 1
in the time matrix opposite. Work out
20 25 Person 2
the total energy consumed by each T =
20 20 20 Person 3
person, by forming the matrix product
30 20 10 Person 4
T E.
10 A manufacturer sells three types of fruit straps, A B C
A, B and C, through outlets at two shops, Energy (E)
34 19 E
and Nourishing (N). Q =
30 45 25 N
The number of fruit straps sold per month at each
shop is given by the matrix Q.
a Write down the order of matrix Q.
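11 Matrix X shows the number of cars of models a and b bought by four dealers A, B, C, D. Matrix Y shows the cost in dollars of cars a and b. Find XY and explain what it represents.
\(X = \begin{bmatrix} 3 & 1 \\ 2 & 2 \\ 1 & 4 \\ 1 & 1 \end{bmatrix}\) (rows A, B, C, D; columns a, b)  \(Y = \begin{bmatrix} 26\,000 \\ 32\,000 \end{bmatrix}\) (rows a, b)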
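12 It takes John 5 minutes to drink a milk shake which costs $2.50, and 12 minutes to eat a banana split which costs $3.00.
a Find the product \(\begin{bmatrix} 5 & 12 \\ 2.50 & 3.00 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \end{bmatrix}\) and interpret the result in fast-food economics.
b Two friends join John. Find \(\begin{bmatrix} 5 & 12 \\ 2.50 & 3.00 \end{bmatrix}\begin{bmatrix} 1 & 2 & 0 \\ 2 & 1 & 1 \end{bmatrix}\) and interpret the result.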
13 The final grades for Physics and Chemistry are made up of three components: tests,
practical work and exams. Each semester, a mark out of 100 is awarded for each
component. Wendy scored the following marks in the three components for Physics:
Powers of matrices
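Example 23 14 If \(A = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}\), determine \(A^2\), \(A^3\), \(A^4\) and \(A^7\).
15 If \(A = \begin{bmatrix} -1 & 1 \\ 1 & 2 \end{bmatrix}\), determine \(A^4\), \(A^5\), \(A^6\) and \(A^7\).
16 If \(B = \begin{bmatrix} 2 & 0 & 2 \\ 1 & 2 & 3 \\ 1 & 3 & 1 \end{bmatrix}\), use your calculator to find \(B^3\).
17 If \(A = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}\), \(B = \begin{bmatrix} -1 & 1 \\ 1 & 2 \end{bmatrix}\) and \(C = \begin{bmatrix} 0 & 1 \\ 1 & -2 \end{bmatrix}\), evaluate:
a \(A + 2B - C^2\)  b \(AB - 2C^2\)  c \((A + B + 2C)^2\)
d \(4A + 3B^2 - C^3\)  e \((A - B)^3 - C^3\)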
19 Matrix \(P = \begin{bmatrix} 1 & 4 \\ 2 & 6 \\ 1 & 2 \end{bmatrix}\) and matrix \(Q = \begin{bmatrix} 1 & 5 & 2 \\ 2 & 1 & 6 \end{bmatrix}\).
Matrix R = P × Q. Element \(r_{31}\) is determined by the calculation
A 1×1+4×2  B 1×1+4×0  C 1×1+2×2
D 4×1+5×0  E 4×2+5×1
20 There are two types of chocolate boxes, Minty Chews and Orange Delight, available in
a shop. The cost, in dollars, to purchase a box is shown in the table below.
Chocolate        Cost ($)
Minty Chews      6
Orange Delight   8
Liam is doing all his Christmas shopping by buying chocolate boxes. He buys 7 boxes
of Minty Chews and 9 boxes of Orange Delight. The total cost in dollars of these
chocolates can be determined by which one of the following calculations?
A 7 × 6 8 B 7 9 × 6 8 C × 6 8
6 6
D 7 9 × E 9 7 ×
8 8
Learning intentions
• To be able to determine by multiplication when two matrices are inverses.
• To be able to determine the determinant of a 2 × 2 matrix.
• To be able to determine the inverse of a 2 × 2 matrix.
• To be able to use a CAS calculator to determine the inverse and determinant of an n × n matrix.
• To be able to solve simple matrix equations.
Having defined the inverse matrix, two questions immediately come to mind. Does the
inverse of a matrix actually exist? If so, how can we calculate it?
First we will demonstrate that at least some matrices have inverses. We can do this by
showing that two matrices, which we will call A and B, have the property AB = I and
BA = I, where I is the identity matrix. If this is the case, we can then say that B = A−1 .
Explanation Solution
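1 Write down A and B.
\(A = \begin{bmatrix} 2 & 3 \\ 3 & 5 \end{bmatrix}\), \(B = \begin{bmatrix} 5 & -3 \\ -3 & 2 \end{bmatrix}\)
2 Form the product AB and evaluate. You can use your calculator to speed things up if you wish.
\(AB = \begin{bmatrix} 2 & 3 \\ 3 & 5 \end{bmatrix}\begin{bmatrix} 5 & -3 \\ -3 & 2 \end{bmatrix} = \begin{bmatrix} 2 \times 5 + 3 \times (-3) & 2 \times (-3) + 3 \times 2 \\ 3 \times 5 + 5 \times (-3) & 3 \times (-3) + 5 \times 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\)
∴ AB = I
3 Form the product BA and evaluate. You can use your calculator here to speed things up if you wish.
\(BA = \begin{bmatrix} 5 & -3 \\ -3 & 2 \end{bmatrix}\begin{bmatrix} 2 & 3 \\ 3 & 5 \end{bmatrix} = \begin{bmatrix} 5 \times 2 + (-3) \times 3 & 5 \times 3 + (-3) \times 5 \\ (-3) \times 2 + 2 \times 3 & (-3) \times 3 + 2 \times 5 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\)
∴ BA = I
4 Write down your conclusion. Because AB = I and BA = I, we conclude that A and B are inverses.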
While Example 24 clearly demonstrates that the matrices \(A = \begin{bmatrix} 2 & 3 \\ 3 & 5 \end{bmatrix}\) and \(B = \begin{bmatrix} 5 & -3 \\ -3 & 2 \end{bmatrix}\) both have an inverse, many square matrices do not have inverses. To see why, we need to introduce another new matrix concept, the determinant, and see how it relates to finding the inverse of a square matrix. To keep things manageable, we will restrict ourselves initially to 2 × 2 matrices.
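The determinant of the 2 × 2 matrix \(A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\) is
\(\det(A) = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = a \times d - b \times c\)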
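1 Write down the matrix and use the rule \(\det(A) = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = a \times d - b \times c\).
2 Evaluate.
a \(A = \begin{bmatrix} 2 & 3 \\ 3 & 5 \end{bmatrix}\) ∴ \(\det(A) = \begin{vmatrix} 2 & 3 \\ 3 & 5 \end{vmatrix} = 2 \times 5 - 3 \times 3 = 1\)
b \(B = \begin{bmatrix} 2 & 3 \\ 2 & 3 \end{bmatrix}\) ∴ \(\det(B) = \begin{vmatrix} 2 & 3 \\ 2 & 3 \end{vmatrix} = 2 \times 3 - 3 \times 2 = 0\)
c \(C = \begin{bmatrix} 2 & 4 \\ 2 & 3 \end{bmatrix}\) ∴ \(\det(C) = \begin{vmatrix} 2 & 4 \\ 2 & 3 \end{vmatrix} = 2 \times 3 - 4 \times 2 = -2\)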
From Example 25, we can see that the determinant of a matrix is a number that can take on
both positive and negative values as well as being zero. For a matrix to have an inverse, its
determinant must be non-zero.
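The 2 × 2 rule is easy to code, and the same idea extends to larger square matrices through cofactor expansion along the first row, a standard method that this section does not cover by hand but that gives the same values a CAS reports. A Python sketch; `det` is our own name:

```python
def det(M):
    """Determinant of a square matrix: a*d - b*c for 2 x 2, cofactor expansion otherwise."""
    n = len(M)
    if n == 1:
        return M[0][0]
    if n == 2:
        (a, b), (c, d) = M
        return a * d - b * c
    total = 0
    for j in range(n):
        # Minor: delete the first row and column j
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)  # alternating signs +, -, +, ...
    return total
```

The three matrices of Example 25 give 1, 0 and −2, and the 3 × 3 matrix used in the CAS 4 instructions below gives −20.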
The most important thing about this rule is that it shows immediately why you cannot
calculate an inverse for some matrices. These are the matrices whose determinant is zero.
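Explanation Solution
1 Write down the matrix and use the rule \(\det(A) = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = a \times d - b \times c\) to evaluate the determinant.
a \(A = \begin{bmatrix} 2 & 2 \\ 3 & 4 \end{bmatrix}\), \(\det(A) = \begin{vmatrix} 2 & 2 \\ 3 & 4 \end{vmatrix} = 2 \times 4 - 2 \times 3 = 2\)
Use the rule \(A^{-1} = \frac{1}{\det(A)}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}\) to evaluate \(A^{-1}\).
∴ \(A^{-1} = \frac{1}{2}\begin{bmatrix} 4 & -2 \\ -3 & 2 \end{bmatrix} = \begin{bmatrix} 2 & -1 \\ -1.5 & 1 \end{bmatrix}\)
2 Write down the matrix and use the rule \(\det(B) = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = a \times d - b \times c\) to evaluate the determinant.
b \(B = \begin{bmatrix} 2 & 3 \\ 2 & 3 \end{bmatrix}\), ∴ \(\det(B) = \begin{vmatrix} 2 & 3 \\ 2 & 3 \end{vmatrix} = 2 \times 3 - 3 \times 2 = 0\)
det(B) = 0
∴ B does not have an inverse.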
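The 2 × 2 inverse rule can be sketched in Python with the standard `fractions` module, so that answers such as −1.5 appear as exact fractions. The name `inverse_2x2` is our own, and the sketch assumes integer (or fraction) entries:

```python
from fractions import Fraction

def inverse_2x2(M):
    """Inverse of [[a, b], [c, d]] using (1/det) * [[d, -b], [-c, a]].
    Returns None when the determinant is zero (no inverse exists)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        return None
    k = Fraction(1) / det  # exact 1/det(A)
    return [[k * d, -k * b], [-k * c, k * a]]

A_inv = inverse_2x2([[2, 2], [3, 4]])
B_inv = inverse_2x2([[2, 3], [2, 3]])
```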
CAS 4: How to find the determinant and inverse of a matrix using the
TI-Nspire CAS
If \(A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 1 & 0 \\ 2 & 0 & 2 \end{bmatrix}\), find det(A) and \(A^{-1}\).
1 Press ctrl + N. Select Add Calculator.
2 Enter the matrix A into your calculator.
3 To calculate det(A), type det(a) and press · to evaluate.
Note: det() can also be accessed using b>Matrix &
CAS 4: How to find the determinant and inverse of a matrix using the ClassPad
If \(A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 1 & 0 \\ 2 & 0 & 2 \end{bmatrix}\), find det(A) and \(A^{-1}\).
1 Enter the matrix A into your calculator.
Note: Change the status of the calculator to Standard for fractions
to be displayed. Tapping on Decimal will change the calculator to
2 To calculate det(A):
a type and highlight A (by swiping with the stylus)
b select Interactive from the menu bar, tap Matrix-
Calculation, then tap det.
3 To calculate the inverse matrix A−1 :
a type A^−1
b press to evaluate.
Note: If the matrix has no inverse, the calculator will respond with
the message Undefined.
Example 27
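Let \(A = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 1 & 3 \\ 1 & 1 & 1 \end{bmatrix}\), \(B = \begin{bmatrix} 2 & 1 \\ 3 & 2 \end{bmatrix}\), \(C = \begin{bmatrix} 8 & -1 \\ 4 & 6 \end{bmatrix}\), \(D = \begin{bmatrix} 2 \\ 3 \end{bmatrix}\) and \(E = \begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix}\).
Solve each of the following matrix equations for X
a B + X = C  b BX = C  c XB = C
d BX = D  e AX = E  f \(BX + \begin{bmatrix} 4 \\ 5 \end{bmatrix} = D\)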
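Explanation Solution
a 1 Write down B and C. \(B = \begin{bmatrix} 2 & 1 \\ 3 & 2 \end{bmatrix}\), \(C = \begin{bmatrix} 8 & -1 \\ 4 & 6 \end{bmatrix}\)
f 1 Form the equation \(BX + \begin{bmatrix} 4 \\ 5 \end{bmatrix} = D\) and note that X must be a 2 × 1 matrix.
2 Calculate \(X = B^{-1}\left(D - \begin{bmatrix} 4 \\ 5 \end{bmatrix}\right)\).
\(BX = D - \begin{bmatrix} 4 \\ 5 \end{bmatrix}\)
\(X = B^{-1}\left(D - \begin{bmatrix} 4 \\ 5 \end{bmatrix}\right)\)
\(X = \begin{bmatrix} 2 & -1 \\ -3 & 2 \end{bmatrix}\begin{bmatrix} -2 \\ -2 \end{bmatrix}\)
∴ \(X = \begin{bmatrix} -2 \\ 2 \end{bmatrix}\)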
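The order of multiplication matters when solving: BX = C gives \(X = B^{-1}C\) (pre-multiply by \(B^{-1}\)), whereas XB = C gives \(X = CB^{-1}\) (post-multiply). A Python sketch of parts a, b, c and f with exact fractions; the helper names are our own:

```python
from fractions import Fraction

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix has no inverse")
    k = Fraction(1) / det
    return [[k * d, -k * b], [-k * c, k * a]]

def matmul(A, B):
    """Row-by-column product of A and B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def subtract(A, B):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

B = [[2, 1], [3, 2]]
C = [[8, -1], [4, 6]]
D = [[2], [3]]

X_a = subtract(C, B)                                   # B + X = C  =>  X = C - B
X_b = matmul(inverse_2x2(B), C)                        # BX = C     =>  X = B^-1 C
X_c = matmul(C, inverse_2x2(B))                        # XB = C     =>  X = C B^-1
X_f = matmul(inverse_2x2(B), subtract(D, [[4], [5]]))  # BX + [4; 5] = D  =>  X = B^-1 (D - [4; 5])
```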
Exercise 10E
7 Suppose that A, B, C and X are 2 × 2 matrices and that both A and B have inverses.
Solve the following for X:
a AX = C b ABX = C c AXB = C
d A(X + B) = C e AX + B = C f XA + B = A
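8 If \(\begin{bmatrix} 3 & 2 & 1 \\ 2 & 2 & 1 \\ 1 & 1 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 15\,000 \\ 20\,000 \\ 10\,000 \end{bmatrix}\), find x, y and z.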
10 A factory makes and assembles three products, P, Q and R, each requiring different quantities of three components, a, b and c. The following matrix A represents the required quantities of components for each product, and the matrix K represents the daily production of components at the factory.
\(A = \begin{bmatrix} 5 & 3 & 2 \\ 2 & 2 & 4 \\ 0 & 2 & 3 \end{bmatrix}\) (rows a, b, c) and \(K = \begin{bmatrix} 95 \\ 80 \\ 40 \end{bmatrix}\) (rows a, b, c)
a Find the inverse of A.
b Assume that the factory uses all components that are produced. Find the rate of
assembly of P, Q and R at the factory, expressed as number of products per day.
A 4 B 0 C −4 D 1 E 2
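13 The inverse of the matrix \(\begin{bmatrix} 1 & -1 \\ 1 & -2 \end{bmatrix}\) is
A \(-1\)  B \(\begin{bmatrix} 2 & 1 \\ -1 & -1 \end{bmatrix}\)  C \(\begin{bmatrix} 1 & 1 \\ -1 & -2 \end{bmatrix}\)  D \(\begin{bmatrix} 1 & 1 \\ -1 & 2 \end{bmatrix}\)  E \(\begin{bmatrix} 2 & -1 \\ 1 & -1 \end{bmatrix}\)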
Learning intentions
• To be able to identify and work with binary matrices, permutation matrices and communication matrices.
Binary matrices
The following matrices are examples of binary matrices.
0 1 0
1 1 0
1 0 1 0 0 1
1 0 1
1 0 0
Binary matrices are at the heart of many practical matrix applications, including analysing communication systems and using the concept of dominance to rank players in sporting competitions.
Permutation matrices
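A permutation¹ matrix is a square binary matrix in which there is exactly one ‘1’ in each row and each column.
The following matrices are examples of permutation matrices.
\(\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}\)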
An identity matrix is a special permutation matrix. A permutation matrix can be used to
rearrange the elements in another matrix.
1 The word ‘permutation’ means a rearrangement of a group of objects, in this case the elements of a matrix, into a different order.
2 When we form the matrix product A × B, we say that we are pre-multiplying by A.
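Explanation Solution
a i Form the matrix product PX. To find the first entry in the resulting column matrix, note that the 1 of the permutation matrix is in the third column of the first row. We obtain 0 × T + 0 × A + 1 × R = R. To find the second entry in the resulting column matrix, note that the 1 of the permutation matrix is in the second column of the second row, and so on.
\(PX = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}\begin{bmatrix} T \\ A \\ R \end{bmatrix} = \begin{bmatrix} R \\ A \\ T \end{bmatrix}\)
ii Form the matrix product \(P^2X\).
\(P^2X = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}^2\begin{bmatrix} T \\ A \\ R \end{bmatrix} = \begin{bmatrix} T \\ A \\ R \end{bmatrix}\)
b To leave the matrix X unchanged, \(P^2\) must be an identity matrix.
\(P^2\) is the identity matrix \(I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\).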
Example 29
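Find a permutation matrix that takes the column matrix \(\begin{bmatrix} A \\ B \\ C \\ D \end{bmatrix}\) to the column matrix \(\begin{bmatrix} D \\ C \\ B \\ A \end{bmatrix}\).
Explanation Solution
We want to move D from the fourth place to the first place. Place the 1 in the fourth column of the first row.
We want to move C from the third place to the second place. Place the 1 in the third column of the second row.
\(\begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} A \\ B \\ C \\ D \end{bmatrix} = \begin{bmatrix} D \\ C \\ B \\ A \end{bmatrix}\)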
Example 30
A 4 × 4 permutation matrix \(P = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}\) applied to a 4 × 1 column matrix A gives
the column matrix . Determine the matrix A.
Using your calculator, and noting that \(P^{-1} = P^T\) for a permutation matrix:
∴ \(A = P^{-1}(PA) = P^T(PA)\)
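Because each row of a permutation matrix contains a single 1, forming PX amounts to reading off one entry of X per row, and the transpose undoes the rearrangement. A Python sketch using the matrices from Examples 28 to 30; `apply_permutation`, `transpose` and `matmul` are our own helper names, and the column matrix is stored as a plain list so that it can hold letters:

```python
def apply_permutation(P, column):
    """PX for a permutation matrix P: row i has its 1 in column k, so entry i of PX is entry k of X."""
    return [column[row.index(1)] for row in P]

def transpose(M):
    return [list(c) for c in zip(*M)]

def matmul(A, B):
    """Row-by-column product of numeric matrices A and B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

P28 = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
P29 = [[0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0]]
P30 = [[0, 0, 1, 0], [0, 0, 0, 1], [0, 1, 0, 0], [1, 0, 0, 0]]

PX = apply_permutation(P28, ["T", "A", "R"])
```

Pre-multiplying by \(P^T\) returns any column matrix to its original order, which is why \(A = P^T(PA)\) in Example 30.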
Communication matrices
A communication matrix is a square binary matrix in which the 1s represent the links in a
communication system.
Eva, Wong, Yumi and Kim are students who are staying in a backpacker’s hostel. Because they speak different languages they can have problems communicating. The situation they have to deal with is that:
Eva speaks English only
Yumi speaks Japanese only
Kim speaks Korean only
Wong speaks English, Japanese and Korean.
All of the non-zero elements in the leading diagonal of a communication matrix, or its
powers, represent redundant links in the matrix.
However, all of the remaining non-zero elements represent meaningful two-step
communication links.
For example, the 1 in row Y, column K represents the two-step communication link that enables Yumi to send a message to Kim.
\(C^2 = \begin{bmatrix} 1 & 0 & 1 & 1 \\ 0 & 3 & 0 & 0 \\ 1 & 0 & 1 & 1 \\ 1 & 0 & 1 & 1 \end{bmatrix}\) (rows and columns in the order E, W, Y, K)
Finally, the total number of one and two-step links in a communication system, T , can be
found by evaluating T = C + C 2 .
0 1 0 0 1 0 1 1 1 1 1 1 E
1 0 1 1 0 3 0 0 1 3 1 1 W
T = C + C 2 = + =
0 1 0 0 1 0 1 1 1 1 1 1 Y
0 1 0 0 1 0 1 1 1 1 1 1 K
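The calculation of C² and T = C + C² above can be reproduced with a minimal Python sketch. It is an illustration only, not a replacement for CAS. Rows and columns are in the order E, W, Y, K, and the helper names are our own.

```python
def mat_mul(A, B):
    """Multiply matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_add(A, B):
    """Add two matrices of the same order, element by element."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


names = ["E", "W", "Y", "K"]   # Eva, Wong, Yumi, Kim
C = [[0, 1, 0, 0],
     [1, 0, 1, 1],
     [0, 1, 0, 0],
     [0, 1, 0, 0]]

C2 = mat_mul(C, C)   # two-step links
T = mat_add(C, C2)   # one- and two-step links

# Diagonal entries of C^2 are redundant (a message returning to its sender).
# The off-diagonal entries count the meaningful two-step links.
meaningful_two_step = sum(C2[i][j] for i in range(4) for j in range(4) if i != j)

for name, row in zip(names, T):
    print(name, row)
print("meaningful two-step links:", meaningful_two_step)
```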
Exercise 10F
Permutation matrices
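1 Which of the following binary matrices are permutation matrices?
$A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \quad B = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad C = \begin{bmatrix} 0 & 1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}$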
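Example 28 2 X is the row matrix $X = \begin{bmatrix} S & H & U & T \end{bmatrix}$.
P is the permutation matrix $P = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$.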
Communication matrices
Example 31 5 Freya (F), Lani (L) and Mei (M) are close friends who regularly send each other messages. The direct (one-step) communication links between the friends are shown in the diagram opposite.
a Construct a communication matrix C from this diagram.
b Calculate C 2 .
c How many different ways can Mei send a message to Freya?
6 Four fire towers, T1, T2, T3 and T4, can communicate with one another as shown in the diagram opposite. In this diagram an arrow indicates that a direct channel of communication exists between a pair of fire towers.
For example, a person at tower 1 can directly communicate with a person in tower 2 and vice versa.
The communication matrix C can also be used to represent this information.
$C = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & \square & 0 \\ 0 & 1 & 0 & 1 \\ \square & 0 & 1 & 0 \end{bmatrix}$ (rows and columns T1, T2, T3, T4)
a Explain the meaning of a zero in the communication matrix.
b Which two towers can communicate directly with T2?
c Write down the values of the two missing elements in the matrix.
The matrix C² is shown opposite.
$C^2 = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 2 & 0 & 1 \\ 1 & 0 & 2 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix}$ (rows and columns T1, T2, T3, T4)
d Explain the meaning of the 1 in row T3, column T1.
e How many of the two-step communication links shown in the matrix C² are redundant?
f Construct a matrix that shows the total number of one- and two-step communication links between each pair of towers.
g Which of the four towers need a three-step link to communicate with each other?
A ‘1’ in the matrix shows that the person named in that row can send a message
directly to the person named in that column. Adam wants to send a message to David.
This can be done through a sequence of communications formed from the five people.
Which of the following is a possible sequence of communications to get the message
from Adam to David?
A A, D B A, B, D C A, B, C, D D A, B, E, D E A, E, C, B, D
9 Matrix P is a 3 × 3 permutation matrix. Matrix Z is another matrix such that the matrix
product P × Z is defined. This matrix product results in the third row becoming the first
row, the second row becoming the third and the first row becoming the second row of
matrix Z. The permutation matrix P is
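A $\begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$ B $\begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$ C $\begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}$
D $\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}$ E $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$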
Learning intentions
• To be able to construct and interpret dominance matrices.
In many group situations, certain individuals are said to be dominant. This is particularly
true in sporting competitions. Problems of identifying dominant individuals in a group can
be analysed using the same approach we used to analyse communication networks.
For example, five players (Anna, Birgit, Cas, Di and Emma) played in a round-robin tournament³ of tennis to see who was the dominant (best) player.
The results were as follows:
One-step dominances
The first dominance matrix, D, records the number of one-step dominances between the players. For example, Anna has a one-step dominance over Cas because, when they played, Anna beat Cas.
$D = \begin{bmatrix} 0 & 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 \end{bmatrix}$ (rows and columns A, B, C, D, E; one-step dominance scores: A 2, B 3, C 1, D 1, E 3)
This matrix can be used to calculate a one-step dominance score for each player by summing each of the rows of the matrix. According to this analysis, Birgit (B) and Emma (E) are equally dominant, with a dominance score of 3.
³ A round-robin tournament is one in which each participant plays every other participant once.
Two-step dominances
A two-step dominance occurs when a player beats another player who has beaten someone
else. For example, Birgit has a two-step dominance over Di because Birgit defeated Cas who
defeated Di.
Two-step dominances can be determined using the same technique used to obtain two-step links in a communication network. We simply square the one-step dominance matrix. The two-step dominances for these players are shown in matrix D².
$D^2 = \begin{bmatrix} 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 2 & 3 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 2 & 0 \end{bmatrix}$ (rows and columns A, B, C, D, E; two-step dominance scores: A 2, B 6, C 1, D 3, E 4)
Reading across the ‘B row’:
The 1 in column A represents the two-step dominance ‘Birgit defeated Emma who defeated Anna’.
The 2 in column C represents the two-step dominances ‘Birgit defeated Emma who defeated Cas’ and ‘Birgit defeated Anna who defeated Cas’.
The 3 in column D represents the three two-step dominances ‘Birgit defeated Emma
who defeated Di’, ‘Birgit defeated Anna who defeated Di’ and ‘Birgit defeated Cas who
defeated Di’.
In column E the 0 tells us that there are no two-step dominances for Birgit over Emma
even though there was a one-step dominance.
We can combine the information contained in both D and D² by calculating a new matrix T = D + D².
$T = D + D^2 = \begin{bmatrix} 0 & 1 & 1 & 2 & 0 \\ 2 & 0 & 3 & 3 & 1 \\ 0 & 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 0 & 1 \\ 1 & 1 & 2 & 3 & 0 \end{bmatrix}$ (rows and columns A, B, C, D, E; total dominance scores: A 4, B 9, C 2, D 4, E 7)
Using these total dominance scores:
Birgit is the top-ranked player with a total dominance score of 9
Emma is second with a total score of 7
Anna and Di are equal third with a total score of 4
Cas is the bottom-ranked player with a total score of 2.
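The steps above (square D, add it to D, sum the rows and rank) can be sketched in Python as follows. The sketch is illustrative only and the helper names are our own. Python's `sorted` is stable even with `reverse=True`, so tied players keep their original order.

```python
def mat_mul(A, B):
    """Multiply matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_add(A, B):
    """Add two matrices of the same order, element by element."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


players = ["Anna", "Birgit", "Cas", "Di", "Emma"]
D = [[0, 0, 1, 1, 0],   # row i, column j is 1 if player i beat player j
     [1, 0, 1, 0, 1],
     [0, 0, 0, 1, 0],
     [0, 1, 0, 0, 0],
     [1, 0, 1, 1, 0]]

D2 = mat_mul(D, D)      # two-step dominances
T = mat_add(D, D2)      # total dominances

one_step = [sum(row) for row in D]
two_step = [sum(row) for row in D2]
total = [sum(row) for row in T]

ranking = sorted(zip(players, total), key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(name, score)
```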
Explanation Solution
a Construct the one-step dominance matrix D.
$D = \begin{bmatrix} 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \end{bmatrix}$ (rows and columns A, B, C, D; one-step scores: A 2, B 1, C 0, D 2)
b The person with the highest total dominance score is the most influential.
Person A is the most influential person, with a total dominance score of 5.
Exercise 10G
Dominance matrices
2 Five students play each other at chess. The dominance matrix shows the winner of each game with a ‘1’ and the loser or no match with a ‘0’. For example, row 2 indicates that B loses to A, D and E but beats C.
$D = \begin{bmatrix} 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 \end{bmatrix}$ (rows: winners A, B, C, D, E; columns: losers A, B, C, D, E)
a Find the one-step dominance score for each student and use these to rank them.
b Calculate the two-step dominance matrix.
c Determine the matrix T = D + D² and use this matrix to rank the players.
4 The following dominance matrix, M, gives the results of a series of squash matches between five friends, where $m_{ij} = 1$ if player i beat player j.
$M = \begin{bmatrix} 0 & 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 & 0 \end{bmatrix}$ (rows and columns Ash, Ben, Carl, Dot, Elle)
7 The following table gives the results of the first round of games at a chess club.
Game A vs B C vs D A vs D B vs C B vs D A vs C
Winner A C D B B A
11 There are five hens in a coop. Their owner calls them Alpha, Beta, Gamma, Delta and
Epsilon. There is a pecking order in the coop, and the following dominance matrix, M,
was formed by the owner:
Based on the matrix M + M 2 , which of the following best describes the pecking order
in the coop?
A Beta, Epsilon, Delta, Alpha, Gamma B Beta, Epsilon, Gamma, Delta, Alpha
C Beta, Epsilon, Delta, Gamma, Alpha D Epsilon, Beta, Delta, Alpha, Gamma
E Epsilon, Beta, Delta, Gamma, Alpha
Which one of the following is the correct one-step dominance for this tournament?
0 0 1 1 1 0 1 1
0 0 1 1
1 0 0 0 1 0 0 0 1 0 1 0
0 1 0 1 0 1 0 1 0 1 0 1
0 1 0 0 0 1 0 1 0 1 0 1
0 0 1 1 1 0 1 1
1 0 1 0 1 0 0 1
0 1 0 1 0 1 0 1
1 1 0 0 0 1 0 0
Square matrix A square matrix has an equal number of rows and columns.
Order The order (or size) of a matrix is given by the number of rows and
columns in that order.
Locating an element The location of each element in the matrix is specified by its row and column number, in that order.
Equal matrices Matrices are equal when they have the same order and corresponding
elements are equal in value.
Adding and subtracting Two matrices of the same order can be added or subtracted by adding or subtracting corresponding elements.
Power of a matrix The power of a matrix is defined in the same way as the powers of
numbers: A2 = A × A, A3 = A × A × A, and so on.
Only square matrices can be raised to a power.
A0 is defined to be I, the identity matrix.
Identity matrix An identity matrix, I, is a square matrix with 1s down the leading
diagonal and zeros elsewhere.
Inverse The inverse of a matrix, A, is written as A−1 and has the property that
AA−1 = A−1 A = I.
Only square matrices have inverses.
The inverse of a matrix is not defined if det(A) = 0.
A calculator is used to determine the inverse of a matrix.
Binary matrix A binary matrix is a matrix whose elements are either zeros or ones.
Skills checklist
Download this checklist from the Interactive Textbook, then print it and fill it out to check your skills.
10C 11 I can recognise when two matrices are equal and use this to solve problems.
10C 13 I can subtract one matrix from another when they have the same order.
10C 15 I recognise the role of the zero matrix and can undertake operations using the
zero matrix.
10C 16 I can use addition, subtraction and scalar multiplication to process data.
10D 21 I can use summing matrices to sum the rows or columns of a matrix.
10E 24 I can recognise that two matrices are inverses if their product is the identity matrix.
10E 27 I can calculate the inverse and determinant of an n × n matrix using CAS.
10F 29 I can use a permutation matrix to rearrange the elements of a column or row matrix.
10F 32 I can construct a communication matrix from information given in written form or a diagram.
10F 33 I can construct a dominance matrix from information given in written form or a diagram.
4 The following matrices can be added:
A U and V B V and W C X and Y D U and Y E none of the above
6 −2Y =
0 −2 0 2 0 −2 0 1 −2 −1
2 −4 −2 4 −2 4 −1 2 −4 2
8 UT =
2 0 1 1 2 1 0 2 1 0
1 1 2 0 0 1 1 1 1 2
9 In the matrix $A = \begin{bmatrix} 2 & 0 & 1 \\ 4 & -1 & 3 \\ -5 & -4 & 7 \end{bmatrix}$, the element $a_{23}$ =
A −4 B −1 C 0 D 3 E 4
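10 $2\begin{bmatrix} 2 & 0 \\ -1 & 1 \end{bmatrix} - \begin{bmatrix} -1 & 0 \\ 1 & 1 \end{bmatrix} =$
A $\begin{bmatrix} 5 & 0 \\ -4 & 0 \end{bmatrix}$ B $\begin{bmatrix} 5 & 0 \\ 4 & 0 \end{bmatrix}$ C $\begin{bmatrix} 3 & 0 \\ -2 & 0 \end{bmatrix}$ D $\begin{bmatrix} 6 & 0 \\ 1 & 2 \end{bmatrix}$ E $\begin{bmatrix} 5 & 0 \\ -3 & 1 \end{bmatrix}$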
1 2 3 × 2 =
1 3
1 2
A 10 B 12 C D 2 2 E not defined
4 3
3 1
13 det(U) =
A −2 B 0 C 1 D 2 E 4
14 Y −1 =
0.5 0 4 −2 1 0 1 −4 2
A B C D E not defined
−0.5 1
−2 1
0 1
8 2 −1
15 U −1 =
0.5 0 4 −2 1 0 1 −4 2
A B C D E not defined
−0.5 1
−2 1 0 1
8 2 −1
21 If both A and B are m × n matrices, where m ≠ n, then A + B is
A an m × n matrix B an m × m matrix C an n × n matrix
D a 2m × 2n matrix E not defined
4 6 3 6 −6 −12
22 The matrix expression + + is equal to
−2 1 −2 1 4 −1
0 0 1 0
A B 0 C
0 0 0 1
13 24 1 24
0 1 0 3
The matrix A is:
1 0 0 0 1 0 0 0 1
2 0 0 1 0 0 1 1 0 1 2
A 1 2 3 4 B C D E
3 0 1 0 1 0 1 1 1 3 4
4 1 0 1 0 1 0 1 0
24 If matrix $M = \begin{bmatrix} 3 & 0 \\ 4 & 1 \\ 7 & 2 \\ 9 & 6 \end{bmatrix}$, then the transpose matrix $M^T$ =
0 3 0 3
1 4 1 4 3 4 7 9 0 1 2 6 3 4 0 1
2 7 6 9 0 1 2 6 3 4 7 9
2 6 7 9
6 9 2 7
26 Matrix A1 is the 4 × 1 column matrix
A second 4 × 1 column matrix, A2 , contains the same elements as A1 , but the elements
are ordered from top to bottom in alphabetical order. Matrix A2 = P × A1 , where P is a
permutation matrix. Matrix P is
1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1
0 0 0 1 0 1 0 0 0 1 0 0 0 1 1 0 0 1 1 0
0 1 0 0 1 0 0 0 1 0 0 0 0 1 1 1 0 1 0 0
1 0 0 0 0 0 0 1 0 0 0 1 1 0 1 0 1 0 0 0
X 0 1 1 1 X 0 1 1 1
Y 0 1 1 Y 0 0 1 1
1 E
Z 0 0 1 Z 1
0 0 0
W 0 1 1 W 1 0 1 0
28 The matrix $\begin{bmatrix} 1 & 5 & 3 & 2 \\ 0 & 1 & 3 & 2 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 1 \end{bmatrix}$ is an example of a
A symmetric matrix. B unit matrix. C triangular matrix.
D diagonal matrix. E communication matrix.
29 A, B, C, D and E are five intersections joined by roads, as shown in the diagram opposite. Some of these roads are one-way only.
The matrix opposite indicates the direction that cars can travel along each of these roads.
$\begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \end{bmatrix}$ (columns: from intersection A, B, C, D, E; rows: to intersection A, B, C, D, E)
In this matrix:
the 1 in column A and row B indicates that cars can travel directly from A to B
the 0 in column B and row A indicates that cars cannot travel directly from B to A (either it is a one-way road or no road exists).
Cars can travel in both directions between intersections:
A A and D B B and C C C and D D D and E E C and E
2 Heights in feet and inches can be converted into centimetres using matrix
multiplication. The matrix C = can be used as a conversion matrix
(1 foot = 30.48 cm and 1 inch = 2.54 cm).
a What is the order of matrix C?
Jodie tells us that her height is 5 feet 4 inches. We can write her height as a matrix
$J = \begin{bmatrix} 5 & 4 \end{bmatrix}$.
b What is the order of matrix J?
Bookshop 1 Bookshop 2
Number of titles Hardback Paperback Hardback Paperback
Fiction 334 876 354 987
Non-fiction 213 456 314 586
4 Mathematics and Physics are offered in a first-year university science course.
The matrix $N = \begin{bmatrix} 600 \\ 320 \end{bmatrix}$ (rows: Mathematics, Physics) lists the number of students enrolled in each subject.
The matrix P = [0.15 0.225 0.275 0.25 0.10] lists the proportion of these students expected to be awarded an A, B, C, D or E grade in each subject.
a Write down the order of matrix P.
b Let the matrix R = NP.
i Evaluate the matrix R.
ii Explain what the matrix element R13 represents.
c Students enrolled in Mathematics have to pay an extra fee of $220, while students
enrolled in Physics pay an extra fee of $197.
i Write down a clearly labelled row matrix, called F, that lists these fees.
ii Show a matrix calculation that will give the total fees, L, paid in dollars by
the students enrolled in Mathematics and Physics. Find this amount.
6 A mining company operates three mines, A, B and C. Each of the mines produces three
types of minerals, p, q and r. Consider the following two matrices:
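$X = \begin{bmatrix} 20 & 20 & 40 \\ 0 & 40 & 20 \\ 60 & 40 & 60 \end{bmatrix}$ (rows: mines A, B, C; columns: minerals p, q, r) $\quad Y = \begin{bmatrix} 46\,000 \\ 34\,000 \\ 106\,000 \end{bmatrix}$ (rows: mines A, B, C)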
Matrix X gives the number of tonnes of each of the minerals extracted per day from
each of the mines, and matrix Y gives the total revenue (in dollars) from selling the
minerals extracted from each of the mines on one day.
a Calculate the total number of tonnes of minerals produced by mine A.
b Calculate the total number of tonnes of mineral q produced.
c Calculate the total revenue of the three mines.
d In the matrix equation XA = Y
i What is the order of matrix A?
ii What do the elements of matrix A represent?
iii We know that A = X −1 Y. Find A.
Chapter objectives
• How do you construct a transition matrix from a transition diagram and vice versa?
• How do you construct a transition matrix to model the transitions in a
• How do you use a matrix recurrence relation, $S_0$ = initial state matrix, $S_{n+1} = TS_n$, to generate a sequence of state matrices?
• How do you informally identify the equilibrium state or steady-state matrix in the case of regular transition matrices?
• How do you use a matrix recurrence relation, $S_0$ = initial state matrix, $S_{n+1} = TS_n + B$, to model systems that include external additions or reductions at each step of the process?
• How do you use and interpret Leslie matrices to analyse population growth?
A car rental firm has two branches: one in Bendigo and one in Colac. Cars are usually rented
and returned in the same town. However, a small percentage of cars rented in Bendigo each
week are returned in Colac, and vice versa. The diagram below describes what happens on a
weekly basis.
B – Bendigo, C – Colac
[Transition diagram: each week 80% of cars stay at B and 90% stay at C.]
The percentages (written as proportions) are summarised in the form of the matrix below.
$T = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}$ (columns: rented in Bendigo, Colac; rows: returned to Bendigo, Colac)
This matrix is an example of a transition matrix (T). It describes the way in which
transitions are made between two states:
state 1: the rental car is based in Bendigo.
state 2: the rental car is based in Colac.
Note: In this situation, where the total number of cars remains constant, the columns in a transition matrix
will always add to one (100%). For example, if 80% of cars are returned to Bendigo, then 20% must be
returned to Colac.
Explanation Solution
1 There are three locations from which the cars can be rented and returned: Albury (A), Wodonga (W) and Benalla (B). To account for all the possibilities, a 3 × 3 matrix is needed. Construct a blank matrix, labelling the rows and columns A, W and B, respectively. The column labels indicate where the car was rented. The row labels indicate where the cars were returned to.
2 Complete the matrix by writing each of the percentages (converted to proportions) into the appropriate locations. Start with column A and write in values for each row: 0.7 (70%), 0.1 (10%) and 0.2 (20%).
3 Mentally check your answer by summing columns; they should sum to 1.
$T = \begin{bmatrix} 0.7 & 0.05 & 0.12 \\ 0.1 & 0.8 & 0.11 \\ 0.2 & 0.15 & 0.77 \end{bmatrix}$ (columns: rented in A, W, B; rows: returned to A, W, B)
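The column-sum check in step 3 can also be done in code. The Python sketch below is illustrative only. It uses `math.isclose` because proportions such as 0.7 + 0.1 + 0.2 are not stored exactly as decimals in floating-point arithmetic.

```python
import math

labels = ["A", "W", "B"]   # Albury, Wodonga, Benalla
T = [[0.7, 0.05, 0.12],    # row: returned to A
     [0.1, 0.8, 0.11],     # row: returned to W
     [0.2, 0.15, 0.77]]    # row: returned to B

# Sum each column (each rental location) down the rows.
column_sums = [sum(T[r][c] for r in range(3)) for c in range(3)]
columns_ok = all(math.isclose(s, 1.0) for s in column_sums)

for label, s in zip(labels, column_sums):
    print(label, round(s, 6))
print("every column sums to 1:", columns_ok)
```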
A factory has a large number of machines. Machines can be in one of two states:
operating or broken. Broken machines are repaired and come back into operation, and
vice versa. On a given day:
85% of machines that are operational stay operating
15% of machines that are operating break down
ISBN 978-1-009-11041-9 © Peter Jones et al 2023 Cambridge University Press
Photocopying is restricted under law and this material must not be transferred to another party.
11A 11A Transition matrices - setting up a transition matrix 553
5% of machines that are broken are repaired and start operating again
95% of machines that are broken stay broken.
Construct a transition matrix to describe this situation. Use the columns to define the
situation at the ‘Start’ of the day and the rows to describe the situation at the ‘End’ of
the day.
Explanation Solution
1 There are two machine states: operating (O) or broken (B). To account for all the possibilities, a 2 × 2 transition matrix is needed. Construct a blank matrix, labelling the rows and columns O and B, respectively (columns: Start; rows: End).
Exercise 11A
Example 2 2 A factory has a large number of machines which can be in one of two states, operating
(O) or broken down (B). It is known that an operating machine breaks down by the
end of the day on 4% of the days, and that 98% of machines which have broken down
are repaired by the end of the day.
Complete the 2 × 2 transition matrix T to describe this.
T = Tomorrow
3 A large company has 1640 employees, 60% of whom currently work full-time (F) and
40% of whom currently work part-time (P). Every year 20% of full-time workers move
to part-time work, and 14% of part-time workers move to full-time work.
Complete the 2 × 2 transition matrix T to describe this.
This year
T = Next year
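[Transition diagram: states A and B; each step 30% stay at A and 35% stay at B.]
A transition matrix that provides the same information as the transition diagram is
(In each option, columns are Today A, B and rows are Tomorrow A, B.)
A $\begin{bmatrix} 65\% & 30\% \\ 70\% & 35\% \end{bmatrix}$ B $\begin{bmatrix} 30\% & 65\% \\ 70\% & 35\% \end{bmatrix}$ C $\begin{bmatrix} 30\% & 65\% \\ 35\% & 70\% \end{bmatrix}$
D $\begin{bmatrix} 30\% & 35\% \\ 65\% & 70\% \end{bmatrix}$ E $\begin{bmatrix} 65\% & 70\% \\ 35\% & 30\% \end{bmatrix}$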
Learning intentions
• To be able to interpret a transition matrix and a transition diagram.
Let us return to the car rental problem at the start of this section. As we saw then, the
following transition matrix, T , and its transition diagram can be used to describe the weekly
pattern of rental car returns in Bendigo and Colac.
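$T = \begin{bmatrix} 0.80 & 0.10 \\ 0.20 & 0.90 \end{bmatrix}$ (columns: rented in B, C; rows: returned to B, C)
B – Bendigo, C – Colac
[Transition diagram: 80% stay at B, 90% stay at C, 20% move from B to C and 10% move from C to B.]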
Further, if 40 cars are rented in Colac this week, the transition matrix predicts that:
10% or 4 of these cars will be returned to Bendigo next week (0.10 × 40 = 4)
90% or 36 of these cars will be returned to Colac next week (0.90 × 40 = 36).
The following transition matrix, T , and its transition diagram can be used to describe the
weekly pattern of rental car returns in three locations: Albury, Wodonga and Benalla.
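A = Albury, B = Benalla, W = Wodonga
$T = \begin{bmatrix} 0.7 & 0.05 & 0.12 \\ 0.1 & 0.8 & 0.11 \\ 0.2 & 0.15 & 0.77 \end{bmatrix}$ (columns: rented in A, W, B; rows: returned to A, W, B)
[Transition diagram for Albury (A), Wodonga (W) and Benalla (B).]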
Use the transition matrix T and its transition diagram to answer the following questions.
a What percentage of cars rented in Wodonga each week are predicted to be returned to:
i Albury? ii Benalla? iii Wodonga?
b Two hundred cars were rented in Albury this week. How many of these cars do we
expect to be returned to:
i Albury? ii Benalla? iii Wodonga?
c What percentage of cars rented in Benalla each week are not expected to be returned to Benalla?
d One hundred and sixty cars were rented in Albury this week. How many of these cars
are expected to be returned to either Benalla or Wodonga?
a i 0.05 or 5% ii 0.15 or 15% iii 0.80 or 80%
b i 0.70 × 200 = 140 cars ii 0.20 × 200 = 40 cars iii 0.10 × 200 = 20 cars
c 11 + 12 = 23% or 100 − 77 = 23%
d 20% of 160 + 10% of 160 = 48 cars
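Parts b and d multiply a column of T by a number of cars. This is the same as multiplying T by a column matrix that has those cars in the rented-from row and zeros elsewhere. The Python sketch below is an illustration of that idea, not the book's method. The state order A, W, B follows the matrix above, and `mat_vec` is our own helper.

```python
import math


def mat_vec(T, v):
    """Multiply a matrix T (list of rows) by a column matrix v (a list)."""
    return [sum(t * x for t, x in zip(row, v)) for row in T]


T = [[0.7, 0.05, 0.12],   # order of rows and columns: A, W, B
     [0.1, 0.8, 0.11],
     [0.2, 0.15, 0.77]]

# a: percentages for cars rented in Wodonga form the W column
wodonga_column = [row[1] for row in T]

# b: 200 cars rented in Albury, none elsewhere -> returned to A, W, B
returned = mat_vec(T, [200, 0, 0])

# c: proportion of Benalla rentals not returned to Benalla
not_benalla = 1 - T[2][2]

# d: 160 Albury rentals returned to either Wodonga or Benalla
albury_160 = mat_vec(T, [160, 0, 0])
to_w_or_b = albury_160[1] + albury_160[2]

print(wodonga_column, returned, round(not_benalla, 2), round(to_w_or_b, 2))
```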
Exercise 11B
b Eighty people are seen buying popcorn at the movies. How many of these are
expected to buy popcorn next time they go to the movies?
c Sixty people are seen buying an ice cream at the movies. How many of these are
expected to buy popcorn next time they go to the movies?
d On another occasion, 120 people are seen buying popcorn and 40 are seen buying
an ice cream. How many of these are expected to buy an ice cream next time they
attend the movies?
2 On Windy Island, sea birds are observed nesting at three sites: A, B and C. The
following transition matrix and accompanying transition diagram can be used to predict
the movement of these sea birds between these sites from year to year.
$T = \begin{bmatrix} 1.0 & 0.10 & 0.05 \\ 0 & 0.80 & 0.05 \\ 0 & 0.10 & 0.90 \end{bmatrix}$ (columns: this year A, B, C; rows: next year A, B, C)
[Transition diagram for sites A, B and C.]
a What percentage of sea birds nesting at site B this year were expected to nest at:
i site A next year? ii site B next year? iii site C next year?
b This year, 850 sea birds were observed nesting at site B. How many of these are
expected to:
i still nest at site B next year? ii move to site A to nest next year?
c This year, 1150 sea birds were observed nesting at site A. How many of these birds
are expected to nest at:
i site A next year? ii site B next year? iii site C next year?
d What does the ‘1’ in column A, row A of the transition matrix indicate?
C in the long term, all of the children will choose the same activity.
D Sport is the most popular activity in the first week
E 40% of the students will do First Aid each week.
We return again to the car rental problem. The car rental firm now plans to buy 90 new cars.
Fifty will be based in Bendigo and 40 in Colac.
Given this pattern of rental car returns, the first question the manager would like answered is:
‘If we start with 50 cars in Bendigo, and 40 cars in Colac, how many cars will be
available for rent at both towns after 1 week, 2 weeks, etc?’
You have met this type of problem earlier when doing financial modelling (Chapter 8). For
example, if we invest $1000 at an interest rate of 5% per annum, how much will we have
after 1 year, 2 years, 3 years, etc?
We solved this type of problem by using a recurrence relation to model the growth in our
investment year-by-year. We do the same with the car rental problem, the only difference
being that we are now working with matrices.
Thus, after 2 weeks we predict that there will be 39.8 cars in Bendigo and 50.2 in Colac.
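Generating $S_3$
After 3 weeks:
$S_3 = TS_2 = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}\begin{bmatrix} 39.8 \\ 50.2 \end{bmatrix} \approx \begin{bmatrix} 36.9 \\ 53.1 \end{bmatrix}$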
Thus, after 3 weeks we predict that there will be 36.9 cars in Bendigo and 53.1 in Colac.
A pattern is now emerging. So far we have seen that:
S1 = T S0
S2 = T S1
S3 = T S2
If we continue this pattern we have:
S4 = T S3
S5 = T S4
or, more generally, $S_{n+1} = TS_n$.
With this rule as a starting point, we now have a recurrence relation that will enable us to
model and analyse the car rental problem on a step-by-step basis.
Recurrence relation
$S_0$ = initial value, $S_{n+1} = TS_n$
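The recurrence relation can be run step by step in a few lines of Python. This sketch is illustrative only, and the helper names `mat_vec` and `generate_states` are our own. Each new state matrix is computed from the previous one, exactly as $S_{n+1} = TS_n$ describes.

```python
def mat_vec(T, v):
    """Multiply a square matrix T (list of rows) by a column matrix v."""
    return [sum(t * x for t, x in zip(row, v)) for row in T]


def generate_states(T, S0, steps):
    """Return [S0, S1, ..., S_steps] using S_{n+1} = T S_n."""
    states = [S0]
    for _ in range(steps):
        states.append(mat_vec(T, states[-1]))
    return states


T = [[0.8, 0.1],   # columns: rented in Bendigo, Colac
     [0.2, 0.9]]   # rows: returned to Bendigo, Colac
S0 = [50, 40]      # cars in Bendigo and Colac at the start

states = generate_states(T, S0, 3)
for n, S in enumerate(states):
    print(n, [round(x, 2) for x in S])
```

Because each column of T sums to 1, the total number of cars stays at 90 at every step.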
The factory has a large number of machines. The machines can be in one of two states:
operating (O) or broken (B). Broken machines are repaired and come back into operation
and vice versa.
At the start, 80 machines are operating and 20 are broken.
Use the recurrence relation
$S_0$ = initial value, $S_{n+1} = TS_n$,
where $S_0 = \begin{bmatrix} 80 \\ 20 \end{bmatrix}$ and $T = \begin{bmatrix} 0.85 & 0.05 \\ 0.15 & 0.95 \end{bmatrix}$,
to determine the number of operational and broken machines after 1 day and after 3 days.
Explanation Solution
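1 Write down a column matrix $S_0 = \begin{bmatrix} 80 \\ 20 \end{bmatrix}$ representing the initial operational state of the machines, and the transition matrix $T = \begin{bmatrix} 0.85 & 0.05 \\ 0.15 & 0.95 \end{bmatrix}$.
2 Use the rule $S_{n+1} = TS_n$ to determine the operational state of the machines after one day by forming the product $S_1 = TS_0$.
$S_1 = TS_0 = \begin{bmatrix} 0.85 & 0.05 \\ 0.15 & 0.95 \end{bmatrix}\begin{bmatrix} 80 \\ 20 \end{bmatrix} = \begin{bmatrix} 69 \\ 31 \end{bmatrix}$
After 1 day, 69 machines are operational and 31 are broken.
3 To find the operational state of the machines after 3 days, we must first find the operating state of the machines after 2 days ($S_2$) and use this matrix to find $S_3$ using $S_3 = TS_2$.
$S_2 = TS_1 = \begin{bmatrix} 0.85 & 0.05 \\ 0.15 & 0.95 \end{bmatrix}\begin{bmatrix} 69 \\ 31 \end{bmatrix} = \begin{bmatrix} 60.2 \\ 39.8 \end{bmatrix}$
$S_3 = TS_2 = \begin{bmatrix} 0.85 & 0.05 \\ 0.15 & 0.95 \end{bmatrix}\begin{bmatrix} 60.2 \\ 39.8 \end{bmatrix} = \begin{bmatrix} 53.16 \\ 46.84 \end{bmatrix}$
After 3 days, 53 machines are operating and 47 are broken.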
and so on.
The factory has a large number of machines. The machines can be in one of two states:
operating (O) or broken (B). Broken machines are repaired and come back into operation
and vice versa.
Initially, 80 machines are operating and 20 are broken, so:
$S_0 = \begin{bmatrix} 80 \\ 20 \end{bmatrix}$ and $T = \begin{bmatrix} 0.85 & 0.05 \\ 0.15 & 0.95 \end{bmatrix}$
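Example 6
We have a transition matrix $T = \begin{bmatrix} 0.6 & 0.3 \\ 0.4 & 0.7 \end{bmatrix}$ and we know that the state matrix $S_4 = \begin{bmatrix} 25\,587 \\ 34\,413 \end{bmatrix}$.
Determine $S_3$ and $S_2$.
We know that $S_4 = TS_3$. Hence $S_3 = T^{-1}S_4$. First, $T^{-1} = \begin{bmatrix} \frac{7}{3} & -1 \\ -\frac{4}{3} & 2 \end{bmatrix}$.
You should hold this in your calculator and then
$S_3 = T^{-1}S_4 = \begin{bmatrix} \frac{7}{3} & -1 \\ -\frac{4}{3} & 2 \end{bmatrix}\begin{bmatrix} 25\,587 \\ 34\,413 \end{bmatrix} = \begin{bmatrix} 25\,290 \\ 34\,710 \end{bmatrix}$
$S_2 = T^{-1}S_3 = \begin{bmatrix} \frac{7}{3} & -1 \\ -\frac{4}{3} & 2 \end{bmatrix}\begin{bmatrix} 25\,290 \\ 34\,710 \end{bmatrix} = \begin{bmatrix} 24\,300 \\ 35\,700 \end{bmatrix}$
Note: To calculate $S_2$ directly from $S_4$ we could have used:
$S_2 = (T^{-1})^2 S_4 = T^{-2}S_4$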
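Stepping backwards with $T^{-1}$ can be checked exactly in Python using the standard `fractions` module, so that $\frac{7}{3}$ and $-\frac{4}{3}$ are not rounded. This is a minimal sketch, not the calculator method. It uses the 2 × 2 inverse formula $\frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$, and the helper names are our own.

```python
from fractions import Fraction as F


def inverse_2x2(M):
    """Inverse of [[a, b], [c, d]] is (1/det) [[d, -b], [-c, a]]."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix has no inverse (determinant is 0)")
    return [[d / det, -b / det], [-c / det, a / det]]


def mat_vec(M, v):
    """Multiply a matrix M (list of rows) by a column matrix v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


T = [[F(6, 10), F(3, 10)],
     [F(4, 10), F(7, 10)]]
S4 = [25587, 34413]

T_inv = inverse_2x2(T)     # [[7/3, -1], [-4/3, 2]]
S3 = mat_vec(T_inv, S4)    # step back one week
S2 = mat_vec(T_inv, S3)    # step back again

# The same answer in one go: S2 = (T^-1)^2 S4
S2_direct = mat_vec(T_inv, mat_vec(T_inv, S4))

print(T_inv, S3, S2)
```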
Week 0 1 2 3 4–11 12 13 14 15
State matrix (Bendigo) 50 44 39.8 36.9 ... 30.3 30.2 30.1 30.1
State matrix (Colac) 40 46 50.2 53.1 ... 59.7 59.8 59.9 59.9
What you should notice is that, as the weeks go by, the number of cars at each of the locations starts to settle down. We call this the steady-state (or equilibrium-state) solution.
For the rental car problem, the steady-state solution is 30.1 (in practice, 30) cars at the Bendigo branch and 59.9 (in practice, 60) cars at the Colac branch. From then on, the numbers of cars at each location effectively stop changing.
This can be seen more clearly in the graph below (the points have been joined to guide the eye).
[Graph: number of cars against week (weeks 0 to 15). Bendigo: initial value 50, steady-state value 30. Colac: initial value 40, steady-state value 60.]
In summary, even though the number of cars returned to each location varied from week to week, the numbers at each location eventually settled down to an equilibrium or steady-state solution.
In the steady state, the number of cars at each location remains the same.
1 In the steady state, cars are still moving between Bendigo and Colac, but the number
of cars rented in Bendigo and returned to Colac is balanced by the number of cars
rented in Colac and returned to Bendigo. Because of this balance, the steady state is
also called the equilibrium state.
2 A system is guaranteed to reach a steady state, whatever its initial state, if its transition matrix is regular and its columns add up to 1. A regular matrix is one for which some power contains no zero elements. In practical terms, this means that every state represented in the transition matrix is accessible, either directly or indirectly, from every other state.
Estimate the steady-state solution by calculating $S_n$ for n = 10, 15, 17 and 18.
Explanation Solution
1 Write down the transition matrix T and initial state matrix $S_0$. Enter the matrices into your calculator, naming them T and S.
$T = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}$, $S_0 = \begin{bmatrix} 50 \\ 40 \end{bmatrix}$
2 Use the rule $S_n = T^nS_0$ to write down the expression for the nth state for n = 10.
$S_n = T^nS_0$
∴ $S_{10} = T^{10}S_0$
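The same estimates can be produced with a short Python sketch that raises T to a power by repeated multiplication. This is an illustration only, not the calculator method. The helper names are our own, and $T^0$ is taken to be the identity matrix.

```python
def mat_mul(A, B):
    """Multiply matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_pow(M, n):
    """Return M^n by repeated multiplication (M^0 is the identity)."""
    size = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, M)
    return result


def mat_vec(M, v):
    """Multiply a matrix M (list of rows) by a column matrix v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


T = [[0.8, 0.1],
     [0.2, 0.9]]
S0 = [50, 40]

# S_n = T^n S_0 for the values of n asked for in the example
estimates = {n: mat_vec(mat_pow(T, n), S0) for n in (10, 15, 17, 18)}
for n, S in estimates.items():
    print(n, [round(x, 1) for x in S])
```

By n = 17 the rounded values have settled at 30 cars in Bendigo and 60 cars in Colac, in agreement with the steady state described above.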
Exercise 11C
7 Imagine that we live in a world in which people are either ‘happy’ or ‘unhappy’, but the way people feel can change from day to day.
In this world:
90% of people who are happy today will be happy tomorrow
10% of people who are happy today will be unhappy tomorrow
40% of people who are unhappy today will be unhappy tomorrow
60% of people who are unhappy today will be happy tomorrow.
8 In another model of this world, people can be ‘happy’, ‘neither happy nor sad’, or
‘sad’, but the way people feel can change from day to day.
The transition matrix opposite shows how people’s feelings may vary from day to day in this world, and the proportions of people
$T = \begin{bmatrix} 0.80 & 0.40 & 0.35 \\ 0.15 & 0.30 & 0.40 \\ 0.05 & 0.30 & 0.25 \end{bmatrix}$ (rows and columns H, N, S)
In the transition matrix, the columns define the situation today and the rows define the
situation tomorrow.
a On a given day, out of 2000 people, 1200 are ‘happy’, 600 are ‘neither happy nor
sad’ and 200 are ‘sad’. Write down a matrix, S 0 , that describes this situation.
b The next day, how many people do we expect to be happy?
c After 5 days, how many people do we expect to be happy?
d In the long term, how many of the 2000 people do we expect to be happy?
On Monday 25% of the students ate Crispies. What percentage of the students ate
Krunchies on Tuesday?
A 47.5% B 48% C 50.25% D 52.5% E 62%
10 A factory employs the same number of workers each day. The workers are allocated to
work with either machine A or machine B. The workers may be allocated to work on a
different machine from day to day, as shown in the transition matrix below.
$T = \begin{bmatrix} 0.32 & 0.16 \\ 0.68 & 0.84 \end{bmatrix}$ (rows and columns A, B)
Machine A has 72 workers on it each day. Each day, the number of workers on machine B
will be
A 24 B 36 C 72 D 288 E 306
$S_0 = \begin{bmatrix} 10 \\ 20 \\ 30 \end{bmatrix}$, $S_{n+1} = TS_n$ where $T = \begin{bmatrix} x & 0.4 & y \\ 0.6 & z & 0.4 \\ 0.1 & 0.2 & w \end{bmatrix}$
Matrix T is a regular transition matrix. Given that S 1 = 26 which of the following is
A x = 0.3, y = 0.1, z = 0.4, w = 0.5 B x = 0.3, y = 0.3, z = 0.4, w = 0.7
C x = 0.2, y = 0.7, z = 0.3, w = 0.3 D x = 0.2, y = 0.8, z = 0.3, w = 0.2
E x = 0.3, y = 0.6, z = 0.4, w = 0.4
To increase the number of cars, two extra cars are added to the rental fleet at each location
each week. The recurrence relation that can be used to model this situation is:
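$S_0 = \begin{bmatrix} 50 \\ 40 \end{bmatrix}$, $S_{n+1} = TS_n + B$ where $T = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}$ and $B = \begin{bmatrix} 2 \\ 2 \end{bmatrix}$
Determine the number of cars at Bendigo and Colac after:
a 1 week b 2 weeks.
Explanation Solution
a Use the rule $S_1 = TS_0 + B$ to determine the state matrix after 1 week and write your conclusion.
$S_1 = TS_0 + B = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}\begin{bmatrix} 50 \\ 40 \end{bmatrix} + \begin{bmatrix} 2 \\ 2 \end{bmatrix} = \begin{bmatrix} 44 \\ 46 \end{bmatrix} + \begin{bmatrix} 2 \\ 2 \end{bmatrix} = \begin{bmatrix} 46 \\ 48 \end{bmatrix}$
Thus, we predict that there will be 46 cars in Bendigo and 48 cars in Colac.
b Use the rule $S_2 = TS_1 + B$ to determine the state matrix after 2 weeks and write your conclusion.
$S_2 = TS_1 + B = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}\begin{bmatrix} 46 \\ 48 \end{bmatrix} + \begin{bmatrix} 2 \\ 2 \end{bmatrix} = \begin{bmatrix} 41.6 \\ 52.4 \end{bmatrix} + \begin{bmatrix} 2 \\ 2 \end{bmatrix} = \begin{bmatrix} 43.6 \\ 54.4 \end{bmatrix}$
Thus, we predict that there will be 43.6 cars in Bendigo and 54.4 cars in Colac.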
Unfortunately, the recurrence rule S n+1 = T S n + B does not lead to a simple rule for the state
matrix after n steps, so we need to work our way through this sort of problem step-by-step.
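Working step by step is exactly what a loop does. The Python sketch below is an illustration only, and the helper names `mat_vec` and `step` are our own. It applies $S_{n+1} = TS_n + B$ twice for the rental-car example.

```python
def mat_vec(T, v):
    """Multiply a matrix T (list of rows) by a column matrix v."""
    return [sum(t * x for t, x in zip(row, v)) for row in T]


def step(T, S, B):
    """One application of S_{n+1} = T S_n + B."""
    return [ts + b for ts, b in zip(mat_vec(T, S), B)]


T = [[0.8, 0.1],
     [0.2, 0.9]]
B = [2, 2]        # two extra cars added at each location each week
S = [50, 40]      # S0: cars in Bendigo and Colac

history = [S]
for week in range(2):
    S = step(T, S, B)
    history.append(S)

for n, state in enumerate(history):
    print(n, [round(x, 1) for x in state])
```

Since the columns of T add to 1, the fleet grows by exactly 4 cars (2 + 2) each week: 90, 94, 98, and so on.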
Exercise 11D
i $S_1$ ii $S_2$
Practical application
2 On Windy Island, sea birds are observed nesting at three sites: A, B and C. The
following transition matrix and accompanying transition diagram can be used to predict
the movement of sea birds between these sites from year to year.
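$T = \begin{bmatrix} 1.0 & 0.10 & 0.05 \\ 0 & 0.80 & 0.05 \\ 0 & 0.10 & 0.90 \end{bmatrix}$ (columns: this year A, B, C; rows: next year A, B, C)
[Transition diagram for sites A, B and C.]
Initially, 10 000 sea birds were observed nesting at each site, so $S_0 = \begin{bmatrix} 10\,000 \\ 10\,000 \\ 10\,000 \end{bmatrix}$.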
a Use the recurrence rule S n+1 = T S n to:
i determine S 1 , the state matrix after 1 year
ii predict the number of sea birds nesting at site B after 2 years.
b Without calculation, write down the number of sea birds predicted to nest at each of
the three sites in the long term. Explain why this can be done without calculation.
c To help solve the problem of having all the birds eventually nesting at site A, the
ranger suggests that 2000 sea birds could be removed from site A each year and
relocated in equal numbers to sites B and C.
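The state matrix, $S_2$, is now given by
$S_2 = TS_1 + N$
where $S_1 = \begin{bmatrix} 10\,000 \\ 10\,000 \\ 10\,000 \end{bmatrix}$, $T = \begin{bmatrix} 1.0 & 0.10 & 0.05 \\ 0 & 0.80 & 0.05 \\ 0 & 0.10 & 0.90 \end{bmatrix}$ and $N = \begin{bmatrix} -2000 \\ 1000 \\ 1000 \end{bmatrix}$.
i $S_2$ ii $S_3$ (assuming that $S_3 = TS_2 + N$) iii $S_4$ (assuming that $S_4 = TS_3 + N$).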
4 Supporters of a football team attend home games. There are 3 areas, bays A, B and C,
where they sit. There is considerable moving of position from game to game and the
numbers attending the home games gradually decline as the year progresses. Let $X_n$ be the state matrix that shows the number of supporters in each bay n weeks into the season. The number of supporters in each location can be determined by the matrix recurrence relation
$X_{n+1} = TX_n - D$
This game
0.1 0.2 0.5 A
and D = 70
T = 0.3 0.7 0.2 B Next game
0.6 0.1 0.3 C
If X3 = 11 130 then X2 =
12370 4000 12 000 7500 79300
A 9510 B 10 000 C 11 000 D 8600 E 245232
4650 15 000 5000 12 000 231011
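One way to answer a "work backwards" question like this is to solve $X_2 = T^{-1}(X_3 + D)$; another is simply to push each option forward one step and see which one reproduces $X_3$. The sketch below takes the second route. It assumes D has 70 in every row and compares only the bay B entry of $X_3$ (11 130); `next_state` is a hypothetical helper name.

```python
# Test each answer option: apply X_3 = T X_2 - D and compare the bay B entry
# with 11 130. Assumption: D has 70 in every row.

T = [[0.1, 0.2, 0.5],
     [0.3, 0.7, 0.2],
     [0.6, 0.1, 0.3]]
D = [70, 70, 70]

options = {
    "A": [12370, 9510, 4650],
    "B": [4000, 10000, 15000],
    "C": [12000, 11000, 5000],
    "D": [7500, 8600, 12000],
    "E": [79300, 245232, 231011],
}


def next_state(T, X, D):
    """One step of X_{n+1} = T X_n - D."""
    return [sum(t * x for t, x in zip(row, X)) - d for row, d in zip(T, D)]


matches = [label for label, X2 in options.items()
           if abs(next_state(T, X2, D)[1] - 11130) < 1e-6]
print(matches)  # ['B']
print([round(v, 2) for v in next_state(T, options["B"], D)])  # [9830.0, 11130.0, 7830.0]
```

Checking forwards avoids finding $T^{-1}$, at the cost of one multiplication per option.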
Leslie matrices are used to construct discrete models of population growth. In particular, they
are used to model changes in the sizes of different age groups within a population.
Note: Only the females of the species are counted in the population, as they are the ones who give birth to
the new members of the population.
A Leslie matrix is a transition matrix that can be used to describe the way a population changes
over time. It takes into account two factors for the females in each age group: the birth rate, $b_i$,
and the survival rate, $s_i$, where i is the number of the age group.
Birth rates We ignore migration, and so the population growth is entirely due to new female
births. The birth rate, $b_i$, for age group i is the average number of female offspring from a mother
in age group i during one time period. For example, the average birth rate of women in age group 3
(20–30 years) might be 1.7 female children for the 10-year period.
Survival rates The survival rate, $s_i$, for age group i is the proportion of the population in
age group i that progress to age group i + 1. Note that $0 \le s_i \le 1$.
For example, the survival rate for age group 2 might be 0.95; that is, 95% of females in this
10–20 year age group would survive to progress to age group 3, 20–30 years.
Note: The survival rate of the last age group (100–110 years) is taken to be 0.
A simple example
We start with a simple example where the life span of the species is 9 years. We will divide the
population into three age groups. This means we use a time period of 3 years.
Age group(i) 1 2 3
Age range (years) 0–3 3–6 6–9
[Life-cycle diagram: age group 1 → 2 with survival rate 0.6; age group 2 → 3 with survival rate 0.3]
We can now use the Leslie matrix, L, together with the initial state matrix $S_0$ to generate $S_1$,
the state matrix after one time period, which gives the size of each age group after one time period
(3 years):
$S_1 = LS_0 = \begin{bmatrix} 0 & 2.3 & 0.4 \\ 0.6 & 0 & 0 \\ 0 & 0.3 & 0 \end{bmatrix}\begin{bmatrix} 400 \\ 400 \\ 400 \end{bmatrix} = \begin{bmatrix} 1080 \\ 240 \\ 120 \end{bmatrix}$
Thus after one time period, there are 1080 females in age group 1, 240 in age group 2 and 120
in age group 3 and the total population size has increased from 1200 (= 400 + 400 + 400) to
1440 (= 1080 + 240 + 120). Similarly, to find the number in each age group after two time
periods we calculate $S_2$ from $S_1$ as follows:
$S_2 = LS_1 = \begin{bmatrix} 0 & 2.3 & 0.4 \\ 0.6 & 0 & 0 \\ 0 & 0.3 & 0 \end{bmatrix}\begin{bmatrix} 1080 \\ 240 \\ 120 \end{bmatrix} = \begin{bmatrix} 600 \\ 648 \\ 72 \end{bmatrix}$
Thus, after two time periods, there are 600 females in age group 1, 648 in age group 2 and 72 in
age group 3, and the overall population size has decreased to 1320.
Finding the population matrix $S_n$ after n time periods
To speed up the process we can make use of the explicit formula for the state matrix $S_n$ after n
time periods. Notice that there is a pattern when calculating the population state matrices:
$S_1 = LS_0$
$S_2 = LS_1 = L^2 S_0$
$S_3 = LS_2 = L^3 S_0$
$\vdots$
$S_{n+1} = LS_n = L^{n+1} S_0$
In general, we can find the population matrix $S_n$ using the rule
$S_n = L^n S_0$
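Using this rule, to find $S_3$, we have
$S_3 = L^3 S_0 = \begin{bmatrix} 0 & 2.3 & 0.4 \\ 0.6 & 0 & 0 \\ 0 & 0.3 & 0 \end{bmatrix}^3 \begin{bmatrix} 400 \\ 400 \\ 400 \end{bmatrix} = \begin{bmatrix} 1519.2 \\ 360 \\ 194.4 \end{bmatrix}$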
Continuing in this way, we can see the change over time in the total population and in the
distribution of the age groups.
Time period 0 1 2 3 4 5
Age 0–3 years 400 1080 600 1519.2 905.76 2139.70
Age 3–6 years 400 240 648 360.0 911.52 543.46
Age 6–9 years 400 120 72 194.4 108.00 273.46
Total 1200 1440 1320 2073.6 1925.28 2956.61
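The table can be regenerated with a few lines of code. This is a minimal sketch using the recursive rule $S_{n+1} = LS_n$ with the simple-example matrix L and $S_0 = [400; 400; 400]$; `mat_vec` is a hypothetical helper, not a calculator command.

```python
# Regenerate the table above with the recursive rule S_{n+1} = L S_n.

L = [[0, 2.3, 0.4],
     [0.6, 0, 0],
     [0, 0.3, 0]]
S0 = [400, 400, 400]


def mat_vec(M, v):
    """Multiply matrix M (list of rows) by column matrix v (flat list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


table = [S0]
for _ in range(5):                      # time periods 1 to 5
    table.append(mat_vec(L, table[-1]))

for n, state in enumerate(table):
    print(n, [round(x, 2) for x in state], "total:", round(sum(state), 2))
# last line printed: 5 [2139.7, 543.46, 273.46] total: 2956.61
```

Keeping every state in a list makes it easy to print the whole table, including the alternating rise and fall in the total population.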
Leslie matrices
An m × m Leslie matrix has the form
$L = \begin{bmatrix} b_1 & b_2 & b_3 & \cdots & b_{m-1} & b_m \\ s_1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & s_2 & 0 & \cdots & 0 & 0 \\ 0 & 0 & s_3 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & s_{m-1} & 0 \end{bmatrix}$
where
m is the number of age groups being considered
$s_i$, the survival rate, is the proportion of the population in age group i that progress to
age group i + 1
$b_i$, the birth rate, is the average number of female offspring from a mother in
age group i during one time period.
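Leslie matrix and its interpretation
$L = \begin{bmatrix} 0 & 1.4 & 1.2 & 0.3 \\ 0.6 & 0 & 0 & 0 \\ 0 & 0.5 & 0 & 0 \\ 0 & 0 & 0.1 & 0 \end{bmatrix}$ (columns: from age group 1, 2, 3, 4; rows: to age group 1, 2, 3, 4)
This is a Leslie matrix with 4 age groups. The corresponding life-cycle transition
diagram is shown here.
[Life-cycle diagram: 1 → 2 (0.6), 2 → 3 (0.5), 3 → 4 (0.1); births 1.4, 1.2 and 0.3 from age groups 2, 3 and 4 back to age group 1]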
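The layout described above (birth rates across the first row, survival rates just below the main diagonal, zeros elsewhere) can be captured in a small function. This is a sketch only; `leslie` is a hypothetical name, and the example call rebuilds the four-age-group matrix just shown.

```python
# Build an m x m Leslie matrix from birth rates b_1..b_m and survival rates
# s_1..s_{m-1}: the births fill the first row and the survival rates sit just
# below the main diagonal; every other entry is 0.

def leslie(births, survivals):
    m = len(births)
    if len(survivals) != m - 1:
        raise ValueError("an m x m Leslie matrix needs m - 1 survival rates")
    if any(not 0 <= s <= 1 for s in survivals):
        raise ValueError("survival rates must satisfy 0 <= s_i <= 1")
    L = [[0] * m for _ in range(m)]
    L[0] = list(births)                # row 1: b_1, b_2, ..., b_m
    for i, s in enumerate(survivals):
        L[i + 1][i] = s                # s_i moves age group i into age group i + 1
    return L


L = leslie([0, 1.4, 1.2, 0.3], [0.6, 0.5, 0.1])
for row in L:
    print(row)
```

The checks reject a survival rate outside $0 \le s_i \le 1$, which matches the restriction stated earlier.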
Recursive rules
The population matrix $S_n$ is an m × 1 matrix representing the size of each age group after
n time periods. This is calculated using the recursive formula
$S_0$ = initial state matrix, $S_{n+1} = LS_n$
or the explicit rule
$S_n = L^n S_0$
Use the Leslie matrix and initial state matrix below to answer the following questions.
$L = \begin{bmatrix} 0 & 1.8 & 2.6 & 0.1 \\ 0.2 & 0 & 0 & 0 \\ 0 & 0.4 & 0 & 0 \\ 0 & 0 & 0.3 & 0 \end{bmatrix}$ (columns: from age group 1, 2, 3, 4; rows: to age group 1, 2, 3, 4), $S_0 = \begin{bmatrix} 1000 \\ 0 \\ 0 \\ 0 \end{bmatrix}$
a Write down
i the birth rate for age group 2 ii the survival rate for age group 3
b Complete a life-cycle diagram for this Leslie matrix.
c Evaluate the following population state matrices:
$S_1$, $S_5$ and $S_{20}$
d Given that S 16 = , determine S 17
Explanation Solution
a i The birth rate for age group 2 is Birth rate for age group 2 = 1.8
given in the matrix position, row 1,
column 2.
ii The survival rate for age group 3 is Survival rate for age group 3 = 0.3
given in the matrix position, row 4,
column 3.
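b Survival rates: $s_1 = 0.2$, $s_2 = 0.4$, $s_3 = 0.3$
Birth rates: $b_2 = 1.8$, $b_3 = 2.6$, $b_4 = 0.1$
[Life-cycle diagram: 1 → 2 (0.2), 2 → 3 (0.4), 3 → 4 (0.3); births 1.8, 2.6 and 0.1 from age groups 2, 3 and 4 back to age group 1]
c Using a calculator:
$S_1 = LS_0 = \begin{bmatrix} 0 \\ 200 \\ 0 \\ 0 \end{bmatrix}$, $S_5 = L^5 S_0 = \begin{bmatrix} 149.76 \\ 26.4 \\ 16.64 \\ 8.64 \end{bmatrix}$, $S_{20} = L^{20} S_0 = \begin{bmatrix} 3.84 \\ 0.97 \\ 0.49 \\ 0.19 \end{bmatrix}$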
d $S_{17} = LS_{16}$
(Further investigation would reveal that the population continues to decrease over time.)
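The explicit rule $S_n = L^n S_0$ and the recursive rule should always agree. The sketch below checks this for $S_5$ using the matrix and initial state from this example; `mat_mul`, `mat_vec` and `mat_pow` are hypothetical helpers, and the power is found by plain repeated multiplication.

```python
# Compare the explicit rule S_n = L^n S_0 with the recursive rule S_{n+1} = L S_n.

def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def mat_pow(M, n):
    """M^n for a whole number n >= 0, by repeated multiplication."""
    size = len(M)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, M)
    return result


L = [[0, 1.8, 2.6, 0.1],
     [0.2, 0, 0, 0],
     [0, 0.4, 0, 0],
     [0, 0, 0.3, 0]]
S0 = [1000, 0, 0, 0]

S5_explicit = mat_vec(mat_pow(L, 5), S0)

S5_recursive = S0
for _ in range(5):
    S5_recursive = mat_vec(L, S5_recursive)

print([round(x, 2) for x in S5_explicit])   # [149.76, 26.4, 16.64, 8.64]
print([round(x, 2) for x in S5_recursive])  # [149.76, 26.4, 16.64, 8.64]
```

The explicit rule is handy when only one distant state such as $S_{20}$ is needed; the recursive rule is handy when every intermediate state is wanted.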
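Explanation Solution
a Enter the initial population numbers into a 5 × 1 matrix.
$S_0 = \begin{bmatrix} 10 \\ 25 \\ 40 \\ 20 \\ 15 \end{bmatrix}$
c Survival rates: $s_1 = 0.6$, $s_2 = 0.7$, $s_3 = 0.5$, $s_4 = 0.2$
Birth rates: $b_2 = 0.2$, $b_3 = 0.9$, $b_4 = 0$, $b_5 = 0$
[Life-cycle diagram: 1 → 2 (0.6), 2 → 3 (0.7), 3 → 4 (0.5), 4 → 5 (0.2); births 0.2 and 0.9 from age groups 2 and 3 back to age group 1]
d $S_3 = L^3 S_0 = \begin{bmatrix} 0 & 0.2 & 0.9 & 0 & 0 \\ 0.6 & 0 & 0 & 0 & 0 \\ 0 & 0.7 & 0 & 0 & 0 \\ 0 & 0 & 0.5 & 0 & 0 \\ 0 & 0 & 0 & 0.2 & 0 \end{bmatrix}^3 \begin{bmatrix} 10 \\ 25 \\ 40 \\ 20 \\ 15 \end{bmatrix} = \begin{bmatrix} 8.7 \\ 10.17 \\ 17.22 \\ 2.1 \\ 1.75 \end{bmatrix}$
There are two goats in the 3–4 year old age group in this population.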
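i $S_5 = \begin{bmatrix} 1000 \\ 250 \\ 62.5 \end{bmatrix}$ ii $S_{10} = \begin{bmatrix} 2500 \\ 531.25 \\ 218.75 \end{bmatrix}$ iii $S_{50} = \begin{bmatrix} 2\,777\,063 \\ 582\,688.05 \\ 244\,521.18 \end{bmatrix}$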
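Example 12 A Leslie matrix and state matrix with constant rate of increase
i $S_{10} = \begin{bmatrix} 9906.78 \\ 4953.39 \\ 1238.35 \end{bmatrix}$ ii $S_{14} = \begin{bmatrix} 20\,542.70 \\ 10\,271.35 \\ 2567.84 \end{bmatrix}$ iii $S_{15} = \begin{bmatrix} 24\,651.24 \\ 12\,325.62 \\ 3081.4043 \end{bmatrix}$
By comparing $S_{14}$ and $S_{15}$,
$\dfrac{24\,651.24}{20\,542.70} = \dfrac{12\,325.62}{10\,271.35} = \dfrac{3081.4043}{2567.84} \approx 1.2$
we find that the growth rate is 1.2.
c $8 : 4 : 1 = 1600 : 800 : 200 \approx 9906.78 : 4953.39 : 1238.35$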
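The "compare successive states" idea can be tried numerically. The sketch below is an illustration under an assumption: it uses the simple-example matrix L from earlier in this section ($b_2 = 2.3$, $b_3 = 0.4$, $s_1 = 0.6$, $s_2 = 0.3$) with $S_0 = [400; 400; 400]$ as a stand-in. For that matrix the ratios of successive states settle near 1.2 and the age groups settle near 8 : 4 : 1, consistent with the figures above, but the oscillations seen in the earlier table die out only slowly, so many periods are run.

```python
# Estimate the long-run growth factor and age structure of a Leslie model by
# running the recursion for many periods and comparing successive states.

def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


L = [[0, 2.3, 0.4],
     [0.6, 0, 0],
     [0, 0.3, 0]]
S = [400.0, 400.0, 400.0]

for _ in range(300):      # this matrix settles slowly, so run many periods
    S = mat_vec(L, S)
S_next = mat_vec(L, S)

ratios = [b / a for a, b in zip(S, S_next)]        # each close to 1.2
proportions = [x / S[2] for x in S]                 # close to 8 : 4 : 1
print([round(r, 3) for r in ratios])
print([round(p, 2) for p in proportions])
```

If the ratios still differ noticeably from one another, the population has not yet reached a constant rate of increase and more periods are needed.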
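a Let $b_3 = 8$. Use your calculator to store the matrices L and $S_0$. Then compute:
$S_1 = LS_0 = \begin{bmatrix} 0 \\ 250 \\ 0 \end{bmatrix}$, $S_2 = L^2 S_0 = \begin{bmatrix} 0 \\ 0 \\ 125 \end{bmatrix}$, $S_3 = L^3 S_0 = \begin{bmatrix} 1000 \\ 0 \\ 0 \end{bmatrix}$
The population will continue to cycle through these three states; this is because $L^3 = I$.
b Let $b_3 = 4$. Then a numerical investigation suggests that the population decreases over
the long term:
$S_1 = LS_0 = \begin{bmatrix} 0 \\ 250 \\ 0 \end{bmatrix}$, $S_5 = L^5 S_0 = \begin{bmatrix} 0 \\ 0 \\ 62.5 \end{bmatrix}$, $S_{50} = L^{50} S_0 = \begin{bmatrix} 0 \\ 0 \\ 0.0019 \end{bmatrix}$
c Let $b_3 = 10$. Then a numerical investigation suggests that the population increases over
the long term:
$S_1 = LS_0 = \begin{bmatrix} 0 \\ 250 \\ 0 \end{bmatrix}$, $S_5 = L^5 S_0 = \begin{bmatrix} 0 \\ 0 \\ 156.25 \end{bmatrix}$, $S_{50} = L^{50} S_0 = \begin{bmatrix} 0 \\ 0 \\ 4440.89 \end{bmatrix}$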
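The three parts differ only in $b_3$. Assuming, from $S_1 = [0; 250; 0]$ and $S_2 = [0; 0; 125]$ with $S_0 = [1000; 0; 0]$, that $s_1 = 0.25$, $s_2 = 0.5$ and the other birth rates are 0, the sketch below shows that $L^3 = b_3 s_1 s_2 I$. That single number explains the behaviour: it equals 1 for $b_3 = 8$ (cycling), 0.5 for $b_3 = 4$ (decline) and 1.25 for $b_3 = 10$ (growth). The helper names are hypothetical.

```python
# For a 3 x 3 Leslie matrix whose only birth rate is b3, L^3 = (b3 * s1 * s2) I.
# Assumed survival rates: s1 = 0.25, s2 = 0.5 (inferred from S_1 and S_2).

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def leslie3(b3, s1=0.25, s2=0.5):
    return [[0, 0, b3], [s1, 0, 0], [0, s2, 0]]


S0 = [1000, 0, 0]
cubes, s50 = {}, {}
for b3 in (8, 4, 10):
    L = leslie3(b3)
    cubes[b3] = mat_mul(mat_mul(L, L), L)   # L^3
    S = S0
    for _ in range(50):                     # S_50 by recursion
        S = mat_vec(L, S)
    s50[b3] = S
    print(f"b3 = {b3}: diagonal of L^3 = {cubes[b3][0][0]}, S_50 = {[round(x, 4) for x in S]}")
```

Since every third step multiplies the population by $b_3 s_1 s_2$, comparing that product with 1 predicts the long-term behaviour without a long numerical investigation.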
Exercise 11E
Example 8 1 Use the Leslie matrix and initial state matrix below to answer the following questions.
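$L = \begin{bmatrix} 0 & 1.9 & 2.1 & 1.1 \\ 0.7 & 0 & 0 & 0 \\ 0 & 0.5 & 0 & 0 \\ 0 & 0 & 0.6 & 0 \end{bmatrix}$ (columns: from age group 1, 2, 3, 4; rows: to age group 1, 2, 3, 4), $S_0 = \begin{bmatrix} 100 \\ 100 \\ 100 \\ 100 \end{bmatrix}$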
a Write down
i The birth rate for age group 2 ii The survival rate for age group 3
b Complete the life cycle diagram for this Leslie matrix.
c Evaluate the following population state matrices.
i S1 ii S 3 iii S 20
d Given that $S_7 =$ , determine $S_8$. Give your values correct to the nearest whole number.
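2 Complete the life-cycle diagram corresponding to each of the following Leslie matrices.
a $\begin{bmatrix} 0 & 2.9 & 3.1 & 2.1 \\ 0.8 & 0 & 0 & 0 \\ 0 & 0.7 & 0 & 0 \\ 0 & 0 & 0.5 & 0 \end{bmatrix}$ b $\begin{bmatrix} 0 & 0 & 3 & 8 \\ 0.4 & 0 & 0 & 0 \\ 0 & 0.75 & 0 & 0 \\ 0 & 0 & 0.25 & 0 \end{bmatrix}$ c $\begin{bmatrix} 0 & 0 & 0.42 \\ 0.6 & 0 & 0 \\ 0 & 0.5 & 0 \end{bmatrix}$
[Life-cycle diagram: age groups 1, 2, 3 and 4 with survival rates 0.5, 0.4 and 0.05]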
Example 9 4 Information about a population of female kangaroos in a particular area is given in the
following table.
Example 10 6 Consider the following Leslie matrix L and initial population state matrix S 0 :
$L = \begin{bmatrix} 0 & 2 & 1 \\ 0.5 & 0 & 0 \\ 0 & 0.25 & 0 \end{bmatrix}$ and $S_0 = \begin{bmatrix} 204 \\ 96 \\ 23 \end{bmatrix}$
a Find
i S5 ii S 10 iii S 20
Premultiply each of these state matrices by $\begin{bmatrix} 1 & 1 & 1 \end{bmatrix}$ to calculate the total
populations at each of these stages and comment.
b Determine S 20 and S 21 . Divide each age group population for S 21 by the
corresponding age group population for S 20 and show that S 21 ≈ 1.057S 20 and
c Calculate
i 1.27S 0 ii 1.272 S 0 iii 1.273 S 0
Compare these answers to the answers of part b.
Example 12 8 Consider the following Leslie matrix L and initial population matrix S 0 :
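$L = \begin{bmatrix} 0 & 0 & 12 \\ \tfrac{1}{4} & 0 & 0 \\ 0 & \tfrac{1}{3} & 0 \end{bmatrix}$ and $S_0 = \begin{bmatrix} 1200 \\ 0 \\ 0 \end{bmatrix}$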
a Find:
i LS 0 ii L2 S 0 iii L3 S 0
b Comment on these results in terms of the population behaviour. Try using a different
initial population matrix S 0 .
c Now investigate for each of the following Leslie matrices. Comment on population
increase or decrease.
0 0 6 0 0 15
i L = 41 0 0 ii L = 41
0 0
0 13 0 0 1
10 For a certain species of fish, we consider three age groups each of one year in length.
These fish reproduce only during their third year and then die. Assume that 20% of fish
survive their first year and that 50% of these survivors make it to reproduction age. The
initial population consists of 1000 newborns.
a Investigate what happens for each of the following values of b3 :
i b3 = 10 ii b3 = 15 iii b3 = 6
b For b3 = 20, determine the long-term growth rate and the proportion of fish in each
age group.
$L = \begin{bmatrix} 0.9 & 2.5 & 0.4 \\ 0.3 & 0 & 0 \\ 0 & 0.45 & 0 \end{bmatrix}$
Some of the species were moved into a sanctuary. The initial female population in the
sanctuary is given by
S 0 = 40
The best estimate of the total female population after 7 years is
A 1000 B 1500 C 2000 D 2500 E 3000
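12 The Leslie matrix $L = \begin{bmatrix} 0 & 2 & b \\ c & 0 & 0 \\ 0 & d & 0 \end{bmatrix}$ satisfies the matrix equation
$L\begin{bmatrix} 16 \\ 4 \\ 2 \end{bmatrix} = \begin{bmatrix} 16 \\ 4 \\ 2 \end{bmatrix}$
The values of b, c and d are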
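A $b = 2$, $d = \tfrac{1}{4}$, $c = \tfrac{1}{2}$ B $b = 2$, $d = \tfrac{1}{2}$, $c = \tfrac{1}{4}$ C $b = 4$, $d = \tfrac{1}{4}$, $c = \tfrac{1}{2}$
D $b = 4$, $d = \tfrac{1}{2}$, $c = \tfrac{1}{4}$ E $b = 2$, $d = \tfrac{1}{4}$, $c = \tfrac{1}{2}$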
13 A population of birds is modelled by using the Leslie matrix
$L = \begin{bmatrix} 0 & 2 & 1.5 \\ 0.44 & 0 & 0 \\ 0 & 0.55 & 0 \end{bmatrix}$
The growth has reached the point where the rates of growth of the different age groups
of the population are constant and the state matrix at this point is S k = 400 . The rate
of growth per time period is
A 10% B 11% C 12%
D 13% E 14%
State matrix A state matrix $S_n$ is a column matrix whose elements represent the nth state of a
dynamic system defined by a recurrence relation of the form $S_0$ = initial state, $S_{n+1} = TS_n$.
Here T is a square matrix called a transition matrix.
Skills checklist
Download this checklist from the Interactive Textbook, then print it and fill it out to check
your skills.
11A 2 I can set up a transition matrix from written information.
11C 4 I can use a recurrence relation to calculate state matrices step by step.
11C 5 I can use the rule $S_n = T^n S_0$ to determine the nth state.
11C 7 I can estimate the steady-state solution for suitable transition matrices.
11D 8 I can use the matrix recurrence relation $S_0$ = initial state matrix, $S_{n+1} = TS_n + B$.
11E 9 I can determine state matrices and construct life-cycle diagrams in situations
modelled by Leslie matrices.
11E 10 I can enter information into a Leslie matrix from written information.
11E 11 I can use numerical techniques to consider the limiting behaviour of Leslie matrices.
11E 12 I can identify the properties of a Leslie matrix and the state matrices when
there is a constant rate of increase.
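(In each option, columns are 'from' A, B and rows are 'to' A, B.)
A $\begin{bmatrix} 0.75 & 0.25 \\ 0.05 & 0.95 \end{bmatrix}$ B $\begin{bmatrix} 0.75 & 0.05 \\ 0.25 & 0.95 \end{bmatrix}$ C $\begin{bmatrix} 0.75 & 0.25 \\ 0.95 & 0.05 \end{bmatrix}$ D $\begin{bmatrix} 0.75 & 0.95 \\ 0.25 & 0.05 \end{bmatrix}$ E $\begin{bmatrix} 0.25 & 0.05 \\ 0.75 & 0.95 \end{bmatrix}$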
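(In each option, rows are 'to' X, Y, Z.)
B $\begin{bmatrix} 0.75 & 0.10 & 0.15 \\ 0.60 & 0.05 & 0.35 \\ 0.50 & 0.30 & 0.20 \end{bmatrix}$ C $\begin{bmatrix} 0.75 & 0.10 & 0.15 \\ 0.10 & 0.05 & 0.35 \\ 0.50 & 0.30 & 0.20 \end{bmatrix}$
D $\begin{bmatrix} 0.75 & 0.05 & 0.15 \\ 0.10 & 0.60 & 0.20 \\ 0.15 & 0.35 & 0.50 \end{bmatrix}$ E $\begin{bmatrix} 0.75 & 0.05 & 0.15 \\ 0.15 & 0.35 & 0.50 \\ 0.10 & 0.60 & 0.20 \end{bmatrix}$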
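4 For this system, $T^2$ is:
A $\begin{bmatrix} 0.36 & 0.25 \\ 0.16 & 0.25 \end{bmatrix}$ B $\begin{bmatrix} 0.56 & 0.55 \\ 0.44 & 0.45 \end{bmatrix}$ C $\begin{bmatrix} 0.6 & 0.5 \\ 0.4 & 0.5 \end{bmatrix}$ D $\begin{bmatrix} 1.2 & 1.0 \\ 0.8 & 1.0 \end{bmatrix}$ E not defined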
7 If L1 = T S 0 + B, where B = , then L1 equals:
70 150 170
176 210
220 180 160 164 120
8 If P1 = T S 0 − 2B, where B = , then P1 equals:
140 170 180 170 180
100 100 100 160 180
11 A large population of birds lives on a remote island. Every night each bird settles at either
location A or location B.
On the first night the number of birds at each location was the same. On each subsequent
night, a percentage of birds changed the location at which they settled.
The movement of birds between the two locations is described by the transition matrix T shown
opposite. Assume this pattern of movement continues.
$T = \begin{bmatrix} 0.8 & 0 \\ 0.2 & 1 \end{bmatrix}$ (rows and columns in the order A, B)
In the long term, the number of birds that settle at location A will:
12 The total number of people who are expected to change the candidate that they plan to vote
for 1 week after the election campaign begins is:
A 828 B 1423 C 2251 D 4269 E 6891
13 The election campaign will run for 10 weeks. If people continue to follow this pattern of
changing the candidate they plan to vote for, the expected winner after 10 weeks will be:
A Rob by about 50 votes B Rob by about 100 votes
C Rob by fewer than 10 votes D Anna by about 100 votes
E Anna by about 200 votes
Written response questions
1 The Diisco (D) and the Spin (S) are two large music venues in the same city. They both
open on the same Saturday night and will open on every Saturday night.
The matrix $A_1$ opposite is the attendance matrix for the first Saturday. This matrix shows the
number of people who attended the Diisco and the number of people who attended the Spin on
that first Saturday.
$A_1 = \begin{bmatrix} 500 \\ 240 \end{bmatrix}$ (rows: D, S)
The number of people expected to attend the second Saturday for each venue can be
determined using the matrix equation
A2 = GA1
This Saturday
−0.4 D Next Saturday
where G is the matrix G =
0.2 0.6 S
2 Suppose that the trees in a forest are classified into three age groups: young trees
(0–15 years), middle-aged trees (16–30 years) and old trees (more than 30 years). A time
period is 15 years, and it is assumed that in each time period:
10% of young trees, 20% of middle-aged trees and 40% of old trees die
surviving trees enter into the next age group; old trees remain old
dead trees are replaced by young trees.
3 The following table represents a study of a particular population of marsupials, which has
been divided into eight age groups. The table gives the initial population, birth rate and
survival rate for each age group.
Age group 1 2 3 4 5 6 7 8
Initial population 0 100 100 50 0 0 0 0
Birth rate 0 0.1 0.9 0.2 0 0 0 0
Survival rate 0.98 0.95 0.95 0.9 0.7 0.5 0.1 0
4 The growth of algae in a particular lake is being studied to protect the ecology from a
disastrous algal bloom. The algae can live for up to four days. So the population is divided
into four age groups of one day each. The fertility rates and survival rates are being
monitored so that the population can be modelled using a Leslie matrix.
At the beginning of the study in late winter (day 0), it was observed that the algae
concentration in the lake was 3200 cells per millilitre of water, with equal numbers in each
age group. The fertility rates on the four days of life were 0.2, 0.5, 0.6 and 0.4 respectively.
The survival rate for each of the first three days of life was 0.7.
a Write down a Leslie matrix to represent this particular model.
b Find the population matrix for cells per millilitre of water on day 20, correct to three
significant figures.
c Find the population matrix for cells per millilitre of water on day 21. Hence find the
rate of change in the algae concentration per day at this stage.
d With the coming of spring on day 21, the fertility rates increased to 0.3, 0.6, 0.7 and 0.5;
the survival rate remained unchanged. Find the population matrix after a further
three weeks (i.e. on day 42).
e With the arrival of warmer weather on day 42, the fertility rates increased to 0.3, 0.7,
0.8 and 0.5; the survival rate increased to 0.85. Suppose that an algal bloom is declared
if the concentration of algae reaches 100 000 cells per millilitre of water. Using trial
and error, find the day of the study on which an algal bloom was declared.
Revision: Matrices
Pn W = W for n =
A 1 B 2 C 3 D 4 E 5
0 0 1 0
0 1 0 0
4 O T S P
1 0 0 0
0 0 0 1
5 1 0 1 0 0 × 0 equals
A [0] B [1] C [2] D [3] E [5]
ISBN 978-1-009-11041-9 © Peter Jones et al 2023 Cambridge University Press
Photocopying is restricted under law and this material must not be transferred to another party.
12A Exam 1 style questions: Matrices 597
6 Matrix A has four rows and three columns.
Matrix B has three rows and four columns.
Matrix C = B × A has:
A two rows and three columns B three rows and two columns
C three rows and three columns D four rows and two columns
E four rows and three columns
7 Matrix N = . The matrix P is the permutation matrix such that the 4 × 1 matrix
M = P × N has the smallest value at the top and the elements are in increasing order as
you go down the matrix. The matrix P is
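A $\begin{bmatrix} 0&0&1&0 \\ 0&1&0&0 \\ 1&0&0&0 \\ 0&0&0&1 \end{bmatrix}$ B $\begin{bmatrix} 0&1&0&0 \\ 0&0&1&0 \\ 1&0&0&0 \\ 0&0&0&1 \end{bmatrix}$ C $\begin{bmatrix} 0&0&1&0 \\ 0&1&0&0 \\ 0&0&0&1 \\ 1&0&0&0 \end{bmatrix}$
D $\begin{bmatrix} 0&0&0&1 \\ 0&1&0&0 \\ 0&0&1&0 \\ 1&0&0&0 \end{bmatrix}$ E $\begin{bmatrix} 0&0&0&1 \\ 0&1&0&0 \\ 1&0&0&0 \\ 0&0&1&0 \end{bmatrix}$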
12 A is a 5 × 5 matrix.
B is an 8 × 5 matrix.
Which one of the following matrix expressions is defined?
A $AB^T$ B $A^3 B$ C $BA + 3A$
D $A(BA)^{-1}$ E $A^2 - BA$
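13 $S_1 =$
A $\begin{bmatrix} 90 \\ 110 \end{bmatrix}$ B $\begin{bmatrix} 100 \\ 100 \end{bmatrix}$ C $\begin{bmatrix} 110 \\ 90 \end{bmatrix}$ D $\begin{bmatrix} 120 \\ 80 \end{bmatrix}$ E $\begin{bmatrix} 140 \\ 60 \end{bmatrix}$
14 $S_5$ is closest to:
A $\begin{bmatrix} 90 \\ 110 \end{bmatrix}$ B $\begin{bmatrix} 93.1 \\ 106.9 \end{bmatrix}$ C $\begin{bmatrix} 95.5 \\ 104.5 \end{bmatrix}$ D $\begin{bmatrix} 107.9 \\ 92.1 \end{bmatrix}$ E $\begin{bmatrix} 106.9 \\ 93.1 \end{bmatrix}$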
16 Consider the permutation matrix $P = \begin{bmatrix} 0&0&0&1 \\ 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \end{bmatrix}$. The row matrix $X = \begin{bmatrix} B & O & A & R \end{bmatrix}$
is permuted by finding the product $XP^2$. The result is
People in a suburb are very fitness conscious but get sick of going to the same gymnasium. It
is found that they go to four gymnasiums, A, B, C and D, but there are changes in attendance
every month. It is found that the following transition matrix can be used to predict the number
of people at each of the four gymnasiums each month.
$T = \begin{bmatrix} 0.30 & 0.25 & 0.1 & 0.1 \\ 0.45 & 0.5 & 0.15 & 0.1 \\ 0.13 & 0.2 & 0.45 & 0.2 \\ 0.12 & 0.05 & 0.3 & 0.6 \end{bmatrix}$ (columns: this month A, B, C, D; rows: next month A, B, C, D)
$S_0 = \begin{bmatrix} 300 \\ 300 \\ \vdots \end{bmatrix}$
17 Three hundred people go to gym C in September. How many of these people are in
gym A in October?
A 280 B 290 C 30 D 320 E 330
18 This transition matrix predicts that, in the long term, the people
A go only to A B go only to B
C go to only A and C D go only to B and D
E continue to visit all four gyms
20 A town has two hardware shops: Fairtrade (F) and Bungles (B). The percentage of
shoppers at each shop changes from day to day, as shown in the transition matrix T.
F 65% 70%
B 35% 30%
On a particular Monday, 35% of shoppers went to Fairtrade. The matrix recursion
relation S n+1 = T S n is used to model this situation. The percentage of shoppers who go
to Fairtrade on Wednesday of the same week is closest to
A 65% B 66% C 67% D 68% E 69%
21 The matrix below gives the results of a table tennis round-robin competition between five
players: A, B, C, D and E. A ‘1’ indicates a win of ‘row’ over ‘column’.
$\begin{bmatrix} 0&0&0&1&0 \\ 1&0&1&0&1 \\ 1&0&0&0&0 \\ 0&1&1&0&0 \\ 1&0&1&1&0 \end{bmatrix}$ (rows: winner A, B, C, D, E; columns: loser A, B, C, D, E)
When the sum of the one-step and two-step dominances is used to rank the players in
this competition, the ranking is:
A B, E, D, A, C B B, E, C, D, A
C B, E, D, C, A D E, B, D, A, C
E E, B, D, C, A
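The ranking by one-step plus two-step dominances is the list of row sums of $D + D^2$, where D is the dominance matrix. A minimal sketch, assuming the matrix above is entered row by row in the order A to E (`mat_mul` is a hypothetical helper):

```python
# Rank players by the sum of one-step and two-step dominances: the row sums
# of D + D^2, where D is the dominance matrix above (1 = row beat column).

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


players = ["A", "B", "C", "D", "E"]
D = [[0, 0, 0, 1, 0],
     [1, 0, 1, 0, 1],
     [1, 0, 0, 0, 0],
     [0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0]]

D2 = mat_mul(D, D)   # two-step dominances
scores = {p: sum(D[i]) + sum(D2[i]) for i, p in enumerate(players)}
ranking = sorted(players, key=lambda p: scores[p], reverse=True)

print(scores)   # {'A': 3, 'B': 8, 'C': 2, 'D': 6, 'E': 7}
print(ranking)  # ['B', 'E', 'D', 'A', 'C']
```

Two-step dominances break ties in one-step wins: B and E both have three wins, but B's victims have themselves won more.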
22 A taxi company has two depots at A and B. They always keep x taxis at A and z taxis at
B. The transition matrix T shows how the taxis change their nightly location.
A 65% 70%
B 35% 30%
On a particular night 14 taxis came from depot B to depot A. How many taxis in the
A 24 B 36 C 42 D 60 E 80
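23 The matrix $S_{n+1}$ is determined from the matrix $S_n$ using the recurrence relation
$S_{n+1} = T \times S_n - C$, where
$T = \begin{bmatrix} 0.3 & 0.8 & 0.4 \\ 0.6 & 0.1 & 0.3 \\ 0.1 & 0.1 & 0.3 \end{bmatrix}$, $S_0 = \begin{bmatrix} 80 \\ 60 \\ 20 \end{bmatrix}$, $S_1 = \begin{bmatrix} 58 \\ 41 \\ 17 \end{bmatrix}$
and C is a column matrix. Matrix $S_2$ is equal to
A $\begin{bmatrix} 58 \\ 41 \\ 17 \end{bmatrix}$ B $\begin{bmatrix} 35 \\ 25 \\ 12 \end{bmatrix}$ C $\begin{bmatrix} 14.85 \\ 8.95 \\ 4.2 \end{bmatrix}$ D $\begin{bmatrix} 30.5 \\ 26.5 \\ 15.5 \end{bmatrix}$ E $\begin{bmatrix} 59.5 \\ 41.5 \\ 15 \end{bmatrix}$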
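When a recurrence relation contains an unknown constant matrix, one known step is enough to find it: here $C = TS_0 - S_1$, after which $S_2 = TS_1 - C$. The sketch below carries out both steps with the values given in the question; `mat_vec` is a hypothetical helper.

```python
# Solve for the unknown column matrix C from S_1 = T S_0 - C, then apply the
# same rule once more to get S_2 = T S_1 - C.

def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


T = [[0.3, 0.8, 0.4],
     [0.6, 0.1, 0.3],
     [0.1, 0.1, 0.3]]
S0 = [80, 60, 20]
S1 = [58, 41, 17]

C = [a - b for a, b in zip(mat_vec(T, S0), S1)]     # C = T S_0 - S_1
S2 = [a - c for a, c in zip(mat_vec(T, S1), C)]      # S_2 = T S_1 - C

print([round(x, 2) for x in C])   # [22.0, 19.0, 3.0]
print([round(x, 2) for x in S2])  # [35.0, 25.0, 12.0]
```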
24 A group of people travel every weekday, but they have a choice of the type of transport
they use: train (T), bus (B) or car (C). They change from day to day according to the
following transition matrix (columns: today T, B, C; rows: tomorrow T, B, C).
$\begin{bmatrix} 65\% & 70\% & 50\% \\ 20\% & 10\% & 25\% \\ 15\% & 20\% & 25\% \end{bmatrix}$
On Monday 30% take a car and 70% take the train and no one takes the bus. What is
the percentage of people who are not expected to change their mode of transport from
Tuesday to Wednesday?
A 50.45% B 46% C 48.45% D 45.975% E 100%
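(In each option, rows are 'to' A, B, C.)
B $\begin{bmatrix} 0.30 & 0.30 & 0.55 \\ 0.60 & 0.05 & 0.35 \\ 0.1 & 0.30 & 0.20 \end{bmatrix}$ C $\begin{bmatrix} 0.75 & 0.10 & 0.15 \\ 0.60 & 0.05 & 0.35 \\ 0.50 & 0.30 & 0.20 \end{bmatrix}$
D $\begin{bmatrix} 0.75 & 0.05 & 0.15 \\ 0.10 & 0.60 & 0.20 \\ 0.45 & 0.35 & 0.20 \end{bmatrix}$ E $\begin{bmatrix} 0.30 & 0.30 & 0.55 \\ 0.15 & 0.35 & 0.50 \\ 0.45 & 0.35 & 0.20 \end{bmatrix}$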
2 Lake Blue and Lake Green are two small lakes connected by a channel. This enables
fish to move between the two lakes on a daily basis. Research has shown that each day:
67% of fish in Lake Blue stay in Lake Blue
33% of fish in Lake Blue move to Lake Green
72% of fish in Lake Green stay in Lake Green
28% of fish in Lake Green move to Lake Blue.
a Construct a transition matrix, T , of the form:
Blue Green
3 The life cycle of a type of insect can be described by a Leslie matrix. The stages of life
which are used are egg (E), juvenile (J), young adult (Y) and adult (A).
$L = \begin{bmatrix} 0 & 0 & 20 & 30 \\ 0.5 & 0 & 0 & 0 \\ 0 & 0.1 & 0 & 0 \\ 0 & 0 & 0.05 & 0 \end{bmatrix}$ (columns: from stage of life E, J, Y, A; rows: to next stage of life E, J, Y, A)
The initial population is described by the 4 × 1 state matrix $S_0 = \begin{bmatrix} 897 \\ 438 \\ 43 \\ 2 \end{bmatrix}$ (rows: E, J, Y, A).
The time period in this model is one week and the state matrix after n weeks can be
determined by $S_{n+1} = LS_n$.
a From the Leslie matrix complete the life-cycle diagram.
b How many eggs are produced in total each week?
c How many insect eggs are there after one week?
d How many weeks pass before there are more than 1000 eggs?
e What percentage of young adult insects become adult insects each week?
f How many insects of every type (including eggs) are there after:
i 8 weeks ii 9 weeks
Give answers correct to the nearest whole number.
g It is known that after some weeks the rate of increase per week of the entire
population (including eggs) is very close to being a constant. Use the results of f to
give an estimate of this rate per week as a percentage correct to the nearest percent.
(That is, in the form a%, where a is a whole number.)
4 The following transition matrix, T, is used to help predict attendance at a weekly club meeting.
$T = \begin{bmatrix} 0.80 & 0.30 \\ 0.20 & 0.70 \end{bmatrix}$ (columns: this meeting attend, not attend; rows: next meeting attend, not attend)
$S_1$ is the attendance matrix for the first club meeting of the year.
$S_1 = \begin{bmatrix} 110 \\ 40 \end{bmatrix}$ (rows: attend, not attend)
S 1 indicates that 110 club members attended the first meeting and 40 club members did
not attend the first meeting.
a Use T and S 1 to:
i determine S 2 , the attendance matrix for the second meeting.
ii predict the number of club members attending the third meeting.
b Write down a matrix equation for S n in terms of T , n and S 1 .
c How many weeks does it take for the attendance to fall below 91?
d In the long term, how many club members are predicted to attend meetings?
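A steady state can be estimated by applying $S_{n+1} = TS_n$ until the state stops changing. The sketch below does this for the club's T and $S_1$, assuming a tolerance of $10^{-6}$ counts as "no longer changing"; `steady_state` is a hypothetical helper. Some transition matrices (for example, one that swaps two states every step) never settle, so the loop has a step limit.

```python
# Estimate a steady state by applying S_{n+1} = T S_n until the state stops
# changing (to within a tolerance).

def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def steady_state(T, S, tol=1e-6, max_steps=10_000):
    for steps in range(1, max_steps + 1):
        S_next = mat_vec(T, S)
        if max(abs(a - b) for a, b in zip(S, S_next)) < tol:
            return S_next, steps
        S = S_next
    raise RuntimeError("state did not settle within max_steps")


T = [[0.80, 0.30],
     [0.20, 0.70]]
S1 = [110, 40]

S_long, steps = steady_state(T, S1)
print([round(x, 2) for x in S_long])  # [90.0, 60.0]
```

The same loop, with the state printed at each step, also answers "how many weeks until attendance falls below a given value".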
5 A chemist wholesaler stocks three brands of hand sanitizer Cleanup (C), Loveneasy (L)
and Orama (O). The number of half litre bottles of these sanitizers sold in March 2022
is shown in matrix A below.
$A = \begin{bmatrix} 3000 \\ 1500 \\ 2500 \end{bmatrix}$ (rows: C, L, O)
a i What is the order of matrix A?
ii The wholesaler expected that in April 2022 the sales of all three brands of
sanitizer would increase by 20%. She multiplied matrix A by a real number, k, to
determine the expected volume of sales for April. What is the value of k?
b A small chain of 4 chemists operates through this wholesaler, and the chemists communicate with
each other rather inefficiently through an online facility. The communication links
are shown in this communication matrix:
$M = \begin{bmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{bmatrix}$ (rows: sender A, B, C, D; columns: receiver A, B, C, D)
In this matrix:
the ‘1’ in row A, column B indicates that A can send information to B
the ‘0’ in row D, column C indicates that D cannot send information to C.
i Which pairs of chemists can send information directly to each other?
ii D needs to send documents to C. What is the sequence of communication links
that will successfully get the information from D to C?
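iii Matrix $M^2$ below shows the number of two-step communication links between
each pair of chemists.
$M^2 = \begin{bmatrix} 2 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 2 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}$ (rows: sender A, B, C, D; columns: receiver A, B, C, D)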
Communication from C to B in two different two-step communication links is
possible. List each two-step communication link for this pair.
c The total purchases of the three hand sanitizers, Cleanup (C), Loveneasy (L) and
Orama (O), change from month to month. Let T denote the transition matrix and $S_n$ represent
the state matrix describing the number of shoppers buying each brand n months
after June 2022.
$T = \begin{bmatrix} 0.7 & 0.8 & 0.5 \\ 0.05 & 0.1 & 0.2 \\ 0.25 & 0.1 & 0.3 \end{bmatrix}$ (columns: this month C, L, O; rows: next month C, L, O)
The initial state matrix $S_0$ below shows the number of shoppers at the four chemists
who bought each brand of hand sanitizer in June 2022.
$S_0 = \begin{bmatrix} 200 \\ 150 \\ 250 \end{bmatrix}$ (rows: C, L, O)
i Calculate $S_1$.
ii How many of the shoppers bought the same sanitizer in July as they did in June?
iii Consider the shoppers who were expected to buy Cleanup in September 2022.
What percentage of these shoppers also bought Cleanup in August 2022?
d The shopping habits changed over the months. A rule to model this situation is
$S_{n+1} = T \times S_n + B$, where $S_n$ represents the state matrix describing the number of
shoppers n months after June 2022.
Here T is as above and $S_0 = \begin{bmatrix} 200 \\ 150 \\ 250 \end{bmatrix}$ (rows: C, L, O).
If $S_1 = \begin{bmatrix} 405 \\ 90 \\ 165 \end{bmatrix}$, find B.
6 In a cinema complex there are four cinemas, A, B, C and D. They vary in the number
of seats, the standard of seating and other amenities. The number of empty (E) and
occupied (O) seats on a Friday afternoon is shown in matrix M below.
$M = \begin{bmatrix} 30 & 60 \\ 60 & 120 \\ 20 & 50 \\ 10 & 85 \end{bmatrix}$ (rows: cinemas A, B, C, D; columns: E, O)
a What is the order of matrix M?
b What is the
i total number of seats
ii percentage of seats that are occupied on the Friday afternoon?
The cost of admission to each of the cinemas is given in the following matrix:
$Q = \begin{bmatrix} \$8.50 & \$12.60 & \$18.00 & \$25.80 \end{bmatrix}$
c The total payments for this Friday afternoon can be determined by the calculation
Total = Q × L, where L is a 4 × 1 matrix. Write down the matrix L with its entries and
calculate the matrix Total.
7 A commercial art school offers classes in pottery (P), sculpture (S), drawing (D) and
weaving (W). Students are allowed to change activities every month. In January 2022
the number of students in each class can be described by the state matrix $S_0$, and the
movement from one month to the next is described by the transition matrix T.
$S_0 = \begin{bmatrix} 30 \\ 30 \\ 30 \\ 30 \end{bmatrix}$ (rows: P, S, D, W) $T = \begin{bmatrix} 0.6 & 0.2 & 0.4 & 0.1 \\ 0.2 & 0.3 & 0.1 & 0.6 \\ 0.1 & 0.2 & 0.2 & 0.2 \\ 0.1 & 0.3 & 0.3 & 0.1 \end{bmatrix}$ (rows and columns in the order P, S, D, W)
8 A local take-away shop popular with students sells hamburgers (H), fish and chips
(F) and sandwiches (S). The number of each item sold over three weeks is shown in
matrix M.
$M = \begin{bmatrix} 160 & 200 & 50 \\ 180 & 210 & 55 \\ 210 & 240 & 80 \end{bmatrix}$ (columns: H, F, S; rows: weeks 1, 2, 3)
a How many hamburgers were sold in these three weeks?
b What does the element $m_{23}$ indicate?
c The total sales in dollars for the three weeks for each of these items are given in the matrix
$C = \begin{bmatrix} 8250 \\ 9100 \\ 2220 \end{bmatrix}$ (rows: H, F, S)
Determine the unit cost of each of these items.
d The matrix expression shown gives the total cost of all hamburgers and sandwiches
in these three weeks.
Matrix L is a 1 × 3 matrix. Write down matrix L.
Chapter objectives
- What is a graph?
- How do we identify the features of a graph?
- How do we draw a graph?
- How do we apply graphs in practical situations?
- How do we construct an adjacency matrix from a graph?
- How do we define and draw a planar graph?
- How do we identify the type of walk on a graph?
- How do we find the shortest path between two vertices of a graph?
- How do we find the minimum distance required to connect all vertices of a graph?
In this chapter, graphs and their use as networks representing connections between
objects will be introduced, and their properties and applications will be explored.
Problems involving networks will be investigated and you will learn specific algorithms,
such as Dijkstra’s and Prim’s, to solve such problems.
Degree of a vertex
Anna has three friends. The vertex representing Anna has three edges attached to it, each
connecting Anna to one of her friends. The number of edges attached to a vertex is called
the degree of that vertex.
[Figure: friendship graph with vertices including Anna, Brett, Cora, Ethan and Frances]
The degree of the vertex representing Anna is odd, because there is an odd number of edges
connected to it. The degree of the vertex representing Dario is even because there is an even
number of edges connected to it.
In symbolic form, we can let the letter A represent the vertex for Anna. The degree of this
vertex can be written as deg(A). In this graph, deg(A) = 3.
610 Chapter 13 Graphs, networks and trees: travelling and connecting problems
Imagine that Ethan is able to add himself as a friend on the social media website.
The edge representing this connection would connect the vertex representing Ethan, E, back
to itself. This type of edge is called a loop.
[Figure: the friendship graph with a loop at vertex E]
A loop is attached twice to a vertex and so it contributes two degrees. So deg(E) = 4.
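Counting degrees from a list of edges makes the "a loop counts twice" rule automatic: each edge adds 1 to the degree of each of its two ends, and a loop has the same vertex at both ends. The edge list below is hypothetical and chosen only for illustration. Because every edge contributes exactly 2 to the total, the degrees always add up to twice the number of edges.

```python
from collections import Counter


def degrees(edges):
    """Degree of each vertex of an undirected graph given as (u, v) edges.

    A loop (u, u) adds 2 to deg(u), because the edge is attached to u twice.
    Isolated vertices do not appear, since they have no edges.
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1   # when u == v, this counts the second end of the loop
    return deg


# Hypothetical edge list for illustration only.
edges = [("A", "B"), ("A", "C"), ("A", "F"), ("C", "E"), ("E", "E")]
deg = degrees(edges)

print(deg["A"], deg["E"])                    # 3 3
print(sum(deg.values()) == 2 * len(edges))   # True
```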
Describing graphs
Graphs that represent connections between objects can take different forms and have
different features. This means that there is a variety of ways to describe these graphs.
Simple graphs
Simple graphs do not have any loops. There are no duplicate or multiple edges either.
Isolated vertex
A graph has an isolated vertex if there is a vertex that is not connected to another vertex by
an edge. The isolated vertex in this graph is E, because it is not connected to any other vertex
by an edge.
Degenerate graphs
Degenerate graphs have all vertices isolated. This means that there are no edges in the graph
at all.
A subgraph is a part of a larger graph. All of the edges and vertices in the subgraph must exist
in the original graph. If there are extra edges or vertices, the graph will not be a subgraph of
the larger graph.
[Figure: Graph 1, Graph 2, Graph 3 and Graph 4]
Example 1 Graphs
Explanation Solution
a Count the number of times an edge connects to vertex C. There are four.
The degree of vertex C is 4, so deg(C) = 4.
b A vertex has a loop if an edge connects it to itself.
Vertex B and vertex F have loops.
c Count the number of times an edge connects to vertex F. Remember that a loop contributes
two degrees.
The degree of vertex F is 5, so deg(F) = 5.
d Look for an edge that, if removed, would disconnect the graph.
A bridge exists between vertex A and vertex C.
e There are a few possible answers for this question.
[Figure: some possible answers]
All of these graphs are considered to be equivalent to each other because they all contain
identical information. Each has edges connecting the same vertices. Graphs that contain
identical information like this are called equivalent graphs or isomorphic graphs.
Planar graphs
The graph opposite has two edges that overlap. It is important to note that there is no vertex
at the point of overlap of the edges.
[Figure: a graph in which two edges cross]
It can help to think of an edge as an insulated electrical wire. It is quite safe to cross two such
electrical wires because the wires themselves never touch and never interfere with each other.
The edges that cross over in this diagram are similar, in that they do not intersect and do not
interfere with each other.
If a graph has edges that cross, it may be possible to redraw the graph so that the edges no
longer cross. In the redrawn graph, the edge between vertices A and D has been moved, but none
of the information in the graph has changed.
[Figure: the same graph redrawn so that no edges cross]
Graphs where this is possible are called planar graphs. If it is impossible to draw an equivalent
graph without crossing edges, the graph is called a non-planar graph.
Explanation Solution
1 Choose one of the edges that crosses over another
Euler’s formula
Leonhard Euler (pronounced ‘oiler’) was one of the most prolific mathematicians of all time.
He contributed to many areas of mathematics and his proof of the rule named after him is
considered to be the beginning of the branch of mathematics called topology.
[Figure: planar graph whose faces are labelled f1, f2, f3 and f4]
A planar graph defines separate regions of the paper it is drawn on. These regions are
enclosed spaces that you could colour in, and they are called faces. An often-forgotten face
of a graph is the space outside the graph itself, covering the infinite space around it. This
face is labelled f4 in the graph above.
The number of faces for a graph can be counted. In the graph shown above, there are four faces, labelled f1, f2, f3 and f4.
Euler’s formula
There is a relationship between the number of vertices, v, the number of edges, e, and the
number of faces, f, in a connected planar graph.
In words: number of vertices + number of faces = number of edges + 2
In symbols: v + f = e + 2
Euler’s formula
For any connected planar graph:
v + f = e + 2
where v is the number of vertices, e is the number of edges and f is the number of faces in the graph.
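Rearranging Euler’s formula gives f = e + 2 − v, so any one of the three counts can be found from the other two. A minimal Python sketch, assuming the graph is already known to be connected and planar (the code does not check either property); the function names faces and satisfies_euler are illustrative.

```python
def faces(v, e):
    """Number of faces of a connected planar graph, from v + f = e + 2."""
    return e + 2 - v


def satisfies_euler(v, e, f):
    """True if the counts of vertices, edges and faces fit Euler's formula."""
    return v + f == e + 2


print(faces(6, 9))               # 5: six vertices and nine edges give five faces
print(satisfies_euler(5, 7, 4))  # True: 5 + 4 = 7 + 2
```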
Explanation Solution
a Temporarily remove an edge that crosses another edge and redraw it so that it does not cross another edge.
b Count the number of vertices, edges and faces. In the planar graph there are five vertices, seven edges and four faces.
v + f = e + 2
5 + 4 = 7 + 2
9 = 9
Euler’s formula is verified.
A connected planar graph has six vertices and nine edges. How many faces does the
graph have? Draw a connected planar graph with six vertices and nine edges.
Explanation Solution
a Write down the known values. v = 6, e = 9
b Substitute into Euler’s formula and solve for the unknown value.
v + f = e + 2
6 + f = 9 + 2
6 + f = 11
f = 11 − 6
f = 5
This graph has five faces, labelled f1, f2, f3, f4 and f5.
c Sketch the graph.
Note: There are other possible graphs.
[Figure: connected planar graph with six vertices, nine edges and faces f1 to f5]
Exercise 13A
Equivalent graphs
3 In each question below, three graphs are isomorphic and the fourth is not. Identify the
graph which is not isomorphic to the others.
[Figures: graphs i to iv for each of parts a to d]
Euler’s formula
Example 3 5 For each of the following graphs:
i state the values of v, e and f
ii verify Euler’s formula.
[Figures: graphs a to f]
Properties of graphs
Example 4 7 A connected planar graph has eight vertices and thirteen edges. Find the number of
faces of this graph.
8 A connected graph has five vertices and seven edges. Find the sum of the degrees of
the vertices.
9 Find the number of edges needed to make a complete graph with six vertices.
   A B C D E
A  0 1 1 0 1
B  1 0 2 1 0
C  1 2 0 0 0
D  0 1 0 0 0
E  1 0 0 0 0
Explanation Solution
1 Draw a dot for each vertex and label A to E.
2 There is a ‘2’ in the intersection of
row A and column C. This means there
are two edges connecting vertex A and
vertex C. Add these to the graph.
3 Note the ‘1’ in the intersection of row D and column D. This shows that there is a loop
at vertex D.
4 Look at every intersection of row and column and add edges to the graph, if they do
not already exist.
Note: This graph is drawn as a planar graph, but this is not strictly necessary unless required by the
Example 6
Construct an adjacency matrix that can be used to represent the graph opposite. This graph represents the ways that three houses A, B and C are connected to three utility outlets, gas (G), water (W) and electricity (E).
Explanation Solution
The convention used to enter the values is the same as discussed above.
   A B C G W E
A  0 0 0 1 1 1
B  0 0 0 1 1 1
C  0 0 0 1 1 1
G  1 1 1 0 0 0
W  1 1 1 0 0 0
E  1 1 1 0 0 0
The graph in Example 6 is called a bipartite graph because its set of vertices is separated into two sets of objects, houses (A, B, C) and utility outlets (G, W, E), with each edge connecting a vertex in one set to a vertex in the other. You will meet bipartite graphs again in Chapter 14 when studying allocation problems.
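The bipartite property can be read straight from the adjacency matrix: every non-zero entry must sit in a row belonging to one set and a column belonging to the other. A short Python sketch that checks this for the Example 6 matrix. The helper is_bipartite_split is hypothetical, and it only tests a split that you supply; it does not search for one.

```python
def is_bipartite_split(labels, matrix, left, right):
    """Return True if every edge in the adjacency matrix joins a vertex in
    `left` to a vertex in `right`, so that no edge stays inside one set."""
    left, right = set(left), set(right)
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            if count == 0:
                continue  # no edge between labels[i] and labels[j]
            u, v = labels[i], labels[j]
            if not ((u in left and v in right) or (u in right and v in left)):
                return False  # an edge (or loop) inside one set
    return True


houses = ["A", "B", "C"]
utilities = ["G", "W", "E"]
labels = houses + utilities
matrix = [
    [0, 0, 0, 1, 1, 1],  # A
    [0, 0, 0, 1, 1, 1],  # B
    [0, 0, 0, 1, 1, 1],  # C
    [1, 1, 1, 0, 0, 0],  # G
    [1, 1, 1, 0, 0, 0],  # W
    [1, 1, 1, 0, 0, 0],  # E
]
print(is_bipartite_split(labels, matrix, houses, utilities))  # True
```

Adding even one edge between two houses (for example, A to B) makes the check fail, because that edge stays within a single set.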
Adjacency matrices
The adjacency matrix A of a graph with n vertices is an n × n matrix in which, for example, the entry in row C and column F is the number of edges joining vertices C and F.
A loop is a single edge connecting a vertex to itself.
Loops are counted as one edge.
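A loop is entered as a single 1 on the diagonal but contributes two to the degree of its vertex. The degree of a vertex can therefore be read from the matrix as its row total plus its diagonal entry. Each non-loop edge appears twice in the matrix (once in row C and column F, once in row F and column C), so the number of edges is the sum of the entries on and above the diagonal. A Python sketch using a small hypothetical three-vertex matrix (two edges A−B, one edge B−C and a loop at C); the function names are illustrative.

```python
def degrees(matrix):
    # Row total counts each edge at the vertex once; a loop is recorded as 1
    # on the diagonal but adds 2 to the degree, so the diagonal entry is added again.
    return [sum(row) + row[i] for i, row in enumerate(matrix)]


def edge_count(matrix):
    # Count only the entries on or above the diagonal so that each
    # non-loop edge is counted once.
    n = len(matrix)
    return sum(matrix[i][j] for i in range(n) for j in range(i, n))


m = [
    [0, 2, 0],  # A
    [2, 0, 1],  # B
    [0, 1, 1],  # C
]
print(degrees(m))     # [2, 3, 3]
print(edge_count(m))  # 4
```

The degrees add to 8, which is twice the number of edges, as expected when every edge contributes two degrees.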
Exercise 13B
Properties of graphs
3 The adjacency matrix on the right has a row and column for vertex C that contains all zeros. What does this tell you about vertex C?
   A B C
A  0 1 0
B  1 0 0
C  0 0 0
4 Every vertex in a graph has one loop. What feature of the adjacency matrix would tell
you this information?
5 A graph has five vertices: A, B, C, D and E. It has no duplicate edges and no loops. If
this graph is complete, write down the adjacency matrix for the graph.
D 10 times E 11 times
9 Of the 25 elements in the adjacency matrix, the numbers ‘2’ or ‘3’ appear
A 6 times B 7 times C 8 times D 9 times E 10 times
ISBN 978-1-009-11041-9 © Peter Jones et al 2023 Cambridge University Press
Photocopying is restricted under law and this material must not be transferred to another party.
11 The number of faces of the graph represented by the adjacency matrix above is
A 2 B 3 C 4 D 5 E 6
P 0 1 1 1 1 1 P 0 1 1 1 1 1
P 0
1 1 2 1 2
Q 0 1 1 1 1 1 Q 1 0 1 0 0 0
Q 1 0 1 0 0 0
R 1 1 0 1 1 1
R 0 1 1 1 1 1 R 1 1 0 2 1 1
S 1
0 1 1 1 1
S 1 0 1 1 2 2 S 2 0 2 1 2 2
T 1 0 1 2 0 1 T 1 0 1 2 0 1
T 1 0 1 1 0 1
U 1 0 1 2 1 0 U 2 0 1 2 1 0
U 1 0 1 1 1 0
P 0 1 1 2 1 2 P 0 1 1 2 1 2
Q 1 0 1 0 0 0 Q 1 0 1 0 0 0
R 1 1 0 2 1 1 R 1 1 0 2 1 1
S 2 0 2 0 3 2 S 2 0 2 1 3 2
T 1 0 1 3 0 1 T 1 0 1 3 0 1
U 2 0 1 2 0 0 U 2 0 1 2 1 0
Graphs can be used to model and analyse problems involving exploring and travelling.
These problems include minimising the distance travelled or time taken between different
locations using different routes. For example, a courier driver would like to know the
shortest route to use for deliveries, and a tour guide would like to know the quickest route
that allows tourists to see a number of sights without retracing their steps.
To solve these types of problems, you will need to learn the language we use to describe the
different ways of navigating through a graph, from one vertex to another.
A walk is a sequence of edges, linking successive vertices in a graph.
A walk starts at one vertex and follows any route to finish at another vertex.
The red line in the graph opposite traces out a walk. This walk can be written down by listing the vertices in the order they are visited: A−C−A−D−G.
A trail is a walk with no repeated edges.
The red line in the graph opposite traces out a trail. This trail can be written down by listing the vertices in the order they are visited: B−E−F−C−B−A.
Note: There are no repeated edges in this trail, but one vertex (B) is repeated.
A path is a walk with no repeated edges and no repeated vertices.
The red line in the graph opposite traces out a path. This path can be written down by listing the vertices in the order they are visited: A−D−C−F−E−B.
A circuit is a trail (no repeated edges) that starts and ends at the same vertex. Circuits are also called closed trails.
The red line in the graph opposite traces out a circuit. This circuit can be written down by listing the vertices in the order they are visited: A−C−F−G−D−C−B−A.
Note: There are no repeated edges in this circuit, but one vertex, C, is repeated. The start and end vertices are also repeated because of the definition of a circuit.
A cycle is a path (no repeated edges, no repeated vertices) that starts and ends at the same vertex. The start and end vertex is an exception to repeated vertices. Cycles are also called closed paths.
The red line in the graph opposite traces out a cycle. This cycle can be written down by listing the vertices in the order they are visited: F−E−B−C−F.
Note: There are no repeated edges and no repeated vertices in this cycle, except for the start and end vertices.
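The five definitions above reduce to three yes/no questions. Is an edge repeated? Is a vertex repeated, ignoring the matching start and end of a closed walk? Does the walk return to its start? A Python sketch that classifies a walk written as a list of vertices. It assumes the graph has no multiple edges, so that an edge can be identified by its two end vertices. The name classify is illustrative.

```python
def classify(walk):
    """Classify a walk, given as a list of vertices, using the definitions above.

    Assumes no multiple edges, so each edge is identified by its two end vertices.
    """
    edges = [frozenset(pair) for pair in zip(walk, walk[1:])]
    repeated_edge = len(edges) != len(set(edges))
    closed = walk[0] == walk[-1]
    # In a closed walk the start vertex is allowed to appear again at the end.
    inner = walk[:-1] if closed else walk
    repeated_vertex = len(inner) != len(set(inner))
    if repeated_edge:
        return "walk only"
    if closed:
        return "circuit" if repeated_vertex else "cycle"
    return "trail" if repeated_vertex else "path"


for w in ["ACADG", "BEFCBA", "ADCFEB", "ACFGDCBA", "FEBCF"]:
    print("-".join(w), classify(list(w)))
```

Run on the five examples above, it reports walk only, trail, path, circuit and cycle, in that order.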
Identify the walk in each of the graphs below as a trail, path, circuit, cycle or walk only.
[Figures: graphs a to d, each showing a walk from a start vertex to an end vertex]
a This walk starts and ends at the same vertex so it is either a circuit or a cycle. The walk
passes through vertex C twice without repeated edges, so it must be a circuit.
b This walk starts and ends at the same vertex so it is either a circuit or a cycle. The walk
has no repeated vertex or edge so it is a cycle.
c This walk starts at one vertex and ends at a different vertex, so it is not a circuit or a
cycle. There is one repeated vertex (B) and no repeated edge, so it must be a trail.
d This walk starts at one vertex and ends at a different vertex so it is not a circuit or a
cycle. There are repeated vertices (C and E) and repeated edges (the edge between C
and E), so it must be a walk only.
Hint: To remember the difference between Eulerian and Hamiltonian travels, remember that Eulerian refers to edges, and both start with ‘e’.
Explanation Solution
a A road connection exists between:
St Andrews and Kinglake
St Andrews and Yarra Glen
Kinglake and Yarra Glen
Kinglake and Toolangi
Yarra Glen and Toolangi
Yarra Glen and Healesville
Healesville and Toolangi.
b The graph has two odd-degree vertices (Toolangi and Kinglake). There are exactly two odd-degree vertices in this graph, so an Eulerian trail exists, but an Eulerian circuit does not.
c There are a few different answers to this question. One of these is shown. An Eulerian trail, starting at Toolangi, is: Toolangi−Healesville−Yarra Glen−Toolangi−Kinglake−Yarra Glen−St Andrews−Kinglake.
d There are two different answers to this question. One of these is shown. A Hamiltonian cycle that begins at Healesville is: Healesville−Yarra Glen−St Andrews−Kinglake−Toolangi−Healesville.
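The odd-degree test used in part b can be automated: work out the degree of each vertex from the list of roads, then count the odd ones. A Python sketch, assuming the graph is connected (the code does not check this); the function names are illustrative. It also confirms that the trail from part c, which must finish along the only remaining road (St Andrews−Kinglake), uses every road exactly once.

```python
from collections import Counter

# The seven roads listed in part a, one pair per road
roads = [
    ("St Andrews", "Kinglake"),
    ("St Andrews", "Yarra Glen"),
    ("Kinglake", "Yarra Glen"),
    ("Kinglake", "Toolangi"),
    ("Yarra Glen", "Toolangi"),
    ("Yarra Glen", "Healesville"),
    ("Healesville", "Toolangi"),
]


def vertex_degrees(edges):
    deg = Counter()
    for u, v in edges:
        deg[u] += 1  # a loop (u == v) correctly adds 2 in total
        deg[v] += 1
    return deg


def euler_test(edges):
    """Assumes a connected graph. Reports which Eulerian travel exists."""
    odd = sorted(v for v, d in vertex_degrees(edges).items() if d % 2 == 1)
    if not odd:
        return "Eulerian circuit", odd
    if len(odd) == 2:
        return "Eulerian trail (not a circuit)", odd
    return "neither", odd


def covers_each_edge_once(trail, edges):
    """True if consecutive vertices in `trail` use every edge exactly once."""
    used = Counter(frozenset(pair) for pair in zip(trail, trail[1:]))
    return used == Counter(frozenset(edge) for edge in edges)


print(euler_test(roads))
# ('Eulerian trail (not a circuit)', ['Kinglake', 'Toolangi'])
trail = ["Toolangi", "Healesville", "Yarra Glen", "Toolangi",
         "Kinglake", "Yarra Glen", "St Andrews", "Kinglake"]
print(covers_each_edge_once(trail, roads))  # True
```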
Exercise 13C
Describing travels
Example 7 1 Identify the walk in each of the graphs below as a trail, path, circuit or walk only.
[Figures: graphs a to f, each showing a walk]
5 List a Hamiltonian path for this graph, starting at F and finishing at G.
Example 8 6 Four children each live in a different town. The diagram opposite is a map of the roads that link the four towns, A, B, C and D.
a How many different ways can a vehicle travel between town A and town B without
visiting any other town?
7 The following graph shows the roads linking nine Victorian country towns.
[Figure: road map of nine towns, including Wickliffe, Lake Bolac, Nerrin Nerrin, Hexham and Mortlake]
Properties of graphs
8 A graph has six vertices, A, B, C, D, E and F. The adjacency matrix for this graph is shown opposite.
   A B C D E F
A  0 0 1 1 0 0
B  0 0 0 0 1 1
C  1 0 0 2 1 0
D  1 0 2 0 0 1
E  0 1 1 0 0 0
F  0 1 0 1 0 1
a Is the graph connected?
b Is the graph planar?
c Does the graph contain a bridge?
13 Which one of the following is a Hamiltonian path for the graph above?
14 The graph above will have an Eulerian circuit if an edge could be added between the
A E and C  B A and B  C A and F
D A and D  E F and B
Weighted graphs
The edges of graphs represent connections between the vertices. Sometimes there is more
information known about that connection. If the edge of a graph represents a road between
two towns, we might also know the length of this road, or the time it takes to travel this road.
Extra numerical information about the edge that connects vertices can be added to a graph
by writing the number next to the edge. Graphs that have a number associated with each
edge are called weighted graphs.
The weighted graph in the diagram on the right shows towns, represented by vertices, and the roads between those towns, represented by edges. The numbers, or weights, on the edges are the distances along the roads.
[Figure: weighted graph of the towns Bartow, Croghon, Kenton, Melville, Osburn and Stratmoore]
Weighted graphs in which the weights are physical quantities, for example distance, time or
cost, are called networks.
Explanation Solution
1 List options for travelling from Bartow to Kenton. B−S−M−O−K, B−S−M−K, B−S−O−K
2 Add the weights for each route.
B−S−M−O−K 7 + 8 + 5 + 11 = 31 km
B−S−M−K 7 + 8 + 13 = 28 km
B−S−O−K 7 + 9 + 11 = 27 km
3 Write your answer. The shortest path from Bartow to Kenton is 27 km, with route B−S−O−K.
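Listing every route by inspection works well for small networks. The Python sketch below automates the listing with a depth-first search over routes that never revisit a vertex. It assumes the network contains only the six roads used above (B−S 7, S−M 8, S−O 9, M−O 5, M−K 13, O−K 11); the rest of the map is left out, so this is an illustration rather than a check of the full network. The names all_paths and path_length are illustrative.

```python
def all_paths(graph, start, end, visited=()):
    """Yield every path (no repeated vertices) from start to end."""
    visited = visited + (start,)
    if start == end:
        yield list(visited)
        return
    for nxt in graph[start]:
        if nxt not in visited:
            yield from all_paths(graph, nxt, end, visited)


def path_length(graph, path):
    """Total weight of the edges along a path."""
    return sum(graph[u][v] for u, v in zip(path, path[1:]))


# Only the six roads from the worked example (distances in km)
roads = {
    "Bartow": {"Stratmoore": 7},
    "Stratmoore": {"Bartow": 7, "Melville": 8, "Osburn": 9},
    "Melville": {"Stratmoore": 8, "Osburn": 5, "Kenton": 13},
    "Osburn": {"Stratmoore": 9, "Melville": 5, "Kenton": 11},
    "Kenton": {"Melville": 13, "Osburn": 11},
}

for p in all_paths(roads, "Bartow", "Kenton"):
    print("-".join(p), path_length(roads, p))
best = min(all_paths(roads, "Bartow", "Kenton"), key=lambda p: path_length(roads, p))
print(best, path_length(roads, best))  # ['Bartow', 'Stratmoore', 'Osburn', 'Kenton'] 27
```

The search also finds B−S−O−M−K (34 km), which the worked example does not list; it is longer, so the answer of 27 km is unchanged.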
Exercise 13D
4 Determine the shortest path from S to F in the following weighted graphs. Write down
the length of the shortest path.
[Figures: weighted graphs for parts a to d]
[Figure: network of roads from Home to School through intersections A to F, with travel times in minutes]
8 Victoria rides her bike to school each day. The edges of the network on the previous
page represent the roads that Victoria can use to ride to school from her home. The
numbers on the edges give the time taken, in minutes, to travel along each road.
The fastest Victoria can ride from home to school is
A 80 minutes B 81 minutes C 83 minutes
D 84 minutes E 98 minutes
9 Which of the following represents the fastest route for Victoria’s journey from home to school?
A Home − A − B − School B Home − A − B − D − School
C Home − C − D − School D Home − C − B − D − School
E Home − E − C − B − D − School
The algorithm
You may choose to read through the example first to see a detailed implementation. Here we
write the algorithm to emphasise its repetitive aspect.
Step 1: Assign the starting vertex a label of value zero, and circle the vertex and the zero. Once a vertex and its label have been circled, they cannot be changed.
Step 2: Consider the vertex that has been most recently circled. Call this vertex X and let d be the value of its label. Then, in turn, consider each vertex directly joined to X but not yet permanently circled. For each such vertex, Y say, temporarily assign it the value d + (the weight of edge XY) if Y does not yet have a temporary value; if it does, assign the lesser of d + (the weight of edge XY) and the existing temporary value.
Step 3: Choose the least of all temporary value labels on the network. Make this value label permanent by circling it.
Step 4: Repeat Steps 2 and 3 until the destination vertex has a permanent label.
Step 5: Go backwards through the network, retracing the path of shortest length from the destination vertex to the starting vertex. Start at the destination and move to the circled vertex whose value equals the destination’s value minus the weight of the connecting edge. Continue moving back towards the starting vertex, following this procedure.
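Steps 1 to 5 can be sketched in Python as follows. The sketch assumes positive edge weights and an undirected network stored as a dictionary of neighbours. The function name dijkstra and the test network are illustrative. The test network is only the six roads used in the Bartow-to-Kenton example, treated as a small network of their own rather than the full map.

```python
import math


def dijkstra(graph, start, end):
    """Shortest path from start to end, following Steps 1 to 5 above.

    graph maps each vertex to a dict {neighbour: edge weight}; the network is
    undirected, so every edge is listed under both of its vertices.
    Edge weights are assumed to be positive.
    Returns (length of the shortest path, list of vertices along it).
    """
    label = {start: 0}  # Step 1: the starting vertex is labelled 0 ...
    circled = {start}   # ... and circled (its label is now permanent)
    current = start     # the most recently circled vertex, X
    while end not in circled:  # Step 4: repeat until the destination is circled
        d = label[current]
        # Step 2: give each uncircled neighbour Y of X the lesser of its
        # existing temporary value and d + weight(XY)
        for y, weight in graph[current].items():
            if y not in circled:
                label[y] = min(label.get(y, math.inf), d + weight)
        # Step 3: circle the vertex with the least temporary value
        temporary = {v: value for v, value in label.items() if v not in circled}
        if not temporary:
            raise ValueError("the destination cannot be reached from the start")
        current = min(temporary, key=temporary.get)
        circled.add(current)
    # Step 5: backtrack, each time moving to the circled neighbour whose
    # value equals the current value minus the connecting edge's weight
    path = [end]
    v = end
    while v != start:
        v = next(u for u, weight in graph[v].items()
                 if u in circled and label[u] == label[v] - weight)
        path.append(v)
    path.reverse()
    return label[end], path


roads = {
    "Bartow": {"Stratmoore": 7},
    "Stratmoore": {"Bartow": 7, "Melville": 8, "Osburn": 9},
    "Melville": {"Stratmoore": 8, "Osburn": 5, "Kenton": 13},
    "Osburn": {"Stratmoore": 9, "Melville": 5, "Kenton": 11},
    "Kenton": {"Melville": 13, "Osburn": 11},
}
print(dijkstra(roads, "Bartow", "Kenton"))
# (27, ['Bartow', 'Stratmoore', 'Osburn', 'Kenton'])
```

This agrees with the answer found by inspection. The search stops as soon as Kenton is circled, so vertices that are further away are never circled.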
Explanation Solution
[Figure: weighted network with vertices A, B, C, D, E and F]
Step 1
Assign the starting vertex a zero and circle the vertex and its new value of zero.
A is the starting vertex; it is assigned zero and it is circled.
Step 2
Assign a value to each vertex connected to the starting vertex. The value assigned is the length of the edge connecting it to the starting vertex. Circle the vertex with the lowest assigned value.
The starting vertex A is connected to vertices B and C. Vertex B is assigned 2 and vertex C is assigned 6. Vertex B is circled because it has the lowest value.
Explanation Solution
Step 3
From the newly circled vertex, assign a value to each vertex connected to it by adding the value of each connecting edge to the newly circled vertex’s value. If a connecting vertex already has a value assigned and the new value is less than it, replace it with the new value. If a vertex is circled, it cannot have its value changed. Consider all uncircled vertices and circle the one with the lowest value.
The newly circled vertex B is connected to three uncircled vertices: C, D and E. Starting with vertex B’s value of 2, E is assigned 4 (adding 2 from the connecting edge) and D is assigned 5 (adding 3 from the connecting edge). Vertex C is re-assigned 4 (adding 2 from the connecting edge) because this is lower than 6.
Now there are two uncircled vertices with the lowest assigned value of 4, vertices C and E; it does not matter which one is circled. C is circled.
Step 4
Repeat Step 3 until the destination vertex and its assigned value are circled. The length of the shortest path will be the assigned value of the destination vertex.
Vertex F is the destination vertex, assigned a value of 7. Therefore the shortest path from A to F has a length of 7.
Step 5
The shortest path is found by backtracking. Starting at the destination vertex, move to the circled vertex whose value equals the destination vertex’s assigned value minus the connecting edge value. Continue to subtract the connecting edge value from one circled vertex to the next until you reach the starting vertex.
Note: Once a vertex is assigned a value, it cannot be assigned a larger value, even if it has not been circled yet. You do not need to circle all vertices. Stop when the destination vertex is circled.
To find the shortest path, start at F and consider the two edges connected to it. The edge of length 3 is correct, because 7 minus 3 equals 4, the value of vertex E. Likewise, from E, subtracting the connecting edge of 2 gives 2, the value of vertex B; then subtracting the connecting edge of 2 from B gives zero, the value of A.
Therefore the shortest path from A to F is: A − B − E − F with a length of 7.
Exercise 13E
In the previous applications of networks, the weights on the edges of the graph were used to determine a minimum pathway through the graph. In other applications, it is more important to minimise the number and weights of the edges needed to keep all vertices connected to the graph. For example, a number of towns might need to be connected to a water supply. The cost of connecting the towns can be minimised by connecting each town into a network of water pipes only once, rather than connecting each town to every other town.
Problems of this type are called connector problems. In order to solve connector problems,
you need to learn the language of networks that have as few edges as possible.
A tree is a connected graph that has no loops, multiple edges or cycles.
This tree has seven vertices and six edges.
The number of edges is always one less than the number of vertices.
Spanning trees
Every connected graph will have at least one subgraph that is a tree. If a subgraph is a tree and it connects all of the vertices in the graph, then it is called a spanning tree.
A tree has no loops, multiple edges or cycles.
If a tree has n vertices, it will have n − 1 edges.
A spanning tree is a tree that connects all of the vertices of a graph.
There can be more than one spanning tree for any connected graph. The total weight of a
spanning tree is the total of all the weights on the edges that make up the tree.
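The facts in the box give a quick test. A set of edges is a spanning tree of a graph with n vertices when it has exactly n − 1 edges and none of those edges closes a cycle. A Python sketch that uses a simple union-find structure to detect cycles. The function names and the five-town example (roads taken from the Bartow-to-Kenton example) are illustrative assumptions.

```python
def is_spanning_tree(vertices, edges):
    """Check that `edges`, given as (u, v, weight), form a spanning tree on `vertices`.

    With exactly n - 1 edges, the edges form a spanning tree if and only if
    none of them closes a cycle.
    """
    if len(edges) != len(vertices) - 1:
        return False
    group = {v: v for v in vertices}  # union-find: a representative for each group

    def find(v):
        while group[v] != v:
            v = group[v]
        return v

    for u, v, _ in edges:
        ru, rv = find(u), find(v)
        if ru == rv:  # u and v already connected: this edge closes a cycle (or is a loop)
            return False
        group[ru] = rv  # merge the two groups
    return True


def total_weight(edges):
    return sum(w for _, _, w in edges)


towns = ["Bartow", "Stratmoore", "Melville", "Osburn", "Kenton"]
tree = [("Bartow", "Stratmoore", 7), ("Stratmoore", "Melville", 8),
        ("Melville", "Osburn", 5), ("Osburn", "Kenton", 11)]
print(is_spanning_tree(towns, tree), total_weight(tree))  # True 31

# Four edges again, but Stratmoore-Melville-Osburn forms a cycle and Bartow is left out
not_tree = [("Stratmoore", "Melville", 8), ("Melville", "Osburn", 5),
            ("Stratmoore", "Osburn", 9), ("Osburn", "Kenton", 11)]
print(is_spanning_tree(towns, not_tree))  # False
```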
Explanation Solution
a 1 Count the number of vertices and edges in the graph. There are 9 vertices and 13 edges.
2 Calculate the number of edges in the spanning tree. The spanning tree will have 9 − 1 = 8 edges.
Weight = 5 + 2 + 1 + 4 + 5 + 2 + 3 + 4
= 26
Explanation Solution
Start with vertex A. The smallest weighted edge from vertex A is to B, with weight 2.
Look at vertices A and B. The smallest weighted edge from either vertex A or vertex B is from A to D, with weight 5.
Look at vertices A, B and D. The smallest weighted edge from vertex A, B or D is from D to C, with weight 3.
Look at vertices A, B, D, C and E. The smallest weighted edge from vertex A, B, D, C or E is from E to F, with weight 2.
All vertices have been included in the graph. This is the minimum spanning tree.
Add the weights to find the total weight of the minimum spanning tree. The total weight of the minimum spanning tree is 2 + 5 + 3 + 5 + 2 = 17.
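The worked example grows the tree one edge at a time, always adding the smallest-weighted edge that joins a vertex already in the tree to a vertex not yet in it. This is Prim’s algorithm. A Python sketch of that procedure, using a priority queue (heapq) to find the smallest such edge. The prim function and the test network are illustrative assumptions. The test network is again just the six Bartow-to-Kenton roads, so the total it reports applies to that small network, not the full map.

```python
import heapq


def prim(graph, start):
    """Grow a minimum spanning tree from `start` (Prim's algorithm).

    graph maps each vertex to a dict {neighbour: weight}; undirected.
    Returns (tree edges in the order they were added, total weight).
    """
    in_tree = {start}
    # Candidate edges leaving the tree, stored as (weight, from, to)
    frontier = [(w, start, v) for v, w in graph[start].items()]
    heapq.heapify(frontier)
    tree = []
    while frontier and len(in_tree) < len(graph):
        w, u, v = heapq.heappop(frontier)  # smallest candidate edge
        if v in in_tree:
            continue  # both ends already in the tree: adding it would form a cycle
        in_tree.add(v)
        tree.append((u, v, w))
        for x, wx in graph[v].items():
            if x not in in_tree:
                heapq.heappush(frontier, (wx, v, x))
    if len(in_tree) < len(graph):
        raise ValueError("the graph is not connected")
    return tree, sum(w for _, _, w in tree)


roads = {
    "Bartow": {"Stratmoore": 7},
    "Stratmoore": {"Bartow": 7, "Melville": 8, "Osburn": 9},
    "Melville": {"Stratmoore": 8, "Osburn": 5, "Kenton": 13},
    "Osburn": {"Stratmoore": 9, "Melville": 5, "Kenton": 11},
    "Kenton": {"Melville": 13, "Osburn": 11},
}
tree, total = prim(roads, "Bartow")
print(tree)   # edges added: 7, then 8, then 5, then 11
print(total)  # 31
```

The edge Stratmoore−Osburn (9) is popped but skipped, because by then both of its ends are already in the tree.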
Water is to be piped from a water tank to seven outlets on a property. The distances (in
metres) of the outlets from the tank and from each other are shown in the network below.
Starting at the tank, the aim is to find the minimum length of pipe, in metres, which will
be needed to have water piped to all outlets in the property.
[Figure: network linking the Tank and Outlets A to G, with distances in metres]
a On the diagram, show where the water pipes will be placed in order to minimise the
length required.
b Calculate the total length, in metres, of the water pipe that is required to obtain this
minimum length.
Explanation Solution
a 1 The water pipes will be a minimum length if they are placed on the edges of the minimum spanning tree for the network. A starting point for Prim’s algorithm is the vertex that is connected to the tank by the edge with the smallest weight. The starting vertex (Tank), the edge and the vertex it connects to form the beginning of the minimum spanning tree.
2 Follow Prim’s algorithm to find the minimum spanning tree.
b Add the weights of the minimum spanning tree. Write your answer.
The length of water pipe required is 2 + 6 + 5 + 8 + 7 + 10 + 6 = 44 metres.
Exercise 13F
Spanning trees
Example 11 1 A weighted graph is shown on the right.
a How many edges must be removed in order to leave a spanning tree?
b Remove some edges to form three different trees.
c For each tree in part b, find the total weight.
Minimum spanning trees and Prim’s algorithm
Example 12 2 Find a minimum spanning tree for each of the following graphs and give the total weight.
[Figures: weighted graphs a to d]
Connector problems
Example 13 3 In the network opposite, the vertices represent water tanks on a large property and the edges represent pipes used to move water between these tanks. The numbers on each edge indicate the lengths of pipes (in m) connecting different tanks.
Determine the shortest length of pipe needed to connect all water storages.
5 The minimum spanning tree for the graph below includes the edge with weight
labelled k.
The total weight of all edges for
C a = 10 and b = 14
D a = 10 and b = 15
E a = 13 and b = 12
Loop A loop is an edge that connects a vertex to itself. In the graph above
there is a loop at vertex C.
Degree of a vertex The degree of a vertex is the number of times edges attach to the vertex.
The degree of vertex A is written as deg(A).
In the graph above, deg(A) = 4, deg(B) = 2 and deg(D) = 2.
A loop contributes 2 to the degree of its vertex. In the graph above, deg(C) = 4.
Multiple edge Sometimes a graph has two or more edges connecting the same pair of vertices. These are called multiple edges. In the graph above, there are multiple edges between vertex A and vertex D.
Simple graph Simple graphs are graphs that do not have loops and do not have
multiple edges.
Isolated vertex An isolated vertex is one that is not connected to any other vertex.
Isolated vertices have degree of zero.
Degenerate A degenerate graph has no edges. All of the vertices are isolated.
Connected graph A connected graph has no isolated vertex. There is a path between
each pair of vertices.
Complete graph A complete graph has every vertex connected to every other vertex by
an edge.
Subgraph A subgraph is a graph that is part of a larger graph and has some of the
same vertices and edges as that larger graph. A subgraph does not have
any extra vertices or edges that do not appear in the larger graph.
Face An area in a graph or network that can only be reached by crossing an
edge. One such area is always the area surrounding a graph.
Planar graph A planar graph can be drawn so that no two edges overlap or intersect,
except at the vertices.
This graph is planar. This graph is not planar.
Adjacency matrix An adjacency matrix is a square matrix that uses a zero or an integer to record the number of edges connecting each pair of vertices in the graph.
Circuit A circuit is a trail (no repeated edges) that starts and ends at the same vertex.
In the graph above, A−F−D−E−F−B−A is an example of a circuit.
Cycle A cycle is a path (no repeated edges or vertices) that starts and ends at the same vertex. The start and end vertex is an exception to repeated vertices.
In the graph above, B−F−D−C−B is an example of a cycle.
Eulerian trail An Eulerian trail is a trail (no repeated edges) that includes all of the
edges of a graph. Eulerian trails exist if the graph is connected and has
exactly zero or two vertices of odd degree. The remaining vertices have
even degree.
Eulerian circuit An Eulerian circuit is a trail (no repeated edges) that includes all of the
edges of a graph and that starts and ends at the same vertex. Eulerian
circuits exist if the graph is connected and has all of the vertices with an
even degree.
Hamiltonian path A Hamiltonian path is a path (no repeated edges or vertices) that
includes all of the vertices of a graph.
Hamiltonian cycle A Hamiltonian cycle is a path (no repeated edges or vertices) that includes all of the vertices of a graph and starts and ends at the same vertex. The starting vertex is an exception to repeated vertices.
Weighted graph A weighted graph has numbers, called weights, associated with the
edges of a graph. The weights often represent physical quantities as
additional information to the edge, such as time, distance or cost.
Shortest path The shortest path between two vertices of a network is the path along edges for which the total of the weights is the minimum for that network.
Shortest path problems involve finding minimum distances, costs
or times through a network. Shortest paths can be determined by
inspection or by using Dijkstra’s algorithm.
Spanning tree A spanning tree is a tree that connects every vertex of a graph.
A spanning tree is found by counting the number of vertices (n) and removing enough edges so that there are n − 1 edges left that connect all of the vertices.
Minimum spanning tree A minimum spanning tree is a spanning tree for which the sum of the weights of the edges is as small as possible.
Skills checklist
Download this checklist from the Interactive Textbook, then print it and fill it out to check your skills.
13A 3 I can define and identify simple graphs, isolated vertices, degenerate graphs,
connected graphs, bridges and subgraphs.
13C 10 I can define walks, trails, paths, circuits and cycles through a graph.
13D 15 I can calculate the shortest path from one vertex to another by inspection.
13D 16 I can calculate the shortest path from one vertex to another using Dijkstra’s algorithm.
Multiple-choice questions
1 The minimum number of edges for a graph with seven vertices to be connected is:
A 4 B 5 C 6 D 7 E 21
2 Which of the following graphs is a spanning tree for the network shown?
3 For the graph shown, which vertex has degree 5?
4 A connected graph with 15 vertices divides the plane into 12 regions. The number of
edges connecting the vertices in this graph will be:
A 15 B 23 C 24 D 25 E 27
6 A connected planar graph divides the plane into a number of regions. If the graph has
eight vertices and these are linked by 13 edges, then the number of regions is:
A 5 B 6 C 7 D 8 E 10
12 Of the following graphs, which one has both an Eulerian circuit and a Hamiltonian cycle?
13 A graph with six vertices is connected with the minimum number of edges. The
minimum number of extra edges needed to make this a complete graph is
A 5 B 6 C 10 D 14 E 16
16 The network below shows the distance, in metres, between points. The shortest path
between S and T has length 36 m.
[Figure: network with vertices S, A, B, C, D, F, G, H and T; one edge has unknown length x]
The value of x is:
A 4 B 5 C 6 D 7 E 8
[Figures: weighted networks a to d]
2 The map shows six campsites, A, B, C, D, E and F, which are joined by tracks. The numbers by the paths show the lengths, in kilometres, of that section of track.
[Figure: map of campsites A to F around a lake]
a i Complete the graph opposite, which shows the shortest direct distances between campsites. (The campsites are represented by vertices and tracks are represented by edges.)
ii A telephone cable is to be laid to enable each campsite to phone each other
campsite. For environmental reasons, cables can only be laid along the tracks
and cables can only connect to one another at the campsites. What is the
minimum length of cable necessary to complete this task?
iii Fill in the missing entries for the adjacency matrix shown for the completed graph formed above.
   A B C D E F
A  0 1 0 1 1 1
B  1 0 1 0 0 0
C  0 1 0 1 1 0
D  1 0 − − − −
E  1 0 − − − −
F  1 0 − − − −
[Figure: weighted graph of the villages Bartow, Croghon, Kenton, Melville, Osburn and Stratmoore]
a What is the degree of the vertex representing Melville?
b Determine the sum of the degrees of the vertices of this graph.
c Verify Euler’s formula for this graph.
A salesperson might need to travel to every village in this network to conduct business.
d If the salesperson follows the path Stratmoore − Melville − Kenton − Osburn − Melville − Croghon − Bartow − Stratmoore, has the salesperson followed a
Hamiltonian cycle? Give a reason to justify your answer.
e If the salesperson follows the path Croghon − Bartow − Stratmoore − Melville − Kenton − Osburn, what is the mathematical term for this path?
It would make sense for the salesperson to avoid visiting any village more than once, and it would also make sense for them to return ‘home’ after travelling the shortest distance possible.
f If the salesperson starts and ends in Bartow, find the shortest route and state the
shortest distance the salesperson would have to travel.
g If the salesperson can start and end at any village in the network, what is the shortest
route possible?
A road inspector must travel along every road connecting the six villages.
h Explain why the inspector could not follow an Eulerian circuit through this road network.
i The inspector may start and end their route at different villages, but would like to
travel along each road once only. Which villages can the inspector start their route
from? Write down a path the inspector could take to complete their work.
j The speed limit for each of these roads is 60 km/h. If the inspector must complete their work by 5 p.m., what is the latest time that the inspector can begin their work?
New electrical cables connecting the villages are required. They will be installed along
some of the roads listed in the graph above. These cables will form a connected graph
and the shortest total length of cable will be used.
k Give a mathematical term to describe a graph that represents these cables.
l Draw the graph that represents these cables and find the total length of cable.
Flow, matching and
scheduling problems
Chapter objectives
• How do we define a directed graph?
• How do we define flow?
• How do we calculate the maximum flow through a network?
• How do we draw and use a bipartite graph to solve allocation problems?
• How do we find the optimal allocation of multiple groups of objects?
• How do we identify predecessors of an activity?
• How do we draw an activity network and use it to plan for a project?
• How do we account for float times in our project?
• How do we find the earliest starting time and latest finishing time for an activity in a project?
• How do we identify the critical path of an activity network?
In the previous chapter, undirected graphs were used to define and represent situations.
In this chapter, directed graphs will be used to model networks and solve problems
involving travel, connection, flow, matching, allocation and scheduling.
Directed graphs
In the previous chapter, graphs were used to represent connections between people, places
or objects. The vertices of a graph represented objects, such as towns, and edges represented
the connections between them, such as roads. Weighted graphs included extra numerical
information about the connections, such as distance, time or cost. When a graph has this
numerical information we call it a network.
A directed graph, or digraph, records directional information
on networks using arrows on the edges. The network on the right
shows roads around a city. The vertices are the intersections of
the roads and the edges are the possible road connections between
the intersections. The arrows show that some of the roads only
allow traffic in one direction, while others allow traffic in both directions.
Maximum flow
If pipes of different capacities are connected one after the other, the maximum flow
through the pipes is equal to the minimum capacity of the individual pipes.
Explanation Solution
a Look at the subgraph that includes town C. The smallest capacity of the individual roads is 300 cars per hour. This will be the maximum flow through town C.
[Figure: road network from A (source) to E (sink) through towns including C and D, with capacities in cars per hour]
It is difficult to determine the maximum flow by inspection for directed networks that
involve many vertices and edges. We can simplify the search for maximum flow by
searching for cuts within the digraph.
A cut divides the network into two parts, completely separating the source from the sink. It is helpful to think of cuts as imaginary breaks within the network that completely block the flow through that network. For the network of water pipes shown in this diagram, the dotted line is a cut. This cut completely blocks the flow of water from the source (S) to the sink (A).
[Figure: network of water pipes from source S to sink A, with a dotted line showing a cut]
The dotted line on the graph above is a valid cut because it separates the source and the sink completely. No material can flow from the source to the sink.
The dotted line on the graph above is not a valid cut because material can still flow from the source to the sink. Not all of the pathways from source to sink have been blocked by the cut.
Capacity of a cut
The cut capacity is the sum of all the capacities of the edges that the cut passes through,
taking into account the direction of flow. The capacity of an edge is only counted if it
flows from the source side to the sink side of the cut.
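The counting rule for a cut can be sketched in a few lines of Python. This is a minimal illustration on a small hypothetical network (the vertices S, A, B, T and their capacities are invented for the example, not taken from the diagrams). A cut is described by the set of vertices on the source side, and only edges leaving that set are added. The helper name `cut_capacity` is also hypothetical.

```python
# Hypothetical network: (from, to) -> capacity
capacities = {
    ("S", "A"): 7,
    ("S", "B"): 5,
    ("A", "B"): 3,
    ("A", "T"): 4,
    ("B", "T"): 8,
}


def cut_capacity(capacities, source_side):
    """Add the capacities of edges that flow from the source side to the sink side.

    Edges that flow from the sink side back to the source side are not counted.
    """
    return sum(
        cap
        for (start, end), cap in capacities.items()
        if start in source_side and end not in source_side
    )


print(cut_capacity(capacities, {"S"}))       # S->A + S->B = 12
print(cut_capacity(capacities, {"S", "B"}))  # S->A + B->T = 15; A->B is not counted
```

In the second cut, the edge from A to B crosses the cut from the sink side to the source side, so it is ignored, just as the edge from F to B is ignored in C2 below.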
Explanation Solution
All edges in C1 are counted. The capacity of C1 = 15 + 20 = 35
Note that the edge from F to B is not counted in C2. The capacity of C2 = 14 + 20 = 34
All edges in C3 are counted. The capacity of C3 = 14 + 15 + 20 = 49
Note that the edge from D to C is not counted in C4. The capacity of C4 = 20 + 10 = 30
The capacity of a cut is important to help determine the maximum flow through any digraph.
Look for the smallest, or minimum, cut capacity that exists in the graph. This will be
the same as the maximum flow that is possible through that graph. This is known as the
maximum-flow minimum-cut theorem.
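This chapter finds the maximum flow by examining cuts. A different technique, the Edmonds-Karp method, can be used as a cross-check. It repeatedly finds a shortest path from source to sink that still has spare capacity and pushes as much flow along it as that path allows. The sketch below runs it on a small hypothetical network; the function name `max_flow` is illustrative. By the maximum-flow minimum-cut theorem, its answer should equal the smallest cut capacity.

```python
from collections import deque

# Hypothetical network: (from, to) -> capacity
capacities = {
    ("S", "A"): 7,
    ("S", "B"): 5,
    ("A", "B"): 3,
    ("A", "T"): 4,
    ("B", "T"): 8,
}


def max_flow(capacities, source, sink):
    """Edmonds-Karp: push flow along shortest paths that still have spare capacity."""
    # residual[u][v] is the capacity still available from u to v
    residual = {source: {}, sink: {}}
    for (u, v), cap in capacities.items():
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0) + cap
        residual[v].setdefault(u, 0)  # reverse entry lets flow be redirected later

    total = 0
    while True:
        # Breadth-first search for a source-to-sink path with spare capacity
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, spare in residual[u].items():
                if spare > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no path with spare capacity remains

        # Trace the path back from the sink
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]

        # The path can carry only as much as its smallest spare capacity
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        total += bottleneck


print(max_flow(capacities, "S", "T"))  # 12
```

The cuts {S} | {A, B, T} and {S, A, B} | {T} both have capacity 12 in this network, so the computed flow matches the minimum cut. For pipes connected one after the other, the same function returns the smallest capacity in the chain.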
Explanation Solution
1 Mark in all possible cuts on the network.
[Figure: network with all possible cuts, C1 to C6, marked]
The koala sanctuary in Cowes allows visitors to walk through their park. The park is represented by the network below, where each edge represents a one-way track for visitors through the park. The direction of travel on each track is shown by an arrow. The numbers on the edges indicate the maximum number of people who are permitted to walk along each track each hour.
[Figure: directed network of one-way tracks from A to G through B, C, D, E and F, with the maximum number of people per hour on each track]
Explanation Solution
a Firstly, consider the edges coming from the source. When calculating the maximum flow through a network, always assume the initial edges from the source are flowing at their maximum capacity.
The vertex A is the source in this network. The three edges connected to A, flowing towards B, D and E, will be flowing at maximum capacity because they are coming from the source. Draw lines along these edges and circle their capacities as they are flowing at maximum capacity.
[Figure: the sanctuary network with the edges from A to B, D and E marked at maximum capacity]
Explanation Solution
At vertex D there is a total of 25 flowing into it, coming from vertices B and A. This flow of 25 can be redistributed to the two edges coming from D towards the sink at G. Of the 25, 9 can flow directly to G and 13 can flow from D to F. The maximum capacities of these edges can be achieved, so circle these numbers along the edges.
[Figure: the sanctuary network with the flows out of D marked]
The maximum flow through the network is the total
amount incoming to the sink vertex G, which is
10 + 9 + 21 = 40. Therefore a maximum of 40 people
can walk through the koala sanctuary each hour.
b A group of 9 must begin at vertex A, only pass through edges with a capacity greater than or equal to 9 and end at vertex G. Take note of the direction of the arrows.
Starting from vertex A there are only two edges the group of 9 people can walk along: the edges going to vertices B and D. Walking through vertex B there is one walk possible: A − B − C − G. Walking through vertex D there are two walks possible: A − D − G and A − D − F − G.
Explanation Solution
c From the starting vertex A, consider the largest possible group that could start a walk through the sanctuary and then analyse how many of that group could then walk to vertex G given the capacities of each of the edges throughout the network.
Starting at vertex A, the largest possible group of people that can enter the sanctuary is 20. However, after reaching vertex D the group of 20 cannot stay together as they move towards vertex G, because the edges from D can only take a maximum of 13 people. From D, moving towards G, a group of 13 could pass through with no other restrictions. This is the largest group of people that can enter the sanctuary at vertex A and pass through to vertex G, together as one group, given the restrictions of the edge capacities.
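Parts b and c both depend on one idea. A group that stays together is limited by the smallest capacity along its walk. A minimal Python sketch of this idea lists every walk and checks that limit. Because the sanctuary diagram is not reproduced here, the capacities below are hypothetical. They were chosen to be consistent with the walks described in parts b and c. The function names are also illustrative.

```python
# Hypothetical track capacities (people per hour), for illustration only
tracks = {
    ("A", "B"): 15, ("A", "D"): 20, ("A", "E"): 8,
    ("B", "C"): 10, ("C", "G"): 10,
    ("D", "G"): 9, ("D", "F"): 13,
    ("E", "F"): 8, ("F", "G"): 21,
}


def walks(tracks, start, end, path=None):
    """Yield every walk from start to end that follows the arrows without revisiting a vertex."""
    path = path or [start]
    if path[-1] == end:
        yield path
        return
    for (u, v) in tracks:
        if u == path[-1] and v not in path:
            yield from walks(tracks, start, end, path + [v])


def group_limit(tracks, walk):
    """A group that stays together is limited by the smallest capacity on its walk."""
    return min(tracks[(u, v)] for u, v in zip(walk, walk[1:]))


def walks_for_group(tracks, start, end, size):
    """Walks on which a group of the given size can stay together (part b)."""
    return [w for w in walks(tracks, start, end) if group_limit(tracks, w) >= size]


def largest_group(tracks, start, end):
    """Largest group that can stay together, and its walk (part c)."""
    return max((group_limit(tracks, w), w) for w in walks(tracks, start, end))


print(walks_for_group(tracks, "A", "G", 9))
print(largest_group(tracks, "A", "G"))  # (13, ['A', 'D', 'F', 'G'])
```

Listing every walk is only practical for small networks like this one, but it mirrors the reasoning used by hand.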
Water enters a network of pipes at either Source 1 or Source 2 and flows out at either
Outlet 1 or Outlet 2.
The numbers next to the arrows represent the maximum rate, in kilolitres per minute, at
which water can flow through each pipe.
[Figure: network of pipes from Source 1 and Source 2 to Outlet 1 and Outlet 2, with capacities in kilolitres per minute]
Determine the maximum rate, in kilolitres per minute, at which water can flow from these
pipes into the ocean at Outlet 1 and Outlet 2.
Note that although this method gives us the maximum flow for each outlet, we cannot
always add these values up to find the total maximum flow through the system, because
we might not be able to achieve maximum flow for every outlet at the same time.
Explanation Solution
The outlets need to be considered separately.
Outlet 1
Look for the minimum cut that prevents water reaching Outlet 1.
Note: The pipe with capacity 200 leading towards Outlet 2 does not need to be considered in any cut, because water in this pipe can never reach Outlet 1.
The capacity of C1 is: 400 + 800 = 1200
The capacity of C2 is: 400 + 300 = 700
The capacity of C3 is: 400 + 400 = 800
The capacity of C4 is: 300 + 100 + 300 = 700
The capacity of C5 is: 300 + 400 = 700
The minimum cut/maximum flow is 700 kilolitres per minute.
[Figure: the pipe network with cuts C1 to C5 marked]
Note: The maximum flow through a network with two sources can also be determined by tracking the flow
as outlined in Example 4.
Exercise 14A
Directed graphs
1 Find the number of vertices that can be reached from vertex A in each of the directed
graphs below.
[Figures: directed graphs a and b]
Example 2
2 Determine the capacity of each of the cuts in the digraph opposite. The source is vertex S and the sink is vertex T.
[Figure: digraph from S to T with cuts C1, C2 and C3 marked]
Example 3, Example 4
4 Find the maximum flow for each of the following graphs. The source is vertex S and the sink is vertex T.
[Figures: networks a to d, each with source S and sink T]
Minimum-cut maximum-flow
5 A train journey consists of a connected sequence of stages formed by edges on the directed network opposite from Arlie to Bowen. The number of available seats for each stage is indicated beside the corresponding edge,
[Figure: network from Arlie to Bowen with cuts A to E marked]
[Figures: two networks, each with two sources and two sinks]
a How many different routes from the source to the sink are possible?
b Determine the maximum flow from the source to the sink.
b Determine the maximum flow of people from the entrance to the exit of the museum.
c One group of primary school students would like to walk through the museum.
The teacher explains that this can happen unsupervised if all students in the group
remain together, not separating to explore different routes. Given that a group of
students must stay together from the entrance to the exit, what is the largest group of
students possible that can pass through the museum every 30 minutes?
10 The number of these cuts with a capacity equal to the maximum flow of liquid from the
source to the sink, in litres per minute, is
A 0 B 1 C 2 D 3 E 4
Bipartite graphs
In some situations, the vertices of a graph belong to two separate sets. Consider a music school that has four teachers: Anastasios, Panayioti, Olga and Irene. Between them, these teachers can teach four different instruments: trumpet, piano, violin and clarinet. The teachers and instruments are represented by vertices, arranged vertically as shown in the diagram opposite.
[Figure: bipartite graph with the teachers Anastasios, Panayioti, Olga and Irene in one column and the instruments Trumpet, Piano, Violin and Clarinet in the other]
The edges of the diagram connect the teachers to the instruments they can teach.
This type of graph is called a bipartite graph. Each edge in a bipartite graph joins a vertex
from one group to a vertex in the other group.
In the situation described above, the school would need to match each teacher to one instrumental class. This is an example of an allocation problem. The bipartite graph above shows the instrument(s) that each teacher can teach and can help the school assign each teacher to an instrument. Anastasios is the only teacher who can teach clarinet and Irene can only teach violin, therefore Olga must teach piano and Panayioti must teach trumpet.
Example 6
Nick, Maria, David and Subitha are presenters on a TV travel show. Each presenter will
be assigned a story to film about one country that they have visited before.
Nick has visited Greece, Malaysia and Brazil. Subitha has visited Malaysia and Brazil.
Maria has visited Greece and France. David has visited Malaysia.
Construct a bipartite graph of the information above and use it to decide on the
assignment of each presenter to one country.
Explanation Solution
The two groups of items are Presenters and Countries. Draw a vertex for each presenter in one column and each country in another. Join the vertex of each presenter to the vertex of each country they have visited.
[Figure: bipartite graph with Nick, Maria, David and Subitha in one column and Greece, Malaysia, France and Brazil in the other]
Given that each presenter can only visit one country, the final allocations can be deduced by eliminating impossible allocations.
Maria is allocated to France, therefore she cannot visit Greece. Nick is the only other presenter who has visited Greece, therefore he must be allocated that country.
As David is allocated to Malaysia, Subitha has only one other country available to visit. Subitha must be allocated to Brazil. This is also supported by the fact that Nick must be allocated to Greece.
Nick will be allocated Greece.
Maria will be allocated France.
David will be allocated Malaysia.
Subitha will be allocated Brazil.
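Eliminating impossible allocations by hand works well for a small bipartite graph. A common programmatic alternative, which is not the method this textbook uses, is the augmenting-path approach. Each presenter tries their countries in turn. If a country is already taken, its current holder is asked to move to another country they have visited. The sketch below uses the lists from Example 6 and assumes Nick's third country is Brazil. The helper `find_allocation` is a hypothetical name.

```python
def find_allocation(options):
    """Give each person a different item from their own list, or return None."""
    holder = {}  # item -> person currently allocated to it

    def try_assign(person, tried):
        for item in options[person]:
            if item in tried:
                continue
            tried.add(item)
            # Use a free item, or move its current holder to another of their items
            if item not in holder or try_assign(holder[item], tried):
                holder[item] = person
                return True
        return False

    for person in options:
        if not try_assign(person, set()):
            return None  # no complete allocation exists
    return {person: item for item, person in holder.items()}


visited = {  # Nick's list assumes Brazil as his third country
    "Nick": ["Greece", "Malaysia", "Brazil"],
    "Maria": ["Greece", "France"],
    "David": ["Malaysia"],
    "Subitha": ["Malaysia", "Brazil"],
}
print(find_allocation(visited))
```

For these lists only one complete allocation exists, so the sketch reaches the same result as the worked solution.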
Step 1: Subtract the lowest value in each row from every value in that row.
Step 2: If the minimum number of lines required to cover all the zeros in the table is equal to the number of allocations to be made, jump to step 6. Otherwise, continue to step 3.
Step 3: If a column does not contain a zero, subtract the lowest value in that column from every value in that column.
Step 4: If the minimum number of lines required to cover all the zeros in the table is equal to the number of allocations to be made, jump to step 6. Otherwise, continue to step 5a.
Step 5a: Add the smallest uncovered value to any value that is covered by two lines. Subtract the smallest uncovered value from all the uncovered values.
Step 5b: Return to step 4.
Step 6: Draw a bipartite graph with an edge for every zero value in the table.
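Steps 1, 3 and 6 are simple table operations and can be sketched in Python. This is a partial sketch that assumes a square table of whole-number costs. It does not implement the line-covering test in steps 2 and 4 or the adjustment in step 5a. The cost table and the names W1 to W3 and T1 to T3 are hypothetical. A brute-force search over every allocation is included to confirm the minimum. That search is a different technique and is only practical for small tables.

```python
from itertools import permutations


def reduce_rows(costs):
    """Step 1: subtract the lowest value in each row from every value in that row."""
    return [[value - min(row) for value in row] for row in costs]


def reduce_columns(table):
    """Step 3: for any column without a zero, subtract that column's lowest value."""
    result = [row[:] for row in table]
    for j in range(len(result[0])):
        column = [row[j] for row in result]
        if 0 not in column:
            lowest = min(column)
            for row in result:
                row[j] -= lowest
    return result


def zero_edges(table, row_names, col_names):
    """Step 6: one bipartite-graph edge for every zero in the table."""
    return [
        (row_names[i], col_names[j])
        for i, row in enumerate(table)
        for j, value in enumerate(row)
        if value == 0
    ]


def cheapest_total(costs):
    """Brute-force check (not the Hungarian algorithm): try every allocation."""
    n = len(costs)
    return min(
        sum(costs[i][j] for i, j in enumerate(order))
        for order in permutations(range(n))
    )


# Hypothetical cost table: rows are workers W1-W3, columns are tasks T1-T3
costs = [
    [5, 8, 6],
    [7, 9, 8],
    [4, 6, 9],
]
reduced = reduce_columns(reduce_rows(costs))
print(reduced)  # [[0, 1, 0], [0, 0, 0], [0, 0, 4]]
print(zero_edges(reduced, ["W1", "W2", "W3"], ["T1", "T2", "T3"]))
print(cheapest_total(costs))  # 19
```

In the reduced table, the zeros at W1 → T3, W2 → T2 and W3 → T1 lie in different rows and columns. At least three lines are therefore needed to cover all zeros, so step 6 applies. That allocation costs 6 + 9 + 4 = 19, which matches the brute-force minimum.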
Exercise 14B
Bipartite graphs
Example 6 1 a On Monday, three workers are each to be allocated one task at work. The bipartite
graph below shows which task(s) each person is able to complete.
Worker 1 Task 1
Worker 2 Task 2
Worker 3 Task 3
If each person completes a different task, write down the task each worker must be
allocated to on Monday.
b On Tuesday, the same three workers will be allocated to a new set of tasks. The
bipartite graph below shows which task(s) each person is able to complete.
Worker 1 Task 4
Worker 2 Task 5
Worker 3 Task 6
Given that Worker 2 must complete Task 6, write down the new task each worker
must be allocated to on Tuesday.
2 It is Miko’s birthday and his sister Aria has asked some of his friends to assist with the
celebrations by purchasing some items for a party. The bipartite graph below shows
which item(s) each person is able to purchase on their way to the party.
Niranjan Cake
Nishara Serviettes
Dhinesh Balloons
Dhishani Candles
Each friend must purchase an item. Write down which item each friend must purchase.
3 The sport of ice hockey has six player positions: goalie, left defence, right defence, right wing, left wing and centre. A group of six have decided to play. Only one person is happy to play goalie. The other five people must be allocated to the other five positions.
The bipartite graph below shows which positions each of the five players can play.
[Figure: bipartite graph joining Players 1 to 5 to the positions they can play]
Each player plays a different position. Write down two possible allocations, describing which position each player must play.
4 Gloria, Minh, Carlos and Trevor are buying ice-cream. They have a choice of five
flavours: chocolate, vanilla, peppermint, butterscotch and strawberry. Gloria likes
vanilla and butterscotch, but not the others. Minh only likes strawberry. Carlos likes
chocolate, peppermint and butterscotch. Trevor likes all flavours.
a Explain why a bipartite graph can be used to display this information.
b Draw a bipartite graph with the people on the left and flavours on the right.
c What is the degree of the vertex representing Trevor?
Use the Hungarian algorithm to select the ‘best’ student for each event.
The costs (in $’000s) for each team to play at each of the grounds are given in the table below.
Team     Home   Away   Neutral
Champs   10     9      8
Stars    7      4      5
Wests    8      7      6
Determine a schedule that will minimise the total cost of playing the three games and determine this cost.
Note: There are two different ways of scheduling the games to achieve the same minimum cost. Identify both of these.
Determine a service vehicle assignment that will ensure that the total distance travelled
by the service vehicles is minimised. Determine this distance.
Based on the bipartite graph, which one of the following allocations is not possible?
A Aaliyah: Right wing; Brock: Forward; Corazon: Defender; Daniel: Left wing
B Aaliyah: Right wing; Brock: Defender; Corazon: Forward; Daniel: Left wing
C Aaliyah: Forward; Brock: Right wing; Corazon: Left wing; Daniel: Defender
D Aaliyah: Forward; Brock: Defender; Corazon: Left wing; Daniel: Right wing
E Aaliyah: Forward; Brock: Right wing; Corazon: Defender; Daniel: Left wing
12 The manager of the bank wants to allocate the tasks so as to minimise the total time
taken to complete the five tasks. If each person starts their allocated task at the same
time, then the first person to finish could be either
A Anita or Brad B Anita or Elektra C Brad or Carmen
D Brad or Dexter E Brad or Elektra
13 Before the tasks are performed, it is found that Elektra will only require 4 hours to
complete Task 5 rather than 9 hours. If the tasks are allocated based on this new
information, the minimum total time for all tasks will
A increase by 4 hours. B decrease by 4 hours. C decrease by 3 hours.
D decrease by 2 hours. E decrease by 1 hour.
14 Four people, Xena, Wilson, Yasmine, Zachary, are each assigned a different job by
their manager. The table below shows the time, in hours, that each person would take
to complete each of the four jobs.
Wilson takes 6 minutes to complete Job 4, while Yasmine only takes 5 minutes to
complete Job 4. Both Xena and Zachary take p minutes to complete Job 4.
The manager will allocate the jobs as follows:
Job 1 to Wilson Job 2 to Xena
Job 3 to Yasmine Job 4 to Zachary
This allocation will achieve the minimum total completion time if the value of p is not
greater than
A 6 B 7 C 8 D 9 E 10
For any project, if activity A must be completed before activity B can begin then activity A
is said to be an immediate predecessor of activity B. The activities within a project can
have multiple immediate predecessors and these are usually recorded in a table called a
precedence table.
This precedence table shows some of the activities involved in a project and their immediate predecessors.
Activity   Immediate predecessors
A          −
B          −
C          A
D          B
E          B
F          C, D
G          E, F
The information in the precedence table can be used to draw a network diagram called an activity network. Activity networks do not have labelled vertices, other than the start and finish of the project. The activities in the project are represented by the edges of the diagram, so it is the edges that must be labelled, not the vertices.
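When drawing an activity network from a precedence table, it helps to draw the activities in an order where every activity comes after all of its immediate predecessors. A minimal Python sketch that finds such an order for the table above follows. It assumes Python 3.9 or later, which provides the standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Immediate predecessors, copied from the precedence table above
predecessors = {
    "A": [],
    "B": [],
    "C": ["A"],
    "D": ["B"],
    "E": ["B"],
    "F": ["C", "D"],
    "G": ["E", "F"],
}

# static_order() lists each activity only after all of its predecessors
drawing_order = list(TopologicalSorter(predecessors).static_order())
print(drawing_order)
```

More than one valid order usually exists, for example A and B can be drawn in either order. Any order printed here will still respect every row of the table.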
Activity networks
When activity A must be completed before activity B can begin, activity A is called an
immediate predecessor of activity B.
A table containing the activities of a project, and their immediate predecessors, is called a
precedence table.
An activity network can be drawn from a precedence table. Activity networks have edges representing activities. The vertices are not labelled, other than the start and finish of the project.
Dummy activities
Sometimes two activities will have some of the same immediate predecessors, but not all of them. In this very simple precedence table, activity D and activity E share the immediate predecessor activity B, but each has an immediate predecessor activity that the other does not.
Activity   Immediate predecessors
A          −
B          −
C          −
D          A, B
E          B, C
The start and finish of the activity network are shown in the diagram above. We need to use
the precedence information for activity D and activity E to join these two parts together.
Activity D needs to follow directly from activity A and activity B, but we can only draw one
edge for activity D. Activity E needs to follow directly from both activity B and activity C,
but again we only have one edge for activity E, not two.
The solution is to draw the diagram with activity D starting after one of its immediate predecessors, and to use a dummy activity for the other. The dummy activities are represented by dotted edges and are, in effect, imaginary. They are not real activities, but they allow all of the predecessors from the table to be correctly represented.
[Figure: activity network from start to finish with a dummy activity leading into activity D]
The dummy activity for D allows activity D to directly follow both activities A and B.
A dummy activity is also needed for activity E because it, too, has to start after two different activities, activities B and C.
[Figure: activity network from start to finish with dummy activities leading into activities D and E]
Dummy activities
A dummy activity is required if two activities share some, but not all, of their immediate predecessors.
A dummy activity will be required from the end of each shared immediate predecessor to
the start of the activity that has additional immediate predecessors.
Dummy activities are represented in the activity network using dotted lines.
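The rule above can be checked mechanically. The sketch below compares every pair of activities in a precedence table and reports the pairs that share some, but not all, of their immediate predecessors. It implements only the rule as stated in this box. The function name `pairs_needing_dummy` is hypothetical.

```python
from itertools import combinations


def pairs_needing_dummy(predecessors):
    """Pairs of activities that share some, but not all, immediate predecessors."""
    pairs = []
    for a, b in combinations(sorted(predecessors), 2):
        preds_a, preds_b = set(predecessors[a]), set(predecessors[b])
        shared = preds_a & preds_b
        if shared and preds_a != preds_b:
            pairs.append((a, b, sorted(shared)))
    return pairs


# The simple precedence table from this section: D follows A and B, E follows B and C
table = {"A": [], "B": [], "C": [], "D": ["A", "B"], "E": ["B", "C"]}
print(pairs_needing_dummy(table))  # [('D', 'E', ['B'])]
```

In the earlier precedence table, activities D and E share exactly the same predecessor (B), so the function reports no pairs for that table and no dummy is needed there.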
1 Create a table with a row for each activity.
2 Look at the start of an activity. Write down all of the activities that lead directly to this activity in the immediate predecessor column.
3 Activity C is a predecessor of activity E, and the dummy activity makes it also a predecessor of F and G.
4 Activity G is a predecessor of activity I, and the dummy activity makes it also a predecessor of J.
Activity   Immediate predecessors
A          −
C          A
D          B
E          C
F          D, C
G          D, C
H          E, F
J          G, H
Exercise 14C
[Figures: activity networks for parts b to f]
Projects that involve multiple activities are usually completed against a time schedule.
Knowing how long individual activities within a project are likely to take allows managers
of such projects to hire staff, book equipment and also to estimate overall costs of the
project. Allocating time to the completion of activities in a project is called scheduling.
Scheduling problems involve analysis to determine the minimum overall time it would take
to complete a project.
Float times
The diagram below shows a small section of a different activity network. There are three
activities shown, with their individual durations, in hours.
[Figure: activity A (5 hours) followed by activity B (3 hours), in parallel with activity C (6 hours)]
Activity B and activity C are both immediate predecessors to the next activity, so the project cannot continue until both of these tasks are finished.
Activity B cannot begin until activity A is finished.
Activity C can be completed at the same time as activity A and activity B.
Activity A and B will take a total of 5 + 3 = 8 hours, while activity C only requires 6 hours.
There is some flexibility around when activity C needs to start. There are 8 − 6 = 2 hours
spare for the completion of activity C. This value is called the float time for activity C.
The flexibility around the starting time for activity C can be demonstrated with the following diagram, in which each square represents 1 hour.
Hour                 1      2      3   4   5   6   7      8
A then B             A      A      A   A   A   B   B      B
Start at same time   C      C      C   C   C   C   Slack  Slack
Delay C by 1 hour    Slack  C      C   C   C   C   C      Slack
Delay C by 2 hours   Slack  Slack  C   C   C   C   C      C
The five red squares represent the 5 hours it takes to complete activity A. The three green
squares represent the 3 hours it takes to complete activity B.
The six yellow squares represent the 6 hours it takes to complete activity C. Activity C does
not have to start at the same time as activity A because it has some slack time available
(2 hours).
Activity C should not be delayed by more than 2 hours because this would cause delays to
the project. The next activity requires B and C to be complete before it can begin.
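The float calculation above is a subtraction. The time available before the next activity can start is set by the longest sequence of activities leading into it, and the float is the difference. A small Python sketch of this, under the assumption that the sequences start together and meet at the same vertex (the helper name `float_times` is hypothetical):

```python
def float_times(paths):
    """Float of each sequence into a merge point = longest sequence time - its own time."""
    totals = {name: sum(durations) for name, durations in paths.items()}
    longest = max(totals.values())
    return {name: longest - total for name, total in totals.items()}


# The network section above: A (5 h) then B (3 h), in parallel with C (6 h)
print(float_times({"A-B": [5, 3], "C": [6]}))  # {'A-B': 0, 'C': 2}
```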
Forward scanning
Forward scanning will be demonstrated using the activity network below. The duration of each activity is in days.
1 Draw a box, split into two cells, next to each vertex of the activity network, as shown in the diagram opposite. If more than one activity begins at a vertex, draw a box for each of these activities.
2 Activities that begin at the start of the project have an EST of zero (0). Write this in the left cell, shown shaded yellow in the diagram.
[Figure: activity network with activities A, 8; B, 6; C, 1; D, 2; E, 3; F, 1; G, 2; H, 1 and a dummy activity, with an EST of 0 entered for activities A and B]
Backward scanning
Backward scanning will be demonstrated using the activity network with completed forward
scanning from above.
1 Copy the minimum time to complete the project into the right cell, shown shaded blue in the diagram.
[Figure: activity network with forward scanning complete and the minimum completion time, 15, copied into the right cell at the finish]
The completed activity network with all EST and LST is shown below.
[Figure: completed activity network showing the EST and LST for every activity; the finish box shows 15 and 15]
Activities that have no float time are critical ones for completion of the project. Tracking
through the activity network along the edges of critical activities gives the critical path for
the project. The critical path for this project is highlighted in red on the diagram below.
[Figure: the completed activity network with the critical path highlighted in red]
Critical path
A critical path is the longest or equal longest path in an activity network.
There can be more than one critical path in an activity network.
The critical path is the sequence of activities that cannot be delayed without affecting
the overall completion time of the project.
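Forward scanning, backward scanning, float and the critical path can be combined in one short Python sketch. The predecessors below are our reading of the scanning diagrams in this section, so treat them as an assumption. For example, the dummy is read as making C an immediate predecessor of F. The function name `critical_path_analysis` is hypothetical, and the sketch needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

durations = {"A": 8, "B": 6, "C": 1, "D": 2, "E": 3, "F": 1, "G": 2, "H": 1}
predecessors = {
    "A": [], "B": [], "C": ["A"], "D": ["B"], "E": ["C"],
    "F": ["C", "D"],  # C reaches F through the dummy activity (assumed reading)
    "G": ["E", "F"], "H": ["G"],
}


def critical_path_analysis(durations, predecessors):
    order = list(TopologicalSorter(predecessors).static_order())

    # Forward scanning: EST = largest (EST + duration) over immediate predecessors
    est = {}
    for act in order:
        est[act] = max((est[p] + durations[p] for p in predecessors[act]), default=0)
    finish = max(est[a] + durations[a] for a in order)

    # Backward scanning: LST = smallest LST of the activities that follow, minus own duration
    successors = {a: [] for a in order}
    for act in order:
        for p in predecessors[act]:
            successors[p].append(act)
    lst = {}
    for act in reversed(order):
        latest_end = min((lst[s] for s in successors[act]), default=finish)
        lst[act] = latest_end - durations[act]

    floats = {a: lst[a] - est[a] for a in order}
    # Zero-float activities; with several critical paths these must be split by following the network
    critical = [a for a in order if floats[a] == 0]
    return est, lst, floats, finish, critical


est, lst, floats, finish, critical = critical_path_analysis(durations, predecessors)
print(finish)    # 15
print(critical)  # ['A', 'C', 'E', 'G', 'H']
print(floats["B"], floats["D"], floats["F"])  # 3 3 2
```

Under this reading, the ESTs and LSTs agree with the values in the completed diagram, and the project finishes in 15 days.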
The process for determining the critical path is called critical path analysis.
[Figure: activity network with activities A, 6; B, 9; C, 8; D, 7; E, 10; F, 12 and a dummy activity]
Explanation Solution
a Forward and backward scanning give the following EST and LST values.
Activity   EST   LST
A          0     3
B          0     0
C          9     18
D          9     9
E          16    16
F          26    26
Finish     38    38
b The critical path is highlighted in red. Note: The dummy is not included in the critical path.
The critical path of this project is B → D → E → F.
c The minimum completion time is the EST in the finish box.
The minimum completion time of this project is 38 days.
Explanation Solution
a Use forward scanning to find the EST for each activity. Activities with no immediate predecessors always have an EST of zero. Add the left cell value at the start of the activity to the duration and write the result in the left cell at the end of the activity. A separate box is drawn where there is more than one activity beginning at the same vertex.
[Figure: activity network with the EST of each activity in the left cell of its box; the finish box shows 18]
c EST values are in the left cell at the start of each activity.
The EST for activity H is 12 weeks.
d LST values are in the right cell at the end of each activity.
The LST for activity H is 16 weeks.
e Float = LST − EST
Float for H = LST − EST
= 16 − 12
= 4 weeks
f The critical path joins all of the activities that have the same EST and LST, and therefore have zero float time.
Activity   EST   LST
A          0     0
B          0     2
C          0     4
D          3     3
E          3     5
F          7     7
G          12    12
H          12    16
Finish     18    18
The critical path is A → D → F → G.
Exercise 14D
[Figures: activity networks for the preceding questions]
4 An activity network is shown in the diagram opposite.
[Figure: activity network with activities A, 3; B, 8; C, 7; D, 12; E, 10; F, 20; the finish box shows 42]
a Write down the critical path for this project.
b Calculate the float times for non-critical activities.
6 A precedence table and activity network for a project are shown below. The precedence table is incomplete.
Activity   Duration (weeks)   Immediate predecessors
A          3                  −
B          6                  −
C          6
E          7
F          1                  D
H          3
I          2                  B
[Figure: activity network with activities A, 3; B, 6; C, 6; D, 5; E, 7; F, 1; G, 3; H, 3; I, 2 and two dummy activities; the finish box shows 22]
[Figure: activity network with activities A, 3; B, 7; C, 5; D, 5; E, 2; F, 2; G, 4]
[Figure: activity network with activities A, 4; B, 7; C, 2; D, 5; E, 3; F, 5; G, 4; H, 2]
a Write down the three activities that are immediate predecessors of activity H.
b Determine the earliest start time of activity H.
c For activity H the earliest start time and the latest start time are the same. What does
this tell us about activity H?
d Determine the minimum completion time, in hours, for this project.
e Which activity could be delayed for the longest time without affecting the minimum
completion time of the project?
[Figures: activity networks for the remaining questions]
a Complete a precedence table for this network, using two columns, one column for
the activities and a second column for the Immediate predecessors.
b How many activities have an earliest start time of 16 hours?
c Find the latest start time of activity F.
d There are two critical paths. Write down both critical paths.
e How many activities can be delayed by 1 hour without increasing the minimum
completion time of the project?
16 How many of these activities could be delayed without affecting the minimum
completion time of the project?
A 3 B 4 C 5 D 6 E 7
The minimum completion time for this project is 28 weeks. The time taken to complete
activity J is labelled x. The maximum value of x is
A 12 B 10 C 8 D 4 E 2
18 A project consists of ten activities, A to J. The table below shows the immediate predecessor(s) and earliest start time, in days, of each activity.
Activity   Immediate predecessors   Earliest starting time
A          −                        0
B          −                        0
C          −                        0
D          A                        6
E          B                        5
F          B                        5
G          C                        4
H          D, E                     13
I          F, G                     14
J          H, I                     25
It is known that activity H has a completion time of ten days. The project can still be
completed in minimum time if activity D is delayed. The maximum length of the delay
for activity D is
A one day B two days C seven days D eight days E nine days
14E Crashing
Learning intentions
• To be able to use crashing to reduce the completion time of a project.
• To be able to minimise the cost of crashing activities to achieve the maximum reduction in completion time of a project.
The minimum time for completion is currently 13 hours. In order to reduce this overall time, the manager of the project should try to complete one or more of the activities in a shorter time than normal. However, reducing the time taken to complete activity A, B or C would not achieve this goal. These activities are not on the critical path, so they already have slack time. Reducing their completion time will not shorten the overall time taken to complete the project.
Activities D and E, on the other hand, lie on the critical path. Reducing the duration of these activities will reduce the overall time for the project. If activity D were reduced to 4 hours, the project would be completed in 11 hours, not 13.
The directed network below shows the sequence of 8 activities that are needed to complete a project. The time, in days, that it takes to complete each activity is also shown.
[Figure: directed network from Start to Finish with activities A, 4; B, 5; C, 6; D, 9; E, 5; F, 7; G, 3; H, 1]
Explanation Solution
a In crashing problems, we first need to identify the critical path, or paths. We will do this by remembering that a critical path is the longest or equal longest path in the activity network. Using this method, set up a table, list all possible paths from Start to Finish for the directed network and calculate the length of each path. The critical path is the path with the longest time from Start to Finish.
Path      Duration (days)
A−D−H     14
B−E−G     13
C−F−G     16
The critical path is C − F − G.
Explanation Solution
b Write the duration of the critical path identified in the previous part.
16 days
c Crash all possible activities by the maximum reduction. Add a new column to the summary table to get an overview of the new duration of each path. This may result in a new critical path. Consider the cost of crashing and whether it is worth applying the maximum reduction.
Path      Duration (days)   New duration with maximum reduction (F by 3)
A−D−H     14                14
B−E−G     13                13
C−F−G     16                13
The new minimum completion time for the project is 14 days.
Activity F originally took 7 days to complete. It can be crashed, which means activity F may be reduced by a maximum of 3 days, giving a completion time of 4 days. It is possible to choose to reduce activity F by 0, 1, 2 or 3 days. Reducing activity F by the maximum 3 days would reduce the original critical path from 16 days to 13 days. However, crashing activity F by 3 days results in a new critical path, A − D − H, with a total completion time of 14 days. Because crashing costs $100 per day, the third day of crashing would cost $100 without reducing the completion time, so this is not a desirable outcome.
If we crash activity F by only 2 days, we create 2 equal critical paths (A − D − H and C − F − G), each requiring 14 days to complete the project.
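The path-listing method and the effect of crashing activity F can be checked with a short Python sketch. The durations come from the network in this example. The successor lists are our reading of that diagram and should be treated as an assumption. The helper names `all_paths` and `path_lengths` are illustrative. For each possible reduction of F, the sketch prints the completion time and the critical path(s).

```python
durations = {"A": 4, "B": 5, "C": 6, "D": 9, "E": 5, "F": 7, "G": 3, "H": 1}
successors = {  # assumed reading of the network diagram
    "Start": ["A", "B", "C"],
    "A": ["D"], "B": ["E"], "C": ["F"],
    "D": ["H"], "E": ["G"], "F": ["G"],
    "G": ["Finish"], "H": ["Finish"],
}


def all_paths(successors, node="Start"):
    """List every path of activities from node to Finish."""
    if node == "Finish":
        return [[]]
    paths = []
    for nxt in successors[node]:
        for rest in all_paths(successors, nxt):
            paths.append(([] if nxt == "Finish" else [nxt]) + rest)
    return paths


def path_lengths(durations, successors):
    """Total duration of each path, keyed by names such as 'A-D-H'."""
    return {"-".join(p): sum(durations[a] for a in p) for p in all_paths(successors)}


for days in range(4):  # crash activity F by 0, 1, 2 or 3 days
    crashed = dict(durations, F=durations["F"] - days)
    lengths = path_lengths(crashed, successors)
    longest = max(lengths.values())
    critical = [p for p, t in lengths.items() if t == longest]
    print(days, longest, critical)
```

The output shows the completion time falling from 16 to 15 to 14 days, then staying at 14 when F is crashed by a third day, because A − D − H becomes the critical path.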
The directed network below shows the sequence of 9 activities that are needed to complete a project. The time, in days, that it takes to complete each activity is also shown.
[Figure: directed network from Start to Finish with activities A, 9; B, 10; C, 8; D, 8; E, 10; F, 7; G, 12; H, 3; I, 6]
The minimum completion time for the project is 28 days. It is possible to reduce the completion time for activities B, E, G, H and I. The completion time for each of these five activities can be reduced by a maximum of two days.
a What is the new minimum completion time, in days, that the project could take?
The reduction in completion time for each of these five activities will incur an additional cost. The table opposite shows the five activities that can have their completion times reduced and the associated daily cost, in dollars.
Activity   Daily cost ($)
B          1500
E          2000
G          700
H          900
I          800
b What is the minimum cost that will achieve the greatest reduction in time taken to complete the project?
Explanation
a List all possible paths from Start to Finish, including the completion time of each. Crash all activities by their maximum reduction. Identify the new critical path (the path with the longest completion time) after the maximum reductions are applied.
Solution
Path        Duration (days)   New duration after maximum reduction (B, E, G, H, I by 2)
A−C−E       27                25
A−F−H       19                17
B−D−F−H     28                24
B−G−I       28                22
A − C − E is the new critical path, with a duration of 25 days, so the new minimum completion time is 25 days.
b 1 Begin with the new critical path A − C − E. The reduction of activity E must occur
to achieve the new minimal completion time, therefore reducing activity E by 2
days is essential.
2 Ignore the path A − F − H because its completion time of 19 days is already less than the 25 days of the critical path.
3 Consider the path B − D − F − H. It has a completion time of 28 days and must be reduced to 25 days to equal the critical path A − C − E. There are two options: reduce B by 2 and H by 1, or reduce B by 1 and H by 2. From the table, it is more expensive per day to reduce B than H. However, reducing B also reduces the completion time of the final path B − G − I, so it shortens two different paths at once, which is more cost effective overall. So reduce activity B by 2 days and H by 1 day to reduce the completion time of B − D − F − H to 25 days.
4 The final path B − G − I has already been reduced by 2 days due to the reduction of activity B chosen previously. One more day of reduction is needed for this path to equal the critical path. Activity G costs less per day to reduce than activity I, so reduce G by 1 day.
5 Calculate your total cost of crashing.
b Cost of crashing = E by 2 days + B by 2 days + H by 1 day + G by 1 day
= 2000 × 2 + 1500 × 2 + 900 + 700
= $8600
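Because there are only 3^5 = 243 ways to reduce B, E, G, H and I by 0, 1 or 2 days each, both answers can be checked by brute force. The Python sketch below is one such check (not the method used in the working above); the path list and durations are read from the table in part a.

```python
from itertools import product

# Data read from this example: paths, durations (days) and daily crashing costs ($)
paths = [["A", "C", "E"], ["A", "F", "H"], ["B", "D", "F", "H"], ["B", "G", "I"]]
duration = {"A": 9, "B": 10, "C": 8, "D": 8, "E": 10, "F": 7, "G": 12, "H": 3, "I": 6}
daily_cost = {"B": 1500, "E": 2000, "G": 700, "H": 900, "I": 800}
MAX_REDUCTION = 2  # each crashable activity can be reduced by at most 2 days


def completion_time(reduction):
    # The project takes as long as its longest (critical) path after crashing
    return max(sum(duration[a] - reduction.get(a, 0) for a in path) for path in paths)


def cheapest_fastest_crash():
    crashable = list(daily_cost)
    best = None  # (completion time, cost, reduction)
    # Try every combination of 0, 1 or 2 days for each crashable activity
    for days in product(range(MAX_REDUCTION + 1), repeat=len(crashable)):
        reduction = dict(zip(crashable, days))
        time = completion_time(reduction)
        cost = sum(daily_cost[a] * d for a, d in reduction.items())
        # Prefer the shortest completion time, then the lowest cost
        if best is None or (time, cost) < best[:2]:
            best = (time, cost, reduction)
    return best


time, cost, reduction = cheapest_fastest_crash()
print(time, cost, reduction)
```

It reports a completion time of 25 days at a cost of $8600, with E and B each reduced by 2 days and G and H each by 1 day, which agrees with the working above.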
Exercise 14E
1 The activity network for a project is shown in the diagram below. The duration for each
activity is in hours.
[Activity network from Start to Finish: A, 7; B, 3; C, 5; D, 10; E, 4; F, 13; G, 6; H, 3; I, 8]
a List all four paths from the Start to the Finish of the project, with their respective
completion times.
b Identify the critical path and the minimum completion time for the project.
c If Activity E is reduced by 3 hours, identify the new minimum completion time for
this project.
Example 11 2 The directed network below shows the sequence of 8 activities that are needed to
complete a project. The time, in days, that it takes to complete each activity is also shown.
[Network diagram from Start to Finish: A, 5; B, 7; C, 6; D, 4; E, 8; F, 5; G, 3; H, 4; dummy activity]
4 The activity network for a project is shown in the diagram below. The duration for each
activity is in hours.
[Activity network from Start to Finish: A, 4; B, 3; C, 6; D, 2; E, 9; F, 13; G, 8; H, 4; I, 2; J, 5; K, 11; L, 1]
a How many activities could be delayed by 4 hours without altering the minimum
completion time for the project?
b If the project is to be crashed by reducing the completion time of one activity only,
what is the minimum time, in hours, that the project can be completed in?
c Activity G can be reduced in time at a cost of $200 per hour. Activity J can
be reduced in time at a cost of $150 per hour. What is the cost of reducing the
completion time of this project as much as possible?
Example 13 5 The directed network below shows the sequence of 8 activities that are needed to
complete a project. The time, in days, that it takes to complete each activity is also shown.
[Network diagram from Start to Finish: A, 7; B, 11; C, 5; D, 5; E, 8; F, 6; G, 5; H, 10]
The minimum completion time for the project is 24 days. It is possible to reduce the
completion time for activities D, E and H. The completion time for each of these three
activities can be reduced by a maximum of two days.
a What is the new minimum completion time, in days, of the project?
The reduction in completion time for each of these three activities will incur an additional cost. The table opposite shows the three activities that can have their completion times reduced and the associated daily cost, in dollars.
Activity   Daily cost ($)
D          170
E          350
H          200
b What is the minimum cost that will achieve the greatest reduction in time taken to
complete the project?
6 The directed network below shows the sequence of 11 activities that are needed to
complete a project. The time, in days, that it takes to complete each activity is also shown.
[Network diagram from Start to Finish: A, 7; B, 2; C, 5; D, 4; E, 6; F, 3; G, 3; H, 5; I, 2; J, 6; K, 7; dummy activity]
7 The directed network below shows the sequence of 12 activities that are needed to
complete a project. The time, in weeks, that it takes to complete each activity is also shown.
[Network diagram from Start to Finish: A, 6; B, 4; C, 2; D, 5; E, 6; F, 4; G, 9; H, 3; I, 4; J, 11; K, 13; L, 1; dummy activity]
8 The directed network below shows the sequence of 15 activities that are needed
to complete a maintenance project at the MCG. The time, in days, that it takes to
complete each activity is also shown.
[Network diagram from Start to Finish: A, 5; B, 2; C, 3; D, 7; E, 8; F, 2; G, 4; H, 7; I, 6; J, 3; K, 9; L, 11; M, 4; N, 6; O, 5]
Activity   Reduction in completion time (0, 1 or 2 days)
11 The directed graph below shows the sequence of activities required to complete a
project. All times are in weeks. There is one critical path for this project.
[Network diagram from Start to Finish: A, 4; B, 5; C, 6; D, 6; E, 9; F, 7; G, 9; H, 7; I, 8; J, 6; K, 10; L, 4; M, 2]
The total completion time of the project can be reduced by four weeks by reducing
A activity B by four weeks.
B activity F by four weeks.
C activity J by four weeks.
D activity I by three weeks and activity J by one week.
E activity D by three weeks and activity E by one week.
Weighted graph A weighted graph is a graph in which a number representing the size of some quantity is associated with each edge. These numbers are called weights.
Directed graph (digraph) A directed graph is a graph where direction is indicated for every edge. This is often abbreviated to digraph.
Flow The transfer of material through a directed network. Flow can refer to
the movement of water or traffic.
Capacity The maximum flow of substance that an edge of a directed graph can
allow during a particular time interval. The capacity of water pipes is
the amount of water (usually in litres) that the pipe will allow through
per time period (minutes, hours, etc.). Other examples of capacity are the
number of cars per minute or the number of people per hour.
Source The source is the origin of the material flowing through a network.
Sink The sink is the final destination of the material flowing through a network.
Minimum cut The minimum cut is the cut with the minimum capacity. The cut must
separate the source from the sink.
Maximum flow The maximum flow through a directed graph is equal to the capacity of
the minimum cut.
Bipartite graph A bipartite graph has two distinct groups or categories for the vertices.
Connections exist between a vertex or vertices from one group with
a vertex or vertices from the other group. There are no connections
between the vertices within a group.
Allocation An allocation is made when each of the vertices in one group of a bipartite graph is matched with one of the vertices in the other group of that graph. An allocation is possible when both groups have exactly the same number of vertices. Each vertex in one group is matched to only one vertex from the other group.
Cost matrix A table that contains the costs of allocating objects from one group
(such as people) to another (such as tasks). The ‘cost’ can be money, or
other factors such as the time taken.
Activity network An activity network is a directed graph that shows the required order of
completing individual activities that make up a project.
Precedence table A precedence table is a table that records the activities of a project and
their immediate predecessors. Precedence tables can also contain the
duration of each activity.
Dummy activity A dummy activity has zero duration and zero cost. It is required if two activities share
some, but not all, of the same immediate predecessors. It allows the
network to show all precedence relationships in a project correctly.
Earliest starting time (EST) EST is the earliest time an activity in a project can begin.
Latest starting time (LST) LST is the latest time an activity in a project can begin without affecting the overall completion time for the project.
Float (slack) time Float (slack) time is the difference between the latest starting time and
the earliest starting time.
Float = LST − EST
The float time is sometimes called the slack time. It is the largest
amount of time that an activity can be delayed without affecting the
overall completion time for the project.
Forward scanning Forward scanning is a process of determining the EST for each activity in an activity network. The EST of an activity is added to the duration of that activity to determine the EST of the next activity. The EST of any activity is equal to the largest forward scanning value determined from all immediate predecessors.
Critical path The critical path is the series of activities that cannot be delayed
without affecting the overall completion time of the project. Activities
on the critical path have no slack time. Their EST and LST are equal.
Critical path analysis Critical path analysis is a project planning method in which activity durations are known with certainty.
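The definitions of forward scanning, backward scanning, float and the critical path can be tied together in a short Python sketch. The five-activity project below is hypothetical (it is not from this chapter), and the code assumes the activities are listed so that every activity appears after its immediate predecessors.

```python
# Hypothetical project (not from the textbook):
# activity -> (duration, list of immediate predecessors)
activities = {
    "A": (3, []),
    "B": (4, []),
    "C": (2, ["A"]),
    "D": (5, ["A", "B"]),
    "E": (3, ["C", "D"]),
}


def critical_path_analysis(acts):
    # Assumes activities are listed so every predecessor appears before its successors
    order = list(acts)
    est = {}
    for a in order:
        # Forward scan: EST is the largest (EST + duration) over immediate predecessors
        est[a] = max((est[p] + acts[p][0] for p in acts[a][1]), default=0)
    finish = max(est[a] + acts[a][0] for a in order)  # minimum completion time

    successors = {a: [b for b in order if a in acts[b][1]] for a in order}
    lst = {}
    for a in reversed(order):
        # Backward scan: latest finish is the smallest LST of the immediate successors
        latest_finish = min((lst[s] for s in successors[a]), default=finish)
        lst[a] = latest_finish - acts[a][0]

    float_time = {a: lst[a] - est[a] for a in order}  # Float = LST - EST
    critical = [a for a in order if float_time[a] == 0]
    return est, lst, float_time, finish, critical


est, lst, float_time, finish, critical = critical_path_analysis(activities)
print("EST:", est)
print("LST:", lst)
print("Float:", float_time)
print("Minimum completion time:", finish, "| critical activities:", critical)
```

For this hypothetical project the minimum completion time is 12, activities B, D and E have zero float and form the critical path B − D − E, and activity C could be delayed by up to 4 time units.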
Skills checklist
Download this checklist from the Interactive Textbook, then print it and fill it out to check
your skills.
14A 2 I can determine the maximum flow for any section of sequential edges of a
directed graph.
14A 4 I can determine the maximum flow as equal to the minimum cut capacity.
14B 6 I can use the Hungarian algorithm to determine an optimum allocation in order
to minimise cost.
14C 7 I can create an activity network from a precedence table.
14D 10 I can use forward scanning to determine the earliest starting time of activities
in an activity network.
14D 11 I can use backward scanning to determine the latest starting time of activities
in an activity network.
14D 12 I can determine the float time for activities in an activity network.
14D 13 I can determine the overall minimum completion time for a project using
critical path analysis.
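Skills 14A 2 and 14A 4 rest on the result that the maximum flow equals the capacity of the minimum cut. The Python sketch below checks this on a small hypothetical network (vertices S, A, B, T are illustrative only). It uses the Edmonds–Karp augmenting-path method, which is not part of this course and is used here only as a check; the cut it reports counts only edges from the source side to the sink side, matching the usual rule for cut capacity.

```python
from collections import deque


def max_flow(capacity, source, sink):
    # capacity: dict of dicts, capacity[u][v] = capacity of edge u -> v
    nodes = set(capacity) | {v for u in capacity for v in capacity[u]}
    residual = {u: {} for u in nodes}
    for u in capacity:
        for v, c in capacity[u].items():
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)  # reverse edge allows flow to be cancelled
    total = 0
    while True:
        # Breadth-first search for a path that still has spare capacity
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break
        # Find the bottleneck along the path, then push that much flow
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck
    # Vertices still reachable from the source form the source side of a minimum cut
    reachable = set(parent)
    cut = sum(c for u in capacity if u in reachable
              for v, c in capacity[u].items() if v not in reachable)
    return total, cut


# Hypothetical network: capacities on each directed edge
network = {"S": {"A": 8, "B": 5}, "A": {"B": 3, "T": 4}, "B": {"T": 9}}
flow, cut = max_flow(network, "S", "T")
print("Maximum flow:", flow, "| minimum cut capacity:", cut)
```

Here the minimum cut separates {S, A} from {B, T} and crosses S→B, A→B and A→T, giving 5 + 3 + 4 = 12, the same as the maximum flow.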
Multiple-choice questions
1 The shortest path from A to Z in the network on the right has length:
[Weighted graph from A to Z]
A 10 B 15 C 22 D 26 E 28
C 10
6 This activity network is for a project where the component times, in days, are shown. The critical path for the network of this project is given by:
[Activity network: A, 5; B, 4; C, 3; D, 6; E, 3; F, 6; G, 2; H, 3; I, 3; J, 1; K, 6]
The earliest time (in days) that activity F can begin is:
A 0 B 12 C 14 D 22 E 24
A 14 B 15 C 16 D 17 E 27
A 3 B 5 C 6 D 7 E 8
Written-response questions
1 An English class recently performed poorly in their essay writing assessment. To help
them improve, the teacher separated the class into groups of five and assigned one of
the following tasks to each person: Introduction, Body paragraph 1, Body paragraph
2, Body paragraph 3 and Conclusion. Each task will be completed by one person. The
table below shows the time, in minutes, that each person would take to complete each
of the five tasks.
The tasks will be allocated so that the total time of completing the five tasks is a minimum.
a Complete the sentences below by clearly stating which task each student should
write in order for the essay to be completed in the minimum time possible.
Alvin should write the...
Billy should write the...
Chloe should write the...
Danielle should write the...
Elena should write the...
b What is the minimum total time the group will dedicate to completing the essay?
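An optimum allocation can be checked by trying every possible matching, since five people and five tasks give only 5! = 120 cases. The Python sketch below performs this brute-force check; it is not the Hungarian algorithm itself. Because the table of times is not reproduced here, the times used are hypothetical and chosen so that several students are fastest at the same task.

```python
from itertools import permutations

students = ["Alvin", "Billy", "Chloe", "Danielle", "Elena"]
tasks = ["Introduction", "Body paragraph 1", "Body paragraph 2",
         "Body paragraph 3", "Conclusion"]
# Hypothetical times in minutes (rows: students, columns: tasks)
times = [
    [8, 12, 14, 15, 9],    # Alvin
    [10, 11, 16, 13, 12],  # Billy
    [9, 15, 12, 14, 13],   # Chloe
    [11, 13, 15, 10, 14],  # Danielle
    [7, 14, 13, 16, 10],   # Elena
]


def best_allocation(cost):
    # Try every one-to-one matching of people to tasks and keep the cheapest
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(n)))
    return best, sum(cost[i][best[i]] for i in range(n))


allocation, total = best_allocation(times)
for i, t in enumerate(allocation):
    print(f"{students[i]} should write the {tasks[t]}")
print("Minimum total time:", total, "minutes")
```

With these made-up times, four students are fastest at the Introduction, yet the minimum total of 49 minutes gives it to Elena alone; this is why an allocation cannot simply give each person their quickest task.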
2 WestAir Company flies routes in western Victoria. The network shows the layout of connecting flight paths for WestAir, which originate in Mildura and terminate either in Melbourne or at towns on the way to Melbourne. On this network, the available spaces for passengers flying out of various locations on one morning are shown.
[Network: Mildura, Echuca, Horsham, Ballarat, Geelong, Warrnambool and Melbourne, with the available passenger spaces on each flight]
The network has one cut shown.
a What is the capacity of this cut?
b What is the maximum number of passengers who could travel from Mildura to
Melbourne for the morning?
3 A school swimming team wants to select a 4 × 200 metre relay team. The fastest times
of its four best swimmers in each of the strokes are shown in the table below. Which
swimmer should swim which stroke to give the team the best chance of winning, and
what would be their time to swim the relay?
a The directed network that shows these activities is shown below. Add the three
missing features to the network.
[Incomplete network from Start to Finish: A, 4; B, 5; C, 6; F, 5; G, 4; H, 5; dummy activity]
Revision: Networks and decision mathematics
A graph that represents the connections between the towns on the map is:
5 A connected planar graph has an equal number of vertices and faces. If there are
20 edges in this graph, the number of vertices must be:
A 9 B 10 C 11 D 20 E 22
6 Underground water pipes are needed to water a new golf course. Water will be pumped
from the dam in the back corner of the course. To find the smallest total length of water
pipe needed, we must find:
A a critical path B a minimal spanning tree
C the shortest Eulerian circuit D the shortest Hamiltonian cycle
E the perimeter of the golf course
7 Which one of the following is a true statement about a critical path in a project?
A Knowledge of the critical path can be used to decide if any tasks in a project can be
delayed without extending the length of time of the project.
B All tasks on the critical path must be completed before any other task in the same
project can be started.
C Decreasing the times of tasks not on the critical path will decrease the length of time
of the project.
D The critical path must always include at least two tasks in a project.
E There is only one critical path in any project.
8 The length of the shortest path between the origin, O, and destination, D, in the weighted graph shown here is:
[Weighted graph from O to D through vertices P, Q, S, T and U]
A 11 B 12 C 13
D 14 E 15
9 Four students, talking about five ski resorts they have visited, represented their information on the bipartite graph shown here.
[Bipartite graph linking the four students, including Ann and Matt, to the five resorts, including Falls Creek and Val D’Isere]
B Matt and Tom have been to four ski resorts between them.
C Maria has visited fewer ski resorts than any of the others.
D Ann and Maria between them have visited all five ski resorts discussed.
E Ann and Tom between them have visited fewer resorts than Matt and Maria between them.
10 A gas pipeline is to be constructed to link several towns in the country. Assuming the
pipeline construction costs are the same everywhere in the region, the cheapest network
formed by the pipelines and the towns as vertices would form:
A a Hamiltonian cycle B an Eulerian circuit C a minimum spanning tree
D a critical path E a complete graph
11 Which one of the following statements is not implied by this bipartite graph?
[Bipartite graph linking Sally, Kate, Jon and Greg to Spanish, Italian, Greek, Turkish and French]
A There are more translators of French than Greek.
B Sally and Kate can translate five languages between them.
C Jon and Greg can translate four languages between them.
D Kate and Jon can translate more languages between them than can Sally and Greg.
E Sally and Jon can translate more languages between them than can Kate and Greg.
12 There are four different human blood types: O, A, B and AB. The relationships between
donor and recipients for these blood types are as follows:
Type O can donate blood to any type.
Type AB can receive blood from any type.
Each type can donate blood to its own type.
Each type can receive blood from its own type.
Which one of the following donor–recipient bipartite graphs correctly represents this information?
[Answer options: donor–recipient bipartite graphs]
13 For the weighted graph shown, the length (total weight) of the minimum spanning tree is:
[Weighted graph]
A 28 B 29 C 30
D 31 E 32
14 A connected graph with 12 edges divides a plane into four faces. The number of
vertices in this graph will be:
A 6 B 10 C 12 D 13 E 14
15 The number of edges for a complete graph with twenty vertices is:
A 10 B 20 C 21 D 180 E 190
16 The number of vertices for a complete graph with twenty-one edges is:
A 7 B 8 C 14 D 42 E 43
17 The number of edges for a tree with four vertices must be:
A 1 B 2 C 3 D 4 E 5
20 The graph opposite represents a project with activities listed on the edges of the graph. Which of the following statements must be true?
[Project graph with activities listed on its edges]
A A must be completed before B can start.
B A must be completed before F can start.
C E and F must start at the same time.
D E and F must finish at the same time.
E E cannot start until A is finished.
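$\begin{bmatrix} 0&0&1&0 \\ 0&0&1&0 \\ 1&1&2&0 \\ 0&0&0&0 \end{bmatrix}$ $\begin{bmatrix} 0&0&1&0 \\ 0&0&1&0 \\ 1&1&1&0 \\ 0&0&0&0 \end{bmatrix}$ $\begin{bmatrix} 0&0&1&0 \\ 0&0&1&0 \\ 1&1&3&0 \\ 0&0&0&0 \end{bmatrix}$
$\begin{bmatrix} 0&0&1&0 \\ 0&0&1&0 \\ 1&1&0&0 \\ 0&0&1&0 \end{bmatrix}$ $\begin{bmatrix} 0&0&1&0 \\ 0&1&1&0 \\ 1&1&2&0 \\ 0&0&0&0 \end{bmatrix}$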
25 A connected planar graph divides the plane into a number of faces. If the graph has
nine vertices and these are linked by 20 edges, then the number of faces is:
A 11 B 13 C 21 D 27 E 31
26 The sum of the weights of the minimum spanning tree of the weighted graph is:
[Weighted graph]
A 2 B 30 C 32
D 33 E 35
The following graph relates to Questions 27 and 28.
[Graph for Questions 27 and 28]
28 The number of ways that vertex F can be reached from vertex A is:
A 1 B 2 C 3 D 4 E 5
2 The adjacency matrix and graph below represent the same information. Some elements
are missing from the adjacency matrix and some edges from the graph. Write down the
missing elements in the matrix and add the missing edges to the graph.
1 0 1
Q 1 0 1
R 0 2 0 1
S 1 1 1 0
The directed network that shows these activities is shown below. Three features are
missing. Using the table above, complete the network diagram below.
[Incomplete network diagram from Start to Finish]
5 Consider the activity network below.
[Activity network: A, 6; B, 4; C, 2; D, 2; E, 3; F, 4; G, 6; H, 4]
6 All the activities and their durations (in hours) in a project at a quarry are shown in the
network diagram below. The least time required for completing this entire project is 30 hours.
[Network diagram from start to finish: A, 6; B, 5; C, 2; E, 4; F, 6; G, 4; H, 3; I, 2; J, 3; K (duration not shown); T, 0]
For each activity in this project, the table on the next page shows the completion time,
the earliest starting time and the latest starting time.
8 Seven towns on an island have been surveyed for transport and communications needs. The towns (labelled A, B, C, D, E, F, G) form the network shown here. The road distances between the towns are marked in kilometres. (All towns may be treated as points being of no size compared to the network lengths.)
[Network of towns A to G with road distances in kilometres]
a Explain what is meant by the description of the graph as ‘planar’.
b Verify Euler’s formula for the graph above.
An inspector of roads is stationed at B. Starting from B, she must travel the complete
network of roads to examine them.
c If she wishes to travel the least distance, where will she end up in the network?
d What will that distance be?
e Is the route unique? Briefly justify your answer.
f Determine the shortest distance that a fire truck stationed at E must travel to assist at
an emergency at A.
g To establish a cable network for telecommunications on the island, it is proposed to put the cable underground beside the existing roads. What is the minimal length of cable required here if back-up links are not considered necessary; that is, there are no loops in the cable network?
The Island Bank has outlets in each of the towns. The regional assistant manager
stationed at C must visit each outlet every second Friday and then return to the
office at C.
h Treating the towns as vertices and roads as edges in a graph, what is the distance of
a journey that forms a Hamiltonian cycle in the graph?
[Activity network from Start to Finish: A, 2; B, 1; C, 2; D, 2; E, 6; F, 1; G, 4; H, 8; I, 5; J, 4; K, 2]
Activity   A   B   C   D   E   F   G   H    I    J    K
EST        0   0   2   2   4   4   –   10   10   18   22
a The earliest start times (EST) for each activity except G are given in the table.
Complete the table by finding the EST for G.
b What is the shortest time required to assemble the product?
c What is the float (slack time) for activity I?
12 In laying a pipeline, the various jobs involved have been grouped into a set of specific tasks A−K, which are performed in the precedence described in the network below.
[Network diagram of tasks A−K from Start to Finish]
a List all the task(s) that must be completed before task E is started. The durations of the tasks are given in Table 1.
b Use the information in Table 1 to complete Table 2.
Table 1 Task durations
Task   Normal completion time (months)
A      10
B      6
C      3
D      4
E      7
F      4
G      5
H      4
I      5
J      4
K      3
Table 2 Starting times for tasks
Task   EST   LST
A      0     0
B      0     –
C      6     7
D      10    10
E      –     11
F      14    14
G      14    18
H      18    20
I      18    –
J      23    23
K      22    24
[Activity network from Start to Finish: A, 1; B, 3; C, 5; E, 4; F, 5; G, 3; H, 4; I, 4; J, 3; two dummy activities]
a What is the new minimum completion time now possible for the project?
b What is the minimum cost of completing the project in this time?
c How many activities will be reduced in time to achieve the new minimum
completion time at minimal cost?
Revision of Chapters
1 In a study of cats, data relating to the following five variables were collected:
age (1 = less than 1 year, 2 = 1–5 years, 3 = more than 5 years)
weight in kg
length in cm
The number of these variables that are discrete numerical variables is:
A 0 B 1 C 2 D 3 E 4
days. She records the number of days between the customers receiving the invoice and the invoice being paid (day paid), for 105 invoices. Her data is shown in the histogram.
[Histogram of day paid, 0 to 30 days]
2 Janelle decides to consider invoices which were paid in 15 days or more as late
payments. The percentage of the invoices classified as late payments is closest to:
A 9.5% B 10.0% C 19.0% D 20.0% E 85.7%
4 A class of 28 students in Year 11 (16 girls and 12 boys) sat for a French test. The mean
score on the test for the 16 girls in the class was 42. The mean score on the test for the
12 boys in the class was 28. The mean score on the test for the whole class was:
A 30 B 32 C 35 D 36 E 37
Use the following information to answer questions 6–8
6 The difference between the mean kilometres driven and the median kilometres driven,
to the nearest 100 km, is closest to:
A 500 km B 4900 km C 5000 km
D 16 500 km E 33 500 km
7 Of these 300 cars, the number that have been driven less than 15 000 km is closest to:
A 50 B 75 C 100 D 125 E 150
8 The shape of the distribution of the kilometres driven is best described as:
A approximately symmetric
B positively skewed
C positively skewed with one or more outliers
D negatively skewed
E negatively skewed with one or more outliers
9 Suppose that the weights of adult wombats are approximately normally distributed. If
16% of wombats weigh less than 3.5 kg, and 0.15% of wombats weigh more than 5.5
kg, the mean and standard deviation of the weight of wombats, in kg, are closest to:
A mean = 4.0, standard deviation = 0.5 B mean = 4.3, standard deviation = 0.4
C mean = 5.5, standard deviation = 2.0 D mean = 4.5, standard deviation = 1.0
E mean = 4.5, standard deviation = 0.5
10 The percentage of students surveyed who chose to go to the beach is closest to:
A 48.4% B 60.0% C 35.8% D 65.0% E 35.5%
11 The data in the table supports the contention that there is an association between
preferred camp and year level because:
A more students preferred to go to the beach than to go to the snow.
B 35.8% of students in Year 10 preferred to go to the beach, compared to only 55.0%
of Year 10 students who preferred to go to the snow.
C 60.0% of students in Year 9 preferred to go to the beach, compared to only 30.8% of
Year 9 students who preferred to go to the snow.
D 48.2% of students preferred the beach compared to 42.4% who preferred the snow.
E 60.0% of students in Year 9 preferred to go to the beach, compared to only 35.8% of
Year 10 students who preferred to go to the beach.
12 The percentage of variation in growth NOT explained by the variation in the sunlight
hours is closest to:
A 16.9% B 31.0% C 34.3% D 47.6% E 52.4%
14 The scatterplot shows the reaction time, measured in hundredths of
squares line had been fitted to the scatterplot with age as the explanatory variable, and reaction time as the response variable. The equation of the least squares line is closest to:
[Scatterplot: reaction time against age, 16 to 40]
15 The table below shows the life expectancy in years and the percentage of government
expenditure which is spent on health (health) in 8 countries.
Health (%) 17.3 10.3 4.7 6.0 20.1 6.4 13.2 7.7
Life expectancy (years) 82 76 68 69 83 75 76 76
A least squares line which enables a country’s life expectancy to be predicted from
its expenditure on health is fitted to the data. The value of the residual (to the nearest
year) when the actual percentage of government expenditure which is spent on health is
6% is closest to:
A -3 B -2 C 1 D 2 E 3
16 The price of shares in a company has increased non-linearly in the last 12 months. A
log transformation was applied to the maximum share price each month (share price),
and a least squares line fitted to the transformed data, with month as the explanatory
variable. The equation of the least squares line is:
log(share price) = 0.855 + 0.154 × month
Using this equation, the maximum monthly share price in month 15 is closest to:
A $0.50 B $1.04 C $3.16 D $1462.18 E $94.42
17 The table below records the monthly electricity cost (in dollars) for an apartment over
one calendar year.
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
123 90 153 136 101 129 153 143 95 61 85 107
The six-mean smoothed with centring cost of electricity in July is closest to:
A $129 B $120 C $128 D $131 E $143
18 The time series plot below shows earnings per quarter ($000) for a certain
salesperson over a 3-year period.
[Time series plot: earnings ($000) against quarter, 1 to 12]
19 The table below shows the long-term mean monthly sales figures (in $’000s) for a
company, and the associated seasonal indices for the sales. The long-term mean sales
figure for January is missing.
Month   Jan     Feb     Mar     Apr     May     Jun     Jul     Aug     Sep     Oct     Nov     Dec
Sales   –       80.0    70.3    62.6    54.6    55.0    52.1    54.2    56.5    52.8    61.8    99.7
SI      0.727   1.289   1.132   1.008   0.880   0.886   0.840   0.874   0.911   0.850   0.996   1.607
The long-term mean sales figure for January is closest to:
A 45.1 B 58.9 C 62.1 D 73.4 E 85.4
20 The number of job applications received by a large supermarket chain is seasonal. Data
has been collected, and a least squares regression line fitted to the deseasonalised data.
The equation of the line is
deseasonalised job applications = 457.8 + 12.27 × month number
where month number 1 is January 2022.
The monthly seasonal indices for job applications are shown in the following table:
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1.14 1.06 1.22 1.03 0.95 0.95 0.83 0.70 0.78 0.88 1.23 1.23
The actual number of job applications predicted for January 2023 was closest to
A 605 B 617 C 542 D 690 E 704
Recursion and financial modelling
26 Reuben has purchased a new oven for his restaurant for $37 000.
He depreciates the value of the oven using the unit cost method at the rate of $3 per
hour of use.
The recurrence relation below can be used to model the value of the oven, $V_n$, after $n$ years.
$V_0 = 37\,000, \quad V_{n+1} = V_n - 5460$
Reuben uses the oven all 52 weeks of the year for the same number of hours each week.
The number of hours each week that the oven is used for is closest to:
A 15 B 21 C 35 D 105 E 5460
27 The first three lines of an amortisation table for a reducing balance loan are shown below.
Payment number Payment Interest Principal reduction Balance
0 0.00 0.00 0.00 140 000.00
1 755.00 560.00 195.00 139 805.00
2 755.00 559.22
28 An annuity investment earns interest at the rate of 5.3% per annum, compounding monthly.
Tim initially invested $60 000 and will add monthly payments of $1600.
The value of this investment will first exceed $74 000 after
A five months
B six months
C seven months
D eight months
E nine months
29 Raj borrowed $54 000 to buy a car and was charged interest at the rate of 9.6% per
annum, compounding monthly.
For the first year of the loan, Raj made monthly repayments of $1080.
For the second year of the loan, Raj made monthly repayments of $1200.
The total amount of interest that Raj paid over this two-year period is closest to
A $8785 B $12 960 C $14 400 D $18 575 E $27 360
30 Victoria has invested an amount of money in a perpetuity.
The perpetuity earns interest at the rate of 4.8% per annum.
Interest is calculated and paid quarterly.
If Victoria receives $1020 per quarter from the perpetuity, then the amount that she has
invested is
A $4896 B $21 250 C $85 000 D $255 000 E $489 600
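31 If matrix $M = \begin{bmatrix} 3&4&2 \\ 5&6&8 \end{bmatrix}$, then its transpose $M^T$ is:
A $\begin{bmatrix} 3&4 \\ 5&6 \\ 7&8 \end{bmatrix}$ B $\begin{bmatrix} 3&5 \\ 4&6 \\ 2&8 \end{bmatrix}$ C $\begin{bmatrix} 5&6&8 \\ 3&4&2 \end{bmatrix}$ D $\begin{bmatrix} 5&4&8 \\ 3&6&2 \end{bmatrix}$ E $\begin{bmatrix} 4&3 \\ 6&5 \\ 8&7 \end{bmatrix}$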
32 Matrix $A = \begin{bmatrix} 3&4 \\ 5&6 \\ 7&0 \end{bmatrix}$ and matrix $B = \begin{bmatrix} 1&9&11 \\ 0&12&14 \end{bmatrix}$. Matrix $R = A \times B$. Element $r_{32}$ is calculated by:
A 3 × 9 + 4 × 12 B 7 × 9 + 0 × 12 C 1 × 11 + 6 × 14
D 1 × 11 + 6 × 14 E 7 × 11 + 0 × 12
33 A gymnasium has the same number of customers every day. They can do either
pilates (P) or yoga (Y). The customers may change activities every day. Their change
in involvement is shown in the transition matrix below.
     P     Y
P   65%   70%
Y   35%   30%
There must be 20 customers in the pilates class. Each day the number of customers in
the yoga class is
A 10 B 12 C 15 D 20 E 25
36 Matrix A is a 5 × 5 matrix.
Matrix B is a row matrix. Matrix C is a column matrix.
Which one of the matrix products below could result in a 1 × 1 matrix?
Matrix $P = \begin{bmatrix} 0&1&0&0&0 \\ 0&0&0&1&0 \\ 1&0&0&0&0 \\ 0&0&0&0&1 \\ 0&0&1&0&0 \end{bmatrix}$ and matrix $Z = \begin{bmatrix} a \\ b \\ c \\ a \\ b \end{bmatrix}$. The smallest value of $n$ such that $P^n Z = Z$ is:
A 1 B 2 C 3 D 4 E 5
The expected number of each rating received after n weeks can be determined by the
recurrence relation
$S_0 = \begin{bmatrix} 300 \\ 600 \\ 100 \end{bmatrix}, \quad S_{n+1} = T S_n$
where $S_0$ is the state matrix for the first week of the survey, with 300 low ratings, 600
medium ratings and 100 high ratings.
38 What percentage of these 1000 participants are not expected to change their rating in
the second week from the first?
A 60% B 43% C 56% D 72% E 82%
39 In the long term (say after more than 20 weeks of the survey), how many of the
participants will give a low rating?
A 150 B 250 C 350 D 450 E 550
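$S_0 = \begin{bmatrix} 25 \\ 16 \\ 56 \end{bmatrix}, \quad S_{n+1} = T S_n \quad \text{where } T = \begin{bmatrix} x & 0.4 & 0.6 \\ y & 0.2 & 0.4 \\ z & w & v \end{bmatrix}$
(Matrix T is a regular transition matrix. The columns of matrix T each sum to one.)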
Which one of the following statements is not necessarily true?
A z+w+v=1 B x+y+z=1 C w = 0.4
D v=0 E z + w + v ≥ 0.4
43 What is the maximum number of edges a bipartite graph with 30 vertices can have?
A 60 B 125 C 200 D 225 E 250
44 Consider the graph opposite. Euler’s formula can be verified for this graph. What
values of e, v and f can be used in this verification?
A e = 5, v = 5, f = 3
B e = 7, v = 5, f = 4
C e = 6, v = 5, f = 3
D e = 6, v = 5, f = 4
E e = 8, v = 4, f = 4
46 For a connected graph with five vertices and five edges, the sum of the degrees of the
vertices is
A 4 B 6 C 8 D 9 E 10
48 The directed graph below shows the sequence of activities required to complete
a project. The time taken to complete each activity, in hours, is also shown. The
minimum completion time for this project is 21 hours. The time taken to complete
activity G is labelled x. The maximum value of x is
[Network from Start to Finish: A, 4; B, 3; C, 10; D, 4; E, 6; F, 3; G, x; H, 2; J, 4; K, 2; L, 2; M, 5]
A 1 B 2 C 3 D 5 E 7
49 The flow of water through a series of pipes is shown in the network below. The
numbers on the edges show the maximum flow through each pipe in litres per minute.
[Network of pipes from the source through A, B, C and D to the sink, with the maximum flow on each pipe and one cut shown]
The capacity of the cut in litres per minute is
A 10 B 12 C 14 D 18 E 20
1 The weights (in kg) carried by the horses in a handicap race are given below.
60 57 57 55 54 53 53 53 52 52 51.5 51
a Calculate the mean and the standard deviation of the weights carried by the horses
in this race, rounding your answers to three decimal places.
b One horse in the race carried a weight of 51 kg. Use your answer to part a. to
calculate the standardised score (z) for the weight carried by that horse, rounding
your answer to one decimal place.
c Suppose that the weights carried by horses in handicap races are approximately
normally distributed. If the weight carried by one horse, Silver, has a standardised
score of z = −2:
i Using the mean and standard deviation determined in part a., how much weight
was carried by Silver? Round your answer to one decimal place.
ii What percentage of horses would be expected to carry weight less than Silver?
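The calculations in Question 1 can be checked with Python’s standard `statistics` module. `stdev` divides by n − 1, which matches the standard deviation s used in this course. This is a sketch for checking working, not a replacement for showing the steps.

```python
from statistics import mean, stdev

weights = [60, 57, 57, 55, 54, 53, 53, 53, 52, 52, 51.5, 51]   # kg, from Question 1

x_bar = mean(weights)
s = stdev(weights)                 # sample standard deviation (divides by n - 1)
print(f"mean = {x_bar:.3f} kg, standard deviation = {s:.3f} kg")

# Part b: standardised score = (actual score - mean) / standard deviation
z_51 = (51 - x_bar) / s
print(f"z for 51 kg = {z_51:.1f}")

# Part c i: rearranging the same rule, actual score = mean + z * standard deviation
silver = x_bar + (-2) * s
print(f"Silver's weight = {silver:.1f} kg")

# Part c ii: by the 68-95-99.7% rule, 95% of values lie within two standard
# deviations of the mean, so 2.5% lie below z = -2.
```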
[Figure: boxplots of the time taken to complete the task, grouped by number of distractions (None, A few, Many)]
a Which variable is the explanatory variable and which is the response variable in this
b Which variables in this study are numerical?
c For the group who completed the task with no distractions
i What is the interquartile range?
ii Show why the value of 38.0 seconds is an outlier for this group.
d People who took more than 36 seconds to complete the task were classified as slow.
How many people in the study would be classified as slow?
e Do the boxplots support the contention that there is an association between number
of distractions and time? Refer to the values of an appropriate statistic in your
3 The equation of the least squares line that relates the fuel consumption of a certain car,
in litres/100 km, to the speed at which the car is travelling, in km/hr is:
fuel consumption = 6.827 + 0.0218 × speed
a Use the summary statistics shown to determine the coefficient of determination as a percentage, rounded to one decimal place.

Statistic            fuel consumption   speed
mean                 8.7556             88.444
standard deviation   0.52941            22.367
b Interpret the value of the coefficient of determination in terms of fuel consumption
and speed.
c Use the equation to predict the fuel consumption of the car if it is travelling at 100
km/hr. Round the answer to one decimal place.
d Write down the slope of the regression line and interpret it in terms of fuel consumption
and speed.
e When the speed was 72 km/hr, the actual fuel consumption was 8.3 litres/100 km.
Show that, when the least squares line is used to predict the fuel consumption at 72
km/hr, the residual is −0.10 rounded to two decimal places.
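For a least squares line, the slope is b = r × s_y / s_x. This means r can be recovered from the slope and the two standard deviations. The Python sketch below uses that relationship, then makes the prediction and finds the residual (actual − predicted). The helper name `predict` is our own.

```python
# Values quoted in Question 3.
intercept, slope = 6.827, 0.0218        # fuel consumption = 6.827 + 0.0218 * speed
sd_fuel, sd_speed = 0.52941, 22.367     # standard deviations from the table

# For a least squares line, slope = r * s_y / s_x, so r = slope * s_x / s_y.
r = slope * sd_speed / sd_fuel
r_squared_pct = 100 * r ** 2

def predict(speed: float) -> float:
    """Fuel consumption (litres/100 km) predicted by the least squares line."""
    return intercept + slope * speed

residual_72 = 8.3 - predict(72)         # residual = actual - predicted

print(f"r^2 = {r_squared_pct:.1f}%")
print(f"prediction at 100 km/hr = {predict(100):.1f}")
print(f"residual at 72 km/hr = {residual_72:.2f}")
```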
4 In a study of the association between a person’s enthusiasm for their job (a numerical
variable measured on a scale from 0 to 15), and their efficiency when performing their
job (a numerical variable measured on a scale from 0 to 25), the following data was
collected from a group of 12 employees.
enthusiasm 14.2 13.2 11.6 12.6 11.6 10.0 8.8 8.3 5.1 4.5 2.8 2.4
efficiency 22.5 16.7 12.0 11.3 10.0 5.2 6.1 4.0 3.4 2.2 2.1 1.9
The following scatterplot was constructed, with enthusiasm as the explanatory variable,
and efficiency as the response variable.
[Scatterplot: efficiency (vertical axis) against enthusiasm (horizontal axis)]
a Describe the association between efficiency and enthusiasm in terms of form and
b Which transformations could be used in order to linearise the association?
c Apply a log transformation to the variable efficiency to linearise the association. Fit
a least squares line to the transformed data, and write down its equation. Round the
values of the intercept and slope to three significant figures.
d Use the equation from part c. to predict the efficiency score for a person who scores
9.3 on enthusiasm.
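One way to carry out parts c and d is sketched below in Python. The least squares formulas are coded by hand, using b = Sxy / Sxx and a = ȳ − b x̄. We assume base-10 logarithms, as on a CAS; the printed coefficients are rounded to three significant figures, as the question asks.

```python
import math

# Data from Question 4.
enthusiasm = [14.2, 13.2, 11.6, 12.6, 11.6, 10.0, 8.8, 8.3, 5.1, 4.5, 2.8, 2.4]
efficiency = [22.5, 16.7, 12.0, 11.3, 10.0, 5.2, 6.1, 4.0, 3.4, 2.2, 2.1, 1.9]

# Log transformation of the response variable (base-10 logs assumed).
log_eff = [math.log10(y) for y in efficiency]

n = len(enthusiasm)
x_bar = sum(enthusiasm) / n
y_bar = sum(log_eff) / n

# Least squares slope and intercept: b = Sxy / Sxx, a = y_bar - b * x_bar.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(enthusiasm, log_eff))
s_xx = sum((x - x_bar) ** 2 for x in enthusiasm)
b = s_xy / s_xx
a = y_bar - b * x_bar

print(f"log10(efficiency) = {a:.3g} + {b:.3g} * enthusiasm")

# Part d: predict for enthusiasm = 9.3, then undo the log transformation.
print(f"predicted efficiency = {10 ** (a + b * 9.3):.1f}")
```

A least squares line always passes through (x̄, ȳ), so its residuals on the transformed data sum to zero. This gives a quick check on the hand-coded formulas.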
Revision 756 Chapter 16 Revision of Chapters 1–15
5 The table below shows the quarterly house sales achieved by a real estate company in
the years 2020-2021.
Year Q1 Q2 Q3 Q4
2020 52 59 68 27
2021 57 65 75 29
a Use the data in the table to find seasonal indices. Give your answers rounded to two
decimal places.
b The number of houses sold in each of the four quarters in 2022 is shown in the table below.
Year Q1 Q2 Q3 Q4
2022 63 69 80 33
Use the seasonal indices from part a to deseasonalise the data. Round your answers
to the nearest whole number.
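The Python sketch below follows the usual steps for seasonal indices. Each value is divided by its yearly average, and the ratios for each quarter are then averaged. As part b instructs, it deseasonalises using the indices rounded to two decimal places. Deseasonalised value = actual value / seasonal index.

```python
# Quarterly house sales from Question 5.
sales = {2020: [52, 59, 68, 27], 2021: [57, 65, 75, 29]}

# Step 1: divide each quarter by that year's quarterly average.
ratios = []
for year, quarters in sales.items():
    yearly_avg = sum(quarters) / len(quarters)
    ratios.append([q / yearly_avg for q in quarters])

# Step 2: the seasonal index for a quarter is the mean of its ratios,
# rounded to two decimal places as in part a.
indices = [round(sum(col) / len(col), 2) for col in zip(*ratios)]

# Step 3: deseasonalised value = actual value / seasonal index.
sales_2022 = [63, 69, 80, 33]
deseasonalised = [round(v / si) for v, si in zip(sales_2022, indices)]

print(indices)
print(deseasonalised)
```

The four indices add to 4 (an average of 1), which is a useful check.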
6 The following table gives the average price of unleaded petrol in Victoria for each
year from 2015 to 2021.
Year 2015 2016 2017 2018 2019 2020 2021
Price (cents/litre) 126.3 116.4 128.7 143.4 141.1 123.9 147.6
a Find the centred four-mean smoothed price of petrol in Victoria for the year 2017, in
cents/litre, rounding your answer to one decimal place.
b Find the five-median smoothed price of petrol in Victoria for the year 2019, in
cents/litre, rounding your answer to one decimal place.
The following time series plot shows the price of petrol in Victoria in cents/litre,
and the price of petrol in the Northern Territory (NT) in cents/litre, over the years
[Time series plot: price of petrol (cents/litre) against year (2000–2025) for Victoria and the NT, with fitted least squares lines]
Least squares regression lines (shown on the plot) have been fitted to both sets of time
series data, and the following equations determined:
Victoria: petrol price = −4071.53 + 2.08699 × year
NT: petrol price = −4290.07 + 2.20128 × year
c i Write down the slope of the least squares line for Victoria rounded to two
decimal places, and interpret.
ii Write down the slope of the least squares line for the NT rounded to two decimal
places, and interpret.
d Use the least squares regression lines to predict the price of petrol in 2026:
i in Victoria ii in the NT
e Do the equations predict that the difference in petrol prices between Victoria and
the NT will decrease, stay the same, or increase? Explain your answer, quoting
appropriate statistics.
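The smoothing and prediction steps in Question 6 are sketched below in Python. A centred four-mean averages two consecutive four-point means. A five-median takes the middle of five ordered values. The function names `vic_price` and `nt_price` are our own labels for the two fitted lines.

```python
from statistics import median

years = [2015, 2016, 2017, 2018, 2019, 2020, 2021]
price = [126.3, 116.4, 128.7, 143.4, 141.1, 123.9, 147.6]   # cents/litre, Victoria

# Part a: centred four-mean smoothing for 2017 averages two consecutive
# four-point means (2015-2018 and 2016-2019), centring the result on 2017.
k = years.index(2017)
first_mean = sum(price[k - 2 : k + 2]) / 4
second_mean = sum(price[k - 1 : k + 3]) / 4
centred_2017 = (first_mean + second_mean) / 2

# Part b: five-median smoothing for 2019 takes the median of 2017-2021.
m = years.index(2019)
median_2019 = median(price[m - 2 : m + 3])

# Part d: predictions from the two least squares lines quoted above.
def vic_price(year):
    return -4071.53 + 2.08699 * year

def nt_price(year):
    return -4290.07 + 2.20128 * year

print(f"centred four-mean (2017): {centred_2017:.2f}")
print(f"five-median (2019): {median_2019}")
print(f"2026: Victoria {vic_price(2026):.1f}, NT {nt_price(2026):.1f}")
```

The NT slope is larger than the Victorian slope by about 0.11 cents/litre per year. The gap between the two predicted prices therefore widens each year.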
7 Joslyn originally paid $9500 for office furniture that she now wishes to sell. Joslyn will
sell the furniture at a depreciated value.
a Joslyn could use a reducing balance method, with an annual depreciation rate of
Using this depreciation method, what is the value of the furniture five years after it
was purchased? Round your answer to the nearest cent.
b If Joslyn used a reducing balance depreciation method and the furniture was sold
for $6890 after five years, what annual percentage rate of depreciation did this
represent? Give your answer correct to one decimal place.
c Joslyn could use a flat rate depreciation method.
Let Jn be the value, in dollars, of Joslyn’s furniture n years after it was purchased.
The value of the furniture, Jn can be modelled by the recurrence relation below.
J0 = 9500, Jn+1 = Jn − 855
i Using this depreciation method, what is the value of the furniture five years after
it was purchased?
ii What annual flat rate of depreciation is represented?
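The Python sketch below covers parts b and c. Part b rearranges 9500(1 − r)^5 = 6890 to find the reducing balance rate. Part c iterates the flat-rate recurrence relation five times. It is an illustrative check only; part a is omitted because its rate is not stated in the question text here.

```python
# Joslyn's furniture (Question 7).
purchase = 9500.0

# Part b: reducing balance, 9500 * (1 - r)^5 = 6890  =>  r = 1 - (6890/9500)^(1/5)
r = 1 - (6890 / purchase) ** (1 / 5)
print(f"reducing balance rate = {100 * r:.1f}% per year")

# Part c: flat rate, J0 = 9500, J(n+1) = J(n) - 855, iterated five times.
value = purchase
for _ in range(5):
    value -= 855
print(f"flat-rate value after 5 years = ${value:.2f}")
print(f"flat rate = {100 * 855 / purchase:.1f}% of the purchase price per year")
```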
8 Marina would like to buy a new bicycle and has saved $2600.
a Marina could invest this money in an account that pays interest which compounds monthly.
The balance of this investment after n months, Mn , could be determined using the
recurrence relation below.
M0 = 2600, Mn+1 = 1.003 × Mn
Calculate the total interest that would be earned by Marina’s investment in the first
five months. Round your answer to the nearest cent.
Marina could invest the $2600 in a different account that allows her to make an
additional payment of $140 each month.
b Marina would like to have a balance of $5500, to the nearest dollar, after 18 months.
What annual interest rate would Marina require? Round your answer to two decimal places.
c The interest rate is 3% per annum, compounding monthly. Let Vn be the value of
Marina’s investment after n months.
Write down a recurrence relation, in terms of V0, Vn+1 and Vn, that would model the
change in the value of this investment.
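The Python sketch below iterates Marina’s recurrence relations directly. For part b it searches for the interest rate by bisection. That is simply one numerical method we chose; a CAS finance solver would normally be used instead. The function `balance_after` and the 0–20% search interval are assumptions made for this illustration.

```python
# Marina's savings (Question 8).

# Part a: M0 = 2600, M(n+1) = 1.003 * M(n); interest = balance after 5 months - 2600.
balance = 2600.0
for _ in range(5):
    balance *= 1.003
interest_5 = balance - 2600
print(f"interest in first five months = ${interest_5:.2f}")

# Part b: balance after a number of months with interest compounding monthly
# and $140 added at the end of each month.
def balance_after(annual_rate_pct, months=18, principal=2600.0, payment=140.0):
    v = principal
    for _ in range(months):
        v = v * (1 + annual_rate_pct / 1200) + payment
    return v

low, high = 0.0, 20.0            # search interval, % per annum (assumed wide enough)
for _ in range(60):              # bisection: halve the interval each time
    mid = (low + high) / 2
    if balance_after(mid) < 5500:
        low = mid
    else:
        high = mid
print(f"required rate = {low:.2f}% per annum")

# Part c: 3% p.a. compounding monthly is 0.25% per month, so
# V0 = 2600, V(n+1) = 1.0025 * V(n) + 140.
```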
9 Alessandro borrows $12 000 with interest on the loan charged at the rate of 7.9% per
annum, compounding monthly.
Immediately after the interest has been calculated and charged each month, Alessandro
will make a repayment.
a Alessandro considers making interest only repayments. What would be the value of
each interest only repayment?
b Alessandro makes equal monthly repayments for four years. After these four
years, the balance of his loan will be $2946.24 correct to two decimal places. What
amount, in dollars, will Alessandro repay each month during the four years?
c Alessandro instead decides to fully repay the loan in three years, with 35 equal
monthly payments followed by a final payment that is as close to the regular
payment as possible. Find both the regular payment and the final payment. Round
your answers to the nearest cent.
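Alessandro’s loan can be modelled step by step as balance × (1 + i) − payment each month, where i = 7.9%/12. The Python sketch below does this. The rounding rule in part c is an assumption: we round the regular payment to the cent and let the 36th payment clear whatever remains. A CAS finance solver may round slightly differently.

```python
principal = 12000.0
i = 0.079 / 12          # monthly interest rate
R = 1 + i

# Part a: an interest-only repayment equals one month's interest.
interest_only = principal * i

# Part b: after 48 repayments of P, the balance is
#   12000 * R**48 - P * (R**48 - 1) / i = 2946.24, so solve for P.
P48 = (principal * R**48 - 2946.24) * i / (R**48 - 1)

# Part c (assumed rounding rule): take the payment that would clear the loan in
# exactly 36 months, round it to the cent, make 35 such payments, and let the
# 36th payment clear whatever is owed after that month's interest is charged.
P36 = round(principal * i / (1 - R**-36), 2)
balance = principal
for _ in range(35):
    balance = balance * R - P36
final_payment = round(balance * R, 2)

print(f"a ${interest_only:.2f}   b ${P48:.2f}")
print(f"c regular ${P36:.2f}, final ${final_payment:.2f}")
```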
10 A regional city has three supermarkets: HSL (H), Radcliffs (R) and Cottonworths (C).
The total number of shoppers at each of the stores on a weekday is shown in matrix W.
$$W = \begin{bmatrix} 1500 & 2500 & 3200 \end{bmatrix}$$
a Write down the order of matrix W.
Each of the supermarkets has a meat counter and a delicatessen. The proportions of daily
shoppers who purchase only from the meat counter (M), only from the delicatessen (D),
or from several sections (G) are the same for each supermarket and are described by
the matrix shown here.
$$P = \begin{bmatrix} 0.1 \\ 0.2 \\ 0.7 \end{bmatrix} \begin{matrix} M \\ D \\ G \end{matrix}$$
b Find the matrix Q = P × W and describe what element q32 represents.
c On a particular day at Cottonworths the amount spent, in dollars, is described by the
following matrix. The amounts indicate the typical amount spent by a customer.
$$A = \begin{bmatrix} 20 \\ 20 \\ 60 \end{bmatrix} \begin{matrix} M \\ D \\ G \end{matrix}$$
The numbers of each type of customer who spent exactly these amounts are
described by the matrix
$$T = \begin{bmatrix} 150 & 250 & 600 \end{bmatrix}$$
Show a matrix multiplication that will give the total amount spent on that day.
d There can be a lot of change in the daily shopping numbers of customers between
the three supermarkets. We use a 3 × 3 transition matrix and a recurrence relation to
describe the changed shopping locations of customers in the city month by month.
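Let
$$S_0 = \begin{bmatrix} 1500 \\ 2500 \\ 3200 \end{bmatrix} \begin{matrix} H \\ R \\ C \end{matrix} \qquad \text{and} \qquad T = \begin{bmatrix} 0.13 & 0.8 & 0.2 \\ 0.7 & 0.1 & 0.2 \\ 0.17 & 0.1 & 0.6 \end{bmatrix} \begin{matrix} H \\ R \\ C \end{matrix}$$
where the columns of T (H, R, C) refer to this month and the rows (H, R, C) refer to next month.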
The recurrence relation is Sn+1 = T × Sn, where Sn is the state matrix n months after
the starting time.
i Find S1. ii Find S50.
iii Describe the long-term situation for the number of customers at each of the supermarkets.
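The matrix products in Question 10 can be checked with a small hand-written matrix multiplication in Python, with no external libraries. The question uses the name T both for the customer counts in part c and for the transition matrix in part d. This sketch therefore renames the part c matrix to the hypothetical `T_customers`.

```python
def matmul(A, B):
    """Multiply A (m x n) by B (n x p); matrices are lists of rows."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))]
            for r in range(len(A))]

# Part b: Q = P x W is a 3 x 3 matrix (rows M, D, G; columns H, R, C).
W = [[1500, 2500, 3200]]
P = [[0.1], [0.2], [0.7]]
Q = matmul(P, W)
print("q32 =", round(Q[2][1]))          # row 3 (G), column 2 (R)

# Part c: (1 x 3 row of customer numbers) x (3 x 1 column of amounts) = 1 x 1 total.
T_customers = [[150, 250, 600]]
A = [[20], [20], [60]]
print("total spent = $", matmul(T_customers, A)[0][0])

# Part d: S(n+1) = T x S(n), with columns of T = this month, rows = next month.
T = [[0.13, 0.8, 0.2],
     [0.7, 0.1, 0.2],
     [0.17, 0.1, 0.6]]
S0 = [[1500], [2500], [3200]]
S1 = matmul(T, S0)
S = S0
for _ in range(50):
    S = matmul(T, S)
S50 = S
print("S1  =", [round(row[0]) for row in S1])
print("S50 =", [round(row[0], 1) for row in S50])
```

Every column of T sums to one, so the total of 7200 customers is preserved at each step. After 50 months the state matrix has settled to a steady state.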
11 When a new infectious disease was first noticed in a particular country, there were
100 people already infected. As the disease spread, the country’s health authority
collected the following information:
The duration of the disease is at most 3 weeks. People who contract the disease
either die during these 3 weeks or else recover during the third week.
The survival rate of people who have the disease is 90% in the first week and 80% in
the second week.
People who contract the disease are not infectious during the first week. In their
second week of the disease they have a 90% probability of infecting one other
person, and in their third week a 70% probability of infecting one other person.
Using this information:
a Construct a Leslie matrix L for the disease based on three one-week stages.
b Assume that, when the disease is first noticed, the 100 infected people are all in the
second week of the disease. Write down the initial population matrix S0.
For each of the following, give answers to the nearest whole number.
c Determine how the disease is spreading by using the Leslie matrix L to find S1, S2,
S3 and S4. Comment.
d Find S40, and then use S40 to find S41. Verify that, in these two population matrices,
the sizes of the three groups are in nearly the same ratio. Hence estimate the growth
rate of the disease at this stage.
e Suppose that the health authority had taken immediate action to reduce the rate of
infection in the third week to 35%. Repeat parts a–d. Would this action have been
sufficient to eradicate the disease? Give evidence for your answer.
f Now suppose that the health authority had taken more drastic action and reduced the
rate of infection in the third week to 10%. Repeat parts a–d.
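The Python sketch below projects the disease forward using Sn+1 = L Sn. It reads the information in one natural way. The first row of L holds the weekly infection rates (0, 0.9 and the third-week rate), and the sub-diagonal holds the survival rates 0.9 and 0.8. Compare this with your own answer to part a. The function names are our own.

```python
def leslie(third_week_rate):
    """Leslie matrix for the three one-week stages (one reading of Question 11).

    Row 1 holds the weekly infection rates (0, 0.9, third_week_rate);
    the sub-diagonal holds the survival rates 0.9 and 0.8.
    """
    return [[0.0, 0.9, third_week_rate],
            [0.9, 0.0, 0.0],
            [0.0, 0.8, 0.0]]

def project(L, S0, weeks):
    """Return [S0, S1, ..., S_weeks] using S(n+1) = L x S(n)."""
    states = [list(S0)]
    for _ in range(weeks):
        S = states[-1]
        states.append([sum(L[r][c] * S[c] for c in range(3)) for r in range(3)])
    return states

S0 = [0, 100, 0]   # all 100 infected people are in their second week

for rate in (0.7, 0.35, 0.1):
    states = project(leslie(rate), S0, 41)
    growth = sum(states[41]) / sum(states[40])   # total in S41 / total in S40
    print(f"third-week rate {rate}: S1..S4 =",
          [[round(x) for x in s] for s in states[1:5]],
          f"growth at week 40 = {growth:.3f}")
```

A growth factor above 1 means the number of infected people keeps rising. A factor below 1 means the disease is dying out.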
12 A certain population of female marsupials is divided into six age groups, each spanning
3 months. The population can then be modelled by the following Leslie matrix, L, and
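initial population matrix, S0:
$$L = \begin{bmatrix} 0 & 0.3 & 0.8 & 0.7 & 0.4 & 0 \\ 0.6 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0.9 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0.9 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0.8 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0.6 & 0 \end{bmatrix} \quad \text{and} \quad S_0 = \begin{bmatrix} 24 \\ 16 \\ 24 \\ 8 \\ 0 \\ 0 \end{bmatrix}$$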
13 The choice of seats and their costs at the Leslie theatre are:
Stalls (S) $34.00 Balcony (B) $42.00 Dress circle (D) $60.00
The numbers of seats available in each class are:
Stalls (S) 200 Balcony (B) 150 Dress circle (D) 80
a The column matrix A contains the number of seats in each class.
$$A = \begin{bmatrix} 200 \\ 150 \\ 80 \end{bmatrix} \begin{matrix} S \\ B \\ D \end{matrix}$$
State the order of A.
b Matrix C gives the cost, in dollars, of each type of seat.
$$C = \begin{bmatrix} 34 & 42 & 60 \end{bmatrix}$$
i How many of these swimmers had E competed with before joining the club?
ii Who had competed with both A and B before joining the club?
b The swimming club has a medley relay team. Three of the new club members, A,
B and C, can complete the following sectors of the medley race: backstroke,
butterfly and breaststroke. The table below shows the average times, in seconds, for
100 m for these sectors for each of the three swimmers. The freestyle swimmer has
been chosen and has much better freestyle times than the three new members.
How should the swimmers be allocated to minimise the team’s time?
iv The project is to be crashed by reducing the completion time of one activity
only. What is the minimum time, in months, that the project can be completed
a Determine a Hamiltonian cycle beginning and ending at A that the family can
b Determine an Eulerian trail that the family can follow. Explain how you determined
this trail.
The network opposite shows the lengths of the paths that join the rides. The family
decides to visit the rides by following an Eulerian trail. Assume that the family can
walk at a speed of 3 km/hr. The theme park will close at 5 p.m.
[Network: rides A, B, C and D joined by paths of 1.6 km, 2.4 km, 4.2 km, 3 km, 1.8 km, 2.1 km, 3.2 km and 3 km]
c What is the latest time that the family can enter the theme park?
17 The diagram below shows the buildings of a new university. The lines on the diagram
show the location of the pathways between the buildings.
[Diagram: university buildings (including C, D and the Office) joined by pathways labelled with their lengths in metres]
a i How many different ways can a student walk directly from building A to
building B?
18 Anthony is creating a robot for a university project. The activities required to design
and build the robot are shown in the table below, along with their duration in weeks and
the immediate predecessor for each activity.
Glossary
68–95–99.7% rule: [p. 77] A rule for determining the percentage of values that lie within one, two and three standard deviations of the mean in a normal distribution.
A
Activity: [p. 683] A task to be completed as part of a project. Activities are represented by the edges in the project diagram.
Activity network: [p. 684] An activity network is a weighted directed graph that shows the required order of completion of the activities that make up a project. The weights indicate the durations of the activities they represent.
Adding to the principal: See annuity investment.
Adjacency matrix: [p. 619] A square matrix showing the number of edges joining each pair of vertices in a graph.
Algorithm: [p. 637] A step-by-step procedure for solving a particular problem that involves applying the same process repeatedly. Examples include Prim’s algorithm and the Hungarian algorithm.
Allocation: [p. 74] Allocation is the process of assigning a series of tasks to different members of a group in a way that enables the tasks to be completed for the minimum time or cost.
Amortisation: [p. 407] Amortisation is the repayment of a loan or an investment with regular payments made over a period of time.
Amortisation table: [p. 408] An amortisation table charts the amortisation (repayment) of a reducing balance loan or annuity on a step-by-step (payment-by-payment) basis, or the payment of a compound interest investment with additional payments.
Annuity: [p. 402] An annuity is a compound interest investment from which regular payments are made.
B
Backward scanning: [p. 696] Backward scanning is the process of determining the LST for each activity in a project activity network.
Balance: [p. 342] The balance of a loan or investment is the amount owed or accrued after a period of time.
Bar chart: [p. 8] A statistical graph used to display the frequency distribution of categorical data.
Binary matrix: [p. 523] A matrix whose elements are either zero or one.
Bipartite graph (bigraph): [p. 674] A graph whose set of vertices can be split into two subsets, X and Y, in such a way that each edge of the graph joins a vertex in X and a vertex in Y.
Bivariate data: [p. 104] Data in which each observation involves recording information about two variables for the same person or thing. An example would be data recording the height and weight of the children in a preschool.
Boxplot: [p. 61] A graphical display of the five-number summary of a data set, showing outliers if present. See outliers.
Bridge: [p. 611] A single edge in a connected graph that, if removed, leaves the graph
Compound interest: [p. 360] Where the interest paid on a loan or investment is added to
Cycle (graphs): [p. 625] A walk with no repeated vertices that starts and ends at the same vertex. See also circuit.
Cycle (time series): [p. 258] Periodic movement in a time series but over a period greater than a year.
D
Data transformation: [p. 211] Using a mathematical rule to change the scale on either the x- or y-axis in order to linearise a non-linear scatterplot.
Degenerate graph: [p. 610] A graph in which no vertex is connected to any other vertex. All the vertices are isolated.
Degree of a vertex (deg(A)): [p. 609] The number of edges attached to the vertex. The degree of vertex A is written as deg(A).
Depreciation: [p. 343] The reduction in value of an item over time.
Deseasonalise: [p. 285] The process of removing seasonality in time series data.
Determinant: [p. 514] A number associated with square matrices. The determinant of a matrix A, written det(A), is used to decide if the matrix has an inverse. If det(A) = 0, the matrix has no inverse; it is singular.
Dijkstra’s algorithm: [p. 637] An algorithm for finding the shortest path between two vertices in a weighted graph. Pronounced ‘Di-stra’: ‘Di’ as in ‘die’ and ‘stra’ as in ‘car’.
Directed graph (digraph): [p. 660] A graph or network in which directions are associated with each of the edges.
Discrete variable: [p. 3] A numerical variable that represents a quantity that is determined by counting; for example, the number of people waiting in a queue is a discrete variable.
Dominance matrix: [p. 531] A square binary matrix in which the 1s represent one-step dominances between the members of a graph.
Dot plot: [p. 28] A statistical graph that uses dots to display individual data values on a number line; suitable for small sets of data only.
Dummy activity: [p. 687] An artificial activity of zero time duration added to a project diagram to ensure that all predecessor activities are properly accounted for.
E
Earliest starting time (EST): [p. 695] The earliest time an activity in a project can be started.
Edge: [p. 609] A line joining one vertex in a graph or network to another vertex or itself (a loop).
Effective interest rate: [p. 375] Used to compare the interest paid on loans (or investments) with the same annual nominal interest rate r but with different compounding periods (daily, monthly, quarterly, annually, other).
Elements: [p. 474] The numbers or symbols displayed in a matrix.
Equal matrices: [p. 491] Matrices that have the same order and identical elements in identical positions.
Equivalent graph: [p. 613] See isomorphic graphs.
Eulerian circuit: [p. 627] An Eulerian walk that starts and finishes at the same vertex. To have an Eulerian circuit, a network must be connected and all vertices must be of even degree.
Eulerian trail: [p. 627] A walk in a graph or network that includes every edge just once (but does not start and finish at the same vertex). To have an Eulerian trail (but not an Eulerian circuit), a network must be connected and have exactly two vertices of odd degree, with the remaining vertices having even degree.
Euler’s formula: [p. 614] The formula v − e + f = 2, which relates the number of vertices, edges and faces in a connected planar graph.
Explanatory variable: [p. 105] When investigating associations in bivariate data, the explanatory variable (EV) is the variable used to explain or predict the value of the response variable (RV).
Extrapolation: [p. 182] Using a mathematical model to make a prediction outside the range of data used to construct the model.
Face: [p. 614] An area in a graph or network that can only be reached by crossing an edge. One such area is always the area surrounding a graph.
Finance Solver: [p. 422] A finance solver is a computer/calculator application that automates the computations associated with analysing a reducing balance loan, an annuity or an annuity investment.
Five-number summary: [p. 61] A list of the five key points in a data distribution: the minimum value (min), the first quartile (Q1), the median (M), the third quartile (Q3) and the maximum value (max).
Flat-rate depreciation: [p. 344] Depreciation where the value of an item is reduced by the same amount each year. Flat-rate depreciation is equivalent, but opposite, to simple interest.
Float (slack) time: [p. 694] The amount of time available to complete a particular activity that does not increase the total time taken to complete the project.
Flow: [p. 660] Flow is the movement of something from a source to a sink.
Forward scanning: [p. 695] Forward scanning is the process of determining the EST for each activity in a project activity network.
Frequency table: [p. 7] A listing of the values a variable takes in a data set along with how often (frequently) each value occurs. Frequency can be recorded as a count or as a percentage.
Hamiltonian cycle: [p. 627] A Hamiltonian path that starts and finishes at the same vertex.
Hamiltonian path: [p. 627] A path through a graph or network that passes through each vertex exactly once. It may or may not start and finish at the same vertex.
Histogram: [p. 15] A statistical graph used to display the frequency distribution of a numerical variable; most suitable for medium to large sized data sets.
Hungarian algorithm: [p. 676] An algorithm for solving allocation (assignment) problems.
I
Identity matrix (I): [p. 477] A matrix that behaves like the number one in arithmetic. Any matrix multiplied by an identity matrix remains unchanged. An identity matrix is represented by the symbol I.
Immediate predecessor: [p. 683] An activity that must be completed immediately before another one can start.
Initial state matrix: [p. 560] A column matrix used to represent the starting state of a dynamic system.
Interest: [p. 342] The amount of money paid (earned) for borrowing (lending) money over a period of time.
Interest-only loans: [p. 446] A loan on which only the interest is paid. At the end of the loan, the principal must be repaid in full.
the identity matrix (I). For a matrix A, the inverse is written as $A^{-1}$ and has the property that $A^{-1}A = AA^{-1} = I$.
Irregular (random) fluctuations: [p. 262] Unpredictable fluctuations in a time series. Always present in any real-world time series plot.
Isolated vertex: [p. 610] A vertex that is not connected to any other vertex. Its degree is zero.
Isomorphic graphs: [p. 613] Equivalent graphs. Graphs that have the same number of edges and vertices that are identically connected.
Iteration: [p. 336] Each application of a recurrence rule to calculate a new term in a sequence is called an iteration.
L
Latest start time (LST): [p. 696] The latest time an activity in a project can begin, without affecting the overall completion time for the project.
Least squares method: [p. 170] One way of finding the equation of a regression line. It minimises the sum of the squares of the residuals. It works best when there are no outliers.
Linear decay: [p. 340] When a recurrence rule involves subtracting a fixed amount, the terms in the resulting sequence are said to decay linearly.
Linear growth: [p. 340] When a recurrence rule involves adding a fixed amount, the terms in the resulting sequence are said to grow linearly.
Linear regression: [p. 169] The process of fitting a straight line to bivariate data.
Log scale: [p. 34] A scale used to transform a strongly skewed histogram to symmetry or linearise a scatterplot.
Logarithmic transformations (log x or log y): [p. 221] Transformations that linearise a scatterplot by compressing the upper end of the scale on an axis.
Loop: [p. 610] An edge in a graph or network that joins a vertex to itself.
Lower fence: [p. 63] See outliers.
Matrix: [p. 473] A rectangular array of numbers or symbols set out in rows and columns within square brackets (plural: matrices).
Matrix multiplication: [p. 499] The process of multiplying a matrix by a matrix.
Maximum flow (graph): [p. 661] The capacity of the ‘minimum’ cut.
Maximum or minimum value of the objective function: [p. 759] The value found by evaluating the objective function’s value at the vertices or along the boundaries of the feasible
Mean ($\bar{x}$): [p. 50] The balance point of a data distribution. The mean is given by $\bar{x} = \dfrac{\sum x}{n}$, where $\sum x$ is the sum of the data values and n is the number of data values. Best used for symmetric
Median: [p. 44] The median (M) is the middle value in a data distribution. It is the midpoint of a distribution dividing an ordered data set into two equal parts. Can be used for skewed or symmetric
Minimum cut (graph): [p. 664] The cut through a graph or network with the minimum
Minimum spanning tree: [p. 642] The spanning tree of minimum length. For a given connected graph, there may be more than one minimum spanning tree.
Modal category or modal interval: [p. 9] The category or data interval that occurs most frequently in a data set.
Mode: [p. 9] The most frequently occurring value in a data set. There may be more than one.
Modelling: [ch. 7] Mathematical modelling is the use of a mathematical rule or formula to represent real-life situations.
Moving mean smoothing: [p. 268] In three-moving mean smoothing, each original data value is replaced by the mean of itself and the value on either side. In five-moving mean smoothing, each original data value is replaced by the mean of itself and the two values on either side.
Moving median smoothing: [p. 277] Moving median smoothing is a graphical technique for smoothing a time series plot using moving medians rather than moving means.
Multiple edges: [p. 610] Where more than one
as data values greater than the upper fence (Q3 + 1.5 × IQR) or less than the lower fence (Q1 − 1.5 × IQR).
Range (R): [p. 46] The difference between the smallest and the largest observations in a data set; a measure of spread.
Reciprocal transformations (1/x or 1/y): [p. 230] Transformations that linearise a scatterplot by compressing the upper end of the scale on an axis to a greater extent than the log transformation.
Recurrence relation: [p. 336] A relation that enables the value of the next term in a sequence to be obtained from one or more current terms. Examples include ‘to find the next term, add two to the current term’ and ‘to find the next term, multiply the current term by three and subtract
Reducing-balance depreciation: [p. 361] When the value of an item is reduced by the same percentage each year. Reducing-balance depreciation is equivalent to, but opposite to, compound interest.
Reducing-balance loan: [p. 400] A loan that attracts compound interest, but where regular repayments are also made. In most instances the repayments are calculated so that the amount of the loan and the interest are eventually repaid in
Redundant communication link: [p. 527] A communication link is said to be redundant if the sender and the receiver are the same people.
Reseasonalise: [p. 285] The process of converting deseasonalised data back into its original form.
Residual: [p. 170] The vertical distance from a data point to a straight line fitted to a scatterplot is called a residual: residual = actual value − predicted value. Residuals are sometimes called errors of prediction.
Residual plot: [p. 184] A plot of the residuals against the explanatory variable. Residual plots can be used to investigate the linearity assumption.
Response variable: [p. 105] The variable of primary interest in a statistical investigation.
Round-robin tournament: [p. 531] A tournament in which each participant plays each other participant once.
Row matrix: [p. 474] A matrix with only one row. A row matrix is also called a row vector.
Row vector: [p. 474] See row matrix.
S
Scalar multiplication: [p. 492] The multiplication of a matrix by a number.
Scatterplot: [p. 130] A statistical graph used for displaying bivariate data. Data pairs are represented by points on a coordinate plane; the EV is plotted on the horizontal axis and the RV is plotted on the vertical axis.
Scrap value: [p. 344] The value at which an item is no longer of use to a business.
Seasonal indices: [p. 284] Indices calculated when the data shows seasonal variation. Seasonal indices quantify seasonal variation. A seasonal index is defined by the formula: $\text{seasonal index} = \dfrac{\text{value for season}}{\text{seasonal average}}$. For seasonal indices, the average is 1 (or 100%).
Seasonality: [p. 259] The tendency for values in the time series to follow a seasonal pattern, increasing or decreasing predictably according to time periods such as time of day, day of the week, month, or quarter.
Segmented bar chart: [p. 9] A statistical graph used to display the information contained in a two-way frequency table. It is a useful tool for identifying associations between two categorical variables.
Sequence: [p. 334] A list of numbers or symbols written down in succession, for example 5, 15, 25, . . .
Shape of a distribution: [p. 21] The general form of a data distribution described as symmetric, positively skewed or negatively skewed.
Shortest path: [p. 634] The path through a graph or network with minimum length.
Simple graph: [p. 610] A graph with no loops or multiple edges.
Simple interest: [p. 342] Interest that is calculated for an agreed period and paid only on the original amount invested or borrowed.
Singular matrix: [p. 517] A matrix that does not have an inverse; its determinant is zero.
Sink: See sink and source.
Sink and source: [p. 660] In a flow network, a source generates flow while a sink absorbs the flow.
Slope (of a straight line): [p. 169] The slope of a straight line is defined to be: $\text{slope} = \dfrac{\text{rise}}{\text{run}}$. The slope is also known as the gradient.
Smoothing: [p. 268] A technique used to eliminate some of the variation in a time series plot so that features such as seasonality or trend are more easily identified.
Source: See sink and source.
Spanning tree: [p. 641] A subgraph of a connected graph that contains all the vertices of the original graph, but without any multiple edges, circuits or loops.
Spread of a distribution: [p. 46 & p. 52] A measure of the degree to which data values are clustered around some central point in the distribution. Measures of spread include the standard deviation (s), the interquartile range (IQR) and the range (R).
Square matrix: [p. 476] A matrix with the same number of rows as columns.
Squared transformations ($x^2$ or $y^2$): [p. 212] Transformations that linearise a scatterplot by stretching out the upper end of the scale on an axis.
Standard deviation (s): [p. 52] A summary statistic that measures the spread of the data values around the mean. The standard deviation is given by $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$.
Standardised (z) scores: [p. 80] The value of the standard score gives the distance and direction of a data value from the mean in terms of standard deviations. The rule for calculating a standardised score is: $\text{standardised score} = \dfrac{\text{actual score} - \text{mean}}{\text{standard deviation}}$.
State matrix: [p. 559] A column matrix that represents the starting state of a dynamic system.
Statistical question: [p. 10] A question that depends on data for its answer.
Steady-state matrix: [p. 565] A column matrix that represents the final state of a dynamic system. Also called the equilibrium state.
Stem plot (stem-and-leaf plot): [p. 29] A method for displaying data in which each observation is split into two parts, a ‘stem’ and a ‘leaf’. A stem plot is an alternative display to a histogram, suitable for small to medium sized data sets. When data values are tightly clustered, stems can be split to give finer detail.
Strength of a linear relationship: [p. 143] Classified as weak, moderate or strong. Determined by observing the degree of scatter in a scatterplot or calculating a correlation coefficient.
Structural change (time series): [p. 260] A sudden change in the established pattern of a time series plot.
Subgraph: [p. 611] Part of a graph that is also a graph in its own right.
Summary statistics: [p. 43] Statistics that give numerical values to special features of a data distribution, such as centre and spread. Summary statistics include the mean, median, range, standard deviation and IQR.
Symmetric distribution: [p. 21] A data distribution in which the data values are evenly spread out around the mean. In a symmetric distribution, the mean and the median are the
Time series data: [p. 253] A collection of data values along with the times (in order) at which they were recorded.
Time series plot: [p. 253] A line graph where the values of the response variable are plotted in time order.
Trail: [p. 625] A walk with no repeated edges. See also path.
Transition matrix (T): [p. 551] A square matrix that describes the transitions made between the states of a system.
Transpose: [p. 475] The transpose of a matrix is obtained by interchanging its rows and columns.
Tree: [p. 641] A connected graph with no circuits, multiple edges or loops.
Trend: [p. 257] The tendency for values in the time series to generally increase or decrease over a
Univariate data: [p. 1] Data associated with a single variable.
significant period of time. Upper fence: [p. 63] See outliers.
Trend line forecasting: [p. 296] Using a line
fitted to an increasing or decreasing time series to
predict future values. V
Triangular matrix: [p. 477] An upper Vertex (graph): [p. 609] The points in a graph
U → Z
triangular matrix is a square matrix in which all or network (pl vertices).
elements below the leading diagonal are zeros.
A lower triangular matrix is a square matrix in
which all elements above the leading diagonal are W
zeros. Walk [p. 624] Any continuous sequence of
Two-way frequency table: [p. 110] A edges, linking successive vertices, that connects
frequency table in which subjects are classified two different vertices in a graph. See also trail
according to two categorical variables. Two-way and path.
frequency tables are commonly used to investigate Weighted graph: [p. 633] A graph in which
the associations between two categorical a number representing the size of some quantity
variables. is associated with each edge. These numbers are
called weights.
Unit-cost depreciation: [p. 345] Deprecia- Z
tion based on how many units have been produced Zero matrix (O): [p. 493] A matrix that
or consumed by the object being depreciated. For behaves like zero in arithmetic. Represented by
example, a machine filling bottles of drink may be the symbol O. Any matrix with zeros in every
depreciated by 0.001 cents per bottle it fills. position is a zero matrix.
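The standard deviation and standardised score formulas above can be checked numerically. The following is a minimal Python sketch that assumes the n − 1 (sample) form of the standard deviation given in the glossary. The function names and the data set are hypothetical and are chosen only for illustration.

```python
import math


def standard_deviation(values):
    """Sample standard deviation: s = sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


def standardised_score(actual, mean, sd):
    """z = (actual score - mean) / standard deviation."""
    return (actual - mean) / sd


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set with mean 5
s = standard_deviation(data)
print(round(s, 3))                             # 2.138
print(round(standardised_score(9, 5, s), 2))   # 1.87: about 1.87 sd above the mean
```

A positive z-score places the value above the mean and a negative one places it below, which matches the "distance and direction" reading given in the glossary.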
Chapter 1
Exercise 1A
1 a numerical b numerical c categorical d categorical e numerical f numerical g categorical h categorical
2 a nominal b nominal c ordinal d ordinal e ordinal f nominal
1 a
Grades Count %
A 3 27.3
B 5 45.5
C 3 27.3
Total 11 100.1
b
Shoe size Count %
8 6 50.0
9 3 25.0
10 2 16.7
11 0 0
12 1 8.3
Total 12 100.0
Total 11 100.0
c [Bar chart: State of residence (Victoria, SA, WA)]
3 a categorical
b
Car size Count %
Small 8 40
Medium 9 45
Large 3 15
Total 20 100
[Bar chart: Frequency by car size (Small, Medium, Large)]
4 a nominal
b [Figure: Place of birth (Overseas, …)]
c [Bar chart: Vehicle type (Commercial, …)]
6 a 20, 55 b 5 c 20 d 55%
e Report: 20 schools were classified according to school type. The majority of these schools, 55%, were found to be government schools. Of the remaining schools, 25% were independent while 20% were Catholic schools.
7 a 7, 45.5, 100.0
b Report: When 22 students were asked the question, ‘How often do you play sport?’, the most frequent response was ‘sometimes’, given by 45.5% of the students. Of the remaining students, 31.8% of the students responded that
Exercise 1C
1 a
Number Count %
0 6 30
1 4 20
2 3 15
3 3 15
4 2 10
5 2 10
Total 20 100
b 20%
c 0
2 a
Number Count %
2 1 2.5
3 0 0
4 17 42.5
5 13 32.5
6 9 22.5
Total 40 100.0
b 2.5%
c 4
3 a
Height (cm) Frequency
160−164 5
165−169 5
170−174 5
175−179 6
180−184 3
185−189 1
Total 25
b 175−179
c 16%
4 [Histogram: Population density]
5 a i 17% ii 13% iii 46% iv 33%
b i 6 ii 4
c 15–19 words/sentence
6 a 21
b i 13 ii 8 iii 5 iv 0
c i 4.8% ii 57.1%
7 a [Figure]
b i 69 ii 3; 69, 70, 70
c [Figure: Pulse]
d 3
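The percentage columns in these frequency tables are computed as count ÷ total × 100. This minimal Python sketch reproduces the table for Exercise 1C question 2 a. The helper name `percentage_table` is hypothetical.

```python
def percentage_table(counts):
    """Return ({value: percentage to 1 d.p.}, total) for a frequency table."""
    total = sum(counts.values())
    percentages = {value: round(100 * count / total, 1) for value, count in counts.items()}
    return percentages, total


counts = {2: 1, 3: 0, 4: 17, 5: 13, 6: 9}  # counts from Exercise 1C, question 2 a
percentages, total = percentage_table(counts)
print(total)        # 40
print(percentages)  # {2: 2.5, 3: 0.0, 4: 42.5, 5: 32.5, 6: 22.5}
```

Each percentage is rounded on its own, so a percentage column can total slightly more or less than 100. The grades table in Exercise 1B, for example, totals 100.1.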
8 a [Histogram: count against number of children]
b 3.5, 5
c [Histogram: number of children]
d i 2 ii 6 and 7
9 a [Histogram A, annotated: mode; positively skewed; potential outlier]
b [Histogram B, annotated: symmetric]
c [Histogram C, annotated: symmetric; centre]
d [Histogram D, annotated: mode; positively skewed; centre]
e [Histogram E, annotated: mode; centre]
f [Histogram F, annotated: modes; symmetric]
ISBN 978-1-009-11041-9 © Peter Jones et al 2023 Cambridge University Press
Photocopying is restricted under law and this material must not be transferred to another party.
1D 778 Answers
Exercise 1E
1 a 0.4 b 1.4 c 2.4 d 3.4 e −0.3 f −1.3 g −2.3 h −3.3
2 a 0.0032 b 0.032 c 0.32 d 1.0
3 a [Dot plot: brain weight (gm)]
The shape is positively skewed with outliers.
b [Dot plot: log(brain weight)]
The shape is slightly negatively skewed but closer to symmetric.
4 a −0.4 b 3.8 c 100 g d 0.1 g
e i 5 ii 12 iii 24
5 B 6 D
Exercise 1F
1 a 5 b 12
2 $850
3 M = 1
4 a M = 7.3 b R = 6.4
5 a M = 2 b Q1 = 1, Q3 = 3 c IQR = 2 d R = 7
6 a M = 11 b Q1 = 10, Q3 = 15 c IQR = 5, R = 18
b M = 26
c Q1 = 17.5, Q3 = 30.5
d IQR = 13, R = 29
8 a positively skewed with a possible outlier at 6.
b M = 0 c IQR = 1 d R = 6
9 a M = 21 b Q1 = 10.5, Q3 = 28 c IQR = 17.5, R = 54
10 Median from 65 to less than 70, Q1 from 60 to less than 65, Q3 from 75 to less than 80.
11 a Median in the interval 5.0–9.9.
b 8
max IQR = 19.9
12 a n = 4, Σx = 12, x̄ = 3
b n = 5, Σx = 104, x̄ = 20.8
c n = 7, Σx = 21, x̄ = 3
13 a x̄ = 3, M = 3, Mode = 2 b x̄ = 5, M = 5, Mode = 5
14 a i mean = 36.1 ii median = 36.0
b The mean and median almost coincide because the distribution is approximately symmetric.
15 a i mean = $3.65 ii median = $1.70
b The median. The mean is inflated because of the one large sale and is not representative of the sales in general.
16 a strongly positively skewed distribution
b positively skewed distribution with
17 a symmetric; either b mean = 82.55, median = 82.5
18 a IQR b range c standard deviation
19 7.1, 0
20 b, d, f
21 a 20.1, 1.8 b symmetric
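The medians, quartiles and IQRs in these answers can be reproduced with a short routine. This Python sketch assumes the "halves" convention: Q1 and Q3 are the medians of the lower and upper halves of the ordered data, and the median itself is left out of both halves when n is odd. Calculators and spreadsheets sometimes interpolate quartiles instead, so their results can differ slightly. The fence rule (Q1 − 1.5 × IQR, Q3 + 1.5 × IQR) is the usual boxplot outlier rule. All names and data here are hypothetical, and at least two values are assumed.

```python
def median(sorted_values):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """(min, Q1, median, Q3, max) using the halves convention for quartiles."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                                 # values below the median
    upper = data[half + 1:] if n % 2 else data[half:]   # values above the median
    return data[0], median(lower), median(data), median(upper), data[-1]


def fences(q1, q3):
    """Lower and upper fences: Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])  # hypothetical data
print(summary)                         # (1, 2.5, 5, 7.5, 9)
print(fences(summary[1], summary[3]))  # (-5.0, 15.0)
```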
c The distribution is approximately symmetric but with outliers. The distribution is centred at 41, the median value. The spread of the distribution, as measured by the IQR, is 7 and, as measured by the range, 36. The outliers are at 10, 15, 20 and 25.
16 The median time it takes Taj to travel to university is 70 minutes. The range of the distribution of travel times is 60 minutes, but the interquartile range is only 15 minutes. The distribution of travel times is positively skewed with two outliers, unusually long travel times of 110 minutes and 120 minutes respectively.
17 B 18 A 19 B 20 D
Exercise 1H
11 a 120 b 116 c 142 d 100 e 72 f 50
12 $1.50
13 101 g
14 mean = 3.5 kg, st dev = 0.5 kg
15 mean = 66.0 marks, st dev = 7.7 marks
16 a 0.2 b 46.5 c 2.5% d 34% e 16% f 97.5%
17 a i 16% ii 2.5% b 130 c 133
18 a i 84% ii 97.5% b 184 cm c 144 cm d 150.4 cm
19 A 20 D 21 C 22 C 23 C
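Percentages such as 16%, 34%, 84% and 97.5% in the answers above follow from the 68–95–99.7% rule for normal distributions. This Python sketch encodes the rule as the percentage of values lying below mean + k × sd, for whole numbers k from −3 to 3. The table and function names are hypothetical, and the sketch deliberately refuses values that are not a whole number of standard deviations from the mean.

```python
# Percentage of values below (mean + k * sd) under the 68-95-99.7% rule.
RULE_PERCENT_BELOW = {-3: 0.15, -2: 2.5, -1: 16.0, 0: 50.0, 1: 84.0, 2: 97.5, 3: 99.85}


def percent_below(x, mean, sd):
    """Percentage of values below x, for x a whole number of sds from the mean."""
    k = (x - mean) / sd
    nearest = round(k)
    if abs(k - nearest) > 1e-9 or nearest not in RULE_PERCENT_BELOW:
        raise ValueError("the rule only covers whole numbers of sds from -3 to 3")
    return RULE_PERCENT_BELOW[nearest]


def percent_between(a, b, mean, sd):
    """Percentage of values between a and b (a < b)."""
    return percent_below(b, mean, sd) - percent_below(a, mean, sd)


# Hypothetical example: mean 170 cm, standard deviation 8 cm
print(percent_below(154, 170, 8))         # 2.5  (two sds below the mean)
print(percent_between(162, 178, 170, 8))  # 68.0 (within one sd of the mean)
print(percent_between(170, 178, 170, 8))  # 34.0 (mean to one sd above)
```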
Agree 18 34.6
Disagree 26 50.0
Don’t know 8 15.4
Total 52 100.0
[Segmented bar chart: Agree, Disagree, Don’t know]
iv 0
5 a 18 cm b 5.5%
c [Figure]
Exercise 2B
1 a EV: gender, RV: intends to go to university
b
Intends to go to university   Gender: Male   Female
Yes 4 8
No 4 4
Total 8 12
2 a EV: age group, RV: reduce university fees?
b
Reduce university fees?   Age group: 17-18   19-25   26 or more
Yes 8 6 6
No 3 3 4
Total 11 9 10
c
Reduce university fees?   Age group (%): 17-18   19-25   26 or more
Yes 72.7 66.7 60.0
No 27.3 33.3 40.0
Total 100.0 100.0 100.0
3 a enrolment status
b No. The percentage of full-time and part-time students who drank alcohol is similar: 80.5% to 81.8%. This indicates that drinking behaviour is not related to enrolment status.
4 a handedness
b
Handedness   Gender (%): Male   Female
Left 9.0 9.8
Right 91.0 90.2
Total 100.0 100.0
c No, there is little difference in the percentage of males and females who are left handed, 9.0% compared to 9.8%.
5 a course b ordinal c 54.9%
d Yes; the percentage of Business students who exercise regularly (18.6%) was much higher than the percentage of Arts students who exercise regularly (5.9%).
6 a
Result   Teacher (%): Dr Evans   Dr Smith
Fail 11.1 9.4
Pass 61.1 62.5
Credit 27.8 28.1
Total 100.0 100.0
b [Segmented bar chart: result by teacher (Dr Evans, Dr Smith)]
c There is no evidence that students of Dr Evans receive higher grades than students of Dr Smith. The percentage of students achieving each grade level is almost the same for both classes (e.g. 61.1% compared with 62.5% for students who received a Pass).
7 The data supports the contention that people who are satisfied with their job are more likely to be satisfied with their life, with 70% of people who are satisfied with their job reporting that they are satisfied with their life, compared to only 50% of people who are dissatisfied with their job.
8 a EV: type of treatment, RV: treatment outcome
b The data supports the contention that the special pillow is more effective at treating snoring than the drug treatment, with just over 30% of people who used the special pillow reporting they were cured, compared to only 10% of people who used the drug.
9 a 11.9% b 52.3% c marital status d ordinal
e Yes. There are several ways that this can be seen. For example, by comparing the married and widowed groups, we can see that a smaller percentage of those widowed found life exciting (33.8%) compared to those who were married (47.6%). Or: a bigger percentage of widowed people found life pretty routine (54.3% to 48.7%) and dull (11.9% to 3.7%) compared to those who were married.
10 A 11 B 12 C
Exercise 2C
1 a EV: country of origin, RV: number of days away
b The number of days these tourists spend away from home was associated with their country of origin. The median number of days spent away from home for Japanese tourists (M = 17 days) is considerably higher than for Australian tourists (M = 7 days). The variability for the number of days away is also higher for Japanese tourists (IQR = 16.5) compared to that for Australian tourists (IQR = 10.5).
2 a age: numerical, gender: categorical
b From this information it can be concluded that the age of the people admitted to the hospital during this week was associated with their gender. The median age of the females (34 years) admitted to the hospital was considerably higher than the median age of males (25.5 years). The variability of the ages was also higher for the females (IQR = 28 years) compared to that of the males (IQR = 13 years).
3 a hours online: numerical, year level: categorical
b From this information it can be concluded that the median number of hours spent online was associated with year level. The median time spent online by the Year 10 students (20 hours) was higher than the median number of hours by the Year 11 students (16.5 hours). The variability of the hours spent online was lower for the Year 10 students (IQR = 9.5 hours) compared to that of the Year 11 students (IQR = 13 hours).
4 a age at marriage: numerical, gender: categorical
b For this data there is an association between age at marriage and gender. The age at marriage is higher for men (M = 23 years) than for women (M = 20.5 years). The variability is also greater for the men (IQR = 12 years) than for the women (IQR = 8.5 years). The distributions of age at marriage are positively skewed for both men and women. There are no outliers.
5 a pulse rate: numerical, gender: categorical
b For this data there is an association between pulse rate and gender. The pulse rates for males (M = 73 beats/min) are lower than the pulse rates for women (M = 76 beats/min). The variability is also lower for the males (IQR = 8 beats/min) than for the women (IQR = 14 beats/min). Both distributions are approximately symmetric, with no outliers.
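The percentaged two-way table in Exercise 2B question 2 c is built by dividing each count by its column total, because the explanatory variable (age group) forms the columns. This minimal Python sketch reproduces that table from the counts in 2 b. The function name `column_percentages` is hypothetical.

```python
def column_percentages(table):
    """table maps each row label to a list of counts, one per column.

    Returns ({row label: column percentages to 1 d.p.}, column totals).
    """
    n_cols = len(next(iter(table.values())))
    col_totals = [sum(row[j] for row in table.values()) for j in range(n_cols)]
    percentages = {
        label: [round(100 * row[j] / col_totals[j], 1) for j in range(n_cols)]
        for label, row in table.items()
    }
    return percentages, col_totals


# Counts from Exercise 2B, question 2 b (columns: 17-18, 19-25, 26 or more)
counts = {"Yes": [8, 6, 6], "No": [3, 3, 4]}
percentages, totals = column_percentages(counts)
print(totals)       # [11, 9, 10]
print(percentages)  # {'Yes': [72.7, 66.7, 60.0], 'No': [27.3, 33.3, 40.0]}
```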
6 a lifetime: numerical; price: categorical
b For this data there is an association between the lifetime of a battery and its price. The lifetime of the high price batteries (M = 51 hours) is longer than that of the medium price batteries (M = 35 hours), which is in turn slightly longer than that of the low price batteries (M = 32 hours). The variability in lifetime increased as price decreased, from IQR = 7 hours for the high price batteries, to IQR = 12 hours for the medium price batteries, and IQR = 17 hours for the low price batteries. All
7 A 8 D
Exercise 2D
1 a number of seats b numerical c 8 aircraft d around 800 km/h
2 [Scatterplot: max temp against min temp]
3 [Figure]
4 [Figure]
5 [Figure]
6 D
Exercise 2E
1 Note: There are no absolute right or wrong answers to these questions as answering them requires a degree of personal judgment.
i no association ii yes, positive iii yes, positive iv yes, positive v yes, negative vi yes, negative
2 a i moderate, positive, linear association ii weak, negative, linear association iii strong, positive, linear association
conditions. Weather conditions are the probable common cause.
4 Maybe but not necessarily. Bigger hospitals tend to treat more people with serious illnesses and these require longer hospital stays. A common cause could be the type of patients treated at the hospital.
5 Not necessarily. Possible confounding variables include age and diet.
6 There is no logical link between eating cheese and becoming tangled in bed sheets and dying. The correlation is probably spurious and the result of coincidence.
7 Not necessarily. For example, the more serious the fire, the more fire trucks in attendance and the greater the fire damage. A possible common cause is the severity of the fire.
8 E
Exercise 2I
1 a segmented bar chart b scatterplot c parallel box plots d scatterplot e scatterplot f segmented bar chart g segmented bar chart h parallel box plots or back-to-back stem plots
2 E
3 D
Chapter 2 review
Multiple-choice questions
1 A 2 D 3 B 4 D
5 E 6 B 7 D 8 A
9 C 10 E 11 D 12 E
13 C 14 C 15 C 16 D
17 A 18 E 19 B 20 C
21 E 22 C
Written-response questions
1 a Number of accidents and age; both categorical variables
b RV: Number of accidents; EV: age
c 470
No of accidents   < 30   ≥ 30
At most one 21.7% 42.5%
More than one 78.3% 57.5%
e The statement is correct. Of drivers aged less than 30, 78.3% had more than one accident compared to only 57.5% of drivers in the older category.
2 a Numerical: conversation test score. Categorical: completed weeks of course
b There is an association between the students’ scores on the conversation test and the number of weeks of the course they have completed. The median score at the beginning of the course (M = 38) showed a little improvement after six weeks (M = 42), followed by a very large improvement by the end of the 12-week course (M = 72). The variability of the scores changed little over the course (IQR = 12 at the beginning, IQR = 12 at 6 weeks, IQR = 14 at 12 weeks). The distribution of scores is approximately symmetric with an outlier at 66 at 0 weeks, positively skewed with an outlier at 76 at 6 weeks, and approximately symmetric with no outliers at 12 weeks.
3 a rate is the response variable, experience is the explanatory variable.
b [Scatterplot: rate ($/hr)]
2 C
3 The data is numerical; the association is linear; there are no clear outliers.
4 a [Scatterplot]
b y = 9.23 + 1.00x
4 a 80 cm, extrapolating b 92 cm, interpolating c 98 cm, extrapolating
5 a $487.50, extrapolating b $1023.50, interpolating
6 a 173 cm, reliable, interpolating
e 49.7%: 49.7% of the variation in success rate in putting is explained by the variation in the distance the golfer is from the hole.
14 a yes, linear relationship
c 93.5%
by 0.74 cm for each one centimetre
18 E 19 B 20 A
Exercise 3C
1 a [Scatterplot]
b score = 17.5 − 1.08 × error, r = −0.841, r² = 0.707
2 [Scatterplot: test b against test a]
y = 17.5 + 1.08333x
[Residual plot against test a]
3 a RV: adult weight; EV: birth weight
b [Scatterplot: adult weight against birth weight]
c i strong positive linear association with no outliers
ii approximately 0.9
d adult weight = 38.4 + 5.87 × birth weight, r² = 0.765, r = 0.875
e 76.5% of the variation in the adult weight is explained by the variation in birth weight.
f On average, adult weight increases by 5.9 kg for each additional kilogram of birth weight.
g i 56.0 ii 53.1 iii 61.3
h Yes. 76.5% of the variation in the adult weight is explained by the variation in birth weight.
i [Residual plot against birth weight]
The lack of a clear pattern in the residual plot supports the assumption that the association between adult weight and birth weight is linear.
Chapter 3 review
Multiple-choice questions
1 C 2 D 3 A 4 C
5 E 6 C 7 B 8 B
9 D 10 A 11 A 12 A
13 D 14 E 15 A 16 C
17 C 18 A 19 A 20 D
21 C 22 C 23 D
Written-response questions
1 a i 5 years ii mean = 767, st dev = 35
b airspeed = 673 + 0.372 × number of seats
c 74.1%
2 a days of rain b −6.88, 2850 c 2024
d decrease, 6.88 e −0.696
f 48.4, days of rain g i 1873 ii −483
h interpolation
3 a [Scatterplot: cost]
b There is a strong, positive, linear association between the cost of the meals and the number of meals prepared.
c i $307.30 ii extrapolating
d i 222.48: the fixed costs of preparing meals are $222.48.
ii $4.039: the slope of the regression line predicts that, on average, meal preparation costs increase by $4.039 for each additional meal.
e Answer given in question.
4 a RV: height; EV: femur length
b height = 36.3 + 5.35 × femur length
c On average, height increases by 5.35 cm for each cm increase in femur length.
d r² = 0.988; that is, 98.8% of the variation in height is explained by the variation in femur length.
e 97.6%
5 a RV: height; EV: age
b strong positive association with no outliers.
c Answer given in question.
d i Answer given in question
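Several of these answers use three small calculations: a prediction from a fitted line, a residual (actual − predicted), and recovering r from r², where r takes the sign of the slope. This Python sketch shows all three, using the fitted line and r² values from Exercise 3C (adult weight = 38.4 + 5.87 × birth weight, r² = 0.765; score = 17.5 − 1.08 × error, r² = 0.707). The birth weight of 3.0 kg and the actual adult weight of 58.0 kg are hypothetical inputs, and the function names are also hypothetical.

```python
import math


def predict(intercept, slope, x):
    """Predicted value from the least squares line y = intercept + slope * x."""
    return intercept + slope * x


def residual(actual, predicted):
    """Residual = actual value - predicted value."""
    return actual - predicted


def r_from_r_squared(r_squared, slope):
    """r is the square root of r^2, taking the same sign as the slope."""
    return math.copysign(math.sqrt(r_squared), slope)


print(round(r_from_r_squared(0.765, 5.87), 3))   # 0.875
print(round(r_from_r_squared(0.707, -1.08), 3))  # -0.841

predicted = predict(38.4, 5.87, 3.0)  # hypothetical birth weight of 3.0 kg
print(round(predicted, 2))                   # 56.01
print(round(residual(58.0, predicted), 2))   # 1.99 (hypothetical actual of 58.0 kg)
```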
variation in age.
g i 140.3 cm ii −0.7 cm
h i Answer given in question.
ii Residuals show a clear curved pattern.
6 a moderate, positive linear association with no outliers
b i 142 ii extrapolating
c −6.3
d i linearity
ii the lack of a clear pattern in the residual plot supports the linearity assumption.
Exercise 4A
1 a 19.5 b 11.7 c 23.8 d 126.7
2 a [Plot]
3 a [Plot: y against x²]
b y = 16 − x²
c when x = −2, y = 12
3 a [Plot: y against x²]
b y = 1 + 2x²
c when x = 6, y = 73
a [Plot against x]
b y² = 1.5 + 3.1x
c y = ±5.4, but only the positive solution applies here because the model is only defined for y > 0.
6 a number of people = 0.0 + 4.1 × diameter² b 7
7 a time² = 18 − 9.3 × amount b 3.8 min
8 D 9 A 10 B
Exercise 4B
a [Plot: y against log x]
b y = 1 + 3 log x c 7
a [Plot: y against log x]
b y = 20 − 5 log x c 5
4 a 100 b 218.8 c 1 000 000 d 0.8
5 a [Plot: log y against x]
b log y = 1 + 2x c 158.5
6 a level = 1.8 + 2.6 log(time) to 2 sig. figs
b 2.8 to 1 d.p.
7 a log(number) = 1.314 + 0.08301 × month to 4 sig. figs
b 139 to nearest whole number
8 C 9 E 10 A
Exercise 4C
1 a 13.3 b 2.8 c 4.9 d 309.5
2 a [Scatterplot]
b y = 120/x c 24
3 a 0.17 b 0.07 c 0.16 d 0.06
4 a [Plot: y against x]
b = x
c 4
5 a horsepower = 22.1 +
b 99 to nearest whole number
6 a = −0.00024 + 0.050 × times to 2 sig. figs
b 3 to nearest whole number
7 A 8 E 9 C
Exercise 4D
1 a log y, 1/y, log x, 1/x
b None; trend needs to be consistently increasing or decreasing.
c log y, 1/y, x²
d x², y²
2 a [Scatterplot: potato yield (kg) against plot length (metres)]
b yield = −620.0 + 80.23 × length
c [Residual plot against plot length (metres)]
No, the residuals show a clear curved pattern.
d log y, 1/y, x²
e yield = 3.983 + 2.030 × (length)²
f r² = 97.5%
3 a [Scatterplot: smoking (cigarettes/day) against cost ($/cigarette)]
b smoking = 22.49 − 9.501 × cost
c [Residual plot against cost ($/cigarette)]
No, the residuals show a clear curved pattern.
d log x, log y, 1/y, 1/x
e Either the log x or the 1/x transformation could be recommended, as both give very good results. That is, smoking = 3.420 + … or smoking = 12.73 − 21.90 × log(cost). The 1/x transformation is more intuitive and easier to interpret.
f 1/x: r² = 99.3%; log x: r² = 99.6%
4 a [Scatterplot: density (people/hectare) against distance (km)]
b density = 345.3 − 18.65 × distance
c [Residual plot against distance (km)]
No, the residuals show a clear curved pattern.
d x², y²
e density = 308.9 − 1.345 × (distance)²
f r² = 99.1%
Chapter 4 review
Multiple-choice questions
1 A 2 D 3 D 4 B
5 A 6 B 7 E 8 D
9 D 10 D 11 D
Written-response questions
1 a 1/age = 2.606 − 1.053 × length
b 2.6 years
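A log x transformation works by fitting an ordinary least squares line with log₁₀x in place of x. A log y model is used by back-transforming: y = 10^(a + bx). This Python sketch illustrates both. The four data points are hypothetical and were generated from y = 1 + 3 log x (the form of the Exercise 4B answer). The input x = 0.6 for the log y = 1 + 2x model was chosen here for illustration. The function name is hypothetical.

```python
import math


def fit_log_x(xs, ys):
    """Least squares fit of y = a + b * log10(x); returns (a, b)."""
    lx = [math.log10(x) for x in xs]
    n = len(lx)
    mean_u = sum(lx) / n
    mean_y = sum(ys) / n
    sxy = sum((u - mean_u) * (y - mean_y) for u, y in zip(lx, ys))
    sxx = sum((u - mean_u) ** 2 for u in lx)
    b = sxy / sxx
    return mean_y - b * mean_u, b


xs = [1, 10, 100, 1000]  # hypothetical data generated from y = 1 + 3 log x
ys = [1, 4, 7, 10]
a, b = fit_log_x(xs, ys)
print(round(a, 6), round(b, 6))             # 1.0 3.0
print(round(a + b * math.log10(100), 6))    # 7.0

# Back-transforming a log y model: log y = 1 + 2x  =>  y = 10 ** (1 + 2x)
print(round(10 ** (1 + 2 * 0.6), 1))        # 158.5
```

A reciprocal (1/x) transformation follows the same pattern, with `1 / x` in place of `math.log10(x)`.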
2 a literacy rate = −44.2 + 33.3 log(GDP)
b [Residual plot]
Residual plot shows no clear pattern
c 89%
d 0.077
3 a [Scatterplot: Distance (metres) against Time (seconds)]
b
Time 0 1 2 3 4 5 6
Distance 0 5.2 18 42 79 128 168
Time² 0 1 4 9 16 25 36
c [Scatterplot: distance against (time)²]
d distance = 0.45 + 4.8 × time²
e 236 metres
f [Residual plot against (time)²]
The residual plot shows no clear structure, indicating that the assumption of linearity is justified.
4 a [Scatterplot]
b [Scatterplot]
c mortality = −1.194 + 3856 × 1/doctors
d [Residual plot]
The residual plot shows no clear structure, indicating that the assumption of linearity is justified.
e r² = 82.8%
f 37
g Since 100 is within the range of the data, we are interpolating, and the prediction is reliable.
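The squared transformation in written-response question 3 can be reproduced by regressing distance on time² using the table in part b. This Python sketch uses a hand-written least squares routine (the name `least_squares` is hypothetical) and recovers the fitted line distance = 0.45 + 4.8 × time². The prediction at time = 7 seconds is included only as an illustration; it gives about 236 metres.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


time = [0, 1, 2, 3, 4, 5, 6]
distance = [0, 5.2, 18, 42, 79, 128, 168]
time_squared = [t ** 2 for t in time]   # apply the squared transformation to x

intercept, slope = least_squares(time_squared, distance)
print(round(intercept, 2), round(slope, 1))    # 0.45 4.8
print(round(intercept + slope * 7 ** 2))       # 236
```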
Chapter 5
Exercise 5A
1 [Time series plot, 2014 to 2022]
2 [Time series plot, Jan to Dec]
3 [Time series plot, Mon to Sun]
4
Feature A B C
Irreg fluct X X X
Inc X
Dec X
Feature A B C
Irreg fluct X X X
Inc trend X
Dec trend X
Cycles X
Seasonality X X
6
Feature A B C
Irreg fluct X X
Struct change X
Inc trend X
Dec trend X
Seasonality X
7
Feature A B C
Inc trend X
Dec trend
Outlier X
8 The number of mobile phones per 100 people increased rapidly over the years 2000–2008. The number continued to increase from 2009 until 2019, but at a much lower rate than in the preceding years.
9 a [Time series plot: Population of Australia (millions), 2010 to 2022]
b The plot shows a steady increase in the population of Australia over the years 2012–2021.
10 a [Time series plot: Theft rate (per 100,000 cars), 2002 to 2020]
b The plot shows a steady decline in the number of vehicle thefts over the years 2003–2010, after which the number of vehicle thefts has remained reasonably steady, showing only irregular variation.
11 The number of cases of measles shows an increasing trend between 1989 and 1992. In 1993–1994 there is a rapid increase in the number of measles cases, followed by a rapid decrease in 1994–1995. The number of cases continued to decrease until 2000 and since then has remained low, showing only irregular variation over the years 2001–2019.
12 The number of overseas arrivals (millions of people per month) in Australia increased steadily from November 2011 until April 2020. The number of arrivals is clearly seasonal, with the peak time for arrivals in the January quarter each year. The number of arrivals dropped suddenly to almost zero in April 2020, and remained at this level until October 2021.
13 a i The percentage of males who smoke has consistently decreased since 1945, while the percentage of females who smoke increased from 1945 to 1975 but then decreased at a similar rate to males over the period 1975–1992.
ii The difference in smoking rates between males and females has
b i [Time series plot, 1998 to 2020]
ii Whilst both plots show irregular fluctuation, overall the percentages of males and females who smoke have declined substantially over the years 2000–2018.
iii The difference in smoking rates between males and females has remained almost the same over these years.
14 E
15 D
Exercise 5B
1 a i 3 ii 1 iii 4
b i 3.2 ii 1.2 iii 2.2
c i 2.6 ii 2.0
d 2.3
2 a 24.4 b 20.0 c 23.2
3
t 1 2 3 4 5 6 7 8 9
y 10 12 8 4 12 8 10 18 2
3-mean − 10 8 8 8 10 12 10 −
5-mean − − 9.2 8.8 8.4 10.4 10 − −
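The 3-mean and 5-mean rows in question 3 come from centred moving means with an odd number of points. Each smoothed value is the average of the value and its neighbours, and no smoothed value exists at the ends. This minimal Python sketch reproduces both rows from the y values in the table. The function name is hypothetical, and `None` marks the positions shown as "−".

```python
def centred_moving_mean(values, window):
    """Centred moving mean for an odd window; None where it is undefined."""
    if window % 2 == 0:
        raise ValueError("window must be odd for direct centring")
    half = window // 2
    result = [None] * len(values)
    for i in range(half, len(values) - half):
        result[i] = sum(values[i - half:i + half + 1]) / window
    return result


y = [10, 12, 8, 4, 12, 8, 10, 18, 2]   # y values from question 3
print(centred_moving_mean(y, 3))  # [None, 10.0, 8.0, 8.0, 8.0, 10.0, 12.0, 10.0, None]
print(centred_moving_mean(y, 5))  # [None, None, 9.2, 8.8, 8.4, 10.4, 10.0, None, None]
```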
[Time series plot against day (0 to 10): raw data, 3-mean smoothed, 5-mean smoothed]
The smoothed plots show that the ‘average’ maximum temperature changes relatively slowly over the 10-day period (the 5-day average varies by only 5°) when compared to the daily maximum, which can vary quite widely (for example, nearly 20° between the fourth and fifth day) over the same period of time.
Day Temp. (°C) 3-moving mean 5-moving mean
1 24 − −
2 27 26.3 −
3 28 31.7 28.2
4 40 30.0 28.0
5 22 28.3 27.0
6 23 22.3 25.6
7 22 22.0 22.6
8 21 22.7 23.4
[Time series plot: exchange rate against day (0 to 10)]
The exchange rate has a downward trend over the 10-day period. This is most obvious from the smoothed plots, particularly the 5-moving mean plot.
b
Day Exchange rate 3-moving mean 5-moving mean
1 0.743 − −
2 0.754 0.745 −
3 0.737 0.747 0.742
4 0.751 0.737 0.738
5 0.724 0.733 0.730
6 0.724 0.720 0.729
7 0.712 0.724 0.722
8 0.735 0.721 0.720
9 0.716 0.721 −
10 0.711 − −
6 a 3.8 b 2.0
7 a 3.3 b 1.5 c 2.4 d 1.9
8 a 13.1 b 12.2 c 10.7
9 a, c [Time series plot: number of complaints and 2-moving mean, Jan to Dec]
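When smoothing uses an even number of points, as with the 2-moving mean plotted for question 9, each raw mean falls between two time periods. A common approach is to centre the result by averaging each pair of adjacent raw means, so every smoothed value lines up with an actual time period. This Python sketch assumes that centring approach. The function name and both data sets are hypothetical.

```python
def centred_even_moving_mean(values, window):
    """Centred moving mean for an even window; None where it is undefined.

    Raw window-point means sit halfway between periods, so each adjacent
    pair of raw means is averaged to centre the result on a real period.
    """
    if window % 2 != 0:
        raise ValueError("window must be even")
    raw = [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]
    half = window // 2
    centred = [None] * len(values)
    for i in range(len(raw) - 1):
        centred[i + half] = (raw[i] + raw[i + 1]) / 2
    return centred


print(centred_even_moving_mean([4, 8, 6, 10, 8], 2))      # [None, 6.5, 7.5, 8.5, None]
print(centred_even_moving_mean([1, 2, 3, 4, 5, 6], 4))    # [None, None, 3.0, 4.0, None, None]
```

For a 2-point window, each centred value works out to (previous + 2 × current + next) ÷ 4.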
10 a, c [Time series plot with four-mean smoothing]
Four-mean smoothing of the plot shows a steady increase in rainfall from June to
[Time series plot: GDP growth by year, with 3-median and 5-median smoothing]
11 b [Time series plot, Mon to Sun over two weeks]
12 a, d [Time series plot: job vacancies and deseasonalised job vacancies, months 1 to 12]
155 157 150 134 153 134 150
150 154 190 148 143 150 157
Exercise 5D
lower than sales in the average month.
2 a 7.8 b 6.7
3 a 3.9 b 6.9
4 a Increase by 42.9%. b Decrease by 23.1%.
5 a 1.2 b 1514 c 1437 d 1005
6
Sum Aut Win Spr
Number of students: 56 125 126 96
b There was a general decreasing trend in the percentage of retail sales made in department stores.
c sales = 12.5 − 0.258 × year
The percentage of total retail sales that are made in department stores is decreasing by 0.258% per year.
d 8.6%
3 a age = −147 + 0.0882 × year; on average, the average age of mothers increased by 0.0882 years (equivalent to 1 month) each year between 2010 and 2020.
b 32.0 years; unreliable as we are extrapolating 10 years beyond the period in which the data were collected.
4 a earnings = −83 280 + 42.07 × year; on average, average weekly earnings increased by $42.07 each year between 2014 and 2021.
b $2122.10; unreliable as we are extrapolating 9 years beyond the period in
5 a deseasonalised number = 50.9 + 1.59 × quarter number
b deseasonalised number = 76.34; reseasonalised (actual) number = 90 (to the nearest whole number)
6 a
Year Q1 Q2 Q3 Q4
1 122 128 118 130
2 250 245 263 236
b [Time series plot: sales]
The deseasonalised sales appear to show an increasing trend over time.
c deseasonalised sales = 80.8 + 23.5 × quarter
d forecasted actual sales = 386.3 × 1.13 = 437
7 C 8 E
Chapter 5 review
Multiple-choice questions
1 E 2 E 3 A 4 B 5 E 6 C
7 D 8 D 9 C 10 D 11 B 12 C
13 B 14 A 15 E 16 D 17 D 18 A
19 B 20 E
Written-response questions
1 a [Time series plot: Carbon dioxide emissions, 2008 to 2019]
b Carbon dioxide emissions decreased between 2009 and 2014, then remained reasonably steady over the years 2014–2017, showing only irregular fluctuations. Between 2017 and 2018 there was a small decrease in carbon dioxide emissions.
c CO2 emissions = 612.0 − 0.2958 × year
d 12.7
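The forecast in question 6 d combines a trend line fitted to deseasonalised data with a seasonal index. Deseasonalised value = actual ÷ seasonal index, so the forecast is reseasonalised by multiplying by the index. This Python sketch assumes quarter 13 as the input to the trend line in 6 c, because that value reproduces 386.3. The seasonal index 1.13 comes from 6 d. The function names are hypothetical.

```python
def deseasonalise(actual, seasonal_index):
    """Deseasonalised value = actual value / seasonal index."""
    return actual / seasonal_index


def reseasonalise(deseasonalised, seasonal_index):
    """Actual (reseasonalised) value = deseasonalised value * seasonal index."""
    return deseasonalised * seasonal_index


def deseasonalised_trend(quarter):
    """Trend line fitted to the deseasonalised sales (question 6 c)."""
    return 80.8 + 23.5 * quarter


forecast_deseasonalised = deseasonalised_trend(13)   # quarter 13 (assumed)
forecast_actual = reseasonalise(forecast_deseasonalised, 1.13)
print(round(forecast_deseasonalised, 1), round(forecast_actual))   # 386.3 437

# Seasonal indices over a full cycle average 1, so quarterly indices sum to 4
print(round(sum([0.29, 0.36, 1.37, 1.98]), 10))   # 4.0
```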
a ii, b ii
[Time series plot: inflation (%), 2009 to 2019]
c The trend lines are parallel. As such, they will never cross, so the inflation rate for China will remain higher than the inflation rate for Australia.
d 1.7
3 a
Sum Aut Win Spr
SI 0.29 0.36 1.37 1.98
b
Sum Aut Win Spr
Deseas 269 239 255 273
Chapter 6
Exercise 6A
1 A 2 B 3 B 4 B
5 D 6 B 7 D 8 E
9 E 10 E 11 D 12 A
13 D 14 A 15 A 16 E
17 C 18 B 19 B 20 B
21 C 22 E 23 D 24 C
25 A 26 D
Exercise 6B
1 E 2 A 3 E 4 C
5 C 6 B 7 E 8 B
9 B 10 E 11 B 12 E
13 D 14 B 15 E 16 D
17 A 18 D
Exercise 6C
1 C 2 C 3 A 4 D
5 B 6 B 7 E 8 B
9 D 10 A 11 E 12 E
13 B 14 A 15 E 16 C
17 D
Exercise 6D
1 A 2 A 3 A 4 B
5 E 6 B 7 A 8 B
9 D 10 E 11 D 12 E
13 E 14 B 15 B 16 D
17 A
Exercise 6E
1 a age, distance
b mean = 7.17 km, sd = 3.46 km
c z = 1.7
d
Study mode Female Male
On campus 3 3
Online 4 2
Total 7 5
e i 60%
ii Yes, there is an association between study mode preference and course. A higher percentage of Business students chose to study online (60%), compared to only 36% of students in both Health and Social Science.
2 a The distribution of distance is positively skewed, with outliers at 17 km, 18 km and 19 km.
b [Figure]
c i Lower fence = −2, Upper fence =
ii A distance of 1 km is within the
d i 1 km ii 1.5 km
3 a On average, height increases by 0.815 cm for each additional 1 cm increase in arm span.
b i Females: r2 = 64.6% c i Slope = $1525.80. On average the
ii Males: r2 = 69.9% price of bitcoin is increasing by
iii Since the value of the coefficient of $1525.80 each month.
determination for males (69.9%) is ii $92 837
higher than the value for females iii Unreliable as we are extrapolating
(64.6%), then we can say that arm several years beyond the period in
span is a better predictor of height which the data were collected.
for males than for females.
c i The models predict that when both Chapter 7
have arm span measurements of
160 cm, a male will be 1.8 cm Exercise 7A
taller than a female.
1 a 2, 8, 14, 20, 26 b 5, 2, −1, −4, −7
ii The models predict that when both c 1, 4, 16, 64, 256 d 64, 32, 16, 8, 4
have arm span measurements of 2 a 6, 14, 30, 62, 126
190 cm, a female will be 4.6 cm b 24, 16, 12, 10, 9 c 1, 2, 5, 14, 41
taller than a male. d 124, 60, 28, 12, 4
iii The differences predicted are 3 a 4, 6, 8, 10, 12 b 24, 20, 16, 12, 8
not reliable for a height of 160 c 2, 6, 18, 54, 162 d 50, 10, 2, 0.4, 0.08
cm as this value is outside the e 5, 13, 29, 61, 125
range of height data for males. f 18, 16.4, 15.12, 14.096, 13.2768
The prediction is not reliable for a 4 a 2, 5, 8, 11, 14 b 50, 45, 40, 35, 30
height of 190 cm as this value is c 1, 3, 9, 27, 81 d 3, −6, 12, −24, 48
outside the range of height data for e 5, 9, 17, 33, 65 f 2, 7, 17, 37, 77
females. g −2, −1, 2, 11, 38
4 a There is a moderate strength, non- h −10, 35, −100, 305, −910
linear association between expenditure 5 a 12, 57, 327, 1947, 11667
and score. b 20, 85, 280, 865, 2620
b y2 , log x, c 2, 11, 47, 191, 764
c i The linearity assumption. d 64, 15, 2.75, −0.3125, −1.078125
ii No, there is a clear structure in e 48000, 45000, 42000, 39000, 36000
the residual plot. If the linearity f 25000, 21950, 19205, 16734.50, 14511.05
assumption had been met the residuals 6 a A2 = 6 b B4 = 2 c C3 = 27
would have been randomly d D5 = 95
scattered around a horizontal line at 7 a V0 = 4, Vn+1 = Vn + 2
y = 0. b V0 = 24, Vn+1 = Vn − 4
d i score = 12.99 + 120.6 × c V0 = 2, Vn+1 = 3Vn
log(expenditure) 8 a V0 = 5, Vn+1 = Vn + 5
ii 495 b V0 = 13, Vn+1 = Vn − 4
5 a i $12 000 c V0 = 1, Vn+1 = 4Vn
ii $11 000 d V0 = 64, Vn+1 = 0.5Vn
b $52 208.29
ISBN 978-1-009-11041-9 © Peter Jones et al 2023 Cambridge University Press
Photocopying is restricted under law and this material must not be transferred to another party.
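The recurrence relations in Exercise 7A (for example, V0 = 4, Vn+1 = Vn + 2) can be checked by generating terms mechanically. The following is a minimal Python sketch; the function name `first_terms` and the use of a lambda to express each rule are illustrative choices, not part of the text.

```python
from typing import Callable, List


def first_terms(v0: float, rule: Callable[[float], float], count: int = 5) -> List[float]:
    """Return the first `count` terms V0, V1, ... of the recurrence Vn+1 = rule(Vn)."""
    terms = [v0]
    while len(terms) < count:
        terms.append(rule(terms[-1]))  # apply the rule to the most recent term
    return terms


# Rules as printed in the Exercise 7A, Question 7 answers
print(first_terms(4, lambda v: v + 2))   # [4, 6, 8, 10, 12]
print(first_terms(24, lambda v: v - 4))  # [24, 20, 16, 12, 8]
print(first_terms(2, lambda v: 3 * v))   # [2, 6, 18, 54, 162]
```

These match the sequences listed for Question 3 a to c.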
804 Answers
10 b $684
c 14 years
12 a $32 600
6 b M0 = 32 600, Mn+1 = Mn − 10
13 a 450, 449.95, 449.90, 449.85, 449.80
b $449
2 14 a $47 800, $47 600, $47 400
b $45 000 c 25 000 km
0 1 2 3 15 a 7200 b 72 000 c $720 d 10%
b Vn 16 C 17 C 18 B
Exercise 7C
1 a An = 4 + 2n, A20 = 44
30 b An = 10 − 3n, A20 = −50
20 c An = 5 + 8n, A20 = 165
d An = 300 − 18n, A20 = −60
2 a 5000 b $270 c Vn = 5000 − 270n
n d $7430
0 1 2 3
3 a 12 000 b $864
2 a V0 = 8000 b $320
c Vn = 12 000 + 864n d $19 776
c V0 = 8000, Vn+1 = Vn + 320
4 a $8000 b $512
3 a H0 = 41 000 b $2542
c i $14 144 ii 16 years
c H0 = 41 000, Hn+1 = Hn + 2542
5 a $2000 b $70
4 a V0 = 2000 b 14 years
c i $2420 ii 29 years
V1 = 2000 + 76 = 2076 6 a $5600 b $1260 c Vn = 5600 − 1260n
V2 = 2076 + 76 = 2152 d $1820
V3 = 2152 + 76 = 2228 7 a $7000 b $1225 c Vn = 7000 − 1225n
5 a $7518, $8036, $8554 b 6 years d 5 years
6 a i $15 000 ii $525 iii 3.5% 8 a $1700 b $212.50 c $850
b 29 years d $212.50 e 8 years
7 a $12 300 9 a $65 000 b $3250 c 5%
b C0 = 82 000, Cn+1 = Cn − 12 300 d $42 250 e 11 years
10 a $29 000 b $0.25 (25 cents)
c $24 000 d 96 000 km
11 a $9700 b $0.388 per km 2 a V0 = 6000
c Vn = 35 400 − 0.388n V1 = 1.042 × 6000 = 6252
d 74 000 km
V2 = 1.042 × 6252 = 6514.58
12 a i $0.026875 ii $69 687.50
V3 = 1.042 × 6514.58 = 6788.20
iii $20 156.25
b 7 years
b $9218.75
3 a V0 = 20 000
c 1 486 400
V1 = 1.063 × 20 000 = 21 260
13 D 14 C
V2 = 1.063 × 21 260 = 22 599.38
Exercise 7D V3 = 1.063 × 22 599.38 = 24 023.14
b 7 years
1 a 2, 4, 8, 16, 32 [Graph: terms plotted against n]
b 3, 9, 27, 81, 243 [Graph: terms plotted against n]
c 100, 10, 1, 0.1, 0.01 [Graph: terms plotted against n]
4 a $5000 b 1.068
c V0 = 5000, Vn+1 = 1.068 × Vn
d $6947.46 e $1947.46
5 a $18 000 b 1.094
c V0 = 18 000, Vn+1 = 1.094Vn
d $25 783.50 e 4 years
6 V0 = 9800, Vn+1 = 0.965Vn
7 M0 = 28 600, Mn+1 = 0.926Mn
8 a V0 = 18 000, Vn+1 = 0.955Vn
b $17 190, $16 416.45, $15 677.71, $14 972.21, $14 298.46
c $15 677.71 d $3701.54
9 a W0 = 4000, Wn+1 = 0.959Wn
b $3527.90 c 755.46
10 a S0 = 13 420, Sn+1 = 0.888Sn
b $11 916.96, $10 582.26, $9397.05, $8344.58, $7409.99
c $7409.99 d $1185.21
11 C 12 E 13 E 14 C
Exercise 7E
1 a Vn = 2ⁿ × 6, V4 = 96
b Vn = 3ⁿ × 10, V4 = 810
c Vn = 0.5ⁿ × 1, V4 = 0.0625
d Vn = 0.25ⁿ × 80, V4 = 0.3125
2 a i 3000 ii 10%
b Vn = 1.1ⁿ × 3000
c $4831.53
1 C 2 E 3 D 4 C 5 A
6 a Vn = 0.905ⁿ × 38 500
6 A 7 D 8 A 9 B 10 A
b $23 372.42 c $15 127.58 11 D 12 C 13 C 14 B 15 A
7 6 years 8 100 years 94 16 B 17 C 18 D 19 B 20 C
10 6% 11 $9223.52 12 $32 397.18
13 C 14 E
Written-response questions
6 a V0 = 7600, Vn+1 = 1.005 × Vn 10000
b Vn = 1.005ⁿ × 7600
c $7791.91 d 139 months
O 1 2 3 4 5
7 a V0 = 3500, Vn+1 = 1.02 × Vn n
b $3788.51 4 a $0.20
8 a 4.68% b 4.70% c Monthly b Let Vn be the value of the vacuum
9 a 8.25% b 8.24% c Monthly cleaner after cleaning n offices.
10 a A – 8.62%, B – 8.11% V0 = 650, Vn+1 = Vn − 0.20
b A – $3018.10, B – $2837.08 c $250
c B – this loan will be charged less interest 5 a $6575 b $6722.75 c 6.9%
11 a A – 5.43%, B – 5.61% 6 a V0 = 30 000, Vn+1 = 1.0075 Vn
b A – $7603, B – $7860 b $31 142.00
c B – this investment will earn more interest c $32 814.21 d $34 318.81
7 $234.57 10 8%
8 3.5 11 10%
12 C 13 B 14 C
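The explicit rules used in Exercises 7C and 7E (Vn = V0 + nD for linear change, Vn = Rⁿ × V0 for geometric change) and the reducing-balance values in Exercise 7D can be cross-checked numerically. The Python sketch below is illustrative only: the function names are hypothetical, and carrying the unrounded value forward while rounding each displayed value to the cent is an assumption about how the printed answers were computed (it does reproduce the 7D Question 8 b values).

```python
def linear_term(v0: float, d: float, n: int) -> float:
    """Explicit rule for linear growth or decay: Vn = V0 + n * D (D < 0 for decay)."""
    return v0 + n * d


def geometric_term(v0: float, r: float, n: int) -> float:
    """Explicit rule for geometric growth or decay: Vn = R**n * V0."""
    return r ** n * v0


def reducing_balance(v0: float, r: float, years: int) -> list:
    """Values V1..V(years) from Vn+1 = R * Vn, each rounded to the nearest cent.

    The unrounded value is carried forward, so rounding errors do not accumulate.
    """
    values, v = [], v0
    for _ in range(years):
        v = r * v
        values.append(round(v, 2))
    return values


print(linear_term(4, 2, 20))                   # 7C 1a: A20 = 44
print(round(geometric_term(3000, 1.1, 5), 2))  # 7E 2c: 4831.53
print(reducing_balance(18000, 0.955, 5))       # 7D 8b: 17190.0, 16416.45, ...
```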
Chapter 8 Exercise 8B
Exercise 8A
2 a V0 = 2000, D = 339
1 a 2, 5, 11, 23, 47
b R = 1.005
b 50, 90, 170, 330, 650
c V0 = 2000, Vn+1 = 1.005Vn − 339
c 128, 96, 80, 72, 68
3 a B0 = 10 000, Bn+1 = 1.03Bn − 2600
2 a $500 b $100 c 1.03
b $5331
d V0 = 500, Vn+1 = 1.03Vn + 100
4 V0 = 3500, Vn+1 = 1.004Vn − 280
3 a $300 000 b $50 000
5 V0 = 150 000, Vn+1 = 1.0014Vn − 650
c 1.052
6 a V0 = 235 000, Vn+1 = 1.0001Vn − 150
d V0 = 300 000, Vn+1 = 1.052Vn + 50 000
b $234 620.46
4 a 1.003
7 a $2500 b $626 c 8%
b V0 = 3500, Vn+1 = 1.003Vn + 150
d $1117.03
c $3821.48
8 a $5000 b $865 c r = 12%
5 a V0 = 1700, Vn+1 = 1.008Vn + 100
d $3361.85
b $2395.38
9 a V0 = 20 000, D = 3375
6 V0 = 1500, Vn+1 = 1.0002Vn + 4
b R = 1.072
7 V0 = 24000, Vn+1 = 1.005Vn + 500,
c V0 = 20 000, Vn+1 = 1.072Vn − 3375
$27 766.81
10 a V0 = 750 000, D = 4100
8 a $2000 b $1000 c $4412.80
b R = 1.0045
d Vn ($)
c V0 = 750 000, Vn+1 = 1.0045Vn − 4100
4500 11 a V0 = 40 000, Vn+1 = 1.015Vn − 10 380
3500 b $10 217.70
12 a $5000 b $1030 c 12%
2000 d $2030.50 e $3090
1000 13 a $3052.65 b $6000
14 a $1 000 000 b $400 c 2.88%
n (years)
0 1 2 d $996 796.16
9 a $20 000 b $2000
15 a $18 400 b 6.6%
c $27 689.06
c $9762.84
d Vn ($) 16 D 17 E 18 A
n (quarters)
0 1 2 3
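The Chapter 8 answers apply the recurrence Vn+1 = RVn + D, with D positive for an annuity investment (Exercise 8A) and negative for a reducing-balance loan (Exercise 8B). A minimal Python sketch follows; the function name is hypothetical, and the number of periods used in each check (2 or 6) is inferred by matching the printed answers, since the questions themselves are not reproduced here.

```python
def annuity_balance(v0: float, r: float, d: float, periods: int) -> float:
    """Apply Vn+1 = R * Vn + D the given number of times, starting from V0.

    D > 0 models regular additions (an annuity investment);
    D < 0 models regular payments (a reducing-balance loan).
    """
    v = v0
    for _ in range(periods):
        v = r * v + d
    return v


print(round(annuity_balance(10_000, 1.03, -2600, 2), 2))  # 8B 3b: $5331
print(round(annuity_balance(1700, 1.008, 100, 6), 2))     # 8A 5b: $2395.38
print(round(annuity_balance(24_000, 1.005, 500, 6), 2))   # 8A 7: $27 766.81
```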
Exercise 8C
d $13 740
e Payment number Payment Interest Principal reduction Balance
0 0.00 0.00 0.00 14 000.00
1 1800.00 1540.00 260.00 13 740
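The first row of this amortisation table implies an interest rate of 1540 ÷ 14 000 = 11% per payment period. Under that inferred rate, the hypothetical Python generator below rebuilds the table row by row; rounding each entry to the cent is an assumption about how the table was produced.

```python
def amortisation_rows(balance: float, rate: float, payment: float, count: int):
    """Yield (payment number, payment, interest, principal reduction, balance) rows.

    rate is the interest rate per payment period as a decimal (0.11 for 11%).
    """
    for k in range(1, count + 1):
        interest = round(balance * rate, 2)        # interest charged this period
        principal = round(payment - interest, 2)   # part of the payment that reduces the debt
        balance = round(balance - principal, 2)    # balance after this payment
        yield k, payment, interest, principal, balance


for row in amortisation_rows(14_000, 0.11, 1800, 1):
    print(row)  # (1, 1800, 1540.0, 260.0, 13740.0)
```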
9 D
10 D 11 D
Exercise 8D c $6061.91
4 a $21 867.22 b $12 095.13
1 $554.16
5 a $225 788.13 b $5452.89
2 $1262.60
6 a $34 093.96 b $344.64
3 $692.58
7 a Negative b $28 674
4 $771.27
c $6825.74
5 a 180.53 b 1380.53 c $624.67
8 a $15 133.81 b $1732.49
6 a A = 345.69, B = 1.72, C = 343.97
9 a $7627.37 b $298.51
b $4131.23 c $131.23
10 E 11 E 12 B
7 a A = 3903.19, B = 34.82, C = 3868.37
b $31 227.69 c $1227.69
Exercise 8F
8 $12 165.50, $165.50
9 1 8.39%
2 a 2.7% b $741.19
c 60 months
250 3 a $500
Amount ($)
Exercise 8G 12 3.6%
13 4.8%
1 $46 615.21
14 a $1312 b $78 720 c $86 400
2 $178 558.60
d $1440 e 5.4%
3 a $81 939.67 b $67 141.09
15 C 16 B 17 D
4 $416.37
5 $338 807.90
Exercise 8I
6 a $5312.50 b $6500.67
Chapter 10
d i $1000 ii $12 000
iii $250 000
2 a $781.25 b $147 298.48 Exercise 10A
c 38 months
d i 41 payments ii $3323.07 1 a 2×3 b 1×3 c 3×2 d 3×1
e 3×3
3 a $656.65 b $13 134
c $3134 2 a 2×3 b 4×1 c 1×3
4 a 40 months b $320.78 3 a 12 b 15 c 28
c $4770.78 4 1 × 12, 12 × 1, 6 × 2, 2 × 6, 4 × 3, 3 × 4
9 8
5 a $247.04 b $83 713.37
1 0 1 9
6 a $1 175 244.58 b 290 months 5 a b 3 5 c
2 3 0 1
c $3300
7 5
6 a square; 2 × 2; 4 b column 3 × 1; 3
c row; 1 × 4; 4
Chapter 9
1 3 5
1 0 0
7 a 0 4 7 b 0 1 0
Exercise 9A
0 0 2 0 0 1
1 E 2 C 3 C 4 E 5 C 6 D
1 0 0
7 E 8 D 9 A 10 D 11 D 12 D 3 0
c 0 1 0and
13 E 14 E 15 D
0 5
0 0 1
d First 3 are symmetric.
Exercise 9B
8 a C, E b 3 c A d B
1 a $8500 b $222.44 e 4, 2 f 3, 3 g 1, 5 h 3, 1
c A0 = 8500, An+1 = 1.013 × An i 4, 2 j 9 k 5 l 0
d 5.2% e 13 quarters m 1 n 0 o 4 p −1
2 a V0 = 25 000, q 3 r 3 s 1
V1 = 25 000 − 936 = 24 064,
1 2 10 3 11
V2 = 24 064 − 936 = 23 128, −2 −5
V3 = 23 128 − 936 = 22 192
2 4 4 −1 −4
b 36
3 6
0 −3
3 a $260 000
b × 100 = 5.4% 12
4 9
260 000 2 5 10
c A = 1156.71, B = 993.29, C = 256 053.46
9 16
4 a 204 b $29516.73 c The first
50 14 a b c
5 a $4400 b = $2.50 1 2 4 −2 1
c Hn = 4450 − 2.50n 0 −2 −4 6 −1
d $3950
e 581 3 1 2
6 a $5520 b 22 months
2 3
15 C 16 D 17 B 18 C 6
21 5 5
19 B
2 3
1 1
Exercise 10B
14 8 6
1 a
0 1 2
4 2 1
6 2 3 , 3 × 3 7 a b
0 1 0 0 1 0 1
d i 4 4 ii 8 a
1 1
16 104 86 24 124 100
1 1 A = B =
75 34 94 70 41 96
iii iv −2 2
0 0
40 228 186
b C =
0 0 0 ; the total
145 75 190
v vi 9 3
1 number of females and males enrolled
3 in each of the three programs for the
vii viii two years
0 4 16 0 4 16
8 20 14
c D = ; the increase in the
12 8 4 12 8 4
−5 7 2
ix −2 10 x not defined number of females and males enrolled
4 a b in each of the three programs for the
5 5 −3 −1
two years; a decrease in the number of
5 5 3 1
men enrolled in weights classes
c d e
0 0 48 248 200
9 8 1 d E =
6 7 1 140 82 192
f g h not defined 9 C 10 C 11 E
2 −2
3 3
5 a b Exercise 10D
−2.2 1.1 −0.2 −13.8
1 a i, ii, iv, v, vi, vii
7.7 4.4 1 −3.7
5 8 7 d 0.6 2
1 3.2 b i [6] ii [2] iii iv −3 7
16 0 3 1 0 −0.6 2
c i [6] ii iii [0]
−1 5 4
2 0 −2
6 x = 2, y = 4, z = 6, w = −4
1 0 −1
7 a
0 0 0
3.4 iv undefined
[0] b [1] c [3]
A = 3.5 B = 3.4 C = 2.6 D = 4.1 2 a
1.6 1.8 1.7 2.1 3
5 5 3
1 2 0
13.6; the total (yearly) DVD sales
for each store 3 a
1 2 3
6 7
−3 −6 −9
19 18
5 10 15
c d 110 000
4 0 −5 15 9
116 000
2 −2 −2 8 4
11 XY = It gives the total sales of
154 000
1 4 −5 8 7
4 a post-multiply by this matrix. 58 000
1 each
of the dealers.
1 29
pre-multiply by this matrix. 12 a , John took 29 minutes to eat
1 1 1
food costing $8.50
7 1 2 1 10
29 22 12
5 1 2 2 1 = 5 . b ,
8.50 8.00 3.00
8 1 4 1 13
John’s friends took 22 and 12 minutes
9 0 2 to eat food costing $8.00 and $3.00
6 1 1 1 1 7 3 = 18 10 9 respectively
8 3 4
79 78 80
7 a b
13 a b 0.3
22 8 10 6 14 11 80 78 82
c Semester 1: 79.2; Semester 2: 80.4
d Semester 1: 83.8; Semester 2: 75.2
12 e No, total score is 318.6
30 f 3 marks
14 5 5 15 20 50 75 , 2250 3625
8 9
9 3000 , ,
5 10 20 35 75 125 3625 5875
7 2800
15 5 7 2 19 17 40 , 23 97
, ,
7 26 19 59 40 137 97 314
4 2200
24 30 36
38 59 64
33 54 51
10 a 2 × 3
b i 17 a b
−1 5 −3 8
5 2 6 −3
ii the total revenue from selling c d
17 8
16 2
products A, B and C at Energy and
8 17
2 39
Nourishing respectively e
29 −5
c number of columns in P , number of
−5 13
rows in Q
18 A 19 C 20 D
Exercise 10E f
1 a i ii 10
1 0 1 0 0
−1 2
0 1 0 1 0
−3 5
0 0 1
iii 7 a X = A−1C
1 0 0 0
b X = (AB)−1C = B−1 A−1C
0 1 0 0
c X = A−1CB−1 d X = A−1C − B
0 0 1 0
e X = A−1 (C − B)
0 0 0 1 f X = (A − B)A−1 = I − BA−1
8 x = −5000, y = 15 000, z = 0
1 2 1 0 1 2
b AI =
= ;
0 3 0 1
0 3 9
Spray P Q R
1 0 1 2 1 2 8
46 12
IA = = Barrels
0 1 0 3
0 3 13
39 13
∴ AI = IA = A 10 a
0.1 0.25 −0.4
0.3 −0.75 0.8
1 2 0 1 0 0 1 2 0 −0.2 0.5 −0.2
CI = 3 1 0 = 3
1 0 0 1 0 ; b
Product P Q R
0 1 2 0 0 1 0 1 2 Number per day 13.5 0.5 13
1 0 0 1 2 0 1 2 0
11 Brad 20; Flynn 10; Lina 15
IC = 0 1 0 = 3
1 0 3 1 0
12 A 13 E 14 D
0 0 1 0 1 2 0 1 2
CI = IC = C Exercise 10F
3 a 3 b −3 c 0 d −8
1 B only
4 a 10 2 b 20
11 − 3 2 a X= H U T S b n=4
9 18
1 −50 1
0 3 4
3 9 9 0
0 1 0 X
Matrix has no inverse, det (D) = 0
0 0 1 W
1 1 1
− − 1
0 0 0 Z
2 2 2
1 0 0 0 Y
0 1 1
5 a b
0 0 1
0 1 1 2 1 1
5 a b
−9 −3
−28 −15
C = 1 0 1 C 2 = 1 2 1
1 3 39 22
1 1 0 1 1 2
−14 9
d e
c 2
1 3.5
6 a There is no direct communication link
−9 8 0
between the towers.
b T 1 and T 3 2 a
c 1, 0 losers
d There is a 2-step communication link A B C D E
between T 3 and T 1.
A 0 1 1 1 1
e 6
B 0 0 1 0 0
T1 T2 T3 T4
winners C 0
0 0 0 0
1 1 1 0 T1 D
0 1 1 0 0
T = 1 2 1 1 T2 E 0 1 0 1 0
1 1 2 1 T3
A : 4, B : 1, C : 0, D : 2, E : 2
0 1 1 1 T4
A; D and E equal; B; C.
g T 1 and T 4
0 2 2 1 0
0 0 0 0 0
A 0 0 0 0 1
D2 = 0 0 0 0 0
B 1 0 0 0 1
0 0 0 1 0 0
C 1 0 0 1
0 1 2 0 0
D 0 0 1 0 1
0 3 3 2 1 9
E 0 1 1 0 0
0 0 1 0 0 1
8 D 9 D
T = 0 0 0 0 0 0
0 1 2 0 0 3
Exercise 10G
0 2 2 1 0 5
1 a
0 1 1 0
The tie can be broken using two-step
dominances to give the ranking
B 0 0 1 0
A, E, D, B, C.
C 0 0 0 0
A B C D E Score
D 1 1 1 0 3 a
A 0 0 1 1 0 2
D, A, B, C
B 1 0 1 0 1 3
D = C 0 0 0 1 0
b 1
D 0 1 0 0 0 1
A 0 0 1 0
E 1 0 1 1 0 3
B 0 0 0
0 1 0 1 0
C 0 0 0 0
1 0 2 3 0
D 0 1 2 0
D² = 0 1 0 0 0
D, A, B, C
1 0 1 0 1
0 1 1 2 0
c A B C D E Score Chapter 10 review
A 0 1 1 2 0 4
Multiple-choice questions
B 2 0 3 3 1 9
D + D² = C 0 1 0 1 0
2 1 C 2 D 3 B 4 D
D 1 1 1 0 1
4 5 D 6 A 7 B 8 C
E 1 1 2 3 0 7 9 D 10 E 11 A 12 B
The matrix D + D² gives the following ranking:
13 D 14 E 15 A 16 D
17 D 18 E 19 D 20 C
21 A 22 C 23 C 24 C
Rank Player Score
25 A 26 C 27 C 28 C
First Bea 9 29 E
Second Eve 7
Written-response questions
Equal Ann and 4
third Deb 1 a b
0 1 2 0 2 1 1
Fifth Cat 2
1 0 1 2 0 2 0
4 a 10
2 1 0 1 2 0 1
b Ash defeats Carl and Dot
1 0 1 0
Ben defeats Ash, Carl and Dot
Carl defeats Dot 0 0
0 0
Dot defeats Elle
2 a 2×1 b 1×2
Elle defeats Ash, Ben and Carl
c Yes; number of columns in C equals
c Ben = Elle, Ash, Carl = Dot
0 1 0 1
the number of rows in J.
0 0 1 0 d [162.41] ; 5 × 30.45 + 4 × 2.54 = 162.41
5 a 1 0 0 1 0
0 1 0 0
0 185.24
3 a 456 b 2×2
1 0 1 1 0
b E, B, A = C, D c
354 987
B =
0 2 2 1
314 586
0 0 1 0
6 a
b A d
688 1863
0 0 0 0 C = ; the total number of
527 1042
0 1 2 0
books of each type in the two stores
0 1 1 0 e i 2×1
0 0 1 1
7 a b A, B, D, C 31 236
0 0 0 1
18 021
1 0 0 0
iii total value of fiction and
8 A, B, D, C non-fiction books at bookshop 1
9 E, B, A = C, D
668 1752
f 2A =
10 C 11 A 12 D 426 912
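Chapter 10 relies on matrix products (for example, a 1 × 2 row [5 4] times the 2 × 1 column [30.45, 2.54] gives [162.41]) and, in Exercise 10G, on adding one-step and two-step dominances. The Python sketch below is illustrative; `matmul` and `matadd` are hypothetical helpers rather than a library API, and the matrix D is reconstructed from the rows printed in the Exercise 10G answers (it reproduces the printed D², D + D² and scores).

```python
def matmul(a, b):
    """Multiply matrices stored as lists of rows; requires columns(a) == rows(b)."""
    if len(a[0]) != len(b):
        raise ValueError("columns of the first matrix must equal rows of the second")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matadd(a, b):
    """Add two matrices of the same order element by element."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


print(round(matmul([[5, 4]], [[30.45], [2.54]])[0][0], 2))  # 162.41

players = ["Ann", "Bea", "Cat", "Deb", "Eve"]
# One-step dominance matrix D: a 1 means the row player defeated the column player
D = [[0, 0, 1, 1, 0],
     [1, 0, 1, 0, 1],
     [0, 0, 0, 1, 0],
     [0, 1, 0, 0, 0],
     [1, 0, 1, 1, 0]]
D2 = matmul(D, D)                                # two-step dominances
scores = [sum(row) for row in matadd(D, D2)]     # row sums of D + D²
for name, score in sorted(zip(players, scores), key=lambda p: -p[1]):
    print(name, score)  # Bea 9, Eve 7, Ann 4, Deb 4, Cat 2
```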
4 a 1×5 d
b i
90 135 165 150 60
A 0.45 0.35 0.15
R =
48 72 63 88 32 B 0.25 0.45 0.20
ii the number of students expected to C 0.30 0.20 0.65
get a C in Mathematics .
M P 2
c i F = 220
h i
O 0.96 0.98
B 0.04 0.02
ii FN = 220 197 = [195 040]
320 3
The total fees paid are $195 040.
F 0.80 0.14
5 a N= 8 6 1 b NG = [575]
P 0.20 0.86
c total number of points scored by Daniel
6 a 80 tonnes b 100 tonnes
4 B
c $186 000
d i 3×1
Exercise 11B
ii The price per tonne of each of
the minerals 1 a
0.85 0.25
T =
0.15 0.75
iii 700
0.85 P I 0.75
Chapter 11
b 0.85 × 80 = 68 c 0.25 × 60 = 15
Exercise 11A d 0.15 × 120 + 0.75 × 40 = 48
2 a i 10% ii 80% iii 10%
1 a
A B b i 680 ii 85
A 0.40 0.55 c i 1150 ii 0 iii 0
B 0.60 0.45 d All (100%) of the sea birds who nest at
site A this year will nest at site A next
X Y year.
X 0.70 0.25
Y 0.30 0.75
3 a i 76440 ii 7560
c b i 5500 ii 1210 iii 266
4 a i 18 ii 6 iii 6
X 0.6 0.15 0.22
b i 84 ii 66 iii 30
Y 0.1 0.7 0.23
Z 0.3 0.15 0.55 c 66
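Exercise 11C iterates S(n+1) = T S(n) and reads off a steady state. The Python sketch below is illustrative; the helper names are hypothetical, the steady state is found by iterating until successive state vectors agree (an assumption that the chain settles, which holds for these matrices), and S0 = [400, 400] for Question 6 is inferred from the printed S1 = [420, 380] and the total of 800 rather than taken from the question.

```python
def step(T, S):
    """One application of S(n+1) = T S(n) for a state vector S stored as a list."""
    return [sum(T[i][j] * S[j] for j in range(len(S))) for i in range(len(T))]


def state_after(T, S0, n):
    """State vector after n steps, starting from S0."""
    S = list(S0)
    for _ in range(n):
        S = step(T, S)
    return S


def steady_state(T, S0, tol=1e-9, max_steps=10_000):
    """Iterate until successive states differ by less than tol in every entry."""
    S = list(S0)
    for _ in range(max_steps):
        nxt = step(T, S)
        if max(abs(a - b) for a, b in zip(nxt, S)) < tol:
            return nxt
        S = nxt
    raise RuntimeError("state vectors did not settle within max_steps")


# Question 6: Jill's and Pete's (S0 = [400, 400] is inferred)
T6 = [[0.80, 0.25],
      [0.20, 0.75]]
print(state_after(T6, [400, 400], 1))   # approximately [420, 380]
print(state_after(T6, [400, 400], 5))   # approximately [442.2, 357.8]
print(steady_state(T6, [400, 400]))     # approximately [444.4, 355.6]

# Question 7: happy and unhappy
T7 = [[0.90, 0.60],
      [0.10, 0.40]]
print(state_after(T7, [1500, 500], 4))  # approximately [1712.55, 287.45]
print(steady_state(T7, [1500, 500]))    # approximately [1714.3, 285.7]
```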
d i 21 ii 3 iii 6 b i
e 180
S 2 136 S 3 = 132.1
5 B 6 E
257 242.9
Exercise 11C
S 7 = 129.7
1 a
398 225.4
S 1 = S 2 =
c See solutions
220 202
c 6 a
0.80 0.25
S 3 = T =
0.20 0.75
2 a b b
399.998 400
S 5 = S 7 = S 0 =
200.02 200
S 12 = c S 1 = , 420 to Jill’s and 380 to
3 a
S 4 = S 3 =
d S 5 = , 442 to Jill’s and 358 to
4780 4784
4 a i
130 ii
151 Pete’s
S 1 = S 2 =
170 149 e steady state solution: S s = , 444
iii 355.6
S 3 = to Jill’s
and 356 to Pete’s
0.90 0.60
7 a T =
0.10 0.40
T 5 =
0.27731 0.44538 1500
b S 0 =
c i
151 ii
165.7 500
S 2 = S 3 =
149 134.3 1650
c S 1 = , 1650 are happy and 350
S 7 = are unhappy
d See solutions 1712.55
d S 4 = , 1713 are happy and
5 a i ii 287.45
180 207
287 are unhappy
S 1 = 130 S 2 = 136
e steady state solution: S = ,
290 257 285.7
1714 are happy and 286 are unhappy
S 3 132.1 1200
8 a S 0 = 600
ii S 2 = T S 1 − B
b S 1 = 440 , 1270 are happy
0.6 0.2 100 −20
= · −
0.4 0.8 100 20
80 −20 100
c S 5 = 429.82 , 1310 are happy = − =
120 20 100
2 a i S 1 = 8500 ii 7300
d steady state solution: 429.1 , 1312
b A: 30 000, B: 0, C: 0; While the sea
are happy birds move between nesting sites each
9 A 10 E 11 A year, the ‘1’ in the transition matrix
indicates that, once a sea bird nests at
Exercise 11D site A, it continues to nest at this site.
Meanwhile, some of the birds who nest
80 68.8
1 a i ii at sites B and C each year will move to
120 131.2
site A until, in the long term, all birds
b i are nesting at site A.
S1 = T S0 + R
c i
9500 ii 9000 iii 8507.5
0.6 0.2 100 10
= · +
9500 9150 8912.5
0.4 0.8 100 5
11000 11850 12 580
80 10 90
= + = 3 C 4 B
120 5 125
ii Exercise 11E
S2 = T S1 + R
0.6 0.2 90 10 1 a i 1.9 ii 0.6
= ·
0.4 0.8 125
5 b
79 10 89
= + = 2.1
136 5 141 1.9
1 2 3 4
c i 0.7 0.5 0.6
S1 = T S0 − B
c i
510 ii 784.8 iii 208 276
0.6 0.2 100 −20
= · −
212.8 103 876
0.4 0.8 100 20
50 178.5 36 984
80 −20 100
= − = 60 21 15 815.8
120 20 100
d e 37 f 90
g i 111 ii 158 iii 235
600 h i 99 ii 145 iii 233
It appears that the population rate of
increase approaches 10% per year.
2 a
Further investigation confirms this.
5 a
0 b 0
3.1 0 1000
0 0.02 0 0
1 2 3 4
0.8 0.7 0.5
50 0 0.05 0
b 0.42 c 1000
0 0
1 2 3 1 2 3
0.6 0.75 0.02 0.05
d i ii
8 50 000 0
0 50
1 2 3 4
0.4 0.5 0.25
50 000
0 1.3 2.4 0 2.3 3 0
3 a 0.7 0 0 b 0.6 0 0
0 0.6 0 0 0.3 0
e i
50 000
0 1.4 2.6 0.6
0.5 0 0 0
c 5 50
0 0.4 0 0
50 000
0 0 0.05 0
4 a b
15 0 0.2 0.9 1.1 0
20 0.8 0 0 0 0
6 a i ii
269 356 iii 622
30 0 0.9 0 0 0
127 168 294
15 0 0 0.7 0 0
30 40 70
10 0 0 0 0.8 0
i 427 ii 565 iii 986
1.1 0 b At this stage the rate of increase is
0.2 approximately 5.7%
1 2 3 4 5
0.8 0.9 0.7 0.8 7 a i
1400 ii 700 iii
d i ii
240 840 420
48 58
18 22
21 21
12 19
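The Exercise 11D, Question 1 working applies S(n+1) = T S(n) + B, where B adds a fixed amount to each state at every step (and subtracting B is the same as adding its negative). The following is a minimal Python sketch of that step; the function name is hypothetical.

```python
def step_with_change(T, S, B):
    """One application of S(n+1) = T S(n) + B; a negative entry in B removes that amount."""
    return [sum(T[i][j] * S[j] for j in range(len(S))) + B[i] for i in range(len(T))]


T = [[0.6, 0.2],
     [0.4, 0.8]]
S0 = [100, 100]
R = [10, 5]

S1 = step_with_change(T, S0, R)
S2 = step_with_change(T, S1, R)
print(S1, S2)  # approximately [90, 125] and [89, 141]

# S1 = T S0 - B with B = [-20, 20] is the same as adding [20, -20]
print(step_with_change(T, S0, [20, -20]))  # approximately [100, 100]
```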
b i
976 ii 1241 iii
1579 Chapter 11 review
460 586 745 Multiple-choice questions
91 115 146
1 B 2 A 3 C 4 B 5 B
c i
974 ii 1237 iii 1571
6 C 7 C 8 A 9 B 10 B
460 584 741 11 B 12 C 13 E
c 1.035 b
S 0 =
0.2 0.5 0.6 0.4 980
0.7 0 0 0 692
4 a
0 488
0.7 0 0
S 1 =
; 4360 fish in Lake Blue and
0 0 0.7 0 344
972 7467 5640 fish in Lake Green
680 4762
c , 0.992 d
484 3038
S 3 = ; 4555 fish in Lake Blue
342 1938 5444.844
e Day 56 and 5445 fish in Lake Green
S s = ; 4590 fish in Lake Blue and
Chapter 12 5409.8
5410 fish in Lake Green
Exercise 12A 3
a 30
1 E 2 D 3 A 4 A 5 B 6 C
7 D 8 C 9 C 10 D 11 B 12 A 20
13 A 14 B 15 C 16 C 17 C 18 E
19 D 20 C 21 A 22 D 23 B 24 D 0.5 0.1 0.05
25 A b 50 c 920 d 5 weeks
e 5%
Exercise 12B f i 1667 ii 1706
g 2%
1 a 3×1 b 1×3 ii 95
4a i
c HC; because the number of columns in 100
H equals the number of rows in C
d i [696.72] b S n = T nS 1 c 5 weeks
ii the number of Australian dollars d 90
($696.72) that you would receive 5a i 3 × 1 ii k = 1.2
by converting your foreign b i A and B and A and C
currency into Australian dollars; ii D → B → A → C
HC = 102 × 1.316 + 262 × 1.818 + iii C → A
→B and C → D → B
516 × 0.167
566.21 c i S 1 = 75 ii 230
e 137.46 140
iii 70.56%
2 a
0.67 0.28
0.33 0.72
20 d C
d B = 15
6a 4 × 2 H
2 a b
b i 435
ii 72.4%
c L = Q × L = 5115
85 d
39 41.4
36 35.1
7a S 1 = S 2 =
21 20.1
24 23.4
3 a i b ii c ii
4 a B C
b S 10 =
b A B
(P) 43, (S ) 35, (D) 20, (W) 23
c 3
d 36 D
e 3.9 c B
f 47.4
8a 550
b The number of sandwiches sold in
week 3.
c Hamburgers $15, fish and chips $14 d not possible
sandwiches $12 5 a i v = 8, f = 6, e = 12
b i v = 6, f = 8, e = 12
d L= 1 0 1
c i v = 7, f = 7, e = 12
d i v = 5, f = 3, e = 6
e i v = 5, f = 6, e = 9
Chapter 13 f i v = 6, f = 4, e = 6
6 a 4 b 12 c 19
Exercise 13A 7 7 8 14 9 15
1 a i 3 ii 2 iii 1 10 C 11 E 12 C 13 B
b 14 14 E 15 C
c town D and town H
Exercise 13B 2 a A B b B
1 a
0 1 1 0
c A D
B 1 0 1 1
C 1 1 0 0
D 0 1 0 0 C
3 C is an isolated vertex.
A 0 1 1 0
4 Leading diagonals will all be ‘1’.
B 1 0 0 1 5
C 1 0 0 1
A 0 1 1 1 1
D 0 1 1 0
B 1 0 1 1 1
C 1 0 1 1
A 0 1 0 0
D 1 1 1 0 1
B 1 0 0 0
E 1 1 1 1 0
C 0 0 0 1
D 0 0 1 0 6 E 7 A 8 E 9 A
d 10 C 11 B 12 E
A 0 1 1 1
Exercise 13C
B 0 1 1
C 1
1 0 1 1 a path b trail c path d walk
1 1 1 0 e trail f path
e 2 a walk b cycle c path d walk
e path f walk g h
A 0 1 1 0 0 0
3 a i Euler trail
B 1 0 0 1 0 0
ii A−B−E−D−B−C−D−A−E
C 1 0 0 1 0 0
b neither
D 0 1 1 0 0 0
c i Euler trail
E 0 0 0 0 0 1
ii A−C−E−C−B−D−E−F
F 0 0 0 0 1 0 d i Euler circuit
f ii A−B−C−E−D−C−A
0 0 0 0
e i Euler circuit
ii E−F−D−E−A−B−D−C−B−E
B 0 0 0 1
4 a A−B−C−F−I−H−E−G−D−A
C 0 0 2
b A−B−C−D−E−F−A
D 0 1 2 0
c A−B−D−C−E−A
5 F−A−B−C−D−E−H−G
6 a 2 b 7 c B Exercise 13E
1 a A − B − C − H, 160
b A − C − F − E − G − H, 53
c A − D − E − F − H, 385
d Vertices are not all even.
d A − B − E − F − I − H, 87
7 a v = 9, e = 12, f = 5
2 23 minutes
v − e + f = 9 − 12 + 5 = 2
b i Hamiltonian cycle
Exercise 13F
ii Lake Bolac − Streatham − Nerrin
Nerrin−Woorndoo−Mortlake 1 a 6
−Hexham−Chatworth− b 4 5 3 2
2 6
Wickliffe−Lake Bolac.
5 3 2
and the reverse of this 2 2
c i Eulerian circuit 6
3 5 2
ii Not all vertices have an even 3
degree 6
d i Lake Bolac - Wickliffe c 22, 20, 21 (Answers will vary)
A B 2 E
8 a Yes b Yes c No d Yes 3 F
e No f Yes g Yes 2
1 10
9 3 C D
b C
10 C 11 D 12 E 13 B B
14 A E 16
12 11
15 80
Exercise 13D G F
c B 18 C
1 a D–E b 17 minutes 10
c 8 minutes d 36 minutes 10 9
2 11 A D 47
3 a 34 b 56 c E–B–A–E, 22 d D
d A–E–F–G–I or A–C–F–G–I 70
C 100 200 G
4 a S –B–D–F, 12 100
b S − A − C − D − F, 10 80 90
c S − B − D − F, 15 90
d S − A − E − G − F, 19 730
5 19 km 3 490 m
6 B 7 C 8 A 9 E
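Exercise 13C classifies routes using vertex degrees: a connected graph has an Euler circuit when every vertex has even degree, and an Euler trail (but no circuit) when exactly two vertices have odd degree, in which case the trail starts at one odd vertex and ends at the other. The Python sketch below assumes the graph is connected and, because an Euler trail or circuit uses every edge exactly once, rebuilds each graph's edge list from a route printed in the 13C Question 3 answers. The helper names are hypothetical.

```python
from collections import Counter


def degrees(edges):
    """Degree of each vertex in an undirected graph given as a list of (u, v) edges."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def euler_type(edges):
    """Classify a connected graph by its odd-degree vertices."""
    odd = sorted(v for v, d in degrees(edges).items() if d % 2 == 1)
    if not odd:
        return "Euler circuit", odd
    if len(odd) == 2:
        return "Euler trail", odd  # the trail runs from one odd vertex to the other
    return "neither", odd


def edges_of_route(route):
    """Edges traversed by a route written as single-letter vertices, e.g. 'ABEDA'."""
    return list(zip(route, route[1:]))


print(euler_type(edges_of_route("ABEDBCDAE")))  # ('Euler trail', ['A', 'E'])
print(euler_type(edges_of_route("ABCEDCA")))    # ('Euler circuit', [])
```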
Chapter 14
4 A 5 B 6 C 7 E
8 B
Exercise 14A
Chapter 13 review
1 a 3 b 8
Multiple-choice questions
2 C1 = 14, C2 = 12, C3 = 21
1 C 2 C 3 A 4 D
3 C1 = 12, C2 = 16, C3 = 16
5 A 6 C 7 B 8 B
9 E 10 B 11 A 12 A 4 a 9 b 11 c 8 d 18
b chocolate f A
Gloria start B D E G
vanilla F
Minh C J
Carlos peppermint I K
2 a H
Trevor butterscotch F
start m my
strawberry du finish
c 5 G
5 a W–D, X–A, Y–B, Z–C b D
b e.g., minimum cost is 11; W–A, X–B, A dummy finish
start E
Y–D, Z–C C
6 Dimitri 800 m, John 400 m, Carol 100 m, c R
Elizabeth 1500 m dummy finish
start S
7 Joe C, Meg A, Ali B V
8 A–Y, B–Z, C–X, D–W Q T
9 Champs Home, Stars Away,
d D
Wests Neutral; or Champs Neutral, Stars B G
H finish
Away, Wests Home. start dummy
Cost = $20 000 C
10 A Mark, B Karla, C Raj, D Jess; or A
Karla, B Raj, C Mark, D Jess; 55 km a
Activity Immediate predecessors
11 D 12 A 13 E 14 D
A −
B −
Exercise 14C
1 a B D D A
start finish E B, C
b P R
start finish b
Activity Immediate predecessors
c T V P −
start Q P
d F I S Q
start J finish T Q
H U S, V
e K M V R
start N finish W R
L O P X T, U
c f
Activity Immediate predecessors Activity Immediate predecessors
J − A −
K − B A
O K F C, D
Q L, M H E, F, G
S O, R J I
d 4 a Remove panel.
Activity Immediate predecessors
b ‘Order component’ and ‘Pound out dent’
A −
5 a
B −
C A Activity Immediate predecessors
D A A −
E D, B B −
C −
F C, E
G D, B
E B, F
e G B, F
Activity Immediate predecessors
P − H D, E
Q −
J I, K
U R N J, L
W S, T b A−D−H−M
Y W c B−E−H−M
Z V, X, Y B−E−H−I−J−N −O
B−G−K − J −N −O
B−G−L−N −O
d C−F−E−H−M
C−F−E−H−I−J−N−O 8 a D, F, G b 13
C −F −G−K − J −N −O c Activity H lies on the critical path and
C −F −G−L−N −O if delayed, the completion time of the
b b B − E − G − I, 21 hours
Activity ES T LS T
c 18 hours
P 0 11
2 a A−B−F −H b 21 days
Q 0 10
c 20 days d $100
R 0 0
3 a B−E−H−J b 2 hours
S 4 15
c 6 hours d 14 hours
T 5 15
4 a 4 b 17 hours c $1200
U 7 18
V 12 12 5 a 22 days b $870
W 12 21 6 a C, D, H
X 16 16 b B, E, H, I, J
Y 29 29 c i 21 days ii $450
C, 6
5 a A, 4
3 a Cut 1 does not isolate the source from
E, 10
B, 5 G, 4
b 26 c 22
D, 7 F, 5 4 Start Finish
b 12 c 1 hour d 4 D G I
e B−D−E−H f 27 hours B F
g i B−D−F −G−H
ii 22 hours
5 a 9 b 7 c 1
h D, H must be in that order
d B−D−E −G e 15
6 a A → 1, D → 4, F → 10, K → 12,
Chapter 15 b B–C–E–G–J–K
7 a i 2.1 km
Exercise 15A ii PQRT S U or PRQS T U or
1 A 2 E 3 E 4 B
5 C 6 B 7 A 8 B b i R−Q−P−R−T−Q−S−T−U−S or
9 E 10 C 11 E 12 D R−Q−P−R−T−S−Q−T−U−S
13 D 14 B 15 E 16 A ii travel each road only once
17 C 18 C 19 C 20 E 8 a None of the edges overlap.
21 B 22 A 23 D 24 C b 7 + 6 − 11 = 2 c C d 297 km
25 B 26 C 27 C 28 E e no f 79 km g 127 km
h 187 km
Exercise 15B 9 a 5 b 24 hours c 7 hours
10 11 megalitres/day
1 a 14 b 3 c 3
11 a 112 km
e 5 + 5 = 7 + 2, 9 = 9
b i minimum spanning tree
2 ii M
P 1 1 0 1 L 31
Q 1
0 2 1 35 S N
R 0 2 0 1
S 1 1 1 0 R
47 O
R iii 293 km
c 306 km
12 a A, B, C
b LST for B is 1, EST for E is 10, LST b time
for I is 18 c i IQR = 6.2 seconds
c i A−D−F−I−J ii 27 months ii Upper fence = 28.2 + 1.5 × 6.2 =
d i B−C−D−F−I−J ii 25 months 37.5
13 a A–Z, B–W, C–X, D–Y, or A–Z, B–X, d 10 people
C–W, D–Y e From this information it can be con-
b $130 cluded that the time taken to complete
14 a 15 weeks b $8500 c 3 the task is associated with the number
of distractions. The median time
Chapter 16 taken by the group who completed
the task with no distractions was 25.0
Exercise 16A seconds, faster than the group with a
few distractions which has a median
Data analysis, probability and statistics
time of 26.2 seconds, which was in
1 A 2 C 3 B 4 D 5 C turn faster than the group with many
6 B 7 B 8 C 9 A 10 A distractions which took a median time
11 E 12 E 13 B 14 B 15 A of 29.2 seconds to complete the task.
16 D 17 B 18 B 19 A 20 E
Recursion and financial modelling
21 C 22 B 23 D 24 A 25 B
26 C 27 B 28 D 29 A 30 C
3 a r² = 84.8%
b 84.8% of the variation in fuel consumption can be explained by the variation in speed.
c 9.0 litres/100 km
d slope = 0.0218. On average, for each
31 B 32 B 33 A 34 E 35 D additional 1 km/hr increase in the
36 D 37 E 38 D 39 E 40 A speed of the car, the fuel consumption
41 B increases by 0.0218 litres/100 km.
e predicted value = 8.40, actual value = 8.30. Thus residual = 8.30 − 8.40 = −0.10.
Networks
42 C 43 D 44 B 45 D 46 E
47 B 48 C 49 C 50 D 51 C
4 a There is a strong, non-linear relationship between efficiency and
Exercise 16B b log y, , x2
c log(efficiency) = 0.0205 + 0.0860 ×
Data analysis, probability and statistics
1 a mean = 54.042, standard deviation = 2.717 d 6.6
b z = −1.1 5 a
c i 48.6 kg ii 2.5%
Q1 Q2 Q3 Q4
2 a EV: number of distractions, RV: time
SI 1.01 1.15 1.32 0.52
Year Q1 Q2 Q3 Q4 2791.33
2022 62 60 61 63 ii S 50 = 2577.88
6 a 130.6 cents/litre
715 735.9 iii A, E, F, H
416 429.0 iv 18 months
16 a A − B − D − E − C − A
c S 0 = =
, S
b B − C − E − D − E − C − A − B − D;
317 326.7
247 Must start and end at a vertex of odd
144 148.2
c 9:54 am
13 a 3 × 1
17 a i 4
b C × A = [17900]; the total cost of the
A Office
c The product = [11600]; the total cost 20 20
of the stalls and dress circle. 35
60 20 30
Networks 20
B 30
14 a i 40 40 15
45 35
2 30
iii A, B, D
b i
A 20 Office
3 20
ii 1 + 4 + 1 + 2 + 2 = 10
40 B
b i Vertices D and E are odd. 15
ii E and F
iii E − F − D − E − A − B − C − D C
c i Capacity = 20 + 25 + 30 = 75
ii $44 400
ii Maximum flow = minimum cut
c i A Office
= 15 + 15 + 30 = 60 20
15 a i 2
ii C 20
b A on breaststroke,
B on backstroke,
C on butterfly D
c i 7 months 30
ii ii $20 400
E 8 10
D 8 8 D,7 G 15 15
18 a
E, 10
C 6 6
G,4 G, 4
B, 8
A,3 E,9
A, 5 D, 15
A 0 3 dummy,0
I,2 21 21 H, 1 I, 3
I 19 19
H,9 C, 3
B 0 0 B,6 F, 6
F 6 7 F,3
H 9 10 b 32 weeks
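The Chapter 16 answers use the upper fence Q3 + 1.5 × IQR and residual = actual value − predicted value. In the Python sketch below, Q1 = 28.2 − 6.2 = 22.0 is derived from the printed Q3 and IQR rather than taken from the question, and the function names are hypothetical.

```python
def fences(q1: float, q3: float) -> tuple:
    """Boxplot fences: (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR), where IQR = Q3 - Q1."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def residual(actual: float, predicted: float) -> float:
    """Residual = actual value - predicted value."""
    return actual - predicted


lower, upper = fences(22.0, 28.2)        # Q1 = 22.0 is derived as 28.2 - 6.2
print(round(upper, 1))                   # 37.5
print(round(residual(8.30, 8.40), 2))    # -0.1
```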