Statistics: Paper 4040/01 Paper 1


General Certificate of Education Ordinary Level

4040 Statistics November 2009


Principal Examiner Report for Teachers
© UCLES 2009
STATISTICS


Paper 4040/01
Paper 1


General Comments

The overall standard of work submitted was very similar to that in recent years, and as last year there were
fewer exceptionally high marks and fewer very low marks than had been the case previously.

Unfortunately, one improvement which had been noted last year was reversed this year. Numerous
examples were encountered of marks being lost through final answers not being given to the levels of
accuracy required by questions, when this was stated. Some of the errors seen suggest that there may be
candidates who are unaware of the difference between 'significant figures' and 'decimal places'. Another
area which continues to show no sign of improvement, and resulted in the widespread needless loss of
marks, was that in parts of questions requiring comments, many candidates continued to produce general
comments that had obviously been learned by rote from a textbook or similar source, when what was clearly
asked for was a specific comment in the context of the question. Examples of both these types of error are
mentioned in the comments on individual questions. These errors, and others, continue to be due to
candidates not reading questions sufficiently carefully.

The question paper for this session was in an entirely new format, and there are a number of matters of
which Centres should make candidates fully aware. It is the intention that answers should be presented on
the question paper. Extra paper (lined or graph) should not be issued or requested as a matter of course at
the start of the examination, but only if a candidate genuinely requires it, e.g. to complete a written answer or
calculation, or to re-draw a graph on which a mistake has been made. Further, candidates should ensure
that where an answer line is clearly given, or where a question requires an answer to be placed in a
particular position, e.g. in a table, this is where they write their final answer, as that is what will be marked as
such. Because this session was the first in which the new format was used for this subject, a correct final
answer was allowed to score wherever it was seen, but this will not generally be the case in future.


Comments on Individual Questions

Section A

Question 1

Most candidates scored full marks. A minority created their own scale for (ii) and, provided their pictogram
was correct against that, it scored.

Answers: (i)(a) 20; (b) 45.

Question 2

Both parts of the question were answered very poorly, particularly (i). For example, it was very obvious that
many had no idea even of what a pilot questionnaire is, let alone how it is used. The only two purposes for
which marks were awarded were to test whether respondents understood the questions, and to test whether
the responses provided the information required. However, any comment which could, even loosely, be
interpreted as implying one of these two points was awarded a mark. Similarly in (ii), the only allowed
'advantage' was cheapness compared to interviewing costs. (No matter how large, postage costs are less
than those of transporting, accommodating and paying interviewers.) The only permitted 'disadvantage' was
a likely low response rate. As with (i), any comment which implied the point being looked for scored a mark.
No comment related to 'time' was permitted as either an advantage or disadvantage, as it is impossible to
argue its case definitively in either direction.


Question 3

Here was the first example in this paper of many candidates failing to read a question sufficiently carefully. A
majority of candidates failed to interpret correctly the words printed in bold type, i.e. that the figures given
were cumulative frequencies (which therefore needed to be un-cumulated to answer the question). The only
mark available to such candidates was the method mark for obtaining the mean. Most candidates who did
interpret the question correctly scored very well.

Answers: (i) 12 11 11 9 8 14 12 17 17 12 17; (ii) 12.7, 12.
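The un-cumulating step, followed by the mean and median, can be sketched in Python. The running totals below are not quoted from the question paper. As an assumption, they are rebuilt by accumulating the published answer to (i), so differencing them should recover that answer.

```python
from itertools import accumulate
from statistics import mean, median

# Published answer to (i): the individual (un-cumulated) values.
published = [12, 11, 11, 9, 8, 14, 12, 17, 17, 12, 17]

# Assumption: the question's cumulative figures, rebuilt here by
# accumulating the published answer (12, 23, 34, ...).
cumulative = list(accumulate(published))

def uncumulate(running_totals):
    """Recover individual values by differencing successive running totals."""
    previous = 0
    values = []
    for total in running_totals:
        values.append(total - previous)
        previous = total
    return values

values = uncumulate(cumulative)
print(values)                  # matches the published answer to (i)
print(round(mean(values), 1))  # 12.7
print(median(values))          # 12
```

Working directly from the cumulative figures would give a much larger mean, which is why the method mark alone was all that such candidates could score.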

Question 4

Few candidates scored full marks, but most obtained some. However, many missed out on probably the
'easiest' mark in the entire paper, that awarded for presenting their results, whether correct or not, in a table,
i.e. a simple array of rows and columns.

Answers: (i) 4; (ii)(a) 1, 2, 3, 4; (b)

N       1     2     3     4
P(N)    0.4   0.3   0.2   0.1


Question 5

A majority of candidates scored full marks for this question, clearly making correct deductions from the
information given. Almost all candidates scored more than half marks.

Answers: (i) 103, 103; (ii) 13, 13; (iii) 37, 39, 89, 77, 30.

Question 6

The syllabus for this subject refers only to 'crude and standardised rates', and while questions set on the
topic almost always involve death rates, this is not necessarily so; candidates need to be prepared to cope
with other contexts, such as the accident rates involved in this question. Most did so, because of course the
'arithmetic' involved in calculating the rates is the same whatever the context. Many of the comments given
in answer to (iii) did, however, refer to death rates.

The purpose of the instruction in (ii) to show full working for at least one of the categories was twofold: first,
to enable the Examiner to check whether a correct method was being used, so that, if it was, any error in the
final answer could be attributed solely to a calculation error; and second, to indicate to candidates that they
did not need to show full working for all the categories. A small number took the question to mean that only
one category had to be considered, and the others not at all. Very few correct explanations were given in (iii).
Most simply detailed the different methods of calculation of the crude and standardised rates. The reason
being looked for was that the structure of the employment categories in the factory was not the same as that
in the standard population. (Were these two structures the same, then despite their different methods
of calculation, the two rates would be equal.)

Answers: (i) 125 per thousand; (ii) 141.10 per thousand.

Section B

Question 7

Most candidates scored both of the marks available in (i). A pleasingly large number of candidates scored the
first mark available in (ii), by interpreting the steepest part of the cumulative frequency curve as that part of
the distribution with the highest frequency, but few then scored the second mark by answering exactly what
the question asked, i.e. what information this gave about the precision of the process. Any valid comment
using either of the words 'precision' or 'accuracy' was allowed to score.

As well as testing knowledge associated with various percentile values of a distribution, the numerical parts
of this question were designed to investigate how well candidates could cope with a variable whose
values contained a large number of decimal places. Most candidates coped very well indeed. However, in
answering questions of this type, candidates need to be aware that an Examiner needs to see some
indication of their method, either in the way of calculation, or by points/lines being clearly marked on the
graph, or both. The usual principle applies: an incorrect result, with no indication of method, scores no
marks.

It is often the intention of paper setters that the final two or three marks of a Section B question should
test the ability of the very best candidates. This was the case with (vi) of this question, where
only a minority appreciated that the zero point of the cumulative frequencies of accepted components was
the value 11 on the vertical axis of the graph.

Answers: (iii)(a) 5.0074 or 5.0075; (b) 5.0055; (c) 5.0105 to 5.0108; (iv) 26; (v) 66;
(vi)(a) 5.0077; (b) 5.0064.

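Reading a percentile from a cumulative frequency graph amounts to finding where the graph reaches a target cumulative frequency. The sketch below replaces the hand-drawn curve with straight-line (polygon) interpolation, which only approximates a smooth curve. The boundaries and cumulative frequencies are hypothetical, not the question's data. The last line mirrors the point in (vi): percentiles of a subset (here, assuming the lowest 20 items are rejected) are measured from that subset's own zero point on the vertical axis.

```python
from bisect import bisect_left

def value_at(boundaries, cum_freqs, target):
    """Linearly interpolate the value at which the cumulative frequency
    polygon reaches `target`.

    boundaries[i] is the upper class boundary at which the cumulative
    frequency equals cum_freqs[i]; cum_freqs must be non-decreasing.
    """
    if not cum_freqs[0] <= target <= cum_freqs[-1]:
        raise ValueError("target outside the range of the cumulative frequencies")
    i = bisect_left(cum_freqs, target)  # first point at or above the target
    if cum_freqs[i] == target:
        return boundaries[i]
    x0, x1 = boundaries[i - 1], boundaries[i]
    c0, c1 = cum_freqs[i - 1], cum_freqs[i]
    return x0 + (target - c0) / (c1 - c0) * (x1 - x0)

# Hypothetical data: 100 components measured to many decimal places.
boundaries = [5.000, 5.004, 5.008, 5.012]
cum_freqs = [0, 20, 70, 100]

median = value_at(boundaries, cum_freqs, 50)          # 50th of 100 items
lower_quartile = value_at(boundaries, cum_freqs, 25)

# If the lowest 20 items were rejected, the 80 accepted items start at
# cumulative frequency 20, so their median is at 20 + 80/2 = 60.
accepted_median = value_at(boundaries, cum_freqs, 20 + 80 / 2)
print(median, lower_quartile, accepted_median)
```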
Question 8

This was the least popular question in Section B by a considerable margin, and of those attempting it,
relatively few scored more than one or two marks for (a) and (b). A wide variety of errors occurred in
answers offered to (a), but one particular error was almost always the cause of loss of marks in (b), this
being the failure to realise that, for example, the probability that the fourth selection was of the blue disc
needed to include the probabilities of the previous three selections all having been of red discs. The context
in (c) was more 'traditional' and answers offered for this part were generally more successful.

Answers: (a)(i) 1 2 3 4 5 7 8 9 10 11 12; (ii) 1/6; (iii) 1/36; (iv) 1/108; (b) 0.1, 0.1, 0.1;
(c) 0.48.
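The point missed in (b) can be made concrete. Purely for illustration, the sketch assumes a bag of ten discs, one blue and nine red, drawn without replacement. This setup is an assumption, chosen because it is consistent with the published answers of 0.1. The probability for the fourth selection must include the three earlier red draws.

```python
from fractions import Fraction

def p_first_blue_on(k, n_discs):
    """P(the single blue disc is first drawn on selection k), drawing
    without replacement from n_discs discs of which exactly one is blue.

    Selections 1..k-1 must all be red, and selection k must be blue.
    """
    p = Fraction(1)
    remaining = n_discs
    for _ in range(k - 1):
        p *= Fraction(remaining - 1, remaining)  # a red disc is drawn
        remaining -= 1
    return p * Fraction(1, remaining)  # then the blue disc is drawn

# Assumption: 10 discs, one of them blue.
for k in (2, 3, 4):
    print(k, p_first_blue_on(k, 10))  # each is 1/10

# The common error: using only the final draw, ignoring the earlier red draws.
naive_fourth = Fraction(1, 10 - 3)
print(naive_fourth)  # 1/7, which is wrong
```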

Question 9

The comment required in (i) needed to state or imply that the standard gauge readings constituted the
independent variable. Work associated with the graph was generally of a high standard. Very few
candidates gave the required reason in (iii). Relating the observations to the value of the mean is not
correct; it is perfectly possible for more than half the observations to be on one side of the mean. Neither is it
sufficient to refer simply to 'ascending or descending order'. It needs to be stated explicitly that it is the x
values which have to be ordered.

Despite the wording in (iii), a few candidates still sub-divided the observations by the order in which they
appeared in the table. In general, however, (iv), (v) and (vi) were well answered.

Very few candidates scored both the available marks in (vii). One mark was awarded for any valid comment
about the equation in context, (i.e. not just referring to 'gradient' and 'intercept'). The second mark required
an answer referring specifically to 'any necessary adjustments to the new gauge', as the question asked.

Answers: (iv) (21,26), (9.25,14.25), (32.75,37.75); (vi) y = x + 5.
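The method behind (iii) to (vi) can be sketched as: order the observations by their x values, split them into two halves, find each half's mean point, and join the two mean points. The `split_by_x` helper is illustrative and assumes an even number of observations. The three mean points used below are the published answers to (iv).

```python
def split_by_x(points):
    """Order (x, y) points by x (not by table order), then split them
    into a lower and an upper half. Assumes an even number of points."""
    ordered = sorted(points, key=lambda p: p[0])
    half = len(ordered) // 2
    return ordered[:half], ordered[half:]

def mean_point(points):
    """Mean of the x values and mean of the y values."""
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)

def line_through(p, q):
    """Gradient and intercept of the straight line through points p and q."""
    m = (q[1] - p[1]) / (q[0] - p[0])
    return m, p[1] - m * p[0]

# Published mean points from (iv): lower half, upper half, overall.
lower, upper, overall = (9.25, 14.25), (32.75, 37.75), (21, 26)
m, c = line_through(lower, upper)
print(f"y = {m}x + {c}")                 # y = 1.0x + 5.0
print(m * overall[0] + c == overall[1])  # True: the overall mean lies on the line
```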

Question 10

Most candidates were aware of the 'area is proportional to frequency' principle, and marks were awarded
either for the correct calculations being seen, or for rectangles of the correct height being drawn. There
appears to be no single universally-accepted definition of the modal class of a grouped distribution with
unequal class intervals: some textbooks teach that it is the class with the highest frequency, others that it is
the class with the highest frequency density. Both possibilities were marked as correct in (i)(b).

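The 'area is proportional to frequency' principle, and the two modal-class conventions, can be illustrated with hypothetical data (not the question's) in which the class with the highest frequency is not the class with the highest frequency density.

```python
# Hypothetical grouped data with unequal class widths: (lower, upper, frequency).
classes = [(0, 10, 8), (10, 20, 14), (20, 40, 22), (40, 50, 9)]

def frequency_density(lower, upper, freq):
    """Frequency per unit of the variable, so that bar area is proportional to frequency."""
    return freq / (upper - lower)

for lower, upper, freq in classes:
    print(f"{lower}-under {upper}: density {frequency_density(lower, upper, freq):.2f}")

# The two conventions for the modal class.
modal_by_frequency = max(classes, key=lambda c: c[2])[:2]
modal_by_density = max(classes, key=lambda c: frequency_density(*c))[:2]
print(modal_by_frequency, modal_by_density)  # (20, 40) (10, 20)
```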
Many candidates scored well in (ii)(a), the most common causes of loss of marks being two particular errors
of accuracy, both of which are needless. The question requested both results correct to three significant
figures, and so any result not stated to that level of accuracy lost the final mark available for it. Also, many
candidates lost marks for the standard deviation because they used, in its calculation, the three-significant-
figure value of the mean rather than the full accuracy value which should always be used in intermediate
calculations.
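The second accuracy error is easy to demonstrate. Assuming the computational form σ = √(Σfx²/Σf − x̄²), rounding the mean before squaring it can change the third significant figure of the standard deviation. The data below are hypothetical and were chosen only to show the effect.

```python
from math import sqrt

# Hypothetical grouped data (not the question's): class mid-points and frequencies.
mids = [5, 15, 25, 35, 45]
freqs = [3, 8, 12, 9, 4]

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, mids))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, mids))

mean = sum_fx / n

def sd_using(m):
    """Standard deviation via sqrt(sum(f*x^2)/n - m^2), for a supplied mean m."""
    return sqrt(sum_fx2 / n - m * m)

sd_full = sd_using(mean)                     # full-accuracy mean carried through
sd_rounded = sd_using(float(f"{mean:.3g}"))  # mean rounded to 3 s.f. first

print(f"{mean:.3g}", f"{sd_full:.3g}", f"{sd_rounded:.3g}")  # 25.8 11.1 11.2
```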

Very few candidates scored any marks for (ii)(b), failing to realise that they needed to consider both the mid-
point and frequency of the merged group, and to discuss how these compared with the original values.

Answers: (i)(a) Heights 6, 32, 3,7 2,1 20.5, 3; (b) 60-under 70 or 80-under 100; (c) 8;
(ii)(a) 49.8, 21.3.

Question 11

This question provided a plentiful source of marks for most candidates. However, the most common errors
need to be mentioned. In (a)(ii) it was necessary to state explicitly in one way or another that hockey was
not included; it could not be implied. In (a)(iii) the question required the showing of all working leading to the
result, and so, if no working were shown, no marks were awarded, even if the result was correct.

In (b)(i) some candidates lost a mark needlessly through not stating the angle to the nearest degree. There
was the suggestion in a few cases of candidates appearing not to have realised that the question continued
onto a third page, despite the clear instruction 'turn over' at the bottom of the second page, and the fact that
by that stage only eight marks had been awarded.

Answers: (a)(i) 15; (iv) 6 and 11, 17 and 39, 34 and 3; (b)(i) 85; (ii) 5.6; (iii) 770.
STATISTICS


Paper 4040/02
Paper 2


General Comments

The overall standard of work submitted was a little below that of last year, this appearing to be due mainly to
the general standard of answers to questions on certain particular topics not being as good as is customarily
the case. This was particularly true of the work on the transformation of data in Question 7. The
combination of this question being both unpopular and not answered very well by most who attempted it, and
the customary unpopularity of the question on expectation, meant that many candidates struggled to give
four complete answers to Section B questions, and indeed, some only attempted three.

Centres should be aware that any topic may occur in either Section of the paper. Topics should not be
omitted from teaching in the belief that they will always occur in Section B and are therefore 'optional'.

Two customary causes of loss of marks were again prevalent, despite being mentioned in these reports year
after year. Numerous examples were encountered of marks being lost through final answers not being given
to the levels of accuracy required by questions, when this was stated. Some of the errors seen suggest that
there may be candidates who are unaware of the difference between 'significant figures' and 'decimal
places'. Another area which showed no sign of improvement, and resulted in the widespread needless loss
of marks, was that in parts of questions requiring comments, many candidates continued to produce general
comments that had obviously been learned by rote from a textbook or similar source, when what was clearly
asked for was a specific comment in the context of the question. These errors, and others, continue to be
due to candidates not reading questions sufficiently carefully, something which was specifically mentioned in
the report on this paper last year. Marks are only awarded for correct answers to what a question asks, not
to answers to questions which candidates wish to be asked.

The question paper for this session was in an entirely new format, and there are a number of matters of
which Centres should make candidates fully aware. It is the intention that answers should be presented on
the question paper. Extra paper (lined or graph) should not be issued or requested as a matter of course at
the start of the examination, but only if a candidate genuinely requires it, e.g. to complete a written answer or
calculation, or to re-draw a graph on which a mistake has been made. Further, candidates should ensure
that where an answer line is clearly given, or where a question requires an answer to be placed in a
particular position, e.g. in a table, this is where they write their final answer, as that is what will be marked as
such. Because this session was the first in which the new format was used for this subject, a correct final
answer was allowed to score wherever it was seen, but this will not generally be the case in future.


Comments on Individual Questions

Section A

Question 1

Any candidate who mentioned 'extending' the class half a gram at either end, or used the expression 'true
class limits', or even just quoted the values 99.5 and 109.5, was taken to have appreciated the point being
looked for. Some candidates used the word 'boundary', possibly trying to make the same point, but did not
explain sufficiently clearly exactly what they meant by a boundary.

Almost all candidates obtained the correct cumulative frequencies.

It was surprising how many candidates, having scored the mark in (i) by one correct means or another, then
failed to apply the same principle correctly to the upper quartile class in (iii) in order to obtain its true lower
limit and width. Many candidates did not determine the upper quartile item correctly; for a grouped
distribution with a total frequency of only 79, this is the 60th item.

Answers: (ii) 2, 15, 35, 57, 74, 79; (iii) 133.0.
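The interpolation in (iii) can be sketched as follows. The cumulative frequencies are the published answer to (ii). The upper quartile class's stated limits are an assumption (130 to 149 g, so true limits 129.5 and 149.5), chosen because, with the true-limit principle from (i), they reproduce the published answer. Taking the item position as 3(n + 1)/4 is one convention that gives the 60th item named in the report.

```python
# Published cumulative frequencies from (ii); total frequency 79.
cum_freqs = [2, 15, 35, 57, 74, 79]
n = cum_freqs[-1]

uq_item = 3 * (n + 1) / 4  # the 60th item

# The upper quartile class is the first whose cumulative frequency reaches that item.
k = next(i for i, cf in enumerate(cum_freqs) if cf >= uq_item)
cf_before = cum_freqs[k - 1] if k > 0 else 0
freq = cum_freqs[k] - cf_before

# Assumption (not stated in the report): the class is 130-149 g,
# so its true lower limit is 129.5 and its true width is 20.
true_lower, true_width = 129.5, 20

uq = true_lower + (uq_item - cf_before) / freq * true_width
print(round(uq, 1))  # 133.0
```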

Question 2

While quite a number of candidates realised that the sum of the probabilities of A and B had to be
considered, only a few gave the correct explanation. It was necessary to comment that the sum was greater
than (not just not equal to) 1, and therefore the two outcomes had to have an intersection, as they were two
possible outcomes of the same experiment.

The number of correct solutions to (ii) seen was very disappointing, given that it was simply an application of
the basic addition rule of probability for two events which are not mutually exclusive; that is,
P(A ∪ B) = P(A) + P(B) - P(A ∩ B).

It was most noticeable that the candidates who scored most marks in (iii) were those who drew a Venn
diagram or some similar means of representing the probabilities of the different possible combinations of
outcomes. (iii)(a) required [1 - P(A ∪ B)] to be obtained. Many, however, just multiplied P(A') by P(B')
without any consideration of whether the two outcomes were independent, which they were not.

Many candidates regarded (iii)(b) as being simply a repetition of (ii), a very unlikely scenario in an
examination question! For those who had drawn a diagram, the correct method was obvious: P(A ∩ B)
should be subtracted from the answer to (ii).

Answers: (ii) 0.85; (iii)(a) 0.15; (b) 0.45.
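The calculations in (i) to (iii) can be checked with exact fractions. The report does not quote P(A) and P(B), so the values below are hypothetical. P(A ∩ B) = 0.4 is inferred from the published answers (0.85 - 0.45), and any P(A), P(B) summing to 1.25 reproduces them. The last line shows the wrong answer obtained by treating A and B as independent.

```python
from fractions import Fraction as F

# Hypothetical P(A), P(B); P(A and B) = 0.4 is inferred from the published answers.
p_a, p_b, p_ab = F("0.7"), F("0.55"), F("0.4")

assert p_a + p_b > 1  # so A and B must intersect

p_union = p_a + p_b - p_ab      # (ii) addition rule
p_neither = 1 - p_union         # (iii)(a)
p_exactly_one = p_union - p_ab  # (iii)(b)

# The common error in (iii)(a): multiplying P(A') by P(B') as if independent.
p_neither_if_independent = (1 - p_a) * (1 - p_b)

print(float(p_union), float(p_neither), float(p_exactly_one))  # 0.85 0.15 0.45
print(float(p_neither_if_independent))                         # 0.135, not 0.15
```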

Question 3

A majority of candidates correctly identified the simple random (A) and systematic (D) methods in (i). These
are unbiased methods of sampling, yet very few correctly identified both as such in (ii). E was generally, and
correctly, identified as biased in (ii), and a valid reason for this given in (iii). This left methods B and C.
Almost no candidates at all identified the difference between the two, and the reason why it meant that C
was unbiased, but B biased. For each of the two-digit random numbers 00-99, the remainders 00-15 occur
three times, but those 16-41 only twice, and so under B the claims did not have an equal chance of
selection, this being the reason why the method was biased. Under C the third occurrence of remainders in
the range 00-15 was eliminated, hence it being unbiased.

Answers: (ii) Unbiased A C D, Biased B E.
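The bias in method B can be verified by counting remainders. The sketch assumes, as the remainders 00-41 imply, that each two-digit random number is divided by 42 (one per claim) and the remainder used. Discarding 84-99 is one way to express method C's removal of the third occurrence of remainders 00-15.

```python
from collections import Counter

# Assumption: 42 claims numbered 00-41; a two-digit random number 00-99
# is divided by 42 and the remainder identifies the claim.
N_CLAIMS = 42

# Method B: every random number 00-99 is used.
method_b = Counter(r % N_CLAIMS for r in range(100))

# Method C: numbers 84-99 (the third occurrence of remainders 00-15) are discarded.
method_c = Counter(r % N_CLAIMS for r in range(100) if r < 2 * N_CLAIMS)

print(method_b[0], method_b[15], method_b[16], method_b[41])  # 3 3 2 2
print(set(method_c.values()))                                 # {2}
```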

Question 4

This was the best-answered question on the paper, with many scoring full marks and almost all more than
half-marks. Surprisingly, the answer most frequently given incorrectly was that to (iv): candidates who gave it
as false considered the range to be an appropriate measure of dispersion, despite the smallest value
being further from the next-smallest than the next-smallest was from the largest.

Answers: (i) False; (ii) False; (iii) False; (iv) True; (v) True; (vi) False.

Question 5

(i) and (ii) required bar charts to be drawn which were correct and fully annotated. Many candidates lost
marks because of incomplete annotation. The vertical axis of bar charts should start from 0 and be linear
and unbroken. In (ii) some candidates presented a percentage sectional bar chart, for which 1 mark was
awarded if fully correct. (iii) required valid comments in the context of the question; general comments about
bar charts were awarded no marks.

Question 6

This provided probably the clearest example of candidates not reading questions sufficiently carefully. (i)
specifically asked for the mean and standard deviation of X (the result of applying an assumed mean to the
data). Yet an overwhelming majority of candidates (some of whom did obtain the values of X, which were
used in their calculations) gave as their answers the mean and standard deviation of the question's data.
The two standard deviations are of course equal, but credit was only given to solutions dealing explicitly with
values of X. In (ii) the marks were awarded both to the correct results, and to those obtained by correct
follow-through from (i).

Answers: (i) 0.63, 0.43; (ii) 299.63, 0.43.
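The relationship between (i) and (ii) can be sketched: subtracting an assumed mean shifts the mean by that amount but leaves the standard deviation unchanged. The measurements are hypothetical. The assumed mean of 299 is inferred from the published answers (299.63 - 0.63). `pstdev` uses divisor n, which is assumed to be the convention expected here.

```python
from math import isclose
from statistics import mean, pstdev

# Hypothetical measurements (not the question's data).
data = [299.2, 300.1, 299.5, 300.4, 299.9, 300.7]
# Assumed mean of 299, inferred from the published answers.
ASSUMED_MEAN = 299

coded = [d - ASSUMED_MEAN for d in data]  # the values of X

mean_x, sd_x = mean(coded), pstdev(coded)  # what (i) asks for
print(round(mean_x, 2), round(sd_x, 2))

# Decoding for (ii): add the assumed mean back to the mean; the sd is unchanged.
print(isclose(mean_x + ASSUMED_MEAN, mean(data)))     # True
print(isclose(sd_x, pstdev(data), rel_tol=1e-9))      # True
```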

Section B

Question 7

The three parts of this question involved different approaches to the scaling/transformation of data. Not only
did the question as a whole appear to be extremely unpopular, being attempted by fewer candidates than
attempted the question on expectation (which is unusual, as expectation is customarily by far the least
popular topic on papers in which it occurs), but attempts at (a) and (b)(i) rarely scored more than one mark.
Attempts at (b)(ii) were, in general, much better.

In (a) hardly any candidates realised that the basic fee was the equivalent of an assumed mean, and the
per-subject fee the equivalent of a scale factor. Hence the new mean was obtained by subtracting the 'old
assumed mean', multiplying by the 'scale factor', and then adding back the 'new assumed mean', i.e.
(230 - 20)(35/30) + 25 = 270. As a standard deviation is not affected by the addition or subtraction of a
constant, the new standard deviation was simply 90 × (35/30) = 105. An alternative approach was to base
calculations on the number of subjects to which the mean value of $230 corresponded.

(b)(i) simply required the scaling of the candidate's marks from the raw maximum to the standardised
maximum.

Answers: (a) 270, 105; (b)(i) 116; (ii) 115.
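The transformation in (a) can be sketched as a function that undoes the old coding and applies the new one. The fee figures are read off the report's own working: old scheme $20 plus $30 per subject, new scheme $25 plus $35 per subject. The intermediate number of subjects (7) corresponds to the alternative approach mentioned above.

```python
def rescale(value, old_base, old_scale, new_base, new_scale):
    """Map a fee under the old scheme (old_base + old_scale per subject)
    to the new scheme (new_base + new_scale per subject)."""
    subjects = (value - old_base) / old_scale  # undo the old coding
    return new_base + new_scale * subjects     # apply the new coding

new_mean = rescale(230, 20, 30, 25, 35)
# A standard deviation is unaffected by adding a constant; only the scale factor applies.
new_sd = 90 * 35 / 30

print(new_mean, new_sd)  # 270.0 105.0
```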

Question 8

Of the minority of scripts on which this question had been attempted, answers fell into the customary three
categories for questions on this topic. There were candidates who interpreted the context correctly, and
applied a valid method throughout, scoring full or nearly full marks. There were those who got a short way
into the question, obtaining the basic probabilities and possibly the sequence of outcomes correctly, before
having no idea how to proceed further. Finally there were those who had little idea of the context, possibly
just getting as far as basing their probabilities on the number of sectors in the wheel rather than the sizes of
the sector angles.

About half of those who answered all or most of the parts of the question correctly lost the final mark through
failure to give their result to the nearest cent. $0.6 is not to the nearest cent, $0.60 is.

The final parts of the question required, at some point, consideration of the entry fee to the game, and this
was permitted to count whether it was done in (iv), (v) or (vi).

Answers: (i) 0.1, 0.5, 0.4; (ii), (iii) and (iv):

Sequence of outcomes (ii)   Probability (iii)   Amount won ($) (iv)
W                           1/10 = 0.1          5
L                           1/2 = 0.5           0
SA W                        1/25 = 0.04         6
SA L                        1/5 = 0.2           1
SA SA W                     2/125 = 0.016       7
SA SA SA                    8/125 = 0.064       3
SA SA L                     2/25 = 0.08         2

(v) 0.08; (vi) Loss of 60 cents.
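The expected outcome can be computed exactly from the published table, then rounded to the nearest cent with two decimal places kept. The entry fee of $2 is an assumption: the report does not state it, but it is the value consistent with the published loss of 60 cents.

```python
from decimal import Decimal, ROUND_HALF_UP
from fractions import Fraction as F

# Outcome table from the published answers: (sequence, probability, amount won in $).
table = [
    ("W",        F(1, 10),  5),
    ("L",        F(1, 2),   0),
    ("SA W",     F(1, 25),  6),
    ("SA L",     F(1, 5),   1),
    ("SA SA W",  F(2, 125), 7),
    ("SA SA SA", F(8, 125), 3),
    ("SA SA L",  F(2, 25),  2),
]

assert sum(p for _, p, _ in table) == 1  # the outcomes are exhaustive

expected_win = sum(p * amount for _, p, amount in table)

# Assumption: an entry fee of $2.
ENTRY_FEE = 2
net = expected_win - ENTRY_FEE

def to_cents(x):
    """Round an exact Fraction to the nearest cent, keeping two decimal places."""
    return (Decimal(x.numerator) / Decimal(x.denominator)).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP)

print(float(expected_win), to_cents(net))  # 1.404 -0.60
```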

Question 9

This was a very 'standard' index number question, and many candidates scored quite well on it, but a
considerable number of marks were lost for the two reasons mentioned in the general comments: lack of
accuracy, and non-contextual comments.

In (i) there were frequent incorrect attempts at rounding values to the nearest 50, for example 1333.8 was
often seen rounded to 1300 rather than the correct 1350.

In (iii) very few obtained the correct fuel price relative of 120. There were, as has been the case with similar
questions in the past, two very common causes of loss of marks in (v). Both involve the previously-
mentioned presentation of general non-contextual comments which have clearly been learned by rote, with
little thought being given as to their relevance. One involves superficial comment, such as 'the weights may
have changed'. What needs to be specified is the reason why the weights may have changed, and what has
caused such a change. The other involves comments which relate solely to prices, the most common being
some remark about inflation. What candidates who write this have obviously not realised is that inflation is
the result of rising prices, and the calculations they have already carried out have been based on new prices,
i.e. account has already been taken of any inflation which might have occurred. The simplest way for
candidates to make relevant comments which will earn marks is to refer, in context, to the quantities
involved. Examples of perfectly valid comments given here by some candidates were 'he might have
travelled a different distance in 2007 than in 2004' and 'he might have bought a new car which had a
different fuel consumption than his old one'.

Answers: (i) 450, 1050, 1350; (ii) 3:7:9; (iii) 114.2; (iv) 3220.
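Two of the mechanical steps in this question can be sketched: rounding to the nearest 50 (the error seen in (i)) and reducing the rounded amounts to the weights ratio in (ii). The `weighted_index` helper shows the general weighted-mean form of an index with hypothetical inputs only; apart from the fuel relative of 120, the report does not give the individual price relatives.

```python
from decimal import Decimal, ROUND_HALF_UP
from functools import reduce
from math import gcd

def round_to_nearest(x, step=50):
    """Round x to the nearest multiple of step, with halves rounded up."""
    q = (Decimal(str(x)) / step).quantize(Decimal(1), rounding=ROUND_HALF_UP)
    return int(q * step)

print(round_to_nearest(1333.8))  # 1350, not 1300

# Published answers to (i): the rounded amounts reduce to the ratio in (ii).
amounts = [450, 1050, 1350]
common = reduce(gcd, amounts)
print([a // common for a in amounts])  # [3, 7, 9]

def weighted_index(relatives, weights):
    """Weighted mean of price relatives (generic helper)."""
    return sum(r * w for r, w in zip(relatives, weights)) / sum(weights)
```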

Question 10

This question was, in general, not answered as well as 'pure probability' Section B questions have been in
the past. In particular, (vi) was answered correctly only very rarely; even many high-scoring candidates who
gained the other 14 marks with little difficulty were unable to answer (vi) correctly.

An error which cost some candidates a considerable number of marks involved misinterpretation of the
question. The probabilities were given in the form of percentages, and such candidates took this as meaning
that they had to consider 100 vehicles in a 'without replacement' scenario. While this enabled some method
marks to be obtained, all but 'follow-through' accuracy marks were lost.

As had been the case with Question 2 (also a probability question) earlier in the paper, there were examples
here of some candidates believing that two different parts of the question were asking the same thing.
Again, this was the result of the question not being read sufficiently carefully. Some failed to identify the
difference between (iii) and (v). In both cases two vehicles did turn left, but whereas in (iii) the other vehicle
turned right, in (v) it could go in any direction other than left. Others regarded (i) and (vi) as identical,
despite it being given in (vi) that all three went in the same direction, whereas in (i) it was not.

Answers: (i) 0.008; (ii) 0.142; (iii) 0.165(375); (iv) 0.189; (v) 0.238(875); (vi) 0.0563.
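The distinctions drawn above, (iii) against (v) and (i) against (vi), are clearer when each probability is written out. The direction probabilities below are not quoted in the report. They are an inference (35% left, 45% right, 20% for the third direction, labelled S here) because they reproduce all six published answers, so treat them as an assumption. Each vehicle is treated independently, not as a draw without replacement from 100 vehicles.

```python
from fractions import Fraction as F

# Assumption: direction probabilities inferred from the published answers.
P = {"L": F(35, 100), "R": F(45, 100), "S": F(20, 100)}

all_s = P["S"] ** 3                                        # (i)
all_same = sum(p ** 3 for p in P.values())                 # (ii)
two_left_one_right = 3 * P["L"] ** 2 * P["R"]              # (iii): the other turns right
one_each_way = 6 * P["L"] * P["R"] * P["S"]                # (iv): 3! orders
two_left_other_not_left = 3 * P["L"] ** 2 * (1 - P["L"])   # (v): the other goes anywhere but left
s_given_same = all_s / all_same                            # (vi): conditional on (ii)

for value in (all_s, all_same, two_left_one_right, one_each_way,
              two_left_other_not_left, s_given_same):
    print(float(value))
```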

Question 11

This was the best-answered question in Section B, being very 'standard' on the topic of moving averages.
However, there are a number of points of detail on which candidates are still losing individual marks
needlessly.

When using the word 'even' in an explanation of why moving average values need to be centred, it is
necessary to state exactly what it is that is 'even', i.e. that the number of values of which a cycle is comprised
is even, or that the period of the cycle is even. Just the mention of the word itself does not score.

In (iii) many of the explanations of why z could not be calculated were too ambiguous to be credited. As a
minimum it needs to be specified exactly what further information would be required for z to be calculable.

Where a question specifies that a graph axis should be started at a particular value, as in (iv), then any axis
which does not do so will automatically lose a mark.

Although (v) specified the drawing of a single straight line, some candidates nevertheless joined consecutive
individual points by a 'zig-zag' line.

In (vi) it was pleasing to see that a majority of candidates had learned that the quarterly components should
sum to zero.

It became obvious that in a few Centres candidates did not know the procedure to use in (vii), i.e. to read the
appropriate value from their trend line, and then to add the relevant quarterly component to it (in this case,
of course, the component was negative, and so its numerical value had to be subtracted).

Answers: (ii) 246, 483, 59.375; (vi) 6.9.
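The centring step, the zero-sum adjustment in (vi) and the forecasting procedure in (vii) can be sketched in Python. The data are hypothetical, quarters are indexed 0 to 3, and the trend value passed to `forecast` stands in for a reading taken from the fitted trend line, which this sketch does not draw.

```python
from statistics import mean

# Hypothetical quarterly data for three years (not the question's data).
sales = [52, 60, 70, 55, 56, 64, 75, 59, 60, 68, 79, 63]
PERIOD = 4  # the cycle contains an even number of values, so centring is needed

# 4-quarter moving totals fall between quarters ...
totals = [sum(sales[i:i + PERIOD]) for i in range(len(sales) - PERIOD + 1)]
# ... so consecutive pairs are added and divided by 8 to centre each value on a quarter.
centred = [(a + b) / (2 * PERIOD) for a, b in zip(totals, totals[1:])]
offset = PERIOD // 2  # centred[j] is aligned with sales[j + offset]

# Average deviation from the trend for each quarter ...
deviations = {q: [] for q in range(PERIOD)}
for j, trend in enumerate(centred):
    i = j + offset
    deviations[i % PERIOD].append(sales[i] - trend)
averages = {q: mean(d) for q, d in deviations.items()}

# ... adjusted so that the quarterly components sum to zero.
correction = mean(averages.values())
components = {q: a - correction for q, a in averages.items()}

def forecast(trend_value, quarter):
    """Trend-line reading plus that quarter's component
    (a negative component is, in effect, subtracted)."""
    return trend_value + components[quarter]

print(forecast(64.0, 0))  # 64.0 is a hypothetical trend-line reading
```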