Using Bookmaker Odds To Predict The Final Result of Football Matches
Jacek Grekow
Bialystok University of Technology
Using bookmaker odds to predict the final result
of football matches
Abstract. Many online bookmakers accept bets in virtually every field of
sport, from football to chess. The vast majority of online bookmakers
operate on standard principles and establish the odds for sporting events.
These odds change constantly in response to the bets placed by gamblers,
and the size of a change is associated with the amount of money bet on a
given outcome. The purpose of this paper was to investigate the possibility
of predicting how upcoming football matches will end based on changes in
bookmaker odds. A number of different classifiers that predict the final
result of a football match were developed. The results obtained confirm
that the knowledge of a group of people about football matches, gathered in
the form of bookmaker odds, can be successfully used for predicting the
final result.
1 Introduction
The purpose of this paper is to investigate the possibility of predicting how
upcoming sporting events will end based on changes in bookmaker odds. Football
was the sport chosen for observation of changes in bookmaker odds. It should be
assumed that if a gambler risks his own money, he has reasons to place such a
bet. The greater the amount of money staked, the greater the change in the odds
and the greater the possibility that the bet was based on factual knowledge about
the competing teams, the status of the players, games played, etc. Predictions of
the result can be based on such types of information. If the research provides
promising results, one might be tempted to build a decision-making system that
could allow predicting final results based on observation of fluctuations of odds.
2 Previous Works
There are several papers that have dealt with similar problems of analysis and
prediction of sporting event results. They are based on various types of data
such as expert knowledge, results of previous matches, rankings of teams or
bookmaker odds.
2 Karol Odachowski, Jacek Grekow
The papers most closely related to this one address the problem of using
data mining techniques to predict the final result of a sports match.
Zdravevski and Kulakov [1] analyzed data from National Basketball
Association (NBA) seasons to develop an expert system that predicts the
winner of a game. The analyzed data contained detailed statistics of each
game played during a season. The best accuracy (67%) was achieved by a
classifier built using a multinomial logistic regression model with a ridge
estimator. Miljkovic et al. [2] present a system that uses data mining
techniques to predict the outcomes of basketball games in the NBA. The
Naive Bayes method is used to predict the game result. Besides the actual
result, the system calculates the spread for each game by using multivariate
linear regression. Each game was described with attributes composed of the
standard basketball statistics (field goals made, field goals attempted,
3-pointers, free throws, rebounds, blocked shots, fouls, etc.) and
information about league standings (number of wins and losses, home and
away wins, current streak, etc.). The system correctly predicted the winners
of about 67% of the matches. McCabe and Trevathan [3] used artificial
neural networks to predict games. They used attributes that indicate the
quality of a particular team and achieved 54.6% correct predictions for the
English Football Premier League and 67.5% for Super Rugby. Smith et al. [4]
used a Bayesian classifier to predict Cy Young Award winners in American
baseball. The model was created from player statistics collected for
baseball seasons from 1967 to 2006. The accuracy of the Bayesian classifier
was more than 80%.
4 Input Data
The input data describing the changes in bookmaker odds was obtained from the
PinnacleSports [5] website, which publishes information about sporting
events in the clear form of an XML document. The XML file can be found at
Fig. 1. Sample chart of odds changes (home, away, draw) in 1-X-2 type bets
Fig. 2. Odds changes for the Racing Genk vs. Loceren (2:1) match
We observed that the closer to the start of the match, the more changes
in the odds occurred. Figure 1 illustrates such a situation. It charts the
values of the odds (Y axis) for the home team, the visiting team, and a draw
during the last 10 hours (X axis) before the Tottenham Hotspur vs.
Chelsea match, which was held on 12th December 2010 and ended with a 1-1
draw. Figure 2 presents another example of changes in bookmaker odds for the
Racing Genk vs. Loceren (2:1) match, which was held on 3rd April 2011.
We decided that it would be justified to divide the sampling period into
several smaller ones, because the irregularity of the distribution of the changes
may indicate that the entire sampling period does not have the same effect on the
final result. For each period we generated the same set of features. Additionally,
the entire sampling period was also taken into account. This allowed us to extract
general information about the match. Figure 3 shows a schematic diagram of such
a division.
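The division of the sampling period can be illustrated with a short sketch. The Python code below is only a minimal illustration under stated assumptions, not the implementation used in the study: it assumes that each odds sample is a pair (hours elapsed since the start of sampling, decimal odds) and that the period is cut into equal-width intervals. The number of intervals, the function names and the feature names are hypothetical, and only a few simple features are computed (minimum, maximum, difference between the first and last sample, largest drop between adjacent samples).

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (hours since start of sampling, decimal odds)


def interval_features(samples: List[Sample]) -> Dict[str, float]:
    """Summarize one run of odds samples, ordered oldest to newest."""
    values = [odds for _, odds in samples]
    # Drop between adjacent samples: positive when the odds fall.
    drops = [a - b for a, b in zip(values, values[1:])]
    return {
        "min": min(values),
        "max": max(values),
        "first_minus_last": values[0] - values[-1],
        "largest_drop": max(drops, default=0.0),
    }


def split_and_describe(samples: List[Sample],
                       total_hours: float = 10.0,
                       n_intervals: int = 5) -> Dict[str, Dict[str, float]]:
    """Same feature set for every sub-interval and for the whole period."""
    width = total_hours / n_intervals
    buckets: List[List[Sample]] = [[] for _ in range(n_intervals)]
    for t, odds in samples:
        # A sample taken exactly at the end falls into the last interval.
        idx = min(int(t / width), n_intervals - 1)
        buckets[idx].append((t, odds))
    features = {"whole": interval_features(samples)}
    for i, chunk in enumerate(buckets):
        if chunk:  # intervals without any odds change are skipped
            features[f"interval_{i}"] = interval_features(chunk)
    return features
```

Calling `split_and_describe` once per bet type (home, away, draw) would give one feature vector per match, with the "whole" entry carrying the general information about the match.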
in this file is the decision class, which is the result of the match and takes
nominal values from the set: Win-home, Win-away, Win-draw. It defines the
final outcome of the match. For the input data prepared in this manner,
classifiers that predict the final result were developed. The WEKA data
mining software [7] was used to analyze the data and develop the classifiers.
The classifiers were evaluated with 10-fold cross-validation (CV-10).
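The 10-fold evaluation scheme can be sketched as follows. This is a minimal Python illustration rather than WEKA's own procedure: the stratified, round-robin assignment of instances to folds is an assumption of the sketch, and `train_and_predict` is a hypothetical caller-supplied function that trains on the given training indices and returns predicted labels for the test indices.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Return k lists of instance indices with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the indices round-robin so every fold gets a similar share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def cross_validate(labels, train_and_predict, k=10, seed=0):
    """Pooled accuracy over all held-out instances of the k folds."""
    correct = 0
    for test_idx in stratified_folds(labels, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        predictions = train_and_predict(train_idx, test_idx)
        correct += sum(p == labels[i] for p, i in zip(predictions, test_idx))
    return correct / len(labels)
```

Every instance is used for testing exactly once, so the returned value is directly comparable to an accuracy computed from a single confusion matrix over the full data set.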
5 Experiment Results
To make the data collected from the PinnacleSports and Betfair sites useful
for data mining purposes, they had to be preprocessed by transforming and
cleaning the collected information. The overall objective was to minimize
the so-called GIGO (garbage in, garbage out) effect, that is, to reduce the
"garbage" that enters the model so that the model produces fewer
incorrect results [8]. For this purpose, the study included only those events that
had odds in the full 10-hour sampling period and had not been postponed. An
equal number of matches for each decision class was included in order to
balance the number of instances from each class [9]. Thus a total of 1116 sample
football games were selected, including: 372 matches that ended with a win for
the home team; 372 matches that ended with a win for the away team; 372
matches that ended with a draw.
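The selection step described above can be sketched in a few lines of Python. This is a hedged illustration only: the dictionary keys `result`, `postponed` and `hours_of_odds` are hypothetical names, and random undersampling of the larger classes is an assumption of the sketch, since the text states only that an equal number of matches per class was included.

```python
import random
from collections import defaultdict


def select_balanced_sample(matches, required_hours=10.0, seed=0):
    """Filter usable matches, then keep an equal number per result class."""
    usable = [
        m for m in matches
        if not m["postponed"] and m["hours_of_odds"] >= required_hours
    ]
    by_class = defaultdict(list)
    for match in usable:
        by_class[match["result"]].append(match)
    if not by_class:
        return []
    # The smallest class sets the number of matches kept per class.
    per_class = min(len(group) for group in by_class.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_class.values():
        # Assumption: surplus matches are discarded at random.
        balanced.extend(rng.sample(group, per_class))
    return balanced
```

With three classes and 372 matches in the smallest one, such a procedure would yield the 1116-match sample described above.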
Six classification algorithms were selected: BayesNet, SMO, LWL,
EnsembleSelection, DecisionTable and SimpleCart [7]. For attribute selection
the following attribute evaluators and search methods were used:
CfsSubsetEval with BestFirst, CfsSubsetEval with LinearForwardSelection and
PrincipalComponents with Ranker. The highest accuracy rate of 46.51% was
achieved by the DecisionTable algorithm. The confusion matrix for the
created model is presented in Table 1.
Table 1. Confusion matrix of classifier for a win for the home team, the away team or
a draw
a b c ← classified as
260 65 47 a = Win-home
154 154 64 b = Win-away
173 94 105 c = Win-draw
Matches that ended with a win for the home team (Win-home class) are classified
very well in comparison with the two other classes. Matches that ended with a
win for the away team were classified worse: as many of them were mistakenly
classified as a win for the home team (154) as were classified correctly (154).
The worst is the classification of matches that ended in a draw, which are
mostly classified incorrectly as a win for the home or the away team. This is
because a draw is an intermediate class between the two results.
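The figures discussed above can be recomputed directly from Table 1. The short Python sketch below (the function name is hypothetical) derives the overall accuracy and the per-class recall, i.e. the share of matches of each actual class that were classified correctly, from the rows of the confusion matrix.

```python
def accuracy_and_recall(matrix, class_names):
    """Rows are actual classes, columns are predicted classes."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    recall = {
        name: matrix[i][i] / sum(matrix[i])
        for i, name in enumerate(class_names)
    }
    return correct / total, recall


# Values copied from Table 1.
table_1 = [
    [260, 65, 47],    # a = Win-home
    [154, 154, 64],   # b = Win-away
    [173, 94, 105],   # c = Win-draw
]
accuracy, recall = accuracy_and_recall(
    table_1, ["Win-home", "Win-away", "Win-draw"]
)
print(f"accuracy = {accuracy:.2%}")        # accuracy = 46.51%
for name, value in recall.items():
    print(f"{name}: {value:.2%}")
# Win-home: 69.89%
# Win-away: 41.40%
# Win-draw: 28.23%
```

The same function applies unchanged to the two-class confusion matrices of the binary classifiers.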
Binary Classifier for a Win for the Home Team. When developing a
classifier for the home team win, just as before (section 4.1), we used 1116 sample
football matches. Matches which ended with a win for the home team remained
unchanged, but the matches that ended with a win for the visiting team and
a draw were combined to form a new class of 744 matches. Then, we randomly discarded 372
of these matches to make the number of instances in each class equal. Below is the
size of the two classes: 372 matches that ended with a win for the home team
(Win-home class); 372 matches that ended with a win for the away team or a
draw (Win-no-home class).
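The construction of the two-class data set can be sketched as follows. This Python fragment is an illustration under assumptions: matches are represented as dictionaries with a hypothetical `result` key, and the function name and the fixed random seed are choices made for the sketch, not details taken from the study.

```python
import random


def make_binary_dataset(matches, positive="Win-home",
                        negative="Win-no-home", seed=0):
    """Keep the positive class, merge all other results, then balance."""
    rng = random.Random(seed)
    kept = [m for m in matches if m["result"] == positive]
    # Copy each remaining match with the merged class label.
    merged = [dict(m, result=negative)
              for m in matches if m["result"] != positive]
    # Randomly discard surplus matches so both classes have equal size.
    size = min(len(kept), len(merged))
    return rng.sample(kept, size) + rng.sample(merged, size)
```

Passing `positive="Win-away"` or `positive="Win-draw"` (with a matching `negative` label) would build the data sets for the other binary classifiers in the same way.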
Six classification algorithms were selected: BayesNet, SMO, LWL, Bagging,
DecisionTable, and LADTree. For attribute selection the following attribute
evaluators and search methods were used: CfsSubsetEval with BestFirst,
ConsistencySubsetEval with GreedyStepwise, WrapperSubsetEval (classifier:
Bagging) with BestFirst. The highest accuracy rate of 70.56% was achieved by
the Bagging algorithm, which obtained this result after feature selection
(WrapperSubsetEval with BestFirst) and after discretization of attributes.
The confusion matrix for the created model is presented in Table 2.
Table 2. Confusion matrix of binary classifier for a win for the home team
a b ← classified as
229 143 a = Win-home
76 296 b = Win-no-home
Binary Classifier for a Win for the Away Team. Predicting a win for the
away team proved to be somewhat more difficult than predicting a win for the
home team. The classifiers achieved worse results, but as in the previous
experiments, a positive influence of feature selection and data
discretization was observed. The highest accuracy rate of 65.46% was
achieved by the NaiveBayes algorithm. The confusion matrix for the created
model is presented in Table 3.
Table 3. Confusion matrix of binary classifier for a win for the away team
a b ← classified as
244 128 a = Win-no-away
129 243 b = Win-away
Table 4. Confusion matrix of binary classifier for a draw
a b ← classified as
196 176 a = Win-no-draw
144 228 b = Win-draw
Because predicting a draw is difficult, we decided to perform additional
tests on data that do not contain matches ending in a draw. This allowed us
to create a classifier that predicts a win for the home or the away team.
This information can be used to place Asian handicap bets, where in the case
of a draw the stake is returned.
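The settlement rule mentioned above, in which the stake is returned when the match ends in a draw, can be made concrete with a small sketch. The Python function below is hypothetical and assumes decimal odds, where a winning bet returns the stake multiplied by the odds; it covers only the refund-on-draw case described in the text.

```python
def settle_refund_on_draw(stake, decimal_odds, predicted, actual):
    """Amount returned to the bettor for a bet that is refunded on a draw."""
    if actual == "Win-draw":
        return stake                  # draw: the stake is returned
    if predicted == actual:
        return stake * decimal_odds   # correct prediction: stake plus profit
    return 0.0                        # wrong prediction: the stake is lost


# Example with a hypothetical stake of 10 units at decimal odds of 1.9.
for outcome in ("Win-home", "Win-draw", "Win-away"):
    returned = settle_refund_on_draw(10.0, 1.9, "Win-home", outcome)
    print(outcome, round(returned, 2))
```

Under this rule a classifier that only distinguishes a home win from an away win is sufficient, because a draw neither wins nor loses the bet.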
Matches that ended in a draw were discarded from the 1116 football matches
sample set. Matches that ended with a win for the home or the away team were
left unchanged. Below is the size of the two classes: 372 matches that ended with
a win for the home team (Win-home class); 372 matches that ended with a win
for the away team (Win-away class).
Six classification algorithms were selected: BayesNet, VotedPerceptron, IBk,
Bagging, DecisionTable, and LADTree. For attribute selection the following
attribute evaluators and search methods were used: CfsSubsetEval with
BestFirst, ConsistencySubsetEval with BestFirst, WrapperSubsetEval
(classifier: NaiveBayes) with BestFirst.
Removing matches that ended in a draw from the sample data set proved to
be very beneficial. Classifiers predicting a win for the home or the away team
obtained the highest accuracy of all the experiments conducted. The classifier
that proved to be the most accurate was an algorithm based on the Bayesian
Table 5. Confusion matrix of classifier for win for the home or the away team
a b ← classified as
298 74 a = Win-home
147 225 b = Win-away
The evaluation performed on the classifiers built for 1-X-2 type bets showed
that a draw is the most difficult to predict. This result reflects the reality of
football, because the draw class represents the intermediate odds between a win
for the home and the away team. Tests showed that features describing a draw
contain many similarities to those relating to a win for the home or the away
team. The confusion matrices obtained on the standard data set show
that most matches which ended in a draw are incorrectly classified as a win for
the home team. This is because in most cases the home team is the
favorite (has the lowest odds).
In the case of binary classifiers, the accuracy of predicting a win for the home
team and the away team is promising. The classifier of a win for the home team
achieved an accuracy of 70.56%. Once again the classifier of a draw had the
worst results. The best independent classifier was the classifier of a win for the
home or away team; its accuracy was not degraded by matches that ended
in a hard-to-predict draw. The achieved accuracy of this classifier is very
satisfying. This classifier can be used for Asian handicap bets, where in the case
of a draw the stake is returned.
In most cases, feature selection increased the accuracy of classification.
We observed that the features were selected from all the sampling
intervals. The selection frequently chose features concerning the minimum and
maximum values, angles to these values, derivatives, the differences between
the first and last samples in the interval, and the largest drops in the value
of odds between adjacent samples. This indicates that these features were the
most important. Discretization in most cases also had a very positive influence
on the results of classification. Below are the best classification algorithms
that have been selected to predict the final results of new football matches.
A summary of the accuracy of the developed classifiers is presented in Table 6.
6 Conclusions
The results obtained, with an accuracy of about 70%, are quite satisfactory and demonstrate the
existence of a relationship between changes in the bookmaker odds values and
the outcome of the football match. These results confirm that the knowledge of a
group of people about football matches gathered in the form of bookmaker odds
can be successfully used for predicting the final result. Based on our research
results, one could build a decision-making system that could allow predicting
final results based on observation of fluctuations of odds. In further work on
the system, new features describing changes of the odds should be investigated,
which would probably contribute to improving the accuracy of the system.
References
1. Zdravevski, E., Kulakov, A.: System for Prediction of the Winner in a Sports Game.
ICT Innovations 2009, Part 2, pp. 55-63 (2010)
2. Miljkovic, D., Gajic, L., Kovacevic, A., Konjovic, Z.: The use of data mining for
basketball matches outcomes prediction. 8th International Symposium on Intelligent
Systems and Informatics, pp. 309-312 (2010)
3. McCabe, A., Trevathan, J.: Artificial Intelligence in Sports Prediction. Proceedings
of the Fifth International Conference on Information Technology: New Generations,
IEEE Computer Society, pp. 1194-1197 (2008)
4. Smith, L., Lipscomb, B., Simkins, A.: Data Mining in Sports: Predicting Cy Young
Award Winners. Journal of Computing Sciences in Colleges, vol. 22, issue 4,
Consortium for Computing Sciences in Colleges, pp. 115-121 (2007)
5. Pinnacle Sports,
6. Betfair,
7. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The
WEKA Data Mining Software: An Update. SIGKDD Explorations, vol. 11, issue 1 (2009)
8. Larose, D.T.: Discovering Knowledge in Data: An Introduction to Data Mining.
Wiley Interscience, p. 28 (2005)
9. Hand, D., Mannila, H., Smyth, P.: Principles of Data Mining. MIT Press (2001)
10. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and
Techniques. Morgan Kaufmann, San Francisco, CA, USA (2005)