Man v Machine: Greyhound Racing Predictions: MSc Research Project in Data Analytics
Alva Lyons
x15014274
School of Computing
National College of Ireland
I hereby certify that the information contained in this (my submission) is information
pertaining to research I conducted for this project. All information other than my own
contribution will be fully referenced and listed in the relevant bibliography section at the
rear of the project.
ALL internet material must be referenced in the bibliography section. Students
are encouraged to use the Harvard Referencing Standard supplied by the Library. To
use other authors' written or electronic work is illegal (plagiarism) and may result in
disciplinary action. Students may be required to undergo a viva (oral examination) if
there is suspicion about the validity of their submitted work.
Signature:
Date:
Penalty Applied (if applicable):
Man v Machine: Greyhound Racing Predictions
Alva Lyons
x15014274
MSc Research Project in Data Analytics
21st January 2017
Abstract
The purpose of this research is to ascertain whether machine learning techniques can prove advantageous in predicting the outcome of greyhound races. Its main focus is on bridging the gap between existing sports prediction models, which rely on manual feature selection, and a model built from machine-chosen subsets obtained by algorithmically sub-setting the feature space. Feature selection is the process of sub-setting the feature space by analysing the relevance of features both to each other and to the predicted variable, so that only the most relevant features are used within the modelling framework. The resulting model is benchmarked against a greyhound racing expert to test whether the system can outperform the average social gambler, who tends to make betting selections based on tips given by domain experts.
1 Introduction
The greyhound racing industry in Ireland is controlled by the Irish Greyhound Board
(IGB). It is estimated that 720,000 visitors annually attend IGB controlled greyhound
stadia. Shelbourne Park is the premier greyhound stadium in Ireland and hosts one of
the world’s richest greyhound races, The Irish Derby, every September. Racing takes
place in Shelbourne Park every Wednesday, Thursday and Saturday.
This research attempts to predict the finishing position of a greyhound in a given race. The data used in this research comprises 64,908 observations from 10,986 races run in Shelbourne Park between January 2009 and August 2016. The prediction rate of the model is benchmarked against that of the stadium's resident greyhound expert, who is employed by the IGB to predict the winning greyhound, the top two and the top three finishing greyhounds for each race at the top of the race card on a given race night.
The use of machine learning techniques in sports prediction is not a new phenomenon, but it has gained many more practitioners since the spread of online gambling markets. While the use of machine learning techniques is prevalent in predicting horse racing results (Butler et al. (1998), Silverman and Suchard (2013), Davoodi and Khanteymoori (2010), Williams and Li (2008) etc.), there have been only three documented cases of machine learning being used to predict the outcome of greyhound races. While the two sports are often treated as synonymous, there are distinct differences between them which mean that modelling concepts need to be adapted. A greyhound race result is the outcome of 6 greyhounds chasing a mechanical hare in their attempts to catch it, while a horse race result is the outcome of the interactions between a jockey and their mount as they traverse the race course. While this might seem trivial, the key difference becomes apparent when considering a model's attempts at predicting the finishing positions of competitors in a race. A greyhound is bred to chase the hare and will continue its pursuit even if the race has already been won. On the other hand, a jockey who surmises that the horse has no chance of finishing in the first x positions may choose to pull back so that the horse's handicap rating is not affected for its next race. This nuance is one of the factors that led the researcher to choose greyhound racing as the sport for this research.
The seminal paper in the field of greyhound racing predictions, Chen et al. (1994),
dates back to 1994 and uses the knowledge of a greyhound expert for feature selection in
choosing which performance variables to use when running machine learning techniques
on their dataset. Both follow-up studies utilise a similar feature selection approach in their
models (Schumaker and Johnson (2008), Johansson and Sönströd (2003)). This research
paper uses feature selection algorithms to limit the problem space of the domain in order
to avoid the subjectivity of human interactions within the modelling process.
Additionally, this paper combines various data mining techniques, from sentiment analysis through to deep learning ensemble methods, in its attempts to test whether a machine-learnt model can outperform an expert in the area of greyhound racing predictions. The rest of this document is laid out as follows:
• Section 2 discusses the related work in the field of sports predictions and highlights
the role this research plays within this field.
• Section 3 discusses the methodology framework used in completing this research.
• Section 4 reviews and justifies the implementation steps carried out in this research.
• Section 5 evaluates the results of the prediction algorithms.
• Section 6 concludes the research and discusses potential future work to be carried
out.
2 Related Work
2.1 Sports Predictions
The literature and academic work produced in the field of sports prediction is far-reaching. Using historical results data to predict the outcome of sporting events has gained
exposure due to the growth of on-line betting markets and the large volumes of historical
data which are easily accessible.
Cain et al. (2003) carried out analysis to test whether the favourite-longshot bias that is prevalent in horse racing betting markets is found in other sports. The results of their analysis show that betting repeatedly on favourites results in smaller losses than betting on long shots in most sports, with the exceptions of soccer and greyhound racing (Lyons (2016)).
Many of the works on predicting the results of horse and greyhound races focus on the model used for prediction and its tuning parameters rather than on the selection of the feature subset (Pudaruth et al. (2013), Davoodi and Khanteymoori (2010), Williams and Li (2008)). Their feature subsets are listed, but the motivation behind choosing which features to include in their models is not elaborated on. This lack of formal explanation for the feature subset does not dispel the assumption of human subjectivity in choosing which performance variables affect the outcome of these sporting events.
The paper by McCabe and Trevathan (2008) focuses more on the feature set than on the model used in sports prediction. It provides an interesting discussion on why variables were included in the model; however, it is very vague on the potentially "subjective" variables that were not added. Similar to the papers listed above, feature selection in the research by McCabe and Trevathan is a manual process and does not use machine learning to choose the optimal subset of features to include in the modelling.
2.3.1 Historical Feature Selection Techniques in Greyhound Racing Predictions
Chen et al. (1994), in their prediction of greyhound racing results, chose their feature set following discussions with domain experts, who informed them which 10 performance variables they believed were most important in predicting winners. They admit that while this is not optimal, it is a consequence of their chosen algorithms being unable to handle noisy data. Remarkably, neither Johansson and Sönströd (2003) nor Schumaker and Johnson (2008), in their follow-up studies, chose to research feature selection further. Rather, they used a feature subset similar to that used in the study by Chen et al. (Lyons (2016))
3 Methodology
The methodology used in this research is Knowledge Discovery in Databases (KDD). The
KDD methodology allows for an iterative approach to the processes involved in extracting
knowledge from raw data. Initial plans were to utilise the SEMMA notation, as developed by SAS, but the sequential nature of that methodology could not rival the flexibility and interactivity of KDD (Azevedo and Santos (2008)). KDD focuses on the entire process from data selection through pre-processing, transformation and data mining to interpretation (Fayyad et al. (1996)). An illustration of the KDD methodology as it pertains to this research is shown in Figure 1.
3.2 Pre-Processing
A thorough pre-processing phase ensures that a strong knowledge of the dataset is gained before transformations commence. The data from the IGB's website contains numerous inaccuracies and missing data points, ensuring that the pre-processing phase of the KDD methodology plays an integral role in this research. Errors in the data are discovered when the data is examined using visualization and descriptive statistics.
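As a small illustration of this step, the following R sketch (assuming the scraped DogRaceHistory table has been loaded into a data frame called dog_race_history; the object and file names are illustrative) summarises each field and counts its missing values:

    # Load the scraped race history; the file name is an assumption
    dog_race_history <- read.csv("DogRaceHistory.csv", stringsAsFactors = FALSE)

    # Descriptive statistics for every column, including counts of NA values
    summary(dog_race_history)

    # Missing values per field, as a percentage of all rows
    round(100 * colMeans(is.na(dog_race_history)), 1)

    # A quick visual check of one performance variable (assumed to be numeric)
    hist(dog_race_history$EstimatedTime, main = "Estimated race times", xlab = "Seconds")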
Domain knowledge plays an important part in deciding what steps to take in handling missing values. For instance, a NULL value in the Seed column does not represent a true missing value. A greyhound's seed indicates its preferred running style: Inside (I) seeded greyhounds tend to run close to the rails; Middle (M) seeded greyhounds tend to run in the centre of the track; Wide (W) seeded greyhounds tend to run toward the outside of the track. The absence of one of these characters in the Seed column is likely to indicate that a greyhound has no preferred running style. For this reason the missing data points in this column are replaced with "A" (Any).
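A minimal R sketch of this imputation step, continuing with the dog_race_history data frame from the previous sketch and assuming the Seed column is stored there, is:

    # Empty strings and NULLs from the scrape both denote a missing seed
    missing_seed <- is.na(dog_race_history$Seed) | dog_race_history$Seed == ""

    # A missing seed indicates no preferred running style, so recode it as "A" (Any)
    dog_race_history$Seed[missing_seed] <- "A"

    table(dog_race_history$Seed)   # expected levels after recoding: A, I, M, W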
Table 1: Percentage of missing values in the DogRaceHistory table attributed to time trials

Field               % of Missing Values Attributed to Time Trials
Weight              100.0
NumberOfDogs        100.0
WinTime             100.0
Going               100.0
PlacedDistance       99.9
RunnerGrade          86.9
RaceGrade            73.1
SP                   70.5
Remarks              61.2
SectionalPosition    46.9
SectionalTime        44.5
EstimatedTime         8.4
Time trials take place on race nights before racing commences, whereby one or more greyhounds run around the track in a non-competitive setting to see how fast they can chase the mechanical hare. Due to the non-competitive nature of these events it is deemed appropriate to remove them from the dataset. Removal of time trials significantly reduces the number of missing data points in the DogRaceHistory table.
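A hedged sketch of this filtering step is shown below; it uses a missing NumberOfDogs value as the time-trial indicator, which is an assumption based on Table 1 (any explicit time-trial flag in the data would serve equally well):

    # Table 1 attributes all missing NumberOfDogs values to time trials,
    # so a missing value is treated here as the time-trial indicator
    is_time_trial <- is.na(dog_race_history$NumberOfDogs)

    cat("Rows removed as time trials:", sum(is_time_trial), "\n")
    dog_race_history <- dog_race_history[!is_time_trial, ]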
Similarly, the litter distribution data in the Dogs table, which depicts the number of starts and race placings of a greyhound's siblings, is also inadmissible, as it reflects the litter's to-date totals at the time of scraping rather than at the time of racing. While omitting this data has the benefit of cutting down the processing time of feature selection in the data mining phase, it restricts the model in that it does not have access to the same data available in real time to the greyhound expert against which the model will be benchmarked.
3.3 Transformation
3.3.1 Text Analysis
The Remarks column in the DogRaceHistory table provides shorthand comments on how a greyhound ran in a given race, e.g. FAw (Fast Away), BBkd (Badly Baulked), TRec (Track Record). The full list of 320 possible remarks can be found on the IGB's website (https://www.igb.ie/upload/pdf/Abbreviations.pdf).
In order to analyse these remarks across the dataset, the basic premise of sentiment analysis is applied, such that the text is classified as expressing a positive or negative tone, Liu (2010). While sentiment analysis deals with "the computational treatment of opinion, sentiment, and subjectivity in text", Pang and Lee (2008), and is considered to be more suitable for text mining of unstructured datasets, the simplicity and power of this method is deemed appropriate for analysing this variable.
The first step in utilising the premise of sentiment analysis is to create domain-specific lexicons of the shorthands used in the Remarks column. Once the lexicons are created, confirmation is sought from two experts working within the greyhound racing industry to ensure subjectivity is minimised during this phase. These dictionaries are created by assigning each shorthand comment to one of 5 categories: Very Positive, Positive, Neutral, Negative and Very Negative.
The motivation for utilising this variable and performing text analysis lies in the possibility that a greyhound's ability is not properly reflected by its finishing position. For instance, a greyhound may only finish 5th in a race, despite being quick out of the traps, due to being impeded by another greyhound. By the same token, the 1st placed finisher in that race might have missed the early trouble and received a clear run despite being slow away.
In order to score this column, the 5 dictionaries are loaded into MySQL tables and the Remarks column is scored using an SQL statement which scans each of these tables and matches the words in Remarks to those in the dictionaries. A score is given to each word depending on which category it falls into:
• Positive = +1 point
• Neutral = 0 points
• Negative = -1 point
The scores for each remark are totalled and set as the RemarkScore for each greyhound in a given race.
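The scoring itself is carried out in SQL over the MySQL dictionary tables; the R sketch below reproduces the same logic on a toy example so the mechanics are visible. The lexicon rows, the comma delimiter and the data frame names are illustrative assumptions:

    library(dplyr)
    library(tidyr)

    # Illustrative slice of the lexicon; the real dictionaries cover all 320 shorthands
    lexicon <- data.frame(
      remark = c("FAw", "TRec", "BBkd"),   # Fast Away, Track Record, Badly Baulked
      score  = c(1, 1, -1),                # Positive = +1, Neutral = 0, Negative = -1
      stringsAsFactors = FALSE
    )

    # Remarks assumed to arrive as one delimited shorthand string per runner per race
    runs <- data.frame(RaceId  = c(1, 1),
                       Trap    = c(1, 2),
                       Remarks = c("FAw,TRec", "BBkd"),
                       stringsAsFactors = FALSE)

    remark_scores <- runs %>%
      separate_rows(Remarks, sep = ",") %>%                  # one shorthand per row
      left_join(lexicon, by = c("Remarks" = "remark")) %>%
      mutate(score = coalesce(score, 0)) %>%                 # unmatched shorthands score 0
      group_by(RaceId, Trap) %>%
      summarise(RemarkScore = sum(score), .groups = "drop")  # total per greyhound per race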
Similar formulae are used to transform other variables in the raw data. Figure 3
depicts a table of the transformations that took place in this phase of the research meth-
odology.
The task of predicting the outcome of a competitive event does not fall strictly into either a classification or a regression problem, and as a result both regression and classification techniques are possible within this research domain. As a classification problem, the output variable can be the binary outcome of win or lose. As a regression problem, it is possible to look at the finishing order of a race with a view to regressing on the FinishingPosition variable.
This research approaches the problem of predicting greyhound racing results as a classification problem. However, rather than choosing the binary classification of "win" or "lose", attempts are made to classify a greyhound's finishing position. The reasoning for not choosing binary classification is partly due to class imbalance: for each race 6 greyhounds are entered and 5 of them cannot win, so the lose class in the training set is roughly five times the size of the win class, and a model that simply predicted the majority class could achieve a high prediction rate by chance alone. Additionally, classifying the problem using the finishing position as the predicted variable allows for testing how wrong a predicted class is. For instance, incorrectly predicting that a 1st place finisher will finish in 2nd place is "less wrong" than predicting that the same greyhound will finish in 6th place. The choice of algorithms and the justification for their use is discussed in the implementation section of this paper.
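To make the notion of "less wrong" concrete, the short R illustration below compares exact-position accuracy with the mean absolute distance between predicted and actual positions; the vectors are hypothetical and the metric is shown for illustration rather than as the evaluation measure used later:

    # Hypothetical predicted vs. actual finishing positions for six runners
    actual    <- c(1, 3, 6, 2, 4, 5)
    predicted <- c(2, 3, 5, 1, 6, 5)

    accuracy <- mean(predicted == actual)        # exact-position hit rate
    mae      <- mean(abs(predicted - actual))    # average distance between classes

    # Predicting a winner to finish 2nd (error = 1) is penalised far less than
    # predicting the same greyhound to finish 6th (error = 5)
    cat(sprintf("Accuracy: %.2f, mean absolute position error: %.2f\n", accuracy, mae))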
4 Implementation
4.1 Tools Used
The tools used in implementing this research are:
• Python (with the BeautifulSoup library)
• MySQL
• R (Version 3.3.1)
• R Studio 64bit
• Amazon EC2
Python is used in the selection phase of this research to scrape the raw data due to the power of its BeautifulSoup library, which provides an easy-to-use framework for parsing HTML into a tree representation. The data mining phase utilises R, a statistical programming language widely used for data analysis, Lantz (2013).
• Filter Methods - are concerned with exploring only the inherent properties of a dataset. They are based on statistical tests and are independent of the variable to be predicted.
• Wrapper Methods - unlike filter methods, wrapper methods are used to find feature subsets which interact with the variable to be predicted. In this way the choice of a wrapper method is closely linked to the choice of a modelling algorithm, as the search over the feature subset space is wrapped around the classifying model, Saeys et al. (2007).
• Embedded Methods - build the search for an optimal feature subset into the construction of the classifier itself, so that feature selection takes place as part of model training, Saeys et al. (2007).
This research focuses on wrapper and embedded methods, as they interact with the variable to be predicted, model feature dependencies and are less likely to become stuck in a local optimum. The limitation of these methods, however, lies in the increased risk of over-fitting, Saeys et al. (2007).
The caret package provides a set of predefined functions for embedding recursive feature elimination (RFE) with classification algorithms such as Naïve Bayes, Random Forests and Bagged Trees. These 3 functions are modelled on the dataset in an attempt to find the optimal subset of features to use in prediction; a minimal sketch of this set-up is shown below.
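The sketch assumes the transformed features are held in a data frame called features and the target in a factor called finishing_position; the object names, candidate subset sizes and resampling scheme are assumptions rather than the exact settings used:

    library(caret)
    set.seed(42)

    ctrl <- rfeControl(functions = rfFuncs,   # swap in nbFuncs or treebagFuncs for the
                       method    = "cv",      # Naive Bayes and Bagged Trees variants
                       number    = 5)

    rfe_fit <- rfe(x          = features,
                   y          = finishing_position,
                   sizes      = c(3, 5, 10, 15),   # candidate subset sizes (assumed)
                   rfeControl = ctrl)

    predictors(rfe_fit)   # features retained in the optimal subset
    plot(rfe_fit)         # cross-validated performance against subset size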
1. Naïve Bayes is a classification algorithm which is based on Bayes' Theorem and assumes independence amongst the features. Running recursive feature elimination with Naïve Bayes limits the feature subspace to 3 features: FinishingPositionAvg5, OverallAvgTime and Avg2ndBend. Examining the flattened correlation matrix in Figure 4 shows that the basic assumption of Naïve Bayes is violated in our dataset, in that these features are not independent (see also the correlation sketch after this list). Therefore, the results of this analysis are dismissed for the remainder of this research.
2. Random Forests combine decision trees such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest, Breiman (2001). The output of the random forest RFE is shown in Appendix C; the top 10 features selected are NumberOfDogs, TimeAvg5, OverallAvgTime, PrizeMoneyWonAvg5, DogsAge, FinishingPositionAvg5, SecTimeAvg5, PlacedPercent, RankedGradeAvg5 and Avg2ndBend5.
3. Tree Bagging is an ensemble method which uses decision trees to generate multiple versions of a predictor and aggregates the results, Breiman (1996). The output of the Tree Bagging RFE method is shown in Appendix D. The top 10 features returned by embedding RFE with Tree Bagging are DogsAge, Weight, TimeAvg5, SecTimeAvg5, OverallAvgTime, RaceNumber, RankedGradeAvg5, PrizeMoneyToDate, BreakAvg5 and TrapNumber.
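The independence check referred to in point 1 can be reproduced with a short correlation screen, sketched here under the assumption that the three candidate features are numeric columns of the same features data frame used above:

    library(caret)

    nb_features <- c("FinishingPositionAvg5", "OverallAvgTime", "Avg2ndBend")
    corr_matrix <- cor(features[, nb_features], use = "pairwise.complete.obs")
    print(round(corr_matrix, 2))

    # Features whose pairwise correlation exceeds the cutoff; any hits indicate that
    # the Naive Bayes independence assumption does not hold for this subset
    findCorrelation(corr_matrix, cutoff = 0.75, names = TRUE)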
4.3.5 Combined Feature Selection Results
The results of the caret & randomForest wrapper, the RFE embedded with Tree Bagging and the RFE embedded with randomForest are combined to create an optimal subset of the data for use in the final stage of the modelling process. The top 10 features of each of these methods are combined in order to ascertain whether any features are prevalent across all feature selection methods. The ranked table is shown in Table 2.
Table 2: Features Selected

caret & randomForest    RFE with RandomForest     RFE with TreeBagging
DogsAge                 NumberOfDogs              DogsAge
OverallAvgTime          TimeAvg5                  Weight
SecTimeAvg5             OverallAvgTime            TimeAvg5
TimeAvg5                PrizeMoneyWonAvg5         SecTimeAvg5
Weight                  DogsAge                   OverallAvgTime
RankedGradeAvg5         FinishingPositionAvg5     RaceNumber
PrizeMoneyWonAvg5       SecTimeAvg5               PrizeMoneyToDate
BreakAvg5               PlacedPercent             BreakAvg5
RaceNumber              RankedGradeAvg5           RankedGradeAvg5
DaysSinceLastRace       Avg2ndBend5               TrapNumber
Five features are ranked in the top 10 by all 3 feature selection methods, and these are selected for the final feature subset. Five more features for this subset are selected by choosing the highest ranked features amongst the 3 methods deployed. The transformed variable AvgRemarks5, resulting from the text analysis phase of this research (Section 3.3.1), fails to make the cut despite finishing amongst the top 13 features across all feature selection methods.
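A small sketch of how this cross-method comparison can be assembled in R is shown below; the three character vectors are copied directly from Table 2, and the intersection recovers the features common to all three top-10 lists:

    top_caret_rf <- c("DogsAge", "OverallAvgTime", "SecTimeAvg5", "TimeAvg5", "Weight",
                      "RankedGradeAvg5", "PrizeMoneyWonAvg5", "BreakAvg5", "RaceNumber",
                      "DaysSinceLastRace")
    top_rfe_rf   <- c("NumberOfDogs", "TimeAvg5", "OverallAvgTime", "PrizeMoneyWonAvg5",
                      "DogsAge", "FinishingPositionAvg5", "SecTimeAvg5", "PlacedPercent",
                      "RankedGradeAvg5", "Avg2ndBend5")
    top_rfe_bag  <- c("DogsAge", "Weight", "TimeAvg5", "SecTimeAvg5", "OverallAvgTime",
                      "RaceNumber", "RankedGradeAvg5", "PrizeMoneyToDate", "BreakAvg5",
                      "TrapNumber")

    # Features ranked in the top 10 by all three methods
    Reduce(intersect, list(top_caret_rf, top_rfe_rf, top_rfe_bag))

    # How often each feature appears across the three lists, as a guide for
    # filling the remaining places in the final subset
    sort(table(c(top_caret_rf, top_rfe_rf, top_rfe_bag)), decreasing = TRUE)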
Resulting Optimal Feature Subset: The final feature subset selected for the modelling phase consists of DogsAge, OverallAvgTime, SecTimeAvg5, TimeAvg5, RankedGradeAvg5, Weight, RaceNumber, PrizeMoneyWonAvg5, BreakAvg5 and Avg2ndBend5.
5 Evaluation
5.1 Model Performance
For the purpose of modelling, the dataset is split in a 60/20/20 ratio for training, validation and testing, and all variables are scaled to reduce variance across the dataset. An important design criterion for model performance is choosing the correct parameters. It is important that the test set is not utilised while these parameters are tuned, so as to avoid the model learning from repeated iterations over the test set.
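A hedged sketch of this step is given below. It assumes the optimal feature subset and target sit in a data frame model_data with a RaceId column; the column names, the race-level split and the use of training-set statistics for scaling are assumptions, as the text above states only that a 60/20/20 split is used and that all variables are scaled:

    set.seed(42)

    # Split at race level so all runners from a race land in the same partition
    race_ids  <- unique(model_data$RaceId)
    partition <- sample(c("train", "valid", "test"), length(race_ids),
                        replace = TRUE, prob = c(0.6, 0.2, 0.2))
    idx <- partition[match(model_data$RaceId, race_ids)]

    # Scale features using training-set statistics only, so that no information
    # from the validation or test partitions leaks into the model
    feature_cols <- setdiff(names(model_data), c("RaceId", "FinishingPosition"))
    mu    <- sapply(model_data[idx == "train", feature_cols], mean, na.rm = TRUE)
    sigma <- sapply(model_data[idx == "train", feature_cols], sd,   na.rm = TRUE)

    scaled <- model_data
    scaled[feature_cols] <- scale(model_data[feature_cols], center = mu, scale = sigma)

    train <- scaled[idx == "train", ]
    valid <- scaled[idx == "valid", ]
    test  <- scaled[idx == "test",  ]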
5.1.1 Neural Network
In order to emulate the research in the field of greyhound racing by Chen et al. (1994) and Johansson and Sönströd (2003), who used shallow neural networks in their predictions, the optimal feature subset chosen following feature selection is fed into a deep learning neural network built with the H2O package in R. Deep learning reduces the complexity of the modelling process but is better suited to larger datasets.
This research initially utilises deep neural networks in its attempts to classify the finishing position of a greyhound. A deep neural network is run several times across different subsets of the data and the average prediction performance is found to be 18.92%.
The use of a shallow neural network improves the performance of the feature subset: using just one hidden layer on subsets of the dataset improves the average prediction performance to 19.89%. The performance of the shallow neural network approaches the prediction rate of Chen et al. (1994), whose model correctly predicted 20% of winners. Similar comparisons cannot be made with Johansson and Sönströd's paper, as their model is assessed against monetary gain on bets rather than win percentages.
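A minimal sketch of both H2O configurations is given below, reusing the train, valid, test and feature_cols objects from the previous sketch. The hidden-layer sizes, epochs and other hyper-parameters are assumptions, not the tuned values behind the reported figures:

    library(h2o)
    h2o.init(nthreads = -1)

    # Finishing position is treated as a class label, so convert it to a factor
    train$FinishingPosition <- factor(train$FinishingPosition)
    valid$FinishingPosition <- factor(valid$FinishingPosition)
    test$FinishingPosition  <- factor(test$FinishingPosition)
    train_h2o <- as.h2o(train); valid_h2o <- as.h2o(valid); test_h2o <- as.h2o(test)

    # Deep configuration: several hidden layers
    deep_nn <- h2o.deeplearning(x = feature_cols, y = "FinishingPosition",
                                training_frame = train_h2o, validation_frame = valid_h2o,
                                hidden = c(64, 64, 64), epochs = 50, seed = 42)

    # Shallow configuration: a single hidden layer
    shallow_nn <- h2o.deeplearning(x = feature_cols, y = "FinishingPosition",
                                   training_frame = train_h2o, validation_frame = valid_h2o,
                                   hidden = c(32), epochs = 50, seed = 42)

    # Out-of-sample confusion matrix for the shallow network
    h2o.confusionMatrix(shallow_nn, newdata = test_h2o)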
Neural Network: A neural network was chosen for the modelling phase of this project to analyse whether algorithmically limiting the feature space is more, less or equally as effective as using domain expert knowledge to choose an optimal subset of features. As discussed in Section 2.3.1, Johansson and Sönströd and Chen et al., who limited their feature space by seeking advice on which variables to add to their models from the same experts they hoped their models would outperform, both used neural networks for their modelling phase. This research shows that algorithmic and manual feature selection in the domain of greyhound racing, when combined with the neural network model, have comparable results, with manual feature selection performing 0.11% better, as shown in Table 4.
The second aim was to build a system to predict greyhound racing results. As discussed in Section 5.2.2, as the number of races increases the random probability of correctly predicting the winner of a greyhound race should approach 17.05%. The models are tested across 20% of the dataset, 2,197 races, which should bring the probability close to 17%. Table 3 shows that all 3 machine learning models perform better than random chance, highlighting that the use of machine learning techniques is advantageous in predicting greyhound racing results.
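The quoted baseline can also be checked empirically: the probability of picking the winner at random in a race with k runners is 1/k, and races with fewer than six runners push the average slightly above 1/6 ≈ 16.7%. A sketch, assuming a vector runners_per_race holding the field size of each of the 2,197 test races, is:

    # Random chance of selecting the winner in each race is 1 / field size;
    # averaging over the test races gives the random-chance benchmark
    random_baseline <- mean(1 / runners_per_race)
    cat(sprintf("Random-chance win prediction rate: %.2f%%\n", 100 * random_baseline))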
Finally, the system built was to be benchmarked against a human expert in greyhound predictions. The system created in this research needs further tuning to rival the hit ratio of the human expert. While the model's performance is improved during tuning, the results in Table 4 show that the shallow neural network correctly predicts 3.81% fewer winners than the human expert.
It can be concluded that machine-learnt feature selection must at all times be accompanied by domain knowledge; it is in combining the two that an optimal feature set can be obtained. Although feature selection plays an important role in this research, it is only one step within the iterative framework. It alone cannot adequately account for a model's success or failure; rather, it is the amalgamation of domain knowledge, feature engineering, feature selection and model selection, combined optimally, that ensures success.
2. Deep learning algorithms improve with added data. A future focus to improve
model performance could be to generalise the feature selection across all 28,271
races scraped from tracks throughout Ireland.
Acknowledgements
This research would not have been possible without the help of my supervisor Mr. Oisín
Creaner, alongside Mr. Michael Bradford and Dr. Simon Caton, whose help and
guidance throughout the completion of this Masters programme proved invaluable. I
would like to take this opportunity to thank them sincerely for their generosity of time
and spirit.
I would also like to thank the Irish Greyhound Board who generously allowed me
to use their data in the completion of this Masters programme.
Without the continued support of my family and friends I would not have had the
courage nor strength to be able to complete my Masters. Particularly, I would like
to dedicate this thesis to my mam, Eileen Lyons, who sadly passed away before its
completion. Ar dheis Dé go raibh a h-anam.
References
Azevedo, A. and Santos, M. F. (2008). KDD, SEMMA and CRISP-DM: a parallel overview, in A. Abraham (ed.), IADIS European Conf. Data Mining, IADIS, pp. 182–185.
URL: http://dblp.uni-trier.de/db/conf/iadis/dm2008.html#AzevedoS08
Breiman, L. (1996). Bagging predictors, Machine Learning 24(2): 123–140.
URL: http://dx.doi.org/10.1007/BF00058655
Breiman, L. (2001). Random forests, Machine Learning 45(1): 5–32.
URL: http://dx.doi.org/10.1023/A:1010933404324
Brownlee, J. (2014). Machine learning mastery.
URL: http://machinelearningmastery.com/discover-feature-engineering-how-to-engineer-features-and-how-to-get-good-at-it/
Butler, J., Tsang, E. P. K. and Sq, C. C. (1998). EDDIE beats the bookies.
Cain, M., Law, D. and Peel, D. (2003). The favourite-longshot bias, bookmaker margins
and insider trading in a variety of betting markets, Bulletin of Economic Research
55(3): 263–273.
URL: http://dx.doi.org/10.1111/1467-8586.00174
Chen, H., Rinde, P. B., She, L., Sutjahjo, S., Sommer, C. and Neely, D. (1994). Expert prediction, symbolic learning, and neural networks: An experiment on greyhound racing, IEEE Expert 9(6): 21–27.
URL: http://dblp.uni-trier.de/db/journals/expert/expert9.html#ChenRSSSN94
Dash, M. and Liu, H. (1997). Feature selection for classification, Intelligent Data Analysis
1: 131–156.
Dash, M. and Liu, H. (2003). Consistency-based search in feature selection, Artificial
Intelligence 151(1–2): 155 – 176.
URL: http://www.sciencedirect.com/science/article/pii/S0004370203000791
Davoodi, E. and Khanteymoori, A. R. (2010). Horse racing prediction using artificial neural networks, Proceedings of the 11th WSEAS International Conference on Neural Networks and 11th WSEAS International Conference on Evolutionary Computing and 11th WSEAS International Conference on Fuzzy Systems, NN'10/EC'10/FS'10, World Scientific and Engineering Academy and Society (WSEAS), Stevens Point, Wisconsin, USA, pp. 155–160.
URL: http://dl.acm.org/citation.cfm?id=1863431.1863457
Fayyad, U., Piatetsky-Shapiro, G. and Smyth, P. (1996). The KDD process for extracting useful knowledge from volumes of data, Commun. ACM 39(11): 27–34.
URL: http://doi.acm.org/10.1145/240455.240464
Guyon, I., Weston, J., Barnhill, S. and Vapnik, V. (2002). Gene selection for cancer
classification using support vector machines, Machine Learning 46(1): 389–422.
URL: http://dx.doi.org/10.1023/A:1012487302797
Han, J. (2005). Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers
Inc., San Francisco, CA, USA.
Hand, D., Mannila, H. and Smyth, P. (2001). Principles of Data Mining, A Bradford
book, MIT Press.
URL: https://books.google.ie/books?id=SdZ-bhVhZGYC
Johansson, U. and Sönströd, C. (2003). Neural networks mine for gold at the greyhound racetrack, Proceedings of the International Joint Conference on Neural Networks, Vol. 3, pp. 1798–1801.
Liu, B. (2010). Sentiment analysis and subjectivity, Handbook of Natural Language Pro-
cessing, Second Edition. Taylor and Francis Group, Boca.
Lyons, A. (2016). RIC research proposal for greyhound racing predictions modelling.
Pang, B. and Lee, L. (2008). Opinion mining and sentiment analysis, Found. Trends Inf.
Retr. 2(1-2): 1–135.
URL: http://dx.doi.org/10.1561/1500000011
Pudaruth, S., Medard, N. and Dookhun, Z. B. (2013). Horse racing prediction at the Champ de Mars using a weighted probabilistic approach, International Journal of Computer Applications 72(5): 37–42.
Saeys, Y., Inza, I. n. and Larrañaga, P. (2007). A review of feature selection techniques
in bioinformatics, Bioinformatics 23(19): 2507–2517.
URL: http://dx.doi.org/10.1093/bioinformatics/btm344
Williams, J. and Li, Y. (2008). A case study using neural networks algorithms: Horse racing predictions in Jamaica, in H. R. Arabnia and Y. Mun (eds), IC-AI, CSREA Press, pp. 16–22.
URL: http://dblp.uni-trier.de/db/conf/icai/icai2008.html#WilliamsL08
Witten, I. H., Frank, E. and Hall, M. A. (2011). Data Mining: Practical Machine Learning
Tools and Techniques, 3rd edn, Morgan Kaufmann Publishers Inc., San Francisco, CA,
USA.
A Python Script Flow Chart