Introduction To Machine Learning
Why Machine Learning
Motivation
• “We are drowning in information,
but we are starved for knowledge”
- John Naisbitt, Megatrends
Solution: Machine Learning
• Hypothesis: pre-existing data repositories contain a lot of
potentially valuable knowledge
Applications: Widespread
• Online ad selection and placement
• Risk management in finance, insurance, security
• High-frequency trading
• Medical diagnosis
• Mining and natural resources
• Malware analysis
• Drug discovery
• Search engines
• Education
• Sport
• …
Draws on Many Disciplines
• Artificial Intelligence
• Statistics
• Continuous optimisation
• Databases
• Information Retrieval
• Communications/information theory
• Signal Processing
• Computer Science Theory
• Philosophy
• Psychology and neurobiology
• Linguistics
Data Science / Business Analytics Landscape
[Venn diagram: data science sits at the intersection of Computing, Statistics, and Domain expertise]
AI, Machine Learning, Big Data
[Diagram: Artificial Intelligence ("intelligent machines and software") overlapping with Statistics / ML, with Big Data / data processing, and with Planning, Reasoning, Decision Making]
[Screenshots: Amazon "customers also bought" recommendations for The Martian by Andy Weir, and the Netflix kids page (www.netflix.com/Kids), as everyday examples of ML-driven recommendation]
Jobs
Numerous companies
across all industries hire
ML experts:
Data Scientist
Analytics Expert
Business Analyst
Statistician
Software Engineer
Researcher
…
Companies Employing our Students in ML Roles
Discussion
Share an example of how machine
learning can help in either your
workplace or in your daily life.
About this Subject
Teaching Staff
• Lecturer (James)
• Tutors
* Curtis (Hanxun) Huang
• [email protected]
• PhD candidate, School of Computing and Information Systems
* Edmund Lau
* [email protected]
* PhD candidate, School of Mathematics and Statistics
* Yuning Zhou
• [email protected]
• PhD candidate, School of Computing and Information Systems
Getting Help
• The Machine Learning subject site on Canvas is live
* Please check it for announcements, lecture and workshop
materials, discussion forums, …
Timetable
• 9:00-10:20 Part A
* 9:00-9:15 of Part A reserved for quizzes (Weeks 3-7)
• 10:20-10:40 Break
• 10:40-12:00 Part B
• 2:00-5:00 Workshop
Relation to Other Subjects
• Machine learning (aka “Statistical Learning II”) versus
* Statistical Learning
* Predictive Analytics
* Text and Web Analytics
Versus “Statistical Learning”
• Complementary, with some overlap on regression
• This subject
* More computer science (CS) in flavour
* More: Regularisation, nonlinear & computational perspectives
* Covers a variety of learning tasks beyond regression
Versus “Predictive Analytics”
• Complementary, again with some overlap, mostly in early material
• This subject
* Less time series, more CS approaches to prediction
* Drills down further into algorithms and implementations;
scratches the surface of the theory
Versus “Text and Web Analytics”
• Also complementary, with some overlap
Assumed Knowledge
• Programming
* Familiarity with computer programming
* Load data, perform simple manipulations, call ML libraries,
inspect & plot results
• Maths
* Comfort with formal notation (“mathematical maturity”)
* Familiarity with probability (e.g. Bayes rule, multivariate
distributions)
* Exposure to optimisation (and some calculus, linear algebra)
Textbooks (Optional)
• No set textbook for the subject. You can find the required information in the lecture
notes, supplemented with readily available material on the Web. Some independent
research is reasonable for a masters-level subject. However, there are a number of
good books that can be used as references:
• Hastie, Tibshirani, and Friedman (2009), The Elements of Statistical Learning: Data
Mining, Inference and Prediction
* This is a seminal book on machine learning that covers the major machine learning
tools in depth and with great rigour.
Textbooks (Optional)
• Russell and Norvig (2002), Artificial Intelligence: A
Modern Approach
* A very broad (but less deep) overview of the whole field of
artificial intelligence, including machine learning
• Data mining resources are also useful
* Data Mining, Fourth Edition: Practical Machine Learning
Tools and Techniques (Morgan Kaufmann Series in Data
Management Systems), 4th edition, Witten, Frank, Hall and
Pal.
* Introduction to Data Mining, 1st edition, Tan, Steinbach and
Kumar.
Materials
Software – Python Stack
• We will be using Python 3 as the primary language in
workshops
• Get a copy of Python for your machine
* The Anaconda distribution is particularly convenient
• https://www.continuum.io/downloads
* Jupyter used extensively in workshops (and industry!)
* See Software Guide published to Canvas
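As a quick check that the stack is ready, a minimal sketch (assumes the Anaconda distribution or equivalent installs of numpy, pandas, matplotlib, and scikit-learn):

```python
# Sanity check for the workshop stack: run in a Jupyter cell or at a prompt.
import sys
import numpy as np
import pandas as pd
import matplotlib
import sklearn

print("Python      ", sys.version.split()[0])
print("numpy       ", np.__version__)
print("pandas      ", pd.__version__)
print("matplotlib  ", matplotlib.__version__)
print("scikit-learn", sklearn.__version__)
```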
Assessment
• 25% individual short in-lecture quizzes (Weeks 3 to 8)
* 5 quizzes (each 13 minutes + 2 minutes reading time)
* Worth 5% each
Syndicate Assignment
• We are planning to use a property dataset from
ANZ (the CoreLogic dataset)
Subject Plan
• Week 2:
* Feature selection and decision trees. Ensemble methods, bagging and
boosting
• Week 3:
* Regularisation, support vector machines
• Week 4:
* Neural networks and optimisation
Subject Plan (cont.)
• Week 5:
* Boosting: gradient tree boosting (XGBoost) and AdaBoost
• Week 6:
* Unsupervised learning: Clustering
• Week 7:
* Unsupervised learning: Network analysis, community detection
and semi-supervised learning
• Week 8:
* Revision
Machine Learning – A Dizzying Array
• We will be looking at a range of machine learning
techniques
* Regression, naïve Bayes, decision trees, random forests,
gradient tree boosting, neural networks, clustering,
community detection
Machine Learning – Common Themes
ML Setup
Terminology
• Input to a machine learning system can consist of
* Instance (aka object): measurements about individual
entities/objects, e.g. a loan application
* Attribute (aka feature): a component of the instances,
e.g. the applicant's salary, number of dependents, etc.
* (Class) Label: an outcome that may be categorical, numeric, etc.,
e.g. forfeit vs. paid off
* Example: an instance coupled with its label,
e.g. <(100k, 3), "forfeit">
* Model: the discovered relationship between attributes and/or label
Terminology
Height  Weight  Age  Gender
1.8     80      22   Male
1.53    82      23   Male
1.6     62      18   Female
Supervised vs Unsupervised Learning
[Table contrasting supervised and unsupervised learning: the data each uses and what the resulting model is used for]
Architecture of a Supervised Learner
[Diagram: train data (examples = instances + labels) feed the Learner, which outputs a Model; test-data instances pass through the Model to produce predicted labels, which Evaluation compares against the true test labels. More soon.]
Evaluation (Supervised Learners)
• How you measure quality depends on your problem!
• Typical process
* Pick an evaluation metric comparing label vs prediction
* Procure an independent, labelled test set
* “Average” the evaluation metric over the test set
Training and Testing: If Sufficient Data
• Divide data into:
* Training set (e.g. 2/3)
* Test set (e.g. 1/3)
• The workshop will cover cross-validation; a holdout split is sketched below
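A minimal sketch of such a holdout split with scikit-learn; the 2/3 vs 1/3 proportions follow the slide, while the iris dataset and logistic regression model are just stand-ins:

```python
# Hold out 1/3 of the data for testing; train on the remaining 2/3.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0)  # fixed seed for reproducibility

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```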
Why Evaluate on “Independent” Data?
Metrics for Performance Evaluation
• Can be summarised in a Confusion Matrix (contingency table)
* Actual class: {yes, no, yes, yes, …}
* Predicted class: {no, yes, yes, no, …}
• For binary classification:

                     PREDICTED CLASS
                     Class=Yes   Class=No
ACTUAL   Class=Yes   a (TP)      b (FN)
CLASS    Class=No    c (FP)      d (TN)

a: TP (true positive)     b: FN (false negative)
c: FP (false positive)    d: TN (true negative)
Metrics for Performance Evaluation
                     PREDICTED CLASS
                     Class=Yes   Class=No
ACTUAL   Class=Yes   a (TP)      b (FN)
CLASS    Class=No    c (FP)      d (TN)

Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)
• Exercise. Given:
* Actual class: {yes, no, yes, yes, no, yes, no, no}
* Predicted:    {no, yes, yes, no, yes, no, no, yes}
Fill in the confusion matrix (PREDICTED vs ACTUAL class), then answer:
What is accuracy?
What is precision?
What is recall?
What is F1-score?
• Worked example, for a different test set of 10,000 instances
(the implied counts are TP = 60, FN = 40, FP = 140, TN = 9,760):
Accuracy = (60 + 9760) / 10000 = 0.982
Precision = 60 / 200 = 0.3
Recall = 60 / 100 = 0.6
F-measure = 2 / (20/6 + 10/6) = 0.4
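The arithmetic can be checked mechanically; a small sketch using the implied counts above:

```python
# Recompute the worked-example metrics from the confusion-matrix counts.
tp, fn, fp, tn = 60, 40, 140, 9760

accuracy  = (tp + tn) / (tp + tn + fp + fn)                 # 0.982
precision = tp / (tp + fp)                                  # 0.3
recall    = tp / (tp + fn)                                  # 0.6
f1        = 2 * precision * recall / (precision + recall)   # 0.4

print(accuracy, precision, recall, f1)
```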
ROC Curves
• Many classification algorithms output not only a
classification for each test instance but also some
"rating" of confidence in that classification:
* naïve Bayes, logistic regression, support vector machines,
neural networks, …
• Often in machine learning tasks, we can afford the
luxury of "skimming off" a subset of the instances with
higher classification plausibility
• Also, we are often more interested in how reliably we
can predict a small subset of positive instances than
in the vast majority of negative instances
• Is this a good classifier?
Receiver Operating Characteristic (ROC) Curves
• Advantages
* Can compare classifiers, including relative to a baseline
* The area under the curve (AUC) is an unbiased estimate of the
probability that a randomly chosen positive instance will be ranked
higher than a randomly chosen negative instance.
Why is this an advantage in practice?
Generating ROC Curves Example
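A minimal sketch of generating an ROC curve with scikit-learn; the synthetic dataset and logistic regression scorer are stand-ins:

```python
# Generate an ROC curve from classifier scores (here, predicted probabilities).
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, scores)

plt.plot(fpr, tpr, label=f"AUC = {roc_auc_score(y_test, scores):.3f}")
plt.plot([0, 1], [0, 1], "k--", label="chance")  # diagonal baseline
plt.xlabel("False positive rate"); plt.ylabel("True positive rate")
plt.legend(); plt.show()
```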
Major Frameworks in Statistical ML
• Frequentist vs Bayesian vs decision theoretic
Major Types of Supervised Models
• Given an instance x, we wish to predict a response y
• Recall conditional probability: Pr(y|x) = Pr(x,y) / Pr(x)
• Running example: distinguish Swedish from Russian speech
• Discriminative models
* Model only Pr(y|x)
* E.g. logistic regression (also linear regression, SVMs, …)
* Identify characteristics that differentiate the languages; use their
presence/absence to compute Pr(Swedish|speech) and Pr(Russian|speech)
• Generative models
* Model the full joint Pr(x,y) = Pr(y|x) Pr(x)
* E.g. naïve Bayes
* Learn to speak Russian and Swedish, then classify the
speech with your knowledge of each language
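To make the contrast concrete, a sketch fitting one model of each kind to the same (stand-in) data: logistic regression models Pr(y|x) directly, while Gaussian naïve Bayes models Pr(x|y) and Pr(y):

```python
# Discriminative vs generative: both yield Pr(y|x), but NB gets there via the joint.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, random_state=1)

disc = LogisticRegression(max_iter=1000).fit(X, y)  # models Pr(y|x) directly
gen  = GaussianNB().fit(X, y)                       # models Pr(x|y) and Pr(y)

print("discriminative Pr(y|x):", disc.predict_proba(X[:1]))
print("generative     Pr(y|x):", gen.predict_proba(X[:1]))
```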
Linear Models
Discriminative approaches, mostly as a refresher
Bayes Rule
Bayes rule in action
Naïve Bayes (NB) Classifiers
Simplifying assumption
The final NB formulation
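For reference, the standard formulation in the usual notation:

```latex
% Bayes rule: posterior from likelihood and prior
\Pr(y \mid x_1, \dots, x_d) = \frac{\Pr(x_1, \dots, x_d \mid y)\,\Pr(y)}{\Pr(x_1, \dots, x_d)}

% Simplifying ("naive") assumption: features conditionally independent given the class
\Pr(x_1, \dots, x_d \mid y) = \prod_{i=1}^{d} \Pr(x_i \mid y)

% Final NB formulation: the denominator is constant across classes, so
\hat{y} = \operatorname*{argmax}_{y} \; \Pr(y) \prod_{i=1}^{d} \Pr(x_i \mid y)
```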
Estimating the probabilities (1)
Estimating the probabilities (2)
Naïve Bayes in Action
Marginals
Naïve Bayes: Summary
• A simple linear classifier with a generative model
• Frequentist: Its probabilistic model is fit by MLE
• Bayesian? Bayes rule, but not necessarily Bayesian!
• Naïve? It models strong independence assumptions
• Easy to implement, fast, scalable; good baseline
• Can handle continuous features (use Gaussians)
• Can handle missing data (just ignore the missing features; very simple!)
• Probability scores not always well calibrated; feature correlations ignored
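To back the "easy to implement" point, a from-scratch sketch of Gaussian naïve Bayes (an illustrative implementation, not the lecture's reference code):

```python
# Minimal Gaussian naive Bayes: fit per-class feature means/variances by MLE,
# then classify by the highest log-posterior.
import numpy as np

def fit_gnb(X, y):
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (np.log(len(Xc) / len(X)),   # log prior
                     Xc.mean(axis=0),            # per-feature means
                     Xc.var(axis=0) + 1e-9)      # smoothed variances
    return params

def predict_gnb(params, X):
    scores = []
    for c, (log_prior, mu, var) in params.items():
        # Sum of per-feature Gaussian log-densities = log Pr(x|y) under independence
        log_lik = -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var).sum(axis=1)
        scores.append(log_prior + log_lik)
    classes = list(params)
    return np.array(classes)[np.argmax(scores, axis=0)]

# Usage (hypothetical arrays):
# params = fit_gnb(X_train, y_train); y_hat = predict_gnb(params, X_test)
```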
Linear Regression
Example: Predict Humidity from Temperature
Method of Least Squares
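In the usual notation (with X the design matrix and y the response vector), the standard formulation:

```latex
% Least squares: choose weights minimising the sum of squared residuals
\hat{\beta} = \operatorname*{argmin}_{\beta} \sum_{i=1}^{n} \bigl( y_i - x_i^\top \beta \bigr)^2
            = \operatorname*{argmin}_{\beta} \lVert y - X\beta \rVert_2^2

% Closed-form solution via the normal equations (when X^\top X is invertible)
\hat{\beta} = (X^\top X)^{-1} X^\top y
```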
Regression for classification
• Idea: transform the class probability p into its log-odds,
logit(p) = log(p / (1 - p)), and fit a linear regression to this target
• This logit transformation maps (0,1) to (-∞, +∞), i.e., the new target
values are no longer restricted to the [0,1] interval
Logistic Regression Model
• Logistic function: σ(z) = 1 / (1 + e^(-z))
[Plot: the logistic function, mapping the reals (horizontal axis, -10 to 10)
to probabilities (vertical axis, 0 to 1)]
Logistic Regression Model
[Plot: logistic regression of T2D status against BMI; instances whose predicted
probability falls below the 0.5 threshold are predicted "no", those above it
are predicted "yes"]
• Note: here we do not use the sum of squared errors for fitting
Logistic Regression: Linearity, Training
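Spelling out the training objective in standard form (a textbook formulation; cf. the note above that squared error is not used):

```latex
% Model: linear in x through the logistic function
\Pr(y = 1 \mid x) = \sigma(w^\top x), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}

% Training: minimise the negative log-likelihood (cross-entropy) over the examples
\mathcal{L}(w) = -\sum_{i=1}^{n} \Bigl[ y_i \log \sigma(w^\top x_i)
              + (1 - y_i) \log\bigl(1 - \sigma(w^\top x_i)\bigr) \Bigr]

% No closed form; fit iteratively, e.g. by gradient descent or Newton's method
```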
Decision boundary example
https://www.kdnuggets.com/2016/08/role-activation-function-neural-network.html
Exercise
How can you use linear regression (or logistic
regression) to model non-linear functions on your
data?
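One common answer, sketched below: expand the features with non-linear basis functions (here polynomial terms; the sine target is a stand-in), then fit an ordinary linear model in the expanded space:

```python
# Fit a non-linear trend with ordinary linear regression by expanding x
# into polynomial basis features.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(scale=0.1, size=100)  # non-linear target

model = make_pipeline(PolynomialFeatures(degree=5), LinearRegression())
model.fit(x, y)
print("R^2 on training data:", model.score(x, y))
```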
Summary
• Subject intro and logistics
• Performance evaluation metrics
* Accuracy, AUC, and a veritable zoo
• Approaches to ML
* Frequentist vs Bayesian vs Decision Theoretic
* Supervised models: Generative vs Discriminative
• Linear approaches
* Naïve Bayes
* Linear regression
* Logistic regression