AI-Lecture 8 (Machine Learning Overview)
Machine learning approach
• Traditional Programming: Data + Program → Computer → Output
• Machine Learning: Data + Output → Computer → Program
Machine Learning (ML)
• ML is a branch of artificial intelligence:
  • Uses computing-based systems to make sense of data
  • Extracting patterns, fitting data to functions, classifying data, etc.
• ML systems can learn and improve
  • With historical data, time, and experience
• Bridges theoretical computer science and real, noisy data.
ML in real-life
ML in a Nutshell
• Tens of thousands of machine learning algorithms
• Every machine learning algorithm has three components:
  – Representation
  – Evaluation
  – Optimization
ML Components
• Representation
  – Numerical functions
    ● Linear regression
    ● Neural networks
    ● Support vector machines
  – Symbolic functions
    ● Decision trees
    ● Sets of rules / Logic programs
  – Instance-based functions
    ● Nearest-neighbor
    ● Case-based
  – Probabilistic Graphical Models
    ● Naïve Bayes
    ● Bayesian networks
    ● Hidden Markov Models (HMMs)
    ● Probabilistic Context-Free Grammars (PCFGs)
    ● Markov networks
ML Components
• Various Search/Optimization Algorithms
  – Gradient descent (see the sketch after this list)
    ● Perceptron
    ● Backpropagation
  – Dynamic Programming
    ● HMM learning
    ● PCFG learning
  – Divide and Conquer
    ● Decision tree induction
    ● Rule learning
  – Evolutionary Computation
    ● Genetic Algorithms (GAs)
    ● Genetic Programming (GP)
    ● Neuro-evolution
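The perceptron rule from the list above can be read as stochastic gradient descent applied only to misclassified examples. A minimal training sketch in Python; the AND data, learning rate, and epoch count are made up for illustration:

import numpy as np

def perceptron(X, y, lr=1.0, epochs=20):
    """Labels y in {-1, +1}; update weights only on misclassified examples."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:  # misclassified (or on the boundary)
                w += lr * yi * xi              # nudge the boundary toward xi
                b += lr * yi
    return w, b

# Usage: learn the AND function with -1/+1 labels.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w, b = perceptron(X, y)
print(np.sign(X @ w + b))  # [-1. -1. -1.  1.]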
ML Components
• Evaluation
– Accuracy
– Precision and recall
– Squared error
– Likelihood
– Posterior probability
– Cost / Utility
– Margin
– Entropy
– K-L divergence
– Etc.
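Several of the metrics above are one-liners in scikit-learn (which appears later in these slides). A minimal sketch on made-up labels:

from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, mean_squared_error)

y_true = [1, 0, 1, 1, 0, 1]  # illustrative ground truth
y_pred = [1, 0, 0, 1, 0, 1]  # illustrative predictions

print(accuracy_score(y_true, y_pred))      # 5/6 correct
print(precision_score(y_true, y_pred))     # TP / (TP + FP) = 3/3
print(recall_score(y_true, y_pred))        # TP / (TP + FN) = 3/4
print(mean_squared_error(y_true, y_pred))  # squared error = 1/6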
Types of Learning
• Supervised (inductive) learning
  – Training data includes desired outputs
• Unsupervised learning
  – Training data does not include desired outputs
• Reinforcement learning
  – Learning from rewards for sequences of actions
Reinforcement learning
• Learning to play Breakout
• https://www.youtube.com/watch?v=V1eYniJ0Rnk
Clustering
• Crime prediction using k-means clustering
• http://www.grdjournals.com/uploads/article/GRDJE/V02/I05/0176/GRDJEV02I050176.pdf
Machine learning algorithms
• Regression:
  Ridge regression, Support Vector Machines, Random Forest, Multilayer Neural Networks, Deep Neural Networks, ...
• Classification:
  Naive Bayes, Support Vector Machines, Random Forest, Multilayer Neural Networks, Deep Neural Networks, ...
• Clustering:
  k-Means, Hierarchical Clustering, ...
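As a small illustration of the clustering entry above, a minimal sketch with scikit-learn's KMeans; the 2-D points are made up:

from sklearn.cluster import KMeans
import numpy as np

# Toy 2-D points forming two loose groups.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster index assigned to each point
print(kmeans.cluster_centers_)  # one centroid per cluster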
Issues
• Many machine learning/AI projects fail (Gartner claims 85%)
Reasons for failure
• Asking the wrong question
• Trying to solve the wrong problem
• Not having enough data
• Not having the right data
• Having too much data
• Hiring the wrong people
• Using the wrong tools
• Not having the right model
• Not having the right yardstick
Frameworks
• Programming languages (fast-evolving ecosystem!)
  – Python
  – R
  – C++
  – ...
• Many libraries
  – scikit-learn ("classic" machine learning)
  – PyTorch, TensorFlow, Keras (deep learning frameworks)
  – …
scikit-learn
• Nice end-to-end framework
  – data exploration (+ pandas + holoviews)
  – data preprocessing (+ pandas)
    ● cleaning/missing values
    ● normalization
  – training
  – testing
  – application
• "Classic" machine learning only
• https://scikit-learn.org/stable/
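A minimal end-to-end sketch of the stages above (splitting, preprocessing, training, testing) with the scikit-learn API; the choice of the bundled iris dataset and logistic regression is illustrative:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Hold out a test set for later evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Preprocessing (normalization) and training chained in one pipeline.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Testing: accuracy on the held-out data.
print(model.score(X_test, y_test))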
Keras
• High-level framework for deep learning
• TensorFlow backend
• Layer types
– dense
– convolutional
– pooling
– embedding
– recurrent
– activation
– …
• https://keras.io/
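A minimal Keras sketch wiring up a few of the layer types listed above (dense layers with activations); the layer sizes are illustrative, not from the slides:

from tensorflow import keras

# Small fully-connected network: 4 inputs -> 8 hidden units -> 3 classes.
model = keras.Sequential([
    keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(X_train, y_train, epochs=10)  # train on your own data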
Supervised and Unsupervised Learning
• Unsupervised Learning
  • There is no predefined, known set of outcomes
  • Look for hidden patterns and relations in the data
  • A typical example: Clustering
[Figure: k-means clusters of the iris data (irisCluster$cluster), Petal.Width vs. Petal.Length]
Supervised and Unsupervised Learning
• Supervised Learning
  • For every example in the data there is always a predefined outcome
  • Models the relations between a set of descriptive features and a target (fits data to a function)
  • Two groups of problems:
    • Classification
    • Regression
Supervised Learning
• Classification
  • Predicts which class a given sample of data (sample of descriptive features) is part of (a discrete value).
[Figure: confusion matrix (percent) for iris classification; e.g. 96.0% of virginica and 96.0% of versicolor samples predicted correctly]
• Regression
  • Predicts a continuous (real) value.
Machine Learning as a Process
• Define Objectives
  – Define measurable and quantifiable goals
  – Use this stage to learn about the problem
• Data Preparation
  – Normalization
  – Transformation
  – Missing values
  – Outliers
• Model Deployment
ML as a Process: Data Preparation
• Needed for several reasons
  • Some models have strict data requirements
    • Scale of the data, data point intervals, etc.
  • Some characteristics of the data may have a dramatic impact on model performance
• Time spent on data preparation should not be underestimated
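A minimal sketch of two common preparation steps named above (filling missing values and normalizing scale), using scikit-learn's SimpleImputer and StandardScaler on made-up data:

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Illustrative data with one missing value (np.nan).
X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, 6.0]])

X = SimpleImputer(strategy="mean").fit_transform(X)  # replace NaN with column mean
X = StandardScaler().fit_transform(X)                # zero mean, unit variance
print(X)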
ML as a Process: Feature engineering
• Determining the predictors (features) to be used is one of the most critical questions
• Sometimes we need to add predictors
• Reduce the number of predictors:
  • Fewer predictors: a more interpretable and less costly model
  • Most models are affected by high dimensionality, especially by non-informative predictors
• Selection approaches:
  – Wrappers: algorithms that use multiple models, adding and removing parameters, with models as input and a performance measure as output (e.g. Genetic Algorithms)
  – Filters: evaluate the relevance of each predictor, normally based on correlations
• Binning predictors
View of Std ML Datasets
• A single table (2D array): one row per example
• Columns: Feature 1, Feature 2, ..., Feature N, and the Output (Category)
• Data Splitting
  • Allocate data to different tasks
    • model training
    • performance evaluation
  • Define Training, Validation and Test sets
• Feature Selection (review the decisions made previously)
• Estimating Performance
  • Visualization of results – discovering interesting areas of the problem space
  • Statistics and performance measures
• Evaluation and Model selection
  • The 'no free lunch' theorem: no a priori assumptions can be made
  • Avoid simply reaching for a favorite model
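A minimal sketch of data splitting plus cross-validated model selection with the scikit-learn API; the candidate models and dataset are illustrative:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Keep a test set untouched for the final performance estimate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Compare candidate models on the training data via 5-fold cross-validation.
for model in (DecisionTreeClassifier(), KNeighborsClassifier()):
    scores = cross_val_score(model, X_train, y_train, cv=5)
    print(type(model).__name__, scores.mean())
# Only the finally chosen model gets evaluated on the held-out test set.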
Nearest Neighbors: Basic Algorithm for Classification
• Find the K nearest neighbors to the test-set example
  • Or find all examples within radius R
• Combine their 'votes'
  – Most common category
  – Average value (real-valued prediction)
  – Can also weight votes by distance
  – Lots of variations on the basic theme
[Figure: 2-D scatter of + and - training points with a query point marked ?]
Simple Example: 1-NN
(1-NN ≡ one nearest neighbor)
Training Set
1. a=0, b=0, c=1 → +
2. a=0, b=0, c=0 → -
3. a=1, b=1, c=1 → -
Test Example
a=0, b=1, c=0 → ?
"Hamming Distance" (# of differing bits)
Ex 1 = 2
Ex 2 = 1
Ex 3 = 2
Ex 2 is nearest, so output -
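The same 1-NN computation as a minimal Python sketch:

# Training set from the slide: ((a, b, c), label) pairs.
train = [((0, 0, 1), "+"), ((0, 0, 0), "-"), ((1, 1, 1), "-")]
test = (0, 1, 0)

def hamming(u, v):
    """Number of differing bits."""
    return sum(a != b for a, b in zip(u, v))

# 1-NN: take the label of the single closest training example.
nearest = min(train, key=lambda ex: hamming(ex[0], test))
print(nearest[1])  # "-" (Ex 2 is at distance 1)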
From neurons to ANNs
• Inspiration: the biological neuron
• Inputs $x_1, x_2, \ldots, x_N$ with weights $w_1, w_2, \ldots, w_N$ and a bias $b$ (drawn as a $+1$ input)
• Output: $y = \sigma\left(\sum_{i=1}^{N} w_i x_i + b\right)$, where $\sigma(x)$ is the activation function
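A minimal numpy sketch of the formula above with a sigmoid activation; the input and weight values are illustrative:

import numpy as np

def neuron(x, w, b):
    """Single artificial neuron: y = sigma(sum_i w_i x_i + b), sigmoid sigma."""
    z = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 0.5, -1.0])   # inputs x_1..x_3
w = np.array([0.2, -0.4, 0.1])   # weights w_1..w_3
b = 0.3                          # bias
print(neuron(x, w, b))  # sigma(0.2) ≈ 0.55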
Multilayer network
• How to determine the weights?
Training: backpropagation
• Initialize weights "randomly"
• For all training epochs
  • for all input-output pairs in the training set
    • using the input, compute the output (forward)
    • compare the computed output with the training output
    • adapt the weights (backward) to improve the output
• if accuracy is good enough, stop
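A minimal sketch of this loop for a single sigmoid unit (logistic regression); a real multilayer network would propagate the error backward through every layer. The OR data and hyperparameters are illustrative:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, lr=0.5, epochs=1000):
    """One sigmoid unit trained by gradient descent on cross-entropy loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        out = sigmoid(X @ w + b)        # forward: compute output
        err = out - y                   # compare with training output
        w -= lr * (X.T @ err) / len(y)  # backward: adapt weights
        b -= lr * err.mean()
    return w, b

# Usage: learn the OR function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 1], dtype=float)
w, b = train(X, y)
print(np.round(sigmoid(X @ w + b)))  # [0. 1. 1. 1.]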
Task: handwritten digit recognition
• Input data
  • grayscale image
• Output data
  • digit 0, 1, ..., 9
• Training examples
• Test examples
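A minimal sketch of loading a standard handwritten-digit dataset (MNIST) through Keras; keras.datasets.mnist is part of the real Keras API:

from tensorflow import keras

# 28x28 grayscale images with labels 0..9, pre-split into train/test sets.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
print(x_train.shape, y_train.shape)  # (60000, 28, 28) (60000,)
print(x_test.shape, y_test.shape)    # (10000, 28, 28) (10000,)

x_train = x_train / 255.0  # scale pixel values to [0, 1]
x_test = x_test / 255.0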
Deep neural networks
• Many layers
• Features are learned, not given
• Low-level features combined into high-level features
Convolution examples
[Figure: example convolution kernels shown as 0/1 matrices]
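A minimal sketch of what a convolution does, applying a small edge-detecting kernel with scipy.signal.convolve2d (a real scipy function); the image and kernel are illustrative:

import numpy as np
from scipy.signal import convolve2d

# Tiny "image" with a vertical edge between columns 1 and 2.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)

# 1x2 kernel that responds where neighboring pixels differ.
kernel = np.array([[1.0, -1.0]])

print(convolve2d(image, kernel, mode="valid"))  # nonzero only at the edge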
Task: sentiment analysis
• Training examples
• Test examples
Sample review: "<start> this film was just brilliant casting location ... myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it ..."
Word embedding
• Represent words as one-hot vectors
  • length = vocabulary size
  • Issues:
    • unwieldy
    • no semantics
• Word embeddings
  • dense vector
  • vector distance ≈ semantic distance
• Training
  • use context
  • discover relations with surrounding words
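A minimal sketch contrasting one-hot vectors with a trainable dense embedding, via Keras's real Embedding layer; the tiny vocabulary and dimensions are illustrative:

import numpy as np
from tensorflow import keras

vocab = ["the", "film", "was", "brilliant"]
vocab_size = len(vocab)

# One-hot: one dimension per word; all words are equally far apart.
one_hot = np.eye(vocab_size)
print(one_hot[vocab.index("film")])  # [0. 1. 0. 0.]

# Dense embedding: each word id maps to a short trainable vector.
embedding = keras.layers.Embedding(input_dim=vocab_size, output_dim=3)
word_ids = np.array([[vocab.index("film")]])
print(embedding(word_ids).numpy().shape)  # (1, 1, 3)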
End