Chapter 5 - Machine Learning
What is machine learning?
Many definitions have been proposed for the term machine learning (ML).
Machine learning is a branch of artificial intelligence, concerned with the design
and development of algorithms that allow computers to evolve behaviours based
on empirical data.
Arthur Samuel (1959) defined machine learning as “a subfield of computer
science that gives computers the ability to learn without being explicitly
programmed.”
This means that an ML system can learn to perform a specified task without being
directly told how to do it.
Example: distinguish between spam and valid email messages. Given a set of
manually labelled good and bad email examples, an algorithm can
automatically learn a set of rules that distinguish them.
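As a minimal illustration of this idea (the classifier choice and the toy messages below are assumptions, not the slides' example), a few labelled messages are enough to train a simple spam filter with scikit-learn:

# Hypothetical sketch: learn spam vs. valid email from labelled examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win money now", "meeting at noon", "cheap pills offer", "project deadline tomorrow"]
labels = ["spam", "ham", "spam", "ham"]   # manually labelled bad/good examples

vectorizer = CountVectorizer()            # turn each message into word counts
X = vectorizer.fit_transform(emails)
model = MultinomialNB().fit(X, labels)    # learn rules that separate the classes

print(model.predict(vectorizer.transform(["win cheap money"])))  # -> ['spam']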
What is machine learning?
A widely accepted formal definition by Tom Mitchell (1997)
A computer program is said to learn from experience E with respect to some
class of tasks T and performance measure P, if its performance at tasks in T,
as measured by P, improves with experience E.
In short
a set of computer programs that automatically learn from past experience
(examples, or a training corpus)
and improve the performance of intelligent programs
Example: According to this definition, we can reformulate the email problem as the task of
identifying spam messages (task T) using previously labelled email messages
(experience E) through a machine learning algorithm, with the goal of improving the accuracy
of future spam labelling (performance measure P)
What is machine learning?
In today's data science:
This learning process is often carried out through repeated exposure to the
defined problem (dataset), allowing the model to achieve self-optimization
and continuously enhance its ability to solve new, previously unseen
problems.
Artificial Intelligence vs. Machine Learning vs. Deep Learning
Probability Theory in Machine Learning
• If I knew the proportions of car makes in California, I could compute the probability
that the next car I see in the street is, say, a Toyota. This is probabilistic reasoning.
• Now suppose that I do not know the proportions of car makes in California,
but would like to estimate them. I observe a random sample of cars in the
street and then I have an estimate of the proportions of the population. This is
statistical reasoning.
Key Terms
• Classical (or theoretical) probability is used when each outcome in a sample space is equally
likely to occur. With theoretical probability, we do not actually conduct an experiment. The
classical probability for event E is given by
P(E) = (Number of outcomes in event E) / (Total number of outcomes in the sample space)
• Toss a coin: the probability of getting a head or a tail is 1/2. P(head) = 1/2, P(tail) = 1/2.
• Example 1: Find the probability of rolling a 6 on a fair die. Answer: The sample space for rolling a die has 6
equally likely results: {1, 2, 3, 4, 5, 6}. The probability of rolling a 6 is 1/6.
• If you roll a fair die, what is the probability of rolling an even number?
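These classical probabilities can be computed directly by counting outcomes; a small sketch (the helper function below is illustrative, not from the slides):

# Classical probability: count outcomes in the event vs. the sample space.
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}

def classical_probability(event, space):
    # P(E) = number of outcomes in E / total number of outcomes in the sample space
    return Fraction(len(event & space), len(space))

print(classical_probability({6}, sample_space))        # 1/6
print(classical_probability({2, 4, 6}, sample_space))  # 1/2, the even-number question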
Empirical (or statistical) probability, also called experimental probability, is based on observations
obtained from probability experiments. Empirical probability is found by repeating an experiment and
observing the outcomes. Each observation in an experiment is called a trial. The empirical probability of
event E is the relative frequency of event E.
P(E) = (Frequency of event E) / (Total frequency)
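A quick simulation (illustrative; the coin experiment and trial count are assumptions) shows the empirical probability approaching the classical value of 1/2 as the number of trials grows:

# Empirical probability: repeat the experiment and observe the relative frequency.
import random

trials = 10_000
heads = sum(random.choice(["head", "tail"]) == "head" for _ in range(trials))

print(heads / trials)  # P(head) = frequency of heads / total trials, close to 0.5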
Key Terms
• Simple probability. P(A). The probability that an event (say, A) will occur.
• Joint probability. P(A and B). P(A ∩ B). The probability of events A and B occurring together.
• Conditional probability. P(A|B), read "the probability of A given B." The probability that event A
will occur given that event B has occurred: P(A|B) = P(A ∩ B) / P(B).
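A toy numerical example (the email counts here are made up for illustration):

# Joint and conditional probability from counts over 100 emails.
n_total = 100
n_spam = 30                 # 30 emails are spam
n_spam_and_link = 24        # emails that are spam AND contain a link

p_spam = n_spam / n_total                      # simple probability P(spam) = 0.30
p_spam_and_link = n_spam_and_link / n_total    # joint probability P(spam ∩ link) = 0.24
p_link_given_spam = p_spam_and_link / p_spam   # conditional P(link | spam) = 0.80

print(p_spam, p_spam_and_link, p_link_given_spam)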
Probability Distributions
• A probability distribution is a list of all possible outcomes of a random variable along
with their associated probability values.
• Example 1: the probability distribution of a fair 6-sided die.
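A minimal sketch of this distribution (the dictionary representation is an assumption):

# Probability distribution of a fair 6-sided die: outcomes and their probabilities.
distribution = {outcome: 1 / 6 for outcome in range(1, 7)}

print(distribution)                  # each of the six outcomes has probability 1/6
print(sum(distribution.values()))    # the probabilities sum to 1.0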
Related Fields
Machine learning draws on many related fields: statistics, data mining, optimization,
decision theory, information theory, databases, cognitive science, psychological models,
evolutionary models, and neuroscience.
Classes of Machine Learning Problems
• Supervised Learning
• Learn to predict an output when given an input vector (a minimal sketch follows this list).
• Training data includes the desired outputs.
• Unsupervised Learning
• The aim is to uncover the underlying structures (classes or clusters) in the data.
• Training data does not include desired outputs. This is the new frontier of machine
learning, because most big datasets do not come with labels.
• Semi-supervised Learning
• Desired outputs or classes are available for only a part of the training data.
• This approach is useful when it is impractical or too expensive to access or measure
the target variable for all participants.
• Reinforcement Learning
• A learning method that interacts with its environment by producing actions and
discovering errors or rewards.
• On the basis of trial and error, it discovers which actions maximize reward and
minimize penalty.
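A minimal supervised-learning sketch (the classifier and toy data are assumptions for illustration):

# Supervised learning: learn from (input vector, desired output) pairs, then predict.
from sklearn.neighbors import KNeighborsClassifier

X_train = [[0, 0], [0, 1], [1, 0], [1, 1]]   # input vectors
y_train = [0, 0, 1, 1]                       # desired outputs included in training data

model = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
print(model.predict([[0.9, 0.2]]))           # predicts the label of an unseen input -> [1]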
Examples of unsupervised learning problems include clustering and dimensionality reduction.
Machine learning structure
The Learning Problem
• Given <x, f(x)> pairs, infer the function f: y = f(x)
• Model usage:
• the test set is used to see how well the model works for classifying
future or unknown objects
Step 1: Model Construction
Training data is fed into a classification algorithm, which learns a classifier model.

Training data:
NAME     RANK            YEARS   TENURED
Tom      Assistant Prof  2       no
Merlisa  Associate Prof  7       no
George   Professor       5       yes
Joseph   Assistant Prof  7       yes

Step 2: Model Usage
Testing data, and later unseen data, are fed to the classifier model; for the unseen
record (Jeff, Professor, 4) the model predicts the value of Tenured?
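A hypothetical sketch of these two steps on the tenure table (the numeric encoding and the classifier choice are assumptions, not the slides' pipeline):

# Step 1: construct a classifier model from the training data.
from sklearn.tree import DecisionTreeClassifier

rank = {"Assistant Prof": 0, "Associate Prof": 1, "Professor": 2}

X_train = [[rank["Assistant Prof"], 2],
           [rank["Associate Prof"], 7],
           [rank["Professor"], 5],
           [rank["Assistant Prof"], 7]]
y_train = ["no", "no", "yes", "yes"]       # the TENURED column

model = DecisionTreeClassifier().fit(X_train, y_train)

# Step 2: use the model on unseen data, e.g. (Jeff, Professor, 4).
print(model.predict([[rank["Professor"], 4]]))  # the model's Tenured? prediction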
Applications of Machine learning
Prediction (weather, medical, agricultural yield, etc.)
Face detection; character recognition;
Surveillance and security system
Object detection and recognition
Natural language processing (word sequence prediction, spelling and
grammar checking, speech recognition (dialog systems), translation systems,
information retrieval, news text classification)
Image segmentation
Multimedia event detection
Economic and commercial applications
Outlier detection: Exceptions that are not covered by the rule, e.g., fraud
… and many, many more
Challenges in Machine Learning
Neural Network (NN)
• A neural network is a replica of the neuron system in the human brain, which is composed
of billions of neurons interconnected with each other. A biological neuron consists of three main
components:
1. Dendrites: the channels for input signals, where the strength of each connection to the
nucleus is affected by a weight.
2. Cell body: where the computation on the input signals and weights generates the output
signal, which is delivered to other neurons.
3. Axon: the part that transmits the output signal to the other neurons connected to it.
Mathematical Model of a Neuron
Neural Network (NN) classifier
It is represented as a layered set of interconnected processors.
These processor nodes and their connections resemble the neurons of the brain.
Each node has a weighted connection to several other nodes in adjacent layers.
Each node takes the input received from connected nodes and uses it,
together with the weights, to compute its output value.
The inputs are fed simultaneously into the input layer.
The weighted outputs of these units are fed into the hidden layer.
The weighted outputs of the last hidden layer are the inputs to the units making up
the output layer.
Neural Networks Applications
There are two basic goals for neural network research:
Brain modelling
• Aid our understanding of how the brain works. This helps us understand the nature of perception,
action, learning and memory, thought and intelligence, and/or formulate medical solutions for
brain-damaged patients.
Artificial System Construction/ real world applications.
•Financial modelling – predicting the stock market
•Time series prediction – climate, weather, seizures
•Computer games – intelligent agents, chess, backgammon
•Robotics – autonomous adaptable robots
•Pattern recognition – speech recognition, seismic activity, sonar signals
•Data analysis – data compression, data mining
•Bioinformatics – DNA sequencing, alignment
Neural Network Architectures
• Network architectures (or topologies) define how the neurons are connected to one another.
• Early NN Models:
• Perceptron, ADALINE, Hopfield Network
• Current Models
• Multilayer Perceptrons (MLPs)
• Deep Learning Architectures
• Convolutional Neural Networks (CNNs)
• Recurrent Neural Networks (RNNs)
• Radial Basis Function Networks (RBFNs)
Basic Architecture of Perceptron
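A minimal perceptron sketch (the hand-set weights and the logical-AND example are illustrative assumptions):

# Perceptron: weighted sum of inputs plus bias, passed through a step activation.
def perceptron(x, w, b):
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if z >= 0 else 0

# With these weights, the perceptron computes logical AND.
print(perceptron([1, 1], w=[1, 1], b=-1.5))  # -> 1
print(perceptron([1, 0], w=[1, 1], b=-1.5))  # -> 0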
Multi-layer Perceptron (MLP)
• One of the most popular neural network models is the multi-layer perceptron (MLP).
• In an MLP, neurons are arranged in layers. There is one input layer, one output layer,
and several (or many) hidden layers.
Hidden layer: Neuron with Activation
1. Inputs: each input x_j reaches the neuron over a connection with weight w_j.
2. Summation: the neuron computes the weighted sum of its inputs:
z = Σ_j w_j x_j
3. Activation function (also called squashing function): for limiting the output
behaviour of the neuron:
y = φ(z + b)
where b is the bias.
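A sketch of this computation (the sigmoid choice for φ is an assumption):

# One artificial neuron: weighted sum z, then activation y = phi(z + b).
import math

def neuron(x, w, b):
    z = sum(wj * xj for wj, xj in zip(w, x))   # z = sum_j w_j * x_j
    return 1 / (1 + math.exp(-(z + b)))        # sigmoid squashes output into (0, 1)

print(neuron([0.5, -1.0], w=[0.8, 0.2], b=0.1))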
Activation functions
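Common activation functions include the sigmoid, tanh, and ReLU; a small sketch (this particular selection is illustrative, not the slides' list):

# Three widely used activation (squashing) functions.
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))   # output in (0, 1)

def tanh(z):
    return math.tanh(z)             # output in (-1, 1)

def relu(z):
    return max(0.0, z)              # passes positives, zeroes out negatives

print(sigmoid(0.0), tanh(0.0), relu(-2.0))  # 0.5 0.0 0.0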
Convolutional Neural Networks (CNN)
• CNNs are a special kind of multi-layer neural network, designed for processing data
whose input has the shape of a 2D matrix, such as images.
• Images are 2D matrices of pixels, on which we run a CNN to recognize or classify
the image.
• Convolutional layer is the core building block of a CNN, and it is where the majority
of computation occurs.
• The term convolution refers to the mathematical combination of two functions to
produce a third function. It merges two sets of information.
• In the case of a CNN, the convolution is performed on the input data using a
filter or kernel, producing a feature map.
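A bare-bones 2D convolution with valid padding (the tiny image and kernel are made-up examples):

# Slide the kernel over the image; each position yields one feature-map value.
def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + m][j + n] * kernel[m][n]
                 for m in range(kh) for n in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

image = [[1, 2, 0],
         [0, 1, 3],
         [4, 0, 1]]
kernel = [[1, -1],
          [1, -1]]    # a hypothetical vertical-edge detector

print(conv2d(image, kernel))  # 2x2 feature map: [[-2, 0], [3, -3]]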
Recurrent Neural Networks (RNN)
• A recurrent neural network (RNN) is an extension of a regular feedforward
neural network that is able to handle variable-length sequential data and
time-series prediction.
• Example: If you want to predict the next word in a sentence you need to
know which words came before it.
• In sequence problem, the output depends on
• Current Input
• Previous Output
• Example: Sequence is important for part of speech (POS) tagging
• Traditional neural network cannot capture such relationship.
Typical RNN Architecture
RNN can be seen as an MLP network with the addition of loops to the architecture.
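A sketch of that loop (a single hidden unit; the weights here are arbitrary assumptions):

# One recurrent step: the new state depends on the current input AND the previous state.
import math

def rnn_step(x_t, h_prev, w_x=0.5, w_h=0.9, b=0.0):
    return math.tanh(w_x * x_t + w_h * h_prev + b)

h = 0.0
for x in [1.0, 0.0, 1.0]:   # a toy input sequence
    h = rnn_step(x, h)
    print(round(h, 3))      # the hidden state carries context forward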
RNN Example: Guess part of speech (POS)
RNN Applications
• Natural language processing
• E.g. Given a sequence of words, RNN predicts the probability of next word given the previous ones.
• Machine translation: Similar to language modeling
• E.g. Google Translate (English to Amharic)
• Speech recognition:
• Given an input sequence of acoustic signals, produce output phonetic segments.
• Image tagging: RNN + CNN jointly trained.
• CNN generates features (hidden state representation).
• RNN reads CNN features and produces output (end-to-end training).
• Time series prediction: forecasting future values in a time series from past observed values.
• E.g., weather forecasts, financial time series
Assignment 2
• Form groups, then research one of the following topics and present it to the class:
1. Convolutional neural networks (CNN)
2. Recurrent Neural Networks (RNN)
3. Dimensionality reduction techniques
4. Semi-supervised learning
5. Parametric vs non-parametric learning
6. Logistic regression
Thanks a lot!