Chapter 5 - Machine Learning

Debark University
Chapter 5: Overview of machine learning

Target: 3rd-year extension students in CS
Compiled by Mikiyas A. (MSc in Computer Science)
What is machine learning?
• Many definitions have been proposed for the term machine learning (ML).
• Machine learning is a branch of artificial intelligence concerned with the design
and development of algorithms that allow computers to evolve behaviours based
on empirical data.
• Arthur Samuel (1959) defined machine learning as “a subfield of computer
science that gives computers the ability to learn without being explicitly
programmed.”
• This means that an ML system is able to perform a specified task without being
directly told how to do it.
• Example: distinguishing between spam and valid email messages. Given a set of
manually labelled good and bad email examples, an algorithm can
automatically learn a set of rules that distinguish them.
• A widely accepted formal definition by Tom Mitchell (1997):
A computer program is said to learn from experience E with respect to some
class of tasks T and performance measure P, if its performance at the tasks
in T, as measured by P, improves with experience E.
• In short, machine learning is:
• a set of computer programs that automatically learn from past experiences
(examples or a training corpus), and
• improve the performance of intelligent programs.
• Example: According to this definition, we can reformulate the email problem as the task of
identifying spam messages (task T) using the data of previously labelled email messages
(experience E) through a machine learning algorithm, with the goal of improving future
email spam labelling (measure P).
• Example: Handwriting recognition problem
Task T: Recognizing handwritten characters
Performance measure P: Percent of characters correctly classified
Training experience E: A database of handwritten characters with given
classifications
• In today’s data science:
• ML aims to select, explore and extract useful knowledge from complex,
often non-linear data, building a computational model capable of describing
unknown patterns or correlations and, in turn, solving challenging problems.
• This learning process is often carried out through repeated exposure to the
defined problem (dataset), allowing the model to achieve self-optimization
and continuously enhance its ability to solve new, previously unseen
problems.
Artificial Intelligence vs. Machine Learning vs. Deep Learning
Probability Theory in Machine Learning
• Probability is a key concept in dealing with uncertainty.
• Three main sources of uncertainty in machine learning: noisy data, incomplete
coverage of the problem domain, and imperfect models.
• Probability Theory
• Framework for quantification and manipulation of uncertainty
• Provides the foundation and tools for quantifying, handling, and harnessing
uncertainty in applied machine learning.
• Probability provides the basis for developing specific algorithms, such as Naive
Bayes and Bayesian Belief Networks.
Basic Concepts on Probability
• Probability. The word probability is actually undefined, but the probability of an
event can be explained as the proportion of times, under identical circumstances,
that the event can be expected to occur.
• It is the event's long-run frequency of occurrence.
• For example, the probability of getting a head on a coin toss = 0.5. If you
toss a coin repeatedly, for a long time, you will note that a head occurs
about one half of the time.
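The long-run frequency idea can be illustrated with a short simulation. This is only a sketch: the function name `head_frequency` and the fixed random seed are assumptions made for illustration, not part of the original slides.

```python
import random

def head_frequency(num_tosses, seed=0):
    """Return the fraction of heads in num_tosses simulated fair coin tosses."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    heads = sum(1 for _ in range(num_tosses) if rng.random() < 0.5)
    return heads / num_tosses

# The observed frequency should drift towards 0.5 as the number of tosses grows.
for n in (10, 100, 10_000, 100_000):
    print(n, head_frequency(n))
```

With few tosses the frequency can be far from 0.5; with many tosses it settles close to it.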
• Probability vs. Statistics.
• In probability, the population is known, and we reason from the population to the sample.
• In statistics, you draw inferences about the population from the sample.
Probabilistic vs Statistical Reasoning
• Suppose I know exactly the proportions of car makes in California. Then I
can find the probability that the first car I see in the street is a Ford. This is
probabilistic reasoning, as I know the population and predict the sample.
• Now suppose that I do not know the proportions of car makes in California,
but would like to estimate them. I observe a random sample of cars in the
street and then I have an estimate of the proportions of the population. This is
statistical reasoning.
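The two directions of reasoning can be sketched in a few lines. The car-make proportions below are hypothetical numbers invented for illustration (they are not real data), and `estimate_proportions` is an assumed helper name.

```python
import random
from collections import Counter

# Hypothetical population proportions of car makes (illustrative numbers only).
population = {"Ford": 0.20, "Toyota": 0.35, "Honda": 0.25, "Other": 0.20}

# Probabilistic reasoning: the population is known, so the probability that
# the first car seen is a Ford can be read off directly.
p_ford = population["Ford"]

def estimate_proportions(sample):
    """Statistical reasoning: estimate population proportions from a sample."""
    counts = Counter(sample)
    return {make: counts[make] / len(sample) for make in counts}

rng = random.Random(1)  # fixed seed so the run is repeatable
makes = list(population)
weights = [population[make] for make in makes]
sample = rng.choices(makes, weights=weights, k=5000)  # cars observed in the street
estimates = estimate_proportions(sample)
print(p_ford, estimates)
```

The estimates come from the sample alone; they approach the true proportions as the sample grows.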
Key Terms
• Classical (or theoretical) probability is used when each outcome in a sample space is equally
likely to occur. With theoretical probability, we do not actually conduct an experiment. The
classical probability for event E is given by
$$P(E) = \frac{\text{Number of outcomes in event } E}{\text{Total number of outcomes in sample space}}$$
• Toss a coin and the probability of getting a head or a tail is 1/2. P(head) = 1/2, P(tail) = 1/2
• Example 1: Find the probability of rolling a 6 on a fair die. Answer: The sample space for rolling a die
has 6 equally likely results: {1, 2, 3, 4, 5, 6}. The probability of rolling a 6 is 1/6.
• If you roll a fair die, what is the probability of rolling an even number?
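The classical formula can be checked by counting outcomes. The sketch below uses an assumed helper name, `classical_probability`, and applies it to the die examples above, including the even-number question.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """P(E) = (outcomes in E) / (outcomes in the sample space), all equally likely."""
    favourable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favourable), len(sample_space))

die = {1, 2, 3, 4, 5, 6}
p_six = classical_probability({6}, die)          # rolling a 6
p_even = classical_probability({2, 4, 6}, die)   # rolling an even number
print(p_six, p_even)  # 1/6 1/2
```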
• Empirical (or statistical) probability, also called experimental probability, is based on observations
obtained from probability experiments. Empirical probability is found by repeating an experiment and
observing the outcomes. Each observation in an experiment is called a trial. The empirical probability of an
event E is the relative frequency of event E.
$$P(E) = \frac{\text{Frequency of event } E}{\text{Total frequency}}$$
Examples of empirical probability
1. You tossed a coin 10 times and recorded a head 3 times and a tail 7 times. P(head) = 3/10, P(tail) = 7/10
2. A survey was conducted to determine students' favourite brands of sneakers. Each student chose only one
brand from the list of brands A (12), B (15), C (24), D (26), or E (13). What is the probability that a student's
favourite sneaker was brand D? Answer: There were 12 + 15 + 24 + 26 + 13 = 90 "trials" in this experiment
(each student's response was a trial). 26 out of the 90 students chose brand D. The probability is
26/90 = 13/45.
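The sneaker survey can be reproduced as a relative-frequency calculation. The function name `empirical_probabilities` is an assumption; the counts are the ones given in the example.

```python
from fractions import Fraction

def empirical_probabilities(frequencies):
    """Relative frequency of each event: frequency of event / total frequency."""
    total = sum(frequencies.values())
    return {event: Fraction(count, total) for event, count in frequencies.items()}

sneakers = {"A": 12, "B": 15, "C": 24, "D": 26, "E": 13}
probs = empirical_probabilities(sneakers)
print(probs["D"])  # 13/45
```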
• Simple probability. P(A). The probability that an event (say, A) will occur.
• Joint probability. P(A and B). P(A ∩ B). The probability of events A and B occurring together.
• Conditional probability. P(A|B), read "the probability of A given B." The probability that event A
will occur given event B has occurred.
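A small worked example may help connect the three terms. It assumes one roll of a fair die with two illustrative events (A: the roll is even, B: the roll is greater than 3) and uses the standard identity P(A|B) = P(A ∩ B) / P(B); the events are chosen for illustration only.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
A = {2, 4, 6}                       # event A: the roll is even
B = {4, 5, 6}                       # event B: the roll is greater than 3

def prob(event):
    """Classical probability of an event within the sample space."""
    return Fraction(len(event & sample_space), len(sample_space))

p_a = prob(A)                        # simple probability P(A)
p_a_and_b = prob(A & B)              # joint probability P(A and B)
p_a_given_b = p_a_and_b / prob(B)    # conditional probability P(A|B)
print(p_a, p_a_and_b, p_a_given_b)   # 1/2 1/3 2/3
```

Knowing that B occurred raises the probability of A from 1/2 to 2/3, because two of the three outcomes in B are even.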
Probability Distributions
• A probability distribution is a list of all possible outcomes of a random variable along
with their associated probability values.
• Example 1: the probability distribution of a fair 6-sided die.
• Example 2: What is the probability distribution for the number of heads occurring in
three coin tosses?
• A function that represents a discrete probability distribution is called a probability mass function
(PMF).
• A function that represents a continuous probability distribution is called a probability density
function (PDF).
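Example 2 can be worked out by enumerating all equally likely outcomes of three tosses. This is a minimal sketch, assuming a fair coin; `heads_pmf` is a hypothetical helper name.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def heads_pmf(num_tosses):
    """PMF of the number of heads in num_tosses fair coin tosses, by enumeration."""
    outcomes = list(product("HT", repeat=num_tosses))  # e.g. ('H', 'T', 'H')
    counts = Counter(outcome.count("H") for outcome in outcomes)
    return {k: Fraction(counts[k], len(outcomes)) for k in sorted(counts)}

pmf = heads_pmf(3)
for heads, p in pmf.items():
    print(heads, p)
```

The result is a discrete distribution (a PMF): 0 heads with probability 1/8, 1 head with 3/8, 2 heads with 3/8, and 3 heads with 1/8, which sum to 1.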
Traditional vs ML Approach
[Figure: Traditional vs ML approach]
Related Fields
• Machine learning is related to: data mining, optimization, statistics, decision theory,
information theory, cognitive science, databases, psychological models, evolutionary
models, and neuroscience.
• Machine learning is primarily concerned with the accuracy and
effectiveness of the computer system in performing complex
tasks.
Statistics vs. Machine Learning
Statistics | Machine Learning
Inference | Prediction
Small data sets/low-dimensional data | Large data sets/high-dimensional data
Specific assumptions and hypotheses | Large flexibility; free from a priori assumptions (hypothesis-free)
Computation of P values to accept or reject a null hypothesis | ROC curve, cross-validation, etc.
Fitting a parsimonious model to produce easy-to-understand, interpretable results | Considers complex non-linear patterns; a sophisticated model that is not easy to understand or interpret
Classes of Machine Learning problem
• Supervised Learning
• Learn to predict output when given an input vector
• Training data includes desired outputs
• Unsupervised Learning
• The aim is to uncover the underlying structures (classes or clusters) in the data
• Training data does not include desired outputs. This is the new frontier of machine
learning because most big datasets do not come with labels.
• Semi-supervised Learning
• Desired outputs or classes are available for only a part of the training data.
• This approach is useful when it is impractical or too expensive to access or measure
the target variable for all participants
• Reinforcement Learning
• A learning method that interacts with its environment by producing actions and
discovering errors or rewards.
• Uses trial and error to discover which actions maximize the reward and
minimize the penalty.
[Figures: classes of machine learning problems, including clustering and dimensionality reduction]
Machine learning structure
[Figures: machine learning structure]
The Learning Problem
• Given <x, f(x)> pairs, infer f.
x | f(x)
1 | 1
2 | 4
3 | 9
4 | 16
5 | ?
• Given a finite sample, it is often impossible to guess the true function f.
• Approach: Find some pattern (called a hypothesis) in the training examples, and
assume that the pattern will hold for future examples too.
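The table can be turned into a small experiment. The sketch below proposes one hypothesis, f(x) = x², and also a second, deliberately contrived one that fits the same four examples but disagrees at x = 5. Both functions are illustrative assumptions; the true f is not given in the slides.

```python
training_pairs = [(1, 1), (2, 4), (3, 9), (4, 16)]

def hypothesis(x):
    """Candidate pattern: f(x) = x squared (an assumed hypothesis)."""
    return x * x

def rival_hypothesis(x):
    """A contrived rival that also fits every training pair exactly."""
    return x * x + (x - 1) * (x - 2) * (x - 3) * (x - 4)

def is_consistent(h, pairs):
    """True if hypothesis h reproduces every training example."""
    return all(h(x) == fx for x, fx in pairs)

print(is_consistent(hypothesis, training_pairs), hypothesis(5))              # True 25
print(is_consistent(rival_hypothesis, training_pairs), rival_hypothesis(5))  # True 49
```

Both hypotheses agree with the finite sample, which is why the true function cannot be identified from the data alone; choosing the simpler pattern is an extra assumption.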
The machine learning framework
y = f(x), where y is the output, f is the prediction function, and x is the image feature.
• Training: given a training set of labeled examples {(x1,y1), …, (xN,yN)},
estimate the prediction function f by minimizing the prediction error on
the training set.
• Testing: apply f to a never-before-seen test example x and output the
predicted value y = f(x).
Slide credit: L. Lazebnik
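The training and testing steps can be sketched with the simplest possible prediction function, a straight line fitted by least squares. This is an illustration only: the slide refers to image features, whereas the numeric data and the name `fit_line` below are assumptions chosen to keep the example short.

```python
def fit_line(xs, ys):
    """Training: choose slope and intercept minimizing squared error on the training set."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept  # the estimated prediction function f

# Made-up labelled training examples (x_i, y_i).
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [2.1, 3.9, 6.2, 7.8]

f = fit_line(train_x, train_y)   # training
y_pred = f(5.0)                  # testing: apply f to an unseen example
print(y_pred)
```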

Learning—A Two-Step Process
• Model construction:
• A training set is used to create the model.
• The model is represented as classification rules, decision trees,
or mathematical formulae
• Model usage:
• The test set is used to see how well the model works for classifying
future or unknown objects.
Step 1: Model Construction
Training Data → Classification Algorithms → Classifier (Model)

NAME | RANK | YEARS | TENURED
Mike | Assistant Prof | 3 | no
Mary | Assistant Prof | 7 | yes
Bill | Professor | 2 | yes
Jim | Associate Prof | 7 | yes
Dave | Assistant Prof | 6 | no
Anne | Associate Prof | 3 | no

Classifier (Model): IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’
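The learned rule can be written as a function and checked against the training data. This sketch hard-codes the rule rather than learning it; the function name `classify_tenured` is an assumption.

```python
def classify_tenured(rank, years):
    """The rule from the slide: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank.lower() == "professor" or years > 6 else "no"

training_data = [
    ("Mike", "Assistant Prof", 3, "no"),
    ("Mary", "Assistant Prof", 7, "yes"),
    ("Bill", "Professor", 2, "yes"),
    ("Jim", "Associate Prof", 7, "yes"),
    ("Dave", "Assistant Prof", 6, "no"),
    ("Anne", "Associate Prof", 3, "no"),
]

# Names of training examples that the rule gets wrong.
training_errors = [name for name, rank, years, tenured in training_data
                   if classify_tenured(rank, years) != tenured]
print(training_errors)  # []
```

The rule reproduces every label in the training set, which is what model construction aims for; how well it generalizes is a separate question.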
Step 2: Using the Model in Prediction
Testing Data → Classifier (model)
Unseen Data → Classifier (model)

NAME | RANK | YEARS | TENURED
Tom | Assistant Prof | 2 | no
Merlisa | Associate Prof | 7 | no
George | Professor | 5 | yes
Joseph | Assistant Prof | 7 | yes

Unseen data: (Jeff, Professor, 4) → Tenured?
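Model usage can be sketched the same way: apply the rule IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’ to the test set, measure accuracy, then predict for the unseen example. The function name `classify_tenured` is an assumption.

```python
def classify_tenured(rank, years):
    """The classifier rule: tenured if rank is professor or years > 6."""
    return "yes" if rank.lower() == "professor" or years > 6 else "no"

test_data = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]

correct = sum(1 for _, rank, years, tenured in test_data
              if classify_tenured(rank, years) == tenured)
accuracy = correct / len(test_data)

# Prediction for the unseen example (Jeff, Professor, 4).
jeff_prediction = classify_tenured("Professor", 4)
print(accuracy, jeff_prediction)  # 0.75 yes
```

The rule misclassifies Merlisa (7 years but not tenured), so test accuracy is 3 out of 4, which shows why a model is evaluated on data it was not built from.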
Applications of Machine learning
• Prediction (weather, medical, agricultural yield, etc.)
• Face detection; character recognition
• Surveillance and security systems
• Object detection and recognition
• Natural language processing (word sequence prediction, spelling and
grammar checking, speech recognition (dialog systems), translation systems,
information retrieval, news text classification)
• Image segmentation
• Multimedia event detection
• Economic and commercial usage
• Outlier detection: exceptions that are not covered by the rule, e.g., fraud
• … and many more
Challenges in Machine Learning
• Efficiency and scalability of machine learning algorithms

• Handling high-dimensionality
• Handling noise, incomplete and imbalanced data
• Pattern evaluation and knowledge integration
• Protection of security, integrity, and privacy in machine learning
• Data acquisition and representation issues
• Degree of interpretability for predictive power
• Deployment issues
Basic Steps in Machine Learning
1. Data collection
“Training data”, mostly with “labels” provided by a “teacher”
2. Data preprocessing
Clean the data to make it homogeneous
3. Feature engineering
Select representative features to improve performance
4. Modeling
Choose the class of models that can describe the data
5. Estimation/Selection
Find the model that best explains the data: simple and fits well
6. Validation
Evaluate the learned model and compare it to solutions found using other model classes
7. Operation
Apply the learned model to new “test” data or real-world instances
Neural Network (NN)
• An NN is modelled on the system of neurons in the human brain. The human brain is composed of billions
of neurons which are interconnected with each other. A biological neuron consists of three main
components:
1. Dendrites, which are the input signal channels, where the strength of the connections to the
nucleus is affected by weights.
2. Cell body, where computation on the input signals and weights generates output signals
which are delivered to other neurons.
3. Axon, the part which transmits output signals to the other neurons that are connected to
it.
Mathematical Model of a Neuron
Neural Network (NN) classifier
• It is represented as a layered set of interconnected processors.
• These processor nodes and connections resemble the neurons of the
brain and the relationships between them.
• Each node has a weighted connection to several other nodes in adjacent layers.
• Individual nodes take the input received from connected nodes and use
the weights together to compute output values.
• The inputs are fed simultaneously into the input layer.
• The weighted outputs of these units are fed into the hidden layer.
• The weighted outputs of the last hidden layer are inputs to the units making up
the output layer.
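The flow of values through the layers can be sketched as a forward pass. Everything numeric here is hypothetical: the weights, biases, network size (2 inputs, 2 hidden nodes, 1 output) and the choice of a sigmoid activation are assumptions made only to show the mechanics.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """Each node computes a weighted sum of its inputs, adds a bias, and applies sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
            for node_weights, b in zip(weights, biases)]

# Hypothetical parameters: one row of weights per node in the layer.
hidden_weights = [[0.5, -0.4], [0.3, 0.8]]
hidden_biases = [0.0, -0.1]
output_weights = [[1.0, -1.0]]
output_biases = [0.2]

x = [1.0, 2.0]                                                # values fed to the input layer
hidden = layer_forward(x, hidden_weights, hidden_biases)      # hidden layer outputs
output = layer_forward(hidden, output_weights, output_biases) # output layer
print(hidden, output)
```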
Neural Networks Applications
There are two basic goals for neural network research:
• Brain modelling
• Aid our understanding of how the brain works. This helps to understand the nature of perception,
actions, learning and memory, thought and intelligence, and/or to formulate medical solutions for
brain-damaged patients.
• Artificial system construction / real-world applications
• Financial modelling – predicting the stock market
• Time series prediction – climate, weather, seizures
• Computer games – intelligent agents, chess, backgammon
• Robotics – autonomous adaptable robots
• Pattern recognition – speech recognition, seismic activity, sonar signals
• Data analysis – data compression, data mining
• Bioinformatics – DNA sequencing, alignment
Neural Network Architectures
• Network architectures (or topologies) define ways how neurons are mutually connected.
• Early NN Models:
• Perceptron, ADALINE, Hopfield Network
• Current Models
• Multilayer Perceptrons (MLPs)
• Deep Learning Architectures
• Convolutional Neural Networks (CNNs)
• Recurrent Neural Networks (RNNs)
• Radial Basis Function Networks (RBFNs)
Basic Architecture of Perceptron
Multi-layer Perceptron (MLP)
• One of the most popular neural network models is the multi-layer perceptron (MLP).
• In an MLP, neurons are arranged in layers. There is one input layer, one output layer,
and one or more hidden layers.
Hidden layer: Neuron with Activation
The neuron is the basic information processing unit of a NN.
It consists of:
1. A set of links, describing the neuron inputs, with weights w1, w2, …, wm
2. An adder function (linear combiner) for computing the weighted sum of the
inputs (real numbers):
$$z = \sum_{j=1}^{m} w_j x_j$$
3. Activation function (also called squashing function): for
limiting the output behaviour of the neuron.
$$y = \varphi(z + b)$$
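The two formulas translate directly into code. In this sketch the sigmoid is used as the activation function φ and the input, weight and bias values are made up; both choices are assumptions for illustration.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def neuron_output(inputs, weights, bias, activation):
    """y = activation(z + b), where z is the weighted sum of the inputs."""
    z = sum(w * x for w, x in zip(weights, inputs))  # adder / linear combiner
    return activation(z + bias)                      # squashing function

# Made-up example: two inputs, two weights, one bias.
y = neuron_output(inputs=[1.0, 0.5], weights=[0.4, -0.2], bias=0.1, activation=sigmoid)
print(y)
```

Here z = 0.4·1.0 + (−0.2)·0.5 = 0.3, so the neuron outputs sigmoid(0.3 + 0.1).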
Activation functions
$$\text{Step}(x) = \begin{cases} 1, & x \ge 1 \\ 0, & x < 1 \end{cases} \qquad \text{Sign}(x) = \begin{cases} +1, & x \ge 0 \\ -1, & x < 0 \end{cases} \qquad \text{Sigmoid}(x) = \frac{1}{1 + e^{-x}}$$
Changing the bias weight W0,i moves the threshold location.
Bias helps the neural network to be more flexible, since it shifts the activation function left or
right, centering it on some value other than x = 0. To this effect an additional node is
added to the input layer with a constant input, say, 1 or -1, … When this is multiplied by the
weights of the hidden layer, it provides a bias to the activation function.
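The three activation functions and the effect of a bias can be sketched as follows. The `threshold` parameter on `step` and the helper `biased_sigmoid` are assumptions added for illustration; the default threshold of 1 follows the slide.

```python
import math

def step(x, threshold=1.0):
    """Step function: 1 at or above the threshold, 0 below it."""
    return 1 if x >= threshold else 0

def sign(x):
    """Sign function: +1 for x >= 0, -1 otherwise."""
    return 1 if x >= 0 else -1

def sigmoid(x):
    """Sigmoid function: 1 / (1 + e^(-x)), a smooth curve between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def biased_sigmoid(x, bias):
    """Adding a bias shifts the curve: the midpoint 0.5 moves from x = 0 to x = -bias."""
    return sigmoid(x + bias)

print(step(0.5), step(1.0), sign(-2.0), sign(0.0), sigmoid(0.0))
print(biased_sigmoid(2.0, bias=-2.0))  # midpoint now sits at x = 2
```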
Training Multi-layer NN
[Figure: layer-by-layer training; one layer is trained first, then the next layer]
Deep Learning…
Convolutional Neural Networks (CNNs)
• CNNs are a special kind of multi-layer neural network, designed for processing data
with a grid-like input shape, such as the 2D matrix of an image.
• CNNs are typically used for image detection and classification.
• An image is a 2D matrix of pixels on which we run a CNN to either recognize or
classify the image.
• Example: Identify whether an image shows a human being, a car, or just the digits of an address.

Convolutional Neural Network Architecture
• A CNN typically has three types of layers:
• Convolutional layer,
• Pooling layer, and
• Fully connected layer.
• The convolutional layer is the core building block of a CNN, and it is where the majority
of computation occurs.
• The term convolution refers to the mathematical combination of two functions to
produce a third function. It merges two sets of information.
• In the case of a CNN, the convolution is performed on the input data using a
filter (or kernel) to produce a feature map.
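How a filter produces a feature map can be sketched in plain Python. The image and kernel values are made up, and the function slides the kernel without flipping it (strictly a cross-correlation, which is how the operation is commonly implemented in CNN libraries); `convolve2d_valid` is a hypothetical name.

```python
def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        feature_map.append(row)
    return feature_map

# Made-up 4x4 image: dark on the left (0), bright on the right (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Made-up 2x2 kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d_valid(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is non-zero only in the middle column, where the kernel sits on the edge between the dark and bright regions.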
Recurrent Neural Networks (RNN)
• A recurrent neural network (RNN) is an extension of a regular feedforward
neural network that is able to handle variable-length sequential data and
perform time-series prediction.
• Example: If you want to predict the next word in a sentence, you need to
know which words came before it.
• In a sequence problem, the output depends on
• Current input
• Previous output
• Example: Sequence is important for part-of-speech (POS) tagging.
• A traditional neural network cannot capture such relationships.
Typical RNN Architecture
RNN can be seen as an MLP network with the addition of loops to the architecture.
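The loop can be sketched with a single recurrent unit whose state at each step depends on the current input and its own previous state. The scalar weights, the tanh activation and the name `rnn_forward` are all assumptions chosen to keep the example minimal.

```python
import math

def rnn_forward(inputs, w_x, w_h, bias, h0=0.0):
    """Run one recurrent unit over a sequence: h_t = tanh(w_x * x_t + w_h * h_(t-1) + bias)."""
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + bias)  # the loop: h feeds back into itself
        states.append(h)
    return states

# Two sequences that end with the same input (0.0) but differ in what came before.
states_a = rnn_forward([1.0, 0.0], w_x=0.5, w_h=0.8, bias=0.0)
states_b = rnn_forward([0.0, 0.0], w_x=0.5, w_h=0.8, bias=0.0)
print(states_a[-1], states_b[-1])
```

The final outputs differ even though the final input is identical, because the earlier input is carried forward through the state; a plain feedforward network applied to the last input alone could not distinguish the two sequences.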
RNN Example: Guess part of speech (POS)
RNN Applications
• Natural language processing
• E.g. given a sequence of words, an RNN predicts the probability of the next word given the previous ones.
• Machine translation: similar to language modeling
• E.g. Google Translate (English to Amharic)
• Speech recognition:
• Given an input sequence of acoustic signals, produce output phonetic segments.
• Image tagging: RNN + CNN jointly trained.
• The CNN generates features (hidden state representation).
• The RNN reads the CNN features and produces output (end-to-end training).
• Time series prediction: forecast of future values in a time series from past seen values.
• E.g. weather forecasts, financial time series
Assignment 2
• Form groups, then research and study one of the following topics and present it to the class
1. Convolutional neural networks (CNN)
2. Recurrent Neural Networks (RNN)
3. Dimensionality reduction techniques
4. Semi supervised learning
5. Parametric vs non-parametric learning
6. Logistic regression
Thanks a lot !!
