
Introduction to machine learning
© Copyright IBM Corporation 2018


Course materials may not be reproduced in whole or in part without the prior written permission of IBM.
Unit objectives
• Explain what machine learning is.
• Describe machine learning types and approaches.
• List different machine learning algorithms.
• Explain what neural networks and deep learning are, and why they are
important in today’s AI field.
• Explain how to evaluate your machine learning model.



What is machine learning



Topics

• What is machine learning
• Machine learning algorithms
• What are neural networks
• What is deep learning
• How to evaluate a machine learning model



Machine learning

• In 1959, the term “machine learning” was first introduced by Arthur Samuel. He defined it as the “field of study that gives computers the ability to learn without being explicitly programmed”.
• The learning process improves the machine model over time by using
training data.
• The evolved model is used to make future predictions.



What is a statistical model

• A model in a computer is a mathematical function that represents a relationship or mapping between a set of inputs and a set of outputs.
• Given new input data “X”, the model predicts the corresponding output “Y”.



Machine learning algorithms



Topics
• What is machine learning
• Machine learning algorithms
• What are neural networks
• What is deep learning
• How to evaluate a machine learning model



Machine learning algorithms

• A machine learning algorithm is a technique through which the system extracts useful patterns from historical data. These patterns can then be applied to new data.
• The objective is to have the system learn a specific input/output
transformation.
• The data quality is critical to the accuracy of the machine learning
results.



Machine learning approaches
1) Supervised learning: Train on labeled data, and learn to predict labels for new, unseen input data.
• Classification is the task of predicting a discrete class label, such as
“black, white, or gray” and “tumor or not tumor”.
• Regression is the task of predicting a continuous quantity, such as
“weight”, “probability”, and “cost”.



Machine learning approaches (cont’d)
2) Unsupervised learning: Detect patterns and relationships in the data without using labeled data.
• Clustering algorithms: Discover how to split the data set into a number of groups such that the data points in the same group are more similar to each other than to data points in other groups.



Machine learning approaches (cont'd)
3) Semi-supervised learning:
• A machine learning technique that falls between supervised and
unsupervised learning.
• It combines a small amount of labeled data with a large amount of unlabeled data.
• Here is an example that uses pseudo-labeling (see the sketch after this list):
a. Use the labeled data to train a model.
b. Use the model to predict labels for the unlabeled data.
c. Use the labeled data and the newly generated labeled data to create a new model.
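The three steps can be sketched in a few lines of Python. This is only an illustration, not part of the original course material: the classifier (scikit-learn's LogisticRegression) and the synthetic split into 20 labeled and 180 unlabeled samples are assumptions made for the example.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data: pretend that only the first 20 samples have labels.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_labeled, y_labeled = X[:20], y[:20]
X_unlabeled = X[20:]

# a. Train an initial model on the labeled data only.
model = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)

# b. Predict pseudo-labels for the unlabeled data.
pseudo_labels = model.predict(X_unlabeled)

# c. Retrain on the labeled data plus the pseudo-labeled data.
X_all = np.vstack([X_labeled, X_unlabeled])
y_all = np.concatenate([y_labeled, pseudo_labels])
final_model = LogisticRegression(max_iter=1000).fit(X_all, y_all)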



Machine learning approaches (cont'd)
4) Reinforcement learning
• Reinforcement learning uses trial and error (a reward-based approach).
• The algorithm discovers an association between the goal and the
sequence of events that leads to a successful outcome.
• Example reinforcement learning applications:
▪ Robotics: A robot that must find its way.
▪ Self-driving cars.



Machine learning algorithms
Understanding your problem and the different types of ML algorithms helps in selecting the best algorithm.
Here are some machine learning algorithms:
• Naïve Bayes classification (supervised classification – probabilistic)
• Linear regression (supervised regression)
• Logistic regression (supervised classification)
• Support vector machine (SVM) (supervised linear or non-linear
classification)
• Decision tree (supervised non-linear classification)
• K-means clustering (unsupervised learning)



Naïve Bayes classification
• Naïve Bayes classifiers assume that the value of a particular feature is
independent of the value of any other feature, given the class variable.
▪ For example, a fruit may be considered to be an apple if it is red, round, and
about 10 cm in diameter.
▪ Features: Color, roundness, and diameter.
▪ Assumption: Each of these features contributes independently to the
probability that this fruit is an apple, regardless of any possible correlations
between the color, roundness, and diameter features.



Naïve Bayes classification (cont'd)
Example: Use Naïve Bayes to predict whether a fruit that is red, round, and 10 cm in diameter is an apple or not.



Naïve Bayes classification (cont'd)

To do a classification, you must perform the following steps:

1. Define two classes (CY and CN) that correspond to Apple = Yes and Apple = No.

2. Compute the probability of CY given the observed features x: p(CY | x):
p(Apple = Yes | Colour = Red, Shape = round, Diameter ≥ 10 cm)

3. Compute the probability of CN given x: p(CN | x):
p(Apple = No | Colour = Red, Shape = round, Diameter ≥ 10 cm)

4. Discover which conditional probability is larger:
If p(CY | x) > p(CN | x), then it is an apple.



Naïve Bayes classification (cont'd)

Naïve Bayes model:

5. Compute p(x | CY) = p(Colour = Red, Shape = round, Diameter ≥ 10 cm | Apple = Yes).

Naïve Bayes assumes that the features of the input data (the apple parameters) are independent.



Naïve Bayes classification (cont'd)

Thus, we can rewrite p(x | CY) as:

p(x | CY) = p(Colour = Red | Apple = Yes) × p(Shape = round | Apple = Yes) × p(Diameter ≥ 10 cm | Apple = Yes)

Same for p(x | CN):

p(x | CN) = p(Colour = Red | Apple = No) × p(Shape = round | Apple = No) × p(Diameter ≥ 10 cm | Apple = No)



Naïve Bayes classification (cont'd)
6. Calculate each conditional probability:
p(Colour = Red | Apple = Yes) = 3/5 (Out of five apples, three of them were red.)
p(Colour = Red | Apple = No) = 2/5
p(Shape = round | Apple = Yes) = 4/5
p(Shape = round | Apple = No) = 2/5
p(Diameter ≥ 10 cm | Apple = Yes) = 2/5
p(Diameter ≥ 10 cm | Apple = No) = 3/5



Naïve Bayes classification (cont'd)

• p(x | CY) = p(Colour = Red | Apple = Yes) × p(Shape = round | Apple = Yes) × p(Diameter ≥ 10 cm | Apple = Yes)
= (3/5) × (4/5) × (2/5) = 0.192

• p(x | CN) = p(Colour = Red | Apple = No) × p(Shape = round | Apple = No) × p(Diameter ≥ 10 cm | Apple = No)
= (2/5) × (2/5) × (3/5) = 0.096

• p(Apple = Yes) = 5/10

• p(Apple = No) = 5/10



Naïve Bayes classification (cont'd)

Compare p(CY | x) to p(CN | x). Each posterior is proportional to the likelihood multiplied by the class prior:

p(CY | x) ∝ p(x | CY) × p(Apple = Yes) = 0.192 × 0.5 = 0.096
p(CN | x) ∝ p(x | CN) × p(Apple = No) = 0.096 × 0.5 = 0.048

Because 0.096 > 0.048, the verdict is that it is an apple.
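The same arithmetic can be checked with a short Python sketch. The conditional probabilities below are the ones computed on the previous slides; the script itself is only an illustration and assumes the hypothetical training set of five apples and five non-apples behind those counts.

# Naive Bayes check for the apple example (probabilities taken from the slides).
p_x_given_yes = (3/5) * (4/5) * (2/5)   # p(red, round, diameter >= 10 cm | apple)
p_x_given_no  = (2/5) * (2/5) * (3/5)   # p(red, round, diameter >= 10 cm | not apple)
p_yes, p_no = 5/10, 5/10                # class priors

# Posteriors are proportional to likelihood * prior (the evidence p(x) cancels out).
score_yes = p_x_given_yes * p_yes       # 0.192 * 0.5 = 0.096
score_no  = p_x_given_no * p_no         # 0.096 * 0.5 = 0.048

print("apple" if score_yes > score_no else "not an apple")   # -> apple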



Linear regression
• Linear regression fits a linear equation that combines a specific set of input values (X) to produce a predicted outcome (Y) for that set of input values. Both the input and output values are numeric.
• The target variable is a continuous value.
Examples of applications:
• Analyze the effect of marketing, pricing, and promotions on the sales of a product.
• Forecast sales by analyzing the company’s monthly sales for the past few years.
• Predict house prices as a function of house size.
• Estimate causal relationships between parameters in biological systems.
Linear regression (cont'd)
• Example: Assume that we are studying the real estate market.
• Objective: Predict the price of a house given its size by using previous data.
Size (m²)   Price ($)
30          30,000
70          40,000
90          55,000
110         60,000
130         80,000
150         90,000
180         95,000
190         110,000



Linear regression (cont'd)
Plot this data as a graph



Linear regression (cont'd)
• Can you guess the best estimate for the price of a 140 m² house?
• Which one is correct?
A. $60,000
B. $95,000
C. $85,000



Linear regression (cont'd)
• Target: A line that is within a “proper” distance from all points.
• Error: The aggregated distance between data points and the assumed
line.
• Solution: Calculate the error iteratively until you reach the most
accurate line with a minimum error value (that is, the minimum distance
between the line and all points).



Linear regression (cont'd)

• After the learning process, you get the most accurate line: a bias (intercept) and a slope that define your line.
• Here is the linear regression model representation for this problem (a small verification sketch follows):

h(p) = p0 + p1 * X1
or
Price = 30,000 + 392 * Size
Price = 30,000 + 392 * 140 ≈ 85,000
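As a sanity check, the same line can be fitted numerically. The following is a minimal sketch using NumPy's least-squares fit on the table of eight houses above; the exact bias and slope it returns differ somewhat from the rounded values on the slide, but the prediction for a 140 m² house lands in the same ballpark.

import numpy as np

# House sizes (m^2) and prices ($) from the table above.
size  = np.array([30, 70, 90, 110, 130, 150, 180, 190])
price = np.array([30_000, 40_000, 55_000, 60_000, 80_000, 90_000, 95_000, 110_000])

# Fit a straight line: price = p0 + p1 * size (degree-1 polynomial, least squares).
p1, p0 = np.polyfit(size, price, deg=1)
print(f"bias p0 = {p0:,.0f}, slope p1 = {p1:.0f}")

# Predict the price of a 140 m^2 house.
print(f"predicted price for 140 m^2: {p0 + p1 * 140:,.0f}")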



Linear regression (cont'd)

• Squared error (cost) function:
J(p) = (1/(2m)) * Σ_i (h(x_i) - y_i)²
▪ m is the number of samples.
▪ h(x_i) is the predicted value for data point i.
▪ y_i is the actual value for data point i.
Target: Choose the p values that minimize the error.
• Stochastic gradient descent update rule (a minimal sketch follows):
p_j := p_j - α * (h(x_i) - y_i) * x_ij
▪ j is the feature number.
▪ α is the learning rate.
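Here is a minimal Python sketch of the update rule applied to the house-price data above. It is written as batch gradient descent for brevity (stochastic gradient descent applies the same update one sample at a time), and the feature is standardized so that a single illustrative learning rate works for both parameters.

import numpy as np

# One-feature linear regression trained with gradient descent.
size  = np.array([30, 70, 90, 110, 130, 150, 180, 190], dtype=float)
price = np.array([30_000, 40_000, 55_000, 60_000, 80_000, 90_000, 95_000, 110_000], dtype=float)

x = (size - size.mean()) / size.std()    # standardized feature
y = price
p0, p1, alpha = 0.0, 0.0, 0.1            # parameters and learning rate

for _ in range(1_000):
    h = p0 + p1 * x                      # predictions for all samples
    error = h - y                        # (predicted - actual)
    p0 -= alpha * error.mean()           # dJ/dp0 = (1/m) * sum(error)
    p1 -= alpha * (error * x).mean()     # dJ/dp1 = (1/m) * sum(error * x_i)

print(round(p0), round(p1))              # learned bias and slope (in standardized units)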



Linear regression (cont'd)
• In higher dimensions, where we have more than one input (X), the line is called a plane or a hyperplane.
• The equation can be generalized from simple linear regression to multiple linear regression as follows:
Y(X) = p0 + p1*X1 + p2*X2 + ... + pn*Xn



Logistic regression
• Supervised classification algorithm.
• Target: The dependent variable (Y) is a discrete category or class (not a continuous variable as in linear regression).
Example: Class 1 = Cancer, Class 2 = No cancer



Logistic regression (cont'd)
• Logistic regression is named for the function that is used at the core of
the algorithm.
• The logistic function (sigmoid function) is an S-shaped curve that is used for data discrimination across multiple classes. It maps any real-valued input to a value between 0 and 1.

Logistic function



Logistic regression (cont'd)

• The sigmoid function squashes any input value into the range [0, 1].


• Logistic regression equation:
Y = exp(p0+p1X)/(1+exp(p0+p1X))



Logistic regression (cont'd)
• Example: Assume that the estimated p values for a model that predicts gender from a person’s height are p0 = -120 and p1 = 0.5.
• Class 0 represents female and class 1 represents male.
• To compute the prediction for a height of X = 150, use (see the sketch below):
Y = exp(-120 + 0.5*150) / (1 + exp(-120 + 0.5*150))
Y = exp(-45) / (1 + exp(-45)) ≈ 2.9 × 10⁻²⁰
P(male | height = 150) is effectively 0 in this case, so the model predicts female.
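The prediction can be reproduced directly from the equation. A minimal Python sketch follows; the second test height of 250 is purely illustrative and assumes, as the parameters suggest, that height is measured in centimeters.

import math

def predict_male_probability(height, p0=-120.0, p1=0.5):
    # Logistic regression prediction: Y = exp(p0 + p1*X) / (1 + exp(p0 + p1*X)).
    z = p0 + p1 * height
    return math.exp(z) / (1 + math.exp(z))

print(predict_male_probability(150))   # ~2.9e-20, effectively 0 -> predicted female
print(predict_male_probability(250))   # ~0.99 -> predicted male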



Support vector machine

• The goal is to find a separating hyperplane between positive and negative examples of the input data.
• SVM is also called a “large margin classifier”.
• The SVM algorithm seeks the hyperplane with the largest margin, that is, the largest distance to the nearest sample points (see the sketch below).
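Here is a minimal sketch using scikit-learn's SVC with a linear kernel; the six 2-D points and their labels are made up for illustration only.

from sklearn.svm import SVC

# Tiny, linearly separable toy data set (two features per sample).
X = [[1, 1], [2, 1], [1, 2],           # class 0
     [6, 5], [7, 7], [6, 6]]           # class 1
y = [0, 0, 0, 1, 1, 1]

# A linear kernel looks for the maximum-margin separating hyperplane.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print(clf.support_vectors_)            # the samples that define the margin
print(clf.predict([[3, 2], [6, 4]]))   # expected: [0 1]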





Decision tree
• A supervised learning algorithm that uses a tree structure to model
decisions.
• It resembles a flowchart or a series of if-else rules.
• An example application is general business decision making, such as predicting customers’ willingness to purchase a given product in a given setting, for example, online versus in a physical store.





Decision tree (cont'd)
A decision tree is built by making decisions regarding the following
items:
• Which feature to choose as the root node
• What conditions to use for splitting
• When to stop splitting



Decision tree (cont'd)
• Entropy and information gain are used to construct a decision tree.
• Entropy: The measure of the amount of uncertainty and randomness in a set of data for the classification task.
• Information gain: Used to rank the attributes (features) on which to split at a given node in the tree (a small sketch follows).
Information gain = (entropy of the distribution before the split) − (weighted entropy of the distributions after the split)
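Both measures can be computed in a few lines of Python. This is only an illustration: the parent node with five “yes” and five “no” labels and its two children are a hypothetical split, not data from the course.

import math

def entropy(labels):
    # Shannon entropy (in bits) of a list of class labels.
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

# Hypothetical parent node with 10 samples, split into two children by some feature.
parent = ["yes"] * 5 + ["no"] * 5
left   = ["yes"] * 4 + ["no"] * 1      # samples for which the split condition is true
right  = ["yes"] * 1 + ["no"] * 4      # samples for which the split condition is false

# Information gain = entropy(parent) - weighted entropy of the children.
weighted = (len(left) / len(parent)) * entropy(left) + (len(right) / len(parent)) * entropy(right)
gain = entropy(parent) - weighted
print(round(gain, 3))                  # ~0.278 bits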



K-means clustering
• Unsupervised machine learning algorithm.
• It groups a set of objects in such a way that objects in the same group
(called a cluster) are more similar to each other than those in other
groups (other clusters).



K-means clustering (cont'd)
Examples for applications include customer segmentation, image
segmentation, and recommendation systems.



K-means clustering (cont'd)
• Example: Given the data points (1, 1), (2, 1), (4, 3), and (5, 4), use K-means clustering to partition the data into two clusters.



K-means clustering (cont'd)

Assume that the initial centroids are the first two points: C1 = (1, 1) and C2 = (2, 1). A new centroid is found by averaging the points that are assigned to its cluster:
C_k = (1/n_k) * Σ (points in cluster k)

Iteration 1:
• Calculate, for each point, the center to which it belongs. The result depends on the distance between the center and the point (by using the Euclidean distance):
▪ Point 1: (1, 1) d11 = Yes, d12 = No
This means that point 1 (1, 1) belongs to C1 and not C2 because it is closer to C1.
▪ Point 2: (2, 1) d21 = No, d22 = Yes
▪ Point 3: (4, 3) d31 = No, d32 = Yes
▪ Point 4: (5, 4) d41 = No, d42 = Yes
• Now, we calculate the new centroids as follows:
▪ C1 = (1, 1)
▪ C2 = 1/3 ((2, 1) + (4, 3) + (5, 4)) = (3.67, 2.67)





K-means clustering (cont'd)
Iteration 2:
• Point 1: (1, 1) d11 = Yes, d12 = No
• Point 2: (2, 1) d21 = Yes, d22 = No
• Point 3: (4, 3) d31 = No, d32 = Yes
• Point 4: (5, 4) d41 = No, d42 = Yes
Now, we calculate the new centroids as follows (a short verification sketch follows):
• C1 = 1/2 ((1, 1) + (2, 1)) = (1.5, 1)
• C2 = 1/2 ((4, 3) + (5, 4)) = (4.5, 3.5)
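The two iterations above can be reproduced with a short NumPy script. This is a minimal sketch that assumes the initial centroids C1 = (1, 1) and C2 = (2, 1) used in the worked example.

import numpy as np

points = np.array([[1, 1], [2, 1], [4, 3], [5, 4]], dtype=float)
centroids = np.array([[1, 1], [2, 1]], dtype=float)    # assumed initial centroids

for iteration in range(2):
    # Assign each point to its nearest centroid (Euclidean distance).
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    assignment = distances.argmin(axis=1)
    # Recompute each centroid as the mean of the points assigned to it.
    centroids = np.array([points[assignment == k].mean(axis=0) for k in range(2)])
    print(f"iteration {iteration + 1}: assignments {assignment}, centroids {centroids.round(2).tolist()}")

# Iteration 1 -> C1 = (1, 1),   C2 = (3.67, 2.67)
# Iteration 2 -> C1 = (1.5, 1), C2 = (4.5, 3.5)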





What are neural networks



Topics
• What is machine learning
• Machine learning algorithms
• What are neural networks
• What is deep learning
• How to evaluate a machine learning model



Neural networks
• Machine learning models that are inspired by the structure of the human
brain.
• The human brain is estimated to have 100 billion neurons, and each
neuron is connected to up to 10,000 other neurons.



Neural networks (cont'd)
• Artificial neural networks are collections of interconnected “neurons”
(called nodes) that work together to transform input data to output data.
• Each node applies a mathematical transformation to the data it
receives; it then passes its result to the other nodes in its path.
• Examples for applications:
▪ Object detection, tracking, and image and video analysis

▪ Natural language processing (for example, machine translation)

▪ Autonomous cars and robots



Neural networks (cont'd)
• A neural network has three or more layers: an input layer, one or more hidden layers, and an output layer.
• Neural network models can adjust and learn as data changes.



Perceptron
• A single-neuron model and the originator of neural networks.
• Similar to linear classification, where each input has a weight.
• One bias term (see the sketch below).
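A perceptron is small enough to write out directly. The weights and bias below are illustrative values chosen so that this particular perceptron computes a logical AND.

# A single perceptron: weighted sum of the inputs plus a bias, then a step activation.
def perceptron(inputs, weights, bias):
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0

# Illustrative weights and bias: this choice makes the perceptron compute a logical AND.
weights, bias = [1.0, 1.0], -1.5
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", perceptron([a, b], weights, bias))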



Neural networks: Backpropagation
Backpropagation is an algorithm for training neural networks that have many layers. It works in two phases (see the sketch below):
• First phase: The inputs are propagated through the neural network to the final layer (called feedforward).
• Second phase: The algorithm computes an error value for each output neuron by comparing the wanted output with the actual output. The error value is then propagated backward through the weights of the network (adjusting the weights), beginning with the output neurons, through the hidden layers, and to the input layer, as a function of each weight’s contribution to the error.
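Both phases can be seen in a tiny NumPy network. This is a minimal sketch, not the course's own code: one hidden layer with sigmoid activations trained on the XOR problem, with an illustrative learning rate and random seed.

import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy problem (XOR): 2 inputs -> 3 hidden neurons -> 1 output.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)
lr = 0.5

for _ in range(10_000):
    # Phase 1: feedforward.
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)
    # Phase 2: compute the error and propagate it backward, adjusting the weights.
    out_delta = (output - y) * output * (1 - output)         # error at the output layer
    hid_delta = (out_delta @ W2.T) * hidden * (1 - hidden)   # error pushed back to the hidden layer
    W2 -= lr * hidden.T @ out_delta
    b2 -= lr * out_delta.sum(axis=0)
    W1 -= lr * X.T @ hid_delta
    b1 -= lr * hid_delta.sum(axis=0)

print(output.round(2).ravel())   # typically close to [0, 1, 1, 0] after training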



What is deep learning



Topics
• What is machine learning
• Machine learning algorithms
• What are neural networks
• What is deep learning
• How to evaluate a machine learning model



Deep learning
• Similar to a traditional neural network, but it has many more hidden
layers.
• Deep learning has emerged recently for the following reasons:
▪ The emergence of big data, which requires data processing at scale.
▪ Improvement in processing power and the usage of GPUs to train neural
networks.
▪ Advancement in algorithms like the rectified linear unit (ReLU).



Deep learning (cont'd)
Applications:
• Multilayer perceptron (MLP): Classification and regression, for example, house price prediction.
• Convolutional neural network (CNN): Image processing, such as facial recognition.
• Recurrent neural network (RNN): One-dimensional sequence input data, such as audio and language.
• Hybrid neural network: Covers more complex use cases, for example, autonomous cars.



How to evaluate a machine
learning model



Topics
• What is machine learning
• Machine learning algorithms
• What are neural networks
• What is deep learning
• How to evaluate a machine learning model



Model evaluation
• Overfitting occurs when a machine learning model fits the training set almost perfectly but fails on unseen future data.
▪ Reason: Too many features are used, or training samples are reused for testing.
▪ Solution:
− Fewer features
− More data
− Cross-validation



Model evaluation (cont'd)
• Underfitting occurs when a machine learning model can neither fit the training data nor generalize to new data.
▪ Reason: The model uses an estimator that is too simple.
▪ Solution:
− More features
− Different estimator



Model evaluation (cont'd)
• Cross-validation (CV) is a process to evaluate a model by splitting the data set into training and testing sets one or several times.
• Hold-out method: Randomly splits the data set into a training set and a test set.
• K-fold cross-validation: Splits the data into K subsamples; each subsample gets a chance to be the validation set while the remaining K-1 subsamples form the training set.
• Leave-one-out cross-validation (LOO-CV): Similar to K-fold, except that each held-out subsample contains exactly one data point and the rest of the data is used for training (see the sketch below).
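All three schemes are available in scikit-learn. A minimal sketch follows; the logistic-regression classifier and the synthetic data set are assumptions made for the example.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (KFold, LeaveOneOut, cross_val_score,
                                     train_test_split)

X, y = make_classification(n_samples=100, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000)

# Hold-out: one random split into a training set and a test set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
print("hold-out accuracy:", model.fit(X_tr, y_tr).score(X_te, y_te))

# K-fold: every subsample is used once as the validation set (here K = 5).
print("5-fold accuracies:", cross_val_score(model, X, y, cv=KFold(n_splits=5)))

# Leave-one-out: K-fold where every held-out subsample is a single data point.
print("LOO-CV mean accuracy:", cross_val_score(model, X, y, cv=LeaveOneOut()).mean())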



Unit summary
• Explain what machine learning is.
• Describe machine learning types and approaches.
• List different machine learning algorithms.
• Explain what neural networks and deep learning are, and why they are
important in today’s AI field.
• Explain how to evaluate your machine learning model.

