
Chap-6 Machine Learning Introduction

Machine Learning
Introduction

Machine learning is a field of computer science that gives computer systems the ability to "learn" with data, without being explicitly programmed. The name "machine learning" was coined in 1959 by Arthur Samuel.

[Figure: Venn diagram of three overlapping circles (Math & Statistics, Computer Science, Subject Matter Expertise) with regions labelled Machine Learning, Traditional Software, Traditional Research, and Unicorn]

• ML is a system that can automatically acquire and integrate knowledge.
• It is the branch of artificial intelligence that deals with the construction of systems that can learn from data.
• It develops methods that automatically detect patterns in data and then use these patterns to predict future data.
• Machine learning can predict the future based on the past.
• It covers computer programs that automatically improve their performance through experience.

Why Machine Learning?

➢ Automatically adapt and customize to individual users
• Personalized news, mail filters, movie/book recommendation
➢ Discover new knowledge from huge amounts of data
• Market analysis
➢ Perform repetitive, monotonous human tasks that require intelligence and experience
• Recognize signatures or handwritten characters
• Driving a car, flying a plane
➢ Rapidly changing phenomena
• Credit scoring, financial modeling, diagnosis, fraud detection
➢ No human experts available
• Industrial/manufacturing control, mass spectrometer analysis, drug design

Concepts & Dimensions of Machine Learning

Concepts of Learning

➢ Learning = improving at a task T, with respect to a performance measure P, based on experience E

➢ Example: Spam filtering
• T: Identify spam emails
• P: % of spam emails filtered correctly; % of non-spam emails filtered incorrectly (false positives)
• E: Database of emails labelled manually by users

➢ Example: A checkers learning problem
• T: Playing checkers
• P: Percent of games won against opponents
• E: Playing practice games against itself

We can specify many learning problems in this fashion, such as learning to recognize handwritten words, or learning to drive a robotic automobile autonomously.
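
As a rough illustration of the performance measure P in the spam example, the sketch below computes the two percentages from a list of true labels and a list of filter decisions. The function name, the "spam"/"ham" label strings, and the toy data are assumptions made for this example; they do not come from the slides.

```python
def spam_filter_performance(labels, predictions):
    """Return (% of spam filtered correctly, % of non-spam filtered incorrectly)."""
    # Filter decisions on the emails that really are spam / really are not spam.
    on_spam = [p for y, p in zip(labels, predictions) if y == "spam"]
    on_ham = [p for y, p in zip(labels, predictions) if y == "ham"]
    caught = 100.0 * sum(p == "spam" for p in on_spam) / len(on_spam)
    false_positives = 100.0 * sum(p == "spam" for p in on_ham) / len(on_ham)
    return caught, false_positives


# Hypothetical experience E: emails labelled manually by users.
labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
# Hypothetical decisions of some spam filter on the same emails.
predictions = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham", "spam"]

caught, false_positives = spam_filter_performance(labels, predictions)
print(caught, false_positives)  # 75.0 20.0
```

Here 3 of the 4 spam emails are caught (75%) and 1 of the 5 legitimate emails is wrongly flagged (20%). A learner improves at T if experience raises the first number without raising the second.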

➢ Example: Signature matching
• T: Determine whether a signature belongs to the correct person
• P: % of signatures that were correctly matched; % of valid signatures that were incorrectly labelled as not matching
• E: Database of signatures known to belong to that person

Dimensions Of Learning Systems

➢ What are the Dimensions of Learning?
• Dimensions of Learning is a comprehensive model that uses what researchers and theorists know about learning to define the learning process. Five types of thinking, which we call the five dimensions of learning, are essential to successful learning.

➢ Dimension 1: Attitudes and Perceptions
• Attitudes and perceptions affect students' abilities to learn. For example, if students view the classroom as an unsafe and disorderly place, they will likely learn little there.

➢ Dimension 2: Acquire and Integrate Knowledge
• When students are learning new information, they must be guided in relating the new knowledge to what they already know, organising that information, and then making it part of their long-term memory.

➢ Dimension 3: Extend and Refine Knowledge
• Learning does not stop with acquiring and integrating knowledge. Learners develop in-depth understanding through the process of extending and refining their knowledge.

➢ Dimension 4: Use Knowledge Meaningfully
• The most effective learning occurs when we use knowledge to perform meaningful tasks. For example, we might initially learn about tennis rackets by talking to a friend or reading a magazine article about them.

➢ Dimension 5: Habits of Mind
• The most effective learners have developed powerful habits of mind that enable them to think critically, think creatively, and regulate their behaviour.

The Learning Process

➢ Normalization is often preferred when the data does not follow a Gaussian distribution, whereas standardization is often preferred when it does.
➢ Normalization rescales values into a fixed range, typically [0, 1] or [-1, 1]. Standardization rescales values to zero mean and unit variance and is not bounded to a range.
➢ Normalization is highly affected by outliers. Standardization is less affected by outliers.
➢ Normalization is typically considered when the algorithm makes no assumptions about the data distribution. Standardization is typically used when the algorithm does make assumptions about the data distribution.
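
To make the contrast concrete, here is a minimal sketch of both rescalings using only the Python standard library. The function names and the sample values are assumptions chosen for illustration; min-max scaling to [0, 1] stands in for normalization and the z-score for standardization.

```python
import statistics


def normalize(values):
    """Min-max normalization: rescale values into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Standardization (z-score): rescale to zero mean and unit variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


data = [2.0, 4.0, 6.0, 8.0]
print(normalize(data))    # smallest value maps to 0.0, largest to 1.0
print(standardize(data))  # centred on 0, not limited to a fixed range

# A single outlier squeezes the other normalized values towards 0.
print(normalize(data + [100.0]))
```

The last line hints at why min-max normalization is sensitive to outliers: the minimum and maximum alone determine the scale.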

Bootstrap aggregating, also called bagging, is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression.

Bagging
Bagging attempts to reduce the chance of overfitting complex models.
• It trains a large number of "strong" learners in parallel.
• A strong learner is a model that is relatively unconstrained.
• Bagging then combines all the strong learners together in order to "smooth out" their predictions.
• Example – Random Forest Classifier.

Boosting
Boosting attempts to improve the predictive flexibility of simple models.
• It trains a large number of "weak" learners in sequence.
• A weak learner is a constrained model (e.g., you could limit the max depth of each decision tree).
• Each one in the sequence focuses on learning from the mistakes of the one before it.
• Boosting then combines all the weak learners into a single strong learner.
• Example – Boosted Tree Classifier.
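
The bagging idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the base learner is a 1-nearest-neighbour classifier (chosen here as a simple unconstrained model, not the decision trees of a Random Forest), the data is a made-up 1-D toy set, and all function names are hypothetical.

```python
import random
from collections import Counter


def fit_1nn(sample):
    """A relatively unconstrained learner: 1-nearest-neighbour on (x, label) pairs."""
    def predict(x):
        nearest = min(sample, key=lambda pair: abs(pair[0] - x))
        return nearest[1]
    return predict


def bagging(data, n_models, rng):
    """Train one model per bootstrap sample (drawn with replacement)."""
    models = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]
        models.append(fit_1nn(sample))
    return models


def predict_majority(models, x):
    """Combine the individual models by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Toy 1-D data: label 0 for x < 5, label 1 otherwise.
data = [(float(x), 0 if x < 5 else 1) for x in range(10)]
models = bagging(data, n_models=25, rng=random.Random(0))
print(predict_majority(models, 1.0), predict_majority(models, 8.0))
```

The models here are trained one after another for simplicity. Because each depends only on its own bootstrap sample, they could equally be trained in parallel, which is the property the slide highlights.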

Hyper Parameters and Model Parameters

What are Hyper Parameters?
So far, we've been casually talking about "tuning" models. When we talk of tuning models, we specifically mean tuning hyper-parameters.
There are two types of parameters in machine learning algorithms: model parameters and hyper-parameters. The key distinction is that model parameters can be learned directly from the training data while hyper-parameters cannot. Hyper-parameters express "higher-level" structural settings for algorithms. They are decided before fitting the model because they can't be learned from the data.
• Example: Strength of the penalty used in Regularized Regression
• Example: The number of trees to include in a Random Forest

What are Model Parameters?
Model parameters are learned attributes that define individual models. They can be learned directly from the training data.
• Example: Regression coefficients
• Example: Decision Tree split locations
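
The distinction can be seen in a tiny regularized regression. The sketch below assumes a one-feature model y ≈ w·x with no intercept and an L2 (ridge) penalty, for which the best slope has the closed form w = Σxy / (Σx² + penalty). The function name and data are illustrative assumptions.

```python
def fit_ridge_slope(xs, ys, penalty):
    """Fit y ~ w * x (no intercept) with an L2 penalty on w.

    `penalty` is a hyper-parameter: it is chosen before fitting.
    The returned slope `w` is a model parameter: it is learned from the data.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

for penalty in (0.0, 14.0):
    w = fit_ridge_slope(xs, ys, penalty)
    print(f"penalty={penalty}: learned w={w}")
```

With no penalty the learned coefficient is 2.0; with a penalty of 14.0 the same data yields 1.0. The data never tells us which penalty to use, which is why it has to be set (tuned) from outside.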

Machine Learning Steps

[Figure: training data (past) is used to learn the model; testing data (future) is passed to the model to predict]

➢ Steps:
• Gather data from various sources
• Clean the data to make it homogeneous
• Build the model (select the right machine learning algorithm)
• Gather insights from the model's results
• Visualize – transform the results into visual graphs

Performance Evaluation

• Randomly split the examples into a training set U and a test set V.
• Use the training set to learn a hypothesis H.
• Measure the % of V correctly classified by H.
• Repeat for different random splits and average the results.
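
The procedure above can be sketched as follows. The learner (a 1-nearest-neighbour rule), the toy examples, and the names `learn_1nn` and `evaluate` are assumptions made for this illustration; any learning algorithm could take the learner's place.

```python
import random


def learn_1nn(training_set):
    """Hypothesis H: predict the label of the closest training example."""
    def h(x):
        return min(training_set, key=lambda ex: abs(ex[0] - x))[1]
    return h


def evaluate(examples, n_splits, n_train, rng):
    """Average % accuracy of H over several random train/test splits."""
    accuracies = []
    for _ in range(n_splits):
        shuffled = examples[:]
        rng.shuffle(shuffled)
        u, v = shuffled[:n_train], shuffled[n_train:]  # training set U, test set V
        h = learn_1nn(u)
        correct = sum(h(x) == y for x, y in v)
        accuracies.append(100.0 * correct / len(v))
    return sum(accuracies) / len(accuracies)


# Toy examples: two well-separated groups of 1-D points.
examples = [(float(x), "low") for x in range(10)] + \
           [(float(x), "high") for x in range(100, 110)]

print(evaluate(examples, n_splits=5, n_train=14, rng=random.Random(0)))  # 100.0
```

The score is 100.0 only because the two groups are far apart and every training set of 14 must contain examples from both. On realistic data the individual splits would give different accuracies, which is the reason for averaging.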

Problems - Overfitting & Underfitting

• Overfitting – the model learns the training set too well. It fits the training set closely, including its noise, and performs poorly on the test set.
• Underfitting – the model is too simple, so both training and test errors are large.

Categorization of Machine Learning

Supervised
• Task driven, labelled data (regression/classification).
• Input variables X and output variable y, with y = f(X).
• The algorithm learns from a training data set. It iteratively makes predictions, which are corrected (teaching).
• Learning stops when performance reaches an acceptable level.

Unsupervised
• Data driven, unlabelled data (clustering).
• No correct answers, no teacher, no labelling.
• The algorithm looks for patterns in the data.

Reinforcement
• Reward feedback based.
• The algorithm learns to react to an environment.
• It automatically determines the ideal behavior within a context.

Data Set for Supervised Learning

The most obvious visual difference between cupcakes and muffins is, of course, the frosting. Cupcakes are topped with creamy, delicious frosting. Instead, muffins may have a sugared top or a very thin glaze. Usually the fillings inside the muffins add enough excitement to the baked good, so there might not be anything on top.

Data Set for Unsupervised Learning

Machine Learning Coordinates

             Supervised Learning                 Unsupervised Learning
Discrete     Classification or Categorization    Clustering
Continuous   Regression                          Dimensionality Reduction

In statistics, machine learning, and information theory, dimensionality reduction (or dimension reduction) is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. It can be divided into feature selection and feature extraction.

Classification & Clustering

• Classification is the problem of identifying to which of a set of categories (classes) a new observation belongs, on the basis of a training set of data containing observations (or instances) whose category membership is known.
• Classification is considered an instance of supervised learning, i.e. learning where a training set of correctly identified observations is available.
• The corresponding unsupervised procedure is known as clustering, and involves grouping data into categories based on a measure of similarity or distance.

• Example: Consider a classification problem with two classes: low-risk and high-risk customers. The information about a customer makes up the input to the classifier, whose task is to assign the input to one of the two classes. After training with the past data, a classification rule learned may be of the form

IF Income > θ1 AND Savings > θ2
THEN Low-Risk
ELSE High-Risk

[Figure: customers plotted by Income (horizontal axis) and Savings (vertical axis), with Low-Risk and High-Risk regions]
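
Written as code, the learned rule is a simple function of two inputs. In this sketch the threshold values are placeholders invented for the example; in practice θ1 and θ2 would be the values found during training.

```python
def classify_customer(income, savings, theta1, theta2):
    """Apply the rule: both thresholds must be exceeded for Low-Risk."""
    if income > theta1 and savings > theta2:
        return "Low-Risk"
    return "High-Risk"


# Hypothetical thresholds standing in for the learned θ1 and θ2.
THETA1, THETA2 = 40_000, 10_000

print(classify_customer(55_000, 20_000, THETA1, THETA2))  # Low-Risk
print(classify_customer(55_000, 5_000, THETA1, THETA2))   # High-Risk
```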

Regression
• A regression problem is when the output variable is a real value, such as "dollars" or "weight", instead of a class.
• It estimates the relationship between a dependent variable and one or more independent variables (or "predictors").

Sales figures for a television model can depend on several factors like screen size, display type, brand, resolution, technology etc. Here, we consider just one attribute, screen size, and plot the corresponding sales figures. We try to find the relation (function) that best matches these values:

y' = w0 + w1 x1

[Figure: scatter plot of sales figures (y) against screen size (x), with the fitted line y' = w0 + w1 x1]
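
One common way to find w0 and w1 is ordinary least squares. The slide does not name a fitting method, so treat that choice as an assumption. The sketch below uses the closed-form solution for a single attribute, and the screen sizes and sales figures are made-up numbers that lie exactly on a line.

```python
def fit_line(xs, ys):
    """Least-squares estimates of w0 (intercept) and w1 (slope) in y' = w0 + w1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w1 = sxy / sxx
    w0 = mean_y - w1 * mean_x
    return w0, w1


# Made-up data lying exactly on sales = 10 + 2 * screen_size.
screen_sizes = [32.0, 40.0, 50.0, 55.0]
sales = [74.0, 90.0, 110.0, 120.0]

w0, w1 = fit_line(screen_sizes, sales)
print(round(w0, 6), round(w1, 6))      # 10.0 2.0
print(round(w0 + w1 * 43.0, 6))        # predicted sales for a 43-inch screen: 96.0
```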

Dimensionality Reduction
• Dimensionality reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. It can be divided into feature selection and feature extraction.
• Feature selection approaches try to find a subset of the original variables (also called features or attributes). They choose some of the features based on a statistical score.
• Feature extraction transforms the data in the high-dimensional space to a space of fewer dimensions. It uses techniques that extract second-layer (derived) information from the data, e.g. the interesting frequencies of a signal obtained with a Fourier transform.
• Dimensionality reduction helps in data compression, reduces computation time and removes redundant features.
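
A minimal sketch of feature selection follows. It assumes variance as the statistical score, which is only one of many possible scores, and the function name and data are invented for the example.

```python
import statistics


def select_features(rows, k):
    """Feature selection: keep the k columns with the highest variance."""
    columns = list(zip(*rows))
    scores = [statistics.pvariance(col) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:k])  # indices of the selected original features
    reduced = [[row[i] for i in keep] for row in rows]
    return keep, reduced


rows = [
    [1.0, 5.0, 0.0],
    [2.0, 5.0, 10.0],
    [3.0, 5.0, 0.0],
    [4.0, 5.0, 10.0],
]
keep, reduced = select_features(rows, k=2)
print(keep)     # [0, 2]: the constant middle column carries no information
print(reduced)  # the same rows with only the selected columns
```

Note that the selected features are a subset of the original columns, unchanged. Feature extraction, by contrast, would compute new columns from the old ones.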

Machine Learning Applications

Classification Applications

➢ Face Recognition
• Identify or verify a person from a digital image or a video frame
➢ Character Recognition
➢ Spam detection
➢ Medical Diagnosis
• Determine which disease or condition explains a person's symptoms and signs
➢ Biometrics
• Authentication using physical and/or behavioral characteristics: face, iris, signature, etc.

Regression Applications

➢ Economics/Finance: predict the value of a stock
➢ Epidemiology
• Incidence, distribution, and possible control of diseases and other factors relating to health
➢ Car/plane navigation: angle of the steering wheel, acceleration
➢ Temporal trends: weather over time

Manufacturing & Retail Industries

Manufacturing
• Predictive maintenance or condition monitoring
• Warranty reserve estimation
• Demand forecasting
• Process optimization
• Telematics

Retail
• Predictive inventory planning
• Recommendation engines
• Upsell & cross-channel marketing
• Market segmentation & targeting
• Customer ROI & lifetime value

Healthcare & Life Science & Travel & Hospitality

Healthcare & Life Science
• Alerts & diagnostics from real-time patient data
• Disease identification & risk stratification
• Patient triage optimization
• Proactive health management
• Healthcare provider sentiment analysis

Travel & Hospitality
• Aircraft scheduling
• Dynamic pricing
• Social media – consumer feedback & interaction analysis
• Customer complaint resolution
• Traffic patterns & congestion management

Financial Services & Energy, Feedstock & Utilities

Financial Services
• Risk analytics & regulation
• Customer segmentation
• Cross-selling & up-selling
• Sales & marketing campaign management
• Credit worthiness evaluation

Energy, Feedstock & Utilities
• Power usage analytics
• Seismic data processing
• Carbon emissions and trading
• Customer-specific pricing
• Smart grid management
• Energy demand & supply optimization

Machine Learning Algorithms

Decision Trees
• A decision tree takes as input an object or situation described by a set of properties and outputs a yes/no decision.
• Each decision node tests the value of an input attribute.
• Branches from the node are all possible values of the attribute.
• Leaf nodes supply the value (Yes/No) to be returned if that leaf is reached.
➢ Criteria used to choose the best nodes to build the most precise decision tree:
• Entropy: the degree of disorganization in our data. For two classes, entropy is 1 when the collection has an equal number of positive and negative examples, and 0 when all examples belong to one class.
• Information gain is used to determine the goodness of a split. The attribute with the largest entropy reduction is chosen.
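
Both criteria are short formulas. The sketch below assumes the usual Shannon entropy in bits, H = -Σ p·log2(p), and defines information gain as the parent's entropy minus the size-weighted entropy of the subsets produced by a split. The function names and labels are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a collection of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child subsets."""
    total = len(parent)
    remainder = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - remainder


parent = ["yes", "yes", "no", "no"]
print(entropy(parent))                                           # 1.0
print(information_gain(parent, [["yes", "yes"], ["no", "no"]]))  # 1.0
print(information_gain(parent, [["yes", "no"], ["yes", "no"]]))  # 0.0
```

The first split separates the classes perfectly and removes all the entropy, so a tree builder would prefer its attribute. The second split leaves each subset as mixed as the parent and gains nothing.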

Example: Decision Tree
• Decision tree on whether to buy a mobile phone or not

Price of Mobile
  < 10000 -> Size of screen
      < 7"    -> No
      >= 7"   -> Yes
  > 10000 -> Technology
      Android -> Yes
      iOS     -> No
      Other   -> No

Support Vector Machines (SVM)

[Figure: data points in the X-Y plane with the support vectors marked]

• Map the data to a higher-dimensional space where they will be linearly separable.
• Algorithm: plot each data item as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate.
• Then perform classification into 2 classes by finding the hyper-plane that best separates the classes.
• Support vectors are the coordinates of the individual observations that lie closest to the separating hyper-plane. The support vector machine is the frontier (hyper-plane/line) that best segregates the two classes.

Bayesian Networks

• Compute the probability distribution for unknown variables given observed values of other variables.
• Start with a belief, called a prior.
• Obtain some data and use it to update the belief. The outcome is called a posterior.
• Should we obtain even more data, the old posterior becomes a new prior and the cycle repeats.

➢ Obeys Bayes' rule:
• P(A | B) = P(B | A) * P(A) / P(B)
• P(A | B) is a conditional probability: how likely is A if B happens?
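
The prior-to-posterior cycle can be traced numerically. In this sketch the probabilities are invented for illustration, P(B) is expanded as P(B | A)·P(A) + P(B | not A)·P(not A), and the two observations are assumed to be conditionally independent given A.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Return the posterior P(A | B) from the prior P(A) via Bayes' rule.

    likelihood_if_true  = P(B | A)
    likelihood_if_false = P(B | not A)
    """
    # P(B), by the law of total probability.
    evidence = likelihood_if_true * prior + likelihood_if_false * (1.0 - prior)
    return likelihood_if_true * prior / evidence


belief = 0.01  # hypothetical prior P(A)
for _ in range(2):  # observe B twice; each posterior becomes the next prior
    belief = bayes_update(belief, likelihood_if_true=0.9, likelihood_if_false=0.1)
    print(round(belief, 4))  # 0.0833, then 0.45
```

A single observation lifts the belief from 1% to about 8%. Feeding that posterior back in as the prior for a second observation lifts it to 45%.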

K Nearest Neighbor Model (k-NN)
• Idea: The properties of an input x are likely to be similar to those of points in the neighborhood of x.
• Find the k nearest neighbor(s) of x and infer the target attribute value(s) of x from the corresponding attribute value(s) of those neighbors.
• In k-NN classification, the output is a class membership. An object is assigned to the class most common among its k nearest neighbors.
• In k-NN regression, the output is the property value for the object. This value is the average of the values of its k nearest neighbors.
• To determine which k instances in the training dataset are most similar to a new input, a distance measure is used. Common choices include the Euclidean, Hamming, and Manhattan distances.
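
Both variants share the same neighbour search and differ only in how the neighbours' values are combined. The sketch below assumes Euclidean distance (via `math.dist`, Python 3.8+) and uses made-up training points; the function names are illustrative.

```python
import math
from collections import Counter


def k_nearest(train, query, k):
    """Return the k training examples closest to query (Euclidean distance)."""
    return sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]


def knn_classify(train, query, k):
    """Class most common among the k nearest neighbours."""
    votes = Counter(label for _, label in k_nearest(train, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train, query, k):
    """Average of the values of the k nearest neighbours."""
    neighbours = k_nearest(train, query, k)
    return sum(value for _, value in neighbours) / k


labelled = [((1.0, 1.0), "a"), ((1.0, 2.0), "a"), ((2.0, 1.0), "a"),
            ((5.0, 5.0), "b"), ((6.0, 5.0), "b")]
valued = [((0.0,), 1.0), ((1.0,), 2.0), ((2.0,), 3.0), ((10.0,), 100.0)]

print(knn_classify(labelled, (1.5, 1.5), k=3))  # a
print(knn_regress(valued, (1.0,), k=3))         # 2.0
```

In the regression call the distant point with value 100.0 is not among the three nearest neighbours, so it has no influence on the prediction.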

Ensemble Learning

• Use multiple models to obtain better predictive performance than could be obtained from any of the individual constituent models.
• Boosting – incrementally build an ensemble by training each new model instance to emphasize the training instances that previous models misclassified.

Deep Learning (Neural Networks)
➢ A subset of machine learning that covers all three paradigms (supervised, unsupervised, and reinforcement learning) using artificial neural networks (ANNs)
➢ ANNs are composed of multiple nodes that imitate the biological neurons of the human brain
• Neurons are connected by links and interact with each other
• Nodes can take input data and perform simple operations on the data. They pass the results to other neurons.
• The output at each node is called its "activation value" or "node value"
• Each link is associated with a weight
➢ ANNs "learn" by altering the link weight values
➢ Convolutional neural networks are specialized to take images as input, so they are used for image recognition
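
The "simple operation" performed by a single node can be sketched as a weighted sum of its inputs passed through an activation function. The sigmoid used here is an assumption, since the slide does not say which activation is meant, and the inputs and weights are arbitrary example values.

```python
import math


def node_output(inputs, weights, bias=0.0):
    """Activation value of one node: sigmoid of the weighted sum of its inputs."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-weighted_sum))


inputs = [1.0, 2.0]
weights = [0.5, -0.25]  # one weight per incoming link

print(node_output(inputs, weights))      # 0.5, because the weighted sum is 0

# "Learning" alters the link weights, which changes the node's activation value.
print(node_output(inputs, [0.5, 0.25]))  # larger than 0.5
```

A full network chains many such nodes, feeding each activation value forward as an input to the next layer.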
