
Machine Learning Techniques
Instructor: Dr. Rupak Chakraborty
[email protected]
Module 1
Introduction
Outline
• Definition - Types of Machine Learning - Examples of Machine Learning Problems - Training versus Testing - Characteristics of Machine Learning Tasks - Predictive and Descriptive Tasks
• Machine Learning Models: Geometric Models, Logical Models, Probabilistic Models
• Features: Feature Types - Feature Construction and Transformation - Feature Selection
What is Learning?
“Learning is any process by which a system improves performance from experience.”
- Herbert Simon

“The subfield of computer science that gives computers the ability to learn without being explicitly programmed.”
- Arthur Samuel, 1959

“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.”
- Tom Mitchell, 1997
When Do We Use Machine Learning?
• ML is used when:
• Human expertise does not exist (e.g. navigating on Mars)
• Humans can’t explain their expertise (e.g. speech recognition)
• Models must be customized (e.g. personalized medicine)
• Models are based on huge amounts of data (e.g. genomics)
Sample Applications
• Web search
• Computational biology
• Finance
• E-commerce
• Space exploration
• Robotics
• Information extraction
• Social networks
• Debugging software
• [Your favorite area]
State of the Art Applications: Autonomous Cars

• Nevada made it legal for autonomous cars to drive on roads in June 2011.
• As of 2013, four states (Nevada, Florida, California, and Michigan) have legalized autonomous cars.

“Uber suspends its self-driving car tests after one of its cars flips over.”
- Mar. 27, 2017, Consumer Affairs

“Tesla car was on Autopilot when it hit a Culver City firetruck”
- Sep. 3, 2019, Los Angeles Times
State of the Art Applications: Healthcare
Types of Machine Learning Problems
• Supervised
• Unsupervised
• Reinforcement
Types of Machine Learning Problems: Supervised
• Given: training data + desired outputs (labels)
• Learn through examples for which we know the desired output (what we want to predict).
• Is this a cat or a dog?
• Are these emails spam or not?
• Predict the market value of houses, given the square meters, number of rooms, neighborhood, etc.
Types of Machine Learning Problems: Unsupervised
• Given: training data (without desired outputs)
• There is no desired output; learn something about the data and its latent relationships.
• I have photos and want to put them in 20 groups.
• I want to find anomalies in the credit card usage patterns of my customers.
• Useful for learning structure in the data (clustering), hidden correlations, dimensionality reduction, etc.
Types of Machine Learning Problems: Reinforcement
• Rewards from a sequence of actions
• An agent interacts with an environment and observes the result of the interaction.
• The environment gives feedback via a positive or negative reward signal.
RECAP
• What is ML?
• Applications of ML
• Types of ML

Today’s Topics
• Steps to Solve a Machine Learning Problem
• Training and Testing (Practical Example in Python)
• Predictive and Descriptive Tasks
Steps to Solve a Machine Learning Problem
Learning system model
Training and Testing
• Example: the IRIS dataset
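
A minimal sketch of the training/testing workflow in Python, using scikit-learn's bundled copy of the Iris dataset (the choice of k-nearest neighbours as the classifier is illustrative, not prescribed by the slides):

# Train/test split on the Iris dataset (illustrative sketch).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hold out 30% of the data for testing; train on the rest.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)                            # learn from training data
print("Test accuracy:", model.score(X_test, y_test))   # evaluate on unseen data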
Characteristics of Machine learning tasks
• The most common machine learning tasks are predictive, in the sense
that they concern predicting a target variable from features.
• Binary and multi-class classification: categorical target
• Regression: numerical target
• Descriptive tasks are concerned with exploiting underlying structure
in the data.

Machine Learning Models
• Models form the central concept in machine learning.
• A model is trained to recognize certain types of patterns or to solve a given task.
• We train a model over a set of data, providing it an algorithm that it can use to reason over and learn from those data.
• Instance space: the collection of all possible instances.
• Types:
• Geometric models
• Probabilistic models
• Logical models
Machine learning models
• Machine learning models can be distinguished according to their main
intuition:
• Geometric models use intuitions from geometry such as separating (hyper-)
planes, linear transformations and distance metrics.
• Probabilistic models view learning as a process of reducing uncertainty,
modelled by means of probability distributions.
• Logical models are defined in terms of easily interpretable logical expressions.
Geometric Model
• Linear Model
• Distance-based Model
Geometric Models
• A geometric model is constructed directly in instance space, using geometric concepts such as lines, planes and distances.
• Geometric models define similarity by considering the geometry of the instance space (e.g. a linear classifier is a geometric classifier).
• Instances can be described as points in two dimensions (x- and y-axes) or in a three-dimensional space (x, y, and z).
Geometric Model: Basic linear classifier
Geometric models:
1. Linear model:
• Linear models use geometric concepts like lines or planes to segment (classify) the instance space.
• For example, in y = mx + c, m and c are the parameters that we are trying to learn from the data.
• Fitting minimizes the residual sum of squares:

  Σ_i (predicted_i − actual_i)² = Σ_i (residual_i)²
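
A minimal sketch of learning m and c by least squares with NumPy (the data is made up for illustration):

import numpy as np

# Toy data: y is roughly 2x + 1 plus noise (made up for illustration).
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(scale=1.0, size=x.shape)

# Least squares: minimise sum((predicted_i - actual_i)^2) over m and c.
A = np.column_stack([x, np.ones_like(x)])
(m, c), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"learned m={m:.2f}, c={c:.2f}")   # close to m=2, c=1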
Geometric Model: Support Vector Machine

The decision boundary learned by a support vector machine from linearly separable data. The decision boundary maximizes the margin, which is indicated by the dotted lines. The circled data points are the support vectors.
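
A minimal sketch with scikit-learn's linear-kernel SVC on synthetic, linearly separable data (the blob parameters are arbitrary):

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Linearly separable toy data (synthetic, for illustration).
X, y = make_blobs(n_samples=40, centers=2, random_state=6)

# A linear-kernel SVM learns the maximum-margin decision boundary.
clf = SVC(kernel="linear", C=1000)   # large C: (nearly) hard margin
clf.fit(X, y)

# The circled points in the figure correspond to these support vectors.
print("support vectors:\n", clf.support_vectors_)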
Geometric Model: Distance-based
• Uses the geometric notion of distance to represent similarity.
• If the distance between two instances is small, then the instances are similar in terms of their feature values, and so nearby instances would be expected to receive the same classification or belong to the same cluster.
• In other words, if two points are close together, they have similar values for their features and thus can be classed as similar.
• Distance is applied through the concept of neighbours and exemplars.
Geometric Model: Distance-based
• Neighbours are points in proximity with respect to the distance measure, expressed through exemplars.
• Exemplars are either centroids (the centre of mass according to a chosen distance metric, e.g. the arithmetic mean) or medoids (the most centrally located data point).
• Commonly used distance metrics:
• Euclidean: the square root of the sum of the squared differences along each coordinate
• Manhattan: the sum of the absolute differences along each coordinate
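
A minimal sketch of both metrics with NumPy (the two points are arbitrary):

import numpy as np

a = np.array([1.0, 2.0, 3.0])   # arbitrary example points
b = np.array([4.0, 0.0, 3.0])

# Euclidean: sqrt of the sum of squared coordinate differences.
euclidean = np.sqrt(np.sum((a - b) ** 2))

# Manhattan: sum of absolute coordinate differences.
manhattan = np.sum(np.abs(a - b))

print(euclidean, manhattan)   # 3.605..., 5.0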
Logical Models
• Tree based
• Rule based

Logical Models: A feature tree (Bonus)
(a) A feature tree combining two Boolean features. Each internal node or split is labelled with a feature, and each edge emanating from a split is labelled with a feature value. Each leaf therefore corresponds to a unique combination of feature values. Also indicated in each leaf is the class distribution derived from the training set.
(b) A feature tree partitions the instance space into rectangular regions, one for each leaf. We can clearly see that the majority of ham lives in the lower left-hand corner.
Labelling a feature tree (Bonus)
• The leaves of the tree could be labelled, from left to right, as ham – spam – spam, employing a simple decision rule called majority class.
• Alternatively, we could label them with the proportion of spam e-mail occurring in each leaf: from left to right, 1/3, 2/3, and 4/5.
• Or, if our task were a regression task, we could label the leaves with predicted real values or even linear functions of some other, real-valued features.
A complete feature tree (Bonus)
(left) A complete feature tree built from two Boolean features.
(right) The corresponding instance-space partition is the finest partition that can be achieved with those two features.
Formulation of rules from a tree (Bonus)
For each path from the root to a leaf:
• Collect all comparisons from the intermediate nodes.
• Join the comparisons along a path using AND; the rules for different paths combine using OR.
• Use the majority class of the leaf as the decision.
A sketch of such rules appears below.
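
A minimal sketch of tree-derived rules in Python. The feature names follow the slides' spam example (lottery, Peter); the particular tree and leaf labels here are illustrative, not taken from a figure:

# Rules read off a feature tree over two Boolean features.
# Each 'if' is one root-to-leaf path: its conditions AND together,
# and the separate branches act as alternatives (OR).
def classify(lottery: int, peter: int) -> str:
    if lottery == 1 and peter == 1:
        return "ham"    # majority class of this leaf
    if lottery == 1 and peter == 0:
        return "spam"
    if lottery == 0 and peter == 1:
        return "ham"
    return "spam"       # lottery == 0 and peter == 0

print(classify(lottery=1, peter=0))  # spam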
Overlapping Rules (Bonus)
• Consider the following rules:
  If lottery = 1 then Class = Y = spam
  If Peter = 1 then Class = Y = ham
• These rules overlap for lottery = 1 ∧ Peter = 1, for which they make contradictory predictions. Furthermore, they fail to make any predictions for lottery = 0 ∧ Peter = 0.
• The effect of overlapping rules in instance space: the two rules make contradictory predictions in the top right-hand corner, and no prediction at all in the bottom left-hand corner.
Probabilistic Models
• Let A denote the variables we know about, e.g. our instance's feature values, and let B denote the target variables we're interested in, e.g. the instance's class.
• Model the relationship between A and B, representing and manipulating the level of uncertainty with respect to these variables through probability distributions.
• Bayes’ rule
• Given the following dataset containing four symptoms and one class, Flu, apply the Naïve Bayes classifier to determine the diagnosis class for the symptoms <chills = Yes, runny nose = Yes, headache = Strong, fever = No>.
Chills | Runny nose | Headache | Fever | Flu?
-------|------------|----------|-------|-----
Yes    | No         | Mild     | Yes   | No
Yes    | Yes        | No       | No    | Yes
Yes    | No         | Strong   | Yes   | Yes
No     | Yes        | Mild     | Yes   | Yes
No     | No         | No       | No    | No
No     | Yes        | Strong   | Yes   | Yes
No     | Yes        | Strong   | No    | No
Yes    | Yes        | Mild     | Yes   | Yes

The task of the algorithm is to look at the evidence, determine the likelihood of each class, and assign a label to the entity accordingly.
• Naïve Bayes is an example of a probabilistic classifier.
• It is based on the idea of conditional probability.
• Conditional probability is the probability that something will happen, given that something else has already happened.

The conditional probability of A given B is the probability that a point is inside A, given that we know it is inside B:

P(A|B) = P(AB)/P(B)
Bayes Rule
• P(A|B) = P(AB)/P(B)
• P(B|A) = P(AB)/P(A)
• Hence P(AB) = P(B|A)·P(A), and therefore:

P(A|B) = P(B|A)·P(A)/P(B)
Bayes Rule
posterior = (prior × likelihood)/evidence:

P(C|x) = P(C)·p(x|C)/p(x)

• Prior: the probability that a patient is high-risk, regardless of x.
• This is the knowledge we have about the value of C before looking at the observables x.
• Likelihood: p(x|C), the probability that an event in class C has observable x.
• P(x1, x2 | C = 1) is the probability that a high-risk patient has X1 = x1 and X2 = x2.
• Evidence: p(x), the probability that observation x is seen, regardless of whether the class is positive or negative.
Useful identities:

P(C = 0) + P(C = 1) = 1
p(x) = p(x|C = 1)·P(C = 1) + p(x|C = 0)·P(C = 0)
P(C = 0 | x) + P(C = 1 | x) = 1
Bayes Rule for classification
• Assume we know the prior, evidence and likelihood.
• Plug them into the Bayes formula to obtain P(C|x).
• Choose C = 1 if P(C = 1|x) > P(C = 0|x).
• Maximum a posteriori (MAP) decision rule: choose the class that maximizes the posterior,
  ŷ = argmax_c P(c|x) = argmax_c p(x|c)·P(c)
• Maximum likelihood (ML) decision rule: choose the class that maximizes the likelihood alone,
  ŷ = argmax_c p(x|c)
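
A minimal sketch of the flu exercise above, applying Naïve Bayes with the MAP rule in plain Python (the arithmetic mirrors the table, so it can be checked by hand):

# Naive Bayes by hand on the flu table (rows copied from the slide).
rows = [
    # (chills, runny_nose, headache, fever, flu)
    ("Yes", "No",  "Mild",   "Yes", "No"),
    ("Yes", "Yes", "No",     "No",  "Yes"),
    ("Yes", "No",  "Strong", "Yes", "Yes"),
    ("No",  "Yes", "Mild",   "Yes", "Yes"),
    ("No",  "No",  "No",     "No",  "No"),
    ("No",  "Yes", "Strong", "Yes", "Yes"),
    ("No",  "Yes", "Strong", "No",  "No"),
    ("Yes", "Yes", "Mild",   "Yes", "Yes"),
]
query = ("Yes", "Yes", "Strong", "No")  # chills, runny nose, headache, fever

def score(cls):
    subset = [r for r in rows if r[4] == cls]
    prior = len(subset) / len(rows)             # P(C)
    likelihood = 1.0
    for i, value in enumerate(query):           # naive independence assumption
        likelihood *= sum(r[i] == value for r in subset) / len(subset)
    return prior * likelihood                   # proportional to the posterior

for cls in ("Yes", "No"):
    print(cls, score(cls))   # Yes: 0.0240, No: 0.0093

Since 0.0240 > 0.0093, the MAP rule labels the query as Flu = Yes.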

Types of probabilistic models
• Predictive probability models use the idea of a conditional probability distribution P(Y|X), from which Y can be predicted from X.
• Generative models estimate the joint distribution P(Y, X). Once the joint distribution is known, we can derive any conditional or marginal distribution involving the same variables.
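
Concretely, any conditional or marginal follows from the joint by summing and normalizing (standard identities, written in the same notation as above):

P(Y | X = x) = P(Y, X = x) / P(X = x), where P(X = x) = Σ_y P(Y = y, X = x)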
Conclusion
• Geometric models use the idea of distance to classify entities.
• Logical models use logical expressions to partition the instance space.
• Probabilistic models use the idea of probability to classify new entities.
Features: The workhorses of machine learning
• A feature can be thought of as a kind of measurement that can be easily performed on any instance.
• Mathematically, features are functions that map from the instance space to some set of feature values called the domain of the feature.
• Since measurements are often numerical, the most common feature
domain is the set of real numbers.
• Other typical feature domains include the set of integers, for instance when
the feature counts something, such as the number of occurrences of a
particular word; the Booleans, if our feature is a statement that can be true
or false for a particular instance, such as “this e-mail is addressed to Peter
Flach”; and arbitrary finite sets, such as a set of colours, or a set of shapes.
What is a Feature?
“A feature is an individual measurable property of a phenomenon being observed.”

Example: predict the price of an apartment.

Features (individual measurable properties):
  Size: 33 sqm
  Location: Agartala, Tripura
  Floor: 5th
  Elevator: No
  #Rooms: 2
  ...
Label (phenomenon observed): ₹4000K

The number of features you are using is called the dimension.

What is Feature Engineering?
• Feature engineering is the process of transforming raw data into relevant features, i.e. features that are:
• Informative (they provide useful data for your model to correctly predict the label)
• Discriminative (they help your model distinguish among your training examples)
• Non-redundant (they do not say the same thing as another feature),
resulting in improved model performance on unseen data.


What is Feature Engineering?
After feature engineering, your dataset will be a big matrix of numerical values

Remember that behind “data” there are two very different notions: training examples and features.
What is Feature Engineering?
• Feature engineering usually includes, successively:
1. Feature construction
2. Feature transformation
3. Dimension reduction
   a. Feature selection
   b. Feature extraction
Feature Construction
Feature construction means turning raw data into informative features that best represent the underlying problem and that the algorithm can understand.

Example: decompose a date-time. The same raw data yields different features for different problems:

Same raw data       | Different problems                   | Different features
2017-01-03 15:00:00 | Predict how hungry someone is        | “Hours elapsed since last meal”: 2
2017-01-03 15:00:00 | Predict the likelihood of a burglary | “Night”: 0 (numerical value for “False”)
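
A minimal sketch of this decomposition in Python (the last-meal time and the night cutoff of 22:00-06:00 are assumptions for illustration):

from datetime import datetime

raw = datetime(2017, 1, 3, 15, 0, 0)         # the raw date-time from the slide
last_meal = datetime(2017, 1, 3, 13, 0, 0)   # hypothetical last-meal time

# Feature for the hunger problem: hours elapsed since the last meal.
hours_since_meal = (raw - last_meal).total_seconds() / 3600   # -> 2.0

# Feature for the burglary problem: is it night? (cutoff is an assumption)
night = int(raw.hour >= 22 or raw.hour < 6)                   # -> 0

print(hours_since_meal, night)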
Feature Construction
Feature construction is where you will need all the domain expertise, and it is key to the performance of your model!
Feature Transformation
Feature transformation is the process of transforming a feature into a new one with a specific function.
• What: map feature values to a new set of values.
• Why: to have data in a format suitable for analysis.
• Caveat: take care not to filter out important characteristics of the data.
Feature Transformation: Scaling
Example: weight (in lb) and height (in feet) rescaled to values in [0, 1].
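
A minimal sketch of min-max scaling to [0, 1] with NumPy (the weight/height values are made up):

import numpy as np

# Made-up weight (lb) and height (ft) columns.
X = np.array([[150.0, 5.2],
              [200.0, 6.1],
              [120.0, 4.9]])

# Min-max scaling: (x - min) / (max - min), per feature column.
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(X_scaled)   # every column now lies in [0, 1]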
Feature Transformation: Class-sensitive discretisation

(left) Artificial data depicting a histogram of body weight measurements of people with (blue)
and without (red) diabetes, with eleven fixed intervals of 10 kilograms width each. (right) By joining
the first and second, third and fourth, fifth and sixth, and the eighth, ninth and tenth intervals, we
obtain a discretisation such that the proportion of diabetes cases increases from left to right. This
discretisation makes the feature more useful in predicting diabetes.
Feature Transformation: Non-linearly separable data

(left) A linear classifier would perform poorly on this data. (right) By transforming the original (x, y) data into (x’, y’) = (x², y²), the data becomes more ‘linear’, and a linear decision boundary x’ + y’ = 3 separates the data fairly well. In the original space this corresponds to a circle of radius √3 around the origin.
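
A minimal sketch of this transformation on synthetic circular data; scikit-learn's LogisticRegression stands in for “a linear classifier”:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: class 1 inside a circle of radius sqrt(3), class 0 outside.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 3).astype(int)

linear = LogisticRegression().fit(X, y)
print("accuracy in (x, y) space:     ", linear.score(X, y))          # poor

# Transform (x, y) -> (x^2, y^2): the boundary x' + y' = 3 is now linear.
X_sq = X ** 2
transformed = LogisticRegression().fit(X_sq, y)
print("accuracy in (x^2, y^2) space: ", transformed.score(X_sq, y))  # near 1.0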
Dimension Reduction
• Dimension reduction is the process of reducing the number of
features used to build the model, with the goal of keeping only
informative, discriminative and non-redundant features.
• The main benefits are:
• Faster computations
• Less storage space required
• Increased model performance
• Data visualization (when reduced to 2D or 3D)
Dimension Reduction- Feature selection
• Feature selection is the process of selecting the most relevant
features among your existing features.

• To keep only “relevant” features, we remove features that are:
• Non-informative
• Non-discriminative
• Redundant
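
A minimal sketch of univariate feature selection with scikit-learn's SelectKBest (one of many selection strategies; the Iris data is reused for convenience):

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Keep the 2 features whose ANOVA F-score best separates the classes.
selector = SelectKBest(score_func=f_classif, k=2)
X_reduced = selector.fit_transform(X, y)

print("kept feature indices:", selector.get_support(indices=True))
print("shape before/after:", X.shape, X_reduced.shape)   # (150, 4) -> (150, 2)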
Dimension Reduction - Feature selection

Identifying the most relevant features will help you get a better general understanding of the drivers of the phenomenon you are trying to predict.
END of
MODULE-1
