Module 1: ML Techniques
Instructor: Dr. Rupak Chakraborty
[email protected]
Module 1
Introduction
Outline
• Definition - Types of Machine Learning - Examples of Machine
Learning Problems - Training versus Testing - Characteristics of
Machine learning tasks - Predictive and descriptive tasks.
• Machine learning Models: Geometric Models, Logical Models,
Probabilistic Models.
• Features: Feature types – Feature Construction and Transformation -
Feature Selection.
What is Learning?
“Learning is any process by which a system improves performance from
experience.”
- Herbert Simon
“The subfield of computer science that gives computers the ability to learn
without being explicitly programmed”.
- Arthur Samuel, 1959
Types of Machine Learning Problems
• Supervised
• Unsupervised
• Reinforcement
RECAP
• What is ML?
• Applications of ML
• Types of ML
Today’s Topics
ML Task
Machine learning Models
• Models form the central concept in machine learning.
• A model is trained to recognize certain types of patterns or to solve a given task.
• We train a model over a set of data and provide it with an algorithm that it can use to reason over and learn from those data.
• Instance space: the set of all possible instances (examples) the model may encounter.
• Types:
• Geometric models
• Probabilistic models
• Logical models
Machine learning models
• Machine learning models can be distinguished according to their main
intuition:
• Geometric models use intuitions from geometry such as separating (hyper-)
planes, linear transformations and distance metrics.
• Probabilistic models view learning as a process of reducing uncertainty,
modelled by means of probability distributions.
• Logical models are defined in terms of easily interpretable logical expressions.
Geometric Model
• Linear Model
• Distance-based Model
Geometric Models
• A geometric model is constructed directly in instance space, using geometric concepts such as lines, planes and distances.
• Geometric models define similarity by considering the geometry of the instance space (e.g. a linear classifier is a geometric classifier).
• Instances can be described as points in two dimensions (x- and y-axes) or in a three-dimensional space (x, y, and z).
Geometric Model: Basic linear classifier
Geometric models: 1. Linear model
• Linear models use geometric concepts such as lines or planes to segment (classify) the instance space.
• For example, in y = mx + c, m and c are the parameters that we are trying to learn from the data.
Figure: The decision boundary learned by a support vector machine on linearly separable data. The decision boundary maximizes the margin, indicated by the dotted lines; the circled data points are the support vectors.
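As a minimal sketch of the basic linear classifier idea (all data values below are illustrative assumptions, and the boundary is placed halfway between the two class means rather than learned by an SVM):

```python
import numpy as np

# Toy 2-D data: positives clustered near (2, 2), negatives near (0, 0).
pos = np.array([[2.0, 2.2], [1.8, 2.5], [2.4, 1.9]])
neg = np.array([[0.1, 0.3], [-0.2, 0.0], [0.4, -0.1]])

# Basic linear classifier: the decision boundary is the perpendicular
# bisector of the segment joining the two class means.
p, n = pos.mean(axis=0), neg.mean(axis=0)
w = p - n                               # weight vector, points from n to p
t = (p @ p - n @ n) / 2                 # threshold: w . x = t on the boundary

def classify(x):
    """Return +1 if x falls on the positive side of w . x = t, else -1."""
    return 1 if w @ x > t else -1

print(classify(np.array([2.0, 2.0])))   # +1 (near the positive mean)
print(classify(np.array([0.0, 0.0])))   # -1 (near the negative mean)
```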
Geometric Model: Distance-based
• Distance-based models use the geometric notion of distance to represent similarity.
• If the distance between two instances is small, the instances have similar feature values, so nearby instances are expected to receive the same classification or to belong to the same cluster.
• Distance is applied through the concepts of neighbours and exemplars.
Geometric Model: Distance-based
• Neighbours are points in close proximity with respect to the chosen distance measure; classification is expressed through exemplars.
• Exemplars are either centroids (the centre of mass under a chosen distance metric, e.g. the arithmetic mean) or medoids (the most centrally located data point).
• Commonly used distance metrics:
• Euclidean: the square root of the sum of the squared differences along each coordinate, √(Σᵢ (aᵢ − bᵢ)²).
• Manhattan: the sum of the absolute differences along each coordinate, Σᵢ |aᵢ − bᵢ|.
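A small sketch of distance-based classification with exemplars (the points and exemplar locations below are made-up assumptions):

```python
import numpy as np

def euclidean(a, b):
    # Square root of the sum of squared coordinate differences.
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return np.sum(np.abs(a - b))

# Exemplars: one centroid per class (arithmetic mean of its members).
exemplars = {"A": np.array([0.0, 0.0]), "B": np.array([3.0, 3.0])}

def nearest_exemplar(x, metric=euclidean):
    # Assign x to the class of its closest exemplar.
    return min(exemplars, key=lambda c: metric(x, exemplars[c]))

print(nearest_exemplar(np.array([2.5, 2.0])))   # 'B'
```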
Logical Models
• Tree Based
• Rule Based
Overlapping Rules
• These rules overlap for lottery = 1 ∧ Peter = 1, for which they make contradictory predictions. Furthermore, they fail to make any predictions for lottery = 0 ∧ Peter = 0.
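As a tiny sketch of this behaviour (the two rules below are assumed from the standard e-mail example; the slide itself does not spell them out):

```python
# Two assumed rules over binary features 'lottery' and 'Peter':
#   if lottery = 1 then predict spam
#   if Peter   = 1 then predict ham
def rule_predictions(lottery, peter):
    """Collect every prediction the rule set makes for an instance."""
    preds = []
    if lottery == 1:
        preds.append("spam")
    if peter == 1:
        preds.append("ham")
    return preds

print(rule_predictions(1, 1))  # ['spam', 'ham'] -> overlap, contradictory
print(rule_predictions(0, 0))  # []              -> gap, no prediction
```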
Probabilistic Models
• The task of the algorithm is to look at the evidence, determine the likelihood of a specific class, and assign a label accordingly to each entity.
• Naïve Bayes is an example of a probabilistic classifier. It is based on the idea of conditional probability.
• Conditional probability is the probability that something will happen, given that something else has already happened.
Conditional probability of A given B: the probability that a point falls inside A, given that we know it is inside B:
P(A|B) = P(A∩B)/P(B)
Bayes Rule
• P(A|B) = P(A∩B)/P(B)
• P(B|A) = P(A∩B)/P(A)
• Hence P(A∩B) = P(B|A)·P(A), which gives
• P(A|B) = P(B|A)·P(A)/P(B)
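A quick numeric check of the derivation (the probability values are arbitrary assumptions):

```python
# Made-up probabilities; any values with P(AB) <= P(A), P(B) would do.
P_A, P_B, P_AB = 0.30, 0.40, 0.12

P_A_given_B = P_AB / P_B                 # 0.30
P_B_given_A = P_AB / P_A                 # 0.40

# Bayes' rule recovers P(A|B) from P(B|A):
assert abs(P_A_given_B - P_B_given_A * P_A / P_B) < 1e-12
print(P_A_given_B)                       # 0.3
```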
Bayes Rule
P(C|x) = P(C) · p(x|C) / p(x)
(posterior = prior × likelihood / evidence)
• Prior P(C): the probability that a patient is high-risk regardless of x, i.e. the knowledge we have about the value of C before looking at the observables x.
Bayes Rule
P(C|x) = P(C) · p(x|C) / p(x)
• Likelihood p(x|C): the probability that an event in class C has observable x. For example, p(x₁, x₂ | C = 1) is the probability that a high-risk patient has X₁ = x₁ and X₂ = x₂.
Bayes Rule
P(C|x) = P(C) · p(x|C) / p(x)
• Evidence p(x): the marginal probability of observing x, regardless of the class.
Bayes Rule
P(C|x) = P(C) · p(x|C) / p(x)
• P(C = 0) + P(C = 1) = 1
• p(x) = p(x|C = 1)·P(C = 1) + p(x|C = 0)·P(C = 0)
• P(C = 0|x) + P(C = 1|x) = 1
Bayes Rule for classification
• Assume the prior, likelihood and evidence are known.
• Plug them into Bayes' formula to obtain P(C|x).
• Choose C = 1 if P(C = 1|x) > P(C = 0|x).
• This is the maximum a posteriori (MAP) decision rule.
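A minimal sketch of the MAP decision rule, assuming a binary class C and a single discrete observable x (all probability values are illustrative):

```python
# P(C): prior over the two classes.
prior = {0: 0.7, 1: 0.3}
# p(x | C): likelihood of each observation under each class.
likelihood = {
    0: {"low": 0.8, "high": 0.2},
    1: {"low": 0.3, "high": 0.7},
}

def map_decision(x):
    # The posterior is proportional to likelihood * prior; the evidence
    # p(x) is the same for both classes, so it can be ignored when comparing.
    scores = {c: likelihood[c][x] * prior[c] for c in prior}
    return max(scores, key=scores.get)

print(map_decision("high"))  # 1: 0.7*0.3 = 0.21 beats 0.2*0.7 = 0.14
print(map_decision("low"))   # 0: 0.8*0.7 = 0.56 beats 0.3*0.3 = 0.09
```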
Features (individual measurable properties) → Label (the phenomenon observed)
• Size: 33 sqm
• Location: Agartala, Tripura
• Floor: 5th
• Elevator: No
• #Rooms: 2
• …
Label: ₹4000K
Remember that behind “data” there are two very different notions: training examples and features.
What is Feature Engineering?
• Feature engineering usually includes, successively:
Feature construction
Feature transformation
Dimension reduction
a. Feature selection
b. Feature extraction
Feature Construction
Feature construction means turning raw data into informative features that
best represent the underlying problem and that the algorithm can understand.
Example: given the raw timestamp 2017-01-03 15:00:00 and the task “predict how hungry someone is”, construct the feature “hours elapsed since last meal”: 2.
Feature construction is where you will need all the domain expertise, and it is key to the performance of your model!
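A small sketch of this construction (the "now" timestamp is the slide's example; the last-meal time is an assumption):

```python
from datetime import datetime

# Turn a raw timestamp into the constructed feature
# "hours elapsed since last meal".
now = datetime(2017, 1, 3, 15, 0, 0)
last_meal = datetime(2017, 1, 3, 13, 0, 0)   # assumed

hours_since_last_meal = (now - last_meal).total_seconds() / 3600
print(hours_since_last_meal)   # 2.0
```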
Feature Transformation
Feature transformation is the process of transforming a feature into a new one with
a specific function.
Figure: (left) Artificial data depicting a histogram of body weight measurements of people with (blue) and without (red) diabetes, with eleven fixed intervals of 10 kilograms width each. (right) By joining the first and second, third and fourth, fifth and sixth, and the eighth, ninth and tenth intervals, we obtain a discretisation such that the proportion of diabetes cases increases from left to right. This discretisation makes the feature more useful in predicting diabetes.
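A rough sketch of such a discretisation (the weights, labels and bin edges below are made-up; they only illustrate merging intervals so that the positive proportion increases from left to right):

```python
import numpy as np

# Body weights (kg) and diabetes labels (1 = diabetes), all synthetic.
weights = np.array([52, 61, 73, 78, 84, 95, 101, 110])
labels  = np.array([ 0,  0,  0,  1,  0,  1,   1,   1])

edges = [70, 90]                      # merged bin boundaries
bins = np.digitize(weights, edges)    # 0: <70, 1: 70-90, 2: >=90

for b in range(3):
    mask = bins == b
    print(b, labels[mask].mean())     # fraction of diabetes cases per bin
```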
Feature Transformation: Non-linearly
separable data
Figure: (left) A linear classifier would perform poorly on this data. (right) By transforming the original (x, y) data into (x′, y′) = (x², y²), the data becomes more ‘linear’, and the linear decision boundary x′ + y′ = 3 separates the data fairly well. In the original space this corresponds to a circle of radius √3 around the origin.
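A short sketch of this transformation (the sample points are made-up assumptions):

```python
import numpy as np

# Points inside vs. outside the circle x^2 + y^2 = 3 are not linearly
# separable in (x, y), but become so after (x', y') = (x^2, y^2).
inside  = np.array([[0.5, 0.5], [-1.0, 0.2], [0.0, -1.2]])
outside = np.array([[2.0, 1.0], [-1.5, 1.5], [0.0, 2.5]])

def transform(points):
    return points ** 2            # (x, y) -> (x^2, y^2)

# Linear boundary x' + y' = 3 in the transformed space:
print(transform(inside).sum(axis=1) < 3)    # [ True  True  True]
print(transform(outside).sum(axis=1) > 3)   # [ True  True  True]
```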
Feature Transformation
Examples of transformations:
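For instance, a few common single-feature transformations (these particular choices are assumptions, not necessarily the ones shown on the original slide):

```python
import numpy as np

x = np.array([1.0, 10.0, 100.0, 1000.0])

log_x   = np.log10(x)                    # compress large value ranges
z_x     = (x - x.mean()) / x.std()       # standardise: zero mean, unit variance
clipped = np.clip(x, 0, 100)             # cap extreme values
```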
Dimension Reduction
• Dimension reduction is the process of reducing the number of
features used to build the model, with the goal of keeping only
informative, discriminative and non-redundant features.
• The main benefits are:
• Faster computations
• Less storage space required
• Increased model performance
• Data visualization (when reduced to 2D or 3D)
Dimension Reduction- Feature selection
• Feature selection is the process of selecting the most relevant
features among your existing features.
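A minimal filter-style selection sketch, assuming relevance is scored by absolute correlation with the label (the data is synthetic):

```python
import numpy as np

# Synthetic data: only features 0 and 2 actually influence the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 2 * X[:, 0] - X[:, 2] + 0.1 * rng.normal(size=100)

# Score each feature by |correlation| with y, then keep the top k.
corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
k = 2
selected = np.argsort(corr)[::-1][:k]
print(selected)   # the two most relevant feature indices, expected {0, 2}
```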