Machine Learning
Machine Learning (ML) is a branch of artificial
intelligence (AI) that focuses on developing algorithms
and statistical models that enable computers to learn
from and make decisions based on data without being
explicitly programmed.
Types of Machine Learning
Supervised Machine Learning:
Supervised learning is a type of machine learning where
the model is trained on labeled data. This means that for
each input, there is a corresponding output.
Unsupervised Machine Learning:
Unsupervised learning is a type of machine learning
where the model is trained on unlabeled data. The goal is
to uncover hidden patterns or structures within the data
without predefined labels.
Supervised Learning Process: Two Steps
The supervised learning process has two steps: a training step, in which the model learns a mapping from labeled inputs to outputs, and a testing (prediction) step, in which the trained model is applied to new, unseen inputs.
Supervised Learning
Supervised learning problems can be further grouped into regression and classification problems:
Classification: Classification is a supervised learning task where the goal is to assign predefined labels or categories to input data based on its features.
Regression: Regression is a supervised learning task where the goal is to predict a continuous numeric value from the input features.
Common supervised learning algorithms include:
Decision Tree
K Nearest Neighbors
Logistic Regression
Linear Regression
Decision Tree
A Decision Tree (DT) defines a hierarchy of rules to make a prediction.
[Figure: two example trees. The first classifies animals as mammal vs. non-mammal by asking "Body temperature: warm or cold?" and then "Gives birth: yes or no?". The second splits a 2-D feature space, testing x1 > 3.5 at the root and then x2 > 2 or x2 > 3, predicting Red or Green at the leaves.]
Remember: the root node contains all training inputs; each leaf node receives a subset of the training inputs.
DT is very efficient at test time: to predict the label of a test point, nearest neighbors would require computing distances from all 48 training inputs, while the DT predicts the label by doing just 2 feature-value comparisons. Way faster!
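As an illustration, here is a minimal Python sketch of the two-feature tree in the figure above. The split thresholds (x1 > 3.5, x2 > 2, x2 > 3) come from the slide; the leaf-label assignment is reconstructed from the figure, and the function name is ours.

```python
# A sketch of the two-feature decision tree from the figure above.
# The thresholds come from the slide; the leaf labels are reconstructed
# from the figure, so treat them as illustrative.

def predict(x1, x2):
    """Predict 'Red' or 'Green' using just two feature-value comparisons."""
    if x1 > 3.5:
        # Right subtree: test x2 > 3
        return "Red" if x2 > 3 else "Green"
    # Left subtree: test x2 > 2
    return "Green" if x2 > 2 else "Red"

print(predict(4, 4))   # two comparisons (x1 > 3.5, then x2 > 3) -> 'Red'
print(predict(2, 1))   # two comparisons (x1 <= 3.5, then x2 <= 2) -> 'Red'
```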
K Nearest Neighbors (KNN)
KNN Methodology
● Let's say we have a new instance called x.
● The algorithm will calculate the distance between x and all the instances in the training set, using a distance metric such as Euclidean or Manhattan distance.
Nearest Neighbor Classifiers
• Basic idea: if it walks like a duck and quacks like a duck, then it's probably a duck.
• To classify a test record, compute its distance to the stored training records and let the nearest ones vote on the label.
• Rule of thumb for choosing K: K = sqrt(N), where N is the number of training points.
Nearest-Neighbor Classifiers: Issues
• The value of k, the number of nearest neighbors to retrieve
• Choice of distance metric to compute the distance between records
• Computational complexity, which grows with the size of the training set and the dimension of the data
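A from-scratch sketch of the methodology above. The function name, toy points, and labels are all made up for the example; K is chosen with the sqrt(N) rule of thumb.

```python
# KNN: compute the distance from a new instance x to every training point,
# take the K nearest, and predict by majority vote.
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train_X, train_y, x, k):
    # Distance from x to all training instances, paired with their labels
    dists = sorted(zip((euclidean(p, x) for p in train_X), train_y))
    # Majority vote among the k nearest labels
    top_k = [label for _, label in dists[:k]]
    return Counter(top_k).most_common(1)[0][0]

train_X = [(1, 1), (2, 1), (5, 6), (6, 5)]
train_y = ["red", "red", "green", "green"]
k = round(math.sqrt(len(train_X)))                  # K = sqrt(N) -> 2
print(knn_predict(train_X, train_y, (5.5, 5.5), k))  # 'green'
```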
Linear Regression Model
This is the base model for all of statistical machine learning.
x is a one-feature data variable; y is the value we are trying to predict.
The regression model is
y = w0 + w1 x
There are two parameters to estimate: the slope of the line, w1, and the y-intercept, w0.
Solving the regression problem
We basically want to find the {w0, w1} that minimize the deviations from the predictor line, i.e., the sum of squared errors Σi (yi − (w0 + w1 xi))².
How do we do it?
Iterate over all possible w values along the two dimensions?
Same, but smarter? [next class]
No: we can do this in closed form with just plain calculus.
Parameter estimation via calculus
We just need to set the partial derivatives of the squared-error objective with respect to w0 and w1 to zero (full derivation). Simplifying gives the closed-form estimates
w1 = Σi (xi − x̄)(yi − ȳ) / Σi (xi − x̄)²,  w0 = ȳ − w1 x̄
where x̄ and ȳ are the sample means.
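A minimal sketch of these closed-form estimates in plain Python; the function name and toy data are illustrative, not part of the slides.

```python
# Closed-form least-squares fit of y = w0 + w1*x.
def fit_linear(xs, ys):
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: w1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
    w1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
         sum((x - x_bar) ** 2 for x in xs)
    # Intercept: w0 = y_bar - w1 * x_bar
    w0 = y_bar - w1 * x_bar
    return w0, w1

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]          # roughly y = 2x
w0, w1 = fit_linear(xs, ys)
print(f"y = {w0:.2f} + {w1:.2f} x")     # y = 0.15 + 1.95 x
```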
Logistic Regression
• Logistic Regression is a statistical technique that predicts the probability of a target variable based on the independent features.
• It predicts the probability of occurrence of a class label; based on these probabilities, the data points are labelled.
• The probability of an outcome y is calculated using the sigmoid function S(x) = 1/(1 + e^(−f(x))), which is then used to decide the class based on a threshold value.
• A threshold (or cut-off; commonly 0.5) is fixed, then:
Class = 1 if probability > threshold
Class = 0 if probability ≤ threshold
Logistic Regression
● Logistic regression is very similar to linear regression: the explanatory variables (X) are combined with weight values to predict a target variable of a binary class (y).
● f(x) = a + bx; here, f(x) can take values from −∞ to ∞.
● log(p/(1−p)) = f(x)
○ Here, p is the probability that the event y occurs (Y = 1) [range 0 to 1]
○ p/(1−p) is the odds ratio [range 0 to ∞]
○ log(p/(1−p)) is the log of the odds ratio, the logit [range −∞ to ∞]
● log(p/(1−p)) = a + bx: the log odds is linearly related to the features and can take any value from −∞ to ∞.
Logistic Regression
Take the exponential of the logit and you have the odds for the two groups in question:
p/(1−p) = e^f(x) : the odds range from 0 to ∞; values greater than 1 mean the event is more likely to occur than not, and values less than 1 mean it is less likely to occur.
P(Y) = 1/(1 + e^(−f(x))) : the sigmoid function calculates the probability.
p(y) = 1/(1 + e^(−(a+bx))) : if f(x) = 0 then p = 0.5; as f(x) increases, p approaches 1, and as f(x) decreases, p approaches 0.
Note: the logarithm (logit) transformation is used to model the non-linear relationship between Y and X by transforming Y.
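A short sketch of the prediction rule just described: the sigmoid of f(x) = a + bx, thresholded at 0.5. The coefficient values a and b are hypothetical (assumed already fitted), as are the function names.

```python
# Logistic-regression prediction: P(Y=1|x) = 1 / (1 + e^-(a + b*x)),
# then compare the probability against a threshold.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, a, b):
    return sigmoid(a + b * x)

def predict_class(x, a, b, threshold=0.5):
    return 1 if predict_proba(x, a, b) > threshold else 0

a, b = -4.0, 1.5                      # hypothetical fitted coefficients
print(predict_proba(2.0, a, b))       # f(x) = -1.0 -> p ~ 0.27
print(predict_class(2.0, a, b))       # 0.27 < 0.5  -> class 0
print(predict_class(4.0, a, b))       # f(x) = 2.0 -> p ~ 0.88 -> class 1
```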
Advantages of Supervised Learning
It allows you to be very specific about the definition of the labels.
You can determine the number of classes you want to have.
The input data is very well known and is labeled.
The results produced by supervised methods are generally more accurate than those of unsupervised methods.
Unsupervised Learning
Unsupervised learning is where you only have input data (X) and no corresponding output variables. The goal of unsupervised learning is to model the underlying structure or distribution in the data in order to learn more about the data.
Unsupervised Learning
Unsupervised learning problems can be further grouped into
clustering and association problems.
Clustering: Clustering is a technique in machine learning
and data analysis that involves grouping similar data points
based on certain criteria.
Association: The primary goal is to identify associations or
dependencies between variables without the need for
predefined labels or a target outcome. Association learning
is commonly used in data mining, market basket analysis,
and discovering patterns in transactional datasets.
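A tiny market-basket sketch of the support and confidence calculations that underlie association learning; the transactions and helper functions are made up for illustration.

```python
# Support and confidence for the rule {bread} -> {milk} over toy baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]

def support(itemset):
    # Fraction of transactions containing every item in the itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    # P(rhs | lhs) = support(lhs union rhs) / support(lhs)
    return support(lhs | rhs) / support(lhs)

print(support({"bread", "milk"}))        # 2/4 = 0.5
print(confidence({"bread"}, {"milk"}))   # 0.5 / 0.75 ~ 0.67
```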
Advantages of Unsupervised Learning
Less complexity in comparison with supervised learning.
Unsupervised Learning
List of common unsupervised machine learning algorithms:
K-means clustering
Dimensionality Reduction
K-means clustering
K-means clustering is an algorithm to classify or group objects into K groups based on their features.
[Figure: the K-means iterations for K = 2. Arbitrarily choose K objects as the initial cluster centers; assign each object to the cluster with the most similar (nearest) center; update the cluster means; then reassign objects and repeat until the assignments no longer change.]
The K-Means Clustering Method
Given: {2, 4, 10, 12, 3, 20, 30, 11, 25}, k = 2
Randomly assign means: m1 = 3, m2 = 4
K1 = {2,3}, K2 = {4,10,12,20,30,11,25} → m1 = 2.5, m2 = 16
K1 = {2,3,4}, K2 = {10,12,20,30,11,25} → m1 = 3, m2 = 18
K1 = {2,3,4,10}, K2 = {12,20,30,11,25} → m1 = 4.75, m2 = 19.6
K1 = {2,3,4,10,11,12}, K2 = {20,30,25} → m1 = 7, m2 = 25
Stop, as the clusters with these means are the same.
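The worked example above can be reproduced in a few lines of Python; the function name is illustrative, and the stopping test mirrors the "stop when the clusters are the same" rule.

```python
# 1-D two-means clustering of the example data, starting from m1=3, m2=4.
def kmeans_1d(points, m1, m2):
    while True:
        # Assign each point to the nearest mean
        k1 = [p for p in points if abs(p - m1) <= abs(p - m2)]
        k2 = [p for p in points if abs(p - m1) > abs(p - m2)]
        # Recompute the means
        new_m1, new_m2 = sum(k1) / len(k1), sum(k2) / len(k2)
        if (new_m1, new_m2) == (m1, m2):   # assignments stable -> stop
            return k1, k2, m1, m2
        m1, m2 = new_m1, new_m2

points = [2, 4, 10, 12, 3, 20, 30, 11, 25]
print(kmeans_1d(points, 3, 4))
# ([2, 4, 10, 12, 3, 11], [20, 30, 25], 7.0, 25.0)
```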
Dimensionality Reduction
Dimensionality reduction is a technique used in machine learning
and data analysis to reduce the number of features or variables in
a dataset while preserving its essential information.
Types of Dimensionality Reduction
1. Feature Selection:
Feature selection involves choosing a subset of the most relevant
features from the original set. This is done by evaluating the
importance of each feature based on certain criteria, such as
statistical tests, information gain, or correlation analysis.
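One simple instance of the criteria mentioned above is correlation-based selection. This numpy sketch ranks features by their absolute correlation with the target and keeps the top ones; the toy data and the choice to keep two features are assumptions for illustration.

```python
# Correlation-based feature selection on made-up data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))            # 4 candidate features
y = 2 * X[:, 0] - X[:, 2] + rng.normal(scale=0.1, size=100)

# Absolute Pearson correlation of each feature with the target
scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                   for j in range(X.shape[1])])
keep = np.argsort(scores)[::-1][:2]      # keep the 2 most relevant features

print(scores.round(2))                    # features 0 and 2 should score highest
print(sorted(keep.tolist()))              # [0, 2]
```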