Machine Learning

1
Machine Learning
 Machine Learning:
 Machine Learning (ML) is a branch of artificial
intelligence (AI) that focuses on developing algorithms
and statistical models that enable computers to learn
from and make decisions based on data without being
explicitly programmed.

2
Types of Machine Learning
 Supervised Machine Learning:
 Supervised learning is a type of machine learning where
the model is trained on labeled data. This means that for
each input, there is a corresponding output.
 Unsupervised Machine Learning:
 Unsupervised learning is a type of machine learning
where the model is trained on unlabeled data. The goal is
to uncover hidden patterns or structures within the data
without predefined labels.
3
Supervised Learning Process: Two steps

 Learning (Training): Learn a model using the training data

 Testing: Test the model using unseen test data to assess the model accuracy

4
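As an illustration only (the slides do not name a library or dataset), here is a minimal scikit-learn sketch of these two steps, using the iris dataset and an arbitrarily chosen classifier:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Learning (training): learn a model using the training data.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Testing: apply the model to unseen test data to assess its accuracy.
print(accuracy_score(y_test, model.predict(X_test)))
```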
Supervised Learning
Supervised learning problems can be further grouped into
regression and classification problems:
 Classification: Classification is a type of supervised learning task
in machine learning where the goal is to assign predefined
labels or categories to input data based on its features.

 Regression: Like classification, regression is a supervised learning task. However, the goal in regression is to predict a continuous numeric value rather than discrete classes.
5
Supervised Learning
List of common supervised machine learning
algorithms:

 Decision Tree
 K Nearest Neighbors
 Logistic Regression
 Linear Regression

6
7
Decision Tree
 A Decision Tree (DT) defines a hierarchy of rules to make a prediction

[Figure: an example tree. The root node tests body temperature (warm vs. cold); cold leads to a leaf node predicting Non-mammal. An internal node then tests "gives birth" (yes vs. no); yes leads to a leaf predicting Mammal, no to a leaf predicting Non-mammal.]

 Root and internal nodes test rules. Leaf nodes make predictions
8
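To make this concrete, here is a minimal sketch (not part of the slides) that fits a small tree with scikit-learn on an assumed 0/1 encoding of the mammal example; the feature names and toy data are illustrative assumptions:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

X = [
    [1, 1],  # warm-blooded, gives birth        -> mammal
    [1, 0],  # warm-blooded, does not give birth -> non-mammal (e.g., a bird)
    [0, 0],  # cold-blooded, does not give birth -> non-mammal
    [0, 1],  # cold-blooded, gives birth         -> non-mammal (e.g., some snakes)
]
y = [1, 0, 0, 0]  # 1 = mammal, 0 = non-mammal

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# Print the learned hierarchy of rules: root/internal nodes test features,
# leaf nodes make the prediction.
print(export_text(tree, feature_names=["body_temp_warm", "gives_birth"]))
print(tree.predict([[1, 1]]))  # -> [1], i.e. mammal
```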
Learning Decision Tree with Supervision
 The basic idea is very simple
 Recursively partition the training data into homogeneous regions
 What do you mean by "homogeneous" regions? A homogeneous region will have all (or a majority of) the training inputs with the same/similar outputs
 Even though the rule within each group is simple, we are able to learn a fairly sophisticated model overall (note that in this example, each rule is a simple horizontal/vertical classifier, but the overall decision boundary is rather sophisticated)
 Within each group, fit a simple supervised learner (e.g., predict the majority output within that region)
9
Decision Trees for Classification

[Figure: a decision tree over two features. The root node tests x1 > 3.5?; its NO branch then tests x2 > 2? and its YES branch tests x2 > 3?; the four leaf nodes predict Red, Green, Green, and Red. The accompanying plot shows the training points and a test input partitioned by these axis-aligned splits (Feature 1 on the horizontal axis).]

 Remember: The root node contains all training inputs; each leaf node receives a subset of the training inputs

 DT is very efficient at test time: to predict the label of a test point, nearest neighbors would require computing distances from all 48 training inputs, whereas the DT predicts the label with just 2 feature-value comparisons. Way faster!
10
K Nearest Neighbors (KNN)

 The KNN classifier is a non-parametric and instance-based learning algorithm.
 Non-parametric means it makes no assumptions about the distribution of the data and thus avoids the risk of misspecifying the underlying distribution.
 Instance-based learning means that the algorithm does not explicitly learn any parameters.
 For classification, the algorithm takes a majority vote among the K most similar instances to a given "unseen" observation. K is a count.
 KNN is not suitable if the data is noisy and the target classes do not have a clear demarcation in terms of attribute values.
 The closest class is identified using distance measures such as Euclidean distance.
K Nearest Neighbors (KNN)
Distance measures
● Euclidean distance between any two points p and q: d(p, q) = sqrt(Σi (pi − qi)²)
● Manhattan distance: d(p, q) = Σi |pi − qi|
1
KNN Methodology
● Let's say we have a new instance called x.
● The algorithm calculates the distance between x and all the instances in the training set.
● Arrange these distances in increasing order.
● Find the k nearest neighbors. If k = 3, it selects the three nearest instances based on the similarity measure.
● Use the k neighbors to determine the class of x by majority voting among these closest instances.

1
KNN Methodology
Nearest Neighbor Classifiers
• Basic idea:
• If it walks like a duck, quacks like a duck, then it’s probably a duck

[Figure: compute the distance from the test record to all training records, then choose the k "nearest" records]
KNN Methodology
Value of K
• Choosing the value of k:
• If k is too small, sensitive to noise points
• If k is too large, neighborhood may include points from other classes

Rule of thumb: K = sqrt(N), where N is the number of training points
KNN Methodology
Nearest-Neighbor Classifiers: Issues
 The value of k, the number of nearest neighbors to retrieve
 Choice of Distance Metric to compute the distance between records
 Computational complexity
 Size of training set
 Dimension of data
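The KNN methodology described above maps almost step for step onto code. Below is a minimal NumPy sketch (an illustration, not the slides' implementation) of the distance, sort, top-k, and majority-vote steps using Euclidean distance:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify a single point x by majority vote among its k nearest neighbors."""
    # 1. Compute the (Euclidean) distance between x and every training instance.
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    # 2./3. Sort the distances in increasing order and keep the k nearest.
    nearest = np.argsort(dists)[:k]
    # 4. Majority vote among the labels of the k neighbors.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy usage with made-up points and labels
X_train = np.array([[1.0, 2.0], [2.0, 3.0], [6.0, 5.0], [7.0, 8.0]])
y_train = np.array(["red", "red", "green", "green"])
print(knn_predict(X_train, y_train, np.array([1.5, 2.5]), k=3))  # -> "red"
```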
Linear Regression Model
 This is the base model for all statistical machine learning
 x is a one-feature data variable
 y is the value we are trying to predict
 The regression model is y = w0 + w1·x + ε
 Two parameters to estimate – the slope of the line w1 and the y-intercept w0
Solving the regression problem
 We basically want to find
{w0, w1} that minimize
deviations from the predictor
line

 How do we do it?
 Iterate over all possible w
values along the two
dimensions?
 Same, but smarter? [next
class]
 No, we can do this in
closed form with just plain
calculus
Parameter estimation via calculus
 We just need to set the partial derivatives of the squared error with respect to w0 and w1 to zero (full derivation omitted)
 Simplifying gives the closed-form solution:
   w1 = Σi (xi − x̄)(yi − ȳ) / Σi (xi − x̄)²
   w0 = ȳ − w1·x̄
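As a quick sanity check of this closed-form solution, here is a short NumPy sketch on synthetic one-feature data (the data and the "true" parameter values are assumptions made purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=50)  # true w0 = 2.0, w1 = 0.5

# Closed-form least-squares estimates from the partial-derivative conditions.
x_bar, y_bar = x.mean(), y.mean()
w1 = ((x - x_bar) * (y - y_bar)).sum() / ((x - x_bar) ** 2).sum()
w0 = y_bar - w1 * x_bar

print(f"w1 = {w1:.3f}, w0 = {w0:.3f}")  # should be close to 0.5 and 2.0
```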
Logistic Regression
• Logistic Regression is a statistical technique that predicts the probability of a target variable based on the independent features.
• It predicts the probability of occurrence of a class label; based on these probabilities, the data points are labelled.
• The probability of an outcome (y) is calculated using the sigmoid function S(x) = 1/(1 + e^(-f(x))), which is then used to decide the class based on a threshold value.
• A threshold (or cut-off; commonly a threshold of 0.5 is used) is fixed, then:
   Class = 1 if probability > threshold
   Class = 0 if probability < threshold
Logistic Regression
● Logistic regression is similar to linear regression in that the explanatory variables (X) are combined with weight values to predict a target variable of binary class (y).
● f(x) = a + bx; here, f(x) can take values from -∞ to ∞
● log(p/(1-p)) = f(x)
   ○ Here, p is the probability that the event y occurs (Y=1) [range 0 to 1]
   ○ p/(1-p) is the odds ratio [range 0 to infinity]
   ○ log(p/(1-p)) is the log of the odds ratio (logit) [range -∞ to ∞]
● log(p/(1-p)) = a + bx : the log of p/(1-p) is linearly related to the features and can take values between -∞ and ∞
Logistic Regression
 Exponentiate the logit and you have the odds for the two groups in question.
 p/(1-p) = e^f(x) : odds (range from 0 to infinity, with values greater than 1 associated with an event being more likely to occur than not, and values less than 1 associated with an event that is less likely to occur)
 P(y) = 1/(1 + e^(-f(x))) : the sigmoid function calculates the probability
 p(y) = 1/(1 + e^(-(a+bx))) : if f(x) = 0 then p = 0.5; as f(x) increases, p approaches 1, and as f(x) decreases, p approaches 0.
 Note - Logarithm or logit transformation is used to model the non-
linear relationship between Y and X by transforming Y.
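A small NumPy sketch tying the logit, the odds, the sigmoid, and the threshold together (the coefficients a and b below are arbitrary illustrative values, not fitted from any data):

```python
import numpy as np

def sigmoid(z):
    """P(y=1) = 1 / (1 + e^(-f(x)))."""
    return 1.0 / (1.0 + np.exp(-z))

# Assumed coefficients for f(x) = a + b*x (illustrative values only).
a, b = -2.0, 1.5
x = np.array([0.0, 1.0, 2.0, 3.0])

f_x = a + b * x                  # logit: log(p / (1 - p)), ranges over (-inf, inf)
odds = np.exp(f_x)               # p / (1 - p), ranges over (0, inf)
p = sigmoid(f_x)                 # probability in (0, 1); p = 0.5 when f(x) = 0
labels = (p > 0.5).astype(int)   # class 1 if the probability exceeds the 0.5 threshold

print(p, labels)
```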
Advantages of Supervised Learning
 It allows you to be very specific about the definition of the labels.
 You can determine the number of classes you want to have.
 The input data is very well known and is labeled.
 The results produced by supervised methods tend to be more accurate.
22
Unsupervised Learning
 Unsupervised learning is where you only have input data (X) and no corresponding output labels. The goal of unsupervised learning is to model the underlying structure or distribution in the data in order to learn more about the data.

 These methods are called unsupervised because, unlike supervised learning, there are no correct answers and there is no teacher. Algorithms are left to their own devices to discover and present the interesting structure in the data.

23
Unsupervised Learning
Unsupervised learning problems can be further grouped into
clustering and association problems.
 Clustering: Clustering is a technique in machine learning
and data analysis that involves grouping similar data points
based on certain criteria.
 Association: The primary goal is to identify associations or
dependencies between variables without the need for
predefined labels or a target outcome. Association learning
is commonly used in data mining, market basket analysis,
and discovering patterns in transactional datasets.

24
Advantages of Unsupervised Learning
 Less complexity in comparison with supervised learning.

 It is often easier to get unlabeled data.

 Learning can take place in real time, with the input data analyzed and labeled in the presence of learners.

25
Unsupervised Learning
List of common unsupervised machine learning algorithms:

 K-means clustering
 Dimensionality Reduction

26
K-means clustering
 K-means clustering is an algorithm that classifies or groups objects, based on their features, into K groups.

 K is a positive integer number.

 The grouping is done by minimizing the sum of squares of


distances between data and the corresponding cluster
centroid.
27
K-means clustering Method

Given k, the k-means algorithm is implemented in four steps:


 Partition objects into k nonempty subsets
 Compute seed points as the centroids of the clusters of the
current partition (the centroid is the center, i.e., mean point, of
the cluster)
 Assign each object to the cluster with the nearest seed point
 Go back to Step 2; stop when there are no new assignments
28
K-means clustering Method
The K-Means Clustering Method
• Example (K = 2):

[Figure: arbitrarily choose K objects as the initial cluster centers; assign each object to the most similar center; update the cluster means; then reassign objects and update the means again, repeating until no assignments change]
K-means clustering Method
The K-Means Clustering Method
Given: {2,4,10,12,3,20,30,11,25}, k=2
 Randomly assign means: m1=3,m2=4
 K1={2,3}, K2={4,10,12,20,30,11,25},
m1=2.5,m2=16
 K1={2,3,4},K2={10,12,20,30,11,25},
m1=3,m2=18
 K1={2,3,4,10},K2={12,20,30,11,25},
m1=4.75,m2=19.6
 K1={2,3,4,10,11,12},K2={20,30,25},
m1=7,m2=25
 Stop as the clusters with these means are the same.
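The same behaviour can be reproduced with a short NumPy sketch (my own illustration; the stopping test is the "no new assignments" condition from the four-step method above):

```python
import numpy as np

def kmeans_1d(points, means, max_iter=100):
    """Plain k-means on 1-D data: assign to the nearest mean, update means, repeat."""
    points = np.asarray(points, dtype=float)
    means = np.asarray(means, dtype=float)
    assign = None
    for _ in range(max_iter):
        # Assign each point to the cluster with the nearest mean.
        new_assign = np.argmin(np.abs(points[:, None] - means[None, :]), axis=1)
        if assign is not None and np.array_equal(new_assign, assign):
            break  # no new assignments -> stop
        assign = new_assign
        # Update each mean as the centroid of its current cluster.
        means = np.array([points[assign == k].mean() for k in range(len(means))])
    return assign, means

data = [2, 4, 10, 12, 3, 20, 30, 11, 25]
assign, means = kmeans_1d(data, means=[3, 4])
print(assign, means)  # converges to means of 7 and 25, as in the worked example
```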
Dimensionality Reduction
 Dimensionality reduction is a technique used in machine learning
and data analysis to reduce the number of features or variables in
a dataset while preserving its essential information.

 The high dimensionality of a dataset (large number of features)


can lead to challenges such as increased computational
complexity, the curse of dimensionality, and difficulties in
visualizing or interpreting the data.

31
Types of Dimensionality Reduction
1. Feature Selection:
 Feature selection involves choosing a subset of the most relevant
features from the original set. This is done by evaluating the
importance of each feature based on certain criteria, such as
statistical tests, information gain, or correlation analysis.

 Common techniques for feature selection include filter methods


(e.g., based on statistical tests), wrapper methods (e.g., using the
performance of a specific model), and embedded methods (e.g.,
feature importance from tree-based models).
32
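As one concrete possibility (an illustration using scikit-learn's filter-style selector on the iris dataset; the slides do not prescribe a specific tool):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Filter method: score each feature with a statistical test (ANOVA F-test)
# and keep only the k best-scoring features.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

print(selector.scores_)   # importance score per original feature
print(X_selected.shape)   # (150, 2): only the 2 most relevant features are kept
```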
Types of Dimensionality Reduction
2. Feature Extraction:
 Feature extraction transforms the original features into a new set
of features, typically of lower dimensionality. This is achieved by
creating new features that capture the most important
information in the original data.
 Principal Component Analysis (PCA) is a popular linear technique
for feature extraction. It identifies orthogonal directions (principal
components) along which the data varies the most and projects
the data onto these components.
33
Steps of PCA
 Let μ be the mean vector (taking the mean of all rows)
 Adjust the original data by the mean: X' = X − μ
 Compute the covariance matrix C of the adjusted data X'
 Find the eigenvectors and eigenvalues of C
 For the matrix C, an eigenvector e (a column vector) has the same direction as Ce: Ce = λe, where λ is called an eigenvalue of C
 Ce = λe ⇒ (C − λI)e = 0
34
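The steps above translate directly into NumPy (a sketch assuming that rows of X are samples and columns are features; the toy data is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # toy data: 100 samples, 3 features

# 1. Mean vector (mean of all rows) and mean-adjusted data X' = X - mu.
mu = X.mean(axis=0)
X_adj = X - mu

# 2. Covariance matrix C of the adjusted data.
C = np.cov(X_adj, rowvar=False)

# 3. Eigenvectors and eigenvalues of C (C e = lambda e).
eigvals, eigvecs = np.linalg.eigh(C)   # eigh because C is symmetric

# 4. Sort by decreasing eigenvalue and project onto the top-2 principal components.
order = np.argsort(eigvals)[::-1]
W = eigvecs[:, order[:2]]
X_reduced = X_adj @ W

print(eigvals[order], X_reduced.shape)  # eigenvalues in decreasing order, (100, 2)
```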
THANK YOU

35
