Data Science Unit 3
Introduction
The K-nearest neighbors (KNN) algorithm uses ‘feature similarity’ to predict the values
of new data points, which means that a new data point is assigned a value based on
how closely it matches the points in the training set. We can understand its working
with the help of the following steps −
Step 1 − For implementing any algorithm, we need a dataset. So, during the first
step of KNN, we must load the training as well as the test data.
Step 2 − Next, we need to choose the value of K, i.e. the number of nearest data
points to consider. K can be any integer.
Step 3 − For each point in the test data, do the following −
3.1 − Calculate the distance between the test data point and each row of the
training data using any of the methods, namely Euclidean, Manhattan or
Hamming distance. The most commonly used method to calculate distance
is Euclidean.
3.2 − Now, based on the distance values, sort them in ascending order.
3.3 − Next, choose the top K rows from the sorted array.
3.4 − Now, assign a class to the test point based on the most frequent
class of these rows.
Step 4 − End
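As a rough sketch of these steps in code, assuming a tiny made-up dataset, K = 3, and scikit-learn's KNeighborsClassifier:

```python
# Minimal KNN sketch following the steps above (dataset values are invented).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Step 1: load training and test data (here: a small synthetic dataset).
X = np.array([[25, 40], [30, 45], [55, 60], [60, 65], [20, 35], [65, 70]])
y = np.array([0, 0, 1, 1, 0, 1])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# Step 2: choose K, the number of nearest neighbours to consider.
knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")

# Step 3: for each test point, compute distances, pick the K closest training
# rows, and vote on the most frequent class (done internally by fit/predict).
knn.fit(X_train, y_train)
print(knn.predict(X_test))
```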
Example
The following is an example to understand the concept of K and the working of the
KNN algorithm −
Suppose we have a dataset which can be plotted as follows −
Now, we need to classify a new data point, shown as a black dot (at the point (60, 60)),
into the blue or red class. We assume K = 3, i.e. the algorithm finds the three nearest
data points. This is shown in the next diagram −
We can see in the above diagram the three nearest neighbors of the black-dot data
point. Among those three, two lie in the red class, hence the black dot will also be
assigned to the red class.
Pros
Cons
Applications of KNN
The following are some of the areas in which KNN can be applied successfully −
Banking System
KNN can be used in a banking system to predict whether an individual is fit for loan
approval, and whether that individual has characteristics similar to those of
defaulters.
KNN algorithms can also be used to find an individual’s credit rating by comparing
them with persons having similar traits.
Support vector machines (SVMs)
Introduction to SVM
Support vector machines (SVMs) are powerful yet flexible supervised machine
learning algorithms which are used for both classification and regression. Generally,
however, they are used in classification problems. SVMs were first introduced in the
1960s and were later refined in the 1990s. SVMs have their own unique way of
implementation as compared to other machine learning algorithms. Lately, they
have become extremely popular because of their ability to handle multiple continuous
and categorical variables.
Working of SVM
SVM Kernels
Linear Kernel
The linear kernel can be used as a dot product between any two observations. The
formula of the linear kernel is as below −
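In standard notation, with $x_{ij}$ denoting the $j$-th component of the training observation $x_i$:

$$K(x, x_i) = x \cdot x_i = \sum_{j} x_j \, x_{ij}$$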
From the above formula, we can see that the product between two vectors, say $x$
and $x_i$, is the sum of the multiplication of each pair of input values.
Polynomial Kernel
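It is a more generalised form of the linear kernel. One commonly written form, with polynomial degree $d$ (standard notation assumed here), is:

$$K(x, x_i) = (1 + x \cdot x_i)^{d}$$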
Pros and Cons of SVM Classifiers
SVM classifiers offer great accuracy and work well in high-dimensional spaces.
SVM classifiers basically use a subset of the training points (the support vectors),
and hence use very little memory.
They have a high training time, and hence in practice are not suitable for large
datasets. Another disadvantage is that SVM classifiers do not work well with
overlapping classes.
Decision Tree
In general, decision tree analysis is a predictive modelling tool that can be applied
across many areas. Decision trees can be constructed by an algorithmic approach
that can split the dataset in different ways based on different conditions. Decision
trees are among the most powerful algorithms that fall under the category of
supervised algorithms.
They can be used for both classification and regression tasks. The two main
entities of a tree are decision nodes, where the data is split, and leaves, where we
get the outcome. An example of a binary tree for predicting whether a person is fit
or unfit, given various information like age, eating habits and exercise habits,
is given below −
In the above decision tree, the questions are decision nodes and the final outcomes
are leaves. We have the following two types of decision trees.
Classification decision trees − In this kind of decision tree, the decision
variable is categorical. The above decision tree is an example of a
classification decision tree.
Regression decision trees − In this kind of decision tree, the decision
variable is continuous.
Splitting Criterion
The splitting criterion tells us which branches to grow from node N with respect to
the outcomes of the chosen test. More specifically, the splitting criterion indicates
the splitting attribute and may also indicate either a split-point or a splitting subset.
In a decision tree, the major challenge is the identification of the attribute for the root
node at each level. This process is known as attribute selection. We have three
popular attribute selection measures:
1. Information Gain
2. Gini Index
3. Gain Ratio
Information Gain
Let node N represent or hold the tuples of partition D. The attribute with the highest
information gain is chosen as the splitting attribute for node N. This attribute
minimizes the information needed to classify the tuples in the resulting partitions
and reflects the least randomness or “impurity” in these partitions.
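In the usual notation (where $p_i$ is the probability that a tuple in $D$ belongs to class $C_i$, and attribute $A$ splits $D$ into partitions $D_1, \dots, D_v$), information gain is computed as:

$$\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2 p_i, \qquad \mathrm{Info}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\,\mathrm{Info}(D_j), \qquad \mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D)$$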
Random Forest
We can understand the working of the Random Forest algorithm with the help of the
following steps −
Step 1 − First, start with the selection of random samples from a given
dataset.
Step 2 − Next, this algorithm will construct a decision tree for every sample.
Then it will get the prediction result from every decision tree.
Step 3 − In this step, voting will be performed for every predicted result.
Step 4 − At last, select the most voted prediction result as the final
prediction result.
The following diagram will illustrate its working −
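A minimal sketch of these steps using scikit-learn's RandomForestClassifier; the iris dataset and the parameter values here are assumptions made for illustration:

```python
# Random Forest sketch: many trees on random samples, majority vote at the end.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Steps 1-2: each of the n_estimators trees is built on a random (bootstrap) sample.
# Steps 3-4: the forest aggregates the trees' votes and returns the majority class.
forest = RandomForestClassifier(n_estimators=100, random_state=1)
forest.fit(X_train, y_train)
print("Test accuracy:", forest.score(X_test, y_test))
```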
Pros
CONFUSION MATRIX
It is the easiest way to measure the performance of a classification problem where
the output can be of two or more types of classes. A confusion matrix is nothing
but a table with two dimensions, viz. “Actual” and “Predicted”, and furthermore, both
dimensions have “True Positives (TP)”, “True Negatives (TN)”, “False Positives (FP)”
and “False Negatives (FN)”, as shown below −
The explanation of the terms associated with the confusion matrix is as follows −
True Positives (TP) − It is the case when both the actual class and the predicted
class of the data point are 1.
True Negatives (TN) − It is the case when both the actual class and the predicted
class of the data point are 0.
False Positives (FP) − It is the case when the actual class of the data point is 0 and
the predicted class is 1.
False Negatives (FN) − It is the case when the actual class of the data point is 1 and
the predicted class is 0.
EXAMPLE
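A small illustrative sketch, assuming two made-up binary label vectors (these values are not from the notes):

```python
# Compute a 2x2 confusion matrix and read off TP, TN, FP, FN.
from sklearn.metrics import confusion_matrix

y_actual    = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]   # actual classes
y_predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]   # predicted classes

# Rows = actual, columns = predicted; labels=[0, 1] puts the negative class first,
# so ravel() returns the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_actual, y_predicted, labels=[0, 1]).ravel()
print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)
```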
Metrics for Evaluating Classifier Performance
The accuracy of a classifier on a given test set is the percentage of test set tuples
that are correctly classified by the classifier. That is,
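$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$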
The precision and recall measures are also widely used in classification. Precision
can be thought of as a measure of exactness (i.e., what percentage of tuples
labeled as positive are actually such), whereas recall is a measure of
completeness (what percentage of positive tuples are labeled as such). If recall
seems familiar, that’s because it is the same as sensitivity (or the true positive
rate). These measures can be computed as
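$$\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}$$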
An alternative way to use precision and recall is to combine them into a single
measure. This is the approach of the F measure (also known as the F1 score or
F-score).
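It is the harmonic mean of precision and recall:

$$F_1 = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$$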
Naïve Bayes Classifier Algorithm
o Naïve Bayes algorithm is a supervised learning algorithm, which is
based on Bayes theorem and used for solving classification
problems.
o It is mainly used in text classification that includes a high-
dimensional training dataset.
o Naïve Bayes Classifier is one of the simplest and most effective
classification algorithms, and it helps in building fast machine
learning models that can make quick predictions.
o It is a probabilistic classifier, which means it predicts on the
basis of the probability of an object.
o Some popular examples of the Naïve Bayes algorithm are spam
filtration, sentiment analysis, and classifying articles.
The Naïve Bayes algorithm is comprised of the two words Naïve and Bayes,
which can be described as:
o Naïve: It is called Naïve because it assumes that the occurrence of a certain
feature is independent of the occurrence of other features.
o Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.
Bayes' Theorem:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Where,
o P(A|B) is the posterior probability: the probability of hypothesis A given the observed evidence B.
o P(B|A) is the likelihood probability: the probability of the evidence given that hypothesis A is true.
o P(A) is the prior probability: the probability of the hypothesis before observing the evidence.
o P(B) is the marginal probability: the probability of the evidence.
Example
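As a small hedged sketch of the spam-filtration use case mentioned above, assuming a tiny invented message corpus and scikit-learn's MultinomialNB:

```python
# Naive Bayes text-classification sketch (the corpus and labels are made up).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = ["win a free prize now", "meeting at noon tomorrow",
            "free offer click now", "lunch with the team"]
labels = [1, 0, 1, 0]   # 1 = spam, 0 = not spam

# Represent each message as word-count features, then fit the probabilistic model.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)
model = MultinomialNB()
model.fit(X, labels)

# Predict the class of a new message from the learned probabilities.
print(model.predict(vectorizer.transform(["free prize offer"])))
```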
Patterns are everywhere in this digital world. A pattern can either be seen physically or it
can be observed mathematically by applying algorithms.
Example: the colours on clothes, speech patterns, etc. In computer science, a pattern is
represented using vector feature values.
CURSE OF DIMENSIONALITY
Handling high-dimensional data is very difficult in practice; this is commonly known as the
curse of dimensionality. If the dimensionality of the input dataset increases, any machine
learning algorithm and model becomes more complex. As the number of features increases,
the number of samples needed also increases proportionally, and the chance of overfitting
also increases. If a machine learning model is trained on such high-dimensional data, it
becomes overfitted and results in poor performance.
Hence, it is often required to reduce the number of features, which can be done with
dimensionality reduction.
DIMENSIONALITY REDUCTION
In machine learning classification problems, there are often too many factors on the basis of
which the final classification is done. These factors are basically variables called features.
The higher the number of features, the harder it gets to visualize the training set and then
work on it. Sometimes, most of these features are correlated, and hence redundant. This is
where dimensionality reduction algorithms come into play. Dimensionality reduction is the
process of reducing the number of random variables under consideration, by obtaining a set
of principal variables. It can be divided into feature selection and feature extraction.
PRINCIPAL COMPONENT ANALYSIS (PCA)
This method was introduced by Karl Pearson. It works on the condition that while the data in a
higher-dimensional space is mapped to data in a lower-dimensional space, the variance of the
data in the lower-dimensional space should be maximum.
It involves the following steps:
Construct the covariance matrix of the data.
Compute the eigenvectors of this matrix.
Eigenvectors corresponding to the largest eigenvalues are used to reconstruct a large fraction of
variance of the original data.
Hence, we are left with a lesser number of eigenvectors, and there might have been some data
loss in the process. But, the most important variances should be retained by the remaining
eigenvectors.
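A minimal NumPy sketch of the steps listed above, assuming a random 100 × 5 data matrix and keeping the top 2 components (both choices are illustrative):

```python
# PCA from the covariance matrix: eigen-decomposition, keep largest eigenvalues.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 samples, 5 features (made-up data)
X_centered = X - X.mean(axis=0)        # centre the data first

# Construct the covariance matrix of the data.
cov = np.cov(X_centered, rowvar=False)

# Compute eigenvalues/eigenvectors (eigh: the covariance matrix is symmetric).
eigvals, eigvecs = np.linalg.eigh(cov)

# Keep the eigenvectors with the largest eigenvalues (here: the top 2).
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order[:2]]

# Project the data onto the lower-dimensional space.
X_reduced = X_centered @ components
print(X_reduced.shape)   # (100, 2)
```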
Supervised Learning vs. Unsupervised Learning
− Supervised learning algorithms are trained using labeled data; unsupervised learning
algorithms are trained using unlabeled data.
− A supervised learning model takes direct feedback to check whether it is predicting the
correct output or not; an unsupervised learning model does not take any feedback.
− A supervised learning model predicts the output; an unsupervised learning model finds
the hidden patterns in data.
− In supervised learning, input data is provided to the model along with the output; in
unsupervised learning, only input data is provided to the model.
− The goal of supervised learning is to train the model so that it can predict the output
when given new data; the goal of unsupervised learning is to find the hidden patterns
and useful insights in the unknown dataset.
− Supervised learning needs supervision to train the model; unsupervised learning does
not need any supervision.
− Supervised learning can be categorized into Classification and Regression problems;
unsupervised learning can be classified into Clustering and Association problems.
− A supervised learning model produces an accurate result; an unsupervised learning
model may give a less accurate result as compared to supervised learning.
Types of ML Classification Algorithms:
o Linear Models
    o Logistic Regression
    o Support Vector Machines
o Non-linear Models
    o K-Nearest Neighbours
    o Kernel SVM
    o Naïve Bayes
    o Decision Tree Classification
    o Random Forest Classification
PERCEPTRON
A perceptron is an algorithm used for supervised learning of binary classifiers. Binary classifiers
decide whether an input, usually represented by a series of vectors, belongs to a specific class.
A perceptron is a single-layer neural network consisting of four main parts: input values,
weights and bias, net sum, and an activation function.
The process begins by taking all the input values and multiplying them by their weights. Then,
all of these multiplied values are added together to create the weighted sum. The weighted sum is
then applied to the activation function, producing the perceptron's output. The activation function
plays the integral role of ensuring the output is mapped between required values such as (0,1) or
(-1,1). It is important to note that the weight of an input is indicative of the strength of a node.
Similarly, an input's bias value gives the ability to shift the activation function curve up or down.
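A minimal sketch of a single perceptron's forward pass as described above, assuming made-up input values, weights and bias, and a step activation that maps the net sum to {0, 1}:

```python
# Perceptron forward pass: weighted sum + bias, then an activation function.
import numpy as np

def step_activation(net_sum: float) -> int:
    """Map the weighted sum to the required output range, here {0, 1}."""
    return 1 if net_sum >= 0 else 0

def perceptron(inputs: np.ndarray, weights: np.ndarray, bias: float) -> int:
    # Multiply each input by its weight, add them up (the net sum),
    # shift by the bias, and apply the activation function.
    net_sum = np.dot(inputs, weights) + bias
    return step_activation(net_sum)

x = np.array([1.0, 0.5, -0.2])   # illustrative inputs
w = np.array([0.4, -0.6, 0.9])   # illustrative weights
print(perceptron(x, w, bias=0.1))
```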
LOGISTIC REGRESSION
Logistic regression is one of the most popular Machine Learning algorithms, which
comes under the Supervised Learning technique. It is used for predicting the categorical
dependent variable using a given set of independent variables.
Logistic regression predicts the output of a categorical dependent variable. Therefore the
outcome must be a categorical or discrete value. It can be either Yes or No, 0 or 1, True or
False, etc.; but instead of giving exact values of 0 and 1, it gives probabilistic values
which lie between 0 and 1.
Logistic regression is quite similar to linear regression except in how it is used.
Linear regression is used for solving regression problems, whereas logistic
regression is used for solving classification problems.
In logistic regression, instead of fitting a regression line, we fit an "S"-shaped logistic
function, which predicts two maximum values (0 or 1).
The curve from the logistic function indicates the likelihood of something such as
whether the cells are cancerous or not, a mouse is obese or not based on its weight, etc.
Logistic Regression is a significant machine learning algorithm because it has the ability
to provide probabilities and classify new data using continuous and discrete datasets.
o In logistic regression y can be between 0 and 1 only. Starting from the linear equation
y = b0 + b1x1 + b2x2 + … + bnxn, we therefore divide the equation by (1 − y) and take the
logarithm, which gives the log-odds (logit) form shown below.
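In the usual notation this gives:

$$\log\!\left(\frac{y}{1-y}\right) = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n, \qquad \text{equivalently} \qquad y = \frac{1}{1 + e^{-(b_0 + b_1 x_1 + \dots + b_n x_n)}}$$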
Type of Logistic Regression:
On the basis of the categories, Logistic Regression can be classified into three types:
o Binomial: In binomial Logistic regression, there can be only two possible types of the
dependent variable, such as 0 or 1, Pass or Fail, etc.
o Multinomial: In multinomial Logistic regression, there can be 3 or more possible
unordered types of the dependent variable, such as "cat", "dog", or "sheep".
o Ordinal: In ordinal Logistic regression, there can be 3 or more possible ordered types of
the dependent variable, such as "low", "Medium", or "High".
BOOSTING
Boosting is a sequential process, where each subsequent model attempts to correct the
errors of the previous model. The succeeding models are dependent on the previous
model.
In this technique, learners are trained sequentially, with early learners fitting simple
models to the data and the data then being analysed for errors. In other words, we fit
consecutive trees (on random samples), and at every step the goal is to reduce the net
error from the prior tree.
When an input is misclassified by a hypothesis, its weight is increased so that the next
hypothesis is more likely to classify it correctly. Combining the whole set at the end
converts the weak learners into a better performing model.
Let’s understand the way boosting works in the below steps.
1. A subset is created from the original dataset.
2. Initially, all data points are given equal weights.
3. A base model is created on this subset.
4. This model is used to make predictions on the whole dataset.
5. Errors are calculated using the actual values and predicted values.
6. The observations which are incorrectly predicted are given higher weights. (Here, the
three misclassified blue-plus points will be given higher weights.)
7. Another model is created and predictions are made on the dataset. (This model tries to
correct the errors from the previous model.)
8. Similarly, multiple models are created, each correcting the errors of the previous model.
9. The final model (strong learner) is the weighted mean of all the models (weak learners).
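A hedged sketch of this boosting procedure using scikit-learn's AdaBoostClassifier (which by default fits shallow decision-tree "stumps" as the weak learners); the synthetic dataset and parameter values are assumptions:

```python
# AdaBoost sketch: weak learners are added sequentially, with misclassified
# points receiving higher weights for the next learner.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data (for illustration only).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The final prediction is a weighted combination of all 50 weak learners.
booster = AdaBoostClassifier(n_estimators=50, random_state=0)
booster.fit(X_train, y_train)
print("Test accuracy:", booster.score(X_test, y_test))
```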
K-MEANS CLUSTERING
The K-means clustering algorithm computes the centroids and iterates until it finds the
optimal centroids. It assumes that the number of clusters is already known. It is also called a
flat clustering algorithm. The number of clusters identified from the data by the algorithm is
represented by ‘K’ in K-means.
In this algorithm, the data points are assigned to a cluster in such a manner that the sum of the
squared distances between the data points and the centroid is as small as possible. It is to be
understood that less variation within a cluster leads to more similar data points within the
same cluster.
Step 1 − First, we need to specify the number of clusters, K, that need to be generated by this
algorithm.
Step 2 − Next, randomly select K data points and assign each data point to a cluster. In simple
words, classify the data based on the number of data points.
Step 3 − Now it will compute the cluster centroids.
Step 4 − Next, keep iterating the following until we find the optimal centroids, i.e. until the
assignment of data points to clusters no longer changes −
4.1 − First, the sum of squared distances between the data points and the centroids is
computed.
4.2 − Assign each data point to the cluster whose centroid is closer than the other
clusters’ centroids.
4.3 − At last, compute the centroids for the clusters by taking the average of all the data
points of each cluster.
K-means follows Expectation-Maximization approach to solve the problem. The Expectation-
step is used for assigning the data points to the closest cluster and the Maximization-step is used
for computing the centroid of each cluster.
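A minimal sketch of the above procedure using scikit-learn's KMeans, assuming synthetic "blob" data and K = 3:

```python
# K-Means sketch: choose K, then alternate assignment and centroid updates.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Steps 1-2: choose K and start from (random) initial centroids/assignments.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)

# Steps 3-4: fit() alternates assigning points to the nearest centroid
# (Expectation step) and recomputing centroids as cluster means
# (Maximization step) until the assignments stop changing.
labels = kmeans.fit_predict(X)
print(kmeans.cluster_centers_)
print(labels[:10])
```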
Applications of K-Means Clustering Algorithm
Market segmentation
Document Clustering
Image segmentation
Image compression
Customer segmentation
Analyzing the trend on dynamic data
EVALUATION METRICS
Root mean square error or root mean square deviation is one of the most commonly used
measures for evaluating the quality of predictions. It shows how far predictions fall from
measured true values using Euclidean distance.
To compute RMSE, calculate the residual (difference between prediction and truth) for each data
point, square each residual, compute the mean of the squared residuals, and take the square root
of that mean. RMSE is commonly used in supervised learning applications, as RMSE uses and
needs true measurements at each predicted data point.
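With $y_i$ the measured true value of data point $i$, $\hat{y}_i$ its predicted value, and $n$ the number of data points:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$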
Mean Absolute Error (also called L1 loss) is one of the simplest yet most robust loss functions
used for regression models.
MAE takes the average of the absolute differences between the actual and the predicted
values. For a data point xi and its predicted value yi, with n being the total number of data
points in the dataset, the mean absolute error is defined as:
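$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\,y_i - x_i\,\right|$$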
The coefficient of determination is the square of the correlation (r), and thus it ranges from 0
to 1.
With linear regression, the coefficient of determination is equal to the square of the
correlation between the x and y variables.
If R2 is equal to 0, then the dependent variable cannot be predicted from the independent
variable.
If R2 is equal to 1, then the dependent variable can be predicted from the independent
variable without any error.
If R2 is between 0 and 1, then it indicates the extent to which the dependent variable is
predictable. An R2 of 0.10 means that 10 percent of the variance in the y variable is
predicted from the x variable; an R2 of 0.20 means that 20 percent of the variance in y is
predicted from x, and so on.
The value of R2 shows whether the model would be a good fit for the given data set.
TRAINING AND TESTING A CLASSIFIER
Training and testing is a process through which a system gets trained and becomes
adaptable to give results in an accurate manner. Learning is the most important phase, as how
well the system performs on the data provided depends on which algorithms are used on the
data. The entire dataset is divided into two categories: one which is used in training the model,
i.e. the training set, and the other that is used in testing the model after training, i.e. the
testing set.
Training set:
The training set is used to build a model. It consists of the set of examples (e.g. images) that
are used to train the system. The training rules and algorithms used give relevant information
on how to associate input data with the output decision. The system is trained by applying
these algorithms on the dataset; all the relevant information is extracted from the data and
results are obtained. Generally, 80% of the data of the dataset is taken as training data.
Testing set:
Testing data is used to test the system. It is the set of data which is used to verify whether
the system is producing the correct output after being trained or not. Generally, 20% of
the data of the dataset is used for testing.
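A hedged sketch of this 80/20 split using scikit-learn; the iris dataset and the decision-tree model are assumptions made for illustration:

```python
# Split data into training and testing sets, train on one, evaluate on the other.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 80% of the data for training, 20% held back for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)                            # train on the training set
print("Test accuracy:", model.score(X_test, y_test))   # verify on the testing set
```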
CROSS-VALIDATION
Cross-validation is a technique in which we train our model using the subset of the data-set
and then evaluate using the complementary subset of the data-set.
The three steps involved in cross-validation are as follows :
1. Reserve some portion of sample data-set.
2. Using the rest data-set train the model.
3. Test the model using the reserve portion of the data-set.
Methods of Cross Validation
Validation
LOOCV (Leave One Out Cross Validation)
K-Fold Cross Validation
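A hedged sketch of K-Fold cross-validation (with K = 5) using scikit-learn; the dataset and the model are illustrative assumptions:

```python
# 5-fold cross-validation: each fold is held out once for evaluation
# while the model is trained on the remaining folds.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=5)
print(scores, scores.mean())
```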
ROC CURVE
An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a
classification model at all classification thresholds. This curve plots two parameters:
True Positive Rate (TPR) is a synonym for recall and is therefore defined as TPR = TP / (TP + FN);
the other parameter is the False Positive Rate (FPR), defined as FPR = FP / (FP + TN).
To compute the points in an ROC curve, we could evaluate a logistic regression model many
times with different classification thresholds, but this would be inefficient. Fortunately, there's
an efficient, sorting-based algorithm that can provide this information for us, called AUC.
AUC stands for "Area under the ROC Curve." That is, AUC measures the entire two-
dimensional area underneath the entire ROC curve (think integral calculus) from (0,0) to (1,1).
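A hedged sketch of computing the ROC curve and AUC with scikit-learn; the synthetic dataset and the logistic regression model are assumptions:

```python
# ROC curve and AUC for a binary classifier's predicted probabilities.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]     # predicted probability of class 1

# roc_curve sweeps the classification threshold and returns FPR/TPR pairs;
# roc_auc_score gives the area under that curve.
fpr, tpr, thresholds = roc_curve(y_test, scores)
print(len(thresholds), "threshold points on the curve")
print("AUC:", roc_auc_score(y_test, scores))
```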
(COST FUNCTIONS: same as the evaluation metrics above)