PP&DS 4
Pattern Recognition
Example of a pattern recognition application (biometric recognition): the input
pattern is a fingerprint, facial image, or hand-geometry image, and the desired
output is the identity of the user.
Features
A feature is a function of one or more measurements, computed so that it
quantifies some significant characteristics of the object.
Example:
➢ Consider the human face: the eyes, ears, nose, etc. are features of the face.
➢ A set of features taken together forms the feature vector.
Example:
➢ In the above example of the face, if all the features (eyes, ears, nose, etc.)
are taken together, the resulting sequence is the feature vector ([eyes, ears, nose]).
➢ A feature vector is the sequence of features represented as a d-
dimensional column vector.
Example
            f1  f2  f3  f4  f5  f6  Class label
Pattern 1:   1   4   3   6   4   7     1
Pattern 2:   4   7   5   7   4   2     2
Pattern 3:   6   9   7   5   3   1     3
Pattern 4:   7   4   6   2   8   6     1
Pattern 5:   4   7   5   8   2   6     2
Pattern 6:   5   3   7   9   5   3     3
Pattern 7:   8   1   9   4   2   8     3
• In this case, n = 7 and d = 6. As can be seen, each pattern has six
attributes (or features).
• Each attribute in this case is a number between 1 and 9.
• The last number in each line gives the class of the pattern.
• In this case, the class of the patterns is either 1, 2 or 3.
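To make the idea of feature vectors concrete, the sketch below (a minimal
illustration, assuming Python with NumPy) stores the seven patterns above as
6-dimensional feature vectors together with their class labels:

import numpy as np

# Each row is one pattern's feature vector (f1..f6); y holds the class labels
X = np.array([
    [1, 4, 3, 6, 4, 7],
    [4, 7, 5, 7, 4, 2],
    [6, 9, 7, 5, 3, 1],
    [7, 4, 6, 2, 8, 6],
    [4, 7, 5, 8, 2, 6],
    [5, 3, 7, 9, 5, 3],
    [8, 1, 9, 4, 2, 8],
])
y = np.array([1, 2, 3, 1, 2, 3, 3])

print(X.shape)  # (7, 6): n = 7 patterns, each a d = 6 dimensional feature vector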
• A pattern can also be a sequence: for example, a DNA fragment such as
GTGCATCTGACTCCT...
• The corresponding RNA is expressed as
GUGCAUCUGACUCCU....
• This can be translated into a protein, which would be of the form VHLTPEEK....
➢ Some of the difficulties that come with high-dimensional data manifest while
analyzing or visualizing the data to identify patterns, and some
manifest while training machine learning models.
➢ "As the number of features or dimensions grows, the amount of data we need
to generalize accurately grows exponentially."
Example: Fig. 1(a) shows 10 data points in one dimension, i.e., there is
only one feature in the data set.
It can be easily represented on a line with only 10 values, x = 1, 2, 3, ..., 10.
But if we add one more feature, the same data will be represented in 2 dimensions
(Fig. 1(b)), causing the dimension space to increase to 10*10 = 100.
And again, if we add a 3rd feature, the dimension space will increase to
10*10*10 = 1000.
As the number of dimensions grows, the dimension space increases exponentially:
10^1 = 10
10^2 = 100
10^3 = 1000
This exponential growth in data causes high sparsity in the data set and
unnecessarily increases storage space and processing time for the particular
modelling algorithm.
If the target depends on a third attribute, let’s say body type, the number of
training samples required to cover all the combinations increases
phenomenally.
The combinations are shown in figure 2. For two variables, we needed eight
training samples. For three variables, we need 24 samples.
Figure 2. Combination of values of 3 attributes for generalizing a model
Dimensionality Reduction
What is Dimensionality Reduction?
The number of input features, variables, or columns present in a given
dataset is known as dimensionality, and the process to reduce these features is
called dimensionality reduction.
In many cases, a dataset contains a huge number of input features, which makes
the predictive modeling task more complicated.
Because it is very difficult to visualize or make predictions for a training dataset
with a high number of features, dimensionality reduction techniques are required
in such cases.
These techniques are widely used in machine learning for obtaining a better fit
predictive model while solving the classification and regression problems.
It is commonly used in the fields that deal with high-dimensional data, such
as speech recognition, signal processing, bioinformatics, etc. It can also be used
for data visualization, noise reduction, cluster analysis, etc.
There are two ways to apply the dimensionality reduction technique, feature
selection and feature extraction, which are described below:
Feature Selection
1. Filters Methods
In this method, the dataset is filtered, and a subset that contains only the relevant
features is taken. Some common techniques of the filter method are:
o Correlation
o Chi-Square Test
o ANOVA
o Information Gain, etc.
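As an illustration of a filter method, the following sketch uses scikit-learn's
SelectKBest with the chi-square test; the Iris dataset and the choice k = 2 are
only example assumptions:

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)

# Score every feature with the chi-square test and keep the 2 most relevant ones
selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)

print(X.shape, "->", X_new.shape)  # (150, 4) -> (150, 2)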
2. Wrappers Methods
• The wrapper method has the same goal as the filter method, but it takes a
machine learning model for its evaluation.
• In this method, some features are fed to the ML model, and the performance is
evaluated.
• The performance decides whether to add or remove those features to
increase the accuracy of the model.
• This method is more accurate than the filtering method but more complex to
work with. Some common techniques of wrapper methods are:
o Forward Selection
o Backward Selection
o Bi-directional Elimination
o LASSO
o Elastic Net
o Ridge Regression, etc.
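A minimal sketch of a wrapper method, using scikit-learn's
SequentialFeatureSelector with a logistic regression model as the evaluator; the
dataset and parameters below are illustrative assumptions:

from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Forward selection: start with no features and repeatedly add the one that
# improves the model's cross-validated score the most, until 2 are selected
model = LogisticRegression(max_iter=1000)
sfs = SequentialFeatureSelector(model, n_features_to_select=2, direction="forward")
sfs.fit(X, y)

print(sfs.get_support())  # boolean mask of the selected features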
Feature Extraction:
In feature extraction, the original features are transformed into a smaller set of
new features (for example, Principal Component Analysis). Some other common
dimensionality reduction techniques are described below.
Backward Feature Elimination:
o In this technique, firstly, all the n variables of the given dataset are taken
to train the model.
o The performance of the model is checked.
o Now we will remove one feature each time and train the model on n-1
features for n times, and will compute the performance of the model.
o We will check the variable that has made the smallest or no change in the
performance of the model, and then we will drop that variable or features;
after that, we will be left with n-1 features.
o Repeat the complete process until no feature can be dropped.
Forward Feature Selection:
o We start with a single feature only, and progressively we add one
feature at a time.
o Here we will train the model on each feature separately.
o The feature with the best performance is selected.
o The process will be repeated until we get a significant increase in the
performance of the model.
Missing Value Ratio:
• If a dataset has too many missing values, then we drop those variables as
they do not carry much useful information.
• To perform this, we can set a threshold level, and if a variable has missing
values more than that threshold, we will drop that variable.
• The higher the threshold value, the more efficient the reduction.
High Correlation Filter:
• High correlation refers to the case when two variables carry approximately
similar information.
• Due to this factor, the performance of the model can be degraded.
• The correlation between independent numerical variables gives the
calculated value of the correlation coefficient.
• If this value is higher than the threshold value, we can remove one of the
two variables from the dataset, keeping the variable or feature
that shows the higher correlation with the target variable.
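A small sketch of the high correlation filter using pandas; the toy DataFrame and
the 0.9 threshold are assumptions made for illustration:

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.normal(size=100)
df = pd.DataFrame({
    "f1": a,
    "f2": a + rng.normal(scale=0.01, size=100),  # nearly a duplicate of f1
    "f3": rng.normal(size=100),
})

corr = df.corr().abs()
# Keep only the upper triangle so each pair of variables is checked once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]

print(to_drop)              # ['f2']
df_reduced = df.drop(columns=to_drop)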
Random Forest
Factor Analysis
Auto-encoders
o Encoder: The function of the encoder is to compress the input to form the
latent-space representation.
o Decoder: The function of the decoder is to recreate the output from the
latent-space representation.
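A minimal autoencoder sketch in Keras, assuming TensorFlow is installed; the
layer sizes, toy data, and training settings are arbitrary choices for illustration:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

X = np.random.rand(200, 20).astype("float32")  # toy data: 200 samples, 20 features

inputs = keras.Input(shape=(20,))
encoded = layers.Dense(3, activation="relu")(inputs)        # encoder: compress to a 3-d latent space
decoded = layers.Dense(20, activation="sigmoid")(encoded)   # decoder: recreate the input

autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=10, batch_size=16, verbose=0)  # learn to reconstruct the input

encoder = keras.Model(inputs, encoded)   # the encoder alone gives the reduced representation
X_reduced = encoder.predict(X)
print(X_reduced.shape)                   # (200, 3)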
Machine Learning
In the real world, we are surrounded by humans who can learn everything from
their experiences with their learning capability, and we have computers or
machines which work on our instructions.
But can a machine also learn from experiences or past data like a human does?
So here comes the role of Machine Learning.
The below block diagram explains the working of a Machine Learning algorithm:
Types of Machine Learning:
1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning
1.Supervised Learning
o If the given shape has four sides, and all the sides are equal, then it will be
labelled as a Square.
o If the given shape has three sides, then it will be labelled as a triangle.
o If the given shape has six equal sides then it will be labelled as hexagon.
Now, after training, we test our model using the test set, and the task of the
model is to identify the shape.
The machine is already trained on all types of shapes, and when it finds a new
shape, it classifies the shape on the basis of the number of sides and predicts the
output.
1. Regression
Below are some popular Regression algorithms which come under supervised
learning:
o Linear Regression
o Regression Trees
o Non-Linear Regression
o Bayesian Linear Regression
o Polynomial Regression
2. Classification
Classification algorithms are used when the output variable is categorical, which
means there are two classes such as Yes-No, Male-Female, True-false, etc.
Example: Spam Filtering.
Below are some popular Classification algorithms which come under supervised
learning:
o Random Forest
o Decision Trees
o Logistic Regression
o Support vector Machines
Advantages of Supervised learning:
o With the help of supervised learning, the model can predict the output on
the basis of prior experiences.
o In supervised learning, we can have an exact idea about the classes of
objects.
o Supervised learning model helps us to solve various real-world problems
such as fraud detection, spam filtering, etc.
Disadvantages of Supervised learning:
o Supervised learning models are not suitable for handling complex
tasks.
o Supervised learning cannot predict the correct output if the test data is
different from the training dataset.
o In supervised learning, we need enough knowledge about the classes of
objects.
2) Unsupervised Learning
Here, we take unlabeled input data, which means it is not categorized
and the corresponding outputs are also not given.
Now, this unlabeled input data is fed to the machine learning model in order to
train it.
Firstly, it will interpret the raw data to find the hidden patterns in the data and
then will apply suitable algorithms such as k-means clustering, hierarchical
clustering, etc.
Once it applies the suitable algorithm, the algorithm divides the data objects into
groups according to the similarities.
o K-means clustering
o KNN (k-nearest neighbors)
o Hierarchical clustering
o Anomaly detection
o Neural Networks
o Principal Component Analysis
o Independent Component Analysis
o Apriori algorithm
o Singular value decomposition
Advantages of Unsupervised Learning
3) Reinforcement Learning
o The agent continues doing these three things (take action, change
state/remain in the same state, and get feedback), and by doing these
actions, he learns and explores the environment.
o The agent learns which actions lead to positive feedback or rewards
and which actions lead to negative feedback or penalties.
o As a positive reward, the agent gets a positive point, and as a penalty, it
gets a negative point.
Classification Algorithm
In the below diagram, there are two classes, class A and Class B. These classes
have features that are similar to each other and dissimilar to other classes.
o Multi-class Classifier:
1. Lazy Learners:
A lazy learner firstly stores the training dataset and waits until it receives the
test dataset. In the lazy learner case, classification is done on the basis of the most
related data stored in the training dataset. It takes less time in training but more
time for predictions.
Example: K-NN algorithm, Case-based reasoning
2. Eager Learners:
Classification Algorithms can be further divided into mainly two categories:
o Linear Models
o Logistic Regression
o Support Vector Machines
o Non-linear Models
o K-Nearest Neighbours
o Kernel SVM
o Naïve Bayes
o Decision Tree Classification
o Random Forest Classification
Logistic Regression
o Logistic regression is one of the most popular Machine Learning
algorithms, which comes under the Supervised Learning technique.
o It is used for predicting the categorical dependent variable using a given
set of independent variables.
o Logistic regression predicts the output of a categorical dependent variable.
Therefore the outcome must be a categorical or discrete value.
o It can be either Yes or No, 0 or 1, true or False, etc. but instead of giving
the exact value as 0 and 1, it gives the probabilistic values which lie
between 0 and 1.
o In Logistic regression, instead of fitting a regression line, we fit an "S"
shaped logistic function, which predicts two maximum values (0 or 1).
o Logistic Regression is a significant machine learning algorithm because it
has the ability to provide probabilities and classify new data using
continuous and discrete datasets.
o Logistic Regression can be used to classify the observations using different
types of data and can easily determine the most effective variables used
for the classification.
The Logistic Regression equation can be obtained from the Linear Regression
equation. The mathematical steps to get the Logistic Regression equation are
given below:
o The equation of the straight line can be written as:
  y = b0 + b1*x1 + b2*x2 + ... + bn*xn
o In Logistic Regression y can be between 0 and 1 only, so let's divide
the above equation by (1 - y):
  y / (1 - y);  this is 0 for y = 0 and infinity for y = 1
o But we need a range between -infinity and +infinity, so we take the logarithm,
which gives the Logistic Regression equation:
  log[ y / (1 - y) ] = b0 + b1*x1 + b2*x2 + ... + bn*xn
On the basis of the categories, Logistic Regression can be classified into three
types:
o Binomial: there can be only two possible types of the dependent variable,
such as 0 or 1, Pass or Fail, etc.
o Multinomial: there can be 3 or more possible unordered types of the
dependent variable, such as "cat", "dog", or "sheep".
o Ordinal: there can be 3 or more possible ordered types of the dependent
variable, such as "low", "medium", or "high".
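A short sketch of logistic regression with scikit-learn, showing that the model
outputs probabilities between 0 and 1; the breast-cancer dataset is only an
example choice:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000)
clf.fit(X_train, y_train)

print(clf.predict_proba(X_test[:3]))  # probabilities for each class, between 0 and 1
print(clf.predict(X_test[:3]))        # final 0/1 class labels
print(clf.score(X_test, y_test))      # accuracy on the test set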
SVM algorithm can be used for Face detection, image classification, text
categorization, etc.
Types of SVM
The dimensions of the hyperplane depend on the features present in the dataset,
which means if there are 2 features (as shown in image), then hyperplane will be
a straight line. And if there are 3 features, then hyperplane will be a 3-dimension
plane.
We always create a hyperplane that has a maximum margin, which means the
maximum distance between the data points.
Support Vectors:
The data points or vectors that are the closest to the hyperplane and which affect
the position of the hyperplane are termed as Support Vector. Since these vectors
support the hyperplane, hence called a Support vector.
Linear SVM:
Non-Linear SVM:
If data is linearly arranged, then we can separate it by using a straight line, but
for non-linear data, we cannot draw a single straight line. Consider the below
image:
So to separate these data points, we need to add one more dimension. For linear
data, we have used two dimensions x and y, so for non-linear data, we will add a
third dimension z. It can be calculated as:
z = x^2 + y^2
By adding the third dimension, the sample space will become as below image:
So now, SVM will divide the datasets into classes in the following way. Consider
the below image:
Since we are in 3-d Space, hence it is looking like a plane parallel to the x-axis. If
we convert it in 2d space with z=1, then it will become as:
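A brief non-linear SVM sketch with scikit-learn: instead of adding the
z = x^2 + y^2 dimension by hand, the RBF kernel performs an implicit mapping.
The make_circles data below imitates the inner/outer class layout described
above and is only an illustrative choice:

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two classes that cannot be separated by a straight line in 2-d
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

clf = SVC(kernel="rbf")       # kernel trick: implicit extra dimensions
clf.fit(X, y)

print(clf.score(X, y))                # training accuracy
print(clf.support_vectors_.shape)     # the points closest to the hyperplane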
K-Nearest Neighbor(KNN)
K-Nearest Neighbour is one of the simplest Machine Learning algorithms based
on Supervised Learning technique.
o The K-NN algorithm assumes the similarity between the new case/data and
the available cases and puts the new case into the category that is most similar
to the available categories.
o The K-NN algorithm stores all the available data and classifies a new data point
based on the similarity. This means that when new data appears, it can be
easily classified into a well-suited category by using the K-NN algorithm.
o K-NN algorithm can be used for Regression as well as for Classification but
mostly it is used for the Classification problems.
o K-NN is a non-parametric algorithm, which means it does not make any
assumption on underlying data.
o It is also called a lazy learner algorithm because it does not learn from the
training set immediately instead it stores the dataset and at the time of
classification, it performs an action on the dataset.
o The KNN algorithm at the training phase just stores the dataset, and when it
gets new data, it classifies that data into the category that is most
similar to the new data.
Suppose there are two categories, i.e., Category A and Category B, and we have
a new data point x1; in which of these categories will this data point lie?
To solve this type of problem, we need a K-NN algorithm. With the help of K-NN,
we can easily identify the category or class of a particular dataset. Consider the
below diagram:
The K-NN working can be explained on the basis of the below algorithm:
Suppose we have a new data point and we need to put it in the required category.
Consider the below image:
o Firstly, we will choose the number of neighbors, so we will choose the k=5.
o Next, we will calculate the Euclidean distance between the data points.
The Euclidean distance is the distance between two points, which we have
already studied in geometry. It can be calculated as:
o By calculating the Euclidean distance, we get the nearest neighbors: three
nearest neighbors in Category A and two nearest neighbors in Category B.
o As we can see, the 3 nearest neighbors are from Category A, hence this new
data point must belong to Category A.
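A small K-NN sketch with scikit-learn using k = 5 and the Euclidean distance, as
in the steps above; the Iris dataset is only an example:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)          # lazy learner: "training" only stores the data

# Each new point is assigned the majority class among its 5 nearest neighbours
print(knn.predict(X_test[:5]))
print(knn.score(X_test, y_test))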
Below are some points to remember while selecting the value of K in the K-NN
algorithm:
o There is no particular way to determine the best value for "K", so we need
to try some values to find the best out of them. The most preferred value
for K is 5.
o A very low value for K such as K=1 or K=2, can be noisy and lead to the
effects of outliers in the model.
o Large values for K are good, but they may lead to some difficulties.
Advantages of the KNN Algorithm:
o It is simple to implement.
o It is robust to the noisy training data
o It can be more effective if the training data is large.
The Naïve Bayes algorithm comprises two words, Naïve and Bayes, which
can be described as:
Bayes' Theorem:
o Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used
to determine the probability of a hypothesis with prior knowledge. It
depends on the conditional probability.
o The formula for Bayes' theorem is given as:
  P(A|B) = P(B|A) * P(A) / P(B)
Where,
P(A|B) is the posterior probability: the probability of hypothesis A given the
observed event B.
P(B|A) is the likelihood probability: the probability of the evidence given that
hypothesis A is true.
P(A) is the prior probability: the probability of the hypothesis before observing
the evidence.
P(B) is the marginal probability: the probability of the evidence.
Working of Naïve Bayes' Classifier can be understood with the help of the below
example:
So, using this dataset, we need to decide whether we should play or not on a
particular day according to the weather conditions. To solve this problem, we
need to follow the below steps:
Problem: If the weather is sunny, then the Player should play or not?
Outlook Play
0 Rainy Yes
1 Sunny Yes
2 Overcast Yes
3 Overcast Yes
4 Sunny No
5 Rainy Yes
6 Sunny Yes
7 Overcast Yes
8 Rainy No
9 Sunny No
10 Sunny Yes
11 Rainy No
12 Overcast Yes
13 Overcast Yes
Frequency table for the weather conditions:
Weather    Yes  No
Overcast    5    0
Rainy       2    2
Sunny       3    2
Total      10    4
Likelihood table of the weather conditions:
Weather    No            Yes
Overcast   0             5             5/14 = 0.35
Rainy      2             2             4/14 = 0.29
Sunny      2             3             5/14 = 0.35
All        4/14 = 0.29   10/14 = 0.71
Applying Bayes' theorem:
P(Yes|Sunny) = P(Sunny|Yes)*P(Yes)/P(Sunny)
P(Sunny|Yes) = 3/10 = 0.30
P(Sunny) = 0.35
P(Yes) = 0.71
So P(Yes|Sunny) = 0.30*0.71/0.35 = 0.60
P(No|Sunny) = P(Sunny|No)*P(No)/P(Sunny)
P(Sunny|No) = 2/4 = 0.50
P(No) = 0.29
P(Sunny) = 0.35
So P(No|Sunny) = 0.50*0.29/0.35 = 0.41
Since P(Yes|Sunny) > P(No|Sunny), on a Sunny day the Player can play the game.
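The hand calculation above can be reproduced with a few lines of Python, using
the rounded values from the likelihood table:

# Values taken from the frequency and likelihood tables above
p_sunny_given_yes, p_yes = 0.30, 0.71
p_sunny_given_no, p_no = 0.50, 0.29
p_sunny = 0.35

p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny   # about 0.60
p_no_given_sunny = p_sunny_given_no * p_no / p_sunny      # about 0.41

# P(Yes|Sunny) > P(No|Sunny), so on a Sunny day the player can play
print(round(p_yes_given_sunny, 2), round(p_no_given_sunny, 2))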
Advantages of the Naïve Bayes Classifier:
o Naïve Bayes is one of the fast and easy ML algorithms to predict a class of
datasets.
o It can be used for Binary as well as Multi-class Classifications.
o It performs well in Multi-class predictions as compared to the other
Algorithms.
o It is the most popular choice for text classification problems.
There are three types of Naive Bayes Model, which are given below:
o Gaussian
o Multinomial
o Bernoulli
Note: A decision tree can contain categorical data (YES/NO) as well as numeric
data.
There are various algorithms in Machine learning, so choosing the best algorithm
for the given dataset and problem is the main point to remember while creating
a machine learning model.
Below are the two reasons for using the Decision tree:
▪ Root Node: Root node is from where the decision tree starts. It
represents the entire dataset, which further gets divided into two or
more homogeneous sets.
▪ Leaf Node: Leaf nodes are the final output node, and the tree cannot be
segregated further after getting a leaf node.
▪ Splitting: Splitting is the process of dividing the decision node/root node
into sub-nodes according to the given conditions.
▪ Branch/Sub Tree: A tree formed by splitting the tree.
▪ Pruning: Pruning is the process of removing the unwanted branches from
the tree.
▪ Parent/Child node: The root node of the tree is called the parent node,
and other nodes are called the child nodes.
In a decision tree, for predicting the class of the given dataset, the algorithm
starts from the root node of the tree.
This algorithm compares the values of root attribute with the record (real
dataset) attribute and, based on the comparison, follows the branch and jumps
to the next node.
For the next node, the algorithm again compares the attribute value with the
other sub-nodes and moves further.
It continues the process until it reaches the leaf node of the tree.
The complete process can be better understood using the below algorithm:
o Step-1: Begin the tree with the root node, say S, which contains the complete
dataset.
o Step-2: Find the best attribute in the dataset using Attribute Selection
Measure (ASM).
o Step-3: Divide S into subsets that contain possible values for the best
attribute.
o Step-4: Generate the decision tree node, which contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the
dataset created in Step-3. Continue this process until a stage is reached
where you cannot further classify the nodes, and call the final node a
leaf node.
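A minimal decision tree sketch with scikit-learn; export_text prints the learned
root node, decision nodes, and leaf nodes. The Iris dataset and max_depth = 3
are illustrative choices:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(data.data, data.target)

# Text view of the tree: each split is a decision node, each terminal line a leaf node
print(export_text(tree, feature_names=data.feature_names))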
Example: Suppose there is a candidate who has a job offer and wants to decide
whether he should accept the offer or Not.
So, to solve this problem, the decision tree starts with the root node (Salary
attribute by ASM).
The root node splits further into the next decision node (distance from the office)
and one leaf node based on the corresponding labels.
The next decision node further gets split into one decision node (Cab facility) and
one leaf node.
Finally, the decision node splits into two leaf nodes (Accepted offers and Declined
offer). Consider the below diagram:
Pruning is a process of deleting the unnecessary nodes from a tree in order to get
the optimal decision tree.
A too-large tree increases the risk of overfitting, and a small tree may not capture
all the important features of the dataset. Therefore, a technique that decreases
the size of the learning tree without reducing accuracy is known as Pruning. There
are mainly two types of tree pruning technology used:
Note: To better understand the Random Forest Algorithm, you should have
knowledge of the Decision Tree Algorithm.
Since the random forest combines multiple trees to predict the class of the
dataset, it is possible that some decision trees may predict the correct output,
while others may not.
Therefore, below are two assumptions for a better Random forest classifier:
o There should be some actual values in the feature variable of the dataset
so that the classifier can predict accurate results rather than a guessed
result.
o The predictions from each tree must have very low correlations.
The working process can be explained in the below steps and diagram:
Step-1: Select random K data points from the training set.
Step-2: Build the decision trees associated with the selected data points
(subsets).
Step-3: Choose the number N of decision trees that you want to build.
Step-4: Repeat Step-1 and Step-2.
Step-5: For new data points, find the predictions of each decision tree, and assign
the new data points to the category that wins the majority of votes.
The working of the algorithm can be better understood by the below example:
Example: Suppose there is a dataset that contains multiple fruit images. So, this
dataset is given to the Random forest classifier. The dataset is divided into
subsets and given to each decision tree. During the training phase, each decision
tree produces a prediction result, and when a new data point occurs, then based
on the majority of results, the Random Forest classifier predicts the final decision.
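A short random forest sketch with scikit-learn: each of the N trees is trained on
a bootstrap subset and the final class is decided by majority voting. The dataset
and n_estimators value are example choices:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)  # N = 100 trees
forest.fit(X_train, y_train)        # each tree sees a random bootstrap subset

print(forest.predict(X_test[:5]))   # majority vote of the individual trees
print(forest.score(X_test, y_test))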
There are mainly four sectors where the Random forest is mostly used:
1. Banking: Banking sector mostly uses this algorithm for the identification of
loan risk.
2. Medicine: With the help of this algorithm, disease trends and risks of the
disease can be identified.
3. Land Use: We can identify the areas of similar land use by this algorithm.
4. Marketing: Marketing trends can be identified using this algorithm.
o Although random forest can be used for both classification and regression
tasks, it is not as well suited to regression tasks.
Bagging Vs Boosting
• We all use the decision tree technique in day-to-day life to make
decisions.
• Organizations use these supervised machine learning techniques like
Decision trees to make a better decision and to generate more surplus and
profit.
• Ensemble methods combine different decision trees to deliver better
predictive results instead of utilizing a single decision tree.
• The primary principle behind the ensemble model is that a group of weak
learners comes together to form a strong learner.
• There are two techniques given below that are used to perform ensemble
decision tree.
Bagging
Boosting:
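To make the contrast between the two techniques concrete, here is a small
sketch (assuming scikit-learn) that trains a bagging ensemble and a boosting
ensemble of decision trees on the same data:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Bagging: many trees trained independently on bootstrap samples,
# predictions combined by voting
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)

# Boosting: trees trained sequentially, each one focusing on the previous trees' mistakes
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

print(cross_val_score(bagging, X, y, cv=5).mean())
print(cross_val_score(boosting, X, y, cv=5).mean())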
Example: Let's understand the clustering technique with the real-world example
of a mall: When we visit any shopping mall, we can observe that things with
similar usage are grouped together, such as t-shirts in one section and
trousers in another; similarly, in the vegetable section, apples, bananas,
mangoes, etc. are grouped in separate sections so that we can
easily find things. The clustering technique works in the same way.
The clustering technique can be widely used in various tasks. Some most common
uses of this technique are:
o Market Segmentation
o Statistical data analysis
o Social network analysis
o Image segmentation
o Anomaly detection, etc.
Apart from these general usages, it is used by Amazon in its recommendation
system to provide recommendations based on the user's past product searches.
Netflix also uses this technique to recommend movies and web series to its
users based on their watch history.
The below diagram explains the working of the clustering algorithm. We can see
the different fruits are divided into several groups with similar properties.
The clustering methods are broadly divided into Hard clustering (each data point
belongs to only one group) and Soft clustering (a data point can belong to more
than one group). Various other clustering approaches also exist. Below
are the main clustering methods used in Machine Learning:
1. Partitioning Clustering
2. Density-Based Clustering
3. Distribution Model-Based Clustering
4. Hierarchical Clustering
5. Fuzzy Clustering
Partitioning Clustering
Hierarchical Clustering
▪ Fuzzy clustering is a type of soft method in which a data object may belong
to more than one group or cluster.
▪ Each data object has a set of membership coefficients, which depend on its
degree of membership in each cluster.
▪ Fuzzy C-means algorithm is the example of this type of clustering; it is
sometimes also known as the Fuzzy k-means algorithm.
Clustering Algorithms
▪ The Clustering algorithms can be divided based on their models that are
explained above.
▪ There are different types of clustering algorithms published, but only a few
are commonly used.
▪ The clustering algorithm is based on the kind of data that we are using.
▪ For example, some algorithms need to guess the number of clusters in the given
dataset, whereas others are required to find the minimum distance
between the observations of the dataset.
1. K-Means algorithm
2. Mean-shift algorithm
3. DBSCAN Algorithm
4. Expectation-Maximization Clustering using GMM
5. Agglomerative Hierarchical algorithm
6. Affinity Propagation
Hence each cluster has data points with some commonalities and is away from
the other clusters.
The below diagram explains the working of the K-means Clustering Algorithm:
How does the K-Means Algorithm Work?
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points or centroids (they can be points other than those
in the input dataset).
Step-3: Assign each data point to its closest centroid, which will form the
predefined K clusters.
Step-4: Calculate the variance and place a new centroid for each cluster.
Step-5: Repeat the third step, i.e., reassign each data point to the new
closest centroid of each cluster.
Step-6: If any reassignment occurs, go back to Step-4; otherwise the model is
ready.
Suppose we have two variables M1 and M2. The x-y axis scatter plot of these two
variables is given below:
o Let's take number k of clusters, i.e., K=2, to identify the dataset and to put
them into different clusters. It means here we will try to group these
datasets into two different clusters.
o We need to choose some random k points or centroid to form the cluster.
o These points can be either the points from the dataset or any other point.
o So, here we are selecting the below two points as k points, which are not
the part of our dataset.
o Now we will assign each data point of the scatter plot to its closest K-point
or centroid.
o We will compute it by applying some mathematics that we have studied to
calculate the distance between two points.
o So, we will draw a median between both the centroids. Consider the below
image:
From the above image, it is clear that the points on the left side of the line are
near the K1 or blue centroid, and the points to the right of the line are close to
the yellow centroid.
From the above image, we can see that one yellow point is on the left side of the
line, and two blue points are to the right of the line. So, these three points will be
assigned to the new centroids.
As reassignment has taken place, so we will again go to the step-4, which is
finding new centroids or K-points.
o As we got the new centroids so again will draw the median line and
reassign the data points. So, the image will be:
o We can see in the above image; there are no dissimilar data points on
either side of the line, which means our model is formed. Consider the
below image:
As our model is ready, so we can now remove the assumed centroids, and the
two final clusters will be as shown in the below image:
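The assign/recompute loop described above is what scikit-learn's KMeans runs
internally; below is a small sketch with two artificial blobs standing in for the
M1/M2 scatter plot (the toy data is an assumption for illustration):

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two groups of points standing in for the M1/M2 scatter plot
X = np.vstack([
    rng.normal([2, 2], 0.5, size=(50, 2)),
    rng.normal([7, 7], 0.5, size=(50, 2)),
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)   # K = 2
labels = kmeans.fit_predict(X)   # repeats assign-to-centroid / move-centroid until convergence

print(kmeans.cluster_centers_)   # the two final centroids
print(labels[:10])               # cluster label (0 or 1) for the first points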
Hierarchical Clustering in Machine Learning
▪ Hierarchical clustering is another unsupervised machine learning
algorithm, which is used to group the unlabeled datasets into clusters; it is
also known as hierarchical cluster analysis or HCA.
▪ In this algorithm, we develop the hierarchy of clusters in the form of a tree,
and this tree-shaped structure is known as the dendrogram.
▪ Sometimes the results of K-means clustering and hierarchical clustering
may look similar, but they both differ depending on how they work.
▪ There is no requirement to predetermine the number of clusters as we
did in the K-Means algorithm.
The working of the AHC algorithm can be explained using the below steps:
o Step-1: Create each data point as a single cluster. Let's say there are N data
points, so the number of clusters will also be N.
o Step-2: Take two closest data points or clusters and merge them to form
one cluster. So, there will now be N-1 clusters.
o Step-3: Again, take the two closest clusters and merge them together to
form one cluster. There will be N-2 clusters.
o Step-4: Repeat Step 3 until only one cluster is left. So, we will get the
following clusters. Consider the below images:
o Step-5: Once all the clusters are combined into one big cluster, develop the
dendrogram to divide the clusters as per the problem.
As we have seen, the closest distance between the two clusters is crucial for the
hierarchical clustering. There are various ways to calculate the distance between
two clusters, and these ways decide the rule for clustering. These measures are
called Linkage methods. Some of the popular linkage methods are given below:
1. Single Linkage: It is the shortest distance between the closest points of the
clusters. Consider the below image:
2. Complete Linkage: It is the farthest distance between two points of two
different clusters; it tends to form tighter clusters than single linkage.
3. Average Linkage: The distances between all pairs of points (one from each
cluster) are added up and divided by the number of pairs to give the average
distance between the two clusters.
4. Centroid Linkage: The distance between the centroids of the two clusters is
calculated.
From the above-given approaches, we can apply any of them according to the
type of problem or business requirement.
The working of the dendrogram can be explained using the below diagram:
In the above diagram, the left part is showing how clusters are created in
agglomerative clustering, and the right part is showing the corresponding
dendrogram.
We can cut the dendrogram tree structure at any level as per our requirement.
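A compact sketch of agglomerative hierarchical clustering with SciPy and
scikit-learn: linkage builds the merge tree that a dendrogram would display, and
cutting it (here into 2 clusters) gives the final grouping. The random points are
illustrative only:

import numpy as np
from scipy.cluster.hierarchy import linkage
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))   # N = 10 points, so we start from 10 single-point clusters

Z = linkage(X, method="single")   # single linkage: shortest distance between closest points
print(Z.shape)                    # (9, 4): the N-1 merge steps a dendrogram would draw

agg = AgglomerativeClustering(n_clusters=2, linkage="single")
print(agg.fit_predict(X))         # cluster label for each of the 10 points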
We can understand the concept of regression analysis using the below example:
Types of Regression
▪ There are various types of regressions which are used in data science and
machine learning.
▪ Each type has its own importance in different scenarios, but at the core,
all the regression methods analyze the effect of the independent variables
on the dependent variable.
Here we are discussing some important types of regression which are given
below:
o Linear Regression
o Logistic Regression
o Polynomial Regression
o Support Vector Regression
o Decision Tree Regression
o Random Forest Regression
o Ridge Regression
o Lasso Regression
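As a baseline example of regression, the sketch below fits scikit-learn's
LinearRegression to toy data generated with a known slope and intercept; all
numbers are made up for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))                  # one independent variable
y = 3 * X[:, 0] + 5 + rng.normal(scale=1.0, size=50)  # dependent variable with noise

reg = LinearRegression().fit(X, y)

print(reg.coef_, reg.intercept_)   # recovered slope close to 3 and intercept close to 5
print(reg.predict([[4.0]]))        # prediction for a new input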
Cost Functions
Confusion Matrix
o True Negative: Model has given prediction No, and the real or actual
value was also No.
o True Positive: The model has predicted Yes, and the actual value was also
Yes.
o False Negative: The model has predicted No, but the actual value was Yes.
It is also called a Type-II error.
o False Positive: The model has predicted Yes, but the actual value was No.
It is also called a Type-I error.
o The table is given for a two-class classifier, which has two predictions,
"Yes" and "No." Here, Yes means that the patient has the disease, and No
means that the patient does not have the disease.
o The classifier has made a total of 100 predictions. Out of 100
predictions, 89 are true predictions, and 11 are incorrect predictions.
o The model has given prediction "yes" for 32 times, and "No" for 68 times.
Whereas the actual "Yes" was 27, and actual "No" was 73 times.
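The four outcomes above can be computed directly with scikit-learn's
confusion_matrix; the ten hypothetical patient labels below are made up for
illustration:

from sklearn.metrics import confusion_matrix

# 1 = patient has the disease, 0 = patient does not
y_actual    = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# Rows are the actual classes, columns the predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_actual, y_predicted))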
Cross Validation in Machine Learning
• In machine learning, we cannot simply fit the model on the training data
and claim that the model will work accurately on real data.
• For this, we must ensure that our model has captured the correct patterns
from the data and is not picking up too much noise.
• For this purpose, we use the cross-validation technique.
Cross-Validation
Example
• In the first iteration, we use the first 20 percent of the data for evaluation and
the remaining 80 percent for training,
• while in the second iteration we use the second subset of 20 percent for
evaluation and the remaining 80 percent of the data for training.
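A minimal cross-validation sketch with scikit-learn's cross_val_score; with
cv = 5, each iteration holds out a different 20 percent of the data for evaluation
(the dataset and model are example choices):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=5)  # 5 folds: train on 80%, evaluate on held-out 20%
print(scores)          # one accuracy score per fold
print(scores.mean())   # average performance estimate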
• The imbalance in the class distribution may vary, but a severe imbalance is more
challenging to model and may require specialized techniques.
Evaluation Metrics