Learning Types ML
K-Nearest Neighbour (K-NN) is one of the simplest Machine Learning algorithms, based on the Supervised Learning technique.
o The K-NN algorithm assesses the similarity between the new case/data and the available cases and puts the new case into the category that is most similar to the available categories.
o The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category using the K-NN algorithm.
o The K-NN algorithm can be used for Regression as well as Classification, but it is mostly used for Classification problems.
o K-NN is a non-parametric algorithm, which means it does not make any assumptions about the underlying data.
o It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead, it stores the dataset and, at the time of classification, performs an action on the dataset.
o At the training phase, the KNN algorithm just stores the dataset, and when it gets new data, it classifies that data into the category most similar to the new data.
o Example: Suppose we have an image of a creature that looks similar to both a cat and a dog, but we want to know whether it is a cat or a dog. For this identification, we can use the KNN algorithm, as it works on a similarity measure. Our KNN model will find the features of the new image that are most similar to the cat and dog images and, based on the most similar features, will put it in either the cat or the dog category.
Why do we need a K-NN Algorithm?
Suppose there are two categories, Category A and Category B, and we have a new data point x1; to which of these categories will this data point belong? To solve this type of problem, we need a K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular data point. Consider the below diagram:
The working of K-NN can be explained on the basis of the following steps. Suppose we have a new data point and we need to put it into the required category. Consider the below image:
o Firstly, we will choose the number of neighbours; here we will choose k = 5.
o Next, we will calculate the Euclidean distance between the data points. The Euclidean distance is the distance between two points, which we have already studied in geometry. For two points (x1, y1) and (x2, y2) it can be calculated as sqrt((x2 − x1)^2 + (y2 − y1)^2).
o By calculating the Euclidean distance we get the nearest neighbours: three nearest neighbours in Category A and two nearest neighbours in Category B. Consider the below image:
o As we can see, the majority of the nearest neighbours (3 of the 5) are from Category A; hence this new data point must belong to Category A. (A small code sketch of these steps is given below.)
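The following is a minimal sketch of these steps in Python, assuming NumPy is available; the toy points and the helper name knn_predict are illustrative only, not part of the original material.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    # Step 1: compute the Euclidean distance from x_new to every stored point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Step 2: take the k nearest neighbours
    nearest = np.argsort(distances)[:k]
    # Step 3: assign the category that occurs most often among those neighbours
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: points belonging to Category 'A' and Category 'B'
X_train = np.array([[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [6.0, 6.5], [6.5, 6.0]])
y_train = np.array(['A', 'A', 'A', 'B', 'B'])

print(knn_predict(X_train, y_train, np.array([2.0, 2.0]), k=3))  # expected 'A'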
Below are some points to remember while selecting the value of K in the K-NN algorithm:
o There is no particular way to determine the best value for "K", so we need to try several values to find the best among them. The most commonly preferred value for K is 5.
o A very low value of K, such as K=1 or K=2, can be noisy and makes the model sensitive to the effects of outliers.
o Large values of K smooth out the effect of noise, but a K that is too large may include points from the other categories and blur the class boundaries.
Introduction to Decision Tree
In general, decision tree analysis is a predictive modelling tool that can be applied across many areas. Decision trees can be constructed by an algorithmic approach that splits the dataset in different ways based on different conditions. Decision trees are among the most powerful algorithms that fall under the category of supervised algorithms.
They can be used for both classification and regression tasks. The two main entities of a tree are decision nodes, where the data is split, and leaves, where we get the outcome. An example of a binary tree for predicting whether a person is fit or unfit, given information such as age, eating habits and exercise habits, is given below −
In the above decision tree, the questions are decision nodes and the final outcomes are leaves. We have the following two types of decision trees.
• Classification decision trees − In this kind of decision tree, the decision variable is categorical. The above decision tree is an example of a classification decision tree.
• Regression decision trees − In this kind of decision tree, the decision variable is continuous.
Gini Index
It is the name of the cost function that is used to evaluate binary splits in the dataset and works with a categorical target variable such as "Success" or "Failure".
The higher the value of the Gini index, the lower the homogeneity (i.e. the more mixed the node). A perfect Gini index value is 0 and the worst is 0.5 (for a 2-class problem). The Gini index for a split can be calculated with the help of the following steps −
• First, calculate the Gini index for each sub-node using the formula 1 − (p^2 + q^2), where p^2 + q^2 is the sum of the squares of the probabilities of success and failure.
• Next, calculate Gini index for split using weighted Gini score of each node of that
split.
The Classification and Regression Tree (CART) algorithm uses the Gini method to generate binary splits.
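As a minimal sketch in Python (the function name gini_index and the toy groups are illustrative assumptions, not part of the original text), the weighted Gini score of a binary split can be computed as follows:

def gini_index(groups, classes):
    # groups  -- the two lists of rows produced by the split (each row ends with its class label)
    # classes -- the possible class labels, e.g. ['Success', 'Failure']
    n_instances = sum(len(group) for group in groups)
    weighted_gini = 0.0
    for group in groups:
        size = len(group)
        if size == 0:          # avoid division by zero for an empty group
            continue
        score = 0.0
        for class_val in classes:
            p = [row[-1] for row in group].count(class_val) / size
            score += p * p     # p^2 + q^2 for the two classes
        weighted_gini += (1.0 - score) * (size / n_instances)
    return weighted_gini

# Example: a pure split (Gini = 0) versus a perfectly mixed one (Gini = 0.5)
pure  = [[[1, 'Success'], [2, 'Success']], [[3, 'Failure'], [4, 'Failure']]]
mixed = [[[1, 'Success'], [2, 'Failure']], [[3, 'Success'], [4, 'Failure']]]
print(gini_index(pure,  ['Success', 'Failure']))   # 0.0
print(gini_index(mixed, ['Success', 'Failure']))   # 0.5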
Split Creation
A split basically consists of an attribute from the dataset and a value for that attribute. We can create a split in the dataset with the help of the following three parts (a brief sketch follows this list) −
• Part 1: Calculating Gini Score − We have just discussed this part in the previous section.
• Part 2: Splitting a dataset − This may be defined as separating the dataset into two lists of rows, given the index of an attribute and a split value for that attribute. After getting the two groups − right and left − from the dataset, we can calculate the value of the split using the Gini score calculated in the first part. The split value decides in which group each row will reside.
• Part 3: Evaluating all splits − The next part, after finding the Gini score and splitting the dataset, is the evaluation of all splits. For this purpose, we must first check every value associated with each attribute as a candidate split. Then we need to find the best possible split by evaluating the cost of each split. The best split will be used as a node in the decision tree.
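A minimal sketch of these three parts in Python, building on the illustrative gini_index function above; the helper names test_split and get_best_split and the row layout (class label in the last column) are assumptions for illustration:

def test_split(index, value, dataset):
    # Part 2: split the rows into left/right groups on one attribute and value
    left  = [row for row in dataset if row[index] <  value]
    right = [row for row in dataset if row[index] >= value]
    return left, right

def get_best_split(dataset):
    # Part 3: try every value of every attribute as a candidate split
    class_values = list(set(row[-1] for row in dataset))
    best = {'gini': float('inf')}
    for index in range(len(dataset[0]) - 1):        # all attributes except the class column
        for row in dataset:
            groups = test_split(index, row[index], dataset)
            gini = gini_index(groups, class_values)  # Part 1: the cost of this split
            if gini < best['gini']:
                best = {'index': index, 'value': row[index], 'gini': gini, 'groups': groups}
    return best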
Building a Tree
As we know, a tree has a root node and terminal nodes. After creating the root node, we can build the tree in the following two parts −
While creating the terminal nodes of a decision tree, one important point is to decide when to stop growing the tree or creating further terminal nodes. This can be done by using two criteria, namely maximum tree depth and minimum node records, as follows −
• Maximum Tree Depth − As the name suggests, this is the maximum number of levels allowed in the tree below the root node. We must stop adding terminal nodes once the tree reaches its maximum depth.
• Minimum Node Records − This may be defined as the minimum number of training patterns that a given node is responsible for. We must stop adding terminal nodes once a node reaches this minimum number of records or falls below it.
Now that we understand when to create terminal nodes, we can start building our tree. Recursive splitting is a method to build the tree. In this method, once a node is created, we can create its child nodes (nodes added to an existing node) recursively on each group of data generated by splitting the dataset, by calling the same function again and again.
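A minimal sketch of recursive splitting, continuing the illustrative helpers above; to_terminal, max_depth, and min_size are assumed names for the stopping criteria just described:

def to_terminal(group):
    # A leaf predicts the most common class among its rows
    outcomes = [row[-1] for row in group]
    return max(set(outcomes), key=outcomes.count)

def split(node, max_depth, min_size, depth):
    left, right = node['groups']
    del node['groups']
    # Stop if one side of the split is empty
    if not left or not right:
        node['left'] = node['right'] = to_terminal(left + right)
        return
    # Stop at the maximum tree depth
    if depth >= max_depth:
        node['left'], node['right'] = to_terminal(left), to_terminal(right)
        return
    # Recurse on each child, respecting the minimum node records
    for side, group in (('left', left), ('right', right)):
        if len(group) <= min_size:
            node[side] = to_terminal(group)
        else:
            node[side] = get_best_split(group)
            split(node[side], max_depth, min_size, depth + 1)

def build_tree(train, max_depth=3, min_size=1):
    root = get_best_split(train)
    split(root, max_depth, min_size, 1)
    return root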
Prediction
After building a decision tree, we need to make predictions with it. Basically, prediction involves navigating the decision tree with a specific row of data.
We can make a prediction with the help of a recursive function, as was done above: the same prediction routine is called again with the left or the right child node.
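A minimal sketch of this recursive prediction routine, using the illustrative node layout from the tree-building sketch above:

def predict(node, row):
    # Navigate left or right depending on the node's attribute test
    if row[node['index']] < node['value']:
        branch = node['left']
    else:
        branch = node['right']
    # If the branch is another decision node, recurse; otherwise it is a leaf value
    if isinstance(branch, dict):
        return predict(branch, row)
    return branch

# Example usage (assuming a tree built with build_tree on rows like [x1, x2, label]):
# tree = build_tree(train_rows)
# print(predict(tree, [2.7, 1.8, None]))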
Assumptions
The following are some of the assumptions we make while creating decision tree −
Introduction to Random Forest
Random forest is a supervised learning algorithm which is used for both classification and regression. However, it is mainly used for classification problems. As we know, a forest is made up of trees, and more trees mean a more robust forest. Similarly, the random forest algorithm creates decision trees on data samples, gets the prediction from each of them, and finally selects the best solution by means of voting. It is an ensemble method which is better than a single decision tree because it reduces over-fitting by averaging the results.
We can understand the working of the Random Forest algorithm with the help of the following steps (a brief sketch follows the list) −
• Step 1 − First, start with the selection of random samples from a given dataset.
• Step 2 − Next, this algorithm will construct a decision tree for every sample. Then it
will get the prediction result from every decision tree.
• Step 3 − In this step, voting will be performed for every predicted result.
• Step 4 − At last, select the most voted prediction result as the final prediction result.
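As a minimal sketch of these steps, assuming scikit-learn is available; the synthetic data and parameter values are illustrative only:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative synthetic dataset
X, y = make_classification(n_samples=200, n_features=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Steps 1-2: bootstrap samples are drawn and a decision tree is built for each
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

# Steps 3-4: each tree votes and the majority vote is returned as the prediction
print(forest.predict(X_test[:5]))
print("accuracy:", forest.score(X_test, y_test))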
Support Vector Machine (SVM)
Support vector machines (SVMs) are powerful yet flexible supervised machine learning algorithms which are used for both classification and regression. Generally, however, they are used for classification problems. SVMs were first introduced in the 1960s but were later refined in the 1990s. SVMs have their own unique way of implementation compared to other machine learning algorithms. Lately, they have become extremely popular because of their ability to handle multiple continuous and categorical variables.
Working of SVM
• Support Vectors − Data points that are closest to the hyperplane are called support vectors. The separating line is defined with the help of these data points.
• Hyperplane − As can be seen in the above diagram, it is a decision plane or space which divides a set of objects having different classes.
• Margin − It may be defined as the gap between two lines drawn on the closest data points of different classes. It can be calculated as the perpendicular distance from the line to the support vectors. A large margin is considered a good margin and a small margin is considered a bad margin.
The main goal of SVM is to divide the dataset into classes by finding a maximum marginal hyperplane (MMH), which can be done in the following two steps (a brief sketch follows the list) −
• First, SVM will generate hyperplanes iteratively that segregate the classes in the best way.
• Then, it will choose the hyperplane that separates the classes correctly.
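A minimal sketch of training a linear SVM classifier, assuming scikit-learn; the data, kernel, and parameter choices are illustrative only:

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated classes (illustrative data)
X, y = make_blobs(n_samples=100, centers=2, random_state=6)

# A linear kernel looks for the maximum marginal hyperplane between the classes
clf = SVC(kernel='linear', C=1.0)
clf.fit(X, y)

print("support vectors:", clf.support_vectors_[:3])  # data points closest to the hyperplane
print("prediction:", clf.predict([[0.0, 0.0]]))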
Businesses around the world today are smart and do everything they can to get and retain their customers. They can identify malicious credit/debit card transactions, identify a person uniquely with face or eye detection as a password to unlock a device, offer what their customers are looking for in the least possible time, separate spam from regular emails, and predict within how much time one can reach an intended destination depending upon the length of the road, weather conditions, traffic, etc.
These challenging tasks are possible only when the algorithms carrying out such predictions
are smart, and the learning approaches are the ones which make the algorithms smart.
When it comes to data mining, there are two main approaches of Machine Learning −
• Supervised learning
• Unsupervised learning
Read through this article to find out more about supervised and unsupervised learning and how
they are different from each other.
The supervised learning approach of machine learning is the approach in which the algorithms are trained using labelled datasets. The datasets train the algorithms to be smarter; they make it easy for the algorithms to predict the outcome as accurately as possible.
A dataset is a collection of related yet discrete data, which can be used or managed individually as well as a group. Labelled datasets are named pieces of data that are tagged with one or more labels pertaining to certain properties or characteristics.
For example, look at the picture below. It depicts classification and labelling −
The labelled datasets help the algorithms understand the relationships within the data and carry out classification or prediction of a new outcome quickly and with high accuracy. In this approach, human intervention is necessary to define properties and characteristics as well as to label the data appropriately.
Supervised Learning is used for the data where the input and output data can be precisely
mapped.
• Classification − In this approach, algorithms are trained to categorize the data into distinct units depending on their labels. Examples of classification algorithms are Decision Tree, Random Forest, Support Vector Machine, etc. Classification can be Binary or Multi-class.
• Regression − This approach makes a computer program understand the relationship between dependent and independent variables. As the name suggests, regression means "going back to": the algorithm is exposed to past data, and once training is completed, the algorithm can easily predict future values. Some popular regression algorithms are Linear, Logistic, and Polynomial regression. Regression can be Linear or Non-linear.
Both of the above approaches of machine learning are used for prediction, and both work with labelled datasets. Then what is the difference between the two?
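Before looking at that difference, here is a minimal sketch of the two supervised approaches, assuming scikit-learn; the toy data and model choices are illustrative only:

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Classification: labelled inputs mapped to discrete categories
X_cls = np.array([[1], [2], [3], [8], [9], [10]])
y_cls = np.array([0, 0, 0, 1, 1, 1])           # labels: two distinct classes
clf = LogisticRegression().fit(X_cls, y_cls)
print(clf.predict([[2.5], [9.5]]))              # expected [0 1]

# Regression: labelled inputs mapped to continuous values
X_reg = np.array([[1], [2], [3], [4]])
y_reg = np.array([2.0, 4.1, 6.0, 7.9])          # roughly y = 2x
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict([[5]]))                       # close to 10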
The unsupervised learning approach of machine learning does not use labelled datasets for training the algorithms. Instead, the machines learn on their own by accessing massive amounts of unclassified data and finding its implicit patterns. The algorithms analyze and cluster the unlabelled datasets. No human intervention is required while analyzing and clustering, hence the name "Unsupervised".
• Association − This approach uses rules to find relationships between variables in a dataset. It is often used in suggestions and recommendations, for example, suggesting an item to a customer with "Customers who bought this item also bought", or "You may also like", or simply by showing related product images and recommending related items. For example, when the primary product being purchased is a computer, a wireless mouse and keyboard may be suggested as well.
• Clustering − This is a learning technique in data mining where unlabelled or unclassified data are grouped depending on the similarities or differences among them. This technique helps businesses understand market segments based on customer demographics.
• Dimensionality Reduction − This is a learning technique used to reduce the number of random variables or 'dimensions' to obtain a set of principal variables when the number of variables is very high. This technique enables data compression without compromising the usability of the data. It is used, for example, to pre-process audio/visual data to improve the quality of the outcome or to make the background of an image transparent. (A brief sketch of clustering and dimensionality reduction follows this list.)
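A minimal sketch of clustering and dimensionality reduction, assuming scikit-learn; the synthetic data and parameter values are illustrative only:

from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Unlabelled data: three natural groups in four dimensions
X, _ = make_blobs(n_samples=150, centers=3, n_features=4, random_state=0)

# Clustering: group the unlabelled points by similarity
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])

# Dimensionality reduction: project the 4 dimensions onto 2 principal variables
X_2d = PCA(n_components=2).fit_transform(X)
print(X_2d.shape)   # (150, 2)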
The following points highlight the major differences between Supervised and Unsupervised learning −
• Supervised learning trains the algorithms on labelled datasets, whereas unsupervised learning works on unlabelled data.
• Supervised learning needs human intervention to define properties and label the data; in unsupervised learning the machines find implicit patterns on their own.
• Supervised learning is used for classification and regression, whereas unsupervised learning is used for association, clustering, and dimensionality reduction.
CLUSTERING
Data Mining - Cluster Analysis
A cluster is a group of objects that belong to the same class. In other words, similar objects are grouped in one cluster and dissimilar objects are grouped in another cluster.
What is Clustering?
Clustering is the process of making a group of abstract objects into classes of similar objects.
Points to Remember
The following points throw light on why clustering is required in data mining −
Clustering Methods
• Partitioning Method
• Hierarchical Method
• Density-based Method
• Grid-Based Method
• Model-Based Method
• Constraint-based Method
Partitioning Method
Suppose we are given a database of 'n' objects and the partitioning method constructs 'k' partitions of the data. Each partition will represent a cluster, where k ≤ n. This means that the method classifies the data into k groups, each of which satisfies the following requirements: each group contains at least one object, and each object belongs to exactly one group.
A typical partitioning method is k-means. The initial values for the means are assigned arbitrarily; these can be assigned randomly, or the values of the first k input items themselves can be used. The convergence criterion can be based on the squared error, but it is not required to be; for example, the algorithm can stop when no objects are reassigned to different clusters between iterations. Other termination techniques simply look at a fixed number of iterations; a maximum number of iterations can be included to ensure stopping even without convergence.
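A minimal sketch of this partitioning approach with k-means, assuming NumPy; the toy data, the value of k, and the iteration cap are illustrative only:

import numpy as np

def kmeans(X, k=2, max_iters=100):
    # Initialise the means from the first k input items
    centroids = X[:k].copy()
    for _ in range(max_iters):                        # fixed iteration cap ensures stopping
        # Assign each object to the nearest mean
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Recompute each mean from its assigned objects
        new_centroids = np.array([X[labels == i].mean(axis=0) for i in range(k)])
        if np.allclose(new_centroids, centroids):     # converged: the means stopped moving
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [8.5, 7.5]])
labels, centroids = kmeans(X, k=2)
print(labels)       # e.g. [0 0 1 1]
print(centroids)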
Hierarchical Methods
This method creates a hierarchical decomposition of the given set of data objects. We can
classify hierarchical methods on the basis of how the hierarchical decomposition is formed.
There are two approaches here −
• Agglomerative Approach
• Divisive Approach
Agglomerative Approach
This approach is also known as the bottom-up approach. In this approach, we start with each object forming a separate group. It keeps on merging the objects or groups that are close to one another, and it keeps doing so until all of the groups are merged into one or until the termination condition holds.
Divisive Approach
This approach is also known as the top-down approach. In this approach, we start with all of the objects in the same cluster. In each successive iteration, a cluster is split up into smaller clusters. This is done until each object is in its own cluster or until the termination condition holds. This method is rigid, i.e., once a merging or splitting is done, it can never be undone.
Here are the two approaches that are used to improve the quality of hierarchical clustering −
We are going to explain the most used and important hierarchical clustering, i.e. agglomerative clustering. The steps to perform it are as follows (a brief sketch follows the list) −
• Step 1 − Treat each data point as a single cluster. Hence, we will have, say, K clusters at the start. The number of data points will also be K at the start.
• Step 2 − Now, in this step we form a bigger cluster by joining the two closest data points. This results in a total of K-1 clusters.
• Step 3 − Now, to form more clusters, we join the two closest clusters. This results in a total of K-2 clusters.
• Step 4 − Now, to form one big cluster, repeat the above steps until K becomes 1, i.e. all data points have been merged into a single cluster.
• Step 5 − At last, after making one single big cluster, a dendrogram is used to divide it into multiple clusters depending upon the problem.
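A minimal sketch of agglomerative clustering and the dendrogram step, assuming SciPy is available; the toy points and the number of final clusters are illustrative only:

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Toy data points; each one starts as its own cluster
X = np.array([[1.0, 1.0], [1.2, 1.1], [5.0, 5.0], [5.2, 4.9], [9.0, 9.0]])

# Steps 1-4: repeatedly merge the two closest clusters until only one remains
Z = linkage(X, method='single')

# Step 5: cut the dendrogram to obtain the final clusters (here, 3 of them)
labels = fcluster(Z, t=3, criterion='maxclust')
print(labels)        # e.g. [1 1 2 2 3]

# dendrogram(Z) would draw the merge hierarchy if matplotlib is available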
Introduction to DBSCAN
The DBSCAN algorithm can efficiently cluster densely grouped points into one cluster. It can identify regions of high local density among large datasets, and it handles outliers very effectively. An advantage of DBSCAN over the K-means algorithm is that the number of centroids need not be known beforehand in the case of DBSCAN.
Epsilon is defined as the radius around each data point within which the density is considered. minPoints is the number of points required within that radius for the data point to become a core point.
In the DBSCAN algorithm, a circle with radius epsilon is drawn around each data point, and the data point is classified as a Core Point, Border Point, or Noise Point. A data point is classified as a core point if it has at least minPoints data points within its epsilon radius. If it has fewer points than minPoints it is known as a Border Point, and if there are no points inside its epsilon radius it is considered a Noise Point.
In the above figure, we can see that point A has no points inside its epsilon (e) radius, hence it is a Noise Point. Point B has minPoints (= 4) points within its epsilon radius, thus it is a Core Point, while the remaining point has only 1 point (fewer than minPoints) inside its radius, hence it is a Border Point.
• DBSCAN does not require the number of centroids to be known beforehand as in the
case with the K-Means Algorithm.
• It can find clusters with any shape.
• It can also locate clusters that are not connected to any other group or clusters. It can
work well with noisy clusters.
• It is robust to outliers.
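A minimal sketch of DBSCAN, assuming scikit-learn; the eps, min_samples, and synthetic data values are illustrative only:

from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN

# Two interleaved, non-spherical clusters with a little noise
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

# eps is the epsilon radius; min_samples plays the role of minPoints
db = DBSCAN(eps=0.2, min_samples=5).fit(X)

# Cluster labels: noise points are marked with -1
print(set(db.labels_))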