Module 5
In the real world, we are surrounded by humans who can learn from their experiences, and by computers or machines that simply work on our instructions. But can a machine also learn from experiences or past data the way a human does? This is where Machine Learning comes in.
Machine Learning is a subset of artificial intelligence that is mainly concerned with the development of algorithms that allow a computer to learn on its own from data and past experiences. The term machine learning was first introduced by Arthur Samuel in 1959. We can define it in a summarized way as:
With the help of sample historical data, which is known as training data, machine
learning algorithms build a mathematical model that helps in making predictions or
decisions without being explicitly programmed. Machine learning brings computer
science and statistics together to create predictive models. Machine learning constructs or uses algorithms that learn from historical data. The more information we provide, the better the performance.
A machine has the ability to learn if it can improve its performance by gaining
more data.
At a broad level, machine learning can be classified into three types:
1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning
1) Supervised Learning
Supervised learning is a type of machine learning method in which we provide sample
labeled data to the machine learning system in order to train it, and on that basis, it
predicts the output.
The system creates a model using labeled data to understand the datasets and learn about each one. Once training and processing are done, we test the model by providing sample data to check whether it predicts the correct output.
The goal of supervised learning is to map input data to output data. Supervised learning is based on supervision, just as a student learns under the supervision of a teacher. An example of supervised learning is spam filtering.
Supervised learning can be further grouped into two categories of algorithms:
o Classification
o Regression
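As a minimal sketch of this idea (assuming scikit-learn and a made-up spam-style dataset; the two features below are illustrative), a supervised model is trained on labeled examples and then queried on unseen input:

```python
# Minimal supervised-learning sketch (scikit-learn assumed; the toy
# spam features below are illustrative, not a real dataset).
from sklearn.linear_model import LogisticRegression

# Labeled training data: each row is [num_links, num_spam_words]
X_train = [[5, 8], [0, 1], [7, 6], [1, 0]]
y_train = [1, 0, 1, 0]  # labels: 1 = spam, 0 = not spam

model = LogisticRegression()
model.fit(X_train, y_train)     # learn the input-to-output mapping

print(model.predict([[6, 7]]))  # predict the label of an unseen email
```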
2) Unsupervised Learning
Unsupervised learning is a learning method in which a machine learns without any
supervision.
The machine is trained with a set of data that has not been labeled, classified, or categorized, and the algorithm must act on that data without any supervision. The goal of unsupervised learning is to restructure the input data into new features or groups of objects with similar patterns.
Unsupervised learning can be further classified into two categories of algorithms:
o Clustering
o Association
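As a minimal sketch (scikit-learn assumed; the points are illustrative), a clustering algorithm groups unlabeled data on its own:

```python
# Minimal unsupervised-learning sketch: k-means clustering on
# unlabeled points (no labels are supplied anywhere).
from sklearn.cluster import KMeans

X = [[1, 2], [1, 4], [1, 0],
     [10, 2], [10, 4], [10, 0]]   # two obvious groups, no labels

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)             # cluster id discovered for each point
```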
3) Reinforcement Learning
Reinforcement learning is a feedback-based learning method, in which a learning
agent gets a reward for each right action and gets a penalty for each wrong action.
The agent learns automatically from this feedback and improves its performance. In reinforcement learning, the agent interacts with the environment and explores it. The goal of the agent is to collect the most reward points, and hence it improves its performance.
A robotic dog that automatically learns the movement of its arms is an example of reinforcement learning.
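As a hedged sketch of the reward/penalty loop (a hypothetical toy environment, not the robotic dog: an agent walks a line of 5 states and is rewarded for reaching the last one), tabular Q-learning looks like this:

```python
# Tabular Q-learning on a toy 1-D walk: states 0..4, reward at state 4,
# small penalty per step. All numbers here are illustrative choices.
import random

n_states, actions = 5, [-1, +1]        # step left or right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration

for episode in range(200):
    s = 0
    while s != n_states - 1:
        # explore occasionally, otherwise exploit the best-known action
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s_next == n_states - 1 else -0.1   # reward or penalty
        best_next = max(Q[(s_next, act)] for act in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

print(max(actions, key=lambda act: Q[(0, act)]))  # learned first move: 1
```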
Decision Tree Classification Algorithm
Decision Tree is a Supervised learning technique that can be used for both
classification and Regression problems, but mostly it is preferred for solving
Classification problems. It is a tree-structured classifier, where internal nodes
represent the features of a dataset, branches represent the decision
rules and each leaf node represents the outcome.
o In a Decision tree, there are two nodes, which are the Decision Node and Leaf
Node. Decision nodes are used to make any decision and have multiple branches,
whereas Leaf nodes are the output of those decisions and do not contain any further
branches.
o The decisions or the test are performed on the basis of features of the given dataset.
o It is a graphical representation for getting all the possible solutions to a
problem/decision based on given conditions.
o It is called a decision tree because, similar to a tree, it starts with the root node, which
expands on further branches and constructs a tree-like structure.
o In order to build a tree, we use the CART algorithm, which stands for Classification
and Regression Tree algorithm.
o A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees.
o The diagram below explains the general structure of a decision tree:
Note: A decision tree can contain categorical data (YES/NO) as well as numeric
data.
Why use Decision Trees?
There are various algorithms in Machine learning, so choosing the best algorithm for
the given dataset and problem is the main point to remember while creating a machine
learning model. Below are the two reasons for using the Decision tree:
o Decision Trees usually mimic the human thinking process while making a decision, so they are easy to understand.
o The logic behind the decision tree can be easily understood because it shows a tree-
like structure.
• Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes
according to the given conditions.
• Pruning: Pruning is the process of removing the unwanted branches from the tree.
• Parent/Child node: The root node of the tree is called the parent node, and other nodes are
called the child nodes.
In a decision tree, to predict the class of a given dataset, the algorithm starts from the root node of the tree. It compares the value of the root attribute with the corresponding attribute of the record (from the real dataset) and, based on the comparison, follows a branch and jumps to the next node.
For the next node, the algorithm again compares the attribute value with the other sub-nodes and moves further. It continues this process until it reaches a leaf node of the tree. The complete process can be better understood using the algorithm below:
o Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
o Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
o Step-3: Divide S into subsets that contain the possible values for the best attribute.
o Step-4: Generate the decision tree node, which contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where you cannot classify the nodes further; the final node is then called a leaf node. A compact sketch of these steps follows.
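A compact sketch of Steps 1-5 in pure Python (the names rows, attributes, and asm_score are hypothetical: rows are dicts with a "label" key, and asm_score is any attribute selection measure, such as those described in the next section):

```python
# Recursive tree construction following Steps 1-5 (illustrative sketch).
from collections import Counter

def build_tree(rows, attributes, asm_score):
    labels = [r["label"] for r in rows]
    # Step-5 stopping rule: pure subset or no attributes left -> leaf node
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Step-2 / Step-4: pick the best attribute and make it the node
    best = max(attributes, key=lambda a: asm_score(rows, a))
    node = {"attribute": best, "branches": {}}
    # Step-3: divide the set into subsets, one per value of the attribute
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        remaining = [a for a in attributes if a != best]
        node["branches"][value] = build_tree(subset, remaining, asm_score)
    return node
```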
Example: Suppose there is a candidate who has a job offer and wants to decide whether to accept it or not. To solve this problem, the decision tree starts with the root node (the Salary attribute, chosen by ASM). The root node splits further into the next decision node (distance from the office) and one leaf node based on the corresponding labels. The next decision node further splits into one decision node (cab facility) and one leaf node. Finally, the decision node splits into two leaf nodes (Accepted offer and Declined offer). Consider the below diagram:
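The same example can be written as nested conditions; the salary and distance thresholds below are assumptions for illustration, while the leaf labels match the example:

```python
# The job-offer decision tree as plain nested conditions
# (the numeric thresholds are made up for illustration).
def evaluate_offer(salary, distance_km, has_cab):
    if salary < 50000:           # root node: Salary
        return "Declined offer"
    if distance_km > 30:         # decision node: distance from office
        if not has_cab:          # decision node: cab facility
            return "Declined offer"
    return "Accepted offer"

print(evaluate_offer(salary=60000, distance_km=40, has_cab=True))
```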
Attribute Selection Measures
While implementing a decision tree, the main issue that arises is how to select the best attribute for the root node and for the sub-nodes. To solve such problems there is a technique called the Attribute Selection Measure, or ASM. With this measurement, we can easily select the best attribute for the nodes of the tree. There are two popular techniques for ASM, which are:
o Information Gain
o Gini Index
1. Information Gain:
o Information gain is the measurement of changes in entropy after the segmentation of
a dataset based on an attribute.
o It calculates how much information a feature provides us about a class.
o According to the value of information gain, we split the node and build the decision
tree.
o A decision tree algorithm always tries to maximize the value of information gain, and
a node/attribute having the highest information gain is split first. It can be calculated
using the below formula:
Information Gain = Entropy(S) − [(Weighted Avg) × Entropy(each feature)]
Where entropy, a metric measuring the impurity of S, is given by:
Entropy(S) = −P(yes) log2 P(yes) − P(no) log2 P(no)
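A small pure-Python sketch of this formula (using the same hypothetical rows-of-dicts layout as the earlier tree-building sketch):

```python
# Entropy of a list of labels and information gain of splitting on an
# attribute; matches the formula above for the binary yes/no case.
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((labels.count(c) / total) * log2(labels.count(c) / total)
                for c in set(labels))

def information_gain(rows, attribute):
    labels = [r["label"] for r in rows]
    weighted = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r["label"] for r in rows if r[attribute] == value]
        weighted += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - weighted  # change in entropy after the split
```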
2. Gini Index:
o Gini index is a measure of impurity or purity used while creating a decision tree in the
CART(Classification and Regression Tree) algorithm.
o An attribute with a low Gini index should be preferred over an attribute with a high Gini index.
o It only creates binary splits, and the CART algorithm uses the Gini index to create binary
splits.
o Gini index can be calculated using the below formula:
Gini Index = 1 − ∑j (Pj)²
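The matching pure-Python sketch for this formula:

```python
# Gini index of a list of labels: 1 minus the sum of squared class
# probabilities; 0 means pure, 0.5 is the worst case for two classes.
def gini(labels):
    total = len(labels)
    return 1 - sum((labels.count(c) / total) ** 2 for c in set(labels))

print(gini(["yes", "yes", "no", "no"]))  # 0.5, a maximally impure split
```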
A too-large tree increases the risk of overfitting, and a small tree may not capture all the important features of the dataset. Therefore, a technique that decreases the size of the learning tree without reducing accuracy is known as Pruning. There are mainly two types of tree pruning technology used:
o Cost Complexity Pruning
o Reduced Error Pruning
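As a hedged sketch of cost complexity pruning with scikit-learn (the dataset and the alpha value are illustrative), a larger ccp_alpha prunes the tree more aggressively:

```python
# Cost complexity pruning via scikit-learn's ccp_alpha parameter.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
unpruned = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X, y)

# The pruned tree has fewer nodes but usually similar accuracy.
print(unpruned.tree_.node_count, pruned.tree_.node_count)
```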
A statistical model is said to be overfitted if it can’t generalize well with unseen data.
Before understanding overfitting, we need to know some basic terms, which are:
Noise: Noise is meaningless or irrelevant data present in the dataset. It affects the
performance of the model if it is not removed.
Bias: Bias is a prediction error introduced in the model by oversimplifying the machine learning algorithm; it is the difference between the predicted values and the actual values.
Variance: Variance occurs when the machine learning model performs well with the training dataset but does not perform well with the test dataset.
What is Overfitting?
o Overfitting & underfitting are the two main errors/problems in a machine learning model, and they cause poor performance in machine learning.
o Overfitting occurs when the model fits more data than required and tries to capture each and every data point fed to it. Hence it starts capturing noise and inaccurate data from the dataset, which degrades the performance of the model.
o An overfitted model doesn't perform accurately with the test/unseen dataset and can’t
generalize well.
o An overfitted model is said to have low bias and high variance.
Suppose the model memorizes the training dataset, like a student who crams answers without understanding them: it performs very well on the seen dataset but performs badly on unseen data or unknown instances. In such cases, the model is said to be overfitting.
And if the model performs well with the training dataset and also with the test/unseen dataset, like a student who has genuinely understood the subject, it is said to be a good fit.
In the train-test split of the dataset, we can divide our dataset into random test and
training datasets. We train the model with a training dataset which is about 80% of the
total dataset. After training the model, we test it with the test dataset, which is 20% of the total dataset.
Now, if the model performs well with the training dataset but not with the test dataset,
then it is likely to have an overfitting issue.
For example, if the model shows 85% accuracy with training data and 50% accuracy
with the test dataset, it means the model is not performing well.
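A sketch of checking for this gap with scikit-learn (the dataset and model are illustrative; an unconstrained decision tree often shows exactly this pattern):

```python
# Spotting overfitting by comparing training and test accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)  # 80% train, 20% test

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))  # typically ~1.0
print("test accuracy: ", model.score(X_test, y_test))    # noticeably lower
```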
Ways to prevent the Overfitting
Although overfitting is an error in machine learning that reduces the performance of the model, we can prevent it in several ways. Using a linear model helps avoid overfitting, but many real-world problems are non-linear, so it is important to prevent overfitting in such models. Below are several ways that can be used to prevent overfitting (a regularization sketch follows the list):
1. Early Stopping
2. Train with more data
3. Feature Selection
4. Cross-Validation
5. Data Augmentation
6. Regularization
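As a hedged sketch of item 6, regularization (scikit-learn and an illustrative noisy toy dataset assumed): L2 regularization (Ridge) shrinks coefficients that plain linear regression would inflate to fit noise:

```python
# L2 regularization: Ridge penalizes large weights, reducing overfitting.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 10))             # few samples, many features
y = X[:, 0] + 0.1 * rng.normal(size=20)   # only the first feature matters

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)        # alpha sets the penalty strength

print(np.abs(plain.coef_).round(2))  # spurious weights on noise features
print(np.abs(ridge.coef_).round(2))  # weights shrunk toward zero
```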
What is Hypothesis?
The hypothesis is defined as a supposition or proposed explanation based on insufficient evidence or assumptions. It is just a guess based on some known facts but has not yet been proven. A good hypothesis is testable, resulting in either true or false.
Example: Let's understand the hypothesis with a common example. A scientist claims that ultraviolet (UV) light can damage the eyes, so it may also cause blindness.
In this example, the scientist only claims that UV rays are harmful to the eyes, but we assume they may also cause blindness. This may or may not turn out to be true. Such an assumption is called a hypothesis.
There are some common methods given to find the possible hypothesis from the hypothesis space, where the hypothesis space is represented by uppercase-h (H) and a hypothesis by lowercase-h (h). These are defined as follows:
Hypothesis space (H):
Hypothesis space is defined as a set of all possible legal hypotheses; hence it is
also known as a hypothesis set. It is used by supervised machine learning algorithms
to determine the best possible hypothesis to describe the target function or best maps
input to output.
It is often constrained by the framing of the problem, the choice of model, and the choice of model configuration.
Hypothesis (h):
It is defined as the approximate function that best describes the target in supervised
machine learning algorithms. It is primarily based on data as well as bias and
restrictions applied to data.
Hence hypothesis (h) can be concluded as a single hypothesis that maps input to
proper output and can be evaluated as well as used to make predictions.
y = mx + b
Where,
y: range (the predicted output)
m: slope of the line, i.e., the change in y divided by the change in x
x: domain (the input)
b: intercept (constant)
Example: Let's understand the hypothesis (h) and hypothesis space (H) with a two-dimensional coordinate plane showing a distribution of data points. Each function that could fit or separate those points is one hypothesis (h), and the set of all such candidate functions is the hypothesis space (H). Given some test data, the ML algorithm selects one hypothesis from H and uses it to predict the outputs for new inputs.
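A toy sketch of this distinction (all numbers are illustrative): the hypothesis space H is a small grid of candidate lines y = mx + b, and the chosen hypothesis h is the line that best fits the data:

```python
# Hypothesis space H = candidate (m, b) pairs; hypothesis h = the best one.
xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]  # roughly y = 2x + 1

H = [(m, b) for m in (1.0, 1.5, 2.0, 2.5) for b in (0.0, 0.5, 1.0)]

def squared_error(m, b):
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

h = min(H, key=lambda mb: squared_error(*mb))
print(h)                        # (2.0, 1.0), i.e. y = 2x + 1
```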
Difference between Regression and Classification

Regression: The output variable must be of continuous nature or a real value.
Classification: The output variable must be a discrete value.

Regression: The task of the regression algorithm is to map the input value (x) to the continuous output variable (y).
Classification: The task of the classification algorithm is to map the input value (x) to the discrete output variable (y).

Regression: Regression algorithms are used with continuous data.
Classification: Classification algorithms are used with discrete data.

Regression: We try to find the best-fit line, which can predict the output more accurately.
Classification: We try to find the decision boundary, which can divide the dataset into different classes.

Regression: Regression algorithms can be used to solve regression problems such as weather prediction, house price prediction, etc.
Classification: Classification algorithms can be used to solve classification problems such as identification of spam emails, speech recognition, identification of cancer cells, etc.

Regression: Regression algorithms can be further divided into Linear and Non-linear Regression.
Classification: Classification algorithms can be divided into Binary Classifiers and Multi-class Classifiers.
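A compact sketch contrasting the two on the same toy inputs (scikit-learn assumed; the data is illustrative): the regressor returns a real value, the classifier a discrete class:

```python
# Regression vs. classification on the same one-feature inputs.
from sklearn.linear_model import LinearRegression, LogisticRegression

X = [[1], [2], [3], [4], [5]]
y_continuous = [1.2, 2.1, 2.9, 4.2, 5.1]  # e.g. a price-like quantity
y_discrete = [0, 0, 0, 1, 1]              # e.g. spam (1) vs not spam (0)

regressor = LinearRegression().fit(X, y_continuous)
classifier = LogisticRegression().fit(X, y_discrete)

print(regressor.predict([[6]]))   # a continuous value, about 6.1
print(classifier.predict([[6]]))  # a discrete class label: [1]
```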