Machine Learning
Supervised learning is a type of machine learning in which a computer algorithm learns to make
predictions or decisions based on labeled data. Labeled data is made up of previously known
input variables (also known as features) and output variables (also known as labels). By
analyzing patterns and relationships between input and output variables in labeled data, the
algorithm learns to make predictions.
Examples: image and speech recognition, recommendation systems, and fraud detection all rely on supervised learning.
The labeled dataset is typically split into a training set, used to fit the model, and a testing set, used to evaluate it.
Types of Supervised Learning
1. Classification
2. Regression
Classification
Classification is a type of supervised learning in which the algorithm learns to assign data accurately to different categories or classes. Given test data, it recognizes specific entities within the dataset and draws conclusions about how those entities should be labeled or defined.
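As a concrete (hypothetical) illustration of assigning test data to categories, the sketch below uses a 1-nearest-neighbour rule; the feature values and labels are toy assumptions, and this is just one of many classification algorithms.

```python
# Minimal classification sketch: a 1-nearest-neighbour rule.
# Each training sample is (feature_vector, label); labels are the categories.

def classify_1nn(train, x):
    """Assign x the label of its closest training sample (Euclidean distance)."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    # Pick the labeled sample whose features are nearest to x.
    nearest = min(train, key=lambda sample: dist(sample[0], x))
    return nearest[1]

# Labeled data: (height_cm, weight_kg) -> category (assumed toy values)
train = [((150, 50), "small"), ((180, 90), "large"), ((155, 55), "small")]
print(classify_1nn(train, (152, 52)))  # -> small
```

A new point is labeled by whichever labeled example it most resembles, which is the "draw conclusions from known entities" idea in its simplest form.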
Regression
Regression is used to understand the relationship between dependent and independent variables. Simple linear regression models this relationship as a straight line:
Y = a + bX
where Y is the dependent variable, X is the independent variable, a is the intercept, and b is the slope.
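The line Y = a + bX can be fitted by ordinary least squares. The sketch below does this from scratch on assumed toy data (the variable names a and b follow the formula above):

```python
# Ordinary least squares fit for Y = a + bX (toy data, stdlib only).

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope b = covariance(X, Y) / variance(X); intercept a = mean_y - b * mean_x
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]          # exactly y = 1 + 2x
a, b = fit_line(xs, ys)
print(a, b)  # -> 1.0 2.0
```

Because the toy data lie exactly on a line, the fit recovers the intercept and slope exactly.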
Unsupervised Learning:
Unsupervised learning, also known as unsupervised machine learning, uses machine learning
algorithms to analyze and cluster unlabeled datasets. These algorithms discover hidden patterns
or data groupings without the need for human intervention.
Its ability to discover similarities and differences in information makes it the ideal solution for tasks such as:
1. Business Analysis
2. Web Searching
Clustering:
Clustering is a data mining technique that groups unlabeled data based on their similarities or differences. Clustering algorithms process raw, unclassified data objects into groups represented by structures or patterns in the information. They can be categorized into a few types: exclusive, overlapping, hierarchical, and probabilistic.
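One common exclusive-clustering algorithm is k-means, sketched below on assumed one-dimensional toy data (k, the data, and the iteration count are all illustrative choices, not from the notes):

```python
import random

# Minimal k-means sketch: exclusive clustering of unlabeled 1-D points.

def kmeans(points, k, iters=20, seed=0):
    random.seed(seed)
    centers = random.sample(points, k)       # pick k initial centers
    for _ in range(iters):
        # Assignment step: each point joins its nearest center's group.
        groups = {c: [] for c in centers}
        for p in points:
            nearest = min(centers, key=lambda c: abs(p - c))
            groups[nearest].append(p)
        # Update step: each center moves to the mean of its group.
        centers = [sum(g) / len(g) if g else c for c, g in groups.items()]
    return sorted(centers)

data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]   # two obvious groups
print(kmeans(data, 2))                   # two centers, near 1.0 and 9.0
```

The algorithm never sees labels; the two groups emerge purely from the similarity (distance) structure of the data.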
Association:
An association rule is an unsupervised learning method used for finding relationships between variables in large databases. It determines the sets of items that occur together in the dataset.
Association rules make marketing strategies more effective: for example, people who buy item X (say, bread) also tend to purchase item Y (butter or jam). A typical example of association rule mining is Market Basket Analysis.
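The bread-and-butter example can be quantified with the confidence of a rule: confidence("X → Y") = support(X and Y) / support(X). The sketch below computes this over assumed toy transactions:

```python
# Market-basket sketch: confidence of the rule "bread -> butter"
# over a list of transactions (assumed toy data).

def confidence(transactions, x, y):
    with_x = [t for t in transactions if x in t]
    with_both = [t for t in with_x if y in t]
    return len(with_both) / len(with_x)

baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "milk"},
    {"milk", "jam"},
]
# Of the 3 baskets containing bread, 2 also contain butter.
print(confidence(baskets, "bread", "butter"))  # -> 0.6666666666666666
```

Full algorithms such as Apriori add a support threshold and search over all item sets, but the co-occurrence counting above is the core idea.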
Decision Tree:
A decision tree is a non-parametric supervised learning algorithm used for both classification and regression tasks. It has a hierarchical tree structure consisting of a root node, branches, internal nodes, and leaf nodes, and it produces easy-to-understand models.
Root Node: The initial node at the beginning of a decision tree, where the entire
population or dataset starts dividing based on various features or conditions.
Decision Nodes: Nodes resulting from the splitting of root nodes are known as decision
nodes. These nodes represent intermediate decisions or conditions within the tree.
Leaf Nodes: Nodes where further splitting is not possible, often indicating the final
classification or outcome. Leaf nodes are also referred to as terminal nodes.
Pruning: The process of removing or cutting down specific nodes in a decision tree to
prevent overfitting and simplify the model.
At each split, the attribute is chosen by information gain, which is based on entropy:
Entropy(S) = −P(yes)·log₂ P(yes) − P(no)·log₂ P(no)
where P(yes) and P(no) are the proportions of YES and NO examples in the set S.
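The entropy formula translates directly into code (using the convention that 0·log₂ 0 = 0):

```python
from math import log2

# Entropy of a two-class label set:
# Entropy(S) = -P(yes)*log2(P(yes)) - P(no)*log2(P(no))

def entropy(pos, neg):
    total = pos + neg
    e = 0.0
    for count in (pos, neg):
        if count:                       # 0 * log2(0) is taken as 0
            p = count / total
            e -= p * log2(p)
    return e

print(round(entropy(9, 5), 2))  # -> 0.94  (the whole-dataset value used below)
```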
Example:
Day Weather Temperature Humidity Wind Play Football?
Day1 Sunny Hot High Weak NO
Day2 Sunny Hot High Strong NO
Day3 Cloudy Hot High Weak YES
Day4 Rain Mild High Weak YES
Day5 Rain Cool Normal Weak YES
Day6 Rain Cool Normal Strong NO
Day7 Cloudy Cool Normal Strong YES
Day8 Sunny Mild High Weak NO
Day9 Sunny Cool Normal Weak YES
Day10 Rain Mild Normal Weak YES
Day11 Sunny Mild Normal Strong YES
Day12 Cloudy Mild High Strong YES
Day13 Cloudy Hot Normal Weak YES
Day14 Rain Mild High Strong NO
Calculate IG of Weather
Step 1: Entropy of the whole dataset S{+9, −5}:
Entropy(S) = −(9/14)·log₂(9/14) − (5/14)·log₂(5/14) = 0.94
Step 2: Entropy of each value of Weather:
Entropy of Sunny {+2, −3} = −(2/5)·log₂(2/5) − (3/5)·log₂(3/5) = 0.97
Entropy of Cloudy {+4, −0} = −(4/4)·log₂(4/4) − (0/4)·log₂(0/4) = 0
Entropy of Rain {+3, −2} = −(3/5)·log₂(3/5) − (2/5)·log₂(2/5) = 0.97
Step 3: Information gain is the whole-dataset entropy minus the weighted average of the value entropies:
IG(Weather) = Entropy(S) − (5/14)·Ent(Sunny) − (4/14)·Ent(Cloudy) − (5/14)·Ent(Rain)
= 0.94 − (5/14)(0.97) − (4/14)(0) − (5/14)(0.97) = 0.246
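The Weather computation above can be checked in a few lines; the (+YES, −NO) counts per value are taken from the table:

```python
from math import log2

# Reproduce the information-gain computation for Weather on the full dataset.

def entropy(pos, neg):
    e = 0.0
    for c in (pos, neg):
        if c:                            # 0 * log2(0) is taken as 0
            p = c / (pos + neg)
            e -= p * log2(p)
    return e

total = 14
splits = {"Sunny": (2, 3), "Cloudy": (4, 0), "Rain": (3, 2)}
# IG = Entropy(S) - weighted average of the per-value entropies.
ig = entropy(9, 5) - sum(
    (p + n) / total * entropy(p, n) for p, n in splits.values()
)
print(ig)  # about 0.246, matching the hand computation
```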
Calculate IG of Temperature, Humidity, and Wind
Step 1 is identical for every attribute, since the whole dataset does not change:
Entropy(S) = −(9/14)·log₂(9/14) − (5/14)·log₂(5/14) = 0.94
Step 2 proceeds exactly as for Weather: compute the entropy of each attribute value and subtract the weighted average from Entropy(S).
IG of Weather: 0.246
IG of Temperature: 0.029
IG of Humidity: 0.15
IG of Wind: 0.048
Weather has the highest information gain, so it becomes the root node. The Cloudy branch is pure (all YES) and ends in a leaf, while the Sunny and Rain branches must be split further. For the Sunny branch (Days 1, 2, 8, 9, 11):
IG of Temperature: 0.57
IG of Humidity: 0.97
IG of Wind: 0.019
Humidity has the highest gain, so it becomes the decision node for the Sunny branch.
For the Rain branch (Days 4, 5, 6, 10, 14):
Day Weather Temperature Humidity Wind Play Football?
Day4 Rain Mild High Weak YES
Day5 Rain Cool Normal Weak YES
Day6 Rain Cool Normal Strong NO
Day10 Rain Mild Normal Weak YES
Day14 Rain Mild High Strong NO
Calculate IG of Temperature (Rain branch)
Step 1: Entropy of the Rain subset {+3, −2}: Entropy(S) = 0.97
Step 2: Entropy of each Temperature value:
Entropy of Hot = 0 (Hot does not occur in the Rain subset)
Entropy of Mild {+2, −1} = 0.918
Entropy of Cool {+1, −1} = 1.0
IG(Temperature) = 0.97 − (3/5)(0.918) − (2/5)(1.0) = 0.019
Calculate IG of Humidity (Rain branch)
Step 1: Entropy(S) = 0.97
Step 2: Entropy of each Humidity value:
Entropy of High {+1, −1} = 1
Entropy of Normal {+2, −1} = 0.918
IG(Humidity) = 0.97 − (2/5)(1) − (3/5)(0.918) = 0.019
Calculate IG of Wind (Rain branch)
Step 1: Entropy(S) = 0.97
Step 2: Entropy of each Wind value:
Entropy of Strong {+0, −2} = 0
Entropy of Weak {+3, −0} = 0
IG(Wind) = 0.97 − (2/5)(0) − (3/5)(0) = 0.97
IG of Temperature: 0.019
IG of Humidity: 0.019
IG of Wind: 0.97
Wind has the highest information gain, so it becomes the decision node for the Rain branch: Weak leads to YES and Strong leads to NO, which completes the tree.
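The Rain-branch numbers can be verified the same way; the (+YES, −NO) counts per value below are read from the Rain-branch table:

```python
from math import log2

# Verify the Rain-branch gains: within the 5 Rain days ({+3, -2}),
# Wind has the highest information gain.

def entropy(pos, neg):
    e = 0.0
    for c in (pos, neg):
        if c:                            # 0 * log2(0) is taken as 0
            p = c / (pos + neg)
            e -= p * log2(p)
    return e

def info_gain(parent, splits):
    total = sum(p + n for p, n in splits)
    return entropy(*parent) - sum(
        (p + n) / total * entropy(p, n) for p, n in splits
    )

rain = (3, 2)                            # YES/NO counts on Rain days
temperature = [(2, 1), (1, 1)]           # Mild, Cool
humidity = [(1, 1), (2, 1)]              # High, Normal
wind = [(0, 2), (3, 0)]                  # Strong, Weak

for name, splits in [("Temperature", temperature),
                     ("Humidity", humidity),
                     ("Wind", wind)]:
    print(name, round(info_gain(rain, splits), 3))
# Wind's gain (about 0.97) is the largest of the three, so Wind is chosen.
```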