Data Algo Metrics

1. Data Mining
We used classification models to predict the opinions from the dataset. A classification model draws conclusions from observed data points and predicts a label value for each record in the dataset. Its inputs are taken from the various features of the dataset and its output is a label, or class, for the predicted variable. Depending on the number of classes to be predicted, there are two types of classification: binary and multi-class. If two classes are to be predicted, the task is binary classification; if more than two classes are to be predicted, it becomes multi-class classification. To learn patterns from the qualitative features of the data, classifiers such as Naive Bayes, Random Forest, KNN and Artificial Neural Networks generally give better performance. We have therefore used these models to predict suppliers.
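As a minimal illustration of the binary/multi-class distinction above, the task type follows directly from the number of distinct labels in the data (the label values below are invented toy examples):

```python
def classification_type(labels):
    """Return the task type implied by the distinct label values."""
    n_classes = len(set(labels))
    if n_classes < 2:
        raise ValueError("need at least two classes to classify")
    return "binary" if n_classes == 2 else "multi-class"

print(classification_type(["approve", "reject", "approve"]))      # binary
print(classification_type(["gold", "silver", "bronze", "gold"]))  # multi-class
```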

1.1 Naive Bayes Classifier


Naive Bayes is a simple, effective and widely used machine learning classifier. It classifies using the Maximum A Posteriori decision rule derived from Bayes' conditional probability theorem. It is typically used when the features in the dataset are of a contrasting nature, and it uses available past information to calculate the probability of future events (Soni D. 2018).
To perform this operation we use Bayes' rule as follows:

Equation 3.1:   P(A|B) = P(B|A) · P(A) / P(B)

where P(A|B) is the posterior probability of class A given evidence B, P(B|A) is the likelihood, P(A) is the prior and P(B) is the evidence.
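As a worked sketch of Equation 3.1, the posterior for one class given a single binary feature can be computed directly, expanding the evidence P(B) by the law of total probability. The prior and likelihoods below are invented toy numbers, not values estimated from the dataset:

```python
def posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B),
    with P(B) = P(B|A)P(A) + P(B|not A)P(not A)."""
    evidence = (likelihood_b_given_a * prior_a
                + likelihood_b_given_not_a * (1 - prior_a))
    return likelihood_b_given_a * prior_a / evidence

# Toy numbers: P(positive) = 0.3, P(keyword|positive) = 0.8,
# P(keyword|negative) = 0.2
p = posterior(0.3, 0.8, 0.2)
print(round(p, 4))  # 0.24 / 0.38 ≈ 0.6316
```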

1.2 Random Forest Classifier


Random Forest, also known as Random Decision Forest, is a supervised machine learning classifier which analyses the given data features and finds an effective way to split the data into trees and sub-trees in order to predict new values. The output is obtained by combining the collection of decision tree classifiers into a single result: the winning class is the one that appears most often in the list of outputs from all the decision trees used (Borcan M. 2020).
To perform this operation, each decision tree calculates the importance of each node using Gini importance (Ronaghan S. 2018):

Equation 3.2:   ni_j = w_j · C_j − w_left(j) · C_left(j) − w_right(j) · C_right(j)

 ni_j = the importance of node j
 w_j = weighted number of samples reaching node j
 C_j = the impurity value of node j
 left(j) = child node from the left split of node j
 right(j) = child node from the right split of node j
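A minimal sketch of Equation 3.2, using invented sample weights and Gini impurities for a single split:

```python
def node_importance(w_j, c_j, w_left, c_left, w_right, c_right):
    """Gini importance of node j (Equation 3.2): the weighted impurity
    of the node minus the weighted impurity of its two children."""
    return w_j * c_j - w_left * c_left - w_right * c_right

# Toy split: the node holds all samples (weight 1.0) with Gini 0.5;
# its children receive 60% / 40% of the samples at lower impurity.
ni = node_importance(1.0, 0.5, 0.6, 0.3, 0.4, 0.2)
print(ni)  # 0.5 - 0.18 - 0.08 ≈ 0.24
```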

The importance for each feature on a decision tree is then calculated as:

Equation 3.3:   fi_i = ( Σ over nodes j that split on feature i of ni_j ) / ( Σ over all nodes k of ni_k )

 fi_i = the importance of feature i
 ni_j = the importance of node j

These can then be normalized to a value between 0 and 1 by dividing by the sum of all feature
importance values:

Equation 3.4:   normfi_i = fi_i / ( Σ over all features j of fi_j )

The final feature importance, at the Random Forest level, is its average over all the trees: the sum of the feature's normalized importance values over all trees is divided by the total number of trees:

Equation 3.5:   RFfi_i = ( Σ over all trees j of normfi_ij ) / T

 RFfi_i = the importance of feature i calculated from all trees in the Random Forest model
 normfi_ij = the normalized importance of feature i in tree j
 T = total number of trees
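Equations 3.3–3.5 can be sketched end to end: node importances are attributed to the feature each node splits on, normalized within each tree, and averaged across trees. The feature names and node importances below are invented toy values:

```python
def tree_feature_importances(node_importances):
    """Equations 3.3 and 3.4: sum node importances per splitting feature,
    then normalize so the tree's importances sum to 1."""
    totals = {}
    for feature, ni in node_importances:
        totals[feature] = totals.get(feature, 0.0) + ni
    whole = sum(totals.values())
    return {f: v / whole for f, v in totals.items()}

def forest_feature_importances(trees):
    """Equation 3.5: average the normalized importances over all T trees."""
    T = len(trees)
    features = {f for tree in trees for f in tree}
    return {f: sum(tree.get(f, 0.0) for tree in trees) / T for f in features}

# Two toy trees; nodes given as (splitting feature, node importance)
tree1 = tree_feature_importances([("price", 0.6), ("delay", 0.2), ("price", 0.2)])
tree2 = tree_feature_importances([("price", 0.5), ("delay", 0.5)])
print(forest_feature_importances([tree1, tree2]))  # price 0.65, delay 0.35
```

In scikit-learn, the same quantity is exposed as the fitted model's `feature_importances_` attribute.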

1.3 KNN Classifier


KNN stands for k-Nearest Neighbours and is a supervised machine learning classifier. It is a simple algorithm which stores all available cases and classifies a new data point based on how its neighbours are classified. The 'k' in KNN represents the number of nearest neighbours to include in order to obtain the desired outcome, and the process of choosing the right value of k is called parameter tuning (Subramanian 2019).
Similarity here is defined by a distance metric between two data points, using the Euclidean distance method:

Equation 3.6:   d(x, y) = sqrt( Σ from i=1 to n of (x_i − y_i)² )
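A minimal KNN sketch built on Equation 3.6, with majority voting over the k nearest points (the training points and labels below are invented toy values):

```python
import math
from collections import Counter

def euclidean(x, y):
    """Equation 3.6: straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbours.
    `train` is a list of (point, label) pairs."""
    neighbours = sorted(train, key=lambda pl: euclidean(pl[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

train = [((1, 1), "A"), ((1, 2), "A"), ((8, 8), "B"), ((9, 8), "B"), ((2, 1), "A")]
print(knn_predict(train, (1.5, 1.5), k=3))  # "A"
```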

1.4 ANN Classifier


The Artificial Neural Network classifier imitates the processing of human brain neurons as a basis for algorithms used to model complex patterns. It consists of multiple interconnected nodes (neurons) divided into three layers, i.e. an input layer, a hidden layer and an output layer. ANNs are mostly used to solve multi-class classification problems, as they capture interdependencies between the output classes (Jahnavi 2017).

Figure 3.2 – Single Layer Neural Network (Perceptron)

 x0, x1, x2, x3, ..., xn – input nodes (independent variables)
 w0, w1, w2, w3, ..., wn – weights representing the strengths of the individual nodes
 b – bias value to shift the activation function up or down
 f – activation function
 o – output
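The single-layer perceptron in the figure can be sketched directly from those symbols, here with a step activation for f. The weights, bias and inputs are invented toy values (they happen to implement a logical AND):

```python
def perceptron(x, w, b, f):
    """o = f(w · x + b): weighted sum of inputs plus bias,
    passed through the activation function f."""
    return f(sum(wi * xi for wi, xi in zip(w, x)) + b)

def step(z):
    """Threshold activation: fire when the weighted sum is non-negative."""
    return 1 if z >= 0 else 0

w, b = [1.0, 1.0], -1.5
print(perceptron([1, 1], w, b, step))  # 1: fires only when both inputs are 1
print(perceptron([1, 0], w, b, step))  # 0
```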

1.5 Evaluation
The results generated by the various data mining classification models are evaluated using the metrics below.
1.5.1 Confusion Matrix
It represents overall model performance in a tabular format, where the class predictions are summarised as follows:
 True Positives (TP): correctly predicted class 1 as 1
 True Negatives (TN): correctly predicted class 2 as 2
 False Positives (FP): incorrectly predicted class 2 as 1
 False Negatives (FN): incorrectly predicted class 1 as 2
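The four cells above can be counted directly from the true and predicted labels, here treating class 1 as the positive class (the label lists are invented toy values):

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary task, with `positive`
    marking which class counts as the positive one."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

y_true = [1, 1, 2, 2, 1, 2]
y_pred = [1, 2, 2, 1, 1, 2]
print(confusion_counts(y_true, y_pred))  # (2, 2, 1, 1)
```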

1.5.2 Precision
It is the ratio of true positives to the total number of predicted positive classes and is formulated as follows:
 Precision = TP/(TP + FP)

1.5.3 F1 Score

The harmonic mean of recall and precision is termed the F1 Score. It indicates the classifier's overall performance, since it takes into account both false positives and false negatives, as follows:
 F1 Score = 2*(Recall * Precision) / (Recall + Precision)

1.5.4 Area under the Curve (AUC)


It is the probability that the model ranks a positive observation higher than a negative one, and its value ranges from 0 to 1.
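AUC under this rank interpretation can be sketched by comparing every positive score against every negative score, counting ties as half a win (the scores below are invented toy values):

```python
def auc(pos_scores, neg_scores):
    """Probability that a randomly chosen positive is scored above a
    randomly chosen negative; ties contribute 0.5."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

print(auc([0.9, 0.4], [0.8, 0.2]))  # 3 of 4 pairs ranked correctly -> 0.75
```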

1.5.5 Classification Accuracy (CA)


It is the fraction of correctly predicted observations, calculated as follows:
 CA = (TP+TN)/(TP + TN + FP + FN)
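Precision, F1 and CA follow directly from the four confusion-matrix counts; the counts below are invented toy values:

```python
def precision(tp, fp):
    """TP / (TP + FP): fraction of predicted positives that are correct."""
    return tp / (tp + fp)

def recall(tp, fn):
    """TP / (TP + FN): fraction of actual positives that are found."""
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

def accuracy(tp, tn, fp, fn):
    """(TP + TN) / all observations."""
    return (tp + tn) / (tp + tn + fp + fn)

tp, tn, fp, fn = 40, 45, 10, 5
print(precision(tp, fp))         # 40/50 = 0.8
print(accuracy(tp, tn, fp, fn))  # 85/100 = 0.85
```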

1.6 Environment Setup

The technical specifications of the expected system configuration for the proposed architecture are shown below:

Memory     Processor        Speed
8 GB RAM   Intel i5 8250U   1.80 GHz
Table 3.1 - Hardware Specification

Storage     Software and Libraries
Local HDD   Anaconda (Spyder IDE); libraries: numpy, pandas, scikit-learn, matplotlib, dlib, keras, tensorflow
Table 3.2 - Software Specification
