

CS-703(B) Open Elective, Data Mining and Warehousing


-------------------------------------------------------------------------------------------------
UNIT-IV: Supervised Learning: Classification: Statistical-based algorithms, Distance-based
algorithms, Decision tree-based algorithms, neural network-based algorithms, Rule-based
algorithms, Probabilistic Classifiers
--------------------------------------------------------------------
Supervised Learning
Supervised learning, also known as supervised machine learning, is a subcategory of machine
learning and artificial intelligence. It is defined by its use of labeled datasets to train algorithms
to classify data or predict outcomes accurately. In supervised learning, algorithms learn from
labeled data: having understood the training data, the algorithm decides which label to give
new data by associating the patterns it learned with the unlabeled new data.
Supervised learning can be divided into two categories:
I. Classification: The model assigns its inputs to discrete classes.
II. Regression: The model predicts outputs that are continuous, real-valued variables.
Basically, supervised learning is learning in which we train the machine using well-labeled data,
meaning the training examples are already tagged with the correct answer. The machine is then
provided with a new set of examples (data) so that the supervised learning algorithm, having
analyzed the training data (the set of training examples), produces a correct outcome for the
new data. A minimal sketch of this workflow follows.
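
For illustration, here is a minimal sketch of that train-then-predict workflow in Python, assuming
the scikit-learn library is available; the tiny credit-risk data set and the choice of a
k-nearest-neighbors model are invented for this example.

# A minimal sketch of the supervised-learning workflow, assuming
# scikit-learn is installed; the tiny credit-risk data set is invented.
from sklearn.neighbors import KNeighborsClassifier

# Labeled training data: each row is [income, age]; each label is a credit risk.
X_train = [[25, 22], [48, 35], [95, 41], [30, 58], [88, 33]]
y_train = ["high", "medium", "low", "high", "low"]

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)       # learn patterns from the labeled data

# Predict the label of a previously unseen example.
print(model.predict([[50, 30]]))  # -> ['high'] for this toy data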

Figure 1: Supervised Learning

Classification
Classification is a technique for determining which class the dependent variable belongs to based
on one or more independent variables. Classification is a data mining function that assigns items
in a collection to target categories or classes. The goal of classification is to accurately predict
the target class for each case in the data.
For example, a classification model could be used to identify loan applicants as low, medium, or
high credit risks.
Classification algorithms have various applications, such as:
1. Medical diagnosis
2. Image and pattern recognition
3. Fault detection
4. Financial market position, etc.
There are two forms of data analysis that can be used to extract models describing important
classes or to predict future data trends. These two forms are as follows −
i) Classification
ii) Prediction
Classification models predict categorical class labels, while prediction models predict continuous-
valued functions. For example, we can build a classification model to categorize bank loan
applications as either safe or risky, or a prediction model to predict the expenditures in dollars of
potential customers on computer equipment, given their income and occupation.
There are three main approaches to a classification problem:
1. The first approach divides the space defined by the data points into regions, with each region
corresponding to a given class.
2. The second approach is to find the probability of an example belonging to each class.
3. The third approach is to find the probability that a class contains a given example.

Statistical-based algorithms
Statistical Distribution-Based Outlier Detection: The statistical distribution-based approach to
outlier detection assumes a distribution or probability model for the given data set (e.g., a normal
or Poisson distribution) and then identifies outliers with respect to the model using a discordancy
test. Application of the test requires knowledge of the data set parameters (such as the assumed
data distribution), knowledge of distribution parameters (such as the mean and variance), and
the expected number of outliers.


A statistical discordancy test examines two hypotheses:

➢ A working hypothesis
➢ An alternative hypothesis

A working hypothesis, H, is a statement that the entire data set of n objects comes from an initial
distribution model, F, that is,

H : oi ∈ F, where i = 1, 2, …, n

The hypothesis is retained if there is no statistically significant evidence supporting its rejection.
A discordancy test verifies whether an object, oi, is significantly large (or small) in relation to the
distribution F. Different test statistics have been proposed for use as a discordancy test, depending
on the available knowledge of the data.
Assuming that some statistic, T, has been chosen for discordancy testing, and that the value of the
statistic for object oi is vi, the distribution of T is constructed. The significance probability,
SP(vi) = Prob(T > vi), is then evaluated. If SP(vi) is sufficiently small, then oi is discordant and
the working hypothesis is rejected. A rough sketch of this test follows.
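
As a rough illustration, the following sketch runs such a discordancy test under the assumption
that F is a normal distribution, taking the statistic T to be the absolute z-score and using SciPy
for the tail probability; the sample values are invented, and in practice the parameters of F would
be estimated more carefully.

# A rough sketch of a discordancy test, assuming F is a normal distribution
# and taking the statistic T to be the absolute z-score of an object.
from statistics import mean, stdev
from scipy.stats import norm

data = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2, 19.5]   # invented sample
mu, sigma = mean(data), stdev(data)

def significance_probability(v):
    # SP(v) = Prob(T > v): two-sided tail probability of the z-score.
    return 2 * norm.sf(abs(v - mu) / sigma)

alpha = 0.05          # if SP falls below this, o_i is declared discordant
for x in data:
    if significance_probability(x) < alpha:
        print(f"{x} is discordant; reject the working hypothesis for it")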
An alternative hypothesis, H̄, which states that oi comes from another distribution model, G, is
then adopted. The result depends very much on which model F is chosen, because oi may be an
outlier under one model and a perfectly valid value under another. The alternative distribution is
very important in determining the power of the test, that is, the probability that the working
hypothesis is rejected when oi really is an outlier. There are different kinds of alternative
distributions.
Inherent alternative distribution:
In this case, the working hypothesis that all of the objects come from distribution F is rejected in
favor of the alternative hypothesis that all of the objects arise from another distribution, G:

H̄ : oi ∈ G, where i = 1, 2, …, n

F and G may be different distributions, or they may differ only in the parameters of the same
distribution. There are constraints on the form of the G distribution in that it must have the
potential to produce outliers. For example, it may have a different mean or dispersion.
Mixture alternative distribution:
The mixture alternative states that discordant values are not outliers in the F population, but
contaminants from some other population, G. In this case, the alternative hypothesis is:

H̄ : oi ∈ (1 − λ)F + λG, where i = 1, 2, …, n


Slippage alternative distribution:


This alternative states that all of the objects (apart from some prescribed small number) arise
independently from the initial model, F, with its given parameters, whereas the remaining objects
are independent observations from a modified version of F in which the parameters have been
shifted.
There are two basic types of procedures for detecting outliers:
Block procedures: In this case, either all of the suspect objects are treated as outliers or all of
them are accepted as consistent.
Consecutive procedures: Its main idea is that the object that is least likely to be an outlier is tested
first. If it is found to be an outlier, then more extreme values are also considered outliers; otherwise,
the next most extreme object is tested, and so on. This procedure tends to be more effective than
block procedures.

Distance-Based Outlier Detection:


The notion of distance-based outliers was introduced to counter the main limitations imposed by
statistical methods. An object, o, in a data set, D, is a distance-based (DB) outlier with parameters
pct and dmin, that is, a DB(pct, dmin)-outlier, if at least a fraction, pct, of the objects in D lie at a
distance greater than dmin from o. In other words, rather than relying on statistical tests, we can
think of distance-based outliers as those objects that do not have enough neighbors, where
neighbors are defined based on distance from the given object. In comparison with statistical-based
methods, distance-based outlier detection generalizes the ideas behind discordancy testing for
various standard distributions. It also avoids the excessive computation associated with fitting
the observed distribution to some standard distribution and with selecting discordancy tests.
For many discordancy tests, it can be shown that if an object, o, is an outlier according to the given
test, then o is also a DB(pct, dmin)-outlier for some suitably defined pct and dmin. For example,
if objects that lie three or more standard deviations (σ) from the mean are considered outliers,
assuming a normal distribution, then this definition can be generalized by a DB(0.9988, 0.13σ)-
outlier. Several efficient algorithms for mining distance-based outliers have been developed.
Index-based algorithm:


Given a data set, the index-based algorithm uses multidimensional indexing structures, such as R-
trees or k-d trees, to search for neighbors of each object o within radius dmin around that object.
Let M be the maximum number of objects within the dmin-neighborhood of an outlier. Once
M + 1 neighbors of object o are found, it is clear that o is not an outlier. This algorithm has a
worst-case complexity of O(k · n²), where n is the number of objects in the data set and k is the
dimensionality. The index-based algorithm scales well as k increases. However, this complexity
evaluation takes only the search time into account, even though building the index can itself be
computationally intensive.
Nested-loop algorithm:
The nested-loop algorithm has the same computational complexity as the index-based algorithm
but avoids index structure construction and tries to minimize the number of I/Os. It divides the
memory buffer space into two halves and the data set into several logical blocks. By carefully
choosing the order in which blocks are loaded into each half, I/O efficiency can be achieved.
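
For concreteness, here is a brute-force Python sketch of the DB(pct, dmin) test itself, checking
every pair of objects; the points and parameters are invented.

# A brute-force sketch of DB(pct, dmin)-outlier detection. An object o is an
# outlier if at least a fraction pct of the objects lie farther than dmin from o.
from math import dist  # Euclidean distance, Python 3.8+

def db_outliers(data, pct, dmin):
    outliers = []
    for o in data:
        # Count the objects that are NOT neighbors of o (distance > dmin).
        far = sum(1 for p in data if p is not o and dist(o, p) > dmin)
        if far / (len(data) - 1) >= pct:
            outliers.append(o)
    return outliers

points = [(1, 1), (1.2, 0.9), (0.8, 1.1), (1.1, 1.0), (9, 9)]
print(db_outliers(points, pct=0.95, dmin=2.0))   # -> [(9, 9)]

The index-based and nested-loop algorithms improve on this quadratic scan by stopping the
neighbor count for o as soon as M + 1 objects are found within dmin of o, where M is roughly
n(1 − pct).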

Decision tree-based algorithms


A decision tree is a structure that includes a root node, branches, and leaf nodes. Each internal
node denotes a test on an attribute, each branch denotes the outcome of a test, and each leaf node
holds a class label. The topmost node in the tree is the root node.
The following decision tree is for the concept buys_computer; it indicates whether a customer at
a company is likely to buy a computer or not. Each internal node represents a test on an attribute,
and each leaf node represents a class, as shown in figure 2.

Figure 2: Decision Tree
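
As an illustration only, a tree for the buys_computer concept can be learned from a small encoded
data set, assuming scikit-learn; the integer encoding and the seven tuples below are invented
rather than taken from the figure.

# A sketch of learning a buys_computer decision tree, assuming scikit-learn;
# the tiny encoded data set is invented.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: age (0=youth, 1=middle_aged, 2=senior), student (0/1),
# credit_rating (0=fair, 1=excellent).
X = [[0, 0, 0], [0, 1, 0], [1, 0, 0], [2, 0, 0], [2, 1, 1], [1, 1, 1], [0, 0, 1]]
y = ["no", "yes", "yes", "yes", "no", "yes", "no"]

tree = DecisionTreeClassifier(criterion="entropy")  # information-gain style splits
tree.fit(X, y)
print(export_text(tree, feature_names=["age", "student", "credit_rating"]))
print(tree.predict([[0, 1, 0]]))  # -> ['yes']: a young student buys a computer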


The benefits of having a decision tree are as follows −


i. It does not require any domain knowledge.
ii. It is easy to comprehend.
iii. The learning and classification steps of a decision tree are simple and fast.

The Tree Induction Algorithm: (Decision Tree Induction Algorithm)


A machine learning researcher named J. Ross Quinlan developed a decision tree algorithm known
as ID3 (Iterative Dichotomiser) in 1980. Later, he presented C4.5, the successor of ID3. ID3
and C4.5 adopt a greedy approach. In these algorithms there is no backtracking; the trees are
constructed in a top-down, recursive, divide-and-conquer manner.
The algorithm needs three parameters: D (Data Partition), Attribute List, and Attribute Selection
Method. Initially, D is the entire set of training tuples and associated class labels. Attribute list is
a list of attributes describing the tuples. Attribute selection method specifies a heuristic procedure
for selecting the attribute that “best” discriminates the given tuples according to class.
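
A common attribute selection method, and the one ID3 uses, is information gain: pick the attribute
whose split most reduces the entropy of the class labels. The sketch below computes it on a toy
data layout (the row/label representation is an invented convention).

# A sketch of the attribute selection step: ID3 picks the attribute with the
# highest information gain. The data layout here is invented.
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr_index):
    # Expected reduction in entropy after partitioning on one attribute.
    base = entropy(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in partitions.values())
    return base - remainder

rows = [["youth", "no"], ["youth", "yes"], ["senior", "no"], ["senior", "yes"]]
labels = ["no", "yes", "no", "yes"]
print(info_gain(rows, labels, 0))  # age: 0.0 (tells us nothing here)
print(info_gain(rows, labels, 1))  # student: 1.0 (perfectly separates classes)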

Neural network-based algorithms


➢ A neural network is a set of connected input/output units in which each connection has a
weight associated with it.
➢ During the learning phase, the network learns by adjusting the weights so as to be able to
predict the correct class label of the input tuples.
➢ Neural network learning is also referred to as connectionist learning due to the connections
between units.
➢ Neural networks involve long training time.
➢ Backpropagation learns by iteratively processing a data set of training tuples, comparing
the network’s prediction for each tuple with the actual known target value.
➢ The target value may be the known class label of the training tuple (for classification
problems) or a continuous value (for prediction).
➢ For each training tuple, the weights are modified so as to minimize the mean squared error
between the network’s prediction and the actual target value. These modifications are made
in the “backwards” direction, that is, from the output layer, through each hidden layer, down
to the first hidden layer; hence the name backpropagation (a minimal sketch follows Figure 3).


➢ Although it is not guaranteed, in general the weights will eventually converge, and the
learning process stops.

Figure 3: Neural Network
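
The following is a minimal numpy sketch of the training loop described above: a forward pass,
then weight updates propagated in the backwards direction to reduce squared error. The
architecture, learning rate, and XOR training data are invented for illustration.

# A minimal numpy sketch of backpropagation for a 2-input, 4-hidden-unit,
# 1-output network trained to minimize squared error on XOR.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])            # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)     # input -> hidden weights
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)     # hidden -> output weights
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for epoch in range(5000):
    # Forward pass: compute the network's prediction for each tuple.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: propagate the error from the output layer back toward
    # the first hidden layer, adjusting weights to reduce squared error.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2).ravel())   # predictions typically approach [0, 1, 1, 0]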


Advantages:
➢ They have a high tolerance of noisy data, as well as the ability to classify patterns on
which they have not been trained.
➢ They can be used when user may have little knowledge of the relationships between
attributes and classes.
➢ They are well-suited for continuous-valued inputs and outputs, unlike most decision tree
algorithms.
➢ They have been successful on a wide array of real-world data, including handwritten
character recognition, pathology and laboratory medicine, and training a computer to
pronounce English text.
➢ Neural network algorithms are inherently parallel; parallelization techniques can be used
to speed up the computation process.


Rule-based algorithms:
IF-THEN Rules
A rule-based classifier makes use of a set of IF-THEN rules for classification.
Rule Format:
IF condition THEN conclusion
Let us consider a rule R1,
R1: IF age = youth AND student = yes
THEN buy_computer = yes
Points to remember −
• The IF part of the rule is called rule antecedent or precondition.
• The THEN part of the rule is called rule consequent.

• The antecedent part (the condition) consists of one or more attribute tests, and these tests
are logically ANDed.
• The consequent part consists of a class prediction.
Note − We can also write rule R1 as follows −
R1: (age = youth) ∧ (student = yes) → (buys_computer = yes)
If the condition holds true for a given tuple, then the antecedent is satisfied.
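
For illustration, R1 and its coverage test can be written in Python as follows; the dictionary
encoding of rules and tuples is an invented convention, not a standard one.

# A sketch of representing rule R1 and testing whether a tuple satisfies
# its antecedent.
R1 = {"antecedent": {"age": "youth", "student": "yes"},
      "consequent": ("buys_computer", "yes")}

def satisfies(rule, tuple_):
    # The antecedent is satisfied if every ANDed attribute test holds.
    return all(tuple_.get(attr) == val
               for attr, val in rule["antecedent"].items())

t = {"age": "youth", "student": "yes", "credit_rating": "fair"}
if satisfies(R1, t):
    attr, val = R1["consequent"]
    print(f"predict {attr} = {val}")   # -> predict buys_computer = yes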
Rule Extraction:
A rule-based classifier can also be built by extracting IF-THEN rules from a decision tree.
Points to remember −
To extract a rule from a decision tree −
• One rule is created for each path from the root to the leaf node.
• To form a rule antecedent, each splitting criterion is logically ANDed.
• The leaf node holds the class prediction, forming the rule consequent.
Rule Induction Using Sequential Covering Algorithm:
A sequential covering algorithm can be used to extract IF-THEN rules from the training data; we
do not need to generate a decision tree first. In this algorithm, each rule for a given class covers
many of the tuples of that class.
Some of the sequential covering algorithms are AQ, CN2, and RIPPER. As per the general
strategy, the rules are learned one at a time. Each time a rule is learned, the tuples covered by the
rule are removed, and the process continues for the remaining tuples. (By contrast, when rules are
extracted from a decision tree, the path to each leaf corresponds to a rule.)
The following is the sequential covering algorithm, where rules are learned for one class at a time.
When learning a rule for a class Ci, we want the rule to cover all the tuples of class Ci and no
tuples from any other class.
Algorithm: Sequential Covering
Input:
    D, a data set of class-labeled tuples;
    Att_vals, the set of all attributes and their possible values.
Output: A set of IF-THEN rules.
Method:
    Rule_set = { };   // initial set of rules learned is empty
    for each class c do
        repeat
            Rule = Learn_One_Rule(D, Att_vals, c);
            remove tuples covered by Rule from D;
        until termination condition;
        Rule_set = Rule_set + Rule;   // add the new rule to the rule set
    end for
    return Rule_set;
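
A Python sketch of this method is given below, with Learn_One_Rule reduced to a deliberately
crude placeholder (it proposes the single attribute test covering the most tuples of the class), and
with each learned rule added to the rule set as it is found; att_vals is assumed not to include the
class attribute. This is an illustrative skeleton, not a full rule learner.

# A skeleton of sequential covering: learn rules one at a time for each class,
# removing the tuples a rule covers before learning the next rule.
def learn_one_rule(D, att_vals, c):
    # Placeholder: pick the single attribute test (attr, value) that covers
    # the most tuples of class c. A real learner grows the antecedent greedily.
    best = max(((a, v) for a, vals in att_vals.items() for v in vals),
               key=lambda av: sum(1 for t in D
                                  if t[av[0]] == av[1] and t["class"] == c))
    return {best[0]: best[1]}

def sequential_covering(D, att_vals, classes):
    rule_set = []
    for c in classes:
        D_c = list(D)
        while any(t["class"] == c for t in D_c):      # termination condition
            rule = learn_one_rule(D_c, att_vals, c)
            # Remove the tuples covered by the rule from D.
            D_c = [t for t in D_c
                   if not all(t[a] == v for a, v in rule.items())]
            rule_set.append((rule, c))                # add rule to rule set
    return rule_set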
Rule Pruning
A rule is pruned for the following reasons −
• The assessment of quality is made on the original set of training data.
• The rule may perform well on training data but less well on subsequent data.
FOIL is one of the simple and effective methods for rule pruning. For a given rule R,
FOIL_Prune(R) = (pos − neg) / (pos + neg)
where pos and neg are the numbers of positive and negative tuples covered by R, respectively.
This value increases with the accuracy of R on a pruning set; if the FOIL_Prune value is higher
for the pruned version of R, then R is pruned.
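
A small sketch of this pruning decision, with invented pos/neg counts:

# A sketch of the FOIL pruning decision for a rule R, given pos/neg counts
# on a separate pruning set (the counts here are invented).
def foil_prune(pos, neg):
    return (pos - neg) / (pos + neg)

full_rule = foil_prune(pos=40, neg=10)    # quality of R as learned: 0.6
pruned_rule = foil_prune(pos=38, neg=4)   # quality after dropping a conjunct
if pruned_rule > full_rule:
    print("prune the rule")               # ~0.81 > 0.6, so prune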

Probabilistic Classifiers:
A Bayes classifier is a probabilistic model that is used for supervised learning. A Bayes classifier
is based on the idea that the role of a class is to predict the values of features for members of that
class. Examples are grouped in classes because they have common values for some of the features.
Such classes are often called natural kinds. The learning agent learns how the features depend on
the class and uses that model to predict the classification of a new example.
The simplest case is the naive Bayes classifier, which makes the independence assumption
that the input features are conditionally independent of each other given the classification. The
independence of the naive Bayes classifier is embodied in a belief network where the features are
the nodes, the target feature (the classification) has no parents, and the target feature is the only
parent of each input feature. This belief network requires the probability distributions P(Y) for
the target feature, or class, Y, and P(Xi | Y) for each input feature Xi. To classify a new example,
the prediction is computed by conditioning on the observed values of the input features and
querying the classification. Multiple target variables can be modeled and learned separately.


Figure 4: Belief network corresponding to a naive Bayes classifier

Learning a Bayes Classifier


To learn a classifier, the distributions P(Y) and P(Xi | Y) for each input feature can be learned
from the data. Each conditional probability distribution P(Xi | Y) may be treated as a separate
learning problem for each value of Y.
The simplest case is to use the maximum likelihood estimate (the empirical proportion in the
training data) as the probability, where P(Xi = xi | Y = y) is the number of cases with
Xi = xi ∧ Y = y divided by the number of cases with Y = y.
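
Putting the pieces together, here is a sketch of naive Bayes learning by counting (maximum
likelihood) and prediction by conditioning on the observed features; the four-tuple data set is
invented.

# A sketch of naive Bayes: learn P(Y) and P(Xi | Y) by counting, then predict
# by conditioning on the observed input features.
from collections import Counter, defaultdict

data = [({"age": "youth", "student": "yes"}, "yes"),
        ({"age": "youth", "student": "no"},  "no"),
        ({"age": "senior", "student": "no"}, "no"),
        ({"age": "senior", "student": "yes"}, "yes")]

class_counts = Counter(y for _, y in data)       # counts of Y = y
feat_counts = defaultdict(Counter)               # counts of Xi = xi and Y = y
for x, y in data:
    for attr, val in x.items():
        feat_counts[(attr, y)][val] += 1

def predict(x):
    n = len(data)
    scores = {}
    for y, cy in class_counts.items():
        p = cy / n                                # P(Y = y)
        for attr, val in x.items():
            p *= feat_counts[(attr, y)][val] / cy # P(Xi = xi | Y = y)
        scores[y] = p
    return max(scores, key=scores.get)

print(predict({"age": "youth", "student": "yes"}))  # -> 'yes'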

-----------------------------***-----------------------------
