
UNIT III

Classification:
 There are two forms of data analysis that can be used to extract models describing important classes or to predict future data trends. These two forms are as follows −
1. Classification
2. Prediction
 Classification models predict categorical class labels, while prediction models predict continuous-valued functions. For example, we can build a classification model to categorize bank loan applications as either safe or risky,
 or a prediction model to predict the expenditures in dollars of potential customers on computer equipment, given their income and occupation.
Classification
 Classification is a form of data analysis that extracts models describing important data classes.
 Such models, called classifiers, predict categorical (discrete, unordered) class labels.
 For example, we can build a classification model to categorize bank loan applications as either
safe or risky. Such analysis can help provide us with a better understanding of the data at large.
 Classification:
 predicts categorical class labels
 classifies data based on the training set and the values of a class-label attribute, and uses the resulting model to classify new data
General Approach to Classification
 Data classification is a two-step process, consisting of a
1. learning step (where a classification model is constructed) and
2. a classification step (where the model is used to predict class labels for given data).
General Approach for Building Classification Model
The figure below illustrates the two steps: a learning algorithm is applied to the training set to induce a model (induction), and the model is then applied to the test set to deduce class labels (deduction).

Training Set:

Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Training Set → Learning algorithm (Induction) → Model

Test Set:

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

Test Set → Apply Model (Deduction) → predicted class labels

Introduction to Data Mining, 2nd Edition
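The induction/deduction loop above can be sketched in Python. OneR, a simple single-attribute rule learner, stands in for the learning algorithm here — the figure does not name a specific algorithm, so this is only an illustrative choice, and Attrib3 is left out to keep all values categorical:

```python
from collections import Counter, defaultdict

def one_r(rows, attrs, label="Class"):
    """OneR: pick the single attribute whose per-value majority rule
    makes the fewest mistakes on the training set."""
    best = None
    for a in attrs:
        groups = defaultdict(Counter)
        for r in rows:
            groups[r[a]][r[label]] += 1
        # majority class for each attribute value
        rule = {v: c.most_common(1)[0][0] for v, c in groups.items()}
        errors = sum(1 for r in rows if rule[r[a]] != r[label])
        if best is None or errors < best[2]:
            best = (a, rule, errors)
    return best[0], best[1]

# Training set from the figure (Attrib3 omitted for simplicity)
train = [
    {"Attrib1": "Yes", "Attrib2": "Large",  "Class": "No"},
    {"Attrib1": "No",  "Attrib2": "Medium", "Class": "No"},
    {"Attrib1": "No",  "Attrib2": "Small",  "Class": "No"},
    {"Attrib1": "Yes", "Attrib2": "Medium", "Class": "No"},
    {"Attrib1": "No",  "Attrib2": "Large",  "Class": "Yes"},
    {"Attrib1": "No",  "Attrib2": "Medium", "Class": "No"},
    {"Attrib1": "Yes", "Attrib2": "Large",  "Class": "No"},
    {"Attrib1": "No",  "Attrib2": "Small",  "Class": "Yes"},
    {"Attrib1": "No",  "Attrib2": "Medium", "Class": "No"},
    {"Attrib1": "No",  "Attrib2": "Small",  "Class": "Yes"},
]

# Induction: learn the model
attr, rule = one_r(train, ["Attrib1", "Attrib2"])
print("induced model:", attr, rule)

# Deduction: apply the model to unlabeled test tuples
test = [{"Attrib1": "No", "Attrib2": "Small"},
        {"Attrib1": "Yes", "Attrib2": "Medium"}]
print([rule[r[attr]] for r in test])
```

On this data OneR selects Attrib2, predicting Yes for Small and No otherwise.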
How Does Classification Work?
 The Data Classification process includes two steps −
1. Building the Classifier or Model
2. Using Classifier for Classification

 Building the Classifier or Model


 This step is the learning step or the learning phase.
 In this step the classification algorithms build the classifier.
 The classifier is built from the training set made up of database tuples and their associated class
labels.
 Each tuple in the training set belongs to a predefined class, as determined by the class-label attribute. Tuples can also be referred to as samples, objects, or data points.
 Using Classifier for Classification
 In this step, the classifier is used for classification. Here the test data is used to estimate the
accuracy of classification rules. The classification rules can be applied to the new data tuples if the
accuracy is considered acceptable.
An example: model construction

Training Data:

NAME   RANK            YEARS  TENURED
Mary   Assistant Prof  3      no
James  Assistant Prof  7      yes
Bill   Professor       2      yes
John   Associate Prof  7      yes
Mark   Assistant Prof  6      no
Annie  Associate Prof  3      no

The classification algorithm learns a classifier (model) from the training data, here the rule:

IF rank = ‘professor’ OR years > 6
THEN tenured = ‘yes’

An example: model usage

Testing Data:

NAME  RANK            YEARS  TENURED
Tom   Assistant Prof  2      no
Lisa  Associate Prof  7      no
Jack  Professor       5      yes
Ann   Assistant Prof  7      yes

The testing data is used to estimate the classifier's accuracy. If the accuracy is acceptable, the classifier is applied to unseen data, e.g. (Jeff, Professor, 4): Tenured? → Yes.

Data mining: Classification
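The learned rule from this example can be written directly as a Python function and applied both to the unseen tuple (Jeff, Professor, 4) and to the testing data to estimate accuracy:

```python
def tenured(rank, years):
    # Learned rule from the training data:
    # IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
    return "yes" if rank == "Professor" or years > 6 else "no"

# Model usage on the unseen tuple (Jeff, Professor, 4)
print(tenured("Professor", 4))  # -> yes

# Accuracy estimate on the testing data
testing = [("Tom",  "Assistant Prof", 2, "no"),
           ("Lisa", "Associate Prof", 7, "no"),
           ("Jack", "Professor",      5, "yes"),
           ("Ann",  "Assistant Prof", 7, "yes")]
correct = sum(tenured(rank, years) == actual
              for _, rank, years, actual in testing)
print(f"accuracy = {correct}/{len(testing)}")  # 3/4: Lisa is misclassified
```

The misclassified tuple (Lisa) illustrates why the testing step matters: the rule is not perfect, and the accuracy estimate tells us whether it is acceptable before applying it to new data.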
Classification and Prediction Issues
 The major issue is preparing the data for classification and prediction. Preparing the data involves the following activities −
 Data Cleaning − Data cleaning involves removing noise and treating missing values. Noise is removed by applying smoothing techniques, and missing values are handled by replacing a missing value with the most commonly occurring value for that attribute.
 Relevance Analysis − The database may also contain irrelevant attributes. Correlation analysis is used to determine whether any two given attributes are related.
 Data Transformation and Reduction − The data can be transformed by any of the following methods:
 Normalization − The data is transformed using normalization. Normalization involves scaling all values of a given attribute so that they fall within a small specified range. Normalization is useful when the learning step uses neural networks or other methods involving distance measurements.
 Generalization − The data can also be transformed by generalizing it to a higher-level concept. For this purpose we can use concept hierarchies.
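Min-max normalization, one common way of scaling an attribute into a small specified range, can be sketched as follows (the income values are made up for illustration):

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Scale values linearly so they fall in [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo)
            for v in values]

# Hypothetical income attribute, rescaled to [0, 1]
incomes = [12000, 35000, 58000, 98000]
normalized = min_max_normalize(incomes)
print(normalized)
```

After normalization the smallest value maps to 0 and the largest to 1, so attributes measured on very different scales contribute comparably to distance-based learning methods.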
Classification vs. prediction
 Classification:
 predicts categorical class labels
 classifies data based on the training set and the values in a
classification attribute and uses it in classifying new data
 Prediction:
 models continuous-valued functions
 predicts unknown or missing values
Classification Algorithms
 A classification algorithm analyzes a given data set, takes each instance of it, and assigns the instance to a particular class such that the classification error is least. It is used to extract models that define important data classes within the given data set. Classification is a two-step process.
 During the first step, the model is created by applying a classification algorithm to the training data set.
 Then, in the second step, the extracted model is tested against a predefined test data set to measure the trained model's performance and accuracy.
Applications of Classification Algorithms
 Email spam classification
 Predicting bank customers' willingness to repay loans
 Identifying cancerous tumor cells
 Sentiment analysis
 Drug classification
 Facial key-point detection
Classification Techniques

 Classification consists of assigning a class label to a set of unclassified cases.

 Supervised Classification
 The set of possible classes is known in advance.

 Unsupervised Classification
 Set of possible classes is not known. After classification we can try to assign a
name to that class.
 Unsupervised classification is called clustering.
Supervised learning
 Supervised learning, as the name indicates, has the presence of a supervisor as a teacher.
 Basically, supervised learning is when we teach or train the machine using data that is well labeled, which means some data is already tagged with the correct answer.
 After that, the machine is provided with a new set of examples (data) so that the supervised learning algorithm analyses the training data (the set of training examples) and produces a correct outcome from the labeled data.



Example
 If the shape of the object is rounded, has a depression at the top, and is red in color, then it will be labeled as Apple.
 If the shape of the object is a long curving cylinder with a green-yellow color, then it will be labeled as Banana.
 Now suppose that, after training on the data, you are given a new, separate fruit (say a banana) from the basket and asked to identify it.
 Since the machine has already learned from the previous data, it now has to use that knowledge wisely.
 It will first classify the fruit by its shape and color, confirm the fruit's name as BANANA, and put it in the Banana category.
 Thus the machine learns from training data (the basket of fruits) and then applies that knowledge to test data (the new fruit).
Supervised Classification
 Supervised learning is classified into two categories of algorithms:
 Classification: A classification problem is when the output variable is a category, such as "Red" or "Blue", or "disease" and "no disease".
 Regression: A regression problem is when the output variable is a real value, such as "dollars" or "weight".
Unsupervised learning
 Unsupervised learning is the training of a machine using information that is neither classified nor labeled, allowing the algorithm to act on that information without guidance.
 Here the task of the machine is to group unsorted information according to similarities, patterns, and differences without any prior training on the data.
 Unlike supervised learning, no teacher is provided, which means no training will be given to the machine.
 Therefore the machine is restricted to finding the hidden structure in unlabeled data by itself.

Example
For instance, suppose the machine is given an image containing both dogs and cats, which it has never seen before.
 Thus the machine has no idea about the features of dogs and cats, so it can't categorize the image as 'dogs and cats'.
 But it can categorize the pictures according to their similarities, patterns, and differences,
 i.e., we can easily categorize the picture into two parts:
 the first part may contain all pictures having dogs in them, and the second part may contain all pictures having cats in them.
 Here the machine didn't learn anything beforehand, which means there is no training data or examples.
Classification Techniques

 A number of classification techniques are known, which can be broadly classified into the following categories:

1. Statistical-Based Methods
•Regression
•Bayesian Classifier

2. Distance-Based Classification
•Simple approach
•K-Nearest Neighbours

3. Decision Tree-Based Classification
•ID3, C4.5, CART

4. Classification using Machine Learning (SVM)

5. Classification using Neural Networks (ANN)

Linear regression

 Linear regression is the simplest form of regression. It attempts to model the relationship between two variables by fitting a linear equation to observed data.
 Linear regression attempts to find the mathematical relationship between the variables.
 If the outcome is a straight line, it is considered a linear model; if it is a curved line, it is a non-linear model.
Linear regression contd..

 The relationship between the dependent variable and a single independent variable is given by a straight line:
Y = α + βX
 The model 'Y' is a linear function of 'X'.
 The value of 'Y' increases or decreases in a linear manner as the value of 'X' changes.
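A minimal ordinary-least-squares fit of Y = α + βX can be sketched in pure Python; the sample data below is hypothetical:

```python
def fit_line(xs, ys):
    """Ordinary least squares for Y = alpha + beta * X."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    beta = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
    alpha = my - beta * mx
    return alpha, beta

# Hypothetical observations that are roughly linear in X
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]
alpha, beta = fit_line(xs, ys)
print(alpha, beta)  # intercept and slope of the fitted line
```

The fitted α and β minimize the sum of squared vertical distances between the observed points and the line, which is the standard criterion for linear regression.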
Statistical-Based Algorithms - Regression
 Logistic Regression is a classification algorithm, not a regression algorithm. It estimates discrete values (binary values like 0/1, yes/no, true/false) based on a given set of independent variable(s).
 The values obtained always lie between 0 and 1, since it predicts a probability.
 Logistic regression can be used in applications such as:
 Credit scoring
 Measuring the success rates of marketing campaigns
 Predicting revenue for a particular product
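The reason logistic regression's output always lies between 0 and 1 is the sigmoid function, which squashes any linear score into (0, 1); the probability is then thresholded to get a class label. The coefficients below are hypothetical — in practice they are fit by maximum likelihood:

```python
import math

def sigmoid(z):
    """Squash any real-valued score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(income, alpha=-3.0, beta=0.05):
    # alpha and beta are made-up coefficients for illustration only
    p = sigmoid(alpha + beta * income)          # P(approve | income)
    return p, ("approve" if p >= 0.5 else "reject")

p, label = predict(100)  # hypothetical income in thousands
print(round(p, 3), label)
```

A score of 0 maps to probability 0.5, large positive scores approach 1, and large negative scores approach 0 — which is exactly why the predicted values never leave the 0-to-1 range.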
Bayesian Classification.

 Bayesian classification uses Bayes' theorem to predict the probability of an event. Bayesian classifiers are statistical classifiers grounded in the Bayesian understanding of probability. The theorem expresses how a degree of belief, expressed as a probability, should change to account for evidence.
 Bayes' theorem is named after Thomas Bayes, who first used conditional probability to provide an algorithm that uses evidence to calculate limits on an unknown parameter.
Bayesian Classifier

 Principle
 If it walks like a duck, quacks like a duck, then it is probably a duck
Bayesian Classifier
 A statistical classifier

 Performs probabilistic prediction, i.e., predicts class membership probabilities

 Foundation

 Based on Bayes’ Theorem.

 Assumptions
1.The classes are mutually exclusive and exhaustive.
2.The attributes are independent given the class.

 Called “Naïve” classifier because of these assumptions.


 Empirically proven to be useful.
Example: Bayesian Classification

 Example 8.2: Air Traffic Data
 Let us consider a set of observations recorded in a database regarding the arrival of airplanes on routes from any airport to New Delhi under certain conditions.
Air-Traffic Data

Day       Season  Fog     Rain    Class
Weekday   Spring  None    None    On Time
Weekday   Winter  None    Slight  On Time
Weekday   Winter  None    None    On Time
Holiday   Winter  High    Slight  Late
Saturday  Summer  Normal  None    On Time
Weekday   Autumn  Normal  None    Very Late
Holiday   Summer  High    Slight  On Time
Sunday    Summer  Normal  None    On Time
Weekday   Winter  High    Heavy   Very Late
Weekday   Summer  None    Slight  On Time
Saturday  Spring  High    Heavy   Cancelled
Weekday   Summer  High    Slight  On Time
Weekday   Winter  Normal  None    Late
Weekday   Summer  High    None    On Time
Weekday   Winter  Normal  Heavy   Very Late
Saturday  Autumn  High    Slight  On Time
Weekday   Autumn  None    Heavy   On Time
Holiday   Spring  Normal  Slight  On Time
Weekday   Spring  Normal  None    On Time
Weekday   Spring  Normal  Heavy   On Time
Air-Traffic Data

 In this database, there are four attributes
A = [Day, Season, Fog, Rain]
with 20 tuples.
 The categories of classes are:
C = [On Time, Late, Very Late, Cancelled]
 Given this knowledge of the data and classes, we are to find the most likely classification for any unseen instance, for example:
Weekday  Winter  High  None  ???
 The classification technique should eventually map this tuple to an accurate class.
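A minimal Naive Bayes computation over the table as transcribed above can be sketched as follows. It uses plain relative frequencies with no smoothing; note that the scores are small and unnormalized, and that with this copy of the data the highest score goes to Late (the textbook's original table may differ slightly, so its worked answer may differ):

```python
from collections import Counter

# The 20 training tuples (Day, Season, Fog, Rain, Class) as transcribed
data = [
    ("Weekday","Spring","None","None","On Time"),
    ("Weekday","Winter","None","Slight","On Time"),
    ("Weekday","Winter","None","None","On Time"),
    ("Holiday","Winter","High","Slight","Late"),
    ("Saturday","Summer","Normal","None","On Time"),
    ("Weekday","Autumn","Normal","None","Very Late"),
    ("Holiday","Summer","High","Slight","On Time"),
    ("Sunday","Summer","Normal","None","On Time"),
    ("Weekday","Winter","High","Heavy","Very Late"),
    ("Weekday","Summer","None","Slight","On Time"),
    ("Saturday","Spring","High","Heavy","Cancelled"),
    ("Weekday","Summer","High","Slight","On Time"),
    ("Weekday","Winter","Normal","None","Late"),
    ("Weekday","Summer","High","None","On Time"),
    ("Weekday","Winter","Normal","Heavy","Very Late"),
    ("Saturday","Autumn","High","Slight","On Time"),
    ("Weekday","Autumn","None","Heavy","On Time"),
    ("Holiday","Spring","Normal","Slight","On Time"),
    ("Weekday","Spring","Normal","None","On Time"),
    ("Weekday","Spring","Normal","Heavy","On Time"),
]

priors = Counter(row[-1] for row in data)   # class frequencies
n = len(data)

def score(instance, cls):
    """P(cls) * product over attributes of P(attribute value | cls)."""
    s = priors[cls] / n
    for i, v in enumerate(instance):
        match = sum(1 for row in data if row[-1] == cls and row[i] == v)
        s *= match / priors[cls]
    return s

instance = ("Weekday", "Winter", "High", "None")
scores = {c: score(instance, c) for c in priors}
best = max(scores, key=scores.get)
for c, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{c:10s} {s:.4f}")
print("predicted class:", best)
```

Cancelled scores exactly zero because no Cancelled tuple has Day = Weekday, which is one motivation for smoothing techniques (e.g. Laplace correction) in practical Naive Bayes implementations.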


Bayesian Classifier

 In many applications, the relationship between the attribute set and the class variable is non-deterministic.
 In other words, a test instance cannot be assigned a class label with certainty.
 In such a situation, the classification can be achieved probabilistically.
 The Bayesian classifier is an approach for modelling probabilistic relationships between the attribute set and the class variable.
 More precisely, the Bayesian classifier uses Bayes' Theorem of probability for classification.
 Before discussing the Bayesian classifier, we should have a quick look at the theory of probability and then Bayes' Theorem.
Naive Bayes Algorithm
 Naïve Bayes algorithm is a supervised learning algorithm.
 It is a classification technique based on Bayes’ Theorem with an assumption of independence
among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a
particular feature in a class is unrelated to the presence of any other feature.
 For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches in
diameter. Even if these features depend on each other or upon the existence of the other features,
all of these properties independently contribute to the probability that this fruit is an apple and
that is why it is known as ‘Naive’.
 It is mainly used in text classification that includes a high-dimensional training dataset.
Bayes' Theorem:

•The formula for Bayes' theorem is given as:

P(A|B) = P(B|A) · P(A) / P(B)

P(A|B) is the Posterior probability: the probability of hypothesis A given the observed event B.
P(B|A) is the Likelihood: the probability of the evidence B given that hypothesis A is true.
P(A) is the Prior probability: the probability of the hypothesis before observing the evidence.
P(B) is the Marginal probability: the probability of the evidence.
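The theorem can be checked numerically. The numbers below are hypothetical (a rare condition with a fairly accurate diagnostic test), chosen only to illustrate how prior, likelihood, and marginal combine into the posterior:

```python
# Hypothetical illustration: a condition with 1% prevalence and a test
# with 90% sensitivity and a 5% false-positive rate.
p_a = 0.01             # P(A): prior probability of the hypothesis
p_b_given_a = 0.90     # P(B|A): likelihood of the evidence if A is true
p_b_given_not_a = 0.05 # P(B|not A): false-positive rate

# Marginal probability of the evidence, P(B), via total probability
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Posterior P(A|B) by Bayes' theorem
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 4))
```

Even with an accurate test, the posterior stays modest because the prior is so small — a classic illustration of how Bayes' theorem updates a degree of belief rather than replacing it with the likelihood alone.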
 Bayesian interpretation:
In the Bayesian interpretation, probability measures a "degree of belief." Bayes' theorem connects the degree of belief in a hypothesis before and after accounting for evidence. For example, let us consider a coin. If we toss a coin, we get either heads or tails, and the chance of either outcome is 50%. If the coin is flipped a number of times and the outcomes are observed, the degree of belief may rise, fall, or remain the same depending on the outcomes.
Bayesian network:
 A Bayesian Network falls under the class of Probabilistic Graphical Modelling (PGM) procedures, which are used to compute uncertainties using the concept of probability. Generally known as Belief Networks, Bayesian Networks model uncertainties using Directed Acyclic Graphs (DAGs).
 A Directed Acyclic Graph is used to represent a Bayesian Network and, like any other statistical graph, a DAG consists of a set of nodes and links, where the links signify the connections between the nodes.
 The nodes represent random variables, and the edges define the relationships between these variables.
 A DAG models the uncertainty of an event taking place based on the Conditional Probability Distribution (CPD) of each random variable. A Conditional Probability Table (CPT) is used to represent the CPD of each variable in the network.
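A toy two-node network (with made-up CPT values) shows how a DAG plus CPTs determines joint and marginal probabilities:

```python
# Hypothetical two-node network: Rain -> WetGrass.
# The root's CPT is a single probability; the child's CPT maps each
# parent value to the probability of the child being True.
p_rain = 0.2                               # CPT of Rain
p_wet_given_rain = {True: 0.9, False: 0.1} # CPT of WetGrass given Rain

def joint(rain, wet):
    """Joint probability factorizes along the DAG:
    P(Rain, WetGrass) = P(Rain) * P(WetGrass | Rain)."""
    pr = p_rain if rain else 1 - p_rain
    pw = p_wet_given_rain[rain] if wet else 1 - p_wet_given_rain[rain]
    return pr * pw

# Marginal P(WetGrass = True): sum the joint over the parent's values
p_wet = joint(True, True) + joint(False, True)
print(round(p_wet, 2))  # 0.2*0.9 + 0.8*0.1 = 0.26
```

This factorization along the edges of the DAG is exactly what makes Bayesian networks compact: each variable needs a CPT only over its parents, not over every other variable in the network.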
Distance-Based Algorithms:
Simple Approach
 Strategy:
 Each item mapped to a class is more similar to the other items in that class than to items in other classes.
 Similarity or distance measures may be used to identify the alikeness of different items in the database.
 Using a similarity measure is simpler for classification, since the classes are known in advance.
 If the class definition is in the form of an information retrieval query, the classification problem is to determine the similarity between each query and each tuple.
Simple Approach
 Here we assume that:
 Each tuple ti in the database is defined as a vector <ti1, ti2, …, tik> of numeric values.
 Each class Cj is defined by a tuple <Cj1, Cj2, …, Cjk> of numeric values.
 Definition:
 Given a database D = {t1, t2, …, tn} of tuples, where each tuple ti = <ti1, ti2, …, tik> contains numeric values, and a set of classes C = {C1, …, Cm}, where each class Cj = <Cj1, Cj2, …, Cjk> consists of numeric values, the classification problem is to assign each ti to the class Cj such that sim(ti, Cj) ≥ sim(ti, Cl) for all Cl ∈ C where Cl ≠ Cj.
 Method:
 For similarity measures, a representative vector for each class must be determined.
 Calculate the center of each region.
 Place each item in the class to whose center it is most similar.
 Algorithm:
 Each tuple is assigned to the class with the nearest center.
Input:
t // tuple to be classified
C1, …, Cm // class centers
Output:
c // class to which t is assigned
Algorithm:
dist = ∞;
for i := 1 to m do
  if dist(Ci, t) < dist then
    c = i;
    dist = dist(Ci, t);
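The algorithm above can be sketched as runnable Python. The class centers here are hypothetical, assumed to have been computed beforehand (e.g. as the mean of each class region), and Euclidean distance is used as the dissimilarity measure:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_center(t, centers):
    """Assign tuple t to the class whose center is closest."""
    best_class, best_dist = None, math.inf
    for cls, center in centers.items():
        d = euclidean(t, center)
        if d < best_dist:
            best_class, best_dist = cls, d
    return best_class

# Hypothetical class centers for two classes
centers = {"C1": (1.0, 1.0), "C2": (5.0, 5.0)}
print(nearest_center((1.5, 2.0), centers))  # closer to C1's center
```

Minimizing distance is equivalent to maximizing similarity here, so this matches the definition sim(ti, Cj) ≥ sim(ti, Cl) given above.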
K Nearest Neighbors
 Here we consider the K nearest neighbors of a particular item.
 Assumptions:
 The entire training set includes not only the data in the set but also the desired classification of each item.
 The training data itself becomes the model.
 To classify a new item, its distance to each item in the training set must be determined.
 The K closest entries in the training set are considered further.
 The new item is placed in the class that contains the most items from this set of K closest items.
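The steps above can be sketched as a short Python implementation; the training points are made up for illustration:

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """train: list of (vector, label) pairs. Returns the majority label
    among the k training items nearest to query."""
    # sort the whole training set by distance to the query, keep k closest
    neighbors = sorted(train,
                       key=lambda item: math.dist(item[0], query))[:k]
    # majority vote among the k nearest neighbors
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical labeled training data: the training set IS the model
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((4.0, 4.0), "B"), ((4.2, 4.1), "B")]
print(knn_classify(train, (1.1, 1.0), k=3))  # -> A
```

Note that no learning step precedes classification: all work happens at query time, which is why KNN is often called a lazy learner.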