Classification in Data Mining

Data mining involves analyzing large datasets to discover patterns and extract knowledge. Classification is a data mining technique that builds a model from a training dataset whose classes are known and uses it to predict class membership for new data. There are two main types of classifiers: discriminative classifiers, which learn to separate the classes directly from the observed data and so depend on data quality, and generative classifiers, which model the distribution of each class to predict unseen data. The trained model is then tested on a separate dataset to estimate its accuracy.

CLASSIFICATION IN DATA MINING
Gaurav Chauhan
BCA 5th
WHAT IS DATA MINING?
◦ In general terms, data mining means digging deep into data, which may come in different forms, to find
patterns and to gain knowledge from those patterns. In the process of data mining, large data sets are first
sorted, then patterns are identified, and relationships are established to perform data analysis and solve
problems.
WHAT IS CLASSIFICATION?
◦ Classification is a data analysis task: the process of finding a model that describes and distinguishes data
classes and concepts. It is the problem of identifying which of a set of categories (subpopulations) a new
observation belongs to, based on a training set of observations whose category membership is known.

◦ Example: Before starting any project, we need to check its feasibility. In this case, a classifier is required
to predict class labels such as ‘Safe’ and ‘Risky’ for the project before it is approved. Classification is a
two-step process:
◦ Learning Step (Training Phase): Construction of the classification model.
Different algorithms are used to build a classifier by making the model learn from the available training
set. The model must be trained well for its predictions to be accurate.
◦ Classification Step: The model is used to predict class labels. The constructed model is tested on test data
to estimate the accuracy of the classification rules.
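The two steps above can be sketched in a few lines of Python. This is only an illustration, not a method the slides prescribe: the nearest-centroid rule, the function names `learn` and `classify`, and the two project features (a cost-risk score and a team-experience score) are assumptions, and the data rows are invented for the example.

```python
from statistics import mean

def learn(training_rows):
    """Learning step: compute one centroid (mean feature vector) per class."""
    rows_by_class = {}
    for features, label in training_rows:
        rows_by_class.setdefault(label, []).append(features)
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in rows_by_class.items()
    }

def classify(model, features):
    """Classification step: return the label of the closest centroid."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))

# Hypothetical projects: (cost-risk score, team-experience score) -> label.
training_set = [
    ((0.2, 0.9), "Safe"), ((0.3, 0.8), "Safe"),
    ((0.8, 0.3), "Risky"), ((0.9, 0.2), "Risky"),
]
test_set = [((0.25, 0.7), "Safe"), ((0.7, 0.4), "Risky")]

model = learn(training_set)
correct = sum(classify(model, features) == label for features, label in test_set)
accuracy = correct / len(test_set)
print(f"Accuracy on the test set: {accuracy:.2f}")
```

The model is built only from `training_set`; `test_set` is used afterwards, and only to estimate how often the predicted labels match the known ones.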
TRAINING & TESTING
◦ Suppose a person is sitting under a fan and the fan starts to fall on him; he should move aside in order
not to get hurt. Learning to move away is his training. In testing, if the person sees any heavy object
coming towards him or falling on him and moves aside, the system has tested positively; if the person does
not move aside, the system has tested negatively.
The same is the case with data: the model should be trained on it in order to give accurate results.
◦ Data mining deals with data of different types, which tell us the format of the data (for example, whether
it is in text format or in numerical format).
Classifiers can be categorized into two major types:
◦ Discriminative: It is a basic kind of classifier that assigns one class to each row of data. It learns the
boundary between classes directly from the observed data, so it depends heavily on the quality of the data
rather than on assumptions about distributions.
Example: Logistic Regression
Acceptance of a student at a university (test score and grades need to be considered).
Suppose there are a few students with the following results:

Student 1: Test Score: 9/10, Grades: 8/10, Result: Accepted
Student 2: Test Score: 3/10, Grades: 4/10, Result: Rejected
Student 3: Test Score: 7/10, Grades: 6/10, Result: to be predicted
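As a rough sketch of how logistic regression could treat the student example, the code below fits a two-feature model by plain gradient descent on Students 1 and 2 and then estimates an acceptance probability for Student 3. The scaling of scores to the range 0 to 1, the learning rate, the number of epochs, and the function names are all assumptions made for illustration. Two training rows are far too few for a real model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(samples, labels, learning_rate=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y  # derivative of the log loss w.r.t. the linear score
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias

def predict_proba(weights, bias, x):
    """Estimated probability that the student is accepted."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# (test score, grades) scaled to 0..1; label 1 = Accepted, 0 = Rejected.
students = [(0.9, 0.8), (0.3, 0.4)]
results = [1, 0]

weights, bias = train_logistic(students, results)
p_student3 = predict_proba(weights, bias, (0.7, 0.6))
print(f"Estimated acceptance probability for Student 3: {p_student3:.2f}")
```

The model never estimates how test scores or grades are distributed within each class; it only adjusts a linear boundary until the known students fall on the correct sides, which is what makes it discriminative.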
◦ Generative: It models the distribution of each individual class and tries to learn the model that generates
the data behind the scenes, by making assumptions about those distributions and estimating them. It is used
to predict unseen data.
Example: Naive Bayes Classifier
Detecting spam emails by looking at previous data. Suppose there are 100 emails, divided in the ratio 1:3, i.e.
Class A: 25% (spam emails) and Class B: 75% (non-spam emails). Now a user wants to check whether an
email that contains the word "cheap" should be termed spam.
Suppose that in Class A (the 25 spam emails), 20 out of 25 emails contain the word "cheap" and the rest do not,
and that in Class B (the 75 non-spam emails), 70 out of 75 emails do not contain the word and the remaining 5 do.
So, if an email contains the word "cheap", what is the probability of it being spam? Of the 25 emails that
contain the word, 20 are spam, so the probability is 20/25 = 80%.
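The 80% figure can be checked with Bayes' rule. The sketch below assumes one reading of the counts in the example: 20 of the 25 spam emails and 5 of the 75 non-spam emails contain the word "cheap". The function name and parameter names are illustrative only.

```python
def posterior_spam_given_word(n_spam, n_ham, spam_with_word, ham_with_word):
    """P(spam | word) from class counts and per-class word counts (Bayes' rule)."""
    total = n_spam + n_ham
    prior_spam = n_spam / total                # P(spam)
    prior_ham = n_ham / total                  # P(not spam)
    likelihood_spam = spam_with_word / n_spam  # P(word | spam)
    likelihood_ham = ham_with_word / n_ham     # P(word | not spam)
    evidence = likelihood_spam * prior_spam + likelihood_ham * prior_ham  # P(word)
    return likelihood_spam * prior_spam / evidence

# Assumed counts: 25 spam (20 contain "cheap"), 75 non-spam (5 contain "cheap").
p_spam = posterior_spam_given_word(n_spam=25, n_ham=75, spam_with_word=20, ham_with_word=5)
print(f"P(spam | 'cheap') = {p_spam:.2f}")
```

Under these assumed counts the result is about 0.80, because 20 of the 25 emails containing the word are spam. A full Naive Bayes classifier would multiply such per-word likelihoods over many words, treating them as independent given the class.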
CLASSIFIERS OF MACHINE LEARNING
◦ Decision Trees
◦ Bayesian Classifiers
◦ Neural Networks
◦ K-Nearest Neighbor
◦ Support Vector Machines
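◦ Linear Regression (strictly a regression method that predicts numeric values; it acts as a classifier only when its output is thresholded)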
◦ Logistic Regression
ASSOCIATED TOOLS AND LANGUAGES
Used to mine/extract useful information from raw data.

◦ Main Languages used: R, SAS, Python, SQL

◦ Major Tools used: RapidMiner, Orange, KNIME, Spark, Weka
◦ Libraries and environments used: Jupyter (notebook environment), NumPy, Matplotlib, pandas, scikit-learn,
NLTK, TensorFlow, Seaborn, Basemap, etc.
REAL LIFE EXAMPLES
◦ Market Basket Analysis:
It is a modeling technique that looks for combinations of items that are frequently bought together in
transactions.
Example: Amazon and many other retailers use this technique. While a customer views a product,
suggestions are shown for items that other people have bought along with it in the past.
◦ Weather Forecasting:
Changing patterns in weather conditions need to be observed based on parameters such as
temperature, humidity, and wind direction. Accurate prediction also requires the use of previous
records.
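For the market basket example above, the core idea of counting which items are bought together can be sketched as follows. The transactions are invented for illustration, and counting pairs is only the simplest form of the technique; it is not a description of how any particular retailer implements recommendations.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase transactions (one set of items per basket).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

# Count how many baskets contain each unordered pair of items.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

def support(pair):
    """Fraction of all transactions that contain both items of the pair."""
    return pair_counts[tuple(sorted(pair))] / len(transactions)

most_common_pair, count = pair_counts.most_common(1)[0]
print(most_common_pair, count, support(most_common_pair))
```

Pairs with high support are candidates for "bought together" suggestions. Sorting each basket before forming pairs makes `("bread", "butter")` and `("butter", "bread")` count as the same pair.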
ADVANTAGES
◦ Mining-based methods are cost-effective and efficient
◦ Helps in identifying criminal suspects
◦ Helps in predicting the risk of diseases
◦ Helps banks and financial institutions identify likely defaulters before approving cards, loans, etc.
DISADVANTAGES
◦ Privacy: There is a chance that a company may give information about its customers to other vendors
or use this information for its own profit.
◦ Accuracy Problem: An appropriate model must be selected in order to get the best accuracy and
results.
APPLICATIONS
◦ Marketing and Retailing
◦ Manufacturing
◦ Telecommunication Industry
◦ Intrusion Detection
◦ Education System
◦ Fraud Detection
GIST OF DATA MINING
◦ Choose a suitable classification method, such as decision trees, Bayesian networks, or neural networks.
◦ A sample of data is needed in which all class values are known. The data is then divided into two parts, a
training set and a test set.
The training set is given to a learning algorithm, which derives a classifier. The classifier is then tested
with the test set, where all class values are hidden from it.
If the classifier classifies most cases in the test set correctly, it can be assumed that it will also work
accurately on future data; otherwise, the wrong model may have been chosen.
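The split-and-test procedure described above might look like this in Python. The 70/30 split, the fixed random seed, the one-nearest-neighbour rule, and the synthetic one-feature data set are assumptions chosen to keep the sketch short; any learning algorithm could take the place of `nearest_neighbour_label`.

```python
import random

def split_holdout(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the labelled rows and split it into training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def nearest_neighbour_label(training_rows, features):
    """Predict the label of the training row closest to the given features."""
    nearest = min(
        training_rows,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], features)),
    )
    return nearest[1]

# Synthetic sample where every class value is known: values below 10 are "low".
rows = [((float(x),), "low" if x < 10 else "high") for x in range(20)]

train_rows, test_rows = split_holdout(rows)

# The classifier sees only the test features; the true labels stay hidden
# until the predictions are scored.
hidden_features = [features for features, _ in test_rows]
predictions = [nearest_neighbour_label(train_rows, f) for f in hidden_features]
accuracy = sum(
    predicted == label for predicted, (_, label) in zip(predictions, test_rows)
) / len(test_rows)
print(f"Holdout accuracy: {accuracy:.2f}")
```

A low accuracy on the held-out rows would be a sign to revisit the choice of model before trusting it on future data.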
