Classification in Data Mining
DATA MINING
Gaurav Chauhan
BCA 5th
WHAT IS DATA MINING?
◦ In general terms, data mining means digging deep into data, which may come in different forms, to discover
patterns and to gain knowledge from those patterns. In the process of data mining, large data sets are first
sorted, then patterns are identified, and relationships are established to perform data analysis and solve
problems.
WHAT IS CLASSIFICATION?
◦ It is a data analysis task, i.e. the process of finding a model that describes and distinguishes data classes
and concepts. Classification is the problem of identifying which of a set of categories (subpopulations)
a new observation belongs to, based on a training set of observations whose
category membership is known.
◦ Example: Before starting any project, we need to check its feasibility. In this case, a classifier is required
to predict class labels such as ‘Safe’ and ‘Risky’ for adopting the project and approving it further. Classification is a
two-step process:
◦ Learning Step (Training Phase): Construction of the classification model.
Different algorithms are used to build a classifier by making the model learn from the available training
set. The model must be trained so that it predicts accurate results.
◦ Classification Step: The model is used to predict class labels. Testing the constructed model on test data
gives an estimate of the accuracy of the classification rules.
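The two steps can be sketched in a few lines of Python. This is only a minimal illustration, not a prescribed method: the project records, the two numeric features, and the nearest-centroid classifier are all assumptions chosen to keep the example short. The helper names `split`, `train`, `predict` and `accuracy` are hypothetical.

```python
import random

# Hypothetical project records: (estimated cost, estimated duration) -> label.
# The features and values are invented purely for illustration.
DATA = [
    ((1.0, 2.0), "Safe"), ((1.5, 1.8), "Safe"), ((2.0, 2.5), "Safe"),
    ((1.2, 3.0), "Safe"), ((2.2, 1.5), "Safe"), ((1.8, 2.2), "Safe"),
    ((7.0, 8.0), "Risky"), ((8.5, 7.5), "Risky"), ((7.8, 9.0), "Risky"),
    ((9.0, 8.2), "Risky"), ((6.9, 7.1), "Risky"), ((8.0, 8.8), "Risky"),
]

def split(data, test_fraction=0.25, seed=0):
    """Shuffle the labelled data and hold out a fraction as the test set."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]

def train(training_set):
    """Learning step: compute one centroid (mean point) per class."""
    sums, counts = {}, {}
    for features, label in training_set:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(model, features):
    """Classification step: pick the class whose centroid is closest."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))

def accuracy(model, test_set):
    """Fraction of held-out rows whose predicted label matches the known one."""
    correct = sum(predict(model, f) == label for f, label in test_set)
    return correct / len(test_set)

train_set, test_set = split(DATA)
model = train(train_set)
print("test accuracy:", accuracy(model, test_set))
print("new project (2.0, 2.0) ->", predict(model, (2.0, 2.0)))
```

The test rows are never shown to `train`, so the reported accuracy estimates how well the learned rules carry over to new observations.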
TRAINING & TESTING
◦ Suppose a person is sitting under a fan and the fan starts falling on him; he should move aside
to avoid getting hurt. Learning to move away is his training. During testing, if the person sees a
heavy object coming towards him or falling on him and moves aside, the system has tested positively;
if he does not move aside, the system has tested negatively.
The same holds for data: a model must be trained on it in order to give accurate results.
◦ Certain data types are associated with data mining; they tell us the format of the file
(for example, whether it is in text format or in numerical format).
Classifiers can be categorized into two major types:
◦ Discriminative: It is a very basic classifier that determines just one class for each row of data. It
builds its model from the observed data alone, so it depends heavily on the quality of the data rather than on
distributions.
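As a rough illustration of a discriminative classifier, the sketch below fits a logistic regression model by plain gradient descent and assigns one class to each row. The two-feature rows, the learning rate, and the number of epochs are all invented for this example, and the function names `fit`, `predict_proba` and `predict` are hypothetical.

```python
import math

# Invented two-feature rows with a known class (0 or 1); not taken from the source.
ROWS = [
    ((0.2, 0.3), 0), ((0.3, 0.2), 0), ((0.4, 0.4), 0), ((0.1, 0.5), 0),
    ((0.7, 0.8), 1), ((0.8, 0.6), 1), ((0.9, 0.9), 1), ((0.6, 0.9), 1),
]

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit(rows, learning_rate=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in rows:
            p = sigmoid(bias + sum(w * x for w, x in zip(weights, features)))
            error = p - label  # gradient of the log loss with respect to z
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias

def predict_proba(model, features):
    """Estimated probability that the row belongs to class 1."""
    weights, bias = model
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))

def predict(model, features):
    """Assign exactly one class per row by thresholding the probability at 0.5."""
    return 1 if predict_proba(model, features) >= 0.5 else 0

model = fit(ROWS)
for features, label in ROWS:
    print(features, "actual:", label, "predicted:", predict(model, features))
```

The model learns only a boundary between the two classes from the observed rows; it makes no attempt to describe how the rows of each class are distributed.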
Example: Logistic Regression
Acceptance of a student at a university (test scores and grades need to be considered).
Suppose there are a few students whose results are as follows: