Exp.8 Demonstration of Classification Process on Dataset employee.arff Using Naïve Bayes Algorithm
Aim: This experiment illustrates the use of the naïve Bayes classifier in Weka. The sample data
set used in this experiment is the “employee” data, available in ARFF format. This document
assumes that appropriate data preprocessing has been performed.
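To make the idea behind the classifier concrete, the following is a minimal sketch of naïve Bayes for nominal attributes, written in plain Python. It is not Weka's implementation. The attribute values, the class labels, the add-one smoothing and the function names (train_naive_bayes, predict) are all assumptions chosen for illustration; only the salary attribute of employee.arff is shown in this document, so the toy rows below are hypothetical.

```python
from collections import Counter, defaultdict
import math


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-attribute value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(attribute index, class)][value] -> number of occurrences
    value_counts = defaultdict(Counter)
    # distinct values seen per attribute, needed for add-one smoothing
    domains = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def predict(model, row):
    """Return the class with the highest (log) posterior score."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_c in class_counts.items():
        # log prior P(class)
        score = math.log(n_c / total)
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            # log likelihood P(value | class) with add-one smoothing (an assumption)
            score += math.log((count + 1) / (n_c + len(domains[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (age group, department) -> salary band.
rows = [("young", "sales"), ("young", "it"), ("old", "it"),
        ("old", "sales"), ("old", "it")]
labels = ["low", "low", "high", "low", "high"]

model = train_naive_bayes(rows, labels)
prediction = predict(model, ("old", "it"))
print(prediction)
```

The score for each class is the product of the class prior and the per-attribute likelihoods, computed in log space to avoid underflow; the attributes are treated as independent given the class, which is the "naïve" assumption.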
Step-2: Next, we select the “Classify” tab and click the “Choose” button to select the
“NaiveBayes” classifier (listed under the “bayes” group).
Step-3: Now we specify the various parameters. These can be specified by clicking in the text
box to the right of the “Choose” button. In this example, we accept the default values. The
default version does perform some pruning but does not perform error pruning.
Step-4: Under “Test options” in the main panel, we select 10-fold cross-validation as our
evaluation approach. Since we don’t have a separate evaluation data set, this is necessary to
get a reasonable idea of the accuracy of the generated model.
Step-5: We now click “Start” to generate the model. The ASCII version of the tree as well as
the evaluation statistics will appear in the right panel when the model construction is complete.
Step-6: Note that the classification accuracy of the model is about 69%. This indicates that
more work may be needed (either in preprocessing or in selecting the parameters for the
classification).
Step-7: Weka also lets us view a graphical version of the classification tree. This can be
done by right-clicking the last result set and selecting “Visualize tree” from the pop-up menu.
Step-9: In the main panel, under “Test options”, click the “Supplied test set” radio button and
then click the “Set” button. This will show a pop-up window that allows you to open the
file containing the test instances.
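The 10-fold cross-validation selected under the test options can be sketched as follows. This is a simplified illustration in plain Python, not Weka's own procedure: it omits stratification by class, the helper names (k_fold_indices, cross_validated_accuracy) are hypothetical, and a majority-class baseline stands in for the classifier so that the example stays self-contained. The 20 toy instances are invented for the sketch.

```python
import random
from collections import Counter


def k_fold_indices(n_instances, k=10, seed=1):
    """Yield (train, test) index lists for each of the k folds."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validated_accuracy(rows, labels, train_fn, predict_fn, k=10, seed=1):
    """Train on k-1 folds, test on the held-out fold, and pool the results."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(rows), k, seed):
        model = train_fn([rows[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct += sum(predict_fn(model, rows[i]) == labels[i] for i in test_idx)
    # Every instance is tested exactly once, so divide by the full count.
    return correct / len(rows)


def train_majority(rows, labels):
    """Baseline 'model': the most frequent class in the training data."""
    return Counter(labels).most_common(1)[0][0]


def predict_majority(model, row):
    """The baseline ignores the attributes and always predicts the majority class."""
    return model


# Hypothetical data: 20 instances, 14 labelled "low" and 6 labelled "high".
rows = [("instance%d" % i,) for i in range(20)]
labels = ["low"] * 14 + ["high"] * 6

accuracy = cross_validated_accuracy(rows, labels, train_majority, predict_majority)
print("accuracy: %.2f" % accuracy)
```

Because each instance is held out exactly once, the pooled accuracy uses all of the data for testing without ever testing on an instance the model was trained on, which is why this approach is useful when no separate evaluation set exists.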
Data set employee.arff:
@relation employee
@attribute salary{10k,15k,17k,20k,25k,30k,35k,32k}
@data
%
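As an illustration of how a nominal attribute declaration such as the salary line above is structured (an attribute name followed by a brace-enclosed list of allowed values), here is a small Python sketch that splits such a line into its parts. The function name parse_nominal_attribute is hypothetical, and the sketch handles only simple unquoted names and values; it is not a full ARFF parser.

```python
import re


def parse_nominal_attribute(line):
    """Split '@attribute name{v1,v2,...}' into (name, [v1, v2, ...])."""
    match = re.match(r"@attribute\s+([^\s{]+)\s*\{(.*)\}\s*$",
                     line.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError("not a nominal @attribute declaration: %r" % line)
    name = match.group(1)
    # Values are comma-separated; surrounding whitespace is not significant here.
    values = [value.strip() for value in match.group(2).split(",")]
    return name, values


name, values = parse_nominal_attribute(
    "@attribute salary{10k,15k,17k,20k,25k,30k,35k,32k}")
print(name, values)
```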
The following screenshot shows the classification rules that were generated when the naïve Bayes
algorithm was applied to the given dataset.