0% found this document useful (0 votes)
14 views10 pages

Data Mining Project Using Naive Bayes

The document is a student assignment submission for a Data Warehousing and Data Mining course. It discusses using the Naive Bayes classification algorithm to predict whether patients will have a stroke based on attributes like gender, age, and smoking status from a dataset containing over 5,000 records obtained from Kaggle. The student performs preprocessing like splitting the data into training and test sets, trains a Naive Bayes model on the training set which achieves an accuracy of 90.7697%, and tests it on the held-out test set for an accuracy of 92.4658%.

Uploaded by

Mr SHINIGAMI
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views10 pages

Data Mining Project Using Naive Bayes

The document is a student assignment submission for a Data Warehousing and Data Mining course. It discusses using the Naive Bayes classification algorithm to predict whether patients will have a stroke based on attributes like gender, age, and smoking status from a dataset containing over 5,000 records obtained from Kaggle. The student performs preprocessing like splitting the data into training and test sets, trains a Naive Bayes model on the training set which achieves an accuracy of 90.7697%, and tests it on the held-out test set for an accuracy of 92.4658%.

Uploaded by

Mr SHINIGAMI
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 10

DWDM Midterm Assignment

Course: Data Warehousing & Data Mining

Submitted by:
ID:
Section:
To
Course Teacher:

Date of Submission:

American International University-Bangladesh (AIUB)


I am selecting Naive Bayes classification because,
1. This algorithm works very fast and can easily predict the class of a test
dataset.
2. We can use it to solve multi-class prediction problems as it’s quite useful
with them.
3. Naive Bayes classifier performs better than other models with less training
data if the assumption of independence of features holds.
4. If we have categorical input variables, the Naive Bayes algorithm performs
exceptionally well in comparison to numerical variables.

I have selected this dataset from Kaggle website. This dataset is used to
predict whether a patient is likely to get stroke based on the input
parameters like gender, age, and smoking status. Each row in the data
provides relevant information about the patient.
There are,
Attribute: 9
 ID
 Gender
 Age
 Ever Married
 Work Type
 Residence
 Avg Glucose Level
 Smoking Status
 Stroke
Stroke represents class attribute.

Total instance: 5110


Processes in WEKA:
Original Dataset:
Training Dataset:

Here are the data that were selected by using WEKA,


60% of instances were taken in this step.
Data of training set,

By using Naïve Bayes my accuracy of this training set is 90.7697%,


Test Dataset:
Here are the rest half of 40% data that were not taken during training
set,
By using Naïve Bayes my accuracy of this test set is 92.4658%,
Cross Validation:
By using Naïve Bayes my accuracy of this Cross Validation is 91.7808%,
Predicted Data:
Here is the prediction from our data,

You might also like