
AMERICAN INTERNATIONAL UNIVERSITY-BANGLADESH
Faculty of Science and Technology

Project Cover Page


Assignment Title: Implementation of Naïve Bayes Algorithm
Assignment No: 01 Date of Submission: 16 July 2024
Course Title: Data Warehousing and Data Mining
Course Code: CSC4285 Section: A
Semester: Summer 2023-24 Course Teacher: Dr. Akinul Islam Jony

Declaration and Statement of Authorship:


1. I/we hold a copy of this Assignment/Case-Study, which can be produced if the original is lost/damaged.
2. This Assignment/Case-Study is my/our original work and no part of it has been copied from any other student’s work or
from any other source except where due acknowledgement is made.
3. No part of this Assignment/Case-Study has been written for me/us by any other person except where such collaboration has been authorized by the concerned teacher and is clearly acknowledged in the assignment.
4. I/we have not previously submitted, and am/are not currently submitting, this work for any other course/unit.
5. This work may be reproduced, communicated, compared and archived for the purpose of detecting plagiarism.
6. I/we give permission for a copy of my/our marked work to be retained by the Faculty for review and comparison,
including review by external examiners.
7. I/we understand that plagiarism is the presentation of the work, idea or creation of another person as though it is your own. It is a form of cheating and is a very serious academic offence that may lead to expulsion from the University. Plagiarized material can be drawn from, and presented in, written, graphic and visual form, including electronic data and oral presentations. Plagiarism occurs when the origin of the material used is not appropriately cited.
8. I/we also understand that enabling plagiarism is the act of assisting or allowing another person to plagiarize or to copy
my/our work.

* Student(s) must complete all details except the faculty use part.
** Please submit all assignments to your course teacher or the office of the concerned teacher.

Group Name/No.: -

No Name ID Program Signature


1 Shakibul Hasan 21-45263-2 BSc [CSE]
2 Srabone Raxit 21-45038-2 BSc [CSE]
3 Ashik Ahamed 21-45368-2 BSc [CSE]
4 Irtiza Ahsan Abir 21-45009-2 BSc [CSE]

Faculty use only


FACULTY COMMENTS

Marks Obtained

Total Marks

Assignment/Case-Study Cover; © AIUB-2020


Project Description:
The purpose of this project is to implement the Naïve Bayes algorithm on a dataset. The labeled
dataset will be preprocessed by locating missing values, correcting invalid or noisy values,
dropping columns that do not impact the target variable, and converting numerical variables to
categorical variables, as the Naïve Bayes algorithm works best with categorical data. The dataset
will be trained, and unseen samples will be used to predict the outcomes. The percentage of
successful predictions, i.e., the accuracy of the model, will be calculated.

Dataset Link:
https://www.kaggle.com/datasets/rabieelkharoua/consumer-electronics-sales-dataset

Dataset Description:
The “Predict Consumer Electronics Sales Dataset” provides insights into consumer electronics
sales and aims to analyze factors influencing purchase intent in the consumer electronics market.
The dataset consists of 9,000 samples. The attributes in this dataset include Product ID,
Product Category (e.g., Smartphones, Laptops), Product Brand (e.g., Apple, Samsung), Product
Price, Customer Age, Customer Gender (0 - Male, 1 - Female), Purchase Frequency, Customer
Satisfaction (1 to 5), and Purchase Intent (0 - No, 1 - Yes). The Product ID variable will be
discarded since it does not impact the target variable, Purchase Intent. Product Price, Customer
Age, and Purchase Frequency are numerical variables that will be converted to categorical
variables. The dataset provides valuable information for building a model to understand and predict
customer purchase intent.

Implemented Code:
The code was written in Python and run in Google Colab.

1. Import csv file


import pandas as pd
df = pd.read_csv('/content/Consumer_Electronics_Sales_Data.csv')
df.head()

The `pandas` library is imported to read the CSV file containing the dataset, which was stored in
the files section of Google Colab. The `head` function prints the first five rows of the
dataset.

2. Drop 'ProductID' column
df = df.drop('ProductID', axis=1)
df.head()

The `drop` function removes the ‘ProductID’ column (`axis=1` means column), since it has no
effect on the target variable.

3. Count missing values in each column


missing_values_count = df.isna().sum()
print(missing_values_count)

The `isna` function is used to locate missing values, and the `sum` function then counts them in
each column. As the output shows, none of the columns contain any missing values.
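
No imputation is needed here; still, had missing values been found, a minimal sketch (assuming the same DataFrame `df`, with median fill for numerical columns and most-frequent fill for the rest) might look like this:

# Hypothetical: fill any missing values before further preprocessing
for column in df.columns:
    if df[column].isna().any():
        if pd.api.types.is_numeric_dtype(df[column]):
            df[column] = df[column].fillna(df[column].median())   # numeric: median
        else:
            df[column] = df[column].fillna(df[column].mode()[0])  # categorical: most frequent value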

4. Categorizing the numerical variables


bins = [15, 30, 50, 70]
labels = ['Young', 'Middle-age', 'Old-age']
df['CustomerAge'] = pd.cut(df['CustomerAge'], bins=bins, labels=labels, right=False)

bins = [1, 5, 15, 20]
labels = ['Occasional', 'Regular', 'Premium']
df['PurchaseFrequency'] = pd.cut(df['PurchaseFrequency'], bins=bins, labels=labels, right=False)

bins = [1, 1000, 2000, 3000]
labels = ['Low', 'Medium', 'High']
df['ProductPrice'] = pd.cut(df['ProductPrice'], bins=bins, labels=labels, right=False)

df.head()

The `pd.cut` function is used to segment the data into the specified bins and labels. The
`right=False` parameter ensures that the bin intervals are closed on the left and open on the right,
meaning the rightmost edge of the interval is excluded from the bin.

With `right=False`, this categorizes the ‘CustomerAge’ column into Young [15, 30), Middle-age
[30, 50), and Old-age [50, 70); the ‘PurchaseFrequency’ column into Occasional [1, 5), Regular
[5, 15), and Premium [15, 20); and the ‘ProductPrice’ column into Low [1, 1000), Medium
[1000, 2000), and High [2000, 3000). Note that a value equal to the last bin edge (e.g., an age of
exactly 70) falls outside every bin and becomes NaN, so the upper edges must sit above the
maximum values in the data.
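
As a quick illustration of the `right=False` edge behavior, a minimal sketch with hypothetical ages:

ages = pd.Series([15, 29, 30, 50, 69, 70])
out = pd.cut(ages, bins=[15, 30, 50, 70],
             labels=['Young', 'Middle-age', 'Old-age'], right=False)
print(out.tolist())
# ['Young', 'Young', 'Middle-age', 'Old-age', 'Old-age', nan]
# 30 lands in [30, 50), and exactly 70 falls outside every bin.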

5. Renaming Categories
df['CustomerGender'] = df['CustomerGender'].replace({0: 'Male', 1: 'Female'})

df['CustomerSatisfaction'] = df['CustomerSatisfaction'].replace({1: 'Dissatisfied',
    2: 'Somewhat Dissatisfied', 3: 'Neutral', 4: 'Satisfied', 5: 'Very Satisfied'})

df['PurchaseIntent'] = df['PurchaseIntent'].replace({0: 'No', 1: 'Yes'})

df.head()

The `replace` function is used to rename numbered categories into more readable category names:
a) Replaced 0 with 'Male' and 1 with 'Female' in the 'CustomerGender' column.
b) Replaced 1 with 'Dissatisfied', 2 with 'Somewhat Dissatisfied', 3 with 'Neutral', 4 with
'Satisfied', and 5 with 'Very Satisfied' in the 'CustomerSatisfaction' column.
c) Replaced 0 with 'No' and 1 with 'Yes' in the 'PurchaseIntent' column.

6. Splitting dataset for ‘Train’ and ‘Test’


from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
for column in ['ProductCategory', 'ProductBrand', 'ProductPrice', 'CustomerAge',
               'CustomerGender', 'PurchaseFrequency', 'CustomerSatisfaction', 'PurchaseIntent']:
    df[column] = le.fit_transform(df[column])

X = df.drop('PurchaseIntent', axis=1)
y = df['PurchaseIntent']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

X_train.shape, X_test.shape

Necessary modules from the `scikit-learn` library are imported. `train_test_split` is used to split
the dataset into training and testing sets, while `LabelEncoder` is used to convert categorical
variables into numerical values.

For each column, the `fit_transform` method of LabelEncoder is applied. This method learns
unique values, assigns them numeric codes, and converts the categorical values in the column to
their corresponding numeric codes, making them suitable for machine learning algorithms that
require numeric input.
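
A minimal sketch (with hypothetical values) of what `fit_transform` does to one column:

le = LabelEncoder()
codes = le.fit_transform(['Low', 'High', 'Medium', 'Low'])
print(list(le.classes_))  # ['High', 'Low', 'Medium'] -- classes are sorted alphabetically
print(codes)              # [1 0 2 1]

Note that the codes follow alphabetical order, so ordinal labels such as 'Low' < 'Medium' < 'High' do not keep their natural order once encoded; the Gaussian model below treats these codes as plain numbers.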

The dataset is separated into features and the target variable. `X` contains all columns except
'PurchaseIntent', representing the input features for the model. `y` contains only the
'PurchaseIntent' column, which is the output or target variable the model will predict.

The `train_test_split` function is used to divide the data into training and testing subsets.
`X_train` and `y_train` are the feature and target subsets used for training the model, while
`X_test` and `y_test` are used for testing and evaluating it. `test_size=0.3` specifies that 30%
of the data is reserved for testing and 70% for training, and `random_state=42` makes the split
reproducible by fixing the seed for random number generation.

Finally, the `shape` attribute is used to display the dimensions of the training and testing sets;
with 9,000 samples, 7 feature columns, and a 30% test split, the expected shapes are (6300, 7)
and (2700, 7).

7. The Naïve Bayes algorithm


import numpy as np

def gaussian_naive_bayes(X_train, y_train, X_test):
    # Prior probability of each class: class count / total training samples
    classes, counts = np.unique(y_train, return_counts=True)
    priors = counts / len(y_train)

    # Per-class mean and standard deviation of each feature
    means = {}
    stds = {}
    for cls in classes:
        cls_data = X_train[y_train == cls]
        means[cls] = np.mean(cls_data, axis=0)
        stds[cls] = np.std(cls_data, axis=0)

    # Log-posterior (up to a constant) of each test sample under each class
    probs = []
    for cls in classes:
        class_prob = np.sum(-0.5 * ((X_test - means[cls]) ** 2) / (stds[cls] ** 2)
                            - 0.5 * np.log(2 * np.pi * (stds[cls] ** 2)), axis=1)
        probs.append(class_prob + np.log(priors[cls]))

    # Predict the class with the highest log-posterior
    y_pred = classes[np.argmax(probs, axis=0)]

    return y_pred

y_pred = gaussian_naive_bayes(X_train, y_train, X_test)

accuracy = np.mean(y_pred == y_test)
print(f'Accuracy: {accuracy*100:.2f}%')

The `NumPy` library is imported for numerical operations such as array manipulation and
mathematical calculations.

A function that implements the Gaussian Naive Bayes algorithm is defined and then called.
a) To compute the prior probabilities, `np.unique(y_train, return_counts=True)` gets the unique
classes and their counts from the training labels, and `priors` divides the count of each class
by the total number of training samples.
b) The mean and standard deviation of each feature for each class in the training data are
computed.
c) Then the log-probability of each test sample belonging to each class is computed (the
underlying formula is written out after this list).
d) Finally, the predicted class for each test sample is determined by selecting the class with
the highest posterior log-probability.
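
For reference, steps (c) and (d) together compute, for each class c and test sample x, the following quantity (a sketch in standard Gaussian Naive Bayes notation, where \mu_{c,j} and \sigma_{c,j} are the per-class mean and standard deviation of feature j from step (b)):

\log P(c \mid x) \propto \log P(c) + \sum_{j}\left[-\frac{(x_j - \mu_{c,j})^2}{2\sigma_{c,j}^2} - \frac{1}{2}\log\left(2\pi\sigma_{c,j}^2\right)\right], \qquad \hat{y} = \arg\max_{c}\, \log P(c \mid x)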

The accuracy of the model is calculated and printed: `np.mean(y_pred == y_test)` computes the
fraction of predictions that match the true labels, and `accuracy*100` converts that fraction
into a percentage.

Conclusion
The Gaussian Naive Bayes classifier achieved an accuracy of 80.15% in predicting customer
purchase intent based on consumer electronics sales data. This suggests the model captures
meaningful patterns in the data, though future work could improve results through better data
preprocessing, feature engineering, and comparison with other algorithms.
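
As a possible sanity check on the hand-written implementation, a minimal sketch comparing it against scikit-learn's `GaussianNB` on the same split (the two accuracies should be close, though not necessarily identical, since scikit-learn applies a small variance-smoothing term by default):

from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

clf = GaussianNB()                 # reference implementation
clf.fit(X_train, y_train)          # same training split as above
sk_pred = clf.predict(X_test)
print(f'scikit-learn accuracy: {accuracy_score(y_test, sk_pred)*100:.2f}%')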

