Assignment 03


Name: Dhruv Jayant Tillu Roll No.: 6107
Subject: 510303 - BDA

ASSIGNMENT: 03
Aim: Perform Naïve Bayes classification and Linear Regression, each on its own dataset.

Requirements:
• Software: PyCharm Professional
• Libraries: pandas, scikit-learn, matplotlib, seaborn
• Dataset: Salary_dataset.csv and drug200.csv from Kaggle

Theory: Naive Bayes classification is a probabilistic algorithm based on Bayes' theorem, which assumes that
the predictors are conditionally independent given the class. The algorithm computes the posterior probability
of each class given the input features and selects the class with the highest probability as the predicted
output. It is commonly used for tasks such as drug classification in medical datasets. In this implementation,
categorical variables are converted to integer codes using label encoding, the features are standardized to
zero mean and unit variance, and a Gaussian Naive Bayes model is fitted. The model's performance is evaluated
using accuracy, per-class precision and recall, and the confusion matrix.
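
To make the class-selection rule concrete, the following is a minimal sketch of Gaussian Naive Bayes in plain Python. It is not the assignment's code, which relies on scikit-learn's GaussianNB. The helper names (fit_gaussian_nb, log_posterior, predict) and the four toy rows are assumptions made up for illustration.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate the prior, per-feature mean and per-feature variance of each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model

def log_posterior(model, row):
    """Unnormalized log P(class | row): log prior plus summed per-feature log likelihoods."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        scores[label] = score
    return scores

def predict(model, row):
    """Return the class with the highest posterior score."""
    scores = log_posterior(model, row)
    return max(scores, key=scores.get)

# Toy rows (made up): [Age, Na_to_K]
X = [[25, 25.0], [30, 22.0], [60, 9.0], [55, 10.0]]
y = ["DrugY", "DrugY", "drugX", "drugX"]
model = fit_gaussian_nb(X, y)
print(predict(model, [28, 24.0]))   # DrugY
print(predict(model, [58, 9.5]))    # drugX
```

Summing the per-feature log likelihoods is where the independence assumption enters: each feature contributes to the score separately, given the class.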

Linear regression is a statistical method used to model the relationship between a dependent variable (e.g., salary)
and one or more independent variables (e.g., years of experience). The technique assumes a linear relationship,
meaning the change in the dependent variable is proportional to changes in the independent variables. The model is
expressed in the form Y = a + bX, where Y is the predicted value, a is the intercept, b is the slope, and X
represents the independent variable. The model is fitted by choosing a and b to minimize the sum of squared errors
on the training data, and is evaluated using metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE),
and the R² Score.
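
For a single independent variable, the least-squares intercept and slope have a closed form: b = Sxy / Sxx and a = mean(Y) - b * mean(X). The sketch below applies it to four made-up points. The function name fit_line and the data are illustrative assumptions, not part of the assignment.

```python
def fit_line(xs, ys):
    """Ordinary least squares for Y = a + bX with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxy: co-variation of X and Y; Sxx: variation of X
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b

# Made-up points lying exactly on Y = 1 + 2X
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)
print(a, b)   # 1.0 2.0
```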

Code for Naïve Bayes:

# Perform Naive Bayes Classification on Drug Dataset

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# load dataset
dataset = pd.read_csv('drug200.csv')
dataset

     Age Sex      BP Cholesterol  Na_to_K   Drug
0     23   F    HIGH        HIGH   25.355  DrugY
1     47   M     LOW        HIGH   13.093  drugC
2     47   M     LOW        HIGH   10.114  drugC
3     28   F  NORMAL        HIGH    7.798  drugX
4     61   F     LOW        HIGH   18.043  DrugY
..   ...  ..     ...         ...      ...    ...
195   56   F     LOW        HIGH   11.567  drugC
196   16   M     LOW        HIGH   12.006  drugC
197   52   M  NORMAL        HIGH    9.894  drugX
198   23   M  NORMAL      NORMAL   14.020  drugX
199   40   F     LOW      NORMAL   11.349  drugX

[200 rows x 6 columns]


# Transform Categorical Data using Label Encoding

from sklearn.preprocessing import LabelEncoder
labelencoder = LabelEncoder()
dataset['Sex'] = labelencoder.fit_transform(dataset['Sex'])
dataset['BP'] = labelencoder.fit_transform(dataset['BP'])
dataset['Cholesterol'] = labelencoder.fit_transform(dataset['Cholesterol'])
dataset['Drug'] = labelencoder.fit_transform(dataset['Drug'])
dataset

     Age  Sex  BP  Cholesterol  Na_to_K  Drug
0     23    0   0            0   25.355     0
1     47    1   1            0   13.093     3
2     47    1   1            0   10.114     3
3     28    0   2            0    7.798     4
4     61    0   1            0   18.043     0
..   ...  ...  ..          ...      ...   ...
195   56    0   1            0   11.567     3
196   16    1   1            0   12.006     3
197   52    1   2            0    9.894     4
198   23    1   2            1   14.020     4
199   40    0   1            1   11.349     4

[200 rows x 6 columns]
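
The integer codes in the table follow from sorting the distinct labels, which is how scikit-learn's LabelEncoder orders its classes. The plain-Python sketch below reproduces that mapping for the Drug column. The helper name fit_label_encoding is hypothetical, and the labels drugA and drugB are assumed here to account for codes 1 and 2, since they do not appear in the rows printed above.

```python
def fit_label_encoding(values):
    """Map each distinct label to an integer code, in sorted order."""
    classes = sorted(set(values))
    return {label: code for code, label in enumerate(classes)}

# Sample of Drug labels; drugA and drugB are assumed, see the note above
drug_labels = ["DrugY", "drugC", "drugC", "drugX", "DrugY", "drugA", "drugB"]

mapping = fit_label_encoding(drug_labels)
encoded = [mapping[v] for v in drug_labels]

# Inverse mapping, useful for turning predicted codes back into drug names
inverse = {code: label for label, code in mapping.items()}

print(mapping)   # {'DrugY': 0, 'drugA': 1, 'drugB': 2, 'drugC': 3, 'drugX': 4}
print(encoded)   # [0, 3, 3, 4, 0, 1, 2]
```

Uppercase letters sort before lowercase ones, which is why DrugY receives code 0. Keeping one mapping per column, rather than reusing a single encoder object, makes it possible to decode predictions later.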

# Splitting the dataset into the Training set and Test set
X = dataset.iloc[:, 0:5].values
y = dataset.iloc[:, 5].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25,
                                                    random_state = 0)
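
As a rough illustration of what a seeded split does, the sketch below shuffles the row indices with a fixed seed and holds out a quarter of them. It uses Python's random module, so the resulting order will not match the one scikit-learn produces for random_state = 0. The function name train_test_split_indices is an assumption for this example.

```python
import math
import random

def train_test_split_indices(n_samples, test_size, seed):
    """Shuffle row indices with a fixed seed and cut off a test portion."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)         # same seed gives the same shuffle
    n_test = math.ceil(n_samples * test_size)    # number of held-out rows
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = train_test_split_indices(200, test_size=0.25, seed=0)
print(len(train_idx), len(test_idx))   # 150 50
```

With 200 rows and test_size = 0.25, 150 rows are used for training and 50 for testing, which matches the 50 predictions printed later in this section.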

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
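
The scaling step fits the mean and standard deviation on the training data only and then reuses them for the test data. A minimal single-column sketch of that idea follows. The helper names and the age values are made up, and the population standard deviation is used because that is what StandardScaler computes.

```python
import statistics

def fit_scaler(column):
    """Learn the mean and population standard deviation from training values."""
    return statistics.mean(column), statistics.pstdev(column)

def transform(column, mean, std):
    """Standardize values with statistics learned elsewhere."""
    return [(v - mean) / std for v in column]

train_age = [20.0, 30.0, 40.0, 50.0]   # made-up training values
test_age = [35.0, 60.0]                # made-up test values

mean, std = fit_scaler(train_age)                  # fit on the training set only
train_scaled = transform(train_age, mean, std)
test_scaled = transform(test_age, mean, std)       # reuse the training statistics

print(mean)             # 35.0
print(test_scaled[0])   # 0.0
```

Fitting on the training set alone keeps information about the test rows out of the preprocessing step.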

nvc = GaussianNB()
nvc.fit(X_train, y_train)

GaussianNB()

y_pred = nvc.predict(X_test)
y_pred

array([3, 4, 3, 0, 0, 4, 4, 4, 3, 4, 1, 0, 0, 0, 2, 3, 0, 0, 4, 1, 1, 4,
4, 4, 0, 0, 0, 0, 0, 4, 4, 3, 1, 4, 0, 0, 4, 3, 1, 4, 0, 1, 0, 0,
0, 4, 4, 0, 1, 2])

# Making the Confusion Matrix

from sklearn.metrics import confusion_matrix
import seaborn as sns
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True)

<Axes: >

# Accuracy
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))

0.84

# Precision
from sklearn.metrics import precision_score
print(precision_score(y_test, y_pred, average=None))

[0.94736842 0.71428571 0.5 0.5 0.9375 ]

# Recall
from sklearn.metrics import recall_score
print(recall_score(y_test, y_pred, average=None))

[0.72 1. 1. 1. 0.9375]
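
Accuracy, per-class precision and per-class recall can all be read off the confusion matrix. The sketch below computes them by hand for a small made-up example, using the same convention as scikit-learn (rows are true classes, columns are predicted classes). The function names and labels are illustrative only.

```python
def confusion_matrix(y_true, y_pred, labels):
    """cm[i][j] counts samples of true class i predicted as class j."""
    index = {label: i for i, label in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm

def precision_recall(cm):
    """Per-class precision (column-wise) and recall (row-wise)."""
    n = len(cm)
    precision, recall = [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[r][k] for r in range(n))   # column sum
        actual_k = sum(cm[k])                           # row sum
        precision.append(tp / predicted_k if predicted_k else 0.0)
        recall.append(tp / actual_k if actual_k else 0.0)
    return precision, recall

# Made-up labels for three classes
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]

cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
precision, recall = precision_recall(cm)
accuracy = sum(cm[k][k] for k in range(len(cm))) / len(y_true)

print(cm)   # [[2, 1, 0], [0, 2, 0], [0, 0, 1]]
```

In this toy case class 1 has perfect recall but precision 2/3, because one class-0 sample was predicted as class 1. The same pattern (recall 1.0 with lower precision) appears for several classes in the results above.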

# Perform visualization
import matplotlib.pyplot as plt
dataset = pd.read_csv('drug200.csv')
plt.bar(dataset['Age'], dataset['BP'])
plt.show()

plt.bar(dataset['Age'], dataset['Cholesterol'])
plt.show()

Code for Linear Regression:

# Perform Linear Regression on Salary Dataset

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error

dataset = pd.read_csv('Salary_dataset.csv')
dataset

Unnamed: 0 YearsExperience Salary

0 0 1.2 39344.0
1 1 1.4 46206.0
2 2 1.6 37732.0
3 3 2.1 43526.0
4 4 2.3 39892.0
5 5 3.0 56643.0
6 6 3.1 60151.0
7 7 3.3 54446.0
8 8 3.3 64446.0
9 9 3.8 57190.0
10 10 4.0 63219.0
11 11 4.1 55795.0
12 12 4.1 56958.0
13 13 4.2 57082.0
14 14 4.6 61112.0
15 15 5.0 67939.0
16 16 5.2 66030.0
17 17 5.4 83089.0
18 18 6.0 81364.0
19 19 6.1 93941.0
20 20 6.9 91739.0
21 21 7.2 98274.0
22 22 8.0 101303.0
23 23 8.3 113813.0
24 24 8.8 109432.0
25 25 9.1 105583.0
26 26 9.6 116970.0
27 27 9.7 112636.0
28 28 10.4 122392.0
29 29 10.6 121873.0

# Perform Visualization
plt.bar(dataset['YearsExperience'], dataset['Salary'])

<BarContainer object of 30 artists>

plt.scatter(dataset['YearsExperience'], dataset['Salary'])

<matplotlib.collections.PathCollection at 0x25c02ae3700>

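# Select the feature and target columns; the first column ('Unnamed: 0') is
# an exported row index and is left out
X = dataset[['YearsExperience']].values   # feature matrix, shape (30, 1)
y = dataset['Salary'].values              # target: salary

# Splitting the dataset into the Training set and Test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 1/3,
                                                    random_state = 0)

# Fitting Simple Linear Regression to the Training set

regressor = LinearRegression()
regressor.fit(X_train, y_train)

LinearRegression()

# Predicting the Test set results

y_pred = regressor.predict(X_test)

# Performance Metrics
print('Mean Absolute Error:', mean_absolute_error(y_test, y_pred))
print('Mean Squared Error:', mean_squared_error(y_test, y_pred))
# LinearRegression.score returns the R² score, not a classification accuracy
print('R² Score:', regressor.score(X_test, y_test))

Output recorded from the original run, in which X = dataset.iloc[:, :-1] (the
'Unnamed: 0' and 'YearsExperience' columns) and y = dataset.iloc[:, 1]
('YearsExperience'). Because the target was also one of the features, the
model reproduced it exactly, so these values do not measure salary prediction:

Mean Absolute Error: 1.7541523789077474e-15
Mean Squared Error: 4.067564042545842e-30
Accuracy: 1.0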
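
The three evaluation figures are defined as follows. The value returned by LinearRegression.score is the R² score (coefficient of determination) rather than an accuracy. The sketch below computes MAE, MSE and R² by hand for four made-up salary predictions. The function name regression_metrics and the numbers are assumptions for illustration and are not results from the dataset.

```python
def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R²) for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e ** 2 for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return mae, mse, r2

# Made-up values for illustration
y_true = [40000.0, 60000.0, 80000.0, 100000.0]
y_pred = [42000.0, 58000.0, 83000.0, 99000.0]

mae, mse, r2 = regression_metrics(y_true, y_pred)
print(mae)   # 2000.0
print(mse)   # 4500000.0
print(r2)    # approximately 0.991
```

An R² of exactly 1.0 together with errors near machine precision means the predictions match the targets exactly, which on real salary data would usually point to the target leaking into the features.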

Conclusion: Gaussian Naive Bayes was applied to the drug dataset to predict the drug category from patient
attributes. After label encoding and feature scaling, the model reached an accuracy of 0.84 on the 50-sample
test set, with per-class precision between 0.5 and 0.95 and per-class recall between 0.72 and 1.0. These
results suggest that Naive Bayes is a reasonable baseline for this classification task, with room for
improvement through other algorithms and hyperparameter tuning.
Linear regression provides a straightforward way to quantify the relationship between variables, such as
estimating salary from years of experience. Evaluating the fitted model on a held-out test set with Mean
Absolute Error (MAE), Mean Squared Error (MSE), and the R² Score indicates how well the linear fit
generalizes to unseen data.
