Machine Learning Lab Manual
(AUTONOMOUS)
Department of Computer Science & Engineering
Module 1
Aim:
Basic statistical functions for data exploration
Attributes of the dataset (11 in total):
1. Alcohol: the amount of alcohol in wine
2. Volatile acidity: the amount of acetic acid in wine; high levels lead to an unpleasant vinegar taste
3. Sulphates: a wine additive that contributes to SO2 levels and acts as an antimicrobial and antioxidant
4. Citric Acid: acts as a preservative to increase acidity (small quantities add freshness and flavour to wines)
5. Total Sulphur Dioxide: is the amount of free + bound forms of SO2
6. Density: sweeter wines have a higher density
7. Chlorides: the amount of salt in the wine
8. Fixed acidity: non-volatile acids that do not evaporate readily
9. pH: the level of acidity
10. Free Sulphur Dioxide: it prevents microbial growth and the oxidation of wine
11. Residual sugar: the amount of sugar remaining after fermentation stops. The key is a good balance
between sweetness and sourness (wines with more than 45 g/L residual sugar are considered sweet)
Program:
import pandas as pd
# load the wine-quality dataset and display it
df=pd.read_csv('wr.csv')
print(df)
Output:
Info and description of dataset:
df.describe()
Output:
df.dtypes
Output:
Maximum of each attribute:
df.max()
Output:
Standard deviation:
df['chlorides'].std()
Output:
0.0470653020100901
Mean:
df['chlorides'].mean()
Output:
0.08746654158849279
Variance:
df['chlorides'].var()
Output:
0.0022151426533009912
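As a quick consistency check, the standard deviation printed above is the square root of this variance; a small sketch using the printed variance value confirms it:
import math
# the square root of the variance should reproduce the standard deviation printed earlier
print(math.sqrt(0.0022151426533009912))   # ≈ 0.0470653, matching df['chlorides'].std()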
Rows with chlorides greater than 1:
df[df.chlorides>1]
Output:
Module 2
Aim:
Data Visualization: Box plot, scatter plot, histogram
Description:
A box plot, also known as a whisker plot, displays a five-number summary of a set of data values: minimum,
first quartile, median, third quartile and maximum. A box is drawn from the first quartile to the third
quartile, and a vertical line through the box marks the median.
Program:
#Boxplot
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv("/content/drive/MyDrive/diabetes.csv")
#creation of boxplot with dataset
df.boxplot(figsize = (5,5))
Output:
# box plot of normally distributed random data
np.random.seed(10)
data = np.random.normal(100, 20, 200)
fig = plt.figure(figsize=(10, 7))
plt.boxplot(data)
plt.show()
Output:
Q1=df['Insulin'].quantile(0.25)
Q2=df['Insulin'].quantile(0.50)
Q3=df['Insulin'].quantile(0.75)
IQR=Q3-Q1
# the 1.5*IQR fences below Q1 and above Q3 mark the usual outlier (whisker) limits
LowestQuartile=Q1-(1.5*IQR)
HighestQuartile=Q3+(1.5*IQR)
print("First Quartile is :",Q1)
print("Second Quartile (median) is :",Q2)
print("Third Quartile is :",Q3)
print("IQR is:",IQR)
print("Lower fence is:",LowestQuartile)
print("Upper fence is:",HighestQuartile)
df.boxplot(column="Insulin")
Output:
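The fence values computed above can also be used to drop the outliers before redrawing the box plot; a minimal sketch continuing from the same DataFrame and variables:
# keep only the rows whose Insulin value lies inside the 1.5*IQR fences
df_no_outliers = df[(df['Insulin'] >= LowestQuartile) & (df['Insulin'] <= HighestQuartile)]
df_no_outliers.boxplot(column="Insulin")
plt.show()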
Scatter plot:
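A minimal scatter-plot sketch on the same diabetes dataset (the choice of the Glucose and Insulin columns is illustrative):
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("/content/drive/MyDrive/diabetes.csv")
# scatter plot of two numeric attributes, coloured by the class label
plt.scatter(df['Glucose'], df['Insulin'], c=df['Outcome'], cmap='coolwarm', s=10)
plt.xlabel('Glucose')
plt.ylabel('Insulin')
plt.title('Glucose vs Insulin')
plt.show()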
Histogram:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv("/content/drive/MyDrive/diabetes.csv")
df.hist()
Output:
Module 3
Aim :
Data Preprocessing: Handling missing values, outliers, normalization, Scaling
Description:
Data preprocessing is essential before the data is actually used. It is the process of turning raw data into a
clean data set: the dataset is checked for missing values, noisy data and other inconsistencies before it is
passed to the algorithm. The data must also be in a format appropriate for ML; for example, if the algorithm
processes only numeric data, then a class labelled “malignant” or “benign” must be replaced by “0” or “1”.
Data transformation and feature extraction are used to improve the performance of classifiers, so that a
classification algorithm can produce a meaningful diagnosis. Only the features relevant to the particular
disease are selected and extracted; for example, a cancer patient may also have diabetes, so it is essential to
separate the features related to cancer from those related to diabetes. An unsupervised learning algorithm
such as PCA is a familiar choice for feature extraction, while supervised learning is appropriate for
classification and predictive modeling.
Program :
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv("/content/drive/MyDrive/diabetes.csv")
Finding null values and deleting the columns with missing data
Deleting the row with missing data & filling the missing values
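A minimal sketch of the usual pandas calls for these two steps, continuing with the df loaded above:
# count missing values in every column
print(df.isnull().sum())
# delete the columns that contain missing data
df_cols_dropped = df.dropna(axis=1)
# delete the rows that contain missing data
df_rows_dropped = df.dropna(axis=0)
# alternatively, fill the missing values with the column mean
df_filled = df.fillna(df.mean(numeric_only=True))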
#NORMALISATION
# importing packages
import pandas as pd
import matplotlib.pyplot as plt
# create data
df = pd.DataFrame([
    [180000, 110, 18.9, 1400],
    [360000, 905, 23.4, 1800],
    [230000, 230, 14.0, 1300],
    [60000, 450, 13.5, 1500]],
    columns=['Col A', 'Col B', 'Col C', 'Col D'])   # column names restored for illustration
# display the raw data
display(df)
Output :
df.plot(kind = 'bar')
# copy the data
df_max_scaled = df.copy()
# apply normalization techniques
for column in df_max_scaled.columns:
df_max_scaled[column] = df_max_scaled[column] / df_max_scaled[column].abs().max()
# view normalized data
display(df_max_scaled)
df_max_scaled.plot(kind = 'bar')
Output :
Output:
Scaling :
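A minimal scaling sketch using scikit-learn's scalers on the same diabetes DataFrame (the choice of scalers is illustrative):
from sklearn.preprocessing import StandardScaler, MinMaxScaler
import pandas as pd
df = pd.read_csv("/content/drive/MyDrive/diabetes.csv")
# standardisation: every column rescaled to zero mean and unit variance
std_scaled = pd.DataFrame(StandardScaler().fit_transform(df), columns=df.columns)
# min-max scaling: every column squeezed into the range [0, 1]
minmax_scaled = pd.DataFrame(MinMaxScaler().fit_transform(df), columns=df.columns)
print(std_scaled.head())
print(minmax_scaled.head())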
Module 4
Aim:
Principal Component Analysis (PCA)
Description:
A social network dataset contains the structural information of a social network: in the general case it
consists of persons connected by edges. Such datasets can represent friendship relationships or may be
extracted from a social networking web site.
Attributes are:
User ID
Gender
Age
Estimated Salary
Purchased
Principal Component Analysis (or PCA) uses linear algebra to transform the dataset into a compressed form.
Generally this is called a data reduction technique. A property of PCA is that you can choose the number of
dimensions or principal components in the transformed result.
Program:
#Importing of the dataset and slicing it into independent and dependent variables
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import sklearn
dataset = pd.read_csv('/content/drive/MyDrive/Social_Network_Ads.csv')
dataset
Output:
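The slicing and train/test split mentioned in the comment above can be sketched as follows (the feature columns Age and EstimatedSalary, the target Purchased, and the 25% test size are assumptions consistent with the later modules):
# independent variables (Age, EstimatedSalary) and dependent variable (Purchased)
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)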
Program:
#Feature scaling of the training and test sets of independent variables to bring the values into a comparable range
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
X_train
Output:
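The classifier used for the prediction below is not shown; a sketch assuming a logistic-regression model (any scikit-learn classifier would be fitted the same way):
from sklearn.linear_model import LogisticRegression
# fit a classifier on the scaled training data (the specific model is an assumption)
classifier = LogisticRegression(random_state=0)
classifier.fit(X_train, y_train)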
Program:
#prediction of y
y_pred = classifier.predict(X_test)
y_pred
Output:
Program:
#Implementation of PCA, followed by a kNN classifier on the reduced features
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn import metrics
pca = PCA(n_components = 2)
X_train = pca.fit_transform(X_train)
X_test = pca.transform(X_test)
explained_variance = pca.explained_variance_ratio_
from sklearn.neighbors import KNeighborsClassifier
knn_classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
knn_classifier.fit(X_train, y_train)
y_pred_knn = knn_classifier.predict(X_test)
accuracy=accuracy_score(y_test, y_pred_knn)
precision = precision_score(y_test, y_pred_knn)
recall = recall_score(y_test, y_pred_knn)
specificity = metrics.recall_score(y_test, y_pred_knn, pos_label=0)
f=f1_score(y_test,y_pred_knn)
e=(1-accuracy)
print('Accuracy: ',accuracy)
print('Precision: ',precision)
print('Error:',e)
print('Recall: ',recall)
print('F1score: ',f)
print('Specificity',specificity)
Output:
Module 5
Aim :
Singular Value Decomposition (SVD)
Description:
The Singular Value Decomposition (SVD) of a matrix is a factorization of that matrix into three matrices. It
has some interesting algebraic properties and conveys important geometrical and theoretical insights about
linear transformations. It also has some important applications in data science.
Program :
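A minimal NumPy/SciPy sketch of the SVD of a small matrix (the example matrix is illustrative):
import numpy as np
from scipy.linalg import svd
# SVD factorizes A into U * diag(s) * V^T
A = np.array([[3, 2, 2],
              [2, 3, -2]])
U, s, V_T = svd(A)
print("U:\n", U)
print("Singular values:", s)
print("V^T:\n", V_T)
# rebuild A from the factors to verify the decomposition
S = np.zeros(A.shape)
S[:len(s), :len(s)] = np.diag(s)
print("Reconstructed A:\n", U @ S @ V_T)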
Program:
Singular Value Decomposition on Image:
import numpy as np
import matplotlib.pyplot as plt
from skimage import data
from skimage.color import rgb2gray
from scipy.linalg import svd
cat = data.chelsea()
plt.imshow(cat)
# convert to grayscale
gray_cat = rgb2gray(cat)
# compute the SVD of the grayscale image
U, s, V_T = svd(gray_cat, full_matrices=False)
S = np.diag(s)          # diagonal matrix of singular values
fig, ax = plt.subplots(5, 2, figsize=(8, 20))
curr_fig = 0
for r in [5, 10, 70, 100, 200]:
    # rank-r approximation built from the first r singular values/vectors
    cat_approx = U[:, :r] @ S[0:r, :r] @ V_T[:r, :]
    ax[curr_fig][0].imshow(256 - cat_approx)
    ax[curr_fig][0].set_title("k = " + str(r))
    ax[curr_fig, 0].axis('off')
    ax[curr_fig][1].set_title("Original Image")
    ax[curr_fig][1].imshow(gray_cat)
    ax[curr_fig, 1].axis('off')
    curr_fig += 1
plt.show()
Output:
Module 6
Aim :
Linear Discriminant Analysis (LDA)
Description:
Linear Discriminant Analysis (LDA) is a supervised dimensionality-reduction technique. It projects the data
onto a small number of axes (linear discriminants) chosen to maximise the separation between the class
labels, and is commonly used as a preprocessing step before classification.
Program:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dataset = pd.read_csv('/content/drive/MyDrive/Wine.csv')
dataset
Output:
X = dataset.iloc[:, 0:13].values
y = dataset.iloc[:, 13].values
Splitting of data:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
Applying LDA
#Apply LDA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
lda = LDA(n_components = 2)
X_train = lda.fit_transform(X_train, y_train)
X_test = lda.transform(X_test)
X_train
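A sketch of the usual next step, fitting a classifier on the two LDA components and checking its accuracy (the choice of logistic regression is an assumption):
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score
# train on the LDA-transformed features and evaluate on the test split
classifier = LogisticRegression(random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
print("Confusion Matrix :\n", confusion_matrix(y_test, y_pred))
print("Accuracy :", accuracy_score(y_test, y_pred))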
Module 7
Aim :
Regression Analysis: Linear regression, Logistic regression, Polynomial regression
Description:
Regression is a technique for investigating the relationship between independent variables or features and a
dependent variable or outcome. It's used as a method for predictive modelling in machine learning, in which
an algorithm is used to predict continuous outcomes.
Program:
#LINEAR REGRESSION (the model is fitted on degree-4 polynomial features of salinity)
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split as tts
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.preprocessing import PolynomialFeatures
data = pd.read_csv('bottle.csv',nrows=1000)
data['Salnty']=data['Salnty'].fillna(value=data['Salnty'].mean())
data['T_degC']=data['T_degC'].fillna(value=data['T_degC'].mean())
x=data[['Salnty']]
y=data['T_degC']
pf1=PolynomialFeatures(degree=4)
x1=pf1.fit_transform(x)
regr=LinearRegression()
regr.fit(x1,y)
y_pred=regr.predict(x1)
R_square = r2_score(y,y_pred)
print('Coefficient of Determination:', R_square)
ch='y'
while(ch=='y' or ch=='Y'):
    sal=float(input("Enter Salinity to Predict :"))
    sal1=pf1.fit_transform([[sal]])
    p=regr.predict(sal1)
    print("\nTemperature is ",p)
    ch=input("Enter y to calculate more : ")
Output:
Coefficient of Determination: 0.7838361038646351
Enter Salinity to Predict :32.45
Temperature is [6.07079706]
Enter y to calculate more : n
#LOGISTIC REGRESSION
# note: the code below actually fits an ordinary least-squares line (LinearRegression);
# a logistic-regression sketch is given after the output
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error,mean_squared_error
data=pd.read_csv("bottle.csv",nrows=100)
data['Salnty'] = data['Salnty'].fillna(value=data['Salnty'].mean())
data['T_degC'] = data['T_degC'].fillna(value=data['T_degC'].mean())
x=data['Salnty']
y=data['T_degC']
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=3/10,random_state=0)
#Converting into 2-D arrays
x_train=x_train.to_numpy().reshape(-1, 1)
x_test=x_test.to_numpy().reshape(-1, 1)
y_train=y_train.to_numpy().reshape(-1, 1)
y_test=y_test.to_numpy().reshape(-1, 1)
reg=LinearRegression()
reg.fit(x_train,y_train)
# M and C values
print("Intercept (C) : ",reg.intercept_)
print("Slope (M) : ",reg.coef_)
#Prediction on the test and training sets
y_pred=reg.predict(x_test)
x_pred=reg.predict(x_train)
#Error metrics on the test set
print("Mean Absolute Error : ",mean_absolute_error(y_test,y_pred))
print("Mean Squared Error : ",mean_squared_error(y_test,y_pred))
print("Root MeanSquared Error : ",np.sqrt(mean_squared_error(y_test,y_pred)))
Output:
Intercept (C) : [131.27879866]
Slope (M) : [[-3.67906099]]
Mean Absolute Error : 1.0035873375117492
Mean Squared Error : 1.4170202202564186
Root MeanSquared Error : 1.1903865843735044
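The block above, despite its heading, fits an ordinary least-squares line. A minimal sketch of actual logistic regression on a binary target, using the diabetes dataset from the other modules (the feature columns are illustrative):
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score
data = pd.read_csv("diabetes.csv")
x = data[['Glucose', 'Age']]
y = data['Outcome']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)
# logistic regression models the probability of the positive class
model = LogisticRegression(max_iter=1000)
model.fit(x_train, y_train)
y_pred = model.predict(x_test)
print("Confusion Matrix :\n", confusion_matrix(y_test, y_pred))
print("Accuracy :", accuracy_score(y_test, y_pred))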
#POLYNOMIAL REGRESSION (with a simple Tkinter GUI)
import pandas as pd
import numpy as np
import tkinter
from tkinter import *
from sklearn.model_selection import train_test_split as tts
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
def polyregr():
    # fit a degree-4 polynomial regression of temperature on salinity
    data = pd.read_csv('bottle.csv', nrows=1000)
    data['Salnty'] = data['Salnty'].fillna(value=data['Salnty'].mean())
    data['T_degC'] = data['T_degC'].fillna(value=data['T_degC'].mean())
    x = data[['Salnty']]
    y = data['T_degC']
    pf1 = PolynomialFeatures(degree=4)
    x1 = pf1.fit_transform(x)
    regr = LinearRegression()
    regr.fit(x1, y)
    # predict the temperature for the salinity entered in the GUI
    v = entry.get()
    pred = np.array([[v]], dtype=float)
    p = regr.predict(pf1.fit_transform(pred))
    t1.delete(1.0, END)
    t1.insert(END, p[0])
root =Tk()
root.geometry("1000x200")
root.configure(background='black')
NameLb = Label(root, text="ENTER SALINITY:", fg="White",bg="Black")
NameLb.config(font=("Times",20,"bold"))
NameLb.grid(row=6, column=1, pady=20, sticky=W)
entry= Entry(root,width=40)
entry.grid(row=6,column=2)
dst = Button(root, text="PREDICT", command=polyregr,fg="Red",bg="Black")
dst.config(font=("Times",15,"bold"))
dst.grid(row=12, column=2,padx=10)
NameLb = Label(root, text="THE PREDICTED TEMPERATURE IS:", fg="White",bg="Black")
NameLb.config(font=("Times",20,"bold"))
NameLb.grid(row=10, column=1, pady=20, sticky=W)
t1 = Text(root, height=1, width=40,bg="Black",fg="White")
t1.config(font=("arial",15,"bold"))
t1.grid(row=10, column=2, padx=10)
root.mainloop()
Output:
Module 8
AIM:
Regularized Regression
Program:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split, cross_val_score
from statistics import mean
data = pd.read_csv('/content/drive/MyDrive/auto-mpg.csv')
data
Output:
Evaluating the model:
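A sketch of fitting and evaluating Ridge and Lasso on the mpg data, using the imports above (the feature and target column names are assumptions about the auto-mpg file):
# assumed numeric feature columns and target column
X = data[['cylinders', 'displacement', 'weight', 'acceleration']]
y = data['mpg']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# plain linear regression as a baseline
lin = LinearRegression().fit(X_train, y_train)
print("Linear R^2 :", lin.score(X_test, y_test))
# Ridge (L2 penalty) and Lasso (L1 penalty) shrink the coefficients to reduce overfitting
ridge = Ridge(alpha=1.0).fit(X_train, y_train)
print("Ridge R^2  :", ridge.score(X_test, y_test))
print("Ridge CV   :", mean(cross_val_score(ridge, X, y, cv=5)))
lasso = Lasso(alpha=0.1).fit(X_train, y_train)
print("Lasso R^2  :", lasso.score(X_test, y_test))
print("Lasso CV   :", mean(cross_val_score(lasso, X, y, cv=5)))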
Module 9
AIM:
K-Nearest Neighbour (kNN) Classifier
DESCRIPTION:
The k-nearest neighbors (KNN) algorithm is a simple, supervised machine learning algorithm that can be
used to solve both classification and regression problems. It is easy to implement and understand, but has the
major drawback of becoming significantly slower as the size of the data in use grows.
KNN works by finding the distances between a query and all the examples in the data, selecting the specified
number of examples (K) closest to the query, and then voting for the most frequent label (in the case of
classification) or averaging the labels (in the case of regression).
PROGRAM:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, accuracy_score, recall_score, precision_score, f1_score
df = pd.read_csv("/data.csv")
X = df.iloc[:, [0,3]].values
y = df.iloc[:, 4].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 0)
classifier = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
#confusion matrix
confusion_matrix = confusion_matrix(y_test, y_pred)
print(confusion_matrix)
accuracy = accuracy_score(y_test, y_pred)
recall=recall_score(y_test,y_pred)
precision = precision_score(y_test, y_pred)
f1score=f1_score(y_test,y_pred)
print('Accuracy of the model:',accuracy)
print('precision of the model:',precision)
print('Recall of the model:',recall)
print('f1_score of the model:',f1score)
# note: with sklearn's default label ordering [0, 1], confusion_matrix[0, 0] counts class 0
# (true negatives if 1 is the positive class); the tp/tn names below follow the manual's convention
tp=confusion_matrix[0,0]
fp=confusion_matrix[0,1]
fn=confusion_matrix[1,0]
tn=confusion_matrix[1,1]
sensitivity=tp/(tp+fn)
print('Sensitivity:',sensitivity*100)
specificity=tn/(fp+tn)
print('Specificity:',specificity*100)
OUTPUT
[[67 12]
[20 21]]
Accuracy of the model: 0.7333333333333333
precision of the model: 0.6363636363636364
Recall of the model: 0.5121951219512195
f1_score of the model: 0.5675675675675675
Sensitivity: 77.01149425287356
Specificity: 63.63636363636363
PROGRAM:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, accuracy_score, recall_score, precision_score, f1_score
df = pd.read_csv("/data.csv")
X= df.iloc[:, [0,2]].values
y= df.iloc[:, 4].values
#label encoding of the class labels
le = LabelEncoder()
y = le.fit_transform(y)
# the train/test split, kNN fitting and metric calculations (not reproduced here) follow the previous program
OUTPUT
[[67 12]
[13 28]]
Accuracy of the model: 0.7916666666666666
precision of the model: 0.7
Recall of the model: 0.6829268292682927
f1_score of the model: 0.6913580246913581
Sensitivity: 83.75
Specificity: 70.0
#Label Encoding and Scaling
le = LabelEncoder()
y = le.fit_transform(y)
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.3,random_state=0)
sc= StandardScaler()
X_train= sc.fit_transform(X_train)
X_test= sc.transform(X_test)
#kNN classifier
classifier = KNeighborsClassifier(n_neighbors=10,metric='euclidean')
classifier.fit(X_train,y_train)
y_pred=classifier.predict(X_test)
#Confusion Matrix
confusion_matrix=confusion_matrix(y_test,y_pred)
print(confusion_matrix)
accuracy=accuracy_score(y_test,y_pred)*100
precision=precision_score(y_test,y_pred)*100
recall=recall_score(y_test,y_pred)*100
f1_measure=f1_score(y_test,y_pred)*100
print('Accuracy of the model:',accuracy)
print('Precision of the model:',precision)
print('Recall of the model:',recall)
print('F1 Measure of the model:',f1_measure)
tp=confusion_matrix[0,0]
fp=confusion_matrix[0,1]
fn=confusion_matrix[1,0]
tn=confusion_matrix[1,1]
sensitivity=tp/(tp+fn)
print('Sensitivity:',sensitivity*100)
specificity=tn/(fp+tn)
print('Specificity:',specificity*100)
OUTPUT
[[75 4]
[13 28]]
Accuracy of the model: 85.83333333333333
Precision of the model: 87.5
Recall of the model: 68.29268292682927
F1 Measure of the model: 76.7123287671233
Sensitivity: 85.22727272727273
Specificity: 87.5
Module 10
AIM:
Support Vector Machines (SVMs)
DESCRIPTION:
Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression
and outlier detection.
The advantages of support vector machines are:
Effective in high-dimensional spaces.
Still effective in cases where the number of dimensions is greater than the number of samples.
PROGRAM:
import pandas as pd
from sklearn.model_selection import train_test_split as tts
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix,accuracy_score
data=pd.read_csv('Social_Network_Ads.csv')
x=data.iloc[:,[2,3]].values
y=data.iloc[:,4].values
x_train,x_test,y_train,y_test=tts(x,y,test_size=0.25,random_state=0)
sc=StandardScaler()
x_train=sc.fit_transform(x_train)
x_test=sc.transform(x_test)   # transform (not refit) the test data with the scaler fitted on the training data
model=SVC(kernel='rbf',random_state=0)
# Why ‘rbf’, because it is nonlinear and gives better results as compared to linear
model.fit(x_train,y_train)
y_pred=model.predict(x_test)
cm=confusion_matrix(y_test,y_pred)
print("Confusion Matrix : \n",cm)
print("\nAccuracy Score :",accuracy_score(y_test,y_pred)*100)
OUTPUT:
Confusion Matrix :
[[64 4]
[ 3 29]]
Accuracy Score : 93.0
Module 11
AIM:
Random Forest model
DESCRIPTION:
The random forest is a classification algorithm consisting of many decision trees. It uses bagging
and feature randomness when building each individual tree, trying to create an uncorrelated forest of trees
whose prediction by committee is more accurate than that of any individual tree.
PROGRAM:
import pandas as pd
from sklearn.model_selection import train_test_split as tts
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix,accuracy_score
from sklearn.ensemble import RandomForestClassifier
data=pd.read_csv("Social_Network_Ads.csv")
x=data.iloc[:,[2,3]].values
y=data.iloc[:,4].values
x_train,x_test,y_train,y_test=tts(x,y,test_size=0.3,random_state=0)
sc=StandardScaler()
x_train=sc.fit_transform(x_train)
x_test=sc.transform(x_test)   # transform (not refit) the test data with the scaler fitted on the training data
forest=RandomForestClassifier(criterion='gini',n_estimators=10)
forest.fit(x_train, y_train)
y_pred = forest.predict(x_test)
cm=confusion_matrix(y_test,y_pred)
print("Confusion Matrix : \n",cm)
print("\nAccuracy Score :",accuracy_score(y_test,y_pred)*100)
OUTPUT:
Confusion Matrix :
[[72 7]
[ 5 36]]
Accuracy Score : 90.0
Module 12
AIM:
AdaBoost Classifier and XGBoost
DESCRIPTION:
AdaBoost:
AdaBoost, short for Adaptive Boosting, is a statistical classification meta-algorithm formulated
by Yoav Freund and Robert Schapire in 1995, who won the 2003 Gödel Prize for their work. It can be used
in conjunction with many other types of learning algorithms to improve performance. The output of the
other learning algorithms ('weak learners') is combined into a weighted sum that represents the final output
of the boosted classifier. Usually, AdaBoost is presented for binary classification, although it can be
generalized to multiple classes or bounded intervals on the real line.
XGBoost:
XGBoost is an optimized gradient boosting machine learning library. It was originally written in C++ but
has APIs in several other languages. The core XGBoost algorithm is parallelizable, i.e. it can parallelize
work within a single tree.
PROGRAM FOR XGBOOST:
import pandas as pd
from sklearn.model_selection import train_test_split as tts
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix,accuracy_score
df= pd.read_csv('diabetes.csv')
x=df[['Age','Glucose']]
y=df['Outcome']
# train/test split (assumed parameters; a 25% test size matches the 192 test samples in the output below)
x_train,x_test,y_train,y_test=tts(x,y,test_size=0.25,random_state=0)
sc=StandardScaler()
x_train=sc.fit_transform(x_train)
x_test=sc.transform(x_test)   # transform (not refit) the test data with the scaler fitted on the training data
model = XGBClassifier()
model.fit(x_train, y_train)
y_pred = model.predict(x_test)
cm=confusion_matrix(y_test,y_pred)
print("Confusion Matrix : \n",cm)
Accuracy =accuracy_score(y_test,y_pred)
print("Accuracy:",Accuracy*100)
OUTPUT:
Confusion Matrix :
[[103 27]
[ 30 32]]
Accuracy: 70.3125
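The AIM also names the AdaBoost classifier; a minimal AdaBoost sketch on the same diabetes data (the split parameters are assumptions):
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import confusion_matrix, accuracy_score
df = pd.read_csv('diabetes.csv')
x = df[['Age', 'Glucose']]
y = df['Outcome']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)
# AdaBoost combines many weak learners (decision stumps by default) into a weighted vote
model = AdaBoostClassifier(n_estimators=50, random_state=0)
model.fit(x_train, y_train)
y_pred = model.predict(x_test)
print("Confusion Matrix :\n", confusion_matrix(y_test, y_pred))
print("Accuracy :", accuracy_score(y_test, y_pred) * 100)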