This document contains an index of 24 Python programs related to machine learning algorithms and techniques, including linear regression, logistic regression, KNN, clustering, neural networks, and natural language processing. The programs are applied to various datasets to predict values, classify images and text, and gain insights from data. The performance of each model is evaluated using various metrics to interpret and compare results.

Subject: KR&AI, ML and DM Practical Sem-III MCA-II yr

Name: Sourabh Vivek Chougule    Roll No.: 109


INDEX

Sr. No.  Date        Lab Title

1    1-Dec-21    Write a Python program to find the correlation matrix.
2    3-Dec-21    Plot the correlation plot on the dataset and visualize it, giving an overview of the relationships among the data, on the iris data.
3    08-Dec-21   Implement ANOVA testing on the iris dataset, using only one independent variable, i.e. Species (Iris-setosa, Iris-versicolor, Iris-virginica), which is categorical, and sepal width as a continuous variable.
4    29-Nov-21   Write a Python program to predict mpg (miles per gallon) for a car based on the variable wt by applying simple linear regression on the 'mtcars' dataset (use 80% training data and 20% testing data). Record the performance of the model in terms of MAE, MSE, RMSE and R-squared. Change the split to 70% training and 30% testing, then compare and interpret the performance of the model.
5    07-Dec-21   Write a Python program to predict mpg for a car based on the variables wt, cyl and disp by applying multiple linear regression on the 'mtcars' dataset (80% training, 20% testing). Record the performance in terms of MAE, MSE, RMSE and R-squared. Remove the variable disp from the feature set and check the performance again; compare and interpret the performance of the model.
6    07-Jan-22   Write a Python program to predict mpg for a car based on the variables wt, cyl and disp by applying multiple linear regression on the 'mtcars' dataset (80% training, 20% testing). Record the performance in terms of MAE, MSE, RMSE and R-squared. Replace disp with the drat variable in the feature set and check the performance again; interpret the performance of the model.
7    17-Jan-22   Write a Python program to predict fruit (Apple or Orange) based on its size and weight by applying logistic regression on the 'apples_and_oranges' dataset (80% training, 20% testing). Evaluate the model using the accuracy score, classification report, confusion matrix and AUC-ROC score, and interpret its performance.
8    21-Jan-22   Write a Python program to predict fruit (Apple or Orange) based on its size and weight by applying the K-Nearest Neighbour (KNN) model on the 'apples_and_oranges' dataset (80% training, 20% testing). Evaluate the model using the accuracy score, classification report, confusion matrix and AUC-ROC score, and interpret its performance.
9    14-Dec-21   Implement the K-means algorithm on unsupervised data of a mall containing basic information (ID, age, gender, income, spending score) about the customers. Find the clusters based on income and spending.
10   16-Dec-21   Implement the Agglomerative Hierarchical Clustering algorithm on unsupervised data of a mall containing basic information (ID, age, gender, income, spending score) about the customers. Find the clusters based on income and spending.
11               Write a Python program to create an association algorithm for supervised classification on any dataset.
12   11-Feb-22   Write a Python program to predict the species (Setosa, Versicolor or Virginica) of a new iris flower based on the length and width of its petals and sepals by applying the Decision Tree model on the 'iris' dataset (80% training, 20% testing). Evaluate the model using the accuracy score, classification report, confusion matrix and AUC-ROC score, and interpret its performance.
13   07-Feb-22   Write a Python program to predict the species (Setosa, Versicolor or Virginica) of a new iris flower based on the length and width of its petals and sepals by applying the Naive Bayes classification model on the 'iris' dataset (80% training, 20% testing). Evaluate the model using the accuracy score, classification report, confusion matrix and AUC-ROC score, and interpret its performance.
14   24-Jan-22   Write a Python program to predict fruit (Apple or Orange) based on its size and weight by applying the Support Vector Machine (SVM) model on the 'apples_and_oranges' dataset (80% training, 20% testing). Evaluate the model using the accuracy score, classification report, confusion matrix and AUC-ROC score, and interpret its performance.
15   28-Jan-22   Write a Python program to predict the species (Setosa, Versicolor or Virginica) of a new iris flower based on the length and width of its petals and sepals by applying the Support Vector Machine (SVM) model on the 'iris' dataset (80% training, 20% testing). Evaluate the model using the accuracy score, classification report, confusion matrix and AUC-ROC score, and interpret its performance.
16   04-Feb-22   Write a Python program to predict whether a person will have a stroke or not, based on age and bmi, by applying the Support Vector Machine (SVM) model on the 'healthcare-dataset-stroke-data' dataset (80% training, 20% testing). Evaluate the model using the accuracy score, classification report, confusion matrix and AUC-ROC score, tuning the hyperparameters of the SVM model, and interpret its performance.
17   20-Dec-22   Python program to implement text mining basics: i. tokenization, ii. finding the frequency distribution, iii. removing punctuation, iv. stemming.
18   09-Feb-22   Program to implement text mining: sentiment analysis, using an RNN LSTM learning model on a dataset of tweets about an airline.
19   21-Jan-22   Implement Python visualizations on cluster data.
20   01-Feb-22   Create and visualize a simple ANN problem to understand the implementation of an artificial neuron using Python.
21   02-Feb-22   Program to pre-process Australian weather data and implement an Artificial Neural Network to predict the weather.
22   04-Feb-22   Write a Python program to prepare data for a convolutional neural network (CNN) and create an image classifier, using the cat and dog training and test dataset.
23   09-Feb-22   Write a Python program to implement an RNN by building a character-level prediction RNN and training it on the text of "Harry Potter and the Philosopher's Stone".
24   10-Feb-22   Write a Python program to implement a GAN to create a curve resembling a sine wave; the Python library PyTorch must be used to set a random generator.

Q1) Write a Python program to find the correlation matrix.

Ans: -

import numpy as np

# x represents the total sale in

# dollars

x = [215, 325, 185, 332, 406, 522, 412,
     614, 544, 421, 445, 408]

# y represents the temperature on

# each day of sale

y = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1,

19.4, 25.1, 23.4, 18.1, 22.6, 17.2]

# create correlation matrix

matrix = np.corrcoef(x, y)

# print matrix

print(matrix)

OUTPUT:-

[[1. 0.95750662]
[0.95750662 1. ]]

Q2) Plot the correlation plot on the dataset and visualize it, giving an overview of relationships among the data, on the iris data.
Ans: -

#importing necessary libraries


from scipy import stats as st
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

#mount the drive


from google.colab import drive
drive.mount("/content/drive", force_remount=True)

#reading and printing csv data


data = pd.read_csv("/content/drive/MyDrive/Dataset/Iris.csv")
data.head()

#visualizing the correlation coefficient
np.corrcoef(data['SepalLengthCm'],data['SepalWidthCm'])

#calculating Pearson's coefficient


st.pearsonr(data['SepalLengthCm'],data['SepalWidthCm'])

#it will give the correlation of each metric with every other one


data.corr()

#Plotting heatmap; its colour indicates the degree of correlation
sns.heatmap(data.corr())

#Plotting scatterplots of each quantity to visualize correlation


plt.figure(figsize=(15,8))
plt.subplot(231)
sns.scatterplot(data['SepalLengthCm'],data['SepalWidthCm'])
plt.subplot(232)
sns.scatterplot(data['SepalLengthCm'],data['PetalLengthCm'])
plt.subplot(233)
sns.scatterplot(data['SepalLengthCm'],data['PetalWidthCm'])
plt.subplot(234)
sns.scatterplot(data['SepalWidthCm'],data['SepalLengthCm'])
plt.subplot(235)
sns.scatterplot(data['SepalWidthCm'],data['PetalLengthCm'])
plt.subplot(236)
sns.scatterplot(data['SepalWidthCm'],data['PetalWidthCm'])

plt.figure(figsize=(15,8))
plt.subplot(231)
sns.scatterplot(data['PetalLengthCm'],data['SepalLengthCm'])
plt.subplot(232)
sns.scatterplot(data['PetalLengthCm'],data['SepalWidthCm'])
plt.subplot(233)
sns.scatterplot(data['PetalLengthCm'],data['PetalWidthCm'])
plt.subplot(234)
sns.scatterplot(data['PetalWidthCm'],data['SepalLengthCm'])
plt.subplot(235)
sns.scatterplot(data['PetalWidthCm'],data['SepalWidthCm'])
plt.subplot(236)
sns.scatterplot(data['PetalWidthCm'],data['PetalLengthCm'])

#Spearmans coeficient
from scipy.stats import spearmanr
spearmanr(data['SepalLengthCm'],data['SepalWidthCm'])
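scipy's spearmanr above gives the coefficient for a single pair of columns; the full Spearman correlation matrix can also be obtained directly from pandas. A small sketch, assuming the same data frame loaded above:

#Spearman rank correlation between every pair of numeric columns
print(data.corr(method='spearman'))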

Q3) Implementing the ANOVA testing on iris dataset. Using only one independent variable i.e. Species
(iris-setosa, iris-versicolor, iris-virginica) which are categorical and sepal width as a continuous
variable.

Ans: -

# importing the necessary libraries


from sklearn.datasets import load_iris
import pandas as pd
import seaborn as sns
from sklearn.feature_selection import f_classif
from sklearn.feature_selection import SelectKBest
from scipy.stats import shapiro
from scipy import stats
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.stats.multicomp import pairwise_tukeyhsd
from statsmodels.sandbox.stats.multicomp import TukeyHSDResults
from statsmodels.graphics.factorplots import interaction_plot
from pandas.plotting import scatter_matrix

# loading the dataset


df = pd.read_csv('D:\SIBAR MCA\KR&AI 2021\Lab AI\DataSet\Iris.csv')
df.head()

dataframe_iris = df[['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']].copy()
dataframe_iris.columns = ['sepalLength', 'sepalWidth', 'petalLength', 'petalWidth']

# Visualising the dataframe by plotting


scatter_matrix(dataframe_iris[['sepalLength', 'sepalWidth',
'petalLength','petalWidth']],figsize=(15,10))
plt.show()

ID = []
for i in range(0, 150):
    ID.append(i)
dataframe = pd.DataFrame(ID, columns=['ID'])

# numeric encoding of the Species column, used below as the 'target' variable
dataframe_iris1 = pd.DataFrame(pd.factorize(df['Species'])[0], columns=['target'])

dataframe_iris_new = pd.concat([dataframe_iris, dataframe_iris1, dataframe], axis=1)
dataframe_iris_new.columns

fig = interaction_plot(dataframe_iris_new.sepalWidth, dataframe_iris_new.target,
                       dataframe_iris_new.ID, colors=['red', 'blue', 'green'], ms=12)

dataframe_iris_new.info()

dataframe_iris_new.describe()

# To implement the ANOVA test we have to state a null hypothesis and an alternate hypothesis
# Null hypothesis: the sample means are equal
# Alternate hypothesis: the sample means are not equal

print(dataframe_iris_new['sepalWidth'].groupby(dataframe_iris_new['target']).mean())

dataframe_iris_new.mean()

# ANOVA calculates an f-value and a p-value.

# P-value: the p-value is used to evaluate the hypothesis test result.
# If the p-value < 0.05 we reject the null hypothesis; if the p-value > 0.05 we fail to reject it.
# F-value: the f-value is the ratio of the variance between groups to the variance within groups.
# If the f-value is close to 1 we say that the null hypothesis is likely true.
# To check whether the variances between groups are equal, ANOVA uses the Levene/Bartlett test.
# Check the normal distribution of the data with the Shapiro-Wilk test.

stats.shapiro(dataframe_iris_new['sepalWidth'][dataframe_iris_new['target']])

(0.7824662327766418, 1.1907719276761652e-13)

OUTPUT
Interpretation: As the p-value is significant, we reject the null hypothesis of normality.

# Check equality of variance between groups(levene/bartlett test)

p_value = stats.levene(dataframe_iris_new['sepalWidth'], dataframe_iris_new['target'])
p_value

LeveneResult(statistic=55.1738582824089, pvalue=1.1695737027924642e-12)

OUTPUT
Interpretation: As the p-value is significant, we reject the null hypothesis of equal variances.

F_value, P_value = stats.f_oneway(dataframe_iris_new['sepalWidth'], dataframe_iris_new['target'])
print("F_value=", F_value, ",", "P_value=", P_value)

F_value= 737.2872570149498 , P_value= 1.418242288711535e-82

OUTPUT:
Interpretation: As the f-value is much greater than 1.0, the sample means differ, and we reject the null hypothesis.
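Note that stats.f_oneway above is handed the sepal-width column and the numeric target column as its two samples. A more conventional one-way ANOVA for this exercise compares sepal width across the three species groups; a minimal sketch, assuming the Iris.csv columns used earlier:

from scipy import stats

# one sample of sepal widths per species
setosa = df[df['Species'] == 'Iris-setosa']['SepalWidthCm']
versicolor = df[df['Species'] == 'Iris-versicolor']['SepalWidthCm']
virginica = df[df['Species'] == 'Iris-virginica']['SepalWidthCm']

f_value, p_value = stats.f_oneway(setosa, versicolor, virginica)
print("F_value=", f_value, ",", "P_value=", p_value)
# a p-value below 0.05 again means the mean sepal width differs between species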

Q4) Write a Python program to predict mpg (miles per gallon) for a car based on variable wt by
applying simple linear regression on 'mtcars' dataset (Use Training data 80% and Testing Data
20%).
Record the performance of model in terms of MAE, MSE, RMSE and R-squared value.
Change Training data to 70% and Testing Data 30%, compare & interpret the performance of your
model.

Ans: -
1st part-

from google.colab import drive

drive.mount("/content/drive",force_remount = True)

import numpy as np

import pandas as pd

df = pd.read_csv("/content/drive/MyDrive/Dataset/mtcars.csv")

print(df)

print(df.head(5))

x = df.iloc[:,[6]].values

y = df.iloc[:,1].values

print(x)

from sklearn.model_selection import train_test_split

x_train,x_test,y_train,y_test = train_test_split(x,y,test_size = .2,random_state=3)

print(x_test)

from sklearn.linear_model import LinearRegression

LinearRegressor = LinearRegression()

LinearRegressor.fit(x_train, y_train)

y_pred = LinearRegressor.predict(x_test)

from math import sqrt

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

print('Mean Absolute Error: %.2f' % mean_absolute_error(y_test,y_pred))

print('Root Mean Absolute Error: %.2f' % sqrt(mean_absolute_error(y_test,y_pred)))

print('Mean Squared Error: %.2f' % mean_squared_error(y_test,y_pred))

print('R2-score: %.2f' % r2_score(y_test,y_pred))

Output-----

Mean Absolute Error: 2.71

Root Mean Absolute Error: 1.65

Mean Squared Error: 10.18

R2-score: 0.81

2nd part -

from google.colab import drive

drive.mount("/content/drive",force_remount = True)

import numpy as np

import pandas as pd

df = pd.read_csv("/content/drive/MyDrive/Dataset/mtcars.csv")

print(df)

print(df.head(5))

x = df.iloc[:,[6]].values

y = df.iloc[:,1].values

print(x)

from sklearn.model_selection import train_test_split

x_train,x_test,y_train,y_test = train_test_split(x,y,test_size = .3,random_state=3)

print(x_test)

from sklearn.linear_model import LinearRegression

LinearRegressor = LinearRegression()

LinearRegressor.fit(x_train, y_train)

y_pred = LinearRegressor.predict(x_test)

from math import sqrt

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

print('Mean Absolute Error: %.2f' % mean_absolute_error(y_test,y_pred))

print('Root Mean Absolute Error: %.2f' % sqrt(mean_absolute_error(y_test,y_pred)))

print('Mean Squared Error: %.2f' % mean_squared_error(y_test,y_pred))

print('R2-score: %.2f' % r2_score(y_test,y_pred))

Output: ----

Mean Absolute Error: 2.31

Root Mean Absolute Error: 1.52

Mean Squared Error: 9.12

R2-score: 0.84
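One note on the metrics: the value printed above as "Root Mean Absolute Error" is the square root of the MAE, whereas the RMSE asked for in the exercise is the square root of the MSE. A minimal addition, assuming the same y_test and y_pred as in either part (the same applies to the later regression programs):

from math import sqrt
from sklearn.metrics import mean_squared_error

# RMSE is the square root of the mean squared error
print('Root Mean Squared Error: %.2f' % sqrt(mean_squared_error(y_test, y_pred)))

With the values reported above, the RMSE would be sqrt(10.18) ≈ 3.19 for the 80/20 split and sqrt(9.12) ≈ 3.02 for the 70/30 split.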

Q5) Write a Python program to predict mpg (miles per gallon) for a car based on variables wt, cyl &
disp by applying multi-linear regression on 'mtcars' dataset (Use Training data 80% and Testing
Data 20%).
Record the performance of model in terms of MAE, MSE, RMSE and R-squared value.
Remove variable disp from the feature set and check the performance again. Compare & interpret
the performance of your model.

Ans: -
1st part-

from google.colab import drive


drive.mount("/content/drive",force_remount = True)

import numpy as np
import pandas as pd

df = pd.read_csv("/content/drive/MyDrive/Dataset/mtcars.csv")

print(df)
print(df.head(5))

x = df.iloc[:,[2,3,6]].values
y = df.iloc[:,1].values

print(x)

from sklearn.model_selection import train_test_split

x_train,x_test,y_train,y_test = train_test_split(x,y,test_size = .2,random_state=3)

print(x_test)

from sklearn.linear_model import LinearRegression


LinearRegressor = LinearRegression()
LinearRegressor.fit(x_train, y_train)

y_pred = LinearRegressor.predict(x_test)

from math import sqrt

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


print('Mean Absolute Error: %.2f' % mean_absolute_error(y_test,y_pred))

print('Root Mean Absolute Error: %.2f' % sqrt(mean_absolute_error(y_test,y_pred)))

print('Mean Squared Error: %.2f' % mean_squared_error(y_test,y_pred))
print('R2-score: %.2f' % r2_score(y_test,y_pred))

Output-----
Mean Absolute Error: 2.71

Root Mean Absolute Error: 1.65

Mean Squared Error: 10.18


R2-score: 0.81

2nd part-
from google.colab import drive

drive.mount("/content/drive",force_remount = True)

import numpy as np
import pandas as pd

df = pd.read_csv("/content/drive/MyDrive/Dataset/mtcars.csv")
print(df)

print(df.head(5))

x = df.iloc[:,[2,6]].values

y = df.iloc[:,1].values

print(x)

from sklearn.model_selection import train_test_split

x_train,x_test,y_train,y_test = train_test_split(x,y,test_size = .2,random_state=3)


print(x_test)

from sklearn.linear_model import LinearRegression


LinearRegressor = LinearRegression()

LinearRegressor.fit(x_train, y_train)

y_pred = LinearRegressor.predict(x_test)

from math import sqrt

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
print('Mean Absolute Error: %.2f' % mean_absolute_error(y_test,y_pred))

print('Root Mean Absolute Error: %.2f' % sqrt(mean_absolute_error(y_test,y_pred)))

print('Mean Squared Error: %.2f' % mean_squared_error(y_test,y_pred))


print('R2-score: %.2f' % r2_score(y_test,y_pred))

Output: ----

Mean Absolute Error: 2.66


Root Mean Absolute Error: 1.63

Mean Squared Error: 9.84

R2-score: 0.81

Q6) Write a Python program to predict mpg (miles per gallon) for a car based on variables wt, cyl &
disp by applying multi-linear regression on 'mtcars' dataset (Use Training data 80% and Testing
Data 20%).
Record the performance of model in terms of MAE, MSE, RMSE and R-squared value .
Replace disp by drat variable in the feature set and check the performance again. Interpret the
performance of your model.
Ans: -

1st part-

from google.colab import drive

drive.mount("/content/drive",force_remount = True)

import numpy as np

import pandas as pd

df = pd.read_csv("/content/drive/MyDrive/Dataset/mtcars.csv")

print(df)

print(df.head(5))

x = df.iloc[:,[2,3,6]].values

y = df.iloc[:,1].values

print(x)

from sklearn.model_selection import train_test_split

x_train,x_test,y_train,y_test = train_test_split(x,y,test_size = .2,random_state=3)

print(x_test)

from sklearn.linear_model import LinearRegression

LinearRegressor = LinearRegression()

LinearRegressor.fit(x_train, y_train)

y_pred = LinearRegressor.predict(x_test)

from math import sqrt

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

print('Mean Absolute Error: %.2f' % mean_absolute_error(y_test,y_pred))

print('Root Mean Absolute Error: %.2f' % sqrt(mean_absolute_error(y_test,y_pred)))

print('Mean Squared Error: %.2f' % mean_squared_error(y_test,y_pred))

print('R2-score: %.2f' % r2_score(y_test,y_pred))

Output-----

Mean Absolute Error: 2.71

Root Mean Absolute Error: 1.65

Mean Squared Error: 10.18

R2-score: 0.81

2nd part -

from google.colab import drive

drive.mount("/content/drive",force_remount = True)

import numpy as np

import pandas as pd

df = pd.read_csv("/content/drive/MyDrive/Dataset/mtcars.csv")

print(df)

print(df.head(5))

x = df.iloc[:,[2,5,6]].values

y = df.iloc[:,1].values

print(x)

from sklearn.model_selection import train_test_split

x_train,x_test,y_train,y_test = train_test_split(x,y,test_size = .2,random_state=3)

print(x_test)

from sklearn.linear_model import LinearRegression

LinearRegressor = LinearRegression()

LinearRegressor.fit(x_train, y_train)

y_pred = LinearRegressor.predict(x_test)

from math import sqrt

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

print('Mean Absolute Error: %.2f' % mean_absolute_error(y_test,y_pred))

print('Root Mean Absolute Error: %.2f' % sqrt(mean_absolute_error(y_test,y_pred)))

print('Mean Squared Error: %.2f' % mean_squared_error(y_test,y_pred))

print('R2-score: %.2f' % r2_score(y_test,y_pred))

Output: ----

Mean Absolute Error: 2.66

Root Mean Absolute Error: 1.63

Mean Squared Error: 9.84

R2-score: 0.81

Q7) Write a Python program to predict fruit (Apple or Orange) based on its size & weight by applying
logistic regression on 'apples_and_oranges' dataset (Use Training data 80% and Testing Data
20%).
Evaluate the performance of the model using Accuracy Score metric, Classification Report &
Confusion Matrix, AUC ROC score for the model and interpret the model performance.
Ans: -

1st part-

from google.colab import drive

drive.mount("/content/drive",force_remount = True)

import numpy as np

import pandas as pd

df = pd.read_csv("/content/drive/MyDrive/Dataset/apples_and_oranges.csv")

print(df.head(5))

x = df.iloc[:,0:2].values

y = df.iloc[:,2].values.reshape(-1,1)

print(x)

from sklearn.model_selection import train_test_split

#x holds the weight and size features; y holds the fruit class

#test_size = .2 means 20% of the data is used for testing and 80% for training

#random_state can be any integer (2, 43, 45, ...); it fixes which rows are allotted to the training and testing sets

x_train,x_test,y_train,y_test = train_test_split(x,y,test_size = .2,random_state=2)

print(x_test)

from sklearn.linear_model import LogisticRegression

LogRegressor = LogisticRegression()

LogRegressor.fit(x_train, y_train)

y_pred = LogRegressor.predict(x_test)

# y_pred = LogRegressor.predict([[70,5.30]])

print(y_pred)

Output-----

['orange' 'orange' 'apple' 'orange' 'orange' 'apple' 'orange' 'orange']

2nd part-

from google.colab import drive

drive.mount("/content/drive",force_remount = True)

import numpy as np

import pandas as pd

df = pd.read_csv("/content/drive/MyDrive/Dataset/apples_and_oranges.csv")

print(df.head(5))

x = df.iloc[:,0:2].values

y = df.iloc[:,2].values.reshape(-1,1)

print(x)

from sklearn.model_selection import train_test_split

#x holds the weight and size features; y holds the fruit class

#test_size = .2 means 20% of the data is used for testing and 80% for training

#random_state can be any integer (2, 43, 45, ...); it fixes which rows are allotted to the training and testing sets

x_train,x_test,y_train,y_test = train_test_split(x,y,test_size = .2,random_state=2)

print(x_test)

from sklearn.linear_model import LogisticRegression

LogRegressor = LogisticRegression()

LogRegressor.fit(x_train, y_train)

y_pred = LogRegressor.predict(x_test)

# y_pred = LogRegressor.predict([[70,5.30]])

print(y_pred)

from sklearn.metrics import confusion_matrix,accuracy_score,classification_report,roc_auc_score

print("Accuracy_Score", accuracy_score(y_test, y_pred))

print("Confusion_matrix\n", confusion_matrix(y_test, y_pred))

print("Classification_Report\n", classification_report(y_test, y_pred))

#print("Roc_Auc_Score\n", roc_auc_score(y_test, y_pred))

Output: ----

Accuracy_Score 1.0

Confusion_matrix

[[2 0]

[0 6]]

Classification_Report

precision recall f1-score support

apple 1.00 1.00 1.00 2

orange 1.00 1.00 1.00 6

accuracy 1.00 8

macro avg 1.00 1.00 1.00 8

weighted avg 1.00 1.00 1.00 8

-------------------------------------------------------------------------------------------------------------------------------------------------------
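The roc_auc_score line is commented out above because the labels are the strings 'apple'/'orange' and predict() returns labels rather than scores. One way to get the AUC, sketched under the assumption that the fitted LogRegressor and the same test split are reused, is to score the probability of one class and encode the true labels with the same class ordering:

from sklearn.metrics import roc_auc_score

# probability of the class stored second in LogRegressor.classes_
pred_prob = LogRegressor.predict_proba(x_test)[:, 1]
# encode the true labels as 0/1 using that same class ordering
y_true = (y_test.ravel() == LogRegressor.classes_[1]).astype(int)
print("Roc_Auc_Score:", roc_auc_score(y_true, pred_prob))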

Q8) Write a Python program to predict fruit (Apple or Orange) based on its size & weight by
applying K-Nearest Neighbour (KNN) model on 'apples_and_oranges' dataset (Use Training data
80% and Testing Data 20%).
Evaluate the performance of the model using Accuracy Score metric, Classification Report &
Confusion Matrix, AUC ROC score for the model and interpret the model performance.
Ans: -

#Mounted the drive

from google.colab import drive

drive.mount("/content/drive", force_remount = True)

#import dataset & necessary libraries

import pandas as pd

import numpy as np

df = pd.read_csv("/content/drive/My Drive/Dataset/apples_and_oranges.csv")

x = df.iloc[:, 0:2].values

y = df.iloc[:, 2].values

df.head(5)

#Spliting the dataset for training & testing purpose

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x,y, test_size=.2, random_state = 100)

#Training KNN Model using fit() function

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5)

knn.fit(x_train, y_train)

#Predicted whether fruit is apple or orange

y_pred = knn.predict(x_test)

pred_prob = knn.predict_proba(x_test) #Predicted probability

#print("The predicted probabilities are", pred_prob)

#w = input("Enter Weight of fruit")

#s = input("Enter size of fruit")

#y_pred = knn.predict([[w,s]])

#print("The fruit is", y_pred)

#Evaluating KNN Model

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, roc_auc_score

print("Accuracy Score : ", accuracy_score(y_test, y_pred))

print("Classification Report : \n ", classification_report(y_test, y_pred))

print("Confusion Matrix : \n ", confusion_matrix(y_test, y_pred))

print("AUC ROC Score is : \n ", roc_auc_score(y_test, pred_prob[:,1]))

Output: -

Accuracy Score : 1.0

Classification Report :

precision recall f1-score support

apple 1.00 1.00 1.00 4

orange 1.00 1.00 1.00 4

accuracy 1.00 8

macro avg 1.00 1.00 1.00 8

weighted avg 1.00 1.00 1.00 8

Confusion Matrix:

[[4 0]

[0 4]]

AUC ROC Score is : 1.0

Q9) Implementing the K-means algorithm on unsupervised data of a mall that contains the basic
information (ID, age, gender, income, spending score) about the customers. Finding the clusters
based on the income and spending.
Ans: -
# importing the necessary libraries
from sklearn.preprocessing import StandardScaler

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

import os
import warnings

warnings.filterwarnings('ignore')

# loading the dataset


df = pd.read_csv(r'D:\SIBAR MCA\KR&AI 2021\Lab AI\DataSet\Mall_Customers.csv')
df.head()

OUTPUT:
CustomerID Gender Age Annual Income (k$) Spending Score (1-100)
1 Male 19 15 39
2 Male 21 15 81
3 Female 20 16 6
4 Female 23 16 77
5 Female 31 17 40

# renaming the heads


df.rename(index=str, columns={'Annual Income (k$)': 'Income',
'Spending Score (1-100)': 'Score'},
inplace=True)

# data in a detailed way with pairplot


X = df.drop(['CustomerID', 'Gender'], axis=1)
sns.pairplot(df.drop('CustomerID', axis=1), hue='Gender', aspect=1.5)
plt.show()

from sklearn.cluster import KMeans

clusters = []

for i in range(1, 11):
    km = KMeans(n_clusters=i).fit(X)
    clusters.append(km.inertia_)

fig, ax = plt.subplots(figsize=(12, 8))


sns.lineplot(x=list(range(1, 11)), y=clusters, ax=ax)
ax.set_title('Searching for Elbow')
ax.set_xlabel('Clusters')
ax.set_ylabel('Inertia')

# Annotate arrow
ax.annotate('Possible Elbow Point', xy=(3, 140000), xytext=(3, 50000),
xycoords='data',
arrowprops=dict(arrowstyle='->', connectionstyle='arc3',
color='blue', lw=2))

ax.annotate('Possible Elbow Point', xy=(5, 80000), xytext=(5, 150000),


xycoords='data',
arrowprops=dict(arrowstyle='->', connectionstyle='arc3',
color='blue', lw=2))

plt.show()

# based on the elbow points we could use 3 or 5 clusters; creating 5 clusters to classify based on income and spending

km5 = KMeans(n_clusters=5).fit(X)

X['Labels'] = km5.labels_
plt.figure(figsize=(12, 8))
sns.scatterplot(X['Income'], X['Score'], hue=X['Labels'],
palette=sns.color_palette('hls', 5))
plt.title('KMeans with 5 Clusters')
plt.show()

Output:

We can analyze our 5 clusters in detail now:


• Label 0 is low income and low spending
• Label 1 is high income and high spending
• Label 2 is mid income and mid spending
• Label 3 is high income and low spending
• Label 4 is low income and high spending
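To cross-check the cluster counts suggested by the elbow plot, the silhouette score can be compared for a few values of k; this is a minimal sketch, assuming the same feature matrix X built above (the Labels column added after fitting is dropped first):

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

features_only = X.drop('Labels', axis=1)
for k in [3, 4, 5, 6]:
    km = KMeans(n_clusters=k, random_state=0).fit(features_only)
    print(k, "clusters -> silhouette score:", silhouette_score(features_only, km.labels_))
# the k with the highest silhouette score separates the customers most cleanly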

Q10) Implementing the Agglomerative Hierarchical Clustering Algorithm on unsupervised data of
a mall, that contains the basic information (ID, age, gender, income, spending score) about the
customers. Finding the clusters based on the income and spending.
Ans: -
# importing the necessary libraries
from sklearn.preprocessing import StandardScaler

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

import os
import warnings

warnings.filterwarnings('ignore')

# loading the dataset


df = pd.read_csv(r'D:\SIBAR MCA\KR&AI 2021\Lab AI\DataSet\Mall_Customers.csv')
df.head()

OUTPUT:
CustomerID Gender Age Annual Income (k$) Spending Score (1-100)
1 Male 19 15 39
2 Male 21 15 81
3 Female 20 16 6
4 Female 23 16 77
5 Female 31 17 40

# renaming the heads


df.rename(index=str, columns={'Annual Income (k$)': 'Income',
'Spending Score (1-100)': 'Score'},
inplace=True)

# data in a detailed way with pairplot


X = df.drop(['CustomerID', 'Gender'], axis=1)
sns.pairplot(df.drop('CustomerID', axis=1), hue='Gender', aspect=1.5)
plt.show()

from sklearn.cluster import AgglomerativeClustering

agglom = AgglomerativeClustering(n_clusters=5,
linkage='average').fit(X)

X['Labels'] = agglom.labels_
plt.figure(figsize=(12, 8))
sns.scatterplot(X['Income'], X['Score'], hue=X['Labels'],
palette=sns.color_palette('hls', 5))
plt.title('Agglomerative with 5 Clusters')
plt.show()

OUTPUT
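A dendrogram is often drawn alongside agglomerative clustering to show the distances at which clusters merge; this is a minimal sketch using scipy, assuming the Age/Income/Score columns prepared above:

from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt

Z = linkage(X[['Age', 'Income', 'Score']], method='average')  # same linkage as the model above
plt.figure(figsize=(12, 6))
dendrogram(Z, truncate_mode='lastp', p=20)  # show only the last 20 merges
plt.title('Dendrogram (average linkage)')
plt.show()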

Q11) Write a Python program to create an Association algorithm for supervised classification on any
dataset

Ans: -

import numpy as np

import pandas as pd

from mlxtend.frequent_patterns import apriori, association_rules

# Changing the working location to the location of the file

import os
os.chdir(r'C:\Users\Dev\Desktop\Kaggle\Apriori Algorithm')

# Loading the Data

data = pd.read_excel('Online_Retail.xlsx')

data.head()

# Exploring the columns of the data

data.columns

# Exploring the different regions of transactions

data.Country.unique()

# Stripping extra spaces in the description

data['Description'] = data['Description'].str.strip()

# Dropping the rows without any invoice number

data.dropna(axis = 0, subset =['InvoiceNo'], inplace = True)

data['InvoiceNo'] = data['InvoiceNo'].astype('str')

# Dropping all transactions which were done on credit

data = data[~data['InvoiceNo'].str.contains('C')]

# Transactions done in France

basket_France = (data[data['Country'] =="France"]

.groupby(['InvoiceNo', 'Description'])['Quantity']

.sum().unstack().reset_index().fillna(0)

.set_index('InvoiceNo'))

# Transactions done in the United Kingdom

basket_UK = (data[data['Country'] =="United Kingdom"]

.groupby(['InvoiceNo', 'Description'])['Quantity']

.sum().unstack().reset_index().fillna(0)

.set_index('InvoiceNo'))

# Transactions done in Portugal

basket_Por = (data[data['Country'] =="Portugal"]

.groupby(['InvoiceNo', 'Description'])['Quantity']

.sum().unstack().reset_index().fillna(0)

.set_index('InvoiceNo'))

basket_Sweden = (data[data['Country'] =="Sweden"]

.groupby(['InvoiceNo', 'Description'])['Quantity']

.sum().unstack().reset_index().fillna(0)

.set_index('InvoiceNo'))

# Defining the hot encoding function to make the data suitable

# for the concerned libraries

def hot_encode(x):
    if (x <= 0):
        return 0
    if (x >= 1):
        return 1

# Encoding the datasets

basket_encoded = basket_France.applymap(hot_encode)

basket_France = basket_encoded

basket_encoded = basket_UK.applymap(hot_encode)

basket_UK = basket_encoded

basket_encoded = basket_Por.applymap(hot_encode)

basket_Por = basket_encoded

basket_encoded = basket_Sweden.applymap(hot_encode)

basket_Sweden = basket_encoded

# Building the model

frq_items = apriori(basket_France, min_support = 0.05, use_colnames = True)

# Collecting the inferred rules in a dataframe

rules = association_rules(frq_items, metric ="lift", min_threshold = 1)

rules = rules.sort_values(['confidence', 'lift'], ascending =[False, False])

print(rules.head())

frq_items = apriori(basket_UK, min_support = 0.01, use_colnames = True)

rules = association_rules(frq_items, metric ="lift", min_threshold = 1)

rules = rules.sort_values(['confidence', 'lift'], ascending =[False, False])

print(rules.head())

frq_items = apriori(basket_Por, min_support = 0.05, use_colnames = True)

rules = association_rules(frq_items, metric ="lift", min_threshold = 1)

rules = rules.sort_values(['confidence', 'lift'], ascending =[False, False])

print(rules.head())

frq_items = apriori(basket_Sweden, min_support = 0.05, use_colnames = True)

rules = association_rules(frq_items, metric ="lift", min_threshold = 1)

rules = rules.sort_values(['confidence', 'lift'], ascending =[False, False])

print(rules.head())
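The rules DataFrame returned by association_rules can be filtered on its confidence and lift columns to keep only the strongest rules; a small sketch assuming the rules variable built above (the 0.8 and 6 thresholds are illustrative):

# keep only rules with high confidence and high lift
strong_rules = rules[(rules['confidence'] >= 0.8) & (rules['lift'] >= 6)]
print(strong_rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']].head())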

Q12) Write a Python program to predict species (Setosa, Versicolor, or Virginica) for a new iris
flower based on length & width of its petals and sepals by applying Decision Tree model on 'iris'
dataset (Use Training data 80% and Testing Data 20%).
Evaluate the performance of the model using Accuracy Score metric, Classification Report &
Confusion Matrix, AUC ROC score for the model and interpret the model performance.
Ans: -

#Mounted the drive

from google.colab import drive

drive.mount("/content/drive", force_remount = True)

#import dataset & necessary libraries

import pandas as pd

import numpy as np

df = pd.read_csv("/content/Iris.csv")

x = df.iloc[:, 1:5].values

y = df.iloc[:, 5].values

df.head(5)

#Spliting the dataset for training & testing purpose

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x,y, test_size=.2, random_state = 100)

#Training the Model using fit() function

from sklearn.tree import DecisionTreeClassifier

dt=DecisionTreeClassifier()

dt.fit(x_train, y_train)

# Predict species (Setosa, Versicolor, or Viriginica) for a new iris flower

y_pred = dt.predict(x_test)

sepal_length = float(input("Enter the sepal length "))

sepal_width = float(input("Enter the sepal width "))

petal_length = float(input("Enter the petal length "))

petal_width = float(input("Enter the petal width "))

y_pred1 = dt.predict([[sepal_length, sepal_width, petal_length, petal_width]])

print("The flower belongs to ", y_pred1)

#Evaluating the Model

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

print("Accuracy Score : ", accuracy_score(y_test, y_pred))

print("Classification Report : \n ", classification_report(y_test, y_pred))

print("Confusion Matrix : \n ", confusion_matrix(y_test, y_pred))

Output: -

Accuracy Score : 0.9666666666666667


Classification Report :
precision recall f1-score support

Iris-setosa 1.00 1.00 1.00 11


Iris-versicolor 1.00 0.83 0.91 6
Iris-virginica 0.93 1.00 0.96 13

accuracy 0.97 30
macro avg 0.98 0.94 0.96 30
weighted avg 0.97 0.97 0.97 30

Confusion Matrix :
[[11 0 0]
[ 0 5 1]
[ 0 0 13]]
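The fitted decision tree can also be drawn to inspect the split rules it learned; a minimal sketch, assuming the dt classifier and the iris column names used above:

import matplotlib.pyplot as plt
from sklearn.tree import plot_tree

plt.figure(figsize=(14, 8))
plot_tree(dt,
          feature_names=['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm'],
          class_names=list(dt.classes_),
          filled=True)
plt.show()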

Q13) Write a Python program to predict species (Setosa, Versicolor, or Virginica) for a new iris
flower based on length & width of its petals and sepals by applying Naive Bayes classification model
on 'iris' dataset (Use Training data 80% and Testing Data 20%).
Evaluate the performance of the model using Accuracy Score metric, Classification Report &
Confusion Matrix, AUC ROC score for the model and interpret the model performance.
Ans: -

#Mounted the drive

from google.colab import drive

drive.mount("/content/drive", force_remount = True)

#import dataset & necessary libraries

import pandas as pd

import numpy as np

df = pd.read_csv("/content/drive/My Drive/Dataset/Iris.csv")

x = df.iloc[:, 1:5].values

y = df.iloc[:, 5].values

df.head(5)

#Spliting the dataset for training & testing purpose

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x,y, test_size=.2, random_state = 100)

#Training Naive Bayes Model using fit() function

from sklearn.naive_bayes import GaussianNB

nb = GaussianNB()

nb.fit(x_train, y_train)

y_pred = nb.predict(x_test)

sepal_length = float(input("Enter the sepal length "))

sepal_width = float(input("Enter the sepal width "))

petal_length = float(input("Enter the petal length "))

petal_width = float(input("Enter the petal width "))

y_pred1 = nb.predict([[sepal_length, sepal_width, petal_length, petal_width]])

print("The flower belongs to ", y_pred1)

#Evaluating the Model

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, roc_auc_score

print("Accuracy Score : ", accuracy_score(y_test, y_pred))

print("Classification Report : \n ", classification_report(y_test, y_pred))

print("Confusion Matrix : \n ", confusion_matrix(y_test, y_pred))

Ouptut:-

Accuracy Score : 0.9666666666666667

Classification Report :

precision recall f1-score support

Iris-setosa 1.00 1.00 1.00 11

Iris-versicolor 1.00 0.83 0.91 6

Iris-virginica 0.93 1.00 0.96 13

accuracy 0.97 30

macro avg 0.98 0.94 0.96 30

weighted avg 0.97 0.97 0.97 30

Confusion Matrix : [[11 0 0]

[ 0 5 1]

[ 0 0 13]]

Q14) Write a Python program to predict fruit (Apple or Orange) based on its size & weight by
applying Support Vector Machine (SVM) model on 'apples_and_oranges' dataset (Use Training data
80% and Testing Data 20%).
Evaluate the performance of the model using Accuracy Score metric, Classification Report &
Confusion Matrix, AUC ROC score for the model and interpret the model performance.
Ans: -

#Mounted the drive

from google.colab import drive

drive.mount("/content/drive", force_remount = True)

#import dataset & necessary libraries

import pandas as pd

import numpy as np

df = pd.read_csv("/content/apples_and_oranges.csv")

x = df.iloc[:, 0:2].values

y = df.iloc[:, 2].values

df.head(5)

#Spliting the dataset for training & testing purpose

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x,y, test_size=.2, random_state = 100)

#Training SVM Model using fit() function

from sklearn.svm import SVC

svc = SVC()

svc.fit(x_train, y_train)

#Predicted whether fruit is apple or orange

y_pred = svc.predict(x_test)

# w = input("Enter Weight of fruit")

# s = input("Enter size of fruit")

# y_pred1 = svc.predict([[w,s]])

# print("The fruit is", y_pred1)

#Evaluating SVM Model

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

print("Accuracy Score : ", accuracy_score(y_test, y_pred))

print("Classification Report : \n ", classification_report(y_test, y_pred))

print("Confusion Matrix : \n ", confusion_matrix(y_test, y_pred))

Output:-

Accuracy Score : 0.875


Classification Report :
precision recall f1-score support

apple 0.80 1.00 0.89 4


orange 1.00 0.75 0.86 4

accuracy 0.88 8
macro avg 0.90 0.88 0.87 8
weighted avg 0.90 0.88 0.87 8

Confusion Matrix :
[[4 0]
[1 3]]

Q15) Write a Python program to predict species (Setosa, Versicolor, or Virginica) for a new iris
flower based on length & width of its petals and sepals by applying Support Vector Machine (SVM)
model on 'iris' dataset (Use Training data 80% and Testing Data 20%).
Evaluate the performance of the model using Accuracy Score metric, Classification Report &
Confusion Matrix, AUC ROC score for the model and interpret the model performance.
Ans: -


#Mounted the drive

from google.colab import drive

drive.mount("/content/drive", force_remount = True)

#import dataset & necessary libraries

import pandas as pd

import numpy as np

df = pd.read_csv("/content/Iris.csv")

x = df.iloc[:, 1:5].values

y = df.iloc[:, 5].values

df.head(5)

#Spliting the dataset for training & testing purpose

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x,y, test_size=.2, random_state = 100)

#Training SVM Model using fit() function

from sklearn.svm import SVC

svc = SVC(kernel="linear")  # other kernel options: poly, sigmoid, rbf

svc.fit(x_train, y_train)

#Predict the species for the test samples

y_pred = svc.predict(x_test)

#Evaluating SVM Model

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, roc_auc_score

print("Accuracy Score : ", accuracy_score(y_test, y_pred))

print("Classification Report : \n ", classification_report(y_test, y_pred))

print("Confusion Matrix : \n ", confusion_matrix(y_test, y_pred))

print("AUC ROC Score is : \n ", roc_auc_score(y_test, y_pred[:,1]))

Accuracy Score : 0.875


Classification Report :
precision recall f1-score support

apple 0.80 1.00 0.89 4


orange 1.00 0.75 0.86 4

accuracy 0.88 8
macro avg 0.90 0.88 0.87 8
weighted avg 0.90 0.88 0.87 8

Confusion Matrix :
[[4 0]
[1 3]]
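The AUC-ROC required by the exercise needs class probabilities rather than predicted labels. A minimal sketch for the multiclass iris case, assuming the same training split as above (the SVM must be refit with probability=True so that predict_proba is available):

from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

svc_prob = SVC(kernel="linear", probability=True)
svc_prob.fit(x_train, y_train)
proba = svc_prob.predict_proba(x_test)
# one-vs-rest AUC averaged over the three species
print("AUC ROC Score is :", roc_auc_score(y_test, proba, multi_class="ovr", labels=svc_prob.classes_))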

Q16) Write a Python program to predict whether a person will have stroke or not, based on age &
bmi by applying Support Vector Machine (SVM) model on 'healthcare-dataset-stroke-data' dataset
(Use Training data 80% and Testing Data 20%).
Evaluate the performance of the model using Accuracy Score metric, Classification Report & Confusion Matrix, AUC ROC score for the model by tuning hyperparameters for the SVM model and interpret the model performance.
Ans: -

#Mounted the drive

from google.colab import drive

drive.mount("/content/drive", force_remount = True)

#import dataset & necessary libraries

import pandas as pd

import numpy as np

df = pd.read_csv("/content/healthcare-dataset-stroke-data.csv")

x = df.iloc[:, [2,9]].values

y = df.iloc[:, 11].values

df.head(5)

#Spliting the dataset for training & testing purpose

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x,y, test_size=.2, random_state = 100)

#Training SVM Model using fit() function

from sklearn.svm import SVC

svc = SVC(kernel="linear")

svc.fit(x_train, y_train)

#Predict whether person will be having stroke or not

y_pred = svc.predict(x_test)

age = input("Enter the age of a person")

bmi = input("Enter the BMI value of a person")

y_pred1 = svc.predict([[age,bmi]])

if(y_pred1 == 1):

43 | P a g e
print("The person will have stroke")

else:

print("The person will not have stroke")

#Evaluating SVM Model

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

print("Accuracy Score : ", accuracy_score(y_test, y_pred))

print("Classification Report : \n ", classification_report(y_test, y_pred))

print("Confusion Matrix : \n ", confusion_matrix(y_test, y_pred))

Output:-

Accuracy Score : 0.9653767820773931


Classification Report :
precision recall f1-score support

0 0.97 1.00 0.98 948


1 0.00 0.00 0.00 34

accuracy 0.97 982


macro avg 0.48 0.50 0.49 982
weighted avg 0.93 0.97 0.95 982

Confusion Matrix :
[[948 0]
[ 34 0]]
/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_classification.py:1318: UndefinedMetricWarning:
Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use
`zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_classification.py:1318: UndefinedMetricWarning:
Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use
`zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_classification.py:1318: UndefinedMetricWarning:
Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use
`zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result))
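The confusion matrix shows that this model never predicts the stroke class, so the high accuracy is misleading. The exercise asks for hyperparameter tuning; a minimal sketch with GridSearchCV, assuming the same x_train and y_train (the grid values are illustrative, and class_weight='balanced' is included to counter the class imbalance):

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf'],
    'gamma': ['scale', 'auto'],
    'class_weight': [None, 'balanced'],  # 'balanced' reweights the rare stroke class
}
grid = GridSearchCV(SVC(), param_grid, scoring='roc_auc', cv=5)
grid.fit(x_train, y_train)
print("Best parameters:", grid.best_params_)
print("Best cross-validated AUC:", grid.best_score_)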

Q17) Python Program to implement Text Mining Basics:

i. Tokenization
ii. Finding the frequency distribution
iii. Removing punctuations
iv. Stemming
Ans: -

# Tokenization
# Importing necessary library
import pandas as pd
import numpy as np
import nltk
import os
import nltk.corpus
# sample text for performing tokenization
text = "We are learning text mining basics with python. python will help in implementing different algorithms"
# importing word_tokenize from nltk
from nltk.tokenize import word_tokenize
# Passing the string text into word tokenize for breaking the sentences
token = word_tokenize(text)
token
Output:

['We','are','learning','text', 'mining','basics', 'with', 'python', '.',


'python', 'will', 'help', 'in', 'implementing', 'different', 'algorithms']

Program: Finding the frequency distribution in the text

# finding the frequency distribution of the tokens


# Importing FreqDist library from nltk and passing token into FreqDist
from nltk.probability import FreqDist
fdist = FreqDist(token)
fdist
# To find the frequency of top 10 words
fdist1 = fdist.most_common(10)
fdist1

Output:

FreqDist({'python': 2, 'We': 1, 'are': 1, 'learning': 1, 'text': 1, 'mining':


1, 'basics': 1, 'with': 1, 'will': 1, 'help': 1, 'in': 1, 'implementing': 1,
'different': 1, 'algorithms': 1,})
[('python', 2),
('We', 1),
('are', 1),
('learning', 1),
('text', 1),
('mining', 1),
('basics', 1),
('with', 1),
('will', 1),
('help', 1),
('in', 1),
('implementing', 1),
('different', 1),
('algorithms', 1),]

# remove punctuation
import string
text = "Thank you! For learning. Just adding, a few notes, diagrams and ppts."
punct = set(string.punctuation)
text = "".join([ch for ch in text if ch not in punct])
print(text)

Output:

Thank you For learning Just adding a few notes diagrams and ppts

# program for the example of stemming

import nltk
from nltk.stem.porter import PorterStemmer
words = ["walk", "walking", "walked", "walks", "ran", "run", "running",
"runs"]
stemmer = PorterStemmer()

for word in words:


print(word + " ---> " + stemmer.stem(word))

Output:
walk ---> walk
walking ---> walk
walked ---> walk
walks ---> walk
ran ---> ran
run ---> run
running ---> run
runs ---> run

Q18) Program to implement Text Mining: Sentiment Analysis, using an RNN LSTM learning model on a dataset of tweets about an airline.
Ans: -

There are 3 parts to the program :


1. Cleaning data
2. Creating the RNN LSTM Learning model
3. Testing the model on new data.
Program:
# Sentimental analysis using RNN
# Setting up the data for model creation

import pandas as pd

df = pd.read_excel(r"D:\KR&AI\Lab\DataSet\Tweets.xlsx")

# Check the column names


df.columns

# Removing neutral Reviews


review_df = df[df['airline_sentiment'] != 'neutral']

print(review_df.shape)
review_df.head(5)

# convert the categorical values to numeric using the factorize() method

sentiment_label = review_df.airline_sentiment.factorize()

# retrieve all the text data from the dataset.


tweet = review_df.text.values

# Tokenize all the words in the text 


from tensorflow.keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts(tweet)
vocab_size = len(tokenizer.word_index) + 1  # vocabulary size, needed by the Embedding layer below
encoded_docs = tokenizer.texts_to_sequences(tweet)

from tensorflow.keras.preprocessing.sequence import pad_sequences


padded_sequence = pad_sequences(encoded_docs, maxlen=200)

# Sentimental analysis using RNN


# Building the text classifier, using RNN LSTM model. 

from tensorflow.keras.models import Sequential


from tensorflow.keras.layers import LSTM,Dense, Dropout, SpatialDropout1D
from tensorflow.keras.layers import Embedding

embedding_vector_length = 32
model = Sequential()
model.add(Embedding(vocab_size, embedding_vector_length, input_length=200))
model.add(SpatialDropout1D(0.25))
model.add(LSTM(50, dropout=0.5, recurrent_dropout=0.5))
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',optimizer='adam',
metrics=['accuracy'])

print(model.summary())

# Train the sentiment analysis model for 5 epochs on the whole dataset with a batch size of 32 and a validation split of 20%.

history = model.fit(padded_sequence,sentiment_label[0],validation_split=0.2,
epochs=5, batch_size=32)

Output:

# Sentimental analysis using RNN


# Testing the sentiment analysis model on new data
# Define a function that takes a text as input and outputs its prediction label.

def predict_sentiment(text):
    tw = tokenizer.texts_to_sequences([text])
    tw = pad_sequences(tw, maxlen=200)
    prediction = int(model.predict(tw).round().item())
    print("Predicted label: ", sentiment_label[1][prediction])

test_sentence1 = "I enjoyed my journey on this flight."


predict_sentiment(test_sentence1)

test_sentence2 = "This is the worst flight experience of my life!"


predict_sentiment(test_sentence2)

Output:

Q19) Implementing python visualizations on cluster data
Ans: -
# Import pandas and CSV file I/O library
import pandas as pd
# Import seaborn, a Python graphing library
import warnings
warnings.filterwarnings("ignore")
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="white", color_codes=True)

# Load the Iris flower dataset


iris = pd.read_csv(r"D:\KR&AI\Lab\DataSet\Iris.csv")
# Let's see how many examples we have of each species
iris["Species"].value_counts()

# Ploting using the .plot extension from Pandas dataframes


iris.plot(kind="scatter", x="SepalLengthCm", y="SepalWidthCm")

# Using seaborn library to make a plot
sns.jointplot(x="SepalLengthCm", y="SepalWidthCm", data=iris, size=5)
Output:

#use seaborn's FacetGrid to color the scatterplot by species

sns.FacetGrid(iris, hue="Species", size=5) \
   .map(plt.scatter, "SepalLengthCm", "SepalWidthCm") \
   .add_legend()

Q20) Creating & visualizing a simple ANN problem to understand the implementation of an artificial
neuron using python
Ans: -

Training Data:
Input 1 Input 2 Input 3 Output
0 1 1 1
1 0 0 0
1 0 1 1

Test Data:
1 0 1 ?

Solution Program
import numpy as np

class NeuralNetwork():

    def __init__(self):
        # seeding for random number generation
        np.random.seed(1)

        # converting weights to a 3 by 1 matrix with values from -1 to 1 and mean of 0
        self.synaptic_weights = 2 * np.random.random((3, 1)) - 1

    def sigmoid(self, x):
        # applying the sigmoid function
        return 1 / (1 + np.exp(-x))

    def sigmoid_derivative(self, x):
        # computing derivative to the Sigmoid function
        return x * (1 - x)

    def train(self, training_inputs, training_outputs, training_iterations):
        # training the model to make accurate predictions while adjusting weights continually
        for iteration in range(training_iterations):
            # siphon the training data via the neuron
            output = self.think(training_inputs)

            # computing error rate for back-propagation
            error = training_outputs - output

            # performing weight adjustments
            adjustments = np.dot(training_inputs.T, error * self.sigmoid_derivative(output))
            self.synaptic_weights += adjustments

    def think(self, inputs):
        # passing the inputs via the neuron to get output
        # converting values to floats
        inputs = inputs.astype(float)
        output = self.sigmoid(np.dot(inputs, self.synaptic_weights))
        return output

if __name__ == "__main__":

    # initializing the neuron class
    neural_network = NeuralNetwork()

    print("Beginning Randomly Generated Weights: ")
    print(neural_network.synaptic_weights)

    # training data consisting of 4 examples -- 3 input values and 1 output
    training_inputs = np.array([[0, 0, 1],
                                [1, 1, 1],
                                [1, 0, 1],
                                [0, 1, 1]])

    training_outputs = np.array([[0, 1, 1, 0]]).T

    # training taking place
    neural_network.train(training_inputs, training_outputs, 15000)

    print("Ending Weights After Training: ")
    print(neural_network.synaptic_weights)

    user_input_one = str(input("User Input One: "))
    user_input_two = str(input("User Input Two: "))
    user_input_three = str(input("User Input Three: "))

    print("Considering New Situation: ", user_input_one, user_input_two, user_input_three)
    print("New Output data: ")
    print(neural_network.think(np.array([user_input_one, user_input_two, user_input_three])))
    print("Wow, we did it!")

OUTPUT:

Q21) Program to pre-process data of Australian weather and implement an Artificial Neural Network to predict the weather
Ans: -

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime
from sklearn.preprocessing import LabelEncoder
from sklearn import preprocessing
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from keras.layers import Dense, BatchNormalization, Dropout, LSTM
from keras.models import Sequential
from keras.utils import to_categorical
from keras.optimizers import Adam
from tensorflow.keras import regularizers
from sklearn.metrics import precision_score, recall_score, confusion_matrix, classification_report, accuracy_score, f1_score
from keras import callbacks

np.random.seed(0)

# Loading the dataset file


data = pd.read_csv("D:\SIBAR MCA\KR&AI\Lab\DataSet\weatherAUS.csv")
data.head()

# Print the data details


data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 145460 entries, 0 to 145459
Data columns (total 23 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 145460 non-null object
1 Location 145460 non-null object
2 MinTemp 143975 non-null float64
3 MaxTemp 144199 non-null float64
4 Rainfall 142199 non-null float64
5 Evaporation 82670 non-null float64
6 Sunshine 75625 non-null float64
7 WindGustDir 135134 non-null object
8 WindGustSpeed 135197 non-null float64
9 WindDir9am 134894 non-null object
10 WindDir3pm 141232 non-null object
11 WindSpeed9am 143693 non-null float64
12 WindSpeed3pm 142398 non-null float64
13 Humidity9am 142806 non-null float64
14 Humidity3pm 140953 non-null float64
15 Pressure9am 130395 non-null float64
16 Pressure3pm 130432 non-null float64
17 Cloud9am 89572 non-null float64
18 Cloud3pm 86102 non-null float64
19 Temp9am 143693 non-null float64
20 Temp3pm 141851 non-null float64
21 RainToday 142199 non-null object
22 RainTomorrow 142193 non-null object
dtypes: float64(16), object(7)
memory usage: 25.5+ MB

#Parsing datetime
# exploring the length of date objects
lengths = data["Date"].str.len()
lengths.value_counts()

#There don't seem to be any error in dates so parsing values into datetime
data['Date']= pd.to_datetime(data["Date"])
#Creating a collumn of year
data['year'] = data.Date.dt.year

# function to encode datetime into cyclic parameters.
# As I am planning to use this data in a neural network I prefer the months and days in a cyclic continuous feature.

def encode(data, col, max_val):
    data[col + '_sin'] = np.sin(2 * np.pi * data[col] / max_val)
    data[col + '_cos'] = np.cos(2 * np.pi * data[col] / max_val)
    return data

data['month'] = data.Date.dt.month
data = encode(data, 'month', 12)

data['day'] = data.Date.dt.day
data = encode(data, 'day', 31)

data.head()
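A quick check of why the sine/cosine pair is used: month 12 and month 1 are adjacent in time, and the cyclic encoding keeps them as close together as months 1 and 2, which the raw month number would not. A small illustrative snippet (values rounded):

import numpy as np

for m in [1, 2, 12]:
    s = np.sin(2 * np.pi * m / 12)
    c = np.cos(2 * np.pi * m / 12)
    print("month", m, "->", round(s, 3), round(c, 3))
# month 1  -> 0.5    0.866
# month 2  -> 0.866  0.5
# month 12 -> -0.0   1.0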

# roughly a year's span section


# To see if the "year" attribute of data repeats
section = data[:360]
tm = section["day"].plot(color="#C2C4E2")
tm.set_title("Distribution Of Days Over Year")
tm.set_ylabel("Days In month")
tm.set_xlabel("Days In Year")

# Splitting months and days into a sine and cosine combination provides the cyclical continuous feature. This can be used as input features to the ANN.
# Splitting of Month

cyclic_month = sns.scatterplot(x="month_sin",y="month_cos",data=data,
color="#C2C4E2")
cyclic_month.set_title("Cyclic Encoding of Month")
cyclic_month.set_ylabel("Cosine Encoded Months")
cyclic_month.set_xlabel("Sine Encoded Months")

# Splitting of Day

cyclic_day = sns.scatterplot(x='day_sin',y='day_cos',data=data,
color="#C2C4E2")
cyclic_day.set_title("Cyclic Encoding of Day")
cyclic_day.set_ylabel("Cosine Encoded Day")
cyclic_day.set_xlabel("Sine Encoded Day")
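
As a quick, optional sanity check (a small sketch that is not part of the assignment output, using the numpy import from above), the cyclic encoding can be verified on adjacent calendar values: as plain integers December (12) and January (1) are 11 apart, but on the sine/cosine circle they are as close as any other pair of neighbouring months.

# Minimal sketch: verify that adjacent months stay adjacent after encoding
def cyclic(val, max_val):
    return np.sin(2 * np.pi * val / max_val), np.cos(2 * np.pi * val / max_val)

dec, jan, feb = np.array(cyclic(12, 12)), np.array(cyclic(1, 12)), np.array(cyclic(2, 12))
print(np.linalg.norm(dec - jan))   # ~0.52
print(np.linalg.norm(jan - feb))   # ~0.52 -> December and January are true neighbours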

# Processing the data for missing values
# Identifying categorical and numerical columns by dtype
object_cols = [col for col in data.columns if data[col].dtype == 'object']
num_cols = [col for col in data.columns if data[col].dtype == 'float64']

# Filling missing values with the mode of the column, for categorical variables
for i in object_cols:
    data[i].fillna(data[i].mode()[0], inplace=True)

# Filling missing values with the median of the column, for numerical variables
for i in num_cols:
    data[i].fillna(data[i].median(), inplace=True)
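
An optional check (a small sketch, assuming data is the DataFrame prepared above) confirms that no missing values remain once both imputation loops have run:

print(data.isnull().sum().sum())   # expected to print 0 after imputation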

# Printing the Set


data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 145460 entries, 0 to 145459
Data columns (total 30 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 145460 non-null datetime64[ns]
1 Location 145460 non-null object
2 MinTemp 145460 non-null float64
3 MaxTemp 145460 non-null float64
4 Rainfall 145460 non-null float64
5 Evaporation 145460 non-null float64
6 Sunshine 145460 non-null float64
7 WindGustDir 145460 non-null object
8 WindGustSpeed 145460 non-null float64
9 WindDir9am 145460 non-null object
10 WindDir3pm 145460 non-null object
11 WindSpeed9am 145460 non-null float64
12 WindSpeed3pm 145460 non-null float64
13 Humidity9am 145460 non-null float64
14 Humidity3pm 145460 non-null float64
15 Pressure9am 145460 non-null float64
16 Pressure3pm 145460 non-null float64
17 Cloud9am 145460 non-null float64

18 Cloud3pm 145460 non-null float64
19 Temp9am 145460 non-null float64
20 Temp3pm 145460 non-null float64
21 RainToday 145460 non-null object
22 RainTomorrow 145460 non-null object
23 year 145460 non-null int64
24 month 145460 non-null int64
25 month_sin 145460 non-null float64
26 month_cos 145460 non-null float64
27 day 145460 non-null int64
28 day_sin 145460 non-null float64
29 day_cos 145460 non-null float64
dtypes: datetime64[ns](1), float64(20), int64(3), object(6)
memory usage: 33.3+ MB

# Apply label encoder to each column with categorical data


label_encoder = LabelEncoder()
for i in object_cols:
data[i] = label_encoder.fit_transform(data[i])
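
For illustration (a sketch with assumed example values, not part of the assignment), the mapping LabelEncoder learns for a binary column such as RainToday can be inspected as follows; classes are sorted alphabetically, so 'No' becomes 0 and 'Yes' becomes 1:

enc = LabelEncoder()
enc.fit(["No", "Yes"])   # assumed values of the RainToday / RainTomorrow columns
print(dict(zip(enc.classes_, enc.transform(enc.classes_))))   # {'No': 0, 'Yes': 1}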

# Preparing attributes for scaling

features = data.drop(['RainTomorrow', 'Date', 'day', 'month'], axis=1)  # dropping target and extra columns
target = data['RainTomorrow']

#Set up a standard scaler for the features


col_names = list(features.columns)
s_scaler = preprocessing.StandardScaler()
features = s_scaler.fit_transform(features)
features = pd.DataFrame(features, columns=col_names)

features.describe().T

# Creating the model
# Assigning X (feature attributes) and y (target labels); RainTomorrow and the
# extra date columns were already dropped from features above, so the scaled
# features and the encoded target are used directly.
X = features
y = target

# Splitting test and training sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2,
random_state = 42)

X.shape

#Early stopping
early_stopping = callbacks.EarlyStopping(
min_delta=0.001, # minimium amount of change to count as an improvement
patience=20, # how many epochs to wait before stopping
restore_best_weights=True,
)

# Initialising the NN
model = Sequential()

# layers

model.add(Dense(units = 32, kernel_initializer = 'uniform', activation = 'relu', input_dim = 26))
model.add(Dense(units = 32, kernel_initializer = 'uniform', activation = 'relu'))
model.add(Dense(units = 16, kernel_initializer = 'uniform', activation = 'relu'))
model.add(Dropout(0.25))
model.add(Dense(units = 8, kernel_initializer = 'uniform', activation = 'relu'))
model.add(Dropout(0.5))
model.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))

# Compiling the ANN


opt = Adam(learning_rate=0.00009)
model.compile(optimizer = opt, loss = 'binary_crossentropy', metrics =
['accuracy'])

# Train the ANN


history = model.fit(X_train, y_train, batch_size = 32, epochs = 150,
callbacks=[early_stopping], validation_split=0.2)

Output:
Epoch 1/150
2551/2551 [==============================] - 5s 2ms/step - loss: 0.5967 -
accuracy: 0.7805 - val_loss: 0.3964 - val_accuracy: 0.7860
Epoch 2/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.4413 -
accuracy: 0.7919 - val_loss: 0.3860 - val_accuracy: 0.8388
Epoch 3/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.4290 -
accuracy: 0.8257 - val_loss: 0.3761 - val_accuracy: 0.8400
Epoch 4/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.4174 -
accuracy: 0.8295 - val_loss: 0.3712 - val_accuracy: 0.8421
Epoch 5/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.4137 -
accuracy: 0.8327 - val_loss: 0.3693 - val_accuracy: 0.8436
Epoch 6/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.4091 -
accuracy: 0.8338 - val_loss: 0.3669 - val_accuracy: 0.8443
Epoch 7/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.4082 -
accuracy: 0.8348 - val_loss: 0.3665 - val_accuracy: 0.8441
Epoch 8/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.4049 -
accuracy: 0.8354 - val_loss: 0.3650 - val_accuracy: 0.8439
Epoch 9/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.4020 -
accuracy: 0.8357 - val_loss: 0.3642 - val_accuracy: 0.8441
Epoch 10/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3977 -
accuracy: 0.8363 - val_loss: 0.3635 - val_accuracy: 0.8445
Epoch 11/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3984 -
accuracy: 0.8353 - val_loss: 0.3615 - val_accuracy: 0.8445
Epoch 12/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3953 -
accuracy: 0.8368 - val_loss: 0.3618 - val_accuracy: 0.8443
Epoch 13/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3975 -
accuracy: 0.8340 - val_loss: 0.3608 - val_accuracy: 0.8444
Epoch 14/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3908 -
accuracy: 0.8373 - val_loss: 0.3597 - val_accuracy: 0.8449
Epoch 15/150

2551/2551 [==============================] - 4s 2ms/step - loss: 0.3859 -
accuracy: 0.8383 - val_loss: 0.3597 - val_accuracy: 0.8445
Epoch 16/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3899 -
accuracy: 0.8355 - val_loss: 0.3593 - val_accuracy: 0.8433
Epoch 17/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3889 -
accuracy: 0.8364 - val_loss: 0.3581 - val_accuracy: 0.8441
Epoch 18/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3924 -
accuracy: 0.8336 - val_loss: 0.3580 - val_accuracy: 0.8438
Epoch 19/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3886 -
accuracy: 0.8361 - val_loss: 0.3582 - val_accuracy: 0.8431
Epoch 20/150
2551/2551 [==============================] - 4s 2ms/step - loss: 0.3860 -
accuracy: 0.8352 - val_loss: 0.3578 - val_accuracy: 0.8421

#Plotting training and validation loss over epochs

history_df = pd.DataFrame(history.history)

plt.plot(history_df.loc[:, ['loss']], "#BDE2E2", label='Training loss')


plt.plot(history_df.loc[:, ['val_loss']],"#C2C4E2", label='Validation loss')
plt.title('Training and Validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend(loc="best")

plt.show()

#Plotting training and validation accuracy over epochs

history_df = pd.DataFrame(history.history)

plt.plot(history_df.loc[:, ['accuracy']], "#BDE2E2", label='Training accuracy')
plt.plot(history_df.loc[:, ['val_accuracy']], "#C2C4E2", label='Validation accuracy')

plt.title('Training and Validation accuracy')


plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

# Testing the model on Test Data


# Predicting the test set results
y_pred = model.predict(X_test)
y_pred = (y_pred > 0.5)

print(classification_report(y_test, y_pred))
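
As an optional extension (a sketch using only the libraries already imported above), the same test-set predictions can also be summarised as a confusion-matrix heatmap:

cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(4, 4))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix - Rain Tomorrow')
plt.show()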

Q22) Write a Python program to prepare data to be given to a convolutional neural network (CNN) and
create an image classifier. Use the cat and dog training and test datasets.
Ans: -

# Importing the required libraries


import cv2
import os
import numpy as np
from random import shuffle
from tqdm import tqdm

'''Setting up the env'''


TRAIN_DIR = 'D:/SIBAR MCA/KR&AI 2021/Lab AI/DataSet/train'
TEST_DIR = 'D:/SIBAR MCA/KR&AI 2021/Lab AI/DataSet/test1'
IMG_SIZE = 50
LR = 1e-3

'''Setting up the model which will help with tensorflow models'''


MODEL_NAME = 'dogsvscats-{}-{}.model'.format(LR, '6conv-basic')

'''Labelling the dataset'''


def label_img(img):
    word_label = img.split('.')[-3]
    # DIY one-hot encoder
    if word_label == 'cat': return [1, 0]
    elif word_label == 'dog': return [0, 1]
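
A quick illustration of the labels produced by label_img (a sketch, assuming the usual Kaggle file naming such as 'cat.0.jpg' and 'dog.0.jpg'):

print(label_img('cat.0.jpg'))    # [1, 0]
print(label_img('dog.123.jpg'))  # [0, 1]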

'''Creating the training data'''


def create_train_data():
    # Creating an empty list where we store the training data
    # after a little preprocessing
    training_data = []

    # tqdm is only used for an interactive progress bar
    # loading the training data
    for img in tqdm(os.listdir(TRAIN_DIR)):

        # labelling the image
        label = label_img(img)

        path = os.path.join(TRAIN_DIR, img)

        # loading the image from the path and converting it into
        # grayscale for an easier convnet problem
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)

        # resizing the image for processing in the convnet
        img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))

        # final step - forming the training data list with numpy arrays of the images
        training_data.append([np.array(img), np.array(label)])

    # shuffling the training data to preserve the random state of our data
    shuffle(training_data)

    # saving our training data for further use if required
    np.save('train_data.npy', training_data)
    return training_data

'''Processing the given test data'''

# Almost the same as processing the training data, but
# we don't have to label it.
def process_test_data():
    testing_data = []
    for img in tqdm(os.listdir(TEST_DIR)):
        path = os.path.join(TEST_DIR, img)
        img_num = img.split('.')[0]
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
        testing_data.append([np.array(img), img_num])

    shuffle(testing_data)
    np.save('test_data.npy', testing_data)
    return testing_data

'''Running the training and the testing in the dataset for our model'''
train_data = create_train_data()
test_data = process_test_data()

# train_data = np.load('train_data.npy')
# test_data = np.load('test_data.npy')
'''Creating the neural network using tensorflow'''
# Importing the required libraries
import tflearn
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.estimator import regression

import tensorflow as tf
tf.reset_default_graph()
convnet = input_data(shape =[None, IMG_SIZE, IMG_SIZE, 1], name ='input')

convnet = conv_2d(convnet, 32, 5, activation ='relu')


convnet = max_pool_2d(convnet, 5)

convnet = conv_2d(convnet, 64, 5, activation ='relu')


convnet = max_pool_2d(convnet, 5)

convnet = conv_2d(convnet, 128, 5, activation ='relu')


convnet = max_pool_2d(convnet, 5)

convnet = conv_2d(convnet, 64, 5, activation ='relu')


convnet = max_pool_2d(convnet, 5)

convnet = conv_2d(convnet, 32, 5, activation ='relu')


convnet = max_pool_2d(convnet, 5)

convnet = fully_connected(convnet, 1024, activation ='relu')

convnet = dropout(convnet, 0.8)

convnet = fully_connected(convnet, 2, activation ='softmax')


convnet = regression(convnet, optimizer ='adam', learning_rate = LR,
loss ='categorical_crossentropy', name ='targets')

model = tflearn.DNN(convnet, tensorboard_dir ='log')

# Splitting the testing data and training data


train = train_data[:-500]
test = train_data[-500:]

'''Setting up the features and labels'''


# X-Features & Y-Labels

X = np.array([i[0] for i in train]).reshape(-1, IMG_SIZE, IMG_SIZE, 1)


Y = [i[1] for i in train]
test_x = np.array([i[0] for i in test]).reshape(-1, IMG_SIZE, IMG_SIZE, 1)
test_y = [i[1] for i in test]

'''Fitting the data into our model'''


# epoch = 5 taken
model.fit({'input': X}, {'targets': Y}, n_epoch = 5,
validation_set =({'input': test_x}, {'targets': test_y}),
snapshot_step = 500, show_metric = True, run_id = MODEL_NAME)
model.save(MODEL_NAME)
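
Optionally (a sketch, assuming the checkpoint files written by model.save are present in the working directory), the trained weights can be restored in a later session once the same convnet graph above has been rebuilt:

# tflearn writes TensorFlow checkpoint files, including a '<name>.meta' file
if os.path.exists('{}.meta'.format(MODEL_NAME)):
    model.load(MODEL_NAME)
    print('model loaded!')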

'''Testing the data'''


import matplotlib.pyplot as plt
# if you need to create the data:
# test_data = process_test_data()
# if you already have some saved:
test_data = np.load('test_data.npy', allow_pickle=True)  # object array, so allow_pickle is needed on newer NumPy

fig = plt.figure()

for num, data in enumerate(test_data[:20]):
    # cat: [1, 0]
    # dog: [0, 1]
    img_num = data[1]
    img_data = data[0]

    y = fig.add_subplot(4, 5, num + 1)
    orig = img_data
    data = img_data.reshape(IMG_SIZE, IMG_SIZE, 1)

    model_out = model.predict([data])[0]

    if np.argmax(model_out) == 1: str_label = 'Dog'
    else: str_label = 'Cat'

    y.imshow(orig, cmap='gray')
    plt.title(str_label)
    y.axes.get_xaxis().set_visible(False)
    y.axes.get_yaxis().set_visible(False)
plt.show()

Q23) Write a Python program to implement an RNN by building a character-level prediction RNN and train it on
the text of “Harry Potter and the Philosopher’s Stone”.

Ans: -

import numpy as np
import matplotlib.pyplot as plt
class ReccurentNN:
def __init__(self, char_to_idx, idx_to_char, vocab, h_size=75,
seq_len=20, clip_value=5, epochs=50, learning_rate=1e-2):
self.n_h = h_size
self.seq_len = seq_len # number of characters in each batch/time steps
self.clip_value = clip_value # maximum allowed value for the gradients
self.epochs = epochs
self.learning_rate = learning_rate
self.char_to_idx = char_to_idx # dictionary that maps characters to an index
self.idx_to_char = idx_to_char # dictionary that maps indices to characters
self.vocab = vocab # number of unique characters in the training text
# smoothing out loss as batch SGD is noisy
self.smooth_loss = -np.log(1.0 / self.vocab) * self.seq_len

# initialize parameters
self.params = {}

self.params["W_xh"] = np.random.randn(self.vocab, self.n_h) * 0.01


self.params["W_hh"] = np.identity(self.n_h) * 0.01
self.params["b_h"] = np.zeros((1, self.n_h))
self.params["W_hy"] = np.random.randn(self.n_h, self.vocab) * 0.01
self.params["b_y"] = np.zeros((1, self.vocab))

self.h0 = np.zeros((1, self.n_h)) # value of the hidden state at time step t = -1

# initialize gradients and memory parameters for Adagrad


self.grads = {}
self.m_params = {}
for key in self.params:
self.grads["d" + key] = np.zeros_like(self.params[key])
self.m_params["m" + key] = np.zeros_like(self.params[key])
def _encode_text(self, X):
X_encoded = []
for char in X:
X_encoded.append(self.char_to_idx[char])
return X_encoded

def _prepare_batches(self, X, index):


X_batch_encoded = X[index: index + self.seq_len]
y_batch_encoded = X[index + 1: index + self.seq_len + 1]

X_batch = []
y_batch = []

for i in X_batch_encoded:
one_hot_char = np.zeros((1, self.vocab))
one_hot_char[0][i] = 1
X_batch.append(one_hot_char)

for j in y_batch_encoded:
one_hot_char = np.zeros((1, self.vocab))
one_hot_char[0][j] = 1
y_batch.append(one_hot_char)
return X_batch, y_batch

def _softmax(self, x):


# max value is subtracted for numerical stability
# https://stats.stackexchange.com/a/338293
e_x = np.exp(x - np.max(x))
return e_x / np.sum(e_x)

def _forward_pass(self, X):

h = {} # stores hidden states


h[-1] = self.h0 # set initial hidden state at t=-1

y_pred = {} # stores softmax output probabilities

# iterate over each character in the input sequence


for t in range(self.seq_len):
h[t] = np.tanh(
np.dot(X[t], self.params["W_xh"]) + np.dot(h[t - 1], self.params["W_hh"]) + self.params["b_h"])
y_pred[t] = self._softmax(np.dot(h[t], self.params["W_hy"]) + self.params["b_y"])

self.h0 = h[t]  # carry the final hidden state forward to the next batch
return y_pred, h
def _backward_pass(self, X, y, y_pred, h):
dh_next = np.zeros_like(h[0])
for t in reversed(range(self.seq_len)):
dy = np.copy(y_pred[t])
dy[0][np.argmax(y[t])] -= 1 # predicted y - actual y

self.grads["dW_hy"] += np.dot(h[t].T, dy)


self.grads["db_y"] += dy

dhidden = (1 - h[t] ** 2) * (np.dot(dy, self.params["W_hy"].T) + dh_next)


dh_next = np.dot(dhidden, self.params["W_hh"].T)

self.grads["dW_hh"] += np.dot(h[t - 1].T, dhidden)


self.grads["dW_xh"] += np.dot(X[t].T, dhidden)
self.grads["db_h"] += dhidden

# clip gradients to mitigate exploding gradients


for key in self.grads:
np.clip(self.grads[key], -self.clip_value, self.clip_value, out=self.grads[key])
return
def _update(self):
for key in self.params:
self.m_params["m" + key] += self.grads["d" + key] * self.grads["d" + key]
self.params[key] -= self.grads["d" + key] * self.learning_rate / (np.sqrt(self.m_params["m" + key]) +
1e-8)
def test(self, test_size, start_index):
res = ""

x = np.zeros((1, self.vocab))
x[0][start_index] = 1
for i in range(test_size):
# forward propagation
h = np.tanh(np.dot(x, self.params["W_xh"]) + np.dot(self.h0, self.params["W_hh"]) +
self.params["b_h"])
y_pred = self._softmax(np.dot(h, self.params["W_hy"]) + self.params["b_y"])

# get a random index from the probability distribution of y


index = np.random.choice(range(self.vocab), p=y_pred.ravel())

# set x-one_hot_vector for the next character


x = np.zeros((1, self.vocab))
x[0][index] = 1

# find the char with the index and concat to the output string
char = self.idx_to_char[index]
res += char
return res
def train(self, X):
J = []
num_batches = len(X) // self.seq_len
X_trimmed = X[:num_batches * self.seq_len]  # trim end of the input text so that we have full sequences
X_encoded = self._encode_text(X_trimmed) # transform words to indices to enable processing
for i in range(self.epochs):
for j in range(0, len(X_encoded) - self.seq_len, self.seq_len):
X_batch, y_batch = self._prepare_batches(X_encoded, j)
y_pred, h = self._forward_pass(X_batch)
loss = 0
for t in range(self.seq_len):
loss += -np.log(y_pred[t][0, np.argmax(y_batch[t])])
self.smooth_loss = self.smooth_loss * 0.999 + loss * 0.001
J.append(self.smooth_loss)
self._backward_pass(X_batch, y_batch, y_pred, h)
self._update()
print('Epoch:', i + 1, "\tLoss:", loss, "")
return J, self.params
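
A side note on the smooth_loss initialisation used in __init__ (a small sketch with assumed illustrative numbers): -log(1/vocab) * seq_len is the cross-entropy a model would incur by guessing uniformly over the vocabulary at each of the seq_len time steps, so it is a sensible starting value for the running loss.

vocab_size, seq_len = 40, 20                 # assumed illustrative values
print(-np.log(1.0 / vocab_size) * seq_len)   # ~73.8 for this example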
with open('Harry-Potter.txt') as f:
text = f.read().lower()
# use only a part of the text to make the process faster
text = text[:20000]
# text = [char for char in text if char not in ["(", ")", "\"", "'", ".", "?", "!", ",", "-"]]
# text = [char for char in text if char not in ["(", ")", "\"", "'"]]
chars = set(text)
vocab = len(chars)
# print(f"Length of training text {len(text)}")
# print(f"Size of vocabulary {vocab}")

# creating the encoding decoding dictionaries


char_to_idx = {w: i for i, w in enumerate(chars)}
idx_to_char = {i: w for i, w in enumerate(chars)}

parameter_dict = {
'char_to_idx': char_to_idx,

'idx_to_char': idx_to_char,
'vocab': vocab,
'h_size': 75,
'seq_len': 20,
# keep small to avoid diminishing/exploding gradients
'clip_value': 5,
'epochs': 50,
'learning_rate': 1e-2,
}

model = ReccurentNN(**parameter_dict)
loss, params = model.train(text)
plt.figure(figsize=(12, 8))
plt.plot([i for i in range(len(loss))], loss)
plt.ylabel("Loss")
plt.xlabel("Epochs")
plt.show()
print(model.test(50,10))

OUTPUT:
Epoch: 1 Loss: 56.938160313575075
Epoch: 2 Loss: 49.479841032771944
Epoch: 3 Loss: 44.287300754487774
Epoch: 4 Loss: 42.75894603770088
Epoch: 5 Loss: 40.962449282519785
Epoch: 6 Loss: 41.06907316142755
Epoch: 7 Loss: 39.77795494997328
Epoch: 8 Loss: 41.059521063295485
Epoch: 9 Loss: 39.848893648177594
Epoch: 10 Loss: 40.42097045126549
Epoch: 11 Loss: 39.183043247471126
Epoch: 12 Loss: 40.09713939411275
Epoch: 13 Loss: 38.786694845855145
Epoch: 14 Loss: 39.41259563289025
Epoch: 15 Loss: 38.87094988626352
Epoch: 16 Loss: 38.80896936130275
Epoch: 17 Loss: 38.65301294936609
Epoch: 18 Loss: 38.2922486206415
Epoch: 19 Loss: 38.120326247610286
Epoch: 20 Loss: 37.94743442371039
Epoch: 21 Loss: 37.781826419304245
Epoch: 22 Loss: 38.02242197941186
Epoch: 23 Loss: 37.34639374983505
Epoch: 24 Loss: 37.383830387022115
Epoch: 25 Loss: 36.863261576664286
Epoch: 26 Loss: 36.81717706027801
Epoch: 27 Loss: 35.98781618662626

Epoch: 28 Loss: 34.883143187020806
Epoch: 29 Loss: 35.74233839750379
Epoch: 30 Loss: 34.17457373354039
Epoch: 31 Loss: 34.3659838303625
Epoch: 32 Loss: 34.6155982440106
Epoch: 33 Loss: 33.428021716569035
Epoch: 34 Loss: 33.06226727751935
Epoch: 35 Loss: 33.23334401686566
Epoch: 36 Loss: 32.9818416477839
Epoch: 37 Loss: 33.155764725505655
Epoch: 38 Loss: 32.937205806520474
Epoch: 39 Loss: 32.93063638107538
Epoch: 40 Loss: 32.943368437981256
Epoch: 41 Loss: 32.92520056534523
Epoch: 42 Loss: 32.96074563399301
Epoch: 43 Loss: 32.974579784369666
Epoch: 44 Loss: 32.86483014312194
Epoch: 45 Loss: 33.10532379921245
Epoch: 46 Loss: 32.89950584889016
Epoch: 47 Loss: 33.11303116056217
Epoch: 48 Loss: 32.731237824441756
Epoch: 49 Loss: 32.742918023080314
Epoch: 50 Loss: 32.421869906086144

is othe on. ogofostheodindearidut wlethallle, st oserarey d -lers amoathe y


thasathey at dll tos dn t s med d.). t t ile brs t d g htherive, d ogostare d.
ay shag hythay boumay tey thas ot havininggon

Q24) Write a Python program to implement a GAN to create a curve resembling a sine wave. The Python
library PyTorch must be used to set up the random generator.
Ans: -

#importing the necessary libraries:


import torch
from torch import nn

import math
import matplotlib.pyplot as plt

# Set up a random generator seed; 111 represents the random seed


torch.manual_seed(111)

#Preparing the Training Data


train_data_length = 1024
train_data = torch.zeros((train_data_length, 2))
train_data[:, 0] = 2 * math.pi * torch.rand(train_data_length)
train_data[:, 1] = torch.sin(train_data[:, 0])
train_labels = torch.zeros(train_data_length)
train_set = [
(train_data[i], train_labels[i]) for i in range(train_data_length) ]

# Plotting the training data, each point (x₁, x₂)


plt.plot(train_data[:, 0], train_data[:, 1], ".")

# Create a PyTorch data loader


batch_size = 32
train_loader = torch.utils.data.DataLoader(
train_set, batch_size=batch_size, shuffle=True
)
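
A quick shape check (a sketch, not required by the assignment) confirms what each batch from the loader looks like:

real_samples, real_labels = next(iter(train_loader))
print(real_samples.shape)   # torch.Size([32, 2]) - a batch of (x, sin(x)) points
print(real_labels.shape)    # torch.Size([32])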

# Implementing the Discriminator: in PyTorch, neural network models are
# represented by classes that inherit from nn.Module

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(2, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(64, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        output = self.model(x)
        return output

#instantiate a Discriminator object
discriminator = Discriminator()
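
A small sanity check (a sketch): the discriminator maps a batch of 2-D points to one probability per sample, as expected from the final Sigmoid layer.

sample_batch = torch.randn(8, 2)            # 8 random 2-D points
print(discriminator(sample_batch).shape)    # torch.Size([8, 1])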

# Implementing the Generator: create a Generator class that inherits
# from nn.Module
class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(2, 16),
            nn.ReLU(),
            nn.Linear(16, 32),
            nn.ReLU(),
            nn.Linear(32, 2),
        )

    def forward(self, x):
        output = self.model(x)
        return output

generator = Generator()

#set up parameters to use during training


lr = 0.001
num_epochs = 300
loss_function = nn.BCELoss()

#Create the optimizers using torch.optim


optimizer_discriminator = torch.optim.Adam(discriminator.parameters(), lr=lr)
optimizer_generator = torch.optim.Adam(generator.parameters(), lr=lr)

# Implement the training loop


for epoch in range(num_epochs):
    for n, (real_samples, _) in enumerate(train_loader):
        # Data for training the discriminator
        real_samples_labels = torch.ones((batch_size, 1))
        latent_space_samples = torch.randn((batch_size, 2))
        generated_samples = generator(latent_space_samples)
        generated_samples_labels = torch.zeros((batch_size, 1))
        all_samples = torch.cat((real_samples, generated_samples))
        all_samples_labels = torch.cat(
            (real_samples_labels, generated_samples_labels)
        )

        # Training the discriminator
        discriminator.zero_grad()
        output_discriminator = discriminator(all_samples)
        loss_discriminator = loss_function(
            output_discriminator, all_samples_labels)
        loss_discriminator.backward()
        optimizer_discriminator.step()

        # Data for training the generator
        latent_space_samples = torch.randn((batch_size, 2))

        # Training the generator
        generator.zero_grad()
        generated_samples = generator(latent_space_samples)
        output_discriminator_generated = discriminator(generated_samples)
        loss_generator = loss_function(
            output_discriminator_generated, real_samples_labels
        )
        loss_generator.backward()
        optimizer_generator.step()

        # Show loss
        if epoch % 10 == 0 and n == batch_size - 1:
            print(f"Epoch: {epoch} Loss D.: {loss_discriminator}")
            print(f"Epoch: {epoch} Loss G.: {loss_generator}")

# Checking the Samples Generated by the GAN


latent_space_samples = torch.randn(100, 2)
generated_samples = generator(latent_space_samples)
generated_samples = generated_samples.detach()
plt.plot(generated_samples[:, 0], generated_samples[:, 1], ".")
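
For a visual check (an optional sketch), the generated points can be overlaid on the real training curve; after enough epochs the two point clouds should largely coincide along sin(x):

plt.plot(train_data[:, 0], train_data[:, 1], ".", label="real")
plt.plot(generated_samples[:, 0], generated_samples[:, 1], ".", label="generated")
plt.legend()
plt.show()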

Output: After 300 epochs

