Anaconda is a distribution of the Python and R programming languages for scientific
computing that aims to simplify package management and deployment. The distribution
includes data-science packages suitable for Windows, Linux, and macOS. It can be downloaded
from
https://www.anaconda.com/products/individual
After installing Anaconda, search for it and launch it by clicking its icon.
When it opens, find and launch Jupyter, or type jupyter notebook in a terminal.
The home screen will open in a browser after a terminal window is displayed for a few seconds.
Go to New and select Python 3 in the drop-down list.
A new notebook will open; you may rename it by clicking on the text box displaying
“Untitled”.
You may enter your commands in the empty cells. Green indicates the active cell,
while blue indicates an inactive one. You can make a cell inactive by pressing the Esc key.
Hover the mouse over an inactive cell and click, or press the ENTER key, to make it active.
To create a new cell, press the + icon at the top, or (while the cell is inactive) press B on the keyboard to add a cell below or A
to add a cell above the current cell.
To remove a cell, press D twice. The number in square brackets beside a cell, e.g. In [1], indicates the order in which the command was
executed.
# The # symbol starts a comment.
# Like R, Python recognizes numbers, but using a variable name that has no value assigned raises an error.
print('Bismillah') # You can print a value to the screen using print().
# = is used as the assignment operator.
A = 1
A # will print the contents of A
type(A) # tells the class of the object A we created above
Variable types don’t need to be declared. Python figures out the variable types on its own.
Variable names are case sensitive and cannot start with a number. They can contain
letters, numbers, and underscores.
a, b, c = 17, 3.14, "test" # We can assign values to multiple variables at a time.
# The + operator concatenates (joins) two strings.
a = "hello "
b = "world"
print(a + b)
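As a small illustration of the points above, the sketch below shows type inference, case-sensitive names, and string concatenation. The variable names (count, ratio, label, value, number) are made up for this example and are not used elsewhere in this document.

```python
# Python infers the type from the assigned value; no declaration is needed.
count = 17
ratio = 3.14
label = "test"
inferred_types = [type(count).__name__, type(ratio).__name__, type(label).__name__]
print(inferred_types)  # ['int', 'float', 'str']

# Names are case sensitive: value and Value are two different variables.
value = 1
Value = 2
print(value == Value)  # False

# Rebinding a name to a value of a different type is allowed.
count = "seventeen"
print(type(count).__name__)  # str

# + joins two strings, but mixing a string and an integer raises a TypeError.
number = 5
try:
    joined = "hello " + number
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)  # False
```

To join a number to a string, convert it first, for example with str(number).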
Python Data Structures
• Integers: Whole numbers e.g. 2, 3, 5, 0, -1
• Floats: Numbers with a decimal point, e.g. 1.50
• Strings: Either single ('') or double ("") or triple quotes (""" or ''') can be used. For example,
"python" and 'python' are the same string. A quote character of the other kind can occur within the string, e.g.
"datatype's"
• Tuple ( ): A collection of different things. Tuples are “immutable”, i.e., they cannot be modified
after creation.
myTuple = ('abc', 2.5, A)
myTuple[2] # returns the third element, i.e. the value at index 2 (indexing starts at 0)
myTuple.index(2.5) # returns the index position of 2.5
• List [ ]: Lists are “mutable”, i.e., their elements can be modified.
myList = ['abc', 'def', 'ghij']
myList.append('klm')
myList
myList.count('def') # shall count the occurrences of def in a list
myList2 = [1,2,3]
myList3 = [4,5,6]
myList2 + myList3
• Array [ ]: Vectors (1-d) and matrices (more than 1-d) for numerical data manipulation are defined in
numpy. We need to import numpy into our Python session.
import numpy as np # np is the alias the community uses; functions are then accessed as np.array etc.
# Any alias may be used here. With a plain "import numpy", functions are accessed as numpy.array etc.
# Lists are containers for elements of differing data types, whereas arrays are containers
# for elements of the same data type.
myArray2 = np.array(myList2)
myArray3 = np.array(myList3)
myArray2 + myArray3 # element-wise addition (list + list concatenates instead)
myArray2.dot(myArray3) # dot product of the two vectors
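The differences between these containers can be checked directly. The following is a minimal sketch with made-up values (my_tuple, my_list, and the other names are illustrative only): it tries to modify a tuple, modifies a list, and compares + on lists with + on NumPy arrays.

```python
import numpy as np

# Tuples are immutable: item assignment raises a TypeError.
my_tuple = ("abc", 2.5, 1)
try:
    my_tuple[0] = "xyz"
    tuple_changed = True
except TypeError:
    tuple_changed = False

# Lists are mutable: elements can be replaced in place.
my_list = ["abc", "def", "ghij"]
my_list[0] = "xyz"

# + concatenates lists but adds arrays element by element.
list_sum = [1, 2, 3] + [4, 5, 6]
array_sum = np.array([1, 2, 3]) + np.array([4, 5, 6])
dot_product = np.array([1, 2, 3]).dot(np.array([4, 5, 6]))

print(tuple_changed)       # False
print(my_list)             # ['xyz', 'def', 'ghij']
print(list_sum)            # [1, 2, 3, 4, 5, 6]
print(array_sum.tolist())  # [5, 7, 9]
print(int(dot_product))    # 32
```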
Data Analysis in Python
To perform various tasks, a set of instructions is combined into a function. A function is
defined with the keyword def and can be defined anywhere. Related
functions are put together in modules, which in turn constitute packages.
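A minimal sketch of defining and calling functions is given here. The function names circle_area and describe are hypothetical and chosen only for illustration; math is a module from the Python standard library.

```python
import math  # a module from the standard library


def circle_area(radius):
    """Return the area of a circle with the given radius."""
    return math.pi * radius ** 2


def describe(values):
    """Return the minimum, maximum, and mean of a non-empty list of numbers."""
    return min(values), max(values), sum(values) / len(values)


area = circle_area(2)
low, high, mean = describe([2, 4, 6])
print(round(area, 2))   # 12.57
print(low, high, mean)  # 2 6 4.0
```

Saving these definitions in a file such as mytools.py (a hypothetical name) would make that file a module, which could then be loaded with import mytools.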
import pandas as pd # pandas is designed for quick and easy data manipulation, reading,
# aggregation, and visualization.
import numpy as np # NumPy is used to process arrays that store values of the same data type. It
# facilitates vectorized math operations on arrays.
import matplotlib.pyplot as plt # To plot histograms and other statistical graphs
Working Directory Setting:
pwd # tells us where we are (same as Linux; works in Jupyter/IPython)
ls # tells us what we have there (files and folders)
dir() # lists the variables (objects) we have; also try vars()
cd # changes the directory
import os # The os module has functions for creating and removing a directory (folder), fetching its
# contents, and changing and identifying the current directory. This is the preferred way, as the
# commands above might not work in other Python IDEs.
os.chdir("C:/Desktop/DataAnalysis/") # changes to the required directory
os.getcwd() # to make sure we are at the right place
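The os functions mentioned above can be tried safely in a throwaway folder. This is a sketch only: it uses a temporary directory from the standard tempfile module instead of a real project folder, and the sub-folder name is just an example.

```python
import os
import tempfile

start_dir = os.getcwd()  # remember where we started

with tempfile.TemporaryDirectory() as tmp:       # stand-in for a project folder
    os.mkdir(os.path.join(tmp, "DataAnalysis"))  # create a sub-folder
    os.chdir(tmp)                                # move into the temporary folder
    contents = os.listdir(".")                   # fetch its contents
    inside_tmp = os.path.samefile(os.getcwd(), tmp)
    os.chdir(start_dir)                          # go back before the folder is deleted

print(contents)    # ['DataAnalysis']
print(inside_tmp)  # True
```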
#Reading Dataframe exported previously
combined_PRAD = pd.read_csv("PRAD_labeled.csv", index_col=0)
#Tells you what type of variable it is.
type(combined_PRAD)
#Description about variables contained by the dataframe
combined_PRAD.info()
# Gives the total number of elements in the dataframe (rows multiplied by columns).
combined_PRAD.size
# Gives the shape of the data as (number of rows, number of columns).
combined_PRAD.shape
# Gives the number of dimensions of the dataset (2 for a dataframe).
combined_PRAD.ndim
#Outputs the first five rows of the data
combined_PRAD.head()
#Gives the list of last 10 rows in the dataset
combined_PRAD.tail(10)
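The inspection attributes above can be tried on a small dataframe before loading the real file. The toy data below is invented for illustration (column and sample names are hypothetical) and only stands in for PRAD_labeled.csv.

```python
import pandas as pd

# Hypothetical toy data: three columns, four samples
toy = pd.DataFrame(
    {
        "gene_a": [1.0, 2.0, 3.0, 4.0],
        "gene_b": [0.5, 0.1, 0.9, 0.3],
        "labels": ["tumor", "normal", "tumor", "normal"],
    },
    index=["s1", "s2", "s3", "s4"],
)

n_rows, n_cols = toy.shape
print(toy.shape)                # (4, 3)
print(toy.size)                 # 12, i.e. 4 rows * 3 columns
print(toy.ndim)                 # 2
print(len(toy.head(2)))         # 2
print(list(toy.tail(1).index))  # ['s4']
```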
# Import label encoder
from sklearn import preprocessing
# label_encoder object knows how to understand word labels.
label_encoder = preprocessing.LabelEncoder()
# Encode labels in column 'labels'.
combined_PRAD['labels'] = label_encoder.fit_transform(combined_PRAD['labels'])
# To have a list of unique entries.
combined_PRAD['labels'].unique()
# counting the number of samples in each class
combined_PRAD["labels"].value_counts()
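To see what the label encoder does, here is a minimal sketch on a made-up list of labels (the strings "tumor" and "normal" are assumptions, not taken from the dataset). LabelEncoder sorts the distinct labels and replaces each one with its position in that sorted order.

```python
from sklearn.preprocessing import LabelEncoder

raw_labels = ["tumor", "normal", "tumor", "normal", "tumor"]  # hypothetical labels

encoder = LabelEncoder()
encoded = encoder.fit_transform(raw_labels)

print(encoder.classes_.tolist())                   # ['normal', 'tumor']
print(encoded.tolist())                            # [1, 0, 1, 0, 1]
print(encoder.inverse_transform([0, 1]).tolist())  # ['normal', 'tumor']
```

The classes_ attribute records which original label each integer stands for, so predictions can be mapped back with inverse_transform.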
# Assigning the numerical data to an "X" variable and the labels column to
# a "y" variable; both will be used in the next steps.
# iloc[:,:-1] takes every column except the last, so this assumes "labels" is the last column.
X = combined_PRAD.iloc[:,:-1]
y = combined_PRAD["labels"]
# importing train_test_split
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, y, test_size=0.30, random_state=42)
# Standardizing the features: the scaler is fitted on the training set only and then applied to the test set
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
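The split and scaling steps can be checked on synthetic data. The sketch below is illustrative only: the random features, the balanced labels, and the stratify argument are assumptions added for the example and are not part of the analysis above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # synthetic features
y_demo = np.array([0, 1] * 50)                          # synthetic balanced labels

# stratify keeps the class proportions equal in both parts (an optional extra)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=42, stratify=y_demo
)

scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_tr)  # learn mean and std from the training data only
X_te_scaled = scaler.transform(X_te)      # reuse the training statistics on the test data

print(X_tr.shape, X_te.shape)                      # (70, 3) (30, 3)
print(np.allclose(X_tr_scaled.mean(axis=0), 0.0))  # True
print(np.allclose(X_tr_scaled.std(axis=0), 1.0))   # True
```

Fitting the scaler on the training part only avoids leaking information from the test samples into the model.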
######### Plotting a t-SNE plot to check whether the problem is linear or not #######
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.manifold import TSNE
fig, ax = plt.subplots()
m = TSNE(learning_rate=50)
X_tsne = m.fit_transform(X)
combined_PRAD["y"] = y # labels for all samples (Y_train covers only the training rows)
combined_PRAD["comp-1"] = X_tsne[:,0]
combined_PRAD["comp-2"] = X_tsne[:,1]
sns.scatterplot(x="comp-1", y="comp-2", hue=combined_PRAD.y.tolist(),
palette=sns.color_palette('husl', 2),
data=combined_PRAD).set(title="Cancer data T-SNE projection")
plt.savefig("TSNE-plot.png", dpi = 600)
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import f1_score
from sklearn.metrics import classification_report
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from sklearn.neighbors import KNeighborsClassifier
KNN = KNeighborsClassifier(n_neighbors=7, metric='minkowski', p=1)
KNN.fit(X_train,Y_train)
# predict samples in the test set
prediction = KNN.predict(X_test)
Accuracy = accuracy_score(Y_test,prediction)
print('accuracy',Accuracy)
#Printing precision, recall and f1_scores
print(precision_score(Y_test,prediction))
print(recall_score(Y_test,prediction))
print(f1_score(Y_test,prediction))
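To make the four scores concrete, here is a small worked example on ten hypothetical samples (the labels and predictions are invented for illustration). With 3 true positives, 1 false positive, 1 false negative, and 5 true negatives, the values can be verified by hand.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical true labels and predictions for ten samples (1 = positive class)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]]
cm_demo = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm_demo.ravel()

print(cm_demo.tolist())                 # [[5, 1], [1, 3]]
print(accuracy_score(y_true, y_pred))   # 0.8  = (TP + TN) / all samples
print(precision_score(y_true, y_pred))  # 0.75 = TP / (TP + FP)
print(recall_score(y_true, y_pred))     # 0.75 = TP / (TP + FN)
print(f1_score(y_true, y_pred))         # 0.75 = harmonic mean of precision and recall
```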
# Assuming you have trained your KNN model and have X_test and Y_test
# KNN is your trained K Nearest Neighbors model
# Get predictions from the KNN model
Y_pred = KNN.predict(X_test)
# Create the confusion matrix
cm = confusion_matrix(Y_test, Y_pred)
# Display the confusion matrix using ConfusionMatrixDisplay
disp = ConfusionMatrixDisplay(confusion_matrix=cm,
display_labels=KNN.classes_) # KNN.classes_ contains your class labels
disp.plot(cmap=plt.cm.Greens)
plt.xlabel('Predicted label', color='black')
plt.ylabel('True label', color='black')
plt.gca().tick_params(axis='x', colors='white') # Modify ticks on x-axis
plt.gca().tick_params(axis='y', colors='white') # Modify ticks on y-axis
plt.gcf().set_size_inches(10, 6)
plt.savefig("KNN.png", dpi=600)
plt.show()
# roc curve and auc
from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score
from matplotlib import pyplot
KNN_probs = KNN.predict_proba(X_test)
KNN_probs = KNN_probs[:, 1]
KNN_auc = roc_auc_score(Y_test, KNN_probs)
KNN_fpr, KNN_tpr, _ = roc_curve(Y_test, KNN_probs)
# plot the roc curve for the model
fig = pyplot.figure(figsize=(7, 5))
pyplot.plot(KNN_fpr, KNN_tpr ,label='KNN =%.3f' % (KNN_auc))
params = {'legend.fontsize': 10,
'legend.handlelength': 2}
pyplot.rcParams.update(params)
pyplot.legend()
pyplot.ylabel('True Positive Rate', fontsize=15)
pyplot.xlabel('False Positive Rate', fontsize=15)
pyplot.xticks(fontsize=12)
pyplot.yticks(fontsize=12)
#Show legend
pyplot.legend()
#plt.savefig("AUC_ROC.png", dpi = 600)
pyplot.show()
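What roc_curve and roc_auc_score return can be seen on a tiny example. The four labels and scores below are hypothetical; the point is that roc_curve yields the false positive rate and true positive rate at successive thresholds, which are the x and y values of the ROC plot.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical labels and predicted probabilities of the positive class
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

fpr, tpr, thresholds = roc_curve(y_true, y_score)
area = roc_auc_score(y_true, y_score)

print(fpr.tolist())  # [0.0, 0.0, 0.5, 0.5, 1.0]
print(tpr.tolist())  # [0.0, 0.5, 0.5, 1.0, 1.0]
print(area)          # 0.75
```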
#SVC_linear
from sklearn.svm import SVC
svm_linear = SVC(kernel='linear', probability=True, random_state=40)
svm_linear.fit(X_train,Y_train).decision_function(X_test)
prediction = svm_linear.predict(X_test)
Accuracy = accuracy_score(Y_test,prediction)
print('accuracy',Accuracy)
#Printing precision, recall and f1_scores
print(precision_score(Y_test,prediction))
print(recall_score(Y_test,prediction))
print(f1_score(Y_test,prediction))
# Assuming you have trained your SVM linear model and have X_test and Y_test
# svm_linear is your trained Support Vector Machine model with a linear kernel
# Get predictions from the SVM linear model
Y_pred_svm_linear = svm_linear.predict(X_test)
# Create the confusion matrix
cm_svm_linear = confusion_matrix(Y_test, Y_pred_svm_linear)
# Display the confusion matrix using ConfusionMatrixDisplay
disp_svm_linear = ConfusionMatrixDisplay(confusion_matrix=cm_svm_linear,
display_labels=svm_linear.classes_)
disp_svm_linear.plot(cmap=plt.cm.Greens)
plt.xlabel('Predicted label', color='black')
plt.ylabel('True label', color='black')
plt.gca().tick_params(axis='x', colors='white') # Modify ticks on x-axis
plt.gca().tick_params(axis='y', colors='white') # Modify ticks on y-axis
plt.gcf().set_size_inches(10, 6)
plt.savefig("svm_linear.png", dpi=600)
plt.show()
svm_linear_probs = svm_linear.predict_proba(X_test)
svm_linear_probs = svm_linear_probs[:, 1]
svm_linear_auc = roc_auc_score(Y_test, svm_linear_probs)
svm_linear_fpr, svm_linear_tpr, _ = roc_curve(Y_test,
svm_linear_probs)
# plot the roc curve for the model
fig = pyplot.figure(figsize=(7, 5))
pyplot.plot(svm_linear_fpr, svm_linear_tpr ,label='SVM_Linear =%.3f' %
(svm_linear_auc))
params = {'legend.fontsize': 10,
'legend.handlelength': 2}
pyplot.rcParams.update(params)
pyplot.legend()
pyplot.ylabel('True Positive Rate', fontsize=15)
pyplot.xlabel('False Positive Rate', fontsize=15)
pyplot.xticks(fontsize=12)
pyplot.yticks(fontsize=12)
#Show legend
pyplot.legend() #
#plt.savefig("AUC_ROC.png", dpi = 600)
pyplot.show()
#SVC_poly
from sklearn.svm import SVC
# Training a SVM classifier using SVC polynomial
svm_poly = SVC(kernel='poly', probability=True, random_state=40)
svm_poly.fit(X_train,Y_train).decision_function(X_test)
prediction = svm_poly.predict(X_test)
Accuracy = accuracy_score(Y_test,prediction)
print('accuracy',Accuracy)
#Printing precision, recall and f1_scores
print(precision_score(Y_test,prediction))
print(recall_score(Y_test,prediction))
print(f1_score(Y_test,prediction))
Y_pred_svm_poly = svm_poly.predict(X_test)
# Create the confusion matrix
cm_svm_poly = confusion_matrix(Y_test, Y_pred_svm_poly)
# Display the confusion matrix using ConfusionMatrixDisplay
disp_svm_poly = ConfusionMatrixDisplay(confusion_matrix=cm_svm_poly,
display_labels=svm_poly.classes_)
disp_svm_poly.plot(cmap=plt.cm.Greens)
plt.xlabel('Predicted label', color='black')
plt.ylabel('True label', color='black')
plt.gca().tick_params(axis='x', colors='white') # Modify ticks on x-axis
plt.gca().tick_params(axis='y', colors='white') # Modify ticks on y-axis
plt.gcf().set_size_inches(10, 6)
plt.savefig("svm_poly.png", dpi=600)
plt.show()
svm_poly_probs = svm_poly.predict_proba(X_test)
svm_poly_probs = svm_poly_probs[:, 1]
svm_poly_auc = roc_auc_score(Y_test, svm_poly_probs)
svm_poly_fpr, svm_poly_tpr, _ = roc_curve(Y_test, svm_poly_probs)
# plot the roc curve for the model
fig = pyplot.figure(figsize=(7, 5))
pyplot.plot(svm_poly_fpr, svm_poly_tpr ,label='SVM_poly =%.3f' %
(svm_poly_auc))
params = {'legend.fontsize': 10,
'legend.handlelength': 2}
pyplot.rcParams.update(params)
pyplot.legend()
pyplot.ylabel('True Positive Rate', fontsize=15)
pyplot.xlabel('False Positive Rate', fontsize=15)
pyplot.xticks(fontsize=12)
pyplot.yticks(fontsize=12)
#Show legend
pyplot.legend() #
#plt.savefig("AUC_ROC.png", dpi = 600)
pyplot.show()
#SVC_RBF
from sklearn.svm import SVC
# Training a SVM classifier using SVC class
svm_rbf = SVC(kernel='rbf', probability=True, random_state=40)
svm_rbf.fit(X_train,Y_train).decision_function(X_test)
prediction = svm_rbf.predict(X_test)
Accuracy = accuracy_score(Y_test,prediction)
print('accuracy',Accuracy)
#Printing precision, recall and f1_scores
print(precision_score(Y_test,prediction))
print(recall_score(Y_test,prediction))
print(f1_score(Y_test,prediction))
# Assuming you have trained your SVM RBF model and have X_test and Y_test
# svm_rbf is your trained SVM model with an RBF kernel
# Get predictions from the SVM RBF model
Y_pred_svm_rbf = svm_rbf.predict(X_test)
# Create the confusion matrix
cm_svm_rbf = confusion_matrix(Y_test, Y_pred_svm_rbf)
# Display the confusion matrix using ConfusionMatrixDisplay
disp_svm_rbf = ConfusionMatrixDisplay(confusion_matrix=cm_svm_rbf,
display_labels=svm_rbf.classes_)
disp_svm_rbf.plot(cmap=plt.cm.Greens)
plt.xlabel('Predicted label', color='black')
plt.ylabel('True label', color='black')
plt.gca().tick_params(axis='x', colors='white') # Modify ticks on x-axis
plt.gca().tick_params(axis='y', colors='white') # Modify ticks on y-axis
plt.gcf().set_size_inches(10, 6)
plt.savefig("svm_rbf.png", dpi=600)
plt.show()
svm_rbf_probs = svm_rbf.predict_proba(X_test)
svm_rbf_probs = svm_rbf_probs[:, 1]
svm_rbf_auc = roc_auc_score(Y_test, svm_rbf_probs)
svm_rbf_fpr, svm_rbf_tpr, _ = roc_curve(Y_test, svm_rbf_probs)
# plot the roc curve for the model
fig = pyplot.figure(figsize=(7, 5))
pyplot.plot(svm_rbf_fpr, svm_rbf_tpr ,label='SVM_rbf =%.3f' %
(svm_rbf_auc))
params = {'legend.fontsize': 10,
'legend.handlelength': 2}
pyplot.rcParams.update(params)
pyplot.legend()
pyplot.ylabel('True Positive Rate', fontsize=15)
pyplot.xlabel('False Positive Rate', fontsize=15)
pyplot.xticks(fontsize=12)
pyplot.yticks(fontsize=12)
#Show legend
pyplot.legend() #
#plt.savefig("AUC_ROC.png", dpi = 600)
pyplot.show()
from sklearn.linear_model import LogisticRegression
LR = LogisticRegression()
LR.fit(X_train,Y_train).decision_function(X_test)
prediction = LR.predict(X_test)
Accuracy = accuracy_score(Y_test,prediction)
print('accuracy',Accuracy)
#Printing precision, recall and f1_scores
print(precision_score(Y_test,prediction))
print(recall_score(Y_test,prediction))
print(f1_score(Y_test,prediction))
# Assuming you have trained your logistic regression model and have X_test and Y_test
# LR is your trained logistic regression model
# Get predictions from the logistic regression model
Y_pred_LR = LR.predict(X_test)
# Create the confusion matrix
cm_LR = confusion_matrix(Y_test, Y_pred_LR)
# Display the confusion matrix using ConfusionMatrixDisplay
disp_LR = ConfusionMatrixDisplay(confusion_matrix=cm_LR,
display_labels=LR.classes_)
disp_LR.plot(cmap=plt.cm.Greens)
plt.xlabel('Predicted label', color='black')
plt.ylabel('True label', color='black')
plt.gca().tick_params(axis='x', colors='white') # Modify ticks on x-axis
plt.gca().tick_params(axis='y', colors='white') # Modify ticks on y-axis
plt.gcf().set_size_inches(10, 6)
plt.savefig("LR.png", dpi=600)
plt.show()
LR_probs = LR.predict_proba(X_test)
LR_probs = LR_probs[:, 1]
LR_auc = roc_auc_score(Y_test, LR_probs)
LR_fpr, LR_tpr, _ = roc_curve(Y_test, LR_probs)
# plot the roc curve for the model
fig = pyplot.figure(figsize=(7, 5))
pyplot.plot(LR_fpr, LR_tpr ,label='LR =%.3f' % (LR_auc))
params = {'legend.fontsize': 10,
'legend.handlelength': 2}
pyplot.rcParams.update(params)
pyplot.legend()
pyplot.ylabel('True Positive Rate', fontsize=15)
pyplot.xlabel('False Positive Rate', fontsize=15)
pyplot.xticks(fontsize=12)
pyplot.yticks(fontsize=12)
#Show legend
pyplot.legend() #
#plt.savefig("AUC_ROC.png", dpi = 600)
pyplot.show()
#Naive bayes
from sklearn.naive_bayes import GaussianNB
NB = GaussianNB()
NB.fit(X_train, Y_train)
prediction = NB.predict(X_test)
Accuracy = accuracy_score(Y_test,prediction)
print('accuracy',Accuracy)
#Printing precision, recall and f1_scores
print(precision_score(Y_test,prediction))
print(recall_score(Y_test,prediction))
print(f1_score(Y_test,prediction))
# Assuming you have trained your naive Bayes model and have X_test and Y_test
# NB is your trained Gaussian naive Bayes model
# Get predictions from the naive Bayes model
Y_pred_NB = NB.predict(X_test)
# Create the confusion matrix
cm_NB = confusion_matrix(Y_test, Y_pred_NB)
# Display the confusion matrix using ConfusionMatrixDisplay
disp_NB = ConfusionMatrixDisplay(confusion_matrix=cm_NB,
display_labels=NB.classes_)
disp_NB.plot(cmap=plt.cm.Greens)
plt.xlabel('Predicted label', color='black')
plt.ylabel('True label', color='black')
plt.gca().tick_params(axis='x', colors='white') # Modify ticks on x-axis
plt.gca().tick_params(axis='y', colors='white') # Modify ticks on y-axis
plt.gcf().set_size_inches(10, 6)
plt.savefig("NB.png", dpi=600)
plt.show()
NB_probs = NB.predict_proba(X_test)
NB_probs = NB_probs[:, 1]
NB_auc = roc_auc_score(Y_test, NB_probs)
NB_fpr, NB_tpr, _ = roc_curve(Y_test, NB_probs)
# plot the roc curve for the model
fig = pyplot.figure(figsize=(7, 5))
pyplot.plot(NB_fpr, NB_tpr ,label='NB =%.3f' % (NB_auc))
params = {'legend.fontsize': 10,
'legend.handlelength': 2}
pyplot.rcParams.update(params)
pyplot.legend()
pyplot.ylabel('True Positive Rate', fontsize=15)
pyplot.xlabel('False Positive Rate', fontsize=15)
pyplot.xticks(fontsize=12)
pyplot.yticks(fontsize=12)
#Show legend
pyplot.legend() #
#plt.savefig("AUC_ROC.png", dpi = 600)
pyplot.show()
#DECISION TREE CLASSIFIER
from sklearn.tree import DecisionTreeClassifier
DT= DecisionTreeClassifier(random_state=0)
DT.fit(X_train, Y_train)
prediction = DT.predict(X_test)
Accuracy = accuracy_score(Y_test,prediction)
print('accuracy',Accuracy)
#Printing precision, recall and f1_scores
print(precision_score(Y_test,prediction))
print(recall_score(Y_test,prediction))
print(f1_score(Y_test,prediction))
# Assuming you have trained your decision tree model and have X_test and Y_test
# DT is your trained decision tree classifier
# Get predictions from the decision tree model
Y_pred_DT = DT.predict(X_test)
# Create the confusion matrix
cm_DT = confusion_matrix(Y_test, Y_pred_DT)
# Display the confusion matrix using ConfusionMatrixDisplay
disp_DT = ConfusionMatrixDisplay(confusion_matrix=cm_DT,
display_labels=DT.classes_)
disp_DT.plot(cmap=plt.cm.Greens)
plt.xlabel('Predicted label', color='black')
plt.ylabel('True label', color='black')
plt.gca().tick_params(axis='x', colors='white') # Modify ticks on x-axis
plt.gca().tick_params(axis='y', colors='white') # Modify ticks on y-axis
plt.gcf().set_size_inches(10, 6)
plt.savefig("DT.png", dpi=600)
plt.show()
DT_probs = DT.predict_proba(X_test)
DT_probs = DT_probs[:, 1]
DT_auc = roc_auc_score(Y_test, DT_probs)
DT_fpr, DT_tpr, _ = roc_curve(Y_test, DT_probs)
# plot the roc curve for the model
fig = pyplot.figure(figsize=(7, 5))
pyplot.plot(DT_fpr, DT_tpr ,label='DT =%.3f' % (DT_auc))
params = {'legend.fontsize': 10,
'legend.handlelength': 2}
pyplot.rcParams.update(params)
pyplot.legend()
pyplot.ylabel('True Positive Rate', fontsize=15)
pyplot.xlabel('False Positive Rate', fontsize=15)
pyplot.xticks(fontsize=12)
pyplot.yticks(fontsize=12)
#Show legend
pyplot.legend() #
#plt.savefig("AUC_ROC.png", dpi = 600)
pyplot.show()
from sklearn.neural_network import MLPClassifier
MLP = MLPClassifier()
MLP.fit(X_train, Y_train)
prediction = MLP.predict(X_test)
Accuracy = accuracy_score(Y_test,prediction)
print('accuracy',Accuracy)
#Printing precision, recall and f1_scores
print(precision_score(Y_test,prediction))
print(recall_score(Y_test,prediction))
print(f1_score(Y_test,prediction))
# Assuming you have trained your MLP model and have X_test and Y_test
# MLP is your trained multi-layer perceptron classifier
# Get predictions from the MLP model
Y_pred_MLP = MLP.predict(X_test)
# Create the confusion matrix
cm_MLP = confusion_matrix(Y_test, Y_pred_MLP)
# Display the confusion matrix using ConfusionMatrixDisplay
disp_MLP = ConfusionMatrixDisplay(confusion_matrix=cm_MLP,
display_labels=MLP.classes_)
disp_MLP.plot(cmap=plt.cm.Greens)
plt.xlabel('Predicted label', color='black')
plt.ylabel('True label', color='black')
plt.gca().tick_params(axis='x', colors='white') # Modify ticks on x-axis
plt.gca().tick_params(axis='y', colors='white') # Modify ticks on y-axis
plt.gcf().set_size_inches(10, 6)
plt.savefig("MLP.png", dpi=600)
plt.show()
MLP_probs = MLP.predict_proba(X_test)
MLP_probs = MLP_probs[:,1]
MLP_auc = roc_auc_score(Y_test, MLP_probs)
MLP_fpr, MLP_tpr, _ = roc_curve(Y_test, MLP_probs)
# plot the roc curve for the model
fig = pyplot.figure(figsize=(7, 5))
pyplot.plot(MLP_fpr, MLP_tpr ,label='MLP =%.3f' % (MLP_auc))
params = {'legend.fontsize': 10,
'legend.handlelength': 2}
pyplot.rcParams.update(params)
pyplot.legend()
pyplot.ylabel('True Positive Rate', fontsize=15)
pyplot.xlabel('False Positive Rate', fontsize=15)
pyplot.xticks(fontsize=12)
pyplot.yticks(fontsize=12)
#Show legend
pyplot.legend() #
#plt.savefig("AUC_ROC.png", dpi = 600)
pyplot.show()
from sklearn.ensemble import AdaBoostClassifier
# define the model
AB = AdaBoostClassifier()
AB.fit(X_train, Y_train)
prediction = AB.predict(X_test)
Accuracy = accuracy_score(Y_test,prediction)
print('accuracy',Accuracy)
#Printing precision, recall and f1_scores
print(precision_score(Y_test,prediction))
print(recall_score(Y_test,prediction))
print(f1_score(Y_test,prediction))
# Assuming you have trained your AdaBoost model and have X_test and Y_test
# AB is your trained AdaBoost classifier
# Get predictions from the AdaBoost model
Y_pred_AB = AB.predict(X_test)
# Create the confusion matrix
cm_AB = confusion_matrix(Y_test, Y_pred_AB)
# Display the confusion matrix using ConfusionMatrixDisplay
disp_AB = ConfusionMatrixDisplay(confusion_matrix=cm_AB,
display_labels=AB.classes_)
disp_AB.plot(cmap=plt.cm.Greens)
plt.xlabel('Predicted label', color='black')
plt.ylabel('True label', color='black')
plt.gca().tick_params(axis='x', colors='white') # Modify ticks on x-axis
plt.gca().tick_params(axis='y', colors='white') # Modify ticks on y-axis
plt.gcf().set_size_inches(10, 6)
plt.savefig("AB.png", dpi=600)
plt.show()
AB_probs = AB.predict_proba(X_test)
AB_probs = AB_probs[:, 1]
AB_auc = roc_auc_score(Y_test, AB_probs)
AB_fpr, AB_tpr, _ = roc_curve(Y_test, AB_probs)
# plot the roc curve for the model
fig = pyplot.figure(figsize=(7, 5))
pyplot.plot(AB_fpr, AB_tpr ,label='AB =%.3f' % (AB_auc))
params = {'legend.fontsize': 10,
'legend.handlelength': 2}
pyplot.rcParams.update(params)
pyplot.legend()
pyplot.ylabel('True Positive Rate', fontsize=15)
pyplot.xlabel('False Positive Rate', fontsize=15)
pyplot.xticks(fontsize=12)
pyplot.yticks(fontsize=12)
#Show legend
pyplot.legend() #
#plt.savefig("AUC_ROC.png", dpi = 600)
pyplot.show()
from sklearn.ensemble import RandomForestClassifier
# define the model
RF = RandomForestClassifier()
RF.fit(X_train, Y_train)
prediction = RF.predict(X_test)
Accuracy = accuracy_score(Y_test,prediction)
print('accuracy',Accuracy)
#Printing precision, recall and f1_scores
print(precision_score(Y_test,prediction))
print(recall_score(Y_test,prediction))
print(f1_score(Y_test,prediction))
# Assuming you have trained your random forest model and have X_test and Y_test
# RF is your trained random forest classifier
# Get predictions from the random forest model
Y_pred_RF = RF.predict(X_test)
# Create the confusion matrix
cm_RF = confusion_matrix(Y_test, Y_pred_RF)
# Display the confusion matrix using ConfusionMatrixDisplay
disp_RF = ConfusionMatrixDisplay(confusion_matrix=cm_RF,
display_labels=RF.classes_)
disp_RF.plot(cmap=plt.cm.Greens)
plt.xlabel('Predicted label', color='black')
plt.ylabel('True label', color='black')
plt.gca().tick_params(axis='x', colors='white') # Modify ticks on x-axis
plt.gca().tick_params(axis='y', colors='white') # Modify ticks on y-axis
plt.gcf().set_size_inches(10, 6)
plt.savefig("RF.png", dpi=600)
plt.show()
RF_probs = RF.predict_proba(X_test)
RF_probs = RF_probs[:,1]
RF_auc = roc_auc_score(Y_test, RF_probs)
RF_fpr, RF_tpr, _ = roc_curve(Y_test, RF_probs)
# plot the roc curve for the model
fig = pyplot.figure(figsize=(7, 5))
pyplot.plot(RF_fpr, RF_tpr ,label='RF =%.3f' % (RF_auc))
params = {'legend.fontsize': 10,
'legend.handlelength': 2}
pyplot.rcParams.update(params)
pyplot.legend()
pyplot.ylabel('True Positive Rate', fontsize=15)
pyplot.xlabel('False Positive Rate', fontsize=15)
pyplot.xticks(fontsize=12)
pyplot.yticks(fontsize=12)
#Show legend
pyplot.legend() #
#plt.savefig("AUC_ROC.png", dpi = 600)
pyplot.show()
#####Combined AUROC and PR Curves
KNN_probs = KNN.predict_proba(X_test)
AB_probs = AB.predict_proba(X_test)
DT_probs = DT.predict_proba(X_test)
LR_probs = LR.predict_proba(X_test)
RF_probs = RF.predict_proba(X_test)
NB_probs = NB.predict_proba(X_test)
MLP_probs = MLP.predict_proba(X_test)
svm_linear_probs =svm_linear.predict_proba(X_test)
svm_rbf_probs = svm_rbf.predict_proba(X_test)
svm_poly_probs = svm_poly.predict_proba(X_test)
# keep probabilities for the positive outcome only
KNN_probs = KNN_probs[:, 1]
AB_probs = AB_probs[:, 1]
DT_probs = DT_probs[:, 1]
LR_probs = LR_probs[:, 1]
RF_probs = RF_probs[:, 1]
NB_probs = NB_probs[:, 1]
MLP_probs = MLP_probs[:,1]
svm_linear_probs = svm_linear_probs[:, 1]
svm_poly_probs = svm_poly_probs[:, 1]
svm_rbf_probs = svm_rbf_probs[:, 1]
# calculate scores
KNN_auc = roc_auc_score(Y_test, KNN_probs)
AB_auc = roc_auc_score(Y_test, AB_probs)
DT_auc = roc_auc_score(Y_test, DT_probs)
LR_auc = roc_auc_score(Y_test, LR_probs)
NB_auc = roc_auc_score(Y_test, NB_probs)
RF_auc = roc_auc_score(Y_test, RF_probs)
MLP_auc = roc_auc_score(Y_test, MLP_probs)
svm_poly_auc = roc_auc_score(Y_test, svm_poly_probs)
svm_linear_auc = roc_auc_score(Y_test, svm_linear_probs)
svm_rbf_auc = roc_auc_score(Y_test, svm_rbf_probs)
# calculate roc curves
KNN_fpr, KNN_tpr, _ = roc_curve(Y_test, KNN_probs)
AB_fpr, AB_tpr, _ = roc_curve(Y_test, AB_probs)
DT_fpr, DT_tpr, _ = roc_curve(Y_test, DT_probs)
NB_fpr, NB_tpr, _ = roc_curve(Y_test, NB_probs)
LR_fpr, LR_tpr, _ = roc_curve(Y_test, LR_probs)
RF_fpr, RF_tpr, _ = roc_curve(Y_test, RF_probs)
MLP_fpr, MLP_tpr, _ = roc_curve(Y_test, MLP_probs)
svm_linear_fpr, svm_linear_tpr, _ = roc_curve(Y_test,
svm_linear_probs)
svm_poly_fpr, svm_poly_tpr, _ = roc_curve(Y_test, svm_poly_probs)
svm_rbf_fpr, svm_rbf_tpr, _ = roc_curve(Y_test, svm_rbf_probs)
# plot the roc curve for the model
fig = pyplot.figure(figsize=(7, 5))
pyplot.plot(KNN_fpr, KNN_tpr ,label='KNN =%.3f' % (KNN_auc))
pyplot.plot(AB_fpr, AB_tpr ,label='AB =%.3f' % (AB_auc))
pyplot.plot(NB_fpr, NB_tpr ,label='NB =%.3f' % (NB_auc))
pyplot.plot(DT_fpr, DT_tpr ,label='DT =%.3f' % (DT_auc))
pyplot.plot(LR_fpr, LR_tpr ,label='LR =%.3f' % (LR_auc))
pyplot.plot(RF_fpr, RF_tpr ,label='RF =%.3f' % (RF_auc))
pyplot.plot(MLP_fpr, MLP_tpr ,label='MLP =%.3f' % (MLP_auc))
pyplot.plot(svm_linear_fpr, svm_linear_tpr ,label='SVM_Linear =%.3f' %
(svm_linear_auc))
pyplot.plot(svm_poly_fpr, svm_poly_tpr ,label='SVM_Poly =%.3f' %
(svm_poly_auc))
pyplot.plot(svm_rbf_fpr, svm_rbf_tpr ,label='SVM_RBF =%.3f' %
(svm_rbf_auc))
params = {'legend.fontsize': 10,
'legend.handlelength': 2}
pyplot.rcParams.update(params)
pyplot.legend()
pyplot.ylabel('True Positive Rate', fontsize=15)
pyplot.xlabel('False Positive Rate', fontsize=15)
pyplot.xticks(fontsize=12)
pyplot.yticks(fontsize=12)
#Show legend
pyplot.legend() #
plt.savefig("AUC_ROC.png", dpi = 600)
pyplot.show()
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import precision_recall_curve
KNN_precision, KNN_recall, _ = precision_recall_curve(Y_test,
KNN_probs)
AB_precision, AB_recall, _ = precision_recall_curve(Y_test, AB_probs)
DT_precision, DT_recall, _ = precision_recall_curve(Y_test, DT_probs)
LR_precision, LR_recall, _ = precision_recall_curve(Y_test, LR_probs)
NB_precision, NB_recall, _ = precision_recall_curve(Y_test, NB_probs)
RF_precision, RF_recall, _ = precision_recall_curve(Y_test, RF_probs)
MLP_precision, MLP_recall, _ = precision_recall_curve(Y_test,
MLP_probs)
svm_linear_precision, svm_linear_recall, _ = precision_recall_curve(
Y_test, svm_linear_probs)
svm_poly_precision, svm_poly_recall, _ = precision_recall_curve(
Y_test, svm_poly_probs)
svm_rbf_precision, svm_rbf_recall, _ = precision_recall_curve(Y_test,
svm_rbf_probs)
from sklearn.metrics import auc
# calculate the precision-recall auc
KNN_auc = auc(KNN_recall, KNN_precision)
AB_auc = auc(AB_recall, AB_precision)
DT_auc = auc(DT_recall, DT_precision)
NB_auc = auc(NB_recall, NB_precision)
LR_auc = auc(LR_recall, LR_precision)
RF_auc = auc(RF_recall, RF_precision)
MLP_auc = auc(MLP_recall, MLP_precision)
svm_linear_auc = auc(svm_linear_recall, svm_linear_precision)
svm_poly_auc = auc(svm_poly_recall, svm_poly_precision)
svm_rbf_auc = auc(svm_rbf_recall, svm_rbf_precision)
# plot the precision-recall curve for each model (recall on the x-axis, precision on the y-axis)
fig = pyplot.figure(figsize=(7, 5))
pyplot.plot(KNN_recall, KNN_precision, label='KNN =%.3f' % (KNN_auc))
pyplot.plot(AB_recall, AB_precision, label='AB =%.3f' % (AB_auc))
pyplot.plot(NB_recall, NB_precision, label='NB =%.3f' % (NB_auc))
pyplot.plot(DT_recall, DT_precision, label='DT =%.3f' % (DT_auc))
pyplot.plot(LR_recall, LR_precision, label='LR =%.3f' % (LR_auc))
pyplot.plot(RF_recall, RF_precision, label='RF =%.3f' % (RF_auc))
pyplot.plot(MLP_recall, MLP_precision, label='MLP =%.3f' % (MLP_auc))
pyplot.plot(svm_linear_recall, svm_linear_precision, label='SVM_Linear =%.3f' %
(svm_linear_auc))
pyplot.plot(svm_poly_recall, svm_poly_precision, label='SVM_Poly =%.3f' %
(svm_poly_auc))
pyplot.plot(svm_rbf_recall, svm_rbf_precision, label='SVM_RBF =%.3f' %
(svm_rbf_auc))
params = {'legend.fontsize': 10,
'legend.handlelength': 2}
pyplot.rcParams.update(params)
pyplot.legend()
pyplot.ylabel('Precision', fontsize=15)
pyplot.xlabel('Recall', fontsize=15)
pyplot.xticks(fontsize=12)
pyplot.yticks(fontsize=12)
#Show legend
pyplot.legend() #
plt.savefig("PR_curve.png", dpi = 600)
pyplot.show()
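The per-model steps above repeat the same pattern, so they could also be written as a loop over a dictionary of classifiers. The following is a sketch of that idea, not the workflow used above: it runs on synthetic data from make_classification instead of the PRAD dataset, covers only three of the models, and all variable names (models, scores, X_demo and so on) are chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, precision_recall_curve, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic two-class data standing in for the real dataset (assumption)
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=42
)
scaler = StandardScaler()
X_tr = scaler.fit_transform(X_tr)
X_te = scaler.transform(X_te)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=7, metric="minkowski", p=1),
    "LR": LogisticRegression(),
    "NB": GaussianNB(),
}

scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    probs = model.predict_proba(X_te)[:, 1]  # probability of the positive class
    precision, recall, _ = precision_recall_curve(y_te, probs)
    scores[name] = {
        "roc_auc": roc_auc_score(y_te, probs),
        "pr_auc": auc(recall, precision),    # recall on x, precision on y
    }

for name, s in scores.items():
    print(f"{name}: ROC AUC = {s['roc_auc']:.3f}, PR AUC = {s['pr_auc']:.3f}")
```

Adding another classifier then only requires one more entry in the models dictionary, provided it offers predict_proba.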