ADS - Phase 3
1. LOGISTIC REGRESSION
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report,
confusion_matrix
#scikit-learn
X, y = make_classification(
    n_features=6,
    n_classes=2,
    n_samples=800,
    n_informative=2,
    random_state=66,
    n_clusters_per_class=1,
)
##This code imports the make_classification function from the sklearn.datasets module.
##• The make_classification function generates a random dataset for classification tasks.
##• The function takes several arguments: n_features is the number of features (or independent variables) in the dataset.
##• n_classes: the number of classes (or target variables) in the dataset.
##• n_informative: the number of informative features; these are the features that actually influence the target variable.
##• The function returns two arrays: X, an array of shape (n_samples, n_features) containing the features of the dataset.
##• y: an array of shape (n_samples,) containing the target variable of the dataset.
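##A quick sanity check on those return values (a minimal sketch using the X and y created above):
print(X.shape)  # (800, 6): n_samples rows, n_features columns
print(y.shape)  # (800,): one target value per sample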
plt.scatter(X[:, 0], X[:, 1], c=y, marker="*")
plt.show()
##This code uses the matplotlib.pyplot module (imported above) to create a scatter plot with the scatter() function.
##• The X and y variables are the arrays defined earlier.
##• The scatter() function takes three arguments: X[:, 0] and X[:, 1] are the first and second columns of the X array, respectively, and c=y assigns a color to each point based on the corresponding value in the y array.
##• The marker argument specifies the shape of the marker used for each point, in this case, an asterisk.
##• The resulting plot will have the values in the first column of X on the x-axis, the values in the second column of X on the y-axis, and each point will be colored based on the corresponding value in y.
##This code imports the train_test_split function from the sklearn.model_selection module.
##• This function is used to split the dataset into training and testing sets.
##• The train_test_split function takes four arguments: X, y, test_size, and random_state.
##• X and y are the input features and target variable, respectively.
##• test_size is the proportion of the dataset that should be allocated to the testing set.
##• In this case, it is set to 0.33, which means that 33% of the data will be used for testing.
##• random_state sets the seed for the random number generator, which ensures that the same random split is generated each time the code is run.
##• The function returns four variables: X_train, X_test, y_train, and y_test.
##• X_train and y_train are the training set, while X_test and y_test are the testing set.
##• These variables can be used to train and evaluate a machine learning model; a sketch of the call is shown below.
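##The split call itself is missing above; this is a minimal sketch matching the description (test_size=0.33 as stated; the original random_state value is not given, so 42 is an assumed placeholder):
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)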
# Import and instantiate a Gaussian Naive Bayes classifier
from sklearn.naive_bayes import GaussianNB
model = GaussianNB()
# Model training
model.fit(X_train, y_train)
# Predict Output for a single test point
predicted = model.predict([X_test[6]])
print("Actual Value:", y_test[6])
print("Predicted Value:", predicted[0])
##This code uses the scikit-learn library to build a Gaussian Naive Bayes classifier.
##• First, the code imports the GaussianNB class from the sklearn.naive_bayes module.
##• Next, a new instance of the GaussianNB class is created and assigned to the variable 'model'.
##• The model is then trained using the fit() method, which takes in the training data X_train and the corresponding target values y_train.
##• After the model is trained, it is used to predict the output for a single test data point, which is the 7th element in the X_test array.
##• Finally, the actual value for the test data point is printed using y_test[6], and the predicted value is printed using predicted[0].
#---------------
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    ConfusionMatrixDisplay,
    f1_score,
)
# Generate predictions for the full test set
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print("Accuracy:", accuracy)
print("F1 Score:", f1)
##This code imports several functions from the sklearn.metrics module, including accuracy_score, confusion_matrix, ConfusionMatrixDisplay, and f1_score.
##• These functions are used to evaluate the performance of a machine learning model.
##• The code then uses the model.predict method to generate predictions for the test data (X_test).
##• These predictions are compared to the actual labels (y_test) using the accuracy_score and f1_score functions.
##• The accuracy_score function calculates the accuracy of the model's predictions, while the f1_score function calculates the F1 score, the harmonic mean of precision and recall (a worked computation follows this list).
##• Finally, the code prints out the accuracy and F1 score of the model's predictions.
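##As a cross-check on the f1_score call above, a minimal sketch computing F1 from its definition, using the y_test and y_pred arrays already defined:
from sklearn.metrics import precision_score, recall_score

precision = precision_score(y_test, y_pred)  # TP / (TP + FP)
recall = recall_score(y_test, y_pred)        # TP / (TP + FN)
# F1 is the harmonic mean of precision and recall
f1_manual = 2 * precision * recall / (precision + recall)
print(f1_manual)  # matches f1_score(y_test, y_pred)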
#-----------------------------
#####Expected output
####
####Accuracy: 0.8484848484848485
####This snippet is not code but the printed output of the evaluation code above.
####• It shows the accuracy and F1 score of the trained model.
####• An accuracy of 0.848 means that the model correctly predicted the outcome of 84.8% of the test cases.
####• The F1 score is a measure of the model's performance that takes into account both precision and recall.
#------------------------------------------------
labels = [0, 1]  # the generated dataset has two classes
cm = confusion_matrix(y_test, y_pred, labels=labels)
print(cm)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=labels)
disp.plot()
plt.show()
########This code uses the scikit-learn library to create a confusion matrix and display it using ConfusionMatrixDisplay.
########• First, a list of the class labels is defined.
########• Then, the confusion_matrix function is called with the test labels (y_test) and predicted labels (y_pred) as inputs, along with the labels list.
########• Next, a ConfusionMatrixDisplay object is created with the confusion matrix as input, along with the labels list.
########• Finally, the plot method is called on the display object to show the confusion matrix graphically.
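##Note: although this section is titled Logistic Regression, the walkthrough above actually trains a Gaussian Naive Bayes model; the LogisticRegression class imported at the top is never used. A minimal sketch fitting it on the same split (default hyperparameters assumed) would be:
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)
y_pred_lr = log_reg.predict(X_test)
print("Logistic Regression Accuracy:", accuracy_score(y_test, y_pred_lr))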
2. DECISION TREE
# Run this program on your local python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Function to import the balance-scale dataset
def importdata():
    balance_data = pd.read_csv(
        'https://archive.ics.uci.edu/ml/machine-learning-' +
        'databases/balance-scale/balance-scale.data',
        sep=',', header=None)
    return balance_data

# Function to split the dataset into features, target, train and test sets
def splitdataset(balance_data):
    # Separating the target variable
    X = balance_data.values[:, 1:5]
    Y = balance_data.values[:, 0]
    # Splitting the dataset into train and test
    X_train, X_test, y_train, y_test = train_test_split(
        X, Y, test_size=0.3, random_state=100)
    return X, Y, X_train, X_test, y_train, y_test

# Function to perform training with the Gini index
def train_using_gini(X_train, y_train):
    clf_gini = DecisionTreeClassifier(criterion="gini",
                                      random_state=100,
                                      max_depth=3, min_samples_leaf=5)
    # Performing training
    clf_gini.fit(X_train, y_train)
    return clf_gini

# Function to perform training with entropy
def train_using_entropy(X_train, y_train):
    clf_entropy = DecisionTreeClassifier(
        criterion="entropy", random_state=100,
        max_depth=3, min_samples_leaf=5)
    # Performing training
    clf_entropy.fit(X_train, y_train)
    return clf_entropy

# Function to make predictions
def prediction(X_test, clf_object):
    y_pred = clf_object.predict(X_test)
    print("Predicted values:")
    print(y_pred)
    return y_pred

# Function to calculate accuracy
def cal_accuracy(y_test, y_pred):
    print("Confusion Matrix: ",
          confusion_matrix(y_test, y_pred))
    print("Accuracy : ",
          accuracy_score(y_test, y_pred) * 100)
    print("Report : ",
          classification_report(y_test, y_pred))

# Driver code
def main():
    # Building Phase
    data = importdata()
    X, Y, X_train, X_test, y_train, y_test = splitdataset(data)
    clf_gini = train_using_gini(X_train, y_train)
    clf_entropy = train_using_entropy(X_train, y_train)
    # Operational Phase
    print("Results Using Gini Index:")
    y_pred_gini = prediction(X_test, clf_gini)
    cal_accuracy(y_test, y_pred_gini)
    print("Results Using Entropy:")
    y_pred_entropy = prediction(X_test, clf_entropy)
    cal_accuracy(y_test, y_pred_entropy)

if __name__ == "__main__":
    main()
3. CREDITCARD FRAUD CSV IMPORT
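##The import code for this section is not included here. A minimal sketch of loading a fraud-detection CSV with Pandas, assuming a hypothetical local file named creditcard.csv (for example, the Kaggle credit card fraud dataset):
import pandas as pd

# Load the credit card transactions CSV into a DataFrame
df = pd.read_csv('creditcard.csv')  # hypothetical filename

# Basic preprocessing checks
print(df.shape)           # rows and columns
print(df.head())          # first five transactions
print(df.isnull().sum())  # missing values per column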
Conclusion
In conclusion, preprocessing data before applying it to a machine learning model is an essential step that can significantly affect the model's performance. The code example above demonstrates how to preprocess data using the popular Python library, Pandas, but there are many other libraries available for data preprocessing as well.