FAIRFIELD INSTITUTE OF MANAGEMENT & TECHNOLOGY
(Affiliated to GGSIPU; an ‘A’ Grade college by DHE, Govt. of NCT Delhi)

SUBJECT NAME:- MACHINE LEARNING WITH PYTHON LAB FILE
SUBJECT CODE:- BCAP 311

SUBMITTED TO:                      SUBMITTED BY:
[Link] JOSHI                       NIKHIL KUMAR
ASSISTANT PROFESSOR                01290102021
IT DEPARTMENT                      B.C.A 5TH SEMESTER
LIST OF PRACTICALS

1. Extract the data from the database using Python.
2. Write a program to implement linear and logistic regression.
3. Write a program to implement the naïve Bayesian classifier for a sample training data set stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.
4. Write a program to implement k-nearest neighbors (KNN) and Support Vector Machine (SVM) algorithms for classification.
5. Implement classification of a given dataset using random forest.
6. Build an Artificial Neural Network (ANN) by implementing the backpropagation algorithm and test the same using appropriate data sets.
7. Apply the EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add Python ML library classes in the program.
8. Write a program to implement Self-Organizing Map (SOM).
9. Write a program for empirical comparison of different supervised learning algorithms.
10. Write a program for empirical comparison of different unsupervised learning algorithms.
1) Extract the data from the database using Python.
Our data is stored in the STUDENTS table of the SampleDB database; now we will write code to extract it using Python.
CODE:-
import mysql.connector

myconn = mysql.connector.connect(host="localhost", user="root",
                                 passwd="test", database="SampleDB")
cur = myconn.cursor()
cur.execute("select * from STUDENTS")
result = cur.fetchall()
print("Student Details are :")
for x in result:
    print(x)
cur.close()
myconn.close()
OUTPUT:-
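The program above assumes the SampleDB database and its STUDENTS table already exist. For completeness, here is a minimal sketch that would create a compatible table with one sample row; the column layout is an illustrative assumption, since the original table was shown only as a screenshot.

import mysql.connector

conn = mysql.connector.connect(host="localhost", user="root", passwd="test")
cur = conn.cursor()
cur.execute("CREATE DATABASE IF NOT EXISTS SampleDB")
cur.execute("USE SampleDB")
# Hypothetical schema; the real column names are not recoverable from the file
cur.execute("""CREATE TABLE IF NOT EXISTS STUDENTS (
                   roll_no INT PRIMARY KEY,
                   name VARCHAR(50),
                   course VARCHAR(20))""")
cur.execute("INSERT INTO STUDENTS VALUES (1, 'Aman Sharma', 'BCA')")
conn.commit()
cur.close()
conn.close()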
2) Write a program to implement linear and logistic regression.
a) Linear regression:-
import numpy as nmp
import matplotlib.pyplot as mtplt

def estimate_coeff(p, q):
    # Number of observations and the means of p and q
    n1 = nmp.size(p)
    m_p = nmp.mean(p)
    m_q = nmp.mean(q)
    # Cross-deviation about (p, q) and deviation about p
    SS_pq = nmp.sum(q * p) - n1 * m_q * m_p
    SS_pp = nmp.sum(p * p) - n1 * m_p * m_p
    # Regression coefficients: slope b_1 and intercept b_0
    b_1 = SS_pq / SS_pp
    b_0 = m_q - b_1 * m_p
    return (b_0, b_1)

def plot_regression_line(p, q, b):
    mtplt.scatter(p, q, color="m", marker="o", s=30)
    q_pred = b[0] + b[1] * p
    mtplt.plot(p, q_pred, color="g")
    mtplt.xlabel('p')
    mtplt.ylabel('q')
    mtplt.show()

def main():
    p = nmp.array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
    q = nmp.array([11, 13, 12, 15, 17, 18, 18, 19, 20, 22])
    b = estimate_coeff(p, q)
    print("Estimated coefficients are :\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    plot_regression_line(p, q, b)

if __name__ == "__main__":
    main()
OUTPUT:-
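As a quick sanity check (not part of the original program), the same coefficients can be recovered with scikit-learn, assuming it is installed:

import numpy as nmp
from sklearn.linear_model import LinearRegression

p = nmp.array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]).reshape(-1, 1)
q = nmp.array([11, 13, 12, 15, 17, 18, 18, 19, 20, 22])
reg = LinearRegression().fit(p, q)
# intercept_ corresponds to b_0 above and coef_[0] to b_1
print(reg.intercept_, reg.coef_[0])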
b) Logistic regression:-
Now we have a logistic regression object that is ready to predict whether a tumor is cancerous based on the tumor size:
CODE:-
import numpy
from sklearn import linear_model

X = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

logr = linear_model.LogisticRegression()
logr.fit(X, y)

# Predict whether a tumor of size 3.46mm is cancerous (0 = no, 1 = yes)
predicted = logr.predict(numpy.array([3.46]).reshape(-1, 1))
print(predicted)
OUTPUT:-
We have predicted that a tumor with a size of 3.46mm will not be cancerous.
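To see how confident the model is rather than just the predicted class, scikit-learn also exposes predict_proba; a minimal sketch, assuming logr has been fitted as above:

probs = logr.predict_proba(numpy.array([3.46]).reshape(-1, 1))
# Column 0 is the probability of class 0 (not cancerous), column 1 of class 1
print(probs)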
3) Write a program to implement the naïve Bayesian classifier for a sample training data set stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.
CODE:-
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import GaussianNB

# File name assumed; the original CSV name is not recoverable from the file
data = pd.read_csv('tennisdata.csv')
print("The first 5 values of data is :\n", data.head())

X = data.iloc[:, :-1]
print("\nThe First 5 values of train data is\n", X.head())
y = data.iloc[:, -1]
print("\nThe first 5 values of Train output is\n", y.head())

# Encode each categorical attribute as integers
le_outlook = LabelEncoder()
X.Outlook = le_outlook.fit_transform(X.Outlook)
le_Temperature = LabelEncoder()
X.Temperature = le_Temperature.fit_transform(X.Temperature)
le_Humidity = LabelEncoder()
X.Humidity = le_Humidity.fit_transform(X.Humidity)
le_Windy = LabelEncoder()
X.Windy = le_Windy.fit_transform(X.Windy)
print("\nNow the Train data is :\n", X.head())

le_PlayTennis = LabelEncoder()
y = le_PlayTennis.fit_transform(y)
print("\nNow the Train output is\n", y)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
classifier = GaussianNB()
classifier.fit(X_train, y_train)

from sklearn.metrics import accuracy_score
print("Accuracy is:", accuracy_score(classifier.predict(X_test), y_test))
OUTPUT:-
4) Write a program to implement k-nearest neighbors (KNN) and Support Vector Machine (SVM) algorithms for classification.
CODE:-
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import svm
from sklearn import datasets
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# K-Nearest Neighbors (KNN) classifier
def knn_classifier(X_train, y_train, X_test, k):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    predictions = knn.predict(X_test)
    return predictions

# Support Vector Machine (SVM) classifier
def svm_classifier(X_train, y_train, X_test):
    svm_model = svm.SVC()
    svm_model.fit(X_train, y_train)
    predictions = svm_model.predict(X_test)
    return predictions

k_value = 7
# Make predictions using KNN
knn_predictions = knn_classifier(X_train, y_train, X_test, k_value)
# Make predictions using SVM
svm_predictions = svm_classifier(X_train, y_train, X_test)

knn_accuracy = accuracy_score(y_test, knn_predictions)
svm_accuracy = accuracy_score(y_test, svm_predictions)
print("")
print(f"K-Nearest Neighbors (KNN) Accuracy: {knn_accuracy:.2f}")
print("")
print(f"Support Vector Machine (SVM) Accuracy: {svm_accuracy:.2f}")
OUTPUT:-
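The choice of k = 7 is arbitrary. A small sketch (an addition, not part of the original program) that reuses knn_classifier above to compare a few values of k:

for k in (1, 3, 5, 7, 9):
    acc = accuracy_score(y_test, knn_classifier(X_train, y_train, X_test, k))
    print(f"k={k}: accuracy {acc:.2f}")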
5) Implement classification of a given dataset using random forest.
CODE:-
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets
from sklearn.metrics import accuracy_score

iris = datasets.load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

def random_forest_classifier(X_train, y_train, X_test, n_estimators):
    rf_model = RandomForestClassifier(n_estimators=n_estimators, random_state=42)
    rf_model.fit(X_train, y_train)
    predictions = rf_model.predict(X_test)
    return predictions

num_trees = 34
rf_predictions = random_forest_classifier(X_train, y_train, X_test, num_trees)
rf_accuracy = accuracy_score(y_test, rf_predictions)
print(f"Random Forest Accuracy: {rf_accuracy:.2f}")
OUTPUT:-
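A fitted random forest also reports how much each feature contributed to its splits. A minimal sketch (an addition to the original program) that refits the model and prints the importances:

rf = RandomForestClassifier(n_estimators=num_trees, random_state=42).fit(X_train, y_train)
for name, importance in zip(iris.feature_names, rf.feature_importances_):
    print(f"{name}: {importance:.3f}")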
6) Build an Artificial Neural Network (ANN) by implementing the backpropagation algorithm and test the same using appropriate data sets.
CODE:-
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of the sigmoid, written in terms of the already-activated output x
def sigmoid_derivative(x):
    return x * (1 - x)

def initialize_weights(input_size, hidden_size, output_size):
    # Random weights in the range [-1, 1)
    input_hidden_weights = 2 * np.random.rand(input_size, hidden_size) - 1
    hidden_output_weights = 2 * np.random.rand(hidden_size, output_size) - 1
    return input_hidden_weights, hidden_output_weights

def train_neural_network(X, y, epochs, learning_rate):
    input_size = X.shape[1]
    hidden_size = 3
    output_size = 1
    input_hidden_weights, hidden_output_weights = initialize_weights(input_size, hidden_size, output_size)
    for epoch in range(epochs):
        # Forward pass
        hidden_layer_input = np.dot(X, input_hidden_weights)
        hidden_layer_output = sigmoid(hidden_layer_input)
        output_layer_input = np.dot(hidden_layer_output, hidden_output_weights)
        predicted_output = sigmoid(output_layer_input)
        # Backward pass: propagate the error through the network
        error = y - predicted_output
        output_error = error * sigmoid_derivative(predicted_output)
        hidden_layer_error = output_error.dot(hidden_output_weights.T) * sigmoid_derivative(hidden_layer_output)
        # Weight updates
        hidden_output_weights += hidden_layer_output.T.dot(output_error) * learning_rate
        input_hidden_weights += X.T.dot(hidden_layer_error) * learning_rate
        if epoch % 1000 == 0:
            print(f'Epoch {epoch}, Loss: {np.mean(np.abs(error))}')
    return input_hidden_weights, hidden_output_weights

# Training data: the logical OR function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [1]])
epochs = 10000
learning_rate = 0.1
trained_input_hidden_weights, trained_hidden_output_weights = train_neural_network(X, y, epochs, learning_rate)

hidden_layer_output = sigmoid(np.dot(X, trained_input_hidden_weights))
predicted_output = sigmoid(np.dot(hidden_layer_output, trained_hidden_output_weights))
print("Predicted Output:")
print(predicted_output)
OUTPUT:-
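The network outputs values in (0, 1); thresholding at 0.5 turns them into hard class labels. A one-line sketch (an addition to the original program), assuming predicted_output from above:

# Round the sigmoid outputs to get 0/1 predictions for the OR function
print(np.round(predicted_output).astype(int))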
7) Apply the EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add Python ML library classes in the program.
CODE:-
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import KMeans
import sklearn.metrics as sm
import pandas as pd
import numpy as np

iris = datasets.load_iris()
X = pd.DataFrame(iris.data)
X.columns = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width']
y = pd.DataFrame(iris.target)
y.columns = ['Targets']

model = KMeans(n_clusters=3)
model.fit(X)

plt.figure(figsize=(14, 7))
colormap = np.array(['red', 'lime', 'black'])

# Plot the real classification
plt.subplot(1, 2, 1)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y.Targets], s=40)
plt.title('Real Classification')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')

# Plot the model's classifications
plt.subplot(1, 2, 2)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[model.labels_], s=40)
plt.title('K Mean Classification')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')

print("")
print('The accuracy score of K-Mean: ', sm.accuracy_score(y, model.labels_))
print('The Confusion matrix of K-Mean: ', sm.confusion_matrix(y, model.labels_))

# Standardize the features before fitting the Gaussian mixture (EM)
from sklearn import preprocessing
scaler = preprocessing.StandardScaler()
scaler.fit(X)
xsa = scaler.transform(X)
xs = pd.DataFrame(xsa, columns=X.columns)

from sklearn.mixture import GaussianMixture
gmm = GaussianMixture(n_components=3)
gmm.fit(xs)
y_gmm = gmm.predict(xs)

# Plot the GMM (EM) classifications
plt.subplot(2, 2, 3)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y_gmm], s=40)
plt.title('GMM Classification')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')

print("")
print('The accuracy score of EM: ', sm.accuracy_score(y, y_gmm))
print('The Confusion matrix of EM: ', sm.confusion_matrix(y, y_gmm))
plt.show()
OUTPUT:-
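One caveat when commenting on clustering quality: cluster numbers are assigned arbitrarily, so accuracy_score can look poor even for a good clustering if the labels come out permuted. A permutation-invariant comparison (an addition to the original program) using the adjusted Rand index:

from sklearn.metrics import adjusted_rand_score
print('ARI of K-Means vs. true labels:', adjusted_rand_score(iris.target, model.labels_))
print('ARI of EM vs. true labels:', adjusted_rand_score(iris.target, y_gmm))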
8) Write a program to implement Self - Organizing Map (SOM).
CODE:-
import numpy as np
import matplotlib.pyplot as plt

class SelfOrganizingMap:
    def __init__(self, input_size, map_size):
        self.input_size = input_size
        self.map_size = map_size
        # One weight vector per map neuron, initialized at random
        self.weights = np.random.rand(map_size[0], map_size[1], input_size)

    def find_best_matching_unit(self, input_vector):
        distances = np.linalg.norm(self.weights - input_vector, axis=-1)
        bmu_index = np.unravel_index(np.argmin(distances), distances.shape)
        return bmu_index

    def update_weights(self, input_vector, bmu_index, learning_rate, radius):
        for i in range(self.map_size[0]):
            for j in range(self.map_size[1]):
                distance = np.linalg.norm(np.array([i, j]) - np.array(bmu_index))
                if distance <= radius:
                    influence = np.exp(-(distance ** 2) / (2 * radius ** 2))
                    self.weights[i, j, :] += learning_rate * influence * (input_vector - self.weights[i, j, :])

    def train(self, data, epochs=100, initial_learning_rate=0.1, initial_radius=None):
        if initial_radius is None:
            initial_radius = max(self.map_size) / 2
        for epoch in range(epochs):
            for input_vector in data:
                bmu_index = self.find_best_matching_unit(input_vector)
                # Decay the learning rate and neighbourhood radius over time
                learning_rate = initial_learning_rate * (1 - epoch / epochs)
                radius = initial_radius * np.exp(-epoch / epochs)
                self.update_weights(input_vector, bmu_index, learning_rate, radius)

    def visualize(self, data):
        plt.scatter(data[:, 0], data[:, 1], c='b', marker='o', label='Input Data')
        plt.scatter(self.weights[:, :, 0], self.weights[:, :, 1], c='r', marker='x', label='SOM Neurons')
        plt.legend()
        plt.show()

if __name__ == "__main__":
    np.random.seed(42)
    data = np.random.rand(100, 2)
    som = SelfOrganizingMap(input_size=2, map_size=(5, 5))
    som.train(data, epochs=100)
    som.visualize(data)
OUTPUT:-
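Once trained, the map acts as a quantizer: any new point can be assigned to the grid coordinates of its best matching unit. A minimal sketch (an addition to the original program), assuming som from above:

point = np.array([0.5, 0.5])
# (row, column) of the neuron whose weight vector is closest to the point
print(som.find_best_matching_unit(point))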
9) Write a program for empirical comparison of different supervised
learning algorithms.
CODE:-
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

cancer = load_breast_cancer()
X, y = cancer.data, cancer.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

classifiers = {
    'Logistic Regression': LogisticRegression(max_iter=10000),  # raised so lbfgs converges on the unscaled data
    'Decision Tree': DecisionTreeClassifier(),
    'Support Vector Machine': SVC(),
    'Random Forest': RandomForestClassifier()
}

results = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    results[name] = accuracy
    print(f'{name} Accuracy: {accuracy:.2f}')

names = list(results.keys())
values = list(results.values())
fig, ax = plt.subplots()
ax.bar(names, values)
ax.set_ylabel('Accuracy')
ax.set_title('Empirical Comparison of Supervised Learning Algorithms')
plt.xticks(rotation=45, ha='right')
plt.show()
OUTPUT:-
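A single train/test split can be noisy; averaging over several splits with k-fold cross-validation gives a sturdier comparison. A minimal sketch (an addition to the original program) using scikit-learn's cross_val_score:

from sklearn.model_selection import cross_val_score
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)
    print(f'{name}: mean accuracy {scores.mean():.2f} (+/- {scores.std():.2f})')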
10) Write a program for empirical comparison of different unsupervised
learning algorithms.
CODE:-
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score

cancer = load_breast_cancer()
X, y = cancer.data, cancer.target
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

algorithms = {
    'K-Means': KMeans(n_clusters=2, n_init=10),  # explicit n_init keeps behaviour stable across sklearn versions
    'Agglomerative Clustering': AgglomerativeClustering(n_clusters=2),
    'Gaussian Mixture Model': GaussianMixture(n_components=2)
}

results = {}
for name, algorithm in algorithms.items():
    labels = algorithm.fit_predict(X_scaled)
    silhouette_avg = silhouette_score(X_scaled, labels)
    results[name] = silhouette_avg
    print(f'{name} Silhouette Score: {silhouette_avg:.2f}')

names = list(results.keys())
values = list(results.values())
fig, ax = plt.subplots()
ax.bar(names, values)
ax.set_ylabel('Silhouette Score')
ax.set_title('Empirical Comparison of Unsupervised Learning Algorithms')
plt.xticks(rotation=45, ha='right')
plt.show()
OUTPUT:-
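The silhouette score judges cluster geometry only. Since this data set also carries ground-truth diagnosis labels, the clusterings can additionally be compared against them; a minimal sketch (an addition to the original program) using the adjusted Rand index:

from sklearn.metrics import adjusted_rand_score
for name, algorithm in algorithms.items():
    labels = algorithm.fit_predict(X_scaled)
    print(f'{name} ARI vs. diagnosis labels: {adjusted_rand_score(y, labels):.2f}')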