
Practical File
Submitted in partial fulfillment for the evaluation of

“Fundamentals of Machine Learning-Lab”

Submitted By:
Students' Names: Kunal Saini, Aditya Jain, Aditi Jain, Ananya Tyagi, Aditya Pachisia
Enrolment Numbers: 07317711621, 07617711621, 08217711621, 08817711621, 09117711621
Branch & Section: AIML(B)

Submitted To:

Dr. Sonakshi Vij


Index
S.No | Details | Experiment No. | Date | Grade/Evaluation | Sign


Experiment 1: Study and Implement Linear Regression.


Abstract:


Code:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Load the dataset
data = pd.read_csv('Seed_Data.csv')

# Select the 'area' and 'lengthOfKernel' columns for linear regression
X = data[['area']]  # Predictor variable
y = data['lengthOfKernel']  # Target variable

# Create a linear regression model
model = LinearRegression()

# Fit the model to the data
model.fit(X, y)

# Perform predictions
y_pred = model.predict(X)

# Calculate the mean squared error
mse = mean_squared_error(y, y_pred)
print("Mean Squared Error:", mse)

# Plot the linear regression line
plt.scatter(X, y, color='blue', label='Data')
plt.plot(X, y_pred, color='red', label='Linear Regression')
plt.xlabel('area')
plt.ylabel('lengthOfKernel')
plt.title('Linear Regression: area vs lengthOfKernel')
plt.legend()
plt.show()

# Select the 'compactness' and 'widthOfKernel' columns for linear regression
X = data[['compactness']]  # Predictor variable
y = data['widthOfKernel']  # Target variable

# Create a linear regression model
model = LinearRegression()

# Fit the model to the data
model.fit(X, y)

# Perform predictions
y_pred = model.predict(X)

# Calculate the mean squared error
mse = mean_squared_error(y, y_pred)
print("Mean Squared Error:", mse)

# Plot the linear regression line
plt.scatter(X, y, color='blue', label='Data')
plt.plot(X, y_pred, color='red', label='Linear Regression')
plt.xlabel('compactness')
plt.ylabel('widthOfKernel')
plt.title('Linear Regression: compactness vs widthOfKernel')
plt.legend()
plt.show()

# Select the 'perimeter' and 'asymmetryCoefficient' columns for linear regression
X = data[['perimeter']]  # Predictor variable
y = data['asymmetryCoefficient']  # Target variable

# Create a linear regression model
model = LinearRegression()

# Fit the model to the data
model.fit(X, y)

# Perform predictions
y_pred = model.predict(X)

# Calculate the mean squared error
mse = mean_squared_error(y, y_pred)
print("Mean Squared Error:", mse)

# Plot the linear regression line
plt.scatter(X, y, color='blue', label='Data')
plt.plot(X, y_pred, color='red', label='Linear Regression')
plt.xlabel('perimeter')
plt.ylabel('asymmetryCoefficient')
plt.title('Linear Regression: perimeter vs asymmetryCoefficient')
plt.legend()
plt.show()

Output:
Mean Squared Error: 0.019054032881784515
[Plot: Linear Regression, area vs lengthOfKernel]
Mean Squared Error: 0.05962293563300296
[Plot: Linear Regression, compactness vs widthOfKernel]
Mean Squared Error: 2.1436398314362863
[Plot: Linear Regression, perimeter vs asymmetryCoefficient]
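Note that the MSE values above are computed on the same rows used to fit each model. A hedged extension (not part of the original file) that evaluates on a held-out split instead:

from sklearn.model_selection import train_test_split

# Hold out 20% of the rows so the MSE reflects data the model has not seen
X_train, X_test, y_train, y_test = train_test_split(
    data[['area']], data['lengthOfKernel'], test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)
print("Held-out MSE:", mean_squared_error(y_test, model.predict(X_test)))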


Experiment 2: Study and Implement Logistic Regression.


Abstract:

Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Load the dataset
data = pd.read_csv('Seed_Data.csv')

# Select the 'lengthOfKernel' and 'lengthofkernelgroove' columns for logistic regression
X = data[['lengthOfKernel']]  # Predictor variable
y = data['lengthofkernelgroove']  # Target variable

# Reshape the 'lengthOfKernel' feature to a 2-D column array
X = np.array(X).reshape(-1, 1)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a logistic regression model
model = LogisticRegression()

# Fit the model to the training data
model.fit(X_train, y_train)

# Perform predictions on the test data
y_pred = model.predict(X_test)

# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Create a confusion matrix
confusion_mat = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(confusion_mat)

# Plot the logistic regression curve
X_range = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
y_proba = model.predict_proba(X_range)[:, 1]

plt.scatter(X, y, color='blue', label='Data')
plt.plot(X_range, y_proba, color='red', label='Logistic Regression')
plt.xlabel('lengthOfKernel')
plt.ylabel('lengthofkernelgroove')
plt.title('Logistic Regression: lengthOfKernel vs lengthofkernelgroove')
plt.legend()
plt.show()

# Select the 'perimeter' and 'lengthofkernelgroove' columns for logistic regression
X = data[['perimeter']]  # Predictor variable
y = data['lengthofkernelgroove']  # Target variable

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a logistic regression model
model = LogisticRegression()

# Fit the model to the training data
model.fit(X_train, y_train)

# Perform predictions on the test data
y_pred = model.predict(X_test)

# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Create a confusion matrix
confusion_mat = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(confusion_mat)

# Plot the logistic regression curve
X_range = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
y_proba = model.predict_proba(X_range)[:, 1]

plt.scatter(X, y, color='blue', label='Data')
plt.plot(X_range, y_proba, color='red', label='Logistic Regression')
plt.xlabel('perimeter')
plt.ylabel('lengthofkernelgroove')
plt.title('Logistic Regression: perimeter vs lengthofkernelgroove')
plt.legend()
plt.show()

Output:
Accuracy: 0.5024390243902439
Confusion Matrix:
[[39 63]
 [39 64]]
[Plot: Logistic Regression, lengthOfKernel vs lengthofkernelgroove]

Accuracy: 0.624390243902439
Confusion Matrix:
[[66 36]
 [41 62]]
[Plot: Logistic Regression, perimeter vs lengthofkernelgroove]
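Note that scikit-learn's LogisticRegression treats y as discrete class labels, so the code above only makes sense if lengthofkernelgroove is categorical. If the column is continuous in your copy of Seed_Data.csv, a hedged workaround (an assumption, not part of the original file) is to binarize it at its median first:

# Hypothetical preprocessing: split the continuous column into two classes at the median
y_binary = (data['lengthofkernelgroove'] > data['lengthofkernelgroove'].median()).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    data[['perimeter']], y_binary, test_size=0.2, random_state=42)
model = LogisticRegression().fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))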

Experiment 3: Study and Implement K Nearest Neighbour (KNN).


Abstract:


Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Load the dataset
data = pd.read_csv('Seed_Data.csv')

# Select the relevant features and the lengthofkernelgroove target variable
X = data[['perimeter', 'area', 'lengthOfKernel', 'asymmetryCoefficient', 'compactness']]
y = data['lengthofkernelgroove']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a KNN classifier with k=5
knn = KNeighborsClassifier(n_neighbors=5)

# Fit the model to the training data
knn.fit(X_train, y_train)

# Perform predictions on the test data
y_pred = knn.predict(X_test)

# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Create a confusion matrix
confusion_mat = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(confusion_mat)

# Scatter plot of perimeter vs. lengthOfKernel, coloured by the target
plt.scatter(data['perimeter'], data['lengthOfKernel'], c=data['lengthofkernelgroove'], cmap='coolwarm')
plt.xlabel('Perimeter')
plt.ylabel('LengthOfKernel')
plt.title('KNN')
plt.colorbar(label='Lengthofkernelgroove')
plt.show()

# Histogram of asymmetryCoefficient for each target class
plt.hist(data[data['lengthofkernelgroove'] == 0]['asymmetryCoefficient'],
         bins=30, alpha=0.5, label='Lengthofkernelgroove 0')
plt.hist(data[data['lengthofkernelgroove'] == 1]['asymmetryCoefficient'],
         bins=30, alpha=0.5, label='Lengthofkernelgroove 1')
plt.xlabel('asymmetryCoefficient')
plt.ylabel('Frequency')
plt.title('KNN')
plt.legend()
plt.show()

Output:
[Plots: KNN scatter of perimeter vs lengthOfKernel; KNN histogram of asymmetryCoefficient by class]
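The classifier above fixes k=5. A hedged refinement (not part of the original file) is to choose k by cross-validation on the training set:

from sklearn.model_selection import cross_val_score

# Try odd values of k and keep the one with the best 5-fold CV accuracy
best_k, best_score = 1, 0.0
for k in range(1, 20, 2):
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k),
                            X_train, y_train, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score
print("Best k:", best_k, "CV accuracy:", best_score)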
Experiment 4: Study and Implement Classification using SVM.
Abstract:

Code:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Load the CSV file into a pandas dataframe
df = pd.read_csv('Seed_Data.csv')

# Split the data into features and target variable
X = df.drop(['compactness'], axis=1)  # Features
y = df['compactness']  # Target variable

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the performance of the model using the mean squared error and R-squared metrics
print("Mean squared error:", mean_squared_error(y_test, y_pred))
print("R-squared:", r2_score(y_test, y_pred))

# Train the SVM model
model = SVR(kernel='linear')
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the performance of the model using the mean squared error and R-squared metrics
print("Mean squared error:", mean_squared_error(y_test, y_pred))
print("R-squared:", r2_score(y_test, y_pred))

# Extract the relevant features
X = df[['area']]
y = df['widthOfKernel']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train the SVM model
svm_regressor = SVR(kernel='rbf')
svm_regressor.fit(X_train_scaled, y_train)

# Make predictions on the test data
y_pred = svm_regressor.predict(X_test_scaled)

# Calculate the coefficient of determination (R^2) on the test data
r2 = r2_score(y_test, y_pred)
print('R^2 score:', r2)

# Plot the predicted values and the actual values on the test data
plt.figure(figsize=(10, 5))
plt.scatter(X_test, y_test, color='black')
plt.plot(X_test, y_pred, color='blue', linewidth=3)
plt.xlabel('area')
plt.ylabel('widthOfKernel')
plt.title('Support Vector Regression')
plt.show()

# Extract the relevant features
X = df[['lengthOfKernel']]
y = df['asymmetryCoefficient']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train the SVM model
svm_regressor = SVR(kernel='rbf')
svm_regressor.fit(X_train_scaled, y_train)

# Make predictions on the test data
y_pred = svm_regressor.predict(X_test_scaled)

# Calculate the coefficient of determination (R^2) on the test data
r2 = r2_score(y_test, y_pred)
print('R^2 score:', r2)

# Plot the predicted values and the actual values on the test data
plt.figure(figsize=(10, 5))
plt.scatter(X_test, y_test, color='black')
plt.plot(X_test, y_pred, color='blue', linewidth=3)
plt.xlabel('lengthOfKernel')
plt.ylabel('asymmetryCoefficient')
plt.title('Support Vector Regression')
plt.show()

Output:
R^2 score: -0.22790423264754622
[Plot: Support Vector Regression, area vs widthOfKernel]
R^2 score: -0.024969824035952604
[Plot: Support Vector Regression, lengthOfKernel vs asymmetryCoefficient]
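The experiment title asks for classification, but the code above performs regression with SVR. A minimal hedged sketch of SVM classification with SVC, assuming the dataset also carries a discrete class column (named 'target' here, an assumption not shown in the original file):

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Hypothetical: 'target' is assumed to hold discrete class labels
X = df.drop(['target'], axis=1)
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# SVMs are sensitive to feature scale, so standardize before fitting
scaler = StandardScaler()
clf = SVC(kernel='rbf')
clf.fit(scaler.fit_transform(X_train), y_train)
print("Accuracy:", accuracy_score(y_test, clf.predict(scaler.transform(X_test))))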

Experiment 5: Study and Implement Bagging using Random Forests.


Abstract:


Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Load the dataset
data = pd.read_csv('Seed_Data.csv')

# Select the features and the lengthofkernelgroove target variable
X = data.drop('lengthofkernelgroove', axis=1)
y = data['lengthofkernelgroove']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a Random Forest classifier with 100 trees
rf = RandomForestClassifier(n_estimators=100, random_state=42)

# Fit the model to the training data
rf.fit(X_train, y_train)

# Perform predictions on the test data
y_pred = rf.predict(X_test)

# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Create a confusion matrix
confusion_mat = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(confusion_mat)

# Feature importance
feature_importance = rf.feature_importances_
feature_names = X.columns

# Sort feature importance in descending order
sorted_indices = np.argsort(feature_importance)[::-1]
sorted_feature_names = feature_names[sorted_indices]
sorted_feature_importance = feature_importance[sorted_indices]

# Bar plot of feature importance
plt.figure(figsize=(10, 6))
sns.barplot(x=sorted_feature_importance, y=sorted_feature_names)
plt.xlabel('Feature Importance')
plt.ylabel('Features')
plt.title('Random Forest: Feature Importance')
plt.show()

# Heatmap of the confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(confusion_mat, annot=True, cmap='Blues', fmt='d')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Random Forest: Confusion Matrix')
plt.show()
Output:
[Plots: Random Forest feature importance bar chart; Random Forest confusion matrix heatmap]
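A Random Forest is itself a bagging ensemble of decision trees. To make the bagging step explicit, a hedged sketch using scikit-learn's BaggingClassifier (not part of the original file):

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Bagging: fit many trees on bootstrap resamples and aggregate their votes
# (the parameter is 'estimator' in scikit-learn >= 1.2; older releases call it 'base_estimator')
bagging = BaggingClassifier(estimator=DecisionTreeClassifier(),
                            n_estimators=100, random_state=42)
bagging.fit(X_train, y_train)
print("Bagging accuracy:", bagging.score(X_test, y_test))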

Experiment 6: Study and Implement Naive Bayes.


Abstract:


Code:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, classification_report

# Load the dataset
data = pd.read_csv("Seed_Data.csv")

# Split the dataset into features (X) and the lengthofkernelgroove target variable (y)
X = data.drop("lengthofkernelgroove", axis=1)
y = data["lengthofkernelgroove"]

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a Gaussian Naive Bayes model
naive_bayes = GaussianNB()

# Train the model
naive_bayes.fit(X_train, y_train)

# Make predictions on the test set
y_pred = naive_bayes.predict(X_test)

# Create a confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Plot the confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, cmap="Blues", fmt="d", cbar=False)
plt.title("Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()

# Create a classification report
report = classification_report(y_test, y_pred)

# Print the classification report
print("Classification Report:")
print(report)

Output:
[Confusion matrix heatmap and classification report]
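GaussianNB fits one mean and variance per feature per class and applies Bayes' rule at prediction time. A hedged usage sketch (not part of the original file) that inspects the learned parameters and scores a single sample:

# theta_ holds the learned per-class feature means, shape (n_classes, n_features)
print("Per-class feature means:\n", naive_bayes.theta_)

# Predict the class and class probabilities for the first test row
sample = X_test.iloc[[0]]
print("Predicted class:", naive_bayes.predict(sample)[0])
print("Class probabilities:", naive_bayes.predict_proba(sample)[0])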

Experiment 7: Study and Implement Decision Tree Classification.
Abstract:


Code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import confusion_matrix, classification_report

# Load the dataset
data = pd.read_csv("Seed_Data.csv")

# Split the dataset into features (X) and the lengthofkernelgroove target variable (y)
X = data.drop("lengthofkernelgroove", axis=1)
y = data["lengthofkernelgroove"]

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a Decision Tree classifier
decision_tree = DecisionTreeClassifier()

# Train the model
decision_tree.fit(X_train, y_train)

# Plot the decision tree
plt.figure(figsize=(12, 8))
plot_tree(decision_tree, feature_names=X.columns, class_names=['0', '1'], filled=True)
plt.title("Decision Tree")
plt.show()

# Make predictions on the test set
y_pred = decision_tree.predict(X_test)

# Create a confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Plot the confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, cmap="Blues", fmt="d", cbar=False)
plt.title("Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()

# Generate the classification report
report = classification_report(y_test, y_pred)
print("Classification Report:")
print(report)

Output:
[Decision tree diagram, confusion matrix heatmap, and classification report]
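An unconstrained DecisionTreeClassifier keeps splitting until it fits the training data perfectly, which usually overfits. A hedged variant (not part of the original file) that caps the tree depth and compares train and test accuracy:

# max_depth=3 is an illustrative cap; tune it via cross-validation in practice
pruned_tree = DecisionTreeClassifier(max_depth=3, random_state=42)
pruned_tree.fit(X_train, y_train)
print("Train accuracy:", pruned_tree.score(X_train, y_train))
print("Test accuracy:", pruned_tree.score(X_test, y_test))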
