PML Lab Exp Full
Aim
The goal of this experiment is to introduce the basics of Python programming, including syntax,
data types, control structures, and basic operations.
# Float
y = 3.14
print("y is of type:", type(y))

# String
name = "Alice"
print("name is of type:", type(name))

# Boolean
is_student = True
print("is_student is of type:", type(is_student))
Basic Operations
Python supports standard arithmetic operations.
a = 10
b = 3
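The listing above breaks off after defining a and b. As a minimal sketch of what the standard arithmetic operators do with these two values (the selection of operators is an assumption, not the original listing):

```python
a = 10
b = 3

print("Addition:", a + b)
print("Subtraction:", a - b)
print("Multiplication:", a * b)
print("Division:", a / b)          # true division always returns a float
print("Floor division:", a // b)   # quotient rounded down to an integer
print("Modulus:", a % b)           # remainder of the division
print("Exponentiation:", a ** b)
```

Note that / and // differ: the first keeps the fractional part, the second discards it.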
AML311 - PML Lab
2. Control Structures
If Statement
The if statement is used for conditional execution.
age = 20
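The listing stops after age is assigned. A minimal sketch of how the conditional might continue, assuming the intent is to compare age against a threshold (the threshold 18 and both labels are illustrative, not from the original):

```python
age = 20

# The threshold 18 and the two labels are assumed for illustration.
if age >= 18:
    status = "adult"
else:
    status = "minor"

print("Status:", status)
```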
For Loop
The for loop in Python is used for iterating over a sequence.
# Iterating over a list
fruits = ["apple", "banana", "cherry"]
for fruit in fruits:
    print(fruit)
While Loop
The while loop continues executing as long as the condition is true.
count = 0
while count < 5:
    print(count)
    count += 1
3. Data Structures
Lists
Lists are ordered collections of items.
# Creating a list
numbers = [1, 2, 3, 4, 5]

# Accessing elements
print("First element:", numbers[0])
# Modifying elements
numbers[0] = 10
print("Modified list:", numbers)

# Adding elements
numbers.append(6)
print("List after appending:", numbers)

# Removing elements
numbers.remove(3)
print("List after removing 3:", numbers)
Strings
Strings are sequences of characters.
# Creating a string
message = "Hello, World!"

# Accessing characters
print("First character:", message[0])
print("Last character:", message[-1])

# Slicing
print("First 5 characters:", message[:5])
print("Last 5 characters:", message[-5:])

# String methods
print("Uppercase:", message.upper())
print("Lowercase:", message.lower())
print("Count of 'o':", message.count('o'))
print("Replace 'World' with 'Python':", message.replace("World", "Python"))

# Concatenation
greeting = "Hello"
name = "Alice"
full_greeting = greeting + ", " + name + "!"
print(full_greeting)
4. Functions
Functions in Python are defined using the def keyword.
# Defining a function
def greet(name):
    return f"Hello, {name}!"

# Calling a function
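The call itself is missing from the listing. A minimal sketch, assuming the function is called with the name used in the earlier string examples:

```python
def greet(name):
    return f"Hello, {name}!"

# "Alice" is an assumed argument, reused from the string examples.
message = greet("Alice")
print(message)
```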
Result
In this experiment, the basics of Python programming were introduced. Fundamental operations
such as printing output and performing arithmetic operations were covered. Control structures,
including if statements and loops (for and while), were explored to handle conditional and repetitive
tasks. Essential data structures like lists and strings for storing and manipulating collections
of items were discussed. Practical examples demonstrated how to combine these concepts to solve
simple problems, such as checking for prime numbers and summing elements in a list.
Experiment 2: Familiarization of basic Python Libraries
Aim
The aim of this experiment is to become familiar with the basic Python libraries used in data science
and machine learning: NumPy, Pandas, Matplotlib, and Scikit-learn. This includes understanding
their core functionality, performing basic operations, and visualizing data.
NumPy
NumPy (Numerical Python) is a library used for working with arrays and provides mathematical
functions to operate on these arrays.
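No NumPy listing survives in this section. As a minimal sketch of the array operations described above (the sample values and variable names are arbitrary choices, not the original example):

```python
import numpy as np

# Sample values are arbitrary.
arr = np.array([1, 2, 3, 4, 5])

doubled = arr * 2                   # element-wise arithmetic
reshaped = arr[:4].reshape(2, 2)    # slice the first four items, then reshape to 2x2
mean_value = np.mean(arr)
std_value = np.std(arr)             # population standard deviation (ddof=0)

print("Doubled:", doubled)
print("Reshaped:\n", reshaped)
print("Mean:", mean_value)
print("Standard deviation:", std_value)
```

Arithmetic on an array applies to every element at once, which is the main difference from a plain Python list.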
Pandas
Pandas is a powerful library for data manipulation and analysis. It provides data structures like
DataFrame.
• DataFrame Operations
• Data Manipulation
Example Code:
import pandas as pd

# Creating DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 27, 22],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print("DataFrame:\n", df)

# DataFrame Operations
print("Mean Age:", df['Age'].mean())
print("Data in Age column:\n", df['Age'])

# Data Manipulation
df['Age'] = df['Age'] + 1
print("Updated DataFrame:\n", df)
Matplotlib
Matplotlib is a plotting library for creating static, animated, and interactive visualizations in
Python.
• Bar Plot
• Scatter Plot
Example Code:
import matplotlib.pyplot as plt

# Line Plot
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
plt.plot(x, y, label='Line Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Line Plot Example')
plt.legend()
plt.show()

# Bar Plot
categories = ['A', 'B', 'C']
values = [10, 15, 7]
plt.bar(categories, values, color='green')
Scikit-learn (Sklearn)
Scikit-learn is a machine learning library for Python. It features various classification, regression,
and clustering algorithms.
• Training a Model
• Making Predictions
Example Code:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Loading Dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Using only two features for simplicity
X = X[:, :2]

# Splitting Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Training a Model
# Note: the Iris target is a class label (0, 1, 2); it is treated as a number
# here only to demonstrate the fit/predict workflow.
model = LinearRegression()
model.fit(X_train, y_train)

# Making Predictions
predictions = model.predict(X_test)

# Evaluating the Model
mse = mean_squared_error(y_test, predictions)
print("Mean Squared Error:", mse)
Result
The functionalities of essential Python libraries for data science and machine learning were explored.
NumPy was used for array operations and mathematical functions. Pandas was used for data manipulation
with DataFrames. Matplotlib was employed to create various plots for data visualization.
Finally, Scikit-learn was utilized for loading datasets, training a machine learning model, making
predictions, and evaluating model performance.
Viva
NumPy
1. What is NumPy and why is it used in Python programming?
NumPy is a library for numerical computations in Python, providing support for arrays and
mathematical functions to perform operations efficiently.
4. What are some common operations you can perform on NumPy arrays?
Common operations include arithmetic operations (addition, multiplication), reshaping, slicing,
and applying mathematical functions like mean, sum, and standard deviation.
6. What are some mathematical functions provided by NumPy? Can you give examples?
Functions include np.mean(), np.sum(), np.sin(), and np.log(). Example: np.mean([1, 2, 3])
computes the average.
7. How do you compute the mean and standard deviation of a NumPy array?
Use np.mean(array) and np.std(array), respectively.
Pandas
1. What is Pandas and what are its primary data structures?
Pandas is a data manipulation library with two primary data structures: Series (one-dimensional)
and DataFrame (two-dimensional).
Matplotlib
1. What is Matplotlib and why is it used in data visualization?
Matplotlib is a plotting library used to create static, interactive, and animated visualizations
in Python.
4. How can you customize the appearance of a plot in Matplotlib (e.g., colors, labels,
title)?
Use functions like plt.xlabel(), plt.ylabel(), plt.title(), and plt.plot(x, y, color='r')
to set labels, titles, and colors.
7. Can you create multiple plots in a single figure using Matplotlib? How?
Yes, use plt.subplot() to create multiple subplots in one figure.
Scikit-learn
1. What is Scikit-learn and what types of tasks is it used for?
Scikit-learn is a machine learning library for Python, used for tasks such as classification,
regression, clustering, and dimensionality reduction.
2. How do you load a dataset using Scikit-learn? Provide an example with the Iris
dataset.
Use datasets.load_iris() to load the Iris dataset.
4. How do you fit a machine learning model to training data using Scikit-learn?
Use the fit() method on the model instance. Example: model.fit(X_train, y_train).
Algorithm
1. Accept user input for the elements of two lists, where the elements are separated by spaces.
5. Display the original lists, the union of the lists, and the intersection of the lists.
Code
# Accept user input for the lists
list1 = input("Enter the elements of the first list, separated by spaces: ").split()
list2 = input("Enter the elements of the second list, separated by spaces: ").split()
print(f"List 1: {list1}")
print(f"List 2: {list2}")
print(f"Union: {union}")
print(f"Intersection: {intersection}")
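The step that computes union and intersection is missing from the listing. A minimal sketch of one way to do it, assuming duplicates should be dropped and first-seen order kept (the helper names list_union and list_intersection are hypothetical; the original may have used sets or a different loop):

```python
def list_union(first, second):
    """Items that appear in either list, without duplicates, in first-seen order."""
    result = []
    for item in first + second:
        if item not in result:
            result.append(item)
    return result


def list_intersection(first, second):
    """Items that appear in both lists, without duplicates, in the order of `first`."""
    result = []
    for item in first:
        if item in second and item not in result:
            result.append(item)
    return result


# Fixed sample input stands in for the input().split() calls.
list1 = "1 2 3 4".split()
list2 = "3 4 5 6".split()

union = list_union(list1, list2)
intersection = list_intersection(list1, list2)

print(f"Union: {union}")
print(f"Intersection: {intersection}")
```

Because split() returns strings, the elements are compared as text; converting them with int() first would be needed for numeric comparison.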
Result
The experiment was conducted successfully, and the union and intersection of the lists were computed.
Algorithm
1. Accept input for the number of rows and columns of the first matrix.
2. Accept input for the number of rows and columns of the second matrix.
3. Ensure that the number of columns in the first matrix is equal to the number of rows in the
second matrix to enable multiplication.
6. Initialize a result matrix with dimensions equal to the number of rows of the first matrix and
the number of columns of the second matrix.
7. Iterate through each row of the first matrix and each column of the second matrix.
8. Compute the dot product of the row from the first matrix and the column from the second
matrix.
9. Store the computed value in the corresponding position in the result matrix.
10. Display the first matrix, the second matrix, and the resulting matrix after multiplication.
Code
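# Function to multiply two matrices
def multiply_matrices(matrix1, matrix2):
    # Get dimensions of matrices
    rows1 = len(matrix1)
    cols1 = len(matrix1[0])
    rows2 = len(matrix2)
    cols2 = len(matrix2[0])

    # Initialize the result matrix (rows1 x cols2) with zeros
    result = [[0] * cols2 for _ in range(rows1)]

    # Dot product of each row of matrix1 with each column of matrix2
    for i in range(rows1):
        for j in range(cols2):
            for k in range(cols1):
                result[i][j] += matrix1[i][k] * matrix2[k][j]
    return result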
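# Input dimensions of both matrices (prompt wording is assumed)
rows1 = int(input("Enter the number of rows of the first matrix: "))
cols1 = int(input("Enter the number of columns of the first matrix: "))
rows2 = int(input("Enter the number of rows of the second matrix: "))
cols2 = int(input("Enter the number of columns of the second matrix: "))

# Ensure that number of columns in the first matrix equals the number of rows in the second matrix
if cols1 != rows2:
    print("Matrix multiplication not possible. Number of columns in the first matrix "
          "must equal number of rows in the second matrix.")
else:
    # Input first matrix
    print("Enter elements of the first matrix:")
    matrix1 = []
    for i in range(rows1):
        row = list(map(int, input().split()))
        matrix1.append(row)

    # Input second matrix (reconstructed to mirror the first)
    print("Enter elements of the second matrix:")
    matrix2 = []
    for i in range(rows2):
        row = list(map(int, input().split()))
        matrix2.append(row)

    # Multiply matrices
    result = multiply_matrices(matrix1, matrix2)

    # Display matrices
    print("Matrix 1:")
    for row in matrix1:
        print(row)
    print("Matrix 2:")
    for row in matrix2:
        print(row)
    print("Resulting Matrix:")
    for row in result:
        print(row)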
Result
The experiment was conducted successfully, and the matrices were multiplied to produce the resultant
matrix.
Algorithm
1. Open the text file test.txt in read mode.
Code
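def find_most_frequent_words(filename):
    with open(filename, 'r') as file:
        content = file.read()
    words = content.split()
    # Count how often each word occurs
    word_freq = {}
    for word in words:
        word_freq[word] = word_freq.get(word, 0) + 1
    max_freq = max(word_freq.values())
    most_frequent_words = [word for word, freq in word_freq.items() if freq == max_freq]
    # Display the result (output wording is assumed)
    print(f"Most frequent word(s): {most_frequent_words} (frequency: {max_freq})")

filename = 'test.txt'
find_most_frequent_words(filename)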
Result
The experiment was conducted successfully, and the most frequent words in the text file were found
and displayed.
Experiment 6: Single, Multi variable and Polynomial Regression
Aim
Implement and demonstrate Single-variable, Multi-variable, and Polynomial Regression for a given set of
training data stored in a .CSV file and evaluate the accuracy.
Algorithm
1. Load the dataset from the CSV file into a pandas DataFrame.
Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures
# Load dataset
data = pd.read_csv('data.csv')
# Prepare data
X = data[['SquareFeet']].values
y = data['Price'].values
# Single-variable Regression
X_train_single = X
y_train_single = y
model_single = LinearRegression()
model_single.fit(X_train_single, y_train_single)
y_pred_single = model_single.predict(X_train_single)
mse_single = mean_squared_error(y_train_single, y_pred_single)
print("Single-variable Regression")
print(f"Mean Squared Error: {mse_single}")
# Polynomial Regression
poly = PolynomialFeatures(degree=3)
X_poly = poly.fit_transform(X)
model_poly = LinearRegression()
model_poly.fit(X_poly, y)
y_pred_poly = model_poly.predict(X_poly)
mse_poly = mean_squared_error(y, y_pred_poly)
print("\nPolynomial Regression")
print(f"Mean Squared Error: {mse_poly}")
Result
The experiment was conducted for Single-variable, Multi-variable, and Polynomial Regression, and the
Mean Squared Errors for each were obtained.
Viva Questions
General Concepts
1. What is regression analysis, and why is it important?
• Regression analysis is a statistical technique used to model and analyze the relationships between
a dependent variable and one or more independent variables. It helps in understanding
the strength and nature of these relationships, making predictions, and inferring causal relationships.
2. What are the differences between linear regression and polynomial regression?
• Linear regression models the relationship between the independent and dependent variables
using a straight line. Polynomial regression extends this by fitting a polynomial function to the
data, allowing for more complex, non-linear relationships.
Single-variable Regression
1. What is single-variable regression?
• Single-variable regression, or simple linear regression, involves modeling the relationship between
a single independent variable and a dependent variable. It aims to find the best-fitting line that
minimizes the error between the predicted and actual values.
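The best-fitting line described above has a closed-form solution. As a minimal sketch in plain Python, assuming ordinary least squares (the helper names fit_line and mean_squared_error and the data points are made up for illustration, and this is not the scikit-learn implementation used in the experiment):

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(ys, predictions):
    """Average of the squared differences between actual and predicted values."""
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


# Made-up data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
predictions = [slope * x + intercept for x in xs]
mse = mean_squared_error(ys, predictions)

print("Slope:", slope)
print("Intercept:", intercept)
print("Mean Squared Error:", mse)
```

With noisy data the error would be positive; here the points are collinear, so the fitted line passes through all of them.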
Multi-variable Regression
1. What is multi-variable regression, and how does it differ from single-variable regression?
• Multi-variable regression, or multiple linear regression, models the relationship between multiple
independent variables and a dependent variable. Unlike single-variable regression, which uses
one feature, multi-variable regression uses several features to predict the target variable.
2. How does adding polynomial features in multi-variable regression help in modeling?
• Adding polynomial features allows the model to capture non-linear relationships between the
features and the target variable. This can improve model performance by fitting the data more
accurately.
3. What is the purpose of feature scaling in regression analysis?
• Feature scaling ensures that all features contribute equally to the model’s performance by
standardizing their ranges. This is particularly important in algorithms sensitive to feature
scales, though it is less critical for linear regression.
Polynomial Regression
1. What is polynomial regression, and when would you use it?
• Polynomial regression is used when the relationship between the independent and dependent
variables is non-linear. By fitting a polynomial function, the model can capture more complex
patterns in the data that linear regression cannot.
2. How do you choose the degree of the polynomial in polynomial regression?
• The degree of the polynomial is typically chosen based on model performance metrics and
cross-validation. Higher-degree polynomials can fit the training data better but may also lead
to overfitting. It’s important to balance model complexity with generalization.
3. What is overfitting, and how can it be prevented in polynomial regression?
• Overfitting occurs when the model learns the noise in the training data rather than the underlying
pattern, resulting in poor performance on new data. To prevent overfitting, one can use
techniques such as regularization, cross-validation, and selecting an appropriate degree for the
polynomial.
• Visualizing regression results helps in understanding the model’s fit, identifying patterns, and
detecting any issues such as overfitting or underfitting. It also aids in communicating results
and insights to stakeholders.
• A residual plot shows the residuals (errors) on the y-axis versus the predicted values or another
variable on the x-axis. It helps in diagnosing issues with the regression model, such as non-linearity,
heteroscedasticity, or outliers.
• Saving plots allows for a permanent record of the model’s performance and the relationships
between variables. This is useful for documentation, reporting, and further analysis.
Practical Considerations
1. What steps would you take if your regression model performs poorly?
• In linear regression, each coefficient represents the change in the dependent variable for a one-unit
change in the corresponding feature, assuming other features remain constant. In polynomial
regression, coefficients reflect the contribution of polynomial terms to the prediction.
Experiment 7: Logistic Regression
Aim
To implement and demonstrate logistic regression on a dataset and evaluate the accuracy.
Algorithm
1. Load Dataset: Read the dataset from a CSV file.
2. Prepare Data: Extract features and target variables from the dataset.
3. Split Data: Divide the dataset into training and testing sets.
5. Train Model: Fit the logistic regression model on the training data.
6. Predict: Use the trained model to predict the target values on the test set.
7. Evaluate Model: Calculate the accuracy and confusion matrix of the model.
8. Plot Decision Boundary: Visualize the decision boundary by plotting it along with the data
points.
Program
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
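data = pd.read_csv('student_admission.csv')
X = data[['Entrance_Score', '12th_Class_Score']].values
y = data['Admitted'].values
# Split data (test_size and random_state are assumed values, matching Experiment 2)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Evaluate the model on the test set
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print("Confusion Matrix:")
print(conf_matrix)
# Predict over a grid covering the feature range (margin and step size are assumed)
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1), np.arange(y_min, y_max, 0.1))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.figure(figsize=(10, 6))
plt.contourf(xx, yy, Z, alpha=0.8, cmap=plt.cm.coolwarm)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', marker='o', cmap=plt.cm.coolwarm)
plt.xlabel('Entrance Score')
plt.ylabel('12th Class Score')
plt.title("Logistic Regression Decision Boundary")
plt.savefig('logistic_regression_boundary.png')
plt.clf()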
Result
The logistic regression model was successfully implemented and evaluated on the given dataset.
Viva Questions
1. What is logistic regression, and how does it differ from linear regression?
Logistic regression is a statistical model used for binary classification tasks. Unlike linear regression,
which predicts continuous outcomes, logistic regression predicts the probability of a binary outcome
by applying a logistic function to the linear combination of input features.
2. What is the purpose of the sigmoid function in logistic regression?
The sigmoid function, or logistic function, maps any real-valued number into the (0, 1) interval. It
is used in logistic regression to convert the linear output of the model into a probability that can be
used to make binary predictions.
3. What is the decision boundary in logistic regression, and how is it visualized?
The decision boundary is a line or surface that separates different classes in the feature space. It
is determined by the logistic regression model and can be visualized by plotting the probability of
classification over the feature space and highlighting the regions corresponding to different classes.
4. What is the role of the training and testing sets in model evaluation?
The training set is used to fit the model and learn the parameters, while the testing set is used to
evaluate the model’s performance on unseen data. This separation helps in assessing how well the
model generalizes to new, unseen examples.
5. What is a confusion matrix, and how is it used in evaluating a logistic regression model?
A confusion matrix is a table used to evaluate the performance of a classification model. It displays
the number of true positive, true negative, false positive, and false negative predictions. By analyzing
these values, one can assess how well the model is distinguishing between classes.
6. What do the terms True Positive (TP), True Negative (TN), False Positive (FP), and
False Negative (FN) represent in a confusion matrix?
8. How can you use a confusion matrix to calculate precision and recall?
\[ \text{Recall} = \frac{TP}{TP + FN} \]
It measures the proportion of true positives among the actual positives.
9. What is the F1-score, and how is it derived from the confusion matrix?
The F1-score is the harmonic mean of precision and recall, providing a single metric that balances
both. It is calculated as:
\[ \text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]
This metric is useful when the class distribution is imbalanced and provides a more comprehensive
evaluation of the model’s performance.
10. What insights can be gained from analyzing the confusion matrix for a logistic regression
model?
Analyzing the confusion matrix helps in understanding where the model is making errors. For
example, high false positive rates may indicate that the model is overly aggressive in predicting the
positive class, while high false negative rates suggest it may be missing many positive instances. It
provides a detailed view of the model’s strengths and weaknesses in classification tasks.
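The metrics discussed in these questions follow directly from the four confusion-matrix counts. As a minimal sketch (the function name classification_metrics and the counts are made up for illustration):

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, precision, recall, f1


# Made-up counts: 40 true positives, 45 true negatives, 5 false positives, 10 false negatives.
accuracy, precision, recall, f1 = classification_metrics(tp=40, tn=45, fp=5, fn=10)

print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1-score: {f1:.2f}")
```

Here precision (40/45) is higher than recall (40/50), which reflects a model that misses more positives than it falsely flags.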
Experiment 8: Naive Bayes Classifier
Aim
To implement a Python program that uses the Naive Bayes classifier to classify a dataset and calculate
the accuracy, precision, and recall for the dataset.
Algorithm:
1. Load the dataset from a CSV file.
2. Split the dataset into features (X) and target variable (y).
7. Evaluate the model by calculating accuracy, precision, recall, and confusion matrix.
Program:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
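data = pd.read_csv('student_performance.csv')
X = data[['Study_Hours', 'Previous_Exam_Score']].values
y = data['Passed'].values
# Split data (test_size and random_state are assumed values, matching Experiment 2)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = GaussianNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Evaluate the model (binary 0/1 labels are assumed for precision and recall)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print("Confusion Matrix:")
print(conf_matrix)
# Predict over a grid covering the feature range (margin and step size are assumed)
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1), np.arange(y_min, y_max, 0.1))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.figure(figsize=(10, 6))
plt.contourf(xx, yy, Z, alpha=0.8, cmap=plt.cm.coolwarm)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', marker='o', cmap=plt.cm.coolwarm)
plt.xlabel('Study Hours')
plt.ylabel('Previous Exam Score')
plt.title("Naive Bayes Decision Boundary")
plt.savefig('naive_bayes_boundary.png')
plt.clf()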
Result:
The Naive Bayes classifier was successfully implemented and evaluated. The accuracy, precision, and recall
were calculated.
4. How is the Naive Bayes classifier different from other classifiers like Logistic Regression?
Naive Bayes is a probabilistic model based on Bayes’ Theorem, while logistic regression is a linear
model that predicts the probability of class membership using a logistic function. Naive Bayes
assumes feature independence, whereas logistic regression does not assume any particular relationship
between the features.
Experiment 9: Decision Tree-based ID3 Algorithm
Aim
To write a Python program to demonstrate the working of the Decision Tree using the ID3 algorithm.
Algorithm
1. Load the dataset: Load a dataset that can be used for building the decision tree. This dataset should
have labeled data, where each sample consists of features and the corresponding target class.
2. Split the dataset: Divide the dataset into training and testing sets to evaluate the performance of
the decision tree.
4. Train the Model: Use the training dataset to train the decision tree model.
5. Classify New Data: Input a new data sample and classify it using the trained decision tree.
6. Evaluate the Model: Use the test data to evaluate the accuracy of the model.
8. Display the accuracy, precision, and recall, along with an image of the decision tree.
Program
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix
from sklearn import tree
import matplotlib.pyplot as plt
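data = pd.read_csv('student_performance.csv')
# criterion='entropy' selects splits by information gain, as in ID3
# (scikit-learn's tree itself is an optimized version of CART)
clf = DecisionTreeClassifier(criterion='entropy')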
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print("Confusion Matrix:")
print(conf_matrix)
plt.figure(figsize=(12,8))
tree.plot_tree(clf, feature_names=['Study_Hours', 'Assignments_Completed', 'Attendance'],
               class_names=['Low', 'Medium', 'High'], filled=True)
plt.title("Decision Tree Visualization")
plt.savefig('decision_tree_visualization.png')
plt.clf()
Result
The Decision Tree classifier is successfully trained on the dataset and the model’s accuracy, precision, and
recall are calculated using the test set.
3. What are the criteria used for splitting nodes in the ID3 algorithm?
ID3 uses information gain, which is based on the concept of entropy. The feature with the highest
information gain is selected for splitting the node.
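Entropy and information gain can be computed in a few lines. A minimal sketch for categorical features, assuming base-2 entropy (the function names and the toy rows are made up for illustration):

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * log2(count / total) for count in Counter(labels).values())


def information_gain(rows, labels, feature_index):
    """Entropy of the labels minus the weighted entropy after splitting on one feature."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature_index], []).append(label)
    remainder = sum(len(group) / total * entropy(group) for group in groups.values())
    return entropy(labels) - remainder


# Toy data: feature 0 separates the classes perfectly, feature 1 not at all.
rows = [("high", "yes"), ("high", "no"), ("low", "yes"), ("low", "no")]
labels = ["pass", "pass", "fail", "fail"]

gains = [information_gain(rows, labels, i) for i in range(2)]
best_feature = gains.index(max(gains))

print("Information gain per feature:", gains)
print("Feature chosen for the split:", best_feature)
```

ID3 would split on feature 0 here, since it removes all uncertainty about the class.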
6. What are the common metrics used to evaluate a decision tree model?
Common metrics include:
• Pruning: Reducing the size of the tree by removing nodes that provide little information gain.
• Setting a maximum depth: Limiting the depth of the tree to control its complexity.
• Minimum samples for split: Requiring a minimum number of samples for splitting a node.
• Cross-validation: Using validation data to tune parameters and avoid overfitting.
Experiment 10: Support Vector Machine Classifier
Aim
Write a Python program to implement a Support Vector Machine (SVM) classifier to classify a dataset
and evaluate the accuracy.
Algorithm
1. Load the dataset from a CSV file.
2. Separate the dataset into features (input variables) and target (output variable).
7. Evaluate the model using metrics such as accuracy, precision, and recall.
Program
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix
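data = pd.read_csv('student_performance_svm.csv')
X = data[['Hours_Studied', 'Previous_Score']].values
y = data['Pass'].values
# Split data (test_size and random_state are assumed values, matching Experiment 2)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = SVC(kernel='linear')
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Evaluate the model (binary 0/1 labels are assumed for precision and recall)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print("Confusion Matrix:")
print(conf_matrix)
# Predict over a grid covering the feature range (margin and step size are assumed)
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1), np.arange(y_min, y_max, 0.1))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.figure(figsize=(10, 6))
plt.contourf(xx, yy, Z, alpha=0.8, cmap=plt.cm.coolwarm)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', marker='o', cmap=plt.cm.coolwarm)
plt.xlabel('Hours Studied')
plt.ylabel('Previous Score')
plt.title("SVM Decision Boundary")
plt.savefig('svm_decision_boundary.png')
plt.clf()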
Result
The SVM classifier was successfully implemented to classify the dataset. The accuracy, precision, and
recall were calculated.
Experiment 11: K-Nearest Neighbor Algorithm
Aim
To implement the K-Nearest Neighbor (KNN) algorithm to classify a dataset and evaluate the classification
accuracy.
Algorithm
1. Read the dataset from a CSV file and load it into a pandas DataFrame.
3. Normalize the feature data to ensure all features are on the same scale.
Program
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt
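data = pd.read_csv('student_performance_knn.csv')
X = data[['StudyHours', 'Attendance']].values
y = data['Performance'].map({'Low': 0, 'Medium': 1, 'High': 2}).values
# Split data (test_size and random_state are assumed values, matching Experiment 2)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
# Three classes, so precision and recall need an averaging mode ('weighted' is assumed)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
conf_matrix = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
plt.figure(figsize=(6, 6))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Low', 'Medium', 'High'], yticklabels=['Low', 'Medium', 'High'])
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.savefig('confusion_matrix_knn.png')
plt.close()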
Result
The K-Nearest Neighbor (KNN) classifier was successfully implemented to classify the given dataset. The
accuracy, precision, and recall values were calculated.
data. If K is too large, the model may oversimplify the decision boundary, leading to underfitting
and reduced accuracy.
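The effect of K can be seen with a small from-scratch classifier. A minimal sketch, assuming Euclidean distance and a simple majority vote (the function name knn_predict and the points are made up; math.dist requires Python 3.8 or later):

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    neighbors = sorted(zip(train_points, train_labels),
                       key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Made-up 2-D points: three "Low" points near the origin, four "High" points far away.
train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (9, 9)]
train_labels = ["Low", "Low", "Low", "High", "High", "High", "High"]
query = (2, 2)

small_k = knn_predict(train_points, train_labels, query, k=3)
large_k = knn_predict(train_points, train_labels, query, k=7)

print("Prediction with k=3:", small_k)
print("Prediction with k=7:", large_k)
```

With k=3 the query is labeled by its close neighbors; with k=7 every training point votes, so the larger class wins regardless of distance.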
Experiment 12: K-Means Clustering
Aim
To implement the K-Means Clustering algorithm using a given dataset and evaluate the clustering results.
Algorithm
1. Load the dataset containing the features to be clustered.
5. For each data point, compute the distance from the centroids and assign it to the nearest cluster.
6. After all points are assigned, recompute the centroid of each cluster by averaging the points in that
cluster.
7. Repeat the assign-recompute process until the centroids no longer change or the change is minimal.
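The assign-recompute loop in steps 5 to 7 can be sketched in plain Python. This is an illustration under stated assumptions, not the scikit-learn KMeans used in the program: the function name k_means, the sample points, and the fixed starting centroids are all made up (real runs usually pick starting centroids at random).

```python
from math import dist


def k_means(points, centroids, max_iterations=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
            clusters[nearest].append(point)

        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for centroid, cluster in zip(centroids, clusters):
            if cluster:
                new_centroids.append(tuple(sum(axis) / len(cluster) for axis in zip(*cluster)))
            else:
                new_centroids.append(centroid)  # an empty cluster keeps its centroid

        if new_centroids == centroids:
            break  # converged: no centroid changed
        centroids = new_centroids
    return centroids, clusters


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
initial_centroids = [(1.0, 1.0), (9.0, 8.0)]  # fixed start, chosen for reproducibility

centroids, clusters = k_means(points, initial_centroids)
print("Centroids:", centroids)
print("Clusters:", clusters)
```

The stopping test here is exact equality of centroids; a tolerance on the centroid shift is a common alternative for real-valued data.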
Program
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
import numpy as np
data = pd.read_csv('student_performance_kmeans.csv')
X = data[['Math_Score', 'Science_Score']].values
labels = kmeans.labels_
centroids = kmeans.cluster_centers_
plt.figure(figsize=(10, 6))
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='rainbow', edgecolor='k', marker='o')
plt.scatter(centroids[:, 0], centroids[:, 1], c='black', s=200, alpha=0.75)
plt.xlabel('Math Score')
plt.ylabel('Science Score')
plt.title('K-Means Clustering of Student Performance')
plt.savefig('kmeans_clusters.png')
plt.clf()
Result
The K-Means Clustering algorithm was successfully implemented and the clustering results were visualized.
Experiment 13: Artificial Neural Network using Backpropagation
Aim
To implement an Artificial Neural Network (ANN) using the Backpropagation algorithm and test it on a
given dataset.
Algorithm
1. Load the dataset and preprocess it as required.
3. Initialize the Artificial Neural Network. Set the number of input neurons, hidden neurons, and
output neurons, and initialize weights and biases randomly for each layer.
4. For each layer in the network, compute the weighted sum of inputs, apply the activation function
(ReLU), and pass the outputs to the next layer.
5. Calculate the error at the output layer using the loss function and propagate the error back through
the network.
6. Train the ANN using the backpropagation algorithm by feeding the training data.
7. Update the weights of the network during training to minimize the error using gradient descent.
8. Once the network is trained, predict the target values for the test set.
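Steps 5 to 7 hinge on the derivative of the activation function. As a minimal sketch of one gradient-descent update, assuming a single sigmoid neuron with one input and a squared-error loss (the input, target, starting weight, and learning rate are made-up values, and the ReLU named in step 4 would need its own derivative):

```python
from math import exp


def sigmoid(z):
    return 1 / (1 + exp(-z))


def sigmoid_derivative(output):
    """Derivative of the sigmoid, written in terms of its output s: s * (1 - s)."""
    return output * (1 - output)


# Made-up training example and starting parameters.
x, target = 1.5, 1.0
weight, bias = 0.2, 0.0
learning_rate = 0.1


def loss(w, b):
    """Squared-error loss of the neuron for the single training example."""
    return 0.5 * (sigmoid(w * x + b) - target) ** 2


# Forward pass
output = sigmoid(weight * x + bias)

# Backward pass (chain rule): dL/dw = (output - target) * sigmoid'(z) * x
delta = (output - target) * sigmoid_derivative(output)
grad_weight = delta * x
grad_bias = delta

# Gradient-descent update
loss_before = loss(weight, bias)
weight -= learning_rate * grad_weight
bias -= learning_rate * grad_bias
loss_after = loss(weight, bias)

print("Loss before update:", loss_before)
print("Loss after update:", loss_after)
```

A full network repeats the same chain-rule step layer by layer, passing delta backward through the weights.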
Program
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix
import matplotlib.pyplot as plt
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
def sigmoid_derivative(x):
    # x is expected to be a sigmoid output s, since sigmoid'(z) = s * (1 - s)
    return x * (1 - x)
class NeuralNetwork:
data = pd.read_csv('student_performance_ann.csv')
le = LabelEncoder()
data['Result'] = le.fit_transform(data['Result'])
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
input_size = X_train.shape[1]
hidden_size = 5
output_size = y_train.shape[1]
learning_rate = 0.1
epochs = 1000
nn.train(X_train, y_train, learning_rate, epochs)
y_pred = nn.predict(X_test)
y_pred = np.argmax(y_pred, axis=1)
y_test = np.argmax(y_test, axis=1)
Result
The Artificial Neural Network model was successfully implemented using the Backpropagation algorithm.
The accuracy of the model was evaluated, and the confusion matrix was generated.