Machine Learning Lab Manual (BCSL606)
Laboratory Manual
Semester: VI
Compiled By:
Mrs. Ayisha Khanum
Assistant Professor,
Department of CSD
PO    PO Description
PO1   Engineering knowledge: Apply the knowledge of mathematics, science, engineering fundamentals, and an engineering specialization to the solution of complex engineering problems.
PO2   Problem analysis: Identify, formulate, review research literature, and analyze complex engineering problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences.
PO3   Design/development of solutions: Design solutions for complex engineering problems and design system components or processes that meet the specified needs with appropriate consideration for the public health and safety, and the cultural, societal, and environmental considerations.
PO4   Conduct investigations of complex problems: Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions.
PO5   Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern engineering and IT tools including prediction and modeling to complex engineering activities with an understanding of the limitations.
PO6   The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional engineering practice.
PO7   Environment and sustainability: Understand the impact of the professional engineering solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for sustainable development.
PO8   Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of the engineering practice.
PO9   Individual and team work: Function effectively as an individual, and as a member or leader in diverse teams, and in multidisciplinary settings.
PO10  Communication: Communicate effectively on complex engineering activities with the engineering community and with society at large, such as, being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions.
PO11  Project management and finance: Demonstrate knowledge and understanding of the engineering and management principles and apply these to one’s own work, as a member and leader in a team, to manage projects and in multidisciplinary environments.
PO12  Life-long learning: Recognize the need for, and have the preparation and ability to engage in independent and life-long learning in the broadest context of technological change.
Sl. No. Experiments
1 Develop a program to create histograms for all numerical features and analyze the distribution of each feature.
Generate box plots for all numerical features and identify any outliers. Use California Housing dataset.
Book 1: Chapter 2
2 Develop a program to compute the correlation matrix to understand the relationships between pairs of
features. Visualize the correlation matrix using a heatmap to know which variables have strong
positive/negative correlations. Create a pair plot to visualize pairwise relationships between features. Use
California Housing dataset.
Book 1: Chapter 2
3 Develop a program to implement Principal Component Analysis (PCA) for reducing the dimensionality of the
Iris dataset from 4 features to 2.
Book 1: Chapter 2
4 For a given set of training data examples stored in a .CSV file, implement and demonstrate the Find-S
algorithm to output a description of the set of all hypotheses consistent with the training examples.
Book 1: Chapter 3
5 Develop a program to implement the k-Nearest Neighbour algorithm to classify the randomly generated 100 values
of x in the range of [0,1]. Perform the following based on the dataset generated.
1. Label the first 50 points {x1, ..., x50} as follows: if (xi <= 0.5), then xi ∈ Class1, else xi ∈ Class2
2. Classify the remaining points, x51, ..., x100, using KNN. Perform this for k = 1, 2, 3, 4, 5, 20, 30
Book 2: Chapter 2
6 Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select
an appropriate data set for your experiment and draw graphs.
Book 1: Chapter 4
7 Develop a program to demonstrate the working of Linear Regression and Polynomial Regression. Use Boston
Housing Dataset for Linear Regression and Auto MPG Dataset (for vehicle fuel efficiency prediction) for
Polynomial Regression.
Book 1: Chapter 5
8 Develop a program to demonstrate the working of the decision tree algorithm. Use Breast Cancer Data set for
building the decision tree and apply this knowledge to classify a new sample.
Book 2: Chapter 3
9 Develop a program to implement the Naive Bayesian classifier considering Olivetti Face Data set for training.
Compute the accuracy of the classifier, considering a few test data sets.
Book 2: Chapter 4
10 Develop a program to implement k-means clustering using Wisconsin Breast Cancer data set and visualize the
clustering result.
Book 2: Chapter 4
Course outcomes (Course Skill Set):
Record should contain all the specified experiments in the syllabus and each experiment write-up will
be evaluated for 10 marks.
Total marks scored by the students are scaled down to 30 marks (60% of maximum marks).
Department shall conduct a test of 100 marks after the completion of all the experiments listed in the
syllabus.
In a test, test write-up, conduction of experiment, acceptable result, and procedural knowledge will
carry a weightage of 60% and the rest 40% for viva-voce.
Suitable rubrics can be designed to evaluate each student's performance and learning ability.
The marks scored shall be scaled down to 20 marks (40% of the maximum marks).
The sum of the scaled-down marks scored in the report write-up/journal and in the test is the total CIE
marks scored by the student.
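The scaling rules above reduce to simple arithmetic. The sketch below assumes each experiment write-up is marked out of 10 and the test out of 100, as stated; the function name `cie_total` and its parameters are hypothetical and only illustrate the calculation.

```python
def cie_total(record_marks, test_marks, marks_per_experiment=10):
    """Combine record and test marks into a CIE score out of 50."""
    record_max = marks_per_experiment * len(record_marks)
    record_scaled = sum(record_marks) * 30 / record_max  # record scaled down to 30
    test_scaled = test_marks * 20 / 100                  # test scaled down to 20
    return record_scaled + test_scaled


# Example: 8/10 on each of ten write-ups and 70/100 in the test
print(cie_total([8] * 10, 70))
```

A student with full marks everywhere reaches the CIE maximum of 50 under these assumptions.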
Semester End Evaluation (SEE):
1. SEE marks for the practical course are 50 Marks.
2. SEE shall be conducted jointly by two examiners of the same institute; the examiners are appointed
by the Head of the Institute.
3. The examination schedule and names of examiners are communicated to the university before the
conduction of the examination. These practical examinations are to be conducted within the
schedule mentioned in the academic calendar of the University.
6. Students can pick one question (experiment) from the questions lot prepared by the examiners
jointly.
7. Evaluation of test write-up/ conduction procedure and result/viva will be conducted jointly by
examiners.
General rubrics suggested for SEE are: write-up 20%, conduction procedure and result 60%, and
viva-voce 20% of maximum marks. SEE for the practical shall be evaluated for 100 marks, and the marks
scored shall be scaled down to 50 marks (however, based on course type, rubrics shall be decided by the
examiners).
Change of experiment is allowed only once and 15% of Marks allotted to the procedure part are to be
made zero.
Python Code:
import matplotlib
matplotlib.use('TkAgg') # Use TkAgg backend
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
Output:
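The listing above breaks off after its imports. For the outlier step of Experiment 1, a box plot flags points that lie more than 1.5 times the interquartile range (IQR) beyond the quartiles, and the same rule can be applied numerically. The sketch below is a minimal illustration; the function name `iqr_outlier_mask` and the sample values are assumptions, not part of the original program.

```python
import numpy as np


def iqr_outlier_mask(values, whisker=1.5):
    """Return a boolean mask marking points outside the box-plot whiskers."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
    return (values < lower) | (values > upper)


# Small synthetic column standing in for one numerical feature (illustrative only)
sample = np.array([2.1, 2.4, 2.2, 2.8, 3.0, 2.6, 2.5, 15.0])
mask = iqr_outlier_mask(sample)
print("Outliers:", sample[mask])
```

With a DataFrame of housing features, the same function could be applied column by column, for example `df.apply(lambda col: iqr_outlier_mask(col).sum())` to count outliers per feature.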
Python Code:
import matplotlib
matplotlib.use('TkAgg') # Use TkAgg backend
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
Output:
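The listing above stops after its imports. A minimal sketch of the remaining steps, using the same scikit-learn classes, might look as follows: standardize the four Iris features, then project them onto the first two principal components. Variable names such as `X_2d` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()

# PCA is scale-sensitive, so give every feature zero mean and unit variance first
X_scaled = StandardScaler().fit_transform(iris.data)

# Keep the two directions of largest variance
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print("Reduced shape:", X_2d.shape)
print("Explained variance ratio:", pca.explained_variance_ratio_.round(3))
```

The two columns of `X_2d` can then be passed to `plt.scatter`, coloured by `iris.target`, to visualize the reduced data.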
Python Code:
import pandas as pd
# Find-S algorithm
def find_s_algorithm(X, y):
    # Placeholder hypothesis with '?' in every position. Find-S proper starts from the
    # most specific hypothesis, so the first positive example must overwrite these values.
    hypothesis = ['?' for _ in range(X.shape[1])]
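Find-S conventionally starts from the most specific hypothesis and generalizes it only when a positive example disagrees with it; negative examples are ignored. The sketch below is a minimal pure-Python version under that convention. It takes rows as (attributes, label) pairs rather than a DataFrame, and the function name `find_s`, the `positive_label` parameter, and the toy weather rows are illustrative assumptions.

```python
def find_s(examples, positive_label="Yes"):
    """Return the maximally specific hypothesis covering all positive examples.

    `examples` is a list of (attributes, label) pairs. In the result,
    '?' means "any value is acceptable" for that attribute.
    """
    hypothesis = None  # stands for the most specific hypothesis (matches nothing)
    for attributes, label in examples:
        if label != positive_label:
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)  # adopt the first positive example as-is
        else:
            # Generalize every attribute that disagrees with this example
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attributes)]
    return hypothesis


# Hypothetical toy rows: (sky, temperature, humidity, wind) -> label
training = [
    (("Sunny", "Warm", "Normal", "Strong"), "Yes"),
    (("Sunny", "Warm", "High", "Strong"), "Yes"),
    (("Rainy", "Cold", "High", "Strong"), "No"),
    (("Sunny", "Warm", "High", "Weak"), "Yes"),
]
print(find_s(training))
```

Using `None` as the starting value keeps "not yet initialized" distinct from '?', so an attribute that has already been generalized is never specialized again by a later example.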
Python Code:
import matplotlib
matplotlib.use('TkAgg') # Use the TkAgg backend for stable display
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
# Step 2: Label each point as Class1 if x <= 0.5, otherwise Class2
y_labels = np.array(['Class1' if x <= 0.5 else 'Class2' for x in x_values.flatten()])
Output:
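To make the procedure concrete, here is a minimal NumPy sketch of the experiment: the first 50 random points are labelled by the 0.5 threshold, and the remaining 50 are classified by majority vote among their k nearest labelled neighbours. The helper `knn_predict` and the fixed random seed are assumptions for illustration; scikit-learn's `KNeighborsClassifier` could be used instead.

```python
import numpy as np


def knn_predict(x_train, y_train, x_query, k):
    """Classify each 1-D query point by majority vote among its k nearest training points."""
    preds = []
    for q in x_query:
        nearest = np.argsort(np.abs(x_train - q))[:k]
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        # On a tied vote, argmax picks the alphabetically first label
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)


rng = np.random.default_rng(0)
x = rng.random(100)                       # 100 values in [0, 1)
x_train, x_test = x[:50], x[50:]
y_train = np.where(x_train <= 0.5, "Class1", "Class2")
y_true = np.where(x_test <= 0.5, "Class1", "Class2")  # used only for scoring

for k in (1, 2, 3, 4, 5, 20, 30):
    accuracy = np.mean(knn_predict(x_train, y_train, x_test, k) == y_true)
    print(f"k={k:2d}  accuracy={accuracy:.2f}")
```

Large values of k pull in neighbours from across the 0.5 boundary, so points near the threshold are the ones most likely to be misclassified.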
Python Code:
import matplotlib
matplotlib.use('TkAgg') # Use TkAgg backend for interactive plotting
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
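# Solve the weighted least squares problem with np.linalg.lstsq.
# Rows are scaled by sqrt(w_i) so that the minimized loss is sum_i w_i * (y_i - x_i . theta)^2.
sqrt_w = np.sqrt(weights)
theta, _, _, _ = np.linalg.lstsq(X_train_b * sqrt_w[:, np.newaxis], y_train * sqrt_w, rcond=None)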
# Plot results
plt.scatter(X_test, y_test, color='blue', label='True values')
plt.scatter(X_test, y_pred, color='red', label='Predicted values')
plt.xlabel('Median Income')
plt.ylabel('Median House Value')
plt.title('Locally Weighted Regression (LWR)')
plt.legend()
plt.grid(True)
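Locally weighted regression fits a separate weighted straight line for every query point, with weights that decay with distance from that point. The sketch below is a minimal one-feature version on synthetic data, assuming a Gaussian kernel with bandwidth `tau`; the function name `lwr_predict` and the sine-curve data are illustrative, not taken from the original program.

```python
import numpy as np


def lwr_predict(x_train, y_train, x_query, tau=0.3):
    """Predict y at each query point with a locally weighted straight-line fit."""
    X = np.column_stack([np.ones_like(x_train), x_train])  # intercept + feature
    preds = []
    for xq in np.atleast_1d(x_query):
        # Gaussian kernel: nearby training points get weights close to 1
        w = np.exp(-((x_train - xq) ** 2) / (2 * tau ** 2))
        # Scaling rows by sqrt(w) makes lstsq minimize sum_i w_i * residual_i^2
        sw = np.sqrt(w)
        theta, *_ = np.linalg.lstsq(X * sw[:, None], y_train * sw, rcond=None)
        preds.append(theta[0] + theta[1] * xq)
    return np.array(preds)


# Synthetic non-linear data (illustrative only)
x = np.linspace(0, 2 * np.pi, 200)
y = np.sin(x)
y_hat = lwr_predict(x, y, np.array([np.pi / 2, np.pi]), tau=0.3)
print(y_hat)
```

A smaller `tau` follows the data more closely but is noisier; a very large `tau` approaches ordinary linear regression.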
Python Code:
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('TkAgg') # Use TkAgg backend
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_california_housing
# Load California Housing dataset for Linear Regression
# (scikit-learn removed its Boston Housing loader in version 1.2)
data = fetch_california_housing(as_frame=True)
X = data.data[['AveRooms']]
Output:
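The difference between the two models can be seen on a small synthetic example. The sketch below fits a straight line and a degree-2 polynomial to the same curved data and compares their mean squared errors; the data is generated here as a stand-in (an assumption), since the Auto MPG file is not part of scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic curved relationship with a little noise (illustrative only)
rng = np.random.default_rng(42)
X = np.linspace(-3, 3, 120).reshape(-1, 1)
y = 0.5 * X[:, 0] ** 2 - X[:, 0] + 2 + rng.normal(scale=0.2, size=120)

# Plain linear regression: y ~ a + b*x
linear = LinearRegression().fit(X, y)

# Polynomial regression: expand x into [1, x, x^2], then fit a linear model on that
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

mse_linear = mean_squared_error(y, linear.predict(X))
mse_poly = mean_squared_error(y, poly.predict(X))
print(f"linear MSE: {mse_linear:.3f}   degree-2 MSE: {mse_poly:.3f}")
```

Polynomial regression is still linear in its coefficients; only the input features are expanded, which is why `LinearRegression` can be reused inside the pipeline.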
Python Code:
# Import necessary libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
import numpy as np
# Classify a new sample (randomly selected from the test set for demonstration)
new_sample = X_test[0].reshape(1, -1) # Take the first sample from the test set
predicted_class = clf.predict(new_sample)
Output:
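The listing above omits the steps between the imports and the final prediction. A minimal sketch of the full flow, assuming an 80/20 stratified split and a depth limit of 4 (both choices are illustrative, not taken from the original), could be:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42, stratify=data.target
)

# A shallow tree is easier to interpret and less prone to overfitting
clf = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test))

# Classify one sample; reshape(1, -1) turns a single row into a 2-D array
new_sample = X_test[0].reshape(1, -1)
predicted = data.target_names[clf.predict(new_sample)[0]]

print(f"Test accuracy: {accuracy:.3f}")
print(f"Predicted class for the new sample: {predicted}")
```

In this dataset, target 0 is "malignant" and target 1 is "benign", which is why the prediction is mapped through `data.target_names`.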
Python Code:
import numpy as np
from scipy.io import loadmat
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load the olivettifaces.mat file (ensure it's in the same directory or update the path)
data = loadmat('olivettifaces.mat')
# Assuming labels are the index of faces (0-39 for 40 individuals, 10 images per individual)
y = np.repeat(np.arange(40), 10) # 40 classes (individuals), 10 images per class
# Make predictions
y_pred = model.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")
Output:
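With only ten images per person, the way the data is split matters: a plain random split can leave some individuals out of the test set. The sketch below shows the label construction and a stratified split feeding `GaussianNB`. It runs on synthetic vectors that only mimic the Olivetti layout (40 subjects, 10 samples each); the 64-value feature size and the random data are assumptions, so the accuracy printed here says nothing about real face images.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in with the Olivetti layout: 40 subjects x 10 samples each
rng = np.random.default_rng(0)
y = np.repeat(np.arange(40), 10)                    # labels 0..39, ten per subject
subject_means = rng.normal(scale=3.0, size=(40, 64))
X = subject_means[y] + rng.normal(size=(400, 64))   # noisy samples around each mean

# stratify=y keeps every subject represented in both the training and test splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

model = GaussianNB().fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Accuracy: {accuracy:.4f}")
```

To run this on the real data, `X` would be replaced by the flattened face images (one row per image) while `y` and the split stay the same.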
Python Code:
import matplotlib
matplotlib.use('TkAgg') # Use the TkAgg backend for interactive GUI rendering
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
# Load the breast cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target
# Standardize the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
Output:
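The listing above ends after standardization. A minimal sketch of the clustering step, assuming two clusters (matching the two diagnosis classes) and a PCA projection to two dimensions for plotting, might continue like this; the variable names and the agreement score are illustrative additions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_scaled = StandardScaler().fit_transform(data.data)

# Fit k-means with two clusters; n_init restarts guard against a poor initialization
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
clusters = kmeans.fit_predict(X_scaled)

# Cluster ids are arbitrary, so score both possible id-to-class matchings
match = np.mean(clusters == data.target)
agreement = max(match, 1 - match)

# 2-D coordinates for plotting, e.g. plt.scatter(coords[:, 0], coords[:, 1], c=clusters)
coords = PCA(n_components=2).fit_transform(X_scaled)
print(f"Agreement with diagnosis labels: {agreement:.3f}")
```

The diagnosis labels are used only to check the result afterwards; k-means itself never sees them.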