Lab Manual
AHMEDABAD
Laboratory Manual
Year: 2024-2025
INDEX
Sr. No. | Experiment | CO | Page No. (From / To) | Date | Marks | Signature
Experiment No: 1 Date: / /
OBJECTIVES:
THEORY:
EXERCISES:
1) Create arrays and perform numerical operations on them using NumPy.
2) Access array elements using NumPy.
3) Slice arrays using NumPy.
4) Draw different types of graphs using Matplotlib.
5) Write a program to draw the following graph using Matplotlib.
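The first three exercises can be sketched as follows. This is a minimal illustration rather than a model answer; the array contents and variable names are chosen only for the example.

```python
import numpy as np

# Creating arrays
a = np.array([10, 20, 30, 40, 50])      # from a Python list
zeros = np.zeros((2, 3))                # 2 x 3 array filled with 0.0
evens = np.arange(0, 10, 2)             # start at 0, stop before 10, step 2
grid = np.arange(12).reshape(3, 4)      # values 0..11 as 3 rows x 4 columns

# Numerical operations are applied element by element
doubled = a * 2
total = a.sum()

# Accessing elements
first = a[0]             # first element
last = a[-1]             # last element
cell = grid[1, 2]        # row 1, column 2

# Slicing
middle = a[1:4]          # elements at index 1, 2 and 3
column = grid[:, 0]      # first column of every row
block = grid[:2, 1:3]    # rows 0-1, columns 1-2

print(evens, doubled, total, first, last, cell)
print(middle, column, block, sep="\n")
```

Note that the stop value of a slice or of arange is excluded, so a[1:4] returns three elements, not four.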
QUIZ:
1. Which of the following functions can be used to create a NumPy array?
a) empty() b) array() c) ones() d) All the above
2. What is the output of the code below?
import numpy as np
a = np.arange(2, 8)
print(a)
a) array([2, 3, 4, 5, 6, 7]) b) array([3, 4, 5, 6, 7])
c) array([2, 3, 4, 5, 6, 7, 8]) d) array([3, 4, 5, 6, 7, 8])
EVALUATION:
Problem Analysis and Solution (3) | Understanding Level (3) | Timely Completion (2) | Mock (2) | Total (10)
Experiment No: 2 Date: / /
OBJECTIVES:
THEORY:
EXERCISES:
1) Load a CSV file into a Pandas DataFrame
2) Create a simple Pandas Series from a list
3) Create a simple Pandas DataFrame
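The three exercises can be sketched as follows. The names and marks are made-up sample data, and an in-memory string stands in for a CSV file so that the example runs without one; with a real file, its path would be passed to read_csv instead.

```python
import io

import pandas as pd

# A Series is a one-dimensional labelled array, built here from a plain list.
marks = pd.Series([72, 85, 91], name="marks")

# A DataFrame is a table; each dictionary key becomes a column.
df = pd.DataFrame({"name": ["Asha", "Ravi", "Meera"], "marks": [72, 85, 91]})

# read_csv accepts a file path or any file-like object.
csv_text = "name,marks\nAsha,72\nRavi,85\nMeera,91\n"
loaded = pd.read_csv(io.StringIO(csv_text))

print(marks)
print(df)
print(loaded)
```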
QUIZ:
1. Which of the following features is not provided by the Pandas module?
a) Merge and join the data sets b) Filter data using the condition
c) Plot and visualize the data d) None of the above
3. Given a dataset named ‘data’ containing 5 columns and 10 rows, what is the output
of the following code? print(len(data.columns))
a) 5 b) 10 c) 15 d) 50
5. Which of the following commands returns the data type of the values in each column
of the DataFrame?
a) print(df.dtype) b) print(dtypes(df))
c) print(df.dtypes) d) None of the above
Experiment No: 3 Date: / /
OBJECTIVES:
After completing this experiment students will be able to…
• Understand the concept of machine learning.
• Apply the Find-S algorithm to find the most specific hypothesis that is consistent with all
positive instances in the training data.
THEORY:
Refer to Unit 2 of the course curriculum.
• The Find-S algorithm is a simple concept learning algorithm that learns a hypothesis from
training data given as positive and negative instances of a target concept. It is presented by
Tom Mitchell in his textbook Machine Learning and is used in supervised learning from examples.
• Hypothesis Representation: A hypothesis in the Find-S algorithm is a conjunction of
attribute-value constraints. Each constraint states a condition that every positive instance of
the target concept must satisfy.
EXERCISES:
1. Initialize the hypothesis to the most specific hypothesis in the hypothesis space.
2. For each positive training instance, compare it with the hypothesis attribute by attribute.
Wherever the instance's value differs from the constraint in the hypothesis, replace that
constraint with a more general one ("?", meaning any value is accepted).
3. Ignore each negative training instance; Find-S does not change the hypothesis for negative
examples.
4. Return the final hypothesis.
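A minimal sketch of Find-S in Python is given below. In the standard formulation negative examples leave the hypothesis unchanged, and that is what this sketch does. The function name find_s, the use of None for the most specific hypothesis, and the toy weather data are assumptions made for the illustration.

```python
def find_s(examples):
    """Return the maximally specific hypothesis consistent with the positives.

    examples: list of (attribute_tuple, label) pairs; label is True for a
    positive example. "?" in the result means any value is accepted.
    """
    hypothesis = None  # stands for the most specific hypothesis (accepts nothing)
    for attributes, is_positive in examples:
        if not is_positive:
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)  # first positive: copy it exactly
            continue
        for i, value in enumerate(attributes):
            if hypothesis[i] != value:
                hypothesis[i] = "?"  # generalize the attribute that disagrees
    return hypothesis


# Hypothetical toy data: (sky, temperature, wind), label
training = [
    (("sunny", "warm", "strong"), True),
    (("sunny", "warm", "weak"), True),
    (("rainy", "cold", "strong"), False),
    (("sunny", "cold", "strong"), True),
]
print(find_s(training))  # ['sunny', '?', '?']
```

Only the attribute "sky" keeps a specific value, because it is the only attribute on which all three positive examples agree.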
QUIZ:
State True or False
1. The Find-S algorithm considers only positive training examples and ignores negative training
examples.
2. In the Find-S algorithm we move from bottom to top, i.e., from the general hypothesis to the specific hypothesis.
3. A maximally specific hypothesis covers none of the negative training examples.
Signature with date : _________________________
Experiment No: 4 Date: / /
OBJECTIVES:
After completing this experiment students will be able to…
• Understand data pre-processing and identify various types of data.
THEORY:
QUIZ:
1. What is the main advantage of using feature selection?
a) speeding-up the training of an algorithm
b) fine tuning the model’s performance
c) remove noisy features
3. Given 20 potential features, how many models must be evaluated by the all-subsets
algorithm?
a) 20 b) 40 c) 1048576 d) 1048596
Experiment No: 5 Date: / /
OBJECTIVES:
THEORY:
A decision tree is a popular machine learning algorithm used for both classification and regression
tasks. It's a predictive modeling tool that maps observations about an item to conclusions about its
target value, typically represented in a tree-like structure.
The learned tree should be tested on test instances with unknown class labels, and the predicted
class labels for the test instances should be printed as output. Predicted class labels (0/1) for the
test data must be exactly in the order in which the test instances are present in the test file.
Refer to Unit 3 of the course curriculum. Students are advised to read Chapter 3 of Machine Learning
by Dutt, Chandramouli and Das.
EXERCISES:
1. Predict class labels of test data.
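One possible way to learn a tree and print predicted labels in test order is sketched below. The manual does not fix a splitting criterion; this sketch assumes information gain (the ID3 criterion), categorical features, and a small made-up dataset. The function names and the dictionary layout of a node are choices made for the illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def build_tree(rows, labels, features):
    """Grow a tree by repeatedly splitting on the feature with the
    highest information gain."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority  # leaf: a plain class label

    def gain(f):
        remainder = 0.0
        for value in set(row[f] for row in rows):
            subset = [lab for row, lab in zip(rows, labels) if row[f] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder

    best = max(features, key=gain)
    node = {"feature": best, "default": majority, "children": {}}
    rest = [f for f in features if f != best]
    for value in set(row[best] for row in rows):
        keep = [i for i, row in enumerate(rows) if row[best] == value]
        node["children"][value] = build_tree(
            [rows[i] for i in keep], [labels[i] for i in keep], rest
        )
    return node


def predict(tree, row):
    """Follow branches until a leaf (a plain class label) is reached."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["feature"]], tree["default"])
    return tree


# Hypothetical toy data: two binary features; the label equals feature 0.
train_rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
train_labels = [0, 0, 1, 1]
tree = build_tree(train_rows, train_labels, features=[0, 1])

test_rows = [(1, 0), (0, 1), (1, 1)]
for row in test_rows:
    print(predict(tree, row))  # one label per line, in test-file order
```

A feature value that was never seen during training falls back to the majority class stored at that node.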
QUIZ:
1. What is a decision tree?
a) A visual representation of decision-making using nodes and branches
b) A mathematical formula for predicting outcomes
c) A statistical model for regression analysis
Experiment No: 6 Date: / /
TITLE: Implement the K-nearest neighbor algorithm for predicting class labels.
OBJECTIVES:
THEORY:
The k-nearest neighbors (k-NN) algorithm is a straightforward and intuitive machine learning
algorithm used for both classification and regression tasks. It's a type of instance-based learning,
where the algorithm memorizes the training dataset and makes predictions for new instances based
on their similarity to existing instances in the training data.
Training data: data.csv
1 1 1 1 1 1 0 1 1
1 1 1 1 1 1 0 0 1
1 1 1 1 1 1 1 1 0
1 1 1 1 1 0 0 1 1
1 1 1 1 1 0 0 0 1
1 1 1 0 1 1 0 1 1
1 1 0 1 1 1 0 1 0
1 1 1 0 1 1 0 0 1
1 1 1 0 1 0 0 1 1
1 1 1 0 1 0 0 0 1
0 1 1 1 1 1 0 1 1
0 1 1 1 1 1 0 0 1
1 0 1 1 1 1 0 1 0
0 1 1 1 1 0 0 1 1
1 1 0 1 0 1 0 1 0
1 0 0 1 1 1 0 1 0
1 0 0 1 0 1 1 1 0
0 1 1 1 1 0 0 0 1
1 0 1 1 1 1 1 1 0
0 1 1 0 1 1 0 1 1
Test Data: test.csv
0 1 1 1 1 1 1 1
1 0 0 0 0 0 0 0
0 1 1 0 1 0 0 0
0 1 1 1 1 0 0 0
Refer to Unit 4 of the course curriculum. Students are advised to read Chapter 7 of Machine Learning
by Dutt, Chandramouli and Das.
EXERCISES:
1. Load the training and test datasets.
2. Implement a function to calculate the Euclidean distance between two data points.
3. For each data point in the test dataset, find the K-nearest neighbors from the training dataset.
4. Determine the majority class among the K-nearest neighbors.
5. Assign the predicted class label to the test data point.
6. Evaluate the performance of the algorithm using metrics such as accuracy, precision, recall, and F1
score.
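The steps above can be sketched in plain Python as follows. The data here is a small made-up set of two-dimensional points, not the rows of data.csv, and the function names and the choice k=3 are assumptions for the illustration. Only accuracy is computed; precision, recall, and F1 score are left as part of the exercise.

```python
from collections import Counter
from math import sqrt


def euclidean_distance(p, q):
    """Straight-line distance between two equal-length feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_predict(train_x, train_y, point, k=3):
    """Return the majority label among the k training points nearest to point."""
    nearest = sorted(
        range(len(train_x)),
        key=lambda i: euclidean_distance(train_x[i], point),
    )[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: two well-separated groups of points.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]
test_x = [(1, 1), (6, 6)]
test_y = [0, 1]

predictions = [knn_predict(train_x, train_y, p, k=3) for p in test_x]
print(predictions, accuracy(test_y, predictions))  # [0, 1] 1.0
```

An odd value of k avoids tied votes when there are two classes.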
QUIZ:
1. What does KNN stand for?
a) K-Nearest Neighbors b) Kernel Nonlinear Network
c) K-Means Nearest Neighbors d) None of the above
2. In KNN, how is the distance between a new data point and its neighbors typically measured?
a) Euclidean distance b) Manhattan distance c) Cosine similarity d) All of the above
Experiment No: 7 Date: / /
OBJECTIVES:
After completing this experiment students will be able to…
• Understand data imported from known repositories.
THEORY:
https://pandas.pydata.org/docs/user_guide/index.html
Refer to Unit 4 of the course curriculum.
EXERCISES:
a. Find the number of rows and columns in the dataset.
b. Find basic information about the dataset using the describe method.
c. Retrieve the underlying data using the values attribute.
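The three exercises can be sketched as follows. The small DataFrame is a hypothetical stand-in for a dataset imported from a repository; its column names and numbers are made up for the example.

```python
import pandas as pd

# Hypothetical stand-in for an imported dataset.
df = pd.DataFrame(
    {"area": [1000, 1500, 2000, 2500], "price": [50.0, 72.5, 98.0, 121.0]}
)

rows, columns = df.shape     # (number of rows, number of columns)
summary = df.describe()      # count, mean, std, min, quartiles, max per column
raw = df.values              # the data as a NumPy array, without labels

print(rows, columns)
print(summary)
print(raw)
```

Note that shape and values are attributes and are written without parentheses, while describe is a method and needs them.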
QUIZ:
1. What is Pandas used for?
a) Data analysis and manipulation b) Web development
c) Machine learning d) Image processing
4. How do you select a subset of rows and columns from a Pandas DataFrame?
a) df.loc[row_index, column_index] b) df.iloc[row_index, column_index]
c) df[row_index, column_index] d) df.select(row_index, column_index)
Signature with Date : __________________________
Experiment No: 8 Date: / /
OBJECTIVES:
THEORY:
Linear Regression: Linear regression is a foundational and widely used statistical technique for
modeling the relationship between a dependent variable (often denoted as y) and one or more
independent variables (often denoted as x). It is a supervised learning algorithm used for predictive
modeling and for understanding the relationship between variables.
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \epsilon$$
where:
• $y$ is the dependent variable.
• $x_1, x_2, \dots, x_n$ are the independent variables.
• $\beta_0$ is the intercept (the value of $y$ when all independent variables are zero).
• $\beta_1, \beta_2, \dots, \beta_n$ are the coefficients (slopes) of the independent variables.
• $\epsilon$ is the error term, representing the difference between the observed and predicted values.
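For a single independent variable the equation above reduces to a straight line, and the intercept and slope can be estimated by ordinary least squares. The sketch below assumes that special case; the function name fit_line and the sample data are made up for the illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one independent variable.

    Returns (b0, b1), the intercept and slope that minimize the sum of
    squared differences between observed y and b0 + b1 * x.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical data generated from y = 1 + 2x with no noise.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
b0, b1 = fit_line(xs, ys)

# Differences between observed and predicted values.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(b0, b1, residuals)
```

Because the sample data lies exactly on a line, the fitted intercept is 1, the slope is 2, and every residual is zero; real data would leave non-zero residuals.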
https://scikit-learn.org/stable/
Refer to Unit 4 of the course curriculum.
EXERCISES:
a. Import home_data.csv from Kaggle using pandas.
b. Understand the data by running the head, info, and describe methods.
c. Plot the price of a house with respect to its area using the Matplotlib library.
d. Apply a linear regression model to predict the price of a house.
QUIZ:
1. What is linear regression used for?
a) Data visualization b) Clustering c) Predictive modeling d) Dimensionality reduction
Experiment No: 9 Date: / /
TITLE: Implement the K-means clustering algorithm for clustering a set of points.
OBJECTIVES:
THEORY:
K-means clustering is a popular unsupervised machine learning algorithm used for partitioning a
dataset into a predefined number of clusters. It aims to group similar data points together and
discover underlying patterns or structures in the data.
Refer to Unit 5 of the course curriculum.
EXERCISES:
1. Load the dataset containing the points to be clustered.
2. Initialize K cluster centroids randomly.
3. Repeat until convergence (i.e., cluster assignments do not change):
a. Assign each data point to the nearest cluster centroid.
b. Update each cluster centroid to be the mean of all data points assigned to it.
4. Assign each test data point to the nearest cluster centroid.
5. Evaluate the performance of the algorithm
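The steps above can be sketched in plain Python as follows. The points are made up, the initial centroids are drawn from the data points themselves (one common choice), and the function names and the seed parameter are assumptions for the illustration.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance; enough for comparing which centroid is nearer."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_centroid(point, centroids):
    """Index of the centroid closest to point."""
    return min(
        range(len(centroids)),
        key=lambda i: squared_distance(point, centroids[i]),
    )


def k_means(points, k, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)        # step 2: random initial centroids
    assignments = None
    while True:
        new_assignments = [nearest_centroid(p, centroids) for p in points]  # 3a
        if new_assignments == assignments:   # converged: no assignment changed
            return centroids, assignments
        assignments = new_assignments
        for i in range(k):                   # 3b: move centroid to cluster mean
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:                      # keep the old centroid if empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))


# Hypothetical toy data: two well-separated groups of three points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, assignments = k_means(points, k=2)
print(centroids, assignments)

# Step 4: a new point is assigned to its nearest centroid.
print(nearest_centroid((7.5, 8.5), centroids))
```

Cluster numbers are arbitrary: which group is called 0 and which is called 1 depends on the random initialization, so results should be compared by group membership rather than by index.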
QUIZ:
1. What is K-means clustering used for?
a) Dimensionality reduction b) Data cleaning c) Data clustering d) Model selection
4. How is the initial centroid for K-means clustering selected?
a) Randomly b) Based on the mean of the data points
c) Based on the median of the data points d) Based on the mode of the data points
Experiment No: 10 Date: / /
OBJECTIVES:
THEORY:
EXERCISES:
a. Find the number of rows and columns using the shape attribute.
b. Print the first 30 instances using the head method.
c. Find the number of data instances in each class (use groupby and size).
d. Plot the univariate graphs (box plots and histograms).
e. Plot the multivariate plot (scatter matrix).
f. Split the data so that 80% of the instances are used to train the model.
g. Apply K-NN and K-means clustering, compare their accuracy, and decide which performs better.
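The 80% split in step f can be sketched in plain Python as shown below. The helper name split_rows, the fixed seed, and the list of 50 integers standing in for data instances are assumptions for the illustration; scikit-learn's train_test_split function offers the same operation for real datasets.

```python
import random


def split_rows(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of rows and cut it into training and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(50))  # hypothetical stand-in for 50 data instances
train, test = split_rows(rows)
print(len(train), len(test))  # 40 10
```

Shuffling before cutting matters when the file is sorted by class; without it, the test part could contain instances of only one class.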
QUIZ:
1. Which algorithm is supervised and which one is unsupervised?
a) K-means clustering is supervised, KNN algorithm is unsupervised
b) K-means clustering is unsupervised, KNN algorithm is supervised
c) Both K-means clustering and KNN algorithm are supervised
d) Both K-means clustering and KNN algorithm are unsupervised