ML Practical Format

The document is a laboratory manual for the Fundamentals of Machine Learning course (4341603) designed for Diploma Engineering students at Dr S & S.S. Ghandhy College of Engineering & Technology. It outlines the vision and mission of the Directorate of Technical Education, the institute, and the department, along with the practical outcomes and course objectives aimed at developing industry-relevant skills in students. The manual includes guidelines for both faculty and students, detailing the practical experiments, assessment criteria, and safety precautions necessary for effective learning in machine learning applications.
Fundamentals of Machine Learning (4341603)

Diploma Engineering
Laboratory Manual
(Fundamentals of Machine Learning)
(4341603)

[IT sem-4]
Enrollment No 216120316003
Name Chauhan Krish Pravinbhai
Branch Information Technology(IT)
Academic Term 222
Institute Dr S & S.S. Ghandhy College of Engg & Tech.

Directorate Of Technical Education


Gandhinagar - Gujarat

DTE’s Vision:
● To provide globally competitive technical education;
● Remove geographical imbalances and inconsistencies;
● Develop student friendly resources with a special focus on girls’ education
and support to weaker sections;
● Develop programs relevant to industry and create a vibrant pool of technical
professionals.

Institute’s Vision:
“To be a unique center of excellence in technical education & innovation for
sustainable growth of industry and society.”

Institute’s Mission:

1. To impart globally viable technical core competencies and skills.


2. To respond effectively to the ever changing needs of industry and community at
large.
3. To promote conducive campus environment and resources for qualitative
education and innovation.
4. To inculcate moral, ethical and professional values amongst all internal
stakeholders.

Department’s Vision:
“To be a leading department in providing competent IT engineer for the benefit of
industry and society.”

Department’s Mission:
● M1: To prepare competent IT engineer by imparting qualitative technical education in
IT field with best infrastructure.
● M2: To enhance the student’s technical competency in IT field for solving real world
problems.
● M3: To nurture professional and ethical values in IT engineer to become a responsible
member of workforce and society.

Certificate

This is to certify that Mr./Ms ………………………………………………………………….


Enrolment No. ………….……………. of …….………. Semester of Diploma
in..................................................................................................................of
………………………………………………………………………. (GTU Code) has satisfactorily completed the
term work in course..................................................................................for the academic year:
…………………… Term: Odd/Even prescribed in the GTU curriculum.

Place:…………..

Date: …………………..

Signature of Course Faculty Head of the Department



Preface
The primary aim of any laboratory, practical or field work is to enhance the required skills and
creative ability of students so that they can solve real-time problems by developing relevant
competencies in the psychomotor domain. With this in view, GTU has designed the competency-focused,
outcome-based curriculum 2021 (COGC-2021) for Diploma engineering programmes, in which more
time is allotted to practical work than to theory. This reflects the importance of skill enhancement
and requires students, instructors and lecturers to use every second of the time allotted for practicals
to achieve relevant outcomes by performing experiments rather than treating them as study-type
writing exercises. This is essential for effective implementation of the competency-focused,
outcome-based Green curriculum 2021. Every practical has been carefully designed to serve as a tool
to develop and enhance relevant, industry-needed competencies in each and every student. These
psychomotor skills are very difficult to develop through the traditional chalk-and-board method of
content delivery in the classroom. Accordingly, this lab manual has been designed to focus on
industry-defined relevant outcomes rather than on the old practice of conducting practicals to prove
concepts and theory.

By using this lab manual, students can read the procedure one day before the actual
performance of a practical experiment, which generates interest and gives them an idea of
the expected magnitude of results before they begin. This in turn helps students achieve the
predetermined outcomes. Each experiment/practical in this manual begins with the competency,
industry-relevant skills, course outcomes and practical outcomes, which play a key role in
doing the practical. Students will also have a clear idea of the safety measures and necessary
precautions to be taken while performing the experiment.

This manual also provides guidelines to lecturers to facilitate student-centred lab activities
for each practical/experiment by arranging and managing the necessary resources so that
students follow the procedures with the required safety and necessary precautions to achieve the
outcomes. It also explains how students will be assessed by providing rubrics.

The Fundamentals of Machine Learning course will help students build core competencies
in understanding machine learning approaches, and students will be able to design and train machine
learning models for various use cases. The lab work of the course is designed to develop a crisp
understanding of the underpinning theory.

Although we have tried our level best in designing this lab manual, there is always scope for
improvement. We welcome any suggestions for improvement.


Programme Outcomes (POs):

1. Basic and Discipline specific knowledge: Apply knowledge of basic mathematics, science
and engineering fundamentals and engineering specialization to solve the engineering
problems.

2. Problem analysis: Identify and analyse well-defined engineering problems using codified
standard methods.

3. Design/development of solutions: Design solutions for well-defined technical
problems and assist with the design of systems components or processes to meet specified
needs.

4. Engineering Tools, Experimentation and Testing: Apply modern engineering tools and
appropriate technique to conduct standard tests and measurements.

5. Engineering practices for society, sustainability and environment: Apply appropriate
technology in context of society, sustainability, environment and ethical practices.

6. Project Management: Use engineering management principles individually, as a team
member or a leader to manage projects and effectively communicate about well-defined
engineering activities.

7. Life-long learning: Ability to analyze individual needs and engage in updating in the context
of technological changes in field of engineering.


Practical Outcome - Course Outcome matrix


Course Outcomes (COs):

a) CO1: Understand the need for machine learning in various kinds of problem solving.
b) CO2: Prepare machine learning models and learn the evaluation methods.
c) CO3: Evaluate various supervised learning algorithms using appropriate datasets.
d) CO4: Evaluate various unsupervised learning algorithms using appropriate datasets.
e) CO5: Understand the use of various existing machine learning libraries.
S. No. | Practical Outcome/Title of experiment | CO1 CO2 CO3 CO4 CO5

1. Numerical Computing with Python (NumPy, Matplotlib) ✔

2. Introduction to Pandas for data import and export (Excel, CSV etc.) ✔

3. Basic Introduction to Scikit learn ✔

4. Implement the Find-S concept learning algorithm that finds the most specific
hypothesis that is consistent with the given training data. ✔
Conditions:
Hypothesis can only be conjunction (AND) of literals. Literals are either
attributes or their negations.

5. Import Pima indian diabetes data ✔
Apply select K best and chi2 for feature selection
Identify the best features

6. Write a program to learn a decision tree and use it to predict class labels
of test data ✔
Training and test data will be explicitly provided by instructor.
Tree pruning should not be performed.

7. ML Project ✔
Use the following dataset as music.csv
a. Store file as music.csv and import it to python using pandas
b. Prepare the data by splitting data in input (age, gender) and
output (genre) data set
c. Use decision tree model from sklearn to predict the genre of various
age group people. (Ex A male of age 21 likes hiphop whereas female of
age 22 likes dance)
d. Calculate the accuracy of the model.
e. Vary training and test size to check different accuracy values model
achieves.

8. Write a program that uses K-nearest neighbor to predict class labels of
test data.
Training and test data must be provided explicitly.

9. Import vgsales.csv from kaggle platform. ✔
a. Find rows and columns in dataset
b. Find basic information regarding dataset using describe command.
c. Find values using values command.

10. Project on regression ✔
a. Import home_data.csv from kaggle using pandas
b. Understand data by running head, info and describe command
c. Plot the price of house with respect to area using matplotlib library
d. Apply linear regression model to predict the price of house

11. Write a program to cluster a set of points using K-means. ✔
Training and test data must be provided explicitly.

12. Import Iris dataset ✔
a. Find rows and columns using shape command
b. Print first 30 instances using head command
c. Find out the data instances in each class. (use group by and size)
e. Plot the univariate graphs (box plot and histograms)
f. Plot the multivariate plot (scatter matrix)
g. Split data to train model by 80% data values
h. Apply K-NN and k means clustering to check accuracy and decide which
is better.

Industry Relevant Skills

The following industry relevant skills are expected to be developed in the students by
performance of experiments of this course.

a) Students will learn to automate a variety of tasks, making systems more efficient and
cost effective.
b) Students will learn efficient handling of data, which supports better data analytics.
c) Students will learn to implement machine learning approaches in varied fields of
application, from healthcare to e-commerce.


Guidelines to Course Faculty


1. Course faculty should demonstrate the experiment with all necessary implementation
strategies described in the curriculum.
2. Course faculty should explain the industrial relevance before starting each experiment.
3. Course faculty should involve all students and give them the opportunity for hands-on experience.
4. Course faculty should ensure mentioned skills are developed in the students by asking.
5. Utilise the 2 hours of lab time effectively and ensure completion of the write-up along with the quiz.
6. Encourage peer-to-peer learning by having fast learners help others through the same experiment.

Instructions for Students

1. Organize the work in the group and make a record of all observations.
2. Students shall develop maintenance skills as expected by industries.
3. Students shall attempt to develop related hands-on skills and build confidence.
4. Students shall develop the habit of evolving more ideas, innovations, skills etc.
5. Students shall refer to technical magazines and data books.
6. Students should develop the habit of submitting the practical on date and time.
7. Students should be well prepared while submitting the write-up of the exercise.

Continuous Assessment Sheet

Enrolment No:
Name:
Term:

Sr no | Practical Outcome/Title of experiment | Page | Date | Marks (25) | Sign

1. Numerical Computing with Python (NumPy, Matplotlib)

2. Introduction to Pandas for data import and export (Excel, CSV etc.)

3. Basic Introduction to Scikit learn

4. Implement the Find-S concept learning algorithm that finds the most specific
hypothesis that is consistent with the given training data.
Conditions:
Hypothesis can only be conjunction (AND) of literals. Literals are either
attributes or their negations.

5. Import Pima indian diabetes data
Apply select K best and chi2 for feature selection
Identify the best features

6. Write a program to learn a decision tree and use it to predict class labels
of test data
Training and test data will be explicitly provided by instructor.
Tree pruning should not be performed.

7. ML Project
Use the following dataset as music.csv
a. Store file as music.csv and import it to python using pandas
b. Prepare the data by splitting data in input (age, gender) and
output (genre) data set
c. Use decision tree model from sklearn to predict the genre of various
age group people. (Ex A male of age 21 likes hiphop whereas female of
age 22 likes dance)
d. Calculate the accuracy of the model.
e. Vary training and test size to check different accuracy values model
achieves.

8. Write a program that uses K-nearest neighbor to predict class labels of
test data.
Training and test data must be provided explicitly.

9. Import vgsales.csv from kaggle platform.
a. Find rows and columns in dataset
b. Find basic information regarding dataset using describe command.
c. Find values using values command.

10. Project on regression
a. Import home_data.csv from kaggle using pandas
b. Understand data by running head, info and describe command
c. Plot the price of house with respect to area using matplotlib library
d. Apply linear regression model to predict the price of house

11. Write a program to cluster a set of points using K-means.
Training and test data must be provided explicitly.

12. Import Iris dataset
a. Find rows and columns using shape command
b. Print first 30 instances using head command
c. Find out the data instances in each class. (use group by and size)
i. Plot the univariate graphs (box plot and histograms)
j. Plot the multivariate plot (scatter matrix)
k. Split data to train model by 80% data values
l. Apply K-NN and k means clustering to check accuracy and decide which
is better.


Date: ……………
Practical No.1: Numerical Computing with Python (NumPy, Matplotlib)
A. Objective: To get familiar with Python libraries for visualization and
computation.
B. Expected Program Outcomes (POs): PO1, PO2, PO3, PO4, PO7.

C. Expected Skills to be developed based on competency:
I. To understand the use of well-known Python libraries.
II. To visualize data and implement logic based on data.
D. Expected Course Outcomes (COs)
CO-1
E. Practical Outcome (PRo)
Store and represent data using Python libraries.
F. Expected Affective domain Outcome (ADos)
Handle tools/components/equipment carefully with safety and necessary
precaution.
Practise software ethics, environmental sustainability and environmental
consciousness whenever suitable.

G. Prerequisite Theory:
Refer to Unit 1 of the course curriculum. Also explore the following links:
https://numpy.org/doc/stable/
https://matplotlib.org/stable/tutorials/index
H. Resources/Equipment Required
Sr. No. | Instrument/Equipment/Components/Trainer kit | Specification | Quantity
1 | System supporting Jupyter Notebook | Python 3.x | 1


I. Safety and necessary Precautions followed


● Read the experiment thoroughly before starting and ensure that you
understand all the steps and concepts involved from underpinning theory.
● Keep the workspace clean and organized, free from clutter and unnecessary
materials.
● Use the software according to its intended purpose and instructions.
● Ensure that all the necessary equipment and software are in good working
condition.
● Never eat or drink in the lab, as it can cause contamination and create safety
hazards.
● If any accidents or injuries occur, immediately notify the instructor and seek
medical attention if necessary.
J. Procedure to be followed/Source code (CE & IT software subjects):
Program:-
Numpy:-
import numpy as np
# Declaring Array
Array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print('1-D Array:-', Array)
Output:-
1-D Array:- [ 1 2 3 4 5 6 7 8 9 10]

# 1) Check Dimension of the Array


Array1 = np.array([1, 2, 3, 4, 5])
Array11 = np.array([[1, 2, 3, 4, 5]])
print('1-D Array:-', Array1.ndim)
print('2-D Array:-', Array11.ndim)
Output:-
1-D Array:- 1
2-D Array:- 2

# 2)Shape of an Array
Array2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print('2-D Array Shape:-', Array2.shape)
Output:-
2-D Array Shape:- (3, 3)
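
The relationship between nesting depth, ndim and shape in steps 1 and 2 can be seen side by side in a small sketch. The array names and values below are illustrative assumptions, not part of the original program.

```python
import numpy as np

# Illustrative arrays (hypothetical values): one, two and three levels of nesting
vector = np.array([1, 2, 3])
matrix = np.array([[1, 2, 3], [4, 5, 6]])
cube = np.array([[[1, 2, 3], [4, 5, 6]]])

# ndim counts the levels of nesting, shape gives the length along each level,
# and size is the total number of elements
for name, arr in [("vector", vector), ("matrix", matrix), ("cube", cube)]:
    print(name, "ndim:", arr.ndim, "shape:", arr.shape, "size:", arr.size)
```

Each extra pair of square brackets adds one dimension, so the length of the shape tuple always equals ndim.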

# 3)Arange Function in Numpy


A = np.arange(2, 11)
print("1-D Array using Arange:-", A)
Output:-
1-D Array using Arange:- [ 2 3 4 5 6 7 8 9 10]

# 4)Reshaping Function in Numpy
B = np.arange(3, 12)
# reshape returns a new 3x3 array; B itself is left unchanged
print('Reshaped 1-D Array:-\n', B.reshape(3, 3))
print('Original 1-D Array:-', B)
Output:-
Reshaped 1-D Array:-
 [[ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
Original 1-D Array:- [ 3  4  5  6  7  8  9 10 11]

# 5)Sin,Cos and Tan Function in Numpy
arr = np.array([1, 2, 3, 4, 5])
# Angles are given in degrees; multiplying by pi/180 converts them to radians
sin = np.sin(arr*np.pi/180)
print("SIN:-", sin)
cos = np.cos(arr*np.pi/180)
print("COS:-", cos)
tan = np.tan(arr*np.pi/180)
print("TAN:-", tan)
Output:-
SIN:- [0.01745241 0.0348995 0.05233596 0.06975647 0.08715574]
COS:- [0.9998477 0.99939083 0.99862953 0.99756405 0.9961947 ]
TAN:- [0.01745506 0.03492077 0.05240778 0.06992681 0.08748866]

# 6)Round Function in Numpy


arr = np.array([1.5269, 2.1278, 3.7419, 4.1893, 5.9562])
print("1-D Array:-",arr)
print("Rounding 1-D Array using around:-", np.around(arr, 2))
Output:-
1-D Array:- [1.5269 2.1278 3.7419 4.1893 5.9562]
Rounding 1-D Array using around:- [1.53 2.13 3.74 4.19 5.96]

# 7)Ceil Function in Numpy
arr = np.array([1.5269, 2.1278, 3.7419, 4.1893, 5.9562])
print("1-D Array:-", arr)
print("Ceiling 1-D Array:-", np.ceil(arr))
Output:-
1-D Array:- [1.5269 2.1278 3.7419 4.1893 5.9562]
Ceiling 1-D Array:- [2. 3. 4. 5. 6.]


# 8)Floor Function in Numpy
arr = np.array([1.5269, 2.1278, 3.7419, 4.1893, 5.9562])
print("1-D Array:-", arr)
print("Flooring 1-D Array:-", np.floor(arr))
Output:-
1-D Array:- [1.5269 2.1278 3.7419 4.1893 5.9562]
Flooring 1-D Array:- [1. 2. 3. 4. 5.]

# 9)Ones Function in Numpy


arr = np.ones(5)
print('Ones Array:-', arr)
Output:-
Ones Array:- [1. 1. 1. 1. 1.]

# 10)Zeros Function in Numpy


arr = np.zeros(5)
print('Zero Array:-', arr)
Output:-
Zero Array:- [0. 0. 0. 0. 0.]

# 11)Array_Split Function in Numpy
arr = np.array([1, 2, 3, 4, 5, 6])
print("1-D Array:-", arr)
# array_split divides the array into 3 sub-arrays of (nearly) equal length
split = np.array_split(arr, 3)
print("Splitting 1-D Array:-", split)
Output:-
1-D Array:- [1 2 3 4 5 6]
Splitting 1-D Array:- [array([1, 2]), array([3, 4]), array([5, 6])]
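
Matplotlib:-
The title of this practical also covers Matplotlib, while the listing above uses only NumPy. The following is a minimal sketch of a line plot built on the degree-to-radian conversion from step 5. The variable names, the angle range and the output file name sin_cos.png are assumptions made for illustration, not part of the original manual.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; this line can be dropped inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Angles from 0 to 360 degrees in steps of 10, converted to radians
degrees = np.arange(0, 361, 10)
radians = np.deg2rad(degrees)

# plot() draws a line chart by default; each call adds one line to the axes
fig, ax = plt.subplots()
ax.plot(degrees, np.sin(radians), label="sin")
ax.plot(degrees, np.cos(radians), label="cos")
ax.set_xlabel("Angle (degrees)")
ax.set_ylabel("Value")
ax.set_title("Sine and cosine")
ax.legend()

# Hypothetical file name; in a notebook the figure is displayed inline instead
fig.savefig("sin_cos.png")
```

np.deg2rad performs the same conversion as multiplying by np.pi/180, and other chart types such as ax.hist() or ax.bar() follow the same figure and axes pattern.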

K. Practical related Quiz.


1. Which of the following functions can be used to create a NumPy array?
a) empty()
b) array()
c) ones()
d) All the above
2. What is the output of the below code?
a) array([2, 3, 4, 5, 6, 7])
b) array([3, 4, 5, 6, 7])
c) array([2, 3, 4, 5, 6, 7, 8])
d) array([3, 4, 5, 6, 7, 8])
3. Find the output of the below code
a = np.array([[[1,2,3],[4,5,6]]])
a) 1
b) (1,3)
c) 3
d) (3,1)
4. By default, the plot() function draws which type of chart?
a) Bar chart
b) Line chart
c) Pie chart
d) Horizontal bar chart
5. Which of the following types of chart is not supported by pyplot?
a) Pie
b) Boxplot
c) Histogram
d) All of the above
6. Which function does pyplot provide to create a histogram?
a) hist()
b) histo()
c) histg()
d) histogram()
L. References / Suggestions ( lab manual designer should give)
Numpy
https://www.youtube.com/watch?v=Rbh1rieb3zc

Matplotlib
https://www.youtube.com/watch?v=yZTBMMdPOww

M. Assessment-Rubrics

Criteria | Total Marks | Exceptional (5 Marks) | Satisfactory (4 to 3 Marks) | Developing (2 Marks) | Limited (1 Mark)
Engagement | 5 | Performed practical him/herself | Performed practical with others' help | Watched other students performing practical but not tried him/herself | Present in practical session but not attentively participated in performance
Accuracy | 5 | Accurately done | 1-2 errors/mistakes found | 3-5 errors/mistakes identified | More than 5 errors/mistakes committed
Documentation | 5 | No errors, program is well executed and documented properly | Complete write-up and output tables but presentation is poor | Some of the commands missing with missing outputs | Poor write-up and diagram or missing content
Understanding & Explanation | 5 | Fully understood the performance & can explain perfectly | Understood the performance but cannot explain | Partially understood the performance & can give little explanation | Partially understood and cannot give explanation
Time | 5 | Completed the work within 1 week | Work submitted later than 1 week but by the end of 2nd week | Work done after 2nd week but before the end of 3rd week | Work submitted after 3 week time

Total 5 criteria and each having 0-5 marks, weightage taken from curriculum of that
course.
Max 25 marks

Sign with Date


Practical No.2: Introduction to Pandas for data import and export (Excel, CSV etc.)
A. Objective: To get familiar with Python machine learning libraries.
B. Expected Program Outcomes (POs): PO1, PO2, PO3, PO4, PO7.

C. Expected Skills to be developed based on competency:
I. To understand the use of well-known Python machine learning libraries.
II. Using machine learning methods in Python libraries.
D. Expected Course Outcomes (COs)
CO-5
E. Practical Outcome (PRo)
Using machine learning methods implemented in Pandas library.
F. Expected Affective domain Outcome (ADos)
Handle tools/components/equipment carefully with safety and necessary
precaution.
Practise software ethics, environmental sustainability and environmental
consciousness whenever suitable.

G. Prerequisite Theory:
Refer to Unit 6 of the course curriculum. Also explore the following link:
https://pandas.pydata.org/docs/user_guide/index.html

H. Resources/Equipment Required
Sr. No. | Instrument/Equipment/Components/Trainer kit | Specification | Quantity
1 | System supporting Jupyter Notebook | Python 3.x | 1


I. Safety and necessary Precautions followed


● Read the experiment thoroughly before starting and ensure that you
understand all the steps and concepts involved from underpinning theory.
● Keep the workspace clean and organized, free from clutter and unnecessary
materials.
● Use the software according to its intended purpose and instructions.
● Ensure that all the necessary equipment and software are in good working
condition.
● Never eat or drink in the lab, as it can cause contamination and create safety
hazards.
● If any accidents or injuries occur, immediately notify the instructor and seek
medical attention if necessary.
J. Procedure to be followed/Source code (CE & IT software subjects):
Program:-
import pandas as pd
student=["Krish","Harshil","Krinal"]
gender=["Male","Male","Female"]
marks=[50,50,50]
dic={"Student Name:-":student,"Gender:-":gender,"Marks:-":marks}
df=pd.DataFrame(dic)
df.to_csv('Panda_Practical.csv')
print(df)

Output:-

pd.read_csv('Panda_Practical.csv')

Panda_Practical.csv:-
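
A file written this way carries an extra leading column, because to_csv stores the DataFrame index by default and read_csv then loads it back as a column named "Unnamed: 0". The sketch below shows one way to avoid that with index=False. It writes to an in-memory buffer instead of a file, and the simplified column names are assumptions made for illustration.

```python
import io
import pandas as pd

# Same data as the program above, with simplified (assumed) column names
df = pd.DataFrame({
    "Student Name": ["Krish", "Harshil", "Krinal"],
    "Gender": ["Male", "Male", "Female"],
    "Marks": [50, 50, 50],
})

# Default export: the index is written as an unnamed first column
with_index = pd.read_csv(io.StringIO(df.to_csv()))
print(list(with_index.columns))

# Export without the index, then import again from the same buffer
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```

DataFrame.to_excel and pd.read_excel follow the same export and import pattern for Excel files, but they need an Excel engine such as openpyxl to be installed.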


K. Practical related Quiz.


1. Which of the following features is not provided by the Pandas module?
a) Merge and join the data sets
b) Filter data using the condition
c) Plot and visualize the data
d) None of the above
2. From which of the following file types can pandas read data?
a) JSON

b) Excel

c) HTML
d) All the above
3. Given a dataset named ‘data’ containing the 5 columns and 10 rows, find the
output of the below code?
print(len(data.columns))
a) 5
b) 10
c) 15
d) 50
4. What does the attribute shape return?
a) It returns the number of rows and columns respectively in the form of a
tuple
b) It returns the number of columns and rows respectively in the form of a
list
c) It returns the number of rows and columns respectively in the form of a
list
d) It returns the number of columns and rows respectively in the form of a
tuple
5. Which of the following commands returns the data type of the values in each
column in the data frame?
a) print(df.dtype)
b) print(dtypes(df))
c) print(df.dtypes)
d) None of the above
L. References / Suggestions ( lab manual designer should give)
https://www.youtube.com/watch?v=RhEjmHeDNoA
M. Assessment-Rubrics

Criteria | Total Marks | Exceptional (5 Marks) | Satisfactory (4 to 3 Marks) | Developing (2 Marks) | Limited (1 Mark)
Engagement | 5 | Performed practical him/herself | Performed practical with others' help | Watched other students performing practical but not tried him/herself | Present in practical session but not attentively participated in performance
Accuracy | 5 | Accurately done | 1-2 errors/mistakes found | 3-5 errors/mistakes identified | More than 5 errors/mistakes committed
Documentation | 5 | No errors, program is well executed and documented properly | Complete write-up and output tables but presentation is poor | Some of the commands missing with missing outputs | Poor write-up and diagram or missing content
Understanding & Explanation | 5 | Fully understood the performance & can explain perfectly | Understood the performance but cannot explain | Partially understood the performance & can give little explanation | Partially understood and cannot give explanation
Time | 5 | Completed the work within 1 week | Work submitted later than 1 week but by the end of 2nd week | Work done after 2nd week but before the end of 3rd week | Work submitted after 3 week time

Max 25 marks

Sign with Date


Practical No.3: Basic Introduction to Scikit learn


A. Objective: To get familiar with Python machine learning libraries.
B. Expected Program Outcomes (POs): PO1, PO2, PO3, PO4, PO7.

C. Expected Skills to be developed based on competency:
I. To understand the use of well-known Python machine learning libraries.
II. Using machine learning methods in Python libraries.
D. Expected Course Outcomes (COs)
CO-5
E. Practical Outcome (PRo)
Using machine learning methods implemented in Scikit library.
F. Expected Affective domain Outcome (ADos)
Handle tools/components/equipment carefully with safety and necessary
precaution.
Practise software ethics, environmental sustainability and environmental
consciousness whenever suitable.
G. Prerequisite Theory:
Refer to Unit 6 of the course curriculum. Also explore the following link:
https://scikit-learn.org/stable/user_guide.html
H. Resources/Equipment Required
Sr. No. | Instrument/Equipment/Components/Trainer kit | Specification | Quantity
1 | System supporting Jupyter Notebook | Python 3.x | 1

I. Safety and necessary Precautions followed


● Read the experiment thoroughly before starting and ensure that you
understand all the steps and concepts involved from underpinning theory.
● Keep the workspace clean and organized, free from clutter and unnecessary
materials.
● Use the software according to its intended purpose and instructions.
● Ensure that all the necessary equipment and software are in good working
condition.
● Never eat or drink in the lab, as it can cause contamination and create safety
hazards.
● If any accidents or injuries occur, immediately notify the instructor and seek
medical attention if necessary.
J. Procedure to be followed/Source code (CE & IT software subjects):

Introduction to Scikit Learn:-


Scikit-learn (Sklearn) is one of the most widely used libraries for machine learning in Python. It provides a selection of
efficient tools for machine learning and statistical modeling, including classification, regression, clustering and
dimensionality reduction, via a consistent interface in Python. This library, which is largely written in Python, is
built upon NumPy, SciPy and Matplotlib.

Features of Scikit Learn:-


Rather than focusing on loading, manipulating and summarising data, the Scikit-learn library is focused on
modeling the data. Some of the most popular groups of models provided by Sklearn are as follows −

Supervised Learning algorithms − Almost all the popular supervised learning algorithms, like Linear
Regression, Support Vector Machine (SVM), Decision Tree etc., are part of scikit-learn.

Unsupervised Learning algorithms − It also has many of the popular unsupervised learning
algorithms, from clustering, factor analysis and PCA (Principal Component Analysis) to unsupervised neural
networks.

Clustering − This model is used for grouping unlabeled data.

Cross Validation − It is used to check the accuracy of supervised models on unseen data.

Dimensionality Reduction − It is used for reducing the number of attributes in data, which can be further
used for summarisation, visualisation and feature selection.

Ensemble methods − As the name suggests, these are used for combining the predictions of multiple supervised
models.

Feature extraction − It is used to extract the features from data to define the attributes in image and text
data.

Feature selection − It is used to identify useful attributes to create supervised models.

Open Source − It is an open source library and is also commercially usable under the BSD license.
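
As an illustration of the consistent interface and the cross validation feature described above, the following is a minimal sketch that scores a decision tree on the Iris data with 5-fold cross validation. The choice of classifier, the number of folds and the random_state value are assumptions made for this example, not requirements of the practical.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Load the features (X) and class labels (y) of the Iris dataset
X, y = load_iris(return_X_y=True)

# Assumed model for illustration; any scikit-learn classifier fits the same call
model = DecisionTreeClassifier(random_state=0)

# Each of the 5 folds is held out once while the model trains on the other 4
scores = cross_val_score(model, X, y, cv=5)
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

Because every estimator exposes the same fit and predict methods, swapping DecisionTreeClassifier for another classifier requires changing only the line that creates the model.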

Program-1:-
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
feature_names = iris.feature_names
target_names = iris.target_names
print("Feature names:", feature_names)
print("Target names:", target_names)
print("\nFirst 10 rows of X:\n", X[:10])

Output:-

Program-2 :-
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
print("X:-",len(X))
print("\n")
print("y:-",len(y))
print("\n")

from sklearn.model_selection import train_test_split


X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size = 0.31, random_state = 1
)
print("X Training:-",X_train.shape)
print("X Testing:-",X_test.shape)
print("Y Training:-",y_train.shape)
print("Y Testing:-",y_test.shape)

Output:-
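
The split sizes printed by Program-2 follow from simple arithmetic: the Iris data has 150 samples, and with test_size = 0.31 scikit-learn rounds the test count up, ceil(0.31 × 150) = 47, which leaves 103 samples for training. The sketch below repeats the split compactly to confirm this. It uses load_iris(return_X_y=True) as a shortcut, which is an assumption beyond the original listing.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same parameters as Program-2: 31% of the rows are held out for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.31, random_state=1
)

print("Train:", X_train.shape, y_train.shape)  # (103, 4) (103,)
print("Test:", X_test.shape, y_test.shape)     # (47, 4) (47,)
```

random_state fixes the shuffle, so the same rows land in the training and test sets on every run.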

K. Practical related Quiz.


Choose the correct option(s)
1. Why do we need two sets: a train set and a test set?
a) To train the model faster
b) To validate the model on unseen data
c) To improve the accuracy of the model
2. Cross-validation allows us to:
a) train the model faster
b) measure the generalization performance of the model
c) reach better generalization performance
d) estimate the variability of the generalization score
3. How is a tabular dataset organized?
a) a column represents a sample and a row represents a feature
b) a column represents a feature and a row represents a sample
c) the target variable is represented by a row
d) the target variable is represented by a column
4. A categorical variable is:
a) a variable with only two different possible values
b) a variable with continuous numerical values
c) a variable with a finite set of possible values


L. References / Suggestions ( lab manual designer should give)


https://www.youtube.com/watch?v=OobqWEUrVKw

M. Assessment-Rubrics

Criteria | Total Marks | Exceptional (5 Marks) | Satisfactory (4 to 3 Marks) | Developing (2 Marks) | Limited (1 Mark)
Engagement | 5 | Performed the practical him/herself | Performed the practical with others' help | Watched other students performing the practical but did not try him/herself | Present in the practical session but did not participate attentively in the performance
Accuracy | 5 | Accurately done | 1-2 errors/mistakes found | 3-5 errors/mistakes identified | More than 5 errors/mistakes committed
Documentation | 5 | No errors; program is well executed and documented properly | Complete write-up and output tables, but presentation is poor | Some of the commands missing, with missing outputs | Poor write-up, and diagram or content missing
Understanding & Explanation | 5 | Fully understood the performance and can explain perfectly | Understood the performance but cannot explain | Partially understood the performance and can give little explanation | Partially understood and cannot give explanation
Time | 5 | Completed the work within 1 week | Work is submitted later than 1 week but by the end of the 2nd week | Work done after the 2nd week but before the end of the 3rd week | Work submitted after 3 weeks' time

Max 25 marks

Sign with Date


Practical No.4: Implement the Find-S concept learning algorithm that finds the most specific
hypothesis that is consistent with the given training data.

Conditions:

Hypothesis can only be conjunction (AND) of literals. Literals are either attributes or their negations.

A. Objective: To understand the concept of learning


B. Expected Program Outcomes (POs):-PO1, PO2, PO3, PO6, PO7.

C. Expected Skills to be developed based on competency:


I. Organising search for an acceptable hypothesis.
II. Implementing a procedure.
III. Testing use cases over implementing procedures.
D. Expected Course Outcomes(Cos)
CO-2
E. Practical Outcome(PRo)
Finding the most specific hypothesis consistent with the given training data using the Find-S
algorithm.
F. Expected Affective domain Outcome(ADos)
Handle tools /components/equipment carefully with safety and necessary
precaution.
In software ethics Environment sustainability and environment consciousness
whenever suitable.
G. Prerequisite Theory:
Refer Unit 2 of course curriculum. Students are suggested to read chapter 2 of
Machine Learning authored by Tom M. Mitchell.
Dataset
1 1 1 1 1 1 0 1 1
1 1 1 1 1 1 0 0 1
1 1 1 1 1 1 1 1 0
1 1 1 1 1 0 0 1 1
1 1 1 1 1 0 0 0 1
1 1 1 0 1 1 0 1 1
1 1 0 1 1 1 0 1 0

1 1 1 0 1 1 0 0 1
1 1 1 0 1 0 0 1 1
1 1 1 0 1 0 0 0 1
0 1 1 1 1 1 0 1 1
0 1 1 1 1 1 0 0 1
1 0 1 1 1 1 0 1 0
0 1 1 1 1 0 0 1 1
1 1 0 1 0 1 0 1 0
1 0 0 1 1 1 0 1 0
1 0 0 1 0 1 1 1 0
0 1 1 1 1 0 0 0 1
1 0 1 1 1 1 1 1 0
0 1 1 0 1 1 0 1 1

H. Resources/Equipment Required
Sr.No. | Instrument/Equipment/Components/Trainer kit | Specification | Quantity
1 | System supporting Jupyter Notebook | Python 3.x | 1

I. Safety and necessary Precautions followed


• Read the experiment thoroughly before starting and ensure that you
understand all the steps and concepts involved from underpinning theory.
• Keep the workspace clean and organized, free from clutter and unnecessary
materials.
• Use the software according to its intended purpose and instructions.
• Ensure that all the necessary equipment and software are in good working
condition.
• Never eat or drink in the lab, as it can cause contamination and create safety
hazards.
• If any accidents or injuries occur, immediately notify the instructor and seek
medical attention if necessary.


J. Procedure to be followed/Source code (CE & IT software subjects):


Program:-
import pandas as pd
import numpy as np

# Build a small training set: six attributes plus the target column "Goes"
time = ["Morning", "Evening", "Morning", "Evening"]
weather = ["Sunny", "Rainy", "Sunny", "Sunny"]
temperature = ["Warm", "Cold", "Moderate", "Cold"]
company = ["Yes", "No", "Yes", "Yes"]
humidity = ["Mild", "Mild", "Normal", "High"]
wind = ["Strong", "Normal", "Strong", "Strong"]
goes = ["Yes", "No", "Yes", "Yes"]
dic = {"Time": time, "Weather": weather, "Temperature": temperature, "Company": company,
       "Humidity": humidity, "Wind": wind, "Goes": goes}
df = pd.DataFrame(dic)
# index=False keeps the row index out of the CSV, so it is not read back as an extra attribute
df.to_csv("Practical-4_Dataset.csv", index=False)
data = pd.read_csv("Practical-4_Dataset.csv")
print(data, "\n")

# Array of all the attribute columns (everything except the last column)
d = np.array(data)[:, :-1]
print("The attributes are:\n", d)

# Target column holding the positive ("Yes") and negative ("No") labels
target = np.array(data)[:, -1]
print("The target is:\n", target)

# Training function implementing the Find-S algorithm
def train(c, t):
    # Start from the first positive example as the most specific hypothesis
    for i, val in enumerate(t):
        if val == 'Yes':
            specific_hypothesis = c[i].copy()
            break
    # Generalise every attribute that disagrees with a positive example
    for i, val in enumerate(c):
        if t[i] == 'Yes':
            for x in range(len(specific_hypothesis)):
                if val[x] != specific_hypothesis[x]:
                    specific_hypothesis[x] = '?'
    return specific_hypothesis

# Obtain and print the final hypothesis
print("The final hypothesis is:", train(d, target))


Output:-

Practical-4_Dataset.csv

K. Observations and Calculations/Input-Output (CE & IT software subjects):


Observation Table: Prepare a chart of input use cases: Boolean input attributes (x1,
x2, … , x8) in the first 8 columns. The last (9th) column represents the Boolean class label (y).
Each row is a training instance. There are 20 training instances, as listed in the
prerequisite theory.
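
The observation table refers to the 20 Boolean training instances from the prerequisite theory, so a minimal sketch of Find-S on exactly those rows may help when filling it in. It assumes the usual reading of a conjunctive hypothesis: 1 means the literal xi, 0 means NOT xi, and ? means the attribute is unconstrained. The function name `find_s` is illustrative.

```python
# Training rows copied from the prerequisite-theory dataset: x1..x8 followed by label y.
DATA = """\
1 1 1 1 1 1 0 1 1
1 1 1 1 1 1 0 0 1
1 1 1 1 1 1 1 1 0
1 1 1 1 1 0 0 1 1
1 1 1 1 1 0 0 0 1
1 1 1 0 1 1 0 1 1
1 1 0 1 1 1 0 1 0
1 1 1 0 1 1 0 0 1
1 1 1 0 1 0 0 1 1
1 1 1 0 1 0 0 0 1
0 1 1 1 1 1 0 1 1
0 1 1 1 1 1 0 0 1
1 0 1 1 1 1 0 1 0
0 1 1 1 1 0 0 1 1
1 1 0 1 0 1 0 1 0
1 0 0 1 1 1 0 1 0
1 0 0 1 0 1 1 1 0
0 1 1 1 1 0 0 0 1
1 0 1 1 1 1 1 1 0
0 1 1 0 1 1 0 1 1"""

def find_s(examples):
    """Return the most specific conjunctive hypothesis covering all positive examples."""
    hypothesis = None  # None stands for the initial "match nothing" hypothesis
    for *attributes, label in examples:
        if label != 1:
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)
        else:
            # Keep a value only where it agrees with the new positive example
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attributes)]
    return hypothesis

examples = [[int(v) for v in line.split()] for line in DATA.splitlines()]
hypothesis = find_s(examples)
print(hypothesis)

# Read the hypothesis as a conjunction of literals
literals = [f"x{i}" if h == 1 else f"NOT x{i}"
            for i, h in enumerate(hypothesis, start=1) if h != "?"]
conjunction = " AND ".join(literals)
print(conjunction)
```

On this data the result is x2 AND x3 AND x5 AND NOT x7, which also rejects all seven negative rows.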

L. Practical related Quiz.


State True or False
1. The Find-S algorithm only considers positive training examples and neglects
negative training examples.
2. In Find-S algorithm we move bottom to top i.e. general hypothesis to specific
hypothesis.
3. A maximally specific hypothesis covers none of the negative training examples.

M. References / Suggestions ( lab manual designer should give)


https://www.youtube.com/watch?v=FgqtsPkeklg

N. Assessment-Rubrics

Criteria | Total Marks | Exceptional (5 Marks) | Satisfactory (4 to 3 Marks) | Developing (2 Marks) | Limited (1 Mark)
Engagement | 5 | Performed the practical him/herself | Performed the practical with others' help | Watched other students performing the practical but did not try him/herself | Present in the practical session but did not participate attentively in the performance
Accuracy | 5 | Accurately done | 1-2 errors/mistakes found | 3-5 errors/mistakes identified | More than 5 errors/mistakes committed
Documentation | 5 | No errors; program is well executed and documented properly | Complete write-up and output tables, but presentation is poor | Some of the commands missing, with missing outputs | Poor write-up, and diagram or content missing
Understanding & Explanation | 5 | Fully understood the performance and can explain perfectly | Understood the performance but cannot explain | Partially understood the performance and can give little explanation | Partially understood and cannot give explanation
Time | 5 | Completed the work within 1 week | Work is submitted later than 1 week but by the end of the 2nd week | Work done after the 2nd week but before the end of the 3rd week | Work submitted after 3 weeks' time

Max 25 marks

Sign with Date


Practical No.5: Import Pima Indian diabetes data. Apply select K best and chi2 for feature
selection. Identify the best features.

A. Objective: The primary objective of the practical is to understand data pre-processing
along with identifying various types of data.
B. Expected Program Outcomes (POs):-PO1, PO2 , PO3,PO6,PO7.

C. Expected Skills to be developed based on competency:


I. Importing existing datasets from data repositories.
II. Applying feature selection algorithms on imported data
III. Selecting features based on the evaluation parameters
D. Expected Course Outcomes(Cos)
CO-2
E. Practical Outcome(PRo)
Identifying the best features using K best and chi2 algorithms on Pima Indian
diabetes data.
F. Expected Affective domain Outcome(ADos)
Handle tools /components/equipment carefully with safety and necessary
precaution.
In software ethics Environment sustainability and environment consciousness
whenever suitable.
G. Prerequisite Theory:
Refer Unit 2 of course curriculum. Students are suggested to read chapter 2 of
Machine Learning authored by Dutt, Chandramouli and Das
H. Resources/Equipment Required
Sr.No. | Instrument/Equipment/Components/Trainer kit | Specification | Quantity
1 | System supporting Jupyter Notebook | Python 3.x | 1


I. Safety and necessary Precautions followed


Read the experiment thoroughly before starting and ensure that you understand all
the steps and concepts involved from underpinning theory.
Keep the workspace clean and organized, free from clutter and unnecessary
materials.
Use the software according to its intended purpose and instructions.
Ensure that all the necessary equipment and software are in good working
condition.
Never eat or drink in the lab, as it can cause contamination and create safety
hazards.
If any accidents or injuries occur, immediately notify the instructor and seek
medical attention if necessary.
J. Procedure to be followed/Source code (CE & IT software subjects):

Program:-
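import pandas as pd
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

# Load the diabetes data: 8 feature columns followed by the class label column
read = pd.read_csv("diabetes.csv")
dataframe = read.values
x = dataframe[:, 0:8]   # features
y = dataframe[:, 8]     # class label

# Score every feature and keep the 4 best. f_classif (ANOVA F-value) is the scorer used here;
# pass score_func=chi2 (also in sklearn.feature_selection) for the chi-squared test named in the title.
test = SelectKBest(score_func=f_classif, k=4)
fit = test.fit(x, y)
np.set_printoptions(precision=3)
print(fit.scores_)

# Keep only the selected columns and show the first five rows
features = fit.transform(x)
print(features[0:5, :])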


Output:-

K. Observations and Calculations/Input-Output (CE & IT software subjects):


Observation Table: Prepare an accuracy table by varying training and test data.

L. Practical related Quiz.


1. What is the main advantage of using feature selection?
a) speeding-up the training of an algorithm
b) fine tuning the model’s performance
c) remove noisy features

2. When selecting features, the decision should be made using:


a) the entire dataset
b) the training set
c) the testing set
3. Given 20 potential features, how many models do you have to evaluate in the all-subsets
algorithm?
a) 20
b) 40
c) 1048576
d) 1048596

4. The best-fit model of size 5 (i.e., with 5 features) always contains the set of
features from the best-fit model of size 4.
a) True
b) False
M. References / Suggestions ( lab manual designer should give)
Diabetes Prediction using Machine Learning from Kaggle-
https://www.youtube.com/watch?v=HTN6rccMu1k
https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database
N. Assessment-Rubrics


Criteria | Total Marks | Exceptional (5 Marks) | Satisfactory (4 to 3 Marks) | Developing (2 Marks) | Limited (1 Mark)
Engagement | 5 | Performed the practical him/herself | Performed the practical with others' help | Watched other students performing the practical but did not try him/herself | Present in the practical session but did not participate attentively in the performance
Accuracy | 5 | Accurately done | 1-2 errors/mistakes found | 3-5 errors/mistakes identified | More than 5 errors/mistakes committed
Documentation | 5 | No errors; program is well executed and documented properly | Complete write-up and output tables, but presentation is poor | Some of the commands missing, with missing outputs | Poor write-up, and diagram or content missing
Understanding & Explanation | 5 | Fully understood the performance and can explain perfectly | Understood the performance but cannot explain | Partially understood the performance and can give little explanation | Partially understood and cannot give explanation
Time | 5 | Completed the work within 1 week | Work is submitted later than 1 week but by the end of the 2nd week | Work done after the 2nd week but before the end of the 3rd week | Work submitted after 3 weeks' time

Max 25 marks

Sign with Date


Q-1:-Define select k-best algorithm.


Ans: The SelectKBest algorithm is a feature selection method that selects features according to the k highest
scores. It is a part of the scikit-learn library and can be used for both classification and regression data.

The score_func parameter determines the function used to score the features. The default function is f_classif,
which only works with classification tasks. You can change this parameter to use other scoring functions.
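
The "score every feature, keep the k highest" idea can be sketched without scikit-learn. The example below uses a chi-squared score, since that is the test named in the practical title. It assumes non-negative, count-like features and compares each class's observed feature total with the total expected from the class frequencies. The function names and the toy data are hypothetical, and this is an illustration of the idea rather than scikit-learn's implementation.

```python
def chi2_score(feature, labels):
    """Chi-squared statistic between one non-negative feature and the class labels."""
    total = sum(feature)
    n = len(labels)
    score = 0.0
    for cls in set(labels):
        observed = sum(x for x, y in zip(feature, labels) if y == cls)
        expected = total * labels.count(cls) / n
        score += (observed - expected) ** 2 / expected
    return score

def select_k_best(X, y, k):
    """Return the sorted column indices of the k highest-scoring features, plus all scores."""
    columns = list(zip(*X))
    scores = [chi2_score(col, y) for col in columns]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores

# Hypothetical toy data: column 0 tracks the class, column 1 is constant, column 2 is nearly balanced.
X = [[9, 1, 3], [8, 1, 5], [1, 1, 3], [0, 1, 4]]
y = [1, 1, 0, 0]
selected, scores = select_k_best(X, y, k=2)
print("scores:", [round(s, 3) for s in scores])
print("selected columns:", selected)
```

A constant column scores zero because its class totals match the expected totals exactly, so it is the first to be dropped.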

Q-2:-Define the use of set_printoptions in Python.


Ans: numpy.set_printoptions is a function in the NumPy library that sets printing options for NumPy objects. These
options determine the way floating point numbers, arrays and other NumPy objects are displayed.

You can use this function to set various printing options such as precision, threshold, edgeitems, linewidth,
suppress, nanstr, infstr, formatter, sign, and floatmode.

Q-3:-Define the F score method of scikit-learn.


Ans: The f1_score function is part of the sklearn.metrics package in the scikit-learn library. It calculates the F1
score for a set of predicted labels.

The F1 score is the harmonic mean of precision and recall, where an F1 score reaches its best value at 1 and worst
score at 0. The formula for the F1 score is: F1 = 2 * (precision * recall) / (precision + recall).

To use the f1_score function, you can import it into your program like this: from sklearn.metrics import f1_score.
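
A short worked example of the formula above, computed by hand from confusion counts. The counts are made up for illustration and the helper name `f1_from_counts` is hypothetical.

```python
def f1_from_counts(tp, fp, fn):
    """Return (precision, recall, F1) from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives
precision, recall, f1 = f1_from_counts(tp=8, fp=2, fn=4)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Here precision is 8/10, recall is 8/12, and F1 works out to 16/22, about 0.727, which lies between the two but closer to the lower value, as expected of a harmonic mean.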


Practical No.6: Write a program to learn a decision tree and use it to predict class labels of test
data.

• Training and test data will be explicitly provided by instructor.
• Tree pruning should not be performed.

A. Objective: Learning Decision tree and predicting class labels.


B. Expected Program Outcomes (POs):-PO1, PO2 , PO3,PO6,PO7.

C. Expected Skills to be developed based on competency:


I. Importing existing datasets from data repositories.
II. Doing Prediction of class labels
III. Splitting of attribute based on criteria
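
Skill III concerns choosing which attribute to split on. One common criterion is information gain, the reduction in label entropy after a split. The sketch below computes it in plain Python on a hypothetical toy dataset. Note that scikit-learn's DecisionTreeClassifier uses Gini impurity by default, so this illustrates the idea and does not reproduce the library's default behaviour.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one attribute (a column index)."""
    n = len(labels)
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical toy data: two Boolean attributes; the label simply copies attribute 0.
rows = [(1, 0), (1, 1), (0, 0), (0, 1)]
labels = [1, 1, 0, 0]
gains = [information_gain(rows, labels, a) for a in range(2)]
best = max(range(2), key=lambda a: gains[a])
print("gains:", gains, "best attribute:", best)
```

Attribute 0 separates the classes perfectly (gain of 1 bit) while attribute 1 tells us nothing (gain of 0), so a tree learner would split on attribute 0 first.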
D. Expected Course Outcomes(Cos)
CO -3
E. Practical Outcome(PRo)
Identifying the class labels using decision tree.
F. Expected Affective domain Outcome(ADos)
Handle tools /components/equipment carefully with safety and necessary
precaution.
In software ethics Environment sustainability and environment consciousness
whenever suitable.
G. Prerequisite Theory:
The learned tree should be tested on test instances with unknown class labels, and the
predicted class labels for the test instances should be printed as output. Predicted class
labels (0/1) for the test data must be exactly in the order in which the test instances are
present in the test file.

Refer Unit 3 of course curriculum. Students are suggested to read chapter 3 of
Machine Learning authored by Dutt, Chandramouli and Das
H. Resources/Equipment Required
Sr.No. | Instrument/Equipment/Components/Trainer kit | Specification | Quantity
1 | System supporting Jupyter Notebook | Python 3.x | 1

I. Safety and necessary Precautions followed


• Read the experiment thoroughly before starting and ensure that you
understand all the steps and concepts involved from underpinning theory.
• Keep the workspace clean and organized, free from clutter and unnecessary
materials.
• Use the software according to its intended purpose and instructions.
• Ensure that all the necessary equipment and software are in good working
condition.
• Never eat or drink in the lab, as it can cause contamination and create safety
hazards.
• If any accidents or injuries occur, immediately notify the instructor and seek
medical attention if necessary.
J. Procedure to be followed/Source code (CE & IT software subjects):
80:20 Ratio:-
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
iris=load_iris()
x=iris.data
y=iris.target
#print("X:-\n",x)
#print("Y:-\n",y)
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=42)
print("X Training:-\n",x_train.shape)
print("X Testing:-\n",x_test.shape)
print("Y Training:-\n",y_train.shape)
print("Y Testing:-\n",y_test.shape)
print("\n")
cls=DecisionTreeClassifier()
cls.fit(x_train,y_train)
y_pred=cls.predict(x_test)
accuracy=accuracy_score(y_test,y_pred)
print("Accuracy:-",accuracy)
Output:-

70:30 Ratio:-
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
iris=load_iris()
x=iris.data
y=iris.target
#print("X:-\n",x)
#print("Y:-\n",y)
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=42)
print("X Training:-\n",x_train.shape)
print("X Testing:-\n",x_test.shape)
print("Y Training:-\n",y_train.shape)
print("Y Testing:-\n",y_test.shape)

print("\n")
cls=DecisionTreeClassifier()
cls.fit(x_train,y_train)
y_pred=cls.predict(x_test)
accuracy=accuracy_score(y_test,y_pred)
print("Accuracy:-",accuracy)
Output:-


K. Observations and Calculations/Input-Output (CE & IT software subjects):


Observation Table: Prepare a table of predicted class labels.

Calculations:
Calculate confusion matrix
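
A minimal sketch of how the confusion matrix can be tallied by hand, assuming rows index the true class and columns index the predicted class. The labels below are made up for illustration; in the practical, y_test and y_pred from the program would take their place.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count predictions: matrix[i][j] = samples of true class i predicted as class j."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

# Hypothetical true and predicted labels for three classes
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
for row in cm:
    print(row)

# Accuracy is the diagonal (correct predictions) divided by the total
accuracy = sum(cm[i][i] for i in range(len(cm))) / len(y_true)
print("Accuracy:", accuracy)
```

The diagonal holds the correct predictions, so the accuracy reported by the program equals the diagonal sum divided by the number of test rows.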


L. Practical related Quiz.


1. What is a decision tree?
a) A visual representation of decision-making using nodes and branches
b) A mathematical formula for predicting outcomes
c) A statistical model for regression analysis
2. What is the purpose of a decision tree?
a) To predict outcomes based on input variables
b) To summarize data using a graphical representation
c) To perform hypothesis testing on a dataset
3. What is a split in a decision tree?
a) A branch that represents a decision based on a feature or attribute
b) A point where the tree branches into different paths
c) A method for reducing the complexity of a decision tree
4. What is pruning in a decision tree?
a) A technique for simplifying the tree by removing branches that don't
contribute to accuracy
b) A method for reducing the number of input variables
c) A way to increase the complexity of a decision tree
5. What is overfitting in a decision tree?
a) When the tree is too simple and doesn't capture all the relevant
information
b) When the tree is too complex and fits the training data too closely
c) When the input variables are not correlated with the outcome variable

M. References / Suggestions ( lab manual designer should give)


https://cse.iitkgp.ac.in/~pabitra/course/ml/ml.html


N. Assessment-Rubrics

Criteria | Total Marks | Exceptional (5 Marks) | Satisfactory (4 to 3 Marks) | Developing (2 Marks) | Limited (1 Mark)
Engagement | 5 | Performed the practical him/herself | Performed the practical with others' help | Watched other students performing the practical but did not try him/herself | Present in the practical session but did not participate attentively in the performance
Accuracy | 5 | Accurately done | 1-2 errors/mistakes found | 3-5 errors/mistakes identified | More than 5 errors/mistakes committed
Documentation | 5 | No errors; program is well executed and documented properly | Complete write-up and output tables, but presentation is poor | Some of the commands missing, with missing outputs | Poor write-up, and diagram or content missing
Understanding & Explanation | 5 | Fully understood the performance and can explain perfectly | Understood the performance but cannot explain | Partially understood the performance and can give little explanation | Partially understood and cannot give explanation
Time | 5 | Completed the work within 1 week | Work is submitted later than 1 week but by the end of the 2nd week | Work done after the 2nd week but before the end of the 3rd week | Work submitted after 3 weeks' time

Max 25 marks

Sign with Date


Practical No.7: ML Project: Use the following dataset as music.csv.

i. Store the file as music.csv and import it into Python using pandas.
ii. Prepare the data by splitting it into input (age, gender) and output (genre) data sets.
iii. Use the decision tree model from sklearn to predict the genre for people of various age groups
(e.g. a male of age 21 likes hiphop whereas a female of age 22 likes dance).
iv. Calculate the accuracy of the model.
v. Vary the training and test size to check the different accuracy values the model achieves.
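
Step v can be done in a single loop instead of one program per ratio. The sketch below is one possible way, assuming pandas and scikit-learn are installed. The inline table is a hypothetical stand-in for music.csv (its values are illustrative, with gender assumed to be coded 1 = male, 0 = female), so the accuracies it prints will not match those from the real file.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Hypothetical stand-in for music.csv; replace with pd.read_csv("music.csv") in the lab.
df = pd.DataFrame({
    "age": [20, 23, 25, 26, 29, 30, 31, 33, 37, 20, 21, 25, 26, 27, 30, 31, 34, 35],
    "gender": [1] * 9 + [0] * 9,
    "genre": ["HipHop"] * 3 + ["Jazz"] * 3 + ["Classical"] * 3
             + ["Dance"] * 3 + ["Acoustic"] * 3 + ["Classical"] * 3,
})
X = df[["age", "gender"]]
y = df["genre"]

results = []
for test_size in (0.2, 0.3, 0.4):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=42
    )
    model = DecisionTreeClassifier(random_state=42)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    results.append((test_size, len(X_test), accuracy))
    print(f"test_size={test_size} test rows={len(X_test)} accuracy={accuracy:.3f}")
```

With so few rows the accuracy can swing widely from one split to the next, so it is worth recording several random_state values per ratio in the observation table.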

A. Objective: Effectively use sklearn library to make predictions using decision tree.
B. Expected Program Outcomes (POs):-PO1, PO2, PO3,PO6, PO7.

C. Expected Skills to be developed based on competency:


I. Importing existing datasets from data repositories.
II. Doing Prediction of class labels
III. Splitting of attribute based on criteria
IV. Learning on how to work on machine learning project.
D. Expected Course Outcomes(Cos)
CO-2
E. Practical Outcome(PRo)
Determine accuracy of the classification model.
F. Expected Affective domain Outcome(ADos)

Handle tools /components/equipment carefully with safety and necessary
precaution.
In software ethics Environment sustainability and environment consciousness
whenever suitable.
G. Prerequisite Theory:
Refer: https://scikit-learn.org/stable/
Refer Unit 3 of course curriculum. Students are suggested to read chapter 3 of
Machine Learning authored by Dutt, Chandramouli and Das
H. Resources/Equipment Required
Sr.No. | Instrument/Equipment/Components/Trainer kit | Specification | Quantity
1 | System supporting Jupyter Notebook | Python 3.x | 1

I. Safety and necessary Precautions followed


• Read the experiment thoroughly before starting and ensure that you
understand all the steps and concepts involved from underpinning theory.
• Keep the workspace clean and organized, free from clutter and unnecessary
materials.
• Use the software according to its intended purpose and instructions.
• Ensure that all the necessary equipment and software are in good working
condition.
• Never eat or drink in the lab, as it can cause contamination and create safety
hazards.
• If any accidents or injuries occur, immediately notify the instructor and seek
medical attention if necessary.


J. Procedure to be followed/Source code (CE & IT software subjects):

80:20 Ratio:-
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("music.csv")
# Split the columns into input features (age, gender) and output label (genre)
inputs = df[['age', 'gender']]
outputs = df['genre']
# Hold out 20% of the rows for testing
input_train, input_test, output_train, output_test = train_test_split(inputs, outputs, test_size=0.2, random_state=42)
print('Input Training:-', input_train.shape)
print('Input Testing:-', input_test.shape)
print('Output Training:-', output_train.shape)
print('Output Testing:-', output_test.shape)
model = DecisionTreeClassifier()
model.fit(input_train, output_train)
# Predict the genre for a 21-year-old male and a 22-year-old female (gender: 1 = male, 0 = female)
new_data = pd.DataFrame({'age': [21, 22], 'gender': [1, 0]})
predictions = model.predict(new_data)
for i, genre in enumerate(predictions):
    print(f"{i+1}:{genre}")
# Accuracy on the held-out test rows
y_prediction = model.predict(input_test)
accuracy = accuracy_score(output_test, y_prediction)
print("Accuracy:-", accuracy)
Output:-

70:30 Ratio:-
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("music.csv")
# Split the columns into input features (age, gender) and output label (genre)
inputs = df[['age', 'gender']]
outputs = df['genre']
# Hold out 30% of the rows for testing
input_train, input_test, output_train, output_test = train_test_split(inputs, outputs, test_size=0.3, random_state=42)
print('Input Training:-', input_train.shape)
print('Input Testing:-', input_test.shape)
print('Output Training:-', output_train.shape)
print('Output Testing:-', output_test.shape)
model = DecisionTreeClassifier()
model.fit(input_train, output_train)
# Predict the genre for a 21-year-old male and a 22-year-old female (gender: 1 = male, 0 = female)
new_data = pd.DataFrame({'age': [21, 22], 'gender': [1, 0]})
predictions = model.predict(new_data)
for i, genre in enumerate(predictions):
    print(f"{i+1}:{genre}")
# Accuracy on the held-out test rows
y_prediction = model.predict(input_test)
accuracy = accuracy_score(output_test, y_prediction)
print("Accuracy:-", accuracy)
Output:-


K. Observations and Calculations/Input-Output (CE & IT software subjects):


Observation Table: Prepare a table of genre suggestion accuracy by varying test and
train size.

Calculations:
Calculate confusion matrix


L. Practical related Quiz.


1. What is the root node in a decision tree?
a) The topmost node that represents the output or decision
b) The node that has no parent
c) The node that has the maximum number of child nodes
d) The node that is located at the centre of the tree
2. What are some advantages of using decision trees for machine learning?
a) They are easy to interpret and visualize.
b) They can handle both categorical and numerical data.
c) They can handle missing values and noisy data.
d) All of the above.

M. References / Suggestions ( lab manual designer should give)


Python Machine Learning Tutorial (Data Science):-
https://www.youtube.com/watch?v=7eh4d6sabA0

N. Assessment-Rubrics

Criteria | Total Marks | Exceptional (5 Marks) | Satisfactory (4 to 3 Marks) | Developing (2 Marks) | Limited (1 Mark)
Engagement | 5 | Performed the practical him/herself | Performed the practical with others' help | Watched other students performing the practical but did not try him/herself | Present in the practical session but did not participate attentively in the performance
Accuracy | 5 | Accurately done | 1-2 errors/mistakes found | 3-5 errors/mistakes identified | More than 5 errors/mistakes committed
Documentation | 5 | No errors; program is well executed and documented properly | Complete write-up and output tables, but presentation is poor | Some of the commands missing, with missing outputs | Poor write-up, and diagram or content missing
Understanding & Explanation | 5 | Fully understood the performance and can explain perfectly | Understood the performance but cannot explain | Partially understood the performance and can give little explanation | Partially understood and cannot give explanation
Time | 5 | Completed the work within 1 week | Work is submitted later than 1 week but by the end of the 2nd week | Work done after the 2nd week but before the end of the 3rd week | Work submitted after 3 weeks' time

Max 25 marks

Sign with Date


Practical No.8: Write a program to use a K-nearest neighbor classifier to predict class labels of test
data. Training and test data must be provided explicitly.

A. Objective: Learn the simplest supervised machine learning algorithm used for classification.
B. Expected Program Outcomes (POs):-PO1, PO2, PO3, PO6, PO7.

C. Expected Skills to be developed based on competency:


I. Importing existing datasets from data repositories.
II. Doing Prediction of class labels
III. Splitting of attribute based on criteria
D. Expected Course Outcomes(Cos)
CO-3
E. Practical Outcome(PRo)
Classifying data points based on how their neighbours are classified.
F. Expected Affective domain Outcome(ADos)
Handle tools /components/equipment carefully with safety and necessary
precaution.
In software ethics Environment sustainability and environment consciousness
whenever suitable.
G. Prerequisite Theory:
Training data: data.csv
1 1 1 1 1 1 0 1 1
1 1 1 1 1 1 0 0 1
1 1 1 1 1 1 1 1 0
1 1 1 1 1 0 0 1 1
1 1 1 1 1 0 0 0 1
1 1 1 0 1 1 0 1 1
1 1 0 1 1 1 0 1 0
1 1 1 0 1 1 0 0 1
1 1 1 0 1 0 0 1 1
1 1 1 0 1 0 0 0 1
0 1 1 1 1 1 0 1 1
0 1 1 1 1 1 0 0 1
1 0 1 1 1 1 0 1 0
0 1 1 1 1 0 0 1 1

1 1 0 1 0 1 0 1 0
1 0 0 1 1 1 0 1 0
1 0 0 1 0 1 1 1 0
0 1 1 1 1 0 0 0 1
1 0 1 1 1 1 1 1 0
0 1 1 0 1 1 0 1 1
Test Data: test.csv
0 1 1 1 1 1 1 1
1 0 0 0 0 0 0 0
0 1 1 0 1 0 0 0
0 1 1 1 1 0 0 0

Refer unit 4 of course curriculum. Students are suggested to read chapter 7 of
Machine Learning authored by Dutt, Chandramouli and Das
H. Resources/Equipment Required
Sr.No. | Instrument/Equipment/Components/Trainer kit | Specification | Quantity
1 | System supporting Jupyter Notebook | Python 3.x | 1

I. Safety and necessary Precautions followed


• Read the experiment thoroughly before starting and ensure that you
understand all the steps and concepts involved from underpinning theory.
• Keep the workspace clean and organized, free from clutter and unnecessary
materials.
• Use the software according to its intended purpose and instructions.
• Ensure that all the necessary equipment and software are in good working
condition.
• Never eat or drink in the lab, as it can cause contamination and create safety
hazards.
• If any accidents or injuries occur, immediately notify the instructor and seek
medical attention if necessary.


J. Procedure to be followed/Source code (CE & IT software subjects):

80:20 Ratio:-
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the iris dataset


iris = load_iris()
X = iris.data
Y = iris.target

# Split the dataset into training and test sets


X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# Create an instance of the KNN classifier


knn = KNeighborsClassifier(n_neighbors=3)

# Fit the classifier on the training data


knn.fit(X_train, Y_train)

# Predict labels for the test data


y_pred = knn.predict(X_test)
print("Predicted labels:", y_pred)
print("True labels:", Y_test)

# Calculate and print the accuracy


accuracy = accuracy_score(Y_test, y_pred)
print("Accuracy:", accuracy)
Output:-

70:30 Ratio:-
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the iris dataset


iris = load_iris()
X = iris.data
Y = iris.target

# Split the dataset into training and test sets

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=42)

# Create an instance of the KNN classifier


knn = KNeighborsClassifier(n_neighbors=3)

# Fit the classifier on the training data


knn.fit(X_train, Y_train)

# Predict labels for the test data


y_pred = knn.predict(X_test)
print("Predicted labels:", y_pred)
print("True labels:", Y_test)

# Calculate and print the accuracy


accuracy = accuracy_score(Y_test, y_pred)
print("Accuracy:", accuracy)
Output:-


K. Practical related Quiz.


1. What does KNN stand for?
a) K-Nearest Neighbors
b) Kernel Nonlinear Network
c) K-Means Nearest Neighbors
d) None of the above
2. In KNN, how is the distance between a new data point and its neighbors
typically measured?
a) Euclidean distance
b) Manhattan distance
c) Cosine similarity
d) All of the above
3. In what type of machine learning problems is KNN generally used?
a) Regression problems
b) Classification problems
c) Clustering problems
d) Dimensionality reduction problems
4. What are some advantages of using KNN for machine learning?
a) It is a simple and easy-to-implement algorithm.
b) It can handle both continuous and categorical data.
c) It can adapt to complex decision boundaries.
d) All of the above.

L. References / Suggestions ( lab manual designer should give)


https://cse.iitkgp.ac.in/~pabitra/course/ml/ml.html
https://www.youtube.com/watch?v=4HKqjENq9OU


M. Assessment-Rubrics

Criteria | Total Marks | Exceptional (5 Marks) | Satisfactory (4 to 3 Marks) | Developing (2 Marks) | Limited (1 Mark)
Engagement | 5 | Performed practical him/herself | Performed practical with others' help | Watched other students performing practical but not tried him/herself | Present in practical session but not attentively participated in performance
Accuracy | 5 | Accurately done | 1-2 errors/mistakes found | 3-5 errors/mistakes identified | More than 5 errors/mistakes committed
Documentation | 5 | No errors; program is well executed and documented properly | Complete write-up and output tables but presentation is poor | Some of the commands missing with missing outputs | Poor write-up and diagram or missing content
Understanding & Explanation | 5 | Fully understood the performance and can explain perfectly | Understood the performance but cannot explain | Partially understood the performance and can give little explanation | Partially understood and cannot give explanation
Time | 5 | Completed the work within 1 week | Work is submitted later than 1 week but by the end of 2nd week | Work done after 2nd week but before the end of 3rd week | Work submitted after 3 week time

Max 25 marks

Sign with Date


Practical No.9: Import vgsales.csv from the Kaggle platform.

a. Find the number of rows and columns in the dataset.
b. Find basic information about the dataset using the describe command.
c. Find values using the values command.

A. Objective: Understand data imported from known repositories.


B. Expected Program Outcomes (POs):-PO1, PO2, PO3, PO6, PO7.

C. Expected Skills to be developed based on competency:


I. Importing existing datasets from data repositories.
II. Understanding the data imported.
III. Using pandas library
D. Expected Course Outcomes (COs)
CO-3
E. Practical Outcome (PRo)
Identifying data attributes.
F. Expected Affective Domain Outcomes (ADOs)
Handle tools/components/equipment carefully, with safety and necessary
precautions.
Follow software ethics, and practice environmental sustainability and
consciousness, whenever suitable.
G. Prerequisite Theory:
https://pandas.pydata.org/docs/user_guide/index.html
Refer to unit 4 of the course curriculum. Students are advised to read chapter 7 of
Machine Learning by Dutt, Chandramouli and Das.
H. Resources/Equipment Required
Sr. No. | Instrument/Equipment/Components/Trainer kit | Specification | Quantity
1 | System supporting Jupyter Notebook | Python 3.x | 1


I. Safety and necessary Precautions followed

• Read the experiment thoroughly before starting and ensure that you
understand all the steps and concepts involved from underpinning theory.
• Keep the workspace clean and organized, free from clutter and unnecessary
materials.
• Use the software according to its intended purpose and instructions.
• Ensure that all the necessary equipment and software are in good working
condition.
• Never eat or drink in the lab, as it can cause contamination and create safety
hazards.
• If any accidents or injuries occur, immediately notify the instructor and seek
medical attention if necessary.

J. Procedure to be followed/Source code (CE & IT software subjects):


Program:-
import pandas as pd
# Load the dataset
dataset = pd.read_csv('vgsales.csv')
# Get the number of rows and columns
num_rows, num_columns = dataset.shape
print("Number of Rows:-", num_rows)
print("Number of Columns:-", num_columns)
# Summary statistics of the numeric columns
description = dataset.describe()
print("Description:-\n", description)
# Underlying data as a NumPy array
values = dataset.values
print("Values:-\n", values)

Output:-
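
To see what shape, describe and values return without needing the CSV file, the sketch below applies the same calls to a tiny hand-made DataFrame. The rows and the choice of columns are hypothetical stand-ins, not data from vgsales.csv. By default describe() summarises only the numeric columns; include="all" also covers text columns.

```python
import pandas as pd

# Hypothetical sample rows; the real vgsales.csv has many more rows and columns.
sample = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C"],
    "Platform": ["Wii", "NES", "Wii"],
    "Year": [2006.0, 1985.0, None],        # one missing year on purpose
    "Global_Sales": [82.74, 40.24, 35.82],
})

# shape is a (rows, columns) tuple
num_rows, num_columns = sample.shape
print("Rows:", num_rows, "Columns:", num_columns)

# describe() covers numeric columns only by default
numeric_summary = sample.describe()
print(numeric_summary)

# include="all" adds count/unique/top/freq for the text columns
full_summary = sample.describe(include="all")
print(full_summary)

# Missing values per column; 'count' in describe() excludes them
missing = sample.isnull().sum()
print(missing)

# values exposes the data as a 2-D NumPy array
values = sample.values
print(values)
```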


K. Practical related Quiz.


1. What is Pandas used for?
a) Data analysis and manipulation
b) Web development
c) Machine learning
d) Image processing
2. What are the two main data structures in Pandas?
a) Series and DataFrames
b) Arrays and lists
c) Dictionaries and tuples
d) Matrices and vectors
3. How do you read a CSV file into a Pandas DataFrame?
a) pd.read_csv('filename.csv')
b) pd.read_excel('filename.csv')
c) pd.read_table('filename.csv')
d) pd.read_json('filename.csv')
4. How do you select a subset of rows and columns from a Pandas
DataFrame?
a) df.loc[row_index, column_index]
b) df.iloc[row_index, column_index]
c) df[row_index, column_index]
d) df.select(row_index, column_index)
5. How do you group data in a Pandas DataFrame?
a) df.groupby(column_name)
b) df.group_by(column_name)
c) df.sort_by(column_name)
d) df.filter_by(column_name)


L. References / Suggestions ( lab manual designer should give)


https://www.youtube.com/watch?v=7eh4d6sabA0

M. Assessment-Rubrics

Criteria | Total Marks | Exceptional (5 Marks) | Satisfactory (4 to 3 Marks) | Developing (2 Marks) | Limited (1 Mark)
Engagement | 5 | Performed practical him/herself | Performed practical with others' help | Watched other students performing practical but not tried him/herself | Present in practical session but not attentively participated in performance
Accuracy | 5 | Accurately done | 1-2 errors/mistakes found | 3-5 errors/mistakes identified | More than 5 errors/mistakes committed
Documentation | 5 | No errors; program is well executed and documented properly | Complete write-up and output tables but presentation is poor | Some of the commands missing with missing outputs | Poor write-up and diagram or missing content
Understanding & Explanation | 5 | Fully understood the performance and can explain perfectly | Understood the performance but cannot explain | Partially understood the performance and can give little explanation | Partially understood and cannot give explanation
Time | 5 | Completed the work within 1 week | Work is submitted later than 1 week but by the end of 2nd week | Work done after 2nd week but before the end of 3rd week | Work submitted after 3 week time

Max 25 marks

Sign with Date


Practical No.10: Project on regression

a. Import home_data.csv from Kaggle using pandas
b. Understand the data by running the head, info and describe commands
c. Plot the house price against area using the matplotlib library
d. Apply a linear regression model to predict the house price

A. Objective: Understand the linear model.


B. Expected Program Outcomes (POs):-PO1, PO2, PO3,PO6, PO7.

C. Expected Skills to be developed based on competency:


I. Importing existing datasets from data repositories.
II. Understanding the data imported.
III. Using sklearn library to implement linear model.
D. Expected Course Outcomes (COs)
CO-3
E. Practical Outcome (PRo)
Predicting values using linear regression.
F. Expected Affective Domain Outcomes (ADOs)
Handle tools/components/equipment carefully, with safety and necessary
precautions.
Follow software ethics, and practice environmental sustainability and
consciousness, whenever suitable.
G. Prerequisite Theory:
https://scikit-learn.org/stable/
Refer to unit 4 of the course curriculum. Students are advised to read chapter 8 of
Machine Learning by Dutt, Chandramouli and Das.
H. Resources/Equipment Required
Sr. No. | Instrument/Equipment/Components/Trainer kit | Specification | Quantity
1 | System supporting Jupyter Notebook | Python 3.x | 1


I. Safety and necessary Precautions followed

• Read the experiment thoroughly before starting and ensure that you
understand all the steps and concepts involved from underpinning theory.
• Keep the workspace clean and organized, free from clutter and unnecessary
materials.
• Use the software according to its intended purpose and instructions.
• Ensure that all the necessary equipment and software are in good working
condition.
• Never eat or drink in the lab, as it can cause contamination and create safety
hazards.
• If any accidents or injuries occur, immediately notify the instructor and seek
medical attention if necessary.
J. Procedure to be followed/Source code (CE & IT software subjects):
Program (a,b,c) :-

import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset
dataset = pd.read_csv('home_data.csv')

# Display the first few rows
print(dataset.head())

# Get information about the dataset (info() prints directly and returns None)
dataset.info()

# Get summary statistics of the dataset
print(dataset.describe())

# Extract the area ('sqft_lot') and price columns
area = dataset['sqft_lot']
price = dataset['price']

# Create scatter plot
plt.scatter(area, price)

# Customize plot
plt.xlabel('sqft_lot')
plt.ylabel('price')
plt.title('House Price vs. Area')

# Display the plot
plt.show()

Output:-


Program (d):-
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Load the dataset


data = pd.read_csv('home_data.csv')

# Separate the features (X) and target variable (y)


X = data[['sqft_lot', 'bedrooms', 'zipcode']]
y = data['price']

# Split the data into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a linear regression model


model = LinearRegression()


# Fit the model to the training data


model.fit(X_train, y_train)

# Predict on the test data


y_pred = model.predict(X_test)

# Calculate the mean squared error


mse = mean_squared_error(y_test, y_pred)
print('Mean Squared Error:', mse)

Output:-
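
Mean squared error is reported in squared price units, which makes it hard to judge on its own. The sketch below shows two easier-to-read companions, RMSE (same units as price) and R-squared, on synthetic data so it runs without home_data.csv. The slope of 150, the intercept of 50000 and the noise level are invented purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing data (assumed values, not from home_data.csv):
# price = 50000 + 150 * area + noise
rng = np.random.default_rng(0)
area = rng.uniform(500, 4000, size=200)
noise = rng.normal(0, 20000, size=200)
price = 50000 + 150 * area + noise

# scikit-learn expects a 2-D feature matrix, even for a single feature
X = area.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)            # back in the units of price
r2 = r2_score(y_test, y_pred)  # share of variance explained on the test set

print("Learned slope:", model.coef_[0])
print("Learned intercept:", model.intercept_)
print("RMSE:", rmse)
print("R-squared:", r2)
```

On the real dataset the same three metric lines can be appended after the model.predict call in Program (d).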

K. Practical related Quiz.


1. What is linear regression used for?
a) Data visualization
b) Clustering
c) Predictive modeling
d) Dimensionality reduction
2. In linear regression, what is the objective?
a) To minimize the mean squared error between the predicted and actual
values
b) To maximize the correlation coefficient between the features and target
variable
c) To maximize the R-squared value between the features and target variable
d) To minimize the sum of absolute errors between the predicted and actual
values
3. How is linear regression implemented in Scikit-Learn?
a) By instantiating a LinearRegression object and calling its fit method
b) By instantiating a Regression object and calling its fit method
c) By instantiating a LinearModel object and calling its fit method
d) By instantiating a LinearSolver object and calling its fit method


4. What is the R-squared value in linear regression?


a) A measure of how well the model fits the data
b) A measure of the correlation between the features and target variable
c) A measure of the variance in the target variable that can be explained by
the features
d) A measure of the error between the predicted and actual values
L. References / Suggestions ( lab manual designer should give)
https://www.youtube.com/watch?v=8jazNUpO3lQ


M. Assessment-Rubrics

Criteria | Total Marks | Exceptional (5 Marks) | Satisfactory (4 to 3 Marks) | Developing (2 Marks) | Limited (1 Mark)
Engagement | 5 | Performed practical him/herself | Performed practical with others' help | Watched other students performing practical but not tried him/herself | Present in practical session but not attentively participated in performance
Accuracy | 5 | Accurately done | 1-2 errors/mistakes found | 3-5 errors/mistakes identified | More than 5 errors/mistakes committed
Documentation | 5 | No errors; program is well executed and documented properly | Complete write-up and output tables but presentation is poor | Some of the commands missing with missing outputs | Poor write-up and diagram or missing content
Understanding & Explanation | 5 | Fully understood the performance and can explain perfectly | Understood the performance but cannot explain | Partially understood the performance and can give little explanation | Partially understood and cannot give explanation
Time | 5 | Completed the work within 1 week | Work is submitted later than 1 week but by the end of 2nd week | Work done after 2nd week but before the end of 3rd week | Work submitted after 3 week time

Max 25 marks

Sign with Date


Practical No.11: Write a program to cluster a set of points using K-means.


Training and test data must be provided explicitly.

A. Objective: Determining the correct number of clusters.


B. Expected Program Outcomes (POs):-PO1, PO2, PO3, PO6, PO7.

C. Expected Skills to be developed based on competency:


I. Learning data pre-processing task.
II. Determining optimal number of clusters.
III. Understanding feature selection.
D. Expected Course Outcomes (COs)
CO-4
E. Practical Outcome (PRo)
Grouping data points into clusters using K-means.
F. Expected Affective Domain Outcomes (ADOs)
Handle tools/components/equipment carefully, with safety and necessary
precautions.
Follow software ethics, and practice environmental sustainability and
consciousness, whenever suitable.
G. Prerequisite Theory:
Refer to unit 5 of the course curriculum. Students are advised to read chapter 9 of
Machine Learning by Dutt, Chandramouli and Das.
H. Resources/Equipment Required
Sr. No. | Instrument/Equipment/Components/Trainer kit | Specification | Quantity
1 | System supporting Jupyter Notebook | Python 3.x | 1

I. Safety and necessary Precautions followed

• Read the experiment thoroughly before starting and ensure that you
understand all the steps and concepts involved from underpinning theory.
• Keep the workspace clean and organized, free from clutter and unnecessary
materials.
• Use the software according to its intended purpose and instructions.
• Ensure that all the necessary equipment and software are in good working
condition.
• Never eat or drink in the lab, as it can cause contamination and create safety
hazards.
• If any accidents or injuries occur, immediately notify the instructor and seek
medical attention if necessary.
J. Procedure to be followed/Source code (CE & IT software subjects):

Program:-
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
# Generate random data points
np.random.seed(0)
X = np.random.rand(100, 2) # 100 points in 2D space
# Perform K-means clustering
k = 3 # Number of clusters
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)  # fixed seed for repeatable clusters
kmeans.fit(X)
# Get the cluster labels and centroids
labels = kmeans.labels_
centroids = kmeans.cluster_centers_
# Plot the data points and clusters
plt.scatter(X[:, 0], X[:, 1], c=labels)
plt.scatter(centroids[:, 0], centroids[:, 1], marker='x', s=150, c='red')
plt.title('K-means Clustering')
plt.xlabel('X')
plt.ylabel('Y')
plt.show()
Output:-
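
The practical statement asks for explicitly provided training and test data, and the objective is to determine the number of clusters. A minimal sketch of both ideas follows, assuming a small hand-written set of 2-D points (invented for illustration). It fits K-means for several values of k, records the inertia (sum of squared distances to the nearest centroid) for an elbow check, and then assigns the explicit test points to clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Explicit training points (hypothetical): three visibly separate groups
X_train = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.8, 1.1], [1.1, 1.3],
    [5.0, 5.0], [5.2, 4.8], [4.9, 5.1], [5.1, 5.3],
    [9.0, 1.0], [9.2, 0.9], [8.8, 1.2], [9.1, 1.1],
])

# Explicit test points (hypothetical): one near each group
X_test = np.array([[1.0, 0.9], [5.0, 5.2], [9.0, 1.0]])

# Elbow check: inertia drops sharply until k reaches the natural number of groups
inertias = []
for k in range(1, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_train)
    inertias.append(km.inertia_)
    print(f"k={k}: inertia={km.inertia_:.3f}")

# Fit the chosen model and assign the test points to clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train)
test_labels = kmeans.predict(X_test)
print("Test point clusters:", test_labels)

# Silhouette score close to 1 indicates compact, well-separated clusters
sil = silhouette_score(X_train, kmeans.labels_)
print("Silhouette score:", round(sil, 3))
```

The cluster numbers themselves carry no meaning; only which points share a number matters.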


K. Practical related Quiz.


1. What is K-means clustering used for?
a) Dimensionality reduction
b) Data cleaning
c) Data clustering
d) Model selection
2. What is the objective of K-means clustering?
a) To minimize the sum of squared distances between data points and their
centroids
b) To maximize the variance between data points and their centroids
c) To minimize the sum of absolute distances between data points and their
centroids
d) To maximize the correlation between data points and their centroids
3. What is the value of K in K-means clustering?
a) The number of clusters
b) The number of data points
c) The number of features
d) The number of centroids
4. How is the initial centroid for K-means clustering selected?
a) Randomly
b) Based on the mean of the data points
c) Based on the median of the data points
d) Based on the mode of the data points
5. How do you evaluate the quality of the clustering in K-means clustering?
a) By calculating the sum of squared distances between data points and their
centroids
b) By calculating the silhouette score
c) By calculating the F1 score


d) By calculating the Pearson correlation coefficient


L. References / Suggestions ( lab manual designer should give)
https://www.youtube.com/watch?v=EItlUEPCIzM

M. Assessment-Rubrics

Criteria | Total Marks | Exceptional (5 Marks) | Satisfactory (4 to 3 Marks) | Developing (2 Marks) | Limited (1 Mark)
Engagement | 5 | Performed practical him/herself | Performed practical with others' help | Watched other students performing practical but not tried him/herself | Present in practical session but not attentively participated in performance
Accuracy | 5 | Accurately done | 1-2 errors/mistakes found | 3-5 errors/mistakes identified | More than 5 errors/mistakes committed
Documentation | 5 | No errors; program is well executed and documented properly | Complete write-up and output tables but presentation is poor | Some of the commands missing with missing outputs | Poor write-up and diagram or missing content
Understanding & Explanation | 5 | Fully understood the performance and can explain perfectly | Understood the performance but cannot explain | Partially understood the performance and can give little explanation | Partially understood and cannot give explanation
Time | 5 | Completed the work within 1 week | Work is submitted later than 1 week but by the end of 2nd week | Work done after 2nd week but before the end of 3rd week | Work submitted after 3 week time

Max 25 marks

Sign with Date


Practical No.12: Import Iris dataset.


a. Find rows and columns using shape command
b. Print first 30 instances using head command
c. Find out the data instances in each class (use groupby and size)
d. Plot the univariate graphs (box plot and histograms)
e. Plot the multivariate plot (scatter matrix)
f. Split data to train model by 80% data values
g. Apply K-NN and K-means clustering to check accuracy and decide which is
better.

A. Objective: Differentiate between supervised v/s unsupervised learning approaches


B. Expected Program Outcomes (POs):-PO1, PO2, PO3, PO6, PO7.

C. Expected Skills to be developed based on competency:


I. Learn to handle data efficiently.
II. Identifying the similarity between data sets.
III. Finding Neighbours and generating responses.
IV. Computing Accuracy of the method used.
D. Expected Course Outcomes (COs)
CO-4
E. Practical Outcome (PRo)
Differentiate between supervised and unsupervised learning.
F. Expected Affective Domain Outcomes (ADOs)
Handle tools/components/equipment carefully, with safety and necessary
precautions.
Follow software ethics, and practice environmental sustainability and
consciousness, whenever suitable.
G. Prerequisite Theory:
Refer to unit 5 of the course curriculum. Students are advised to read chapter 9 of
Machine Learning by Dutt, Chandramouli and Das.
H. Resources/Equipment Required
Sr. No. | Instrument/Equipment/Components/Trainer kit | Specification | Quantity
1 | System supporting Jupyter Notebook | Python 3.x | 1

I. Safety and necessary Precautions followed


• Read the experiment thoroughly before starting and ensure that you understand all
the steps and concepts involved from underpinning theory.
• Keep the workspace clean and organized, free from clutter and unnecessary
materials.
• Use the software according to its intended purpose and instructions.
• Ensure that all the necessary equipment and software are in good working
condition.
• Never eat or drink in the lab, as it can cause contamination and create safety
hazards.
• If any accidents or injuries occur, immediately notify the instructor and seek
medical attention if necessary.

J. Procedure to be followed/Source code (CE & IT software subjects):

Program(a):-
from sklearn.datasets import load_iris
# Load the Iris dataset
iris = load_iris()
# Get the data and target
X = iris.data
y = iris.target
# Print the shape of the dataset
print("Number of rows:", X.shape[0])
print("Number of columns:", X.shape[1])

Output:-


Program(b):-
from sklearn.datasets import load_iris

# Load the Iris dataset


iris = load_iris()

# Print the first 30 instances


for i in range(30):
    print(f"Instance {i+1}: {iris.data[i]}")

Output:-
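
The task statement asks for the head command, while the program above uses a loop over the raw array. A minimal sketch of the pandas route follows, assuming the data is first wrapped in a DataFrame. The column name species is an illustrative choice, not something the dataset defines.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Wrap the feature array in a DataFrame so pandas commands are available
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Readable class names instead of 0/1/2 ('species' is an illustrative column name)
df["species"] = [str(iris.target_names[i]) for i in iris.target]

# shape gives (rows, columns)
print("Shape:", df.shape)

# head(30) returns the first 30 rows as a new DataFrame
first_30 = df.head(30)
print(first_30)

# groupby + size counts the instances in each class
class_counts = df.groupby("species").size()
print(class_counts)
```

Because the dataset is stored class by class, the first 30 rows all belong to one species, so head alone is not a representative sample.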

Program(c):-
import pandas as pd
from sklearn.datasets import load_iris

# Load the Iris dataset


iris = load_iris()


# Create a DataFrame with the data and target


df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['target'] = iris.target

# Count the instances in each class


class_counts = df.groupby('target').size()

# Print the class counts


print(class_counts)

Output:-

Program(d):-
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()

# Create a DataFrame with the data and target
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['target'] = iris.target

# Plot box plots for each feature
plt.figure(figsize=(10, 6))
plt.title('Box Plots for Iris Features')
sns.boxplot(data=df.drop('target', axis=1))
plt.xticks(rotation=45)
plt.show()

# Plot histograms for each feature (hist() creates its own figure)
df.drop('target', axis=1).hist(bins=10, grid=False, edgecolor='black', figsize=(10, 6))
plt.suptitle('Histograms for Iris Features')
plt.tight_layout()
plt.show()
Output:-


Program(e):-
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Load the Iris dataset


iris = load_iris()

# Create a DataFrame with the data and target


df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['target'] = iris.target

# Plot the scatter matrix


sns.set(style="ticks")
sns.pairplot(df, hue="target")
plt.show()


Output:-

Program(f):-
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

# Load the Iris dataset


iris = load_iris()

# Split the data into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

# Print the sizes of the training and testing sets


print("Training set size:", X_train.shape[0])
print("Testing set size:", X_test.shape[0])

Output:-


Program(g):-
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

# Train and evaluate K-NN classifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
knn_predictions = knn.predict(X_test)
knn_accuracy = accuracy_score(y_test, knn_predictions)

# Train K-means clustering (the labels are not used for fitting)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans.fit(X_train)

# Cluster ids are arbitrary, so map each cluster to the most common
# true class among its training points before computing accuracy
cluster_to_class = {}
for c in range(3):
    members = y_train[kmeans.labels_ == c]
    cluster_to_class[c] = np.bincount(members).argmax() if len(members) else -1
kmeans_predictions = np.array([cluster_to_class[c] for c in kmeans.predict(X_test)])
kmeans_accuracy = accuracy_score(y_test, kmeans_predictions)

# Compare accuracies
print("K-NN Accuracy:", knn_accuracy)
print("K-means Accuracy:", kmeans_accuracy)

Output:-
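
K-means numbers its clusters arbitrarily, so cluster 0 need not correspond to class 0, and a plain accuracy_score between cluster ids and class labels can be misleading. One hedged alternative, sketched below, is to score the clustering with the adjusted Rand index (ARI), which ignores how clusters are numbered. This is an illustrative addition rather than the manual's prescribed method, and ARI is not an accuracy: it is 1.0 for a perfect match and close to 0 for random grouping.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, adjusted_rand_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Supervised: K-NN learns from the labels, so accuracy is directly meaningful
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
knn_accuracy = accuracy_score(y_test, knn.predict(X_test))

# Unsupervised: K-means never sees the labels
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X_train)
test_clusters = kmeans.predict(X_test)

# ARI compares groupings, not label values, so cluster numbering does not matter
kmeans_ari = adjusted_rand_score(y_test, test_clusters)

print("K-NN accuracy:", knn_accuracy)
print("K-means adjusted Rand index:", kmeans_ari)
```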

K. Observations and Calculations/Input-Output (CE & IT software subjects):


Observation Table: Draw an accuracy table for both K-NN and K-means.


L. Practical related Quiz.


1. Which algorithm is supervised and which one is unsupervised?
a) K-means clustering is supervised, KNN algorithm is unsupervised
b) K-means clustering is unsupervised, KNN algorithm is supervised
c) Both K-means clustering and KNN algorithm are supervised
d) Both K-means clustering and KNN algorithm are unsupervised
2. What is the output of K-means clustering?
a) A classification of the data points into different classes
b) A prediction of the target variable for a given data point
c) A grouping of similar data points into K clusters
d) The K nearest neighbors for a given data point
3. What is the output of KNN algorithm?
a) A classification of the data points into different classes
b) A prediction of the target variable for a given data point
c) A grouping of similar data points into K clusters
d) The K nearest neighbors for a given data point
4. What is the primary objective of K-means clustering?
a) To classify data points into different classes
b) To find the K nearest neighbors for a given data point
c) To group similar data points into K clusters
d) To predict the target variable for a given data point
5. What is the primary objective of KNN algorithm?
a) To classify data points into different classes
b) To find the K nearest neighbors for a given data point
c) To group similar data points into K clusters
d) To predict the target variable for a given data point


M. References / Suggestions ( lab manual designer should give)


https://youtu.be/6kZ-OPLNcgE

N. Assessment-Rubrics

Criteria | Total Marks | Exceptional (5 Marks) | Satisfactory (4 to 3 Marks) | Developing (2 Marks) | Limited (1 Mark)
Engagement | 5 | Performed practical him/herself | Performed practical with others' help | Watched other students performing practical but not tried him/herself | Present in practical session but not attentively participated in performance
Accuracy | 5 | Accurately done | 1-2 errors/mistakes found | 3-5 errors/mistakes identified | More than 5 errors/mistakes committed
Documentation | 5 | No errors; program is well executed and documented properly | Complete write-up and output tables but presentation is poor | Some of the commands missing with missing outputs | Poor write-up and diagram or missing content
Understanding & Explanation | 5 | Fully understood the performance and can explain perfectly | Understood the performance but cannot explain | Partially understood the performance and can give little explanation | Partially understood and cannot give explanation
Time | 5 | Completed the work within 1 week | Work is submitted later than 1 week but by the end of 2nd week | Work done after 2nd week but before the end of 3rd week | Work submitted after 3 week time

Max 25 marks

Sign with Date
