
P.E.S. COLLEGE OF ENGINEERING
Mandya-571401, Karnataka
(An Autonomous Institution, under Visvesvaraya Technological University, Belagavi)
Aided by Govt. of Karnataka Recognized by AICTE, New Delhi.
Phone: 08232-220043, 220120 Extn:213 Fax:08232-222075

Department of Master of Computer Applications

II SEMESTER

LAB MANUAL
Machine Learning and Data Analytics using Python
(Integrated course)

Subject Code: P24MCA21

Academic Year: 2024-2026

VISION AND MISSION



Vision of PESCE
PESCE shall be a leading institution imparting quality engineering and management education
developing creative and socially responsible professionals.

Mission of PESCE
 Provide state-of-the-art infrastructure, motivate the faculty to be proficient in their field of
specialization and adopt best teaching-learning practices.
 Impart engineering and managerial skills through competent and committed faculty using an
outcome-based educational curriculum.
 Inculcate professional ethics, leadership qualities and entrepreneurial skills to meet
societal needs.
 Promote research, product development and industry-institution interaction.

Vision of the Department


A Department of high repute imparting quality education to develop competent computer
application software professionals and technocrats to serve the society.

Mission of the Department


Committed to
 To provide state-of-the-art facilities with a supportive environment for teaching and learning.
 To prepare the students with curricula meeting industry expectations.
 To train the students to be competent in solving real-world problems in the field of computer
applications, nurturing them with ethical values for the well-being of society.


PROGRAM EDUCATIONAL OBJECTIVES (PEOs)


PEO-1. Deliver competence in a global environment as a computer software professional with
practice of software engineering principles.

PEO-2. Exhibit technical and managerial skills to provide solutions for societally acceptable
problems and manage projects.

PEO-3. Excel in the profession with effective communication skills, ethical attitude, teamwork
and the ability to relate computer applications to the broader societal context.

PROGRAMME OUTCOMES (POs)


PO-1. (Foundation Knowledge): Apply knowledge of mathematics, programming logic and
coding fundamentals for solution architecture and problem solving.

PO-2. (Problem Analysis): Identify, review, formulate and analyze problems for primarily
focusing on customer requirements using critical thinking frameworks.

PO-3. (Development of Solutions): Design, develop and investigate problems with an
innovative approach for solutions incorporating ESG/SDG goals.

PO-4. (Modern Tool Usage): Select, adapt and apply modern computational tools such as
development of algorithms with an understanding of the limitations including human biases.

PO-5. (Individual and Teamwork): Function and communicate effectively as an individual or a


team leader in diverse and multidisciplinary groups. Use methodologies such as agile.

PO-6. (Project Management and Finance): Use the principles of project management such as

scheduling, work breakdown structure and be conversant with the principles of Finance for

profitable project management.

PO-7. (Ethics): Commit to professional ethics in managing software projects with financial
aspects, learn to use new technologies for cyber security and insulate customers from
malware.

PO-8. (Life-long Learning): Change management skills and the ability to learn, keep up with
contemporary technologies and ways of working.


List of Experiments

1. Python programs to show the usage of Python Libraries for ML applications such as Pandas, Matplotlib and Seaborn. Read the training data from a .CSV file
2. Write a program to demonstrate Regression analysis with residual plots on a given data set
3. Write a program to implement the binary logistic Bayesian classifier for a sample training data set stored as a .CSV file. Compute the accuracy of the classifier, considering few test data sets
4. Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions
5. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample
6. Write a program to implement k-Means clustering algorithm to cluster the set of data stored in a .CSV file
7. Write a program to implement SVM algorithm to classify the iris data set. Print both correct and wrong predictions
8. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets
9. Write a program to compute summary statistics such as mean, median, mode, standard deviation and variance of the given different types of data


1. Python programs to show the usage of Python Libraries for ML applications such as Pandas,
Matplotlib and Seaborn. Read the training data from a .CSV file

Name of the Dataset

Autos_mpg.data: info about different cars & their characteristics


(http://archive.ics.uci.edu/ml/datasets/auto+mpg)

Dataset Description

1. 'mpg': Miles per gallon, a measure of fuel efficiency.


2. 'cylinders': Number of cylinders in the engine.
3. 'displacement': Total volume of all cylinders in an engine.
4. 'horsepower': Engine power output measured in horsepower.
5. 'weight': Total weight of the vehicle.
6. 'acceleration': Rate at which the vehicle can increase its speed.
7. 'year': The manufacturing year of the vehicle.
8. 'origin': Country of origin of the vehicle.
9. 'name': Name or identifier of the vehicle model.

In [1]: import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

In [2]: autos=pd.read_csv(r'D:\Teaching\ML\auto+mpg\auto-mpg.data', sep='\s+', header=None)

In [3]: print(autos.head(6))

In [4]: autos.info()

Out: <class 'pandas.core.frame.DataFrame'>

RangeIndex: 398 entries, 0 to 397
Data columns (total 9 columns):

Dept. of MCA, PESCE, Mandya Page 5


ML and Data Analytics using Python LAB MANUAL (P24MCA21)

 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   0       398 non-null    float64
 1   1       398 non-null    int64
 2   2       398 non-null    float64
 3   3       398 non-null    object
 4   4       398 non-null    float64
 5   5       398 non-null    float64
 6   6       398 non-null    int64
 7   7       398 non-null    int64
 8   8       398 non-null    object
dtypes: float64(4), int64(3), object(2)
memory usage: 28.1+ KB

In [5]: autos.columns = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
'acceleration', 'year', 'origin', 'name']

In [6]: autos.info()

Out: <class 'pandas.core.frame.DataFrame'>

RangeIndex: 398 entries, 0 to 397
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   mpg           398 non-null    float64
 1   cylinders     398 non-null    int64
 2   displacement  398 non-null    float64
 3   horsepower    398 non-null    object
 4   weight        398 non-null    float64
 5   acceleration  398 non-null    float64
 6   year          398 non-null    int64
 7   origin        398 non-null    int64
 8   name          398 non-null    object

dtypes: float64(4), int64(3), object(2)


memory usage: 28.1+ KB

In[7]: autos.shape

Out: (398, 9)

In[8]: autos.horsepower.unique()


Out: array(['130.0', '165.0', '150.0', '140.0', '198.0', '220.0', '215.0','225.0', '190.0', '170.0', '160.0',
'95.00', '97.00', '85.00', '88.00', '46.00', '87.00', '90.00', '113.0', '200.0', '210.0', '193.0', '?', '100.0',
'105.0', '175.0', '153.0', '180.0', '110.0','72.00', '86.00', '70.00', '76.00', '65.00', '69.00', '60.00','80.00',
'54.00', '208.0', '155.0', '112.0', '92.00', '145.0', '137.0', '158.0', '167.0', '94.00', '107.0', '230.0',
'49.00','75.00', '91.00', '122.0', '67.00', '83.00', '78.00', '52.00', '61.00', '93.00', '148.0', '129.0', '96.00',
'71.00', '98.00', '115.0', '53.00', '81.00', '79.00', '120.0', '152.0', '102.0',
'108.0', '68.00', '58.00', '149.0', '89.00', '63.00', '48.00','66.00', '139.0','103.0', '125.0', '133.0',
'138.0', '135.0', '142.0', '77.00', '62.00', '132.0', '84.00', '64.00', '74.00', '116.0', '82.00'],
dtype=object)

In[9]: autos["horsepower"] = pd.to_numeric(autos["horsepower"], errors='coerce')


autos.info()

Out: <class 'pandas.core.frame.DataFrame'>

RangeIndex: 398 entries, 0 to 397
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   mpg           398 non-null    float64
 1   cylinders     398 non-null    int64
 2   displacement  398 non-null    float64
 3   horsepower    392 non-null    float64
 4   weight        398 non-null    float64
 5   acceleration  398 non-null    float64
 6   year          398 non-null    int64
 7   origin        398 non-null    int64
 8   name          398 non-null    object
dtypes: float64(5), int64(3), object(1)
memory usage: 28.1+ KB

In[10]: autos.describe()

Out:
              mpg   cylinders  displacement  horsepower       weight  acceleration        year      origin
count  398.000000  398.000000    398.000000  392.000000   398.000000    398.000000  398.000000  398.000000
mean    23.514573    5.454774    193.425879  104.469388  2970.424623     15.568090   76.010050    1.572864
std      7.815984    1.701004    104.269838   38.491160   846.841774      2.757689    3.697627    0.802055
min      9.000000    3.000000     68.000000   46.000000  1613.000000      8.000000   70.000000    1.000000
25%     17.500000    4.000000    104.250000   75.000000  2223.750000     13.825000   73.000000    1.000000
50%     23.000000    4.000000    148.500000   93.500000  2803.500000     15.500000   76.000000    1.000000
75%     29.000000    8.000000    262.000000  126.000000  3608.000000     17.175000   79.000000    2.000000
max     46.600000    8.000000    455.000000  230.000000  5140.000000     24.800000   82.000000    3.000000

In[11]: autos[autos.horsepower.isnull()]

Out:

In[12]: val=autos['horsepower'].mean()
print(val)

Out: 104.46938775510205

In[13]: autos['horsepower'].fillna(autos['horsepower'].mean( ), inplace=True)

In[14]: print(autos.head(7))

Out:
mpg cylinders displacement horsepower weight acceleration year \
0 18.0 8 307.0 130.000000 3504.0 12.0 70
1 15.0 8 350.0 165.000000 3693.0 11.5 70
2 18.0 8 318.0 150.000000 3436.0 11.0 70
3 16.0 8 304.0 150.000000 3433.0 12.0 70
4 17.0 8 302.0 140.000000 3449.0 10.5 70
5 15.0 8 429.0 198.000000 4341.0 10.0 70
6 14.0 8 454.0 220.000000 4354.0 9.0 70

In[15]: autos.mpg.describe()

Out: count 398.000000


mean 23.514573
std 7.815984
min 9.000000
25% 17.500000
50% 23.000000
75% 29.000000
max 46.600000
Name: mpg, dtype: float64

In[16]: # So the minimum value is 9 and the maximum is 46, but on average it is 23.51 with a variation (std) of 7.8

sns.distplot(autos['mpg'])
plt.title('Distribution plot for MPG values', fontsize=21)

Out: Text(0.5, 1.0, 'Distribution plot for MPG values')


Analysis: The minimum MPG value is 9 and the maximum is 46, but on average it is 23.51, with a
standard deviation of about 7.8.

In[17]: autos['origin'] = autos.origin.replace([1,2,3],['USA','Europe','Japan'])

In[18]: autos.head()

Out:


In[19]: x=autos['origin']
y=autos['mpg']
fig = plt.figure(figsize=(10, 5))
plt.bar(x, y, color='Purple', width=0.4)
plt.xlabel("Country Name", fontsize=12)
plt.ylabel("MPG", fontsize=12)
plt.title("Average mpg values for different countries", fontsize=20)
plt.show()

Out:

Analysis: Japan has higher MPG values compared to the USA and Europe.
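The bar chart plots individual mpg values rather than averages, so it is worth verifying this claim numerically. A minimal sketch, reusing the autos DataFrame and the renamed columns from the cells above:

# Mean mpg per country of origin
avg_mpg = autos.groupby('origin')['mpg'].mean()
print(avg_mpg)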


2. Write a program to demonstrate Regression analysis with residual plots on a given data set

Name of the Dataset


MCA Salary.csv
Dataset Description


1. YearsExperience: Shows how long he/she has been working.

2. Salary: Indicates how much money he/she earns for their work.
In[1]: import pandas as pd
import numpy as np
import statsmodels.api as sn
from sklearn.model_selection import train_test_split
from sklearn import metrics
import math
import matplotlib.pyplot as plt
import seaborn as sns

In[2]: #mca_sal_df=pd.read_csv(r'D:\Teaching\ML\Codes-Data-Files\Machine Learning (Codes


and Data Files)\Data\MCA Salary.csv')
sal_df=pd.read_csv(r'D:\Teaching\ML\2023\Salary.csv')
sal_df.head(10)

Out:
YearsExperience Salary
0 1.1 39343.0
1 1.3 46205.0
2 1.5 37731.0
3 2.0 43525.0
4 2.2 39891.0
5 2.9 56642.0
6 3.0 60150.0
7 3.2 54445.0
8 3.2 64445.0
9 3.7 57189.0

In[3]: sal_df.shape

Out: (30, 2)

In [4]: sal_df.info()

Out: <class 'pandas.core.frame.DataFrame'>


RangeIndex: 30 entries, 0 to 29
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 YearsExperience 30 non-null float64
1 Salary 30 non-null float64

dtypes: float64(2)
memory usage: 612.0 bytes

In [5]: sal_df.describe()

Out:
YearsExperience Salary
count 30.000000 30.000000
mean 5.313333 76003.000000
std 2.837888 27414.429785
min 1.100000 37731.000000
25% 3.200000 56720.750000
50% 4.700000 65237.000000
75% 7.700000 100544.750000
max 10.500000 122391.000000

In [6]: # Data distribution


plt.title('Salary Distribution Plot')
sns.distplot(sal_df['Salary'])
plt.show()

In [7]: #add constant term of 1 to the dataset


X=sn.add_constant(sal_df["YearsExperience"])
Y=sal_df["Salary"]

In [8]: #split dataset into train and test sets in a 70:30 ratio
train_X,test_X,train_y,test_y=train_test_split(X,Y,train_size=0.7,random_state=100)


In [9]: #fitting the model using the OLS method


sal_lm=sn.OLS(train_y,train_X).fit()

In [10]: #print the estimated parameters


print(sal_lm.params)

Out: const 25202.887786


YearsExperience 9731.203838
dtype: float64

In [11]: #prints the model summary, which contains the information required for diagnosing a regression model
sal_lm.summary()

Out: OLS Regression Results


Dep. Variable: Salary R-squared: 0.949
Model: OLS Adj. R-squared: 0.946
Method: Least Squares F-statistic: 352.9
Date: Sat, 23 Dec 2023 Prob (F- statistic): 9.91e-14
Time: 23:22:55 Log-Likelihood: -211.80
No. Observations: 21 AIC: 427.6
Df Residuals: 19 BIC: 429.7
Df Model: 1
Covariance Type: nonrobust

                     coef    std err        t   P>|t|     [0.025     0.975]

const            2.52e+04   2875.387    8.765   0.000   1.92e+04   3.12e+04
YearsExperience  9731.2038   517.993   18.786   0.000   8647.033   1.08e+04

Omnibus: 1.843 Durbin-Watson: 1.749


Prob(Omnibus): 0.398 Jarque-Bera(JB): 1.106
Skew: 0.219 Prob(JB): 0.575
Kurtosis: 1.964 Cond. No. 12.3

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

In [12]:#takes the X parameter and returns the predicted values


pred_y=sal_lm.predict(test_X)

In [13]:print(pred_y, test_y)

Out :


9 61208.341988
26 117649.324249
28 125434.287320
13 65100.823523
5 53423.378917
12 64127.703139
27 118622.444633
25 112783.722331
6 54396.499301
dtype: float64
9 57189.0
26 116969.0
28 122391.0
13 57081.0
5 56642.0
12 56957.0
27 112635.0
25 105582.0
6 60150.0
Name: Salary, dtype: float64

In [14]: #R squared score on the test set

from sklearn.metrics import r2_score, mean_squared_error
error_score = metrics.r2_score(test_y,pred_y)
print("R squared score:",error_score)

Out: R squared score: 0.9627668685473271

In [15]:# Prediction on test set


sns.regplot(x=test_y, y=pred_y, color = 'Green')
plt.title('Salary vs Experience (Test Set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.legend(['predicted [test] values'], loc='upper left')
plt.show()
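The experiment statement also calls for residual plots. A minimal sketch using the fitted statsmodels model sal_lm from above; for a well-specified linear model the residuals should scatter randomly around zero with no visible pattern:

# Residuals vs fitted values on the training set
plt.scatter(sal_lm.fittedvalues, sal_lm.resid, color='blue')
plt.axhline(y=0, color='red', linestyle='--')
plt.xlabel('Fitted Salary')
plt.ylabel('Residual')
plt.title('Residuals vs Fitted Values (Training Set)')
plt.show()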


3. Write a program to implement the binary logistic Bayesian classifier for a sample training
data set stored as a .CSV file. Compute the accuracy of the classifier, considering few test
data sets

DATASET

pima_indian.csv

In[1]: import pandas as pd


from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn import metrics

In[2]: df = pd.read_csv(r"C:\Users\DEPT\Downloads\pima_indian.csv")
feature_col_names = ['num_preg', 'glucose_conc', 'diastolic_bp', 'thickness', 'insulin', 'bmi',
'diab_pred', 'age']
predicted_class_names = ['diabetes']

In[3]: X = df[feature_col_names].values # these are factors for the prediction


y = df[predicted_class_names].values # this is what we want to predict

In[4]: #splitting the dataset into train and test data


xtrain,xtest,ytrain,ytest=train_test_split(X,y,test_size=0.33)

In[5]: print ('\n The total number of Training Data :',ytrain.shape)


print ('\n The total number of Test Data :',ytest.shape)

out:
The total number of Training Data : (514, 1)

The total number of Test Data : (254, 1)

In[6]: # Training Naive Bayes (NB) classifier on training data.


clf = GaussianNB( ).fit(xtrain,ytrain.ravel( ))
predicted = clf.predict(xtest)

In[7]: #printing Confusion matrix, accuracy, Precision and Recall


print('\n Confusion matrix')
print(metrics.confusion_matrix(ytest,predicted))

Out:
Confusion matrix
[[135 28]
[ 33 58]]
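The experiment asks to compute the accuracy of the classifier; it can be printed directly using the metrics module imported above:

print('Accuracy of the classifier:', metrics.accuracy_score(ytest, predicted))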


In[8]: from sklearn.metrics import confusion_matrix, classification_report


from sklearn.metrics import accuracy_score
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Creates a confusion matrix
cm = confusion_matrix(ytest, predicted)
sns.heatmap(cm, annot=True, fmt='g')
plt.title('Accuracy: {0:.2f}'.format(accuracy_score(ytest, predicted)))
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
plt.show()

In[9]: print('Accuracy Metrics')


print(classification_report(ytest,predicted))

Out:
Accuracy Metrics
precision recall f1-score support

0 0.80 0.83 0.82 163


1 0.67 0.64 0.66 91

accuracy 0.76 254


macro avg 0.74 0.73 0.74 254
weighted avg 0.76 0.76 0.76 254

In[10]:#Prediction for new data set



predictTestData= clf.predict([[6,148,72,35,0,33.6,0.627,50]])
print("Predicted Value for individual Test Data:", predictTestData)

Out:
Predicted Value for individual Test Data: [1]

In[11]: predictTestData1= clf.predict([[1,80,66,29,0,26.6,0.351,31]])


print("Predicted Value for individual Test Data:", predictTestData1)

Out:
Predicted Value for individual Test Data: [0]


4. Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set.
Print both correct and wrong predictions

DATA SET

‘iris’ dataset from sklearn

DATASET DESCRIPTION

1. Sepal Length: The length of the sepal (the green leaf-like structure) of the iris flower, measured
in centimeters.
2. Sepal Width: The width of the sepal of the iris flower, measured in centimeters.
3. Petal Length: The length of the petal (the colored leaf-like structure) of the iris flower, measured
in centimeters.
4. Petal Width: The width of the petal of the iris flower, measured in centimeters.
5. Species: The species of the iris plant, which can be one of three types: Setosa, Versicolor, or
Virginica. This feature categorizes the iris flowers into distinct species based on their
characteristics.

In[1]: from sklearn.model_selection import train_test_split


from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn import datasets

In[2]: iris=datasets.load_iris()

In[3]: x = iris.data
y = iris.target

In[4]: print ('sepal-length', 'sepal-width', 'petal-length', 'petal-width')


print(x)

Out: sepal-length sepal-width petal-length petal-width


[[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2]
[4.6 3.1 1.5 0.2]
[5. 3.6 1.4 0.2]
[5.4 3.9 1.7 0.4]
[4.6 3.4 1.4 0.3]
[5. 3.4 1.5 0.2]
.. ..


In[5]: print('class: 0-Iris-Setosa, 1- Iris-Versicolour, 2- Iris-Virginica')


print(y)

Out: class: 0-Iris-Setosa, 1- Iris-Versicolour, 2- Iris-Virginica


[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]

In[6]: x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.3)

In[7]: classifier = KNeighborsClassifier(n_neighbors=5)


classifier.fit(x_train, y_train)

Out: KNeighborsClassifier()

In[8]: y_pred=classifier.predict(x_test)

In[9]: import numpy as np


for i in range(len(x_test)):
    x=x_test[i]
    x_new=np.array([x])
    prediction=classifier.predict(x_new)
    print("TARGET=", y_test[i], iris["target_names"][y_test[i]],
          "PREDICTED=", prediction, iris["target_names"][prediction])
print(classifier.score(x_test,y_test))

Out : TARGET= 1 versicolor PREDICTED= [1] ['versicolor']


TARGET= 2 virginica PREDICTED= [2] ['virginica']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 2 virginica PREDICTED= [2] ['virginica']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 2 virginica PREDICTED= [2] ['virginica']
.. ..

In[10]: print('Confusion Matrix')


print(confusion_matrix(y_test,y_pred))

Out: Confusion Matrix


[[12 0 0]
[ 0 16 1]
[ 0 0 16]]


In[11]:from sklearn.metrics import confusion_matrix


from sklearn.metrics import accuracy_score
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Creates a confusion matrix
cm = confusion_matrix(y_test, y_pred)
# Transform to df for easier plotting
cm_df = pd.DataFrame(cm, index = ['setosa','versicolor','virginica'],
columns = ['setosa','versicolor','virginica'])
sns.heatmap(cm_df, annot=True)
plt.title('Accuracy: {0:.3f}'.format(accuracy_score(y_test, y_pred)))
plt.ylabel('Actual label')
plt.xlabel('Predicted label')
plt.show()

Out:

In[12]: print('Accuracy Metrics')


print(classification_report(y_test,y_pred))

Out: Accuracy Metrics


precision recall f1-score support

0 1.00 1.00 1.00 12


1 1.00 0.94 0.97 17
2 0.94 1.00 0.97 16

accuracy 0.98 45
macro avg 0.98 0.98 0.98 45
weighted avg 0.98 0.98 0.98 45
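As an optional extension, the effect of the choice of k can be checked by evaluating the classifier for a range of n_neighbors values (a sketch; the exact accuracies vary with the random train/test split):

for k in range(1, 11):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(x_train, y_train)
    print('k =', k, 'accuracy =', knn.score(x_test, y_test))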


5. Write a program to demonstrate the working of the decision tree based ID3 algorithm.
Use an appropriate data set for building the decision tree and apply this knowledge to
classify a new sample

DATA SET

‘iris’ dataset from seaborn library

DATASET DESCRIPTION

1. Sepal Length: The length of the sepal (the green leaf-like structure) of the iris flower, measured
in centimeters.
2. Sepal Width: The width of the sepal of the iris flower, measured in centimeters.
3. Petal Length: The length of the petal (the colored leaf-like structure) of the iris flower, measured
in centimeters.
4. Petal Width: The width of the petal of the iris flower, measured in centimeters.
5. Species: The species of the iris plant, which can be one of three types: Setosa, Versicolor, or
Virginica. This feature categorizes the iris flowers into distinct species based on their
characteristics.

These features collectively describe various physical attributes of iris flowers, which are commonly
used in machine learning tasks for tasks such as classification and clustering.

In [1]: import pandas as pd


import numpy as np
import statsmodels.api as sn
from sklearn.model_selection import train_test_split
from sklearn import metrics
import matplotlib.pyplot as plt
import seaborn as sns

In [2]: iris_df=sns.load_dataset('iris')

In [3]: iris_df.head()

Out:    sepal_length  sepal_width  petal_length  petal_width  species

0                5.1          3.5           1.4          0.2   setosa
1                4.9          3.0           1.4          0.2   setosa
2                4.7          3.2           1.3          0.2   setosa
3                4.6          3.1           1.5          0.2   setosa
4                5.0          3.6           1.4          0.2   setosa


In [4]: iris_df.info()

Out: <class 'pandas.core.frame.DataFrame'>


RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object
dtypes: float64(4), object(1)
memory usage: 6.0+ KB

In [5]: # Unique Classes in the dataset


iris_df['species'].unique()

Out: array(['setosa', 'versicolor', 'virginica'], dtype=object)

In [6]: iris_df.isnull().sum()

Out: sepal_length    0
     sepal_width     0
     petal_length    0
     petal_width     0
     species         0
     dtype: int64

In [7]: # Replaces the target class values to numerical values (Object to numeric)
iris_df['species']=iris_df['species'].map({'setosa':0,'versicolor':1,'virginica':2})

In[8]: iris_df.head(105)


In [9]: #independent feature and dependent features


X=iris_df.iloc[:,:-1]
y=iris_df['species']
In [10]: X,y

Out: (     sepal_length  sepal_width  petal_length  petal_width
      0             5.1          3.5           1.4          0.2
      1             4.9          3.0           1.4          0.2
      2             4.7          3.2           1.3          0.2
      3             4.6          3.1           1.5          0.2
      4             5.0          3.6           1.4          0.2
      ..            ...          ...           ...          ...
      145           6.7          3.0           5.2          2.3
      146           6.3          2.5           5.0          1.9
      147           6.5          3.0           5.2          2.0
      148           6.2          3.4           5.4          2.3
      149           5.9          3.0           5.1          1.8

[150 rows x 4 columns],

 0      0
 1      0
 2      0
 3      0
 4      0
        ..
145 2
146 2
147 2
148 2
149 2
Name: species, Length: 150, dtype: int64)

In [11]: ### train test split


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [12]: X_train
Out[12]:
sepal_length sepal_width petal_length petal_width
81 5.5 2.4 3.7 1.0
133 6.3 2.8 5.1 1.5
137 6.4 3.1 5.5 1.8
75 6.6 3.0 4.4 1.4
109 7.2 3.6 6.1 2.5

... ... ... ... ...

106 4.9 2.5 4.5 1.7


14 5.8 4.0 1.2 0.2
92 5.8 2.6 4.0 1.2

102 7.1 3.0 5.9 2.1
105 rows × 4 columns

In [13]: y_train
Out: 81 1
133 2
137 2
75 1
109 2
..
71 1
106 2
14 0
92 1
102 2
Name: species, Length: 105, dtype: int64

In [14]: #Model building


from sklearn.tree import DecisionTreeClassifier
## Pre-pruning: restrict the depth of the tree while it is grown
treemodel=DecisionTreeClassifier(max_depth=2)
treemodel.fit(X_train,y_train)
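Note: DecisionTreeClassifier splits on the Gini index by default, whereas ID3 selects attributes by information gain (entropy). A sketch of a tree closer to ID3 (id3_tree is an illustrative name):

# criterion='entropy' approximates ID3's information-gain splitting
id3_tree = DecisionTreeClassifier(criterion='entropy', max_depth=2)
id3_tree.fit(X_train, y_train)

The remaining cells work the same with either criterion.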

In [15]: #prediction
y_pred=treemodel.predict(X_test)
y_pred
Out[15]: array([1, 0, 2, 1, 2, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0, 0, 0, 2, 1, 1, 0,
0], dtype=int64)

In [16]: from sklearn.metrics import accuracy_score,classification_report


score=accuracy_score(y_pred,y_test)
print(score)

Out: 0.9777777777777777

In [17]: print(classification_report(y_pred,y_test))

Out[17]:              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       0.92      1.00      0.96        12
           2       1.00      0.93      0.96        14

    accuracy                           0.98        45
   macro avg       0.97      0.98      0.97        45
weighted avg       0.98      0.98      0.98        45

In [18]: from sklearn import tree


plt.figure(figsize=(15,10))
tree.plot_tree(treemodel,filled=True)

Out:

In [19]: # Classify a new sample (the tree was trained on the four iris features)
predictTestData1= treemodel.predict([[5.1, 3.5, 1.4, 0.2]])

print("Predicted Value for individual Test Data:", predictTestData1)


6. Write a program to implement k-Means clustering algorithm to cluster the set of data
stored in .CSV file

DATASET
Income Data.csv
DATASET DESCRIPTION
1. income: income of the individual.
2. age: age of the individual.

In [1]: import pandas as pd


import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
import warnings
warnings.filterwarnings('ignore')

In [2]: income_df=pd.read_csv(r'D:\Teaching\ML\Codes-Data-Files\Machine Learning


(Codes and Data Files)\Data\Income Data.csv')

In [3]: income_df.head()

Out:
income age
0 41100.0 48.75
1 54100.0 28.10
2 47800.0 46.75
3 19100.0 40.25
4 18200.0 35.80

In [4]: income_df.info()

Out: <class 'pandas.core.frame.DataFrame'>


RangeIndex: 300 entries, 0 to 299
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
 0   income  300 non-null    float64
 1   age     300 non-null    float64
dtypes: float64(2)
memory usage: 4.8 KB

In [5]: plt.figure(figsize=(10,6))


plt.scatter(income_df['income'], income_df['age'])
plt.xlabel('Income')
plt.ylabel('Age')
plt.title('Income Data')

Out: Text(0.5, 1.0, 'Income Data')

Analysis: Individuals aged up to about 30 have high incomes, in the range of 50000-60000.
In [6]: cluster_range = range(1, 10)
cluster_errors = [ ]
for num_clusters in cluster_range:
    clusters = KMeans(num_clusters)
    clusters.fit(income_df)
    cluster_errors.append(clusters.inertia_)
plt.figure(figsize=(6,4))
plt.plot(cluster_range, cluster_errors, marker = "o")
plt.title('Elbow method')
plt.xlabel('Number of clusters')
plt.ylabel('Cluster Score')


Out: Text(0, 0.5, 'Cluster Score')

Analysis: The elbow appears at 2, so we take the n_clusters value as 2.
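Since income is several orders of magnitude larger than age, it dominates the Euclidean distances used by k-Means, so the clusters above are driven almost entirely by income. A sketch of clustering on standardized features (StandardScaler is an assumption; it is not used elsewhere in this program):

from sklearn.preprocessing import StandardScaler
# standardize both features so neither dominates the distance computation
scaled = StandardScaler().fit_transform(income_df[['income', 'age']])
scaled_model = KMeans(n_clusters=2, random_state=42).fit(scaled)
print(scaled_model.cluster_centers_)   # centers in standardized units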

In [7]: cluster_errors

Out: [77496243724.64746,
12598951960.688824,
6107696328.700776,
3093566239.1138325,
2208535279.104451,
1468601128.8812134,
1167521998.0943167,
916192175.9564873,
727270333.3059859]

In [8]: clusters_model = KMeans(n_clusters=2, random_state=42)


clusters_model.fit(income_df)

In [9]: pred=clusters_model.predict(income_df)
pred

Out: array([0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0,
0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0,
0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1,
0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1,
0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0,
0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1,
0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0,

1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,
0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1,
1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0,
1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0])

In [10]: clusters_model.labels_

Out: array([0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0,
0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0,
0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1,
0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1,
0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0,
0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1,
0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0,
1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,
0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1,
1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0,
1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0])

In [11]: income_df['cluster'] = pd.DataFrame(pred, columns=['cluster'])

In [12]: income_df.head()

Out:
   income    age  cluster
0  41100.0  48.75        0
1  54100.0  28.10        0
2  47800.0  46.75        0
3  19100.0  40.25        1
4  18200.0  35.80        1

In [13]: import seaborn as sn


sn.lmplot(x="age", y="income", data=income_df, fit_reg=False, hue='cluster');
#plt.legend('lower right')
plt.show()


In [14]: clusters_model.cluster_centers_

Out: array([[4.98601990e+04, 3.80713930e+01],


[1.85808081e+04, 3.92449495e+01]])

In [15]: # Refit with 3 clusters; drop the 'cluster' column added above so it does not leak into the features
clusters_model = KMeans(n_clusters=3, random_state=42)
clusters_model.fit(income_df[['income', 'age']])

In [16]: pred=clusters_model.predict(income_df[['income', 'age']])
pred

Out: array([2, 0, 2, 1, 1, 1, 0, 2, 1, 2, 0, 0, 0, 2, 0, 1, 2, 2, 1, 0, 1, 2,
0, 2, 1, 1, 2, 1, 0, 0, 1, 2, 2, 0, 0, 1, 0, 1, 2, 0, 1, 0, 2, 0,
0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 2, 2, 1, 1, 0, 0, 0, 2, 1, 0, 1,
2, 0, 2, 0, 1, 1, 1, 1, 0, 2, 0, 1, 2, 2, 1, 2, 0, 2, 2, 0, 0, 1,
2, 2, 1, 0, 1, 0, 0, 0, 2, 0, 1, 2, 0, 1, 2, 0, 0, 2, 1, 2, 0, 0,
2, 1, 0, 2, 1, 1, 2, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1,
2, 1, 1, 0, 1, 2, 1, 1, 0, 2, 0, 2, 1, 1, 2, 1, 1, 0, 2, 1, 2, 0,
1, 1, 0, 0, 2, 0, 2, 0, 0, 2, 1, 0, 2, 2, 2, 1, 0, 2, 1, 0, 0, 0,
2, 0, 2, 0, 0, 1, 2, 2, 2, 2, 0, 1, 2, 1, 2, 2, 0, 0, 1, 2, 0, 1,
2, 1, 0, 1, 0, 1, 0, 1, 2, 1, 2, 0, 2, 2, 1, 0, 0, 0, 0, 2, 1, 0,
2, 0, 0, 0, 2, 1, 1, 2, 0, 2, 2, 0, 0, 2, 0, 1, 1, 1, 2, 2, 0, 1,
1, 1, 1, 0, 2, 1, 2, 0, 0, 2, 0, 0, 1, 2, 0, 1, 2, 0, 1, 0, 1, 1,
2, 1, 2, 0, 0, 0, 0, 2, 2, 2, 2, 0, 1, 1, 0, 0, 2, 0, 0, 0, 1, 0,


1, 1, 0, 0, 1, 1, 1, 0, 2, 2, 1, 0, 2, 2])
In [17]: income_df['cluster'] = pd.DataFrame(pred, columns=['cluster'])
income_df.head()

Out:
   income    age  cluster
0  41100.0  48.75        2
1  54100.0  28.10        0
2  47800.0  46.75        2
3  19100.0  40.25        1
4  18200.0  35.80        1

In [18]: import seaborn as sn


sn.lmplot(x="age", y="income", data=income_df, fit_reg=False, hue='cluster');
#plt.legend('lower right')
plt.show()


7. Write a program to implement SVM algorithm to classify the iris data set. Print both
correct and wrong predictions

In[1]: # Import necessary libraries


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, accuracy_score
from matplotlib.colors import ListedColormap
import seaborn as sns

In[2]: # Loading the Iris dataset using scikit-learn’s datasets module. The load_iris() function from
this module loads the well-known Iris dataset
iris=datasets.load_iris( )

In[3]: #Selecting specific features from the Iris dataset.


x=iris.data[:, [2, 3]]
y=iris.target

In[4]: x

out:
array([[1.4, 0.2],
[1.4, 0.2],
[1.3, 0.2],
[1.5, 0.2],
[1.4, 0.2],
[1.7, 0.4],
[1.4, 0.3],
[1.5, 0.2],
.. .. ])

In[5]: y

out:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,

2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In[6]: # Creating a Pandas DataFrame (`iris_df`) from the feature matrix `X` and the
target vector `y` obtained from the Iris dataset.
iris_df=pd.DataFrame(x, columns=iris.feature_names[2:])
iris_df['target']=y

In[7]: plt.figure(figsize=(10,6))
plt.scatter(x[y==0,0], x[y==0,1],color='red', marker='o', label='Setosa')
plt.scatter(x[y==1,0], x[y==1,1],color='blue', marker='x', label='Versicolor')
plt.scatter(x[y==2,0], x[y==2,1],color='green', marker='^', label='Virginica')
plt.xlabel('Petal length')
plt.ylabel('Petal width')
plt.legend(loc='upper left')
plt.title('Data Distribution')
plt.show()

Out:

In[8]: # The code is using the train_test_split function from scikit-learn to split the dataset into
training and testing sets
x_train, x_test, y_train, y_test= train_test_split(x, y, test_size=0.3, random_state=42)

# By using the StandardScaler from scikit-learn to standardize the features in the training
and test sets.
sc=StandardScaler()
x_train_std=sc.fit_transform(x_train)
x_test_std=sc.transform(x_test)

In[9]: # By using scikit-learn’s SVC (Support Vector Classification) to create a Support Vector
Machine (SVM) model with a linear kernel
svm_cl=SVC(kernel='linear', C=1.0, random_state=0)
svm_cl.fit(x_train_std, y_train)

Out:
SVC(kernel='linear', random_state=0)

In[10]: # defines a function called plot_dec_region that can be used to visualize
decision boundaries of a classifier
def plot_dec_region(x, y, classifier, test_idx=None, resolution=0.02):
    #setup marker and color map
    markers=('s','x','o','^', 'v')
    colors=('red','blue','lightgreen', 'gray', 'cyan')
    cmap=ListedColormap(colors[:len(np.unique(y))])

    #plot the decision surface
    x1_min, x1_max=x[:, 0].min()-1, x[:, 0].max()+1
    x2_min, x2_max=x[:, 1].min()-1, x[:, 1].max()+1
    xx1, xx2=np.meshgrid(np.arange(x1_min, x1_max, resolution),
                         np.arange(x2_min, x2_max, resolution))
    z=classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
    z=z.reshape(xx1.shape)
    plt.contourf(xx1, xx2, z, alpha=0.4, cmap=cmap)
    plt.xlim(xx1.min(), xx1.max())
    plt.ylim(xx2.min(), xx2.max())

    #plot all samples
    for idx, c1 in enumerate(np.unique(y)):
        plt.scatter(x=x[y==c1, 0], y=x[y==c1, 1], alpha=0.8, c=cmap(idx),
                    marker=markers[idx], label=c1)

In[12]: #combine the standardized feature matrices (X_train_std and X_test_std) and the
corresponding target vectors (y_train and y_test)
x_combine_std=np.vstack((x_train_std, x_test_std))
#combine train and test target values
y_combine=np.hstack((y_train,y_test))

In[13]: # By using the plot_dec_region function to visualize the decision boundaries
of the Support Vector Machine (SVM) classifier (svm_cl) on the combined standardized
feature matrix (x_combine_std) and target vector (y_combine)


#visualizing the decision boundaries


plot_dec_region(x_combine_std, y_combine, classifier=svm_cl)
plt.xlabel('Petal length [Standardized]')
plt.ylabel('Petal width [Standardized]')
plt.legend(loc='upper left')
plt.title('SVM Decision Boundaries')
plt.show()

Out:

In[14]: # Make predictions using the SVM model (svm_cl) on the standardized test data
(x_test_std) and then calculate the confusion matrix.
y_pred=svm_cl.predict(x_test_std)
cm=confusion_matrix(y_test, y_pred)
print("Confusion Matrix\n", cm)
accuracy=accuracy_score(y_test,y_pred)
print("Accuracy:", accuracy)

Out:
Confusion Matrix
[[19 0 0]
[ 0 13 0]
[ 0 0 13]]
Accuracy: 1.0
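The experiment also asks to print both correct and wrong predictions. A minimal sketch over the standardized test set, reusing the names from the cells above:

for i in range(len(x_test_std)):
    pred = svm_cl.predict(x_test_std[i].reshape(1, -1))[0]
    status = 'CORRECT' if pred == y_test[i] else 'WRONG'
    print('TARGET=', iris.target_names[y_test[i]], 'PREDICTED=', iris.target_names[pred], status)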

In[15]: #plotting the confusion matrix


plt.figure(figsize=(8,6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.title('Confusion Matrix')
plt.show()


Out:

8. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets


Training Examples:

Example   Sleep   Study   Expected % in Exams
1         2       9       92
2         1       5       86
3         3       6       89

Normalize the input:

Example   Sleep              Study              Expected % in Exams
1         2/3 = 0.66666667   9/9 = 1            0.92
2         1/3 = 0.33333333   5/9 = 0.55555556   0.86
3         3/3 = 1            6/9 = 0.66666667   0.89

import numpy as np
X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X/np.amax(X,axis=0) # maximum of X array longitudinally
y = y/100

#Sigmoid Function
def sigmoid(x):
    return 1/(1 + np.exp(-x))

#Derivative of Sigmoid Function (here x is already the sigmoid output)
def derivatives_sigmoid(x):
    return x * (1 - x)

#Variable initialization
epoch=5000 #Setting training iterations
lr=0.1 #Setting learning rate
inputlayer_neurons = 2 #number of features in data set
hiddenlayer_neurons = 3 #number of neurons in the hidden layer
output_neurons = 1 #number of neurons at output layer

#weight and bias initialization


wh=np.random.uniform(size=(inputlayer_neurons,hiddenlayer_neurons))
bh=np.random.uniform(size=(1,hiddenlayer_neurons))
wout=np.random.uniform(size=(hiddenlayer_neurons,output_neurons))
bout=np.random.uniform(size=(1,output_neurons))

#draws a random range of numbers uniformly of dim x*y


for i in range(epoch):
    #Forward Propagation
    hinp1=np.dot(X,wh)
    hinp=hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1=np.dot(hlayer_act,wout)
    outinp= outinp1 + bout
    output = sigmoid(outinp)
    #Backpropagation
    EO = y-output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    #how much hidden layer weights contributed to the error
    hiddengrad = derivatives_sigmoid(hlayer_act)
    d_hiddenlayer = EH * hiddengrad
    #weight update: dot product of layer activations and deltas, scaled by lr
    #(for simplicity the biases bh and bout are left at their initial values)
    wout += hlayer_act.T.dot(d_output) * lr
    wh += X.T.dot(d_hiddenlayer) * lr

print("Input: \n" + str(X))


print("Actual Output: \n" + str(y))
print("Predicted Output: \n" ,output)

Out:

Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.89417246]
[0.88311751]
[0.89255249]]
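A quick way to quantify how close the predictions are to the targets is the training mean squared error (a sketch; X, y and output come from the program above):

mse = np.mean((y - output) ** 2)   # average squared error per example
print("Training MSE:", mse)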

9. Write simple python programs to understand the Basic Libraries such as Statistics, Math,
Numpy and Scipy

a. Statistics

# Python code to demonstrate the working of mean(), median(), mode()


# importing statistics functions to handle statistical operations
from statistics import mean
from statistics import mode
from statistics import median
from statistics import median_low
from statistics import median_high
from statistics import variance
from statistics import stdev

# List of positive integer numbers


data1 = [20, 30, 40, 20, 50, 50, 70, 90, 50, 10]

# List of floating point values


data2 = [21.4, 51.1, 62.7, 82.9]

# List of a set of negative integers


data3 = [-45, -11, -12, -19, -34]

# List of a set of positive and negative integers


data4 = [-11, -12, -13, -14, 15, 15, 17, 18]

print("DATA SET")
print("Data-set 1", data1)
print("Data-set 1", data2)
print("Data-set 1", data3)
print("Data-set 1", data4)

print("MEAN")
# using mean () to calculate average of list elements
print ("The average of data-set 1 is : %.2f " %(mean(data1)))
print ("The average of data-set 2 is : %.2f " %(mean(data2)))
print ("The average of data-set 3 is : %.2f " %(mean(data3)))
print ("The average of data-set 4 is : %.2f " %(mean(data4)))
print("\n")

print("MODE")
# Printing the mode of the above datasets
print("Mode of data-set 1 is %.2f " %(mode(data1)))
print("Mode of data-set 2 is %.2f " %(mode(data2)))
print("Mode of data-set 3 is %.2f " %(mode(data3)))
print("Mode of data-set 4 is %.2f " %(mode(data4)))
print("\n")

print("MEDIAN")
# Printing the median of above datasets

print("Median of data-set 1 is %.2f " %(median(data1)))


print("Median of data-set 2 is %.2f " %(median(data2)))
print("Median of data-set 3 is %.2f " %(median(data3)))
print("Median of data-set 4 is %.2f " %(median(data4)))
print("\n")

print("LOW and HIGH MEDIAN")


# simple list of a set of integers
sample = [1, 3, 3, 4, 5, 7]
print("Sample Set:", sample)

# Printing the median of the sample set

print("Median of the sample set is %.2f " %(median(sample)))

# Print low median of the data-set


print("Low Median of the set is %.2f " %(median_low(sample)))

# Print high median of the data-set


print("High Median of the set is %.2f " %(median_high(sample)))
print("\n")

print("VARIANCE")
# Print the variance of the data-sets
print("Variance of data-set 1 is %.2f " %(variance(data1)))
print("Variance of data-set 2 is %.2f " %(variance(data2)))
print("Variance of data-set 3 is %.2f " %(variance(data3)))
print("Variance of data-set 4 is %.2f " %(variance(data4)))
print("\n")

print("STANDARD DEVIATION")
# Print the standard deviation of the data-sets
print("The Standard Deviation of data-set 1 is %.2f" % (stdev(data1)))
print("The Standard Deviation of data-set 2 is %.2f" % (stdev(data2)))
print("The Standard Deviation of data-set 3 is %.2f" % (stdev(data3)))
print("The Standard Deviation of data-set 4 is %.2f" % (stdev(data4)))

Output:

DATA SET
Data-set 1 [20, 30, 40, 20, 50, 50, 70, 90, 50, 10]

Data-set 2 [21.4, 51.1, 62.7, 82.9]


Data-set 3 [-45, -11, -12, -19, -34]
Data-set 4 [-11, -12, -13, -14, 15, 15, 17, 18]

MEAN
The average of data-set 1 is : 43.00
The average of data-set 2 is : 54.53
The average of data-set 3 is : -24.20
The average of data-set 4 is : 1.88

MODE
Mode of data-set 1 is 50.00
Mode of data-set 2 is 21.40
Mode of data-set 3 is -45.00
Mode of data-set 4 is 15.00

MEDIAN
Median of data-set 1 is 45.00
Median of data-set 2 is 56.90
Median of data-set 3 is -19.00
Median of data-set 4 is 2.00

LOW and HIGH MEDIAN


Sample Set: [1, 3, 3, 4, 5, 7]
Median of the sample set is 3.50
Low Median of the set is 3.00
High Median of the set is 4.00

VARIANCE
Variance of data-set 1 is 601.11
Variance of data-set 2 is 660.32
Variance of data-set 3 is 219.70
Variance of data-set 4 is 237.84

STANDARD DEVIATION
The Standard Deviation of data-set 1 is 24.52
The Standard Deviation of data-set 2 is 25.70
The Standard Deviation of data-set 3 is 14.82
The Standard Deviation of data-set 4 is 15.42

b. Math

#Calculation of the permutations and the combinations using math and scipy library.
#p = n! / (n - r)!

#c = n! / (r! * (n - r)!)

import math
from scipy.special import perm, comb
n = int(input("Enter value for n:"))
r = int(input("Enter value for r:"))
def permutations_count(n, r):
    return math.factorial(n) // math.factorial(n - r)
def combinations_count(n, r):
    return math.factorial(n) // (math.factorial(n - r) * math.factorial(r))
print("The permutation of", n, "and", r, "is ")
print(permutations_count(n, r))
print("The combination of", n, "and", r, "is ")
print(combinations_count(n, r))

output:

Enter value for n:6

Enter value for r:4

The permutation of 6 and 4 is


360

The combination of 6 and 4 is


15
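The perm and comb functions imported from scipy.special above are otherwise unused; they can cross-check the math-based results (exact=True returns integers rather than floats):

print("scipy perm:", perm(n, r, exact=True))
print("scipy comb:", comb(n, r, exact=True))

For n = 6 and r = 4 this prints 360 and 15, matching the output above.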

c. Numpy

# Python program for matrix multiplication operations


# importing numpy

import numpy as np
Rows1 = int(input("Give the number of rows for matrix1:"))
Columns1 = int(input("Give the number of columns for matrix1:"))
Rows2 = int(input("Give the number of rows for matrix2:"))
Columns2 = int(input("Give the number of columns for matrix2:"))
if (Columns1 != Rows2):
    print("Multiplication not possible....")
else:
    print("Please write the elements of the matrix1 in a single line and separated by a space: ")
    # User will give the entries in a single line
    elements1 = list(map(int, input().split()))
    print("Please write the elements of the matrix2 in a single line and separated by a space: ")
    elements2 = list(map(int, input().split()))
    # Printing the matrices given by the user
    mat1 = np.array(elements1).reshape(Rows1, Columns1)
    print("Matrix 1")
    print(mat1)
    mat2 = np.array(elements2).reshape(Rows2, Columns2)
    print("Matrix 2")
    print(mat2)
    # multiplying the matrices
    print("Product of (mat1, mat2)...")
    print(np.dot(mat1, mat2))
    print()  # prints newline

Output1:

Give the number of rows for matrix1:2


Give the number of columns for matrix1:3

Give the number of rows for matrix2:3


Give the number of columns for matrix2:2

Please write the elements of the matrix1 in a single line and separated by a space:
1 2 3 4 5 6
Please write the elements of the matrix2 in a single line and separated by a space:
2 4 6 8 1 5

Matrix 1
[[1 2 3]
[4 5 6]]

Matrix 2
[[2 4]
 [6 8]
 [1 5]]
Product of (mat1, mat2)...
[[17 35]
 [44 86]]

Output2:

Give the number of rows for matrix1:2


Give the number of columns for matrix1:3

Give the number of rows for matrix2:2


Give the number of columns for matrix2:3

Multiplication not possible....

d. Scipy

#Python program to calculate the determinant, eigenvalues and corresponding
#eigenvectors of a two-dimensional square matrix
from scipy import linalg
import numpy as np
n = int(input("Enter the value for n:"))

#enter value for square matrix


print("Enter matrix elements")
elements = list(map(int, input().split()))
arr = np.array(elements).reshape(n,n)
print("Input Matrix")
print(arr)
print()

#pass values to det() function


Mdet=linalg.det( arr )
print ("Determinant of a matrix is :", Mdet)
print()

#pass value into eig function


eg_val, eg_vect = linalg.eig(arr)
print("Eigen values are")
#get eigenvalues
print(eg_val)
print()

print("Eigen Vectors are")



#get eigenvectors
print(eg_vect)

Output:
Enter the value for n:2
Enter matrix elements
4 7 1 5
Input Matrix
[[4 7]
[1 5]]
Determinant of a matrix is : 12.999999999999998
Eigen values are
[1.8074176+0.j 7.1925824+0.j]

Eigen Vectors are


[[-0.95428251 -0.90983868]
[ 0.29890615 -0.41496214]]
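A sketch to verify the result: each eigenpair should satisfy A·v = λ·v (arr, eg_val and eg_vect come from the program above):

for k in range(len(eg_val)):
    # compare A @ v with lambda * v for the k-th eigenpair
    print("Eigenpair", k, "check:", np.allclose(arr @ eg_vect[:, k], eg_val[k] * eg_vect[:, k]))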
