ML Manual AIDS
REGULATION-2021
NAME :
REGISTER NUMBER :
YEAR : II
SEMESTER : IV
MISRIMAL NAVAJEE MUNOTH JAIN ENGINEERING COLLEGE
OWNED AND MANAGED BY TAMILNADU EDUCATIONAL AND MEDICAL FOUNDATION
A JAIN MINORITY INSTITUTION
APPROVED BY AICTE & PROGRAMMES ACCREDITED BY NBA, NEW DELHI (UG PROGRAMMES – MECH, AI&DS, ECE, CSE, IT)
ALL PROGRAMMES RECOGNIZED BY THE GOVERNMENT OF TAMIL NADU AND AFFILIATED TO ANNA UNIVERSITY, CHENNAI
GURU MARUDHARKESARI BUILDING, JYOTHI NAGAR, RAJIV GANDHI SALAI, OMR THORAIPAKKAM, CHENNAI - 600 097.
VISION
To produce high-quality, creative and ethical engineers and technologists
contributing effectively to the ever-advancing field of Artificial Intelligence
and Data Science.
MISSION
To educate future software engineers with strong fundamentals by
continuously improving the teaching-learning methodologies using
contemporary aids.
To produce ethical engineers/researchers by instilling the values of
humility, humaneness, honesty and courage to serve the society.
To create a knowledge hub of Artificial Intelligence and Data Science
with an everlasting urge to learn, by developing, maintaining and continuously
improving the resources.
Register No:
BONAFIDE CERTIFICATE
DATE:
COURSE OUTCOMES
S.NO.    TOPIC    DATE    PAGE NO.    SIGNATURE
SYLLABUS
AL3461 MACHINE LEARNING LABORATORY
COURSE OBJECTIVES
➢ To gain practical knowledge of implementing machine learning algorithms for solving real-time problems.
➢ To implement supervised learning algorithms and their applications.
➢ To understand unsupervised learning techniques such as clustering and the EM algorithm.
➢ To understand the theoretical and practical aspects of probabilistic graphical models.
Tools: Python, NumPy, SciPy, Matplotlib, Pandas, statsmodels, seaborn, plotly, bokeh
Suggested Exercises:
1. For a given set of training data examples stored in a .CSV file, implement and demonstrate the
Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent
with the training examples.
2. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the
same using appropriate data sets.
3. Write a program to implement the naïve Bayesian classifier for a sample training data set stored
as a .CSV file and compute the accuracy with a few test data sets.
4. Implement naïve Bayesian Classifier model to classify a set of documents and measure the
accuracy, precision, and recall.
5. Write a program to construct a Bayesian network considering medical data. Use this model to
demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set.
6. Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for
clustering using the k-Means algorithm. Compare the results of these two algorithms.
7. Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set. Print
both correct and wrong predictions.
8. Write a program to implement a Decision Tree classification model.
9. Implement the logistic regression algorithm with a dataset and measure the accuracy score and
confusion matrix.
10. Implement the linear regression algorithm with a dataset and measure the accuracy score.
Ex.No: 1
Date :
For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.
AIM:
To implement and demonstrate the Candidate-Elimination algorithm to output a
description of the set of all hypotheses consistent with the training examples.
DATASET: trainingdata1.xlsx
Link : https://docs.google.com/spreadsheets/d/1X-SG3qz2zCkWvGXMQ2GlsP76zvf7hMD4/edit?usp=share_link&ouid=107168863405783275058&rtpof=true&sd=true
ALGORITHM:
PROGRAM/SOURCE CODE:
import numpy as np
import pandas as pd
data = pd.DataFrame(data=pd.read_excel('trainingdata1.xlsx'))
print(data)
Origin Manufacturer color Decade Type Example Type
0 Japan Honda blue 1980 economy positive
1 Japan Toyota green 1970 sports positive
2 Japan Toyota blue 1990 economy negative
3 USA Chrysler red 1980 economy positive
4 Japan Honda white 1980 economy positive
concepts = np.array(data.iloc[:,0:-1])
target = np.array(data.iloc[:,-1])
print("concept:",concepts)
print("target:",target)
OUTPUT :
Initialization of specific_h and general_h
specific_h: ['Japan' 'Honda' 'blue' 1980 'economy']
general_h: [['?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?'],
['?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?'], ['?', '?',
'?', '?', '?']]
concepts: [['Japan' 'Honda' 'blue' 1980 'economy']
['Japan' 'Toyota' 'green' 1970 'sports']
['Japan' 'Toyota' 'blue' 1990 'economy']
['USA' 'Chrysler' 'red' 1980 'economy']
['Japan' 'Honda' 'white' 1980 'economy']]
general_h : 5
[['?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?'], ['?', '?', '?',
'?', '?'], ['?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?']]
Indices []
Final Specific_h:
['Japan' 'Honda' 'blue' 1980 'economy']
RESULT:
Thus, the program to implement and demonstrate the Candidate-Elimination algorithm, outputting a description of
the set of all hypotheses consistent with the training examples, using Python has been executed successfully.
Ex.No: 2
Date :
Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.
AIM:
To build an Artificial Neural Network by implementing the Backpropagation algorithm and
test the same using appropriate data sets.
ALGORITHM:
PROGRAM/SOURCE CODE:
import numpy as np
X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X/np.amax(X, axis=0)  # normalise X column-wise by its maximum
y = y/100                 # scale target values into the 0-1 range
#Sigmoid Function
def sigmoid(x):
    return 1/(1 + np.exp(-x))
#Derivative of Sigmoid Function
def derivatives_sigmoid(x):
    return x * (1 - x)
#Variable initialization
epoch=7000 #Setting training iterations
lr=0.1 #Setting learning rate
inputlayer_neurons = 2 #number of features in data set
hiddenlayer_neurons = 3 #number of hidden layer neurons
output_neurons = 1 #number of neurons at output layer
#weight and bias initialization
wh=np.random.uniform(size=(inputlayer_neurons,hiddenlayer_neurons))
bh=np.random.uniform(size=(1,hiddenlayer_neurons))
wout=np.random.uniform(size=(hiddenlayer_neurons,output_neurons))
bout=np.random.uniform(size=(1,output_neurons))
#Forward Propagation
for i in range(epoch):
    hinp1=np.dot(X,wh)
    hinp=hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1=np.dot(hlayer_act,wout)
    outinp= outinp1+ bout
    output = sigmoid(outinp)
    #Backpropagation
    EO = y-output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)
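    # NOTE: the printed listing is truncated at this point. A standard completion
    # (an assumption, not part of the distributed source) updates the weights and
    # biases and then prints the results shown in the OUTPUT section:
    d_hiddenlayer = EH * hiddengrad
    wout += hlayer_act.T.dot(d_output) * lr   # hidden-to-output weight update
    wh += X.T.dot(d_hiddenlayer) * lr         # input-to-hidden weight update
print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)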
OUTPUT:
Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[92.]
[86.]
[89.]]
Predicted Output:
[[0.99999908]
[0.99999712]
[0.99999904]]
RESULT:
Thus, the program to Build an Artificial Neural Network by implementing the Backpropagation
algorithm using python has been executed successfully.
Ex.No: 3
Date :
Write a program to implement the naïve Bayesian classifier for a sample training dataset stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.
AIM:
To implement the naïve Bayesian classifier for a sample training dataset stored as a .CSV
file and compute the accuracy with a few test data sets.
DATASET: pima_indian.csv
LINK: https://drive.google.com/file/d/18PcjOtDELvR8wY4-iCiAXm1wNuox67a7/view?usp=share_link
ALGORITHM:
Step 1: Import the required libraries (pandas, train_test_split, GaussianNB, metrics).
Step 2: Load the dataset and split it into training and testing datasets using `train_test_split`.
Step 3: Train the Naive Bayes classifier on the training data using `GaussianNB().fit(xtrain, ytrain.ravel())`.
Step 4: Predict the class labels for the testing data using `clf.predict(xtest)`.
Step 5: Compute and print the confusion matrix and accuracy using `metrics`.
PROGRAM/SOURCE CODE:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn import metrics
df = pd.read_csv("pima_indian.csv")
feature_col_names = ['num_preg', 'glucose_conc', 'diastolic_bp', 'thickness', 'insulin', 'bmi', 'diab_pred', 'age']
predicted_class_names = ['diabetes']
X = df[feature_col_names].values # these are factors for the prediction
y = df[predicted_class_names].values # this is what we want to predict
xtrain,xtest,ytrain,ytest=train_test_split(X,y,test_size=0.33)
print ('\n the total number of Training Data :',ytrain.shape)
print ('\n the total number of Test Data :',ytest.shape)
clf = GaussianNB().fit(xtrain,ytrain.ravel())
predicted = clf.predict(xtest)
predictTestData= clf.predict([[6,148,72,35,0,33.6,0.627,50]])
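The listing stops after the individual prediction; a minimal sketch of the metric computations behind the confusion matrix and accuracy shown in the output below (using the metrics module imported above) could be:
print('\nConfusion matrix')
print(metrics.confusion_matrix(ytest, predicted))
print('\nAccuracy of the classifier is', metrics.accuracy_score(ytest, predicted))
print('\nPredicted value for the individual test data:', predictTestData)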
OUTPUT:
the total number of Training Data : (514, 1)
Confusion matrix
[[148 23]
[ 35 48]]
Accuracy of the classifier is 0.7716535433070866
RESULT:
Thus, the program to implement the naïve Bayesian classifier for a sample training dataset stored as a
.CSV file using Python has been executed successfully.
Ex.No: 4
Date :
Implement a naïve Bayesian classifier model to classify a set of documents and measure the accuracy, precision, and recall.
AIM:
Implement naïve Bayesian Classifier model to classify a set of documents and
measure the accuracy, precision, and recall
DATASET: naivetext.csv
LINK: https://drive.google.com/file/d/1sEpbtiB9qP6DdpqlvbB_6OL8aBsJqe8s/view?usp=share_link
ALGORITHM:
Step 6: Use the trained classifier to make predictions on the test data.
Step 7: Calculate and print the accuracy, confusion matrix, precision, and recall.
PROGRAM/SOURCE CODE:
import pandas as pd
msg=pd.read_csv('naivetext.csv',names=['message','label'])
print('The dimensions of the dataset',msg.shape)
msg['labelnum']=msg.label.map({'pos':1,'neg':0})
X=msg.message
y=msg.labelnum
print(X)
print(y)
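The distributed listing stops after printing the messages; a minimal sketch of the remaining steps (a CountVectorizer bag-of-words representation, a MultinomialNB classifier, and the accuracy, precision and recall named in the AIM) could be as follows; the default train/test split proportion is an assumption.
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics
xtrain, xtest, ytrain, ytest = train_test_split(X, y)   # default 75/25 split (assumption)
count_vect = CountVectorizer()                          # bag-of-words document-term matrix
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm = count_vect.transform(xtest)
clf = MultinomialNB().fit(xtrain_dtm, ytrain)
predicted = clf.predict(xtest_dtm)
print('Accuracy:', metrics.accuracy_score(ytest, predicted))
print('Confusion matrix\n', metrics.confusion_matrix(ytest, predicted))
print('Precision:', metrics.precision_score(ytest, predicted))
print('Recall:', metrics.recall_score(ytest, predicted))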
OUTPUT:
The dimensions of the dataset (18, 2)
0 I love this sandwich
1 This is an amazing place
2 I feel very good about these beers
3 This is my best work
4 What an awesome view
5 I do not like this restaurant
6 I am tired of this stuff
7 I can't deal with this
8 He is my sworn enemy
9 My boss is horrible
10 This is an awesome place
11 I do not like the taste of this juice
12 I love to dance
13 I am sick and tired of this place
14 What a great holiday
15 That is a bad locality to stay
16 We will have good fun tomorrow
17 I went to my enemy's house today
Confusion matrix
[[2 0]
[0 3]]
RESULT:
Thus, the program to implement the naïve Bayesian classifier model to classify a set of documents and
measure the accuracy, precision, and recall using Python has been executed successfully.
Ex.No: 5
Date :
Write a program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set.
AIM:
To construct a Bayesian network considering medical data. Use this model to demonstrate
the diagnosis of heart patients using standard Heart Disease Data Set.
DATASET: heart.csv
LINK: https://drive.google.com/file/d/10C80zeowRWEGazpPZw_n0wK4f_rRlRbL/view?usp=share_link
ALGORITHM:
Step 6: Learn CPDs of the model from the dataset using MLE.
Step 7: Perform inference with the Bayesian network using 'VariableElimination' class.
Step 8: Compute and print the probabilities of heart disease given evidence of 'restecg=1'
and 'cp=2' using the 'query' method of 'VariableElimination' object.
PROGRAM/SOURCE CODE:
import numpy as np
import pandas as pd
import csv
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.models import BayesianModel
from pgmpy.inference import VariableElimination
heartDisease = pd.read_csv('heart.csv')
heartDisease = heartDisease.replace('?',np.nan)
model = BayesianModel([('age','heartdisease'),('gender','heartdisease'),('exang','heartdisease'),
                       ('cp','heartdisease'),('heartdisease','restecg'),('heartdisease','chol')])
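The listing is truncated here; a minimal sketch of Steps 6-8 of the algorithm (learning the CPDs by Maximum Likelihood Estimation and querying the network with VariableElimination) could be:
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)
HeartDisease_infer = VariableElimination(model)
# Probability of heart disease given the evidence restecg = 1
print(HeartDisease_infer.query(variables=['heartdisease'], evidence={'restecg': 1}))
# Probability of heart disease given the evidence cp = 2
print(HeartDisease_infer.query(variables=['heartdisease'], evidence={'cp': 2}))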
RESULT:
Thus, the program to construct a Bayesian network from medical data and demonstrate the diagnosis of
heart patients using the standard Heart Disease Data Set has been executed successfully.
Ex.No: 6
Date :
Apply the EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms.
AIM:
To apply the EM algorithm to cluster a set of data stored in a .CSV file, use the same data set
for clustering with the k-Means algorithm, and compare the results of these two algorithms.
DATASET: iris.csv
LINK: https://drive.google.com/file/d/1-lseekjQ6h1xHKETLYlm7a_IfZ-3sAOY/view?usp=share_link
ALGORITHM:
Step 1: Import the required libraries (pandas, numpy, matplotlib, KMeans, GaussianMixture).
Step 2: Read the dataset from the given CSV file into a pandas dataframe.
Step 3: Extract the input features from the dataset and store them in a new dataframe X.
Step 4: Create a KMeans model with three clusters and fit it to the input data X.
Step 5: Create a Gaussian Mixture model with three components and fit it to the input data X.
Step 6: Print the accuracy score and confusion matrix of both models.
PROGRAM/SOURCE CODE:
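The printed listing starts with an undefined dataset variable; a minimal setup sketch (imports, loading iris.csv, and integer class codes y used only to colour the 'Real' plot; the column layout of iris.csv is an assumption) could be:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
dataset = pd.read_csv('iris.csv')
# Encode the species column (assumed to be the last column) as codes 0/1/2
y = dataset.iloc[:, -1].astype('category').cat.codes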
X = dataset.iloc[:, :-1]
plt.figure(figsize=(14,7))
colormap=np.array(['red','lime','black'])
# REAL PLOT
plt.subplot(1,3,1)
plt.title('Real')
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y])
# K-PLOT
model=KMeans(n_clusters=3, random_state=0).fit(X)
plt.subplot(1,3,2)
plt.title('KMeans')
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[model.labels_])
#GMM PLOT
gmm=GaussianMixture(n_components=3, random_state=0).fit(X)
y_cluster_gmm=gmm.predict(X)
plt.subplot(1,3,3)
plt.title('GMM Classification')
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y_cluster_gmm])
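Step 6 of the algorithm asks for the accuracy score and confusion matrix of both models; a minimal sketch (using the class codes y defined above, and noting that cluster numbers may be permuted relative to the true labels) could be:
from sklearn import metrics
print('KMeans accuracy:', metrics.accuracy_score(y, model.labels_))
print('KMeans confusion matrix:\n', metrics.confusion_matrix(y, model.labels_))
print('GMM accuracy:', metrics.accuracy_score(y, y_cluster_gmm))
print('GMM confusion matrix:\n', metrics.confusion_matrix(y, y_cluster_gmm))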
RESULT:
Thus, the program to apply the EM algorithm and the k-Means algorithm to cluster a set of data stored in a
.CSV file and compare their results has been executed successfully.
Ex.No: 7
Date :
Write a program to implement the k-Nearest Neighbour algorithm to classify the iris dataset. Print both correct and wrong predictions.
AIM:
To implement the k-Nearest Neighbour algorithm to classify the iris data set and print both
correct and wrong predictions.
DATASET: iris.csv
LINK: https://drive.google.com/file/d/1vVwpEuIyb-r3uVtrvbZxBMDsX5wpq-ml/view?usp=share_link
ALGORITHM:
Step 1: Load the Iris dataset from a CSV file into a pandas dataframe.
Step 2: Split the dataset into the input features X and the output class y.
Step 3: Split the data into training and testing sets using train_test_split.
Step 4: Initialize a KNN classifier with n_neighbors set to 5 and fit it to the training data.
Step 5: Predict the class labels for the testing set using the KNN classifier.
Step 6: Calculate and print the confusion matrix, classification report, and accuracy score of the KNN classifier.
PROGRAM/SOURCE CODE:
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics
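# NOTE: the distributed listing omits the data-loading and training steps; the lines
# below reconstruct them from Steps 1-4 of the algorithm (the test split size is an assumption).
dataset = pd.read_csv('iris.csv')
X = dataset.iloc[:, :-1]    # input features
y = dataset.iloc[:, -1]     # class labels
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.3)
classifier = KNeighborsClassifier(n_neighbors=5).fit(Xtrain, ytrain)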
ypred = classifier.predict(Xtest)
i=0
print ("\n-------------------------------------------------------------------------")
print ('%-25s %-25s %-25s' % ('Original Label', 'Predicted Label', 'Correct/Wrong'))
print ("-------------------------------------------------------------------------")
for label in ytest:
    print('%-25s %-25s' % (label, ypred[i]), end="")
    if label == ypred[i]:
        print(' %-25s' % ('Correct'))
    else:
        print(' %-25s' % ('Wrong'))
    i = i + 1
print ("-------------------------------------------------------------------------")
print("\nConfusion Matrix:\n",metrics.confusion_matrix(ytest, ypred))
print ("-------------------------------------------------------------------------")
print("\nClassification Report:\n",metrics.classification_report(ytest, ypred))
print ("-------------------------------------------------------------------------")
print('Accuracy of the classifier is %0.2f' % metrics.accuracy_score(ytest, ypred))
print ("-------------------------------------------------------------------------")
RESULT:
Thus, the program to implement k-Nearest Neighbour algorithm to classify the iris
data set using Python has been executed successfully.
Ex.No: 8
Date :
Write a program to implement a Decision Tree classification model using a .CSV file and measure the accuracy.
AIM:
To implement a Decision Tree classification model using a .CSV file and measure the
accuracy.
DATASET: data_cleaned.csv
LINK: https://drive.google.com/file/d/1VbrVnGcblK7e2PXVQlMkUTtxTkaVVYBW/view?usp=share_link
ALGORITHM:
Step 1: Load the cleaned data and split it into training and validation sets.
Step 7: Iterate through different values of max_depth to generate train and validation accuracy scores.
Step 8: Visualize the train and validation accuracy scores using a line graph.
Step 9: Create a decision tree classifier with max_depth of 8 and max_leaf_nodes of 25.
Step 10: Fit the classifier to the training set and evaluate accuracy on the training and validation sets.
Step 11: Use graphviz to create a visualization of the decision tree.
PROGRAM/SOURCE CODE:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
data = pd.read_csv('data_cleaned.csv')
print(data.shape)
data.isnull().sum()
y = data['Survived']
X = data.drop(['Survived'], axis=1)
from sklearn.model_selection import train_test_split
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=101, stratify=y, test_size=0.25)
y_train.value_counts(normalize=True)
y_valid.value_counts(normalize=True)
X_train.shape, y_train.shape
X_valid.shape, y_valid.shape
from sklearn.tree import DecisionTreeClassifier
dt_model = DecisionTreeClassifier(random_state=10)
dt_model.fit(X_train, y_train)
dt_model.score(X_train, y_train)
dt_model.score(X_valid, y_valid)
dt_model.predict(X_valid)
dt_model.predict_proba(X_valid)
y_pred = dt_model.predict_proba(X_valid)[:, 1]
y_new = []
for i in range(len(y_pred)):
    if y_pred[i] <= 0.7:
        y_new.append(0)
    else:
        y_new.append(1)
from sklearn.metrics import accuracy_score
accuracy_score(y_valid, y_new)
train_accuracy = []
validation_accuracy = []
for depth in range(1, 30):
    dt_model = DecisionTreeClassifier(max_depth=depth, random_state=10)
    dt_model.fit(X_train, y_train)
    train_accuracy.append(dt_model.score(X_train, y_train))
    validation_accuracy.append(dt_model.score(X_valid, y_valid))
frame = pd.DataFrame({'max_depth': range(1, 30), 'train_acc': train_accuracy, 'valid_acc': validation_accuracy})
frame.head(15)
plt.figure(figsize=(12, 6))
plt.plot(frame['max_depth'], frame['train_acc'], marker='o', label='train_acc')
plt.plot(frame['max_depth'], frame['valid_acc'], marker='o', label='valid_acc')
plt.xlabel('Depth of tree')
plt.ylabel('performance')
plt.legend()
dt_model = DecisionTreeClassifier(max_depth=8, max_leaf_nodes=25, random_state=10)
dt_model
dt_model.fit(X_train, y_train)
dt_model.score(X_train, y_train)
dt_model.score(X_valid, y_valid)
from sklearn import tree
!pip install graphviz
decision_tree = tree.export_graphviz(dt_model, out_file='tree.dot', feature_names=X_train.columns, max_depth=2, filled=True)
!dot -Tpng tree.dot -o tree.jpg
image = plt.imread('tree.jpg')
plt.figure(figsize=(15, 15))
plt.imshow(image)
OUTPUT:
OUTPUT:
RESULT:
Thus, the program to implement a Decision Tree classification model
using a .CSV file and measure the accuracy using Python has been executed successfully.
Ex.No: 9
Date :
Implement the logistic regression algorithm with a dataset and measure the accuracy score and confusion matrix.
AIM:
To implement the logistic regression algorithm with a dataset and measure the accuracy
score and confusion matrix.
DATASET: summa.csv
LINK: https://drive.google.com/file/d/1w8C2PmuZkDOuVEhIwTdBb3LJMW7HIJ1R/view?usp=share_link
ALGORITHM:
Step 1: Import necessary libraries and load the dataset using pandas.read_csv().
Step 2: Extract the 'temp' and 'label' columns and reshape them.
Step 3: Create a scatter plot with a logistic regression line using seaborn.regplot().
Step 4: Split the data into training and testing sets using train_test_split().
Step 5: Initialize a LogisticRegression model object and fit the training data.
Step 6: Predict the y values for the testing data and calculate the accuracy score using accuracy_score().
Step 7: Generate a confusion matrix to evaluate the model's performance on the entire dataset.
PROGRAM/SOURCE CODE:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score,confusion_matrix
df=pd.read_csv('summa.csv')
x=df['temp'].values
y=df['label'].values
x=x.reshape(-1,1)
y=y.reshape(-1,1)
plt.scatter(x,y)
sns.regplot(x=x,
y=y,
data=df,
logistic=True,
line_kws={'color':'black'},
scatter_kws={'color':'green'})
xtrain,xtest,ytrain,ytest=train_test_split(x,y,test_size=0.2)
lr=LogisticRegression()
lr.fit(xtrain,ytrain)
print("confusion matrix:","\n",confusion_matrix(y,lr.predict(x)))
RESULT:
Thus, the program to implement the logistic regression algorithm with a dataset and
measure the accuracy score and confusion matrix using Python has been executed
successfully.
Ex.No: 10
Date :
Implement the linear regression algorithm with a dataset and measure the accuracy score.
AIM:
To implement the linear regression algorithm with a dataset and measure the accuracy score.
LINK: https://drive.google.com/file/d/1zSPRRkSxqLPYWsQTCQHjAqUfNv14zrqU/view?usp=share_link
ALGORITHM:
Step 1: Import the necessary libraries and mount Google Drive to access the dataset using drive.mount().
Step 2: Load the 'linear_data.csv' dataset using pandas.read_csv() function and store it in a
DataFrame variable called df.
Step 3: Extract the 'x' and 'y' columns and reshape them.
Step 4: Split the data into training and testing sets using train_test_split() with a test size of
0.25.
Step 5: Initialize a LinearRegression model object and fit the training data.
Step 6: Predict the y values for the testing data using lr.predict().
Step 7: Create a scatter plot of the 'x' and 'y' data with a linear regression line using
matplotlib.pyplot.scatter() and matplotlib.pyplot.plot().
Step 8: Calculate the R-squared score of the model using r2_score() function by comparing
the predicted y values with the actual y values in the testing set.
PROGRAM/SOURCE CODE:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
df = pd.read_csv('/linear_data.csv')
x=df['x'].values
y=df['y'].values
x=x.reshape(-1,1)
y=y.reshape(-1,1)
xtrain,xtest,ytrain,ytest=train_test_split(x,y,test_size=0.25)
lr=LinearRegression()
lr.fit(xtrain,ytrain)
y_pred=lr.predict(xtest)
plt.scatter(x,y,c='blue')
plt.plot(xtest,y_pred)
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
accuracy = r2_score(ytest, y_pred)
print("R2_score:", accuracy)
RESULT:
Thus, the program to implement the linear regression algorithm with a
dataset and measure the accuracy score using Python has been executed successfully.