
MACHINE LEARNING

Date:05-09-2024

EXPERIMENT-05
Aim: To predict house prices using linear regression and evaluate model performance with MAE, MSE, and RMSE.

Program:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
import pandas as pd

#load the dataset


USAhousing=pd.read_csv('C:/22261A6630/USA_Housing (1).csv')

#Display the first few rows of the dataset


print(USAhousing.head())

#define the features (independent variables) and target (dependent variable)


#X: feature matrix (multiple independent variables)
X=USAhousing[['Avg. Area Income','Avg. Area House Age','Avg. Area Number of Rooms',
              'Avg. Area Number of Bedrooms','Area Population']]

#y: target variable (dependent variable)


y=USAhousing['Price']

#Split the data into training and testing sets


X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.3,random_state=0)

#initialize the linear regression model


lm= LinearRegression()

#train the model on the training data


lm.fit(X_train,y_train)

#use the trained model to make the predictions on the test data
predictions=lm.predict(X_test)

#visualise the predictions vs the actual values


plt.scatter(y_test,predictions)
plt.xlabel('actual prices')
plt.ylabel('predicted prices')
plt.title('actual vs predicted prices')
plt.show()

#evaluate the model using various error metrics


print('MAE(Mean Absolute Error):',metrics.mean_absolute_error(y_test,predictions))
print('MSE(Mean Squared Error):',metrics.mean_squared_error(y_test,predictions))
print('RMSE(Root Mean Squared Error):',np.sqrt(metrics.mean_squared_error(y_test,predictions)))

OUTPUT:

Avg. Area Income Avg. Area House Age Avg. Area Number of Rooms \
0 79545.458574 5.682861 7.009188
1 79248.642455 6.002900 6.730821
2 61287.067179 5.865890 8.512727
3 63345.240046 7.188236 5.586729
4 59982.197226 5.040555 7.839388

Avg. Area Number of Bedrooms Area Population Price \
0 4.09 23086.800503 1.059034e+06
1 3.09 40173.072174 1.505891e+06
2 5.13 36882.159400 1.058988e+06
3 3.26 34310.242831 1.260617e+06
4 4.23 26354.109472 6.309435e+05

Address
0 208 Michael Ferry Apt. 674\nLaurabury, NE 3701...
1 188 Johnson Views Suite 079\nLake Kathleen, CA...
2 9127 Elizabeth Stravenue\nDanieltown, WI 06482...
3 USS Barnett\nFPO AP 44820
4 USNS Raymond\nFPO AE 09386

MAE(Mean Absolute Error): 81563.14733994054
MSE(Mean Squared Error): 10337337828.267305
RMSE(Root Mean Squared Error): 101672.69952286752
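
Note: the same three metrics can be checked by hand with NumPy from the y_test and predictions arrays produced above; the short sketch below is an optional cross-check, not part of the original program, and should reproduce sklearn's values.

import numpy as np

#assumes y_test and predictions come from the program above
errors=np.asarray(y_test)-np.asarray(predictions)
mae=np.mean(np.abs(errors))    #mean absolute error
mse=np.mean(errors**2)         #mean squared error
rmse=np.sqrt(mse)              #root mean squared error
print('MAE :',mae)
print('MSE :',mse)
print('RMSE:',rmse)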


Aim: To predict tennis play outcomes using a decision tree classifier and evaluate model performance with accuracy, precision, recall, and F1-score.

Program:
import numpy as np
import pandas as pd
from sklearn import metrics
#load the dataset
df=pd.read_csv('C:/22261A6630/play_tennis.csv')
value=['outlook','temp','humidity','wind']
#explore the dataset
df
len(df)
df.shape
df.head()
df.tail()
df.describe()
from sklearn import preprocessing
#encode the categorical string columns as integers
string_to_int = preprocessing.LabelEncoder()
df=df.apply(string_to_int.fit_transform)
feature_cols=['outlook','temp','humidity','wind']
X=df[feature_cols]
y=df.play
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.30)
#Perform training
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion='entropy',random_state=100)
classifier.fit(X_train,y_train)
y_pred=classifier.predict(X_test)
from sklearn.metrics import accuracy_score
print("accuracy:",metrics.accuracy_score(y_test,y_pred))
data_p=pd.DataFrame({'Actual':y_test,'predicted':y_pred})
data_p
from sklearn.metrics import classification_report,confusion_matrix
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))

OUTPUT:
accuracy: 0.6
[[1 2]
[0 2]]
precision recall f1-score support

0 1.00 0.33 0.50 3
1 0.50 1.00 0.67 2

accuracy 0.60 5
macro avg 0.75 0.67 0.58 5
weighted avg 0.80 0.60 0.57 5
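
Note: because this train_test_split uses no random_state, the exact numbers vary between runs. For the run shown, the per-class values in the report follow directly from the confusion matrix (rows are actual classes, columns are predicted classes); the sketch below is an optional cross-check assuming the same y_test and y_pred from the program above.

from sklearn.metrics import precision_score,recall_score,f1_score

#confusion matrix [[1 2],[0 2]]: class 0 has 1 correct and 2 predicted as class 1,
#class 1 has 2 correct and 0 predicted as class 0,
#so for class 1: precision = 2/(2+2) = 0.50 and recall = 2/2 = 1.00
print("precision:",precision_score(y_test,y_pred))
print("recall:",recall_score(y_test,y_pred))
print("f1-score:",f1_score(y_test,y_pred))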

MACHINE LEARNING
Date:12-09-2024

EXPERIMENT-06

Aim: To tune hyperparameters to find the best decision tree model, evaluate its performance, and visualize the best tree along with its classification metrics.

Program:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier,plot_tree
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report,accuracy_score,confusion_matrix
import matplotlib.pyplot as plt
df=pd.read_csv('C:/Users/MGIT/OneDrive/Desktop/22261A6621/archive (2)/play_tennis.csv')
#encode every column as integers, keeping one encoder per column
label_encoders={}
for column in df.columns:
    le=LabelEncoder()
    df[column]=le.fit_transform(df[column])
    label_encoders[column]=le
X=df.drop('play',axis=1)
y=df['play']

X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.33,random_state=42)
#candidate hyperparameter combinations to try
hyperparameters=[
{'criterion':'gini','max_depth':None,'min_samples_split':2,'min_samples_leaf':1},
{'criterion':'entropy','max_depth':4,'min_samples_split':2,'min_samples_leaf':1},
{'criterion':'gini','max_depth':6,'min_samples_split':5,'min_samples_leaf':2},
{'criterion':'entropy','max_depth':8,'min_samples_split':10,'min_samples_leaf':4},
]
best_accuracy=0
best_params=None
best_tree=None
#train and evaluate a tree for each hyperparameter combination
for params in hyperparameters:
    tree=DecisionTreeClassifier(**params,random_state=42)
    tree.fit(X_train,y_train)
    y_pred=tree.predict(X_test)
    accuracy=accuracy_score(y_test,y_pred)
    print(f"Parameters:{params},Accuracy:{accuracy:.4f}")
    if accuracy>best_accuracy:
        best_accuracy=accuracy
        best_params=params
        best_tree=tree
print(f"\nBest Parameters:{best_params},Best Accuracy:{best_accuracy:.4f}")
plt.figure(figsize=(12,8))
plot_tree(best_tree,filled=True,feature_names=list(X.columns),class_names=list(label_encoders['play'].classes_),rounded=True)
plt.title('Best Decision Tree')
plt.show()
y_pred_best=best_tree.predict(X_test)
print("Best Decision Tree - Classification Report:")
print(classification_report(y_test,y_pred_best,target_names=label_encoders['play'].classes_))
print("Best Decision Tree - Confusion Matrix:")
print(confusion_matrix(y_test,y_pred_best))
print("Best Decision Tree - Accuracy Score:")
print(accuracy_score(y_test,y_pred_best))

OUTPUT:

Parameters:{'criterion': 'gini', 'max_depth': None, 'min_samples_split': 2, 'min_samples_leaf': 1},Accuracy:0.6000
Parameters:{'criterion': 'entropy', 'max_depth': 4, 'min_samples_split': 2, 'min_samples_leaf': 1},Accuracy:0.6000
Parameters:{'criterion': 'gini', 'max_depth': 6, 'min_samples_split': 5, 'min_samples_leaf': 2},Accuracy:0.8000
Parameters:{'criterion': 'entropy', 'max_depth': 8, 'min_samples_split': 10, 'min_samples_leaf': 4},Accuracy:0.6000

Best Parameters:{'criterion': 'gini', 'max_depth': 6, 'min_samples_split': 5, 'min_samples_leaf': 2},Best Accuracy:0.8000

Best Decision Tree - Classification Report:


precision recall f1-score support

No 1.00 0.50 0.67 2
Yes 0.75 1.00 0.86 3

accuracy 0.80 5
macro avg 0.88 0.75 0.76 5
weighted avg 0.85 0.80 0.78 5

Best Decision Tree - Confusion Matrix:


[[1 1]
[0 3]]
Best Decision Tree - Accuracy Score:
0.8
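
Note: the manual loop above can also be written with scikit-learn's GridSearchCV, which tries every combination in a parameter grid using cross-validation instead of a single train/test split. The sketch below is an optional variant assuming the same X and y from the program; the grid values mirror the hyperparameter list used above.

from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

param_grid={
    'criterion':['gini','entropy'],
    'max_depth':[None,4,6,8],
    'min_samples_split':[2,5,10],
    'min_samples_leaf':[1,2,4],
}
#3-fold cross-validated search over every combination in param_grid
search=GridSearchCV(DecisionTreeClassifier(random_state=42),param_grid,scoring='accuracy',cv=3)
search.fit(X,y)
print("Best Parameters:",search.best_params_)
print("Best CV Accuracy:",search.best_score_)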


Aim: To visualize a scatter plot of data points with X and Y coordinates, colored by class labels.

Program:

import matplotlib.pyplot as plt


#sample data points and their class labels
X=[4,5,10,4,3,11,14,8,10,12]
Y=[21,19,24,17,16,25,24,22,21,21]
classes=[0,0,1,0,0,1,1,0,1,1]
#colour each point by its class label
plt.scatter(X,Y,c=classes)
plt.show()
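
If axis labels and a class legend are wanted, the program can be extended as in the sketch below (an optional variant of the same plot; the label text is illustrative).

import matplotlib.pyplot as plt

X=[4,5,10,4,3,11,14,8,10,12]
Y=[21,19,24,17,16,25,24,22,21,21]
classes=[0,0,1,0,0,1,1,0,1,1]
#colour points by class and build a legend from the class values
scatter=plt.scatter(X,Y,c=classes)
plt.xlabel('X')
plt.ylabel('Y')
plt.legend(*scatter.legend_elements(),title='class')
plt.show()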

OUTPUT:
