ML Exp-5,6
Date:05-09-2024
EXPERIMENT-05
Aim: To predict house prices using linear regression and evaluate model performance with MAE, MSE, and RMSE.
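The three error metrics named in the aim can be computed directly from their definitions. The sketch below is illustrative only: the function name `regression_errors` and the sample prices are assumptions, not values from this experiment.

```python
import math


def regression_errors(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / len(residuals)   # mean absolute error
    mse = sum(r * r for r in residuals) / len(residuals)    # mean squared error
    rmse = math.sqrt(mse)                                   # root mean squared error
    return mae, mse, rmse


# Hypothetical actual and predicted prices (made up for illustration)
actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 195.0, 265.0]
mae, mse, rmse = regression_errors(actual, predicted)
print(f"MAE: {mae:.2f}  MSE: {mse:.2f}  RMSE: {rmse:.2f}")
```

RMSE is in the same units as the target (here, price), which usually makes it easier to interpret than MSE.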
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
# Use the trained model to make predictions on the test data
# (lm is the fitted LinearRegression model, X_test the held-out features)
predictions = lm.predict(X_test)
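The listing goes from the imports straight to the prediction step, so the end-to-end flow may be easier to follow in a self-contained sketch. The one below assumes synthetic data in place of the housing CSV; the feature values, coefficients, `test_size`, and `random_state` are illustrative assumptions, not the settings used in this experiment.

```python
import numpy as np
from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing data: 200 rows, 3 numeric features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_coef = np.array([3.0, -2.0, 0.5])  # assumed coefficients, for illustration only
y = X @ true_coef + 10.0 + rng.normal(scale=0.1, size=200)

# Hold out 30% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit the model on the training rows, then predict the held-out rows
lm = LinearRegression()
lm.fit(X_train, y_train)
predictions = lm.predict(X_test)

# Evaluate with the three metrics named in the aim
mae = metrics.mean_absolute_error(y_test, predictions)
mse = metrics.mean_squared_error(y_test, predictions)
rmse = np.sqrt(mse)
print("MAE:", mae)
print("MSE:", mse)
print("RMSE:", rmse)
```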
22261A6623 1
MACHINE LEARNING
OUTPUT:
   Avg. Area Income  Avg. Area House Age  Avg. Area Number of Rooms  \
0      79545.458574             5.682861                   7.009188
1      79248.642455             6.002900                   6.730821
2      61287.067179             5.865890                   8.512727
3      63345.240046             7.188236                   5.586729
4      59982.197226             5.040555                   7.839388

                                             Address
0  208 Michael Ferry Apt. 674\nLaurabury, NE 3701...
1  188 Johnson Views Suite 079\nLake Kathleen, CA...
2  9127 Elizabeth Stravenue\nDanieltown, WI 06482...
3                          USS Barnett\nFPO AP 44820
4                         USNS Raymond\nFPO AE 09386
Aim: To predict tennis play outcomes using a decision tree classifier and evaluate model performance with accuracy, precision, recall, and F1-score.
Program:
import numpy as np
import pandas as pd
from sklearn import metrics, preprocessing
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, confusion_matrix

df = pd.read_csv('C:/22261A6630/play_tennis.csv')
# Inspect the data
df
len(df)
df.shape
df.head()
df.tail()
df.describe()
# Encode every string column as integer labels
string_to_int = preprocessing.LabelEncoder()
df = df.apply(string_to_int.fit_transform)
feature_cols = ['outlook', 'temp', 'humidity', 'wind']
X = df[feature_cols]
y = df.play
# Hold out 30% of the rows for testing (no random_state, so the split varies between runs)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30)
# Train the decision tree with entropy as the split criterion
classifier = DecisionTreeClassifier(criterion='entropy', random_state=100)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
print("accuracy:", metrics.accuracy_score(y_test, y_pred))
# Compare actual and predicted labels side by side
data_p = pd.DataFrame({'Actual': y_test, 'predicted': y_pred})
data_p
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
OUTPUT:
accuracy: 0.6
[[1 2]
 [0 2]]
              precision    recall  f1-score   support
    accuracy                           0.60         5
   macro avg       0.75      0.67      0.58         5
weighted avg       0.80      0.60      0.57         5
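The per-class figures behind the averaged rows can be recovered from the confusion matrix printed in this output. The sketch below assumes scikit-learn's convention that rows are actual classes and columns are predicted classes; the helper name `per_class_metrics` is hypothetical.

```python
def per_class_metrics(cm):
    """Return a list of (precision, recall, f1, support), one tuple per class.

    cm[i][j] counts samples whose actual class is i and predicted class is j.
    """
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum (support)
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((precision, recall, f1, actual_k))
    return results


cm = [[1, 2], [0, 2]]  # confusion matrix from the output
rows = per_class_metrics(cm)
total = sum(r[3] for r in rows)
accuracy = sum(cm[k][k] for k in range(len(cm))) / total
# Macro average: unweighted mean over classes
macro = [sum(r[i] for r in rows) / len(rows) for i in range(3)]
# Weighted average: mean over classes weighted by support
weighted = [sum(r[i] * r[3] for r in rows) / total for i in range(3)]

for k, (p, r, f, s) in enumerate(rows):
    print(f"class {k}: precision={p:.2f} recall={r:.2f} f1={f:.2f} support={s}")
print(f"accuracy={accuracy:.2f}")
print("macro avg:", [round(v, 2) for v in macro])
print("weighted avg:", [round(v, 2) for v in weighted])
```

Run on this matrix, the macro and weighted averages agree with the report rows (0.75, 0.67, 0.58 and 0.80, 0.60, 0.57).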
Date:12-09-2024
EXPERIMENT-06
Aim: To tune hyperparameters to find the best decision tree model, evaluate its performance, and visualize the best tree along with its classification metrics.
Program:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix

df = pd.read_csv('C:/Users/MGIT/OneDrive/Desktop/22261A6621/archive (2)/play_tennis.csv')

# Encode every column as integers, keeping each encoder so the
# original class names can be recovered for the plot and report
label_encoders = {}
for column in df.columns:
    le = LabelEncoder()
    df[column] = le.fit_transform(df[column])
    label_encoders[column] = le

X = df.drop('play', axis=1)
y = df['play']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# Candidate hyperparameter combinations to compare
hyperparameters = [
    {'criterion': 'gini', 'max_depth': None, 'min_samples_split': 2, 'min_samples_leaf': 1},
    {'criterion': 'entropy', 'max_depth': 4, 'min_samples_split': 2, 'min_samples_leaf': 1},
    {'criterion': 'gini', 'max_depth': 6, 'min_samples_split': 5, 'min_samples_leaf': 2},
    {'criterion': 'entropy', 'max_depth': 8, 'min_samples_split': 10, 'min_samples_leaf': 4},
]

best_accuracy = 0
best_params = None
best_tree = None
for params in hyperparameters:
    tree = DecisionTreeClassifier(**params, random_state=42)
    tree.fit(X_train, y_train)
    y_pred = tree.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Parameters:{params},Accuracy:{accuracy:.4f}")
    # Keep the first configuration that reaches the highest test accuracy
    if accuracy > best_accuracy:
        best_accuracy = accuracy
        best_params = params
        best_tree = tree
print(f"\nBest Parameters:{best_params},Best Accuracy:{best_accuracy:.4f}")

# Visualize the best tree with the original class names
plt.figure(figsize=(12, 8))
plot_tree(best_tree, filled=True, feature_names=list(X.columns),
          class_names=list(label_encoders['play'].classes_), rounded=True)
plt.title('Best Decision Tree')
plt.show()

y_pred_best = best_tree.predict(X_test)
print("Best Decision Tree - Classification Report:")
print(classification_report(y_test, y_pred_best, target_names=label_encoders['play'].classes_))
print("Best Decision Tree - Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_best))
print("Best Decision Tree - Accuracy Score:")
print(accuracy_score(y_test, y_pred_best))
OUTPUT:
    accuracy                           0.80         5
   macro avg       0.88      0.75      0.76         5
weighted avg       0.85      0.80      0.78         5
Aim: To visualize a scatter plot of data points with X and Y coordinates, colored by class labels.
Program:
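A minimal sketch of such a plot, assuming synthetic two-class points since no dataset is named here. The coordinates, labels, colormap, and output file name are all illustrative assumptions rather than the original program.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
# Two hypothetical clusters of 50 points each
class0 = rng.normal(loc=[2.0, 2.0], scale=0.6, size=(50, 2))
class1 = rng.normal(loc=[5.0, 5.0], scale=0.6, size=(50, 2))
points = np.vstack([class0, class1])
labels = np.array([0] * 50 + [1] * 50)

fig, ax = plt.subplots(figsize=(8, 6))
# c=labels maps each point's class label to a color from the colormap
scatter = ax.scatter(points[:, 0], points[:, 1], c=labels, cmap="viridis", edgecolor="k")
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_title("Scatter plot colored by class label")
# Build a legend with one entry per distinct class label
handles, legend_labels = scatter.legend_elements()
ax.legend(handles, legend_labels, title="Class")
fig.savefig("scatter_by_class.png")
```

In an interactive session, dropping the `matplotlib.use("Agg")` line and calling `plt.show()` instead of `fig.savefig(...)` displays the figure directly.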
OUTPUT: