Ex. No. 6: Build Decision Trees and Random Forests
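Aim: To write a Python program that builds decision tree and random forest classifiers for the kyphosis data set and compares their performance.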
Algorithm:
1. Start
2. Import the required libraries: pandas, numpy, matplotlib.pyplot, seaborn and scikit-learn.
3. Load the kyphosis data set into the Python script using pandas read_csv.
4. Perform exploratory data analysis and determine the size of the data set using raw_data.info().
5. Visualize the data set using the seaborn library.
6. Prepare the data and train the decision tree:
a. Split the data set into training and test sets.
b. Train the decision tree model on the training set.
7. Make predictions using model.predict(x_test_data) and measure the performance using scikit-learn's built-in functions classification_report and confusion_matrix.
8. Train a random forest model and make predictions with it.
9. Measure the performance of the random forest model and generate its confusion matrix.
10. Stop.
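Step 6b trains the tree by repeatedly choosing the feature threshold that best separates the classes. scikit-learn's DecisionTreeClassifier scores candidate splits with Gini impurity by default (criterion='gini'). The sketch below shows that search for a single numeric feature. The helper names gini and best_threshold and the toy values are hypothetical and are not taken from kyphosis.csv.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) of the best split 'value <= threshold'.

    Candidate thresholds are midpoints between consecutive distinct values;
    a split is scored by the size-weighted Gini impurity of its two halves.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    distinct = sorted(set(values))
    best = (None, gini(labels))  # baseline: impurity of the unsplit node
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data (NOT from kyphosis.csv): one numeric feature, two classes
toy_start = [1, 3, 5, 8, 12, 14, 15, 16]
toy_labels = ['present', 'present', 'present', 'absent',
              'absent', 'absent', 'absent', 'absent']
print(best_threshold(toy_start, toy_labels))  # (6.5, 0.0)
```

A full tree applies this search to every feature at every node and then recurses on the two halves until the leaves are pure or a stopping rule such as max_depth is reached.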
Program:
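import pandas as pd
import numpy as np
# Visualization libraries (%matplotlib inline is a Jupyter magic; remove it in a plain .py script)
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# Model selection, models and evaluation metrics from scikit-learn
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
# Load the data set and inspect its columns, types and size
raw_data = pd.read_csv('kyphosis.csv')
print(raw_data.columns)
raw_data.info()
# Pairwise scatter plots of the features, colored by the Kyphosis class
sns.pairplot(raw_data, hue='Kyphosis')
plt.show()
# Features (Age, Number, Start) and target (Kyphosis: absent/present)
x = raw_data.drop('Kyphosis', axis=1)
y = raw_data['Kyphosis']
# Hold out 30% of the rows as the test set (the split is random, so results vary between runs)
x_training_data, x_test_data, y_training_data, y_test_data = train_test_split(x, y, test_size=0.3)
# Decision tree: train, predict on the test set, evaluate
model = DecisionTreeClassifier()
model.fit(x_training_data, y_training_data)
predictions = model.predict(x_test_data)
print(classification_report(y_test_data, predictions))
print(confusion_matrix(y_test_data, predictions))
# Random forest: train, predict on the test set, evaluate
random_forest_model = RandomForestClassifier()
random_forest_model.fit(x_training_data, y_training_data)
random_forest_predictions = random_forest_model.predict(x_test_data)
print(classification_report(y_test_data, random_forest_predictions))
print(confusion_matrix(y_test_data, random_forest_predictions))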
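Step 8 replaces the single tree with a RandomForestClassifier. Conceptually, a random forest trains many decision trees. Each tree sees a bootstrap sample of the training rows and considers only a random subset of features at each split, and the trees' predictions are then combined. The sketch below rebuilds that idea by hand. Synthetic data from make_classification stands in for kyphosis.csv (an assumption), and fit_forest and predict_forest are hypothetical helpers. The hard majority vote is a simplification: scikit-learn's RandomForestClassifier averages the trees' predicted class probabilities instead.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_forest(x, y, n_trees=25, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the rows and
    considering a random subset of features (sqrt of the total) per split."""
    rng = np.random.default_rng(seed)
    n = len(x)
    trees = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)  # sample rows with replacement
        tree = DecisionTreeClassifier(max_features='sqrt',
                                      random_state=int(rng.integers(1_000_000)))
        tree.fit(x[rows], y[rows])
        trees.append(tree)
    return trees


def predict_forest(trees, x):
    """Hard majority vote across the trees for every row of x."""
    votes = np.array([tree.predict(x) for tree in trees])  # shape (n_trees, n_rows)
    return np.array([Counter(column).most_common(1)[0][0] for column in votes.T])


# Synthetic stand-in data (assumption): 81 rows and 3 features, like kyphosis.csv
x, y = make_classification(n_samples=81, n_features=3, n_informative=2,
                           n_redundant=0, random_state=0)
forest = fit_forest(x[:56], y[:56])
predictions = predict_forest(forest, x[56:])
print('accuracy:', (predictions == y[56:]).mean())
```

Because each tree is trained on different rows and features, the trees' errors are partly uncorrelated, and voting reduces the variance of a single deep tree. That is one plausible reason for the random forest's higher test accuracy in the output (0.76 against 0.68). On a test set of 25 records, however, the gap amounts to only two extra correct predictions.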
Output:
RangeIndex: 81 entries, 0 to 80
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Kyphosis  81 non-null     object
 1   Age       81 non-null     int64
 2   Number    81 non-null     int64
 3   Start     81 non-null     int64
dtypes: int64(3), object(1)
memory usage: 2.7+ KB
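The info output reports 81 records, yet each classification report below has a support of only 25. This follows from how train_test_split sizes the test set: it holds out ceil(test_size * n_samples) rows and trains on the rest. The sketch below assumes a test fraction of 0.3. The original program does not state the value, but 0.3 is consistent with a support of 25.

```python
import math

n_samples = 81    # from raw_data.info(): "RangeIndex: 81 entries"
test_size = 0.3   # assumption: test fraction passed to train_test_split
n_test = math.ceil(test_size * n_samples)  # rows held out for testing
n_train = n_samples - n_test               # remaining rows used for training
print(n_train, n_test)  # 56 25
```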
Decision tree:
              precision    recall  f1-score   support
    accuracy                           0.68        25
   macro avg       0.50      0.51      0.50        25
weighted avg       0.73      0.68      0.70        25
[[16  5]
 [ 3  1]]
Random forest:
              precision    recall  f1-score   support
    accuracy                           0.76        25
   macro avg       0.55      0.55      0.55        25
weighted avg       0.76      0.76      0.76        25
[[18  3]
 [ 3  1]]
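The summary figures in both reports can be checked by hand against the confusion matrices. scikit-learn orders the class labels alphabetically, so index 0 is 'absent' and index 1 is 'present'. Rows are true classes and columns are predicted classes. The sketch below recomputes accuracy and the macro-averaged precision and recall from the two matrices in the output. The function name report is hypothetical.

```python
def report(matrix):
    """Accuracy, per-class precision/recall and their macro averages
    from a square confusion matrix (rows = true class, columns = predicted)."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(k))
    # precision: correct predictions of class i / all predictions of class i (column sum)
    precision = [matrix[i][i] / sum(matrix[r][i] for r in range(k)) for i in range(k)]
    # recall: correct predictions of class i / all true members of class i (row sum)
    recall = [matrix[i][i] / sum(matrix[i]) for i in range(k)]
    return {
        'accuracy': correct / total,
        'precision': precision,
        'recall': recall,
        'macro_precision': sum(precision) / k,
        'macro_recall': sum(recall) / k,
    }


decision_tree = [[16, 5], [3, 1]]   # decision tree confusion matrix from the output
random_forest = [[18, 3], [3, 1]]   # random forest confusion matrix from the output
for name, matrix in [('decision tree', decision_tree), ('random forest', random_forest)]:
    r = report(matrix)
    print(name, round(r['accuracy'], 2),
          round(r['macro_precision'], 2), round(r['macro_recall'], 2))
```

This reproduces the accuracy and macro-average lines of both reports (0.68, 0.50, 0.51 and 0.76, 0.55, 0.55). The 'present' class has only 4 test records against 21 'absent', and both models identify just one of them (recall 0.25). This is why the macro averages sit near 0.5, while the weighted averages, dominated by the majority class, look considerably better.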
Result:
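Thus, a Python program to build decision trees and random forests was written and executed successfully. On the held-out test set of 25 records, the random forest reached an accuracy of 0.76, compared with 0.68 for the single decision tree.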