Decision Tree and Python Coding

The document provides instructions for a coding assignment involving decision trees in Python. It instructs the reader to read an article on decision trees and Jupyter notebooks, and make example code run that creates a decision tree classifier, fits it to training data, plots the tree, performs cost complexity pruning, and graphs the accuracy of pruned trees for different alpha values on both training and test data. The code samples split data, encode features, create binary target variables, train and test a decision tree classifier, extract alpha values, and calculate accuracy scores.


Decision Tree and Python Coding

Due today at 10:30 AM

Instructions

Read the article here and make the code run in Jupyter. Review how to define methods in Python.
Include your name as part of the screenshots of your code.

#imports needed by the code samples below
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree

y_not_zero = y > 0 #get a boolean mask that is True for each non-zero value in y.

y[y_not_zero] = 1 #set each non-zero value in y to 1.

y.unique() #verify that y now only contains 0 and 1.
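The masking step above can be sketched on a toy Series; the values here are made up for illustration and stand in for the assignment's target column:

```python
import pandas as pd

# Hypothetical target column: values > 0 mark different severities
# that we want to collapse into a single positive class.
y = pd.Series([0, 1, 2, 0, 3, 4])

y_not_zero = y > 0   # boolean mask: True wherever y is non-zero
y[y_not_zero] = 1    # collapse every non-zero value to the class 1

print(sorted(y.unique()))  # → [0, 1]
```

Boolean-mask assignment like this is the idiomatic pandas way to binarize a column in place.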

#split data into training and testing

X_train, X_test, y_train, y_test = train_test_split(X_encoded, y_int, test_size=0.30, random_state=42)
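To see what `train_test_split` does with `test_size=0.30`, here is a tiny sketch on a made-up 10-row dataset (the arrays are illustrative only, not the course data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 10 samples with 2 features each, and a binary target.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])

# random_state=42 makes the shuffle reproducible, so everyone in the
# class gets the same split.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=42)

print(len(X_tr), len(X_te))  # → 7 3
```

With 10 rows, 30% rounds up to 3 test samples and leaves 7 for training.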

#create decision tree and fit it to the training data

clf_df=DecisionTreeClassifier(random_state=42)

clf_df=clf_df.fit(X_train,y_train)

##plot the tree

plt.figure(figsize=(15,7.5))

plot_tree(clf_df, filled=True, rounded=True, class_names=["", ""], feature_names=X_encoded.columns); #note: the keyword is class_names, not class_name

path = clf_df.cost_complexity_pruning_path(X_train, y_train) #determine values for alpha

ccp_alphas = path.ccp_alphas #extract the candidate values for alpha

ccp_alphas = ccp_alphas[:-1] #exclude the maximum value for alpha, which prunes the tree down to a single node

clf_dt = [] #create a list that we will put the pruned decision trees into

## now create one decision tree per value of alpha and store it in the list
for ccp_alpha in ccp_alphas:
    clf = DecisionTreeClassifier(random_state=42, ccp_alpha=ccp_alpha)
    clf.fit(X_train, y_train)
    clf_dt.append(clf)

Graph the accuracy of the trees on the Training Dataset and the Testing Dataset as a function of
alpha.

train_scores = [clf_df.score(X_train, y_train) for clf_df in clf_dt]

test_scores = [clf_df.score(X_test, y_test) for clf_df in clf_dt]

fig,ax = plt.subplots()

ax.set_xlabel("alphas")

ax.set_ylabel("accuracy")

ax.set_title("Accuracy vs alpha for training and testing sets")

ax.plot(ccp_alphas, train_scores, marker='o', label="train", drawstyle="steps-post")

ax.plot(ccp_alphas, test_scores, marker='o', label = "test", drawstyle="steps-post")

ax.legend()

plt.show()
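Putting the steps above together, here is a minimal end-to-end sketch of the pruning workflow. It uses scikit-learn's built-in breast-cancer dataset in place of the course data (the dataset choice is an assumption for illustration only):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset: any binary-classification data works the same way.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)

# Determine candidate alpha values from the full (unpruned) tree.
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(
    X_train, y_train)
ccp_alphas = path.ccp_alphas[:-1]  # drop the max alpha (a single-node tree)

# Fit one pruned tree per alpha value.
clf_dts = [DecisionTreeClassifier(random_state=42, ccp_alpha=a)
           .fit(X_train, y_train) for a in ccp_alphas]

train_scores = [clf.score(X_train, y_train) for clf in clf_dts]
test_scores = [clf.score(X_test, y_test) for clf in clf_dts]

fig, ax = plt.subplots()
ax.set_xlabel("alpha")
ax.set_ylabel("accuracy")
ax.set_title("Accuracy vs alpha for training and testing sets")
ax.plot(ccp_alphas, train_scores, marker="o", label="train",
        drawstyle="steps-post")
ax.plot(ccp_alphas, test_scores, marker="o", label="test",
        drawstyle="steps-post")
ax.legend()
fig.savefig("accuracy_vs_alpha.png")
```

Training accuracy falls as alpha grows (heavier pruning), while test accuracy typically peaks at some intermediate alpha, which is the value you would choose for the final tree.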
