Hitters

This script loads the Hitters baseball data from the ISLR package, fits a regression tree for log salary on years of experience and hits, and prunes it to three terminal nodes. It then removes the observations with missing values, fits a second tree (adding runs and walks) on a random training subset, examines tree size with six-fold cross-validation, prunes the tree, and computes the MSE and RMSE of the predicted salaries on the held-out test set.

Uploaded by

Brokin Hart

# Loading the libraries

library(ISLR)
library(tree)

# Load the data set (attach() is not needed: every call below passes data=Hitters)

data(Hitters)

# Regression Tree (rows with a missing Salary are dropped before fitting)
t_1<-tree(log(Salary)~Years+Hits,data=na.omit(Hitters))
plot(t_1)
text(t_1)

# Pruning the Tree to Three Terminal Nodes
t_1_p<-prune.tree(t_1,best=3)

# Plotting the Pruned Tree
plot(t_1_p)
text(t_1_p)

# Deleting the Missing Observations


Hitters=na.omit(Hitters)
dim(Hitters) # 263 rows, 20 columns
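
# Counting the Dropped Observations
# (a sketch added for illustration, not part of the original script; Hitters_raw,
# n_missing_salary and n_dropped are hypothetical names)
# ISLR::Hitters is the untouched copy stored in the package, so this check also
# runs on its own. If the two counts agree, Salary is the only variable with
# missing values.

Hitters_raw=ISLR::Hitters
n_missing_salary=sum(is.na(Hitters_raw$Salary))
n_dropped=nrow(Hitters_raw)-nrow(na.omit(Hitters_raw))
c(missing_salary=n_missing_salary,dropped=n_dropped)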

# Setting the Seed


set.seed(2)

# Training Data Set (132 of the 263 rows, chosen at random)

train=sample(1:nrow(Hitters),132)

# Regression Tree
Hitters_tree=tree(log(Salary)~Years+Hits+Runs+Walks,data=Hitters,subset=train)

# Plotting the Tree


plot(Hitters_tree)
text(Hitters_tree,pretty=0)

# Six-fold Cross-Validation
Hitters_cv=cv.tree(Hitters_tree,K=6)
# cv.tree reports the cross-validated deviance (a sum of squared errors), not the MSE
plot(Hitters_cv$size,Hitters_cv$dev,type="b",xlab="Size",ylab="Deviance",col="red",lwd=2)
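
# Choosing the Size from the Cross-Validation Output
# (a sketch, not part of the original script; pick_cv_size and best_size are
# hypothetical names, and preferring the smallest tree among ties is an assumption)
# It uses Hitters_cv from the step above and only reports the size with the
# lowest cross-validated deviance. The pruning step itself is left unchanged.

pick_cv_size=function(size,dev) min(size[dev==min(dev)])
best_size=pick_cv_size(Hitters_cv$size,Hitters_cv$dev)
best_size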

# Pruning The Tree


Hitters_prune=prune.tree(Hitters_tree,best=3)

# Plotting the Pruned Tree


plot(Hitters_prune)
text(Hitters_prune,pretty=0)

# Prediction for the Test Data


log_salary_pred=predict(Hitters_prune,newdata=Hitters[-train,])
Salary_pred=exp(log_salary_pred)

# Actual Salary Variable for Test Dataset


Salary_act=Hitters[-train,"Salary"]

# Plotting Actual against Predicted Values


plot(Salary_pred,Salary_act,xlab="Predicted Value",ylab="Actual Salary", col="red")
abline(0,1,col="blue",lwd=2)

# Test MSE and RMSE (Salary is in thousands of dollars)
MSE<-mean((Salary_pred-Salary_act)^2)
RMSE<-sqrt(MSE)
c(MSE=MSE,RMSE=RMSE)
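
# Test RMSE on the Log Scale
# (a sketch, not part of the original script; RMSE_log is a hypothetical name)
# The tree was fitted to log(Salary), so this measures the error on the scale
# the model was trained on. It uses log_salary_pred and Salary_act from above.

RMSE_log<-sqrt(mean((log_salary_pred-log(Salary_act))^2))
RMSE_log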
