Tutorial03_Linear Regression
Functions
• A function takes in data, processes it, and returns a
result or accomplishes a specific task.
• A function is generally called as funcname(input)
• Some basic functions useful for summarizing data:
– length() : length of a vector (number of elements)
– min() : minimum value
– max() : maximum value
– range() : range of data
– mean() : mean
– sd() : standard deviation
– sum() : sum
• Create a vector
– temp <- c(35, 23, 29, 31, 28, 27)
• length(temp)
[1] 6
• min(temp)
[1] 23
• max(temp)
[1] 35
• range(temp)
[1] 23 35
• mean(temp)
[1] 28.83333
• sd(temp)
[1] 4.020779
• sum(temp)
[1] 173
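As a check on what sd() computes, the sketch below rebuilds the sample standard deviation of temp from length(), mean() and sum(), and then wraps a small calculation in a user-defined function. The name spread is hypothetical and only illustrates the funcname(input) pattern; it is not a built-in R function.

```r
# Same vector as in the slides
temp <- c(35, 23, 29, 31, 28, 27)

# Sample standard deviation by hand: sqrt( sum((x - mean)^2) / (n - 1) )
n <- length(temp)
manual_sd <- sqrt(sum((temp - mean(temp))^2) / (n - 1))

# Hypothetical helper (not a built-in): largest value minus smallest value
spread <- function(x) {
  max(x) - min(x)
}

manual_sd      # agrees with sd(temp)
spread(temp)   # 35 - 23 = 12
```

Note that sd() divides by n - 1 rather than n, which is why the manual version does the same.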
Points Scored
# Linear regression model for points scored
> PointsReg = lm(PTS ~ X2PA + X3PA + FTA + AST + ORB +
DRB + TOV + STL + BLK, data=NBA)
> summary(PointsReg)
# Sum of Squared Errors
> PointsReg$residuals
> SSE = sum(PointsReg$residuals^2)
# Root mean squared error
> RMSE = sqrt(SSE/nrow(NBA))
> RMSE
[1] 184.4049
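The SSE and RMSE steps can be tried without the NBA file. The sketch below is only an illustration under an assumption: it uses the built-in mtcars data set and a model of mpg on wt and hp as stand-ins, so its numbers are unrelated to the NBA results above.

```r
# mtcars ships with R; it stands in for the NBA data, which is not included here
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Sum of squared errors: square each residual, then add them up
sse <- sum(fit$residuals^2)

# Root mean squared error: square root of the average squared error,
# which puts the error back on the scale of the response (mpg here)
rmse <- sqrt(sse / nrow(mtcars))

sse
rmse
```

Because RMSE is in the same units as the dependent variable, it is usually easier to interpret than SSE, which grows with the number of observations.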
# Remove insignificant variables
> PointsReg2 = lm(PTS ~ X2PA + X3PA + FTA + AST
+ ORB + DRB + STL + BLK, data=NBA)
> summary(PointsReg2)
> PointsReg3 = lm(PTS ~ X2PA + X3PA + FTA + AST
+ ORB + STL + BLK, data=NBA)
> summary(PointsReg3)
> PointsReg4 = lm(PTS ~ X2PA + X3PA + FTA + AST
+ ORB + STL, data=NBA)
> summary(PointsReg4)
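The models above drop one variable at a time and refit. A minimal sketch of one such step follows, assuming the rule is to remove the predictor with the largest p-value in summary(); the slides do not state the rule explicitly. It uses the built-in mtcars data as a stand-in, and the names fit, fit2, pvals and worst are illustrative.

```r
# Stand-in model on built-in data (not the NBA data)
fit <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# p-values of the predictors (row 1 is the intercept, so drop it)
pvals <- coef(summary(fit))[-1, "Pr(>|t|)"]

# Assumed rule: remove the predictor with the largest p-value
worst <- names(which.max(pvals))

# Refit without that predictor; "." keeps the rest of the formula unchanged
fit2 <- update(fit, as.formula(paste(". ~ . -", worst)))

worst
summary(fit2)$r.squared   # compare with summary(fit)$r.squared
```

Removing one variable at a time matters because p-values can change once a correlated predictor leaves the model, so each refit should be inspected before the next removal.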
# Compute SSE and RMSE for the new model
> SSE_4 = sum(PointsReg4$residuals^2)
> RMSE_4 = sqrt(SSE_4/nrow(NBA))
> SSE_4
[1] 28421465
> RMSE_4
[1] 184.493
Making Predictions
# Read in test set
> NBA_test = read.csv("NBA_test.csv")
# Make predictions on test set
> PointsPredictions = predict(PointsReg4,
newdata=NBA_test)
> PointsPredictions
# Compute out-of-sample R^2
> SSE = sum((PointsPredictions - NBA_test$PTS)^2)
> SST = sum((mean(NBA$PTS) - NBA_test$PTS)^2)
> R2 = 1 - SSE/SST
> R2
[1] 0.8127142
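The out-of-sample R^2 calculation can be wrapped in a small function. The sketch below is an illustration under assumptions: oos_r2 is a hypothetical helper name, and the built-in mtcars data with an arbitrary row split stands in for the NBA training and test files.

```r
# Hypothetical helper: out-of-sample R^2 for a fitted model
oos_r2 <- function(model, train_y, test_data, test_y) {
  pred <- predict(model, newdata = test_data)
  sse <- sum((pred - test_y)^2)            # error of the model on the test set
  sst <- sum((mean(train_y) - test_y)^2)   # error of the baseline: training mean
  1 - sse / sst
}

# Illustrative split of mtcars (not the NBA data)
train <- mtcars[1:22, ]
test  <- mtcars[23:32, ]
fit <- lm(mpg ~ wt + hp, data = train)

r2_test <- oos_r2(fit, train$mpg, test, test$mpg)
r2_test
```

As in the slides, the baseline in SST is the mean of the training set, not the test set. For that reason the out-of-sample R^2 can be negative when the model predicts worse than the training mean.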