Tutorial03_Linear Regression


IIMT2641

Introduction to Business Analytics

Tutorial 03 – Introduction to R (II)


Linear Regression - NBA

Functions
• A function takes in data, processes it, and returns a
result or accomplishes a specific task.
• A function is generally called as funcname(input)
• Some basic functions useful for summarizing data:
– length() : length of a vector (number of elements)
– min() : minimum value
– max() : maximum value
– range() : range of data
– mean() : mean
– sd() : standard deviation
– sum() : sum

The University of Hong Kong 3



• Create a vector
– temp <- c(35, 23, 29, 31, 28, 27)

• length(temp)
[1] 6
• min(temp)
[1] 23
• max(temp)
[1] 35

• range(temp)
[1] 23 35
• mean(temp)
[1] 28.83333
• sd(temp)
[1] 4.020779
• sum(temp)
[1] 173

• In RStudio, type ?<function name> to open the
documentation for a function, e.g. ?mean
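
As a cross-check of what mean() and sd() report, the sketch below recomputes both by hand for the temp vector, assuming the usual sample formula with an n - 1 denominator (which is what sd() in R uses). The names n, manual_mean, and manual_sd are introduced here for illustration only.

```r
# Same vector as in the examples above
temp <- c(35, 23, 29, 31, 28, 27)

n <- length(temp)                      # number of elements
manual_mean <- sum(temp) / n           # mean built from sum() and length()

# Sample standard deviation: squared deviations divided by n - 1
manual_sd <- sqrt(sum((temp - manual_mean)^2) / (n - 1))

# Both comparisons should be TRUE
isTRUE(all.equal(manual_mean, mean(temp)))
isTRUE(all.equal(manual_sd, sd(temp)))
```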

Linear Regression - NBA


Read in the data
> NBA = read.csv("NBA_train.csv")
> str(NBA)
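
A few other base R functions are commonly used next to str() to inspect freshly loaded data. Since NBA_train.csv is not reproduced here, the sketch below uses a small made-up data frame written to a temporary file. The column names PTS, oppPTS, and W follow the ones used later in this tutorial; the Team column and all values are assumptions for illustration.

```r
# Hypothetical stand-in for NBA_train.csv (values are made up)
toy <- data.frame(
  Team   = c("A", "B", "C"),
  PTS    = c(8500, 8200, 8350),
  oppPTS = c(8300, 8400, 8350),
  W      = c(50, 33, 41)
)

# Write to a temporary CSV file and read it back with read.csv()
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)
toy_read <- read.csv(path)

str(toy_read)      # column names, types, and first values
head(toy_read)     # first rows of the data frame
summary(toy_read)  # per-column summaries
nrow(toy_read)     # number of observations
```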

Playoffs and Wins


# Compute Points Difference
> NBA$PTSdiff = NBA$PTS - NBA$oppPTS

# Check for linear relationship
> plot(NBA$PTSdiff, NBA$W)
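
The scatter plot gives a visual check; cor() can supplement it with a number. The sketch below uses simulated data rather than the NBA file, and the relationship between points difference and wins built into it is an arbitrary assumption, not an estimate.

```r
set.seed(1)  # reproducible simulated data

# Simulated seasons: the numbers below are assumptions for illustration only
sim <- data.frame(PTS    = round(rnorm(50, mean = 8400, sd = 400)),
                  oppPTS = round(rnorm(50, mean = 8400, sd = 400)))
sim$PTSdiff <- sim$PTS - sim$oppPTS
sim$W <- 41 + 0.03 * sim$PTSdiff + rnorm(50, sd = 3)

# Pearson correlation: values near +1 or -1 suggest a strong linear relationship
r <- cor(sim$PTSdiff, sim$W)
r

# Scatter plot with axis labels
plot(sim$PTSdiff, sim$W, xlab = "Points difference", ylab = "Wins")
```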

# Linear regression model for wins
> WinsReg = lm(W ~ PTSdiff, data=NBA)
> summary(WinsReg)
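
Once a model like this is fitted, its coefficients can be used to answer questions such as "what points difference corresponds to a given number of wins?". The sketch below shows the idea on simulated data; the data-generating relationship and the target of 42 wins are assumptions made for illustration, not results from the NBA data.

```r
set.seed(2)
sim <- data.frame(PTSdiff = rnorm(60, mean = 0, sd = 400))
sim$W <- 41 + 0.03 * sim$PTSdiff + rnorm(60, sd = 3)   # assumed relationship

fit <- lm(W ~ PTSdiff, data = sim)
b <- coef(fit)                 # named vector: "(Intercept)" and "PTSdiff"

target_wins <- 42              # hypothetical target
# Solve target_wins = intercept + slope * PTSdiff for PTSdiff
needed_diff <- (target_wins - b["(Intercept)"]) / b["PTSdiff"]
unname(needed_diff)

summary(fit)$r.squared         # share of the variance in W explained by the model
```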

Points Scored
# Linear regression model for points scored
> PointsReg = lm(PTS ~ X2PA + X3PA + FTA + AST + ORB +
DRB + TOV + STL + BLK, data=NBA)
> summary(PointsReg)

# Sum of Squared Errors
> PointsReg$residuals

> SSE = sum(PointsReg$residuals^2)
> SSE
[1] 28394314
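
Each residual is an observed value minus the corresponding fitted value, so the SSE can be computed either from the stored residuals or from the fitted values. The sketch below checks this on simulated data; the variable names and the data are assumptions for illustration.

```r
set.seed(3)
sim <- data.frame(x = runif(40, 0, 10))
sim$y <- 5 + 2 * sim$x + rnorm(40)

fit <- lm(y ~ x, data = sim)

sse_from_residuals <- sum(residuals(fit)^2)          # same as sum(fit$residuals^2)
sse_from_fitted    <- sum((sim$y - fitted(fit))^2)   # observed minus fitted, squared

# Should be TRUE: both routes give the same SSE
isTRUE(all.equal(sse_from_residuals, sse_from_fitted))
```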

# Root mean squared error
> RMSE = sqrt(SSE/nrow(NBA))
> RMSE
[1] 184.4049

# Average number of points in a season
> mean(NBA$PTS)
[1] 8370.24
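
One way to read these two numbers together is to express the RMSE as a fraction of the average season total. The sketch below uses only the two values reported above and comes out at roughly 2%; the variable names are introduced here for illustration.

```r
rmse     <- 184.4049   # RMSE reported above
mean_pts <- 8370.24    # mean(NBA$PTS) reported above

relative_error <- rmse / mean_pts
round(100 * relative_error, 2)   # RMSE as a percentage of average season points
```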

# Remove insignificant variables
> PointsReg2 = lm(PTS ~ X2PA + X3PA + FTA + AST
+ ORB + DRB + STL + BLK, data=NBA)
> summary(PointsReg2)
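
The p-values printed by summary() can also be pulled out of the coefficient table directly, which helps when deciding which variable to drop next. The sketch below uses simulated data in which the predictor named noise is unrelated to the response by construction; all names and values are assumptions. It also shows update() as one possible alternative to retyping the formula.

```r
set.seed(4)
sim <- data.frame(x1 = rnorm(100), x2 = rnorm(100), noise = rnorm(100))
sim$y <- 1 + 2 * sim$x1 - 3 * sim$x2 + rnorm(100)   # 'noise' has no real effect

full <- lm(y ~ x1 + x2 + noise, data = sim)

# Coefficient table with columns Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(summary(full))
p_values <- coefs[, "Pr(>|t|)"]
p_values

# Refit without the 'noise' predictor, keeping everything else unchanged
reduced <- update(full, . ~ . - noise)
names(coef(reduced))
```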

> PointsReg3 = lm(PTS ~ X2PA + X3PA + FTA + AST
+ ORB + STL + BLK, data=NBA)
> summary(PointsReg3)

> PointsReg4 = lm(PTS ~ X2PA + X3PA + FTA + AST
+ ORB + STL, data=NBA)
> summary(PointsReg4)

# Compute SSE and RMSE for the new model
> SSE_4 = sum(PointsReg4$residuals^2)
> RMSE_4 = sqrt(SSE_4/nrow(NBA))
> SSE_4
[1] 28421465
> RMSE_4
[1] 184.493
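
To see how little accuracy is lost by dropping the three variables, the reported values for PointsReg and PointsReg4 can be compared directly. The sketch below uses only the numbers printed above; the SSE rises by less than 0.1%. The variable names are introduced here for illustration.

```r
sse_full <- 28394314   # SSE of PointsReg, reported above
rmse_full <- 184.4049  # RMSE of PointsReg, reported above
sse_4 <- 28421465      # SSE of PointsReg4, reported above
rmse_4 <- 184.493      # RMSE of PointsReg4, reported above

sse_increase_pct <- 100 * (sse_4 - sse_full) / sse_full   # relative change in SSE, in percent
rmse_increase    <- rmse_4 - rmse_full                    # absolute change in RMSE, in points

sse_increase_pct
rmse_increase
```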

Making Predictions
# Read in test set
> NBA_test = read.csv("NBA_test.csv")

# Make predictions on test set
> PointsPredictions = predict(PointsReg4,
newdata=NBA_test)
> PointsPredictions
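
predict() looks up the predictors by name, so the data frame passed as newdata needs columns with the same names as those used in the model formula. The sketch below illustrates this on simulated data and checks the predictions against a by-hand calculation from the coefficients; all names and values are assumptions for illustration.

```r
set.seed(5)
train <- data.frame(x = runif(50, 0, 10))
train$y <- 3 + 1.5 * train$x + rnorm(50)

fit <- lm(y ~ x, data = train)

# New observations: must contain a column named 'x', like the training data
new_obs <- data.frame(x = c(2, 5, 8))
preds <- predict(fit, newdata = new_obs)
preds

# The same values computed by hand from the fitted coefficients
by_hand <- coef(fit)["(Intercept)"] + coef(fit)["x"] * new_obs$x
isTRUE(all.equal(unname(preds), unname(by_hand)))
```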

# Compute out-of-sample R^2
> SSE = sum((PointsPredictions - NBA_test$PTS)^2)
> SST = sum((mean(NBA$PTS) - NBA_test$PTS)^2)
> R2 = 1 - SSE/SST
> R2
[1] 0.8127142

# Compute the RMSE
> RMSE = sqrt(SSE/nrow(NBA_test))
> RMSE
[1] 196.3723
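
The steps above can be collected into a small helper. The sketch below is one possible way to do it, tried on simulated data; the function name out_of_sample, its arguments, and the data are hypothetical. As in the code above, the baseline for SST is the mean of the training set, not of the test set.

```r
# Hypothetical helper: out-of-sample R^2 and RMSE for a fitted lm model
out_of_sample <- function(model, train, test, response) {
  preds <- predict(model, newdata = test)
  sse <- sum((preds - test[[response]])^2)
  # Baseline: predict the training-set mean for every test observation
  sst <- sum((mean(train[[response]]) - test[[response]])^2)
  c(R2 = 1 - sse / sst, RMSE = sqrt(sse / nrow(test)))
}

set.seed(6)
# Simulated data with an assumed linear relationship
make_data <- function(n) {
  d <- data.frame(x = runif(n, 0, 10))
  d$y <- 3 + 1.5 * d$x + rnorm(n)
  d
}
train <- make_data(80)
test  <- make_data(20)

fit <- lm(y ~ x, data = train)
metrics <- out_of_sample(fit, train, test, "y")
metrics
```

Note that an out-of-sample R^2 computed this way can be negative when the model predicts the test set worse than the training mean does.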
