Tutorial03 - Linear Regression


IIMT2641

Introduction to Business Analytics


Tutorial 03 – Introduction to R (II)


Linear Regression - NBA

Functions
• A function takes in data, processes it, and returns a
result or accomplishes a specific task.
• A function is generally called as funcname(input)
• Some basic functions useful for summarizing data:
– length() : length of a vector (number of elements)
– min() : minimum value
– max() : maximum value
– range() : minimum and maximum of the data
– mean() : mean
– sd() : standard deviation
– sum() : sum

The University of Hong Kong 3



• Create a vector
– temp <- c(35, 23, 29, 31, 28, 27)

• length(temp)
[1] 6
• min(temp)
[1] 23
• max(temp)
[1] 35


• range(temp)
[1] 23 35
• mean(temp)
[1] 28.83333
• sd(temp)
[1] 4.020779
• sum(temp)
[1] 173

• In RStudio, we can type “?<function name>” to look up the description of a function, e.g. ?mean
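
As a quick check on how these summary functions relate, the sample standard deviation reported by sd() can be rebuilt from length(), mean() and sum(). The sketch below reuses the temp vector from the slides; the names deviations and manual_sd are introduced here for illustration only.

```r
# Same vector as in the slides
temp <- c(35, 23, 29, 31, 28, 27)

n <- length(temp)                  # number of elements
deviations <- temp - mean(temp)    # distance of each value from the mean

# Sample standard deviation: divide by n - 1, not n
manual_sd <- sqrt(sum(deviations^2) / (n - 1))

print(manual_sd)
print(all.equal(manual_sd, sd(temp)))  # compares the manual value with sd()
```

Dividing by n - 1 rather than n is what makes this the sample (not population) standard deviation, which is the version sd() computes.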


Linear Regression - NBA


# Read in the data
> NBA = read.csv("NBA_train.csv")
> str(NBA)
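
Since NBA_train.csv is not reproduced here, the following sketch shows the same read.csv() and str() steps on a tiny made-up table. The rows, the Team column and the variable names toy, toy_path and toy_read are all assumptions for illustration; they are not the real NBA data.

```r
# Made-up rows for illustration only; these are not the real NBA data
toy <- data.frame(
  Team   = c("A", "B", "C"),
  W      = c(50, 41, 30),
  PTS    = c(8800, 8400, 8100),
  oppPTS = c(8500, 8400, 8450)
)

# Write the toy data to a temporary CSV file, then read it back
toy_path <- tempfile(fileext = ".csv")
write.csv(toy, toy_path, row.names = FALSE)
toy_read <- read.csv(toy_path)

str(toy_read)          # column names, types and the first few values
print(nrow(toy_read))  # number of rows in the data frame
```

str() is a convenient first look because it confirms that numeric columns were actually read as numbers before any model is fitted.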


Playoffs and Wins


# Compute Points Difference
> NBA$PTSdiff = NBA$PTS - NBA$oppPTS

# Check for a linear relationship
> plot(NBA$PTSdiff, NBA$W)
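
A scatter plot gives a visual check; a numeric complement is the correlation coefficient from cor(). The sketch below uses made-up values (not the real NBA data) with the same column names as the slides, and the names toy and r are chosen here for illustration.

```r
# Made-up data for illustration only; not the real NBA data
toy <- data.frame(
  W      = c(60, 50, 41, 33, 22),
  PTS    = c(9000, 8700, 8400, 8200, 8000),
  oppPTS = c(8400, 8450, 8400, 8450, 8600)
)

# Derived column, computed the same way as in the slides
toy$PTSdiff <- toy$PTS - toy$oppPTS

# Pearson correlation: values near +1 or -1 suggest a strong linear relationship
r <- cor(toy$PTSdiff, toy$W)
print(r)
```

A correlation close to 1 is consistent with the straight-line pattern one would hope to see in the plot, although it does not replace looking at the plot itself.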


Playoffs and Wins


# Linear regression model for wins
> WinsReg = lm(W ~ PTSdiff, data=NBA)
> summary(WinsReg)
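
Once a one-variable model is fitted, coef() returns the intercept and slope, which can be used to ask how large a points difference is needed for a given number of wins. This is a minimal sketch on made-up data; toy, toy_reg, target_wins and needed_diff are hypothetical names and values, not results from the NBA data.

```r
# Made-up data for illustration only; not the real NBA data
toy <- data.frame(
  PTSdiff = c(600, 250, 0, -250, -600),
  W       = c(60, 50, 41, 33, 22)
)

toy_reg <- lm(W ~ PTSdiff, data = toy)

# Extract the fitted intercept and slope by name
b <- coef(toy_reg)
intercept <- b[["(Intercept)"]]
slope     <- b[["PTSdiff"]]

# Solve W = intercept + slope * PTSdiff for PTSdiff at a chosen target
target_wins <- 45   # hypothetical target
needed_diff <- (target_wins - intercept) / slope
print(needed_diff)
```

The same rearrangement applies to WinsReg: substitute its estimated coefficients from summary(WinsReg) for the toy ones.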


Points Scored
# Linear regression model for points scored
> PointsReg = lm(PTS ~ X2PA + X3PA + FTA + AST + ORB +
DRB + TOV + STL + BLK, data=NBA)
> summary(PointsReg)
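
The summary() output includes a coefficient table whose last column holds the p-values used to judge significance. That table can also be pulled out programmatically, as sketched below on simulated data. The data are randomly generated under an assumed relationship in which TOV has no effect; only the column names are borrowed from the slides.

```r
# Simulated data for illustration only; not the real NBA data
set.seed(1)
n <- 100
toy <- data.frame(
  X2PA = rnorm(n, mean = 5500, sd = 300),
  FTA  = rnorm(n, mean = 2000, sd = 150),
  TOV  = rnorm(n, mean = 1300, sd = 100)
)
# By construction PTS depends on X2PA and FTA only; TOV plays no role
toy$PTS <- 2000 + 1.0 * toy$X2PA + 0.8 * toy$FTA + rnorm(n, mean = 0, sd = 100)

toy_reg <- lm(PTS ~ X2PA + FTA + TOV, data = toy)

# Coefficient table: estimate, standard error, t value and p-value per term
coef_table <- summary(toy_reg)$coefficients
p_values   <- coef_table[, "Pr(>|t|)"]
print(round(p_values, 4))
```

Variables with large p-values are the usual candidates for removal, although a variable that truly has no effect can still appear significant by chance in a given sample.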


Points Scored
# Sum of Squared Errors
> PointsReg$residuals
> SSE = sum(PointsReg$residuals^2)
> SSE
[1] 28394314
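
To make explicit what the residuals are, the sketch below recomputes them as observed minus fitted values and confirms that both routes give the same SSE. The data frame toy and all variable names are made up for illustration.

```r
# Made-up data for illustration only
toy <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1)
)
toy_reg <- lm(y ~ x, data = toy)

# A residual is the observed value minus the model's fitted value
manual_resid <- toy$y - fitted(toy_reg)

sse_manual <- sum(manual_resid^2)        # SSE from the manual residuals
sse_model  <- sum(toy_reg$residuals^2)   # SSE as computed in the slides

print(c(sse_manual, sse_model))
```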


Points Scored
# Root mean squared error
> RMSE = sqrt(SSE/nrow(NBA))
> RMSE
[1] 184.4049

# Average number of points in a season
> mean(NBA$PTS)
[1] 8370.24
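
The RMSE is easier to interpret relative to the typical value of the outcome. Using the two figures printed above, the arithmetic can be sketched as follows; the variable names are chosen here for illustration.

```r
# Values copied from the output shown in the slides
rmse     <- 184.4049   # root mean squared error of PointsReg
mean_pts <- 8370.24    # average points scored in a season

# RMSE as a fraction of the average season total
relative_error <- rmse / mean_pts
print(round(100 * relative_error, 2))   # expressed as a percentage
```

This comes out at roughly 2.2%, so a typical prediction error is small compared with the average number of points in a season.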


Points Scored
# Remove insignificant variables (here TOV is dropped first)
> PointsReg2 = lm(PTS ~ X2PA + X3PA + FTA + AST
+ ORB + DRB + STL + BLK, data=NBA)
> summary(PointsReg2)
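
Retyping the whole formula each time a variable is dropped is error-prone. As an alternative to the approach in the slides, base R's update() can refit a model with one term removed. The sketch below uses simulated data and hypothetical object names (toy, full, reduced, refit) to show that both routes give the same coefficients.

```r
# Simulated data for illustration only; not the real NBA data
set.seed(2)
n <- 50
toy <- data.frame(
  X2PA = rnorm(n, mean = 5500, sd = 300),
  FTA  = rnorm(n, mean = 2000, sd = 150),
  TOV  = rnorm(n, mean = 1300, sd = 100)
)
toy$PTS <- 2000 + toy$X2PA + 0.8 * toy$FTA + rnorm(n, mean = 0, sd = 100)

full <- lm(PTS ~ X2PA + FTA + TOV, data = toy)

# ". ~ . - TOV" means: same response, same predictors, minus TOV
reduced <- update(full, . ~ . - TOV)

# Equivalent to writing the shorter formula out by hand
refit <- lm(PTS ~ X2PA + FTA, data = toy)
print(all.equal(coef(reduced), coef(refit)))
```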


Points Scored
> PointsReg3 = lm(PTS ~ X2PA + X3PA + FTA + AST
+ ORB + STL + BLK, data=NBA)
> summary(PointsReg3)


Points Scored
> PointsReg4 = lm(PTS ~ X2PA + X3PA + FTA + AST
+ ORB + STL, data=NBA)
> summary(PointsReg4)


Points Scored
# Compute SSE and RMSE for the new model
> SSE_4 = sum(PointsReg4$residuals^2)
> RMSE_4 = sqrt(SSE_4/nrow(NBA))
> SSE_4
[1] 28421465
> RMSE_4
[1] 184.493
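
To see how little accuracy is lost by moving from the original nine-variable model to the six-variable PointsReg4, the printed SSE and RMSE values can be compared directly. The numbers below are copied from the output in the slides; the variable names are chosen here for illustration.

```r
# Values copied from the output shown in the slides
sse_full  <- 28394314   # SSE of PointsReg (nine predictors)
rmse_full <- 184.4049   # RMSE of PointsReg
sse_4     <- 28421465   # SSE of PointsReg4 (six predictors)
rmse_4    <- 184.493    # RMSE of PointsReg4

sse_increase     <- sse_4 - sse_full               # absolute increase in SSE
sse_increase_pct <- 100 * sse_increase / sse_full  # same increase, in percent
rmse_increase    <- rmse_4 - rmse_full             # increase in RMSE (points)

print(c(sse_increase, sse_increase_pct, rmse_increase))
```

The SSE rises by about 0.1% and the RMSE by less than 0.1 points, which supports keeping the simpler model.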


Making Predictions
# Read in test set
> NBA_test = read.csv("NBA_test.csv")


Making Predictions
# Make predictions on test set
> PointsPredictions = predict(PointsReg4,
newdata=NBA_test)
> PointsPredictions
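
predict() matches the columns of newdata to the predictor names used when the model was fitted, so the test set must contain columns with those names. A minimal sketch on made-up data (train, new_obs and the columns x and y are hypothetical):

```r
# Made-up training data that follows y = 2x + 1 exactly
train <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(3, 5, 7, 9, 11)
)
model <- lm(y ~ x, data = train)

# New observations: the column name must match the predictor name "x"
new_obs <- data.frame(x = c(6, 7))

preds <- predict(model, newdata = new_obs)
print(preds)
```

If newdata is omitted, predict() returns the fitted values for the training data instead, which is a common source of confusion when evaluating on a test set.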


Making Predictions
# Compute out-of-sample R^2
> SSE = sum((PointsPredictions - NBA_test$PTS)^2)
> SST = sum((mean(NBA$PTS) - NBA_test$PTS)^2)
> R2 = 1 - SSE/SST
> R2
[1] 0.8127142

# Compute the RMSE
> RMSE = sqrt(SSE/nrow(NBA_test))
> RMSE
[1] 196.3723
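
The out-of-sample calculation above can be wrapped in a small helper so that it is reusable across models. This is a sketch under stated assumptions: the function name out_of_sample_r2 and the toy train and test data are hypothetical. As in the slides, the baseline (SST) uses the mean of the training outcome, not the test outcome.

```r
# Hypothetical helper: out-of-sample R^2 against a training-mean baseline
out_of_sample_r2 <- function(predictions, actual, train_mean) {
  sse <- sum((predictions - actual)^2)   # error of the model on the test set
  sst <- sum((train_mean - actual)^2)    # error of always predicting the training mean
  1 - sse / sst
}

# Made-up data for illustration only
train <- data.frame(x = 1:6, y = c(2.0, 4.1, 5.9, 8.2, 9.8, 12.1))
test  <- data.frame(x = c(7, 8, 9), y = c(14.2, 15.9, 18.1))

model <- lm(y ~ x, data = train)
preds <- predict(model, newdata = test)

r2   <- out_of_sample_r2(preds, test$y, mean(train$y))
rmse <- sqrt(sum((preds - test$y)^2) / nrow(test))
print(c(r2, rmse))
```

Unlike the in-sample R², this quantity can be negative: that happens when the model predicts the test set worse than the training mean does.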
