MLR Version2

This document describes conducting multiple linear regression analysis in R to predict stock index prices using interest rates and unemployment rates as predictor variables. It includes steps to import and prepare the data, fit three regression models (multiple linear regression, random forest regression, and support vector regression), compare their performance using RMSE, and visualize the predicted versus actual values. The assignment is to replicate this analysis for a marketing dataset to predict sales using three advertising variables as predictors and compare the regression models.

Uploaded by

Melanie Samsona

# copy and paste to a blank file in R

# your cellphone may block this file if given in R format

## Your assignment is found at the bottom of the document


## If you want to see the corresponding output then
# run R code line by line

## Full Illustration of Multiple Linear Regression Using R


## You have to run every line of code to see the output

## Because of him ...


## "Life is really simple, but we insist on making it complicated."
## --- **Confucius**

## Topic: Prediction of Stock Index Price using Interest_Rate and
## Unemployment_Rate as predictor variables

## Methods: Multiple Linear Regression, Random Forest Regression and
## Support Vector Regression

# Prepared by:
# Carlito O. Daarol
# Faculty/Statistician/Data Scientist
# Mathematics Department
# Mindanao State University
# General Santos City
# January 1, 2021

# Step 1: Enter the data.

(Year <- c(2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,
           2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016))
(Month <- c(12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1))
(Interest_Rate <- c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,
                    2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75))
(Unemployment_Rate <- c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,
                        6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1))
(Stock_Index_Price <- c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,
                        1047,965,943,958,971,949,884,866,876,822,704,719))

## combine all variables as a table


data <-
as.data.frame(cbind(Year,Month,Interest_Rate,Unemployment_Rate,Stock_Index_Price))

# check first 6 rows


head(data)
# load variable names to memory
attach(data)

# --------------Here is how to save file---------------------


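# save table as a csv file (readable by Excel) for future retrieval
# note: write.csv() also writes the row names as an unnamed first column,
# which read.csv() returns as column X; add row.names = FALSE to omit it
write.csv(data,file="Multreg.csv")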

# --------------Here is how to load file---------------------


filedata <- read.csv("Multreg.csv")
filedata
# -----------------------------------------------------------

# Step 2: Check for linearity of relationship by inspection


# response variable = Stock_Index_Price versus
# predictor1 = Interest_Rate
# predictor2 = Unemployment_Rate

plot(x=Interest_Rate, y=Stock_Index_Price)
plot(x=Unemployment_Rate, y=Stock_Index_Price)
# the plot should suggest a linear pattern
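
# A numeric companion to the two plots above: a minimal sketch, assuming the
# same 24 observations entered in Step 1. The name demo_data is illustrative
# and not part of the original script. The Pearson correlation summarises how
# close each scatter plot is to a straight line (values near +1 or -1).
demo_data <- data.frame(
  Interest_Rate = c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,
                    2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75),
  Unemployment_Rate = c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,
                        6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1),
  Stock_Index_Price = c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,
                        1047,965,943,958,971,949,884,866,876,822,704,719)
)
# correlation matrix of the response and both predictors
demo_cor <- cor(demo_data)
round(demo_cor, 2)
# expect a positive sign for Interest_Rate and a negative sign for
# Unemployment_Rate in the Stock_Index_Price row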

# Step 3: Using this template, apply multiple linear regression in R


# model <- lm(Dependent variable ~ First independent variable +
#             Second independent variable + ...)
# summary(model)

model <- lm(Stock_Index_Price ~ Interest_Rate + Unemployment_Rate)


summary(model)
multiple_linear_prediction <- predict(model, filedata)

# Step 4: Inspect the results


# You should see this kind of results

# Call:
# lm(formula = Stock_Index_Price ~ Interest_Rate + Unemployment_Rate)

# Residuals:
#      Min       1Q   Median       3Q      Max
# -158.205  -41.667   -6.248   57.741  118.810

# Coefficients:
#                   Estimate Std. Error t value Pr(>|t|)
# (Intercept)         1798.4      899.2   2.000  0.05861 .
# Interest_Rate        345.5      111.4   3.103  0.00539 **
# Unemployment_Rate   -250.1      117.9  -2.121  0.04601 *
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Residual standard error: 70.56 on 21 degrees of freedom
# Multiple R-squared:  0.8976, Adjusted R-squared:  0.8879
# F-statistic: 92.07 on 2 and 21 DF,  p-value: 4.043e-11

# Step 5: You can use the coefficients in the summary
# to build the multiple linear regression equation as follows:

# Stock_Index_Price = (Intercept) + (Interest_Rate coef)*X1 + (Unemployment_Rate coef)*X2
# Stock_Index_Price = (1798.4) + (345.5)*X1 + (-250.1)*X2
# Replace X1 and X2 with particular values, say X1 = 1.5 and X2 = 5.8

# Step 6: Make a prediction


# For example, imagine that you want to predict the stock index price after you
# collected the following data:

# Interest Rate = 1.5 (i.e., X1 = 1.5)
# Unemployment Rate = 5.8 (i.e., X2 = 5.8)
Stock_Index_Price1 = (1798.4) + (345.5)*(1.5) + (-250.1)*(5.8)
Stock_Index_Price1
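
# The hand calculation above uses coefficients rounded to one decimal place.
# A minimal sketch of the same prediction with predict(), which uses the
# unrounded coefficients, so the result can differ slightly from the hand value.
# The names demo_data, demo_fit, demo_new, demo_pred and demo_int are
# illustrative and not part of the original script; the data are the
# 24 observations from Step 1.
demo_data <- data.frame(
  Interest_Rate = c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,
                    2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75),
  Unemployment_Rate = c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,
                        6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1),
  Stock_Index_Price = c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,
                        1047,965,943,958,971,949,884,866,876,822,704,719)
)
demo_fit <- lm(Stock_Index_Price ~ Interest_Rate + Unemployment_Rate, data = demo_data)
# new observation: column names must match the predictor names in the formula
demo_new <- data.frame(Interest_Rate = 1.5, Unemployment_Rate = 5.8)
# point prediction
demo_pred <- predict(demo_fit, newdata = demo_new)
demo_pred
# point prediction with a 95% prediction interval (columns fit, lwr, upr)
demo_int <- predict(demo_fit, newdata = demo_new, interval = "prediction", level = 0.95)
demo_int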

# Step 7: Some additional statistics to consider in the output summary:

# pick out the value of each statistic

# 1. Adjusted R-squared reflects the fit of the model,
#    where a higher value generally indicates a better fit

# 2. Intercept coefficient is the Y-intercept: the predicted Y when
#    every predictor equals 0

# 3. Interest_Rate coefficient is the change in Y due to a change of
#    one unit in the interest rate (everything else held constant)

# 4. Unemployment_Rate coefficient is the change in Y due to a
#    change of one unit in the unemployment rate
#    (everything else held constant)

# 5. Std. Error is the estimated standard deviation of each coefficient
#    estimate; a smaller value means a more precise estimate

# 6. Pr(>|t|) is the p-value. A p-value of less than 0.05 is
#    conventionally considered statistically significant

# Step 8: Extract all results in the regression model

# extract variables under the model object


names(model)
# [1] "coefficients" "residuals" "effects" "rank" "fitted.values" "assign"
# [7] "qr" "df.residual" "xlevels" "call" "terms" "model"

# extract fitted.values or forecast values


model$fitted.values

# extract model coefficients


model$coefficients

# extract the variables in the model


model$model
model$model$Stock_Index_Price
model$model$Interest_Rate
model$model$Unemployment_Rate

# Step 9: check for normality of the regression error terms


plot(model$residuals)
hist(model$residuals)
shapiro.test(model$residuals)
# the test rejects normality at the 5% level if p-value < 0.05

# Step 10: compute the root mean square error (RMSE)
# RMSE is in the same units as Stock_Index_Price; it is not a percentage


rmse_mlr <- sqrt(mean(model$residuals^2))
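
# Why the in-sample RMSE is smaller than the "Residual standard error" printed
# by summary(): a minimal sketch, assuming the 24 observations from Step 1.
# The names demo_data, demo_fit and demo_rmse are illustrative and not part of
# the original script. RMSE divides the residual sum of squares by n, the
# residual standard error divides it by the residual degrees of freedom,
# so RMSE = sigma * sqrt(df / n).
demo_data <- data.frame(
  Interest_Rate = c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,
                    2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75),
  Unemployment_Rate = c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,
                        6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1),
  Stock_Index_Price = c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,
                        1047,965,943,958,971,949,884,866,876,822,704,719)
)
# helper: root mean square error between actual and predicted values
demo_rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

demo_fit <- lm(Stock_Index_Price ~ Interest_Rate + Unemployment_Rate, data = demo_data)
demo_in_sample <- demo_rmse(demo_data$Stock_Index_Price, fitted(demo_fit))

demo_n     <- nrow(demo_data)            # number of observations
demo_df    <- df.residual(demo_fit)      # n minus number of estimated coefficients
demo_sigma <- summary(demo_fit)$sigma    # residual standard error

# the two quantities agree up to floating-point tolerance
all.equal(demo_in_sample, demo_sigma * sqrt(demo_df / demo_n))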

# Step 11: compute the RMSE of two other models,
# Random Forest and Support Vector regression
# note: the formula Stock_Index_Price ~ . uses every other column of
# filedata as a predictor (Year and Month too), not only the two rates

library(randomForest)
random_forest <- randomForest(Stock_Index_Price ~ ., data = filedata, ntree = 5000)
random_forest_prediction <- predict(random_forest, filedata)
residual <- random_forest_prediction - filedata$Stock_Index_Price
rmse_rf <- sqrt(mean(residual^2))

library(e1071)
svr <- svm(Stock_Index_Price ~ ., data = filedata)
# grid search over epsilon and cost (tune() uses 10-fold cross-validation by default)
OptModelsvm <- tune(svm, Stock_Index_Price ~ ., data = filedata,
                    ranges = list(epsilon = seq(0, 1, 0.1), cost = 1:100))
BestModel <- OptModelsvm$best.model
svr_prediction <- predict(BestModel, filedata)
residual <- svr_prediction - filedata$Stock_Index_Price
rmse_svr <- sqrt(mean(residual^2))

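# present model root mean square error for all models
# (sample output from one run; random forest and tune() involve random
# sampling, so your values will differ. These are errors on the same
# data that were used to fit the models)

c(rmse_mlr,rmse_rf,rmse_svr)
#[1] 66.00463 42.85300 17.79912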
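
# The RMSE values above are computed on the data used for fitting, which tends
# to favour flexible models. A minimal sketch of an out-of-sample check using
# leave-one-out cross-validation, shown for the linear model only and with
# base R only. This is an added illustration, not the author's method; the
# names demo_data, demo_loo_pred, demo_loo_rmse and demo_in_rmse are
# illustrative. The same loop could in principle wrap the other two models.
demo_data <- data.frame(
  Interest_Rate = c(2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,
                    2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75),
  Unemployment_Rate = c(5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,
                        6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1),
  Stock_Index_Price = c(1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,
                        1047,965,943,958,971,949,884,866,876,822,704,719)
)
demo_n <- nrow(demo_data)
demo_loo_pred <- numeric(demo_n)
for (i in seq_len(demo_n)) {
  # fit on all rows except row i, then predict the held-out row i
  fit_i <- lm(Stock_Index_Price ~ Interest_Rate + Unemployment_Rate,
              data = demo_data[-i, ])
  demo_loo_pred[i] <- predict(fit_i, newdata = demo_data[i, ])
}
demo_loo_rmse <- sqrt(mean((demo_data$Stock_Index_Price - demo_loo_pred)^2))

# in-sample RMSE of the same model, for comparison
demo_full_fit <- lm(Stock_Index_Price ~ Interest_Rate + Unemployment_Rate,
                    data = demo_data)
demo_in_rmse <- sqrt(mean(residuals(demo_full_fit)^2))

# the leave-one-out value is the larger of the two
c(in_sample = demo_in_rmse, leave_one_out = demo_loo_rmse)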

# Step 12: Plot Stock_Index_Price and the predicted values
# MLR model versus RandomForest Model versus Support Vector model

num_obs <- length(Stock_Index_Price)
# x_vals is only an evenly spaced plotting position for each observation
x_vals <- seq(from = 0, to = 2000, length.out = num_obs)
par(mfrow = c(1,3))
plot(x_vals, Stock_Index_Price, xlab="Multiple Linear Regression")
lines(x_vals, multiple_linear_prediction, col = "blue", lwd=2)

plot(x_vals, Stock_Index_Price, xlab="Random Forest Regression")
lines(x_vals, random_forest_prediction, col = "green", lwd=2)

plot(x_vals, Stock_Index_Price, xlab="Support Vector Regression")
lines(x_vals, svr_prediction, col = "red", lwd=2)
par(mfrow = c(1,1))

# Another display using ggplot2

data <- as.data.frame(cbind(x_vals,Stock_Index_Price))
colnames(data) <- c("X","Y")

library(ggplot2)

# Multiple Linear Regression Predictions
title = paste0("Multiple Linear Regression RMSE = ", round(rmse_mlr,2))
ggplot2::ggplot() +
  ggplot2::geom_point(data = data, size = 2,
                      ggplot2::aes(x = X, y = Y, color = "Stock_Index_Price")) +
  ggplot2::geom_line(data = data, size = 2, alpha = 0.7,
                     ggplot2::aes(x = X, y = multiple_linear_prediction,
                                  color = "Predicted with MLR")) +
  ggtitle(title)

# Random Forest Predictions
title = paste0("Random Forest Regression RMSE = ", round(rmse_rf,2))
ggplot2::ggplot() +
  ggplot2::geom_point(data = data, size = 2,
                      ggplot2::aes(x = X, y = Y, color = "Stock_Index_Price")) +
  ggplot2::geom_line(data = data, size = 2, alpha = 0.7,
                     ggplot2::aes(x = X, y = random_forest_prediction,
                                  color = "Predicted with RandomForest")) +
  ggtitle(title)

# Support Vector Regression Predictions
title = paste0("Support Vector Regression RMSE = ", round(rmse_svr,2))
ggplot2::ggplot() +
  ggplot2::geom_point(data = data, size = 2,
                      ggplot2::aes(x = X, y = Y, color = "Stock_Index_Price")) +
  ggplot2::geom_line(data = data, size = 2, alpha = 0.7,
                     ggplot2::aes(x = X, y = svr_prediction,
                                  color = "Predicted with Support Vector Regression")) +
  ggtitle(title)

# Your Group Assignment, to be submitted as a Word file

# Conduct a multiple regression analysis for the given dataset
# (marketing dataset)

data <- read.csv("E:/Advance Models/marketing.csv")
data <- data[-1]   # drop the first column
head(data)
dim(data)

# Your problem: Develop the 3 regression models and get
# the corresponding rmse

# Response variable: sales
# Predictor variable1 = youtube advertising
# Predictor variable2 = facebook advertising
# Predictor variable3 = newspaper advertising
