Stats216 hw4
Problem 1
Recall the body dataset from problem 4 of Homework 3. In that problem we used PCR and PLSR to predict
someone's weight. Here we will revisit this objective using bagging and random forests. Start by setting aside
200 observations from your dataset to act as a test set, using the remaining 307 as a training set. Ideally, you
would be able to use your code from Homework 3 to select the same test set as you did on that problem.
load("/Users/alexnutkiewicz/Downloads/body.rdata")
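The Homework 3 code that creates the train/test split is not reproduced here. Below is a minimal sketch of one way to set it up, assuming (as the code that follows does) that body.rdata provides the predictor matrix X and the data frame Y containing Weight, and reusing the object names testing, OJtrain, and OJtest that appear in the later chunks. The particular random split will differ from the Homework 3 one unless the same seed and code are reused.

set.seed(36)
# hold out 200 of the 507 observations as a test set; the remaining 307 form the training set
testing = sample(1:nrow(X), 200)
OJtrain = X[-testing, ]  # training predictors
OJtest = X[testing, ]    # test predictors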
Using the ranger package in CRAN, use Bagging and Random Forests to predict the weights in the test set, so
that you have two sets of predictions. Then answer the following questions:
library(ranger)
predictData = data.frame(Weight = Y$Weight[-testing], OJtrain)
# a random forest considers only a random subset of the predictors (mtry) at each split,
# whereas bagging considers all of them
rf.Weight = ranger(Weight ~ ., data = predictData, mtry = sqrt(ncol(X)), importance = "impurity")
The MSE and % variance explained reported by ranger are based on out-of-bag estimates. Because mtry is set to
the square root of the number of predictors, only that many randomly chosen variables are considered as split
candidates at each split.
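To see this directly, a small sketch using the fitted object's fields, which ranger stores as out-of-bag estimates for regression forests:

rf.Weight$prediction.error  # out-of-bag mean squared error
rf.Weight$r.squared         # out-of-bag proportion of variance explained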
a. Produce a plot of test MSE (as in Figure 8.8 in the text) as a function of number of trees for Bagging and
Random Forests. You should produce one plot with two curves: one corresponding to Bagging and the
other to Random Forests.
rfMSE = rep(0, 300)
bagMSE = rep(0, 300)
for (i in 1:300) {
  rf.Weight = ranger(Weight ~ ., data = predictData, mtry = sqrt(ncol(X)), num.trees = i,
                     importance = "impurity")
  rfPreds = predict(rf.Weight, data = OJtest)$predictions
  bag.Weight = ranger(Weight ~ ., data = predictData, mtry = ncol(X), num.trees = i,
                      importance = "impurity")
  bagPreds = predict(bag.Weight, data = OJtest)$predictions
  rfMSE[i] = mean((rfPreds - Y$Weight[testing])^2)
  bagMSE[i] = mean((bagPreds - Y$Weight[testing])^2)
}
allData = data.frame(num = 1:300, rfMSE, bagMSE)
library(ggplot2)
library(reshape2)
#id.vars = variable you want to keep constant
iceCream = melt(allData, id.vars = "num")
ggplot(iceCream, aes(x = num, y = value, col = variable)) + geom_line() + labs(title =
"Test MSE of Random Forest and Bagging", x = "Number of Trees", y = "Test MSE")
b. Which variables does your random forest identify as most important? How do they compare with the most
important variables as identified by Bagging?
set.seed(36)
rf.Weight = ranger(Weight ~ ., data = predictData, mtry = sqrt(ncol(X)), importance = "impurity")
ranger::importance(rf.Weight)
Based on the values above, we see that the most important variables in the random forest are Chest.Girth,
Forearm.Girth, and Waist.Girth. For bagging, the most important variables are similarly Forearm.Girth, Waist.Girth,
and Chest.Girth (see the sketch below for how the bagging importances can be extracted). So the two methods
identify essentially the same variables as most important.
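A minimal sketch of how those bagging importances could be pulled out for comparison; bagging here is just ranger with mtry equal to the number of predictors, and the top-5 cutoff is arbitrary:

set.seed(36)
bag.Weight = ranger(Weight ~ ., data = predictData, mtry = ncol(X), importance = "impurity")
sort(ranger::importance(bag.Weight), decreasing = TRUE)[1:5]  # top bagging variables
sort(ranger::importance(rf.Weight), decreasing = TRUE)[1:5]   # top random forest variables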
c. Compare the test error of your random forest (with 500 trees) against the test errors of the three methods
you evaluated in Homework 3. Does your random forest make better predictions than your predictions from
Homework 3?
set.seed(36)
rf.500 = predict(rf.Weight, data = OJtest)$predictions
rf.MSE500 = mean(((rf.500 - Y$Weight[testing])^2))
rf.MSE500
## [1] 9.160329
The test MSE values of our PCR, PLSR, and lasso predictions from Homework 3 were 8.562, 7.952, and 8.141,
respectively. Compared with those, our random forest (test MSE of about 9.16) does somewhat worse. This is not
too surprising: the body measurements predict weight well through a largely linear relationship, so the
regularized linear methods from Homework 3 have an edge over the tree-based fit here.
d. The ranger() function uses 500 as the default number of trees. For this problem, is 500 enough trees? How
can you tell?
set.seed(36)
rf.2000 = ranger(Weight ~ ., data = predictData, num.trees = 2000, importance = "impurity")
preds2000 = predict(rf.2000, data = OJtest)$predictions
rf.MSE2000 = mean((preds2000 - Y$Weight[testing])^2)
rf.MSE2000
## [1] 9.295284
After running the model with 2000 trees, we get essentially the same (in fact slightly worse) test MSE, so the error
has clearly stabilized and 500 trees is enough; adding more trees does not improve the model.
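Another quick check, sketched here using ranger's out-of-bag prediction error rather than the held-out test set: refit at several tree counts and see whether the OOB error has flattened out by 500 trees.

set.seed(36)
# OOB mean squared error at a few values of num.trees
sapply(c(100, 250, 500, 1000, 2000), function(B) {
  ranger(Weight ~ ., data = predictData, num.trees = B, importance = "impurity")$prediction.error
})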
Problem 2
Here we explore the maximal margin classifier on a toy data set.
a. We are given n = 7 observations in p = 2 dimensions. For each observation, there is an associated class
label. Sketch the observations.
b. Sketch the optimal separating hyperplane, and provide the equation for this hyperplane (of the form β0 +
β1X1 + β2X2 = 0).
Based on the plot, we see that the optimal separating hyperplane passes between (3.32, 1.28) and (3.30, 3.29).
Estimating from the plot, we find that the equation for this line is 1 + X1 − X2 = 0.
c. Describe the classification rule for the maximal margin classifier. It should be something along the lines of
"Classify to Red if β0 + β1X1 + β2X2 > 0, and classify to Green otherwise." Provide the values for β0, β1,
and β2.
Using the equation from part (b), the coefficients are β0 = 1, β1 = 1, and β2 = −1.
d. On your sketch, indicate the margin for the maximal margin hyperplane. How wide is the margin?
All of the points lying on the edge of the margin are support vectors for the maximal margin classifier.
f. Argue that a slight movement of the seventh observation would not affect the maximal margin hyperplane.
Moving the 7th observation slightly would not affect the maximal margin hyperplane, because the hyperplane
depends only on a small set of observations: the support vectors. Since this observation lies far from the
hyperplane and from the support vectors, a slight movement has essentially no impact.
g. Sketch a hyperplane that is not the optimal separating hyperplane, and provide the equation for this
hyperplane.
h. Draw an additional observation on the plot so that the two classes are no longer separable by a
hyperplane.
Oh no! As we can see, there is a new 8th point infiltrating the classified red region, so the two classes can no
longer be separated by a hyperplane.
Problem 3
This problem involves the OJ data set which is part of the ISLR package.
a. Create a training set containing a random sample of 535 observations, and a test set containing the
remaining observations.
library(ISLR)
summary(OJ)
set.seed(36)
train = sample(1:nrow(OJ), 535)
OJ.train = OJ[train,]
OJ.test = OJ[-train,]
b. Fit a (linear) support vector classifier to the training data using cost=0.05, with Purchase as the response
and the other variables as predictors. Use the summary() function to produce summary statistics about the
SVM, and describe the results obtained.
library(e1071)
set.seed(36)
OJ.svm = svm(Purchase~., data=OJ.train, kernel = "linear", cost = 0.05)
summary(OJ.svm)
##
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "linear",
## cost = 0.05)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 0.05
## gamma: 0.05555556
##
## Number of Support Vectors: 262
##
## ( 131 131 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
This summary shows that the model uses 262 of the 535 training observations as support vectors for the 2
classes, 131 from each class.
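The confusion matrices below compare observed and predicted Purchase on the training and test sets. A sketch of how train.table and test.table could be computed from the fitted model (the intermediate names trainPreds and testPreds are placeholders):

#training
trainPreds = predict(OJ.svm, newdata = OJ.train)
train.table = table(obs = OJ.train$Purchase, pred = trainPreds)
train.table
#testing
testPreds = predict(OJ.svm, newdata = OJ.test)
test.table = table(obs = OJ.test$Purchase, pred = testPreds)
test.table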
## pred
## obs CH MM
## CH 267 46
## MM 46 176
1-sum(diag(train.table))/sum(train.table)
## [1] 0.1719626
## pred
## obs CH MM
## CH 302 38
## MM 49 146
1-sum(diag(test.table))/sum(test.table)
## [1] 0.1626168
Based on the above classification results, the training and test error rates are reasonably close: the training
accuracy is about 82.8% (error 17.2%) and the test accuracy is about 83.7% (error 16.3%).
d. Use the tune() function to select an optimal cost. Consider values in the range 0.01 to 10.
set.seed(36)
svmTune = tune(svm,Purchase~.,data=OJ.train,
ranges=list(cost=c(.01,.02,.05,.1,.2,.5,1,2,5,10)),kernel="linear")
summary(svmTune)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 0.2
##
## - best performance: 0.1718728
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01 0.1832285 0.04877999
## 2 0.02 0.1868623 0.04710860
## 3 0.05 0.1850454 0.04910731
## 4 0.10 0.1756813 0.04388344
## 5 0.20 0.1718728 0.04230377
## 6 0.50 0.1811670 0.03851654
## 7 1.00 0.1792802 0.04410072
## 8 2.00 0.1793152 0.03971763
## 9 5.00 0.1830538 0.04395067
## 10 10.00 0.1812020 0.04424007
plot(svmTune)
By tuning our SVM, we find that many different values of cost give similar cross-validated error, roughly around
17–18%. The best parameter selected is cost = 0.2.
e. Compute the training and test error rates using this new value for cost.
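A sketch of refitting the support vector classifier at the tuned cost (0.2, from the output above) and building the training confusion matrix used below; the test-set code is shown further down.

set.seed(36)
newOJsvm = svm(Purchase ~ ., data = OJ.train, kernel = "linear", cost = 0.2)
#training
newOJtrainPreds = predict(newOJsvm, newdata = OJ.train)
newtrain.table = table(obs = OJ.train$Purchase, pred = newOJtrainPreds)
newtrain.table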
## pred
## obs CH MM
## CH 263 50
## MM 39 183
1-sum(diag(newtrain.table))/sum(newtrain.table)
## [1] 0.1663551
#testing
newOJtestPreds = predict(newOJsvm, newdata = OJ.test)
newtest.table = table(obs = OJ.test$Purchase, pred = newOJtestPreds)
newtest.table
## pred
## obs CH MM
## CH 298 42
## MM 44 151
1-sum(diag(newtest.table))/sum(newtest.table)
## [1] 0.1607477
By fitting a new SVM with the tuned cost and re-running the training and test predictions, we find slightly improved
classification accuracy (about 83.4% on the training set and 83.9% on the test set). However, as we saw in part d,
the different costs give similar amounts of error, which explains why the improvement isn't large.
f. Repeat parts (b) through (e) using a support vector machine with a radial kernel. Use the default value for
gamma.
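The radial-kernel results below could be produced by a chunk like the following sketch (same cost = 0.05 as in part b; the object name radsvm is a placeholder, while radsvm.train.table and radsvm.test.table are the confusion matrices whose output is shown):

set.seed(36)
radsvm = svm(Purchase ~ ., data = OJ.train, kernel = "radial", cost = 0.05)
summary(radsvm)
#training
radsvm.train.table = table(obs = OJ.train$Purchase, pred = predict(radsvm, newdata = OJ.train))
radsvm.train.table
#testing
radsvm.test.table = table(obs = OJ.test$Purchase, pred = predict(radsvm, newdata = OJ.test))
radsvm.test.table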
##
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "radial",
## cost = 0.05)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 0.05
## gamma: 0.05555556
##
## Number of Support Vectors: 447
##
## ( 222 225 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
## pred
## obs CH MM
## CH 286 27
## MM 88 134
1-sum(diag(radsvm.train.table))/sum(radsvm.train.table)
## [1] 0.2149533
## pred
## obs CH MM
## CH 316 24
## MM 95 100
1-sum(diag(radsvm.test.table))/sum(radsvm.test.table)
## [1] 0.2224299
set.seed(36)
radialSVM = tune(svm , Purchase~. , data=OJ.train ,
ranges=list(cost=c(.01,.02,.05,.1,.2,.5,1,2,5,10)), kernel="radial")
summary(radialSVM)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 1
##
## - best performance: 0.1866177
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01 0.4149196 0.05553547
## 2 0.02 0.4149196 0.05553547
## 3 0.05 0.2356045 0.06045365
## 4 0.10 0.2018868 0.05730822
## 5 0.20 0.1980084 0.06225772
## 6 0.50 0.1885744 0.05898148
## 7 1.00 0.1866177 0.05977225
## 8 2.00 0.1978686 0.05464430
## 9 5.00 0.1959818 0.05531844
## 10 10.00 0.2034242 0.06308126
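A sketch of refitting the radial SVM at the tuned cost (cost = 1, from the output above) to produce radialOJsvm, which is used in the predictions below:

set.seed(36)
radialOJsvm = svm(Purchase ~ ., data = OJ.train, kernel = "radial", cost = 1)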
#training
radialOJtrainPreds = predict(radialOJsvm, newdata = OJ.train)
radtrain.table = table(obs = OJ.train$Purchase, pred = radialOJtrainPreds)
radtrain.table
## pred
## obs CH MM
## CH 275 38
## MM 49 173
1-sum(diag(radtrain.table))/sum(radtrain.table)
## [1] 0.1626168
#testing
radialOJtestPreds = predict(radialOJsvm, newdata = OJ.test)
radtest.table = table(obs = OJ.test$Purchase, pred = radialOJtestPreds)
radtest.table
## pred
## obs CH MM
## CH 312 28
## MM 62 133
1-sum(diag(radtest.table))/sum(radtest.table)
## [1] 0.1682243
Now, using a radial kernel with the tuned cost (cost = 1), we get a training accuracy of about 83.7% and a test accuracy of about 83.2%. The training fit improves on the untuned radial model, but the test error is essentially comparable to (in fact slightly higher than) that of the tuned linear classifier.
g. Repeat parts (b) through (e) using a support vector machine with a polynomial kernel of degree 2.
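The degree-2 polynomial results below could come from a chunk like this sketch (cost = 0.05 as before; the object name polysvm is a placeholder, while polysvm.train.table and polysvm.test.table are the confusion matrices shown):

set.seed(36)
polysvm = svm(Purchase ~ ., data = OJ.train, kernel = "polynomial", degree = 2, cost = 0.05)
summary(polysvm)
#training
polysvm.train.table = table(obs = OJ.train$Purchase, pred = predict(polysvm, newdata = OJ.train))
polysvm.train.table
#testing
polysvm.test.table = table(obs = OJ.test$Purchase, pred = predict(polysvm, newdata = OJ.test))
polysvm.test.table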
##
## Call:
## svm(formula = Purchase ~ ., data = OJ.train, kernel = "polynomial",
## degree = 2, cost = 0.05)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: polynomial
## cost: 0.05
## degree: 2
## gamma: 0.05555556
## coef.0: 0
##
## Number of Support Vectors: 427
##
## ( 212 215 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
## pred
## obs CH MM
## CH 307 6
## MM 172 50
1-sum(diag(polysvm.train.table))/sum(polysvm.train.table)
## [1] 0.3327103
## pred
## obs CH MM
## CH 330 10
## MM 164 31
1-sum(diag(polysvm.test.table))/sum(polysvm.test.table)
## [1] 0.3252336
set.seed(36)
polySVM = tune(svm, Purchase ~ ., data = OJ.train,
               ranges = list(cost = c(.01, .02, .05, .1, .2, .5, 1, 2, 5, 10)),
               kernel = "polynomial", degree = 2)
summary(polySVM)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 5
##
## - best performance: 0.19413
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01 0.4149196 0.05553547
## 2 0.02 0.3851153 0.05320332
## 3 0.05 0.3458071 0.05901895
## 4 0.10 0.3233054 0.06237070
## 5 0.20 0.2858840 0.06399353
## 6 0.50 0.2278127 0.06978194
## 7 1.00 0.2221523 0.07434275
## 8 2.00 0.2016073 0.07409238
## 9 5.00 0.1941300 0.06193817
## 10 10.00 0.2016073 0.06151487
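A sketch of refitting the polynomial SVM at the tuned cost (cost = 5, from the output above) to produce polyOJsvm, which is used below:

set.seed(36)
polyOJsvm = svm(Purchase ~ ., data = OJ.train, kernel = "polynomial", degree = 2, cost = 5)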
#training
polyOJtrainPreds = predict(polyOJsvm, newdata = OJ.train)
polytrain.table = table(obs = OJ.train$Purchase, pred = polyOJtrainPreds)
polytrain.table
## pred
## obs CH MM
## CH 286 27
## MM 55 167
1-sum(diag(polytrain.table))/sum(polytrain.table)
## [1] 0.153271
#testing
polyOJtestPreds = predict(polyOJsvm, newdata = OJ.test)
polytest.table = table(obs = OJ.test$Purchase, pred = polyOJtestPreds)
polytest.table
## pred
## obs CH MM
## CH 309 31
## MM 69 126
1-sum(diag(polytest.table))/sum(polytest.table)
## [1] 0.1869159
Looking at our results, we see that the tuned polynomial kernel (test error of about 18.7%) doesn't quite beat the
tuned radial kernel from earlier (test error of about 16.8%).
h. Repeat parts (b) through (e) using a linear support vector machine, applied to an expanded feature set
consisting of linear and all possible quadratic terms for the predictors. How does this compare to the
polynomial kernel both conceptually and in terms of the results for this problem?
set.seed(315)
# The quadratic expansion is handled by polym() inside the svm() formula below, so quadOJ
# just needs the original predictors and the response. The commented lines are earlier
# attempts at building the squared columns by hand (in apply(), 2 means apply over columns).
#quadratic = apply(OJ[,-1], 2, as.numeric)
#quadratic = quadratic^2
#quadOJ = cbind(quadratic, OJ)
#newOJ = as.numeric(OJ[,-1])
#quadratic = do.call(poly, c(lapply(2:18, function(x) as.numeric(OJ[,x])), degree=2, raw=TRUE))
#quadOJ = cbind(quadratic, OJ$Purchase)
quadOJ = OJ
quadOJtrain = quadOJ[train,]
quadOJtest = quadOJ[-train,]
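The outputs that follow (the linear SVM on the expanded feature set with cost = 0.05, its tune() results, and the hquadtrain.table / hquadtest.table confusion matrices) could be produced by a chunk like this sketch. The object names hquadsvm, quadForm, and quadTune are placeholders; the polym() formula is the one shown in the Call below.

quadForm = Purchase ~ polym(PriceCH, PriceMM, DiscCH, DiscMM, LoyalCH, SalePriceMM, SalePriceCH,
                            PriceDiff, PctDiscMM, PctDiscCH, ListPriceDiff, degree = 2)
hquadsvm = svm(quadForm, data = quadOJtrain, kernel = "linear", cost = 0.05)
summary(hquadsvm)
quadTune = tune(svm, quadForm, data = quadOJtrain,
                ranges = list(cost = c(.01, .02, .05, .1, .2, .5, 1, 2, 5, 10)), kernel = "linear")
summary(quadTune)
#training
hquadtrain.table = table(obs = OJ.train$Purchase, pred = predict(hquadsvm, newdata = OJ.train))
hquadtrain.table
#testing
hquadtest.table = table(obs = OJ.test$Purchase, pred = predict(hquadsvm, newdata = OJ.test))
hquadtest.table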
##
## Call:
## svm(formula = Purchase ~ polym(PriceCH, PriceMM, DiscCH, DiscMM,
## LoyalCH, SalePriceMM, SalePriceCH, PriceDiff, PctDiscMM,
## PctDiscCH, ListPriceDiff, degree = 2), data = quadOJtrain,
## kernel = "linear", cost = 0.05)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 0.05
## gamma: 0.01298701
##
## Number of Support Vectors: 256
##
## ( 127 129 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 0.01
##
## - best performance: 0.2242837
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01 0.2242837 0.07316077
## 2 0.02 0.2673305 0.06020821
## 3 0.05 0.3026555 0.04910140
## 4 0.10 0.3009783 0.05457131
## 5 0.20 0.3046820 0.04299572
## 6 0.50 0.3066038 0.05260060
## 7 1.00 0.3028302 0.04732529
## 8 2.00 0.2990566 0.05611746
## 9 5.00 0.3008386 0.05382667
## 10 10.00 0.2989518 0.04491017
## pred
## obs CH MM
## CH 278 35
## MM 62 160
1-sum(diag(hquadtrain.table))/sum(hquadtrain.table)
## [1] 0.1813084
## pred
## obs CH MM
## CH 304 36
## MM 64 131
1-sum(diag(hquadtest.table))/sum(hquadtest.table)
## [1] 0.1869159
quadsvm = svm(Purchase ~ polym(PriceCH, PriceMM, DiscCH, DiscMM, LoyalCH, SalePriceMM, SalePriceCH,
                               PriceDiff, PctDiscMM, PctDiscCH, ListPriceDiff, degree = 2),
              data = quadOJtrain, kernel = "linear", cost = 0.01)
#training
quadOJtrainPreds = predict(quadsvm, newdata = OJ.train)
quadtrain.table = table(obs = OJ.train$Purchase, pred = quadOJtrainPreds)
quadtrain.table
## pred
## obs CH MM
## CH 275 38
## MM 65 157
1-sum(diag(quadtrain.table))/sum(quadtrain.table)
## [1] 0.1925234
#testing
quadOJtestPreds = predict(quadsvm, newdata = OJ.test)
quadtest.table = table(obs = OJ.test$Purchase, pred = quadOJtestPreds)
quadtest.table
## pred
## obs CH MM
## CH 301 39
## MM 65 130
1-sum(diag(quadtest.table))/sum(quadtest.table)
## [1] 0.1943925
Compared to the polynomial kernel, we see somewhat worse classification rates here. Conceptually, a linear SVM
on the expanded set of linear and quadratic features fits the same family of quadratic decision boundaries as a
degree-2 polynomial kernel; the kernel simply computes the expansion implicitly rather than constructing the
high-dimensional feature set explicitly, and differences in scaling and tuning between the two formulations can
lead to somewhat different results in practice, as we see on this problem.
i. Overall, which approach seems to give the best results on this data?
Overall, comparing the tuned models' test error rates (linear about 16.1%, radial about 16.8%, polynomial about 18.7%, quadratic expansion about 19.4%), the linear support vector classifier gives the best test results on this data, with the radial kernel close behind; the polynomial kernel fits the training data best but does not carry that advantage over to the test set.
Problem 4
Consider a dataset with n observations, xi ∈ Rp for i = 1, …, n. In this problem we show that the K-means
algorithm is guaranteed to converge, but not necessarily to the globally optimal solution.
a. At the beginning of each iteration of the K-means algorithm, we have K clusters C1, …, CK ⊂ Rp, and each
data point is assigned to the cluster with the nearest centroid (at this point, the centroids are not
necessarily equal to the mean of the data points assigned to the cluster). Show (according to the problem
specifications):
d. Give, as an example, a toy data set and a pair of initial centroids for which the 2-means algorithm does not
converge to the globally optimal minimum.
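One classic toy example of this phenomenon, sketched in R (not necessarily the example drawn in the attached Problem 4d pages): four points at the corners of a wide rectangle, with initial centroids placed so that 2-means (the standard Lloyd update) splits the points top/bottom instead of left/right and stops there.

x = matrix(c(0, 0,
             0, 1,
             10, 0,
             10, 1), ncol = 2, byrow = TRUE)
# Bad initialization: centroids at (5, 0) and (5, 1) assign points by height.
# The recomputed centroids are again (5, 0) and (5, 1), so the algorithm stops here.
bad = kmeans(x, centers = matrix(c(5, 0, 5, 1), ncol = 2, byrow = TRUE), algorithm = "Lloyd")
bad$cluster       # 1 2 1 2  (bottom vs. top)
bad$tot.withinss  # 100
# The global optimum splits left vs. right, with far smaller within-cluster sum of squares.
good = kmeans(x, centers = matrix(c(0, 0.5, 10, 0.5), ncol = 2, byrow = TRUE), algorithm = "Lloyd")
good$cluster      # 1 1 2 2  (left vs. right)
good$tot.withinss # 1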
Problem 4a
Problem 4b
Problem 4c
Problem 4d
Problem 4d