
STAT 4540 Homework 1 Solution

1 ISLR 2.4.1
(a) We expect the performance of a flexible statistical learning method to be better. A more flexible
approach can fit the data more closely, and with an extremely large sample size this closer fit can be
achieved without overfitting, so it will outperform an inflexible approach.
(b) We expect the performance of a flexible statistical learning method to be worse. A flexible method
would overfit the small number of observations.
(c) We expect the performance of a flexible statistical learning method to be better. When the true
relationship is highly non-linear, an inflexible model cannot capture it; the extra degrees of freedom
of a flexible model allow a better fit.
(d) We expect the performance of a flexible statistical learning method to be worse. Flexible methods
will fit to the noise in the error terms and thus increase the variance.
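The four cases above can be illustrated with a small simulation (my own sketch, not part of the assignment): a high-df smoothing spline plays the flexible method and simple linear regression the inflexible one, with a non-linear truth as in (c).

```r
# Sketch: flexible (smoothing spline, df = 20) vs. inflexible (linear model)
# test MSE under a non-linear truth, for large and small n.
set.seed(1)
sim_mse <- function(n, df = 20) {
  x  <- runif(n)
  y  <- sin(2 * pi * x) + rnorm(n, sd = 0.3)       # training data
  xt <- runif(500)
  yt <- sin(2 * pi * xt) + rnorm(500, sd = 0.3)    # test data
  flex   <- smooth.spline(x, y, df = df)
  inflex <- lm(y ~ x)
  c(flexible   = mean((yt - predict(flex, xt)$y)^2),
    inflexible = mean((yt - predict(inflex, data.frame(x = xt)))^2))
}
sim_mse(n = 1000)  # large n, non-linear truth: the flexible fit wins (a, c)
sim_mse(n = 30)    # small n: the gap shrinks or reverses as the spline overfits (b)
```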

2 ISLR 2.4.4
(a)

• Response variable: health status (ill/healthy); predictors: age, blood pressure, gender, etc. The
goal is prediction.
• Response variable: outcome of a test (fail/pass); predictors: difficulty of the test, preparation
time, etc. The goal is prediction.
• Response variable: poll result (approve/against); predictors: socioeconomic status, education level,
age, etc. The goal is both inference and prediction.
(b)
• Response variable: stock market price; predictors: previous prices. The goal is prediction.
• Response variable: income; predictors: age, education level, gender, etc. The goal is both prediction
and inference.
• Response variable: working hours of a bulb; predictors: brand, price, type, etc. The goal is
prediction.
(c)

• Marketing survey.
• Movie rating.
• Symptoms of diseases.

3 ISLR 2.4.7
(a)

d(x1, x0) = sqrt(3^2) = 3
d(x2, x0) = sqrt(2^2) = 2
d(x3, x0) = sqrt(1^2 + 3^2) ≈ 3.2
d(x4, x0) = sqrt(1^2 + 2^2) ≈ 2.2
d(x5, x0) = sqrt(1^2 + 1^2) ≈ 1.4
d(x6, x0) = sqrt(1^2 + 1^2 + 1^2) ≈ 1.7.

(b) Our prediction is Green, since the single nearest neighbor is obs. 5, with Y = Green.
(c) Our prediction is Red, since the 3 nearest neighbors are obs. 5, 6, and 2, with corresponding
Y = Green, Red, Red.
(d) Small. A small K gives a flexible fit that can trace a highly non-linear decision boundary,
whereas a large K averages over many points and therefore produces a smoother, more nearly linear
boundary.
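The hand computation above can be checked in a few lines of R, using the six observations from the ISLR 2.4.7 table (test point x0 at the origin):

```r
# The six training observations from the ISLR 2.4.7 table.
train <- data.frame(
  X1 = c(0, 2, 0, 0, -1, 1),
  X2 = c(3, 0, 1, 1,  0, 1),
  X3 = c(0, 0, 3, 2,  1, 1),
  Y  = c("Red", "Red", "Red", "Green", "Green", "Red")
)
x0 <- c(0, 0, 0)

# Euclidean distance from each observation to x0
d <- sqrt(colSums((t(as.matrix(train[, 1:3])) - x0)^2))
round(d, 2)
# 3.00 2.00 3.16 2.24 1.41 1.73

# K = 1: the nearest neighbor is obs 5, so predict Green
train$Y[order(d)[1]]

# K = 3: neighbors are obs 5, 6, 2 (Green, Red, Red), so majority vote is Red
names(which.max(table(train$Y[order(d)[1:3]])))
```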

4 ISLR 2.4.8
R code and output:
##(a)
college <- read.csv("College.csv", header = TRUE)

##(b)
rownames(college) = college[,1]
fix(college)

college=college[,-1]
fix(college)

##(c)
#(i)
summary(college)
Private        Apps           Accept          Enroll       Top10perc       Top25perc      F.Undergrad
No :212   Min.   :   81   Min.   :   72   Min.   :  35   Min.   : 1.00   Min.   :  9.0   Min.   :  139
Yes:565   1st Qu.:  776   1st Qu.:  604   1st Qu.: 242   1st Qu.:15.00   1st Qu.: 41.0   1st Qu.:  992
          Median : 1558   Median : 1110   Median : 434   Median :23.00   Median : 54.0   Median : 1707
          Mean   : 3002   Mean   : 2019   Mean   : 780   Mean   :27.56   Mean   : 55.8   Mean   : 3700
          3rd Qu.: 3624   3rd Qu.: 2424   3rd Qu.: 902   3rd Qu.:35.00   3rd Qu.: 69.0   3rd Qu.: 4005
          Max.   :48094   Max.   :26330   Max.   :6392   Max.   :96.00   Max.   :100.0   Max.   :31643
P.Undergrad Outstate Room.Board Books Personal PhD
Min. : 1.0 Min. : 2340 Min. :1780 Min. : 96.0 Min. : 250 Min. : 8.00
1st Qu.: 95.0 1st Qu.: 7320 1st Qu.:3597 1st Qu.: 470.0 1st Qu.: 850 1st Qu.: 62.00
Median : 353.0 Median : 9990 Median :4200 Median : 500.0 Median :1200 Median : 75.00
Mean : 855.3 Mean :10441 Mean :4358 Mean : 549.4 Mean :1341 Mean : 72.66
3rd Qu.: 967.0 3rd Qu.:12925 3rd Qu.:5050 3rd Qu.: 600.0 3rd Qu.:1700 3rd Qu.: 85.00
Max. :21836.0 Max. :21700 Max. :8124 Max. :2340.0 Max. :6800 Max. :103.00
Terminal S.F.Ratio perc.alumni Expend Grad.Rate
Min. : 24.0 Min. : 2.50 Min. : 0.00 Min. : 3186 Min. : 10.00
1st Qu.: 71.0 1st Qu.:11.50 1st Qu.:13.00 1st Qu.: 6751 1st Qu.: 53.00
Median : 82.0 Median :13.60 Median :21.00 Median : 8377 Median : 65.00
Mean : 79.7 Mean :14.09 Mean :22.74 Mean : 9660 Mean : 65.46
3rd Qu.: 92.0 3rd Qu.:16.50 3rd Qu.:31.00 3rd Qu.:10830 3rd Qu.: 78.00
Max. :100.0 Max. :39.80 Max. :64.00 Max. :56233 Max. :118.00

#(ii)
pairs(college[,1:10])

#(iii)
# Private is read in as a character column under R >= 4.0, so convert it
# to a factor to get side-by-side boxplots.
plot(as.factor(college$Private), college$Outstate)

#(iv)
Elite = rep("No", nrow(college))
Elite[college$Top10perc>50] = "Yes"
Elite = as.factor(Elite)
college = data.frame(college, Elite)

summary(college$Elite)
# No Yes
# 699 78
plot(college$Elite, college$Outstate)

# (v)
par(mfrow=c(2,2))
hist(college$Apps)
hist(college$perc.alumni, col=2)
hist(college$S.F.Ratio, col=3, breaks=10)
hist(college$Expend, breaks=100)

# (vi)
par(mfrow=c(1,2))
plot(college$Outstate, college$Grad.Rate)
# High tuition correlates with high graduation rate.
plot(college$Top10perc, college$Grad.Rate)
# Colleges with the most students from the top 10% of their high school
# class don't necessarily have the highest graduation rate.

5 MovieLens Data
R code:
dat <- read.table("u.data", sep = "\t")
colnames(dat) <- c("usrid", "movid", "rating", "timestamp")

dat$time <- as.POSIXct(dat$timestamp, origin="1970-01-01", tz="UTC")

rating <- dat[ c(1:3, 5)]

# Each line of u.item is one record with 24 pipe-delimited fields;
# read whole lines, then split on "|".
dat <- scan("u.item", what = character(), sep = "\n", encoding = "UTF-8")
movdf <- matrix(NA_character_, length(dat), 24)
for (ii in seq_along(dat)) {
  movdf[ii, ] <- strsplit(dat[ii], split = "\\|")[[1]]
}

colnames(movdf) <- c("movid", "title", "reldate", "vidreldate", "URL",
                     "unknown", "Action", "Adventure", "Animation",
                     "Children", "Comedy", "Crime", "Documentary", "Drama", "Fantasy",
                     "FilmNoir", "Horror", "Musical", "Mystery", "Romance",
                     "SciFi", "Thriller", "War", "Western")

movie <- matrix(as.numeric(movdf[ , c(1, 6:24)]), nrow = nrow(movdf), ncol = length(c(1, 6:24)))
colnames(movie) <- colnames(movdf)[c(1, 6:24)]
head(movie)

action <- rowSums(movie[,c("Action", "Adventure", "Fantasy", "Horror", "SciFi", "Thriller")])
children <- rowSums(movie[,c("Animation", "Children")])
comedy <- rowSums(movie[,c("Comedy"), drop=FALSE])
drama <- rowSums(movie[,c("Crime", "Documentary", "Drama", "FilmNoir",
"Musical", "Mystery", "Romance", "War", "Western")])

genre <- cbind(action, children, comedy, drama)

# Expand to one row per rating record
genre <- genre[rating$movid, , drop = FALSE]

logit <- function(p) log(p / (1 - p))

# Proportion of each movie's ratings that are above 3, on the logit scale.
pop1 <- aggregate(rating$rating > 3, by = list(rating$movid), sum)
pop2 <- aggregate(rating$rating > 0, by = list(rating$movid), sum)
pop <- logit(pop1[, 2] / pop2[, 2])
head(pop, n = 5)
# 0.8962438 -0.4502010 -0.4989912 0.3381129 -0.1865860
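As a self-contained sanity check of this popularity transform (toy ratings I made up, not the real u.data file), note that a movie with no rating above 3 maps to -Inf on the logit scale:

```r
logit <- function(p) log(p / (1 - p))

# Toy data: movie 1 has ratings {4, 5, 2}, movie 2 has ratings {1, 3}.
toy <- data.frame(movid  = c(1, 1, 1, 2, 2),
                  rating = c(4, 5, 2, 1, 3))
n_pos <- aggregate(toy$rating > 3, by = list(movid = toy$movid), sum)
n_tot <- aggregate(toy$rating > 0, by = list(movid = toy$movid), sum)
logit(n_pos$x / n_tot$x)
# movie 1: logit(2/3) = log(2) ~ 0.69; movie 2: logit(0) = -Inf
```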

popular <- pop[rating$movid]


x <- cbind(1, genre, popular)
y <- rating$rating

head(x, n = 5)
#        action children comedy drama    popular
# [1,] 1      0        2      1     0  1.1564319
# [2,] 1      3        0      0     0  1.4160205
# [3,] 1      1        0      0     0 -2.4849066
# [4,] 1      1        0      1     1  0.2231436
# [5,] 1      1        0      0     2  0.4519851

head(y, n=5)
# 3 3 1 2 1
