Spring 2024 Exam

ST201
Statistical Models and Data Analysis

Suitable for all candidates

Instructions to candidates

This paper contains FOUR questions. Answer THREE questions. Question 1 is compulsory, and
two of the other three questions need to be answered. Question 1 is worth 40 marks, and each
of Questions 2-4 is worth 30 marks. The total mark is 100. The numbers in brackets beside
each question indicate the marks available for that part of the question.

Candidates must not take the question paper away with them after the examination. Please
place the question paper within your answer booklet at the end of the examination.

Time allowed: Reading Time: None
              Writing Time: 2 hours
You are supplied with: Murdoch & Barnes Statistical Tables, 4th edition
                       Graph paper
You may also use: No additional materials
Calculators: Calculators are allowed in this exam

©LSE ST 2024/ST201 Page 1 of 12

1. Consider data on used cars for a car market in 2021. The data has 3,457 observations,
each corresponding to a used car on sale. The data contains the following eight variables.

• Year: Year when the car was first bought.
• Age: Number of years that the car has been used. It is calculated as Age = 2021 - Year,
where Year is the year when the car was first bought.
• Price: Price (in US dollars) at which the car is being sold.
• Usage: The mileage (km).
• Fuel: Fuel type of car (petrol/diesel/CNG/LPG).
• Seller: Whether the seller is an individual or a dealer (Individual/Dealer).
• Transmission: Gear transmission of the car (Automatic/Manual).
• Owner: Number of previous owners of the car (1/2/3/4 or more).

The goal is to understand how the price of a used car depends on its characteristics.

(a) Based on the above descriptions of the variables, explain their variable types. [4 marks]
(b) The variable Age is believed to be a key predictor of the sale price. We explore the
relationship between Age and Price.
i. Output 1(a) gives two scatter plots. Panel (a) plots Age versus Price, and Panel
(b) plots Age versus log(Price). Based on these two plots, comment on the
relationship between Age and Price. [3 marks]
ii. If you are asked to fit a simple linear regression model to understand the
relationship between Price and Age, would you choose Price or log(Price) as the
response variable? Give the reasons for your choice. [4 marks]
iii. Output 1(b) gives the R output for a simple linear regression model, which
regresses log(Price) onto Age. Interpret the estimated regression coefficient of
Age. [3 marks]
(c) The sale price of a car is believed to also depend on whether the seller is an individual
or a dealer.
i. Output 1(c) gives a boxplot for log(Price) versus Seller. Comment on the
relationship between Price and Seller. [2 marks]
ii. Output 1(d) gives the R output for a simple linear regression model, which
regresses log(Price) onto Seller. Interpret the estimated regression coefficient.
[2 marks]
(d) We then run multiple linear regression models, regressing log(Price) onto car
characteristics.
i. Output 1(e) gives the R output for a multiple linear regression model, which
includes all the variables. We notice that the regression coefficient for Year is not
obtained, as indicated by "NA". Explain why this happens. Give the name of this
issue and suggest a way to handle the issue. [4 marks]
ii. Output 1(f) gives the R output for a multiple linear regression model, which
includes all the available variables except for Year. Interpret the R² value of the
obtained model. [3 marks]
iii. Give the formula for calculating the adjusted R² value and explain why this value
is needed in addition to the R² value. [3 marks]

iv. Explain how dummy variables are created for the variable Owner in the model in
Output 1(f) and interpret the corresponding estimated coefficients. [4 marks]
v. Predict the sale price of a car in this market in 2021. The information about this
car is given in Output 1(g). [4 marks]
vi. Add an interaction term between Age and Seller into the regression model. Write
down the model equation and interpret the regression coefficient for the
interaction term. [4 marks]

Output 1(a)

Output 1(b)

> res1 = lm(log(Price)~Age, data = data.train)
> summary(res1)

Call:
lm(formula = log(Price) ~ Age, data = data.train)

Residuals:
     Min       1Q   Median       3Q      Max
-2.17813 -0.40370 -0.01431  0.28945  3.14327

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  5.283008   0.021939  240.80   <2e-16 ***
Age         -0.138769   0.002449  -56.67   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6035 on 3455 degrees of freedom
Multiple R-squared:  0.4817,  Adjusted R-squared:  0.4816
F-statistic:  3211 on 1 and 3455 DF,  p-value: < 2.2e-16
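
Because the response in Output 1(b) is log(Price), the Age slope acts multiplicatively on Price. The sketch below, which only reuses the printed estimate, converts the slope into a price ratio and a percentage change per extra year of Age. The variable names are illustrative.

```r
# Slope of Age from Output 1(b); the response is log(Price).
beta_age <- -0.138769

# Multiplicative change in Price associated with one extra year of Age.
ratio <- exp(beta_age)

# The same change expressed as a percentage.
pct_change <- 100 * (ratio - 1)

cat(sprintf("Price ratio per extra year of Age: %.4f\n", ratio))
cat(sprintf("Percentage change per extra year of Age: %.2f%%\n", pct_change))
```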

Output 1(c)

Output 1(d)

> res2 = lm(log(Price)~Seller, data = data.train)
> summary(res2)

Call:
lm(formula = log(Price) ~ Seller, data = data.train)

Residuals:
    Min      1Q  Median      3Q     Max
-3.1828 -0.5247 -0.0139  0.5167  2.8199

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)       4.60070    0.02725  168.85   <2e-16 ***
SellerIndividual -0.55617    0.03149  -17.66   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8028 on 3455 degrees of freedom
Multiple R-squared:  0.08283,  Adjusted R-squared:  0.08257
F-statistic:   312 on 1 and 3455 DF,  p-value: < 2.2e-16

Output 1(e)

> res3 = lm(log(Price)~., data = data.train)
> res3

Call:
lm(formula = log(Price) ~ ., data = data.train)

Coefficients:
       (Intercept)                 Age               Usage          FuelDiesel
         9.903e+00          -1.141e-01          -3.052e-07           6.123e-01
           FuelLPG          FuelPetrol    SellerIndividual  TransmissionManual
        -4.340e-02           9.409e-02          -2.008e-01          -7.662e-01
    Owner4_or_more              Owner2              Owner3                Year
        -1.106e-01          -5.020e-02          -9.845e-02                  NA
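
An NA coefficient of this kind can be reproduced on simulated data. The sketch below is an illustration under made-up values, not the exam data: it defines Age as an exact linear function of Year, as in the variable description, and fits both predictors in one model.

```r
set.seed(1)
n <- 50

# Simulated registration years; Age is an exact linear function of Year.
Year <- sample(2005:2020, n, replace = TRUE)
Age <- 2021 - Year

# Made-up response on the log scale, for illustration only.
log_price <- 10 - 0.1 * Age + rnorm(n, sd = 0.3)

# The design matrix has three columns (intercept, Age, Year) but only rank 2,
# so lm() cannot estimate the last redundant column and reports NA for it.
fit <- lm(log_price ~ Age + Year)
print(coef(fit))
print(fit$rank)
```

Dropping one of the two redundant predictors, as Output 1(f) does with Year, restores a full-rank design.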

Output 1(f)

> res4 = lm(log(Price)~ .-Year, data = data.train)
> summary(res4)

Call:
lm(formula = log(Price) ~ . - Year, data = data.train)

Residuals:
    Min      1Q  Median      3Q     Max
-1.7556 -0.2997 -0.0005  0.2941  2.4130

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)         9.903e+00  8.545e-02 115.894  < 2e-16 ***
Age                -1.141e-01  2.381e-03 -47.917  < 2e-16 ***
Usage              -3.052e-07  2.095e-07  -1.457  0.14532
FuelDiesel          6.123e-01  8.113e-02   7.547 5.67e-14 ***
FuelLPG            -4.340e-02  1.375e-01  -0.316  0.75236
FuelPetrol          9.409e-02  8.118e-02   1.159  0.24651
SellerIndividual   -2.008e-01  1.990e-02 -10.091  < 2e-16 ***
TransmissionManual -7.662e-01  2.730e-02 -28.070  < 2e-16 ***
Owner4_or_more     -1.106e-01  6.144e-02  -1.800  0.07201 .
Owner2             -5.020e-02  2.071e-02  -2.424  0.01539 *
Owner3             -9.845e-02  3.548e-02  -2.775  0.00555 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4736 on 3446 degrees of freedom
Multiple R-squared:  0.6816,  Adjusted R-squared:  0.6807
F-statistic: 737.8 on 10 and 3446 DF,  p-value: < 2.2e-16
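
The adjusted R-squared printed in Output 1(f) can be checked by hand. The sketch below assumes the standard definition, 1 - (1 - R²)(n - 1)/(n - p - 1), with n = 3457 observations and p = 10 estimated slopes, which matches the 3446 residual degrees of freedom in the output.

```r
# Values read from Output 1(f) and the data description.
r2 <- 0.6816   # Multiple R-squared (rounded as printed)
n  <- 3457     # number of observations
p  <- 10       # number of slope coefficients, excluding the intercept

# Adjusted R-squared penalises R-squared for the number of fitted slopes.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

cat(sprintf("Adjusted R-squared: %.4f\n", adj_r2))
```

Up to the rounding of the printed R-squared, this agrees with the value reported by summary().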

Output 1(g)

Age  Usage   Fuel     Seller Transmission Owner Year
  9 100000 Diesel Individual       Manual     1 2012
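
A fitted value for the car in Output 1(g) can be assembled from the coefficients in Output 1(f). The sketch below rests on two assumptions suggested by the coefficient names: Owner = 1 is the baseline level, and Diesel enters through FuelDiesel. Exponentiating the fitted log price is a simple back-transformation to the dollar scale and ignores any retransformation correction.

```r
# Coefficients copied from Output 1(f).
coefs <- c("(Intercept)" = 9.903, Age = -0.1141, Usage = -3.052e-07,
           FuelDiesel = 0.6123, FuelLPG = -0.0434, FuelPetrol = 0.09409,
           SellerIndividual = -0.2008, TransmissionManual = -0.7662,
           Owner4_or_more = -0.1106, Owner2 = -0.0502, Owner3 = -0.09845)

# Design row for the car in Output 1(g): Age 9, Usage 100000, Diesel,
# Individual seller, Manual transmission, Owner 1 (assumed baseline).
x <- c("(Intercept)" = 1, Age = 9, Usage = 100000,
       FuelDiesel = 1, FuelLPG = 0, FuelPetrol = 0,
       SellerIndividual = 1, TransmissionManual = 1,
       Owner4_or_more = 0, Owner2 = 0, Owner3 = 0)

# Fitted value on the log scale, then back-transformed to the price scale.
log_price_hat <- sum(coefs * x[names(coefs)])
price_hat <- exp(log_price_hat)

cat(sprintf("Fitted log(Price): %.4f\n", log_price_hat))
cat(sprintf("Back-transformed price: %.0f\n", price_hat))
```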

2. In this question, we continue to investigate the data in Question 1.

(a) Consider the multiple linear regression model in Output 1(f).

i. Output 2(a) gives the residual plot for this model. Name two assumptions
assessed by this plot and explain these two assumptions. [4 marks]
ii. Is there evidence of the violation of these two assumptions based on the plot in
Output 2(a)? Explain your answer. [4 marks]
iii. Explain the concept of outlier and give the name of the statistic that is used to
detect outliers. [3 marks]
iv. Explain the concept of high-leverage observation. [2 marks]
v. Output 2(b) shows a plot for the hat values. Are there observations that have
high leverage? Explain your answer. [2 marks]
(b) We perform model selection with the model in Output 1(f) being the full model.
i. Suppose that we use the backward stepwise selection procedure. How many
models will be searched through? [3 marks]
ii. Suppose that we use the best subset selection procedure. How many models
will be searched through? [3 marks]
iii. Name three evaluation criteria that can be used to compare the candidate
models. [3 marks]
iv. For each criterion you give in your answer to Question 2(b)iii, how would you
choose the model? Do you choose the model with the largest or smallest
criterion value? [3 marks]
v. Explain the concept of overfitting. [3 marks]
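
For Question 2(b), the number of models visited by each procedure depends on how many candidate predictors p are counted. The sketch below uses the usual textbook counts, 1 + p(p + 1)/2 for backward stepwise selection and 2^p for best subset selection. Whether p should be 10 (one per slope in Output 1(f)) or 6 (one per variable, treating each factor as a single term) is an assumption left to the reader, so both are shown.

```r
# Backward stepwise: the full model, then p, p - 1, ..., 1 candidate deletions
# at successive steps, giving 1 + p(p + 1)/2 fitted models in total.
n_backward <- function(p) 1 + p * (p + 1) / 2

# Best subset: every subset of the p predictors, including the empty model.
n_best_subset <- function(p) 2^p

for (p in c(6, 10)) {
  cat(sprintf("p = %d: backward stepwise = %d, best subset = %d\n",
              p, as.integer(n_backward(p)), as.integer(n_best_subset(p))))
}
```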

Output 2(a)

Output 2(b)

3. In this question, we analyse a dataset on diabetes. The goal is two-fold. First, we hope
to understand the relationship between having diabetes and diagnostic measurements
of health conditions, such as blood pressure and body mass index. Second, we hope
to predict whether a patient has diabetes based on the diagnostic measurements. The
following variables are available.

• Glucose: Two-hour plasma glucose concentration in an oral glucose tolerance test
(mg/dL)
• BloodPressure: Diastolic blood pressure (mmHg)
• SkinThickness: Triceps skin fold thickness (mm)
• Insulin: Two-hour serum insulin (μU/mL)
• BMI: Body mass index (weight in kg / (height in m)²)
• Age: Age (years)
• Diet: A measure of whether the person continuously followed a healthy diet routine;
1 if yes and 0 otherwise
• Diabetes: 1 if the person has diabetes and 0 otherwise

(a) Suppose that a researcher tries to understand whether continuously following a healthy
diet routine (measured by variable Diet) can reduce the risk of having diabetes.
i. Explain the concept of a confounding variable in the current context. [3 marks]
ii. Give an example of a potential confounding variable. Explain your answer. [2 marks]
(b) Output 3(a) gives the result of logistic regression, regressing Diabetes on the rest of
the variables.
i. Construct a 95% confidence interval for the coefficient for Diet and interpret this
confidence interval. You may need to use the supplied Murdoch & Barnes
Statistical Tables. [4 marks]
ii. Give the formula for calculating the deviance residuals and explain what the
deviance residuals can be used for. [4 marks]
iii. Given an observation whose health diagnostic measurements are given in Output
3(b), predict whether they have diabetes. Explain your answer. [3 marks]
(c) Suppose you include some polynomial terms of age and fit a new model based on
the training set. The result is shown in Output 3(c). Compare this model with the
one in Output 3(a) using the likelihood ratio test. Write down the test statistic, the
reference distribution, and the p-value. You may need to use the supplied Murdoch &
Barnes Statistical Tables. [3 marks]
(d) We now evaluate the two logistic regression models in Output 3(a) and Output 3(c)
in terms of their performance on a validation set. Output 3(d) gives two ROC curves,
with the solid one from the model in Output 3(a) and the dashed one from the model
in Output 3(c).
i. Explain how each point on an ROC curve is obtained. [4 marks]
ii. Give the name of a statistic that can be used for comparing the two models
based on their ROC curves. [3 marks]
(e) Linear discriminant analysis can also be used for predicting Diabetes. Describe the
assumptions of linear discriminant analysis. [4 marks]

Output 3(a)

> res.glm1 = glm(Diabetes~., family = binomial, data = train)
> summary(res.glm1)

Call:
glm(formula = Diabetes ~ ., family = binomial, data = train)

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)   -7.8797516  0.7521290 -10.477  < 2e-16 ***
Glucose        0.0327814  0.0039500   8.299  < 2e-16 ***
BloodPressure -0.0078977  0.0060381  -1.308    0.191
SkinThickness  0.0074762  0.0075869   0.985    0.324
Insulin       -0.0008366  0.0009917  -0.844    0.399
BMI            0.0687947  0.0164366   4.185 2.85e-05 ***
Age            0.0388568  0.0091365   4.253 2.11e-05 ***
Diet           0.0708684  0.2294000   0.309    0.757
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 793.94  on 613  degrees of freedom
Residual deviance: 607.04  on 606  degrees of freedom
AIC: 623.04
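
A Wald-type interval for the Diet coefficient follows directly from the estimate and standard error in Output 3(a). The sketch below assumes the usual large-sample normal approximation and uses qnorm() in place of the printed statistical tables. Exponentiating the endpoints puts the interval on the odds-ratio scale.

```r
# Estimate and standard error for Diet, copied from Output 3(a).
est <- 0.0708684
se  <- 0.2294000

# 97.5% standard normal quantile (about 1.96) for a two-sided 95% interval.
z <- qnorm(0.975)

# Wald interval on the log-odds scale, then on the odds-ratio scale.
ci <- est + c(-1, 1) * z * se
or_ci <- exp(ci)

cat(sprintf("95%% CI for the Diet coefficient: (%.4f, %.4f)\n", ci[1], ci[2]))
cat(sprintf("95%% CI for the odds ratio: (%.4f, %.4f)\n", or_ci[1], or_ci[2]))
```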

Output 3(b)

Glucose BloodPressure SkinThickness Insulin  BMI Age Diabetes Diet
    148            72            35       0 33.6  50        1    0
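
The fitted probability for the observation in Output 3(b) can be computed from the coefficients in Output 3(a). The sketch below evaluates the linear predictor and applies the logistic function. The 0.5 classification cut-off is an assumption, since the exam text does not fix a threshold.

```r
# Coefficients copied from Output 3(a).
coefs <- c("(Intercept)" = -7.8797516, Glucose = 0.0327814,
           BloodPressure = -0.0078977, SkinThickness = 0.0074762,
           Insulin = -0.0008366, BMI = 0.0687947,
           Age = 0.0388568, Diet = 0.0708684)

# Predictor values from Output 3(b), with 1 for the intercept.
x <- c("(Intercept)" = 1, Glucose = 148, BloodPressure = 72,
       SkinThickness = 35, Insulin = 0, BMI = 33.6, Age = 50, Diet = 0)

# Linear predictor (log-odds) and the implied probability of diabetes.
eta <- sum(coefs * x[names(coefs)])
p_hat <- plogis(eta)

# Assumed decision rule: classify as diabetic when p_hat exceeds 0.5.
pred_class <- as.integer(p_hat > 0.5)

cat(sprintf("Linear predictor: %.4f\n", eta))
cat(sprintf("Fitted probability: %.4f\n", p_hat))
cat(sprintf("Predicted class at cut-off 0.5: %d\n", pred_class))
```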

Output 3(c)

> res.glm2 = glm(Diabetes~.+ I(Age^2)+I(Age^3), family = binomial, data = train)
> summary(res.glm2)

Call:
glm(formula = Diabetes ~ . + I(Age^2) + I(Age^3), family = binomial,
    data = train)

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)   -1.089e+01  3.776e+00  -2.885 0.003915 **
Glucose        3.442e-02  4.099e-03   8.398  < 2e-16 ***
BloodPressure -7.921e-03  6.272e-03  -1.263 0.206629
SkinThickness  6.539e-03  7.678e-03   0.852 0.394411
Insulin       -7.388e-04  1.019e-03  -0.725 0.468357
BMI            5.824e-02  1.694e-02   3.438 0.000586 ***
Age            1.583e-01  2.984e-01   0.531 0.595745
Diet           8.956e-02  2.356e-01   0.380 0.703893
I(Age^2)       7.940e-04  7.479e-03   0.106 0.915450
I(Age^3)      -3.579e-05  5.924e-05  -0.604 0.545737
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 793.94  on 613  degrees of freedom
Residual deviance: 577.61  on 604  degrees of freedom
AIC: 597.61
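
The two nested logistic models can be compared through their residual deviances. The sketch below takes the deviances and degrees of freedom printed in Output 3(a) and Output 3(c), forms the likelihood ratio statistic as their difference, and refers it to a chi-squared distribution. pchisq() stands in for the printed statistical tables.

```r
# Residual deviance and degrees of freedom from Output 3(a) (smaller model).
dev_small <- 607.04
df_small  <- 606

# Residual deviance and degrees of freedom from Output 3(c) (larger model).
dev_large <- 577.61
df_large  <- 604

# Likelihood ratio statistic and its degrees of freedom
# (the number of extra parameters in the larger model).
lr_stat <- dev_small - dev_large
df_diff <- df_small - df_large

# Upper-tail probability under the chi-squared reference distribution.
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

cat(sprintf("LR statistic: %.2f on %d degrees of freedom\n", lr_stat, df_diff))
cat(sprintf("p-value: %.3g\n", p_value))
```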

Output 3(d)
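
Each point on an ROC curve corresponds to one classification threshold. The sketch below uses a small made-up set of labels and fitted probabilities, not the exam's validation data, and computes the false positive rate and true positive rate at each threshold. The helper name roc_point is hypothetical.

```r
# Returns the (FPR, TPR) pair obtained by classifying as positive
# every observation whose fitted probability is at least `threshold`.
roc_point <- function(prob, y, threshold) {
  pred <- as.integer(prob >= threshold)
  tpr <- sum(pred == 1 & y == 1) / sum(y == 1)  # sensitivity
  fpr <- sum(pred == 1 & y == 0) / sum(y == 0)  # 1 - specificity
  c(fpr = fpr, tpr = tpr)
}

# Made-up labels and fitted probabilities for illustration.
y    <- c(1, 1, 1, 0, 0, 0, 1, 0)
prob <- c(0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.1)

# Sweeping the threshold from high to low traces the curve from (0, 0) to (1, 1).
thresholds <- c(Inf, sort(prob, decreasing = TRUE))
roc <- t(sapply(thresholds, function(t) roc_point(prob, y, t)))
print(cbind(threshold = thresholds, roc))
```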

4. Consider a regression problem with a continuous response variable Y and a p-dimensional
predictor vector X = (X_1, ..., X_p)⊤.

(a) A regression model is trained based on a training set. Let the obtained regression
function be f̂, which maps a p-dimensional vector to a real value. Suppose you are
given a test dataset, denoted by (x̃_i, ỹ_i), i = 1, ..., m. Write down the formula for
calculating the test mean squared error based on the test dataset. [3 marks]
(b) Explain the meanings of the following terms in a prediction problem:
• bias,
• variance,
• irreducible error.
Explain how the test error can be decomposed based on these quantities. [5 marks]
(c) A regression tree is trained on the car sales data in Questions 1 and 2, and the result
is shown in Output 4(a).
i. You are given a test observation whose predictor vector is shown in Output 4(b).
Predict the value of Y for this test observation. [3 marks]
ii. Explain why tree pruning is typically needed. [4 marks]
iii. Output 4(c) gives a plot from performing a ten-fold cross-validation for tree
pruning. Explain the meaning of tree size on the X-axis of the plot. [3 marks]
iv. Based on Output 4(c), what tree size would you choose? Explain your answer.
[3 marks]
(d) Bagging and random forest are further applied to the data. Output 4(d) gives
out-of-bag error for the bagging and random forest results.
i. Explain the meaning of m in the legend in Output 4(d). [3 marks]
ii. Explain the concept of out-of-bag error. [4 marks]
iii. Based on this output, which method would you choose for making predictions?
Explain your answer. [2 marks]
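
The test mean squared error in Question 4(a) is the average squared difference between the test responses and the predictions of the fitted function on the test predictors. A minimal sketch, assuming the fitted function is available as an R function, is given below. The helper name test_mse and the simulated data are illustrative only.

```r
# Average squared prediction error of f_hat over the test set.
test_mse <- function(f_hat, x_test, y_test) {
  mean((y_test - f_hat(x_test))^2)
}

# Illustration on simulated data: fit on a training set, evaluate on a test set.
set.seed(1)
train <- data.frame(x = runif(100))
train$y <- 1 + 2 * train$x + rnorm(100, sd = 0.5)
test <- data.frame(x = runif(40))
test$y <- 1 + 2 * test$x + rnorm(40, sd = 0.5)

fit <- lm(y ~ x, data = train)
f_hat <- function(newdata) predict(fit, newdata = newdata)

cat(sprintf("Test MSE: %.4f\n", test_mse(f_hat, test, test$y)))
```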

Output 4(a)

Output 4(b)

Age    Price Usage   Fuel     Seller Transmission Owner Year
 14 821.9178 70000 Petrol Individual       Manual     1 2007

Output 4(c)

Output 4(d)
