Assignment 2

This document outlines an assignment for a statistical learning course involving analyzing various datasets using R. It includes 5 questions analyzing datasets on household children, car insurance claims, stock prices, housing prices, and developing regression models. Students are asked to test distributions, calculate probabilities, determine best fitting distributions, estimate parameters, perform regression analyses, and evaluate assumptions and predictions. The deadline for submission is February 16, 2024.

Uploaded by

CHAI TZE ANN

Term 2, 2023/2024

DSA211 Statistical Learning with R

Assignment 2

Deadline of Submission: by 2pm on 16 February, 2024 (Friday)

Use R functions to solve the following problems:

1. A survey recorded the number of children 18 years old or younger who lived in 800
households. The data are stored in Children2024A.csv.

(a) Test, at α = 0.05, whether the data set does not follow a Poisson distribution.
(b) Based on the Poisson model found in part (a), find the probability that a randomly
selected household has at least two children 18 years old or younger.
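
A minimal sketch of one way to approach Question 1, assuming a chi-square goodness-of-fit test with the Poisson mean estimated from the data. The real file is not available here, so the vector `children` is simulated and its name is hypothetical. Grouping counts into "4 or more" is also an assumption, made to keep expected counts reasonably large.

```r
# Simulated stand-in for Children2024A.csv (hypothetical data and name).
set.seed(1)
children <- rpois(800, lambda = 1.2)

n <- length(children)
lambda_hat <- mean(children)   # maximum-likelihood estimate of the Poisson mean

# Group the counts into 0, 1, 2, 3 and "4 or more".
k_max <- 4
observed <- as.vector(table(factor(pmin(children, k_max), levels = 0:k_max)))
probs <- c(dpois(0:(k_max - 1), lambda_hat),
           ppois(k_max - 1, lambda_hat, lower.tail = FALSE))  # P(X >= 4)
expected <- n * probs

# Chi-square statistic; df = cells - 1 - (one estimated parameter).
chi_sq <- sum((observed - expected)^2 / expected)
df <- length(observed) - 1 - 1
p_value <- pchisq(chi_sq, df, lower.tail = FALSE)

# Part (b): P(X >= 2) = 1 - P(X <= 1) under the fitted Poisson model.
p_at_least_two <- ppois(1, lambda_hat, lower.tail = FALSE)

p_value
p_at_least_two
```

A p-value below 0.05 would be evidence against the Poisson model. With the real data, `children` would be read from the CSV file instead of simulated.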

2. At a local insurance company, 150 claim amounts (in thousands of dollars) of car
insurance policies were randomly selected. The claim amounts are recorded and stored in
CarInsurance2024A.csv.

(a) Determine which distribution (normal, lognormal, or exponential) is the best choice
to model the above data.
(b) Plot graphs to show how well the data set fits the best model found in part (a).
(c) What are the estimated parameters of the best model found in part (a)?
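
A minimal sketch of one way to approach Question 2, assuming maximum-likelihood fits from `MASS::fitdistr()` compared by AIC. The assignment does not name a selection criterion, so AIC is an assumption here, and the vector `claims` is simulated, hypothetical data standing in for CarInsurance2024A.csv.

```r
library(MASS)  # fitdistr() for maximum-likelihood fitting

# Simulated stand-in for the 150 claim amounts (hypothetical data and name).
set.seed(2)
claims <- rlnorm(150, meanlog = 1, sdlog = 0.5)

# Part (a): fit the three candidate distributions and compare by AIC.
fits <- list(
  normal      = fitdistr(claims, "normal"),
  lognormal   = fitdistr(claims, "lognormal"),
  exponential = fitdistr(claims, "exponential")
)
aic <- sapply(fits, AIC)
best <- names(which.min(aic))   # smallest AIC is preferred

# Part (c): estimated parameters of the preferred model.
est <- fits[[best]]$estimate

# Part (b): histogram of the data with the fitted density overlaid.
dens <- switch(best,
  normal      = function(x) dnorm(x, est["mean"], est["sd"]),
  lognormal   = function(x) dlnorm(x, est["meanlog"], est["sdlog"]),
  exponential = function(x) dexp(x, est["rate"])
)
hist(claims, freq = FALSE, main = paste("Fitted", best, "density"))
curve(dens(x), add = TRUE, lwd = 2)

aic
est
```

A Q-Q plot against the fitted distribution would be a natural second graph for part (b).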

3. You are given the data set SingTel2024A.csv. The data file records the stock price of
Singapore Telecommunications (SingTel) Limited (code Z74.SI), in the AdjClose column,
from 6 December 2022 to 5 December 2023. Suppose an investment holds 50,000
SingTel shares on 5 December 2023 at a price of $2.30 per share.

(a) Assuming the daily return rate is normally distributed, calculate the one-day 98.5%
VaR for the investment.
(b) Using the historical approach, without any assumption about the distribution of the
daily return rate, calculate the one-day 98.5% VaR for the investment.
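
A minimal sketch of both VaR calculations in Question 3. It assumes simple (not log) daily returns, which the assignment does not specify, and uses a simulated, hypothetical price series `adj_close` in place of the AdjClose column.

```r
# Simulated stand-in for the AdjClose column (hypothetical data and name).
set.seed(3)
adj_close <- 2.30 * exp(cumsum(rnorm(250, mean = 0, sd = 0.01)))

# Simple daily return rates: (P_t - P_{t-1}) / P_{t-1}.
returns <- diff(adj_close) / head(adj_close, -1)

position <- 50000 * 2.30   # value of the investment in dollars
conf_level <- 0.985

# Part (a): parametric VaR, using the 1.5% quantile of the fitted normal.
var_normal <- -position * qnorm(1 - conf_level, mean(returns), sd(returns))

# Part (b): historical VaR, using the empirical 1.5% quantile of the returns.
var_hist <- -position * quantile(returns, 1 - conf_level, names = FALSE)

var_normal
var_hist
```

Both figures are reported as positive dollar losses, so the sign of the lower-tail quantile is flipped.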

4. Based on the data set SingTel2024A.csv, we would like to develop a simple linear
regression model to predict the daily price range (the difference between the daily high
price and the daily low price) from the daily trade volume. At the 0.05 level of
significance, determine whether this simple linear regression model should be applied to
this data set. If not, justify your answer. If yes, construct the simple linear regression model.
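
A minimal sketch of one way to approach Question 4, assuming the decision rests on a t-test of the slope at the 0.05 level. The column names `High`, `Low` and `Volume` are hypothetical, and the data frame `singtel` is simulated rather than read from SingTel2024A.csv.

```r
# Simulated stand-in for SingTel2024A.csv (hypothetical columns and data).
set.seed(4)
n <- 250
singtel <- data.frame(Volume = runif(n, 1e7, 5e7),
                      Low    = runif(n, 2.2, 2.4))
singtel$High <- singtel$Low + 0.01 + 1e-9 * singtel$Volume +
  abs(rnorm(n, mean = 0, sd = 0.005))

# Response: daily price range = daily high minus daily low.
singtel$Range <- singtel$High - singtel$Low

fit <- lm(Range ~ Volume, data = singtel)

# p-value of the t-test for H0: slope = 0.
slope_p <- summary(fit)$coefficients["Volume", "Pr(>|t|)"]
use_model <- slope_p < 0.05

slope_p
if (use_model) coef(fit)   # intercept and slope of the fitted line
```

A residual plot and a normal Q-Q plot of `resid(fit)` would help support the justification, since a significant slope alone does not confirm the regression assumptions.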

5. The HDB2024A.csv data set consists of a random sample of 60 recently sold HDB
flats in Singapore. The selling price (Price) and the assessed value (Value), both in
dollars, were recorded in the data set. Suppose that we want to develop a model to predict
selling price based on assessed value (each flat had been assessed at full value one year
prior to the study).

(a) Estimate the simple linear regression equation by the least squares method.
(b) Construct a scatter plot with the estimated simple linear regression equation.
(c) Perform a residual analysis on your results and evaluate the regression assumptions.
(d) At the 0.05 level of significance, is there evidence of a linear relationship between
the assessed value and the selling price?
(e) Construct a 95% confidence interval estimate of the population slope.
(f) Construct a 95% prediction interval of the selling price for an assessed value of 700
thousand dollars.
(g) Construct a 95% confidence interval estimate of the average selling price for an
assessed value of 750 thousand dollars.
(h) Is it appropriate to use the model to predict the selling price for an assessed value
of 1 million dollars? Justify your answer.
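
A minimal sketch of the steps in Question 5 using base R. The data frame `hdb` is simulated and hypothetical, although it uses the Price and Value column names given in the question. With the real data it would be read from HDB2024A.csv, and the numbers would differ.

```r
# Simulated stand-in for HDB2024A.csv (hypothetical data).
set.seed(5)
hdb <- data.frame(Value = runif(60, 400000, 900000))
hdb$Price <- 20000 + 1.05 * hdb$Value + rnorm(60, mean = 0, sd = 25000)

# (a) Least squares fit of Price on Value.
fit <- lm(Price ~ Value, data = hdb)

# (b) Scatter plot with the fitted line.
plot(Price ~ Value, data = hdb)
abline(fit)

# (c) Residual analysis: residuals vs fitted values and a normal Q-Q plot.
par(mfrow = c(1, 2))
plot(fitted(fit), resid(fit)); abline(h = 0, lty = 2)
qqnorm(resid(fit)); qqline(resid(fit))

# (d) t-test of H0: slope = 0.
slope_p <- summary(fit)$coefficients["Value", "Pr(>|t|)"]

# (e) 95% confidence interval for the population slope.
slope_ci <- confint(fit, "Value", level = 0.95)

# (f) 95% prediction interval at an assessed value of $700,000.
pred_700 <- predict(fit, data.frame(Value = 700000),
                    interval = "prediction", level = 0.95)

# (g) 95% confidence interval for the mean price at $750,000.
mean_750 <- predict(fit, data.frame(Value = 750000),
                    interval = "confidence", level = 0.95)

# (h) Check whether $1,000,000 lies inside the observed range of Value.
in_range <- 1e6 >= min(hdb$Value) && 1e6 <= max(hdb$Value)
```

If `in_range` is `FALSE` for the real data, a prediction at 1 million dollars would be an extrapolation beyond the sampled assessed values.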

-END-
