DAR Descriptive

The document is a mid-term examination for a data analytics course using R. It contains 4 questions asking students to: 1) Analyze bank index data and patient satisfaction data using correlation plots and multiple linear regression; 2) Analyze closing prices of stock market indices using box plots, correlation plots, and multiple linear regression models; 3) Derive a regression equation and make a prediction based on data about seat belt usage and traffic deaths; 4) Apply logistic regression to IPL (Indian Premier League) sports data and compare two models.

Uploaded by

ganesh gowtham

THIAGARAJAR SCHOOL OF MANAGEMENT

MADURAI

DATA ANALYTICS USING R – MID TERM EXAMINATION

Duration: 1 Hour 50 Minutes    Marks: 30

Instructions

1. Write your own answers and do not copy from others (negative marks apply to
all who copy or share answers).
2. For Part B questions, students should upload a single file containing the answers.
3. Time management is a must for business students, i.e. no additional time will be
provided (evidence is required even for genuine cases).

Part B – Answer any 3 questions (3 x 10 = 30 marks)

1. a. Read the Bank NIFTY indices for the recent FY.

i. Find the correlation between the variables (1 Mark)
ii. Use corrplot to plot the correlation values (2 Marks)
iii. Write your inferences (2 Marks)

b. Patient Satisfaction. A hospital administrator wished to study the relation between
patient satisfaction (Y) and the patient’s age (X1, in years), severity of illness (X2, an
index) and anxiety level (X3, an index). The administrator randomly selected 46
patients and fitted a multiple linear regression model, for which the output is shown
below.
Predictor    Coef       SE Coef
Constant     158.49     18.13
X1           -1.1416    0.2148
X2           -0.4420    0.4920
X3           -13.470    7.100

R-Sq = 68.2%

i) Write the least squares regression model relating Y to X1, X2 and X3. (1.5 Marks)

ii) How does a patient’s age influence his/her satisfaction? (1.5 Marks)

a) For 1 year increase in age, predicted satisfaction increases by 1.14.
b) For 1 year increase in age, predicted satisfaction decreases by 0.44 controlling for X2 and X3.
c) For 1 year increase in age, predicted satisfaction decreases by 5.31 controlling for X2 and X3.
d) For 1 year increase in age, predicted satisfaction decreases by 1.14.
e) For 1 year increase in age, predicted satisfaction decreases by 13.47 controlling for X2 and X3.
f) For 1 year increase in age, predicted satisfaction decreases by 1.14 controlling for X2 and X3.
iii) Demonstrate the predict() function with your own test data for the above model (2 Marks).
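
As a rough cross-check for part (b), the fitted equation can be evaluated by hand from the output table; this is the same arithmetic that R's predict() carries out on a fitted lm object. The sketch below is written in Python rather than R. The function name and the test patient (age 50, severity 45, anxiety 2.2) are made up for illustration and are not data from the study.

```python
# Coefficients copied from the regression output table in question 1(b).
INTERCEPT = 158.49
B_AGE = -1.1416       # X1: age in years
B_SEVERITY = -0.4420  # X2: severity of illness (index)
B_ANXIETY = -13.470   # X3: anxiety level (index)


def predict_satisfaction(age, severity, anxiety):
    """Evaluate Y-hat = b0 + b1*X1 + b2*X2 + b3*X3 for one patient."""
    return INTERCEPT + B_AGE * age + B_SEVERITY * severity + B_ANXIETY * anxiety


# Hypothetical test patient (values invented for illustration only).
base = predict_satisfaction(age=50, severity=45, anxiety=2.2)
# Same patient one year older, with X2 and X3 held fixed.
older = predict_satisfaction(age=51, severity=45, anxiety=2.2)

print(round(base, 3))
print(round(older - base, 4))
```

The difference between the two predictions equals the X1 coefficient, which is what "controlling for X2 and X3" means in the answer options: only age changes between the two calls.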

2. Read the Closing Price of NIFTY FMCG & NIFTY Realty for the recent FY (POC).

1. Draw a box plot and corrplot of any 3 variables and write your inferences (4 Marks)
2. Create any two models (using MLR) and justify which model is better (4 Marks)
3. Demonstrate the predict() function with your own test data (2 Marks).

3. Seat belts: An Australian study conducted in 2018 reported that failure to wear seat belts
led to 10,500 deaths in the previous year and that the number of deaths would decrease by
1,200 for every 3 percentage point gain in seat belt usage. It is also estimated that the number
of deaths would be 40,000 if nobody wore seat belts.

i) Write down the Regression Equation for the above data. Explain the notations
clearly. (3 Marks)

ii) Based on your answer above, what would be the predicted number of deaths in a
year if 65% of the people wear seat belts? (2 Marks)
a) 39740 b) 130500 c) 14000 d) 26000 e) 39860

iii) True/False: Here X and Y have a positive correlation coefficient and hence a
positive association pattern. (1 Mark)

iv) Suppose the observed number of deaths in 2018 was 9755 and it is estimated that
80% of people wore seat belts that year as a result of an aggressive awareness
campaign by the Government. Then the residual will be _____________________
(4 Marks)
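
One way to read the study's figures: take X as the percentage of people wearing seat belts and Y as the predicted number of deaths per year. The intercept is then 40,000 and the slope is -1200/3 = -400 deaths per percentage point. The Python sketch below works through the arithmetic under that reading; the function and variable names are illustrative.

```python
INTERCEPT = 40_000   # predicted deaths if nobody wears a seat belt (X = 0)
SLOPE = -1200 / 3    # change in deaths per 1 percentage point of usage


def predicted_deaths(usage_pct):
    """Y-hat = 40000 - 400 * X, with X in percentage points (0 to 100)."""
    return INTERCEPT + SLOPE * usage_pct


def residual(observed, usage_pct):
    """Residual = observed value minus predicted value."""
    return observed - predicted_deaths(usage_pct)


at_65 = predicted_deaths(65)    # part (ii): 65% seat belt usage
res_2018 = residual(9755, 80)   # part (iv): 9755 observed deaths at 80% usage

print(at_65)
print(res_2018)
```

Under this reading the prediction at 65% usage is 14,000 deaths, and the 2018 residual is 9755 - 8000 = 1755. The negative slope also shows that X and Y move in opposite directions.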

4. Apply your own creativity (with the context) using logistic regression (IPL RAWDATA).

a. Build at least 2 models using logistic regression, justify which model is best and
write your inferences (10 Marks).
