Assignment_02

This assignment requires students to conduct regression analyses using provided data on GMAT scores and GPAs, as well as simulate data for linear regression models. Students must interpret coefficients, predict values, and assess interaction effects among various predictors related to starting salaries. Additionally, the assignment involves generating data with varying noise levels and analyzing the impact on model fit and confidence intervals.

Uploaded by

zhiqianhuang813
Copyright
© All Rights Reserved

BU.510.650 Data Analytics
Dr. Ruxian Wang, Johns Hopkins Carey Business School

Assignment #2

Attention: Please prepare two files for each homework assignment: a .docx or .pdf file with your
answers to each question, including figures, and a .R file with your R script. File names should
be “LastName FirstName number.docx” and “LastName FirstName number.R”. All assignments
should be submitted via our course website.

1. The grade point averages (GPA) of 12 graduating MBA students and their GMAT scores, taken
before entering the MBA program, are given below. Use the GMAT scores as a predictor of
GPA, and conduct a regression of GPA on GMAT scores.

x=GMAT y=GPA
560 3.20
540 3.44
520 3.70
580 3.10
520 3.00
620 4.00
660 3.38
630 3.83
550 2.67
550 2.75
600 2.33
537 3.75

(a) Obtain and interpret the coefficient of determination $R^2$.
(b) Calculate the fitted value for the second person.
(c) Test whether GMAT is an important predictor variable (use significance level 0.05).
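
One way to set up this regression in R is sketched below. The variable names (gmat, gpa, fit, and so on) are illustrative choices, not part of the assignment, and the interpretation of each quantity is left to the written answer.

```r
# Data from the table above (variable names are illustrative assumptions).
gmat <- c(560, 540, 520, 580, 520, 620, 660, 630, 550, 550, 600, 537)
gpa  <- c(3.20, 3.44, 3.70, 3.10, 3.00, 4.00, 3.38, 3.83, 2.67, 2.75, 2.33, 3.75)

# Simple linear regression of GPA on GMAT.
fit <- lm(gpa ~ gmat)
fit_summary <- summary(fit)

# (a) Coefficient of determination.
r_squared <- fit_summary$r.squared

# (b) Fitted value for the second student (GMAT = 540).
fitted_second <- unname(fitted(fit)[2])

# (c) p-value of the two-sided t-test for the slope; compare against 0.05.
slope_p_value <- coef(fit_summary)["gmat", "Pr(>|t|)"]

print(r_squared)
print(fitted_second)
print(slope_p_value)
```

In a simple linear regression, r_squared equals the squared sample correlation between gmat and gpa, which gives a quick consistency check via cor(gmat, gpa)^2.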

2. Suppose we have a data set with five predictors: $X_1$ = GPA, $X_2$ = IQ, $X_3$ = Gender (1
for Female and 0 for Male), $X_4$ = interaction between GPA and IQ, and $X_5$ = interaction
between GPA and Gender. The response is starting salary after graduation (in thousands of
dollars). Suppose we use least squares to fit the model and get $\hat{\beta}_0 = 50$, $\hat{\beta}_1 = 20$, $\hat{\beta}_2 = 0.07$,
$\hat{\beta}_3 = 35$, $\hat{\beta}_4 = 0.01$, $\hat{\beta}_5 = -10$.

(a) Which answer is correct, and why?


i. For a fixed value of IQ and GPA, males earn more on average than females.
ii. For a fixed value of IQ and GPA, females earn more on average than males.
iii. For a fixed value of IQ and GPA, males earn more on average than females provided
that the GPA is high enough.
iv. For a fixed value of IQ and GPA, females earn more on average than males provided
that the GPA is high enough.
(b) Predict the salary of a female with IQ of 110 and a GPA of 4.0.
(c) True or false: Since the coefficient for the GPA/IQ interaction term is very small, there
is very little evidence of an interaction effect. Justify your answer.
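
The fitted model can be written as a small R function for checking the arithmetic. This is a minimal sketch: predict_salary and gender_gap are hypothetical helper names, and the coefficients are the ones stated in the problem. At a fixed GPA and IQ, the female-minus-male difference reduces to 35 − 10 × GPA, because every term that does not involve Gender cancels.

```r
# Fitted coefficients as stated in the problem.
b <- c(b0 = 50, b1 = 20, b2 = 0.07, b3 = 35, b4 = 0.01, b5 = -10)

# Hypothetical helper: predicted starting salary in thousands of dollars.
# female is 1 for Female and 0 for Male, matching the coding of X3.
predict_salary <- function(gpa, iq, female) {
  unname(b["b0"] + b["b1"] * gpa + b["b2"] * iq + b["b3"] * female +
           b["b4"] * gpa * iq + b["b5"] * gpa * female)
}

# Hypothetical helper: female-minus-male difference at the same GPA and IQ.
# The IQ value is arbitrary here because it cancels in the difference.
gender_gap <- function(gpa, iq = 110) {
  predict_salary(gpa, iq, 1) - predict_salary(gpa, iq, 0)
}

# Part (b): female, IQ 110, GPA 4.0.
print(predict_salary(gpa = 4.0, iq = 110, female = 1))

# Part (a): sign of the gap at a few GPA values.
print(gender_gap(c(3.0, 3.5, 4.0)))
```

The sign of gender_gap changes at GPA = 3.5, which is the threshold to reason about in part (a).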

3. In this exercise you will create some simulated data and fit simple linear regression
models to it. Call set.seed(1) before starting part (a) to ensure
consistent results. (Hint: rnorm(n, mean = a, sd = b) generates n random values from a normal
distribution with mean a and standard deviation b; e.g., rnorm(100, mean = 10, sd = 5) returns
a vector of 100 values, each drawn from a normal distribution with mean 10 and standard deviation
5.)
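
The hint can be tried out directly, as in the short sketch below. Note that rnorm() is parameterized by the standard deviation rather than the variance, so a variance specification such as the N(0, 0.25) in part (b) has to be converted with sqrt(). The variable names are illustrative assumptions.

```r
# Fix the random number stream so results are reproducible.
set.seed(1)
draws <- rnorm(100, mean = 10, sd = 5)
print(length(draws))
print(c(mean = mean(draws), sd = sd(draws)))  # near 10 and 5, not exactly equal

# rnorm() takes a standard deviation, so convert a target variance first.
target_variance <- 0.25
noise <- rnorm(100, mean = 0, sd = sqrt(target_variance))

# Resetting the seed reproduces exactly the same draws.
set.seed(1)
draws_again <- rnorm(100, mean = 10, sd = 5)
print(identical(draws, draws_again))
```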

(a) Using the rnorm() function, create a vector, x, containing 100 observations drawn from
a $N(0, 1)$ distribution. This represents a feature, X.
(b) Using the rnorm() function, create a vector, $\varepsilon$, containing 100 observations drawn from
a $N(0, 0.25)$ distribution, i.e., a normal distribution with mean zero and variance 0.25.
(c) Using x and $\varepsilon$, generate a vector y according to the model

$Y = -1 + 0.5X + \varepsilon$. (1)

What is the length of the vector y? What are the values of $\beta_0$ and $\beta_1$ in this linear
model?
(d) Create a scatterplot displaying the relationship between x and y. Comment on what
you observe.
(e) Fit a least squares linear model to predict y using x. Comment on the model obtained.
How do $\hat{\beta}_0$ and $\hat{\beta}_1$ compare to $\beta_0$ and $\beta_1$?
(f) Now fit a polynomial regression model that predicts y using x and $x^2$. Is there evidence
that the quadratic term improves the model fit? Explain your answer.
(g) Repeat (a)-(f) after modifying the data generation process in such a way that there is
less noise in the data. The model (1) should remain the same. You can do this by
decreasing the variance of the normal distribution used to generate the error term $\varepsilon$ in
(b). Describe your results.
(h) Repeat (a)-(f) after modifying the data generation process in such a way that there is
more noise in the data. The model (1) should remain the same. You can do this by
increasing the variance of the normal distribution used to generate the error term $\varepsilon$ in
(b). Describe your results.
(i) What are the confidence intervals for $\beta_0$ and $\beta_1$ based on the original data set, the noisier
data set, and the less noisy data set? Comment on your results.
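
Parts (a) through (i) can be organized around a single function that takes the error variance as its argument. The sketch below is one possible layout, not a required one: simulate_and_fit is a hypothetical helper name, and the variances 0.01 and 1.00 used for the less noisy and noisier cases are arbitrary illustrative choices.

```r
# Hypothetical helper: simulate Y = -1 + 0.5 X + eps with a chosen error
# variance, then fit the linear and quadratic models.
simulate_and_fit <- function(error_variance, n = 100, seed = 1) {
  set.seed(seed)
  x   <- rnorm(n, mean = 0, sd = 1)
  eps <- rnorm(n, mean = 0, sd = sqrt(error_variance))  # sd, not variance
  y   <- -1 + 0.5 * x + eps

  linear    <- lm(y ~ x)
  quadratic <- lm(y ~ x + I(x^2))

  list(
    x = x, y = y,
    linear    = linear,
    quadratic = quadratic,
    r_squared = summary(linear)$r.squared,
    quad_test = anova(linear, quadratic),  # F-test for the added x^2 term
    conf_int  = confint(linear)            # 95% intervals by default
  )
}

# Error variances for parts (g) and (h) are arbitrary illustrative choices.
runs <- list(
  less_noise = simulate_and_fit(0.01),
  original   = simulate_and_fit(0.25),
  more_noise = simulate_and_fit(1.00)
)

# Print estimates and confidence intervals for each noise level.
for (name in names(runs)) {
  cat("\n", name, "\n")
  print(coef(runs[[name]]$linear))
  print(runs[[name]]$conf_int)
}
```

For the scatterplot in part (d), plot(runs$original$x, runs$original$y) followed by abline(runs$original$linear) overlays the fitted line on the data. Because every run reuses the same seed, the three data sets differ only in the scale of the error term, which makes the comparison in part (i) easier to read.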
