Homework2
PART1
This problem uses data from two data sets: the Berkeley Guidance Study and a UN study on
fertility rates for 193 countries. These data sets are available in Doc Sharing.
Please submit responses to the following questions, including plots and analysis, to the Week 2.
The data from the Berkeley Guidance Study reside in a single file, HW2_BGALL, provided in both
SAS and *.csv formats. In this study, gender is coded as: 0 = male, 1 = female.
1. Generate two scatter plots of PPgdp vs Fertility – one using linear scaling for both
variables and the other using log scaling for both variables. (10 pts)
2. Perform regressions using both log and linear transformations of the data. Note: I have
transform (20 pts)
3. Which regression exhibits a better fit (log or linear)? Use the ANOVA summary to support
your reasoning. (20 pts)
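As a rough illustration of the comparison asked for in questions 2 and 3, the sketch below fits a simple least-squares line to raw and to log-transformed values and reports the ANOVA sums of squares, R-squared, and F statistic for each. It is written in plain Python rather than SAS, the function name simple_ols is hypothetical, and the numbers are made up for illustration; they are NOT the UN data. Because the two models have different response scales (Fertility vs log Fertility), their R-squared values are only an informal guide, so the residual plots are worth checking as well.

```python
import math

def simple_ols(x, y):
    """Fit y = b0 + b1*x by least squares and return the ANOVA quantities."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # ANOVA decomposition: total = regression + error
    sst = sum((yi - mean_y) ** 2 for yi in y)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ssr = sst - sse
    f_stat = ssr / (sse / (n - 2))  # 1 and n-2 degrees of freedom
    return {"intercept": intercept, "slope": slope,
            "ssr": ssr, "sse": sse, "sst": sst,
            "r2": ssr / sst, "f": f_stat}

# Made-up values for illustration only (not the UN data set).
ppgdp = [200, 500, 1000, 2500, 5000, 10000, 20000, 40000]
fertility = [6.8, 5.9, 4.6, 3.7, 2.9, 2.4, 1.9, 1.6]

# Model 1: both variables on their original (linear) scale.
linear_fit = simple_ols(ppgdp, fertility)

# Model 2: natural log of both variables.
log_fit = simple_ols([math.log(v) for v in ppgdp],
                     [math.log(v) for v in fertility])

for label, fit in (("linear", linear_fit), ("log-log", log_fit)):
    print(f"{label}: R2={fit['r2']:.3f}  F={fit['f']:.1f}  SSE={fit['sse']:.4f}")
```

The same quantities appear in the ANOVA table that SAS prints for a regression, so the printed values here are only meant to show which numbers to compare.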
PART2
This part uses the ‘Heart’ data set, which is available in SAS Enterprise Guide, in the SAS-Help
directory.
Please submit responses to the following questions, including plots and analysis, to the Week 2.
1. Generate a scatter plot matrix of the following continuous variables: Briefly explain the
relationships that you see.
4. Create 2 bi-linear regression models that predict cholesterol level: (60 pts)
a. For the first model, use the continuous independent variable that exhibits the best
correlation with the dependent variable
b. For the second model, use a continuous variable that exhibits the next best correlation
with the dependent variable
c. Explain and support the difference between the models from steps (a) and (b)
5. Create a multiple-regression model that uses all of the continuous variables. (60 pts)
b. Compare the performance of the multiple regression model with the best bi-linear regression
model from question 4
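One possible way to think about questions 4 and 5 is sketched below in plain Python (not SAS): rank the candidate predictors by the absolute value of their Pearson correlation with the dependent variable, fit a one-predictor model with the top-ranked variable, then fit a model with all predictors and compare R-squared and adjusted R-squared. The function names pearson_r and fit_ols, the variable names, and every number are assumptions made up for illustration; they are not taken from the Heart data set.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_ols(rows, y):
    """Least-squares fit with an intercept; returns (coef, R2, adjusted R2)."""
    n = len(y)
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])  # number of coefficients, intercept included
    # Augmented normal equations: (X'X | X'y)
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
         + [sum(X[k][i] * y[k] for k in range(n))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    coef = [A[i][p] / A[i][i] for i in range(p)]
    mean_y = sum(y) / n
    sst = sum((v - mean_y) ** 2 for v in y)
    sse = sum((y[k] - sum(c * x for c, x in zip(coef, X[k]))) ** 2
              for k in range(n))
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p)  # penalizes extra predictors
    return coef, r2, adj_r2

# Made-up values for illustration only (not the Heart data set).
cholesterol = [180, 200, 220, 240, 260, 280]
predictors = {
    "systolic": [110, 118, 131, 139, 152, 160],
    "weight": [150, 140, 170, 160, 190, 175],
    "height": [66, 70, 64, 69, 65, 68],
}

# Question 4: rank predictors by |r| with the dependent variable.
ranked = sorted(predictors,
                key=lambda name: abs(pearson_r(predictors[name], cholesterol)),
                reverse=True)
best = ranked[0]
_, simple_r2, simple_adj = fit_ols([[v] for v in predictors[best]], cholesterol)

# Question 5: one model using all predictors together.
_, full_r2, full_adj = fit_ols(list(zip(*predictors.values())), cholesterol)

print("ranking:", ranked)
print(f"simple ({best}): R2={simple_r2:.4f}  adj R2={simple_adj:.4f}")
print(f"multiple: R2={full_r2:.4f}  adj R2={full_adj:.4f}")
```

Plain R-squared never decreases when predictors are added, so adjusted R-squared (or the overall F test in the ANOVA table) is usually the fairer basis for comparing the multiple regression with the best one-predictor model.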