
Homework2

This document provides instructions for a homework assignment involving analysis of three datasets: 1) The Berkeley Guidance Study dataset, containing measurements for males and females. Students are asked to generate a scatter-plot matrix, fit regressions to predict weight at age 18, and explain relationships between variables. 2) The UN fertility dataset, containing economic and fertility rate data for 193 countries. Students are asked to generate scatter plots with linear and log scaling and compare linear vs log regression models. 3) The Heart dataset. Students are asked to generate a scatter plot matrix, test differences between males and females in cholesterol and diastolic blood pressure, fit bi-linear regression models that predict cholesterol, and compare them with a multiple regression model.

Uploaded by

Rahul Jain

Homework #2

PART1

This problem uses data from two data sets: the Berkeley Guidance Study and a UN study on
fertility rates for 193 countries. These data sets are available in Doc Sharing.

Please submit responses to the following questions, including plots and analysis, to the Week 2.

The data from the Berkeley Guidance Study reside in one file, HW2_BGALL, in both SAS and
*.csv formats. In this study, gender is coded as 0 = males, 1 = females.

1. Generate a scatter-plot matrix of all continuous variables. Group the variables by
gender, so that each paired observation has a gender-associated label. Explain the relationships
that you see. Generate two linear regressions that predict WT18 (weight at 18 years old). One
regression model is to predict the weight for boys and the other to predict the weight for girls.
Interpret the results. (25 pts)
2. Generate a multiple linear regression model to explain WT18 using the variables HT2,
WT2, HT9, WT9 and ST9. Find the R-squared, the overall ANOVA table and the overall F-test.
Compute the t-statistics to test each coefficient against the null hypothesis that it equals 0.
State the conclusions from these tests. (25 pts)
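
The quantities requested in question 2 can be sketched outside SAS as follows. This is a minimal illustration in Python using only the standard library, not the course's prescribed method. The data are randomly generated placeholders (not the HW2_BGALL values), and the helper names `invert` and `ols` are hypothetical.

```python
import math
import random

def invert(matrix):
    """Invert a small, non-singular square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols(predictors, y):
    """Least-squares fit of y on the predictors plus an intercept."""
    n, p = len(y), len(predictors[0])
    k = p + 1                                   # coefficients incl. intercept
    X = [[1.0] + list(row) for row in predictors]
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # error SS
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total SS
    ssr = sst - sse                                          # regression SS
    mse = sse / (n - k)
    se = [math.sqrt(mse * xtx_inv[i][i]) for i in range(k)]
    return {
        "beta": beta,
        "r_squared": 1 - sse / sst,
        "anova": {"SSR": ssr, "SSE": sse, "SST": sst,
                  "df_model": p, "df_error": n - k},
        "f_stat": (ssr / p) / mse,
        "t_stats": [b / s for b, s in zip(beta, se)],   # each tests coef = 0
    }

# Placeholder data only: made-up values standing in for the real file.
random.seed(0)
names = ["HT2", "WT2", "HT9", "WT9", "ST9"]
rows = [[random.gauss(88, 3), random.gauss(13, 1.5), random.gauss(135, 6),
         random.gauss(31, 5), random.gauss(60, 15)] for _ in range(60)]
wt18 = [10 + 0.3 * r[2] + 0.9 * r[3] + random.gauss(0, 4) for r in rows]

fit = ols(rows, wt18)
print("R-squared:", round(fit["r_squared"], 3))
print("ANOVA:", fit["anova"])
print("F:", round(fit["f_stat"], 2))
for name, b, t in zip(["Intercept"] + names, fit["beta"], fit["t_stats"]):
    print(f"{name:>9}: coef={b:8.3f}  t={t:7.2f}")
```

Each t-statistic would then be compared with a t distribution on the error degrees of freedom, and the F-statistic with an F distribution on (df_model, df_error), to reach the conclusions the question asks for.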

The data for the UN study on fertility are in HW_UN, in both *.csv and SAS formats. The file contains
data on income, GDP and fertility rates for 193 nations. It also contains log transforms for both
variables.

1. Generate two scatter-plots of PPgdp vs Fertility: one using linear scaling for both
variables and the other using log scaling for both variables. (10 pts)
2. Perform regressions using both log and linear transformations of the data. Note: I have
transform (20 pts)
3. Which regression exhibits a better fit (log or linear)? Use the ANOVA summary to support
your reasoning. (20 pts)
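
One way to compare the two fits in question 3 is sketched below in Python (standard library only). It is an illustration under assumptions: the PPgdp and Fertility values are randomly generated placeholders that follow a power law by construction, not the HW_UN data, and `simple_fit` is a hypothetical helper. Natural logs are computed in the code rather than read from the file's log-transformed columns, whose names are not given here.

```python
import math
import random

def simple_fit(x, y):
    """Least-squares line y = intercept + slope * x with its ANOVA pieces."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)   # total sum of squares
    slope = sxy / sxx
    ssr = slope * sxy                          # regression sum of squares
    sse = sst - ssr                            # error sum of squares
    return {
        "slope": slope,
        "intercept": y_bar - slope * x_bar,
        "r_squared": ssr / sst,
        "f_stat": ssr / (sse / (n - 2)),       # 1 and n - 2 degrees of freedom
    }

# Placeholder data only: 193 made-up (PPgdp, Fertility) pairs.
random.seed(1)
ppgdp = [math.exp(random.uniform(math.log(100), math.log(40000)))
         for _ in range(193)]
fertility = [math.exp(3.5 - 0.3 * math.log(g) + random.gauss(0, 0.2))
             for g in ppgdp]

linear = simple_fit(ppgdp, fertility)
loglog = simple_fit([math.log(g) for g in ppgdp],
                    [math.log(f) for f in fertility])

for label, result in (("linear", linear), ("log-log", loglog)):
    print(f"{label:>8}: R-squared={result['r_squared']:.3f}  "
          f"F={result['f_stat']:.1f}")
```

Because the two models have different response scales (Fertility versus log Fertility), R-squared and F give only a rough comparison; residual plots from each fit are a useful complement.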

PART2

This part uses the ‘Heart’ data set, which is available in SAS Enterprise Guide, in the SAS-Help
directory.

Please submit responses to the following questions, including plots and analysis, to the Week 2.

1. Generate a scatter plot matrix of the following continuous variables. Briefly explain the
relationships that you see.

a.    Height, Weight, Diastolic, Systolic, MRW, Cholesterol  (30 pts)


2. Use t-tests to accept or reject the statements listed below. State conclusions using the
terminology of hypothesis testing. (30 pts)

a.    There is no difference in cholesterol levels between males and females.

b.    There is no difference in diastolic blood pressure between males and females.

3. Provide estimates and standard errors for:  (20 pts)

a.    Mean cholesterol for the entire population.

b.    Mean diastolic blood pressure for the entire population.

4. Create 2 bi-linear regression models that predict cholesterol level:  (60 pts)

a.    For the first model, use the continuous independent variable that exhibits the best
correlation with the dependent variable

b.    For the second model, use the continuous variable that exhibits the next best correlation
with the dependent variable

c.    Explain and support the difference between the models from steps (a) and (b)

5. Create a multiple-regression model that uses all of the continuous variables. (60 pts)

a. Explain and support the results of the model.

b. Compare the performance of the multiple regression model with that of the best bi-linear
regression model from question 4.
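
For questions 2 and 3, the test statistic and standard errors can be sketched as below, in Python with the standard library only. Several choices here are assumptions rather than part of the assignment: the unequal-variance (Welch) form of the two-sample t-test, a normal approximation for the p-value (reasonable only for large samples), and randomly generated placeholder values standing in for Cholesterol by sex. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def standard_error(values):
    """Standard error of the sample mean: s / sqrt(n)."""
    return stdev(values) / math.sqrt(len(values))

def welch_t(a, b):
    """Welch two-sample t statistic, its degrees of freedom, and a
    large-sample (normal approximation) two-sided p-value."""
    va = stdev(a) ** 2 / len(a)
    vb = stdev(b) ** 2 / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p_approx

# Placeholder data only: made-up cholesterol values for two groups.
random.seed(2)
chol_male = [random.gauss(225, 40) for _ in range(400)]
chol_female = [random.gauss(230, 45) for _ in range(450)]

# H0: no difference in mean cholesterol between males and females.
t, df, p = welch_t(chol_male, chol_female)
print(f"t = {t:.3f}, df = {df:.1f}, approx. two-sided p = {p:.4f}")

# Estimate and standard error of the mean for the pooled sample.
everyone = chol_male + chol_female
print(f"mean = {mean(everyone):.2f}, SE = {standard_error(everyone):.2f}")
```

In hypothesis-testing terms, a p-value below the chosen significance level leads to rejecting the null hypothesis of no difference; otherwise the test fails to reject it.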
