
Lab 10.1 (2)- Multiple Regression

STAT 200: Lab Activity for Section 10.1.2


Multiple Regression - Learning objectives:
• Use computer output to make predictions and interpret coefficients using a multiple regression
model
• Test the effectiveness of individual terms in the regression model
• Use the Analysis of Variance to understand how the variability found in the response variable is
partitioned into one of two explanations: explained or not explained by the regression model
• Calculate and interpret R-squared in the context of a regression model

Activity 1: Predicting Domestic Gross Income with Movies


Data on 162 movies released in Hollywood between 2016 and 2018.¹
Data Dictionary: The dataset HollywoodMovies2016_2018 (available on Canvas) includes nine variables:
• Name
• Studio (produced)
• Year (year released)
• Genre (Drama, Comedy, etc.)
• Dom_Gross (gross income for domestic (U.S.) viewers in millions $)
• Audience Score (via Rotten Tomatoes: percent who gave the movie a positive rating)
• T_OpenWkend (number of screens where movie was shown in theatres for opening weekend)
• G_OpenWkend (opening weekend gross income in millions $)
• Budget (production budget in millions $)

¹ https://www.lock5stat.com/datapage.html (3rd edition)

Exploratory Data Analysis
Open the dataset.

1. How many of these variables are quantitative? Which one is the response variable?

2. Make simple dotplots of the four predictor variables. Look at shape and the range of values for
each predictor.

Analysis
3. Using intuition, are there any of the four predictors that you tentatively think would do a good job
of predicting domestic gross income?

It is good practice to keep a model simple: build a regression model with the fewest predictors
needed to explain the variability found in the response variable. Because of this, we
will not immediately use the k = 4 model (four predictors).

© - Pennsylvania State University



4. Obtain a correlation matrix plot (an array of scatterplots) and pairwise correlations for all
quantitative variables, including the response variable. Use the variable order found in the data
dictionary above.

Reminder: Stat > Basic Stats > Correlation: Under Graphs select: Correlations

When you look at the scatterplots of the response variable versus each predictor variable:
A. Is there one predictor that shows a very strong linear relationship with the response variable?

B. Is there one predictor that shows a nonlinear relationship with the response variable which
would suggest that it should not be included in the multiple regression model?

C. Are there two predictor variables that show a rather strong linear relationship which could
suggest that both predictors may not be needed in the model?
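For readers who want to see what the pairwise correlations in step 4 measure, here is a minimal Python sketch that computes Pearson's r for every pair of columns. It is not the Minitab procedure. The function names `pearson_r` and `correlation_table` are hypothetical, and the numbers are made-up toy values, not the HollywoodMovies2016_2018 data.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_table(columns):
    """Correlation for every unordered pair of named columns."""
    return {
        (a, b): pearson_r(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }


# Made-up toy values for illustration only (NOT the movie dataset).
toy = {
    "Dom_Gross": [30.0, 55.0, 120.0, 210.0, 80.0, 15.0],
    "G_OpenWkend": [10.0, 20.0, 45.0, 70.0, 25.0, 5.0],
    "Budget": [20.0, 60.0, 90.0, 150.0, 40.0, 35.0],
}

table = correlation_table(toy)
for (a, b), r in table.items():
    print(f"{a:>12} vs {b:<12} r = {r:.3f}")
```

A value of r near +1 or -1 between two predictors is the situation question C asks about; a value near 0 between a predictor and the response does not rule out a nonlinear relationship, which is why the scatterplots are still needed.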

5. Fit the k = 3 multiple regression model where you include predictors: AudienceScore,
G_OpenWkend, and Budget.

Reminder: Stat > Regression > Regression > Fit Regression Model: Under Results select: Basic
Tables

6. Which predictors are effective at the 0.05 level? Which is the most effective predictor (Note: you
will need to go beyond just looking at the p-value)?
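The fit in step 5 and the comparison in step 6 can be sketched in Python as a rough cross-check on the Minitab table. This is a sketch under stated assumptions: the helper `fit_ols` is hypothetical, the data are simulated stand-ins (not the movie data), and ranking predictors by the size of the t-statistic (coefficient divided by its standard error) is one common reading of "going beyond the p-value", not necessarily the only one intended.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with an intercept.

    Returns coefficients, standard errors, t-statistics, and error df.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])  # first column is the intercept
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    df_error = n - k - 1
    mse = (resid @ resid) / df_error  # estimate of the error variance
    se = np.sqrt(np.diag(mse * np.linalg.inv(design.T @ design)))
    return coef, se, coef / se, df_error


# Simulated stand-in data (NOT the movie dataset): three predictors,
# where the third has a true coefficient of zero.
rng = np.random.default_rng(0)
n = 40
X = rng.uniform(0, 100, size=(n, 3))
y = 5 + 0.2 * X[:, 0] + 3.0 * X[:, 1] + 0.0 * X[:, 2] + rng.normal(0, 10, size=n)

coef, se, t_stats, df_error = fit_ols(X, y)
for name, b, s, t in zip(["Constant", "x1", "x2", "x3"], coef, se, t_stats):
    print(f"{name:>8}: coef = {b:8.3f}  SE = {s:6.3f}  t = {t:7.2f}")
```

For moderate sample sizes, a t-statistic larger than about 2 in absolute value corresponds roughly to a p-value below 0.05, so the column of t-values gives both the yes/no answer and a ranking.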

7. Is there a plausible explanation as to why budget is not an effective predictor?

8. With the k = 3 fitted model, interpret the coefficient of G_OpenWkend in context.

9. With the k = 3 fitted model, interpret R-squared in context.

10. Predict the Domestic Gross Income (in millions $) for a movie with an Audience Score of 50%,
an Opening Weekend Gross Income of $10 million, and a Budget of $40 million.

Reminder: Stat > Regression > Regression > Predict
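The prediction in step 10 is the fitted equation evaluated at the new values. The sketch below shows only the arithmetic. The function `predict` is hypothetical, and the intercept and coefficients are placeholders that must be replaced with the values from your own Minitab output; they are not estimates from the movie data.

```python
def predict(intercept, coefs, values):
    """Evaluate intercept + sum(coefficient * value) over the named predictors."""
    return intercept + sum(coefs[name] * values[name] for name in coefs)


# PLACEHOLDER numbers only: substitute the Constant and Coef column
# from the k = 3 Minitab output before using the result.
placeholder_intercept = 0.0
placeholder_coefs = {"AudienceScore": 1.0, "G_OpenWkend": 1.0, "Budget": 1.0}

# The movie described in step 10.
new_movie = {"AudienceScore": 50, "G_OpenWkend": 10, "Budget": 40}

prediction = predict(placeholder_intercept, placeholder_coefs, new_movie)
print(f"Predicted Dom_Gross (millions $), placeholder coefficients: {prediction}")
```

Minitab's Predict dialog does the same calculation and also reports interval estimates, which this sketch does not attempt.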


11. On page 617 of your textbook, the authors explain how the variability in the response variable
can be partitioned into one of two explanations.

Total: Variability in the response variable
Regression: Variability explained by the model
Error: Variability not explained by the model

Look at the Analysis of Variance (ANOVA) Table on the Minitab output and find the values for the
three variabilities.

SSModel (Regression) =

SSE (Error) =

SSTotal (Total) =

For any regression model, the Coefficient of Determination R² is:

$$R^2 = \frac{\text{SSModel}}{\text{SSTotal}}$$

With the k = 3 Model: Verify the calculation of R². Does it match the number found on the Minitab
Output?
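The partition in step 11 can be checked numerically. The sketch below fits a least-squares line to made-up toy values (not the movie data), computes the three sums of squares, and confirms that SSModel + SSE equals SSTotal. It uses a single predictor to stay short; the same identity holds for the k = 3 model. The function name `anova_decomposition` is hypothetical.

```python
def anova_decomposition(x, y):
    """Return (SSModel, SSE, SSTotal) for a least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * a for a in x]
    ss_model = sum((f - mean_y) ** 2 for f in fitted)  # explained by the model
    ss_error = sum((b - f) ** 2 for b, f in zip(y, fitted))  # not explained
    ss_total = sum((b - mean_y) ** 2 for b in y)  # total variability
    return ss_model, ss_error, ss_total


# Made-up toy values for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

ss_model, ss_error, ss_total = anova_decomposition(x, y)
r_squared = ss_model / ss_total
print(f"SSModel = {ss_model:.4f}, SSE = {ss_error:.4f}, SSTotal = {ss_total:.4f}")
print(f"R-squared = {r_squared:.4f}")
```

With the Minitab output, the same division applies: take the Regression and Total rows of the ANOVA table and compare the ratio to the reported R-sq.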

*Note: When you fit a regression model to a data set, you should always first check to see if the
model conditions are met. This is covered in Section 10.2 of your textbook. This topic is beyond
what is typically covered in Stat 200. However, when working with real data, this is a crucial step. A
second applied course in Statistics would include this essential topic.


Activity 2: Predicting crocodile body lengths


The ranges inhabited by the Indian gharial crocodile and the Australian saltwater crocodile overlap in
Bangladesh, a country located in Southern Asia. The Indian gharial crocodile is known for its long, thin
snout and sharp, interlocking teeth. The Australian saltwater crocodile has a wide snout and the
strongest bite of any living animal, which allows it to drown its prey or swallow it whole.

Wildlife scientists have used crocodile skeletons to measure the lengths of both the heads and the
complete bodies (in centimeters) for both crocodile species.

Data Dictionary: The dataset Crocodile (available on Canvas) includes three variables:²
• Body Length (centimeters)
• Head Length (centimeters)
• Species: Indian, Australian, where 17 are Indian and 15 are Australian

² De Veaux, R., Velleman, P., and Bock, D., 2020. Data and Models, 5th edition, Pearson Education.

Exploratory Data Analysis
1. Categorize the three variables (quantitative or categorical).

2. We’re going to explore if head length can be a predictor of body length. What is the response
variable in that situation?

3. Obtain a scatterplot of the two variables, where y = Body Length and x = Head Length. Is it
appropriate to fit a simple linear regression model to this data?

Analysis
4. Fit a simple regression model where Head Length is used to predict Body Length. Write out the
regression equation.

5. Is Head Length an effective predictor at the 0.05 level? What is the value for R²?

Next, we will investigate this question: Could the categorical variable Species explain more of the
variability in Body Length?

6. First make a scatterplot, where y = Body Length and x = Head Length. Include Species as a grouping
variable.
Reminder: Graph > Scatterplot > Groups Overlaid

7. What do you notice when looking at this scatterplot? Does it appear that Species is important
when examining the relationship between the two quantitative variables? If you fit a simple regression
line for each species, what would you find?

8. Fit a regression model where you also include Species as a categorical variable.

Reminder: Stat > Regression > Regression > Fit Regression Model (now include the categorical
variable of Species)
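Including a categorical variable such as Species in a regression is usually done with a 0/1 indicator column. The Python sketch below illustrates that idea under stated assumptions: the helper `fit_with_indicator` is hypothetical, "Indian" is chosen as the baseline level arbitrarily (Minitab selects its own reference level), and the measurements are made-up values constructed from an arbitrary equation, not the Crocodile data.

```python
import numpy as np


def fit_with_indicator(head, species, body, baseline):
    """Fit body = b0 + b1*head + b2*indicator, where indicator is 0 for the
    baseline species and 1 otherwise. Returns (coefficients, R-squared)."""
    head = np.asarray(head, dtype=float)
    body = np.asarray(body, dtype=float)
    indicator = np.array([0.0 if s == baseline else 1.0 for s in species])
    design = np.column_stack([np.ones(len(head)), head, indicator])
    coef, *_ = np.linalg.lstsq(design, body, rcond=None)
    fitted = design @ coef
    ss_error = np.sum((body - fitted) ** 2)
    ss_total = np.sum((body - body.mean()) ** 2)
    return coef, 1.0 - ss_error / ss_total


# Made-up values built from an arbitrary equation (NOT the Crocodile data):
# body = 10 + 7*head, plus 30 for the non-baseline species.
head = [20.0, 25.0, 30.0, 35.0, 22.0, 28.0, 33.0, 38.0]
species = ["Indian"] * 4 + ["Australian"] * 4
body = [
    10 + 7 * h + (30 if s == "Australian" else 0)
    for h, s in zip(head, species)
]

coef, r_squared = fit_with_indicator(head, species, body, baseline="Indian")
print("intercept, head slope, species shift:", np.round(coef, 3))
print("R-squared:", round(float(r_squared), 4))
```

The third coefficient is the vertical shift between the two species' lines at the same head length. This model forces both species to share one slope, which is worth comparing against the separate lines considered in step 7.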

9. Is Head Length still an effective predictor at the 0.05 level? What is the value for R²? Does it appear
that Species helps to explain more of the variability in Body Length?
