Hands On Analysis - Simple Linear Regression - Baseball Data

This document discusses using simple linear regression to analyze the relationship between batting average (X) and home runs (Y) for baseball players. It uses data from 209 players in the 2002 season who had at least 100 at-bats. Steps include constructing a scatter plot, performing the regression, checking assumptions, and interpreting the results.

Uploaded by UTKARSH PABALE
Copyright © All Rights Reserved

Hands on Analysis: Simple Linear Regression

Suppose we are interested in whether there is a relationship between batting average (X)
and the number of home runs (Y) a player hits. Some fans might argue, for example, that
those who hit lots of home runs also tend to strike out a lot, so that their batting
average is lower. Let us check it out, using a regression of the number of home runs
on the player's batting average (hits divided by at-bats). Because batting averages
tend to be highly variable for players with few at-bats, we restrict our data set to
those players who had at least 100 at-bats in the 2002 season. This leaves us with 209
players.
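The data preparation described above (batting average as hits divided by at-bats, keeping only players with at least 100 at-bats) can be sketched in plain Python as follows. The record type `PlayerSeason`, its field names, and the helper functions are hypothetical, since the layout of the actual 2002 data file is not given here, and the three toy records are made up purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class PlayerSeason:
    """One player's season (hypothetical field names, not the real file's columns)."""
    name: str
    at_bats: int
    hits: int
    home_runs: int


def batting_average(p: PlayerSeason) -> float:
    # Batting average = hits divided by at-bats.
    return p.hits / p.at_bats


def regression_sample(players, min_at_bats=100):
    """Keep players with at least min_at_bats at-bats and return x (batting
    average) and y (home runs) as parallel lists."""
    kept = [p for p in players if p.at_bats >= min_at_bats]
    x = [batting_average(p) for p in kept]
    y = [p.home_runs for p in kept]
    return x, y


# Toy records invented for illustration only (not the 2002 data).
toy = [
    PlayerSeason("Toy A", 500, 150, 30),
    PlayerSeason("Toy B", 80, 30, 2),    # dropped: fewer than 100 at-bats
    PlayerSeason("Toy C", 100, 25, 5),   # kept: exactly 100 at-bats
]
x, y = regression_sample(toy)
print(x, y)  # [0.3, 0.25] [30, 5]
```

The cutoff is inclusive (`>=`), matching "at least 100 at-bats".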

39. Construct a scatter plot of home runs versus batting average.


40. Informally, is there evidence of a relationship between the variables?
41. What would you say about the variability of the number of home runs for players with higher
batting averages?
42. Refer to the previous exercise. For which regression assumption might this presage
difficulty?
43. Perform a regression of home runs on batting average. Obtain a normal probability plot of
the standardized residuals from this regression. Does the normal probability plot indicate
acceptable normality? Construct a plot of the residuals versus the fitted values (the fitted
values are the predicted values ŷ). What pattern do you see? What does this indicate regarding
the regression assumptions?
44. Take the natural log of home runs, and perform a regression of ln home runs on batting
average. Obtain a normal probability plot of the standardized residuals from this regression.
Does the normal probability plot indicate acceptable normality?
45. Construct a plot of the residuals versus the fitted values. Do you see strong evidence that the
constant variance assumption has been violated? (Remember to avoid the Rorschach effect.)
If not, conclude that the regression assumptions are validated.
46. Write the population regression equation for our model. Interpret the meaning of β0 and β1.
47. State the regression equation (from the regression results) in words and numbers.
48. Interpret the value of the y-intercept b0.
49. Interpret the value of the slope b1.
50. Estimate the number of home runs (not ln home runs) for a player with a batting average of
0.300.
51. What is the size of the typical error in predicting the number of home runs, based on the
player’s batting average?
52. What percentage of the variability in ln home runs does batting average account for?
53. Perform the hypothesis test for determining whether a linear relationship exists between the
variables.
54. Construct and interpret a 95% confidence interval for the unknown true slope of the
regression line.
55. Construct and interpret a 95% confidence interval for the mean number of home runs for all
players who had a batting average of 0.300.
56. Construct and interpret a 95% prediction interval for a randomly chosen player with a 0.300
batting average. Is this prediction interval useful?
57. List all the outliers, giving the values of all the variables for each outlier.
58. List the high leverage points.
59. List the influential observations, according to Cook’s distance.
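Exercises 44 and 47 to 53 rely on a handful of least-squares quantities: the estimates b0 and b1, the standard error of the estimate s, r², and the t statistic for the slope. Statistical software reports all of these; the following is a minimal standard-library sketch of where they come from, assuming parallel lists of batting averages and home-run counts. The function names are hypothetical. Because ln(0) is undefined, the sketch assumes every player in the sample hit at least one home run and raises an error otherwise. Note also that s and r² computed this way are on the ln(home runs) scale.

```python
import math


def fit_simple_regression(x, y):
    """Least-squares fit of y = b0 + b1 * x with the usual summary statistics."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations to estimate s")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                       # slope
    b0 = y_bar - b1 * x_bar              # intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    s = math.sqrt(sse / (n - 2))         # standard error of the estimate
    r_squared = 1 - sse / sst if sst > 0 else float("nan")
    se_b1 = s / math.sqrt(sxx)           # standard error of the slope
    # t statistic for H0: beta1 = 0 (infinite if the fit is perfect)
    t_b1 = b1 / se_b1 if se_b1 > 0 else math.copysign(math.inf, b1)
    return {"n": n, "x_bar": x_bar, "sxx": sxx, "b0": b0, "b1": b1,
            "s": s, "r_squared": r_squared, "se_b1": se_b1, "t_b1": t_b1}


def fit_log_home_runs(batting_avg, home_runs):
    """Regress ln(home runs) on batting average, as in Exercise 44."""
    # Assumption: every player hit at least one home run, since ln(0) is undefined.
    if any(hr <= 0 for hr in home_runs):
        raise ValueError("ln(home runs) is undefined for a player with 0 home runs")
    return fit_simple_regression(batting_avg, [math.log(hr) for hr in home_runs])
```

To carry out the test in Exercise 53, compare `t_b1` with a t distribution on n - 2 degrees of freedom (207 for this data set).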
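For the residual checks in Exercises 43 to 45 and the unusual-observation questions in Exercises 57 to 59, the sketch below computes, for each player, the leverage h, the standardized residual e / (s * sqrt(1 - h)), and Cook's distance, plus the coordinates of a normal probability plot. It is an illustration under stated assumptions, not the output format of any particular package, and all function names are hypothetical. The cutoffs in `flag_unusual` (|standardized residual| > 2, leverage > 2(m + 1)/n = 4/n with one predictor, Cook's distance > 4/n) are common rules of thumb; your course may use others, such as comparing Cook's distance with the median of an F distribution. The plot positions use Blom's formula (i - 0.375)/(n + 0.25), one of several conventions.

```python
import math
from statistics import NormalDist


def regression_diagnostics(x, y):
    """Leverage, standardized residuals, and Cook's distance for a simple
    linear regression of y on x. Assumes the fit is not perfect (s > 0)
    and that no point has leverage exactly 1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))
    rows = []
    for xi, fi, e in zip(x, fitted, resid):
        h = 1 / n + (xi - x_bar) ** 2 / sxx          # leverage
        r = e / (s * math.sqrt(1 - h))               # standardized residual
        d = r * r * h / (2 * (1 - h))                # Cook's distance, 2 parameters
        rows.append({"fitted": fi, "resid": e, "leverage": h,
                     "std_resid": r, "cooks_d": d})
    return rows


def normal_plot_points(values):
    """(theoretical normal quantile, sorted value) pairs for a normal
    probability plot, using Blom's plotting positions."""
    n = len(values)
    z = NormalDist()
    return [(z.inv_cdf((i - 0.375) / (n + 0.25)), v)
            for i, v in enumerate(sorted(values), start=1)]


def flag_unusual(rows, n):
    """Indices flagged by assumed rule-of-thumb cutoffs (courses differ)."""
    return {
        "outliers": [i for i, row in enumerate(rows) if abs(row["std_resid"]) > 2],
        "high_leverage": [i for i, row in enumerate(rows) if row["leverage"] > 4 / n],
        "influential": [i for i, row in enumerate(rows) if row["cooks_d"] > 4 / n],
    }
```

Pass ln(home runs) as `y` to diagnose the transformed model of Exercise 44; plotting `std_resid` against `fitted` gives the residuals-versus-fitted plot, and `normal_plot_points` applied to the standardized residuals gives the normal probability plot.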
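Exercises 50 and 54 to 56 can be sketched from the fitted summary quantities (n, the mean batting average, Sxx, b0, b1, s, and the slope's standard error). The helper names are hypothetical, and the t critical value is passed in rather than computed, since the Python standard library has no t quantile function; with n = 209 (207 degrees of freedom) it is about 1.97, and it can be read from a t table or obtained with `scipy.stats.t.ppf(0.975, n - 2)`. The intervals are built on the ln scale and then exponentiated. Because the errors on the ln scale are modeled as normal, exponentiating the fitted value estimates the median number of home runs at that batting average rather than the mean.

```python
import math


def intervals_at(x0, n, x_bar, sxx, b0, b1, s, t_crit):
    """Fitted value, confidence interval (mean response), and prediction
    interval (one new player) at batting average x0, for a regression of
    ln(home runs) on batting average. t_crit has n - 2 degrees of freedom."""
    y_hat = b0 + b1 * x0                        # predicted ln(home runs)
    lev = 1 / n + (x0 - x_bar) ** 2 / sxx       # grows as x0 moves from x_bar
    half_ci = t_crit * s * math.sqrt(lev)       # uncertainty in the mean
    half_pi = t_crit * s * math.sqrt(1 + lev)   # adds player-to-player spread
    return {
        "ln_fit": y_hat,
        "ln_ci": (y_hat - half_ci, y_hat + half_ci),
        "ln_pi": (y_hat - half_pi, y_hat + half_pi),
        # Back-transformed to home runs (Exercise 50): estimates a median.
        "hr_fit": math.exp(y_hat),
        "hr_ci": (math.exp(y_hat - half_ci), math.exp(y_hat + half_ci)),
        "hr_pi": (math.exp(y_hat - half_pi), math.exp(y_hat + half_pi)),
    }


def slope_ci(b1, se_b1, t_crit):
    """Confidence interval for the true slope beta1 (Exercise 54)."""
    return (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
```

For Exercises 55 and 56, call `intervals_at` with x0 = 0.300. The prediction interval is always wider than the confidence interval because it adds the player-to-player variation s² to the uncertainty in the fitted line.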
