Notebook 2 - Linear Regression

Uploaded by

Blobby Hatchner

May 23, 2024

Reference Guide for R (student resource) - Check out our reference guide for a full listing
of useful R commands for this project.

0.1 Data Science Project: Use data to determine the best and worst colleges
for conquering student debt.
0.1.1 Notebook 2: Simple Linear Regression
Does college pay off? We’ll use some of the latest data from the US Department of Education’s
College Scorecard Database to answer that question.
In this notebook (the 2nd of 4 total notebooks), you’ll use R to create scatterplots, fit simple linear
regression models, and compare the strength of your models. By the end of this notebook, you’ll
see what factors make certain colleges better investments than others.
[1]: ## Run this code but do not edit it. Hit Ctrl+Enter to run the code
# This command loads a useful package of R commands
library(coursekata)

-- CourseKata packages -- coursekata 0.15.0 --
dslabs 0.8.0               Metrics 0.1.4
Lock5withR 1.2.2           lsr 0.5.2
fivethirtyeightdata 0.1.0  mosaic 1.9.1
fivethirtyeight 0.6.2      supernova 3.0.0

0.1.2 The Dataset (four_year_colleges.csv)

General description - In this notebook, we’ll be using the four_year_colleges.csv file, which
only includes schools that offer four-year bachelor’s degrees and/or higher graduate degrees.
Community colleges and trade schools often have different goals (e.g. facilitating transfers, direct
career education) than institutions that offer four-year bachelor’s degrees. By comparing four-year
colleges only to other four-year colleges, we’ll have clearer analyses and conclusions.

This data is a subset of the US Department of Education’s College Scorecard Database. The data
is current as of the 2020-2021 school year.
Description of all variables: See here
Detailed data file description: See here

0.1.3 1.0 - Creating scatterplots

To begin, let’s download our data. We’ll download the four_year_colleges.csv file from the
skewthescript.org website and store it in an R dataframe called dat.

[2]: ## Run this code but do not edit it. Hit Ctrl+Enter to run the code
# This command downloads the data
dat <- read.csv('https://skewthescript.org/s/four_year_colleges.csv')

1.1 - Use the head command to print out the first several rows of the dataset.
[3]: # Your code goes here
head(dat)

A data.frame: 6 × 26
  OPEID  name                                city       state region me
  <int>  <chr>                               <chr>      <chr> <chr>  <d
1 100200 Alabama A & M University            Normal     AL    South  15.
2 105200 University of Alabama at Birmingham Birmingham AL    South  15.
3 105500 University of Alabama in Huntsville Huntsville AL    South  14.
4 100500 Alabama State University            Montgomery AL    South  17.
5 105100 The University of Alabama           Tuscaloosa AL    South  17.
6 831000 Auburn University at Montgomery     Montgomery AL    South  12.
1.2 - Use the dim command to find the number of colleges (rows) and number of variables (columns)
in our dataset.
[4]: # Your code goes here
dim(dat)

1. 1053 2. 26
Check yourself: Your code should have printed out two numbers: 1053 and 26.
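As a small aside, dim returns the row count first and the column count second, and nrow and ncol return each piece on its own. The sketch below uses a made-up three-row data frame called toy (an assumption for illustration, not the real dataset) so that it runs without downloading anything.

```r
# Hypothetical three-row data frame standing in for the real dataset
toy <- data.frame(
  name         = c("College A", "College B", "College C"),
  default_rate = c(4.2, 9.8, 6.5),
  grad_rate    = c(71, 38, 55)
)

dims   <- dim(toy)   # c(number of rows, number of columns)
n_rows <- nrow(toy)  # rows only
n_cols <- ncol(toy)  # columns only

print(dims)
print(names(toy))    # the variable (column) names
```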
A good measure of whether attending a certain college “pays off” is its student loan default
rate. If a college is low-cost and prepares students for high-paying jobs, few students will default
on their loans. If a college is high-cost and does not prepare students for high-paying jobs, many
students will have trouble paying off their loans (high default rate).
So, our main outcome variable in this analysis will be default_rate. We’re going to use scatterplots
to see how strongly different predictor variables correlate with default rates. In particular,
we’re going to explore how well each of the following variables predicts colleges’ default rates:
- pct_PELL - percent of student body that receives PELL grants. Note: PELL grants are government
  scholarships given to students from low-income families
- grad_rate - percent of students who successfully graduate
- net_tuition - Net tuition (tuition minus average discounts and allowances) per student, in
  thousands of dollars

To begin, let’s create a scatterplot of colleges’ default rates and the percent of their student body
that receive PELL grants. We can use the gf_point command to make the graph:

[5]: ## Run this code but do not edit it

# Create scatterplot: default_rate ~ pct_PELL
gf_point(default_rate ~ pct_PELL, data = dat)

We see that there’s a positive relationship between pct_PELL and default_rate. The colleges with
the highest rates of PELL grant recipients (low-income students) also tend to have higher student
loan default rates. In other words, if you were to fit a model to this data, it would predict higher
default rates at schools that serve more PELL recipients.
We must keep in mind: correlation is not causation. The scatterplot shows us that default
rates and PELL recipient rates are positively correlated. However, the graph doesn’t show us a
clear causal explanation behind the correlation. For example, here are several causal explanations
that this graph can’t clarify:
- PELL recipients may only be able to afford to attend low-quality colleges. These colleges have
  higher default rates because they fail to prepare students for the workforce.
- PELL recipients may have less familial resources to weather the storms of financial emergencies
  in the first few years after college. So, the schools that serve PELL recipients at high rates
  will also have more of their students defaulting on loans (regardless of the school’s quality).
- PELL recipients may have attended lower-quality high schools, which don’t properly prepare them
  for college. So, these students may drop out of college at higher rates, which raises their
  chances of defaulting on student loans.
Or, it could be a combination of all those explanations! We can’t tell from this analysis alone.
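To see how a correlation can appear without either variable causing the other, here is a small simulation sketch. The variables x, y, and z are invented for illustration and are not taken from the College Scorecard data: z plays the role of a hidden factor that drives both x and y, while x and y never influence each other.

```r
set.seed(1)

n <- 500
# Hypothetical hidden factor (for example, family financial resources)
z <- rnorm(n)

# x and y each depend on z, but neither depends on the other
x <- -2 * z + rnorm(n)
y <- -3 * z + rnorm(n)

r_xy <- cor(x, y)
print(r_xy)  # strongly positive, even though x has no effect on y
```

A scatterplot of y against x from this simulation would show a clear upward trend, yet changing x would do nothing to y. That is the situation the scatterplot alone cannot rule out.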

1.3 - In the next question, you will create a scatterplot that visualizes the relationship between
grad_rate and default_rate. Before doing so, make a prediction: Do you expect student loan
default rates to positively or negatively correlate with graduation rates? Why?
Double-click this cell to type your answer here: Negatively. If many students cannot pay off
their loans, graduation rates are probably low as well.
1.4 - Create a scatterplot that visualizes the relationship between grad_rate (predictor) and
default_rate (outcome).

[6]: # Your code goes here

gf_point(default_rate ~ grad_rate, data = dat)

Check yourself: Your code should have generated a scatterplot with the x-axis labeled with
grad_rate and the y-axis labeled with default_rate.
1.5 - Using your scatterplot, describe the relationship between graduation rates and student loan
default rates. For instance, are these variables positively or negatively related? How can you tell?
Does this corroborate your prediction from Question 1.3? Explain.
Double-click this cell to type your answer here: Negatively related. The points trend downward:
as grad_rate increases, default_rate tends to decrease. This corroborates the prediction from
Question 1.3.

0.1.4 2.0 - Simple linear regression (one predictor)
2.1 - If you haven’t taken AP Stats, watch this video, which provides an introduction to linear
regression.
Note: This video is adapted from other materials and covers data from a separate context.
However, the video provides a good intro to the concepts and models we’ll be using in this section
of the project.
Let’s create a linear regression model relating pct_PELL (x) and default_rate (y). To visualize our
model, we can graph the line modeled by our equation on top of the scatterplot relating pct_PELL
to default_rate. We use the gf_point command to produce the scatterplot, the gf_lm command
to graph our linear model, and the %>% symbol to put the elements together on the same graph:
[7]: ## Run this code but do not edit it
# Overlay linear model of default_rate ~ pct_PELL on top of scatterplot
gf_point(default_rate ~ pct_PELL, data = dat) %>% gf_lm(color = "orange")

2.2 - Is the slope value of this model positive or negative? How can you tell?
Double-click this cell to type your answer here: positive, line sloping upward
R can help us find the equation that models this linear regression line. As shown in the video,
we can model a linear trend between a predictor (x) and outcome (y) using this linear regression
formula:

$$\hat{y} = \beta_0 + \beta_1 x$$
Where:
- $\hat{y}$ (pronounced “y hat”) is the predicted y-value (predicted outcome value)
- $\beta_0$ (pronounced “beta zero”) is the y-intercept --> the predicted y-value (outcome value)
  when x = 0 (the predictor’s value is 0)
- $\beta_1$ (pronounced “beta 1”) is the slope --> the predicted change in y (outcome) for a
  1-unit increase in x (predictor)
- $x$ is the x-value (predictor value)
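For a single predictor, the usual least-squares fit has a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. The sketch below works this out on made-up numbers; x, y, b0, and b1 are illustrative names and values, not quantities from the college dataset.

```r
# Made-up data for illustration only
x <- c(10, 20, 30, 40, 50)
y <- c(1.0, 2.9, 4.2, 6.1, 7.8)

b1 <- cov(x, y) / var(x)       # slope: predicted change in y per 1-unit increase in x
b0 <- mean(y) - b1 * mean(x)   # y-intercept: predicted y when x = 0

y_hat <- b0 + b1 * x           # predicted y-value for each x

print(c(intercept = b0, slope = b1))
print(y_hat)
```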
To fit a linear regression model to a set of data in R, we use the lm command. lm stands for
“linear model.” Here, we use lm to find the linear regression model relating pct_PELL (x) and
default_rate (y).

[8]: ## Run this code but do not edit it

# Create and display linear model: default_rate ~ pct_PELL
PELL_model <- lm(default_rate ~ pct_PELL, data = dat)
PELL_model

Call:
lm(formula = default_rate ~ pct_PELL, data = dat)

Coefficients:
(Intercept) pct_PELL
-0.9327 0.1765

The output of the lm command is a bit clunky, but here’s what it means:
- The (Intercept) value is the y-intercept ($\beta_0$)
- The pct_PELL value is the coefficient for the predictor. In other words, it’s the slope
  ($\beta_1$)
So, our regression equation can be written as:

$$\hat{y} = -0.9327 + (0.1765)x$$
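To make the equation concrete, the sketch below plugs a value into it. The coefficients are copied from the lm printout; the helper name predict_default and the input value of 40 are assumptions chosen only for illustration.

```r
# Coefficients copied from the lm printout
b0 <- -0.9327   # intercept
b1 <- 0.1765    # slope for pct_PELL

# Hypothetical helper: predicted default rate for a given pct_PELL value
predict_default <- function(pct_pell) b0 + b1 * pct_pell

pred_40 <- predict_default(40)   # a college where pct_PELL is 40
pred_0  <- predict_default(0)    # equals the intercept

print(pred_40)
print(pred_0)
```

Note that the prediction at pct_PELL = 0 is negative, which cannot be a real default rate. A fitted line can produce values like this at the extremes, so the intercept should be read as part of the equation rather than as a realistic forecast.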

2.3 - Identify the slope value and interpret what it means (in context).
Double-click this cell to type your answer here: The slope is 0.1765. For every one-unit increase
in pct_PELL, the model predicts that default_rate increases by 0.1765, on average.
2.4 - Use the gf_point and gf_lm commands to visualize a linear regression model for predicting
default_rate (outcome) using grad_rate (predictor).

[10]: # Your code goes here

gf_point(default_rate ~ grad_rate, data = dat) %>% gf_lm(color = "orange")

Check yourself: Your scatterplot should have a line on it with a negative slope.
2.5 - Use the lm command to find the linear regression model you visualized above. Store the
model in an object called grad_model and print it to see its values.

[13]: # Your code goes here

grad_model <- lm(default_rate ~ grad_rate, data = dat)
grad_model

Call:
lm(formula = default_rate ~ grad_rate, data = dat)

Coefficients:
(Intercept) grad_rate
14.4600 -0.1584

Check yourself: If you print out grad_model, you should see two numbers: 14.46 and -0.1584.
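Rather than reading the two numbers off the printout, they can also be pulled out of the model object with coef, and predict applies the fitted equation to new predictor values. The sketch below uses a made-up data frame toy and a model named toy_model (both assumptions for illustration) so it runs on its own; with the real data the same calls would be made on grad_model.

```r
# Made-up data for illustration only (not the College Scorecard data)
toy <- data.frame(
  grad_rate    = c(30, 45, 60, 75, 90),
  default_rate = c(10.1, 7.4, 5.2, 2.3, 0.5)
)

toy_model <- lm(default_rate ~ grad_rate, data = toy)

# coef() returns a named vector: "(Intercept)" and the predictor's name
b <- coef(toy_model)
print(b)

# predict() applies the fitted equation to new predictor values
new_colleges <- data.frame(grad_rate = c(50, 80))
preds <- predict(toy_model, newdata = new_colleges)
print(preds)
```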
2.6 - Identify the slope value and interpret what it means (in context).
Double-click this cell to type your answer here: The slope is -0.1584. For every one-unit increase
in grad_rate, the model predicts that the loan default rate decreases by 0.1584, on average.

0.1.5 3.0 - Analyzing strength ($R^2$)
In addition to the direction of a relationship (positive or negative), we can also look at the strength
of a relationship. The strength is a measure of the quality of our model’s predictions. A key
metric for analyzing the strength of a model is $R^2$. The following diagram (from Skew The Script)
shows the $R^2$ values of various linear models:
In the “weak” correlations, we see that our predictions (the linear model) tend to be far away from
the actual data values (the points). If we used a model with weak correlation to predict new data
values, our predictions would have high error. If we used a model with strong correlation to predict
new data values, our predictions would have low error.
$R^2$ takes values between 0 and 1 (alternatively: 0% to 100%). The stronger the model, the closer
$R^2$ gets to 1 (or 100%). The weaker the model, the closer $R^2$ gets to 0 (or 0%). An intuitive
way to think about it: for the perfectly strong correlations, the model gives 100% perfect
predictions. The models explain 100% of the variation in the data, so $R^2$ = 100%. As the
correlations get weaker, they start leaving room for error, since the models capture less of the
variation in the data. So, the $R^2$ value declines from 100%, approaching 0% if there’s no
correlation (the model adds no predictive power compared to naive guessing).
Optional Resource: If you’d like a more thorough explanation of the math behind $R^2$, check out
this video.
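For readers who want to see the arithmetic, $R^2$ can be computed as 1 minus the ratio of the model's squared error to the squared error of always guessing the mean. The sketch below does this on made-up numbers; x, y, and model are illustrative names, not objects from the college dataset.

```r
# Made-up data for illustration only
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 2.9, 4.2, 4.8, 6.3, 6.9)

model <- lm(y ~ x)

sse <- sum(residuals(model)^2)   # squared error left over after using the model
sst <- sum((y - mean(y))^2)      # squared error from naively guessing the mean of y

r_squared <- 1 - sse / sst       # share of the variation the model accounts for
print(r_squared)
```

If the model predicted no better than the mean, sse would equal sst and the result would be 0. If every point sat exactly on the line, sse would be 0 and the result would be 1.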
To see the $R^2$ values of our linear regression models, we can use the summary command. For
example, here we get the summary printout of grad_model.

[14]: ## Run this code but do not edit it

# Summarize default_rate ~ grad_rate model
summary(grad_model)

Call:
lm(formula = default_rate ~ grad_rate, data = dat)

Residuals:
Min 1Q Median 3Q Max
-6.9199 -1.4038 -0.2248 0.9011 20.5450

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 14.45997 0.29152 49.60 <2e-16 ***
grad_rate -0.15839 0.00474 -33.42 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.608 on 1051 degrees of freedom

Multiple R-squared: 0.5151, Adjusted R-squared: 0.5147
F-statistic: 1117 on 1 and 1051 DF, p-value: < 2.2e-16

There’s a lot going on in this printout. For now, focus on the bottom of the printed information.
The Multiple R-squared value is the $R^2$ value for the model. In this case, $R^2$ = 51.5%. So, we
can say that the correlation between graduation rates and student loan default rates is moderately
strong. This model would yield moderately strong predictions for default rates if used to predict
on new colleges.
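In a regression with one predictor, $R^2$ is the square of the correlation coefficient r, and r takes the sign of the slope. The sketch below recovers r from the two numbers in the summary printout; the variable names are illustrative.

```r
r_squared <- 0.5151     # Multiple R-squared from the summary printout
slope     <- -0.15839   # grad_rate estimate from the same printout

# With one predictor, r has the slope's sign and a magnitude
# equal to the square root of R-squared
r <- sign(slope) * sqrt(r_squared)
print(round(r, 3))
```

The result is about -0.718, a moderately strong negative correlation, which matches both the downward-sloping line and the description of the relationship as moderately strong.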
3.1 - Let’s consider a new variable: net_tuition (tuition minus average discounts and allowances
per student, in thousands of dollars). How well does a school’s tuition predict its student loan
default rate? Let’s start exploring. Go ahead and create a scatterplot that visualizes the relationship
between net_tuition (predictor) and default_rate (outcome). Overlay a linear regression
model on the graph using the %>% gf_lm(color = "orange") command.

[15]: # Your code goes here

gf_point(default_rate ~ net_tuition, data = dat) %>% gf_lm(color="orange")

3.2 - Use the lm command to find the linear regression model you visualized above. Store the
model in an object called tuition_model and print out the model’s values.
[16]: # Your code goes here
tuition_model <- lm(default_rate ~ net_tuition, data = dat)
tuition_model

Call:
lm(formula = default_rate ~ net_tuition, data = dat)

Coefficients:
(Intercept) net_tuition
8.0029 -0.2077

Check yourself: If you print out tuition_model, you should see two numbers: 8.0029 and -0.2077.
3.3 - Use the summary command to find the $R^2$ value of your linear model.
[17]: # Your code goes here
summary(tuition_model)

Call:
lm(formula = default_rate ~ net_tuition, data = dat)

Residuals:
Min 1Q Median 3Q Max
-6.4480 -1.9912 -0.5984 1.2492 25.4189

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.00294 0.21329 37.52 <2e-16 ***
net_tuition -0.20772 0.01331 -15.61 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.375 on 1051 degrees of freedom

Multiple R-squared: 0.1882, Adjusted R-squared: 0.1875
F-statistic: 243.7 on 1 and 1051 DF, p-value: < 2.2e-16

Check yourself: The $R^2$ value for tuition_model should be 0.1882.

3.4 - When evaluating different college options to predict if attending them would “pay off,” many
students look very closely at the tuition and costs of attending. Very few students look at colleges’
graduation rates. Is this reasonable or a mistake? Justify your answers using the $R^2$ values for the
grad_model and tuition_model.
Double-click this cell to type your answer here: It is a mistake. The $R^2$ value for grad_model
(0.5151) is much higher than for tuition_model (0.1882), so graduation rates provide a better
estimate of default rates than tuition cost does.
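One way to lay the two models side by side is a small comparison table. The sketch below copies the Multiple R-squared and residual standard error values from the two summary printouts in this notebook; the data frame name comparison and its column names are assumptions for illustration.

```r
# Values copied from the two summary() printouts in this notebook
comparison <- data.frame(
  model         = c("tuition_model", "grad_model"),
  r_squared     = c(0.1882, 0.5151),
  resid_std_err = c(3.375, 2.608),
  stringsAsFactors = FALSE
)

# Sort so the strongest model (highest R-squared) comes first
comparison <- comparison[order(-comparison$r_squared), ]
print(comparison)

best <- comparison$model[1]
print(best)
```

The model with the higher $R^2$ also has the smaller residual standard error, so both numbers point the same way: predictions based on graduation rate miss by less than predictions based on net tuition.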
3.5 - The correlation between tuition costs and student loan default rates is negative. This means
that as tuition costs get higher, fewer students tend to default on their student loans. Is that
possible? What might be going on here?
Double-click this cell to type your answer here: One would expect the relationship to be positive,
but higher-tuition schools may be of higher quality and may better prepare students to pay off
their loans.

0.1.6 Feedback (Required)
Please take 2 minutes to fill out this anonymous notebook feedback form, so we can continue
improving this notebook for future years!
