
Notebook 3 - Multiple Regression

May 22, 2024

Reference Guide for R (student resource) - Check out our reference guide for a full listing
of useful R commands for this project.

0.1 Data Science Project: Use data to determine the best and worst colleges
for conquering student debt.
0.1.1 Notebook 3: Multiple Regression
Does college pay off? We’ll use some of the latest data from the US Department of Education’s
College Scorecard Database to answer that question.
In this notebook (the 3rd of 4 total notebooks), you'll use R to create a more advanced type of
model: the multiple regression model. In doing so, you'll be able to isolate which factors (controlling
for other variables) make certain colleges worth the price of admission.

[1]: ## Run this code but do not edit it. Hit Ctrl+Enter to run the code.
# This command downloads a useful package of R commands
library(coursekata)

── CourseKata packages ───────────────────────────── coursekata 0.15.0 ──
✔ dslabs              0.8.0    ✔ Metrics     0.1.4
✔ Lock5withR          1.2.2    ✔ lsr         0.5.2
✔ fivethirtyeightdata 0.1.0    ✔ mosaic      1.9.1
✔ fivethirtyeight     0.6.2    ✔ supernova   3.0.0

0.1.2 The Dataset (four_year_colleges.csv)


General description - In this notebook, we'll be using the four_year_colleges.csv file, which
only includes schools that offer four-year bachelor's degrees and/or higher graduate degrees.
Community colleges and trade schools often have different goals (e.g. facilitating transfers, direct
career education) than institutions that offer four-year bachelor's degrees. By comparing four-year
colleges only to other four-year colleges, we'll have clearer analyses and conclusions.

This data is a subset of the US Department of Education’s College Scorecard Database. The data
is current as of the 2020-2021 school year.
Description of all variables: See here
Detailed data file description: See here

0.1.3 1.0 - Motivating multiple regression


To begin, let’s download our data. We’ll download the four_year_colleges.csv file from the
skewthescript.org website and store it in an R dataframe called dat.

[2]: ## Run this code but do not edit it. Hit Ctrl+Enter to run the code.
# This command downloads data from the file 'four_year_colleges.csv' and stores it in an object called `dat`

dat <- read.csv('https://skewthescript.org/s/four_year_colleges.csv')


head(dat)

A data.frame: 6 × 26 (only the first few columns fit in this printout)

  OPEID  name                                 city        state  region
1 100200 Alabama A & M University             Normal      AL     South
2 105200 University of Alabama at Birmingham  Birmingham  AL     South
3 105500 University of Alabama in Huntsville  Huntsville  AL     South
4 100500 Alabama State University             Montgomery  AL     South
5 105100 The University of Alabama            Tuscaloosa  AL     South
6 831000 Auburn University at Montgomery      Montgomery  AL     South
… (the remaining columns are cut off in this printout)
As before, we’re going to use student loan default rates as our key outcome variable in
determining whether college “pays off.”
In the previous notebook, we looked at the following predictors of student loan default rates:
- pct_PELL - percent of the student body that receives Pell grants. Note: Pell grants are government scholarships given to students from low-income families
- grad_rate - percent of students who successfully graduate
- net_tuition - net tuition (tuition minus average discounts and allowances) per student, in thousands of dollars
In the last notebook, we fit a simple linear regression model to predict default_rate (outcome)
using net_tuition (predictor). Below is the scatterplot we produced (along with a visual of our
linear model):

[3]: ## Run this code but do not edit it


# create scatterplot: default_rate ~ net_tuition, with linear model overlayed
gf_point(default_rate ~ net_tuition, data = dat) %>% gf_lm(color = "orange")

1.1 (Review Question) - Is the correlation between tuition costs and student loan default rates
positive or negative? Does the direction of the relationship surprise you? Why or why not?
Double-click this cell to type your answer here:
Below is the same graphic except, this time, we color the colleges by their graduation rates. Take
a look:
[4]: ## Run this code but do not edit it
# show scatter for default_rate ~ net_tuition, color by grad_rate
gf_point(default_rate ~ net_tuition, color = ~grad_rate, data = dat)

Note: As stated in the dataset description above, default_rate describes the percent of all of a
school’s borrowers that are in default on their student loans. This includes students who have
graduated, transferred, or did not complete their programs.
There’s a lot going on in this graph. For help, we recommend watching this video, which
discusses how to interpret graphs that visualize multiple variables at once.
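If the continuous color scale is hard to read, here is an optional sketch that bins grad_rate into four bands and facets the scatterplot, so each panel becomes an ordinary two-variable plot. The grad_band variable and its cut points are our own choices for illustration; the plotting helpers are the ggformula functions loaded by coursekata.

## Optional sketch: bin graduation rates and facet the scatterplot
# Create a new variable with four graduation-rate bands (cut points chosen for illustration)
dat$grad_band <- cut(dat$grad_rate, breaks = c(0, 25, 50, 75, 100),
                     labels = c("0-25%", "25-50%", "50-75%", "75-100%"),
                     include.lowest = TRUE)

# Each panel now shows default_rate ~ net_tuition for one band of graduation rates
gf_point(default_rate ~ net_tuition, data = dat) %>%
  gf_facet_wrap(~ grad_band)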
1.2 - Look at the bottom-right corner of the graph. These are colleges that charge their students
a lot of money (high tuition) yet, somehow, they have low student loan default rates. Describe the
graduation rates of these schools.
Double-click this cell to type your answer here: The graduation rates of these high-tuition schools
are close to 100%, since the dots on the right side of the graph are generally yellow/yellow-green.
1.3 - Look at the top-left corner of the graph. These are colleges that don’t charge a lot (low
tuitions) yet, somehow, their students have high default rates. Describe the graduation rates of
these schools.
Double-click this cell to type your answer here: The graduation rates of lower tuition schools
tend to be lower, from 0 to 50%, as the dots are blue/blue-green on the left side of the graph.
1.4 - Based on your answers to the previous two questions, give a possible reason why students
at lower-cost schools (who, presumably, have less initial debt than their peers) somehow have
higher loan default rates.
Double-click this cell to type your answer here: Lower-cost schools might be more accessible
to students in tougher financial situations, who may be less able to finish their programs or pay
off their loans, which increases default rates and decreases graduation rates.
In data science, we say that graduation rates and tuition are confounded. Since they both rise
and fall together, it can be hard to tell which is really “making the difference” in default rates. Is
it possible to “tease out” which factor is more directly associated with students being able to pay
off their loans? The next section will introduce you to a new type of modeling - multiple regression
- that can help us answer this question.
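One quick, optional check is to compute the correlation between the two predictors themselves; this is a minimal sketch using base R on the dat dataframe loaded above.

## Optional check: are the two predictors related to each other?
# A clearly positive value means higher-tuition schools also tend to have
# higher graduation rates - i.e., the two predictors are confounded.
cor(dat$net_tuition, dat$grad_rate)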

0.1.4 2.0 - Fitting and interpreting a multiple regression model


Again, let’s show the scatterplot between net_tuition (predictor) and default_rate (outcome),
along with the linear model:
[5]: ## Run this code but do not edit it
# create scatterplot: default_rate ~ net_tuition, with linear model overlayed
gf_point(default_rate ~ net_tuition, data = dat) %>% gf_lm(color = "orange")

2.1 (Review Question) - Use the lm command to fit and store the linear regression model that’s
visualized above, using net_tuition (predictor) in order to predict default_rate (outcome). Save
the model in an object called tuition_model and print out the model.
[8]: # Your code goes here
tuition_model <- lm(default_rate ~ net_tuition, data=dat)
tuition_model

Call:
lm(formula = default_rate ~ net_tuition, data = dat)

Coefficients:
(Intercept) net_tuition
8.0029 -0.2077

Check yourself: If you print out tuition_model, you should see two numbers: 8.0029 and -0.2077.
Recall that simple linear regressions follow this formula:

\hat{y} = \beta_0 + \beta_1 x

Where:
- \hat{y} is the predicted y-value (predicted outcome value)
- \beta_0 is the y-intercept --> the predicted y-value (outcome value) when x = 0 (the predictor's value is 0)
- \beta_1 is the slope --> the predicted change in y (outcome) for a 1-unit increase in x (predictor)
- x is the x-value (predictor value)
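As an optional check, we can plug a value into this formula by hand and compare the result to R's predict function. The input net_tuition = 10 (i.e., $10,000) is a hypothetical value chosen just for illustration.

## Optional check: predicted default_rate when net_tuition = 10 (i.e., $10,000)
# By hand: 8.0029 + (-0.2077 * 10) = roughly 5.93
predict(tuition_model, newdata = data.frame(net_tuition = 10))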
2.2 (Review Question) - What is the slope value from our tuition_model? Interpret the
meaning of this value (in context).
Double-click this cell to type your answer here: The slope value is -0.2077, meaning that for
every $1,000 increase in net tuition, the default rate is expected to decrease by about 0.21 percentage points.
2.3 (Review Question) - Use the summary command on tuition_model to see summary infor-
mation about the linear model. What is the 𝑅2 value from our tuition_model? What does this
value indicate about the strength of the model?
[9]: # Your code goes here
summary(tuition_model)

Call:
lm(formula = default_rate ~ net_tuition, data = dat)

Residuals:
Min 1Q Median 3Q Max
-6.4480 -1.9912 -0.5984 1.2492 25.4189

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.00294 0.21329 37.52 <2e-16 ***
net_tuition -0.20772 0.01331 -15.61 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.375 on 1051 degrees of freedom


Multiple R-squared: 0.1882, Adjusted R-squared: 0.1875
F-statistic: 243.7 on 1 and 1051 DF, p-value: < 2.2e-16

Check yourself: You should find an 𝑅2 value of 0.1882
Double-click this cell to type your answer here: The R^2 value is 0.1882, meaning net_tuition
alone explains only about 19% of the variation in default_rate, so this is a fairly weak model.
Student Note: There’s a lot going on in the following section. We recommend taking a break
to watch this video, which provides an overview of multiple regression models and walks through
interpreting the values from this model. Once you’re done with the video, continue reading below.
So far, we have only been working with simple linear regressions: models that use one predictor
variable (net_tuition) to predict the outcome variable (default_rate). If we'd like to use
multiple predictor variables at once in order to model our outcome, we can use a technique called
multiple regression.
For example, imagine we want to use both net_tuition (𝑥1 ) and grad_rate (𝑥2 ) to predict
default_rate (𝑦). We can write a new model with multiple predictors, like this:

\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2

Where:
- \hat{y} is the predicted default_rate
- x_1 is the net_tuition
- x_2 is the grad_rate

This means that…
- \beta_1 is the slope for net_tuition --> the slope between default_rate and net_tuition, controlling for all other predictors
- \beta_2 is the slope for grad_rate --> the slope between default_rate and grad_rate, controlling for all other predictors
- \beta_0 is the y-intercept --> the predicted y-value when x_1 = 0 and x_2 = 0 (when net_tuition and grad_rate are both 0)
Let’s go ahead and fit this model, so we can understand what this all really means. In R, if we
want to use multiple predictors within our model (such as net_tuition and grad_rate), we simply
include both of them in our lm command. See below:
[10]: ## Run this code but do not edit it
# fit multiple regression: default_rate ~ net_tuition + grad_rate
tuition_grad_model <- lm(default_rate ~ net_tuition + grad_rate, data = dat)
tuition_grad_model

Call:
lm(formula = default_rate ~ net_tuition + grad_rate, data = dat)

Coefficients:
(Intercept) net_tuition grad_rate
14.478742 0.006692 -0.160296

As you can see, the values have changed a bit, and an extra slope term has now appeared in our
model. We can plug these values into our model like so:

\hat{y} = 14.479 + (0.007)x_1 + (-0.160)x_2

Here's how we can interpret the slopes in our model:
- \beta_1 = 0.007 --> For every 1,000 dollar increase in net_tuition, we expect a 0.007 percentage point increase in default_rate, controlling for grad_rate
- \beta_2 = -0.160 --> For every 1 percentage point increase in grad_rate, we expect a 0.160 percentage point decrease in default_rate, controlling for net_tuition
The key is that multiple regression allows you to control for other predictors, which helps us
eliminate confounding. When we can control for graduation rates - i.e. when comparing colleges
with similar graduation rates - we see that tuition is now positively related to default rates. In
other words, if students attend colleges with similar graduation rates, we’d expect the one that
charges more in tuition to have higher rates of default.
So, charging students more for school is, in fact, associated with higher rates of default - as long
as we’re comparing among schools with similar graduation rates.
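To make that concrete, here is an optional sketch that compares two hypothetical colleges with the same graduation rate (60%) but different net tuitions ($10,000 vs. $30,000); these inputs are made up purely for illustration.

## Optional check: same grad_rate, different net_tuition
two_schools <- data.frame(net_tuition = c(10, 30), grad_rate = c(60, 60))
predict(tuition_grad_model, newdata = two_schools)
# Holding grad_rate fixed at 60%, the higher-tuition school gets the (slightly)
# higher predicted default rate, matching the positive net_tuition slope above.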
Just as we can use the summary command to find the 𝑅2 value of a simple linear regression, we can
use summary to find the 𝑅2 of our multiple regression model:
[11]: ## Run this code but do not edit it
# summary of tuition_grad_model
summary(tuition_grad_model)

Call:
lm(formula = default_rate ~ net_tuition + grad_rate, data = dat)

Residuals:
Min 1Q Median 3Q Max
-6.9530 -1.4051 -0.1913 0.9162 20.4882

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 14.478742 0.293922 49.261 <2e-16 ***
net_tuition 0.006692 0.013066 0.512 0.609
grad_rate -0.160296 0.006023 -26.616 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.609 on 1050 degrees of freedom


Multiple R-squared: 0.5153, Adjusted R-squared: 0.5143
F-statistic: 558.1 on 2 and 1050 DF, p-value: < 2.2e-16

2.4 - How does the 𝑅2 value of our multiple regression model (tuition_grad_model) compare
to the 𝑅2 value of our simple linear regression model (tuition_model)? Did adding grad_rate
alongside net_tuition help make the model's predictions stronger? Explain.
Double-click this cell to type your answer here: The R^2 for the multiple regression model
is higher, so adding grad_rate to net_tuition made the model’s predictions stronger.
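If you'd like to go beyond eyeballing the R-squared values, one optional approach is an F-test comparing the two nested models; this is a minimal sketch using base R's anova function on the models fit above.

## Optional: F-test comparing the nested models
# The output includes an F statistic and p-value for the improvement gained
# by adding grad_rate on top of net_tuition.
anova(tuition_model, tuition_grad_model)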

0.1.5 3.0 - Making your own multiple regression models


3.1 - There's no reason that you have to stop at 2 predictors. Your model could have many
predictors! Use the lm command to create a model that predicts default_rate using three
predictor variables: net_tuition, grad_rate, and pct_PELL. Store the model in an object called
tuition_grad_pell_model and then print out the model.

[12]: # Your code goes here


tuition_grad_pell_model <- lm(default_rate ~ net_tuition + grad_rate + pct_PELL, data = dat)

tuition_grad_pell_model

Call:
lm(formula = default_rate ~ net_tuition + grad_rate + pct_PELL,
data = dat)

Coefficients:
(Intercept) net_tuition grad_rate pct_PELL
8.51264 0.03059 -0.11731 0.09045

Check yourself: When you print out the model, you should see four numbers: 8.513, 0.031, -0.117,
0.090
3.2 - Interpret (in context) the slope value for pct_PELL from your model.
Double-click this cell to type your answer here: For every 1 percentage point increase in the
proportion of students at the institution who receive Pell grants, the expected default rate increases
by 0.090 percentage points, controlling for net_tuition and grad_rate.
3.3 - Use the summary command to find the 𝑅2 value of the tuition_grad_pell_model.

[13]: # Your code goes here


summary(tuition_grad_pell_model)

Call:
lm(formula = default_rate ~ net_tuition + grad_rate + pct_PELL,
data = dat)

Residuals:
Min 1Q Median 3Q Max
-8.1491 -1.2449 -0.0454 1.0199 18.8640

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.512643 0.552863 15.397 <2e-16 ***
net_tuition 0.030587 0.012354 2.476 0.0134 *
grad_rate -0.117307 0.006603 -17.766 <2e-16 ***
pct_PELL 0.090451 0.007275 12.432 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.437 on 1049 degrees of freedom
Multiple R-squared: 0.5775, Adjusted R-squared: 0.5763
F-statistic: 478 on 3 and 1049 DF, p-value: < 2.2e-16

Check yourself: The output should show an 𝑅2 value of 0.5775


3.4 - Compare the 𝑅2 values from tuition_grad_model and tuition_grad_pell_model. Did
adding pct_PELL strengthen the model’s predictions? If so, did it strengthen the model’s predictions
by a large amount? Explain.
Double-click this cell to type your answer here: Yes, the model's coefficient of determination (R^2)
increased from roughly 0.52 to 0.58, a modest improvement in its predictive capability on our data.
3.5 - Create your own multiple regression model, using variables of your own choosing. Analyze
the slope values from at least two separate predictors and try to maximize the 𝑅2 value.
Hints:
- Look at the dataset description here to identify good potential predictor variables for your model (the short sketch after these hints shows one way to list the available columns).
- You may be tempted to use all the variables in the dataset as predictors. This may not be the best idea. The next notebook will explore why.
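If the dataset description link isn't handy, here is a quick optional sketch for listing the columns you can choose from, using base R on the dat dataframe.

## Optional: list the available columns (and their types) in the dataset
names(dat)
str(dat)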
[21]: # Your code goes here
fam_income_tuition_model <- lm(default_rate ~ med_alum_earnings + grad_rate + pct_PELL, data = dat)

summary(fam_income_tuition_model)

Call:
lm(formula = default_rate ~ med_alum_earnings + grad_rate + pct_PELL,
data = dat)

Residuals:
Min 1Q Median 3Q Max
-8.124 -1.339 -0.184 1.097 18.563

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.812792 0.593973 16.521 < 2e-16 ***
med_alum_earnings -0.042631 0.008303 -5.135 3.37e-07 ***
grad_rate -0.089102 0.007170 -12.427 < 2e-16 ***
pct_PELL 0.081407 0.007222 11.272 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.414 on 1049 degrees of freedom


Multiple R-squared: 0.5855, Adjusted R-squared: 0.5843
F-statistic: 493.8 on 3 and 1049 DF, p-value: < 2.2e-16

0.1.6 Feedback (Required)
Please take 2 minutes to fill out this anonymous notebook feedback form, so we can continue
improving this notebook for future years!
