
Notebook 2 - Linear Regression

This document outlines a data science project focused on analyzing student debt across colleges using linear regression in R. It includes instructions for creating scatterplots, fitting regression models, and interpreting the relationships between various college metrics and student loan default rates. The dataset used is from the US Department of Education's College Scorecard Database, specifically for four-year colleges.

Uploaded by

simoncheng


Notebook 2 - Linear Regression

May 22, 2024

Reference Guide for R (student resource) - Check out our reference guide for a full listing
of useful R commands for this project.

0.1 Data Science Project: Use data to determine the best and worst colleges for conquering student debt.
0.1.1 Notebook 2: Simple Linear Regression
Does college pay off? We’ll use some of the latest data from the US Department of Education’s
College Scorecard Database to answer that question.
In this notebook (the 2nd of 4 total notebooks), you’ll use R to create scatterplots, fit simple linear
regression models, and compare the strength of your models. By the end of this notebook, you’ll
see what factors make certain colleges better investments than others.
[1]: ## Run this code but do not edit it. Hit Ctrl+Enter to run the code
# This command loads a useful package of R commands
library(coursekata)

CourseKata packages (coursekata 0.15.0)

dslabs 0.8.0                 Metrics 0.1.4
Lock5withR 1.2.2             lsr 0.5.2
fivethirtyeightdata 0.1.0    mosaic 1.9.1
fivethirtyeight 0.6.2        supernova 3.0.0

0.1.2 The Dataset (four_year_colleges.csv)

General description - In this notebook, we’ll be using the four_year_colleges.csv file, which
only includes schools that offer four-year bachelor’s degrees and/or higher graduate degrees.
Community colleges and trade schools often have different goals (e.g. facilitating transfers, direct
career education) than institutions that offer four-year bachelor’s degrees. By comparing four-year
colleges only to other four-year colleges, we’ll have clearer analyses and conclusions.
This data is a subset of the US Department of Education’s College Scorecard Database. The data
is current as of the 2020-2021 school year.
Description of all variables: See here
Detailed data file description: See here

0.1.3 1.0 - Creating scatterplots

To begin, let’s download our data. We’ll download the four_year_colleges.csv file from the
skewthescript.org website and store it in an R dataframe called dat.

[2]: ## Run this code but do not edit it. Hit Ctrl+Enter to run the code
# This command downloads the data
dat <- read.csv('https://skewthescript.org/s/four_year_colleges.csv')

1.1 - Use the head command to print out the first several rows of the dataset.
[3]: # Your code goes here
head(dat)

A data.frame: 6 × 26
  OPEID  name                                 city        state region me…
  <int>  <chr>                                <chr>       <chr> <chr>  <d…
1 100200 Alabama A & M University             Normal      AL    South  15.…
2 105200 University of Alabama at Birmingham  Birmingham  AL    South  15.…
3 105500 University of Alabama in Huntsville  Huntsville  AL    South  14.…
4 100500 Alabama State University             Montgomery  AL    South  17.…
5 105100 The University of Alabama            Tuscaloosa  AL    South  17.…
6 831000 Auburn University at Montgomery      Montgomery  AL    South  12.…
1.2 - Use the dim command to find the number of colleges (rows) and number of variables (columns)
in our dataset.
[4]: # Your code goes here
dim(dat)

1. 1053 2. 26
Check yourself: Your code should have printed out two numbers: 1053 and 26.
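As a minimal sketch of what dim reports, the snippet below builds a tiny, made-up data frame (toy is hypothetical and only stands in for dat). It shows that dim returns the row count followed by the column count, the same numbers that nrow and ncol give separately.

```r
# Hypothetical 3-row stand-in for dat; the values are invented for illustration
toy <- data.frame(
  name = c("College A", "College B", "College C"),
  default_rate = c(4.2, 7.9, 2.1),
  grad_rate = c(61, 45, 83)
)

dims <- dim(toy)     # c(number of rows, number of columns)
n_rows <- nrow(toy)  # rows only
n_cols <- ncol(toy)  # columns only

print(dims)
```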
A good measure of whether attending a certain college “pays off” is its student loan default
rate. If a college is low-cost and prepares students for high-paying jobs, few students will default
on their loans. If a college is high-cost and does not prepare students for high-paying jobs, many
students will have trouble paying off their loans (high default rate).
So, our main outcome variable in this analysis will be default_rate. We’re going to use scatterplots to see how strongly different predictor variables correlate with default rates. In particular, we’re going to explore how well each of the following variables predicts colleges’ default rates:
- pct_PELL - percent of student body that receives PELL grants. Note: PELL grants are government scholarships given to students from low-income families
- grad_rate - percent of students who successfully graduate
- net_tuition - Net tuition (tuition minus average discounts and allowances) per student, in thousands of dollars
To begin, let’s create a scatterplot of colleges’ default rates and the percent of their student body
that receive PELL grants. We can use the gf_point command to make the graph:

[5]: ## Run this code but do not edit it

# Create scatterplot: default_rate ~ pct_PELL
gf_point(default_rate ~ pct_PELL, data = dat)

We see that there’s a positive relationship between pct_PELL and default_rate. The colleges with
the highest rates of PELL grant recipients (low-income students) also tend to have higher student
loan default rates. In other words, if you were to fit a model to this data, it would predict higher
default rates at schools that serve more PELL recipients.
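One way to put a number on the direction seen in a scatterplot is the correlation coefficient, which base R computes with cor. The sketch below uses a small invented data set (not the College Scorecard data) in which the outcome rises with the predictor, so the coefficient comes out positive.

```r
# Invented values for illustration only; they are not taken from dat
pct_pell_toy <- c(10, 20, 30, 40, 50, 60)
default_toy  <- c(1.5, 2.0, 4.5, 5.0, 8.5, 9.0)

# Pearson correlation: a positive value means the variables tend to rise together
r_toy <- cor(pct_pell_toy, default_toy)
print(r_toy)
```

With the real data, the analogous call would be cor(dat$pct_PELL, dat$default_rate), assuming neither column contains missing values.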
We must keep in mind: correlation is not causation. The scatterplot shows us that default rates and PELL recipient rates are positively correlated. However, the graph doesn’t show us a clear causal explanation behind the correlation. For example, here are several causal explanations that this graph can’t distinguish between:
- PELL recipients may only be able to afford to attend low-quality colleges. These colleges have higher default rates because they fail to prepare students for the workforce.
- PELL recipients may have fewer family resources to weather financial emergencies in the first few years after college. So, the schools that serve PELL recipients at high rates will also have more of their students defaulting on loans (regardless of the school’s quality).
- PELL recipients may have attended lower-quality high schools, which don’t properly prepare them for college. So, these students may drop out of college at higher rates, which raises their chances of defaulting on student loans.
Or, it could be a combination of all those explanations! We can’t tell from this analysis alone.
1.3 - In the next question, you will create a scatterplot that visualizes the relationship between
grad_rate and default_rate. Before doing so, make a prediction: Do you expect student loan
default rates to positively or negatively correlate with graduation rates? Why?
Double-click this cell to type your answer here: I expect a negative correlation. Students who
don’t graduate will likely have a harder time in the workforce and so default on their loans more
often. Lower graduation rates would therefore go with higher default rates.
1.4 - Create a scatterplot that visualizes the relationship between grad_rate (predictor) and
default_rate (outcome).

[6]: # Your code goes here

gf_point(default_rate ~ grad_rate, data = dat)

Check yourself: Your code should have generated a scatterplot with the x-axis labeled with
grad_rate and the y-axis labeled with default_rate.
1.5 - Using your scatterplot, describe the relationship between graduation rates and student loan
default rates. For instance, are these variables positively or negatively related? How can you tell?
Does this corroborate your prediction from Question 1.3? Explain.
Double-click this cell to type your answer here: The variables are negatively related. The
points trend downward from left to right in a roughly linear pattern, so colleges with higher
graduation rates tend to have lower default rates. This matches my prediction from Question 1.3.

0.1.4 2.0 - Simple linear regression (one predictor)

2.1 - If you haven’t taken AP Stats, watch this video, which provides an introduction to linear
regression.
Note: This video is adapted from other materials and covers data from a separate context. How-
ever, the video provides a good intro to the concepts and models we’ll be using in this section of
the project.
Let’s create a linear regression model relating pct_PELL (x) and default_rate (y). To visualize our
model, we can graph the line modeled by our equation on top of the scatterplot relating pct_PELL
to default_rate. We use the gf_point command to produce the scatterplot, the gf_lm command
to graph our linear model, and the %>% symbol to put the elements together on the same graph:
[8]: ## Run this code but do not edit it
# Overlay linear model of default_rate ~ pct_PELL on top of scatterplot
gf_point(default_rate ~ pct_PELL, data = dat) %>% gf_lm(color = "orange")

2.2 - Is the slope value of this model positive or negative? How can you tell?
Double-click this cell to type your answer here: Positive. The line rises from left to right, so
as pct_PELL increases, the predicted default_rate increases as well.

R can help us find the equation that models this linear regression line. As shown in the video,
we can model a linear trend between a predictor (x) and outcome (y) using this linear regression
formula:

$$\hat{y} = \beta_0 + \beta_1 x$$

Where:
- $\hat{y}$ (pronounced “y hat”) is the predicted y-value (predicted outcome value)
- $\beta_0$ (pronounced “beta zero”) is the y-intercept --> the predicted y-value (outcome value) when x = 0 (the predictor’s value is 0)
- $\beta_1$ (pronounced “beta one”) is the slope --> the predicted change in y (outcome) for a 1-unit increase in x (predictor)
- $x$ is the x-value (predictor value)
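To make the roles of the intercept and slope concrete, here is a sketch that computes both by hand for a five-point toy data set. It assumes the standard least-squares formulas (slope = covariance of x and y divided by the variance of x; intercept = mean of y minus slope times mean of x). The data and variable names are invented for illustration.

```r
# Toy data, invented for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Least-squares slope: predicted change in y for a 1-unit increase in x
b1 <- cov(x, y) / var(x)

# Least-squares intercept: predicted y when x = 0
b0 <- mean(y) - b1 * mean(x)

# Predicted values from y-hat = b0 + b1 * x
y_hat <- b0 + b1 * x

print(c(intercept = b0, slope = b1))
```

For these five points the slope works out to 0.6 and the intercept to 2.2.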
To fit a linear regression model to a set of data in R, we use the lm command. lm stands for
“linear model.” Here, we use lm to find the linear regression model relating pct_PELL (x) and
default_rate (y).

[9]: ## Run this code but do not edit it

# Create and display linear model: default_rate ~ pct_PELL
PELL_model <- lm(default_rate ~ pct_PELL, data = dat)
PELL_model

Call:
lm(formula = default_rate ~ pct_PELL, data = dat)

Coefficients:
(Intercept) pct_PELL
-0.9327 0.1765

The output of the lm command is a bit clunky, but here’s what it means:
- The (Intercept) value is the y-intercept ($\beta_0$)
- The pct_PELL value is the coefficient for the predictor. In other words, it’s the slope ($\beta_1$)

So, our regression equation can be written as:

$$\hat{y} = -0.9327 + (0.1765)x$$
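As a worked example of using this equation, the sketch below plugs a PELL percentage into the fitted coefficients. The helper name predict_default_from_pell and the 40% input are made up for illustration; the coefficients are the rounded values from the lm printout.

```r
# Rounded coefficients from the PELL_model printout
b0 <- -0.9327  # intercept
b1 <- 0.1765   # slope for pct_PELL

# Hypothetical helper: predicted default rate for a given PELL percentage
predict_default_from_pell <- function(pct_pell) {
  b0 + b1 * pct_pell
}

# Example: a college where 40% of students receive PELL grants
pred_40 <- predict_default_from_pell(40)
print(pred_40)
```

With these rounded coefficients, a college with 40% PELL recipients has a predicted default rate of about 6.13%.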

2.3 - Identify the slope value and interpret what it means (in context).
Double-click this cell to type your answer here: The slope is 0.1765. For every 1 percentage
point increase in the share of students receiving PELL grants, the model predicts a 0.1765
percentage point increase in the default rate.
2.4 - Use the gf_point and gf_lm commands to visualize a linear regression model for predicting
default_rate (outcome) using grad_rate (predictor).

[10]: # Your code goes here

gf_point(default_rate ~ grad_rate, data=dat) %>% gf_lm(color="blue")

Check yourself: Your scatterplot should have a line on it with a negative slope.
2.5 - Use the lm command to find the linear regression model you visualized above. Store the
model in an object called grad_model and print it to see its values.

[12]: # Your code goes here

grad_model <- lm(default_rate ~ grad_rate, data=dat)
grad_model

Call:
lm(formula = default_rate ~ grad_rate, data = dat)

Coefficients:
(Intercept) grad_rate
14.4600 -0.1584

Check yourself: If you print out grad_model, you should see two numbers: 14.46 and -0.1584.
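Once a model is stored in an object, its numbers can be pulled out programmatically instead of read off the printout. This sketch uses base R's coef and predict functions on a small invented data frame (toy) in place of dat, so the coefficients here are not the ones from grad_model.

```r
# Invented data that falls exactly on the line default_rate = 14 - 0.15 * grad_rate
toy <- data.frame(grad_rate = c(20, 40, 60, 80))
toy$default_rate <- 14 - 0.15 * toy$grad_rate

toy_model <- lm(default_rate ~ grad_rate, data = toy)

# Named vector with entries "(Intercept)" and "grad_rate"
toy_coefs <- coef(toy_model)

# Predicted default rate for a hypothetical college with a 50% graduation rate
toy_pred <- predict(toy_model, newdata = data.frame(grad_rate = 50))

print(toy_coefs)
print(toy_pred)
```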
2.6 - Identify the slope value and interpret what it means (in context).
Double-click this cell to type your answer here: The slope is -0.1584. For every 1 percentage
point increase in graduation rate, the model predicts a 0.1584 percentage point decrease in the
default rate.

0.1.5 3.0 - Analyzing strength ($R^2$)
In addition to the direction of a relationship (positive or negative), we can also look at the strength of a relationship. The strength is a measure of the quality of our model’s predictions. A key metric for analyzing the strength of a model is $R^2$. The following diagram (from Skew The Script) shows the $R^2$ values of various linear models:
In the “weak” correlations, we see that our predictions (the linear model) tend to be far away from the actual data values (the points). If we used a model with weak correlation to predict new data values, our predictions would have high error. If we used a model with strong correlation to predict new data values, our predictions would have low error.
$R^2$ takes values between 0 and 1 (alternatively: 0% and 100%). The stronger the model, the closer $R^2$ gets to 1 (or 100%). The weaker the model, the closer $R^2$ gets to 0 (or 0%). An intuitive way to think about it: for the perfectly strong correlations, the model gives 100% perfect predictions. The models explain 100% of the variation in the data, so $R^2$ = 100%. As the correlations get weaker, they start leaving room for error, since the models capture less of the variation in the data. So, the $R^2$ value declines from 100%, approaching 0% if there’s no correlation (model adds no prediction power compared to naive guessing).
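The “share of variation explained” idea can be sketched directly. Assuming the usual definition, R-squared = 1 - SSE/SST (SSE is the squared error of the model's predictions; SST is the squared error of the naive guess, the mean of y), the code below computes it by hand on invented data and compares it with the value R reports.

```r
# Toy data, invented for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

fit <- lm(y ~ x)

# SSE: squared error left over after using the linear model
sse <- sum((y - fitted(fit))^2)

# SST: squared error from naively guessing mean(y) for every point
sst <- sum((y - mean(y))^2)

r2_by_hand <- 1 - sse / sst
r2_from_r  <- summary(fit)$r.squared

print(c(by_hand = r2_by_hand, from_summary = r2_from_r))
```

For this toy data both values come out to 0.6: the line removes 60% of the squared error that naive guessing would leave.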
Optional Resource: If you’d like a more thorough explanation of the math behind $R^2$, check out
this video.
To see the $R^2$ values of our linear regression models, we can use the summary command. For
example, here we get the summary printout of grad_model.

[13]: ## Run this code but do not edit it

# Summarize default_rate ~ grad_rate model
summary(grad_model)

Call:
lm(formula = default_rate ~ grad_rate, data = dat)

Residuals:
Min 1Q Median 3Q Max
-6.9199 -1.4038 -0.2248 0.9011 20.5450

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 14.45997 0.29152 49.60 <2e-16 ***
grad_rate -0.15839 0.00474 -33.42 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.608 on 1051 degrees of freedom

Multiple R-squared: 0.5151, Adjusted R-squared: 0.5147
F-statistic: 1117 on 1 and 1051 DF, p-value: < 2.2e-16

There’s a lot going on in this printout. For now, focus on the bottom of the printed information.
The Multiple R-squared value is the $R^2$ value for the model. In this case, $R^2$ = 51.5%. So, we
can say that the correlation between graduation rates and student loan default rates is moderately
strong. This model would yield moderately accurate predictions for default rates if used to predict
on new colleges.
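For a model with a single predictor, R-squared equals the square of the correlation coefficient r, so r can be recovered from the printout: take the square root of R-squared and attach the sign of the slope. The sketch below does this with the rounded values reported for grad_model; the variable names are illustrative.

```r
# Rounded values from the summary(grad_model) printout
r_squared <- 0.5151
slope     <- -0.15839

# In simple (one-predictor) linear regression, r = sign(slope) * sqrt(R^2)
r <- sign(slope) * sqrt(r_squared)
print(round(r, 3))
```

This gives a correlation of roughly -0.72. With the model object in hand, summary(grad_model)$r.squared returns the unrounded R-squared directly.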
3.1 - Let’s consider a new variable: net_tuition (tuition minus average discounts and allowances
per student, in thousands of dollars). How well does a school’s tuition predict its student loan
default rate? Let’s start exploring. Go ahead and create a scatterplot that visualizes the relationship
between net_tuition (predictor) and default_rate (outcome). Overlay a linear regression
model on the graph using the %>% gf_lm(color = "orange") command.

[15]: # Your code goes here

gf_point(default_rate ~ net_tuition, data=dat) %>% gf_lm(color="orange")

3.2 - Use the lm command to find the linear regression model you visualized above. Store the
model in an object called tuition_model and print out the model’s values.
[18]: # Your code goes here
tuition_model <- lm(default_rate ~ net_tuition, data=dat)
tuition_model

Call:
lm(formula = default_rate ~ net_tuition, data = dat)

Coefficients:
(Intercept) net_tuition
8.0029 -0.2077

Check yourself: If you print out tuition_model, you should see two numbers: 8.0029 and -0.2077.
3.3 - Use the summary command to find the $R^2$ value of your linear model.
[19]: # Your code goes here
summary(tuition_model)

Call:
lm(formula = default_rate ~ net_tuition, data = dat)

Residuals:
Min 1Q Median 3Q Max
-6.4480 -1.9912 -0.5984 1.2492 25.4189

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.00294 0.21329 37.52 <2e-16 ***
net_tuition -0.20772 0.01331 -15.61 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.375 on 1051 degrees of freedom

Multiple R-squared: 0.1882, Adjusted R-squared: 0.1875
F-statistic: 243.7 on 1 and 1051 DF, p-value: < 2.2e-16

Check yourself: The $R^2$ value for tuition_model should be 0.1882.

3.4 - When evaluating different college options to predict if attending them would “pay off,” many
students look very closely at the tuition and costs of attending. Very few students look at colleges’
graduation rates. Is this reasonable or a mistake? Justify your answers using the $R^2$ values for the
grad_model and tuition_model.
Double-click this cell to type your answer here: This is a mistake. The $R^2$ value for
grad_model (0.5151) is much higher than the $R^2$ value for tuition_model (0.1882), so graduation
rate is a moderately strong predictor of default rate while net tuition is only a weak one.
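The comparison can also be laid out side by side. This sketch hard-codes the two R-squared values from the printouts (rather than refitting the models) and ranks them; the data frame and column names are made up for illustration.

```r
# R-squared values copied from the two summary printouts
model_strength <- data.frame(
  model     = c("grad_model", "tuition_model"),
  predictor = c("grad_rate", "net_tuition"),
  r_squared = c(0.5151, 0.1882)
)

# Strongest model first
model_strength <- model_strength[order(-model_strength$r_squared), ]

# How many times more variation grad_rate explains than net_tuition
ratio <- model_strength$r_squared[1] / model_strength$r_squared[2]

print(model_strength)
print(round(ratio, 2))
```

By this measure, grad_rate explains roughly 2.7 times as much of the variation in default rates as net_tuition does.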
3.5 - The correlation between tuition costs and student loan default rates is negative. This means
that as tuition costs get higher, fewer students tend to default on their student loans. Is that
possible? What might be going on here?
Double-click this cell to type your answer here: Yes. Higher-tuition schools may tend to
provide educations that lead to better-paying jobs and so to lower default rates, or students who
can afford higher-tuition schools may be at less risk of defaulting on their loans after graduating.
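The second kind of explanation can be illustrated with a simulation. In the sketch below, everything is invented: a hidden variable (family_resources) pushes tuition up and default rates down, while tuition itself has no direct effect on defaults. A simple regression of default rate on tuition still produces a negative slope, which is the pattern a lurking variable would create. This illustrates the idea only; it is not evidence about the real colleges.

```r
set.seed(42)  # make the simulated numbers reproducible
n <- 500

# Hypothetical lurking variable: students' family resources (arbitrary units)
family_resources <- rnorm(n)

# Wealthier student bodies attend pricier schools ...
net_tuition_sim <- 15 + 4 * family_resources + rnorm(n)

# ... and default less often. Tuition does not appear in this formula at all.
default_rate_sim <- 6 - 2 * family_resources + rnorm(n)

# Simple regression that ignores the lurking variable
sim_model <- lm(default_rate_sim ~ net_tuition_sim)
sim_slope <- unname(coef(sim_model)["net_tuition_sim"])

print(sim_slope)
```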

0.1.6 Feedback (Required)
Please take 2 minutes to fill out this anonymous notebook feedback form, so we can continue
improving this notebook for future years!
