LinearRegressionLab

Lab about how to run a linear regression algorithm

Uploaded by

2021katelynsmith


Linear Regression Lab Resubmission

Katelyn Smith

2023-11-01

Purpose
To demonstrate prediction with linear regression, assess the fit of a linear model, and interpret the
output of a linear model.

The Data
This data is from the 1840 census. It provides counts of various demographic groups at the county level.
The data set was downloaded from the Integrated Public Use Microdata Series (IPUMS) at the University of
Minnesota. The documentation and data download can be found here
library(tidyverse) # loads readr, tidyr, dplyr, and ggplot2, which this lab uses
census1840 <- read_csv("C_1840.csv") %>%
  drop_na() # delete all rows with any missing values

## Rows: 1276 Columns: 127

## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): rectype
## dbl (126): year, region, statefip, stateicp, county, cntypopf, cntypops, qco...
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
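
The drop_na() call above removes every row that has a missing value in any column. As a minimal sketch of that row filter, the base R example below uses complete.cases() on a small made-up data frame. The column names mimic the census file, but the values are invented for illustration.

```r
# Hypothetical toy data: column names mimic the census file, values are invented.
toy <- data.frame(
  county   = c(1, 2, 3, 4),
  ntotal_c = c(1000, NA, 500, 800),
  nlit_c   = c(50, 10, NA, 0)
)

# complete.cases() is TRUE only for rows with no NA in any column,
# which is the same row filter drop_na() applies when called with no arguments.
keep <- complete.cases(toy)
toy_complete <- toy[keep, ]

n_dropped <- nrow(toy) - nrow(toy_complete)
print(toy_complete)
print(n_dropped)
```

In this lab the filter appears to remove nothing: read_csv() reports 1276 rows, and the model fitted later has 1274 residual degrees of freedom, which is 1276 minus the two estimated coefficients.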

Exploratory Analysis
# selecting only the numeric variables
# since those are the only variables
# eligible for a linear regression
census1840_num <- select_if(census1840, is.numeric)

# creating a literacy rate variable to
# be predicted: 1 minus the number of
# people who are illiterate divided by
# the total number of people
census1840_lit <- census1840_num %>%
  mutate(lit_rate = 1 - (nlit_c/ntotal_c))

summary(census1840_lit$lit_rate)

##   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## 0.6790  0.9267  0.9678  0.9524  0.9919  1.0000
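
As a quick illustration of the literacy rate calculation, here is a small base R sketch on three made-up counties. The counts are hypothetical and are not taken from the census file; only the column names nlit_c and ntotal_c follow the lab.

```r
# Hypothetical counts for three made-up counties (not from the census file).
toy <- data.frame(
  nlit_c   = c(50, 0, 321),      # people recorded as illiterate
  ntotal_c = c(1000, 200, 1000)  # total population
)

# Literacy rate = 1 - (illiterate / total), the same formula as in the lab.
toy$lit_rate <- 1 - toy$nlit_c / toy$ntotal_c
print(toy$lit_rate)
```

A county with no recorded illiterate residents gets a rate of exactly 1, which is consistent with the maximum of 1.0000 in the summary above.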

# creating proportions for the groups
# of people to better standardize for
# population differences
census1840_prop4 <- census1840_lit %>%
  mutate(female_prop = (nfemale_c/ntotal_c),
         male_prop = (nmale_c/ntotal_c),
         free_prop = (numperhh_c/ntotal_c),
         slave_prop = (nslave_c/ntotal_c))

# visualization
ggplot(census1840_prop4, aes(x = lit_rate, y = female_prop)) +
  geom_point() +
  ggtitle("Proportion of Females vs Literacy Rate at a County Level in 1840")

[Figure: scatter plot "Proportion of Females vs Literacy Rate at a County Level in 1840"; x-axis: lit_rate, y-axis: female_prop]
ggplot(census1840_prop4, aes(x = lit_rate, y = slave_prop)) +
  geom_point() +
  ggtitle("Proportion of Slaves vs Literacy Rate at a County Level in 1840")

[Figure: scatter plot "Proportion of Slaves vs Literacy Rate at a County Level in 1840"; x-axis: lit_rate, y-axis: slave_prop]
I chose to compare the literacy rate with the proportions of minority groups because historically those groups
have had less access to education. Surprisingly, there seems to be a positive relationship between the literacy
rate and the proportion of slaves in a county. I have chosen the proportion of slaves as my predictor variable to
further explore this surprising result.

Linear Model
# Running the linear model
# with lit_rate as the predicted
# variable and slave_prop as the
# predictor variable
Lit_model <- lm(lit_rate ~ slave_prop,
  data = census1840_prop4)

# summarizing and printing the summary
# information for the linear model
(Lit_model_Summary <- summary(Lit_model))

##
## Call:
## lm(formula = lit_rate ~ slave_prop, data = census1840_prop4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.26772 -0.02617 0.01301 0.03859 0.05329
##

## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.946710 0.001777 532.777 < 2e-16 ***
## slave_prop 0.031826 0.006337 5.023 5.82e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04864 on 1274 degrees of freedom
## Multiple R-squared: 0.01942, Adjusted R-squared: 0.01865
## F-statistic: 25.23 on 1 and 1274 DF, p-value: 5.821e-07
(f_val <- Lit_model_Summary$fstatistic[1])

## value
## 25.22695
(f_df1 <- Lit_model_Summary$fstatistic[2])

## numdf
## 1
(f_df2 <- Lit_model_Summary$fstatistic[3])

## dendf
## 1274
(f_pval <- pf(f_val, f_df1, f_df2, lower.tail = FALSE))

## value
## 5.820978e-07
(rsqr <- Lit_model_Summary$r.squared)

## [1] 0.01941689
(RSE <- sigma(Lit_model))

## [1] 0.04864105
From the summary of the linear model of the literacy rate and proportion of slaves, the equation of the line
is: ŷ = 0.947 + 0.032x, where x is the proportion of slaves in a county and ŷ is the predicted literacy rate.
The slope indicates that as the proportion of slaves in a county increases by 1 (that is, from 0% to 100% of
the population), the literacy rate is predicted to increase by 0.0318, or 3.18 percentage points. The intercept
indicates that a county with no slaves has a predicted literacy rate of 0.947, or 94.7%.
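
To make the fitted equation concrete, the sketch below plugs a few values of slave_prop into the line, using the coefficients copied from the summary output. The helper name predict_lit_rate and the chosen x values are illustrative assumptions, not part of the lab.

```r
# Coefficients copied from the summary output above.
b0 <- 0.946710  # intercept: predicted literacy rate when slave_prop is 0
b1 <- 0.031826  # slope for slave_prop

# Hypothetical helper that evaluates the fitted line.
predict_lit_rate <- function(slave_prop) b0 + b1 * slave_prop

x <- c(0, 0.25, 0.5, 1)  # illustrative proportions
print(data.frame(slave_prop = x, predicted_lit_rate = predict_lit_rate(x)))
```

On this line, a 10 percentage point increase in the proportion of slaves changes the predicted literacy rate by only about 0.3 percentage points (0.1 × 0.031826).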
The slope is small: although the literacy rate and the proportion of slaves have a positive and statistically
significant relationship, it is not a strong one. The adjusted R² is only 0.019, which means that about
1.9% of the variation in literacy rate can be explained by the proportion of slaves in a county. This confirms
that the proportion of slaves in a county is not a good predictor of literacy rate.
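
As a consistency check on these figures, the sketch below recomputes the adjusted R² from the multiple R² using the standard adjustment formula, and also derives the size of the correlation that the R² implies. The inputs are typed in from the model output; n is taken as the 1274 residual degrees of freedom plus the two estimated coefficients.

```r
r2 <- 0.01941689  # Multiple R-squared from the model output
n  <- 1274 + 2    # residual degrees of freedom + 2 estimated coefficients
p  <- 1           # number of predictors

# Standard adjusted R-squared formula.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# In simple regression, |correlation| = sqrt(R-squared); the sign follows the slope.
r <- sqrt(r2)

print(round(adj_r2, 5))
print(round(r, 3))
```

The recomputed adjusted R² agrees with the 0.01865 printed by summary(), and the implied correlation of roughly 0.14 is another way to see how weak the linear relationship is.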
The residual standard error is 0.0486 (about 4.9 percentage points), which is fairly low, and the p-value is
very small. Despite these two promising statistics, the low R² shows that this model is not a very good fit
for predicting literacy rate.
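
The test statistics in the output are closely linked, which the short sketch below illustrates with the printed values. With a single predictor, the F-statistic equals the square of the slope's t value, so the two p-values test the same thing. Small differences arise because the inputs are rounded as printed.

```r
est <- 0.031826  # slope estimate from the coefficient table
se  <- 0.006337  # its standard error

t_val    <- est / se  # t value for the slope
f_from_t <- t_val^2   # in simple regression, F equals t squared

# Same pf() call as in the lab, with the reported F value and degrees of freedom typed in.
p_val <- pf(25.22695, 1, 1274, lower.tail = FALSE)

print(c(t = t_val, F = f_from_t, p = p_val))
```

A p-value this small says that, with 1276 counties, the slope can be distinguished from zero. It does not say that the slope is large or that the model predicts well.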

Further model assessment

library(broom) # provides augment()
predictions <- augment(Lit_model)
# Plot data (red), fitted line and fitted values (blue), and residuals (grey segments)
predictions %>%
  ggplot(aes(x = slave_prop)) +
  geom_line(aes(y = .fitted), color = "blue") +
  geom_point(aes(y = .fitted), color = "blue") +
  geom_point(aes(y = lit_rate), color = "red") +
  geom_segment(aes(x = slave_prop, y = .fitted, xend = slave_prop, yend = lit_rate),
    alpha = 0.2)

[Figure: fitted line and fitted values (blue), observed literacy rates (red), and residual segments; x-axis: slave_prop, y-axis: .fitted]
This graph shows visually what the model statistics reported. Many points lie far from the line of best fit,
confirming that the proportion of slaves in a county does not predict the literacy rate very well.
# Look at the residuals with a density
# plot
ggplot(predictions, aes(x = .resid)) + geom_density()

[Figure: density plot of the residuals; x-axis: .resid, y-axis: density]
# Look at the residuals with a scatter
# plot
predictions %>%
ggplot(aes(x = slave_prop)) + geom_point(aes(y = .resid))

[Figure: scatter plot of the residuals; x-axis: slave_prop, y-axis: .resid]
The density plot of the residuals is strongly left-skewed, and the residual scatter plot shows the same
asymmetry, confirming that the proportion of slaves in a county is not a very strong predictor of literacy rate.
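
The left skew can also be read from the residual quantiles printed in the model summary. The sketch below compares the length of the two tails and computes the quartile-based (Bowley) skewness. This measure is one possible choice and was not used in the lab; the variable names are illustrative.

```r
# Residual quantiles copied from the model summary output.
q <- c(min = -0.26772, q1 = -0.02617, median = 0.01301,
       q3 = 0.03859, max = 0.05329)

# Distance from the median to each extreme.
lower_tail <- q[["median"]] - q[["min"]]
upper_tail <- q[["max"]] - q[["median"]]

# Bowley (quartile) skewness: negative values indicate a left skew.
bowley <- (q[["q3"]] + q[["q1"]] - 2 * q[["median"]]) / (q[["q3"]] - q[["q1"]])

print(c(lower_tail = lower_tail, upper_tail = upper_tail, bowley = bowley))
```

The lower tail is several times longer than the upper tail and the quartile skewness is negative, so the model overpredicts some counties by a wide margin while never underpredicting by much.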
Overall, despite the promising positive relationship in the scatter plot of proportion of slaves vs literacy rate
at a county level in 1840, there is not a strong predictive relationship.
