Seminar

Week 4 - Regression I (Prediction)

In the seminar this week, we will cover the following topics:

1. The lm() function.
2. The predict() function.
3. Interpreting the output of simple linear regression models.
4. Making and exporting tables and plots.

Ethnic Minorities and Electoral Turnout

What determines the electoral turnout rates of voters from ethnic minority groups? Existing theory suggests
that one important driver of turnout for ethnic minority voters is whether elections feature candidates
from that ethnic group on the ballot paper. These candidate-centred approaches suggest that ethnic
minority candidates may be better at, and devote more resources towards, mobilising support from their
co-ethnic electorates than other candidates. There is some empirical evidence that suggests that when a
minority candidate is on the ballot, participation by minority voters increases.

An alternative theoretical perspective is that it is not the ethnicity of the candidate that matters, but
rather the ethnic composition of the electorate. According to this view, when ethnic groups are a very small
minority in a district, this implies a lack of descriptive representation which may produce a “disillusioned”
electorate with little incentive to participate. If this is the case, then as the size of an ethnic group within
a district increases, we should also expect increases in the rates of electoral participation for members of
that ethnic group.

In the paper (“Candidates or Districts? Reevaluating the Role of Race in Voter Turnout”), Bernard Fraga
evaluates both of these expectations using data from US congressional and primary elections. We will use
the data from this study to evaluate claims of this sort using regression analyses. Fraga analyses turnout
data for four different racial and ethnic groups, but we will focus on the data for black voters.

You can download the data from the "Seminar materials and activities" section on Moodle or here.
Put the .csv file into your POLS0083/data folder, and then open an R script to use this week. Don’t forget
to set the working directory using the function setwd() as we have done for previous weeks. Then load
blackturnout.csv using the read.csv() function.

A description of the variables is listed below:

Name       Description
year       Year the election was held
state      State in which the election was held
district   District in which the election was held (unique within state but not
           across states)
turnout    The proportion of the black voting-age population in a district that
           votes in the general election
CVAP       The proportion of a district’s voting-age population that is black
candidate  Binary variable coded “1” when the election includes a black candidate;
           “0” when the election does not include a black candidate

It will be a little easier to interpret the regression output if we convert the two proportion variables into
percentages. Do this now using the following lines of code:

blackturnout$turnout <- blackturnout$turnout * 100

blackturnout$CVAP <- blackturnout$CVAP * 100
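
As a rough sketch of the loading and rescaling steps, the code below runs the same pattern end to end on a tiny made-up file. The data frame `toy`, its values, and the file name `blackturnout_toy.csv` are assumptions for illustration only and are not the seminar data; `tempdir()` is used so that the sketch runs without the POLS0083 folder.

```r
# Toy stand-in for blackturnout.csv (made-up values, NOT the real data).
toy <- data.frame(
  year      = c(2006, 2008, 2010),
  state     = c("AL", "AL", "GA"),
  district  = c(1, 2, 1),
  turnout   = c(0.35, 0.52, 0.41),
  CVAP      = c(0.25, 0.10, 0.30),
  candidate = c(0, 1, 0)
)
csv_path <- file.path(tempdir(), "blackturnout_toy.csv")
write.csv(toy, csv_path, row.names = FALSE)

# Same pattern as in the seminar: read the file, then rescale the two
# proportion variables to percentages.
blackturnout_toy <- read.csv(csv_path)
blackturnout_toy$turnout <- blackturnout_toy$turnout * 100
blackturnout_toy$CVAP    <- blackturnout_toy$CVAP * 100

# Quick sanity checks: both variables should now lie between 0 and 100.
str(blackturnout_toy)
range(blackturnout_toy$turnout)
range(blackturnout_toy$CVAP)
```

Running the rescaling lines twice would multiply by 100 twice, so checking the range after the conversion is a cheap way to catch that mistake.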

Question 1

For this question, try using two new functions that we have not introduced in previous weeks.

• unique() returns the unique values of any particular vector

• length() returns the length of a vector, i.e. the number of observations in the vector

1. Using the unique() function, find out what the years included in our data
are.

2. Using the length() function, find out how many states are included in our
data.
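
To see how the two functions behave before applying them to the seminar data, here is a minimal sketch on a made-up vector. The vector `regions` and its values are hypothetical and have nothing to do with blackturnout.

```r
# Hypothetical vector with repeated values (not taken from the seminar data).
regions <- c("North", "South", "North", "East", "South", "North")

unique(regions)          # the distinct values, in order of first appearance
length(regions)          # number of elements, duplicates included
length(unique(regions))  # number of distinct values
```

Note that length() on its own counts every element, so the two functions are often combined when the question is "how many different values are there?".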

Question 2

In the following questions, we will be estimating several linear regression models, but let’s first take a
look at the relationship between the proportion of a district’s voting-age population that is black and
turnout.

1. Create a scatter plot which has CVAP on the x-axis and turnout on the y-axis.
Is the relationship between these variables positive or negative? If you are
struggling with how to make a scatterplot, revisit the materials from last
week. Make sure to include a title (the main argument) and label the x- and
y-axes (the xlab and ylab arguments).

2. Estimate a linear regression model where the dependent variable is the
percentage of the black voting-age population who voted in the election, and the
independent variable is the percentage of a district’s voting-age population
that is black. (Make sure that you get these the right way around!)
Interpret the resulting $\hat{\alpha}$ and $\hat{\beta}$ coefficients.

The lm() function

Linear regression is implemented in R using the lm() function. The assumption we are making when we use a
linear regression model is that there is a linear relationship between the dependent and independent variable.
The lm() function needs to know a) the relationship we’re trying to model and b) the dataset that contains
our observations. The two arguments we need to provide to the lm() function are described below.

Argument  Description
formula   The formula describes the relationship between the dependent and
          independent variables, for example
          dependent.variable ~ independent.variable
data      This is simply the name of the dataset that contains the variables of
          interest. In our case, this is the merged dataset called blackturnout.

For more information on how the lm() function works, type help(lm) in R.
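
The two arguments can be sketched on a small made-up dataset. The data frame `toy`, its columns `x` and `y`, and the object name `toy_ols` are assumptions for illustration; for the seminar model you would substitute the blackturnout variables.

```r
# Made-up data for illustration only (not the seminar data).
toy <- data.frame(x = c(1, 2, 3, 4, 5),
                  y = c(5.1, 7.9, 11.2, 13.8, 17.0))

# formula: dependent variable ~ independent variable
# data:    the data frame that holds both variables
toy_ols <- lm(formula = y ~ x, data = toy)

# Extract the estimated intercept (alpha-hat) and slope (beta-hat).
coef(toy_ols)
```

The variable on the left of the ~ is always the outcome, so swapping the two sides fits a different model.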

3. Create a nice looking regression table.

The texreg package

Note that to display the estimates from a regression model, we will usually want to create a nice looking
table. You may choose to use the texreg package to neatly summarise your regression model. First, install
the texreg package by typing install.packages("texreg") into your console.
Note: you only need to install a package on a computer once. Do not run this code every time
you run your R script! We therefore usually install packages in the console and not in the script.

install.packages("texreg")

Second, after the package has finished installing, type library(texreg) into your script.

library(texreg)

If you are using R Markdown, or if you want to have a look at the model in the console, you can use the
screenreg() function on the object where you saved the results from your regression model. For example,
if you named your regression model turnout_cvap_ols, summarise your model by typing the following
into R:

screenreg(turnout_cvap_ols)

If you are not using R Markdown to complete the exercise, and you want to include a table with the model
output in a Word document, you can do the following. Similar to exporting plots, you can export a table
directly to a folder on your computer. You should have a folder called POLS0083, which currently contains
your scripts and data. Create another subfolder and call it tables. You should then be able to save the
regression output from above as a table by using the following code:

htmlreg(turnout_cvap_ols, file = "tables/turnout_cvap_ols.doc")

You can then copy-paste the table from this file into your Word document.
You can further customise the appearance of the table either once you have exported it into Word, or once
you have ‘knitted’ your R Markdown document, or with some options inside the commands. For instance, to
change the variable and column labels:

screenreg(turnout_cvap_ols,
          custom.model.names = "Turnout",
          custom.coef.names = c("Intercept", "% Black"))

htmlreg(turnout_cvap_ols,
        custom.model.names = "Turnout",
        custom.coef.names = c("Intercept", "% Black"),
        file = "tables/turnout_cvap_ols.doc")

4. Use the abline() function to add the estimated regression line to the scatter
plot you created earlier. Read the help file for this function to see some
different ways to achieve this.

5. Use the summary() function on your estimated linear regression model
object. Locate and interpret the $R^2$ for this model.
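
As a hedged sketch of the last two steps, the code below adds a fitted line to a scatter plot and pulls $R^2$ out of the summary object, using made-up data. The names `toy`, `toy_ols`, `toy_plot_path` and `toy_r2` are assumptions for illustration, and passing the model object to abline() is only one of the approaches described in its help file.

```r
# Made-up data for illustration only (not the seminar data).
toy <- data.frame(x = c(1, 2, 3, 4, 5),
                  y = c(5.1, 7.9, 11.2, 13.8, 17.0))
toy_ols <- lm(y ~ x, data = toy)

# Draw to a temporary file so the sketch runs anywhere. In RStudio you can
# drop the pdf() and dev.off() lines and the plot appears in the plot window.
toy_plot_path <- file.path(tempdir(), "toy_scatter.pdf")
pdf(toy_plot_path)
plot(toy$x, toy$y, main = "Toy scatter plot", xlab = "x", ylab = "y")
abline(toy_ols, col = "red")  # abline() accepts a fitted lm object directly
dev.off()

# R-squared is stored in the object returned by summary().
toy_r2 <- summary(toy_ols)$r.squared
toy_r2
```

abline() only adds to an existing plot, so it has to be called after plot() and before the device is closed.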

Question 3

Once we have estimated a regression model, we can use that model to produce fitted or predicted values.
Fitted values represent our best guess for the value of our dependent variable for a specific value of our
independent variable.
The fitted value formula is:

𝑌𝑖̂ = 𝛼̂ + 𝛽 ̂ ∗ 𝑋𝑖

Let’s say that, on the basis of turnout_cvap_ols, we would like to know what percentage of the black
population is likely to turn out in an election when the percentage of the district’s voting-age population
that is black is equal to 5%. We can substitute in the relevant coefficients from turnout_cvap_ols and
the value for our X variable (5 in this case), and we get:

𝑌𝑖̂ = 37.59 + 0.196 ∗ 5 = 38.57

The predict() function

Rather than calculating these values manually, we can also produce fitted values in R by using the predict()
function. The predict() function takes two main arguments.

Argument  Description
object    The object is the model object that we would like to use to produce fitted
          values. Here, we would like to base the analysis on turnout_cvap_ols and so
          we specify object = turnout_cvap_ols.
newdata   This is an optional argument which we use to specify the values of our
          independent variable(s) that we would like fitted values for. If we leave this
          argument empty, R will automatically calculate fitted values for all of the
          observations in the data that we used to estimate the original model. If we
          include this argument, we need to provide a data.frame which has a variable
          with the same name as the independent variable in our model. Here, we specify
          newdata = data.frame(CVAP = 5), as we would like the fitted value for a
          district where 5% of the population is black.

predict(object = turnout_cvap_ols, newdata = data.frame(CVAP = 5))

This is the same as the result we obtained when we calculated the fitted value manually. The good thing
about the predict() function, however, is that we will be able to use it for more complicated models that
we will study later in this course, and it can be useful for calculating many different fitted values.
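
The equivalence between the manual calculation and predict() can be checked on a small made-up model. The data frame `toy`, the model `toy_ols`, and the value 2.5 are assumptions chosen for illustration; they are not the seminar data or results.

```r
# Made-up data for illustration only (not the seminar data).
toy <- data.frame(x = c(1, 2, 3, 4, 5),
                  y = c(5.1, 7.9, 11.2, 13.8, 17.0))
toy_ols <- lm(y ~ x, data = toy)

# Manual fitted value at x = 2.5: alpha-hat + beta-hat * 2.5
manual_fit <- coef(toy_ols)[["(Intercept)"]] + coef(toy_ols)[["x"]] * 2.5

# The same value from predict(). The column name in newdata must match the
# name of the independent variable in the model formula ("x" here).
predict_fit <- predict(object = toy_ols, newdata = data.frame(x = 2.5))

# Compare the two results (unname() drops the row label predict() attaches).
all.equal(manual_fit, unname(predict_fit))
```

If the column in newdata were given a different name, predict() would not find the variable it needs, which is why the name has to match the one used when the model was estimated.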

Calculate the predicted level of black turnout for two cases: where the
percentage of black people in the population is equal to the 25th percentile and
75th percentile values for the distribution of that variable in the data. Use the
quantile() function to work out the values for the 25th percentile and 75th
percentile values. (That is, work out the fitted values at the two ends of the
interquartile range of $X$.) You will first need to work out these two
percentile values, and then add them to the predict() function.
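
One possible way to combine quantile() and predict() is sketched below on made-up data. The names `toy`, `toy_ols`, `q_x` and `toy_preds` are assumptions for illustration, and the numbers bear no relation to the seminar results.

```r
# Made-up data for illustration only (not the seminar data).
toy <- data.frame(x = c(1, 2, 3, 4, 5),
                  y = c(5.1, 7.9, 11.2, 13.8, 17.0))
toy_ols <- lm(y ~ x, data = toy)

# 25th and 75th percentiles of the independent variable.
q_x <- quantile(toy$x, probs = c(0.25, 0.75))
q_x

# predict() accepts several values at once: one row of newdata per value.
toy_preds <- predict(toy_ols, newdata = data.frame(x = q_x))
toy_preds
```

Because newdata can hold any number of rows, the same call works for two values or for a whole sequence of them.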

Question 4

For this question, we will continue to practise visualising our data.

1. Create a boxplot that compares turnout in elections with and without a
co-ethnic candidate. Be sure to use informative labels. Interpret the
resulting graph. If you are struggling, look back at the code that we used
last week to create boxplots.
Hint: the names argument of the boxplot function allows you to provide names
for the groups which will appear under the relevant boxplot.

2. Save the above plot.

Saving plots

At various points throughout this course it will be helpful if you can save the plots you have created as separate
files so that they can be imported into documents that you are working on. In general, we recommend saving
your plots as .pdf files.
The code below shows how to export a pdf of a plot directly to a folder on your computer. You should have
a folder called POLS0083, which currently contains your scripts and data, and a folder called tables. Create
another subfolder (within POLS0083) and call it plots. You should then be able to save the plot you created
above by using the following code:
(Note: if you are producing your work in R Markdown it won’t be necessary to save the .pdf as you can just
produce the plot directly in the R Markdown file as you have been doing.)

# Open a pdf file to draw into (8 by 8 inches)
pdf("plots/coethnic_turnout_boxplot.pdf", width = 8, height = 8)
boxplot(turnout ~ candidate, data = blackturnout,
        names = c("Non-Coethnic Candidate", "Coethnic Candidate"),
        ylab = "Voter turnout (black voters)",
        xlab = "One or more co-ethnic candidates")
# Close the file so that the plot is written to disk
dev.off()

You will notice that nothing appears in the plotting window when you run this code, but if you have set up
your folders correctly then a new plot called coethnic_turnout_boxplot.pdf should have appeared in the
plots folder that you have just created.
Note that the width = 8, height = 8 part of the code above tells R the dimensions of the image you would
like to create (here you have specified that you would like a square plot which is 8 by 8 inches). You can try
adjusting these numbers to see how they affect the shape of the image you are producing.
Another way to save plots as .pdf is to choose “Export” and then “Save as pdf” right above the plot window.
In this case, you don’t need the pdf() and dev.off() functions above.
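
If the plots folder does not exist yet, pdf() cannot write the file. A minimal sketch of creating the folder from R before saving is shown below. Here `base_dir` stands in for your POLS0083 folder and points to `tempdir()` only so that the sketch runs on any machine; the toy data and the file name `toy_boxplot.pdf` are likewise assumptions for illustration.

```r
# Assumption: base_dir stands in for your POLS0083 folder.
base_dir  <- tempdir()
plots_dir <- file.path(base_dir, "plots")

# Create the folder if it is missing (no warning if it already exists).
dir.create(plots_dir, showWarnings = FALSE)

# Made-up data for illustration only (not the seminar data).
toy <- data.frame(y     = c(1, 2, 3, 4, 5, 6),
                  group = c(0, 0, 0, 1, 1, 1))

# Open the pdf device, draw the plot, then close the device to write the file.
plot_path <- file.path(plots_dir, "toy_boxplot.pdf")
pdf(plot_path, width = 8, height = 8)
boxplot(y ~ group, data = toy, names = c("Group 0", "Group 1"))
dev.off()

# Confirm that the file was written.
file.exists(plot_path)
```

Forgetting dev.off() is a common cause of empty or unreadable pdf files, since the file is only completed when the device is closed.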
