Seminar 4

Uploaded by

Zhou Pat
Copyright
© All Rights Reserved

Seminar

Week 4 - Regression I (Prediction)

In the seminar this week, we will cover the following topics:

1. The lm() function.


2. The predict() function.
3. Interpreting the output of simple linear regression models.
4. Making and exporting tables and plots.

Ethnic Minorities and Electoral Turnout

What determines the electoral turnout rates of voters from ethnic minority groups? Existing theory suggests
that one important driver of turnout for ethnic minority voters is when elections feature candidates
from that ethnic group on the ballot paper. These candidate-centred approaches suggest that ethnic
minority candidates may be better at, and devote more resources towards, mobilising support from their
co-ethnic electorates than other candidates. There is some empirical evidence that suggests that when a
minority candidate is on the ballot, participation by minority voters increases.
An alternative theoretical perspective is that it is not the ethnicity of the candidate that matters, but
rather the ethnic composition of the electorate. According to this view, when ethnic groups are a very small
minority in a district, this implies a lack of descriptive representation which may produce a “disillusioned”
electorate with little incentive to participate. If this is the case, then as the size of an ethnic group within
a district increases, we should also expect increases in the rates of electoral participation for members of
that ethnic group.
In the paper (“Candidates or Districts? Reevaluating the Role of Race in Voter Turnout”), Bernard Fraga
evaluates both of these expectations using data from US congressional and primary elections. We will use
the data from this study to evaluate claims of this sort using regression analyses. Fraga analyses turnout
data for four different racial and ethnic groups, but we will focus on the data for black voters.
You can download the data from the "Seminar materials and activities" section on Moodle or here.
Put the .csv file into your POLS0083/data folder, and then open a new R script to use this week. Don’t forget
to set the working directory using the setwd() function, as we have done in previous weeks. Then load
blackturnout.csv using the read.csv() function.
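
As a minimal sketch of the loading step, the snippet below writes a tiny made-up CSV file to a temporary folder and reads it back with read.csv(). The file name, its contents and the object name toy are hypothetical; in the seminar you would instead call setwd() on your own POLS0083 folder and read the real blackturnout.csv file.

```r
# Hypothetical stand-in for the seminar data: a three-row CSV in a temporary folder
toy_path <- file.path(tempdir(), "toy_turnout.csv")
writeLines(c("year,state,turnout,CVAP",
             "2006,AK,0.25,0.03",
             "2008,AL,0.40,0.20",
             "2010,AL,0.30,0.22"),
           toy_path)

# read.csv() returns a data.frame with one column per variable
toy <- read.csv(toy_path)

str(toy)   # check the variable names and types
head(toy)  # look at the first few rows
```

Running str() and head() straight after loading is a quick way to confirm that the file was read in as expected.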
A description of the variables is listed below:

Name        Description
year        Year the election was held
state       State in which the election was held
district    District in which the election was held (unique within state but not across states)
turnout     The proportion of the black voting-age population in a district that votes in the general election
CVAP        The proportion of a district’s voting-age population that is black
candidate   Binary variable coded “1” when the election includes a black candidate; “0” when the election does not include a black candidate

It will be a little easier to interpret the regression output if we convert the two proportion variables into
percentages. Do this now using the following lines of code:

blackturnout$turnout <- blackturnout$turnout * 100


blackturnout$CVAP <- blackturnout$CVAP * 100

Question 1

For this question, try using two new functions that we have not introduced in previous weeks.

• unique() returns the unique values of any particular vector


• length() returns the length of a vector, i.e. the number of observations in the vector
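
To see how these two functions behave, here is a small sketch on a made-up vector (the vector states below is hypothetical and is not taken from the seminar data):

```r
# Made-up vector of state abbreviations with repeated values
states <- c("AL", "AL", "GA", "NY", "GA", "AL")

unique(states)          # each distinct value once, in order of first appearance
length(states)          # number of elements in the full vector
length(unique(states))  # number of distinct values
```

Note that the two functions can be nested: the output of unique() is itself a vector, so it can be passed to length().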

1. Using the unique() function, find out what the years included in our data
are.

2. Using the length() function, find out how many states are included in our
data.

Question 2

In the following questions, we will be estimating several linear regression models, but let’s first take a
look at the relationship between the proportion of a district’s voting-age population that is black and
turnout.

1. Create a scatter plot which has CVAP on the x-axis and turnout on the y-axis.
Is the relationship between these variables positive or negative? If you are
struggling with how to make a scatter plot, revisit the materials from last
week. Make sure to include a title (the main argument) and label the x- and
y-axes (the xlab and ylab arguments).

2. Estimate a linear regression model where the dependent variable is the
percentage of black people who voted in the election, and the
independent variable is the percentage of a district’s voting-age population
that is black. (Make sure that you get these the right way around!)
Interpret the resulting $\hat{\alpha}$ and $\hat{\beta}$ coefficients.

The lm() function

Linear regression is implemented in R using the lm() function. The assumption we are making when we use a
linear regression model is that there is a linear relationship between the dependent and independent variable.
The lm() function needs to know a) the relationship we’re trying to model and b) the dataset that contains
our observations. The two arguments we need to provide to the lm() function are described below.

Argument    Description
formula     The formula describes the relationship between the dependent and independent variables, for example dependent.variable ~ independent.variable
data        This is simply the name of the dataset that contains the variables of interest. In our case, this is the dataset called blackturnout.

For more information on how the lm() function works, type help(lm) in R.
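
As an illustration of the two arguments, the sketch below fits a model to a small made-up dataset. The names toy, x, y and toy_ols are hypothetical; the data are constructed so that y is exactly 2 + 0.5 * x, which makes the estimated coefficients easy to check.

```r
# Hypothetical toy data: y is exactly 2 + 0.5 * x
toy <- data.frame(x = c(0, 2, 4, 6, 8))
toy$y <- 2 + 0.5 * toy$x

# formula: dependent variable on the left of ~, independent variable on the right
# data: the data.frame that contains both variables
toy_ols <- lm(y ~ x, data = toy)

# Extract the estimated intercept and slope
coef(toy_ols)
```

The first coefficient, labelled (Intercept), is the estimated intercept and the second, labelled x, is the estimated slope.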

3. Create a nice looking regression table.

The texreg package

Note that to display the estimates from a regression model, we will usually want to create a nice looking
table. You may choose to use the texreg package to neatly summarise your regression model. First, install
the texreg package by typing install.packages("texreg") into your console.
Note: you only need to install a package on a computer once. Do not run this code every time
you run your R script! We therefore usually install packages in the console and not in the script.

install.packages("texreg")

Second, after the package has finished installing, type in library(texreg) into your script.

library(texreg)

If you are using RMarkdown, or if you want to have a look at the model in the console, you can use the
screenreg() function on the object where you saved the results from your regression model. For example,
if you named your regression model turnout_cvap_ols, summarise your model by typing the following
into R:

screenreg(turnout_cvap_ols)

If you are not using RMarkdown to complete the exercise, and you want to include a table with the model
output in a Word document, you can do the following. Similar to exporting plots, you can export a table
directly to a folder on your computer. You should have a folder called POLS0083, which currently contains
your scripts and data. Create another subfolder and call it tables. You should then be able to save the
regression output from above as a table by using the following code:

htmlreg(turnout_cvap_ols, file = "tables/turnout_cvap_ols.doc")

You can then copy and paste the table from this file into your Word document.
You can further customise the appearance of the table either once you have exported it into Word, or once
you have ‘knitted’ your RMarkdown document, or with some options inside the commands. For instance, to
change the variable and column labels:

# Print the table in the console with custom labels
screenreg(turnout_cvap_ols,
          custom.model.names = "Turnout",
          custom.coef.names = c("Intercept", "% Black"))

# Export the same table to a file that Word can open
htmlreg(turnout_cvap_ols,
        custom.model.names = "Turnout",
        custom.coef.names = c("Intercept", "% Black"),
        file = "tables/turnout_cvap_ols.doc")

4. Use the abline() function to add the estimated regression line to the scatter
plot you created earlier. Read the help file for this function to see some
different ways to achieve this.
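
As a hedged sketch of two of the ways abline() can be called, the example below uses simulated data rather than the seminar data (toy, x, y and toy_ols are hypothetical names):

```r
# Hypothetical toy data: a straight line plus random noise
set.seed(1)
toy <- data.frame(x = 1:20)
toy$y <- 3 + 0.8 * toy$x + rnorm(20)

toy_ols <- lm(y ~ x, data = toy)

# The scatter plot must exist before a line can be added to it
plot(toy$x, toy$y, main = "Toy scatter plot", xlab = "x", ylab = "y")

# Option 1: pass the model object; abline() reads the intercept and slope from it
abline(toy_ols, col = "red", lwd = 2)

# Option 2: pass the intercept (a) and slope (b) explicitly
abline(a = coef(toy_ols)[1], b = coef(toy_ols)[2], lty = 2)
```

Both calls draw the same line here; the second form is useful when you only have the coefficient values and not the model object.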

5. Use the summary() function on your estimated linear regression model
object. Locate and interpret the $R^2$ for this model.
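
To show where this quantity lives in the summary output, here is a minimal sketch on simulated data (all object names are hypothetical). It also recomputes the same number by hand as one minus the ratio of the residual sum of squares to the total sum of squares.

```r
# Hypothetical toy data: a straight line plus fairly large random noise
set.seed(2)
toy <- data.frame(x = 1:30)
toy$y <- 1 + 0.5 * toy$x + rnorm(30, sd = 3)

toy_ols <- lm(y ~ x, data = toy)
toy_summary <- summary(toy_ols)

# Shown as "Multiple R-squared" in the printed summary
toy_summary$r.squared

# The same quantity by hand: 1 - RSS / TSS
rss <- sum(residuals(toy_ols)^2)          # residual sum of squares
tss <- sum((toy$y - mean(toy$y))^2)       # total sum of squares
r_squared_by_hand <- 1 - rss / tss
r_squared_by_hand
```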

Question 3

Once we have estimated a regression model, we can use that model to produce fitted or predicted values.
Fitted values represent our best guess for the value of our dependent variable for a specific value of our
independent variable.
The fitted value formula is:

$\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$

Let’s say that, on the basis of turnout_cvap_ols, we would like to know what percentage of the black
population is likely to turn out in an election when the percentage of the district’s voting-age population
that is black is equal to 5%. We can substitute in the relevant coefficients from turnout_cvap_ols and
the value for our X variable (5 in this case), and we get:

$\hat{Y}_i = 37.59 + 0.196 \times 5 = 38.57$
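
The same arithmetic can be checked in R. The sketch below simply types in the two coefficient values reported above; the object names alpha_hat, beta_hat, x_value and fitted_value are hypothetical.

```r
alpha_hat <- 37.59  # intercept reported in the text
beta_hat  <- 0.196  # slope reported in the text
x_value   <- 5      # percentage of the voting-age population that is black

# Fitted value: intercept + slope * X
fitted_value <- alpha_hat + beta_hat * x_value
fitted_value
```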

The predict() function

Rather than calculating these values manually, we can also produce fitted values in R by using the predict()
function. The predict function takes two main arguments.

Argument    Description
object      The object is the model object that we would like to use to produce fitted values. Here, we would like to base the analysis on turnout_cvap_ols and so we specify object = turnout_cvap_ols.
newdata     This is an optional argument which we use to specify the values of our independent variable(s) that we would like fitted values for. If we leave this argument empty, R will automatically calculate fitted values for all of the observations in the data that we used to estimate the original model. If we include this argument, we need to provide a data.frame which has a variable with the same name as the independent variable in our model. Here, we specify newdata = data.frame(CVAP = 5), as we would like the fitted value for a district where 5% of the population is black.

predict(object = turnout_cvap_ols, newdata = data.frame(CVAP = 5))

This is the same as the result we obtained when we calculated the fitted value manually. The good thing
about the predict() function, however, is that we will be able to use it for more complicated models that
we will study later in this course, and it can be useful for calculating many different fitted values.
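
As a sketch of calculating several fitted values at once, the example below uses a small made-up dataset in which y is exactly 2 + 0.5 * x (toy, toy_ols and new_values are hypothetical names). The key point is that the column in newdata has the same name as the independent variable in the model formula.

```r
# Hypothetical toy data: y is exactly 2 + 0.5 * x
toy <- data.frame(x = c(0, 2, 4, 6, 8))
toy$y <- 2 + 0.5 * toy$x
toy_ols <- lm(y ~ x, data = toy)

# One row per value of x that we want a fitted value for
new_values <- data.frame(x = c(1, 5, 10))
toy_pred <- predict(object = toy_ols, newdata = new_values)
toy_pred

# Without newdata, predict() returns fitted values for the original observations
predict(toy_ols)
```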

Calculate the predicted level of black turnout for two cases: where the
percentage of black people in the population is equal to the 25th percentile,
and where it is equal to the 75th percentile, of the distribution of that
variable in the data. Use the quantile() function to work out these two
values. (That is, work out the fitted values at the lower and upper ends of
the interquartile range of $X$.) You will first need to work out the two
percentile values, and then pass them to the predict() function.
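
As a brief sketch of how quantile() behaves, the example below uses a made-up vector (the names x, q and new_values are hypothetical). The probs argument takes the percentiles as proportions.

```r
# Made-up vector of nine values
x <- c(2, 4, 6, 8, 10, 12, 14, 16, 18)

# 25th and 75th percentiles, returned as a named numeric vector
q <- quantile(x, probs = c(0.25, 0.75))
q

# The result can be used as a column of a data.frame, e.g. for the newdata argument
new_values <- data.frame(x = q)
new_values
```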

Question 4

For this question, we will continue to practise visualising our data.

1. Create a boxplot that compares turnout in elections with and without a
co-ethnic candidate. Be sure to use informative labels. Interpret the
resulting graph. If you are struggling, look back at the code that we used
last week to create boxplots.
Hint: the names argument of the boxplot function allows you to provide names
for the groups which will appear under the relevant boxplot.

2. Save the above plot.

Saving plots

At various points throughout this course it will be helpful if you can save the plots you have created as separate
files so that they can be imported into documents that you are working on. In general, we recommend saving
your plots as .pdf files.
The code below shows how to export a pdf of a plot directly to a folder on your computer. You should have
a folder called POLS0083, which currently contains your scripts and data, and a folder called tables. Create
another subfolder (within POLS0083) and call it plots. You should then be able to save the plot you created
above by using the following code:
(Note: if you are producing your work in R Markdown it won’t be necessary to save the .pdf as you can just
produce the plot directly in the R Markdown file as you have been doing.)

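# Open a pdf file to draw into (dimensions are in inches)
pdf("plots/coethnic_turnout_boxplot.pdf", width = 8, height = 8)

# Draw the boxplot; it is written to the file, not the plotting window
boxplot(turnout ~ candidate, data = blackturnout,
        names = c("Non-Coethnic Candidate", "Coethnic Candidate"),
        ylab = "Voter turnout (black voters)",
        xlab = "One or more co-ethnic candidates")

# Close the file so that the plot is saved
dev.off()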

You will notice that nothing appears in the plotting window when you run this code, but if you have set up
your folders correctly then a new plot called coethnic_turnout_boxplot.pdf should have appeared in the
plots folder that you have just created.
Note that the width = 8, height = 8 part of the code above tells R the dimensions of the image you would
like to create (here you have specified that you would like a square plot which is 8 by 8 inches). You can try
adjusting these numbers to see how they affect the shape of the image you are producing.
Another way to save plots as .pdf files is to choose “Export” and then “Save as PDF” right above the plot
window in RStudio. In this case, you don’t need the pdf() and dev.off() functions above.
