Prefresher
July 2019
Contents

Pre-Prefresher Exercises
  Linear Algebra
  Operations
  Limits
  Calculus
  Optimization
  Probability

I Math

1 Linear Algebra
  1.1 Working with Vectors
  1.2 Linear Independence
  1.3 Basics of Matrix Algebra
  1.4 Systems of Linear Equations
  1.5 Systems of Equations as Matrices
  1.6 Finding Solutions to Augmented Matrices and Systems of Equations
  1.7 Rank — and Whether a System Has One, Infinite, or No Solutions
  1.8 The Inverse of a Matrix
  1.9 Linear Systems and Inverses
  1.10 Determinants
  1.11 Getting Inverse of a Matrix using its Determinant
  Answers to Examples and Exercises

3 Limits
  Example: The Central Limit Theorem
  Example: The Law of Large Numbers
  3.1 Sequences
  3.2 The Limit of a Sequence
  3.3 Limits of a Function
  3.4 Continuity
  Answers to Examples

4 Calculus
  Example: The Mean is a Type of Integral
  4.1 Derivatives
  4.2 Higher-Order Derivatives (Derivatives of Derivatives of Derivatives)
  4.3 Composite Functions and the Chain Rule
  4.4 Derivatives of natural logs and the exponent
  4.5 Partial Derivatives
  4.6 Taylor Series Approximation
  4.7 The Indefinite Integration
  4.8 The Definite Integral: The Area under the Curve
  4.9 Integration by Substitution
  4.10 Integration by Parts
  Answers to Examples and Exercises

5 Optimization
  Example: Meltzer-Richard
  5.1 Maxima and Minima
  5.2 Concavity of a Function
  5.3 FOC and SOC
  5.4 Global Maxima and Minima
  5.5 Constrained Optimization
  5.6 Inequality Constraints
  5.7 Kuhn-Tucker Conditions
  5.8 Applications of Quadratic Forms

II Programming

7 Orientation and Reading in Data
  7.1 Motivation: Data and You
  7.2 Orienting
  7.3 The Computer and You: Giving Instructions
  7.4 Base-R vs. tidyverse
  7.5 A is for Athens
  Exercises

9 Visualization
  9.1 Motivation: The Law of the Census
  9.2 Read data
  9.3 Counting
  9.4 Tabulating
  9.5 base R graphics and ggplot
  9.6 Improving your graphics
  9.7 Cross-tabs
  9.8 Composition Plots
  9.9 Line graphs
  Exercises

12 Simulation
  12.1 Motivation: Simulation as an Analytical Tool
  12.2 Pick a sample, any sample
  12.3 The sample() function
  12.4 Random numbers from specific distributions
  12.5 r, p, and d
  12.6 set.seed()
  Exercises

14 Text
  Where are we? Where are we headed?
  14.1 Review
  14.2 Goals for today
  14.3 Reading and writing text in R
  14.4 paste() and sprintf()
  14.5 Regular expressions
  14.6 Representing Text
  14.7 Important packages for parsing text
  Exercises
The Harvard Gov Prefresher is held each year in August. All relevant information is on
our website, including the day-to-day schedule. The 2019 Prefresher instructors are Shannon
Parker and Meg Schwenzfeier, and the faculty sponsor is Gary King.
This booklet serves as the text for the Prefresher, available as a webpage and as a printable
PDF. It is the product of generations of Prefresher instructors. See below for a full list of
instructors and contributors.
• Authors and Instructors: Curt Signorino 1996-1997; Ken Scheve 1997-1998; Eric Dick-
son 1998-2000; Orit Kedar 1999; James Fowler 2000-2001; Kosuke Imai 2001-2002;
Jacob Kline 2002; Dan Epstein 2003; Ben Ansell 2003-2004; Ryan Moore 2004-2005;
Mike Kellermann 2005-2006; Ellie Powell 2006-2007; Jen Katkin 2007-2008; Patrick
Lam 2008-2009; Viridiana Rios 2009-2010; Jennifer Pan 2010-2011; Konstantin Kashin
2011-2012; Soledad Prillaman 2013; Stephen Pettigrew 2013-2014; Anton Strezhnev
2014-2015; Mayya Komisarchik 2015-2016; Connor Jerzak 2016-2017; Shiro Kuriwaki
2017-2018; Yon Soo Park 2018
• Repository Maintainer: Shiro Kuriwaki (kuriwaki)
• Contributors: Thanks to Juan Dodyk (juandodyk), Hunter Rendleman (hrendleman), and Tyler Simko (tylersimko), who contributed corrections and improvements to the booklet as students.
Contributing
We transitioned the booklet into a bookdown GitHub repository in 2018. As we update this version, we appreciate any bug reports or fixes.
All changes should be made in the .Rmd files in the project root. Changes pushed to the
repository will be checked for compilation by Travis-CI. To contribute a change, please make
a pull request and set the repository maintainer as the reviewer.
Pre-Prefresher Exercises
Before our first meeting, please try solving these questions. They are a sample of the very
beginning of each math section. We have provided links to the parts of the book you can
read if the concepts are new to you.
The goal of this “pre”-prefresher assignment is not to intimidate you but to set common
expectations so you can make the most out of the actual Prefresher. Even if you do not
understand some or all of these questions after skimming through the linked sections, your
effort will pay off and you will be better prepared for the math prefresher. We are also open to adjusting these expectations based on feedback (this class is for you), so please do not hesitate to write to the instructors.
Linear Algebra
Vectors
Define the vectors $u = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$, $v = \begin{pmatrix} 4 \\ 5 \\ 6 \end{pmatrix}$, and the scalar $c = 2$. Calculate the following:
1. u + v
2. cv
3. u · v
If you are having trouble with these problems, please review Section 1.1 “Working with
Vectors” in Chapter 1.
Are the following sets of vectors linearly independent?
1. $u = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$, $v = \begin{pmatrix} 2 \\ 4 \end{pmatrix}$
2. $u = \begin{pmatrix} 1 \\ 2 \\ 5 \end{pmatrix}$, $v = \begin{pmatrix} 3 \\ 7 \\ 9 \end{pmatrix}$
3. $a = \begin{pmatrix} 2 \\ -1 \\ 1 \end{pmatrix}$, $b = \begin{pmatrix} 3 \\ -4 \\ -2 \end{pmatrix}$, $c = \begin{pmatrix} 5 \\ -10 \\ -8 \end{pmatrix}$ (this requires some guesswork)
If you are having trouble with these problems, please review Section 1.2.
Matrices
$$A = \begin{pmatrix} 7 & 5 & 1 \\ 11 & 9 & 3 \\ 2 & 14 & 21 \\ 4 & 1 & 5 \end{pmatrix} \qquad B = \begin{pmatrix} 1 & 2 & 8 \\ 3 & 9 & 11 \\ 4 & 7 & 5 \\ 5 & 1 & 9 \end{pmatrix}$$
What is A + B?
Given that
$$C = \begin{pmatrix} 1 & 2 & 8 \\ 3 & 9 & 11 \\ 4 & 7 & 5 \end{pmatrix}$$
What is A + C?
Given that
c=2
What is cA?
If you are having trouble with these problems, please review Section 1.3.
Operations
Summation
2. $\sum_{k=1}^{3} (3k + 2)$
3. $\sum_{i=1}^{4} (3k + i + 2)$
Products
1. $\prod_{i=1}^{3} i$
2. $\prod_{k=1}^{3} (3k + 2)$
Logs and Exponents
1. $4^2$
2. $4^2 \, 2^3$
3. $\log_{10} 100$
4. $\log_2 4$
5. $\log e$, where $\log$ is the natural log (also written as $\ln$) – a log with base $e$, and $e$ is Euler's constant
6. $e^a e^b e^c$, where $a$, $b$, $c$ are each constants
7. $\log 0$
8. $e^0$
9. $e^1$
10. $\log e^2$
Limits
1. $\lim_{x \to 2} (x - 1)$
2. $\lim_{x \to 2} \frac{(x-2)(x-1)}{(x-2)}$
3. $\lim_{x \to 2} \frac{x^2 - 3x + 2}{x - 2}$
Calculus
For each of the following functions $f(x)$, find the derivative $f'(x)$ or $\frac{d}{dx} f(x)$.
1. $f(x) = c$
2. $f(x) = x$
3. $f(x) = x^2$
4. $f(x) = x^3$
5. $f(x) = 3x^2 + 2x^{1/3}$
6. $f(x) = (x^3)(2x^4)$
For a review, please see Section 4.1 - 4.2
Optimization
For each of the following functions $f(x)$, do a maximum and a minimum exist in the domain $x \in \mathbf{R}$? If so, what are those values, and at which values of $x$ do they occur?
1. $f(x) = x$
2. $f(x) = x^2$
3. $f(x) = -(x - 2)^2$
If you are stuck, please try sketching out a picture of each of the functions.
Probability
1. If there are 12 cards, numbered 1 to 12, and 4 cards are chosen, how many distinct
possible choices are there? (unordered, without replacement)
2. Let $A = \{1, 3, 5, 7, 8\}$ and $B = \{2, 4, 7, 8, 12, 13\}$. What is $A \cup B$? What is $A \cap B$? If $A$ is a subset of the Sample Space $S = \{1, 2, 3, 4, 5, 6, 7, 8, 9, 10\}$, what is the complement $A^C$?
3. If we roll two fair dice, what is the probability that their sum would be 11?
4. If we roll two fair dice, what is the probability that their sum would be 12?
For a review, please see Sections 6.2 - 6.3.
Part I
Math
Chapter 1
Linear Algebra
Topics: • Working with Vectors • Linear Independence • Basics of Matrix Algebra • Square
Matrices • Linear Equations • Systems of Linear Equations • Systems of Equations as
Matrices • Solving Augmented Matrices and Systems of Equations • Rank • The Inverse of
a Matrix • Inverse of Larger Matrices
We can also think of a vector as defining a point in n-dimensional space, usually Rn ; each
element of the vector defines the coordinate of the point in a particular direction.
Vector Addition and Subtraction: If two vectors, u and v, have the same length
(i.e. have the same number of elements), they can be added (subtracted) together:
$$u + v = \begin{pmatrix} u_1 + v_1 & u_2 + v_2 & \cdots & u_n + v_n \end{pmatrix}$$
$$u - v = \begin{pmatrix} u_1 - v_1 & u_2 - v_2 & \cdots & u_n - v_n \end{pmatrix}$$
Scalar Multiplication: The product of a scalar c (i.e. a constant) and vector v is:
$$cv = \begin{pmatrix} cv_1 & cv_2 & \ldots & cv_n \end{pmatrix}$$
Vector Inner Product: The inner product (also called the dot product or scalar product)
of two vectors u and v is again defined if and only if they have the same number of elements
$$u \cdot v = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n = \sum_{i=1}^{n} u_i v_i$$
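These vector operations map directly onto base R, the language used in Part II of this booklet. A minimal sketch, using the vectors from the Pre-Prefresher exercise:

u <- c(1, 2, 3)
v <- c(4, 5, 6)
k <- 2
u + v        # elementwise addition: 5 7 9
k * v        # scalar multiplication: 8 10 12
sum(u * v)   # inner (dot) product: 1*4 + 2*5 + 3*6 = 32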
Linear independence is only defined for sets of vectors with the same number of elements;
any linearly independent set of vectors in n-space contains at most n vectors.
Since $\begin{pmatrix} 9 & 13 & 17 \end{pmatrix}$ is a linear combination of $\begin{pmatrix} 1 & 2 & 3 \end{pmatrix}$, $\begin{pmatrix} 2 & 3 & 4 \end{pmatrix}$, and $\begin{pmatrix} 3 & 4 & 5 \end{pmatrix}$, these four vectors constitute a linearly dependent set.
Example 1.2 (Linear Independence). Are the following sets of vectors linearly indepen-
dent?
1. $\begin{pmatrix} 2 & 3 & 1 \end{pmatrix}$ and $\begin{pmatrix} 4 & 6 & 1 \end{pmatrix}$
2. $\begin{pmatrix} 1 & 0 & 0 \end{pmatrix}$, $\begin{pmatrix} 0 & 5 & 0 \end{pmatrix}$, and $\begin{pmatrix} 10 & 10 & 0 \end{pmatrix}$
Exercise 1.2 (Linear Independence). Are the following sets of vectors linearly independent?
1. $v_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$, $v_2 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$, $v_3 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$
2. $v_1 = \begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix}$, $v_2 = \begin{pmatrix} -4 \\ 6 \\ 5 \end{pmatrix}$, $v_3 = \begin{pmatrix} -2 \\ 8 \\ 6 \end{pmatrix}$
Note that you can think of vectors as special cases of matrices; a column vector of length k
is a k × 1 matrix, while a row vector of the same length is a 1 × k matrix.
It’s also useful to think of matrices as being made up of a collection of row or column vectors.
For example,
$$A = \begin{pmatrix} a_1 & a_2 & \cdots & a_m \end{pmatrix}$$
Note that matrices A and B must have the same dimensionality, in which case they are
conformable for addition.
Example 1.3.
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 1 & 2 \end{pmatrix}$$
$$A + B = $$
Note that the number of columns of the first matrix must equal the number of rows of the
second matrix, in which case they are conformable for multiplication. The sizes of the
matrices (including the resulting product) must be
(m × k)(k × n) = (m × n)
2. Commutative: A+B=B+A
3. Distributive: A(B + C) = AB + AC
(A + B)C = AC + BC
Commutative law for multiplication does not hold – the order of multiplication matters:
AB ̸= BA
For example,
$$A = \begin{pmatrix} 1 & 2 \\ -1 & 3 \end{pmatrix}, \qquad B = \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix}$$
$$AB = \begin{pmatrix} 2 & 3 \\ -2 & 2 \end{pmatrix}, \qquad BA = \begin{pmatrix} 1 & 7 \\ -1 & 3 \end{pmatrix}$$
The transpose of a product is the product of the transposes in reverse order, $(AB)^T = B^T A^T$. For example,
$$(AB)^T = \left( \begin{pmatrix} 1 & 3 & 2 \\ 2 & -1 & 3 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 2 & 2 \\ 3 & -1 \end{pmatrix} \right)^T = \begin{pmatrix} 12 & 7 \\ 5 & -3 \end{pmatrix}$$
$$B^T A^T = \begin{pmatrix} 0 & 2 & 3 \\ 1 & 2 & -1 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & -1 \\ 2 & 3 \end{pmatrix} = \begin{pmatrix} 12 & 7 \\ 5 & -3 \end{pmatrix}$$
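A quick numerical check of both facts in base R (a sketch; the matrices are the ones used in the examples above):

A <- matrix(c(1, -1, 2, 3), nrow = 2)   # fills by column: rows (1, 2) and (-1, 3)
B <- matrix(c(2, 0, 1, 1), nrow = 2)    # rows (2, 1) and (0, 1)
A %*% B                                 # not equal to B %*% A, so AB != BA
B %*% A
A2 <- matrix(c(1, 2, 3, -1, 2, 3), nrow = 2)   # rows (1, 3, 2) and (2, -1, 3)
B2 <- matrix(c(0, 2, 3, 1, 2, -1), nrow = 3)   # rows (0, 1), (2, 2), (3, -1)
t(A2 %*% B2)      # (AB)^T
t(B2) %*% t(A2)   # the same 2 x 2 matrix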
$$B = \begin{pmatrix} 1 & 5 & -7 \\ 1 & 1 & 0 \\ 0 & -1 & 1 \\ 2 & 0 & 0 \end{pmatrix} \qquad C = \begin{pmatrix} 3 & 2 & -1 \\ 0 & 4 & 6 \end{pmatrix}$$
x − 3y = −3
2x + y = 8
Example: x = 3 and y = 2 is the solution to the above 2 × 2 linear system. If you graph the
two lines, you will find that they intersect at (3, 2).
Does a linear system have one, no, or multiple solutions? For a system of 2 equations with 2 unknowns (i.e., two lines):
One solution: The lines intersect at exactly one point.
No solution: The lines are parallel.
Infinite solutions: The lines coincide.
Methods to solve linear systems:
1. Substitution
2. Elimination of variables
3. Matrix methods
Exercise 1.4 (Linear Equations). Provide a system of 2 equations with 2 unknowns that
has
1. one solution
2. no solution
3. infinite solutions
We can write a system of linear equations in matrix form as
$$Ax = b$$
where
The m × n coefficient matrix A is an array of mn real numbers arranged in m rows by n
columns:
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$
The unknown quantities are represented by the vector $x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$.
The right hand side of the linear system is represented by the vector $b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}$.
Augmented Matrix: When we append $b$ to the coefficient matrix $A$, we get the augmented matrix $\hat{A} = [A | b]$:
$$\left( \begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m \end{array} \right)$$
Exercise 1.5 (Augmented Matrix). Create an augmented matrix that represents the following system of equations:
$$x_1 - 15x_2 - 11x_5 = 9$$
Row Echelon Form: Our goal is to translate our augmented matrix or system of equations
into row echelon form. This will provide us with the values of the vector x which solve
the system. We use the row operations to change coefficients in the lower triangle of the
augmented matrix to 0. An augmented matrix of the form
$$\left( \begin{array}{ccccc|c} a'_{11} & a'_{12} & a'_{13} & \cdots & a'_{1n} & b'_1 \\ 0 & a'_{22} & a'_{23} & \cdots & a'_{2n} & b'_2 \\ 0 & 0 & a'_{33} & \cdots & a'_{3n} & b'_3 \\ \vdots & & & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & a'_{mn} & b'_m \end{array} \right)$$
is said to be in row echelon form — each row has more leading zeros than the row preceding
it.
Reduced Row Echelon Form: We can go one step further and put the matrix into
reduced row echelon form. Reduced row echelon form makes the value of x which solves the
system very obvious. For a system of m equations in m unknowns, with no all-zero rows,
the reduced row echelon form would be
$$\left( \begin{array}{ccccc|c} 1 & 0 & 0 & 0 & 0 & b^*_1 \\ 0 & 1 & 0 & 0 & 0 & b^*_2 \\ 0 & 0 & 1 & 0 & 0 & b^*_3 \\ 0 & 0 & 0 & \ddots & 0 & \vdots \\ 0 & 0 & 0 & 0 & 1 & b^*_m \end{array} \right)$$
which represents a linear system equivalent to that represented by matrix $\hat{A}$.
Multiplying by a Constant: If we multiply the second row of matrix $\hat{A}$ by a constant $c$, we get the augmented matrix
$$\left( \begin{array}{cc|c} a_{11} & a_{12} & b_1 \\ ca_{21} & ca_{22} & cb_2 \end{array} \right)$$
which represents a linear system equivalent to that represented by matrix $\hat{A}$.
Adding (subtracting) Rows: If we add (subtract) the first row of matrix $\hat{A}$ to the second, we obtain the augmented matrix
$$\left( \begin{array}{cc|c} a_{11} & a_{12} & b_1 \\ a_{11} + a_{21} & a_{12} + a_{22} & b_1 + b_2 \end{array} \right)$$
which represents a linear system equivalent to that represented by matrix $\hat{A}$.
Example 1.6. Solve the following system of equations by using elementary row operations:
x − 3y = −3
2x + y = 8
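R will not show the row operations themselves, but base R's solve() can confirm the answer to a system like this one. A sketch:

A <- matrix(c(1, 2, -3, 1), nrow = 2)   # coefficient matrix, rows (1, -3) and (2, 1)
b <- c(-3, 8)
solve(A, b)                             # returns 3 2, i.e. x = 3, y = 2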
Exercise 1.6 (Solving Systems of Equations). Put the following system of equations into
augmented matrix form. Then, using Gaussian or Gauss-Jordan elimination, solve the
system of equations by putting the matrix into row echelon or reduced row echelon form.
1. $\begin{aligned} x + y + 2z &= 2 \\ 3x - 2y + z &= 1 \\ y - z &= 3 \end{aligned}$
2. $\begin{aligned} 2x + 3y - z &= -8 \\ x + 2y - z &= 12 \\ -x - 4y + z &= -6 \end{aligned}$
Exercise 1.7 (Rank of Matrices). Find the rank of each matrix below:
(Hint: transform the matrices into row echelon form. Remember that the number of nonzero
rows of a matrix in row echelon form is the rank of that matrix)
1. $\begin{pmatrix} 1 & 1 & 2 \\ 2 & 1 & 3 \\ 1 & 2 & 3 \end{pmatrix}$
2. $\begin{pmatrix} 1 & 3 & 3 & -3 & 3 \\ 1 & 3 & 1 & 1 & 3 \\ 1 & 3 & 2 & -1 & -2 \\ 1 & 3 & 0 & 3 & -2 \end{pmatrix}$
$$AB = BA = I_n$$
$$AB = I_n$$
Solving for $B$ is equivalent to solving for $n$ linear systems, where each column of $B$ is solved for the corresponding column in $I_n$. We can solve the systems simultaneously by augmenting $A$ with $I_n$ and performing Gauss-Jordan elimination on $[A | I_n]$.
Exercise 1.8 (Finding the inverse of matrices). Find the inverse of the following matrix:
1. $A = \begin{pmatrix} 1 & 0 & 4 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix}$
$$Ax = b$$
Hence, given $A$ and $b$, and given that $A$ is nonsingular, $x = A^{-1}b$ is the unique solution to this system.
Exercise 1.9 (Solve linear system using inverses). Use the inverse matrix to solve the
following linear system:
−3x + 4y = 5
2x − y = −10
Hint: the linear system above can be written in the matrix form
Az = b
given
$$A = \begin{pmatrix} -3 & 4 \\ 2 & -1 \end{pmatrix}, \qquad z = \begin{pmatrix} x \\ y \end{pmatrix}, \qquad b = \begin{pmatrix} 5 \\ -10 \end{pmatrix}$$
1.10 Determinants
Singularity: Determinants can be used to determine whether a square matrix is nonsingu-
lar.
A square matrix is nonsingular if and only if its determinant is not zero.
The determinant of a $1 \times 1$ matrix $A$ equals $a_{11}$.
The determinant of a $2 \times 2$ matrix $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$ is:
$$\det(A) = |A| = a_{11}|a_{22}| - a_{12}|a_{21}| = a_{11}a_{22} - a_{12}a_{21}$$
We can extend the second to last equation above to get the definition of the determinant of
a 3 × 3 matrix:
Let's extend this now to any $n \times n$ matrix. Define $A_{ij}$ as the $(n-1) \times (n-1)$ submatrix of $A$ obtained by deleting row $i$ and column $j$. Let the $(i,j)$th minor of $A$ be the determinant of $A_{ij}$:
$$M_{ij} = |A_{ij}|$$
Then for any $n \times n$ matrix $A$, the determinant can be computed by a cofactor expansion, for example along the first row:
$$|A| = \sum_{j=1}^{n} (-1)^{1+j} a_{1j} M_{1j}$$
For example, we can use the determinant to figure out whether the following matrix has an inverse:
$$A = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 2 & 3 \\ 5 & 5 & 1 \end{pmatrix}$$
Exercise 1.10 (Determinants and Inverses). Determine whether the following matrices are
nonsingular:
1. $\begin{pmatrix} 1 & 0 & 1 \\ 2 & 1 & 2 \\ 1 & 0 & -1 \end{pmatrix}$
2. $\begin{pmatrix} 2 & 1 & 2 \\ 1 & 0 & 1 \\ 4 & 1 & 4 \end{pmatrix}$
Hence, we can examine how changes in the parameters $a_{ij}$ and $b_i$ affect the solutions $x_j$.
Determinant Formula for the Inverse of a $2 \times 2$ Matrix:
The inverse of a $2 \times 2$ matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is
$$A^{-1} = \frac{1}{\det(A)} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$$
For example, let's calculate the inverse of matrix $A$ from Exercise 1.9 using the determinant formula. Recall,
$$A = \begin{pmatrix} -3 & 4 \\ 2 & -1 \end{pmatrix}$$
$$A^{-1} = \frac{1}{\det(A)} \begin{pmatrix} -1 & -4 \\ -2 & -3 \end{pmatrix} = \frac{1}{-5} \begin{pmatrix} -1 & -4 \\ -2 & -3 \end{pmatrix} = \begin{pmatrix} \frac{1}{5} & \frac{4}{5} \\ \frac{2}{5} & \frac{3}{5} \end{pmatrix}$$
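Base R can check this worked example: det() computes the determinant and solve() with a single argument returns the inverse. A sketch:

A <- matrix(c(-3, 2, 4, -1), nrow = 2)   # rows (-3, 4) and (2, -1)
det(A)      # -5
solve(A)    # the inverse: rows (0.2, 0.8) and (0.4, 0.6), i.e. (1/5, 4/5) and (2/5, 3/5)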
Exercise 1.11 (Calculate Inverse using Determinant Formula). Calculate the inverse of
$$A = \begin{pmatrix} 3 & 5 \\ -7 & 2 \end{pmatrix}$$
Answers to Examples and Exercises
Exercise 1.4:
2. No solution:
$$-x + y = 0, \qquad x - y = 2$$
3. Infinite solutions:
$$-x + y = 0, \qquad 2x - 2y = 0$$
Example 1.6: Starting from
$$x - 3y = -3, \qquad 2x + y = 8$$
subtract twice the first equation from the second:
$$x - 3y = -3, \qquad 7y = 14$$
divide the second equation by 7:
$$x - 3y = -3, \qquad y = 2$$
and substitute back to obtain
$$x = 3, \qquad y = 2$$
2.1 Summation Operators $\sum$ and $\prod$
Addition (+), subtraction (−), multiplication (×), and division (÷) are the basic operations of arithmetic for combining numbers. In statistics and calculus, we often want to add a sequence of numbers that can be expressed as a pattern without needing to write down all of its components. For example, how would we express the sum of all numbers from 1 to 100 without writing a hundred numbers?
For this we use the summation operator $\sum$ and the product operator $\prod$.
Summation:
$$\sum_{i=1}^{100} x_i = x_1 + x_2 + x_3 + \cdots + x_{100}$$
The bottom of the $\sum$ symbol indicates an index (here, $i$) and its start value, 1. At the top is where the index ends. The notion of "addition" is part of the $\sum$ symbol. The content to the right of the summation is the meat of what we add. While you can pick your favorite index, start, and end values, the content must also have the index.
• $\sum_{i=1}^{n} cx_i = c \sum_{i=1}^{n} x_i$
• $\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$
• $\sum_{i=1}^{n} c = nc$
Product:
$$\prod_{i=1}^{n} x_i = x_1 x_2 x_3 \cdots x_n$$
Properties:
• $\prod_{i=1}^{n} cx_i = c^n \prod_{i=1}^{n} x_i$
• $\prod_{i=k}^{n} cx_i = c^{n-k+1} \prod_{i=k}^{n} x_i$
• $\prod_{i=1}^{n} (x_i + y_i) = $ a total mess
• $\prod_{i=1}^{n} c = c^n$
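In R, these operators correspond to sum() and prod(), which makes the properties easy to verify numerically. A sketch with arbitrary values:

x <- c(2, 4, 6)
k <- 3
n <- length(x)
sum(k * x) == k * sum(x)        # constants slip out of a sum: TRUE
sum(rep(k, n)) == n * k         # summing a constant n times gives nc: TRUE
prod(k * x) == k^n * prod(x)    # constants come out of a product as c^n: TRUE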
Factorials!:
x! = x · (x − 1) · (x − 2) · · · (1)
Modulo: Tells you the remainder when you divide the first number by the second.
• 17 mod 3 = 2
• 100 % 30 = 10
Example 2.1 (Operators).
1. $\sum_{i=1}^{5} i = $
2. $\prod_{i=1}^{5} i = $
3. $14 \bmod 4 = $
4. $4! = $
2. $\sum_{i=1}^{5} 2$
3. $\prod_{i=3}^{5} (2)x_i$
Example 2.2 (Functions). For each of the following, state whether they are one-to-one or
many-to-one functions.
1. For $x \in [0, \infty]$, $f: x \to x^2$ (this could also be written as $f(x) = x^2$).
2. For $x \in [-\infty, \infty]$, $f: x \to x^2$.
Exercise 2.2 (Functions). For each of the following, state whether they are one-to-one or
many-to-one functions.
1. For $x \in [-3, \infty]$, $f: x \to x^2$.
2. For $x \in [0, \infty]$, $f: x \to \sqrt{x}$
f (X) = {y : y = f (x), x ∈ X}
$$y = \log_a(x) \iff a^y = x$$
The log function can be thought of as an inverse for exponential functions. $a$ is referred to as the "base" of the logarithm.
Common Bases: The two most common logarithms are base 10 and base $e$.
1. Base 10: $y = \log_{10}(x) \iff 10^y = x$. The base 10 logarithm is often simply written as "$\log(x)$" with no base denoted.
2. Base $e$: $y = \log_e(x) \iff e^y = x$. The base $e$ logarithm is referred to as the "natural" logarithm and is written as "$\ln(x)$".
Properties of exponential functions:
• $a^x a^y = a^{x+y}$
• $a^{-x} = 1/a^x$
• $a^x / a^y = a^{x-y}$
• $(a^x)^y = a^{xy}$
• $a^0 = 1$
$$\log_a(a^x) = x \qquad \text{and} \qquad a^{\log_a(x)} = x$$
• $\log(xy) = \log(x) + \log(y)$
• $\log(x^y) = y \log(x)$
• $\log(1/x) = \log(x^{-1}) = -\log(x)$
• $\log(x/y) = \log(x \cdot y^{-1}) = \log(x) + \log(y^{-1}) = \log(x) - \log(y)$
• $\log(1) = \log(e^0) = 0$
Change of Base Formula: Use the change of base formula to switch bases as necessary:
$$\log_b(x) = \frac{\log_a(x)}{\log_a(b)}$$
Example:
$$\log_{10}(x) = \frac{\ln(x)}{\ln(10)}$$
You can use logs to go between sum and product notation. This will be particularly impor-
tant when you’re learning maximum likelihood estimation.
$$\log\left( \prod_{i=1}^{n} x_i \right) = \log(x_1 \cdot x_2 \cdot x_3 \cdots x_n) = \log(x_1) + \log(x_2) + \log(x_3) + \cdots + \log(x_n) = \sum_{i=1}^{n} \log(x_i)$$
Therefore, you can see that the log of a product is equal to the sum of the logs. We can
write this more generally by adding in a constant, c:
$$\log\left( \prod_{i=1}^{n} cx_i \right) = \log(cx_1 \cdot cx_2 \cdots cx_n) = \log(c^n \cdot x_1 \cdot x_2 \cdots x_n) = \log(c^n) + \log(x_1) + \log(x_2) + \cdots + \log(x_n) = n\log(c) + \sum_{i=1}^{n} \log(x_i)$$
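A quick numerical confirmation in R that the log of a product is the sum of the logs (the values of x are arbitrary):

x <- c(0.5, 2, 8)
log(prod(x))                           # 2.0794...
sum(log(x))                            # the same value
all.equal(log(prod(x)), sum(log(x)))   # TRUE, up to floating-point error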
Simplify each of the following logarithms. By "simplify", we mean: use as many of the logarithmic properties as you can.
2. $\log\left( \frac{x^9 y^5}{z^3} \right)$
3. $\ln \sqrt{xy}$
Solving for variables is especially important when we want to find the roots of an equation: those values of the variables that cause the equation to equal zero. Roots are especially important for finding equilibria and for maximum likelihood estimation.
Quadratic Formula: For quadratic equations $ax^2 + bx + c = 0$, use the quadratic formula:
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
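As a sketch, the quadratic formula can be wrapped in a small R helper (this assumes real roots, i.e. a non-negative discriminant; the function name is ours):

quad_roots <- function(a, b, c0) {
  disc <- b^2 - 4 * a * c0
  c((-b + sqrt(disc)) / (2 * a), (-b - sqrt(disc)) / (2 * a))
}
quad_roots(1, -5, 6)   # roots of x^2 - 5x + 6 = 0, namely 3 and 2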
2.6 Sets
Interior Point: The point x is an interior point of the set S if x is in S and if there is
some ϵ-ball around x that contains only points in S. The interior of S is the collection of
all interior points in S. The interior can also be defined as the union of all open sets in S.
• If the set S is circular, the interior points are everything inside of the circle, but not
on the circle’s rim.
• Example: The interior of the set $\{(x, y) : x^2 + y^2 \le 4\}$ is $\{(x, y) : x^2 + y^2 < 4\}$.
Boundary Point: The point x is a boundary point of the set S if every ϵ-ball around x
contains both points that are in S and points that are outside S. The boundary is the
collection of all boundary points.
• If the set S is circular, the boundary points are everything on the circle’s rim.
• Example: The boundary of $\{(x, y) : x^2 + y^2 \le 4\}$ is $\{(x, y) : x^2 + y^2 = 4\}$.
Open: A set S is open if for each point x in S, there exists an open ϵ-ball around x
completely contained in S.
• If the set S is circular and open, the points contained within the set get infinitely close
to the circle’s rim, but do not touch it.
• Example: $\{(x, y) : x^2 + y^2 < 4\}$
Closed: A set S is closed if it contains all of its boundary points.
• Alternatively: A set is closed if its complement is open.
• If the set S is circular and closed, the set contains all points within the rim as well as
the rim itself.
• Example: $\{(x, y) : x^2 + y^2 \le 4\}$
• Note: a set may be neither open nor closed. Example: $\{(x, y) : 2 < x^2 + y^2 \le 4\}$
Complement: The complement of set S is everything outside of S.
1. $y = 3x + 2 \implies -3x = 2 - y \implies 3x = y - 2 \implies x = \frac{1}{3}(y - 2)$
2. x = ln y
Answer to Exercise 2.4:
1. $\frac{-2}{3}$
2. x = {1, -4}
3. x = - ln 10
Chapter 3
Limits
Solving limits, i.e. finding the value of a function as its input moves closer to some value, is important for the social scientist's mathematical toolkit for two related tasks. The first is the study of calculus, which will in turn be useful for showing where certain functions are maximized or minimized. The second is the study of statistical inference, which is the study of inferring things about things you cannot see by using things you can see.
Theorem 3.1 (Central Limit Theorem (i.i.d. case)). For any series of independent and identically distributed random variables $X_1, X_2, \cdots$, we know the distribution of their sum (once standardized) even if we do not know the distribution of $X$. That distribution is a Normal distribution:
$$\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} \text{Normal}(0, 1),$$
where µ is the mean of X and σ is the standard deviation of X. The arrow is read as
“converges in distribution to”. Normal(0, 1) indicates a Normal Distribution with mean 0
and variance 1.
That is, the limit of the distribution of the lefthand side is the distribution of the righthand
side.
The sign for a limit is the arrow "→". Although we have not yet covered probability (Section 6), and so have not yet described what distributions and random variables are, it is worth foreshadowing the Central Limit Theorem here. The Central Limit Theorem is powerful because it gives us a guarantee of what would happen if $n \to \infty$, which in this case means we collected more data.
[Figure 3.1: Estimate of the probability of heads after n trials. As the number of coin tosses goes to infinity, the average probability of heads converges to 0.5 (estimate at n = 1,000: 0.487).]
A finding that perhaps rivals the Central Limit Theorem is the Law of Large Numbers:
Theorem 3.2 ((Weak) Law of Large Numbers). For any draw of identically distributed
independent variables with mean µ, the sample average after n draws, X̄n , converges in
probability to the true mean as n → ∞:
A shorthand of which is $\bar{X}_n \xrightarrow{p} \mu$, where the arrow is read as "converges in probability to".
Intuitively, the more data you have, the more accurate your guess. For example, Figure 3.1 shows how the sample average from many coin tosses converges to the true value, 0.5.
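A sketch of the kind of simulation behind Figure 3.1, in base R (the seed value is arbitrary; set.seed() is covered in the Simulation chapter):

set.seed(2019)
tosses <- rbinom(1000, size = 1, prob = 0.5)       # 1,000 fair coin tosses
running_avg <- cumsum(tosses) / seq_along(tosses)  # estimate after each toss
running_avg[c(10, 100, 1000)]                      # estimates drift toward 0.5 as n grows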
3.1 Sequences
We need a couple of steps until we get to limit theorems in probability. First we will
introduce a “sequence”, then we will think about the limit of a sequence, then we will think
about the limit of a function.
A sequence
$$\{x_n\} = \{x_1, x_2, x_3, \ldots, x_n\}$$
is an ordered set of real numbers, where $x_1$ is the first term in the sequence and $x_n$ is the $n$th term. Generally, a sequence is infinite, that is, it extends to $n = \infty$. We can also write the sequence as
$$\{x_n\}_{n=1}^{\infty}$$
where the subscript and superscript are read together as “from 1 to infinity.”
We find the sequence by simply “plugging in” the integers into each n. The important thing
is to get a sense of how these numbers are going to change. Example 1’s numbers seem
to come closer and closer to 2, but will it ever surpass 2? Example 2’s numbers are also
increasing each time, but will it hit a limit? What is the pattern in Example 3? Graphing
helps you make this point more clearly. See the sequence of n = 1, ...20 for each of the three
examples in Figure 3.2.
Definition 3.1. The sequence $\{y_n\}$ has the limit $L$, which we write as
$$\lim_{n \to \infty} y_n = L,$$
if for any $\epsilon > 0$ there is an integer $N$ (which depends on $\epsilon$) with the property that $|y_n - L| < \epsilon$ for each $n > N$. $\{y_n\}$ is said to converge to $L$. If the above does not hold, then $\{y_n\}$ diverges.
[Figure 3.2: The first 20 terms of the sequences An, Bn, and Cn from Example 3.1, plotted against n.]
This looks reasonable enough. The harder question, obviously, is when the parts of the fraction do not converge. If $\lim_{n \to \infty} y_n = \infty$ and $\lim_{n \to \infty} z_n = \infty$, what is $\lim_{n \to \infty} (y_n - z_n)$? What is $\lim_{n \to \infty} \frac{y_n}{z_n}$?
It is nice for a sequence to converge in limit. We want to know whether complex-looking sequences converge or not. The name of the game here is to break that complex sequence up into sums of simple fractions where $n$ only appears in the denominator: $\frac{1}{n}$, $\frac{1}{n^2}$, and so on. Each of these will converge to 0, because the denominator gets larger and larger. Then, because of the properties above, we can find the limit of the final sequence.
Solution. At first glance, n + 3 and n both grow to ∞, so it looks like we need to divide
infinity by infinity. However, we can express this fraction as a sum, then the limits apply
separately:
$$\lim_{n \to \infty} \left( \frac{n + 3}{n} \right) = \lim_{n \to \infty} \left( 1 + \frac{3}{n} \right) = \underbrace{\lim_{n \to \infty} 1}_{1} + \underbrace{\lim_{n \to \infty} \frac{3}{n}}_{0} = 1$$
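"Plugging in" ever larger values of n suggests the same answer numerically. A one-line check in R:

n <- c(10, 1000, 1e6)
(n + 3) / n   # 1.3, 1.003, 1.000003, approaching 1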
Exercise 3.1. Find the following limits of sequences, then explain in English the intuition
for why that is the case.
1. $\lim_{n \to \infty} \frac{2n}{n^2 + 1}$
2. $\lim_{n \to \infty} (n^3 - 100n^2)$
We’ve now covered functions and just covered limits of sequences, so now is the time to
combine the two.
A function $f$ is a compact representation of some behavior we care about. Like for sequences, we often want to know if $f(x)$ approaches some number $L$ as its independent variable $x$ moves to some number $c$ (which is usually 0 or $\pm\infty$). If it does, we say that the limit of $f(x)$, as $x$ approaches $c$, is $L$: $\lim_{x \to c} f(x) = L$. Unlike a sequence, $x$ is a continuous number, and we can move in decreasing order as well as increasing.
For a limit L to exist, the function f (x) must approach L from both the left (increasing)
and the right (decreasing).
Definition 3.2 (Limit of a function). Let $f(x)$ be defined at each point in some open interval containing the point $c$. Then $L$ equals $\lim_{x \to c} f(x)$ if for any (small positive) number $\epsilon$, there exists a corresponding number $\delta > 0$ such that if $0 < |x - c| < \delta$, then $|f(x) - L| < \epsilon$.
A neat, if subtle, result is that $f(x)$ does not necessarily have to be defined at $c$ for $\lim_{x \to c}$ to exist.
Properties: Let $f$ and $g$ be functions with $\lim_{x \to c} f(x) = k$ and $\lim_{x \to c} g(x) = \ell$.
Simple limits of functions can be solved as we did limits of sequences. Just be careful which
part of the function is changing.
Example 3.3 (Limits of Functions). Find the limit of the following functions.
1. $\lim_{x \to c} k$
2. $\lim_{x \to c} x$
3. $\lim_{x \to 2} (2x - 3)$
4. $\lim_{x \to c} x^n$
Limits can get more complex in roughly two ways. First, the functions may become large polynomials with many moving pieces. Second, the functions may become discontinuous. The function can be thought of as a more general or "smooth" version of sequences. For example,
$$\lim_{x \to \infty} \frac{(x^4 + 3x - 99)(2 - x^5)}{(18x^7 + 9x^6 - 3x^2 - 1)(x + 1)}$$
So there are a few more alternatives about what a limit of a function could be:
1. Right-hand limit: The value approached by f (x) when you move from right to left.
2. Left-hand limit: The value approached by f (x) when you move from left to right.
3. Infinity: The value approached by f (x) as x grows infinitely large. Sometimes this
may be a number; sometimes it might be ∞ or −∞.
4. Negative infinity: The value approached by f (x) as x grows infinitely negative. Some-
times this may be a number; sometimes it might be ∞ or −∞.
The distinction between left and right becomes important when the function is not deter-
mined for some values of x. What are those cases in the examples below?
3.4 Continuity
To repeat a finding from the limits of functions: f (x) does not necessarily have to be defined
at c for lim to exist. Functions that have breaks in their lines are called discontinuous.
x→c
Functions that have no breaks are called continuous. Continuity is more fundamental than, but related to, the concept of "differentiability", which we will cover next in calculus.
Definition 3.3 (Continuity). Suppose that the domain of the function $f$ includes an open interval containing the point $c$. Then $f$ is continuous at $c$ if $\lim_{x \to c} f(x)$ exists and if $\lim_{x \to c} f(x) = f(c)$. Further, $f$ is continuous on an open interval $(a, b)$ if it is continuous at each point in the interval.
To prove that a function is continuous for all points is beyond this practical introduction to
math, but the general intuition can be grasped by graphing.
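One way to build that graphing intuition without a plot is to evaluate a function just to the left and just to the right of a point. A sketch in R, using floor(), one of the discontinuous examples below:

x_left  <- 0 - c(0.1, 0.01, 0.001)
x_right <- 0 + c(0.1, 0.01, 0.001)
floor(x_left)    # -1 -1 -1
floor(x_right)   #  0  0  0   (the one-sided values disagree, so floor() is not continuous at 0)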
Example 3.4 (Continuous and Discontinuous Functions). For each function, determine if
it is continuous or discontinuous.
1. $f(x) = \sqrt{x}$
2. $f(x) = e^x$
3. $f(x) = 1 + \frac{1}{x^2}$
4. $f(x) = \text{floor}(x)$.
[Figure 3.4: Graphs of f(x) = x, f(x) = 1/x, f(x) = √x, f(x) = e^x, f(x) = 1 + 1/x², and f(x) = floor(x).]
The floor of a number is the largest integer that does not exceed it. So floor(x = 2.999) = 2, floor(x = 2.0001) = 2, and floor(x = 2) = 2.
Solution. In Figure 3.4, we can see that the first two functions are continuous, and the next two are discontinuous. $f(x) = 1 + \frac{1}{x^2}$ is discontinuous at $x = 0$, and $f(x) = \text{floor}(x)$ is discontinuous at each whole number.
Some properties of continuous functions:
1. If f and g are continuous at point c, then f + g, f − g, f · g, |f |, and αf are continuous
at point c also. f /g is continuous, provided g(c) ̸= 0.
2. Boundedness: If f is continuous on the closed bounded interval [a, b], then there is a
number K such that |f (x)| ≤ K for each x in [a, b].
3. Max/Min: If f is continuous on the closed bounded interval [a, b], then f has a maxi-
mum and a minimum on [a, b]. They may be located at the end points.
$$f(x) = \frac{x^2 + 2x}{x}.$$
1. Graph the function. Is it defined everywhere?
2. What is the function's limit at $x \to 0$?
Answers to Examples
Example 3.1 { } { }
Solution. 1.{ {An }}= 2 − n12 = 1, 74 , 17 31 49
9 , 16 , 25 , . . . = 2
2 { }
2. {Bn } = n n+1 = 2, 52 , 10 17
3 , 4 ...,
{ ( )} { }
3. {Cn } = (−1)n 1 − n1 = 0, 12 , − 23 , 34 , − 45
Exercise 3.1
Example 3.3
Solution.
1. $k$
2. $c$
3. $\lim_{x \to 2} (2x - 3) = 2 \lim_{x \to 2} x - 3 \lim_{x \to 2} 1 = 1$
4. $\lim_{x \to c} x^n = \lim_{x \to c} x \cdots [\lim_{x \to c} x] = c \cdots c = c^n$
Exercise 3.2
Solution. Although this function seems large, the thing our eyes should focus on is the highest-order term. That will grow the fastest, so if the highest-order term is in the denominator, the fraction will converge to 0; if it is in the numerator, the fraction will diverge to $\infty$ or $-\infty$. Previewing the multiplication by hand, we can see that the $-x^9$ in the numerator will be the largest power. So the answer will be $-\infty$. We can also confirm this by writing out the fractions:
$$\lim_{x \to \infty} \frac{\left(1 + \frac{3}{x^3} - \frac{99}{x^4}\right)\left(1 - \frac{2}{x^5}\right)}{\left(1 + \frac{9}{18x} - \frac{3}{18x^5} - \frac{1}{18x^7}\right)\left(1 + \frac{1}{x}\right)} \times \frac{x^4}{1} \times \frac{-x^5}{1} \times \frac{1}{18x^7} \times \frac{1}{x} = 1 \times \lim_{x \to \infty} \frac{-x}{18} = -\infty$$
Exercise 3.4
Solution. See Figure 3.5.
Divide each part by $x$, and we get $x + 2$ in the numerator and 1 in the denominator. So, setting aside the fact that the function is not defined at $x = 0$, we can say $\lim_{x \to 0} f(x) = 2$.
[Figure 3.5: Graph of f(x) = (x² + 2x)/x from Exercise 3.4.]
Chapter 4
Calculus
Calculus is a fundamental part of any type of statistics exercise. Although you may not be taking derivatives and integrals in your daily work as an analyst, calculus undergirds many concepts we use: maximization, expectation, and cumulative probability.
$$E(X) = \sum_{j=1}^{\infty} x_j P(X = x_j)$$
Even more concretely, if the potential values of $X$ are finite, then we can write out the expected value as a weighted mean, where the weights are the probabilities that each value occurs:
$$E(X) = \sum_{x} \underbrace{x}_{\text{value}} \cdot \underbrace{P(X = x)}_{\text{weight, or PMF}}$$
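For instance, a fair six-sided die has E(X) = 3.5, which is just this weighted mean. A one-line check in R:

values  <- 1:6
weights <- rep(1/6, 6)    # P(X = x) for each face
sum(values * weights)     # 3.5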
4.1 Derivatives
The derivative of f at x is its rate of change at x: how much f (x) changes with a change in x.
The rate of change is a fraction — rise over run — but because not all lines are straight and
the rise over run formula will give us different values depending on the range we examine,
we need to take a limit (Section 3).
Definition 4.1 (Derivative). Let $f$ be a function whose domain includes an open interval containing the point $x$. The derivative of $f$ at $x$ is given by
$$\frac{d}{dx} f(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{(x + h) - x} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$
If $f(x)$ is a straight line, the derivative is the slope. For a curve, the slope changes with the value of $x$, so the derivative is the slope of the line tangent to the curve at $x$. See, for example, Figure 4.1.
If f ′ (x) exists at a point x0 , then f is said to be differentiable at x0 . That also implies
that f (x) is continuous at x0 .
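The definition can be seen in action numerically: the difference quotient settles down as h shrinks. A sketch in R, using f(x) = x^3 at x = 2 (whose derivative, by the rules below, is 3 * 2^2 = 12):

f  <- function(x) x^3
x0 <- 2
h  <- 10^(-(1:6))
(f(x0 + h) - f(x0)) / h   # 12.61, 12.06, 12.006, ... approaching 12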
Properties of derivatives
Suppose that f and g are differentiable at x and that α is a constant. Then the functions
f ± g, αf , f g, and f /g (provided g(x) ̸= 0) are also differentiable at x. Additionally,
Constant rule:
$$[kf(x)]' = kf'(x)$$
Sum rule:
$$[f(x) \pm g(x)]' = f'(x) \pm g'(x)$$
With a bit more algebra, we can apply the definition of the derivative to get formulas for the derivative of a product and the derivative of a quotient.
Product rule:
$$[f(x)g(x)]' = f'(x)g(x) + f(x)g'(x)$$
Quotient rule:
$$[f(x)/g(x)]' = \frac{f'(x)g(x) - f(x)g'(x)}{[g(x)]^2}, \qquad g(x) \ne 0$$
Finally, one way to think of the power of derivatives is that it takes a function a notch down
in complexity. The power rule applies to any higher-order function:
[Figure 4.1: The functions f(x) = 2x and g(x) = x³ (left) and their derivatives f′(x) and g′(x) (right).]
Power rule:
$$[x^k]' = kx^{k-1}$$
for any real number $k$ (that is, both whole numbers and fractions). The power rule is proved by induction, a neat method of proof used in many fundamental applications to prove that a general statement holds for every possible case, even if there are countably infinite cases. We'll show a simple case where $k$ is an integer here.
$$[x^k]' = kx^{k-1} \quad \text{for any integer } k.$$
First, consider the base case $k = 1$. We can show by the definition of derivatives (setting $f(x) = x^1 = x$) that
$$[x^1]' = \lim_{h \to 0} \frac{(x + h) - x}{(x + h) - x} = 1.$$
Because 1 can also be expressed as $1x^{1-1}$, the statement we want to prove holds for the case $k = 1$.
Now, assume that the statement holds for some integer $m$. That is, assume
$$[x^m]' = m x^{m-1}$$
Then, for the case $m + 1$, using the product rule above, we can simplify
$$[x^{m+1}]' = [x^m \cdot x]' = (x^m)' \cdot x + (x^m)(x)' = m x^{m-1} \cdot x + x^m = m x^m + x^m = (m + 1)x^m = (m + 1)x^{(m+1)-1}$$
where the third equality uses the induction assumption.
Therefore, the rule holds for the case k = m + 1 once we have assumed it holds for k = m.
Combined with the first case, this completes proof by induction – we have now proved that
the statement holds for all integers k = 1, 2, 3, · · ·.
To show that it holds for fractional exponents as well, we can express the exponent as a ratio of two integers and apply a similar argument.
These "rules" become apparent by applying the definition of the derivative to each of the functions to be differentiated, but they come up so frequently that it is best to repeat them until they are muscle memory.
Exercise 4.1 (Derivative of Polynomials). For each of the following functions, find the first-order derivative $f'(x)$.
1. $f(x) = c$
2. $f(x) = x$
3. $f(x) = x^2$
4. $f(x) = x^3$
5. $f(x) = \frac{1}{x^2}$
6. $f(x) = (x^3)(2x^4)$
7. $f(x) = x^4 - x^3 + x^2 - x + 1$
8. $f(x) = (x^2 + 1)(x^3 - 1)$
9. $f(x) = 3x^2 + 2x^{1/3}$
10. $f(x) = \frac{x^2 + 1}{x^2 - 1}$
The first derivative can be written in several equivalent notations:
$$f'(x), \quad y', \quad \frac{d}{dx} f(x), \quad \frac{dy}{dx}$$
We can keep applying the differentiation process to functions that are themselves derivatives. The derivative of $f'(x)$ with respect to $x$ would then be
$$f''(x) = \lim_{h \to 0} \frac{f'(x + h) - f'(x)}{h}$$
and we can therefore call it the second derivative:
$$f''(x), \quad y'', \quad \frac{d^2}{dx^2} f(x), \quad \frac{d^2 y}{dx^2}$$
Similarly, the derivative of $f''(x)$ would be called the third derivative and is denoted $f'''(x)$. And by extension, the $n$th derivative is expressed as $\frac{d^n}{dx^n} f(x)$, $\frac{d^n y}{dx^n}$.
$$f(x) = x^3, \quad f'(x) = 3x^2, \quad f''(x) = 6x, \quad f'''(x) = 6, \quad f''''(x) = 0$$
Earlier, in Section 4.1, we said that if a function is differentiable at a given point, then it must be continuous there. Further, if $f'(x)$ is itself continuous, then $f(x)$ is called continuously differentiable. All of this matters because many of our findings about optimization (Section 5) rely on differentiation, and so we want our functions to be differentiable in as many layers as possible. A function that is infinitely continuously differentiable is called "smooth". Some examples: $f(x) = x^2$, $f(x) = e^x$.
Example 4.2. Let $f(x) = \log x$ for $0 < x < \infty$ and $g(x) = x^2$ for $-\infty < x < \infty$.
Then
$$(f \circ g)(x) = \log x^2, \qquad x \in (-\infty, \infty) \setminus \{0\}$$
Also
$$(g \circ f)(x) = [\log x]^2, \qquad 0 < x < \infty$$
With the notation of composite functions in place, now we can introduce a helpful addi-
tional rule that will deal with a derivative of composite functions as a chain of concentric
derivatives.
Chain Rule:
Let $y = (f \circ g)(x) = f[g(x)]$. The derivative of $y$ with respect to $x$ is
$$\frac{d}{dx} \{f[g(x)]\} = f'[g(x)] g'(x)$$
We can read this as: “the derivative of the composite function y is the derivative of f
evaluated at g(x), times the derivative of g.”
The chain rule can be thought of as the derivative of the “outside” times the derivative of
the “inside”, remembering that the derivative of the outside function is evaluated at the
value of the inside function.
(In Leibniz notation the chain rule is sometimes written as $\frac{dy}{dx} = \frac{dy}{dg(x)} \cdot \frac{dg(x)}{dx}$. This expression does not imply that the $dg(x)$'s cancel out, as in fractions. They are part of the derivative notation and you can't separate them out or cancel them.)
Example 4.3 (Composite Exponent). Find $f'(x)$ for $f(x) = (3x^2 + 5x - 7)^6$.
The most direct use of the chain rule is when the exponent is itself a function, so the power rule alone could not have applied:
Generalized Power Rule:
If $f(x) = [g(x)]^p$ for any rational number $p$, then
$$f'(x) = p[g(x)]^{p-1} g'(x)$$
Theorem 4.1. The exponential function $e^x$ and the natural logarithm $\log(x)$ are continuous and differentiable in their domains, and their first derivatives are
$$(e^x)' = e^x, \qquad \log(x)' = \frac{1}{x}$$
Also, when these are composite functions, it follows by the generalized power rule that
$$\left(e^{g(x)}\right)' = e^{g(x)} \cdot g'(x), \qquad (\log g(x))' = \frac{g'(x)}{g(x)}, \ \text{if } g(x) > 0$$
[Figure 4.2: f(x) = e^x and its derivative f′(x), which is also e^x.]
1. The derivative of $e^x$ is itself: $\frac{d}{dx} e^x = e^x$
2. Same thing if there were a constant in front: $\frac{d}{dx} \alpha e^x = \alpha e^x$
3. Same thing no matter how many derivatives there are in front: $\frac{d^n}{dx^n} \alpha e^x = \alpha e^x$
4. Chain Rule: When the exponent is a function of $x$, remember to take the derivative of that function and multiply: $\frac{d}{dx} e^{g(x)} = e^{g(x)} g'(x)$
Example 4.4 (Derivative of exponents). Find the derivative for the following.
1. $f(x) = e^{-3x}$
2. $f(x) = e^{x^2}$
3. $f(x) = (x - 1)e^x$
Derivatives of log
The natural log is the mirror image of the natural exponent and has mirroring properties,
again, to repeat the theorem,
1. Log prime $x$ is one over $x$: $\frac{d}{dx} \log x = \frac{1}{x}$ (Figure 4.3)
2. Exponents become multiplicative constants: $\frac{d}{dx} \log x^k = \frac{d}{dx} k \log x = \frac{k}{x}$
3. Chain rule again: $\frac{d}{dx} \log u(x) = \frac{u'(x)}{u(x)}$
4. For any positive base $b$, $\frac{d}{dx} b^x = (\log b)(b^x)$.
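Base R's D() differentiates simple expressions symbolically, which is a handy way to check chain-rule computations like the ones in Examples 4.4 and 4.5 (the printed form may differ slightly from how you would write it by hand):

D(expression(log(x^2 + 9)), "x")   # 2x / (x^2 + 9)
D(expression(exp(-3 * x)), "x")    # -3 * exp(-3x)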
[Figure 4.3: f(x) = log(x) and its derivative f′(x) = 1/x.]
Example 4.5 (Derivatives of logs). Find the derivative for the following.
1. $f(x) = \log(x^2 + 9)$
2. $f(x) = \log(\log x)$
3. $f(x) = (\log x)^2$
4. $f(x) = \log e^x$
Outline of Proof
We actually show the derivative of the log first, and then the derivative of the exponential
naturally follows.
The general derivative of the log at any base a is solvable by the definition of derivatives.
$$(\log_a x)' = \lim_{h \to 0} \frac{1}{h} \log_a\left(1 + \frac{h}{x}\right)$$
Re-express $g = \frac{h}{x}$ and get
$$(\log_a x)' = \frac{1}{x} \lim_{g \to 0} \log_a (1 + g)^{\frac{1}{g}} = \frac{1}{x} \log_a e$$
For the exponential, write $y = a^x$ and differentiate implicitly:
$$y = a^x \Rightarrow \log y = x \log a \Rightarrow \frac{y'}{y} = \log a \Rightarrow y' = y \log a$$
Setting $a = e$, so that $\log a = 1$, gives
$$(e^x)' = e^x$$
$$\frac{\partial f}{\partial x_i}(x_1, \ldots, x_n) = \lim_{h \to 0} \frac{f(x_1, \ldots, x_i + h, \ldots, x_n) - f(x_1, \ldots, x_i, \ldots, x_n)}{h}$$
Only the ith variable changes — the others are treated as constants.
We can take higher-order partial derivatives, like we did with functions of a single variable,
except now the higher-order partials can be with respect to multiple variables.
Example 4.6 (More than one type of partial). Notice that you can take partials with
regard to different variables.
Suppose $f(x, y) = x^2 + y^2$. Then
$$\frac{\partial f}{\partial x}(x, y) = \qquad \frac{\partial f}{\partial y}(x, y) = \qquad \frac{\partial^2 f}{\partial x^2}(x, y) = \qquad \frac{\partial^2 f}{\partial x \partial y}(x, y) = $$
Exercise 4.2. Let $f(x, y) = x^3 y^4 + e^x - \log y$. What are the following partial derivatives?
$$\frac{\partial f}{\partial x}(x, y) = \qquad \frac{\partial f}{\partial y}(x, y) = \qquad \frac{\partial^2 f}{\partial x^2}(x, y) = \qquad \frac{\partial^2 f}{\partial x \partial y}(x, y) = $$
$$f(x) = f(a) + \frac{f'(a)}{1!}(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \cdots = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x - a)^n$$
$$f(x) = f(a) + \frac{f'(a)}{1!}(x - a) + \frac{f''(a)}{2!}(x - a)^2 + R_2$$
$R_2$ is the remainder ($R$ for remainder, 2 for the fact that we took two derivatives) and is often treated as negligible, giving us:
$$f(x) \approx f(a) + f'(a)(x - a) + \frac{f''(a)}{2}(x - a)^2$$
The more derivatives that are added, the smaller the remainder R and the more accurate
the approximation. Proofs involving limits guarantee that the remainder converges to 0 as
the order of derivation increases.
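A sketch in R of a second-order Taylor approximation of f(x) = log(x) around a = 1 (using f'(1) = 1 and f''(1) = -1), compared with the exact value:

f <- log
a <- 1
taylor2 <- function(x) f(a) + 1 * (x - a) + (-1 / 2) * (x - a)^2
x <- 1.2
c(exact = f(x), approx = taylor2(x))   # roughly 0.182 vs 0.180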
The antiderivative of a function $f$ is a function $F$ such that
$$F' = f.$$
Another way to describe it is through the inverse formula. Let $DF$ be the derivative of $F$, and let $DF(x)$ be the derivative of $F$ evaluated at $x$. Then the antiderivative is denoted by $D^{-1}$ (i.e., the inverse derivative). If $DF = f$, then $F = D^{-1} f$.
This definition bolsters the main takeaway about integrals and derivatives: They are inverses
of each other.
We know from derivatives how to manipulate F to get f . But how do you express the
procedure to manipulate f to get F ? For that, we need a new symbol, which we will call
indefinite integration.
$$f(x) = x^2 - 4$$
Solution. An indefinite integral of the function $f(x) = x^2 - 4$ can, for example, be $F(x) = \frac{1}{3}x^3 - 4x$. But it can also be $F(x) = \frac{1}{3}x^3 - 4x + 1$, because the constant 1 disappears when taking the derivative.
Some of these functions are plotted in the bottom panel of Figure 4.4 as dotted lines.
Notice from these examples that while there is only a single derivative for any function,
there are multiple antiderivatives: one for any arbitrary constant c. c just shifts the curve
up or down on the y-axis. If more information is present about the antiderivative — e.g.,
that it passes through a particular point — then we can solve for a specific value of c.
Some common rules of integrals follow by virtue of being the inverse of a derivative.
1. Constants are allowed to slip out: $\int a f(x)\,dx = a \int f(x)\,dx$
2. Integration of a sum is the sum of the integrations: $\int [f(x) + g(x)]\,dx = \int f(x)\,dx + \int g(x)\,dx$
3. Reverse power rule: $\int x^n\,dx = \frac{1}{n+1} x^{n+1} + c$
4. Exponents are still exponents: $\int e^x\,dx = e^x + c$
5. Recall the derivative of $\log(x)$ is one over $x$, and so: $\int \frac{1}{x}\,dx = \log x + c$
6. Reverse chain rule: $\int e^{f(x)} f'(x)\,dx = e^{f(x)} + c$
7. More generally: $\int [f(x)]^n f'(x)\,dx = \frac{1}{n+1} [f(x)]^{n+1} + c$
8. Remember the derivative of a log of a function: $\int \frac{f'(x)}{f(x)}\,dx = \log f(x) + c$
[Figure 4.4: The function f(x) = x² − 4 (top) and several of its antiderivatives ∫f(x)dx, which differ only by a constant (bottom).]
[Figure 4.5: Approximating the area under a curve between x = 0 and x = 10 with rectangles of two different widths (Riemann sums).]
Suppose we want to compute the area $A(R)$ of a region $R$ that lies under a curve $f(x)$ between $x = a$ and $x = b$. One approach is to divide $[a, b]$ into intervals of length $\Delta x$ and then approximate the region with a series of rectangles, where the base of each rectangle is $\Delta x$ and the height is $f(x)$ at the midpoint of that interval. $A(R)$ would then be approximated by the area of the union of the rectangles, which is given by
$$S(f, \Delta x) = \sum_{i=1}^{n} f(x_i) \Delta x$$
Figure 4.5 shows that illustration. The curve depicted is $f(x) = -15(x - 5) + (x - 5)^3 + 50$. We want to approximate the area under the curve between the $x$ values of 0 and 10. We can do this in blocks of arbitrary width, where the sum of rectangles (the area of each being its width times $f(x)$ evaluated at the midpoint of the bar) gives the Riemann sum. As the width of the bars $\Delta x$ becomes smaller, the better the estimate of $A(R)$.
This is how we define the “Definite” Integral:
Definition 4.4 (The Definite Integral (Riemann)). If for a given function $f$ the Riemann sum approaches a limit as $\Delta x \to 0$, then that limit is called the Riemann integral of $f$ from $a$ to $b$. We express this with the $\int$ symbol, and write
$$\int_a^b f(x)\,dx = \lim_{\Delta x \to 0} \sum_{i=1}^{n} f(x_i) \Delta x$$
The most straightforward reading of a definite integral is as an area. That is, we read
$$\int_a^b f(x)\,dx$$
as the definite integral of $f$ from $a$ to $b$, and we define it as the area under the "curve" $f(x)$ from point $x = a$ to $x = b$.
The fundamental theorem of calculus shows us that this sum is, in fact, the antiderivative.
Theorem 4.2 (First Fundamental Theorem of Calculus). Let the function $f$ be bounded on $[a, b]$ and continuous on $(a, b)$. Then, suggestively, use the symbol $F(x)$ to denote the definite integral from $a$ to $x$:
$$F(x) = \int_a^x f(t)\,dt, \qquad a \le x \le b$$
Then $F(x)$ is continuous on $[a, b]$ and differentiable on $(a, b)$, with $F'(x) = f(x)$.
That is, the definite integral function of $f$ is one of the antiderivatives of $f$. This is again a long way of saying that differentiation is the inverse of integration. But now we have covered definite integrals as well.
The second theorem gives us a simple way of computing a definite integral as a function of
indefinite integrals.
Theorem 4.3 (Second Fundamental Theorem of Calculus). Let the function f be bounded
on [a, b] and continuous on (a, b). Let F be any function that is continuous on [a, b] such
that F ′ (x) = f (x) on (a, b). Then
$$\int_a^b f(x)\,dx = F(b) - F(a)$$
So the procedure to calculate a simple definite integral $\int_a^b f(x)\,dx$ is to find an antiderivative $F(x)$ and then evaluate $F(b) - F(a)$.
Example 4.9 (Definite Integral of a monomial). Solve $\int_1^3 3x^2\,dx$.
Exercise 4.4. What is the value of $\int_{-2}^{2} e^x e^{e^x}\,dx$?
−2
The area-interpretation of the definite integral provides some rules for simplification.
1. There is no area below a point:
∫a
f (x)dx = 0
a
∫b ∫b ∫b
[αf (x) + βg(x)]dx = α f (x)dx + β g(x)dx
a a a
∫b ∫c ∫c
f (x)dx + f (x)dx = f (x)dx
a b a
Exercise 4.5 (Definite integral shortcuts). Simplify the following definite integrals.
1. $\int_1^1 3x^2\,dx = $
2. $\int_0^4 (2x + 1)\,dx = $
3. $\int_{-2}^{0} e^x e^{e^x}\,dx + \int_0^2 e^x e^{e^x}\,dx = $
Suppose we want to find $\int g(x)\,dx$, but $g(x)$ is complex and none of the formulas we have seen so far seem to apply immediately. The trick is to come up with a new function $u(x)$ such that
$$g(x) = f[u(x)]\,u'(x).$$
Why does an introduction of yet another function end up simplifying things? Let's refer to the antiderivative of $f$ as $F$. Then the chain rule tells us that
$$\frac{d}{dx} F[u(x)] = f[u(x)]\,u'(x).$$
So, $F[u(x)]$ is the antiderivative of $g$. We can then write
$$\int g(x)\,dx = \int f[u(x)]\,u'(x)\,dx = \int \frac{d}{dx} F[u(x)]\,dx = F[u(x)] + c$$
To summarize, the procedure to determine the indefinite integral $\int g(x)\,dx$ by the method of substitution is:
of substitution:
1. Identify some part of g(x) that might be simplified by substituting in a single variable
u (which will then be a function of x).
2. Determine if g(x)dx can be reformulated in terms of u and du.
3. Solve the indefinite integral.
4. Substitute back in for x
Substitution can also be used to calculate a definite integral. Using the same procedure as above,
$$\int_a^b g(x)\,dx = \int_c^d f(u)\,du = F(d) - F(c)$$
where $c = u(a)$ and $d = u(b)$.
For the above problem, we could have also used the substitution $u = \sqrt{x + 1}$. Then $x = u^2 - 1$ and $dx = 2u\,du$. Substituting these in, we get
$$\int x^2 \sqrt{x + 1}\,dx = \int (u^2 - 1)^2 u \cdot 2u\,du$$
which when expanded is again a polynomial and gives the same result as above.
Another case in which integration by substitution is useful is with a fraction:
$$\int_0^1 \frac{5e^{2x}}{(1 + e^{2x})^{1/3}}\,dx.$$
The integration by parts formula is
$$\int u\,dv = uv - \int v\,du$$
or
$$\int u(x) v'(x)\,dx = u(x)v(x) - \int v(x) u'(x)\,dx$$
For definite integrals,
$$\int_a^b u \frac{dv}{dx}\,dx = uv \Big|_a^b - \int_a^b v \frac{du}{dx}\,dx$$
Our goal here is to find expressions for $u$ and $dv$ that, when substituted into the above equation, yield an expression that's more easily evaluated.
Example 4.12 (Integration by Parts). Simplify the following integrals. These seemingly
obscure forms of integrals come up often when integrating distributions.
$$\int x e^{ax}\,dx$$
Solution. Let $u = x$ and $\frac{dv}{dx} = e^{ax}$. Then $du = dx$ and $v = (1/a)e^{ax}$. Substituting this into the integration by parts formula, we obtain
$$\int x e^{ax}\,dx = uv - \int v\,du = x\left(\frac{1}{a} e^{ax}\right) - \int \frac{1}{a} e^{ax}\,dx = \frac{1}{a} x e^{ax} - \frac{1}{a^2} e^{ax} + c$$
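A numerical spot-check of this result in R, with the arbitrary choice a = 2 over the interval [0, 1]: the antiderivative evaluated at the endpoints should match integrate().

a <- 2
antiD <- function(x) (1 / a) * x * exp(a * x) - (1 / a^2) * exp(a * x)
antiD(1) - antiD(0)                                 # about 2.097
integrate(function(x) x * exp(a * x), 0, 1)$value   # the same value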
2. Integrate
$$\int x^3 e^{-x^2}\,dx$$
Answers to Examples and Exercises
Exercise 4.1
Solution.
1. $f'(x) = 0$
2. $f'(x) = 1$
3. $f'(x) = 2x$
4. $f'(x) = 3x^2$
5. $f'(x) = -2x^{-3}$
6. $f'(x) = 14x^6$
7. $f'(x) = 4x^3 - 3x^2 + 2x - 1$
8. $f'(x) = 5x^4 + 3x^2 - 2x$
9. $f'(x) = 6x + \frac{2}{3} x^{-2/3}$
10. $f'(x) = \frac{-4x}{x^4 - 2x^2 + 1}$
Example 4.3
Solution. For convenience, define $f(z) = z^6$ and $z = g(x) = 3x^2 + 5x - 7$. Then, $y = f[g(x)]$ and
$$\frac{d}{dx} y = f'(z) g'(x) = 6(3x^2 + 5x - 7)^5 (6x + 5)$$
Example 4.4
Solution.
1. Let $u(x) = -3x$. Then $u'(x) = -3$ and $f'(x) = -3e^{-3x}$.
2. Let $u(x) = x^2$. Then $u'(x) = 2x$ and $f'(x) = 2x e^{x^2}$.
Example 4.5
Solution.
1. Let $u(x) = x^2 + 9$. Then $u'(x) = 2x$ and
$$\frac{dy}{dx} = \frac{u'(x)}{u(x)} = \frac{2x}{x^2 + 9}$$
3. $$\frac{dy}{dx} = \frac{2 \log x}{x}$$
4. We know that $\log e^x = x$ and that $dx/dx = 1$, but we can double check. Let $u(x) = e^x$. Then $u'(x) = e^x$ and $\frac{dy}{dx} = \frac{u'(x)}{u(x)} = \frac{e^x}{e^x} = 1$.
Example 4.9
Solution. What is $F(x)$? From the power rule, recognize $\frac{d}{dx} x^3 = 3x^2$, so $F(x) = x^3$ and
$$\int_1^3 f(x)\,dx = F(x = 3) - F(x = 1) = 3^3 - 1^3 = 26$$
Example 4.10
Solution. The problem here is the $\sqrt{x + 1}$ term. However, if the integrand had $\sqrt{x}$ times some polynomial, then we'd be in business. Let's try $u = x + 1$. Then $x = u - 1$ and $dx = du$. Substituting these into the above equation, we get
$$\int x^2 \sqrt{x + 1}\,dx = \int (u - 1)^2 \sqrt{u}\,du = \int (u^2 - 2u + 1) u^{1/2}\,du = \int (u^{5/2} - 2u^{3/2} + u^{1/2})\,du$$
We can easily integrate this, since it is just a polynomial. Doing so and substituting $u = x + 1$ back in, we get
$$\int x^2 \sqrt{x + 1}\,dx = 2(x + 1)^{3/2} \left[ \frac{1}{7}(x + 1)^2 - \frac{2}{5}(x + 1) + \frac{1}{3} \right] + c$$
Example 4.11
Solution. When an expression is raised to a power, it is often helpful to use this expression as the basis for a substitution. So, let $u = 1 + e^{2x}$. Then $du = 2e^{2x}\,dx$ and we can set $5e^{2x}\,dx = 5\,du/2$. Additionally, $u = 2$ when $x = 0$ and $u = 1 + e^2$ when $x = 1$. Substituting all of this in, we get
$$\int_0^1 \frac{5e^{2x}}{(1 + e^{2x})^{1/3}}\,dx = \frac{5}{2} \int_2^{1+e^2} \frac{du}{u^{1/3}} = \frac{5}{2} \int_2^{1+e^2} u^{-1/3}\,du = \frac{15}{4} u^{2/3} \Big|_2^{1+e^2} = 9.53$$
Exercise 4.6
1. $\int x^n e^{ax}\,dx$
Solution. As in the first problem, let
$$u = x^n, \qquad dv = e^{ax}\,dx$$
Then $du = n x^{n-1}\,dx$ and $v = (1/a)e^{ax}$. Substituting these into the integration by parts formula gives
$$\int x^n e^{ax}\,dx = uv - \int v\,du = x^n \left( \frac{1}{a} e^{ax} \right) - \int \frac{1}{a} e^{ax} n x^{n-1}\,dx = \frac{1}{a} x^n e^{ax} - \frac{n}{a} \int x^{n-1} e^{ax}\,dx$$
Notice that we now have an integral similar to the previous one, but with $x^{n-1}$ instead of $x^n$. For a given $n$, we would repeat the integration by parts procedure until the integrand was directly integrable, e.g., when the integral became $\int e^{ax}\,dx$.
2. ∫ x^3 e^(−x^2) dx
Solution. We could, as before, choose u = x^3 and dv = e^(−x^2) dx. But we can't then find v, since e^(−x^2) has no closed-form antiderivative. Notice instead that
d/dx e^(−x^2) = −2x e^(−x^2),
so the term x e^(−x^2), which we can integrate, can be factored out of the original integrand:
∫ x^3 e^(−x^2) dx = ∫ x^2 (x e^(−x^2)) dx.
We can then let u = x^2 and dv = x e^(−x^2) dx. Then du = 2x dx and v = −(1/2) e^(−x^2). Substituting these in,
∫ x^3 e^(−x^2) dx = uv − ∫ v du
                 = x^2 (−(1/2) e^(−x^2)) − ∫ (−(1/2) e^(−x^2)) 2x dx
                 = −(1/2) x^2 e^(−x^2) + ∫ x e^(−x^2) dx
                 = −(1/2) x^2 e^(−x^2) − (1/2) e^(−x^2) + c
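A quick way to check an antiderivative like this is to differentiate it back. A minimal sketch using base R's symbolic derivative function D():

antideriv <- expression(-x^2 * exp(-x^2) / 2 - exp(-x^2) / 2)
D(antideriv, "x")
# the returned expression simplifies by hand to x^3 * exp(-x^2), the original integrand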
Chapter 5
Optimization
Example: Meltzer-Richard
A standard backdrop in comparative political economy, the Meltzer-Richard (1981) model states that redistribution of wealth should be higher in societies where the median income is much smaller than the average income. More to the point, an income distribution where the median is very different from the average is typically one of high inequality. In other words, the Meltzer-Richard model says that highly unequal economies will have more redistribution of wealth. Why is that the case? Here is a simplified example that is not the exact model by Meltzer and Richard,¹ but is adapted from Persson and Tabellini.²
We will assume the following about our model individuals and our model democracy.
• Individuals are indexed by i, and the total population is normalized to unity (“1”)
without loss of generality.
• U (·), u for “utility”, is a function that is concave and increasing, and expresses the
utility gained from public goods. This tells us that its first derivative is positive, and
its second derivative is negative.
1 Allan H. Meltzer and Scott F. Richard. “A Rational Theory of the Size of Government”. Journal of
Political Economy 89:5 (1981), p. 914-927
2 Adapted from Torsten Persson and Guido Tabellini, Political Economics: Explaining Economic Policy.
MIT Press.
Each individual i's welfare is private consumption plus utility from the public good g, where consumption is after-tax income (with tax rate τ):
Wi = ci + U(g)
ci = (1 − τ) yi
Income varies by person. (In the next section we will cover probability; by then we will know that we can express this by saying that y is a random variable with cumulative distribution function F, i.e. y ∼ F.) Every distribution has a mean and a median.
• E(y) is the average income of the society.
• med(y) is the median income of the society.
What will happen in this economy? What will the tax rate be set to? How much of the public good will be provided?
We've skipped ahead of some formal results from the theory of democracy, but hopefully these are conceptually intuitive. First, if a democracy is competitive, there is no slack in the government's provision of goods, and all tax revenue becomes the public good. So we can go ahead and set the constraint:
g = Σ_i τ yi P(yi) = τ E(y)
We can do this trick because of the "normalized to unity" setting, but this is a general property of the average.
Now given this constraint we can re-write an individual's welfare as
Wi = (1 − g/E(y)) yi + U(g)
   = (1/E(y)) (E(y) − g) yi + U(g)
   = (E(y) − g) yi/E(y) + U(g)
d/dg Wi = −yi/E(y) + d/dg U(g)
d/dg Wi = 0 when d/dg U(g) = yi/E(y), and so after expressing the derivative as U_g = d/dg U(g) for simplicity,
gi⋆ = U_g^(−1)( yi / E(y) )
Now recall that because we assumed concavity, U_g is a decreasing function whose values are positive. It can be shown that the inverse of such a function is also decreasing. Thus an individual's preferred level of government is determined by a single quantity, the person's income divided by the average income, and the preferred level is decreasing in yi. This is consistent with our intuition that richer people prefer less redistribution.
That was the amount for any given person. The government has to set one value of g,
however. So what will that be? Now we will use another result, the median voter theorem.
This says that under certain general electoral conditions (single-peaked preferences, two
parties, majority rule), the policy winner will be that preferred by the median person in the
population. Because the only thing that determines a person’s preferred level of government
is yi/E(y), we can presume that the median voter, whose income is med(y), will prevail in their preferred choice of government. Therefore, we will see
g⋆ = U_g^(−1)( med(y) / E(y) )
What does this say about the level of redistribution we observe in an economy? The more the average income exceeds the median income, which often (but not always) means more inequality, the more redistribution we should see.
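A small simulation can make this concrete. The sketch below is not the Meltzer-Richard model itself; it simply assumes log utility U(g) = log(g) (so U_g(g) = 1/g and U_g^(−1)(z) = 1/z) and a hypothetical right-skewed income distribution, to show that g⋆ rises as the mean pulls away from the median:

set.seed(2019)
y <- rlnorm(10000, meanlog = 10, sdlog = 0.8)   # hypothetical right-skewed incomes
g_star <- 1 / (median(y) / mean(y))             # U_g^(-1)(med(y)/E(y)) under log utility
g_star                                          # equals mean(y)/median(y) > 1: more skew, more g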
Exercise 5.1 (Plotting a maximum and minimum). Plot f(x) = x^3 + x^2 + 2, plot its derivative, and identify where the derivative is zero. Is there a maximum or minimum?
[Figure: f(x) = x^3 + x^2 + 2 (left panel) and its derivative f′(x) (right panel), plotted against x over roughly [−2, 2].]
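One way to produce a figure like this in R (a minimal base-R sketch using curve()):

f <- function(x) x^3 + x^2 + 2
fprime <- function(x) 3 * x^2 + 2 * x            # derivative of f
par(mfrow = c(1, 2))                             # two panels side by side
curve(f, from = -2, to = 2, ylab = "f(x)")
curve(fprime, from = -2, to = 2, ylab = "f'(x)")
abline(h = 0, lty = 2)                           # f'(x) = 0 marks the critical points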
The second derivative f ′′ (x) identifies whether the function f (x) at the point x is
1. Concave down: f ′′ (x) < 0
2. Concave up: f ′′ (x) > 0
Maximum (Minimum): x0 is a local maximum (minimum) if f (x0 ) > f (x) (f (x0 ) <
f (x)) for all x within some open interval containing x0 . x0 is a global maximum (mini-
mum) if f (x0 ) > f (x) (f (x0 ) < f (x)) for all x in the domain of f .
Given the function f defined over domain D, all of the following are defined as critical
points:
1. Any interior point of D where f ′ (x) = 0.
2. Any interior point of D where f ′ (x) does not exist.
3. Any endpoint that is in D.
The maxima and minima will be a subset of the critical points.
Second Derivative Test of Maxima/Minima: We can use the second derivative to tell
us whether a point is a maximum or minimum of f (x).
1. Local Maximum: f ′ (x) = 0 and f ′′ (x) < 0
2. Local Minimum: f ′ (x) = 0 and f ′′ (x) > 0
3. Need more info: f ′ (x) = 0 and f ′′ (x) = 0
Global Maxima and Minima Sometimes no global max or min exists — e.g., f (x) not
bounded above or below. However, there are three situations where we can fairly easily
identify global max or min.
1. Functions with only one critical point. If x0 is a local max or min of f and it is
the only critical point, then it is the global max or min.
2. Globally concave up or concave down functions. If f ′′ (x) is never zero, then
there is at most one critical point. That critical point is a global maximum if f ′′ < 0
and a global minimum if f ′′ > 0.
3. Functions over closed and bounded intervals must have both a global maximum
and a global minimum.
Example 5.1 (Maxima and Minima by drawing). Find any critical points and identify
whether they are a max, min, or saddle point:
1. f (x) = x2 + 2
2. f (x) = x3 + 2
3. f (x) = |x2 − 1|, x ∈ [−2, 2]
Definition 5.1 (Concave Function). A function f is strictly concave over the set S if
∀x1 , x2 ∈ S and ∀a ∈ (0, 1),
f (ax1 + (1 − a)x2 ) > af (x1 ) + (1 − a)f (x2 )
Any line connecting two points on a concave function will lie below the function.
[Figure: a concave function (left panel) and a convex function (right panel), f(x) plotted against x.]
Definition 5.2 (Convex Function). Convex: A function f is strictly convex over the set S
if ∀x1 , x2 ∈ S and ∀a ∈ (0, 1),
f (ax1 + (1 − a)x2 ) < af (x1 ) + (1 − a)f (x2 )
Any line connecting two points on a convex function will lie above the function.
Sometimes, concavity and convexity are too strict a requirement. For most purposes of getting solutions, what we call quasi-concavity is enough.
No matter what two points you select, the lowest valued point will always be an end point.
Second Derivative Test of Concavity: The second derivative can be used to understand
concavity.
If
f ′′ (x) < 0 ⇒ Concave
f ′′ (x) > 0 ⇒ Convex
Quadratic Forms
A quadratic form is shorthand for a particular way to summarize a function. Quadratic forms are important for assessing concavity because they:
1. Approximate local curvature around a point — e.g., to identify whether a critical point is a max, a min, or a saddle point.
2. Are simple to express even in n dimensions.
3. Have a matrix representation.
Quadratic Form: A polynomial where each term is a monomial of degree 2 in any number of variables. In one variable:
Q(x1) = x1 a11 x1 = a11 x1^2
In N variables:
Q(x) = x⊤ A x, where x = (x1, x2, …, xn)⊤ and A is the symmetric matrix
A = [ a11        (1/2)a12   ⋯   (1/2)a1n
      (1/2)a12   a22        ⋯   (1/2)a2n
      ⋮          ⋮          ⋱   ⋮
      (1/2)a1n   (1/2)a2n   ⋯   ann ]
In two variables:
Q(x1, x2) = (x1  x2) [ a11        (1/2)a12
                       (1/2)a12   a22      ] (x1  x2)⊤
          = a11 x1^2 + a12 x1 x2 + a22 x2^2
When the function f(x) has more than two inputs, determining whether it has maxima and minima (remember, functions may have many inputs but only one output) is a bit more tedious. Definiteness helps identify the curvature of a function, Q(x), in n-dimensional space.
Definiteness: By definition, a quadratic form always takes on the value of zero when x = 0,
Q(x) = 0 at x = 0. The definiteness of the matrix A is determined by whether the quadratic
form Q(x) = x⊤ Ax is greater than zero, less than zero, or sometimes both over all x ≠ 0.
We can see from a graphical representation that if a point is a local maximum or minimum, it must meet certain conditions regarding its derivatives. These are so commonly used that we refer to them as "First Order Conditions" (FOCs) and "Second Order Conditions" (SOCs) in the economic tradition.
When we examined functions of one variable x, we found critical points by taking the first
derivative, setting it to zero, and solving for x. For functions of n variables, the critical
points are found in much the same way, except now we set the partial derivatives equal to
zero. Note: We will only consider critical points on the interior of a function’s domain.
In a derivative, we only took the derivative with respect to one variable at a time. When we take the derivative separately with respect to each element of x and then express the result as a vector, we use the terms Gradient and Hessian.
Definition 5.5 (Gradient). Given a function f (x) in n variables, the gradient ∇f (x) (the
greek letter nabla ) is a column vector, where the ith element is the partial derivative of
f (x) with respect to xi :
∇f(x) = [ ∂f(x)/∂x1
          ∂f(x)/∂x2
          ⋮
          ∂f(x)/∂xn ]
Before we know whether a point is a maximum or a minimum, we check whether it meets the FOC; if it does, it is a "critical point".
Definition 5.6 (Critical Point). x∗ is a critical point if and only if ∇f (x∗ ) = 0. If the
partial derivative of f(x) with respect to x∗ is 0, then x∗ is a critical point. To solve for x∗ ,
find the gradient, set each element equal to 0, and solve the system of equations.
x∗ = [ x1∗
       x2∗
       ⋮
       xn∗ ]
Example 5.2. Example: Given a function f (x) = (x1 − 1)2 + x22 + 1, find the (1) Gradient
and (2) Critical point of f (x).
Solution. Gradient:
∇f(x) = ( ∂f(x)/∂x1,  ∂f(x)/∂x2 )⊤ = ( 2(x1 − 1),  2x2 )⊤
Critical point:
∂f(x)/∂x1 = 2(x1 − 1) = 0  ⇒  x1∗ = 1
∂f(x)/∂x2 = 2x2 = 0        ⇒  x2∗ = 0
So
x∗ = (1, 0)
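A numerical check of this critical point (a minimal sketch; optim() is base R and minimizes by default):

f <- function(x) (x[1] - 1)^2 + x[2]^2 + 1
optim(par = c(0, 0), fn = f)$par   # converges to approximately (1, 0)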
When we found a critical point for a function of one variable, we used the second derivative as an indicator of the curvature at that point in order to determine whether the point was a min, max, or saddle (the second derivative test of concavity). For functions of n variables, we use second-order partial derivatives as an indicator of curvature.
Definition 5.7 (Hessian). Given a function f (x) in n variables, the hessian H(x) is an
n × n matrix, where the (i, j)th element is the second order partial derivative of f (x) with
respect to xi and xj :
H(x) = [ ∂²f(x)/∂x1²      ∂²f(x)/∂x1∂x2    ⋯   ∂²f(x)/∂x1∂xn
         ∂²f(x)/∂x2∂x1    ∂²f(x)/∂x2²      ⋯   ∂²f(x)/∂x2∂xn
         ⋮                 ⋮                ⋱   ⋮
         ∂²f(x)/∂xn∂x1    ∂²f(x)/∂xn∂x2    ⋯   ∂²f(x)/∂xn² ]
Note that the Hessian will be a symmetric matrix because ∂²f(x)/∂x1∂x2 = ∂²f(x)/∂x2∂x1.
Also note that given that f (x) is of quadratic form, each element of the hessian will be a
constant.
These definitions will be employed when we determine the Second Order Conditions of
a function:
Given a function f (x) and a point x∗ such that ∇f (x∗ ) = 0,
1. Hessian is Positive Definite =⇒ Strict Local Min
2. Hessian is Positive Semidefinite ∀x ∈ B(x∗, ϵ) =⇒ Local Min
3. Hessian is Negative Definite =⇒ Strict Local Max
4. Hessian is Negative Semidefinite ∀x ∈ B(x∗, ϵ) =⇒ Local Max
5. Hessian is Indefinite =⇒ Saddle Point
Example 5.3 (Max and min with two dimensions). We found that the only critical point
of f (x) = (x1 − 1)2 + x22 + 1 is at x∗ = (1, 0). Is it a min, max, or saddle point?
H(x) = [ 2  0
         0  2 ]
The leading principal minors of the Hessian are M1 = 2 and M2 = 4. Now we consider definiteness. Since both leading principal minors are positive, the Hessian is positive definite.
Maximum, minimum, or saddle point? Since the Hessian is positive definite and the gradient equals 0, x⋆ = (1, 0) is a strict local minimum.
Note: An alternative check of definiteness is to evaluate the quadratic form x⊤ H(x∗) x directly for all x ≠ 0:
x⊤ H(x∗) x = (x1  x2) [ 2  0
                        0  2 ] (x1  x2)⊤ = 2 x1^2 + 2 x2^2
For any x ≠ 0, 2(x1^2 + x2^2) > 0, so the Hessian is positive definite and x∗ is a strict local minimum.
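Definiteness can also be checked numerically through the eigenvalues of the Hessian (a quick sketch in R): all eigenvalues positive means positive definite, all negative means negative definite, and mixed signs mean indefinite.

H <- matrix(c(2, 0,
              0, 2), nrow = 2, byrow = TRUE)
eigen(H)$values   # both are 2 > 0, so H is positive definite and x* is a strict local min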
Exercise 5.2. Given f (x) = x31 − x32 + 9x1 x2 , find any maxima or minima.
3. Global concavity/convexity.
(a) Is f(x) globally concave/convex?
We have already looked at optimizing a function in one or more dimensions over the whole
domain of the function. Often, however, we want to find the maximum or minimum of a
function over some restricted part of its domain.
ex: Maximizing utility subject to a budget constraint
Types of Constraints: For a function f (x1 , . . . , xn ), there are two types of constraints
that can be imposed:
1. Equality constraints: constraints of the form c(x1 , . . . , xn ) = r. Budget constraints
are the classic example of equality constraints in social science.
2. Inequality constraints: constraints of the form c(x1 , . . . , xn ) ≤ r. These might arise
from non-negativity constraints or other threshold effects.
In any constrained optimization problem, the constrained maximum will always be less
than or equal to the unconstrained maximum. If the constrained maximum is less than the
unconstrained maximum, then the constraint is binding. Essentially, this means that you
can treat your constraint as an equality constraint rather than an inequality constraint.
For example, the budget constraint binds when you spend your entire budget. This generally happens because we believe that utility is strictly increasing in consumption, i.e. you always want to consume more, so you exhaust your budget.
This tells us to maximize/minimize our function, f (x1 , x2 ), with respect to the choice vari-
ables, x1 , x2 , subject to the constraint.
Example:
max_{x1, x2} f(x1, x2) = −(x1^2 + 2 x2^2)   s.t.   x1 + x2 = 4
It is easy to see that the unconstrained maximum occurs at (x1 , x2 ) = (0, 0), but that does
not satisfy the constraint. How should we proceed?
Equality Constraints
Equality constraints are the easiest to deal with because we know that the maximum or
minimum has to lie on the (intersection of the) constraint(s).
The trick is to change the problem from a constrained optimization problem in n variables
to an unconstrained optimization problem in n + k variables, adding one variable for each
equality constraint. We do this using a lagrangian multiplier.
Lagrangian function: The Lagrangian function allows us to combine the function we
want to optimize and the constraint function into a single function. Once we have this
single function, we can proceed as if this were an unconstrained optimization problem.
For each constraint, we must include a Lagrange multiplier (λi ) as an additional variable
in the analysis. These terms are the link between the constraint and the Lagrangian function.
Given a two dimensional set-up:
L(x1, …, xn, λ1, …, λk) = f(x1, …, xn) − Σ_{i=1}^k λi ( ci(x1, …, xn) − ri )
Getting the sign right: Note that above we subtract the lagrangian term and we subtract
the constraint constant from the constraint function. Occasionally, you may see the following
alternative form of the Lagrangian, which is equivalent:
L(x1, …, xn, λ1, …, λk) = f(x1, …, xn) + Σ_{i=1}^k λi ( ri − ci(x1, …, xn) )
Here we add the Lagrangian term and we subtract the constraining function from the constraint constant.
Using the Lagrangian to Find the Critical Points: To find the critical points, we
take the partial derivatives of lagrangian function, L(x1 , . . . , xn , λ1 , . . . , λk ), with respect to
each of its variables (all choice variables x and all lagrangian multipliers λ). At a critical
point, each of these partial derivatives must be equal to zero, so we obtain a system of n + k
equations in n + k unknowns:
∂L/∂x1 = ∂f/∂x1 − Σ_{i=1}^k λi ∂ci/∂x1 = 0
⋮
∂L/∂xn = ∂f/∂xn − Σ_{i=1}^k λi ∂ci/∂xn = 0
∂L/∂λ1 = c1(x1, …, xn) − r1 = 0
⋮
∂L/∂λk = ck(x1, …, xn) − rk = 0
We can then solve this system of equations, because there are n + k equations and n + k
unknowns, to calculate the critical point (x∗1 , . . . , x∗n , λ∗1 , . . . , λ∗k ).
Second-order Conditions and Unconstrained Optimization: There may be more
than one critical point, i.e. we need to verify that the critical point we find is a maxi-
mum/minimum. Similar to unconstrained optimization, we can do this by checking the
second-order conditions.
Example 5.4 (Constrained optimization with two goods and a budget constraint). Find the constrained optimum of
max_{x1, x2} f(x1, x2) = −(x1^2 + 2 x2^2)   s.t.   x1 + x2 = 4
1. Write the Lagrangian: L(x1, x2, λ) = −(x1^2 + 2 x2^2) − λ(x1 + x2 − 4)
2. Take the partial derivatives and set them equal to zero:
∂L/∂x1 = −2 x1 − λ = 0
∂L/∂x2 = −4 x2 − λ = 0
∂L/∂λ = −(x1 + x2 − 4) = 0
3. Solve the system of equations: Using the first two partials, we see that λ = −2 x1 and λ = −4 x2. Set these equal to see that x1 = 2 x2. Using the third partial and the above equality, 4 = 2 x2 + x2, from which we get
4. x2∗ = 4/3 and x1∗ = 8/3.
5. This gives f(8/3, 4/3) = −96/9, which is less than the unconstrained optimum f(0, 0) = 0.
Notice that when we take the partial derivative of L with respect to the Lagrangian multiplier
and set it equal to 0, we return exactly our constraint! This is why signs matter.
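Because the constraint lets us write x2 = 4 − x1, we can also verify this answer numerically by optimizing over a single variable (a minimal sketch; optimize() is base R):

f_sub <- function(x1) -(x1^2 + 2 * (4 - x1)^2)          # substitute the constraint
optimize(f_sub, interval = c(-10, 10), maximum = TRUE)
# maximum near x1 = 8/3, so x2 = 4 - 8/3 = 4/3 and the objective is -96/9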
Inequality constraints define the boundary of a region over which we seek to optimize the
function. This makes inequality constraints more challenging because we do not know if
the maximum/minimum lies along one of the constraints (the constraint binds) or in the
interior of the region.
We must introduce more variables in order to turn the problem into an unconstrained
optimization.
Slack: For each inequality constraint ci(x1, …, xn) ≤ ai, we define a slack variable si^2 chosen so that ci(x1, …, xn) = ai − si^2, i.e. the constraint holds with equality once the slack is added. These slack variables capture how close the constraint comes to binding. We use s^2 rather than s to ensure that the slack is non-negative.
Slack is just a way to transform our constraints.
Given a two-dimensional set-up and these edited constraints, adding in slack:
max/min_{x1, x2} f(x1, x2)   s.t.   c(x1, x2) + s1^2 = a1
L(x1, …, xn, λ1, …, λk, s1, …, sk) = f(x1, …, xn) − Σ_{i=1}^k λi ( ci(x1, …, xn) + si^2 − ai )
Finding the Critical Points: To find the critical points, we take the partial derivatives
of the lagrangian function, L(x1 , . . . , xn , λ1 , . . . , λk , s1 , . . . , sk ), with respect to each of its
variables (all choice variables x, all lagrangian multipliers λ, and all slack variables s). At a
critical point, each of these partial derivatives must be equal to zero, so we obtain a system
of n + 2k equations in n + 2k unknowns:
∂L/∂x1 = ∂f/∂x1 − Σ_{i=1}^k λi ∂ci/∂x1 = 0
⋮
∂L/∂xn = ∂f/∂xn − Σ_{i=1}^k λi ∂ci/∂xn = 0
∂L/∂λ1 = c1(x1, …, xn) + s1^2 − a1 = 0
⋮
∂L/∂λk = ck(x1, …, xn) + sk^2 − ak = 0
∂L/∂s1 = −2 s1 λ1 = 0
⋮
∂L/∂sk = −2 sk λk = 0
Complementary slackness conditions: The last set of first order conditions of the form
2si λi = 0 (the partials taken with respect to the slack variables) are known as complementary
slackness conditions. These conditions can be satisfied in one of three ways:
1. λi = 0 and si ̸= 0: This implies that the slack is positive and thus the constraint does
not bind.
2. λi ̸= 0 and si = 0: This implies that there is no slack in the constraint and the
constraint does bind.
3. λi = 0 and si = 0: In this case, there is no slack but the constraint binds trivially,
without changing the optimum.
Example: Find the critical points for the following constrained optimization:
max_{x1, x2} f(x) = −(x1^2 + 2 x2^2)   s.t.   x1 + x2 ≤ 4
∂L/∂x1 = −2 x1 − λ1 = 0
∂L/∂x2 = −4 x2 − λ1 = 0
∂L/∂λ1 = −(x1 + x2 + s1^2 − 4) = 0
∂L/∂s1 = −2 s1 λ1 = 0
4. Consider all ways that the complementary slackness conditions are solved:
Hypothesis           s1    λ1      x1    x2    f(x1, x2)
s1 = 0, λ1 = 0       No solution
s1 ≠ 0, λ1 = 0       2     0       0     0     0
s1 = 0, λ1 ≠ 0       0     −16/3   8/3   4/3   −32/3
s1 ≠ 0, λ1 ≠ 0       No solution
This shows that there are two critical points: (0, 0) and (8/3, 4/3).
5. Find maximum: Looking at the values of f (x1 , x2 ) at the critical points, we see that
f (x1 , x2 ) is maximized at x∗1 = 0 and x∗2 = 0.
Exercise 5.3. Find the critical points for the following constrained optimization:
max_{x1, x2} f(x) = −(x1^2 + 2 x2^2)   s.t.   x1 + x2 ≤ 4,   x1 ≥ 0,   x2 ≥ 0
4. Consider all ways that the complementary slackness conditions are solved:
Hypothesis                  s1   s2   s3   λ1   λ2   λ3   x1   x2   f(x1, x2)
s1 = s2 = s3 = 0
s1 ≠ 0, s2 = s3 = 0
s2 ≠ 0, s1 = s3 = 0
s3 ≠ 0, s1 = s2 = 0
s1 ≠ 0, s2 ≠ 0, s3 = 0
s1 ≠ 0, s3 ≠ 0, s2 = 0
s2 ≠ 0, s3 ≠ 0, s1 = 0
s1 ≠ 0, s2 ≠ 0, s3 ≠ 0
5. Find maximum:
As you can see, this can be a pain. When dealing explicitly with non-negativity constraints,
this process is simplified by using the Kuhn-Tucker method.
Because the problem of maximizing a function subject to inequality and non-negativity constraints arises frequently in economics, the Kuhn-Tucker conditions provide a method that often makes it easier to both calculate the critical points and identify points that are (local) maxima.
Given a two-dimensional set-up:
max/min_{x1, x2} f(x1, x2)   s.t.   c(x1, x2) ≤ a1,   x1 ≥ 0,   x2 ≥ 0
We define the Lagrangian function L(x1 , x2 , λ1 ) the same as if we did not have the non-
negativity constraints:
L(x1, …, xn, λ1, …, λk) = f(x1, …, xn) − Σ_{i=1}^k λi ( ci(x1, …, xn) − ai )
The Kuhn-Tucker conditions are:
∂L/∂x1 ≤ 0, …, ∂L/∂xn ≤ 0
∂L/∂λ1 ≥ 0, …, ∂L/∂λm ≥ 0
x1 ∂L/∂x1 = 0, …, xn ∂L/∂xn = 0
λ1 ∂L/∂λ1 = 0, …, λm ∂L/∂λm = 0
Non-negativity Conditions:
x1 ≥ 0, …, xn ≥ 0
λ1 ≥ 0, …, λm ≥ 0
Note that some of these conditions are set equal to 0, while others are set as inequalities!
Note also that to minimize the function f (x1 , . . . , xn ), the simplest thing to do is maximize
the function −f (x1 , . . . , xn ); all of the conditions remain the same after reformulating as a
maximization problem.
There are additional assumptions (notably, f(x) is quasi-concave and the constraints are
convex) that are sufficient to ensure that a point satisfying the Kuhn-Tucker conditions is a
global max; if these assumptions do not hold, you may have to check more than one point.
Finding the Critical Points with Kuhn-Tucker Conditions: Given the above condi-
tions, to find the critical points we solve the above system of equations. To do so, we must
check all border and interior solutions to see if they satisfy the above conditions.
In a two-dimensional set-up, this means we must check the following cases:
1. x1 = 0, x2 =0 Border Solution
2. x1 = 0, x2 ̸= 0 Border Solution
3. x1 ̸= 0, x2 =0 Border Solution
4. x1 ̸= 0, x2 ̸= 0 Interior Solution
Example 5.5 (Kuhn-Tucker with two variables). Solve the following optimization problem
with inequality constraints
max_{x1, x2} f(x) = −(x1^2 + 2 x2^2)   s.t.   x1 + x2 ≤ 4,   x1 ≥ 0,   x2 ≥ 0
∂L/∂x1 = −2 x1 − λ ≤ 0
∂L/∂x2 = −4 x2 − λ ≤ 0
∂L/∂λ = −(x1 + x2 − 4) ≥ 0
x1 ∂L/∂x1 = x1 (−2 x1 − λ) = 0
x2 ∂L/∂x2 = x2 (−4 x2 − λ) = 0
λ ∂L/∂λ = −λ (x1 + x2 − 4) = 0
Non-negativity Conditions:
x1 ≥ 0
x2 ≥ 0
λ ≥ 0
3. Consider all border and interior cases:
Hypothesis           λ       x1    x2    f(x1, x2)
x1 = 0, x2 = 0       0       0     0     0
x1 = 0, x2 ≠ 0       −16     0     4     −32
x1 ≠ 0, x2 = 0       −8      4     0     −16
x1 ≠ 0, x2 ≠ 0       −16/3   8/3   4/3   −32/3
4. Find Maximum: Three of the critical points violate the requirement that λ ≥ 0, so
the point (0, 0, 0) is the maximum.
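For a numerical sanity check of this example, base R's constrOptim() handles linear inequality constraints of the form ui %*% x - ci >= 0; here we minimize −f(x). A minimal sketch, not part of the original text:

ui <- rbind(c(-1, -1),   # -x1 - x2 >= -4, i.e. x1 + x2 <= 4
            c( 1,  0),   #  x1 >= 0
            c( 0,  1))   #  x2 >= 0
ci <- c(-4, 0, 0)
neg_f <- function(x) x[1]^2 + 2 * x[2]^2
constrOptim(theta = c(1, 1), f = neg_f, grad = NULL, ui = ui, ci = ci)$par
# approaches (0, 0), the constrained maximum of f found above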
Exercise 5.4 (Kuhn-Tucker with logs). Solve the constrained optimization problem,
max_{x1, x2} f(x) = (1/3) log(x1 + 1) + (2/3) log(x2 + 1)   s.t.   x1 + 2 x2 ≤ 4,   x1 ≥ 0,   x2 ≥ 0
Non-negativity Conditions
4. Find Maximum:
Probability Theory
Probability and Inferences are mirror images of each other, and both are integral to social
science. Probability quantifies uncertainty, which is important because many things in the
social world are at first uncertain. Inference is then the study of how to learn about facts
you don’t observe from facts you do observe.
making a selection. For example, if replacement is not allowed and I am selecting 3 elements
from the following set {1, 2, 3, 4, 5, 6}, I will have 6 options at first, 5 options as I make
my second selection, and 4 options as I make my third.
So in counting how many different outcomes are possible, if order matters AND we are sampling with replacement, the number of different outcomes is n^k.
If order matters AND we are sampling without replacement, the number of different outcomes is n(n − 1)(n − 2)⋯(n − k + 1) = n!/(n − k)!.
If order doesn't matter AND we are sampling without replacement, the number of different outcomes is (n choose k) = n!/((n − k)! k!).
The expression (n choose k) is read "n choose k" and denotes n!/((n − k)! k!). Also, note that 0! = 1.
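These counting rules are available directly in R, which can help check hand calculations (a quick illustration with n = 6 and k = 3):

n <- 6; k <- 3
n^k                               # ordered, with replacement
factorial(n) / factorial(n - k)   # ordered, without replacement
choose(n, k)                      # unordered, without replacement ("n choose k")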
Example 6.1 (Counting). There are five balls numbered from 1 through 5 in a jar. Three
balls are chosen. How many possible choices are there?
1. Ordered, with replacement =
2. Ordered, without replacement =
3. Unordered, without replacement =
Exercise 6.1 (Counting). Four cards are selected from a deck of 52 cards. Once a card has been drawn, it is not reshuffled back into the deck. Moreover, we care only about the complete hand that we get (i.e. we care about the set of selected cards, not the sequence in which they were drawn). How many possible outcomes are there?
6.2 Sets
Set : A set is any well defined collection of elements. If x is an element of S, x ∈ S.
Sample Space (S): A set or collection of all possible outcomes from some process. Out-
comes in the set can be discrete elements (countable) or points along a continuous interval
(uncountable).
Examples:
1. Discrete: the numbers on a die, whether a vote cast is republican or democrat.
2. Continuous: GNP, arms spending, age.
Event: Any collection of possible outcomes of an experiment. Any subset of the full set of
possibilities, including the full set itself. Event A ⊂ S.
Empty Set: a set with no elements, written {}. It is denoted by the symbol ∅.
Set operations:
1. Union: The union of two sets A and B, A∪ B, is the set containing all of the elements
in A or B.
A1 ∪ A2 ∪ ⋯ ∪ An = ⋃_{i=1}^n Ai
2. Intersection: The intersection of two sets A and B, A ∩ B, is the set containing all of the elements that are in both A and B.
3. Complement: The complement of a set A, written A^c, is the set of all elements in the sample space S that are not in A.
Example 6.2 (Sets). Let set A be {1, 2, 3, 4}, B be {3, 4, 5, 6}, and C be {5, 6, 7, 8}.
Sets A, B, and C are all subsets of the sample space S which is {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
Write out the following sets:
1. A∪B
2. C ∩B
3. Bc
4. A ∩ (B ∪ C)
Exercise 6.2 (Sets). Suppose you had a pair of four-sided dice. You sum the results from
a single toss.
What is the set of possible outcomes (i.e. the sample space)?
Consider subsets A = {2, 8} and B = {2, 3, 7} of the sample space you found. What is
1. Ac
2. (A ∪ B)c
6.3 Probability
Many things in the world are uncertain. In everyday speech, we say that we are uncertain
about the outcome of random events. Probability is a formal model of uncertainty which
provides a measure of uncertainty governed by a particular set of rules (Figure 6.1).¹
1 Images of Probability and Random Variables drawn by Shiro Kuriwaki and inspired by Blitzstein and Morris.
[Figure 6.1: A sample space S containing outcomes s1, …, s8. An "experiment" from the (unobserved) data generating process generates (observed) outcomes; events, such as event A, are sets of outcomes.]
A different model of uncertainty would, of course, have a set of rules different from anything we
discuss here. Our focus on probability is justified because it has proven to be a particularly
useful model of uncertainty.
Probability Distribution Function: a mapping of each event in the sample space S to
the real numbers that satisfy the following three axioms (also called Kolmogorov’s Axioms).
Formally,
Definition 6.1 (Probability). Probability is a function that maps events to a real number,
obeying the axioms of probability.
The axioms of probability make sure that the probabilities of separate events add up coherently, and – for standardization purposes – that they add up to 1:
1. For any event A, P(A) ≥ 0.
2. P(S) = 1.
3. For any sequence of mutually exclusive (disjoint) events A1, A2, …, P(A1 ∪ A2 ∪ ⋯) = P(A1) + P(A2) + ⋯
The last axiom is an extension of a union to infinite sets. When there are only two disjoint events in the space, it boils down to
P(A1 ∪ A2) = P(A1) + P(A2)
Probability Operations
Using these three axioms, we can define all of the common rules of probability.
1. P (∅) = 0
2. For any event A, 0 ≤ P (A) ≤ 1.
3. P (AC ) = 1 − P (A)
4. If A ⊂ B (A is a subset of B), then P (A) ≤ P (B).
5. For any two events A and B, P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
6. Boole's Inequality: For any sequence of n events (which need not be disjoint) A1, A2, …, An,
P( ⋃_{i=1}^n Ai ) ≤ Σ_{i=1}^n P(Ai)
Exercise 6.3 (Probability). Suppose you had a pair of four-sided dice. You sum the results
from a single toss. Let us call this sum, or the outcome, X.
1. What is P (X = 5), P (X = 3), P (X = 6)?
2. What is P (X = 5 ∪ X = 3)C ?
Conditional Probability: The conditional probability of event A given that event B has occurred is
P(A|B) = P(A ∩ B) / P(B)
Note that conditional probabilities are probabilities and must also follow the Kolmogorov axioms of probability.
Example 6.4 (Conditional Probability 1). Assume A and B occur with the following frequencies:
          A           A^c
B         n_ab        n_(a^c)b
B^C       n_ab^c      n_(ab)^c
Example 6.5 (Conditional Probability 2). A six-sided die is rolled. What is the probability
of a 1, given the outcome is an odd number?
You could rearrange the fraction to highlight how a joint probability can be expressed as the product of a conditional probability and a marginal probability: P(A ∩ B) = P(A|B) P(B).
Sometimes it is easier to calculate these conditional probabilities and sum them than it is
to calculate P (A) directly.
Definition 6.4 (Law of Total Probability). Let S be the sample space of some experiment
and let the disjoint k events B1 , . . . , Bk partition S, such that P (B1 ∪ ... ∪ Bk ) = P (S) = 1.
If A is some other event in S, then the events A∩B1 , A∩B2 , . . . , A∩Bk will form a partition
of A and we can write A as
A = (A ∩ B1) ∪ ⋯ ∪ (A ∩ Bk).
Since the k events are disjoint,
P(A) = Σ_{i=1}^k P(A ∩ Bi)
     = Σ_{i=1}^k P(Bi) P(A|Bi)
Bayes Rule: Assume that events B1 , . . . , Bk form a partition of the space S. Then by the
Law of Total Probability
P(Bj|A) = P(A ∩ Bj) / P(A) = P(Bj) P(A|Bj) / Σ_{i=1}^k P(Bi) P(A|Bi)
With only two events B1 and B2, for example, this is
P(B1|A) = P(B1) P(A|B1) / ( P(B1) P(A|B1) + P(B2) P(A|B2) )
Bayes’ rule determines the posterior probability of a state P (Bj |A) by calculating the prob-
ability P (A ∩ Bj ) that both the event A and the state Bj will occur and dividing it by
the probability that the event will occur regardless of the state (by summing across all
Bi). The states could be something like Normal/Defective, Healthy/Diseased, Republican/Democrat/Independent, etc. The event on which one conditions could be something
like a sampling from a batch of components, a test for a disease, or a question about a policy
position.
Prior and Posterior Probabilities: Above, P (B1 ) is often called the prior probability,
since it’s the probability of B1 before anything else is known. P (B1 |A) is called the posterior
probability, since it’s the probability after other information is taken into account.
Example 6.6 (Bayes’ Rule). In a given town, 40% of the voters are Democrat and 60%
are Republican. The president’s budget is supported by 50% of the Democrats and 90%
of the Republicans. If a randomly (equally likely) selected voter is found to support the
president’s budget, what is the probability that they are a Democrat?
Exercise 6.4 (Conditional Probability). Assume that 2% of the population of the U.S. are
members of some extremist militia group. We develop a survey that positively classifies
someone as being a member of a militia group given that they are a member 95% of the
time and negatively classifies someone as not being a member of a militia group given that
they are not a member 97% of the time. What is the probability that someone positively
classified as being a member of a militia group is actually a militia member?
6.5 Independence
Definition 6.5 (Independence). If the occurrence or nonoccurrence of either events A
and B have no effect on the occurrence or nonoccurrence of the other, then A and B are
independent.
1. P(A|B) = P(A)
2. P(B|A) = P(B)
3. P(A ∩ B) = P(A) P(B)
4. More generally than the above, P( ⋂_{i=1}^k Ai ) = ∏_{i=1}^k P(Ai)
Are mutually exclusive events independent of each other?
No. If A and B are mutually exclusive, then they cannot happen simultaneously. If we know
that A occurred, then we know that B couldn’t have occurred. Because of this, A and B
aren’t independent.
Pairwise Independence: A set of more than two events A1 , A2 , . . . , Ak is pairwise inde-
pendent if P (Ai ∩ Aj ) = P (Ai )P (Aj ), ∀i ̸= j. Note that this does not necessarily imply
joint independence.
Conditional Independence: If A and B are independent once you know the occurrence
of a third event C, then A and B are conditionally independent (conditional on C):
1. P (A|B ∩ C) = P (A|C)
2. P (B|A ∩ C) = P (B|C)
3. P (A ∩ B|C) = P (A|C)P (B|C)
Just because two events are conditionally independent does not mean that they are independent. Actually, it is hard to think of real-world things that are "unconditionally" independent. That's why it's always important to ask about a finding: What was it conditioned on? For example, suppose that graduate school admission decisions are made by only one professor, who picks a group of 50 bright students and flips a coin for each student to generate a class of about 25 students. Then the probabilities that two students get accepted are conditionally independent, because they are determined by two separate coin tosses. However, this does not mean that their admissions are unconditionally independent. Knowing that student A got in gives us information about whether student B got in, if we think that the professor originally picked her pool of 50 students by merit.
Perhaps more counter-intuitively: if two events are already independent, then it might seem that no amount of "conditioning" will make them dependent. But this is not always so. For example,² suppose I only get calls from two people, Alice and Bob. Let A be the event that Alice calls, and B be the event that Bob calls. Alice and Bob do not communicate, so P(A | B) = P(A). But now let C be the event that your phone rings. For conditional independence to hold here, P(A | C) would have to equal P(A | B^c ∩ C). But this is not true: given only that the phone rang, Alice may or may not have called, whereas given that the phone rang and Bob did not call, Alice certainly called.
[Figure 6.2: The random variable X is a function that takes events (outcomes s1, …, s8 in the sample space) and assigns a number on the real line to them. That mapping process is deterministic, but the occurrence of an event is still random.]
Figure 6.2 shows an image of the function. It might seem strange to define a random variable as a function – which is neither random nor variable. The randomness comes from the realization of an event from the sample space S.
Randomness means that the outcome of some experiment is not deterministic, i.e. there
is some probability (0 < P (A) < 1) that the event will occur.
The support of a random variable is all values for which there is a positive probability of
occurrence.
Example: Flip a fair coin two times. What is the sample space?
A random variable must map events to the real line. For example, let a random variable X be the number of heads. The event (H, H) gets mapped to 2 (X(s) = 2), the events {(H, T), (T, H)} get mapped to 1 (X(s) = 1), and the event (T, T) gets mapped to 0 (X(s) = 0).
What are other possible random variables?
6.7 Distributions
We now have two main concepts in this section – probability and random variables. Given
a sample space S and the same experiment, both probability and random variables take
events as their inputs. But they output different things (probabilities measure the “size”
of events, random variables give a number in a way that the analyst chose to define the
random variable). How do the two concepts relate?
The concept of distributions is the natural bridge between these two concepts.
Notice how the definition of distributions combines two ideas: random variables and probabilities of events. First, the distribution considers a random variable, call it X. X can take
a number of possible numeric values.
Example 6.7 (Total Number of Occurrences). Consider three binary outcomes, one for
each patient recovering from a disease: Ri denotes the event in which patient i (i = 1, 2, 3)
recovers from a disease. R1 , R2 , and R3 . How would we represent the total number of
people who end up recovering from the disease?
Solution. Define the random variable X be the total number of people (out of three) who
recover from the disease. Random variables are functions, that take as an input a set
of events (in the sample space S) and deterministically assigns them to a number of the
analyst’s choice.
Recall that with each of these numerical values there is a class of events. In the previous
example, for X = 3 there is one outcome (R1 , R2 , R3 ) and for X = 1 there are multiple
({(R1 , R2c , R3c ), (R1c , R2 , R3c ), (R1c , R2c , R3 ), }). Now, the thing to notice here is that each of
these events naturally come with a probability associated with them. That is, P (R1 , R2 , R3 )
is a number from 0 to 1, as is P (R1 , R2c , R3c ). These all have probabilities because they are
in the sample space S. The function that tells us these probabilities that are associated
with a numerical value of a random variable is called a distribution.
In other words, a random variable X induces a probability distribution P (sometimes written
PX to emphasize that the probability density is about the r.v. X)
The formal definition of a random variable is easier to give by separating out two cases: discrete random variables, when the numeric summaries of the events are discrete, and continuous random variables, when they are continuous.
Definition 6.9 (Probability Mass Function). For a discrete random variable X, the probability mass function (PMF) (also referred to simply as the "probability distribution"), p(x) = P(X = x), assigns probabilities to a countable number of distinct x values such that
1. 0 ≤ p(x) ≤ 1
2. Σ_x p(x) = 1
Example: For a fair six-sided die, there is an equal probability of rolling any number. Since there are six sides, the probability mass function is p(y) = 1/6 for y = 1, …, 6, and 0 otherwise.
For a discrete random variable, the cumulative distribution function (also referred to simply as the "cumulative distribution" or previously as the "density function"), F(x) = P(X ≤ x), is the probability that X is less than or equal to some value x:
P(X ≤ x) = Σ_{i ≤ x} p(i)
Example 6.8. For a fair die with its value as Y , What are the following?
1. P (Y ≤ 1)
2. P (Y ≤ 3)
3. P (Y ≤ 6)
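For a quick check in R, the PMF of a fair die and its cumulative sums give these probabilities directly:

p <- rep(1/6, 6)   # PMF of a fair six-sided die
cumsum(p)          # P(Y <= 1), P(Y <= 2), ..., P(Y <= 6)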
Definition 6.11 (Probability Density Function). The function f above is called the probability density function (PDF) of X and must satisfy
f(x) ≥ 0
∫_{−∞}^{∞} f(x) dx = 1
For both discrete and continuous random variables, we have a unifying concept of another
measure: the cumulative distribution:
Definition 6.12 (Cumulative Density Function). Because the probability that a continuous
random variable will assume any particular value is zero, we can only make statements about
the probability of a continuous random variable being within an interval. The cumulative
distribution gives the probability that Y lies on the interval (−∞, y) and is defined as
F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(s) ds
Note that F(x) has similar properties with continuous distributions as it does with discrete – non-decreasing, continuous (not just right-continuous), lim_{x→−∞} F(x) = 0, and lim_{x→∞} F(x) = 1.
P(a ≤ X ≤ b) = ∫_a^b f(x) dx
The PDF and CDF are linked by the integral: the CDF is the integral of the PDF, and hence the PDF is the derivative of the CDF:
f(x) = F′(x) = dF(x)/dx
Example 6.9. For f (y) = 1, 0 < y < 1, find: (1) The CDF F (y) and (2) The probability
P (0.5 < y < 0.75).
Discrete: p(x, y) = P(X = x, Y = y)
Continuous: f(x, y), with P((X, Y) ∈ A) = ∫∫_A f(x, y) dx dy
Exercise 6.5 (Discrete Outcomes). Suppose we are interested in the outcomes of flipping a
coin and rolling a 6-sided die at the same time. The sample space for this process contains
12 elements:
{(H, 1), (H, 2), (H, 3), (H, 4), (H, 5), (H, 6), (T, 1), (T, 2), (T, 3), (T, 4), (T, 5), (T, 6)}
We can define two random variables X and Y such that X = 1 if heads and X = 0 if tails,
while Y equals the number on the die.
We can then make statements about the joint distribution of X and Y . What are the
following?
1. P (X = x)
2. P (Y = y)
3. P (X = x, Y = y)
4. P (X = x|Y = y)
5. Are X and Y independent?
6.9 Expectation
We often want to summarize some characteristics of the distribution of a random variable.
The most important summary is the expectation (or expected value, or mean), in which the
possible values of a random variable are weighted by their probabilities.
In words, it is the weighted average of all possible values of Y , weighted by the probability
that y occurs. It is not necessarily the number we would expect Y to take on, but the
average value of Y after a large number of repetitions of an experiment.
Example 6.11 (Expectation of a Continuous Random Variable). Find E(Y) for f(y) = 1/1.5, 0 < y < 1.5.
More generally, the expectation of a function g(Y) of the random variable Y is
E[g(Y)] = Σ_y g(y) p(y)              (Y discrete)
E[g(Y)] = ∫_{−∞}^{∞} g(y) f(y) dy    (Y continuous)
Dealing with expectations is easier when the thing inside is a sum. The intuition behind this is that expectation is an integral, which is a type of sum.
1. Expectation of a constant: E(c) = c.
2. Expectations add: E(X + Y) = E(X) + E(Y), regardless of independence.
3. Expected value of an expected value: E(E(Y)) = E(Y).
4. If X and Y are independent: E(XY) = E(X) E(Y).
Conditional Expectation: With joint distributions, we are often interested in the ex-
pected value of a variable Y if we could hold the other variable X fixed. This is the
conditional expectation of Y given X = x:
1. Y discrete: E(Y|X = x) = Σ_y y p_{Y|X}(y|x)
2. Y continuous: E(Y|X = x) = ∫_y y f_{Y|X}(y|x) dy
The conditional expectation is often used for prediction when one knows the value of X but not Y.
We can also look at other summaries of the distribution, which build on the idea of taking
expectations. Variance tells us about the “spread” of the distribution; it is the expected
value of the squared deviations from the mean of the distribution. The standard deviation
is simply the square root of the variance.
What is Var(x)?
Hint: First calculate E(X) and E(X 2 )
Definition 6.15 (Covariance and Correlation). The covariance measures the degree to
which two random variables vary together; if the covariance between X and Y is positive,
X tends to be larger than its mean when Y is larger than its mean.
Definition 6.16 (Correlation). The correlation coefficient is the covariance divided by the
standard deviations of X and Y . It is a unitless measure and always takes on values in the
interval [−1, 1].
Corr(X, Y) = Cov(X, Y) / √( Var(X) Var(Y) ) = Cov(X, Y) / ( SD(X) SD(Y) )
Exercise 6.6 (Expectation and Variance). Suppose we have a PMF with the following
characteristics:
P(X = −2) = 1/5
P(X = −1) = 1/6
P(X = 0) = 1/5
P(X = 1) = 1/15
P(X = 2) = 11/30
1. Calculate the expected value of X
Define the random variable Y = X 2 .
2. Calculate the expected value of Y. (Hint: It would help to derive the PMF of Y first
in order to calculate the expected value of Y in a straightforward way)
3. Calculate the variance of X.
Exercise 6.7 (Expectation and Variance 2). 1. Find the expectation and variance of X, given the following PDF:
f(x) = (3/10)(3x − x^2) for 0 ≤ x ≤ 2, and 0 otherwise
Exercise 6.8 (Expectation and Variance 3). 1. Find the mean and standard deviation of random variable X. The PDF of this X is as follows:
f(x) = (1/4) x for 0 ≤ x ≤ 2;  (1/4)(4 − x) for 2 ≤ x ≤ 4;  0 otherwise
2. Next, calculate P(X < µ − σ). Remember, µ is the mean and σ is the standard deviation.
For any particular sequence of y successes and n − y failures, the probability of obtaining that sequence is p^y q^(n−y) (by the multiplicative law and independence). However, there are (n choose y) = n!/((n − y)! y!) ways of obtaining a sequence with y successes and n − y failures. So the binomial PMF is
P(Y = y) = (n choose y) p^y q^(n−y)
Example 6.13. Republicans vote for Democrat-sponsored bills 2% of the time. What is the probability that out of 10 Republicans questioned, half voted for a particular Democrat-sponsored bill? What is the mean number of Republicans voting for Democrat-sponsored bills? The variance?
1. P(Y = 5) = ?
2. E(Y) = ?
3. Var(Y) = ?
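R's binomial functions give these quantities directly (a quick check, with n = 10 and p = 0.02):

dbinom(5, size = 10, prob = 0.02)   # P(Y = 5)
10 * 0.02                           # E(Y) = np
10 * 0.02 * 0.98                    # Var(Y) = np(1 - p)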
P(Y = y) = (λ^y / y!) e^(−λ),   y = 0, 1, 2, …,   λ > 0
The Poisson has the unusual feature that its expectation equals its variance: E(Y ) =
Var(Y ) = λ. The Poisson distribution is often used to model rare event counts: counts of
the number of events that occur during some unit of time. λ is often called the “arrival
rate.”
Example 6.14. Border disputes occur between two countries according to a Poisson distribution, at a rate of 2 per month. What is the probability of 0, 2, and fewer than 5 disputes occurring in a month?
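A quick check with R's Poisson functions (arrival rate λ = 2):

dpois(0, lambda = 2)   # P(Y = 0)
dpois(2, lambda = 2)   # P(Y = 2)
ppois(4, lambda = 2)   # P(Y < 5), i.e. P(Y <= 4)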
Example 6.15. For Y uniformly distributed over (1, 3), what are the following probabilities?
1. P (Y = 2)
2. Its density evaluated at 2, or f (2)
3. P (Y ≤ 2)
4. P (Y > 2)
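The same quantities from R's uniform functions (a quick check for Y ~ Uniform(1, 3)):

dunif(2, min = 1, max = 3)       # the density f(2); note P(Y = 2) itself is 0
punif(2, min = 1, max = 3)       # P(Y <= 2)
1 - punif(2, min = 1, max = 3)   # P(Y > 2)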
[Figure 6.3: Normal density curves f(x) with the same mean µ = 1 but different variances.]
The Normal distribution has PDF
f(y) = 1/(σ√(2π)) · e^( −(y − µ)^2 / (2σ^2) )
Figure 6.3 shows various Normal distributions with the same µ = 1 and two versions of the variance.
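A sketch of how a figure like 6.3 can be drawn in R; the two variances here (1 and 4) are hypothetical choices, since the text only fixes µ = 1:

curve(dnorm(x, mean = 1, sd = 1), from = -5, to = 7, ylab = "f(x)")
curve(dnorm(x, mean = 1, sd = 2), from = -5, to = 7, add = TRUE, lty = 2)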
6.12 Summarizing Observed Events (Data)

So far, we've talked about distributions in a theoretical sense, looking at different properties
of random variables. We don’t observe random variables; we observe realizations of the
random variable. These realizations of events are roughly equivalent to what we mean by
“data”.
Sample mean: This is the most common measure of central tendency, calculated by summing the values of the observations and dividing by the number of observations:
x̄ = (1/n) Σ_{i=1}^n xi
Example:
X 6 3 7 5 5 5 6 4 7 2
Y 1 2 1 2 2 1 2 0 2 0
1. x̄ = ȳ =
2. median(x) = median(y) =
3. mx = my =
Dispersion: We also typically want to know how spread out the data are relative to the
center of the observed distribution. There are several ways to measure dispersion.
Sample variance: The sample variance is the sum of the squared deviations from the
sample mean, divided by the number of observations minus 1.
V̂ar(X) = (1/(n − 1)) Σ_{i=1}^n (xi − x̄)^2
Covariance and Correlation: Both of these quantities measure the degree to which two
variables vary together, and are estimates of the covariance and correlation of two random
variables as defined above.
1. Sample covariance: Ĉov(X, Y) = (1/(n − 1)) Σ_{i=1}^n (xi − x̄)(yi − ȳ)
2. Sample correlation: Ĉorr(X, Y) = Ĉov(X, Y) / √( V̂ar(X) V̂ar(Y) )
Example 6.16. Using the above table, calculate the sample versions of:
1. Cov(X, Y )
2. Corr(X, Y )
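All of these sample quantities have built-in functions in base R, so the example table can be checked directly (a quick sketch):

x <- c(6, 3, 7, 5, 5, 5, 6, 4, 7, 2)
y <- c(1, 2, 1, 2, 2, 1, 2, 0, 2, 0)
mean(x); median(x); var(x); sd(x)   # sample mean, median, variance, standard deviation
cov(x, y); cor(x, y)                # sample covariance and correlation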
6.13 Asymptotic Theory

We are now finally ready to revisit, in slightly more precise terms, the two pillars of statistical theory we motivated Section 3.3 with.
Theorem 6.1 (Central Limit Theorem (i.i.d. case)). Let {Xn} = {X1, X2, …} be a sequence of i.i.d. random variables with finite mean (µ) and variance (σ^2). Then the sample mean X̄n = (X1 + X2 + ⋯ + Xn)/n increasingly converges to a Normal distribution:
(X̄n − µ) / (σ/√n)  →d  Normal(0, 1)
Another way to write this as a probability statement is that for all real numbers a,
P( (X̄n − µ) / (σ/√n) ≤ a ) → Φ(a)
as n → ∞, where
Φ(x) = ∫_{−∞}^x (1/√(2π)) e^(−s^2/2) ds
is the CDF of a Normal distribution with mean 0 and variance 1.
This result means that, as n grows, the distribution of the sample mean X̄n = (1/n)(X1 + X2 + ⋯ + Xn) is approximately Normal with mean µ and standard deviation σ/√n, i.e.,
X̄n ≈ N( µ, σ^2/n ).
The standard deviation of X̄n (which is roughly a measure of the precision of X̄n as an estimator of µ) decreases at the rate 1/√n, so, for example, to increase its precision by 10 (i.e., to get one more digit right), one needs to collect 10^2 = 100 times more units of data.
Intuitively, this result also explains why, whenever a lot of small, independent processes somehow combine together to form the realized observations, practitioners often feel comfortable assuming Normality.
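A small simulation illustrates the CLT: sample means of draws from a (non-Normal) uniform distribution pile up in a bell shape. The sample size and number of replications below are arbitrary choices:

set.seed(2019)
n <- 100
xbars <- replicate(5000, mean(runif(n)))   # 5000 sample means, each from n uniform draws
hist(xbars, breaks = 50, main = "Sampling distribution of the sample mean")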
Theorem 6.2 (Law of Large Numbers (LLN)). For any draw of independent random variables with the same mean µ, the sample average after n draws, X̄n = (1/n)(X1 + X2 + … + Xn), converges in probability to the expected value of X, µ, as n → ∞:
P( |X̄n − µ| > ε ) → 0 for any ε > 0.
A shorthand for this is X̄n →p µ, where the arrow is read as "converges in probability to".
Some of you may encounter “big-OH”-notation. If f, g are two functions, we say that
f = O(g) if there exists some constant, c, such that f (n) ≤ c × g(n) for large enough n.
This notation is useful for simplifying complex problems in game theory, computer science,
and statistics.
Example.
What is O(5 exp(0.5n) + n^2 + n/2)? Answer: exp(n). Why? Because, for large n,
( 5 exp(0.5n) + n^2 + n/2 ) / exp(n) ≤ c exp(n) / exp(n) = c
whenever n > 4 and where c = 1.
1. 5 × 5 × 5 = 125
2. 5 × 4 × 3 = 60
3. (5 choose 3) = 5!/((5 − 3)! 3!) = (5 × 4)/(2 × 1) = 10
3. 0
4. 1/2
5. 4/6 = 2/3
6. 1
7. A ∪ B = {1, 2, 3, 4, 6}, A ∩ B = {2}, 5/6
2. P( (X = 5 ∪ X = 3)^C ) = 10/16
5. n_ab / (n_ab + n_ab^c) = (n_ab / N) / ( (n_ab + n_ab^c) / N )
Using this, Bayes' Law and the Law of Total Probability, we know:
P(D|S) = P(D) P(S|D) / ( P(D) P(S|D) + P(D^c) P(S|D^c) )
P(D|S) = (.4 × .5) / (.4 × .5 + .6 × .9) = .27
P(M|C) = P(C|M) P(M) / P(C)
       = P(C|M) P(M) / ( P(C|M) P(M) + P(C|M^c) P(M^c) )
       = P(C|M) P(M) / ( P(C|M) P(M) + [1 − P(C^c|M^c)] P(M^c) )
       = (.95 × .02) / (.95 × .02 + .03 × .98) ≈ .39
3.
Programming
Chapter 7

Orientation and Reading in Data
Welcome to the first in-class session for programming. Up till this point, you should have
already:
• Completed the R Visualization and Programming primers (under “The Basics”) on
your own at https://rstudio.cloud/learn/primers/,
• Made an account at RStudio Cloud and joined the Math Prefresher 2018 Space, and
• Successfully signed up for the University wi-fi: https://getonline.harvard.edu/
(Access Harvard Secure with your HarvardKey. Try to get a HarvardKey as soon as
possible.)
• How would you figure out what variables are in the data? The size of the data?
• How would you read in a .csv file, a .dta file, or a .sav file?
7.2 Orienting
1. We will be using a cloud version of RStudio at https://rstudio.cloud. You should
join the Math Prefresher Space 2018 from the link that was emailed to you. Each day,
click on the project with the day’s date on it.
Although most of you will probably be doing your work in RStudio locally rather than in the cloud, we are trying to use the cloud version because it makes it easier to standardize people's settings.
2. RStudio (either cloud or desktop) is a GUI and an IDE for the programming language
R. A Graphical User Interface allows users to interface with the software (in this case
R) using graphical aids like buttons and tabs. Often we don’t think of GUIs because
to most computer users, everything is a GUI (like Microsoft Word or your “Control
Panel"), but it's always there! An Integrated Development Environment just says that the software to interface with R comes with useful bells and whistles to give you shortcuts.
The Console is the core window through which you see your GUI actually operating on R. It's not graphical, so it might not be as intuitive. But all your results, commands, errors, and warnings appear there. The console tells you what's going on right now.
3. Via the GUI, you the analyst send instructions, or commands, to the R application. The verb for this is to "run" or "execute" the command. Computer programs ask users to provide instructions in very specific formats. While an English-speaking human can understand a sentence with a few typos in it by filling in the blanks, the same typo or misplaced character would halt a computer program. Each program has its own requirements for how commands should be typed; after all, each of these is its own language. We refer to the way a program needs its commands to be formatted as its syntax.
4. Theoretically, one could do all their work by typing commands into the Console. But that would be a lot of work, because you'd have to give the instructions each time
you start your data analysis. Moreover, you’ll have no record of what you did. That’s
why you need a script. This is a type of code. It can be referred to as a source
because that is the source of your commands. Source is also used as a verb; “source
the script” just means execute it. RStudio doesn’t start out with a script, so you can
make one from “File > New” or the New file icon.
4. You can also open scripts that are in folders in your computer. A script is a type of
File. Find your Files in the bottom-right “Files” pane.
To load a dataset, you need to specify where that file is. Computer files (data, documents, programs) are organized hierarchically, like a branching tree. Folders can contain files, and also other folders. The GUI toolbar makes this linear and hierarchical relationship apparent. When we turn to locate the file in our commands, we need another set of syntax. Importantly, we denote the hierarchy of a folder by the / (slash) symbol. data/input/2018-08 indicates the 2018-08 folder, which is included in the input folder, which is in turn included in the data folder.
Files (but not folders) have "file extensions" which you are probably familiar with already: .docx, .pdf, and so on. The file extensions you will see in a stats or quantitative social science class include:
• .pdf: PDF, a convenient format to view documents and slides in, regardless of Mac/Windows.
5. In R, there are two main types of scripts. A classic .R file and a .Rmd file (for
Rmarkdown). A .R file is just lines and lines of R code that is meant to be inserted
right into the Console. A .Rmd tries to weave code and English together, to make it
easier for users to create reports that interact with data and intersperse R code with
explanation. For example, we built this book in Rmds.
What Rmarkdown facilitates is the use of code chunks, which are used here. These start and end with three back-ticks. In the beginning, we can add options in curly braces ({}). Specifying r at the beginning tells it to render the chunk as R code. Options like echo = TRUE switch between showing or not showing the code that was executed; eval = TRUE switches between evaluating or not evaluating the code. More about Rmarkdown in Section 13. For example, this code chunk would evaluate 1 + 1 and show its output when compiled, but not display the code that was executed.
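For instance, such a chunk might look like the following sketch (the chunk options are the ones just described):

```{r, echo = FALSE, eval = TRUE}
1 + 1
```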
Dataframe subsetting
2 This Exercise is taken from Harvard's Introductory Undergraduate Class, CS50 (https://www.youtube.com/watch?v=kcbT3hrEi9s), and many other writeups.
3 See for example this community discussion: https://community.rstudio.com/t/base-r-and-the-tidyverse/2965/17
Remember that tidyverse applies to dataframes only, not vectors. For subsetting vectors,
use the base-R functions with the square brackets.
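A minimal illustration of the difference, using a small made-up data frame (not one of the course datasets):

library(dplyr)
df <- data.frame(state = c("Ohio", "Nevada", "Michigan"), age = c(24, 37, 12))
filter(df, age > 30)   # tidyverse (dplyr) verbs work on data frames
df[df$age > 30, ]      # the base-R equivalent
ages <- c(24, 37, 12)
ages[ages > 30]        # for plain vectors, use base-R square brackets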
Read data
Some non-tidyverse functions are not quite "base-R" but have similar relationships to tidyverse. For these, we recommend using the tidyverse functions as a general rule due to their common format, simplicity, and scalability.
Visualization
## data/input/justices_court-median.csv
## data/input/ober_2018.xlsx
## data/input/sample_mid.csv
## data/input/sample_polity.csv
## data/input/upshot-siena-polls.csv
## data/input/usc2010_001percent.Rds
## data/input/usc2010_001percent.csv
A typical file format is Microsoft Excel. Although this is not usually the best format for R because of its highly formatted structure as opposed to plain text (more on this in a later section), recent packages have made reading Excel files fairly easy.
In Rstudio, a good way to start is to use the GUI and the Import tool. Once you click a file,
an option to “Import Dataset” comes up. RStudio picks the right function for you, and you
can copy that code, but it’s important to eventually be able to write that code yourself.
For the first time using an outside package, you first need to install it.
install.packages("readxl")
After that, you don’t need to install it again. But you do need to load it each time.
library(readxl)
From the help page, we see that read_excel() is the function that we want to use.
Let’s try it.
library(readxl)
ober <- read_excel("data/input/ober_2018.xlsx")
Review: what does the / mean? Why do we need the data term first? Does the argument
need to be in quotes?
7.5.3 Inspecting
For almost any dataset, you usually want to do a couple of standard checks first to under-
stand what you loaded.
ober
## # A tibble: 1,035 x 10
## polis_number Name Latitude Longitude Hellenicity Fame Size Colonies
## <dbl> <chr> <dbl> <dbl> <chr> <dbl> <chr> <dbl>
## 1 1 Alal~ 42.1 9.51 most Greek 1.12 100-~ 0
dim(ober)

## [1] 1035 10
From your tutorials, you also know how to do graphics! Graphics are useful for grasping
your data, but we will cover them more deeply in Chapter 9.
ggplot(ober, aes(x = Fame)) + geom_histogram()
[Figure: histogram of the Fame variable (count on the y-axis, Fame on the x-axis), produced by the code above.]
These tidyverse commands from the dplyr package are newer and not built-in, but they
are one of the increasingly more popular ways to wrangle data.
• 80 percent of your data wrangling needs might be doable with these basic dplyr
functions: select, mutate, group_by, summarize, and arrange.
• These verbs roughly correspond to the same commands in SQL, another important
language in data science.
• The %>% symbol is a pipe. It takes the thing on the left side and pipes it down to the
function on the right side. We could have done count(cen10, race) as cen10 %>%
count(race). That means take cen10 and pass it on to the function count, which
will count observations by race and return a collapsed dataset with the categories in
its own variable and their respective counts in n.
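As a quick illustration of that equivalence (both lines produce the same tabulation):

# these two lines do the same thing
count(cen10, race)
cen10 %>% count(race)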
Although this is a bit beyond our current stage, it’s hard to resist the temptation to see
what you can do with data like this. For example, you can map it.6
6 Inmid-2018, changes in Google’s services made it no longer possible to render maps on the fly. Therefore,
the map is not currently rendered automatically (but can be rendered once the user registers their API).
I chose the specifications for arguments zoom and maptype by looking at the webpage and
Googling some examples.
Ober’s data has the latitude and longitude of each polis. Because the map of Greece uses the same coordinate system, we can add the poleis to the same map.
gg_ober <- ggmap(greece) +
geom_point(data = ober,
aes(y = Latitude, x = Longitude),
size = 0.5,
color = "orange")
gg_ober +
scale_x_continuous(limits = c(10, 35)) +
scale_y_continuous(limits = c(32, 44)) +
theme_void()
Exercises
Make a scatterplot with the number of colonies on the x-axis and Fame on the y-axis.
Instead, you now need to register with Google. See the change to the package ggmap.
Figure 7.6
Figure 7.7
# Enter here
Find the correct function to read the following datasets (available in your rstudio.cloud
session) into your R window.
Read Ober’s codebook and find a variable that you think is interesting. Check the distribu-
tion of that variable in your data, get a couple of statistics, and summarize it in English.
# Enter here
This is day 1 and we covered a lot of material. Some of you might have found this completely
new; others not so. Please click through this survey before you leave so we can adjust
accordingly on the next few days.
https://harvard.az1.qualtrics.com/jfe/form/SV_8As7Y7C83iBiQzH
Chapter 8
Manipulating Vectors and Matrices1
## # A tibble: 6 x 4
## state sex age race
## <chr> <chr> <dbl> <chr>
## 1 New York Female 8 White
## 2 Ohio Male 24 White
## 3 Nevada Male 37 White
## 4 Michigan Female 12 White
1 Module originally written by Shiro Kuriwaki and Yon Soo Park
dim(cen10)

## [1] 30871 4
nrow(cen10)
## [1] 30871
ncol(cen10)
## [1] 4
What variables does this dataset hold? What kind of information does it have?
colnames(cen10)
## [1] "state" "sex"   "age"   "race"
Matrices are rectangular structures in which every entry must be of the same type (typically numbers; a matrix cannot mix numbers and characters).
A cross-tab can be considered a matrix:
table(cen10$race, cen10$sex)
##
## Female Male
## American Indian or Alaska Native 142 153
## Black/Negro 2070 1943
## Chinese 192 162
## Japanese 51 26
## Other Asian or Pacific Islander 587 542
## Other race, nec 877 962
## Three or more major races 37 51
## Two major races 443 426
## White 11252 10955
cross_tab <- table(cen10$race, cen10$sex)
dim(cross_tab)
## [1] 9 2
cross_tab[6, 2]
## [1] 962
But a subset of your data – individual values – can be considered a matrix too.
# First 20 rows of the entire data
# Below two lines of code do the same thing
cen10[1:20, ]
## # A tibble: 20 x 4
## state sex age race
## <chr> <chr> <dbl> <chr>
## 1 New York Female 8 White
## 2 Ohio Male 24 White
## 3 Nevada Male 37 White
## 4 Michigan Female 12 White
## 5 Maryland Female 18 Black/Negro
## 6 New Hampshire Male 50 White
## 7 Iowa Female 51 White
## 8 Missouri Female 41 White
## 9 New Jersey Male 62 White
cen10 %>% slice(1:20)
## # A tibble: 20 x 4
## state sex age race
## <chr> <chr> <dbl> <chr>
## 1 New York Female 8 White
## 2 Ohio Male 24 White
## 3 Nevada Male 37 White
## 4 Michigan Female 12 White
## 5 Maryland Female 18 Black/Negro
## 6 New Hampshire Male 50 White
## 7 Iowa Female 51 White
## 8 Missouri Female 41 White
## 9 New Jersey Male 62 White
## 10 California Male 25 White
## 11 Texas Female 23 White
## 12 Pennsylvania Female 66 White
## 13 California Female 57 White
## 14 Texas Female 73 Other race, nec
## 15 California Male 43 White
## 16 Washington Male 29 White
## 17 Texas Male 8 White
## 18 Missouri Male 78 White
## 19 West Virginia Male 10 White
## 20 Idaho Female 9 White
# Of the first 20 rows of the entire data, look at values of just race and age
# Below two lines of code do the same thing
cen10[1:20, c("race", "age")]
## # A tibble: 20 x 2
## race age
## <chr> <dbl>
## 1 White 8
## 2 White 24
## 3 White 37
## 4 White 12
## 5 Black/Negro 18
## 6 White 50
## 7 White 51
## 8 White 41
## 9 White 62
## 10 White 25
## 11 White 23
## 12 White 66
## 13 White 57
## 14 Other race, nec 73
## 15 White 43
## 16 White 29
## 17 White 8
## 18 White 78
## 19 White 10
## 20 White 9
cen10 %>% slice(1:20) %>% select(race, age)
## # A tibble: 20 x 2
## race age
## <chr> <dbl>
## 1 White 8
## 2 White 24
## 3 White 37
## 4 White 12
## 5 Black/Negro 18
## 6 White 50
## 7 White 51
## 8 White 41
## 9 White 62
## 10 White 25
## 11 White 23
## 12 White 66
## 13 White 57
## 14 Other race, nec 73
## 15 White 43
## 16 White 29
## 17 White 8
## 18 White 78
## 19 White 10
## 20 White 9
A vector is a special type of matrix with only one column or only one row.
# One column
cen10[1:10, c("age")]
## # A tibble: 10 x 1
## age
## <dbl>
## 1 8
## 2 24
## 3 37
## 4 12
## 5 18
## 6 50
## 7 51
## 8 41
## 9 62
## 10 25
cen10 %>% slice(1:10) %>% select(c("age"))
## # A tibble: 10 x 1
## age
## <dbl>
## 1 8
## 2 24
## 3 37
## 4 12
## 5 18
## 6 50
## 7 51
## 8 41
## 9 62
## 10 25
# One row
cen10[2, ]
## # A tibble: 1 x 4
## state sex age race
## <chr> <chr> <dbl> <chr>
## 1 Ohio Male 24 White
cen10 %>% slice(2)
## # A tibble: 1 x 4
## state sex age race
## <chr> <chr> <dbl> <chr>
## 1 Ohio Male 24 White
What if we want a special subset of the data? For example, what if I only want the records of
individuals in California? What if I just want the age and race of individuals in California?
# subset for CA rows
ca_subset <- cen10[cen10$state == "California", ]
ca_subset_tidy <- cen10 %>% filter(state == "California")
all_equal(ca_subset, ca_subset_tidy)
## [1] TRUE
# subset for CA rows and select age and race
ca_subset_age_race <- cen10[cen10$state == "California", c("age", "race")]
ca_subset_age_race_tidy <- cen10 %>% filter(state == "California") %>% select(age, race)
all_equal(ca_subset_age_race, ca_subset_age_race_tidy)
## [1] TRUE
Here are some common operators that can be used to filter rows or to build a condition. Remember, you can use the unique function to look at the set of all values a variable holds in the dataset.
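For example, a quick illustration of unique on one of the columns:

unique(cen10$sex)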
# all individuals older than 30 and younger than 70
s1 <- cen10[cen10$age > 30 & cen10$age < 70, ]
s2 <- cen10 %>% filter(age > 30 & age < 70)
all_equal(s1, s2)
## [1] TRUE
# all individuals in either New York or California
s3 <- cen10[cen10$state == "New York" | cen10$state == "California", ]
s4 <- cen10 %>% filter(state == "New York" | state == "California")
all_equal(s3, s4)
## [1] TRUE
# all individuals in any of the following states: California, Ohio, Nevada, Michigan
s5 <- cen10[cen10$state %in% c("California", "Ohio", "Nevada", "Michigan"), ]
s6 <- cen10 %>% filter(state %in% c("California", "Ohio", "Nevada", "Michigan"))
all_equal(s5, s6)
## [1] TRUE
# all individuals NOT in any of the following states: California, Ohio, Nevada, Michigan
s7 <- cen10[!(cen10$state %in% c("California", "Ohio", "Nevada", "Michigan")), ]
s8 <- cen10 %>% filter(!state %in% c("California", "Ohio", "Nevada", "Michigan"))
all_equal(s7, s8)
## [1] TRUE
Checkpoint
Get the subset of cen10 for non-white individuals (Hint: look at the set of values for the
race variable by using the unique function)
# Enter here
Get all the serial numbers for black, male individuals who don’t live in Ohio or Nevada.
# Enter here
You can think of data frames as matrices-plus, because a column can take on characters as well as numbers. As we just saw, this is often useful for real data analyses.
cen10
## # A tibble: 30,871 x 4
## state sex age race
## <chr> <chr> <dbl> <chr>
## 1 New York Female 8 White
## 2 Ohio Male 24 White
## 3 Nevada Male 37 White
## 4 Michigan Female 12 White
## 5 Maryland Female 18 Black/Negro
## 6 New Hampshire Male 50 White
## 7 Iowa Female 51 White
## 8 Missouri Female 41 White
## 9 New Jersey Male 62 White
## 10 California Male 25 White
## # ... with 30,861 more rows
Another way to think about a data frame is as a type of list. Try the str() code below and notice how it is organized in slots. Each slot is a vector. The slots can be vectors of numbers or of characters.
# enter this on your console
str(cen10)
8.2 Motivation
Nunn and Wantchekon (2011) – “The Slave Trade and the Origins of Mistrust in Africa”2 – argue that, across African countries, the distrust of co-ethnics fueled by the slave trade has had long-lasting effects on modern-day trust in these territories. They argue that the slave trade created distrust in these societies in part because some African groups were employed by European traders to capture their neighbors and bring them to the slave ships.
Nunn and Wantchekon use a variety of statistical tools to make their case (adding controls, ordered logit, instrumental variables, falsification tests, causal mechanisms), many of which will be covered in future courses. In this module we will only touch on their first set of analyses, which use Ordinary Least Squares (OLS). OLS is likely the most common application of linear algebra in the social sciences. We will cover some linear algebra, matrix manipulation, and vector manipulation using this data.
library(haven)
nunn_full <- read_dta("data/input/Nunn_Wantchekon_AER_2011.dta")
Nunn and Wantchekon’s main dataset has more than 20,000 observations. Each observation
is a respondent from the Afrobarometer survey.
head(nunn_full)
## # A tibble: 6 x 59
## respno ethnicity murdock_name isocode region district townvill
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 BEN00~ fon FON BEN atlna~ KPOMASSE TOKPA-D~
## 2 BEN00~ fon FON BEN atlna~ KPOMASSE TOKPA-D~
## 3 BEN00~ fon FON BEN atlna~ OUIDAH 3ARROND
## 4 BEN00~ fon FON BEN atlna~ OUIDAH 3ARROND
## 5 BEN00~ fon FON BEN atlna~ OUIDAH PAHOU
## 6 BEN00~ fon FON BEN atlna~ OUIDAH PAHOU
## # ... with 52 more variables: location_id <dbl>, trust_relatives <dbl>,
## # trust_neighbors <dbl>, intra_group_trust <dbl>,
## # inter_group_trust <dbl>, trust_local_council <dbl>,
## # ln_export_area <dbl>, export_area <dbl>, export_pop <dbl>,
## # ln_export_pop <dbl>, age <dbl>, age2 <dbl>, male <dbl>,
## # urban_dum <dbl>, occupation <dbl>, religion <dbl>,
## # living_conditions <dbl>, education <dbl>, near_dist <dbl>,
## # distsea <dbl>, loc_murdock_name <chr>, loc_ln_export_area <dbl>,
## # local_council_performance <dbl>, council_listen <dbl>,
## # corrupt_local_council <dbl>, school_present <dbl>,
## # electricity_present <dbl>, piped_water_present <dbl>,
2 Nunn,Nathan, and Leonard Wantchekon. 2011. “The Slave Trade and the Origins of Mistrust in Africa.”
American Economic Review 101(7): 3221–52.
## # A tibble: 10 x 5
## trust_neighbors exports ln_exports export_area ln_export_area
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 3 0.388 0.328 0.00407 0.00406
## 2 3 0.631 0.489 0.0971 0.0926
## 3 3 0.994 0.690 0.0125 0.0124
## 4 0 183. 5.21 1.82 1.04
## 5 3 0 0 0 0
## 6 2 0 0 0 0
## 7 2 666. 6.50 14.0 2.71
## 8 0 0.348 0.298 0.00608 0.00606
## 9 3 0.435 0.361 0.0383 0.0376
## 10 3 0 0 0 0
## [1] 10
data.frames and matrices have much overlap in R, but to explicitly treat an object as a
matrix, you’d need to coerce its class. Let’s call this matrix X.
X <- as.matrix(nunn)
What is the difference between a data.frame and a matrix? A data.frame can have
columns that are of different types, whereas — in a matrix — all columns must be of the
same type (usually either “numeric” or “character”).
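A small toy illustration of this difference (hypothetical data, not from the module):

df <- data.frame(x = 1:2, y = c("a", "b"))
as.matrix(df)   # every entry is coerced to character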
system.time(replicate(50000, colMeans(Xmat)))
## [1] 3 3 3 0 3 2 2 0 3 3
What are all the values of “exports”? (i.e. return the whole “exports” column)
X[, "exports"]
## trust_neighbors
## 3
Pause and consider the following problem on your own. What is the following code doing?
X[X[, "trust_neighbors"] == 0, "export_area"]
## [4,] 0 0 0 0 0
## [5,] 0 0 0 0 0
## [6,] 0 0 0 0 0
## [7,] 0 0 0 0 0
## [8,] 0 0 0 0 0
## [9,] 0 0 0 0 0
## [10,] 0 0 0 0 0
t(X) %*% X
cbind(X, 1)
exports is the total number of slaves that were taken from the individual’s ethnic group across Africa’s four slave trades between 1400 and 1900.
What is ln_exports? The article describes this as the natural log of one plus the exports. This is a transformation of one column by a particular function:

log(1 + X[, "exports"])
In Table 1 we see “OLS Estimates”. These are estimates of OLS coefficients and standard errors. You do not need to know what these are for now, but it doesn’t hurt to get used to seeing them.
A very crude way to describe regression is through linear combinations. The simplest linear
combination is a one-to-one transformation.
Take the first number in Table 1, which is -0.00068. Now, multiply this by exports
-0.00068 * X[, "exports"]
Figure 8.1
Now, just one more step. Make a new matrix with just exports and the value 1.

X2 <- cbind(1, X[, "exports"])
colnames(X2)

## NULL

colnames(X2) <- c("intercept", "exports")
dim(X2)

## [1] 10 2

Also make a 2-by-1 matrix (a column vector) B holding an intercept of 1.62 and the exports coefficient -0.00068.

B <- matrix(c(1.62, -0.00068), ncol = 1)
dim(B)

## [1] 2 1
What is the product of X2 and B? From the dimensions, can you tell if it will be conformable?
X2 %*% B
## [,1]
## [1,] 1.619736
## [2,] 1.619571
## [3,] 1.619324
## [4,] 1.495839
## [5,] 1.620000
## [6,] 1.620000
## [7,] 1.167144
## [8,] 1.619764
## [9,] 1.619704
## [10,] 1.620000
Exercises
Let
\[
A = \begin{pmatrix} 0.6 & 0.2 \\ 0.4 & 0.8 \end{pmatrix}
\]
Use R to write code that will create the matrix A, and then consecutively multiply A by itself 4 times. What is the value of A^4?
## Enter yourself
Note that R’s notation for matrices is different from the math notation. Simply trying X^n where X is a matrix will only take each element of X to the power n. Instead, this problem asks you to perform matrix multiplication.
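To see the difference, here is a sketch with a small hypothetical matrix M (not the exercise matrix A):

M <- matrix(c(1, 2, 3, 4), nrow = 2)
M^2       # element-wise: each entry is squared
M %*% M   # matrix multiplication: M times M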
Let’s apply what we learned about subsetting or filtering/selecting. Use the nunn_full
dataset you have already loaded
a) First, show all observations (rows) that have a "male" variable higher than 0.5
## Enter yourself
b) Next, create a matrix / dataframe with only two columns: "trust_neighbors" and
"age"
## Enter yourself
c) Lastly, show all values of "trust_neighbors" and "age" for observations (rows) that
have the “male” variable value that is higher than 0.5
## Enter yourself
Find a way to generate a vector of “column averages” of the matrix X from the Nunn and Wantchekon data in one line of code. Each entry in the vector should contain the sample average of the values in that column. So a 100 by 4 matrix should generate a length-4 vector.
## (Intercept) exports
## 1.619913e+00 -6.791360e-04
## age age2
## 8.395936e-03 -5.473436e-05
## male urban_dum
## 4.550246e-02 -1.404551e-01
## factor(education)1 factor(education)2
## 1.709816e-02 -5.224591e-02
## factor(education)3 factor(education)4
## -1.373770e-01 -1.889619e-01
## factor(education)5 factor(education)6
## -1.893494e-01 -2.400767e-01
## factor(education)7 factor(education)8
## -2.850748e-01 -1.232085e-01
## factor(education)9 factor(occupation)1
## -2.406437e-01 6.185655e-02
## factor(occupation)2 factor(occupation)3
## 7.392168e-02 3.356158e-02
## factor(occupation)4 factor(occupation)5
## 7.942048e-03 6.661126e-02
## factor(occupation)6 factor(occupation)7
## -7.563297e-02 1.699699e-02
## factor(occupation)8 factor(occupation)9
## -9.428177e-02 -9.981440e-02
## factor(occupation)10 factor(occupation)11
## -3.307068e-02 -2.300045e-02
## factor(occupation)12 factor(occupation)13
## -1.564540e-01 -1.441370e-02
## factor(occupation)14 factor(occupation)15
## -5.566414e-02 -2.343762e-01
## factor(occupation)16 factor(occupation)18
## -1.306947e-02 -1.729589e-01
## factor(occupation)19 factor(occupation)20
## -1.770261e-01 -2.457800e-02
## factor(occupation)21 factor(occupation)22
## -4.936813e-02 -1.068511e-01
## factor(occupation)23 factor(occupation)24
## -9.712205e-02 1.292371e-02
## factor(occupation)25 factor(occupation)995
## 2.623186e-02 -1.195063e-03
## factor(religion)2 factor(religion)3
## 5.395953e-02 7.887878e-02
## factor(religion)4 factor(religion)5
## 4.749150e-02 4.318455e-02
## factor(religion)6 factor(religion)7
## -1.787694e-02 -3.616542e-02
## factor(religion)10 factor(religion)11
## 6.015041e-02 2.237845e-01
## factor(religion)12 factor(religion)13
## 2.627086e-01 -6.812813e-02
## factor(religion)14 factor(religion)15
## 4.673681e-02 3.844555e-01
## factor(religion)360 factor(religion)361
## 3.656843e-01 3.416413e-01
## factor(religion)362 factor(religion)363
## 8.230393e-01 3.856565e-01
## factor(religion)995 factor(living_conditions)2
## 4.161301e-02 4.395862e-02
## factor(living_conditions)3 factor(living_conditions)4
## 8.627372e-02 1.197428e-01
## factor(living_conditions)5 district_ethnic_frac
## 1.203606e-01 -1.553648e-02
## frac_ethnicity_in_district isocodeBWA
## 1.011222e-01 -4.258953e-01
## isocodeGHA isocodeKEN
## 1.135307e-02 -1.819556e-01
## isocodeLSO isocodeMDG
## -5.511200e-01 -3.315727e-01
## isocodeMLI isocodeMOZ
## 7.528101e-02 8.223730e-02
## isocodeMWI isocodeNAM
## 3.062497e-01 -1.397541e-01
## isocodeNGA isocodeSEN
## -2.381525e-01 3.867371e-01
## isocodeTZA isocodeUGA
## 2.079366e-01 -6.443732e-02
## isocodeZAF isocodeZMB
## -2.179153e-01 -2.172868e-01
First, get a small subset of the nunn_full dataset. This time, sample 20 rows and se-
lect for variables exports, age, age2, male, and urban_dum. To this small subset, add
(bind_cols() in tidyverse or cbind() in base R) a column of 1’s; this represents the inter-
cept. If you need some guidance, look at how we sampled 10 rows selected for a different
set of variables above in the lecture portion.
# Enter here
Next let’s try calculating predicted values of levels of trust in neighbors by multiplying the coefficients for the intercept, exports, age, age2, male, and urban_dum by the actual observed
values for those variables in the small subset you’ve just created.
# Hint: You can get just selected elements from the vector returned by coef(lm_1_1)
# For example, the below code gives you the first 3 elements of the original vector
coef(lm_1_1)[1:3]
## (Intercept) male
## 1.61991315 0.04550246
Chapter 9
Visualization1
Why care about the Census? The Census is one of the fundamental acts of a government.
See the Law Review article by Persily (2011), “The Law of the Census.”2 The Census is
government’s primary tool for apportionment (allocating seats to districts), appropriations
(allocating federal funding), and tracking demographic change. See, for example, Hochschild and Powell (2008)3 on how the categorizations of race in the Census changed during 1850-1930. Notice also that neither of these pieces is inherently “quantitative” — the Persily article is in a law review and the Hochschild and Powell article is on American political development — but data analysis is certainly relevant to both.
Time series data is a common form of data in social science, and there is growing methodological work on making causal inferences with time series.4 We will use the ideological estimates of the Supreme Court, which has been in the news with Brett Kavanaugh’s nomination.
First, the census. Read in a subset of the 2010 Census that we looked at earlier. This time,
it is in Rds form.
cen10 <- readRDS("data/input/usc2010_001percent.Rds")
The data comes from IPUMS5 , a great source to extract and analyze Census and Census-
conducted survey (ACS, CPS) data.
2 Persily,Nathaniel. 2011. “The Law of the Census: How to Count, What to Count, Whom to Count, and
Where to Count Them.”. Cardozo Law Review 32(3): 755–91.
3 Hochschild, Jennifer L., and Brenna Marea Powell. 2008. “Racial Reorganization and the United States
Census 1850–1930: Mulattoes, Half-Breeds, Mixed Parentage, Hindoos, and the Mexican Race.”. Studies
in American Political Development 22(1): 59–96.
4 Blackwell, Matthew, and Adam Glynn. 2018. “How to Make Causal Inferences with Time-Series Cross-
9.3 Counting
nrow(cen10)

## [1] 30871
This and all subsequent tasks involve manipulating and summarizing data, sometimes called
“wrangling”. As per last time, there are both “base-R” and “tidyverse” approaches.
Yesterday we saw several functions from the tidyverse:
• select selects columns
• filter selects rows based on a logical (boolean) statement
• slice selects rows based on the row number
• arrange reorders the rows, by default in ascending order of the given variable.
In this visualization section, we’ll make use of the pair of functions group_by() and
summarize().
9.4 Tabulating
Summarizing data is the key part of communication; good data viz gets the point across.6
Summaries of data come in two forms: tables and figures.
Here are two ways to count by group, or to tabulate.
In base-R, use the table function, which reports how many rows exist for each unique value of the vector (remember unique from yesterday?):
table(cen10$race)
##
## American Indian or Alaska Native Black/Negro
## 295 4013
## Chinese Japanese
## 354 77
## Other Asian or Pacific Islander Other race, nec
## 1129 1839
## Three or more major races Two major races
## 88 869
## White
## 22207
With tidyverse, a quick convenience function is count, with the variable to count on included.
count(cen10, race)
6 Kastellec, Jonathan P., and Eduardo L. Leoni. 2007. “Using Graphs Instead of Tables in Political Science.”
Perspectives on Politics 5 (4): 755–71.
## # A tibble: 9 x 2
## race n
## <chr> <int>
## 1 American Indian or Alaska Native 295
## 2 Black/Negro 4013
## 3 Chinese 354
## 4 Japanese 77
## 5 Other Asian or Pacific Islander 1129
## 6 Other race, nec 1839
## 7 Three or more major races 88
## 8 Two major races 869
## 9 White 22207
We can check out the arguments of count and see that there is a sort option. What does
this do?
count(cen10, race, sort = TRUE)
## # A tibble: 9 x 2
## race n
## <chr> <int>
## 1 White 22207
## 2 Black/Negro 4013
## 3 Other race, nec 1839
## 4 Other Asian or Pacific Islander 1129
## 5 Two major races 869
## 6 Chinese 354
## 7 American Indian or Alaska Native 295
## 8 Three or more major races 88
## 9 Japanese 77
count is a kind of shorthand for group_by() and summarize. This code would have done
the same.
cen10 %>%
group_by(race) %>%
summarize(n = n())
## # A tibble: 9 x 2
## race n
## <chr> <int>
## 1 American Indian or Alaska Native 295
## 2 Black/Negro 4013
## 3 Chinese 354
## 4 Japanese 77
## 5 Other Asian or Pacific Islander 1129
## 6 Other race, nec 1839
## 7 Three or more major races 88
## 8 Two major races 869
## 9 White 22207
where n() is a function that counts rows. If you are new to the tidyverse, what would you think each line did? Read the function help pages and verify whether your intuition was correct.
Two prevalent ways of making graphics are referred to as “base-R” and “ggplot”.
9.5.1 base R
“Base-R” graphics are graphics that are made with R’s default graphics commands. First,
let’s assign our tabulation to an object, then put it in the barplot() function.
barplot(table(cen10$race))
[Figure: base-R barplot of the counts by race, produced by the code above.]
9.5.2 ggplot
A popular alternative is ggplot graphics, which you were introduced to in the tutorial. gg stands for the grammar of graphics, implemented by Hadley Wickham, and it provides a new grammar for describing graphics in R. Again, first let’s set up the data.
Although the tutorial covered making scatter plots as the first cut, data often requires summarizing before it is made into graphs.
For this example, let’s group and count first like we just did. But assign it to a new object.
grp_race <- count(cen10, race)
We will now plot this grouped set of numbers. Recall that the ggplot() function takes two
main arguments, data and aes.
1. First enter a single dataframe from which you will draw a plot.
2. Then enter the aes, or aesthetics. This defines which variables in the data the plotting functions should map to pre-set dimensions in the graphic. The dimensions x and y are the most important. We will assign race and count to them, respectively.
3. After you close ggplot(), add layers with the plus sign. A geom is a layer of graphical representation; for example, geom_histogram renders a histogram and geom_point renders a scatter plot. For a barplot, we can use geom_col(), as sketched below.
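A minimal sketch of the plot those three steps describe, using the grp_race object created above (the exact styling of the compiled figure may differ):

ggplot(data = grp_race, aes(x = race, y = n)) +
  geom_col()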
[Figure: bar plot of counts (n) by race category.]
Notice that we applied the sort() function to order the bars in terms of their counts. The
default ordering of a categorical variable / factor is alphabetical. Alphabetical ordering is
uninformative and almost never the way you should order variables.
In ggplot you might do this by reordering the factor with forcats:

library(forcats)

ggplot(grp_race, aes(x = fct_reorder(race, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(y = "Number in Race Category",
       x = "",
       caption = "Source: 2010 U.S. Census sample")
[Figure: horizontal bar plot with race categories ordered by their counts.]
The data-ink ratio was popularized by Ed Tufte (originally a political economy scholar who has recently become well known for his data visualization work). See Tufte (2001), The Visual Display of Quantitative Information, and his website https://www.edwardtufte.com/tufte/. For an R and ggplot focused treatment using social science examples, check out Healy (2018), Data Visualization: A Practical Introduction, with a draft at https://socviz.co/.7 There are a growing number of excellent books on data visualization.
9.7 Cross-tabs
Visualizations and Tables each have their strengths. A rule of thumb is that more than a
dozen numbers on a table is too much to digest, but less than a dozen is too few for a figure
to be worth it. Let’s look at a table first.
A cross-tab is counting with two types of variables, and is a simple and powerful tool to
show the relationship between multiple variables.
7 Healy, Kieran. forthcoming. Data Visualization: A Practical Introduction. Princeton University Press
xtab_race_state <- table(cen10$state, cen10$race)
xtab_race_state
##
## American Indian or Alaska Native Black/Negro
## Alabama 2 128
## Alaska 11 6
## Arizona 28 23
## Arkansas 1 45
## California 42 253
## Colorado 7 26
## Connecticut 1 39
## Delaware 3 28
## District of Columbia 0 35
## Florida 9 304
## Georgia 2 304
## Hawaii 0 0
## Idaho 2 0
## Illinois 5 194
## Indiana 2 66
## Iowa 0 9
## Kansas 2 24
## Kentucky 2 35
## Louisiana 3 161
## Maine 0 4
## Maryland 2 177
## Massachusetts 5 38
## Michigan 5 147
## Minnesota 6 25
## Mississippi 1 116
## Missouri 4 74
## Montana 8 0
## Nebraska 2 11
## Nevada 6 15
## New Hampshire 1 1
## New Jersey 0 130
## New Mexico 21 3
## New York 13 305
## North Carolina 12 220
## North Dakota 4 1
## Ohio 1 122
## Oklahoma 21 20
## Oregon 5 5
## Pennsylvania 2 156
## Rhode Island 2 3
## South Carolina 2 120
## South Dakota 7 1
## Tennessee 0 97
## Texas 14 316
## Utah 8 0
## Vermont 0 2
## Virginia 0 171
## Washington 14 20
## West Virginia 0 5
## Wisconsin 6 27
## Wyoming 1 1
##
## Chinese Japanese Other Asian or Pacific Islander
## Alabama 1 0 3
## Alaska 0 0 5
## Arizona 1 0 12
## Arkansas 0 0 1
## California 141 27 359
## Colorado 3 0 10
## Connecticut 7 0 16
## Delaware 1 0 4
## District of Columbia 0 0 1
## Florida 4 1 24
## Georgia 5 0 35
## Hawaii 2 16 35
## Idaho 0 1 0
## Illinois 6 3 53
## Indiana 3 0 8
## Iowa 1 0 4
## Kansas 2 0 8
## Kentucky 2 0 4
## Louisiana 1 0 5
## Maine 1 0 1
## Maryland 4 1 12
## Massachusetts 15 2 28
## Michigan 8 1 23
## Minnesota 5 1 28
## Mississippi 0 0 3
## Missouri 2 0 9
## Montana 0 0 0
## Nebraska 0 0 5
## Nevada 6 2 15
## New Hampshire 1 1 3
## New Jersey 19 2 65
## New Mexico 1 1 1
## New York 55 3 68
## North Carolina 4 1 12
## North Dakota 0 0 0
## Ohio 5 2 17
## Oklahoma 0 0 5
## Oregon 4 0 11
## Pennsylvania 10 1 28
## Rhode Island 0 0 4
## South Carolina 1 0 4
## South Dakota 0 1 1
## Tennessee 0 0 13
## Texas 15 2 92
## Utah 1 1 6
## Vermont 0 0 0
## Virginia 8 2 29
## Washington 9 4 46
## West Virginia 0 0 0
## Wisconsin 0 1 11
## Wyoming 0 0 2
##
## Other race, nec Three or more major races
## Alabama 8 1
## Alaska 2 0
## Arizona 74 2
## Arkansas 11 1
## California 585 14
## Colorado 28 1
## Connecticut 20 1
## Delaware 5 1
## District of Columbia 1 0
## Florida 72 2
## Georgia 35 1
## Hawaii 0 14
## Idaho 8 1
## Illinois 75 2
## Indiana 20 1
## Iowa 10 0
## Kansas 6 0
## Kentucky 5 1
## Louisiana 7 0
## Maine 0 0
## Maryland 28 1
## Massachusetts 26 0
## Michigan 8 2
## Minnesota 13 1
## Mississippi 2 2
## Missouri 6 2
## Montana 1 0
## Nebraska 6 0
## Nevada 41 1
## New Hampshire 1 0
## New Jersey 69 3
## New Mexico 23 1
## New York 154 8
## North Carolina 40 2
## North Dakota 0 0
## Ohio 7 3
## Oklahoma 15 3
## Oregon 21 4
## Pennsylvania 30 1
## Rhode Island 6 1
## South Carolina 6 1
## South Dakota 2 0
## Tennessee 13 0
## Texas 253 2
## Utah 14 0
## Vermont 1 0
## Virginia 29 4
## Washington 37 2
## West Virginia 0 0
## Wisconsin 13 1
## Wyoming 2 0
##
## Two major races White
## Alabama 8 344
## Alaska 15 37
## Arizona 11 485
## Arkansas 2 247
## California 174 2168
## Colorado 22 401
## Connecticut 7 284
## Delaware 1 66
## District of Columbia 2 21
## Florida 42 1435
## Georgia 21 587
## Hawaii 27 39
## Idaho 6 129
## Illinois 35 856
## Indiana 6 514
## Iowa 8 287
## Kansas 8 237
## Kentucky 9 357
## Louisiana 6 273
## Maine 1 117
## Maryland 13 302
## Massachusetts 18 515
## Michigan 23 792
## Minnesota 10 483
## Mississippi 1 167
## Missouri 14 516
## Montana 0 88
## Nebraska 5 155
## Nevada 16 171
## New Hampshire 1 129
## New Jersey 25 589
## New Mexico 6 146
## New York 51 1220
## North Carolina 20 648
## North Dakota 1 46
## Ohio 20 931
## Oklahoma 24 266
## Oregon 9 279
## Pennsylvania 27 1045
## Rhode Island 4 74
## South Carolina 6 325
## South Dakota 2 72
## Tennessee 9 474
## Texas 71 1792
## Utah 8 255
## Vermont 4 59
## Virginia 24 548
## Washington 33 524
## West Virginia 3 168
## Wisconsin 8 497
## Wyoming 2 47
Another function to make a cross-tab is the xtabs command, which uses formula notation.
xtabs(~ state + race, cen10)
What if we care about proportions within states, rather than counts? Say we’d like to compare the racial composition of a small state (like Delaware) and a large state (like California). In fact, most tasks of inference are about the unobserved population, not the observed data — and proportions are estimates of a quantity in the population.
One way to transform a table of counts to a table of proportions is the function prop.table.
Be careful what you want to take proportions of – this is set by the margin argument. In
R, the first margin (margin = 1) is rows and the second (margin = 2) is columns.
ptab_race_state <- prop.table(xtab_race_state, margin = 2)
Check out each of these table objects in your console and familiarize yourself with the
difference.
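To see what margin does, here is a toy example with a hypothetical 2 x 2 table:

m <- matrix(c(1, 3, 2, 4), nrow = 2)
prop.table(m, margin = 1)  # each row sums to 1
prop.table(m, margin = 2)  # each column sums to 1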
How would you make the same figure with ggplot()? First, we want a count for each state
× race combination. So group by those two factors and count how many observations are
in each two-way categorization. group_by() can take any number of variables, separated
by commas.
grp_race_state <- cen10 %>%
  group_by(state, race) %>%
  summarize(n = n())
Can you tell from the code what grp_race_state will look like?
# run on your own
grp_race_state
Now, we want to tell ggplot2 something like the following: I want bars by state, where
heights indicate racial groups. Each bar should be colored by the race. With some googling,
you will get something like this:
ggplot(data = grp_race_state, aes(x = state, y = n, fill = race)) +
  geom_col(position = "fill") + # the position is determined by the fill aesthetic
  scale_fill_brewer(name = "Census Race", palette = "OrRd", direction = -1) + # choose palette
  coord_flip() + # flip axes
  scale_y_continuous(labels = scales::percent) + # label numbers as percentages
  labs(y = "Proportion of Racial Group within State",
       x = "",
       caption = "Source: 2010 Census sample") +
  theme_minimal()
[Figure: horizontal stacked bar chart of the proportion of each Census race category within each state; each state's bar sums to 100%. Legend: Census Race; x-axis: Proportion of Racial Group within State.]
9.9 Line graphs
This data is adapted from the estimates of Martin and Quinn on their website http://mqscores.lsa.umich.edu/.8
justice <- read_csv("data/input/justices_court-median.csv")
What does the data look like? How do you think it is organized? What does each row
represent?
justice
## # A tibble: 746 x 7
## term justice_id justice idealpt idealpt_sd median_idealpt
## <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
## 1 1937 67 McReyn~ 3.44 0.54 -0.568
## 2 1937 68 Brande~ -0.612 0.271 -0.568
## 3 1937 71 Suther~ 1.59 0.549 -0.568
## 4 1937 72 Butler 2.06 0.426 -0.568
## 5 1937 74 Stone -0.774 0.259 -0.568
## 6 1937 75 Hughes2 -0.368 0.232 -0.568
## 7 1937 76 O. Rob~ 0.008 0.228 -0.568
## 8 1937 77 Cardozo -1.59 0.634 -0.568
## 9 1937 78 Black -2.90 0.334 -0.568
## 10 1937 79 Reed -1.06 0.342 -0.568
## # ... with 736 more rows, and 1 more variable: median_justice <chr>
As you might have guessed, these data can be shown as a time trend over the range of the term variable. As there are only nine justices at any given time and justices have life tenure, their times on the Court are staggered. With a common measure of “preference”, we can plot the time trends of these justices’ ideal points on the same y-axis scale.
ggplot(justice, aes(x = term, y = idealpt)) +
geom_line()
[Figure: line plot of idealpt against term, drawn without a group aesthetic.]
Why does the above graph not look like the plot shown at the beginning? Fix it by adding just one aesthetic to the graph.
# enter a correction that draws separate lines by group.
If you got the right aesthetic, this seems to “work” off the shelf. But take a moment to see
why the code was written as it is and how that maps on to the graphics. What is the group
aesthetic doing for you?
Now, this graphic already indicates a lot, but let’s improve it so people can actually read it. This is left as an Exercise.
As social scientists, we should also not forget to ask ourselves whether these numerical measures are fit for what we care about, or actually succeed in measuring what we’d like to measure. The estimation of these “ideal points” is a subfield of political methodology beyond this prefresher. For more reading, skim through the original paper by Martin and Quinn (2002).9 Also, for a methodological discussion of the difficulty of measuring time series of preferences, check out Bailey (2013).10
9 Martin, Andrew D. and Kevin M. Quinn. 2002. “Dynamic Ideal Point Estimation via Markov Chain Monte
Carlo for the U.S. Supreme Court, 1953-1999”. Political Analysis. 10(2): 134-153.
10 Bailey, Michael A. 2013. “Is Today’s Court the Most Conservative in Sixty Years? Challenges and
Exercises
In the time remaining, try the following exercises. Order doesn’t matter.
1: Rural states
Make a well-labelled figure that plots the proportion of the state’s population (as per the
census) that is 65 years or older. Each state should be visualized as a point, rather than a
bar, and there should be 51 points, ordered by their value. All labels should be readable.
# Enter yourself
• Alternatively, you can instead plot the proportion of residents who do not reside in a specified city.
Using the justices_court-median.csv dataset and building off of the plot that was given,
make an improved plot by implementing as many of the following changes (which hopefully
improves the graph):
• Label axes
• Use a black-white background.
• Change the breaks of the x-axis to print numbers for every decade, not just every two
decades.
• Plots each line in translucent gray, so the overlapping lines can be visualized clearly.
(Hint: in ggplot the alpha argument controls the degree of transparency)
• Limit the scale of the y-axis to [-5, 5] so that the outlier justice in the 60s is trimmed
and the rest of the data can be seen more easily (also, who is that justice?)
• Plot the ideal point of the justice who holds the “median” ideal point in a given term.
To distinguish this with the others, plot this line separately in a very light red below
the individual justice’s lines.
• Highlight the trend-line of only the nine justices who are currently sitting on SCOTUS.
Make sure this is clearer than the other past justices.
• Add the current nine justice’s names to the right of the endpoint of the 2016 figure,
alongside their ideal point.
• Make sure the text labels do not overlap with each other for readability using the
ggrepel package.
• Extend the x-axis label to about 2020 so the text labels of justices are to the right of
the trend-lines.
• Add a caption to your text describing the data briefly, as well as any features relevant
for the reader (such as the median line and the trimming of the y-axis)
# Enter yourself
The figure we made that shows racial composition by state has one notable shortcoming: it orders the states alphabetically, which is not particularly useful if you want to see an overall pattern without having particular states in mind.
Find a way to modify the figure so that the states are ordered by the proportion of White residents in the sample.
# Enter yourself
As students of politics, our goal is not necessarily to make pretty pictures, but rather to make pictures that tell us something about politics, government, or society. If you could augment either the census dataset or the justices dataset in some way, what would be a substantively significant thing to show as a graphic?
Chapter 10
Objects, Functions, Loops
Now that we have covered some hands-on ways to use graphics, let’s go into some funda-
mentals of the R language.
Let’s first set up
library(dplyr)
library(readr)
library(haven)
library(ggplot2)
Objects are abstract symbols in which you store data. Here we will create an object called copy, and assign cen10 to it.

copy <- cen10
copy
## # A tibble: 30,871 x 4
## state sex age race
## <chr> <chr> <dbl> <chr>
## 1 New York Female 8 White
## 2 Ohio Male 24 White
## 3 Nevada Male 37 White
## 4 Michigan Female 12 White
## 5 Maryland Female 18 Black/Negro
## 6 New Hampshire Male 50 White
## 7 Iowa Female 51 White
## 8 Missouri Female 41 White
## 9 New Jersey Male 62 White
## 10 California Male 25 White
## # ... with 30,861 more rows
What happens if you do this next?
copy <- ""
It got reassigned:
copy
## [1] ""
10.1.1 lists
Lists are one of the most generic and flexible types of objects. You can make an empty list with the function list().
my_list <- list()
my_list
## list()
And start filling it in. Slots on the list are invoked by double square brackets [[]].
my_list[[1]] <- "contents of the first slot -- this is a string"
my_list[["slot 2"]] <- "contents of slot named slot 2"
my_list
## [[1]]
## [1] "contents of the first slot -- this is a string"
##
## $`slot 2`
## [1] "contents of slot named slot 2"
Each slot can be anything. What are we doing here? We are defining the 1st slot of the list my_list to be the vector c(1, 2, 3, 4, 5).
my_list[[1]] <- c(1, 2, 3, 4, 5)
my_list
## [[1]]
## [1] 1 2 3 4 5
##
## $`slot 2`
## [1] "contents of slot named slot 2"
You can even make nested lists. Let’s say we want the 1st slot of the list to be another list
of three elements.
my_list[[1]][[1]] <- "subitem 1 in slot 1 of my_list"
my_list[[1]][[2]] <- "subitem 1 in slot 2 of my_list"
my_list[[1]][[3]] <- "subitem 1 in slot 3 of my_list"
my_list
## [[1]]
## [1] "subitem 1 in slot 1 of my_list" "subitem 1 in slot 2 of my_list"
## [3] "subitem 1 in slot 3 of my_list" "4"
## [5] "5"
##
## $`slot 2`
## [1] "contents of slot named slot 2"
We’ve covered one type of object, the list. You saw it was quite flexible. How many types of objects are there?
There are an infinite number, because people make their own classes of objects. You can detect the type of an object (its class) with the function class().
An object can be said to be an instance of a class.
Analogies:
class - Pokemon, object - Pikachu
class - Book, object - To Kill a Mockingbird
class - DataFrame, object - 2010 census data
class - Character, object - “Programming is Fun”
What is type (class) of object is cen10?
class(cen10)
## [1] "tbl_df"     "tbl"        "data.frame"
To change or create the class of any object, you can assign it. To do this, assign a character string containing the name of your class to the object’s class().
We can start from a simple list. For example, say we wanted to store data about pokemon.
Because there is no pre-made package for this, we decide to make our own class.
pikachu <- list(name = "Pikachu",
number = 25,
type = "Electric",
color = "Yellow")
class(pikachu) <- "Pokemon"
str(pikachu)

## List of 4
## $ name : chr "Pikachu"
## $ number: num 25
## $ type : chr "Electric"
## $ color : chr "Yellow"
## - attr(*, "class")= chr "Pokemon"
pikachu$type
## [1] "Electric"
Most of the R objects that you will see as you advance have their own classes. For example, here’s a linear regression object (which you will learn more about in Gov 2000):
ols <- lm(mpg ~ wt + vs + gear + carb, mtcars)
class(ols)
## [1] "lm"
Anything can be an object! Even graphs (in ggplot) can be assigned, re-assigned, and
edited.
grp_race <- group_by(cen10, race) %>%
  summarize(count = n())

gg_tab <- ggplot(grp_race, aes(x = race, y = count)) +
  geom_col() +
  labs(caption = "Source: U.S. Census 2010")

gg_tab
[Figure: bar plot of count by race, with long, overlapping x-axis labels. Source: U.S. Census 2010]
It can be hard to understand an R object because its contents are unknown. The function str, short for structure, is a quick way to look into the innards of an object.
str(my_list)
## List of 2
## $ : chr [1:5] "subitem 1 in slot 1 of my_list" "subitem 1 in slot 2 of my_list" "subite
## $ slot 2: chr "contents of slot named slot 2"
class(my_list)
## [1] "list"
str(pikachu)

## List of 4
## $ name : chr "Pikachu"
## $ number: num 25
## $ type : chr "Electric"
## $ color : chr "Yellow"
## - attr(*, "class")= chr "Pokemon"
What does a ggplot object look like? Very complicated, but at least you can see it:
# enter this on your console
str(gg_tab)
In the social sciences we often analyze variables. As you saw in the tutorial, different types of variables require different care.
A key link with what we just learned is that variables are also types of R objects.
10.3.1 scalars
One number. How many people did we count in our Census sample?
nrow(cen10)
## [1] 30871
Question: What proportion of our census sample is Native American? This number is also
a scalar
# Enter yourself
unique(cen10$race)
## [1] 0.009555894
Hint: you can use the function mean() to calculate the sample mean. The sample proportion is the mean of a sequence of numbers, where your event of interest is a 1 (or TRUE) and the others are 0 (or FALSE).
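A sketch of that trick applied to a different variable, so it does not give away the answer:

mean(cen10$sex == "Female")   # proportion of the sample that is Female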
10.3.2 vectors
A sequence of numbers.
grp_race_ordered$count
## [1] "integer"
Or even, all the ages of the millions of people in our Census. Here are just the first few
numbers of the list.
head(cen10$age)
## [1] 8 24 37 12 18 50
## [1] "character"
or more characters. Notice here that there’s a difference between a vector of individual
characters and a length-one object of characters.
my_name_letters <- c("S", "h", "i", "r", "o")
my_name_letters
## [1] "character"
## [1] FALSE
198 CHAPTER 10. OBJECTS, FUNCTIONS, LOOPS
You’ll see below that the most basic functions are quite complicated internally.
You’ll notice that functions contain other functions. Wrapper functions are functions that “wrap around” existing functions. This sounds redundant, but it’s an important feature of programming. If you find yourself repeating a command more than two times, you should make your own function, rather than writing the same type of code.
It’s worth remembering the basic structure of a function. You create a new function, called my_fun, like this:
my_fun <- function() {
  # the body of the function goes here
}
If we wanted to generate a function that computed the number of men in your data, what
would that look like?
count_men <- function(data) {
  nmen <- sum(data$sex == "Male")  # count the rows where sex is "Male"
  return(nmen)
}

count_men(cen10)

## [1] 15220
The point of a function is that you can use it again and again without typing up the set
of constituent manipulations. So, what if we wanted to figure out the number of men in
California?
count_men(cen10[cen10$state == "California",])
## [1] 1876
Let’s go one step further. What if we want to know the proportion of non-whites in a state,
just by entering the name of the state? There’s multiple ways to do it, but it could look
something like this
nw_in_state <- function(data, state) {
  total.s <- nrow(data[data$state == state, ])                      # everyone in the state
  nw.s <- nrow(data[data$state == state & data$race != "White", ])  # non-whites in the state
  nw.s / total.s
}
The last line is what gets returned from the function. To be more explicit you can wrap the last line in return() (as in return(nw.s / total.s)). return() is also used when you want to break out of a function in the middle of it and not wait till the last line.
Try it on your favorite state!
nw_in_state(cen10, "Massachusetts")
## [1] 0.2040185
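As a small illustration of return() breaking out of a function early (a hypothetical function, not one from the module):

sign_label <- function(x) {
  if (x >= 0) {
    return("non-negative")   # exits the function here when x >= 0
  }
  "negative"                 # reached only when x < 0
}

sign_label(-2)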
Checkpoint
Try making your own function, average_age_in_state, that will give you the average age
of people in a given state.
# Enter on your own
Try making your own function, asians_in_state, that will give you the number of Chinese,
Japanese, and Other Asian or Pacific Islander people in a given state.
# Enter on your own
Try making your own function, ‘top_10_oldest_cities’, that will give you the names of cities
whose population’s average age is top 10 oldest.
# Enter on your own
You can think of a package as a suite of functions that other people have already built for
you to make your life easier.
help(package = "ggplot2")
To use a package, you need to do two things: (1) install it, and then (2) load it.
Installing is a one-time thing
install.packages("ggplot2")
But you need to load it each time you start an R instance. So always keep these commands in a script.
library(ggplot2)
In rstudio.cloud, we already installed a set of packages for you. But when you start your
own R instance, you need to have installed the package at some point.
10.6 Conditionals
Sometimes, you want to execute a command only under certain conditions. This is done
through the almost universal function, if(). Inside the if function we enter a logical
statement. The line that is adjacent to, or follows, the if() statement only gets executed
if the statement returns TRUE.
For example,
x <- 5
if (x >0) {
print("positive number")
} else if (x == 0) {
print ("zero")
} else {
print("negative number")
}
You can also wrap a conditional inside a function. For example:

is_positive <- function(x) {
  if (x > 0) {
    print("positive number")
  } else {
    print("negative number")
  }
}

is_positive(5)
10.7 For-loops
Loops repeat the same statement, although the statement can be “the same” only in an abstract sense. Use the for(x in X) syntax to repeat the subsequent command as many times as there are elements in the right-hand object X. Each of these elements will be referred to by the left-hand index x.
First, come up with a vector.
fruits <- c("apples", "oranges", "grapes")
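A minimal sketch of the for(x in X) syntax applied to this vector (it simply prints each element in turn):

for (fruit in fruits) {
  print(fruit)
}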
We can use the same pattern to compute, for a few states of interest, the percentage of the sample that is male:

states_of_interest <- c("California", "Massachusetts", "New Hampshire", "Washington")
male_percentages <- c()

for (state in states_of_interest) {
  state_data <- cen10[cen10$state == state, ]
  nmen <- sum(state_data$sex == "Male")
  n <- nrow(state_data)
  men_perc <- round(100*(nmen/n), digits=2)
  print(paste("Percentage of men in", state, "is", men_perc))
  male_percentages <- c(male_percentages, men_perc)
}

male_percentages
What if I want to calculate the population percentage of a race group for all race groups in
states of interest? You could probably use tidyverse functions to do this, but let’s try using
loops!
states_of_interest <- c("California", "Massachusetts", "New Hampshire", "Washington")
for (state in states_of_interest) {
for (race in unique(cen10$race)) {
race_state_num <- nrow(cen10[cen10$race == race & cen10$state == state, ])
state_pop <- nrow(cen10[cen10$state == state, ])
race_perc <- round(100*(race_state_num/(state_pop)), digits=2)
print(paste("Percentage of ", race , "in", state, "is", race_perc))
}
}
Exercises
An issue raised in Persily’s article is that the full-count U.S. Census does not record whether residents are citizens of the United States1. Instead, this question is asked in a survey, the American Community Survey. The two are fundamentally different exercises: the Census counts everyone by definition, whereas a survey samples from the population. Load the 1 percent sample of the 2015 ACS (acs2015_1percent.csv, in the input folder) and give an estimate of the proportion of a state’s ACS respondents that are reportedly U.S. citizens.
acs <- read_csv("data/input/acs2015_1percent.csv", col_types = cols())
set.seed(02138)
sample_acs <- sample_frac(acs, 0.01)
# Enter yourself
Write your own function that makes some task of data analysis simpler. Ideally, it would be
a function that helps you do either of the previous tasks in fewer lines of code. You can use
the three lines of code that was provided in exercise 1 to wrap that into another function
too!
# Enter yourself
Using a loop, create a crosstab of sex and race for each state in the set “states_of_interest”
states_of_interest <- c("California", "Massachusetts", "New Hampshire", "Washington")
# Enter yourself
• R basic programming
• Counting
• Visualization
• Objects and Classes
• Matrix algebra in R
• Functions
Today you will work on your own, but feel free to ask a fellow classmate nearby or the
instructor. The objective for this session is to get more experience using R, but in the
process (a) test a prominent theory in the political science literature and (b) explore related
ideas of interest to you.
11.1 Motivation
The “Democratic Peace” is one of the most widely discussed propositions in political science, covering the fields of International Relations and Comparative Politics, with insights into the domestic politics of democracies (e.g. American Politics). The one-sentence idea is that democracies do not fight with each other. There has been much theoretical debate – for example, in earlier work, Oneal and Russett (1999) argue that the democratic peace is not due to the hegemony of strong democracies like the U.S. and attempt to distinguish between
realist and what they call Kantian propositions (e.g. democratic governance, international organizations)2.

1 Module originally written by Shiro Kuriwaki, Connor Jerzak, and Yon Soo Park
An empirical demonstration of the democratic peace is also a good example of a Time
Series Cross Sectional (or panel) dataset, where the same units (in this case countries)
are observed repeatedly for multiple time periods. Experience in assembling and analyzing
a TSCS dataset will prepare you for any future research in this area.
11.2 Setting up
library(dplyr)
library(tidyr)
library(readr)
library(data.table)
library(ggplot2)
• Polity: The Polity data can be downloaded from their website (http://www.
systemicpeace.org/inscrdata.html). Look for the newest version of the time
series that has the widest coverage.
[Figure: polity2 score by year (1800-2000), shown in separate panels for France and Germany.]
head(polity)
## # A tibble: 6 x 5
## scode ccode country year polity2
## # A tibble: 6,132 x 5
## ccode polity_code dispute StYear EndYear
## <dbl> <chr> <dbl> <dbl> <dbl>
## 1 200 UKG 1 1902 1903
## 2 2 USA 1 1902 1903
## 3 345 YGS 1 1913 1913
## 4 300 <NA> 1 1913 1913
## 5 339 ALB 1 1946 1946
## 6 200 UKG 1 1946 1946
## 7 200 UKG 1 1951 1952
## 8 651 EGY 1 1951 1952
## 9 630 IRN 1 1856 1857
## 10 200 UKG 1 1856 1857
## # ... with 6,122 more rows
11.6 Loops
Notice that in the mid data, we have a start year of a dispute and an end year of a dispute. In order to combine this with the polity data, we want a way to give each of the years in that interval its own row. There are many ways to do this, but one is a loop. We go through one row at a time, and for each row we make a new dataset that has a row for every year from the start year to the end year.
mid_year_by_year <- data_frame(ccode = numeric(),
year = numeric(),
dispute = numeric())
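A sketch of one way such a loop could be written, assuming the mid columns shown above (the loop actually used in the book may differ in its details):

for (i in 1:nrow(mid)) {
  rows_i <- data_frame(ccode   = mid$ccode[i],
                       year    = mid$StYear[i]:mid$EndYear[i],
                       dispute = mid$dispute[i])
  mid_year_by_year <- bind_rows(mid_year_by_year, rows_i)
}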
Figure 11.1
head(mid_year_by_year)
## # A tibble: 6 x 3
## ccode year dispute
## <dbl> <int> <dbl>
## 1 200 1902 1
## 2 200 1903 1
## 3 2 1902 1
## 4 2 1903 1
## 5 345 1913 1
## 6 300 1913 1
11.7 Merging
We want to combine these two datasets by merging. Base-R has a function called merge; dplyr has several types of joins (the same idea), whose names are based on SQL syntax. Here we can do a left_join, matching rows from mid_year_by_year into polity. We want to keep the rows in polity that do not match, and label them as non-disputes.
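A sketch of that join, assuming ccode and year are the matching keys (as the two datasets above suggest):

p_m <- left_join(polity, mid_year_by_year, by = c("ccode", "year"))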
head(p_m)
## # A tibble: 6 x 6
## scode ccode country year polity2 dispute
## <chr> <dbl> <chr> <dbl> <dbl> <dbl>
## 1 FRN 220 France 1800 -8 NA
## 2 FRN 220 France 1801 -8 NA
## 3 FRN 220 France 1802 -8 NA
## 4 FRN 220 France 1803 -8 NA
## 5 FRN 220 France 1804 -8 NA
## 6 FRN 220 France 1805 -8 NA
Replace dispute = NA rows with a zero.
p_m$dispute[is.na(p_m$dispute)] <- 0
long to wide
p_m_wide <- dcast(data = p_m,
formula = ccode ~ year,
value.var = "polity2")
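For reference, a roughly equivalent reshape with tidyr (a sketch; spread() was the tidyr verb at the time, since superseded by pivot_wider()):

p_m_wide_tidy <- p_m %>%
  select(ccode, year, polity2) %>%
  distinct() %>%
  spread(key = year, value = polity2)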
Try building a panel that would be useful in answering the Democratic Peace Question,
perhaps in these steps.
Often, files we need are saved in the .xls or .xlsx format. It is possible to read these files directly into R, but experience suggests that this process is slower than converting them first to .csv format and reading them in as .csv files.
The readxl/readr/haven family of packages (https://github.com/tidyverse/tidyverse) is constantly expanding to capture more file types. On day 1, we used the package readxl and its read_excel() function.
We will use data to test a version of the Democratic Peace Thesis (DPS). Democracies are
said to go to war less because the leaders who wage wars are accountable to voters who
have to bear the costs of war. Are democracies less likely to engage in militarized interstate
disputes?
To start, let’s download and merge some data.
• Load in the Militarized Interstate Dispute (MID) files. Militarized interstate disputes
are hostile action between two formally recognized states. Examples of this would be
threats to use force, threats to declare war, beginning war, fortifying a border with
troops, and so on.
• Find a way to merge the Polity IV dataset and the MID data. This process can be
a bit tricky.
• An advanced version of this task would be to download the dyadic form of the data
and try merging that with polity.
1. Calculate the mean Polity2 score by year. Plot the result. Use graphical indicators
of your choosing to show where key events fall in this timeline (such as 1914, 1929,
1939, 1989, 2008). Speculate on why the behavior from 1800 to 1920 seems to be
qualitatively different than behavior afterwards.
2. Do the same but only among state-years that were involved in a MID. Plot this line together with your results from 1.
3. Do the same but only among state years that were not involved in a MID.
4. Arrive at a tentative conclusion for how well the Democratic Peace argument seems
to hold up in this dataset. Visualize this conclusion.
Chapter 12
Simulation
• R basics
• Visualization
• Matrices and vectors
• Functions, objects, loops
• Joining real data
In this module, we will start to work with generating data within R, from thin air, as
it were. Doing simulation also strengthens your understanding of Probability (Chapter 6).
Check if you have an idea of how you might code the following tasks:
The sample() function draws a random sample from the elements of a vector x.
1. Sampling without replacement (the default, replace = FALSE) means that once an element of x is chosen, it cannot be chosen again. With size = 5 we get five distinct values; with size = 10 we get a random permutation of all of x.
sample(x = 1:10, size = 5)
## [1] 1 2 3 7 6
sample(x = 1:10, size = 10)
## [1] 7 8 9 6 2 5 4 1 10 3
2. Sampling with replacement (replace = TRUE) means that even if an element of x is
chosen, it is put back in the pool and may be chosen again.
sample(x = 1:10, size = 10, replace = TRUE) ## any number can appear more than once
## [1] 10 2 1 5 3 9 3 6 6 1
It follows that you cannot take a sample larger than the pool when sampling without replacement:
sample(x = 1:10, size = 100, replace = FALSE)
## Error in sample.int(length(x), size, replace, prob): cannot take a sample larger than the population when 'replace' is FALSE
So far, every element in x has had an equal probability of being chosen. In some applications,
we want a sampling scheme where some elements are more likely to be chosen than others.
The argument prob handles this.
For example, this simulates 20 fair coin tosses (each outcome is equally likely):
sample(c("Head", "Tail"), size = 20, prob = c(0.5, 0.5), replace = TRUE)
## [1] "Head" "Tail" "Tail" "Head" "Tail" "Head" "Head" "Tail" "Tail" "Head"
## [11] "Tail" "Head" "Head" "Head" "Tail" "Head" "Head" "Head" "Head" "Head"
But this simulates 20 biased coin tosses, where the probability of Tails is four times that of Heads:
sample(c("Head", "Tail"), size = 20, prob = c(0.2, 0.8), replace = TRUE)
## [1] "Tail" "Tail" "Tail" "Tail" "Tail" "Head" "Tail" "Tail" "Tail" "Head"
## [11] "Head" "Head" "Tail" "Head" "Tail" "Tail" "Tail" "Tail" "Tail" "Tail"
• Inspecting your data: looking at only the first few rows might lead you to ignore any issues that end up at the bottom for whatever reason; inspecting a random sample of rows avoids this.
• Testing your analysis with a small sample: If running analyses on a dataset takes more than a handful of seconds, cut your dataset down upstream to a fraction of its size so that the rest of the code runs in less than a second. Once you have verified that your analysis code runs, re-run it with your full dataset (by simply removing the sample_n / sample_frac line of code at the beginning). While three seconds may not sound like much, they accumulate and eat up time.
rbinom()
rbinom builds upon sample as a tool to help you answer the question: what is the total
number of successes I would get if I sampled a binary (Bernoulli) outcome in a test with
size trials, each with an event-wise success probability of prob? The first argument n
says how many such numbers I want.
For example, I want to know how many Heads I would get if I flipped a fair coin 100 times.
rbinom(n = 1, size = 100, prob = 0.5)
## [1] 51
Now imagine I wanted to do this experiment 10 times, which would require flipping the
coin 10 × 100 = 1,000 times! Helpfully, we can do this in one line:
rbinom(n = 10, size = 100, prob = 0.5)
## [1] 47 46 49 57 45 42 45 50 44 50
runif()
runif also simulates a stochastic scheme in which every outcome is equally likely to be
chosen, like sample, but over a continuous rather than a discrete range. We will cover this
more in the next math module.
The intuition to emphasize here is that one can generate a potentially unlimited amount
(of size n) of noise that is essentially random:
runif(n = 5)
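By default the draws fall between 0 and 1; the min and max arguments change the range. For example:
runif(n = 5, min = 0, max = 10) # five draws, each equally likely to land anywhere between 0 and 10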
rnorm()
rnorm also draws from a continuous distribution, but this time the Normal distribution,
perhaps the most important distribution in statistics. It works the same way as runif:
rnorm(n = 5)
To better visualize the difference between the output of runif and rnorm, let’s generate lots
of each and plot a histogram.
from_runif <- runif(n = 1000)
from_rnorm <- rnorm(n = 1000)
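The plotting code is not echoed above; a minimal base-R sketch that produces side-by-side histograms like the figure below is:
par(mfrow = c(1, 2)) # two panels side by side
hist(from_runif, main = "from_runif", xlab = "")
hist(from_rnorm, main = "from_rnorm", xlab = "")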
[Figure: side-by-side histograms of from_runif (left) and from_rnorm (right), with Frequency on the y-axis]
12.5 r, p, and d
Each distribution can do more than generate random numbers (the prefix r). We can
compute the cumulative probability with the functions pbinom(), punif(), and pnorm(), and
the density, the value of the PDF, with dbinom(), dunif(), and dnorm().
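For instance, a quick sketch (the values in the comments are approximate):
pnorm(1.96) # cumulative probability P(Z <= 1.96) under the standard Normal, about 0.975
dnorm(0) # density of the standard Normal at 0, about 0.40
pbinom(45, size = 100, prob = 0.5) # P(X <= 45) for a Binomial(100, 0.5), about 0.18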
12.6 set.seed()
R doesn’t have the ability to generate truly random numbers! Random numbers are actually
very hard to generate. (Think: flipping a coin –> can be perfectly predicted if I know wind
speed, the angle the coin is flipped, etc.). Some people use random noise in the atmosphere or
random behavior in quantum systems to generate “truly” (?) random numbers. Conversely,
R uses deterministic algorithms which take as an input a “seed” and which then perform a
series of operations to generate a sequence of random-seeming numbers (that is, numbers
whose sequence is sufficiently hard to predict).
Let’s think about this another way. Sampling is a stochastic process, so every time you run
sample() or runif() you are bound to get a different output (because different random seeds
are used). This is intentional in some cases but you might want to avoid it in others. For
example, you might want to diagnose a coding discrepancy by setting the random number
generator to give the same number each time. To do this, use the function set.seed().
Any number can go into set.seed(). When you run a sampling function right after a
preceding set.seed(), the sampling function will always give you the same sequence of
numbers. In a sense, the sampler is no longer random (in the sense of being unpredictable
to us; remember, it never was "truly" random in the first place).
set.seed(02138)
runif(n = 10)
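To see the reproducibility, set the same seed twice and draw twice; the two draws come out identical:
set.seed(02138)
first_draw <- runif(n = 10)
set.seed(02138)
second_draw <- runif(n = 10)
identical(first_draw, second_draw) # TRUE: same seed, same sequence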
Exercises
Census Sampling
What can we learn from surveys of populations, and how wrong do we get if our sampling is
biased?6 Suppose we want to estimate the proportion of U.S. residents who are non-white
(race != "White"). In reality, we do not have any population dataset to utilize and so
6 This example is inspired by Meng, Xiao-Li (2018). Statistical paradises and paradoxes in big data (I):
Law of large populations, big data paradox, and the 2016 US presidential election. Annals of Applied
Statistics 12:2, 685–726. doi:10.1214/18-AOAS1161SF.
we only see the sample survey. Here, however, to understand how sampling works, let’s
conveniently use the Census extract in some cases and pretend we didn’t in others.
(a) First, load usc2010_001percent.csv into your R session. After loading the
library(tidyverse), browse it. Although this is only a 0.01 percent extract, treat
this as your population for pedagogical purposes. What is the population proportion
of non-White residents?
(b) Setting a seed to 1669482, sample 100 respondents from this sample. What is the
proportion of non-White residents in this particular sample? By how many percentage
points are you off from (what we labelled as) the true proportion?
(c) Now imagine what you did above was one survey. What would we get if we did 20
surveys?
To simulate this, write a loop that does the same exercise 20 times, each time computing a
sample proportion. Use the same seed at the top, but be careful to position the set.seed
function such that it generates the same sequence of 20 samples, rather than 20 of the same
sample.
Try doing this with a for loop and storing your sample proportions in a new length-20
vector. (Suggestion: make an empty vector first as a container). After running the loop,
show a histogram of the 20 values. Also what is the average of the 20 sample estimates?
(d) Now, to make things more real, let’s introduce some response bias. The goal here
is not to correct response bias but to induce it and see how it affects our estimates.
Suppose that non-White residents are 10 percent less likely to respond to your
survey than White respondents. This is plausible if you think that the Census is from
2010 but you are polling in 2018, and racial minorities are more geographically mobile
than Whites. Repeat the same exercise as in (c) by modeling this behavior.
You can do this by creating a variable, e.g. propensity, that is 0.9 for non-Whites and 1
otherwise. Then, you can supply it to the weight argument of sample_n().
(e) Finally, we want to see if more data (“Big Data”) will improve our estimates. Using the
same unequal response rates framework as (d), repeat the same exercise but instead
of each poll collecting 100 responses, we collect 10,000.
(f) Optional - visualize your 2 pairs of 20 estimates, with a bar showing the “correct”
population average.
Conditional Proportions
This example is not about simulation, but is meant to reinforce some of the probability discussion from the math lectures.
Read in the Upshot Siena poll from Fall 2016, data/input/upshot-siena-polls.csv.
In addition to some standard demographic questions, we will focus on one called vt_pres_2
in the csv. This is a two-way presidential vote question, asking respondents who they plan
to vote for as President if the election were held today: Donald Trump, the Republican, or
Hillary Clinton, the Democrat, with options for other candidates as well. For this problem,
use the two-way vote question rather than the four-way vote question.
(a) Drop the respondents who answered the November poll (i.e. those for which poll
== "November"). We do this in order to ignore this November population in all
subsequent parts of this question, because they were not asked the Presidential vote
question.
(b) Using the dataset after the procedure in (a), find the proportion of poll respondents
(those who are in the sample) who support Donald Trump.
(c) Among those who supported Donald Trump, what proportion of them has a Bachelor’s
degree or higher (i.e. have a Bachelor’s, Graduate, or other Professional Degree)?
(d) Among those who did not support Donald Trump (i.e. including supporters of Hillary
Clinton, another candidate, or those who refused to answer the question), what
proportion of them has a Bachelor's degree or higher?
(e) Express the numbers in the previous parts as probabilities of specified events. Define
your own symbols: for example, if we let T be the event that a randomly selected
respondent in the poll supports Donald Trump, then the proportion in part (b) is the
probability P(T).
(f) Suppose we randomly sampled a person who participated in the survey and found that
he/she had a Bachelor’s degree or higher. Given this evidence, what is the probability
that the same person supports Donald Trump? Use Bayes Rule and show your work
– that is, do not use data or R to compute the quantity directly. Then, verify this is
the case via R.
The Birthday Problem
Write code that will answer the well-known birthday problem via simulation.7
The problem is fairly simple: Suppose k people gather together in a room. What is the
probability at least two people share the same birthday?
To simplify reality a bit, assume that (1) there are no leap years, and so there are always 365
days in a year, and (2) each individual's birthday is randomly assigned, independently of everyone else's.
Step 1: Set k to a concrete number. Pick a number from 1 to 365 randomly, k times to
simulate birthdays (would this be with replacement or without?).
# Your code
Step 2: Write a line (or two) of code that gives a TRUE or FALSE statement of whether or
not at least two people share the same birth date.
# Your code
Step 3: The above steps will generate a TRUE or FALSE answer for your event of interest, but
only for one realization of an event in the sample space. In order to estimate the probability
of your event happening, we need a "stochastic", as opposed to "deterministic", method. To
do this, write a loop that repeats Steps 1 and 2 many times; call that number of repetitions
sims. For each of the sims iterations, your code should give you a TRUE or FALSE answer.
Code up a way to store these estimates.
7 This exercise draws from Imai (2017)
# Your code
Step 4: Finally, generalize the code further by turning it into a function that takes k as a
user-defined argument. You have now created a Monte Carlo simulation!
# Your code
Step 5: Generate a table or plot that shows how the probability of sharing a birthday
changes by k (fixing sims at a large number like 1000). Also generate a similar plot that
shows how the probability of sharing a birthday changes by sims (fixing k at some arbitrary
number like 10).
# Your code
Extra credit: Give an "analytical" answer to this problem, that is, an answer derived from
the mathematical expression of the probability.
# Your equations
Chapter 13
LaTeX and Markdown
Check if you have an idea of how you might code the following tasks:
• What does “WYSIWYG” stand for? How would you format text in a non-WYSIWYG editor?
• How do you start a header in markdown?
1 Module originally written by Shiro Kuriwaki
13.1 Motivation
Statistical programming is a fast-moving field. The beta version of R was released in 2000,
ggplot2 was released in 2005, and RStudio started around 2010. Of course, some programming
technologies are quite "old" (C in the early 1970s, C++ around 1989, TeX in 1978, Linux in 1991,
Mac OS in 1984). But it is easy to feel you are falling behind in the recent developments of
programming. Today we will do a brief and rough overview of some fundamental and new
tools other than R, with the general aim of having you break out of your comfort zone so
you won’t be shut out from learning these tools in the future.
13.2 Markdown
Markdown is the text format we have been using throughout this course! At its core, markdown
is just plain text. Plain text does not have any formatting embedded in it; instead, the
formatting is coded up as text. Markdown is not a WYSIWYG (What you see is what
you get) text editor like Microsoft Word or Google Docs. This means that you need to
explicitly mark up your formatting in the text (for example, surrounding a word with ** for
bold) rather than hitting Command+B and making your text look bold on your own computer.
For italic and bold, use either the asterisks or the underlines,
*italic* **bold**
_italic_ __bold__
For headers, use pound signs (one # for a main header, two for a sub-header):
# Main Header
## Sub-headers
RStudio makes it easy to compile your very first markdown file by giving you templates.
Go to New > R Markdown, pick a document type, and click OK. This will give you a skeleton of
a document you can compile, or "knit".
Rmd is actually a slight modification of real markdown. It is a type of file that R reads and
turns into a proper md file. Then, a document-conversion program called pandoc compiles
your md into documents like PDF or HTML.
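For example, a minimal Rmd file is just ordinary markdown plus a YAML header and an R code chunk; the chunk (the part fenced by ```{r} and ```) is what plain markdown does not have:
---
title: "My first document"
output: pdf_document
---

Some markdown text with **bold**.

```{r}
summary(cars)
```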
Multiple programs exist for editing plain text (roughly speaking, text that is not WYSIWYG).
• RStudio (especially for R-related work)
• TeXMaker, TeXShop (especially for TeX)
• emacs, aquamacs (general)
• vim (general)
• Sublime Text (general)
Each has its own keyboard shortcuts and special features. You can browse a couple and
see which one(s) you like.
13.3 LaTeX
LaTeX is a typesetting program. You'd engage with LaTeX much like you engage with your
R code: you interact with LaTeX in a text editor, writing code that is interpreted by the
LaTeX compiler and finally parsed to form your final PDF.
1. Go to https://www.overleaf.com
2. Scroll down and go to “CREATE A NEW PAPER” if you don’t have an account.
3. Let’s discuss the default template.
4. Make a new document, and set it as your main document. Then type in the Minimal
Working Example (MWE):
\documentclass{article}
\begin{document}
Hello World
\end{document}
LaTeX is a very stable system, and few changes to it have been made since the 1990s. The
main benefit: better control over how your papers will look; better methods for writing
equations or making tables; overall pleasing aesthetic.
1. Open a plain text editor. Then type in the MWE
\documentclass{article}
\begin{document}
Hello World
\end{document}
2. Save this as hello_world.tex. Make sure you get the file extension right.
3. Open this in your "LaTeX" editor. This can be TeXMaker, Aquamacs, etc.
4. Go through the click/dropdown interface and click compile.
LaTeX can cover most of your typesetting needs, from clean equations to intricate diagrams.
Some main commands you'll be using are below, and a very concise cheat sheet is here:
https://wch.github.io/latexsheet/latexsheet.pdf
Most involved features require that you begin a specific “environment” for that feature,
clearly demarcating them by the notation \begin{figure} and then \end{figure}, e.g. in
the case of figures.
\begin{figure}
\includegraphics{histogram.pdf}
\end{figure}
where histogram.pdf is a path to one of your files.
Notice that each line starts with a backslash \ – in LaTeX this is the symbol to run a
command.
The following syntax, with \[ and \] at the endpoints, is shorthand for a displayed math equation:
\[\int x^2 dx\]
which compiles to the math expression ∫ x² dx.2
2 Enclosing with $$ instead of \[ also has the same effect, so you may see it too. But this is now discouraged due to its inflexibility.
The align environment is useful for aligning multi-line math, for example:
\begin{align}
P(A \mid B) &= \frac{P(A \cap B)}{P(B)}\\
&= \frac{P(B \mid A)P(A)}{P(B)}
\end{align}
which renders as
P(A | B) = P(A ∩ B) / P(B)        (13.1)
         = P(B | A) P(A) / P(B)   (13.2)
Regression tables should be outputted as .tex files with packages like xtable and
stargazer, and then called into LaTeX by \input{regression_table.tex} where
regression_table.tex is the path to your regression output.
Figures, tables, and equations should be labelled with a tag (e.g. \label{tab:regression}) so
that you can refer to them later by that tag (Table \ref{tab:regression}) instead of
hard-coding "Table 2".
For some LaTeX commands you might need to load a separate package that someone else
has written. Do this in your preamble (i.e. before \begin{document}):
\usepackage[options]{package}
where package is the name of the package and options are options specific to the package.
Further Guides
For a more comprehensive listing of LaTeX commands, Mayya Komisarchik has a great
tutorial set of folders: https://scholar.harvard.edu/mkomisarchik/tutorials-0
There is a LaTeX document class called Beamer, which is a popular way of making slideshows;
slides written in markdown are a common alternative. The language of Beamer is the same
as LaTeX but has some special functions for slides.
13.4 BibTeX
BibTeX is a reference system for bibliographical texts. We keep a .bib file separately on our
computer. This is also a plain text file, but it encodes bibliographical resources with special
syntax so that a program can rearrange the parts accordingly for different citation systems.
For example, here is the Nunn and Wantchekon article entry in .bib form.
@article{nunn2011slave,
title={The Slave Trade and the Origins of Mistrust in Africa},
author={Nunn, Nathan and Wantchekon, Leonard},
journal={American Economic Review},
volume={101},
number={7},
pages={3221--3252},
year={2011}
}
The first entry, nunn2011slave, is "pick your favorite": it is the citation key, so pick your own
name for each item in your reference system. The other slots in this @article entry hold the
specific pieces of bibliographic information.
When you cite that key in your text with a command such as \cite{nunn2011slave}, the
compiled PDF shows the reference formatted in whatever citation style (APSA, APA, Chicago)
you pre-specified! Also, at the end of your paper you will have a bibliography with entries
ordered and formatted in the appropriate citation style.
This is a much less frustrating way of keeping track of your references: no need to hand-format
the bibliography to conform to citation rules (which biblatex already knows), and no need to
update your bibliography as you add and drop references (biblatex will only show entries
that are used in the main text).
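A sketch of how the pieces fit together with biblatex, assuming your references live in a file called references.bib (the file name is illustrative):
\documentclass{article}
\usepackage{biblatex}
\addbibresource{references.bib}
\begin{document}
\textcite{nunn2011slave} trace mistrust in Africa back to the slave trade \parencite{nunn2011slave}.
\printbibliography
\end{document}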
You should keep your own .bib file that has all your bibliographical resources. Storing
entries is cheap (does not take much memory), so it is fine to keep all your references in one
place (but you’ll want to make a new one for collaborative projects where multiple people
will compile a .tex file).
For example, Gary’s BibTeX file is here: https://github.com/iqss-research/gkbibtex/
blob/master/gk.bib
Citation management software (Mendeley or Zotero) automatically generates .bib entries
from your library of PDFs for you, provided you have the bibliography attributes right.
Exercise
Create a LaTeX document for a hypothetical research paper on your laptop and, once you've
verified it compiles into a PDF, come show it to one of the instructors.
You can also use Overleaf if you prefer a cloud-based system, but don't swallow the built-in
templates without understanding or testing them.
Each student will have slightly different substantive interests, so we won’t impose much of
a standard. But at a minimum, the LaTeX document should have:
• A title, author, date, and abstract
• Sections
• Italics and boldface
• A figure with a caption and in-text reference to it.
Depending on your subfield or interests, try to implement some of the following:
• A bibliographical reference drawing from a separate .bib file
• A table
• A math expression
• A different font
• Different page margins
• Different line spacing
But we should be aware that too much slant towards math and programming can miss the
point:
To be clear, PhD training in Econ (first year) is often a disaster– like how to prove the
Central Limit Theorem (the LeBron James of Statistics) with polar-cooardinates. This is
mostly a way to demoralize actual economists and select a bunch of unimaginative math
jocks.
— Amitabh Chandra (?) August 14, 2018
Keep on learning, trying new techniques to improve your work, and learn from others!
What #rstats tricks did it take you way too long to learn? One of mine is using readRDS
and saveRDS instead of repeatedly loading from CSV
— Emily Riederer (?) August 19, 2017
Please tell us how we can improve the Prefresher: The Prefresher is a work in progress, with
material mainly driven by graduate students. Please tell us how we should change (or not
change) each of its elements:
https://harvard.az1.qualtrics.com/jfe/form/SV_esbzN8ZFAOPTqiV
Chapter 14
Text
14.1 Review
• " and ' are usually equivalent.
• <- and = are usually interchangeable. (x <- 3 is equivalent to x = 3, although the
former is preferred because it explicitly states the assignment.)
• Use ( ) when you are giving input to a function:
# my_results <- FunctionName(FunctionInputs)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Today, we will learn more about using text data. Our objectives are:
• Reading and writing in text in R.
• To learn how to use paste and sprintf;
• To learn how to use regular expressions;
• To learn about other tools for representing + analyzing text in R.
paste and sprintf are useful commands in text processing, such as for automatically naming
files or automatically performing a series of commands over a subset of your data. Making
tables will also often need these commands.
Paste concatenates vectors together.
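A few quick illustrations (the strings and numbers are made up):
paste("Figure", 1:3, sep = "_") # "Figure_1" "Figure_2" "Figure_3"
paste0("state_", c("CA", "MA"), ".csv") # "state_CA.csv" "state_MA.csv"
sprintf("Turnout was %.1f percent in %d.", 61.4, 2016) # formats numbers into a text template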
14.5 Regular Expressions
A regular expression is a special text string for describing a search pattern. They are most
often used in functions for detecting, locating, and replacing desired text in a corpus.
Use cases:
1. TEXT PARSING. E.g. I have 10,000 congressional speeches. Find all those which
mention Iran.
2. WEB SCRAPING. E.g. Parse html code in order to extract research information from
an online table.
3. CLEANING DATA. E.g. After loading in a dataset, we might need to remove mistakes
from the dataset, or subset the data using regular expression tools.
Example in R. Extract the tweet mentioning Indonesia.
s1 <- "If only Bradley's arm was longer. RT"
s2 <- "Share our love in Indonesia and in the World. RT if you agree."
my_string <- c(s1, s2)
my_string[grepl(my_string, pattern = "Indonesia")]
## [1] "Share our love in Indonesia and in the World. RT if you agree."
Key point: Many R commands use regular expressions. See ?grepl. Assume that x is a
character vector and that pattern is the target pattern. In the earlier example, x could
have been something like my_string and pattern would have been “Indonesia”. Here are
other key uses:
1. DETECT PATTERNS. grepl(pattern, x) goes through all the entries of x and
returns a vector of TRUE and FALSE values of the same length as x. It returns
TRUE whenever that string entry contains the target pattern, and FALSE whenever it
doesn't.
2. REPLACE PATTERNS. gsub(pattern, replacement, x) goes through all the entries
of x and replaces the pattern with replacement.
gsub(x = my_string,
pattern = "o",
replacement = "AAAA")
3. LOCATE PATTERNS. regexpr(pattern, text) returns, for each entry of text, the position at which the pattern first matches (or -1 if there is no match), along with attributes such as the length of the match.
regex_object <- regexpr(pattern = "was", text = my_string)
attr(regex_object, "match.length")
## [1] 3 -1
attr(regex_object, "useBytes")
## [1] TRUE
regexpr(pattern = "was", text = my_string)[1]
## [1] 23
regexpr(pattern = "was", text = my_string)[2]
## [1] -1
Seems simple? The problem: the patterns can get pretty complex!
Some symbols stand in for something more complex, rather than being taken literally.
[[:digit:]] Matches with all digits.
[[:lower:]] Matches with lower case letters.
[[:alpha:]] Matches with all alphabetic characters.
[[:punct:]] Matches with all punctuation characters.
[[:cntrl:]] Matches with “control” characters such as \n, \r, etc.
Example in R:
my_string <- "Do you think that 34% of apples are red?"
gsub(my_string, pattern = "[[:digit:]]", replace ="DIGIT")
Certain characters (such as ., *, \) have special meaning in the regular expressions frame-
work (they are used to form conditional patterns as discussed below). Thus, when we want
our pattern to explicitly include those characters as characters, we must “escape” them by
using \ or encoding them in \Q…\E.
Example in R:
my_string <- "Do *really* think he will win?"
gsub(my_string, pattern = "\\*", replace ="")
## [1] "Now be brave! Dread what comrades say of you here in combat! "
[] The target characters to match are located between the brackets. For example, [aAbB]
will match with the characters a, A, b, B.
[^...] Matches with everything except the material between the brackets. For example,
[^aAbB] will match with everything but the characters a, A, b, B.
(?=) Lookahead – match something that IS followed by the pattern.
(?!) Negative lookahead — match something that is NOT followed by the pattern.
(?<=) Lookbehind – match with something that follows the pattern.
my_string <- "Do you think that 34%of the 23%of apples are red?"
gsub(my_string, pattern = "(?<=%)", replace = " ", perl = TRUE)
## [1] "Do you think that 34% of the 23% of apples are red?"
my_string <- c("legislative1_term1.png",
"legislative1_term1.pdf",
"legislative1_term2.png",
"legislative1_term2.pdf",
"term2_presidential1.png",
"presidential1.png",
"presidential1_term2.png",
"presidential1_term1.pdf",
"presidential1_term2.pdf")
grepl(my_string, pattern = "^(?!presidential1).*\\.png", perl = TRUE)
## [1] TRUE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE
• Indicates which file names don’t start with presidential1 but do end in .png
• ^ indicates that the pattern should start at the beginning of the string.
• ?! indicates negative lookahead – we’re looking for any pattern NOT following presi-
dential1 which meets the subsequent conditions. (see below)
• The first . indicates that, following the negative lookahead, there can be any charac-
ters and the * says that it doesn’t matter how many. Note that we have to escape the
. in .png. (by writing \\. instead of just .)
You will have the chance to try out some regular expressions for yourself at the end!
Stemming is the process of reducing inflected/derived words to their word stem or base
(e.g. stemming, stemmed, stemmer –> stem*)
Exercises
## [1] "15.03322123 of spontaneous events are puzzles in the mind. \n Really, 15.03?"
Using the grepl/grep help page, these materials, Google, and your friends, describe what the
following command does. What changes when value = FALSE?
grep('\'',
     c("To dare is to lose one's footing momentarily.", "To not dare is to lose oneself."),
     value = TRUE)
Write code to automatically extract the file names that DO start with presidential and
DO end in .pdf.
my_string <- c("legislative1_term1.png",
"legislative1_term1.pdf",
"legislative1_term2.png",
"legislative1_term2.pdf",
"term2_presidential1.png",
"presidential1.png",
"presidential1_term2.png",
"presidential1_term1.pdf",
"presidential1_term2.pdf")
Using the same string as in the above, write code to automatically extract the file names
that end in .pdf and that contain the text term2.
# Your code here
Combine these two strings into a single string separated by a “-”. Desired output: “The
carbonyl group in aldehydes and ketones is an oxygen analog of the carbon–carbon double
bond.”
Chapter 15
Command-line, git
Check if you have an idea of how you might code the following tasks:
• What is a GUI?
• What do the following commands stand for in shell: ls (or dir in Windows), cd, rm,
mv (or move in Windows), cp (or copy in Windows)?
• What is the difference between a relative path and an absolute path?
• What paths do these refer to in shell/terminal: ~/, ., ..
• What is a repository in github?
• What does it mean to “clone” a repository?
15.3 command-line
The command-line (the terminal, or shell) is a way of interacting with your computer purely in text
form. Although there are good enough GUIs for most of your needs, you still might need to
go under the hood sometimes and run a command.
ls lists the files and folders in your current working directory:
ls
## 01_warmup.Rmd
## 02_linear-algebra.Rmd
## 03_functions.Rmd
## 04_limits.Rmd
## 05_calculus.Rmd
## 06_optimization.Rmd
## 07_probability.Rmd
## 11_data-handling_counting.Rmd
## 12_matricies-manipulation.Rmd
## 13_visualization.Rmd
## 14_functions_obj_loops.Rmd
## 15_project-dempeace.Rmd
## 16_simulation.Rmd
## 17_non-wysiwyg.Rmd
## 18_text.Rmd
## 19_command-line_git.Rmd
## 21_solutions-warmup.Rmd
## 23_solution_programming.Rmd
## _book
## _bookdown_files
## _bookdown.yml
## _build.sh
## CODE_OF_CONDUCT.md
## CONTRIBUTING.md
## data
## _deploy.sh
## DESCRIPTION
## images
## index.Rmd
## LICENSE
## _output.yml
## preamble.tex
## prefresher_files
## prefresher.Rmd
## prefresher.Rproj
## README.md
## style.css
pwd prints the full path of your current working directory:
pwd
## /home/travis/build/IQSS/prefresher
cd means change directory. You need to tell it which directory to change your current directory to.
You can specify the name of another directory inside your current one, or you can go up to your
parent directory. The syntax for the latter is two periods, .. ; a single period . refers to the
current directory.
cd ..
pwd
## /home/travis/build/IQSS
~/ stands for your home directory defined by your computer.
cd ~/
ls
## apt-get-update.log
## bin
## build
## builds
## build.sh
## filter.rb
## gopath
## otp
## perl5
## R-bin
## texlive
## virtualenv
Paths built with .. and . are "relative" to where you currently are. So are paths like
figures/figure1.pdf, which implicitly means ./figures/figure1.pdf. These are
called relative paths. In contrast, /Users/shirokuriwaki/project1/figures/figure1.pdf
is an "absolute" path because it does not start from your current directory.
Relative paths are nice if you have a shared Dropbox, for example, where my path might be
/Users/shirokuriwaki/mathcamp but Connor's path to the same folder is /Users/connorjerzak/mathcamp.
To run the same code in mathcamp, we should be using relative paths that start from
"mathcamp". Relative paths are also shorter, and they are invariant to higher-level changes
in your computer.
Suppose you have a simple Rscript, call it hello_world.R. This is simply a plain text file
that contains
cat("Hello World")
On the command-line, navigate to the directory that contains this file and run
Rscript hello_world.R
This should give you the output Hello World, which verifies that you "executed" the file
with R via the command-line.
If you know exactly what you want to do your files and the changes are local, then command-
line might be faster and be more sensible than navigating yourself through a GUI. For
example, what if you wanted a single command that will run 10 R scripts successively at
once (as Gentzkow and Shapiro suggest you should do in your research)? It is tedious to run
each of your scripts in RStudio, especially if some of them take more than a few minutes to run.
Instead, you could write a "batch" script that you can run on the terminal:
Rscript 01_read_data.R
Rscript 02_merge_data.R
Rscript 03_run_regressions.R
Rscript 04_make_graphs.R
Rscript 05_maketable.R
On the other hand, command-line prompts may require more keystrokes and are less
intuitive than a good GUI. The command-line can also be dangerous for beginners, because it
allows you to make large, irreversible changes inadvertently. For example, removing a file (rm)
has no "Undo" feature.
15.4 git
Git is a tool for version control. It comes pre-installed on Macs; you will probably need to
install it yourself on Windows.
As you might have noticed from all the quoted terms, git uses a lot of its own terminology that is
not intuitive and is hard to remember at first. The nuts and bolts of maintaining your version
control require "adding", "committing", and "push"ing, and sometimes "pull"ing.
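In command-line terms, a typical cycle looks something like the sketch below (the repository URL and file name are hypothetical):
git clone https://github.com/your-username/your-project.git
cd your-project
# ... edit some files ...
git add analysis.R # stage a changed file
git commit -m "Add regression analysis" # record a snapshot
git push # upload your commits to GitHub
git pull # download your collaborators' commits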
The tutorial at https://try.github.io/ is quite good. You'll want some familiarity with
the command-line to fully understand git and use it in your work.
RStudio Projects has a great git GUI as well.
While git is a powerful tool, you may choose to not use it for everything because
• git is mainly for code, not data: it has fairly strict limits on the size of the data files
you can track.
• your collaborators might want to work with Dropbox
• unless you get a paid account, all your repositories will be public.
Part III
Solutions
Solutions to Warmup Questions
Linear Algebra
Vectors
Define the vectors u = (1, 2, 3)′, v = (4, 5, 6)′, and the scalar c = 2.
1. u + v = (5, 7, 9)′
2. cv = (8, 10, 12)′
3. u · v = 1(4) + 2(5) + 3(6) = 32
If you are having trouble with these problems, please review Section 1.1 “Working with
Vectors” in Chapter 1.
Are the following sets of vectors linearly independent?
1. u = (1, 2)′, v = (2, 4)′
⇝ No: 2u = (2, 4)′ = v, so infinitely many linear combinations of u and v that amount to 0 exist.
2. u = (1, 2, 5)′, v = (3, 7, 9)′
⇝ Yes: we cannot find a linear combination of these two vectors that would amount to zero.
3. a = (2, −1, 1)′, b = (3, −4, −2)′, c = (5, −10, −8)′
⇝ No: After playing around with some numbers, we can find that
−2a = (−4, 2, −2)′, 3b = (9, −12, −6)′, −1c = (−5, 10, 8)′
So
−2a + 3b − c = (0, 0, 0)′
i.e., a linear combination of these three vectors that would amount to zero exists.
If you are having trouble with these problems, please review Section 1.2.
Matrices
A =
  7  5  1
 11  9  3
  2 14 21
  4  1  5

B =
  1  2  8
  3  9 11
  4  7  5
  5  1  9

A + B =
  8  7  9
 14 18 14
  6 21 26
  9  2 14
Given that
C =
  1  2  8
  3  9 11
  4  7  5
Given that c = 2,
cA =
 14 10  2
 22 18  6
  4 28 42
  8  2 10
If you are having trouble with these problems, please review Section 1.3.
Operations
Summation
2. ∑_{k=1}^{3} (3k + 2) = 3∑_{k=1}^{3} k + ∑_{k=1}^{3} 2 = 3 × 6 + 3 × 2 = 24
3. ∑_{i=1}^{4} (3k + i + 2) = 3∑_{i=1}^{4} k + ∑_{i=1}^{4} i + ∑_{i=1}^{4} 2 = 12k + 10 + 8 = 12k + 18
Products
1. ∏_{i=1}^{3} i = 1 · 2 · 3 = 6
2. ∏_{k=1}^{3} (3k + 2) = (3 + 2) · (6 + 2) · (9 + 2) = 440
Limits
1. lim_{x→2} (x − 1) = 1
2. lim_{x→2} (x − 2)(x − 1)/(x − 2) = 1, though note that the original function (x − 2)(x − 1)/(x − 2) would have been undefined at x = 2 because of a divide-by-zero problem; otherwise it would have been equal to x − 1.
3. lim_{x→2} (x² − 3x + 2)/(x − 2) = 1, by the same argument as above, since x² − 3x + 2 = (x − 2)(x − 1).
Calculus
For each of the following functions f(x), find the derivative f′(x) or (d/dx)f(x).
1. f(x) = c, f′(x) = 0
2. f(x) = x, f′(x) = 1
3. f(x) = x², f′(x) = 2x
4. f(x) = x³, f′(x) = 3x²
5. f(x) = 3x² + 2x^(1/3), f′(x) = 6x + (2/3)x^(−2/3)
6. f(x) = (x³)(2x⁴), f′(x) = (d/dx) 2x⁷ = 14x⁶
Optimization
For each of the following functions f(x), does a maximum and minimum exist in the domain
x ∈ R? If so, what are those values, and at which values of x do they occur?
If you are stuck, please try sketching out a picture of each of the functions.
Probability
1. If there are 12 cards, numbered 1 to 12, and 4 cards are chosen, then (12 choose 4) = (12 · 11 · 10 · 9)/4! = 495 possible hands exist (unordered, without replacement).
2. Let A = {1, 3, 5, 7, 8} and B = {2, 4, 7, 8, 12, 13}. Then A ∪ B = {1, 2, 3, 4, 5, 7, 8, 12, 13} and A ∩ B = {7, 8}. If A is a subset of the Sample Space S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, then the complement A^C = {2, 4, 6, 9, 10}.
3. If we roll two fair dice, what is the probability that their sum would be 11? ⇝ 1/18
4. If we roll two fair dice, what is the probability that their sum would be 12? ⇝ 1/36. There are two independent dice, so 6² = 36 options in total. While the previous question had two possibilities for a sum of 11 ((5, 6) and (6, 5)), there is only one possibility out of 36 for a sum of 12 ((6, 6)).
For a review, please see Sections 6.2 - 6.3
Suggested Programming Solutions
library(dplyr)
library(readr)
library(ggplot2)
library(ggrepel)
library(forcats)
library(scales)
1 State Proportions
Group by state, noting that the mean of a set of logicals is a mean of 1s (TRUE) and 0s
(FALSE).
grp_st <- cen10 %>%
group_by(state) %>%
summarize(prop = mean(age >= 65)) %>%
arrange(prop) %>%
mutate(state = as_factor(state))
Plot points
ggplot(grp_st, aes(x = state, y = prop)) +
  geom_point() +
  coord_flip() +
  scale_y_continuous(labels = percent_format(accuracy = 1)) + # use the scales package to format percentages
  labs(
    y = "Proportion Senior",
    x = "",
    caption = "Source: 2010 Census sample"
  )
[Figure: dot plot of the proportion of senior residents by state, with states on the vertical axis running from West Virginia, Maine, and Vermont at the top down to Colorado, Utah, and Alaska at the bottom, and the Proportion Senior axis running from 5% to 20%. Source: 2010 Census sample]
2 Swing Justice
## Joining, by = "justice"
All together
ggplot(df_indicator, aes(x = term, y = idealpt, group = justice_id)) +
geom_line(aes(y = median_idealpt), color = "red", size = 2, alpha = 0.1) +
geom_line(alpha = 0.5) +
[Figure: line plot of each justice's estimated Martin−Quinn ideal point by term (y-axis from −5.0 to 5.0), with the Court median drawn as a thick red line; labeled justices run from Thomas, Alito, Gorsuch, and Roberts near the top to Kennedy, Breyer, Kagan, Ginsburg, and Sotomayor below]
Checkpoint #3
cen10 %>%
group_by(state) %>%
summarise(avg_age = mean(age)) %>%
arrange(desc(avg_age)) %>%
slice(1:10)
## # A tibble: 10 x 2
## state avg_age
## <chr> <dbl>
## 1 West Virginia 44.1
## 2 Maine 42.1
## 3 Florida 41.3
## 4 New Hampshire 41.2
## 5 North Dakota 41.1
## 6 Montana 40.6
## 7 Vermont 40.3
## 8 Connecticut 40.1
## 9 Wisconsin 39.9
## 10 New Mexico 39.3
Exercise 1
colnames(sample_acs)
## [1] 0.9419765
Exercise 3
states_of_interest <- c("California", "Massachusetts", "New Hampshire", "Washington")
for (state_i in states_of_interest) {
  state_subset <- cen10[cen10$state == state_i, ]
  print(state_i)
  print(table(state_subset$race, state_subset$sex))
}
## [1] "California"
##
## Female Male
## American Indian or Alaska Native 21 21
## Black/Negro 127 126
## Chinese 76 65
## Japanese 15 12
## Other Asian or Pacific Islander 182 177
## Other race, nec 283 302
## Three or more major races 7 7
## Two major races 91 83
## White 1085 1083
## [1] "Massachusetts"
##
## Female Male
## American Indian or Alaska Native 4 1
## Black/Negro 21 17
## Chinese 8 7
## Japanese 1 1
## Other Asian or Pacific Islander 14 14
## Other race, nec 9 17
## Two major races 10 8
## White 272 243
## [1] "New Hampshire"
##
## Female Male
## American Indian or Alaska Native 1 0
## Black/Negro 0 1
## Chinese 0 1
## Japanese 1 0
## Other Asian or Pacific Islander 2 1
## Other race, nec 1 0
## Two major races 0 1
## White 66 63
## [1] "Washington"
##
262
## Female Male
## American Indian or Alaska Native 9 5
## Black/Negro 11 9
## Chinese 2 7
## Japanese 4 0
## Other Asian or Pacific Islander 28 18
## Other race, nec 19 18
## Three or more major races 0 2
## Two major races 17 16
## White 267 257
Exercise 4
Initialize an empty container to collect the results, then loop over states and races:
answer <- data.frame()
for (state in states_of_interest) {
for (race in unique(cen10$race)) {
race_state_num <- nrow(cen10[cen10$race == race & cen10$state == state, ])
state_pop <- nrow(cen10[cen10$state == state, ])
race_perc <- round(100 * (race_state_num / (state_pop)), digits = 2)
line <- data.frame(race_d = race, state_d = state, proportion_d = race_perc)
answer <- rbind(answer, line)
}
}
for (i in 1:nrow(mid_b)) {
  x <- data_frame(ccode = mid_b$ccode[i],                  ## row i's country
                  year = mid_b$styear[i]:mid_b$endyear[i], ## sequence of years for dispute in row i
                  dispute = 1)                             ## there was a dispute
  mid_y_by_y <- rbind(mid_y_by_y, x)
}
# don't include the -88, -77, -66 values in calculating the mean of polity
mean_polity_by_year <- merged_mid_polity %>%
  group_by(year) %>%
  summarise(mean_polity = mean(polity2))
mean(pop$race != "White")
## [1] 0.2806517
set.seed(1669482)
samp <- sample_n(pop, 100)
mean(samp$race != "White")
## [1] 0.22
ests <- c()
set.seed(1669482)
for (i in 1:20) {
samp <- sample_n(pop, 100)
ests[i] <- mean(samp$race != "White")
}
mean(ests)
pop_with_prop <- mutate(pop, propensity = ifelse(race != "White", 0.9, 1))
for (i in 1:20) {
samp <- sample_n(pop_with_prop, 100, weight = propensity)
ests[i] <- mean(samp$race != "White")
}
mean(ests)
ests <- c()
set.seed(1669482)
for (i in 1:20) {
samp <- sample_n(pop_with_prop, 10000, weight = propensity)
ests[i] <- mean(samp$race != "White")
}
mean(ests)