
Math Prefresher for Political Scientists

July 2019
Contents

About this Booklet 9

Pre-Prefresher Exercises 11
Linear Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

I Math 15
1 Linear Algebra 17
1.1 Working with Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.2 Linear Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.3 Basics of Matrix Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.4 Systems of Linear Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.5 Systems of Equations as Matrices . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.6 Finding Solutions to Augmented Matrices and Systems of Equations . . . . . 24
1.7 Rank — and Whether a System Has One, Infinite, or No Solutions . . . . . . 26
1.8 The Inverse of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.9 Linear Systems and Inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.10 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.11 Getting Inverse of a Matrix using its Determinant . . . . . . . . . . . . . . . 30
Answers to Examples and Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 31

2 Functions and Operations 35
2.1 Summation Operators ∑ and ∏ . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.2 Introduction to Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.3 log and exp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.4 Graphing Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.5 Solving for Variables and Finding Roots . . . . . . . . . . . . . . . . . . . . . 40
2.6 Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Answers to Examples and Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 42


3 Limits 45
Example: The Central Limit Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 45
Example: The Law of Large Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.1 Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.2 The Limit of a Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.3 Limits of a Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.4 Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Answers to Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

4 Calculus 57
Example: The Mean is a Type of Integral . . . . . . . . . . . . . . . . . . . . . . . 57
4.1 Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.2 Higher-Order Derivatives (Derivatives of Derivatives of Derivatives) . . . . . . 61
4.3 Composite Functions and the Chain Rule . . . . . . . . . . . . . . . . . . . . 62
4.4 Derivatives of natural logs and the exponent . . . . . . . . . . . . . . . . . . . 63
4.5 Partial Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.6 Taylor Series Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.7 The Indefinite Integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.8 The Definite Integral: The Area under the Curve . . . . . . . . . . . . . . . . 71
4.9 Integration by Substitution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.10 Integration by Parts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Answers to Examples and Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 77

5 Optimization 81
Example: Meltzer-Richard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.1 Maxima and Minima . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.2 Concavity of a Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.3 FOC and SOC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.4 Global Maxima and Minima . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.5 Constrained Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.6 Inequality Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
5.7 Kuhn-Tucker Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.8 Applications of Quadratic Forms . . . . . . . . . . . . . . . . . . . . . . . . . 103

6 Probability Theory 105


6.1 Counting rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
6.2 Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
6.3 Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
6.4 Conditional Probability and Bayes Rule . . . . . . . . . . . . . . . . . . . . . 109
6.5 Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
6.6 Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
6.7 Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6.8 Joint Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
6.9 Expectation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
6.10 Variance and Covariance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
6.11 Special Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
6.12 Summarizing Observed Events (Data) . . . . . . . . . . . . . . . . . . . . . . 123
6.13 Asymptotic Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

Answers to Examples and Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 126

II Programming 131
7 Orientation and Reading in Data 133
7.1 Motivation: Data and You . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
7.2 Orienting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
7.3 The Computer and You: Giving Instructions . . . . . . . . . . . . . . . . . . 138
7.4 Base-R vs. tidyverse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
7.5 A is for Athens . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

8 Manipulating Vectors and Matrices 149


8.1 Basics - Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
8.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
8.3 Read Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
8.4 data.frame vs. matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
8.5 Speed considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
8.6 Handling matrices in R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
8.7 Variable Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
8.8 Linear Combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

9 Visualization 171
9.1 Motivation: The Law of the Census . . . . . . . . . . . . . . . . . . . . . . . . 171
9.2 Read data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
9.3 Counting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
9.4 Tabulating . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
9.5 base R graphics and ggplot . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
9.6 Improving your graphics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
9.7 Cross-tabs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
9.8 Composition Plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
9.9 Line graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188

10 Objects, Functions, Loops 191


10.1 What is an object? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
10.2 Making your own objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
10.3 Types of variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
10.4 What is a function? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
Checkpoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
10.5 What is a package? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
10.6 Conditionals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
10.7 For-loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
10.8 Nested Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204

11 Joins and Merges, Wide and Long 207


Where are we? Where are we headed? . . . . . . . . . . . . . . . . . . . . . . . . . 207
11.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
11.2 Setting up . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
11.3 Create a project directory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
11.4 Data Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
11.5 Example with 2 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
11.6 Loops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
11.7 Merging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
11.8 Main Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212

12 Simulation 215
12.1 Motivation: Simulation as an Analytical Tool . . . . . . . . . . . . . . . . . . 216
12.2 Pick a sample, any sample . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
12.3 The sample() function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
12.4 Random numbers from specific distributions . . . . . . . . . . . . . . . . . . . 219
12.5 r, p, and d . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
12.6 set.seed() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221

13 LaTeX and markdown 225


13.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
13.2 Markdown . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
13.3 LaTeX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
13.4 BibTeX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
Concluding the Prefresher . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231

14 Text 233
Where are we? Where are we headed? . . . . . . . . . . . . . . . . . . . . . . . . . 233
14.1 Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
14.2 Goals for today . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
14.3 Reading and writing text in R . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
14.4 paste() and sprintf() . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
14.5 Regular expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
14.6 Representing Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
14.7 Important packages for parsing text . . . . . . . . . . . . . . . . . . . . . . . 239
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239

15 Command-line, git 243


15.1 Where are we? Where are we headed? . . . . . . . . . . . . . . . . . . . . . . 243
15.2 Check your understanding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
15.3 command-line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
15.4 git . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246

III Solutions 249


Solutions to Warmup Questions 251

Linear Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251


Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
Limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255

Suggested Programming Solutions 257


15.5 Chapter 9: Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
15.6 Chapter 10: Objects and Loops . . . . . . . . . . . . . . . . . . . . . . . . . . 259
15.7 Chapter 11: Democratic Peace Project . . . . . . . . . . . . . . . . . . . . . 262
15.8 Chapter 12: Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
About this Booklet

The Harvard Gov Prefresher is held each year in August. All relevant information is on
our website, including the day-to-day schedule. The 2019 Prefresher instructors are Shannon
Parker and Meg Schwenzfeier, and the faculty sponsor is Gary King.
This booklet serves as the text for the Prefresher, available as a webpage and as a printable
PDF. It is the product of generations of Prefresher instructors. See below for a full list of
instructors and contributors.

Authors and Contributors

• Authors and Instructors: Curt Signorino 1996-1997; Ken Scheve 1997-1998; Eric Dick-
son 1998-2000; Orit Kedar 1999; James Fowler 2000-2001; Kosuke Imai 2001-2002;
Jacob Kline 2002; Dan Epstein 2003; Ben Ansell 2003-2004; Ryan Moore 2004-2005;
Mike Kellermann 2005-2006; Ellie Powell 2006-2007; Jen Katkin 2007-2008; Patrick
Lam 2008-2009; Viridiana Rios 2009-2010; Jennifer Pan 2010-2011; Konstantin Kashin
2011-2012; Soledad Prillaman 2013; Stephen Pettigrew 2013-2014; Anton Strezhnev
2014-2015; Mayya Komisarchik 2015-2016; Connor Jerzak 2016-2017; Shiro Kuriwaki
2017-2018; Yon Soo Park 2018
• Repository Maintainer: Shiro Kuriwaki (kuriwaki)
• Contributors: Thanks to Juan Dodyk (juandodyk), Hunter Rendleman (hrendleman), and Tyler Simko (tylersimko) for contributing corrections and improvements to the booklet as students.

Contributing

We transitioned the booklet into a bookdown GitHub repository in 2018. As we update this version, we appreciate any bug reports or fixes.
All changes should be made in the .Rmd files in the project root. Changes pushed to the
repository will be checked for compilation by Travis-CI. To contribute a change, please make
a pull request and set the repository maintainer as the reviewer.

Pre-Prefresher Exercises

Before our first meeting, please try solving these questions. They are a sample of the very
beginning of each math section. We have provided links to the parts of the book you can
read if the concepts are new to you.
The goal of this “pre”-prefresher assignment is not to intimidate you but to set common
expectations so you can make the most out of the actual Prefresher. Even if you do not
understand some or all of these questions after skimming through the linked sections, your
effort will pay off and you will be better prepared for the math prefresher. We are also open
to adjusting these expectations based on feedback (this class is for you), so please do not
hesitate to write to the instructors for feedback.

Linear Algebra

Vectors

Define the vectors $u = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$, $v = \begin{pmatrix} 4 \\ 5 \\ 6 \end{pmatrix}$, and the scalar $c = 2$. Calculate the following:
1. u + v
2. cv
3. u · v
If you are having trouble with these problems, please review Section 1.1 “Working with
Vectors” in Chapter 1.
Are the following sets of vectors linearly independent?

1. $u = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$, $v = \begin{pmatrix} 2 \\ 4 \end{pmatrix}$

2. $u = \begin{pmatrix} 1 \\ 2 \\ 5 \end{pmatrix}$, $v = \begin{pmatrix} 3 \\ 7 \\ 9 \end{pmatrix}$

3. $a = \begin{pmatrix} 2 \\ -1 \\ 1 \end{pmatrix}$, $b = \begin{pmatrix} 3 \\ -4 \\ -2 \end{pmatrix}$, $c = \begin{pmatrix} 5 \\ -10 \\ -8 \end{pmatrix}$ (this requires some guesswork)


If you are having trouble with these problems, please review Section 1.2.

Matrices

$$A = \begin{pmatrix} 7 & 5 & 1 \\ 11 & 9 & 3 \\ 2 & 14 & 21 \\ 4 & 1 & 5 \end{pmatrix}$$

What is the dimensionality of matrix A?

What is the element $a_{23}$ of A?

Given that
$$B = \begin{pmatrix} 1 & 2 & 8 \\ 3 & 9 & 11 \\ 4 & 7 & 5 \\ 5 & 1 & 9 \end{pmatrix}$$
what is A + B?

Given that
$$C = \begin{pmatrix} 1 & 2 & 8 \\ 3 & 9 & 11 \\ 4 & 7 & 5 \end{pmatrix}$$
what is A + C?

Given that $c = 2$, what is cA?
If you are having trouble with these problems, please review Section 1.3.

Operations

Summation

Simplify the following

1. $\sum\limits_{i=1}^{3} i$

2. $\sum\limits_{k=1}^{3} (3k + 2)$

3. $\sum\limits_{i=1}^{4} (3k + i + 2)$

Products

1. $\prod\limits_{i=1}^{3} i$

2. $\prod\limits_{k=1}^{3} (3k + 2)$
To review this material, please see Section 2.1.

Logs and exponents

Simplify the following

1. $4^2$
2. $4^2 \cdot 2^3$
3. $\log_{10} 100$
4. $\log_2 4$
5. $\log e$, where $\log$ is the natural log (also written as $\ln$) – a log with base $e$, and $e$ is Euler's constant
6. $e^a e^b e^c$, where $a, b, c$ are each constants
7. $\log 0$
8. $e^0$
9. $e^1$
10. $\log e^2$

To review this material, please see Section 2.3

Limits

Find the limit of the following.

1. $\lim\limits_{x \to 2} (x - 1)$

2. $\lim\limits_{x \to 2} \frac{(x-2)(x-1)}{(x-2)}$

3. $\lim\limits_{x \to 2} \frac{x^2 - 3x + 2}{x - 2}$

To review this material please see Section 3.3



Calculus
For each of the following functions $f(x)$, find the derivative $f'(x)$ or $\frac{d}{dx}f(x)$.

1. $f(x) = c$
2. $f(x) = x$
3. $f(x) = x^2$
4. $f(x) = x^3$
5. $f(x) = 3x^2 + 2x^{1/3}$
6. $f(x) = (x^3)(2x^4)$
For a review, please see Section 4.1 - 4.2

Optimization
For each of the following functions $f(x)$, does a maximum and minimum exist in the domain $x \in \mathbf{R}$? If so, what are those values, and at which values of $x$ do they occur?

1. $f(x) = x$
2. $f(x) = x^2$
3. $f(x) = -(x - 2)^2$
If you are stuck, please try sketching out a picture of each of the functions.

Probability
1. If there are 12 cards, numbered 1 to 12, and 4 cards are chosen, how many distinct
possible choices are there? (unordered, without replacement)
2. Let $A = \{1, 3, 5, 7, 8\}$ and $B = \{2, 4, 7, 8, 12, 13\}$. What is $A \cup B$? What is $A \cap B$? If $A$ is a subset of the sample space $S = \{1, 2, 3, 4, 5, 6, 7, 8, 9, 10\}$, what is the complement $A^C$?
3. If we roll two fair dice, what is the probability that their sum would be 11?
4. If we roll two fair dice, what is the probability that their sum would be 12?
For a review, please see Sections 6.2 - 6.3.
Part I

Math

Chapter 1

Linear Algebra

Topics: • Working with Vectors • Linear Independence • Basics of Matrix Algebra • Square
Matrices • Linear Equations • Systems of Linear Equations • Systems of Equations as
Matrices • Solving Augmented Matrices and Systems of Equations • Rank • The Inverse of
a Matrix • Inverse of Larger Matrices

1.1 Working with Vectors

Vector: A vector in n-space is an ordered list of n numbers. These numbers can be represented as either a row vector or a column vector:
$$v = \begin{pmatrix} v_1 & v_2 & \ldots & v_n \end{pmatrix}, \quad v = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}$$
We can also think of a vector as defining a point in n-dimensional space, usually $\mathbf{R}^n$; each element of the vector defines the coordinate of the point in a particular direction.
Vector Addition and Subtraction: If two vectors, u and v, have the same length (i.e., have the same number of elements), they can be added (subtracted) together:
$$u + v = \begin{pmatrix} u_1 + v_1 & u_2 + v_2 & \cdots & u_n + v_n \end{pmatrix}$$
$$u - v = \begin{pmatrix} u_1 - v_1 & u_2 - v_2 & \cdots & u_n - v_n \end{pmatrix}$$

Scalar Multiplication: The product of a scalar c (i.e., a constant) and vector v is:
$$cv = \begin{pmatrix} cv_1 & cv_2 & \ldots & cv_n \end{pmatrix}$$


Vector Inner Product: The inner product (also called the dot product or scalar product) of two vectors u and v is again defined if and only if they have the same number of elements:
$$u \cdot v = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n = \sum_{i=1}^{n} u_i v_i$$
If $u \cdot v = 0$, the two vectors are orthogonal (or perpendicular).

Vector Norm: The norm of a vector is a measure of its length. There are many different ways to calculate the norm, but the most common is the Euclidean norm (which corresponds to our usual conception of distance in three-dimensional space):
$$||v|| = \sqrt{v \cdot v} = \sqrt{v_1 v_1 + v_2 v_2 + \cdots + v_n v_n}$$
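All of these operations translate directly into R, the language used in the programming half of this booklet. The sketch below uses only base R and the same illustrative vectors as the Pre-Prefresher warmup ($u = (1, 2, 3)$, $v = (4, 5, 6)$, $c = 2$); any vectors of equal length work the same way.

```r
u <- c(1, 2, 3)
v <- c(4, 5, 6)
c_scalar <- 2

u + v              # vector addition, element by element
c_scalar * v       # scalar multiplication
sum(u * v)         # inner (dot) product u . v
sqrt(sum(v * v))   # Euclidean norm ||v||
```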
Example 1.1 (Vector Algebra). Let $a = \begin{pmatrix} 2 & 1 & 2 \end{pmatrix}$, $b = \begin{pmatrix} 3 & 4 & 5 \end{pmatrix}$. Calculate the following:
1. a − b
2. a · b
Exercise 1.1 (Vector Algebra). Let $u = \begin{pmatrix} 7 & 1 & -5 & 3 \end{pmatrix}$, $v = \begin{pmatrix} 9 & -3 & 2 & 8 \end{pmatrix}$, $w = \begin{pmatrix} 1 & 13 & -7 & 2 & 15 \end{pmatrix}$, and $c = 2$. Calculate the following:
1. u − v
2. cw
3. u · v
4. w · v

1.2 Linear Independence


Linear combinations: The vector u is a linear combination of the vectors $v_1, v_2, \cdots, v_k$ if
$$u = c_1 v_1 + c_2 v_2 + \cdots + c_k v_k$$
For example, $\begin{pmatrix} 9 & 13 & 17 \end{pmatrix}$ is a linear combination of the following three vectors: $\begin{pmatrix} 1 & 2 & 3 \end{pmatrix}$, $\begin{pmatrix} 2 & 3 & 4 \end{pmatrix}$, and $\begin{pmatrix} 3 & 4 & 5 \end{pmatrix}$. This is because $\begin{pmatrix} 9 & 13 & 17 \end{pmatrix} = (2)\begin{pmatrix} 1 & 2 & 3 \end{pmatrix} + (-1)\begin{pmatrix} 2 & 3 & 4 \end{pmatrix} + 3\begin{pmatrix} 3 & 4 & 5 \end{pmatrix}$.
Linear independence: A set of vectors $v_1, v_2, \cdots, v_k$ is linearly independent if the only solution to the equation
$$c_1 v_1 + c_2 v_2 + \cdots + c_k v_k = 0$$
is $c_1 = c_2 = \cdots = c_k = 0$. If another solution exists, the set of vectors is linearly dependent.
A set S of vectors is linearly dependent if and only if at least one of the vectors in S can be written as a linear combination of the other vectors in S.

Linear independence is only defined for sets of vectors with the same number of elements;
any linearly independent set of vectors in n-space contains at most n vectors.
Since $\begin{pmatrix} 9 & 13 & 17 \end{pmatrix}$ is a linear combination of $\begin{pmatrix} 1 & 2 & 3 \end{pmatrix}$, $\begin{pmatrix} 2 & 3 & 4 \end{pmatrix}$, and $\begin{pmatrix} 3 & 4 & 5 \end{pmatrix}$, these 4 vectors constitute a linearly dependent set.
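A hedged way to check this numerically is to stack the vectors into a matrix and compare its rank (formally introduced in Section 1.7) to the number of vectors. The sketch below relies on base R's qr() decomposition, which reports a rank component.

```r
# Rows are the vectors (1 2 3), (2 3 4), (3 4 5), and (9 13 17)
M <- rbind(c(1, 2, 3),
           c(2, 3, 4),
           c(3, 4, 5),
           c(9, 13, 17))

qr(M)$rank             # 2, which is less than the number of vectors (4)
qr(M)$rank == nrow(M)  # FALSE: the set is linearly dependent
```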

Example 1.2 (Linear Independence). Are the following sets of vectors linearly independent?

1. $\begin{pmatrix} 2 & 3 & 1 \end{pmatrix}$ and $\begin{pmatrix} 4 & 6 & 1 \end{pmatrix}$
2. $\begin{pmatrix} 1 & 0 & 0 \end{pmatrix}$, $\begin{pmatrix} 0 & 5 & 0 \end{pmatrix}$, and $\begin{pmatrix} 10 & 10 & 0 \end{pmatrix}$

Exercise 1.2 (Linear Independence). Are the following sets of vectors linearly independent?

1. $v_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$, $v_2 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$, $v_3 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$

2. $v_1 = \begin{pmatrix} 2 \\ 2 \\ 1 \end{pmatrix}$, $v_2 = \begin{pmatrix} -4 \\ 6 \\ 5 \end{pmatrix}$, $v_3 = \begin{pmatrix} -2 \\ 8 \\ 6 \end{pmatrix}$

1.3 Basics of Matrix Algebra


Matrix: A matrix is an array of real numbers arranged in m rows by n columns. The dimensionality of the matrix is defined as the number of rows by the number of columns, $m \times n$.
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$
Note that you can think of vectors as special cases of matrices; a column vector of length k is a $k \times 1$ matrix, while a row vector of the same length is a $1 \times k$ matrix.
It's also useful to think of matrices as being made up of a collection of row or column vectors. For example,
$$A = \begin{pmatrix} a_1 & a_2 & \cdots & a_m \end{pmatrix}$$

Matrix Addition: Let A and B be two $m \times n$ matrices.
$$A + B = \begin{pmatrix} a_{11} + b_{11} & a_{12} + b_{12} & \cdots & a_{1n} + b_{1n} \\ a_{21} + b_{21} & a_{22} + b_{22} & \cdots & a_{2n} + b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} + b_{m1} & a_{m2} + b_{m2} & \cdots & a_{mn} + b_{mn} \end{pmatrix}$$
Note that matrices A and B must have the same dimensionality, in which case they are conformable for addition.

Example 1.3.
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 1 & 2 \end{pmatrix}$$
$$A + B = $$

Scalar Multiplication: Given the scalar s, the scalar multiplication of sA is
$$sA = s \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} = \begin{pmatrix} sa_{11} & sa_{12} & \cdots & sa_{1n} \\ sa_{21} & sa_{22} & \cdots & sa_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ sa_{m1} & sa_{m2} & \cdots & sa_{mn} \end{pmatrix}$$

Example 1.4. $s = 2$, $A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}$
$$sA = $$

Matrix Multiplication: If A is an $m \times k$ matrix and B is a $k \times n$ matrix, then their product $C = AB$ is the $m \times n$ matrix where
$$c_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{ik} b_{kj}$$

Example 1.5.
1. $\begin{pmatrix} a & b \\ c & d \\ e & f \end{pmatrix} \begin{pmatrix} A & B \\ C & D \end{pmatrix} = $
2. $\begin{pmatrix} 1 & 2 & -1 \\ 3 & 1 & 4 \end{pmatrix} \begin{pmatrix} -2 & 5 \\ 4 & -3 \\ 2 & 1 \end{pmatrix} = $

Note that the number of columns of the first matrix must equal the number of rows of the
second matrix, in which case they are conformable for multiplication. The sizes of the
matrices (including the resulting product) must be

(m × k)(k × n) = (m × n)

Also note that if AB exists, BA exists only if dim(A) = m × n and dim(B) = n × m.


This does not mean that AB = BA. AB = BA is true only in special circumstances, like when A or B is an identity matrix or $A = B^{-1}$.
Laws of Matrix Algebra:
1. Associative: (A + B) + C = A + (B + C)
(AB)C = A(BC)

2. Commutative: A+B=B+A
3. Distributive: A(B + C) = AB + AC
(A + B)C = AC + BC
Commutative law for multiplication does not hold – the order of multiplication matters:

AB ̸= BA

For example,
$$A = \begin{pmatrix} 1 & 2 \\ -1 & 3 \end{pmatrix}, \quad B = \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix}$$
$$AB = \begin{pmatrix} 2 & 3 \\ -2 & 2 \end{pmatrix}, \quad BA = \begin{pmatrix} 1 & 7 \\ -1 & 3 \end{pmatrix}$$
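The same calculation can be checked in R, where %*% is the matrix product (plain * would multiply element by element). A minimal sketch with the A and B from the example above:

```r
A <- matrix(c(1, -1, 2, 3), nrow = 2)  # fills by column, so A = [1 2; -1 3]
B <- matrix(c(2, 0, 1, 1), nrow = 2)   # B = [2 1; 0 1]

A %*% B                      # [2 3; -2 2]
B %*% A                      # [1 7; -1 3]
identical(A %*% B, B %*% A)  # FALSE: the order of multiplication matters
```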

Transpose: The transpose of the $m \times n$ matrix A is the $n \times m$ matrix $A^T$ (also written $A'$) obtained by interchanging the rows and columns of A.
For example,
$$A = \begin{pmatrix} 4 & -2 & 3 \\ 0 & 5 & -1 \end{pmatrix}, \quad A^T = \begin{pmatrix} 4 & 0 \\ -2 & 5 \\ 3 & -1 \end{pmatrix}$$
$$B = \begin{pmatrix} 2 \\ -1 \\ 3 \end{pmatrix}, \quad B^T = \begin{pmatrix} 2 & -1 & 3 \end{pmatrix}$$
The following rules apply for transposed matrices:
1. $(A + B)^T = A^T + B^T$
2. $(A^T)^T = A$
3. $(sA)^T = sA^T$
4. $(AB)^T = B^T A^T$; and by induction $(ABC)^T = C^T B^T A^T$

Example of $(AB)^T = B^T A^T$:
$$A = \begin{pmatrix} 1 & 3 & 2 \\ 2 & -1 & 3 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 1 \\ 2 & 2 \\ 3 & -1 \end{pmatrix}$$
$$(AB)^T = \left( \begin{pmatrix} 1 & 3 & 2 \\ 2 & -1 & 3 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 2 & 2 \\ 3 & -1 \end{pmatrix} \right)^T = \begin{pmatrix} 12 & 7 \\ 5 & -3 \end{pmatrix}$$
$$B^T A^T = \begin{pmatrix} 0 & 2 & 3 \\ 1 & 2 & -1 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & -1 \\ 2 & 3 \end{pmatrix} = \begin{pmatrix} 12 & 7 \\ 5 & -3 \end{pmatrix}$$

Exercise 1.3 (Matrix Multiplication). Let
$$A = \begin{pmatrix} 2 & 0 & -1 & 1 \\ 1 & 2 & 0 & 1 \end{pmatrix}$$
$$B = \begin{pmatrix} 1 & 5 & -7 \\ 1 & 1 & 0 \\ 0 & -1 & 1 \\ 2 & 0 & 0 \end{pmatrix}$$
$$C = \begin{pmatrix} 3 & 2 & -1 \\ 0 & 4 & 6 \end{pmatrix}$$
Calculate the following:

1. $AB$
2. $BA$
3. $(BC)^T$
4. $BC^T$

1.4 Systems of Linear Equations


Linear Equation: $a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = b$
The $a_i$ are parameters or coefficients; the $x_i$ are variables or unknowns. The equation is linear because each term contains only one variable and every variable appears with degree at most 1.
We are often interested in solving linear systems like

$$\begin{aligned} x - 3y &= -3 \\ 2x + y &= 8 \end{aligned}$$
More generally, we might have a system of m equations in n unknowns
$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m \end{aligned}$$

A solution to a linear system of m equations in n unknowns is a set of n numbers $x_1, x_2, \cdots, x_n$ that satisfy each of the m equations.

Example: x = 3 and y = 2 is the solution to the above 2 × 2 linear system. If you graph the
two lines, you will find that they intersect at (3, 2).
Does a linear system have one, no, or multiple solutions? For a system of 2 equations with 2 unknowns (i.e., two lines):
One solution: The lines intersect at exactly one point.
No solution: The lines are parallel.
Infinite solutions: The lines coincide.
Methods to solve linear systems:
1. Substitution
2. Elimination of variables
3. Matrix methods

Exercise 1.4 (Linear Equations). Provide a system of 2 equations with 2 unknowns that
has
1. one solution
2. no solution
3. infinite solutions

1.5 Systems of Equations as Matrices


Matrices provide an easy and efficient way to represent linear systems such as
$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m \end{aligned}$$
as
$$Ax = b$$
where:

The $m \times n$ coefficient matrix A is an array of mn real numbers arranged in m rows by n columns:
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$

The unknown quantities are represented by the vector $x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}$.

The right hand side of the linear system is represented by the vector $b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}$.

Augmented Matrix: When we append b to the coefficient matrix A, we get the augmented matrix $\hat{A} = [A \, | \, b]$:
$$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} & | & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & | & b_2 \\ \vdots & \vdots & & \vdots & | & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & | & b_m \end{pmatrix}$$
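In R, the bookkeeping of forming an augmented matrix amounts to binding the right-hand-side vector onto the coefficient matrix as an extra column with cbind(). A small sketch using the 2 × 2 system from Section 1.4:

```r
A <- matrix(c(1, 2, -3, 1), nrow = 2)  # coefficients of x - 3y = -3 and 2x + y = 8
b <- c(-3, 8)

A_hat <- cbind(A, b)  # append b as an extra column to form [A | b]
A_hat
```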

Exercise 1.5 (Augmented Matrix). Create an augmented matrix that represents the following system of equations:

$$\begin{aligned} 2x_1 - 7x_2 + 9x_3 - 4x_4 &= 8 \\ 41x_2 + 9x_3 - 5x_6 &= 11 \\ x_1 - 15x_2 - 11x_5 &= 9 \end{aligned}$$

1.6 Finding Solutions to Augmented Matrices and Systems of Equations

Row Echelon Form: Our goal is to translate our augmented matrix or system of equations
into row echelon form. This will provide us with the values of the vector x which solve
the system. We use the row operations to change coefficients in the lower triangle of the
augmented matrix to 0. An augmented matrix of the form

 
$$\begin{pmatrix} a'_{11} & a'_{12} & a'_{13} & \cdots & a'_{1n} & | & b'_1 \\ 0 & a'_{22} & a'_{23} & \cdots & a'_{2n} & | & b'_2 \\ 0 & 0 & a'_{33} & \cdots & a'_{3n} & | & b'_3 \\ \vdots & & & \ddots & \vdots & | & \vdots \\ 0 & 0 & 0 & 0 & a'_{mn} & | & b'_m \end{pmatrix}$$

is said to be in row echelon form — each row has more leading zeros than the row preceding
it.
Reduced Row Echelon Form: We can go one step further and put the matrix into
reduced row echelon form. Reduced row echelon form makes the value of x which solves the
system very obvious. For a system of m equations in m unknowns, with no all-zero rows,
the reduced row echelon form would be

 
$$\begin{pmatrix} 1 & 0 & 0 & 0 & 0 & | & b^*_1 \\ 0 & 1 & 0 & 0 & 0 & | & b^*_2 \\ 0 & 0 & 1 & 0 & 0 & | & b^*_3 \\ & & & \ddots & & | & \vdots \\ 0 & 0 & 0 & 0 & 1 & | & b^*_m \end{pmatrix}$$

Gaussian and Gauss-Jordan elimination: We can conduct elementary row operations


to get our augmented matrix into row echelon or reduced row echelon form. The methods of
transforming a matrix or system into row echelon and reduced row echelon form are referred
to as Gaussian elimination and Gauss-Jordan elimination, respectively.
Elementary Row Operations: To do Gaussian and Gauss-Jordan elimination, we use
three basic operations to transform the augmented matrix into another augmented matrix
that represents an equivalent linear system – equivalent in the sense that the same values
of xj solve both the original and transformed matrix/system:
Interchanging Rows: Suppose we have the augmented matrix
$$\hat{A} = \begin{pmatrix} a_{11} & a_{12} & | & b_1 \\ a_{21} & a_{22} & | & b_2 \end{pmatrix}$$
If we interchange the two rows, we get the augmented matrix
$$\begin{pmatrix} a_{21} & a_{22} & | & b_2 \\ a_{11} & a_{12} & | & b_1 \end{pmatrix}$$
which represents a linear system equivalent to that represented by matrix $\hat{A}$.

Multiplying by a Constant: If we multiply the second row of matrix $\hat{A}$ by a constant c, we get the augmented matrix
$$\begin{pmatrix} a_{11} & a_{12} & | & b_1 \\ ca_{21} & ca_{22} & | & cb_2 \end{pmatrix}$$
which represents a linear system equivalent to that represented by matrix $\hat{A}$.

Adding (subtracting) Rows: If we add (subtract) the first row of matrix $\hat{A}$ to the second, we obtain the augmented matrix
$$\begin{pmatrix} a_{11} & a_{12} & | & b_1 \\ a_{11} + a_{21} & a_{12} + a_{22} & | & b_1 + b_2 \end{pmatrix}$$
which represents a linear system equivalent to that represented by matrix $\hat{A}$.

Example 1.6. Solve the following system of equations by using elementary row operations:
x − 3y = −3
2x + y = 8

Exercise 1.6 (Solving Systems of Equations). Put the following system of equations into
augmented matrix form. Then, using Gaussian or Gauss-Jordan elimination, solve the
system of equations by putting the matrix into row echelon or reduced row echelon form.



1. $\begin{cases} x + y + 2z = 2 \\ 3x - 2y + z = 1 \\ y - z = 3 \end{cases}$

2. $\begin{cases} 2x + 3y - z = -8 \\ x + 2y - z = 12 \\ -x - 4y + z = -6 \end{cases}$

1.7 Rank — and Whether a System Has One, Infinite, or No Solutions
To determine how many solutions exist, we can use information about (1) the number of
equations m, (2) the number of unknowns n, and (3) the rank of the matrix representing
the linear system.
Rank: The maximum number of linearly independent row or column vectors in the matrix.
This is equivalent to the number of nonzero rows of a matrix in row echelon form. For any
matrix A, the row rank always equals column rank, and we refer to this number as the rank
of A.
For example,
$$\begin{pmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{pmatrix} \quad \text{Rank} = 3$$
$$\begin{pmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 0 \end{pmatrix} \quad \text{Rank} = 2$$
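For matrices too large to reduce by hand, the rank can be computed numerically. This is only a sketch: it relies on R's qr() decomposition, whose reported rank matches the definition above for these two example matrices.

```r
M1 <- rbind(c(1, 2, 3),
            c(0, 4, 5),
            c(0, 0, 6))
M2 <- rbind(c(1, 2, 3),
            c(0, 4, 5),
            c(0, 0, 0))

qr(M1)$rank  # 3
qr(M2)$rank  # 2
```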

Exercise 1.7 (Rank of Matrices). Find the rank of each matrix below:
(Hint: transform the matrices into row echelon form. Remember that the number of nonzero
rows of a matrix in row echelon form is the rank of that matrix.)

1. $\begin{pmatrix} 1 & 1 & 2 \\ 2 & 1 & 3 \\ 1 & 2 & 3 \end{pmatrix}$

2. $\begin{pmatrix} 1 & 3 & 3 & -3 & 3 \\ 1 & 3 & 1 & 1 & 3 \\ 1 & 3 & 2 & -1 & -2 \\ 1 & 3 & 0 & 3 & -2 \end{pmatrix}$

Answer to Exercise 1.7:


1. rank is 2
2. rank is 3

1.8 The Inverse of a Matrix


Identity Matrix: The $n \times n$ identity matrix $I_n$ is the matrix whose diagonal elements are 1 and all off-diagonal elements are 0. Examples:
$$I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

Inverse Matrix: An $n \times n$ matrix A is nonsingular or invertible if there exists an $n \times n$ matrix $A^{-1}$ such that
$$AA^{-1} = A^{-1}A = I_n$$
where $A^{-1}$ is the inverse of A. If there is no such $A^{-1}$, then A is singular or not invertible.

Example: Let
$$A = \begin{pmatrix} 2 & 3 \\ 2 & 2 \end{pmatrix}, \quad B = \begin{pmatrix} -1 & \frac{3}{2} \\ 1 & -1 \end{pmatrix}$$
Since
$$AB = BA = I_n$$
we conclude that B is the inverse, $A^{-1}$, of A and that A is nonsingular.
Properties of the Inverse:
• If the inverse exists, it is unique.
• If A is nonsingular, then $A^{-1}$ is nonsingular.
• $(A^{-1})^{-1} = A$
• If A and B are nonsingular, then AB is nonsingular.
• $(AB)^{-1} = B^{-1}A^{-1}$
• If A is nonsingular, then $(A^T)^{-1} = (A^{-1})^T$
Procedure to Find $A^{-1}$: We know that if B is the inverse of A, then
$$AB = BA = I_n$$
Looking only at the first and last parts of this,
$$AB = I_n$$
Solving for B is equivalent to solving for n linear systems, where each column of B is solved for the corresponding column in $I_n$. We can solve the systems simultaneously by augmenting A with $I_n$ and performing Gauss-Jordan elimination on A. If Gauss-Jordan elimination on $[A|I_n]$ results in $[I_n|B]$, then B is the inverse of A. Otherwise, A is singular.
To summarize: To calculate the inverse of A,
1. Form the augmented matrix $[A|I_n]$.
2. Using elementary row operations, transform the augmented matrix to reduced row echelon form.
3. The result of step 2 is an augmented matrix $[C|B]$.
   a. If $C = I_n$, then $B = A^{-1}$.
   b. If $C \neq I_n$, then C has a row of zeros. This means A is singular and $A^{-1}$ does not exist.
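That procedure is what you would carry out by hand; in R, solve() applied to a single square matrix returns the inverse directly. A minimal sketch, reusing the 2 × 2 matrix from the example earlier in this section:

```r
A <- matrix(c(2, 2, 3, 2), nrow = 2)  # the matrix A = [2 3; 2 2] from the example above

A_inv <- solve(A)  # solve() with one argument returns the inverse (errors if A is singular)
A_inv              # [-1 1.5; 1 -1], i.e. the matrix B shown above
A %*% A_inv        # the 2 x 2 identity matrix
```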

Example 1.7. Find the inverse of the following matrices:

1. $A = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 2 & 3 \\ 5 & 5 & 1 \end{pmatrix}$

Exercise 1.8 (Finding the inverse of matrices). Find the inverse of the following matrix:

1. $A = \begin{pmatrix} 1 & 0 & 4 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix}$

1.9 Linear Systems and Inverses

Let’s return to the matrix representation of a linear system

Ax = b

If A is an $n \times n$ matrix, then $Ax = b$ is a system of n equations in n unknowns. Suppose A is nonsingular. Then $A^{-1}$ exists. To solve this system, we can multiply each side by $A^{-1}$ and reduce it as follows:
$$\begin{aligned} A^{-1}(Ax) &= A^{-1}b \\ (A^{-1}A)x &= A^{-1}b \\ I_n x &= A^{-1}b \\ x &= A^{-1}b \end{aligned}$$
Hence, given A and b and given that A is nonsingular, $x = A^{-1}b$ is a unique solution to this system.
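As a sketch of this result in R, using the system solved by hand in Example 1.6: compute the inverse with solve(A) and multiply, or (numerically preferable) let solve(A, b) solve the system directly.

```r
A <- matrix(c(1, 2, -3, 1), nrow = 2)  # x - 3y = -3 and 2x + y = 8 (Example 1.6)
b <- c(-3, 8)

solve(A) %*% b  # x = A^{-1} b, giving x = 3 and y = 2
solve(A, b)     # same solution without explicitly forming the inverse
```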

Exercise 1.9 (Solve linear system using inverses). Use the inverse matrix to solve the
following linear system:

−3x + 4y = 5
2x − y = −10

Hint: the linear system above can be written in the matrix form $Az = b$ given
$$A = \begin{pmatrix} -3 & 4 \\ 2 & -1 \end{pmatrix}, \quad z = \begin{pmatrix} x \\ y \end{pmatrix}, \quad b = \begin{pmatrix} 5 \\ -10 \end{pmatrix}$$

1.10 Determinants
Singularity: Determinants can be used to determine whether a square matrix is nonsingu-
lar.
A square matrix is nonsingular if and only if its determinant is not zero.
The determinant of a $1 \times 1$ matrix A equals $a_{11}$.
The determinant of a $2 \times 2$ matrix $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}$ is:
$$\det(A) = |A| = a_{11}|a_{22}| - a_{12}|a_{21}| = a_{11}a_{22} - a_{12}a_{21}$$

We can extend the second to last equation above to get the definition of the determinant of a $3 \times 3$ matrix:
$$\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}$$
$$= a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31})$$

Let's extend this now to any $n \times n$ matrix. Let's define $A_{ij}$ as the $(n-1) \times (n-1)$ submatrix of A obtained by deleting row i and column j. Let the (i, j)th minor of A be the determinant of $A_{ij}$:
$$M_{ij} = |A_{ij}|$$
Then for any $n \times n$ matrix A,
$$|A| = a_{11}M_{11} - a_{12}M_{12} + \cdots + (-1)^{n+1}a_{1n}M_{1n}$$

For example, does the following matrix have an inverse?
$$A = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 2 & 3 \\ 5 & 5 & 1 \end{pmatrix}$$
1. Calculate its determinant.
$$\begin{aligned} |A| &= 1(2 - 15) - 1(0 - 15) + 1(0 - 10) \\ &= -13 + 15 - 10 \\ &= -8 \end{aligned}$$
2. Since $|A| \neq 0$, we conclude that A has an inverse.
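R's det() function computes the same quantity; a quick sketch with the matrix above:

```r
A <- rbind(c(1, 1, 1),
           c(0, 2, 3),
           c(5, 5, 1))

det(A)       # -8
det(A) != 0  # TRUE, so A is nonsingular and has an inverse
```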

Exercise 1.10 (Determinants and Inverses). Determine whether the following matrices are nonsingular:

1. $\begin{pmatrix} 1 & 0 & 1 \\ 2 & 1 & 2 \\ 1 & 0 & -1 \end{pmatrix}$

2. $\begin{pmatrix} 2 & 1 & 2 \\ 1 & 0 & 1 \\ 4 & 1 & 4 \end{pmatrix}$

1.11 Getting Inverse of a Matrix using its Determinant


Thus far, we have a number of algorithms to
1. Find the solution of a linear system,
2. Find the inverse of a matrix
but these remain just that — algorithms. At this point, we have no way of telling how the solutions $x_j$ change as the parameters $a_{ij}$ and $b_i$ change, except by changing the values and "rerunning" the algorithms.
With determinants, we can provide an explicit formula for the inverse and therefore provide an explicit formula for the solution of an $n \times n$ linear system.

Hence, we can examine how changes in the parameters $a_{ij}$ and $b_i$ affect the solutions $x_j$.
Determinant Formula for the Inverse of a $2 \times 2$ Matrix:
The inverse of the $2 \times 2$ matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is:
$$A^{-1} = \frac{1}{\det(A)} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$$

For example, let's calculate the inverse of matrix A from Exercise 1.9 using the determinant formula. Recall,
$$A = \begin{pmatrix} -3 & 4 \\ 2 & -1 \end{pmatrix}$$
$$\det(A) = (-3)(-1) - (4)(2) = 3 - 8 = -5$$
$$A^{-1} = \frac{1}{\det(A)} \begin{pmatrix} -1 & -4 \\ -2 & -3 \end{pmatrix} = \frac{1}{-5} \begin{pmatrix} -1 & -4 \\ -2 & -3 \end{pmatrix} = \begin{pmatrix} \frac{1}{5} & \frac{4}{5} \\ \frac{2}{5} & \frac{3}{5} \end{pmatrix}$$

Exercise 1.11 (Calculate Inverse using Determinant Formula). Calculate the inverse of
$$A = \begin{pmatrix} 3 & 5 \\ -7 & 2 \end{pmatrix}$$

Answers to Examples and Exercises


Answer to Example 1.1:
1. $\begin{pmatrix} -1 & -3 & -3 \end{pmatrix}$
2. $6 + 4 + 10 = 20$

Answer to Exercise 1.1:
1. $\begin{pmatrix} -2 & 4 & -7 & -5 \end{pmatrix}$
2. $\begin{pmatrix} 2 & 26 & -14 & 4 & 30 \end{pmatrix}$
3. $63 - 3 - 10 + 24 = 74$
4. undefined

Answer to Example 1.2:


1. yes
2. no
Answer to Exercise 1.2:
1. yes
2. no ($-v_1 - v_2 + v_3 = 0$)

Answer to Example 1.3:
$$A + B = \begin{pmatrix} 2 & 4 & 4 \\ 6 & 6 & 8 \end{pmatrix}$$
Answer to Example 1.4:
$$sA = \begin{pmatrix} 2 & 4 & 6 \\ 8 & 10 & 12 \end{pmatrix}$$
Answer to Example 1.5:
1. $\begin{pmatrix} aA + bC & aB + bD \\ cA + dC & cB + dD \\ eA + fC & eB + fD \end{pmatrix}$
2. $\begin{pmatrix} 1(-2) + 2(4) - 1(2) & 1(5) + 2(-3) - 1(1) \\ 3(-2) + 1(4) + 4(2) & 3(5) + 1(-3) + 4(1) \end{pmatrix} = \begin{pmatrix} 4 & -2 \\ 6 & 16 \end{pmatrix}$
Answer to Exercise 1.3:
1. $AB = \begin{pmatrix} 4 & 11 & -15 \\ 5 & 7 & -7 \end{pmatrix}$
2. $BA$ = undefined
3. $(BC)^T$ = undefined
4. $BC^T = \begin{pmatrix} 1 & 5 & -7 \\ 1 & 1 & 0 \\ 0 & -1 & 1 \\ 2 & 0 & 0 \end{pmatrix} \begin{pmatrix} 3 & 0 \\ 2 & 4 \\ -1 & 6 \end{pmatrix} = \begin{pmatrix} 20 & -22 \\ 5 & 4 \\ -3 & 2 \\ 6 & 0 \end{pmatrix}$
Answer to Exercise 1.4:
There are many answers to this. Some possible simple ones are as follows:
1. One solution:
−x + y = 0
x + y = 2

2. No solution:
−x + y = 0
x − y = 2

3. Infinite solutions:
−x + y = 0
2x − 2y = 0

Answer to Exercise 1.5:
$$\begin{pmatrix} 2 & -7 & 9 & -4 & 0 & 0 & | & 8 \\ 0 & 41 & 9 & 0 & 0 & -5 & | & 11 \\ 1 & -15 & 0 & 0 & -11 & 0 & | & 9 \end{pmatrix}$$
Answer to Example 1.6:

x − 3y = −3
2x + y = 8

x − 3y = −3
7y = 14

x − 3y = −3
y = 2

x = 3
y = 2

Answer to Exercise 1.6:


1. x = 2, y = 2, z = -1
2. x = -17, y = -3, z = -35
Answer to Exercise 1.7:
1. rank is 2
2. rank is 3
Answer to Example 1.7:
$$\left(\begin{array}{ccc|ccc} 1 & 1 & 1 & 1 & 0 & 0 \\ 0 & 2 & 3 & 0 & 1 & 0 \\ 5 & 5 & 1 & 0 & 0 & 1 \end{array}\right)$$
$$\left(\begin{array}{ccc|ccc} 1 & 1 & 1 & 1 & 0 & 0 \\ 0 & 2 & 3 & 0 & 1 & 0 \\ 0 & 0 & -4 & -5 & 0 & 1 \end{array}\right)$$
$$\left(\begin{array}{ccc|ccc} 1 & 1 & 1 & 1 & 0 & 0 \\ 0 & 2 & 3 & 0 & 1 & 0 \\ 0 & 0 & 1 & 5/4 & 0 & -1/4 \end{array}\right)$$
$$\left(\begin{array}{ccc|ccc} 1 & 1 & 0 & -1/4 & 0 & 1/4 \\ 0 & 2 & 0 & -15/4 & 1 & 3/4 \\ 0 & 0 & 1 & 5/4 & 0 & -1/4 \end{array}\right)$$
$$\left(\begin{array}{ccc|ccc} 1 & 1 & 0 & -1/4 & 0 & 1/4 \\ 0 & 1 & 0 & -15/8 & 1/2 & 3/8 \\ 0 & 0 & 1 & 5/4 & 0 & -1/4 \end{array}\right)$$
$$\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & 13/8 & -1/2 & -1/8 \\ 0 & 1 & 0 & -15/8 & 1/2 & 3/8 \\ 0 & 0 & 1 & 5/4 & 0 & -1/4 \end{array}\right)$$
$$A^{-1} = \begin{pmatrix} 13/8 & -1/2 & -1/8 \\ -15/8 & 1/2 & 3/8 \\ 5/4 & 0 & -1/4 \end{pmatrix}$$
Answer to Exercise 1.8:
1. $A^{-1} = \begin{pmatrix} 1 & 0 & -4 \\ 0 & \frac{1}{2} & 0 \\ 0 & 0 & 1 \end{pmatrix}$
Answer to Exercise 1.9:
$$z = A^{-1}b = \begin{pmatrix} 1/5 & 4/5 \\ 2/5 & 3/5 \end{pmatrix} \begin{pmatrix} 5 \\ -10 \end{pmatrix} = \begin{pmatrix} -7 \\ -4 \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix}$$
Answer to Exercise 1.10:
1. nonsingular
2. singular
Answer to Exercise 1.11:
$$A^{-1} = \begin{pmatrix} \frac{2}{41} & \frac{-5}{41} \\ \frac{7}{41} & \frac{3}{41} \end{pmatrix}$$
Chapter 2

Functions and Operations

Topics: Dimensionality; Interval Notation for $\mathbf{R}^1$; Neighborhoods: Intervals, Disks, and Balls; Introduction to Functions; Domain and Range; Some General Types of Functions; log, ln, and exp; Other Useful Functions; Graphing Functions; Solving for Variables; Finding Roots; Limit of a Function; Continuity; Sets, Sets, and More Sets.

2.1 Summation Operators ∑ and ∏

Addition (+), subtraction (−), multiplication, and division are basic operations of arithmetic – combining numbers. In statistics and calculus, we want to add a sequence of numbers that can be expressed as a pattern without needing to write down all its components. For example, how would we express the sum of all numbers from 1 to 100 without writing a hundred numbers?
For this we use the summation operator $\sum$ and the product operator $\prod$.

Summation:
$$\sum_{i=1}^{100} x_i = x_1 + x_2 + x_3 + \cdots + x_{100}$$
The bottom of the $\sum$ symbol indicates an index (here, $i$) and its start value, 1. At the top is where the index ends. The notion of "addition" is part of the $\sum$ symbol. The content to the right of the summation is the meat of what we add. While you can pick your favorite index, start, and end values, the content must also have the index.
• $\sum\limits_{i=1}^{n} cx_i = c\sum\limits_{i=1}^{n} x_i$
• $\sum\limits_{i=1}^{n} (x_i + y_i) = \sum\limits_{i=1}^{n} x_i + \sum\limits_{i=1}^{n} y_i$
• $\sum\limits_{i=1}^{n} c = nc$


Product:
$$\prod_{i=1}^{n} x_i = x_1 x_2 x_3 \cdots x_n$$
Properties:
• $\prod\limits_{i=1}^{n} cx_i = c^n \prod\limits_{i=1}^{n} x_i$
• $\prod\limits_{i=k}^{n} cx_i = c^{n-k+1} \prod\limits_{i=k}^{n} x_i$
• $\prod\limits_{i=1}^{n} (x_i + y_i) = $ a total mess
• $\prod\limits_{i=1}^{n} c = c^n$

Other Useful Functions

Factorials!:

x! = x · (x − 1) · (x − 2) · · · (1)

Modulo: Tells you the remainder when you divide the first number by the second.

• 17 mod 3 = 2
• 100 % 30 = 10
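Each of these operators has a direct counterpart in base R, which the programming chapters use heavily. A minimal sketch, using the values from Exercise 2.1 below for illustration:

```r
x <- c(4, 3, 7, 11, 2)

sum(x)        # summation over the elements of x
prod(x)       # product over the elements of x
factorial(4)  # 4! = 24
17 %% 3       # modulo: the remainder of 17 divided by 3, i.e. 2
```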


5
Example 2.1 (Operators). 1. i=
i=1


5
2. i=
i=1

3. 14 mod 4 =

4. 4! =

Exercise 2.1 (Operators). Let $x_1 = 4$, $x_2 = 3$, $x_3 = 7$, $x_4 = 11$, $x_5 = 2$.
1. $\sum\limits_{i=1}^{3} 7x_i$
2. $\sum\limits_{i=1}^{5} 2$
3. $\prod\limits_{i=3}^{5} 2x_i$

2.2 Introduction to Functions

A function (in $\mathbf{R}^1$) is a mapping, or transformation, that relates members of one set to members of another set. For instance, if you have two sets, set A and set B, a function from A to B maps every value a in set A to a value $f(a) \in B$. Functions can be "many-to-one", where many values or combinations of values from set A produce a single output in set B, or they can be "one-to-one", where each value in set A corresponds to a single value in set B. A function by definition has a single function value for each element of its domain. This means there cannot be a "one-to-many" mapping.
Dimensionality: $\mathbf{R}^1$ is the set of all real numbers extending from $-\infty$ to $+\infty$ — i.e., the real number line. $\mathbf{R}^n$ is an n-dimensional space, where each of the n axes extends from $-\infty$ to $+\infty$.
• $\mathbf{R}^1$ is a one-dimensional line.
• $\mathbf{R}^2$ is a two-dimensional plane.
• $\mathbf{R}^3$ is a three-dimensional space.
Points in $\mathbf{R}^n$ are ordered n-tuples (which just means a combination of n elements where order matters), where each element of the n-tuple represents the coordinate along that dimension. For example:
• $\mathbf{R}^1$: (3)
• $\mathbf{R}^2$: (−15, 5)
• $\mathbf{R}^3$: (86, 4, 0)
Examples of mapping notation:
Function of one variable: $f: \mathbf{R}^1 \to \mathbf{R}^1$
• $f(x) = x + 1$. For each x in $\mathbf{R}^1$, $f(x)$ assigns the number $x + 1$.
Function of two variables: $f: \mathbf{R}^2 \to \mathbf{R}^1$.
• $f(x, y) = x^2 + y^2$. For each ordered pair $(x, y)$ in $\mathbf{R}^2$, $f(x, y)$ assigns the number $x^2 + y^2$.
We often use variable x as input and another y as output, e.g. $y = x + 1$.
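These mappings translate directly into R functions; a minimal sketch of the two examples above (the name g is just a label to keep the two apart):

```r
f <- function(x) x + 1          # f: R^1 -> R^1
g <- function(x, y) x^2 + y^2   # a function of two variables, R^2 -> R^1

f(3)       # 4
g(-15, 5)  # 250
```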

Example 2.2 (Functions). For each of the following, state whether they are one-to-one or many-to-one functions.
1. For $x \in [0, \infty]$, $f: x \to x^2$ (this could also be written as $f(x) = x^2$).
2. For $x \in [-\infty, \infty]$, $f: x \to x^2$.

Exercise 2.2 (Functions). For each of the following, state whether they are one-to-one or many-to-one functions.
1. For $x \in [-3, \infty]$, $f: x \to x^2$.
2. For $x \in [0, \infty]$, $f: x \to \sqrt{x}$

Some functions are defined only on proper subsets of Rn .


• Domain: the set of numbers in X at which f (x) is defined.
• Range: elements of Y assigned by f (x) to elements of X, or

f (X) = {y : y = f (x), x ∈ X}

Most often used when talking about a function f : R1 → R1 .


• Image: same as range, but more often used when talking about a function f : Rn →
R1 .
Some General Types of Functions
Monomials: $f(x) = ax^k$
a is the coefficient. k is the degree.
Examples: $y = x^2$, $y = -\frac{1}{2}x^3$
Polynomials: sums of monomials.
Examples: $y = -\frac{1}{2}x^3 + x^2$, $y = 3x + 5$
The degree of a polynomial is the highest degree of its monomial terms. Also, it's often a good idea to write polynomials with terms in decreasing degree.
Exponential Functions: Example: $y = 2^x$

2.3 log and exp

Relationship of logarithmic and exponential functions:
$$y = \log_a(x) \iff a^y = x$$
The log function can be thought of as an inverse for exponential functions. a is referred to as the "base" of the logarithm.
Common Bases: The two most common logarithms are base 10 and base e.
1. Base 10: $y = \log_{10}(x) \iff 10^y = x$. The base 10 logarithm is often simply written as "$\log(x)$" with no base denoted.
2. Base e: $y = \log_e(x) \iff e^y = x$. The base e logarithm is referred to as the "natural" logarithm and is written as "$\ln(x)$".
Properties of exponential functions:
• $a^x a^y = a^{x+y}$
• $a^{-x} = 1/a^x$
• $a^x / a^y = a^{x-y}$
• $(a^x)^y = a^{xy}$
• $a^0 = 1$

Properties of logarithmic functions (any base):
Generally, when statisticians or social scientists write $\log(x)$ they mean $\log_e(x)$. In other words: $\log_e(x) \equiv \ln(x) \equiv \log(x)$.
$$\log_a(a^x) = x$$
and
$$a^{\log_a(x)} = x$$
• $\log(xy) = \log(x) + \log(y)$
• $\log(x^y) = y\log(x)$
• $\log(1/x) = \log(x^{-1}) = -\log(x)$
• $\log(x/y) = \log(x \cdot y^{-1}) = \log(x) + \log(y^{-1}) = \log(x) - \log(y)$
• $\log(1) = \log(e^0) = 0$
Change of Base Formula: Use the change of base formula to switch bases as necessary:
$$\log_b(x) = \frac{\log_a(x)}{\log_a(b)}$$
Example:
$$\log_{10}(x) = \frac{\ln(x)}{\ln(10)}$$

You can use logs to go between sum and product notation. This will be particularly important when you're learning maximum likelihood estimation.
$$\begin{aligned} \log\left(\prod_{i=1}^{n} x_i\right) &= \log(x_1 \cdot x_2 \cdot x_3 \cdots x_n) \\ &= \log(x_1) + \log(x_2) + \log(x_3) + \cdots + \log(x_n) \\ &= \sum_{i=1}^{n} \log(x_i) \end{aligned}$$
Therefore, you can see that the log of a product is equal to the sum of the logs. We can write this more generally by adding in a constant, c:
$$\begin{aligned} \log\left(\prod_{i=1}^{n} cx_i\right) &= \log(cx_1 \cdot cx_2 \cdots cx_n) \\ &= \log(c^n \cdot x_1 \cdot x_2 \cdots x_n) \\ &= \log(c^n) + \log(x_1) + \log(x_2) + \cdots + \log(x_n) \\ &= n\log(c) + \sum_{i=1}^{n} \log(x_i) \end{aligned}$$

Example 2.3 (Logarithmic Functions). Evaluate each of the following logarithms
1. $\log_4(16)$
2. $\log_2(16)$
Simplify the following logarithm. By "simplify", we actually really mean - use as many of the logarithmic properties as you can.
3. $\log_4(x^3 y^5)$

Exercise 2.3 (Logarithmic Functions). Evaluate each of the following logarithms
1. $\log_{\frac{3}{2}}\left(\frac{27}{8}\right)$
Simplify each of the following logarithms. By "simplify", we actually really mean - use as many of the logarithmic properties as you can.
2. $\log\left(\frac{x^9 y^5}{z^3}\right)$
3. $\ln\sqrt{xy}$

2.4 Graphing Functions


What can a graph tell you about a function?
• Is the function increasing or decreasing? Over what part of the domain?
• How “fast” does it increase or decrease?
• Are there global or local maxima and minima? Where?
• Are there inflection points?
• Is the function continuous?
• Is the function differentiable?
• Does the function tend to some limit?
• Other questions related to the substance of the problem at hand.

2.5 Solving for Variables and Finding Roots


Sometimes we’re given a function y = f (x) and we want to find how x varies as a function
of y. Use algebra to move x to the left hand side (LHS) of the equation and so that the
right hand side (RHS) is only a function of y.

Example 2.4 (Solving for Variables). Solve for x:

1. $y = 3x + 2$
2. $y = e^x$

Solving for variables is especially important when we want to find the roots of an equation:
those values of variables that cause an equation to equal zero. Especially important in
finding equilibria and in doing maximum likelihood estimation.

Procedure: Given $y = f(x)$, set $f(x) = 0$. Solve for x.

Multiple Roots:
$$f(x) = x^2 - 9 \implies 0 = x^2 - 9 \implies 9 = x^2 \implies \pm\sqrt{9} = \sqrt{x^2} \implies \pm 3 = x$$
Quadratic Formula: For quadratic equations $ax^2 + bx + c = 0$, use the quadratic formula:
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

Exercise 2.4 (Finding Roots). Solve for x:

1. $f(x) = 3x + 2 = 0$
2. $f(x) = x^2 + 3x - 4 = 0$
3. $f(x) = e^{-x} - 10 = 0$

2.6 Sets
Interior Point: The point x is an interior point of the set S if x is in S and if there is
some ϵ-ball around x that contains only points in S. The interior of S is the collection of
all interior points in S. The interior can also be defined as the union of all open sets in S.
• If the set S is circular, the interior points are everything inside of the circle, but not
on the circle’s rim.
• Example: The interior of the set $\{(x, y) : x^2 + y^2 \leq 4\}$ is $\{(x, y) : x^2 + y^2 < 4\}$.
Boundary Point: The point x is a boundary point of the set S if every ϵ-ball around x
contains both points that are in S and points that are outside S. The boundary is the
collection of all boundary points.
• If the set S is circular, the boundary points are everything on the circle’s rim.
• Example: The boundary of $\{(x, y) : x^2 + y^2 \leq 4\}$ is $\{(x, y) : x^2 + y^2 = 4\}$.
Open: A set S is open if for each point x in S, there exists an open ϵ-ball around x
completely contained in S.
• If the set S is circular and open, the points contained within the set get infinitely close
to the circle’s rim, but do not touch it.
• Example: $\{(x, y) : x^2 + y^2 < 4\}$
Closed: A set S is closed if it contains all of its boundary points.
• Alternatively: A set is closed if its complement is open.
• If the set S is circular and closed, the set contains all points within the rim as well as
the rim itself.
• Example: $\{(x, y) : x^2 + y^2 \leq 4\}$
• Note: a set may be neither open nor closed. Example: $\{(x, y) : 2 < x^2 + y^2 \leq 4\}$
Complement: The complement of set S is everything outside of S.

• If the set S is circular, the complement of S is everything outside of the circle.


• Example: The complement of $\{(x, y) : x^2 + y^2 \leq 4\}$ is $\{(x, y) : x^2 + y^2 > 4\}$.
Empty: The empty (or null) set is a unique set that has no elements, denoted by {} or ∅.
• The empty set is an example of a set that is open and closed, or a “clopen” set.
• Examples: The set of squares with 5 sides; the set of countries south of the South
Pole.

Answers to Examples and Exercises


Answer to Example 2.1:
1. 1 + 2 + 3 + 4 + 5 = 15
2. 1 * 2 * 3 * 4 * 5 = 120
3. 2
4. 4 * 3 * 2 * 1 = 24
Answer to Exercise 2.1:
1. 7(4 + 3 + 7) = 98
2. 2 + 2 + 2 + 2 + 2 = 10
3. $2^3(7)(11)(2) = 1232$
Answer to Example 2.2:
1. one-to-one
2. many-to-one
Answer to Exercise 2.2:
1. many-to-one
2. one-to-one
Answer to Example 2.3:
1. 2
2. 4
3. $3\log_4(x) + 5\log_4(y)$
Answer to Exercise 2.3:
1. 3
2. $9\log(x) + 5\log(y) - 3\log(z)$
3. $\frac{1}{2}(\ln x + \ln y)$
Answer to Example 2.4:

1. $y = 3x + 2 \implies -3x = 2 - y \implies 3x = y - 2 \implies x = \frac{1}{3}(y - 2)$
2. $x = \ln y$
Answer to Exercise 2.4:
1. $-\frac{2}{3}$

2. x = {1, -4}
3. x = - ln 10
Chapter 3

Limits

Solving limits, i.e., finding the value of a function as its input moves closer to some value, is important for the social scientist's mathematical toolkit for two related tasks. The first is the study of calculus, which will in turn be useful to show where certain functions are maximized or minimized. The second is the study of statistical inference, which is the study of inferring things about things you cannot see by using things you can see.

Example: The Central Limit Theorem


Perhaps the most important theorem in statistics is the Central Limit Theorem,

Theorem 3.1 (Central Limit Theorem (i.i.d. case)). For any series of independent and
identically distributed random variables X1 , X2 , · · ·, we know the distribution of its sum even
if we do not know the distribution of X. The distribution of the sum is a Normal distribution.
$$\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} \text{Normal}(0, 1),$$

where µ is the mean of X and σ is the standard deviation of X. The arrow is read as
“converges in distribution to”. Normal(0, 1) indicates a Normal Distribution with mean 0
and variance 1.
That is, the limit of the distribution of the lefthand side is the distribution of the righthand
side.

The sign of a limit is the arrow “→”. Although we have not yet covered probability (in
Section 6) so we have not described what distributions and random variables are, it is worth
foreshadowing the Central Limit Theorem. The Central Limit Theorem is powerful because
it gives us a guarantee of what would happen if n → ∞, which in this case means we collected
more data.


[Figure 3.1: As the number of coin tosses goes to infinity, the average probability of heads converges to 0.5. The vertical axis shows the estimate of the probability of heads after n trials; at n = 1,000 the estimate is 0.487.]

Example: The Law of Large Numbers

A finding that perhaps rivals the Central Limit Theorem is the Law of Large Numbers:

Theorem 3.2 ((Weak) Law of Large Numbers). For any draw of identically distributed
independent variables with mean µ, the sample average after n draws, X̄n , converges in
probability to the true mean as n → ∞:

lim P (|X̄n − µ| > ε) = 0


n→∞

p
A shorthand of which is X̄n −
→ µ, where the arrow is read as “converges in probability to”.

Intuitively, the more data, the more accurate your guess. For example, Figure 3.1 shows how the sample average from many coin tosses converges to the true value of 0.5.
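Figure 3.1 can be reproduced up to simulation noise with a few lines of base R: simulate fair coin flips and track the running proportion of heads. The seed is an arbitrary choice, so the exact path (and the estimate at n = 1,000) will differ slightly from the figure.

```r
set.seed(2019)  # arbitrary seed so the simulation is reproducible
n <- 1000
flips <- rbinom(n, size = 1, prob = 0.5)   # 1 = heads, 0 = tails
running_avg <- cumsum(flips) / seq_len(n)  # estimate of P(heads) after each flip

running_avg[n]  # close to 0.5, as the Law of Large Numbers predicts
plot(running_avg, type = "l", ylim = c(0, 1),
     xlab = "n, number of coin flips", ylab = "Estimate of P(heads)")
abline(h = 0.5, lty = 2)  # the true probability
```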

3.1 Sequences
We need a couple of steps until we get to limit theorems in probability. First we will
introduce a “sequence”, then we will think about the limit of a sequence, then we will think
about the limit of a function.
A sequence
$$\{x_n\} = \{x_1, x_2, x_3, \ldots, x_n\}$$
is an ordered set of real numbers, where $x_1$ is the first term in the sequence and $x_n$ is the nth term. Generally, a sequence is infinite, that is, it extends to $n = \infty$. We can also write the sequence as
$$\{x_n\}_{n=1}^{\infty}$$
where the subscript and superscript are read together as "from 1 to infinity."

Example 3.1 (Sequences). How do these sequences behave?
1. $\{A_n\} = \left\{2 - \frac{1}{n^2}\right\}$
2. $\{B_n\} = \left\{\frac{n^2 + 1}{n}\right\}$
3. $\{C_n\} = \left\{(-1)^n \left(1 - \frac{1}{n}\right)\right\}$

We find the sequence by simply “plugging in” the integers into each n. The important thing
is to get a sense of how these numbers are going to change. Example 1’s numbers seem
to come closer and closer to 2, but will it ever surpass 2? Example 2’s numbers are also
increasing each time, but will it hit a limit? What is the pattern in Example 3? Graphing
helps you make this point more clearly. See the sequence of n = 1, ...20 for each of the three
examples in Figure 3.2.

3.2 The Limit of a Sequence


The notion of “converging to a limit” is the behavior of the points in Example 3.1. In some
sense, that’s the counterfactual we want to know. What happens as n → ∞?
1. Sequences like 1 above that converge to a limit.
2. Sequences like 2 above that increase without bound.
3. Sequences like 3 above that neither converge nor increase without bound — alternating
over the number line.

Definition 3.1. The sequence $\{y_n\}$ has the limit L, which we write as
$$\lim_{n \to \infty} y_n = L,$$
if for any $\epsilon > 0$ there is an integer N (which depends on $\epsilon$) with the property that $|y_n - L| < \epsilon$ for each $n > N$. $\{y_n\}$ is said to converge to L. If the above does not hold, then $\{y_n\}$ diverges.
[Figure 3.2: Behavior of Some Sequences. Three panels plot A_n, B_n, and C_n against n = 1, ..., 20.]

We can also describe the behavior of a sequence in terms of boundedness and monotonicity:


1. Bounded: if |yn | ≤ K for all n
2. Monotonically Increasing: yn+1 > yn for all n
3. Monotonically Decreasing: yn+1 < yn for all n
A limit is unique: If {yn } converges, then the limit L is unique.
If two sequences converge, then their sums, products, and quotients also converge. Let lim_{n→∞} y_n = y and lim_{n→∞} z_n = z. Then

1. lim_{n→∞} [k y_n + ℓ z_n] = ky + ℓz
2. lim_{n→∞} y_n z_n = yz
3. lim_{n→∞} y_n / z_n = y/z, provided z ≠ 0

This looks reasonable enough. The harder question, obviously, is when the parts of the fraction don't converge. If lim_{n→∞} y_n = ∞ and lim_{n→∞} z_n = ∞, what is lim_{n→∞} (y_n − z_n)? What is lim_{n→∞} y_n / z_n?
It is nice for a sequence to converge in limit. We want to know if complex-looking sequences converge or not. The name of the game here is to break that complex sequence up into sums of simple fractions where n only appears in the denominator: 1/n, 1/n^2, and so on. Each of these will converge to 0, because the denominator gets larger and larger. Then, because of the properties above, we can find the limit of the full sequence.

Example 3.2 (Simplifying a Fraction into Sums). Find the limit of
$$\lim_{n\to\infty} \frac{n+3}{n}.$$

Solution. At first glance, n + 3 and n both grow to ∞, so it looks like we need to divide
infinity by infinity. However, we can express this fraction as a sum, then the limits apply
separately:
$$\lim_{n\to\infty} \frac{n+3}{n} = \lim_{n\to\infty}\left(1 + \frac{3}{n}\right) = \underbrace{\lim_{n\to\infty} 1}_{1} + \underbrace{\lim_{n\to\infty} \frac{3}{n}}_{0}$$

so, the limit is actually 1.


After some practice, the key to intuition is whether one part of the fraction grows “faster”
than another. If the denominator grows faster to infinity than the numerator, then the
fraction will converge to 0, even if the numerator will also increase to infinity. In a sense,
limits show how not all infinities are the same.
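To see the "n only in the denominator" trick at work, a quick numerical check (illustrative only) shows the fraction from Example 3.2 settling at 1:

```python
def f(n):
    return (n + 3) / n   # equivalently 1 + 3/n

for n in (10, 1_000, 100_000, 10_000_000):
    print(n, f(n))
# The values approach 1 because the 3/n piece converges to 0.
```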

Exercise 3.1. Find the following limits of sequences, then explain in English the intuition
for why that is the case.
1. lim_{n→∞} 2n/(n^2 + 1)
2. lim_{n→∞} (n^3 − 100n^2)

3.3 Limits of a Function

We’ve now covered functions and just covered limits of sequences, so now is the time to
combine the two.
A function f is a compact representation of some behavior we care about. Like for sequences,
we often want to know if f (x) approaches some number L as its independent variable x moves
to some number c (which is usually 0 or ±∞). If it does, we say that the limit of f(x), as x approaches c, is L: lim_{x→c} f(x) = L. Unlike a sequence, x is continuous, so it can approach c from below as well as from above.
For a limit L to exist, the function f (x) must approach L from both the left (increasing)
and the right (decreasing).

Definition 3.2 (Limit of a function). Let f(x) be defined at each point in some open interval containing the point c. Then L equals lim_{x→c} f(x) if for any (small positive) number ϵ, there exists a corresponding number δ > 0 such that if 0 < |x − c| < δ, then |f(x) − L| < ϵ.

A neat, if subtle, result is that f(x) does not necessarily have to be defined at c for lim_{x→c} to exist.
Properties: Let f and g be functions with lim_{x→c} f(x) = k and lim_{x→c} g(x) = ℓ.

1. lim_{x→c} [f(x) + g(x)] = lim_{x→c} f(x) + lim_{x→c} g(x)
2. lim_{x→c} k f(x) = k lim_{x→c} f(x)
3. lim_{x→c} [f(x)g(x)] = [lim_{x→c} f(x)] · [lim_{x→c} g(x)]
4. $\lim_{x\to c} \dfrac{f(x)}{g(x)} = \dfrac{\lim_{x\to c} f(x)}{\lim_{x\to c} g(x)}$, provided lim_{x→c} g(x) ≠ 0.

Simple limits of functions can be solved as we did limits of sequences. Just be careful which
part of the function is changing.

Example 3.3 (Limits of Functions). Find the limit of the following functions.
1. limx→c k
2. limx→c x
3. limx→2 (2x − 3)
4. limx→c xn

Limits can get more complex in roughly two ways. First, the functions may become large polynomials with many moving pieces. Second, the functions may become discontinuous. A function can be thought of as a more general or "smooth" version of a sequence. For example:

Exercise 3.2 (Limits of a Fraction of Functions). Find the limit of
$$\lim_{x\to\infty} \frac{(x^4 + 3x - 99)(2 - x^5)}{(18x^7 + 9x^6 - 3x^2 - 1)(x + 1)}$$

Now, the functions will become a bit more complex:

Exercise 3.3. Solve the following limits of functions.

1. lim_{x→0} |x|
2. lim_{x→0} (1 + 1/x^2)

So there are a few more alternatives about what a limit of a function could be:
1. Right-hand limit: The value approached by f (x) when you move from right to left.
2. Left-hand limit: The value approached by f (x) when you move from left to right.
3. Infinity: The value approached by f (x) as x grows infinitely large. Sometimes this
may be a number; sometimes it might be ∞ or −∞.
4. Negative infinity: The value approached by f (x) as x grows infinitely negative. Some-
times this may be a number; sometimes it might be ∞ or −∞.
The distinction between left and right becomes important when the function is not defined for some values of x. What are those cases in the examples below?

3.4 Continuity
To repeat a finding from the limits of functions: f(x) does not necessarily have to be defined at c for lim_{x→c} to exist. Functions that have breaks in their lines are called discontinuous. Functions that have no breaks are called continuous. Continuity is a concept that is more fundamental than, but related to, that of "differentiability", which we will cover next in calculus.

Definition 3.3 (Continuity). Suppose that the domain of the function f includes an open interval containing the point c. Then f is continuous at c if lim_{x→c} f(x) exists and if lim_{x→c} f(x) = f(c). Further, f is continuous on an open interval (a, b) if it is continuous at each point in the interval.

To prove that a function is continuous for all points is beyond this practical introduction to
math, but the general intuition can be grasped by graphing.

Example 3.4 (Continuous and Discontinuous Functions). For each function, determine if
it is continuous or discontinuous.

1. f(x) = x
2. f(x) = e^x
3. f(x) = 1 + 1/x^2
4. f(x) = floor(x)
[Figure 3.3: Functions which are not defined in some areas, e.g., f(x) = 1/x is not defined at x = 0.]

[Figure 3.4: Continuous and Discontinuous Functions. The panels plot the four functions from Example 3.4.]

The floor is the smaller of the two integers bounding a number. So floor(x = 2.999) = 2,
floor(x = 2.0001) = 2, and floor(x = 2) = 2.

Solution. In Figure 3.4, we can see that the first two functions are continuous, and the next two are discontinuous. f(x) = 1 + 1/x^2 is discontinuous at x = 0, and f(x) = floor(x) is discontinuous at each whole number.
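One way to see the discontinuity of floor(x) is to approach a whole number from each side; a small Python check (illustrative, not from the original text):

```python
import math

for eps in (0.1, 0.01, 0.001):
    print(f"floor(2 - {eps}) = {math.floor(2 - eps)},  floor(2 + {eps}) = {math.floor(2 + eps)}")
# The left-hand values stay at 1 while the right-hand values stay at 2,
# so the two one-sided limits at x = 2 disagree and floor(x) is discontinuous there.
```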
Some properties of continuous functions:
1. If f and g are continuous at point c, then f + g, f − g, f · g, |f |, and αf are continuous
at point c also. f /g is continuous, provided g(c) ̸= 0.
2. Boundedness: If f is continuous on the closed bounded interval [a, b], then there is a
number K such that |f (x)| ≤ K for each x in [a, b].
3. Max/Min: If f is continuous on the closed bounded interval [a, b], then f has a maxi-
mum and a minimum on [a, b]. They may be located at the end points.

Exercise 3.4 (Limit when Denominator converges to 0). Let
$$f(x) = \frac{x^2 + 2x}{x}.$$
1. Graph the function. Is it defined everywhere?
2. What is the function's limit as x → 0?

Answers to Examples
Example 3.1
Solution.
1. {A_n} = {2 − 1/n^2} = {1, 7/4, 17/9, 31/16, 49/25, ...}, which approaches 2
2. {B_n} = {(n^2 + 1)/n} = {2, 5/2, 10/3, 17/4, ...}
3. {C_n} = {(−1)^n (1 − 1/n)} = {0, 1/2, −2/3, 3/4, −4/5, ...}
Exercise 3.1
1. lim_{n→∞} 2n/(n^2 + 1) = 0: the n^2 in the denominator grows faster than the 2n in the numerator.
2. lim_{n→∞} (n^3 − 100n^2) = ∞: the n^3 term eventually dominates the −100n^2 term.
Example 3.3
Solution.
1. k
2. c
3. lim_{x→2} (2x − 3) = 2 lim_{x→2} x − 3 lim_{x→2} 1 = 4 − 3 = 1
4. lim_{x→c} x^n = [lim_{x→c} x] ⋯ [lim_{x→c} x] = c ⋯ c = c^n

Exercise 3.2
Solution. Although this function seems large, the thing our eyes should focus on is where the highest order polynomial remains. That will grow the fastest, so if the highest order term is in the denominator, the fraction will converge to 0; if it is in the numerator, the fraction will diverge (here, to negative infinity because of the sign). Previewing the multiplication by hand, we can see that the −x^9 term in the numerator will be the largest power, so the answer will be −∞. We can also confirm this by writing out fractions:
$$\lim_{x\to\infty} \frac{\left(1 + \frac{3}{x^3} - \frac{99}{x^4}\right)\left(\frac{2}{x^5} - 1\right)}{\left(1 + \frac{9}{18x} - \frac{3}{18x^5} - \frac{1}{18x^7}\right)\left(1 + \frac{1}{x}\right)} \times \frac{x^4 \cdot x^5}{18x^7 \cdot x} = -1 \times \lim_{x\to\infty} \frac{x}{18} = -\infty$$

Exercise 3.4
Solution. See Figure 3.5.
Divide each part by x, and we get x + 2 in the numerator and 1 in the denominator. So, even though the function is not defined at x = 0, we can say lim_{x→0} f(x) = 2.
[Figure 3.5: f(x) = (x^2 + 2x)/x, a function undefined at x = 0.]


Chapter 4

Calculus

Calculus is a fundamental part of any type of statistics exercise. Although you may not be taking derivatives and integrals in your daily work as an analyst, calculus undergirds many concepts we use: maximization, expectation, and cumulative probability.

Example: The Mean is a Type of Integral


The average of a quantity is a type of weighted mean, where the potential values are weighted
by their likelihood, loosely speaking. The integral is actually a general way to describe this
weighted average when there are conceptually an infinite number of potential values.
If X is a continuous random variable, its expected value E(X) – the center of mass – is
given by
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$$

where f (x) is the probability density function of X.


This is a continuous version of the case where X is discrete, in which case
$$E(X) = \sum_{j=1}^{\infty} x_j P(X = x_j)$$

Even more concretely, if the potential values of X are finite, then we can write out the expected value as a weighted mean, where the weights are the probability that the value occurs:
$$E(X) = \sum_{x} \underbrace{x}_{\text{value}} \cdot \underbrace{P(X = x)}_{\text{weight, or PMF}}$$


4.1 Derivatives
The derivative of f at x is its rate of change at x: how much f (x) changes with a change in x.
The rate of change is a fraction — rise over run — but because not all lines are straight and
the rise over run formula will give us different values depending on the range we examine,
we need to take a limit (Section 3).

Definition 4.1 (Derivative). Let f be a function whose domain includes an open interval
containing the point x. The derivative of f at x is given by

$$\frac{d}{dx} f(x) = \lim_{h\to 0} \frac{f(x+h) - f(x)}{(x+h) - x} = \lim_{h\to 0} \frac{f(x+h) - f(x)}{h}$$

There are two main ways to denote a derivative:

• Leibniz notation: (d/dx) f(x)
• Prime or Lagrange notation: f'(x)

If f(x) is a straight line, the derivative is the slope. For a curve, the slope changes with the value of x, so the derivative is the slope of the line tangent to the curve at x. See, for example, Figure 4.1.
If f'(x) exists at a point x_0, then f is said to be differentiable at x_0. That also implies that f(x) is continuous at x_0.
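The limit definition also suggests a direct numerical check: shrink h and watch the difference quotient settle. Here is a minimal Python sketch (illustrative, not from the booklet), using f(x) = x^3 at x = 2, where the true derivative is 3x^2 = 12.

```python
def f(x):
    return x**3

x0 = 2.0
for h in (1e-1, 1e-3, 1e-5):
    diff_quotient = (f(x0 + h) - f(x0)) / h   # rise over run on a shrinking interval
    print(f"h = {h:g}: {diff_quotient:.6f}")
# The quotients approach 12, the value of f'(2) = 3 * 2**2.
```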

Properties of derivatives

Suppose that f and g are differentiable at x and that α is a constant. Then the functions
f ± g, αf , f g, and f /g (provided g(x) ̸= 0) are also differentiable at x. Additionally,
Constant rule:
$$[k f(x)]' = k f'(x)$$
Sum rule:
$$[f(x) \pm g(x)]' = f'(x) \pm g'(x)$$
With a bit more algebra, we can apply the definition of derivatives to get a formula for the derivative of a product and the derivative of a quotient.
Product rule:
$$[f(x)g(x)]' = f'(x)g(x) + f(x)g'(x)$$
Quotient rule:
$$[f(x)/g(x)]' = \frac{f'(x)g(x) - f(x)g'(x)}{[g(x)]^2}, \quad g(x) \neq 0$$

Finally, one way to think of the power of derivatives is that it takes a function a notch down
in complexity. The power rule applies to any higher-order function:
[Figure 4.1: The Derivative as a Slope. The panels plot f(x) = 2x and g(x) = x^3 alongside their derivatives f'(x) and g'(x).]

Power rule:
$$[x^k]' = k x^{k-1}$$

This holds for any real number k (that is, both whole numbers and fractions). The power rule is proved by induction, a neat method of proof used in many fundamental applications to prove that a general statement holds for every possible case, even if there are countably infinite cases. We'll show a simple case where k is an integer here.

Proof of Power Rule by Induction. We would like to prove that
$$[x^k]' = k x^{k-1}$$
for any positive integer k.
First, consider the base case of k = 1. We can show by the definition of derivatives (setting f(x) = x^1 = x) that
$$[x^1]' = \lim_{h\to 0} \frac{(x+h) - x}{(x+h) - x} = 1.$$
Because 1 is also expressed as 1·x^{1−1}, the statement we want to prove holds for the case k = 1.
Now, assume that the statement holds for some integer m. That is, assume
$$[x^m]' = m x^{m-1}$$

Then, for the case m + 1, using the product rule above, we can simplify
$$\begin{aligned}
[x^{m+1}]' &= [x^m \cdot x]' \\
&= (x^m)' \cdot x + (x^m)(x)' \\
&= m x^{m-1} \cdot x + x^m \quad (\text{by the previous assumption}) \\
&= m x^m + x^m \\
&= (m+1)x^m \\
&= (m+1)x^{(m+1)-1}
\end{aligned}$$
Therefore, the rule holds for the case k = m + 1 once we have assumed it holds for k = m. Combined with the base case, this completes the proof by induction: we have now proved that the statement holds for all integers k = 1, 2, 3, ⋯.
To show that it holds for fractional exponents as well, we can express the exponent as a fraction of two integers.

These “rules” become apparent by applying the definition of the derivative above to each of
the things to be “derived”, but these come up so frequently that it is best to repeat until it
is muscle memory.
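Symbolic software can also confirm these rules; here is a brief SymPy sketch (illustrative, not part of the original text) checking the power, product, and quotient rules on concrete functions.

```python
import sympy as sp

x = sp.symbols("x")

print(sp.diff(x**5, x))                        # power rule: 5*x**4
print(sp.diff(x**2 * sp.sin(x), x))            # product rule: x**2*cos(x) + 2*x*sin(x)
print(sp.simplify(sp.diff(sp.sin(x) / x, x)))  # quotient rule: (x*cos(x) - sin(x))/x**2
```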

Exercise 4.1 (Derivative of Polynomials). For each of the following functions, find the first-order derivative f'(x).
1. f(x) = c
2. f(x) = x
3. f(x) = x^2
4. f(x) = x^3
5. f(x) = 1/x^2
6. f(x) = (x^3)(2x^4)
7. f(x) = x^4 − x^3 + x^2 − x + 1
8. f(x) = (x^2 + 1)(x^3 − 1)
9. f(x) = 3x^2 + 2x^{1/3}
10. f(x) = (x^2 + 1)/(x^2 − 1)

4.2 Higher-Order Derivatives (Derivatives of Derivatives of Derivatives)
The first derivative comes from applying the definition of the derivative to the function, and it can be expressed as
$$f'(x), \quad y', \quad \frac{d}{dx}f(x), \quad \frac{dy}{dx}$$

We can keep applying the differentiation process to functions that are themselves derivatives. The derivative of f'(x) with respect to x would then be
$$f''(x) = \lim_{h\to 0} \frac{f'(x+h) - f'(x)}{h}$$
and we can therefore call it the second derivative:
$$f''(x), \quad y'', \quad \frac{d^2}{dx^2}f(x), \quad \frac{d^2y}{dx^2}$$

Similarly, the derivative of f''(x) would be called the third derivative and is denoted f'''(x). And by extension, the nth derivative is expressed as (d^n/dx^n) f(x), or d^n y / dx^n.

Example 4.1 (Succession of Derivatives).

f (x) = x3
f ′ (x) = 3x2
f ′′ (x) = 6x
f ′′′ (x) = 6
f ′′′′ (x) = 0

Earlier, in Section 4.1, we said that if a function is differentiable at a given point, then it must be continuous there. Further, if f'(x) is itself continuous, then f(x) is called continuously differentiable. All of this matters because many of our findings about optimization (Section 5) rely on differentiation, and so we want our function to be differentiable in as many layers as possible. A function that can be differentiated infinitely many times is called "smooth". Some examples: f(x) = x^2, f(x) = e^x.

4.3 Composite Functions and the Chain Rule


As useful as the above rules are, many functions you'll see won't fit neatly in each case immediately. Instead, they will be functions of functions. For example, the difference between x^2 + 1^2 and (x^2 + 1)^2 may look trivial, but the sum rule can be easily applied to the former, while it's actually not obvious what to do with the latter.
Composite functions are formed by substituting one function into another and are de-
noted by
(f ◦ g)(x) = f [g(x)].
To form f [g(x)], the range of g must be contained (at least in part) within the domain of
f . The domain of f ◦ g consists of all the points in the domain of g for which g(x) is in the
domain of f .

Example 4.2. Let f (x) = log x for 0 < x < ∞ and g(x) = x2 for −∞ < x < ∞.
Then
(f ◦ g)(x) = log x2 , −∞ < x < ∞ − {0}

Also
(g ◦ f )(x) = [log x]2 , 0 < x < ∞

Notice that f ◦ g and g ◦ f are not the same functions.

With the notation of composite functions in place, now we can introduce a helpful addi-
tional rule that will deal with a derivative of composite functions as a chain of concentric
derivatives.
Chain Rule:
Let y = (f ◦ g)(x) = f [g(x)]. The derivative of y with respect to x is

$$\frac{d}{dx}\{f[g(x)]\} = f'[g(x)]\, g'(x)$$

We can read this as: “the derivative of the composite function y is the derivative of f
evaluated at g(x), times the derivative of g.”
The chain rule can be thought of as the derivative of the “outside” times the derivative of
the “inside”, remembering that the derivative of the outside function is evaluated at the
value of the inside function.

• The chain rule can also be written as
$$\frac{dy}{dx} = \frac{dy}{dg(x)} \frac{dg(x)}{dx}$$
This expression does not imply that the dg(x)'s cancel out, as in fractions. They are part of the derivative notation and you can't separate them out or cancel them.

Example 4.3 (Composite Exponent). Find f ′ (x) for f (x) = (3x2 + 5x − 7)6 .

The most direct use of the chain rule is when the expression being raised to a power is itself a function, so the plain power rule cannot be applied directly:
Generalized Power Rule:
If f(x) = [g(x)]^p for any rational number p,
$$f'(x) = p[g(x)]^{p-1} g'(x)$$
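As a sanity check on Example 4.3, a short SymPy sketch (illustrative only) applies the chain rule mechanically and compares it with the by-hand answer.

```python
import sympy as sp

x = sp.symbols("x")
f = (3 * x**2 + 5 * x - 7)**6

derivative = sp.diff(f, x)
expected = 6 * (3 * x**2 + 5 * x - 7)**5 * (6 * x + 5)   # chain rule done by hand

print(derivative)
print(sp.simplify(derivative - expected))   # prints 0: the two expressions agree
```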

4.4 Derivatives of natural logs and the exponent


Natural logs and exponents (they are inverses of each other; see Section 2.3) crop up everywhere in statistics. Their derivatives are special cases of the above, but quite elegant.

Theorem 4.1. The functions e^x and the natural logarithm log(x) are continuous and differentiable in their domains, and their first derivatives are
$$(e^x)' = e^x$$
$$(\log x)' = \frac{1}{x}$$

Also, when these are composite functions, it follows by the chain rule that
$$\left(e^{g(x)}\right)' = e^{g(x)} \cdot g'(x)$$
$$(\log g(x))' = \frac{g'(x)}{g(x)}, \quad \text{if } g(x) > 0$$

We will relegate the proofs to small excerpts.

Derivatives of natural exponential function (e)

To repeat the main rule in Theorem 4.1, the intuition is that:

1. The derivative of e^x is itself: (d/dx) e^x = e^x (see Figure 4.2)
2. The same holds if there is a constant in front: (d/dx) αe^x = αe^x
3. The same holds no matter how many derivatives are taken: (d^n/dx^n) αe^x = αe^x
4. Chain Rule: When the exponent is a function of x, take the derivative of that function and multiply: (d/dx) e^{g(x)} = e^{g(x)} g'(x)

[Figure 4.2: Derivative of the Exponential Function. The panels plot f(x) = e^x and its derivative f'(x), which are identical.]

Example 4.4 (Derivative of exponents). Find the derivative for the following.
1. f(x) = e^{−3x}
2. f(x) = e^{x^2}
3. f(x) = (x − 1)e^x

Derivatives of log

The natural log is the mirror image of the natural exponent and has mirroring properties,
again, to repeat the theorem,
1. The derivative of log x is one over x: (d/dx) log x = 1/x (Figure 4.3)
2. Exponents become multiplicative constants: (d/dx) log x^k = (d/dx) k log x = k/x
3. Chain rule again: (d/dx) log u(x) = u'(x)/u(x)
4. For any positive base b, (d/dx) b^x = (log b)(b^x).
[Figure 4.3: Derivative of the Natural Log. The panels plot f(x) = log(x) and its derivative f'(x) = 1/x.]

Example 4.5 (Derivative of logs). Find dy/dx for the following.

1. f (x) = log(x2 + 9)
2. f (x) = log(log x)
3. f (x) = (log x)2
4. f(x) = log e^x

Outline of Proof

We actually show the derivative of the log first, and then the derivative of the exponential
naturally follows.

The general derivative of the log at any base a is solvable by the definition of derivatives.
$$(\log_a x)' = \lim_{h\to 0} \frac{1}{h} \log_a\left(1 + \frac{h}{x}\right)$$
Re-express g = h/x and get
$$(\log_a x)' = \frac{1}{x} \lim_{g\to 0} \log_a (1 + g)^{\frac{1}{g}} = \frac{1}{x} \log_a e$$
by the definition of e. As a special case, when a = e, then (log x)' = 1/x.

Now let's think about the inverse, taking the derivative of y = a^x.
$$\begin{aligned}
y &= a^x \\
\Rightarrow \log y &= x \log a \\
\Rightarrow \frac{y'}{y} &= \log a \\
\Rightarrow y' &= y \log a
\end{aligned}$$
Then in the special case where a = e,
$$(e^x)' = e^x$$

4.5 Partial Derivatives


What happens when there's more than one variable that is changing?
If you can do ordinary derivatives, you can do partial derivatives: just hold all
the other input variables constant except for the one you’re differentiating with
respect to. (Joe Blitzstein’s Math Notes)
Suppose we have a function f of two (or more) variables and we want to determine the rate of change relative to one of the variables. To do so, we would find its partial derivative, which is defined similarly to the derivative of a function of one variable.
Partial Derivative: Let f be a function of the variables (x1 , . . . , xn ). The partial derivative
of f with respect to xi is

$$\frac{\partial f}{\partial x_i}(x_1, \ldots, x_n) = \lim_{h\to 0} \frac{f(x_1, \ldots, x_i + h, \ldots, x_n) - f(x_1, \ldots, x_i, \ldots, x_n)}{h}$$

Only the ith variable changes — the others are treated as constants.
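Holding the other variables fixed is exactly what symbolic differentiation does when you name the variable to differentiate with respect to. A small SymPy sketch (illustrative; the function below is a hypothetical example, not one from the text):

```python
import sympy as sp

x, y = sp.symbols("x y")
f = x**2 * y + y**3          # a hypothetical two-variable function

print(sp.diff(f, x))         # partial wrt x: 2*x*y        (y is held constant)
print(sp.diff(f, y))         # partial wrt y: x**2 + 3*y**2
print(sp.diff(f, x, x))      # second partial wrt x twice: 2*y
print(sp.diff(f, x, y))      # cross partial: 2*x
```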
We can take higher-order partial derivatives, like we did with functions of a single variable,
except now the higher-order partials can be with respect to multiple variables.

Example 4.6 (More than one type of partial). Notice that you can take partials with
regard to different variables.
Suppose f(x, y) = x^2 + y^2. Then
$$\frac{\partial f}{\partial x}(x, y) = $$
$$\frac{\partial f}{\partial y}(x, y) = $$
$$\frac{\partial^2 f}{\partial x^2}(x, y) = $$
$$\frac{\partial^2 f}{\partial x \partial y}(x, y) = $$

Exercise 4.2. Let f(x, y) = x^3 y^4 + e^x − log y. What are the following partial derivatives?
$$\frac{\partial f}{\partial x}(x, y) = $$
$$\frac{\partial f}{\partial y}(x, y) = $$
$$\frac{\partial^2 f}{\partial x^2}(x, y) = $$
$$\frac{\partial^2 f}{\partial x \partial y}(x, y) = $$

4.6 Taylor Series Approximation


A common form of approximation used in statistics involves derivatives. A Taylor series is
a way to represent common functions as infinite series (a sum of infinite elements) of the
function’s derivatives at some point a.
For example, Taylor series are very helpful in representing nonlinear (read: difficult) func-
tions as linear (read: manageable) functions. One can thus approximate functions by
using lower-order, finite series known as Taylor polynomials. If a = 0, the series is called
a Maclaurin series.
Specifically, a Taylor series of a real or complex function f (x) that is infinitely differentiable
in the neighborhood of point a is:

$$\begin{aligned}
f(x) &= f(a) + \frac{f'(a)}{1!}(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \cdots \\
&= \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}(x - a)^n
\end{aligned}$$

Taylor Approximation: We can often approximate the curvature of a function f (x) at


point a using a 2nd order Taylor polynomial around point a:

$$f(x) = f(a) + \frac{f'(a)}{1!}(x - a) + \frac{f''(a)}{2!}(x - a)^2 + R_2$$
R_2 is the remainder (R for remainder, 2 for the fact that we took two derivatives) and is often treated as negligible, giving us:
$$f(x) \approx f(a) + f'(a)(x - a) + \frac{f''(a)}{2}(x - a)^2$$
The more derivatives that are added, the smaller the remainder R and the more accurate
the approximation. Proofs involving limits guarantee that the remainder converges to 0 as
the order of derivation increases.
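A concrete illustration (not from the original text): build the second-order Taylor polynomial of f(x) = log(x) around a = 1, where f(1) = 0, f'(1) = 1, and f''(1) = −1, and compare it with the function nearby.

```python
import math

a = 1.0
f_a, f1_a, f2_a = 0.0, 1.0, -1.0   # log(1), 1/x at 1, and -1/x**2 at 1

def taylor2(x):
    # second-order Taylor polynomial around a
    return f_a + f1_a * (x - a) + (f2_a / 2) * (x - a)**2

for x in (0.9, 1.0, 1.1, 1.5):
    print(f"x={x:.1f}  log(x)={math.log(x):+.4f}  Taylor={taylor2(x):+.4f}")
# The approximation is excellent near a = 1 and degrades as x moves away.
```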

4.7 The Indefinite Integral


So far, we’ve been interested in finding the derivative f = F ′ of a function F . However,
sometimes we’re interested in exactly the reverse: finding the function F for which f is its
derivative. We refer to F as the antiderivative of f .

Definition 4.2 (Antiderivative). The antiderivative of a function f(x) is a differentiable function F whose derivative is f:
$$F' = f.$$

Another way to describe it is through the inverse formula. Let DF be the derivative of F .
And let DF (x) be the derivative of F evaluated at x. Then the antiderivative is denoted by
D−1 (i.e., the inverse derivative). If DF = f , then F = D−1 f .
This definition bolsters the main takeaway about integrals and derivatives: They are inverses
of each other.

Exercise 4.3 (Antiderivative). Find the antiderivative of the following:


1. f(x) = 1/x^2
2. f(x) = 3e^{3x}

We know from derivatives how to manipulate F to get f . But how do you express the
procedure to manipulate f to get F ? For that, we need a new symbol, which we will call
indefinite integration.

Definition 4.3 (Indefinite Integral). The indefinite integral of f(x) is written
$$\int f(x)\,dx$$
and is equal to the antiderivative of f.



Example 4.7. Draw the function f(x) and its indefinite integral, $\int f(x)\,dx$, for
$$f(x) = x^2 - 4$$

Solution. The indefinite integral of the function f(x) = x^2 − 4 can, for example, be F(x) = (1/3)x^3 − 4x. But it can also be F(x) = (1/3)x^3 − 4x + 1, because the constant 1 disappears when taking the derivative.
Some of these functions are plotted in the bottom panel of Figure 4.4 as dotted lines.
Notice from these examples that while there is only a single derivative for any function,
there are multiple antiderivatives: one for any arbitrary constant c. c just shifts the curve
up or down on the y-axis. If more information is present about the antiderivative — e.g.,
that it passes through a particular point — then we can solve for a specific value of c.

Common Rules of Integration

Some common rules of integrals follow by virtue of being the inverse of a derivative.
1. Constants are allowed to slip out: $\int a f(x)\,dx = a \int f(x)\,dx$
2. Integration of a sum is the sum of integrations: $\int [f(x) + g(x)]\,dx = \int f(x)\,dx + \int g(x)\,dx$
3. Reverse power rule: $\int x^n\,dx = \frac{1}{n+1}x^{n+1} + c$
4. Exponents are still exponents: $\int e^x\,dx = e^x + c$
5. Recall the derivative of log(x) is one over x, and so: $\int \frac{1}{x}\,dx = \log x + c$
[Figure 4.4: The Many Indefinite Integrals of a Function. The top panel plots f(x) = x^2 − 4; the bottom panel plots several antiderivatives, which differ only by a constant shift.]

[Figure 4.5: The Riemann Integral as a Sum of Evaluations. The left panel evaluates f with width = 1 intervals, the right panel with width = 0.1 intervals.]

6. Reverse chain rule: $\int e^{f(x)} f'(x)\,dx = e^{f(x)} + c$
7. More generally: $\int [f(x)]^n f'(x)\,dx = \frac{1}{n+1}[f(x)]^{n+1} + c$
8. Remember the derivative of a log of a function: $\int \frac{f'(x)}{f(x)}\,dx = \log f(x) + c$
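These rules can be double-checked with symbolic software; a short SymPy sketch (illustrative only, and note that SymPy omits the arbitrary constant c):

```python
import sympy as sp

x = sp.symbols("x")

print(sp.integrate(3 * x**2, x))             # reverse power rule: x**3
print(sp.integrate(sp.exp(x), x))            # exponents stay exponents: exp(x)
print(sp.integrate(1 / x, x))                # log rule: log(x)
print(sp.integrate(2 * x / (x**2 + 1), x))   # derivative-of-log rule: log(x**2 + 1)
```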

Example 4.8 (Common Integration). Simplify the following indefinite integrals:

• $\int 3x^2\,dx$
• $\int (2x + 1)\,dx$
• $\int e^x e^{e^x}\,dx$

4.8 The Definite Integral: The Area under the Curve


If there is an indefinite integral, there must be a definite integral. Indeed there is, but the notion of definite integrals comes from a different objective: finding the area under a function. We will find, perhaps remarkably, that the formula we find for this area turns out to be expressible by the antiderivative.
Suppose we want to determine the area A(R) of a region R defined by a curve f (x) and
some interval a ≤ x ≤ b.
One way to calculate the area would be to divide the interval a ≤ x ≤ b into n subintervals of length Δx and then approximate the region with a series of rectangles, where the base of each rectangle is Δx and the height is f(x) at the midpoint of that interval. A(R) would then be approximated by the area of the union of the rectangles, which is given by
$$S(f, \Delta x) = \sum_{i=1}^{n} f(x_i)\Delta x$$

and is called a Riemann sum.


As we decrease the size of the subintervals ∆x, making the rectangles “thinner,” we would
expect our approximation of the area of the region to become closer to the true area. This
allows us to express the area as a limit of a series:

$$A(R) = \lim_{\Delta x \to 0} \sum_{i=1}^{n} f(x_i)\Delta x$$

Figure 4.5 illustrates this. The curve depicted is f(x) = −15(x − 5) + (x − 5)^3 + 50. We want to approximate the area under the curve between the x values of 0 and 10. We can do this in blocks of arbitrary width, where the sum of rectangles (the area of each being its width times f(x) evaluated at the midpoint of the bar) gives the Riemann sum. As the width of the bars Δx becomes smaller, the estimate of A(R) improves.
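A midpoint Riemann sum for the curve in Figure 4.5 can be computed directly; the Python sketch below (illustrative, not part of the original text) shrinks the bar width and watches the estimate settle.

```python
import numpy as np

def f(x):
    return -15 * (x - 5) + (x - 5)**3 + 50   # the curve plotted in Figure 4.5

def riemann_midpoint(f, a, b, width):
    midpoints = np.arange(a + width / 2, b, width)   # midpoints of each bar
    return np.sum(f(midpoints) * width)              # total area of the rectangles

for width in (1.0, 0.1, 0.01):
    print(f"width = {width:5.2f}: area estimate = {riemann_midpoint(f, 0, 10, width):.2f}")
# The estimates approach the exact area of 500 as the bar width shrinks.
```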
This is how we define the “Definite” Integral:

Definition 4.4 (The Definite Integral (Riemann)). If for a given function f the Riemann sum approaches a limit as Δx → 0, then that limit is called the Riemann integral of f from a to b. We express this with the ∫ symbol, and write
$$\int_a^b f(x)\,dx = \lim_{\Delta x \to 0} \sum_{i=1}^{n} f(x_i)\Delta x$$

The definite integral is written
$$\int_a^b f(x)\,dx$$
read as "the definite integral of f from a to b," and it is defined as the area under the curve f(x) from the point x = a to x = b.

The fundamental theorem of calculus shows us that this sum is, in fact, the antiderivative.

Theorem 4.2 (First Fundamental Theorem of Calculus). Let the function f be bounded
on [a, b] and continuous on (a, b). Then, suggestively, use the symbol F (x) to denote the
definite integral from a to x:
$$F(x) = \int_a^x f(t)\,dt, \quad a \le x \le b$$

Then F(x) has a derivative at each point in (a, b) and
$$F'(x) = f(x), \quad a < x < b$$
That is, the definite integral, viewed as a function of its upper limit, is one of the antiderivatives of f.

This is again a long way of saying that differentiation is the inverse of integration. But now, we've covered definite integrals.
The second theorem gives us a simple way of computing a definite integral in terms of the indefinite integral.

Theorem 4.3 (Second Fundamental Theorem of Calculus). Let the function f be bounded
on [a, b] and continuous on (a, b). Let F be any function that is continuous on [a, b] such
that F ′ (x) = f (x) on (a, b). Then
$$\int_a^b f(x)\,dx = F(b) - F(a)$$

So the procedure to calculate a simple definite integral $\int_a^b f(x)\,dx$ is then

1. Find the indefinite integral F(x).
2. Evaluate F(b) − F(a).

Example 4.9 (Definite Integral of a monomial). Solve $\int_1^3 3x^2\,dx$. Let f(x) = 3x^2.

Exercise 4.4. What is the value of $\int_{-2}^{2} e^x e^{e^x}\,dx$?

Common Rules for Definite Integrals

The area-interpretation of the definite integral provides some rules for simplification.
1. There is no area below a point:
$$\int_a^a f(x)\,dx = 0$$
2. Reversing the limits changes the sign of the integral:
$$\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx$$
3. Sums can be separated into their own integrals:
$$\int_a^b [\alpha f(x) + \beta g(x)]\,dx = \alpha \int_a^b f(x)\,dx + \beta \int_a^b g(x)\,dx$$
4. Areas can be combined as long as limits are linked:
$$\int_a^b f(x)\,dx + \int_b^c f(x)\,dx = \int_a^c f(x)\,dx$$

Exercise 4.5 (Definite integral shortcuts). Simplify the following definite integrals.
1. $\int_1^1 3x^2\,dx =$
2. $\int_0^4 (2x + 1)\,dx =$
3. $\int_{-2}^{0} e^x e^{e^x}\,dx + \int_0^2 e^x e^{e^x}\,dx =$

4.9 Integration by Substitution


From the second fundamental theorem of calculus, we know that a quick way to get a definite integral is to first find the indefinite integral, and then just plug in the bounds.
Sometimes the integrand (the thing that we are trying to take an integral of) doesn’t appear
integrable using common rules and antiderivatives. A method one might try is integration
by substitution, which is related to the Chain Rule.
Suppose we want to find the indefinite integral
$$\int g(x)\,dx$$

but g(x) is complex and none of the formulas we have seen so far seem to apply immediately.
The trick is to come up with a new function u(x) such that

g(x) = f [u(x)]u′ (x).

Why does introducing yet another function end up simplifying things? Let's refer to the antiderivative of f as F. Then the chain rule tells us that
$$\frac{d}{dx} F[u(x)] = f[u(x)]u'(x).$$
So, F[u(x)] is the antiderivative of g. We can then write
$$\int g(x)\,dx = \int f[u(x)]u'(x)\,dx = \int \frac{d}{dx} F[u(x)]\,dx = F[u(x)] + c$$


To summarize, the procedure to determine the indefinite integral g(x)dx by the method
of substitution:
1. Identify some part of g(x) that might be simplified by substituting in a single variable
u (which will then be a function of x).
2. Determine if g(x)dx can be reformulated in terms of u and du.
3. Solve the indefinite integral.
4. Substitute back in for x
Substitution can also be used to calculate a definite integral. Using the same procedure as
above,
$$\int_a^b g(x)\,dx = \int_c^d f(u)\,du = F(d) - F(c)$$

where c = u(a) and d = u(b).
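Symbolic software handles the substitution bookkeeping internally. As a brief illustration (a hypothetical example, not one of the booklet's), consider the substitution u = x^2 in a simple definite integral:

```python
import sympy as sp

x, u = sp.symbols("x u")

# Integrand g(x) = 2x*cos(x**2); with u = x**2 we have du = 2x dx,
# so the integral becomes the integral of cos(u) du over u from 0 to 1.
original = sp.integrate(2 * x * sp.cos(x**2), (x, 0, 1))
substituted = sp.integrate(sp.cos(u), (u, 0, 1))

print(original, substituted)   # both print sin(1), as the substitution rule promises
```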

Example 4.10 (Integration by Substitution I). Solve the indefinite integral
$$\int x^2 \sqrt{x+1}\,dx.$$


For the above problem, we could have also used the substitution u = x + 1. Then x = u2 −1
and dx = 2udu. Substituting these in, we get
∫ ∫

x2 x + 1dx = (u2 − 1)2 u2udu

which when expanded is again a polynomial and gives the same result as above.
Another case in which integration by substitution is useful is with a fraction.

Example 4.11 (Integration by Substitution II). Simplify
$$\int_0^1 \frac{5e^{2x}}{(1 + e^{2x})^{1/3}}\,dx.$$

4.10 Integration by Parts


Another useful integration technique is integration by parts, which is related to the
Product Rule of differentiation. The product rule states that
$$\frac{d}{dx}(uv) = u\frac{dv}{dx} + v\frac{du}{dx}$$
Integrating this and rearranging, we get
$$\int u\frac{dv}{dx}\,dx = uv - \int v\frac{du}{dx}\,dx$$

or
$$\int u(x)v'(x)\,dx = u(x)v(x) - \int v(x)u'(x)\,dx$$
More easily remembered with the mnemonic "Ultraviolet Voodoo":
$$\int u\,dv = uv - \int v\,du$$
where du = u'(x)dx and dv = v'(x)dx.


For definite integrals, this is simply
$$\int_a^b u\frac{dv}{dx}\,dx = uv\Big|_a^b - \int_a^b v\frac{du}{dx}\,dx$$

Our goal here is to find expressions for u and dv that, when substituted into the above
equation, yield an expression that’s more easily evaluated.

Example 4.12 (Integration by Parts). Simplify the following integrals. These seemingly obscure forms of integrals come up often when integrating distributions.
$$\int x e^{ax}\,dx$$

Solution. Let u = x and dv/dx = e^{ax}. Then du = dx and v = (1/a)e^{ax}. Substituting this into the integration by parts formula, we obtain
$$\begin{aligned}
\int x e^{ax}\,dx &= uv - \int v\,du \\
&= x\left(\frac{1}{a}e^{ax}\right) - \int \frac{1}{a}e^{ax}\,dx \\
&= \frac{1}{a}x e^{ax} - \frac{1}{a^2}e^{ax} + c
\end{aligned}$$

Exercise 4.6 (Integration by Parts II).
1. Integrate $\int x^n e^{ax}\,dx$
2. Integrate $\int x^3 e^{-x^2}\,dx$

Answers to Examples and Exercises

Exercise 4.1
Solution.
1. f'(x) = 0
2. f'(x) = 1
3. f'(x) = 2x
4. f'(x) = 3x^2
5. f'(x) = −2x^{−3}
6. f'(x) = 14x^6
7. f'(x) = 4x^3 − 3x^2 + 2x − 1
8. f'(x) = 5x^4 + 3x^2 − 2x
9. f'(x) = 6x + (2/3)x^{−2/3}
10. f'(x) = −4x/(x^4 − 2x^2 + 1)

Example 4.3
Solution. For convenience, define f(z) = z^6 and z = g(x) = 3x^2 + 5x − 7. Then y = f[g(x)] and
$$\frac{d}{dx}y = f'(z)g'(x) = 6(3x^2 + 5x - 7)^5 (6x + 5)$$

Example 4.4
Solution.
1. Let u(x) = −3x. Then u'(x) = −3 and f'(x) = −3e^{−3x}.
2. Let u(x) = x^2. Then u'(x) = 2x and f'(x) = 2xe^{x^2}.

Example 4.5
Solution.
1. Let u(x) = x^2 + 9. Then u'(x) = 2x and
$$\frac{dy}{dx} = \frac{u'(x)}{u(x)} = \frac{2x}{x^2 + 9}$$
2. Let u(x) = log x. Then u'(x) = 1/x and dy/dx = 1/(x log x).
3. Use the generalized power rule:
$$\frac{dy}{dx} = \frac{2\log x}{x}$$
4. We know that log e^x = x and that dx/dx = 1, but we can double check. Let u(x) = e^x. Then u'(x) = e^x and dy/dx = u'(x)/u(x) = e^x/e^x = 1.

Example 4.9
Solution. What is F(x)? From the power rule, recognize (d/dx) x^3 = 3x^2, so F(x) = x^3 and
$$\int_1^3 f(x)\,dx = F(x = 3) - F(x = 1) = 3^3 - 1^3 = 26$$

Example 4.10
Solution. The problem here is the √(x + 1) term. However, if the integrand had √x times some polynomial, then we'd be in business. Let's try u = x + 1. Then x = u − 1 and dx = du. Substituting these into the above equation, we get
$$\begin{aligned}
\int x^2\sqrt{x+1}\,dx &= \int (u-1)^2 \sqrt{u}\,du \\
&= \int (u^2 - 2u + 1)u^{1/2}\,du \\
&= \int (u^{5/2} - 2u^{3/2} + u^{1/2})\,du
\end{aligned}$$
We can easily integrate this, since it is just a polynomial. Doing so and substituting u = x + 1 back in, we get
$$\int x^2\sqrt{x+1}\,dx = 2(x+1)^{3/2}\left[\frac{1}{7}(x+1)^2 - \frac{2}{5}(x+1) + \frac{1}{3}\right] + c$$

Example 4.11
Solution. When an expression is raised to a power, it is often helpful to use this expression as the basis for a substitution. So, let u = 1 + e^{2x}. Then du = 2e^{2x} dx and we can set 5e^{2x} dx = 5du/2. Additionally, u = 2 when x = 0 and u = 1 + e^2 when x = 1. Substituting all of this in, we get
$$\begin{aligned}
\int_0^1 \frac{5e^{2x}}{(1 + e^{2x})^{1/3}}\,dx &= \frac{5}{2}\int_2^{1+e^2} \frac{du}{u^{1/3}} \\
&= \frac{5}{2}\int_2^{1+e^2} u^{-1/3}\,du \\
&= \frac{15}{4}\, u^{2/3}\Big|_2^{1+e^2} \\
&\approx 9.53
\end{aligned}$$

Exercise 4.6
1. $\int x^n e^{ax}\,dx$
Solution. As in the first problem, let
$$u = x^n, \quad dv = e^{ax}\,dx$$
Then du = nx^{n−1} dx and v = (1/a)e^{ax}. Substituting these into the integration by parts formula gives
$$\begin{aligned}
\int x^n e^{ax}\,dx &= uv - \int v\,du \\
&= x^n\left(\frac{1}{a}e^{ax}\right) - \int \frac{1}{a}e^{ax}\, n x^{n-1}\,dx \\
&= \frac{1}{a}x^n e^{ax} - \frac{n}{a}\int x^{n-1} e^{ax}\,dx
\end{aligned}$$
Notice that we now have an integral similar to the previous one, but with x^{n−1} instead of x^n. For a given n, we would repeat the integration by parts procedure until the integrand was directly integrable, e.g., when the integral became $\int e^{ax}\,dx$.
2. $\int x^3 e^{-x^2}\,dx$
Solution. We could, as before, choose u = x^3 and dv = e^{−x^2} dx. But we can't then find v, i.e., integrating e^{−x^2} dx isn't possible. Instead, notice that
$$\frac{d}{dx} e^{-x^2} = -2x e^{-x^2},$$
which can be factored out of the original integrand:
$$\int x^3 e^{-x^2}\,dx = \int x^2 \left(x e^{-x^2}\right)\,dx.$$
We can then let u = x^2 and dv = x e^{−x^2} dx. Then du = 2x dx and v = −(1/2)e^{−x^2}. Substituting these in, we have
$$\begin{aligned}
\int x^3 e^{-x^2}\,dx &= uv - \int v\,du \\
&= x^2\left(-\frac{1}{2}e^{-x^2}\right) - \int \left(-\frac{1}{2}e^{-x^2}\right) 2x\,dx \\
&= -\frac{1}{2}x^2 e^{-x^2} + \int x e^{-x^2}\,dx \\
&= -\frac{1}{2}x^2 e^{-x^2} - \frac{1}{2}e^{-x^2} + c
\end{aligned}$$
Chapter 5

Optimization

To optimize, we use derivatives and calculus. Optimization means finding the maximum or minimum of a function, and finding what value of an input gives that extremum. This has obvious uses in engineering. Many tools in the statistical toolkit use optimization. One of the most common ways of estimating a model is through "Maximum Likelihood Estimation", done via optimizing a function (the likelihood).
Optimization also comes up in Economics, Formal Theory, and Political Economy all the time. A go-to model of human behavior is that people optimize a certain utility function. Humans are not pure utility maximizers, of course, but nuanced models of optimization (for example, adding constraints and adding uncertainty) will prove to be quite useful.

Example: Meltzer-Richard
A standard backdrop in comparative political economy, the Meltzer-Richard (1981) model states that redistribution of wealth should be higher in societies where the median income is much smaller than the average income. More to the point, an income distribution where the median is very different from the average is typically one of high inequality. In other words, the Meltzer-Richard model says that highly unequal economies will have more redistribution of wealth. Why is that the case? Here is a simplified example that is not the exact model by Meltzer and Richard1, but adapted from Persson and Tabellini2.
We will set the following things about our model human and model democracy.
• Individuals are indexed by i, and the total population is normalized to unity (“1”)
without loss of generality.
• U (·), u for “utility”, is a function that is concave and increasing, and expresses the
utility gained from public goods. This tells us that its first derivative is positive, and
its second derivative is negative.
1 Allan H. Meltzer and Scott F. Richard. “A Rational Theory of the Size of Government”. Journal of
Political Economy 89:5 (1981), p. 914-927
2 Adapted from Torsten Persson and Guido Tabellini, Political Economics: Explaining Economic Policy.

MIT Press.


• yi is the income of person i


• Wi , w for “welfare”, is the welfare of person i
• ci , c for “consumption”, is the consumption utility of person i
Also, the government is democratically elected and sets the following redistribution output:
• τ , t for “tax”, is a flat tax rate between 0 and 1 that is applied to everyone’s income.
• g, “g” for “goods”, is the amount of public goods that the government provides.
Suppose an individual’s welfare is given by:

Wi = ci + U (g)

The consumption good is the person’s post-tax income.

ci = (1 − τ )yi

Income varies by person. (In the next section we will cover probability; by then we will know that we can express this by saying that y is a random variable with cumulative distribution function F, i.e. y ∼ F.) Every distribution has a mean and a median.
• E(y) is the average income of the society.
• med(y) is the median income of the society.
What will happen in this economy? What will the tax rate be set to? How much public good will be provided?
We've skipped ahead to some formal theory results about democracy, but hopefully these are conceptually intuitive. First, if a democracy is competitive, there is no slack in the government's budget, and all tax revenue becomes a public good. So we can go ahead and set the constraint:
$$g = \sum_i \tau y_i P(y_i) = \tau E(y)$$

We can do this trick because of the "normalized to unity" setting, but this is a general property of the average.
Now given this constraint we can re-write an individual's welfare as
$$\begin{aligned}
W_i &= \left(1 - \frac{g}{E(y)}\right) y_i + U(g) \\
&= \frac{1}{E(y)}\left(E(y) - g\right) y_i + U(g) \\
&= \left(E(y) - g\right)\frac{y_i}{E(y)} + U(g)
\end{aligned}$$

When is the individual’s welfare maximized, as a function of the public good?


$$\frac{d}{dg} W_i = -\frac{y_i}{E(y)} + \frac{d}{dg} U(g)$$
We have dW_i/dg = 0 when (d/dg) U(g) = y_i / E(y), and so after expressing the derivative as U_g = (d/dg) U(g) for simplicity,
$$g_i^\star = U_g^{-1}\left(\frac{y_i}{E(y)}\right)$$

Now recall that because we assumed concavity, U_g is a decreasing function whose value is positive. It can be shown that the inverse of such a function is also decreasing. Thus an individual's preferred level of public goods is determined by a single quantity, the person's income divided by the average income, and it is decreasing in y_i. This is consistent with our intuition that richer people prefer less redistribution.
That was the amount for any given person. The government has to set one value of g,
however. So what will that be? Now we will use another result, the median voter theorem.
This says that under certain general electoral conditions (single-peaked preferences, two
parties, majority rule), the policy winner will be that preferred by the median person in the
population. Because the only thing that determines a person’s preferred level of government
is yi /E(y), we can presume that the median voter, whose income is med(y) will prevail in
their preferred choice of government. Therefore, we wil see
( )
⋆ −1 med(y)
g = Ug
E(y)

What does this say about the level of redistribution we observe in an economy? The higher the average income is relative to the median income (which often, but not always, means more inequality), the more redistribution there should be.

5.1 Maxima and Minima


The first derivative, f ′ (x), quantifies the slope of a function. Therefore, it can be used to
check whether the function f (x) at the point x is increasing or decreasing at x.
1. Increasing: f ′ (x) > 0
2. Decreasing: f ′ (x) < 0
3. Neither increasing nor decreasing: f ′ (x) = 0 i.e. a maximum, minimum, or saddle
point
So for example, f (x) = x2 + 2 and f ′ (x) = 2x

Exercise 5.1 (Plotting a maximum and minimum). Plot f(x) = x^3 + x^2 + 2, plot its derivative, and identify where the derivative is zero. Is there a maximum or minimum?
[Figure 5.1: Maxima and Minima. A function f(x) and its derivative f'(x) plotted side by side.]

The second derivative f ′′ (x) identifies whether the function f (x) at the point x is
1. Concave down: f ′′ (x) < 0
2. Concave up: f ′′ (x) > 0
Maximum (Minimum): x0 is a local maximum (minimum) if f (x0 ) > f (x) (f (x0 ) <
f (x)) for all x within some open interval containing x0 . x0 is a global maximum (mini-
mum) if f (x0 ) > f (x) (f (x0 ) < f (x)) for all x in the domain of f .
Given the function f defined over domain D, all of the following are defined as critical
points:
1. Any interior point of D where f ′ (x) = 0.
2. Any interior point of D where f ′ (x) does not exist.
3. Any endpoint that is in D.
The maxima and minima will be a subset of the critical points.
Second Derivative Test of Maxima/Minima: We can use the second derivative to tell
us whether a point is a maximum or minimum of f (x).
1. Local Maximum: f ′ (x) = 0 and f ′′ (x) < 0
2. Local Minimum: f ′ (x) = 0 and f ′′ (x) > 0
3. Need more info: f ′ (x) = 0 and f ′′ (x) = 0
Global Maxima and Minima Sometimes no global max or min exists — e.g., f (x) not
bounded above or below. However, there are three situations where we can fairly easily
identify global max or min.
1. Functions with only one critical point. If x0 is a local max or min of f and it is
the only critical point, then it is the global max or min.
2. Globally concave up or concave down functions. If f ′′ (x) is never zero, then
there is at most one critical point. That critical point is a global maximum if f ′′ < 0
and a global minimum if f ′′ > 0.
3. Functions over closed and bounded intervals must have both a global maximum
and a global minimum.

Example 5.1 (Maxima and Minima by drawing). Find any critical points and identify
whether they are a max, min, or saddle point:
1. f (x) = x2 + 2
2. f (x) = x3 + 2
3. f (x) = |x2 − 1|, x ∈ [−2, 2]

5.2 Concavity of a Function


Concavity helps identify the curvature of a function, f (x), in 2 dimensional space.

Definition 5.1 (Concave Function). A function f is strictly concave over the set S if
∀x1 , x2 ∈ S and ∀a ∈ (0, 1),
f (ax1 + (1 − a)x2 ) > af (x1 ) + (1 − a)f (x2 )

Any line connecting two points on a concave function will lie below the function.

[Figure: a concave function (left panel) and a convex function (right panel).]

Definition 5.2 (Convex Function). Convex: A function f is strictly convex over the set S
if ∀x1 , x2 ∈ S and ∀a ∈ (0, 1),
f (ax1 + (1 − a)x2 ) < af (x1 ) + (1 − a)f (x2 )

Any line connecting two points on a convex function will lie above the function.

Sometimes, concavity and convexity are too strict a requirement. For most purposes of getting solutions, what we call quasi-concavity is enough.

Definition 5.3 (Quasiconcave Function). A function f is quasiconcave over the set S if


∀x1 , x2 ∈ S and ∀a ∈ (0, 1),
f (ax1 + (1 − a)x2 ) ≥ min(f (x1 ), f (x2 ))

No matter what two points you select, the lowest valued point will always be an end point.

Definition 5.4 (Quasiconvex). A function f is quasiconvex over the set S if ∀x1 , x2 ∈ S


and ∀a ∈ (0, 1),
f (ax1 + (1 − a)x2 ) ≤ max(f (x1 ), f (x2 ))
No matter what two points you select, the highest valued point will always be an end point.

Second Derivative Test of Concavity: The second derivative can be used to understand
concavity.
If
f ′′ (x) < 0 ⇒ Concave
f ′′ (x) > 0 ⇒ Convex

Quadratic Forms

A quadratic form is a shorthand way to summarize a function. This is important for finding concavity because quadratic forms:
1. Approximate local curvature around a point, e.g., to identify max vs. min vs. saddle point.
2. Are simple to express even in n dimensions.
3. Have a matrix representation.
Quadratic Form: A polynomial where each term is a monomial of degree 2 in any number of variables:

One variable: Q(x_1) = a_{11} x_1^2
Two variables: Q(x_1, x_2) = a_{11} x_1^2 + a_{12} x_1 x_2 + a_{22} x_2^2
N variables: $Q(x_1, \ldots, x_n) = \sum_{i \le j} a_{ij} x_i x_j$

which can be written in matrix terms:

One variable:
$$Q(x) = x_1^\top a_{11} x_1$$
N variables:
$$Q(\mathbf{x}) = \begin{pmatrix} x_1 & x_2 & \cdots & x_n \end{pmatrix}
\begin{pmatrix}
a_{11} & \frac{1}{2}a_{12} & \cdots & \frac{1}{2}a_{1n} \\
\frac{1}{2}a_{12} & a_{22} & \cdots & \frac{1}{2}a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{1}{2}a_{1n} & \frac{1}{2}a_{2n} & \cdots & a_{nn}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
= \mathbf{x}^\top \mathbf{A}\, \mathbf{x}$$
For example, the quadratic form on R^2:
$$Q(x_1, x_2) = \begin{pmatrix} x_1 & x_2 \end{pmatrix}
\begin{pmatrix} a_{11} & \frac{1}{2}a_{12} \\ \frac{1}{2}a_{12} & a_{22} \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= a_{11}x_1^2 + a_{12}x_1 x_2 + a_{22}x_2^2$$

Definiteness of Quadratic Forms

When the function f(x) has more than two inputs, determining whether it has maxima and minima (remember, functions may have many inputs but they have only one output)
is a bit more tedious. Definiteness helps identify the curvature of a function, Q(x), in n
dimensional space.
Definiteness: By definition, a quadratic form always takes on the value of zero when x = 0,
Q(x) = 0 at x = 0. The definiteness of the matrix A is determined by whether the quadratic
form Q(x) = x⊤ Ax is greater than zero, less than zero, or sometimes both over all x ≠ 0.

5.3 FOC and SOC

We can see from a graphical representation that if a point is a local maximum or minimum, it must meet certain conditions regarding its derivative. These are so commonly used that we refer to them as "First Order Conditions" (FOCs) and "Second Order Conditions" (SOCs) in the economic tradition.

First Order Conditions

When we examined functions of one variable x, we found critical points by taking the first
derivative, setting it to zero, and solving for x. For functions of n variables, the critical
points are found in much the same way, except now we set the partial derivatives equal to
zero. Note: We will only consider critical points on the interior of a function’s domain.
So far, we have taken the derivative with respect to one variable at a time. When we take the derivative separately with respect to every element of x and then express the result as a vector, we use the terms gradient and Hessian.

Definition 5.5 (Gradient). Given a function f(x) in n variables, the gradient ∇f(x) (∇ is the Greek letter nabla) is a column vector, where the ith element is the partial derivative of f(x) with respect to x_i:
$$\nabla f(\mathbf{x}) = \begin{pmatrix} \frac{\partial f(\mathbf{x})}{\partial x_1} \\ \frac{\partial f(\mathbf{x})}{\partial x_2} \\ \vdots \\ \frac{\partial f(\mathbf{x})}{\partial x_n} \end{pmatrix}$$

A point that meets the FOC is called a "critical point", even before we know whether it is a maximum or a minimum.

Definition 5.6 (Critical Point). x* is a critical point if and only if ∇f(x*) = 0. If each partial derivative of f(x), evaluated at x*, is 0, then x* is a critical point. To solve for x*, find the gradient, set each element equal to 0, and solve the system of equations.
$$\mathbf{x}^* = \begin{pmatrix} x_1^* \\ x_2^* \\ \vdots \\ x_n^* \end{pmatrix}$$

Example 5.2. Given the function f(x) = (x_1 − 1)^2 + x_2^2 + 1, find (1) the gradient and (2) the critical point of f(x).

Solution. Gradient:
$$\nabla f(\mathbf{x}) = \begin{pmatrix} \frac{\partial f(\mathbf{x})}{\partial x_1} \\ \frac{\partial f(\mathbf{x})}{\partial x_2} \end{pmatrix} = \begin{pmatrix} 2(x_1 - 1) \\ 2x_2 \end{pmatrix}$$
Critical point:
$$\frac{\partial f(\mathbf{x})}{\partial x_1} = 2(x_1 - 1) = 0 \;\Rightarrow\; x_1^* = 1, \qquad \frac{\partial f(\mathbf{x})}{\partial x_2} = 2x_2 = 0 \;\Rightarrow\; x_2^* = 0$$
So
$$\mathbf{x}^* = (1, 0)$$

Second Order Conditions

When we found a critical point for a function of one variable, we used the second derivative
as an indicator of the curvature at the point in order to determine whether the point was a
min, max, or saddle (second derivative test of concavity). For functions of n variables, we
use second order partial derivatives as an indicator of curvature.

Definition 5.7 (Hessian). Given a function f (x) in n variables, the hessian H(x) is an
n × n matrix, where the (i, j)th element is the second order partial derivative of f (x) with
respect to xi and xj :
$$H(\mathbf{x}) = \begin{pmatrix}
\frac{\partial^2 f(\mathbf{x})}{\partial x_1^2} & \frac{\partial^2 f(\mathbf{x})}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f(\mathbf{x})}{\partial x_1 \partial x_n} \\
\frac{\partial^2 f(\mathbf{x})}{\partial x_2 \partial x_1} & \frac{\partial^2 f(\mathbf{x})}{\partial x_2^2} & \cdots & \frac{\partial^2 f(\mathbf{x})}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f(\mathbf{x})}{\partial x_n \partial x_1} & \frac{\partial^2 f(\mathbf{x})}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f(\mathbf{x})}{\partial x_n^2}
\end{pmatrix}$$
Note that the Hessian will be a symmetric matrix because $\frac{\partial^2 f(\mathbf{x})}{\partial x_1 \partial x_2} = \frac{\partial^2 f(\mathbf{x})}{\partial x_2 \partial x_1}$.

Also note that given that f (x) is of quadratic form, each element of the hessian will be a
constant.
These definitions will be employed when we determine the Second Order Conditions of
a function:
Given a function f (x) and a point x∗ such that ∇f (x∗ ) = 0,
1. Hessian is Positive Definite =⇒ Strict Local Min
2. Hessian is Positive Semidefinite ∀x ∈ B(x*, ϵ) =⇒ Local Min
3. Hessian is Negative Definite =⇒ Strict Local Max
4. Hessian is Negative Semidefinite ∀x ∈ B(x*, ϵ) =⇒ Local Max
5. Hessian is Indefinite =⇒ Saddle Point

Example 5.3 (Max and min with two dimensions). We found that the only critical point of f(x) = (x_1 − 1)^2 + x_2^2 + 1 is at x* = (1, 0). Is it a min, max, or saddle point?

Solution. The Hessian is
$$H(\mathbf{x}) = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$$
The leading principal minors of the Hessian are M_1 = 2 and M_2 = 4. Now we consider definiteness: since both leading principal minors are positive, the Hessian is positive definite.
Maximum, minimum, or saddle point? Since the Hessian is positive definite and the gradient equals 0, x* = (1, 0) is a strict local minimum.
Note: An alternate check of definiteness is to ask whether x^T H(x*) x is positive (or negative) for all x ≠ 0:
$$\mathbf{x}^\top H(\mathbf{x}^*)\,\mathbf{x} = \begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = 2x_1^2 + 2x_2^2$$
For any x ≠ 0, 2(x_1^2 + x_2^2) > 0, so the Hessian is positive definite and x* is a strict local minimum.
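The same steps can be scripted. A SymPy sketch (illustrative, not part of the original text) reproducing the gradient, critical point, and Hessian of Examples 5.2 and 5.3:

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2")
f = (x1 - 1)**2 + x2**2 + 1

gradient = [sp.diff(f, v) for v in (x1, x2)]
critical = sp.solve(gradient, (x1, x2), dict=True)
hessian = sp.hessian(f, (x1, x2))

print(gradient)                        # [2*x1 - 2, 2*x2]
print(critical)                        # [{x1: 1, x2: 0}]
print(hessian)                         # Matrix([[2, 0], [0, 2]])
print(hessian.is_positive_definite)    # True, so (1, 0) is a strict local minimum
```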

Definiteness and Concavity

Although definiteness helps us to understand the curvature of an n-dimensional function, it


does not necessarily tell us whether the function is globally concave or convex.
We need to know whether a function is globally concave or convex to determine whether a
critical point is a global min or max. We can use the definiteness of the Hessian to determine
whether a function is globally concave or convex:
1. Hessian is Positive Semidefinite ∀x =⇒ Globally Convex
2. Hessian is Negative Semidefinite ∀x =⇒ Globally Concave
Notice that the definiteness conditions must be satisfied over the entire domain.

5.4 Global Maxima and Minima


Global Max/Min Conditions: Given a function f (x) and a point x∗ such that ∇f (x∗ ) =
0,
1. f (x) Globally Convex =⇒ Global Min
2. f (x) Globally Concave =⇒ Global Max
Note that showing that H(x∗ ) is negative semidefinite is not enough to guarantee x∗ is a
local max. However, showing that H(x) is negative semidefinite for all x guarantees that
x∗ is a global max. (The same goes for positive semidefinite and minima.)
Example: Take f1 (x) = x4 and f2 (x) = −x4 . Both have x = 0 as a critical point. Unfortu-
nately, f1′′ (0) = 0 and f2′′ (0) = 0, so we can’t tell whether x = 0 is a min or max for either.
However, f1′′ (x) = 12x2 and f2′′ (x) = −12x2 . For all x, f1′′ (x) ≥ 0 and f2′′ (x) ≤ 0 — i.e.,
f1 (x) is globally convex and f2 (x) is globally concave. So x = 0 is a global min of f1 (x) and
a global max of f2 (x).

Exercise 5.2. Given f(x) = x_1^3 − x_2^3 + 9x_1x_2, find any maxima or minima.

1. First order conditions.


(a) Gradient ∇f (x) =

(b) Critical Points x∗ =



2. Second order conditions.


(a) Hessian H(x) =

(b) Hessian H(x1∗ ) =

(c) Leading principal minors of H(x1∗ ) =

(d) Definiteness of H(x1∗ )?

(e) Maxima, Minima, or Saddle Point for x1∗ ?

(f) Hessian H(x2∗ ) =

(g) Leading principal minors of H(x2∗ ) =

(h) Definiteness of H(x2∗ )?

(i) Maxima, Minima, or Saddle Point for x2∗ ?

3. Global concavity/convexity.
(a) Is f(x) globally concave/convex?

(b) Are any x∗ global minima or maxima?



Figure 5.2: A typical Utility Function with a Budget Constraint

5.5 Constrained Optimization

We have already looked at optimizing a function in one or more dimensions over the whole
domain of the function. Often, however, we want to find the maximum or minimum of a
function over some restricted part of its domain.
ex: Maximizing utility subject to a budget constraint
Types of Constraints: For a function f (x1 , . . . , xn ), there are two types of constraints
that can be imposed:
1. Equality constraints: constraints of the form c(x1 , . . . , xn ) = r. Budget constraints
are the classic example of equality constraints in social science.
2. Inequality constraints: constraints of the form c(x1 , . . . , xn ) ≤ r. These might arise
from non-negativity constraints or other threshold effects.
In any constrained optimization problem, the constrained maximum will always be less
than or equal to the unconstrained maximum. If the constrained maximum is less than the
unconstrained maximum, then the constraint is binding. Essentially, this means that you
can treat your constraint as an equality constraint rather than an inequality constraint.
For example, the budget constraint binds when you spend your entire budget. This generally
happens because we believe that utility is strictly increasing in consumption, i.e. you always

want more so you spend everything you have.


Any number of constraints can be placed on an optimization problem. When working with
multiple constraints, always make sure that the set of constraints are not pathological; it
must be possible for all of the constraints to be satisfied simultaneously.
Set-up for Constrained Optimization:

$$\max_{x_1, x_2} f(x_1, x_2) \quad \text{s.t.} \quad c(x_1, x_2)$$
$$\min_{x_1, x_2} f(x_1, x_2) \quad \text{s.t.} \quad c(x_1, x_2)$$

This tells us to maximize/minimize our function, f (x1 , x2 ), with respect to the choice vari-
ables, x1 , x2 , subject to the constraint.
Example:
$$\max_{x_1, x_2} f(x_1, x_2) = -(x_1^2 + 2x_2^2) \quad \text{s.t.} \quad x_1 + x_2 = 4$$

It is easy to see that the unconstrained maximum occurs at (x1 , x2 ) = (0, 0), but that does
not satisfy the constraint. How should we proceed?

Equality Constraints

Equality constraints are the easiest to deal with because we know that the maximum or
minimum has to lie on the (intersection of the) constraint(s).
The trick is to change the problem from a constrained optimization problem in n variables
to an unconstrained optimization problem in n + k variables, adding one variable for each
equality constraint. We do this using a Lagrange multiplier.
Lagrangian function: The Lagrangian function allows us to combine the function we
want to optimize and the constraint function into a single function. Once we have this
single function, we can proceed as if this were an unconstrained optimization problem.
For each constraint, we must include a Lagrange multiplier (λi ) as an additional variable
in the analysis. These terms are the link between the constraint and the Lagrangian function.
Given a two dimensional set-up:

max_{x1,x2} / min_{x1,x2} f(x1, x2)   s.t.   c(x1, x2) = a

We define the Lagrangian function L(x1 , x2 , λ1 ) as follows:

L(x1 , x2 , λ1 ) = f (x1 , x2 ) − λ1 (c(x1 , x2 ) − a)

More generally, in n dimensions:


L(x1, . . . , xn, λ1, . . . , λk) = f(x1, . . . , xn) − ∑_{i=1}^{k} λi (ci(x1, . . . , xn) − ri)

Getting the sign right: Note that above we subtract the lagrangian term and we subtract
the constraint constant from the constraint function. Occasionally, you may see the following
alternative form of the Lagrangian, which is equivalent:


L(x1, . . . , xn, λ1, . . . , λk) = f(x1, . . . , xn) + ∑_{i=1}^{k} λi (ri − ci(x1, . . . , xn))

Here we add the lagrangian term and we subtract the constraining function from the con-
straint constant.
Using the Lagrangian to Find the Critical Points: To find the critical points, we
take the partial derivatives of the Lagrangian function, L(x1, . . . , xn, λ1, . . . , λk), with respect to
each of its variables (all choice variables x and all lagrangian multipliers λ). At a critical
point, each of these partial derivatives must be equal to zero, so we obtain a system of n + k
equations in n + k unknowns:

∂L/∂x1 = ∂f/∂x1 − ∑_{i=1}^{k} λi ∂ci/∂x1 = 0
⋮
∂L/∂xn = ∂f/∂xn − ∑_{i=1}^{k} λi ∂ci/∂xn = 0
∂L/∂λ1 = c1(x1, . . . , xn) − r1 = 0
⋮
∂L/∂λk = ck(x1, . . . , xn) − rk = 0

We can then solve this system of equations, because there are n + k equations and n + k
unknowns, to calculate the critical point (x∗1 , . . . , x∗n , λ∗1 , . . . , λ∗k ).
Second-order Conditions: There may be more than one critical point, so we need to verify that the critical point we find is a maxi-
mum/minimum. Similar to unconstrained optimization, we can do this by checking the
second-order conditions.

Example 5.4 (Constrained optimization with two goods and a budget constraint). Find
the constrained optimization of

max_{x1,x2} f(x) = −(x1² + 2x2²)   s.t.   x1 + x2 = 4

Solution. 1. Begin by writing the Lagrangian:

L(x1, x2, λ) = −(x1² + 2x2²) − λ(x1 + x2 − 4)



2. Take the partial derivatives and set equal to zero:

∂L/∂x1 = −2x1 − λ = 0
∂L/∂x2 = −4x2 − λ = 0
∂L/∂λ = −(x1 + x2 − 4) = 0

3. Solve the system of equations: Using the first two partials, we see that λ = −2x1 and λ = −4x2. Setting these equal, we see that x1 = 2x2. Using the third partial and the above equality, 2x2 + x2 = 4, from which we get

x∗2 = 4/3,  x∗1 = 8/3,  λ = −16/3

4. Therefore, the only critical point is x∗1 = 8/3 and x∗2 = 4/3.

5. This gives f(8/3, 4/3) = −96/9, which is less than the unconstrained optimum f(0, 0) = 0.

Notice that when we take the partial derivative of L with respect to the Lagrangian multiplier
and set it equal to 0, we return exactly our constraint! This is why signs matter.
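Because the first-order conditions in this example are linear in (x1, x2, λ), we can also check the algebra numerically. A minimal sketch in R (the coefficient matrix below simply encodes the three first-order conditions; this is a verification, not part of the original derivation):

# FOCs: -2*x1 - lambda = 0;  -4*x2 - lambda = 0;  x1 + x2 = 4
A <- rbind(c(-2,  0, -1),
           c( 0, -4, -1),
           c( 1,  1,  0))
b <- c(0, 0, 4)
solve(A, b)   # x1 = 8/3, x2 = 4/3, lambda = -16/3, matching the result above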

5.6 Inequality Constraints

Inequality constraints define the boundary of a region over which we seek to optimize the
function. This makes inequality constraints more challenging because we do not know if
the maximum/minimum lies along one of the constraints (the constraint binds) or in the
interior of the region.
We must introduce more variables in order to turn the problem into an unconstrained
optimization.
Slack: For each inequality constraint ci(x1, . . . , xn) ≤ ai, we define a slack variable si² chosen so that the constraint holds with equality: ci(x1, . . . , xn) = ai − si². These slack variables capture how close the constraint comes to binding. We use s² rather than s to ensure that the slack is non-negative.
Slack is just a way to transform our constraints.
Given a two-dimensional set-up and these edited constraints:

max_{x1,x2} / min_{x1,x2} f(x1, x2)   s.t.   c(x1, x2) ≤ a1

Adding in Slack:

max_{x1,x2} / min_{x1,x2} f(x1, x2)   s.t.   c(x1, x2) = a1 − s1²

We define the Lagrangian function L(x1 , x2 , λ1 , s1 ) as follows:

L(x1, x2, λ1, s1) = f(x1, x2) − λ1(c(x1, x2) + s1² − a1)

More generally, in n dimensions:


L(x1, . . . , xn, λ1, . . . , λk, s1, . . . , sk) = f(x1, . . . , xn) − ∑_{i=1}^{k} λi (ci(x1, . . . , xn) + si² − ai)

Finding the Critical Points: To find the critical points, we take the partial derivatives
of the lagrangian function, L(x1 , . . . , xn , λ1 , . . . , λk , s1 , . . . , sk ), with respect to each of its
variables (all choice variables x, all lagrangian multipliers λ, and all slack variables s). At a
critical point, each of these partial derivatives must be equal to zero, so we obtain a system
of n + 2k equations in n + 2k unknowns:

∂L/∂x1 = ∂f/∂x1 − ∑_{i=1}^{k} λi ∂ci/∂x1 = 0
⋮
∂L/∂xn = ∂f/∂xn − ∑_{i=1}^{k} λi ∂ci/∂xn = 0
∂L/∂λ1 = c1(x1, . . . , xn) + s1² − a1 = 0
⋮
∂L/∂λk = ck(x1, . . . , xn) + sk² − ak = 0
∂L/∂s1 = −2s1λ1 = 0
⋮
∂L/∂sk = −2skλk = 0

Complementary slackness conditions: The last set of first order conditions of the form
2si λi = 0 (the partials taken with respect to the slack variables) are known as complementary
slackness conditions. These conditions can be satisfied one of three ways:
1. λi = 0 and si ̸= 0: This implies that the slack is positive and thus the constraint does
not bind.
2. λi ̸= 0 and si = 0: This implies that there is no slack in the constraint and the
constraint does bind.
3. λi = 0 and si = 0: In this case, there is no slack but the constraint binds trivially,
without changing the optimum.

Example: Find the critical points for the following constrained optimization:

max_{x1,x2} f(x) = −(x1² + 2x2²)   s.t.   x1 + x2 ≤ 4

1. Rewrite with the slack variables:

max_{x1,x2} f(x) = −(x1² + 2x2²)   s.t.   x1 + x2 = 4 − s1²

2. Write the Lagrangian:

L(x1, x2, λ1, s1) = −(x1² + 2x2²) − λ1(x1 + x2 + s1² − 4)

3. Take the partial derivatives and set equal to 0:

∂L/∂x1 = −2x1 − λ1 = 0
∂L/∂x2 = −4x2 − λ1 = 0
∂L/∂λ1 = −(x1 + x2 + s1² − 4) = 0
∂L/∂s1 = −2s1λ1 = 0

4. Consider all ways that the complementary slackness conditions are solved:
Hypothesis          s1    λ1      x1     x2     f(x1, x2)
s1 = 0, λ1 = 0      No solution
s1 ≠ 0, λ1 = 0      2     0       0      0      0
s1 = 0, λ1 ≠ 0      0     −16/3   8/3    4/3    −32/3
s1 ≠ 0, λ1 ≠ 0      No solution

This shows that there are two critical points: (0, 0) and (8/3, 4/3).
5. Find maximum: Looking at the values of f (x1 , x2 ) at the critical points, we see that
f (x1 , x2 ) is maximized at x∗1 = 0 and x∗2 = 0.
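As a sanity check (not part of the original derivation), one can confirm this numerically with base R's constrOptim(), which handles linear inequality constraints of the form ui %*% x − ci ≥ 0; here x1 + x2 ≤ 4 becomes −x1 − x2 ≥ −4. A hedged sketch:

# Maximize f by minimizing -f, subject to -x1 - x2 >= -4
neg_f <- function(x) x[1]^2 + 2 * x[2]^2          # this is -f(x1, x2)
res <- constrOptim(theta = c(1, 1), f = neg_f, grad = NULL,
                   ui = rbind(c(-1, -1)), ci = -4)
res$par   # numerically close to (0, 0), matching the analytical maximum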

Exercise 5.3. Find the critical points for the following constrained optimization:

max_{x1,x2} f(x) = −(x1² + 2x2²)   s.t.   x1 + x2 ≤ 4,  x1 ≥ 0,  x2 ≥ 0

1. Rewrite with the slack variables:



2. Write the Lagrangian:

3. Take the partial derivatives and set equal to zero:

4. Consider all ways that the complementary slackness conditions are solved:
Hypothesis                  s1  s2  s3  λ1  λ2  λ3  x1  x2  f(x1, x2)
s1 = s2 = s3 = 0
s1 ≠ 0, s2 = s3 = 0
s2 ≠ 0, s1 = s3 = 0
s3 ≠ 0, s1 = s2 = 0
s1 ≠ 0, s2 ≠ 0, s3 = 0
s1 ≠ 0, s3 ≠ 0, s2 = 0
s2 ≠ 0, s3 ≠ 0, s1 = 0
s1 ≠ 0, s2 ≠ 0, s3 ≠ 0

5. Find maximum:

5.7 Kuhn-Tucker Conditions

As you can see, this can be a pain. When dealing explicitly with non-negativity constraints,
this process is simplified by using the Kuhn-Tucker method.
Because the problem of maximizing a function subject to inequality and non-negativity con-
straints arises frequently in economics, the Kuhn-Tucker conditions provide a method
that often makes it easier to both calculate the critical points and identify points that are
(local) maxima.
Given a two-dimensional set-up:

max_{x1,x2} / min_{x1,x2} f(x1, x2)   s.t.   c(x1, x2) ≤ a1,  x1 ≥ 0,  x2 ≥ 0

We define the Lagrangian function L(x1 , x2 , λ1 ) the same as if we did not have the non-
negativity constraints:

L(x1, x2, λ1) = f(x1, x2) − λ1(c(x1, x2) − a1)

More generally, in n dimensions:


L(x1, . . . , xn, λ1, . . . , λk) = f(x1, . . . , xn) − ∑_{i=1}^{k} λi (ci(x1, . . . , xn) − ai)

Kuhn-Tucker and Complementary Slackness Conditions: To find the critical points,


we first calculate the Kuhn-Tucker conditions by taking the partial derivatives of the Lagrangian function, L(x1, . . . , xn, λ1, . . . , λk), with respect to each of its variables (all choice variables x and all Lagrange multipliers λ), and we calculate the complementary slackness conditions by multiplying each partial derivative by its respective variable; we also include non-negativity conditions for all variables (choice variables x and Lagrange multipliers λ).
Kuhn-Tucker Conditions

∂L/∂x1 ≤ 0, . . . , ∂L/∂xn ≤ 0
∂L/∂λ1 ≥ 0, . . . , ∂L/∂λm ≥ 0

Complementary Slackness Conditions

x1 ∂L/∂x1 = 0, . . . , xn ∂L/∂xn = 0
λ1 ∂L/∂λ1 = 0, . . . , λm ∂L/∂λm = 0

Non-negativity Conditions

x1 ≥ 0, . . . , xn ≥ 0
λ1 ≥ 0, . . . , λm ≥ 0

Note that some of these conditions are set equal to 0, while others are set as inequalities!
Note also that to minimize the function f (x1 , . . . , xn ), the simplest thing to do is maximize
the function −f (x1 , . . . , xn ); all of the conditions remain the same after reformulating as a
maximization problem.
There are additional assumptions (notably, f(x) is quasi-concave and the constraints are
convex) that are sufficient to ensure that a point satisfying the Kuhn-Tucker conditions is a
global max; if these assumptions do not hold, you may have to check more than one point.
Finding the Critical Points with Kuhn-Tucker Conditions: Given the above condi-
tions, to find the critical points we solve the above system of equations. To do so, we must
check all border and interior solutions to see if they satisfy the above conditions.
In a two-dimensional set-up, this means we must check the following cases:
1. x1 = 0, x2 =0 Border Solution
2. x1 = 0, x2 ̸= 0 Border Solution
3. x1 ̸= 0, x2 =0 Border Solution
4. x1 ̸= 0, x2 ̸= 0 Interior Solution

Example 5.5 (Kuhn-Tucker with two variables). Solve the following optimization problem
with inequality constraints
max_{x1,x2} f(x) = −(x1² + 2x2²)

s.t.  x1 + x2 ≤ 4,  x1 ≥ 0,  x2 ≥ 0

1. Write the Lagrangian:


L(x1, x2, λ) = −(x1² + 2x2²) − λ(x1 + x2 − 4)

2. Find the First Order Conditions:


Kuhn-Tucker Conditions

∂L/∂x1 = −2x1 − λ ≤ 0
∂L/∂x2 = −4x2 − λ ≤ 0
∂L/∂λ = −(x1 + x2 − 4) ≥ 0

Complementary Slackness Conditions

x1 ∂L/∂x1 = x1(−2x1 − λ) = 0
x2 ∂L/∂x2 = x2(−4x2 − λ) = 0
λ ∂L/∂λ = −λ(x1 + x2 − 4) = 0

Non-negativity Conditions

x1 ≥ 0
x2 ≥ 0
λ≥0
3. Consider all border and interior cases:
Hypothesis           λ        x1     x2     f(x1, x2)
x1 = 0, x2 = 0       0        0      0      0
x1 = 0, x2 ≠ 0       −16      0      4      −32
x1 ≠ 0, x2 = 0       −8       4      0      −16
x1 ≠ 0, x2 ≠ 0       −16/3    8/3    4/3    −32/3

4. Find Maximum: Three of the critical points violate the requirement that λ ≥ 0, so
the point (0, 0, 0) is the maximum.

Exercise 5.4 (Kuhn-Tucker with logs). Solve the constrained optimization problem,

max_{x1,x2} f(x) = (1/3) log(x1 + 1) + (2/3) log(x2 + 1)   s.t.   x1 + 2x2 ≤ 4,  x1 ≥ 0,  x2 ≥ 0

1. Write the Lagrangian:

2. Find the First Order Conditions:


Kuhn-Tucker Conditions

Complementary Slackness Conditions

Non-negativity Conditions

3. Consider all border and interior cases:


Hypothesis λ x1 x2 f (x1 , x2 )
x1 = 0, x2 = 0
x1 = 0, x2 ̸= 0
x1 ̸= 0, x2 = 0
x1 ̸= 0, x2 ̸= 0

4. Find Maximum:

5.8 Applications of Quadratic Forms


Curvature and The Taylor Polynomial as a Quadratic Form: The Hessian is used in
a Taylor polynomial approximation to f(x) and provides information about the curvature of f(x) at x — which, for example, tells us whether a critical point x∗ is a min, max, or saddle point.
1. The second order Taylor polynomial about the critical point x∗ is

   f(x∗ + h) = f(x∗) + ∇f(x∗)h + (1/2) h⊤H(x∗)h + R(h)

2. Since we're looking at a critical point, ∇f(x∗) = 0; and for small h, R(h) is negligible. Rearranging, we get

   f(x∗ + h) − f(x∗) ≈ (1/2) h⊤H(x∗)h

3. The right-hand side here is a quadratic form, and we can determine the definiteness of H(x∗).
Chapter 6

Probability Theory

Probability and Inferences are mirror images of each other, and both are integral to social
science. Probability quantifies uncertainty, which is important because many things in the
social world are at first uncertain. Inference is then the study of how to learn about facts
you don’t observe from facts you do observe.

6.1 Counting rules

Fundamental Theorem of Counting: If an object has j different characteristics that are independent of each other, and each characteristic i has ni ways of being expressed, then there are ∏_{i=1}^{j} ni possible unique objects.
Example: Cards can be either red or black and can take on any of 13 values.
j=
ncolor =
nnumber =
Number of Outcomes =
We often need to count the number of ways to choose a subset from some set of possibilities.
The number of outcomes depends on two characteristics of the process: does the order
matter and is replacement allowed?
Sampling Table: If there are n objects which are numbered 1 to n and we select k < n of
them, how many different outcomes are possible?
If the order in which a given object is selected matters, selecting 4 numbered objects in the
following order (1, 3, 7, 2) and selecting the same four objects but in a different order such
as (7, 2, 1, 3) will be counted as different outcomes.
If replacement is allowed, there are always the same n objects to select from. However, if
replacement is not allowed, there is always one less option than the previous round when


making a selection. For example, if replacement is not allowed and I am selecting 3 elements
from the following set {1, 2, 3, 4, 5, 6}, I will have 6 options at first, 5 options as I make
my second selection, and 4 options as I make my third.
So in counting how many different outcomes are possible, if order matters AND we are sampling with replacement, the number of different outcomes is n^k.

If order matters AND we are sampling without replacement, the number of different outcomes is n(n − 1)(n − 2) · · · (n − k + 1) = n!/(n − k)!.

If order doesn't matter AND we are sampling without replacement, the number of different outcomes is (n choose k) = n!/((n − k)! k!).

The expression (n choose k) is read as "n choose k" and denotes n!/((n − k)! k!). Also, note that 0! = 1.

Example 6.1 (Counting). There are five balls numbered from 1 through 5 in a jar. Three
balls are chosen. How many possible choices are there?
1. Ordered, with replacement =
2. Ordered, without replacement =
3. Unordered, without replacement =

Exercise 6.1 (Counting). Four cards are selected from a deck of 52 cards. Once a card
has been drawn, it is not reshuffled back into the deck. Moreover, we care only about the
complete hand that we get (i.e. we care about the set of selected cards, not the sequence in
which it was drawn). How many possible outcomes are there?

6.2 Sets
Set : A set is any well defined collection of elements. If x is an element of S, x ∈ S.
Sample Space (S): A set or collection of all possible outcomes from some process. Out-
comes in the set can be discrete elements (countable) or points along a continuous interval
(uncountable).
Examples:
1. Discrete: the numbers on a die, whether a vote cast is republican or democrat.
2. Continuous: GNP, arms spending, age.
Event: Any collection of possible outcomes of an experiment. Any subset of the full set of
possibilities, including the full set itself. Event A ⊂ S.
Empty Set: a set with no elements. S = {}. It is denoted by the symbol ∅.
Set operations:
1. Union: The union of two sets A and B, A∪ B, is the set containing all of the elements
in A or B.

A1 ∪ A2 ∪ · · · ∪ An = ⋃_{i=1}^{n} Ai

2. Intersection: The intersection of sets A and B, A ∩ B, is the set containing all of


the elements in both A and B.

A1 ∩ A2 ∩ · · · ∩ An = ⋂_{i=1}^{n} Ai

3. Complement: If set A is a subset of S, then the complement of A, denoted AC , is


the set containing all of the elements in S that are not in A.
Properties of set operations:
• Commutative: A ∪ B = B ∪ A; A ∩ B = B ∩ A
• Associative: A ∪ (B ∪ C) = (A ∪ B) ∪ C; A ∩ (B ∩ C) = (A ∩ B) ∩ C
• Distributive: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C); A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
• de Morgan’s laws: (A ∪ B)C = AC ∩ B C ; (A ∩ B)C = AC ∪ B C
• Disjointness: Sets are disjoint when they do not intersect, such that A ∩ B = ∅. A
collection of sets is pairwise disjoint (mutually exclusive) if, for all i ̸= j, Ai ∩Aj = ∅.
A collection of sets forms a partition of set S if they are pairwise disjoint and they cover set S, such that ⋃_{i=1}^{k} Ai = S.

Example 6.2 (Sets). Let set A be {1, 2, 3, 4}, B be {3, 4, 5, 6}, and C be {5, 6, 7, 8}.
Sets A, B, and C are all subsets of the sample space S which is {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
Write out the following sets:
1. A∪B
2. C ∩B
3. Bc
4. A ∩ (B ∪ C)

Exercise 6.2 (Sets). Suppose you had a pair of four-sided dice. You sum the results from
a single toss.
What is the set of possible outcomes (i.e. the sample space)?
Consider subsets A = {2, 8} and B = {2, 3, 7} of the sample space you found. What is
1. Ac
2. (A ∪ B)c

6.3 Probability

Probability Definitions: Formal and Informal

Many things in the world are uncertain. In everyday speech, we say that we are uncertain
about the outcome of random events. Probability is a formal model of uncertainty which
provides a measure of uncertainty governed by a particular set of rules (Figure 6.1).

Figure 6.1: Probability as a Measure¹ — an "experiment" from the (unobserved) data generating process generates (observed) outcomes in the sample space S; events are sets of outcomes.

¹ Images of Probability and Random Variables drawn by Shiro Kuriwaki and inspired by Blitzstein and Morris.

A

different model of uncertainty would, of course, have a set of rules different from anything we
discuss here. Our focus on probability is justified because it has proven to be a particularly
useful model of uncertainty.
Probability Distribution Function: a mapping of each event in the sample space S to
the real numbers that satisfy the following three axioms (also called Kolmogorov’s Axioms).
Formally,

Definition 6.1 (Probability). Probability is a function that maps events to a real number,
obeying the axioms of probability.

The axioms of probability make sure that the separate events add up in terms of probability,
and – for standardization purposes – that they add up to 1.

Definition 6.2 (Axioms of Probability). 1. For any event A, P (A) ≥ 0.


2. P (S) = 1
3. The Countable Additivity Axiom: For any sequence of disjoint (mutually exclusive)
events A1 , A2 , . . . (of which there may be infinitely many),
P( ⋃_{i=1}^{k} Ai ) = ∑_{i=1}^{k} P(Ai)

The last axiom is an extension of a union to infinite sets. When there are only two events
in the space, it boils down to:

P (A1 ∪ A2 ) = P (A1 ) + P (A2 ) for disjoint A1 , A2



Probability Operations

Using these three axioms, we can define all of the common rules of probability.
1. P (∅) = 0
2. For any event A, 0 ≤ P (A) ≤ 1.
3. P (AC ) = 1 − P (A)
4. If A ⊂ B (A is a subset of B), then P (A) ≤ P (B).
5. For any two events A and B, P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
6. Boole's Inequality: For any sequence of n events (which need not be disjoint) A1, A2, . . . , An, P( ⋃_{i=1}^{n} Ai ) ≤ ∑_{i=1}^{n} P(Ai).

Example 6.3 (Probability). Assume we have an evenly-balanced, six-sided die.


Then,
1. Sample space S =
2. P (1) = · · · = P (6) =
3. P (∅) = P (7) =
4. P({1, 3, 5}) =
5. P({1, 2}C) = P({3, 4, 5, 6}) =
6. Let A = {1, 2, 3, 4, 5} ⊂ S. Then P (A) = 5/6 < P (S) =
7. Let A = {1, 2, 3} and B = {2, 4, 6}. Then A ∪ B? A ∩ B? P (A ∪ B)?

Exercise 6.3 (Probability). Suppose you had a pair of four-sided dice. You sum the results
from a single toss. Let us call this sum, or the outcome, X.
1. What is P (X = 5), P (X = 3), P (X = 6)?
2. What is P (X = 5 ∪ X = 3)C ?

6.4 Conditional Probability and Bayes Rule


Conditional Probability: The conditional probability P (A|B) of an event A is the prob-
ability of A, given that another event B has occurred. Conditional probability allows for
the inclusion of other information into the calculation of the probability of an event. It is
calculated as

P(A|B) = P(A ∩ B) / P(B)

Note that conditional probabilities are probabilities and must also follow the Kolmogorov axioms of probability.

Example 6.4 (Conditional Probability 1). Assume A and B occur with the following
frequencies:

        A        AC
B       nab      naCb
BC      nabC     n(ab)C

and let nab + naC b + nabC + n(ab)C = N . Then


1. P (A) =
2. P (B) =
3. P (A ∩ B) =
4. P(A|B) = P(A ∩ B)/P(B) =
5. P(B|A) = P(A ∩ B)/P(A) =

Example 6.5 (Conditional Probability 2). A six-sided die is rolled. What is the probability
of a 1, given the outcome is an odd number?

You could rearrange the fraction to highlight how a joint probability could be expressed as
the product of a conditional probability.

Definition 6.3 (Multiplicative Law of Probability). The probability of the intersection of


two events A and B is P (A ∩ B) = P (A)P (B|A) = P (B)P (A|B) which follows directly
from the definition of conditional probability. More generally,

P (A1 ∩ · · · ∩ Ak ) = P (Ak |Ak−1 ∩ · · · ∩ A1 ) × P (Ak−1 |Ak−2 ∩ · · · A1 ) × . . . × P (A2 |A1 ) × P (A1 )

Sometimes it is easier to calculate these conditional probabilities and sum them than it is
to calculate P (A) directly.

Definition 6.4 (Law of Total Probability). Let S be the sample space of some experiment
and let the disjoint k events B1 , . . . , Bk partition S, such that P (B1 ∪ ... ∪ Bk ) = P (S) = 1.
If A is some other event in S, then the events A∩B1 , A∩B2 , . . . , A∩Bk will form a partition
of A and we can write A as

A = (A ∩ B1) ∪ · · · ∪ (A ∩ Bk).

Since the k events are disjoint,

P(A) = ∑_{i=1}^{k} P(A ∩ Bi) = ∑_{i=1}^{k} P(Bi) P(A|Bi)

Bayes Rule: Assume that events B1 , . . . , Bk form a partition of the space S. Then by the
Law of Total Probability

P(Bj|A) = P(A ∩ Bj) / P(A) = P(Bj) P(A|Bj) / ∑_{i=1}^{k} P(Bi) P(A|Bi)

If there are only two states of B, then this is just

P(B1|A) = P(B1) P(A|B1) / [P(B1) P(A|B1) + P(B2) P(A|B2)]

Bayes’ rule determines the posterior probability of a state P (Bj |A) by calculating the prob-
ability P (A ∩ Bj ) that both the event A and the state Bj will occur and dividing it by
the probability that the event will occur regardless of the state (by summing across all
Bi ). The states could be something like Normal/Defective, Healthy/Diseased, Republi-
can/Democrat/Independent, etc. The event on which one conditions could be something
like a sampling from a batch of components, a test for a disease, or a question about a policy
position.
Prior and Posterior Probabilities: Above, P (B1 ) is often called the prior probability,
since it’s the probability of B1 before anything else is known. P (B1 |A) is called the posterior
probability, since it’s the probability after other information is taken into account.
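In R, this calculation is just a weighted product followed by normalization. A minimal sketch with made-up numbers (the prior and likelihood vectors below are purely illustrative, not taken from the examples in this section):

prior      <- c(0.3, 0.7)      # P(B1), P(B2): the prior probabilities of the two states
likelihood <- c(0.8, 0.1)      # P(A | B1), P(A | B2)
posterior  <- prior * likelihood / sum(prior * likelihood)   # Bayes' rule
posterior                      # P(B1 | A), P(B2 | A)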

Example 6.6 (Bayes’ Rule). In a given town, 40% of the voters are Democrat and 60%
are Republican. The president’s budget is supported by 50% of the Democrats and 90%
of the Republicans. If a randomly (equally likely) selected voter is found to support the
president’s budget, what is the probability that they are a Democrat?

Exercise 6.4 (Conditional Probability). Assume that 2% of the population of the U.S. are
members of some extremist militia group. We develop a survey that positively classifies
someone as being a member of a militia group given that they are a member 95% of the
time and negatively classifies someone as not being a member of a militia group given that
they are not a member 97% of the time. What is the probability that someone positively
classified as being a member of a militia group is actually a militia member?

6.5 Independence
Definition 6.5 (Independence). If the occurrence or nonoccurrence of either events A
and B have no effect on the occurrence or nonoccurrence of the other, then A and B are
independent.

If A and B are independent, then


1. P (A|B) = P (A)
2. P (B|A) = P (B)
3. P(A ∩ B) = P(A)P(B)
4. More generally than the above, P( ⋂_{i=1}^{k} Ai ) = ∏_{i=1}^{k} P(Ai)
Are mutually exclusive events independent of each other?
No. If A and B are mutually exclusive, then they cannot happen simultaneously. If we know
that A occurred, then we know that B couldn’t have occurred. Because of this, A and B
aren’t independent.
Pairwise Independence: A set of more than two events A1 , A2 , . . . , Ak is pairwise inde-
pendent if P (Ai ∩ Aj ) = P (Ai )P (Aj ), ∀i ̸= j. Note that this does not necessarily imply
joint independence.
Conditional Independence: If A and B are independent once you know the occurrence
of a third event C, then A and B are conditionally independent (conditional on C):
1. P (A|B ∩ C) = P (A|C)
2. P (B|A ∩ C) = P (B|C)
3. P (A ∩ B|C) = P (A|C)P (B|C)
Just because two events are conditionally independent does not mean that they are indepen-
dent. Actually it is hard to think of real-world things that are “unconditionally” independent.
That’s why it’s always important to ask about a finding: What was it conditioned on? For
example, suppose that graduate school admission decisions are made by only one professor, who picks a group of 50 bright students and flips a coin for each student to generate a class of about 25 students. Then the probabilities that two students get accepted are conditionally independent, because they are determined by two separate coin tosses. However, this does not mean that their admittances are unconditionally independent: knowing that student A got in gives us information about whether student B got in, if we think that the professor originally picked her pool of 50 students by merit.
Perhaps more counter-intuitively: If two events are already independent, then it might seem
that no amount of “conditioning” will make them dependent. But this is not always so. For
example2 , suppose I only get a call from two people, Alice and Bob. Let A be the event
that Alice calls, and B be the event that Bob calls. Alice and Bob do not communicate,
so P (A | B) = P (A). But now let C be the event that your phone rings. For conditional
independence to hold here, P(A | C) would have to equal P(A | Bc ∩ C). But this is not true: given only that the phone rang, Alice may or may not have called, whereas given that the phone rang and Bob did not call, Alice certainly did.

6.6 Random Variables


Most questions in the social sciences involve events, rather than numbers per se. To analyze
and reason about events quantitatively, we need a way of mapping events to numbers. A
random variable does exactly that.

Definition 6.6 (Random Variable). A random variable is a measurable function X that


maps from the sample space S to the set of real numbers R. It assigns a real number to
every outcome s ∈ S.
2 Example taken from Blitzstein and Hwang, Example 2.5.10
Figure 6.2: The Random Variable as a Real-Valued Function — the random variable X is a function that takes events in the space of events and assigns a number on the (real) number line to them. That mapping process is deterministic, but the occurrence of an event is still random.

Figure 6.2 shows an image of the function. It might seem strange to define a random variable as a function – which is neither random nor variable. The randomness comes from the realization of an event from the sample space S.
Randomness means that the outcome of some experiment is not deterministic, i.e. there
is some probability (0 < P (A) < 1) that the event will occur.
The support of a random variable is all values for which there is a positive probability of
occurrence.
Example: Flip a fair coin two times. What is the sample space?
A random variable must map events to the real line. For example, let a random variable
X be the number of heads. The event (H, H) gets mapped to 2 (X(s) = 2), the events {(H, T), (T, H)} get mapped to 1 (X(s) = 1), and the event (T, T) gets mapped to 0 (X(s) = 0).
What are other possible random variables?

6.7 Distributions
We now have two main concepts in this section – probability and random variables. Given
a sample space S and the same experiment, both probability and random variables take
events as their inputs. But they output different things (probabilities measure the “size”
of events, random variables give a number in a way that the analyst chose to define the
random variable). How do the two concepts relate?
The concept of distributions is the natural bridge between these two concepts.

Definition 6.7 (Distribution of a random variable). A distribution of a random variable is


a function that specifies the probabilities of all events associated with that random variable.
There are several types of distributions: A probability mass function for a discrete random
variable and probability density function for a continuous random variable.

Notice how the definition of distributions combines two ideas of random variables and prob-
abilities of events. First, the distribution considers a random variable, call it X. X can take
a number of possible numeric values.

Example 6.7 (Total Number of Occurrences). Consider three binary outcomes, one for
each patient recovering from a disease: Ri denotes the event in which patient i (i = 1, 2, 3)
recovers from a disease. R1 , R2 , and R3 . How would we represent the total number of
people who end up recovering from the disease?

Solution. Define the random variable X be the total number of people (out of three) who
recover from the disease. Random variables are functions, that take as an input a set
of events (in the sample space S) and deterministically assigns them to a number of the
analyst’s choice.
Recall that with each of these numerical values there is a class of events. In the previous
example, for X = 3 there is one outcome (R1 , R2 , R3 ) and for X = 1 there are multiple
({(R1 , R2c , R3c ), (R1c , R2 , R3c ), (R1c , R2c , R3 ), }). Now, the thing to notice here is that each of
these events naturally come with a probability associated with them. That is, P (R1 , R2 , R3 )
is a number from 0 to 1, as is P (R1 , R2c , R3c ). These all have probabilities because they are
in the sample space S. The function that tells us these probabilities that are associated
with a numerical value of a random variable is called a distribution.
In other words, a random variable X induces a probability distribution P (sometimes written
PX to emphasize that the probability density is about the r.v. X)

Discrete Random Variables

The formal definition of a random variable is easier to give by separating out two cases:
discrete random variables when the numeric summaries of the events are discrete, and
continuous random variables when they are continuous.

Definition 6.8 (Discrete Random Variable). X is a discrete random variable if it can


assume only a finite or countably infinite number of distinct values. Examples: number of
wars per year, heads or tails.

The distribution of a discrete r.v. is a PMF:

Definition 6.9 (Probability Mass Function). For a discrete random variable X, the prob-
ability mass function (Also referred to simply as the “probability distribution.”) (PMF),
p(x) = P (X = x), assigns probabilities to a countable number of distinct x values such that
1. 0 ≤ p(x) ≤ 1


2. ∑_x p(x) = 1

Example: For a fair six-sided die, there is an equal probability of rolling any number. Since
there are six sides, the probability mass function is then p(y) = 1/6 for y = 1, . . . , 6, 0
otherwise.
For a discrete random variable, the cumulative distribution function (also referred to simply as the "cumulative distribution"), F(x) or P(X ≤ x), is the probability that X is less than or equal to some value x, or

P(X ≤ x) = ∑_{i ≤ x} p(i)

Properties a CDF must satisfy:


1. F (x) is non-decreasing in x.
2. lim F (x) = 0 and lim F (x) = 1
x→−∞ x→∞
3. F (x) is right-continuous.
Note that P (X > x) = 1 − P (X ≤ x).
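For a concrete discrete case, the PMF and CDF of a fair six-sided die can be tabulated directly in R (a minimal sketch):

p  <- rep(1/6, 6)     # PMF: p(1), ..., p(6)
Fx <- cumsum(p)       # CDF: P(X <= 1), ..., P(X <= 6)
Fx                    # non-decreasing, ends at 1
1 - Fx[4]             # P(X > 4) = 1 - P(X <= 4)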

Example 6.8. For a fair die with its value as Y , What are the following?
1. P (Y ≤ 1)
2. P (Y ≤ 3)
3. P (Y ≤ 6)

Continuous Random Variables

We also have a similar definition for continuous random variables.

Definition 6.10 (Continuous Random Variable). X is a continuous random variable if there exists a nonnegative function f(x) defined for all real x ∈ (−∞, ∞), such that for any interval A, P(X ∈ A) = ∫_A f(x) dx. Examples: age, income, GNP, temperature.

Definition 6.11 (Probability Density Function). The function f above is called the prob-
ability density function (pdf) of X and must satisfy

f(x) ≥ 0

∫_{−∞}^{∞} f(x) dx = 1

Note also that P(X = x) = 0 — i.e., the probability of any single point x is zero.



For both discrete and continuous random variables, we have a unifying concept of another
measure: the cumulative distribution:

Definition 6.12 (Cumulative Density Function). Because the probability that a continuous
random variable will assume any particular value is zero, we can only make statements about
the probability of a continuous random variable being within an interval. The cumulative
distribution gives the probability that X lies in the interval (−∞, x) and is defined as

F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(s) ds

Note that F(x) has similar properties with continuous distributions as it does with discrete ones — non-decreasing, continuous (not just right-continuous), lim_{x→−∞} F(x) = 0, and lim_{x→∞} F(x) = 1.

We can also make statements about the probability of X falling in an interval a ≤ x ≤ b:

P(a ≤ X ≤ b) = ∫_{a}^{b} f(x) dx

The PDF and CDF are linked by integration and differentiation: the CDF is the integral of the PDF, so the PDF is the derivative of the CDF:

f(x) = F′(x) = dF(x)/dx
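R's built-in density and distribution functions make this link concrete. For instance, with the standard Normal density (dnorm), numerically integrating the PDF up to a point matches the CDF (pnorm). A quick sketch:

integrate(dnorm, lower = -Inf, upper = 1.5)$value   # area under the PDF up to 1.5
pnorm(1.5)                                          # the CDF at 1.5 -- the same number
integrate(dnorm, lower = -Inf, upper = Inf)$value   # total area under the PDF is 1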

Example 6.9. For f (y) = 1, 0 < y < 1, find: (1) The CDF F (y) and (2) The probability
P (0.5 < y < 0.75).

6.8 Joint Distributions


Often, we are interested in two or more random variables defined on the same sample space.
The distribution of these variables is called a joint distribution. Joint distributions can be
made up of any combination of discrete and continuous random variables.
Joint Probability Distribution: If both X and Y are random variables, their joint probability mass/density function assigns probabilities to each pair of outcomes.
Discrete:

p(x, y) = P (X = x, Y = y)

such that p(x, y) ∈ [0, 1] and

∑_x ∑_y p(x, y) = 1

Continuous:

f(x, y);   P((X, Y) ∈ A) = ∫∫_A f(x, y) dx dy

s.t. f(x, y) ≥ 0 and

∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1

If X and Y are independent, then P (X = x, Y = y) = P (X = x)P (Y = y) and f (x, y) =


f (x)f (y)
Marginal Probability Distribution: probability distribution of only one of the two
variables (ignoring information about the other variable), we can obtain the marginal dis-
tribution by summing/integrating across the variable that we don’t care about:

• Discrete: pX(x) = ∑_i p(x, yi)
• Continuous: fX(x) = ∫_{−∞}^{∞} f(x, y) dy
Conditional Probability Distribution: probability distribution for one variable, holding the other variable fixed. Recalling from the previous lecture that P(A|B) = P(A ∩ B)/P(B), we can write the conditional distribution as

• Discrete: pY|X(y|x) = p(x, y)/pX(x),   pX(x) > 0
• Continuous: fY|X(y|x) = f(x, y)/fX(x),   fX(x) > 0

Exercise 6.5 (Discrete Outcomes). Suppose we are interested in the outcomes of flipping a
coin and rolling a 6-sided die at the same time. The sample space for this process contains
12 elements:
{(H, 1), (H, 2), (H, 3), (H, 4), (H, 5), (H, 6), (T, 1), (T, 2), (T, 3), (T, 4), (T, 5), (T, 6)}
We can define two random variables X and Y such that X = 1 if heads and X = 0 if tails,
while Y equals the number on the die.
We can then make statements about the joint distribution of X and Y . What are the
following?
1. P (X = x)
2. P (Y = y)
3. P (X = x, Y = y)
4. P (X = x|Y = y)
5. Are X and Y independent?

6.9 Expectation
We often want to summarize some characteristics of the distribution of a random variable.
The most important summary is the expectation (or expected value, or mean), in which the
possible values of a random variable are weighted by their probabilities.

Definition 6.13 (Expectation of a Discrete Random Variable). The expected value of a


discrete random variable Y is
E(Y) = ∑_y y P(Y = y) = ∑_y y p(y)

In words, it is the weighted average of all possible values of Y , weighted by the probability
that y occurs. It is not necessarily the number we would expect Y to take on, but the
average value of Y after a large number of repetitions of an experiment.
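In R, a discrete expectation is just a probability-weighted sum. A minimal sketch with an arbitrary PMF (the values and probabilities below are only illustrative):

y <- c(0, 1, 2)
p <- c(0.25, 0.5, 0.25)
sum(y * p)    # E(Y): each possible value weighted by its probability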

Example 6.10 (Expectation of a Discrete Random Variable). What is the expectation of


a fair, six-sided die?

Expectation of a Continuous Random Variable: The expected value of a continuous


random variable is similar in concept to that of the discrete random variable, except that
instead of summing using probabilities as weights, we integrate using the density to weight.
Hence, the expected value of the continuous variable Y is defined by

E(Y) = ∫_y y f(y) dy

Example 6.11 (Expectation of a Continuous Random Variable). Find E(Y) for f(y) = 1/1.5, 0 < y < 1.5.

Expected Value of a Function

Remember: An Expected Value is a type of weighted average. We can extend this to


composite functions. For random variable Y ,
If Y is Discrete with PMF p(y),


E[g(Y)] = ∑_y g(y) p(y)

If Y is Continuous with PDF f (y),

E[g(Y)] = ∫_{−∞}^{∞} g(y) f(y) dy

Properties of Expected Values

Dealing with Expectations is easier when the thing inside is a sum. The intuition behind this is that Expectation is an integral, which is a type of sum.

1. Expectation of a constant is a constant

E(c) = c

2. Constants come out


E(cg(Y )) = cE(g(Y ))
3. Expectation is Linear

E(g(Y1 ) + · · · + g(Yn )) = E(g(Y1 )) + · · · + E(g(Yn )),

regardless of independence
4. Expected Value of Expected Values:

E(E(Y )) = E(Y )

(because the expected value of a random variable is a constant)


Finally, if X and Y are independent, even products are easy:

E(XY ) = E(X)E(Y )

Conditional Expectation: With joint distributions, we are often interested in the ex-
pected value of a variable Y if we could hold the other variable X fixed. This is the
conditional expectation of Y given X = x:

1. Y discrete: E(Y|X = x) = ∑_y y pY|X(y|x)
2. Y continuous: E(Y|X = x) = ∫_y y fY|X(y|x) dy
The conditional expectation is often used for prediction when one knows the value of X but
not Y

6.10 Variance and Covariance

We can also look at other summaries of the distribution, which build on the idea of taking
expectations. Variance tells us about the “spread” of the distribution; it is the expected
value of the squared deviations from the mean of the distribution. The standard deviation
is simply the square root of the variance.

Definition 6.14 (Variance). The Variance of a Random Variable Y is

Var(Y ) = E[(Y − E(Y ))2 ] = E(Y 2 ) − [E(Y )]2

The Standard Deviation is the square root of the variance:

SD(Y) = σY = √Var(Y)

Example 6.12 (Variance). Given the following PMF:


f(x) = [3!/(x!(3 − x)!)] (1/2)³   for x = 0, 1, 2, 3
f(x) = 0                          otherwise

What is Var(x)?
Hint: First calculate E(X) and E(X 2 )

Definition 6.15 (Covariance and Correlation). The covariance measures the degree to
which two random variables vary together; if the covariance between X and Y is positive,
X tends to be larger than its mean when Y is larger than its mean.

Cov(X, Y ) = E[(X − E(X))(Y − E(Y ))]


We can also write this as

Cov(X, Y ) = E (XY − XE(Y ) − E(X)Y + E(X)E(Y ))


= E(XY ) − E(X)E(Y ) − E(X)E(Y ) + E(X)E(Y )
= E(XY ) − E(X)E(Y )

The covariance of a variable with itself is the variance of that variable.


The Covariance is unfortunately hard to interpret in magnitude. The correlation is a stan-
dardized version of the covariance, and always ranges from -1 to 1.

Definition 6.16 (Correlation). The correlation coefficient is the covariance divided by the
standard deviations of X and Y . It is a unitless measure and always takes on values in the
interval [−1, 1].

Corr(X, Y) = Cov(X, Y) / √(Var(X)Var(Y)) = Cov(X, Y) / (SD(X)SD(Y))

Properties of Variance and Covariance:


1. Var(c) = 0
2. Var(cY ) = c2 Var(Y )
3. Cov(Y, Y ) = Var(Y )
4. Cov(X, Y ) = Cov(Y, X)
5. Cov(aX, bY ) = abCov(X, Y )
6. Cov(X + a, Y ) = Cov(X, Y )
7. Cov(X + Z, Y + W ) = Cov(X, Y ) + Cov(X, W ) + Cov(Z, Y ) + Cov(Z, W )
8. Var(X + Y ) = Var(X) + Var(Y ) + 2Cov(X, Y )
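Property 8 is easy to verify by simulation in R. A hedged sketch with simulated data (sample quantities match the population identity only approximately):

set.seed(1)
x <- rnorm(1e5)
y <- 0.5 * x + rnorm(1e5)                  # x and y are correlated by construction
var(x + y)                                 # left-hand side
var(x) + var(y) + 2 * cov(x, y)            # right-hand side, approximately equal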

Exercise 6.6 (Expectation and Variance). Suppose we have a PMF with the following
characteristics:

P(X = −2) = 1/5
P(X = −1) = 1/6
P(X = 0) = 1/5
P(X = 1) = 1/15
P(X = 2) = 11/30
1. Calculate the expected value of X
Define the random variable Y = X 2 .
2. Calculate the expected value of Y. (Hint: It would help to derive the PMF of Y first
in order to calculate the expected value of Y in a straightforward way)
3. Calculate the variance of X.

Exercise 6.7 (Expectation and Variance 2). 1. Find the expectation and variance of the random variable with the following PDF:

f(x) = (3/10)(3x − x²)   for 0 ≤ x ≤ 2
f(x) = 0                 otherwise

Exercise 6.8 (Expectation and Variance 3). 1. Find the mean and standard deviation
of random variable X. The PDF of this X is as follows:



f(x) = (1/4)x          for 0 ≤ x ≤ 2
f(x) = (1/4)(4 − x)    for 2 ≤ x ≤ 4
f(x) = 0               otherwise
2. Next, calculate P(X < µ − σ). Remember, µ is the mean and σ is the standard deviation.

6.11 Special Distributions


Two discrete distributions used often are:

Definition 6.17 (Binomial Distribution). Y is distributed binomial if it represents the


number of “successes” observed in n independent, identical “trials,” where the probability
of success in any trial is p and the probability of failure is q = 1 − p.

For any particular sequence of y successes and n − y failures, the probability of obtaining that sequence is p^y q^(n−y) (by the multiplicative law and independence). However, there are (n choose y) = n!/((n − y)! y!) ways of obtaining a sequence with y successes and n − y failures. So the binomial distribution is given by

p(y) = (n choose y) p^y q^(n−y),   y = 0, 1, 2, . . . , n

with mean µ = E(Y ) = np and variance σ 2 = Var(Y ) = npq.
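In R, dbinom() gives the binomial PMF and pbinom() the CDF. A small illustrative sketch (the parameters here are arbitrary, not tied to the example that follows):

dbinom(3, size = 10, prob = 0.5)   # P(Y = 3) with n = 10, p = 0.5
pbinom(3, size = 10, prob = 0.5)   # P(Y <= 3)
n <- 10; p <- 0.5
c(mean = n * p, variance = n * p * (1 - p))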

Example 6.13. Republicans vote for Democrat-sponsored bills 2% of the time. What is the probability that out of 10 Republicans questioned, half voted for a particular Democrat-sponsored bill? What is the mean number of Republicans voting for Democrat-sponsored bills? The variance?
1. P(Y = 5) =
2. E(Y) =
3. Var(Y) =

Definition 6.18 (Poisson Distribution). A random variable Y has a Poisson distribution


if

P(Y = y) = (λ^y / y!) e^(−λ),   y = 0, 1, 2, . . . ,   λ > 0

The Poisson has the unusual feature that its expectation equals its variance: E(Y ) =
Var(Y ) = λ. The Poisson distribution is often used to model rare event counts: counts of
the number of events that occur during some unit of time. λ is often called the “arrival
rate.”

Example 6.14. Border disputes occur between two countries through a Poisson Distribu-
tion, at a rate of 2 per month. What is the probability of 0, 2, and less than 5 disputes
occurring in a month?

Two continuous distributions used often are:

Definition 6.19 (Uniform Distribution). A random variable Y has a continuous uniform


distribution on the interval (α, β) if its density is given by
f(y) = 1/(β − α),   α ≤ y ≤ β

The mean and variance of Y are E(Y) = (α + β)/2 and Var(Y) = (β − α)²/12.

Example 6.15. For Y uniformly distributed over (1, 3), what are the following probabili-
ties?
1. P (Y = 2)
2. Its density evaluated at 2, or f (2)
3. P (Y ≤ 2)
4. P (Y > 2)

Figure 6.3: Normal Distribution Density (thick line: variance = 2, normal line: variance = 1)

Definition 6.20 (Normal Distribution). A random variable Y is normally distributed with


mean E(Y ) = µ and variance Var(Y ) = σ 2 if its density is

f(y) = (1/(√(2π) σ)) e^(−(y − µ)²/(2σ²))

See Figure 6.3 for various Normal distributions with the same mean and two versions of the variance.
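R has density (d*), CDF (p*), quantile (q*), and random draw (r*) functions for both of these distributions. A brief sketch with arbitrary parameters:

dunif(1, min = 0, max = 4)     # uniform density on (0, 4), evaluated at 1
punif(1, min = 0, max = 4)     # P(Y <= 1) for that uniform
dnorm(0, mean = 0, sd = 1)     # standard Normal density at 0
pnorm(1.96)                    # P(Y <= 1.96) for the standard Normal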

6.12 Summarizing Observed Events (Data)

So far, we’ve talked about distributions in a theoretical sense, looking at different properties
of random variables. We don’t observe random variables; we observe realizations of the
random variable. These realizations of events are roughly equivalent to what we mean by
“data”.

Sample mean: This is the most common measure of central tendency, calculated by sum-

ming across the observations and dividing by the number of observations.

x̄ = (1/n) ∑_{i=1}^{n} xi

The sample mean is an estimate of the expected value of a distribution.

Example:
X 6 3 7 5 5 5 6 4 7 2
Y 1 2 1 2 2 1 2 0 2 0
1. x̄ = ȳ =
2. median(x) = median(y) =
3. mx = my =

Dispersion: We also typically want to know how spread out the data are relative to the
center of the observed distribution. There are several ways to measure dispersion.
Sample variance: The sample variance is the sum of the squared deviations from the
sample mean, divided by the number of observations minus 1.

V̂ar(X) = (1/(n − 1)) ∑_{i=1}^{n} (xi − x̄)²

Again, this is an estimate of the variance of a random variable; we divide by n − 1 instead


of n in order to get an unbiased estimate.
Standard deviation: The sample standard deviation is the square root of the sample
variance.

ŜD(X) = √V̂ar(X) = √[ (1/(n − 1)) ∑_{i=1}^{n} (xi − x̄)² ]

Example: Using table above, calculate:


1. Var(X) = Var(Y ) =
2. SD(X) = SD(Y ) =

Covariance and Correlation: Both of these quantities measure the degree to which two
variables vary together, and are estimates of the covariance and correlation of two random
variables as defined above.
1. Sample covariance: Ĉov(X, Y) = (1/(n − 1)) ∑_{i=1}^{n} (xi − x̄)(yi − ȳ)
2. Sample correlation: Ĉorr(X, Y) = Ĉov(X, Y) / √(V̂ar(X) V̂ar(Y))

Example 6.16. Example: Using the above table, calculate the sample versions of:

1. Cov(X, Y )
2. Corr(X, Y )

6.13 Asymptotic Theory


In theoretical and applied research, asymptotic arguments are often made. In this section
we briefly introduce some of this material.
What are asymptotics? In probability theory, asymptotic analysis is the study of limiting
behavior. By limiting behavior, we mean the behavior of some random process as the
number of observations gets larger and larger.
Why is this important? We rarely know the true process governing the events we see in the
social world. It is helpful to understand how such unknown processes theoretically must
behave and asymptotic theory helps us do this.

6.13.1 CLT and LLN

We are now finally ready to revisit, in slightly more precise terms, the two pillars of statistical theory that we motivated Section 3.3 with.

Theorem 6.1 (Central Limit Theorem (i.i.d. case)). Let {Xn } = {X1 , X2 , . . .} be a se-
quence of i.i.d. random variables with finite mean (µ) and variance (σ 2 ). Then, the sample
mean X̄n = (X1 + X2 + · · · + Xn)/n increasingly converges into a Normal distribution:

(X̄n − µ) / (σ/√n)  →d  Normal(0, 1)

Another way to write this as a probability statement is that for all real numbers a,

P( (X̄n − µ)/(σ/√n) ≤ a ) → Φ(a)

as n → ∞, where

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) e^(−t²/2) dt

is the CDF of a Normal distribution with mean 0 and variance 1.
This result means that, as n grows, the distribution of the sample mean X̄n = (1/n)(X1 + X2 + · · · + Xn) is approximately normal with mean µ and standard deviation σ/√n, i.e.,

X̄n ≈ N(µ, σ²/n).

The standard deviation of X̄n (which is roughly a measure of the precision of X̄n as an estimator of µ) decreases at the rate 1/√n, so, for example, to increase its precision by 10 (i.e., to get one more digit right), one needs to collect 10² = 100 times more units of data.

Intuitively, this result also justifies that whenever a lot of small, independent processes some-
how combine together to form the realized observations, practitioners often feel comfortable
assuming Normality.
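A quick simulation makes the CLT visible: sample means of even a skewed distribution look approximately Normal once n is moderately large. A hedged sketch in R (the exponential distribution is an arbitrary choice):

set.seed(2138)
n <- 30
xbar <- replicate(10000, mean(rexp(n, rate = 1)))   # 10,000 sample means, each from n draws
mean(xbar)      # close to the true mean, 1
sd(xbar)        # close to sigma / sqrt(n) = 1 / sqrt(30)
hist(xbar)      # roughly bell-shaped despite the skewed underlying distribution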

Theorem 6.2 (Law of Large Numbers (LLN)). For any draw of independent random vari-
ables with the same mean µ, the sample average after n draws, X̄n = (1/n)(X1 + X2 + . . . + Xn), converges in probability to the expected value of X, µ, as n → ∞:

lim_{n→∞} P(|X̄n − µ| > ε) = 0

A shorthand of which is X̄n →p µ, where the arrow is read as "converges in probability to". In other words, P(lim_{n→∞} X̄n = µ) = 1. This is an important motivation for the widespread use of the sample mean, as well as the intuitive link between averages and expected values.
More precisely this version of the LLN is called the weak law of large numbers because it
leaves open the possibility that |X̄n − µ| > ε occurs many times. The strong law of large
numbers states that, under a few more conditions, the probability that the limit of the
sample average is the true mean is 1 (and other possibilities occur with probability 0), but
the difference is rarely consequential in practice.
The Strong Law of Large Numbers holds so long as the expected value exists; no other
assumptions are needed. However, the rate of convergence will differ greatly depending
on the distribution underlying the observed data. When extreme observations occur often
(i.e. kurtosis is large), the rate of convergence is much slower. Cf. The distribution of
financial returns.
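The LLN can be visualized the same way: the running mean of repeated die rolls settles down to the expected value 3.5 as the number of draws grows. A small sketch in R:

set.seed(3)
draws <- sample(1:6, size = 10000, replace = TRUE)
running_mean <- cumsum(draws) / seq_along(draws)
running_mean[c(10, 100, 10000)]    # wanders at first, then hugs 3.5
plot(running_mean, type = "l")     # convergence is visible in the plot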

6.13.2 Big O Notation

Some of you may encounter “big-OH”-notation. If f, g are two functions, we say that
f = O(g) if there exists some constant, c, such that f (n) ≤ c × g(n) for large enough n.
This notation is useful for simplifying complex problems in game theory, computer science,
and statistics.
Example.
What is O(5 exp(0.5n) + n² + n/2)? Answer: exp(n). Why? Because, for large n,

(5 exp(0.5n) + n² + n/2) / exp(n) ≤ c exp(n) / exp(n) = c

whenever n > 4 and where c = 1.

Answers to Examples and Exercises


Answer to Example 6.1:

1. 5 × 5 × 5 = 125
2. 5 × 4 × 3 = 60
3. (5 choose 3) = 5!/((5 − 3)! 3!) = (5 × 4)/(2 × 1) = 10

Answer to Exercise 6.1:


1. (52 choose 4) = 52!/((52 − 4)! 4!) = 270725

Answer to Example 6.2:


1. {1, 2, 3, 4, 5, 6}
2. {5, 6}
3. {1, 2, 7, 8, 9, 10}
4. {3, 4}
Answer to Exercise 6.2:
Sample Space: {2, 3, 4, 5, 6, 7, 8}
1. {3, 4, 5, 6, 7}
2. {4, 5, 6}
Answer to Example 6.3:
1. {1, 2, 3, 4, 5, 6}
2. 1/6
3. 0
4. 1/2
5. 4/6 = 2/3
6. 1
7. A ∪ B = {1, 2, 3, 4, 6}, A ∩ B = {2}, P(A ∪ B) = 5/6

Answer to Exercise 6.3:


1. P(X = 5) = 4/16, P(X = 3) = 2/16, P(X = 6) = 3/16
2. P((X = 5 ∪ X = 3)C) = 10/16

Answer to Example 6.4:


1. (nab + nabC)/N
2. (nab + naCb)/N
3. nab/N
4. (nab/N) / ((nab + naCb)/N) = nab/(nab + naCb)
5. (nab/N) / ((nab + nabC)/N) = nab/(nab + nabC)

Answer to Example 6.5:


P(1|Odd) = P(1 ∩ Odd)/P(Odd) = (1/6)/(1/2) = 1/3

Answer to Example 6.6:


We are given that

P (D) = .4, P (Dc ) = .6, P (S|D) = .5, P (S|Dc ) = .9

Using this, Bayes’ Law and the Law of Total Probability, we know:

P(D|S) = P(D)P(S|D) / [P(D)P(S|D) + P(Dc)P(S|Dc)]

P(D|S) = (.4 × .5) / (.4 × .5 + .6 × .9) ≈ .27

Answer to Exercise 6.4:


We are given that

P (M ) = .02, P (C|M ) = .95, P (C c |M c ) = .97

P(M|C) = P(C|M)P(M) / P(C)
       = P(C|M)P(M) / [P(C|M)P(M) + P(C|Mc)P(Mc)]
       = P(C|M)P(M) / [P(C|M)P(M) + (1 − P(Cc|Mc))P(Mc)]
       = (.95 × .02) / (.95 × .02 + .03 × .98) ≈ .39

Answer to Example 6.10:


E(Y ) = 7/2
We would never expect the result of a rolled die to be 7/2, but that would be the average
over a large number of rolls of the die.
Answer to Example 6.11
0.75
Answer to Example 6.12:
E(X) = 0 × 1/8 + 1 × 3/8 + 2 × 3/8 + 3 × 1/8 = 3/2

Since there is a 1 to 1 mapping from X to X²: E(X²) = 0 × 1/8 + 1 × 3/8 + 4 × 3/8 + 9 × 1/8 = 24/8 = 3

Var(X) = E(X²) − E(X)²
       = 3 − (3/2)²
       = 3/4

Answer to Exercise 6.6:


1. E(X) = −2(1/5) + (−1)(1/6) + 0(1/5) + 1(1/15) + 2(11/30) = 7/30

2. E(Y) = 0(1/5) + 1(7/30) + 4(17/30) = 5/2

3.
Var(X) = E[X²] − E[X]²
       = E(Y) − E(X)²
       = 5/2 − (7/30)² ≈ 2.45

Answer to Exercise 6.7:


1. expectation = 6/5, variance = 6/25

Answer to Exercise 6.8:


1. mean = 2, standard deviation = √(2/3)
2. (1/8)(2 − √(2/3))²
Part II

Programming

Chapter 7

Orientation and Reading in Data¹

Welcome to the first in-class session for programming. Up till this point, you should have
already:
• Completed the R Visualization and Programming primers (under “The Basics”) on
your own at https://rstudio.cloud/learn/primers/,
• Made an account at RStudio Cloud and joined the Math Prefresher 2018 Space, and
• Successfully signed up for the University wi-fi: https://getonline.harvard.edu/
(Access Harvard Secure with your HarvardKey. Try to get a HarvardKey as soon as
possible.)

Where are we? Where are we headed?

Today we’ll cover:


• What’s what in RStudio
• How to read in data
• Comment on coding style on the way

Check your understanding

• What is the difference between a file and a folder?


• In the RStudio windows, what is the difference between the “Source” Pane and the
“Console”? What is a “code chunk”?
• How do you read a R help page? What is the Usage section, the Values section, and
the Examples section?
• What use is the “Environment” Pane?
• How would you read in a spreadsheet in R?
1 Module originally written by Shiro Kuriwaki


• How would you figure out what variables are in the data? size of the data?
• How would you read in a csv file, a dta file, a sav file?

7.1 Motivation: Data and You


The modal social science project starts by importing existing datasets. Datasets come in all
shapes and sizes. As you search for new data you may encounter dozens of file extensions –
csv, xlsx, dta, sav, por, Rdata, Rds, txt, xml, json, shp … the list continues. Although these
files can often be cumbersome, it's good to be able to find a way to handle any file that your research may call for.
Reviewing data import will allow us to get on the same page on how computer systems
work.
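As a preview of what this looks like in practice, here is a hedged sketch of reading a few common formats — one common approach, assuming the readr, haven, and readxl packages are installed (the file paths below are placeholders, not files distributed with this module):

library(readr)    # csv and other delimited text files
library(haven)    # Stata (.dta) and SPSS (.sav) files
library(readxl)   # Excel (.xlsx) files

dat_csv  <- read_csv("data/input/example.csv")     # hypothetical file path
dat_dta  <- read_dta("data/input/example.dta")     # hypothetical file path
dat_sav  <- read_sav("data/input/example.sav")     # hypothetical file path
dat_xlsx <- read_excel("data/input/example.xlsx")  # hypothetical file path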

7.2 Orienting
1. We will be using a cloud version of RStudio at https://rstudio.cloud. You should
join the Math Prefresher Space 2018 from the link that was emailed to you. Each day,
click on the project with the day’s date on it.
Although most of you will probably be doing your work on RStudio locally rather than in the cloud, we are trying to use the cloud version because it makes it easier to standardize people's settings.
2. RStudio (either cloud or desktop) is a GUI and an IDE for the programming language
R. A Graphical User Interface allows users to interface with the software (in this case
R) using graphical aids like buttons and tabs. Often we don’t think of GUIs because
to most computer users, everything is a GUI (like Microsoft Word or your “Control
Panel"), but it's always there! An Integrated Development Environment just means that the software to interface with R comes with useful bells and whistles to give you shortcuts.
The Console is the core window through which you see your GUI actually operating through R. It's not graphical, so it might not be as intuitive. But all your results, commands, errors, and warnings — you see them in here. A console tells you what's going on now.
3. Via the GUI, you the analyst send instructions, or commands, to the R
application. The verb for this is “run” or “execute” the command. Computer programs
ask users to provide instructions in very specific formats. While a English-speaking
human can understand a sentence with a few typos in it by filling in the blanks, the
same typo or misplaced character would halt a computer program. Each program has
its own requirements for how commands should be typed; after all, each of these is its
own language. We refer to the way a program needs its commands to be formatted as
its syntax.
4. Theoretically, one could do all their work by typing in commands into the Console.
But that would be a lot of work, because you’d have to give instructions each time

Figure 7.1: A Typical RStudio Window at Startup

you start your data analysis. Moreover, you’ll have no record of what you did. That’s
why you need a script. This is a type of code. It can be referred to as a source
because that is the source of your commands. Source is also used as a verb; “source
the script” just means execute it. RStudio doesn’t start out with a script, so you can
make one from “File > New” or the New file icon.

5. You can also open scripts that are in folders on your computer. A script is a type of
File. Find your Files in the bottom-right “Files” pane.

To load a dataset, you need to specify where that file is. Computer files (data, documents,
programs) are organized hierarchically, like a branching tree. Folders can contain files,
and also other folders. The GUI toolbar makes this linear and hierarchical relationship
apparent. When we turn to locate the file in our commands, we need another set of syntax.
Importantly, we denote the hierarchy of folders by the / (slash) symbol.
data/input/2018-08 indicates the 2018-08 folder, which is included in the
input folder, which is in turn included in the data folder.
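For instance, here is a minimal sketch of how you might poke around this hierarchy from the
R console (it assumes your project contains the data/input folder used below):

list.files("data")                       # what sits directly inside the data folder?
list.files("data/input")                 # one level further down
file.path("data", "input", "2018-08")    # builds the string "data/input/2018-08"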

Files (but not folders) have “file extensions” which you are probably familiar with
already: .docx, .pdf, and so on. The file extensions you will see in a stats or quantitative
social science class are:

• .pdf: PDF, a convenient format to view documents and slides in, regardless of
Mac/Windows.

• .csv: A comma separated values file



Figure 7.2: Opening New Script (as opposed to the Console)

• .xlsx: Microsoft Excel file

• .dta: Stata data

• .sav: SPSS data

• .R: R code (script)

• .Rmd: Rmarkdown code (text + code)

• .do: Stata code (script)

6. In R, there are two main types of scripts: a classic .R file and a .Rmd file (for
Rmarkdown). A .R file is just lines and lines of R code that is meant to be inserted
right into the Console. A .Rmd tries to weave code and English together, to make it
easier for users to create reports that interact with data and intersperse R code with
explanation. For example, we built this book in Rmds.

What Rmarkdown facilitates is the use of code chunks, which are used here. These
start and end with three back-ticks. At the beginning of a chunk, we can add options in curly
braces ({}). Specifying r at the start tells Rmarkdown to render the chunk as R code. Options
like echo = TRUE control whether the code that was executed is shown; eval = TRUE controls
whether the code is evaluated at all. More about Rmarkdown in Section 13. For
example, this code chunk would evaluate 1 + 1 and show its output when compiled,
but not display the code that was executed.
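As a sketch, such a chunk would look like this in the .Rmd source (echo and eval are the
actual chunk options; the chunk itself is only an illustration):

```{r, echo = FALSE, eval = TRUE}
1 + 1
```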

Figure 7.3: Opening an Existing Script from Files

Figure 7.4: A code chunk in Rmarkdown (before rendering)



7.3 The Computer and You: Giving Instructions


We’ll do the Peanut Butter and Jelly Exercise in class as an introduction to programming
for those who are new.2
Assignment: Take 5 minutes to write down, on a piece of paper, how to make a peanut
butter and jelly sandwich. Be as concise and unambiguous as possible so that a robot (who
doesn’t know what a PBJ is) would understand. You can assume that there will be a loaf of
sliced bread, a jar of jelly, a jar of peanut butter, and a knife.

7.4 Base-R vs. tidyverse


One last thing before we jump into data. Many things in R and other open source packages
have competing standards. A lecture on a technique inevitably biases one standard over
another. Right now among R users in this area, there are two families of functions: base-R
and tidyverse. R instructors thus face a dilemma about which to teach primarily.3
In this prefresher, we try our best to choose the one that is most useful to the modal task
of social science researchers, and we make use of the tidyverse functions in most applications,
but feel free to suggest changes to us or to the booklet.
Although you do not need to choose one over the other, for beginners it can be confusing
which functions are tidyverse functions and which are not. Many of the tidyverse packages
are covered in the 2017 graphic below, and in the cheat-sheets that other programmers have
written: https://www.rstudio.com/resources/cheatsheets/
The following side-by-side comparison shows, for a given task, some tidyverse and
non-tidyverse functions (the latter we refer to loosely as base-R). This list is not
meant to be comprehensive; it is more to give you a quick rule of thumb.

Dataframe subsetting

In order to …                                | in tidyverse:                | in base-R:
Count each category                          | count(df, var)               | table(df$var)
Filter rows by condition                     | filter(df, var == "Female")  | df[df$var == "Female", ] or subset(df, var == "Female")
Extract columns                              | select(df, var1, var2)       | df[, c("var1", "var2")]
Extract a single column as a vector          | pull(df, var)                | df[["var"]] or df[, "var"]
Combine rows                                 | bind_rows()                  | rbind()
Combine columns                              | bind_cols()                  | cbind()
Create a dataframe                           | tibble(x = vec1, y = vec2)   | data.frame(x = vec1, y = vec2)
Turn a dataframe into a tidyverse dataframe  | tbl_df(df)                   | (none)

2 This Exercise is taken from Harvard’s Introductory Undergraduate Class, CS50 (https://www.youtube.com/watch?v=kcbT3hrEi9s), and many other writeups.
3 See for example this community discussion: https://community.rstudio.com/t/base-r-and-the-tidyverse/2965/17


Remember that tidyverse applies to dataframes only, not vectors. For subsetting vectors,
use the base-R functions with the square brackets.
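For instance, a minimal sketch of base-R vector subsetting (the vector is made up for
illustration):

ages <- c(12, 45, 30, 67)
ages[2]           # the second element
ages[ages > 40]   # the elements meeting a condition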

Read data

Some non-tidyverse functions are not quite “base-R” but have a similar relationship to the
tidyverse. For these, we recommend using the tidyverse functions as a general rule due to their
common format, simplicity, and scalability.

In order to …                                 | in tidyverse:                                 | in base-R:
Read an Excel file                            | read_excel()                                  | read.xlsx()
Read a csv                                    | read_csv()                                    | read.csv()
Read a Stata file                             | read_dta()                                    | read.dta()
Substitute strings                            | str_replace()                                 | gsub()
Return matching strings                       | str_subset()                                  | grep(., value = TRUE)
Merge data1 and data2 on variables x1 and x2  | left_join(data1, data2, by = c("x1", "x2"))   | merge(data1, data2, by = c("x1", "x2"), all.x = TRUE)

Visualization

Plotting with ggplot2 (from your tutorials) is also part of the tidyverse family.

In order to …        | in tidyverse:                             | in base-R:
Make a scatter plot  | ggplot(data, aes(x, y)) + geom_point()    | plot(data$x, data$y)
Make a line plot     | ggplot(data, aes(x, y)) + geom_line()     | plot(data$x, data$y, type = "l")
Make a histogram     | ggplot(data, aes(x)) + geom_histogram()   | hist(data$x)
Make a barplot       | See Section 9                             | See Section 9

Figure 7.5: Names of Packages in the tidyverse Family

7.5 A is for Athens


For our first dataset, let’s try reading in a dataset on the Ancient Greek world. Political
Theorists and Political Historians study the domestic systems, international wars, cultures
and writing of this era to understand the first instance of democracy, the rise and overturning
of tyranny, and the legacies of political institutions.
This POLIS dataset was generously provided by Professor Josiah Ober of Stanford University.
It includes information on city states in the Ancient Greek world, parts of it collected
through careful work by historians and archaeologists. It is part of his recent books on
Greece, “The Rise and Fall of Classical Greece” (Ober 2015),4 and on institutions in ancient
Athens, “Democracy and Knowledge: Innovation and Learning in Classical Athens” (Ober 2010).5

7.5.1 Locating the Data

What files do we have in the data/input folder?


## data/input/Nunn_Wantchekon_AER_2011.dta
## data/input/Nunn_Wantchekon_sample.dta
## data/input/acs2015_1percent.csv
## data/input/gapminder_wide.Rds
## data/input/gapminder_wide.tab
## data/input/german_credit.sav
4 Ober, Josiah (2015). The Rise and Fall of Classical Greece. Princeton University Press.
5 Ober, Josiah (2010). Democracy and Knowledge: Innovation and Learning in Classical Athens. Princeton University Press.

## data/input/justices_court-median.csv
## data/input/ober_2018.xlsx
## data/input/sample_mid.csv
## data/input/sample_polity.csv
## data/input/upshot-siena-polls.csv
## data/input/usc2010_001percent.Rds
## data/input/usc2010_001percent.csv
A typical file format is Microsoft Excel. Although this is not usually the best format for R,
because of its highly formatted structure as opposed to plain text (more on this in a later
section), recent packages have made reading it in fairly easy.

7.5.2 Reading in Data

In RStudio, a good way to start is to use the GUI and the Import tool. Once you click a file,
an option to “Import Dataset” comes up. RStudio picks the right function for you, and you
can copy that code, but it’s important to eventually be able to write that code yourself.
For the first time using an outside package, you first need to install it.
install.packages("readxl")

After that, you don’t need to install it again. But you do need to load it each time.
library(readxl)

The package readxl has a website: https://readxl.tidyverse.org/. Other packages are
not as user-friendly, but they have a help page with a table of contents of all their functions.
help(package = readxl)

From the help page, we see that read_excel() is the function that we want to use.
Let’s try it.
library(readxl)
ober <- read_excel("data/input/ober_2018.xlsx")

Review: what does the / mean? Why do we need the data term first? Does the argument
need to be in quotes?

7.5.3 Inspecting

For almost any dataset, you usually want to do a couple of standard checks first to understand
what you loaded.
ober

## # A tibble: 1,035 x 10
## polis_number Name Latitude Longitude Hellenicity Fame Size Colonies
## <dbl> <chr> <dbl> <dbl> <chr> <dbl> <chr> <dbl>
## 1 1 Alal~ 42.1 9.51 most Greek 1.12 100-~ 0

## 2 2 Empo~ 42.1 3.11 most barba~ 2.12 25-1~ 0


## 3 3 Mass~ 43.3 5.38 most Greek 4 25-1~ 2
## 4 4 Rhode 42.3 3.17 most Greek 0.87 <NA> 0
## 5 5 Abak~ 38.1 15.1 most barba~ 1 <NA> 0
## 6 6 Adra~ 37.7 14.8 most Greek 1 <NA> 0
## 7 7 Agyr~ 37.7 14.5 most Greek 1.25 <NA> 0
## 8 8 Aitna 38.2 15.6 most Greek 3.25 200-~ 1
## 9 9 Akra~ 37.3 13.6 most Greek 6.37 500 ~ 0
## 10 10 Akrai 37.1 14.9 most Greek 1.25 <NA> 0
## # ... with 1,025 more rows, and 2 more variables: Regime <chr>,
## # Delian <chr>
dim(ober)

## [1] 1035 10
From your tutorials, you also know how to do graphics! Graphics are useful for grasping
your data, but we will cover them more deeply in Chapter 9.
ggplot(ober, aes(x = Fame)) + geom_histogram()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

(Figure: histogram of Fame, with count on the vertical axis and Fame from 0 to 20 on the horizontal axis.)

What about the distribution of fame by regime?


ggplot(ober, aes(y = Fame, x = Regime, group = Regime)) +
geom_boxplot()

(Figure: boxplots of Fame by Regime, for the categories "evidence of democracy", "no evidence of democracy", and NA.)

What do the 1’s, 2’s, and 3’s stand for?

7.5.4 Finding observations

These tidyverse commands from the dplyr package are newer and not built-in, but they
are among the increasingly popular ways to wrangle data.
• 80 percent of your data wrangling needs might be doable with these basic dplyr
functions: select, mutate, group_by, summarize, and arrange.
• These verbs roughly correspond to the same commands in SQL, another important
language in data science.
• The %>% symbol is a pipe. It takes the thing on the left side and pipes it down to the
function on the right side. We could have done count(cen10, race) as cen10 %>%
count(race). That means take cen10 and pass it on to the function count, which
will count observations by race and return a collapsed dataset with the categories in
its own variable and their respective counts in n.
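As a small sketch with the ober data loaded above (and assuming the tidyverse, which provides
count(), is loaded), these two lines do exactly the same thing:

count(ober, Regime)      # count observations by Regime
ober %>% count(Regime)   # the same, written with the pipe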

7.5.5 Extra: A sneak peek at Ober’s data

Although this is a bit beyond our current stage, it’s hard to resist the temptation to see
what you can do with data like this. For example, you can map it.6
6 In mid-2018, changes in Google’s services made it no longer possible to render maps on the fly. Therefore,
the map is not currently rendered automatically (but can be rendered once the user registers their API key).
Instead, you now need to register with Google; see the changes to the package ggmap.

Using the ggmap package


library(ggmap)

First get a map of the Greek world.


greece <- get_map(location = c(lon = 22.6382849, lat = 39.543287),
zoom = 5,
source = "stamen",
maptype = "toner")
ggmap(greece)

I chose the specifications for arguments zoom and maptype by looking at the webpage and
Googling some examples.
Ober’s data has the latitude and longitude of each polis. Because the map of Greece uses
the same coordinates, we can add the poleis on the same map.
gg_ober <- ggmap(greece) +
geom_point(data = ober,
aes(y = Latitude, x = Longitude),
size = 0.5,
color = "orange")
gg_ober +
scale_x_continuous(limits = c(10, 35)) +
scale_y_continuous(limits = c(32, 44)) +
theme_void()

Exercises

What is the Fame value of Delphoi?


# Enter here

Find the polis with the top 10 Fame values.


# Enter here

Make a scatterplot with the number of colonies on the x-axis and Fame on the y-axis.

Figure 7.6

Figure 7.7

# Enter here

Find the correct function to read the following datasets (available in your rstudio.cloud
session) into your R window.

• data/input/acs2015_1percent.csv: A one percent sample of the American Community Survey
• data/input/gapminder_wide.tab: Country-level wealth and health from Gapminder7
• data/input/gapminder_wide.Rds: An Rds version of the Gapminder data (What is an Rds
file? What’s the difference?)
• data/input/Nunn_Wantchekon_sample.dta: A sample from the Afrobarometer survey
(which we’ll explore tomorrow). .dta is a Stata format.
• data/input/german_credit.sav: A hypothetical dataset on consumer credit. .sav
is an SPSS format.

Our Recommendations: Look at the packages haven and readr


7 Formatted and taken from https://doi.org/10.7910/DVN/GJQNEQ

# Enter here, perhaps making a chunk for each file.

Read Ober’s codebook and find a variable that you think is interesting. Check the distribution
of that variable in your data, get a couple of statistics, and summarize it in English.
# Enter here

This is day 1 and we covered a lot of material. Some of you might have found this completely
new; others not so much. Please click through this survey before you leave so we can adjust
accordingly over the next few days.
https://harvard.az1.qualtrics.com/jfe/form/SV_8As7Y7C83iBiQzH
Chapter 8

Manipulating Vectors and Matrices1

Where are we? Where are we headed?

Up till now, you should have covered:


• R basic programming
• Data Import
• Statistical Summaries
Today we’ll cover:
• Matrices & Dataframes in R
• Manipulating variables
• And other R tips

8.1 Basics - Matrices


Let’s take a look at Matrices in the context of R
cen10 <- read_csv("data/input/usc2010_001percent.csv")
head(cen10)

## # A tibble: 6 x 4
## state sex age race
## <chr> <chr> <dbl> <chr>
## 1 New York Female 8 White
## 2 Ohio Male 24 White
## 3 Nevada Male 37 White
## 4 Michigan Female 12 White
1 Module originally written by Shiro Kuriwaki and Yon Soo Park


## 5 Maryland Female 18 Black/Negro


## 6 New Hampshire Male 50 White
What is the dimension of this dataframe? What does the number of rows represent? What
does the number of columns represent?
dim(cen10)

## [1] 30871 4
nrow(cen10)

## [1] 30871
ncol(cen10)

## [1] 4
What variables does this dataset hold? What kind of information does it have?
colnames(cen10)

## [1] "state" "sex" "age" "race"


We can access column vectors, i.e. vectors that contain the values of a single variable, by using the $ sign.
head(cen10$state)

## [1] "New York" "Ohio" "Nevada" "Michigan"


## [5] "Maryland" "New Hampshire"
head(cen10$race)

## [1] "White" "White" "White" "White" "Black/Negro"


## [6] "White"
We can look at the set of unique values a variable takes by calling the unique function.
unique(cen10$state)

## [1] "New York" "Ohio" "Nevada"


## [4] "Michigan" "Maryland" "New Hampshire"
## [7] "Iowa" "Missouri" "New Jersey"
## [10] "California" "Texas" "Pennsylvania"
## [13] "Washington" "West Virginia" "Idaho"
## [16] "North Carolina" "Massachusetts" "Connecticut"
## [19] "Arkansas" "Indiana" "Wisconsin"
## [22] "Maine" "Tennessee" "Minnesota"
## [25] "Florida" "Oklahoma" "Montana"
## [28] "Georgia" "Arizona" "Colorado"
## [31] "Virginia" "Illinois" "Oregon"
## [34] "Kentucky" "South Carolina" "Kansas"
## [37] "Louisiana" "Alabama" "District of Columbia"
## [40] "Mississippi" "Utah" "Delaware"
## [43] "Nebraska" "Alaska" "New Mexico"
## [46] "South Dakota" "Hawaii" "Vermont"

## [49] "Rhode Island" "Wyoming" "North Dakota"


How many different states are represented (this dataset includes DC as a state)?
length(unique(cen10$state))

## [1] 51
Matrices are rectangular structures of values that must all be of the same type (in the
examples here, numbers rather than characters).
A cross-tab can be considered a matrix:
table(cen10$race, cen10$sex)

##
## Female Male
## American Indian or Alaska Native 142 153
## Black/Negro 2070 1943
## Chinese 192 162
## Japanese 51 26
## Other Asian or Pacific Islander 587 542
## Other race, nec 877 962
## Three or more major races 37 51
## Two major races 443 426
## White 11252 10955
cross_tab <- table(cen10$race, cen10$sex)
dim(cross_tab)

## [1] 9 2
cross_tab[6, 2]

## [1] 962
But a subset of your data – individual values – can be considered a matrix too.
# First 20 rows of the entire data
# Below two lines of code do the same thing
cen10[1:20, ]

## # A tibble: 20 x 4
## state sex age race
## <chr> <chr> <dbl> <chr>
## 1 New York Female 8 White
## 2 Ohio Male 24 White
## 3 Nevada Male 37 White
## 4 Michigan Female 12 White
## 5 Maryland Female 18 Black/Negro
## 6 New Hampshire Male 50 White
## 7 Iowa Female 51 White
## 8 Missouri Female 41 White
## 9 New Jersey Male 62 White

## 10 California Male 25 White


## 11 Texas Female 23 White
## 12 Pennsylvania Female 66 White
## 13 California Female 57 White
## 14 Texas Female 73 Other race, nec
## 15 California Male 43 White
## 16 Washington Male 29 White
## 17 Texas Male 8 White
## 18 Missouri Male 78 White
## 19 West Virginia Male 10 White
## 20 Idaho Female 9 White
cen10 %>% slice(1:20)

## # A tibble: 20 x 4
## state sex age race
## <chr> <chr> <dbl> <chr>
## 1 New York Female 8 White
## 2 Ohio Male 24 White
## 3 Nevada Male 37 White
## 4 Michigan Female 12 White
## 5 Maryland Female 18 Black/Negro
## 6 New Hampshire Male 50 White
## 7 Iowa Female 51 White
## 8 Missouri Female 41 White
## 9 New Jersey Male 62 White
## 10 California Male 25 White
## 11 Texas Female 23 White
## 12 Pennsylvania Female 66 White
## 13 California Female 57 White
## 14 Texas Female 73 Other race, nec
## 15 California Male 43 White
## 16 Washington Male 29 White
## 17 Texas Male 8 White
## 18 Missouri Male 78 White
## 19 West Virginia Male 10 White
## 20 Idaho Female 9 White
# Of the first 20 rows of the entire data, look at values of just race and age
# Below two lines of code do the same thing
cen10[1:20, c("race", "age")]

## # A tibble: 20 x 2
## race age
## <chr> <dbl>
## 1 White 8
## 2 White 24
## 3 White 37
## 4 White 12

## 5 Black/Negro 18
## 6 White 50
## 7 White 51
## 8 White 41
## 9 White 62
## 10 White 25
## 11 White 23
## 12 White 66
## 13 White 57
## 14 Other race, nec 73
## 15 White 43
## 16 White 29
## 17 White 8
## 18 White 78
## 19 White 10
## 20 White 9
cen10 %>% slice(1:20) %>% select(race, age)

## # A tibble: 20 x 2
## race age
## <chr> <dbl>
## 1 White 8
## 2 White 24
## 3 White 37
## 4 White 12
## 5 Black/Negro 18
## 6 White 50
## 7 White 51
## 8 White 41
## 9 White 62
## 10 White 25
## 11 White 23
## 12 White 66
## 13 White 57
## 14 Other race, nec 73
## 15 White 43
## 16 White 29
## 17 White 8
## 18 White 78
## 19 White 10
## 20 White 9
A vector is a special type of matrix with only one column or only one row
# One column
cen10[1:10, c("age")]

## # A tibble: 10 x 1
## age

## <dbl>
## 1 8
## 2 24
## 3 37
## 4 12
## 5 18
## 6 50
## 7 51
## 8 41
## 9 62
## 10 25
cen10 %>% slice(1:10) %>% select(c("age"))

## # A tibble: 10 x 1
## age
## <dbl>
## 1 8
## 2 24
## 3 37
## 4 12
## 5 18
## 6 50
## 7 51
## 8 41
## 9 62
## 10 25
# One row
cen10[2, ]

## # A tibble: 1 x 4
## state sex age race
## <chr> <chr> <dbl> <chr>
## 1 Ohio Male 24 White
cen10 %>% slice(2)

## # A tibble: 1 x 4
## state sex age race
## <chr> <chr> <dbl> <chr>
## 1 Ohio Male 24 White

What if we want a special subset of the data? For example, what if I only want the records of
individuals in California? What if I just want the age and race of individuals in California?
# subset for CA rows
ca_subset <- cen10[cen10$state == "California", ]

ca_subset_tidy <- cen10 %>% filter(state == "California")



all_equal(ca_subset, ca_subset_tidy)

## [1] TRUE
# subset for CA rows and select age and race
ca_subset_age_race <- cen10[cen10$state == "California", c("age", "race")]

ca_subset_age_race_tidy <- cen10 %>% filter(state == "California") %>% select(age, race)

all_equal(ca_subset_age_race, ca_subset_age_race_tidy)

## [1] TRUE
Here are some common operators that can be used to filter, or to use as a condition. Remember, you
can use the unique function to look at the set of all values a variable holds in the dataset.
# all individuals older than 30 and younger than 70
s1 <- cen10[cen10$age > 30 & cen10$age < 70, ]
s2 <- cen10 %>% filter(age > 30 & age < 70)
all_equal(s1, s2)

## [1] TRUE
# all individuals in either New York or California
s3 <- cen10[cen10$state == "New York" | cen10$state == "California", ]
s4 <- cen10 %>% filter(state == "New York" | state == "California")
all_equal(s3, s4)

## [1] TRUE
# all individuals in any of the following states: California, Ohio, Nevada, Michigan
s5 <- cen10[cen10$state %in% c("California", "Ohio", "Nevada", "Michigan"), ]
s6 <- cen10 %>% filter(state %in% c("California", "Ohio", "Nevada", "Michigan"))
all_equal(s5, s6)

## [1] TRUE
# all individuals NOT in any of the following states: California, Ohio, Nevada, Michigan
s7 <- cen10[!(cen10$state %in% c("California", "Ohio", "Nevada", "Michigan")), ]
s8 <- cen10 %>% filter(!state %in% c("California", "Ohio", "Nevada", "Michigan"))
all_equal(s7, s8)

## [1] TRUE

Checkpoint

Get the subset of cen10 for non-white individuals (Hint: look at the set of values for the
race variable by using the unique function)

# Enter here

Get the subset of cen10 for females over the age of 40


# Enter here

Get all the serial numbers for black, male individuals who don’t live in Ohio or Nevada.
# Enter here

8.1.1 data frames

You can think of data frames as matrices-plus, because a column can take on characters
as well as numbers. As we just saw, this is often useful for real data analyses.
cen10

## # A tibble: 30,871 x 4
## state sex age race
## <chr> <chr> <dbl> <chr>
## 1 New York Female 8 White
## 2 Ohio Male 24 White
## 3 Nevada Male 37 White
## 4 Michigan Female 12 White
## 5 Maryland Female 18 Black/Negro
## 6 New Hampshire Male 50 White
## 7 Iowa Female 51 White
## 8 Missouri Female 41 White
## 9 New Jersey Male 62 White
## 10 California Male 25 White
## # ... with 30,861 more rows

Another way to think about a data frame is that it is a type of list. Try the str() code
below and notice how it is organized in slots. Each slot is a vector. They can be vectors of
numbers or characters.
# enter this on your console
str(cen10)

8.2 Motivation
Nunn and Wantchekon (2011) – “The Slave Trade and the Origins of Mistrust in Africa”2
– argues that across African countries, the distrust of co-ethnics fueled by the slave trade
has had long-lasting effects on modern-day trust in these territories. They argued that the
slave trade created distrust in these societies in part because some African groups were
employed by European traders to capture their neighbors and bring them to the slave ships.
Nunn and Wantchekon use a variety of statistical tools to make their case (adding controls,
ordered logit, instrumental variables, falsification tests, causal mechanisms), many of which
will be covered in future courses. In this module we will only touch on their first set of
analyses, which use Ordinary Least Squares (OLS). OLS is likely the most common application of
linear algebra in the social sciences. We will cover some linear algebra, matrix manipulation,
and vector manipulation from this data.

8.3 Read Data

library(haven)
nunn_full <- read_dta("data/input/Nunn_Wantchekon_AER_2011.dta")

Nunn and Wantchekon’s main dataset has more than 20,000 observations. Each observation
is a respondent from the Afrobarometer survey.
head(nunn_full)

## # A tibble: 6 x 59
## respno ethnicity murdock_name isocode region district townvill
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 BEN00~ fon FON BEN atlna~ KPOMASSE TOKPA-D~
## 2 BEN00~ fon FON BEN atlna~ KPOMASSE TOKPA-D~
## 3 BEN00~ fon FON BEN atlna~ OUIDAH 3ARROND
## 4 BEN00~ fon FON BEN atlna~ OUIDAH 3ARROND
## 5 BEN00~ fon FON BEN atlna~ OUIDAH PAHOU
## 6 BEN00~ fon FON BEN atlna~ OUIDAH PAHOU
## # ... with 52 more variables: location_id <dbl>, trust_relatives <dbl>,
## # trust_neighbors <dbl>, intra_group_trust <dbl>,
## # inter_group_trust <dbl>, trust_local_council <dbl>,
## # ln_export_area <dbl>, export_area <dbl>, export_pop <dbl>,
## # ln_export_pop <dbl>, age <dbl>, age2 <dbl>, male <dbl>,
## # urban_dum <dbl>, occupation <dbl>, religion <dbl>,
## # living_conditions <dbl>, education <dbl>, near_dist <dbl>,
## # distsea <dbl>, loc_murdock_name <chr>, loc_ln_export_area <dbl>,
## # local_council_performance <dbl>, council_listen <dbl>,
## # corrupt_local_council <dbl>, school_present <dbl>,
## # electricity_present <dbl>, piped_water_present <dbl>,
2 Nunn,Nathan, and Leonard Wantchekon. 2011. “The Slave Trade and the Origins of Mistrust in Africa.”
American Economic Review 101(7): 3221–52.

## # sewage_present <dbl>, health_clinic_present <dbl>,


## # district_ethnic_frac <dbl>, frac_ethnicity_in_district <dbl>,
## # townvill_nonethnic_mean_exports <dbl>,
## # district_nonethnic_mean_exports <dbl>,
## # region_nonethnic_mean_exports <dbl>,
## # country_nonethnic_mean_exports <dbl>, murdock_centr_dist_coast <dbl>,
## # centroid_lat <dbl>, centroid_long <dbl>, explorer_contact <dbl>,
## # railway_contact <dbl>, dist_Saharan_node <dbl>,
## # dist_Saharan_line <dbl>, malaria_ecology <dbl>, v30 <dbl+lbl>,
## # v33 <dbl+lbl>, fishing <dbl>, exports <dbl>, ln_exports <dbl>,
## # total_missions_area <dbl>, ln_init_pop_density <dbl>,
## # cities_1400_dum <dbl>
colnames(nunn_full)

## [1] "respno" "ethnicity"


## [3] "murdock_name" "isocode"
## [5] "region" "district"
## [7] "townvill" "location_id"
## [9] "trust_relatives" "trust_neighbors"
## [11] "intra_group_trust" "inter_group_trust"
## [13] "trust_local_council" "ln_export_area"
## [15] "export_area" "export_pop"
## [17] "ln_export_pop" "age"
## [19] "age2" "male"
## [21] "urban_dum" "occupation"
## [23] "religion" "living_conditions"
## [25] "education" "near_dist"
## [27] "distsea" "loc_murdock_name"
## [29] "loc_ln_export_area" "local_council_performance"
## [31] "council_listen" "corrupt_local_council"
## [33] "school_present" "electricity_present"
## [35] "piped_water_present" "sewage_present"
## [37] "health_clinic_present" "district_ethnic_frac"
## [39] "frac_ethnicity_in_district" "townvill_nonethnic_mean_exports"
## [41] "district_nonethnic_mean_exports" "region_nonethnic_mean_exports"
## [43] "country_nonethnic_mean_exports" "murdock_centr_dist_coast"
## [45] "centroid_lat" "centroid_long"
## [47] "explorer_contact" "railway_contact"
## [49] "dist_Saharan_node" "dist_Saharan_line"
## [51] "malaria_ecology" "v30"
## [53] "v33" "fishing"
## [55] "exports" "ln_exports"
## [57] "total_missions_area" "ln_init_pop_density"
## [59] "cities_1400_dum"

First, let’s consider a small subset of this dataset.



nunn <- read_dta("data/input/Nunn_Wantchekon_sample.dta")


nunn

## # A tibble: 10 x 5
## trust_neighbors exports ln_exports export_area ln_export_area
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 3 0.388 0.328 0.00407 0.00406
## 2 3 0.631 0.489 0.0971 0.0926
## 3 3 0.994 0.690 0.0125 0.0124
## 4 0 183. 5.21 1.82 1.04
## 5 3 0 0 0 0
## 6 2 0 0 0 0
## 7 2 666. 6.50 14.0 2.71
## 8 0 0.348 0.298 0.00608 0.00606
## 9 3 0.435 0.361 0.0383 0.0376
## 10 3 0 0 0 0

8.4 data.frame vs. matrices


This is a data.frame object.
class(nunn)

## [1] "tbl_df" "tbl" "data.frame"


But it can also be considered a matrix in the linear algebra sense. What are the dimensions
of this matrix?
nrow(nunn)

## [1] 10
data.frames and matrices have much overlap in R, but to explicitly treat an object as a
matrix, you’d need to coerce its class. Let’s call this matrix X.
X <- as.matrix(nunn)

What is the difference between a data.frame and a matrix? A data.frame can have
columns that are of different types, whereas — in a matrix — all columns must be of the
same type (usually either “numeric” or “character”).
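A small sketch of that difference, using a made-up dataframe: coercing mixed column types
to a matrix forces everything into a single type.

df <- data.frame(id = 1:2, name = c("a", "b"))
as.matrix(df)   # both columns are now character strings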

8.5 Speed considerations

Nrow <- 100


Ncol <- 5
Xmat <- matrix(rnorm(Nrow * Ncol), nrow = Nrow, ncol = Ncol)
Xdf <- as.data.frame(Xmat)

system.time(replicate(50000, colMeans(Xmat)))

## user system elapsed


## 0.302 0.024 0.326
system.time(replicate(50000, colMeans(Xdf)))

## user system elapsed


## 3.358 0.016 3.374

8.6 Handling matrices in R


You can easily transpose a matrix
X

## trust_neighbors exports ln_exports export_area ln_export_area


## [1,] 3 0.3883497 0.3281158 0.004067405 0.004059155
## [2,] 3 0.6311236 0.4892691 0.097059444 0.092633367
## [3,] 3 0.9941893 0.6902376 0.012524694 0.012446908
## [4,] 0 182.5891266 5.2127004 1.824284434 1.038255095
## [5,] 3 0.0000000 0.0000000 0.000000000 0.000000000
## [6,] 2 0.0000000 0.0000000 0.000000000 0.000000000
## [7,] 2 665.9652100 6.5027380 13.975566864 2.706419945
## [8,] 0 0.3476418 0.2983562 0.006082553 0.006064130
## [9,] 3 0.4349871 0.3611559 0.038332380 0.037615947
## [10,] 3 0.0000000 0.0000000 0.000000000 0.000000000
t(X)

## [,1] [,2] [,3] [,4] [,5] [,6]


## trust_neighbors 3.000000000 3.00000000 3.00000000 0.000000 3 2
## exports 0.388349682 0.63112360 0.99418926 182.589127 0 0
## ln_exports 0.328115761 0.48926911 0.69023758 5.212700 0 0
## export_area 0.004067405 0.09705944 0.01252469 1.824284 0 0
## ln_export_area 0.004059155 0.09263337 0.01244691 1.038255 0 0
## [,7] [,8] [,9] [,10]
## trust_neighbors 2.000000 0.000000000 3.00000000 3
## exports 665.965210 0.347641766 0.43498713 0
## ln_exports 6.502738 0.298356235 0.36115587 0
## export_area 13.975567 0.006082553 0.03833238 0
## ln_export_area 2.706420 0.006064130 0.03761595 0
What are the values of all rows in the first column?
X[, 1]

## [1] 3 3 3 0 3 2 2 0 3 3
What are all the values of “exports”? (i.e. return the whole “exports” column)

X[, "exports"]

## [1] 0.3883497 0.6311236 0.9941893 182.5891266 0.0000000


## [6] 0.0000000 665.9652100 0.3476418 0.4349871 0.0000000
What is the first observation (i.e. first row)?
X[1, ]

## trust_neighbors exports ln_exports export_area


## 3.000000000 0.388349682 0.328115761 0.004067405
## ln_export_area
## 0.004059155
What is the value of the first variable of the first observation?
X[1, 1]

## trust_neighbors
## 3
Pause and consider the following problem on your own. What is the following code doing?
X[X[, "trust_neighbors"] == 0, "export_area"]

## [1] 1.824284434 0.006082553


Why does it give the same output as the following?
X[which(X[, "trust_neighbors"] == 0), "export_area"]

## [1] 1.824284434 0.006082553


Some more manipulation
X + X

## trust_neighbors exports ln_exports export_area ln_export_area


## [1,] 6 0.7766994 0.6562315 0.008134809 0.00811831
## [2,] 6 1.2622472 0.9785382 0.194118887 0.18526673
## [3,] 6 1.9883785 1.3804752 0.025049388 0.02489382
## [4,] 0 365.1782532 10.4254007 3.648568869 2.07651019
## [5,] 6 0.0000000 0.0000000 0.000000000 0.00000000
## [6,] 4 0.0000000 0.0000000 0.000000000 0.00000000
## [7,] 4 1331.9304199 13.0054760 27.951133728 5.41283989
## [8,] 0 0.6952835 0.5967125 0.012165107 0.01212826
## [9,] 6 0.8699743 0.7223117 0.076664761 0.07523189
## [10,] 6 0.0000000 0.0000000 0.000000000 0.00000000
X - X

## trust_neighbors exports ln_exports export_area ln_export_area


## [1,] 0 0 0 0 0
## [2,] 0 0 0 0 0
## [3,] 0 0 0 0 0

## [4,] 0 0 0 0 0
## [5,] 0 0 0 0 0
## [6,] 0 0 0 0 0
## [7,] 0 0 0 0 0
## [8,] 0 0 0 0 0
## [9,] 0 0 0 0 0
## [10,] 0 0 0 0 0
t(X) %*% X

## trust_neighbors exports ln_exports export_area


## trust_neighbors 62.000000 1339.276 18.61181 28.40709
## exports 1339.276369 476850.298 5283.76294 9640.42990
## ln_exports 18.611811 5283.763 70.50077 100.46202
## export_area 28.407085 9640.430 100.46202 198.65558
## ln_export_area 5.853106 1992.047 23.08189 39.72847
## ln_export_area
## trust_neighbors 5.853106
## exports 1992.046502
## ln_exports 23.081893
## export_area 39.728468
## ln_export_area 8.412887
cbind(X, 1:10)

## trust_neighbors exports ln_exports export_area ln_export_area


## [1,] 3 0.3883497 0.3281158 0.004067405 0.004059155
## [2,] 3 0.6311236 0.4892691 0.097059444 0.092633367
## [3,] 3 0.9941893 0.6902376 0.012524694 0.012446908
## [4,] 0 182.5891266 5.2127004 1.824284434 1.038255095
## [5,] 3 0.0000000 0.0000000 0.000000000 0.000000000
## [6,] 2 0.0000000 0.0000000 0.000000000 0.000000000
## [7,] 2 665.9652100 6.5027380 13.975566864 2.706419945
## [8,] 0 0.3476418 0.2983562 0.006082553 0.006064130
## [9,] 3 0.4349871 0.3611559 0.038332380 0.037615947
## [10,] 3 0.0000000 0.0000000 0.000000000 0.000000000
##
## [1,] 1
## [2,] 2
## [3,] 3
## [4,] 4
## [5,] 5
## [6,] 6
## [7,] 7
## [8,] 8
## [9,] 9
## [10,] 10

cbind(X, 1)

## trust_neighbors exports ln_exports export_area ln_export_area


## [1,] 3 0.3883497 0.3281158 0.004067405 0.004059155 1
## [2,] 3 0.6311236 0.4892691 0.097059444 0.092633367 1
## [3,] 3 0.9941893 0.6902376 0.012524694 0.012446908 1
## [4,] 0 182.5891266 5.2127004 1.824284434 1.038255095 1
## [5,] 3 0.0000000 0.0000000 0.000000000 0.000000000 1
## [6,] 2 0.0000000 0.0000000 0.000000000 0.000000000 1
## [7,] 2 665.9652100 6.5027380 13.975566864 2.706419945 1
## [8,] 0 0.3476418 0.2983562 0.006082553 0.006064130 1
## [9,] 3 0.4349871 0.3611559 0.038332380 0.037615947 1
## [10,] 3 0.0000000 0.0000000 0.000000000 0.000000000 1
colnames(X)

## [1] "trust_neighbors" "exports" "ln_exports" "export_area"


## [5] "ln_export_area"

8.7 Variable Transformations

exports is the total number of slaves that were taken from the individual’s ethnic group
during Africa’s four slave trades between 1400 and 1900.
What is ln_exports? The article describes this as the natural log of one plus the exports.
This is a transformation of one column by a particular function
log(1 + X[, "exports"])

## [1] 0.3281158 0.4892691 0.6902376 5.2127003 0.0000000 0.0000000 6.5027379


## [8] 0.2983562 0.3611559 0.0000000
Question for you: why add the 1?
Verify that this is the same as X[, "ln_exports"]
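One way to check this (a sketch; the difference should be essentially zero, up to rounding):

max(abs(log(1 + X[, "exports"]) - X[, "ln_exports"]))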

8.8 Linear Combinations

In Table 1 we see “OLS Estimates”. These are estimates of OLS coefficients and standard
errors. You do not need to know what these are for now, but it doesn’t hurt to get used
to seeing them.
A very crude way to describe regression is through linear combinations. The simplest linear
combination is a one-to-one transformation.
Take the first number in Table 1, which is -0.00068. Now, multiply this by exports
-0.00068 * X[, "exports"]

Figure 8.1

## [1] -0.0002640778 -0.0004291640 -0.0006760487 -0.1241606061 0.0000000000


## [6] 0.0000000000 -0.4528563428 -0.0002363964 -0.0002957912 0.0000000000

Now, just one more step. Make a new matrix with just exports and the value 1
X2 <- cbind(1, X[, "exports"])

Name this new column “intercept”:


colnames(X2)

## NULL
colnames(X2) <- c("intercept", "exports")

What are the dimensions of the matrix X2?


dim(X2)

## [1] 10 2

Now consider a new matrix, called B.


B <- matrix(c(1.62, -0.00068))

What are the dimensions of B?


dim(B)

## [1] 2 1

What is the product of X2 and B? From the dimensions, can you tell if it will be conformable?
X2 %*% B

## [,1]
## [1,] 1.619736
## [2,] 1.619571
## [3,] 1.619324
## [4,] 1.495839
## [5,] 1.620000
## [6,] 1.620000
## [7,] 1.167144
## [8,] 1.619764
## [9,] 1.619704
## [10,] 1.620000

What is this multiplication doing in terms of equations?
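As a sketch of what is happening: for each observation i, the product X2 %*% B computes

$$1.62 \times 1 + (-0.00068) \times \text{exports}_i,$$

that is, the intercept plus the exports coefficient times that observation's value of exports.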



Exercises

Let
$$A = \begin{bmatrix} 0.6 & 0.2 \\ 0.4 & 0.8 \end{bmatrix}$$

Use R to write code that will create the matrix A, and then consecutively multiply A by
itself 4 times. What is the value of $A^4$?
## Enter yourself

Note that R notation of matrices is different from the math notation. Simply trying X^n
where X is a matrix will only take the power of each element to n. Instead, this problem
asks you to perform matrix multiplication.
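A quick illustration of the difference, using a small matrix unrelated to the exercise:

M <- matrix(1:4, nrow = 2)
M^2       # element-wise: every entry squared
M %*% M   # matrix multiplication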

Let’s apply what we learned about subsetting or filtering/selecting. Use the nunn_full
dataset you have already loaded
a) First, show all observations (rows) that have a "male" variable higher than 0.5
## Enter yourself

b) Next, create a matrix / dataframe with only two columns: "trust_neighbors" and
"age"
## Enter yourself

c) Lastly, show all values of "trust_neighbors" and "age" for observations (rows) that
have the “male” variable value that is higher than 0.5
## Enter yourself

Find a way to generate a vector of “column averages” of the matrix X from the Nunn and
Wantchekon data in one line of code. Each entry in the vector should contain the sample
average of the values in the column. So a 100 by 4 matrix should generate a length-4 vector.

Similarly, generate a vector of “column medians”.



Consider the regression that was run to generate Table 1:


form <- "trust_neighbors ~ exports + age + age2 + male + urban_dum + factor(education) + factor(
lm_1_1 <- lm(as.formula(form), nunn_full)

# The below coef function returns a vector of OLS coefficients


coef(lm_1_1)

## (Intercept) exports
## 1.619913e+00 -6.791360e-04
## age age2
## 8.395936e-03 -5.473436e-05
## male urban_dum
## 4.550246e-02 -1.404551e-01
## factor(education)1 factor(education)2
## 1.709816e-02 -5.224591e-02
## factor(education)3 factor(education)4
## -1.373770e-01 -1.889619e-01
## factor(education)5 factor(education)6
## -1.893494e-01 -2.400767e-01
## factor(education)7 factor(education)8
## -2.850748e-01 -1.232085e-01
## factor(education)9 factor(occupation)1
## -2.406437e-01 6.185655e-02
## factor(occupation)2 factor(occupation)3
## 7.392168e-02 3.356158e-02
## factor(occupation)4 factor(occupation)5
## 7.942048e-03 6.661126e-02
## factor(occupation)6 factor(occupation)7
## -7.563297e-02 1.699699e-02
## factor(occupation)8 factor(occupation)9
## -9.428177e-02 -9.981440e-02
## factor(occupation)10 factor(occupation)11
## -3.307068e-02 -2.300045e-02
## factor(occupation)12 factor(occupation)13
## -1.564540e-01 -1.441370e-02
## factor(occupation)14 factor(occupation)15
## -5.566414e-02 -2.343762e-01
## factor(occupation)16 factor(occupation)18
## -1.306947e-02 -1.729589e-01
## factor(occupation)19 factor(occupation)20
## -1.770261e-01 -2.457800e-02
## factor(occupation)21 factor(occupation)22
## -4.936813e-02 -1.068511e-01
## factor(occupation)23 factor(occupation)24
## -9.712205e-02 1.292371e-02
## factor(occupation)25 factor(occupation)995

## 2.623186e-02 -1.195063e-03
## factor(religion)2 factor(religion)3
## 5.395953e-02 7.887878e-02
## factor(religion)4 factor(religion)5
## 4.749150e-02 4.318455e-02
## factor(religion)6 factor(religion)7
## -1.787694e-02 -3.616542e-02
## factor(religion)10 factor(religion)11
## 6.015041e-02 2.237845e-01
## factor(religion)12 factor(religion)13
## 2.627086e-01 -6.812813e-02
## factor(religion)14 factor(religion)15
## 4.673681e-02 3.844555e-01
## factor(religion)360 factor(religion)361
## 3.656843e-01 3.416413e-01
## factor(religion)362 factor(religion)363
## 8.230393e-01 3.856565e-01
## factor(religion)995 factor(living_conditions)2
## 4.161301e-02 4.395862e-02
## factor(living_conditions)3 factor(living_conditions)4
## 8.627372e-02 1.197428e-01
## factor(living_conditions)5 district_ethnic_frac
## 1.203606e-01 -1.553648e-02
## frac_ethnicity_in_district isocodeBWA
## 1.011222e-01 -4.258953e-01
## isocodeGHA isocodeKEN
## 1.135307e-02 -1.819556e-01
## isocodeLSO isocodeMDG
## -5.511200e-01 -3.315727e-01
## isocodeMLI isocodeMOZ
## 7.528101e-02 8.223730e-02
## isocodeMWI isocodeNAM
## 3.062497e-01 -1.397541e-01
## isocodeNGA isocodeSEN
## -2.381525e-01 3.867371e-01
## isocodeTZA isocodeUGA
## 2.079366e-01 -6.443732e-02
## isocodeZAF isocodeZMB
## -2.179153e-01 -2.172868e-01

First, get a small subset of the nunn_full dataset. This time, sample 20 rows and select
the variables exports, age, age2, male, and urban_dum. To this small subset, add
(bind_cols() in tidyverse or cbind() in base R) a column of 1’s; this represents the
intercept. If you need some guidance, look at how we sampled 10 rows and selected a different
set of variables above in the lecture portion.
# Enter here

Next let’s try calculating predicted values of levels of trust in neighbors by multiplying the
coefficients for the intercept, exports, age, age2, male, and urban_dum by the actual observed
values for those variables in the small subset you’ve just created.
# Hint: You can get just selected elements from the vector returned by coef(lm_1_1)

# For example, the below code gives you the first 3 elements of the original vector
coef(lm_1_1)[1:3]

## (Intercept) exports age


## 1.619913146 -0.000679136 0.008395936
# Also, the below code gives you the coefficient elements for intercept and male
coef(lm_1_1)[c("(Intercept)", "male")]

## (Intercept) male
## 1.61991315 0.04550246
Chapter 9

Visualization1

Where are we? Where are we headed?

Up till now, you should have covered:


• The R Visualization and Programming primers at https://rstudio.cloud/primers/
• Reading and handling data
• Matrices and Vectors
• What does : mean in R? What about ==, ?, !=, &, |, %in%?
• What does %>% do?
Today we’ll cover:
• Visualization
• A bit of data wrangling

Check your understanding

• How do you make a barplot, in base-R and in ggplot?


• How do you add layers to a ggplot?
• How do you change the axes of a ggplot?
• How do you make a histogram?
• How do you make a graph that looks like this?
Other review

9.1 Motivation: The Law of the Census


In this module, let’s visualize some cross-sectional stats with an actual Census. Then, we’ll
do an example on time trends with Supreme Court ideal points.
1 Module originally written by Shiro Kuriwaki


Figure 9.1: By Randy Schutt - Own work, CC BY-SA 3.0, Wikimedia.

Why care about the Census? The Census is one of the fundamental acts of a government.
See the law review article by Persily (2011), “The Law of the Census.”2 The Census is
the government’s primary tool for apportionment (allocating seats to districts), appropriations
(allocating federal funding), and tracking demographic change. See, for example, Hochschild
and Powell (2008)3 on how the Census categorized race during 1850–1930. Notice
also that both of these pieces are not inherently “quantitative” — the Persily article is in a
law review and the Hochschild and Powell article is on American political development
— but data analysis is certainly relevant to both.

Time series data is a common form of data in social science, and there is growing
methodological work on making causal inferences with time series.4 We will use the
ideological estimates of the Supreme Court, which has been in the news with Brett Kavanaugh’s
nomination.

9.2 Read data

First, the census. Read in a subset of the 2010 Census that we looked at earlier. This time,
it is in Rds form.
cen10 <- readRDS("data/input/usc2010_001percent.Rds")

The data comes from IPUMS,5 a great source to extract and analyze Census and Census-
conducted survey (ACS, CPS) data.

2 Persily, Nathaniel. 2011. “The Law of the Census: How to Count, What to Count, Whom to Count, and Where to Count Them.” Cardozo Law Review 32(3): 755–91.
3 Hochschild, Jennifer L., and Brenna Marea Powell. 2008. “Racial Reorganization and the United States Census 1850–1930: Mulattoes, Half-Breeds, Mixed Parentage, Hindoos, and the Mexican Race.” Studies in American Political Development 22(1): 59–96.
4 Blackwell, Matthew, and Adam Glynn. 2018. “How to Make Causal Inferences with Time-Series Cross-Sectional Data under Selection on Observables.” American Political Science Review.
5 Ruggles, Steven, Katie Genadek, Ronald Goeken, Josiah Grover, and Matthew Sobek. 2015. Integrated Public Use Microdata Series: Version 6.0 dataset.



9.3 Counting

How many people are in your sample?


nrow(cen10)

## [1] 30871
This and all subsequent tasks involve manipulating and summarizing data, sometimes called
“wrangling”. As per last time, there are both “base-R” and “tidyverse” approaches.
Yesterday we saw several functions from the tidyverse:
• select selects columns
• filter selects rows based on a logical (boolean) statement
• slice selects rows based on the row number
• arrange reorders the rows (in ascending order by default).
In this visualization section, we’ll make use of the pair of functions group_by() and
summarize().

9.4 Tabulating

Summarizing data is the key part of communication; good data viz gets the point across.6
Summaries of data come in two forms: tables and figures.
Here are two ways to count by group, or to tabulate.
In base-R, use the table function, which reports how many rows exist for each unique value
of the vector (remember unique from yesterday?).
table(cen10$race)

##
## American Indian or Alaska Native Black/Negro
## 295 4013
## Chinese Japanese
## 354 77
## Other Asian or Pacific Islander Other race, nec
## 1129 1839
## Three or more major races Two major races
## 88 869
## White
## 22207
With tidyverse, a quick convenience function is count, with the variable to count on included.
count(cen10, race)
6 Kastellec, Jonathan P., and Eduardo L. Leoni. 2007. “Using Graphs Instead of Tables in Political Science.” Perspectives on Politics 5(4): 755–71.

## # A tibble: 9 x 2
## race n
## <chr> <int>
## 1 American Indian or Alaska Native 295
## 2 Black/Negro 4013
## 3 Chinese 354
## 4 Japanese 77
## 5 Other Asian or Pacific Islander 1129
## 6 Other race, nec 1839
## 7 Three or more major races 88
## 8 Two major races 869
## 9 White 22207

We can check out the arguments of count and see that there is a sort option. What does
this do?
count(cen10, race, sort = TRUE)

## # A tibble: 9 x 2
## race n
## <chr> <int>
## 1 White 22207
## 2 Black/Negro 4013
## 3 Other race, nec 1839
## 4 Other Asian or Pacific Islander 1129
## 5 Two major races 869
## 6 Chinese 354
## 7 American Indian or Alaska Native 295
## 8 Three or more major races 88
## 9 Japanese 77

count is a kind of shorthand for group_by() and summarize. This code would have done
the same.
cen10 %>%
group_by(race) %>%
summarize(n = n())

## # A tibble: 9 x 2
## race n
## <chr> <int>
## 1 American Indian or Alaska Native 295
## 2 Black/Negro 4013
## 3 Chinese 354
## 4 Japanese 77
## 5 Other Asian or Pacific Islander 1129
## 6 Other race, nec 1839
## 7 Three or more major races 88
## 8 Two major races 869
## 9 White 22207

Here n() is a function that counts rows. If you are new to the tidyverse, what would you
think each line of that code did? Read the function help page and verify whether your
intuition was correct.

9.5 base R graphics and ggplot

Two prevalent ways of making graphics are referred to as “base-R” and “ggplot”.

9.5.1 base R

“Base-R” graphics are graphics that are made with R’s default graphics commands. First,
let’s tabulate the data, then put the result directly inside the barplot() function.
barplot(table(cen10$race))
(Figure: base-R barplot of counts by race category.)

9.5.2 ggplot

A popular alternative is ggplot graphics, which you were introduced to in the tutorial. gg
stands for the grammar of graphics by Hadley Wickham, and it provides a new semantics for
describing graphics in R. Again, first let’s set up the data.

Although the tutorial covered making scatter plots as the first cut, data often requires
summaries before it is made into graphs.

For this example, let’s group and count first like we just did. But assign it to a new object.
grp_race <- count(cen10, race)

We will now plot this grouped set of numbers. Recall that the ggplot() function takes two
main arguments, data and aes.

1. First enter a single dataframe from which you will draw a plot.
2. Then enter the aes, or aesthetics. This defines which variable in the data the plotting
functions should take for pre-set dimensions in graphics. The dimensions x and y are
the most important. We will assign race and count to them, respectively.
3. After you close ggplot(), add layers with the plus sign. A geom is a layer of graphical
representation, for example geom_histogram renders a histogram, geom_point renders
a scatter plot. For a barplot, we can use geom_col()

What is the right geometry layer to make a barplot? Turns out:


ggplot(data = grp_race, aes(x = race, y = n)) + geom_col()

(Figure: ggplot barplot of counts (n) by race category; the category labels crowd the x-axis.)

9.6 Improving your graphics


Adjusting your graphics to make the point clear is an important skill. Here is a base-R
example of showing the same numbers but with a different design, in a way that aims to
maximize the “data-to-ink ratio”.
par(oma = c(1, 11, 1, 1))
barplot(sort(table(cen10$race)), # sort numbers
horiz = TRUE, # flip
border = NA, # border is extraneous
xlab = "Number in Race Category",
bty = "n", # no box
las = 1) # alignment of axis labels is horizontal

(Figure: horizontal base-R barplot of counts by race category, sorted from largest to smallest; x-axis labeled “Number in Race Category”.)

Notice that we applied the sort() function to order the bars in terms of their counts. The
default ordering of a categorical variable / factor is alphabetical. Alphabetical ordering is
uninformative and almost never the way you should order variables.
In ggplot you might do this by:
library(forcats)

grp_race_ordered <- arrange(grp_race, n) %>%


mutate(race = as_factor(race))

ggplot(data = grp_race_ordered, aes(x = race, y = n)) +



geom_col() +
coord_flip() +
labs(y = "Number in Race Category",
x = "",
caption = "Source: 2010 U.S. Census sample")

(Figure: the same horizontal barplot made with ggplot, ordered by count; x-axis labeled “Number in Race Category”. Source: 2010 U.S. Census sample.)

The data-to-ink ratio was popularized by Ed Tufte (originally a political economy scholar who
has recently become well known for his data visualization work). See Tufte (2001), The
Visual Display of Quantitative Information, and his website https://www.edwardtufte.com/tufte/.
For an R- and ggplot-focused treatment using social science examples, check
out Healy (2018), Data Visualization: A Practical Introduction, with a draft at
https://socviz.co/.7 There are a growing number of excellent books on data visualization.

9.7 Cross-tabs

Visualizations and Tables each have their strengths. A rule of thumb is that more than a
dozen numbers on a table is too much to digest, but less than a dozen is too few for a figure
to be worth it. Let’s look at a table first.
A cross-tab is counting with two types of variables, and is a simple and powerful tool to
show the relationship between multiple variables.
7 Healy, Kieran. forthcoming. Data Visualization: A Practical Introduction. Princeton University Press

xtab_race_state <- table(cen10$state, cen10$race)


xtab_race_state

##
## American Indian or Alaska Native Black/Negro
## Alabama 2 128
## Alaska 11 6
## Arizona 28 23
## Arkansas 1 45
## California 42 253
## Colorado 7 26
## Connecticut 1 39
## Delaware 3 28
## District of Columbia 0 35
## Florida 9 304
## Georgia 2 304
## Hawaii 0 0
## Idaho 2 0
## Illinois 5 194
## Indiana 2 66
## Iowa 0 9
## Kansas 2 24
## Kentucky 2 35
## Louisiana 3 161
## Maine 0 4
## Maryland 2 177
## Massachusetts 5 38
## Michigan 5 147
## Minnesota 6 25
## Mississippi 1 116
## Missouri 4 74
## Montana 8 0
## Nebraska 2 11
## Nevada 6 15
## New Hampshire 1 1
## New Jersey 0 130
## New Mexico 21 3
## New York 13 305
## North Carolina 12 220
## North Dakota 4 1
## Ohio 1 122
## Oklahoma 21 20
## Oregon 5 5
## Pennsylvania 2 156
## Rhode Island 2 3
## South Carolina 2 120
## South Dakota 7 1
## Tennessee 0 97

## Texas 14 316
## Utah 8 0
## Vermont 0 2
## Virginia 0 171
## Washington 14 20
## West Virginia 0 5
## Wisconsin 6 27
## Wyoming 1 1
##
## Chinese Japanese Other Asian or Pacific Islander
## Alabama 1 0 3
## Alaska 0 0 5
## Arizona 1 0 12
## Arkansas 0 0 1
## California 141 27 359
## Colorado 3 0 10
## Connecticut 7 0 16
## Delaware 1 0 4
## District of Columbia 0 0 1
## Florida 4 1 24
## Georgia 5 0 35
## Hawaii 2 16 35
## Idaho 0 1 0
## Illinois 6 3 53
## Indiana 3 0 8
## Iowa 1 0 4
## Kansas 2 0 8
## Kentucky 2 0 4
## Louisiana 1 0 5
## Maine 1 0 1
## Maryland 4 1 12
## Massachusetts 15 2 28
## Michigan 8 1 23
## Minnesota 5 1 28
## Mississippi 0 0 3
## Missouri 2 0 9
## Montana 0 0 0
## Nebraska 0 0 5
## Nevada 6 2 15
## New Hampshire 1 1 3
## New Jersey 19 2 65
## New Mexico 1 1 1
## New York 55 3 68
## North Carolina 4 1 12
## North Dakota 0 0 0
## Ohio 5 2 17
## Oklahoma 0 0 5
## Oregon 4 0 11

## Pennsylvania 10 1 28
## Rhode Island 0 0 4
## South Carolina 1 0 4
## South Dakota 0 1 1
## Tennessee 0 0 13
## Texas 15 2 92
## Utah 1 1 6
## Vermont 0 0 0
## Virginia 8 2 29
## Washington 9 4 46
## West Virginia 0 0 0
## Wisconsin 0 1 11
## Wyoming 0 0 2
##
## Other race, nec Three or more major races
## Alabama 8 1
## Alaska 2 0
## Arizona 74 2
## Arkansas 11 1
## California 585 14
## Colorado 28 1
## Connecticut 20 1
## Delaware 5 1
## District of Columbia 1 0
## Florida 72 2
## Georgia 35 1
## Hawaii 0 14
## Idaho 8 1
## Illinois 75 2
## Indiana 20 1
## Iowa 10 0
## Kansas 6 0
## Kentucky 5 1
## Louisiana 7 0
## Maine 0 0
## Maryland 28 1
## Massachusetts 26 0
## Michigan 8 2
## Minnesota 13 1
## Mississippi 2 2
## Missouri 6 2
## Montana 1 0
## Nebraska 6 0
## Nevada 41 1
## New Hampshire 1 0
## New Jersey 69 3
## New Mexico 23 1
## New York 154 8

## North Carolina 40 2
## North Dakota 0 0
## Ohio 7 3
## Oklahoma 15 3
## Oregon 21 4
## Pennsylvania 30 1
## Rhode Island 6 1
## South Carolina 6 1
## South Dakota 2 0
## Tennessee 13 0
## Texas 253 2
## Utah 14 0
## Vermont 1 0
## Virginia 29 4
## Washington 37 2
## West Virginia 0 0
## Wisconsin 13 1
## Wyoming 2 0
##
## Two major races White
## Alabama 8 344
## Alaska 15 37
## Arizona 11 485
## Arkansas 2 247
## California 174 2168
## Colorado 22 401
## Connecticut 7 284
## Delaware 1 66
## District of Columbia 2 21
## Florida 42 1435
## Georgia 21 587
## Hawaii 27 39
## Idaho 6 129
## Illinois 35 856
## Indiana 6 514
## Iowa 8 287
## Kansas 8 237
## Kentucky 9 357
## Louisiana 6 273
## Maine 1 117
## Maryland 13 302
## Massachusetts 18 515
## Michigan 23 792
## Minnesota 10 483
## Mississippi 1 167
## Missouri 14 516
## Montana 0 88
## Nebraska 5 155

## Nevada 16 171
## New Hampshire 1 129
## New Jersey 25 589
## New Mexico 6 146
## New York 51 1220
## North Carolina 20 648
## North Dakota 1 46
## Ohio 20 931
## Oklahoma 24 266
## Oregon 9 279
## Pennsylvania 27 1045
## Rhode Island 4 74
## South Carolina 6 325
## South Dakota 2 72
## Tennessee 9 474
## Texas 71 1792
## Utah 8 255
## Vermont 4 59
## Virginia 24 548
## Washington 33 524
## West Virginia 3 168
## Wisconsin 8 497
## Wyoming 2 47
Another function to make a cross-tab is the xtabs command, which uses formula notation.
xtabs(~ state + race, cen10)

What if we care about proportions within states, rather than counts? Say we’d like to
compare the racial composition of a small state (like Delaware) and a large state (like
California). In fact, most tasks of inference are about the unobserved population, not the
observed data — and proportions are estimates of a quantity in the population.
One way to transform a table of counts to a table of proportions is the function prop.table.
Be careful what you want to take proportions of – this is set by the margin argument. In
R, the first margin (margin = 1) is rows and the second (margin = 2) is columns.
ptab_race_state <- prop.table(xtab_race_state, margin = 2)

Check out each of these table objects in your console and familiarize yourself with the
difference.
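For intuition, here is a minimal sketch with a small made-up matrix (hypothetical counts, not census data) showing how margin changes which dimension sums to one:
toy <- matrix(c(2, 8, 3, 7), nrow = 2,
              dimnames = list(state = c("A", "B"), race = c("x", "y")))
prop.table(toy, margin = 1) # each row sums to 1
prop.table(toy, margin = 2) # each column sums to 1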

9.8 Composition Plots

How would you make the same figure with ggplot()? First, we want a count for each state
× race combination. So group by those two factors and count how many observations are
in each two-way categorization. group_by() can take any number of variables, separated
by commas.

grp_race_state <- cen10 %>%
  count(race, state)

Can you tell from the code what grp_race_state will look like?
# run on your own
grp_race_state

Now, we want to tell ggplot2 something like the following: I want bars by state, where the
segments of each bar represent racial groups. Each segment should be colored by race. With
some googling, you will get something like this:
ggplot(data = grp_race_state, aes(x = state, y = n, fill = race)) +
  geom_col(position = "fill") + # the position is determined by the fill aesthetic
  scale_fill_brewer(name = "Census Race", palette = "OrRd", direction = -1) + # choose palette
  coord_flip() + # flip axes
  scale_y_continuous(labels = scales::percent) + # label numbers as percentages
  labs(y = "Proportion of Racial Group within State",
       x = "",
       caption = "Source: 2010 Census sample") +
  theme_minimal()
[Figure: horizontal stacked bar chart of racial composition by state. Each state is a bar; the fill shows the Census Race categories; the x-axis runs from 0% to 100% and is labeled "Proportion of Racial Group within State".]

9.9 Line graphs

Line graphs are useful for plotting time trends.


The Census does not track individuals over time. So let’s take up another example: The
U.S. Supreme Court. Take the dataset justices_court-median.csv.

This data is adapted from the estimates of Martin and Quinn on their website
http://mqscores.lsa.umich.edu/.8
justice <- read_csv("data/input/justices_court-median.csv")

What does the data look like? How do you think it is organized? What does each row
represent?
justice

## # A tibble: 746 x 7
## term justice_id justice idealpt idealpt_sd median_idealpt
## <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
## 1 1937 67 McReyn~ 3.44 0.54 -0.568
## 2 1937 68 Brande~ -0.612 0.271 -0.568
## 3 1937 71 Suther~ 1.59 0.549 -0.568
## 4 1937 72 Butler 2.06 0.426 -0.568
## 5 1937 74 Stone -0.774 0.259 -0.568
## 6 1937 75 Hughes2 -0.368 0.232 -0.568
## 7 1937 76 O. Rob~ 0.008 0.228 -0.568
## 8 1937 77 Cardozo -1.59 0.634 -0.568
## 9 1937 78 Black -2.90 0.334 -0.568
## 10 1937 79 Reed -1.06 0.342 -0.568
## # ... with 736 more rows, and 1 more variable: median_justice <chr>

As you might have guessed, these data can be shown as a time trend over the range of the
term variable. As there are only nine justices at any given time and justices have life tenure,
their times on the court are staggered. With a common measure of "preference", we can
plot time trends of these justices' ideal points on the same y-axis scale.
ggplot(justice, aes(x = term, y = idealpt)) +
geom_line()

8 This exercise is inspired by Princeton's R Camp Assignment.


[Figure: line plot of idealpt against term, x-axis roughly 1940 to 2020; without a group aesthetic, all justices' points are connected by a single jagged line.]

Why does the above graph not look like the plot shown at the beginning? Fix it by adding just
one aesthetic to the graph.
# enter a correction that draws separate lines by group.

If you got the right aesthetic, this seems to “work” off the shelf. But take a moment to see
why the code was written as it is and how that maps on to the graphics. What is the group
aesthetic doing for you?
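A minimal sketch of one such correction (assuming the goal is one line per justice; the group aesthetic tells geom_line() which observations belong on the same line):
ggplot(justice, aes(x = term, y = idealpt, group = justice)) +
  geom_line()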

Now, this graphic already indicates a lot, but let’s improve the graphics so people can
actually read it. This is left for an Exercise.

As social scientists, we should also not forget to ask ourselves whether these numerical
measures are fit for what we care about, or actually succeed in measuring what we'd like
to measure. The estimation of these “ideal points” is a subfield of political methodology
beyond this prefresher. For more reading, skim through the original paper by Martin and
Quinn (2002).9 Also for a methodological discussion on the difficulty of measuring time
series of preferences, check out Bailey (2013).10

9 Martin, Andrew D. and Kevin M. Quinn. 2002. "Dynamic Ideal Point Estimation via Markov Chain Monte
Carlo for the U.S. Supreme Court, 1953-1999." Political Analysis 10(2): 134-153.
10 Bailey, Michael A. 2013. "Is Today's Court the Most Conservative in Sixty Years? Challenges and
Opportunities in Measuring Judicial Preferences." Journal of Politics 75(3): 821-834.



Exercises

In the time remaining, try the following exercises. Order doesn’t matter.

1: Rural states

Make a well-labelled figure that plots the proportion of the state’s population (as per the
census) that is 65 years or older. Each state should be visualized as a point, rather than a
bar, and there should be 51 points, ordered by their value. All labels should be readable.
# Enter yourself

• Alternatively, you can instead plot the proportion of residents who do not reside
in a specified city.

2: The swing justice

Using the justices_court-median.csv dataset and building off of the plot that was given,
make an improved plot by implementing as many of the following changes as you can (which
hopefully improve the graph):

• Label axes
• Use a black-and-white background.
• Change the breaks of the x-axis to print numbers for every decade, not just every two
decades.
• Plot each line in translucent gray, so the overlapping lines can be visualized clearly.
(Hint: in ggplot the alpha argument controls the degree of transparency)
• Limit the scale of the y-axis to [-5, 5] so that the outlier justice in the 60s is trimmed
and the rest of the data can be seen more easily (also, who is that justice?)
• Plot the ideal point of the justice who holds the “median” ideal point in a given term.
To distinguish this from the others, plot this line separately in a very light red below
the individual justices' lines.
• Highlight the trend-line of only the nine justices who are currently sitting on SCOTUS.
Make sure this is clearer than the other past justices.
• Add the current nine justices' names to the right of the endpoint of the 2016 figure,
alongside their ideal point.
• Make sure the text labels do not overlap with each other for readability using the
ggrepel package.
• Extend the x-axis limit to about 2020 so the text labels of justices are to the right of
the trend-lines.
• Add a caption to your text describing the data briefly, as well as any features relevant
for the reader (such as the median line and the trimming of the y-axis)
# Enter yourself

3: Don’t sort by the alphabet

The Figure we made that shows racial composition by state has one notable shortcoming:
it orders the states alphabetically, which is not particularly useful if you want to see an overall
pattern, without having particular states in mind.
Find a way to modify the figures so that the states are ordered by the proportion of White
residents in the sample.
# Enter yourself

4: What to show and how to show it

As students of politics, our goal is not necessarily to make pretty pictures, but rather to make
pictures that tell us something about politics, government, or society. If you could augment
either the census dataset or the justices dataset in some way, what would be a substantively
significant thing to show as a graphic?
Chapter 10

Objects, Functions, Loops

Where are we? Where are we headed?

Up till now, you should have covered:


• R basic programming
• Data Import
• Statistical Summaries
• Visualization
Today we’ll cover
• Objects
• Functions
• Loops

10.1 What is an object?

Now that we have covered some hands-on ways to use graphics, let's go into some
fundamentals of the R language.
Let’s first set up
library(dplyr)
library(readr)
library(haven)
library(ggplot2)

cen10 <- read_csv("data/input/usc2010_001percent.csv", col_types = cols())

Objects are abstract symbols in which you store data. Here we will create an object called
copy, and assign the contents of cen10 to it.


copy <- cen10

This looks the same as the original dataset:


copy

## # A tibble: 30,871 x 4
## state sex age race
## <chr> <chr> <dbl> <chr>
## 1 New York Female 8 White
## 2 Ohio Male 24 White
## 3 Nevada Male 37 White
## 4 Michigan Female 12 White
## 5 Maryland Female 18 Black/Negro
## 6 New Hampshire Male 50 White
## 7 Iowa Female 51 White
## 8 Missouri Female 41 White
## 9 New Jersey Male 62 White
## 10 California Male 25 White
## # ... with 30,861 more rows
What happens if you do this next?
copy <- ""

It got reassigned:
copy

## [1] ""

10.1.1 lists

Lists are one of the most generic and flexible types of objects. You can make an empty list
with the function list():
my_list <- list()
my_list

## list()
And start filling it in. Slots on the list are invoked by double square brackets [[]]
my_list[[1]] <- "contents of the first slot -- this is a string"
my_list[["slot 2"]] <- "contents of slot named slot 2"
my_list

## [[1]]
## [1] "contents of the first slot -- this is a string"
##
## $`slot 2`
## [1] "contents of slot named slot 2"

Each slot can hold anything. What are we doing here? We are defining the 1st slot of the list
my_list to be the vector c(1, 2, 3, 4, 5):
my_list[[1]] <- c(1, 2, 3, 4, 5)
my_list

## [[1]]
## [1] 1 2 3 4 5
##
## $`slot 2`
## [1] "contents of slot named slot 2"
You can even make nested lists. Let’s say we want the 1st slot of the list to be another list
of three elements.
my_list[[1]][[1]] <- "subitem 1 in slot 1 of my_list"
my_list[[1]][[2]] <- "subitem 1 in slot 2 of my_list"
my_list[[1]][[3]] <- "subitem 1 in slot 3 of my_list"

my_list

## [[1]]
## [1] "subitem 1 in slot 1 of my_list" "subitem 1 in slot 2 of my_list"
## [3] "subitem 1 in slot 3 of my_list" "4"
## [5] "5"
##
## $`slot 2`
## [1] "contents of slot named slot 2"

10.2 Making your own objects

We’ve covered one type of object, which is a list. You saw it was quite flexible. How many
types of objects are there?
There are an infinite number of object types, because people make their own classes of objects. You
can detect the type of an object (its class) with the function class().
An object can be said to be an instance of a class.
Analogies:
class - Pokemon, object - Pikachu
class - Book, object - To Kill a Mockingbird
class - DataFrame, object - 2010 census data
class - Character, object - “Programming is Fun”
What is type (class) of object is cen10?
class(cen10)

## [1] "spec_tbl_df" "tbl_df" "tbl" "data.frame"


What about this text?
class("some random text")

## [1] "character"
To change or create the class of any object, you can assign it. To do this, assign a character
string with the name of your class to the object's class().
We can start from a simple list. For example, say we wanted to store data about pokemon.
Because there is no pre-made package for this, we decide to make our own class.
pikachu <- list(name = "Pikachu",
number = 25,
type = "Electric",
color = "Yellow")

and we can give it any class name we want.


class(pikachu) <- "Pokemon"
str(pikachu)

## List of 4
## $ name : chr "Pikachu"
## $ number: num 25
## $ type : chr "Electric"
## $ color : chr "Yellow"
## - attr(*, "class")= chr "Pokemon"
pikachu$type

## [1] "Electric"

10.2.1 Seeing R through objects

Most of the R objects that you will see as you advance have their own classes. For example,
here’s a linear regression object (which you will learn more about in Gov 2000):
ols <- lm(mpg ~ wt + vs + gear + carb, mtcars)
class(ols)

## [1] "lm"
Anything can be an object! Even graphs (in ggplot) can be assigned, re-assigned, and
edited.
grp_race <- group_by(cen10, race) %>%
  summarize(count = n())

grp_race_ordered <- arrange(grp_race, count) %>%
  mutate(race = forcats::as_factor(race))

gg_tab <- ggplot(data = grp_race_ordered) +
  aes(x = race, y = count) +
  geom_col() +
  labs(caption = "Source: U.S. Census 2010")

gg_tab

[Figure: bar chart of counts by race, ordered from least to most common, with caption "Source: U.S. Census 2010". The long race labels overlap along the x-axis.]

You can change the orientation:

gg_tab <- gg_tab + coord_flip()

10.2.2 Parsing an object with str()

It can be hard to understand an R object because its contents are hidden. The function
str, short for structure, is a quick way to look into the innards of an object:
str(my_list)

## List of 2
## $ : chr [1:5] "subitem 1 in slot 1 of my_list" "subitem 1 in slot 2 of my_list" "subite
## $ slot 2: chr "contents of slot named slot 2"
class(my_list)

## [1] "list"

Same for the object we just made


str(pikachu)

## List of 4
## $ name : chr "Pikachu"
## $ number: num 25
## $ type : chr "Electric"
## $ color : chr "Yellow"
## - attr(*, "class")= chr "Pokemon"
What does a ggplot object look like? Very complicated, but at least you can see it:
# enter this on your console
str(gg_tab)

10.3 Types of variables

In the social sciences we often analyze variables. As you saw in the tutorial, different types
of variables require different care.
A key link with what we just learned is that variables are also types of R objects.

10.3.1 scalars

One number. How many people did we count in our Census sample?
nrow(cen10)

## [1] 30871
Question: What proportion of our census sample is Native American? This number is also
a scalar
# Enter yourself
unique(cen10$race)

## [1] "White" "Black/Negro"


## [3] "Other race, nec" "American Indian or Alaska Native"
## [5] "Chinese" "Other Asian or Pacific Islander"
## [7] "Two major races" "Three or more major races"
## [9] "Japanese"
mean(cen10$race == "American Indian or Alaska Native")

## [1] 0.009555894
Hint: you can use the function mean() to calculate the sample mean. The sample proportion
is the mean of a sequence of numbers, where your event of interest is a 1 (or TRUE) and others
are 0 (or FALSE).

10.3.2 numeric vectors

A sequence of numbers.
grp_race_ordered$count

## [1] 77 88 295 354 869 1129 1839 4013 22207


class(grp_race_ordered$count)

## [1] "integer"

Or even, all the ages of the millions of people in our Census. Here are just the first few
numbers of the list.
head(cen10$age)

## [1] 8 24 37 12 18 50

10.3.3 characters (aka strings)

This can be just one stretch of characters


my_name <- "Yon Soo"
my_name

## [1] "Yon Soo"


class(my_name)

## [1] "character"

or more characters. Notice here that there’s a difference between a vector of individual
characters and a length-one object of characters.
my_name_letters <- c("S", "h", "i", "r", "o")
my_name_letters

## [1] "S" "h" "i" "r" "o"


class(my_name_letters)

## [1] "character"

Finally, remember that lower vs. upper case matters in R!


my_name2 <- "shiro"
my_name == my_name2

## [1] FALSE

10.4 What is a function?


Most of what we do in R is executing a function. read_csv(), nrow(), ggplot() ... pretty
much anything with parentheses is a function. And even things like <- and [ are functions
as well.
A function is a set of instructions with specified ingredients. It takes an input, then
manipulates it – changes it in some way – and then returns the manipulated product.
One way to see what a function actually does is to enter it without parentheses.
# enter this on your console
table

You’ll see below that the most basic functions are quite complicated internally.
You’ll notice that functions contain other functions. wrapper functions are functions that
“wrap around” existing functions. This sounds redundant, but it’s an important feature of
programming. If you find yourself repeating a command more than two times, you should
make your own function, rather than writing the same type of code.
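For example, here is a minimal sketch of a wrapper (the function name read_quiet_csv is made up for illustration): if we always call read_csv() with the same col_types option, we can wrap that habit into one short function and stop retyping it.
read_quiet_csv <- function(path) {
  readr::read_csv(path, col_types = readr::cols())  # suppresses the column-spec message
}
cen10 <- read_quiet_csv("data/input/usc2010_001percent.csv")  # same result as the read_csv() call above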

10.4.1 Write your own function

It’s worth remembering the basic structure of a function. You create a new function, call it
my_fun by this:
my_fun <- function() {

If we wanted to generate a function that computed the number of men in your data, what
would that look like?
count_men <- function(data) {
  nmen <- sum(data$sex == "Male")
  return(nmen)
}

Then all we need to do is feed this function a dataset


count_men(cen10)

## [1] 15220
The point of a function is that you can use it again and again without typing up the set
of constituent manipulations. So, what if we wanted to figure out the number of men in
California?
count_men(cen10[cen10$state == "California",])

## [1] 1876

Let’s go one step further. What if we want to know the proportion of non-whites in a state,
just by entering the name of the state? There are multiple ways to do it, but it could look
something like this:
nw_in_state <- function(data, state) {
  s.subset <- data[data$state == state, ]
  total.s <- nrow(s.subset)
  nw.s <- sum(s.subset$race != "White")
  nw.s / total.s
}

The last line is what gets returned from the function. To be more explicit, you can wrap
the last line in return() (as in return(nw.s / total.s)). return() is also useful when you
want to break out of a function in the middle of it and not wait till the last line.
Try it on your favorite state!
nw_in_state(cen10, "Massachusetts")

## [1] 0.2040185
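As a small sketch of that early-return behavior (the function first_adult below is made up for illustration), return() lets a function stop as soon as it has an answer:
first_adult <- function(data) {
  for (i in seq_len(nrow(data))) {
    if (data$age[i] >= 18) return(i)  # stop at the first row whose age is 18 or over
  }
  NA  # reached only if no such row was found
}
first_adult(cen10)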

Checkpoint

Try making your own function, average_age_in_state, that will give you the average age
of people in a given state.
# Enter on your own

Try making your own function, asians_in_state, that will give you the number of Chinese,
Japanese, and Other Asian or Pacific Islander people in a given state.
# Enter on your own

Try making your own function, ‘top_10_oldest_cities’, that will give you the names of cities
whose population’s average age is top 10 oldest.
# Enter on your own

10.5 What is a package?

You can think of a package as a suite of functions that other people have already built for
you to make your life easier.
help(package = "ggplot2")

To use a package, you need to do two things: (1) install it, and then (2) load it.
Installing is a one-time thing
install.packages("ggplot2")

But you need to load it each time you start an R instance. So always keep these commands in
a script.
library(ggplot2)

In rstudio.cloud, we already installed a set of packages for you. But when you start your
own R instance, you need to have installed the package at some point.
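One common pattern (a sketch, not something you must use) is to install a package only if it is missing, and then load it:
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")  # one-time installation, only if not already installed
}
library(ggplot2)  # load every session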

10.6 Conditionals

Sometimes, you want to execute a command only under certain conditions. This is done
through the almost universal function, if(). Inside the if function we enter a logical
statement. The line that is adjacent to, or follows, the if() statement only gets executed
if the statement returns TRUE.
For example,
x <- 5
if (x > 0) {
  print("positive number")
} else if (x == 0) {
  print("zero")
} else {
  print("negative number")
}

## [1] "positive number"


You can wrap that whole thing in a function:
is_positive <- function(number) {
  if (number > 0) {
    print("positive number")
  } else if (number == 0) {
    print("zero")
  } else {
    print("negative number")
  }
}

is_positive(5)

## [1] "positive number"


is_positive(-3)

## [1] "negative number"

10.7 For-loops
Loops repeat the same statement, although the statement can be “the same” only in an
abstract sense. Use the for(x in X) syntax to repeat the subsequent command as many
times as there are elements in the right-hand object X. Each of these elements will be referred
to by the left-hand index x.
First, come up with a vector.
fruits <- c("apples", "oranges", "grapes")

Now we use the fruits vector in a for loop.


for (fruit in fruits) {
print(paste("I love", fruit))
}

## [1] "I love apples"


## [1] "I love oranges"
## [1] "I love grapes"
Here for() and in must be part of any for loop. The right hand side fruits must be a thing
that exists. Finally, the left-hand side object is "pick your favorite name." It is analogous to
how we can index a sum with any letter: $\sum_{i=1}^{10} i$ and $\sum_{j=1}^{10} j$ are in fact the
same thing.
for (i in 1:length(fruits)) {
print(paste("I love", fruits[i]))
}

## [1] "I love apples"


## [1] "I love oranges"
## [1] "I love grapes"
states_of_interest <- c("California", "Massachusetts", "New Hampshire", "Washington")

for (state in states_of_interest) {
  state_data <- cen10[cen10$state == state, ]
  nmen <- sum(state_data$sex == "Male")
  n <- nrow(state_data)
  men_perc <- round(100 * (nmen / n), digits = 2)
  print(paste("Percentage of men in", state, "is", men_perc))
}

## [1] "Percentage of men in California is 49.85"


## [1] "Percentage of men in Massachusetts is 47.6"
## [1] "Percentage of men in New Hampshire is 48.55"
## [1] "Percentage of men in Washington is 48.19"
Instead of printing, you can store the information in a vector
states_of_interest <- c("California", "Massachusetts", "New Hampshire", "Washington")
male_percentages <- c()
iter <- 1

for (state in states_of_interest) {
  state_data <- cen10[cen10$state == state, ]
  nmen <- sum(state_data$sex == "Male")
  n <- nrow(state_data)
  men_perc <- round(100 * (nmen / n), digits = 2)

  male_percentages <- c(male_percentages, men_perc)
  names(male_percentages)[iter] <- state
  iter <- iter + 1
}

male_percentages

## California Massachusetts New Hampshire Washington


## 49.85 47.60 48.55 48.19
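An equivalent sketch that avoids the manual iter counter, by looping over positions and pre-naming the container:
male_percentages <- setNames(numeric(length(states_of_interest)), states_of_interest)
for (i in seq_along(states_of_interest)) {
  state_data <- cen10[cen10$state == states_of_interest[i], ]
  male_percentages[i] <- round(100 * mean(state_data$sex == "Male"), digits = 2)
}
male_percentages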

10.8 Nested Loops

What if I want to calculate the population percentage of a race group for all race groups in
states of interest? You could probably use tidyverse functions to do this, but let’s try using
loops!
states_of_interest <- c("California", "Massachusetts", "New Hampshire", "Washington")
for (state in states_of_interest) {
  for (race in unique(cen10$race)) {
    race_state_num <- nrow(cen10[cen10$race == race & cen10$state == state, ])
    state_pop <- nrow(cen10[cen10$state == state, ])
    race_perc <- round(100 * (race_state_num / state_pop), digits = 2)
    print(paste("Percentage of ", race, "in", state, "is", race_perc))
  }
}

## [1] "Percentage of White in California is 57.61"


## [1] "Percentage of Black/Negro in California is 6.72"
## [1] "Percentage of Other race, nec in California is 15.55"
## [1] "Percentage of American Indian or Alaska Native in California is 1.12"
## [1] "Percentage of Chinese in California is 3.75"
## [1] "Percentage of Other Asian or Pacific Islander in California is 9.54"
## [1] "Percentage of Two major races in California is 4.62"
## [1] "Percentage of Three or more major races in California is 0.37"
## [1] "Percentage of Japanese in California is 0.72"
## [1] "Percentage of White in Massachusetts is 79.6"
## [1] "Percentage of Black/Negro in Massachusetts is 5.87"
## [1] "Percentage of Other race, nec in Massachusetts is 4.02"
## [1] "Percentage of American Indian or Alaska Native in Massachusetts is 0.77"
## [1] "Percentage of Chinese in Massachusetts is 2.32"
## [1] "Percentage of Other Asian or Pacific Islander in Massachusetts is 4.33"
## [1] "Percentage of Two major races in Massachusetts is 2.78"
## [1] "Percentage of Three or more major races in Massachusetts is 0"
## [1] "Percentage of Japanese in Massachusetts is 0.31"
## [1] "Percentage of White in New Hampshire is 93.48"
## [1] "Percentage of Black/Negro in New Hampshire is 0.72"
## [1] "Percentage of Other race, nec in New Hampshire is 0.72"
## [1] "Percentage of American Indian or Alaska Native in New Hampshire is 0.72"
## [1] "Percentage of Chinese in New Hampshire is 0.72"
## [1] "Percentage of Other Asian or Pacific Islander in New Hampshire is 2.17"
## [1] "Percentage of Two major races in New Hampshire is 0.72"
## [1] "Percentage of Three or more major races in New Hampshire is 0"
## [1] "Percentage of Japanese in New Hampshire is 0.72"
## [1] "Percentage of White in Washington is 76.05"
## [1] "Percentage of Black/Negro in Washington is 2.9"
## [1] "Percentage of Other race, nec in Washington is 5.37"
## [1] "Percentage of American Indian or Alaska Native in Washington is 2.03"
## [1] "Percentage of Chinese in Washington is 1.31"
## [1] "Percentage of Other Asian or Pacific Islander in Washington is 6.68"
## [1] "Percentage of Two major races in Washington is 4.79"
## [1] "Percentage of Three or more major races in Washington is 0.29"
## [1] "Percentage of Japanese in Washington is 0.58"

Exercises

Exercise 1: Counting CVAP

An issue raised in Persily's article is that the full-count U.S. Census does not record whether
the residents are citizens of the United States1. Instead, this question is asked in a survey, the
American Community Survey. The two are fundamentally different exercises: the Census
counts everyone by definition, while a survey samples its data. Load the 1 percent sample of
the 2015 ACS (acs2015_1percent.csv, in the input folder) and give an estimate of the
proportion of a state’s ACS respondents that are reportedly U.S. citizens.
acs <- read_csv("data/input/acs2015_1percent.csv", col_types = cols())
set.seed(02138)
sample_acs <- sample_frac(acs, 0.01)

# Enter yourself

1 Here is that argument of his again, more recently in the popular press. "The Mysterious Number of
American Citizens". June 2, 2015. POLITICO

Exercise 2: Write your own function

Write your own function that makes some task of data analysis simpler. Ideally, it would be
a function that helps you do either of the previous tasks in fewer lines of code. You can use
the three lines of code that were provided in Exercise 1 and wrap them into another function
too!
# Enter yourself

Exercise 3: Using Loops

Using a loop, create a crosstab of sex and race for each state in the set “states_of_interest”
states_of_interest <- c("California", "Massachusetts", "New Hampshire", "Washington")
# Enter yourself

Exercise 4: Storing information derived within loops in a global dataframe

Recall the following nested loop


states_of_interest <- c("California", "Massachusetts", "New Hampshire", "Washington")
for (state in states_of_interest) {
  for (race in unique(cen10$race)) {
    race_state_num <- nrow(cen10[cen10$race == race & cen10$state == state, ])
    state_pop <- nrow(cen10[cen10$state == state, ])
    race_perc <- round(100 * (race_state_num / state_pop), digits = 2)
    print(paste("Percentage of ", race, "in", state, "is", race_perc))
  }
}

## [1] "Percentage of White in California is 57.61"


## [1] "Percentage of Black/Negro in California is 6.72"
## [1] "Percentage of Other race, nec in California is 15.55"
## [1] "Percentage of American Indian or Alaska Native in California is 1.12"
## [1] "Percentage of Chinese in California is 3.75"
## [1] "Percentage of Other Asian or Pacific Islander in California is 9.54"
## [1] "Percentage of Two major races in California is 4.62"
## [1] "Percentage of Three or more major races in California is 0.37"
## [1] "Percentage of Japanese in California is 0.72"
## [1] "Percentage of White in Massachusetts is 79.6"
## [1] "Percentage of Black/Negro in Massachusetts is 5.87"
## [1] "Percentage of Other race, nec in Massachusetts is 4.02"
## [1] "Percentage of American Indian or Alaska Native in Massachusetts is 0.77"
## [1] "Percentage of Chinese in Massachusetts is 2.32"
## [1] "Percentage of Other Asian or Pacific Islander in Massachusetts is 4.33"
## [1] "Percentage of Two major races in Massachusetts is 2.78"
## [1] "Percentage of Three or more major races in Massachusetts is 0"
## [1] "Percentage of Japanese in Massachusetts is 0.31"
## [1] "Percentage of White in New Hampshire is 93.48"
## [1] "Percentage of Black/Negro in New Hampshire is 0.72"
## [1] "Percentage of Other race, nec in New Hampshire is 0.72"
## [1] "Percentage of American Indian or Alaska Native in New Hampshire is 0.72"
## [1] "Percentage of Chinese in New Hampshire is 0.72"
## [1] "Percentage of Other Asian or Pacific Islander in New Hampshire is 2.17"
## [1] "Percentage of Two major races in New Hampshire is 0.72"
## [1] "Percentage of Three or more major races in New Hampshire is 0"
## [1] "Percentage of Japanese in New Hampshire is 0.72"
## [1] "Percentage of White in Washington is 76.05"
## [1] "Percentage of Black/Negro in Washington is 2.9"
## [1] "Percentage of Other race, nec in Washington is 5.37"
## [1] "Percentage of American Indian or Alaska Native in Washington is 2.03"
## [1] "Percentage of Chinese in Washington is 1.31"
## [1] "Percentage of Other Asian or Pacific Islander in Washington is 6.68"
## [1] "Percentage of Two major races in Washington is 4.79"
## [1] "Percentage of Three or more major races in Washington is 0.29"
## [1] "Percentage of Japanese in Washington is 0.58"
Instead of printing the percentage of each race in each state, create a dataframe, and store
all that information in that dataframe. (Hint: look at how I stored information about male
percentage in each state of interest in a vector.)
Chapter 11

Joins and Merges, Wide and Long1

Where are we? Where are we headed?

Up till now, you should have covered:

• R basic programming
• Counting.
• Visualization.
• Objects and Classes.
• Matrix algebra in R
• Functions.

Today you will work on your own, but feel free to ask a fellow classmate nearby or the
instructor. The objective for this session is to get more experience using R, but in the
process (a) test a prominent theory in the political science literature and (b) explore related
ideas of interest to you.

11.1 Motivation

The “Democratic Peace” is one of the most widely discussed propositions in political science,
covering the fields of International Relations and Comparative Politics, with insights for the
domestic politics of democracies (e.g. American Politics). The one-sentence idea is that
democracies do not fight with each other. There has been much theoretical debate – for
example in earlier work, Oneal and Russet (1999) argue that the democratic peace is not
due to the hegemony of strong democracies like the U.S. and attempt to distinguish between

1 Module originally written by Shiro Kuriwaki, Connor Jerzak, and Yon Soo Park


realist and what they call Kantian propositions (e.g. democratic governance, international
organizations)2 .
An empirical demonstration of the democratic peace is also a good example of a Time
Series Cross Sectional (or panel) dataset, where the same units (in this case countries)
are observed repeatedly for multiple time periods. Experience in assembling and analyzing
a TSCS dataset will prepare you for any future research in this area.

11.2 Setting up

library(dplyr)
library(tidyr)
library(readr)
library(data.table)
library(ggplot2)

11.3 Create a project directory


First start a directory for this project. This can be done manually or through RStudio’s
Project feature (File > New Project...).
Directories are the computer science / programming name for folders. While advice about
how to structure your working directories might strike you as petty, we believe that starting
from some well-tested guides will go a long way in improving the quality and efficiency of
your work.
Chapter 4 of Gentzkow and Shapiro's memo, Code and Data for the Social Scientist, provides
a good template.

11.4 Data Sources


Most projects you do will start with downloading data from elsewhere. For this task, you’ll
probably want to track down and download the following:
• Correlates of war dataset (COW): Find and download the Militarized
Interstate Disputes (MIDs) data from the Correlates of War website:
http://www.correlatesofwar.org/data-sets. Or a dyad-version on dataverse:
https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl:1902.1/11489
• PRIO Data on Armed Conflict: Find and download the Uppsala Conflict Data
Program (UCDP) and PRIO dyad-year data on armed conflict (https://www.prio.org)
or this link to the flat csv file (http://ucdp.uu.se/downloads/dyadic/ucdp-dyadic-171.csv).
2 The
Kantian Peace: The Pacific Benefits of Democracy, Interdependence, and International Organizations,
1885-1992. World Politics 52(1):1-37

• Polity: The Polity data can be downloaded from their website
(http://www.systemicpeace.org/inscrdata.html). Look for the newest version of the time
series that has the widest coverage.

11.5 Example with 2 Datasets

Let’s read in a sample dataset.


polity <- read_csv("data/input/sample_polity.csv")
mid <- read_csv("data/input/sample_mid.csv")

What does polity look like?


unique(polity$country)

## [1] "France" "Prussia" "Germany" "United States"


ggplot(polity, aes(x = year, y = polity2)) +
facet_wrap(~ country) +
geom_line()

[Figure: line plots of polity2 against year (roughly 1800 to 2000), faceted by country: France, Germany, Prussia, and United States.]

head(polity)

## # A tibble: 6 x 5
## scode ccode country year polity2
## <chr> <dbl> <chr> <dbl> <dbl>


## 1 FRN 220 France 1800 -8
## 2 FRN 220 France 1801 -8
## 3 FRN 220 France 1802 -8
## 4 FRN 220 France 1803 -8
## 5 FRN 220 France 1804 -8
## 6 FRN 220 France 1805 -8
MID is a dataset that captures a dispute for a given country and year.
mid

## # A tibble: 6,132 x 5
## ccode polity_code dispute StYear EndYear
## <dbl> <chr> <dbl> <dbl> <dbl>
## 1 200 UKG 1 1902 1903
## 2 2 USA 1 1902 1903
## 3 345 YGS 1 1913 1913
## 4 300 <NA> 1 1913 1913
## 5 339 ALB 1 1946 1946
## 6 200 UKG 1 1946 1946
## 7 200 UKG 1 1951 1952
## 8 651 EGY 1 1951 1952
## 9 630 IRN 1 1856 1857
## 10 200 UKG 1 1856 1857
## # ... with 6,122 more rows

11.6 Loops

Notice that in the mid data, we have a start of a dispute vs. an end of a dispute. In order to
combine this into the polity data, we want a way to give each of the interval years a row.
There are many ways to do this, but one is a loop. We go through one row at a time, and
for each row we make a new dataset that has a row for every year in the dispute's interval.
mid_year_by_year <- data_frame(ccode = numeric(),
                               year = numeric(),
                               dispute = numeric())

## Warning: `data_frame()` is deprecated, use `tibble()`.


## This warning is displayed once per session.
for (i in 1:nrow(mid)) {
  x <- data_frame(ccode = mid$ccode[i],                 ## row i's country
                  year = mid$StYear[i]:mid$EndYear[i],  ## sequence of years for dispute in row i
                  dispute = 1)
  mid_year_by_year <- rbind(mid_year_by_year, x)
}

Figure 11.1

head(mid_year_by_year)

## # A tibble: 6 x 3
## ccode year dispute
## <dbl> <int> <dbl>
## 1 200 1902 1
## 2 200 1903 1
## 3 2 1902 1
## 4 2 1903 1
## 5 345 1913 1
## 6 300 1913 1
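As an aside, here is a non-loop sketch of the same expansion using list-columns (it assumes the purrr package is installed; it is not the approach this module takes):
mid_alt <- mid %>%
  mutate(year = purrr::map2(StYear, EndYear, seq)) %>%  # one list of years per dispute row
  tidyr::unnest(year) %>%                               # expand each list into rows
  transmute(ccode, year, dispute = 1)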

11.7 Merging

We want to combine these two datasets by merging. Base-R has a function called merge.
dplyr has several types of joins (the same thing). Those names are based on SQL syntax.
Here we can do a left_join matching rows from mid to polity. We want to keep the rows
in polity that do not match in mid, and label them as non-disputes.

p_m <- left_join(polity,
                 distinct(mid_year_by_year),
                 by = c("ccode", "year"))

head(p_m)

## # A tibble: 6 x 6
## scode ccode country year polity2 dispute
## <chr> <dbl> <chr> <dbl> <dbl> <dbl>
## 1 FRN 220 France 1800 -8 NA
## 2 FRN 220 France 1801 -8 NA
## 3 FRN 220 France 1802 -8 NA
## 4 FRN 220 France 1803 -8 NA
## 5 FRN 220 France 1804 -8 NA
## 6 FRN 220 France 1805 -8 NA
Replace dispute = NA rows with a zero.
p_m$dispute[is.na(p_m$dispute)] <- 0
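An equivalent tidyverse sketch of the same NA replacement, if you prefer pipes:
p_m <- p_m %>% mutate(dispute = tidyr::replace_na(dispute, 0))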

Finally, we can reshape the merged data from long to wide, with one column per year:
p_m_wide <- dcast(data = p_m,
                  formula = ccode ~ year,
                  value.var = "polity2")

11.8 Main Project

Try building a panel that would be useful in answering the Democratic Peace Question,
perhaps in these steps.

Task 1: Data Input and Standardization

Often, files we need are saved in the .xls or xlsx format. It is possible to read these files
directly into R, but experience suggests that this process is slower than converting them first
to .csv format and reading them in as .csv files.
The readxl/readr/haven packages (https://github.com/tidyverse/tidyverse) are
constantly expanding to capture more file types. In day 1, we used the package readxl, using
the read_excel() function.

Task 2: Data Merging

We will use data to test a version of the Democratic Peace Thesis (DPS). Democracies are
said to go to war less because the leaders who wage wars are accountable to voters who
have to bear the costs of war. Are democracies less likely to engage in militarized interstate
disputes?
To start, let’s download and merge some data.
• Load in the Militarized Interstate Dispute (MID) files. Militarized interstate disputes
are hostile action between two formally recognized states. Examples of this would be
threats to use force, threats to declare war, beginning war, fortifying a border with
troops, and so on.
• Find a way to merge the Polity IV dataset and the MID data. This process can be
a bit tricky.
• An advanced version of this task would be to download the dyadic form of the data
and try merging that with polity.

Task 3: Tabulations and Visualization

1. Calculate the mean Polity2 score by year. Plot the result. Use graphical indicators
of your choosing to show where key events fall in this timeline (such as 1914, 1929,
1939, 1989, 2008). Speculate on why the behavior from 1800 to 1920 seems to be
qualitatively different than behavior afterwards.
2. Do the same but only among state-years that were involved in a MID. Plot this line
together with your results from 1.
3. Do the same but only among state years that were not involved in a MID.
4. Arrive at a tentative conclusion for how well the Democratic Peace argument seems
to hold up in this dataset. Visualize this conclusion.
Chapter 12

Simulation1

Where are we? Where are we headed?

Up till now, you should have covered:

• R basics
• Visualization
• Matrices and vectors
• Functions, objects, loops
• Joining real data

In this module, we will start to work with generating data within R, from thin air, as
it were. Doing simulation also strengthens your understanding of Probability (Section
@ref{probability}).

Check your Understanding

• What does the sample() function do?


• What does runif() stand for?
• What is a seed?
• What is a Monte Carlo?

Check if you have an idea of how you might code the following tasks:

• Simulate 100 rolls of a die


• Simulate one random ordering of 25 numbers
• Simulate 100 values of white noise (uniform random variables)
• Generate a “bootstrap” sample of an existing dataset

We’re going to learn about this today!


1 Module originally written by Connor Jerzak and Shiro Kuriwaki


12.1 Motivation: Simulation as an Analytical Tool


An increasing number of political science contributions now include a simulation.
• Axelrod (1977) demonstrated via simulation how atomized individuals evolve to be
grouped in similar clusters or countries, a model of culture.2
• Chen and Rodden (2013) argued in a 2013 article that the vote-seat inequality in U.S.
elections that is often attributed to intentional partisan gerrymandering can actually be
attributed simply to the reality of "human geography" – Democratic voters tend to
be concentrated in smaller areas. Put another way, no feasible form of gerrymandering
could spread out Democratic voters in such a way to equalize their vote-seat translation
effectiveness. After demonstrating the empirical pattern of human geography, they
advance their key claim by simulating thousands of redistricting plans and recording the
vote-seat ratio.3
• Gary King, James Honaker, and multiple other authors propose a way to analyze
missing data with a method of multiple imputation, which uses a lot of simulation
from a researcher’s observed dataset.4 (Software: Amelia5 )
Statistical methods also incorporate simulation:
• The bootstrap: a statistical method for estimating uncertainty around some parameter
by re-sampling observations.
• Bagging: a method for improving machine learning predictions by re-sampling obser-
vations, storing the estimate across many re-samples, and averaging these estimates
to form the final estimate. A variance reduction technique.
• Statistical reasoning: if you are trying to understand a quantitative problem, a
wonderful first step to understand the problem better is to simulate it! The analytical
solution is often very hard (or impossible), but the simulation is often much easier :-)

12.2 Pick a sample, any sample

12.3 The sample() function


Coding up stochastic data revolves around several key functions, so
we will simply review them here.
Suppose you have a vector of values x and from it you want to randomly draw a sample
of length size. For this, use the sample function:
sample(x = 1:10, size = 5)

## [1] 1 2 3 7 6
2 Axelrod, Robert. 1997. “The Dissemination of Culture.” Journal of Conflict Resolution 41(2): 203–26.
3 Chen, Jowei, and Jonathan Rodden. “Unintentional Gerrymandering: Political Geography and Electoral
Bias in Legislatures. Quarterly Journal of Political Science, 8:239-269”
4 King, Gary, et al. “Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple

Imputation”. American Political Science Review, 95: 49-69.


5 James Honaker, Gary King, Matthew Blackwell (2011). Amelia II: A Program for Missing Data. Journal

of Statistical Software, 45(7), 1-47.



There are two subtypes of sampling – with and without replacement.


1. Sampling without replacement (replace = FALSE) means once an element of x is
chosen, it will not be considered again:
sample(x = 1:10, size = 10, replace = FALSE) ## no number appears more than once

## [1] 7 8 9 6 2 5 4 1 10 3
2. Sampling with replacement (replace = TRUE) means that even if an element of x is
chosen, it is put back in the pool and may be chosen again.
sample(x = 1:10, size = 10, replace = TRUE) ## any number can appear more than once

## [1] 10 2 1 5 3 9 3 6 6 1
It follows then that you cannot sample without replacement a sample that is larger than
the pool.
sample(x = 1:10, size = 100, replace = FALSE)

## Error in sample.int(length(x), size, replace, prob): cannot take a sample larger than the popu
So far, every element in x has had an equal probability of being chosen. In some applications,
we want a sampling scheme where some elements are more likely to be chosen than others.
The argument prob handles this.
For example, this simulates 20 fair coin tosses (each outcome is equally likely to happen)
sample(c("Head", "Tail"), size = 20, prob = c(0.5, 0.5), replace = TRUE)

## [1] "Head" "Tail" "Tail" "Head" "Tail" "Head" "Head" "Tail" "Tail" "Head"
## [11] "Tail" "Head" "Head" "Head" "Tail" "Head" "Head" "Head" "Head" "Head"
But this simulates 20 biased coin tosses, where, say, Tails is 4 times more
likely than Heads:
sample(c("Head", "Tail"), size = 20, prob = c(0.2, 0.8), replace = TRUE)

## [1] "Tail" "Tail" "Tail" "Tail" "Tail" "Head" "Tail" "Tail" "Tail" "Head"
## [11] "Head" "Head" "Tail" "Head" "Tail" "Tail" "Tail" "Tail" "Tail" "Tail"

12.3.1 Sampling rows from a dataframe

In the tidyverse, there are convenience functions to sample rows randomly: sample_n() and
sample_frac().
For example, load the dataset on cars, mtcars, which has 32 observations.
mtcars

## mpg cyl disp hp drat wt qsec vs am gear carb


## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1


## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
sample_n picks a user-specified number of rows from the dataset:
sample_n(mtcars, 3)

## mpg cyl disp hp drat wt qsec vs am gear carb


## 1 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4
## 2 15.2 8 304 150 3.15 3.435 17.30 0 0 3 2
## 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Sometimes you want an X percent sample of your dataset. In this case use sample_frac():
sample_frac(mtcars, 0.10)

## mpg cyl disp hp drat wt qsec vs am gear carb


## 1 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## 2 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## 3 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
As a side-note, these functions have very practical uses for any type of data analysis:
• Inspecting your dataset: using head() all the time and looking over only the first
few rows might lead you to miss issues that end up at the bottom for whatever
reason.
• Testing your analysis with a small sample: If running analyses on a dataset takes more
than a handful of seconds, change your dataset upstream to a fraction of the size so
the rest of the code runs in less than a second. Once you verify your analysis code runs,
then re-do it with your full dataset (by simply removing the sample_n / sample_frac
line of code in the beginning). While three seconds may not sound like much, they
accumulate and eat up time.

12.4 Random numbers from specific distributions

rbinom()

rbinom builds upon sample as a tool to help you answer the question – what is the total
number of successes I would get if I sampled a binary (Bernoulli) result from a test with
size number of trials each, with an event-wise probability of prob. The first argument n
asks me how many such numbers I want.
For example, I want to know how many Heads I would get if I flipped a fair coin 100 times.
rbinom(n = 1, size = 100, prob = 0.5)

## [1] 51
Now imagine I wanted to do this experiment 10 times, which would require I flip the
coin 10 x 100 = 1000 times! Helpfully, we can do this in one line:
rbinom(n = 10, size = 100, prob = 0.5)

## [1] 47 46 49 57 45 42 45 50 44 50

runif()

runif also simulates a stochastic scheme where each event has equal probability of getting
chosen like sample, but is a continuous rather than discrete system. We will cover this more
in the next math module.
The intuition to emphasize here is that one can generate potentially infinite amounts (size
n) of noise that is essentially random:
runif(n = 5)

## [1] 0.1847970 0.9906337 0.4612721 0.8321032 0.3246780

rnorm()

rnorm is also a continuous distribution, but draws from a Normal distribution – perhaps
the most important distribution in statistics. It runs the same way as runif

rnorm(n = 5)

## [1] 1.7477110 0.5347322 1.2716806 0.2623107 0.1506185

To better visualize the difference between the output of runif and rnorm, let’s generate lots
of each and plot a histogram.
from_runif <- runif(n = 1000)
from_rnorm <- rnorm(n = 1000)

par(mfrow = c(1, 2)) ## base-R parameter for two plots at once


hist(from_runif)
hist(from_rnorm)

[Figure: two side-by-side histograms. Left: "Histogram of from_runif", roughly flat over 0 to 1. Right: "Histogram of from_rnorm", bell-shaped and centered near 0.]

12.5 r, p, and d

Each distribution can do more than generate random numbers (the prefix r). We can
compute the cumulative probability with the functions pbinom(), punif(), and pnorm(). Also
the density – the value of the PDF – by dbinom(), dunif() and dnorm().
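A quick sketch of the three prefixes for two of these distributions (the values in the comments are approximate):
pnorm(1.96)                         # p: P(X <= 1.96) for a standard Normal, about 0.975
dnorm(0)                            # d: density of the standard Normal at 0, about 0.399
rnorm(n = 3)                        # r: three random draws
pbinom(45, size = 100, prob = 0.5)  # P(45 or fewer heads in 100 fair flips)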

12.6 set.seed()

R doesn’t have the ability to generate truly random numbers! Random numbers are actually
very hard to generate. (Think: flipping a coin –> can be perfectly predicted if I know wind
speed, the angle the coin is flipped, etc.). Some people use random noise in the atmosphere or
random behavior in quantum systems to generate “truly” (?) random numbers. Conversely,
R uses deterministic algorithms which take as an input a “seed” and which then perform a
series of operations to generate a sequence of random-seeming numbers (that is, numbers
whose sequence is sufficiently hard to predict).
Let’s think about this another way. Sampling is a stochastic process, so every time you run
sample() or runif() you are bound to get a different output (because different random seeds
are used). This is intentional in some cases but you might want to avoid it in others. For
example, you might want to diagnose a coding discrepancy by setting the random number
generator to give the same number each time. To do this, use the function set.seed().
In the function goes any number. When you run a sample function in the same command
as a preceding set.seed(), the sampling function will always give you the same sequence
of numbers. In a sense, the sampler is no longer random (in the sense of unpredictable to
us; remember: it never was "truly" random in the first place).
set.seed(02138)
runif(n = 10)

## [1] 0.51236144 0.61530551 0.37451441 0.43541258 0.21166530 0.17812129


## [7] 0.04420775 0.45567854 0.88718264 0.06970056
The random number generator should give you the exact same sequence of numbers if you
precede the function by the same seed,
set.seed(02138)
runif(n = 10)

## [1] 0.51236144 0.61530551 0.37451441 0.43541258 0.21166530 0.17812129


## [7] 0.04420775 0.45567854 0.88718264 0.06970056
whereas a true random number generator would give you the exact same sequence of output
with probability 0!

Exercises

Census Sampling

What can we learn from surveys of populations, and how wrong do we get if our sampling is
biased?6 Suppose we want to estimate the proportion of U.S. residents who are non-white
(race != "White"). In reality, we do not have any population dataset to utilize and so
6 This example is inspired from Meng, Xiao-Li (2018). Statistical paradises and paradoxes in big data (I):
Law of large populations, big data paradox, and the 2016 US presidential election. Annals of Applied
Statistics 12:2, 685–726. doi:10.1214/18-AOAS1161SF.

we only see the sample survey. Here, however, to understand how sampling works, let’s
conveniently use the Census extract in some cases and pretend we didn’t in others.
(a) First, load usc2010_001percent.csv into your R session. After loading the
library(tidyverse), browse it. Although this is only a 0.01 percent extract, treat
this as your population for pedagogical purposes. What is the population proportion
of non-White residents?
(b) Setting a seed to 1669482, sample 100 respondents from this sample. What is the
proportion of non-White residents in this particular sample? By how many percentage
points are you off from (what we labelled as) the true proportion?
(c) Now imagine what you did above was one survey. What would we get if we did 20
surveys?
To simulate this, write a loop that does the same exercise 20 times, each time computing a
sample proportion. Use the same seed at the top, but be careful to position the set.seed
function such that it generates the same sequence of 20 samples, rather than 20 of the same
sample.
Try doing this with a for loop and storing your sample proportions in a new length-20
vector. (Suggestion: make an empty vector first as a container). After running the loop,
show a histogram of the 20 values. Also what is the average of the 20 sample estimates?
(d) Now, to make things more real, let’s introduce some response bias. The goal here
is not to correct response bias but to induce it and see how it affects our estimates.
Suppose that non-White residents are 10 percent less likely to respond to your
survey than White respondents. This is plausible if you think that the Census is from
2010 but you are polling in 2018, and racial minorities are more geographically mobile
than Whites. Repeat the same exercise in (c) by modeling this behavior.
You can do this by creating a variable, e.g. propensity, that is 0.9 for non-Whites and 1
otherwise. Then, you can refer to it in the sampling function's probability weight argument (prob in sample(), weight in sample_n() / sample_frac()).
(e) Finally, we want to see if more data (“Big Data”) will improve our estimates. Using the
same unequal response rates framework as (d), repeat the same exercise but instead
of each poll collecting 100 responses, we collect 10,000.
(f) Optional - visualize your 2 pairs of 20 estimates, with a bar showing the “correct”
population average.

Conditional Proportions

This example is not on simulation, but is meant to reinforce some of the probability
discussion from the math lecture.
Read in the Upshot Siena poll from Fall 2016, data/input/upshot-siena-polls.csv.
In addition to some standard demographic questions, we will focus on one called vt_pres_2
in the csv. This is a two-way presidential vote question, asking respondents who they plan
to vote for President if the election were held today – Donald Trump, the Republican, or
Hillary Clinton, the Democrat, with options for Other candidates as well. For this problem,
use the two-way vote question rather than the 4-way vote question.

(a) Drop the respondents who answered the November poll (i.e. those for which poll
== "November"). We do this in order to ignore this November population in all
subsequent parts of this question because they were not asked the Presidential vote
question.
(b) Using the dataset after the procedure in (a), find the proportion of poll respondents
(those who are in the sample) who support Donald Trump.
(c) Among those who supported Donald Trump, what proportion of them has a Bachelor’s
degree or higher (i.e. have a Bachelor’s, Graduate, or other Professional Degree)?
(d) Among those who did not support Donald Trump (i.e. including supporters of
Hillary Clinton, another candidate, or those who refused to answer the question), what
proportion of them has a Bachelor’s degree or higher?
(e) Express the numbers in the previous parts as probabilities of specified events. Define
your own symbols: For example, we can let T be the event that a randomly selected
respondent in the poll supports Donald Trump, then the proportion in part (b) is the
probability P (T ).
(f) Suppose we randomly sampled a person who participated in the survey and found that
he/she had a Bachelor’s degree or higher. Given this evidence, what is the probability
that the same person supports Donald Trump? Use Bayes Rule and show your work
– that is, do not use data or R to compute the quantity directly. Then, verify this is
the case via R.

The Birthday problem

Write code that will answer the well-known birthday problem via simulation.7
The problem is fairly simple: Suppose k people gather together in a room. What is the
probability at least two people share the same birthday?
To simplify reality a bit, assume that (1) there are no leap years, so there are always 365
days in a year, and (2) each individual's birthday is randomly assigned, independently of
everyone else's.
Step 1: Set k to a concrete number. Pick a number from 1 to 365 randomly, k times to
simulate birthdays (would this be with replacement or without?).
# Your code

Step 2: Write a line (or two) of code that gives a TRUE or FALSE statement of whether or
not at least two people share the same birth date.
# Your code

Step 3: The above steps will generate a TRUE or FALSE answer for your event of interest, but
only for one realization of an event in the sample space. In order to estimate the probability
of your event happening, we need a “stochastic”, as opposed to “deterministic”, method. To
do this, write a loop that does Steps 1 and 2 repeatedly for many times, call that number of
7 This exercise draws from Imai (2017)

times sims. For each of sims iteration, your code should give you a TRUE or FALSE answer.
Code up a way to store these estimates.
# Your code

Step 4: Finally, generalize the function further by letting k be a user-defined number. You
have now created a Monte Carlo simulation!
# Your code

Step 5: Generate a table or plot that shows how the probability of sharing a birthday
changes by k (fixing sims at a large number like 1000). Also generate a similar plot that
shows how the probability of sharing a birthday changes by sims (fixing k at some arbitrary
number like 10).
# Your code

Extra credit: Give an “analytical” answer to this problem, that is an answer through deriving
the mathematical expressions of the probability.
# Your equations
Chapter 13

LaTeX and markdown1

Where are we? Where are we headed?

Up till now, you should have covered:


• Statistical Programming in R
This is only the beginning of R – programming is like learning a language, so learn more as
we use it. And yet R is likely not the only programming language you will want to use.
While we cannot introduce everything, we’ll pick out a few that we think are particularly
helpful.
Here we will cover
• Markdown
• LaTeX (and BibTeX)
as examples of a non-WYSIWYG editor
and the next chapter (you can read it without reading this LaTeX chapter) covers
• command-line
• git
The command-line is a basic set of tools that you may have to use from time to time. It also
clarifies what more complicated programs are doing. Markdown is an example of compiling
a plain text file. LaTeX is a typesetting program and git is a version control program – both
are useful for non-quantitative work as well.

Check your understanding

Check if you have an idea of how you might code the following tasks:
• What does “WYSIWYG” stand for? How would a non-WYSIWYG editor format text?
• How do you start a header in markdown?
1 Module originally written by Shiro Kuriwaki


• What are some “plain text” editors?


• How do you start a document in .tex?
• How do you start an environment in .tex?
• How do you insert a figure in .tex?
• How do you reference a figure in .tex?
• What is a .bib file?
• Say you came across an interesting journal article. How would you want to maintain
this reference so that you can refer to its citation in all your subsequent papers?

13.1 Motivation

Statistical programming is a fast-moving field. The beta version of R was released in 2000,
ggplot2 was released in 2005, and RStudio started around 2010. Of course, some program-
ming technologies are quite “old”: (C in 1969, C++ around 1989, TeX in 1978, Linux in 1991,
Mac OS in 1984). But it is easy to feel you are falling behind in the recent developments of
programming. Today we will do a brief and rough overview of some fundamental and new
tools other than R, with the general aim of having you break out of your comfort zone so
you won’t be shut out from learning these tools in the future.

13.2 Markdown

Markdown is the text we have been using throughout this course! At its core markdown
is just plain text. Plain text does not have any formatting embedded in it. Instead, the
formatting is coded up as text. Markdown is not a WYSIWYG (What you see is what
you get) text editor like Microsoft Word or Google Docs. This will mean that you need to
explicitly code your formatting (e.g. **text** for bold) rather than hitting Command+B and
making your text look bold on your own computer.

Markdown is known as a “light-weight” editor, which means that it is relatively easy to


write code that will compile. It is quick and easy and satisfies most presentation purposes;
you might want to try LaTeX for more involved papers.

13.2.1 markdown commands

For italic and bold, use either the asterisks or the underlines,

*italic* **bold**
_italic_ __bold__

And for headers use the hash symbols,

# Main Header
## Sub-headers

Figure 13.1: How Rmds become PDFs or HTMLs

13.2.2 your own markdown

RStudio makes it easy to compile your very first markdown file by giving you templates.
Go to New > R Markdown, pick a document and click Ok. This will give you a skeleton of
a document you can compile – or “knit”.
Rmd is actually a slight modification of real markdown. It is a type of file that R reads and
turns into a proper md file. Then, it uses a document-conversion called pandoc to compile
your md into documents like PDF or HTML.
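To see what this looks like under the hood, here is a minimal, hypothetical Rmd skeleton you could type yourself – the YAML header between the --- lines sets the title and output format, and the chunk fenced by ```{r} and ``` holds R code that runs when you knit:

---
title: "My first R Markdown document"
author: "Your Name"
output: pdf_document
---

# A section header

Some *italic* and **bold** text.

```{r}
summary(cars)
```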

13.2.3 A note on plain-text editors

Multiple programs exist where you can edit plain text (roughly speaking, text that is not
WYSIWYG).
• RStudio (especially for R-related links)
• TeXMaker, TeXShop (especially for TeX)
• emacs, aquamacs (general)
• vim (general)
• Sublime Text (general)
Each has their own keyboard shortcuts and special features. You can browse a couple and
see which one(s) you like.

13.3 LaTeX
LaTeX is a typesetting program. You’d engage with LaTeX much like you engage with your
R code. You will interact with LaTeX in a text editor, writing code which will be
interpreted by the LaTeX compiler and finally parsed to form your final PDF.

13.3.1 compile online

1. Go to https://www.overleaf.com
2. Scroll down and go to “CREATE A NEW PAPER” if you don’t have an account.
3. Let’s discuss the default template.
4. Make a new document, and set it as your main document. Then type in the Minimal
Working Example (MWE):

\documentclass{article}
\begin{document}
Hello World
\end{document}

13.3.2 compile your first LaTeX document locally

LaTeX is a very stable system, and few changes to it have been made since the 1990s. The
main benefit: better control over how your papers will look; better methods for writing
equations or making tables; overall pleasing aesthetic.
1. Open a plain text editor. Then type in the MWE
\documentclass{article}
\begin{document}
Hello World
\end{document}

2. Save this as hello_world.tex. Make sure you get the file extension right.
3. Open this in your “LaTeX” editor. This can be TeXMaker, Aquamacs, etc.
4. Go through the click/dropdown interface and click compile.

13.3.3 main LaTeX commands

LaTeX can cover most of your typesetting needs, from clean equations to intricate diagrams.
Some main commands you'll be using are below, and a very concise cheat sheet is here:
https://wch.github.io/latexsheet/latexsheet.pdf
Most involved features require that you begin a specific “environment” for that feature,
clearly demarcating them by the notation \begin{figure} and then \end{figure}, e.g. in
the case of figures.
\begin{figure}
\includegraphics{histogram.pdf}
\end{figure}
where histogram.pdf is a path to one of your files.
Notice that each line starts with a backslash \ – in LaTeX this is the symbol to run a
command.
The following syntax at the endpoints is shorthand for math equations.
\[\int x^2 dx\]

this compiles the math expression $\int x^2 \, dx$.2

The align environment is useful to align your multi-line math, for example.
2 Enclosing with $$ instead of \[ also has the same effect, so you may see it too. But this is now discouraged
due to its inflexibility.

\begin{align}
P(A \mid B) &= \frac{P(A \cap B)}{P(B)}\\
&= \frac{P(B \mid A)P(A)}{P(B)}
\end{align}

$$\begin{aligned}
P(A \mid B) &= \frac{P(A \cap B)}{P(B)} \qquad (13.1)\\
&= \frac{P(B \mid A)\,P(A)}{P(B)} \qquad (13.2)
\end{aligned}$$

Regression tables should be outputted as .tex files with packages like xtable and
stargazer, and then called into LaTeX by \input{regression_table.tex} where
regression_table.tex is the path to your regression output.
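As a rough sketch of that workflow (the model here is arbitrary and regression_table.tex is just a placeholder file name), the R side might look like:

library(stargazer)
fit <- lm(mpg ~ wt + hp, data = mtcars)       # any fitted model
# write a LaTeX-formatted regression table to a .tex file
stargazer(fit, out = "regression_table.tex")

and the .tex side is simply the \input{regression_table.tex} line shown above.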
Figures and equations should be labelled with a tag (e.g. \label{tab:regression}) so
that you can refer to them later by that tag, as in Table \ref{tab:regression}, instead of
hard-coding Table 2.
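For instance, extending the earlier figure environment (the file histogram.pdf and the label fig:histogram are just illustrative names):

\begin{figure}
  \includegraphics{histogram.pdf}
  \caption{A histogram of the outcome}
  \label{fig:histogram}
\end{figure}

As Figure \ref{fig:histogram} shows, ...

LaTeX then fills in the figure number for you, renumbering automatically if the figure moves.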
For some LaTeX commands you might need to load a separate package that someone else
has written. Do this in your preamble (i.e. before \begin{document}):
\usepackage[options]{package}
where package is the name of the package and options are options specific to the package.

Further Guides

For a more comprehensive listing of LaTeX commands, Mayya Komisarchik has a great
tutorial set of folders: https://scholar.harvard.edu/mkomisarchik/tutorials-0
There is a version of LaTeX called Beamer, which is a popular way of making a slideshow.
Slides in markdown is also a competitor. The language of Beamer is the same as LaTeX
but has some special functions for slides.

13.4 BibTeX

BibTeX is a reference system for bibliographical texts. We have a .bib file separately on our
computer. This is also a plain text file, but it encodes bibliographical resources with special
syntax so that a program can rearrange parts accordingly for different citation systems.

13.4.1 what is a .bib file?

For example, here is the Nunn and Wantchekon article entry in .bib form.

@article{nunn2011slave,
title={The Slave Trade and the Origins of Mistrust in Africa},
author={Nunn, Nathan and Wantchekon, Leonard},
journal={American Economic Review},
volume={101},
number={7},
pages={3221--3252},
year={2011}
}
The first entry, nunn2011slave, is the citation key – “pick your favorite” name for your
reference system. The other slots in this @article entry refer to specific pieces of
bibliographical information.

13.4.2 what does LaTeX do with .bib files?

Now, in LaTeX, if you type


\textcite{nunn2011slave} argue that current variation in the trust among citizens of African co

as part of your text, then when the .tex file is compiled the PDF shows the rendered citation
in whatever citation style (APSA, APA, Chicago) you pre-specified!
Also at the end of your paper you will have a bibliography with entries ordered and formatted
in the appropriate citation.
This is a much less frustrating way of keeping track of your references – no need to hand-edit
the bibliography's formatting to conform to citation rules (which biblatex already knows) and
no need to update your bibliography as you add and drop references (biblatex will only show
entries that are used in the main text).
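One common way to wire this up is with the biblatex package; as a sketch (the style option and the file name refs.bib are just illustrative), your preamble and document might contain:

\usepackage[style=authoryear]{biblatex}
\addbibresource{refs.bib}

% ... in the body ...
\textcite{nunn2011slave} argue that ...

% ... at the end of the document ...
\printbibliography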

13.4.3 stocking up on your .bib files

You should keep your own .bib file that has all your bibliographical resources. Storing
entries is cheap (does not take much memory), so it is fine to keep all your references in one
place (but you’ll want to make a new one for collaborative projects where multiple people
will compile a .tex file).
For example, Gary’s BibTeX file is here: https://github.com/iqss-research/gkbibtex/blob/master/gk.bib
Citation management software (Mendeley or Zotero) automatically generates .bib entries
from your library of PDFs for you, provided you have the bibliography attributes right.

Exercise
Create a LaTeX document for a hypothetical research paper on your laptop and, once you’ve
verified it compiles into a PDF, come show it to either one of the instructors.
You can also use overleaf if you have preference for a cloud-based system. But don’t swallow
the built-in templates without understanding or testing them.
Each student will have slightly different substantive interests, so we won’t impose much of
a standard. But at a minimum, the LaTeX document should have:
• A title, author, date, and abstract
• Sections
• Italics and boldface
• A figure with a caption and in-text reference to it.
Depending on your subfield or interests, try to implement some of the following:
• A bibliographical reference drawing from a separate .bib file
• A table
• A math expression
• A different font
• Different page margins
• Different line spacing

Concluding the Prefresher


Math may not be the perfect tool for every aspiring political scientist, but hopefully it was
useful background to have at the least:
Historians think this totally meaningless and nonsensical statistic is the product of
an early-modern epistemological shift in which numbers and quantifiable data became
revered above other kinds of knowledge as the most useful and credible form of truth
https://t.co/wVFyAQGxEv
— Gina Anne Tam, May 29, 2018

But we should be aware that too much slant towards math and programming can miss the
point:
To be clear, PhD training in Econ (first year) is often a disaster– like how to prove the
Central Limit Theorem (the LeBron James of Statistics) with polar-coordinates. This is
mostly a way to demoralize actual economists and select a bunch of unimaginative math
jocks.
— Amitabh Chandra, August 14, 2018
Keep on learning, trying new techniques to improve your work, and learn from others!
What #rstats tricks did it take you way too long to learn? One of mine is using readRDS
and saveRDS instead of repeatedly loading from CSV
— Emily Riederer, August 19, 2017

Your Feedback Matters

Please tell us how we can improve the Prefresher: The Prefresher is a work in progress, with
material mainly driven by graduate students. Please tell us how we should change (or not
change) each of its elements:
https://harvard.az1.qualtrics.com/jfe/form/SV_esbzN8ZFAOPTqiV
Chapter 14

Text1

Where are we? Where are we headed?


Up till now, you should have covered:
• Loading in data;
• R notation;
• Matrix algebra.

14.1 Review
• " and ' are usually equivalent.
• <- and = are usually interchangeable.2 (x <- 3 is equivalent to x = 3, although the
former is preferred because it explicitly states the assignment).
• Use ( ) when you are giving input to a function:
# my_results <- FunctionName(FunctionInputs)

note `c(1,2,3)` is inputting three numbers in the function `c`


• Use { } when you are defining a function or writing a for loop:
#function
MyFunction <- function(InputMatrix){
TempMat <- InputMatrix
for(i in 1:5){
TempMat <- t(TempMat) %*% TempMat / 10
}
return( TempMat )
}
1 Module originally written by Connor Jerzak
2 Only equal signs are allowed to define the values of a function's arguments


myMat <- matrix(rnorm(100*5), nrow = 100, ncol = 5)


print( MyFunction(myMat) )

## [,1] [,2] [,3] [,4] [,5]


## [1,] 342.3602 196.1668 856.7638 -732.7517 173.1954
## [2,] 196.1668 515.3176 762.8554 -277.1625 299.6710
## [3,] 856.7638 762.8554 2697.1230 -1868.8323 461.6741
## [4,] -732.7517 -277.1625 -1868.8323 1678.3580 -264.6936
## [5,] 173.1954 299.6710 461.6741 -264.6936 219.0823
# loop
x <- c()
for(i in 1:20){
x[i] <- i
}
print(x)

## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

14.2 Goals for today

Today, we will learn more about using text data. Our objectives are:
• Reading and writing in text in R.
• To learn how to use paste and sprintf;
• To learn how to use regular expressions;
• To learn about other tools for representing + analyzing text in R.

14.3 Reading and writing text in R

• To read in a text file, use readLines


readLines("~/Downloads/Carboxylic acid - Wikipedia.html")
• To write a text file, use:
write.table(my_string_vector, "~/mydata.txt", sep="\t")

14.4 paste() and sprintf()

paste and sprintf are useful commands in text processing, such as for automatically naming
files or automatically performing a series of commands over a subset of your data. Making
tables will also often require these commands.
Paste concatenates vectors together.

#use collapse for inputs of length > 1


my_string <- c("Not", "one", "could", "equal")
paste(my_string, collapse = " ")

## [1] "Not one could equal"


#use sep for inputs of length == 1
paste("Not", "one", "could", "equal", sep = " ")

## [1] "Not one could equal"


For more sophisticated concatenation, use sprintf. This is very useful for automatically
making tables.
sprintf("Coefficient for %s: %.3f (%.2f)", "Gender", 1.52324, 0.03143)

## [1] "Coefficient for Gender: 1.523 (0.03)"


#%s is replaced by a character string
#%.3f is replaced by a floating point digit with 3 decimal places
#%.2f is replaced by a floating point digit with 2 decimal places

14.5 Regular expressions

A regular expression is a special text string for describing a search pattern. They are most
often used in functions for detecting, locating, and replacing desired text in a corpus.
Use cases:
1. TEXT PARSING. E.g. I have 10000 congressional speeches. Find all those which
mention Iran.
2. WEB SCRAPING. E.g. Parse html code in order to extract research information from
an online table.
3. CLEANING DATA. E.g. After loading in a dataset, we might need to remove mistakes
from the dataset, or subset the data using regular expression tools.
Example in R. Extract the tweet mentioning Indonesia.
s1 <- "If only Bradley's arm was longer. RT"
s2 <- "Share our love in Indonesia and in the World. RT if you agree."
my_string <- c(s1, s2)
grepl(my_string, pattern = "Indonesia")

## [1] FALSE TRUE


my_string[ grepl(my_string, pattern = "Indonesia")]

## [1] "Share our love in Indonesia and in the World. RT if you agree."
Key point: Many R commands use regular expressions. See ?grepl. Assume that x is a
character vector and that pattern is the target pattern. In the earlier example, x could

have been something like my_string and pattern would have been “Indonesia”. Here are
other key uses:
1. DETECT PATTERNS. grepl(pattern, x) goes through all the entries of x and
returns a vector of TRUE and FALSE values of the same size as x. It will return
a TRUE whenever that string entry has the target pattern, and FALSE whenever it
doesn’t.
2. REPLACE PATTERNS. gsub(pattern, replacement, x) goes through all the entries
of x and replaces the pattern with replacement.
gsub(x = my_string,
pattern = "o",
replacement = "AAAA")

## [1] "If AAAAnly Bradley's arm was lAAAAnger. RT"


## [2] "Share AAAAur lAAAAve in IndAAAAnesia and in the WAAAArld. RT if yAAAAu agree."
3. LOCATE PATTERNS. regexpr(pattern, text) goes through each element of the
character string. It returns a vector of the same length, with the entries of the vector
corresponding to the location of the first pattern match, or a -1 if no match was
obtained.
regex_object <- regexpr(pattern = "was", text = my_string)
attr(regex_object, "match.length")

## [1] 3 -1
attr(regex_object, "useBytes")

## [1] TRUE
regexpr(pattern = "was", text = my_string)[1]

## [1] 23
regexpr(pattern = "was", text = my_string)[2]

## [1] -1
Seems simple? The problem: the patterns can get pretty complex!

14.5.1 Character classes

Some types of symbols stand in for something more complex, rather than being taken literally.
[[:digit:]] Matches with all digits.
[[:lower:]] Matches with lower case letters.
[[:alpha:]] Matches with all alphabetic characters.
[[:punct:]] Matches with all punctuation characters.
[[:cntrl:]] Matches with “control” characters such as \n, \r, etc.

Example in R:
my_string <- "Do you think that 34% of apples are red?"
gsub(my_string, pattern = "[[:digit:]]", replace ="DIGIT")

## [1] "Do you think that DIGITDIGIT% of apples are red?"


gsub(my_string, pattern = "[[:alpha:]]", replace ="")

## [1] " 34% ?"

14.5.2 Special Characters.

Certain characters (such as ., *, \) have special meaning in the regular expressions frame-
work (they are used to form conditional patterns as discussed below). Thus, when we want
our pattern to explicitly include those characters as characters, we must “escape” them by
using \ or encoding them in \Q…\E.
Example in R:
my_string <- "Do *really* think he will win?"
gsub(my_string, pattern = "\\*", replace ="")

## [1] "Do really think he will win?"


my_string <- "Now be brave! \n Dread what comrades say of you here in combat! "
gsub(my_string, pattern = "\\\n", replace ="")

## [1] "Now be brave! Dread what comrades say of you here in combat! "

14.5.3 Conditional patterns

[] The target characters to match are located between the brackets. For example, [aAbB]
will match with the characters a, A, b, B.
[^...] Matches with everything except the material between the brackets. For example,
[^aAbB] will match with everything but the characters a, A, b, B.
(?=) Lookahead – match something that IS followed by the pattern.
(?!) Negative lookahead — match something that is NOT followed by the pattern.
(?<=) Lookbehind – match with something that follows the pattern.
my_string <- "Do you think that 34%of the 23%of apples are red?"
gsub(my_string, pattern = "(?<=%)", replace = " ", perl = TRUE)

## [1] "Do you think that 34% of the 23% of apples are red?"
my_string <- c("legislative1_term1.png",
"legislative1_term1.pdf",
"legislative1_term2.png",
"legislative1_term2.pdf",

"term2_presidential1.png",
"presidential1.png",
"presidential1_term2.png",
"presidential1_term1.pdf",
"presidential1_term2.pdf")

grepl(my_string, pattern = "^(?!presidential1).*\\.png", perl = TRUE)

## [1] TRUE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE
• Indicates which file names don’t start with presidential1 but do end in .png
• ^ indicates that the pattern should start at the beginning of the string.
• ?! indicates negative lookahead – we keep only strings whose beginning is NOT followed by
presidential1, while still meeting the subsequent conditions. (see below)
• The first . indicates that, following the negative lookahead, there can be any charac-
ters and the * says that it doesn’t matter how many. Note that we have to escape the
. in .png. (by writing \\. instead of just .)
You will have the chance to try out some regular expressions for yourself at the end!

14.6 Representing Text


In courses and research, we often want to analyze text, to extract meaning out of it. One
of the key decisions we need to make is how to represent the text as numbers. Once the
text is represented numerically, we can then apply a host of statistical and machine learning
methods to it. Those methods are discussed more in the Gov methods sequence (Gov
2000-2003). Here’s a summary of the decisions you must make:
1. WHICH TEXT TO USE? Which text do I want to analyze? What is my universe of
documents?
2. HOW TO REPRESENT THE TEXT NUMERICALLY? How do I use numbers to
represent different things about the text?
3. HOW TO ANALYZE THE NUMERICAL REPRESENTATION? How do I extract
meaning out of the numerical representation?
Representing text numerically.
1. Document term matrix. The document term matrix (DTM) is a common method for
representing text. The DTM is a matrix. Each row of this matrix corresponds to a
document; each column corresponds to a word. It is often useful to look at summary
statistics such as the percentage of speeches in which a Democratic lawmaker used the
word “inequality” compared to a Republican; the DTM would be very helpful for this
and other tasks.
doc1 <- "Rage---Goddess, sing the rage of Peleus’ son Achilles,
murderous, doomed, that cost the Achaeans countless losses,
hurling down to the House of Death so many sturdy souls,
great fighters’ souls."
doc2 <- "And fate? No one alive has ever escaped it,

neither brave man nor coward, I tell you,


it's born with us the day that we are born."
doc3 <- "Many cities of men he saw and learned their minds,
many pains he suffered, heartsick on the open sea,
fighting to save his life and bring his comrades home."

DocVec <- c(doc1, doc2, doc3)

Now we can use utility functions in the tm package:


library(tm)
DocCorpus <- Corpus(VectorSource(DocVec) )
DTM1 <- inspect( DocumentTermMatrix(DocCorpus) )

Consider the effect of different “pre-processing” choices on the resulting DTM!


DocVec <- tolower(DocVec)
DocVec <- gsub(DocVec, pattern ="[[:punct:]]", replace = " ")
DocVec <- gsub(DocVec, pattern ="[[:cntrl:]]", replace = " ")
DocCorpus <- Corpus(VectorSource(DocVec) )
DTM2 <- inspect(DocumentTermMatrix(DocCorpus,
control = list(stopwords = TRUE, stemming = TRUE)))

Stemming is the process of reducing inflected/derived words to their word stem or base
(e.g. stemming, stemmed, stemmer –> stem*)

14.7 Important packages for parsing text

1. rvest – Useful for downloading and manipulating HTML and XML.


2. tm – Useful for converting text into a numerical representation (forming DTMs).
3. stringr – Useful for string parsing.

Exercises

Figure out why this command does what it does:


sprintf("%s of spontaneous events are %s in the mind.
Really, %.2f?",
"15.03322123", "puzzles", 15.03322123)

## [1] "15.03322123 of spontaneous events are puzzles in the mind. \n Really, 15.03?"

Why does this command not work?


try(sprintf("%s of spontaneous events are %s in the mind. Really, %.2f?",
"15.03322123", "puzzles", "15.03322123" ), TRUE)

Using grepl, these materials, Google, and your friends, describe what the following com-
mand does. What changes when value = FALSE?
grep('\'',
  c("To dare is to lose one's footing momentarily.", "To not dare is to lose oneself."), value = TRUE)

## [1] "To dare is to lose one's footing momentarily."

Write code to automatically extract the file names that DO start with presidential and
DO end in .pdf
my_string <- c("legislative1_term1.png",
"legislative1_term1.pdf",
"legislative1_term2.png",
"legislative1_term2.pdf",
"term2_presidential1.png",
"presidential1.png",
"presidential1_term2.png",
"presidential1_term1.pdf",
"presidential1_term2.pdf")

Using the same string as in the above, write code to automatically extract the file names
that end in .pdf and that contain the text term2.
# Your code here

Combine these two strings into a single string separated by a “-”. Desired output: “The
carbonyl group in aldehydes and ketones is an oxygen analog of the carbon–carbon double
bond.”

string1 <- "The carbonyl group in aldehydes and ketones


is an oxygen analog of the carbon"
string2 <- "–carbon double bond."

Challenge problem! Download this webpage https://en.wikipedia.org/wiki/Odyssey


• Read the html file into your R workspace.
• Remove all of the html tags (you may need Google to help with this one).
• Remove all punctuation.
• Make all the characters lower case.
• Do this same process with this webpage (https://en.wikipedia.org/wiki/Iliad).
• Form a document term matrix from the two resulting text strings.
# Your code here
Chapter 15

Command-line, git1

15.1 Where are we? Where are we headed?

Up till now, you should have covered:


• Statistical Programming in R
In conjunction with the markdown/LaTeX chapter, which is mostly used for typesetting
and presentation, here we’ll introduce the command-line and git, more used for software
extensions and version control

15.2 Check your understanding

Check if you have an idea of how you might code the following tasks:
• What is a GUI?
• What do the following commands stand for in shell: ls (or dir in Windows), cd, rm,
mv (or move in windows), cp (or copy in Windows).
• What is the difference between a relative path and an absolute path?
• What paths do these refer to in shell/terminal: ~/, ., ..
• What is a repository in github?
• What does it mean to “clone” a repository?

15.3 command-line

Elementary programming operations are done on the command-line, or by entering commands
into your computer. This is different from a UI or GUI – graphical user-interface –
which are interfaces that allow you to click buttons and enter commands in more readable
1 Module originally written by Shiro Kuriwaki


form. Although there are good enough GUIs for most of your needs, you still might need to
go under the hood sometimes and run a command.

15.3.1 command-line commands

Open up Terminal in a Mac. (Command Prompt in Windows)


Running this command in a Mac (dir in Windows) should show you a list of all files in the
directory that you are currently in.
ls

## 01_warmup.Rmd
## 02_linear-algebra.Rmd
## 03_functions.Rmd
## 04_limits.Rmd
## 05_calculus.Rmd
## 06_optimization.Rmd
## 07_probability.Rmd
## 11_data-handling_counting.Rmd
## 12_matricies-manipulation.Rmd
## 13_visualization.Rmd
## 14_functions_obj_loops.Rmd
## 15_project-dempeace.Rmd
## 16_simulation.Rmd
## 17_non-wysiwyg.Rmd
## 18_text.Rmd
## 19_command-line_git.Rmd
## 21_solutions-warmup.Rmd
## 23_solution_programming.Rmd
## _book
## _bookdown_files
## _bookdown.yml
## _build.sh
## CODE_OF_CONDUCT.md
## CONTRIBUTING.md
## data
## _deploy.sh
## DESCRIPTION
## images
## index.Rmd
## LICENSE
## _output.yml
## preamble.tex
## prefresher_files
## prefresher.Rmd
## prefresher.Rproj
## README.md
## style.css

pwd stands for present working directory (cd in Windows)


pwd

## /home/travis/build/IQSS/prefresher
cd means change directory. You need to give it what to change your current directory to.
You can specify a name of another directory in your directory.
Or you can go up to your parent directory. The syntax for that is two periods, .. . A single
period, . , refers to the current directory.
cd ..
pwd

## /home/travis/build/IQSS
~/ stands for your home directory defined by your computer.
cd ~/
ls

## apt-get-update.log
## bin
## build
## builds
## build.sh
## filter.rb
## gopath
## otp
## perl5
## R-bin
## texlive
## virtualenv
Using .. and . are “relative” to where you are currently at. So are things like
figures/figure1.pdf, which is implicitly writing ./figures/figure1.pdf. These are
called relative paths. In contrast, /Users/shirokuriwaki/project1/figures/figure1.pdf
is an “absolute” path because it does not start from your current directory.
Relative paths are nice if you have a shared Dropbox, for example, and I had
/Users/shirokuriwaki/mathcamp but Connor’s path to the same folder is /Users/connorjerzak/mathcamp.
To run the same code in mathcamp, we should be using relative paths that start from
“mathcamp”. Relative paths are also shorter, and they are invariant to higher-level changes
in your computer.
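As a quick illustrative sketch of the difference (these paths are made up):

cd data/input
# relative path: interpreted starting from your current directory
cd /Users/yourname/mathcamp/data/input
# absolute path: interpreted starting from the root of the file system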

15.3.2 running things via command-line

Suppose you have a simple Rscript, call it hello_world.R. This is simply a plain text file
that contains
cat("Hello World")

Then in command-line, go to the directory that contains hello_world.R and enter


Rscript hello_world.R

This should give you the output Hello World, which verifies that you “executed” the file
with R via the command-line.

15.3.3 why do command-line?

If you know exactly what you want to do to your files and the changes are local, then the
command-line might be faster and more sensible than navigating through a GUI. For
example, what if you wanted a single command that will run 10 R scripts successively at
once (as Gentzkow and Shapiro suggest you should do in your research)? It is tedious to run
each of your scripts on Rstudio, especially if running some take more than a few minutes.
Instead you could write a “batch” script that you can run on the terminal,
Rscript 01_read_data.R
Rscript 02_merge_data.R
Rscript 03_run_regressions.R
Rscript 04_make_graphs.R
Rscript 05_maketable.R

Then run this single file, call it run_all_Rscripts.sh, on your terminal as


sh run_all_Rscripts.sh

On the other hand, command-line prompts may require more keystrokes, and are also less
intuitive than a good GUI. It can also be dangerous for beginners, because it can allow you
to make large irreversible changes inadvertently. For example, removing a file (rm) has no
“Undo” feature.

15.4 git
Git is a tool for version control. It comes pre-installed on Macs; you will probably need to
install it yourself on Windows.

15.4.1 why version control?

All version control software should be built to

• preserve all snapshots of your work,
• catalog them in such a way that you can refer back or even revert your files to a past
snapshot, and
• make it easy to see exactly which parts of your files you changed between directories.
Further, git is most commonly used for collaborative work.
• maintains “branches”, or parallel universes of your files that people can switch back
and forth on, doing version control on each one

• makes it easy to “merge” a sub-branch to a master branch when it is ready.


Note that Dropbox is useful for collaborative work too. But the added value of git’s branches
is that people can make different changes simultaneously on their computers and merge them
to the master branch later. In Dropbox, there is only one copy of each thing so simultaneous
editing is not possible.

15.4.2 open-source code at your fingertips

Some links to check out:


• https://github.com/tidyverse/dplyr
• https://github.com/apple/swift
• https://github.com/kosukeimai/qss
GitHub (https://github.com) is the GUI to git. Making an account there is free. Making
an account will allow you to be a part of the collaborative programming community. It will
also allow you to “fork” other people’s “repositories”. “Forking” is making your own copy
of the project that forks off from the master project at a point in time. A “repository” is
simply the name of your main project directory.
“cloning” someone else’s repository is similar to forking – it gives you your own copy.

15.4.3 commands in git

As you might have noticed from all the quoted terms, git uses a lot of its own terms that are
not intuitive and hard to remember at first. The nuts and bolts of maintaining your version
control further requires “adding”, “committing”, and “push”ing, sometimes “pull”ing.
The tutorial https://try.github.io/ is quite good. You’d want to have familiarity with
command-line to fully understand this and use it in your work.
RStudio Projects has a great git GUI as well.
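As a rough sketch of a day-to-day cycle on the command-line (the repository URL and file name here are hypothetical):

# get your own copy of an existing repository
git clone https://github.com/your-username/your-project.git
cd your-project
# ... edit analysis.R ...
git add analysis.R                     # stage the changed file
git commit -m "clean the merge step"   # record a snapshot with a message
git pull                               # fetch and merge collaborators' changes
git push                               # publish your commits to GitHub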

15.4.4 is git worth it?

While git is a powerful tool, you may choose to not use it for everything because
• git is mainly for code, not data. It has a fairly strict limit on the size of the datasets
you can track.
• your collaborators might want to work with Dropbox
• unless you get a paid account, all your repositories will be public.
Part III

Solutions

Solutions to Warmup Questions

Linear Algebra

Vectors
   
Define the vectors $u = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}$, $v = \begin{pmatrix} 4 \\ 5 \\ 6 \end{pmatrix}$, and the scalar $c = 2$.

1. $u + v = \begin{pmatrix} 5 \\ 7 \\ 9 \end{pmatrix}$

2. $cv = \begin{pmatrix} 8 \\ 10 \\ 12 \end{pmatrix}$

3. $u \cdot v = 1(4) + 2(5) + 3(6) = 32$
If you are having trouble with these problems, please review Section 1.1 “Working with
Vectors” in Chapter 1.
Are the following sets of vectors linearly independent?
1. $u = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$, $v = \begin{pmatrix} 2 \\ 4 \end{pmatrix}$

⇝ No:
$$2u = \begin{pmatrix} 2 \\ 4 \end{pmatrix} = v$$
so infinitely many linear combinations of $u$ and $v$ that amount to 0 exist.

2. $u = \begin{pmatrix} 1 \\ 2 \\ 5 \end{pmatrix}$, $v = \begin{pmatrix} 3 \\ 7 \\ 9 \end{pmatrix}$

⇝ Yes: we cannot find a linear combination of these two vectors that would amount to zero.

3. $a = \begin{pmatrix} 2 \\ -1 \\ 1 \end{pmatrix}$, $b = \begin{pmatrix} 3 \\ -4 \\ -2 \end{pmatrix}$, $c = \begin{pmatrix} 5 \\ -10 \\ -8 \end{pmatrix}$

⇝ No: After playing around with some numbers, we can find that
$$-2a = \begin{pmatrix} -4 \\ 2 \\ -2 \end{pmatrix}, \quad 3b = \begin{pmatrix} 9 \\ -12 \\ -6 \end{pmatrix}, \quad -1c = \begin{pmatrix} -5 \\ 10 \\ 8 \end{pmatrix}$$

So
$$-2a + 3b - c = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$

i.e., a linear combination of these three vectors that would amount to zero exists.
If you are having trouble with these problems, please review Section 1.2.

Matrices
 
$$A = \begin{pmatrix} 7 & 5 & 1 \\ 11 & 9 & 3 \\ 2 & 14 & 21 \\ 4 & 1 & 5 \end{pmatrix}$$

What is the dimensionality of matrix A? $4 \times 3$

What is the element $a_{23}$ of A? 3

Given that
$$B = \begin{pmatrix} 1 & 2 & 8 \\ 3 & 9 & 11 \\ 4 & 7 & 5 \\ 5 & 1 & 9 \end{pmatrix}$$

$$A + B = \begin{pmatrix} 8 & 7 & 9 \\ 14 & 18 & 14 \\ 6 & 21 & 26 \\ 9 & 2 & 14 \end{pmatrix}$$

Given that
$$C = \begin{pmatrix} 1 & 2 & 8 \\ 3 & 9 & 11 \\ 4 & 7 & 5 \end{pmatrix}$$

A + C = No solution, matrices non-conformable

Given that $c = 2$,
$$cA = \begin{pmatrix} 14 & 10 & 2 \\ 22 & 18 & 6 \\ 4 & 28 & 42 \\ 8 & 2 & 10 \end{pmatrix}$$

If you are having trouble with these problems, please review Section 1.3.

Operations

Summation

Simplify the following



1. $\sum_{i=1}^{3} i = 1 + 2 + 3 = 6$

2. $\sum_{k=1}^{3} (3k + 2) = 3\sum_{k=1}^{3} k + \sum_{k=1}^{3} 2 = 3 \times 6 + 3 \times 2 = 24$

3. $\sum_{i=1}^{4} (3k + i + 2) = 3\sum_{i=1}^{4} k + \sum_{i=1}^{4} i + \sum_{i=1}^{4} 2 = 12k + 10 + 8 = 12k + 18$

Products

1. $\prod_{i=1}^{3} i = 1 \cdot 2 \cdot 3 = 6$

2. $\prod_{k=1}^{3} (3k + 2) = (3 + 2) \cdot (6 + 2) \cdot (9 + 2) = 440$

To review this material, please see Section 2.1.

Logs and exponents

Simplify the following


1. $4^2 = 16$
2. $4^2 2^3 = 2^{2 \cdot 2} 2^3 = 2^{4+3} = 128$
3. $\log_{10} 100 = \log_{10} 10^2 = 2$
4. $\log_2 4 = \log_2 2^2 = 2$
5. when log is the natural log, $\log e = \log_e e^1 = 1$
6. when $a, b, c$ are each constants, $e^a e^b e^c = e^{a+b+c}$
7. $\log 0$ = undefined – no exponentiation of anything will result in 0.
8. $e^0 = 1$ – any number raised to the 0 is always 1.
9. $e^1 = e$ – any number raised to the 1 is always itself.
10. $\log e^2 = \log_e e^2 = 2$

To review this material, please see Section 2.3

Limits

Find the limit of the following.

1. $\lim_{x \to 2} (x - 1) = 1$

2. $\lim_{x \to 2} \frac{(x-2)(x-1)}{(x-2)} = 1$, though note that the original function $\frac{(x-2)(x-1)}{(x-2)}$ would have been
undefined at $x = 2$ because of a divide by zero problem; otherwise it would have been
equal to $x - 1$.

3. $\lim_{x \to 2} \frac{x^2 - 3x + 2}{x - 2} = 1$, same as above.

To review this material please see Section 3.3

Calculus

For each of the following functions $f(x)$, find the derivative $f'(x)$ or $\frac{d}{dx} f(x)$.

1. $f(x) = c$, $f'(x) = 0$
2. $f(x) = x$, $f'(x) = 1$
3. $f(x) = x^2$, $f'(x) = 2x$
4. $f(x) = x^3$, $f'(x) = 3x^2$
5. $f(x) = 3x^2 + 2x^{1/3}$, $f'(x) = 6x + \frac{2}{3} x^{-2/3}$
6. $f(x) = (x^3)(2x^4)$, $f'(x) = \frac{d}{dx} 2x^7 = 14x^6$

For a review, please see Section 4.1 - 4.2

Optimization

For each of the following functions $f(x)$, does a maximum and minimum exist in the domain
$x \in \mathbb{R}$? If so, what are those values and at which values of $x$?

1. $f(x) = x$ ⇝ neither exists.
2. $f(x) = x^2$ ⇝ a minimum $f(x) = 0$ exists at $x = 0$, but not a maximum.
3. $f(x) = -(x - 2)^2$ ⇝ a maximum $f(x) = 0$ exists at $x = 2$, but not a minimum.

If you are stuck, please try sketching out a picture of each of the functions.

Probability
1. If there are 12 cards, numbered 1 to 12, and 4 cards are chosen, $\binom{12}{4} = \frac{12 \cdot 11 \cdot 10 \cdot 9}{4!} = 495$
possible hands exist (unordered, without replacement).
2. Let $A = \{1, 3, 5, 7, 8\}$ and $B = \{2, 4, 7, 8, 12, 13\}$. Then $A \cup B = \{1, 2, 3, 4, 5, 7, 8, 12, 13\}$ and
$A \cap B = \{7, 8\}$. If $A$ is a subset of the Sample Space $S = \{1, 2, 3, 4, 5, 6, 7, 8, 9, 10\}$,
then the complement $A^C = \{2, 4, 6, 9, 10\}$.
3. If we roll two fair dice, what is the probability that their sum would be 11? ⇝ $\frac{1}{18}$
4. If we roll two fair dice, what is the probability that their sum would be 12? ⇝ $\frac{1}{36}$.
There are two independent dice, so $6^2 = 36$ options in total. While the previous
question had two possibilities for a sum of 11 (5,6 and 6,5), there is only one possibility
out of 36 for a sum of 12 (6,6).
For a review, please see Sections 6.2 - 6.3
Suggested Programming Solutions

library(dplyr)
library(readr)
library(ggplot2)
library(ggrepel)
library(forcats)
library(scales)

15.5 Chapter 9: Visualization

1 State Proportions

cen10 <- readRDS("data/input/usc2010_001percent.Rds")

Group by state, noting that the mean of a set of logicals is a mean of 1s (TRUE) and 0s
(FALSE).
grp_st <- cen10 %>%
group_by(state) %>%
summarize(prop = mean(age >= 65)) %>%
arrange(prop) %>%
mutate(state = as_factor(state))

Plot points
ggplot(grp_st, aes(x = state, y = prop)) +
geom_point() +
coord_flip() +
scale_y_continuous(labels = percent_format(accuracy = 1)) + # use the scales package to format
labs(
y = "Proportion Senior",
x = "",


caption = "Source: 2010 Census sample"


)

(Figure: dot plot of the proportion of senior (65+) residents by state, with states on the vertical axis ordered by proportion and the horizontal axis "Proportion Senior" ranging from roughly 5% to 20%. Source: 2010 Census sample.)

2 Swing Justice

justices <- read_csv("data/input/justices_court-median.csv")

Keep justices who are in the dataset in 2016,


in_2017 <- justices %>%
filter(term >= 2016) %>%
distinct(justice) %>% # unique values
mutate(present_2016 = 1) # keep an indicator to distinguish from rest after merge

df_indicator <- justices %>%


left_join(in_2017)

## Joining, by = "justice"
All together
ggplot(df_indicator, aes(x = term, y = idealpt, group = justice_id)) +
geom_line(aes(y = median_idealpt), color = "red", size = 2, alpha = 0.1) +
geom_line(alpha = 0.5) +

geom_line(data = filter(df_indicator, !is.na(present_2016))) +


geom_point(data = filter(df_indicator, !is.na(present_2016), term == 2018)) +
geom_text_repel(
data = filter(df_indicator, term == 2016), aes(label = justice),
nudge_x = 10,
direction = "y"
) + # labels nudged and vertical
scale_x_continuous(breaks = seq(1940, 2020, 10), limits = c(1937, 2020)) + # axis breaks
scale_y_continuous(limits = c(-5, 5)) + # axis limits
labs(
x = "SCOTUS Term",
y = "Estimated Martin-Quinn Ideal Point",
caption = "Outliers capped at -5 to 5. Red lines indicate median justice. Current justices of
) +
theme_bw()

## Warning: Removed 19 rows containing missing values (geom_path).

(Figure: line plot of estimated Martin-Quinn ideal points by SCOTUS term, 1940-2020, one line per justice; the 2016-term justices — Thomas, Alito, Gorsuch, Roberts, Kennedy, Breyer, Kagan, Ginsburg, Sotomayor — are labeled and drawn in black, and the red line marks the median justice. Axes: "SCOTUS Term" and "Estimated Martin-Quinn Ideal Point"; outliers capped at -5 to 5.)

15.6 Chapter 10: Objects and Loops

cen10 <- read_csv("data/input/usc2010_001percent.csv")


sample_acs <- read_csv("data/input/acs2015_1percent.csv")

Checkpoint #3

cen10 %>%
group_by(state) %>%
summarise(avg_age = mean(age)) %>%
arrange(desc(avg_age)) %>%
slice(1:10)

## # A tibble: 10 x 2
## state avg_age
## <chr> <dbl>
## 1 West Virginia 44.1
## 2 Maine 42.1
## 3 Florida 41.3
## 4 New Hampshire 41.2
## 5 North Dakota 41.1
## 6 Montana 40.6
## 7 Vermont 40.3
## 8 Connecticut 40.1
## 9 Wisconsin 39.9
## 10 New Mexico 39.3

Exercise 1

colnames(sample_acs)

## [1] "serial" "pernum" "hhwt"


## [4] "perwt" "state" "county_identified"
## [7] "puma" "city" "sex"
## [10] "age" "birthyr" "race"
## [13] "hispan" "educ" "citizen"
## [16] "yrnatur"
unique(sample_acs$citizen)

## [1] "Born in the US"


## [2] "US citizen by naturalization"
## [3] "Not a citizen of the US"
## [4] "Born abroad of American parent(s)"
## [5] "Born in Puerto Rico, Guam, the US Virgin Islands,or the Northern Marianas"
mean(sample_acs$citizen != "Not a citizen of the US")

## [1] 0.9419765

Exercise 3

states_of_interest <- c("California", "Massachusetts", "New Hampshire", "Washington")

for (state_i in states_of_interest) {


state_subset <- cen10 %>% filter(state == state_i)

print(state_i)

print(table(state_subset$race, state_subset$sex))
}

## [1] "California"
##
## Female Male
## American Indian or Alaska Native 21 21
## Black/Negro 127 126
## Chinese 76 65
## Japanese 15 12
## Other Asian or Pacific Islander 182 177
## Other race, nec 283 302
## Three or more major races 7 7
## Two major races 91 83
## White 1085 1083
## [1] "Massachusetts"
##
## Female Male
## American Indian or Alaska Native 4 1
## Black/Negro 21 17
## Chinese 8 7
## Japanese 1 1
## Other Asian or Pacific Islander 14 14
## Other race, nec 9 17
## Two major races 10 8
## White 272 243
## [1] "New Hampshire"
##
## Female Male
## American Indian or Alaska Native 1 0
## Black/Negro 0 1
## Chinese 0 1
## Japanese 1 0
## Other Asian or Pacific Islander 2 1
## Other race, nec 1 0
## Two major races 0 1
## White 66 63
## [1] "Washington"
##

## Female Male
## American Indian or Alaska Native 9 5
## Black/Negro 11 9
## Chinese 2 7
## Japanese 4 0
## Other Asian or Pacific Islander 28 18
## Other race, nec 19 18
## Three or more major races 0 2
## Two major races 17 16
## White 267 257

Exercise 4

race_d <- c()


state_d <- c()
proportion_d <- c()
answer <- data.frame(race_d, state_d, proportion_d)

Then
for (state in states_of_interest) {
for (race in unique(cen10$race)) {
race_state_num <- nrow(cen10[cen10$race == race & cen10$state == state, ])
state_pop <- nrow(cen10[cen10$state == state, ])
race_perc <- round(100 * (race_state_num / (state_pop)), digits = 2)
line <- data.frame(race_d = race, state_d = state, proportion_d = race_perc)
answer <- rbind(answer, line)
}
}

15.7 Chapter 11: Democratic Peace Project

Task 1: Data Input and Standardization

mid_b <- read_csv("data/input/MIDB_4.2.csv")


polity <- read_excel("data/input/p4v2017.xls")

Task 2: Data Merging

mid_y_by_y <- data_frame(ccode = numeric(),


year = numeric(),
dispute = numeric())
colnames(mid_b)

for(i in 1:nrow(mid_b)) {
x <- data_frame(ccode = mid_b$ccode[i], ## row i's country
year = mid_b$styear[i]:mid_b$endyear[i], ## sequence of years for dispute in row i
dispute = 1)## there was a dispute
mid_y_by_y <- rbind(mid_y_by_y, x)
}

merged_mid_polity <- left_join(polity,


distinct(mid_y_by_y),
by = c("ccode", "year"))

Task 3: Tabulations and Visualization

#don't include the -88, -77, -66 values in calculating the mean of polity
mean_polity_by_year <- merged_mid_polity %>% group_by(year) %>% summarise(mean_polity = mean(poli

mean_polity_by_year_ordered <- arrange(mean_polity_by_year, year)

mean_polity_by_year_mid <- merged_mid_polity %>% group_by(year, dispute) %>% summarise(mean_polit

mean_polity_by_year_mid_ordered <- arrange(mean_polity_by_year_mid, year)

mean_polity_no_mid <- mean_polity_by_year_mid_ordered %>% filter(dispute == 0)


mean_polity_yes_mid <- mean_polity_by_year_mid_ordered %>% filter(dispute == 1)

answer <- ggplot(data = mean_polity_by_year_ordered, aes(x = year, y = mean_polity)) +


geom_line() +
labs(y = "Mean Polity Score",
x = "") +
geom_vline(xintercept = c(1914, 1929, 1939, 1989, 2008), linetype = "dashed")

answer + geom_line(data =mean_polity_no_mid, aes(x = year, y = mean_polity_mid), col = "indianred

15.8 Chapter 12: Simulation

15.8.1 Census Sampling

pop <- read_csv("data/input/usc2010_001percent.csv")

## Parsed with column specification:


## cols(
## state = col_character(),
## sex = col_character(),

## age = col_double(),
## race = col_character()
## )
mean(pop$race != "White")

## [1] 0.2806517
set.seed(1669482)
samp <- sample_n(pop, 100)
mean(samp$race != "White")

## [1] 0.22
ests <- c()
set.seed(1669482)

for (i in 1:20) {
samp <- sample_n(pop, 100)
ests[i] <- mean(samp$race != "White")
}

mean(ests)
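Part (c) also asks for a histogram of the 20 sample estimates; one simple way to draw it (base R, with labels that are just suggestions) is:

hist(ests, main = "20 sample proportions", xlab = "Proportion non-White")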
pop_with_prop <- mutate(pop, propensity = ifelse(race != "White", 0.9, 1))

ests <- c()


set.seed(1669482)

for (i in 1:20) {
samp <- sample_n(pop_with_prop, 100, weight = propensity)
ests[i] <- mean(samp$race != "White")
}

mean(ests)
ests <- c()
set.seed(1669482)

for (i in 1:20) {
samp <- sample_n(pop_with_prop, 10000, weight = propensity)
ests[i] <- mean(samp$race != "White")
}

mean(ests)
