Homework 5
1. You have a dataset with observations on 1,230 individuals within the U.S., containing
the education (in years, denoted by the variable educ) of each individual, as well as
the education (in years) for each individual’s mother (denoted motheduc) and father
(denoted fatheduc). The sample correlation matrix is as follows:
Also, the sample variance of educ is 5.543 ($s^2_{educ} = 5.543$), the sample variance of
motheduc is 5.190 ($s^2_{motheduc} = 5.190$), and the sample variance of fatheduc is 10.653
($s^2_{fatheduc} = 10.653$).
(a) What is the sample covariance between educ and motheduc? (Hint: Use the
formula for sample correlation.)
(b) What are the units of your answer from part (a)?
(c) If the sample covariance between motheduc and fatheduc is 4.454, what is the
sample variance of the sum of motheduc and fatheduc? That is, defining
sumeduc = motheduc + fatheduc, what is the sample variance $s^2_{sumeduc}$?
(d) If the sample covariance between motheduc and fatheduc is 4.454, what is the
sample variance of the average of motheduc and fatheduc? That is, defining
avgeduc = (1/2)(motheduc + fatheduc), what is the sample variance $s^2_{avgeduc}$?
(e) If the sample covariance between motheduc and fatheduc is 4.454, what is the
sample variance of the difference of motheduc and fatheduc? That is, defining
diffeduc = motheduc − fatheduc, what is the sample variance $s^2_{diffeduc}$?
(f) Does it make intuitive sense that the sample variance found in part (c) is higher
than the sample variance found in part (e)? Explain why.
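The identities behind parts (a) and (c) through (e) can be checked numerically. The sketch below uses simulated data; the vectors x and y are hypothetical stand-ins, not the homework dataset, so the printed numbers are not the answers to this question. It only confirms that the sample covariance equals the sample correlation times the two sample standard deviations, and that the variance of a sum, average, or difference follows from the two variances and the covariance.

```r
# Simulated stand-in data (an assumption for illustration; not the homework dataset).
set.seed(1)
x <- rnorm(200, mean = 12, sd = 2)
y <- 0.5 * x + rnorm(200, mean = 6, sd = 2)

# Sample correlation formula rearranged: cov(x, y) = cor(x, y) * sd(x) * sd(y)
cov_from_cor <- cor(x, y) * sd(x) * sd(y)

# Variance of a sum, an average, and a difference
var_sum  <- var(x) + var(y) + 2 * cov(x, y)
var_avg  <- var_sum / 4   # var(c * z) = c^2 * var(z), with c = 1/2
var_diff <- var(x) + var(y) - 2 * cov(x, y)

# Each comparison should return TRUE (equality up to floating-point error)
all.equal(cov_from_cor, cov(x, y))
all.equal(var_sum,  var(x + y))
all.equal(var_avg,  var((x + y) / 2))
all.equal(var_diff, var(x - y))
```

The only difference between the sum and the difference is the sign on the covariance term, which is worth keeping in mind for part (f).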
2. For this question, you will use R to analyze some different stock portfolios based upon
the S&P500 dataset that we’ve used in class. To get started, make sure that you have
the sp500-monthly-returns.csv dataset within your “working directory” and then
read the dataset into the data frame sp500 as follows:
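A minimal sketch of this step, assuming the file is an ordinary comma-separated file with a header row and that it sits in the current working directory. The exact call used in class may differ; the variable csv_path and the file.exists check are illustrative additions.

```r
# Assumption: the CSV is in the working directory (check the location with getwd()).
csv_path <- "sp500-monthly-returns.csv"

if (file.exists(csv_path)) {
  # read.csv treats the first row as column names by default
  sp500 <- read.csv(csv_path)
  dim(sp500)   # number of rows (months) and columns
} else {
  message("File not found in ", getwd(), "; move the file or call setwd() first.")
}
```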
You can then use the command View(sp500) to see the data as a spreadsheet in
the upper-left window of RStudio (make sure to use an upper-case V for the View
function).
(a) First, let’s focus on the first 20 stocks that appear in the spreadsheet. Ignore
IDX, which is in the second column, so the first 20 stocks are given by the stock
tickers AAPL through APA. You can get descriptive statistics for these 20 stocks
by using the command summary(sp500[,3:22]), which “summarizes” columns
3 through 22 of the sp500 data frame. What is the sample mean for the AMD
stock?
(b) To see the sample standard deviations of each of the 20 stocks from part (a), you
can use the following command:
apply(sp500[,3:22], 2, sd)
Here, the apply function “applies” the sd function to each of the columns of
the first argument sp500[,3:22]. The second argument (equal to 2 here) is
what tells R to apply the function to columns (rather than rows).
Which stock has the highest sample standard deviation? Which stock has the
lowest sample standard deviation?
(c) Create a new variable that contains the monthly returns for a portfolio with equal
(1/2) weights on the first two stocks (AAPL, ABMD). What is the sample mean
and sample standard deviation for this two-stock portfolio?
(d) Create a new variable that contains the monthly returns for a portfolio with equal
(1/3) weights on the first three stocks (AAPL, ABMD, ABT). What is the sample
mean and sample standard deviation for this three-stock portfolio?
(e) (Harder; optional) Continue this process to get an equally weighted 4-stock
portfolio with the first 4 stocks, an equally weighted 5-stock portfolio with the first
5 stocks, and so on, through an equally weighted 20-stock portfolio with the
first 20 stocks. For each portfolio, calculate the sample mean and sample
standard deviation. Then, make two plots: (i) sample mean versus the number of
stocks (ranging from 2 to 20) in the portfolio and (ii) sample standard deviation
versus the number of stocks (ranging from 2 to 20) in the portfolio. (Hint: A
convenient function to use here is rowMeans, which creates a vector that
averages across columns of a data frame. For example, the command
portfolio <- rowMeans(sp500[,3:8]) creates a portfolio variable corresponding to an
equally weighted portfolio consisting of the first 6 stocks (columns 3 through 8).)
(f) (Even harder; even more optional) Rather than using the first 20 stocks as in
part (e), instead choose 20 stocks randomly without replacement from the full
set of stocks (which are contained in columns 3 through 268 of the data frame).
Calculate the sample average and sample standard deviation for just the equally
weighted 20-stock portfolio. Use a loop to do this a bunch of times (say, 1000
times, where you are randomly picking 20 stocks each time) and store the sample
means and sample standard deviations along the way. Plot the histogram and/or
smoothed density for the sample means. Plot the histogram and/or smoothed
density for the sample standard deviations.
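The mechanics of parts (c) through (f) can be sketched on simulated data. Everything in the block below is an assumption for illustration: the data frame fake, its 30 hypothetical stocks, and all variable names are made up, and the numbers it produces say nothing about the real S&P500 data. With the real data, one would presumably replace fake with sp500 and let the stock columns run from 3 through 268.

```r
# Simulated stand-in for sp500 (an assumption): 60 months, a date column, an
# index column, and 30 hypothetical stocks in columns 3 through 32.
set.seed(42)
n_months <- 60
n_stocks <- 30
returns <- matrix(rnorm(n_months * n_stocks, mean = 0.01, sd = 0.08),
                  nrow = n_months)
colnames(returns) <- paste0("STK", seq_len(n_stocks))
fake <- data.frame(Date = seq_len(n_months),
                   IDX = rnorm(n_months, mean = 0.01, sd = 0.04),
                   returns)

# Parts (c) and (d): equally weighted portfolios of the first two and three stocks
port2 <- rowMeans(fake[, 3:4])
port3 <- rowMeans(fake[, 3:5])
c(mean(port2), sd(port2))
c(mean(port3), sd(port3))

# Part (e): portfolios of the first k stocks, for k = 2, ..., 20
ks <- 2:20
port_mean <- sapply(ks, function(k) mean(rowMeans(fake[, 3:(k + 2)])))
port_sd   <- sapply(ks, function(k) sd(rowMeans(fake[, 3:(k + 2)])))
plot(ks, port_mean, type = "b", xlab = "Number of stocks", ylab = "Sample mean")
plot(ks, port_sd, type = "b", xlab = "Number of stocks", ylab = "Sample SD")

# Part (f): repeatedly draw 20 stock columns at random without replacement
n_reps <- 1000
stock_cols <- 3:ncol(fake)
rand_mean <- numeric(n_reps)
rand_sd <- numeric(n_reps)
for (i in seq_len(n_reps)) {
  picked <- sample(stock_cols, 20)   # sample() draws without replacement by default
  p <- rowMeans(fake[, picked])
  rand_mean[i] <- mean(p)
  rand_sd[i] <- sd(p)
}
hist(rand_mean, main = "Sample means", xlab = "Mean monthly return")
plot(density(rand_sd), main = "Sample standard deviations")
```

Preallocating rand_mean and rand_sd with numeric(n_reps) avoids growing the vectors inside the loop; sapply is used for part (e) only because each k produces a single number.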
3. A company makes widgets and receives shipments of its widgets from a supplier.
Suppose that the true probability of a defect in any given widget is 10%, and assume that
the production process is independent for every widget made. Consider picking two
widgets at random from a shipment, and let the random variable X denote the number
of defective widgets. Note that the possible outcomes are {0, 1, 2}.
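Because the two widgets are independent and share the same defect probability, X follows a binomial distribution with n = 2 trials and success probability 0.10. A minimal sketch of its probabilities in R, using the built-in dbinom function (the variable names are illustrative):

```r
# X = number of defects among 2 independent widgets, each defective with probability 0.10
p_defect <- 0.10
x_vals <- 0:2

# Binomial probabilities P(X = 0), P(X = 1), P(X = 2)
pmf <- dbinom(x_vals, size = 2, prob = p_defect)
names(pmf) <- x_vals
pmf

# The same values by hand: 0.9^2, 2 * 0.1 * 0.9, 0.1^2
by_hand <- c(0.9^2, 2 * 0.1 * 0.9, 0.1^2)

# The probabilities over all possible outcomes should sum to 1
sum(pmf)
```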