Tutorial 1
R Tutorial 1
25 October 2024
Important Instructions
• These weekly exercises are highly relevant to the group assignment.
Question 1
The task in this question is to familiarize yourself with the equity data
from the Center for Research in Security Prices (CRSP).
2. Examine the dataset using the head() and summary() functions. You should
see that the price variable contains positive and negative values. Find the
reason why price takes on negative values and solve the problem. Hint: consult
the variable descriptions tab of the CRSP Monthly Stock File on the WRDS
website.
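As a hedged illustration of this item: the CRSP variable description states that when no closing price is available, the price field holds the bid/ask average with a negative sign as a marker. The sketch below assumes a column called `price` (as in the text); the toy data and the flag column `bidask_avg` are hypothetical.

```r
# Toy data; the column name `price` follows the text, the values are made up.
crsp <- data.frame(
  permno = c(10001, 10001, 10002),
  price  = c(12.50, -12.75, 30.00)  # negative = bid/ask average, not a trade price
)

# Optionally keep a flag so the information carried by the sign is not lost.
crsp$bidask_avg <- !is.na(crsp$price) & crsp$price < 0

# The sign is only a marker, so the absolute value is the usable price.
crsp$price <- abs(crsp$price)

summary(crsp$price)
```

Taking the absolute value before computing market capitalization avoids negative firm sizes later on.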
Question 2
The purpose of this exercise is to get you to start thinking about restricting
a sample for a specific purpose. In research, filters are almost always used
to convert the raw data into a relevant sample. After filtering your data, you
will construct a value-weighted portfolio return.
1. Restrict the sample to common stocks. The variable shrcd can be used for this
purpose. More information is in the variable descriptions tab of the CRSP
Monthly Stock File on the WRDS website.
2. Restrict the sample to stocks that trade on the following exchanges: New York
Stock Exchange (NYSE), American Stock Exchange (AMEX), and National Association
of Securities Dealers Automated Quotations (NASDAQ). The variable
exchcd can be used for this purpose.
3. Calculate the value-weighted market return of this sample. Make sure you
use the correct return definition, which includes dividends and adjustments for
corporate events. In a value-weighted portfolio every stock is assigned a weight
proportional to its market capitalization. This is quite tricky, since the data for
the 31st of January contain the price and shares outstanding on the 31st of
January but the return during the month of January, so the weights must come
from the end of the previous month. Hint: Lag market value
and look up the function "weighted.mean" (part of base R's stats package; it can
be called inside dplyr's summarise). Alternatively,
construct the weights yourself.
4. Optional: Check the correlation between the U.S. market return you calculated
and the market factor available on Ken French's website.
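The filtering and weighting steps above can be sketched as follows. This is a minimal illustration in base R, not a reference solution. It assumes columns named `permno`, `date`, `shrcd`, `exchcd`, `price`, `shrout`, and `ret` (only `shrcd`, `exchcd`, and `price` are named in the text), and uses CRSP share codes 10 and 11 for common stocks and exchange codes 1, 2, 3 for NYSE, AMEX, and NASDAQ. The toy values are made up.

```r
# Toy monthly panel; all values are made up for illustration.
crsp <- data.frame(
  permno = rep(c(1, 2, 3), each = 3),
  date   = rep(as.Date(c("2020-01-31", "2020-02-29", "2020-03-31")), times = 3),
  shrcd  = rep(c(10, 11, 73), each = 3),  # stock 3 is not a common stock
  exchcd = rep(c(1, 3, 1), each = 3),
  price  = c(10, 11, 12, 50, 45, 48, 20, 21, 22),
  shrout = rep(c(1000, 200, 500), each = 3),
  ret    = c(NA, 0.10, 0.09, NA, -0.10, 0.07, NA, 0.05, 0.05)
)

# Filters: common stocks (share codes 10, 11) on NYSE/AMEX/NASDAQ (codes 1, 2, 3).
smp <- crsp[crsp$shrcd %in% c(10, 11) & crsp$exchcd %in% c(1, 2, 3), ]

# Market value and its one-month lag within each stock.
# The positional lag assumes every stock has one row per month without gaps.
smp <- smp[order(smp$permno, smp$date), ]
smp$mv <- abs(smp$price) * smp$shrout
smp$lag_mv <- ave(smp$mv, smp$permno, FUN = function(x) c(NA, head(x, -1)))

# Value-weighted return per month, weighted by beginning-of-month market value.
ok <- !is.na(smp$ret) & !is.na(smp$lag_mv)
vw <- sapply(split(smp[ok, ], smp$date[ok]),
             function(d) weighted.mean(d$ret, d$lag_mv))
vw
```

Using the lagged market value means that the January return is weighted by the December market capitalization, which is the weight an investor could actually have held at the start of the month.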
Question 3
The task in this question is to create features (characteristics) that we
will later use to predict returns with supervised learning methods.
\[ \text{ret\_1\_0} = \frac{RI_t}{RI_{t-1}} - 1 = (1 + \text{ret}_t) - 1, \tag{1} \]

\[ \text{ret\_12\_1} = \frac{RI_{t-1}}{RI_{t-12}} - 1 \tag{2} \]
2. Lag the characteristics by 1 month to prepare the data for creating portfolios.
3. Construct a portfolio that takes a long position in the stocks that are in the
top 10% of the distribution of the variable “Momentum 1-12 Months” in a
specific month. Take a short position in the stocks that are in the bottom
10% of the same distribution in the same month. We recommend that you use
the quantile function to create the cut-off points you need to allocate stocks
into different portfolios.
6. Show that this strategy delivers a positive alpha relative to the Capital Asset
Pricing Model (CAPM).[2]
[1] Note that the cumprod function (base R) does not work as intended if there are gaps in the data.
[2] Hint: you have the return of the long-short portfolio, your dependent variable, and the value-weighted market return from Question 2, your independent variable.
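The momentum sort and the CAPM regression described in this question can be sketched on simulated data. Everything below is illustrative: the panel is random, the column names `month_id`, `ri`, and `lag_ret_12_1` and the helper `lag_k` are hypothetical, portfolios are equal-weighted (the handout does not specify the weighting), and the equal-weighted average return stands in for the value-weighted market return from Question 2. Because the simulated returns are independent draws, no positive alpha should be expected here.

```r
set.seed(1)
n_stock <- 50
n_month <- 60
# expand.grid varies month_id fastest, so rows are ordered by permno, then month.
panel <- expand.grid(month_id = 1:n_month, permno = 1:n_stock)
panel$ret <- rnorm(nrow(panel), mean = 0.01, sd = 0.08)  # simulated returns

# Positional lag; assumes one row per stock-month with no gaps.
lag_k <- function(x, k) c(rep(NA_real_, k), x)[seq_along(x)]

# Return index per stock, then equation (2): RI_{t-1} / RI_{t-12} - 1.
panel$ri <- ave(1 + panel$ret, panel$permno, FUN = cumprod)
panel$ret_12_1 <- ave(panel$ri, panel$permno,
                      FUN = function(ri) lag_k(ri, 1) / lag_k(ri, 12) - 1)

# Item 2: lag the characteristic by one month.
panel$lag_ret_12_1 <- ave(panel$ret_12_1, panel$permno,
                          FUN = function(x) lag_k(x, 1))

# Item 3: monthly cut-offs via quantile(); long top 10%, short bottom 10%.
use <- panel[!is.na(panel$lag_ret_12_1), ]
ls_ret <- sapply(split(use, use$month_id), function(d) {
  brk <- quantile(d$lag_ret_12_1, probs = c(0.1, 0.9))
  mean(d$ret[d$lag_ret_12_1 >= brk[[2]]]) - mean(d$ret[d$lag_ret_12_1 <= brk[[1]]])
})

# Item 6: CAPM regression; the intercept estimates the alpha.
mkt <- sapply(split(use, use$month_id), function(d) mean(d$ret))  # stand-in market return
fit <- lm(ls_ret ~ mkt)
summary(fit)$coefficients
```

With real data, the market return would normally be measured in excess of the risk-free rate, and the t-statistic of the intercept indicates whether the alpha is statistically distinguishable from zero.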
List of additional characteristics you can test
1. Construct the variable Momentum 1-3 Months based on the paper of Jegadeesh
and Titman (1993).
\[ \text{ret\_3\_1} = \frac{RI_{t-1}}{RI_{t-3}} - 1 = (1 + \text{ret}_{t-1}) \times (1 + \text{ret}_{t-2}) - 1, \tag{3} \]
2. Construct the variable Momentum 1-6 Months based on the paper of Jegadeesh
and Titman (1993).
\[ \text{ret\_6\_1} = \frac{RI_{t-1}}{RI_{t-6}} - 1 = (1 + \text{ret}_{t-1}) \times (1 + \text{ret}_{t-2}) \times (1 + \text{ret}_{t-3}) \times (1 + \text{ret}_{t-4}) \times (1 + \text{ret}_{t-5}) - 1 \tag{4} \]
3. Construct the variable Momentum 1-9 Months based on the paper of Jegadeesh
and Titman (1993).
\[ \text{ret\_9\_1} = \frac{RI_{t-1}}{RI_{t-9}} - 1 \tag{5} \]
4. Construct the variable Momentum 7-12 Months based on the paper of Novy-Marx
(2013).
\[ \text{ret\_12\_7} = \frac{RI_{t-7}}{RI_{t-12}} - 1 \tag{6} \]
5. Construct the variable Momentum 13-36 Months based on the paper of De Bondt
and Thaler (1985).
\[ \text{ret\_36\_13} = \frac{RI_{t-13}}{RI_{t-36}} - 1 \tag{7} \]
6. Construct the variable Long-Term Reversal based on the paper of De Bondt and
Thaler (1987).
\[ \text{ret\_60\_12} = \frac{RI_{t-12}}{RI_{t-60}} - 1 \tag{8} \]
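All of the characteristics in this list share the form RI_{t-j} / RI_{t-k} - 1, so one helper can produce each of them. The sketch below is an illustration for a single stock whose returns are ordered by month with no gaps and no missing values; the function names `lag_k` and `mom` and the toy returns are hypothetical.

```r
# Positional lag; assumes one observation per month with no gaps.
lag_k <- function(x, k) c(rep(NA_real_, k), x)[seq_along(x)]

# ret_k_j = RI_{t-j} / RI_{t-k} - 1 for one stock's monthly return series.
mom <- function(ret, k, j) {
  ri <- cumprod(1 + ret)          # return index built from simple returns
  lag_k(ri, j) / lag_k(ri, k) - 1
}

ret <- c(0.02, -0.01, 0.03, 0.00, 0.01, -0.02, 0.04)  # made-up returns

ret_3_1 <- mom(ret, k = 3, j = 1)

# Check against equation (3) in the last month (t = 7):
# (1 + ret_{t-1}) * (1 + ret_{t-2}) - 1
all.equal(ret_3_1[7], (1 + ret[6]) * (1 + ret[5]) - 1)
```

In a panel, the helper would be applied within each stock, for example with `ave(ret, permno, FUN = function(r) mom(r, 6, 1))` for Momentum 1-6 Months. Months with missing returns need separate handling, because `cumprod` propagates an NA to every later value.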
Question 4
This question deals with preprocessing of data for return predictions.
Load into R: "CRSP Monthly Including Lagged Characteristics.csv". The features
are already lagged by one month to prevent the use of future information. Note:
it is essentially the same dataset as in the previous question. If you want, you can
use fread from data.table to load files (it is faster than most alternatives when the
dataset is large). In dealing with dates, we recommend the lubridate package.
1. Select (create if not present in the data) the following variables:
(a) Permno (permno)
(b) Date (date)
(c) Year (year)
(d) Month (month)
(e) Lagged Market Value (lag1.market_value)
(f) Return (ret)
(g) Short-Term Reversal (ret_1_0)
(h) Momentum 1-3 Months (ret_3_1)
(i) Momentum 1-6 Months (ret_6_1)
(j) Momentum 1-9 Months (ret_9_1)
(k) Momentum 1-12 Months (ret_12_1)
2. Most machine learning models cannot handle missing data. To solve this
problem, we follow the steps suggested by Gu, Kelly, and Xiu (2020).
Impute missing features with their cross-sectional median as follows: (i) calculate
the cross-sectional median for each stock-level predictive characteristic,
(ii) check whether the stock-level predictive characteristic is missing, and (iii)
replace the stock-level predictive characteristic with its cross-sectional median
if it is missing.[3]
3. Restrict the sample to 2001 and onward (you lose the first year due to the
construction of the variables) and to firm × date observations with non-missing
lagged market values. Set missing returns (ret) to zero.
4. Calculate the summary statistics. You probably noticed that some of the
returns are extremely large. To prevent outliers from influencing your results,
winsorize the returns at the 0.5% level (e.g., use the Winsorize function from the
DescTools library).
5. Normalize all features between -1 and 1 in the cross-section:
\[ \frac{2 \times (x - \min(x))}{\max(x) - \min(x)} - 1 \]
6. Replace the stock-level predictive characteristics with 0 if missing after the
previous steps and drop all the rows for which the ret column contains a
missing value.
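The preprocessing steps above can be sketched in base R as follows. This is an illustration under assumptions, not the required solution: the toy data are made up, only one feature (`ret_3_1`) is shown, the helpers `impute_median`, `winsorize`, and `scale_pm1` are hypothetical names, and returns are winsorized over the pooled sample because the handout does not say whether to winsorize per month. The hand-written `winsorize` replaces the DescTools function only to keep the example free of package dependencies.

```r
# Toy cross-sections for two months; all values are made up.
df <- data.frame(
  date    = rep(as.Date(c("2001-01-31", "2001-02-28")), each = 5),
  ret     = c(0.01, -0.02, 0.90, 0.03, 0.00, 0.02, 0.01, -0.60, 0.04, 0.00),
  ret_3_1 = c(0.10, NA, -0.05, 0.20, 0.00, NA, 0.30, 0.10, -0.10, 0.05)
)
features <- "ret_3_1"  # extend with the other characteristic names

# Step 2: replace missing values with the median of the same cross-section.
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Step 4: cap values at the p and 1 - p quantiles.
winsorize <- function(x, p = 0.005) {
  q <- quantile(x, probs = c(p, 1 - p), na.rm = TRUE)
  pmin(pmax(x, q[[1]]), q[[2]])
}

# Step 5: map the cross-section linearly onto [-1, 1].
scale_pm1 <- function(x) 2 * (x - min(x)) / (max(x) - min(x)) - 1

df$ret <- winsorize(df$ret)  # pooled winsorization (an assumption)

for (f in features) {
  df[[f]] <- ave(df[[f]], df$date, FUN = impute_median)  # per month
  df[[f]] <- ave(df[[f]], df$date, FUN = scale_pm1)      # per month
  df[[f]][is.na(df[[f]])] <- 0                           # step 6
}
df <- df[!is.na(df$ret), ]                               # step 6

summary(df)
```

If every stock in a month has the same feature value, the scaling divides zero by zero and returns NaN; the final replacement with 0 in step 6 also covers that case, since `is.na` is TRUE for NaN.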
[3] This problem can be solved for all stock-level predictive characteristics using the mutate_at
function from dplyr.
References
De Bondt, Werner F. M. and Richard Thaler. 1985. “Does the Stock Market Overreact?”
The Journal of Finance 40 (3):793–805.
Gu, Shihao, Bryan Kelly, and Dacheng Xiu. 2020. “Empirical Asset Pricing via Machine
Learning.” The Review of Financial Studies 33 (5):2223–2273.
Jegadeesh, Narasimhan and Sheridan Titman. 1993. “Returns to Buying Winners and
Selling Losers: Implications for Stock Market Efficiency.” The Journal of Finance
48 (1):65–91.
Novy-Marx, Robert. 2013. “The other side of value: The gross profitability premium.”
Journal of Financial Economics 108 (1):1–28.