Data Science Methods in Finance

R Tutorial 1

25 October 2024

Important Instructions
• These weekly exercises are highly relevant to the group assignment.

• The exercises are optional, but we strongly encourage you to work through them.

• No write-up or submission of your answers is required.

Question 1
The task in this question is to familiarize yourself with the equity data
from the Center for Research in Security Prices (CRSP).

1. Load the following dataset into R: “CRSP Monthly.csv”. Several functions
load comma-separated values and other text files into R: read.csv from base R,
read_delim from readr, and fread from data.table. Note that fread from
data.table is much faster than the other methods when the dataset is large.

2. Examine the dataset using the head() and summary() functions. You should
see that the price variable contains positive and negative values. Find the
reason why price takes on negative values and solve the problem. Hint: consult
the variable descriptions tab of the CRSP Monthly Stock File on the WRDS
website.
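
The two steps above can be sketched as follows. This is a minimal illustration, not the tutorial's solution: it writes a tiny stand-in file to a temporary folder so that it runs without the real data, and the column names (permno, date, prc, ret) are assumptions based on common CRSP naming. In the CRSP variable descriptions, a negative price marks a bid/ask average that is reported when no closing price is available, so the sign is a flag rather than part of the value.

```r
# Stand-in file so the sketch runs without the real CRSP extract.
# Column names (permno, date, prc, ret) are assumptions.
path <- file.path(tempdir(), "CRSP Monthly.csv")
toy <- data.frame(
  permno = c(10001, 10001, 10002),
  date   = c("2020-01-31", "2020-02-29", "2020-01-31"),
  prc    = c(12.50, -13.25, 40.00),  # negative = bid/ask average
  ret    = c(0.02, 0.06, -0.01)
)
write.csv(toy, path, row.names = FALSE)

# Base R reader; data.table::fread(path) is a faster alternative for large files.
crsp <- read.csv(path, stringsAsFactors = FALSE)

head(crsp)
summary(crsp$prc)  # the minimum is negative

# Keep the information carried by the sign, then use the absolute value.
crsp$prc_is_bidask_avg <- crsp$prc < 0
crsp$prc <- abs(crsp$prc)
```

Storing the flag before taking the absolute value is a design choice: it keeps the option of treating bid/ask-average prices differently later on.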

Question 2
The purpose of this exercise is to get you to start thinking about restricting
a sample for a specific purpose. In research, filters are almost always used
to convert the raw data into a relevant sample. After having filtered your data, you
will construct a value-weighted portfolio return.

1. Restrict the sample to common stocks. The variable shrcd can be used for this
purpose. More information is in the variable descriptions tab of the CRSP
Monthly Stock File on the WRDS website.

2. Restrict the sample to stocks that trade on the following exchanges: New York
Stock Exchange (NYSE), American Stock Exchange (AMEX), National Association
of Securities Dealers Automated Quotations (NASDAQ). The variable
exchcd can be used for this purpose.

3. Calculate the value-weighted market returns of this sample. Make sure you
use the correct return definition, which includes dividends and adjustments for
corporate events. In a value-weighted portfolio every stock is assigned a weight
proportional to its market capitalization. This is quite tricky, since the data for
the 31st of January contain the price and shares outstanding on the 31st of
January but the return during the month of January. Weighting by the same-date
market value would therefore use end-of-month information; the weights should
be based on the market value at the end of the previous month. Hint: Lag market
value and look up the function “weighted.mean” (part of base R, in the stats
package). Alternatively, construct the weights yourself.

4. Optional: Check the correlation of the U.S. market return you calculated
with the market factor available at Ken French’s website.
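
The filtering and weighting steps can be sketched on a toy panel as follows. This is an illustration under stated assumptions, not the tutorial's solution: the column names follow common CRSP naming, share codes 10 and 11 are taken to identify common stocks and exchange codes 1, 2 and 3 to identify NYSE, AMEX and NASDAQ (check both against the WRDS variable descriptions), and the lag assumes consecutive months without gaps for each stock.

```r
# Toy panel; names (permno, date, shrcd, exchcd, prc, shrout, ret) are assumptions.
crsp <- data.frame(
  permno = rep(c(1, 2, 3), each = 3),
  date   = rep(as.Date(c("2020-01-31", "2020-02-29", "2020-03-31")), times = 3),
  shrcd  = rep(c(10, 11, 31), each = 3),
  exchcd = rep(c(1, 3, 1), each = 3),
  prc    = c(10, 11, 12, 20, 19, 21, 5, 5, 5),
  shrout = c(100, 100, 100, 50, 50, 50, 10, 10, 10),
  ret    = c(NA, 0.10, 0.0909, NA, -0.05, 0.1053, NA, 0, 0)  # ret includes dividends
)

# Steps 1 and 2: common stocks on NYSE, AMEX or NASDAQ (assumed codes).
crsp <- crsp[crsp$shrcd %in% c(10, 11) & crsp$exchcd %in% c(1, 2, 3), ]

# Market value, lagged by one row within each stock.
# This is only a one-month lag if no months are missing.
crsp <- crsp[order(crsp$permno, crsp$date), ]
crsp$mv     <- abs(crsp$prc) * crsp$shrout
crsp$lag_mv <- ave(crsp$mv, crsp$permno, FUN = function(x) c(NA, head(x, -1)))

# Step 3: value-weighted return per month, weighted by lagged market value.
valid  <- crsp[!is.na(crsp$ret) & !is.na(crsp$lag_mv), ]
vw_mkt <- sapply(split(valid, valid$date),
                 function(d) weighted.mean(d$ret, d$lag_mv))
vw_mkt
```

In February both remaining stocks have a lagged market value of 1000, so the value-weighted return is the simple average of 0.10 and -0.05.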

Question 3
The task in this question is to create features (characteristics) that we
will later use to predict returns with supervised learning methods.

1. Construct the variable Short-Term Reversal based on the paper of Jegadeesh
(1990). We recommend you stick to the notation we introduce here (left-hand
side of the equations):

$$\text{ret\_1\_0} = \frac{RI_t}{RI_{t-1}} - 1 = (1 + \text{ret}_t) - 1, \tag{1}$$

where RI is equal to the cumulative return.¹ Also construct the variable
Momentum 1-12 Months based on the paper of Fama and French (1996):

$$\text{ret\_12\_1} = \frac{RI_{t-1}}{RI_{t-12}} - 1. \tag{2}$$

2. Lag the characteristics by 1 month to prepare the data for creating portfolios.

3. Construct a portfolio that takes a long position in the stocks that are in the
top 10% of the distribution of the variable “Momentum 1-12 Months” in a
specific month. Take a short position in the stocks that are in the bottom
10% of the same distribution in the same month. We recommend that you use
the quantile function to create the cut-off points you need to allocate stocks
into different portfolios.

4. After having assigned stocks to portfolios, calculate the value-weighted
return of the long leg (top 10% of stocks based on the variable Momentum
1-12 Months) and the short leg (bottom 10% of stocks based on the variable
Momentum 1-12 Months).

5. Create the “factor” as the return of a long-short portfolio strategy.

6. Show that this strategy delivers a positive alpha relative to the Capital Asset
Pricing Model (CAPM).²
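
Steps 3 to 5 can be sketched on a toy panel as follows. This is a minimal illustration under stated assumptions: the columns lag_mom (the lagged Momentum 1-12 Months characteristic) and lag_mv (the lagged market value) are hypothetical names, the data are made up so that winners keep winning, and there are no missing values.

```r
# Toy panel: 20 stocks, 2 months. lag_mom and lag_mv are hypothetical names.
n <- 20
panel <- data.frame(
  date    = rep(as.Date(c("2020-01-31", "2020-02-29")), each = n),
  permno  = rep(seq_len(n), times = 2),
  lag_mom = c(1:20, 20:1) / 100,
  lag_mv  = rep(seq(100, 2000, by = 100), times = 2)
)
panel$ret <- panel$lag_mom / 10  # made-up returns

# For one month: cut-offs from quantile(), then value-weighted leg returns.
leg_returns <- function(d) {
  cuts  <- quantile(d$lag_mom, probs = c(0.1, 0.9), na.rm = TRUE)
  long  <- d[d$lag_mom >= cuts[["90%"]], ]
  short <- d[d$lag_mom <= cuts[["10%"]], ]
  data.frame(
    date  = d$date[1],
    long  = weighted.mean(long$ret, long$lag_mv),
    short = weighted.mean(short$ret, short$lag_mv)
  )
}

# Apply the sort month by month and form the long-short factor.
legs <- do.call(rbind, lapply(split(panel, panel$date), leg_returns))
legs$wml <- legs$long - legs$short
legs
```

The cut-offs are recomputed within each month, so a stock can move between the long and the short leg over time, as it does in this toy panel.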

¹ Note that the cumprod function (base R) does not give correct results if there are gaps
(missing months or missing returns) in the data.
² Hint: you have the return of the long-short portfolio, your dependent variable, and the
value-weighted market return from Question 2, your independent variable.

List of additional characteristics you can test

1. Construct the variable Momentum 1-3 Months based on the paper of
Jegadeesh and Titman (1993).

$$\text{ret\_3\_1} = \frac{RI_{t-1}}{RI_{t-3}} - 1 = (1 + \text{ret}_{t-1}) \times (1 + \text{ret}_{t-2}) - 1 \tag{3}$$

2. Construct the variable Momentum 1-6 Months based on the paper of
Jegadeesh and Titman (1993).

$$\text{ret\_6\_1} = \frac{RI_{t-1}}{RI_{t-6}} - 1 = (1 + \text{ret}_{t-1}) \times (1 + \text{ret}_{t-2}) \times (1 + \text{ret}_{t-3}) \times (1 + \text{ret}_{t-4}) \times (1 + \text{ret}_{t-5}) - 1 \tag{4}$$

3. Construct the variable Momentum 1-9 Months based on the paper of
Jegadeesh and Titman (1993).

$$\text{ret\_9\_1} = \frac{RI_{t-1}}{RI_{t-9}} - 1 \tag{5}$$

4. Construct the variable Momentum 7-12 Months based on the paper of
Novy-Marx (2013).

$$\text{ret\_12\_7} = \frac{RI_{t-7}}{RI_{t-12}} - 1 \tag{6}$$

5. Construct the variable Momentum 13-36 Months based on the paper of
De Bondt and Thaler (1985).

$$\text{ret\_36\_13} = \frac{RI_{t-13}}{RI_{t-36}} - 1 \tag{7}$$

6. Construct the variable Long-Term Reversal based on the paper of
De Bondt and Thaler (1987).

$$\text{ret\_60\_12} = \frac{RI_{t-12}}{RI_{t-60}} - 1 \tag{8}$$
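
All of these characteristics share one pattern: the ratio RI at t-K over RI at t-J, minus one, equals the product of (1 + ret) over months t-J+1 to t-K, minus one. A minimal sketch of a generic helper follows; the function name momentum and its arguments from and to are hypothetical, and the helper assumes one stock's returns sorted by date with no missing months.

```r
# Generic characteristic: RI_{t-to} / RI_{t-from} - 1 for a single stock.
# 'momentum', 'from' and 'to' are hypothetical names used for illustration.
momentum <- function(ret, from, to) {
  n   <- length(ret)
  out <- rep(NA_real_, n)
  for (t in seq_len(n)) {
    idx <- (t - from + 1):(t - to)  # months t-from+1, ..., t-to
    # Only defined once the full window is available; an NA return
    # inside the window makes the result NA.
    if (idx[1] >= 1) out[t] <- prod(1 + ret[idx]) - 1
  }
  out
}

ret <- c(0.10, -0.05, 0.02, 0.03)
momentum(ret, from = 3, to = 1)  # ret_3_1: NA for the first two months
momentum(ret, from = 1, to = 0)  # ret_1_0: equals ret itself
```

For a panel, the helper could be applied stock by stock, for example with ave(crsp$ret, crsp$permno, FUN = function(x) momentum(x, 12, 1)), after sorting by stock and date and making sure that no months are missing.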

Question 4
This question deals with preprocessing of data for return predictions.
Load into R: “CRSP Monthly Including Lagged Characteristics.csv”. The features
are already lagged by one month to prevent the use of future information. Note:
it is essentially the same dataset as in the previous question. If you want, you can
use fread from data.table to load files (it is faster than most alternatives when the
dataset is large). In dealing with dates, we recommend the lubridate package.
1. Select (create if not present in the data) the following variables:
(a) Permno (permno)
(b) Date (date)
(c) Year (year)
(d) Month (month)
(e) Lagged Market Value (lag1.market_value)
(f) Return (ret)
(g) Short-Term Reversal (ret_1_0)
(h) Momentum 1-3 Months (ret_3_1)
(i) Momentum 1-6 Months (ret_6_1)
(j) Momentum 1-9 Months (ret_9_1)
(k) Momentum 1-12 Months (ret_12_1)
2. Machine learning models cannot be used with missing data. To solve this
problem, we will follow the steps suggested by Gu, Kelly, and Xiu (2020).
Impute missing features with their cross-sectional median as follows: (i)
calculate the cross-sectional median for each stock-level predictive characteristic,
(ii) check whether the stock-level predictive characteristic is missing, and (iii)
replace the stock-level predictive characteristic with its cross-sectional median
if it is missing.³
3. Restrict the sample to 2001 and onward (you lose the first year due to the
construction of the variables) and to firm × date observations with non-missing
lagged market values. Set missing returns (ret) to zero.
4. Calculate the summary statistics. You probably noticed that some of the
returns are extremely large. To prevent outliers from influencing your results,
winsorize the returns at the 0.5% level (e.g., use the Winsorize function from the
DescTools library).
5. Normalize all features between -1 and 1 in the cross-section:

$$\frac{2 \times (x - \min(x))}{\max(x) - \min(x)} - 1$$
6. Replace the stock-level predictive characteristics with 0 if missing after the
previous steps and drop all the rows for which the ret column contains a
missing value.
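
Steps 4 to 6 can be sketched in base R as follows. This is an illustration under stated assumptions: the helper names winsorize and scale_pm1 are hypothetical, winsorize is a base R stand-in rather than the DescTools function, "the 0.5% level" is read as clipping at the 0.5% and 99.5% quantiles, and the toy data are made up.

```r
# Clip values below the p-quantile and above the (1 - p)-quantile.
winsorize <- function(x, p = 0.005) {
  lim <- quantile(x, probs = c(p, 1 - p), na.rm = TRUE)
  pmin(pmax(x, lim[[1]]), lim[[2]])
}

# Map x to [-1, 1]: 2 * (x - min) / (max - min) - 1.
scale_pm1 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) return(x * 0)  # constant cross-section: avoid 0/0
  2 * (x - rng[1]) / (rng[2] - rng[1]) - 1
}

# Step 4: two extreme returns among 1000 observations.
r <- c(-5, seq(-0.1, 0.1, length.out = 998), 8)
w <- winsorize(r)
range(w)

# Step 5: normalize within each month (the cross-section).
df <- data.frame(
  month = rep(c("2020-01", "2020-02"), each = 5),
  x     = c(1, 2, 3, 4, 5, 10, 20, NA, 40, 50)
)
df$x_scaled <- ave(df$x, df$month, FUN = scale_pm1)

# Step 6: anything still missing becomes 0, the midpoint of [-1, 1].
df$x_scaled[is.na(df$x_scaled)] <- 0
df
```

Replacing the remaining missing values with 0 after scaling places them at the midpoint of the normalized range, which keeps them neutral relative to the cross-section.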
³ This problem can be solved for all stock-level predictive characteristics using the mutate_at
function from dplyr.
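
The cross-sectional median imputation can also be sketched without extra packages. The version below is a base R alternative to the dplyr route mentioned in the footnote, not the tutorial's solution; the helper name impute_median and the toy values are hypothetical.

```r
# Toy data: two months, two characteristics with missing values.
df <- data.frame(
  month   = rep(c("2020-01", "2020-02"), each = 3),
  ret_1_0 = c(0.01, NA, 0.03, NA, 0.02, 0.04),
  ret_3_1 = c(NA, 0.10, 0.30, 0.05, 0.15, NA)
)
features <- c("ret_1_0", "ret_3_1")

# Replace missing values by the median of the non-missing values.
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Apply the imputation month by month to every characteristic.
for (f in features) {
  df[[f]] <- ave(df[[f]], df$month, FUN = impute_median)
}
df
```

If a characteristic is missing for every stock in a month, the median is itself missing and the values stay NA; the final step of the question replaces such leftovers with 0.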

References
De Bondt, Werner F. M. and Richard Thaler. 1985. “Does the Stock Market Overreact?”
The Journal of Finance 40 (3):793–805.

De Bondt, Werner F. M. and Richard H. Thaler. 1987. “Further Evidence on Investor
Overreaction and Stock Market Seasonality.” The Journal of Finance 42 (3):557–581.

Fama, Eugene F. and Kenneth R. French. 1996. “Multifactor Explanations of Asset
Pricing Anomalies.” The Journal of Finance 51 (1):55–84.

Gu, Shihao, Bryan Kelly, and Dacheng Xiu. 2020. “Empirical Asset Pricing via Machine
Learning.” The Review of Financial Studies 33 (5):2223–2273.

Jegadeesh, Narasimhan. 1990. “Evidence of Predictable Behavior of Security Returns.”
The Journal of Finance 45 (3):881–898.

Jegadeesh, Narasimhan and Sheridan Titman. 1993. “Returns to Buying Winners and
Selling Losers: Implications for Stock Market Efficiency.” The Journal of Finance
48 (1):65–91.

Novy-Marx, Robert. 2013. “The Other Side of Value: The Gross Profitability Premium.”
Journal of Financial Economics 108 (1):1–28.
