03-Data Gathering and Preparation

Uploaded by

srusti.patil.97
Copyright
© © All Rights Reserved
FINA 5376

Financial Analytics
Sima Jannati
Download Finance Data

• Common public sources: FRED, the U.S. Census Bureau, Yahoo Finance
• Google Finance stopped providing data in March 2018
• Subscription-based sources: WRDS, Bloomberg, Capital IQ, etc.
• Using R, we can automate some of these data downloads
• We saw earlier how to get historical price information from Yahoo
R Code - FRED
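library(quantmod)
# Yahoo Finance is the default source; auto.assign = FALSE returns the data as an object
spy <- getSymbols(Symbols = "SPY", auto.assign = FALSE)
# auto.assign = TRUE (the default) creates an object named QQQ in the workspace
getSymbols(Symbols = "QQQ", auto.assign = TRUE)
getSymbols(Symbols = "QQQ", src = "yahoo")

# Unemployment rate and GDP from FRED
getSymbols(Symbols = "UNRATE", src = "FRED")
getSymbols(Symbols = "GDP", src = "FRED")

# The same GDP series through the Quandl package, as a data frame and as an xts object
library(Quandl)
gdp <- Quandl(code = "FRED/GDP")
gdp_xts <- Quandl(code = "FRED/GDP", type = "xts")

# Apple prices for a fixed date range
apple_stock <- getSymbols("AAPL", from = "2010-12-31", to = "2013-12-31",
                          auto.assign = FALSE)
plot(apple_stock)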
R Code - Census
To get data from the Census Bureau, first go to https://api.census.gov/data/key_signup.html and request an API key.

library(tidycensus)
census_api_key("YOUR KEY GOES HERE", install = TRUE)
R Code - Census

• The two main functions in tidycensus are get_decennial() for the 2000 and 2010 decennial Censuses and get_acs() for the American Community Survey
• Both functions require two arguments: geography and variables
• The default year in get_decennial() is 2010
R Code- Census
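# Total population by state from the decennial Census
pop10 <- get_decennial(
  geography = "state",
  variables = "P001001"
)

# Median household income by state from the American Community Survey
income_15to19 <- get_acs(
  geography = "state",
  variables = "B19013_001"
)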
Characteristic of Stocks
Review of Four Statistical Measures
• Four important statistical measures:
1. Expectation or mean
2. Variance and standard deviation
3. Covariance
4. Correlation
Random Variable
Example:

P{ x = -1 : R < 0 } = 0.3 (p1)
P{ x = 0 : R = 0 } = 0.5 (p2)
P{ x = 1 : R > 0 } = 0.2 (p3)

p1 + p2 + p3 = 1

[Figure: bar chart of the probabilities p1, p2, p3 at x1 = -1, x2 = 0, x3 = 1]

In general, a random variable can take many values; e.g., the range of future stock prices is [0, +∞).
Different realizations occur with different probabilities.
Random Variable
General Probability Distribution:

[Figure: probability distribution over the return]
Simplification: Event-tree
Many times it’s easier to focus on different scenarios.
For example, when you are an analyst issuing stock recommendations.

p1 Negative returns: x1
p2 Zero return: x2
p3 Positive returns: x3
Expected Value
1. Expected Value (Mean): probability-weighted sum

EX (or μx) = p1x1 + p2x2 + . . . + pnxn

Intuition: The expected return is our best guess of the future return: the probability-weighted average of all possible outcomes. It is not necessarily the most probable outcome.

Note:
The expected return of a portfolio with weights wx and wy is
E(wxX + wyY) = wxEX + wyEY
Expected Value
P{ x = -1 : R < 0 } = 0.3 (p1)
P{ x = 0 : R = 0 } = 0.5 (p2)
P{ x = 1 : R > 0 } = 0.2 (p3)

p1 + p2 + p3 = 1

E(x) = (-1)(0.3) + (0)(0.5) + (1)(0.2) = -0.1

[Figure: bar chart of the probabilities p1, p2, p3 at x1 = -1, x2 = 0, x3 = 1]

The most probable outcome is R = 0, and the expected return of -0.1 is close to zero. But we still need to measure how uncertain we are about the future.
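As a quick check, the expected value of the three-scenario example can be computed in R. This is a minimal sketch; the names x, p, expected_x, and most_probable_x are illustrative and do not come from the slides.

```r
# Outcomes and probabilities from the three-scenario example
x <- c(-1, 0, 1)
p <- c(0.3, 0.5, 0.2)

# Probabilities must sum to one
stopifnot(isTRUE(all.equal(sum(p), 1)))

# Expected value: probability-weighted sum of the outcomes
expected_x <- sum(p * x)
expected_x  # -0.1

# The most probable outcome (the mode) is a different quantity
most_probable_x <- x[which.max(p)]
most_probable_x  # 0
```

The two numbers differ here, which is why the mean is read as an average outcome and not as the likeliest one.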
Expected Value
Portfolios: If we combine portfolio X with Y, what is the expected return of the combined portfolio?

Assign weight wx to X and weight wy to Y:
E(wxX + wyY) = wxEX + wyEY

What happens in the future can be very different from what we expect.

We need to measure how uncertain we are about the future.
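The linearity of the expectation can be verified numerically. The sketch below uses hypothetical scenario returns and weights (all numbers are assumptions chosen for illustration) and computes the portfolio expectation in two ways.

```r
# Hypothetical scenario returns for two assets (illustrative numbers)
p <- c(0.3, 0.5, 0.2)        # scenario probabilities
x <- c(-0.10, 0.02, 0.12)    # return of X in each scenario
y <- c(-0.04, 0.01, 0.09)    # return of Y in each scenario
w_x <- 0.6                   # portfolio weight on X
w_y <- 0.4                   # portfolio weight on Y

# Expected portfolio return, computed scenario by scenario
e_portfolio <- sum(p * (w_x * x + w_y * y))

# Weighted average of the individual expected returns
e_weighted <- w_x * sum(p * x) + w_y * sum(p * y)

all.equal(e_portfolio, e_weighted)  # TRUE
```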


Variance
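Variance: expected squared deviation from the mean (the expected value)

$\mathrm{Var}(X)$ (or $\sigma_X^2$) $= E[(X - EX)^2] = p_1(x_1 - EX)^2 + \dots + p_n(x_n - EX)^2$

$\mathrm{Std.\,Dev.}(X)$ (or $\sigma_X$) $= \left(E[(X - EX)^2]\right)^{1/2}$

Intuition: Future returns can take many different values.
How far these values lie from the mean return is captured by the variance.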
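For the three-scenario example used earlier, the variance and standard deviation follow directly from the definition. A minimal sketch, with illustrative variable names:

```r
# Outcomes and probabilities from the three-scenario example
x <- c(-1, 0, 1)
p <- c(0.3, 0.5, 0.2)

mean_x <- sum(p * x)               # expected value, -0.1

# Variance: probability-weighted squared deviations from the mean
var_x <- sum(p * (x - mean_x)^2)   # 0.49

# Standard deviation: square root of the variance
sd_x <- sqrt(var_x)                # 0.7
```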
Variance

Variance will be our measure of risk

Investors need to be rewarded for risk.
Assets with high risk (high variance) sell at low prices today, which makes high future returns more likely.
In other words, they offer high expected returns.
Questions!

1. Why would investors consider the positive deviation from expected value as
a part of their investments’ risk?

2. What are other statistical measures that can be used to measure risk?

3. What are the advantages and disadvantages of these measures compared with variance and standard deviation?
Covariance

3. Covariance: measures the relationship between two random variables X and Y

When X is above its mean, is Y also above its mean? If yes, the covariance (and the correlation) is positive.

$\mathrm{Cov}(X, Y)$ (or $\sigma_{XY}$) $= p_1(x_1 - EX)(y_1 - EY) + \dots + p_n(x_n - EX)(y_n - EY)$

• The covariance is the probability-weighted average of the product of deviations from the means.
Covariance
Intuition: The covariance tells us whether two returns move together (positive covariance)
or move opposite to one another (negative covariance). If two variables are completely
unrelated the covariance is zero.
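A small numerical sketch of the covariance formula follows. The values of y are hypothetical and chosen so that Y tends to be high when X is high.

```r
# Joint scenarios: in scenario i, X takes x[i] and Y takes y[i] with probability p[i]
p <- c(0.3, 0.5, 0.2)
x <- c(-1, 0, 1)
y <- c(-2, 1, 3)   # hypothetical values for the second variable

e_x <- sum(p * x)  # -0.1
e_y <- sum(p * y)  #  0.5

# Probability-weighted average of the product of deviations
cov_xy <- sum(p * (x - e_x) * (y - e_y))
cov_xy  # 1.25, positive: X and Y move together

# Equivalent shortcut: E(XY) - EX * EY
cov_shortcut <- sum(p * x * y) - e_x * e_y
```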
Correlation

4. Correlation: normalized covariance

Ensures that the measure lies between -1 and 1.

Correlation is calculated by dividing the covariance by the product of the standard deviations:

$\rho_{XY} = \dfrac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$

Intuition: the covariance can take any value, so it is difficult to infer from the covariance alone how strong the connection between two variables is.
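The point about units can be illustrated with a short sketch: rescaling one variable changes the covariance but leaves the correlation untouched. The helper functions cov_p() and cor_p() and the values of y are assumptions made for this illustration.

```r
p <- c(0.3, 0.5, 0.2)
x <- c(-1, 0, 1)
y <- c(-2, 1, 3)   # hypothetical values for the second variable

# Probability-weighted covariance and correlation (illustrative helpers)
cov_p <- function(a, b, p) sum(p * (a - sum(p * a)) * (b - sum(p * b)))
cor_p <- function(a, b, p) cov_p(a, b, p) / sqrt(cov_p(a, a, p) * cov_p(b, b, p))

cov_xy     <- cov_p(x, y, p)
cov_scaled <- cov_p(x, 100 * y, p)   # Y expressed in different units: covariance is 100 times larger

cor_xy     <- cor_p(x, y, p)
cor_scaled <- cor_p(x, 100 * y, p)   # correlation is unchanged by the rescaling
```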
Skewness

• Skewness is a statistical measure that assesses the asymmetry of a probability distribution.
• It quantifies the extent to which the data is skewed or shifted to one side.
• Positive skewness indicates a longer tail on the right side of the distribution.
• Negative skewness indicates a longer tail on the left side.
• Skewness helps in understanding the shape and outliers in a dataset.
Skewness

$\mathrm{Skew}(X) = \dfrac{E[(X - EX)^3]}{\sigma_X^3}$
Kurtosis

• While skewness measures the asymmetry of a distribution, kurtosis measures how heavy its tails are
• It is often read as how peaked or flat the distribution is
• Mesokurtic distribution (kurtosis = 3, excess kurtosis = 0): perfect normal distribution or very close to it.
• Leptokurtic distribution (kurtosis > 3, excess kurtosis > 0): sharp peak, heavy tails
• Platykurtic distribution (kurtosis < 3, excess kurtosis < 0): flat peak, light tails
Kurtosis

$\mathrm{Kurt}(X) = \dfrac{E[(X - EX)^4]}{\sigma_X^4}$
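Both moments can be written out in a few lines of base R. This is a sketch using population moments (dividing by n); packaged implementations may apply small-sample corrections, so their results can differ slightly. The function names skew_manual() and kurt_manual() and the two data vectors are illustrative.

```r
# Moment-based skewness: third central moment over the cubed standard deviation
skew_manual <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}

# Moment-based kurtosis: fourth central moment over the squared variance
kurt_manual <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

symmetric  <- c(-2, -1, 0, 1, 2)       # symmetric around zero
right_tail <- c(-1, -1, 0, 0, 1, 6)    # one large positive value

skew_manual(symmetric)    # 0
skew_manual(right_tail)   # positive: longer right tail
kurt_manual(symmetric)    # 1.7, below the normal benchmark of 3
```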
R-Code

• Stock prices from January 2008 to January 2013:

JPMorgan_stock <- getSymbols("JPM", from = "2008-01-01", to = "2013-01-31",
                             auto.assign = FALSE)
plot(Ad(JPMorgan_stock))

GE_stock <- getSymbols("GE", from = "2008-01-01", to = "2013-01-31", auto.assign = FALSE)
plot(Ad(GE_stock))

• Notice the downward trend in price during the housing recession


R-Code

• Now let’s take JP Morgan and measure the following:
1. Historical mean
2. Volatility during the sample we downloaded
R-Code
Historical Mean

library(xts)
# Daily log returns from adjusted closing prices
diff_Log_JPMorgan <- diff(log(Ad(JPMorgan_stock)))
# diff() leaves an NA in the first row; drop it
diff_Log_JPMorgan <- na.omit(diff_Log_JPMorgan)
JPMorgan_mean <- mean(diff_Log_JPMorgan, na.rm = TRUE)
R-Code
Variance and St. Deviation

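sample_number <- length(diff_Log_JPMorgan)

# Sample variance: square each deviation first, then sum
JPMorgan_variance <- (1 / (sample_number - 1)) *
  sum((diff_Log_JPMorgan - JPMorgan_mean)^2)
JPMorgan_std <- sqrt(JPMorgan_variance)

# Scaling by the number of days in the sample gives volatility over the whole sample
time_period <- length(Ad(JPMorgan_stock))
Sample_volatility <- JPMorgan_std * sqrt(time_period)
# Annualized volatility, assuming 252 trading days per year
Annual_volatility <- JPMorgan_std * sqrt(252)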
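The sample variance formula is sensitive to where the square is placed. The sketch below uses simulated daily log returns as a hypothetical stand-in for the JPM series (no download needed) and compares the sum of squared deviations with the squared sum of deviations. The 252-day annualization is a common convention and an assumption here.

```r
set.seed(1)  # simulated daily log returns, a hypothetical stand-in for real data
r <- rnorm(500, mean = 0.0002, sd = 0.02)
n <- length(r)
r_mean <- mean(r)

# Sum of squared deviations: the sample variance
var_manual <- sum((r - r_mean)^2) / (n - 1)

# Squared sum of deviations: deviations from the mean sum to (numerically) zero
var_wrong <- sum(r - r_mean)^2 / (n - 1)

all.equal(var_manual, var(r))   # TRUE: matches the built-in estimator

sd_daily  <- sqrt(var_manual)
sd_annual <- sd_daily * sqrt(252)   # assumes 252 trading days per year
```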
R-Code
Skewness and Kurtosis

• Let’s also measure the third and fourth moments of the distribution


R-Code
Skewness and Kurtosis

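# Assumption: the slides do not name the package that supplies skewness() and kurtosis();
# the moments package is one option (its kurtosis() is not excess kurtosis, so normal = 3)
library(moments)

skewness_JP_Stock_price <- skewness(Ad(JPMorgan_stock))
kurtosis_JP_Stock_price <- kurtosis(Ad(JPMorgan_stock))
variance_JP_Stock_return <- var(diff_Log_JPMorgan)
skewness_JP_Stock_return <- skewness(diff_Log_JPMorgan)
kurtosis_JP_Stock_return <- kurtosis(diff_Log_JPMorgan)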
R Code- Example
logret <- diff(log(JPMorgan_stock$JPM.Adjusted))[-1]  # daily log returns; drop the leading NA
round(head(logret, 3), 6)
ret <- exp(logret) - 1  # discrete (simple) returns

# Log returns add up over time, so sum them within each period
logret.w <- apply.weekly(logret, sum)  # weekly log return
round(head(logret.w, 3), 6)

logret.m <- apply.monthly(logret, sum)  # monthly log return
round(head(logret.m, 3), 6)

logret.q <- apply.quarterly(logret, sum)  # quarterly log return
ret.q <- exp(logret.q) - 1  # quarterly discrete return
round(head(ret.q, 3), 6)

logret.y <- apply.yearly(logret, sum)  # annual log return
ret.y <- exp(logret.y) - 1  # annual discrete return
round(tail(ret.y, 3), 6)

rvec <- as.vector(logret)
round(skewness(rvec), 2)  # skewness of a normal distribution is zero; left-skewed is negative

# Kurtosis
round(kurtosis(rvec), 2)
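Summing log returns within a period works because log returns are additive over time, while simple returns compound. A minimal sketch with a hypothetical four-point price path:

```r
prices <- c(100, 102, 101, 105)      # hypothetical prices

logret <- diff(log(prices))          # per-period log returns
simple <- exp(logret) - 1            # per-period simple returns

# Total return over the whole path, obtained two ways
total_log    <- sum(logret)          # log returns add up
total_simple <- exp(total_log) - 1   # 0.05, i.e. 105 / 100 - 1
compounded   <- prod(1 + simple) - 1 # simple returns must be compounded
```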
Discussion

• How are these variables used in empirical studies?

• Discussion of “Who Gambles in the Stock Market?”


Grammar of Graphics
Basic Questions

• What is a graphic?
• How can we succinctly describe a graphic?
• How can we create the graphic we have described?
Grammar of Graphics
• One approach is to develop a grammar
• Grammar: the fundamental rules of an art or science
• A good grammar:
  • Yields insights into the composition of complicated graphics
  • Reveals unexpected connections between seemingly different graphics
Components
• Five components completely describe a wide range of graphics

1. Data and mapping

2. Layer(s)

3. Scale
4. Coordinate system

5. Facet
1. Data and Mapping

• Independent of the other components
• We can create a graphic that can be applied to multiple datasets
• Datasets are what turn an abstract graphic into a concrete graphic
1. Data and Mapping
• Variables must be mapped to aesthetics
• Aesthetics are perceived on the graphic
  • X-position
  • Y-position
  • Encoding (size, shape, color, etc.)
• Need a specification of which variables are mapped to which aesthetics


2. Layer(s)
• One statistical transformation, called a “stat”
• Stat summarizes the data
  • Line chart: a moving average, regression equation
  • Histogram
  • Box plot: minimum, 1st quartile, median, 3rd quartile, and maximum
  • Candlestick chart: open, high, low, and closing price
2. Layer(s)
• Geometric object, called a “geom”, determines the type of plot
• Every geom has a default stat and every stat has a default geom
  • Point geom creates a scatterplot
  • Line geom creates a line plot
  • Bar geom creates a bar chart (a histogram when combined with the bin stat)
• Each geom has a limited set of aesthetics (e.g., point {position, shape, color, size})
3. Scales

• Controls the mapping from data to aesthetic attributes
• Need one scale for each aesthetic property used in a layer
• A scale is a function and its inverse, along with a set of parameters
• The inverse is used to draw a legend
• Typically maps a single variable to a single aesthetic, but not always
3. Scales
• Examples:
  • X and Y position on a scatterplot
  • Position of a line on a chart
  • Size of a geom to represent density
  • Colors to represent data from different samples


4. Coordinate System (coord)
• Often specified by two coordinates (x, y), but could be more
  • Cartesian system in two dimensions is most common
  • Logarithmic and semi-logarithmic systems are not uncommon
  • Polar coordinates and map projections are used less frequently
• Unlike scales, coords affect all position variables simultaneously
• For example, in polar coordinates, bar geoms look like segments of a circle
5. Faceting

• Create small multiples of different subsets of an entire dataset
• Useful when searching for patterns across conditions
• Faceting specification:
  • Which variables should be used to split the data
  • How the subsets should be arranged
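A faceting specification can be sketched with ggplot2 and its diamonds dataset. The choice of cut as the splitting variable and of a two-row layout is an assumption made for illustration.

```r
library(ggplot2)

# Small multiples: one scatterplot panel per level of cut
p_facet <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.2) +
  facet_wrap(~ cut, nrow = 2)   # split the data by cut, arrange the panels in two rows

p_facet
```

Here `~ cut` answers which variable splits the data, and `nrow = 2` answers how the subsets are arranged.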


Hierarchy of Defaults

• Organizing into a layered grammar permits efficient defaulting
• Let’s look at three sets of code to plot diamond size in carats (= X) versus cost (= Y)
Example 1

library(ggplot2)
diamonds  # example dataset shipped with ggplot2
ggplot() +
  layer(data = diamonds, mapping = aes(x = carat, y = price),
        geom = "point", stat = "identity", position = "identity") +
  scale_y_continuous() +
  scale_x_continuous() +
  coord_cartesian()
Example 2

ggplot(diamonds, aes(carat, price)) +
  geom_point()
Example 3

qplot(carat, price, data = diamonds)  # qplot() is deprecated in recent ggplot2 releases


Implications of the Layered Grammar
• Insights:
  • A histogram is a combination of the geom “bar” and the stat “bin”
  • Polar coordinate systems create interesting pie, bullseye, and Coxcomb charts
  • Transformations can be accomplished in three ways
    • Transform the data
    • Transform the scales
    • Transform the coordinate system
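The three routes to a transformation can be sketched on the diamonds data with a log10 transform of carat. This assumes coord_trans() is available in the installed ggplot2 version; the object names are illustrative.

```r
library(ggplot2)

# 1. Transform the data: the variable is logged before it is mapped
p_data <- ggplot(diamonds, aes(x = log10(carat), y = price)) +
  geom_point()

# 2. Transform the scale: the x scale applies log10 to the mapped values
p_scale <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point() +
  scale_x_log10()

# 3. Transform the coordinate system: the plotting plane itself is log10 in x
p_coord <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point() +
  coord_trans(x = "log10")
```

For a plain scatterplot the three look alike apart from axis labels. They differ once a stat is involved: a scale transformation is applied before the statistical summary is computed, while a coordinate transformation is applied afterwards.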
Example 4

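# The "+" must end a line, not start one, or R evaluates the first line on its own
ggplot(data = diamonds, mapping = aes(price)) +
  layer(geom = "bar", stat = "bin", position = "identity",
        mapping = aes(y = after_stat(count)))  # after_stat(count) replaces the older ..count..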
Example 5

ggplot(diamonds, aes(x = price, fill = clarity)) +
  geom_histogram(binwidth = 50)
Example 6
ggplot(diamonds, aes(x = "", fill = clarity)) +
geom_bar(width = 1) +
coord_polar(theta = "y")
Example 7

ggplot(diamonds, aes(x = "", fill = clarity)) +
  geom_bar(width = 1) +
  coord_polar(theta = "x")
Questions!

1. Can you create at least two other charts/graphics based on the diamonds
data?
Making Maps with the maps() Package

• Code to set up the map, including coordinates for Indianapolis, IN and Arlington, TX:
library(ggplot2)  # provides map_data()
library(maps)     # provides the "usa" map database
usa <- map_data("usa")

labs <- data.frame(
  long = c(-86.1581, -97.1081),
  lat = c(39.7684, 32.7357),
  names = c("Indy", "Arl"),
  stringsAsFactors = FALSE)
Making Maps with the maps() Package

• Code to draw the map and add the two cities:

ggplot(usa, aes(long, lat)) +
  geom_polygon(fill = "white", color = "black") +
  geom_point(data = labs, aes(x = long, y = lat),
             color = "black", size = 4) +
  geom_point(data = labs, aes(x = long, y = lat),
             color = "yellow", size = 3) +
  coord_quickmap()
Resulting Map without the “group” Aesthetic
• Without the “group” aesthetic, we get unintended lines:
Adding the “group” Aesthetic
• Draw map with aes(… group = group), then add the cities (and some color)
gg1 <- ggplot() +
  geom_polygon(data = usa, aes(x = long, y = lat, group = group),
               fill = "violet", color = "blue") +
  coord_quickmap()
gg1

# Add the two cities on top of the map
gg1 +
  geom_point(data = labs, aes(x = long, y = lat),
             color = "black", size = 4) +
  geom_point(data = labs, aes(x = long, y = lat),
             color = "yellow", size = 3)
Group Project - Phase 1
Data
• The data you need to complete this project can be obtained from any financial web site (e.g., http://finance.yahoo.com).

• You can download the monthly market factor and the monthly 30-day T-Bill return from http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html#Research.

• Click on the link Fama/French factors and open the zip file (the data is in a plain-text file that opens in Notepad).

• NOTE: the returns on the website of Ken French are based on beginning-of-the-month prices. For example, the RMRF for 192607 refers to a monthly return where the portfolio was bought on the first trading day of 192606 and sold on the first trading day of 192607.
Portfolio Construction: Stock Selection
• Suppose you had $100,000 to invest.

• Construct a portfolio that contains at least 4 stocks per group member.

• For each stock, provide a justification of why you chose it.

• You need to clearly outline your investment strategy and explain how each stock fits into your strategy.

• When forming your portfolio, you should also think about short positions.
Portfolio Construction: Stock Selection
• In Discussion 1, provide the following information for each stock in your portfolio:

1. Name of the company and its ticker symbol
2. Number of analysts covering the stock
3. Latest consensus analyst recommendation
4. Other relevant fundamental information (such as P/E)
• Present this information in a table (Table 1).
Rules for Selecting Stocks
1. You should not use international stocks, i.e., stocks not traded in the U.S. stock market.
2. You should not use mutual funds or any kind of bond funds for the stock selection part.
• ETFs are allowed.

3. You should not use firms that went public recently, for which there is not enough historical data.
• Your sample should include at least 15 years of monthly data (the sample period needs to include periods when the market was up, down, and neutral)
Portfolio Construction: Optimal Portfolio weights

• Next, construct the portfolio by choosing the portfolio weights so that you maximize the Sharpe ratio of your portfolio.

• Also, construct a portfolio that assigns equal weights to all stocks. Call this the naïve portfolio.

• In your discussion, report the Sharpe ratio of the naïve portfolio and of the optimal portfolio.
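One way to obtain maximum-Sharpe weights is the unconstrained tangency solution, which sets the weights proportional to the inverse covariance matrix times the mean excess returns. The sketch below is an illustration under stated assumptions, not the required method for the project: the means, volatilities, and the common correlation of 0.3 are hypothetical numbers, and no constraints (such as a ban on short positions) are imposed.

```r
# Hypothetical monthly inputs for three stocks (illustrative numbers, not estimates)
mu_excess <- c(A = 0.010, B = 0.006, C = 0.008)  # mean returns in excess of the T-bill rate
sds <- c(0.06, 0.04, 0.05)                       # monthly standard deviations
rho <- matrix(0.3, nrow = 3, ncol = 3)           # common pairwise correlation
diag(rho) <- 1
sigma <- diag(sds) %*% rho %*% diag(sds)         # covariance matrix

# Sharpe ratio of a portfolio with weights w
sharpe <- function(w) {
  sum(w * mu_excess) / sqrt(drop(t(w) %*% sigma %*% w))
}

# Naive portfolio: equal weights
w_naive <- rep(1 / 3, 3)

# Unconstrained tangency portfolio: weights proportional to solve(sigma, mu_excess)
z <- solve(sigma, mu_excess)
w_opt <- z / sum(z)

sharpe(w_naive)
sharpe(w_opt)   # at least as large as the naive Sharpe ratio
```

With real data, mu_excess and sigma would be replaced by estimates from the monthly return sample, and a constrained optimizer would be needed if short positions are not allowed.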
Portfolio Construction: Optimal Portfolio weights
Present the portfolio weights in a table (Table 2).

Next, for both portfolios (naïve and optimal), provide the following information:

1. Historical mean, variance, standard deviation, skewness, and kurtosis of monthly returns
2. A visual comparison between the trend in your optimal portfolio's mean return and market performance (S&P 500)
3. The number of months in which your portfolio outperformed the market, and by how much

In Discussion 2, explain how you constructed the theoretical portfolio, comment on the stock correlations and how they relate to the portfolio weights, and provide a clear summary of the above information.
Risk Measurement
Compute the following measures for both the optimal portfolio and the naïve portfolio:

1. CAPM beta
2. Systematic risk
3. Idiosyncratic risk using CAPM
4. The R-squared of the CAPM regression
5. The R-squared of the Fama and French three-factor model
Use monthly returns to compute these risk measures. Present the risk measures in Table 3 and discuss them in Discussion 3.
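The CAPM-based measures can be read off a single regression of portfolio excess returns on market excess returns. The sketch below uses simulated data as a hypothetical stand-in for real monthly returns; the variable names and the simulation parameters are assumptions.

```r
set.seed(42)   # simulated monthly excess returns, a hypothetical stand-in for real data
n <- 180       # 15 years of monthly observations
mkt_excess  <- rnorm(n, mean = 0.006, sd = 0.045)
port_excess <- 0.001 + 1.2 * mkt_excess + rnorm(n, sd = 0.02)

# CAPM regression: portfolio excess return on market excess return
capm <- lm(port_excess ~ mkt_excess)

beta <- unname(coef(capm)["mkt_excess"])        # CAPM beta
systematic_var    <- beta^2 * var(mkt_excess)   # variance explained by the market
idiosyncratic_var <- var(residuals(capm))       # variance of the regression residuals
r_squared <- summary(capm)$r.squared            # share of variance that is systematic
```

Total variance splits into the systematic and idiosyncratic parts, and the R-squared equals the systematic share. The three-factor R-squared comes from the same kind of regression with the two additional Fama and French factors on the right-hand side.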
Performance Evaluation

Compute the following performance measures for each of the two portfolios:

1. Portfolio Sharpe ratio


2. Market Sharpe ratio
3. Relative Sharpe ratio (i.e., portfolio Sharpe ratio/Market Sharpe ratio)
4. Jensen’s alpha using both CAPM and Fama and French factor models.
Present the performance measures in Table 4 and discuss them in Discussion 4.
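These performance measures can be sketched in a few lines. The data are simulated and the constant risk-free rate is a simplifying assumption; with real data, the monthly T-bill return would be used. Jensen's alpha is taken as the intercept of the CAPM regression on excess returns.

```r
set.seed(7)    # simulated monthly returns, a hypothetical stand-in for real data
n <- 180
rf   <- rep(0.002, n)                    # assumed constant monthly risk-free rate
mkt  <- rnorm(n, mean = 0.008, sd = 0.045)
port <- rf + 0.001 + 1.1 * (mkt - rf) + rnorm(n, sd = 0.02)

# Sharpe ratio: mean excess return per unit of excess-return volatility
sharpe_ratio <- function(r, rf) mean(r - rf) / sd(r - rf)

sharpe_port <- sharpe_ratio(port, rf)
sharpe_mkt  <- sharpe_ratio(mkt, rf)
relative_sharpe <- sharpe_port / sharpe_mkt

# Jensen's alpha: intercept of the CAPM regression on excess returns
port_excess <- port - rf
mkt_excess  <- mkt - rf
capm <- lm(port_excess ~ mkt_excess)
jensen_alpha <- unname(coef(capm)["(Intercept)"])
capm_beta    <- unname(coef(capm)["mkt_excess"])
```

The alpha under the Fama and French model is the intercept of the analogous regression that includes all three factors.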
Shiny App

1. Choose one Shiny app topic: the Sharpe ratio, the CAPM, or the Fama and French three-factor model.
2. Provide the R code for the Shiny app you have selected.

Provide a short summary of your code and approach in Discussion 5 and explain whether your app can replicate the graphs you created in Discussion 2.
Dealing with Free-Riders!

• At the end of the semester, after the presentation for the second phase, each member turns in a peer evaluation form
• The average grade of your peer evaluation is multiplied by your group performance and determines your individual grade for the project
