03-Data Gathering and Preparation
Financial Analytics
Sima Jannati
Download Finance Data
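library(quantmod)  # getSymbols()
library(Quandl)    # Quandl()
# GDP series from FRED via quantmod; creates an xts object named GDP
getSymbols(Symbols = "GDP", src = "FRED")
# The same series via Quandl: as a data frame (default) and as an xts object
gdp <- Quandl(code = "FRED/GDP")
gdp_xts <- Quandl(code = "FRED/GDP", type = "xts")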
library(tidycensus)
census_api_key("YOUR KEY GOES HERE", install = TRUE)
R Code - Census
The two main functions in tidycensus are get_decennial(), for the 2000 and 2010 decennial Censuses, and get_acs(), for the American Community Survey.
Both functions require two arguments: geography and variables.
The default year in get_decennial() is 2010 in the version used here; newer tidycensus releases default to 2020, so it is safer to set year explicitly.
library(tidycensus)
# Total population (2010 Census variable P001001) for every state
pop10 <- get_decennial(
  geography = "state",
  variables = "P001001",
  year = 2010
)
1. Expected Value
2. Variance
3. Covariance
4. Correlation
Random Variable
Example: a random variable X takes the value x1 = -1, x2 = 0, or x3 = 1 with probabilities p1, p2, and p3, where p1 + p2 + p3 = 1.
In general, a random variable can take many values; e.g., a future stock price can lie anywhere in [0, +∞).
So there are many possible realizations, each occurring with some probability.
General Probability Distribution:
[Figure: probability distribution over possible returns]
Simplification: Event-tree
It is often easier to focus on a few scenarios, for example, when you are an analyst issuing stock recommendations:
p1 Negative returns: x1
p2 Zero return: x2
p3 Positive returns: x3
Expected Value
1. Expected Value (Mean): probability-weighted sum of the outcomes, $E(X) = \sum_i p_i x_i$
Intuition: The expected return is our best guess of what the return will be in the future. It does not tell us which event is most probable (that is the mode), and it need not even be one of the possible outcomes.
Note:
The expected return of a portfolio with weights $w_X$ and $w_Y$ is
$E(w_X X + w_Y Y) = w_X E(X) + w_Y E(Y)$
Example:
$p_1 = P\{X = -1\} = P\{R < 0\} = 0.3$
$p_2 = P\{X = 0\} = P\{R = 0\} = 0.5$
$p_3 = P\{X = 1\} = P\{R > 0\} = 0.2$
$p_1 + p_2 + p_3 = 1$
Assigning $w_X = 1$ to X and $w_Y = 1$ to Y gives
$E(X + Y) = E(X) + E(Y)$
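A minimal R sketch of the probability-weighted sum, using the three scenarios of the example (x = -1, 0, 1 with probabilities 0.3, 0.5, 0.2). The second part checks the linearity of expectation for a portfolio; the returns of the second asset Y and the weights are hypothetical values chosen only for illustration.

```r
# Scenario values and probabilities from the event-tree example
x <- c(-1, 0, 1)
p <- c(0.3, 0.5, 0.2)
stopifnot(isTRUE(all.equal(sum(p), 1)))  # probabilities must sum to one

# Expected value: probability-weighted sum of the outcomes
EX <- sum(p * x)
EX  # -0.1, which is not one of the possible outcomes

# Hypothetical second asset Y, realized in the same three scenarios
y <- c(-0.5, 0.1, 0.4)
EY <- sum(p * y)

# Hypothetical portfolio weights
w_x <- 0.6
w_y <- 0.4

# Expected portfolio return computed scenario by scenario ...
E_port_direct <- sum(p * (w_x * x + w_y * y))
# ... equals the weighted sum of the individual expectations
E_port_linear <- w_x * EX + w_y * EY
all.equal(E_port_direct, E_port_linear)  # TRUE
```

Note that the expected value (-0.1) differs from the most probable outcome (0, with probability 0.5).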
Variance
What happens in the future can be very different from what we expect.
$\mathrm{Var}(X)$ (or $\sigma_X^2$) $= E[(X - E(X))^2] = \sum_i p_i (x_i - E(X))^2$
Intuition: Future returns can take many different values. The variance measures how far these values typically lie from the mean return.
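Continuing the three-scenario example, this R sketch computes the variance both from the definition and from the shortcut $E(X^2) - (E(X))^2$, then takes the square root to get the standard deviation.

```r
# Same three-scenario random variable as in the expected-value example
x <- c(-1, 0, 1)
p <- c(0.3, 0.5, 0.2)

EX <- sum(p * x)                  # expected value: -0.1

# Variance: probability-weighted average squared deviation from the mean
var_X <- sum(p * (x - EX)^2)
# Equivalent shortcut: Var(X) = E(X^2) - (E(X))^2
var_X_alt <- sum(p * x^2) - EX^2

sd_X <- sqrt(var_X)               # standard deviation, same units as X
c(variance = var_X, sd = sd_X)    # 0.49 and 0.7
```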
1. Why would investors consider positive deviations from the expected value to be part of their investment risk?
2. What are other statistical measures that can be used to measure risk?
Covariance
• The covariance is the probability-weighted average of the product of deviations from the means: $\mathrm{Cov}(X, Y) = \sigma_{XY} = E[(X - E(X))(Y - E(Y))]$.
Intuition: The covariance tells us whether two returns move together (positive covariance) or in opposite directions (negative covariance). If two variables are independent, their covariance is zero.
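A small R sketch of the covariance formula on the three-scenario example. The returns of asset Y are hypothetical values chosen for illustration, and Z is a hypothetical asset that moves exactly opposite to Y.

```r
# Scenario probabilities and outcomes for X, as in the expected-value example
p <- c(0.3, 0.5, 0.2)
x <- c(-1, 0, 1)
# Hypothetical returns of a second asset Y in the same three scenarios
y <- c(-0.5, 0.1, 0.4)

EX <- sum(p * x)
EY <- sum(p * y)

# Covariance: probability-weighted average product of deviations from the means
cov_XY <- sum(p * (x - EX) * (y - EY))
cov_XY                      # 0.228 > 0: X and Y tend to move together

# A hypothetical asset Z that moves opposite to Y
z <- -y
cov_XZ <- sum(p * (x - EX) * (z - sum(p * z)))
cov_XZ                      # -0.228: X and Z tend to move in opposite directions
```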
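Correlation
$\rho_{XY} = \dfrac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$, which always lies between -1 and 1
Intuition: The covariance can take any value, so it is difficult to infer from it how strong the connection between two variables is. Dividing by the two standard deviations removes the units and makes the strength of the relation comparable.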
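The following R sketch illustrates why the correlation is easier to interpret: rescaling returns (here from decimals to percent) changes the covariance but leaves the correlation unchanged. The returns are simulated for two hypothetical assets, not real data.

```r
set.seed(1)
# Simulated monthly returns for two hypothetical assets (not real data)
r1 <- rnorm(60, mean = 0.01, sd = 0.05)
r2 <- 0.5 * r1 + rnorm(60, mean = 0, sd = 0.03)

# Correlation = covariance scaled by both standard deviations
rho_manual  <- cov(r1, r2) / (sd(r1) * sd(r2))
rho_builtin <- cor(r1, r2)

# Expressing returns in percent multiplies the covariance by 100^2 ...
cov_pct <- cov(100 * r1, 100 * r2)
# ... but leaves the correlation unchanged
rho_pct <- cor(100 * r1, 100 * r2)

c(cov = cov(r1, r2), cov_pct = cov_pct, rho = rho_builtin, rho_pct = rho_pct)
```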
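Skewness
$\mathrm{Skew}(X) = \dfrac{E[(X - E(X))^3]}{\sigma_X^3}$
Kurtosis
$\mathrm{Kurt}(X) = \dfrac{E[(X - E(X))^4]}{\sigma_X^4}$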
R-Code
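library(quantmod)  # getSymbols(), Ad(); also loads xts
# Download daily prices from Yahoo Finance as xts objects
JPMorgan_stock <- getSymbols("JPM", src = "yahoo", auto.assign = FALSE)
GE_stock <- getSymbols("GE", src = "yahoo", auto.assign = FALSE)
# Plot adjusted closing prices
plot(Ad(JPMorgan_stock))
plot(Ad(GE_stock))
# Daily log returns: first difference of log adjusted prices
diff_Log_JPMorgan <- diff(log(Ad(JPMorgan_stock)))
diff_Log_JPMorgan <- na.omit(diff_Log_JPMorgan)  # drop the leading NA
JPMorgan_mean <- mean(diff_Log_JPMorgan, na.rm = TRUE)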
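R-Code
Variance and Standard Deviation
#Kurtosis
# rvec is assumed to be a numeric vector of returns; kurtosis() comes from an
# add-on package that must be loaded first (the package is not named here)
round(kurtosis(rvec), 2)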
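The variance and standard deviation code is not shown, and rvec is not defined. Here is a hedged sketch, assuming rvec is a numeric vector of returns (simulated here, not real data), that computes all four statistics in base R. The skewness and kurtosis helpers are hypothetical names that mirror the moment formulas for skewness and kurtosis. Some package implementations report excess kurtosis (kurtosis minus 3) or apply small-sample corrections, so their numbers can differ.

```r
# Hypothetical stand-in for rvec: simulated monthly log returns (not real data)
set.seed(123)
rvec <- rnorm(240, mean = 0.008, sd = 0.045)

# Sample variance and standard deviation (base R divides by n - 1)
round(var(rvec), 6)
round(sd(rvec), 4)

# Moment-based skewness and kurtosis (central moments divide by n;
# the kurtosis here is NOT excess kurtosis)
skew_fn <- function(r) {
  d <- r - mean(r)
  mean(d^3) / mean(d^2)^1.5
}
kurt_fn <- function(r) {
  d <- r - mean(r)
  mean(d^4) / mean(d^2)^2
}
round(skew_fn(rvec), 2)   # close to 0 for normally distributed draws
round(kurt_fn(rvec), 2)   # close to 3 for normally distributed draws
```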
Discussion
What is a graphic?
How can we succinctly describe a graphic?
How can we create the graphic we have described?
Grammar of Graphics
A good grammar:
Yields insights into the composition of complicated graphics
1. Data and Mapping
2. Layer(s)
3. Scale
4. Coordinate system
5. Facet

1. Data and Mapping
X-position
Y-position
Histogram
Box plot: minimum, 1st quartile, median, 3rd quartile, and maximum
Candlestick chart: open, high, low, and closing price
2. Layer(s)
Geometric object, called a “geom”, determines the type of plot
Every geom has a default stat and every stat has a default geom
Point geom creates a scatterplot
Line geom creates a line plot
Bar geom creates a bar chart; combined with the bin stat, it creates a histogram
Each geom has a limited set of aesthetics (e.g., point: {position, shape, color, size})
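A short ggplot2 sketch of the rule that every geom has a default stat and every stat has a default geom. geom_histogram() uses the bin stat by default, and stat_bin() uses the bar geom by default, so the two plots below should contain the same computed counts. The choice of bins = 30 is arbitrary.

```r
library(ggplot2)

# geom_histogram() pairs the bar geom with the "bin" stat by default ...
p_geom <- ggplot(diamonds, aes(x = carat)) +
  geom_histogram(bins = 30)

# ... and stat_bin() pairs the "bin" stat with the bar geom by default,
# so both calls describe the same layer
p_stat <- ggplot(diamonds, aes(x = carat)) +
  stat_bin(bins = 30)

# Compare the counts computed for the two layers
identical(layer_data(p_geom)$count, layer_data(p_stat)$count)
```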
3. Scales
Examples:
X and Y position on a scatterplot
Let’s look at three sets of code to plot diamond size in carats (=X) versus cost (=Y)
Example 1
library(ggplot2)
diamonds  # the diamonds data set ships with ggplot2
ggplot() +
  layer(data = diamonds, mapping = aes(x = carat, y = price),
        geom = "point", stat = "identity", position = "identity") +
  scale_y_continuous() +
  scale_x_continuous() +
  coord_cartesian()
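Because the point geom defaults to the identity stat and position, and continuous x/y scales with Cartesian coordinates are the defaults, the fully specified layer above can be written much more compactly. The sketch below shows one such compact form (not necessarily the author's Example 2) and checks that both versions plot the same points.

```r
library(ggplot2)

# Fully specified version (as in Example 1)
p_full <- ggplot() +
  layer(data = diamonds, mapping = aes(x = carat, y = price),
        geom = "point", stat = "identity", position = "identity") +
  scale_y_continuous() +
  scale_x_continuous() +
  coord_cartesian()

# Compact version relying on ggplot2 defaults
p_short <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point()

# Both versions describe the same points
d_full  <- layer_data(p_full)
d_short <- layer_data(p_short)
identical(d_full$x, d_short$x) && identical(d_full$y, d_short$y)
```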
Example 2
1. Can you create at least two other charts/graphics based on the diamonds
data?
Making Maps with the maps Package
Code to set up the map, including coordinates for Indianapolis, IN and Arlington, TX:
library(ggplot2)  # map_data()
library(maps)     # supplies the underlying "usa" map database
usa <- map_data("usa")
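The city coordinates are not shown here. As a sketch, the map can be drawn with ggplot2 and the two cities marked using approximate coordinates (about 39.77 N, 86.16 W for Indianapolis and 32.74 N, 97.11 W for Arlington). These values are rounded approximations, not taken from the original code.

```r
library(ggplot2)
library(maps)

usa <- map_data("usa")  # outline of the contiguous United States

# Approximate city coordinates (west longitudes are negative)
cities <- data.frame(
  city = c("Indianapolis, IN", "Arlington, TX"),
  long = c(-86.16, -97.11),
  lat  = c(39.77, 32.74)
)

us_map <- ggplot() +
  geom_polygon(data = usa, aes(x = long, y = lat, group = group),
               fill = "grey90", color = "grey40") +
  geom_point(data = cities, aes(x = long, y = lat), color = "red", size = 3) +
  geom_text(data = cities, aes(x = long, y = lat, label = city),
            vjust = -1, size = 3) +
  coord_quickmap()  # keeps the map's aspect ratio roughly correct

print(us_map)
```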
You can download the monthly market factor and the monthly 30-day T-Bill return from
http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html#Research.
Click on the Fama/French factors link and open the zip file (the data is a plain-text file).
NOTE: the returns on Ken French's website are based on beginning-of-month prices. For example, the RMRF for 192607 refers to a monthly return where the portfolio was bought on the first trading day of 192606 and sold on the first trading day of 192607.
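Once the monthly rows are read into R, two conversions are usually needed: turning the YYYYMM codes into dates and turning the percent returns in the factor file into decimals. A minimal sketch with hypothetical helper names. The single data row is made up for illustration and is not an actual Fama/French value.

```r
# Convert a YYYYMM code from the Fama/French file into a Date (first of month)
ff_month_to_date <- function(yyyymm) {
  as.Date(paste0(yyyymm, "01"), format = "%Y%m%d")
}

# The factor file reports returns in percent; divide by 100 for decimals
ff_pct_to_decimal <- function(x) x / 100

# Hypothetical row, for illustration only (not actual Fama/French values)
ff_row <- data.frame(date = "192607", Mkt_RF = 2.50, RF = 0.20)

ff_row$month  <- ff_month_to_date(ff_row$date)
ff_row$mkt_rf <- ff_pct_to_decimal(ff_row$Mkt_RF)
ff_row$rf     <- ff_pct_to_decimal(ff_row$RF)
# Total market return = excess market return + risk-free rate
ff_row$mkt    <- ff_row$mkt_rf + ff_row$rf
ff_row
```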
Portfolio Construction: Stock Selection
Suppose you had $100,000 to invest.
Construct a portfolio containing at least 4 stocks per group member.
You need to clearly outline your investment strategy and explain how each stock
fits into your strategy.
When forming your portfolio, you should also think about “shorting” positions.
In Discussion 1, provide the following information for each stock in your
portfolio:
Next, construct the portfolio by choosing the portfolio weights so that you maximize
the Sharpe ratio of your portfolio.
Also, construct a portfolio that assigns equal weights to all stocks. Call this the naïve
portfolio.
In your discussion, report the Sharpe ratio of the naïve portfolio and optimal
portfolio.
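A hedged R sketch of the two portfolios. It assumes shorting is allowed, so the weights are unconstrained, and it uses a constant monthly risk-free rate and simulated returns for four hypothetical stocks (not real data). Under those assumptions, the maximum-Sharpe (tangency) weights are proportional to $\Sigma^{-1}(\mu - r_f)$, rescaled to sum to one. A no-short-selling constraint would instead require a numerical optimizer.

```r
set.seed(2024)
# Simulated monthly returns for four hypothetical stocks (not real data)
n_months <- 240
mu_true  <- c(0.010, 0.008, 0.012, 0.006)
R <- matrix(rnorm(n_months * 4, mean = rep(mu_true, each = n_months), sd = 0.03),
            ncol = 4, dimnames = list(NULL, c("S1", "S2", "S3", "S4")))
rf <- 0.002  # assumed constant monthly risk-free rate

mu    <- colMeans(R)   # estimated mean returns
Sigma <- cov(R)        # estimated covariance matrix

# Monthly Sharpe ratio of a weight vector w
sharpe <- function(w) {
  (sum(w * mu) - rf) / sqrt(drop(t(w) %*% Sigma %*% w))
}

# Max-Sharpe (tangency) weights with shorting allowed
z     <- solve(Sigma, mu - rf)
w_opt <- z / sum(z)

# Naive portfolio: equal weights
w_naive <- rep(1 / ncol(R), ncol(R))

round(rbind(optimal = w_opt, naive = w_naive), 3)
c(optimal = sharpe(w_opt), naive = sharpe(w_naive))
```

Because the weights come from in-sample estimates of the means and covariances, the optimal portfolio's Sharpe ratio is, by construction, at least as high as the naïve one in the same sample. This need not hold out of sample.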
Portfolio Construction: Optimal Portfolio weights
Present the portfolio weights in a table (Table 2).
Next, for both portfolios (naïve and optimal) provide the following information:
1. Historical mean, variance, standard deviation, skewness, and kurtosis of monthly returns
2. A visual comparison between the trend in your optimal portfolio's mean return and market performance (S&P 500).
3. In how many months has your portfolio outperformed the market, and by how much?
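Items 2 and 3 can be sketched in R as follows. The monthly portfolio and market returns are simulated, not real data. "Outperformed" is taken here to mean a higher monthly return than the market, and the visual comparison uses the cumulative growth of $1.

```r
set.seed(7)
# Simulated monthly returns (not real data): a market index and a portfolio
market    <- rnorm(120, mean = 0.008, sd = 0.04)
portfolio <- 0.002 + 1.1 * market + rnorm(120, sd = 0.02)

diff_ret <- portfolio - market   # monthly return difference
beat     <- diff_ret > 0         # TRUE in months the portfolio beat the market

n_beat       <- sum(beat)              # number of outperforming months
avg_margin   <- mean(diff_ret[beat])   # average margin in those months
total_margin <- sum(diff_ret[beat])    # summed margin over those months

# Cumulative growth of $1 for a visual comparison of the two series
growth <- data.frame(
  month     = seq_along(market),
  portfolio = cumprod(1 + portfolio),
  market    = cumprod(1 + market)
)
matplot(growth$month, growth[, c("portfolio", "market")], type = "l", lty = 1,
        xlab = "Month", ylab = "Growth of $1")
legend("topleft", legend = c("Portfolio", "Market"), col = 1:2, lty = 1)

c(months_beat = n_beat, avg_margin = avg_margin, total_margin = total_margin)
```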
In Discussion 2, discuss how you constructed the theoretical portfolio, comment on the stock correlations and how they relate to the portfolio weights, and provide a clear summary of the above information.
Risk Measurement
Compute the following measures for both your optimal portfolio and the naïve portfolio:
1. CAPM beta
2. Systematic risk
3. Idiosyncratic risk using CAPM
4. Report the R-squared of the CAPM regression
5. Report the R-squared of the Fama and French three factor model
Use monthly returns to compute these risk measures. Present the risk measures in Table 3 and
discuss them in Discussion 3.
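A hedged R sketch of the risk measures using lm(). The factors and portfolio returns are simulated, not real Fama/French data. In this sketch, systematic and idiosyncratic risk are expressed as variances (beta squared times the market variance, and the residual variance); take square roots if you prefer standard-deviation units.

```r
set.seed(11)
n <- 120
# Simulated monthly factors in decimals (not real Fama/French data)
mkt_rf <- rnorm(n, 0.006, 0.045)
smb    <- rnorm(n, 0.002, 0.030)
hml    <- rnorm(n, 0.003, 0.030)
rf     <- rep(0.002, n)

# Simulated portfolio return with chosen exposures plus noise
port    <- rf + 1.2 * mkt_rf + 0.3 * smb - 0.2 * hml + rnorm(n, 0, 0.02)
port_rf <- port - rf                          # portfolio excess return

# CAPM: regress excess portfolio return on excess market return
capm <- lm(port_rf ~ mkt_rf)
beta <- unname(coef(capm)["mkt_rf"])

systematic    <- beta^2 * var(mkt_rf)        # variance explained by the market
idiosyncratic <- var(residuals(capm))        # residual (firm-specific) variance
r2_capm       <- summary(capm)$r.squared     # = systematic / var(port_rf)

# Fama-French three-factor model
ff3    <- lm(port_rf ~ mkt_rf + smb + hml)
r2_ff3 <- summary(ff3)$r.squared

round(c(beta = beta, systematic = systematic, idiosyncratic = idiosyncratic,
        R2_CAPM = r2_capm, R2_FF3 = r2_ff3), 4)
```

With OLS and an intercept, systematic plus idiosyncratic variance adds up to the total variance of the excess return. Adding the SMB and HML factors can never lower the R-squared.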
Performance Evaluation
Compute the following performance measures for each of the two portfolios:
1. Choose one Shiny app: Sharpe ratio, CAPM, or the Fama and French three-factor model.
2. Provide the R code for the shiny app you have selected.
Provide a short summary of your code and approach in Discussion 5 and explain
whether your app can replicate the graphs you created in Discussion 2.
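For orientation only, here is a minimal sketch of a Sharpe ratio app built with the shiny package. It is a hypothetical example, not a template for the required app. It uses simulated normal returns rather than your portfolio data, and the input names (mu, sigma, rf, n) are illustrative. The app object is created but not launched.

```r
library(shiny)

# Sharpe ratio of a return series given a constant per-period risk-free rate
sharpe_ratio <- function(returns, rf) {
  (mean(returns) - rf) / sd(returns)
}

ui <- fluidPage(
  titlePanel("Sharpe Ratio (simulated returns)"),
  sidebarLayout(
    sidebarPanel(
      numericInput("mu", "Mean monthly return", value = 0.01, step = 0.001),
      numericInput("sigma", "Monthly volatility", value = 0.05, min = 0.001, step = 0.005),
      numericInput("rf", "Monthly risk-free rate", value = 0.002, step = 0.001),
      sliderInput("n", "Number of months", min = 12, max = 240, value = 60)
    ),
    mainPanel(
      textOutput("sharpe"),
      plotOutput("hist")
    )
  )
)

server <- function(input, output, session) {
  # Simulated returns (not real data); regenerated whenever the inputs change
  rets <- reactive({
    set.seed(1)
    rnorm(input$n, mean = input$mu, sd = input$sigma)
  })
  output$sharpe <- renderText({
    sprintf("Monthly Sharpe ratio: %.3f", sharpe_ratio(rets(), input$rf))
  })
  output$hist <- renderPlot({
    hist(rets(), main = "Simulated monthly returns", xlab = "Return")
    abline(v = input$rf, col = "red", lty = 2)  # risk-free rate for reference
  })
}

app <- shinyApp(ui = ui, server = server)
# runApp(app)  # uncomment to launch the app in a browser
```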
Dealing with Free-Riders!
At the end of the semester, after the presentation for the second phase, each member turns in a peer evaluation form.
The average grade of your peer evaluation is multiplied by your group performance, and the result determines your individual grade for the project.