CFM_Programming Task
July 6, 2022
1 Task 1
1.1
There are some remarkable outliers in the quarterly inflation series, such as 256 and -35, which are obviously implausible.
I replace the outliers (defined as the 1% tails of the distribution of inflation across all periods) with national-level inflation extracted from the FRED database.
The data on inflation is missing between the years 1986 and 1989. I leave these observations as they are, because replacing these dates with national-level inflation could result in biased estimations in sections 4 and 5.
1.2
The graph below depicts the median and the 25th and 75th percentiles of state-level inflation in each quarter. The shaded areas represent US recessions.
Figure 1. State-level inflation, USA (y-axis: Quarterly Inflation %)
From the figure above, I conclude that the dispersion of inflation (the distance between the 75th and 25th percentiles) stayed roughly constant during the majority of the recessions. The financial crisis of 2008-2009 is an exception, during which this dispersion decreased considerably. In the other recessions, however, inflation dispersion across the states shows no specific pattern: it increased at some times (the first recession in the plot) and in most cases remained constant.
1.3
In the attached STATA code I generate a dummy which equals one if the absolute value of the difference between a state's inflation and the median inflation in that period is greater than 1 (1 percentage point equals 100 basis points).
I then count the number of observations with that dummy equal to one and divide it by the number of all observations in that period. This gives the share of states that had inflation more than 100 basis points away from the median, which is equal to 33%.
1.4
I think the question can be best answered by using time and state fixed effects. The R2 of a regression with inflation as the dependent variable and time fixed effects as the only regressors tells us what percentage of the variation in inflation can be explained by factors that are common to all states in each period (i.e. national factors).
On the other hand, the R2 of a regression with inflation as the dependent variable and state fixed effects as the only regressors shows the fraction of the variation in inflation which can be explained by state-specific factors that are constant over time.
The table below shows the results of these two regressions:
Table 3. Time and State Fixed Effect R-squared

(1) Time Fixed Effect: R2 ≈ 0.69
(2) State Fixed Effect: R2 < 0.01
So around 69% of the variation in inflation can be explained by common changes in inflation across all states (time fixed effects), and less than 1% is due to constant differences between states. This is reasonable: inflation (at least in the long run) is determined by the fiscal and monetary policies of the government, and these do not differ across states!
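The time-fixed-effects R2 can be computed by hand: with time dummies as the only regressors, the fitted value of each observation is its period mean. A minimal sketch with made-up numbers:

```python
# R^2 of a regression of inflation on time fixed effects only.
# With time dummies as the sole regressors, the fitted value for each
# observation is the mean inflation of its period, so R^2 = 1 - SSR/SST
# can be computed without a regression package.
quarters = {
    "q1": [2.0, 2.2, 1.8],
    "q2": [0.5, 0.7, 0.3],
}

all_obs = [x for obs in quarters.values() for x in obs]
grand_mean = sum(all_obs) / len(all_obs)
sst = sum((x - grand_mean) ** 2 for x in all_obs)

ssr = 0.0
for obs in quarters.values():
    period_mean = sum(obs) / len(obs)
    # Residual = deviation from the period mean (the time-FE fitted value).
    ssr += sum((x - period_mean) ** 2 for x in obs)

r2 = 1 - ssr / sst
print(round(r2, 3))
```

Here most variation is between quarters, so the time-FE R2 is high, mirroring the 69% found in the data.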
1.5
To check the persistence of inflation, I regress inflation on its lag:

πt = α0 + α1 πt−1 + εt

The more persistent inflation is, the larger the coefficient on its lag. The null hypothesis is:

H0 : α1 = 0
(1) No Fixed Effect    (2) Time Fixed Effect
Although controlling for national factors reduces the magnitude of the coefficient on πt−1, it remains statistically significant. So we reject the hypothesis that persistence in inflation is only due to national factors.
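The no-fixed-effect regression of πt on πt−1 is a simple OLS whose slope is cov(πt, πt−1)/var(πt−1). A sketch with a synthetic series:

```python
# OLS slope of pi_t on pi_{t-1} for a single series (no fixed effects):
# alpha_1 = cov(pi_t, pi_{t-1}) / var(pi_{t-1}).
pi = [2.0, 1.8, 1.7, 1.9, 1.6, 1.5, 1.6, 1.4]  # synthetic inflation series

y = pi[1:]   # pi_t
x = pi[:-1]  # pi_{t-1}
mx = sum(x) / len(x)
my = sum(y) / len(y)

cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
var = sum((a - mx) ** 2 for a in x)
alpha1 = cov / var
print(round(alpha1, 3))
```

A positive, sizeable alpha1 indicates persistence; in the actual exercise the standard error (and the time fixed effects) come from the regression package.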
2 Task 2
2.1
A short definition of terms:
• Extracted state/city: what I have simply extracted from the inventorlocation variable as the state/city of the inventor
• True states/cities: complete and correctly spelled names of cities/states in the US
I first split the location of the inventor by comma. After doing this, each location is a list with a few items. We extract the state and city of the inventor by choosing the second and third items (from the end of the list), respectively. Although this is not a universal pattern in the data set, it's a good start and works for many observations! (This can be done automatically; I will discuss that.)
Now I have to match these incomplete (and often misspelled) names with the true names of US states and cities.
Ideally we need a complete list of US cities, since it is possible for the inventor to live in locations that do not appear in "PlantLocations.csv". Here I only use the states and cities which are in the PlantLocations file, but the extension to a more comprehensive set of cities is straightforward.
I match the names of the extracted cities and states to their true names by two different methods.
The first method uses the Levenshtein distance, as implemented in the FuzzyWuzzy library for Python. This allows us to check the equality of two strings in a fuzzy (rather than binary) way.
I compare the extracted state and city of the inventor with each state and city in the PlantLocations file. The city and state that receive the highest score (lowest Levenshtein distance) are assigned to the inventor.
The second method uses another Python library, abydos.phonetic. I use the Russell Index algorithm, which encodes words as numbers; similar-sounding words receive similar codes. In a process analogous to the previous method, I encode the names of the cities and states and pick the most similar one (i.e. the smallest difference between the encoded extracted state and the encoded true states).
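To illustrate the phonetic idea without the abydos dependency, here is a simplified Soundex (a descendant of Russell's index; this is a sketch, not abydos's implementation):

```python
def soundex(word: str) -> str:
    """Simplified American Soundex: keep the first letter, map consonants
    to digits, drop vowels, collapse adjacent duplicate digits, pad to 4."""
    codes = {c: d for d, letters in
             {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
              "4": "l", "5": "mn", "6": "r"}.items() for c in letters}
    word = word.lower()
    out = word[0].upper()
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            out += code
        prev = code
    return (out + "000")[:4]

# Similar-sounding spellings map to the same code.
print(soundex("Pittsburgh"), soundex("Pittsburg"))
```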
Below is the pseudo-code for the first method (see the Appendix for the second method). I write it for finding states, but it is exactly the same for finding cities:
first method:
for each extracted state
    for each true state
        compute the fuzzy-match score between the extracted state and the true state
    end
    assign the true state with the highest score to the inventor
end
Note that this process can be done even without extracting the city and state from the inventorlocation variable: we can simply calculate the Levenshtein distance between inventorlocation and the states/cities and pick the state/city with the smallest distance.
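The matching loop can be sketched with the standard-library difflib as a stand-in for FuzzyWuzzy (the state list and misspellings below are made up):

```python
import difflib

# Hypothetical reference list (in practice, the states in the PlantLocations file).
true_states = ["California", "Texas", "New York", "Illinois"]

def match_state(extracted: str) -> str:
    """Return the true state whose name is most similar to the
    extracted (possibly misspelled) name."""
    scores = {s: difflib.SequenceMatcher(None, extracted.lower(), s.lower()).ratio()
              for s in true_states}
    return max(scores, key=scores.get)

print(match_state("Californa"))  # misspelled input
print(match_state("new yrok"))
```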
I have implemented these algorithms in Python. The figure below shows the results for some sample observations. On average we are doing OK, but there are also some errors. The Inventor_city/Inventor_state columns are the result of the method; the extracted state/extracted city columns are from the raw data.
2.2
Since we have no other information, we assume that the inventor works at the plant nearest to his/her home.
We can have a matrix (or function) which calculates the distance between any two given cities. We then calculate (in a loop) the distance between the inventor's home city and each plant of his company, and choose the plant at the minimum distance as the plant where the inventor works.
Below is the pseudo-code for this procedure:
for each inventor
    for each plant of the inventor's company
        compute the distance between the inventor's city and the plant's city
    end
    assign the plant with the minimum distance to the inventor
end
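A sketch of this nearest-plant assignment using great-circle (haversine) distances; the plant list and coordinates are invented for illustration:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371 * math.asin(math.sqrt(a))

# Hypothetical plants of the inventor's company: name -> (lat, lon).
plants = {
    "Chicago": (41.88, -87.63),
    "Houston": (29.76, -95.37),
    "Boston": (42.36, -71.06),
}

inventor_home = (40.71, -74.01)  # roughly New York City

# Assign the plant at the minimum distance from the inventor's home city.
nearest = min(plants, key=lambda p: haversine_km(*inventor_home, *plants[p]))
print(nearest)
```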
This method will be reliable if the different plants of a company are located far from each other. If each plant is reasonably near the others, then distance may not be the most important factor. We would probably have to gather data on plant and inventor attributes (likely from some other data set) and then estimate a discrete-choice model (which inventors choose which plant). We could then predict the plant at which each inventor is most likely to work.
3 Task 3
3.1
First let's write the problem with its budget constraint:

max : V = E0 [ Σ_{t=0}^∞ β^t log(ct) ]

s.t. : kt+1 = kt − ct − δt kt

where kt is the stock of cake remaining at period t and δt is a stochastic variable (shock). We assume that the initial size of the cake is equal to one.
Now we can write the Bellman equation:

V(kt, δt) = max over ct of { log(ct) + β E[ V(kt+1, δt+1) | δt ] }

s.t. : kt+1 = kt − ct − δt kt
Since our shock is state-dependent, we have to condition our expectation on the past realization of the shock.
Given the information in the question, good/bad days follow a Markov process with transition matrix P:
P = [ 0.8  0.2
      0.2  0.8 ]
3.2
The value of log(ct) is a function only of how much cake is eaten today (ct); the other two variables are irrelevant.
In the code, I build a 3-dimensional matrix (100 × 100 × 2) which stores the value of log(ct) for each combination of the variables (kt, ct, δt). Since this is a function of ct alone, the only variation is across the rows of the matrix. I then interpolate the function.
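A sketch of this utility grid and a one-dimensional linear interpolation in ct, with the grid sizes shrunk for readability (pure standard library):

```python
import math

# Grid over c_t. The 3-D array u[k][c][delta] (here 5 x 5 x 2 instead of
# 100 x 100 x 2) stores log(c_t), which varies along the c dimension only.
c_grid = [0.1, 0.2, 0.4, 0.6, 0.8]
u = [[[math.log(c) for _ in range(2)] for c in c_grid] for _ in range(5)]

def interp_u(c: float) -> float:
    """Piecewise-linear interpolation of log(c) on the c grid."""
    if c <= c_grid[0]:
        return math.log(c_grid[0])
    for lo, hi in zip(c_grid, c_grid[1:]):
        if c <= hi:
            w = (c - lo) / (hi - lo)
            return (1 - w) * math.log(lo) + w * math.log(hi)
    return math.log(c_grid[-1])

print(round(interp_u(0.3), 4))  # between log(0.2) and log(0.4)
```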
3.3
Because of the log function's behavior near zero, I have changed the utility function to another similar and standard one, CRRA:

U(c) = c^(1−γ) / (1 − γ)

with γ = 0.8.
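A compact sketch of the value function iteration under these assumptions; the discount factor and the good/bad shock sizes below are assumed for illustration, while P is the transition matrix from the question:

```python
# Value function iteration for the cake-eating problem:
#   V(k, s) = max_c { u(c) + beta * E[ V(k', s') | s ] },  k' = k - c - delta[s] * k
# CRRA utility with gamma = 0.8; s in {good, bad}.
beta, gamma = 0.95, 0.8          # beta is an assumed value
delta = [0.05, 0.15]             # assumed shock sizes for good/bad days
P = [[0.8, 0.2], [0.2, 0.8]]     # transition matrix from the question

n = 50
k_grid = [0.01 + i * (1.0 - 0.01) / (n - 1) for i in range(n)]
u = lambda c: c ** (1 - gamma) / (1 - gamma)

V = [[0.0] * 2 for _ in range(n)]
for _ in range(300):                  # fixed number of sweeps, enough for beta = 0.95
    V_new = [[0.0] * 2 for _ in range(n)]
    for i, k in enumerate(k_grid):
        for s in range(2):
            best = float("-inf")
            for j in range(i + 1):    # candidate next-period cake k' on the grid
                c = k - k_grid[j] - delta[s] * k
                if c <= 0:
                    continue
                ev = P[s][0] * V[j][0] + P[s][1] * V[j][1]
                best = max(best, u(c) + beta * ev)
            # If no feasible grid point remains, eat all remaining cake.
            V_new[i][s] = best if best > float("-inf") else u(k * (1 - delta[s]))
    V = V_new

print(round(V[-1][0], 3))  # value of a full cake on a good day
```

The policy (argmax over j) can be stored alongside V and then interpolated to simulate consumption paths.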
After solving the problem by value function iteration, I save the policy function and interpolate it. This allows me to simulate the model. I also draw a Markov process with matrix P for 100 periods.
The graph below shows the results of a simulation starting from k = 1 (a complete cake) and a good day:
Figure 3. Cake eating model simulation
I have also plotted the value functions. Good days increase the value function a little, which is reasonable.

Figure 4. Cake eating model value functions
4 Appendix
Pseudo-code for the second method:
for each true state
    encode the name of the state with the Russell Index
end

for each extracted state
    for each true state
        compute the difference between the two encoded names
    end
    assign the true state with the smallest difference to the inventor
end
And the results: