Midterm Asm
June 4, 2023
1. Import data
[72]: import pandas as pd
import matplotlib.pyplot as plt
# Annual Index
annual_data = pd.read_csv(r'C:\Users\Admin\Downloads\Annual Index.csv')
annual_data['Date'] = pd.to_datetime(annual_data['Date'])
annual_data.set_index('Date', inplace=True)
# Rename Columns
index = ['Germany', 'Canada', 'USA', 'United Kingdom', 'France', 'Japan']
mapper = {"GERMANY Standard (Large+Mid Cap)": "Germany",
          "CANADA Standard (Large+Mid Cap)": "Canada",
monthly_data.rename(columns=mapper, inplace=True)
annual_data.rename(columns=mapper, inplace=True)
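The loading and renaming steps can be sketched on a small in-memory CSV. This is only an illustration under stated assumptions: the two-row extract and the helper `load_index` are hypothetical, and the real files are read from the paths used in the notebook.

```python
import io
import pandas as pd

# Hypothetical two-row extract standing in for the real index file.
csv_text = """Date,GERMANY Standard (Large+Mid Cap),CANADA Standard (Large+Mid Cap)
2020-01-31,100.0,200.0
2020-02-29,102.0,198.0
"""

def load_index(source):
    """Read an index CSV, parse the dates, and index the frame by date."""
    df = pd.read_csv(source, parse_dates=["Date"])
    return df.set_index("Date")

# Only the two mappings visible in the notebook are used here.
mapper = {
    "GERMANY Standard (Large+Mid Cap)": "Germany",
    "CANADA Standard (Large+Mid Cap)": "Canada",
}
demo = load_index(io.StringIO(csv_text)).rename(columns=mapper)
print(demo)
```

The same helper could be applied to both the monthly and the annual file so that the two frames share one index and one set of column names.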
2. Clean data by removing NA values. Calculate returns for each series in the data
set. Use tidyverse package to produce plot of the above data set in two frequencies of
monthly and annual levels as well as at both price and return levels. Provide summary
statistics of the data.
2.1 Clean data by removing NA values
[74]: # Clean data by removing NA values
monthly_data_cleaned = monthly_data.apply(pd.to_numeric, errors='coerce').dropna()
# Calculate returns for each series
monthly_returns = monthly_data_cleaned.pct_change()
annual_returns = annual_data_cleaned.pct_change()
monthly_returns.dropna(inplace=True)
annual_returns.dropna(inplace=True)
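As a small worked example of what `pct_change()` computes, the sketch below uses made-up prices (not the index data) so the simple-return formula can be checked by hand.

```python
import pandas as pd

# Made-up prices for illustration only.
prices = pd.DataFrame(
    {"USA": [100.0, 110.0, 99.0], "Japan": [50.0, 55.0, 55.0]},
    index=pd.to_datetime(["2020-01-31", "2020-02-29", "2020-03-31"]),
)

# Simple return: r_t = P_t / P_{t-1} - 1.
# The first row has no predecessor, so it is NaN and is dropped.
returns = prices.pct_change().dropna()
print(returns)
```

The USA column moves from 100 to 110 (+10%) and then from 110 to 99 (-10%), which is what the two remaining rows show.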
[64]: monthly_returns
[65]: annual_returns
2.2 Use tidyverse package to produce plot of the above data set in two frequencies of monthly and
annual levels as well as at both price and return levels.
[75]: import matplotlib.pyplot as plt
import matplotlib.dates as mdates
plt.show()
plt.show()
[76]: # Plotting the annual data
plt.figure(figsize=(18,5))
for column in annual_data_cleaned.columns[1:]:
    plt.plot(annual_data_cleaned['Date'], annual_data_cleaned[column], label=column)
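One possible way to show both frequencies at both price and return levels is a 2x2 grid of panels. The sketch below runs on simulated prices, and the country names and the year-end aggregation are assumptions made for the illustration, not the notebook's actual plotting code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2000-01-01", periods=120, freq="MS")
# Simulated monthly price levels (assumption: stand-in for the index data).
monthly = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0.005, 0.05, size=(120, 2)), axis=0)),
    index=dates, columns=["USA", "Japan"],
)
# Last observed level in each calendar year.
annual = monthly.groupby(monthly.index.year).last()

panels = {
    "Monthly prices": monthly,
    "Monthly returns": monthly.pct_change().dropna(),
    "Annual prices": annual,
    "Annual returns": annual.pct_change().dropna(),
}
fig, axes = plt.subplots(2, 2, figsize=(18, 8))
for ax, (title, frame) in zip(axes.ravel(), panels.items()):
    for column in frame.columns:
        ax.plot(frame.index, frame[column], label=column)
    ax.set_title(title)
    ax.legend()
fig.tight_layout()
```

Keeping the four frames in one dictionary means each panel is drawn by the same loop, so the price and return plots stay visually comparable.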
2.3 Provide summary statistics of the data.
[78]: import pandas as pd
# Monthly Index
# Compute summary statistics
summary_stats = monthly_returns.describe()
United Kingdom
count 640.000000
mean 0.005680
std 0.061128
min -0.217360
25% -0.026872
50% 0.006536
75% 0.038066
max 0.554762
median 0.006536
• Count: The count is the number of monthly return observations available for each
country, which is 640 in this case. An equal count across countries is consistent with the
rows containing NA values having been dropped.
• Mean: The mean represents the average monthly return for each country. On average, all
countries have positive returns, ranging from 0.005680 (United Kingdom) to 0.007177 (Japan).
• Standard Deviation: The standard deviation measures the dispersion or variability of the
monthly returns around the mean. Countries like Germany and France have relatively higher
standard deviations, indicating greater volatility in their returns compared to other countries.
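• Minimum and Maximum: These values represent the lowest and highest monthly returns
observed for each country. For example, the minimum return for Canada is -0.271553 and the
maximum return for France is 0.262018. The United Kingdom column shown above has a larger
maximum of 0.554762, so 0.262018 is not the highest return across all countries.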
• Quartiles: The quartiles (25%, 50%, and 75%) provide information about the distribution
of returns. The median (50%) represents the middle value, separating the data into two equal
halves. The interquartile range (75% - 25%) gives an indication of the spread of the data.
For example, the 25th percentile (Q1) for France is -0.030122, while the 75th percentile (Q3)
is 0.045426.
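The statistics discussed in the bullets above can be computed and cross-checked as follows. This is a sketch on simulated returns (the column names and values are assumptions), showing how to add the median and interquartile range and how to find which country holds the overall minimum and maximum.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Simulated monthly returns standing in for monthly_returns.
returns = pd.DataFrame(rng.normal(0.006, 0.05, size=(640, 3)),
                       columns=["USA", "France", "Japan"])

stats = returns.describe()
stats.loc["median"] = returns.median()                   # same as the 50% row
stats.loc["IQR"] = stats.loc["75%"] - stats.loc["25%"]   # interquartile range

# Extremes across all countries, together with the country they belong to.
lowest = (returns.min().idxmin(), returns.min().min())
highest = (returns.max().idxmax(), returns.max().max())
print(stats.round(4))
print("lowest:", lowest, "highest:", highest)
```

Reporting the extremes with `idxmin` and `idxmax` avoids quoting one country's minimum or maximum as if it were the extreme of the whole data set.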
[79]: # Annual Index
# Compute summary statistics
summary_stats = annual_returns.describe()
• Mean: The mean represents the average annual return for each country. On average, all
countries have positive returns, ranging from 0.036942 (France) to 0.187846 (Japan).
• Standard Deviation: The standard deviation measures the dispersion or variability of
the annual returns around the mean. Countries like France and Japan have relatively higher
standard deviations, indicating greater volatility in their returns compared to other countries.
• Minimum and Maximum: These values represent the lowest and highest annual returns
observed across all countries. For example, the minimum return is -0.344408 (France), and
the maximum return is 1.211648 (Japan).
• Quartiles: The quartiles (25%, 50%, and 75%) provide information about the distribution
of returns. The median (50%) represents the middle value, separating the data into two equal
halves. The interquartile range (75% - 25%) gives an indication of the spread of the data.
For example, the 25th percentile (Q1) for France is -0.091996, while the 75th percentile (Q3)
is 0.208468.
3. Perform Cointegration Test:
a. Use the Engle-Granger two-step method to detect any possible cointegration between the
US and each of the other 5 countries. For the other data sets, you can decide pairs of each two
series for your study.
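[94]: # Checking Order of Integration
import statsmodels.api as sm
# Run the ADF test on each country's price series
# (loop reconstructed from the use of `country` below)
adf_test_result = {}
for country in monthly_data_cleaned.columns:
    adf_test = sm.tsa.adfuller(monthly_data_cleaned[country])
    adf_test_result[country] = {'ADF Statistic': adf_test[0],
                                'p-value': adf_test[1]}
    print(f"Augmented Dickey-Fuller Test for {country}:")
    print(f"ADF Statistic: {adf_test[0]}")
    print(f"p-value: {adf_test[1]}\n")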
Augmented Dickey-Fuller Test for USA:
ADF Statistic: 2.999255142701915
p-value: 1.0
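=> For the USA output shown above, the p-value of 1.0 means the ADF test cannot reject the null
hypothesis of a unit root, so the USA price series is non-stationary in levels rather than stationary.
The Engle-Granger method requires the series in each pair to be integrated of the same order
(typically I(1)), which is checked by repeating the test on the first differences.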
Proceed with the Engle-Granger two-step method to detect cointegration between the US and each
of the other five countries
US and France
[120]: import statsmodels.api as sm
from statsmodels.tsa.api import VAR
from statsmodels.tsa.vector_ar.vecm import VECM
print("Long-Run Coefficients:")
print(long_run_coefficients)
vecm_results = vecm_model.fit()
cointegration_rank = vecm_results.coint_rank
short_run_coefficients = vecm_results.alpha
Long-Run Coefficients:
USA France
const 3.908657 6.610449
L1.USA 0.687983 -0.158048
L1.France 0.200517 1.159797
L2.USA 0.296998 0.225846
L2.France -0.231164 -0.257889
L3.USA 0.099102 -0.079512
L3.France 0.028698 0.187625
L4.USA 0.080207 0.073887
L4.France -0.129421 -0.123000
L5.USA -0.043345 -0.018558
L5.France 0.079933 0.034507
L6.USA -0.237960 -0.119492
L6.France 0.134274 0.072083
L7.USA 0.330034 0.201609
L7.France -0.099939 -0.145689
L8.USA -0.023283 -0.097676
L8.France -0.070396 0.038186
L9.USA -0.193279 -0.074251
L9.France -0.010026 -0.055345
L10.USA -0.066086 -0.041431
L10.France 0.166274 0.147870
L11.USA 0.054160 0.133417
L11.France -0.106036 -0.189471
L12.USA -0.141833 -0.188946
L12.France 0.112326 0.196146
L13.USA 0.117282 0.217650
L13.France -0.080625 -0.117716
L14.USA -0.235244 -0.266335
L14.France 0.024814 0.080239
L15.USA 0.292260 0.208215
L15.France -0.032086 -0.045259
Cointegration Rank: 1
Short-Run Coefficients:
[[0.01260729]
[0.00860016]]
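=> The table printed under "Long-Run Coefficients" has the layout of a VAR coefficient table: each
row gives the effect of one lagged variable (L1 to L15) in the USA and France equations, and the sign
gives the direction of that effect. These are lag coefficients; the long-run relationship itself is
described by the cointegrating vector, which is not printed here. The cointegration rank of 1 is the
rank the VECM was fitted with, and VECM in statsmodels uses coint_rank=1 by default, so it supports a
stable long-term relationship only if a rank test confirms it. The values under "Short-Run
Coefficients" are the alpha (adjustment) coefficients: per month, the USA adjusts by about 0.0126 and
France by about 0.0086 for each unit of deviation from the long-run equilibrium.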
US and Japan
[122]: # Assuming monthly_data_cleaned is a DataFrame containing the monthly data for the US and Japan
# Step 1: VAR Lag Length Selection
model = VAR(monthly_data_cleaned[['USA', 'Japan']])
lag_order = model.select_order()
print("Long-Run Coefficients:")
print(long_run_coefficients)
vecm_results = vecm_model.fit()
cointegration_rank = vecm_results.coint_rank
short_run_coefficients = vecm_results.alpha
Long-Run Coefficients:
USA Japan
const 0.437536 21.078081
L1.USA 0.835417 -0.029675
L1.Japan 0.023783 1.045266
L2.USA 0.146366 0.008814
L2.Japan -0.030743 -0.083121
L3.USA 0.081900 0.046576
L3.Japan 0.023560 0.089672
L4.USA 0.027683 0.145836
L4.Japan -0.043590 -0.110146
L5.USA -0.016210 -0.094526
L5.Japan 0.038912 0.099939
L6.USA -0.144610 -0.106945
L6.Japan -0.004949 -0.114778
L7.USA 0.280993 0.095119
L7.Japan -0.013329 0.062392
L8.USA -0.136280 -0.160680
L8.Japan 0.017069 0.112720
L9.USA -0.171701 -0.096308
L9.Japan -0.010307 0.007940
L10.USA 0.062338 0.189190
L10.Japan 0.010391 -0.088132
L11.USA -0.022810 -0.278016
L11.Japan -0.000344 0.033928
L12.USA -0.009760 0.191722
L12.Japan -0.018134 -0.068157
L13.USA 0.042276 0.195088
L13.Japan 0.001919 -0.087733
L14.USA -0.246197 -0.193043
L14.Japan 0.046028 0.147078
L15.USA 0.356702 0.089102
L15.Japan -0.073751 -0.015872
L16.USA -0.020649 0.154549
L16.Japan 0.031127 -0.193474
L17.USA -0.058097 -0.150804
L17.Japan 0.002356 0.152533
Cointegration Rank: 1
Short-Run Coefficients:
[[0.00684774]
[0.0019781 ]]
=> The table printed under "Long-Run Coefficients" gives the coefficient of each lagged variable
(L1 to L17) in the USA and Japan equations. Positive coefficients indicate a positive impact, while
negative coefficients indicate a negative impact. The cointegration rank of 1 is the rank the VECM
was fitted with (coint_rank defaults to 1 in statsmodels), so a stable long-term relationship should
be confirmed with a rank test. The values under "Short-Run Coefficients" are the alpha (adjustment)
coefficients: per month, the USA adjusts by about 0.0068 and Japan by about 0.0020 for each unit of
deviation from the long-run equilibrium.
US and Canada
[123]: # Assuming monthly_data_cleaned is a DataFrame containing the monthly data for the US and Canada
print("Long-Run Coefficients:")
print(long_run_coefficients)
vecm_model = VECM(monthly_data_cleaned[['USA', 'Canada']], k_ar_diff=selected_lag_order)
vecm_results = vecm_model.fit()
cointegration_rank = vecm_results.coint_rank
short_run_coefficients = vecm_results.alpha
Long-Run Coefficients:
USA Canada
const 2.103576 4.265292
L1.USA 0.800352 -0.137374
L1.Canada 0.026192 1.061635
L2.USA 0.080777 -0.034169
L2.Canada 0.111629 0.159191
L3.USA 0.243946 0.238537
L3.Canada -0.206150 -0.265254
L4.USA 0.022779 -0.050546
L4.Canada -0.070137 0.063188
L5.USA -0.068556 0.053757
L5.Canada 0.124801 -0.096839
L6.USA -0.093869 0.039233
L6.Canada -0.045115 -0.080220
L7.USA 0.286016 0.053710
L7.Canada -0.026954 0.057376
L8.USA -0.073615 -0.088764
L8.Canada -0.064524 0.026346
L9.USA -0.166921 -0.003251
L9.Canada -0.017373 -0.067218
L10.USA -0.165870 -0.129777
L10.Canada 0.310808 0.254638
L11.USA 0.021461 -0.138703
L11.Canada -0.050913 0.087514
L12.USA 0.057059 0.172041
L12.Canada -0.127966 -0.279571
L13.USA 0.167872 0.228251
L13.Canada -0.149208 -0.182770
L14.USA -0.312103 -0.304856
L14.Canada 0.143470 0.178440
L15.USA 0.209442 0.107526
L15.Canada 0.038053 0.076652
Cointegration Rank: 1
Short-Run Coefficients:
[[0.00668387]
[0.00225761]]
• The table printed under "Long-Run Coefficients" gives the coefficient of each lagged variable
in the USA and Canada equations. Positive coefficients suggest a positive impact, while negative
coefficients indicate a negative impact. The cointegration rank of 1 is the VECM default here
(the VECM call above does not set coint_rank), so a stable long-term relationship between the
variables should be confirmed with a rank test.
• The short-run coefficients represent how the variables adjust towards the long-run equilibrium.
The USA variable, on average, adjusts by 0.0067 units in response to a one-unit deviation
from the long-run equilibrium, while the Canada variable adjusts by 0.0023 units.
Overall, these results provide insights into the long-run and short-run dynamics between the USA
and Canada variables, indicating the nature and magnitude of their relationship over time.
US and Germany
[124]: # Assuming monthly_data_cleaned is a DataFrame containing the monthly data for the US and Germany
print("Long-Run Coefficients:")
print(long_run_coefficients)
vecm_results = vecm_model.fit()
cointegration_rank = vecm_results.coint_rank
short_run_coefficients = vecm_results.alpha
Long-Run Coefficients:
USA Germany
const 2.063929 7.654177
L1.USA 0.726840 -0.087282
L1.Germany 0.136558 1.113275
L2.USA 0.230449 0.122849
L2.Germany -0.134446 -0.161848
L3.USA 0.126409 -0.071666
L3.Germany -0.014933 0.105742
L4.USA 0.030641 -0.002708
L4.Germany -0.033123 0.012719
L5.USA 0.012816 0.109731
L5.Germany -0.001405 -0.115922
L6.USA -0.275762 -0.185353
L6.Germany 0.147663 0.095693
L7.USA 0.339938 0.273059
L7.Germany -0.084155 -0.095933
L8.USA -0.036124 -0.063231
L8.Germany -0.079233 -0.026895
L9.USA -0.235343 -0.231054
L9.Germany 0.065403 0.116426
L10.USA 0.050010 0.089520
L10.Germany 0.005692 -0.060252
L11.USA 0.001921 0.064126
L11.Germany -0.010941 -0.080004
L12.USA -0.132395 -0.250843
L12.Germany 0.084883 0.219011
L13.USA 0.179588 0.354981
L13.Germany -0.137836 -0.213253
L14.USA -0.275636 -0.216802
L14.Germany 0.064414 0.060699
L15.USA 0.271408 0.103653
L15.Germany -0.016097 0.019402
Cointegration Rank: 1
Short-Run Coefficients:
[[0.01044952]
[0.00366845]]
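• The table printed under "Long-Run Coefficients" gives the coefficient of each lagged variable
in the USA and Germany equations. Positive coefficients suggest a positive impact, while negative
coefficients indicate a negative impact. The cointegration rank of 1 is the rank the VECM was
fitted with (coint_rank defaults to 1 in statsmodels), so a stable long-term relationship between
the variables should be confirmed with a rank test.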
• The short-run coefficients represent the adjustment process towards the long-run equilibrium.
On average, the USA variable adjusts by 0.0104 units in response to a one-unit deviation from
the long-run equilibrium, while the Germany variable adjusts by 0.0037 units.
Overall, these results provide insights into the long-run and short-run dynamics between the USA
and Germany variables, indicating the nature and magnitude of their relationship over time.
US and United Kingdom
[126]: # Assuming monthly_data_cleaned is a DataFrame containing the monthly data for the US and United Kingdom
# Step 1: VAR Lag Length Selection
model = VAR(monthly_data_cleaned[['USA', 'United Kingdom']])
lag_order = model.select_order()
print("Long-Run Coefficients:")
print(long_run_coefficients)
vecm_results = vecm_model.fit()
cointegration_rank = vecm_results.coint_rank
short_run_coefficients = vecm_results.alpha
Long-Run Coefficients:
USA United Kingdom
const 2.995636 4.710506
L1.USA 0.716072 -0.073310
L1.United Kingdom 0.311737 1.126344
L2.USA 0.293435 0.050389
L2.United Kingdom -0.398068 -0.110440
L3.USA 0.104088 0.008351
L3.United Kingdom 0.026379 0.049105
L4.USA 0.022567 0.025779
L4.United Kingdom -0.112983 -0.022658
L5.USA 0.001960 0.081420
L5.United Kingdom 0.041938 -0.152335
L6.USA -0.241266 -0.081354
L6.United Kingdom 0.253374 0.118916
L7.USA 0.328047 0.040925
L7.United Kingdom -0.180078 -0.062693
L8.USA -0.061390 -0.022641
L8.United Kingdom -0.070519 0.005272
L9.USA -0.270352 -0.089684
L9.United Kingdom 0.184072 0.147493
L10.USA 0.000562 0.004111
L10.United Kingdom 0.100389 -0.030379
L11.USA 0.076528 0.056845
L11.United Kingdom -0.168634 -0.125460
L12.USA -0.045012 -0.084944
L12.United Kingdom -0.021101 0.125920
L13.USA 0.096879 0.144180
L13.United Kingdom -0.059099 -0.181658
L14.USA -0.339230 -0.131805
L14.United Kingdom 0.237577 0.173212
L15.USA 0.328576 0.074093
L15.United Kingdom -0.152993 -0.067265
Cointegration Rank: 1
Short-Run Coefficients:
[[0.01006416]
[0.00181731]]
• The table printed under "Long-Run Coefficients" gives the coefficient of each lagged variable
in the USA and United Kingdom equations. Positive coefficients indicate a positive impact, while
negative coefficients indicate a negative impact. The cointegration rank of 1 is the rank the VECM
was fitted with (coint_rank defaults to 1 in statsmodels), so a stable long-term relationship
between the variables should be confirmed with a rank test.
• The short-run coefficients indicate the adjustment process towards the long-run equilibrium.
On average, the USA variable adjusts by 0.0101 units in response to a one-unit deviation
from the long-run equilibrium, while the United Kingdom variable adjusts by 0.0018 units.
These results provide insights into the long-run and short-run dynamics between the USA and
United Kingdom variables, revealing the nature and magnitude of their relationship over time.
b. Use the Johansen technique to detect any possible cointegration among the 6 countries.
[138]: # Step 1: Checking Order of Integration
import statsmodels.api as sm
# Run the ADF test on each country's price series
# (loop reconstructed from the use of `country` below)
adf_test_result = {}
for country in monthly_data_cleaned.columns:
    adf_test = sm.tsa.adfuller(monthly_data_cleaned[country])
    adf_test_result[country] = {'ADF Statistic': adf_test[0],
                                'p-value': adf_test[1]}
    print(f"Augmented Dickey-Fuller Test for {country}:")
    print(f"ADF Statistic: {adf_test[0]}")
    print(f"p-value: {adf_test[1]}\n")
Augmented Dickey-Fuller Test for Canada:
ADF Statistic: -0.049139645100995735
p-value: 0.9542746431528027
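=> For the Canada output shown above, the p-value of 0.954 means the ADF test cannot reject the null
hypothesis of a unit root, so the Canada series is non-stationary in levels. The USA output reported
earlier (p-value 1.0) is likewise non-stationary rather than stationary.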
[134]: #Step 2: VAR Lag Length Selection
from statsmodels.tsa.api import VAR
# Subset the data to include only the stationary variable (USA) and non-stationary variables
Selected Lag Order: 10
statistics values are [44.53532035, 41.65507387, 25.03347123, 8.82308117, 5.88118436, 0.78812185].
- These statistics are compared with the critical values to assess the presence of cointegration.
3. Critical values at 95% confidence:
- The critical values are the thresholds for rejecting each null hypothesis of at most r
cointegrating relations (r = 0 is the null of no cointegration).
- The critical values provided are for a 95% confidence level.
[ ]: # We compare the trace statistics and max-eigen statistics with the corresponding critical values.
# If the test statistics exceed the critical values, it suggests the presence of cointegration.
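The Johansen test is read sequentially: starting from the null of zero cointegrating relations, each
hypothesis is rejected while its test statistic exceeds the critical value, and the cointegration rank
is the number of rejections before the first failure. Not all six statistics can exceed their critical
values here: the smallest statistic listed above, 0.78812185, is far below the 95% critical value for
the last hypothesis (roughly 3.8 to 4.1, depending on the deterministic terms). Rejection at all six
steps would in any case imply that the series are stationary in levels rather than cointegrated.
=> Therefore, cointegration among the six countries in the dataset (US, UK, France, Japan, Canada,
and Germany) is possible only to the extent that the leading statistics exceed their critical values,
and the number that do so gives the cointegration rank.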