SCIPY AND STATSMODELS FOR
FINANCIAL MODELING
Reactive Publishing
CONTENTS
Title Page
Chapter 1: Introduction to Financial Modeling
Chapter 2: Statistical Foundations in Finance
Chapter 3: Time Series Modeling with SciPy
Chapter 4: Regression Analysis Using StatsModels
Chapter 5: Portfolio Optimization with SciPy
Chapter 6: Econometric Models and Applications in Finance
Chapter 7: Advanced Topics and Case Studies
CHAPTER 1:
INTRODUCTION TO
FINANCIAL MODELING
In modern finance, where vast amounts of data intersect with
strategic decision-making, financial modeling stands as a pillar of
clarity and direction. Financial modeling is the construction of a
mathematical representation (a model) of a financial asset, portfolio,
or investment strategy. These models are indispensable tools for
analysts, investors, and corporate managers, facilitating informed
decisions grounded in quantitative analysis.
1. Defining the Scope: Clearly outline the objectives and the scope of
the model. This involves understanding what the model is intended
to achieve and the specific questions it will answer.
6. Net Income: The bottom line of the income statement, net income
is the total profit or loss after all expenses, including taxes and
interest, have been deducted from the total revenue. It is a key
indicator of a company's profitability.
```python
import pandas as pd
data = {
'Item': ['Revenue', 'COGS', 'Gross Profit', 'Operating Expenses',
'Operating Income', 'Net Income'],
'Amount': [500000, 200000, 300000, 150000, 150000, 120000]
}
income_statement = pd.DataFrame(data)
print(income_statement)
```
```python
data = {
'Category': ['Current Assets', 'Non-Current Assets', 'Total Assets',
'Current Liabilities', 'Non-Current Liabilities', 'Total Liabilities',
'Shareholders’ Equity'],
'Amount': [150000, 250000, 400000, 100000, 50000, 150000,
250000]
}
balance_sheet = pd.DataFrame(data)
print(balance_sheet)
```
The cash flow statement is critical for assessing the liquidity and
solvency of a company. It helps investors understand how well a
company generates cash to pay its debt obligations and fund its
operating expenses.
```python
data = {
'Category': ['Net Income', 'Depreciation', 'Change in Working Capital',
'Cash from Operating Activities', 'Purchase of Equipment', 'Cash
from Investing Activities', 'Debt Issued', 'Debt Repaid', 'Cash from
Financing Activities'],
'Amount': [120000, 20000, -10000, 130000, -50000, -50000, 40000,
-20000, 20000]
}
cash_flow_statement = pd.DataFrame(data)
print(cash_flow_statement)
```
Liquidity Ratios
\[
\text{Current Ratio} = \frac{\text{Current Assets}}{\text{Current Liabilities}}
\]
A current ratio greater than 1 suggests that the company has more
current assets than current liabilities, implying good short-term
financial health.
2. Quick Ratio (Acid-Test Ratio): This ratio refines the current ratio by
excluding inventory from current assets, as inventory is not as
readily convertible to cash.
\[
\text{Quick Ratio} = \frac{\text{Current Assets} - \text{Inventory}}{\text{Current Liabilities}}
\]
```python
current_assets = 200000
inventory = 50000
current_liabilities = 100000
current_ratio = current_assets / current_liabilities
quick_ratio = (current_assets - inventory) / current_liabilities
print(f"Current Ratio: {current_ratio:.2f}, Quick Ratio: {quick_ratio:.2f}")
```
Profitability Ratios
\[
\text{Gross Profit Margin} = \frac{\text{Gross Profit}}{\text{Revenue}} \times 100
\]
\[
\text{Operating Margin} = \frac{\text{Operating Income}}{\text{Revenue}} \times 100
\]
\[
\text{Net Profit Margin} = \frac{\text{Net Income}}{\text{Revenue}} \times 100
\]
```python
revenue = 500000
cogs = 200000
operating_expenses = 150000
net_income = 120000
gross_margin = (revenue - cogs) / revenue * 100
operating_margin = (revenue - cogs - operating_expenses) / revenue * 100
net_margin = net_income / revenue * 100
print(f"Gross: {gross_margin:.1f}%, Operating: {operating_margin:.1f}%, Net: {net_margin:.1f}%")
```
Efficiency Ratios
\[
\text{Inventory Turnover} = \frac{\text{COGS}}{\text{Average Inventory}}
\]
\[
\text{Receivables Turnover} = \frac{\text{Net Credit Sales}}{\text{Average Accounts Receivable}}
\]
```python
average_inventory = 40000
average_receivables = 30000
net_credit_sales = 450000
inventory_turnover = cogs / average_inventory
receivables_turnover = net_credit_sales / average_receivables
print(f"Inventory Turnover: {inventory_turnover:.2f}, Receivables Turnover: {receivables_turnover:.2f}")
```
Leverage Ratios
\[
\text{Debt-to-Equity Ratio} = \frac{\text{Total Liabilities}}{\text{Shareholders' Equity}}
\]
\[
\text{Interest Coverage Ratio} = \frac{\text{Operating Income}}{\text{Interest Expense}}
\]
Example using Python:
```python
total_liabilities = 150000
shareholders_equity = 250000
interest_expense = 20000
operating_income = 150000  # operating income from the income statement above
debt_to_equity = total_liabilities / shareholders_equity
interest_coverage = operating_income / interest_expense
```
\[
\text{P/E Ratio} = \frac{\text{Market Price per Share}}{\text{Earnings per Share}}
\]
```python
market_price_per_share = 50
earnings_per_share = 5
book_value_per_share = 30
pe_ratio = market_price_per_share / earnings_per_share
pb_ratio = market_price_per_share / book_value_per_share
print(f"P/E: {pe_ratio:.2f}, P/B: {pb_ratio:.2f}")
```
```python
# Apple Inc. financial data (example figures)
aapl_data = {
'current_assets': 143000000000,
'inventory': 4000000000,
'current_liabilities': 105000000000,
'revenue': 274515000000,
'cogs': 169559000000,
'operating_expenses': 43788000000,
'net_income': 57411000000,
'average_inventory': 5000000000,
'average_receivables': 15000000000,
'net_credit_sales': 270000000000,
'total_liabilities': 287000000000,
'shareholders_equity': 65339000000,
'interest_expense': 3000000000,
'market_price_per_share': 145,
'earnings_per_share': 3.28,
'book_value_per_share': 20.11
}
# Calculations
current_ratio = aapl_data['current_assets'] / aapl_data['current_liabilities']
quick_ratio = (aapl_data['current_assets'] - aapl_data['inventory']) / aapl_data['current_liabilities']
gross_profit = aapl_data['revenue'] - aapl_data['cogs']
operating_income = gross_profit - aapl_data['operating_expenses']
gross_margin = (gross_profit / aapl_data['revenue']) * 100
operating_margin = (operating_income / aapl_data['revenue']) * 100
net_margin = (aapl_data['net_income'] / aapl_data['revenue']) * 100
inventory_turnover = aapl_data['cogs'] / aapl_data['average_inventory']
receivables_turnover = aapl_data['net_credit_sales'] / aapl_data['average_receivables']
debt_to_equity_ratio = aapl_data['total_liabilities'] / aapl_data['shareholders_equity']
interest_coverage_ratio = operating_income / aapl_data['interest_expense']
pe_ratio = aapl_data['market_price_per_share'] / aapl_data['earnings_per_share']
pb_ratio = aapl_data['market_price_per_share'] / aapl_data['book_value_per_share']
```
```python
import numpy as np
# Assumptions
cash_flows = [10000, 15000, 20000, 25000, 30000]  # Projected cash flows
discount_rate = 0.1  # Discount rate (10%)
terminal_value = 350000  # Terminal value
# Calculate the present value of projected cash flows
present_value_cash_flows = np.sum([cf / (1 + discount_rate) ** i for i, cf in enumerate(cash_flows, start=1)])
# Calculate the present value of terminal value
present_value_terminal = terminal_value / (1 + discount_rate) ** len(cash_flows)
# Total DCF value is the sum of the discounted cash flows and the discounted terminal value
dcf_value = present_value_cash_flows + present_value_terminal
print(f"DCF Value: {dcf_value:,.2f}")
```
```python
# Peer group data (example figures)
peer_group = {
'CompanyA': {'P/E': 15, 'EV/EBITDA': 10, 'P/B': 2},
'CompanyB': {'P/E': 18, 'EV/EBITDA': 12, 'P/B': 2.5},
'CompanyC': {'P/E': 16, 'EV/EBITDA': 11, 'P/B': 2.2}
}
```python
# Past transaction data (example figures)
transactions = {
'Transaction1': {'EV/Revenue': 2.5, 'EV/EBITDA': 8},
'Transaction2': {'EV/Revenue': 3, 'EV/EBITDA': 9.5},
'Transaction3': {'EV/Revenue': 2.8, 'EV/EBITDA': 8.5}
}
```python
# Assumptions
purchase_price = 100000000
equity_contribution = 30000000
debt_amount = purchase_price - equity_contribution
exit_multiple = 10
investment_horizon = 5
```python
# Base-case assumptions (example figures)
revenue_growth_rate = 0.05
ebitda_margin = 0.2
discount_rate = 0.1
# Sensitivity Analysis
# Note: calculate_dcf_value() is assumed to be a user-defined DCF valuation function.
sensitivity_results = {}
for growth_rate in np.arange(0.03, 0.08, 0.01):
    for margin in np.arange(0.15, 0.25, 0.02):
        sensitivity_results[(growth_rate, margin)] = calculate_dcf_value(growth_rate, margin, discount_rate)
print("Sensitivity Analysis Results:")
for key, value in sensitivity_results.items():
    print(f"Growth Rate: {key[0]*100:.2f}%, EBITDA Margin: {key[1]*100:.2f}% - DCF Value: ${value:,.2f}")
```
```python
import pandas as pd
import statsmodels.api as sm
# Sample data
data = {
'GDP': [2.9, 3.1, 2.7, 3.3, 2.8],
'Interest_Rate': [1.5, 1.7, 1.6, 1.8, 1.6],
'Stock_Price': [150, 155, 148, 160, 152]
}
df = pd.DataFrame(data)
# Define the dependent and independent variables
X = df[['GDP', 'Interest_Rate']]
y = df['Stock_Price']
# Fit an OLS regression and print the results
X = sm.add_constant(X)
model = sm.OLS(y, X).fit()
print(model.summary())
```
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
# Generate a synthetic daily series for illustration
date_rng = pd.date_range(start='2022-01-01', periods=365, freq='D')
data = np.random.randn(365).cumsum() + 100
# Create a DataFrame
df = pd.DataFrame(data, index=date_rng, columns=['Value'])
# Decompose the series into trend, seasonal, and residual components
decomposition = seasonal_decompose(df['Value'], model='additive', period=30)
decomposition.plot()
plt.show()
```
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Generate a synthetic daily series for illustration
date_rng = pd.date_range(start='2022-01-01', periods=365, freq='D')
data = np.random.randn(365).cumsum() + 100
# Create a DataFrame
df = pd.DataFrame(data, index=date_rng, columns=['Value'])
df.plot(title='Synthetic Time Series')
plt.show()
```
```python
import numpy as np
import matplotlib.pyplot as plt
```
Scenario Analysis
```python
# Define scenarios
baseline_growth_rate = 0.05
best_case_growth_rate = 0.07
worst_case_growth_rate = 0.03
# Project revenue five years out under each scenario (starting revenue is illustrative)
base_revenue = 1_000_000
for name, rate in [('Baseline', baseline_growth_rate), ('Best case', best_case_growth_rate), ('Worst case', worst_case_growth_rate)]:
    print(f"{name}: {base_revenue * (1 + rate) ** 5:,.0f}")
```
Before diving into Python for financial applications, you need to set
up your Python environment. Here’s a step-by-step guide:
```bash
# Create a virtual environment
conda create --name finance_env python=3.8
```bash
# Install NumPy for numerical computations
pip install numpy
```python
# Lists for ordered collections
stock_prices = [150, 155, 148, 160, 152]
```python
# Define a function to calculate the average of a list
def calculate_average(prices):
total = sum(prices)
return total / len(prices)
```python
# Load data from a CSV file
df = pd.read_csv('financial_data.csv')
```python
# Filter rows based on a condition
high_gdp = df[df['GDP'] > 3.0]
# Aggregate data
average_gdp = df['GDP'].mean()
# Merge data
df2 = pd.read_csv('additional_financial_data.csv')
merged_df = pd.merge(df, df2, on='Date')
print(merged_df.head())
```
```python
import matplotlib.pyplot as plt
```python
from scipy import stats
mean_stock_price = stats.tmean(df['Stock_Price'])
median_stock_price = stats.scoreatpercentile(df['Stock_Price'], 50)
std_stock_price = stats.tstd(df['Stock_Price'])
print(f"Mean: {mean_stock_price}, Median: {median_stock_price},
Std Dev: {std_stock_price}")
```
```python
import statsmodels.api as sm
```
1. Load the Data: First, we load historical stock data into a Pandas
DataFrame.
```python
df = pd.read_csv('historical_stock_data.csv')
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)
```
```python
# Simple Moving Average (SMA)
df['SMA_50'] = df['Close'].rolling(window=50).mean()
# Exponential Moving Average (EMA)
df['EMA_50'] = df['Close'].ewm(span=50, adjust=False).mean()
```
3. Plot the Data: Visualize the stock prices along with the moving
averages.
```python
plt.figure(figsize=(12, 6))
plt.plot(df['Close'], label='Closing Price')
plt.plot(df['SMA_50'], label='50-Day SMA')
plt.plot(df['EMA_50'], label='50-Day EMA')
plt.legend()
plt.title('Stock Price with Moving Averages')
plt.show()
```
4. Perform Regression Analysis: Use StatsModels to analyze the
relationship between stock prices and economic indicators.
```python
X = df[['GDP', 'Interest_Rate']]
y = df['Close']
X = sm.add_constant(X)
model = sm.OLS(y, X).fit()
print(model.summary())
```
1. Download and Install Python: Ensure you have the latest version
of Python installed. You can download it from the official Python
website: [python.org](https://www.python.org/).
```bash
# Create a virtual environment
conda create --name finance_env python=3.8
```bash
# Install SciPy for scientific computing
conda install scipy
```python
import numpy as np
import pandas as pd
import scipy
import statsmodels.api as sm
```bash
# Install Jupyter Notebook
conda install jupyter
```
```bash
jupyter notebook
```
This will open a new tab in your default web browser, providing an
interactive workspace to write and execute Python code.
```python
from scipy import optimize
```python
import statsmodels.api as sm
```
This example demonstrates the simplicity with which you can apply
sophisticated statistical models using StatsModels.
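The full snippet is abbreviated above; a minimal sketch of the usual pattern, using made-up illustrative figures rather than real market data, looks like this:
```python
import numpy as np
import statsmodels.api as sm

# Illustrative data: returns regressed on a single economic indicator
indicator = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
returns = np.array([0.02, 0.025, 0.03, 0.032, 0.04])

X = sm.add_constant(indicator)    # add an intercept term
model = sm.OLS(returns, X).fit()  # ordinary least squares fit
print(model.summary())            # coefficients, R-squared, p-values
```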
```python
import pandas as pd
```
3. Statistical Analysis: You can now use SciPy and StatsModels for
more advanced analysis. For example, let's perform a regression
analysis to understand the relationship between returns and other
economic indicators.
```python
from scipy import stats
import statsmodels.api as sm
```python
import matplotlib.pyplot as plt
```
Optimization
Optimization is fundamental in financial modeling, particularly in
portfolio optimization and risk management. SciPy offers a variety of
optimization methods through its `optimize` module.
```python
import numpy as np
```python
from scipy.optimize import minimize
num_assets = 4
constraints = ({'type': 'eq', 'fun': lambda x: np.sum(x) - 1})
bounds = tuple((0, 1) for asset in range(num_assets))
```
3. Optimization: Use the `minimize` function to find the optimal
weights.
```python
initial_guess = num_assets * [1. / num_assets]
cov_matrix = np.array([[0.1, 0.2, 0.1, 0.3], [0.2, 0.3, 0.4, 0.2],
                       [0.1, 0.4, 0.5, 0.3], [0.3, 0.2, 0.3, 0.4]])

# Objective: minimize portfolio variance w' * Cov * w (a common illustrative choice)
def portfolio_variance(weights):
    return weights @ cov_matrix @ weights

result = minimize(portfolio_variance, initial_guess, method='SLSQP',
                  bounds=bounds, constraints=constraints)

if result.success:
    print(f"Optimal Weights: {result.x}")
else:
    print("Optimization failed.")
```
Integration
```python
def pdf(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
```
```python
from scipy.integrate import quad
# Integrate the standard normal PDF over the whole real line
area, error = quad(pdf, -np.inf, np.inf)
print(f"Area under the curve: {area}")
```
In this case, the `quad` function integrates the PDF from negative
infinity to positive infinity, yielding the total area under the curve.
Interpolation
```python
maturities = np.array([1, 2, 5, 10])
yields = np.array([0.5, 0.75, 1.5, 2.0])
```
```python
from scipy.interpolate import interp1d
# Build an interpolated yield curve from the observed maturities and yields
yield_curve = interp1d(maturities, yields, kind='cubic')
```
```python
new_maturities = np.array([3, 4, 6, 7])
estimated_yields = yield_curve(new_maturities)
print(f"Estimated Yields: {estimated_yields}")
```
Linear Algebra
Linear algebra functions are essential for solving systems of linear
equations, performing matrix operations, and handling eigenvalue
problems. These functions are particularly relevant in risk
management and portfolio optimization.
```python
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 10]])
b = np.array([6, 15, 25])
```
2. Solve the System: Use the `solve` function to find the solution.
```python
from scipy.linalg import solve
x = solve(A, b)
print(f"Solution: {x}")
```
Special Functions
SciPy provides a collection of special functions that are often used in
financial modeling, such as the gamma function and the error
function.
```python
from scipy.special import gamma
result = gamma(5)
print(f"Gamma(5): {result}")
```
```python
import numpy as np
from scipy.fft import fft
# Example signal: a noisy sine wave (illustrative)
s = np.sin(2 * np.pi * np.arange(256) / 16) + 0.1 * np.random.randn(256)
fft_result = fft(s)
```
```python
import matplotlib.pyplot as plt
plt.plot(np.abs(fft_result))
plt.title('Frequency Components')
plt.xlabel('Frequency')
plt.ylabel('Amplitude')
plt.show()
```
Descriptive Statistics
```python
import statsmodels.api as sm
data = sm.datasets.get_rdataset("mtcars").data
mean_val = data['mpg'].mean()
print(f"Mean MPG: {mean_val}")
```
```python
std_val = data['mpg'].std()
print(f"Standard Deviation of MPG: {std_val}")
```
```python
desc_stats = data.describe()
print(desc_stats)
```
```python
from statsmodels.stats.weightstats import ttest_ind
# Compare mpg between automatic (am == 0) and manual (am == 1) cars
tstat, pvalue, dof = ttest_ind(data.loc[data['am'] == 0, 'mpg'], data.loc[data['am'] == 1, 'mpg'])
print(f"t-statistic: {tstat:.3f}, p-value: {pvalue:.4f}")
```
```python
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
# One-way ANOVA of mpg across cylinder counts
anova_model = ols('mpg ~ C(cyl)', data=data).fit()
print(anova_lm(anova_model))
```
```python
X = sm.add_constant(data['hp']) # Adding a constant term
y = data['mpg']
ols_model = sm.OLS(y, X).fit()
print(ols_model.summary())
```
```python
logit_model = sm.Logit(data['am'], sm.add_constant(data[['hp', 'wt']])).fit()
print(logit_model.summary())
```
```python
from statsmodels.tsa.arima.model import ARIMA
ts_data = data['mpg']
arima_model = ARIMA(ts_data, order=(1, 1, 1)).fit()
print(arima_model.summary())
```
```python
from statsmodels.tsa.holtwinters import ExponentialSmoothing
exp_smooth_model = ExponentialSmoothing(ts_data,
seasonal='add', seasonal_periods=12).fit()
print(exp_smooth_model.summary())
```
```python
from statsmodels.tsa.api import VAR
model = VAR(data[['mpg', 'hp', 'wt']])
var_results = model.fit(maxlags=2)
print(var_results.summary())
```
```python
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
pca_result = pca.fit_transform(data[['mpg', 'hp', 'wt']])
print(pca_result)
```
```python
import matplotlib.pyplot as plt
```python
from statsmodels.stats.stattools import durbin_watson
dw_stat = durbin_watson(ols_model.resid)
print(f"Durbin-Watson statistic: {dw_stat}")
```
Prerequisites
```bash
# Install Anaconda
wget https://repo.anaconda.com/archive/Anaconda3-2020.11-Linux-x86_64.sh
bash Anaconda3-2020.11-Linux-x86_64.sh
# Run Jupyter Notebook
jupyter notebook
```
```bash
# Install SciPy
pip install scipy
```
10.2.3 StatsModels Library
```bash
# Install StatsModels
pip install statsmodels
```
```bash
# Install NumPy
pip install numpy
```
```bash
# Install Pandas
pip install pandas
```
```bash
# Install Matplotlib and Seaborn
pip install matplotlib seaborn
```
```bash
# Create a virtual environment
python -m venv financial_modeling_env
# Activate the virtual environment
# For Windows
financial_modeling_env\Scripts\activate
# For Unix or MacOS
source financial_modeling_env/bin/activate
```
```bash
# Create a requirements.txt file
echo -e "numpy\npandas\nscipy\nstatsmodels\nmatplotlib\nseaborn" > requirements.txt
# Install libraries
pip install -r requirements.txt
```
```bash
# Install Jupyter Notebook
pip install jupyter
# Install ipykernel
pip install ipykernel
# Add the virtual environment as a Jupyter kernel
python -m ipykernel install --user --name=financial_modeling_env
# Run Jupyter Notebook
jupyter notebook
```
```python
import yfinance as yf
# Download historical prices for a ticker (example: Apple for 2022)
data = yf.download('AAPL', start='2022-01-01', end='2022-12-31')
print(data.head())
```
10.4.2 Quandl
```python
import pandas_datareader as pdr
```python
# Drop rows with missing values
clean_data = data.dropna()
# Fill missing values with a specified value (e.g., mean of the
column)
clean_data = data.fillna(data.mean())
# Remove outliers
clean_data = data[(data['column'] > lower_bound) & (data['column'] <
upper_bound)]
```
```python
from sklearn.preprocessing import StandardScaler
# Normalize data
scaler = StandardScaler()
normalized_data = scaler.fit_transform(data)
print(normalized_data)
```
In financial analysis, the ability to succinctly summarize and
interpret data is crucial. Descriptive statistics provide the
fundamental tools necessary to achieve this. By employing
measures such as mean, median, variance, and standard deviation,
financial analysts can distill large datasets into comprehensible
insights, facilitating more informed decision-making. In this section,
we will explore the application of descriptive statistics within the
context of financial data, demonstrating how these techniques can
be employed to uncover valuable patterns and trends.
# Mean
The mean, or average, is the sum of all data points divided by the
number of points. It provides a central value around which data
points tend to cluster.
```python
import pandas as pd
# Sample financial data
data = {'Price': [100, 102, 98, 101, 99, 100, 97]}
df = pd.DataFrame(data)
# Calculate mean
mean_price = df['Price'].mean()
print(f"Mean Price: {mean_price}")
```
# Median
The median is the middle value when a data set is ordered from
least to greatest. It is less affected by outliers than the mean.
```python
# Calculate median
median_price = df['Price'].median()
print(f"Median Price: {median_price}")
```
# Mode
```python
# Calculate mode
mode_price = df['Price'].mode()
print(f"Mode Price: {mode_price[0]}")
```
Measures of Dispersion
# Variance
```python
# Calculate variance
variance_price = df['Price'].var()
print(f"Variance Price: {variance_price}")
```
# Standard Deviation
```python
# Calculate standard deviation
std_dev_price = df['Price'].std()
print(f"Standard Deviation Price: {std_dev_price}")
```
# Range
```python
# Calculate range
range_price = df['Price'].max() - df['Price'].min()
print(f"Range Price: {range_price}")
```
The IQR is the range between the first quartile (25th percentile) and
the third quartile (75th percentile). It is useful for understanding the
middle spread of the data.
```python
# Calculate IQR
iqr_price = df['Price'].quantile(0.75) - df['Price'].quantile(0.25)
print(f"Interquartile Range Price: {iqr_price}")
```
Data Distribution
# Skewness
```python
# Calculate skewness
skewness_price = df['Price'].skew()
print(f"Skewness Price: {skewness_price}")
```
# Kurtosis
Kurtosis measures the "tailedness" of the data distribution. High
kurtosis indicates heavy tails, and low kurtosis indicates light tails
compared to a normal distribution.
```python
# Calculate kurtosis
kurtosis_price = df['Price'].kurtosis()
print(f"Kurtosis Price: {kurtosis_price}")
```
# Histograms
```python
import matplotlib.pyplot as plt
# Plot histogram
df['Price'].hist(bins=10, edgecolor='black')
plt.title('Price Distribution')
plt.xlabel('Price')
plt.ylabel('Frequency')
plt.show()
```
# Box Plots
```python
# Plot box plot
df['Price'].plot(kind='box')
plt.title('Price Box Plot')
plt.ylabel('Price')
plt.show()
```
# Scatter Plots
```python
# Sample data for scatter plot
data = {
    'Price': [100, 102, 98, 101, 99, 100, 97],
    'Volume': [200, 210, 190, 205, 195, 200, 185]
}
df = pd.DataFrame(data)
plt.scatter(df['Price'], df['Volume'])
plt.title('Price vs Volume')
plt.xlabel('Price')
plt.ylabel('Volume')
plt.show()
```
```python
import yfinance as yf
# Rolling Statistics
```python
# Calculate rolling mean and standard deviation
rolling_mean = stock_data['Close'].rolling(window=20).mean()
rolling_std = stock_data['Close'].rolling(window=20).std()
# Normal Distribution
```python
import numpy as np
import matplotlib.pyplot as plt
# Lognormal Distribution
- Stock Prices: Since stock prices cannot be negative, they are often
modeled using a lognormal distribution.
- Option Pricing: The Black-Scholes model assumes that stock
prices follow a lognormal distribution.
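As a brief illustration of the lognormal assumption, the sketch below simulates prices with SciPy's `lognorm`; the volatility and starting price are arbitrary illustrative values:
```python
import numpy as np
from scipy.stats import lognorm

# Simulate 1,000 terminal prices under a lognormal model
# (sigma and the starting price are illustrative, not calibrated to any market)
sigma = 0.25
initial_price = 100
prices = lognorm.rvs(s=sigma, scale=initial_price, size=1000)

print(f"Simulated mean price: {prices.mean():.2f}")
print(f"Minimum simulated price: {prices.min():.2f}")  # always positive: lognormal prices cannot be negative
```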
# Discrete Distributions
Binomial Distribution
The binomial distribution models the number of successes in a fixed
number of independent Bernoulli trials. Each trial has two possible
outcomes: success or failure.
```python
from scipy.stats import binom
# Parameters
n = 10  # number of trials
p = 0.5  # probability of success
# Probability of exactly 6 successes in 10 trials
print(f"P(X = 6) = {binom.pmf(6, n, p):.4f}")
```
Poisson Distribution
The Poisson distribution models the number of events occurring
within a fixed interval of time or space, given a constant mean rate of
occurrence.
```python
from scipy.stats import poisson
# Parameter
mu = 3  # mean number of events
# Probability of observing exactly 5 events
print(f"P(X = 5) = {poisson.pmf(5, mu):.4f}")
```
# Continuous Distributions
Exponential Distribution
```python
from scipy.stats import expon
# Draw 1,000 simulated waiting times with mean 2 (the scale parameter)
samples = expon.rvs(scale=2, size=1000)
print(f"Sample mean: {samples.mean():.2f}")
```
Beyond the basic distributions, there are more complex ones that are
particularly useful in financial modeling.
# Student's t-Distribution
```python
from scipy.stats import t
# Generate a t-distribution
data = t.rvs(df=10, size=1000)
# Beta Distribution
The beta distribution is defined on the interval [0, 1] and is useful for
modeling variables that represent proportions or probabilities.
```python
from scipy.stats import beta
# Parameters
a, b = 2, 5
# Mean of the Beta(2, 5) distribution
print(f"Beta mean: {beta.mean(a, b):.3f}")
```
```python
import yfinance as yf
# Download historical stock data
ticker = 'AAPL'
stock_data = yf.download(ticker, start='2022-01-01', end='2022-12-31')
```
Introduction
- H0: The mean return of the new strategy is equal to or less than the
mean return of the existing strategy.
- H1: The mean return of the new strategy is greater than the mean
return of the existing strategy.
The significance level (α) is the threshold for rejecting the null
hypothesis, commonly set at 0.05. The p-value measures the
probability of obtaining test results at least as extreme as the
observed results, assuming the null hypothesis is true.
# T-Test
The t-test compares the means of two groups and is useful for small
sample sizes.
```python
from scipy.stats import ttest_1samp, ttest_ind
# Sample data
returns_strategy_A = [0.05, 0.06, 0.07, 0.08, 0.06]
returns_strategy_B = [0.07, 0.08, 0.09, 0.10, 0.09]
# One-sample t-test
t_statistic, p_value = ttest_1samp(returns_strategy_A, 0.06)
print(f"One-sample t-test p-value: {p_value}")
# Two-sample t-test
t_statistic, p_value = ttest_ind(returns_strategy_A,
returns_strategy_B)
print(f"Two-sample t-test p-value: {p_value}")
```
# Chi-Square Test
```python
from scipy.stats import chi2_contingency
# Contingency table
contingency_table = [[30, 10], [20, 40]]
# Chi-square test
chi2_statistic, p_value, dof, expected =
chi2_contingency(contingency_table)
print(f"Chi-square test p-value: {p_value}")
```
```python
from scipy.stats import f_oneway
# One-way ANOVA
f_statistic, p_value = f_oneway(returns_A, returns_B, returns_C)
print(f"ANOVA p-value: {p_value}")
```
# Regression Analysis
```python
import statsmodels.api as sm
# Sample data
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]
# Fit a simple OLS regression of Y on X
X = sm.add_constant(X)
model = sm.OLS(Y, X).fit()
print(model.summary())
```
- H0: The mean return before the algorithm is equal to the mean
return after the algorithm.
- H1: The mean return after the algorithm is greater than the mean
return before the algorithm.
```python
# Sample data
returns_before = [0.01, 0.02, 0.015, 0.017, 0.014]
returns_after = [0.018, 0.021, 0.019, 0.022, 0.020]
# Paired t-test (the before/after observations are paired)
from scipy.stats import ttest_rel
t_statistic, p_value = ttest_rel(returns_before, returns_after)
print(f"Paired t-test p-value: {p_value}")
```
```python
import pandas as pd
from statsmodels.tsa.stattools import adfuller
Fundamentals of Covariance
# Mathematical Definition
\[
\text{Cov}(X, Y) = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})
\]
Where:
- \( n \) is the number of data points.
- \( X_i \) and \( Y_i \) are the individual data points.
- \( \bar{X} \) and \( \bar{Y} \) are the means of X and Y, respectively.
# Python Implementation
```python
import numpy as np
# Sample data
returns_A = [0.05, 0.06, 0.07, 0.08, 0.06]
returns_B = [0.07, 0.08, 0.09, 0.10, 0.09]
# Calculating covariance
cov_matrix = np.cov(returns_A, returns_B)
covariance = cov_matrix[0, 1]
print(f"Covariance: {covariance}")
```
Fundamentals of Correlation
# Mathematical Definition
\[
\rho_{X,Y} = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}
\]
Where:
- \( {Cov}(X, Y) \) is the covariance of X and Y.
- \( \sigma_X \) and \( \sigma_Y \) are the standard deviations of X
and Y, respectively.
# Python Implementation
```python
# Calculating correlation
correlation_matrix = np.corrcoef(returns_A, returns_B)
correlation = correlation_matrix[0, 1]
print(f"Correlation: {correlation}")
```
# Portfolio Diversification
```python
# Sample data
assets = ['Asset_A', 'Asset_B']
returns = np.array([[0.05, 0.07],
                    [0.06, 0.08],
                    [0.07, 0.09],
                    [0.08, 0.10],
                    [0.06, 0.09]])
# Covariance matrix
cov_matrix = np.cov(returns.T)
```
# Risk Management
```python
# Calculate correlation between a stock's returns and the market's returns
# (assumes stock_data and market_data DataFrames with a 'Returns' column)
correlation = stock_data['Returns'].corr(market_data['Returns'])
print(f"Correlation with Market: {correlation}")
```
# Economic Analysis
```python
import statsmodels.api as sm
# Sample data
returns_data = np.array([returns_A, returns_B])
# Example Dataset
```python
import pandas as pd
import matplotlib.pyplot as plt
```python
import statsmodels.api as sm
Moving averages and smoothing techniques help filter out noise from
time series data, making it easier to identify trends and patterns.
```python
# Calculating an exponential moving average (window is the span in days, e.g. 20)
window = 20
ema = stock_data.ewm(span=window, adjust=False).mean()
```
# AR(p) Model
\[
X_t = c + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + \epsilon_t
\]
Where:
- \( \phi_i \) are the parameters.
- \( \epsilon_t \) is the error term.
```python
from statsmodels.tsa.ar_model import AutoReg
# Fitting an AR model
model = AutoReg(stock_data, lags=1)
model_fit = model.fit()
# Making predictions
predictions = model_fit.predict(start=len(stock_data),
end=len(stock_data))
print(f"Predicted value: {predictions[0]}")
```
# MA(q) Model
\[
X_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \cdots + \theta_q \epsilon_{t-q}
\]
Where:
- \( \theta_i \) are the parameters.
- \( \epsilon_t \) is the error term.
```python
from statsmodels.tsa.arima.model import ARIMA
# Fitting an MA model
model = ARIMA(stock_data, order=(0, 0, 1))
model_fit = model.fit()
# Making predictions
predictions = model_fit.forecast(steps=1)
# ARIMA(p,d,q) Model
```python
# Fitting an ARIMA model
model = ARIMA(stock_data, order=(1, 1, 1))
model_fit = model.fit()
# Making predictions
predictions = model_fit.forecast(steps=1)
# Seasonal Decomposition
```python
# Seasonal adjustment
seasonally_adjusted = stock_data - decomposition.seasonal
Understanding Stationarity
# Types of Stationarity
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
The differenced series, which subtracts the previous value from the
current value, often exhibits stationarity, making it suitable for further
analysis.
Importance of Stationarity in Time Series Analysis
\[
Y_t = Y_{t-1} + \epsilon_t
\]
Where:
- \( \epsilon_t \) is a white noise error term.
A series with a unit root has a stochastic trend and its variance
grows over time, indicating non-stationarity.
The ADF test checks for a unit root by regressing the first difference
of the series on its lagged value and additional lagged differences.
\[
\Delta Y_t = \alpha + \beta t + \gamma Y_{t-1} + \sum_{i=1}^{k} \delta_i \Delta Y_{t-i} + \epsilon_t
\]
Where:
- \(\Delta Y_t\) is the differenced series.
- \(\alpha\) and \(\beta t\) are optional deterministic terms (constant and trend).
- \(\gamma\) is the coefficient of the lagged series.
The null hypothesis (\(H_0\)) is that the series has a unit root (\
(\gamma = 0\)). If \(H_0\) is rejected, the series is stationary.
Using the `statsmodels` library, we can apply the ADF test to our
time series data.
```python
from statsmodels.tsa.stattools import adfuller
# Run the ADF test (sales_data is assumed to hold the series under study)
adf_stat, p_value, *rest = adfuller(sales_data)
print(f"ADF statistic: {adf_stat:.3f}, p-value: {p_value:.4f}")
```
Using the `arch` library, we can apply the PP test to our time series
data.
```python
from arch.unitroot import PhillipsPerron
# Run the Phillips-Perron test on the same series
pp_test = PhillipsPerron(sales_data)
print(pp_test.summary())
```
The KPSS test reverses the null hypothesis, testing for stationarity.
The null hypothesis (\(H_0\)) is that the series is stationary, while the
alternative hypothesis (\(H_1\)) is that the series has a unit root.
Using the `statsmodels` library, we can apply the KPSS test to our
time series data.
```python
from statsmodels.tsa.stattools import kpss
# Run the KPSS test (null hypothesis: the series is stationary)
kpss_stat, p_value, lags, critical_values = kpss(sales_data, regression='c')
print(f"KPSS statistic: {kpss_stat:.3f}, p-value: {p_value:.4f}")
```
```python
# Differencing the non-stationary series
diff_sales_data = sales_data.diff().dropna()
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
# Generate synthetic dates and prices for illustration
dates = pd.date_range(start='2022-01-01', periods=250, freq='B')
prices = 100 + np.random.randn(250).cumsum()
# Create a DataFrame
stock_data = pd.DataFrame({'Date': dates, 'Price': prices})
stock_data.set_index('Date', inplace=True)
```
Using the same stock price data, we can compute and visualize the
partial autocorrelation function (PACF) as follows:
```python
# Calculate and plot the partial autocorrelation
partial_autocorrelation = sm.tsa.pacf(stock_data['Price'], nlags=20)
plt.bar(range(len(partial_autocorrelation)), partial_autocorrelation)
plt.title('Partial Autocorrelation Function')
plt.xlabel('Lag')
plt.ylabel('Partial Autocorrelation')
plt.show()
```
This code snippet calculates the PACF and plots it, showing the
direct correlations between the price data and its lags.
```python
from statsmodels.tsa.arima.model import ARIMA
Volatility Modeling
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
ax2 = ax1.twinx()
ax2.plot(stock_data.index,
stock_data['Return'].rolling(window=21).std() * np.sqrt(252),
color='red', label='Rolling Volatility')
ax2.set_ylabel('Volatility')
fig.legend(loc='upper left')
plt.show()
```
In this example, we generate a synthetic series of stock prices,
calculate the daily returns, and compute the historical volatility. The
plot visualizes both the stock price and rolling volatility, providing
insights into how volatility evolves over time.
Using the same stock price data, we can fit a GARCH(1,1) model
and forecast future volatility.
```python
from arch import arch_model
```
This code snippet fits a GARCH(1,1) model to the stock returns and
prints the summary of the model. It also forecasts the volatility for the
next five days, providing valuable insights into future market risks.
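The fitting code itself is abbreviated above. A minimal sketch of that workflow, assuming `stock_data['Return']` holds the daily returns computed earlier, might look like this:
```python
from arch import arch_model

# Scale returns to percent, which helps the GARCH optimizer converge
returns_pct = stock_data['Return'].dropna() * 100

# Fit a GARCH(1,1) model and print the estimation results
garch = arch_model(returns_pct, vol='Garch', p=1, q=1)
garch_fit = garch.fit(disp='off')
print(garch_fit.summary())

# Forecast the conditional variance for the next five days
forecast = garch_fit.forecast(horizon=5)
print(forecast.variance.iloc[-1])
```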
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
# Example: Generate synthetic stock returns data
np.random.seed(42)
stock_returns = np.random.normal(size=(100, 5))  # 100 days of returns for 5 stocks
# Create a DataFrame
stock_data = pd.DataFrame(stock_returns, columns=['Stock A', 'Stock B', 'Stock C', 'Stock D', 'Stock E'])
# Standardize the returns before applying PCA
scaled_data = StandardScaler().fit_transform(stock_data)
# Perform PCA
pca = PCA(n_components=2)
principal_components = pca.fit_transform(scaled_data)
print(pca.explained_variance_ratio_)
```
# Factor Analysis
Using the same stock returns data, we can perform factor analysis to
identify common factors.
```python
from sklearn.decomposition import FactorAnalysis
# Extract two common factors from the stock returns
fa = FactorAnalysis(n_components=2)
factors = fa.fit_transform(stock_data)
print(factors[:5])
```
# Multivariate Regression
```python
# Create DataFrames (macro_indicators and stock_returns are assumed to be NumPy arrays)
macro_data = pd.DataFrame(macro_indicators, columns=['Indicator 1', 'Indicator 2', 'Indicator 3'])
stock_data = pd.DataFrame(stock_returns, columns=['Stock Return 1', 'Stock Return 2'])
```
For the purposes of this section, we will use historical stock price
data from Yahoo Finance.
```python
import yfinance as yf
import pandas as pd
# Download AAPL prices for 2022 and preview the data
stock_data = yf.download('AAPL', start='2022-01-01', end='2022-12-31')
print(stock_data.head())
```
This example downloads the historical stock prices for Apple Inc.
(AAPL) for the year 2022 and displays the first few rows of the data.
```python
import matplotlib.pyplot as plt
```
This code generates a line plot of the closing prices of AAPL for the
specified period. Visualizing the data helps in identifying trends,
seasonality, and potential outliers.
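The plotting code referred to here is abbreviated; a sketch of the described line plot, assuming `stock_data` holds the downloaded AAPL prices, is:
```python
import matplotlib.pyplot as plt

# Line plot of AAPL closing prices for 2022
plt.figure(figsize=(10, 6))
plt.plot(stock_data['Close'], label='Close Price')
plt.title('AAPL Stock Price (2022)')
plt.xlabel('Date')
plt.ylabel('Close Price (USD)')
plt.legend()
plt.show()
```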
Data Cleaning
```python
# Check for missing values
missing_values = stock_data.isnull().sum()
print("Missing values:\n", missing_values)
# Fill any gaps with the previous day's closing price (forward fill)
stock_data.fillna(method='ffill', inplace=True)
```
In this example, any missing values in the stock data are filled with
the previous day's closing price using the forward fill method.
Practical Applications
```python
# Calculate daily returns
stock_data['Daily Return'] = stock_data['Close'].pct_change()
```python
# Calculate the 20-day moving average
stock_data['20 Day MA'] = stock_data['Close'].rolling(window=20).mean()
# Plot the closing price and 20-day moving average
plt.figure(figsize=(10, 6))
plt.plot(stock_data['Close'], label='Close Price')
plt.plot(stock_data['20 Day MA'], label='20 Day MA', linestyle='--')
plt.title('AAPL Stock Price and 20-Day Moving Average (2022)')
plt.xlabel('Date')
plt.ylabel('Price (USD)')
plt.legend()
plt.show()
```
This code calculates and plots the 20-day moving average alongside
the closing prices, providing a smoothed view of the stock's price
trend.
# Regression Analysis
```python
import statsmodels.api as sm
```
Time series data, a cornerstone of financial analysis, represents
observations of a variable or several variables collected
sequentially over intervals of time. Unlike other forms of data,
time series data captures the temporal dependencies that are crucial
for understanding the dynamics of financial markets. This section
delves into the essence of time series data, emphasizing its
significance in financial modeling, and provides a comprehensive
guide to handling and analyzing this type of data using Python.
We'll use the `yfinance` library to download historical stock price data
for Microsoft Corporation (MSFT).
```python
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
# Download MSFT prices for 2021 and preview the data
stock_data = yf.download('MSFT', start='2021-01-01', end='2021-12-31')
print(stock_data.head())
```
This code snippet downloads the historical stock prices for Microsoft
for the year 2021 and displays the first few rows.
```python
# Plot the closing price over time
plt.figure(figsize=(10, 6))
plt.plot(stock_data['Close'], label='Close Price')
plt.title('MSFT Stock Price (2021)')
plt.xlabel('Date')
plt.ylabel('Close Price (USD)')
plt.legend()
plt.show()
```
The above code generates a line plot of the closing prices, providing
a visual representation of the stock's performance over time.
1. Trend: The long-term direction of the time series data, which can
be upward, downward, or flat.
2. Seasonality: Regular, repeating patterns within a specific period,
such as monthly or annually.
3. Cyclic Patterns: Irregular fluctuations that are not as predictable
as seasonal patterns.
4. Random Noise: Unpredictable variations that do not follow a
discernible pattern.
```python
from statsmodels.tsa.seasonal import seasonal_decompose
# Decompose the closing prices into trend, seasonal, and residual components
# (a 30-day period is assumed here for illustration)
decomposition = seasonal_decompose(stock_data['Close'], model='additive', period=30)
decomposition.plot()
plt.show()
```
Time series data can often contain missing values or outliers, which
need to be addressed for accurate analysis.
```python
# Check for missing values
missing_values = stock_data.isnull().sum()
print("Missing values:\n", missing_values)
# Fill any gaps using the forward fill method
stock_data.fillna(method='ffill', inplace=True)
```
In this example, any missing values in the stock data are filled using
the forward fill method.
```python
import numpy as np
# Identify outliers using z-scores of the closing price
z_scores = np.abs((stock_data['Close'] - stock_data['Close'].mean()) / stock_data['Close'].std())
# Remove outliers
cleaned_data = stock_data[z_scores <= 3]
```
```python
from statsmodels.tsa.stattools import adfuller
# Test the closing prices for stationarity
adf_result = adfuller(stock_data['Close'].dropna())
print(f"ADF statistic: {adf_result[0]:.3f}, p-value: {adf_result[1]:.4f}")
```
Visualization Insights:
```python
# Differencing the data to remove trend and achieve stationarity
diff_data = stock_data['Close'].diff().dropna()
```python
from statsmodels.tsa.seasonal import STL
Practical Applications
---
Moving Averages and Smoothing Techniques
In this example, we calculate the 20-day SMA for Apple Inc. (AAPL)
stock prices. The `rolling` method of Pandas DataFrame is used to
compute the SMA, and the results are visualized to observe the
smoothing effect.
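The snippet described here is not reproduced in full; a minimal sketch of the calculation, assuming `stock_data` contains the AAPL prices with a 'Close' column, is:
```python
import matplotlib.pyplot as plt

# 20-day simple moving average of the closing price
stock_data['SMA_20'] = stock_data['Close'].rolling(window=20).mean()

# Visualize the smoothing effect
plt.figure(figsize=(10, 6))
plt.plot(stock_data['Close'], label='Close Price')
plt.plot(stock_data['SMA_20'], label='20-Day SMA', linestyle='--')
plt.legend()
plt.show()
```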
The EMA assigns more weight to recent data points, making it more
responsive to new information. This is particularly useful in volatile
markets where recent trends are more indicative of future
movements.
```python
# Calculate 20-day EMA
stock_data['EMA_20'] = stock_data['Close'].ewm(span=20, adjust=False).mean()
```
```python
import numpy as np
# 20-day weighted moving average with linearly increasing weights
weights = np.arange(1, 21)
stock_data['WMA_20'] = stock_data['Close'].rolling(window=20).apply(
    lambda prices: np.dot(prices, weights) / weights.sum(), raw=True)
```
This example showcases how to compute the 20-day WMA for AAPL
stock prices, using a custom function that applies varying weights to
the data within the window.
Smoothing Techniques
```python
from statsmodels.nonparametric.smoothers_lowess import lowess
Exponential Smoothing
SES is ideal for forecasting time series data that lacks significant
trends or seasonal variations. The forecast is computed as:
\[
\hat{y}_{t+1} = \alpha y_t + (1 - \alpha) \hat{y}_t
\]
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.holtwinters import SimpleExpSmoothing
# Fit simple exponential smoothing (the smoothing level is estimated from the data)
ses_model = SimpleExpSmoothing(stock_data['Close']).fit()
stock_data['SES'] = ses_model.fittedvalues
```
DES, also known as Holt’s linear trend model, is effective for data
with a trend but no seasonality. It incorporates both level and trend
components:
```python
from statsmodels.tsa.holtwinters import ExponentialSmoothing
# Holt's linear trend model: additive trend, no seasonality
des_model = ExponentialSmoothing(stock_data['Close'], trend='add').fit()
stock_data['DES'] = des_model.fittedvalues
```
where \( \alpha \), \( \beta \), and \( \gamma \) are the smoothing
factors for level, trend, and seasonality, respectively, and \( s \)
denotes the season length.
```python
# Generate synthetic seasonal data
np.random.seed(42)
seasonal_data = np.sin(np.linspace(0, 4 * np.pi, 100)) + np.random.randn(100).cumsum() + 10
```
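A minimal sketch of fitting triple (Holt-Winters) exponential smoothing to this synthetic series follows; the additive components and the 50-point season length (matching the sine period used above) are illustrative choices:
```python
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Triple exponential smoothing: level, trend, and seasonal components
hw_model = ExponentialSmoothing(seasonal_data, trend='add',
                                seasonal='add', seasonal_periods=50).fit()
print(hw_model.summary())

# Forecast the next 20 observations
print(hw_model.forecast(20))
```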
\[
X_t = c + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + \epsilon_t
\]
where:
- \( X_t \) is the value of the series at time \( t \).
- \( c \) is a constant term.
- \( \phi_i \) are the coefficients of the model.
- \( X_{t-i} \) are the past values of the series.
- \( \epsilon_t \) is the white noise error term.
The order \( p \) signifies how many past values are considered for
predicting the current value. For instance, an AR(1) model considers
only the immediate past value, while an AR(2) model considers the
past two values.
Python, with its robust libraries like SciPy and StatsModels, provides
an efficient framework for implementing AR models. The following
steps guide you through the process:
1. Data Preparation
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.ar_model import AutoReg
# Load financial data
df = pd.read_csv('financial_data.csv', index_col='Date',
parse_dates=True)
# Display the first few rows
print(df.head())
```
```python
# Plot the financial time series
plt.figure(figsize=(12, 6))
plt.plot(df['Close'], label='Stock Prices')
plt.title('Financial Time Series')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()
```
```python
# Split the data into training and testing sets
train_data, test_data = df['Close'][:-30], df['Close'][-30:]
# Fit the AR model on the training data
ar_model = AutoReg(train_data, lags=1).fit()
```
4. Making Predictions
After fitting the model, the next step is to make predictions and
assess the model's performance.
```python
# Make predictions
predictions = ar_model.predict(start=len(train_data), end=len(train_data) + len(test_data) - 1, dynamic=False)
```
Evaluation metrics like Mean Squared Error (MSE) are essential for
quantifying the model's accuracy.
```python
from sklearn.metrics import mean_squared_error
# Calculate MSE
mse = mean_squared_error(test_data, predictions)
print(f'Mean Squared Error: {mse}')
```
Let's apply the AR model to predict future stock prices. For this
example, we will use historical data from a well-known stock, say
Apple Inc. (AAPL).
1. Data Collection
Ensure you have the historical stock prices data for Apple Inc. You
can download it from sources like Yahoo Finance or use an API to
fetch the data.
```python
import yfinance as yf
# Download Apple stock prices (the date range is illustrative)
apple_stock = yf.download('AAPL', start='2022-01-01', end='2023-01-01')
```
2. AR Model Implementation
Repeat the steps to fit an AR model to the Apple stock data and
make predictions.
```python
# Split the data
train_data, test_data = apple_stock['Close'][:-30], apple_stock['Close'][-30:]
# Fit the AR model
ar_model = AutoReg(train_data, lags=1).fit()
# Make predictions
predictions = ar_model.predict(start=len(train_data), end=len(train_data) + len(test_data) - 1, dynamic=False)
# Calculate MSE
mse = mean_squared_error(test_data, predictions)
print(f'Mean Squared Error: {mse}')
```
```python
from statsmodels.tsa.stattools import adfuller
With the optimal lag determined, fit the AR model and make
predictions.
```python
# Fit the AR model with optimal lag
optimal_ar_model = AutoReg(train_data, lags=optimal_lag).fit()
# Make predictions
optimal_predictions = optimal_ar_model.predict(start=len(train_data), end=len(train_data) + len(test_data) - 1, dynamic=False)
# Calculate MSE
optimal_mse = mean_squared_error(test_data, optimal_predictions)
print(f'Optimal AR Model Mean Squared Error: {optimal_mse}')
```
\[
X_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \cdots + \theta_q \epsilon_{t-q}
\]
where:
- \( X_t \) is the value of the series at time \( t \).
- \( \mu \) is the mean of the series.
- \( \epsilon_t \) is the white noise error term at time \( t \).
- \( \theta_i \) are the coefficients of the model, representing the
impact of past errors.
1. Data Preparation
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
```python
# Plot the financial time series
plt.figure(figsize=(12, 6))
plt.plot(df['Close'], label='Stock Prices')
plt.title('Financial Time Series')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()
```
4. Making Predictions
After fitting the model, make predictions and evaluate the model’s
performance.
```python
# Make predictions
predictions = ma_model.predict(start=len(train_data), end=len(train_data) + len(test_data) - 1, dynamic=False)
```
```python
from sklearn.metrics import mean_squared_error
# Calculate MSE
mse = mean_squared_error(test_data, predictions)
print(f'Mean Squared Error: {mse}')
```
1. Data Collection
```python
import yfinance as yf
# Download Microsoft stock prices (the date range is illustrative)
microsoft_stock = yf.download('MSFT', start='2022-01-01', end='2023-01-01')
```
2. MA Model Implementation
Follow the steps to fit an MA model to the Microsoft stock data and
make predictions.
```python
# Split the data
train_data, test_data = microsoft_stock['Close'][:-30], microsoft_stock['Close'][-30:]
# Fit the MA(1) model
ma_model = ARIMA(train_data, order=(0, 0, 1)).fit()
# Make predictions
predictions = ma_model.predict(start=len(train_data), end=len(train_data) + len(test_data) - 1, dynamic=False)
# Calculate MSE
mse = mean_squared_error(test_data, predictions)
print(f'Mean Squared Error: {mse}')
```
While MA(1) models are a good starting point, financial data often
benefits from higher-order MA models. Let’s explore MA(q) models
with different values of \( q \) for better predictive performance.
```python
# Function to find the optimal MA order by minimizing the AIC
def find_optimal_order(series):
    aic_values = []
    for order in range(1, 11):
        model = ARIMA(series, order=(0, 0, order)).fit()
        aic_values.append(model.aic)
    optimal_order = aic_values.index(min(aic_values)) + 1
    return optimal_order
# Find the optimal order for Microsoft stock data
optimal_order = find_optimal_order(microsoft_stock['Close'])
print(f'Optimal Order: {optimal_order}')
```
With the optimal order determined, fit the MA model and make
predictions.
```python
# Fit the MA model with optimal order
optimal_ma_model = ARIMA(train_data, order=(0, 0, optimal_order)).fit()
# Make predictions
optimal_predictions = optimal_ma_model.predict(start=len(train_data), end=len(train_data) + len(test_data) - 1, dynamic=False)
# Calculate MSE
optimal_mse = mean_squared_error(test_data, optimal_predictions)
print(f'Optimal MA Model Mean Squared Error: {optimal_mse}')
```
\[
X_t = \mu + \sum_{i=1}^{p} \phi_i X_{t-i} + \sum_{j=1}^{q} \theta_j \epsilon_{t-j} + \epsilon_t
\]
where:
- \( X_t \) is the differenced value of the series at time \( t \).
- \( \mu \) is the mean of the series.
- \( \phi_i \) represents the coefficients for the autoregressive terms.
- \( \theta_j \) represents the coefficients for the moving average
terms.
- \( \epsilon_t \) is the white noise error term at time \( t \).
1. Data Preparation
Begin by importing the necessary libraries, loading the financial data,
and performing initial checks.
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
```python
# Plot the financial time series
plt.figure(figsize=(12, 6))
plt.plot(df['Close'], label='Stock Prices')
plt.title('Financial Time Series')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()
```
```python
# Perform first differencing
df_diff = df['Close'].diff().dropna()
```python
# Fit the ARIMA model
arima_model = ARIMA(df['Close'], order=(1, 1, 1)).fit()
# Print model summary
print(arima_model.summary())
```
5. Making Predictions
After fitting the model, make predictions and evaluate the model’s
performance.
```python
# Make predictions
predictions = arima_model.predict(start=len(df) - 30, end=len(df) - 1, dynamic=False)
# Calculate MSE
mse = mean_squared_error(df['Close'][-30:], predictions)
print(f'Mean Squared Error: {mse}')
```
1. Data Collection
Collect historical stock prices for Apple Inc. using sources like Yahoo
Finance or an API.
```python
import yfinance as yf
# Download Apple stock prices (the date range is illustrative)
apple_stock = yf.download('AAPL', start='2022-01-01', end='2023-01-01')
```
```python
# Fit the ARIMA model
arima_model = ARIMA(apple_stock['Close'], order=(1, 1, 1)).fit()
# Make predictions
predictions = arima_model.predict(start=len(apple_stock) - 30, end=len(apple_stock) - 1, dynamic=False)
# Calculate MSE
mse = mean_squared_error(apple_stock['Close'][-30:], predictions)
print(f'Mean Squared Error: {mse}')
```
Criteria like the Akaike Information Criterion (AIC) and the Bayesian
Information Criterion (BIC) help in selecting the optimal model
parameters.
```python
# Function to find the optimal (p, d, q) order by minimizing the AIC
def find_optimal_order(series):
    aic_values = []
    for p in range(1, 5):
        for d in range(1, 3):
            for q in range(1, 5):
                try:
                    model = ARIMA(series, order=(p, d, q)).fit()
                    aic_values.append((p, d, q, model.aic))
                except Exception:
                    continue
    optimal_order = sorted(aic_values, key=lambda x: x[3])[0][:3]
    return optimal_order

optimal_order = find_optimal_order(apple_stock['Close'])
print(f'Optimal Order: {optimal_order}')
```
With the optimal parameters determined, fit the ARIMA model and
make predictions.
```python
# Fit the ARIMA model with optimal order
optimal_arima_model = ARIMA(apple_stock['Close'], order=optimal_order).fit()
# Make predictions
optimal_predictions = optimal_arima_model.predict(start=len(apple_stock) - 30, end=len(apple_stock) - 1, dynamic=False)
# Calculate MSE
optimal_mse = mean_squared_error(apple_stock['Close'][-30:], optimal_predictions)
print(f'Optimal ARIMA Model Mean Squared Error: {optimal_mse}')
```
Understanding Seasonality
# 1. Visual Inspection
Plotting the time series data and examining it for recurring patterns is
a straightforward way to detect seasonality.
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# 2. Seasonal Decomposition
```python
from statsmodels.tsa.seasonal import seasonal_decompose
# 1. Seasonal Differencing
```python
# Seasonal differencing
seasonal_diff = df['Close'] - df['Close'].shift(12)
```python
from statsmodels.tsa.statespace.sarimax import SARIMAX
# Fit a SARIMA model with yearly seasonality (the order is an illustrative choice)
sarima_model = SARIMAX(df['Close'], order=(1, 1, 1), seasonal_order=(1, 1, 1, 12)).fit()
# Make predictions
predictions = sarima_model.predict(start=len(df) - 30, end=len(df) - 1, dynamic=False)
```
```python
from statsmodels.tsa.holtwinters import ExponentialSmoothing
# Fit a Holt-Winters model with additive trend and seasonality (a 12-period season is assumed)
ets_model = ExponentialSmoothing(df['Close'], trend='add', seasonal='add', seasonal_periods=12).fit()
# Make predictions
predictions = ets_model.predict(start=len(df) - 30, end=len(df) - 1)
```
# 1. Data Collection
```python
# Fetch historical retail sales data
retail_sales = yf.download('XRT', start='2020-01-01', end='2023-01-01')
# Display the first few rows
print(retail_sales.head())
```
```python
# Decompose the time series
decomposition = seasonal_decompose(retail_sales['Close'],
model='multiplicative')
trend = decomposition.trend
seasonal = decomposition.seasonal
residual = decomposition.resid
# Seasonal adjustment
seasonally_adjusted = retail_sales['Close'] / seasonal
```python
# Fit the ARIMA model on seasonally adjusted data
arima_model = ARIMA(seasonally_adjusted, order=(1, 1, 1)).fit()
# Make predictions
predictions = arima_model.predict(start=len(seasonally_adjusted) - 30, end=len(seasonally_adjusted) - 1, dynamic=False)
```
Before diving into advanced techniques, let’s briefly revisit the basics
of ARIMA. The ARIMA model comprises three key components:
- AutoRegressive (AR) part: Involves regressing the variable on its
own lagged (past) values.
- Integrated (I) part: Involves differencing the data to make it
stationary.
- Moving Average (MA) part: Involves modeling the error term as a
linear combination of error terms occurring at various times in the
past.
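To make the correspondence concrete, the `order=(p, d, q)` argument in StatsModels maps directly onto these three parts. A minimal sketch with synthetic data and an arbitrary order:
```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic series for illustration
series = pd.Series(np.random.randn(200).cumsum())

# order=(1, 1, 1): one AR lag, one round of differencing, one MA lag
model = ARIMA(series, order=(1, 1, 1)).fit()
print(model.summary())
```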
# Parameter Optimization
# 1. Grid Search
```python
import itertools
import warnings
from statsmodels.tsa.statespace.sarimax import SARIMAX
from sklearn.metrics import mean_squared_error
warnings.filterwarnings("ignore")
```
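The search loop itself is abbreviated above. A compact sketch of such a grid search, scoring each candidate by AIC and assuming `df['Close']` holds the series, is:
```python
import itertools
import numpy as np

# Candidate (p, d, q) and seasonal (P, D, Q, s) orders to evaluate
p = d = q = range(0, 2)
pdq = list(itertools.product(p, d, q))
seasonal_pdq = [(P, D, Q, 12) for P, D, Q in itertools.product(p, d, q)]

best_aic, best_params = np.inf, None
for order in pdq:
    for seasonal_order in seasonal_pdq:
        try:
            results = SARIMAX(df['Close'], order=order, seasonal_order=seasonal_order).fit(disp=False)
            if results.aic < best_aic:
                best_aic, best_params = results.aic, order + seasonal_order
        except Exception:
            continue

print(f"Best AIC: {best_aic:.2f} with parameters {best_params}")
```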
AIC and BIC are measures of the quality of a model relative to other
models. Lower AIC or BIC values indicate a better model. These
criteria penalize models with more parameters to avoid overfitting.
```python
# Fit the model using the best parameters found from grid search
best_model = SARIMAX(df['Close'],
                     order=(best_params[0], best_params[1], best_params[2]),
                     seasonal_order=(best_params[3], best_params[4], best_params[5], best_params[6])).fit()
```
# 1. Residual Analysis
```python
# Plot the residuals
residuals = best_model.resid
plt.figure(figsize=(12, 6))
plt.plot(residuals)
plt.title('Residuals of the SARIMA Model')
plt.xlabel('Date')
plt.ylabel('Residuals')
plt.show()
# 2. Prediction Accuracy
```python
# Split the data into training and testing sets
train_data = df['Close'][:len(df) - 30]
test_data = df['Close'][len(df) - 30:]
# Make predictions
predictions = model_fit.predict(start=len(train_data), end=len(df) - 1,
dynamic=False)
# 1. Data Preprocessing
```python
# Fetch Forex data (e.g., USD/EUR exchange rates)
forex_data = yf.download('EURUSD=X', start='2020-01-01', end='2023-01-01')
# Difference the series to remove trend before fitting the seasonal model
forex_data_diff = forex_data['Close'].diff().dropna()
```
```python
# Fit the SARIMA model
sarima_model_forex = SARIMAX(forex_data_diff, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12)).fit()
# Make predictions
predictions_forex = sarima_model_forex.predict(start=len(forex_data_diff) - 30, end=len(forex_data_diff) - 1)
```
Residual Analysis
Analyzing the residuals of your model is a fundamental step in
diagnostics. Residuals, the difference between observed and
predicted values, should ideally behave like white noise—indicating
that the model has explained all systematic patterns.
```python
import matplotlib.pyplot as plt
residuals = best_model.resid
plt.figure(figsize=(12, 6))
plt.plot(residuals)
plt.title('Residuals of the SARIMA Model')
plt.xlabel('Date')
plt.ylabel('Residuals')
plt.show()
```
```python
residuals.plot(kind='kde')
plt.title('Density Plot of Residuals')
plt.show()
residuals.hist(bins=30)
plt.title('Histogram of Residuals')
plt.show()
```
```python
from statsmodels.stats.diagnostic import acorr_ljungbox
```python
aic_value = best_model.aic
bic_value = best_model.bic
print(f'AIC: {aic_value}')
print(f'BIC: {bic_value}')
```
Cross-Validation
1. Data Splitting: Divide the data into training and testing sets.
```python
train_data = df['Close'][:len(df) - 30]
test_data = df['Close'][len(df) - 30:]
```
```python
model_fit = SARIMAX(train_data,
                    order=(best_params[0], best_params[1], best_params[2]),
                    seasonal_order=(best_params[3], best_params[4], best_params[5], best_params[6])).fit()
```
3. Prediction and Plotting: Make predictions on the test data and plot
the results to visually inspect the prediction accuracy.
```python
predictions = model_fit.predict(start=len(train_data), end=len(df) - 1,
dynamic=False)
plt.figure(figsize=(12, 6))
plt.plot(test_data, label='Actual Prices')
plt.plot(predictions, color='red', linestyle='--', label='Predicted Prices')
plt.title('SARIMA Model Predictions vs Actual Prices')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()
```
```python
mape = np.mean(np.abs((test_data - predictions) / test_data)) * 100
print(f'MAPE: {mape:.2f}%')
```
```python
# Additional error metrics (mse, rmse, and mae are assumed to have been computed from test_data and predictions)
print(f'MSE: {mse:.2f}')
print(f'RMSE: {rmse:.2f}')
print(f'MAE: {mae:.2f}')
```
```python
import yfinance as yf
# Download S&P 500 index data and difference it (ticker and dates are illustrative)
sp500_data = yf.download('^GSPC', start='2020-01-01', end='2023-01-01')
sp500_data_diff = sp500_data['Close'].diff().dropna()
```
```python
# Fit the SARIMA model
sarima_model_sp500 = SARIMAX(sp500_data_diff, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12)).fit()
```
```python
# Plot residuals
residuals_sp500 = sarima_model_sp500.resid
plt.figure(figsize=(12, 6))
plt.plot(residuals_sp500)
plt.title('Residuals of the S&P 500 SARIMA Model')
plt.xlabel('Date')
plt.ylabel('Residuals')
plt.show()
```python
# Split the data into training and testing sets
train_data_sp500 = sp500_data_diff[:len(sp500_data_diff) - 30]
test_data_sp500 = sp500_data_diff[len(sp500_data_diff) - 30:]
# Fit the model on the training set and make predictions
model_fit_sp500 = SARIMAX(train_data_sp500, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12)).fit()
predictions_sp500 = model_fit_sp500.predict(start=len(train_data_sp500), end=len(sp500_data_diff) - 1, dynamic=False)
# Calculate MAPE
mape_sp500 = np.mean(np.abs((test_data_sp500 - predictions_sp500) / test_data_sp500)) * 100
print(f'MAPE: {mape_sp500:.2f}%')
```
# Summary
Regression analysis is a cornerstone of statistical and financial
modeling. It provides a powerful tool for understanding
relationships between variables and making predictions based
on historical data. In finance, regression models are indispensable
for tasks ranging from risk management and asset pricing to
economic forecasting and investment strategy development. This
section will introduce you to the fundamental principles of regression
models and their applications in financial contexts.
\[
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \epsilon
\]
Where:
- \( y \) is the dependent variable.
- \( x_1, x_2, ..., x_n \) are independent variables.
- \( \beta_0 \) is the intercept.
- \( \beta_1, \beta_2, ..., \beta_n \) are the coefficients of the
independent variables.
- \( \epsilon \) is the error term.
```python
import pandas as pd
import numpy as np
import statsmodels.api as sm
# Sample data
data = {
    'Interest_Rate': [2.5, 3.0, 3.5, 4.0, 4.5],
    'Stock_Price': [120, 125, 130, 135, 140]
}
df = pd.DataFrame(data)
# Fit a simple OLS regression of stock price on the interest rate
X = sm.add_constant(df['Interest_Rate'])
model = sm.OLS(df['Stock_Price'], X).fit()
print(model.summary())
```
```python
# Sample data
data = {
'GDP_Growth': [2.5, 3.0, 3.5, 4.0, 4.5],
'Interest_Rate': [1.5, 2.0, 2.5, 3.0, 3.5],
'Stock_Price': [120, 125, 130, 140, 145]
}
df = pd.DataFrame(data)
```python
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
# Sample data
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([1, 4, 9, 16, 25])
# Expand the predictor with quadratic terms and fit a linear model
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
poly_model = LinearRegression().fit(X_poly, y)
print(poly_model.coef_)
```
```python
from sklearn.metrics import mean_squared_error
print(f'RMSE: {rmse}')
print(f'R-squared: {r_squared}')
print(f'Adjusted R-squared: {adjusted_r_squared}')
```
Practical Application: Predicting Stock Prices Using GDP Growth
and Interest Rates
For this example, assume you have the following data on GDP
growth, interest rates, and stock prices for a given period.
```python
# Sample data
data = {
'GDP_Growth': [2.5, 3.0, 3.5, 4.0, 4.5],
'Interest_Rate': [1.5, 2.0, 2.5, 3.0, 3.5],
'Stock_Price': [120, 125, 130, 140, 145]
}
df = pd.DataFrame(data)
Plot the actual vs. predicted stock prices to visually assess the
model's accuracy.
```python
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 6))
plt.plot(df['Stock_Price'], label='Actual Stock Prices', marker='o')
plt.plot(predictions, label='Predicted Stock Prices', marker='x')
plt.xlabel('Observations')
plt.ylabel('Stock Prices')
plt.title('Actual vs Predicted Stock Prices')
plt.legend()
plt.show()
```
# Summary
At its heart, OLS regression seeks to find the best-fitting line through
a set of data points by minimizing the sum of the squares of the
vertical differences (residuals) between the observed values and the
values predicted by the linear model. The equation of a simple linear
regression model is:
\[
y_i = \beta_0 + \beta_1 x_i + \epsilon_i
\]
Where:
- \( y_i \) is the dependent variable (response).
- \( x_i \) is the independent variable (predictor).
- \( \beta_0 \) is the intercept.
- \( \beta_1 \) is the slope coefficient.
- \( \epsilon_i \) represents the error term or residual for each
observation \( i \).
\[
\mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\epsilon}
\]
Where:
- \( \mathbf{y} \) is an \( n \times 1 \) vector of the dependent variable.
- \( \mathbf{X} \) is an \( n \times (p+1) \) matrix of the predictors
(including a column of ones for the intercept).
- \( \boldsymbol{\beta} \) is a \( (p+1) \times 1 \) vector of coefficients.
- \( \boldsymbol{\epsilon} \) is an \( n \times 1 \) vector of errors.
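The OLS estimator that minimizes the squared residuals has the closed form \( \hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y} \). A quick numerical check of this formula against StatsModels, using small made-up arrays, is sketched below:
```python
import numpy as np
import statsmodels.api as sm

# Small illustrative dataset
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
X = sm.add_constant(x)

# Closed-form OLS estimate: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# Same estimate from StatsModels
model = sm.OLS(y, X).fit()
print(beta_hat, model.params)
```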
```python
import pandas as pd
import statsmodels.api as sm
import numpy as np
import matplotlib.pyplot as plt
```
```python
# Sample data
data = {
    'Market_Return': [0.02, 0.03, -0.01, 0.04, 0.01, 0.02, 0.03, -0.02, 0.05, 0.04],
    'Stock_Return': [0.025, 0.035, -0.005, 0.045, 0.015, 0.025, 0.032, -0.018, 0.055, 0.042]
}
df = pd.DataFrame(data)
```
```python
# Define independent and dependent variables
X = df['Market_Return']
y = df['Stock_Return']
# Fit the OLS regression (a constant is added for the intercept)
X = sm.add_constant(X)
model = sm.OLS(y, X).fit()
print(model.summary())
```
To better understand the fit of the model, plot the observed vs.
predicted stock returns.
```python
# Generate predictions
predictions = model.predict(X)
plt.figure(figsize=(10, 6))
plt.scatter(df['Market_Return'], df['Stock_Return'], label='Observed', color='blue')
plt.plot(df['Market_Return'], predictions, label='Fitted Line', color='red')
plt.xlabel('Market Return')
plt.ylabel('Stock Return')
plt.title('OLS Regression: Stock vs. Market Return')
plt.legend()
plt.show()
```
```python
# Calculate relevant metrics
r_squared = model.rsquared
adjusted_r_squared = model.rsquared_adj
p_values = model.pvalues
f_statistic = model.fvalue
print(f'R-squared: {r_squared}')
print(f'Adjusted R-squared: {adjusted_r_squared}')
print(f'P-values: {p_values}')
print(f'F-statistic: {f_statistic}')
```
```python
# Sample data
data = {
    'Market_Return': [0.02, 0.03, -0.01, 0.04, 0.01, 0.02, 0.03, -0.02, 0.05, 0.04],
    'Interest_Rate': [0.01, 0.02, 0.015, 0.02, 0.025, 0.03, 0.02, 0.015, 0.025, 0.02],
    'GDP_Growth': [0.03, 0.025, 0.02, 0.035, 0.03, 0.04, 0.025, 0.02, 0.05, 0.045],
    'Stock_Return': [0.025, 0.035, -0.005, 0.045, 0.015, 0.025, 0.032, -0.018, 0.055, 0.042]
}
df = pd.DataFrame(data)
# Fit the multiple regression of stock returns on the three factors
X = sm.add_constant(df[['Market_Return', 'Interest_Rate', 'GDP_Growth']])
y = df['Stock_Return']
model = sm.OLS(y, X).fit()
```
Plot the observed vs. predicted stock returns to visually assess the
model's accuracy.
```python
# Generate predictions
predictions = model.predict(X)
plt.figure(figsize=(10, 6))
plt.plot(y, label='Actual Stock Returns', marker='o')
plt.plot(predictions, label='Predicted Stock Returns', marker='x')
plt.xlabel('Observations')
plt.ylabel('Stock Returns')
plt.title('OLS Regression: Observed vs Predicted Stock Returns')
plt.legend()
plt.show()
```
First, let's load and explore the data using Python's powerful
libraries, `pandas`, `matplotlib`, and `seaborn` for visualization.
```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
```
```python
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
# Calculate VIF for each independent variable
vif_data = pd.DataFrame()
vif_data["feature"] = X.columns
vif_data["VIF"] = [variance_inflation_factor(X.values, i) for i in
range(len(X.columns))]
print(vif_data)
```
```python
sns.residplot(x=model.fittedvalues, y=model.resid)
plt.xlabel('Fitted values')
plt.ylabel('Residuals')
plt.show()
```
```python
from statsmodels.stats.diagnostic import het_breuschpagan
```
```python
from scipy.stats import shapiro
shapiro_test = shapiro(model.resid)
print(f'P-value: {shapiro_test.pvalue}')
```
```python
df['log_stock_market_return'] = np.log(df['stock_market_return'])
Y = df['log_stock_market_return']
```
Practical Application
To illustrate the practical application of multiple regression models,
consider a case study where a financial analyst predicts quarterly
stock returns. By incorporating economic indicators such as GDP
growth, interest rates, and inflation, the analyst can develop a robust
model that aids in investment decision-making. This model not only
enhances the predictive power but also provides actionable insights
into the driving factors behind stock performance.
This equation models the log-odds of the probability that \(Y = 1\) as
a linear combination of the independent variables. The logistic
function then transforms these log-odds into probabilities:
\[ P(Y=1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1X_1 + \beta_2X_2 +
\cdots + \beta_nX_n)}} \]
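As a quick numeric illustration of this transformation, the sketch below plugs hypothetical coefficients and predictor values into the logistic function; the numbers are not estimated from any dataset:
```python
import numpy as np

# Hypothetical coefficients (beta_0, beta_1, beta_2) and predictor values (1, x1, x2)
beta = np.array([-2.0, 0.5, 1.2])
x = np.array([1.0, 1.5, 0.8])

log_odds = np.dot(beta, x)
probability = 1 / (1 + np.exp(-log_odds))
print(f'P(Y=1) = {probability:.3f}')
```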
```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
```
```python
from sklearn.metrics import confusion_matrix

# Convert predicted default probabilities into class labels and build the confusion matrix
# (assumes `logit_model`, `X`, and `Y` are defined from the model-fitting step)
predictions = (logit_model.predict(X) >= 0.5).astype(int)
conf_matrix = confusion_matrix(Y, predictions)
print(conf_matrix)
```
2. ROC Curve and AUC: The ROC curve plots the true positive rate
against the false positive rate, and the AUC provides a single
measure of overall model performance.
```python
from sklearn.metrics import roc_curve, roc_auc_score
# Calculate AUC
auc = roc_auc_score(Y, logit_model.predict(X))
print(f'AUC: {auc}')
```
```python
from sklearn.metrics import classification_report
print(classification_report(Y, predictions))
```
```python
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Compute the VIF for each predictor to check for multicollinearity
vif_data = pd.DataFrame()
vif_data["feature"] = X.columns
vif_data["VIF"] = [variance_inflation_factor(X.values, i) for i in range(len(X.columns))]
print(vif_data)
```
Practical Application
For example:
```python
# Predicting the probability of default for new applicants
new_applicant = pd.DataFrame({'const': 1, 'credit_score': [650],
'annual_income': [50000], 'loan_amount': [15000]})
prob_default = logit_model.predict(new_applicant)
print(f'Probability of default: {prob_default[0]:.2f}')
```
```python
import pandas as pd
# Sample data
data = {
'marketing_expenses': [15000, 18000, 12000, 20000, 22000],
'rd_costs': [23000, 25000, 22000, 24000, 26000],
'num_employees': [100, 110, 90, 120, 130],
'revenue': [500000, 550000, 480000, 600000, 650000]
}
df = pd.DataFrame(data)
print(df)
```
```python
from sklearn.linear_model import LinearRegression
```
One-Hot Encoding
One-hot encoding is a common method to convert qualitative data
into a numerical format. This technique creates binary columns for
each category, where each column represents a category in the
original data, and a value of 1 indicates the presence of that
category.
```python
# Sample data with qualitative variable
data = {
'industry': ['Tech', 'Finance', 'Tech', 'Health', 'Finance'],
'marketing_expenses': [15000, 18000, 12000, 20000, 22000],
'revenue': [500000, 550000, 480000, 600000, 650000]
}
df = pd.DataFrame(data)

# One-hot encode the industry column into binary indicator columns
df_encoded = pd.get_dummies(df, columns=['industry'])
print(df_encoded)
```
The resulting DataFrame will have separate binary columns for each
industry category:
```
   marketing_expenses  revenue  industry_Finance  industry_Health  industry_Tech
0               15000   500000                 0                0              1
1               18000   550000                 1                0              0
2               12000   480000                 0                0              1
3               20000   600000                 0                1              0
4               22000   650000                 1                0              0
```
Dummy Variables
```python
# Create dummy variables
df_dummies = pd.get_dummies(df, columns=['industry'],
drop_first=True)
print(df_dummies)
```
```
   marketing_expenses  revenue  industry_Finance  industry_Health
0               15000   500000                 0                0
1               18000   550000                 1                0
2               12000   480000                 0                0
3               20000   600000                 0                1
4               22000   650000                 1                0
```
```python
# Define independent and dependent variables
X = df_dummies[['marketing_expenses', 'industry_Finance',
'industry_Health']]
Y = df_dummies['revenue']

# Add a constant and fit the regression with the dummy variables included
X = sm.add_constant(X)
model = sm.OLS(Y, X).fit()
print(model.summary())
```
Practical Application
For example:
```python
# Sample dataset
data = {
'loan_amount': [10000, 20000, 15000, 25000, 30000],
'interest_rate': [5, 3.5, 4, 4.5, 3],
'loan_type': ['personal', 'mortgage', 'auto', 'mortgage', 'personal'],
'default': [0, 1, 0, 1, 0]
}
df = pd.DataFrame(data)

# Define predictors and response (the categorical loan_type would need encoding for a fuller model)
X = df[['loan_amount', 'interest_rate']]
Y = df['default']
X = sm.add_constant(X)
logit_model = sm.Logit(Y, X).fit()
```
```python
import pandas as pd
# Sample data
data = {
'revenue': [500000, 550000, 600000, 650000, 700000],
'advertising_expenses': [15000, 18000, 20000, 22000, 24000],
'sales_personnel': [100, 110, 130, 140, 150]
}
df = pd.DataFrame(data)
print(df.corr())
```
```
revenue advertising_expenses sales_personnel
revenue 1.000000 0.993399 0.992278
advertising_expenses 0.993399 1.000000 0.998879
sales_personnel 0.992278 0.998879 1.000000
```
# Calculating VIF
```python
from statsmodels.stats.outliers_influence import
variance_inflation_factor
from sklearn.preprocessing import StandardScaler
# Independent variables
X = df[['advertising_expenses', 'sales_personnel']]

# Standardize the predictors before computing the VIF
X_scaled = StandardScaler().fit_transform(X)

# Calculate VIF
vif_data = pd.DataFrame()
vif_data["feature"] = X.columns
vif_data["VIF"] = [variance_inflation_factor(X_scaled, i) for i in
range(X_scaled.shape[1])]
print(vif_data)
```
Addressing Multicollinearity
```python
from sklearn.linear_model import Ridge

# Fit a ridge regression to shrink the highly correlated coefficients
Y = df['revenue']
ridge_model = Ridge(alpha=1.0).fit(X, Y)

# Output coefficients
print(f'Ridge Coefficients: {ridge_model.coef_}')
```
Practical Application
```python
# Sample dataset
data = {
'stock_price': [100, 110, 105, 115, 120],
'gdp_growth': [2.5, 3.0, 2.8, 3.2, 3.5],
'interest_rate': [1.5, 1.8, 1.7, 1.9, 2.0],
'inflation': [2.0, 2.1, 2.0, 2.2, 2.3]
}
df = pd.DataFrame(data)
# Independent variables
X = df[['gdp_growth', 'interest_rate', 'inflation']]
# Calculate VIF
vif_data = pd.DataFrame()
vif_data["feature"] = X.columns
vif_data["VIF"] = [variance_inflation_factor(X.values, i) for i in
range(X.shape[1])]
print(vif_data)
```
```python
import pandas as pd
# Sample data
data = {
'stock_return': [0.05, 0.02, -0.01, 0.03, 0.04],
'sector': ['Technology', 'Healthcare', 'Financial', 'Technology',
'Healthcare']
}
df = pd.DataFrame(data)

# One-hot encode the sector column, dropping the first category as the baseline
df = pd.get_dummies(df, columns=['sector'], drop_first=True)
print(df)
```
```
   stock_return  sector_Healthcare  sector_Technology
0          0.05                  0                  1
1          0.02                  1                  0
2         -0.01                  0                  0
3          0.03                  0                  1
4          0.04                  1                  0
```
```python
import statsmodels.api as sm
```
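The fitting step that produces the summary below is not shown above; a minimal sketch, assuming the one-hot encoded DataFrame `df` from the previous step, would be:
```python
# Regress stock returns on the sector dummies (Financial is the omitted baseline)
X = sm.add_constant(df[['sector_Healthcare', 'sector_Technology']].astype(float))
model = sm.OLS(df['stock_return'], X).fit()
print(model.summary())
```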
The output shows the impact of each sector on stock returns, with
`Financial` as the baseline:
```
                            OLS Regression Results
==============================================================================
Dep. Variable:           stock_return   R-squared:                       0.341
Model:                            OLS   Adj. R-squared:                  0.089
Method:                 Least Squares   F-statistic:                     1.355
Date:                Thu, 05 Oct 2023   Prob (F-statistic):              0.372
Time:                        15:21:06   Log-Likelihood:                 11.809
No. Observations:                   5   AIC:                            -17.62
Df Residuals:                       2   BIC:                            -18.80
Df Model:                           2
Covariance Type:            nonrobust
=====================================================================================
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
const                 0.0050      0.008      0.650      0.570      -0.027       0.037
sector_Healthcare     0.0350      0.011      3.267      0.085      -0.010       0.080
sector_Technology     0.0450      0.011      4.050      0.055      -0.003       0.093
=====================================================================================
Omnibus:                        1.558   Durbin-Watson:                   1.685
Prob(Omnibus):                  0.459   Jarque-Bera (JB):                0.570
Skew:                          -0.318   Prob(JB):                        0.752
Kurtosis:                       1.422   Cond. No.                         3.41
==============================================================================
```
Interaction Effects
Interaction effects occur when the effect of one predictor variable on
the dependent variable depends on the level of another predictor
variable. These effects are crucial for uncovering complex
relationships in financial data.
```python
# Sample data
data = {
'revenue': [500000, 550000, 600000, 650000, 700000],
'advertising_expenses': [15000, 18000, 20000, 22000, 24000],
'sales_personnel': [100, 110, 130, 140, 150]
}
df = pd.DataFrame(data)

# Create the interaction term between advertising spend and sales personnel
df['interaction'] = df['advertising_expenses'] * df['sales_personnel']
print(df)
```
```
   revenue  advertising_expenses  sales_personnel  interaction
0   500000                 15000              100      1500000
1   550000                 18000              110      1980000
2   600000                 20000              130      2600000
3   650000                 22000              140      3080000
4   700000                 24000              150      3600000
```
```python
# Define predictors and response variable
X = df[['advertising_expenses', 'sales_personnel', 'interaction']]
X = sm.add_constant(X) # Add a constant term to the model
Y = df['revenue']

# Fit the model including the interaction term
model = sm.OLS(Y, X).fit()
print(model.summary())
```
```
                            OLS Regression Results
==============================================================================
Dep. Variable:                revenue   R-squared:                       0.997
Model:                            OLS   Adj. R-squared:                  0.994
Method:                 Least Squares   F-statistic:                     274.8
Date:                Thu, 05 Oct 2023   Prob (F-statistic):            0.00365
Time:                        15:35:07   Log-Likelihood:                -43.178
No. Observations:                   5   AIC:                             94.36
Df Residuals:                       1   BIC:                             92.80
Df Model:                           3
Covariance Type:            nonrobust
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                   30000.     20000.       1.500      0.380    -230000.     290000.
advertising_expenses       10.       2.236      4.472      0.140     -18.356      38.356
sales_personnel          1332.     138.590      9.612      0.066      -432.0       3096.
interaction              0.0011     0.0000     19.600      0.032      0.0006      0.0015
========================================================================================
Omnibus:                          NaN   Durbin-Watson:                   1.939
Prob(Omnibus):                    NaN   Jarque-Bera (JB):                0.414
Skew:                          -0.581   Prob(JB):                        0.813
Kurtosis:                       1.988   Cond. No.                     9.06e+06
==============================================================================
```
```python
# Sample dataset
data = {
'sales_growth': [10, 15, 20, 25, 30],
'region': ['North', 'South', 'East', 'West', 'North'],
'online_marketing': [5000, 7000, 8000, 6000, 9000],
'offline_marketing': [10000, 12000, 15000, 11000, 13000]
}
df = pd.DataFrame(data)
```
```python
import pandas as pd
import statsmodels.api as sm
# Sample data
data = {
'date': pd.date_range(start='2023-01-01', periods=10, freq='M'),
'stock_price': [100, 102, 104, 107, 110, 108, 112, 115, 118, 120]
}
df = pd.DataFrame(data)
df.set_index('date', inplace=True)

# Create a one-period lag of the stock price and fit an autoregressive OLS model
df['lag1'] = df['stock_price'].shift(1)
df = df.dropna()
X = sm.add_constant(df['lag1'])
model = sm.OLS(df['stock_price'], X).fit()
print(model.summary())
```
```
                            OLS Regression Results
==============================================================================
Dep. Variable:            stock_price   R-squared:                       0.985
Model:                            OLS   Adj. R-squared:                  0.982
Method:                 Least Squares   F-statistic:                     462.7
Date:                Thu, 05 Oct 2023   Prob (F-statistic):           1.21e-05
Time:                        15:45:07   Log-Likelihood:                -6.6545
No. Observations:                   9   AIC:                             17.31
Df Residuals:                       7   BIC:                             18.02
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.4043      1.030      2.334      0.054      -0.060       4.869
lag1           0.9481      0.044     21.507      0.000       0.846       1.051
==============================================================================
Omnibus:                        0.015   Durbin-Watson:                   2.341
Prob(Omnibus):                  0.992   Jarque-Bera (JB):                0.202
Skew:                           0.021   Prob(JB):                        0.904
Kurtosis:                       2.379   Cond. No.                         722.
==============================================================================
```
```python
# Sample data
data = {
'date': pd.date_range(start='2023-01-01', periods=24, freq='M'),
'sales': [100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650,
110, 160, 210, 260, 310, 360, 410, 460, 510, 560, 610, 660]
}
df = pd.DataFrame(data)
df.set_index('date', inplace=True)
```
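The regressors behind the summary below (a lagged value of sales, a linear trend, and monthly seasonal dummies) are not constructed above; one possible sketch of that specification, under the assumption that this is how the model was built, is:
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Lagged sales, a linear time trend, and monthly seasonal dummies
df['lag1'] = df['sales'].shift(1)
df['trend'] = np.arange(len(df))
month_dummies = pd.get_dummies(df.index.month, prefix='month', drop_first=True).astype(float)
month_dummies.index = df.index
df = pd.concat([df, month_dummies], axis=1).dropna()

X = sm.add_constant(df.drop(columns='sales'))
model = sm.OLS(df['sales'], X).fit()
print(model.summary())
```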
```
                            OLS Regression Results
==============================================================================
Dep. Variable:                  sales   R-squared:                       0.998
Model:                            OLS   Adj. R-squared:                  0.996
Method:                 Least Squares   F-statistic:                     545.3
Date:                Thu, 05 Oct 2023   Prob (F-statistic):           1.32e-07
Time:                        15:55:07   Log-Likelihood:                -48.039
No. Observations:                  23   AIC:                             112.1
Df Residuals:                       8   BIC:                             123.5
Df Model:                          14
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         13.6364     52.726      0.259      0.803    -106.324     133.597
lag1           0.8862      0.059     15.001      0.000       0.752       1.021
trend          0.0204      0.007      2.787      0.024       0.003       0.038
month_2        0.0653     19.691      0.003      0.998     -45.203      45.333
month_3       -0.1138     19.504     -0.006      0.995     -43.086      42.859
month_4        0.1138     13.980      0.008      0.994     -33.458      33.686
month_5        0.1138     13.980      0.008      0.994     -33.458      33.686
month_6        0.1138     13.980      0.008      0.994     -33.458      33.686
month_7        0.1138     13.980      0.008      0.994     -33.458      33.686
month_8        0.1138     13.980      0.008      0.994     -33.458      33.686
month_9        0.1138     13.980      0.008      0.994     -33.458      33.686
month_10       0.1138     13.980      0.008      0.994     -33.458      33.686
month_11       0.1138     13.980      0.008      0.994     -33.458      33.686
month_12       0.1138     13.980      0.008      0.994     -33.458      33.686
==============================================================================
Omnibus:                       21.153   Durbin-Watson:                   2.697
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               25.621
Skew:                           2.087   Prob(JB):                     2.71e-06
Kurtosis:                       6.855   Cond. No.                     1.73e+05
==============================================================================
```
```python
from statsmodels.tsa.api import ARDL
# Sample data
data = {
'date': pd.date_range(start='2023-01-01', periods=30, freq='M'),
'interest_rate': [3, 3.1, 3.2, 3.1, 3.3, 3.4, 3.3, 3.5, 3.6, 3.5, 3.7, 3.8,
3.9, 4, 4.1, 4.2, 4.3, 4.2, 4.4, 4.5, 4.6, 4.5, 4.7, 4.8,
4.9, 5, 5.1, 5.2, 5.1, 5.3],
'inflation_rate': [2, 2.1, 2.2, 2.1, 2.3, 2.4, 2.3, 2.5, 2.6, 2.5, 2.7, 2.8,
2.9, 3, 3.1, 3.2, 3.3, 3.2, 3.4, 3.5, 3.6, 3.5, 3.7, 3.8,
3.9, 4, 4.1, 4.2, 4.1, 4.3]
}
df = pd.DataFrame(data)
df.set_index('date', inplace=True)

# Fit an ARDL(1, 1): one lag of the interest rate plus the current and lagged inflation rate
ardl_model = ARDL(df['interest_rate'], lags=1, exog=df[['inflation_rate']], order=1)
ardl_results = ardl_model.fit()
print(ardl_results.summary())
```
```
                              ARDL Model Results
==============================================================================
Dep. Variable:          interest_rate   No. Observations:                   29
Model:                     ARDL(1, 1)   Log Likelihood                 -14.546
Method:               Conditional MLE   S.D. of innovations              0.045
Date:                Thu, 05 Oct 2023   AIC                             -1.299
Time:                        16:10:07   BIC                              4.199
Sample:                             1   HQIC                             0.518
                                   29
=====================================================================================
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
const                -0.1052      0.025     -4.234      0.001      -0.157      -0.054
interest_rate.L1      1.2507      0.069     18.090      0.000       1.105       1.396
inflation_rate        0.0639      0.020      3.157      0.005       0.023       0.105
inflation_rate.L1    -0.0729      0.021     -3.451      0.003      -0.117      -0.029
=====================================================================================
```
```python
# Sample dataset
data = {
'date': pd.date_range(start='2023-01-01', periods=36, freq='M'),
'stock_index': [3000, 3050, 3100, 3150, 3200, 3250, 3300, 3350,
3400, 3450,
3500, 3550, 3600, 3650, 3700, 3750, 3800, 3850, 3900, 3950,
4000, 4050, 4100, 4150, 4200, 4250, 4300, 4350, 4400, 4450,
4500, 4550, 4600, 4650, 4700, 4750],
'fed_rate': [1.5, 1.5, 1.5, 1.5, 2, 2, 2, 2, 2.5, 2.5, 2.5, 2.5,
2, 2, 2, 2, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5,
1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5, 1.5]
}
df = pd.DataFrame(data)
df.set_index('date', inplace=True)

# Build the lagged index, a linear trend, and the lagged Fed rate, then fit OLS
df['lag1'] = df['stock_index'].shift(1)
df['trend'] = range(len(df))
df['fed_rate_lag1'] = df['fed_rate'].shift(1)
df = df.dropna()
X = sm.add_constant(df[['lag1', 'trend', 'fed_rate_lag1']])
model = sm.OLS(df['stock_index'], X).fit()
print(model.summary())
```
The resulting model reveals the impact of past stock index values,
trends, and lagged Federal Reserve rates on the current stock index:
```
                            OLS Regression Results
==============================================================================
Dep. Variable:            stock_index   R-squared:                       0.997
Model:                            OLS   Adj. R-squared:                  0.996
Method:                 Least Squares   F-statistic:                     917.4
Date:                Thu, 05 Oct 2023   Prob (F-statistic):           1.77e-22
Time:                        16:25:07   Log-Likelihood:                -98.971
No. Observations:                  35   AIC:                             205.9
Df Residuals:                      31   BIC:                             211.7
Df Model:                           3
Covariance Type:            nonrobust
=================================================================================
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
const            -2.5367      7.246     -0.350      0.728     -17.350      12.276
lag1              0.9339      0.032     29.429      0.000       0.869       0.999
trend             0.9491      0.175      5.419      0.000       0.593       1.305
fed_rate_lag1     0.8039      1.399      0.574      0.570      -2.046       3.654
=================================================================================
```
Choosing the right model for financial analysis is a critical step that
can significantly impact the accuracy and reliability of your
predictions. In financial modeling, where precision is paramount,
understanding and applying appropriate model selection criteria
ensures that the chosen model best fits the data and the task at
hand. This section delves into the various criteria used to evaluate
and select models, offering practical insights and examples using
Python.
Goodness-of-Fit Statistics
```python
import statsmodels.api as sm
import pandas as pd
# Sample data
data = {
'date': pd.date_range(start='2023-01-01', periods=12, freq='M'),
'income': [50, 55, 60, 62, 65, 70, 72, 75, 78, 80, 85, 88],
'expense': [30, 32, 35, 37, 40, 42, 45, 47, 50, 52, 55, 58]
}
df = pd.DataFrame(data)
df.set_index('date', inplace=True)

# Regress expenses on income and inspect the goodness-of-fit statistics
X = sm.add_constant(df['income'])
model = sm.OLS(df['expense'], X).fit()
print(model.summary())
```
Output:
```
                            OLS Regression Results
==============================================================================
Dep. Variable:                expense   R-squared:                       0.978
Model:                            OLS   Adj. R-squared:                  0.977
Method:                 Least Squares   F-statistic:                     432.4
Date:                Thu, 05 Oct 2023   Prob (F-statistic):           1.03e-09
Time:                        16:45:07   Log-Likelihood:                -8.0364
No. Observations:                  12   AIC:                             20.07
Df Residuals:                      10   BIC:                             21.08
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          4.0000      1.355      2.952      0.015       0.920       7.080
income         0.8000      0.038     20.799      0.000       0.715       0.885
==============================================================================
Omnibus:                        1.463   Durbin-Watson:                   2.214
Prob(Omnibus):                  0.481   Jarque-Bera (JB):                0.949
Skew:                           0.247   Prob(JB):                        0.622
Kurtosis:                       1.776   Cond. No.                         131.
==============================================================================
```
Information Criteria
AIC and BIC are widely used metrics that penalize model complexity differently: \( \text{AIC} = 2k - 2\ln(\hat{L}) \) and \( \text{BIC} = k\ln(n) - 2\ln(\hat{L}) \), where \( k \) is the number of estimated parameters, \( n \) the number of observations, and \( \hat{L} \) the maximized likelihood. BIC penalizes additional parameters more heavily for all but the smallest samples.
```python
import numpy as np
# Simulate data
np.random.seed(0)
X = np.random.rand(100, 3)
Y = 1.5 + X[:, 0] * 3 + X[:, 1] * 2 + np.random.randn(100) * 0.5

# Fit three nested OLS models (one possible specification) and compare AIC/BIC
import statsmodels.api as sm
for k in (1, 2, 3):
    fit = sm.OLS(Y, sm.add_constant(X[:, :k])).fit()
    print(f'Model {k} - AIC: {fit.aic:.3f} BIC: {fit.bic:.3f}')
```
Output:
```
Model 1 - AIC: 187.905 BIC: 192.115
Model 2 - AIC: 131.671 BIC: 138.091
Model 3 - AIC: 126.655 BIC: 135.285
```
Models with lower AIC and BIC values are preferred, indicating
better trade-offs between fit and complexity. Here, Model 3 is the
most suitable.
Cross-Validation
```python
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
# Sample dataset
X = df[['income']].values
Y = df['expense'].values

# 5-fold cross-validation scored by (negative) mean squared error
model = LinearRegression()
scores = cross_val_score(model, X, Y, cv=5, scoring='neg_mean_squared_error')
print("Mean Squared Error:", -scores.mean())
```
Output:
```
Mean Squared Error: 2.769230769230759
```
Predictive Power
```python
from sklearn.model_selection import train_test_split
```
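The hold-out evaluation itself is not shown; a minimal sketch, reusing the `X` and `Y` arrays from the cross-validation example (the exact error will depend on the split), might look like this:
```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Hold out part of the data, fit on the rest, and evaluate on the unseen observations
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=42)
model = LinearRegression().fit(X_train, Y_train)
predictions = model.predict(X_test)
print("Mean Squared Error:", mean_squared_error(Y_test, predictions))
```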
Output:
```
Mean Squared Error: 1.25
```
Residual Analysis
Residual analysis examines the errors of the model to check for
patterns or biases, ensuring that residuals are random and normally
distributed.
```python
import matplotlib.pyplot as plt
# Compute residuals and plot them against the predicted values
residuals = Y - model.predict(X)
plt.scatter(model.predict(X), residuals)
plt.axhline(y=0, color='r', linestyle='--')
plt.xlabel('Predicted Values')
plt.ylabel('Residuals')
plt.title('Residual Plot')
plt.show()
```
```python
import statsmodels.api as sm
import pandas as pd
# Sample data
data = {
'date': pd.date_range(start='2023-01-01', periods=12, freq='M'),
'income': [50, 55, 60, 62, 65, 70, 72, 75, 78, 80, 85, 88],
'expense': [30, 32, 35, 37, 40, 42, 45, 47, 50, 52, 55, 58]
}
df = pd.DataFrame(data)
df.set_index('date', inplace=True)

# Fit the model and report the goodness-of-fit measures
X = sm.add_constant(df['income'])
Y = df['expense']
model = sm.OLS(Y, X).fit()
print(f'R-squared: {model.rsquared:.3f}')
print(f'Adjusted R-squared: {model.rsquared_adj:.3f}')
```
Output:
```
R-squared: 0.978
Adjusted R-squared: 0.977
```
Mean Squared Error (MSE) and Root Mean Squared Error (RMSE)
```python
import numpy as np
# Calculate predictions
predictions = model.predict(X)

# Compute MSE and RMSE from the prediction errors
mse = np.mean((Y - predictions) ** 2)
print("Mean Squared Error:", mse)
print("Root Mean Squared Error:", np.sqrt(mse))
```
Output:
```
Mean Squared Error: 1.145
Root Mean Squared Error: 1.070
```
```python
# Calculate MAE
mae = np.mean(np.abs(Y - predictions))
print("Mean Absolute Error:", mae)
```
Output:
```
Mean Absolute Error: 0.95
```
Residual Analysis
```python
import matplotlib.pyplot as plt
# Plot residuals
residuals = Y - predictions
plt.scatter(predictions, residuals)
plt.axhline(y=0, color='r', linestyle='--')
plt.xlabel('Predicted Values')
plt.ylabel('Residuals')
plt.title('Residual Plot')
plt.show()
```
F-statistic
The F-statistic tests the overall significance of the regression model,
comparing the fits of different models.
```python
# Display F-statistic
print("F-statistic:", model.fvalue)
print("p-value of F-statistic:", model.f_pvalue)
```
Output:
```
F-statistic: 432.4
p-value of F-statistic: 1.03e-09
```
P-values
```python
# Display p-values
print(model.pvalues)
```
Output:
```
const 0.015
income 0.000
dtype: float64
```
Low p-values (typically less than 0.05) suggest that the predictors
are statistically significant.
```python
from statsmodels.stats.outliers_influence import variance_inflation_factor
# Calculate VIF
vif_data = pd.DataFrame()
vif_data["feature"] = X.columns
vif_data["VIF"] = [variance_inflation_factor(X.values, i) for i in
range(len(X.columns))]
print(vif_data)
```
Output:
```
feature VIF
0 const 4.403916
1 income 1.000000
```
Portfolio theory is the cornerstone that guides investors in the
construction of their investment portfolios. It is a framework that
focuses on maximizing returns while minimizing risk through the
diversification of investments. This section will delve into the core
concepts and mathematical models that underpin portfolio theory,
providing a robust foundation for understanding more advanced
topics later in the book.
# Mathematical Foundation
The expected return of a portfolio is the weighted average of the expected returns of its assets:
\[ E(R_p) = \sum_{i=1}^{n} w_i E(R_i) \]
where:
- \(w_i\) = weight of asset \(i\)
- \(E(R_i)\) = expected return of asset \(i\)
- \(n\) = number of assets in the portfolio
The portfolio variance captures how the assets move together:
\[ \sigma_p^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} w_i w_j \sigma_{ij} \]
where:
- \(\sigma_{ij}\) = covariance between the returns of asset \(i\) and
asset \(j\)
# Efficient Frontier
The efficient frontier represents the set of optimal portfolios that offer
the highest expected return for a defined level of risk or the lowest
risk for a given level of expected return. Portfolios that lie below the
efficient frontier are considered sub-optimal because they do not
provide enough return for the level of risk taken.
```python
import numpy as np
import matplotlib.pyplot as plt

def plot_efficient_frontier(returns, cov_matrix, n_portfolios=5000, risk_free_rate=0.03):
    # Simulate random portfolios and record their risk, return, and Sharpe ratio
    n_assets = len(returns)
    results = np.zeros((3, n_portfolios))
    for i in range(n_portfolios):
        weights = np.random.random(n_assets)
        weights /= np.sum(weights)
        port_return = np.dot(weights, returns)
        port_risk = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
        results[:, i] = (port_risk, port_return,
                         (port_return - risk_free_rate) / port_risk)

    plt.figure(figsize=(10, 7))
    plt.scatter(results[0, :], results[1, :], c=results[2, :], cmap='YlGnBu', marker='o')
    plt.colorbar(label='Sharpe ratio')
    plt.xlabel('Risk (Standard Deviation)')
    plt.ylabel('Return')
    plt.title('Efficient Frontier')
    plt.show()

# `returns` (expected return vector) and `cov_matrix` are assumed to be estimated from historical data
plot_efficient_frontier(returns, cov_matrix)
```
The Capital Asset Pricing Model (CAPM) links an asset's expected return to its systematic risk:
\[ E(R_i) = R_f + \beta_i \left( E(R_m) - R_f \right) \]
where:
- \(E(R_i)\) = expected return of the asset
- \(R_f\) = risk-free rate
- \(\beta_i\) = beta of the asset
- \(E(R_m)\) = expected return of the market
# Practical Implications
# Defining Return
# Measuring Risk
```python
import numpy as np
import pandas as pd
# Portfolio weights
weights = np.array([0.4, 0.3, 0.3])
```
From this code, you can derive the expected return and risk
(standard deviation) of the portfolio. These calculations form the
basis for more complex portfolio optimization techniques.
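A sketch of those calculations follows, assuming illustrative expected returns and a covariance matrix for the three assets (the numbers are placeholders, not market estimates):
```python
# Illustrative expected annual returns and covariance matrix for the three assets
expected_returns = np.array([0.08, 0.10, 0.12])
cov_matrix = np.array([[0.010, 0.002, 0.001],
                       [0.002, 0.015, 0.003],
                       [0.001, 0.003, 0.020]])

portfolio_return = np.dot(weights, expected_returns)
portfolio_volatility = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
print(f'Expected Portfolio Return: {portfolio_return:.2%}')
print(f'Portfolio Volatility: {portfolio_volatility:.2%}')
```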
# Sharpe Ratio
The Sharpe ratio is a critical metric for evaluating the risk-adjusted
return of a portfolio. It is defined as the difference between the
portfolio's return and the risk-free rate, divided by the portfolio's
standard deviation.
To calculate the Sharpe ratio for our example portfolio, you can
extend the previous code as follows:
```python
# Risk-free rate (e.g., 3% or 0.03)
risk_free_rate = 0.03

# Sharpe ratio of the portfolio, using the return and volatility computed above
sharpe_ratio = (portfolio_return - risk_free_rate) / portfolio_volatility
print(f'Sharpe Ratio: {sharpe_ratio:.2f}')
```
Graphical Representation:
The efficient frontier is typically plotted on a graph where the x-axis
represents risk (as measured by standard deviation) and the y-axis
represents expected return. Efficient portfolios form a concave curve,
demonstrating the trade-offs between risk and return.
Mathematical Foundation:
To construct the efficient frontier, we solve for portfolios that either
maximize expected return for a given risk level or minimize risk for a
given return. This involves a set of quadratic optimization problems
that can be solved using Python libraries such as SciPy.
1. Data Preparation:
- Gather historical return data for the assets.
- Calculate the mean returns and the covariance matrix.
2. Optimization Setup:
- Define the objective function to minimize portfolio variance.
- Implement constraints to ensure the portfolio returns a specific
target return and the weights sum to one.
```python
import numpy as np
import pandas as pd
from scipy.optimize import minimize
import matplotlib.pyplot as plt
# `returns_data` is assumed to be a dict or DataFrame of historical asset returns
returns_df = pd.DataFrame(returns_data)
```
Key Insights:
- Risk-Return Trade-Off: The efficient frontier visually represents the
trade-off between risk and return. Portfolios on the upper part of the
curve offer higher returns for additional risk, while those on the lower
part provide lower returns with reduced risk.
- Diversification Benefits: Efficient portfolios typically include a
diverse mix of assets, highlighting the benefits of diversification in
reducing risk without compromising returns.
# Practical Applications
4. Mean-Variance Optimization
Expected Return:
- The weighted average of the individual asset returns, where the
weights represent the proportion of the total investment allocated to
each asset.
Optimization Objective:
- Minimize portfolio variance for a specified target return or maximize
return for a given level of risk.
Mathematically, the optimization problem can be expressed as:
\[ \min \frac{1}{2} w^T \Sigma w \]
subject to:
\[ \sum_{i} w_i \mu_i = \mu_p \]
\[ \sum_{i} w_i = 1 \]
\[ w_i \geq 0 \]
Where:
- \( w \) is the vector of asset weights.
- \( \Sigma \) is the covariance matrix of asset returns.
- \( \mu_i \) is the expected return of asset \( i \).
- \( \mu_p \) is the target portfolio return.
1. Data Preparation:
- Obtain historical return data for the assets.
- Calculate the mean returns and the covariance matrix.
```python
import numpy as np
import pandas as pd
from scipy.optimize import minimize
# `returns_data` is assumed to hold historical returns for each asset
returns_df = pd.DataFrame(returns_data)
mean_returns = returns_df.mean()
cov_matrix = returns_df.cov()

# Number of assets
num_assets = len(mean_returns)

# Target return
target_return = 0.10
```
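The optimization step itself is not shown above; a minimal sketch of the variance-minimization problem using `scipy.optimize.minimize` with the SLSQP solver, under the constraints stated earlier, might look like this:
```python
def portfolio_variance(weights, cov_matrix):
    return np.dot(weights.T, np.dot(cov_matrix, weights))

# Constraints: hit the target return and invest the full budget; no short sales
constraints = (
    {'type': 'eq', 'fun': lambda w: np.dot(w, mean_returns) - target_return},
    {'type': 'eq', 'fun': lambda w: np.sum(w) - 1},
)
bounds = tuple((0, 1) for _ in range(num_assets))
initial_weights = np.repeat(1 / num_assets, num_assets)

result = minimize(portfolio_variance, initial_weights, args=(cov_matrix,),
                  method='SLSQP', bounds=bounds, constraints=constraints)
print('Optimal weights:', result.x)
```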
After running the above code, the output will provide the optimal
asset weights that achieve the desired return with the lowest
possible risk. This analysis can be extended to include various target
returns, constructing a complete efficient frontier.
Key Insights:
Practical Applications
Ignoring Tail Risks: The focus on mean and variance may overlook
extreme events or tail risks, which can have significant impacts on
portfolio performance.
The Capital Asset Pricing Model (CAPM) expresses the expected return of an asset as:
\[ E(R_i) = R_f + \beta_i \left( E(R_m) - R_f \right) \]
Where:
- \( E(R_i) \) is the expected return on asset \( i \).
- \( R_f \) is the risk-free rate of return, typically represented by
government bond yields.
- \( \beta_i \) is the beta of asset \( i \), indicating its sensitivity to
market movements.
- \( E(R_m) \) is the expected return of the market portfolio.
- \( E(R_m) - R_f \) is the market risk premium, the excess return
expected from the market over the risk-free rate.
1. Data Collection:
- Obtain historical price data for the stock and a market index (e.g.,
S&P 500).
- Calculate the returns for both the stock and the market index.
- Retrieve the current risk-free rate from financial databases or
government sources.
2. Calculating Beta:
- Perform a regression analysis on the stock's returns against the
market returns to estimate beta.
```python
import numpy as np
import pandas as pd
import yfinance as yf
import statsmodels.api as sm
```
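A minimal sketch of the beta estimation, assuming daily data for an illustrative stock and the S&P 500 as the market index (the ticker and date range are placeholders):
```python
# Download prices, compute daily returns, and align the two series
stock = yf.download('AAPL', start='2020-01-01', end='2023-01-01')['Adj Close'].pct_change()
market = yf.download('^GSPC', start='2020-01-01', end='2023-01-01')['Adj Close'].pct_change()
returns = pd.concat([stock, market], axis=1, keys=['Stock', 'Market']).dropna()

# Regress stock returns on market returns; the slope is the beta estimate
X = sm.add_constant(returns['Market'])
model = sm.OLS(returns['Stock'], X).fit()
print(f"Estimated beta: {model.params['Market']:.2f}")
```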
Key Insights:
# Enhancing CAPM
This completes your in-depth look at the Capital Asset Pricing Model,
a vital component of portfolio optimization. With the knowledge
gained here, you are well-equipped to apply CAPM in real-world
scenarios, enhancing your investment strategies and risk
management practices.
1. Sharpe Ratio
2. Sortino Ratio
3. Treynor Ratio
4. Jensen's Alpha
5. Information Ratio
Each of these metrics has its unique strengths and applications, and
understanding them collectively provides a comprehensive view of
portfolio performance.
The Sharpe Ratio is defined as:
\[ {Sharpe Ratio} = \frac{E(R_p) - R_f}{\sigma_p} \]
Where:
- \( E(R_p) \) is the expected return of the portfolio.
- \( R_f \) is the risk-free rate.
- \( \sigma_p \) is the standard deviation of the portfolio's returns,
representing total risk.
1. Data Collection:
- Obtain historical price data for the assets in the portfolio.
- Calculate the portfolio returns.
- Retrieve the current risk-free rate from financial databases or
government sources.
2. Calculating Returns:
- Compute the daily returns for each asset in the portfolio.
- Aggregate these returns to get the portfolio return.
```python
import numpy as np
import pandas as pd
import yfinance as yf
```
Key Insights:
Sortino Ratio:
The Sortino Ratio is a variation of the Sharpe Ratio that focuses only
on downside risk, ignoring upside volatility. It is calculated as:
\[ {Sortino Ratio} = \frac{E(R_p) - R_f}{\sigma_d} \]
Where \( \sigma_d \) is the standard deviation of negative returns
(downside deviation). The Sortino Ratio is particularly useful for
investors concerned about downside risk.
Treynor Ratio:
The Treynor Ratio measures the return earned in excess of the risk-
free rate per unit of market risk, as measured by beta:
\[ {Treynor Ratio} = \frac{E(R_p) - R_f}{\beta_p} \]
Where \( \beta_p \) is the portfolio's beta. The Treynor Ratio is useful
for investors with well-diversified portfolios who want to assess
performance relative to systematic risk.
Jensen's Alpha:
Jensen's Alpha measures the excess return of a portfolio over the
expected return predicted by the CAPM:
\[ \alpha = E(R_p) - [R_f + \beta_p (E(R_m) - R_f)] \]
A positive alpha indicates outperformance relative to the CAPM
benchmark, while a negative alpha suggests underperformance.
Information Ratio:
The Information Ratio measures the portfolio's excess return relative
to a benchmark, divided by the tracking error:
\[ {Information Ratio} = \frac{E(R_p) - E(R_b)}{\sigma_{{tracking}}} \]
Where \( E(R_b) \) is the benchmark return and \( \sigma_{{tracking}}
\) is the standard deviation of the excess return. The Information
Ratio is useful for evaluating active managers who aim to outperform
a benchmark.
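To see how these metrics fit together, the sketch below computes each of them on simulated daily return series; all inputs (returns, beta, risk-free rate) are illustrative assumptions rather than estimates from real data:
```python
import numpy as np

# Illustrative daily return series and parameters
np.random.seed(1)
portfolio_returns = np.random.normal(0.0006, 0.010, 252)
market_returns = np.random.normal(0.0005, 0.009, 252)
benchmark_returns = np.random.normal(0.0005, 0.009, 252)
portfolio_beta = 1.1                 # assumed portfolio beta
rf_daily = 0.03 / 252                # 3% annual risk-free rate, expressed daily

excess = portfolio_returns - rf_daily
downside = portfolio_returns[portfolio_returns < 0]

sharpe = excess.mean() / portfolio_returns.std()
sortino = excess.mean() / downside.std()
treynor = excess.mean() / portfolio_beta
jensens_alpha = portfolio_returns.mean() - (rf_daily + portfolio_beta * (market_returns.mean() - rf_daily))
active = portfolio_returns - benchmark_returns
information_ratio = active.mean() / active.std()

print(f'Sharpe: {sharpe:.3f}  Sortino: {sortino:.3f}  Treynor: {treynor:.5f}')
print(f"Jensen's Alpha: {jensens_alpha:.5f}  Information Ratio: {information_ratio:.3f}")
```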
1. Data Collection:
- Gather historical price data for the portfolio assets.
- Calculate daily returns and the covariance matrix of returns.
3. Specifying Constraints:
- Define the mathematical expressions for each constraint.
```python
import numpy as np
import pandas as pd
import yfinance as yf
from scipy.optimize import minimize
# Define the bounds for each asset weight (example: 0 <= weight <= 0.5)
bounds = tuple((0, 0.5) for _ in range(num_assets))
```
The output consists of the optimized asset weights that maximize the
portfolio's Sharpe Ratio while adhering to the specified constraints.
This solution ensures that the portfolio is not only optimized for risk-
adjusted returns but also complies with practical investment
considerations.
Key Insights:
Budget Constraints:
Budget constraints ensure that the sum of all asset weights equals
100%, making sure the entire capital is allocated.
Non-Negativity Constraints:
Non-negativity constraints restrict asset weights to non-negative
values, preventing short-selling and ensuring only long positions.
Sector Constraints:
Sector constraints limit exposure to specific sectors or industries,
ensuring diversification across different economic sectors. This helps
mitigate sector-specific risks.
Risk Constraints:
Risk constraints impose limits on risk measures like Value at Risk
(VaR) or portfolio volatility, ensuring the portfolio stays within
acceptable risk levels.
Liquidity Constraints:
Liquidity constraints ensure sufficient liquidity by limiting investments
in illiquid assets, which is crucial for managing redemptions and
transactions.
Institutional Investors:
Institutional investors, such as pension funds and mutual funds, use
portfolio constraints to adhere to regulatory requirements and
investment policies, ensuring compliance and risk management.
Risk-Averse Investors:
Risk-averse investors impose strict risk constraints to limit exposure
to volatile assets, aligning the portfolio with their risk tolerance.
Sector-Specific Funds:
Sector-specific funds use sector constraints to focus on particular
industries, while still ensuring diversification and risk management
within those sectors.
Dynamic Strategies:
Dynamic investment strategies can adjust constraints based on
market conditions, optimizing the portfolio's performance while
adhering to evolving risk and return objectives.
Dynamic Constraints:
Implementing dynamic constraints that adjust based on market
conditions and investor preferences can enhance portfolio
performance and adaptability.
Scenario Analysis:
Running scenario analysis to test the portfolio's performance under
different market conditions helps ensure robustness and resilience.
Multi-Objective Optimization:
Incorporating multi-objective optimization techniques allows
investors to balance multiple objectives, such as maximizing returns,
minimizing risk, and adhering to constraints.
Stochastic Optimization:
Using stochastic optimization techniques that account for uncertainty
in returns and covariances can lead to more robust and realistic
portfolio solutions.
Leveraging Technology:
Advanced technological tools, such as machine learning and AI, can
enhance the optimization process by identifying patterns and insights
that traditional methods might overlook.
1. Sample Covariance Matrix:
The sample covariance between assets \(i\) and \(j\) is estimated as
\[ \sigma_{ij} = \frac{1}{T-1} \sum_{t=1}^{T} \left( r_{it} - \bar{r}_i \right) \left( r_{jt} - \bar{r}_j \right) \]
where \( r_{it} \) and \( r_{jt} \) are the returns of assets \(i\) and \(j\) at
time \(t\), and \( \bar{r}_i \) and \( \bar{r}_j \) are the mean returns of
assets \(i\) and \(j\) over the period \(T\).
2. Shrinkage Estimators:
Shrinkage methods improve the estimation of the covariance matrix
by combining the sample covariance matrix with a structured
estimator, such as the identity matrix. This technique reduces
estimation error and improves robustness.
3. Factor Models:
Factor models, such as the Capital Asset Pricing Model (CAPM) and
the Fama-French model, estimate the covariance matrix using a set
of common factors that explain the returns. This approach reduces
the number of parameters to estimate and can be more stable.
1. Data Collection:
- Gather historical price data for the portfolio assets.
- Calculate daily returns.
```python
import numpy as np
import pandas as pd
import yfinance as yf
```
```python
# Calculate the sample covariance matrix
sample_cov_matrix = returns.cov()
print(f"Sample Covariance Matrix:\n{sample_cov_matrix}")
```
```python
from sklearn.covariance import LedoitWolf
```
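The shrinkage step itself is not shown; a minimal sketch using scikit-learn's Ledoit-Wolf estimator, assuming `returns` is the DataFrame of daily asset returns from the previous step:
```python
# Ledoit-Wolf shrinkage estimate of the covariance matrix
lw = LedoitWolf().fit(returns)
shrunk_cov_matrix = pd.DataFrame(lw.covariance_, index=returns.columns, columns=returns.columns)
print(f"Ledoit-Wolf Shrunk Covariance Matrix:\n{shrunk_cov_matrix}")
print(f"Shrinkage intensity: {lw.shrinkage_:.3f}")
```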
1. Risk Management:
The covariance matrix is essential for calculating portfolio risk
metrics such as volatility and Value at Risk (VaR). An accurate
covariance matrix ensures reliable risk assessment.
2. Diversification:
Understanding the covariances between assets helps in constructing
diversified portfolios that can reduce risk without sacrificing expected
returns. Properly estimated covariances reveal how assets interact,
aiding in better diversification strategies.
3. Optimization:
Portfolio optimization techniques, such as mean-variance
optimization, rely heavily on the covariance matrix. Errors in
covariance estimation can lead to suboptimal asset allocations and
affect the portfolio's performance.
4. Stress Testing:
Stress testing involves evaluating the portfolio's performance under
adverse market conditions. An accurate covariance matrix is
necessary to simulate realistic scenarios and assess potential risks.
# Enhancing Covariance Matrix Estimation
1. Regularization Techniques:
Regularization methods, such as shrinkage estimators, improve the
stability and accuracy of the covariance matrix, especially when
dealing with high-dimensional data or limited historical observations.
2. Dynamic Models:
Implementing dynamic models, such as the EWMA, allows the
covariance matrix to adapt to changing market conditions, providing
more timely and relevant risk assessments.
3. Factor Models:
Factor models reduce the dimensionality of the covariance matrix
estimation problem by focusing on common risk factors that drive
asset returns, leading to more stable estimates.
4. High-Frequency Data:
Utilizing high-frequency data can enhance the accuracy of the
covariance matrix estimation, capturing more granular market
movements and improving short-term risk assessments.
1. Institutional Portfolios:
Institutional investors, such as pension funds and endowments, rely
on accurate covariance matrix estimation to manage large and
diverse portfolios, ensuring optimal asset allocation and risk
management.
2. Quantitative Strategies:
Quantitative investment strategies, including hedge funds and
algorithmic trading, require precise covariance estimates to
implement sophisticated models and achieve desired performance
metrics.
4. Quantum Computing:
Quantum computing has the potential to revolutionize covariance
matrix estimation by solving complex optimization problems more
efficiently, leading to more accurate and timely results.
```python
import scipy.optimize as opt
def constraint1(x):
    return x[0] + x[1] - 1  # Equality constraint
```
```python
import numpy as np
import scipy.optimize as opt

# `returns`, `cov_matrix`, `risk_free_rate`, and `num_assets` are assumed defined as in earlier sections
def portfolio_return(weights):
    return np.dot(weights, returns)

def portfolio_volatility(weights):
    return np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))

def objective_function(weights):
    # Negative Sharpe ratio: minimizing this maximizes the Sharpe ratio
    return -(portfolio_return(weights) - risk_free_rate) / portfolio_volatility(weights)

constraints = ({'type': 'eq', 'fun': lambda x: np.sum(x) - 1})
bounds = tuple((0, 1) for asset in range(num_assets))

# Solve for the weights that maximize the Sharpe ratio
initial_weights = np.repeat(1 / num_assets, num_assets)
result = opt.minimize(objective_function, initial_weights, method='SLSQP',
                      bounds=bounds, constraints=constraints)
print(result.x)
```
```python
import numpy as np
import scipy.optimize as opt
from arch import arch_model

# Fit a GARCH(1,1) model to the return series (assumes `returns` is defined)
result = arch_model(returns, vol='Garch', p=1, q=1).fit(disp='off')
print(result.summary())
```
SciPy's optimization tools are indispensable for financial analysts
seeking to solve complex optimization problems efficiently and
accurately. By leveraging these capabilities, one can tackle a wide
range of financial modeling challenges, from portfolio optimization
and risk management to parameter estimation and beyond. As we
navigate the intricacies of financial data and models, SciPy's robust
optimization functions provide the precision and flexibility needed to
achieve optimal outcomes.
To begin, we need historical price data for the assets in our portfolio.
For this case study, let's consider a simplified portfolio consisting of
five stocks. We will use Python libraries such as `pandas` and
`yfinance` to fetch and manipulate this data.
```python
import pandas as pd
import yfinance as yf
```
```python
import numpy as np
import scipy.optimize as opt
```
```python
optimal_return = np.dot(optimal_weights, mean_returns)
optimal_volatility = np.sqrt(np.dot(optimal_weights.T, np.dot(cov_matrix, optimal_weights)))
optimal_sharpe_ratio = (optimal_return - risk_free_rate) / optimal_volatility
print(f'Optimal Return: {optimal_return:.2%}, Volatility: {optimal_volatility:.2%}, Sharpe Ratio: {optimal_sharpe_ratio:.2f}')
```
```python
import matplotlib.pyplot as plt
```
Econometrics combines statistical methods with economic theory
to analyze financial data and test hypotheses. Within the
bustling world of finance, econometrics stands as a cornerstone
for making informed decisions, providing a bridge between
theoretical models and real-world data. This section delves into the
foundational principles of econometrics, highlighting its significance
and applications in financial contexts.
First, we need to import the necessary libraries and load the financial
data. For this example, we’ll analyze the relationship between stock
returns and market returns.
```python
import pandas as pd
import statsmodels.api as sm
import yfinance as yf
```
```python
# Add a constant term for the intercept
data['Constant'] = 1

# Fit the OLS regression of stock returns on market returns (column names assumed)
model = sm.OLS(data['Stock'], data[['Constant', 'Market']]).fit()
results = model.summary()
print(results)
```
```python
# Hypothesis test for the slope coefficient
t_stat = model.tvalues['Market']
p_value = model.pvalues['Market']
print(f'T-statistic: {t_stat}')
print(f'P-value: {p_value}')
```
```python
# Heteroscedasticity test (Breusch-Pagan)
from statsmodels.stats.diagnostic import het_breuschpagan
bp_test = het_breuschpagan(model.resid, model.model.exog)
print(f'Breusch-Pagan p-value: {bp_test[1]}')

# Autocorrelation test (Durbin-Watson)
from statsmodels.stats.stattools import durbin_watson
dw_stat = durbin_watson(model.resid)
print(f'Durbin-Watson statistic: {dw_stat}')
```
First, we import the necessary libraries and load the financial data.
For this example, we'll analyze the relationship between a stock's
excess returns and a set of explanatory variables.
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import yfinance as yf
```
Next, we define the moment conditions for our model. For simplicity,
we'll assume a linear relationship between excess returns and the
explanatory variables.
```python
def moment_conditions(params, data):
    alpha, beta_market, beta_size, beta_value = params
    y = data['Excess_Returns']
    X = np.column_stack((data['Market'], data['Size'], data['Value']))
    residuals = y - (alpha + beta_market * X[:, 0] + beta_size * X[:, 1] + beta_value * X[:, 2])
    return np.column_stack((residuals, residuals * X))
```
```python
from statsmodels.sandbox.regression.gmm import GMM
class AssetPricingGMM(GMM):
    def momcond(self, params):
        return moment_conditions(params, self.data)
print(gmm_results.summary())
```
# Step 4: Interpreting Results
\[
Y_t = c + A_1 Y_{t-1} + A_2 Y_{t-2} + \cdots + A_p Y_{t-p} +
\epsilon_t
\]
where:
- \( Y_t \) is a vector of \( k \) endogenous variables.
- \( c \) is a vector of constants (intercepts).
- \( A_i \) are \( k \times k \) coefficient matrices.
- \( \epsilon_t \) is a vector of error terms, which are assumed to be
white noise with zero mean and constant covariance matrix \( \Sigma
\).
First, we import the necessary libraries and load the financial data.
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.api import VAR
import yfinance as yf
```
Next, we specify and fit the VAR model to the data. We begin by
selecting the optimal lag length using information criteria.
```python
# Create VAR model instance
model = VAR(data)

# Select the lag order by information criterion and fit the model
var_results = model.fit(maxlags=10, ic='aic')
```
```python
# Generate impulse response functions
irf = var_results.irf(10)
irf.plot(orth=False)
```
```python
# Perform variance decomposition
fevd = var_results.fevd(10)
fevd.plot()
```
Consider two time series \( X_t \) and \( Y_t \). If both series are
integrated of order one (i.e., I(1)), they are non-stationary, but if there
exists a linear combination \( Z_t = X_t - \beta Y_t \) that is stationary
(i.e., I(0)), then \( X_t \) and \( Y_t \) are said to be cointegrated.
\[
\Delta Y_t = \alpha + \gamma (\beta X_{t-1} - Y_{t-1}) + \sum_{i=1}^k
\delta_i \Delta X_{t-i} + \sum_{j=1}^k \phi_j \Delta Y_{t-j} + \epsilon_t
\]
where:
- \( \Delta \) denotes the first difference operator.
- \( \beta X_{t-1} - Y_{t-1} \) is the error correction term representing
the long-term equilibrium relationship.
- \( \gamma \) is the speed of adjustment coefficient, indicating how
quickly deviations from the long-term equilibrium are corrected.
- \( \delta_i \) and \( \phi_j \) are short-term dynamic coefficients.
First, we import the necessary libraries and load the financial data.
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import yfinance as yf
# Download historical data for stock prices (SPY) and interest rates (10-year Treasury yield)
spy = yf.download('SPY', start='2010-01-01', end='2023-01-01')['Adj Close']
treasury_yield = yf.download('^TNX', start='2010-01-01', end='2023-01-01')['Adj Close']

# Combine into a single DataFrame and drop missing observations
data = pd.DataFrame({'SPY': spy, 'Treasury_Yield': treasury_yield}).dropna()
```
Next, we test for cointegration between the two time series using the
Engle-Granger two-step method.
```python
# Step 1: Cointegrating regression
coint_reg = sm.OLS(data['SPY'],
sm.add_constant(data['Treasury_Yield'])).fit()
data['residuals'] = coint_reg.resid

# Step 2: Test the residuals for stationarity with an ADF test
from statsmodels.tsa.stattools import adfuller
adf_pvalue = adfuller(data['residuals'])[1]
print(f'ADF p-value on residuals: {adf_pvalue}')
```
```python
# Create lagged variables
data['SPY_lag'] = data['SPY'].shift(1)
data['Treasury_Yield_lag'] = data['Treasury_Yield'].shift(1)
data['residuals_lag'] = data['residuals'].shift(1)
```
\[
r_t = \mu + \epsilon_t \quad {with} \quad \epsilon_t = \sigma_t z_t
\]
\[
\sigma_t^2 = \alpha_0 + \sum_{i=1}^p \alpha_i \epsilon_{t-i}^2 +
\sum_{j=1}^q \beta_j \sigma_{t-j}^2
\]
where:
- \( r_t \) is the return at time \( t \).
- \( \mu \) is the mean return.
- \( \epsilon_t \) is the residual error term, with \( \epsilon_t \sim N(0,
\sigma_t^2) \).
- \( \sigma_t^2 \) is the conditional variance at time \( t \).
- \( z_t \) is a standard normal random variable.
- \( \alpha_0, \alpha_i, \beta_j \) are model parameters.
```python
import numpy as np
import pandas as pd
import yfinance as yf
from arch import arch_model
```
Next, we specify and fit the GARCH(1,1) model to the daily returns.
```python
# Specify the GARCH(1,1) model
model = arch_model(returns, vol='Garch', p=1, q=1)
garch_fit = model.fit(disp='off')
print(garch_fit.summary())
```
```python
# Forecast the next 5 days of volatility
forecast = garch_fit.forecast(horizon=5)
cond_var = forecast.variance.iloc[-1]
print(cond_var)
```
First, we import the necessary libraries and load the data, which
includes economic indicators and stock index returns.
```python
import pandas as pd
from semopy import Model
df = pd.DataFrame(data)
```
```python
# Define the SEM model
model_desc = """
Stock_returns ~ GDP_growth + Inflation_rate + Interest_rate
GDP_growth ~~ Inflation_rate
GDP_growth ~~ Interest_rate
Inflation_rate ~~ Interest_rate
"""
model = Model(model_desc)
```
```python
# Fit the model
model.fit(df)
```
Bayesian Econometrics
Bayesian inference rests on Bayes' theorem:
\[ P(\theta | y) = \frac{P(y | \theta) \, P(\theta)}{P(y)} \]
Where:
- \( P(\theta | y) \) is the posterior probability of the parameter \( \theta
\) given data \( y \).
- \( P(y | \theta) \) is the likelihood of observing data \( y \) given
parameter \( \theta \).
- \( P(\theta) \) is the prior probability of \( \theta \).
- \( P(y) \) is the marginal likelihood of data \( y \).
```python
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
```
In this example, we start with a prior belief about the mean return of
a stock, based on historical knowledge. As we observe more data
(sample returns), we update our belief using Bayesian inference.
The posterior distribution reflects our updated belief after considering
the observed data.
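A compact sketch of that updating step, using a conjugate normal-normal model with made-up prior parameters and simulated observations, is shown below:
```python
import numpy as np

# Prior belief about the mean daily return (illustrative values)
prior_mean, prior_var = 0.001, 0.0004

# Simulated observed daily returns
np.random.seed(42)
observed = np.random.normal(0.002, 0.02, 60)
sample_var = observed.var() / len(observed)   # variance of the sample mean

# Posterior precision is the sum of prior and data precisions
post_var = 1 / (1 / prior_var + 1 / sample_var)
post_mean = post_var * (prior_mean / prior_var + observed.mean() / sample_var)
print(f'Posterior mean: {post_mean:.5f}, posterior std: {np.sqrt(post_var):.5f}')
```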
Applications in Finance
```python
import pandas as pd
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
```
```python
# Aggregating order book data
order_book = {
'bid_prices': np.random.rand(10),
'ask_prices': np.random.rand(10) + 1,
'bid_sizes': np.random.randint(1, 10, size=10),
'ask_sizes': np.random.randint(1, 10, size=10)
}
```
```python
# Example of a simple market making strategy
spread = 0.01
buy_order_price = data['price'].iloc[-1] - spread / 2
sell_order_price = data['price'].iloc[-1] + spread / 2
print(f'Placing buy order at: {buy_order_price}')
print(f'Placing sell order at: {sell_order_price}')
```
Applications in Finance
The applications of high-frequency data analysis are vast and varied.
Here are some key use cases:
```python
import numpy as np
import pandas as pd
```
1. ARIMA Model:
```python
import statsmodels.api as sm
```
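The ARIMA fit itself is not shown; a minimal sketch, assuming `data` holds a price series in a `price` column, would be:
```python
from statsmodels.tsa.arima.model import ARIMA

# Fit an ARIMA(1, 1, 1) model to the price series
arima_model = ARIMA(data['price'], order=(1, 1, 1))
arima_result = arima_model.fit()
print(arima_result.summary())
```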
2. GARCH Model:
```python
from arch import arch_model
```
```python
# Load multivariate time series data
multivariate_data = pd.read_csv('multivariate_data.csv')
multivariate_data['date'] = pd.to_datetime(multivariate_data['date'])
multivariate_data.set_index('date', inplace=True)
# Fit a VAR model to the multivariate series
from statsmodels.tsa.api import VAR
var_model = VAR(multivariate_data)
var_result = var_model.fit(maxlags=15)
```
Applications in Finance
1. Risk Management:
2. Portfolio Optimization:
Asset price models aid in the construction of optimized portfolios. By
predicting returns and volatilities, investors can allocate their assets
to maximize returns while minimizing risk.
3. Algorithmic Trading:
```python
# Example of a momentum trading strategy
data['momentum'] = data['price'].pct_change(periods=10)
buy_signal = data[data['momentum'] > 0.05]
sell_signal = data[data['momentum'] < -0.05]
```
4. Economic Forecasting:
5. Valuation of Derivatives:
```python
# Monte Carlo simulation for option pricing
S0 = data['price'].iloc[-1] # Current stock price
K = 100 # Strike price
T = 1 # Time to maturity
r = 0.05 # Risk-free rate
sigma = np.std(data['log_return']) # Volatility
simulations = 10000
payoffs = []
for _ in range(simulations):
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * np.random.randn())
    payoffs.append(max(ST - K, 0))

# Discount the average payoff to obtain the Monte Carlo option price
option_price = np.exp(-r * T) * np.mean(payoffs)
print(f'Estimated call option price: {option_price:.2f}')
```
```python
from keras.models import Sequential
from keras.layers import LSTM, Dense
from sklearn.preprocessing import MinMaxScaler
# X and y are assumed to be lists of input sequences and targets built from the price series
X, y = np.array(X), np.array(y)
X = np.reshape(X, (X.shape[0], X.shape[1], 1))

# Build a simple LSTM network (architecture is illustrative)
model = Sequential()
model.add(LSTM(50, input_shape=(X.shape[1], 1)))
model.add(Dense(1))

model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(X, y, epochs=10, batch_size=32)
```
For our case study, we will forecast the Gross Domestic Product
(GDP) growth rate using a combination of leading, lagging, and
coincident indicators.
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# `data` is assumed to be a DataFrame of normalized economic indicators indexed by date
plt.figure(figsize=(10, 5))
for column in data.columns:
plt.plot(data.index, data[column], label=column)
plt.title('Economic Indicators Over Time')
plt.xlabel('Date')
plt.ylabel('Normalized Value')
plt.legend()
plt.show()
```
```python
from statsmodels.tsa.api import VAR
```
```python
# Forecast future values
n_forecasts = len(test_data)
forecast = var_result.forecast(train_data.values[-var_result.k_ar:],
steps=n_forecasts)
forecast_df = pd.DataFrame(forecast, index=test_data.index,
columns=test_data.columns)
```
```python
# Load exogenous variables
exog_data = pd.read_csv('exogenous_variables.csv')
exog_data['date'] = pd.to_datetime(exog_data['date'])
exog_data.set_index('date', inplace=True)
```
1. Policymaking:
Governments and central banks use GDP forecasts to make
informed decisions about monetary policy, taxation, and public
spending. Accurate predictions enable proactive measures to
stabilize the economy.
2. Investment Decisions:
3. Corporate Strategy:
4. Economic Research:
Backtesting is a critical process in the development and
validation of financial models. It involves simulating how a
model would have performed in the past using historical data.
By doing so, financial analysts can assess the effectiveness and
robustness of their models before applying them to live trading or
investment decisions. In this section, we will delve deeply into the
methodologies and best practices of backtesting financial models
using Python, SciPy, and StatsModels.
```python
# Import libraries
import pandas as pd
import numpy as np
import scipy as sp
import statsmodels.api as sm
import yfinance as yf
import matplotlib.pyplot as plt
```
```python
# Define short and long windows
short_window = 40
long_window = 100
# Create signals
data['Short_MA'] = data['Adj Close'].rolling(window=short_window,
min_periods=1, center=False).mean()
data['Long_MA'] = data['Adj Close'].rolling(window=long_window,
min_periods=1, center=False).mean()
data['Signal'] = 0
data['Signal'][short_window:] = np.where(data['Short_MA']
[short_window:] > data['Long_MA'][short_window:], 1, 0)
data['Position'] = data['Signal'].diff()
# Plot signals
plt.figure(figsize=(12, 6))
plt.plot(data['Adj Close'], label='Price')
plt.plot(data['Short_MA'], label=f'{short_window}-day SMA')
plt.plot(data['Long_MA'], label=f'{long_window}-day SMA')
plt.plot(data[data['Position'] == 1].index, data['Short_MA']
[data['Position'] == 1], '^', markersize=10, color='g', label='Buy
Signal')
plt.plot(data[data['Position'] == -1].index, data['Short_MA']
[data['Position'] == -1], 'v', markersize=10, color='r', label='Sell
Signal')
plt.title('SMA Crossover Strategy')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()
```
```python
# Calculate returns
data['Returns'] = data['Adj Close'].pct_change()
data['Strategy_Returns'] = data['Returns'] * data['Signal'].shift(1)
```
```python
# Define transaction cost (e.g., 0.1% per trade)
transaction_cost = 0.001

# Subtract costs whenever the position changes
data['Strategy_Returns_Net'] = data['Strategy_Returns'] - transaction_cost * data['Position'].abs()
```
```python
# Monte Carlo simulation
n_simulations = 1000
simulation_results = np.zeros((n_simulations,
len(data['Strategy_Returns'])))
for i in range(n_simulations):
    simulation_results[i, :] = np.random.choice(
        data['Strategy_Returns'].dropna(),
        size=len(data['Strategy_Returns']), replace=True).cumsum()
```
1. Performance Metrics:
2. Market Conditions:
3. Sensitivity Analysis:
```python
# Install necessary libraries
!pip install pandas numpy scipy statsmodels yfinance matplotlib
# Import libraries
import pandas as pd
import numpy as np
import scipy as sp
import statsmodels.api as sm
import yfinance as yf
import matplotlib.pyplot as plt
```
```python
# Define a function to simulate changes in interest rates
def sensitivity_analysis(data, interest_rate_changes):
    results = {}
    for change in interest_rate_changes:
        data['Adjusted_Returns'] = data['Returns'] + change
        cumulative_returns = (1 + data['Adjusted_Returns']).cumprod()
        results[change] = cumulative_returns
    return results
```
Scenario Analysis
```python
# Define a function for reverse stress testing
def reverse_stress_testing(data, target_drawdown):
    potential_scenarios = []
    for shock in np.linspace(-0.1, 0, 100):
        data['Adjusted_Returns'] = data['Returns'] + shock
        cumulative_returns = (1 + data['Adjusted_Returns']).cumprod()
        max_drawdown = (cumulative_returns / cumulative_returns.cummax() - 1).min()
        if max_drawdown <= target_drawdown:
            potential_scenarios.append((shock, max_drawdown))
            break
    return potential_scenarios
```
```python
# Define a function to calculate VaR and CVaR
def calculate_var_cvar(data, confidence_level=0.95):
    returns = data['Returns'].dropna()
    var = np.percentile(returns, (1 - confidence_level) * 100)
    cvar = returns[returns <= var].mean()
    return var, cvar

# Bootstrap the return series to simulate alternative equity curves
for i in range(n_simulations):
    simulated_returns = np.random.choice(data['Returns'].dropna(),
                                         size=len(data['Returns']), replace=True)
    simulation_results[i, :] = (1 + simulated_returns).cumprod()
```
```python
# Install necessary libraries
!pip install pandas numpy scipy statsmodels yfinance matplotlib
# Import libraries
import pandas as pd
import numpy as np
import scipy as sp
import statsmodels.api as sm
import yfinance as yf
import matplotlib.pyplot as plt
```
```python
# Calculate moving averages
data['SMA50'] = data['Adj Close'].rolling(window=50).mean()
data['SMA200'] = data['Adj Close'].rolling(window=200).mean()
```
```python
# Calculate Bollinger Bands
data['SMA20'] = data['Adj Close'].rolling(window=20).mean()
data['StdDev'] = data['Adj Close'].rolling(window=20).std()
data['UpperBand'] = data['SMA20'] + (data['StdDev'] * 2)
data['LowerBand'] = data['SMA20'] - (data['StdDev'] * 2)
# Generate trading signals
data['Signal'] = 0
data['Signal'] = np.where(data['Adj Close'] < data['LowerBand'], 1,
                          np.where(data['Adj Close'] > data['UpperBand'], -1, 0))
```
Arbitrage Strategies
```python
# Load historical data for two correlated assets (e.g., S&P 500 and NASDAQ 100)
data1 = yf.download('^GSPC', start='2000-01-01', end='2022-01-01')
data2 = yf.download('^IXIC', start='2000-01-01', end='2022-01-01')
```
```python
# Simulate market making strategy
data['MidPrice'] = (data['High'] + data['Low']) / 2
data['Bid'] = data['MidPrice'] - (0.01 * data['MidPrice'])
data['Ask'] = data['MidPrice'] + (0.01 * data['MidPrice'])
```
Statistical Arbitrage
```python
# Calculate the spread between the two assets and its z-score for statistical arbitrage
data1['Spread'] = data1['Adj Close'] - data2['Adj Close']
mean_spread = data1['Spread'].mean()
std_spread = data1['Spread'].std()
data1['ZScore'] = (data1['Spread'] - mean_spread) / std_spread
```
```python
# Define a function for backtesting
def backtest_strategy(data, initial_capital=10000):
    positions = data['Signal'].shift().fillna(0)
    daily_returns = data['Returns'] * positions
    cumulative_returns = (1 + daily_returns).cumprod() * initial_capital
    return cumulative_returns
```
```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
# Feature Engineering
data['Moving_Average'] = data['Close'].rolling(window=10).mean()
# Normalize data
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data[['Open', 'High', 'Low', 'Close',
'Volume', 'Moving_Average']])
```
```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
```
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
random_state=42)
# Train Random Forest model
rf_model = RandomForestClassifier(n_estimators=100,
random_state=42)
rf_model.fit(X_train, y_train)

# Evaluate out-of-sample accuracy
predictions = rf_model.predict(X_test)
print('Accuracy:', accuracy_score(y_test, predictions))
```
```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

# Build a simple LSTM network (architecture is illustrative; X_train/X_test are assumed prepared earlier)
lstm_model = Sequential()
lstm_model.add(LSTM(50, input_shape=(X_train.shape[1], 1)))
lstm_model.add(Dense(1))

lstm_model.compile(optimizer='adam', loss='mse')
lstm_model.fit(X_train, y_train, epochs=50, batch_size=32,
               validation_data=(X_test, y_test))
```
```python
from sklearn.model_selection import cross_val_score
# Cross-validation example
cv_scores = cross_val_score(model, X, y, cv=5)
print(f'Cross-Validation Scores: {cv_scores}')
```
```python
import shap
```
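A short sketch of how SHAP values could be used to interpret the random forest fitted earlier (assuming `rf_model` and `X_test` from the previous steps):
```python
# Explain the model's predictions with SHAP values and summarize feature importance
explainer = shap.TreeExplainer(rf_model)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test)
```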
Financial risk refers to the uncertainty and potential for financial loss
inherent in any investment or business operation. It spans several types,
including market, credit, liquidity, and operational risk, and is commonly
quantified with metrics such as:
1. Value at Risk (VaR): Estimates the maximum loss that can occur
over a specified period with a certain confidence level.
2. Conditional Value at Risk (CVaR): Provides an average loss
beyond the VaR threshold, offering a more comprehensive risk
assessment.
3. Standard Deviation and Variance: Measure the dispersion of
returns, serving as indicators of volatility.
4. Beta: Measures the sensitivity of a stock or portfolio to market
movements.
5. Sharpe Ratio: Assesses risk-adjusted performance by comparing
the excess return of an investment to its standard deviation.
Using Python, you can calculate these risk metrics to analyze and
manage financial risk effectively. Here is a practical example illustrating
the calculation of VaR and CVaR; a sketch covering volatility, beta, and
the Sharpe ratio follows it.
```python
import numpy as np
import pandas as pd

# Load historical price data and compute daily returns
data = pd.read_csv('historical_prices.csv')
returns = data['Close'].pct_change().dropna()

# 95% VaR: the return threshold exceeded on only 5% of days
confidence_level = 0.95
var = np.percentile(returns, (1 - confidence_level) * 100)
print(f'Value at Risk (VaR): {var}')

# CVaR: the average return on days worse than the VaR threshold
cvar = returns[returns <= var].mean()
print(f'Conditional Value at Risk (CVaR): {cvar}')
```
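The remaining metrics from the list above can be computed just as directly. Below is a minimal sketch of annualized volatility, beta, and the Sharpe ratio; the benchmark file `market_prices.csv` and the 2% risk-free rate are illustrative assumptions.

```python
# Annualized volatility from daily returns (assumes ~252 trading days per year)
volatility = returns.std() * np.sqrt(252)

# Beta: sensitivity of the asset to a market benchmark
# (market_prices.csv is an assumed file holding benchmark closing prices)
market_returns = pd.read_csv('market_prices.csv')['Close'].pct_change().dropna()
aligned = pd.concat([returns, market_returns], axis=1, keys=['asset', 'market']).dropna()
beta = aligned['asset'].cov(aligned['market']) / aligned['market'].var()

# Sharpe ratio: annualized excess return per unit of volatility (2% risk-free rate assumed)
risk_free_rate = 0.02
sharpe_ratio = (returns.mean() * 252 - risk_free_rate) / volatility

print(f'Volatility: {volatility}, Beta: {beta}, Sharpe Ratio: {sharpe_ratio}')
```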
```python
# Portfolio weights
weights = np.array([0.4, 0.4, 0.2])

# Portfolio variance (asset_returns is assumed to be an array of shape
# (3, n_days) holding the daily returns of the three assets)
portfolio_variance = np.dot(weights.T, np.dot(np.cov(asset_returns), weights))
print(f'Portfolio Variance: {portfolio_variance}')
```
Hedging Strategies
Here's how you can use Python to model a simple hedging strategy
with options:
```python
from scipy.stats import norm
import numpy as np
# Parameters
S = 100 # Current stock price
K = 105 # Strike price
T = 1        # Time to maturity in years
r = 0.05     # Risk-free rate
sigma = 0.2  # Volatility

# Delta of a European call (Black-Scholes): the hedge ratio, i.e. the number of
# shares of the underlying to hold per option written in a simple delta hedge
d1 = (np.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * np.sqrt(T))
delta = norm.cdf(d1)
print(f'Hedge ratio (delta): {delta}')
```
```python
import numpy as np
```
The Black-Scholes price of a European call option is:
\[
C = S_0 N(d_1) - K e^{-rT} N(d_2)
\]
Where:
- \( C \) = Call option price
- \( S_0 \) = Current stock price
- \( K \) = Strike price
- \( T \) = Time to expiration
- \( r \) = Risk-free rate
- \( \sigma \) = Volatility of the underlying asset
- \( N(\cdot) \) = Cumulative distribution function of the standard normal distribution
- \( d_1 = \frac{\ln(S_0/K) + (r + \sigma^2/2)T}{\sigma \sqrt{T}} \)
- \( d_2 = d_1 - \sigma \sqrt{T} \)
```python
from scipy.stats import norm
import numpy as np

# Black-Scholes price of a European call or put
def black_scholes(S, K, T, r, sigma, option_type='call'):
    d1 = (np.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    if option_type == 'call':
        return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)
    return K * np.exp(-r * T) * norm.cdf(-d2) - S * norm.cdf(-d1)

# Parameters
S = 100      # Current stock price
K = 105      # Strike price
T = 1        # Time to maturity in years
r = 0.05     # Risk-free rate
sigma = 0.2  # Volatility

# Calculate call and put option prices
call_price = black_scholes(S, K, T, r, sigma, 'call')
put_price = black_scholes(S, K, T, r, sigma, 'put')
print(f'Call Option Price: {call_price}')
print(f'Put Option Price: {put_price}')
```
```python
def binomial_option_pricing(S, K, T, r, sigma, n, option_type='call'):
    dt = T / n
    u = np.exp(sigma * np.sqrt(dt))
    d = 1 / u
    p = (np.exp(r * dt) - d) / (u - d)
    # Asset prices at maturity, from n up-moves down to n down-moves
    asset_prices = S * (u ** np.arange(n, -1, -1)) * (d ** np.arange(0, n + 1))
    # Option payoffs at maturity
    if option_type == 'call':
        option_values = np.maximum(asset_prices - K, 0)
    else:
        option_values = np.maximum(K - asset_prices, 0)
    # Step backwards through the tree, discounting the expected value at each node
    for _ in range(n):
        option_values = np.exp(-r * dt) * (p * option_values[:-1] + (1 - p) * option_values[1:])
    return option_values[0]

# Parameters
S = 100      # Current stock price
K = 105      # Strike price
T = 1        # Time to maturity in years
r = 0.05     # Risk-free rate
sigma = 0.2  # Volatility
n = 100      # Number of time intervals

price = binomial_option_pricing(S, K, T, r, sigma, n, 'call')
print(f'Binomial Call Option Price: {price}')
```
The Greeks are measures of sensitivity that describe how an option's price
changes with respect to various parameters. The most commonly used are
Delta, Gamma, Theta, Vega, and Rho; the example below computes Delta and
Gamma for a European call.
```python
def black_scholes_greeks(S, K, T, r, sigma):
    d1 = (np.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    delta = norm.cdf(d1)
    gamma = norm.pdf(d1) / (S * sigma * np.sqrt(T))
    return delta, gamma

# Parameters
S = 100      # Current stock price
K = 105      # Strike price
T = 1        # Time to maturity in years
r = 0.05     # Risk-free rate
sigma = 0.2  # Volatility

delta, gamma = black_scholes_greeks(S, K, T, r, sigma)
print(f'Delta: {delta}, Gamma: {gamma}')

# Potential outcomes: net payoff of holding the call at a few expiration prices
# (call_premium reuses the Black-Scholes call price computed earlier)
call_premium = call_price
stock_price_at_expiration = np.array([90, 100, 110])
payoff = np.maximum(stock_price_at_expiration - K, 0) - call_premium
```
\[
V(x) =
\begin{cases}
x^\alpha & \text{if } x \geq 0 \\
- \lambda (-x)^\beta & \text{if } x < 0
\end{cases}
\]
Where:
- \( \alpha \) and \( \beta \) are typically less than 1, reflecting
diminishing sensitivity.
- \( \lambda \) is the loss aversion coefficient, typically greater than 1.
```python
import numpy as np
import matplotlib.pyplot as plt
```
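As a minimal sketch of the value function above, the curve can be plotted directly; the exponents and loss-aversion coefficient below are illustrative choices close to commonly reported estimates.

```python
import numpy as np
import matplotlib.pyplot as plt

def prospect_value(x, alpha=0.88, beta=0.88, lam=2.25):
    # Concave over gains, steeper (loss-averse) over losses
    return np.where(x >= 0, np.abs(x) ** alpha, -lam * np.abs(x) ** beta)

outcomes = np.linspace(-100, 100, 400)
plt.plot(outcomes, prospect_value(outcomes))
plt.axhline(0, color='grey', linewidth=0.5)
plt.axvline(0, color='grey', linewidth=0.5)
plt.xlabel('Gain / Loss')
plt.ylabel('Perceived Value')
plt.title('Prospect Theory Value Function')
plt.show()
```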
```python
import numpy as np
def anchoring_adjustment(initial_value, adjustments, anchor):
    return anchor + adjustments * (initial_value - anchor)

# Example data
initial_value = 100
adjustments = np.random.normal(0, 1, 100)
anchor = 90

# Estimates stay biased toward the anchor despite the random adjustments
adjusted_estimates = anchoring_adjustment(initial_value, adjustments, anchor)
```
Overconfidence in Trading
```python
import numpy as np
```
```python
import numpy as np
import matplotlib.pyplot as plt
# Parameters
n_agents = 100
n_steps = 200
```
```python
import pandas as pd
from textblob import TextBlob
# Sample headlines (illustrative; in practice these would come from a news feed)
news_data = [{'headline': 'Company beats earnings expectations'},
             {'headline': 'Regulator opens probe into the sector'}]
# Convert to DataFrame and score sentiment polarity
news_df = pd.DataFrame(news_data)
news_df['Sentiment'] = news_df['headline'].apply(lambda t: TextBlob(t).sentiment.polarity)
```
```python
import matplotlib.pyplot as plt
import pandas as pd
```
# Candlestick Charts
```python
import matplotlib.dates as mdates
import mplfinance as mpf
import numpy as np

# Sample OHLC (Open, High, Low, Close) data over 100 business days (illustrative)
dates = pd.date_range('2023-01-01', periods=100, freq='B')
ohlc_data = pd.DataFrame({
    'Date': dates,
    'Open': np.random.randn(100).cumsum(),
    'High': np.random.randn(100).cumsum() + 1,
    'Low': np.random.randn(100).cumsum() - 1,
    'Close': np.random.randn(100).cumsum()
})
ohlc_data.set_index('Date', inplace=True)

# Draw the candlestick chart
mpf.plot(ohlc_data, type='candle', title='Sample Candlestick Chart')
```
# Heatmaps
```python
import seaborn as sns
import numpy as np
```
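A common financial use of a heatmap is visualizing the correlation matrix of several return series. Below is a minimal sketch on simulated returns; the asset names and data are illustrative.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Simulated daily returns for a few assets (illustrative data)
np.random.seed(0)
asset_returns = pd.DataFrame(np.random.randn(250, 4),
                             columns=['Equities', 'Bonds', 'Commodities', 'FX'])

# Correlation heatmap with annotated coefficients
sns.heatmap(asset_returns.corr(), annot=True, cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Asset Return Correlations')
plt.show()
```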
# Pair Plots
```python
# Sample financial data
df = pd.DataFrame({
    'returns': np.random.randn(100),
    'volatility': np.random.randn(100),
    'dividend_yield': np.random.randn(100)
})

# Pairwise scatter plots and marginal distributions
sns.pairplot(df)
plt.show()
```
Interactive line charts allow users to zoom, pan, and hover over data
points for detailed insights.
```python
import plotly.graph_objects as go

# Interactive line chart of closing prices (zoom, pan, and hover are built in)
fig = go.Figure(data=go.Scatter(x=ohlc_data.index, y=ohlc_data['Close'],
                                mode='lines', name='Close'))
fig.update_layout(title='Closing Prices', xaxis_title='Date', yaxis_title='Price')
fig.show()
```
```python
import plotly.graph_objects as go
fig = go.Figure(data=[go.Candlestick(x=ohlc_data.index,
open=ohlc_data['Open'],
high=ohlc_data['High'],
low=ohlc_data['Low'],
close=ohlc_data['Close'])])
fig.show()
```
# 3D Surface Plots
```python
import plotly.graph_objects as go
import numpy as np
# Sample data
x = np.linspace(-2, 2, 50)
y = np.linspace(-2, 2, 50)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X ** 2 + Y ** 2))

# Plot the interactive surface
fig = go.Figure(data=[go.Surface(x=X, y=Y, z=Z)])
fig.show()
```
# Interactive Dashboards
```python
import dash
import dash_core_components as dcc    # on Dash >= 2.0: from dash import dcc
import dash_html_components as html   # on Dash >= 2.0: from dash import html
from dash.dependencies import Input, Output
import plotly.graph_objects as go
import pandas as pd
import numpy as np

app = dash.Dash(__name__)

# Sample price series to display (illustrative data)
prices = pd.Series(np.random.randn(100).cumsum() + 100,
                   index=pd.date_range('2023-01-01', periods=100))

app.layout = html.Div([
dcc.Graph(id='timeseries-plot'),
dcc.Slider(
id='slider',
min=0,
max=99,
value=50,
marks={i: f'Day {i}' for i in range(0, 100, 10)}
)
])
@app.callback(
Output('timeseries-plot', 'figure'),
[Input('slider', 'value')]
)
def update_figure(selected_day):
filtered_prices = prices[:selected_day]
fig = go.Figure()
fig.add_trace(go.Scatter(x=filtered_prices.index, y=filtered_prices,
mode='lines', name='Stock Prices'))
fig.update_layout(title='Interactive Stock Prices Over Time',
xaxis_title='Date',
yaxis_title='Price')
return fig
if __name__ == '__main__':
app.run_server(debug=True)
```
```python
import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
import plotly.express as px
app = dash.Dash(__name__)

# Sample monthly performance data (illustrative)
performance_data = pd.DataFrame({
    'month': pd.date_range('2023-01-01', periods=12, freq='M'),
    'returns': np.random.randn(12) / 100,
    'volatility': np.abs(np.random.randn(12)) / 100,
    'dividend_yield': np.abs(np.random.randn(12)) / 100
})
app.layout = html.Div([
html.H1('Financial Performance Dashboard'),
dcc.Graph(id='performance-plot'),
dcc.Dropdown(
id='metric-dropdown',
options=[
{'label': 'Returns', 'value': 'returns'},
{'label': 'Volatility', 'value': 'volatility'},
{'label': 'Dividend Yield', 'value': 'dividend_yield'}
],
value='returns'
)
])
@app.callback(
Output('performance-plot', 'figure'),
[Input('metric-dropdown', 'value')]
)
def update_figure(selected_metric):
fig = px.line(performance_data, x='month', y=selected_metric,
title=f'Monthly {selected_metric.capitalize()}')
return fig
if __name__ == '__main__':
app.run_server(debug=True)
```
# The Objective
# Data Collection
# Data Preprocessing
Before we build any models, it's crucial to preprocess the data. This
step includes handling missing values, scaling features, and creating
any necessary derived variables.
```python
# Handling missing values
stock_data = stock_data.dropna()
```
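Scaling the closing prices and splitting them into training and test segments prepares the series for the models below. A minimal sketch follows; the MinMaxScaler choice and the 80/20 split are assumptions, and `closing_prices`, `scaled_prices`, and `train_size` are reused by the later examples.

```python
from sklearn.preprocessing import MinMaxScaler

# Extract the closing price series and scale it to [0, 1]
closing_prices = stock_data['Close']
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_prices = scaler.fit_transform(closing_prices.values.reshape(-1, 1))

# 80/20 train/test split, keeping the time order intact
train_size = int(len(scaled_prices) * 0.8)
```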
```python
import matplotlib.pyplot as plt
```
ARIMA is a popular method for time series forecasting. We'll use the
`statsmodels` library to fit an ARIMA model to our data.
```python
from statsmodels.tsa.arima.model import ARIMA

# Hold out the last 30 observations, then fit an ARIMA(5, 1, 0) model
# (both the split and the order are illustrative choices)
train, test = closing_prices[:-30], closing_prices[-30:]
model_fit = ARIMA(train, order=(5, 1, 0)).fit()
# Making predictions
predictions = model_fit.forecast(steps=len(test))
```
```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Build sliding windows of `time_step` scaled prices; create_dataset is assumed
# to return X with shape (samples, time_step, 1) and y with the next price
time_step = 10
X_train, y_train = create_dataset(scaled_prices[:train_size], time_step)
X_test, y_test = create_dataset(scaled_prices[train_size:], time_step)

# A small LSTM network for one-step-ahead forecasting
model = Sequential([LSTM(50, input_shape=(time_step, 1)), Dense(1)])
model.compile(optimizer='adam', loss='mse')
model.fit(X_train, y_train, epochs=50, batch_size=32,
          validation_data=(X_test, y_test))

# Making predictions
train_predict = model.predict(X_train)
test_predict = model.predict(X_test)
```
```python
from sklearn.metrics import mean_absolute_error, mean_squared_error
# Evaluate forecasts against the held-out test targets
print(f'MAE: {mean_absolute_error(y_test, test_predict)}')
print(f'RMSE: {np.sqrt(mean_squared_error(y_test, test_predict))}')
```
```python
import time

def predict_real_time(model, scaler, time_step, recent_data):
    # Scale the latest window of prices and reshape to (1, time_step, 1) for the LSTM
    X_input = scaler.transform(np.array(recent_data).reshape(-1, 1))
    X_input = X_input.reshape(1, time_step, 1)
    # Make prediction and convert back to price units
    predicted_price = model.predict(X_input)
    predicted_price = scaler.inverse_transform(predicted_price)
    return predicted_price[0][0]

# Example usage: use the last `time_step` closing prices
recent_data = closing_prices[-time_step:].values
predicted_price = predict_real_time(model, scaler, time_step, recent_data)
print(f'Predicted next closing price: {predicted_price}')
```
By following this case study, you have learned how to predict stock
market movements using statistical and machine learning models in
Python. These skills are invaluable for financial analysts and traders,
offering a significant edge in making data-driven investment
decisions.
Key Developments:
1. Automated Machine Learning (AutoML): Tools like AutoML
democratize machine learning by simplifying the model-building
process, enabling non-experts to develop complex models.
2. Explainable AI (XAI): As models become more complex,
understanding their decision-making processes is crucial. XAI
techniques ensure transparency and trust in AI-driven financial
models.
3. Reinforcement Learning: Used extensively in trading strategies,
reinforcement learning optimizes decisions through trial and error,
adapting to market conditions dynamically.
Practical Implementation:
```python
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Fit a regression forest on a prepared feature matrix X and target y (assumed to exist)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
reg = RandomForestRegressor(n_estimators=100, random_state=42).fit(X_train, y_train)
print(f'MAE: {mean_absolute_error(y_test, reg.predict(X_test))}')
```
Key Developments:
1. Alternative Data Sources: Social media, satellite imagery, and
transaction data provide new insights that traditional data cannot
capture.
2. Real-Time Analytics: The ability to process and analyze data in
real time allows for quicker decision-making and the development of
more responsive models.
3. Data Integration: Combining structured and unstructured data
from various sources enhances model accuracy and robustness.
Practical Implementation:
```python
from pyspark.sql import SparkSession
```
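A minimal sketch of how such a pipeline might start is shown below; the file name, columns, and aggregation are illustrative assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Start a local Spark session for large-scale data processing
spark = SparkSession.builder.appName('AltDataAnalytics').getOrCreate()

# Load a large transactions file and compute average spend per merchant category
# (transactions.csv and its columns are assumed for illustration)
transactions = spark.read.csv('transactions.csv', header=True, inferSchema=True)
summary = transactions.groupBy('merchant_category').agg(F.avg('amount').alias('avg_amount'))
summary.show()
```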
# ESG Integration
Key Developments:
1. ESG Data Providers: Companies like MSCI and Sustainalytics
offer comprehensive ESG data, enabling more informed investment
decisions.
2. ESG Scoring Models: Developing robust ESG scoring models
helps quantify the impact of ESG factors on financial performance.
3. Regulatory Compliance: Growing regulations around ESG
disclosures necessitate the integration of these factors into financial
models.
Practical Implementation:
```python
import pandas as pd
```
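As a simple illustration of an ESG scoring model, the sketch below combines environmental, social, and governance pillar scores into a weighted composite; the tickers, scores, and weights are illustrative assumptions.

```python
import pandas as pd

# Illustrative pillar scores (0-100) for a handful of tickers
esg = pd.DataFrame({
    'ticker': ['AAA', 'BBB', 'CCC'],
    'environmental': [72, 55, 88],
    'social': [64, 70, 59],
    'governance': [80, 62, 75]
})

# Weighted composite ESG score (weights are an assumption and sum to 1)
weights = {'environmental': 0.4, 'social': 0.3, 'governance': 0.3}
esg['esg_score'] = sum(esg[pillar] * w for pillar, w in weights.items())
print(esg.sort_values('esg_score', ascending=False))
```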
# Quantum Computing
Key Developments:
1. Quantum Algorithms: Algorithms like the Quantum Approximate
Optimization Algorithm (QAOA) hold promise for solving optimization
problems in finance.
2. Quantum Machine Learning: Combining quantum computing with
machine learning can enhance model training and prediction
capabilities.
3. Industry Partnerships: Companies like IBM and Google are
collaborating with financial institutions to explore quantum computing
applications.
Practical Implementation:
```python
from qiskit import Aer, QuantumCircuit, transpile
from qiskit.visualization import plot_histogram
from qiskit.providers.aer import QasmSimulator