Assignment questions
Dataset: You are provided with a dataset containing historical data of a futures contract, including:
OHLCV Data: Open, High, Low, Close prices, and Volume for each time period (e.g., daily).
Time and Sales Data: Detailed information about individual trades, including price,
volume, and timestamp.
Objective: Develop machine learning models to analyze market regimes (distinct patterns or states
in the market) using the provided data.
Using the OHLCV data, build a decision tree model to classify market regimes into three categories:
"Trending Up": Characterized by consistently higher highs and higher lows, potentially with
increasing volume.
"Trending Down": Characterized by consistently lower highs and lower lows, potentially
with increasing volume.
Explain your feature selection process, how you would handle overfitting, and how you would
evaluate the performance of your model.
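As a starting point for the labels and the evaluation split, the sketch below turns the "higher highs and higher lows" wording into a rule over a rolling window. The function names, the window length of 3, and the sample bars are all illustrative assumptions. Bars that match neither trend are labelled `None`, a placeholder for the remaining category. A time-ordered split is shown because shuffling time-series rows would leak future information into training.

```python
from typing import List, Optional, Tuple

# Assumed bar layout: (open, high, low, close, volume)
Bar = Tuple[float, float, float, float, float]


def _strictly_rising(values: List[float]) -> bool:
    return all(b > a for a, b in zip(values, values[1:]))


def _strictly_falling(values: List[float]) -> bool:
    return all(b < a for a, b in zip(values, values[1:]))


def label_regime(bars: List[Bar], window: int = 3) -> List[Optional[str]]:
    """Label each bar using the `window` bars that end at it."""
    labels: List[Optional[str]] = []
    for i in range(len(bars)):
        if i + 1 < window:
            labels.append(None)  # not enough history yet
            continue
        chunk = bars[i + 1 - window : i + 1]
        highs = [b[1] for b in chunk]
        lows = [b[2] for b in chunk]
        if _strictly_rising(highs) and _strictly_rising(lows):
            labels.append("Trending Up")
        elif _strictly_falling(highs) and _strictly_falling(lows):
            labels.append("Trending Down")
        else:
            labels.append(None)  # placeholder for the remaining category
    return labels


def chronological_split(rows: list, train_fraction: float = 0.7):
    """Split rows in time order: earliest part for training, the rest for testing."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Made-up sample bars, for illustration only.
bars = [
    (100.0, 101.0, 99.0, 100.5, 1000),
    (100.5, 102.0, 100.0, 101.5, 1100),
    (101.5, 103.0, 101.0, 102.5, 1200),
    (102.5, 102.8, 100.5, 101.0, 1300),
    (101.0, 101.5, 99.5, 100.0, 1400),
    (100.0, 100.5, 98.5, 99.0, 1500),
]
labels = label_regime(bars, window=3)
train, test = chronological_split(bars)
```

Labels produced this way could serve as the target for a decision tree, with depth and minimum-leaf-size limits as the usual first defence against overfitting.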
Using the OHLCV data, apply a clustering algorithm (e.g., K-means) to group similar trading days
together.
Analyze the characteristics of each cluster (e.g., average price volatility, average volume).
Explain how this clustering analysis could be used to gain insights into different market
conditions.
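The clustering step can be sketched with a small, dependency-free K-means. Everything here is an assumption made for illustration: the two per-day features (intraday range as a fraction of the close, and volume), the choice of k = 2, the sample days, and the deliberately simple initialization that uses the first k points as starting centroids. A library implementation with a better initialization would normally be preferred.

```python
import math
from typing import List, Optional


def zscore_columns(rows: List[List[float]]) -> List[List[float]]:
    """Standardize each column so that volume does not dominate the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def kmeans(points: List[List[float]], k: int, iterations: int = 50) -> List[int]:
    """Plain Lloyd's algorithm; returns the cluster index of each point."""
    centroids = [list(p) for p in points[:k]]  # simplistic deterministic init
    assignment: Optional[List[int]] = None
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, a in zip(points, new_assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_assignment == assignment:
            break  # assignments stopped changing
        assignment = new_assignment
    return new_assignment


def cluster_means(rows: List[List[float]], labels: List[int], k: int):
    """Average of the raw (unscaled) features inside each cluster."""
    out = []
    for c in range(k):
        members = [r for r, a in zip(rows, labels) if a == c]
        out.append([sum(col) / len(members) for col in zip(*members)] if members else [])
    return out


# Made-up daily bars: (open, high, low, close, volume)
days = [
    (100.0, 101.0, 99.5, 100.5, 1000),
    (100.5, 106.0, 98.0, 99.0, 5000),
    (99.0, 100.2, 98.8, 100.0, 1100),
    (100.0, 104.0, 96.0, 103.0, 5200),
    (103.0, 103.8, 102.5, 103.2, 900),
    (103.2, 108.0, 100.0, 101.0, 4800),
]
features = [[(h - l) / c, float(v)] for (_, h, l, c, v) in days]
cluster_labels = kmeans(zscore_columns(features), k=2)
summary = cluster_means(features, cluster_labels, k=2)  # [avg range, avg volume]
```

Comparing the rows of `summary` is one way to read each cluster, for example as a quiet, low-volume group versus a wide-range, high-volume group.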
Part 3: Hidden Markov Model for Regime Prediction (dataset provided: 5-minute OHLCV data)
Using the OHLCV data (and potentially Time and Sales data if you see fit), develop a Hidden Markov
Model (HMM) to:
Use the trained HMM to predict the most likely market regime for future time periods.
Explain your choice of hidden states, how you would estimate the model parameters, and how you
would evaluate the predictive accuracy of your HMM.
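To make the prediction step concrete, the sketch below runs the normalized forward recursion of a discrete HMM and then pushes the resulting belief one step through the transition matrix. It assumes two hidden states and a two-symbol observation (0 = down bar, 1 = up bar). The probabilities are hand-set for illustration rather than estimated. In practice they would typically be fitted with Baum-Welch (EM) on the training data.

```python
from typing import List


def forward_filter(
    obs: List[int],
    start: List[float],
    trans: List[List[float]],
    emit: List[List[float]],
) -> List[float]:
    """Return P(state at last step | all observations so far)."""
    n = len(start)
    belief = [start[s] * emit[s][obs[0]] for s in range(n)]
    total = sum(belief)
    belief = [b / total for b in belief]
    for o in obs[1:]:
        # Propagate through the transition matrix, then weight by the emission.
        predicted = [sum(belief[i] * trans[i][j] for i in range(n)) for j in range(n)]
        belief = [predicted[j] * emit[j][o] for j in range(n)]
        total = sum(belief)
        belief = [b / total for b in belief]
    return belief


def predict_next_state(belief: List[float], trans: List[List[float]]) -> List[float]:
    """One-step-ahead state distribution: belief multiplied by the transition matrix."""
    n = len(belief)
    return [sum(belief[i] * trans[i][j] for i in range(n)) for j in range(n)]


# Hypothetical parameters: state 0 favours up bars, state 1 favours down bars.
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.2, 0.8], [0.8, 0.2]]  # emit[state][observation]

observations = [1, 1, 1]  # three up bars in a row
belief_now = forward_filter(observations, start, trans, emit)
next_dist = predict_next_state(belief_now, trans)
most_likely_next = max(range(len(next_dist)), key=lambda s: next_dist[s])
```

Predictive accuracy could then be assessed out of sample, for instance by comparing `most_likely_next` with the regime decoded once the next bar is known.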
Bonus:
How could you incorporate Time and Sales data into your HMM to potentially improve its
predictive power?
Discuss the limitations of your models and potential areas for improvement.
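For the Time and Sales question, one possible approach is to aggregate individual trades into per-bar features that can sit alongside the OHLCV inputs. The sketch below is an assumption-laden example, not a prescribed method. It assumes trades arrive as `(timestamp_seconds, price, volume)` tuples sorted by time and with positive volume. It uses the tick rule (a trade at a higher price than the previous one counts as a buy, lower as a sell, unchanged inherits the previous direction) to approximate signed volume, and it groups trades into 300-second bars to match the 5-minute data.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Trade = Tuple[int, float, float]  # (timestamp in seconds, price, volume)


def bar_features(trades: List[Trade], bar_seconds: int = 300) -> Dict[int, Dict[str, float]]:
    """Aggregate trades into bars keyed by bar start time."""
    acc = defaultdict(
        lambda: {"pv": 0.0, "volume": 0.0, "trade_count": 0, "signed_volume": 0.0}
    )
    last_price = None
    direction = 0  # +1 buy, -1 sell, 0 unknown (before the first price change)
    for ts, price, volume in trades:
        if last_price is not None and price != last_price:
            direction = 1 if price > last_price else -1
        last_price = price
        bar = acc[ts - ts % bar_seconds]
        bar["pv"] += price * volume
        bar["volume"] += volume
        bar["trade_count"] += 1
        bar["signed_volume"] += direction * volume
    return {
        bar_start: {
            "vwap": b["pv"] / b["volume"],
            "volume": b["volume"],
            "trade_count": b["trade_count"],
            "signed_volume": b["signed_volume"],
        }
        for bar_start, b in sorted(acc.items())
    }


# Made-up trades, for illustration only.
trades = [
    (0, 100.0, 10),
    (60, 100.5, 20),
    (120, 100.5, 5),
    (310, 100.0, 30),
    (400, 99.5, 10),
]
features_by_bar = bar_features(trades)
```

Features such as `signed_volume` or `trade_count` could be added to the HMM's observation vector, at the cost of more parameters to estimate.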
# Rates Prediction
Dataset: You are provided with a dataset containing historical settlement prices of futures
contracts (from Dec24 to Dec28).
Objective: Develop an ML model that predicts next-day settlement prices based on the given
dataset.
For example, below is the correlation and ratio matrix for contracts from Dec24 to Jun29 using
previous 15 days data.
Note: Your model must choose the number of days used to calculate the ratio and the
correlation (the same number of days must be used for both) so as to optimize the model.
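The lookback calculation can be sketched as follows, with the same window `n` applied to both statistics. Two things here are assumptions: "ratio" is taken to mean the average of the daily price ratio between two contracts over the window, which the text does not define, and the price series are made up. `Dec25` is used only as an example of a second contract.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; NaN if either series is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(var_x * var_y)
    return cov / denom if denom else float("nan")


def pair_stats(prices_a: List[float], prices_b: List[float], n: int) -> Tuple[float, float]:
    """Correlation and mean price ratio over the last n days (same n for both)."""
    if len(prices_a) < n or len(prices_b) < n:
        raise ValueError("not enough history for the requested window")
    a, b = prices_a[-n:], prices_b[-n:]
    corr = pearson(a, b)
    ratio = sum(pa / pb for pa, pb in zip(a, b)) / n  # assumed definition of "ratio"
    return corr, ratio


def stats_matrix(series: Dict[str, List[float]], n: int):
    """(corr, ratio) for every ordered pair of contracts."""
    names = list(series)
    return {(a, b): pair_stats(series[a], series[b], n) for a in names for b in names}


# Made-up settlement prices, for illustration only.
settlements = {
    "Dec24": [95.0, 95.1, 95.2, 95.3, 95.4],
    "Dec25": [96.0, 96.2, 96.4, 96.6, 96.8],
}
matrix = stats_matrix(settlements, n=5)
corr, ratio = matrix[("Dec24", "Dec25")]
```

Treating `n` as a hyperparameter, one option is to recompute the matrix for several candidate windows and keep the one whose downstream predictions score best on held-out days.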
Part 2:
Calculate the minimum number of next-day contracts to speculate on that allows the prices of
all next-day contracts to be predicted with maximum accuracy relative to the actual next-day data.
Maximum accuracy of prediction is defined as the lowest mean squared error between the predicted
and actual next-day rates.
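The accuracy criterion can be written down directly. The sketch below computes the mean squared error between predicted and actual next-day rates and picks the candidate with the lowest error. The candidate names and all numbers are hypothetical.

```python
from typing import Dict, List


def mean_squared_error(predicted: List[float], actual: List[float]) -> float:
    """Average of squared differences between predicted and actual values."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)


def best_by_mse(candidates: Dict[str, List[float]], actual: List[float]) -> str:
    """Name of the candidate prediction set with the lowest MSE."""
    return min(candidates, key=lambda name: mean_squared_error(candidates[name], actual))


# Made-up next-day rates for two contracts, and two hypothetical candidate predictions.
actual_rates = [95.5, 96.0]
candidates = {
    "candidate_a": [95.0, 96.0],
    "candidate_b": [95.5, 97.0],
}
error_a = mean_squared_error(candidates["candidate_a"], actual_rates)
winner = best_by_mse(candidates, actual_rates)
```

The same comparison could be used to rank different subsets of contracts, or different lookback windows, on held-out days.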