0% found this document useful (0 votes)
94 views3 pages

Regime Detection via Unsupervised Learning

Uploaded by

Aditi Verma
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
94 views3 pages

Regime Detection via Unsupervised Learning

Uploaded by

Aditi Verma
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 3

Task@RozReturns

Problem Statement :: Regime Detection via Unsupervised Learning from Order Book and
Volume Data

You might not know about many technical terms and jargon. Feel free to use chatgpt!

Objective :: Segment the market into distinct behavioral regimes depending on 3 factors:
1.​ Trending vs Mean-reverting
2.​ Volatile vs Stable
3.​ Liquid vs Illiquid
Using unsupervised learning, based on real-time order book and volume features.

Data Link :: Data


depth20 : Top 20 levels of order book data (price, quantity for bid/ask)
aggTrade : Trade volume data (e.g., per second or per event)

Step-by-Step Task Breakdown ::

Note: This is an example breakdown, you can build differently from this or more on
top of this. That would be counted as a plus.

1. Feature Engineering

Extract meaningful, hand-crafted features from each timestamp's data. For example:

Liquidity & Depth Features: Bid/Ask spread = ask_1_price - bid_1_price

Order book imbalance at each level:


imbalance_lvl1 = (bid_qty_1 - ask_qty_1) / (bid_qty_1 + ask_qty_1)
microprice = (bid_1_price * ask_qty_1 + ask_1_price * bid_qty_1) / (bid_qty_1 +
ask_qty_1)
Cumulative depth:
cum_bid_qty = sum(bid_qty_1 to bid_qty_20)
cum_ask_qty = sum(ask_qty_1 to ask_qty_20)

Volatility & Price Action:


Rolling mid-price return: log(mid_t / mid_t-1)
Price volatility: standard deviation of returns in last 10s, 30s

Volume Features:
Volume imbalance (buy volume vs sell volume)
Cumulative volume in last 10s, 30s
VWAP shift (change in VWAP over short windows)

Derived Features:
Sloped depth: quantify how quickly size decays away from top of book
Trade Wipe Level : Average levels wiped in a duration by trades (10 sec, 30sec, etc)

2. Data Normalization

Normalize features (z-score or min-max or any other norm)


Optionally, reduce dimensionality using PCA or any other metric.

3. Clustering

Apply one or more clustering algorithms:

K-means (with elbow plot to choose K)


HDBSCAN (handles noise and non-spherical clusters)
Gaussian Mixture Models (to get soft probabilities)
Compare clustering quality via silhouette score, Davies-Bouldin index or other metrics.
4. Regime Labeling and Analysis

Assign a “market regime” label to each timestamp.


Analyze characteristics of each regime:

Average volatility
Typical spread and liquidity
Price movement directionality

Name and describe each regime (e.g., " Trending & Liquidity & Stable", "Mean Reverting &
Illiquid & Volatile")

5: Visualization (Good Informatic plots)

Plot regime evolution over time (regime label vs time).


Overlay with price charts or volatility to visually validate.
Use t-SNE or UMAP to visualize clusters in 2D space.

6. Regime Change Insights

See if there is any correlation when one state of regime changes to another i.e. what is the
probability a “Trending & Liquidity & Stable” follows “Mean Reverting & Illiquid & Volatile” etc.

Deliverables:

1. Ipython notebook or script with code and results


2. Report : 1-2 pages
Explanation of any custom features, clustering metric, clustering functions apart from
above.
Clustering results
Regime insights
Visualizations

You might also like