Machine Learning For Quants
Contents
1 Introduction
2 Supervised Learning
  2.1 Linear Regression
  2.2 Logistic Regression
  2.3 Support Vector Machines (SVM)
  2.4 Decision Trees
  2.5 K-Nearest Neighbors (KNN)
  2.6 Naive Bayes
  2.7 XGBoost
  2.8 Random Forest
  2.9 Ensemble Methods
    2.9.1 Financial Example
  2.10 Evaluation Metrics
    2.10.1 Regression Metrics
    2.10.2 Classification Metrics
    2.10.3 Financial Example
    2.10.4 Classification Metrics in Layman’s Terms
    2.10.5 Financial Example
3 Unsupervised Learning
  3.1 Clustering
  3.2 Dimensionality Reduction
  3.3 Association Rule Learning
  3.4 Autoencoders
  3.5 Generative Adversarial Networks (GANs)
4 Reinforcement Learning
  4.1 Q-Learning
  4.2 Deep Q Networks (DQN)
  4.3 Double Deep Q Networks (DDQN)
  4.4 Policy Gradient Methods
5 Overfitting
  5.1 Causes of Overfitting
  5.2 Regularization
    5.2.1 Regularization Techniques
  5.3 Cross-Validation
  5.4 Pruning
6 Feature Engineering
  6.1 Feature Selection
  6.2 Feature Extraction
  6.3 Feature Scaling
  6.4 Feature Encoding
  6.5 Domain Knowledge in Feature Engineering
1 Introduction
Machine learning (ML) is a field of computer science that allows computers to learn
from data without being explicitly programmed. In the world of finance and quan-
titative analysis (quants), ML techniques are being used increasingly to predict stock
prices, manage risks, and detect fraudulent activities.
For quants, machine learning isn’t just a novel tool—it’s rapidly becoming a ne-
cessity. The ability to process vast datasets, adapt to changing conditions, recognize
complex patterns, and automate tasks makes ML indispensable in the modern financial
landscape. As technology and data continue to grow, the symbiosis between quanti-
tative analysis and machine learning will only deepen, driving innovations and new
strategies in the world of finance.
In the coming sections, we’ll look at the different types of machine learning and the kinds of models we generally encounter in finance.
2 Supervised Learning
Supervised Learning is akin to learning with a tutor. Imagine you’re trying to under-
stand a complex topic, and every time you answer a question, the tutor tells you if
you’re right or wrong. Over time, you adjust your thinking based on this feedback,
improving your understanding.
In finance, supervised learning is used when we have past data with known out-
comes. For instance, using past stock prices to predict future ones. The ’tutor’ in this
case is the historical data – it provides the model with the ’right answers’ so it can adjust
and improve.
Example: A quant analyst has 5 years of daily stock prices and wants to predict
tomorrow’s price. Using supervised learning, the model learns from past price move-
ments and the factors influencing them. Once trained, this model can then predict
future prices, helping the analyst make investment decisions.
2.1 Linear Regression
Imagine plotting people’s heights against their shoe sizes and drawing a single straight line that comes as close to all the points as possible. This line won’t go through every point perfectly, but it gives a good general
idea of the relationship between height and shoe size. If you know someone’s height,
you can use this line to make a good guess about their shoe size.
This is the essence of Linear Regression. It’s about finding the best straight line (or
in more complex scenarios, a plane) that describes the relationship between variables.
Financial Example: Consider the stock market. If you plot a company’s stock price
against the performance of the broader market over time, you might notice a trend. Per-
haps when the market performs well, the stock price tends to rise, and when the market
dips, the stock tends to fall. Linear Regression can help quantify this relationship. By
drawing the best-fitting line through the data points, analysts can predict how the stock
might perform based on broader market movements. For instance, if the model indi-
cates a strong positive relationship and the market is expected to rise tomorrow, there’s
a good chance that the stock will rise as well.
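To make this concrete, here is a minimal sketch (not from the original text) that fits such a line with scikit-learn. The return series are synthetic, and the 1.2 sensitivity baked into the fake data is an arbitrary assumption for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic daily returns; in practice these would come from real price data.
rng = np.random.default_rng(0)
market_returns = rng.normal(0.0005, 0.01, 500)                      # broader market
stock_returns = 1.2 * market_returns + rng.normal(0, 0.005, 500)    # a related stock

# Fit the best straight line: stock return as a function of market return.
model = LinearRegression()
model.fit(market_returns.reshape(-1, 1), stock_returns)
print("slope (sensitivity to the market):", model.coef_[0])
print("intercept:", model.intercept_)

# Use the line to guess the stock's move for a hypothetical +1% market day.
print("predicted stock return:", model.predict([[0.01]])[0])

If the fitted slope is strongly positive, an expected rise in the market translates into an expected rise in the stock, exactly as described above.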
2.3 Support Vector Machines (SVM)
Financial Example: Imagine a fund manager wants to classify stocks into two cat-
egories: “buy” and “not buy”. They have data on various features of these stocks, such
as their price-to-earnings ratio, historical volatility, and market sentiment scores. The
SVM will analyze this data and find the best decision boundary that separates the “buy”
stocks from the “not buy” stocks. If a new stock is introduced, the SVM can quickly
determine which side of the boundary it falls on, aiding the fund manager in their
decision-making process.
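Below is a small sketch of this workflow with scikit-learn’s SVC. The feature values, labels, and the choice of an RBF kernel are all assumptions made up for the example rather than anything prescribed in the text.

import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Hypothetical features per stock: [P/E ratio, historical volatility, sentiment score]
X = np.array([
    [12.0, 0.18, 0.7],
    [35.0, 0.45, 0.2],
    [ 9.5, 0.22, 0.8],
    [48.0, 0.60, 0.1],
    [15.0, 0.25, 0.6],
    [40.0, 0.50, 0.3],
])
y = np.array([1, 0, 1, 0, 1, 0])  # 1 = "buy", 0 = "not buy" (made-up labels)

# SVMs are sensitive to feature scale, so standardize before fitting.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)

# A new stock is classified by the side of the decision boundary it falls on.
new_stock = np.array([[14.0, 0.20, 0.65]])
print("buy" if clf.predict(new_stock)[0] == 1 else "not buy")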
2.7 XGBoost
Imagine a class where each student is trying to solve a complex math problem. One
student tries and gets partly there but makes some errors. The next student builds
on the first’s work, correcting some mistakes, and getting closer to the solution. This
process continues, with each student building and improving on the previous student’s
work. XGBoost operates in a similar manner, building one tree at a time, each new tree
focusing on the errors made by the previous ones.
Financial Example: When forecasting stock prices using multiple factors, XGBoost
captures intricate relationships by combining many individual decision trees’ predictive
power, making it a powerful tool for financial predictions.
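A minimal sketch with the xgboost library is shown below. The five factors and the non-linear relationship generating the synthetic returns are invented for illustration; the hyperparameters are common defaults, not a recommendation.

import numpy as np
from xgboost import XGBRegressor

# Synthetic data: each row is a day, each column a hypothetical pricing factor.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
# Next-day return as a noisy, non-linear function of the factors (purely illustrative).
y = 0.3 * X[:, 0] - 0.2 * X[:, 1] ** 2 + 0.1 * X[:, 2] * X[:, 3] + rng.normal(0, 0.1, 1000)

# Each new tree is fit to the errors left over by the trees before it.
model = XGBRegressor(n_estimators=200, max_depth=3, learning_rate=0.05)
model.fit(X[:800], y[:800])

preds = model.predict(X[800:])
print("mean absolute error on held-out days:", np.mean(np.abs(preds - y[800:])))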
2.9 Ensemble Methods
Ensemble methods combine several models so that their joint prediction is more robust than any single one. Two common approaches:
• Boosting: Here, models are trained sequentially, with each new model focusing
on the mistakes made by the previous ones. The predictions from all models are
then weighted and combined for the final output.
• Stacking: In stacking, multiple models are trained on the data, and their predic-
tions are used as inputs to another model (called a meta-model) that makes the
final prediction.
2.9.1 Financial Example
Consider a bank assessing loan applications. It might train several different models on applicant data, each capturing a different aspect of creditworthiness. By combining
these models’ predictions using techniques like Bagging or Boosting, the bank can
ensure a more holistic assessment of the applicant’s risk profile, leading to better loan
approval decisions.
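As a sketch of how such a combination might look in code, the snippet below stacks three simple classifiers with scikit-learn. The synthetic applicant features and the particular base models are assumptions for illustration, not the bank’s actual setup.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-ins for applicant features (income, debt ratio, credit history, ...).
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)  # y: 1 = default
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Several base models judge the applicant; a meta-model combines their opinions.
ensemble = StackingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=4)),
        ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("logit", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=LogisticRegression(),
)
ensemble.fit(X_train, y_train)
print("accuracy of the combined model:", ensemble.score(X_test, y_test))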
2.10 Evaluation Metrics
2.10.1 Regression Metrics
• Mean Absolute Error (MAE): Represents the average of the absolute differences between the predicted and actual values. It provides a straightforward measure of prediction error.
• Mean Squared Error (MSE): Like MAE, but squares the differences before aver-
aging them. It penalizes larger errors more severely than smaller ones.
2.10.2 Classification Metrics
• Precision: Represents the number of true positive predictions divided by the total number of positive predictions (including both true positives and false positives).
• Recall (or Sensitivity): Represents the number of true positive predictions di-
vided by the total actual positives.
• AUC (Area Under the ROC Curve): Quantifies the overall ability of the model to
discriminate between positive and negative classes. An AUC of 1 indicates perfect
classification, while an AUC of 0.5 indicates no discrimination capability.
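These classification metrics can be computed directly with scikit-learn. In the sketch below the "did the stock rise?" labels, hard predictions, and probabilities are all made up to keep the example self-contained.

from sklearn.metrics import precision_score, recall_score, roc_auc_score

# Hypothetical "will the stock rise tomorrow?" outcomes and model outputs.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]                   # what actually happened
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]                   # the model's hard yes/no calls
y_prob = [0.9, 0.2, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1]   # the model's probabilities

print("precision:", precision_score(y_true, y_pred))  # of the predicted rises, how many were real
print("recall:   ", recall_score(y_true, y_pred))     # of the real rises, how many were caught
print("AUC:      ", roc_auc_score(y_true, y_prob))    # ranking quality across all thresholds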
2.10.3 Financial Example
In the realm of finance, consider an analyst who develops a model to predict stock prices
for the next day. After deploying this model for a month, the analyst wants to gauge
its performance. Using the Mean Absolute Error, the analyst finds that, on average,
the model’s predictions are off by $5. In other words, if the model predicts a stock price of $100, the actual price typically lands within about $5 of that, somewhere around $95 to $105. This metric helps
the analyst understand the model’s reliability and adjust their investment strategies
accordingly.
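The calculation behind that kind of figure is just an average of absolute errors; a tiny sketch with made-up prices:

import numpy as np

# One week of hypothetical predicted vs. actual closing prices.
predicted = np.array([100.0, 102.5, 101.0,  99.5, 103.0])
actual    = np.array([104.0,  98.0, 105.0,  97.0, 101.5])

mae = np.mean(np.abs(predicted - actual))
print(f"on average the forecasts are off by about ${mae:.2f}")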
3 Unsupervised Learning
Unsupervised Learning is like exploring a new city without a map. You wander around,
noticing patterns, like which areas have more restaurants or parks. No one’s guiding
you; you’re figuring things out on your own based on observations.
In the financial realm, unsupervised learning helps when we have tons of data, but
no specific outcomes or labels to guide us. It’s used to uncover hidden patterns or
groupings.
Example: Consider an analyst looking at trading data for thousands of stocks. With
unsupervised learning, they might uncover groups of stocks that move similarly, per-
haps because they’re in the same industry or affected by similar economic factors.
3.1 Clustering
Clustering is akin to organizing a wardrobe. You naturally group similar items together:
shirts with shirts, pants with pants, based on features like color, type, or material. In
machine learning, clustering is about grouping data points that are similar to each other
without any predefined labels.
Financial Example: An asset manager might use clustering to categorize stocks
with similar price movements. This can help in portfolio diversification by ensuring
investments are spread across different clusters, thereby reducing risk.
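A brief sketch of this idea with k-means clustering is shown below. The return matrix is synthetic, and splitting 20 stocks into 2 clusters is an arbitrary choice for the example.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic daily returns for 20 stocks over 250 days (one row per stock).
rng = np.random.default_rng(1)
common_a = rng.normal(0, 0.01, 250)
common_b = rng.normal(0, 0.01, 250)
returns = rng.normal(0, 0.005, size=(20, 250))
returns[:10] += common_a      # first 10 stocks share one driver (e.g., same sector)
returns[10:] += common_b      # last 10 share a different one

# Group stocks whose return histories look alike.
features = StandardScaler().fit_transform(returns)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print("cluster assignment per stock:", labels)

Stocks in the same cluster tend to move together, so spreading a portfolio across clusters is one way to diversify.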
3.4 Autoencoders
Imagine taking a detailed painting and trying to describe it in a few words, then using
those words to recreate the painting. You might lose some details, but the essence is
captured. Autoencoders in machine learning work similarly. They compress data into a
simpler form and then try to reconstruct it.
Financial Example: Autoencoders can be used for anomaly detection in trading
data. By training on normal trading data, the model can then detect anomalies or
potential fraud by spotting data that doesn’t fit the typical pattern.
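A compact sketch of this pattern using Keras is given below; it assumes TensorFlow is available, uses only synthetic "normal" trades with ten numeric features, and treats heavily distorted records as suspicious. It illustrates the compress-then-reconstruct idea rather than a production fraud model.

import numpy as np
from tensorflow import keras

# Synthetic "normal" trading records: 10 numeric features per trade.
rng = np.random.default_rng(0)
normal_trades = rng.normal(0, 1, size=(5000, 10)).astype("float32")

# Compress 10 features down to 3, then try to rebuild the original 10.
autoencoder = keras.Sequential([
    keras.layers.Input(shape=(10,)),
    keras.layers.Dense(6, activation="relu"),
    keras.layers.Dense(3, activation="relu"),   # the short "description" of the trade
    keras.layers.Dense(6, activation="relu"),
    keras.layers.Dense(10),                     # the reconstruction
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(normal_trades, normal_trades, epochs=5, batch_size=64, verbose=0)

# Records the model reconstructs poorly don't fit the learned "normal" pattern.
odd_trades = rng.normal(0, 4, size=(5, 10)).astype("float32")
errors = np.mean((autoencoder.predict(odd_trades, verbose=0) - odd_trades) ** 2, axis=1)
print("reconstruction errors (larger = more anomalous):", errors)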
4 Reinforcement Learning
Reinforcement Learning is like teaching a dog new tricks. The dog doesn’t know how
to fetch initially, but after repeated attempts and rewards (like treats), it learns the
desired behavior.
In a financial context, reinforcement learning trains models through trial and error,
rewarding them for good decisions and punishing them for bad ones. It’s particularly
useful for areas where immediate feedback is available.
Example: A hedge fund creates a trading bot. Initially, it makes random trades.
However, with reinforcement learning, it gets a ’reward’ for profitable trades and a
’penalty’ for unprofitable ones. Over time, it refines its strategy to maximize profits
based on this feedback.
4.1 Q-Learning
Q-Learning is like a child learning to navigate a new playground. The child tries dif-
ferent play equipment (slides, swings, monkey bars) and remembers how much fun
each one was. Over time, the child learns to spend more time on the most enjoyable
equipment.
Financial Example: In algorithmic trading, Q-learning can be used to determine
the best trading strategy by trying out various strategies and remembering the reward
(profit) from each. Over time, the algorithm focuses more on the most profitable strate-
gies.
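The core of Q-learning is a table of values updated from experience. The sketch below uses a toy, made-up environment where states are coarse market regimes and the only actions are "hold" or "buy"; the reward function and transition dynamics are invented purely to show the update rule.

import numpy as np

# Toy setup: states 0 = falling, 1 = flat, 2 = rising market; actions 0 = hold, 1 = buy.
rng = np.random.default_rng(0)
n_states, n_actions = 3, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1       # learning rate, discount, exploration rate

def step(state, action):
    """Hypothetical environment: buying in a rising market pays off on average."""
    reward = rng.normal(1.0 if (action == 1 and state == 2) else -0.1, 0.5)
    return int(rng.integers(n_states)), reward   # next regime chosen at random here

state = int(rng.integers(n_states))
for _ in range(5000):
    # Explore occasionally; otherwise act greedily on current Q estimates.
    action = int(rng.integers(n_actions)) if rng.random() < epsilon else int(np.argmax(Q[state]))
    next_state, reward = step(state, action)
    # Q-learning update: move toward reward + discounted best future value.
    Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
    state = next_state

print(Q)   # learned value of hold/buy in each market regime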
4.3 Double Deep Q Networks (DDQN)
Imagine a student preparing for an exam using two textbooks. One book is used to
learn and study the material (let’s call it the ”learning book”), while the other is used
only to test the knowledge by solving its questions (the ”testing book”). By separating
the learning and testing processes, the student avoids potential biases and overestima-
tions that might arise if they relied too heavily on one source. In a similar vein, DDQN
improves upon DQN by using two separate neural networks: one to select the best ac-
tion (like the learning book) and another to evaluate that action’s value (like the testing
book). This separation helps in mitigating overestimation of Q-values, a common issue
in DQN.
Financial Example: In stock trading, a DDQN could be used to manage a portfo-
lio. One network might suggest reallocating funds from one stock to another, while
the second network evaluates the potential return of that reallocation. By having this
dual-check mechanism, the DDQN can make more balanced and less over-optimistic
investment decisions.
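The key mechanical difference from DQN is how the learning target is formed: one network picks the action, the other scores it. The PyTorch fragment below sketches just that target computation on a batch of random "portfolio state" vectors; the network sizes and the data are assumptions for illustration, not a full trading agent.

import torch

# Two copies of the same architecture: an online net (learns) and a target net (evaluates).
def make_net(n_features=8, n_actions=3):
    return torch.nn.Sequential(
        torch.nn.Linear(n_features, 32), torch.nn.ReLU(),
        torch.nn.Linear(32, n_actions),
    )

online_net, target_net = make_net(), make_net()
target_net.load_state_dict(online_net.state_dict())   # start them identical

gamma = 0.99
next_states = torch.randn(16, 8)     # a batch of hypothetical portfolio states
rewards = torch.randn(16)            # hypothetical one-step returns

with torch.no_grad():
    # DDQN: the online network *chooses* the next action...
    best_actions = online_net(next_states).argmax(dim=1, keepdim=True)
    # ...but the target network *evaluates* how good that choice is.
    evaluated = target_net(next_states).gather(1, best_actions).squeeze(1)
    ddqn_target = rewards + gamma * evaluated

print(ddqn_target.shape)   # one target Q-value per sample; used to train the online net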
5 Overfitting
Imagine studying for an exam by memorizing every single word in the textbook, in-
cluding footnotes and page numbers. Come exam day, you find the questions are more
about understanding concepts than regurgitating facts. You’ve over-prepared in the
wrong way. This is akin to overfitting.
In finance, overfitting occurs when a model is too tailored to past data, capturing
every tiny detail, including noise or anomalies. Such a model performs poorly in real-
world scenarios.
Example: A trading strategy might show spectacular returns when back-tested on
historical data because it’s overly adjusted to that specific data. But when applied to
today’s market, it fails, because it’s too aligned with past quirks that aren’t relevant
now.
5.1 Causes of Overfitting
• Not considering the randomness or noise in the data.
5.2 Regularization
Regularization is like adding a guiding hand when memorizing for an exam, ensuring
you focus more on understanding key concepts than on rote memorization of every
detail. It introduces a penalty on overly complex models, helping to prevent overfitting.
Financial Example: In portfolio optimization, regularization might prevent putting
too much weight on a single stock or a small group of stocks, ensuring a more diversified
and balanced portfolio.
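As a small sketch of the penalty idea, the snippet below compares an unpenalized linear fit with a ridge-regularized one on synthetic factor data where only two of twenty candidate factors actually matter. The data and the alpha value are arbitrary; the point is that the penalty shrinks extreme weights.

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic factor returns: only the first two of 20 candidate factors drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(0, 0.5, 200)

plain = LinearRegression().fit(X, y)
regularized = Ridge(alpha=10.0).fit(X, y)   # the penalty discourages large weights

print("largest unregularized weight:", np.abs(plain.coef_).max())
print("largest regularized weight:  ", np.abs(regularized.coef_).max())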
5.3 Cross-Validation
Cross-validation is akin to taking multiple mock exams before the real one. Instead of
relying on one set of questions (or data) to prepare, you test yourself on various sets to
ensure a comprehensive understanding.
Financial Example: Before finalizing a trading strategy, a quant might test it on
different periods of historical data (not just one) to ensure its robustness across varying
market conditions.
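A minimal sketch of that idea with scikit-learn’s TimeSeriesSplit follows; the features and returns are synthetic, and a plain linear model stands in for whatever strategy model is being validated.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Synthetic daily features and next-day returns.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = X @ np.array([0.2, -0.1, 0.05, 0.0]) + rng.normal(0, 0.5, 1000)

# Each fold trains on an earlier stretch of history and tests on a later one,
# i.e., the model is judged on several different periods, not just one.
cv = TimeSeriesSplit(n_splits=5)
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="neg_mean_absolute_error")
print("MAE per fold:", -scores)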
5.4 Pruning
Pruning, in the context of decision trees, is like trimming excess branches from a tree
to ensure healthy growth. In machine learning, it means simplifying a model by re-
moving sections of it (like certain nodes in a decision tree) that provide little power in
prediction.
Financial Example: When using decision trees to predict stock movements, pruning
helps in removing nodes that consider irrelevant factors, leading to a more concise and
effective prediction model.
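The sketch below shows one way to prune with scikit-learn, via cost-complexity pruning (the ccp_alpha parameter); the data are synthetic and the alpha value is arbitrary, chosen only to make the before/after difference visible.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic features (think lagged returns and indicators) and next-day moves.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = 0.4 * X[:, 0] + rng.normal(0, 0.3, 500)

unpruned = DecisionTreeRegressor(random_state=0).fit(X[:400], y[:400])
# ccp_alpha trims branches whose contribution doesn't justify their complexity.
pruned = DecisionTreeRegressor(random_state=0, ccp_alpha=0.01).fit(X[:400], y[:400])

print("leaves before pruning:", unpruned.get_n_leaves())
print("leaves after pruning: ", pruned.get_n_leaves())
print("held-out R^2 of the pruned tree:", pruned.score(X[400:], y[400:]))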
6 Feature Engineering
Think of feature engineering as selecting the right ingredients for a recipe. Instead of
throwing everything into the pot, you choose specific items that complement each other
and enhance the final dish.
For quants, feature engineering is about selecting or transforming the most relevant
pieces of data to improve predictions. Instead of using all available data, they pinpoint
the most insightful variables.
Example: To predict a stock’s movement, instead of just looking at its price, an
analyst might also consider other factors like trading volume, company earnings, or
economic indicators. By carefully choosing these ’features’, they can build a more ef-
fective prediction model.
Financial Example: Instead of using a company’s monthly sales and expenses as
separate features, an analyst might combine them to create a new feature: ”profit
margin.”
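In code this kind of feature construction is a one-liner; the sketch below uses pandas with made-up monthly figures and the hypothetical column names sales and expenses.

import pandas as pd

# Hypothetical monthly figures for one company.
df = pd.DataFrame({
    "sales":    [120.0, 135.0, 128.0, 150.0],
    "expenses": [ 95.0, 110.0, 100.0, 118.0],
})

# Combine two raw columns into a single, more informative feature.
df["profit_margin"] = (df["sales"] - df["expenses"]) / df["sales"]
print(df)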
Thank You