
Machine Learning for Quants: A Primer

Amit Kumar Jha, UBS

Contents
1 Introduction

2 Supervised Learning
2.1 Linear Regression
2.2 Logistic Regression
2.3 Support Vector Machines (SVM)
2.4 Decision Trees
2.5 K-Nearest Neighbors (KNN)
2.6 Naive Bayes
2.7 XGBoost
2.8 Random Forest
2.9 Ensemble Methods
2.9.1 Financial Example
2.10 Evaluation Metrics
2.10.1 Regression Metrics
2.10.2 Classification Metrics
2.10.3 Financial Example
2.10.4 Classification Metrics in Layman’s Terms
2.10.5 Financial Example

3 Unsupervised Learning
3.1 Clustering
3.2 Dimensionality Reduction
3.3 Association Rule Learning
3.4 Autoencoders
3.5 Generative Adversarial Networks (GANs)

4 Reinforcement Learning
4.1 Q-Learning
4.2 Deep Q Networks (DQN)
4.3 Double Deep Q Networks (DDQN)
4.4 Policy Gradient Methods

5 Overfitting
5.1 Causes of Overfitting
5.2 Regularization
5.2.1 Regularization Techniques
5.3 Cross-Validation
5.4 Pruning

6 Feature Engineering
6.1 Feature Selection
6.2 Feature Extraction
6.3 Feature Scaling
6.4 Feature Encoding
6.5 Domain Knowledge in Feature Engineering

1 Introduction
Machine learning (ML) is a field of computer science that allows computers to learn
from data without being explicitly programmed. In the world of finance and quan-
titative analysis (quants), ML techniques are being used increasingly to predict stock
prices, manage risks, and detect fraudulent activities.
For quants, machine learning isn’t just a novel tool—it’s rapidly becoming a ne-
cessity. The ability to process vast datasets, adapt to changing conditions, recognize
complex patterns, and automate tasks makes ML indispensable in the modern financial
landscape. As technology and data continue to grow, the symbiosis between quanti-
tative analysis and machine learning will only deepen, driving innovations and new
strategies in the world of finance.
In the coming sections, we’ll look at the main types of machine learning and the
models of each type that we generally encounter in finance.

2 Supervised Learning
Supervised Learning is akin to learning with a tutor. Imagine you’re trying to under-
stand a complex topic, and every time you answer a question, the tutor tells you if
you’re right or wrong. Over time, you adjust your thinking based on this feedback,
improving your understanding.
In finance, supervised learning is used when we have past data with known outcomes,
for instance when using past stock prices to predict future ones. The ‘tutor’ in this
case is the historical data: it provides the model with the ‘right answers’ so it can adjust
and improve.
Example: A quant analyst has 5 years of daily stock prices and wants to predict
tomorrow’s price. Using supervised learning, the model learns from past price move-
ments and the factors influencing them. Once trained, this model can then predict
future prices, helping the analyst make investment decisions.

2.1 Linear Regression


Imagine you’re trying to understand how the height of a group of people relates to
their shoe size. You plot each person’s height against their shoe size on a graph, and
you notice a general trend: as height increases, shoe size tends to increase as well.
Now, you take a ruler and try to draw a straight line that fits these points as closely as
possible. This line won’t go through every point perfectly, but it gives a good general
idea of the relationship between height and shoe size. If you know someone’s height,
you can use this line to make a good guess about their shoe size.
This is the essence of Linear Regression. It’s about finding the best straight line (or
in more complex scenarios, a plane) that describes the relationship between variables.
Financial Example: Consider the stock market. If you plot a company’s stock price
against the performance of the broader market over time, you might notice a trend. Per-
haps when the market performs well, the stock price tends to rise, and when the market
dips, the stock tends to fall. Linear Regression can help quantify this relationship. By
drawing the best-fitting line through the data points, analysts can predict how the stock
might perform based on broader market movements. For instance, if the model indi-
cates a strong positive relationship and the market is expected to rise tomorrow, there’s
a good chance that the stock will rise as well.
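A minimal sketch of the line-fitting idea, assuming a single explanatory variable. The numbers are made up for illustration, the example uses daily returns rather than price levels (an assumption, not something the text specifies), and `fit_line` is a hypothetical helper name:

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical daily returns in percent (illustrative values only)
market_returns = [-2.0, -1.0, 0.0, 1.0, 2.0]
stock_returns = [-2.9, -1.6, 0.1, 1.4, 3.0]

slope, intercept = fit_line(market_returns, stock_returns)

# Predicted stock return if the market were to rise by 1.5%
predicted = intercept + slope * 1.5
print(f"slope={slope:.2f}, intercept={intercept:.2f}, predicted={predicted:.2f}")
```

When the inputs are returns, the fitted slope of a stock against the market is commonly called the stock’s beta.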

2.2 Logistic Regression


Imagine you’re at a carnival, and there’s a game where you have to guess whether a
tossed coin will land heads up or tails up. Now, this isn’t a regular coin toss because
the coin is weighted. Before each toss, you’re given some hints: the temperature of the
room, the height from which the coin is tossed, and the angle of the toss. Based on
these hints, you have to decide: heads or tails?
Logistic Regression works similarly. Instead of predicting a continuous outcome
(like the exact stock price), it predicts the probability of an event occurring: will it be
‘A’ or ‘B’? The output is a probability that the given input point belongs to a particular
category, which is transformed into a binary outcome via a threshold. For instance, if
the calculated probability is above 0.5 (or 50%), the model predicts ‘A’; otherwise, it
predicts ‘B’.
Financial Example: Consider an investor trying to decide whether a particular stock
will rise or fall tomorrow. They could use various factors like today’s trading volume,
recent news about the company, and current economic indicators. Logistic Regression
would take these factors into account and give a probability score. If the score is above
a certain threshold, say 0.5, the model predicts the stock will rise. If below, it predicts
a fall. Over time, as the model is exposed to more data and outcomes, it gets better at
making these predictions, helping the investor make more informed decisions.
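The probability-then-threshold step can be sketched as follows. This shows only the prediction side; the weights and bias are hypothetical values standing in for what training would produce, and the two features are illustrative:

```python
import math


def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_rise(features, weights, bias, threshold=0.5):
    """Return (probability of a rise, binary decision)."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    probability = sigmoid(z)
    return probability, probability > threshold


# Hypothetical fitted parameters (assumed, not estimated from real data)
weights = [0.8, 1.2]   # e.g. scaled trading volume, news sentiment score
bias = -0.5

prob, will_rise = predict_rise([1.0, 0.5], weights, bias)
print(f"P(rise)={prob:.3f}, predict rise: {will_rise}")
```

Raising the threshold above 0.5 makes the model more conservative about predicting a rise, at the cost of missing some genuine rises.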

2.3 Support Vector Machines (SVM)


Imagine you’re at a beach, and you see two groups of people: those playing beach
volleyball and those sunbathing. You want to draw a line in the sand that separates
the two groups. You could draw this line in many ways, but you aim for one that gives
the most space between the two groups, so there’s a clear distinction. If someone new
comes to the beach, you can easily decide which group they belong to based on which
side of the line they’re on.
SVM operates on a similar principle. It doesn’t just find a line (or in more complex
cases, a plane) to separate data points of two categories; it finds the one that has the
maximum margin between the two groups. The data points that are closest to this
boundary (and hence, most difficult to classify) are known as ‘support vectors’. The
SVM ensures that this boundary is as far from these critical points as possible, aiming
for clear and confident classifications.
Financial Example: Imagine a fund manager wants to classify stocks into two categories:
“buy” and “not buy”. They have data on various features of these stocks, such
as their price-to-earnings ratio, historical volatility, and market sentiment scores. The
SVM will analyze this data and find the best decision boundary that separates the “buy”
stocks from the “not buy” stocks. If a new stock is introduced, the SVM can quickly
determine which side of the boundary it falls on, aiding the fund manager in their
decision-making process.
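As a rough illustration of how a trained linear SVM is used, the sketch below classifies a point by which side of the boundary it falls on and computes the margin width. It does not perform the training itself; the weights and bias are hypothetical values assumed to come from an already fitted model, and the two features are illustrative:

```python
import math

# Hypothetical weights and bias, standing in for the result of SVM training.
# Features: (scaled valuation score, scaled sentiment score), illustrative only.
weights = (3.0, 4.0)
bias = -5.0


def decision_value(features):
    """Signed score: positive on the 'buy' side, negative on the 'not buy' side."""
    return sum(w * f for w, f in zip(weights, features)) + bias


def classify(features):
    return "buy" if decision_value(features) >= 0 else "not buy"


def distance_to_boundary(features):
    """Geometric distance from a point to the separating line."""
    return abs(decision_value(features)) / math.hypot(*weights)


# Width of the gap between the two lines w.x + b = +1 and w.x + b = -1
margin_width = 2.0 / math.hypot(*weights)

print(classify((2.0, 1.0)), distance_to_boundary((2.0, 1.0)), margin_width)
```

Training an SVM amounts to choosing the weights and bias that make this margin as wide as possible while keeping the two groups on opposite sides.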

2.4 Decision Trees


Imagine you’re trying to decide what to wear for a day out. You look outside: Is it
raining? If yes, wear a raincoat; if no, proceed to the next question. Is it cold? If
yes, wear a sweater; if no, a T-shirt will do. This step-by-step decision-making process
resembles a Decision Tree. It’s a flowchart-like structure where you answer questions
or make decisions at each node until you reach a conclusion.
Financial Example: Consider an investor deciding whether to invest in a particular
stock. The Decision Tree might guide the decision by asking: “Is the company’s P/E
ratio below the industry average?” or “Has the company’s revenue been growing for
the past three quarters?” Depending on the answers, the tree helps the investor arrive
at an investment decision.

2.5 K-Nearest Neighbors (KNN)


Imagine you’re in a new city and want to find a good restaurant. Instead of checking
every restaurant, you ask a few locals nearby. If most of them recommend the same
place, you’d likely choose that one. KNN works similarly, considering the ‘opinion’ or
classification of nearby data points to make a prediction.
Financial Example: An analyst wants to predict a stock’s movement based on the
movements of ‘similar’ stocks. If most stocks that have similar features have gone up
recently, KNN might predict this stock will rise too.
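A small sketch of the neighbor vote, assuming two numeric features per stock and Euclidean distance as the measure of similarity (both are illustrative choices). The data points and the helper name `knn_predict` are hypothetical:

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label the query by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical stocks described by two scaled features, with recent movement
points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.9, 0.8), (0.8, 0.9)]
labels = ["up", "up", "up", "down", "down"]

print(knn_predict(points, labels, (0.2, 0.2)))    # query close to the "up" group
print(knn_predict(points, labels, (0.85, 0.85)))  # query close to the "down" group
```

Because the method relies on distances, features on very different scales should be rescaled first, otherwise the largest feature dominates the notion of similarity.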

2.6 Naive Bayes


Think of Naive Bayes as a detective trying to solve a case based on evidence. Each
piece of evidence (or feature) contributes to the probability of a suspect being guilty.
However, this detective treats each piece of evidence as independent, not considering
how they might be related.
Financial Example: An algorithm predicting whether a news article about a com-
pany is positive or negative. The model gauges the likelihood based on the presence
of certain positive or negative words in the article, treating each word’s presence as an
independent piece of evidence.
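The independence assumption means the evidence from each word is simply multiplied together (or, equivalently, the logarithms are added). A minimal sketch, where the word probabilities are invented for illustration and would in practice be estimated from labeled articles; words missing from the table are ignored here, which is a simplification:

```python
import math

# Hypothetical P(word | class) values (assumed, not estimated from real data)
likelihoods = {
    "positive": {"growth": 0.30, "profit": 0.25, "loss": 0.05},
    "negative": {"growth": 0.05, "profit": 0.10, "loss": 0.40},
}
priors = {"positive": 0.5, "negative": 0.5}


def classify(words):
    """Pick the class with the highest log prior plus summed log likelihoods."""
    scores = {}
    for label in priors:
        log_score = math.log(priors[label])
        for word in words:
            if word in likelihoods[label]:
                # Each word contributes independently of the others
                log_score += math.log(likelihoods[label][word])
        scores[label] = log_score
    return max(scores, key=scores.get)


print(classify(["growth", "profit"]))
print(classify(["profit", "loss"]))
```

Working with logarithms avoids multiplying many small probabilities together, which would otherwise underflow for long articles.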

2.7 XGBoost
Imagine a class where each student is trying to solve a complex math problem. One
student tries and gets partly there but makes some errors. The next student builds
on the first’s work, correcting some mistakes, and getting closer to the solution. This
process continues, with each student building and improving on the previous student’s
work. XGBoost operates in a similar manner, building one tree at a time, each new tree
focusing on the errors made by the previous ones.
Financial Example: When forecasting stock prices using multiple factors, XGBoost
captures intricate relationships by combining many individual decision trees’ predictive
power, making it a powerful tool for financial predictions.
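The "each tree corrects the previous ones" idea can be sketched with plain gradient boosting on squared error, using one-split trees (stumps). This is not XGBoost itself, which adds regularization and other refinements; the data and the function names are hypothetical:

```python
def fit_stump(x, residuals):
    """Find the single split on x that best reduces squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def boost(x, y, n_rounds=20, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the current errors."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        threshold, left_mean, right_mean = fit_stump(x, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            pi + learning_rate * (left_mean if xi <= threshold else right_mean)
            for xi, pi in zip(x, predictions)
        ]
    return base, stumps, predictions


def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical factor values and next-day returns (illustrative only)
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 3.1, 2.9, 5.2, 5.0]

base, stumps, fitted = boost(x, y)
print(mse(y, [base] * len(y)), mse(y, fitted))
```

The learning rate deliberately applies only part of each correction; smaller values need more rounds but tend to overfit less.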

2.8 Random Forest


Consider a council meeting where each member has a say on a decision. Instead of
just one member deciding, all members cast their vote, and the majority decision is
taken. Each member might have a different perspective, but when combined, you get
a more comprehensive viewpoint. Random Forest is like this council, building multiple
decision trees on varied subsets of data and combining their predictions.
Financial Example: When assessing the creditworthiness of loan applicants, a Ran-
dom Forest considers multiple factors, ensuring a balanced judgment. Each decision
tree might focus on different aspects, like employment history or past loans. By com-
bining their insights, the Random Forest provides a more holistic view of the applicant’s
creditworthiness.

2.9 Ensemble Methods


Imagine you’re trying to decide on a movie to watch tonight. Instead of relying on
just one friend’s recommendation, you ask several friends. Each friend might have a
slightly different taste, but by aggregating their suggestions, you can choose a movie
that’s likely to be enjoyed by everyone. Ensemble methods in machine learning operate
on a similar principle. Instead of relying on a single model’s prediction, they combine
the outputs of multiple models to arrive at a more accurate and robust prediction.
There are three primary ensemble techniques:

• Bagging: This involves creating multiple versions of a dataset through random
sampling and building a separate model on each. The final prediction is an average
(for regression) or a majority vote (for classification) of the predictions from
each model.

• Boosting: Here, models are trained sequentially, with each new model focusing
on the mistakes made by the previous ones. The predictions from all models are
then weighted and combined for the final output.

• Stacking: In stacking, multiple models are trained on the data, and their predic-
tions are used as inputs to another model (called a meta-model) that makes the
final prediction.
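The bagging recipe above (resample, fit, aggregate) can be sketched in a deliberately simplified form. Each "model" here is just the mean of a bootstrap sample, an assumption made to keep the example short; the data and function names are hypothetical:

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def bagged_mean(data, n_models=50, seed=0):
    """Average the predictions of many simple models, one per bootstrap sample."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_models):
        sample = bootstrap_sample(data, rng)
        estimates.append(sum(sample) / len(sample))  # the "model" is a sample mean
    return sum(estimates) / len(estimates)


def majority_vote(labels):
    """Aggregate classification outputs from several models."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical daily returns in percent
returns = [0.4, -1.2, 0.8, 2.5, -0.3, 0.1, 1.1]
print(bagged_mean(returns))
print(majority_vote(["approve", "reject", "approve"]))
```

In practice each model would be something richer, such as a decision tree, which is essentially how a Random Forest is built.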

2.9.1 Financial Example


Consider a bank that’s trying to assess the creditworthiness of loan applicants. The bank
has data like income, employment history, and credit score for each applicant. Instead
of using a single model, the bank employs ensemble methods, using predictions from
multiple models to make the final decision. For instance, one model might give more
weight to credit score, while another emphasizes employment history. By aggregating
these models’ predictions using techniques like Bagging or Boosting, the bank can
ensure a more holistic assessment of the applicant’s risk profile, leading to better loan
approval decisions.

2.10 Evaluation Metrics


Evaluating the success of a model is paramount in any machine learning endeavor.
Just as a student’s understanding of a subject is gauged through exams, a model’s
performance is assessed using various metrics. These metrics provide insights into how
well the model is doing and where it might be faltering.

2.10.1 Regression Metrics


Regression tasks, where the objective is to predict a continuous value, rely on specific
metrics to measure the accuracy of predictions:

• Mean Absolute Error (MAE): Represents the average of the absolute differences
between the predicted and actual values. It provides a straightforward measure
of prediction error.

• Mean Squared Error (MSE): Like MAE, but squares the differences before aver-
aging them. It penalizes larger errors more severely than smaller ones.

• R-squared: Often known as the coefficient of determination, it quantifies the
proportion of the variance in the dependent variable that is predictable from the
independent variables. An R-squared of 1 indicates perfect predictions, while an
R-squared of 0 indicates that the model is no better than simply predicting the
mean of the target variable.
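The three regression metrics above can be computed directly from their definitions. A short sketch with made-up prices (the numbers are illustrative only):

```python
def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical actual and predicted closing prices in dollars
actual = [100, 102, 98, 105]
predicted = [101, 100, 99, 103]

mae = mean_absolute_error(actual, predicted)
mse = mean_squared_error(actual, predicted)
r2 = r_squared(actual, predicted)
print(mae, mse, round(r2, 3))
```

Note that the two errors of $2 contribute four times as much to the MSE as the two errors of $1, which is the sense in which MSE penalizes larger errors more severely.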

2.10.2 Classification Metrics


Classification tasks, where the objective is to categorize data points, use different met-
rics:

• Accuracy: Measures the proportion of all predictions that are correct.

• Precision: Represents the number of true positive predictions divided by the total
number of positive predictions (including both true positives and false positives).

• Recall (or Sensitivity): Represents the number of true positive predictions di-
vided by the total actual positives.

• F1-Score: Harmonic mean of precision and recall, giving a balanced measure.

• ROC Curve: A graphical representation of the performance of a classification
model, plotting the true positive rate against the false positive rate.

• AUC (Area Under the ROC Curve): Quantifies the overall ability of the model to
discriminate between positive and negative classes. An AUC of 1 indicates perfect
classification, while an AUC of 0.5 indicates no discrimination capability.
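Accuracy, precision, recall, and F1 all follow from four counts: true positives, false positives, false negatives, and true negatives. A minimal sketch, with labels invented for illustration (1 marks the positive class):

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Hypothetical outcomes: 1 = stock rose, 0 = stock did not rise
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

accuracy, precision, recall, f1 = classification_metrics(y_true, y_pred)
print(accuracy, precision, recall, f1)
```

The sketch assumes at least one positive prediction and at least one actual positive; otherwise precision or recall would divide by zero and need a convention of their own.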

2.10.3 Financial Example
In the realm of finance, consider an analyst who develops a model to predict stock prices
for the next day. After deploying this model for a month, the analyst wants to gauge
its performance. Using the Mean Absolute Error, the analyst finds that, on average,
the model’s predictions are off by $5. So if the model predicts a stock price of $100,
the real price is typically about $5 away, although any single error can be smaller or
larger than that. This metric helps
the analyst understand the model’s reliability and adjust their investment strategies
accordingly.

2.10.4 Classification Metrics in Layman’s Terms


Imagine you’re a detective trying to identify which of two identical twins committed a
crime. You have a test (your classification model) that can help you decide, but it’s not
perfect. Let’s use this scenario to understand some classification metrics:
• Accuracy: This is like counting how often your test correctly identifies the guilty
twin out of all the times you use it. If you test 100 times and it’s right 85 times,
the accuracy is 85%.
• Precision: Imagine the test sometimes falsely accuses the innocent twin. Preci-
sion tells you how often your test is actually right when it claims a twin is guilty. If
the test points to a twin as guilty 10 times, but is only right 7 times, the precision
is 70%.
• Recall (or Sensitivity): This metric tells you how many times the test correctly
identified the guilty twin out of all the times the guilty twin was tested. If the
guilty twin was tested 10 times and the test correctly identified him 7 times, the
recall is 70%.
• F1-Score: Since precision and recall are both important, the F1-Score is like a
balance between the two. It’s a way to ensure you’re not sacrificing too much of
one for the other.
• ROC Curve: Think of the ROC curve as a graph that shows how good your test
is under different conditions. The better the test, the closer the curve is to the
top-left corner of the graph.
• AUC (Area Under the ROC Curve): This is like measuring the total area where
your test performs well. A perfect test has an AUC of 1, while a random guess
would have an AUC of 0.5.

2.10.5 Financial Example


Consider an investment firm that develops a model to classify stocks as “buy” or “not
buy”. After using the model for several months, they want to understand its performance.
They might look at Precision to see how often the stocks the model labeled
as “buy” actually turned out to be good investments. Similarly, they’d use Recall to
understand how many of the actual good investments were correctly identified by the
model. The F1-Score would give them a balanced view of both. By understanding
these metrics, the firm can have more confidence in the model’s recommendations and
refine their investment strategies.

3 Unsupervised Learning
Unsupervised Learning is like exploring a new city without a map. You wander around,
noticing patterns, like which areas have more restaurants or parks. No one’s guiding
you; you’re figuring things out on your own based on observations.
In the financial realm, unsupervised learning helps when we have tons of data, but
no specific outcomes or labels to guide us. It’s used to uncover hidden patterns or
groupings.
Example: Consider an analyst looking at trading data for thousands of stocks. With
unsupervised learning, they might uncover groups of stocks that move similarly, per-
haps because they’re in the same industry or affected by similar economic factors.

3.1 Clustering
Clustering is akin to organizing a wardrobe. You naturally group similar items together:
shirts with shirts, pants with pants, based on features like color, type, or material. In
machine learning, clustering is about grouping data points that are similar to each other
without any predefined labels.
Financial Example: An asset manager might use clustering to categorize stocks
with similar price movements. This can help in portfolio diversification by ensuring
investments are spread across different clusters, thereby reducing risk.
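One common clustering algorithm is k-means, which the text does not name specifically; the sketch below uses it as an illustrative choice. Each stock is described by two made-up features, and the starting centroids are picked by hand to keep the run deterministic:

```python
import math


def k_means(points, centroids, n_iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    for _ in range(n_iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else old  # keep an empty cluster's centroid unchanged
            for cluster, old in zip(clusters, centroids)
        ]
    labels = [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]
    return centroids, labels


# Hypothetical stocks as (average daily return %, volatility %)
stocks = [(0.1, 1.0), (0.2, 1.1), (0.15, 0.9), (1.0, 3.0), (1.1, 3.2), (0.9, 2.9)]

centroids, labels = k_means(stocks, centroids=[stocks[0], stocks[3]])
print(labels)
```

A portfolio that draws from both clusters holds stocks that behave differently from one another, which is the diversification argument made above.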

3.2 Dimensionality Reduction


Dimensionality Reduction is like looking at a complex, multi-layered painting and try-
ing to capture its essence in a simple sketch. It’s about distilling vast amounts of infor-
mation into its most meaningful components.
Financial Example: When dealing with a large set of financial indicators to predict
stock movements, dimensionality reduction can help quants focus on the most impact-
ful indicators, simplifying the prediction process.

3.3 Association Rule Learning


Association Rule Learning is similar to noticing that people who buy sunscreen also
tend to buy sunglasses. It’s about finding relationships or patterns between different
items in large datasets.
Financial Example: In algorithmic trading, association rules might help uncover
that when certain stocks (e.g., tech stocks) go down, others (e.g., gold or utility stocks)
tend to go up, guiding investment strategies.

3.4 Autoencoders
Imagine taking a detailed painting and trying to describe it in a few words, then using
those words to recreate the painting. You might lose some details, but the essence is
captured. Autoencoders in machine learning work similarly. They compress data into a
simpler form and then try to reconstruct it.
Financial Example: Autoencoders can be used for anomaly detection in trading
data. By training on normal trading data, the model can then detect anomalies or
potential fraud by spotting data that doesn’t fit the typical pattern.

3.5 Generative Adversarial Networks (GANs)


GANs are like two artists in a contest. One artist (the generator) creates a painting,
while the other artist (the discriminator) critiques it. The generator keeps refining its
work based on feedback until the discriminator can’t tell if it’s a genuine masterpiece
or a forgery.
Financial Example: GANs can be used in finance for tasks like generating synthetic
financial data for model training when real data might be scarce or sensitive.

4 Reinforcement Learning
Reinforcement Learning is like teaching a dog new tricks. The dog doesn’t know how
to fetch initially, but after repeated attempts and rewards (like treats), it learns the
desired behavior.
In a financial context, reinforcement learning trains models through trial and error,
rewarding them for good decisions and punishing them for bad ones. It’s particularly
useful for areas where immediate feedback is available.
Example: A hedge fund creates a trading bot. Initially, it makes random trades.
However, with reinforcement learning, it gets a ‘reward’ for profitable trades and a
‘penalty’ for unprofitable ones. Over time, it refines its strategy to maximize profits
based on this feedback.

4.1 Q-Learning
Q-Learning is like a child learning to navigate a new playground. The child tries dif-
ferent play equipment (slides, swings, monkey bars) and remembers how much fun
each one was. Over time, the child learns to spend more time on the most enjoyable
equipment.
Financial Example: In algorithmic trading, Q-learning can be used to determine
the best trading strategy by trying out various strategies and remembering the reward
(profit) from each. Over time, the algorithm focuses more on the most profitable strate-
gies.
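The "remembering" in Q-learning is a table of values, one per state and action, nudged toward the observed reward plus the discounted value of the best next action. A minimal sketch of a single update; the states, actions, reward, learning rate `alpha`, and discount `gamma` are all illustrative assumptions:

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update and return the new Q-value."""
    best_next = max(q_table[next_state].values())
    td_target = reward + gamma * best_next
    # Move the old estimate a fraction alpha of the way toward the target
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]


# Hypothetical Q-table for a toy trading problem
q_table = {
    "flat": {"buy": 0.0, "hold": 0.0},
    "long": {"sell": 1.0, "hold": 0.5},
}

# The agent buys while flat, earns a reward of 2.0, and ends up long
new_q = q_update(q_table, "flat", "buy", reward=2.0, next_state="long")
print(new_q)
```

Repeating this update over many trades is what gradually shifts the table toward the more profitable actions.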

4.2 Deep Q Networks (DQN)


DQN is like Q-learning but with the added memory and complexity of a neural network.
Imagine the child in the playground now having a smartwatch that records and analyzes
every move to recommend the next best play equipment.
Financial Example: In high-frequency trading, where decisions need to be made
rapidly based on vast amounts of data, DQNs can analyze current market conditions in
real-time and recommend the best trading actions based on past rewards.

4.3 Double Deep Q Networks (DDQN)
Imagine a student preparing for an exam using two textbooks. One book is used to
learn and study the material (let’s call it the “learning book”), while the other is used
only to test the knowledge by solving its questions (the “testing book”). By separating
the learning and testing processes, the student avoids potential biases and overestima-
tions that might arise if they relied too heavily on one source. In a similar vein, DDQN
improves upon DQN by using two separate neural networks: one to select the best ac-
tion (like the learning book) and another to evaluate that action’s value (like the testing
book). This separation helps in mitigating overestimation of Q-values, a common issue
in DQN.
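The select-with-one, evaluate-with-the-other split can be sketched with two small lookup tables standing in for the two neural networks (a simplification; the values and names are hypothetical):

```python
def double_q_target(reward, next_state, online_q, target_q, gamma=0.9):
    """Select the action with one estimator, evaluate it with the other."""
    best_action = max(online_q[next_state], key=online_q[next_state].get)
    return reward + gamma * target_q[next_state][best_action]


def standard_q_target(reward, next_state, target_q, gamma=0.9):
    """Plain DQN-style target: one estimator both selects and evaluates."""
    return reward + gamma * max(target_q[next_state].values())


# Hypothetical value estimates for a single next state
online_q = {"s1": {"buy": 2.0, "sell": 1.0}}
target_q = {"s1": {"buy": 1.5, "sell": 3.0}}

ddqn = double_q_target(1.0, "s1", online_q, target_q)
dqn = standard_q_target(1.0, "s1", target_q)
print(ddqn, dqn)
```

In this toy case the plain target picks up the second table’s largest, possibly inflated, estimate, while the double version only uses that table to score the action the first one chose.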
Financial Example: In stock trading, a DDQN could be used to manage a portfo-
lio. One network might suggest reallocating funds from one stock to another, while
the second network evaluates the potential return of that reallocation. By having this
dual-check mechanism, the DDQN can make more balanced and less over-optimistic
investment decisions.

4.4 Policy Gradient Methods


Policy Gradient Methods are like a chess player who, instead of memorizing specific
moves, learns the overall strategies and principles of the game. The player adjusts their
gameplay by understanding which strategies generally lead to winning and which to
losing.
Financial Example: For portfolio optimization, policy gradient methods can be
used to adjust investment strategies based on the overall returns of different investment
combinations, rather than specific stock movements.

5 Overfitting
Imagine studying for an exam by memorizing every single word in the textbook, in-
cluding footnotes and page numbers. Come exam day, you find the questions are more
about understanding concepts than regurgitating facts. You’ve over-prepared in the
wrong way. This is akin to overfitting.
In finance, overfitting occurs when a model is too tailored to past data, capturing
every tiny detail, including noise or anomalies. Such a model performs poorly in real-
world scenarios.
Example: A trading strategy might show spectacular returns when back-tested on
historical data because it’s overly adjusted to that specific data. But when applied to
today’s market, it fails, because it’s too aligned with past quirks that aren’t relevant
now.

5.1 Causes of Overfitting


Overfitting often arises due to:

• Using an excessively complex model for a simple task.

• Training on a limited set of data.

• Not considering the randomness or noise in the data.

Financial Example: An algorithm designed to predict stock prices might overfit if
it’s built on a very complex neural network but trained only on a month’s worth of data.
The model might pick up on irrelevant patterns unique to that month, leading to poor
predictions in subsequent months.

5.2 Regularization
Regularization is like adding a guiding hand when memorizing for an exam, ensuring
you focus more on understanding key concepts than on rote memorization of every
detail. It introduces a penalty on overly complex models, helping to prevent overfitting.
Financial Example: In portfolio optimization, regularization might prevent putting
too much weight on a single stock or a small group of stocks, ensuring a more diversified
and balanced portfolio.

5.2.1 Regularization Techniques


Imagine you’re trying to plot the best route for a road trip across several cities. One
approach is to ensure you pass through every single tourist spot, even if it means taking
longer, zig-zag routes. This might give you a very detailed journey but could be im-
practical and exhausting. Instead, you could choose a more general route that covers
most of the main attractions but avoids unnecessary detours. Regularization in machine
learning is akin to choosing this more general route. It prevents the model from becom-
ing too complex (like the exhaustive road trip) by adding some constraints, ensuring it
captures the general trend in the data without fitting every tiny detail.
There are two primary regularization techniques:

• L1 Regularization (Lasso): This technique adds a penalty proportional to the sum
of the absolute values of the coefficients. It can lead to some coefficients becoming
exactly zero, effectively selecting a simpler model that doesn’t consider those
features.

• L2 Regularization (Ridge): This technique adds a penalty proportional to the sum
of the squared coefficients. It ensures that coefficients don’t become too large
and disproportionately influence the model’s predictions.

Financial Example: Consider a quant analyst developing a model to predict stock
prices. They have access to a vast array of features, from company fundamentals to
global economic indicators. Without regularization, the model might become overly
complex, fitting every fluctuation in the historical data, including the noise. This would
make it perform poorly on new, unseen data. By applying Ridge regularization, the
analyst ensures that the model remains balanced, capturing the essential patterns in the
data without becoming overly tailored to historical quirks. This results in more robust
and reliable stock price predictions.
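To make the two penalties concrete, the sketch below computes each one for a set of coefficients and shows how a ridge penalty shrinks a fitted slope. The slope formula assumes the simplest possible setting, one feature and no intercept, and the penalty strength `lam` and all numbers are illustrative:

```python
def l1_penalty(coefficients, lam):
    """Lasso penalty: lam times the sum of absolute coefficient values."""
    return lam * sum(abs(c) for c in coefficients)


def l2_penalty(coefficients, lam):
    """Ridge penalty: lam times the sum of squared coefficient values."""
    return lam * sum(c ** 2 for c in coefficients)


def ridge_slope(x, y, lam):
    """Slope b minimizing sum((y - b*x)^2) + lam * b^2 (one feature, no intercept)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / (sum(xi ** 2 for xi in x) + lam)


coefficients = [0.5, -2.0, 0.0]
print(l1_penalty(coefficients, lam=0.1), l2_penalty(coefficients, lam=0.1))

# Hypothetical data where the unpenalized slope is exactly 2
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
print(ridge_slope(x, y, lam=0.0), ridge_slope(x, y, lam=14.0))
```

With no penalty the slope is 2; with a penalty as large as the sum of squared inputs it is halved, which shows how Ridge pulls coefficients toward zero without forcing them to be exactly zero.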

5.3 Cross-Validation
Cross-validation is akin to taking multiple mock exams before the real one. Instead of
relying on one set of questions (or data) to prepare, you test yourself on various sets to
ensure a comprehensive understanding.
Financial Example: Before finalizing a trading strategy, a quant might test it on
different periods of historical data (not just one) to ensure its robustness across varying
market conditions.
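A sketch of how the data might be divided into several "mock exams": the observations are split into k contiguous blocks, and each block takes a turn as the test set. The function name and the block layout are assumptions for illustration:

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds.

    Returns a list of (train_indices, test_indices) pairs."""
    fold_size = n_samples // k
    folds = []
    for i in range(k):
        start = i * fold_size
        # The last fold absorbs any leftover samples
        end = n_samples if i == k - 1 else start + fold_size
        test_idx = list(range(start, end))
        train_idx = [j for j in range(n_samples) if j < start or j >= end]
        folds.append((train_idx, test_idx))
    return folds


folds = k_fold_indices(n_samples=10, k=5)
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

For time-ordered financial data, a common variation keeps only the observations that come before each test block in the training set, so the model is never evaluated on data older than what it was trained on.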

5.4 Pruning
Pruning, in the context of decision trees, is like trimming excess branches from a tree
to ensure healthy growth. In machine learning, it means simplifying a model by removing
sections of it (like certain nodes in a decision tree) that add little predictive
power.
Financial Example: When using decision trees to predict stock movements, pruning
helps in removing nodes that consider irrelevant factors, leading to a more concise and
effective prediction model.

6 Feature Engineering
Think of feature engineering as selecting the right ingredients for a recipe. Instead of
throwing everything into the pot, you choose specific items that complement each other
and enhance the final dish.
For quants, feature engineering is about selecting or transforming the most relevant
pieces of data to improve predictions. Instead of using all available data, they pinpoint
the most insightful variables.
Example: To predict a stock’s movement, instead of just looking at its price, an
analyst might also consider other factors like trading volume, company earnings, or
economic indicators. By carefully choosing these ‘features’, they can build a more
effective prediction model.

6.1 Feature Selection


Feature selection is like choosing which ingredients to include in a recipe. Not every
ingredient adds value, and some might even spoil the dish if used inappropriately.
Similarly, not every piece of data is useful for a model, and some might even harm
its performance.
Financial Example: When predicting bond prices, an analyst might find that global
geopolitical events have little influence, and thus exclude them as a feature, focusing
instead on factors like interest rates and inflation.

6.2 Feature Extraction


Feature extraction is like blending multiple ingredients to create a new sauce or flavor.
It involves combining multiple data points to create new, more informative features.
Financial Example: Instead of using a company’s monthly sales and expenses as
separate features, an analyst might combine them to create a new feature: “profit
margin.”

6.3 Feature Scaling


Imagine baking a cake and adding a cup of sugar and a teaspoon of salt. The quantities
are drastically different, but each ingredient plays a crucial role. Feature scaling is
about ensuring all features have a similar scale so that no particular feature dominates
the model.
Financial Example: When building a model using features like “company’s total
assets” (which can be in billions) and “number of employees,” an analyst would scale
these features so that both have comparable magnitudes, ensuring balanced influence
in the model.
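Two common ways to put features on a comparable scale are standardization (subtract the mean, divide by the standard deviation) and min-max scaling (map to the range 0 to 1). The text does not prescribe a method, so both are shown as illustrative options with made-up figures:

```python
import statistics


def standardize(values):
    """Rescale to mean 0 and standard deviation 1 (z-scores)."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def min_max_scale(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


# Hypothetical companies: total assets in dollars and number of employees
total_assets = [2e9, 4e9, 4e9, 4e9, 5e9, 5e9, 7e9, 9e9]
employees = [120, 300, 450, 800, 1500, 2100, 5000, 12000]

scaled_assets = standardize(total_assets)
scaled_employees = standardize(employees)
print(scaled_assets)
print(min_max_scale([10, 20, 30]))
```

After standardization both features are expressed in standard deviations from their own mean, so a value in the billions no longer outweighs one in the hundreds.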

6.4 Feature Encoding


Feature encoding is like translating a recipe from one language to another, ensuring it’s
understood in a new context. In machine learning, it involves converting categorical
data (like “yes” or “no” answers) into a numerical format that a model can understand.
Financial Example: If an analyst is considering the “industry sector” of a company
as a feature (e.g., “tech,” “finance,” “healthcare”), they would encode these categories
into numerical values for the model to process.
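One widely used encoding is one-hot encoding, where each category becomes its own 0/1 column. The text does not say which encoding to use, so this is an illustrative choice, and `one_hot` is a hypothetical helper:

```python
def one_hot(values):
    """Return (sorted category names, one 0/1 row per input value)."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


sectors = ["tech", "finance", "healthcare", "tech"]
categories, encoded = one_hot(sectors)

print(categories)
for sector, row in zip(sectors, encoded):
    print(sector, row)
```

Unlike assigning arbitrary integers (tech = 1, finance = 2, and so on), this avoids implying an order or distance between sectors that does not exist.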

6.5 Domain Knowledge in Feature Engineering


Domain knowledge is like a chef’s expertise in understanding which ingredients work
best for a particular cuisine. In feature engineering, domain knowledge allows experts
to create and choose features that are particularly relevant to the task at hand.
Financial Example: A quant with deep knowledge in commodities might know
that oil prices are influenced by specific geopolitical events. They would make sure to
incorporate relevant geopolitical indicators as features when predicting oil prices.

Thank You
