Generate Alpha
I finally beat the S&P 500 by 10%. This might not sound like much but when we’re
dealing with large amounts of capital and with good liquidity, the profits are pretty
sweet for a hedge fund. More aggressive approaches have resulted in much higher
returns.
https://medium.com/hackernoon/unsupervised-machine-learning-for-fun-profit-with-basket-clusters-17a1161e7aa1 1/16
8/27/2021 Generating Alpha with Vectorspace AI NLP/NLU Correlation Matrix Datasets: Equities vs The Periodic Table of Elements | by Gaëtan Rickter | Hacker…
It all started after I read a paper by Gur Huberman titled “Contagious Speculation and a
Cure for Cancer: A Non-Event that Made Stock Prices Soar,” (with Tomer Regev, Journal
of Finance, February 2001, Vol. 56, №1, pp. 387–396). The research described an event
that occurred in 1998 with a public company called EntreMed (ENMD was the symbol at
the time):
Among the many insightful observations made by the researchers, one stood out in the
conclusion:
“[Price] movements may be concentrated in stocks that have some things in common,
but these need not be economic fundamentals.”
I wondered if it was possible to cluster stocks based on something other than what’s
usually used. I started digging around for datasets and after a few weeks I found one,
designed by Vectorspace AI, that included scores describing the strength of “known and
hidden relationships” between stocks and elements of the Periodic Table.
Expression patterns of selected genes involved signaling pathways for cell plasticity, growth and
differentiation:
https://www.researchgate.net/figure/263706740_fig4_Expression-patterns-of-selected-genes-involved-signaling-pathways-for-cell-plasticity
Equities, like genes, are influenced via a massive network of strong and weak hidden
relationships shared between one another. Some of these influences and relationships
can be predicted.
One of my goals was to create long and short clusters of stocks or “basket clusters” I
could use to hedge or just profit from. This would require an unsupervised machine
learning approach to create clusters of stocks that would share strong and weak
relationships with one another. These clusters would double as “baskets” of stocks my
firm could trade.
I started by downloading the dataset here. The dataset is based on relationships between
elements in the periodic table and public companies. In the future I’d like to work with
cryptocurrencies and create baskets similar to what these guys are doing here but that’s
a future project.
Then, using Python and a subset of the usual machine learning suspects (scikit-learn,
numpy, pandas, matplotlib and seaborn), I set out to understand the shape of the dataset
I was dealing with. (To do some of this I looked to a Kaggle Kernel titled “Principal
Component Analysis with KMeans visuals”.)
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import seaborn as sb

# Silence NumPy warnings about division by zero and invalid operations
np.seterr(divide='ignore', invalid='ignore')

# Quick way to test just a few column features
# stocks = pd.read_csv('supercolumns-elements-nasdaq-nyse-otcbb-general-UPDATE-2017-03-0…

# Note: the file name below is cut off in the source
stocks = pd.read_csv('supercolumns-elements-nasdaq-nyse-otcbb-general-UPDATE-2017-03-01.…')

print(stocks.head())

# Collect the names of the columns that hold strings (e.g. the ticker symbol)
str_list = []
for colname, colvalue in stocks.items():
    if isinstance(colvalue.iloc[1], str):
        str_list.append(colname)

# Get to the numeric columns by inversion
num_list = stocks.columns.difference(str_list)

stocks_num = stocks[num_list]

print(stocks_num.head())
zack@twosigma-Dell-Precision-M3800:/home/zack/hedge_pool/baskets/hcluster$ ./hidden_rela
Symbol_update-2017-04-01 Hydrogen Helium Lithium Beryllium Boron \
0 A 0.0 0.00000 0.0 0.0 0.0
1 AA 0.0 0.00000 0.0 0.0 0.0
2 AAAP 0.0 0.00461 0.0 0.0 0.0
3 AAC 0.0 0.00081 0.0 0.0 0.0
4 AACAY 0.0 0.00000 0.0 0.0 0.0

Carbon Nitrogen Oxygen Fluorine ... Fermium Mendelevium \
0 0.006632 0.0 0.007576 0.0 ... 0.000000 0.079188
1 0.000000 0.0 0.000000 0.0 ... 0.000000 0.000000
2 0.000000 0.0 0.000000 0.0 ... 0.135962 0.098090
3 0.000000 0.0 0.018409 0.0 ... 0.000000 0.000000
4 0.000000 0.0 0.000000 0.0 ... 0.000000 0.000000

Nobelium Lawrencium Rutherfordium Dubnium Seaborgium Bohrium Hassium \
0 0.197030 0.1990 0.1990 0.0 0.0 0.0 0.0
1 0.000000 0.0000 0.0000 0.0 0.0 0.0 0.0
2 0.244059 0.2465 0.2465 0.0 0.0 0.0 0.0
3 0.000000 0.0000 0.0000 0.0 0.0 0.0 0.0
4 0.000000 0.0000 0.0000 0.0 0.0 0.0 0.0

Meitnerium
0 0.0
1 0.0
2 0.0
3 0.0
4 0.0

[5 rows x 110 columns]
Actinium Aluminum Americium Antimony Argon Arsenic Astatine \
0 0.000000 0.0 0.0 0.002379 0.047402 0.018913 0.0
1 0.000000 0.0 0.0 0.000000 0.000000 0.000000 0.0
2 0.004242 0.0 0.0 0.001299 0.000000 0.000000 0.0
3 0.000986 0.0 0.0 0.003378 0.000000 0.000000 0.0
4 0.000000 0.0 0.0 0.000000 0.000000 0.000000 0.0

Barium Berkelium Beryllium ... Tin Titanium Tungsten Uranium \
0 0.0 0.000000 0.0 ... 0.0 0.002676 0.0 0.000000
1 0.0 0.000000 0.0 ... 0.0 0.000000 0.0 0.000000
2 0.0 0.141018 0.0 ... 0.0 0.000000 0.0 0.004226
3 0.0 0.000000 0.0 ... 0.0 0.000000 0.0 0.004086
4 0.0 0.000000 0.0 ... 0.0 0.000000 0.0 0.000000

Vanadium Xenon Ytterbium Yttrium Zinc Zirconium
0 0.000000 0.0 0.0 0.000000 0.000000 0.0
1 0.000000 0.0 0.0 0.000000 0.000000 0.0
2 0.002448 0.0 0.0 0.018806 0.008758 0.0
3 0.001019 0.0 0.0 0.000000 0.007933 0.0
4 0.000000 0.0 0.0 0.000000 0.000000 0.0

[5 rows x 109 columns]
zack@twosigma-Dell-Precision-M3800:/home/zack/hedge_pool/baskets/hcluster$
A Pearson Correlation of concept features. In this case, minerals and elements from the
periodic table:
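The code for this step did not survive in this copy of the article, so here is a minimal sketch of how such a correlation heatmap could be produced. It is not the original code: the random scores, the five element names and the 16-row sample size are illustrative assumptions standing in for the real dataset.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb

# Synthetic stand-in for the numeric stock-vs-element score table
# (rows = stocks, columns = elements). The real values come from the dataset.
rng = np.random.default_rng(0)
elements = ["Cobalt", "Copper", "Gallium", "Lithium", "Nickel"]  # illustrative subset
scores = pd.DataFrame(rng.random((16, len(elements))), columns=elements)

# Pearson correlation between every pair of element columns
corr = scores.corr(method="pearson")

# Draw the correlation matrix as an annotated heatmap
fig, ax = plt.subplots(figsize=(8, 6))
sb.heatmap(corr, vmax=1.0, square=True, cmap="YlGnBu",
           linewidths=0.25, linecolor="black", annot=True, ax=ax)
ax.set_title("Pearson correlation of concept features")
```

Call plt.show() afterwards to display the figure. With the real dataset, `scores` would be the numeric dataframe (`stocks_num` above), possibly restricted to a handful of rows and columns to keep the chart readable.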
Output: (ran against the first 16 samples for this visualization example). It’s also
interesting to see how elements in the periodic table correlate to public companies. At
some point, I’d like to use the data to predict breakthroughs a company might make
based on their correlation to interesting elements or materials.
Output:
From this chart we can see that a large amount of variance comes from the first 85% of
the predicted Principal Components. It’s a high number, so let’s start at the low end and
model for just a handful of Principal Components. More information on analyzing a
reasonable number of Principal Components can be found here.
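One common way to reason about how many components to keep is to look at the cumulative explained variance of a full PCA fit. The sketch below shows the idea on synthetic data. The random matrix, the use of StandardScaler and the 0.85 threshold are illustrative assumptions, not values taken from the original analysis.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the numeric feature matrix (stocks x elements)
rng = np.random.default_rng(0)
X = rng.random((200, 20))

# Standardize each column to zero mean and unit variance before PCA
X_std = StandardScaler().fit_transform(X)

# Fit PCA with all components and accumulate the explained variance ratios
pca_full = PCA().fit(X_std)
cum_var = np.cumsum(pca_full.explained_variance_ratio_)

# Smallest number of components whose cumulative share reaches the threshold
threshold = 0.85  # illustrative choice
n_needed = int(np.searchsorted(cum_var, threshold) + 1)
print(f"{n_needed} components explain at least {threshold:.0%} of the variance")
```

Plotting `cum_var` against the component index gives the kind of chart discussed above. A curve that rises slowly means many components are needed, which is why starting with a small number and inspecting the projection is a reasonable first step.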
Using scikit-learn’s PCA module, let’s set n_components = 9. The second line of the code
calls the “fit_transform” method, which fits the PCA model to the standardized stock
data X_std and applies the dimensionality reduction to this dataset.
pca = PCA(n_components=9)
x_9d = pca.fit_transform(X_std)  # X_std: the standardized feature matrix

# Scatter plot of the first two principal components
plt.figure(figsize=(9, 7))
plt.scatter(x_9d[:, 0], x_9d[:, 1], c='goldenrod', alpha=0.5)
plt.ylim(-10, 30)
plt.show()
Output:
We don’t really observe even faint outlines of clusters here, so we should likely continue
adjusting the n_components value until we see something we like. This relates to the “art”
part of data science.
Now let’s try K-Means in the next section to see if we are able to visualize any distinct
clusters.
K-Means Clustering
A simple K-Means will now be applied using the PCA projection data.
Using scikit-learn’s KMeans() call and the “fit_predict” method, we compute cluster
centers and predict cluster indices for the first and third PCA projections (to see if we
can observe any appreciable clusters). We then define our own color scheme and plot
the scatter diagram as follows:
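The code for this step is missing from this copy, so the following is a minimal sketch of the approach just described rather than the original listing. It assumes three clusters, fits K-Means on all nine PCA components and plots the first against the third. The synthetic input, the color map and the random_state are illustrative assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the standardized stock features and their PCA projection
rng = np.random.default_rng(0)
X_std = StandardScaler().fit_transform(rng.random((300, 20)))
x_9d = PCA(n_components=9).fit_transform(X_std)

# Compute cluster centers and predict a cluster index for every sample
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
X_clustered = kmeans.fit_predict(x_9d)

# Our own color scheme: one color per cluster index
LABEL_COLOR_MAP = {0: "r", 1: "g", 2: "b"}
label_color = [LABEL_COLOR_MAP[label] for label in X_clustered]

# Scatter the first PCA projection against the third, colored by cluster
fig, ax = plt.subplots(figsize=(7, 7))
ax.scatter(x_9d[:, 0], x_9d[:, 2], c=label_color, alpha=0.5)
ax.set_xlabel("PC1")
ax.set_ylabel("PC3")
```

Call plt.show() to display the plot. Whether to cluster on all retained components or only on the two being plotted is a judgment call; this sketch clusters on all nine and only uses two of them for the picture.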
Output:
This K-Means plot looks more promising: if our simple clustering assumption turns out
to be right, we can observe 3 distinguishable clusters via this color visualization
scheme.
Of course, there are many different ways to cluster and visualize a dataset like this as
shown here.
Using seaborn’s convenient pairplot function I can automatically plot all the features in
the dataframe in pairwise manner. We can pairplot the first 3 projections against one
another and visualize:
Output:
Once you’re satisfied with your clusters and have set scoring thresholds that control
whether a stock qualifies for a cluster, you can extract the stocks in a given cluster and
trade them as baskets, or use the baskets as signals. What you can do with this kind of
approach depends largely on your creativity and on how well you can optimize the returns
of each cluster, for example by using deep learning variants to choose which concepts to
cluster on, or by adding data points such as the size of a company’s short interest or
float (shares available on the open market).
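To make the extraction step concrete, here is one possible sketch. It is an assumption about how a threshold could be applied, not the method used for the results in this article: a stock qualifies for a basket when its distance to its cluster center is below a cutoff. The tickers, the feature values, the helper name `basket` and the median cutoff are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical tickers with synthetic relationship scores
rng = np.random.default_rng(0)
symbols = [f"TICK{i:03d}" for i in range(120)]
features = rng.random((120, 4))

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(features)

# Distance from each stock to the center of the cluster it was assigned to
centers = kmeans.cluster_centers_[kmeans.labels_]
distance = np.linalg.norm(features - centers, axis=1)

assignments = pd.DataFrame(
    {"symbol": symbols, "cluster": kmeans.labels_, "distance": distance}
)

def basket(cluster_id, max_distance):
    """Return the symbols in a cluster that lie within max_distance of its center."""
    in_cluster = assignments["cluster"] == cluster_id
    close_enough = assignments["distance"] <= max_distance
    return assignments.loc[in_cluster & close_enough, "symbol"].tolist()

# Example: keep only the stocks of cluster 0 that are closer than the median distance
basket_0 = basket(0, max_distance=assignments["distance"].median())
print(basket_0)
```

A tighter cutoff yields a smaller basket of stocks that sit closer to the cluster center; loosening it trades that cohesion for breadth.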
You might notice a few interesting traits in the way these clusters trade as baskets.
Sometimes there’s divergence from the S&P or general Market. This can offer
It might be interesting to see clusters related to materials and their supply chain as
mentioned in this article: “Zooming in on 10 materials and their supply chains”. Using
the dataset, I only operated on the feature column labels: ‘Cobalt’, ‘Copper’, ‘Gallium’
and ‘Graphene’ just to see if I might uncover any interesting hidden connections between
public companies working in this area or exposed to risk in this area. These baskets are
also compared against the returns of the S&P (SPY).
By using historical price data, which is readily available at outlets like Quantopian,
Numerai, Quandl or Yahoo Finance, you can then aggregate price data to generate
projected returns visualized using HighCharts:
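The aggregation step can be sketched with pandas as follows. This is only an illustration on synthetic prices: the tickers, the random-walk price series and the equal-weight, daily-rebalanced basket are assumptions, and the charting layer (HighCharts in my case) is left out.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices for a hypothetical three-stock basket and a benchmark
rng = np.random.default_rng(0)
dates = pd.bdate_range("2017-01-02", periods=252)
basket_symbols = ["AAA", "BBB", "CCC"]  # hypothetical tickers
daily_moves = rng.normal(0.0005, 0.01, (len(dates), len(basket_symbols)))
prices = pd.DataFrame(100 * (1 + daily_moves).cumprod(axis=0),
                      index=dates, columns=basket_symbols)
spy = pd.Series(100 * (1 + rng.normal(0.0004, 0.008, len(dates))).cumprod(),
                index=dates, name="SPY")

# Equal-weight basket: average the daily returns of its members
basket_returns = prices.pct_change().mean(axis=1).fillna(0)
spy_returns = spy.pct_change().fillna(0)

# Compound daily returns into cumulative returns
basket_cum = (1 + basket_returns).cumprod() - 1
spy_cum = (1 + spy_returns).cumprod() - 1

# Excess return of the basket over the benchmark for the whole period
excess = basket_cum.iloc[-1] - spy_cum.iloc[-1]
print(f"Basket: {basket_cum.iloc[-1]:.2%}  SPY: {spy_cum.iloc[-1]:.2%}  Excess: {excess:.2%}")
```

With real data, `prices` would hold the adjusted closing prices of the stocks in one cluster and `spy` the benchmark series over the same dates. The two cumulative series are what you would then hand to a charting library.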
The returns I gained from the cluster above beat the S&P by a nice margin, which means
you would have approximately an extra 10% over the S&P annually. I’ve seen more
aggressive approaches net close to 70% annually. Now I have to admit that I do a few
other things that I have to keep black-boxed due to the nature of my work, but from what
I’ve observed so far, at least exploring or wrapping new quantitative models around this
approach could turn out to be quite worth it. The only downside is that you end up with a
different kind of signal you could pipe into another system.
Generating short basket clusters could be more profitable than long basket clusters. This
approach needs its own article, ideally before the next Black Swan event.
My next iteration on this kind of model should probably include a separate algorithm for
auto-generating feature combinations or unique lists, perhaps based on near real-time
events that might affect groups of stocks with hidden relationships that only humans,
outfitted with unsupervised machine learning algorithms, can predict.