Leveraging Machine Learning For High-Frequency Trading of Commodity Futures



Value Propositions
Automatic high-frequency (enter/exit positions every second) trading of
commodity futures using multiple strategies.
Pricing of futures contracts based on convergence arbitrage model,
correlating spot/maturity/positive news to futures.
Presentation of market data from multiple exchanges (exchange source
based on contract) in OMS user interface via candlestick charts, pie charts
Margin account computation on a per-contract and per-portfolio basis, based
on the marking to market of open contracts
Fully-functional order entry with FIX (Financial Information Exchange)
protocol data translation.
Line handler for exchange connectivity runs as a separate process to route
orders from the database to the exchange and parse executions.
Robust historical data module which maps online contract prices to training
vectors for the regression model

Differentiation from Existing High-Frequency Trading Systems


Multiple algorithms (logistic regression, Naive Bayes) for training the
pricing function from historical data. Logistic regression is supported with
both linear and quadratic terms
Large volume of data (ten years of pricing information) to train and test
both algorithms
Measurement of backtesting performance against the Goldman Sachs Commodity
Index
Margin computation to determine when margin calls will occur and what the
portfolio P&L would be if positions were closed now
Multiple automated strategies: logistic regression entry/gain exit, Naive Bayes
entry/gain exit, logistic regression entry/time exit, Naive Bayes entry/time exit,
logistic regression entry/stop-loss exit, Naive Bayes entry/stop-loss exit
Support of volatility options (gold/oil) and futures options will complement the
futures trading functionality

Profit-Take Exit Strategy


Entry into long position occurs if model price > market
price
Entry into short position occurs if model price < market
price
Exit long when the contract rises by more than the threshold
percentage
Exit short when the contract falls by more than the threshold
percentage
When loss occurs (fall in long, rise in short), no action
taken
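
The entry and exit rules above can be sketched in a few lines of Python. This is a minimal illustration, not the system's actual code: the function names and the convention that the threshold is a fraction of the entry price (0.10 for 10%) are assumptions.

```python
from typing import Optional


def entry_side(model_price: float, market_price: float) -> Optional[str]:
    """Go long when the model prices the contract above the market,
    short when below, and stay flat when the two agree."""
    if model_price > market_price:
        return "long"
    if model_price < market_price:
        return "short"
    return None


def should_take_profit(side: str, entry_price: float,
                       current_price: float, threshold: float) -> bool:
    """True when the position has gained more than `threshold`
    (assumed to be a fraction of the entry price).
    Losses never trigger an exit under this strategy."""
    change = (current_price - entry_price) / entry_price
    gain = change if side == "long" else -change
    return gain > threshold
```

A long entered at 100 with a 10% threshold would be closed at 112 but left open at 90, which is the "no action taken" case for losses.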

Stop-Loss Exit Strategy


Enter positions using the same market/model comparison
as profit-take
Exit long position when the contract declines by more than the
threshold percentage
Exit short position when the contract rises by more than the
threshold percentage
If gain occurs or no change, no action taken
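
A stop-loss check is the mirror image of a profit-take check: it measures the adverse move rather than the favorable one. The sketch below assumes a fractional threshold relative to the entry price; the function name is hypothetical.

```python
def should_stop_out(side: str, entry_price: float,
                    current_price: float, threshold: float) -> bool:
    """True when the position has lost more than `threshold`
    (assumed to be a fraction of the entry price).
    Gains and unchanged prices never trigger an exit."""
    change = (current_price - entry_price) / entry_price
    # A long loses when the price falls; a short loses when it rises.
    loss = -change if side == "long" else change
    return loss > threshold
```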

Time-Based Exit Strategy


Enter positions by contrasting the regression model and
market prices as in the case of profit-taking
Close contracts after a specified time (seconds) has
elapsed since the open
Indifferent to gain/loss (premise is that price incongruity
will increase in the imminent future)
Objective is to minimize order-entry latency (no
need to use inference to determine when to close the
contract)
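
Because the exit depends only on elapsed time, the close decision reduces to a timestamp comparison. The sketch below is one way this might look; the `OpenPosition` record and its field names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OpenPosition:
    contract: str      # hypothetical contract identifier
    opened_at: float   # open time in seconds (e.g. from time.monotonic())


def positions_to_close(positions: List[OpenPosition],
                       now: float, hold_seconds: float) -> List[str]:
    """Return contracts whose holding period has elapsed.
    No price or model inference is consulted, only the clock."""
    return [p.contract for p in positions
            if now - p.opened_at >= hold_seconds]
```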

Parallel Computing Emphasis


Minimal latency and maximal scale are required for order
submission, execution processing, backtesting, market
data updates, training of correlation functions, and signal
scanning
A separate thread pool serves each stage of the
high-frequency trading, market data download, and machine
learning trading pipelines
In the future, leverage the graphics processing unit
(GPU) for larger-scale parallelization

Margin Computation
Margin is the percentage of notional value that must be deposited in order to trade futures.
Initial and maintenance amounts are set by exchanges on a per-contract and
per-portfolio basis
For a long position, the account is credited as the futures price rises and
debited as it falls.
If the maintenance margin level is breached, a margin call is issued
Key benefit of the system is the initial/maintenance computation and the margin
call level computation (as the price of the contract moves)
A desirable feature would be connectivity to the clearinghouse to post margin
deposits and receive refunds electronically
The P&L effects on the portfolio are: Profit = final margin-initial margin-(deposits on
margin calls)
Another future benefit would be to render the underlying volatility of the contracts
for which the margin is computed
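
The daily mark-to-market, the margin-call check, and the profit formula above can be illustrated together. The sketch assumes that a margin call restores the account to the initial margin level, which is a common convention but is not stated on the slide; the function name and the contract multiplier parameter are also assumptions.

```python
def run_margin_account(prices, quantity, multiplier,
                       initial_margin, maintenance_margin, side="long"):
    """Mark a futures position to market over a list of settlement prices.

    Returns (final_balance, margin_calls, profit), where margin_calls is a
    list of (day_index, deposit) and
    profit = final margin - initial margin - deposits on margin calls.
    """
    sign = 1 if side == "long" else -1
    balance = initial_margin
    deposits = 0.0
    margin_calls = []
    for day in range(1, len(prices)):
        # Credit or debit the daily price move.
        balance += sign * (prices[day] - prices[day - 1]) * multiplier * quantity
        if balance < maintenance_margin:
            # Assumption: the call tops the account back up to initial margin.
            top_up = initial_margin - balance
            deposits += top_up
            balance = initial_margin
            margin_calls.append((day, top_up))
    profit = balance - initial_margin - deposits
    return balance, margin_calls, profit
```

For one long contract with a multiplier of 50, initial margin 500, maintenance margin 400, and prices 100, 98, 97, 101, the account is called for 150 on the second day and ends with a profit of 50, matching the net one-point rise in the contract.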

Historical Data for Training and Backtesting
Currently utilizing the MRCI database by parsing HTML
(license fees may apply later)
A different thread executes for each month over the
range of years
Each thread downloads the HTML for a given day and
extracts the spot price (front-month futures contract) and
the futures prices for every listed contract month
(3-month, 6-month, etc.)
This data is converted into training vectors in the
database used to calibrate the weights of the machine
learning inference
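
The per-month fan-out and the conversion to training vectors might be sketched as follows. `load_month` is a hypothetical stand-in that returns canned rows in place of the real download-and-parse step, and the feature layout (bias, spot, months to maturity, plus squared and cross terms for the quadratic case) is an assumption, not the system's documented schema.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Row = Tuple[float, int, float]  # (spot, months_to_maturity, futures_price)


def load_month(year: int, month: int) -> List[Row]:
    """Hypothetical stand-in for downloading and parsing one month of data."""
    spot = 100.0 + month
    return [(spot, m, spot + 0.5 * m) for m in (3, 6)]


def to_features(spot: float, months: int, degree: int = 1) -> List[float]:
    """Linear features, optionally extended with quadratic terms."""
    features = [1.0, spot, float(months)]
    if degree == 2:
        features += [spot * spot, spot * months, float(months * months)]
    return features


def build_training_set(years, degree: int = 1):
    """Run one task per (year, month) and flatten the rows into
    (feature_vector, futures_price) pairs."""
    jobs = [(y, m) for y in years for m in range(1, 13)]
    with ThreadPoolExecutor(max_workers=12) as pool:
        # map() preserves job order, so results are deterministic.
        per_month = list(pool.map(lambda ym: load_month(*ym), jobs))
    rows = [row for month_rows in per_month for row in month_rows]
    return [(to_features(s, m, degree), f) for s, m, f in rows]
```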

High-Frequency Trading Signal Scan


Iterate over the ticks table. For each underlying, find the current spot
price (front month), contract maturity, and futures price
Input the spot, maturity, and underlying into the selected pricing
algorithm (Logistic Regression Linear, Logistic Regression Quadratic,
Naive Bayes)
If market price > model price + band, generate the sell signal; if
market price + band < model price, generate the buy signal
The emphasis is on the latency to find the mispriced contracts (minimize
via parallelism) and, given the discovered mispricing, to maximize the
accuracy of the prediction (precision and recall) to minimize false
positives and negatives. A false positive would be an interpretation of a
market price that is not a buy (sell) signal as a buy (sell) signal, and a
false negative would be the missing of a buy (sell) opportunity
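
The band test and the accuracy measures mentioned above can be written out directly. This is a minimal sketch; the function names are assumptions, and precision and recall follow their standard definitions (true positives over predicted positives, and true positives over actual positives).

```python
from typing import List, Optional, Tuple


def signal(market_price: float, model_price: float, band: float) -> Optional[str]:
    """Sell when the market is above the model by more than the band,
    buy when it is below by more than the band, otherwise no signal."""
    if market_price > model_price + band:
        return "sell"
    if market_price + band < model_price:
        return "buy"
    return None


def precision_recall(predicted: List[bool], actual: List[bool]) -> Tuple[float, float]:
    """predicted[i]: a signal was generated; actual[i]: it was a real opportunity."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Widening the band trades recall for precision: fewer contracts qualify as mispriced, but those that do are further from the model price.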

Execution Report Processing (fills)


Line handler parses TCP/IP fills (line handler pool)
Generate trade record in database (database insert pool)
Close offsetting open contracts (trade processor pool)
Compute margin, P&L (mark-to-market pool)

Multiple thread pools to handle trade processing pipeline


Point is to quickly visualize the firm's position and margin obligations
Integrates with regulatory reporting, accounting, and customer reporting
applications
In the future, straight-through processing support via electronic wire transfer of
margin amounts to the clearinghouse is desired
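
One way to give each stage of the fill pipeline its own thread pool is shown below. It is a simplified sketch: the comma-separated fill format is a hypothetical stand-in for the real wire format, in-memory structures stand in for the database tables, stages run batch by batch rather than streaming, and the mark-to-market stage is omitted since it would follow the same pattern.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def parse_fill(raw: str) -> dict:
    """Parse a hypothetical 'symbol,side,qty,price' fill message."""
    symbol, side, qty, price = raw.split(",")
    return {"symbol": symbol, "side": side, "qty": int(qty), "price": float(price)}


def process_fills(raw_fills, workers: int = 4):
    trades = []        # stands in for the trades table
    net_position = {}  # symbol -> signed open quantity
    lock = threading.Lock()

    def insert_trade(fill):
        with lock:
            trades.append(fill)

    def offset_position(fill):
        signed = fill["qty"] if fill["side"] == "buy" else -fill["qty"]
        with lock:
            # A sell offsets an earlier buy (and vice versa) by netting to zero.
            net_position[fill["symbol"]] = net_position.get(fill["symbol"], 0) + signed

    with ThreadPoolExecutor(workers) as line_pool:      # line handler pool
        fills = list(line_pool.map(parse_fill, raw_fills))
    with ThreadPoolExecutor(workers) as insert_pool:    # database insert pool
        list(insert_pool.map(insert_trade, fills))
    with ThreadPoolExecutor(workers) as trade_pool:     # trade processor pool
        list(trade_pool.map(offset_position, fills))
    return trades, net_position
```

Separate pools let a slow stage (for example, database inserts) be sized independently of a fast one (parsing), at the cost of a handoff between stages.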

High-Frequency Trading Configuration
Selection One: Algorithm
Logistic Regression (quadratic/linear)
Naive Bayes

Selection Two: Basis for reversal


Time (one hour expiry, 24-hour expiry)
Profit/Loss (close out on plus or minus 10% return for mark-to-market of open contracts)

Selection Three: Running time of strategy


One day
One hour
Indefinitely (user stops manually)

Selection Four: Basket Composition


Underlying (corn, soybeans, oil, etc)
Contract Month (January, March, etc)

The algorithm is used to detect mispricing, which determines the contracts to open positions in

The basis for reversal determines the signal that indicates when contracts in the HFT basket
should be closed
Reversals can occur thousands of times per day for a given strategy
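
The four selections map naturally onto a small configuration record. The sketch below is illustrative only: the class and field names are assumptions, and the quantity field is borrowed from the demonstration script rather than from this slide.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Algorithm(Enum):
    LOGISTIC_LINEAR = "logistic regression (linear)"
    LOGISTIC_QUADRATIC = "logistic regression (quadratic)"
    NAIVE_BAYES = "naive bayes"


class ReversalBasis(Enum):
    TIME = "time"
    PROFIT_LOSS = "profit/loss"


@dataclass(frozen=True)
class BasketItem:
    underlying: str      # e.g. "Sugar"
    contract_month: str  # e.g. "2014-11"
    quantity: int


@dataclass(frozen=True)
class StrategyConfig:
    algorithm: Algorithm                 # Selection One
    reversal_basis: ReversalBasis        # Selection Two
    reversal_value: float                # seconds for TIME, fractional return for PROFIT_LOSS
    run_seconds: Optional[float]         # Selection Three; None = run until stopped manually
    basket: Tuple[BasketItem, ...]       # Selection Four


# Example: linear logistic regression, one-minute time reversal, run indefinitely.
example = StrategyConfig(
    algorithm=Algorithm.LOGISTIC_LINEAR,
    reversal_basis=ReversalBasis.TIME,
    reversal_value=60.0,
    run_seconds=None,
    basket=(BasketItem("Sugar", "2014-11", 500),),
)
```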

High-Frequency Decision Logic (time-based side for reversal and end)

Scan thread start
Reversal type check: time-based reversal
Open contracts time check:
CurrTime - startTime >= reversal interval: reverse (close position)
CurrTime - startTime < reversal interval: non-reversal
Scan end type check: time-based end
Scan time check:
CurrTime - startTime >= end interval: end scan
CurrTime - startTime < end interval: continue scan
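
The two time checks in this flow can be expressed as a single function evaluated on each pass of the scan thread. This sketch assumes the reversal check is measured from the time the position was opened and the end check from the time the scan started; the flow chart uses "start time" for both, so that split is an interpretation.

```python
from typing import Tuple


def time_based_actions(now: float, open_time: float, scan_start: float,
                       reversal_interval: float, end_interval: float) -> Tuple[bool, bool]:
    """Return (reverse, end_scan) for one pass of the scan thread.

    reverse:  close the open position because it has been held long enough.
    end_scan: stop scanning because the strategy's running time has elapsed.
    """
    reverse = (now - open_time) >= reversal_interval
    end_scan = (now - scan_start) >= end_interval
    return reverse, end_scan
```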

High-Frequency Decision Logic (profit/loss-based side for reversal and termination)

Scan thread start
Reversal type check: profit/loss-based reversal
Open contracts P/L check:
CurrOpenProfit - InitProfit >= reversal PL: reverse (close position)
CurrOpenPL - startPL < reversal PL: non-reversal
Scan end type check: P/L-based end
Scan PL check:
CurrClosedPL - startPL >= end PL: end scan
CurrClosedPL - startPL < end PL: continue scan
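
The profit/loss-based checks follow the same shape, with open P/L driving reversals and closed P/L driving termination. The sketch below returns a readable action label for each check; the parameter names mirror the flow chart but are otherwise assumptions.

```python
from typing import Tuple


def pl_based_actions(curr_open_pl: float, start_open_pl: float,
                     curr_closed_pl: float, start_closed_pl: float,
                     reversal_pl: float, end_pl: float) -> Tuple[str, str]:
    """Return (position_action, scan_action) for one pass of the scan thread."""
    # Open-contract P/L gained since the start decides whether to reverse.
    if curr_open_pl - start_open_pl >= reversal_pl:
        position_action = "reverse"
    else:
        position_action = "hold"
    # Closed-contract P/L gained since the start decides whether to stop.
    if curr_closed_pl - start_closed_pl >= end_pl:
        scan_action = "end scan"
    else:
        scan_action = "continue scan"
    return position_action, scan_action
```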

Portfolio Visualization
Time series graph of profit/loss
Open contracts, closed contracts (accounting position)
Two and possibly three-dimensional line graphs will
render the portfolio time series. X-Axes can be date and
commodity product
Point is to capture a snapshot of the firm's current
position (or the fund's return to date)

Profit and Loss Simulation


Iterate market returns for a basket of commodity futures,
commodity options, and volatility futures
Upon permuting the futures prices, compute the profit/loss
of a hypothetical portfolio every day over 100 days
Rebalance the portfolio according to the pricing algorithm
and the strategy parameters.
Plot the daily P&L and compute the final portfolio value on
the last day. P&L is return on closed positions.
Repeat the simulation 1000 times and compute the
average (Monte Carlo simulation)
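
The simulation loop might look like the sketch below. It is deliberately simplified: daily returns are drawn from a normal distribution with an assumed volatility, the portfolio is treated as a single aggregate value, and the rebalancing step driven by the pricing algorithm is left out. All names and default values other than the 100 days and 1000 runs are assumptions.

```python
import random
from statistics import mean
from typing import List, Tuple


def simulate_path(start_value: float, days: int, daily_vol: float,
                  rng: random.Random) -> Tuple[float, List[float]]:
    """Simulate one path; return (final_value, daily_pl)."""
    value = start_value
    daily_pl = []
    for _ in range(days):
        ret = rng.gauss(0.0, daily_vol)  # assumed return model
        pl = value * ret
        daily_pl.append(pl)
        value += pl
    return value, daily_pl


def monte_carlo(start_value: float = 1_000_000.0, days: int = 100,
                runs: int = 1000, daily_vol: float = 0.01,
                seed: int = 42) -> float:
    """Average final portfolio value over `runs` simulated paths."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    finals = [simulate_path(start_value, days, daily_vol, rng)[0]
              for _ in range(runs)]
    return mean(finals)
```

The `daily_pl` list from a single path is what would be plotted as the daily P&L series.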

Product Universe
Commodity Futures: Corn, Wheat, Soybeans, Gold, Oil,
Sugar
Options on Commodity Futures
Volatility based on VIX volatility index computed by
the Chicago Board Options Exchange (CBOE)
Gold Volatility Index Futures (CME)
Crude Oil Volatility Index Futures (CME)

Currently, products traded on two exchanges (Chicago
Mercantile Exchange and Intercontinental Exchange) are
covered

Order Management System Demonstration Script
From the high-frequency demonstration UI, select the +/-10% Linear Logistic
Regression, with a time span of 1 minute for reversal, underlying of Sugar, expiry of
11/2014, quantity of 500
Add the order to the basket
Start the high-frequency algorithm
After one minute, check that reversal occurred, namely that orders of the opposite
side appear in the database
Insert a 20% downtick order
Show new long position
Insert 20% uptick price
Wait one minute, then demonstrate the closed contract (based on appreciation of the
open_contracts element)
Demonstrate the end of the scan due to a 10% appreciation in basket holdings (based on
appreciation of the holding documented in the closed_contracts table)

Volatility Index (VIX) Computation


30-day forward volatility of gold and oil (WTI)
Based on option prices from next two months
Both put and calls with out-of-the-money strikes
Proprietary forward VIX will be computed based on machine learning algorithms and
compared to CME VIX futures
$\mathrm{VIX} = 100\sqrt{\tfrac{365}{30}\,V_{30}}$, where $V_{30}$ is the 30-day variance
$V_{30} = w P_1 + (1-w) P_2$, where $P_1$ is the value of the strip of front-month
options, $P_2$ is the value of the strip of second-month options, and $w$ is the weight
$P_1 = 2e^{rT_1}\sum_{K_i > F_0} \frac{\Delta K_i}{K_i^2}\,\mathrm{calls}[i] + 2e^{rT_1}\sum_{K_i < F_0} \frac{\Delta K_i}{K_i^2}\,\mathrm{puts}[i]$,
where calls and puts are sets of front-month call and put prices. $P_2$ is set similarly
using second-month calls and puts.
This value is highly sensitive to which option strikes are selected: using only a subset
of calls and puts results in a VIX value more than triple the CME-computed value.
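
The computation can be sketched in Python as follows. Several details are assumptions not spelled out on the slide: calls are taken at strikes above the forward price and puts at strikes below it (the out-of-the-money side), and the strike spacing for each option is half the distance between its neighboring strikes, or the full distance to the single neighbor at either end of the strip. Function and parameter names are hypothetical.

```python
import math
from typing import Dict


def strip_value(calls: Dict[float, float], puts: Dict[float, float],
                forward: float, r: float, t: float) -> float:
    """Value of one expiry's option strip: 2*e^(r*t) * sum(dK / K^2 * price)
    over out-of-the-money calls (K > forward) and puts (K < forward).
    Needs at least two out-of-the-money strikes to be meaningful."""
    otm = {k: p for k, p in calls.items() if k > forward}
    otm.update({k: p for k, p in puts.items() if k < forward})
    strikes = sorted(otm)
    total = 0.0
    for i, k in enumerate(strikes):
        lo = strikes[max(i - 1, 0)]
        hi = strikes[min(i + 1, len(strikes) - 1)]
        # Interior strikes: half the gap between neighbors; ends: gap to the one neighbor.
        dk = (hi - lo) / 2 if 0 < i < len(strikes) - 1 else hi - lo
        total += dk / k ** 2 * otm[k]
    return 2 * math.exp(r * t) * total


def vix(p1: float, p2: float, w: float) -> float:
    """Blend the two strips into a 30-day variance and annualize."""
    variance_30d = w * p1 + (1 - w) * p2
    return 100 * math.sqrt(365 / 30 * variance_30d)
```

Because every included strike adds a positive term to the sum, the choice of which strikes enter the strip changes the result directly, which is consistent with the sensitivity noted above.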

Exchange Web Service


Used for simulation and stress testing of strategies
initially
Later, provide an electronic exchange for other
institutions to trade commodity futures on for a
commission
Client connections will be made using the Simple Object
Access Protocol (SOAP) for sending XML documents via
HTTP
Depth of book will be visible for bids and asks for all
listed security codes
This will be a form of dark pool, whose order types
