Vector-Based vs. Event-Based Backtesting

Tribhuvan Bisen
Summary
Backtesting evaluates a trading strategy using historical data. Two paradigms exist: Vector-
Based Backtesting, which processes data in fixed time-step batches (e.g., daily or minute bars) and
uses vectorized operations to compute signals across all assets simultaneously, and Event-Based
Backtesting, which simulates a live environment by sequentially handling discrete market data
events (ticks, bar closes, etc.) in an event loop. Vector-based frameworks are fast and simple,
suited for rapid prototyping and low-frequency strategies, but they assume fills at the next bar’s
open or close, ignore intra-bar dynamics (slippage, partial fills, bid-ask spreads), and risk look-ahead
bias if not implemented carefully. Event-based frameworks are more complex and computationally
intensive but provide higher fidelity by modeling realistic order types, slippage, partial fills, and
real-time risk checks; they also facilitate a smoother production transition. The choice depends on
strategy time horizon, execution complexity, and data availability.
2 Mechanics of Vector-Based Backtesting
2.1 Fixed Time-Step Processing
Historical data is loaded into arrays or dataframes where each row is a fixed interval (one day or
one minute) and each column is an asset’s price or indicator series. At each bar ti , indicators are
computed via vectorized operations (e.g., pandas’ .rolling(), NumPy functions) up to ti .
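As a minimal sketch of this layout (asset names and window lengths are illustrative), the per-bar indicators can be computed in one vectorized pass:

```python
import numpy as np
import pandas as pd

# Illustrative two-asset price panel: one row per bar, one column per asset.
rng = np.random.default_rng(0)
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=(250, 2)), axis=0)),
    columns=["AAA", "BBB"],
)

# Rolling indicators computed in one vectorized call; the value at bar t_i
# uses only data up to and including t_i (trailing window).
ma_fast = prices.rolling(20).mean()
ma_slow = prices.rolling(50).mean()
```

Note that `.rolling()` leaves the warm-up rows as NaN, which a downstream signal step must treat as "flat".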
3. Signal Matrix Construction: Compare indicator columns (e.g., 20-period MA vs. 50-
period MA) to produce +1 (long), −1 (short), or 0 (flat) signals for each asset, forming a
signals DataFrame parallel to the price DataFrame.
5. P&L and Risk Metrics: Compute returns by comparing executed prices (next bar) to
current prices. Aggregate daily P&L, apply uniform transaction costs (fixed commission per
share), and generate equity curves, drawdowns, and other statistics.
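The signal-matrix and P&L steps above can be sketched end to end as follows; the MA windows, cost rate, and asset names are assumptions for illustration, not prescriptions:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=(250, 3)), axis=0)),
    columns=["AAA", "BBB", "CCC"],
)

# Crossover signals: +1 long, -1 short, 0 flat (NaN warm-up -> flat).
ma_fast = prices.rolling(20).mean()
ma_slow = prices.rolling(50).mean()
signals = np.sign(ma_fast - ma_slow).fillna(0.0)

# Execute at the NEXT bar: shift positions forward one bar so the signal
# computed on bar t earns the return from t to t+1 (no look-ahead).
positions = signals.shift(1).fillna(0.0)
bar_returns = prices.pct_change().fillna(0.0)
gross = (positions * bar_returns).sum(axis=1)

# Uniform transaction cost charged on position changes (turnover).
cost_rate = 0.0005  # 5 bps per unit of turnover (assumed)
turnover = positions.diff().abs().sum(axis=1).fillna(0.0)
net = gross - cost_rate * turnover

equity = (1.0 + net).cumprod()
drawdown = equity / equity.cummax() - 1.0
```

The single `shift(1)` is doing all the execution modeling here, which is exactly the simplification the limitations section below criticizes.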
• Simplicity of Implementation: A single pass over bars (or even fully vectorized code) suffices—indicators, signals, and rebalances are all DataFrame operations. This minimizes boilerplate for developers using pandas and NumPy.
• Well-Suited for Low-Frequency Research: When strategies trade daily or weekly using
end-of-day pricing, vectorized backtests yield accurate enough results for factor research or
cross-sectional studies.
2.5 Key Limitations and Pitfalls
• Look-Ahead Bias and Data Snooping: If indicator windows or signal logic reference future bars (e.g., df.shift(-1)), performance becomes artificially inflated. Off-by-one indexing errors occur easily in vectorized code.
• Unrealistic Execution Assumptions: Assuming fills at the next bar’s open or close ignores
intra-bar price moves, bid-ask spreads, and partial fills. Strategies requiring limit or stop
orders cannot be simulated faithfully.
• Limited Intraday Risk Controls: Risk checks (intraday margin calls, real-time VaR limits)
cannot be enforced mid-bar; the model only reevaluates at bar boundaries.
1. Market Data Event: New bar or tick arrives (trade, quote, or bar close).
2. Signal Event: Strategy logic consumes the market data and, if conditions match, emits a
signal (e.g., “Buy 100 XYZ at market”).
3. Order Event: Portfolio/Execution handler translates signals into orders (market, limit,
stop), specifying type, size, and price.
4. Fill Event: Broker simulator matches orders against market data (bid-ask, available volume)
and issues partial or full fills at realistic prices.
5. Update Event: Portfolio updates positions, cash, and risk metrics; may generate risk alarms
(margin calls, forced liquidations).
6. Performance Recording: Each fill triggers P&L calculations, slippage accounting, and is
logged for performance analysis.
Market data is introduced only when available, avoiding look-ahead bias and enabling realistic
execution simulation.
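The event cycle above can be sketched as a minimal priority-queue loop; the event kinds, field names, and toy signal rule are illustrative, not taken from any particular framework:

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Event:
    timestamp: float
    seq: int                               # tie-breaker for equal timestamps
    kind: str = field(compare=False)       # "MARKET", "SIGNAL", "ORDER", "FILL"
    payload: dict = field(compare=False, default_factory=dict)

def run_backtest(market_events):
    """Drain a time-ordered event queue, routing each event type in turn."""
    queue = list(market_events)
    heapq.heapify(queue)
    seq = len(queue)
    fills = []
    while queue:
        ev = heapq.heappop(queue)
        if ev.kind == "MARKET":
            # Strategy sees only data up to ev.timestamp; toy rule: buy when
            # price crosses above a supplied moving average.
            if ev.payload.get("price", 0.0) > ev.payload.get("ma", float("inf")):
                seq += 1
                heapq.heappush(queue, Event(ev.timestamp, seq, "SIGNAL",
                                            {"side": "BUY", "qty": 100}))
        elif ev.kind == "SIGNAL":
            # Portfolio/execution handler turns the signal into an order.
            seq += 1
            heapq.heappush(queue, Event(ev.timestamp, seq, "ORDER", ev.payload))
        elif ev.kind == "ORDER":
            # Broker simulator would match against quotes; here it fills fully.
            seq += 1
            heapq.heappush(queue, Event(ev.timestamp, seq, "FILL", ev.payload))
        elif ev.kind == "FILL":
            fills.append(ev)               # performance recorder
    return fills
```

Because derived events reuse the originating timestamp with a later sequence number, the signal-order-fill chain for one tick completes before the next market event is seen, mirroring the sequential processing described above.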
Strategy Module: Subscribes to data events. Maintains internal state (e.g., rolling indicators
updated incrementally) and emits signal events when conditions are met (e.g., price crosses
above a moving average).
Portfolio/Execution Handler: Receives signal events and creates order events (market, limit,
stop). Manages cash, positions, and risk exposure, enforcing predefined risk limits (e.g.,
maximum notional exposure).
Broker/Exchange Simulator: Processes order events by matching them against market data
(quotes or reconstructed order book). Models realistic slippage, partial fills, and commissions,
issuing fill events back to the portfolio.
Risk & Compliance Module: Monitors limits such as intraday VaR, position caps, and short-
sale constraints. Triggers forced liquidations or cancels orders if thresholds are breached.
Performance Recorder: Logs each fill event—timestamp, fill price, quantity, slippage, commission—for post-mortem analysis and performance metrics calculation.
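As a hedged sketch of the Strategy Module's incremental state (class, method, and window names are assumptions for illustration), a rolling mean can be maintained in O(1) per event instead of recomputing over a window:

```python
from collections import deque

class MovingAverageStrategy:
    """Maintains a rolling mean incrementally and emits signals on crossover.

    Unlike the vectorized version, state advances one event at a time, so the
    strategy can never reference future data.
    """
    def __init__(self, window=20):
        self.window = window
        self.prices = deque(maxlen=window)
        self.running_sum = 0.0
        self.position = 0  # +1 long, 0 flat

    def on_market_data(self, price):
        # O(1) rolling-sum update: drop the oldest price once the window fills.
        if len(self.prices) == self.window:
            self.running_sum -= self.prices[0]
        self.prices.append(price)
        self.running_sum += price
        if len(self.prices) < self.window:
            return None  # still warming up
        ma = self.running_sum / self.window
        if price > ma and self.position == 0:
            self.position = 1
            return {"side": "BUY", "price": price}
        if price < ma and self.position == 1:
            self.position = 0
            return {"side": "SELL", "price": price}
        return None
```

In a full engine the returned dict would become a signal event consumed by the portfolio/execution handler rather than a plain return value.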
4.2 Data Storage and Preprocessing
• Vector-Based: Requires only bar-level OHLCV data (daily or minute). Storage is modest—a few gigabytes for multi-year minute data—and preprocessing focuses on corporate actions and basic cleaning.
• Event-Based: Requires tick or sub-bar data (trades and quotes), with storage volumes often reaching terabytes for multi-year, multi-asset archives. Preprocessing involves reconstructing order books, filtering bad ticks, and merging multiple feeds, usually using specialized databases (kdb+, Parquet, etc.).
• Event-Based: Supports market, limit, stop, stop-limit, iceberg, and custom TWAP/VWAP algorithms. Orders are matched against tick-level quotes or reconstructed order books, enabling realistic slippage, partial fills, and volume-based execution cost modeling.
• Event-Based: Enables mid-bar risk checks: if a fill or price tick breaches risk limits (margin calls, stop-loss), the system can issue forced liquidations or cancel orders immediately, mirroring live trading constraints.
2. Ensure indicator calculations reference only past bars (e.g., df['close'].shift(1)) when generating signals for day T to execute on day T + 1.
3. Validate code by inspecting windows around known corporate actions to confirm correct
alignment.
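A quick way to see why this alignment matters is to compare a correctly lagged signal with a deliberately leaked one on synthetic data; the column name and windows below are illustrative:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
# Synthetic always-positive price series (geometric random walk).
df = pd.DataFrame({"close": 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500)))})
returns = df["close"].pct_change().fillna(0.0)

# Correct: the signal traded over bar T is built from bars T-1 and earlier.
signal_ok = np.sign(df["close"].shift(1) - df["close"].shift(2)).fillna(0.0)
pnl_ok = (signal_ok * returns).sum()

# Leaked: trading bar T on bar T's own return is look-ahead; the P&L
# collapses to the sum of absolute returns, which no real strategy achieves.
signal_leak = np.sign(returns)
pnl_leak = (signal_leak * returns).sum()
```

On a random walk the lagged signal earns roughly nothing, while the leaked signal is profitable by construction—exactly the artificial inflation described above.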
1. Incorporate a tiered slippage model: e.g., orders under 5% of ADV incur 1 bp slippage; orders
between 5–10% of ADV incur 5 bps; orders above 10% incur 10 bps.
2. Use event-driven testing where fills draw from real bid-ask data and volume profiles.
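The tiered schedule in step 1 can be sketched directly; the function names and the side convention (+1 buy, -1 sell) are illustrative:

```python
def tiered_slippage_bps(order_qty, adv):
    """Slippage tier from the schedule above (order size vs. ADV, in shares).

    Returns slippage in basis points.
    """
    participation = order_qty / adv
    if participation < 0.05:
        return 1.0    # under 5% of ADV -> 1 bp
    if participation <= 0.10:
        return 5.0    # 5-10% of ADV -> 5 bps
    return 10.0       # above 10% of ADV -> 10 bps

def fill_price(mid, side, order_qty, adv):
    """Apply slippage against the trade direction (side: +1 buy, -1 sell)."""
    slip = tiered_slippage_bps(order_qty, adv) / 10_000
    return mid * (1 + side * slip)
```

In an event-driven engine this function would live in the broker simulator, called once per fill event.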
1. Use efficient data structures (e.g., collections.deque or heapq) instead of a naïve Queue to manage millions of events.
2. Apply Cython or PyPy optimizations in critical functions (fill matching, risk checks) to reduce
interpreter overhead.
3. Batch-process events sharing the same timestamp (e.g., multiple quotes at time t) to cut
Python loop overhead.
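Points 1 and 3 can be combined in a small sketch (function and payload names are illustrative): a heapq-backed queue drained in same-timestamp batches, so each batch costs one outer-loop iteration instead of one per event:

```python
import heapq
from itertools import count

def drain_in_timestamp_batches(events):
    """Pop a time-ordered heap, grouping events that share a timestamp.

    `events` is an iterable of (timestamp, payload) pairs; a sequence counter
    breaks ties so heap comparisons never touch the payloads.
    """
    seq = count()
    heap = [(ts, next(seq), payload) for ts, payload in events]
    heapq.heapify(heap)
    batches = []
    while heap:
        ts = heap[0][0]
        batch = []
        # Collect every event stamped at the current time in one inner pass.
        while heap and heap[0][0] == ts:
            batch.append(heapq.heappop(heap)[2])
        batches.append((ts, batch))
    return batches
```

Each batch can then be handed to the strategy and risk modules once, which is where the loop-overhead saving comes from.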
1. Implement a fill-log table (Table 1) to capture event ID, order type, fill price, fill quantity,
timestamp, slippage, and commission.
2. Post-mortem: Compare fill logs to raw market data to identify anomalous slippage or execution issues.
Table 1: Sample Fill Log Table
Event ID Order ID Asset Order Type Fill Time Fill Price & Quantity
1001 5001 AAPL Market 2024-05-01 09:31 $175.20 / 100 shares
1002 5002 MSFT Limit @ $320.00 2024-05-01 09:35 $319.90 / 50 shares
1003 5003 GOOG Stop @ $2700.00 2024-05-02 10:15 $2700.00 / 10 shares
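A fill log like Table 1 can be captured with a simple record type; the field names below mirror the table's columns but are otherwise an illustrative assumption:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class FillRecord:
    event_id: int
    order_id: int
    asset: str
    order_type: str
    fill_time: str
    fill_price: float
    fill_qty: int
    slippage_bps: float = 0.0
    commission: float = 0.0

# Two rows from Table 1, expressed as records.
log = [
    FillRecord(1001, 5001, "AAPL", "Market", "2024-05-01 09:31", 175.20, 100),
    FillRecord(1002, 5002, "MSFT", "Limit @ $320.00", "2024-05-01 09:35",
               319.90, 50),
]
rows = [asdict(r) for r in log]  # ready for a DataFrame or CSV writer
```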
2. Rapid Prototyping: Iterating quickly over parameter grids (look-back windows, threshold
levels) to identify promising designs before deeper testing.
3. Limited Data Availability: When only end-of-day or minute-bar data is accessible and
tick data is too costly or unavailable.
2. Complex Order Logic: Strategies requiring limit orders, iceberg orders, TWAP/VWAP
execution, or adaptive sizing need event-driven execution for realistic partial fills and slippage
modeling.
Table 2 (continued): Comparison of Vector-Based and Event-Based Backtesting

Execution Granularity
    Vector-Based: Bar-level only (one price per bar)
    Event-Based: Tick- or intra-bar level, modeling bid-ask and volume

Order Types Supported
    Vector-Based: Market orders at next bar open/close; limited stop/limit simulation via bar high/low checks
    Event-Based: Market, limit, stop, stop-limit, iceberg, TWAP, VWAP, OCO

Slippage Modeling
    Vector-Based: Post-hoc flat slippage assumptions or simple tiers
    Event-Based: Dynamic slippage from bid-ask data, volume, order-book depth

Realism
    Vector-Based: Lower fidelity (assumes synchronized bar rebalances, no intra-bar execution nuances)
    Event-Based: Higher fidelity (simulates realistic fills, partial fills, asynchronous signals)

Speed & Scalability
    Vector-Based: Very fast (seconds for decades of data across hundreds of symbols via vectorization)
    Event-Based: Slower (minutes to hours for minute/tick data via event loops)

Code Complexity
    Vector-Based: Simple (a few hundred lines of DataFrame operations)
    Event-Based: Complex (multi-module event loop, 500–1,000+ lines, intricate event/state management)

Data Requirements
    Vector-Based: OHLCV bars (daily or minute)
    Event-Based: Tick or sub-minute data, order-book reconstruction, extensive cleaning

Risk Management Granularity
    Vector-Based: Bar-level only (risk checks at bar close)
    Event-Based: Event-level (real-time risk checks, margin calls, intraday drawdown enforcement)

Fidelity to Production Systems
    Vector-Based: Limited (needs rewriting for live streaming data)
    Event-Based: High (can often be reused in production with minimal changes)

Ideal Use Cases
    Vector-Based: Low-frequency factor research, rapid prototyping
    Event-Based: Intra-day, high-frequency, algorithmic execution, market-making, institutional infrastructure
8 Concluding Remarks
Vector-based backtesting excels at rapid prototyping for low-frequency, bar-level strategies by leveraging optimized NumPy and pandas routines to compute P&L across the entire dataset in seconds.
It is ideal for factor research or strategies where intra-bar dynamics are negligible. However, it
risks look-ahead bias and cannot model realistic execution aspects—slippage, partial fills, bid-ask
spreads—or enforce intraday risk controls.
Event-based backtesting provides a high-fidelity simulation of live trading: modeling tick-level
updates, sequential event processing, realistic slippage, partial fills, and dynamic risk controls.
This accuracy comes at the cost of greater complexity, more extensive data requirements (tick/sub-
minute), and longer runtimes. Event-driven engines are essential for intra-day or execution-sensitive
strategies—market-making, high-frequency trading, complex order logic—and facilitate a smoother
transition from backtest code to production trading systems.
Many quant teams adopt a hybrid approach: using vectorized backtests to screen and narrow
a universe of candidates, then migrating promising designs into an event-driven framework for
final validation before live deployment. By understanding trade-offs among speed, realism, data
requirements, and strategy complexity, practitioners can tailor their backtesting infrastructure to
meet research, development, and production needs effectively.