
A Practical Breakdown of Vector-Based vs. Event-Based Backtesting
Tribhuvan Bisen

Summary
Backtesting evaluates a trading strategy using historical data. Two paradigms exist: Vector-
Based Backtesting, which processes data in fixed time-step batches (e.g., daily or minute bars) and
uses vectorized operations to compute signals across all assets simultaneously, and Event-Based
Backtesting, which simulates a live environment by sequentially handling discrete market data
events (ticks, bar closes, etc.) in an event loop. Vector-based frameworks are fast and simple,
suited for rapid prototyping and low-frequency strategies, but they assume fills at the next bar’s
open or close, ignore intra-bar dynamics (slippage, partial fills, bid-ask spreads), and risk look-ahead
bias if not implemented carefully. Event-based frameworks are more complex and computationally
intensive but provide higher fidelity by modeling realistic order types, slippage, partial fills, and
real-time risk checks; they also facilitate a smoother production transition. The choice depends on
strategy time horizon, execution complexity, and data availability.

1 Overview of Backtesting Paradigms


1.1 What Is Backtesting?
Backtesting applies a trading strategy’s rules to historical market data to estimate performance—returns,
drawdowns, risk metrics—as if the strategy had run live. It is essential for validating logic before
deploying capital.

1.2 Two Core Paradigms


• Vector-Based Backtesting: Processes entire price series at set time intervals (daily or
minute bars), computing indicators, signals, and rebalances in batch—often at bar close or
next bar open.

• Event-Based Backtesting: Simulates a real-time trading environment by sequentially handling discrete market data events (ticks, bar closes, news) in an event loop; strategy, portfolio, and execution modules react to each event in chronological order.

2 Mechanics of Vector-Based Backtesting
2.1 Fixed Time-Step Processing
Historical data is loaded into arrays or dataframes where each row is a fixed interval (one day or
one minute) and each column is an asset’s price or indicator series. At each bar ti , indicators are
computed via vectorized operations (e.g., pandas’ .rolling(), NumPy functions) up to ti .

2.2 Signal Generation and Position Updates


At bar ti , the strategy applies logic such as moving-average crossovers or mean-reversion tests using
vectorized calculations (e.g., price.rolling(window=20).mean()) to decide long, short, or flat for
the next bar. All signals for all assets are computed simultaneously. The engine issues hypothetical
market orders filled at the next bar’s open or close, rebalancing positions in a single batch.
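This batch signal computation can be sketched as follows. The prices are synthetic and the column names and window lengths are illustrative, not tied to any real dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical daily close prices for two assets (geometric random walk).
rng = np.random.default_rng(42)
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=(500, 2)), axis=0)),
    columns=["AAA", "BBB"],
)

# Vectorized moving-average crossover: long (+1) when the fast MA is above
# the slow MA, flat (0) otherwise -- computed for all assets at once.
fast = prices.rolling(window=20).mean()
slow = prices.rolling(window=50).mean()
signals = (fast > slow).astype(int)

# Shift by one bar so the signal generated at bar t is executed at t+1,
# avoiding look-ahead into the bar being traded.
positions = signals.shift(1).fillna(0)
```

Note that the entire universe is processed in a handful of DataFrame operations; no per-bar loop is needed.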

2.3 Practical Implementation Steps


1. Data Preprocessing: Load adjusted OHLCV data into a pandas DataFrame indexed by
timestamp. Apply corporate-action adjustments correctly to avoid look-ahead bias.

2. Indicator Calculation: Use pandas’ vectorized functions—.rolling(), .ewm()—to derive technical indicators (moving averages, RSI, Bollinger Bands) for each asset in one pass.

3. Signal Matrix Construction: Compare indicator columns (e.g., 20-period MA vs. 50-
period MA) to produce +1 (long), −1 (short), or 0 (flat) signals for each asset, forming a
signals DataFrame parallel to the price DataFrame.

4. Portfolio Rebalancing: Multiply the signals by a position-sizing rule (fixed-dollar allocation or volatility scaling) to determine target position sizes. Simulate fills at the next bar’s open or close without modeling intra-bar order book dynamics.

5. P&L and Risk Metrics: Compute returns by comparing executed prices (next bar) to
current prices. Aggregate daily P&L, apply uniform transaction costs (fixed commission per
share), and generate equity curves, drawdowns, and other statistics.

2.4 Strengths of Vector-Based Approaches


• High Computational Efficiency: Vectorized operations run in parallel (SIMD/BLAS),
enabling rapid backtests on large universes (hundreds of assets, millions of rows).

• Simplicity of Implementation: A single pass over bars (or even fully vectorized code)
suffices—indicators, signals, and rebalances use DataFrame operations. This minimizes boil-
erplate for developers using pandas and NumPy.

• Well-Suited for Low-Frequency Research: When strategies trade daily or weekly using
end-of-day pricing, vectorized backtests yield accurate enough results for factor research or
cross-sectional studies.

2.5 Key Limitations and Pitfalls
• Look-Ahead Bias and Data Snooping: If indicator windows or signal logic reference fu-
ture bars (e.g., df.shift(-1)), performance becomes artificially inflated. Off-by-one indexing
errors occur easily in vectorized code.

• Unrealistic Execution Assumptions: Assuming fills at the next bar’s open or close ignores
intra-bar price moves, bid-ask spreads, and partial fills. Strategies requiring limit or stop
orders cannot be simulated faithfully.

• Inability to Handle Path Dependencies: Complex money-management rules—trailing stops based on intra-bar highs/lows or dynamic sizing based on realized P&L—cannot be implemented accurately when each bar is treated as a single event.

• Limited Intraday Risk Controls: Risk checks (intraday margin calls, real-time VaR limits)
cannot be enforced mid-bar; the model only reevaluates at bar boundaries.

3 Mechanics of Event-Based Backtesting


3.1 Event Loop Architecture
Event-based frameworks use a chronological event queue. The core loop processes events in order:

1. Market Data Event: New bar or tick arrives (trade, quote, or bar close).

2. Signal Event: Strategy logic consumes the market data and, if conditions match, emits a
signal (e.g., “Buy 100 XYZ at market”).

3. Order Event: Portfolio/Execution handler translates signals into orders (market, limit,
stop), specifying type, size, and price.

4. Fill Event: Broker simulator matches orders against market data (bid-ask, available volume)
and issues partial or full fills at realistic prices.

5. Update Event: Portfolio updates positions, cash, and risk metrics; may generate risk alarms
(margin calls, forced liquidations).

6. Performance Recording: Each fill triggers P&L and slippage calculations and is logged for performance analysis.

Market data is introduced only when available, avoiding look-ahead bias and enabling realistic
execution simulation.
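The events flowing through such a loop can be modeled as small value types. This sketch uses illustrative field names rather than any particular framework’s API; real engines carry richer payloads (order IDs, venues, timestamps as datetime objects):

```python
from dataclasses import dataclass

@dataclass
class MarketEvent:
    timestamp: str   # time of the bar/tick
    symbol: str
    price: float

@dataclass
class SignalEvent:
    symbol: str
    direction: int   # +1 long, -1 short, 0 flat

@dataclass
class OrderEvent:
    symbol: str
    order_type: str  # e.g. "MKT", "LMT"
    quantity: int

@dataclass
class FillEvent:
    symbol: str
    quantity: int
    fill_price: float
    commission: float
```

Separating event types this way lets each component subscribe only to the events it cares about.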

3.2 Component Breakdown


Market Data Handler: Streams historical tick or bar data into the event queue chronologically.
Prevents look-ahead by releasing data only when appropriate.

Strategy Module: Subscribes to data events. Maintains internal state (e.g., rolling indicators
updated incrementally) and emits signal events when conditions are met (e.g., price crosses
above a moving average).

Portfolio/Execution Handler: Receives signal events and creates order events (market, limit,
stop). Manages cash, positions, and risk exposure, enforcing predefined risk limits (e.g.,
maximum notional exposure).
Broker/Exchange Simulator: Processes order events by matching them against market data
(quotes or reconstructed order book). Models realistic slippage, partial fills, and commissions,
issuing fill events back to the portfolio.
Risk & Compliance Module: Monitors limits such as intraday VaR, position caps, and short-
sale constraints. Triggers forced liquidations or cancels orders if thresholds are breached.
Performance Recorder: Logs each fill event—timestamp, fill price, quantity, slippage, commis-
sion—for post-mortem analysis and performance metrics calculation.

3.3 Practical Implementation Steps


1. Data Ingestion: Load tick or bar data into a time-ordered event queue (e.g., collections.deque).
2. Initialize Components: Instantiate DataHandler, Strategy, PortfolioHandler, BrokerSimulator,
and PerformanceTracker, wiring them via a central event bus.
3. Event Loop Execution:
• If the queue is empty, the data handler enqueues the next market data event.
• Dequeue the next event:
– Market Data Event: Call Strategy.on_market_data() (may emit signal events)
and Portfolio.on_market_data() (may emit order events).
– Signal Event: Call Portfolio.on_signal() to generate order events, which are
enqueued.
– Order Event: Call Broker.execute_order() to simulate fills, issuing fill events back
into the queue.
– Fill Event: Call Portfolio.on_fill() to update positions and cash, then PerformanceTracker.on_fill()
to log P&L, slippage, and commissions.
• Repeat until data is exhausted and the queue is empty.
4. Post-Processing: Aggregate fill-level logs into performance metrics—Sharpe ratio, maxi-
mum drawdown, turnover—once the backtest completes.
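The loop above can be sketched end to end with a collections.deque and stand-in handlers. The toy strategy, its threshold, and the fixed fill price and commission are illustrative assumptions; a real engine would dispatch typed event objects to separate strategy, portfolio, and broker modules:

```python
from collections import deque

events = deque()
fills = []

# Illustrative bar data: (timestamp, symbol, price).
market_data = [("2024-05-01 09:31", "XYZ", 100.0),
               ("2024-05-01 09:32", "XYZ", 100.5),
               ("2024-05-01 09:33", "XYZ", 99.8)]
data_iter = iter(market_data)

def on_market_data(bar):
    # Toy strategy: emit a long signal when price exceeds 100.2.
    if bar[2] > 100.2:
        events.append(("SIGNAL", bar[1], +1))

def on_signal(sig):
    # Portfolio stub: translate the signal into a 100-share market order.
    events.append(("ORDER", sig[1], "MKT", 100))

def on_order(order):
    # Broker stub: fill at an assumed price with a fixed commission.
    events.append(("FILL", order[1], order[3], 100.5, 1.0))

def on_fill(fill):
    fills.append(fill)

while True:
    if not events:
        try:
            # Step 1: when the queue is empty, enqueue the next bar.
            events.append(("MARKET", *next(data_iter)))
        except StopIteration:
            break  # data exhausted and queue drained
    event = events.popleft()
    if event[0] == "MARKET":
        on_market_data(event[1:])
    elif event[0] == "SIGNAL":
        on_signal(event)
    elif event[0] == "ORDER":
        on_order(event)
    elif event[0] == "FILL":
        on_fill(event)
```

Because each event is fully processed (signal, order, fill) before the next bar is released, the strategy can never see future data.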

4 Practical Strengths and Trade-Offs


4.1 Speed and Scalability
• Vector-Based: A multi-asset, daily-rebalancing vectorized backtest (e.g., 500 tickers over
10 years of daily data) completes in under one second on a modern workstation, thanks to
optimized NumPy/pandas routines.
• Event-Based: Simulating minute-level data for the same universe (500 symbols × 2,500
trading days × 390 minutes ≈ 487 million data points) can take 15–30 minutes, even with
optimized Cython or PyPy code. Tick-level simulations (millions of ticks per symbol) may
require hours or distributed computing.

4.2 Data Storage and Preprocessing
• Vector-Based: Requires only bar-level OHLCV data (daily or minute). Storage is mod-
est—few gigabytes for multi-year minute data—and preprocessing focuses on corporate ac-
tions and basic cleaning.

• Event-Based: Requires tick or sub-bar data (trades and quotes), with storage volumes often
reaching terabytes for multi-year, multi-asset archives. Preprocessing involves reconstruct-
ing order books, filtering bad ticks, and merging multiple feeds, usually using specialized
databases (kdb+, Parquet, etc.).

4.3 Order Types and Execution Fidelity


• Vector-Based: Essentially limited to market orders at next bar open or close. Limit orders,
OCO, or stops can be approximated with heuristics (checking bar high/low), but partial fills
and depth-of-book effects cannot be modeled accurately.

• Event-Based: Supports market, limit, stop, stop-limit, iceberg, and custom TWAP/VWAP
algorithms. Orders are matched against tick-level quotes or reconstructed order books, en-
abling realistic slippage, partial fills, and volume-based execution cost modeling.

4.4 Risk Management Fidelity


• Vector-Based: Risk constraints—maximum drawdown, volatility caps, position size lim-
its—are enforced only at bar close. Intraday breaches (e.g., a 2% intraday drop triggering
liquidation) are missed until the next bar.

• Event-Based: Enables mid-bar risk checks: if a fill or price tick breaches risk limits (mar-
gin calls, stop-loss), the system can issue forced liquidations or cancel orders immediately,
mirroring live trading constraints.

4.5 Strategy Complexity and Path Dependencies


• Vector-Based: Recursive features—trailing stops based on intra-bar highs/lows, dynamic
sizing based on realized P&L—are difficult because bars are processed as atomic events.

• Event-Based: Naturally handles path dependencies by updating indicators and portfolio state on each event. For instance, a trailing stop adjusting to a high-water mark as ticks arrive can be coded directly within the event logic.

5 Avoiding Common Pitfalls


5.1 Look-Ahead Bias in Vectorized Backtests
Definition: Look-ahead bias happens when future information inadvertently influences past signal
generation. For example, using adjusted prices that “know” about splits or dividends before they
occur inflates performance. Mitigation:
1. Use raw price series and apply corporate-action adjustments only at or after the ex-date, not
beforehand.

2. Ensure indicator calculations reference only past bars (e.g., df['close'].shift(1)) when
generating signals for day T to execute on day T + 1.

3. Validate code by inspecting windows around known corporate actions to confirm correct
alignment.

5.2 Incorrect Bar Offset and Signal Lag


Issue: If signals are generated on bar T and executed on the same bar’s close, you implicitly assume
knowledge of that close price before it is available. Mitigation: Always shift signals by one bar
(e.g., signals = signals.shift(1)) so execution on bar T + 1 uses only information available at
bar T.
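A minimal illustration of the correct lag, using a short hypothetical price series:

```python
import pandas as pd

# Illustrative closes; the values are arbitrary.
close = pd.Series([100.0, 101.0, 102.0, 101.5, 103.0])
returns = close.pct_change()

# Signal computed from information available at bar T's close.
signal = (close > close.rolling(2).mean()).astype(int)

# Correct: lag the signal one bar, so the position taken for bar T+1
# is based only on bar T's close. Multiplying signal (unshifted) by
# returns directly would credit bar T's return to a signal that needed
# bar T's close to exist -- the off-by-one look-ahead bug.
strategy_returns = signal.shift(1) * returns
```

The first tradeable return appears one bar after the first signal, exactly as it would live.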

5.3 Poorly Modeled Slippage and Partial Fills


Issue: Vector-based backtests often apply a flat slippage rate (e.g., 5 bps per trade) uniformly,
ignoring volume, liquidity, and bid-ask dynamics, misrepresenting P&L in less liquid securities.
Mitigation:

1. Incorporate a tiered slippage model: e.g., orders under 5% of ADV incur 1 bp slippage; orders
between 5–10% of ADV incur 5 bps; orders above 10% incur 10 bps.

2. Use event-driven testing where fills draw from real bid-ask data and volume profiles.
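The tiered model from step 1 might be sketched as follows. The tier boundaries and basis-point values are the illustrative figures above, not calibrated numbers:

```python
def tiered_slippage_bps(order_shares: float, adv_shares: float) -> float:
    """Slippage in basis points as a function of the order's share of
    average daily volume (ADV). Tiers are illustrative, not calibrated."""
    participation = order_shares / adv_shares
    if participation < 0.05:       # under 5% of ADV
        return 1.0
    elif participation <= 0.10:    # 5-10% of ADV
        return 5.0
    return 10.0                    # above 10% of ADV

def apply_slippage(price: float, bps: float, side: int) -> float:
    """Adjust an execution price by slippage in basis points.
    side = +1 for buys (pay more), -1 for sells (receive less)."""
    return price * (1 + side * bps / 10_000)
```

Even this crude step function captures the key effect a flat rate misses: large orders in thin names pay disproportionately more.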

5.4 Event Loop Bottlenecks


Issue: Inefficient event queues become bottlenecks, slowing simulations over tick data significantly.
Mitigation:

1. Use efficient data structures (e.g., collections.deque or heapq) instead of a naïve Queue to
manage millions of events.

2. Apply Cython or PyPy optimizations in critical functions (fill matching, risk checks) to reduce
interpreter overhead.

3. Batch-process events sharing the same timestamp (e.g., multiple quotes at time t) to cut
Python loop overhead.
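Mitigations 1 and 3 can be combined in a small sketch: a heap keeps events globally time-ordered even when several feeds are merged, and itertools.groupby batches events that share a timestamp so they can be handled in one pass. The feed contents here are illustrative:

```python
import heapq
from itertools import groupby
from operator import itemgetter

# Two illustrative feeds of (timestamp, event_type, symbol) tuples.
feed_a = [(1, "quote", "AAA"), (3, "quote", "AAA")]
feed_b = [(1, "trade", "BBB"), (2, "quote", "BBB")]

# A heap merges the feeds into global time order; tuples compare by
# their first element (the timestamp) first.
heap = []
for ev in feed_a + feed_b:
    heapq.heappush(heap, ev)

ordered = [heapq.heappop(heap) for _ in range(len(heap))]

# Batch events sharing a timestamp so the engine crosses the Python
# dispatch boundary once per timestamp, not once per event.
batches = [(ts, list(evs)) for ts, evs in groupby(ordered, key=itemgetter(0))]
```

In a streaming setting the heap would be fed incrementally rather than pre-loaded, but the ordering and batching logic is the same.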

5.5 Inadequate Logging and Diagnostics


Issue: Without detailed logs of fills (timestamp, price, quantity, slippage, commission), diagnosing
discrepancies between vectorized and event-driven results is difficult. Mitigation:

1. Implement a fill-log table (Table 1) to capture event ID, order type, fill price, fill quantity,
timestamp, slippage, and commission.

2. Post-mortem: Compare fill logs to raw market data to identify anomalous slippage or execu-
tion issues.

Table 1: Sample Fill Log Table
Event ID Order ID Asset Order Type Fill Time Fill Price & Quantity
1001 5001 AAPL Market 2024-05-01 09:31 $175.20 / 100 shares
1002 5002 MSFT Limit @ $320.00 2024-05-01 09:35 $319.90 / 50 shares
1003 5003 GOOG Stop @ $2700.00 2024-05-02 10:15 $2700.00 / 10 shares

6 When to Choose Which Paradigm


6.1 Vector-Based Backtesting Suits
1. High-Level Research: Testing factor models, cross-sectional strategies, or statistical anal-
ysis on monthly or daily data where intra-bar details are negligible.

2. Rapid Prototyping: Iterating quickly over parameter grids (look-back windows, threshold
levels) to identify promising designs before deeper testing.

3. Limited Data Availability: When only end-of-day or minute-bar data is accessible and
tick data is too costly or unavailable.

6.2 Event-Based Backtesting Suits


1. Intra-Day & High-Frequency Strategies: When rapid reaction to price changes, order-
book dynamics, or cross-asset arbitrage opportunities within seconds or milliseconds is essen-
tial.

2. Complex Order Logic: Strategies requiring limit orders, iceberg orders, TWAP/VWAP
execution, or adaptive sizing need event-driven execution for realistic partial fills and slippage
modeling.

3. Institutional-Grade Infrastructure: For code intended to transition into a live trading system, event-driven frameworks allow swapping historical data feeds for live broker APIs and simulated brokers for real execution engines.

4. Dynamic Risk Management: Strategies constrained by intraday margin requirements or real-time VaR limits can enforce rules mid-bar only in an event-driven system.

7 Summary of Key Trade-Offs

Table 2: Summary of Key Trade-Offs

• Core Loop. Vector-Based: fixed time-step loop (daily/minute bars). Event-Based: event-driven loop (market data to signals to orders to fills).

• Execution Granularity. Vector-Based: bar-level only (one price per bar). Event-Based: tick- or intra-bar level, modeling bid-ask and volume.

• Order Types Supported. Vector-Based: market orders at next bar open/close; limited stop/limit simulation via bar high/low checks. Event-Based: market, limit, stop, stop-limit, iceberg, TWAP, VWAP, OCO.

• Slippage Modeling. Vector-Based: post-hoc flat slippage assumptions or simple tiers. Event-Based: dynamic slippage from bid-ask data, volume, order-book depth.

• Realism. Vector-Based: lower fidelity (assumes synchronized bar rebalances, no intra-bar execution nuances). Event-Based: higher fidelity (simulates realistic fills, partial fills, asynchronous signals).

• Speed & Scalability. Vector-Based: very fast (seconds for decades of data across hundreds of symbols via vectorization). Event-Based: slower (minutes to hours for minute/tick data via event loops).

• Code Complexity. Vector-Based: simple (a few hundred lines of DataFrame operations). Event-Based: complex (multi-module event loop, 500–1,000+ lines, intricate event/state management).

• Data Requirements. Vector-Based: OHLCV bars (daily or minute). Event-Based: tick or sub-minute data, order-book reconstruction, extensive cleaning.

• Risk Management Granularity. Vector-Based: bar-level only (risk checks at bar close). Event-Based: event-level (real-time risk checks, margin calls, intraday drawdown enforcement).

• Fidelity to Production Systems. Vector-Based: limited (needs rewriting for live streaming data). Event-Based: high (can often be reused in production with minimal changes).

• Ideal Use Cases. Vector-Based: low-frequency factor research, rapid prototyping. Event-Based: intra-day, high-frequency, algorithmic execution, market-making, institutional infrastructure.

8 Concluding Remarks
Vector-based backtesting excels at rapid prototyping for low-frequency, bar-level strategies by lever-
aging optimized NumPy and pandas routines to compute P&L across the entire dataset in seconds.
It is ideal for factor research or strategies where intra-bar dynamics are negligible. However, it
risks look-ahead bias and cannot model realistic execution aspects—slippage, partial fills, bid-ask
spreads—or enforce intraday risk controls.
Event-based backtesting provides a high-fidelity simulation of live trading: modeling tick-level
updates, sequential event processing, realistic slippage, partial fills, and dynamic risk controls.
This accuracy comes at the cost of greater complexity, more extensive data requirements (tick/sub-
minute), and longer runtimes. Event-driven engines are essential for intra-day or execution-sensitive
strategies—market-making, high-frequency trading, complex order logic—and facilitate a smoother
transition from backtest code to production trading systems.
Many quant teams adopt a hybrid approach: using vectorized backtests to screen and narrow
a universe of candidates, then migrating promising designs into an event-driven framework for
final validation before live deployment. By understanding trade-offs among speed, realism, data
requirements, and strategy complexity, practitioners can tailor their backtesting infrastructure to
meet research, development, and production needs effectively.
