6. Backtesting Engine and Performance

This chapter explains, from the perspective of usage and results, how qteasy’s backtesting engine works and how its performance optimizations and vectorization are implemented, so that users understand where backtest speed and correctness come from. For a lower-level architectural explanation, see Backtesting Engine and Performance (Design Perspective) in “Architecture and Design”.

6.1. 1. Backtesting Engine Overview

The backtest entry point and workflow are described in Backtest Overview and How to Run a Backtest: calling qt.run(op, mode=1, ...) enters backtest mode, and internally, at each time step in group_timing_table, the engine runs the strategy, parses the signals, and simulates trade execution.

Unlike “bar-by-bar event-driven” or “one-shot matrix operations over the full timeline” approaches, qteasy steps sequentially along the time dimension and vectorizes across the instrument dimension within each step:

  • Time dimension: Time steps are executed one after another, in the order in which trading actually takes place, so that states such as cash, positions, and the settlement queue (e.g., T+1) are maintained correctly.

  • Instrument dimension: Within each time step, the computations for all instruments in the asset pool (buy/sell volumes, fees, position updates, etc.) are vectorized with NumPy arrays and accelerated with Numba, which keeps each step efficient.

This design keeps states and rules consistent with live trading while avoiding the performance loss caused by per-tick loops in a purely event-driven approach.
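The combination described above can be illustrated with a minimal NumPy sketch. It is not qteasy's implementation: the function names `step` and `run`, the signal convention, and the flat fee rate are all assumptions made for illustration, and rules such as T+1 settlement and MOQ are left out. It only shows the shape of the idea: a Python loop over time, with every per-step calculation done as an array operation over all instruments.

```python
import numpy as np


def step(cash, holdings, prices, signal, fee_rate=0.001):
    """One time step (hypothetical): trade all instruments with array operations.

    signal[i] > 0: spend that fraction of the current cash on instrument i
    signal[i] < 0: sell that fraction of the current holding of instrument i
    """
    # Sells, computed for every instrument at once (no per-instrument loop)
    sell_qty = np.where(signal < 0, -signal * holdings, 0.0)
    sell_cash = sell_qty * prices * (1.0 - fee_rate)

    # Buys are budgeted from the cash held at the start of the step
    buy_spend = np.where(signal > 0, signal * cash, 0.0)
    buy_qty = buy_spend * (1.0 - fee_rate) / prices

    new_cash = cash - buy_spend.sum() + sell_cash.sum()
    new_holdings = holdings + buy_qty - sell_qty
    return new_cash, new_holdings


def run(prices, signals, init_cash=100_000.0):
    """Walk the timeline in order; cash and holdings carry over between steps."""
    cash = init_cash
    holdings = np.zeros(prices.shape[1])
    for t in range(prices.shape[0]):  # sequential along the time dimension
        cash, holdings = step(cash, holdings, prices[t], signals[t])
    return cash, holdings


# Three time steps, two instruments
prices = np.array([[10.0, 20.0], [11.0, 19.0], [12.0, 21.0]])
signals = np.array([[0.5, 0.25], [0.0, 0.0], [-1.0, 0.0]])
cash, holdings = run(prices, signals)
print(cash, holdings)
```

Because each step receives the state left by the previous one, constraints that depend on history can be enforced inside `step` without giving up array operations across instruments.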

6.2. 2. Performance and Vectorization

2.1 Core Computation Acceleration

The following core functions in backtesting and trade-result calculation are all accelerated with Numba JIT (@njit(nogil=True, cache=True)):

  • backtest_step: Within a single step, compute buy/sell volumes, fees, positions, and settlement updates for the whole batch of instruments.

  • calculate_trade_results: Trade results and cash-flow calculation.

  • backtest_batch_steps: Loop over time steps inside Numba and call backtest_step to complete the entire backtest segment.

  • backtest_flash_steps: Used only in optimization mode; keeps only the final cash and positions to save memory, and likewise completes batch stepping within Numba.

Some operators in the blender module, which handles signal blending (e.g., merging the signals of multiple strategies), are also accelerated with @njit. The main backtesting engine is therefore implemented with vectorization plus Numba rather than pure Python loops.
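The difference between a path that records every step and one that keeps only the final state can be sketched as follows. This is a plain NumPy illustration under stated assumptions, not the code of backtest_batch_steps or backtest_flash_steps: the names `apply_step`, `batch_steps`, and `flash_steps` are hypothetical, the trading rule is a toy (no fees, no position limits), and the Numba decorator is omitted so the snippet has no dependency beyond NumPy.

```python
import numpy as np


def apply_step(cash, holdings, prices, trade_qty):
    """Hypothetical single step: trade_qty > 0 buys, trade_qty < 0 sells."""
    cash = cash - float(np.dot(trade_qty, prices))
    return cash, holdings + trade_qty


def batch_steps(prices, trades, init_cash):
    """Record cash and holdings at every step (needed for full reports)."""
    n_steps, n_assets = prices.shape
    cash_hist = np.empty(n_steps)
    hold_hist = np.empty((n_steps, n_assets))
    cash, holdings = init_cash, np.zeros(n_assets)
    for t in range(n_steps):
        cash, holdings = apply_step(cash, holdings, prices[t], trades[t])
        cash_hist[t] = cash
        hold_hist[t] = holdings
    return cash_hist, hold_hist


def flash_steps(prices, trades, init_cash):
    """Keep only the final state, so memory does not grow with the step count."""
    cash, holdings = init_cash, np.zeros(prices.shape[1])
    for t in range(prices.shape[0]):
        cash, holdings = apply_step(cash, holdings, prices[t], trades[t])
    return cash, holdings


rng = np.random.default_rng(0)
prices = rng.uniform(5.0, 50.0, size=(250, 4))
trades = rng.integers(-10, 11, size=(250, 4)).astype(float)

cash_hist, hold_hist = batch_steps(prices, trades, 1_000_000.0)
final_cash, final_hold = flash_steps(prices, trades, 1_000_000.0)
print(final_cash == cash_hist[-1])
```

Both functions perform the same arithmetic in the same order, so the final state is identical; the second simply avoids allocating the history arrays, which matters when many parameter sets are evaluated and only the end result is compared.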

2.2 Time-Dimension Ordering and Vectorization Along the Instrument Dimension

| Dimension | Approach Overview |
|---|---|
| Time Dimension | Step forward sequentially to correctly maintain states such as T+1, settlement cycles, MOQ, etc., consistent with live trading. |
| Instrument Dimension | Perform array operations for all instruments within a single step (e.g., op_signal, own_amounts, etc. are 1D arrays) to achieve vectorization. |
| Optimization Mode | Multiple parameter sets are backtested in parallel via multiprocessing; within each set, efficient paths such as backtest_flash_steps are used to reduce memory and computation. |

On the first run, or when the argument types passed to these functions change, Numba compiles the relevant functions, which may introduce a one-time startup delay. Subsequent backtests of the same type reuse the cached compiled code and run faster.
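The one-time compilation delay can be observed with a small timing sketch. The function `total_fees` is a hypothetical example, not part of qteasy, and `cache=True` is left out here so the snippet does not depend on an on-disk cache location; the no-op fallback decorator is likewise only there so the snippet still runs where Numba is not installed.

```python
import time

import numpy as np

try:
    from numba import njit
except ImportError:
    def njit(*args, **kwargs):
        """No-op stand-in so the sketch still runs without Numba."""
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func


@njit(nogil=True)
def total_fees(amounts, prices, rate):
    """Sum a proportional fee over all traded instruments (hypothetical)."""
    total = 0.0
    for i in range(amounts.shape[0]):
        total += abs(amounts[i]) * prices[i] * rate
    return total


amounts = np.array([100.0, -200.0, 300.0])
prices = np.array([10.0, 20.0, 30.0])

start = time.perf_counter()
first = total_fees(amounts, prices, 0.001)   # includes JIT compilation when Numba is present
first_call = time.perf_counter() - start

start = time.perf_counter()
second = total_fees(amounts, prices, 0.001)  # same argument types: compiled code is reused
second_call = time.perf_counter() - start

print(f"first call: {first_call:.6f}s, second call: {second_call:.6f}s")
```

With Numba installed, the first call is typically much slower than the second because it includes compilation; the actual numbers depend on the machine and the Numba version.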

2.3 Brief comparison with VectorBT

| Comparison item | qteasy | VectorBT |
|---|---|---|
| Time Dimension | Step forward sequentially, vectorized within each step | One-shot matrix operations over the full timeline, with no explicit time loop |
| States and constraints | Explicitly maintains T+1, settlement, MOQ, etc., closely matching live trading | Mostly simplified equity curves; does not emphasize settlement/MOQ |
| Applicable scenarios | When you need to accurately simulate A-share rules, merge multiple strategies, and model detailed costs and settlement | Rapid screening of massive parameter combinations; research-oriented backtesting |

While keeping state correct and rules configurable, qteasy achieves its speed by combining sequential stepping along the time axis, vectorization along the instrument axis, and Numba-compiled single-step execution. For multi-parameter optimization, multiprocessing parallelism compensates for not being able to broadcast the full matrix in a single run.
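A small worked example may help show why state constraints call for sequential stepping. It is a generic illustration, not the behavior of qteasy or of any other library: the order sizes, the prices, and the rule "a buy is capped by the cash available at that step" are assumptions chosen to keep the arithmetic easy to follow.

```python
import numpy as np

prices = np.array([10.0, 10.0, 10.0])
orders = np.array([600.0, 600.0, -300.0])  # desired share changes, one instrument
init_cash = 10_000.0

# One-shot calculation over the whole timeline: no state is checked along the way
cash_one_shot = init_cash - np.cumsum(orders * prices)

# Sequential calculation: each buy is capped by the cash left from earlier steps
cash = init_cash
cash_seq = []
filled = []
for qty, price in zip(orders, prices):
    if qty > 0:
        qty = min(qty, cash // price)  # cannot buy more than the cash allows
    cash -= qty * price
    filled.append(float(qty))
    cash_seq.append(float(cash))

print(cash_one_shot)  # goes negative at the second step
print(filled, cash_seq)
```

The second order can only be partially filled because the first one already used most of the cash. The fill at each step depends on the outcome of all earlier steps, which is why such a rule is enforced by walking the timeline in order.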

6.3. 3. This Directory and Related Documents