10. Process data (proc.*) and dynamic backtesting (design specification)

This chapter defines process data in qteasy and describes how strategies access it. It also covers how a backtest chooses between the static and dynamic branches, and the conventions that keep the two branches consistent. It is intended as a reference for implementing and extending this feature. For usage-level explanations, read it together with “How Strategies Declare and Use Data” and the API documentation.

10.1. 1. Background and Objectives

1.1 Meaning of Process Data

Some strategies need to rely on data that changes with the backtest or live trading execution path, for example:

  • Current/historical positions, available cash;

  • Historical filled quantity, fill price, transaction costs;

  • Market value, total assets, etc. derived from positions and prices.

Such data cannot be pre-generated in one go before a backtest starts; it can only be maintained during runtime by the Backtester (backtesting) or Trader (live trading), and provided to the strategy at each step of signal generation according to the “currently visible scope”. We collectively refer to it as process data.

1.2 Design Objectives

  • Unified entry point: Like static historical data, process data is obtained via Strategy.get_data(), reducing the learning curve.

  • No look-ahead: When generating the signal for step k, the strategy cannot see the execution results of step k; it can only use the history of completed steps.

  • Backtest/live consistency: The same strategies and the same get_data('proc.xxx') calling pattern work in both backtesting and live trading. When process data is needed, the backtest follows the dynamic execution path; otherwise it can stay on the original static path and produce the same results.

10.2. 2. Unified Definition of Process Data (proc.*)

2.1 Naming and Exposure Method

  • All process data is exposed to the strategy in the form proc.<field_name>, such as proc.own_cash and proc.trade_records.

  • Process data does not need to be declared via data_types in the strategy’s __init__. It is injected at runtime by the Backtester / Trader, and the strategy only needs to call get_data('proc.xxx', ...) in realize() as needed.

2.2 Built-in fields implemented

The process data fields implemented in the current version and available for use in strategies include:

Category           Field name              Meaning
-----------------  ----------------------  ------------------------------------------------------------
Account scalar     proc.own_cash           Total cash in the account at the start of the current step
Account scalar     proc.available_cash     Cash available for placing orders at the start of the current step
Account scalar     proc.total_value        Total asset market value at the start of the current step (position valuation + cash)
Position vector    proc.own_amounts        Total position quantity of each instrument at the start of the current step
Position vector    proc.available_amounts  Sellable quantity of each instrument at the start of the current step
Position vector    proc.position_value     Position market value of each instrument at the start of the current step (calculated from the internal price and positions)
Execution results  proc.trade_records      Actual executed quantity of each instrument at each step (positive for buys, negative for sells)
Execution results  proc.trade_cost         Transaction costs of each instrument at each step
Execution results  proc.trade_price        Execution price of each instrument at each step

For the time semantics and visibility constraints of the above fields in backtesting and live trading, see Section 4.

2.3 Future extensible fields (optional)

Fields that can be further extended by design include: proc.realized_pnl, proc.unrealized_pnl, proc.last_trade_price, proc.last_trade_volume, etc. Refer to the implementation and documentation for details.

10.3. 3. Access Interface: Strategy.get_data() and proc.*

3.1 Static data (no proc. prefix)

  • Single source: self.get_data('close_E_d'); multiple sources: self.get_data('close_E_d', 'high_E_d').

  • Static data does not support the lag / window parameters; if either is provided, a ValueError with an English message is raised.

3.2 Process data (proc. prefix)

  • Call examples:

    • self.get_data('proc.own_cash'): the cash series up to the current step;

    • self.get_data('proc.own_cash', lag=0): the cash value at the most recent step;

    • self.get_data('proc.own_cash', lag='1d'): the step corresponding to looking back 1 day by time;

    • self.get_data('proc.own_cash', window='5d'): a window slice over the past 5 days.

  • Constraints:

    • A single call accepts only one proc.* field; passing several proc.* fields, or mixing a proc.* field with static data in the same call, raises a ValueError with an English message.

    • lag and window cannot be specified at the same time; lag can be an integer (steps) or a string (e.g., '1d', '8h'), and window is a string (e.g., '5d', '8h').

  • Return value: always np.ndarray; the shape and data type are subject to the API documentation.
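
The argument rules in 3.1 and 3.2 can be summarized in a small standalone checker. This is a minimal sketch, not qteasy code: the names parse_span and check_get_data_args are hypothetical, and the unit table assumes only the 'd' and 'h' suffixes that appear in the examples above.

```python
from datetime import timedelta
from typing import Optional, Union

# Hypothetical unit table: only the 'd' and 'h' suffixes appear in this chapter.
_UNITS = {"d": "days", "h": "hours"}


def parse_span(span: str) -> timedelta:
    """Turn a string such as '5d' or '8h' into a timedelta."""
    number, unit = span[:-1], span[-1:]
    if unit not in _UNITS or not number.isdigit():
        raise ValueError(f"unsupported time span: {span!r}")
    return timedelta(**{_UNITS[unit]: int(number)})


def check_get_data_args(*names: str,
                        lag: Optional[Union[int, str]] = None,
                        window: Optional[str] = None) -> str:
    """Validate a get_data()-style call and report which kind of request it is."""
    if not names:
        raise ValueError("at least one data name is required")
    proc_names = [n for n in names if n.startswith("proc.")]
    if not proc_names:
        # Static data: several sources are fine, lag/window are not.
        if lag is not None or window is not None:
            raise ValueError("lag/window are only supported for proc.* data")
        return "static"
    if len(names) > 1:
        # Covers both several proc.* fields and proc.* mixed with static data.
        raise ValueError("only one proc.* field is allowed per call")
    if lag is not None and window is not None:
        raise ValueError("lag and window cannot be used together")
    if isinstance(lag, str):
        parse_span(lag)  # string lags must be a valid time span
    if window is not None:
        parse_span(window)
    return "proc"
```

For example, check_get_data_args('close_E_d', 'high_E_d') is accepted as a static request, while check_get_data_args('proc.own_cash', 'close_E_d') raises ValueError because it mixes the two kinds of data.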

10.4. 4. Backtest Branch Selection and Process Data Coordination

4.1 Static branch and dynamic branch

  • Static branch (_backtest_static_operator): Call run_strategies once for all time steps to generate all signals, then complete the backtest using Numba vectorized functions such as backtest_batch_steps. Suitable for strategies that do not depend on process data.

  • Dynamic branch (_backtest_dynamic_operator): Loop over time steps; at each step generate signals → parse and simulate fills → update positions and cash, then move to the next step. Process data is maintained by the Backtester and injected into the Operator before each step, so strategies can access it via get_data('proc.xxx').

Before running, the Backtester decides which branch to take via Operator.check_dynamic_data().
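
The step order of the dynamic branch can be illustrated with a toy single-asset loop. This is a sketch under simplifying assumptions (one instrument, no transaction costs, a signal that is simply a quantity to trade); run_dynamic and buy_while_rich are hypothetical names and do not reflect the real Backtester internals.

```python
def run_dynamic(prices, strategy, cash=1000.0):
    """Toy single-asset step loop: signal -> fill -> update -> next step."""
    own_cash, own_amount, trade_records = [], [], []
    amount = 0.0
    for k, price in enumerate(prices):
        # The state at the start of step k is recorded before the strategy runs.
        own_cash.append(cash)
        own_amount.append(amount)
        # The strategy sees account state up to step k and trades up to step k-1.
        signal = strategy(own_cash[:k + 1], own_amount[:k + 1], trade_records[:k])
        # Simulated fill: the signal is a quantity to buy (+) or sell (-),
        # clipped so that neither cash nor the position goes negative.
        qty = max(-amount, min(signal, cash / price))
        cash -= qty * price
        amount += qty
        trade_records.append(qty)
    return own_cash, own_amount, trade_records


def buy_while_rich(cash_hist, amount_hist, trade_hist):
    """Buy one unit per step while cash at the start of the step is at least 500."""
    return 1.0 if cash_hist[-1] >= 500 else 0.0


cash_hist, amount_hist, trades = run_dynamic([100, 200, 300, 400], buy_while_rich)
# cash_hist   -> [1000.0, 900.0, 700.0, 400.0]
# amount_hist -> [0.0, 1.0, 2.0, 3.0]
# trades      -> [1.0, 1.0, 1.0, 0.0]
```

Because the decision at step k depends on cash that earlier fills have already changed, the signals cannot all be computed in one pass up front, which is why such a strategy needs the step-by-step path.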

4.2 Decision logic of check_dynamic_data() (current implementation)

If any of the following is true, return True and take the dynamic branch:

  1. op_type == 'stepwise': The Operator is explicitly configured in stepwise mode.

  2. Use of proc.* in strategy source code: _strategies_use_proc_data() checks whether the source code of each strategy’s realize() contains 'proc.' or "proc.". If so, the strategy is considered to depend on process data.

Therefore, as long as get_data('proc.xxx') is called in realize(), it will automatically take the dynamic branch without any declaration. Process data is accessed only via proc.*; declaring it via DataType is no longer supported (legacy op_* types have been removed).
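
The decision described above amounts to a text scan plus a mode check. The following sketch is an illustration only: source_uses_proc and needs_dynamic_branch are hypothetical names, the scan assumes that the match is a quote character immediately followed by proc., and "batch" is a placeholder for any op_type other than 'stepwise'.

```python
from typing import Iterable


def source_uses_proc(source: str) -> bool:
    """Return True if the text contains a string literal starting with proc."""
    # Assumption: look for a single or double quote directly followed by "proc.".
    return "'proc." in source or '"proc.' in source


def needs_dynamic_branch(op_type: str, realize_sources: Iterable[str]) -> bool:
    """Mirror the two conditions listed above on plain source strings."""
    if op_type == "stepwise":
        return True
    return any(source_uses_proc(src) for src in realize_sources)


static_src = "def realize(self):\n    close = self.get_data('close_E_d')\n"
proc_src = "def realize(self):\n    cash = self.get_data('proc.own_cash', lag=0)\n"

print(needs_dynamic_branch("batch", [static_src]))            # False
print(needs_dynamic_branch("batch", [static_src, proc_src]))  # True
print(needs_dynamic_branch("stepwise", [static_src]))         # True
```

In a real check the source text of a method can be obtained with inspect.getsource() from the standard library. One consequence of matching on text is that a quoted 'proc.' inside a comment or docstring of realize() would presumably also select the dynamic branch.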

4.3 No look-ahead guarantee

  • When the strategy generates a signal at step k:

    • Account/position data (e.g., own_cashes, own_amounts) is visible at most up to index [0..k] (i.e., the state at the start of the current step);

    • Trade-related data (e.g., trade_records, trade_cost) is visible at most up to [0..k-1], excluding trades that have not yet occurred in this step.

  • In implementation, Operator’s _current_signal_index and Strategy’s _get_process_data_single() slice by the above ranges. The Backtester updates the index before generating signals at each step, ensuring no look-ahead.
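
The two visibility ranges can be written as a single slicing rule. This is a minimal sketch on plain Python lists; visible_history is a hypothetical helper, and the grouping of fields into the two sets is an assumption based on the table in Section 2.2 (in particular, trade_price is assumed to follow the same rule as trade_records and trade_cost).

```python
# Fields describing the state at the start of a step: visible up to index k.
ACCOUNT_FIELDS = {
    "own_cash", "available_cash", "total_value",
    "own_amounts", "available_amounts", "position_value",
}
# Fields describing executions: visible up to index k-1 only.
TRADE_FIELDS = {"trade_records", "trade_cost", "trade_price"}


def visible_history(field: str, series: list, k: int) -> list:
    """Return the part of `series` a strategy may see when generating the step-k signal."""
    if field in ACCOUNT_FIELDS:
        return series[:k + 1]   # indices 0..k, including the start of step k
    if field in TRADE_FIELDS:
        return series[:k]       # indices 0..k-1, trades of step k do not exist yet
    raise ValueError(f"unknown process data field: {field!r}")


cash = [1000.0, 900.0, 700.0, 400.0]
trades = [1.0, 1.0, 1.0, 0.0]
print(visible_history("own_cash", cash, 2))         # [1000.0, 900.0, 700.0]
print(visible_history("trade_records", trades, 2))  # [1.0, 1.0]
print(visible_history("trade_records", trades, 0))  # []
```

At step 0 the trade history is empty, which is the same situation a strategy sees in a single live-trading step.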

4.4 The injection relationship between Backtester and Operator

  • Backtester (dynamic branch): at the _backtest_dynamic_operator entry point, inject own_cashes, available_cashes, own_amounts_array, available_amounts_array, trade_records_array, trade_cost_array, trade_price_array, trade_price_data, etc. into the Operator as _process_data_sources, and set _process_time_index to a timeline aligned with op_signal_index.

  • Operator: In run_strategy(step_index), before each call to stg.generate(), compute and update _current_signal_index based on group_timing_table and group_merge_type, for Strategy to slice the “currently visible” process data.

4.5 Process data in live trading (Trader)

When operator.check_dynamic_data() is True, Trader will, in _run_strategy(), do the following:

  • Assemble the current account cash, positions, available quantities, current prices, etc. into single-step _process_data_sources and _process_time_index (a single live-trading run is treated as one step);

  • Within this step, the strategy can call get_data('proc.own_cash'), etc. to obtain the current account/position view; the trade history is empty within this step, consistent with the semantics of “no trades have occurred in this step yet”.
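
A single live-trading step can be pictured as a history of length one for account data and length zero for trade data. The sketch below is hypothetical: build_single_step_sources is an invented helper, the dictionary keys reuse the array names listed in Section 4.4, the "current_prices" key is a placeholder name, and the actual structure used by Trader may differ.

```python
from datetime import datetime


def build_single_step_sources(cash, available_cash, amounts,
                              available_amounts, prices, now):
    """Pack the current account snapshot as a one-step process data history."""
    sources = {
        # Account and position data: exactly one row, the state "now".
        "own_cashes": [cash],
        "available_cashes": [available_cash],
        "own_amounts_array": [list(amounts)],
        "available_amounts_array": [list(available_amounts)],
        "current_prices": [list(prices)],  # placeholder key name
        # Trade data: no rows, since nothing has been executed in this step yet.
        "trade_records_array": [],
        "trade_cost_array": [],
        "trade_price_array": [],
    }
    time_index = [now]  # a single timestamp, aligned with the single step
    return sources, time_index


sources, time_index = build_single_step_sources(
    cash=5000.0, available_cash=4800.0,
    amounts=[100, 0], available_amounts=[100, 0],
    prices=[12.5, 30.0], now=datetime(2024, 1, 2, 9, 30),
)
print(len(time_index), sources["own_cashes"], sources["trade_records_array"])
```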

10.5. 5. Dynamic/Static Path Consistency Conventions

  • When a strategy does not use process data: it should take the static branch; if it takes the dynamic branch for other reasons, the backtest results should be exactly identical numerically to the static branch (under the same configuration and data).

  • When a strategy uses process data: it must go through the dynamic branch; otherwise _process_data_sources is never injected and a RuntimeError is raised.

  • Test convention: use StaticSignalStg (purely static) and ProcAwareButStaticLogicStg (calls proc but does not use it for signals) to backtest under the same configuration, and assert numerical consistency for own_cashes, own_amounts_array, trade_records_array, etc.; see Group B in tests/test_process_data_api.py.
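
The consistency assertion in this convention boils down to an exact, step-by-step comparison of the result arrays from the two runs. The sketch below shows that comparison on plain nested lists; find_mismatches is a hypothetical helper, the result dictionaries are made-up toy data, and the real tests in tests/test_process_data_api.py may be structured differently.

```python
def find_mismatches(static_result, dynamic_result,
                    keys=("own_cashes", "own_amounts_array", "trade_records_array")):
    """Return (key, step) pairs where the two runs differ; step is None for a length mismatch."""
    mismatches = []
    for key in keys:
        a, b = static_result[key], dynamic_result[key]
        if len(a) != len(b):
            mismatches.append((key, None))
            continue
        for step, (x, y) in enumerate(zip(a, b)):
            if x != y:  # exact equality, no tolerance
                mismatches.append((key, step))
                break   # report only the first differing step per key
    return mismatches


static_run = {
    "own_cashes": [1000.0, 900.0],
    "own_amounts_array": [[0.0], [1.0]],
    "trade_records_array": [[1.0], [0.0]],
}
dynamic_run = {key: [row for row in rows] for key, rows in static_run.items()}

assert find_mismatches(static_run, dynamic_run) == []
```

With NumPy arrays, the same check is usually expressed with numpy.testing.assert_array_equal, which also compares values exactly.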

10.6. 6. Test and Documentation Index

  • Dedicated tests: tests/test_process_data_api.py

    • Group A: the behavior of check_dynamic_data() under purely static strategies / strategies using proc.*;

    • Group B: backtest array consistency between static strategies and “calling proc but logically equivalent” strategies;

    • Group C: get_data behavior and errors: static multi-source calls, rejection of lag/window on static data, the single-field rule for proc.*, and rejection of mixed calls;

    • Group D: no look-ahead validation for proc.trade_records;

    • Group E: correctness of the real dynamic-strategy path based on process data.

  • Project memory: .cursor/rules/process-data-and-dynamic-backtest.mdc (implementation and convention summary).

  • Related chapters: for an overview of strategy data, see “How Strategies Declare and Use Data”; for backtest entry points and modes, see “Backtesting, Live Trading, and Optimization”.