3. Data Acquisition, Storage, and Data Types
3.1. 1. The Data Layer's Place in the Overall Architecture
The data layer provides the strategy layer and the runtime layer with historical information “by type, by window.” Strategies do not access raw data tables directly; instead, they declare which DataType they need and how long the window_length should be. Before execution, the engine retrieves the corresponding data from the DataSource, organizes it into data windows, and injects it into the strategy. This ensures that backtesting and live trading use the same data view, and it also prevents strategies from using “future data” by design.
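The declare-then-inject flow described above can be illustrated with a minimal sketch. It is not qteasy's actual engine: `DataRequest`, `build_window`, `MeanStrategy`, and `run` are hypothetical names, the data source is a plain dict, and the sketch assumes the value at the current step is already visible to the strategy.

```python
from dataclasses import dataclass


@dataclass
class DataRequest:
    """Hypothetical declaration of what a strategy needs."""
    dtype_id: str
    window_length: int


def build_window(history, step, window_length):
    """Return at most `window_length` values ending at `step` (inclusive).

    Values after `step` are never included, so the strategy cannot see them.
    """
    start = max(0, step - window_length + 1)
    return history[start: step + 1]


class MeanStrategy:
    """Toy strategy: declares its data need, then averages the injected window."""
    request = DataRequest(dtype_id="close_E_d", window_length=3)

    def realize(self, window):
        return sum(window) / len(window)


def run(strategy, source, n_steps):
    """Toy engine loop: look up the declared data, slice a window, inject it."""
    history = source[strategy.request.dtype_id]
    results = []
    for step in range(n_steps):
        window = build_window(history, step, strategy.request.window_length)
        results.append(strategy.realize(window))
    return results


# Stand-in for data already stored locally (illustrative values only).
source = {"close_E_d": [10.0, 11.0, 12.0, 13.0, 14.0]}
signals = run(MeanStrategy(), source, n_steps=5)
print(signals)
```

Because the strategy only ever receives the window, the same `realize` method could be driven by a backtest loop or a live loop without modification.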
3.2. 2. From Raw Data to Local Storage
2.1 Data Flow
External data sources (e.g., Tushare, Eastmoney) → fetching and cleaning (format unification, deduplication, alignment) → DataSource (files or database), where the cleaned data is written into multiple data tables with a unified structure.
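As a rough illustration of the cleaning step, the sketch below normalizes two date formats, drops duplicate records, and orders the result. The function `clean_rows`, the row layout, and the sample values are assumptions made for this example, not qteasy's actual cleaning code. Sorting by code and date stands in for alignment here, which in practice may also involve matching rows to a trading calendar.

```python
from datetime import datetime


def clean_rows(rows):
    """Unify date format, drop duplicates, and sort rows by (code, date)."""
    cleaned = {}
    for row in rows:
        raw_date = row["date"]
        # Format unification: accept "YYYYMMDD" or "YYYY-MM-DD".
        fmt = "%Y-%m-%d" if "-" in raw_date else "%Y%m%d"
        date = datetime.strptime(raw_date, fmt).date().isoformat()
        # Deduplication: the last record seen for a (code, date) key wins.
        cleaned[(row["code"], date)] = {
            "code": row["code"],
            "date": date,
            "close": float(row["close"]),
        }
    # Simplified alignment: return rows in a stable (code, date) order.
    return [cleaned[key] for key in sorted(cleaned)]


# Illustrative raw rows, as two different providers might return them.
raw = [
    {"code": "000001.SZ", "date": "20240102", "close": "10.5"},
    {"code": "000001.SZ", "date": "2024-01-02", "close": "10.6"},
    {"code": "000001.SZ", "date": "20240101", "close": "10.0"},
]
rows = clean_rows(raw)
print(rows)
```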
2.2 The Role of DataSource
Unified storage abstraction: regardless of whether the underlying layer is CSV/HDF/Feather files or a database such as MySQL, to the upper layer it is always read and written “by table name, by column”.
Multi-engine support: Can be configured as file (csv/hdf/fth) or db (MySQL, etc.) to meet different deployment needs.
Does not proactively pull data: DataSource is only responsible for reading and writing data that already exists locally; fetching and updating from the network is done by the user or scheduled tasks calling the data interface, and then writing into DataSource.
Therefore, the data-layer design separates “fetching” and “storage”: data obtained from fetching is written into the DataSource after cleaning, and strategies and backtests only consume data that already exists in the DataSource.
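The separation between fetching and storage can be sketched as follows. This is a toy model, not the real DataSource class: `ToyFileSource`, its `write_table` and `read_table` methods, the table name `stock_daily`, and `fetch_daily_bars` are all hypothetical, and only a CSV backend is shown.

```python
import csv
import tempfile
from pathlib import Path


class ToyFileSource:
    """Hypothetical CSV-backed store with one file per table."""

    def __init__(self, root):
        self.root = Path(root)

    def write_table(self, table, rows):
        """Write a non-empty list of dict rows to <root>/<table>.csv."""
        path = self.root / f"{table}.csv"
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)

    def read_table(self, table, columns):
        """Read only the requested columns of a table (values stay strings)."""
        path = self.root / f"{table}.csv"
        with path.open(newline="") as f:
            return [{c: row[c] for c in columns} for row in csv.DictReader(f)]


def fetch_daily_bars():
    """Stand-in for a network fetch; returns hard-coded sample rows."""
    return [
        {"code": "000001.SZ", "date": "2024-01-02", "open": "10.1", "close": "10.5"},
        {"code": "000001.SZ", "date": "2024-01-03", "open": "10.5", "close": "10.8"},
    ]


with tempfile.TemporaryDirectory() as tmp:
    source = ToyFileSource(tmp)
    # Fetching happens outside the store; the result is then written in.
    source.write_table("stock_daily", fetch_daily_bars())
    # Consumers only read what already exists locally, by table and column.
    closes = source.read_table("stock_daily", ["date", "close"])

print(closes)
```

A database-backed class exposing the same two methods could replace `ToyFileSource` without any change to the calling code, which is the point of the unified storage abstraction.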
3.3. 3. Data Tables
Built-in table partitioning: qteasy predefines multiple data tables, roughly including market data (e.g., daily bars, minute bars), financial data (income statement, balance sheet, etc.), macro and indices, and so on. Each table has a fixed table name, primary key, and column schema.
Mapping between table schema and DataType: A piece of “information that a strategy can reference” often corresponds to a column in a table (or one or more columns after computation). The mapping from a DataType’s name (and freq, asset_type) to data tables/columns is maintained within the system. Strategies only need to declare requirements via DataType or dtype_id, without caring about specific table names or column names.
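One way to picture the mapping is a lookup table from dtype_id to a (table, column) pair. The sketch below is only an illustration: `DTYPE_MAP`, `resolve`, `get_column`, and the table and column names are hypothetical, and qteasy's internal mapping may be organized differently.

```python
# Hypothetical mapping from dtype_id to (table name, column name).
DTYPE_MAP = {
    "close_E_d": ("stock_daily", "close"),
    "open_E_d": ("stock_daily", "open"),
    "close_IDX_d": ("index_daily", "close"),
}

# Stand-in for locally stored tables (illustrative values only).
TABLES = {
    "stock_daily": [
        {"date": "2024-01-02", "open": 10.1, "close": 10.5},
        {"date": "2024-01-03", "open": 10.5, "close": 10.8},
    ],
    "index_daily": [
        {"date": "2024-01-02", "close": 3000.0},
        {"date": "2024-01-03", "close": 3010.0},
    ],
}


def resolve(dtype_id):
    """Translate a dtype_id into the table and column that hold the data."""
    return DTYPE_MAP[dtype_id]


def get_column(tables, dtype_id):
    """Return the values for a dtype_id without exposing table details."""
    table, column = resolve(dtype_id)
    return [row[column] for row in tables[table]]


stock_close = get_column(TABLES, "close_E_d")
index_close = get_column(TABLES, "close_IDX_d")
print(stock_close, index_close)
```

If a column were renamed or moved to another table, only the corresponding `DTYPE_MAP` entry would change; callers that request `"close_E_d"` would be unaffected.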
3.4. 4. DataType: From “Data in Tables” to “Information a Strategy Can Reference”
4.1 name, freq, asset_type, and dtype_id
name: the data type name (e.g., close, open, total_mv, pe), corresponding to a certain kind of available information.
freq: the data frequency (e.g., d/w/m/q), related to the time granularity of the data table or the resampling method.
asset_type: the applicable asset type (e.g., E, IDX, ANY), used to distinguish stocks, indices, etc.
dtype_id: generated from the three items above, with the rule name_assettype_freq, for example close_E_d, total_mv_E_q. Strategies use this id when calling get_data(dtype_id).
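The naming rule can be expressed as a small helper. `DataTypeKey` and `parse_dtype_id` are hypothetical names for this sketch, not part of the qteasy API. Because a name such as total_mv can itself contain underscores, the parser splits from the right, which assumes that asset_type and freq never contain an underscore.

```python
from typing import NamedTuple


class DataTypeKey(NamedTuple):
    """Hypothetical container for the three parts of a dtype_id."""
    name: str
    asset_type: str
    freq: str

    @property
    def dtype_id(self):
        # Rule from the text: name_assettype_freq
        return f"{self.name}_{self.asset_type}_{self.freq}"


def parse_dtype_id(dtype_id):
    """Split a dtype_id into its parts, taking the last two fields from the right."""
    name, asset_type, freq = dtype_id.rsplit("_", 2)
    return DataTypeKey(name, asset_type, freq)


key = DataTypeKey(name="close", asset_type="E", freq="d")
parsed = parse_dtype_id("total_mv_E_q")
print(key.dtype_id, parsed)
```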
4.2 Mapping between built-in DataTypes and data tables/columns
The system comes with a large set of built-in DataTypes, each mapped to columns or derived columns in different data tables. For specific table names, column names, and how to obtain them, see Download and Manage Financial Historical Data and the “data types and data tables” documentation in the API docs. This series only emphasizes: strategies reference data only via DataType (dtype_id) and do not directly depend on the table schema. This way, when the schema evolves, you only need to adjust the mapping, without changing strategy code.
3.5. 5. Summary: Why Strategies Only Touch DataType and Do Not Read Tables Directly
Consistency: Both backtesting and live trading fetch data through the same path (“declare DataType + engine injects by window”), which avoids maintaining two separate sets of logic.
No look-ahead bias: The engine strictly prepares the historical data window based on the current time step and window_length, so the strategy cannot access future data that has not been injected.
Unified interface: All strategies fetch data via get_data(dtype_id). Each dtype_id corresponds one-to-one with a DataType, making maintenance and extension easier.
For more on data configuration and API usage, see Download and Manage Financial Historical Data and the API reference.