5. Local data source — DataSource object

The local data source is the core of qteasy’s data management functionality. All data must first be downloaded and saved in the local data source before it can be used by qteasy. qteasy uses DataSource objects to manage the local data source. The local data source is similar to a database, containing a series of predefined data tables. The DataSource class provides a series of APIs for managing the data in the data tables, including reading, updating, deleting, and querying data.

To create a new DataSource object, you can use the following command:

>>> import qteasy as qt
>>> ds = qt.DataSource()

By printing the DataSource object, you can view its most basic properties:

>>> print(ds)
file://csv@qt_root/data/

From the printed string file://csv@qt_root/data/ of DataSource, we can see its basic properties.

The printed results contain basic information about the data source object:

  • file: – The type of data source. qteasy supports two types of data sources: file-based and database-based. file means that all data tables in the data source are stored on disk as files. Similarly, data can also be stored in a mysql database, in which case the data source type code is db.

  • csv: – Data file saving format. qteasy can save data files to disk in different formats. The most basic format is csv, but users can also save data as hdf and fth (feather) files to meet different performance preferences.

  • qt_root/data/: – The path to save data files. qt_root represents the installation root path of qteasy. By default, all data files are saved in the /data/ subdirectory under the root directory.

Similarly, you can create a data source object with the data storage type “database” and view its basic properties. However, when creating a data source of type “database”, you need to specify the hostname, username, password, and/or database name of the MySQL database you want to connect to.

>>> ds_db = qt.DataSource(source_type='database', host='localhost', user='您的用户名', password='您的密码', db_name='test_db')
>>> print(ds_db)
db:mysql://localhost@3306/test_db

Similarly, the printed results contain basic information about the data source:

  • db:mysql – The data source type is a MySQL database; all data tables are stored in this database.

  • localhost – the hostname of the database

  • 3306 – Database connection port

  • test_db – Database name. If no name is specified, the default database is qt_db, but this database must be created beforehand.

Different data source objects, if they have different storage methods and storage paths, will have no relationship with each other and will not interfere with each other’s data. However, if two data sources point to the same path and have the same file type, then they will store duplicate data.

To view more basic properties of the data source, you can access several properties of the data source object:

>>> print(ds.source_type)  # 数据源的类型
file
>>> print(ds_db.source_type)
db
>>> print(ds.file_type, ds.file_path)  # 数据源的文件类型和存储路径
csv /Users/jackie/Projects/qteasy/qteasy/data/
>>> print(ds.connection_type, ds_db.connection_type)  # 数据源的连接类型
file://csv@qt_root/data/ db:mysql://localhost@3306/test_db

5.1. Default data source

qteasy has a built-in default data source that doesn’t require manual creation by the user. By default, all data is saved to and read from this default data source. Users can configure the parameters of this data source through qteasy’s configuration file, ensuring that the data source points to the correct path or that the correct username is used to log in to the correct database. The properties of the built-in default data source can be viewed using the following method:

>>> print(qt.QT_DATA_SOURCE)
db:mysql://www.qteasy.online@3306/ts_db

5.2. View data table

After understanding the basic properties of the data source object, the next step is to further understand the data stored in the data source.

The data tables in the data source are all predefined. Each data table stores a different type of data. This data covers a large amount of data, including basic information, K-line data, listed company performance reports, macroeconomic data, and so on for various investment products such as stocks, indices, funds, futures, and options in the Shanghai and Shenzhen stock markets and futures markets. For a complete list of data tables, please see the next chapter. This section mainly introduces how to view the basic information and data of various data tables.

The list of tables in a data source can be obtained through the data source’s all_tables property, which returns a list object containing all predefined tables in the data source. However, it’s crucial to clarify that while many tables are defined in the data source, they only have basic definitions: header information, field names, field meanings, and data types are all defined. Newly created data sources typically have empty tables containing no data. To see which tables in the data source already contain data, you need to use the get_table_info() function.

>>> print('All tables in datasource:', len(ds.all_tables))
All tables in datasource: 108
>>> print('Some tables:', ds.all_tables[5:15])
Some tables: ['hk_trade_calendar', 'us_trade_calendar', 'stock_basic', 'hk_stock_basic', 'us_stock_basic', 'stock_names', 'stock_company', 'stk_managers', 'new_share', 'money_flow']

As can be seen, the current version of qteasy predefines 108 data tables, some of which have names such as stock_basic.

To view details of a specific table in the current data source, such as its definition and whether it contains data, use the get_table_info() method.

>>> info = ds.get_table_info('stock_basic')
<stock_basic>--<股票基本信息>
852KB/5K records on disc
primary keys: 
----------------------------------------
1:   ts_code:  <5396> entries
     starts: 000001.SZ, end: 920128.BJ

columns of table:
------------------------------------
        columns        dtypes remarks
0       ts_code    varchar(9)    证券代码
1        symbol    varchar(6)    股票代码
2          name   varchar(20)    股票名称
3          area   varchar(10)      地域
4      industry   varchar(10)    所属行业
5      fullname   varchar(50)    股票全称
6        enname  varchar(120)    英文全称
7       cnspell   varchar(40)    拼音缩写
8        market    varchar(6)    市场类型
9      exchange    varchar(6)   交易所代码
10    curr_type    varchar(6)    交易货币
11  list_status    varchar(4)    上市状态
12    list_date          date    上市日期
13  delist_date          date    退市日期
14        is_hs    varchar(2)  是否沪深港通

The printed information includes the table name, description, field definitions, and data types.

It’s worth noting that if the data table is already populated with data, it will display the current size of the table. In the example above, some key information is explained below:

  • 852KB/5K records on disc – This indicates that the table occupies approximately 852KB of disk space and contains approximately 5000 rows of data.

  • primary keys – Displays the primary key columns of the data. Here, it indicates that the ts_code column is the primary key of the table, and this column has approximately 5396 distinct records. If the table has multiple primary keys, each primary key column will be listed separately, along with the range of data for that field. For example, in the above example, the range of data for the ts_code column is from 000001.SZ to 920128.BJ, letting us know that the basic stock information table contains information on more than 5,000 stocks with codes from 000001 to 920128.

  • columns of table – This displays the field definitions of the data table, which is the table’s schema. It lists the name columns, data type dtypes, and descriptive information remarks for each column.

Additionally, the get_table_info() method returns a dict, which also contains information about the table. This information can be printed out and compared with the formatted information described above for better understanding.

>>> print(info)
{'table': 'stock_basic', 'table_exists': True, 'table_size': '852KB', 'table_rows': '5K', 'primary_key1': 'ts_code', 'pk_records1': 5396, 'pk_min1': '000001.SZ', 'pk_max1': '920128.BJ', 'primary_key2': None, 'pk_records2': None, 'pk_min2': None, 'pk_max2': None}

5.3. View overall information about the data source

Use the methods above to inspect each table; overview() summarizes the whole DataSource:

>>> overview = ds.overview()
Analyzing local data source tables... depending on size of tables, it may take a few minutes
[########################################]104/104-100.0%  A...zing completed!
Finished analyzing datasource: 
file://csv@qt_root/data/
3 table(s) out of 104 contain local data as summary below, to view complete list, print returned DataFrame
===============================tables with local data===============================
               Has_data Size_on_disk Record_count Record_start Record_end
table                                                                    
trade_calendar   True       1.8MB         70K          CFFEX        SZSE 
stock_basic      True       852KB          5K           None        None 
stock_daily      True      98.8MB        1.3M       20211112    20241231

Viewing the overall information of the entire data source may take several minutes because it requires summarizing all the information from each data table.

During overview, qteasy shows a progress bar, then returns a DataFrame of all tables and prints key stats.

The summary shows three populated tables — trade_calendar, stock_basic, stock_daily — with row counts, disk usage, and start/end dates. Print the returned DataFrame to see all tables:

>>> print(overview)
                   has_data   size records       pk1 records1       min1  \
table                                                                      
trade_calendar         True  1.8MB     70K  cal_date    12865   19901012   
hk_trade_calendar     False      0       0  cal_date  unknown        N/A   
us_trade_calendar     False      0       0  cal_date  unknown        N/A   
stock_basic            True  852KB      5K   ts_code     5396  000001.SZ   
hk_stock_basic        False      0       0   ts_code  unknown        N/A   
...                     ...    ...     ...       ...      ...        ...   
cn_cpi                False      0       0     month  unknown        N/A   
cn_ppi                False      0       0     month  unknown        N/A   
cn_money              False      0       0     month  unknown        N/A   
cn_sf                 False      0       0     month  unknown        N/A   
cn_pmi                False      0       0     month  unknown        N/A   

                        max1       pk2 records2   min2  max2  
table                                                         
trade_calendar      20251231  exchange        7  CFFEX  SZSE  
hk_trade_calendar        N/A      None     None   None  None  
us_trade_calendar        N/A      None     None   None  None  
stock_basic        920128.BJ      None     None   None  None  
hk_stock_basic           N/A      None     None   None  None  
...                      ...       ...      ...    ...   ...  
cn_cpi                   N/A      None     None   None  None  
cn_ppi                   N/A      None     None   None  None  
cn_money                 N/A      None     None   None  None  
cn_sf                    N/A      None     None   None  None  
cn_pmi                   N/A      None     None   None  None  

[104 rows x 11 columns]

5.4. The most important data table

On first use, tables are often empty — by design, qteasy simplifies acquiring and using financial data.

Basic data is easy to download, but some tables are more critical — fill them first because others depend on them and on qteasy itself:

  • trade_calendar — Trading calendar for all exchanges (trading day, exchange code/name). Core to qteasy: many features fail or slow without it. Used to determine trading days and download date ranges — fill this table first.

  • stock_basic — Stock basics table with code, name, listing/delisting dates, industry, region, etc. Foundation for stock daily K-line and financial tables; prioritize filling this table.

  • index_basic — Index basics table with code, name, publish/delisting dates, etc. Foundation for index daily K-line and constituent tables; prioritize filling this table.

  • fund_basic — Fund basics table with code, name, type, size, etc. Foundation for fund daily K-line and NAV tables; prioritize filling this table.

Once these key tables are filled, most qteasy data features work smoothly.

qteasy also defines four system tables for live-trading records:

  • sys_op_live_accounts — Live trading account master table with account ID, name, type, status, etc.

  • sys_op_positions — Live positions table with account ID, security code/name, quantity, cost, etc.

  • sys_op_trade_orders — Live order table with account ID, order time/type, security code, quantity, price, etc.

  • sys_op_trade_results — Live fill table with account ID, fill time, security code, quantity, price, etc.

These four system tables underpin live trading; data is auto-generated — do not manually fill, view, or delete them.

Other tables

Besides key tables, DataSource defines many more covering basics, daily K-line, financials, dividends, earnings reports, macro data, etc.:

  • Market data tables — OHLCV at various frequencies for stocks, funds, indices

  • Basics tables — stocks, funds, indices, futures, options, etc.

  • Indicator tables — technical, fundamental, macro indicators, etc.

  • Financial statement tables — balance sheet, income statement, cash flow, etc.

  • Earnings report tables — express reports, earnings guidance, forecasts, etc.

  • Dividend & block-trade tables — dividends, block trades, shareholder trades, etc.

  • Reference tables — macro, industry, exchange, and other reference data

See get_table_info() or the next chapter for table definitions and data types.

5.5. Reading data from tables

Once filled, use read_table_data() to read with date/code filters without worrying about storage details.

To avoid huge reads, always pass filters. With read_table_data() you can filter by security code and date range:

  • shares: One security code or comma-separated codes. If the primary key includes a security code, filter output by those codes

  • start: A date in “YYYYMMDD” format. If the table primary key includes time or date, filter rows between start and end; start/end must be provided as a pair

  • end: A date in “YYYYMMDD” format. If the table primary key includes time or date, filter rows between start and end; start/end must be provided as a pair

DataSource filters data automatically from your criteria, for example:

Read daily K-line data for 000651.SZ from stock_daily between 2024-01-01 and 2024-01-15:

>>> ds.read_table_data(table='stock_daily', shares='000651.SZ', start='20240101', end='20240115')
                       open   high    low  ...  pct_chg        vol       amount
ts_code   trade_date                       ...                                 
000651.SZ 2024-01-03  32.00  32.08  31.70  ...  -0.7181  254468.92   810315.013
          2024-01-04  31.90  32.01  31.45  ...   0.4717  333398.05  1057458.411
          2024-01-08  33.12  33.21  32.85  ...  -0.2426  415911.34  1372722.050
          2024-01-02  32.17  32.20  31.96  ...  -0.4352  253797.30   814257.175
          2024-01-15  33.45  33.95  33.42  ...   0.6544  295681.34   996815.725
          2024-01-11  33.66  33.82  33.42  ...  -0.2376  284088.74   955075.100
          2024-01-09  32.81  33.55  32.65  ...   1.7933  438207.66  1454959.637
          2024-01-10  33.35  33.84  33.28  ...   0.5375  366485.52  1233441.572
          2024-01-12  33.50  33.83  33.42  ...   0.0893  224012.73   753931.821
          2024-01-05  32.05  33.29  31.62  ...   3.2238  832156.75  2738167.636

[10 rows x 9 columns]

DataSource applies the correct filters for each table and ignores unnecessary parameters. For example, when querying basic info for two stocks, trade date is not required — qteasy ignores start/end with a notice. The shares parameter also works for funds, indices, futures, and options, not just stocks:

>>> ds.read_table_data(table='stock_basic', shares='000651.SZ,000700.SZ', start='20240101', end='20240131')
/Users/jackie/Projects/qteasy/qteasy/database.py:1314: RuntimeWarning: list index out of range
can not find date-like primary key in the table stock_basic!
passed start(2024-01-01) and end(2024-01-31) arguments will be ignored!
  warnings.warn(msg, RuntimeWarning)
           symbol  name area industry  ... list_status list_date delist_date is_hs
ts_code                                ...                                        
000651.SZ     651  格力电器   广东     家用电器  ...           L  19961118         NaN     S
000700.SZ     700  模塑科技   江苏     汽车配件  ...           L  19970228         NaN     S

[2 rows x 14 columns]

See DataSource reference for more on read_table_data().

5.6. Adding data to tables

Reads fail if tables are empty or insufficient — fill tables first.

Note:

This covers manual writes only; automatic download, cleaning, and refill are covered later.

Fill tables via update_table_data(), writing a DataFrame with three parameters:

  • table: Name of the target table to write into

  • df: A DataFrame holding data to write into a table

  • merge_type: If update, update existing rows in the table; if ignore, skip duplicate rows.

With update_table_data(), exact schema match is not required — qteasy normalizes format and deduplicates before writing.

Sample data is written below for demonstration only.

>>> import pandas as pd
>>> df = pd.DataFrame({
...         'ts_code':    ['000001.SZ', '000002.SZ', '000003.SZ', '000004.SZ', '000005.SZ',
...                        '000001.SZ', '000002.SZ', '000003.SZ', '000004.SZ', '000005.SZ'],
...         'trade_date': ['20211112', '20211112', '20211112', '20211112', '20211112',
...                        '20211113', '20211113', '20211113', '20211113', '20211113'],
...         'open':       [1., 2., 3., 4., 5., 6., 7., 8., 9., 10.],
...         'high':       [2., 3., 4., 5., 6., 7., 8., 9., 10., 1.],
...         'low':        [3., 4., 5., 6., 7., 8., 9., 10., 1., 2.],
...         'close':      [4., 5., 6., 7., 8., 9., 10., 1., 2., 3.]
...     })
>>> print(df)
     ts_code trade_date  open  high   low  close
0  000001.SZ   20211112   1.0   2.0   3.0    4.0
1  000002.SZ   20211112   2.0   3.0   4.0    5.0
2  000003.SZ   20211112   3.0   4.0   5.0    6.0
3  000004.SZ   20211112   4.0   5.0   6.0    7.0
4  000005.SZ   20211112   5.0   6.0   7.0    8.0
5  000001.SZ   20211113   6.0   7.0   8.0    9.0
6  000002.SZ   20211113   7.0   8.0   9.0   10.0
7  000003.SZ   20211113   8.0   9.0  10.0    1.0
8  000004.SZ   20211113   9.0  10.0   1.0    2.0
9  000005.SZ   20211113  10.0   1.0   2.0    3.0

The sample DataFrame above will be written to index_daily. Currently index_daily is empty and its schema differs slightly from the DataFrame:

  • The index_daily table schema defines 11 columns while the sample DataFrame above has only 6; all DataFrame columns are still within the index_daily schema.

  • The index_daily table is currently empty with no data filled.

>>> info = ds.get_table_info('index_daily')
<index_daily>--<指数日线行情>
0 MB/0 records on disc
primary keys: 
----------------------------------------
1:   ts_code:  <unknown> entries
     starts: N/A, end: N/A
2:   trade_date:  <unknown> entries
     starts: N/A, end: N/A

columns of table:
------------------------------------
       columns       dtypes  remarks
0      ts_code  varchar(20)     证券代码
1   trade_date         date     交易日期
2         open        float      开盘价
3         high        float      最高价
4          low        float      最低价
5        close        float      收盘价
6    pre_close        float      昨收价
7       change        float      涨跌额
8      pct_chg        float      涨跌幅
9          vol       double   成交量(手)
10      amount       double  成交额(千元)

Next we write data; on success the method returns rows written:

>>> ds.update_table_data(table='index_daily', df=df)
10

After writing, we can read back the data just inserted.

Many columns read as NaN because the written df did not include those fields.

>>> df = ds.read_table_data('index_daily', shares='000001.SZ, 000002.SZ')
>>> print(df)
                      open  high  low  close  pre_close  change  pct_chg  vol  \
ts_code   trade_date                                                            
000001.SZ 2021-11-12   1.0   2.0  3.0    4.0        NaN     NaN      NaN  NaN   
000002.SZ 2021-11-12   2.0   3.0  4.0    5.0        NaN     NaN      NaN  NaN   
000001.SZ 2021-11-13   6.0   7.0  8.0    9.0        NaN     NaN      NaN  NaN   
000002.SZ 2021-11-13   7.0   8.0  9.0   10.0        NaN     NaN      NaN  NaN   

                      amount  
ts_code   trade_date          
000001.SZ 2021-11-12     NaN  
000002.SZ 2021-11-12     NaN  
000001.SZ 2021-11-13     NaN  
000002.SZ 2021-11-13     NaN  

5.7. Drop table — be careful; deletion is irreversible!!

You can also drop entire tables — qteasy does not support partial deletes; be very careful.

DataSource is optimized for storage and reads, not frequent deletes — it is a data warehouse, not a general database.

The DataSource method drop_table_data() deletes an entire table and cannot be undone!

To prevent accidental drops, drop_table_data() errors by default — e.g. when dropping temporary index_daily data:

>>> ds.drop_table_data('index_daily')
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[19], line 1
----> 1 ds.drop_table_data('index_daily')

File ~/Projects/qteasy/qteasy/database.py:1587, in DataSource.drop_table_data(self, table)
   1584 if not self.allow_drop_table:
   1585     err = RuntimeError('Can\'t drop table from current datasource according to setting, please check: '
   1586                        'datasource.allow_drop_table')
-> 1587     raise err
   1589 if self.source_type == 'db':
   1590     self._drop_db_table(db_table=table)

RuntimeError: Can't drop table from current datasource according to setting, please check: datasource.allow_drop_table

Set allow_drop_table to True to allow drops; set it back to False afterwards.

The code below drops index_daily; afterwards reads from that table fail:

>>> ds.allow_drop_table = True
>>> ds.drop_table_data('index_daily')
>>> ds.allow_drop_table = False
>>> df = ds.read_table_data(table='index_daily')
>>> print(df)
    
Empty DataFrame
Columns: []
Index: []

5.8. Summary

We now understand DataSource, qteasy’s core class for financial history data management, including:

  • What is DataSource and how to create one

  • Extracting data from DataSource

  • Working with DataSource

Later chapters cover more:

  • What useful financial data is in DataSource?

  • How to batch-download and fill DataSource?

  • Extracting information from DataSource more effectively?