5. Local data source — DataSource object
The local data source is the core of qteasy’s data management functionality. All data must first be downloaded and saved in the local data source before it can be used by qteasy. qteasy uses DataSource objects to manage the local data source. The local data source is similar to a database, containing a series of predefined data tables. The DataSource class provides a series of APIs for managing the data in the data tables, including reading, updating, deleting, and querying data.
To create a new DataSource object, you can use the following command:
>>> import qteasy as qt
>>> ds = qt.DataSource()
By printing the DataSource object, you can view its most basic properties:
>>> print(ds)
file://csv@qt_root/data/
From the printed string file://csv@qt_root/data/ of DataSource, we can see its basic properties.
The printed results contain basic information about the data source object:
file:– The type of data source.qteasysupports two types of data sources: file-based and database-based.filemeans that all data tables in the data source are stored on disk as files. Similarly, data can also be stored in amysqldatabase, in which case the data source type code isdb.csv:– Data file saving format.qteasycan save data files to disk in different formats. The most basic format iscsv, but users can also save data ashdfandfth(feather) files to meet different performance preferences.qt_root/data/:– The path to save data files.qt_rootrepresents the installation root path ofqteasy. By default, all data files are saved in the/data/subdirectory under the root directory.
Similarly, you can create a data source object with the data storage type “database” and view its basic properties. However, when creating a data source of type “database”, you need to specify the hostname, username, password, and/or database name of the MySQL database you want to connect to.
>>> ds_db = qt.DataSource(source_type='database', host='localhost', user='您的用户名', password='您的密码', db_name='test_db')
>>> print(ds_db)
db:mysql://localhost@3306/test_db
Similarly, the printed results contain basic information about the data source:
db:mysql– The data source type is a MySQL database; all data tables are stored in this database.localhost– the hostname of the database3306– Database connection porttest_db– Database name. If no name is specified, the default database isqt_db, but this database must be created beforehand.
Different data source objects, if they have different storage methods and storage paths, will have no relationship with each other and will not interfere with each other’s data. However, if two data sources point to the same path and have the same file type, then they will store duplicate data.
To view more basic properties of the data source, you can access several properties of the data source object:
>>> print(ds.source_type) # 数据源的类型
file
>>> print(ds_db.source_type)
db
>>> print(ds.file_type, ds.file_path) # 数据源的文件类型和存储路径
csv /Users/jackie/Projects/qteasy/qteasy/data/
>>> print(ds.connection_type, ds_db.connection_type) # 数据源的连接类型
file://csv@qt_root/data/ db:mysql://localhost@3306/test_db
5.1. Default data source
qteasy has a built-in default data source that doesn’t require manual creation by the user. By default, all data is saved to and read from this default data source. Users can configure the parameters of this data source through qteasy’s configuration file, ensuring that the data source points to the correct path or that the correct username is used to log in to the correct database. The properties of the built-in default data source can be viewed using the following method:
>>> print(qt.QT_DATA_SOURCE)
db:mysql://www.qteasy.online@3306/ts_db
5.2. View data table
After understanding the basic properties of the data source object, the next step is to further understand the data stored in the data source.
The data tables in the data source are all predefined. Each data table stores a different type of data. This data covers a large amount of data, including basic information, K-line data, listed company performance reports, macroeconomic data, and so on for various investment products such as stocks, indices, funds, futures, and options in the Shanghai and Shenzhen stock markets and futures markets. For a complete list of data tables, please see the next chapter. This section mainly introduces how to view the basic information and data of various data tables.
The list of tables in a data source can be obtained through the data source’s all_tables property, which returns a list object containing all predefined tables in the data source. However, it’s crucial to clarify that while many tables are defined in the data source, they only have basic definitions: header information, field names, field meanings, and data types are all defined. Newly created data sources typically have empty tables containing no data. To see which tables in the data source already contain data, you need to use the get_table_info() function.
>>> print('All tables in datasource:', len(ds.all_tables))
All tables in datasource: 108
>>> print('Some tables:', ds.all_tables[5:15])
Some tables: ['hk_trade_calendar', 'us_trade_calendar', 'stock_basic', 'hk_stock_basic', 'us_stock_basic', 'stock_names', 'stock_company', 'stk_managers', 'new_share', 'money_flow']
As can be seen, the current version of qteasy predefines 108 data tables, some of which have names such as stock_basic.
To view details of a specific table in the current data source, such as its definition and whether it contains data, use the get_table_info() method.
>>> info = ds.get_table_info('stock_basic')
<stock_basic>--<股票基本信息>
852KB/5K records on disc
primary keys:
----------------------------------------
1: ts_code: <5396> entries
starts: 000001.SZ, end: 920128.BJ
columns of table:
------------------------------------
columns dtypes remarks
0 ts_code varchar(9) 证券代码
1 symbol varchar(6) 股票代码
2 name varchar(20) 股票名称
3 area varchar(10) 地域
4 industry varchar(10) 所属行业
5 fullname varchar(50) 股票全称
6 enname varchar(120) 英文全称
7 cnspell varchar(40) 拼音缩写
8 market varchar(6) 市场类型
9 exchange varchar(6) 交易所代码
10 curr_type varchar(6) 交易货币
11 list_status varchar(4) 上市状态
12 list_date date 上市日期
13 delist_date date 退市日期
14 is_hs varchar(2) 是否沪深港通
The printed information includes the table name, description, field definitions, and data types.
It’s worth noting that if the data table is already populated with data, it will display the current size of the table. In the example above, some key information is explained below:
852KB/5K records on disc– This indicates that the table occupies approximately 852KB of disk space and contains approximately 5000 rows of data.primary keys– Displays the primary key columns of the data. Here, it indicates that thets_codecolumn is the primary key of the table, and this column has approximately 5396 distinct records. If the table has multiple primary keys, each primary key column will be listed separately, along with the range of data for that field. For example, in the above example, the range of data for thets_codecolumn is from000001.SZto920128.BJ, letting us know that the basic stock information table contains information on more than 5,000 stocks with codes from 000001 to 920128.columns of table– This displays the field definitions of the data table, which is the table’sschema. It lists the namecolumns, data typedtypes, and descriptive informationremarksfor each column.
Additionally, the get_table_info() method returns a dict, which also contains information about the table. This information can be printed out and compared with the formatted information described above for better understanding.
>>> print(info)
{'table': 'stock_basic', 'table_exists': True, 'table_size': '852KB', 'table_rows': '5K', 'primary_key1': 'ts_code', 'pk_records1': 5396, 'pk_min1': '000001.SZ', 'pk_max1': '920128.BJ', 'primary_key2': None, 'pk_records2': None, 'pk_min2': None, 'pk_max2': None}
5.3. View overall information about the data source
Use the methods above to inspect each table; overview() summarizes the whole DataSource:
>>> overview = ds.overview()
Analyzing local data source tables... depending on size of tables, it may take a few minutes
[########################################]104/104-100.0% A...zing completed!
Finished analyzing datasource:
file://csv@qt_root/data/
3 table(s) out of 104 contain local data as summary below, to view complete list, print returned DataFrame
===============================tables with local data===============================
Has_data Size_on_disk Record_count Record_start Record_end
table
trade_calendar True 1.8MB 70K CFFEX SZSE
stock_basic True 852KB 5K None None
stock_daily True 98.8MB 1.3M 20211112 20241231
Viewing the overall information of the entire data source may take several minutes because it requires summarizing all the information from each data table.
During overview, qteasy shows a progress bar, then returns a DataFrame of all tables and prints key stats.
The summary shows three populated tables — trade_calendar, stock_basic, stock_daily — with row counts, disk usage, and start/end dates. Print the returned DataFrame to see all tables:
>>> print(overview)
has_data size records pk1 records1 min1 \
table
trade_calendar True 1.8MB 70K cal_date 12865 19901012
hk_trade_calendar False 0 0 cal_date unknown N/A
us_trade_calendar False 0 0 cal_date unknown N/A
stock_basic True 852KB 5K ts_code 5396 000001.SZ
hk_stock_basic False 0 0 ts_code unknown N/A
... ... ... ... ... ... ...
cn_cpi False 0 0 month unknown N/A
cn_ppi False 0 0 month unknown N/A
cn_money False 0 0 month unknown N/A
cn_sf False 0 0 month unknown N/A
cn_pmi False 0 0 month unknown N/A
max1 pk2 records2 min2 max2
table
trade_calendar 20251231 exchange 7 CFFEX SZSE
hk_trade_calendar N/A None None None None
us_trade_calendar N/A None None None None
stock_basic 920128.BJ None None None None
hk_stock_basic N/A None None None None
... ... ... ... ... ...
cn_cpi N/A None None None None
cn_ppi N/A None None None None
cn_money N/A None None None None
cn_sf N/A None None None None
cn_pmi N/A None None None None
[104 rows x 11 columns]
5.4. The most important data table
On first use, tables are often empty — by design, qteasy simplifies acquiring and using financial data.
Basic data is easy to download, but some tables are more critical — fill them first because others depend on them and on qteasy itself:
trade_calendar— Trading calendar for all exchanges (trading day, exchange code/name). Core to qteasy: many features fail or slow without it. Used to determine trading days and download date ranges — fill this table first.stock_basic— Stock basics table with code, name, listing/delisting dates, industry, region, etc. Foundation for stock daily K-line and financial tables; prioritize filling this table.index_basic— Index basics table with code, name, publish/delisting dates, etc. Foundation for index daily K-line and constituent tables; prioritize filling this table.fund_basic— Fund basics table with code, name, type, size, etc. Foundation for fund daily K-line and NAV tables; prioritize filling this table.
Once these key tables are filled, most qteasy data features work smoothly.
qteasy also defines four system tables for live-trading records:
sys_op_live_accounts— Live trading account master table with account ID, name, type, status, etc.sys_op_positions— Live positions table with account ID, security code/name, quantity, cost, etc.sys_op_trade_orders— Live order table with account ID, order time/type, security code, quantity, price, etc.sys_op_trade_results— Live fill table with account ID, fill time, security code, quantity, price, etc.
These four system tables underpin live trading; data is auto-generated — do not manually fill, view, or delete them.
Other tables
Besides key tables, DataSource defines many more covering basics, daily K-line, financials, dividends, earnings reports, macro data, etc.:
Market data tables — OHLCV at various frequencies for stocks, funds, indices
Basics tables — stocks, funds, indices, futures, options, etc.
Indicator tables — technical, fundamental, macro indicators, etc.
Financial statement tables — balance sheet, income statement, cash flow, etc.
Earnings report tables — express reports, earnings guidance, forecasts, etc.
Dividend & block-trade tables — dividends, block trades, shareholder trades, etc.
Reference tables — macro, industry, exchange, and other reference data
See get_table_info() or the next chapter for table definitions and data types.
5.5. Reading data from tables
Once filled, use read_table_data() to read with date/code filters without worrying about storage details.
To avoid huge reads, always pass filters. With read_table_data() you can filter by security code and date range:
shares: One security code or comma-separated codes. If the primary key includes a security code, filter output by those codesstart: A date in“YYYYMMDD”format. If the table primary key includes time or date, filter rows betweenstartandend;start/endmust be provided as a pairend: A date in“YYYYMMDD”format. If the table primary key includes time or date, filter rows betweenstartandend;start/endmust be provided as a pair
DataSource filters data automatically from your criteria, for example:
Read daily K-line data for 000651.SZ from stock_daily between 2024-01-01 and 2024-01-15:
>>> ds.read_table_data(table='stock_daily', shares='000651.SZ', start='20240101', end='20240115')
open high low ... pct_chg vol amount
ts_code trade_date ...
000651.SZ 2024-01-03 32.00 32.08 31.70 ... -0.7181 254468.92 810315.013
2024-01-04 31.90 32.01 31.45 ... 0.4717 333398.05 1057458.411
2024-01-08 33.12 33.21 32.85 ... -0.2426 415911.34 1372722.050
2024-01-02 32.17 32.20 31.96 ... -0.4352 253797.30 814257.175
2024-01-15 33.45 33.95 33.42 ... 0.6544 295681.34 996815.725
2024-01-11 33.66 33.82 33.42 ... -0.2376 284088.74 955075.100
2024-01-09 32.81 33.55 32.65 ... 1.7933 438207.66 1454959.637
2024-01-10 33.35 33.84 33.28 ... 0.5375 366485.52 1233441.572
2024-01-12 33.50 33.83 33.42 ... 0.0893 224012.73 753931.821
2024-01-05 32.05 33.29 31.62 ... 3.2238 832156.75 2738167.636
[10 rows x 9 columns]
DataSource applies the correct filters for each table and ignores unnecessary parameters. For example, when querying basic info for two stocks, trade date is not required — qteasy ignores start/end with a notice. The shares parameter also works for funds, indices, futures, and options, not just stocks:
>>> ds.read_table_data(table='stock_basic', shares='000651.SZ,000700.SZ', start='20240101', end='20240131')
/Users/jackie/Projects/qteasy/qteasy/database.py:1314: RuntimeWarning: list index out of range
can not find date-like primary key in the table stock_basic!
passed start(2024-01-01) and end(2024-01-31) arguments will be ignored!
warnings.warn(msg, RuntimeWarning)
symbol name area industry ... list_status list_date delist_date is_hs
ts_code ...
000651.SZ 651 格力电器 广东 家用电器 ... L 19961118 NaN S
000700.SZ 700 模塑科技 江苏 汽车配件 ... L 19970228 NaN S
[2 rows x 14 columns]
See DataSource reference for more on read_table_data().
5.6. Adding data to tables
Reads fail if tables are empty or insufficient — fill tables first.
Note:
This covers manual writes only; automatic download, cleaning, and refill are covered later.
Fill tables via update_table_data(), writing a DataFrame with three parameters:
table: Name of the target table to write intodf: ADataFrameholding data to write into a tablemerge_type: Ifupdate, update existing rows in the table; ifignore, skip duplicate rows.
With update_table_data(), exact schema match is not required — qteasy normalizes format and deduplicates before writing.
Sample data is written below for demonstration only.
>>> import pandas as pd
>>> df = pd.DataFrame({
... 'ts_code': ['000001.SZ', '000002.SZ', '000003.SZ', '000004.SZ', '000005.SZ',
... '000001.SZ', '000002.SZ', '000003.SZ', '000004.SZ', '000005.SZ'],
... 'trade_date': ['20211112', '20211112', '20211112', '20211112', '20211112',
... '20211113', '20211113', '20211113', '20211113', '20211113'],
... 'open': [1., 2., 3., 4., 5., 6., 7., 8., 9., 10.],
... 'high': [2., 3., 4., 5., 6., 7., 8., 9., 10., 1.],
... 'low': [3., 4., 5., 6., 7., 8., 9., 10., 1., 2.],
... 'close': [4., 5., 6., 7., 8., 9., 10., 1., 2., 3.]
... })
>>> print(df)
ts_code trade_date open high low close
0 000001.SZ 20211112 1.0 2.0 3.0 4.0
1 000002.SZ 20211112 2.0 3.0 4.0 5.0
2 000003.SZ 20211112 3.0 4.0 5.0 6.0
3 000004.SZ 20211112 4.0 5.0 6.0 7.0
4 000005.SZ 20211112 5.0 6.0 7.0 8.0
5 000001.SZ 20211113 6.0 7.0 8.0 9.0
6 000002.SZ 20211113 7.0 8.0 9.0 10.0
7 000003.SZ 20211113 8.0 9.0 10.0 1.0
8 000004.SZ 20211113 9.0 10.0 1.0 2.0
9 000005.SZ 20211113 10.0 1.0 2.0 3.0
The sample DataFrame above will be written to index_daily. Currently index_daily is empty and its schema differs slightly from the DataFrame:
The
index_dailytable schema defines 11 columns while the sampleDataFrameabove has only 6; allDataFramecolumns are still within theindex_dailyschema.The
index_dailytable is currently empty with no data filled.
>>> info = ds.get_table_info('index_daily')
<index_daily>--<指数日线行情>
0 MB/0 records on disc
primary keys:
----------------------------------------
1: ts_code: <unknown> entries
starts: N/A, end: N/A
2: trade_date: <unknown> entries
starts: N/A, end: N/A
columns of table:
------------------------------------
columns dtypes remarks
0 ts_code varchar(20) 证券代码
1 trade_date date 交易日期
2 open float 开盘价
3 high float 最高价
4 low float 最低价
5 close float 收盘价
6 pre_close float 昨收价
7 change float 涨跌额
8 pct_chg float 涨跌幅
9 vol double 成交量(手)
10 amount double 成交额(千元)
Next we write data; on success the method returns rows written:
>>> ds.update_table_data(table='index_daily', df=df)
10
After writing, we can read back the data just inserted.
Many columns read as NaN because the written df did not include those fields.
>>> df = ds.read_table_data('index_daily', shares='000001.SZ, 000002.SZ')
>>> print(df)
open high low close pre_close change pct_chg vol \
ts_code trade_date
000001.SZ 2021-11-12 1.0 2.0 3.0 4.0 NaN NaN NaN NaN
000002.SZ 2021-11-12 2.0 3.0 4.0 5.0 NaN NaN NaN NaN
000001.SZ 2021-11-13 6.0 7.0 8.0 9.0 NaN NaN NaN NaN
000002.SZ 2021-11-13 7.0 8.0 9.0 10.0 NaN NaN NaN NaN
amount
ts_code trade_date
000001.SZ 2021-11-12 NaN
000002.SZ 2021-11-12 NaN
000001.SZ 2021-11-13 NaN
000002.SZ 2021-11-13 NaN
5.7. Drop table — be careful; deletion is irreversible!!
You can also drop entire tables — qteasy does not support partial deletes; be very careful.
DataSource is optimized for storage and reads, not frequent deletes — it is a data warehouse, not a general database.
The DataSource method drop_table_data() deletes an entire table and cannot be undone!
To prevent accidental drops, drop_table_data() errors by default — e.g. when dropping temporary index_daily data:
>>> ds.drop_table_data('index_daily')
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[19], line 1
----> 1 ds.drop_table_data('index_daily')
File ~/Projects/qteasy/qteasy/database.py:1587, in DataSource.drop_table_data(self, table)
1584 if not self.allow_drop_table:
1585 err = RuntimeError('Can\'t drop table from current datasource according to setting, please check: '
1586 'datasource.allow_drop_table')
-> 1587 raise err
1589 if self.source_type == 'db':
1590 self._drop_db_table(db_table=table)
RuntimeError: Can't drop table from current datasource according to setting, please check: datasource.allow_drop_table
Set allow_drop_table to True to allow drops; set it back to False afterwards.
The code below drops index_daily; afterwards reads from that table fail:
>>> ds.allow_drop_table = True
>>> ds.drop_table_data('index_daily')
>>> ds.allow_drop_table = False
>>> df = ds.read_table_data(table='index_daily')
>>> print(df)
Empty DataFrame
Columns: []
Index: []
5.8. Summary
We now understand DataSource, qteasy’s core class for financial history data management, including:
What is
DataSourceand how to create oneExtracting data from DataSource
Working with DataSource
Later chapters cover more:
What useful financial data is in DataSource?
How to batch-download and fill DataSource?
Extracting information from DataSource more effectively?