4. Research cross-sectional timing factors with HistoryPanel

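Prerequisites: the minimal dataset from Tutorial 2.0, plus Tutorial 2.5 (completing 2.5 §0 first is recommended). For local data and get_kline usage, see Tutorials 2.0/2.5.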

Cross-sectional timing: factor threshold -> bool mask -> compare portfolio returns and CAGR

This tutorial targets a very common and very practical research scenario: we only have a small number of stocks (a few to a dozen or so), and we want to make hold / not hold timing decisions along each stock’s own timeline; then aggregate those decisions into a single portfolio curve, and finally compare it with the benchmark (HS300) to get a reusable annualized metric (CAGR).

First, let’s make the positioning clear: HistoryPanel here serves as a lightweight factor research container. Its strength is that it connects “data -> conditions -> research definition (mask) -> portfolio aggregation -> visual explanation,” allowing us to quickly validate whether a rule is worth digging into further. It is not a trading backtesting engine: it does not handle full trading semantics such as transaction costs, settlement, slippage, capital constraints, minimum trade size, etc. The goal of this article is to build a research loop, not a trading loop.

4.1. 0. Kickoff: first get a minimal working research loop running

As the code below shows, with just a few lines we can get a HistoryPanel and plot the first chart (whether a line chart or candlesticks), proving that “the data and visualization pipeline works”.

import qteasy as qt

stocks = ['000001.SZ', '600519.SH', '300750.SZ']
benchmark = '000300.SH'
shares = stocks + [benchmark]

hp = qt.get_kline(
    shares=shares,
    start='20220101',
    end='20221231',
    freq='D',
    as_panel=True,
)

# Minimal runnable version: producing a chart is enough
fig = hp.plot(interactive=True)
fig

If you run this in a Notebook, you can usually see a chart immediately from the snippet above.

However, merely being able to “plot something” is far from enough. If we really want to use it as a daily research tool, we’ll run into at least the following issues:

How to implement the timing rule: it’s easy to have a rule in mind like “hold when MACD>0”, but once you write it as code it often gets distorted: wrong column names, missing columns, everything comes out NaN, or different stocks are aligned inconsistently. The result is that the research script errors out as soon as you run it—or worse, it doesn’t error but the conclusions aren’t trustworthy.
How to align mask shapes: The condition is essentially 2D (stock × time), but HistoryPanel is 3D (stock × time × field). If the shapes don’t align, you can get a subtle bug where “you think you’re filtering stocks, but you’re actually filtering fields”; and this kind of bug often won’t throw an error right away—it just makes the equity curve look “a bit off.”
Without a benchmark, it’s hard to judge performance: Looking only at the portfolio curve makes it easy to “pat yourself on the back.” It might just be riding a certain style or broad-market trend during part of 2022. Bringing 000300.SH in as a reference at least answers a more practical question: is this timing rule creating excess returns, or is it just moving with the market?
Results that are hard to explain: even if the long portfolio outperforms, it’s still hard to explain “why it won.” Real research needs to be reproducible and reviewable: when the return curve shows a clear inflection point, can we quickly go back to the chart and pinpoint “which triggers changed the position state”?

Fortunately, these capabilities can be filled in step by step. This article follows a “get it running first, then enhance it” rhythm to flesh it out into a truly usable small research workflow.

4.2. 0.5 First, show the final result (what we’ll end up with)

To keep the pace steadier, let’s first make the “end state” clear. After following this article, you’ll get at least two types of output:

Portfolio-level comparison: Put the LONG / SHORT / 000300.SH curves (all normalized to start at 1.0) on the same chart, and you can tell at a glance whether “the timing rule is effective during the research period.”
Single-asset-level explanation: Plot the candlestick chart (or price curve) for a given stock, and highlight the time points when the “holding condition is triggered.” This way, when you see an abnormal segment in the portfolio curve, you can quickly go back to the single-stock chart to review whether the “trigger points” match your intuition.

Note: This article does not require you to output a GIF. A more recommended approach is: first get the plots working in a Notebook and just take screenshots; if you want to write a blog post or give a presentation, then record the key steps into a GIF.

4.3. 1. Goals (what this article sets out to accomplish)

Before we start, let’s clarify the goal so that at every step we know what problem we’re solving.

Fetch the HistoryPanel for a small set of individual stocks + 000300.SH
Derive an interpretable timing factor (example: MACD)
Use comparison operations to get boolean conditions, and use where() to generate the research mask
Use portfolio(mask=...) to aggregate into a portfolio curve and compare it with the benchmark
Use normalize/cum_return to get cumulative returns, and derive a CAGR summary
Use plot(highlight=...) to explain the “trigger points” back on the chart
At the end, we provide a complete piece of code that “runs as a single function”
Clarify the scope boundary: this is not a trading backtesting engine

The whole article keeps the same rhythm: first explain what this section aims to solve -> then cover the minimum necessary principles -> finally give the key code and the expected outcome. Repeated code will be omitted as appropriate, but every section is guaranteed to be reproducible by following the article straight through.

4.4. 1.1 Prerequisite for reproduction: Is the data already prepared locally?

The examples in this article assume by default that you have configured the data source locally, and that qt.get_kline() can successfully fetch daily bar data for 20220101–20221231.

If your environment doesn’t have data yet, the most common symptoms are: qt.get_kline() returns an empty panel, or you get all NaNs when computing indicators later. To avoid “only realizing there’s no data halfway through,” check the following once you have finished Section 2:

Whether the time length in hp.shape is greater than 100 (a full year of A-share daily bars is roughly 240 rows);
Whether hp.htypes includes at least open/high/low/close;
Whether hp.hdates continuously covers the period you want to study.

If you need to download data in your own environment first, please complete the chapters on “Data download and data source configuration” first; this article won’t go into data pipeline details, and focuses on the research workflow around HistoryPanel itself.

4.5. 2. Prepare the data: three individual stocks + one benchmark index (HS300)

2.1 What this section aims to solve

Let’s first get the data cleaned up: use 3 individual stocks for cross-sectional timing, and also add 000300.SH (CSI 300) to the panel as the benchmark for later portfolio(..., benchmark=...) comparisons.

The most critical step here is: make sure the OHLC columns are complete. Because whether you’re plotting candlesticks or doing pattern recognition later, you can’t avoid open/high/low/close.

2.2 Minimum necessary principles

HistoryPanel is 3D data: (share, time, htype). Subsequent where/mask/portfolio/cum_return will all rely on the prerequisite of “shape alignment”. So at the very beginning we print shape/shares/htypes and do a minimal field validation, which can block most of the “only errors out halfway through writing it” pitfalls in advance.

2.3 Runnable code + expected results

import qteasy as qt

stocks = ['000001.SZ', '600519.SH', '300750.SZ']
benchmark = '000300.SH'
shares = stocks + [benchmark]

hp = qt.get_kline(
    shares=shares,
    start='20220101',
    end='20221231',
    freq='D',
    as_panel=True,
)

print('hp.shape:', hp.shape)
print('hp.shares:', hp.shares)
print('hp.htypes:', hp.htypes)
print('last_date:', hp.hdates[-1])

required = {'open', 'high', 'low', 'close'}
missing = [c for c in required if c not in set(hp.htypes)]
if missing:
    raise ValueError(f'Missing required OHLC columns in htypes: {missing}')
if benchmark not in hp.shares:
    raise ValueError(f'Benchmark {benchmark} not found in shares: {hp.shares}')

# Extra quick check that data was actually loaded (avoid continuing with an empty or all-NaN panel)
if hp.shape[1] < 50:
    raise ValueError(
        'Not enough data points loaded (too few hdates). '
        'Please check your local datasource and date range.'
    )

You should see:

hp.shares contains 3 individual stocks + 000300.SH
hp.htypes should include at least open/high/low/close (and possibly also vol).
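
The length check above only counts dates, so it would not catch a panel that has enough rows but is mostly NaN. A minimal sketch of a complementary check follows, assuming only that hp.values is a NumPy array of shape (shares, dates, htypes) as described in 2.2. The helper name nan_fraction_per_share is hypothetical and not part of qteasy, and the array is synthetic.

```python
import numpy as np


def nan_fraction_per_share(values: np.ndarray) -> np.ndarray:
    """Return, for each share, the fraction of (date, htype) cells that are NaN.

    Hypothetical helper (not part of qteasy). `values` is assumed to have
    shape (M shares, L dates, N htypes).
    """
    values = np.asarray(values, dtype=float)
    if values.ndim != 3:
        raise ValueError(f'expected a 3D array, got shape {values.shape}')
    return np.isnan(values).mean(axis=(1, 2))


# Synthetic stand-in for hp.values: 2 shares, 4 dates, 2 htypes.
demo = np.ones((2, 4, 2))
demo[1, :2, :] = np.nan              # the second share is missing its first 2 dates
fractions = nan_fraction_per_share(demo)
print(fractions.tolist())            # [0.0, 0.5]
```

With a real panel you would pass hp.values and stop if any share is mostly empty; the cutoff is a judgment call rather than a fixed rule.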

4.6. 3. Derive the timing factor: MACD (turn the signal into a reusable data column)

3.1 What this section aims to solve

First we pick a timing factor that’s common enough, interpretable enough, and also “immediately usable”: MACD. The goal of this section is simple: compute MACD and make it a new column of HistoryPanel, so later conditional filtering can compare directly against the column.

3.2 Minimum necessary principles

hp.kline.macd() returns a new HistoryPanel and appends three columns to htypes:

macd_12_26_9
macd_signal_12_26_9
macd_hist_12_26_9

Pay attention to this naming: by default it comes with a suffix; it’s not the bare macd_hist. In this step we print out the column names, so when we write conditions later we won’t have to guess and get it wrong.

Also, when many people do factor research for the first time, they may unconsciously “recompute indicators” in different places. It may look fine in the short term, but once you start adding more columns and drawing more charts, it’s easy to end up in a mess of “same-name columns overwriting / multiple copies of synonymous columns”. A more robust approach is: compute indicators into columns first, explicitly write them into htypes, and then have subsequent conditions, aggregation, and visualization depend only on those columns.

3.3 Runnable code + expected result

hp_macd = hp.kline.macd(price_htype='close', fastperiod=12, slowperiod=26, signalperiod=9)
print('new htypes (tail):', hp_macd.htypes[-6:])

Expected: you will see macd_hist_12_26_9 appear in htypes.
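
To make the three columns concrete, here is a textbook MACD written with pandas on a synthetic price series. This is only a sketch of the standard definition (fast EMA minus slow EMA, an EMA of that difference as the signal, and their gap as the histogram). It is not qteasy's implementation, and hp.kline.macd() may initialize its EMAs differently (many libraries leave the first bars as NaN), so values need not match exactly. The function name macd_sketch and its column names are hypothetical.

```python
import pandas as pd


def macd_sketch(close: pd.Series, fast: int = 12, slow: int = 26, signal: int = 9) -> pd.DataFrame:
    """Textbook MACD from recursive EMAs (illustrative; not qteasy's code)."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    macd = ema_fast - ema_slow                                  # MACD line
    macd_signal = macd.ewm(span=signal, adjust=False).mean()    # signal line
    macd_hist = macd - macd_signal                              # histogram
    return pd.DataFrame(
        {'macd': macd, 'macd_signal': macd_signal, 'macd_hist': macd_hist}
    )


# Synthetic, steadily rising prices: the fast EMA ends up above the slow EMA.
close = pd.Series([10.0 + 0.1 * i for i in range(60)])
out = macd_sketch(close)
print(out.tail(3))
```

The histogram is the quantity the timing rule in the next section thresholds at zero: it is positive while the MACD line sits above its own signal line.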

4.7. 4. Factor threshold -> bool condition -> `where()` research mask

4.1 What this section aims to solve

Now we spell out the timing rule clearly: for example, “when the MACD histogram is greater than 0 it’s long; when it’s less than or equal to 0 it’s short”. What this step needs to deliver is two masks:

mask_long: which grid points participate in long portfolio aggregation
mask_short: which grid points participate in short portfolio aggregation

4.2 Minimum necessary principles

Starting from 2.2.8, HistoryPanel supports doing comparison operations directly: for example, hp_macd > 0 returns numpy.ndarray(bool). And hp.where(condition) normalizes various broadcastable conditions into an (M,L,N) mask with the same shape as hp.values.

This step is crucial: we are not “deleting data”, but “defining the research convention”: grid points where mask is False will be treated as missing in portfolio/cum_return (they won’t participate in aggregation or will cause the path to break). Therefore, the mask is “the research rule itself”.

Here, let’s explain “shape” a bit more plainly:

The timing rule in your head is “one True/False per stock per day”, so the most natural condition shape is (M, L).
But the values of HistoryPanel are (M, L, N), with N field columns.
What where() does is: expand/broadcast your condition into (M, L, N), so that any subsequent computation that needs to filter by grid point has a single, unified entry point.

Once you get into the habit of “running all conditions through where() first”, you’ll be less likely to run into shape pitfalls later when feeding conditions into portfolio(mask=...) and cum_return(mask=...).
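
The shape expansion described above can be pictured with plain NumPy. The sketch below is not how where() is implemented (that is not shown in this article); it only illustrates the broadcasting idea on synthetic arrays, and why a bare (M, L) condition needs an explicit trailing axis before it lines up with (M, L, N).

```python
import numpy as np

M, L, N = 3, 5, 4                        # shares, dates, htypes (synthetic sizes)
values = np.zeros((M, L, N))             # stand-in for hp.values

cond_2d = np.zeros((M, L), dtype=bool)   # one True/False per share per date
cond_2d[0, :] = True                     # "hold share 0 on every date"

# NumPy broadcasting aligns trailing axes, so (M, L) is matched against (L, N)
# and fails here because 5 != 4.
try:
    np.broadcast_to(cond_2d, values.shape)
except ValueError:
    print('(M, L) cannot be broadcast to (M, L, N) without a new axis')

# Adding a trailing axis makes the intent explicit: same decision for every htype.
cond_3d = cond_2d[:, :, np.newaxis]      # shape (M, L, 1)
mask = np.broadcast_to(cond_3d, values.shape)

print(mask.shape)                        # (3, 5, 4)
print(mask[0].all(), mask[1].any())      # True False
```

If the sizes happened to coincide (for example M == L == N), the first call would succeed silently and filter along the wrong axes, which is the "looks a bit off" bug mentioned earlier.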

4.3 Runnable code + expected result

import numpy as np

factor_col = 'macd_hist_12_26_9'

# Take the single-column sub-panel (shape (M, L, 1)), then compare with a scalar to get a bool ndarray
cond_long = (hp_macd[factor_col] > 0.0)     # numpy.ndarray(bool)
cond_short = (hp_macd[factor_col] <= 0.0)  # numpy.ndarray(bool)

# Normalize into an (M, L, N) research mask
mask_long = hp.where(cond_long)
mask_short = hp.where(cond_short)

print('cond_long shape:', getattr(cond_long, 'shape', None))
print('mask_long shape:', mask_long.shape)
print('mask_long dtype:', mask_long.dtype)

# Quick sanity check: how many grid points are True for long on the last day
print('true_count_last_day(long):', int(mask_long[:, -1, 0].sum()))

Expected:

mask_long.shape == hp.shape
mask_long.dtype == bool
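
One detail worth checking at this point: "> 0" and "<= 0" are not exact complements when the factor contains NaN, because every comparison against NaN is False. Indicators such as MACD usually need a warm-up window; whether qteasy leaves NaN in those early rows is an assumption here, so the sketch uses a synthetic NumPy array rather than a real panel.

```python
import numpy as np

# Synthetic factor values for one share: two warm-up NaNs, then real values.
factor = np.array([np.nan, np.nan, 0.3, -0.1, 0.0])

cond_long = factor > 0.0
cond_short = factor <= 0.0
neither = ~(cond_long | cond_short)      # dates that fall into neither mask

print(cond_long.tolist())    # [False, False, True, False, False]
print(cond_short.tolist())   # [False, False, False, True, True]
print(neither.tolist())      # [True, True, False, False, False]
```

If the real factor column behaves the same way, the early dates belong to neither LONG nor SHORT, which is worth remembering when comparing the two curves.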

4.8. 5. Use `portfolio(mask=...)` to aggregate portfolio curves and compare with the benchmark

5.1 What this section aims to solve

We’re not doing a trading backtest—only a research-oriented “rough aggregation”: for each day, aggregate the stocks that meet the criteria (long/short) into a single portfolio curve, then pull out 000300.SH as the benchmark for comparison.

5.2 Minimum necessary principles

A few key points of HistoryPanel.portfolio() (that we use):

mask=: follows the same shape rules as where(); grid points that are False do not participate in aggregation
benchmark= + benchmark_output='tag_along': append the benchmark row to the output
The output is still a HistoryPanel, and the timeline remains unchanged.

To reiterate: this is research-oriented aggregation; it does not include transaction costs, has no capital constraints, and does not execute rebalancing.

You can think of it as an “equal-weight basket under a research convention”: each day, the stocks that meet the criteria are equal-weight averaged into a single curve. Its purpose is not to simulate real trading, but to answer a more fundamental question:

If I use this criterion to define a “holding set”, does it exhibit any systematic performance difference over the sample period?

Only when the answer to this question is “yes” is it worth continuing to invest effort and upgrading it into a real trading-strategy backtest.
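
One way to picture the "equal-weight basket" is the NumPy sketch below, which averages the daily returns of whichever stocks are selected on each date. Whether portfolio() aggregates prices or returns internally, and how it treats dates with no selected stock, is not specified in this article, so both choices here are assumptions. The function name masked_equal_weight_return is hypothetical and the numbers are synthetic.

```python
import numpy as np


def masked_equal_weight_return(returns, mask) -> np.ndarray:
    """Average the returns of selected stocks per date (illustrative only).

    returns, mask: arrays of shape (M stocks, L dates). Dates where no stock
    is selected yield NaN (an assumption, not documented qteasy behavior).
    """
    returns = np.asarray(returns, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    sums = np.where(mask, returns, 0.0).sum(axis=0)   # sum over selected stocks
    counts = mask.sum(axis=0)                         # number selected per date
    out = np.full(returns.shape[1], np.nan)
    np.divide(sums, counts, out=out, where=counts > 0)
    return out


daily_returns = np.array([[0.01, 0.02, 0.03],
                          [0.03, -0.02, 0.05]])
hold = np.array([[True, True, False],
                 [True, False, False]])
basket = masked_equal_weight_return(daily_returns, hold)
print(basket)   # date 0: both stocks, date 1: first stock only, date 2: nothing held
```

On the first date both stocks are averaged, on the second only the first stock counts, and the third date has no holdings at all.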

5.3 Runnable code + expected results

benchmark = '000300.SH'

pf_long = hp.portfolio(
    htypes='close',
    mode='equal',
    mask=mask_long,
    benchmark=benchmark,
    benchmark_output='tag_along',
    new_share_name='LONG',
)

pf_short = hp.portfolio(
    htypes='close',
    mode='equal',
    mask=mask_short,
    benchmark=benchmark,
    benchmark_output='tag_along',
    new_share_name='SHORT',
)

print('pf_long.shares:', pf_long.shares)
print('pf_long.htypes:', pf_long.htypes)
print('pf_long.shape:', pf_long.shape)

Expected: pf_long.shares will contain two rows: LONG and 000300.SH.

4.9. 6. `normalize / cum_return` + CAGR: turn the curves into a comparable annualized summary

6.1 What this section aims to solve

With two curves side by side, the eye can see the trend, but it’s hard to quickly summarize: “Over this research period, how much stronger is long than the benchmark, exactly?” So we do two things:

normalize: align the starting point (for easier visual comparison)
cum_return + CAGR: produce a reusable annualized summary table

6.2 Minimum necessary principles

normalize(base_index=0) scales the baseline point to 1.0 (research convention), which is ideal for plotting and comparing on the same chart
cum_return(method='simple') outputs cumulative returns cumret_*
The essence of CAGR is an “equivalent annualized growth rate”:

\[ \text{CAGR} = (1 + R)^{1/T} - 1 \]

where \(R\) is the cumulative return at the end of the interval and \(T\) is the length of the interval in years.
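
The relationship between the normalized curve and the simple cumulative return can be seen on a tiny synthetic price series. The sketch assumes that method='simple' means the usual P_t / P_0 - 1; the exact conventions of normalize and cum_return (for example NaN handling) are not shown in this article.

```python
import numpy as np

prices = np.array([20.0, 22.0, 21.0, 25.0])   # synthetic closing prices

normalized = prices / prices[0]               # curve rescaled to start at 1.0
cumret_simple = prices / prices[0] - 1.0      # simple cumulative return, starts at 0.0

print(normalized.tolist())                    # [1.0, 1.1, 1.05, 1.25]
print(float(cumret_simple[-1]))               # 0.25
```

Under that assumption the two differ only by the constant 1.0, and the last cumulative return is the \(R\) that enters the CAGR formula.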

Let’s add one more sentence here on “why compute CAGR”: very often we compare periods of different lengths (half a year, one year, two years). If you only look at cumulative return, it’s easy to conclude that “the longer the period, the more impressive it looks.” CAGR converts it into “equivalent annual growth,” making results from different research horizons easier to compare side by side.
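
A short worked example of the formula, using made-up returns purely for arithmetic: 21% over two years and 10% over six months look very different as cumulative numbers, but annualize to 10% and 21% respectively. The function name cagr is illustrative.

```python
def cagr(cum_return: float, years: float) -> float:
    """Equivalent annual growth rate for a cumulative return over `years`."""
    if years <= 0:
        raise ValueError('years must be positive')
    return (1.0 + cum_return) ** (1.0 / years) - 1.0


two_year = cagr(0.21, 2.0)     # 21% earned over two years
half_year = cagr(0.10, 0.5)    # 10% earned over six months

print(round(two_year, 6))      # 0.1
print(round(half_year, 6))     # 0.21
```

Note that annualizing a period shorter than one year extrapolates it: the six-month figure assumes the same growth repeats for the rest of the year.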

6.3 Runnable code + expected results

import pandas as pd

def _years_between(hdates) -> float:
    idx = pd.DatetimeIndex(hdates)
    days = (idx[-1] - idx[0]).days
    return max(1e-9, days / 365.25)

def _cagr_from_cumret(cumret_end: float, years: float) -> float:
    return (1.0 + cumret_end) ** (1.0 / years) - 1.0

years = _years_between(pf_long.hdates)

cr_long = pf_long.cum_return(htypes='close', method='simple')   # output column: cumret_close
cr_short = pf_short.cum_return(htypes='close', method='simple')

cumret_long_end = float(cr_long.values[cr_long.shares.index('LONG'), -1, 0])
cumret_short_end = float(cr_short.values[cr_short.shares.index('SHORT'), -1, 0])

# The benchmark row is also in shares (tag_along)
cumret_bm_long_end = float(cr_long.values[cr_long.shares.index('000300.SH'), -1, 0])
cumret_bm_short_end = float(cr_short.values[cr_short.shares.index('000300.SH'), -1, 0])

summary = pd.DataFrame(
    {
        'cum_return_end': [cumret_long_end, cumret_short_end, cumret_bm_long_end],
        'CAGR': [
            _cagr_from_cumret(cumret_long_end, years),
            _cagr_from_cumret(cumret_short_end, years),
            _cagr_from_cumret(cumret_bm_long_end, years),
        ],
    },
    index=['LONG', 'SHORT', '000300.SH'],
)

print(summary)

You should be able to see a 3-row summary table. It’s recommended that you pay attention to at least two things:

LONG vs 000300.SH: did it really outperform (higher CAGR and/or higher cumulative return)?
SHORT performance: It doesn’t necessarily have to “lose money,” but it helps us judge whether the condition is truly separating the sample (i.e., whether the difference between the LONG and SHORT ends is obvious).
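
Both comparisons can be read straight off the table with a small pandas helper. The sketch below assumes a summary DataFrame shaped like the one built in 6.3 (index labels LONG, SHORT, and the benchmark code, with a CAGR column). The helper name add_spreads is hypothetical, and the numbers are made-up placeholders to exercise the arithmetic, not results.

```python
import pandas as pd


def add_spreads(summary: pd.DataFrame, benchmark: str) -> pd.DataFrame:
    """Add each row's CAGR in excess of the benchmark row (illustrative helper)."""
    out = summary.copy()
    out['excess_CAGR'] = out['CAGR'] - out.loc[benchmark, 'CAGR']
    return out


# Placeholder values only; replace with the summary produced in 6.3.
placeholder = pd.DataFrame(
    {'CAGR': [0.10, -0.05, 0.02]},
    index=['LONG', 'SHORT', '000300.SH'],
)
result = add_spreads(placeholder, benchmark='000300.SH')
long_short_spread = result.loc['LONG', 'CAGR'] - result.loc['SHORT', 'CAGR']

print(result)
print(round(long_short_spread, 6))   # 0.15
```

A LONG excess near zero suggests the rule is mostly moving with the market, and a small LONG minus SHORT spread suggests the condition is not separating the sample.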

4.10. 7. Visual explanation: use `plot(highlight=...)` to mark the “trigger points” back on the chart

7.1 What this section aims to solve

We already have the return curve, but we’re still missing the last piece: explanation. In this step, we’ll do something “review-friendly”: on the candlestick chart of a single stock, highlight the trigger points so readers can see at a glance “when the timing condition was triggered.”

7.2 Minimum necessary principles

HistoryPanel.plot(highlight=...) supports two common usages:

Shorthand: highlight='max'/'min'
Explicit: highlight={'condition': <1D bool over time>, 'style': {...}}

One thing to note: in the static rendering path, condition is more oriented toward a 1D time axis (used for scatter markers on the chart). So here it’s more robust to use a 1D bool on a single-stock time axis for highlighting.

7.3 Runnable code + expected output

import numpy as np

primary_share = '000001.SZ'

# Extract this share's 1D time-axis condition from cond_long (a bool array of shape (M, L, 1) or (M, L, N))
si = hp_macd.shares.index(primary_share)
cond_1d = np.asarray(cond_long[si, :, 0], dtype=bool).ravel()

# To keep the chart uncluttered, look only at the last 200 trading days (you can switch to the full range)
hp_one = hp.loc[-200:]

fig = hp_one.plot(
    shares=[primary_share],
    interactive=True,
    highlight={'condition': cond_1d[-200:], 'style': {'marker': 'x', 's': 50}},
)
fig

Expected result: you’ll see a series of highlighted points on the chart (corresponding to the times when macd_hist_12_26_9 > 0). The value of this step is: when you see a segment of the portfolio curve suddenly deteriorate, you can go back to the single-stock chart to quickly confirm whether the trigger points are clustered in a choppy range, whether you’re getting “whipsawed” back and forth, and then decide whether to add filtering conditions next (e.g., trend filters, volatility filters, etc.).
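
To put a number on "whipsawed back and forth", the same 1D condition can be turned into entry and exit positions with NumPy. This is a small sketch on a synthetic bool array; with real data you would pass the cond_1d extracted in 7.3. The variable names are illustrative.

```python
import numpy as np

# Synthetic hold / not-hold states along one share's timeline.
cond_1d = np.array([False, True, True, False, True, True, True, False])

change = np.diff(cond_1d.astype(int))          # +1 = enter, -1 = exit
entries = np.flatnonzero(change == 1) + 1      # first index of each holding spell
exits = np.flatnonzero(change == -1) + 1       # first index after each spell ends
n_flips = int(np.count_nonzero(change))        # total state changes

print(entries.tolist())   # [1, 4]
print(exits.tolist())     # [3, 7]
print(n_flips)            # 4
```

Many flips over a short window are a sign the condition is being triggered inside a choppy range, which is when an extra trend or volatility filter tends to be worth trying.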

4.11. 8. Complete code (single-function runnable version)

Below is a complete “single-function runnable” version for you to copy into a Notebook and run with one click. It does three things:

Run through the full research loop of MACD -> mask -> portfolio -> CAGR;
Plot the normalized curves of LONG and the benchmark (apply the same normalize call to pf_short to view SHORT);
Plot single-stock charts and highlight trigger points (for easier explanation and review).

import numpy as np
import pandas as pd
import qteasy as qt


def demo_vertical_timing(
        stocks: list,
        benchmark: str = '000300.SH',
        start: str = '20220101',
        end: str = '20221231',
        primary_share: str = '000001.SZ',
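):
    """Demonstrate the vertical (per-stock) timing research loop: MACD threshold -> mask -> portfolio curve -> CAGR -> highlight explanation.

    Parameters
    ----------
    stocks : list
        List of stock codes (3 to 15 recommended; too many makes the result hard to explain).
    benchmark : str, default '000300.SH'
        Benchmark index code (the example uses CSI 300).
    start : str, default '20220101'
        Start date (YYYYMMDD).
    end : str, default '20221231'
        End date (YYYYMMDD).
    primary_share : str, default '000001.SZ'
        Code of the single stock used for the trigger-point highlight chart.

    Returns
    -------
    dict
        Collection of result objects for further inspection in a Notebook:
        - hp: the original panel
        - hp_macd: the panel with MACD columns
        - pf_long/pf_short: portfolio panels (including the benchmark row)
        - summary: CAGR summary table (DataFrame)
        - fig_pf: portfolio comparison chart
        - fig_one: single-stock highlight chart
    """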
    shares = list(stocks) + [benchmark]

    hp = qt.get_kline(
        shares=shares,
        start=start,
        end=end,
        freq='D',
        as_panel=True,
    )

    required = {'open', 'high', 'low', 'close'}
    missing = [c for c in required if c not in set(hp.htypes)]
    if missing:
        raise ValueError(f'Missing required OHLC columns in htypes: {missing}')
    if benchmark not in hp.shares:
        raise ValueError(f'Benchmark {benchmark} not found in shares: {hp.shares}')
    if hp.shape[1] < 50:
        raise ValueError(
            'Not enough data points loaded (too few hdates). '
            'Please check your local datasource and date range.'
        )
    if primary_share not in hp.shares:
        raise ValueError(f'primary_share "{primary_share}" not found in shares')

    # 1) Derive the factor: MACD (default 12_26_9)
    hp_macd = hp.kline.macd(price_htype='close', fastperiod=12, slowperiod=26, signalperiod=9)
    factor_col = 'macd_hist_12_26_9'
    if factor_col not in hp_macd.htypes:
        raise ValueError(f'Required factor htype "{factor_col}" not found after macd()')

    # 2) Condition -> mask (research convention)
    cond_long = (hp_macd[factor_col] > 0.0)
    cond_short = (hp_macd[factor_col] <= 0.0)
    mask_long = hp.where(cond_long)
    mask_short = hp.where(cond_short)

    # 3) Portfolio aggregation + benchmark
    pf_long = hp.portfolio(
        htypes='close',
        mode='equal',
        mask=mask_long,
        benchmark=benchmark,
        benchmark_output='tag_along',
        new_share_name='LONG',
    )
    pf_short = hp.portfolio(
        htypes='close',
        mode='equal',
        mask=mask_short,
        benchmark=benchmark,
        benchmark_output='tag_along',
        new_share_name='SHORT',
    )

    # 4) cum_return -> CAGR summary
    def _years_between(hdates) -> float:
        idx = pd.DatetimeIndex(hdates)
        days = (idx[-1] - idx[0]).days
        return max(1e-9, days / 365.25)

    def _cagr_from_cumret(cumret_end: float, years: float) -> float:
        return (1.0 + cumret_end) ** (1.0 / years) - 1.0

    years = _years_between(pf_long.hdates)
    cr_long = pf_long.cum_return(htypes='close', method='simple')
    cr_short = pf_short.cum_return(htypes='close', method='simple')

    cumret_long_end = float(cr_long.values[cr_long.shares.index('LONG'), -1, 0])
    cumret_short_end = float(cr_short.values[cr_short.shares.index('SHORT'), -1, 0])
    cumret_bm_end = float(cr_long.values[cr_long.shares.index(benchmark), -1, 0])

    summary = pd.DataFrame(
        {
            'cum_return_end': [cumret_long_end, cumret_short_end, cumret_bm_end],
            'CAGR': [
                _cagr_from_cumret(cumret_long_end, years),
                _cagr_from_cumret(cumret_short_end, years),
                _cagr_from_cumret(cumret_bm_end, years),
            ],
        },
        index=['LONG', 'SHORT', benchmark],
    )
    print('\n[CAGR summary]')
    print(summary)

    # 5) Chart: portfolio curve comparison (normalize first for easier visual comparison)
    pf_view = pf_long.normalize(htypes='close', base_index=0)
    fig_pf = pf_view.plot(interactive=True)

    # 6) Chart: single-stock trigger-point explanation (1D time-axis condition)
    si = hp_macd.shares.index(primary_share)
    cond_1d = np.asarray(cond_long[si, :, 0], dtype=bool).ravel()
    lookback = min(200, len(cond_1d))
    fig_one = hp.loc[-lookback:].plot(
        shares=[primary_share],
        interactive=True,
        highlight={'condition': cond_1d[-lookback:], 'style': {'marker': 'x', 's': 50}},
    )

    return {
        'hp': hp,
        'hp_macd': hp_macd,
        'pf_long': pf_long,
        'pf_short': pf_short,
        'summary': summary,
        'fig_pf': fig_pf,
        'fig_one': fig_one,
    }


res = demo_vertical_timing(
    stocks=['000001.SZ', '600519.SH', '300750.SZ'],
    benchmark='000300.SH',
    start='20220101',
    end='20221231',
    primary_share='000001.SZ',
)
res['fig_pf']

Note: the normalize above is only to make visual comparison more intuitive; for statistics you should still rely on cum_return or the raw price series.

4.12. 9. Summary and scope boundaries

By this point, we’ve already run through a complete research loop for “timing on a small set of instruments”: data -> factor -> bool condition -> where mask -> portfolio + benchmark -> cum_return + CAGR -> plot(highlight) explanation.

Note: the portfolio/cum_return here are all research-oriented calculations and do not include full backtesting semantics such as transaction costs, slippage, settlement, capital constraints, etc. If you want to migrate the research logic into a real strategy backtest, it’s recommended to output the factor/condition as strategy-usable data columns or signals, and hand them off to Operator/Backtester to handle the trading-layer semantics.

4.13. Appendix: figure index (optional: generate in a Notebook, then take screenshots)

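The filenames below are suggestions only; the repository may not contain the corresponding PNG files, and this does not affect following along with the code in the main text. Once each section runs in a Notebook, simply take the screenshots yourself.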

Suggested filename	Suggested placement	What you’ll see
`3.1_minimal_run.png`	§0 Minimal runnable kickoff	Prove the `get_kline -> plot` pipeline works
`3.1_pf_compare.png`	§0.5 or §6	Normalized curve comparison of `LONG/SHORT/000300.SH`
`3.1_highlight_one_share.png`	§7	Highlight trigger points on a single-stock chart (for easier review and explanation)