6. Studying event-driven factors with HistoryPanel

Prerequisites: Tutorial 2.0 (minimal dataset) and Tutorial 2.5 (completing 2.5 §0 first is recommended); this tutorial depends on complete OHLC columns.

Event-driven factor: candlestick pattern signal -> event-window mask -> CAGR and visual explanation

In real research, “event-type factors” are often closer to our intuition than continuous factors: for example, “a hammer appears,” “an engulfing pattern appears,” “a long bullish candle on high volume appears”… Once these events occur, we often instinctively ask:

If I only pick stocks around events, will it be better in the long run?

This tutorial centers on this problem: use HistoryPanel.candle_pattern() to turn pattern events into researchable data, then window them into cross-sectional screening conditions, and finally run through the full closed loop: event signal -> event window -> where/mask -> portfolio + benchmark -> cum_return + CAGR -> plot(highlight) interpretation.

Unified constraints (consistent with the previous two posts):

Focus on individual stocks, and also include 000300.SH as the benchmark
Event window K=5: an event counts as “valid” if it occurred within the most recent 5 trading days

6.1. 0. Opening: first get the minimal version of “pattern signal -> highlight events on the candlestick chart” running

First, the minimal runnable proof: we only do two things:

Compute the pattern signal;
Highlight the event points on a single stock’s candlestick chart.

import qteasy as qt

share = '000001.SZ'
benchmark = '000300.SH'
hp = qt.get_kline(
    shares=[share, benchmark],
    start='20220101',
    end='20221231',
    freq='D',
    as_panel=True,
)

signals = hp.candle_pattern(name='cdlhammer', as_panel=False)
print(signals.tail())

fig = hp.plot(shares=[share], interactive=True, highlight='max')
fig

However, merely being able to “compute it / plot it” is far from enough. If we really want to use it as a reusable research method, we’ll run into at least the following problems:

Events are too sparse: Many pattern signals pop up like “pinpoints”—they appear today and are gone tomorrow. If you build the basket by “pick it when it appears today”, the portfolio will be very unstable: the number of selected stocks will swing wildly, and the return curve will easily become choppy and discontinuous. In the end, you may not even be sure whether you’re studying the pattern or “sample sparsity”.
How to window it like in real trading: A more natural convention is usually not “pick it immediately if it appears today”, but “count it as valid if it has appeared within the most recent period of time”. Because in reality we may not be able to catch the perfect pattern point on the same day; more commonly, we treat it as an “attention signal within a short window”.
How to turn it into cross-sectional stock selection: The value of event studies often lies in “picking, on the same day, the subset that had the event from among many stocks.” This requires turning the event signal into an (M,L) condition matrix, then standardizing it into a where mask.
Returns without an explanation: Even if you’ve computed CAGR, you still need something you can review: which candlesticks did the event points correspond to? Did they occur in an uptrend, a downtrend, or a range? Marking the events back onto the chart is the most critical step in event-based research.

Fortunately, all of these capabilities can be filled in step by step. This article starts with the “structure of candlestick pattern signals.”

6.2. 0.5 First, show the final result (what we’ll end up with)

By following this article through to the end, you’ll get three types of outputs:

Portfolio curve from event-window screening vs benchmark: Treat stocks that have “had the pattern occur within the last 5 days” as a dynamic basket, compute its portfolio curve, and compare it with 000300.SH.
CAGR summary table: Convert the portfolio’s terminal return into an annualized figure, making it easy to compare different research periods side by side.
Single-stock candlestick chart with event highlights: Mark the event dates back onto the candlestick chart so the results are reviewable and discussable.

6.3. 1. Goals (what this article will accomplish)

Fetch an OHLC panel for multiple shares, and add 000300.SH as the benchmark
Use candle_pattern to obtain a pattern-signal matrix (time x shares)
Window the event signals (K=5) to obtain event conditions of (M,L)
Use where() to normalize it into a research mask, and build long/short portfolio curves
Use cum_return to derive a CAGR summary table
Use plot(highlight=...) to highlight event points on the candlestick chart, closing the interpretability loop
At the end, provide the complete “single-function runnable” code

6.4. 2. Preparing the data: OHLC is required (event-driven factors cannot do without it)

2.1 What this section aims to solve

Pattern recognition depends on open/high/low/close. So in this section we do only one thing: make sure the OHLC columns are complete, and include the benchmark as well so we can compare directly later.

2.2 Minimum necessary principles

candle_pattern(name=...) checks whether the price_htypes you pass in exist in hp.htypes. So we validate the fields first to avoid discovering missing columns halfway through.

2.3 Runnable code + expected results

import qteasy as qt

benchmark = '000300.SH'
shares = ['000001.SZ', '600519.SH', '300750.SZ', benchmark]

hp = qt.get_kline(
    shares=shares,
    start='20220101',
    end='20221231',
    freq='D',
    as_panel=True,
)

required = {'open', 'high', 'low', 'close'}
missing = [c for c in required if c not in set(hp.htypes)]
if missing:
    raise ValueError(f'Missing required OHLC columns in htypes: {missing}')
print('hp.shape:', hp.shape)
print('hp.htypes:', hp.htypes)

if hp.shape[1] < 50:
    raise ValueError(
        'Not enough data points loaded (too few hdates). '
        'Please check your local datasource and date range.'
    )

6.5. 3. Extracting the pattern factor: `candle_pattern` yields the event signal matrix

3.1 What this section aims to solve

We want an “event signal table”: for each day and each stock, whether the pattern appears. This example uses cdlhammer (Hammer); later you can replace it with other pattern function names.

3.2 Minimum necessary principles

The return value of signals = hp.candle_pattern(name='cdlhammer', as_panel=False) is:

DataFrame (index=time, columns=shares)
Values are floats (typically 0 means no event; positive/negative indicates direction/strength, as defined by TA-Lib)

Later we will convert it into a boolean event matrix and then apply windowing.

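Here we use a practical research simplification: whether the value is +100 or -100, any non-zero value counts as “an event occurred.” If direction matters, split the signal into two windows, >0 and <0 (this article demonstrates both long and short windows to stay consistent with the previous two articles). Note that some patterns are one-sided: a hammer is a bullish pattern, so cdlhammer is expected to produce only zero or positive values and its short window may stay empty. A two-sided pattern such as engulfing (cdlengulfing in TA-Lib naming) exercises both directions.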

3.3 Runnable code + expected result

signals = hp.candle_pattern(name='cdlhammer', as_panel=False)
print('signals shape:', signals.shape)

# Look only at non-zero events (to verify that the pattern actually fired)
nonzero = signals.where(signals != 0.0)
print(nonzero.dropna(how='all').tail())
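
The “non-zero means an event” convention and the directional split can be sketched on a small synthetic array. The values below are made up for illustration (TA-Lib pattern functions conventionally return multiples of 100), and the names toy_signals, any_event, long_event and short_event are hypothetical, not part of qteasy.

```python
import numpy as np

# Synthetic signal matrix, shape (L, M) = (4 days, 3 shares); illustrative values only.
toy_signals = np.array([
    [0.0, 100.0, 0.0],
    [0.0, 0.0, -100.0],
    [100.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
])

any_event = toy_signals != 0.0    # direction ignored
long_event = toy_signals > 0.0    # positive (bullish-direction) events only
short_event = toy_signals < 0.0   # negative (bearish-direction) events only

# Number of event days per share (column sums)
events_per_share = any_event.sum(axis=0)
print(events_per_share)                                # [1 1 1]
print(int(long_event.sum()), int(short_event.sum()))   # 2 1
```

The two directional matrices partition the undirected one, so a study can start with any_event and split by direction later without recomputing the pattern.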

6.6. 4. Event windowing (K=5): not “select it if it happened today” but “select it if it happened within the last 5 days”

4.1 What this section aims to solve

This is the most critical section of the article: turning sparse event signals into a more practical, “valid within the window” condition.

We choose K=5: if an event occurred within the most recent 5 days, we treat the stock as belonging to the candidate set on that day.

4.2 Minimum necessary principles

The shape of signals is (L, M) (time x shares). We convert it to (M, L) (shares x time), then do a rolling-window any:

events_ml[i, t] = True means share i had the event on day t
window_ml[i, t] = any(events_ml[i, t-K+1 : t+1])

The resulting window_ml is still a (M, L) boolean array, so you can feed it directly into hp.where(window_ml).

You can understand this “windowing” as a very intuitive research convention:

I’m not asking you to hit the exact pattern day today; as long as the pattern appeared once in the last 5 days, I’ll treat you as a candidate where the “event is still valid.”

This will make the basket more stable (it won’t be left with only one or two needle-in-a-haystack signals), and it’s closer to what we actually do in reviews: patterns are often a “phase signal,” not a “millisecond-level trigger.”
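
To see what the rolling any does to a single sparse event, here is a minimal pure-Python sketch for one share. The helper name any_in_last_k_1d and the toy data are assumptions for illustration; K=3 is used instead of 5 only to keep the output short.

```python
def any_in_last_k_1d(events, k):
    """Return a list where out[t] is True if any of events[t-k+1 .. t] is True."""
    out = []
    for t in range(len(events)):
        left = max(0, t - k + 1)          # clip the window at the start of the series
        out.append(any(events[left:t + 1]))
    return out

# One share, 8 days, a single event on day 2 (0-based).
events = [False, False, True, False, False, False, False, False]
windowed = any_in_last_k_1d(events, k=3)
print(windowed)
# [False, False, True, True, True, False, False, False]
```

A single event day turns into K consecutive “valid” days, which is why the windowed basket is less sparse than the raw signal.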

4.3 Runnable code + expected output (corresponding to sample sheet 3, K=5)

import numpy as np

K = 5

sig_ml = signals.to_numpy().T  # (M, L)

long_events_ml = sig_ml > 0.0
short_events_ml = sig_ml < 0.0

def any_in_last_k(events_ml: np.ndarray, k: int) -> np.ndarray:
    m, l = events_ml.shape
    out = np.zeros((m, l), dtype=bool)
    for t in range(l):
        left = max(0, t - k + 1)
        out[:, t] = np.any(events_ml[:, left:t+1], axis=1)
    return out

long_window_ml = any_in_last_k(long_events_ml, K)
short_window_ml = any_in_last_k(short_events_ml, K)

mask_long = hp.where(long_window_ml)
mask_short = hp.where(short_window_ml)

print('mask_long shape:', mask_long.shape)  # expected (M, L, N)
print('selected_count_last_day(long):', int(long_window_ml[:, -1].sum()))

I suggest you add another sanity check: look at the distribution of “daily selected count” to avoid ending up with an empty basket due to overly strict filtering:

selected_count_by_day = long_window_ml.sum(axis=0)
print('selected_count stats (event_long):')
print('  min/max:', int(selected_count_by_day.min()), int(selected_count_by_day.max()))
print('  mean:', float(selected_count_by_day.mean()))
print('  p10/p50/p90:', np.quantile(selected_count_by_day.astype(float), [0.1, 0.5, 0.9]))
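
For longer histories or larger stock pools, the per-day loop can be replaced by a cumulative-sum formulation. The sketch below is one possible vectorized variant, not something qteasy provides. It is checked against a copy of the loop version on random toy data, and any_in_last_k_cumsum is a hypothetical name.

```python
import numpy as np

def any_in_last_k_loop(events_ml, k):
    """Reference implementation: same logic as the per-day loop above."""
    m, l = events_ml.shape
    out = np.zeros((m, l), dtype=bool)
    for t in range(l):
        left = max(0, t - k + 1)
        out[:, t] = np.any(events_ml[:, left:t + 1], axis=1)
    return out

def any_in_last_k_cumsum(events_ml, k):
    """Vectorized variant: count events in each trailing window via a cumulative sum."""
    counts = np.cumsum(events_ml.astype(np.int64), axis=1)
    # Prepend a zero column so that padded[:, j] is the number of events in days 0..j-1.
    zeros = np.zeros((counts.shape[0], 1), dtype=np.int64)
    padded = np.concatenate([zeros, counts], axis=1)
    l = events_ml.shape[1]
    right = np.arange(1, l + 1)        # exclusive window end, t + 1
    left = np.maximum(0, right - k)    # window start, max(0, t - k + 1)
    return (padded[:, right] - padded[:, left]) > 0

rng = np.random.default_rng(0)
toy_events = rng.random((3, 40)) < 0.1   # sparse random events, shape (M, L)
same = np.array_equal(
    any_in_last_k_loop(toy_events, 5),
    any_in_last_k_cumsum(toy_events, 5),
)
print(same)
```

Both functions assume k >= 1. For a few hundred trading days the loop is fast enough, so the vectorized form is mainly useful when scanning many patterns or many values of K.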

6.7. 5. `portfolio + benchmark + cum_return + CAGR`: turning event-window screening into comparable results

5.1 What this section aims to solve

We’ve got the event-window filtering, but does it actually “work”? In this section, we give a clear conclusion using the portfolio curve + a CAGR table: the difference between the event long/short groups and the benchmark over the research period.

5.2 Minimum necessary principles

Here we still use research-oriented portfolio aggregation, without trade execution. Then use the ending value of cum_return to derive CAGR (equivalent annualized return).
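
To make “research-oriented aggregation without trade execution” concrete, here is a rough sketch of one way an equal-weight masked basket could be computed by hand. This is not qteasy's implementation of portfolio(). It assumes that the basket return on each day is the plain average of the simple returns of the shares whose mask was True on the previous day, and that a day with an empty basket earns zero. The helper name equal_weight_curve and the toy data are hypothetical.

```python
import numpy as np

def equal_weight_curve(close_ml, mask_ml):
    """Cumulative simple return of an equal-weight masked basket (hypothetical helper)."""
    # Daily simple returns per share, shape (M, L-1).
    rets = close_ml[:, 1:] / close_ml[:, :-1] - 1.0
    # Assumption: the mask known on day t decides who is held from day t to day t+1.
    held = mask_ml[:, :-1]
    n_held = held.sum(axis=0)
    basket_ret = np.where(
        n_held > 0,
        (rets * held).sum(axis=0) / np.maximum(n_held, 1),
        0.0,                      # empty basket: assume no return that day
    )
    return np.cumprod(1.0 + basket_ret) - 1.0

close = np.array([[10.0, 11.0, 12.1],    # share A: +10% on each day
                  [20.0, 20.0, 10.0]])   # share B: never selected
mask = np.array([[True, True, True],
                 [False, False, False]])
curve = equal_weight_curve(close, mask)
print(curve)   # approximately [0.10, 0.21]
```

No costs, position limits or settlement rules appear anywhere in this calculation, which is exactly the sense in which the result is a research curve rather than a backtest.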

5.3 Runnable code (illustration)

import pandas as pd

benchmark = '000300.SH'

pf_long = hp.portfolio(
    htypes='close',
    mode='equal',
    mask=mask_long,
    benchmark=benchmark,
    benchmark_output='tag_along',
    new_share_name='EVENT_LONG',
)

pf_short = hp.portfolio(
    htypes='close',
    mode='equal',
    mask=mask_short,
    benchmark=benchmark,
    benchmark_output='tag_along',
    new_share_name='EVENT_SHORT',
)

def _years_between(hdates) -> float:
    idx = pd.DatetimeIndex(hdates)
    days = (idx[-1] - idx[0]).days
    return max(1e-9, days / 365.25)

def _cagr_from_cumret(cumret_end: float, years: float) -> float:
    return (1.0 + cumret_end) ** (1.0 / years) - 1.0

years = _years_between(pf_long.hdates)
cr = pf_long.cum_return(htypes='close', method='simple')

cumret_long_end = float(cr.values[cr.shares.index('EVENT_LONG'), -1, 0])
cumret_bm_end = float(cr.values[cr.shares.index(benchmark), -1, 0])

print('CAGR(event_long):', _cagr_from_cumret(cumret_long_end, years))
print('CAGR(benchmark):', _cagr_from_cumret(cumret_bm_end, years))
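
As a quick arithmetic check of the annualization formula, the sketch below applies it to two made-up cumulative returns. The numbers are illustrative and are not results of the study above.

```python
def cagr_from_cumret(cumret_end, years):
    """Annualize an end-of-period cumulative simple return."""
    return (1.0 + cumret_end) ** (1.0 / years) - 1.0

two_year = cagr_from_cumret(0.21, 2.0)    # 21% over two years  -> about 10% per year
half_year = cagr_from_cumret(0.10, 0.5)   # 10% over six months -> about 21% per year
print(round(two_year, 4), round(half_year, 4))
# 0.1 0.21
```

The second case shows why CAGR over a research window shorter than a year should be read with care: annualizing compounds the short-period return, so small differences are magnified.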

Likewise, I suggest you organize it into a summary table (at least 3 rows: EVENT_LONG / EVENT_SHORT / benchmark) so readers can compare at a glance:

cr2 = pf_short.cum_return(htypes='close', method='simple')
cumret_short_end = float(cr2.values[cr2.shares.index('EVENT_SHORT'), -1, 0])

summary = pd.DataFrame(
    {
        'cum_return_end': [cumret_long_end, cumret_short_end, cumret_bm_end],
        'CAGR': [
            _cagr_from_cumret(cumret_long_end, years),
            _cagr_from_cumret(cumret_short_end, years),
            _cagr_from_cumret(cumret_bm_end, years),
        ],
    },
    index=['EVENT_LONG', 'EVENT_SHORT', '000300.SH'],
)
print('\n[CAGR summary]')
print(summary)

6.8. 6. Visual explanation: highlight event days on the candlestick chart (so conclusions can be reviewed)

6.1 What this section aims to solve

The biggest fear in event studies is “having only a single returns table.” We need to plot the event points back onto the candlestick chart and visually verify “what pattern this actually is, and within what trend it occurs,” so the conclusion can be reproduced and discussed.

6.2 Minimum necessary principles

plot(highlight=...) accepts a 1D boolean array along the time axis to mark event points. So for a given primary_share, we extract a 1D condition from signals and pass it as the highlight when plotting the candlestick chart.

6.3 Runnable code + expected results

import numpy as np

primary_share = '000001.SZ'
event_1d = (signals[primary_share].to_numpy() != 0.0).astype(bool)

fig = hp.plot(
    shares=[primary_share],
    interactive=True,
    highlight={'condition': event_1d, 'style': {'marker': 'x', 's': 60}},
)
fig

Expected outcome: you’ll see a series of highlighted points (event days) on the candlestick chart. This step is crucial: event studies aren’t afraid of “bad results”; they’re afraid of “results that can’t be explained.” By marking the events back on the chart, you can directly see where they occur: at the end of a decline, as a continuation within an uptrend, or just sideways noise—so you can decide whether the next step is to add trend filters or volatility filters.
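
The visual check can be complemented by a crude numeric tag for each event day. The sketch below is only one possible heuristic and is not part of qteasy: it compares the close on the event day with the mean of the preceding closes. The helper name trend_context, the labels, the 5-day lookback and the toy prices are all assumptions for illustration.

```python
def trend_context(closes, event_flags, lookback=5):
    """Label each event day by comparing its close with the mean of the prior closes."""
    labels = {}
    for t, is_event in enumerate(event_flags):
        if not is_event or t < lookback:
            continue  # skip non-events and days without enough history
        prior_mean = sum(closes[t - lookback:t]) / lookback
        labels[t] = 'after_decline' if closes[t] < prior_mean else 'after_advance'
    return labels

closes = [10.0, 9.8, 9.6, 9.4, 9.2, 9.0, 9.3, 9.6, 9.9, 10.2, 10.5]
events = [False] * 5 + [True] + [False] * 4 + [True]
print(trend_context(closes, events))
# {5: 'after_decline', 10: 'after_advance'}
```

With real data, closes and events would come from the close column of primary_share and from event_1d. Counting the labels gives a first hint of whether a trend filter is worth adding.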

6.9. 7. Complete code (single-function runnable version)

Below is a complete “single-function runnable” version for you to copy into a Notebook and run with one click. It covers:

candle_pattern extracts the event signals;
K=5 windowing (any in the last 5 days);
The research closed loop of where -> portfolio -> cum_return -> CAGR;
Highlight event points on a single stock’s candlestick chart (explainable via review/replay).

import numpy as np
import pandas as pd
import qteasy as qt


def demo_event_pattern(
        shares: list,
        benchmark: str = '000300.SH',
        pattern_name: str = 'cdlhammer',
        k: int = 5,
        start: str = '20220101',
        end: str = '20221231',
        primary_share: str = '000001.SZ',
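):
    """Demonstrate the event-driven factor research loop: pattern signal -> windowing -> portfolio curve -> CAGR -> candlestick highlight.

    Parameters
    ----------
    shares : list
        Stock pool (must include benchmark; 3 to 30 shares are enough to demonstrate cross-sectional screening).
    benchmark : str, default '000300.SH'
        Benchmark index code.
    pattern_name : str, default 'cdlhammer'
        Pattern name (TA-Lib style name).
    k : int, default 5
        Event window length: an event within the most recent k days counts as valid.
    start : str, default '20220101'
        Start date (YYYYMMDD).
    end : str, default '20221231'
        End date (YYYYMMDD).
    primary_share : str, default '000001.SZ'
        Code of the single share used for the highlighted chart.

    Returns
    -------
    dict
        Result objects, including hp, signals, pf_long, pf_short, summary, fig_pf and fig_one.
    """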
    if benchmark not in shares:
        raise ValueError('benchmark must be included in shares')

    hp = qt.get_kline(
        shares=shares,
        start=start,
        end=end,
        freq='D',
        as_panel=True,
    )
    required = {'open', 'high', 'low', 'close'}
    missing = [c for c in required if c not in set(hp.htypes)]
    if missing:
        raise ValueError(f'Missing required OHLC columns in htypes: {missing}')
    if hp.shape[1] < 50:
        raise ValueError(
            'Not enough data points loaded (too few hdates). '
            'Please check your local datasource and date range.'
        )
    if primary_share not in hp.shares:
        raise ValueError(f'primary_share "{primary_share}" not found in shares')

    # 1) Pattern signals: DataFrame (L, M)
    signals = hp.candle_pattern(name=pattern_name, as_panel=False)
    print('\n[signals]')
    print('  shape:', signals.shape)
    print('  nonzero tail:')
    print(signals.where(signals != 0.0).dropna(how='all').tail())

    # 2) Events -> bool -> any-in-last-k windowing (yields (M, L))
    sig_ml = signals.to_numpy().T  # (M, L)
    long_events_ml = sig_ml > 0.0
    short_events_ml = sig_ml < 0.0

    def any_in_last_k(events_ml: np.ndarray, kk: int) -> np.ndarray:
        m, l = events_ml.shape
        out = np.zeros((m, l), dtype=bool)
        for t in range(l):
            left = max(0, t - kk + 1)
            out[:, t] = np.any(events_ml[:, left:t + 1], axis=1)
        return out

    long_window_ml = any_in_last_k(long_events_ml, k)
    short_window_ml = any_in_last_k(short_events_ml, k)

    selected_count_by_day = long_window_ml.sum(axis=0)
    print('\n[Selection count stats]')
    print('  min/max:', int(selected_count_by_day.min()), int(selected_count_by_day.max()))
    print('  mean:', float(selected_count_by_day.mean()))
    print('  p10/p50/p90:', np.quantile(selected_count_by_day.astype(float), [0.1, 0.5, 0.9]))

    mask_long = hp.where(long_window_ml)
    mask_short = hp.where(short_window_ml)

    # 3) Portfolio aggregation + benchmark
    pf_long = hp.portfolio(
        htypes='close',
        mode='equal',
        mask=mask_long,
        benchmark=benchmark,
        benchmark_output='tag_along',
        new_share_name='EVENT_LONG',
    )
    pf_short = hp.portfolio(
        htypes='close',
        mode='equal',
        mask=mask_short,
        benchmark=benchmark,
        benchmark_output='tag_along',
        new_share_name='EVENT_SHORT',
    )

    # 4) cum_return -> CAGR summary
    def _years_between(hdates) -> float:
        idx = pd.DatetimeIndex(hdates)
        days = (idx[-1] - idx[0]).days
        return max(1e-9, days / 365.25)

    def _cagr_from_cumret(cumret_end: float, years: float) -> float:
        return (1.0 + cumret_end) ** (1.0 / years) - 1.0

    years = _years_between(pf_long.hdates)
    cr_long = pf_long.cum_return(htypes='close', method='simple')
    cr_short = pf_short.cum_return(htypes='close', method='simple')

    cumret_long_end = float(cr_long.values[cr_long.shares.index('EVENT_LONG'), -1, 0])
    cumret_short_end = float(cr_short.values[cr_short.shares.index('EVENT_SHORT'), -1, 0])
    cumret_bm_end = float(cr_long.values[cr_long.shares.index(benchmark), -1, 0])

    summary = pd.DataFrame(
        {
            'cum_return_end': [cumret_long_end, cumret_short_end, cumret_bm_end],
            'CAGR': [
                _cagr_from_cumret(cumret_long_end, years),
                _cagr_from_cumret(cumret_short_end, years),
                _cagr_from_cumret(cumret_bm_end, years),
            ],
        },
        index=['EVENT_LONG', 'EVENT_SHORT', benchmark],
    )
    print('\n[CAGR summary]')
    print(summary)

    # 5) Chart: portfolio comparison (normalized for easier reading)
    fig_pf = pf_long.normalize(htypes='close', base_index=0).plot(interactive=True)

    # 6) Chart: single-share event highlight (1D boolean along the time axis)
    event_1d = (signals[primary_share].to_numpy() != 0.0).astype(bool)
    lookback = min(200, len(event_1d))
    fig_one = hp.loc[-lookback:].plot(
        shares=[primary_share],
        interactive=True,
        highlight={'condition': event_1d[-lookback:], 'style': {'marker': 'x', 's': 60}},
    )
    return {
        'hp': hp,
        'signals': signals,
        'pf_long': pf_long,
        'pf_short': pf_short,
        'summary': summary,
        'fig_pf': fig_pf,
        'fig_one': fig_one,
    }


res = demo_event_pattern(
    shares=['000001.SZ', '600519.SH', '300750.SZ', '000300.SH'],
    benchmark='000300.SH',
    pattern_name='cdlhammer',
    k=5,
    start='20220101',
    end='20221231',
    primary_share='000001.SZ',
)
res['fig_one']

6.10. 8. Summary and boundaries

By this point, we’ve run through the full event-factor research pipeline: pattern signal -> windowing -> cross-sectional filtering -> portfolio curve -> CAGR -> candlestick highlighting and explanation.

To emphasize again: this pipeline is research-oriented coarse aggregation and contains no trade-execution semantics. If you want to turn the event signal into a real backtestable strategy, you should convert the “event-window filtering rules” into strategy signals and hand them off to Operator/Backtester to handle trading-layer details (costs, settlement, order constraints, etc.).

6.11. Appendix: figure index (optional: generate in a Notebook, then take screenshots)

The filenames below are suggestions; the repository does not necessarily contain the corresponding png files. After running each section in a Notebook, take the screenshots yourself.

Suggested filename	Suggested placement	What you’ll see
`3.3_minimal_run.png`	§0	Minimal runnable: compute signals + plot the basic chart
`3.3_pf_compare.png`	§0.5 or §5	Portfolio curve of the event-window basket vs. the benchmark
`3.3_highlight_pattern.png`	§6	Highlight the event day on a single-stock candlestick chart (reproducible and explainable)