6. Studying Event-Type Factors with HistoryPanel

Event-type factors: candlestick pattern signal -> event-window mask -> CAGR and visual explanation

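In real research, "event-type factors" often match our intuition better than continuous factors. Think of "a hammer appeared", "an engulfing pattern appeared" or "a long bullish candle on heavy volume appeared". Once such an event happens, we tend to ask ourselves almost automatically: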

If I only pick stocks around these events, will I do better in the long run?

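This tutorial is built around that question. We use HistoryPanel.candle_pattern() to turn pattern events into researchable data, turn those events into a windowed cross-sectional screening condition, and finally run the full loop: event signal -> event window -> where/mask -> portfolio + benchmark -> cum_return + CAGR -> plot(highlight) for explanation.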

Common constraints (consistent with the previous two articles):

Focus on individual stocks, with 000300.SH added as the benchmark
Event window K=5 (an event counts as "active" if it occurred within the last 5 days)

6.1. 0. Opening: run the minimal version of "pattern signal -> highlight events on the candlestick chart"

Let's start with a minimal runnable proof. It does only two things:

Compute the pattern signal;
Highlight the event points on a single stock's candlestick chart.

import qteasy as qt

share = '000001.SZ'
benchmark = '000300.SH'
hp = qt.get_kline(
    shares=[share, benchmark],
    start='20220101',
    end='20221231',
    freq='D',
    as_panel=True,
)

signals = hp.candle_pattern(name='cdlhammer', as_panel=False)
print(signals.tail())

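# Highlight the days on which the pattern fired (non-zero signal; NaN treated as no event)
event_1d = signals[share].fillna(0.0).to_numpy() != 0.0
fig = hp.plot(
    shares=[share],
    interactive=True,
    highlight={'condition': event_1d, 'style': {'marker': 'x', 's': 60}},
)
fig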

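However, being able to compute it and plot it is far from enough. If we want to use this as a reusable research method, we run into at least the following problems: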

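Events are too sparse: many pattern signals pop up like needle spikes, present today and gone tomorrow. If you build the basket as "select on the day it appears", the portfolio becomes very unstable: the number of selected stocks swings wildly, the return curve becomes choppy, and in the end you may not be able to tell whether you are studying the pattern or studying sample sparsity.
What windowing looks like in practice: the more natural convention is usually not "select immediately on the day it appears" but "it counts as valid if it appeared within a recent period". In reality we may not catch the perfect pattern on the very day it forms; it is more common to treat it as a short-window watch signal.
How to turn it into cross-sectional selection: the value of event research often lies in picking, on the same day, the subset of many stocks that experienced the event. This requires turning the event signal into an (M, L) condition matrix and then normalizing it into a where mask.
Returns without explanation: even after computing CAGR, the result must be reviewable. Which candlesticks do the event points actually correspond to? Did they occur in an uptrend, a downtrend or a range? Marking the events back on the chart is the most critical step in event-type research.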

Fortunately, all of these capabilities can be added step by step. We start with the structure of pattern signals.

6.2. 0.5 The final result first (what we will end up with)

By the end of this article you will have three kinds of output:

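Event-window portfolio curve vs benchmark: treat the stocks that had the pattern within the last 5 days as a dynamic basket, compute its portfolio curve, and compare it with 000300.SH.
CAGR summary table: convert each portfolio's end-of-period return into an annualized figure, which makes different research periods easier to compare.
Single-stock candlestick chart with highlighted events: mark event dates back on the candlesticks so the results can be reviewed and discussed.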

6.3. 1. Goals (what this article will accomplish)

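Fetch an OHLC panel for multiple shares, with 000300.SH included as the benchmark
Use candle_pattern to obtain the pattern signal matrix (time x shares)
Window the event signal (K=5) to obtain an (M, L) event condition
Normalize it into a research mask with where(), and build two portfolio curves, long and short
Derive a CAGR summary table from cum_return
Highlight event points on the candlestick chart with plot(highlight=...), closing the explanation loop
Provide a complete, single-function runnable script at the end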

6.4. 2. Preparing data: OHLC is required (event-type factors depend on it)

2.1 What this section solves

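Pattern recognition depends on open/high/low/close. This section therefore does only one thing: make sure all OHLC columns are present, and load the benchmark along with the stocks so we can compare against it directly later.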

2.2 Minimal necessary background

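candle_pattern(name=...) checks whether the price_htypes you pass exist in hp.htypes. We therefore validate the fields up front, so that a missing column does not surface halfway through the analysis.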

2.3 Runnable code + expected result

import qteasy as qt

benchmark = '000300.SH'
shares = ['000001.SZ', '600519.SH', '300750.SZ', benchmark]

hp = qt.get_kline(
    shares=shares,
    start='20220101',
    end='20221231',
    freq='D',
    as_panel=True,
)

# candle_pattern needs all four OHLC price types
required = {'open', 'high', 'low', 'close'}
missing = sorted(required - set(hp.htypes))
if missing:
    raise ValueError(f'Missing required OHLC columns in htypes: {missing}')
print('hp.shape:', hp.shape)
print('hp.htypes:', hp.htypes)

# hp.shape is (M, L, N) = (shares, hdates, htypes): require enough trading days
if hp.shape[1] < 50:
    raise ValueError(
        'Not enough data points loaded (too few hdates). '
        'Please check your local datasource and date range.'
    )

6.5. 3. Pattern factor extraction: `candle_pattern` yields the event signal matrix

3.1 What this section solves

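We want an "event signal table": for every day and every stock, whether the pattern occurred. The example uses cdlhammer (hammer); you can later swap in other pattern function names.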

3.2 Minimal necessary background

signals = hp.candle_pattern(name='cdlhammer', as_panel=False) returns:

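a DataFrame (index = time, columns = shares)
with float values (usually 0 means no event, and positive/negative values indicate direction/strength; the exact meaning is defined by ta-lib)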

Next we convert it into a bool event matrix and then window it.

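Here we use a practical research simplification: whether the value is +100 or -100, as long as it is non-zero, we treat it as "an event occurred". If you care about direction, you can split it into two windows, >0 and <0 (this article demonstrates both a long and a short window, consistent with the previous two articles).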
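The simplification above can be sketched on a small hand-made signals table (the values are made up for illustration, not real cdlhammer output). One pitfall worth knowing: in pandas, NaN != 0 evaluates to True, so if a signals column contains NaN (for example during a warm-up period, which is an assumption here), a plain `!= 0` test would count it as an event. Filling NaN first avoids that, while gt/lt already treat NaN as False.

```python
import numpy as np
import pandas as pd

# Toy signals table in the same layout as candle_pattern(as_panel=False):
# index = dates, columns = shares. Values are illustrative only.
signals_toy = pd.DataFrame(
    {
        'AAA': [0.0, 100.0, 0.0, 0.0],
        'BBB': [0.0, 0.0, -100.0, np.nan],
    },
    index=pd.to_datetime(['2022-01-04', '2022-01-05', '2022-01-06', '2022-01-07']),
)

# Direction-agnostic events: any non-zero value, with NaN treated as "no event"
any_event = signals_toy.fillna(0.0).ne(0.0)

# Direction-aware events: positive -> long set, negative -> short set
long_events = signals_toy.gt(0.0)
short_events = signals_toy.lt(0.0)

print(any_event.sum().to_dict())     # events per share, any direction
print(long_events.sum().to_dict())   # long-side events per share
print(short_events.sum().to_dict())  # short-side events per share
```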

3.3 Runnable code + expected result

signals = hp.candle_pattern(name='cdlhammer', as_panel=False)
print('signals shape:', signals.shape)

# Show only non-zero events (to verify that the pattern actually fires)
nonzero = signals.where(signals != 0.0)
print(nonzero.dropna(how='all').tail())

6.6. 4. Event windowing (K=5): not "select if it happened today" but "select if it happened in the last 5 days"

4.1 What this section solves

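This is the key section of the article: turning sparse event signals into a "valid within the window" condition that is closer to how events are used in practice.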

We choose K=5: if an event occurred within the last 5 days, the stock belongs to the candidate set on that day.

4.2 Minimal necessary background

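signals has shape (L, M) (time x shares). We transpose it to (M, L) (shares x time) and then take a rolling "any" over the last K days:

events_ml[i, t] = True means share i had an event on day t
window_ml[i, t] = any(events_ml[i, max(0, t-K+1) : t+1])

The resulting window_ml is still an (M, L) bool array, so it can be passed directly to hp.where(window_ml).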
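The window formula can be checked on a single toy row, and the same result can be computed without a Python loop over days by using prefix sums: the number of events in [t-K+1, t] is the difference of two cumulative sums, and the window is active when that count is positive. The helper name any_in_last_k_cumsum is our own, hypothetical function, not a qteasy API.

```python
import numpy as np


def any_in_last_k_cumsum(events_ml: np.ndarray, k: int) -> np.ndarray:
    """window[i, t] = any(events[i, max(0, t-k+1) : t+1]), computed via prefix sums."""
    if k < 1:
        raise ValueError('k must be >= 1')
    events = np.asarray(events_ml, dtype=bool)
    m, l = events.shape
    # csum[:, j] = number of events in columns [0, j); csum[:, 0] = 0
    csum = np.zeros((m, l + 1), dtype=np.int64)
    csum[:, 1:] = np.cumsum(events, axis=1)
    t = np.arange(l)
    left = np.maximum(0, t - k + 1)
    counts = csum[:, t + 1] - csum[:, left]   # events inside each day's window
    return counts > 0


# Worked example with K=3: one share, events on day 1 and day 6
row = np.array([[False, True, False, False, False, False, True]])
print(any_in_last_k_cumsum(row, 3).astype(int))
# -> [[0 1 1 1 0 0 1]]
```

Reading the output: the event on day 1 keeps the window active on days 1 to 3 (K=3 days including the event day), days 4 and 5 fall outside every window, and the event on day 6 reactivates it.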

You can think of this windowing as a very intuitive research convention:

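I don't require you to land exactly on the pattern day; as long as the pattern appeared at least once in the last 5 days, I treat you as a candidate whose event is still active.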

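This makes the basket more stable (it won't shrink to one or two needle-spike signals), and it is closer to what we actually do when reviewing charts: a pattern is usually a "phase signal", not a "millisecond trigger".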
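The stability claim can be checked on synthetic data. The sketch below uses randomly generated sparse events (an assumption standing in for real pattern output) and computes the window directly on the (L, M) layout with pandas rolling(...).max() over 0/1 values, which is equivalent to "any event in the last K rows"; min_periods=1 reproduces the shorter windows at the start, like max(0, t-K+1) in the formula.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_days, n_shares = 250, 30

# Synthetic sparse events in the (L, M) layout: about 2% of share-days fire
events_lm = pd.DataFrame(rng.random((n_days, n_shares)) < 0.02)


def window_any(events: pd.DataFrame, k: int) -> pd.DataFrame:
    # rolling max over 0/1 values == "any event in the last k rows"
    return events.astype(float).rolling(window=k, min_periods=1).max().astype(bool)


for k in (1, 5):
    counts = window_any(events_lm, k).sum(axis=1)   # selected shares per day
    print(f'K={k}: mean={counts.mean():.2f}, empty_days={int((counts == 0).sum())}')
```

Because every K=1 selection is also inside the K=5 window, the K=5 basket can only be larger or equal on each day, and the number of empty-basket days can only drop.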

4.3 Runnable code + expected result (sample 3, K=5)

import numpy as np

K = 5

sig_ml = signals.to_numpy().T  # (M, L)

long_events_ml = sig_ml > 0.0
short_events_ml = sig_ml < 0.0

def any_in_last_k(events_ml: np.ndarray, k: int) -> np.ndarray:
    m, l = events_ml.shape
    out = np.zeros((m, l), dtype=bool)
    for t in range(l):
        left = max(0, t - k + 1)
        out[:, t] = np.any(events_ml[:, left:t+1], axis=1)
    return out

long_window_ml = any_in_last_k(long_events_ml, K)
short_window_ml = any_in_last_k(short_events_ml, K)

mask_long = hp.where(long_window_ml)
mask_short = hp.where(short_window_ml)

print('mask_long shape:', mask_long.shape)  # expected (M, L, N)
print('selected_count_last_day(long):', int(long_window_ml[:, -1].sum()))

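We recommend adding a sanity check on the distribution of the number of selected stocks per day, so that overly strict screening does not leave you with empty baskets: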

selected_count_by_day = long_window_ml.sum(axis=0)
print('selected_count stats (event_long):')
print('  min/max:', int(selected_count_by_day.min()), int(selected_count_by_day.max()))
print('  mean:', float(selected_count_by_day.mean()))
print('  p10/p50/p90:', np.quantile(selected_count_by_day.astype(float), [0.1, 0.5, 0.9]))
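Beyond summary statistics, it helps to report explicitly how often each side's basket is empty. This matters here because, to our understanding, TA-Lib's CDLHAMMER is a bullish-only pattern that outputs 0 or +100, so with cdlhammer the short window may be empty on every day and the EVENT_SHORT curve would carry no information. The helper name basket_report is hypothetical.

```python
import numpy as np


def basket_report(window_ml: np.ndarray) -> dict:
    """Summarize an (M, L) bool selection matrix by its per-day basket sizes."""
    counts = np.asarray(window_ml, dtype=bool).sum(axis=0)  # shares selected per day
    return {
        'days': int(counts.size),
        'empty_days': int((counts == 0).sum()),
        'empty_ratio': float((counts == 0).mean()) if counts.size else float('nan'),
        'all_empty': bool((counts == 0).all()),
    }


# Toy example: 3 shares x 4 days; the short side never fires
long_w = np.array([[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0]], dtype=bool)
short_w = np.zeros_like(long_w)
for name, w in [('long', long_w), ('short', short_w)]:
    rep = basket_report(w)
    if rep['all_empty']:
        print(f'[warn] {name} window is empty on every day')
    print(name, rep)
```

With the real data you would call basket_report(long_window_ml) and basket_report(short_window_ml).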

6.7. 5. `portfolio + benchmark + cum_return + CAGR`: turning the event-window screen into comparable results

5.1 What this section solves

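We now have the event-window screen, but is it actually useful? This section uses portfolio curves plus a CAGR table to answer clearly how the event long/short groups differ from the benchmark over the research period.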

5.2 Minimal necessary background

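We again use research-oriented portfolio aggregation, without trade execution. We then take the last value of cum_return and derive CAGR (the equivalent annualized growth rate) from it.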
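The exact aggregation rule behind portfolio(mode='equal') is not spelled out here, so the following is only a minimal sketch of one common research convention, under our own assumptions: the basket selected by the mask at the close of day t-1 is held over day t with equal weights, an empty basket earns 0, and daily basket returns are compounded into a cumulative simple return. The function name is hypothetical, and qteasy's implementation may differ (for example in timing or rebalancing).

```python
import numpy as np


def equal_weight_basket_cumret(close_ml: np.ndarray, mask_ml: np.ndarray) -> np.ndarray:
    """Cumulative simple return of an equal-weight basket (assumed convention).

    close_ml: (M, L) close prices, assumed to have no missing values.
    mask_ml:  (M, L) bool selection; the basket selected on day t-1 is held on day t.
    """
    close = np.asarray(close_ml, dtype=float)
    mask = np.asarray(mask_ml, dtype=bool)
    daily = np.zeros_like(close)
    daily[:, 1:] = close[:, 1:] / close[:, :-1] - 1.0   # per-share simple returns
    basket = np.zeros(close.shape[1])                   # day 0 has no return
    for t in range(1, close.shape[1]):
        held = mask[:, t - 1]
        if held.any():                                  # empty basket -> 0 return
            basket[t] = daily[held, t].mean()
    return np.cumprod(1.0 + basket) - 1.0


close_toy = np.array([[10.0, 11.0, 12.1], [20.0, 20.0, 18.0]])
mask_toy = np.array([[True, True, True], [False, False, False]])
print(equal_weight_basket_cumret(close_toy, mask_toy))  # only share 0 is ever held
```

Lagging the mask by one day is a deliberate choice: a pattern is only known at the close, so using day t's selection to earn day t's return would look ahead. If you compare this sketch with portfolio(), check which timing qteasy actually uses.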

5.3 Runnable code (illustrative)

import pandas as pd

benchmark = '000300.SH'

pf_long = hp.portfolio(
    htypes='close',
    mode='equal',
    mask=mask_long,
    benchmark=benchmark,
    benchmark_output='tag_along',
    new_share_name='EVENT_LONG',
)

pf_short = hp.portfolio(
    htypes='close',
    mode='equal',
    mask=mask_short,
    benchmark=benchmark,
    benchmark_output='tag_along',
    new_share_name='EVENT_SHORT',
)

def _years_between(hdates) -> float:
    idx = pd.DatetimeIndex(hdates)
    days = (idx[-1] - idx[0]).days
    return max(1e-9, days / 365.25)

def _cagr_from_cumret(cumret_end: float, years: float) -> float:
    return (1.0 + cumret_end) ** (1.0 / years) - 1.0

years = _years_between(pf_long.hdates)
cr = pf_long.cum_return(htypes='close', method='simple')

cumret_long_end = float(cr.values[cr.shares.index('EVENT_LONG'), -1, 0])
cumret_bm_end = float(cr.values[cr.shares.index(benchmark), -1, 0])

print('CAGR(event_long):', _cagr_from_cumret(cumret_long_end, years))
print('CAGR(benchmark):', _cagr_from_cumret(cumret_bm_end, years))

We also recommend organizing the results into a summary table (at least 3 rows: EVENT_LONG / EVENT_SHORT / benchmark) so readers can compare them at a glance:

cr2 = pf_short.cum_return(htypes='close', method='simple')
cumret_short_end = float(cr2.values[cr2.shares.index('EVENT_SHORT'), -1, 0])

summary = pd.DataFrame(
    {
        'cum_return_end': [cumret_long_end, cumret_short_end, cumret_bm_end],
        'CAGR': [
            _cagr_from_cumret(cumret_long_end, years),
            _cagr_from_cumret(cumret_short_end, years),
            _cagr_from_cumret(cumret_bm_end, years),
        ],
    },
    index=['EVENT_LONG', 'EVENT_SHORT', benchmark],
)
print('\n[CAGR summary]')
print(summary)

6.8. 6. Visual explanation: highlight event dates on the candlestick chart (making conclusions reviewable)

6.1 What this section solves

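The biggest risk in event-type research is ending up with nothing but a returns table. We need to mark the event points back on the candlestick chart and check by eye what pattern each one really is and what trend it occurred in; only then can the conclusions be reviewed and discussed.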

6.2 Minimal necessary background

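plot(highlight=...) can highlight event points from a 1D bool array along the time axis. So for one primary_share we extract a 1D condition from signals, then plot the candlesticks with that condition highlighted.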
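Besides the chart, it often helps to print the event dates themselves so you can cross-check them against the highlighted candles. A minimal pandas sketch on a hand-made signal column (illustrative values); with real data you would use signals[primary_share]. The condition must have one entry per plotted date, and NaN is filled first because NaN != 0 is True in pandas.

```python
import numpy as np
import pandas as pd

# Toy single-share signal column (same index layout as signals[primary_share])
col = pd.Series(
    [0.0, 100.0, np.nan, 0.0, 100.0],
    index=pd.to_datetime(['2022-03-01', '2022-03-02', '2022-03-03', '2022-03-04', '2022-03-07']),
)

# 1D bool condition along the time axis; NaN counts as "no event"
event_1d = col.fillna(0.0).to_numpy() != 0.0
event_dates = col.index[event_1d]

assert event_1d.shape == (len(col),)   # must match the plotted time axis
print([d.strftime('%Y-%m-%d') for d in event_dates])
# -> ['2022-03-02', '2022-03-07']
```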

6.3 Runnable code + expected result

import numpy as np

primary_share = '000001.SZ'
event_1d = signals[primary_share].fillna(0.0).to_numpy() != 0.0  # NaN -> no event

fig = hp.plot(
    shares=[primary_share],
    interactive=True,
    highlight={'condition': event_1d, 'style': {'marker': 'x', 's': 60}},
)
fig

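Expected result: a series of highlighted points (the event dates) on the candlestick chart. This step matters because event-type research can live with bad results but not with results that cannot be explained. With the events marked on the chart, you can see directly where they occur (at the end of a decline, in the middle of an uptrend, or as noise in a range) and decide whether the next step should add a trend filter or a volatility filter.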
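If the review suggests, for example, that hammers only make sense near the end of a decline, one hypothetical way to encode that is an extra (M, L) condition combined with the event window by logical AND. The sketch below uses "close below its n-day moving average" as a crude downtrend proxy; both the proxy and the parameter n are our assumptions, not part of the method described above.

```python
import numpy as np
import pandas as pd


def below_ma_ml(close_lm: pd.DataFrame, n: int) -> np.ndarray:
    """(M, L) bool: close < n-day moving average (False during the warm-up)."""
    ma = close_lm.rolling(window=n, min_periods=n).mean()
    return (close_lm < ma).to_numpy().T   # comparisons with NaN yield False


# Toy data: 2 shares x 6 days in the (L, M) layout used by signals
close_lm = pd.DataFrame(
    {'AAA': [10, 9, 8, 7, 6, 5], 'BBB': [5, 6, 7, 8, 9, 10]}, dtype=float
)
window_ml = np.ones((2, 6), dtype=bool)        # pretend every day is in-window
filtered_ml = window_ml & below_ma_ml(close_lm, 3)
print(filtered_ml.astype(int))
```

The filtered result keeps the same (M, L) bool shape as long_window_ml, so it can go through the same where -> portfolio steps.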

6.9. 7. Complete code (single-function runnable version)

Below is a complete single-function version that you can paste into a Notebook and run in one go. It covers:

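extracting event signals with candle_pattern;
K=5 windowing (any over the last 5 days);
the where -> portfolio -> cum_return -> CAGR research loop;
highlighting event points on a single stock's candlesticks (for reviewable explanation).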

import numpy as np
import pandas as pd
import qteasy as qt


def demo_event_pattern(
        shares: list,
        benchmark: str = '000300.SH',
        pattern_name: str = 'cdlhammer',
        k: int = 5,
        start: str = '20220101',
        end: str = '20221231',
        primary_share: str = '000001.SZ',
):
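    """Demonstrate the event-type factor research loop: pattern signal -> windowing -> portfolio curve -> CAGR -> candlestick highlight.

    Parameters
    ----------
    shares : list
        Stock pool (must include benchmark; 3 to 30 shares are enough to demonstrate cross-sectional screening).
    benchmark : str, default '000300.SH'
        Benchmark index code.
    pattern_name : str, default 'cdlhammer'
        Pattern name (ta-lib style name).
    k : int, default 5
        Event window length: an event counts as active if it occurred within the last k days.
    start : str, default '20220101'
        Start date (YYYYMMDD).
    end : str, default '20221231'
        End date (YYYYMMDD).
    primary_share : str, default '000001.SZ'
        Single stock used for the highlight explanation.

    Returns
    -------
    dict
        Result objects including hp/signals/pf_long/pf_short/summary/fig_pf/fig_one.
    """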
    if benchmark not in shares:
        raise ValueError('benchmark must be included in shares')

    hp = qt.get_kline(
        shares=shares,
        start=start,
        end=end,
        freq='D',
        as_panel=True,
    )
    required = {'open', 'high', 'low', 'close'}
    missing = sorted(required - set(hp.htypes))
    if missing:
        raise ValueError(f'Missing required OHLC columns in htypes: {missing}')
    if hp.shape[1] < 50:
        raise ValueError(
            'Not enough data points loaded (too few hdates). '
            'Please check your local datasource and date range.'
        )
    if primary_share not in hp.shares:
        raise ValueError(f'primary_share "{primary_share}" not found in shares')

    # 1) Pattern signals: DataFrame (L, M)
    signals = hp.candle_pattern(name=pattern_name, as_panel=False)
    print('\n[signals]')
    print('  shape:', signals.shape)
    print('  nonzero tail:')
    print(signals.where(signals != 0.0).dropna(how='all').tail())

    # 2) Events -> bool -> windowed any-in-last-k (yields (M, L))
    sig_ml = signals.to_numpy().T  # (M, L)
    long_events_ml = sig_ml > 0.0
    short_events_ml = sig_ml < 0.0

    def any_in_last_k(events_ml: np.ndarray, kk: int) -> np.ndarray:
        m, l = events_ml.shape
        out = np.zeros((m, l), dtype=bool)
        for t in range(l):
            left = max(0, t - kk + 1)
            out[:, t] = np.any(events_ml[:, left:t + 1], axis=1)
        return out

    long_window_ml = any_in_last_k(long_events_ml, k)
    short_window_ml = any_in_last_k(short_events_ml, k)

    selected_count_by_day = long_window_ml.sum(axis=0)
    print('\n[Selection count stats]')
    print('  min/max:', int(selected_count_by_day.min()), int(selected_count_by_day.max()))
    print('  mean:', float(selected_count_by_day.mean()))
    print('  p10/p50/p90:', np.quantile(selected_count_by_day.astype(float), [0.1, 0.5, 0.9]))

    mask_long = hp.where(long_window_ml)
    mask_short = hp.where(short_window_ml)

    # 3) Portfolio aggregation + benchmark
    pf_long = hp.portfolio(
        htypes='close',
        mode='equal',
        mask=mask_long,
        benchmark=benchmark,
        benchmark_output='tag_along',
        new_share_name='EVENT_LONG',
    )
    pf_short = hp.portfolio(
        htypes='close',
        mode='equal',
        mask=mask_short,
        benchmark=benchmark,
        benchmark_output='tag_along',
        new_share_name='EVENT_SHORT',
    )

    # 4) cum_return -> CAGR summary
    def _years_between(hdates) -> float:
        idx = pd.DatetimeIndex(hdates)
        days = (idx[-1] - idx[0]).days
        return max(1e-9, days / 365.25)

    def _cagr_from_cumret(cumret_end: float, years: float) -> float:
        return (1.0 + cumret_end) ** (1.0 / years) - 1.0

    years = _years_between(pf_long.hdates)
    cr_long = pf_long.cum_return(htypes='close', method='simple')
    cr_short = pf_short.cum_return(htypes='close', method='simple')

    cumret_long_end = float(cr_long.values[cr_long.shares.index('EVENT_LONG'), -1, 0])
    cumret_short_end = float(cr_short.values[cr_short.shares.index('EVENT_SHORT'), -1, 0])
    cumret_bm_end = float(cr_long.values[cr_long.shares.index(benchmark), -1, 0])

    summary = pd.DataFrame(
        {
            'cum_return_end': [cumret_long_end, cumret_short_end, cumret_bm_end],
            'CAGR': [
                _cagr_from_cumret(cumret_long_end, years),
                _cagr_from_cumret(cumret_short_end, years),
                _cagr_from_cumret(cumret_bm_end, years),
            ],
        },
        index=['EVENT_LONG', 'EVENT_SHORT', benchmark],
    )
    print('\n[CAGR summary]')
    print(summary)

    # 5) Chart: portfolio comparison (normalized for easier reading)
    fig_pf = pf_long.normalize(htypes='close', base_index=0).plot(interactive=True)

    # 6) Chart: single-stock event highlight (1D time-axis bool; NaN -> no event)
    event_1d = signals[primary_share].fillna(0.0).to_numpy() != 0.0
    lookback = min(200, len(event_1d))
    fig_one = hp.loc[-lookback:].plot(
        shares=[primary_share],
        interactive=True,
        highlight={'condition': event_1d[-lookback:], 'style': {'marker': 'x', 's': 60}},
    )
    return {
        'hp': hp,
        'signals': signals,
        'pf_long': pf_long,
        'pf_short': pf_short,
        'summary': summary,
        'fig_pf': fig_pf,
        'fig_one': fig_one,
    }


res = demo_event_pattern(
    shares=['000001.SZ', '600519.SH', '300750.SZ', '000300.SH'],
    benchmark='000300.SH',
    pattern_name='cdlhammer',
    k=5,
    start='20220101',
    end='20221231',
    primary_share='000001.SZ',
)
res['fig_one']

6.10. 8. Summary and limits

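We have now run the full event-type factor research loop: pattern signal -> windowing -> cross-sectional screening -> portfolio curve -> CAGR -> candlestick highlight explanation.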

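To stress it again: this pipeline is a coarse, research-oriented aggregation and carries no trade-execution semantics. If you want to turn event signals into a strategy that can be backtested realistically, convert the event-window screening rule into strategy signals and let the Operator/Backtester handle the trading-level details (costs, settlement, order constraints and so on).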
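The signal format expected by Operator/Backtester depends on the strategy configuration, which this article does not cover. As a neutral first step, the sketch below turns an (M, L) selection matrix into per-day equal target weights: each selected share gets 1/count and days with an empty basket get all zeros. The helper name and the weight convention are assumptions for illustration, not qteasy's signal format.

```python
import numpy as np


def equal_weights_from_mask(window_ml: np.ndarray) -> np.ndarray:
    """(M, L) bool selection -> (M, L) float weights; each day's column sums to 1 or 0."""
    sel = np.asarray(window_ml, dtype=bool)
    counts = sel.sum(axis=0, keepdims=True)           # selected shares per day
    with np.errstate(divide='ignore', invalid='ignore'):
        weights = np.where(sel, 1.0 / counts, 0.0)    # 1/count for selected shares only
    return weights


mask = np.array([[True, False, True], [True, False, False]])
w = equal_weights_from_mask(mask)
print(w.T)   # one row per day
```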

6.11. Appendix: figure index (we suggest generating or screenshotting these in a Notebook)

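Figure	Suggested position	What you will see
`img/3.3_minimal_run.png`	§0	Minimal run: compute signals + draw the basic chart
`img/3.3_pf_compare.png`	§0.5 or §5	Portfolio curve of the event-window basket vs benchmark
`img/3.3_highlight_pattern.png`	§6	Single-stock candlestick chart with event dates highlighted (reviewable explanation)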