Multiplicity Control in Oncology Trials
Definition
Multiplicity arises whenever a clinical trial tests more than one hypothesis and the trial may be declared positive if any test reaches significance. Without adjustment, the probability of at least one false-positive finding (the family-wise error rate, FWER) exceeds the nominal alpha level.
"Failure to account for multiplicity when there are several clinical endpoints evaluated in a study can lead to false conclusions regarding the effects of the drug. The regulatory concern regarding multiplicity arises principally in the evaluation of clinical trials intended to demonstrate effectiveness and support drug approval." — FDA Multiple Endpoints Guidance (Final, January 2017), §III
Sources of multiplicity in oncology trials:
| Source | Example | Typical magnitude |
|---|---|---|
| Multiple endpoints | PFS + OS + ORR + PRO | 2–6 hypotheses |
| Multiple populations | ITT + biomarker-positive subgroup | 2–3 populations |
| Multiple treatment arms / doses | High dose vs. low dose vs. control | 2–4 comparisons |
| Interim analyses | 1–3 interim looks for efficacy | 2–4 analyses |
| Multiple time points | 12-month vs. 24-month landmark | Rarely >2 |
FWER vs. FDR:
- FWER (Family-Wise Error Rate): Probability of at least one false rejection among all tested hypotheses. FDA requires strong FWER control at alpha = 0.05 (two-sided) or 0.025 (one-sided) for confirmatory trials. This is the standard for Phase 3 oncology submissions.
- FDR (False Discovery Rate): Expected proportion of false rejections among all rejections. More liberal than FWER; appropriate for exploratory biomarker screening (e.g., testing 200 genes for differential expression) but not acceptable for confirmatory efficacy endpoints in registrational trials.
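The FWER inflation described above follows directly from independence of tests; a minimal Python sketch (function name is illustrative, not from the guidance):

```python
# A minimal sketch (not from the guidance) of how the family-wise error
# rate grows with k independent tests, each run at unadjusted alpha = 0.05.
def fwer_unadjusted(k, alpha=0.05):
    # P(at least one false positive among k independent tests)
    return 1 - (1 - alpha) ** k

for k in (1, 2, 4, 6):
    print(f"k = {k}: FWER = {fwer_unadjusted(k):.3f}")
```

With six unadjusted endpoints the chance of at least one false-positive claim already exceeds 26%, which is why confirmatory trials pre-specify an adjustment.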
"An important principle for controlling multiplicity is to prospectively specify all planned endpoints, time points, analysis populations, and analyses." — FDA Multiple Endpoints Guidance (Final, January 2017), §III
Regulatory Position
FDA requires strong FWER control for all confirmatory claims:
- Primary endpoints supporting the efficacy claim for approval
- Key secondary endpoints intended for labeling with inferential language
- Co-primary endpoints (both must be significant, or alpha must be split)
- Pre-specified subgroup analyses used for labeling (e.g., biomarker-positive population)
- Interim analyses that can lead to early stopping for efficacy
FDA does not require multiplicity adjustment for:
- Exploratory endpoints (descriptive only; p-values should not be presented as confirmatory)
- Sensitivity analyses labeled as "supportive"
- Safety analyses (unless pre-specified safety endpoints in dedicated safety studies)
- Post hoc subgroup analyses (hypothesis-generating only)
"Although post hoc analyses of trials that fail on their prospectively specified endpoints may be useful for generating hypotheses for future testing, they do not yield definitive results... post hoc analyses by themselves cannot establish effectiveness." — FDA Multiple Endpoints Guidance (Final, January 2017)
ICH E8(R1) reinforces that confirmatory studies must evaluate clinical endpoints relevant to disease burden with pre-specified statistical methods to ensure scientific rigor. (ICH E8(R1), adopted October 2021, Final guidance)
Source: FDA Multiple Endpoints Guidance (January 2017) = Final; ICH E8(R1) General Considerations for Clinical Studies (October 2021) = Final
Methods for Multiplicity Control
1. Bonferroni Method (Single-Step)
The simplest and most conservative approach. Divide alpha equally across k hypotheses: each tested at alpha/k.
Procedure:
For k hypotheses at overall alpha = 0.05:
Reject H_i if p_i <= alpha/k
Example (k = 2, co-primary PFS + OS):
Test PFS at alpha = 0.025
Test OS at alpha = 0.025
Both must achieve p < 0.025 for their respective claims
Properties:
- Strongly controls FWER under any dependence structure
- Conservative when endpoints are positively correlated (PFS and OS typically rho = 0.4-0.6)
- Power loss: each endpoint tested at reduced alpha
- Unequal splits allowed (e.g., alpha_1 = 0.04, alpha_2 = 0.01) if clinically justified
R implementation:
p_values <- c(0.018, 0.032)
p.adjust(p_values, method = "bonferroni")
# [1] 0.036 0.064 → PFS significant, OS not significant
2. Holm Procedure (Step-Down)
Less conservative than Bonferroni and uniformly more powerful; recommended by FDA as a default improvement.
Procedure:
- Order p-values from smallest to largest: p_(1) <= p_(2) <= ... <= p_(k)
- Reject H_(1) if p_(1) <= alpha/k
- Reject H_(2) if p_(2) <= alpha/(k-1)
- Continue until first non-rejection; retain all subsequent hypotheses
Example (3 endpoints, alpha = 0.05):
p-values: p_PFS = 0.008, p_OS = 0.022, p_ORR = 0.045
Sorted: 0.008 <= 0.022 <= 0.045
Step 1: p_(1) = 0.008 <= 0.05/3 = 0.0167? YES -> Reject H_PFS
Step 2: p_(2) = 0.022 <= 0.05/2 = 0.025? YES -> Reject H_OS
Step 3: p_(3) = 0.045 <= 0.05/1 = 0.05? YES -> Reject H_ORR
Conclusion: All three endpoints claimed
(Bonferroni would have failed OS: 0.022 > 0.0167)
Properties:
- Strongly controls FWER under any dependence structure (no assumptions needed)
- Always at least as powerful as Bonferroni; often strictly more powerful
- Step-down: starts with the most significant hypothesis
- Recommended by FDA as a default improvement over Bonferroni
R implementation:
p_values <- c(PFS = 0.008, OS = 0.022, ORR = 0.045)
p.adjust(p_values, method = "holm")
# [1] 0.024 0.044 0.045 → all significant at alpha = 0.05
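The step-down logic behind `p.adjust(method = "holm")` can be reproduced from scratch; a minimal Python sketch (function name is illustrative) that mirrors the adjusted-p form, where adjusted p_(i) is the running maximum of (k - rank) * p_(rank):

```python
# Sketch of the Holm step-down adjustment, mirroring R's
# p.adjust(p, method = "holm"): sort ascending, multiply the i-th
# smallest p-value by (k - i + 1), and enforce monotonicity.
def holm_adjust(pvals):
    k = len(pvals)
    order = sorted(range(k), key=lambda i: pvals[i])  # ascending ranks
    adjusted = [0.0] * k
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (k - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

# PFS, OS, ORR from the worked example above
print(holm_adjust([0.008, 0.022, 0.045]))
```

The output matches the R result: all three adjusted p-values fall at or below 0.05.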
3. Hochberg Procedure (Step-Up)
More powerful than Holm when hypotheses are independent or positively correlated (common for PFS/OS in oncology).
Procedure:
- Order p-values from largest to smallest: p_(k) >= ... >= p_(1)
- If p_(k) <= alpha, reject all k hypotheses
- Otherwise, check p_(k-1) <= alpha/2; if yes, reject H_(1) through H_(k-1)
- Continue toward smaller p-values, comparing p_(k-m) against alpha/(m+1), until a rejection occurs (which rejects that hypothesis and all with smaller p-values) or all hypotheses are retained
Example (PFS and OS, alpha = 0.05):
p_PFS = 0.02, p_OS = 0.03
Sorted (descending): p_(2) = 0.03, p_(1) = 0.02
Step 1: p_(2) = 0.03 <= 0.05? YES -> Reject BOTH H_PFS and H_OS
Comparison:
Bonferroni: p_OS = 0.03 > 0.025 -> OS NOT significant
Holm: p_(2) = 0.03 > 0.025 -> OS NOT significant
Hochberg: p_(2) = 0.03 <= 0.05 -> OS SIGNIFICANT
Properties:
- More powerful than Holm for positively correlated endpoints
- Valid under independence or positive dependence (Simes' inequality condition)
- Caution: Not valid under arbitrary negative correlation (rare in oncology)
- Particularly useful for PFS + OS testing (strong positive correlation)
R implementation:
p_values <- c(PFS = 0.02, OS = 0.03)
p.adjust(p_values, method = "hochberg")
# [1] 0.03 0.03 → both significant at alpha = 0.05
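The step-up decision rule can also be traced by hand; a minimal Python sketch (function name is illustrative) scanning from the largest p-value downward:

```python
# Sketch of the Hochberg step-up decision rule: scan from the largest
# p-value; the first p_(i) <= alpha / (number of p-values at or above it)
# rejects that hypothesis and all hypotheses with smaller p-values.
def hochberg_reject(pvals, alpha=0.05):
    k = len(pvals)
    order = sorted(range(k), key=lambda i: pvals[i])  # ascending
    rejected = [False] * k
    for pos in range(k - 1, -1, -1):  # largest p-value first
        if pvals[order[pos]] <= alpha / (k - pos):
            for j in order[:pos + 1]:
                rejected[j] = True
            break
    return rejected

print(hochberg_reject([0.02, 0.03]))  # PFS, OS -> both rejected
```

This reproduces the comparison above: Hochberg claims both endpoints where Holm and Bonferroni would fail OS.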
4. Fixed-Sequence (Hierarchical) Testing
Test endpoints in a pre-specified clinically motivated order, each at the full alpha level. Proceed to the next hypothesis only if the current one is rejected.
Procedure:
H_1 -> H_2 -> H_3 -> ... (pre-specified order)
Test H_1 at alpha = 0.05
If rejected: test H_2 at alpha = 0.05
If rejected: test H_3 at alpha = 0.05
...
If NOT rejected: STOP. No subsequent claims.
Properties:
- No alpha penalty — full alpha available at each step
- Most power-efficient when the ordering matches expected significance
- Risk: if primary fails unexpectedly, entire inference chain is lost
- Strongly controls FWER regardless of correlation structure
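The gate logic above amounts to a short loop; a minimal Python sketch (function name and hierarchy are illustrative):

```python
# Sketch of fixed-sequence testing: each hypothesis in the pre-specified
# order is tested at the full alpha, but testing stops at the first failure.
def fixed_sequence(ordered_pvals, alpha=0.05):
    rejected = []
    for name, p in ordered_pvals:
        if p <= alpha:
            rejected.append(name)
        else:
            break  # gate closes; no further claims, however small later p-values
    return rejected

# Hypothetical NSCLC hierarchy: PFS -> OS -> ORR
print(fixed_sequence([("PFS", 0.01), ("OS", 0.06), ("ORR", 0.001)]))
# ORR is never tested because OS failed, despite its tiny p-value
```

The example illustrates the key risk listed above: a failure mid-chain forfeits all downstream claims.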
Common oncology hierarchies:
| Indication | Hierarchy |
|---|---|
| NSCLC targeted therapy | PFS -> OS -> ORR -> PRO |
| Colorectal / Pancreatic | OS -> PFS -> ORR |
| Breast adjuvant | iDFS -> OS -> PRO |
| Myeloma maintenance | PFS -> OS -> MRD negativity rate |
5. Gatekeeping Strategies
Primary endpoint family "gates" the testing of secondary endpoints. Alpha from the primary family is recycled to secondary endpoints only after gating conditions are met.
Serial gatekeeping:
Family 1 (Primary): PFS [test at alpha = 0.025]
IF significant -> Gate opens ->
Family 2 (Secondary): OS, ORR [test with recycled alpha]
IF PFS not significant -> Gate closed; no secondary testing
Parallel gatekeeping (truncated Holm/Hochberg):
Family 1 (Primary): H_PFS_ITT, H_PFS_BIO [both tested; alpha recycled]
IF at least one significant -> Gate partially opens ->
Family 2 (Secondary): OS [receives recycled alpha from rejected primaries]
Truncated Holm procedure for gatekeeping:
- Reserve a fraction gamma of alpha from primary family for secondary family
- Primary tested using Holm at (1 - gamma) * alpha
- Secondary family receives gamma * alpha plus any alpha from rejected primary hypotheses
- FDA recommendation: use truncated Holm or Hochberg to preserve some alpha for secondary endpoints even when not all primaries are rejected
R implementation:
library(graphicalMCP)
# Define a serial gatekeeper
# Primary: PFS (weight = 1.0)
# Secondary: OS (weight = 0.0, receives alpha only if PFS rejected)
m <- matrix(c(0, 1, # PFS -> OS transition = 1.0
0, 0), # OS -> (no further transition)
nrow = 2, byrow = TRUE)
weights <- c(1, 0)
g <- graph_create(weights, m)
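The gate this graph encodes can be traced with a minimal Python sketch (function name and alpha split are illustrative, matching the serial structure above, not a general gatekeeping engine):

```python
# Sketch of the serial gatekeeper defined above: OS is tested at the full
# (one-sided) alpha only if PFS is first rejected; otherwise the gate stays
# closed and no secondary testing occurs.
def serial_gatekeeper(p_pfs, p_os, alpha=0.025):
    claims = []
    if p_pfs <= alpha:
        claims.append("PFS")
        if p_os <= alpha:  # OS inherits the full alpha once the gate opens
            claims.append("OS")
    return claims

print(serial_gatekeeper(0.010, 0.020))  # gate opens -> both claimed
print(serial_gatekeeper(0.030, 0.001))  # gate closed -> nothing claimed
```

The second call shows the cost of strict serial gating: a strong OS result is unusable when PFS misses, which motivates the truncated-Holm variants above.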
6. Fallback Method
A hybrid of fixed-sequence and alpha splitting. Allows secondary endpoints to be tested at reduced alpha even if the primary fails.
Procedure:
Allocate alpha: alpha_1 (primary) + alpha_2 (secondary) = alpha_total
Scenario A: H_1 significant (p < alpha_1)
-> Test H_2 at full alpha (alpha_1 + alpha_2 = alpha_total)
Scenario B: H_1 NOT significant (p >= alpha_1)
-> Test H_2 at alpha_2 only (reduced power, but still testable)
Oncology example (PFS primary, OS secondary):
alpha_total = 0.05 (two-sided)
alpha_PFS = 0.04, alpha_OS = 0.01
If PFS p = 0.035 < 0.04:
-> PFS significant; test OS at alpha = 0.05
-> If OS p = 0.03 < 0.05: OS significant
If PFS p = 0.06 >= 0.04:
-> PFS NOT significant; test OS at alpha = 0.01
-> If OS p = 0.008 < 0.01: OS significant (at reduced alpha)
-> Advantage: OS can still be claimed even though PFS failed
When to use: When the secondary endpoint (typically OS) may succeed independently and its claim is clinically important even without the primary. Common in IO combinations where PFS signals may be uncertain but OS benefit is expected.
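The two scenarios above reduce to a small amount of logic; a minimal Python sketch (function name is illustrative) using the same alpha split as the example:

```python
# Sketch of the fallback method from the example above: PFS tested at
# alpha_pfs; OS tested at the full alpha if PFS succeeds, otherwise at
# the reserved alpha_os.
def fallback(p_pfs, p_os, alpha_pfs=0.04, alpha_os=0.01):
    claims = []
    if p_pfs <= alpha_pfs:
        claims.append("PFS")
        os_level = alpha_pfs + alpha_os  # full alpha recycled to OS
    else:
        os_level = alpha_os  # OS still testable at the reserved fraction
    if p_os <= os_level:
        claims.append("OS")
    return claims

print(fallback(0.035, 0.030))  # Scenario A: both claimed
print(fallback(0.060, 0.008))  # Scenario B: OS claimed alone
```

Unlike fixed-sequence testing, the second call still yields an OS claim after a PFS failure, at the cost of a stricter OS threshold.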
7. Graphical Approach (Bretz et al.)
The graphical approach, developed by Bretz, Maurer, Brannath, and Posch (2009), generalizes all Bonferroni-based procedures into a single visual framework. It is FDA's recommended method for complex multiplicity structures.
Components:
- Nodes (vertices): Each node represents a hypothesis H_i with an initial weight w_i (fraction of alpha allocated). Weights must sum to 1: sum(w_i) = 1.
- Transition matrix G: A k x k matrix where g_ij is the fraction of alpha from H_i that transfers to H_j upon rejection of H_i. Row sums must equal 1 (or 0 if no transitions from that node). g_ii = 0 always.
- Weight vector w: Initial alpha allocation. Alpha for H_i = w_i * alpha_total.
Step-by-step algorithm:
INITIALIZATION:
Set alpha_i = w_i * alpha_total for each hypothesis i = 1, ..., k
Set transition matrix G = [g_ij]
ITERATION:
1. Select any hypothesis H_j with p_j <= alpha_j
(If none exists, STOP — no further rejections)
2. Reject H_j. Remove node j from the graph.
3. UPDATE remaining hypotheses:
For each remaining H_i (i != j):
alpha_i_new = alpha_i + alpha_j * g_ji
(H_i receives the fraction g_ji of H_j's alpha)
4. UPDATE transition matrix for remaining nodes:
For each pair (i, l) where i != j and l != j:
g_il_new = (g_il + g_ij * g_jl) / (1 - g_ij * g_ji)
if denominator = 0, set g_il_new = 0
5. Return to Step 1.
Example: Co-primary PFS + OS with secondary ORR (3-node graph):
Nodes and initial weights:
H_PFS: w = 0.5 (alpha = 0.025)
H_OS: w = 0.5 (alpha = 0.025)
H_ORR: w = 0.0 (alpha = 0.000)
Transition matrix G:
To_PFS To_OS To_ORR
From_PFS [ 0 0.5 0.5 ]
From_OS [ 0.5 0 0.5 ]
From_ORR [ 0.5 0.5 0 ]
Interpretation:
- If PFS rejected: half its alpha goes to OS, half to ORR
- If OS rejected: half its alpha goes to PFS, half to ORR
- ORR can only accumulate alpha from rejected primary endpoints
Scenario: p_PFS = 0.01, p_OS = 0.04, p_ORR = 0.02
Step 1: alpha_PFS = 0.025; p_PFS = 0.01 < 0.025 -> Reject H_PFS
Update: alpha_OS = 0.025 + 0.025*0.5 = 0.0375
alpha_ORR = 0.000 + 0.025*0.5 = 0.0125
Step 2: alpha_OS = 0.0375; p_OS = 0.04 > 0.0375 -> Cannot reject H_OS
alpha_ORR = 0.0125; p_ORR = 0.02 > 0.0125 -> Cannot reject H_ORR
Result: Only PFS claimed
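The iteration and update steps above can be implemented directly; a minimal Python sketch (function name is illustrative) run on the same 3-node graph and p-values:

```python
# Python sketch of the Bretz et al. sequentially rejective algorithm:
# reject any hypothesis with p <= its current alpha, pass its alpha along
# the graph's edges, update the transition weights, and repeat.
def graphical_test(weights, G, pvals, alpha=0.05):
    alphas = {h: w * alpha for h, w in weights.items()}
    G = {i: dict(row) for i, row in G.items()}  # local copy
    rejected = []
    while True:
        hit = next((h for h in alphas if pvals[h] <= alphas[h]), None)
        if hit is None:
            return rejected, alphas
        rejected.append(hit)
        a_hit, row_hit = alphas.pop(hit), G.pop(hit)
        for i in alphas:                    # alpha update step
            alphas[i] += a_hit * row_hit[i]
        for i in G:                         # transition update step
            g_ij = G[i].pop(hit)
            denom = 1 - g_ij * row_hit[i]
            G[i] = {l: (0.0 if l == i else
                        ((G[i][l] + g_ij * row_hit[l]) / denom
                         if denom > 0 else 0.0))
                    for l in G[i]}

weights = {"PFS": 0.5, "OS": 0.5, "ORR": 0.0}
G = {"PFS": {"PFS": 0.0, "OS": 0.5, "ORR": 0.5},
     "OS":  {"PFS": 0.5, "OS": 0.0, "ORR": 0.5},
     "ORR": {"PFS": 0.5, "OS": 0.5, "ORR": 0.0}}
pvals = {"PFS": 0.01, "OS": 0.04, "ORR": 0.02}
rejected, remaining = graphical_test(weights, G, pvals)
# Only PFS is rejected; OS and ORR end at alpha 0.0375 and 0.0125,
# matching the hand-worked scenario above.
print(rejected, remaining)
```

Stepping through the code reproduces the worked result exactly, which is a useful independent check on any SAP-specified graph.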
Key properties:
- Subsumes Bonferroni, Holm, fixed-sequence, fallback, and gatekeeping as special cases
- Maintains strong FWER control under any dependence structure (Bonferroni-based)
- Visually intuitive — facilitates communication with clinical teams, regulators, and DMCs
- Iterative algorithm is straightforward to implement and verify
- Can incorporate parametric extensions for additional power (weighted Simes, Dunnett)
Common Oncology Scenarios
Scenario 1: Co-Primary OS + PFS
Used when regulatory approval requires demonstration on both time-to-event endpoints.
Graph: 2-node loop
H_OS (w = 0.5) <---> H_PFS (w = 0.5)
Transition: g_OS->PFS = 1.0, g_PFS->OS = 1.0
Testing:
- Each starts at alpha = 0.025
- If either rejected, full alpha (0.05) passes to the other
- Equivalent to Holm procedure for 2 hypotheses
Power considerations:
- Individual power 90% each, rho(PFS,OS) = 0.5
- Joint power: ~83-85%
- Sample size: ~15-20% inflation over single-primary design
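The joint-power figure can be checked by simulation; a minimal Python sketch (illustrative, not from the source) of two z-statistics with 90% marginal power each at one-sided alpha = 0.025, correlated via a shared standard-normal component:

```python
# Monte Carlo sketch of co-primary joint power under correlation rho.
import math
import random

def joint_power(rho=0.5, n_sim=200_000, seed=1):
    random.seed(seed)
    z_alpha = 1.95996           # one-sided 0.025 critical value
    delta = z_alpha + 1.28155   # shift giving 90% marginal power
    hits = 0
    for _ in range(n_sim):
        shared = random.gauss(0.0, 1.0)   # induces corr(z1, z2) = rho
        z1 = delta + math.sqrt(rho) * shared + math.sqrt(1 - rho) * random.gauss(0.0, 1.0)
        z2 = delta + math.sqrt(rho) * shared + math.sqrt(1 - rho) * random.gauss(0.0, 1.0)
        hits += (z1 > z_alpha) and (z2 > z_alpha)
    return hits / n_sim

print(round(joint_power(), 3))
```

At rho = 0.5 the simulated joint power lands near the ~83-85% range quoted above, versus 81% (0.9^2) under independence.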
Examples: IO combination trials (pembrolizumab + chemotherapy in NSCLC); some myeloma trials requiring both PFS and OS.
Scenario 2: Biomarker Subgroup Fallback
Testing treatment effect in both ITT and biomarker-positive subgroup, with potential for alpha recycling.
Graph: 3-node with recycling to OS
  H_BIO (w = 0.5) <---> H_ITT (w = 0.5)   (each passes half of its alpha to the other upon rejection)
  H_OS  (w = 0.0)  <---  receives the remaining half from both H_BIO and H_ITT
Transition matrix:
To_BIO To_ITT To_OS
From_BIO [ 0 0.5 0.5 ]
From_ITT [ 0.5 0 0.5 ]
From_OS [ 0.5 0.5 0 ]
Strategy:
1. Test PFS in biomarker+ (alpha = 0.025) and PFS in ITT (alpha = 0.025)
2. If biomarker+ significant: recycle alpha to ITT and OS
3. If ITT significant: recycle alpha to biomarker+ and OS
4. OS testable only after at least one PFS hypothesis rejected
Examples: KEYNOTE-024/189 (pembrolizumab, PD-L1 stratified); CheckMate-227 (nivolumab + ipilimumab, TMB subgroup).
Scenario 3: Multiple Doses (Dunnett-Type Comparisons)
Testing multiple dose levels vs. control in dose-finding confirmatory trials.
Graph: 2-node loop (two dose-vs-control comparisons)
  H_high (w = 0.5) <---> H_low (w = 0.5)
  Each node: treatment vs. control comparison
  Transition: g_high->low = 1.0, g_low->high = 1.0
Dunnett test alternative:
Exploit known correlation structure (shared control arm)
rho = n_trt / (n_trt + n_ctrl) for two equally sized treatment arms (0.5 when balanced)
Dunnett-adjusted critical values are less conservative
than Bonferroni because they account for the correlation
from the shared control group.
R implementation:
library(multcomp)
# Dunnett's test for multiple doses vs. control
fit <- aov(outcome ~ dose_group, data = trial_data)
dunnett <- glht(fit, linfct = mcp(dose_group = "Dunnett"))
summary(dunnett) # Adjusted p-values accounting for correlation
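The gain over Bonferroni can be quantified without special tables. A Python sketch (illustrative, stdlib only) for the large-sample two-arm case: under H0 with a shared control and balanced arms, z_i = (t_i - u) / sqrt(2) with t_1, t_2, u iid standard normal, so P(z1 <= c, z2 <= c) = E_u[Phi(c*sqrt(2) + u)^2], and the Dunnett critical value solves this equal to 0.975 (one-sided FWER 0.025):

```python
# Numeric sketch: Dunnett vs. Bonferroni critical value for 2 dose arms
# vs. a shared control (balanced design, rho = 0.5), one-sided FWER 0.025.
import math

def phi(x):  # standard normal CDF
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def p_both_below(c, steps=4000, lim=8.0):
    # trapezoidal integration over the shared control component u
    h = 2 * lim / steps
    total = 0.0
    for i in range(steps + 1):
        u = -lim + i * h
        w = 0.5 if i in (0, steps) else 1.0
        dens = math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
        total += w * dens * phi(c * math.sqrt(2) + u) ** 2
    return total * h

lo, hi = 1.5, 3.0
for _ in range(60):  # bisect for P(both z <= c) = 0.975
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if p_both_below(mid) < 0.975 else (lo, mid)
c_dunnett = (lo + hi) / 2
c_bonferroni = 2.2414  # Phi^{-1}(1 - 0.0125)
print(round(c_dunnett, 3), "vs Bonferroni", c_bonferroni)
```

The Dunnett critical value comes out below the Bonferroni value, illustrating the power gained by exploiting the shared-control correlation.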
Examples: Dose-ranging Phase 2/3 trials in oncology; adaptive dose selection with multiplicity control.
Scenario 4: Hierarchical PFS -> OS with Interim Analyses
The most common oncology multiplicity structure: fixed-sequence primary PFS -> key secondary OS, with group sequential interim analyses.
Structure:
PFS tested at interim(s) and final using alpha spending function
OS tested only after PFS is significant (fixed-sequence gate)
Alpha management:
Total alpha = 0.05 (two-sided) = 0.025 (one-sided)
PFS alpha spending: O'Brien-Fleming via Lan-DeMets
Interim 1 (50% info): boundary alpha ~ 0.003
Interim 2 (75% info): boundary alpha ~ 0.018
Final (100% info): boundary alpha ~ 0.043 (adjusted)
OS: Full alpha = 0.025 available only after PFS is rejected
OS may also have its own interim analyses with separate spending
Interaction:
- Alpha spent at PFS interims reduces PFS final boundary
- But does NOT reduce OS alpha (OS uses full 0.025 after PFS gate opens)
- OS interim analyses have their own spending function
R implementation:
library(gsDesign)
# PFS group sequential design
pfs_design <- gsDesign(
k = 3, # 2 interims + final
test.type = 2, # Two-sided symmetric
alpha = 0.025, # One-sided
beta = 0.10, # 90% power
sfu = sfLDOF # Lan-DeMets O'Brien-Fleming spending
)
pfs_design$upper$bound # Efficacy boundaries (z-scale)
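The Lan-DeMets O'Brien-Fleming spending function used above has a closed form, alpha(t) = 2 - 2*Phi(z_{alpha/2} / sqrt(t)); a stdlib Python sketch (function names are illustrative) computing the cumulative alpha spent at the three information fractions:

```python
# Sketch of the Lan-DeMets O'Brien-Fleming alpha-spending function for
# one-sided alpha = 0.025: very little alpha is spent early, with nearly
# the full alpha preserved for the final analysis.
import math

def phi(x):  # standard normal CDF
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def phi_inv(p, lo=-10.0, hi=10.0):
    for _ in range(80):  # bisection, ample precision
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if phi(mid) < p else (lo, mid)
    return (lo + hi) / 2

def ld_obf(t, alpha=0.025):
    z = phi_inv(1 - alpha / 2)
    return 2 - 2 * phi(z / math.sqrt(t))

for t in (0.5, 0.75, 1.0):
    print(f"info {t:.0%}: cumulative alpha spent = {ld_obf(t):.4f}")
```

Note these are cumulative spending values, not the nominal boundary significance levels quoted above; gsDesign converts spending into boundaries accounting for the correlation between analyses.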
Graphical Approach: Detailed Mechanics
Weight Matrix
The weight vector w = (w_1, w_2, ..., w_k) specifies the initial alpha allocation:
alpha_i = w_i * alpha_total
Constraints:
- 0 <= w_i <= 1 for all i
- sum(w_i) = 1
- Nodes with w_i = 0 start with no alpha (must receive alpha from other rejections)
Choosing weights:
- Equal weights (w_i = 1/k) for equally important hypotheses
- Asymmetric weights when clinical priority differs: allocate more alpha to the hypothesis most likely to succeed or most clinically important
- Zero-weight nodes for secondary/exploratory hypotheses that can only be tested after primary rejections
Transition Matrix
The transition matrix G is a k x k matrix governing alpha redistribution:
G = [g_ij] where g_ij = fraction of alpha_i passed to H_j when H_i is rejected
Constraints:
- 0 <= g_ij <= 1
- g_ii = 0 (no self-loops)
- sum_j(g_ij) <= 1 for each row (usually = 1; <1 means alpha is "lost")
Common patterns:
| Pattern | Transition structure | Use case |
|---|---|---|
| Fixed-sequence | g_12 = 1, all others 0 | PFS -> OS hierarchy |
| Full loop | g_12 = g_21 = 1 | Co-primary with recycling |
| Fallback | g_12 = 1, g_21 = epsilon | Primary -> secondary with fallback |
| Star | g_1j = 1/(k-1) for all j != 1 | Primary distributes equally |
| Gatekeeping | Block diagonal with cross-family edges | Primary family gates secondary family |
Graph Update Algorithm (Formal)
When hypothesis H_j is rejected, the graph is updated as follows:
For all remaining hypotheses i (i != j):
1. Update alpha:
alpha_i := alpha_i + g_ji * alpha_j
2. Update transitions (for all remaining l != j, l != i):
g_il := (g_il + g_ij * g_jl) / (1 - g_ij * g_ji)
Special case: if g_ij * g_ji = 1, set g_il = 0
3. Remove node j and all edges to/from j
4. Row sums of the updated transition matrix remain <= 1; no renormalization step is required
This update ensures that alpha flowing through removed nodes is preserved. The denominator (1 - g_ij * g_ji) accounts for potential loops between nodes i and j.
R Packages
gMCP (Classical Implementation)
library(gMCP)
# Define a 3-hypothesis graph
hypotheses <- c("H_PFS" = 0.5, "H_OS" = 0.5, "H_ORR" = 0)
transitions <- rbind(
H_PFS = c(0, 0.5, 0.5),
H_OS = c(0.5, 0, 0.5),
H_ORR = c(0.5, 0.5, 0)
)
g <- matrix2graph(transitions, hypotheses)
# Test with observed p-values
pvalues <- c(H_PFS = 0.01, H_OS = 0.04, H_ORR = 0.02)
result <- gMCP(g, pvalues, alpha = 0.05)
result@rejected # Which hypotheses are rejected
graphicalMCP (Modern Interface)
library(graphicalMCP)
# Create graph
g <- graph_create(
hypotheses = c(0.5, 0.5, 0),
transitions = rbind(
c(0, 0.5, 0.5),
c(0.5, 0, 0.5),
c(0.5, 0.5, 0)
)
)
# Sequential (shortcut) test
graph_test_shortcut(
graph = g,
p = c(0.01, 0.04, 0.02),
alpha = 0.05
)
# Closure-based test (more powerful with parametric extensions)
graph_test_closure(
graph = g,
p = c(0.01, 0.04, 0.02),
alpha = 0.05
)
# Power simulation
graph_calculate_power(
graph = g,
alpha = 0.05,
sim_n = 1e5,
power_marginal = c(0.9, 0.8, 0.7)
)
multcomp (Dunnett and General Contrasts)
library(multcomp)
# Dunnett's procedure: multiple doses vs. shared control
fit <- lm(response ~ factor(arm), data = trial_data)
mc <- glht(fit, linfct = mcp(`factor(arm)` = "Dunnett"))
summary(mc) # Adjusted p-values
confint(mc) # Simultaneous confidence intervals
# Comparison of methods:
# Bonferroni: p.adjust(p, method = "bonferroni")
# Holm: p.adjust(p, method = "holm")
# Hochberg: p.adjust(p, method = "hochberg")
# Hommel: p.adjust(p, method = "hommel")
# BH (FDR): p.adjust(p, method = "BH") # exploratory only
SAP Template: Complete Graphical Multiplicity Procedure
9. MULTIPLICITY CONTROL
9.1 Overview
This trial tests [k] hypotheses across [describe: endpoints, populations, doses].
The overall Type I error rate is controlled at alpha = 0.05 (two-sided) using
the graphical approach of Bretz, Maurer, Brannath, and Posch (2009). Strong control of
the family-wise error rate (FWER) is maintained under the closed testing principle.
9.2 Hypotheses
The hypotheses to be tested are:
H_1: No treatment effect on [endpoint 1] in [population 1]
H_2: No treatment effect on [endpoint 2] in [population 1]
H_3: No treatment effect on [endpoint 1] in [population 2]
[...]
9.3 Initial Alpha Allocation (Weight Vector)
The initial weights and alpha allocation are:
H_1: w_1 = [X], alpha_1 = [X] * 0.05 = [value]
H_2: w_2 = [X], alpha_2 = [X] * 0.05 = [value]
H_3: w_3 = [X], alpha_3 = [X] * 0.05 = [value]
[...]
Total: sum(w_i) = 1.0
Justification: [Clinical rationale for the allocation, e.g., "H_1 (PFS in
biomarker+ population) receives the largest weight because this population
has the strongest biologic rationale for treatment benefit."]
9.4 Transition Matrix
The transition matrix governing alpha redistribution upon hypothesis rejection is:
H_1 H_2 H_3
H_1 [ 0 g_12 g_13 ]
H_2 [ g_21 0 g_23 ]
H_3 [ g_31 g_32 0 ]
where:
g_12 = [value]: Upon rejection of H_1, [fraction] of its alpha passes to H_2
g_13 = [value]: Upon rejection of H_1, [fraction] of its alpha passes to H_3
[...]
The graph is depicted in Figure [X].
9.5 Testing Algorithm
The graphical testing procedure proceeds as follows:
1. Test each hypothesis H_i at its currently allocated alpha_i.
2. If any H_j has p_j <= alpha_j, reject H_j.
3. Update the graph: redistribute alpha_j to remaining hypotheses
according to the transition matrix, using the update formulas
of Bretz et al. (2009).
4. Repeat until no further hypotheses can be rejected.
The algorithm is implemented using the graphicalMCP R package (version [X])
and independently verified using gMCP.
9.6 Interaction with Interim Analyses
Interim analyses for [endpoint] use the [Lan-DeMets O'Brien-Fleming / Hwang-Shih-DeCani]
alpha spending function. Alpha spent at interim analyses is deducted from the
hypothesis-specific alpha within the graphical procedure.
The interim analysis schedule, information fractions, and spending function
are specified in Section [X] of this SAP.
9.7 Decision Rules
The trial will be declared positive for H_i if and only if H_i is rejected
by the graphical testing procedure at the pre-specified overall alpha = 0.05
(two-sided). Any hypothesis not rejected by the procedure will be reported
as "not statistically significant" regardless of the unadjusted p-value.
Adjusted p-values consistent with the graphical procedure will be computed
using the sequential rejection algorithm and reported in the CSR.
Limitations and Pitfalls
1. Fixed-sequence inflexibility: If the primary endpoint fails unexpectedly (e.g., PFS HR = 0.82, p = 0.08), no formal inference is possible for secondary endpoints — even if OS HR = 0.72, p = 0.005. The fallback or graphical approach mitigates this risk.
2. Post hoc reordering invalidates FWER control: Changing endpoint ordering, alpha allocation, or graph structure after unblinding is not acceptable. FDA: "presenting p-values from descriptive analyses is inappropriate because doing so would imply a statistically rigorous conclusion."
3. Hochberg invalid under negative correlation: The Hochberg step-up procedure requires independence or positive dependence. Negatively correlated endpoints (rare in oncology, but possible with competing risk endpoints) require Holm or Bonferroni instead.
4. Co-primary power loss underestimated: Joint power for two co-primary endpoints with individual power of 85% each is approximately 72% (assuming independence) to 78% (assuming rho = 0.5). Sample size must account for this reduction.
5. Graphical complexity without FDA pre-agreement: Graphs with 4+ nodes and complex transition rules should be discussed with FDA (Type C meeting) before SAP finalization. Complex graphs may be misinterpreted by reviewers without advance discussion.
6. Alpha spending interaction with multiplicity: Interim analyses consume alpha within each hypothesis. If PFS has 3 interim looks using O'Brien-Fleming spending, the cumulative alpha spent at interims must be tracked within the graphical procedure. The remaining PFS alpha at the final analysis is reduced, but OS alpha (gated behind PFS) remains at full 0.025.
7. Subgroup multiplicity: Testing more than 3 pre-specified subgroups without multiplicity adjustment inflates FWER. Only the primary biomarker-defined subgroup should receive formal alpha allocation; additional subgroups are exploratory.
8. FDR used inappropriately in confirmatory settings: FDR (Benjamini-Hochberg) controls the expected proportion of false discoveries, not the probability of any false discovery. This is inappropriate for registrational endpoints where each individual claim must be controlled. FDR is appropriate only for exploratory biomarker analyses.
Backlinks
- Multiple Endpoints and Alpha Allocation
- Overall Survival (OS)
- Progression-Free Survival (PFS)
- Response-Based Endpoints (ORR, CR, DOR)
- DFS and EFS Endpoints
- Group Sequential Designs (GSD)
- ICH E9(R1) Estimand Framework
- Statistical Analysis Methods in Oncology Trials
Source: FDA Guidance for Industry — Multiple Endpoints in Clinical Trials (January 2017, Final); Bretz, Maurer, Brannath, Posch (2009), A Graphical Approach to Sequentially Rejective Multiple Test Procedures, Statistics in Medicine; ICH E8(R1) General Considerations for Clinical Studies (October 2021, Final)
Status: Final guidance (FDA Multiple Endpoints); Final guideline (ICH E8(R1))
Compiled from retrieved FDA chunks and literature on graphical multiplicity methods