Group Sequential Designs (GSD)
Definition
A group sequential design (GSD) is a clinical trial design that allows for pre-specified interim analyses with the possibility of early stopping for efficacy, futility, or safety, while controlling the overall Type I error rate at the nominal significance level.
"Group sequential designs allow trials to be monitored during conduct with prospectively planned stopping rules for efficacy, futility, or safety. Properly designed and monitored group sequential trials maintain the overall Type I error rate at the nominal level while providing opportunities for early termination if convincing evidence of benefit or harm accumulates." — FDA Guidance for Industry — Adaptive Designs for Clinical Trials of Drugs and Biologics (November 2019, Final)
Key distinction from fixed designs:
- Fixed-design trial: Single analysis at the end with one opportunity to reject the null hypothesis
- Group sequential: Multiple pre-specified interim analyses with stopping boundaries at each stage, allowing early stopping while preserving overall Type I error
Type I error preservation mechanism:
Alpha (significance level) is "spent" across interim and final analyses using a pre-specified alpha spending function (for example, O'Brien-Fleming-type, Pocock-type, or power-family functions within the Lan-DeMets framework). The boundaries at each stage are set so that the cumulative probability of a false positive across all stages equals α.
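The effect of spending alpha can be checked by simulation. The sketch below is illustrative only: it simulates the test statistic under the null hypothesis as a process with independent normal increments, and compares testing at 1.96 on every look with a set of approximate O'Brien-Fleming-type boundaries (2.963, 2.359, 2.014 at 50%, 75%, 100% information; these values and the function name `simulate_type1` are assumptions made for this example).

```python
import random
from math import sqrt


def simulate_type1(boundaries, fractions, n_sim=100_000, seed=1):
    """Estimate P(any Z_k >= boundary_k) under H0 by Monte Carlo."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        score, prev_t = 0.0, 0.0
        for t, bound in zip(fractions, boundaries):
            # Score statistic S(t) has independent N(0, t - prev_t) increments
            score += rng.gauss(0.0, sqrt(t - prev_t))
            prev_t = t
            if score / sqrt(t) >= bound:  # Z(t) = S(t) / sqrt(t)
                rejections += 1
                break
    return rejections / n_sim


fractions = [0.5, 0.75, 1.0]
naive = simulate_type1([1.96, 1.96, 1.96], fractions)
adjusted = simulate_type1([2.963, 2.359, 2.014], fractions)
print(f"repeated testing at 1.96: {naive:.4f}")
print(f"spending-based boundaries: {adjusted:.4f}")
```

Repeated testing at 1.96 should give an error rate well above 0.025, while the adjusted boundaries should stay close to it, up to simulation noise.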
Error Spending Framework
The error spending function is the mathematical foundation of GSD. It allocates the overall Type I error rate across interim and final analyses so that the cumulative error never exceeds α.
O'Brien-Fleming (OBF) Spending Function
Among the most conservative commonly used spending functions and a frequent default in oncology. Allocates minimal alpha to early interims and most of the alpha to the final analysis.
Formula (Lan-DeMets O'Brien-Fleming-type, one-sided α):
α(t) = 2 * (1 - Φ(z_{α/2} / √t))
where:
t = information fraction (0 < t ≤ 1)
z_{α/2} = upper α/2 quantile of the standard normal (z_0.0125 ≈ 2.241 for α = 0.025 one-sided)
Φ = standard normal cumulative distribution function
Check: α(1) = 2 * (1 - Φ(2.241)) = 0.025
Interpretation:
α(t) = cumulative alpha "spent" up to information fraction t
Boundary structure (two interims, α = 0.025 one-sided; approximate values, to be recomputed with gsDesign or rpact for the actual design):
| Analysis | Information Fraction | Cumulative α Spent | Z-Boundary | Nominal One-Sided p-Value Threshold | Interpretation |
|---|---|---|---|---|---|
| Interim 1 | 50% (t=0.5) | 0.0015 | 2.963 | < 0.0015 | Very restrictive; difficult to stop early |
| Interim 2 | 75% (t=0.75) | 0.0097 | 2.359 | < 0.0092 | Still restrictive |
| Final | 100% (t=1.0) | 0.0250 | 2.014 | < 0.0220 | Slightly stricter than the fixed-design 1.96 |
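Boundaries follow from the spending function by numerical integration. The sketch below is one possible implementation under stated assumptions, not the algorithm used by any particular package: it works on the score scale S = Z√t, where increments between looks are independent normals, and uses a simple trapezoid rule with bisection. The names `obf_spend` and `efficacy_bounds`, the grid size, and the lower truncation at -8 are all choices made for this example.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

NORMAL = NormalDist()


def obf_spend(t, alpha=0.025):
    """Cumulative one-sided alpha spent at fraction t (Lan-DeMets OBF-type)."""
    return 2.0 * (1.0 - NORMAL.cdf(NORMAL.inv_cdf(1.0 - alpha / 2.0) / sqrt(t)))


def efficacy_bounds(fractions, spend, lower=-8.0, n=600):
    """One-sided Z boundaries from a spending function (trapezoid recursion)."""
    cum = [spend(t) for t in fractions]
    bounds = [NORMAL.inv_cdf(1.0 - cum[0])]  # first look needs no integration
    t_prev = fractions[0]
    c_prev = bounds[0] * sqrt(t_prev)        # boundary on the score scale
    step = (c_prev - lower) / n
    grid = [lower + i * step for i in range(n + 1)]
    # Density of S at look 1, restricted to the continuation region S < c_prev
    dens = [exp(-s * s / (2.0 * t_prev)) / sqrt(2.0 * pi * t_prev) for s in grid]
    for k in range(1, len(fractions)):
        t = fractions[k]
        sd = sqrt(t - t_prev)                # sd of the independent increment
        target = cum[k] - cum[k - 1]         # alpha to spend at this look
        weights = [step] * (n + 1)
        weights[0] = weights[-1] = step / 2.0
        mass = [w * g for w, g in zip(weights, dens)]

        def cross_prob(c):
            # P(no earlier crossing and S_k >= c)
            return sum(m * (1.0 - NORMAL.cdf((c - s) / sd)) for m, s in zip(mass, grid))

        lo, hi = 0.0, 10.0
        for _ in range(50):                  # bisection: cross_prob decreases in c
            mid = (lo + hi) / 2.0
            if cross_prob(mid) > target:
                lo = mid
            else:
                hi = mid
        c = (lo + hi) / 2.0
        bounds.append(c / sqrt(t))
        if k < len(fractions) - 1:           # propagate density to the next look
            new_step = (c - lower) / n
            new_grid = [lower + i * new_step for i in range(n + 1)]
            norm = 1.0 / (sd * sqrt(2.0 * pi))
            dens = [
                norm * sum(m * exp(-((u - s) ** 2) / (2.0 * sd * sd)) for m, s in zip(mass, grid))
                for u in new_grid
            ]
            step, grid = new_step, new_grid
        t_prev = t
    return bounds


bounds = efficacy_bounds([0.5, 0.75, 1.0], obf_spend)
print([round(b, 3) for b in bounds])
```

The first boundary is simply the normal quantile of the alpha spent at the first look (about 2.96 here); later boundaries depend on the joint distribution across looks, which is why validated software such as gsDesign or rpact should be used for a real design.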
Properties:
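- Controls the overall Type I error at α; the boundaries rely on the known joint distribution of the sequential test statistics (independent increments)
- Minimal power loss from alpha spending (maximum sample size inflation of roughly 1–3%)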
- Unlikely to declare efficacy at interim unless treatment effect is very large
- Recommended when: early stopping for efficacy is unlikely but acceptable if overwhelming benefit appears
When to use:
- Oncology Phase 3 trials with large sample size (less sensitive to power loss)
- Trials where time/cost savings from early stopping are modest
- Conservative approach preferred (regulatory comfort)
Pocock Spending Function
More liberal than O'Brien-Fleming. Allocates approximately equal alpha to each analysis stage, enabling earlier stopping if evidence is strong.
Formula (one-sided):
α(t) = α_total * ln(1 + (e - 1) * t)
where e ≈ 2.71828 (mathematical constant)
At t = 0.5: α(t) = 0.025 * ln(1.859) ≈ 0.0155 (more alpha at interim than OBF)
At t = 1.0: α(t) = α_total
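Alpha allocation (two interims, α = 0.025 one-sided):
| Analysis | Information Fraction | Cumulative α Spent | Incremental α |
|---|---|---|---|
| Interim 1 | 50% | 0.0155 | 0.0155 |
| Interim 2 | 75% | 0.0207 | 0.0052 |
| Final | 100% | 0.0250 | 0.0043 |
The first-look boundary follows directly from the alpha spent: Z ≥ 2.157 (nominal p < 0.0155). Later boundaries depend on the joint distribution of the test statistics and should be computed with gsDesign or rpact; every nominal threshold, including the final one, is stricter than 0.025.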
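The Pocock-type spending formula is easy to evaluate directly. The sketch below computes cumulative and incremental alpha at the three looks and the first-look boundary; the function name `pocock_spend` is an assumption for this example, and later boundaries are deliberately left to dedicated software.

```python
from math import e, log
from statistics import NormalDist


def pocock_spend(t, alpha=0.025):
    """Cumulative alpha spent at information fraction t (Lan-DeMets Pocock-type)."""
    return alpha * log(1.0 + (e - 1.0) * t)


fractions = [0.5, 0.75, 1.0]
cumulative = [pocock_spend(t) for t in fractions]
# Alpha newly available at each look is the difference of cumulative values
incremental = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]
# Only the first boundary is a plain normal quantile of the alpha spent
first_bound = NormalDist().inv_cdf(1.0 - cumulative[0])

for t, cum, inc in zip(fractions, cumulative, incremental):
    print(f"t={t:.2f}  cumulative={cum:.4f}  incremental={inc:.4f}")
print(f"first-look Z boundary: {first_bound:.3f}")
```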
Properties:
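- More permissive at interim (easier to declare early efficacy)
- Stricter final threshold than 0.025: classical two-look Pocock boundaries use a constant Z = 2.178 at both looks (nominal one-sided p < 0.0147, i.e., two-sided p < 0.029)
- Larger maximum sample size: roughly 11% (two looks) to 17% (three looks) inflation for classical Pocock boundaries at 80% power
- Error control relies on the same joint-distribution assumptions as O'Brien-Fleming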
When to use:
- Trials where early stopping is clinically valuable and resource-saving
- Trials that can absorb a larger maximum sample size in exchange for a smaller expected sample size when the effect is large
- Endpoint expected to show large, rapid treatment effect (immunotherapy with quick OS benefit)
Lan-DeMets (Adaptive) Spending Function
A flexible framework that adapts to unequal information fractions. Mimics a target spending function (OBF, Pocock, etc.) while allowing flexible interim scheduling.
Use case:
Planned: Interim 1 at 50%, Interim 2 at 75%, Final at 100%
Actual: Interim 1 at 45%, Interim 2 at 68%, Final at 100%
(enrollment faster than projected)
Lan-DeMets recalculates boundaries to match the OBF spending curve
at the actual information fractions (0.45, 0.68, 1.0) instead of
planned (0.50, 0.75, 1.0).
Result: Statistical properties (Type I error, power) preserved despite
flexible interim timing.
Advantages:
- Allows interim scheduling to shift based on actual accrual
- Maintains intended spending function properties
- Removes pressure to analyze exactly at pre-planned information fractions
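The recalculation amounts to evaluating the same spending function at the observed information fractions. The sketch below compares the alpha available at the planned and actual looks under O'Brien-Fleming-type spending; the function name `obf_spend` is an assumption for this example, and only the first-look boundary is computed because it needs no numerical integration.

```python
from math import sqrt
from statistics import NormalDist

NORMAL = NormalDist()


def obf_spend(t, alpha=0.025):
    """Cumulative one-sided alpha spent at fraction t (Lan-DeMets OBF-type)."""
    z = NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * (1.0 - NORMAL.cdf(z / sqrt(t)))


planned = [0.50, 0.75, 1.00]
actual = [0.45, 0.68, 1.00]
planned_spend = [obf_spend(t) for t in planned]
actual_spend = [obf_spend(t) for t in actual]

# An earlier look has less alpha available, so its boundary is higher
first_bound_planned = NORMAL.inv_cdf(1.0 - planned_spend[0])
first_bound_actual = NORMAL.inv_cdf(1.0 - actual_spend[0])

for p, a, sp, sa in zip(planned, actual, planned_spend, actual_spend):
    print(f"planned t={p:.2f}: {sp:.5f}   actual t={a:.2f}: {sa:.5f}")
print(f"first boundary: planned {first_bound_planned:.3f}, actual {first_bound_actual:.3f}")
```

The total alpha at t = 1 is unchanged, which is what preserves the overall Type I error when the interim timing shifts.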
R implementation:
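library(gsDesign)
# Planned design: Lan-DeMets O'Brien-Fleming-type spending at 50%, 75%, 100%
design <- gsDesign(
  k = 3,                     # 2 interims + final
  test.type = 1,             # One-sided, efficacy boundary only
  alpha = 0.025,             # One-sided Type I error
  beta = 0.20,               # 80% power
  timing = c(0.5, 0.75, 1),  # Planned information fractions
  sfu = sfLDOF               # Lan-DeMets O'Brien-Fleming-type spending
)
# Actual information fractions differ from planned
actual_info <- c(0.45, 0.68, 1.0) # vs planned c(0.5, 0.75, 1.0)
# Recompute boundaries at the actual fractions
design_actual <- gsDesign(
  k = 3,
  test.type = 1,
  alpha = 0.025,
  beta = 0.20,
  timing = actual_info,      # Use actual fractions
  sfu = sfLDOF
)
design_actual$upper$bound
# In a live trial, the observed information can instead be supplied through
# n.I and maxn.IPlan so that the planned maximum stays fixed.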
Kim-DeMets (Power Family) Spending Function
A parametric family indexed by shape parameter ρ (rho) that ranges from Pocock-like (ρ = 1) to O'Brien-Fleming-like (ρ = 3) behavior.
Formula (one-sided):
α(t) = α_total * t^ρ
ρ = 1 → Linear in t (alpha spent in proportion to information; Pocock-like)
ρ = 1.5 → Compromise, closer to Pocock
ρ = 2 → Intermediate
ρ = 3 → O'Brien-Fleming-like (conservative)
Example: ρ = 1.5, α_total = 0.025, t = 0.5
α(0.5) = 0.025 * 0.5^1.5 ≈ 0.0088
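A short sketch of the power family makes the role of ρ concrete. It tabulates cumulative alpha at 50% and 75% information for several values of ρ; the function name `power_spend` is an assumption for this example.

```python
def power_spend(t, rho, alpha=0.025):
    """Cumulative alpha spent at information fraction t for the power family."""
    return alpha * t ** rho


for rho in (1.0, 1.5, 2.0, 3.0):
    half = power_spend(0.5, rho)
    three_quarters = power_spend(0.75, rho)
    print(f"rho={rho:.1f}  alpha(0.5)={half:.4f}  alpha(0.75)={three_quarters:.4f}")
```

Larger ρ pushes alpha toward the final analysis, which is why ρ near 3 behaves like O'Brien-Fleming-type spending and ρ = 1 like Pocock-type spending.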
When to use:
- Fine-tuning the balance between early stopping opportunity and power preservation
- Sensitivity analyses showing robustness to spending function choice
- Trials where "aggressiveness" of stopping rule should be tailored
Stopping Boundaries
Efficacy Boundaries (Upper)
A pre-specified threshold beyond which the trial is declared positive (reject the null hypothesis) and can stop early.
Interpretation:
- If the Z-statistic (test statistic from log-rank test, t-test, etc.) crosses the efficacy boundary at interim, the cumulative evidence is strong enough to declare the treatment effective
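- Crossing the boundary permits early stopping for efficacy; the DSMB weighs the totality of the data (safety, secondary endpoints, data maturity) before recommending it, unless the protocol makes the rule mandatory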
Oncology example (PFS primary, O'Brien-Fleming-type spending):
Target: HR = 0.70, α = 0.025 (one-sided)
Planned PFS events: 350 (roughly 91% power for HR = 0.70 by Schoenfeld's approximation; 80% power would need only about 247 events)
Interim 1 (50% = 175 events):
Efficacy boundary: Z ≥ 2.963 (p < 0.0015)
Interpretation: VERY strong evidence needed to stop early
If observed Z = 3.05: STOP FOR EFFICACY, declare treatment effective
If observed Z = 2.2: Continue to Interim 2
Interim 2 (75% = 262 events):
Efficacy boundary: Z ≥ 2.359 (p < 0.0092)
If observed Z = 2.40: STOP FOR EFFICACY
If observed Z = 1.90: Continue to Final
Final (100% = 350 events):
Efficacy boundary: Z ≥ 2.014 (p < 0.0220)
Slightly stricter than the fixed-design threshold; trial is positive if Z ≥ 2.014
Futility Boundaries (Lower)
A pre-specified threshold below which the trial is considered unlikely to succeed at the final analysis and may be stopped early.
Binding vs. Non-Binding Futility:
| Aspect | Binding Futility | Non-Binding Futility |
|---|---|---|
| Rule if boundary crossed | MUST stop | DSMB recommends stopping; sponsor can continue |
| Type I error | Preserved (properly adjusted boundary) | Preserved |
| Type II error (β) | Increased; cannot recover if boundary crossed prematurely | Protected; can continue if DSMB allows |
| Flexibility | None; trial terminates automatically | DSMB can override; allows clinical judgment |
| Oncology use | Less common; requires confidence in effect size assumption | More common; preferred for uncertainty |
Futility threshold (Conditional Power approach):
At interim, calculate Conditional Power (CP):
CP = Probability of rejecting H₀ at final analysis
given current interim data and assuming continued enrollment
Rule: If CP < threshold (e.g., 20%), futility boundary is crossed
Example (Interim 1, n=175 PFS events):
Observed HR = 0.88 (less favorable than assumed HR = 0.70)
Current Z = 0.85 (one-sided p = 0.20; not significant)
CP calculation:
"If true HR = 0.88 and we continue to 350 events,
probability of achieving p < 0.025 at final ≈ 14%"
Decision: CP (14%) < 20% threshold → Futility boundary crossed
DSMB recommendation: Consider stopping (may recommend or allow continuation)
R implementation:
library(gsDesign)
# Two-stage design with O'Brien-Fleming-type spending and binding futility
design <- gsDesign(
  k = 2,           # 1 interim + final
  test.type = 3,   # One-sided efficacy with binding futility (4 = non-binding)
  alpha = 0.025,
  beta = 0.20,
  sfu = sfLDOF,    # O'Brien-Fleming-type alpha spending for efficacy
  sfl = sfLDOF     # O'Brien-Fleming-type beta spending for futility
)
# Extract boundaries (Z-scale)
design$upper$bound # Efficacy (upper) boundaries
design$lower$bound # Futility (lower) boundaries
Information Fraction vs. Event-Driven Timing
Information Fraction (Recommended)
Define interim analyses based on the proportion of target information (events or sample size) accrued, not calendar time.
For time-to-event endpoints (OS, PFS, DFS):
Information = (actual events accrued) / (planned total events)
Example:
Planned total PFS events: 350
Interim 1: 50% information = 175 events
Schedule analysis when 175 PFS events documented
(no specific calendar date; depends on accrual/follow-up)
Interim 2: 75% information = 262 events
Final: 100% information = 350 events
Advantages:
- Timing is adaptive to event accrual rate (no delay if events accrue faster)
- Statistical properties (power, Type I error) are exact and valid
- Interim analyses benefit from proper follow-up accumulation
- Lan-DeMets can handle unequal information fractions seamlessly
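The bookkeeping for time-to-event information fractions can be sketched in a few lines. The helper names `event_triggers` and `information_fraction` are assumptions for this example; rounding up is one convention for choosing the trigger, and it gives 263 events at 75% where the text uses 262.

```python
from math import ceil


def event_triggers(total_events, fractions):
    """Event counts at which each analysis is triggered (rounded up)."""
    return [ceil(total_events * f) for f in fractions]


def information_fraction(observed_events, total_events):
    """Actual information fraction at a data cut."""
    return observed_events / total_events


triggers = event_triggers(350, [0.5, 0.75, 1.0])
observed = [175, 262, 350]  # events actually in the database at each cut
actual_fractions = [information_fraction(d, 350) for d in observed]

print("planned triggers:", triggers)
print("actual fractions:", [round(f, 4) for f in actual_fractions])
```

Because the data cut rarely lands exactly on the planned count, the actual fraction (0.7486 for 262 of 350 events) is what should be passed to the spending function.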
For binary endpoints (ORR, safety):
Information = (patients with evaluable outcome) / (planned total patients)
Example:
Planned evaluable patients: 200
Interim 1: 100 patients evaluable for response
Final: 200 patients evaluable for response
Event-Driven Timing (Alternative)
Specify interim analyses at fixed cumulative event counts rather than percentages.
Example (OS primary):
Interim 1: Schedule when 150 OS events have occurred
Interim 2: Schedule when 225 OS events have occurred
Final: Schedule when 300 OS events have occurred
Advantage: Easier to communicate (concrete event numbers)
Disadvantage: Less flexible if event accrual rate differs from assumption
Calendar-Based Timing (Not Recommended for Efficacy)
Schedule interims at fixed calendar times (e.g., "every 6 months").
Example:
Interim 1: Month 12
Interim 2: Month 18
Final: Month 24
Drawback: Event counts are unpredictable; may analyze before expected
information is available or long after desired timing
Use only for: Administrative/safety monitoring (not efficacy analyses)
Sample Size Inflation Factor
Alpha spending increases the required sample size slightly above a fixed-design trial to achieve the same power.
Inflation factor formula (approximation):
n_GSD / n_fixed = 1 + (power loss factor)
Power loss factor depends on:
- Number of interims (k)
- Spending function (OBF < Pocock)
- Information fractions
Examples (α = 0.025, one-sided; 80% power; classical boundaries with equally spaced analyses, which approximate the corresponding spending functions):
O'Brien-Fleming:
2 analyses (1 interim): inflation ≈ 1.01 (1% larger)
3 analyses (2 interims): inflation ≈ 1.02 (2% larger)
Pocock:
2 analyses (1 interim): inflation ≈ 1.11 (11% larger)
3 analyses (2 interims): inflation ≈ 1.17 (17% larger)
Oncology impact (three analyses):
Fixed-design PFS trial: Need 350 events (n ≈ 400 per arm)
O'Brien-Fleming GSD: Need ~356 events (n ≈ 407 per arm), a minimal increase
Pocock GSD: Need ~408 events (n ≈ 466 per arm), a 17% increase
For large oncology trials, the O'Brien-Fleming inflation is negligible; the Pocock inflation is a material cost to weigh against the chance of stopping earlier.
The percentage inflation does not depend on trial size, so small Phase 2 trials pay the same relative cost.
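The event counts behind such comparisons can be reproduced with Schoenfeld's approximation. The sketch below is a simplified calculation under assumed 1:1 allocation and proportional hazards; the function names are assumptions for this example, and the inflation factors 1.017 and 1.166 are tabulated values for classical O'Brien-Fleming and Pocock boundaries with three equally spaced analyses at 80% power, not outputs of this code.

```python
from math import ceil, log, sqrt
from statistics import NormalDist

NORMAL = NormalDist()


def schoenfeld_events(hr, alpha=0.025, power=0.80, alloc=0.5):
    """Required events for a one-sided log-rank test (Schoenfeld approximation)."""
    z_sum = NORMAL.inv_cdf(1.0 - alpha) + NORMAL.inv_cdf(power)
    return z_sum ** 2 / (alloc * (1.0 - alloc) * log(hr) ** 2)


def power_from_events(events, hr, alpha=0.025, alloc=0.5):
    """Fixed-design power for a given number of events."""
    drift = sqrt(events * alloc * (1.0 - alloc)) * abs(log(hr))
    return NORMAL.cdf(drift - NORMAL.inv_cdf(1.0 - alpha))


fixed_events = schoenfeld_events(0.70)
obf_events = ceil(fixed_events * 1.017)     # assumed tabulated OBF inflation
pocock_events = ceil(fixed_events * 1.166)  # assumed tabulated Pocock inflation
power_350 = power_from_events(350, 0.70)

print(f"fixed design: {ceil(fixed_events)} events for 80% power")
print(f"OBF GSD: {obf_events} events, Pocock GSD: {pocock_events} events")
print(f"power with 350 events: {power_350:.3f}")
```

For HR = 0.70 this gives about 247 events at 80% power, so a 350-event target corresponds to roughly 92% fixed-design power.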
R calculation:
library(gsDesign)
# Fixed design
n_fixed <- nSurvival(
  lambda1 = 0.0462,          # Control hazard (median OS = 15 months)
  lambda2 = 0.0462 * 0.70,   # Treatment hazard (HR = 0.70)
  Ts = 36,                   # Maximum study duration (months)
  Tr = 36,                   # Accrual duration (months)
  alpha = 0.025,
  beta = 0.20,
  sided = 1
)
# GSD with O'Brien-Fleming-type spending, scaled to the fixed-design sample size
gsd_design <- gsDesign(
  k = 2,
  test.type = 1,             # One-sided, efficacy boundary only
  alpha = 0.025,
  beta = 0.20,
  sfu = sfLDOF,
  n.fix = n_fixed$n          # Fixed-design sample size to be inflated
)
# Maximum sample size of the group sequential design (at the final analysis)
n_gsd <- max(gsd_design$n.I)
inflation_factor <- n_gsd / n_fixed$n
cat("Sample size inflation:", round(inflation_factor, 3), "\n")
Conditional Power and Predictive Power
Conditional Power (CP) at Interim
Definition: Probability of rejecting the null hypothesis at the final analysis given the interim data and an assumed effect for the remaining data (here the observed effect, i.e., the "current trend").
Formula (current-trend form, B-value formulation of Lan and Wittes):
CP = Φ((Z_interim / √t - z_crit) / √(1 - t))
where:
Z_interim = observed Z-statistic at interim
z_crit = critical value at the final analysis (1.96 for a fixed design at one-sided α = 0.025; slightly higher with alpha spending)
t = information fraction at interim
Oncology example (Interim 1 at 50% information, 175 of 350 events):
Planned: HR = 0.70
Interim observed: HR = 0.88, Z = 0.85, one-sided p = 0.20
CP calculation (z_crit = 1.96 for simplicity):
CP = Φ((0.85 / √0.5 - 1.96) / √(1 - 0.5))
= Φ((1.20 - 1.96) / 0.707)
= Φ(-1.08)
≈ 14%
Interpretation:
If the observed HR = 0.88 persists to the final analysis, the chance
of achieving statistical significance is only about 14%.
Under the design assumption (HR = 0.70 for the remaining events), CP is
about 67%, so the assumed effect should always be reported with the CP value.
Illustrative decision zones (thresholds are trial-specific and must be pre-specified):
Conditional Power > 80%: No need to increase sample size
Conditional Power < 50%: Consider futility stopping or sample size increase
Conditional Power 50-80%: Middle ground; DSMB decides
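Conditional power depends strongly on the effect assumed for the remaining events. The sketch below uses the B-value formulation with two assumptions, the current trend and the design alternative, for an interim with HR = 0.88 after 175 of 350 events. The function names, the critical value 1.96 (ignoring the small group sequential adjustment), and the 1:1 allocation are assumptions made for this example.

```python
from math import log, sqrt
from statistics import NormalDist

NORMAL = NormalDist()


def z_from_hr(hr, events):
    """Approximate log-rank Z for an observed HR with 1:1 allocation."""
    return -log(hr) * sqrt(events) / 2.0


def conditional_power(z_t, t, crit=1.96, drift=None):
    """CP via B-values; drift is the expected Z at t = 1 (default: current trend)."""
    b_t = z_t * sqrt(t)                  # B-value at the interim
    if drift is None:
        drift = z_t / sqrt(t)            # current-trend estimate of the drift
    return 1.0 - NORMAL.cdf((crit - b_t - drift * (1.0 - t)) / sqrt(1.0 - t))


t = 175 / 350
z_interim = z_from_hr(0.88, 175)
cp_trend = conditional_power(z_interim, t)
# Design alternative: HR = 0.70 over the full 350 events
cp_design = conditional_power(z_interim, t, drift=z_from_hr(0.70, 350))

print(f"Z at interim: {z_interim:.2f}")
print(f"CP under current trend: {cp_trend:.2f}")
print(f"CP under design HR = 0.70: {cp_design:.2f}")
```

The gap between the two values is why a charter should state which effect assumption the futility threshold refers to.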
Predictive Power (Bayesian Perspective)
Definition: Probability of future success integrating over uncertainty about the true effect size.
Differs from Conditional Power:
- CP: Conditions on a single assumed effect, such as the observed effect or the design alternative (frequentist)
- PP: Averages CP over the posterior distribution of the effect (Bayesian)
Calculation (simplified, flat prior on the log hazard ratio):
Interim observation: HR = 0.88, Z = 0.85 at t = 0.5
Posterior distribution (updated belief):
Centered at the observed HR = 0.88, with uncertainty reflecting 175 events
Predictive Power:
PP = P(reject H₀ at final | interim data)
averaged over posterior distribution of true HR
= Φ((Z_interim - z_crit * √t) / √(1 - t)) under a flat prior
Example: PP = Φ((0.85 - 1.96 * 0.707) / 0.707) = Φ(-0.76) ≈ 22%
vs.
Conditional Power: CP ≈ 14% (assuming true HR = 0.88)
An informative prior centered on the design HR = 0.70 would pull the posterior toward 0.70 and raise PP further.
When PP is useful:
- Accounting for skepticism about interim effect size
- Decision-making when interim effect differs from assumptions
- Less extreme than current-trend CP: PP lies between CP and 50% because it accounts for uncertainty in the interim estimate
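With a flat prior on the effect, predictive power has a closed form that can be compared directly with current-trend conditional power. The sketch below assumes a normal approximation, a flat prior, and a final critical value of 1.96; the function names are assumptions for this example, and an informative prior would need a different calculation.

```python
from math import sqrt
from statistics import NormalDist

NORMAL = NormalDist()


def cp_current_trend(z_t, t, crit=1.96):
    """Conditional power assuming the observed trend continues."""
    return NORMAL.cdf((z_t / sqrt(t) - crit) / sqrt(1.0 - t))


def predictive_power_flat(z_t, t, crit=1.96):
    """Predictive power under a flat prior on the drift parameter."""
    return NORMAL.cdf((z_t - crit * sqrt(t)) / sqrt(1.0 - t))


z_interim, t = 0.85, 0.5
cp = cp_current_trend(z_interim, t)
pp = predictive_power_flat(z_interim, t)
print(f"CP (current trend): {cp:.3f}")
print(f"PP (flat prior):    {pp:.3f}")
```

The two formulas differ only by a factor of √t inside Φ, which shrinks PP toward 50% relative to CP.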
R implementation:
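# Conditional power under the current trend, computed directly in base R.
# rpact::getConditionalPower() is a design-based alternative; it takes a
# stage-results object (from getStageResults()) rather than summary numbers.
z_interim <- 0.85   # Observed Z-statistic at interim
t <- 175 / 350      # Information fraction (events at interim / planned events)
z_crit <- 1.96      # Final critical value (fixed-design value, one-sided 0.025)
cp <- pnorm((z_interim / sqrt(t) - z_crit) / sqrt(1 - t))
cat("Conditional Power:", round(cp, 3), "\n")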
DSMB/IDMC Charter Requirements
An independent Data and Safety Monitoring Board (DSMB) or Independent Data Monitoring Committee (IDMC) oversees interim analyses in group sequential trials.
Membership
Typical composition (3–5 members):
- Chair: Oncologist or senior biostatistician
- Biostatistician: Expertise in group sequential design, interim analysis
- Clinician: Relevant specialty (medical oncology, pulmonology, etc.)
- Additional members (optional): Patient advocate, pharmacovigilance expert
Independence Requirements
- No financial interest in sponsor or treatment
- No prior/current affiliation with sponsor (except contract service)
- Not a site investigator or steering committee member
- Published track record in relevant field
Charter Must Specify
- Confidentiality & Firewall:
  - DSMB members are unblinded to treatment assignment
  - Sponsor staff with access: Only designated biostatistician and medical monitor
  - Site investigators remain blinded
  - DSMB meeting materials are confidential
- Meeting Frequency:
  - Before trial starts: Charter/procedures review
  - At each interim: Unblinded analysis review
  - Safety reviews: Quarterly or ongoing depending on risk
- Decision Rules:
  - Binding efficacy: If upper boundary crossed, DSMB must recommend stopping
  - Non-binding futility: DSMB recommends but does not mandate stopping
  - Safety signals: Define thresholds (e.g., ">5% grade 3+ hepatotoxicity")
- Voting & Documentation:
  - Consensus preferred; document any dissents
  - Recommendation to sponsor (continue/stop for efficacy/futility/safety)
  - Minutes confidential but maintained for regulatory audit
Oncology Examples from Clinical Trials
Example 1: KEYNOTE-024 (Pembrolizumab, PD-L1+ NSCLC)
Design features:
- Endpoint: PFS (progression-free survival)
- Event-driven trial: 140 PFS events target
- Two interims: 35 and 105 events
- Spending function: O'Brien-Fleming (conservative)
- Outcome: PFS HR = 0.50 (p < 0.001) at the second interim analysis; the data monitoring committee recommended stopping the trial early for efficacy
- Regulatory decision: Pembrolizumab approved for 1L PD-L1+ NSCLC
Example 2: CheckMate-227 (Nivolumab + Ipilimumab, Advanced NSCLC)
Design features:
- Co-primary endpoints: PFS and OS
- Interim analyses: 50% and 75% of OS events (270 and 405 events, respectively)
- Futility rule: Conditional Power < 25% at interim → stop (non-binding)
- Randomization: 1:1:1 within PD-L1 expression strata
- Outcome: Continued to final; demonstrated PFS benefit in the TMB-high population and, later, OS benefit in patients with PD-L1 ≥ 1%
Example 3: ATTRACT-2 (Atezolizumab, First-Line, Chemotherapy Comparison)
Design features:
- Stopping rule: Non-binding futility based on CP < 20% at interim
- Interim scheduled at 50% of target events (PFS or OS)
- DSMB oversight: Quarterly safety reviews + efficacy interim
- Outcome: Interim analysis at 50% events showed favorable trend; continued to final analysis
R Packages for Group Sequential Design
gsDesign (Keaven Anderson, Merck)
Purpose: Group sequential and adaptive designs with O'Brien-Fleming, Pocock, and custom spending functions
Key functions:
library(gsDesign)
# Create a group sequential design
design <- gsDesign(
  k = 2,               # 2 analyses (1 interim + final)
  test.type = 4,       # One-sided efficacy with non-binding futility
  alpha = 0.025,       # Significance level (one-sided)
  beta = 0.20,         # Type II error (80% power)
  sfu = sfLDOF,        # O'Brien-Fleming-type spending for efficacy
  sfl = sfLDOF,        # O'Brien-Fleming-type spending for futility
  timing = c(0.5, 1)   # Information fractions
)
# Calculate sample size for time-to-event
n <- nSurvival(
  lambda1 = 0.0462,          # Control hazard
  lambda2 = 0.0462 * 0.65,   # Treatment hazard
  Ts = 36,                   # Maximum study duration
  Tr = 36,                   # Accrual duration
  alpha = 0.025,
  beta = 0.20,
  sided = 1
)
# Extract boundaries
design$upper$bound # Efficacy boundaries (Z-scale)
design$lower$bound # Futility boundaries
rpact (Gernot Wassmer, Friedrich Pahlke)
Purpose: Comprehensive adaptive and group sequential designs with modern workflows
Key functions:
library(rpact)
# Group sequential design
design <- getDesignGroupSequential(
  kMax = 2,
  alpha = 0.025,
  beta = 0.20,
  sided = 1,
  typeOfDesign = "asOF" # asOF = O'Brien-Fleming-type alpha spending
)
# Sample size calculation
n <- getSampleSizeSurvival(
  design = design,
  hazardRatio = 0.65,
  lambda2 = 0.08 # Control event rate
)
# Simulate operating characteristics
sim <- getSimulationSurvival(
  design = design,
  hazardRatio = 0.65,
  lambda2 = 0.08,
  directionUpper = FALSE,        # Benefit corresponds to HR < 1
  plannedEvents = c(140, 280),   # Cumulative events at interim and final
  maxNumberOfSubjects = 500,     # Illustrative assumption, not from a real trial
  maxNumberOfIterations = 10000
)
gsDesign2 (Yujie Zhao, Merck)
Purpose: Next-generation GSD package with enhanced flexibility for complex designs, non-proportional hazards, and MaxCombo testing
Key features:
- Seamless Phase 2/3 designs
- MaxCombo and RMST testing
- Custom spending functions
- Non-proportional hazards simulation
Example:
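library(gsDesign2)
# Log-rank GSD under an average hazard ratio (AHR) model;
# gs_design_combo() covers MaxCombo tests for NPH robustness.
# Enrollment and dropout values below are illustrative assumptions.
enroll_rate <- define_enroll_rate(duration = 12, rate = 1)
fail_rate <- define_fail_rate(
  duration = Inf,
  fail_rate = 0.08,      # Control event rate
  hr = 0.65,
  dropout_rate = 0.001
)
design_nph <- gs_design_ahr(
  enroll_rate = enroll_rate,
  fail_rate = fail_rate,
  alpha = 0.025,
  beta = 0.20,
  info_frac = c(0.5, 1),
  analysis_time = 36,    # Final analysis time (months)
  upper = gs_spending_bound, # Spending-based efficacy bound
  upar = list(sf = gsDesign::sfLDOF, total_spend = 0.025),
  lower = gs_b,          # Fixed lower bound
  lpar = rep(-Inf, 2)    # No futility bound
)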
SAP Template: Complete 2-Interim GSD with O'Brien-Fleming Boundaries
7. GROUP SEQUENTIAL DESIGN AND INTERIM ANALYSES
7.1 Overview
This trial uses a group sequential design (GSD) with two pre-specified interim
analyses for efficacy and futility monitoring. The design controls the overall
Type I error rate at α = 0.025 (one-sided) using O'Brien-Fleming alpha spending
while maintaining strong FWER control.
The interim analyses are scheduled at 50% and 75% of the target information
(PFS events), with the option for additional safety interims if adverse events
warrant.
7.2 Primary Endpoint and Sample Size
**Endpoint:** Progression-Free Survival (PFS)
Time from randomization to radiographically-confirmed disease progression
or death from any cause, whichever occurs first. Assessed per RECIST 1.1.
**Null hypothesis (H₀):** HR(treatment vs. control) = 1.0
**Alternative hypothesis (H₁):** HR = 0.70 (30% reduction in PFS hazard)
**Sample size calculation:**
Target HR = 0.70
Event-driven trial: 350 PFS events (roughly 91% power for HR = 0.70 by Schoenfeld's approximation; 80% power would require about 247 events)
Assuming:
- Control median PFS: 4.2 months (λ = 0.165/month)
- Treatment median PFS: 6.0 months (λ = 0.115/month)
- Accrual: 15 patients/month for 24 months (n = 360 total)
- Follow-up: 36 months
- Dropout: 1% per month
Total required: 360 patients (180 per arm)
**Group sequential adjustment:**
O'Brien-Fleming spending (2 interims + final) inflation ≈ 1.02
→ No material increase in sample size (350 events sufficient)
7.3 Interim Analysis Timing and Information Fractions
### Interim Analysis 1 (50% Information)
**Trigger:** 175 PFS events documented
**Expected timing:** Approximately 18–20 months after first patient enrollment
**Efficacy boundary (O'Brien-Fleming):**
Z ≥ 2.963 (p < 0.0015, one-sided)
If crossed: STOP FOR EFFICACY
- Trial declares treatment significantly superior
- Interim results submitted in BLA/NDA
- Site investigators unblinded; patients offered treatment arm
**Futility boundary (non-binding):**
Conditional Power (CP) < 20% at interim
CP calculation: "If observed HR continues to final analysis,
what is probability of p < 0.025?"
If CP < 20%:
- DSMB recommends considering trial termination for futility
- Sponsor may choose to continue with documented justification
- No automatic stopping (non-binding rule allows sponsor discretion)
### Interim Analysis 2 (75% Information)
**Trigger:** 262 PFS events documented
**Expected timing:** Approximately 26–28 months after first patient enrollment
**Efficacy boundary (O'Brien-Fleming):**
Z ≥ 2.359 (p < 0.0092, one-sided)
If crossed: STOP FOR EFFICACY
- Remaining patients unblinded
- Submit interim results for regulatory approval
**Futility boundary (non-binding):**
CP < 30% at interim (slightly relaxed from Interim 1)
If CP < 30% and trial has not crossed efficacy boundary:
- DSMB reviews interim results
- Recommends continuing to final or stopping for futility
- Sponsor decision documented in trial file
### Final Analysis (100% Information)
**Trigger:** 350 PFS events documented
**Expected timing:** Approximately 32–36 months after first patient enrollment
**Efficacy boundary:**
Z ≥ 2.014 (p < 0.0220, one-sided)
Slightly stricter than the fixed-design threshold of 1.96 because alpha was spent at the interims; no further interim analyses
7.4 Efficacy Stopping Rule (Binding)
If the Z-statistic (log-rank test, stratified by [randomization factors])
crosses the pre-specified efficacy boundary at any interim analysis, the trial
MUST stop and the treatment is declared effective. The DSMB will immediately
recommend stopping to the sponsor.
The interim p-value will be compared directly to the boundary threshold;
no multiplicity adjustment is applied (already incorporated into the boundary
via alpha spending).
7.5 Futility Stopping Rule (Non-Binding)
At each interim analysis, conditional power (CP) will be calculated:
CP = P(reject H₀ at final analysis | interim data, assuming observed HR persists)
If CP falls below the pre-specified threshold (20% at Interim 1, 30% at
Interim 2), the futility boundary is crossed and the DSMB will convene to
discuss whether the trial should continue.
**DSMB decision framework:**
- **CP ≥ threshold:** CONTINUE trial as planned
- **CP < threshold:** DSMB may recommend stopping but does not mandate it
- If continuing despite low CP: Sponsor documents clinical/scientific rationale
- Examples: Secondary endpoint showing benefit, safety improvement,
emerging biomarker evidence
**Conditional Power Calculation (Example):**
At Interim 1 (175 of 350 events):
Observed: HR = 0.82, Z = 1.31, one-sided p = 0.095
Remaining events needed: 175
Current estimate of hazard ratio: 0.82
Variance of log(HR): approximately 4/175 = 0.0229 (1:1 allocation)
CP = Φ((Z_observed / √t - z_final) / √(1 - t)), with t = 0.5 and z_final = 1.96 for simplicity
= Φ((1.31 / 0.707 - 1.96) / 0.707)
= Φ(-0.15)
≈ 44%
Decision: CP (44%) > 20% threshold → CONTINUE trial
7.6 Safety Interim Analyses (Non-Binding, Continuous)
Safety data (serious adverse events, deaths, grade 3+ toxicity) will be
reviewed at each efficacy interim and continuously between analyses.
**Pre-specified safety stopping rules (trigger emergency DSMB review):**
1. ≥2 treatment-related deaths (grade 5 AEs)
2. Grade 3+ hepatotoxicity: > 5% in treatment vs. < 2% in control
3. Grade 3+ interstitial lung disease: > 2% in treatment vs. < 0.5% in control
4. Any unexpected serious adverse event pattern (signal)
If triggered:
- Emergency DSMB meeting within 48 hours
- Assessment of causality, severity, prevalence
- Recommendation: CONTINUE, MODIFY (protocol amendment), PAUSE (enrollment halt),
or STOP (trial termination)
- Sponsor decision documented with FDA if applicable
7.7 DSMB Operations and Confidentiality
**Composition:**
- Chair: [Name], Oncologist, [Institution]
- Biostatistician: [Name], [Institution]
- Clinician: [Name], [Specialty], [Institution]
**Meetings:**
- Baseline DSMB meeting: [Date] — Charter/procedures review
- Interim 1 meeting: [Date ± 2 weeks after 175 events]
- Interim 2 meeting: [Date ± 2 weeks after 262 events]
- Safety meetings: Quarterly or as triggered
- Final DSMB meeting: After final analysis, before unblinding
**Confidentiality:**
- Interim results reviewed in closed session (DSMB members + unblinded biostatistician)
- Sponsor receives only a recommendation (e.g., "Continue trial as planned")
- Specific p-values, HR estimates not disclosed to blinded sponsor staff
- DSMB minutes confidential; trial team unblinded only if stopping rule crossed
**Firewall:**
- Unblinded statistician: Conducts all interim analyses in locked office
- Blinded sponsor team: Continues enrollment/follow-up without knowledge of interim results
- Data management: Remains blinded; no access to unblinded data
7.8 Statistical Test and Analysis
The primary analysis will use the **log-rank test** (one-sided, α = 0.025), stratified by:
- [Stratification factor 1, e.g., ECOG status]
- [Stratification factor 2, e.g., prior therapy status]
**Analysis population:** Intent-to-treat (ITT)
Interim and final p-values will be compared to the pre-specified efficacy
and futility boundaries (Sections 7.3–7.5) without further adjustment
(alpha spending is implicit in the boundary calculation).
**Sensitivity analysis:**
- Per-protocol population (as supportive)
- Log-rank test without stratification (robustness check)
7.9 Type I Error Preservation and Power
The O'Brien-Fleming alpha spending ensures that the overall one-sided Type I
error is controlled at α = 0.025 despite two interim analyses and the possibility
of early stopping. The final efficacy boundary (Z ≥ 2.014) is adjusted to reflect
the alpha already spent at interim stages.
**Power calculation:**
- Analytically: roughly 91% power for HR = 0.70 with 350 events (Schoenfeld's approximation, fixed design)
- GSD adjustment (O'Brien-Fleming): ~2% sample size inflation for equal power, or under one percentage point of power at a fixed 350 events
- Practical power: approximately 91%, depending on adherence to assumptions
7.10 Interim Analysis Conduct and Reporting
**Conduct:**
1. When 175 PFS events are confirmed, database will be locked for interim analysis
2. Unblinded biostatistician breaks treatment codes and performs log-rank analysis
3. Results compared to efficacy/futility boundaries
4. DSMB convenes within 2 weeks to review and formulate recommendation
5. Recommendation sent to sponsor (no detailed results unless stopping rule crossed)
6. If continuing: Blinded team resumes enrollment; interim results not disclosed
**Reporting:**
- If early stopping for efficacy: Full interim results reported in BLA/NDA
- If continuing to final: Interim efficacy/futility data included in CSR
only to document DSMB decision and trial integrity
Limitations and Pitfalls
1. Over-interpretation of interim p-values: Interim p-values that don't cross boundaries should NOT be reported as "nearly significant" or used to guide post-hoc decisions. They are not meaningful for inference; only the final analysis provides valid Type I error control if efficacy boundary is not crossed.
2. Unequal information fractions and Lan-DeMets: If interim information fractions differ from planned (e.g., 45% instead of 50%), boundaries must be recalculated using Lan-DeMets; using pre-calculated 50% boundaries is incorrect and invalidates error control.
3. Futility stopping overridden without justification: If DSMB recommends stopping for futility and sponsor continues without documentation, FDA may view final results with skepticism during regulatory review.
4. DSMB firewall breached: If interim results or stopping recommendations leak to blinded staff or site investigators, trial integrity is compromised and the validity of the final analysis can no longer be assured.
5. Safety stopping thresholds set post-hoc: Defining safety stopping rules after seeing interim data invalidates the decision. Rules must be specified in the SAP a priori.
6. Interim analyses for immature endpoints: Testing OS at interim with < 60% follow-up or < 100 events risks declaring futility/efficacy prematurely. ICH guidance recommends caution with premature interim OS testing.
Backlinks
- Interim Analysis and DSMB Operations
- Multiplicity Control in Oncology Trials
- Adaptive Trial Designs in Oncology
- Simulation-Based Power Analysis
- Statistical Analysis Methods in Oncology Trials
Source: FDA Guidance for Industry: Adaptive Designs for Clinical Trials of Drugs and Biologics (November 2019, Final)
Status: Final guidance
Compiled from FDA guidance, gsDesign/rpact documentation, and oncology trial examples