Group Sequential Designs (GSD)

Definition

A group sequential design (GSD) is a clinical trial design that allows for pre-specified interim analyses with the possibility of early stopping for efficacy, futility, or safety, while maintaining statistical Type I error control (strong FWER control at the nominal significance level).

"Group sequential designs allow trials to be monitored during conduct with prospectively planned stopping rules for efficacy, futility, or safety. Properly designed and monitored group sequential trials maintain the overall Type I error rate at the nominal level while providing opportunities for early termination if convincing evidence of benefit or harm accumulates." — FDA Guidance for Industry — Adaptive Designs for Clinical Trials of Drugs and Biologics (November 2019, Final)

Key distinction from fixed designs:

Fixed-design trial: Single analysis at the end with one opportunity to reject the null hypothesis
Group sequential: Multiple pre-specified interim analyses with stopping boundaries at each stage, allowing early stopping while preserving overall Type I error

Type I error preservation mechanism:

Alpha (the overall significance level) is "spent" across interim and final analyses using a pre-specified alpha spending function (for example, O'Brien-Fleming-type or Pocock-type functions within the Lan-DeMets framework). The boundaries at each stage are chosen so that the cumulative probability of a false positive across all analyses does not exceed α.

Error Spending Framework

The error spending function is the mathematical foundation of GSD. It allocates the overall Type I error rate across interim and final analyses while maintaining strong FWER control.

O'Brien-Fleming (OBF) Spending Function

A conservative spending function and a common default in confirmatory oncology trials. Allocates very little alpha to early interims and most of the alpha to the final analysis.

Formula (Lan-DeMets O'Brien-Fleming-type spending, overall one-sided level α):

α(t) = 2 − 2 * Φ(z_{α/2} / √t) = 2 * Φ(−z_{α/2} / √t)

where:
  t = information fraction (0 < t ≤ 1)
  z_{α/2} = upper α/2 quantile of the standard normal (z_{0.0125} ≈ 2.2414 for α = 0.025 one-sided), so that α(1) = α
  Φ = standard normal cumulative distribution function

Interpretation:
  α(t) = cumulative alpha "spent" up to information fraction t
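
The spending curve can be evaluated directly. The following is a minimal base-R sketch, assuming the Lan-DeMets form α(t) = 2 * Φ(−z_{α/2} / √t), in which the α/2 quantile makes the curve reach α at t = 1. The function name `obf_spend` is illustrative and does not come from any package.

```r
# Lan-DeMets O'Brien-Fleming-type spending function (overall one-sided alpha).
# obf_spend is an illustrative name, not a package function.
obf_spend <- function(t, alpha = 0.025) {
  2 * pnorm(-qnorm(1 - alpha / 2) / sqrt(t))
}

info_frac <- c(0.5, 0.75, 1.0)
cum_alpha <- obf_spend(info_frac)

# Cumulative and incremental alpha at each analysis
data.frame(
  info_frac = info_frac,
  cumulative_alpha = round(cum_alpha, 4),
  incremental_alpha = round(diff(c(0, cum_alpha)), 4)
)
```

The incremental column shows how little alpha is available at the first look compared with the final analysis, which is the defining feature of this spending function.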

Boundary structure (two interims, α = 0.025 one-sided):

Analysis	Information Fraction	Cumulative α Spent	Z-Boundary	p-Value Threshold	Interpretation
Interim 1	50% (t=0.5)	0.0015	2.96	< 0.0015	Very restrictive; difficult to stop early
Interim 2	75% (t=0.75)	0.0096	2.36	< 0.0092	Still restrictive
Final	100% (t=1.0)	0.0250	2.01	< 0.022	Close to the fixed-design 1.96; only slightly stricter

Properties:

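Controls the overall Type I error at α under the canonical joint distribution of the sequential test statistics (independent increments)
Minimal power loss from alpha spending (typically 1–3% inflation in maximum sample size)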
Unlikely to declare efficacy at interim unless treatment effect is very large
Recommended when: early stopping for efficacy is unlikely but acceptable if overwhelming benefit appears

When to use:

Oncology Phase 3 trials with large sample size (less sensitive to power loss)
Trials where time/cost savings from early stopping are modest
Conservative approach preferred (regulatory comfort)

Pocock Spending Function

More liberal than O'Brien-Fleming. Allocates approximately equal alpha to each analysis stage, enabling earlier stopping if evidence is strong.

Formula (one-sided):

α(t) = α_total * ln(1 + (e − 1) * t)

where e ≈ 2.71828 (base of the natural logarithm)

At t = 0.5: α(t) ≈ 0.0155 for α_total = 0.025 (about ten times the OBF-type spend at the same look)
At t = 1.0: α(t) = α_total
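
As a quick check of the Pocock-type curve, here is a minimal base-R sketch assuming the form α(t) = α_total * ln(1 + (e − 1) * t). The name `pocock_spend` is illustrative, not a package function.

```r
# Lan-DeMets Pocock-type spending function (overall one-sided alpha).
# pocock_spend is an illustrative name, not a package function.
pocock_spend <- function(t, alpha = 0.025) {
  alpha * log(1 + (exp(1) - 1) * t)
}

info_frac <- c(0.5, 0.75, 1.0)

# Cumulative alpha spent at each analysis
round(pocock_spend(info_frac), 4)
```

Because the curve is concave in t, well over half of the total alpha is already spent by the 50% look.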

Boundary structure (two interims, α = 0.025 one-sided):

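Analysis	Information Fraction	Cumulative α Spent	Z-Boundary	p-Value Threshold
Interim 1	50%	0.0155	2.16	< 0.0155
Interim 2	75%	0.0207	> 1.96 (numerical)	< 0.025 (numerical)
Final	100%	0.0250	> 1.96 (numerical)	< 0.025 (numerical)

Boundaries after the first look depend on the joint distribution of the sequential test statistics and are obtained numerically (e.g., with gsDesign or rpact). Every Pocock-type boundary, including the final one, is stricter than the fixed-design 1.96.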

Properties:

More permissive at interim (easier to declare early efficacy)
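Stricter final threshold than a fixed design (for a classical two-analysis Pocock design, each analysis is tested at a nominal one-sided p < 0.0147, i.e. two-sided 0.0294, instead of 0.025 at a single final analysis)
Larger maximum sample size: inflation of roughly 10–20% for two to four analyses
Relies on the same canonical joint distribution (independent increments) as other group sequential boundaries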

When to use:

Trials where early stopping is clinically valuable and resource-saving
Smaller Phase 2 trials where a realistic chance of early stopping matters more than the larger maximum sample size
Endpoint expected to show large, rapid treatment effect (immunotherapy with quick OS benefit)

Lan-DeMets (Adaptive) Spending Function

A flexible framework that adapts to unequal information fractions. Mimics a target spending function (OBF, Pocock, etc.) while allowing flexible interim scheduling.

Use case:

Planned: Interim 1 at 50%, Interim 2 at 75%, Final at 100%
Actual:  Interim 1 at 45%, Interim 2 at 68%, Final at 100%
         (enrollment faster than projected)

Lan-DeMets recalculates boundaries to match the OBF spending curve
at the actual information fractions (0.45, 0.68, 1.0) instead of
planned (0.50, 0.75, 1.0).

Result: Statistical properties (Type I error, power) preserved despite
flexible interim timing.
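
To make the recalculation concrete, the sketch below derives one-sided efficacy boundaries from a spending function at arbitrary information fractions. It is a simplified illustration under stated assumptions: efficacy boundary only, the canonical independent-increments model under H0, and a plain trapezoid rule for the integration. The names `obf_spend` and `gs_efficacy_bounds` are hypothetical; production work should rely on validated software such as gsDesign or rpact.

```r
# Lan-DeMets O'Brien-Fleming-type spending (illustrative helper, not a package function)
obf_spend <- function(t, alpha = 0.025) 2 * pnorm(-qnorm(1 - alpha / 2) / sqrt(t))

# One-sided efficacy boundaries (Z scale) implied by a spending function.
# Works on the score scale S_k = Z_k * sqrt(t_k), whose increments are
# independent N(0, t_k - t_{k-1}) under H0. gs_efficacy_bounds is a
# hypothetical name; integration is a simple trapezoid rule on a grid.
gs_efficacy_bounds <- function(t, spend = obf_spend, n_grid = 1001) {
  k <- length(t)
  cum_alpha <- spend(t)
  z <- numeric(k)
  z[1] <- qnorm(1 - cum_alpha[1])

  # Grid over the stage-1 continuation region and the density of S_1 on it
  u <- seq(-8 * sqrt(t[1]), z[1] * sqrt(t[1]), length.out = n_grid)
  g <- dnorm(u, sd = sqrt(t[1]))

  for (j in seq_len(k)[-1]) {
    h <- u[2] - u[1]
    w <- c(h / 2, rep(h, n_grid - 2), h / 2)        # trapezoid weights
    sd_inc <- sqrt(t[j] - t[j - 1])
    target <- cum_alpha[j] - cum_alpha[j - 1]       # alpha to spend at analysis j

    # P(no crossing before analysis j, then Z_j >= zj) under H0
    cross_prob <- function(zj) {
      sum(w * g * pnorm(zj * sqrt(t[j]), mean = u, sd = sd_inc, lower.tail = FALSE))
    }
    z[j] <- uniroot(function(zj) cross_prob(zj) - target, c(0, 10), tol = 1e-9)$root

    if (j < k) {
      # Sub-density of S_j on the new continuation region
      s <- seq(-8 * sqrt(t[j]), z[j] * sqrt(t[j]), length.out = n_grid)
      kern <- outer(s, u, function(a, b) dnorm(a, mean = b, sd = sd_inc))
      g <- as.vector(kern %*% (w * g))
      u <- s
    }
  }
  z
}

round(gs_efficacy_bounds(c(0.50, 0.75, 1.00)), 3)   # planned fractions
round(gs_efficacy_bounds(c(0.45, 0.68, 1.00)), 3)   # actual fractions
```

Comparing the two calls shows how the boundaries shift when the looks happen earlier than planned while the total alpha spent by the final analysis stays at 0.025.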

Advantages:

Allows interim scheduling to shift based on actual accrual
Maintains intended spending function properties
Removes pressure to analyze exactly at pre-planned information fractions

R implementation:

library(gsDesign)

# Planned design: Lan-DeMets spending approximating O'Brien-Fleming
design <- gsDesign(
  k = 3,                       # 2 interims + final
  test.type = 1,               # One-sided, efficacy boundary only
  alpha = 0.025,               # gsDesign's alpha is always one-sided
  beta = 0.20,
  timing = c(0.5, 0.75, 1.0),  # Planned information fractions
  sfu = sfLDOF                 # Lan-DeMets O'Brien-Fleming-type spending
)

# Actual information fractions differ from planned
actual_info <- c(0.45, 0.68, 1.0)  # vs planned c(0.5, 0.75, 1.0)

# Recompute boundaries on the same spending curve at the actual fractions
design_actual <- gsDesign(
  k = 3,
  test.type = 1,
  alpha = 0.025,
  beta = 0.20,
  timing = actual_info,        # Actual information fractions
  sfu = sfLDOF
)

design_actual$upper$bound      # Updated efficacy boundaries (Z scale)

Kim-DeMets (Power Family) Spending Function

A parametric family indexed by shape parameter ρ (rho) > 0. ρ = 1 spends alpha linearly in information and gives Pocock-like boundaries; ρ = 3 gives O'Brien-Fleming-like boundaries.

Formula (one-sided):

α(t) = α_total * t^ρ

ρ = 1    → Linear spending (Pocock-like, aggressive)
ρ = 1.5  → Mildly conservative
ρ = 2    → Intermediate
ρ = 3    → O'Brien-Fleming-like (conservative)

Example: ρ = 1.5, α_total = 0.025, t = 0.5
  α(0.5) = 0.025 * 0.5^1.5 ≈ 0.0088
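
The effect of ρ is easiest to see side by side. The following is a small base-R sketch of the power family α(t) = α_total * t^ρ; `power_spend` is an illustrative name, not a package function.

```r
# Kim-DeMets power-family spending function.
# power_spend is an illustrative name, not a package function.
power_spend <- function(t, rho, alpha = 0.025) {
  alpha * t^rho
}

info_frac <- c(0.5, 0.75, 1.0)
rho_values <- c(1, 1.5, 2, 3)

# Rows: information fractions; columns: shape parameter rho
spend_table <- sapply(rho_values, function(rho) power_spend(info_frac, rho))
dimnames(spend_table) <- list(paste0("t=", info_frac), paste0("rho=", rho_values))
round(spend_table, 4)
```

Larger ρ pushes spending toward the final analysis, so the early-look columns shrink as ρ grows while every column ends at 0.025.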

When to use:

Fine-tuning the balance between early stopping opportunity and power preservation
Sensitivity analyses showing robustness to spending function choice
Trials where "aggressiveness" of stopping rule should be tailored

Stopping Boundaries

Efficacy Boundaries (Upper)

A pre-specified threshold beyond which the trial is declared positive (reject the null hypothesis) and can stop early.

Interpretation:

If the Z-statistic (test statistic from log-rank test, t-test, etc.) crosses the efficacy boundary at interim, the cumulative evidence is strong enough to declare the treatment effective
Trial must stop for efficacy (binding rule)

Oncology example (PFS primary, O'Brien-Fleming):

Target: HR = 0.70, approximately 90% power, α = 0.025 (one-sided)
Expected PFS events: 350

Interim 1 (50% = 175 events):
  Efficacy boundary: Z ≥ 2.96 (p < 0.0015)
  Interpretation: VERY strong evidence needed to stop early
  If observed Z = 3.0: STOP FOR EFFICACY, declare treatment effective
  If observed Z = 2.2: Continue to Interim 2

Interim 2 (75% = 262 events):
  Efficacy boundary: Z ≥ 2.36 (p < 0.0092)
  If observed Z = 2.40: STOP FOR EFFICACY
  If observed Z = 1.90: Continue to Final

Final (100% = 350 events):
  Efficacy boundary: Z ≥ 2.01 (p < 0.022)
  Slightly stricter than the fixed-design threshold of 1.96, reflecting alpha spent at the interims
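
The link between the event target, the hazard ratio, and power in an example like this can be sanity-checked with Schoenfeld's approximation for a 1:1 log-rank test. The sketch below uses base R only and ignores the small group sequential inflation; the function names are illustrative.

```r
# Schoenfeld approximation for a 1:1 randomized log-rank test (fixed design).
# Function names are illustrative, not from a package.
schoenfeld_events <- function(hr, alpha = 0.025, power = 0.80) {
  4 * (qnorm(1 - alpha) + qnorm(power))^2 / log(hr)^2
}

logrank_power <- function(events, hr, alpha = 0.025) {
  pnorm(sqrt(events) / 2 * abs(log(hr)) - qnorm(1 - alpha))
}

ceiling(schoenfeld_events(0.70, power = 0.80))   # events for 80% power
ceiling(schoenfeld_events(0.70, power = 0.90))   # events for 90% power
round(logrank_power(350, 0.70), 3)               # power delivered by 350 events
```

Under this approximation, 350 events at HR = 0.70 correspond to a power of a little over 90%, while 80% power needs roughly 250 events.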

Futility Boundaries (Lower)

A pre-specified threshold below which the trial is considered unlikely to succeed at the final analysis and may be stopped early.

Binding vs. Non-Binding Futility:

Aspect	Binding Futility	Non-Binding Futility
Rule if boundary crossed	MUST stop	DSMB recommends stopping; sponsor can continue
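Type I error	Preserved only if the futility rule is always followed (efficacy boundaries are relaxed to account for it)	Preserved whether or not the rule is followed (efficacy boundaries are computed ignoring futility)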
Type II error (β)	Increased; cannot recover if boundary crossed prematurely	Protected; can continue if DSMB allows
Flexibility	None; trial terminates automatically	DSMB can override; allows clinical judgment
Oncology use	Less common; requires confidence in effect size assumption	More common; preferred for uncertainty

Futility threshold (Conditional Power approach):

At interim, calculate Conditional Power (CP):
  CP = Probability of rejecting H₀ at final analysis 
       given current interim data and assuming continued enrollment

Rule: If CP < threshold (e.g., 20%), futility boundary is crossed

Example (Interim 1, n=175 PFS events):
  Observed HR = 0.88 (less favorable than assumed HR = 0.70)
  Current Z ≈ 0.85 (one-sided p ≈ 0.20; not significant)

  CP calculation:
    "If true HR = 0.88 and we continue to 350 events,
     probability of achieving p < 0.025 at final ≈ 14%"

  Decision: CP (14%) < 20% threshold → Futility boundary crossed
  DSMB recommendation: Consider stopping (may recommend or allow continuation)
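
The Z-statistic and the hazard ratio in an interim summary should agree with each other. A rough consistency check, assuming 1:1 randomization and the usual approximation Var(log HR) ≈ 4 / events, is sketched below in base R; both function names are illustrative.

```r
# Approximate conversion between hazard ratio and log-rank Z for 1:1 randomization,
# using Var(log HR) ~ 4 / events. Z > 0 means benefit (HR < 1).
# Function names are illustrative, not from a package.
z_from_hr <- function(hr, events) -log(hr) * sqrt(events) / 2
hr_from_z <- function(z, events) exp(-2 * z / sqrt(events))

round(z_from_hr(0.88, 175), 2)   # Z implied by HR = 0.88 at 175 events
round(hr_from_z(1.34, 175), 2)   # HR implied by Z = 1.34 at 175 events
```

With 175 events, an observed HR of 0.88 corresponds to a Z-statistic below 1, whereas a Z-statistic of 1.34 would correspond to an HR of about 0.82.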

R implementation:

library(gsDesign)

# Two-stage design with O'Brien-Fleming-type spending and binding futility
design <- gsDesign(
  k = 2,
  test.type = 3,  # One-sided efficacy with binding futility (beta spending)
  alpha = 0.025,
  beta = 0.20,
  sfu = sfLDOF,   # O'Brien-Fleming-type alpha spending for efficacy
  sfl = sfLDOF    # O'Brien-Fleming-type beta spending for futility
)

# Extract boundaries
design$upper$bound  # Efficacy (upper) boundaries
design$lower$bound  # Futility (lower) boundaries

Information Fraction vs. Event-Driven Timing

Information Fraction (Recommended)

Define interim analyses based on the proportion of target information (events or sample size) accrued, not calendar time.

For time-to-event endpoints (OS, PFS, DFS):

Information fraction = (actual events accrued) / (planned total events)

Example:
  Planned total PFS events: 350

  Interim 1: 50% information = 175 events
             Schedule analysis when 175 PFS events documented
             (no specific calendar date; depends on accrual/follow-up)

  Interim 2: 75% information = 262 events

  Final:    100% information = 350 events

Advantages:

Timing is adaptive to event accrual rate (no delay if events accrue faster)
Statistical properties (Type I error, power) are maintained because boundaries are tied to accrued information rather than calendar time
Interim analyses benefit from proper follow-up accumulation
Lan-DeMets can handle unequal information fractions seamlessly

For binary endpoints (ORR, safety):

Information fraction = (patients with an evaluable outcome) / (planned total patients)

Example:
  Planned sample size: 200 patients
  Interim 1: 100 patients evaluable for response (50% information)
  Final: 200 patients evaluable

Event-Driven Timing (Alternative)

Specify interim analyses at fixed cumulative event counts rather than percentages.

Example (OS primary):
  Interim 1: Schedule when 150 OS events have occurred
  Interim 2: Schedule when 225 OS events have occurred
  Final:     Schedule when 300 OS events have occurred

Advantage: Easier to communicate (concrete event numbers)
Disadvantage: Less flexible if event accrual rate differs from assumption

Calendar-Based Timing (Not Recommended for Efficacy)

Schedule interims at fixed calendar times (e.g., "every 6 months").

Example:
  Interim 1: Month 12
  Interim 2: Month 18
  Final:     Month 24

Drawback: Event counts are unpredictable; may analyze before expected
          information is available or long after desired timing

Use only for: Administrative/safety monitoring (not efficacy analyses)

Sample Size Inflation Factor

Alpha spending increases the required sample size slightly above a fixed-design trial to achieve the same power.

Inflation factor formula (approximation):

n_GSD / n_fixed = 1 + (power loss factor)

Power loss factor depends on:
  - Number of interims (k)
  - Spending function (OBF < Pocock)
  - Information fractions

Examples (α = 0.025, one-sided; 80% power; equally spaced analyses, classical boundaries):

O'Brien-Fleming:
  2 analyses (1 interim at 50%): inflation ≈ 1.01 (1% larger)
  3 analyses (2 interims): inflation ≈ 1.02 (2% larger)

Pocock:
  2 analyses (1 interim at 50%): inflation ≈ 1.11 (11% larger)
  3 analyses (2 interims): inflation ≈ 1.17 (17% larger)

Oncology impact (one interim at 50%):

Fixed-design PFS trial: Need 350 events (n ≈ 400 per arm)
O'Brien-Fleming GSD: Need ~353 events (n ≈ 403 per arm), a minimal increase
Pocock GSD: Need ~389 events (n ≈ 444 per arm), an 11% increase

For large oncology trials (n > 300 per arm), inflation is acceptable.
For small Phase 2 trials (n < 100 per arm), inflation is proportionally larger.

R calculation:

library(gsDesign)

# Fixed design
n_fixed <- nSurvival(
  lambda1 = 0.0462,          # Control hazard rate (median OS = 15 months)
  lambda2 = 0.0462 * 0.70,   # Treatment hazard rate (HR = 0.70)
  Ts = 36,                   # Total study duration (months)
  Tr = 36,                   # Accrual duration (months); equal to Ts, so no extra follow-up
  alpha = 0.025,
  beta = 0.20,
  sided = 1
)

# GSD with O'Brien-Fleming-type spending; n.fix carries the fixed-design event count
gsd_design <- gsDesign(
  k = 2,
  test.type = 1,             # One-sided, efficacy boundary only
  alpha = 0.025,
  beta = 0.20,
  sfu = sfLDOF,
  n.fix = n_fixed$nEvents
)

# Maximum events under the GSD relative to the fixed design
events_gsd <- max(gsd_design$n.I)
inflation_factor <- events_gsd / n_fixed$nEvents
cat("Sample size inflation:", round(inflation_factor, 3), "\n")

Conditional Power and Predictive Power

Conditional Power (CP) at Interim

Definition: Probability of rejecting the null hypothesis at the final analysis given the interim data observed and assuming the observed effect size continues.

Formula (current-trend assumption, B-value form):

CP = Φ((Z_interim / √t − c) / √(1 − t))

where:
  Z_interim = observed Z-statistic at interim
  c = critical value at the final analysis (1.96 for a fixed design at one-sided α = 0.025; in a group sequential design use the final efficacy boundary, which is slightly larger and lowers CP slightly)
  t = information fraction at interim

Oncology example (Interim 1 at 50% information):

Planned: HR = 0.70
Interim observed (175 events): HR = 0.88, Z ≈ 0.85, one-sided p ≈ 0.20

CP calculation (c = 1.96):
  CP = Φ((0.85 / √0.5 − 1.96) / √(1 − 0.5))
     = Φ((1.20 − 1.96) / 0.707)
     = Φ(−1.07)
     ≈ 14%

Interpretation:
  If the observed HR = 0.88 persists to the final analysis, the chance
  of achieving statistical significance is only about 14%, well below
  a typical 20% futility threshold.

  Conditional Power > 80%: No need to increase sample size
  Conditional Power < 50%: Consider futility stopping or sample size increase
  Conditional Power 50-80%: Middle ground; DSMB decides

Predictive Power (Bayesian Perspective)

Definition: Probability of future success integrating over uncertainty about the true effect size.

Differs from Conditional Power:

CP: Assumes observed effect = true effect (frequentist)
PP: Averages over plausible effect sizes (Bayesian)

Calculation (simplified, non-informative prior):

With a flat prior on the treatment effect, the posterior is centred on the
interim estimate and predictive power has the closed form

  PP = Φ(√t * (Z_interim / √t − c) / √(1 − t))

Interim observation (t = 0.5): HR = 0.88, Z ≈ 0.85, c = 1.96

  PP = Φ(√0.5 * (1.20 − 1.96) / √0.5)
     = Φ(−0.76)
     ≈ 22%

vs.

Conditional Power: CP ≈ 14% (assuming true HR = 0.88)

An informative prior centred on the design assumption (e.g., HR = 0.70) pulls the posterior toward a larger effect and raises PP further.

When PP is useful:

Accounting for uncertainty about the interim effect size
Decision-making when interim effect differs from assumptions
Less extreme than current-trend CP under a flat prior: closer to 50% when the interim result is unexpectedly poor or unexpectedly good
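
The contrast between the two quantities can be reproduced with a few lines of base R. This is a sketch under explicit assumptions: current-trend conditional power, predictive power under a flat (non-informative) prior, a final critical value of 1.96, and Var(log HR) ≈ 4 / events for 1:1 randomization. The function names are illustrative.

```r
# Conditional power under the current trend (B-value form).
# crit is the final critical value; 1.96 is a fixed-design assumption.
conditional_power <- function(z, t, crit = qnorm(0.975)) {
  pnorm((z / sqrt(t) - crit) / sqrt(1 - t))
}

# Predictive power with a flat prior on the drift: the same quantity,
# shrunk toward 50% by the factor sqrt(t).
predictive_power <- function(z, t, crit = qnorm(0.975)) {
  pnorm(sqrt(t) * (z / sqrt(t) - crit) / sqrt(1 - t))
}

events_interim <- 175
events_final <- 350
z_interim <- -log(0.88) * sqrt(events_interim) / 2   # Z implied by HR = 0.88
t_interim <- events_interim / events_final

cp <- conditional_power(z_interim, t_interim)
pp <- predictive_power(z_interim, t_interim)
round(c(conditional_power = cp, predictive_power = pp), 3)
```

Because predictive power averages over the uncertainty in the interim estimate, it sits between the current-trend conditional power and 50% in this setting.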

R implementation:

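# Conditional power under the current trend, in base R.
# (rpact::getConditionalPower() operates on stage-results objects built from
# a design and a dataset rather than on summary numbers.)
z_interim <- 0.85                  # Observed Z-statistic at interim (HR = 0.88, 175 events)
info_frac <- 175 / 350             # Events at interim / planned total events
crit_final <- qnorm(1 - 0.025)     # Final critical value; replace with the final GSD boundary

cp <- pnorm((z_interim / sqrt(info_frac) - crit_final) / sqrt(1 - info_frac))

cat("Conditional Power:", round(cp, 3), "\n")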

DSMB/IDMC Charter Requirements

An independent Data and Safety Monitoring Board (DSMB), also called an Independent Data Monitoring Committee (IDMC), oversees interim analyses in group sequential trials.

Membership

Typical composition (3–5 members):

Chair: Oncologist or senior biostatistician
Biostatistician: Expertise in group sequential design, interim analysis
Clinician: Relevant specialty (medical oncology, pulmonology, etc.)
Additional members (optional): Patient advocate, pharmacovigilance expert

Independence Requirements

No financial interest in sponsor or treatment
No prior/current affiliation with sponsor (except contract service)
Not a site investigator or steering committee member
Published track record in relevant field

Charter Must Specify

Confidentiality & Firewall:
- DSMB members are unblinded to treatment assignment
- Sponsor staff with access: Only designated biostatistician and medical monitor
- Site investigators remain blinded
- DSMB meeting materials are confidential
Meeting Frequency:
- Before trial starts: Charter/procedures review
- At each interim: Unblinded analysis review
- Safety reviews: Quarterly or ongoing depending on risk
Decision Rules:
- Binding efficacy: If upper boundary crossed, DSMB must recommend stopping
- Non-binding futility: DSMB recommends but does not mandate stopping
- Safety signals: Define thresholds (e.g., ">5% grade 3+ hepatotoxicity")
Voting & Documentation:
- Consensus preferred; document any dissents
- Recommendation to sponsor (continue/stop for efficacy/futility/safety)
- Minutes confidential but maintained for regulatory audit

Oncology Examples from Clinical Trials

Example 1: KEYNOTE-024 (Pembrolizumab, PD-L1+ NSCLC)

Design features:

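Endpoint: PFS (progression-free survival) as primary; OS as a key secondary endpoint
Design: Event-driven trial with two pre-specified interim analyses before the final analysis
Outcome: At the second interim analysis, PFS HR = 0.50 (p < 0.001) with a significant OS benefit; the data monitoring committee recommended stopping the trial early so that patients in the chemotherapy arm could receive pembrolizumab
Regulatory decision: Pembrolizumab approved for 1L NSCLC with high PD-L1 expression (TPS ≥ 50%)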

Example 2: CheckMate-227 (Nivolumab + Ipilimumab, Advanced NSCLC)

Design features:

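Co-primary endpoints (Part 1, nivolumab + ipilimumab vs. chemotherapy): PFS in patients with high tumor mutational burden (TMB) and OS in patients with PD-L1 ≥ 1%
Interim analyses: 50% and 75% of OS events (270 and 405 events, respectively)
Futility rule: Conditional Power < 25% at interim → stop (non-binding)
Randomization: 1:1:1 within PD-L1 strata (fixed allocation; no adaptive randomization)
Outcome: PFS benefit in the TMB-high population; OS benefit subsequently demonstrated in the PD-L1 ≥ 1% population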

Example 3: ATTRACT-2 (Atezolizumab, First-Line, Chemotherapy Comparison)

Design features:

Stopping rule: Non-binding futility based on CP < 20% at interim
Interim scheduled at 50% of target events (PFS or OS)
DSMB oversight: Quarterly safety reviews + efficacy interim
Outcome: Interim analysis at 50% events showed favorable trend; continued to final analysis

R Packages for Group Sequential Design

`gsDesign` (Keaven Anderson, Merck)

Purpose: Group sequential and adaptive designs with O'Brien-Fleming, Pocock, and custom spending functions

Key functions:

library(gsDesign)

# Create a group sequential design
design <- gsDesign(
  k = 2,                          # 2 analyses (1 interim + final)
  test.type = 4,                  # Efficacy plus non-binding futility boundary
  alpha = 0.025,                  # Significance level (one-sided)
  beta = 0.20,                    # Type II error (80% power)
  sfu = sfLDOF,                   # O'Brien-Fleming-type alpha spending for efficacy
  sfl = sfLDOF,                   # O'Brien-Fleming-type beta spending for futility
  timing = c(0.5, 1)              # Information fractions
)

# Calculate sample size for time-to-event
n <- nSurvival(
  lambda1 = 0.0462,               # Control hazard rate
  lambda2 = 0.0462 * 0.65,        # Treatment hazard rate
  Ts = 36,                        # Total study duration
  Tr = 36,                        # Accrual duration
  alpha = 0.025,
  beta = 0.20
)

# Extract boundaries
design$upper$bound   # Efficacy boundaries (Z-scale)
design$lower$bound   # Futility boundaries

`rpact` (Wassmer, Pahlke)

Purpose: Comprehensive adaptive and group sequential designs with modern workflows

Key functions:

library(rpact)

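# Group sequential design
design <- getDesignGroupSequential(
  kMax = 2,
  alpha = 0.025,
  beta = 0.20,
  sided = 1,
  typeOfDesign = "asOF"           # asOF = alpha spending, O'Brien-Fleming type
)

# Sample size and event calculation for a survival endpoint
n <- getSampleSizeSurvival(
  design = design,
  hazardRatio = 0.65,
  lambda2 = 0.08                  # Control-arm hazard rate
)

# Simulate operating characteristics
sim <- getSimulationSurvival(
  design = design,
  hazardRatio = 0.65,
  lambda2 = 0.08,
  directionUpper = FALSE,         # Benefit corresponds to HR < 1
  plannedEvents = c(140, 280),    # Cumulative events at interim and final
  maxNumberOfSubjects = 400,      # Illustrative value; set from the sample size step
  maxNumberOfIterations = 10000
)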

`gsDesign2` (Yujie Zhao, Merck)

Purpose: Next-generation GSD package with enhanced flexibility for complex designs, non-proportional hazards, and MaxCombo testing

Key features:

Seamless Phase 2/3 designs
MaxCombo and RMST testing
Custom spending functions
Non-proportional hazards simulation

Example:

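library(gsDesign2)

# GSD for the logrank test under the average hazard ratio (AHR) method;
# MaxCombo designs are handled by gs_design_combo().
# Enrollment duration and dropout rate below are illustrative values.
design_nph <- gs_design_ahr(
  enroll_rate = define_enroll_rate(duration = 12, rate = 1),  # Relative rate; scaled to reach power
  fail_rate = define_fail_rate(
    duration = Inf,               # Constant hazards over the whole trial
    fail_rate = 0.08,             # Control hazard rate
    hr = 0.65,
    dropout_rate = 0.001
  ),
  alpha = 0.025,
  beta = 0.20,
  info_frac = c(0.5, 1),          # Interim at 50% information
  analysis_time = 36,             # Final analysis time (months)
  upper = gs_spending_bound,      # Spending-function efficacy bound
  upar = list(sf = gsDesign::sfLDOF, total_spend = 0.025),
  lower = gs_b,                   # Fixed lower bound
  lpar = rep(-Inf, 2)             # -Inf disables futility stopping
)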

SAP Template: Complete 2-Interim GSD with O'Brien-Fleming Boundaries

7. GROUP SEQUENTIAL DESIGN AND INTERIM ANALYSES

7.1 Overview

This trial uses a group sequential design (GSD) with two pre-specified interim 
analyses for efficacy and futility monitoring. The design controls the overall 
Type I error rate at α = 0.025 (one-sided) using O'Brien-Fleming alpha spending 
while maintaining strong FWER control.

The interim analyses are scheduled at 50% and 75% of the target information 
(PFS events), with the option for additional safety interims if adverse events 
warrant.

7.2 Primary Endpoint and Sample Size

**Endpoint:** Progression-Free Survival (PFS)
  Time from randomization to radiographically-confirmed disease progression 
  or death from any cause, whichever occurs first. Assessed per RECIST 1.1.

**Null hypothesis (H₀):** HR(treatment vs. control) = 1.0
**Alternative hypothesis (H₁):** HR = 0.70 (30% reduction in PFS hazard)

**Sample size calculation:**
  Target HR = 0.70
  Event-driven trial: 350 PFS events planned, giving approximately 90% power

  Assuming:
    - Control median PFS: 4.2 months (λ = 0.165/month)
    - Treatment median PFS: 6.0 months (λ = 0.115/month)
    - Accrual: 15 patients/month for 24 months (n = 360 total)
    - Follow-up: 36 months
    - Dropout: 1% per month

  Total required: 360 patients (180 per arm)

**Group sequential adjustment:**
  O'Brien-Fleming spending (k=2 interims) inflation ≈ 1.01
  → No material increase in sample size (350 events sufficient)

7.3 Interim Analysis Timing and Information Fractions

### Interim Analysis 1 (50% Information)

**Trigger:** 175 PFS events documented
**Expected timing:** Approximately 18–20 months after first patient enrollment

**Efficacy boundary (O'Brien-Fleming):**
  Z ≥ 2.96 (p < 0.0015, one-sided)

  If crossed: STOP FOR EFFICACY
    - Trial declares treatment significantly superior
    - Interim results submitted in BLA/NDA
    - Site investigators unblinded; patients offered treatment arm

**Futility boundary (non-binding):**
  Conditional Power (CP) < 20% at interim

  CP calculation: "If observed HR continues to final analysis,
                   what is probability of p < 0.025?"

  If CP < 20%:
    - DSMB recommends considering trial termination for futility
    - Sponsor may choose to continue with documented justification
    - No automatic stopping (non-binding rule allows sponsor discretion)

### Interim Analysis 2 (75% Information)

**Trigger:** 262 PFS events documented
**Expected timing:** Approximately 26–28 months after first patient enrollment

**Efficacy boundary (O'Brien-Fleming):**
  Z ≥ 2.36 (p < 0.0092, one-sided)

  If crossed: STOP FOR EFFICACY
    - Remaining patients unblinded
    - Submit interim results for regulatory approval

**Futility boundary (non-binding):**
  CP < 30% at interim (slightly relaxed from Interim 1)

  If CP < 30% and trial has not crossed efficacy boundary:
    - DSMB reviews interim results
    - Recommends continuing to final or stopping for futility
    - Sponsor decision documented in trial file

### Final Analysis (100% Information)

**Trigger:** 350 PFS events documented
**Expected timing:** Approximately 32–36 months after first patient enrollment

**Efficacy boundary:**
  Z ≥ 2.01 (p < 0.022, one-sided)

  Slightly stricter than the fixed-design 1.96 because alpha was spent at the interims; no further analyses

7.4 Efficacy Stopping Rule (Binding)

If the Z-statistic (log-rank test, stratified by [randomization factors]) 
crosses the pre-specified efficacy boundary at any interim analysis, the trial 
MUST stop and the treatment is declared effective. The DSMB will immediately 
recommend stopping to the sponsor.

The interim p-value will be compared directly to the boundary threshold; 
no multiplicity adjustment is applied (already incorporated into the boundary 
via alpha spending).

7.5 Futility Stopping Rule (Non-Binding)

At each interim analysis, conditional power (CP) will be calculated:

  CP = P(reject H₀ at final analysis | interim data, assuming observed HR persists)

If CP falls below the pre-specified threshold (20% at Interim 1, 30% at 
Interim 2), the futility boundary is crossed and the DSMB will convene to 
discuss whether the trial should continue.

**DSMB decision framework:**
  - **CP ≥ threshold:** CONTINUE trial as planned

  - **CP < threshold:** DSMB may recommend stopping but does not mandate it
    - If continuing despite low CP: Sponsor documents clinical/scientific rationale
    - Examples: Secondary endpoint showing benefit, safety improvement, 
               emerging biomarker evidence

**Conditional Power Calculation (Example):**
  At Interim 1 (175 of 350 events):
    Observed: HR = 0.82, Z ≈ 1.31, one-sided p ≈ 0.095

    Remaining events needed: 175
    Current estimate of hazard ratio: 0.82
    Variance of log(HR): ≈ 4/175 = 0.0229 (1:1 randomization)

    CP = Φ((Z_observed / √t − c) / √(1 − t)), with t = 0.5 and c = 1.96
       = Φ((1.31 / √0.5 − 1.96) / √0.5)
       = Φ(−0.15)
       ≈ 44%

    (c = 1.96 is the fixed-design critical value; using the final group
    sequential boundary lowers CP slightly)

  Decision: CP > 20% threshold → CONTINUE trial

7.6 Safety Interim Analyses (Non-Binding, Continuous)

Safety data (serious adverse events, deaths, grade 3+ toxicity) will be 
reviewed at each efficacy interim and continuously between analyses.

**Pre-specified safety stopping rules (trigger emergency DSMB review):**
  1. ≥2 treatment-related deaths (grade 5 AEs)
  2. Grade 3+ hepatotoxicity: > 5% in treatment vs. < 2% in control
  3. Grade 3+ interstitial lung disease: > 2% in treatment vs. < 0.5% in control
  4. Any unexpected serious adverse event pattern (signal)

If triggered:
  - Emergency DSMB meeting within 48 hours
  - Assessment of causality, severity, prevalence
  - Recommendation: CONTINUE, MODIFY (protocol amendment), PAUSE (enrollment halt), 
                    or STOP (trial termination)
  - Sponsor decision documented with FDA if applicable

7.7 DSMB Operations and Confidentiality

**Composition:**
  - Chair: [Name], Oncologist, [Institution]
  - Biostatistician: [Name], [Institution]
  - Clinician: [Name], [Specialty], [Institution]

**Meetings:**
  - Baseline DSMB meeting: [Date] — Charter/procedures review
  - Interim 1 meeting: [Date ± 2 weeks after 175 events]
  - Interim 2 meeting: [Date ± 2 weeks after 262 events]
  - Safety meetings: Quarterly or as triggered
  - Final DSMB meeting: After final analysis, before unblinding

**Confidentiality:**
  - Interim results reviewed in closed session (DSMB members + unblinded biostatistician)
  - Sponsor receives only a recommendation (e.g., "Continue trial as planned")
  - Specific p-values, HR estimates not disclosed to blinded sponsor staff
  - DSMB minutes confidential; trial team unblinded only if stopping rule crossed

**Firewall:**
  - Unblinded statistician: Conducts all interim analyses in locked office
  - Blinded sponsor team: Continues enrollment/follow-up without knowledge of interim results
  - Data management: Remains blinded; no access to unblinded data

7.8 Statistical Test and Analysis

The primary analysis will use the **log-rank test** (one-sided, α = 0.025), stratified by:
  - [Stratification factor 1, e.g., ECOG status]
  - [Stratification factor 2, e.g., prior therapy status]

**Analysis population:** Intent-to-treat (ITT)

Interim and final p-values will be compared to the pre-specified efficacy 
and futility boundaries (Sections 7.3–7.5) without further adjustment 
(alpha spending is implicit in the boundary calculation).

**Sensitivity analysis:**
  - Per-protocol population (as supportive)
  - Log-rank test without stratification (robustness check)

7.9 Type I Error Preservation and Power

The O'Brien-Fleming alpha spending ensures that the overall one-sided Type I 
error is controlled at α = 0.025 despite two interim analyses and the possibility 
of early stopping. The final efficacy boundary (Z ≥ 2.01, slightly above the 
fixed-design 1.96) reflects the alpha already spent at interim stages.

**Power calculation:**
  - Analytically: approximately 90% power for HR = 0.70 with 350 events (Schoenfeld approximation)
  - GSD adjustment (O'Brien-Fleming): ~1–2% sample size inflation
  - Practical power: about 90%, depending on adherence to assumptions

7.10 Interim Analysis Conduct and Reporting

**Conduct:**
  1. When 175 PFS events are confirmed, database will be locked for interim analysis
  2. Unblinded biostatistician breaks treatment codes and performs log-rank analysis
  3. Results compared to efficacy/futility boundaries
  4. DSMB convenes within 2 weeks to review and formulate recommendation
  5. Recommendation sent to sponsor (no detailed results unless stopping rule crossed)
  6. If continuing: Blinded team resumes enrollment; interim results not disclosed

**Reporting:**
  - If early stopping for efficacy: Full interim results reported in BLA/NDA
  - If continuing to final: Interim efficacy/futility data included in CSR 
                            only to document DSMB decision and trial integrity

Limitations and Pitfalls

1. Over-interpretation of interim p-values: Interim p-values that don't cross boundaries should NOT be reported as "nearly significant" or used to guide post-hoc decisions. They are not meaningful for inference; only the final analysis provides valid Type I error control if efficacy boundary is not crossed.

2. Unequal information fractions and Lan-DeMets: If interim information fractions differ from planned (e.g., 45% instead of 50%), boundaries must be recalculated using Lan-DeMets; using pre-calculated 50% boundaries is incorrect and invalidates error control.

3. Futility stopping overridden without justification: If DSMB recommends stopping for futility and sponsor continues without documentation, FDA may view final results with skepticism during regulatory review.

4. DSMB firewall breached: If interim results or stopping recommendations leak to blinded staff or site investigators, trial integrity is compromised and Type I error protection is lost.

5. Safety stopping thresholds set post-hoc: Pre-defining safety stopping rules after seeing interim data invalidates the decision. Rules must be specified in SAP a priori.

6. Interim analyses for immature endpoints: Testing OS at interim with < 60% follow-up or < 100 events risks declaring futility/efficacy prematurely. ICH guidance recommends caution with premature interim OS testing.

Backlinks

Source: FDA Guidance for Industry — Adaptive Designs for Clinical Trials of Drugs and Biologics (November 2019, Final) Status: Final guidance Compiled from FDA guidance, gsDesign/rpact documentation, and oncology trial examples
