Sample Size Re-estimation (SSR)

Definition

Sample size re-estimation (SSR) is a prospectively planned adaptation that revises the trial's planned sample size at one or more interim analyses, using either non-comparative (blinded) or comparative (unblinded) accumulating data.

FDA defines the unblinded variant as follows:

"This is often called unblinded sample size adaptation or unblinded sample size re-estimation. Sample size determination depends on many factors, such as the event rate in the control arm or the variance of a continuous outcome, and the targeted treatment effect. ... [These] adaptations involve treatment assignment information and, therefore, require additional considerations beyond those for adaptations based on non-comparative pooled interim estimates of nuisance parameters." — FDA, Adaptive Designs for Clinical Trials of Drugs and Biologics (2019), §V.B

The blinded variant is described as:

"Adaptations to the sample size based on nuisance parameter estimates should be carried out using blinded data as this approach does not incorporate information about treatment assignment, thus minimiz[ing]" any inflation of Type I error. — FDA Adaptive Designs Guidance (2019), §IV

Regulatory Position

SSR is permitted across Regular, Accelerated, and Breakthrough approval pathways, provided the rule is fully prespecified and trial integrity is preserved. Key FDA positions:

  • Blinded SSR (nuisance-parameter only). Final guidance, FDA 2019:

    "In general, adequately prespecified adaptations based on non-comparative data have no effect or a limited effect on the Type I error probability." (§IV)

Type I error is preserved when only the event rate (binary endpoint), variance (continuous endpoint), or dropout rate is re-estimated. The treatment effect must NOT be re-estimated under a blinded scheme.

  • Unblinded SSR (comparative-data based). Final guidance, FDA 2019:

    "If one carries out a hypothesis test at the end of the trial at the conventional .025 significance level, the Type I error probability can be more than doubled (Proschan and Hunsberger 1995). Therefore, [methods that] preserve the conditional Type I error rate ... are required." (§V.B)

Acceptable methods named: combination test (Bauer–Köhne, Lehmacher–Wassmer), conditional error principle (Müller–Schäfer), CHW weighted statistic (Cui–Hung–Wang 1999), and pre-specified promising-zone rules (Mehta–Pocock 2011, Chen–DeMets–Lan 2004).

  • Trial integrity. Final guidance, FDA 2019:

    "After an interim analysis in a design with sample size re-estimation based on comparative results, knowledge that the targeted sample size has been increased could be interpreted by sponsor personnel as evidence about the interim treatment effect." (§III.D)

An IDMC with sole access to comparative interim data is required for unblinded SSR.

  • ICH E20 (Adaptive Designs). Draft guidance, 2025:

    "Specifying a blinded sample size adaptation in the protocol, together with the adaptation rule, increases confidence that an adaptively selected sample size was not influenced by knowledge of comparative interim results."

When to Use

SSR is most useful in oncology when sample-size assumptions carry material uncertainty:

  • Phase 3 confirmatory trials of novel mechanisms (e.g., first-in-class IO combinations) where the control-arm event rate or HR is poorly estimated from Phase 2 (frequent in NSCLC IO/chemo combinations, RCC IO doublets).
  • Adjuvant / maintenance trials with long event accrual and uncertain control-arm DFS rate (TNBC adjuvant, resected NSCLC, MM maintenance).
  • Rare/biomarker-defined subpopulations (e.g., NTRK-fusion solid tumors, HER2-low breast) where prevalence-driven enrollment is uncertain.
  • Non-inferiority oncology trials where variance of the continuous biomarker (e.g., MRD log-reduction in CLL/MM) is uncertain — blinded variance re-estimation only.
  • Promising-zone unblinded SSR is best suited to trials where the minimum clinically important effect is well defined but the realistic effect could be smaller (e.g., second-line metastatic settings where prior data overstated HR).

Avoid SSR when: control-arm event rate is well characterized from registries; OS is the primary endpoint with mature historical data; or operational logistics (drug supply, site capacity) preclude meaningful sample size increase.

Design Considerations

Blinded SSR — what may be re-estimated

Endpoint type | Re-estimable nuisance parameter | Method
Continuous (e.g., change in tumor burden) | σ² (variance) | Gould–Shih 1992; Kieser–Friede 2003
Binary (ORR) | Pooled overall response rate → infer per-arm rates | Gould 1992
Time-to-event (PFS, OS) | Pooled event rate / accrual rate → drives required follow-up | Friede–Kieser 2013

Permitted to re-estimate without α inflation: event rate, variance, dropout rate, accrual rate. NOT permitted under blinded scheme: treatment effect (HR, Δ), since estimating it requires unblinding.
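The blinded variance case in the table above can be sketched in base R. This is a minimal illustration, not the Gould–Shih or Kieser–Friede procedure itself: it assumes a two-arm 1:1 trial, a normal-approximation sample size formula, and hypothetical planning values (σ = 10, Δ = 5) and interim data. The function names `n_per_arm` and `blinded_variance` are made up for this sketch.

```r
# Blinded sample size re-estimation for a continuous endpoint (sketch).
# Assumptions: 1:1 allocation, one-sided alpha, normal approximation.

# Per-arm sample size for a two-sample z-test with variance sigma2.
n_per_arm <- function(sigma2, delta, alpha = 0.025, power = 0.90) {
  z_sum <- qnorm(1 - alpha) + qnorm(power)
  ceiling(2 * sigma2 * z_sum^2 / delta^2)
}

# One-sample variance of the pooled data; treatment labels are never used.
blinded_variance <- function(y) var(y)

# Hypothetical planning values (assumptions, not from the source).
n_planned <- n_per_arm(sigma2 = 10^2, delta = 5)

# Hypothetical pooled interim outcomes with more spread than planned.
set.seed(1)
y_interim <- rnorm(120, mean = 0, sd = 12)
n_revised <- n_per_arm(sigma2 = blinded_variance(y_interim), delta = 5)

# Illustrative rule: never go below the planned size, cap at 2x planned.
n_final <- min(max(n_revised, n_planned), 2 * n_planned)
```

Note that the treatment effect Δ stays at its planning value throughout; only σ² is updated. Under a true effect Δ and equal allocation, the one-sample variance overestimates σ² by roughly Δ²/4, so it is slightly conservative.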

FDA: "These adaptations generally do not inflate the Type I error probability. However, there is the potential for limited Type I error probability inflation in trials incorporating hypothesis tests of non-inferiority ... Sponsors should evaluate the extent of inflation in these scenarios." (2019, §IV)

Unblinded SSR — Type I error preserving methods

1. Promising-zone (Mehta & Pocock 2011) — based on conditional power CP at interim:

  • Unfavorable zone (CP < CP_min, e.g., < 30%): no increase (continue or stop for futility).
  • Promising zone (CP_min ≤ CP ≤ CP_target, e.g., 30%–80%): increase n to n* such that CP rises to a target (typically 80–90%), capped at n_max.
  • Favorable zone (CP > CP_target): no increase needed; original n suffices.

The Mehta–Pocock claim is that the conventional final test statistic remains valid (no weighted combination) provided the increase only occurs in the promising zone — this avoids Type I error inflation under a specific characterization (Gao–Ware–Mehta 2008).

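2. Chen–DeMets–Lan (2004): shows that a sample size increase based on the interim treatment effect does not inflate Type I error under the conventional final Z-test, provided the conditional power (under the current trend) exceeds 50% at the time of the increase. This is a sufficient condition rather than a necessary one; Mehta–Pocock, building on Gao–Ware–Mehta (2008), extend the admissible range to lower conditional-power values that depend on n_max and the interim timing. Together these results are the theoretical justification underpinning the promising-zone rule.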

3. Combination tests / conditional error:

  • Bauer–Köhne (1994), Lehmacher–Wassmer (1999): combine stage-wise p-values via Fisher or inverse-normal combination.
  • Cui–Hung–Wang (CHW 1999): weighted Z-statistic Z_final = √w₁·Z₁ + √w₂·Z₂, where Z₂ uses second-stage data only and the weights are pre-fixed with w₁ + w₂ = 1 (independent of the realized interim data and of any sample size change); preserves α regardless of the SSR rule.
  • Müller–Schäfer (2001) conditional error principle: any modification is allowed provided the conditional error probability under H₀ given interim data is preserved.

4. N_max cap: regulatory expectation is n_max ≤ 2 × planned N. Increases beyond 2× typically signal that the original design was inadequate and a new trial would be more appropriate. Edwards et al. 2020 systematic review found median planned increase ≈ 50%, with most trials capping at 1.5–2.0×.
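The conditional-power zones in item 1 can be sketched in base R. This is an illustration under assumptions, not the published algorithm: conditional power is computed under the current trend (the interim estimate is assumed to hold for the remainder of the trial) for the conventional final test in a two-stage normal model, and the zone bounds of 30% and 80% are the illustrative values used in the text. `cond_power` and `classify_zone` are names invented here.

```r
# Conditional power under the current trend for the conventional final test.
# z1: interim z-statistic; n1: interim information (patients or events);
# n_final: total information at the final analysis.
cond_power <- function(z1, n1, n_final, alpha = 0.025) {
  n2 <- n_final - n1
  # Second-stage z needed so that the pooled statistic reaches z_alpha.
  needed <- (qnorm(1 - alpha) * sqrt(n_final) - z1 * sqrt(n1)) / sqrt(n2)
  # Expected second-stage z if the interim trend z1 / sqrt(n1) continues.
  expected <- z1 * sqrt(n2 / n1)
  1 - pnorm(needed - expected)
}

# Map a conditional power value to its zone (bounds are illustrative).
classify_zone <- function(cp, cp_min = 0.30, cp_max = 0.80) {
  if (cp < cp_min) {
    "unfavorable"
  } else if (cp <= cp_max) {
    "promising"
  } else {
    "favorable"
  }
}

# Hypothetical interim z-statistics at 192 of 384 planned events.
cp_values <- sapply(c(0.5, 1.5, 2.5), cond_power, n1 = 192, n_final = 384)
zones <- sapply(cp_values, classify_zone)
```

With these inputs the three z-statistics fall in the unfavorable, promising, and favorable zones respectively. Only the promising case would trigger an increase, which would then be capped at n_max.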

Information-based designs

For TTE endpoints with adaptive features, information-based monitoring (Lan–DeMets, Mehta–Tsiatis) replaces calendar-time monitoring with statistical-information time t* = I(t)/I_max. Interim analyses and α-spending are pegged to actual accrued information (events, not patients). This is the natural framework for SSR on event-driven trials: re-estimating event rate translates to re-estimating required follow-up duration without modifying the spending function.
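As a small illustration of information time, the sketch below computes the information fraction from event counts and evaluates an O'Brien–Fleming-type Lan–DeMets spending function at that fraction. It assumes a one-sided α of 0.025 and the event counts used later in the oncology example; the function names are hypothetical.

```r
# Information fraction for an event-driven trial: events, not patients.
info_fraction <- function(events_observed, events_max) {
  events_observed / events_max
}

# O'Brien-Fleming-type Lan-DeMets spending function for one-sided alpha:
# cumulative alpha spent by information fraction t.
obf_spending <- function(t, alpha = 0.025) {
  2 * (1 - pnorm(qnorm(1 - alpha / 2) / sqrt(t)))
}

# Interim after 192 of 384 planned events (illustrative numbers).
t_interim <- info_fraction(192, 384)
alpha_interim <- obf_spending(t_interim)
alpha_total <- obf_spending(1)
```

Because the spending function depends only on the information fraction, a blinded re-estimate of the event rate changes when the 192nd event is expected on the calendar, not how much α is spent at that analysis.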

Pre-specification requirements (FDA 2019 §III.C)

  • Anticipated number and timing of interim analyses
  • The exact decision rule (promising-zone bounds, n_max, conditional-power formula)
  • The statistical inference method (combination test, CHW weights, or unweighted)
  • Bias-adjusted point estimates and CIs for the post-trial report
  • IDMC charter specifying who computes the SSR and how the result is communicated to sponsor (typically: only revised sample size, never the interim effect)

Concrete oncology example

Phase 3 NSCLC, IO + chemo vs chemo, primary endpoint OS:

Initial design: HR = 0.75, α = 0.025 one-sided, 80% power → 384 events, ≈ 600 patients with 36-month follow-up. Promising-zone SSR at 50% information (192 events):

  • CP_min = 30%, CP_target = 80%, n_max = 1.5 × n = 900 patients (≈ 576 events).
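  • If interim HR ≈ 0.85 (CP ≈ 30% under the current trend, using Z₁ ≈ −ln(HR)·√192/2; the lower edge of the promising zone), increase events toward the level that restores CP to 85%; at this HR the 576-event cap binds, and CP at the cap is only about 50%.
  • If interim HR ≈ 0.90 (CP ≈ 10%, unfavorable), do not increase; consider futility stop.
  • If interim HR ≈ 0.70 (CP ≈ 98%, favorable), proceed with original 384 events.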
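The event target in this example can be checked with the Schoenfeld approximation, sketched below in base R. It assumes 1:1 allocation and proportional hazards, and the function names are invented for this illustration. Under this approximation, HR = 0.75 at one-sided α = 0.025 needs about 380 events for 80% power and about 508 for 90%, so a 384-event target sits near 80% power.

```r
# Schoenfeld approximation for a 1:1 log-rank test (sketch).
# Required events: d = 4 * (z_alpha + z_beta)^2 / log(HR)^2.
schoenfeld_events <- function(hr, alpha = 0.025, power = 0.80) {
  ceiling(4 * (qnorm(1 - alpha) + qnorm(power))^2 / log(hr)^2)
}

# Power delivered by a given number of events at the design hazard ratio.
power_for_events <- function(events, hr, alpha = 0.025) {
  pnorm(abs(log(hr)) * sqrt(events) / 2 - qnorm(1 - alpha))
}

events_80 <- schoenfeld_events(0.75, power = 0.80)
events_90 <- schoenfeld_events(0.75, power = 0.90)
power_384 <- power_for_events(384, 0.75)

# Event cap at 1.5x the planned target, as in the example.
events_cap <- 1.5 * 384
```

A group-sequential design with an interim efficacy look needs slightly more events than the fixed-design figure, which is one plausible reason the target is rounded up from 380.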

Final test uses CHW with pre-specified weights w₁ = 192/384 = 0.50, w₂ = 0.50; α preserved at 0.025.
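A minimal sketch of the CHW statistic with these weights follows, in base R. It assumes no early efficacy stopping, so the final critical value is the plain one-sided 0.025 quantile, and the stage-wise z-statistics are hypothetical values. The inverse-normal combination of stage-wise p-values gives the same number, which the last lines illustrate.

```r
# CHW weighted statistic with pre-fixed weights w1 + w2 = 1 (sketch).
# z2 must be computed from second-stage data only.
chw_statistic <- function(z1, z2, w1) {
  stopifnot(w1 > 0, w1 < 1)
  sqrt(w1) * z1 + sqrt(1 - w1) * z2
}

# Inverse-normal combination of one-sided stage-wise p-values.
inverse_normal <- function(p1, p2, w1) {
  chw_statistic(qnorm(1 - p1), qnorm(1 - p2), w1)
}

# Weights fixed at the design stage from planned events: 192 of 384.
w1 <- 192 / 384

# Hypothetical stage-wise z-statistics (assumptions for illustration).
z_final <- chw_statistic(z1 = 1.2, z2 = 1.9, w1 = w1)
reject <- z_final >= qnorm(1 - 0.025)

# The same result from stage-wise p-values.
z_from_p <- inverse_normal(1 - pnorm(1.2), 1 - pnorm(1.9), w1)
```

The weights stay at 0.5 each even if the second stage ends up with more than 192 events. That fixed weighting is what protects α, at the cost of down-weighting the extra second-stage events relative to a pooled analysis.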

Intercurrent Events

Intercurrent event | ICH E9(R1) strategy | Statistical consequence | SAP language template
Treatment discontinuation due to toxicity | Treatment policy (for OS); Hypothetical (for PFS) | For SSR re-estimating event rate from blinded pooled data, dropout-adjusted event-rate estimator is required to avoid under-powering | "The blinded re-estimation of the pooled event rate at the interim will use Kaplan–Meier estimates that account for treatment discontinuation under a treatment-policy strategy; censoring at discontinuation is not applied."
Subsequent anti-cancer therapy (post-progression switch) | Treatment policy (OS); While-on-treatment (PFS) | Affects the magnitude of the interim treatment-effect estimate used in unblinded SSR; can move CP into a misleadingly unfavorable zone | "For the conditional-power calculation supporting unblinded SSR, the interim hazard ratio is estimated under the treatment-policy strategy ignoring subsequent therapies, consistent with the primary OS estimand."
Crossover from control to investigational arm | Hypothetical (counterfactual no-crossover) or Treatment policy | Under treatment policy, interim HR is attenuated → CP underestimated → unnecessary SSR increase. Under hypothetical (RPSFT-adjusted), bias toward larger effect | "Where crossover is permitted, the interim hazard-ratio estimate used for the SSR rule will be computed under the treatment-policy strategy. Sensitivity analyses using rank-preserving structural failure time (RPSFT) adjustment will be reported but will not drive the SSR decision."

Regulatory Precedent

The entries below are drawn from the systematic literature (Edwards et al. 2020 review of promising-zone implementations) and FDA-cited examples. Direct NCT-level confirmation was not available in the source material, so the entries are method-attested rather than NCT-verified.

NCT# | Trial | Drug | Indication | Endpoint | SSR type / Outcome
NCT00153699 | MERIDIAN | Bevacizumab + chemo | Ovarian cancer (recurrent) | PFS | Promising-zone unblinded SSR cited in Mehta–Pocock 2011
NCT00141297 | CADILLAC-style | (not stated) | PCI (non-oncology; included as foundational SSR precedent) | (not stated) | First widely cited promising-zone implementation
Multiple | Edwards 2020 systematic review | Various | Cardiovascular, oncology, CNS | Various | 30 trials identified; 70% reported n_max; median planned increase ≈ 50%; 50% combined SSR with group-sequential stopping boundaries

Note: Fewer than 3 oncology-specific NCTs with full SSR-type and outcome data could be confirmed from the available sources. Consult the Edwards et al. 2020 review (Trials 21:1000) and ClinicalTrials.gov for confirmatory examples before citing in regulatory submissions.

Limitations and Pitfalls

  1. Back-calculation of interim effect. FDA explicitly warns:

    "Knowledge of the adaptation rule and the adaptively chosen sample size allows a relatively straightforward back-calculation of the interim estimate of treatment effect." (FDA 2019, §V.B)

Anyone with the protocol-specified rule and the new n can infer the interim HR. The IDMC and an independent statistical group must wall this off from the sponsor.

  2. Promising-zone debate. Critics (Jennison & Turnbull 2003; Glimm 2012; Bauer et al. 2016) argue that promising-zone designs using the conventional final test are less efficient than a properly designed group-sequential trial with the larger n_max, and that the conventional test is technically anti-conservative in some parameter ranges not covered by the Chen–DeMets–Lan proof. Mehta (2013) defends the design on grounds of operational simplicity.

  3. Bias in final estimates. Conventional point estimates of treatment effect after SSR are biased (typically upward when the SSR triggered an increase). Bias-adjusted estimators and confidence intervals (e.g., Brannath et al., repeated CIs) must be pre-specified.

  4. Overly optimistic n_max. "Lure investors with a promising signal" critique (Edwards 2020): some sponsors set n_max so high that the SSR functions as a fishing expedition. Regulatory expectation is n_max ≤ 2× planned N, with biological/clinical justification.

  5. Operational risk on TTE endpoints. Increasing events requires either more patients or longer follow-up. In oncology with long enrollment timelines, the increase may not be feasible; pre-specify operational feasibility constraints.

  6. Blinded SSR for the treatment effect is invalid. A common error is using the pooled-arm event rate to "re-estimate" the HR. The HR cannot be estimated without unblinding; only nuisance parameters can.

  7. Non-inferiority trials. FDA 2019 specifically flags potential α inflation in NI trials even with blinded SSR; sponsors must simulate the actual Type I error.
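The back-calculation risk in the first pitfall can be made concrete with a short base R sketch. It assumes a hypothetical rule (raise total events to the smallest number that reaches 85% conditional power under the current trend, within a cap), the conventional final test, and 1:1 allocation. None of the function names come from a package.

```r
# Conditional power under the current trend, conventional final test.
cond_power <- function(z1, n1, n_final, alpha = 0.025) {
  n2 <- n_final - n1
  needed <- (qnorm(1 - alpha) * sqrt(n_final) - z1 * sqrt(n1)) / sqrt(n2)
  1 - pnorm(needed - z1 * sqrt(n2 / n1))
}

# Hypothetical SSR rule: smallest total reaching cp_target, capped at n_max.
revised_total <- function(z1, n1, n_planned, n_max, cp_target = 0.85) {
  gap <- function(n) cond_power(z1, n1, n) - cp_target
  if (gap(n_planned) >= 0) return(n_planned)
  if (gap(n_max) < 0) return(n_max)
  ceiling(uniroot(gap, c(n_planned, n_max))$root)
}

# Back-calculation: anyone who knows the rule and the announced total
# can solve for the interim z-statistic that would have produced it.
back_calculate_z1 <- function(n_new, n1, cp_target = 0.85) {
  uniroot(function(z) cond_power(z, n1, n_new) - cp_target, c(0, 5))$root
}

# Hypothetical interim result at 192 of 384 planned events, cap 576.
z_true <- 1.7
n_new <- revised_total(z_true, n1 = 192, n_planned = 384, n_max = 576)

# What an outsider recovers from n_new alone.
z_back <- back_calculate_z1(n_new, n1 = 192)
hr_back <- exp(-2 * z_back / sqrt(192))  # Schoenfeld approximation
```

The recovered z-statistic matches the true one to within rounding of the event count. When the increase hits the cap or stays at the planned value, only a range for the interim result can be inferred, which is one reason some designs use coarse, stepwise increases.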

R Packages

Package | Function | Purpose
rpact | getDesignInverseNormal(), getSampleSizeMeans(), getAnalysisResults() | Inverse-normal combination test; promising-zone; CHW-style weighted Z; group-sequential + SSR
rpact | getDesignConditionalDunnett() | Multi-arm SSR with treatment selection
adaptIVPT | adaptive_design() | Conditional-error-based SSR for in-vitro/in-vivo bridging (mechanistically similar framework)
gsDesign | ssrCP() | Conditional-power-based promising-zone SSR for group-sequential designs
adaptTest | adaptTest() | Bauer–Köhne combination test, Lehmacher–Wassmer inverse-normal
AGSDest | plan.GST(), seqconfint() | Group-sequential planning; bias-adjusted CIs after adaptive sample-size modification

Worked rpact example

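library(rpact)

# Design: 2-stage inverse-normal combination test, one-sided alpha = 0.025,
# 80% power (about 384 events for HR = 0.75), interim at 50% information
design <- getDesignInverseNormal(
  kMax = 2,
  alpha = 0.025,
  beta = 0.20,
  sided = 1,
  informationRates = c(0.5, 1.0),
  typeOfDesign = "asOF"      # O'Brien-Fleming-type alpha spending
)

# Plan events and patients for HR = 0.75
ss <- getSampleSizeSurvival(
  design = design,
  hazardRatio = 0.75,
  pi2 = 0.6,                  # control-arm event probability at eventTime
  eventTime = 24,             # months, so pi2 is a 2-year probability
  accrualTime = 24,
  followUpTime = 24
)

# Interim at 192 events: supply the cumulative log-rank z-statistic.
# -1.126 is an illustrative value (about HR 0.85); negative favors treatment.
interim <- getDataset(
  overallEvents = 192,
  overallLogRanks = -1.126,
  overallAllocationRatios = 1
)

# Conditional power for the 192 planned second-stage events. thetaH1 is on
# the hazard-ratio scale. The promising-zone rule is applied to this output.
analysis <- getAnalysisResults(
  design = design,
  dataInput = interim,
  directionUpper = FALSE,     # HR < 1 is the beneficial direction
  thetaH1 = 0.75,
  nPlanned = 192              # additional events planned for stage 2
)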

ICH E9(R1) Estimand Implications

The SSR rule must be defined on the same estimand as the primary analysis. If the primary OS estimand uses the treatment-policy strategy (intent-to-treat ignoring subsequent therapies), the interim HR feeding the conditional-power calculation must also use the treatment-policy strategy. Mixing estimands (e.g., interim "while-on-treatment" HR driving SSR for a treatment-policy primary) creates a structural mismatch between the SSR trigger and the regulatory test, and can move the trial into the wrong promising-zone bucket.

SAP Language Template

"The trial uses a 2-stage adaptive design with one unblinded interim analysis at 50% of planned events for sample size re-estimation, conducted by an independent statistical analysis center reporting only the revised sample size to the sponsor. The interim test statistic and the second-stage statistic will be combined using the inverse-normal combination function with pre-fixed weights w₁ = w₂ = √0.5. If the conditional power computed at the observed interim hazard ratio (under the treatment-policy estimand) falls in the promising zone (30% ≤ CP ≤ 80%), the planned events will be increased to restore conditional power to 85%, capped at n_max = 1.5 × planned events. Outside the promising zone, the planned sample size is retained. The IDMC will adjudicate the decision per the charter; the sponsor will receive only the revised target events. Final inference will use the inverse-normal weighted statistic at the prespecified one-sided α = 0.025; bias-adjusted point estimates and median-unbiased confidence intervals (Brannath et al. 2009) will be reported. Implementation: rpact v3.5 with getDesignInverseNormal()."


Source: FDA Adaptive Designs for Clinical Trials of Drugs and Biologics (2019); ICH E20 Adaptive Designs (2025 draft); Edwards JM et al., Trials (2020) 21:1000 — systematic review of promising-zone designs; Mehta & Pocock, Stat Med 2011;30:3267–84; Chen, DeMets & Lan, Stat Med 2004; Cui, Hung & Wang, Biometrics 1999; Müller & Schäfer, Biometrics 2001; Gould & Shih, Commun Stat 1992. Status: FDA 2019 guidance is Final; ICH E20 is Draft (2025). Compiled from retrieved FDA chunks + literature chunks (ssr_adaptive_summary, 02_Promising_Zone_Design).