Sensitivity Analyses for Estimands
Definition
"Inferences based on a particular estimand should be robust to limitations in the data and deviations from the assumptions used in the statistical model for the main estimator. This robustness is evaluated through a sensitivity analysis." — ICH E9(R1) Addendum, §A.5.2.1 (Final, November 2019)
Sensitivity analysis in the estimand framework serves a specific purpose: to verify that the primary estimator's conclusions are not artefacts of its underlying assumptions. This is distinct from supplementary analyses, which explore additional insights without the goal of verifying robustness.
Three-tier analysis hierarchy (ICH E9(R1) §A.5):
- Main estimator — the pre-specified primary analysis for the primary estimand
- Sensitivity analysis — one or more analyses targeting the same estimand, varying the assumptions of the main estimator to test robustness
- Supplementary analysis — analyses that provide additional context or explore secondary estimands; generally given lower weight in regulatory assessment
Key distinction: A sensitivity analysis uses the same estimand with different analytic assumptions. An analysis targeting a different estimand (e.g., hypothetical sensitivity when treatment policy is primary) is technically a supplementary analysis exploring an alternative estimand — not a sensitivity analysis in the strict ICH E9(R1) sense. In oncology practice, the terms are used loosely, but both types are pre-specified and expected by regulators.
ICH E9(R1) on tipping point:
"This might be characterised as the extent of departures from assumptions that change the interpretation of the results in terms of their statistical or clinical significance (e.g. tipping point analysis)." (ICH E9(R1), §A.5.2.1)
Regulatory Position
FDA (post-E9(R1) adoption, 2021): Sensitivity analyses for all primary estimands must be:
- Pre-specified in the SAP (not post-hoc)
- Focused on the same estimand (same estimand, different assumptions) OR pre-specified as alternative estimand analyses
- Structured: "altering multiple aspects of the main analysis simultaneously can make it challenging to identify which assumptions, if any, are responsible for any potential differences seen" — therefore, sensitivity analyses should vary one assumption at a time (ICH E9(R1), §A.5.2.2)
FDA OS 2025 draft (implicit): Sensitivity analyses for OS as a safety endpoint require:
- RMST comparison (for non-proportional hazards)
- Landmark analysis at pre-specified time points
- Subgroup consistency evaluation
Common FDA feedback patterns:
- Missing treatment policy sensitivity analysis when hypothetical is primary (PFS)
- Missing RPSFT/IPCW sensitivity when crossover occurred
- No tipping point analysis when informative censoring suspected
- Inadequate specification of missing data handling assumptions
Status: ICH E9(R1) = Final (November 2019); FDA adoption = May 2021
Structured Approach to Sensitivity Analysis (ICH E9(R1) §A.5.2.2)
Key Principle — Vary One Assumption at a Time:
ICH E9(R1) emphasizes that sensitivity analyses should adopt a "structured approach, specifying the changes in assumptions that underlie the alternative analyses, rather than simply comparing the results of different analyses." This means:
- Identify key assumptions of the main estimator
- For each assumption, pre-specify one or more alternative analyses that test robustness to plausible departures
- Document explicitly which assumption is being varied in each sensitivity analysis
- Order by importance (not by expected favorable result)
Example: OS Treatment Policy Main Estimator
- Assumption 1 (Censoring): Administrative censoring at LKDA is non-informative
  - Sensitivity: Alternative censoring rule (censor at last contact)
- Assumption 2 (Proportional hazards): Cox model PH assumption holds
  - Sensitivity: RMST at 24 months (does not assume PH)
- Assumption 3 (Differential follow-up): Follow-up is similar between arms
  - Sensitivity: Landmark analysis at 12, 24, 36 months
SAP Language:
"Pre-specified sensitivity analyses for the primary OS estimand are structured to test robustness to departures from main estimator assumptions: (1) Censoring rule sensitivity — alternative censoring rule (last contact vs. LKDA); (2) Non-proportional hazards sensitivity — RMST at 24 months; (3) Differential follow-up sensitivity — landmark analysis at 12, 24, 36 months. Each sensitivity analysis independently varies a single assumption of the main Cox model."
Sensitivity Analysis by Estimand Strategy
Treatment Policy Primary Estimand
What needs testing: The main assumption of treatment policy is that the ITT analysis correctly captures the treatment assignment effect. Threats to this include:
- Informative censoring (patients censored for reasons correlated with prognosis)
- Non-proportional hazards (Cox model assumption)
- Differential follow-up between arms
- Major protocol deviations
Standard sensitivity hierarchy for treatment policy primary (OS or PFS):
| Analysis | What it tests | Implementation |
|---|---|---|
| Alternative censoring rule | Whether censoring decisions affect conclusions | Censor at last contact vs. last adequate tumor assessment; per-protocol vs. administrative censoring date |
| RMST analysis | Non-proportional hazards robustness | Restricted mean survival time at pre-specified time horizon (e.g., 24 months) instead of HR |
| Landmark analysis | Survival difference at fixed time points | OS/PFS rates at 12, 24, 36 months |
| Per-protocol set analysis | Sensitivity to major protocol deviations | Exclude patients with major eligibility violations; should give consistent results if ITT is valid |
| Subgroup consistency | Whether treatment effect is consistent | Check consistency across key prognostic subgroups (ECOG PS, prior treatment lines, biomarker status) |
SAP language:
"Pre-specified sensitivity analyses for the primary treatment policy OS estimand include: (1) Alternative censoring rule analysis censoring patients at administrative data cutoff rather than last known survival contact; (2) RMST analysis at 24 months to assess robustness under non-proportional hazards; (3) Landmark OS rates at 12, 24, and 36 months; (4) Subgroup consistency analysis by baseline ECOG performance status and prior treatment lines."
Hypothetical Primary Estimand (PFS with censoring at new therapy)
What needs testing: The main assumption of the hypothetical strategy (censor at new therapy) is that censoring at new therapy initiation is non-informative — that patients who initiate new therapy have the same prognosis (conditional on covariates) as those who do not. This is often violated if:
- Patients with shorter PFS (poorer prognosis) are more likely to receive new therapy earlier
- New therapy is often initiated in response to clinical deterioration that has not yet met documented progression criteria, so censored patients carry an elevated risk of imminent progression, making the censoring informative
Standard sensitivity hierarchy for hypothetical PFS primary:
| Analysis | What it tests | Implementation |
|---|---|---|
| Treatment policy sensitivity | Whether censoring at new therapy materially biases the estimate | Remove censoring at new therapy; allow progression/death regardless of subsequent therapy |
| Alternative censoring window | Whether the timing of the censoring rule matters | Compare: censor at new therapy start vs. censor at last tumor assessment before new therapy start vs. censor 30 days before new therapy |
| Tipping point analysis | How many additional events in control arm would negate significance | Vary the assumed non-informative censoring assumption; determine how many censored patients would need to have had events to change the conclusion |
| Missing tumor assessment sensitivity | Robustness to imputed progression dates | Analyze per alternative rules for handling missing assessment windows (e.g., treat all missing assessments as progression) |
SAP language:
"The primary PFS analysis (hypothetical strategy) will be supported by the following pre-specified sensitivity analyses: (1) Treatment policy sensitivity: PFS will be re-analyzed without censoring at initiation of subsequent anti-cancer therapy; all patients analyzed to first documented progression or death. (2) Alternative censoring window: patients initiating new therapy will be censored at [date of new therapy start] rather than last adequate tumor assessment; (3) Tipping point analysis: the minimum number of events in the control arm among censored patients that would render the primary PFS analysis non-significant at the two-sided 0.05 level will be calculated."
Composite Variable Primary Estimand (PFS with death as event)
What needs testing: The composite strategy's main assumption is that death is informative about tumor progression — i.e., patients who die without documented progression would likely have progressed imminently had they not died. If this is not true (e.g., patients die of unrelated causes in a trial with elderly population), the composite overestimates the progression event rate.
Standard sensitivity hierarchy for composite PFS:
| Analysis | What it tests | Implementation |
|---|---|---|
| Competing risk analysis | Whether death "competing" with progression affects interpretation | Fine-Gray subdistribution hazard model; cumulative incidence function instead of 1-KM |
| PFS analysis censoring deaths | Whether deaths are driving the PFS result | Censor at death date (time-to-progression only); note: this analysis should be clearly labeled as a sensitivity, not primary |
| Cause-specific hazard | Whether the treatment effect on progression component differs from the death component | Separate cause-specific hazard models for progression and death |
| Death classification sensitivity | Robustness to classification of on-study vs. off-study deaths | Re-run analysis with all deaths (including post-trial) vs. on-study only |
SAP language:
"A pre-specified sensitivity analysis will apply a competing risk approach to PFS. The cumulative incidence function for disease progression will be estimated using the Fine-Gray subdistribution hazard model, treating death as a competing risk. This analysis addresses whether the composite PFS effect is primarily driven by the progression or survival component."
DFS Primary Estimand (Adjuvant Settings)
Standard sensitivity hierarchy for composite DFS:
| Analysis | What it tests | Implementation |
|---|---|---|
| Competing risk (Fine-Gray) | Non-cancer deaths as competing risk for recurrence | Fine-Gray model treating non-cancer death as competing event |
| Cancer-related deaths only | Whether all-cause death assumption drives result | Restrict to cancer deaths + recurrences as events; censor non-cancer deaths (note: FDA prefers all-cause, but this is a sensitivity) |
| Alternative event list | Whether secondary events (second primary cancer, contralateral breast) drive result | Sensitivity excluding specific event types |
| Per-protocol sensitivity | Protocol deviation impact | Exclude patients with major pre-recurrence protocol violations |
Sensitivity Analysis for Missing Data
Separate from IE strategy sensitivity: Even after pre-specifying IE strategies, missing data in the collected observations requires separate sensitivity analysis. The distinction:
- IE strategy handles events that are conceptually resolved (e.g., treatment discontinuation — handled as treatment policy, composite, or hypothetical)
- Missing data handles observations that should have been collected but were not (e.g., patient missed a tumor assessment, patient withdrew from study without an event)
ICH E9(R1) §A.5.1: "Even after defining estimands that address intercurrent events in an appropriate manner and making efforts to collect the data required for estimation, some data may still be missing."
Missing Data Sensitivity Approaches by Assumption
| Assumption | Description | Sensitivity Label | Oncology Context |
|---|---|---|---|
| Missing at random (MAR) | Probability of missingness depends only on observed data | Main analysis assumption in most mixed models | Used for missing tumor assessments when dropout is explained by observed covariates |
| Missing not at random — control-based imputation (CBI) | Missing observations assumed to follow control arm trajectory | Conservative sensitivity for active treatment arm | Assumes missing patients in treatment arm had control-like outcomes |
| Missing not at random — reference-based imputation (RBI) | Missing observations follow reference group increment pattern | Tipping point sensitivity | Assumes missing treatment patients had control-like changes |
| Missing not at random — worst-case imputation | All missing values assumed to be worst possible outcome | Extreme sensitivity / tipping point | All missing progressions are "events"; all missing survivals are "deaths" |
| Pattern mixture model | Sensitivity to different missing data patterns (e.g., patients who discontinue vs. who remain on treatment) | ICH E9(R1)-recommended structured approach | Separate imputations for early-discontinuation vs. late-discontinuation cohorts |
| Tipping point (delta adjustment) | Magnitude of departure from MAR needed to change conclusion | "How much MNAR would negate significance?" | Estimates treatment effect under varying degrees of informative missingness |
Reference-Based Imputation Methods for MNAR Sensitivity
When missingness is suspected to be not at random (MNAR) — e.g., sicker patients more likely to drop out and have missing efficacy measurements — reference-based imputation (RBI) offers a structured approach to sensitivity analysis.
When Reference-Based Imputation Is Appropriate
Reference-based imputation is most appropriate when two conditions hold:
- Outcome data is continuous (e.g., biomarker change, symptom score, lung function) — RBI is less straightforward for event-based endpoints (OS, PFS) where complete-case analysis or event-based imputation is more common
- Dropout creates missing data that should be imputed under MNAR assumption, with the reference group defining the missing data distribution
Reference-Based Imputation Methods
Copy Increments in Reference (CIR):
- Mechanics: Assumes the individual's increment profile (change per visit) after dropout equals the reference group's observed increment profile. This preserves the trend observed in patients who remained on control therapy.
- Example: If a patient drops out at Week 8, their Week 12 and 16 values are imputed as: imputed value = baseline + (patient's change from baseline to Week 8) + (average change in reference group from Week 8 to Week 12).
- Assumption: Patients with missing data have the same change trajectory as the reference (control) group
- Use case: Conservative sensitivity when treatment patients drop out early
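The CIR rule above can be sketched numerically. A minimal example with invented visit values (a hypothetical patient who drops out after Week 8); algebraically this is identical to the baseline-plus-increments formula given earlier:

```python
# Reference (control) arm mean values by visit week (invented numbers)
ref_mean = {0: 50.0, 4: 48.0, 8: 46.0, 12: 44.5, 16: 43.5}

# Hypothetical treatment-arm patient observed until Week 8, then dropout
patient = {0: 50.0, 4: 47.0, 8: 44.0}
last_week = 8

def cir_impute(week):
    """CIR: patient's last observed value + reference-arm increment
    from the dropout visit to the target visit."""
    return patient[last_week] + (ref_mean[week] - ref_mean[last_week])

week12 = cir_impute(12)   # 44.0 + (44.5 - 46.0) = 42.5
week16 = cir_impute(16)   # 44.0 + (43.5 - 46.0) = 41.5
```

The patient keeps the 2-unit benefit accrued relative to the control mean at Week 8, but progresses at the control arm's rate thereafter.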
Copy Reference (CR):
- Mechanics: Models the patient's entire profile (observed and missing visits) as if drawn from the reference group's distribution; missing visits are imputed from that distribution conditional on the patient's observed values.
- Assumption: Missing treatment patients had control-like outcomes throughout (strongly conservative)
- Use case: Extreme MNAR sensitivity
Jump-to-Reference (J2R):
- Mechanics: Imputes missing values as if the patient's mean profile jumps to the reference (control) group's mean profile immediately after dropout; imputation is drawn from the reference-arm distribution conditional on the patient's own observed values.
- Assumption: Treatment benefit is lost at discontinuation; missing patients behave like control patients thereafter
- Use case: Conservative MNAR sensitivity, widely used for treatments whose effect depends on continued dosing
Tipping Point Framework for Reference-Based Imputation
Methodology:
- Baseline: Conduct primary analysis under MAR (e.g., standard multiple imputation or mixed model for repeated measures, MMRM)
- Parameterize departure: Define a sensitivity parameter δ (delta) representing the magnitude of departure from MAR
  - δ = 0: No MNAR — same as MAR (primary analysis)
  - δ = −1, −2, −3, ...: Increasing MNAR — missing patients have progressively worse outcomes
- Conduct sensitivity analyses: For each δ value, re-impute missing data under the MNAR assumption and re-run the primary analysis
- Identify tipping point: Report the minimum δ at which the primary conclusion changes (p-value crosses 0.05)
Interpretation guide:
- If tipping point requires δ = −1 to negate significance → primary result is fragile; modest MNAR negates result
- If tipping point requires δ = −3 or worse → primary result is robust; implausible level of missingness needed to negate
- If tipping point is clinically implausible (requires very severe assumptions) → confidence in primary result increases
SAP Language Template (Reference-Based Imputation Sensitivity):
"For the primary continuous efficacy endpoint, the main analysis will use mixed model for repeated measures (MMRM) under the missing at random (MAR) assumption. As a pre-specified sensitivity analysis addressing the robustness to missing not at random (MNAR) assumptions, a tipping point analysis will be conducted using reference-based imputation methods:
(1) Copy Increments in Reference (CIR): Missing values for treatment-arm patients will be imputed using the average increment (change from visit to visit) observed in the control-arm patients.
(2) Jump-to-Reference (J2R): Missing treatment-arm values will be imputed from the control-arm distribution, conditional on each patient's observed values, such that the post-discontinuation mean profile follows the control group's mean profile.
(3) Tipping Point Delta Parameter: For each method, a sensitivity parameter δ will be varied (δ = 0, −1, −2, −3) to represent departures from MAR. The tipping point — the minimum δ at which the primary efficacy conclusion reverses — will be reported. If the tipping point δ is clinically implausible (e.g., δ < −3), the primary result is considered robust to reasonable departures from MAR."
Stress Tests for Estimand Robustness
Stress tests extend sensitivity analysis beyond single-assumption variation to examine estimand robustness under multiple simultaneous plausible departures. While ICH E9(R1) emphasizes varying one assumption at a time, stress tests provide a complementary assessment of estimand resilience.
When to Use Stress Tests
Stress tests are particularly useful for:
- High-risk estimands: Hypothetical strategies with strong structural assumptions (RPSFT, principal stratum)
- Complex trial settings: Multiple IEs, high missing data, substantial post-baseline imbalances
- Regulatory concerns: Anticipated questions about robustness of critical efficacy claims
- Non-proportional hazards: When multiple assumptions (PH, censoring, baseline imbalances) may simultaneously be violated
Stress Test Scenarios for OS Estimand
Scenario 1: Informative Censoring + Protocol Deviations
- Assumption 1 departure: Censoring is informative (20% of censored patients would have died in year 2)
- Assumption 2 departure: 10% of control-arm patients violated major protocol criteria
- Analysis: RMST + per-protocol sensitivity + tipping point on censoring
Scenario 2: Non-Proportional Hazards + Crossover
- Assumption 1 departure: Cox model PH assumption violated (late emerging treatment effect)
- Assumption 2 departure: 50% crossover in control arm (RPSFT adjustment applies)
- Analysis: RMST (instead of HR) + RPSFT-adjusted HR + Fleming-Harrington weighted log-rank
Scenario 3: Treatment Discontinuation (AE) + Differential Follow-up
- Assumption 1 departure: Treatment-arm patients discontinue earlier due to AEs (treatment policy is primary, but may miss delayed benefit)
- Assumption 2 departure: Control-arm patients have better follow-up adherence (differential follow-up bias)
- Analysis: Landmark at multiple timepoints (12, 24, 36 months) + RMST
SAP Language Template (Stress Test):
"To evaluate robustness of the primary OS estimate under multiple plausible simultaneous departures from statistical assumptions, a pre-specified stress test will be conducted:
Stress Test Scenario: Combination of (1) informative censoring sensitivity (assume 20% of censored control-arm patients would have died in year 2) AND (2) per-protocol sensitivity (exclude major protocol violators, n=[X]). Under this stress test scenario:
- Primary analysis: RMST at 24 months (instead of Cox HR, to address potential non-PH)
- Supplementary: Stratified log-rank by baseline ECOG (to address differential follow-up)
- Tipping point: Estimate minimum proportion of censored patients that must have died to reverse OS significance
Interpretation: If the primary OS conclusion holds under this stress test (p < 0.05), robustness is demonstrated across multiple simultaneous assumption departures."
Crossover Adjustment Sensitivity Analyses
When formal crossover occurred (control arm patients received investigational drug post-progression), OS sensitivity analyses are expected:
RPSFT (Rank Preserving Structural Failure Time Model)
Purpose: Estimate OS under the hypothetical scenario where crossover did not occur
Assumptions: Common treatment effect — the treatment effect is the same for initial and subsequent use
Implementation:
- Define treatment-free intervals (before experimental drug exposure) and treatment intervals
- Apply the accelerated failure time shrinkage factor to on-treatment intervals
- Re-censor to remove the informative censoring induced by the RPSFT adjustment
SAP language: "A rank-preserving structural failure time model will be applied to the control arm to estimate overall survival under the hypothetical scenario in which no patients received [drug] after disease progression. The null hypothesis of no treatment effect will be tested using the g-estimation procedure. Bootstrapped 95% confidence intervals (1,000 iterations) will be reported. RPSFT-adjusted OS HR is presented as a pre-specified supplementary analysis; the primary OS analysis remains the ITT treatment policy estimate."
Two-Stage Estimation (TSE)
Purpose: Estimate the counterfactual OS from a secondary baseline (the progression date)
Assumptions: No unmeasured confounders at the secondary baseline; separable switcher/non-switcher populations
Best for: Settings with >20% non-switchers in the control arm
IPCW (Inverse Probability of Censoring Weighting)
Purpose: Re-weight patients to account for covariate-dependent censoring at crossover
Assumptions: No unmeasured time-varying confounders; overlap in covariate distributions between arms
Best for: 40–85% switching rates; when comprehensive covariate data are collected
SAP language: "As a sensitivity analysis, IPCW will be applied to the primary OS analysis. Weights will be derived from a time-varying logistic regression model predicting the probability of not having crossed over at each event time, using pre-specified time-varying covariates [list]. The weighted log-rank test and IPCW-adjusted Cox model will be presented."
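The IPCW mechanics can be illustrated with a deliberately simplified single-interval toy: in a real analysis the weights come from a time-varying model as the SAP language describes, but the sketch below (invented counts, one discrete prognostic stratum) shows how inverse weighting reconstructs the pseudo-population of crossover-free patients:

```python
# stratum -> (n control-arm patients, n who crossed over); invented counts
strata = {"good_prognosis": (40, 10), "poor_prognosis": (40, 30)}

weights = {}
for stratum, (n, crossed) in strata.items():
    p_stay = (n - crossed) / n         # P(remain crossover-free | stratum)
    weights[stratum] = 1.0 / p_stay    # weight for each non-crossed patient

# good_prognosis: 1 / (30/40) = 4/3 ; poor_prognosis: 1 / (10/40) = 4.0
# Each non-crossed patient "stands in" for similar patients censored at
# crossover: the weighted pseudo-population recovers the stratum sizes.
for stratum, (n, crossed) in strata.items():
    assert abs((n - crossed) * weights[stratum] - n) < 1e-9
```

Poor-prognosis patients who stayed crossover-free are up-weighted most, counteracting the selective depletion that makes naive censoring at crossover informative.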
Agency acceptance of crossover adjustments:
- FDA: Supportive only — ITT OS remains primary; RPSFT/IPCW presented as supplementary
- EMA/NICE: Accepted for HTA submissions — RPSFT or IPCW may be the primary basis for comparative effectiveness claim
- IQWIG (Germany): Does not accept any switching adjustment; ITT-only accepted
Non-Proportional Hazards Sensitivity
When PH assumption is suspected violated (delayed IO effect, crossing survival curves, biomarker-selected populations with early responder depletion):
Pre-specified NP hazards analyses:
- RMST (Restricted Mean Survival Time): Compare mean survival time within a pre-specified window (e.g., 24 or 36 months). Robust to PH violation; directly interpretable as "mean additional months of survival."
  - SAP language: "RMST will be calculated at 24 months for both OS and PFS as a pre-specified sensitivity analysis to assess robustness of the primary Cox model results under potential non-proportional hazards."
- Weighted log-rank test (Fleming-Harrington): Use ρ=0, γ=1 weighting to emphasize late events; compare to standard log-rank to identify timing of treatment effect
- Landmark analysis: Pre-specify OS/PFS rates at 12, 24, 36 months; if HR changes substantially between periods, NP hazards are present
- Schoenfeld residuals test: Formal test of PH assumption; report p-value and plot scaled residuals vs. time
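As a minimal illustration of the RMST computation itself (the area under the Kaplan-Meier curve up to a pre-specified horizon τ), a stdlib-only sketch with invented, tie-free data:

```python
def kaplan_meier(times, events):
    """Kaplan-Meier step points (assumes tie-free event times)."""
    order = sorted(zip(times, events))
    n = len(order)
    s, steps = 1.0, []
    for t, e in order:
        if e:                         # event: survival steps down
            s *= 1.0 - 1.0 / n
            steps.append((t, s))
        n -= 1                        # censored patients just leave risk set
    return steps

def rmst(times, events, tau):
    """Restricted mean survival time: area under the KM curve on [0, tau]."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t > tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)

# Invented single-arm example (months), horizon tau = 24:
rmst_24 = rmst([6, 12, 18, 30], [1, 1, 1, 1], 24)   # ~15.0 months
```

In a two-arm trial the sensitivity analysis reports the between-arm difference in RMST at τ, which stays interpretable when hazards cross or separate late.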
FDA expectation: For IO trials (immunotherapy), non-proportional hazards are expected and pre-specification of RMST or weighted log-rank is recommended in the pre-IND or Type B meeting agreement.
Tipping Point Analysis Framework
For PFS Censoring (Hypothetical Strategy)
When PFS uses hypothetical strategy (censoring at new therapy), a tipping point analysis quantifies how many censored patients in the control arm would need to have events to negate significance:
Methodology:
- Start from the primary PFS analysis (log-rank test statistic, HR)
- Sequentially convert censored control arm patients to events (starting from last censored)
- Track when the primary p-value crosses 0.05
- Report: "The PFS analysis remains significant (p < 0.05) even if [X] of [Y] censored control arm patients ([Z]%) had experienced events at their censoring date."
Interpretation guide:
- If tipping point requires <10% of censored patients to have events → primary result is fragile; sensitivity matters
- If tipping point requires 10–30% → primary result is moderately robust
- If tipping point requires >30% → primary result is robust to most plausible MNAR departures
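The censored-to-event conversion loop described above can be sketched end-to-end. A stdlib-only toy (invented, tie-free data; group 1 = treatment, group 0 = control); a real analysis would use a validated log-rank implementation:

```python
import math

def logrank_p(times, events, groups):
    """Two-sided log-rank p-value (assumes tie-free event times)."""
    data = sorted(zip(times, events, groups))
    n = len(data)
    obs_minus_exp, var = 0.0, 0.0
    for i, (_, e, g) in enumerate(data):
        at_risk = n - i
        at_risk_trt = sum(1 for _, _, gg in data[i:] if gg == 1)
        if e:                                   # one event at this time
            p_trt = at_risk_trt / at_risk
            obs_minus_exp += g - p_trt
            var += p_trt * (1.0 - p_trt)
    chi_sq = obs_minus_exp ** 2 / var
    return math.erfc(math.sqrt(chi_sq / 2.0))   # chi-square, 1 df

def tipping_point(times, events, groups, alpha=0.05):
    """Convert censored control-arm (group 0) patients to events, latest
    censoring time first; return the number of conversions at which the
    log-rank test loses significance, or None if it never does."""
    ev = list(events)
    censored = sorted(
        (t, i) for i, (t, e, g) in enumerate(zip(times, ev, groups))
        if g == 0 and e == 0)
    for flipped, (_, idx) in enumerate(reversed(censored)):
        if logrank_p(times, ev, groups) >= alpha:
            return flipped
        ev[idx] = 1
    return None if logrank_p(times, ev, groups) < alpha else len(censored)

# Invented example: control arm (0) progresses at months 1-10,
# treatment arm (1) at months 11-20, all events observed.
times = list(range(1, 21))
events = [1] * 20
groups = [0] * 10 + [1] * 10
p = logrank_p(times, events, groups)         # strongly significant
tip = tipping_point(times, events, groups)   # None: no censored controls
```

Here `tip` is `None` because no control patients are censored; with censored controls present, the returned count divided by the number censored gives the fragility percentage used in the interpretation guide above.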
For Missing Data (Delta Parameter)
When continuous endpoints have missing data, a delta-adjusted tipping point analysis tests robustness:
Methodology:
- Fit primary MMRM or MI model under MAR assumption
- Define δ as the shift applied to imputed values for dropouts, representing how much worse missing outcomes are than the MAR model predicts
- Re-fit for δ = 0 (MAR), δ = −1, δ = −2, δ = −3
- Identify tipping point δ where primary conclusion reverses
Interpretation:
- δ = 0: Primary MAR analysis
- δ = −1: Modest MNAR (missing patients have outcomes ~1 unit worse)
- δ = −2, −3: Strong MNAR (clinically implausible)
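The δ loop above can be sketched with a two-sample z-test standing in for the MMRM re-fit (a simplification; all numbers invented):

```python
import math

def z_test_p(a, b):
    """Two-sided two-sample z-test p-value (normal approximation)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    z = (ma - mb) / math.sqrt(va / len(a) + vb / len(b))
    return math.erfc(abs(z) / math.sqrt(2.0))

# Invented change-from-baseline data (higher = better response)
control = [1.0, 1.5, 0.5, 1.2, 0.8, 1.1, 0.9, 1.4]
treat_observed = [2.0, 2.4, 1.6, 2.2]          # completers
treat_mar_imputed = [2.1, 1.9, 2.0, 2.2]       # MAR imputations for dropouts

p_by_delta = {}
for delta in (0, -1, -2, -3):
    # shift only the imputed values: "missing patients did delta worse"
    treated = treat_observed + [v + delta for v in treat_mar_imputed]
    p_by_delta[delta] = z_test_p(treated, control)

# Tipping point: first delta at which significance is lost
tipping = next(d for d in (0, -1, -2, -3) if p_by_delta[d] >= 0.05)
# With these invented numbers, tipping == -2
```

Whether δ = −2 is clinically plausible for the endpoint's scale is then the substantive question the sensitivity analysis puts to the clinical team.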
SAP Language:
"A tipping point analysis for missing data will be conducted by varying the delta parameter (δ) representing departure from the missing at random assumption. For each δ value (0, −1, −2, −3), missing treatment-arm observations will be imputed under the specified MNAR assumption, and the primary analysis re-run. The tipping point is defined as the minimum δ at which the primary efficacy conclusion (p = 0.05 significance threshold) is negated."
Sensitivity Analysis Documentation Requirements
SAP requirements (pre-specified before unblinding):
For each primary estimand, the SAP must include:
- Full list of pre-specified sensitivity analyses, each labeled as:
  - Same estimand, different estimator (pure sensitivity) — e.g., alternative censoring rule
  - Alternative estimand (supplementary, different clinical question) — e.g., hypothetical when treatment policy is primary
- For each sensitivity: description of which assumption is varied and why
- Ordered from most important to least important (not by expected favorable result)
- Statement that sensitivity analyses do not contribute to Type I error control (no alpha adjustment required for sensitivity)
SAP multiplicity statement (required):
"The primary analysis uses the treatment policy strategy for [endpoint]. All pre-specified sensitivity analyses are supportive and do not contribute to the confirmatory testing hierarchy. P-values from sensitivity analyses are descriptive only. No adjustment for multiplicity is applied to sensitivity analyses."
Backlinks
- ICH E9(R1) Estimand Framework
- Intercurrent Events in Oncology Trials
- Overall Survival (OS)
- Progression-Free Survival (PFS)
- DFS and EFS Endpoints
Source: ICH Harmonised Guideline E9(R1) — Addendum on Estimands and Sensitivity Analysis in Clinical Trials (Final, November 2019), §A.5.1–A.5.3
Status: Final (ICH E9(R1) Step 4, adopted November 2019; FDA adoption May 2021)
Compiled from ICH E9(R1) §A.5.1–A.5.3 + reference-based imputation best practices
Last Updated: April 2026
Knowledge Base: oncology_kb
Section: Clinical Trial Design — Sensitivity Analyses for Estimands