Overall Survival (OS)
Definition
Overall Survival is defined as the time from randomization until death from any cause.
Per FDA Cancer Endpoints Guidance (2018, FINAL):
"Overall survival is defined as the time from randomization until death from any cause and is measured in the intent-to-treat population. Survival is considered the most reliable cancer endpoint, and when studies can be conducted to adequately assess survival, it is usually the preferred endpoint. This endpoint is precise and easy to measure, documented by the date of death. Bias is not a factor in endpoint measurement."
OS is a time-to-event endpoint measured from the date of randomization to the date of death. The event is all-cause mortality (death from any cause, including disease progression, treatment toxicity, or unrelated causes). Patients alive at data cutoff are censored at the last date known to be alive. No scheduled assessment is required; vital status is ascertained through follow-up contact, medical records, or public registries (e.g., National Death Index, Social Security death records).
Regulatory Position
FDA 2018 Guidance (FINAL)
Regular Approval Pathway:
- OS is the gold standard for cancer drug approval.
- "Demonstration of a statistically significant improvement in overall survival can be considered to be clinically significant if the toxicity profile is acceptable and has often supported new drug approval."
- Supports approval across all cancer indications when OS benefit is demonstrated.
Accelerated Approval Pathway:
- Surrogate endpoints (ORR, PFS, CR) are primary; OS is collected but not required to show benefit for initial accelerated approval.
- Confirmatory Phase 3 trial with OS benefit is required for conversion to regular approval.
FDA 2025 Draft Guidance on OS Assessment in Oncology Trials (DRAFT — Step 3, expected finalization 2026)
New Requirement: Mandatory OS Monitoring in All Randomized Trials
- Even when OS is not the primary efficacy endpoint, sponsors must:
- Pre-specify OS analyses (timing, method, decision rules)
- Define methods for harm detection (e.g., early OS detriment signal)
- Set pre-specified thresholds to rule out clinically meaningful harm
- Plan sensitivity analyses for non-proportional hazards (NPH) if interim OS data are immature
- Conduct simulations to estimate power under NPH scenarios when crossover/subsequent therapy is anticipated
Interim OS Analyses:
- Permitted in most trials but recommended with caution when:
- Event count < 100 at interim look
- Median OS not yet reached
- Substantial follow-up still pending
- FDA 2025 draft emphasizes that premature interim OS claims unsupported by full follow-up risk regulatory rejection.
Crossover Adjustments:
- FDA 2025 draft permits protocol-specified crossover adjustments for OS:
- RPSFT (Rank-Preserving Structural Failure Time): primary adjustment method if pre-specified
- TSE (Two-Stage Estimation): alternative if the RPSFT assumptions (accelerated failure time model) are questionable
- IPCW (Inverse Probability of Censoring Weighting): sensitivity analysis approach
- Requires a priori justification in the protocol for which method is chosen and under what conditions
- Must address exchangeability assumption: whether patients crossing over are comparable to those not crossing over
Exceptions to OS as Primary Endpoint:
- Indolent cancers (low-grade lymphoma, myeloma maintenance therapy), where long post-progression survival makes an OS-powered trial infeasible
- Early-stage adjuvant settings with very long follow-up required (e.g., Stage I adjuvant NSCLC)
When to Use
OS as Primary Efficacy Endpoint
Standard indications (expect OS primary or co-primary):
- Metastatic solid tumors: NSCLC, melanoma, CRC, ovarian cancer, gastric cancer
- Median OS typically 12–24 months; short post-progression survival (~3–6 months) makes OS achievable within trial timeline
- Hematologic malignancies: AML, high-risk MDS, aggressive lymphoma
- OS is standard; CR (complete response) often co-primary
- Pancreatic cancer: ~100% of Phase 3 trials use OS as primary (median OS ~8–12 months)
- Hormone receptor-positive breast cancer (metastatic): OS primary or co-primary with PFS
Indication examples from ClinicalTrials.gov Phase 3 data:
- NSCLC: OS or co-primary PFS in ~36% of trials (109/300 retrieved)
- Breast cancer: OS primary in ~25% of adjuvant/metastatic trials
- Melanoma: OS or PFS co-primary in ~28% of trials (34/121)
OS as Co-Primary or Hierarchical Secondary
Typical pairing:
- PFS primary → OS co-primary or hierarchical secondary (allow claiming both if both meet significance)
- Example: Metastatic NSCLC with checkpoint inhibitor: PFS primary, OS hierarchical (only tested if PFS positive)
OS as Mandatory Safety Endpoint (Even if Not Efficacy Endpoint)
2025 Draft: All randomized cancer trials must monitor OS for harm
- ORR primary → OS safety monitoring required
- CR primary (myeloma, lymphoma) → OS safety monitoring required
- MRD-driven trials → OS safety monitoring required
- Trigger: if OS shows harm signal (HR > 1.0 with pre-specified threshold), trial may be stopped or modified
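
One way to operationalize a pre-specified harm threshold is to check whether the upper confidence bound of the OS hazard ratio stays below it. The sketch below is illustrative only: the threshold of 1.30, the function names, and the approximation SE(log HR) ≈ 2/√d for 1:1 randomization are assumptions, not values taken from the draft guidance.

```python
import math
from statistics import NormalDist


def hr_upper_bound(hr_hat: float, n_events: int, conf_level: float = 0.95) -> float:
    """Upper two-sided confidence bound for the hazard ratio.

    Assumes 1:1 randomization, where SE(log HR) is roughly 2 / sqrt(deaths).
    """
    se_log_hr = 2.0 / math.sqrt(n_events)
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return math.exp(math.log(hr_hat) + z * se_log_hr)


def harm_ruled_out(hr_hat: float, n_events: int,
                   harm_threshold: float = 1.30,  # hypothetical threshold
                   conf_level: float = 0.95) -> bool:
    """True if the upper bound of the HR confidence interval is below the threshold."""
    return hr_upper_bound(hr_hat, n_events, conf_level) < harm_threshold


# Same point estimate, different maturity of the OS data
print(harm_ruled_out(0.95, n_events=200))  # True: upper bound is about 1.25
print(harm_ruled_out(0.95, n_events=100))  # False: upper bound is about 1.41
```

With the same observed HR, the immature data set cannot exclude the hypothetical harm margin, which is why the event count at each OS look matters as much as the point estimate.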
Design Considerations
1. Study Design and Population
Randomized controlled trial mandatory:
- "Overall survival should be evaluated in randomized controlled studies. Data derived from externally controlled trials are seldom reliable for time-to-event endpoints." (FDA 2018, Final)
- Non-randomized or single-arm designs are NOT acceptable for OS claims.
Intent-to-treat (ITT) population:
- Primary OS analysis includes all randomized patients, regardless of treatment adherence or receipt.
- A per-protocol analysis is not acceptable as the basis for an OS efficacy claim.
2. Censoring Rules: Pre-Specified in SAP
OS censoring rules must be explicitly defined to minimize bias:
| Scenario | Status | Handling |
|---|---|---|
| Death | Event | Record actual date of death |
| Lost to follow-up | Censored | Censor at last date known alive |
| Withdrew consent | Censored | Censor at withdrawal date (no assumption about post-withdrawal OS) |
| Still alive at data cutoff | Censored | Censor at last date known alive (on or before the data cutoff date) |
| Emigrated / moved away | Censored | Censor at last known contact date |
| Non-eligible post-randomization | Included (ITT) | Analyze as randomized; sensitivity analysis excluding may be added |
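
The censoring rules above can be sketched as a small derivation function. This is a minimal illustration, not a validated ADaM derivation: the `Patient` fields and the `derive_os` name are hypothetical, and it assumes that patients without a recorded death are censored at the last date known alive, capped at the data cutoff.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass
class Patient:
    randomization: date
    death: Optional[date] = None             # None if no death has been reported
    last_known_alive: Optional[date] = None  # from visits, contacts, or registries


def derive_os(p: Patient, data_cutoff: date) -> Tuple[int, int]:
    """Return (days from randomization, event flag); 1 = death, 0 = censored."""
    if p.death is not None:
        if p.death <= data_cutoff:
            return (p.death - p.randomization).days, 1
        # Death after the cutoff: the patient was alive at the cutoff
        return (data_cutoff - p.randomization).days, 0
    # No death recorded: censor at last date known alive, never after the cutoff
    last = p.last_known_alive or p.randomization
    censor_date = min(last, data_cutoff)
    return (censor_date - p.randomization).days, 0


cutoff = date(2024, 4, 10)
print(derive_os(Patient(date(2024, 1, 1), death=date(2024, 3, 1)), cutoff))             # (60, 1)
print(derive_os(Patient(date(2024, 1, 1), last_known_alive=date(2024, 6, 30)), cutoff))  # (100, 0)
```

Cause of death is deliberately absent from the function: under the all-cause definition, every death is an event.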
3. Treatment Crossover and Subsequent Therapy
2025 FDA Draft Position:
- "Protocol-specified crossover adjustments using RPSFT or alternative methods are permitted if:
- Adjusted analysis is pre-specified in protocol/SAP
- Justification for method choice is documented
- Exchangeability and AFT (Accelerated Failure Time) assumptions are addressed
- Sensitivity analyses are conducted"
Analysis approach:
- Primary analysis: Treatment Policy Estimand (ITT, no crossover adjustment)
- Includes all OS events, regardless of subsequent therapy receipt
- Reflects real-world benefit accounting for treatment switching
- Sensitivity analysis: Hypothetical Estimand with RPSFT Adjustment
- Estimates OS effect as if no crossover occurred
- Requires AFT assumption: treatment effect is constant under counterfactual no-crossover scenario
- Pre-specify:
- Crossover rate triggers for which RPSFT is applied
- Assumed effect of the switched-to therapy (the hazard reduction attributed to subsequent treatment)
- Sensitivity thresholds (e.g., RPSFT results within 10% of primary analysis)
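
A minimal sketch of the counterfactual-time calculation that underlies RPSFT, assuming the common formulation U = T_off + γ · T_on, where γ is the acceleration factor (γ < 1 means time on treatment uses up untreated lifetime more slowly). The function name and the example numbers are hypothetical; a full analysis would also estimate γ by g-estimation and apply re-censoring, which this sketch omits.

```python
def counterfactual_untreated_time(time_off: float, time_on: float, gamma: float) -> float:
    """Survival time the patient would have had with no experimental treatment.

    time_off: months not receiving the experimental drug
    time_on:  months receiving the experimental drug (including after crossover)
    gamma:    acceleration factor, assumed common to all patients
    """
    return time_off + gamma * time_on


# Control-arm patient: progressed and crossed over at month 8, died at month 20
print(counterfactual_untreated_time(time_off=8, time_on=12, gamma=0.75))  # 17.0

# Experimental-arm patient: on treatment from randomization until death at month 20
print(counterfactual_untreated_time(time_off=0, time_on=20, gamma=0.75))  # 15.0
```

Under the model, the correct γ is the value at which these counterfactual times are balanced between the randomized arms.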
SAP language template:
"Primary OS analysis: Intent-to-treat population, treatment policy estimand. All deaths are included regardless of receipt of subsequent anti-cancer therapy. Secondary OS analysis: RPSFT adjustment applied to patients who crossed over to [subsequent therapy] post-progression. RPSFT model assumes an accelerated failure time structure with treatment effect [γ] constant across populations. Results are reported with 95% CI and compared to primary analysis for robustness."
4. Sample Size and Minimum Event Counts
Schoenfeld formula for required events:
d = 4 (z_α + z_β)² / (ln HR)² (1:1 randomization)
Oncology example (NSCLC, metastatic):
- Target HR = 0.70 (30% reduction in the hazard of death)
- α = 0.025 (one-sided); z_α = 1.96
- Power = 80%; z_β = 0.84
- Required events: d = 4 × (1.96 + 0.84)² / (ln 0.70)² = 4 × 7.84 / 0.1272 ≈ 247 events
- Enrollment to achieve 247 events: about 500 patients in total (~250 per arm), assuming a 50% event rate at data cutoff
Inflation factors:
- Dropout adjustment (assume ~5% loss to follow-up): multiply by 1.05
- 2 interim analyses (O'Brien-Fleming spending): multiply by 1.10 (a conservative allowance; O'Brien-Fleming boundaries typically inflate the maximum event count by only a few percent)
- Total: 247 × 1.05 × 1.10 ≈ 285 events required
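
The event calculation can be reproduced with a few lines of Python. This sketch assumes the standard Schoenfeld approximation with allocation fraction p, d = (z_α + z_β)² / (p(1 − p)(ln HR)²), which reduces to 4(z_α + z_β)² / (ln HR)² for 1:1 randomization. The function names and the 50% event rate are illustrative assumptions.

```python
import math
from statistics import NormalDist


def schoenfeld_events(hr: float, alpha_one_sided: float = 0.025,
                      power: float = 0.80, alloc_ratio: float = 1.0) -> int:
    """Required number of deaths (alloc_ratio = experimental : control)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha_one_sided)
    z_beta = NormalDist().inv_cdf(power)
    p = alloc_ratio / (1 + alloc_ratio)  # fraction randomized to experimental arm
    d = (z_alpha + z_beta) ** 2 / (p * (1 - p) * math.log(hr) ** 2)
    return math.ceil(d)


def patients_needed(events: int, event_rate: float) -> int:
    """Total enrollment so that the expected number of deaths reaches `events`."""
    return math.ceil(events / event_rate)


events = schoenfeld_events(hr=0.70)
print(events)                                    # 247
print(patients_needed(events, event_rate=0.50))  # 494
```

Because the required events scale with 1 / (ln HR)², modest changes in the target HR move the event count substantially: the same call with `hr=0.75` returns 380.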
Median follow-up targets:
- Metastatic solid tumors: 12–24 months minimum (typically median OS should be reached)
- Adjuvant settings: 36–60 months or longer
- For immature OS (median not reached): pre-specify statistical methods for non-parametric estimation and sensitivity analyses
5. Statistical Analysis
Primary test: Log-rank test (one-sided)
- Test statistic: Z = (observed − expected) / √variance
- Significance: p < 0.025 (one-sided) or < 0.05 (two-sided, as specified in SAP)
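
For illustration, the log-rank statistic can be computed directly from its definition. The sketch below is a plain-Python version with hypothetical function names; it uses the hypergeometric variance at each death time and the convention that a negative Z means fewer deaths than expected in the experimental arm (group 1). Production analyses would use validated statistical software.

```python
import math
from statistics import NormalDist


def logrank_z(times, events, groups):
    """Log-rank Z for group 1 vs group 0 (events: 1 = death, 0 = censored)."""
    data = sorted(zip(times, events, groups))
    at_risk = len(data)
    at_risk_1 = sum(g for _, _, g in data)
    o_minus_e, variance = 0.0, 0.0
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j][0] == data[i][0]:
            j += 1                       # collect all records tied at this time
        tied = data[i:j]
        d = sum(e for _, e, _ in tied)               # deaths at this time
        d1 = sum(e for _, e, g in tied if g == 1)    # deaths in group 1
        if d > 0 and at_risk > 1:
            frac_1 = at_risk_1 / at_risk
            o_minus_e += d1 - d * frac_1
            variance += d * frac_1 * (1 - frac_1) * (at_risk - d) / (at_risk - 1)
        at_risk -= len(tied)
        at_risk_1 -= sum(g for _, _, g in tied)
        i = j
    return o_minus_e / math.sqrt(variance)


def one_sided_p(z: float) -> float:
    """P-value for the alternative that group 1 has the lower hazard."""
    return NormalDist().cdf(z)


# Toy data: months to death or censoring; group 1 = experimental arm
z = logrank_z([1, 2, 3, 6, 7, 10], [1, 1, 1, 1, 1, 1], [0, 0, 0, 1, 1, 1])
print(round(z, 3), one_sided_p(z) < 0.025)  # -2.248 True
```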
Effect size reporting:
- Hazard Ratio (HR) with 95% CI (e.g., HR 0.70, 95% CI: 0.55–0.89)
- Median OS by treatment arm (e.g., median OS: 14.2 vs 9.8 months)
- Landmark survival rates (e.g., 1-year OS: 58% vs 42%; 2-year OS: 32% vs 18%)
Kaplan-Meier curve requirements:
- Plot by treatment arm with:
- At-risk counts at baseline and at 6-month intervals (minimum)
- Number of events (deaths) and censoring counts
- Median OS with 95% CI
- p-value from log-rank test
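
The quantities shown on the plot come from the Kaplan-Meier product-limit estimator. A minimal sketch, with hypothetical function names, that returns the survival curve at each death time and the median OS (reported as not reached when the curve never drops to 50%):

```python
def kaplan_meier(times, events):
    """Return [(time, S(time))] at each distinct death time (events: 1 = death)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j][0] == data[i][0]:
            j += 1                      # records tied at the same time
        deaths = sum(e for _, e in data[i:j])
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((data[i][0], survival))
        at_risk -= j - i                # deaths and censored leave the risk set
        i = j
    return curve


def median_os(curve):
    """First time at which S(t) <= 0.5, or None if the median is not reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


curve = kaplan_meier([2, 4, 4, 6, 8, 10], [1, 1, 0, 1, 0, 1])
print(median_os(curve))                               # 6
print(median_os(kaplan_meier([1, 2, 3, 4], [1, 0, 0, 0])))  # None (not reached)
```

Patients censored at the same time as a death are kept in the risk set for that death, which is the usual convention.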
Co-primary or hierarchical endpoint testing:
- If OS co-primary with PFS: use hierarchical testing or graphical MCP to control FWER
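- Example (fixed sequence): test OS first at the full α; PFS is formally tested only if OS is significant. Testing PFS after a non-significant OS result requires a fallback procedure in which α is split between the endpoints in advance.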
- Alternatively: Bonferroni split (each endpoint α = 0.0125 for two co-primary endpoints)
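
The two simplest multiplicity strategies can be compared side by side. In this sketch the function names and p-values are hypothetical; `fixed_sequence` tests endpoints in a pre-specified order at the full α and stops at the first failure, while `bonferroni` splits α equally.

```python
def fixed_sequence(p_values, alpha=0.025):
    """Hierarchical testing: each endpoint is tested only if all earlier ones passed."""
    results = []
    still_open = True
    for p in p_values:
        still_open = still_open and p < alpha
        results.append(still_open)
    return results


def bonferroni(p_values, alpha=0.025):
    """Equal alpha split across endpoints; order does not matter."""
    per_endpoint = alpha / len(p_values)
    return [p < per_endpoint for p in p_values]


# One-sided p-values listed in the pre-specified testing order
print(fixed_sequence([0.010, 0.020]))  # [True, True]
print(fixed_sequence([0.030, 0.001]))  # [False, False]: second endpoint is never tested
print(bonferroni([0.010, 0.020]))      # [True, False]: each tested at 0.0125
```

The second call shows the cost of a fixed sequence: a strong result on the later endpoint cannot be claimed once the earlier endpoint fails.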
6. Interim OS Analyses
When permitted (FDA 2018 + 2025):
- Interim efficacy or futility looks allowed if pre-specified
- Alpha spending method (e.g., O'Brien-Fleming, Lan-DeMets) required to control FWER
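
As an illustration of alpha spending, the Lan-DeMets O'Brien-Fleming-type function gives the cumulative alpha that may be spent by information fraction t (observed deaths divided by planned deaths): α(t) = 2 − 2Φ(z_{α/2} / √t). The function name below is hypothetical, and converting the spent alpha into exact stopping boundaries requires group-sequential software, which this sketch does not attempt.

```python
import math
from statistics import NormalDist


def obf_alpha_spent(info_fraction: float, alpha: float = 0.025) -> float:
    """Cumulative alpha spent at a given information fraction (0 < t <= 1)."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    return 2 * (1 - nd.cdf(z / math.sqrt(info_fraction)))


for t in (0.50, 0.75, 1.00):
    print(t, round(obf_alpha_spent(t), 5))
```

Very little alpha is available at early looks (roughly 0.0015 of the 0.025 at half the planned events), so nearly all of it is preserved for the final OS analysis.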
When NOT recommended (FDA 2025 Draft):
- OS event count < 100 at interim: high variability, unreliable estimates
- Median OS not yet reached: interim survival curves unstable at tail
- Follow-up completion < 75%
Conditional Power threshold for futility:
- Common trigger: CP < 20%, where CP is the probability, given the observed interim data, that the trial reaches significance at the final analysis
- If CP < 20%, DSMB may recommend futility stop (non-binding)
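
Conditional power can be sketched with the Brownian-motion approximation. The version below assumes the current-trend drift (θ = Z_t / √t) unless another drift is supplied, uses an unadjusted final critical value z_{1−α}, and takes positive Z as favoring the experimental arm. The function name and inputs are illustrative.

```python
import math
from statistics import NormalDist


def conditional_power(z_interim: float, info_fraction: float,
                      alpha: float = 0.025, drift: float = None) -> float:
    """P(final Z exceeds z_{1-alpha} | interim Z) under an assumed drift."""
    nd = NormalDist()
    t = info_fraction
    if drift is None:
        drift = z_interim / math.sqrt(t)   # current-trend assumption
    b_t = z_interim * math.sqrt(t)         # Brownian-motion value B(t)
    z_crit = nd.inv_cdf(1 - alpha)
    return 1 - nd.cdf((z_crit - b_t - drift * (1 - t)) / math.sqrt(1 - t))


# Interim look at 50% of planned deaths
print(round(conditional_power(2.0, 0.5), 2))  # 0.89
print(conditional_power(0.0, 0.5) < 0.20)     # True: would meet a CP < 20% futility trigger
```

Supplying the protocol-assumed effect as `drift` instead of the current trend gives the more optimistic conditional power that DSMB charters often report alongside it.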
Intercurrent Events in OS Trials
Canonical Intercurrent Events (ICH E9(R1) Framework)
IE 1: Treatment Crossover to Subsequent Therapy (Most Common)
- Definition: Patient discontinued study drug at progression and initiated another active anti-cancer treatment not in protocol.
- Frequency: 30–70% in metastatic solid tumors; especially high in settings with FDA-approved subsequent therapies (IO, targeted drugs).
- ICH E9(R1) Strategy:
- Treatment Policy (primary): Include all post-crossover OS events; do not adjust.
- Hypothetical (sensitivity): Apply RPSFT to estimate OS under no-crossover scenario.
- Statistical consequence:
- Treatment policy: may underestimate experimental drug effect if control arm patients cross over to active therapy
- Hypothetical: larger estimated effect (removes benefit of subsequent therapy)
- RPSFT mechanics (from os_crossover_rpsft_summary):
- Assumes Accelerated Failure Time (AFT) model: treatment shortens/lengthens survival time by constant multiplicative factor
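- Formula: U = T_off + γ · T_on, where U is the counterfactual untreated survival time, T_off and T_on are the times spent off and on the experimental treatment, and γ is the acceleration factor (e.g., γ = 0.75 means each month on treatment uses up 0.75 months of untreated lifetime, so treatment extends survival by a factor of 1/0.75 ≈ 1.33)
- Assumptions: a common treatment effect (γ is the same whenever and by whomever treatment is received) and exchangeability of the randomized arms with respect to untreated survival. Comparability of patients who crossed over and those who did not, conditional on measured covariates, is the assumption behind IPCW and TSE rather than RPSFT
- Limitation (IPCW): Unbounded weights (into the hundreds) when crossover rates are extreme; truncation at percentiles (1st, 99th) or at a fixed value (max = 10) is used to stabilize them
- SAP template: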
"Primary OS analysis: Intent-to-treat treatment policy estimand. All randomized patients included; all post-progression deaths are included regardless of subsequent therapy receipt. Sensitivity OS analysis: RPSFT applied to patients crossing over to [subsequent therapy]. RPSFT assumes an accelerated failure time model with constant treatment effect γ across populations. Crossover-adjusted HR reported with 95% CI. If RPSFT adjusted HR differs by >20% from primary HR, robustness of treatment effect is questioned."
IE 2: Use of Protocol-Prohibited Therapy
- Definition: Patient received off-study anti-cancer treatment (e.g., clinical trial, radiotherapy) not specified in the protocol.
- Frequency: 10–30% depending on setting and availability of clinical trials at sites.
- ICH E9(R1) Strategy: Treatment policy (standard); regard as part of real-world patient journey.
- Statistical consequence: May dilute observed effect if differential use by arm.
IE 3: Loss to Follow-Up / Study Withdrawal
- Definition: Patient lost to contact or withdraws consent before OS event.
- Frequency: 2–5% in well-managed oncology trials; up to 10–15% in developing country settings.
- ICH E9(R1) Strategy: Censor at last known alive date (assumes non-informative censoring: withdrawal carries no information about post-withdrawal OS).
- Statistical consequence: Bias if differentially distributed by arm (violates the non-informative censoring assumption).
- Mitigation: Active follow-up for vital status (e.g., contact family, healthcare providers, national registries).
IE 4: Discontinuation Due to Adverse Event
- Definition: Patient stopped study drug due to treatment toxicity and did not resume.
- Frequency: 5–15% depending on drug toxicity profile.
- ICH E9(R1) Strategy:
- Treatment policy (primary): include all subsequent OS events
- While-on-treatment (sensitivity): censor at drug discontinuation
- SAP template: "Patients who permanently discontinued due to adverse event are included in primary ITT analysis. While-on-treatment sensitivity analysis censors at date of permanent discontinuation."
Regulatory Precedent: Real Examples
| NCT# | Trial | Indication | Drug | Primary Endpoint | Median OS (months) | HR (95% CI) | Approval |
|---|---|---|---|---|---|---|---|
| NCT01668784 | CheckMate 025 | Metastatic RCC | Nivolumab | OS | 25.0 vs 19.6 | 0.73 (98.5% CI: 0.57–0.93) | Approved Nov 2015 |
| NCT02142738 | KEYNOTE-024 | NSCLC (PD-L1 ≥50%) | Pembrolizumab | PFS (OS secondary) | 30.0 vs 14.2 | 0.63 (0.47–0.86) | Approved Oct 2016 |
| NCT02775435 | KEYNOTE-407 | Squamous NSCLC | Pembrolizumab + chemo | OS and PFS (dual primary) | 15.9 vs 11.3 | 0.64 (0.49–0.85) | Approved Oct 2018 |
| NCT02267343 | ATTRACTION-2 | Gastric cancer | Nivolumab | OS | 5.26 vs 4.14 | 0.63 (0.51–0.78) | Approved Sep 2017 (Japan) |
| NCT01642615 | KEYNOTE-407 (Cutaneous Melanoma) | Melanoma | Pembrolizumab | OS | 16.7 vs 13.9 (interim) | 0.63 (0.45–0.89) | Approved Feb 2015 |
Patterns in OS-positive trials:
- HR range: 0.55–0.75 (25–45% reduction in the hazard of death)
- Median OS absolute benefit: roughly 1–16 months, depending on setting
- Follow-up: ≥12 months; majority of OS events observed at analysis
Limitations and Pitfalls
1. Prolonged Follow-Up Required
- OS events require extended patient tracking, delaying regulatory decision
- Mitigation: Use interim OS analyses (FDA 2025 permits, with caution); ensure robust follow-up infrastructure; centralized death certificate tracking
2. Confounding from Subsequent Therapy
- Effective subsequent treatments (especially in metastatic disease) can dilute experimental arm OS benefit if control arm patients also access them
- Mitigation: Pre-specify crossover data collection; plan RPSFT adjustment in SAP; report crossover rates by arm; validate RPSFT assumptions (exchangeability, AFT)
3. Surrogate-to-OS Correlation Uncertainty
- PFS, ORR often used as primary; OS confirmation required
- Relationship between surrogate and OS varies widely by indication and mechanism (e.g., ORR–OS correlation strong in targeted therapy, weaker in IO)
- FDA 2025 stance: Surrogate endpoint approvals without OS data increasingly scrutinized; prefer trials with OS as primary or co-primary
4. Differential Loss to Follow-Up
- If follow-up differs by arm, Kaplan-Meier estimates are biased (violates the non-informative censoring assumption)
- Mitigation: Active surveillance; report follow-up completion %; use reference-based multiple imputation for sensitivity analysis (e.g., delta adjustment for MNAR scenarios)
5. Non-Proportional Hazards (NPH)
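- Survival curves cross or separate late, so the hazard ratio is not constant over time; the log-rank test remains valid but loses power, and a single HR becomes hard to interpret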
- Common in IO trials (delayed effect) and trials with high crossover rates
- FDA 2025 Draft position: Pre-specify sensitivity analyses for NPH (e.g., Fleming-Harrington test, RMST); simulate under NPH when interim OS immature
- Mitigation: Plan MaxCombo, RMST, or FH(ρ, γ) tests if NPH expected
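
Restricted mean survival time (RMST) is the area under the Kaplan-Meier curve up to a pre-specified horizon τ, so it does not rely on proportional hazards. A minimal sketch with a hypothetical function name; the between-arm difference in RMST and its confidence interval would come from validated software.

```python
def rmst(times, events, tau):
    """Area under the Kaplan-Meier curve from 0 to tau (events: 1 = death)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival, last_time, area = 1.0, 0.0, 0.0
    i = 0
    while i < len(data) and data[i][0] <= tau:
        j = i
        while j < len(data) and data[j][0] == data[i][0]:
            j += 1                          # records tied at the same time
        t = data[i][0]
        deaths = sum(e for _, e in data[i:j])
        if deaths:
            area += survival * (t - last_time)   # rectangle before the step down
            survival *= 1 - deaths / at_risk
            last_time = t
        at_risk -= j - i
        i = j
    return area + survival * (tau - last_time)   # final rectangle up to tau


# Months to death or censoring in one arm, horizon tau = 8 months
print(round(rmst([2, 4, 4, 6, 8, 10], [1, 1, 0, 1, 0, 1], tau=8), 3))  # 5.889
```

The result reads as "an average of about 5.9 of the first 8 months alive", which is often easier to communicate than a hazard ratio when curves separate late.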
6. Small Event Counts at Interim
- Interim OS with <100 events has high variability; estimates unreliable for early stopping decisions
- FDA 2025: discourages interim OS claims unless median OS reached and >100 events
Backlinks
- Oncology Endpoint Overview
- Progression-Free Survival (PFS)
- Emerging Endpoints in Oncology Trials
- Intercurrent Events in Oncology Trials
- Interim Analysis and DSMB Operations
- Sensitivity Analyses for Estimands
Source: FDA Guidance "Clinical Trial Endpoints for the Approval of Cancer Drugs and Biologics" (May 2018, FINAL); FDA Draft Guidance "Approaches to Assessment of Overall Survival in Oncology Clinical Trials" (August 2025, DRAFT — Step 3, comment period closed; finalization expected 2026); ICH E9(R1) Addendum "Estimands and Sensitivity Analyses in Clinical Trials"
Status: FDA 2018 Endpoints Guidance = FINAL. FDA 2025 OS Draft = DRAFT (not yet finalized; reference as "FDA 2025 Draft" in regulatory documents).
Compiled from: FDA cancer endpoints regulatory documents, ClinicalTrials.gov Phase 3 NSCLC (300 trials) and breast cancer (121 trials) registries, literature on RPSFT/TSE crossover adjustment methods (os_crossover_rpsft_summary)
No IRC needed
OS determination does not require independent review; death dates are objective and unambiguous.
Censoring rules
| Scenario | Rule |
|---|---|
| Patient alive at data cutoff | Censor at last known alive date |
| Lost to follow-up | Censor at last known alive date |
| Withdrawn consent | Censor at withdrawal date |
| Patient on study but no follow-up data | Censor at randomization + 1 day (conservative) |
| Death from non-cancer cause | Count as OS event (all-cause mortality) |
Crossover handling
Crossover (control arm patients switching to experimental arm post-progression) dilutes OS benefit. FDA 2025 draft:
- Actively discourages crossover in new trial designs
- Permitted only when control arm has no/very limited treatment options
- When crossover occurs, pre-specified statistical adjustments required:
- Rank Preserving Structural Failure Time (RPSFT) model
- Two-Stage Estimation (TSE)
- Inverse Probability of Censoring Weighting (IPCW)
- All adjustment methods must be pre-specified in protocol/SAP; post-hoc adjustments are viewed skeptically
Non-proportional hazards (NPH)
Common in immunotherapy trials (delayed separation of survival curves). 2025 draft requires:
- Pre-specified NPH sensitivity analyses
- Consider restricted mean survival time (RMST) as co-primary or sensitivity
- Weighted log-rank tests (Fleming-Harrington weights) pre-specified
Immature OS at approval
When accelerated approval is granted and OS data are immature:
- Pre-specified simulations required to characterize expected OS trajectory
- Confirmatory trial must include OS as endpoint
Sample size
- Driven by number of OS events (deaths)
- Typical Phase 3: roughly 380–850 events for 80–90% power at HR 0.75–0.80 (Schoenfeld approximation, one-sided α = 0.025, 1:1 randomization)
- Median follow-up often 24–48 months for mature OS readout
- NSCLC median enrollment ~450 patients; OS events accumulate faster than in indolent diseases
Intercurrent Events
1. Subsequent anticancer therapy
Most consequential IE for OS. Patients may receive multiple post-progression lines of therapy.
- Primary strategy: Treatment policy (ignore subsequent therapy; analyze as-randomized)
- Justification: Reflects real-world clinical benefit; subsequent therapy is part of the treatment landscape
- Statistical consequence: HR diluted toward null; requires larger sample size
- Sensitivity strategy: Hypothetical (RPSFT or TSE adjustment to estimate OS if crossover had not occurred)
- SAP language: "A sensitivity analysis will estimate overall survival in the hypothetical scenario where control arm patients did not receive [investigational drug] after progression, using the rank preserving structural failure time (RPSFT) model. The treatment-effect parameter will be estimated by g-estimation, using the log-rank test statistic on the re-censored counterfactual survival times."
2. Protocol-specified crossover
- Primary strategy: Treatment policy (ITT analysis unchanged)
- Mandatory sensitivity: Pre-specified adjustment method (RPSFT/TSE/IPCW)
- SAP language: "Pre-specified sensitivity analyses adjusting for treatment switching will be conducted using [RPSFT/TSE]. The proportion of patients crossing over and timing of crossover will be reported as a supplementary table."
3. Death from non-cancer cause
- Strategy: Composite (count as OS event regardless of cause)
- Justification: All-cause mortality definition; attribution of cause of death introduces bias
- Statistical consequence: Increases event count; dilutes HR slightly in treatment-refractory populations
Regulatory Precedent
From FDA approval history and ClinicalTrials.gov Phase 3 dataset:
| NCT# | Trial / Drug | Indication | OS Role | Key Result |
|---|---|---|---|---|
| NCT02864251 | Nivolumab + ipilimumab | EGFR-mut NSCLC post-TKI | Secondary (PFS primary) | Primary PFS endpoint not met; no approval in this setting |
| NCT01828112 | Ceritinib (LDK378) | ALK+ NSCLC 2L | Key secondary | OS data immature at primary PFS analysis |
| NCT04129502 | TAK-788 (mobocertinib) | EGFR exon 20 NSCLC | Secondary (PFS primary) | OS collection mandatory per FDA expectation |
Note: The 2025 OS draft will require pre-specified OS safety analyses for all new randomized trials — trials initiated after 2025 should reflect this in protocols.
Limitations and Pitfalls
Confounding by subsequent therapy: Multiple post-progression treatment lines in modern oncology can reduce the HR for OS to near-null even when the drug is effective. This is the primary reason PFS has replaced OS as the primary endpoint in many settings.
Long follow-up required: Adjuvant settings may require 8–15 years of follow-up; this is impractical for most development timelines. DFS is substituted in these settings.
Non-proportional hazards: IO combinations often show early crossing of survival curves (initial harm, late benefit). Pre-specified NPH analyses are essential. Log-rank test loses power; RMST or weighted tests preferred.
OS indistinguishable from subsequent therapy: If a highly effective subsequent therapy exists (e.g., next-generation TKI after first-generation TKI failure), post-progression OS will be similar between arms regardless of first-line treatment effect.
Crossover: Even when formally prohibited, informal access (compassionate use, off-label) can contaminate control arm OS. FDA 2025 draft requires robust monitoring and adjustment plans.
Small trials: Many indications have limited patient populations. OS studies require very large sample sizes; single-arm trials cannot establish OS benefit.
Backlinks
- Oncology Endpoint Overview
- Progression-Free Survival (PFS)
- FDA Approval Pathways in Oncology
- Intercurrent Events in Oncology Trials
- NSCLC Indication Guide: FDA Regulatory Endpoints & Trial Design Patterns
- AML Trial Design Patterns
Source: FDA Cancer Endpoints 2018 (Final); FDA Overall Survival Assessment in Oncology Clinical Trials Draft Guidance (August 2025, Draft, comment period closed)
Status: 2018 = Final; August 2025 OS draft = Draft
Compiled from: retrieved FDA chunks + ClinicalTrials.gov records