Principal Stratum and While-on-Treatment Strategies
Definition
Principal Stratum Strategy
"The target population might be taken to be the 'principal stratum' in which an intercurrent event would occur. Alternatively, the target population might be taken to be the principal stratum in which an intercurrent event would not occur. The clinical question of interest relates to the treatment effect only within the principal stratum." — ICH E9(R1) Addendum, §A.3.2 (Final, November 2019)
A principal stratum is a subgroup of patients defined by their potential intercurrent event (IE) status, not by their actual IE status on their assigned treatment. Specifically, a principal stratum is defined by what would happen to a patient under both treatment conditions. This fundamentally distinguishes it from post-hoc subsetting based on observed events.
The key distinction (ICH E9(R1)):
"It is important to distinguish 'principal stratification', which is based on potential intercurrent events (for example, subjects who would discontinue therapy if assigned to the test product), from subsetting based on actual intercurrent events (subjects who discontinue therapy on their assigned treatment). The subset of subjects who experience an intercurrent event on the test treatment will often be a different subset from those who experience the same intercurrent event on control."
Practical implication: "Responders on drug A" is not the same population as "patients who would respond whether assigned drug A or drug B". The former is an observed subgroup (subject to selection bias); the latter is a latent construct that requires identification methods.
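A toy illustration of this distinction, using made-up potential outcomes. The stratum sizes and the names `patients`, `principal_strata`, and `observed_responders` are hypothetical, and both potential responses are visible only because the data are invented. It sketches why the responders observed on each arm can be different mixtures of principal strata.

```python
from collections import Counter

# Hypothetical population: each patient is a pair of potential response
# indicators (r_test, r_control). In a real trial only one of the two
# is ever observed for a given patient.
patients = (
    [(1, 1)] * 20    # would respond on either arm ("always-responders")
    + [(1, 0)] * 30  # would respond on test only
    + [(0, 1)] * 5   # would respond on control only
    + [(0, 0)] * 45  # would respond on neither arm
)


def principal_strata(pop):
    """Count patients in each principal stratum, keyed by (r_test, r_control)."""
    return Counter(pop)


def observed_responders(pop, arm):
    """Patients who would be observed as responders if all received `arm`."""
    idx = 0 if arm == "test" else 1
    return [p for p in pop if p[idx] == 1]


strata = principal_strata(patients)
test_resp = observed_responders(patients, "test")
ctrl_resp = observed_responders(patients, "control")

# Share of each observed responder group that belongs to the
# always-responder principal stratum.
share_always_test = strata[(1, 1)] / len(test_resp)
share_always_ctrl = strata[(1, 1)] / len(ctrl_resp)
print(len(test_resp), len(ctrl_resp), share_always_test, share_always_ctrl)
```

In this invented population, always-responders make up 40% of test-arm responders but 80% of control-arm responders, so the two observed responder groups are not the same kind of patient.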
While-on-Treatment Strategy
"For this strategy, response to treatment prior to the occurrence of the intercurrent event is of interest. Terminology for this strategy will depend on the intercurrent event of interest; e.g. 'while alive', when considering death as an intercurrent event." — ICH E9(R1) Addendum, §A.3.2 (Final, November 2019)
The while-on-treatment strategy restricts the observation window to the period before the IE occurs. Unlike the principal stratum strategy (which changes the population), while-on-treatment changes the variable by restricting when measurements are taken.
Key warning (ICH E9(R1)): "Particular care is required if the occurrence of the intercurrent event differs between the treatments being compared." If patients on the experimental arm stay on treatment longer (or die at different rates), the observation windows will be systematically different between arms, confounding the comparison.
Regulatory Position
ICH E9(R1): Both principal stratum and while-on-treatment are recognized as valid IE strategies with appropriate uses. Neither is prohibited — but both carry conditions and limitations that must be explicitly addressed.
FDA oncology practice (2022–present):
- Principal stratum: Rarely used as a primary estimand in Phase 3 oncology. Its main application is the DOR estimand (duration of response among responders) where the response stratum is naturally defined. FDA has not endorsed principal stratum as a primary estimand for any standard oncology efficacy endpoint.
- While-on-treatment: Accepted for secondary/supportive analyses. Standard for DOR, on-treatment AE rates, and some PRO analyses. Not appropriate as primary for OS or PFS.
EMA position: EMA has expressed concern about principal stratum in oncology, noting that the identification assumptions are rarely verifiable and the statistical methods are complex. EMA has discouraged routine use of principal stratum as primary estimand.
Status: ICH E9(R1) = Final (November 2019)
When to Use Each Strategy
Principal Stratum — When Appropriate
1. Duration of Response (DOR) — standard application:
The DOR estimand is: "Among patients who would achieve a response (CR or PR) regardless of which treatment arm they were assigned to, what is the treatment effect on the duration of that response?"
- The principal stratum = patients who would achieve a response under both treatment conditions
- In practice, the stratum is approximated by: patients who achieved a response on the treatment they received (the "responder" subgroup in each arm)
- This is the most widely accepted principal stratum application in oncology
2. Tolerability stratum:
"Among patients who would be able to tolerate the treatment at the assigned dose, what is the treatment effect?"
- The stratum = patients who would not discontinue due to AE on either arm
- Rarely used in practice because identification requires assumptions about counterfactual tolerance
3. Never-switcher population (crossover trials):
"Among patients who would not cross over regardless of treatment assignment, what is the OS effect?"
- The stratum = patients who would comply with assigned treatment (not cross over) in either arm
- Theoretically correct for hypothetical OS estimation in crossover designs, but practically infeasible — the never-switcher principal stratum cannot be identified without strong assumptions (monotone treatment compliance, no unmeasured confounders)
4. Subgroup defined by biomarker-predicted response:
When a biomarker predicts IE occurrence (e.g., EGFR mutation predicts response to EGFR TKI), the biomarker-positive group approximates the principal stratum of patients "who would respond." This is the basis for enrichment designs.
While-on-Treatment — When Appropriate
1. Duration of Response (DOR) — secondary aspect:
DOR is measured while the patient is "in response", from confirmed response to documented progression or death. This is a while-on-treatment strategy applied within the response stratum.
2. On-treatment safety endpoints:
- AE rates restricted to the on-treatment period
- Dose-intensity delivered
- "Treatment-emergent AEs" — the standard safety analysis population
- ICH E9(R1): "subjects might discontinue treatment and, in some circumstances, it will be of interest to assess the risk of an adverse drug reaction while the patient is exposed to treatment"
3. Symptom endpoints in palliative care:
"Time to symptom deterioration while on treatment" — when the clinical question is about whether the drug maintains QoL during active treatment. The observation window is explicitly restricted to the active treatment period.
4. PRO/HRQoL while alive:
"Time to sustained deterioration in HRQoL while alive" — restricts QoL assessment to living patients. This is the "while alive" variant of the while-on-treatment strategy.
When NOT to Use These Strategies
Principal stratum should NOT be used when:
- The principal stratum cannot be reliably identified (no biomarker or proxy for potential IE occurrence)
- The clinical question of primary regulatory interest is about all-comers, not a subgroup
- The estimation method requires unverifiable assumptions (e.g., monotone treatment selection assumption)
- As a primary estimand for OS, PFS, or DFS in a standard Phase 3 superiority trial
While-on-treatment should NOT be used when:
- The clinical question requires comparison of outcomes over the full follow-up period
- The intercurrent event (end of treatment) differs substantially between arms — creating informative truncation that prevents valid comparison
- The endpoint of interest is OS (a treatment policy strategy is the expected approach for a primary OS estimand)
Design Considerations
Identifying the Principal Stratum
The principal stratum is a latent construct — it cannot be directly observed because each patient receives only one treatment. Identification requires one of:
Method 1: Biomarker-based identification
- If a pre-treatment biomarker reliably predicts the IE (e.g., EGFR mutation status predicts response), the biomarker-positive group approximates the principal stratum of "would-respond" patients
- Validity assumption: the biomarker predicts IE occurrence under both treatment conditions
- Example: EGFR exon 19 deletion or exon 21 L858R substitution as a proxy for the "would-achieve-response" principal stratum in EGFR TKI trials
Method 2: Principal score methods
- Model the probability of IE occurrence under each treatment as a function of pre-treatment covariates
- Estimate principal stratum membership probabilistically
- Requires: (a) strong ignorability (no unmeasured confounders), (b) overlap in covariate distributions
- Statistical complexity: requires specialized Bayesian or semiparametric estimation; not yet standard in oncology NDA submissions
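A simplified sketch of the principal score idea for response as the IE, using one categorical baseline covariate in place of a fitted model. It assumes randomization plus monotonicity (anyone who would respond on control would also respond on test). Monotonicity, the function names, and the response rates are assumptions for illustration and are not taken from the guideline.

```python
def principal_scores(p_resp_control, p_resp_test):
    """Principal scores within one covariate level, under monotonicity.

    p_resp_control, p_resp_test: response probabilities in that covariate
    level, estimated from the control and test arms respectively.
    """
    if not 0.0 <= p_resp_control <= p_resp_test <= 1.0:
        raise ValueError("monotonicity needs 0 <= p_control <= p_test <= 1")
    return {
        "always_responder": p_resp_control,
        "test_only_responder": p_resp_test - p_resp_control,
        "never_responder": 1.0 - p_resp_test,
    }


def always_responder_weight(p_resp_control, p_resp_test):
    """Share of observed test-arm responders who are always-responders."""
    scores = principal_scores(p_resp_control, p_resp_test)
    responders = scores["always_responder"] + scores["test_only_responder"]
    if responders == 0:
        raise ValueError("no responders expected on the test arm")
    return scores["always_responder"] / responders


# Hypothetical response rates by baseline biomarker level: (control, test).
rates_by_level = {"biomarker_high": (0.30, 0.60), "biomarker_low": (0.10, 0.40)}
weights = {
    level: always_responder_weight(p_ctrl, p_test)
    for level, (p_ctrl, p_test) in rates_by_level.items()
}
print(weights)
```

Under monotonicity every control-arm responder is an always-responder, so weighting test-arm responders by these shares is one way to line the two arms up. This only holds if the covariate captures everything that predicts both stratum membership and outcome.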
Method 3: Bounding approaches
- Provide bounds on the principal stratum effect rather than a point estimate
- Bounds are often wide and clinically uninformative
- Used when strong identification assumptions cannot be justified
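A minimal sketch of the bounding logic, applied here to the size of the always-responder stratum rather than to the treatment effect within it. With no assumption beyond randomization, the marginal response rates only restrict the stratum proportion to an interval (the Fréchet bounds). The function name and the rates are hypothetical.

```python
def always_responder_bounds(p_resp_test, p_resp_control):
    """Bounds on P(would respond on both arms) from marginal response rates.

    Uses only the two arm-level response probabilities; no monotonicity or
    covariate assumptions are made.
    """
    for p in (p_resp_test, p_resp_control):
        if not 0.0 <= p <= 1.0:
            raise ValueError("response probabilities must lie in [0, 1]")
    lower = max(0.0, p_resp_test + p_resp_control - 1.0)
    upper = min(p_resp_test, p_resp_control)
    return lower, upper


# Hypothetical response rates: 60% on test, 50% on control.
lo, hi = always_responder_bounds(0.60, 0.50)
print(f"always-responder share is between {lo:.2f} and {hi:.2f}")
```

Even the stratum size is only known to lie within a wide interval in this example, which gives a sense of why bounds on the effect within the stratum are often too wide to be clinically useful.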
Practical oncology application (DOR):
The DOR principal stratum is approximated by the observed responders in each arm. This is NOT the true principal stratum (it conditions on observed response, not potential response), but it is the standard regulatory-accepted approximation. When response rates differ between arms, the responder groups are no longer comparable and the DOR comparison is subject to selection bias whose direction is not known in advance. This is typically acknowledged as a limitation rather than addressed with formal principal stratum methods.
While-on-Treatment: Asymmetric Observation Window Problem
The central methodological challenge: if treatment duration differs between arms (e.g., experimental arm patients stop treatment earlier due to toxicity, or remain on treatment longer due to better efficacy), the observation window is systematically different.
Example:
- Arm A: median treatment duration 8 months
- Arm B: median treatment duration 14 months
- While-on-treatment PRO analysis: Arm A has PRO data for a median 8-month window; Arm B for 14 months
- Arm B will appear to show sustained PRO benefit simply because the observation window is longer — not necessarily because PRO is better
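The effect of the window alone can be sketched with a deliberately simple model: both arms share the same constant hazard of PRO deterioration, and every patient in an arm is observed for exactly that arm's median treatment duration. The hazard value, the exponential model, and the function names are assumptions for illustration only.

```python
import math


def crude_event_proportion(hazard, window):
    """P(deterioration observed within the window) for an exponential time."""
    return 1.0 - math.exp(-hazard * window)


def mean_event_free_time(hazard, window):
    """Expected deterioration-free months observed, E[min(T, window)]."""
    return (1.0 - math.exp(-hazard * window)) / hazard


HAZARD = 0.05  # hypothetical deteriorations per patient-month, same in both arms
windows = {"Arm A": 8.0, "Arm B": 14.0}  # months on treatment, as in the example

summary = {
    arm: (crude_event_proportion(HAZARD, w), mean_event_free_time(HAZARD, w))
    for arm, w in windows.items()
}
for arm, (prop, months) in summary.items():
    print(f"{arm}: {prop:.2f} with deterioration, "
          f"{months:.1f} deterioration-free months observed")
```

Both arms have the same underlying hazard, yet Arm B shows more observed deterioration-free time and also a higher crude proportion with deterioration. Neither difference reflects a treatment effect; both come from the longer window, and the apparent direction depends on which summary is reported.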
Solutions (partial):
- Landmark analysis: Compare PRO at fixed time points (3 months, 6 months) rather than over the whole treatment period — avoids asymmetric window but reduces information
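- Restrict to shorter arm's observation window: Compare both arms over the same time window (e.g., 8 months). This discards Arm B data collected after month 8, so it reduces information and power.
- Mixed model for repeated measures (MMRM): Uses all observed on-treatment assessments and handles data missing after treatment discontinuation under a missing-at-random (MAR) assumption. It does not correct for informative dropout, where discontinuation depends on unobserved PRO values.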
FDA guidance on while-on-treatment PRO analysis: FDA requires that the analysis method addresses the asymmetric window problem. Simply reporting "on-treatment PRO scores" without addressing differential observation windows is not acceptable for a confirmatory claim.
Principal Stratum and Sample Size
When principal stratum is pre-specified as the primary analysis:
- Sample size must account for the estimated proportion of patients in the principal stratum (e.g., if 40% of patients are expected to be in the "responder" stratum, effective sample size ≈ 40% of total randomized patients)
- Power calculations must use the expected event rate within the principal stratum, not the overall population rate
- Larger total enrollment is required to achieve adequate power within the stratum
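The enrollment inflation above is simple arithmetic: divide the number of patients needed within the stratum by the expected stratum proportion and round up. A minimal sketch, where the function name and the target of 200 stratum patients are hypothetical:

```python
import math
from fractions import Fraction


def total_enrollment(n_needed_in_stratum, stratum_proportion):
    """Total randomized patients needed so that, on average,
    `n_needed_in_stratum` of them fall in the principal stratum."""
    # Fraction(str(...)) keeps decimals such as 0.4 exact before rounding up.
    prop = Fraction(str(stratum_proportion))
    if not 0 < prop <= 1:
        raise ValueError("stratum proportion must be in (0, 1]")
    return math.ceil(Fraction(n_needed_in_stratum) / prop)


# Hypothetical: power calculation calls for 200 patients in the stratum,
# and 40% of randomized patients are expected to belong to it.
print(total_enrollment(200, 0.40))  # 500
```

This treats the stratum proportion as known. In practice it is estimated, so the realized number of stratum patients will vary around the target.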
DOR Estimand: The Standard Principal Stratum Application
Duration of Response (DOR) is the most accepted principal stratum + while-on-treatment endpoint in oncology. Its estimand has a natural structure:
Population (principal stratum): Patients who achieve confirmed response (CR or PR) — observed responders are the proxy for the true "would-respond" principal stratum
Variable: Time from confirmed response to documented progression or death (whichever first)
IE strategy: Two strategies combined:
- Principal stratum (for the population): restrict to responders
- While-on-treatment / composite (for the variable): measure duration until progression or death
Population-level summary: Median DOR with 95% CI; Kaplan-Meier curves starting from confirmed response date
SAP language:
"Duration of response will be analyzed in the principal stratum of confirmed responders (patients achieving confirmed CR or PR per [criteria]). The analysis population includes all patients who achieve confirmed response as the best overall response. DOR is defined as the time from confirmed response to first documented disease progression per [criteria] or death from any cause, whichever occurs first. Patients without documented progression or death will be censored at the date of last adequate tumor assessment. Median DOR with 95% CI will be estimated using Kaplan-Meier methods."
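A bare-bones sketch of the Kaplan-Meier median described in the SAP text, written in plain Python so the mechanics are visible. The data are invented, the function name is hypothetical, and the 95% CI is omitted; a validated statistical package would be used in a real analysis.

```python
def km_median(durations, events):
    """Kaplan-Meier median: earliest time at which estimated survival <= 0.5.

    durations: time from confirmed response to event or censoring.
    events: 1 = progression or death, 0 = censored at last adequate assessment.
    Returns None when the median is not reached.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = sum(1 for tt, e in data if tt == t and e == 1)
        n_leaving = sum(1 for tt, _ in data if tt == t)  # events + censored at t
        if n_events > 0:
            surv *= 1.0 - n_events / n_at_risk
            if surv <= 0.5:
                return t
        n_at_risk -= n_leaving
        i += n_leaving
    return None


# Hypothetical DOR data in months for eight confirmed responders.
dor_months = [2, 4, 4, 6, 8, 10, 12, 12]
event_flag = [1, 1, 0, 1, 0, 1, 0, 0]
print(km_median(dor_months, event_flag))  # 10
```

Only responders enter `dor_months`, which is exactly the conditioning on observed response that the estimand definition above approximates as a principal stratum.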
Regulatory acceptance: DOR in confirmed responders is accepted for labeling by FDA and EMA as a secondary/supportive endpoint. DOR supports the "durability" component of ORR-based accelerated approval claims. DOR as a primary endpoint requires an explicit estimand section defining the responder principal stratum.
While-on-Treatment: On-Treatment Safety Application
Standard safety application: Treatment-emergent adverse events (TEAEs) are defined as AEs occurring after the first dose and up to a pre-specified window after the last dose (commonly 30 or 90 days). This is a while-on-treatment estimand for safety:
- IE: End of treatment exposure
- Observation window: From first dose to last dose + X days
- Variable: AE occurrence, grade, time to onset
SAP language:
"Treatment-emergent adverse events (TEAEs) are defined as adverse events with onset on or after the date of first study drug administration and up to 30 days after the date of last study drug administration, or the date of initiation of subsequent anticancer therapy, whichever comes first. The TEAE analysis implements a while-on-treatment strategy in which the observation period is explicitly restricted to the active treatment exposure window."
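The TEAE window in the SAP text can be sketched as a date comparison. The function and parameter names are hypothetical, and treating both window boundaries as inclusive is an assumption; a real SAP would state how boundary dates are handled.

```python
from datetime import date, timedelta


def is_teae(onset, first_dose, last_dose, next_therapy_start=None, window_days=30):
    """Flag an AE as treatment-emergent under the window described above.

    The window runs from first dose to the earlier of (last dose + window_days)
    and the start of subsequent anticancer therapy, boundaries included
    (inclusiveness is an assumption of this sketch).
    """
    window_end = last_dose + timedelta(days=window_days)
    if next_therapy_start is not None:
        window_end = min(window_end, next_therapy_start)
    return first_dose <= onset <= window_end


# Hypothetical patient: dosed from 10 Jan 2024 to 1 Jun 2024.
first, last = date(2024, 1, 10), date(2024, 6, 1)
print(is_teae(date(2024, 6, 25), first, last))                       # True
print(is_teae(date(2024, 7, 2), first, last))                        # False
print(is_teae(date(2024, 6, 25), first, last, date(2024, 6, 15)))    # False
```

The third call shows the "whichever comes first" rule: starting subsequent therapy on 15 June closes the window before the 30-day limit.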
Regulatory Precedent
| Application | Strategy | Status | Example |
|---|---|---|---|
| DOR in confirmed responders | Principal stratum (response stratum) | Accepted for labeling | Multiple IO and targeted therapy approvals |
| TEAEs | While-on-treatment | Standard | Universal in oncology safety analyses |
| PRO "while alive" | While-on-treatment (alive) | Accepted secondary | Palliative cancer symptom trials |
| Never-switcher OS | Principal stratum (compliance stratum) | Exploratory only | Rarely submitted; not accepted as primary |
| Tolerability stratum efficacy | Principal stratum | Not accepted as primary | Proposed in some dose optimization trials |
Limitations and Pitfalls
Principal stratum identification problem: Without a perfect biomarker, the principal stratum cannot be identified — only approximated. Observed responders ≠ patients who would respond. The approximation introduces selection bias that is not addressed by randomization.
Validity of between-arm DOR comparisons: When response rates differ substantially between arms (e.g., 70% vs. 30%), the "responder" groups can differ fundamentally in tumor biology and prognosis. Comparing DOR between arms is then potentially confounded by this selection difference. DOR is most reliable as a single-arm descriptive measure or when response rates are similar.
While-on-treatment: interpretation of negative results: A negative while-on-treatment result (no benefit during treatment period) could reflect: (a) the drug has no effect, or (b) the observation window is too short. The analysis cannot distinguish these without additional off-treatment follow-up data.
EMA concern about principal stratum in Phase 3: EMA has noted in reflection papers that principal stratum methods are mathematically rigorous but practically difficult to verify in oncology Phase 3 settings. The complexity of identification assumptions and sensitivity analyses has led EMA to recommend caution before adopting principal stratum as a primary estimand.
Backlinks
- ICH E9(R1) Estimand Framework
- Intercurrent Events in Oncology Trials
- Sensitivity Analyses for Estimands
- Response-Based Endpoints (ORR, CR, DOR)
Source: ICH Harmonised Guideline E9(R1), Addendum on Estimands and Sensitivity Analysis in Clinical Trials (Final, November 2019), §A.3.2
Status: Final (ICH E9(R1) Step 4, adopted November 2019)
Compiled from ICH E9(R1) §A.3.2 + estimand_framework_oncology_review.md