Longitudinal, PRO, and Repeated-Measures Methods
Definition
Longitudinal and repeated-measures methods estimate treatment effects on outcomes measured multiple times per subject across scheduled visits (symptom scores, QoL scales, biomarkers, tumor burden). In oncology, these analyses most commonly target patient-reported outcome (PRO) endpoints and exploratory biomarker trajectories.
Per ICH E9(R1) (Final, 2019), the estimand framework requires that longitudinal analyses make explicit how intercurrent events (ICEs) such as treatment discontinuation, rescue therapy, progression, or death are handled, since these events give rise to missing data that "needs to be addressed as a missing data problem in the statistical analysis" once the estimand is fixed.
Per the FDA Clinical Trial Endpoints for the Approval of Cancer Drugs and Biologics guidance (Final, December 2018), symptom endpoints may include "specific symptom endpoints" or "composite symptom endpoints, such as the myelofibrosis symptom assessment form," and time-to-event symptom analyses.
PRO Endpoints in Oncology: Instrument Selection and Labeling Claims
PRO instruments must be fit-for-purpose, validated in the target tumor population, and pre-specified with a clear conceptual framework per the FDA Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims guidance (Final, 2009) and the FDA Core Patient-Reported Outcomes in Cancer Clinical Trials guidance (Draft, 2021 / Final, 2024). Core oncology instruments and their typical claim targets:
| Instrument | Domain | Typical oncology use | Labeling claim precedent |
|---|---|---|---|
| EORTC QLQ-C30 + disease modules (LC13, BR23, CR29, OV28, MY20) | HRQoL, functioning, symptoms | Global QoL and functional scales in Phase 3 metastatic trials | Supportive labeling (e.g., enzalutamide mCRPC QoL maintenance) |
| FACT-G / FACT-L / FACT-P / FACT-B | HRQoL, disease-specific | U.S. pivotal trials; physical/functional/social/emotional | Supportive labeling in prostate, lung, breast |
| MDASI (core + modules) | Symptom burden | Symptom improvement endpoints | Limited stand-alone; supportive |
| MF-SAF / MPN-SAF TSS | Myelofibrosis symptoms | TSS50 response at Week 24 | Efficacy labeling (ruxolitinib, COMFORT-I, where TSS50 was a key secondary endpoint) |
| BPI (Brief Pain Inventory) | Pain severity / interference | Bone-pain endpoints in mCRPC, bone metastases | Pain-response labeling (radium-223, abiraterone) |
| PRO-CTCAE | Symptomatic AE | Tolerability / Project Optimus dose-finding | Supportive tolerability characterization |
| EQ-5D-5L | Health utility | HEOR and reimbursement; not efficacy labeling | Not an efficacy claim driver |
Core principles for instrument selection:
- Concept of interest must match the claim (symptom improvement ≠ HRQoL maintenance ≠ functional benefit).
- Content validity demonstrated in the specific tumor population and line of therapy.
- Recall period and administration frequency aligned with the estimand's time horizon.
- Psychometric evidence (reliability, construct validity, responsiveness) pre-submitted; meaningful-change threshold (MID) anchored in that population.
- Missing data strategy is part of fit-for-purpose evaluation — instruments with chronic high attrition cannot support a labeling claim even with valid psychometrics.
Labeling-claim archetypes:
- Symptom improvement claim — primary or co-primary symptom endpoint (e.g., MF-SAF TSS50); requires blinded design or strong sensitivity analyses, pre-specified responder definition anchored to MID, and reference-based/tipping-point sensitivity for MNAR.
- Symptom delay / time-to-deterioration claim — composite TTDD with death as event; stratified log-rank, competing-risk sensitivity.
- HRQoL maintenance claim — MMRM on global QoL across visits with pre-specified primary visit; typically supports descriptive labeling rather than efficacy.
- Tolerability characterization (non-claim) — PRO-CTCAE bother/interference descriptives; informs prescribing information safety section.
Regulatory Position
- ICH E9(R1) (Final, 2019): the main estimator must be aligned with the estimand, and "to explore the robustness of inferences from the main estimator to deviations from its underlying assumptions, a sensitivity analysis should be conducted." Missing-data-specific sensitivity analyses (e.g., reference-based imputation, delta adjustment) are therefore expected whenever the primary analysis relies on MAR.
- ICH E8(R1) (Final, 2021): patient-centered quality factors and pre-specified statistical analysis plans are "critical to quality" in confirmatory oncology studies, including PRO endpoints.
- FDA Cancer Endpoints Guidance (Final, 2018): supports symptom/PRO endpoints for regular approval when clinically meaningful, well-defined, and protected from bias (blinding, complete capture, pre-specified analysis). PROs rarely support accelerated approval alone; they most often support labeling claims or serve as key secondary endpoints alongside OS/PFS.
When to Use
- Symptom-directed endpoints — myelofibrosis (MF-SAF Total Symptom Score 50% response; Jakafi), CRPC bone pain, myelodysplastic syndrome fatigue, cachexia interventions.
- QoL/functioning — EORTC QLQ-C30, FACT-G, and disease-specific modules (QLQ-LC13 NSCLC, QLQ-BR23 breast, FACT-L, FACT-P) as secondary endpoints in metastatic Phase 3 trials.
- Biomarkers — longitudinal ctDNA, PSA kinetics (PCWG3), tumor size (sum of diameters, RECIST 1.1) for exposure-response and dose-optimization (Project Optimus).
- Settings — primarily metastatic/advanced disease; maintenance and supportive-care trials; neoadjuvant symptom burden; post-transplant GVHD symptom tracking.
Design Considerations
Model choice
| Scenario | Preferred model | Rationale |
|---|---|---|
| Two timepoints (baseline + one post-baseline) | ANCOVA adjusting for baseline | Most efficient; no covariance to model |
| ≥3 visits, continuous outcome, MAR plausible | MMRM with visit as categorical fixed effect | Uses all partial data; valid under MAR; no explicit imputation |
| Binary/count repeated outcomes | GLMM or GEE | Logit/log link; robust SE for GEE |
| Trajectory shape of interest | Random-slope LMM | Models subject-specific growth |
MMRM specification (primary template)
Change from baseline ~ treatment + visit + treatment×visit + baseline + baseline×visit + stratification covariates, fitted by REML with an unstructured (UN) within-subject covariance and Kenward–Roger denominator degrees of freedom. R: mmrm::mmrm(), or nlme::gls() with corSymm plus varIdent weights by visit (gls does not provide Kenward–Roger degrees of freedom); SAS: PROC MIXED with REPEATED / TYPE=UN and DDFM=KR.
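Written as a model equation, the template above takes the following form. This is a sketch in assumed notation (none of these symbols appear in the source): Y_ij is the change from baseline for subject i at post-baseline visit j, trt_i is the treatment indicator, base_i is the baseline score, and x_i collects the stratification covariates. Visit-specific coefficients absorb the visit main effect and the two interactions with visit.

```latex
\begin{aligned}
Y_{ij} &= \beta_{0j} + \beta_{1j}\,\mathrm{trt}_i + \beta_{2j}\,\mathrm{base}_i
          + \mathbf{x}_i^{\top}\boldsymbol{\gamma} + \varepsilon_{ij},
          \qquad j = 1,\dots,J,\\
\boldsymbol{\varepsilon}_i &= (\varepsilon_{i1},\dots,\varepsilon_{iJ})^{\top}
          \sim N_J(\mathbf{0},\,\boldsymbol{\Sigma}),
          \qquad \boldsymbol{\Sigma}\ \text{unstructured}.
\end{aligned}
```

Under this parameterization the treatment effect at a designated primary visit j* is the single coefficient β_{1j*}, which is what the LS-mean difference at that visit estimates.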
MMRM assumptions
- MAR conditional on observed outcomes, baseline, and covariates in the model
- Covariance structure: UN preferred when visits ≤ ~6; fallback Toeplitz or AR(1) for many visits or convergence failure
- Visits treated as categorical; schedule aligned across arms (analysis windows pre-specified)
- Estimand-linked: when post-ICE data are set to missing, MMRM estimates the hypothetical effect had subjects remained on treatment and been assessed per schedule; the ICE strategy must be declared explicitly
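The preference for UN only at small visit counts follows from how quickly its parameter count grows. A minimal sketch, assuming homogeneous-variance Toeplitz and AR(1) structures; the function name and structure labels are illustrative, not taken from any package:

```python
def n_cov_params(structure: str, n_visits: int) -> int:
    """Number of covariance parameters for a within-subject structure."""
    if n_visits < 2:
        raise ValueError("n_visits must be at least 2")
    counts = {
        "UN": n_visits * (n_visits + 1) // 2,  # every variance and covariance
        "TOEP": n_visits,                      # one variance + (T - 1) lag covariances
        "AR1": 2,                              # one variance + one correlation
    }
    if structure not in counts:
        raise ValueError(f"unknown structure: {structure}")
    return counts[structure]


for t in (4, 6, 10):
    print(t, {s: n_cov_params(s, t) for s in ("UN", "TOEP", "AR1")})
```

With 6 visits UN needs 21 parameters; with 10 visits it needs 55, which is where convergence problems become likely in small trials.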
Assessment schedule
- Pre-specify PRO completion windows (e.g., ±7 days of imaging visit), order of administration (PRO before clinical contact to avoid bias), and analysis visits.
- Compliance thresholds (commonly ≥70% baseline, ≥60% at analysis visit) should be monitored; deviations trigger sensitivity analyses.
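Compliance monitoring against these thresholds can be sketched as follows. The function names, visit labels, and counts are hypothetical, and the sketch assumes the 60% threshold is applied to every post-baseline visit rather than only the primary analysis visit:

```python
def completion_rates(expected: dict, completed: dict) -> dict:
    """Completion rate per visit = forms completed / forms expected."""
    return {v: completed.get(v, 0) / n for v, n in expected.items() if n > 0}


def flag_low_compliance(rates: dict, baseline_visit: str = "Baseline",
                        baseline_min: float = 0.70,
                        visit_min: float = 0.60) -> list:
    """Return visits whose completion rate falls below its threshold."""
    flagged = []
    for visit, rate in rates.items():
        threshold = baseline_min if visit == baseline_visit else visit_min
        if rate < threshold:
            flagged.append(visit)
    return flagged


# Hypothetical counts: "expected" = patients still due an assessment.
expected = {"Baseline": 200, "Week 12": 180, "Week 24": 150}
completed = {"Baseline": 188, "Week 12": 126, "Week 24": 84}
rates = completion_rates(expected, completed)
print(flag_low_compliance(rates))  # ['Week 24']
```

The choice of denominator (all randomized versus patients alive and still expected to complete the form) materially changes the rate and should be fixed in the SAP.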
Alpha allocation & multiplicity
- PRO endpoints are typically secondary: alpha recycled via graphical testing (Bretz/Maurer) or fixed-sequence gatekeeping after primary PFS/OS success.
- Across visits: one designated primary visit (e.g., Week 24) for hypothesis testing; other visits descriptive. Avoid "significance at any visit" without adjustment.
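Fixed-sequence gatekeeping, the simpler of the two approaches named above, can be illustrated in a few lines. This is a sketch only: the endpoint names and p-values are invented, and a two-sided alpha of 0.05 is assumed.

```python
def fixed_sequence_test(ordered_p_values, alpha=0.05):
    """Test hypotheses in their pre-specified order, each at full alpha.

    Testing stops at the first non-rejection; later hypotheses are not
    tested, however small their p-values are.
    """
    rejected = []
    for name, p in ordered_p_values:
        if p <= alpha:
            rejected.append(name)
        else:
            break
    return rejected


hierarchy = [
    ("PFS", 0.001),
    ("OS", 0.020),
    ("GHS change at Week 24", 0.030),
    ("Pain time to deterioration", 0.200),
    ("Fatigue change at Week 24", 0.010),
]
print(fixed_sequence_test(hierarchy))
# ['PFS', 'OS', 'GHS change at Week 24']
```

The fatigue endpoint is not rejected despite p = 0.010 because the sequence stopped at the pain endpoint; this is the price of spending full alpha at each step.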
Intercurrent Events
For PRO/repeated-measures endpoints in oncology, the dominant ICEs are:
- Disease progression with treatment discontinuation
  - Strategy: Hypothetical (what would scores be if subjects remained on treatment) for symptom-improvement claims; Treatment policy (follow-up regardless) for disease-related symptom endpoints.
  - Statistical consequence: Hypothetical → MMRM under MAR + reference-based/delta sensitivity; Treatment policy → requires continued PRO collection post-progression.
  - SAP template: "For patients who discontinue treatment due to progression, the hypothetical strategy is applied; data collected after discontinuation are set to missing and handled implicitly under MAR by the MMRM, with reference-based (J2R) imputation as sensitivity."
- Death
  - Strategy: Composite (deterioration or death = failure) for time-to-deterioration PROs; While-on-treatment for symptom scores among survivors.
  - Statistical consequence: Composite → survival-style analysis (Kaplan–Meier, Cox); while-on-treatment → conditional on survival and biased toward healthier patients.
  - SAP template: "Time to definitive deterioration is analyzed as a composite endpoint where death without prior deterioration is treated as an event."
- Rescue/subsequent anticancer therapy
  - Strategy: Treatment policy for health-utility/QoL; Hypothetical for symptom-pharmacology claims.
  - Statistical consequence: Treatment policy requires continued data collection post-switch; hypothetical requires setting post-switch data to missing, with MAR/MNAR sensitivity analyses.
  - SAP template: "Data following initiation of subsequent systemic anticancer therapy are retained and included under the treatment-policy strategy for QoL endpoints; under the hypothetical strategy for targeted symptom endpoints, post-switch data are excluded and imputed."
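At the dataset level, the two main strategies differ only in what happens to assessments recorded after the ICE. A minimal sketch, assuming one subject's scores are held as (study day, score) pairs and that assessments on the ICE day itself count as pre-ICE; both the data layout and that tie-breaking rule are assumptions to be fixed in the SAP:

```python
from typing import List, Optional, Tuple

Score = Tuple[int, Optional[float]]


def apply_ice_strategy(scores: List[Score], ice_day: Optional[int],
                       strategy: str) -> List[Score]:
    """Prepare one subject's PRO scores for analysis under an ICE strategy.

    treatment_policy: keep every observed score, regardless of the ICE.
    hypothetical:     set scores observed after the ICE day to missing (None).
    """
    if strategy not in ("treatment_policy", "hypothetical"):
        raise ValueError(f"unsupported strategy: {strategy}")
    if strategy == "treatment_policy" or ice_day is None:
        return list(scores)
    return [(day, s if day <= ice_day else None) for day, s in scores]


subject = [(1, 60.0), (29, 65.0), (57, 62.0), (85, 50.0)]
print(apply_ice_strategy(subject, ice_day=60, strategy="hypothetical"))
# [(1, 60.0), (29, 65.0), (57, 62.0), (85, None)]
print(apply_ice_strategy(subject, ice_day=60, strategy="treatment_policy"))
# [(1, 60.0), (29, 65.0), (57, 62.0), (85, 50.0)]
```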
Missing Data Strategy by Estimand
| Estimand strategy | Primary analysis | Sensitivity analyses |
|---|---|---|
| Treatment policy | Direct-likelihood MMRM or MI using all observed data incl. post-ICE | Tipping-point analysis; pattern-mixture with varying MAR extrapolation |
| Hypothetical | MMRM under MAR (discard post-ICE data) | Reference-based (J2R, CIR, CR); delta adjustment |
| Composite (time-to-deterioration) | KM + stratified log-rank, Cox | Competing-risk (Gray's test) treating death as competing |
| While-on-treatment | MMRM on observed windows | Restrict to subjects with ≥X visits; selection-model sensitivity |
Multiple Imputation under MAR
- Impute missing outcomes from posterior predictive distribution conditional on observed data, treatment, baseline, covariates (M ≥ 50 imputations).
- Analyze each completed dataset via ANCOVA/MMRM.
- Pool using Rubin's rules. R: mice, mi, Hmisc::aregImpute.
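The pooling step can be written out directly. The sketch below implements the classic Rubin (1987) combining rules, including the original large-sample degrees of freedom; packages such as mice may apply a small-sample adjustment instead, so results can differ slightly. The function name and the example inputs are illustrative.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool M completed-data estimates and their squared standard errors.

    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two estimates and matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # within-imputation variance
    b = variance(estimates)        # between-imputation variance (M - 1 denominator)
    total = w + (1 + 1 / m) * b    # total variance
    if b == 0:
        df = math.inf
    else:
        r = (1 + 1 / m) * b / w    # relative increase in variance due to missingness
        df = (m - 1) * (1 + 1 / r) ** 2
    return q_bar, math.sqrt(total), df


est, se, df = pool_rubin([4.0, 5.0, 6.0], [1.0, 1.0, 1.0])
print(round(est, 3), round(se, 3), round(df, 3))  # 5.0 1.528 6.125
```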
Reference-Based Imputation under MNAR
Core idea (Carpenter/Roger): after ICE in the active arm, impute as if the subject followed the control-arm trajectory. Controlled, conservative departures from MAR aligned with the hypothetical estimand.
- J2R (Jump to Reference) — post-ICE mean jumps to control mean immediately.
- CIR (Copy Increments in Reference) — preserves pre-ICE level, future increments follow reference.
- CR (Copy Reference) — full trajectory replaced by reference group's.
R: mimix, RefBasedMI, rbmi (Roche). SAS: PROC MI with MNAR statement.
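The three variants differ in the mean profile assigned to an active-arm subject after the ICE. The sketch below builds only that marginal mean profile from per-visit arm means; it is not a full implementation, since the actual procedure draws imputations from a multivariate normal conditional on the subject's observed pre-ICE values (as done in rbmi). Function and argument names, and the numbers, are hypothetical.

```python
def reference_based_means(active, reference, last_on_trt, method):
    """Mean profile used to impute an active-arm subject after an ICE.

    active, reference: per-visit arm means, same visit order.
    last_on_trt: 0-based index of the last visit before the ICE.
    method: "J2R", "CIR", or "CR".
    """
    if len(active) != len(reference):
        raise ValueError("arm mean vectors must have equal length")
    if not 0 <= last_on_trt < len(active):
        raise ValueError("last_on_trt out of range")
    k = last_on_trt
    if method == "CR":
        # Whole profile, before and after the ICE, taken from the reference arm.
        return list(reference)
    pre = list(active[: k + 1])
    if method == "J2R":
        # Post-ICE means jump straight to the reference-arm means.
        post = list(reference[k + 1:])
    elif method == "CIR":
        # Keep the active level at the ICE, then follow reference increments.
        post = [active[k] + (reference[j] - reference[k])
                for j in range(k + 1, len(active))]
    else:
        raise ValueError(f"unknown method: {method}")
    return pre + post


active = [-2, -6, -9, -12]     # hypothetical mean symptom change per visit
reference = [-1, -2, -3, -4]
for method in ("J2R", "CIR", "CR"):
    print(method, reference_based_means(active, reference, 1, method))
# J2R [-2, -6, -3, -4]
# CIR [-2, -6, -7, -8]
# CR [-1, -2, -3, -4]
```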
Delta-Adjusted (Tipping-Point) Sensitivity
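Imputed values in the experimental arm are shifted by δ in the unfavorable direction (the sign depends on whether higher scores indicate better or worse status). δ is increased until the treatment effect is no longer statistically significant; this "tipping point" is then compared with clinically plausible magnitudes such as the MID. Report the range of δ over which significance is sustained.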
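A back-of-envelope version of the scan helps when planning the δ grid. The sketch below is an approximation on the estimate scale, not the full procedure: it assumes that shifting the imputed values of a fraction f of the experimental arm by δ moves the treatment difference by about δ·f and leaves the standard error unchanged. A real analysis re-analyzes each shifted dataset and pools with Rubin's rules. All names and numbers are hypothetical.

```python
from statistics import NormalDist


def tipping_point(estimate, se, frac_imputed, deltas, level=0.95):
    """Smallest delta at which the confidence interval first includes zero.

    estimate: MAR-based difference (experimental minus control), coded so
              that positive values favor the experimental arm.
    frac_imputed: fraction of experimental-arm subjects with imputed values.
    Returns None if significance is sustained across the whole grid.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    for delta in sorted(deltas):
        shifted = estimate - delta * frac_imputed  # penalize imputed values
        if shifted - z * se <= 0:
            return delta
    return None


print(tipping_point(6.0, 2.0, 0.25, range(0, 21, 2)))  # 10
```

In this example the effect survives a penalty of 8 points on every imputed value and tips at 10, which would then be judged against the instrument's MID.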
Composite and While-on-Treatment Approaches for PRO Deterioration
- Time to Deterioration (TTD) — first ≥MID worsening from baseline (e.g., ≥10-point on QLQ-C30 functional scale). Death without prior deterioration counted as event (composite).
- Time to Definitive Deterioration (TTDD) — requires confirmed worsening at two consecutive visits without subsequent recovery.
- Responder analyses — proportion achieving ≥MID improvement sustained at primary timepoint.
- While-on-treatment means — restrict analysis window to on-treatment visits; interpret cautiously (selection bias toward responders).
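The TTD and TTDD derivations can be sketched for a single subject as below. Definitions of "definitive" vary between SAPs; this sketch assumes higher scores are better, that a deterioration is definitive when every later assessment also shows a worsening of at least the MID and at least one later assessment exists, and that death without prior deterioration is an event at the death time. Names and data are hypothetical.

```python
def time_to_deterioration(baseline, visits, mid=10.0,
                          death_time=None, definitive=False):
    """Return (time, event_flag) for one subject.

    visits: list of (time, score) sorted by time; higher score = better.
    event_flag is False when the subject is censored at the last assessment.
    """
    worse = [baseline - score >= mid for _, score in visits]
    for i, (t, _) in enumerate(visits):
        if not worse[i]:
            continue
        if not definitive:
            return t, True                 # first MID-sized worsening
        later = worse[i + 1:]
        if later and all(later):           # confirmed, with no later recovery
            return t, True
    if death_time is not None:
        return death_time, True            # composite: death counts as event
    return (visits[-1][0] if visits else 0), False


visits = [(4, 65), (8, 58), (12, 66), (16, 55), (20, 50)]
print(time_to_deterioration(70, visits))                   # (8, True)
print(time_to_deterioration(70, visits, definitive=True))  # (16, True)
```

The Week 8 drop counts for TTD but not TTDD because the score recovers at Week 12; the definitive event is dated to Week 16.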
Practical Issues
- Multiplicity across visits: prespecify one primary analysis visit; use hierarchical or graphical alpha for multi-visit claims.
- Change-from-baseline interpretability: report least-squares mean difference with 95% CI and anchor to minimal important difference (MID); avoid over-interpreting transient between-visit differences.
- Death/progression handling: never "carry forward" observed scores from before death (LOCF rarely corresponds to a meaningful estimand under ICH E9(R1) and is widely discouraged by regulators); use composite endpoints or explicit estimand-aligned imputation.
- Open-label bias: PROs in open-label trials are susceptible to response bias; supportive blinded endpoints and sensitivity analyses required.
- Baseline balance: strong baseline-outcome correlation → always include baseline and baseline×visit.
- Convergence: UN covariance with many visits/small N may fail; use Toeplitz fallback and document in SAP.
SAP Template — Primary MMRM with Sensitivity Hierarchy
Primary analysis (hypothetical estimand). Change from baseline in [QLQ-C30 Global Health Status] at Week 24 is analyzed using a mixed model for repeated measures including fixed effects for treatment, scheduled visit (categorical), treatment-by-visit interaction, baseline score, baseline-by-visit interaction, and stratification factors [...]. An unstructured covariance matrix models within-subject errors with Kenward–Roger denominator degrees of freedom. The between-treatment LS-mean difference at Week 24 with two-sided 95% CI is the primary estimate. Data after treatment discontinuation or initiation of subsequent anticancer therapy are set to missing (hypothetical strategy).
Sensitivity 1: Covariance robustness. Repeat MMRM with Toeplitz covariance.
Sensitivity 2: MAR via MI. Multiple imputation (M=100) under MAR using treatment, baseline, covariates, and observed post-baseline values; analyze each completed dataset by ANCOVA at Week 24; pool via Rubin's rules.
Sensitivity 3: MNAR reference-based. Jump-to-Reference (J2R) imputation of post-ICE values in the experimental arm using the control arm as reference (rbmi, M=100).
Sensitivity 4: Delta-adjusted tipping point. Impute under MAR, then shift imputed values in the experimental arm by δ ∈ {0, 2, 4, …, 20} in the unfavorable direction; report the smallest δ at which the two-sided 95% CI for the Week-24 LS-mean difference includes zero, and interpret against MID=10.
Sensitivity 5: Composite supportive. Time to definitive deterioration (confirmed ≥10-point worsening) with death as event; Kaplan–Meier and stratified log-rank.
Supportive: Treatment policy. Repeat primary MMRM retaining all observed data regardless of ICE occurrence.
Decision rule: primary claim is based on Primary analysis; claim is considered robust if sensitivities 1–4 agree in direction with Primary, and the tipping-point δ exceeds the MID.
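The decision rule can be encoded as a simple check, which makes it unambiguous at SAP sign-off. This sketch assumes the estimates passed in are the Week-24 differences from Sensitivities 1 to 3, and that a tipping point not reached within the δ grid is passed as None and counts as robust; both are assumptions, as are the names and numbers.

```python
def claim_is_robust(primary_diff, sensitivity_diffs, tipping_delta, mid=10.0):
    """Apply the robustness rule to the primary and sensitivity estimates.

    primary_diff: primary LS-mean difference (must be non-zero).
    sensitivity_diffs: differences from the sensitivity analyses.
    tipping_delta: smallest delta that removes significance, or None.
    """
    same_direction = all(d * primary_diff > 0 for d in sensitivity_diffs)
    tipping_ok = tipping_delta is None or tipping_delta > mid
    return same_direction and tipping_ok


print(claim_is_robust(5.2, [5.0, 4.8, 3.1], tipping_delta=12))   # True
print(claim_is_robust(5.2, [5.0, 4.8, 3.1], tipping_delta=10))   # False
print(claim_is_robust(5.2, [5.0, 4.8, -0.5], tipping_delta=12))  # False
```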
Regulatory Precedent
Fewer than three endpoint-specific precedents were identified in the source material. One explicit example is cited in the FDA 2018 Cancer Endpoints guidance:
| NCT# | Trial | Drug | Indication | Endpoint | Outcome |
|---|---|---|---|---|---|
| NCT00952289 | COMFORT-I | Ruxolitinib | Myelofibrosis | MF-SAF Total Symptom Score ≥50% reduction at Week 24 (composite symptom endpoint, MMRM supportive) | Supported regular approval (2011); cited in FDA 2018 endpoint guidance as exemplar of composite symptom endpoint |
Additional references from ICH E9(R1) and the estimand literature: RECORD-1 (everolimus; crossover-adjusted OS via RPSFT) illustrates estimand-aligned sensitivity analysis in a related setting but is not a PRO precedent.
Limitations and Pitfalls
- MAR is unverifiable — informative dropout (progression, death, toxicity) is the norm in oncology, so reference-based and delta sensitivity analyses are effectively mandatory, not optional.
- LOCF is deprecated — biases results and violates ICH E9(R1) alignment principles; persists only as historical reference.
- Over-reliance on while-on-treatment estimates — creates survivorship bias; should never stand alone for symptom-benefit claims.
- PRO open-label bias — unblinded designs cannot distinguish pharmacologic effect from expectation; strong regulatory skepticism for labeling claims.
- Ceiling/floor effects in QoL instruments limit MMRM validity; consider rank-based or mixture approaches.
- Multiplicity abuse — "significance at any visit" claims without pre-specified primary visit will not survive review.
- Schedule misalignment — differential assessment frequency between arms corrupts visit-by-visit inference.
Backlinks
- ICH E9(R1) Estimand Framework
- Intercurrent Events in Oncology Trials
- Sensitivity Analyses for Estimands
- Missing Data: Mechanisms, Methods, and Estimand-Driven Strategy
- Statistical Analysis Methods in Oncology Trials
- Time-to-Event Assumptions and Nonproportional Hazards
- Response, Binary, and Disease-Control Endpoint Methods
- Sensitivity Analysis Playbook for Oncology Trials
- Multiplicity Control in Oncology Trials
Source: ICH E9(R1) Addendum on Estimands and Sensitivity Analysis (2019, Final); ICH E8(R1) General Considerations for Clinical Studies (2021, Final); FDA Clinical Trial Endpoints for the Approval of Cancer Drugs and Biologics (December 2018, Final); MMRM, reference-based imputation, and estimand-oncology literature summaries. Status: Final guidance (all three regulatory sources). Compiled from retrieved FDA chunks and literature summaries.