Longitudinal, PRO, and Repeated-Measures Methods

Definition

Longitudinal and repeated-measures methods estimate treatment effects on outcomes measured multiple times per subject across scheduled visits (symptom scores, QoL scales, biomarkers, tumor burden). In oncology, these analyses most commonly target patient-reported outcome (PRO) endpoints and exploratory biomarker trajectories.

Per ICH E9(R1) (Final, 2019), the estimand framework requires longitudinal analyses to state explicitly how intercurrent events (ICEs) such as treatment discontinuation, rescue therapy, progression, or death are handled. The chosen strategy determines which post-ICE data are relevant to the estimand; relevant data that were not collected are missing data that "needs to be addressed as a missing data problem in the statistical analysis" once the estimand is fixed.

Per the FDA Clinical Trial Endpoints for the Approval of Cancer Drugs and Biologics guidance (Final, December 2018), symptom endpoints may include "specific symptom endpoints" or "composite symptom endpoints, such as the myelofibrosis symptom assessment form," and time-to-event symptom analyses.

PRO Endpoints in Oncology: Instrument Selection and Labeling Claims

PRO instruments must be fit-for-purpose, validated in the target tumor population, and pre-specified with a clear conceptual framework per the FDA Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims guidance (Final, 2009) and the FDA Core Patient-Reported Outcomes in Cancer Clinical Trials guidance (Draft, 2021 / Final, 2024). Core oncology instruments and their typical claim targets:

Instrument	Domain	Typical oncology use	Labeling claim precedent
EORTC QLQ-C30 + disease modules (LC13, BR23, CR29, OV28, MY20)	HRQoL, functioning, symptoms	Global QoL and functional scales in Phase 3 metastatic trials	Supportive labeling (e.g., enzalutamide mCRPC QoL maintenance)
FACT-G / FACT-L / FACT-P / FACT-B	HRQoL, disease-specific	U.S. pivotal trials; physical/functional/social/emotional	Supportive labeling in prostate, lung, breast
MDASI (core + modules)	Symptom burden	Symptom improvement endpoints	Limited stand-alone; supportive
MF-SAF / MPN-SAF TSS	Myelofibrosis symptoms	TSS50 response at Week 24	Efficacy labeling as a key secondary endpoint (ruxolitinib, COMFORT-I; the primary endpoint was spleen volume reduction)
BPI (Brief Pain Inventory)	Pain severity / interference	Bone-pain endpoints in mCRPC, bone metastases	Pain-response labeling (radium-223, abiraterone)
PRO-CTCAE	Symptomatic AE	Tolerability / Project Optimus dose-finding	Supportive tolerability characterization
EQ-5D-5L	Health utility	HEOR and reimbursement; not efficacy labeling	Not an efficacy claim driver

Core principles for instrument selection:

Concept of interest must match the claim (symptom improvement ≠ HRQoL maintenance ≠ functional benefit).
Content validity demonstrated in the specific tumor population and line of therapy.
Recall period and administration frequency aligned with the estimand's time horizon.
Psychometric evidence (reliability, construct validity, responsiveness) pre-submitted; meaningful-change threshold (MID) anchored in that population.
Missing data strategy is part of fit-for-purpose evaluation — instruments with chronic high attrition cannot support a labeling claim even with valid psychometrics.

Labeling-claim archetypes:

Symptom improvement claim: primary or co-primary symptom endpoint (e.g., MF-SAF TSS50); requires a blinded design or strong sensitivity analyses, a pre-specified responder definition anchored to the MID, and reference-based and tipping-point sensitivity analyses for MNAR departures.
Symptom delay / time-to-deterioration claim: composite time to definitive deterioration (TTDD) with death counted as an event; stratified log-rank test, with a competing-risk sensitivity analysis.
HRQoL maintenance claim: MMRM on global QoL across visits with a pre-specified primary visit; typically supports descriptive labeling rather than an efficacy claim.
Tolerability characterization (non-claim): PRO-CTCAE frequency, severity, and interference descriptives; informs the safety section of the prescribing information.

Regulatory Position

ICH E9(R1) (Final, 2019): the main estimator must be aligned with the estimand, and "to explore the robustness of inferences from the main estimator to deviations from its underlying assumptions, a sensitivity analysis should be conducted." When the primary analysis invokes MAR, missing-data-specific sensitivity analyses (e.g., reference-based imputation, delta adjustment) are the usual way to meet this expectation.
ICH E8(R1) (Final, 2021) — patient-centered quality factors and pre-specified statistical analysis plans are "critical to quality" in confirmatory oncology studies, including PRO endpoints.
FDA Cancer Endpoints Guidance (Final, 2018) — supports symptom/PRO endpoints for regular approval when clinically meaningful, well-defined, and protected from bias (blinding, complete capture, pre-specified analysis). PROs rarely support accelerated approval alone; they most often support labeling claims or serve as key secondary endpoints alongside OS/PFS.

When to Use

Symptom-directed endpoints — myelofibrosis (MF-SAF Total Symptom Score 50% response; Jakafi), CRPC bone pain, myelodysplastic syndrome fatigue, cachexia interventions.
QoL/functioning — EORTC QLQ-C30, FACT-G, and disease-specific modules (QLQ-LC13 NSCLC, QLQ-BR23 breast, FACT-L, FACT-P) as secondary endpoints in metastatic Phase 3 trials.
Biomarkers — longitudinal ctDNA, PSA kinetics (PCWG3), tumor size (sum of diameters, RECIST 1.1) for exposure-response and dose-optimization (Project Optimus).
Settings — primarily metastatic/advanced disease; maintenance and supportive-care trials; neoadjuvant symptom burden; post-transplant GVHD symptom tracking.

Design Considerations

Model choice

Scenario	Preferred model	Rationale
Two timepoints (baseline + one post-baseline)	ANCOVA adjusting for baseline	Most efficient; no covariance to model
≥3 visits, continuous outcome, MAR plausible	MMRM with visit as categorical fixed effect	Uses all partial data; valid under MAR; no explicit imputation
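Binary/count repeated outcomes	GLMM or GEE	Logit/log link; GLMM is likelihood-based (valid under MAR, subject-specific effects); GEE gives population-averaged effects with robust SE but assumes MCAR unless weighted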
Trajectory shape of interest	Random-slope LMM	Models subject-specific growth

MMRM specification (primary template)

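Change from baseline ~ treatment + visit + treatment×visit + baseline + baseline×visit + stratification covariates, with an unstructured (UN) within-subject covariance, REML estimation, and Kenward–Roger denominator degrees of freedom. R: mmrm::mmrm(), or nlme::gls() with corSymm plus varIdent weights by visit (corSymm alone gives only an unstructured correlation, and gls() does not provide Kenward–Roger degrees of freedom); SAS: PROC MIXED with REPEATED / TYPE=UN and DDFM=KR.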
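Written out, the template corresponds to the model below. The symbols are notation introduced here for illustration, not taken from a specific SAP: Y_ij is the score of subject i at post-baseline visit j, Y_i0 the baseline score, T_i the treatment indicator, and x_i the stratification covariates. The visit-specific coefficients absorb the main effects and their interactions with visit (a cell-means form of the same model).

```latex
\begin{aligned}
Y_{ij} - Y_{i0} &= \mu_j + \tau_j\,T_i + \beta_j\,Y_{i0}
                   + \mathbf{x}_i^{\top}\boldsymbol{\gamma} + \varepsilon_{ij},
                   \qquad j = 1,\dots,J, \\
\boldsymbol{\varepsilon}_i &= (\varepsilon_{i1},\dots,\varepsilon_{iJ})^{\top}
                   \sim N_J(\mathbf{0},\,\boldsymbol{\Sigma}),
                   \qquad \boldsymbol{\Sigma}\ \text{unstructured, } J(J+1)/2 \text{ parameters.}
\end{aligned}
```

Under this notation the primary contrast is the treatment effect at the designated primary visit (for example, the coefficient τ_j for Week 24), and the J(J+1)/2 covariance parameters explain why UN becomes hard to fit as the number of visits grows.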

MMRM assumptions

MAR conditional on observed outcomes, baseline, and covariates in the model
Covariance structure: UN preferred when visits ≤ ~6; fallback Toeplitz or AR(1) for many visits or convergence failure
Visits treated as categorical; schedule aligned across arms (analysis windows pre-specified)
Estimand-linked: with post-ICE data set to missing, MMRM under MAR estimates the hypothetical effect had subjects remained on treatment and been assessed per schedule; the ICE strategy must therefore be declared explicitly

Assessment schedule

Pre-specify PRO completion windows (e.g., ±7 days of imaging visit), order of administration (PRO before clinical contact to avoid bias), and analysis visits.
Compliance thresholds (commonly ≥70% baseline, ≥60% at analysis visit) should be monitored; deviations trigger sensitivity analyses.
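Compliance monitoring of this kind can be sketched as follows. This is a minimal illustration, assuming each patient-visit record carries a flag for whether a form was expected (alive, on study, inside the window) and whether it was completed; the record layout and the function names are hypothetical.

```python
from collections import defaultdict


def completion_rates(records):
    """Return {visit: completed / expected} from (visit, expected, completed) tuples.

    Only patient-visits where a form was expected enter the denominator
    (assumption: "expected" means alive, on study, and inside the window).
    """
    expected = defaultdict(int)
    completed = defaultdict(int)
    for visit, is_expected, is_completed in records:
        if not is_expected:
            continue
        expected[visit] += 1
        if is_completed:
            completed[visit] += 1
    return {visit: completed[visit] / n for visit, n in expected.items()}


def visits_below_threshold(rates, thresholds, default=0.60):
    """List visits whose completion rate falls below its threshold."""
    return [v for v, r in rates.items() if r < thresholds.get(v, default)]


# Illustrative records only (not trial data).
records = [
    ("baseline", True, True), ("baseline", True, True),
    ("baseline", True, True), ("baseline", True, False),
    ("week24", True, True), ("week24", True, False),
    ("week24", True, False), ("week24", False, False),  # form not expected
]
rates = completion_rates(records)
flagged = visits_below_threshold(rates, {"baseline": 0.70, "week24": 0.60})
print(rates, flagged)
```

The choice of denominator matters: counting only expected forms keeps deaths and withdrawals from being read as non-compliance, but it should be reported alongside the number of patients still eligible at each visit.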

Alpha allocation & multiplicity

PRO endpoints are typically secondary: alpha recycled via graphical testing (Bretz/Maurer) or fixed-sequence gatekeeping after primary PFS/OS success.
Across visits: one designated primary visit (e.g., Week 24) for hypothesis testing; other visits descriptive. Avoid "significance at any visit" without adjustment.
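A fixed-sequence gatekeeping procedure, the simplest of the approaches named above, can be sketched as below. The endpoint names and p-values are made up for illustration, and the function name is hypothetical; graphical procedures with alpha recycling need more machinery than this.

```python
def fixed_sequence_test(ordered_p_values, alpha=0.05):
    """Test hypotheses in their pre-specified order at full alpha.

    Testing stops at the first non-significant result; every later
    hypothesis is left untested, whatever its p-value.
    """
    rejected = []
    for name, p_value in ordered_p_values:
        if p_value <= alpha:
            rejected.append(name)
        else:
            break
    return rejected


# Illustrative hierarchy: PRO endpoints sit behind the primary endpoints.
hierarchy = [
    ("PFS", 0.001),
    ("OS", 0.030),
    ("QLQ-C30 GHS, Week 24", 0.200),
    ("Pain score, Week 24", 0.010),
]
print(fixed_sequence_test(hierarchy))
```

In this example the pain endpoint is not declared significant despite its small p-value, because the hypothesis ahead of it in the sequence failed; this is the behavior that protects against "significance at any visit" claims.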

Intercurrent Events

For PRO/repeated-measures endpoints in oncology, the dominant ICEs are:

Disease progression with treatment discontinuation
- Strategy: Hypothetical (what would scores be if subjects remained on treatment) for symptom-improvement claims; Treatment policy (follow-up regardless) for disease-related symptom endpoints.
- Statistical consequence: Hypothetical → MMRM under MAR + reference-based/delta sensitivity; Treatment policy → requires continued PRO collection post-progression.
- SAP template: "For patients who discontinue treatment due to progression, the hypothetical strategy is applied; data collected after discontinuation are set to missing and handled under MAR by the direct-likelihood MMRM (no explicit imputation), with reference-based (J2R) multiple imputation as sensitivity."
Death
- Strategy: Composite (deterioration or death = failure) for time-to-deterioration PROs; While-on-treatment in its "while alive" form for symptom scores among survivors.
- Statistical consequence: Composite → survival-style analysis (Kaplan–Meier, Cox); while-alive → estimates are conditional on survival and biased toward healthier patients.
- SAP template: "Time to definitive deterioration is analyzed as a composite endpoint where death without prior deterioration is treated as an event."
Rescue/subsequent anticancer therapy
- Strategy: Treatment policy for health-utility/QoL; Hypothetical for symptom-pharmacology claims.
- Statistical consequence: Treatment policy requires continued data collection post-switch; hypothetical requires setting post-switch data to missing, with MAR as the primary assumption and MNAR sensitivity analyses.
- SAP template: "Data following initiation of subsequent systemic anticancer therapy are retained and included under the treatment-policy strategy for QoL endpoints; under the hypothetical strategy for targeted symptom endpoints, post-switch data are excluded and imputed."

Missing Data Strategy by Estimand

Estimand strategy	Primary analysis	Sensitivity analyses
Treatment policy	Direct-likelihood MMRM or MI using all observed data incl. post-ICE	Tipping-point analysis; pattern-mixture with varying MAR extrapolation
Hypothetical	MMRM under MAR (discard post-ICE data)	Reference-based (J2R, CIR, CR); delta adjustment
Composite (time-to-deterioration)	KM + stratified log-rank, Cox	Competing-risk (Gray's test) treating death as competing
While-on-treatment	MMRM on observed windows	Restrict to subjects with ≥X visits; selection-model sensitivity

Multiple Imputation under MAR

Impute missing outcomes from posterior predictive distribution conditional on observed data, treatment, baseline, covariates (M ≥ 50 imputations).
Analyze each completed dataset via ANCOVA/MMRM.
Pool using Rubin's rules. R: mice, mi, Hmisc::aregImpute.
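The pooling step can be sketched with the standard library alone. The inputs are the per-imputation treatment-effect estimates and their squared standard errors; the function name is hypothetical, and the degrees of freedom are the classical Rubin (1987) formula, not the small-sample Barnard–Rubin adjustment.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool M completed-data results with Rubin's rules.

    estimates: point estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need M >= 2 estimates with matching variances")
    q_bar = mean(estimates)              # pooled point estimate
    u_bar = mean(variances)              # within-imputation variance
    b = variance(estimates)              # between-imputation variance (M - 1 denominator)
    total = u_bar + (1 + 1 / m) * b      # total variance
    if b == 0:
        df = math.inf                    # no between-imputation variability
    else:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, math.sqrt(total), df


# Illustrative values only.
estimate, se, df = pool_rubin([4.0, 5.0, 6.0], [1.0, 1.0, 1.0])
print(estimate, se, df)
```

The (1 + 1/M) factor is what makes a larger M worthwhile: it shrinks the simulation noise added to the total variance.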

Reference-Based Imputation under MNAR

Core idea (Carpenter/Roger): after an ICE in the active arm, impute as if the subject followed the control-arm trajectory. This gives controlled, typically conservative departures from MAR that can be aligned with the hypothetical estimand.

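J2R (Jump to Reference): the post-ICE mean jumps to the control mean immediately.
CIR (Copy Increments in Reference): preserves the pre-ICE level; future increments follow the reference arm.
CR (Copy Reference): the subject's whole mean profile, pre- and post-ICE, is taken from the reference arm; observed pre-ICE values are kept and imputations are drawn conditional on them, so a favorable pre-ICE residual decays gradually rather than vanishing at once.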

R: mimix, RefBasedMI, rbmi (Roche). SAS: PROC MI with MNAR statement.
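The three rules differ only in how the marginal mean profile of an active-arm subject is built before imputation. The sketch below shows that construction and nothing more: a real implementation (for example in rbmi) draws imputations from the conditional multivariate normal distribution given the observed pre-ICE values, which is omitted here. The function name and the visit means are hypothetical.

```python
def reference_based_means(active_means, reference_means, last_visit_before_ice, method):
    """Marginal mean profile under J2R, CIR, or CR.

    active_means, reference_means: per-visit arm means (same length)
    last_visit_before_ice: 0-based index of the last visit before the ICE
    """
    n = len(active_means)
    k = last_visit_before_ice
    if n != len(reference_means) or not 0 <= k < n:
        raise ValueError("inconsistent inputs")
    if method == "CR":
        # Whole profile comes from the reference arm.
        return list(reference_means)
    pre = list(active_means[: k + 1])
    if method == "J2R":
        # Post-ICE means switch to the reference arm at once.
        post = list(reference_means[k + 1:])
    elif method == "CIR":
        # Keep the active level at visit k, then add reference increments.
        post = [active_means[k] + (reference_means[j] - reference_means[k])
                for j in range(k + 1, n)]
    else:
        raise ValueError("method must be 'J2R', 'CIR', or 'CR'")
    return pre + post


# Illustrative visit means (change from baseline), ICE after the second visit.
active = [2.0, 5.0, 8.0, 10.0]
reference = [1.0, 2.0, 3.0, 4.0]
for rule in ("J2R", "CIR", "CR"):
    print(rule, reference_based_means(active, reference, 1, rule))
```

With these inputs CIR retains the benefit accrued up to the ICE while J2R removes it immediately, which is why J2R is usually the more conservative choice.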

Delta-Adjusted (Tipping-Point) Sensitivity

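Imputed values in the experimental arm are shifted by δ in the unfavorable direction (for example, subtracted on a scale where higher scores are better). δ is increased until the treatment effect loses statistical significance; this "tipping point" is then compared with clinically plausible magnitudes such as the MID. Results are reported as the range of δ over which significance is sustained.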

Composite and While-on-Treatment Approaches for PRO Deterioration

Time to Deterioration (TTD) — first ≥MID worsening from baseline (e.g., ≥10-point on QLQ-C30 functional scale). Death without prior deterioration counted as event (composite).
Time to Definitive Deterioration (TTDD) — requires confirmed worsening at two consecutive visits without subsequent recovery.
Responder analyses — proportion achieving ≥MID improvement sustained at primary timepoint.
While-on-treatment means — restrict analysis window to on-treatment visits; interpret cautiously (selection bias toward responders).
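Deriving the time-to-event variables from visit-level scores can be sketched as follows. Definitions of definitive deterioration vary between protocols; this sketch assumes that a worsening of at least the MID must be confirmed at the next observed assessment and at every later one, that an unconfirmed worsening at the last assessment does not count, and that patients without an event are censored at their last observed assessment. The function name is hypothetical.

```python
def time_to_deterioration(baseline, visits, mid=10.0, death_time=None,
                          higher_is_better=True, definitive=False):
    """Return (time, event) for TTD, or TTDD when definitive=True.

    visits: list of (time, score); score is None for a missed assessment.
    event is True for deterioration or death, False for censoring.
    """
    sign = 1.0 if higher_is_better else -1.0
    # Worsening from baseline at each observed assessment, in time order.
    observed = [(t, sign * (baseline - s))
                for t, s in sorted(visits, key=lambda v: v[0]) if s is not None]
    for i, (t, worsening) in enumerate(observed):
        if worsening < mid:
            continue
        if not definitive:
            return t, True
        later = observed[i + 1:]
        # Definitive: confirmed at least once and never recovered afterwards.
        if later and all(w >= mid for _, w in later):
            return t, True
    if death_time is not None:
        return death_time, True          # composite: death counts as an event
    last_time = observed[-1][0] if observed else 0.0
    return last_time, False              # censored at last observed assessment


# Illustrative patient: baseline 70 on a scale where higher is better.
visits = [(6, 65.0), (12, 55.0), (18, 68.0), (24, 50.0), (30, 45.0)]
print(time_to_deterioration(70.0, visits))
print(time_to_deterioration(70.0, visits, definitive=True))
```

In this example the transient drop at Week 12 triggers TTD but not TTDD, which only starts at Week 24 once the worsening persists.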

Practical Issues

Multiplicity across visits: prespecify one primary analysis visit; use hierarchical or graphical alpha for multi-visit claims.
Change-from-baseline interpretability: report least-squares mean difference with 95% CI and anchor to minimal important difference (MID); avoid over-interpreting transient between-visit differences.
Death/progression handling: never "carry forward" observed scores from before death (LOCF is discouraged by ICH E9(R1) and FDA); use composite endpoints or explicit estimand-aligned imputation.
Open-label bias: PROs in open-label trials are susceptible to response bias; supportive blinded endpoints and sensitivity analyses required.
Baseline balance: strong baseline-outcome correlation → always include baseline and baseline×visit.
Convergence: UN covariance with many visits/small N may fail; use Toeplitz fallback and document in SAP.

SAP Template — Primary MMRM with Sensitivity Hierarchy

Primary analysis (hypothetical estimand). Change from baseline in [QLQ-C30 Global Health Status] at Week 24 is analyzed using a mixed model for repeated measures including fixed effects for treatment, scheduled visit (categorical), treatment-by-visit interaction, baseline score, baseline-by-visit interaction, and stratification factors [...]. An unstructured covariance matrix models within-subject errors with Kenward–Roger denominator degrees of freedom. The between-treatment LS-mean difference at Week 24 with two-sided 95% CI is the primary estimate. Data after treatment discontinuation or initiation of subsequent anticancer therapy are set to missing (hypothetical strategy).

Sensitivity 1 — Covariance robustness. Repeat MMRM with Toeplitz covariance.

Sensitivity 2 — MAR via MI. Multiple imputation (M=100) under MAR using treatment, baseline, covariates, and observed post-baseline values; analyze each completed dataset by ANCOVA at Week 24; pool via Rubin's rules.

Sensitivity 3 — MNAR reference-based. Jump-to-Reference (J2R) imputation of post-ICE values in the experimental arm using the control arm as reference (rbmi, M=100).

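Sensitivity 4: Delta-adjusted tipping point. Impute under MAR, then shift imputed values in the experimental arm by δ ∈ {0, 2, 4, …, 20} in the unfavorable direction (subtracted for Global Health Status, where higher scores are better); report the smallest δ at which the two-sided 95% CI for the Week-24 LS-mean difference first includes zero, and interpret it against MID=10.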

Sensitivity 5 — Composite supportive. Time to definitive deterioration (confirmed ≥10-point worsening) with death as event; Kaplan–Meier and stratified log-rank.

Supportive — Treatment policy. Repeat primary MMRM retaining all observed data regardless of ICE occurrence.

Decision rule: the primary claim is based on the Primary analysis; the claim is considered robust if Sensitivities 1–3 agree in direction with the Primary analysis and the tipping-point δ from Sensitivity 4 exceeds the MID.
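The decision rule reduces to a direction check plus a tipping-point comparison, sketched below. The function name and inputs are hypothetical, and treating "no tipping point found within the grid" as robust is an assumption that only holds when the grid extends beyond the MID.

```python
def claim_is_robust(primary_diff, sensitivity_diffs, tipping_delta, mid=10.0):
    """Apply the robustness rule to point estimates and the tipping-point delta.

    primary_diff: LS-mean difference from the primary MMRM
    sensitivity_diffs: LS-mean differences from the sensitivity analyses
    tipping_delta: smallest delta that removes significance, or None if
                   significance held over the whole grid (assumed robust)
    """
    same_direction = all(d * primary_diff > 0 for d in sensitivity_diffs)
    tipping_ok = tipping_delta is None or tipping_delta > mid
    return same_direction and tipping_ok


# Illustrative estimates only.
print(claim_is_robust(6.2, [5.9, 5.5, 3.1], tipping_delta=14))
print(claim_is_robust(6.2, [5.9, 5.5, -0.4], tipping_delta=14))
```

Agreement in direction is a weak criterion on its own; the magnitude of each sensitivity estimate relative to the MID would normally be reported alongside it.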

Regulatory Precedent

Endpoint-specific precedents are limited. One explicit example is cited in the FDA 2018 Cancer Endpoints guidance:

NCT#	Trial	Drug	Indication	Endpoint	Outcome
NCT00952289	COMFORT-I	Ruxolitinib	Myelofibrosis	MF-SAF Total Symptom Score ≥50% reduction at Week 24 (composite symptom endpoint; key secondary endpoint, MMRM supportive)	Supported regular approval (2011); cited in FDA 2018 endpoint guidance as exemplar of composite symptom endpoint

Related reference (from ICH E9(R1) and estimand literature): RECORD-1 (everolimus; crossover-adjusted OS via RPSFT) illustrates estimand-aligned sensitivity analysis in a related setting but is not a PRO precedent.

Limitations and Pitfalls

MAR is unverifiable — informative dropout (progression, death, toxicity) is the norm in oncology, so reference-based and delta sensitivity analyses are effectively mandatory, not optional.
LOCF is deprecated — biases results and violates ICH E9(R1) alignment principles; persists only as historical reference.
Over-reliance on while-on-treatment estimates — creates survivorship bias; should never stand alone for symptom-benefit claims.
PRO open-label bias — unblinded designs cannot distinguish pharmacologic effect from expectation; strong regulatory skepticism for labeling claims.
Ceiling/floor effects in QoL instruments limit MMRM validity; consider rank-based or mixture approaches.
Multiplicity abuse — "significance at any visit" claims without pre-specified primary visit will not survive review.
Schedule misalignment — differential assessment frequency between arms corrupts visit-by-visit inference.

Backlinks

Source: ICH E9(R1) Addendum on Estimands and Sensitivity Analysis (2019, Final); ICH E8(R1) General Considerations for Clinical Studies (2021, Final); FDA Clinical Trial Endpoints for the Approval of Cancer Drugs and Biologics (December 2018, Final); MMRM, reference-based imputation, and estimand-oncology literature summaries.
Status: Final guidance (all three regulatory sources). Compiled from retrieved FDA chunks + literature summaries.