This is the last in a series of three posts on missing data imputation.
- In the first post, “Considerations for Missing Data Imputation”, we reviewed the underlying assumptions and limitations of single imputation methods, in particular last observation carried forward (LOCF). Although LOCF is simple to apply and understand, we gave examples where it may either overestimate or underestimate the treatment effect, depending on the disease under study and the reasons for dropout.
- In the second post, “Analyzing ‘Missing at Random’ Data”, we considered more sophisticated methods for handling missing longitudinal data: Mixed Models for Repeated Measures (MMRM) and Multiple Imputation (MI). These methods have a stronger theoretical basis and are generally preferred by the regulatory authorities over single imputation methods. An important assumption for both MMRM and MI is that the data are ‘missing at random’ (MAR), if not ‘missing completely at random’ (MCAR).
But what if this assumption is not true for all the missing data?
For many types of dropout and missing data scenarios, it may be reasonable to assume data are MAR. Often, however, some of the missing data in your study may be neither MCAR nor MAR, but ‘missing not at random’ (MNAR). Data are MNAR when the probability that an observation is missing depends on the unobserved values themselves, even after accounting for the observed data. This means that neither the prior observed values nor the covariates in the model can reliably predict the missing values. For example, in an antihypertensive study, if patients whose blood pressures were substantially lowered after several visits decided to discontinue, a statistical model may predict low values for the subsequent missing blood pressure measurements, when there is a strong likelihood the readings would have increased after treatment discontinuation. Because there are numerous reasons that data may be missing for patients in a clinical trial, and it is not possible to rule out that some data are MNAR, analyses that rely solely on the assumption of MCAR or MAR may lead to biased estimates of the treatment effect.
For analyses of primary and secondary endpoints that are based on MCAR or MAR assumptions, sensitivity analyses should be performed to assess the impact of deviations from these assumptions. This is particularly important when a significant treatment effect is found, but the number and/or type of patient discontinuations are imbalanced between treatment groups. In this situation, the validity of the result could be questioned. One type of sensitivity analysis that explores the impact of MNAR data, and that has been applied successfully in many regulatory submissions, is the tipping point analysis.
In a tipping point analysis, missing data are imputed over a range of possible scenarios for the treatment effect. This is done in order to identify the scenario or ‘tipping point’ where the treatment effect in subjects with missing data overturns the significant treatment effect obtained in the MCAR or MAR analysis. Researchers then use their knowledge of the disease and patient population to assess the plausibility of the tipping point scenario. If considered unlikely, this provides support for the treatment effect found in the MCAR or MAR analysis. The intuitiveness of this approach is appealing for regulatory decision makers.
Let’s consider an example.
Suppose a placebo-controlled clinical trial is being conducted to test the efficacy of Drug X for lowering LDL-cholesterol (LDL-C) in patients with baseline levels between 130 and 180 mg/dL. The primary endpoint is change from baseline in fasting LDL-C levels after 12 weeks of treatment. Several patients in both treatment groups discontinue prior to completing the 12-week treatment period. An MMRM model with terms for treatment, visit, treatment-by-visit interaction and baseline covariates is fit using SAS PROC MIXED to analyze the primary endpoint. The treatment effect is found to be significant, so a tipping point analysis is performed to examine the robustness of the result to departures from the MAR assumption of the MMRM model.
The tipping point analysis is conducted as follows. First, missing LDL-C data at Week 12 and at any prior visit are imputed under the MAR assumption using SAS PROC MI. Then, at each visit with imputed values, the imputed LDL-C values for Drug X are made worse by adding a delta, defined as k times the treatment difference observed at that visit. Here k is a shift parameter that is incremented over a range of scenarios for the treatment effect, in order to identify the point at which the primary analysis result becomes non-significant.
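As a rough illustration of the delta adjustment, the following Python sketch shifts only the imputed Drug X values. It is not the SAS implementation: the `Record` class, the `apply_delta_shift` function, and the `benefit_by_visit` mapping are hypothetical names, and the numbers are made up. The sketch assumes the treatment difference is expressed as placebo minus Drug X in mean change from baseline, so that a positive delta raises (worsens) LDL-C.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Record:
    arm: str        # "drug_x" or "placebo" (hypothetical labels)
    visit: int      # visit week
    change: float   # change from baseline in LDL-C, mg/dL
    imputed: bool   # True if the value was filled in by MAR imputation


def apply_delta_shift(records, k, benefit_by_visit):
    """Return a new list in which imputed Drug X values are worsened.

    k is the shift parameter as a fraction (0.5 means 50%).
    benefit_by_visit maps visit -> observed treatment difference,
    assumed here to be placebo minus Drug X (positive if Drug X helps).
    Observed values and all placebo values are left untouched.
    """
    shifted = []
    for r in records:
        if r.arm == "drug_x" and r.imputed:
            delta = k * benefit_by_visit[r.visit]
            shifted.append(replace(r, change=r.change + delta))
        else:
            shifted.append(r)
    return shifted


# Illustrative values only: a 30 mg/dL observed difference at Week 12.
records = [
    Record("drug_x", 12, -35.0, True),    # imputed, will be shifted
    Record("drug_x", 12, -40.0, False),   # observed, unchanged
    Record("placebo", 12, -5.0, True),    # placebo, unchanged
]
shifted = apply_delta_shift(records, k=0.5, benefit_by_visit={12: 30.0})
```

With k = 0 the imputed values are returned unchanged, which corresponds to the MAR scenario; with k = 1 each imputed Drug X value is moved by the full observed difference.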
When k = 0%, the scenario is equivalent to the MAR assumption used in the primary analysis, and when k = 100%, the effect of Drug X in patients with missing data is essentially equivalent to that of placebo. Values of k > 100% indicate scenarios where the effect of Drug X is worse than that of placebo. For each value of k, 30 (or more) imputed datasets are generated using PROC MI. Each of the completed datasets is then analyzed using the MMRM model, and the results are combined using PROC MIANALYZE.
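PROC MIANALYZE combines the per-dataset results using Rubin's rules. The sketch below shows that pooling step in Python for a single treatment-effect estimate. The function name `pool_rubin` is hypothetical and the inputs are illustrative. As a simplification, the p-value uses a large-sample normal approximation, whereas PROC MIANALYZE bases its inference on a t distribution.

```python
import math
from statistics import NormalDist, mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one parameter across m imputed datasets with Rubin's rules.

    estimates  : treatment-effect estimate from each imputed dataset
    std_errors : the matching standard errors
    Returns (pooled estimate, pooled standard error, two-sided p-value).
    """
    m = len(estimates)
    pooled = mean(estimates)
    # Within-imputation variance: average of the squared standard errors.
    within = mean([se ** 2 for se in std_errors])
    # Between-imputation variance: sample variance of the estimates.
    between = variance(estimates)
    total = within + (1 + 1 / m) * between
    pooled_se = math.sqrt(total)
    # Normal approximation (an assumption of this sketch, not SAS behavior).
    z = pooled / pooled_se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return pooled, pooled_se, p_value


# Illustrative inputs: three imputed datasets instead of 30, for brevity.
estimate, se, p = pool_rubin([-28.0, -30.0, -32.0], [5.0, 5.0, 5.0])
```

The pooled standard error is larger than any single-dataset standard error whenever the estimates differ across imputations, which is how the uncertainty due to the missing data enters the final p-value.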
As k is incremented, these calculations produce a p-value for the treatment effect corresponding to each value of k. The smallest value of k where the treatment effect is no longer significant is identified as the tipping point. Consideration is then given to how plausible the imputed values are for Drug X patients at the tipping point. For example, if the imputed LDL-C values for Drug X patients must be twice those of placebo patients before reaching the tipping point, they would not be considered likely values, and the primary analysis result under the MAR assumption would be supported.
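The search for the tipping point can be sketched as a simple scan over the p-values. In this Python sketch, `find_tipping_point` is a hypothetical helper, the 0.05 significance level is an assumption (use whatever level the primary analysis specifies), and the p-values are invented for illustration, not results from any study.

```python
def find_tipping_point(p_values_by_k, alpha=0.05):
    """Return the smallest k whose pooled p-value is not significant.

    p_values_by_k maps each shift parameter k (as a fraction) to the
    pooled p-value obtained for that scenario. Returns None when the
    result stays significant over the whole range of k tested.
    """
    for k in sorted(p_values_by_k):
        if p_values_by_k[k] >= alpha:
            return k
    return None


# Invented p-values for k = 0%, 50%, 100%, 150%, 200%.
p_values = {0.0: 0.001, 0.5: 0.010, 1.0: 0.040, 1.5: 0.080, 2.0: 0.200}
tipping_point = find_tipping_point(p_values)  # 1.5 for these inputs
```

In this made-up example the result only loses significance once the imputed Drug X values are assumed to be worse than placebo (k > 100%), which is the kind of scenario a reviewer would then judge for plausibility.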
Despite attempts to minimize the amount of missing data in clinical trials through proper study design and conduct, some missing data are inevitable. If not properly addressed in the analysis, missing data can raise questions from regulatory agencies regarding the validity of the results, particularly in controlled trials where the amount or types of missing data are imbalanced among the treatment groups. The methods most commonly used in the industry today are MMRM and MI, both of which assume MAR data. As the MAR assumption cannot be verified, analyses are incomplete without sensitivity analyses, such as a tipping point analysis.