Overcoming Missing Data in the Swedish National Study on Aging

In this new study, researchers compared three multiple imputation strategies for handling the systematically missing discrete variable of gait speed in the Swedish National Study on Aging and Care (SNAC).

Missing data in aging studies, especially in the assessment of gait speed (how quickly an individual walks a set distance), present a significant challenge. Older adults are more prone to health and functional issues, which often interfere with data collection efforts. Given that gait speed is a key indicator of functional status and overall health in older individuals, ensuring its availability and accurate measurement is essential for the integrity of aging research.

In a new study, researchers Robert Thiesmeier, Ahmad Abbadi, Debora Rizzuto, Amaia Calderón-Larrañaga, Scott M. Hofer, and Nicola Orsini from Karolinska Institutet, Stockholm University, Stockholm Gerontology Research Center, and Oregon Health and Science University address the challenge of systematically missing gait speed data in aging research. They explore the application of multiple imputation (MI), a statistical technique for handling such gaps in data. The team examined implementation strategies, methodologies, and the impact that these missing variables could have on the outcomes of aging studies, thereby offering a framework for managing and interpreting incomplete datasets in aging research. Their research paper, entitled “Multiple imputation of systematically missing data on gait speed in the Swedish National Study on Aging and Care,” was published on February 14, 2024, in Aging’s Volume 16, Issue 4.

“[…] this study aims to investigate and assess the performance of different MI strategies specifically targeting the systematically missing discrete variable of gait speed in the SNAC [Swedish National Study on Aging and Care] IPDMA [individual participant data meta-analyses] with only four large cohort studies.”

Setting the Context

Before delving into the specifics of the study, it helps to understand the broader context. Aging, as a biological process, presents numerous challenges, particularly in healthcare. Addressing these challenges requires comprehensive data to inform clinical diagnosis and prognosis. The Swedish National Study on Aging and Care (SNAC) is one initiative that aims to provide a holistic view of aging and of the health of older adults.

SNAC was launched in 2001 as an ongoing longitudinal cohort study based on samples of the Swedish elderly population. The study comprises four sites: Kungsholmen, Skåne, Nordanstig, and Blekinge. Each site collects data on health determinants, disease outcomes, functional capacity, and social conditions. SNAC’s diverse data collection has facilitated the development of an innovative Health Assessment Tool integrating indicators of both clinical and functional health in a population aged 60+ years.

SNAC, like any extensive study, faces the issue of missing data. One variable, gait speed, is systematically absent in one study site, Blekinge. This absence poses a significant challenge for researchers. They must decide between using complete data from only three studies, risking information loss and potential bias in combined estimates, or employing multiple imputation (MI) methods to estimate missing values based on observed data.

What is Multiple Imputation?

Gait speed, or the speed at which a person walks, is a simple but powerful indicator of health and functional status in older adults. It can predict the risk of mortality, disability, cognitive decline, and institutionalization. However, measuring gait speed is not always feasible in large-scale epidemiological studies, especially when participants are frail, have mobility limitations, or live in remote areas. This can result in missing data on gait speed, which can bias the estimates of its association with health outcomes and reduce the statistical power of the analyses.

One way to handle missing data on gait speed is to use multiple imputation, a statistical technique that replaces each missing value with a set of plausible values that reflect the uncertainty about the true value. Multiple imputation can reduce bias and increase precision compared to excluding cases with missing data or using a single imputation method. However, there are different ways to perform multiple imputation, and some may be more suitable than others depending on the type and pattern of missing data.
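To make the idea of "a set of plausible values" concrete: after each of the M imputed datasets has been analyzed separately, the M results are usually combined with Rubin's rules, which add the between-imputation variability to the average within-imputation variance. The sketch below is a minimal illustration of that pooling step only. The function name `pool_rubin` and the example numbers are hypothetical and are not taken from the SNAC paper.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates with Rubin's rules.

    estimates: one point estimate per imputed dataset (e.g. a log odds ratio)
    variances: the squared standard error of each of those estimates
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    pooled = mean(estimates)          # average of the M estimates
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # sample variance across imputations (divisor M - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical log odds ratios from M = 5 imputed datasets (illustrative only).
estimates = [0.50, 0.60, 0.40, 0.55, 0.45]
variances = [0.04, 0.05, 0.04, 0.05, 0.04]
pooled, total_var = pool_rubin(estimates, variances)
print(f"pooled estimate = {pooled:.3f}, total variance = {total_var:.4f}")
```

The between-imputation term is what separates multiple imputation from single imputation: it widens the final confidence interval to reflect uncertainty about the missing values themselves.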

The Study

In the current study, the researchers compared three multiple imputation strategies for dealing with systematically missing data on gait speed in the SNAC. The SNAC consists of four prospective cohort studies that measured gait speed at baseline and follow-up, except for one study that did not measure gait speed at all. The authors simulated 1000 individual participant data meta-analyses (IPDMA) based on the characteristics of the SNAC and evaluated the performance of three multiple imputation strategies: fully conditional specification (FCS), multivariate normal (MVN), and conditional quantile imputation (CQI).

The FCS method imputes each variable separately by using regression models that depend on the other variables in the dataset. The MVN method assumes that the data follow a multivariate normal distribution and imputes all variables simultaneously by using an expectation-maximization algorithm. The CQI method imputes discrete variables by using quantile regression models that preserve the distribution of the original data.

The authors analyzed the imputed datasets with a two-stage common-effect multivariable logistic model that estimated the effect of three levels of gait speed (<0.8 m/s, 0.8-1.2 m/s, >1.2 m/s) on 5-year mortality. They found that all three imputation methods performed relatively well in terms of bias and coverage of the confidence intervals. However, the CQI method showed the smallest bias and the best coverage for both low and high levels of gait speed. The FCS and MVN methods tended to overestimate the effect of low gait speed and underestimate the effect of high gait speed on mortality.
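As a rough illustration of the analysis described above, a two-stage common-effect approach first estimates the effect within each cohort and then combines the cohort-specific estimates with inverse-variance weights. The sketch below shows only the categorization of gait speed and the second-stage pooling. The category labels, the handling of the exact cut-points, the function names, and the example numbers are assumptions made for illustration, not details from the paper.

```python
import math


def gait_category(speed_m_per_s):
    """Map a gait speed to one of the three levels used in the analysis.

    Labels are hypothetical; values of exactly 0.8 or 1.2 m/s are assumed
    to fall in the middle category.
    """
    if speed_m_per_s < 0.8:
        return "low"
    if speed_m_per_s <= 1.2:
        return "intermediate"
    return "high"


def pool_common_effect(estimates, std_errors):
    """Second stage: inverse-variance weighted common-effect estimate."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


# Hypothetical first-stage log odds ratios from four cohorts (illustrative only).
cohort_estimates = [0.6, 0.4, 0.5, 0.7]
cohort_std_errors = [0.2, 0.2, 0.1, 0.2]
pooled_log_or, pooled_se = pool_common_effect(cohort_estimates, cohort_std_errors)
print(f"pooled log OR = {pooled_log_or:.3f} (SE {pooled_se:.3f})")
```

A common-effect model assumes that every cohort estimates the same underlying effect, so the most precise cohort (the one with the smallest standard error) dominates the pooled result.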


The authors concluded that multiple imputation can be a useful tool for dealing with systematically missing data on gait speed in IPDMA based on the SNAC. They recommended the CQI method as the preferred approach for imputing discrete variables such as gait speed, as it preserves the original distribution and avoids unrealistic values. They also highlighted the importance of reporting the details of the multiple imputation procedure and checking the plausibility of the imputed values.

This study provides valuable insights for researchers who face similar challenges with missing data on gait speed or other discrete variables in aging research. By using appropriate multiple imputation methods, they can improve the validity and reliability of their results and avoid losing valuable information.

Click here to read the full research paper published in Aging.

Aging is an open-access, traditional, peer-reviewed journal that has published high-impact papers in all fields of aging research since 2009. All papers are available to readers (at no cost and free of subscription barriers) in bi-monthly issues at Aging-US.com.


For media inquiries, please contact media@impactjournals.com.
