Overcoming Missing Data in the Swedish National Study on Aging

In this new study, researchers compared three multiple imputation strategies for overcoming the missing discrete variable of gait speed in the Swedish National Study on Aging and Care (SNAC).

Missing data in aging studies, especially in the assessment of gait speed (how quickly individuals cover a set distance), presents a significant challenge. The elderly are more prone to health and functional issues, which often interfere with data collection efforts. Given that gait speed is a key indicator of functional status and overall health in older individuals, ensuring its availability and accurate measurement is essential for the integrity of aging research.

In a new study, researchers Robert Thiesmeier, Ahmad Abbadi, Debora Rizzuto, Amaia Calderón-Larrañaga, Scott M. Hofer, and Nicola Orsini from Karolinska Institutet, Stockholm University, Stockholm Gerontology Research Center, and Oregon Health and Science University address the systematic challenge of missing gait speed data in aging research and explore the application of multiple imputation (MI), a statistical technique that has emerged as a constructive approach to handle such gaps in data. The team critically examined the implementation strategies, methodologies, and the impact that these missing variables could have on the outcomes of aging studies, thereby offering a framework to manage and interpret incomplete datasets in aging research. On February 14, 2024, their research paper was published in Aging’s Volume 16, Issue 4, entitled, “Multiple imputation of systematically missing data on gait speed in the Swedish National Study on Aging and Care.”

“[…] this study aims to investigate and assess the performance of different MI strategies specifically targeting the systematically missing discrete variable of gait speed in the SNAC [Swedish National Study on Aging and Care] IPDMA [individual participant data meta-analyses] with only four large cohort studies.”

Setting the Context

Before delving into the specifics of the study, it’s crucial to comprehend the broader context. Aging, as a biological process, presents numerous challenges, particularly in healthcare. Addressing these challenges requires comprehensive data to inform clinical diagnosis and prognosis. The Swedish National Study on Aging and Care (SNAC) is one such initiative that aims to provide a holistic view of aging through data collected from older adults.

SNAC was launched in 2001 as an ongoing longitudinal cohort study based on samples of the Swedish elderly population. The study comprises four sites: Kungsholmen, Skåne, Nordanstig, and Blekinge. Each site collects data on health determinants, disease outcomes, functional capacity, and social conditions. SNAC’s diverse data collection has facilitated the development of an innovative Health Assessment Tool integrating indicators of both clinical and functional health in a population aged 60+ years.

SNAC, like any extensive study, faces the issue of missing data. One variable, gait speed, is systematically absent in one study site, Blekinge. This absence poses a significant challenge for researchers. They must decide between using complete data from only three studies, risking information loss and potential bias in combined estimates, or employing multiple imputation (MI) methods to estimate missing values based on observed data.

What is Multiple Imputation?

Gait speed, or the speed at which a person walks, is a simple but powerful indicator of health and functional status in older adults. It can predict the risk of mortality, disability, cognitive decline, and institutionalization. However, measuring gait speed is not always feasible in large-scale epidemiological studies, especially when participants are frail, have mobility limitations, or live in remote areas. This can result in missing data on gait speed, which can bias the estimates of its association with health outcomes and reduce the statistical power of the analyses.

One way to handle missing data on gait speed is to use multiple imputation, a statistical technique that replaces each missing value with a set of plausible values that reflect the uncertainty about the true value. Multiple imputation can reduce bias and increase precision compared to excluding cases with missing data or using a single imputation method. However, there are different ways to perform multiple imputation, and some may be more suitable than others depending on the type and pattern of missing data.
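To make the phrase "a set of plausible values that reflect the uncertainty" concrete, here is a minimal sketch of how results from several imputed datasets are commonly combined using Rubin's rules. The function name and the numbers are hypothetical illustrations, not values or code from the study.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their variances using Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)      # pooled point estimate
    within = mean(variances)     # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)


# Hypothetical log odds ratios and squared standard errors from m = 5 imputed datasets
log_ors = [0.52, 0.47, 0.55, 0.50, 0.46]
ses_sq = [0.010, 0.011, 0.009, 0.010, 0.010]
estimate, se = pool_rubin(log_ors, ses_sq)
print(f"pooled log OR = {estimate:.3f}, SE = {se:.3f}")
```

The between-imputation term is what distinguishes multiple imputation from single imputation: it widens the standard error to account for not knowing the true missing values.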

The Study

In the current study, the researchers compared three multiple imputation strategies for dealing with systematically missing data on gait speed in the SNAC. The SNAC consists of four prospective cohort studies that measured gait speed at baseline and follow-up, except for one study that did not measure gait speed at all. The authors simulated 1000 individual participant data meta-analyses (IPDMA) based on the characteristics of the SNAC and evaluated the performance of three multiple imputation strategies: fully conditional specification (FCS), multivariate normal (MVN), and conditional quantile imputation (CQI).

The FCS method imputes each variable separately by using regression models that depend on the other variables in the dataset. The MVN method assumes that the data follow a multivariate normal distribution and imputes all variables simultaneously by using an expectation-maximization algorithm. The CQI method imputes discrete variables by using quantile regression models that preserve the distribution of the original data.
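As a rough illustration of imputing a discrete variable from a conditional model, the sketch below draws gait speed categories for a study that never measured them, using category frequencies observed in the other studies within the same covariate stratum. It is a deliberately simplified stand-in and not the authors' FCS, MVN, or CQI implementation: it uses a single hypothetical covariate (age group), made-up records, and ignores uncertainty in the estimated frequencies, which a proper multiple imputation procedure would also propagate.

```python
import random
from collections import Counter, defaultdict

LEVELS = ["<0.8", "0.8-1.2", ">1.2"]  # gait speed categories in m/s, as in the article


def fit_conditional_counts(observed):
    """Count gait categories within each covariate stratum of the observed studies."""
    counts = defaultdict(Counter)
    for age_group, gait in observed:
        counts[age_group][gait] += 1
    return counts


def impute_once(missing_age_groups, counts, rng):
    """Draw one plausible gait category per participant from stratum frequencies."""
    draws = []
    for age_group in missing_age_groups:
        weights = [counts[age_group][level] for level in LEVELS]
        draws.append(rng.choices(LEVELS, weights=weights, k=1)[0])
    return draws


# Hypothetical (age group, gait category) records from studies that measured gait speed
observed = (
    [("60-79", ">1.2")] * 6 + [("60-79", "0.8-1.2")] * 3 + [("60-79", "<0.8")] * 1
    + [("80+", "<0.8")] * 5 + [("80+", "0.8-1.2")] * 4 + [("80+", ">1.2")] * 1
)
# Hypothetical participants from the study where gait speed is systematically missing
missing = ["60-79", "80+", "80+", "60-79"]

counts = fit_conditional_counts(observed)
rng = random.Random(2024)
imputations = [impute_once(missing, counts, rng) for _ in range(5)]  # m = 5 datasets
for i, draws in enumerate(imputations, start=1):
    print(i, draws)
```

Each of the five lists is one completed version of the missing column; the analysis model would be fitted to every version and the results pooled afterwards.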

The authors analyzed the imputed datasets with a two-stage common-effect multivariable logistic model that estimated the effect of three levels of gait speed (<0.8 m/s, 0.8-1.2 m/s, >1.2 m/s) on 5-year mortality. They found that all three imputation methods performed relatively well in terms of bias and coverage of the confidence intervals. However, the CQI method showed the smallest bias and the best coverage for both low and high levels of gait speed. The FCS and MVN methods tended to overestimate the effect of low gait speed and underestimate the effect of high gait speed on mortality.
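For readers unfamiliar with the term, the second stage of a two-stage common-effect analysis is typically an inverse-variance weighted average of the study-specific estimates obtained in the first stage. The sketch below shows that pooling step under this assumption; the four log odds ratios and standard errors are hypothetical and do not come from the paper.

```python
import math


def common_effect_pool(estimates, std_errors):
    """Inverse-variance weighted average of study-specific estimates."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


def wald_ci(estimate, std_error, z=1.96):
    """Approximate 95% confidence interval on the log odds ratio scale."""
    return estimate - z * std_error, estimate + z * std_error


# Hypothetical stage-one results: log odds ratios for slow gait speed in four studies
study_log_ors = [0.6, 0.4, 0.5, 0.5]
study_ses = [0.2, 0.2, 0.1, 0.1]

pooled, pooled_se = common_effect_pool(study_log_ors, study_ses)
low, high = wald_ci(pooled, pooled_se)
print(f"pooled OR = {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(low):.2f} to {math.exp(high):.2f})")
```

In a simulation like the one described, bias would be the average difference between such pooled estimates and the known true effect, and coverage the share of confidence intervals that contain it.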

Conclusions

The authors concluded that multiple imputation can be a useful tool for dealing with systematically missing data on gait speed in IPDMA based on the SNAC. They recommended the CQI method as the preferred approach for imputing discrete variables such as gait speed, as it preserves the original distribution and avoids unrealistic values. They also highlighted the importance of reporting the details of the multiple imputation procedure and checking the plausibility of the imputed values.

This study provides valuable insights for researchers who face similar challenges with missing data on gait speed or other discrete variables in aging research. By using appropriate multiple imputation methods, they can improve the validity and reliability of their results and avoid losing valuable information.

Click here to read the full research paper published in Aging.

Aging is an open-access, traditional, peer-reviewed journal that has published high-impact papers in all fields of aging research since 2009. All papers are available to readers (at no cost and free of subscription barriers) in bi-monthly issues at Aging-US.com.

For media inquiries, please contact media@impactjournals.com.

Behind the Study: Interview with Dr. Gil Atzmon

Dr. Gil Atzmon from the Albert Einstein College of Medicine discusses his 2017 study published by Aging, entitled, “The complex genetics of gait speed: genome-wide meta-analysis approach.”

Researchers explain their studies that were published in Aging

Speaker

Welcome to the Aging YouTube channel. This interview is with Dr. Gil Atzmon in the department of medicine and genetics at the Albert Einstein College of Medicine in the Bronx, New York. (He is) also in the department of human biology and a faculty member of the Department of Natural Science at the University of Haifa in Haifa, Israel. (He is) talking about a manuscript published in Volume 9, Issue 1 of Aging titled, “The complex genetics of gait speed, genome-wide meta analysis approach.”

Dr. Gil Atzmon

So the paper that I’m talking about is, “The complex genetic of gait speed: genome-wide meta analysis approach.” And what we did here is to combine 21 studies around the world and try to figure out what is the genetic predisposition for gait speed. The idea was that if we are going by number, then we will find something because the size is a matter of the resolution that you can pinpoint the genetic variant that might have an effect on the phenotype that, in our case, is gait speed. So when you’re talking about challenges, think of do you have 21 people or 21 groups that you need to combine together and figure out how you harmonize the data that they provide you with and try to figure out what’s going on there. This is a challenge because it lasted for almost four years until we had the paper done and published.

But eventually what we found was great. Although what we expected to find once we started this endeavor, we thought we’d have variants that have genomic significance. Meaning, if you have this variant either you have a lower gait speed or you have higher gait speed or normal gait speed. And we’re talking about elderly people. That’s what we tried to figure out. We found out that we didn’t find such a variant, but we find other alternatives.

We try to use protein analysis, group analysis, pathway analysis on all kinds of stuff. And every time that we put the finger on such a different analysis, we found something, some other interesting views. As I said, for genetic variant we didn’t find any, meaning the closest that we have was 10 to the -7 when the threshold was 10 to the -8.

Figure 1. Manhattan plot of meta-analysis of genome-wide association studies of gait speed for ~2.5 million genotyped and imputed SNPs

But when you look at these genes, we found that there are a couple of them that have higher prevalence among the top hit. Again, they didn’t reach a significance, but the minute you have such a number in the top hits, you think it might be relevant. We have a HLA-DPB1, we have the POM121-L2, and so forth and so forth. And you can see in the paper to look at those variants.

The interesting idea I’m seeing of the observation was that there was a couple of hits that we saw only once, but they are hits such as the [inaudible 00:03:36] 12I02 with a peak, meaning there is aggregation of a couple of hits around this gene or inside this gene. Again, it tells us that this gene might be relevant to what we are looking for. When we did the pathway analysis we found a couple of them that are associated with diabetes, which if you think about it, that really can cause people to either have slow gait speed or higher gait speed. It depends on the disease that you have. We have a couple of hits in the pathway, and a lot of this link us to cancer. And again, the same thing. If you think about it, the minute you have a disease, your performance, in this case it’s gait speed, is either declined or increased.

So we can see in both cases, though we didn’t find the right hit, still what we found has some biological explanation. It also does expression analysis or expression QTL. QTL means that those genes that are associated with the expression of the genes didn’t code in the phenotype, we found a couple of them that were higher significance. Again, another example of what is the predisposition of those genes to the phenotype that we had.

So, all in all, we concluded that we found some relevant genetic predisposition for this phenotype. And although we didn’t find the exact variant that can say “if you have it, you have low speed, and if you don’t have it, you have a higher speed,” we think that if we’re looking at the story that we crafted, we think that we’ve found some ideas, some biological explanation which is what is inside this paper.

Speaker

Aging was launched in 2009 and is currently a traditional peer reviewed journal with free access which publishes in monthly issues. Topics include high impact research papers of general interest and biological significance in all fields of aging research, as well as topics beyond traditional gerontology. You can click on the link in the description below to order a reprint or read the manuscript that was discussed in this interview on aging-us.com. Please feel free to subscribe to our YouTube channel and connect with us on Facebook, Twitter, or LinkedIn. Thank you.

Click here to read the full paper, published by Aging.

Aging is an open-access journal that publishes research papers monthly in all fields of aging research and other topics. These papers are available to read at no cost to readers on Aging-us.com. Open-access journals offer information that has the potential to benefit our societies from the inside out and may be shared with friends, neighbors, colleagues, and other researchers, far and wide.
