Proteomics and machine learning in the prediction and explanation of low pectoralis muscle area.

Scientific Reports, 03 Aug 2024, 14(1):17981
https://doi.org/10.1038/s41598-024-68447-y PMID: 39097658 PMCID: PMC11297919

This is an update of "Proteomics and Machine Learning in the Prediction and Explanation of Low Pectoralis Muscle Area." Res Sq. 2024 Mar 04:rs.3.rs-3957125. doi: 10.21203/rs.3.rs-3957125/v1. This article is based on a previously available preprint.

Nicholas A. Enzer¹,², Joe Chiles³,⁴, Stefanie Mason², Toru Shirahata²,⁵, Victor Castro⁶, Elizabeth Regan⁴,⁷, Bina Choi¹,², Nancy F. Yuan⁸, Alejandro A. Diaz¹,²,⁴, George R. Washko¹,²,⁴, Merry-Lynn McDonald³,⁴, Raúl San José Estépar²,⁹, Samuel Y. Ash¹⁰,¹¹, and COPDGene Study Consortium

¹Division of Pulmonary and Critical Care Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA USA

²Applied Chest Imaging Laboratory, Brigham and Women’s Hospital, Boston, MA USA

³Division of Pulmonary, Allergy and Critical Care Medicine, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL USA

⁴COPDGene Study Consortium, Denver, CO USA

⁵Department of Respiratory Medicine, Saitama Medical University Hospital, Kawagoe, Japan

⁶Boston University School of Medicine, Boston, MA USA

⁷Division of Rheumatology, Department of Medicine, National Jewish Health, Denver, CO USA

⁸Department of Biomedical Informatics, University of California at San Diego, San Diego, CA USA

⁹Department of Radiology, Brigham and Women’s Hospital, Boston, MA USA

¹⁰Department of Critical Care Medicine, South Shore Hospital, 55 Fogg Road, South Weymouth, MA 02190 USA

¹¹Department of Medicine, Tufts University School of Medicine, Boston, MA USA

Associated Data

Supplementary Materials: Supplementary Information.
41598_2024_68447_MOESM1_ESM.docx (1.5M)

Data Availability Statement: The data that support the findings of this study are available from the database of Genotypes and Phenotypes (dbGaP, https://www.ncbi.nlm.nih.gov/gap/, accession number pht002239.v4.p2), the National Heart, Lung and Blood Institute (NHLBI) BioData Catalyst (https://biodatacatalyst.nhlbi.nih.gov/resources/data), and by reasonable request from the COPDGene study (https://www.copdgene.org/).

Abstract

Low muscle mass is associated with numerous adverse outcomes independent of other associated comorbid diseases. We aimed to predict and understand an individual’s risk for developing low muscle mass using proteomics and machine learning. We identified eight biomarkers associated with low pectoralis muscle area (PMA). We built three random forest classification models that used either clinical measures, feature-selected biomarkers, or both to predict development of low PMA. The area under the receiver operating characteristic curve for each model was: clinical-only = 0.646, biomarker-only = 0.740, and combined = 0.744. We displayed the heterogeneous nature of an individual’s risk for developing low PMA and identified two distinct subtypes of participants who developed low PMA. While additional validation is required, our methods for identifying and understanding individual and group risk for low muscle mass could be used to enable developments in the personalized prevention of low muscle mass.

Subject terms: Proteomics, Medical imaging

Introduction

Sarcopenia is a clinical syndrome characterized by low muscle strength and low muscle quality or quantity, and its presence is often associated with low physical performance^¹,². While sarcopenia is often considered a result or a complication of age and comorbid conditions, sarcopenia as a disease in and of itself is independently associated with numerous adverse outcomes including injury, disease, and mortality^¹. Thus, it is crucial to identify those at risk for developing sarcopenia in order to intervene before adverse outcomes occur^³.

One approach to measuring the low muscle quantity aspect of sarcopenia is the use of computed tomography (CT), including the measurement of pectoralis muscle area (PMA) on CT imaging of the chest. Prior work has demonstrated the utility of these measurements for predicting adverse outcomes such as exacerbations of respiratory disease and death^⁴–⁶. In addition, a variety of clinical factors and biomarkers have been identified as being associated with low muscle mass, such as comorbid conditions, demographics such as age, and biomarkers such as those associated with inflammation^³,⁷–⁹. However, little research has evaluated the prediction of incident low muscle mass, a key problem that must be addressed in order to help prevent it from occurring, and the studies that do exist are often limited by a small sample size or a lack of longitudinal data^⁵,⁶,⁹. Additionally, more work is needed to examine what drives the risk for low muscle mass at the individual level. This is especially relevant as the benefits of precision-based approaches to medicine over disease-based approaches have become more widely recognized in the medical community. Muscular dystrophies, sarcopenia, and cachexia have all been viewed as appropriate for precision-based care due to the variability of patients’ genetic makeup, health, and exposure to therapies^¹¹.

We leveraged longitudinal data collected from a large cohort of current and/or former smokers to identify peripheral protein blood biomarkers associated with the development of CT-derived PMA^¹². In hopes of identifying those at highest risk for developing low PMA, we hypothesized that we could predict the development of low PMA by using a machine learning classification model that utilizes the identified biomarkers in conjunction with clinical measures and demographics. Additionally, we aimed to not only predict low muscle mass but also to illustrate and understand individual and group risk for it.

Results

Participant characteristics

The Genetic Epidemiology of COPD (COPDGene) study enrolled 10,305 participants at baseline. For this study the analysis was limited to the 598 current and/or former smoking participants and 98 never-smoking control participants with complete data available (e-Fig. 1). The current and/or former smoking cohort was made up of 48% men and 52% women. The cohort was 10.7% Black and 89.3% White. The mean age and BMI were 61.8 years and 28.9 kg/m² respectively. Of these participants, 36.3% were current smokers and 63.7% were former smokers, and the mean pack years was 42.9. Among the never-smoking control group, the 25th percentile of gender-stratified PMA at baseline was 44.9 cm² for men (n=32) and 24.5 cm² for women (n=66). Based on these values, there were 415 current and/or former smoking participants who did not have low PMA at baseline and 183 who did. Of the 415 current and/or former smoking participants who did not have low PMA at baseline, 22.9% developed low PMA at phase 2 (Table 1).

Table 1

Baseline characteristics of COPDGene participants used in this study, non-stratified and stratified by low pectoralis muscle area at baseline.

Characteristic	All participants	Low PMA at baseline	No low PMA at baseline
n	598	183	415
Gender, n (%)
Men	287 (48.0)	117 (63.9)	170 (41.0)
Women	311 (52.0)	66 (36.1)	245 (59.0)
Race, n (%)
Black	64 (10.7)	1 (0.5)	63 (15.2)
White	534 (89.3)	182 (99.5)	352 (84.8)
Age, mean (SD)	61.8 (8.7)	66.7 (7.8)	59.7 (8.2)
BMI, mean (SD)	28.9 (5.7)	28.1 (5.8)	29.2 (5.6)
Smoking status, n (%)
Current smoker	217 (36.3)	53 (29.0)	164 (39.5)
Former smoker	381 (63.7)	130 (71.0)	251 (60.5)
Pack years, mean (SD)	42.9 (23.6)	47.1 (24.7)	41.1 (22.9)
Developed low PMA at phase 2, n (%)	–	–	95 (22.9)

BMI, body mass index; COPDGene, Genetic Epidemiology of COPD; PMA, pectoralis muscle area; SD, standard deviation.
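The outcome definition used above can be written down compactly. The following is a minimal sketch rather than the study's code: the cut-offs are the gender-stratified 25th percentiles reported in the text, while the function names and the choice of a strict "less than" comparison are assumptions made for illustration.

```python
# Gender-specific low-PMA cut-offs (cm^2) reported in the text: the 25th
# percentile of PMA among the never-smoking controls.
LOW_PMA_CUTOFF_CM2 = {"men": 44.9, "women": 24.5}


def has_low_pma(pma_cm2, gender):
    """Return True if PMA falls below the gender-specific cut-off.

    Assumption: "low" means strictly below the 25th percentile; the text
    does not say how a value exactly at the cut-off is handled.
    """
    return pma_cm2 < LOW_PMA_CUTOFF_CM2[gender]


def developed_low_pma(baseline_cm2, phase2_cm2, gender):
    """Incident low PMA: not low at baseline, low at phase 2."""
    return (not has_low_pma(baseline_cm2, gender)) and has_low_pma(phase2_cm2, gender)


def incidence_percent(n_incident, n_at_risk):
    """Percentage of at-risk participants who developed the outcome."""
    return round(100.0 * n_incident / n_at_risk, 1)


# Hypothetical participant (values invented for illustration).
example = developed_low_pma(46.0, 43.5, "men")   # True
# Counts from Table 1: 95 of the 415 participants without low PMA at baseline.
incidence = incidence_percent(95, 415)           # 22.9
```

Applied to the counts in Table 1, the last line reproduces the reported 22.9% incidence of low PMA at phase 2.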

Biomarker feature selection

There were 355 peripheral protein blood biomarkers that passed the univariate screen. Of those, eight biomarkers were deemed important for predicting the development of low PMA by the Boruta feature selection algorithm: Histone acetyltransferase type B catalytic subunit (Hat1), Secreted protein acidic and rich in cysteine (SPARC), Lymphotoxin alpha 1/beta 2 (Lymphotoxin a1/b2), Growth/differentiation factor 15 (GDF15), Cell adhesion molecule-related/down-regulated by oncogenes (CDON), Neurexophilin-1 (NXPH1), Vascular cell adhesion protein 1 (VCAM-1), and EGF-containing fibulin-like extracellular matrix protein 1 (EFEMP1) (Table 2).

Table 2

Biomarkers that passed a univariate screen (Welch’s t-test, FDR q<0.10) comparing those without and with low PMA at baseline and were considered relevant for predicting the development of low pectoralis muscle area via Boruta feature selection.

Biomarker	Mean (SD) no low PMA at baseline (n=415)	Mean (SD) low PMA at baseline (n=183)	T-statistic (P value)	FDR q value	Brief description
Histone acetyltransferase type B catalytic subunit (Hat1)	6.04 (0.39)	5.96 (0.35)	2.66 (0.008)	0.038	Enzyme associated with the acetylation of newly synthesized histone H4^¹⁵
Vascular cell adhesion protein 1 (VCAM-1)	9.54 (0.27)	9.60 (0.26)	−2.60 (0.010)	0.045	Cell adhesion molecule whose expression is induced on endothelial cells during inflammatory disease. Plays a role in the regulation of leukocyte migration^¹⁴
Secreted protein acidic and rich in cysteine (SPARC)	9.99 (0.64)	9.75 (0.69)	4.13 (<0.001)	<0.001	Glycoprotein associated with the binding of cells and matrix components^³⁷
Lymphotoxin alpha 1/beta 2 (Lymphotoxin a1/b2)	4.34 (0.27)	4.28 (0.30)	2.39 (0.017)	0.072	Cytokines associated with the adaptive immune response and the maintenance of lymphoid organ architecture^³⁸
Growth/differentiation factor 15 (GDF15)	7.19 (0.39)	7.39 (0.37)	−6.00 (<0.001)	<0.001	Cytokine released in response to stress and tissue injury^³⁹
Cell adhesion molecule-related/down-regulated by oncogenes (CDON)	8.76 (0.22)	8.68 (0.21)	3.94 (<0.001)	0.001	Transmembrane glycoprotein associated with Hedgehog proteins and myoblast differentiation^¹⁹,⁴⁰
Neurexophilin-1 (NXPH1)	8.93 (0.39)	8.83 (0.41)	2.84 (0.005)	0.025	Glycoprotein whose detected expression (in humans) is strongest in the spleen^⁴¹
EGF-containing fibulin-like extracellular matrix protein 1 (EFEMP1)	7.42 (0.18)	7.49 (0.21)	−3.90 (<0.001)	0.001	Glycoprotein that has a role in basement membranes^⁴²

FDR, false discovery rate; PMA, pectoralis muscle area; SD, standard deviation.
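The univariate screen summarized in Table 2 can be illustrated from the summary statistics alone. The sketch below computes Welch's t statistic from group means, standard deviations, and sizes, and converts a list of P values into FDR q values. It is an illustration only: the text does not name the FDR procedure, so the Benjamini-Hochberg method is an assumption, and the function names are hypothetical.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic from summary statistics (unequal variances)."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg q values, returned in the input order.

    Assumption: the source reports "FDR q" without naming the procedure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that each q value is the
    # minimum of p * m / rank over all ranks at or above its own.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# GDF15 row of Table 2: no low PMA (n=415) versus low PMA (n=183).
t_gdf15 = welch_t(7.19, 0.39, 415, 7.39, 0.37, 183)
```

With the rounded values printed in Table 2, `t_gdf15` comes out at about −5.99, close to the reported −6.00; small differences for other rows are expected because the table reports rounded means and standard deviations.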

Predicting low PMA with machine learning

Regarding the random forest prediction models’ discrimination (Fig. 1), the clinical-only model had an area under the receiver operating characteristic curve (AUROC) of 0.646, which was lower than the biomarker-only model’s AUROC of 0.740, although the difference did not reach statistical significance (p for AUROC comparison=0.093). The combined model had better discrimination than the clinical-only model, with an AUROC of 0.744 (p for comparison=0.032), but was not better than the biomarker-only model (p for comparison=0.779). Model precision-recall and calibration curves are found in the supplementary material. The area under the precision-recall curve (AUPRC) was 0.36 for the clinical-only model, 0.53 for the biomarker-only model, and 0.51 for the combined model (e-Figs. 2–4). Regarding calibration, the Brier scores of the combined model and the biomarker-only model were identical (0.174), while the Brier score of the clinical-only model was slightly higher (0.203) (e-Figs. 5–7). The testing set included 139 participants and the training set included 168 participants after down sampling (276 originally). Of note, as described in the supplemental results, similar results were found in secondary analyses using logistic regression models in place of random forest models. For example, the AUROCs for the logistic regression models were 0.653, 0.736, and 0.750 for the clinical-only, biomarker-only, and combined models respectively (e-Figs. 8–14). For each respective model, there was no significant difference between the AUROC of the random forest model and the AUROC of the logistic regression model (p for comparison for clinical-only models=0.799, p for comparison for biomarker-only models=0.895, and p for comparison for combined models=0.840).

Figure 1

Random forest model discrimination. Areas under the receiver operating characteristic curves (AUROC) of our three random forest classification models built to predict low pectoralis muscle area (PMA). Five clinical measures were used in the clinical-only model: age, gender, pack years, height, and weight. Eight feature selected biomarkers for predicting the development of low PMA were used in the biomarker-only model: Histone acetyltransferase type B catalytic subunit (Hat1), Secreted protein acidic and rich in cysteine (SPARC), Lymphotoxin alpha 1/ beta 2 (Lymphotoxin a1/b2), Growth/differentiation factor 15 (GDF15), Cell adhesion molecule-related/down-regulated by oncogenes (CDON), Neurexophilin-1 (NXPH1), Vascular cell adhesion protein 1 (VCAM-1), and EGF-containing fibulin-like extracellular matrix protein 1 (EFEMP1). The combined model used predictors from both the clinical-only and biomarker-only models. The combined model and the clinical-only model were significantly different (P=0.032). The combined model and the biomarker-only model were not significantly different (P=0.78). The clinical-only model and the biomarker-only model were not significantly different (P=0.09).
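For readers less familiar with the two performance measures reported above, the following sketch shows how an AUROC and a Brier score may be computed from predicted probabilities. It is not the study's code: the data are a four-participant toy example, and the function names are hypothetical.

```python
def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive case is
    scored above a randomly chosen negative case (ties count one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def brier_score(labels, probabilities):
    """Mean squared difference between predicted probability and outcome.
    Lower values indicate better-calibrated, sharper predictions."""
    total = sum((p - y) ** 2 for y, p in zip(labels, probabilities))
    return total / len(labels)


# Toy example (invented values): 1 = developed low PMA, 0 = did not.
toy_labels = [0, 0, 1, 1]
toy_probabilities = [0.1, 0.4, 0.35, 0.8]
toy_auroc = auroc(toy_labels, toy_probabilities)        # 0.75
toy_brier = brier_score(toy_labels, toy_probabilities)
```

In the toy example, three of the four positive-negative pairs are ranked correctly, giving an AUROC of 0.75. A model that assigned a probability of 0.5 to everyone would have a Brier score of 0.25, which gives some context for the reported values of 0.174 and 0.203.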

Individual risk

For the combined model, the order of importance of the predictors was GDF15, EFEMP1, CDON, Lymphotoxin a1/b2, VCAM-1, age, SPARC (osteonectin, ON), NXPH1, Hat1, gender, pack years, height, and weight (Fig. 2). Feature importance analyses of the clinical-only and biomarker-only models are found in the supplementary material (e-Figs. 15–16).

Figure 2

Random forest combined model summary plot. The combined random forest classification model’s training set’s (n=168) predictors ordered by importance for predicting low pectoralis muscle area (PMA). Shapley additive explanation (SHAP) values indicate the predictors' impact on the probability of developing low PMA. For numeric predictors, red indicates a high value and blue indicates a low value. For the sole categorical predictor, “Women”, red and blue represent women and men respectively. Five clinical measures were used: age, gender, pack years, height, and weight. Eight feature selected biomarkers for predicting the development of low PMA were used: Histone acetyltransferase type B catalytic subunit (Hat1), Secreted protein acidic and rich in cysteine (SPARC), Lymphotoxin alpha 1/ beta 2 (Lymphotoxin a1/b2), Growth/differentiation factor 15 (GDF15), Cell adhesion molecule-related/down-regulated by oncogenes (CDON), Neurexophilin-1 (NXPH1), Vascular cell adhesion protein 1 (VCAM-1), and EGF-containing fibulin-like extracellular matrix protein 1 (EFEMP1).

Visual evaluation of the relationships between the measurements of each model’s training set’s (n=168) predictors and their respective Shapley additive explanation (SHAP) values suggests that several may have definable thresholds. For example, for the combined model, GDF15 and EFEMP1 had breakpoints near the middle of their range (combined model: Fig. 3, e-Fig. 17; clinical-only and biomarker-only models: e-Figs. 18–19). In addition, visual evaluation of the force plots from 10 randomly selected participants revealed a large amount of heterogeneity in the covariates that drive the individual participant’s final predicted probability. The mean predicted probabilities of the combined, biomarker-only, and clinical-only models’ training sets were 0.337, 0.337, and 0.333 respectively (combined model: Fig. 4, e-Fig. 20; clinical-only and biomarker-only models: e-Figs. 21–24).

Figure 3

Predictor measurements vs. Shapley additive explanation values (random forest combined model). The relationships between the clinical predictors: age, pack years, height, weight, and gender, and the 5 most important feature selected biomarkers for predicting the development of low pectoralis muscle area (PMA): Growth/differentiation factor 15 (GDF15), EGF-containing fibulin-like extracellular matrix protein 1 (EFEMP1), Cell adhesion molecule-related/down-regulated by oncogenes (CDON), Lymphotoxin alpha 1/ beta 2 (Lymphotoxin a1/b2), Vascular cell adhesion protein 1 (VCAM-1) with their respective Shapley additive explanation (SHAP) values. SHAP values indicate the predictors' impact on the probability of developing low PMA. Yellow and green indicate whether the participant is a woman or a man respectively. This is solely examining the combined random forest classification model’s training set (n=168).

Figure 4

Force plots for participants with a predicted probability of developing low pectoralis muscle area greater than the mean probability of the random forest combined model’s training set. Force plots for 5 randomly selected participants from the combined random forest classification model’s training set (n=168) with a predicted probability of developing low pectoralis muscle area (PMA) greater than the mean probability of the combined model’s training set (0.337). Each predictor has a Shapley additive explanation (SHAP) value that indicates the predictors' impact on the probability of developing low PMA. Red and blue indicate whether the impact is positive or negative respectively. Five clinical measures were used: age, gender, pack years, weight, and height. Eight feature selected biomarkers for predicting the development of low PMA were used: Histone acetyltransferase type B catalytic subunit (Hat1), Secreted protein acidic and rich in cysteine (SPARC), Lymphotoxin alpha 1/ beta 2 (Lymphotoxin a1/b2), Growth/differentiation factor 15 (GDF15), Cell adhesion molecule-related/down-regulated by oncogenes (CDON), Neurexophilin-1 (NXPH1), Vascular cell adhesion protein 1 (VCAM-1), and EGF-containing fibulin-like extracellular matrix protein 1 (EFEMP1).
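The additive property that underlies the force plots can be shown with a small worked example. The sketch below computes exact Shapley values for a single prediction by averaging each predictor's marginal contribution over all orderings. Several things here are assumptions for illustration: the toy risk function and its coefficients are invented, "absent" predictors are replaced by a single baseline vector rather than averaged over a background data set, and the exhaustive enumeration is not the algorithm SHAP libraries use for tree models.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    Predictors not yet added to the coalition keep their baseline value.
    Enumerating every ordering is only feasible for a handful of predictors.
    """
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for ordering in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in ordering:
            current[i] = x[i]               # add predictor i to the coalition
            value = predict(current)
            totals[i] += value - previous   # marginal contribution of i
            previous = value
    return [total / len(orderings) for total in totals]


def toy_risk(features):
    """Hypothetical risk score with an interaction term (not the study's model)."""
    a, b, c = features
    return 0.1 * a + 0.2 * b + 0.05 * a * c


participant = [1.0, 2.0, 3.0]   # invented predictor values
reference = [0.0, 0.0, 0.0]     # invented baseline values
phi = shapley_values(toy_risk, participant, reference)

# Additivity: baseline prediction + sum of contributions = prediction.
reconstructed = toy_risk(reference) + sum(phi)
```

Here `reconstructed` equals `toy_risk(participant)` up to floating-point error, which is the same bookkeeping a force plot displays: the contributions push the prediction up or down from a base value until they reach the participant's predicted probability. The interaction term is split evenly between the two predictors involved.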

Group risk

K-Means clustering resulted in three distinct clusters of participants based on the silhouette coefficient. Performing principal component analysis (PCA) on the combined model’s biomarkers’ standardized SHAP values resulted in the first component explaining 27.6% of the variance and the second component explaining 20.5% of the variance. When stratified for the development of low PMA, one cluster was predominantly made up of participants who did not develop low PMA, and the remaining 2 clusters were predominantly made up of participants who did develop low PMA (Fig. 5). All the feature selected biomarkers’ SHAP values were significantly different between the three clusters via one-way ANOVA (P<0.001). The clusters that were predominantly made up of participants who developed low PMA had different SHAP profiles from one another despite having the same outcome. The cluster that was predominantly made up of participants who did not develop low PMA had consistently low SHAP values (Fig. 6).

Figure 5

Clustering participants via principal component analysis and K-means clustering. The plot on the left illustrates the participants in the training set (n=168) of the combined random forest classification model, for predicting the development of low PMA, clustered based on the similarity of their feature selected biomarkers’ Shapley additive explanation (SHAP) values using principal component analysis (PCA) and K-means clustering. There were 2 PCA components. The plot on the right illustrates whether the individuals in the clusters did or did not develop low pectoralis muscle area (PMA). Black dots indicate the centroids of the clusters. The SHAP values of eight feature selected biomarkers for predicting the development of low PMA were used: Histone acetyltransferase type B catalytic subunit (Hat1), Secreted protein acidic and rich in cysteine (SPARC), Lymphotoxin alpha 1/ beta 2 (Lymphotoxin a1/b2), Growth/differentiation factor 15 (GDF15), Cell adhesion molecule-related/down-regulated by oncogenes (CDON), Neurexophilin-1 (NXPH1), Vascular cell adhesion protein 1 (VCAM-1), and EGF-containing fibulin-like extracellular matrix protein 1 (EFEMP1).

Figure 6

Comparing feature selected biomarker Shapley additive explanation values between clusters. Box plots comparing the SHAP values of the feature selected biomarkers for predicting the development of low PMA between the three clusters that were illustrated using principal component analysis (PCA) and K-means clustering. All the biomarkers’ SHAP values were significantly different between the three clusters via one-way ANOVA (P<0.001). Eight feature selected biomarkers for predicting the development of low PMA were used: Histone acetyltransferase type B catalytic subunit (Hat1), Secreted protein acidic and rich in cysteine (SPARC), Lymphotoxin alpha 1/beta 2 (Lymphotoxin a1/b2), Growth/differentiation factor 15 (GDF15), Cell adhesion molecule-related/down-regulated by oncogenes (CDON), Neurexophilin-1 (NXPH1), Vascular cell adhesion protein 1 (VCAM-1), and EGF-containing fibulin-like extracellular matrix protein 1 (EFEMP1). The black lines indicate the medians, the red triangles indicate the means, the circles represent outliers, and the error bars represent 1.5 × the interquartile range. There were 168 participants across the 3 groups.
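The way a silhouette coefficient can point to a number of clusters is sketched below with a small K-means implementation. This is an illustration under stated assumptions, not the study's pipeline: the points are invented two-dimensional toy data standing in for PCA-projected SHAP values, the PCA step itself is omitted, the farthest-first initialization is chosen only to make the example deterministic, and all function names are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with deterministic farthest-first initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break                      # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return labels


def mean_silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)   # cohesion
        b = min(sum(dist(p, q) for q in other) / len(other)  # separation
                for key, other in clusters.items() if key != lab)
        total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points, candidates):
    """Pick the candidate k with the highest mean silhouette coefficient."""
    return max(candidates,
               key=lambda k: mean_silhouette(points, kmeans(points, k)))


# Invented toy data: three well-separated groups of four points each.
toy_points = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 0), (10, 1), (11, 0), (11, 1),
              (5, 10), (5, 11), (6, 10), (6, 11)]
best_k = choose_k(toy_points, [2, 3, 4])
```

For these toy points the highest mean silhouette coefficient is obtained with three clusters, since two clusters merge distinct groups and four clusters split one of them.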

Feature selected biomarkers relationship with PMA

Finally, of the five most important feature selected biomarkers, baseline EFEMP1 was significantly (P=0.008) negatively correlated (r=−0.129) with PMA change. Baseline CDON was significantly (P=0.009) positively correlated (r=0.127) with PMA change. The remaining 3 biomarkers at baseline were not significantly correlated with PMA change (Table 3).

Table 3

Relationships between the 5 most important feature selected biomarkers at baseline for predicting low pectoralis muscle area and the change in pectoralis muscle area (cm²) between baseline and phase 2 (n=415).

Biomarker	Pearson correlation coefficient	P value
Growth/differentiation factor 15 (GDF15)	−0.049	0.317
EGF-containing fibulin-like extracellular matrix protein 1 (EFEMP1)	−0.129	0.008
Cell adhesion molecule-related/down-regulated by oncogenes (CDON)	0.127	0.009
Lymphotoxin alpha 1/ beta 2 (Lymphotoxin a1/b2)	0.048	0.332
Vascular cell adhesion protein 1 (VCAM-1)	−0.011	0.823
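The P values in Table 3 can be checked approximately from the correlation coefficients and the sample size. The sketch below computes a Pearson correlation coefficient and an approximate two-sided P value. One assumption should be noted: the P value uses a normal approximation to the t distribution with n − 2 degrees of freedom, which is reasonable for n=415 but is not necessarily how the reported values were obtained. Function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


def approx_two_sided_p(r, n):
    """Approximate two-sided P value for the null hypothesis r = 0.

    Assumption: the t statistic with n - 2 degrees of freedom is treated
    as standard normal, which is adequate only for large n.
    """
    t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r ** 2)
    return math.erfc(abs(t) / math.sqrt(2.0))


# Coefficients from Table 3 (n=415).
p_efemp1 = approx_two_sided_p(-0.129, 415)
p_cdon = approx_two_sided_p(0.127, 415)
p_gdf15 = approx_two_sided_p(-0.049, 415)
```

Under this approximation the values for EFEMP1, CDON, and GDF15 come out close to the reported 0.008, 0.009, and 0.317, with the remaining differences attributable to the approximation and to the rounding of the coefficients in the table.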

Discussion

Leveraging longitudinal data from the COPDGene study, we developed a machine learning classification model that predicted the development of low PMA in smokers using clinical measures, demographics, and peripheral protein blood biomarkers. This model outperformed a model that utilized only clinical measures and demographics as predictors and performed similarly to one that incorporated biomarker information only. In addition, subsequent analysis of the models suggests that there may be specific cut-points of interest for the biomarkers identified, and that there is a large amount of heterogeneity in what drives an individual patient’s risk for developing low PMA. This heterogeneity was used to cluster the participants into distinct subtypes.

This work has several strengths, one of which is the use of a large-scale longitudinal research cohort that enabled the prediction of low muscle mass utilizing an abundance of protein biomarkers in the initial panel. Prior efforts to predict low muscle mass using biomarkers have often been cross-sectional with relatively small and non-diverse cohorts and with relatively small candidate biomarker panels^{⁷,¹⁰,¹³}. Also, by utilizing all-relevant feature selection tools such as Boruta, we were able to select a small number of relevant biomarkers of interest. Subsequent evaluation using SHAP analysis and K-Means clustering provided insights into potential threshold values for those biomarkers as well as demonstrating the heterogeneity in what contributes to a specific individual’s probability of developing low PMA. We believe our methods for biomarker selection and analyzing patient risk are novel to the issue of low muscle mass.

In terms of specific findings, the eight biomarkers that were deemed important for predicting low PMA were surprisingly diverse, with roles ranging from leukocyte migration regulation to histone acetylation^¹⁴,¹⁵. Some of the biomarkers identified were consistent with prior research. For example, serum GDF15 has been identified as a potential biomarker for sarcopenia due to it being negatively correlated with muscle mass^¹⁶ and muscle power^¹⁷ in humans. Although we could not find any research relating circulating CDON to muscle mass, it has been shown that mice with satellite cell-specific CDON ablation had impaired muscle generation^¹⁸ and it is believed that CDON positively regulates skeletal myogenesis^¹⁹,²⁰.

Interestingly, some of the biomarkers identified contradicted prior research. For example, Hat1-haplodeficient mice have been shown to have a shorter lifespan and more premature age-related phenotypes, including muscle atrophy, than wildtype mice^²¹. Moreover, satellite cell VCAM-1 null mice had delayed or decreased myofibril growth compared to wildtype mice^²². These contradictions may be due to species differences and contrasts in function between circulating biomarkers and biomarkers’ expression in muscle, a notable weakness of our current work, which relies on peripheral biomarkers.

Some of the biomarkers identified may help clarify previously inconclusive research. For example, a cross-species meta-analysis identified EFEMP1 as consistently overexpressed in the muscle with age, and even consistently overexpressed in all studied tissues in their analyses^²³. However, there are areas where EFEMP1 appears to be reduced during aging such as the superficial zone of the articular cartilage^²⁴, and mice with inactivated EFEMP1 appear to age prematurely^²⁵. In our study, EFEMP1 was found to increase the likelihood of developing low PMA in our model, and EFEMP1 measurements were higher in the cohort that had low PMA at baseline. Altogether, this suggests that the upregulation of EFEMP1 may be an adaptive response to delay the inevitable aging and muscle loss processes. Similarly, conflicting data also exist for the role of SPARC in muscle biology and sarcopenia. For example, there has been evidence that SPARC both positively and negatively affects the differentiation of myoblasts^²⁶,²⁷. Moreover, one group found that serum SPARC was significantly higher in a sarcopenic cohort compared to a non-sarcopenic cohort, while another group found the opposite, although the latter finding was not statistically significant and there were concurrent disease processes^⁷,²⁸. In our study, SPARC was found to decrease the likelihood of developing low PMA in our model, and SPARC measurements were higher in the cohort that did not have low PMA at baseline. Together, this suggests that SPARC likely has a negative role in the complex muscle loss process. Hopefully, our results concerning EFEMP1 and SPARC will help reduce the ambiguity surrounding these biomarkers.

With regard to the identification of novel biomarkers related to low muscle mass, neither NXPH1 nor Lymphotoxin a1/b2 appears to have a connection with low muscle mass in the literature. Whether our findings reflect true associations or confounding is unclear, and further work is needed to better elucidate what roles, if any, these proteins may play in the development of low muscle mass.

Interestingly, when assessing the feature importance of the combined model’s predictors we noticed that the protein biomarkers appeared more important than most of the clinical predictors. While this could be taken to support the use of proteomics for identifying those at risk for low muscle mass, it is important to caution that there are numerous other clinical predictors that can and should be evaluated, including both complicated screening tools as well as simple clinical questions related to weight loss and exercise capacity. These extensive analyses are beyond the scope of this current investigation but should be done to better explore these issues.

Notably, for the quantitative predictors in our models there is a greater range of positive impact values than negative impact values. In other words, the models avoid giving strong negative impact values regardless of the predictors’ actual values, suggesting that no single realistic predictor value can drastically lower the model’s predicted probability. Interestingly, the five most important biomarkers for predicting low PMA, when assessed individually at baseline, were not highly correlated with change in PMA between baseline and phase 2. This highlights the potential strength of tools such as machine learning to identify predictors that may not be readily apparent when using more traditional statistical analyses. Similarly, tools such as SHAP analysis may enable insights into specific relationships between predictors and outcomes. For example, plotting the SHAP values against the predictor measurements allowed us to examine the threshold at which the impact direction changes. The plots for age and pack years are especially illustrative. This information may help determine threshold values for concern in clinical applications. The SHAP force plots also help illustrate what is happening on the individual level and show the multifactorial nature of low muscle mass. This could be especially helpful when considering personalized medicine approaches to specific patients, as different patients may have different pathobiological processes responsible for the same phenotype, and thus they may respond differently to targeted therapy. Our cluster analysis supports this theory, as it identified two distinct subtypes of participants who developed low PMA. This could be due to differences in biomarker profiles, or perhaps due to underlying conditions, for example, aging and smoking-related disease.
Interestingly, of the three clusters, it appears that the cluster that mostly did not develop low PMA is the densest cluster, and therefore has less variance than the other two clusters. Perhaps this consistency is indicative of a “normal” profile subtype. As expected, when comparing the biomarkers’ SHAP profiles among the three clusters, the cluster that was mostly composed of those who did not develop low PMA consistently had the lowest SHAP values (when examining the median). The other two clusters had considerably different biomarker SHAP profiles from one another. For example, the participants in Cluster 1 developed low PMA with CDON and Lymphotoxin a1/b2 having a negative impact on their predicted probability for developing low PMA. On the other hand, Cluster 3 developed low PMA with CDON and Lymphotoxin a1/b2 having a positive impact on their predicted probability for developing low PMA. Surprisingly, the most important biomarkers overall, GDF15 and EFEMP1, had similar SHAP values in both clusters, indicating that it may be the less important biomarkers that are the most responsible for this stratification.

Clinically, this study demonstrates that it may be possible to identify patients at highest risk for low muscle mass before it develops, potentially enabling targeted interventions ranging from diet and exercise to current and novel pharmacologic therapies. This is especially important given both the growing recognition of the benefits of personalized medicine and the growing recognition that muscle loss, while often related to other co-morbid diseases, is a distinct process independently associated with morbidity and mortality. Finally, our approach to biomarker selection and risk analysis is not unique to low muscle mass and could be expanded to other domains as well, potentially enabling the identification of important biomarkers and underlying pathways for other clinical problems.

Unfortunately, this project had several limitations. We did not have a validation cohort and the participants enrolled in this study were less diverse than the general population, which may reduce its generalizability. In addition, there is likely collinearity between some of the biomarkers and clinical measures. For example, plasma GDF15 has been shown to be significantly positively associated with age^²⁹. It is therefore difficult to separate the effects of age from the effects of specific protein biomarkers. Moreover, SHAP analyses assume independence between the predictors, which may not be the case. In addition, although the feature importance results are interesting, they do not indicate causality, only association, significantly limiting their interpretation.

Other important limitations include the imaging metric used and the outcome definition. As noted in the introduction, sarcopenia is a clinical syndrome characterized by low muscle strength and low muscle quality or quantity^¹,². Although CT-measured PMA is associated with adverse clinical outcomes, it only measures one aspect of sarcopenia. Moreover, there are numerous imaging and non-imaging based approaches to measuring muscle quality as well as muscle quantity, including other measures of the pectoralis muscle such as muscle volume and density and measures of other muscle groups such as the erector spinae muscles^¹,². Additional work is needed to determine if the protein associations found in this study are present when other imaging and non-imaging based definitions of sarcopenia are used.

Finally, it should be noted that supplemental analyses using logistic regression prediction models produced similar results to the random forest models. This finding could be interpreted in several ways. One possibility is that the specific form of statistical prediction model is less important than the predictors used. Additional work is needed to explore whether other forms of machine learning models produce similar results.

In summary, using proteomics and machine learning, we identified protein biomarkers associated with low PMA in smokers, developed risk prediction tools able to predict the development of low PMA over 5 years of follow-up, and analyzed individual risk and group risk for developing low PMA.

Methods

Parent study

Data was acquired through the COPDGene study: an ongoing longitudinal observational study that examines the development of chronic obstructive pulmonary disease in smokers. There were 10,198 current and/or former smokers and 107 non-smoking control participants initially enrolled in COPDGene (e-Fig. 1). All participants were non-Hispanic white or African American, and all current and/or former smokers had a minimum of 10 pack years. Data was collected at baseline (phase 1) and after 5 years of follow-up (phase 2). Additional phase 3, 10-year follow-up visits are currently in progress and are not included in this current study. Data used for this study included an extensive questionnaire at baseline, CT of the chest at baseline and phase 2, and peripheral protein blood biomarker measurements via the SomaScan assay at baseline. The biomarkers were measured in relative fluorescent units and the measurements were normalized and natural log transformed^³⁰. PMA (cm²) was derived using a single axial CT image at the level of the aortic arch and the suprasternal notch using a previously described method^⁵. All research was performed in accordance with relevant guidelines. All participants provided written informed consent, and the study was approved by the institutional review board at each of the 21 centers including Brigham and Women’s Hospital^¹².

Defining low PMA

For this study, we defined current and/or former smokers as having low PMA if their PMA was less than the 25th percentile of baseline never-smoking control participants, stratified by gender. This definition was applied at both baseline and phase 2.
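The threshold rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's code: the function names, the `(sex, pma)` record layout, and the linear-interpolation percentile are all hypothetical choices, and the control values are invented for the example.

```python
from collections import defaultdict


def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty list is undefined")
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def low_pma_thresholds(controls, q=25):
    """Sex-specific cutoffs (cm^2) derived from never-smoking controls."""
    by_sex = defaultdict(list)
    for sex, pma in controls:
        by_sex[sex].append(pma)
    return {sex: percentile(vals, q) for sex, vals in by_sex.items()}


def has_low_pma(sex, pma, thresholds):
    """True if PMA is strictly below the cutoff for the participant's sex."""
    return pma < thresholds[sex]


# Invented control data for illustration only: (sex, PMA in cm^2).
controls = [("F", 30.0), ("F", 34.0), ("F", 38.0), ("F", 42.0), ("F", 46.0),
            ("M", 50.0), ("M", 55.0), ("M", 60.0), ("M", 65.0), ("M", 70.0)]
thresholds = low_pma_thresholds(controls)
```

The same `thresholds` dictionary would be reused for both visits, so that a participant's status at baseline and at phase 2 is judged against the same control-derived cutoff.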

Biomarker feature selection

To identify protein biomarkers of interest, we performed an initial univariate screen comparing mean biomarker measurements in current and/or former smokers with (n=183) and without (n=415) low PMA at baseline. There were 1317 initial biomarkers and only the biomarkers with a Welch’s t-test false discovery rate (FDR) q<0.10 were retained. We then utilized Boruta feature selection with a one-step correction to identify the most relevant biomarkers for predicting the development of low PMA, i.e., the change from not having low PMA at baseline to having low PMA at the 5-year follow-up visit. The default parameters were used except for the number of estimators which was set to ‘auto’ and the maximum depth which was set to 8. Boruta was chosen due to it being an all-relevant feature selection method, meaning that it aims to uncover all the relevant features as opposed to uncovering the minimal number of features that score well^³¹,³².
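The univariate screen described above combines a Welch's t-test with a false discovery rate correction. The sketch below illustrates the two pieces using only the Python standard library; it is not the authors' pipeline. The function names and marker labels are hypothetical, the p-values are invented, and converting the t statistic to a p-value (which requires the t distribution) is left to a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    a = variance(x) / len(x)  # sample variance / n for group x
    b = variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (len(x) - 1) + b ** 2 / (len(y) - 1))
    return t, df


def benjamini_hochberg(p_values):
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


def screen(p_by_marker, fdr=0.10):
    """Keep markers whose q-value is below the FDR cutoff."""
    names = list(p_by_marker)
    q = benjamini_hochberg([p_by_marker[n] for n in names])
    return [n for n, qv in zip(names, q) if qv < fdr]


# Invented p-values for four hypothetical markers.
retained = screen({"marker_a": 0.01, "marker_b": 0.04,
                   "marker_c": 0.03, "marker_d": 0.20})
```

Only the markers surviving this screen would be passed on to the all-relevant feature selection step.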

Predicting low PMA with machine learning

To identify participants at highest risk for developing low PMA and to determine the utility of clinical and biomarker data to predict low PMA, we built three random forest classification models to predict the development of low PMA, i.e. the change from not having low PMA at baseline to having low PMA at the 5-year follow-up visit^³³. The first was a clinical-only model that incorporated easily attainable baseline clinical measures (height, weight, pack years) and demographics (age and gender). The second was a biomarker-only model that incorporated the baseline protein biomarkers selected using the feature selection process. The third model incorporated both the clinical measures/demographics and the selected biomarkers. All models were trained on the same 2/3 random sample and tested on the remaining 1/3. Finally, 2:1 down-sampling was performed to account for event prevalence. Model hyperparameters were tuned using Bayesian optimization. The models’ performances were summarized by the AUROC, AUPRC, the calibration curve, and the Brier score (“the mean squared difference between the predicted probability and the actual outcome”) of their respective testing sets^³³. The calibration curves were calculated using 10 bins. For comparison purposes, three logistic regression classification models were created using the same predictors as for each of the random forest models.
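Two of the performance summaries named above, the AUROC and the Brier score, have simple definitions that can be written out directly. The sketch below also shows one possible reading of the down-sampling step. It is illustrative only: the function names are hypothetical, and interpreting "2:1" as two non-events retained per event is an assumption, since the text does not spell out the direction of the ratio.

```python
import random


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true, y_score):
    """Probability that a random event outranks a random non-event.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg),
    which is fine for a small illustration.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one event and one non-event")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def downsample(labels, ratio=2, seed=0):
    """Indices keeping every event and up to `ratio` non-events per event.

    Assumption: "2:1" is read as non-events:events.
    """
    events = [i for i, y in enumerate(labels) if y == 1]
    non_events = [i for i, y in enumerate(labels) if y == 0]
    rng = random.Random(seed)
    keep = min(len(non_events), ratio * len(events))
    return sorted(events + rng.sample(non_events, keep))
```

A lower Brier score indicates better-calibrated and sharper probabilities, whereas the AUROC depends only on how the predictions are ranked.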

Individual risk

To assess the importance of the combined model’s individual predictors and to examine the predictors’ impact (strength and direction) on the predicted probability for developing low PMA, a SHAP summary plot was built^³⁴. SHAP plots utilize SHAP values which are assigned to each predictor and indicate how much the predictor, alone, contributes to a model’s prediction. This is based on the game theory idea of Shapley values which represent the average marginal contribution of a predictor across all possible combinations of predictors. In other words, on the individual level, the difference between the predicted probability and the expected (base) probability is the sum of the SHAP values for every predictor^³⁴,³⁵. To determine if there were possible threshold values for the predictors, the clinical measurements and the five most important biomarker measurements were then plotted against their respective SHAP values. In addition, to visualize how SHAP values were affecting the prediction on the individual level, SHAP force plots were built for ten randomly selected individuals: five predicted to develop low PMA and five predicted to not develop low PMA (using the mean predicted probability of the combined model’s training set as the cutoff point)^³⁶. All SHAP analyses focused on the training set of the combined model unless otherwise specified.
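The additivity property described above (prediction minus base value equals the sum of the per-predictor contributions) can be checked on a toy example by computing exact Shapley values. The sketch below enumerates every coalition, which is only practical for a handful of predictors and is not the algorithm used by the SHAP library for tree models. The toy model, the all-zero baseline, and the convention of replacing "absent" predictors with baseline values are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance.

    Predictors outside a coalition are set to their baseline value,
    which is one simple way of treating a predictor as "missing".
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(features):
    """Hypothetical model: one additive term plus one interaction."""
    a, b, c = features
    return 0.2 + 0.1 * a + 0.05 * b * c


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
# Additivity: sum(phi) equals toy_model(x) - toy_model(baseline).
```

In this toy case the additive predictor receives its own term in full, while the two interacting predictors split their joint effect equally.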

Group risk

Additionally, to examine whether there were any distinguishable groups within the participants, we clustered the combined model’s training set based on the biomarkers’ standardized SHAP values. This was done using PCA, to reduce dimensionality, and K-Means clustering. The optimal number of clusters was based on the silhouette coefficient of the raw SHAP values. We then stratified the clusters based on whether the participants developed low PMA in phase 2. Differences in the biomarkers’ raw SHAP values between the three clusters were then assessed using a one-way ANOVA and visualized using box plots. All SHAP analyses focused on the training set of the combined model unless otherwise specified.
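The silhouette coefficient used to choose the number of clusters can be written out directly. The sketch below scores a given cluster assignment; the PCA and K-Means steps themselves are not reproduced. The function name, the Euclidean distance, and the convention of scoring singleton clusters as zero are assumptions, and the points are invented.

```python
from math import dist  # available in Python 3.8+


def silhouette_coefficient(points, labels):
    """Mean silhouette over all points for a given cluster assignment."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # Mean distance to the other members of the same cluster
        # (the point's zero distance to itself adds nothing to the sum).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # Smallest mean distance to the members of any other cluster.
        b = min(sum(dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Invented 2-D points forming two well-separated pairs.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_coefficient(points, [0, 0, 1, 1])
bad = silhouette_coefficient(points, [0, 1, 0, 1])
```

Repeating this calculation for each candidate number of clusters and keeping the assignment with the highest mean score mirrors the selection rule described above.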

Feature selected biomarkers relationship with PMA

Finally, to explore the relevance of the five most important biomarkers, Pearson correlation coefficients were calculated between the biomarkers at baseline and the change in PMA between the two phases (cm²) amongst participants without low PMA at baseline.
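For reference, the Pearson correlation coefficient used here can be computed as follows. This is a minimal sketch with a hypothetical function name and invented values; it returns only the coefficient, not a p-value.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Invented values: a baseline biomarker level and a change in PMA (cm^2).
biomarker = [1.0, 2.0, 3.0, 4.0, 5.0]
pma_change = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(biomarker, pma_change)
```

Because the coefficient captures only linear association, a predictor can have a weak correlation with change in PMA and still be useful to a non-linear model such as a random forest.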

Statistics

All analyses were conducted using Python 3.9.7 and R 4.0.3. All statistical tests were two-tailed and P values<0.05 were taken to mean statistical significance unless otherwise specified. The initial univariate screen included a Welch’s t-test where FDR q<0.10 (calculated using the Benjamini–Hochberg procedure) was taken to mean statistical significance. The prediction models’ performances were summarized by the AUROC, AUPRC, the calibration curve, and the Brier score (“the mean squared difference between the predicted probability and the actual outcome”) of their respective testing sets^³³. The calibration curves were calculated using 10 bins. The AUROCs were compared using a t-test. A one-way ANOVA and boxplots were used to examine and visualize the differences in biomarker SHAP values between clusters. Boxplots included means (red triangles), medians (black lines) and error bars (1.5×the interquartile range). Pearson correlation coefficients were calculated to examine the relationship between biomarkers and change in PMA between baseline and phase 2.
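As a worked illustration of the one-way ANOVA used to compare SHAP values between clusters, the F statistic can be computed from the between-group and within-group sums of squares. The sketch below uses a hypothetical function name and invented values, and leaves the conversion of F to a p-value to a statistics package.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need >= 2 groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented SHAP-like values for three hypothetical clusters.
f_stat, df_b, df_w = one_way_anova_f([[1.0, 2.0, 3.0],
                                      [2.0, 3.0, 4.0],
                                      [5.0, 6.0, 7.0]])
```

A large F relative to its reference distribution indicates that at least one cluster mean differs from the others; it does not say which one.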

Supplementary Information

Supplementary Information (1.5M, docx).

Acknowledgements

The COPDGene study consortium (NCT00608764) is supported by NHLBI U01 HL089897 and U01 HL089856, as well as by the COPD Foundation through contributions made to an Industry Advisory Board comprised of AstraZeneca, Bayer Pharmaceuticals, Boehringer-Ingelheim, Genentech, GlaxoSmithKline, Novartis, Pfizer, and Sunovion. Additional funding for this work includes National Institutes of Health grants: K08-HL145118 (Ash), T32-HL007633 (Mason, Choi), T32-HL105346 (Chiles), R01-HL116931 (San José Estépar, Washko) and R01-HL122464 (San José Estépar, Washko).

Author contributions

NAE: Hypothesis/research question generation, image and data analysis, initial manuscript draft, manuscript revisions, final manuscript; JC: Hypothesis/research question generation, manuscript revisions, final manuscript; SM: Hypothesis/research question generation, manuscript revisions, final manuscript; TS: Hypothesis/research question generation, manuscript revisions, final manuscript; VC: Hypothesis/research question generation, manuscript revisions, final manuscript; ER: Hypothesis/research question generation, manuscript revisions, final manuscript; BC: Hypothesis/research question generation, manuscript revisions, final manuscript; NFY: Hypothesis/research question generation, manuscript revisions, final manuscript; AAD: Hypothesis/research question generation, manuscript revisions, final manuscript; GRW: Hypothesis/research question generation, image and data analysis, manuscript revisions, final manuscript; MLM: Hypothesis/research question generation, manuscript revisions, final manuscript; RSJE: Hypothesis/research question generation, image and data analysis, manuscript revisions, final manuscript; SYA: Hypothesis/research question generation, image and data analysis, initial manuscript draft, manuscript revisions, final manuscript; COPDGene Study Consortium: Data acquisition and provision.

Data availability

The data that support the findings of this study are available from the database of Genotypes and Phenotypes (dbGaP, https://www.ncbi.nlm.nih.gov/gap/, accession number pht002239.v4.p2), the National Heart, Lung and Blood Institute (NHLBI) BioData Catalyst (https://biodatacatalyst.nhlbi.nih.gov/resources/data), and by reasonable request from the COPDGene study (https://www.copdgene.org/).

Competing interests

Mr. Enzer reports no conflicts of interest. Dr. Mason reports employment by Sarepta Therapeutics, outside of this current work, and grant funding from the National Institutes of Health (NIH), related to this current work. Dr. Chiles reports grant funding from the NIH. Dr. McDonald reports no conflicts of interest. Dr. Shirahata reports no conflicts of interest. Ms. Yuan reports no conflicts of interest. Mr. Castro reports no conflicts of interest. Dr. Regan reports no conflicts of interest. Dr. Choi reports consulting fees from Quantitative Imaging Solutions, outside of this current work. Dr. Diaz reports no conflicts of interest. Dr. Washko reports ownership/dividend from Quantitative Imaging Solutions, outside of this current work. Dr. Estépar reports ownership/dividend from Quantitative Imaging Solutions, outside of this current work. Dr. Ash reports ownership/dividend from Quantitative Imaging Solutions, outside of this current work, and grant funding from the National Institutes of Health (NIH), related to this current work. The COPDGene study consortium (NCT00608764) is supported by NHLBI U01 HL089897 and U01 HL089856, as well as by the COPD Foundation through contributions made to an Industry Advisory Board comprised of AstraZeneca, Bayer Pharmaceuticals, Boehringer-Ingelheim, Genentech, GlaxoSmithKline, Novartis, Pfizer, and Sunovion. The remaining authors do not have any competing interests to declare.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A list of authors and their affiliations appears at the end of the paper.

Contributor Information

COPDGene Study Consortium:

Nicola A. Hanania,¹² Mustafa Atik,¹² Laura Bertrand,¹² Aladin Boriek,¹² Thomas Monaco,¹² Dharani Narendra,¹² Francesca Polverino,¹² Veronica V. Lenge de Rosen,¹² Paula Sierra Salas,¹² Tianshi David Wu,¹² Dawn L. DeMeo,¹³ Craig P. Hersh,¹³ Alejandro A. Diaz,¹,²,⁴ Staci M. Gagne,¹³ Francine L. Jacobson,¹³ Kathryn Marentette,¹³ George R. Washko,¹,²,⁴ Seth Wilson,¹³ Jeong H. Yun,¹³ R. Graham Barr,¹⁴ John H. M. Austin,¹⁴ Maria Lorena Gomez Blum,¹⁴ Belinda M. D’Souza,¹⁴ Emilay Florez,¹⁴ Valeria Lopez,¹⁴ Wanda Pecheco,¹⁴ Byron Thomashow,¹⁴ Chris H. Wendt,¹⁵ Arianne Baldomero,¹⁵ Miranda Hassler,¹⁵ Ken M. Kunisaki,¹⁵ David MacDonald,¹⁵ Charlene McEvoy,¹⁶ Nell Adams,¹⁶ Barbara Heinz,¹⁶ Jonathan Phelan,¹⁶ Cheryl Sasse,¹⁶ Eric L. Flenaugh,¹⁷ Judith Delancy,¹⁷ Marilyn G. Foreman,¹⁷ Hirut Gebrekristos,¹⁷ Willi Howell,¹⁷ Dominique Lawson,¹⁷ Mario Ponce,¹⁷ Gloria Westney,¹⁷ Russell P. Bowler,¹⁸ Sophia Addi,¹⁸ Elena Engel,¹⁸ Jay Finigan,¹⁸ Claire Guo,¹⁸ Seth Kligerman,¹⁸ David A. Lynch,¹⁸ Elizabeth Regan,⁴,⁷ Lisa Ruvuna,¹⁸ Richard Rosiello,¹⁹ Jean Champagne,¹⁹ Mary Charpentier,¹⁹ Theodore Girard,¹⁹ Jon Jaksha,¹⁹ Diane Kirk,¹⁹ Laurie Kuck,¹⁹ Mohammed Quraishi,¹⁹ Lucia Sears,¹⁹ Gerard J. Criner,²⁰ Elise Cortese,²⁰ Chandra Dass,²⁰ Laurie Jameson,²⁰ Nathaniel Marchetti,²⁰ Francine McGonagle,²⁰ Lauren Miller,²⁰ Kim Selwood,²⁰ Kartik Shenoy,²⁰ Regina Sheridan,²⁰ Shubhra Srivastava-Malhotra,²⁰ Surya P. Bhatt,²¹ William C. Bailey,²¹ Sandeep Bodduluri,¹² Joe W. Chiles,³,⁴ Mark T. Dransfield,²¹ Scott Grumley,²¹ Sonya Hardy,²¹ Anand Iyer,²¹ David C. LaFon,²¹ Padma Manapragada,²¹ Merry-Lynn McDonald,³,⁴ Hrudaya Nath,²¹ Gabriela Oates,²¹ Satinder P. Singh,²¹ Raymond C. Wade,²¹ Mike Wells,²¹ Abigail West,²¹ Douglas Conrad,²² Jeffrey Barry,²² Marissa Gil,²² Albert Hsiao,²² Amber Martineau,²² Jenna Mielke,²² Gabriel Querido,²² Xavier Soler,²² Rajat Suri,²² Sean Swenson,²² Angela Wang,²² Andrew Yen,²² Alejandro Comellas,²³ Eric Bruening,²³ Sidney Davis,²³ Nick Feeley,²³ Spyridon Fortis,²³ Devon Foster,²³ Eric Garcia,²³ Kaitlyn Glosser,²³ Karin F. Hoth,²³ Justin D. Kuhn,²³ Archana Laroia,²³ Changhyun Lee,²³ Jeni Michelson,²³ Kim Sprenger,²³ Katelyn Wilensky,²³ MeiLan K. Han,²⁴ Gretchen Bautista,²⁴ Jeffrey L. Curtis,²⁴ Crystal Cutlip,²⁴ Craig J. Galban,²⁴ Jaide Hawn,²⁴ Ella Kazerooni,²⁴ Wassim Labaki,²⁴ Lisa McCloskey,²⁴ Kelly Rysso,²⁴ Liujian Zhao,²⁴ Joanne Billings,²⁵ Tadashi L. Allen,²⁵ Mary P. Bailey,²⁵ Anne Duesterbeck,²⁵ Nate Gaeckle,²⁵ Brooke Noren,²⁵ Kyong Yun,²⁵ Frank Sciurba,²⁶ Daniel Arminavage,²⁶ P. Takis Benos,²⁶ Jessica Bon,²⁶ Divay Chandra,²⁶ Paula Consolaro,²⁶ Tiffany Ditter,²⁶ Jason Duin,²⁶ Robert Gregg,²⁶ Chad Karoleski,²⁶ Zehavit Kirshenboim,²⁶ Rhonda Lincoln,²⁶ Antonio Anzueto,²⁷ Sandra G. Adams,²⁷ Diego Maselli-Caceres,²⁷ and Mario E. Ruiz²⁷

Nicola A. Hanania

¹²Baylor College of Medicine, Houston, USA

Mustafa Atik

¹²Baylor College of Medicine, Houston, USA

Laura Bertrand

¹²Baylor College of Medicine, Houston, USA

Aladin Boriek

¹²Baylor College of Medicine, Houston, USA

Thomas Monaco

¹²Baylor College of Medicine, Houston, USA

Dharani Narendra

¹²Baylor College of Medicine, Houston, USA

Francesca Polverino

¹²Baylor College of Medicine, Houston, USA

Veronica V. Lenge de Rosen

¹²Baylor College of Medicine, Houston, USA

Paula Sierra Salas

¹²Baylor College of Medicine, Houston, USA

Tianshi David Wu

¹²Baylor College of Medicine, Houston, USA

Dawn L. DeMeo

¹³Brigham and Women’s Hospital, Boston, USA

Craig P. Hersh

¹³Brigham and Women’s Hospital, Boston, USA

Alejandro A. Diaz

¹Division of Pulmonary and Critical Care Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA USA

²Applied Chest Imaging Laboratory, Brigham and Women’s Hospital, Boston, MA USA

⁴COPDGene Study Consortium, Denver, CO USA

Staci M. Gagne

¹³Brigham and Women’s Hospital, Boston, USA

Francine L. Jacobson

¹³Brigham and Women’s Hospital, Boston, USA

Kathryn Marentette

¹³Brigham and Women’s Hospital, Boston, USA

George R. Washko

¹Division of Pulmonary and Critical Care Medicine, Department of Medicine, Brigham and Women’s Hospital, Boston, MA USA

²Applied Chest Imaging Laboratory, Brigham and Women’s Hospital, Boston, MA USA

⁴COPDGene Study Consortium, Denver, CO USA

Seth Wilson

¹³Brigham and Women’s Hospital, Boston, USA

Jeong H. Yun

¹³Brigham and Women’s Hospital, Boston, USA

R. Graham Barr

¹⁴Columbia University Medical Center, New York, USA

John H. M. Austin

¹⁴Columbia University Medical Center, New York, USA

Maria Lorena Gomez Blum

¹⁴Columbia University Medical Center, New York, USA

Belinda M. D’Souza

¹⁴Columbia University Medical Center, New York, USA

Emilay Florez

¹⁴Columbia University Medical Center, New York, USA

Valeria Lopez

¹⁴Columbia University Medical Center, New York, USA

Wanda Pecheco

¹⁴Columbia University Medical Center, New York, USA

Byron Thomashow

¹⁴Columbia University Medical Center, New York, USA

Chris H. Wendt

¹⁵Minneapolis VA Medical Center, Minneapolis, USA

Arianne Baldomero

¹⁵Minneapolis VA Medical Center, Minneapolis, USA

Miranda Hassler

¹⁵Minneapolis VA Medical Center, Minneapolis, USA

Ken M. Kunisaki

¹⁵Minneapolis VA Medical Center, Minneapolis, USA

David MacDonald

¹⁵Minneapolis VA Medical Center, Minneapolis, USA

Charlene McEvoy

¹⁶Minnesota HealthPartners-Twin Cities, Bloomington, USA

Nell Adams

¹⁶Minnesota HealthPartners-Twin Cities, Bloomington, USA

Barbara Heinz

¹⁶Minnesota HealthPartners-Twin Cities, Bloomington, USA

Jonathan Phelan

¹⁶Minnesota HealthPartners-Twin Cities, Bloomington, USA

Cheryl Sasse

¹⁶Minnesota HealthPartners-Twin Cities, Bloomington, USA

Eric L. Flenaugh

¹⁷Morehouse School of Medicine, Atlanta, USA

Judith Delancy

¹⁷Morehouse School of Medicine, Atlanta, USA

Marilyn G. Foreman

¹⁷Morehouse School of Medicine, Atlanta, USA

Hirut Gebrekristos

¹⁷Morehouse School of Medicine, Atlanta, USA

Willi Howell

¹⁷Morehouse School of Medicine, Atlanta, USA

Dominique Lawson

¹⁷Morehouse School of Medicine, Atlanta, USA

Mario Ponce

¹⁷Morehouse School of Medicine, Atlanta, USA

Gloria Westney

¹⁷Morehouse School of Medicine, Atlanta, USA

Russell P. Bowler

¹⁸National Jewish Health, Denver, USA

Sophia Addi

¹⁸National Jewish Health, Denver, USA

Elena Engel

¹⁸National Jewish Health, Denver, USA

Jay Finigan

¹⁸National Jewish Health, Denver, USA

Claire Guo

¹⁸National Jewish Health, Denver, USA

Seth Kligerman

¹⁸National Jewish Health, Denver, USA

David A. Lynch

¹⁸National Jewish Health, Denver, USA

Elizabeth Regan

⁴COPDGene Study Consortium, Denver, CO USA

⁷Division of Rheumatology, Department of Medicine, National Jewish Health, Denver, CO USA

Lisa Ruvuna

¹⁸National Jewish Health, Denver, USA

Richard Rosiello

¹⁹Reliant Medical Group (Fallon), Auburn, USA

Jean Champagne

¹⁹Reliant Medical Group (Fallon), Auburn, USA

Mary Charpentier

¹⁹Reliant Medical Group (Fallon), Auburn, USA

Theodore Girard

¹⁹Reliant Medical Group (Fallon), Auburn, USA

Jon Jaksha

¹⁹Reliant Medical Group (Fallon), Auburn, USA

Diane Kirk

¹⁹Reliant Medical Group (Fallon), Auburn, USA

Laurie Kuck

¹⁹Reliant Medical Group (Fallon), Auburn, USA

Mohammed Quraishi

¹⁹Reliant Medical Group (Fallon), Auburn, USA

Lucia Sears

¹⁹Reliant Medical Group (Fallon), Auburn, USA

Gerard J. Criner

²⁰Temple University, Philadelphia, USA

Elise Cortese

²⁰Temple University, Philadelphia, USA

Chandra Dass

²⁰Temple University, Philadelphia, USA

Laurie Jameson

²⁰Temple University, Philadelphia, USA

Nathaniel Marchetti

²⁰Temple University, Philadelphia, USA

Francine McGonagle

²⁰Temple University, Philadelphia, USA

Lauren Miller

²⁰Temple University, Philadelphia, USA

Kim Selwood

²⁰Temple University, Philadelphia, USA

Kartik Shenoy

²⁰Temple University, Philadelphia, USA

Regina Sheridan

²⁰Temple University, Philadelphia, USA

Shubhra Srivastava-Malhotra

²⁰Temple University, Philadelphia, USA

Surya P. Bhatt

²¹University of Alabama, Birmingham, USA

William C. Bailey

²¹University of Alabama, Birmingham, USA

Sandeep Bodduluri

¹²Baylor College of Medicine, Houston, USA

Joe W. Chiles

³Division of Pulmonary, Allergy and Critical Care Medicine, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL USA

⁴COPDGene Study Consortium, Denver, CO USA

Mark T. Dransfield

²¹University of Alabama, Birmingham, USA

Scott Grumley

²¹University of Alabama, Birmingham, USA

Sonya Hardy

²¹University of Alabama, Birmingham, USA

Anand Iyer

²¹University of Alabama, Birmingham, USA

David C. LaFon

²¹University of Alabama, Birmingham, USA

Padma Manapragada

²¹University of Alabama, Birmingham, USA

Merry-Lynn McDonald

³Division of Pulmonary, Allergy and Critical Care Medicine, Heersink School of Medicine, The University of Alabama at Birmingham, Birmingham, AL USA

⁴COPDGene Study Consortium, Denver, CO USA

Hrudaya Nath

²¹University of Alabama, Birmingham, USA

Gabriela Oates

²¹University of Alabama, Birmingham, USA

Satinder P. Singh

²¹University of Alabama, Birmingham, USA

Raymond C. Wade

²¹University of Alabama, Birmingham, USA

Mike Wells

²¹University of Alabama, Birmingham, USA

Abigail West

²¹University of Alabama, Birmingham, USA

Douglas Conrad

²²University of California, San Diego, USA

Jeffrey Barry

²²University of California, San Diego, USA

Marissa Gil

²²University of California, San Diego, USA

Albert Hsiao

²²University of California, San Diego, USA

Amber Martineau

²²University of California, San Diego, USA

Jenna Mielke

²²University of California, San Diego, USA

Gabriel Querido

²²University of California, San Diego, USA

Xavier Soler

²²University of California, San Diego, USA

Rajat Suri

²²University of California, San Diego, USA

Sean Swenson

²²University of California, San Diego, USA

Angela Wang

²²University of California, San Diego, USA

Find articles by Angela Wang