Cigarette Smoking and E-Cigarette Use Induce Shared DNA Methylation Changes Linked To Carcinogenesis
ABSTRACT
Tobacco use is a major modifiable risk factor for adverse health outcomes, including cancer, and elicits profound epigenetic changes thought to be associated with long-term cancer risk. While electronic cigarettes (e-cigarettes) have been advocated as harm reduction alternatives to tobacco products, recent studies have revealed potential detrimental effects, highlighting the urgent need for further research into the molecular and health impacts of e-cigarettes. Here, we applied computational deconvolution methods to dissect the cell- and tissue-specific epigenetic effects of tobacco or e-cigarette use on DNA methylation (DNAme) in over 3,500 buccal/saliva, cervical, or blood samples, spanning epithelial and immune cells at directly and indirectly exposed sites. The 535 identified smoking-related DNAme loci [cytosine-phosphate-guanine sites (CpG)] clustered into four functional groups, including detoxification or growth signaling, based on cell type and anatomic site. Loci hypermethylated in buccal epithelial cells of smokers associated with NOTCH1/RUNX3/growth factor receptor signaling also exhibited elevated methylation in cancer tissue and progressing lung carcinoma in situ lesions, and hypermethylation of these sites predicted lung cancer development in buccal samples collected from smokers up to 22 years prior to diagnosis, suggesting a potential role in driving carcinogenesis. Alarmingly, these CpGs were also hypermethylated in e-cigarette users with a limited smoking history. This study sheds light on the cell type–specific changes to the epigenetic landscape induced by smoking-related products.
Significance: The use of both cigarettes and e-cigarettes elicits cell- and exposure-specific epigenetic effects that are predictive of carcinogenesis, suggesting caution when broadly recommending e-cigarettes as aids for smoking cessation.
1 European Translational Oncology Prevention and Screening (EUTOPS) Institute, Universität Innsbruck, Innsbruck, Austria.
2 Research Institute for Biomedical Aging, Universität Innsbruck, Innsbruck, Austria.
3 Department of Women’s Cancer, UCL EGA Institute for Women’s Health, University College London, London, United Kingdom.
4 Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany.
5 German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany.
6 Department of Gynecology and Obstetrics, First Faculty of Medicine and Hospital Na Bulovce, Charles University in Prague, Prague, Czech Republic.
7 Gynecologic Oncology Center, Department of Obstetrics and Gynecology, First Faculty of Medicine, Charles University in Prague, General University Hospital in Prague, Prague, Czech Republic.
8 MRC Unit for Lifelong Health and Ageing, Institute of Cardiovascular Science, University College London, London, United Kingdom.
9 MRC Integrative Epidemiology Unit at the University of Bristol, Bristol, United Kingdom.
10 Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom.
11 Department of Women’s and Children’s Health, Karolinska Institutet, Stockholm, Sweden.
Corresponding Author: Martin Widschwendter, EUTOPS Institute, Universität Innsbruck, Milser Straße 10, Hall in Tirol, Tirol 6060, Austria. E-mail: [email protected]
Cancer Res 2024;84:1898–914
doi: 10.1158/0008-5472.CAN-23-2957
This open access article is distributed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
2024 The Authors; Published by the American Association for Cancer Research
AACRJournals.org | 1898
Smoking-Triggered Epigenome-Based Carcinogenic Mechanisms
dysfunction (7) and DNA damage (8), underscoring the urgency for further research into molecular changes and long-term health impacts of e-cigarettes (9). However, the relative novelty of e-cigarettes and the fact that many e-cigarette users (“vapers”) are also former smokers render this task complex, and studies with several decades of follow-up would be required to investigate the impact of e-cigarette use on cancer risk if incidence were the primary outcome.
Biomarkers could represent an attractive strategy to evaluate their impact in the absence of such long-term studies. The majority of existing biomarker studies for e-cigarette use have thus far focused only on acute impacts. Some of these studies have found that e-cigarettes elicit similar biomarker changes to cigarette smoking (10–12), while others found a relative reduction in risk indicators or pre-existing disease after switching from cigarettes to e-cigarettes (13, 14). Nonetheless, to evaluate longer-term health effects, it is essential to identify biomarkers that may be informative of cancer risk related to cigarette and e-cigarette use. Such biomarkers should meet the following criteria to be of clinical use: (i) they should be modified by smoking and e-cigarette use; (ii) they should lie in genes associated with carcinogenesis; (iii) they should indicate a clonal advantage for cells, as indicated by an aggravation in cancer tissue compared with adjacent non-cancer tissue; (iv) they should be associated with cancer progression in a premalignant lesion; and (v) they should be reflective of long-term cancer risk in a surrogate tissue, for example, blood or buccal swab, to allow for noninvasive monitoring.
Investigating how tobacco use or e-cigarettes influence the epigenome, and might thereby be linked to carcinogenesis, could help to better understand their long-term impacts. DNA methylation (DNAme) at the cytosine C-5 position is an epigenetic modification that integrates the impact of heritable and nonheritable factors (15). It has previously been implicated in conveying, at least in part, the long-term health impacts of smoking, with DNAme alterations enriched in genes associated with smoking-related diseases (16). Certain epigenetic changes have shown persistence after smoking cessation (17) and could even predict lung cancer incidence (e.g., methylation in genes AHRR or F2RL3; refs. 18–20). Investigations into the effects of smokeless tobacco (21, 22) or e-cigarette use (22, 23) on DNAme are also emerging. These studies generally report less pronounced epigenetic changes when comparing smokeless tobacco with combustible cigarettes (21, 22), as well as an absence of a strong DNAme response to e-cigarette use in blood (22) and saliva (23).
The majority of DNAme studies into smoking-related changes, including those predicting lung cancer incidence (18, 19), have used blood samples (e.g., refs. 24–29). However, DNAme variations across cell types (30), in particular in response to exposures and other nonheritable factors, merit consideration. For instance, aging has been found to impact DNAme differently across distinct cell types or tissues (31–33). Such findings necessitate the consideration of cellular heterogeneity during DNAme analysis, which is typically carried out in bulk, for the interpretation of epigenetic changes (34, 35). Although many studies in blood have accounted for cellular composition, studies that explore methylation changes in specific cell types remain sparse (36, 37). These studies identified that smoking differentially impacts cell types of the innate and adaptive immune system (36, 37). Some studies have also investigated DNAme changes in response to smoking in other sample types, including buccal swabs (21, 38, 39), saliva (40), adipose, or skin tissue (41).
Notably, while investigating different tissues or accounting for cellular heterogeneity, few studies have aimed to study the effects of tobacco or e-cigarette use on DNAme across distinct cell types (36, 37), and none have directly scrutinized impacts on epithelial versus immune cells at different anatomic sites (directly exposed vs. not directly exposed). This is of particular interest given the role of epithelial cells, whether directly exposed (e.g., lung, oral mucosa) or not (e.g., cervix), as the predominant cell of origin for tobacco-related malignancies, and the fact that smoking-related DNAme changes in buccal samples, consisting predominantly of epithelial cells, were found to reflect cancer-associated changes (38). Meanwhile, immune cells and their dysregulation can promote tumor initiation and progression (42), and their specific changes in response to smoking might likewise be of relevance.
Investigating cell type–specific DNAme changes resulting from smoking or vaping could therefore help to (i) unveil diverse biological responses to tobacco use by distinct cell types, (ii) identify common or divergent epigenetic alterations elicited by tobacco or e-cigarette use in distinct cell types that might be obscured by bulk analysis, and (iii) provide insights into carcinogenesis and potential diagnostic markers. In this study, we systematically unravel the impact of tobacco use on epithelial versus immune cells, employing deconvolution and cell type–specific DNAme inference using data from 1,164 buccal/saliva, 1,777 cervical, and 616 blood samples. We comprehensively assess and validate effects on directly or not directly exposed, thereafter termed “proximal” and “distal”, epithelial and immune cells, in response to smoking, smokeless tobacco, or e-cigarette use. Thereafter, we extend our enquiry into lung cancer tissue and prognosis, along with surrogate samples preceding lung cancer diagnosis to investigate whether smoking-related changes might be suitable for cancer prediction in smokers.
Materials and Methods
Study and sample overview
An overview of characteristics of participants and samples is shown in Supplementary Table S1.
Discovery set
Buccal, cervical, and blood samples were obtained from healthy volunteers who took part in the FORECEE study (female cancer prediction using cervical omics to individualize screening and prevention, 4C), a multicenter study involving several recruitment sites in five European countries (the United Kingdom, Czech Republic, Italy, Norway, and Germany). The FORECEE study had ethical approval from the UK Health Research Authority (REC 14/LO/1633) and all other contributing centers. Participants were ages >18 years and <86 years. After providing written informed consent, participants completed an epidemiologic questionnaire.
Samples were processed as described previously (43). Briefly, buccal cells were collected using two Copan 4N6FLOQ Buccal Swabs (Copan Medical Diagnostics, catalog no. 4504C) by firmly brushing the swab head five to six times against the buccal mucosa of each cheek. The swabs were recapped and left to dry out at room temperature within the sampling tube, which contained a drying desiccant. The sample vial was sealed and stored locally at room temperature. For blood samples, 2.5 mL of venous whole blood was collected in PAXgene blood DNA tubes (BD Biosciences #761165) and stored locally at −20 °C. Cervical liquid-based cytology samples were collected at appropriate clinical venues by trained staff using the ThinPrep system (Hologic Inc., catalog no. 70098-002). Cervical cells were sampled from the cervix using a cervix brush (Rovers Medical Devices, catalog no. 70671-001), which was rotated five times through 360 degrees while in contact with the cervix to maximize cell sampling. The brush was removed from the vagina and immersed in a ThinPrep vial containing Preserve-cyt fluid and then pushed against the bottom of the vial 10 times to facilitate release of the cells from the brush into the solution. All samples were
shipped to University College London (UCL) at ambient temperature. Biological samples were given an anonymous Participant ID Number, which was assigned to the person’s name in a securely stored link file.
Cervical, buccal, and breast tissue DNA were normalized to 25 ng/µL and 500 ng total DNA were bisulfite modified using the EZ-96 DNA Methylation-Lightning kit (Zymo Research Corp, catalog no. D5047) on the Hamilton Star Liquid handling platform. A total of 8 µL of modified DNA was subjected to methylation analysis on the Illumina Infinium HumanMethylationEPIC BeadChip (Illumina) at UCL Genomics according to the manufacturer’s standard protocol.
Validation set
The validation set comprised 304 matched buccal and blood samples from 152 female volunteers in the UK Medical Research Council (MRC) National Survey of Health and Development (NSHD), a birth cohort study of men and women born in 1946, as described previously (38, 44), and 442 cervical samples from breast cancer cases collected as part of the FORECEE study (see Discovery set). All volunteers in the NSHD study provided written informed consent for their samples to be used in genetic studies of health, and the Central Manchester Ethics Committee approved the use of these samples for epigenetic studies of health in 2012. Women were selected from those who provided a buccal and blood sample at age 53 years in 1999, who had not previously developed any cancer, and who had complete information on epidemiologic variables of interest and follow-up. Methylation analysis for buccal and blood samples was performed using the Illumina Infinium HumanMethylation450 BeadChip array (38), while for the cervical samples it was performed using the Illumina Infinium HumanMethylationEPIC BeadChip (Illumina) at UCL Genomics according to the manufacturer’s standard protocol.
E-cigarette set
Data on e-cigarette users were derived from the Studying the Epigenetics of E-cigarette Use (SEE-Cigs) study (23). As described previously, e-cigarette users, tobacco smokers, and nonsmokers ages 16 to 35 years were recruited from the UK general population via several mechanisms, including flyers, blogs, podcasts, and social media from January 2017 to January 2019. E-cigarette users were defined as having used e-cigarettes at least weekly for the past 6 months and having smoked less than 100 cigarettes in their lifetime; smokers were defined as having smoked cigarettes at least weekly for the past 6 months and having used an e-cigarette less than 100 times in their lifetime; never smokers were defined as having smoked cigarettes or e-cigarettes less than 100 times in their lifetime. Additional eligibility criteria were good self-reported physical and mental health and ability to give informed consent as judged by the investigator. Exclusion criteria were dependence on alcohol or drugs other than nicotine; significant current or past illness, current pregnancy or breast feeding; having a related individual in the study (23).
After completing an online questionnaire, participants were screened for eligibility and sent an information sheet and consent form. Written informed consent was obtained from all participants. Participants received a saliva collection kit (DNA Genotek Oragene) and were asked to provide 2 mL of saliva. DNA was extracted from saliva samples and underwent bisulphite conversion using the Zymo EZ DNA Methylation kit (Zymo). Genome-wide methylation status of over 850,000 cytosine-phosphate-guanine sites (CpG) was measured using the Illumina HumanMethylationEPIC array in three batches, with sampling criteria in place to ensure that all three groups were represented in each batch to minimize potential confounding by batch effects. Microarray data underwent quality control and normalization using meffil, an R package designed for preprocessing of large samples of Illumina Methylation BeadChip microarrays (45). Sample outliers were identified and removed on the basis of sex chromosome methylation, methylation versus unmethylation intensity, control probes, and detection P values (N = 10 exclusions in total: 4 vapers, 3 smokers, and 3 nonsmokers). Poor-quality CpG sites, SNP/control probes, and CpGs on the sex chromosomes were excluded, resulting in 846,244 CpG sites for analysis.
Smokeless tobacco use set
Data on saliva samples from snuff tobacco users, smokers, and nonsmokers were obtained from the “Development of Biomarkers of Effect From Chronic Tobacco Usage” study (NCT01923402; ref. 21). Briefly, a cross-sectional study was conducted between June 2010 and January 2011. Adult male subjects ages 35–60 years were enrolled into three cohorts of 40 subjects each, and written informed consent was obtained from all participants. Smokers were defined as exclusive cigarette smokers who self-reported smoking at least 10 cigarettes per day for at least 3 years; moist snuff tobacco users were defined as self-reporting using at least two cans of moist snuff per week for at least 3 years; nonsmokers were individuals who self-reported not to use any tobacco or nicotine-containing products for at least 5 years. Buccal cells were collected following a 2-hour fasting window from food and tobacco. Subjects rinsed their mouth with Scope mouthwash followed by a water rinse and buccal cells were collected. The cell pellet was washed in PBS and used for DNA extraction. DNA extraction and global methylation profiling of 485,577 CpG sites were performed by Expression Analysis, Inc., on Illumina Infinium HumanMethylation450 BeadChip arrays.
Lung cancer tissue
Preprocessed and harmonized Illumina HumanMethylation450K array DNAme data from The Cancer Genome Atlas (TCGA) from lung squamous cell carcinoma (LUSC) and lung adenocarcinoma (LUAD) were accessed via TCGAbiolinks, utilizing all available methylation samples using project codes TCGA-LUAD and TCGA-LUSC (46). Detailed methods are provided in the code repository.
Cervical cancer tissue
DNAme data from cervical cancer tissue or matched normal samples were obtained from NCBI Gene Expression Omnibus (GEO; GSE211668; ref. 47).
Carcinoma in situ progression data
DNAme data from premalignant precursor lesions [carcinoma in situ (CIS)] that either recurred or did not recur were obtained from NCBI GEO (GSE108123; ref. 48). Progressive and regressive lung CIS lesions were laser-captured, and their epigenome interrogated using the Illumina Infinium HumanMethylation450 BeadChip. Data were matched to patient characteristics using Supplementary Materials and Methods, Table 1 from a previous publication (49).
ESTHER study set
DNAme data were obtained from participants of the ESTHER (Epidemiological Study on the Chances of Cure, Early Detection and Optimized Therapy of Chronic Diseases in the Elderly Population) study, a large ongoing prospective, population-based cohort study conducted in Germany. In brief, 9,940 participants were recruited by their general practitioners during routine health checkups between July 2000 and December 2002 and provided written informed consent for study participation. The participants have been followed up every 2 to 3 years since then. At baseline recruitment and each follow-up,
standardized self-administered questionnaires were used to collect information on sociodemographic characteristics, lifestyle, and dietary factors. Blood samples were collected during the examinations and stored at −80 °C for later testing. DNAme analyses in this study were based on 1,352 samples from randomly selected individuals (subset IV, total n = 1,493), and analyzed using the Illumina MethylationEPIC. Incident cases of cancer during follow-up between 2000 and end of 2018 (17 years of follow-up) were identified through record linkage with the Saarland Cancer Registry. Controls are participants without lung cancer diagnosis until the end of 17 years of follow-up.
General information for clinical studies
All studies obtained written informed consent from participants. Studies were conducted in accordance with the Declaration of Helsinki and approved by Institutional Review Boards.
DNAme data preprocessing
Methylation microarray data in the discovery, validation, moist snuff tobacco user, and CIS datasets were processed through the same standardized pipeline running in R version 4.2.2. Raw data were loaded using the R package minfi, version 1.36.0 (50). Any samples with median methylated and unmethylated intensities <9.5 were removed. Any probes with a detection P value >0.01 were regarded as having failed. Any samples with >10% failed probes, and any probes with >10% failure rate were removed from the dataset. Beta values from failed probes (0.001% of the dataset) were imputed using the impute.knn function as part of the impute R package, version 1.62.0. Non-CpG probes (2,932), SNP-related probes as identified by Zhou and colleagues (82,108), and chrY probes were removed from the dataset as previously reported (43). An additional 6,102 previously identified probes that followed a trimodal methylation pattern characteristic of an underlying SNP were removed. Background intensity correction and dye bias correction were performed using the minfi single-sample preprocessNoob function. Probe bias correction was performed using the beta mixture quantile normalization (BMIQ) algorithm of the ChAMP package, version 2.18.3 (51).
For the ESTHER study data, raw DNAme data were normalized to internal controls provided by the manufacturer. In data preprocessing, signals of probes with detection P value >0.01, >10% missing values, and probes targeting the X and Y chromosomes were excluded.
Cell type proportions were inferred using EpiDISH (epigenetic dissection of intrasample heterogeneity; ref. 52). Epithelial, fibroblast, and immune cell proportions were identified using the centEpiFibIC.m reference matrix. Immune cell subtype proportions were identified using the hierarchical EpiDISH algorithm (hEpiDISH) with the centBloodSubtype.m reference matrix (maxit = 500, RPC = 3, h.CT.idx = 3).
Analysis of DNAme association with smoking
Our analysis workflow is shown in Supplementary Fig. S1. We evaluated cell type–specific DNAme changes associated with smoking separately in DNAme data from buccal, cervical, and blood samples of current or never smokers (Supplementary Table S1). Initially, we conducted an epigenome-wide association study (EWAS) separately in each tissue, accounting for age and immune cell proportion (buccal, cervical samples), or age and lymphoid cell proportion [blood, 1 − (myeloid proportion)], utilizing hEpiDISH (53). We grouped monocytes, neutrophils, and eosinophils as myeloid lineage (hepidish_Mono, hEpidish_Neutro, hEpidish_Eosino).
CpGs were considered significantly associated with smoking if their Holm–Bonferroni–corrected P value was < 0.05, corresponding to P < 8.2 × 10⁻⁸ for cervical and P < 7.9 × 10⁻⁸ for buccal and blood samples, which is more conservative than a benchmarking study for the EPIC array suggested (P < 9 × 10⁻⁸; ref. 54). To estimate the impact of smoking on epithelial versus immune cells in buccal and cervical samples, we performed linear regression of the beta values on EpiDISH-inferred immune cell proportion (52) for each CpG site, as described previously (43, 55). The linear models were fitted for smokers and never smokers separately, and the intercept points at immune cell proportion = 0 were used as estimates of mean beta values in smokers and never smokers in a pure epithelial cell population. The difference between these intercept points provided a Δβ estimate in epithelial cells. Conversely, the difference between intercept points at immune cell proportion = 1 provided immune cell Δβ estimates. The same approach was applied to account for myeloid and lymphoid differences.
All CpGs that were (i) significant in at least one of the samples after Holm–Bonferroni correction, (ii) present on Illumina HumanMethylationEPIC array version 2, and (iii) not on our list of previously identified “unreliable” probes were used for further analysis (n = 535; ref. 56). Of note, seven of these CpGs are located on the X chromosome and were removed for evaluation of mean scores in additional datasets.
We performed clustering on a reduced feature space to identify co-regulated groups of CpGs, that is, a matrix of Δβ values where rows were based on CpGs that were significantly associated with smoking in the initial EWAS, and columns were based on Δβ values of the given CpG across all tissues (Δβ epithelial in buccal, Δβ immune in buccal, Δβ epithelial in cervical, Δβ immune in cervical, Δβ lymphoid in blood, Δβ myeloid in blood), constituting a matrix of 535 × 6 (Supplementary Fig. S1). Clusters were identified via Uniform Manifold Approximation and Projection (UMAP) and validated using a distance-based hierarchical clustering approach.
Functional annotation and gene set enrichment analysis
The Illumina Infinium HumanMethylationEPIC BeadChip manifest (doi: 10.18129/B9.bioc.IlluminaHumanMethylationEPICanno.ilm10b4.hg19) was used to identify genes the CpGs were spanning. CpGs on sex chromosomes were excluded. The clusterProfiler package (57) was used for gene set enrichment analysis of genes unique to each group (i.e., not present in other groups). All genes with CpGs on the EPIC array not located on sex chromosomes were used as background. Reactome pathway analysis was conducted using the ReactomePA package (58) with the pvalueCutoff set to 0.2 and minGSSize set to 3. P values were adjusted using the Benjamini–Hochberg method.
Polycomb group target (PCGT) genes were defined as genes with occupancy of at least one of SUZ12, EED, and H3K27me in a previous chromatin immunoprecipitation sequencing experiment (Supplementary Table S9 in ref. 59). Of these, 1,343 genes were found in the Illumina Infinium HumanMethylationEPIC manifest. Enrichment for PCGT genes was conducted via Fisher exact test.
Association with gene expression
Matched gene expression (STAR counts) and methylation data were obtained from TCGA-LUAD and TCGA-LUSC via the TCGAbiolinks package. For each CpG, methylation beta values were correlated to log2 counts of the corresponding cis gene (Pearson correlation). P values and Pearson R were collected and visualized. CpGs with a correlation of Holm–Bonferroni–corrected P value < 0.05 were considered significantly associated with gene expression.
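The intercept-based estimation of cell type–specific Δβ described under "Analysis of DNAme association with smoking" can be illustrated with a minimal Python sketch. It is not the authors' R pipeline. The function names (fit_line, cell_type_delta_beta) and the toy values are hypothetical. The sketch also assumes a single CpG and a simple regression of beta on immune cell proportion with no further covariates.

```python
# Minimal sketch (assumption: one CpG, simple linear regression of beta on
# immune cell proportion, fitted separately for smokers and never smokers).

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def cell_type_delta_beta(ic_smoker, beta_smoker, ic_never, beta_never):
    """Return (epithelial delta-beta, immune delta-beta) for one CpG.

    Epithelial: difference of fitted values at immune cell proportion = 0.
    Immune:     difference of fitted values at immune cell proportion = 1.
    """
    a_s, b_s = fit_line(ic_smoker, beta_smoker)
    a_n, b_n = fit_line(ic_never, beta_never)
    delta_epithelial = a_s - a_n
    delta_immune = (a_s + b_s) - (a_n + b_n)
    return delta_epithelial, delta_immune

# Hypothetical toy data: hypomethylation in smokers that is confined to
# epithelial cells and vanishes as the immune cell proportion approaches 1.
ic = [0.1, 0.3, 0.5, 0.7]
beta_never = [0.8 - 0.1 * p for p in ic]
beta_smoker = [0.6 + 0.1 * p for p in ic]
d_epi, d_imm = cell_type_delta_beta(ic, beta_smoker, ic, beta_never)
print(round(d_epi, 3), round(abs(d_imm), 3))
```

For blood samples, the same functions would apply with lymphoid proportion in place of immune cell proportion, so that the two extrapolated values correspond to myeloid (proportion 0) and lymphoid (proportion 1) estimates.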
Mean methylation computation and correction for cell type
Mean methylation beta value $\bar{\beta}$ (mean β) for each set of CpGs was calculated as:
$$\bar{\beta} = \frac{\sum_{i=1}^{n} \beta_i}{n} \qquad \text{(A)}$$
where $\beta_i$ represents the beta value of each CpG and $n$ is the total number of CpGs in each set. Datasets derived from the Illumina MethylationEPIC array would use all sites unless specifically indicated (i.e., when directly comparing 450K and EPIC array), whereas datasets from the 450K array would only use sites present on the 450K array. Performance of mean methylation values did not seem to depend on Illumina Methylation array version, although the 450K array only included approximately half of the relevant smoking site CpGs.
Our epigenome-wide analysis revealed that cell type heterogeneity can influence methylation scores at sites associated with smoking. To account for cell type heterogeneity in buccal or saliva samples and infer methylation values of a “pure” sample consisting either of only epithelial or immune cells, we applied a correction algorithm. Briefly, for each type (never, ex-smokers, current smokers; or e-cigarette users, moist snuff tobacco users), a linear model was fit for mean methylation value against immune cell proportion. For each score $\bar{\beta}$ and type $t$, the residual between true and predicted value was then added to the intercept at immune cell proportion = 0 [“pure” epithelial sample; for epithelial hypomethylated (hypoM), distal epithelial hypermethylated (hyperM), and proximal epithelial hyperM] or immune cell proportion = 1 (“pure” immune sample; for immune hypoM).
$$\bar{\beta}_{t(\mathrm{corr})} = \mathrm{intercept}_t + \varepsilon \qquad \text{(B)}$$
where $t$ is type (e.g., never smoker, ex-smoker, current smoker), $\mathrm{intercept}_t$ is the intercept of the model for type $t$ at immune cell proportion 0 or 1 (depending on whether an epithelial or immune effect is to be estimated), and $\varepsilon$ is defined as the residual $y - \hat{y}$ [$y = \bar{\beta}$, i.e., the mean beta value in the set as computed in Eq. A, and $\hat{y}$ is $\hat{\bar{\beta}}_t$, that is, the mean estimated value based on the linear regression model in type $t$].
Statistical analysis
All analyses were conducted in R version 4.3.1. Comparisons of mean beta values between smokers, never smokers, ex-smokers, e-cigarette users, or moist snuff tobacco users were conducted using the Wilcoxon test (paired where indicated). Area under the ROC curve and corresponding confidence intervals (CI) were computed using the pROC package 1.18.0 (60), utilizing DeLong’s method for CI computation. ORs for immune hypoM in the ESTHER study were computed after standardising immune hypoM values, using logistic regression.
Code availability
Code used in this analysis is deposited under https://github.com/chiaraherzog/WID_SMK_code/.
Data availability
Data accession numbers for smoking datasets are shown in Supplementary Table S1. Data of the discovery set are deposited in the European Genome-Phenome Archive under study ID EGAS00001005055. Data in the validation set are not deposited because of restrictions on the informed consent of the NSHD cohort but can be requested via https://nshd.mrc.ac.uk/. All proposals to use NSHD data must support and adhere to the core principles of data sharing with the MRC (ethical, equitable, efficient). Data of the e-cigarette set were obtained from the original authors of the SEE-Cigs study. Data on smokeless tobacco use were obtained from NCBI GEO, under accession number GSE94876.
Data on lung cancer were obtained from TCGA. Data on CIS progression were obtained from NCBI GEO under accession number GSE108123. Data on cervical cancer were obtained from NCBI GEO under accession number GSE211668.
All data that support the findings of the ESTHER study are available upon request from the co-author Hermann Brenner. The data are not publicly available due to them containing information that could compromise research participant privacy/consent. All other raw data are available upon request from the corresponding author.
Results
Smoking elicits cell type–specific functional epigenetic alterations across epithelial and immune cells depending on anatomic site
Our analysis workflow is shown in Supplementary Fig. S1. Initially, to identify DNAme changes across diverse tissues that are either directly exposed or not directly exposed to tobacco (Fig. 1A), we conducted an EWAS of DNAme levels and smoking status in a discovery set of 542 buccal, 464 blood samples, and 1,335 cervical samples from current or never smokers, including samples from women as these enabled access to both directly exposed and indirectly exposed epithelium (cervix). Characteristics of the discovery set participants are shown in Supplementary Table S1.
The EWAS was conducted separately per sample type, accounting for age and cell type proportion. As expected on the basis of previous reports, we identified multiple CpG loci significantly associated with smoking in buccal and blood samples, and additionally for the first time describe loci associated with smoking in cervical samples (Supplementary Figs. S2–S4: a, Manhattan plots; b, quantile-quantile plots; c, delta-beta histogram in buccal, blood, and cervical samples, respectively). We report a total of 535 sites significantly associated with smoking in at least one of the tissues, 279 (52%) of which are also present on the IlluminaHumanMethylation450K (Supplementary Table S2).
To investigate cell lineage–specific effects, we were additionally interested in whether the signal within each tissue was derived from epithelial or immune cells (buccal/cervical) or myeloid or lymphoid cells (blood). To investigate this, we fitted linear models for smokers and never smokers versus immune cell proportion within each sample type and inferred the difference in methylation levels, termed delta beta (Δβ), in pure epithelial or immune cells for buccal or cervical samples [see Materials and Methods and Supplementary Fig. S1, as described previously (43, 55)]. For blood, the same approach was applied but the term immune cell proportion was replaced with lymphoid proportion, based on (1 − inferred sum of monocyte, neutrophil, and eosinophil proportion; Materials and Methods). Among the 535 sites significantly associated with smoking in the EWAS after Bonferroni correction, we identified several loci that exhibited lineage-specific methylation changes. In Fig. 1B, we specifically visualize three example CpGs, located within the AHRR gene or intergenic region, that appear to exhibit distinct methylation changes depending on tissue and cell type: for instance, cg04066994 exhibits more pronounced hypomethylation with decreasing immune cell proportion in smokers compared with
[Figure 1, panels B and C (graphic not reproduced). B: methylation β versus immune* proportion in buccal, cervical, and blood samples for cg04066994, cg21566642, and cg24688690, comparing never smokers and smokers. C: inferred epithelial and immune* Δβ for the epithelial hypoM, immune hypoM, distal epithelial hyperM, and proximal epithelial hyperM groups across epithelial (buccal, cervical), immune (buccal, cervical), myeloid (blood), and lymphoid (blood) fractions.]
Figure 1.
General overview of the study and identification of cell type–specific smoking-dependent epigenetic changes. A, Overview of the study. We aimed to identify cell-
and tissue-specific epigenetic alterations and used a discovery set of buccal, cervical, and immune cells (all female). Findings were then validated in several
independent sets to confirm the association with current and former smoking and explore association of cell-specific effects across smoking alternatives (e-cigarette
use, moist tobacco use), lung cancer tissue and progression, and possibility to predict lung cancers in smokers using noninvasive samples. A detailed workflow of the
analysis is shown in Supplementary Fig. S1. B, Scatterplots of methylation beta values in three CpGs located in the AHRR gene or intergenic region versus immune cell
proportion (buccal and cervical samples) or lymphoid proportion (blood) indicate methylation differences may be derived from distinct cell types. C, Visualization of
delta-beta values across four groups of CpGs identified in Supplementary Fig. S5A. A matrix of inferred delta-beta values across all tissues for all significant CpGs (i.e.,
significant in at least one tissue in the EWAS) was clustered using UMAP and the following clusters identified: epithelial hypomethylation (epithelial hypoM), immune
hypomethylation (immune hypoM), distal epithelial hypermethylation (distal epithelial hyperM; effects in distal epithelium but not directly exposed epithelium), and
proximal epithelial hypermethylation (proximal epithelial hyperM; effects in buccal/directly exposed samples only). (A, Created with BioRender.com.)
nonsmokers (i.e., “epithelial differential methylation”), while the hypomethylation is not evident in blood samples or cervical samples
with higher immune cell proportions. cg21566642 shows the opposite behavior, indicating differential methylation is driven by immune cells.
Moreover, differential methylation of cg21566642 in smokers seemed more pronounced in samples with a lower lymphoid proportion,
suggesting a stronger differential methylation in cells of the myeloid lineage. cg24688690 shows differential methylation in buccal but not
cervical epithelial cells of smokers compared with never smokers, suggesting methylation changes may be observed only in epithelial
cells that are directly exposed to smoke. These examples highlight that even within the same gene, differential methylation signals may be
derived from different cell types.
To more systematically classify the 535 significant loci listed in Supplementary Table S2, we conducted data-driven clustering on a
reduced feature space, whereby we clustered CpGs based on a matrix of their Δβ values in each sample and inferred cell type
(Supplementary Fig. S1) via UMAP. Our approach proposed the existence of four distinct groups of CpGs (Supplementary Fig. S5A),
which was also confirmed by an independent distance-based hierarchical clustering approach (Supplementary Fig. S5B). Visualization of
Δβ values by cluster indicated that groups were, as expected, largely driven by cell type specificity (Fig. 1C). For simplicity, groups
were subsequently named after their predominant pattern: (general) epithelial hypoM CpGs, hypomethylated in both proximal and distal
epithelial (buccal and cervical) but not immune cells; immune hypoM CpGs, showing a loss of methylation across all immune samples but
not epithelial cells; distal epithelial hyperM CpGs, hypermethylated in distant epithelial cells with few other changes; and proximal
epithelial hyperM CpGs, which showed hypermethylation in buccal epithelial cells (Fig. 1C).
Figure 2A illustrates the mean β value across all CpGs in each of the four groups against immune cell proportion (buccal and cervical
samples) or lymphoid proportion (blood) in the discovery set, and confirms cell type–specific effects: for example, epithelial hypoM
exhibited a loss of methylation with decreasing immune cell content in both buccal and cervical samples, but no difference in blood
samples, indicating a general epithelial effect, whereas proximal epithelial hyperM specifically emerged with decreasing immune cell
content in buccal samples, but not in cervical (or blood) samples.
Aiming to investigate whether the four groups of CpGs were associated with specific genes or functions, we found CpGs in the four
groups shared little overlap in the genes that they were spanning (Fig. 2B), with only one gene being shared between all three groups
(AHRR). Gene set enrichment identified specific pathways for each group (Fig. 2C–F; Supplementary Tables S3–S6): genes unique to
epithelial hypoM CpGs were enriched for involvement in cellular response to oxidative stress and detoxification, immune hypoM CpGs
were uniquely associated with genes involved in morphogenesis and development; distal epithelial hyperM CpGs were uniquely associated
with genes involved in glucuronate and uronic acid metabolism, and Reactome pathway analysis revealed that proximal epithelial hyperM
CpGs were associated with genes involved in NOTCH1/RUNX3/growth factor receptor signaling and transduction, and included genes
HDAC7 and MTOR. Proximal epithelial hyperM and immune hypoM sites exhibited an enrichment for genes covered by PCGTs, which are
known regulators of cell fate (Supplementary Fig. S6A).
Leveraging matched expression and methylation data for CpGs present on the 450K array in lung tissue derived from TCGA, we assessed
whether individual CpGs were associated with expression of cis genes. This indicated that several methylation loci were significantly
associated with expression, and 55/98 loci with matching expression data exhibited P < 0.05 after Bonferroni correction (Supplementary
Fig. S6B; Supplementary Table S7). Depending on their regulatory position, CpG loci that were significantly correlated with expression
after Bonferroni correction were associated with negative (transcription start site) or positive regulation of expression (body;
Supplementary Fig. S6C).
We investigated how many of the sites overlapped with those identified by previous studies. Except for immune hypoM CpGs, the
majority of CpGs (320/535, 60%) were not previously reported, likely due to the fact that the majority of prior studies utilized blood
samples containing immune cells only (Supplementary Fig. S6D; refs. 16, 28, 38).
As female-only samples were used for discovery, seven of 535 CpGs were on the X chromosome and were excluded from further analyses,
some of which also contained samples from male donors, resulting in 528 CpGs evaluated in the remainder of the study.

Smoking-related cell-type specific effects are attenuated in former smokers
We initially validated our findings in a dataset of 152 matched buccal and blood samples (450K array), as well as a separate set of 442
cervical samples (EPIC array), derived from never smokers, ex-smokers, or current smokers (Fig. 3A–C) by visualizing mean methylation
in each group versus inferred immune or lymphoid cell proportion, which revealed groups of CpGs behaved similarly as in the discovery
set (Fig. 2A).
To enable the comparison of each group of CpGs to distinguish between never smokers, ex-smokers, or current smokers using the area
under the ROC curve despite differences in cellular composition (cell type distributions across all datasets in this study are shown in
Supplementary Fig. S7A and S7B), we applied a correction algorithm, illustrated in Supplementary Figs. S1, S8a–S8d, and in
Supplementary Data S1. Similar to the initial discovery approach to infer delta-betas in pure epithelial or immune cell proportions, this
correction allowed us to estimate the methylation level in a pure epithelial or immune cell fraction derived from a given sample.
Corrected mean beta values in each group of CpGs showed AUC values in line with what would be expected (Fig. 3D–F): for example, in
blood the immune hypoM score performed best whereas the mean methylation of epithelial-derived CpGs did not result in a high AUC
(Fig. 3E), indicating that epithelial-specific differential methylation does not distinguish smokers from never smokers in immune
samples. The epithelial hypoM signature distinguished smokers from never smokers in both cervical and buccal samples whereas the
proximal and distal epithelial hyperM signatures exhibited a high ability to distinguish smokers from never smokers only in relevant
proximal (buccal) or distal (cervical) samples containing epithelial cells, respectively (Fig. 3D–F).
As reported previously, the methylation changes in former smokers were less pronounced than in current smokers, in relation to never
smokers. In buccal samples, the mean corrected beta value of epithelial hypoM CpGs in ex-smokers was not significantly different from
that in never smokers, whereas the same signature remained differentially methylated in cervical samples of ex-smokers (Fig. 3D–F).
Proximal epithelial hyperM also remained significantly elevated in buccal samples from former smokers compared with never smokers
(Fig. 3D). Across all samples, the immune hypoM signature was significantly differentially methylated in former smokers compared
with controls (Fig. 3D–F).
To study dose dependence of smoking signatures, we investigated their association with smoking pack-years in buccal and blood samples,
for which we had this information available. Smoking pack-years were significantly correlated with the mean methylation levels of
relevant groups of CpGs in each tissue (immune hypoM in blood, all except distal epithelial hyperM in buccal samples; Supplementary
Fig. S9A and S9B).
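The inference of delta-beta values in pure cell fractions described above can be illustrated with a minimal sketch. It assumes one plausible reading of the approach: a separate straight-line fit of methylation β against immune cell proportion in smokers and in never smokers, extrapolated to a proportion of 0 (pure epithelial) and 1 (pure immune). The published method is specified in Materials and Methods, Supplementary Fig. S1, and refs. 43 and 55, and was implemented in R; the function names `fit_line` and `infer_delta_beta` and the toy values below are hypothetical and are not taken from the study.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def infer_delta_beta(ic_smoker, beta_smoker, ic_never, beta_never):
    """Extrapolate both group-wise fits to pure epithelial (immune
    proportion 0) and pure immune (immune proportion 1) cells and
    return the smoker minus never-smoker difference at each end."""
    a_s, b_s = fit_line(ic_smoker, beta_smoker)
    a_n, b_n = fit_line(ic_never, beta_never)
    return {
        "epithelial": a_s - a_n,                # difference at proportion 0
        "immune": (a_s + b_s) - (a_n + b_n),    # difference at proportion 1
    }


# Toy, noise-free data (assumed for illustration only): never smokers sit at
# beta = 0.8 regardless of composition; smokers lose methylation only in the
# epithelial fraction (beta = 0.5 + 0.3 * immune proportion).
ic = [0.1, 0.3, 0.5, 0.7]
never = [0.8, 0.8, 0.8, 0.8]
smoker = [0.53, 0.59, 0.65, 0.71]

result = infer_delta_beta(ic, smoker, ic, never)
print({k: round(v, 3) for k, v in result.items()})
```

In this toy case the epithelial delta-beta is negative while the immune delta-beta is close to zero, which is the pattern the text labels "epithelial hypoM". For blood, the same sketch would apply with lymphoid proportion in place of immune cell proportion.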
[Figure 2 (graphic not reproduced). A: mean β versus immune cell proportion (buccal and cervical samples) or lymphoid proportion (blood samples) for epithelial hypoM, immune hypoM, distal epithelial hyperM, and proximal epithelial hyperM. B: Venn diagram of genes associated with each set. C: cellular response to oxygen radical, removal of superoxide radicals, cellular response to superoxide, response to superoxide, cellular oxidant detoxification. D: head morphogenesis, face morphogenesis, roof of mouth development, SMAD protein signal transduction. E: glucuronate metabolic process, uronic acid metabolic process, cellular glucuronidation. F: RUNX3 regulates immune response and cell migration, signaling by NOTCH1 PEST domain mutants in cancer, signaling by NOTCH1, NOTCH1 intracellular domain regulates transcription, diseases of signal transduction by growth factor receptors and second messengers.]
Figure 2.
Combined methylation scores of CpGs in the four sets and annotation. A, Association of mean methylation β values in each of the sets described in Fig. 1C
with immune cell proportion in buccal and cervical samples and lymphoid proportion in blood samples in the discovery set. B, Venn Diagram of genes
associated with CpGs in each of the four smoking-associated sets of CpG indicates little overlap between involved genes. C–F, Gene ontology (C–E) and
Reactome pathway enrichment (F) for the four sets of smoking-associated CpGs reveals different pathways.
[Figure 3 (graphic not reproduced). A–C: raw mean β of epithelial hypoM, immune hypoM, distal epithelial hyperM, and proximal epithelial hyperM versus immune cell proportion (buccal and cervical samples) or lymphoid proportion (blood samples). D–F: AUC of each set for ex-smokers and smokers in buccal, blood, and cervical samples.]
Figure 3.
Evaluation of scores in independent validation sets. An independent dataset comprising 304 matched blood and buccal samples (n = 152 each) and 442 cervical samples
was used to validate the findings. A–C, Mean beta values (uncorrected) in each of the four sets of CpGs in buccal (A), blood (B), and cervical (C) samples of never
smokers, ex-smokers, and current smokers versus immune cell proportion (A and C) or lymphoid proportion (B). D–F, AUC of corrected values in each of the four sets
of CpG comparing never smokers with current or former smokers in buccal (D), blood (E), and cervical (F) samples. Mean methylation scores in this figure only include
sites present on the 450K array for comparability between datasets.
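The AUC values reported for each CpG set can be read as the probability that a randomly chosen smoker has a higher score than a randomly chosen never smoker. The sketch below computes this empirical AUC from pairwise comparisons (the Mann–Whitney formulation), with ties counted as one half. It is an illustration only: the study used the pROC package in R with DeLong confidence intervals, which this sketch does not reproduce, and the function name `empirical_auc` and the example scores are hypothetical.

```python
def empirical_auc(case_scores, control_scores):
    """Empirical AUC: fraction of (case, control) pairs in which the case
    score exceeds the control score, counting ties as one half."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical corrected mean beta values of a hypermethylated CpG set.
smokers = [0.62, 0.70, 0.55]
never_smokers = [0.50, 0.58, 0.45]

auc = empirical_auc(smokers, never_smokers)
print(round(auc, 3))
```

For a hypomethylated set, where lower values indicate exposure, the same function can be applied to negated scores so that the reported AUC stays above 0.5 when the signature discriminates.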
E-cigarette and smokeless tobacco use alter the epigenome of oral epithelial cells similar to cigarette smoking
We next evaluated corrected methylation scores in saliva samples from never smokers, e-cigarette users who smoked less than 100
cigarettes in their life, and current cigarette smokers (Fig. 4A–D; raw and corrected values in Supplementary Fig. S10A and S10B).
Whereas e-cigarette users did not have significantly different levels of immune hypoM from controls, they exhibited altered levels
of proximal epithelial hyperM [AUC: 0.91 (95% CI: 0.87–0.95)], distal epithelial hyperM CpGs [AUC: 0.74 (0.67–0.80)], and epithelial
hypoM [AUC: 0.59 (0.52–0.66)] compared with controls (Fig. 4A and B). E-cigarette users had a limited smoking history (<100 cigarettes
in their life), and methylation levels were not correlated with reported smoking history, except for immune hypoM (Supplementary
Fig. S10C), or mL of e-cigarette liquid used per day as a quantitative proxy for e-cigarette use frequency (Supplementary Fig. S10D).
Categorical information was available on duration of smoking or e-cigarette use, respectively (≤1 year, >1 year, >5 years). In smokers,
the smoking-related changes became more pronounced with increasing duration for epithelial hypoM and immune hypoM but were less time
dependent for proximal or distal epithelial hyperM (Fig. 4C). Likewise, proximal and distal epithelial hyperM changes in e-cigarette
users appeared sooner (<1 year) than epithelial hypoM changes (≥1 year), the latter of which were significantly different from controls
only after 1 year or more of reported vaping (Fig. 4D).
To better understand the similarities and differences of smoking and e-cigarette use, we next assessed the inferred epithelial and immune
delta beta value at the individual 528 CpG sites. This revealed a partial but not complete overlap between smokers in the discovery set
and e-cigarette users (Supplementary Fig. S11A). Sites overlapping for proximal epithelial hyperM in the inferred epithelial fraction were
still enriched for sites associated with growth factors and damage response and notably included genes such as HDAC7 and MTOR
(Supplementary Fig. S11B), while epithelial hypoM sites remained enriched for cellular response to chemical stress, including genes such
as NFE2L2 and GPX2/3 (Supplementary Fig. S11C; Supplementary Tables S8 and S9).
To compare cigarette and e-cigarette use to smokeless tobacco, we next evaluated methylation changes in moist snuff users. Saliva
samples from smokeless tobacco users exhibited significant differences in epithelial hypoM and proximal epithelial hyperM, but not
immune hypoM, compared with nonsmokers (Fig. 4E) and these signatures were highly discriminative between nonsmokers and smokeless
users (Fig. 4F; AUCs of 1 and 0.92, respectively; raw values for mean beta in each group of CpGs are shown in Supplementary Fig. S12A
and S12B).

Smoking-associated methylation alterations are associated with cancer and CIS progression
Smoking-associated changes in buccal cells were previously found to be associated with cancer-related changes (38). We were therefore
interested in whether one or more of the four sets of functionally distinct CpGs showed a particular association with current or future
cancers associated with smoking.
Mean methylation levels of each of the four sets of CpGs in lung cancer samples from LUAD or LUSC in TCGA revealed similar changes
compared with smoking for epithelial hypoM, distal epithelial hyperM, and proximal epithelial hyperM when compared with matched normal
tissue to control for smoking exposure (Fig. 5A–C). For instance, proximal epithelial hyperM CpGs exhibited consistent hypermethylation
in the tumor compared with matched normal tissue, whereas epithelial hypoM showed consistent hypomethylation. Sets that showed opposing
directions between cancer tissue compared with smoking were excluded from AUC graphs in Fig. 5 (e.g., immune hypoM CpGs; Fig. 5B
and C).
Cigarette smoking is also associated with cancers at non-directly exposed sites, including cervical cancer, and we hypothesized that
smoking-related CpGs might be associated with these cancers as well. Distal epithelial hyperM CpGs, identified in cervical samples,
were significantly hypermethylated in cervical cancer tissue. Interestingly, proximal epithelial hyperM was also significantly
hypermethylated in cervical cancer tissue, possibly due to its role in cancer-related genes (Fig. 5D and E).
As established cancers often exhibit a highly disrupted epigenome and might therefore not be as informative regarding early alterations
driving cancer progression, we also investigated mean methylation levels of each of the four sets of CpGs in CIS lesions, which can
either progress to cancer or regress. In particular, the proximal epithelial hyperM was highly elevated in CIS lesions that later
progressed to cancer, while it was not significantly elevated in regressing lesions (Fig. 5F). Proximal epithelial hyperM distinguished
between progressing and regressing lesions with an AUC of 0.85 (0.73–0.97; Fig. 5G). Dependence of mean methylation values on immune
cell proportion for lung tissue, cervical tissue, and CIS samples is shown in Supplementary Fig. S13A–S13C.

Prediction of lung cancer using blood and buccal samples
Previous studies have indicated lung cancer may be predicted via methylation levels in blood samples, which could help with risk
stratification for screening methodologies such as low-dose CT. We were interested in comparing the immune-related set of CpGs
discovered in the current study to previous predictors. Moreover, as some of the sets were associated with cancer or CIS progression
in lung tissue, we wondered whether they might be able to distinguish between future cancer cases and controls in buccal samples
from current smokers.
Assessment of the immune hypoM signature in 1,352 blood samples derived from the ESTHER study with complete smoking pack-year
information (Supplementary Table S10), including samples from controls and cases who developed lung cancer up to 16.8 years after
sample donation, indicated that one SD increase in immune hypoM was associated with significantly reduced OR of developing lung cancer
[OR = 0.96 (95% CI: 0.94–0.97), P = 1.64e-07] (Supplementary Table S11). However, the effect was modest and comparing AUC
indicated that no significant gain in information could be achieved in comparison with previously identified single-site predictors
AHRR or F2RL3 (Fig. 6A).
Follow-up information on lung or airway cancer incidence in the 22 years following sample collection was available for the validation
set of matched blood and buccal samples. While sample numbers were small (n = 31, 6 cancer cases), AHRR alone had the highest
AUC in blood (Fig. 6B). Conversely, in buccal samples, proximal epithelial hyperM exhibited the highest AUC (0.71; Fig. 6C).

Discussion
Several previous studies have investigated smoking-induced DNAme alterations, primarily conducted in blood (16, 28). Recent
studies have highlighted the importance of accounting for cell type heterogeneity when investigating DNAme, including cell lineage, when
[Figure 4 (graphic not reproduced). A, E: corrected mean β of epithelial hypoM, immune hypoM, distal epithelial hyperM, and proximal epithelial hyperM. B, F: AUC of each set. C: corrected mean β by cigarette smoking duration (control, ≤1 y, >1 y, >5 y, unknown). D: corrected mean β by e-cigarette use duration.]
Figure 4.
Impact of e-cigarette and smokeless use on cell type–specific epigenetic smoking signatures. A, Mean beta values (corrected) in each of the four sets in saliva samples
of never or current smokers or e-cigarette users, corrected for cell type–specific effects. B, AUC of corrected values in each of the four sets comparing smokers or e-
cigarette users with controls in the e-cigarette use dataset. C, Mean beta values in each of the four sets in never smokers (control) or smokers, stratified by categorical
smoking duration information. D, Mean beta values in each of the four sets in never smokers (control) or e-cigarette users, stratified by categorical e-cigarette duration
information. The legend is identical to C. E, Mean beta values (corrected) in each of the four sets in saliva samples of current nonsmokers (prior smoking history not
known), smokeless tobacco users, or smokers in the smokeless tobacco use set. F, AUC of corrected values in each of the four sets of CpGs comparing nonsmokers
with smokeless tobacco users or smokers in the smokeless tobacco use set. *, P < 0.05; **, P < 0.01; ***, P < 0.001; ****, P < 0.0001 in Wilcoxon test compared with
relevant controls (never or nonsmokers, respectively, for A, C, D, and E).
evaluating impacts of smoking in blood (36). Our data provide a first insight into cell type–specific and tissue-specific epigenetic alterations
in response to smoking as an external exposure across various cell types and tissues, looking primarily at epithelial versus immune cells, by
applying deconvolution and linear models. Importantly, this approach enabled investigation of cell type–specific alterations that are shared by
cigarette smoking and e-cigarette use, may be associated with carcinogenesis, and could form the basis for novel cancer detection or risk
[Figure 5 (graphic not reproduced). A, D, F: mean β of the four CpG sets in lung cancer, cervical tissue, and control, regressing, or progressing CIS lesions. B, C, E, G: ROC curves (sensitivity versus 1 − specificity) with AUC (95% CI) from the panel legends.
B: epithelial hypoM, 0.82 (0.68–0.95); distal epithelial hyperM, 0.57 (0.38–0.77); proximal epithelial hyperM, 0.71 (0.53–0.88).
C: epithelial hypoM, 0.93 (0.87–0.99); distal epithelial hyperM, 0.85 (0.76–0.95); proximal epithelial hyperM, 0.81 (0.7–0.93).
G, regressing versus progressing CIS lesion: epithelial hypoM, 0.57 (0.4–0.74); distal epithelial hyperM, 0.81 (0.68–0.93); proximal epithelial hyperM, 0.85 (0.73–0.97).]
Figure 5.
Mean methylation beta of smoking-associated CpG sets in cancer tissue and progressing versus regressing CIS lesions. A, Mean methylation beta values in each set in
TCGA LUAD and LUSC projects. Only samples with matched normal control tissue were included to control for smoking exposure. P values are derived from a paired
Wilcoxon test. B and C, AUC plots for mean methylation levels in epithelial hypoM, distal epithelial hyperM, and proximal epithelial hyperM, comparing matched
control tissue versus lung cancer tissue in TCGA-LUAD (B) and TCGA-LUSC (C). D, Mean methylation beta values in each set in cervical cancer or matched normal
tissue (GSE211668). Only samples with matched normal control tissue were included to control for smoking exposure. P values are derived from a paired Wilcoxon
test. E, AUC plots for mean methylation levels in epithelial hypoM, distal epithelial hyperM, and proximal epithelial hyperM, comparing matched control tissue versus
cervical cancer tissue (GSE211668). F, Mean methylation beta values in the smoking-associated CpG sets in control lung tissue, regressing CIS lesions, or progressing
CIS lesions. P values are derived from paired Wilcoxon tests. G, AUC plots for mean methylation levels in epithelial hypoM, distal epithelial hyperM, and proximal
epithelial hyperM, comparing matched regressing CIS versus progressing CIS lesions.
[Figure 6 (graphic not reproduced). ROC curves with AUC (95% CI) from the panel legends.
A, ESTHER study, lung cancer incidence: AHRR, 0.67 (0.58–0.76); F2RL3, 0.66 (0.58–0.74); immune hypoM, 0.64 (0.54–0.73).
B, NSHD matched samples (blood), lung or airway cancer incidence: AHRR, 0.84 (0.67–1); F2RL3, 0.71 (0.46–0.95); immune hypoM, 0.73 (0.48–0.97).
C, NSHD matched samples (buccal), lung or airway cancer incidence: AHRR, 0.62 (0.4–0.84); F2RL3, 0.63 (0.37–0.9); proximal epithelial hyperM, 0.71 (0.48–0.93).]
Figure 6.
Prediction of lung cancer using immune hypoM in blood and proximal epithelial hyperM in buccal samples compared with previously described predictors.
A, Comparison of the AUCs of AHRR (cg05575921), F2RL3 (cg03636183), and mean methylation at immune hypoM to identify any lung cancer cases within
17 years in 259 current smokers in the ESTHER study. B, Comparison of the AUCs of AHRR (cg05575921), F2RL3 (cg03636183), and mean methylation at
immune hypoM (corrected for immune cell proportion) to identify any lung or airway cancer cases within 22 years in 31 blood samples (n = 6 cancer cases) of
the validation set (same individuals as in C). C, Comparison of the AUCs of AHRR (cg05575921), F2RL3 (cg03636183), and mean methylation at proximal
epithelial hyperM (corrected for immune cell proportion) to identify any lung or airway cancer cases within 22 years in 31 buccal samples (n = 6 cancer cases)
of the validation set (same individuals as in B).
stratification approaches using self-collected buccal or saliva samples pending additional optimization and validation.
Goldfarbmuren and colleagues recently showed that smoking can induce both pan- and cell-specific changes using single-cell RNA sequencing of the airway epithelium of smokers and nonsmokers (61). Their findings indicated that smoking also induces changes in “protected” stem and submucosal gland cells. In the absence of large-scale single-cell methylation datasets from various tissues with regard to smoking, we employed a cell type deconvolution-based approach. Although obtained via a different modality and investigating different cell types, our data are in line with these findings: on the one hand, we identify general epithelial effects elicited by cigarette smoking (epithelial hypoM) that occur in both directly exposed and not directly exposed cell types; on the other hand, we identify DNAme alterations specific to certain cell types and contexts, for example, changes occurring in directly exposed epithelial cells (proximal epithelial hyperM) or not directly exposed epithelial cells (distal epithelial hyperM; Fig. 2A). In line with another recent study (36), our data indicate that the effects of smoking are, for some sites, more pronounced in the myeloid than in the lymphoid lineage (Fig. 1B). Importantly, the total of 535 sites, grouped into four sets of CpGs, shared little overlap in the genes they spanned (Fig. 1E) and were associated with distinct functions. For instance, epithelial hypoM sites were associated with detoxification responses (Fig. 2C), whereas proximal epithelial hyperM sites were associated with growth signaling and DNA damage responses (Fig. 2F). In addition, our findings indicate that methylation levels at CpGs identified in this study were significantly associated with gene expression at cis genes (Supplementary Fig. S6B and S6C). A limitation of the current study is that we employed pathway analysis based on gene names and limited our investigation to cis genes. Future studies will be required to investigate the link between methylation changes and gene transcription and function in more detail, including via multiomics profiling (e.g., methylation and gene transcription) of bulk sorted or single cells in various tissues.
Previous studies have investigated epigenetic changes and their reversal in current and former smokers (16, 17, 62). In line with these studies, our data indicate a partial reversibility of smoking-induced epigenetic alterations in former smokers (Fig. 3). For instance, epithelial hypoM, a signature associated with detoxification, was unable to distinguish ex-smokers from never smokers in our buccal sample validation set while it was highly elevated in current smokers (Fig. 3D). We note that to date neither the precise mechanisms of DNAme induction (or loss) upon smoking nor the kinetics and causes of reversal are known. If smoking induces hypermethylation at a site and the changes persist after giving up smoking, it could imply that either (i) the individual cell survived or (ii) the site was methylated in a stem cell and is propagated. Conversely, if the hypermethylation disappears after smoking cessation, it could imply that either (i) the cell has died and been replaced by another cell or (ii) the smoking-associated methyl group has been actively displaced in a living cell. Methylation patterns may also be influenced by tissue-dependent cell turnover rates (that, in turn, may be affected by smoking), and tobacco “dose”; for instance, relatively longer-lived cells (e.g., lymphocytes) may have more chance to accumulate methylation changes than shorter-lived cells (e.g., neutrophils). Changes in DNAme upon smoking and its cessation may therefore be the result of a combination of cell-specific enzymatic activity related to methylation/demethylation, cell turnover, stem cell involvement, and dose differences. While studies investigating DNA mutation suggest that quitting smoking drives gradual replenishment of bronchial epithelium from cells that have avoided tobacco mutagenesis (63), suggesting at least in part that some stem cells may escape tobacco-related changes, other findings indicate that smoking can also induce gene expression changes in stem cells (26, 61). Longitudinal sample sets (e.g., as collected in ref. 62 and ClinicalTrials.gov NCT05678426), possibly in combination with single-cell and tracing experiments, are vital to further investigate cellular kinetics and the relationship with smoking-related changes. This will help to further interpret the current findings in the context of cellular kinetics and could help to improve our understanding of the reversal of smoking-associated disease risk in the future as well as model when and by what mechanism epigenetic alterations return to baseline after smoking cessation.
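The cell type deconvolution-based approach referred to above, in which bulk methylation is combined with estimated cell type fractions in a linear model, can be illustrated with a minimal Python sketch. The exact model specification is not given in this section, so the interaction model below (`beta ~ 1 + ic + smoker + smoker:ic`, with `ic` the estimated immune cell fraction) is an assumption, as are the hypothetical function names and the invented, noise-free toy data. Under this assumed model, the smoking effect extrapolated to a purely epithelial sample (`ic = 0`) is the `smoker` coefficient, and the effect extrapolated to a purely immune sample (`ic = 1`) is the sum of the `smoker` and interaction coefficients.

```python
def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(design, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def cell_type_effects(beta, immune_fraction, smoker):
    """Fit beta ~ 1 + ic + smoker + smoker:ic for a single CpG.

    Returns the smoking effect extrapolated to ic = 0 (epithelial)
    and ic = 1 (immune). ASSUMED model, for illustration only.
    """
    design = [[1.0, ic, float(s), float(s) * ic]
              for ic, s in zip(immune_fraction, smoker)]
    _, _, b_smoke, b_inter = fit_ols(design, beta)
    return {"epithelial": b_smoke, "immune": b_smoke + b_inter}


# Toy data (invented): a CpG that loses methylation in epithelial cells
# of smokers (-0.10) but is unchanged in immune cells.
immune_fraction = [0.1, 0.3, 0.5, 0.7, 0.9, 0.2, 0.4, 0.6, 0.8]
smoker = [0, 0, 0, 0, 0, 1, 1, 1, 1]
beta = [0.5 + 0.2 * ic - 0.10 * s + 0.10 * s * ic
        for ic, s in zip(immune_fraction, smoker)]

print(cell_type_effects(beta, immune_fraction, smoker))
```

Real data would add noise and further covariates (e.g., age), and estimates for a cell type become unstable when few samples have a high fraction of that cell type, which is one reason such models rest on strong assumptions.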
The impact of e-cigarettes on health and disease risk has not been completely clarified, and conflicting evidence and opinions exist. A 2015 report by Public Health England estimated that electronic cigarettes are at least “95% less harmful” than smoking (6), whereas a 2018 advisory by the U.S. Surgeon General stated that the recent surge in e-cigarette use among youth is a “cause for great concern,” in part due to the impact of lifelong nicotine addiction (https://www.cdc.gov/tobacco/e-cigarettes/index.html). Additional studies have since acknowledged potential risks of e-cigarette use such as long-term addiction and a possible link to cancer (64), for example, due to evidence provided by a study by Lee and colleagues, which indicated that e-cigarette smoke damages DNA and reduces repair activity in the mouse heart, lung, and bladder, as well as human lung and bladder cells (8). Moreover, e-cigarette smoke exposure can induce features of chronic obstructive pulmonary disease, a disease associated with smoking, in a nicotine-dependent manner (65), and more recent studies have suggested that e-cigarette smoke can dysregulate immune function and reduce pathogen resistance (66), such as oral cell clearance of the potentially pathogenic microbe Staphylococcus aureus (67). Our data derived from saliva samples of e-cigarette users suggest that epigenetic alterations of directly exposed epithelial cells are, in part, similar to those of cigarette smokers (Fig. 4A and B) and that shared sites are enriched for genes involved in DNA damage repair, growth signaling, oxidation, and response to cellular stress, including genes such as HDAC7, MTOR, NFE2L2, and GPX2/3 (Supplementary Fig. S11). Mean methylation at sites involved in detoxification exhibited a duration-dependent effect (Fig. 4D), and was only significantly different from controls following ≥ 1 year of e-cigarette use. Our findings stand in contrast with those of a previous study that observed distinct DNAme patterns of cigarette and e-cigarette users (23). This discrepancy is most likely explained by the different approach applied in this study, especially the identification of cell type–specific DNAme changes.
Smokeless tobacco is another alternative to smoking previously linked to the development of head and neck cancers and other adverse health outcomes. Our data indicate that smokeless tobacco use induces similar effects on CpGs in the epithelial hypoM and proximal epithelial hyperM sets as cigarette smoking, but we did not observe any significant effects on immune cells (Fig. 4E and F).
Comparing the three modes of smoking and/or tobacco use (cigarettes, e-cigarettes, or smokeless tobacco), our data suggest that tobacco-containing products (cigarette smoking or smokeless tobacco), or e-cigarette use for more than 1 year, may elicit loss of methylation in epithelial hypoM regions that are associated with detoxification (of tobacco; Fig. 2C). Discontinuation of smoking resulted in a complete reversal of epithelial hypoM alterations (Fig. 3A–C), although the exact timeline and mechanism underlying this reversal are unclear. Only cigarette smokers exhibited alterations in mean DNAme at immune hypoM sites, whereas all three types of smoking-related products (cigarettes, e-cigarettes, and smokeless tobacco) elicited proximal epithelial hypermethylation (Fig. 4). Importantly, proximal epithelial hypermethylation was the set of CpGs most consistently associated with lung cancer progression and was also strongly altered in cervical cancer compared with normal cervical tissue (Fig. 5), highlighting a potential link of these sites to carcinogenesis.
Efforts to reduce lung cancer mortality via early detection, such as with low-dose CT in smokers, exist but are likely to require prior risk stratification to reduce false positives (68, 69). Previous studies have demonstrated that methylation at certain sites (18) or composite methylation risk scores (70) can identify individuals at risk of lung cancer in blood samples. The immune hypoM signature did not provide a significant benefit compared with individual methylation levels at AHRR or F2RL3 in blood samples (Fig. 6A and B). Use of buccal or saliva samples could improve convenience for participants and/or reduce healthcare provider costs (e.g., by enabling self-sampling at home). Our data indicate that DNAme at proximal epithelial hyperM sites may be able to detect cancers up to 22 years in the future with an AUC of 0.71 (Fig. 6C). However, given the limited sample size and wide CIs, future prospective sample collections should address whether these sites, or a more informative subset thereof, possibly with a higher AUC, may provide a clinical benefit for stratification.
Our study has several strengths. To our knowledge, our study is one of the first to investigate smoking-associated epigenetic alterations in diverse tissues applying cell type–specific methylation inference to identify differences in epithelial and immune cells. By not limiting our investigation to sources of immune cells (blood) and accounting for cell type–specific differences within proximal and distal sites, the interpretability of our findings is improved and we, for the first time, identify cell type–specific differences in DNAme alterations between epithelial and immune cells in response to smoking. A majority of our reported CpGs (60%) have not previously been described in the literature (Supplementary Fig. S6D), and our observations are validated in several independent datasets (Fig. 3 and 4). We also developed an algorithm to correct for cellular heterogeneity in samples to infer methylation in “pure” epithelial or immune populations of the given sample (Supplementary Fig. S8; Supplementary Data S1). Moreover, we compare alternatives to cigarette smoking and identify similar patterns of DNAme-associated alterations (Fig. 4) and investigate the link of these signatures with progression to cancer (Fig. 5) and cancer prediction (Fig. 6).
Likewise, our study also has limitations. As non-directly exposed epithelial cells are more challenging to obtain in men, we have used only samples from women in our discovery set, which may induce a gender bias. However, the fact that our signatures validate across several independent datasets across both sexes, including a dataset consisting entirely of samples from men (“smokeless tobacco use set”), suggests our findings are applicable to both men and women, although future studies should investigate sex-specific effects.
In the absence of large-scale single-cell DNAme data, we utilize bulk DNAme deconvolution and linear models to identify cell-specific smoking-related alterations. Several deconvolution approaches exist (71), including reference-based methods that rely on knowledge of the main constituent cell types of the tissue with reference molecular profiles, reference-free methods, or Bayesian approaches, for instance leveraging prior knowledge of distributions of cell types in the studied tissue, such as BayesCCE (72). The best deconvolution approach depends on the study type and context (71). We justify the use of the reference-based EpiDISH method with the fact that the main cell types were known and the approach has been previously validated for the sample types assessed in this study (73). We then applied linear models to identify differences across groups and cell types. While these models relied on strong assumptions, previous studies indicated that this approach is feasible and can provide additional information (33, 43), and importantly, a separate benchmarking study indicated that linear regression is a valid statistical methodology for DNAme despite the fact that the data do not always perfectly satisfy the assumptions (54). We note that a degree of heteroskedasticity is expected in the case of cell type–specific differential methylation (associated with differential variability). Further work using different deconvolution approaches, Bayesian models that can deal with
References
1. United States Public Health Service Office of the Surgeon General, Centers for Disease Control and Prevention (U.S.), National Center for Chronic Disease Prevention and Health Promotion (U.S.) Office on Smoking and Health. How tobacco smoke causes disease: the biology and behavioral basis for smoking-attributable disease: a report of the surgeon general. Atlanta (GA): Centers for Disease Control and Prevention; 2010.
2. Lushniak BD, Samet JM, Pechacek TF, Norman LA, Taylor PA. The health consequences of smoking—50 years of progress: a report of the surgeon general. Atlanta (GA): Centers for Disease Control and Prevention; 2014.
3. GBD 2019 Tobacco Collaborators. Spatial, temporal, and demographic patterns in prevalence of smoking tobacco use and attributable disease burden in 204 countries and territories, 1990–2019: a systematic analysis from the global burden of disease study 2019. Lancet 2021;397:2337–60.
4. Rodu B, Godshall WT. Tobacco harm reduction: an alternative cessation strategy for inveterate smokers. Harm Reduct J 2006;3:37.
5. Notley C, Ward E, Dawkins L, Holland R. The unique contribution of e-cigarettes for tobacco harm reduction in supporting smoking relapse prevention. Harm Reduct J 2018;15:31.
6. McNeill A, Brose L, Calder R, Hitchman S, Hajek P, McRobbie H. E-cigarettes: an evidence update: a report commissioned by Public Health England. London, England: Public Health England; 2015.
7. Mohammadi L, Han DD, Xu F, Huang A, Derakhshandeh R, Rao P, et al. Chronic E-cigarette use impairs endothelial function on the physiological and cellular levels. Arterioscler Thromb Vasc Biol 2022;42:1333–50.
8. Lee H-W, Park S-H, Weng M, Wang H-T, Huang WC, Lepor H, et al. E-cigarette smoke damages DNA and reduces repair activity in mouse lung, heart, and bladder as well as in human lung and bladder cells. Proc Natl Acad Sci U S A 2018;115:E1560–9.
9. Rose JJ, Krishnan-Sarin S, Exil VJ, Hamburg NM, Fetterman JL, Ichinose F, et al. Cardiopulmonary impact of electronic cigarettes and vaping products: a scientific statement from the American Heart Association. Circulation 2023;148:703–28.
10. Brożek GM, Jankowski M, Zejda JE. Acute respiratory responses to the use of e-cigarette: an intervention study. Sci Rep 2019;9:6844.
11. Sakamaki-Ching S, Williams M, Hua M, Li J, Bates SM, Robinson AN, et al. Correlation between biomarkers of exposure, effect and potential harm in the urine of electronic cigarette users. BMJ Open Respir Res 2020;7:e000452.
12. Singh KP, Lawyer G, Muthumalage T, Maremanda KP, Khan NA, McDonough SR, et al. Systemic biomarkers in electronic cigarette users: implications for noninvasive assessment of vaping-associated pulmonary injuries. ERJ Open Res 2019;5:00182–2019.
13. George J, Hussain M, Vadiveloo T, Ireland S, Hopkinson P, Struthers AD, et al. Cardiovascular effects of switching from tobacco cigarettes to electronic cigarettes. J Am Coll Cardiol 2019;74:3112–20.
14. Polosa R, Morjaria JB, Prosperini U, Busa B, Pennisi A, Malerba M, et al. COPD smokers who switched to e-cigarettes: health outcomes at 5-year follow up. Ther Adv Chronic Dis 2020;11:2040622320961617.
15. Widschwendter M, Jones A, Evans I, Reisel D, Dillner J, Sundström K, et al. Epigenome-based cancer risk prediction: rationale, opportunities and challenges. Nat Rev Clin Oncol 2018;15:292–309.
16. Joehanes R, Just AC, Marioni RE, Pilling LC, Reynolds LM, Mandaviya PR, et al. Epigenetic signatures of cigarette smoking. Circ Cardiovasc Genet 2016;9:436–47.
17. McCartney DL, Stevenson AJ, Hillary RF, Walker RM, Bermingham ML, Morris SW, et al. Epigenetic signatures of starting and stopping smoking. EBioMedicine 2018;37:214–20.
18. Zhang Y, Elgizouli M, Schöttker B, Holleczek B, Nieters A, Brenner H. Smoking-associated DNA methylation markers predict lung cancer incidence. Clin Epigenetics 2016;8:127.
19. Baglietto L, Ponzi E, Haycock P, Hodge A, Assumma MB, Jung C, et al. DNA methylation changes measured in pre-diagnostic peripheral blood samples are associated with smoking and lung cancer risk. Int J Cancer 2017;140:50–61.
20. Bhardwaj M, Schöttker B, Holleczek B, Brenner H. Enhanced selection of people for lung cancer screening using AHRR (cg05575921) or F2RL3 (cg03636183) methylation as biological markers of smoking exposure. Cancer Commun 2023;43:956–9.
21. Jessen WJ, Borgerding MF, Prasad GL. Global methylation profiles in buccal cells of long-term smokers and moist snuff consumers. Biomarkers 2018;23:625–39.
22. Andersen A, Reimer R, Dawes K, Becker A, Hutchens N, Miller S, et al. DNA methylation differentiates smoking from vaping and non-combustible tobacco use. Epigenetics 2022;17:178–90.
23. Richmond RC, Sillero-Rejon C, Khouja JN, Prince C, Board A, Sharp G, et al. Investigating the DNA methylation profile of e-cigarette use. Clin Epigenetics 2021;13:183.
24. Wan ES, Qiu W, Baccarelli A, Carey VJ, Bacherman H, Rennard SI, et al. Cigarette smoking behaviors and time since quitting are associated with differential DNA methylation across the human genome. Hum Mol Genet 2012;21:3073–82.
25. Zeilinger S, Kühnel B, Klopp N, Baurecht H, Kleinschmidt A, Gieger C, et al. Tobacco smoking leads to extensive genome-wide changes in DNA methylation. PLoS One 2013;8:e63812.
26. Guida F, Sandanger TM, Castagne R, Campanella G, Polidoro S, Palli D, et al. Dynamics of smoking-induced genome-wide methylation changes with time since smoking cessation. Hum Mol Genet 2015;24:2349–59.
27. Sun YV, Smith AK, Conneely KN, Chang Q, Li W, Lazarus A, et al. Epigenomic association analysis identifies smoking-related DNA methylation sites in African Americans. Hum Genet 2013;132:1027–37.
28. Christiansen C, Castillo-Fernandez JE, Domingo-Relloso A, Zhao W, Moustafa JSE-S, Tsai P-C, et al. Novel DNA methylation signatures of tobacco smoking with trans-ethnic effects. Clin Epigenetics 2021;13:36.
29. Jamieson E, Korologou-Linden R, Wootton RE, Guyatt AL, Battram T, Burrows K, et al. Smoking, DNA methylation, and lung function: a mendelian randomization analysis to investigate causal pathways. Am J Hum Genet 2020;106:315–26.
30. Loyfer N, Magenheim J, Peretz A, Cann G, Bredno J, Klochendler A, et al. A DNA methylation atlas of normal human cell types. Nature 2023;613:355–64.
31. Day K, Waite LL, Thalacker-Mercer A, West A, Bamman MM, Brooks JD, et al. Differential DNA methylation with age displays both common and dynamic features across human tissues that are influenced by CpG landscape. Genome Biol 2013;14:R102.
32. Slieker RC, Relton CL, Gaunt TR, Slagboom PE, Heijmans BT. Age-related DNA methylation changes are tissue-specific with ELOVL2 promoter methylation as exception. Epigenetics Chromatin 2018;11:25.
33. Barrett JE, Herzog C, Kim YN, Bartlett TE, Jones A, Evans I, et al. Susceptibility to hormone-mediated cancer is reflected by different tick rates of the epithelial and general epigenetic clock. Genome Biol 2022;23:52.
34. Campbell KA, Colacino JA, Park SK, Bakulski KM. Cell types in environmental epigenetic studies: biological and epidemiological frameworks. Curr Environ Health Rep 2020;7:185–97.
35. Bauer M. Cell-type-specific disturbance of DNA methylation pattern: a chance to get more benefit from and to minimize cohorts for epigenome-wide association studies. Int J Epidemiol 2018;47:917–27.
36. You C, Wu S, Zheng SC, Zhu T, Jing H, Flagg K, et al. A cell-type deconvolution meta-analysis of whole blood EWAS reveals lineage-specific smoking-associated DNA methylation changes. Nat Commun 2020;11:4779.
37. Bauer M, Fink B, Thürmann L, Eszlinger M, Herberth G, Lehmann I, et al. Tobacco smoking differently influences cell types of the innate and adaptive immune system—indications from CpG site methylation. Clin Epigenetics 2016;8:83.
38. Teschendorff AE, Yang Z, Wong A, Pipinikas CP, Jiao Y, Jones A, et al. Correlation of smoking-associated DNA methylation changes in buccal cells with DNA methylation changes in epithelial cancer. JAMA Oncol 2015;1:476–85.
39. Wan ES, Qiu W, Carey VJ, Morrow J, Bacherman H, Foreman MG, et al. Smoking-associated site-specific differential methylation in buccal mucosa in the COPDGene study. Am J Resp Cell Mol 2015;53:246–54.
40. Barcelona V, Huang Y, Brown K, Liu J, Zhao W, Yu M, et al. Novel DNA methylation sites associated with cigarette smoking among African Americans. Epigenetics 2019;14:383–91.
41. Tsai P-C, Glastonbury CA, Eliot MN, Bollepalli S, Yet I, Castillo-Fernandez JE, et al. Smoking induces coordinated DNA methylation and gene expression changes in adipose tissue with consequences for metabolic health. Clin Epigenetics 2018;10:126.
42. Gonzalez H, Hagerling C, Werb Z. Roles of the immune system in cancer: from tumor initiation to metastatic progression. Genes Dev 2018;32:1267–84.
43. Barrett JE, Herzog C, Jones A, Leavy OC, Evans I, Knapp S, et al. The WID-BC-index identifies women with primary poor prognostic breast cancer based on DNA methylation in cervical samples. Nat Commun 2022;13:449.
44. Wadsworth M, Kuh D, Richards M, Hardy R. Cohort profile: the 1946 national birth cohort (MRC National Survey of Health and Development). Int J Epidemiol 2006;35:49–54.
45. Min JL, Hemani G, Smith GD, Relton C, Suderman M. Meffil: efficient normalization and analysis of very large DNA methylation datasets. Bioinformatics 2018;34:3983–9.
46. Colaprico A, Silva TC, Olsen C, Garofano L, Cava C, Garolini D, et al. TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data. Nucleic Acids Res 2016;44:e71.
47. Chakravarthy A, Reddin I, Henderson S, Dong C, Kirkwood N, Jeyakumar M, et al. Integrated analysis of cervical squamous cell carcinoma cohorts from three continents reveals conserved subtypes of prognostic significance. Nat Commun 2022;13:5818.
48. Teixeira VH, Pipinikas CP, Pennycuick A, Lee-Six H, Chandrasekharan D, Beane J, et al. Deciphering the genomic, epigenomic, and transcriptomic landscapes of pre-invasive lung cancer lesions. Nat Med 2019;25:517–25.
49. Pennycuick A, Teixeira VH, AbdulJabbar K, Raza SEA, Lund T, Akarca AU, et al. Immune surveillance in clinical regression of preinvasive squamous cell lung cancer. Cancer Discov 2020;10:1489–99.
50. Aryee MJ, Jaffe AE, Corrada-Bravo H, Ladd-Acosta C, Feinberg AP, Hansen KD, et al. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics 2014;30:1363–9.
51. Tian Y, Morris TJ, Webster AP, Yang Z, Beck S, Feber A, et al. ChAMP: updated methylation analysis pipeline for Illumina BeadChips. Bioinformatics 2017;33:3982–4.
52. Teschendorff AE, Breeze CE, Zheng SC, Beck S. A comparison of reference-based algorithms for correcting cell-type heterogeneity in epigenome-wide association studies. BMC Bioinformatics 2017;18:105.
53. Zheng SC, Webster AP, Dong D, Feber A, Graham DG, Sullivan R, et al. A novel cell-type deconvolution algorithm reveals substantial contamination by immune cells in saliva, buccal and cervix. Epigenomics 2018;10:925–40.
54. Mansell G, Gorrie-Stone TJ, Bao Y, Kumari M, Schalkwyk LS, Mill J, et al. Guidance for DNA methylation studies: statistical insights from the Illumina EPIC array. BMC Genomics 2019;20:366.
55. Barrett JE, Jones A, Evans I, Reisel D, Herzog C, Chindera K, et al. The DNA methylome of cervical cells can predict the presence of ovarian cancer. Nat Commun 2022;13:448.
56. Nazarenko T, Vavourakis CD, Jones A, Evans I, Watson A, Brandt K, et al. Technical and biological sources of unreliability of Infinium type II probes of the Illumina MethylationEPIC BeadChip microarray. bioRxiv 2023.
57. Wu T, Hu E, Xu S, Chen M, Guo P, Dai Z, et al. clusterProfiler 4.0: a universal enrichment tool for interpreting omics data. Innovation 2021;2:100141.
58. Yu G, He Q-Y. ReactomePA: an R/Bioconductor package for reactome pathway analysis and visualization. Mol Biosyst 2015;12:477–9.
59. Lee TI, Jenner RG, Boyer LA, Guenther MG, Levine SS, Kumar RM, et al. Control of developmental regulators by polycomb in human embryonic stem cells. Cell 2006;125:301–13.
60. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez J-C, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 2011;12:77.
61. Goldfarbmuren KC, Jackson ND, Sajuthi SP, Dyjack N, Li KS, Rios CL, et al. Dissecting the cellular specificity of smoking effects and reconstructing lineages in the human airway epithelium. Nat Commun 2020;11:2485.
62. Keshawarz A, Joehanes R, Guan W, Huan T, DeMeo DL, Grove ML, et al. Longitudinal change in blood DNA epigenetic signature after smoking cessation. Epigenetics 2022;17:1098–109.
63. Yoshida K, Gowers KHC, Lee-Six H, Chandrasekharan DP, Coorens T, Maughan EF, et al. Tobacco smoking and somatic mutations in human bronchial epithelium. Nature 2020;578:266–72.
64. Steliga MA. Health hazards of electronic cigarettes and their utility in smoking cessation. J Thorac Cardiovasc Surg 2022;163:307–10.
65. Garcia-Arcos I, Geraghty P, Baumlin N, Campos M, Dabo AJ, Jundi B, et al. Chronic electronic cigarette exposure in mice induces features of COPD in a nicotine-dependent manner. Thorax 2016;71:1119–29.
66. Davis LC, Sapey E, Thickett DR, Scott A. Predicting the pulmonary effects of long-term e-cigarette use: are the clouds clearing? Eur Respir Rev 2022;31:210121.
67. Catala-Valentín AR, Almeda J, Bernard JN, Cole AM, Cole AL, Moore SD, et al. E-cigarette aerosols promote oral S. aureus colonization by delaying an immune response and bacterial clearing. Cells 2022;11:773.
68. Jonas DE, Reuland DS, Reddy SM, Nagle M, Clark SD, Weber RP, et al. Screening for lung cancer with low-dose computed tomography. JAMA 2021;325:971–87.
69. Haaf K, Aalst CM, Koning HJ, Kaaks R, Tammemägi MC. Personalising lung cancer screening: an overview of risk-stratification opportunities and challenges. Int J Cancer 2021;149:250–63.
70. Yu H, Raut JR, Schöttker B, Holleczek B, Zhang Y, Brenner H. Individual and joint contributions of genetic and methylation risk scores for enhancing lung cancer risk stratification: data from a population-based cohort in Germany. Clin Epigenetics 2020;12:89.
71. Teschendorff AE, Relton CL. Statistical and integrative system-level analysis of DNA methylation data. Nat Rev Genet 2018;19:129–47.
72. Rahmani E, Schweiger R, Shenhav L, Wingert T, Hofer I, Gabel E, et al. BayesCCE: a Bayesian framework for estimating cell-type composition from DNA methylation without the need for methylation reference. Genome Biol 2018;19:141.
73. Qi L, Teschendorff AE. Cell-type heterogeneity: why we should adjust for it in epigenome and biomarker studies. Clin Epigenetics 2022;14:31.
74. Murgas KA, Ma Y, Shahidi LK, Mukherjee S, Allen AS, Shibata D, et al. A Bayesian hierarchical model to estimate DNA methylation conservation in colorectal tumors. Bioinformatics 2021;38:22–9.
75. Banos DT, McCartney DL, Patxot M, Anchieri L, Battram T, Christiansen C, et al. Bayesian reassessment of the epigenetic architecture of complex traits. Nat Commun 2020;11:2865.