TUTORIAL for the Online Age Calculator:

Estimate DNA methylation age

Steve Horvath (shorvath at mednet.ucla.edu)

This tutorial illustrates how to calculate DNA methylation age using the online calculator.

Mandatory input: A (compressed) file with beta values, e.g. measured on the Illumina 27k or
450k platform. Optionally, you can compress the comma delimited file (.csv files) into a file that
ends either with .zip or with .bz2. Other compression formats cannot yet be used.

Output:

• DNAmAge=predicted age (referred to as DNAm age)

• corSampleVSgoldstandard=quality statistic for detecting outlying samples (e.g. samples with
corSampleVSgoldstandard<0.8 should probably be excluded)
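The quality check mentioned above can be sketched in R as follows. This is a minimal illustration, assuming the calculator's output file has already been read into a data frame; the object name datOut and all values are invented, and the 0.8 cutoff is only the rule of thumb stated above.

```r
# Flag samples whose correlation with the gold standard falls below 0.8.
# datOut stands in for the calculator's output file read with read.csv();
# the values below are invented for illustration.
datOut <- data.frame(SampleID = paste0("S", 1:4),
                     DNAmAge = c(34.2, 51.7, 12.9, 60.3),
                     corSampleVSgoldstandard = c(0.95, 0.78, 0.91, 0.65))

threshold <- 0.8  # rule-of-thumb cutoff
datOut$lowQuality <- datOut$corSampleVSgoldstandard < threshold

# Samples to inspect (and probably exclude)
print(datOut[datOut$lowQuality, c("SampleID", "corSampleVSgoldstandard")])

# Samples kept for downstream analysis
datKeep <- datOut[!datOut$lowQuality, ]
```

It is usually worth looking at the flagged samples before dropping them, since a low correlation can also point to a tissue that differs strongly from blood.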

Optional, additional input: I recommend that you also input a sample annotation file that
specifies age, tissue, etc. In this case use the following variable names "Age" (note it starts with
capital A), "Female" (with values 1 for female, 0 for male, NA for missing info), "Tissue". Make
sure that the rows (samples) in the sample annotation file have the same order as the columns
(samples) in the methylation file. If you provide a sample annotation file then you will obtain the
following variables:

• AgeAccelerationResidual=the recommended age acceleration measure based on a linear
regression model.
• AgeAccelerationDiff=DNAmAge-Age
• predictedGender (based on the DNAm levels of X chromosomal markers)
• predictedTissue (and probabilities that the sample comes from various tissues).

Advanced Analysis for Blood

If you applied the Illumina 450K platform to blood then you can get a host of additional output
by selecting the AdvancedAnalysisBlood option. In this case, the software will output

• additional measures of biological age in blood
• estimates of blood cell counts
• different measures of age acceleration.

Citation of this software

Horvath S (2013) DNA methylation age of human tissues and cell types. Genome Biol
14(10):R115 PMID: 24138928

Contents
How to upload the data?
Upload form
Strategies for uploading very large data sets
Normalization, imputation
Uploading the sample annotation file
After you push the submit button
Output file
Log file
Advanced Analysis in Blood
1) BioAge1HO, BioAge2HO, BioAge3HO, BioAge4HO
2) BioAge1HA, BioAge2HA, BioAge3HA, BioAge4HA
3) BioAge2HOStatic, BioAge3HOStatic, BioAge4HOStatic
4) BioAge2HAStatic, BioAge3HAStatic, BioAge4HAStatic
5) BioAge1HOAdjAge, BioAge2HOAdjAge, BioAge3HOAdjAge, BioAge4HOAdjAge
6) BioAge1HAAdjAge, BioAge2HAAdjAge, BioAge3HAAdjAge, BioAge4HAAdjAge
7) BioAge2HOStaticAdjAge, BioAge3HOStaticAdjAge, BioAge4HOStaticAdjAge
8) BioAge2HAStaticAdjAge, BioAge3HAStaticAdjAge, BioAge4HAStaticAdjAge
9) PlasmaBlastAdjAge, CD8pCD28nCD45RAnAdjAge, CD8.naiveAdjAge, CD4.naiveAdjAge
10) Cell count measures: CD8T, CD4T, NK, Bcell, Mono, Gran
11) PlasmaBlast, CD8pCD28nCD45RAn, CD8.naive, CD4.naive
12) Cell count measures for multivariate regression models
12) AAHOAdjCellCounts and AAHAAdjCellCounts
Why does the web based calculator not return any results for my data set?
Frequently asked questions
Q: Does the order of the samples in the sample annotation file have to match that of the
methylation file?
Q: Are additional columns allowed in the sample annotation file?
Q: Does the order of the columns matter in the sample annotation file? It seems like you will
require the first column to be "SampleID", second column "Age".
Q: In the "Advanced Analysis in Blood" option, the 4 weighted averages are a bit of a mystery as
currently described. Can you elaborate on how the weighted averages were calculated?
Q: In the advanced analysis option, it appears that only 2 age acceleration measure account for cell
types (e.g. "AAHOAdjCellcounts" and "AAHAAdjCellcounts" ). Which epigenetic age measure is
being used?
References

Instructions

Go to the webpage: http://labs.genetics.ucla.edu/horvath/dnamage/

To run this tutorial, download the following example data set from the webpage

MethylationDataExample55.csv

The following screen shot shows that this input file is a comma delimited Excel file whose first
column reports probe identifiers. The remaining columns correspond to samples (i.e. DNA meth
arrays) for which DNAm age will be estimated.

In this tutorial, I analyze data set 55:

• 16 men: autistic subjects and controls
• brain occipital cortex samples
• Illumina 27K platform
• GEO accession GSE38608
• Citation for the data set:
Ginsberg MR, Rubin RA, Falcone T, Ting AH et al. Brain transcriptional and epigenetic
associations with autism. PLoS One 2012;7(9):e44736. PMID: 22984548

Some comments for the experts:

These DNA methylation data were downloaded from the Gene Expression Omnibus data
base (GEO accession GSE38608). GEO allows users to post both normalized data
and raw data. The authors posted M values as normalized values. However, my age
predictor makes use of beta values since I did not find any evidence that M values are
superior to beta values when it comes to age prediction.
Message: the beta values used in this tutorial do not match the normalized (M value) data
from GEO. But it is straightforward to turn M values into beta values...
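The conversion from M values to beta values can be sketched as follows. This assumes the standard definition M = log2(beta / (1 - beta)) without any offset term; if the M values were computed with an offset, the inverse changes accordingly. The helper name mToBeta and the toy matrix are illustrative only and are not part of the calculator.

```r
# Convert M values to beta values using the standard relation
#   M = log2(beta / (1 - beta))   <=>   beta = 2^M / (2^M + 1)
# mToBeta is a hypothetical helper name used only for this illustration.
mToBeta <- function(M) {
  2^M / (2^M + 1)
}

# Toy matrix of M values: rows are probes, columns are samples
datM <- matrix(c(-2, 0, 3, 1, -1, 0.5), nrow = 3,
               dimnames = list(c("cg1", "cg2", "cg3"), c("S1", "S2")))

datBeta <- mToBeta(datM)
print(round(datBeta, 3))
```

An M value of 0 corresponds to a beta value of 0.5, and all resulting beta values lie strictly between 0 and 1.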

How to upload the data?

Note that the following webpage http://labs.genetics.ucla.edu/horvath/dnamage/

contains a hyperlink called "Access Online Age Calculator".

After you click it you will arrive at the following webpage

Upload form
In the online form, enter your

1. Name:
2. Organization:
3. Email address. The results will be sent to this email address. Make sure it works.
4. Data file: Select the comma delimited file that contains your data. As mentioned before
you can upload a zipped version of this file.

Strategies for uploading very large data sets

Please take a note of the upper limit when it comes to uploading files. If you have a large data
set that exceeds these limits then I recommend the strategies below. If you have a very large data
set, start with strategy 2 and then move to strategy 1.

Strategy 1: Compress the file into a file that ends either with .zip or with .bz2. Other compression
formats cannot yet be used.

Strategy 2: Turn your Illumina 450K data into a "reduced" file that only contains probes that can
be found in the file datMiniAnnotation.csv (which is on our webpage). This does not result in
any information loss since the epigenetic clock only uses probes that can be found in this file.
After implementing this step, compress the resulting file (i.e. apply Strategy 1).

CpG probes that were not measured in your data set (e.g. are not present on the 450K array)
should lead to a row filled with NAs.

Here is some relevant R code that assumes your large data file is called "dat0" and the first
column of dat0 contains the probe identifiers.

library(sqldf)   # provides read.csv.sql

# Change the setwd file path to the folder that holds your data. Note the forward slashes.
setwd("C:/Users/SHorvath/Documents/DNAmAge/Example55")

# Replace "MethylationData.csv" with the name of your methylation data file
dat0=read.csv.sql("MethylationData.csv")
datMiniAnnotation=read.csv("datMiniAnnotation.csv")

# Keep only the probes listed in datMiniAnnotation; probes that were not measured become rows of NAs
match1=match(datMiniAnnotation[,1], dat0[,1])
dat0Reduced=dat0[match1,]
dat0Reduced[,1]=as.character(dat0Reduced[,1])
# Restore the probe identifiers of the rows that had no match
dat0Reduced[is.na(match1),1]=as.character(datMiniAnnotation[is.na(match1),1])
datout=data.frame(dat0Reduced)

# Make sure you output numeric variables (stray quotation marks are removed first)
for (i in 2:dim(datout)[[2]]) {
  datout[,i]=as.numeric(as.character(gsub(x=datout[,i],pattern="\"",replacement="")))
}

# Replace "MethylationData.csv" with a file name of your choice
write.table(datout,"MethylationData.csv", row.names=FALSE, sep=",")

Strategy 3: Split the data into batches, e.g. batches of 500 samples each. Next apply strategies 1
or 2.

Strategy 4: Email Steve Horvath or Yining Zhao to increase the upload limit for you.
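Strategies 3 and 1 can be combined in a few lines of R. The following is a minimal sketch, not the author's code: the function name writeBatches, the file prefix and the toy data are assumptions made for illustration. It splits the sample columns into batches, keeps the probe identifier column in every batch, and writes each batch as a bzip2-compressed comma delimited file whose name ends with .bz2.

```r
# Split a methylation data frame into batches of samples and write each
# batch as a bzip2-compressed, comma delimited file (Strategies 3 and 1).
# writeBatches and the file prefix are hypothetical names for illustration.
writeBatches <- function(dat0, batchSize = 500, prefix = "MethylationBatch",
                         outDir = tempdir()) {
  sampleCols <- 2:ncol(dat0)  # column 1 holds the probe identifiers
  batchIndex <- ceiling(seq_along(sampleCols) / batchSize)
  files <- character(0)
  for (b in unique(batchIndex)) {
    cols <- c(1, sampleCols[batchIndex == b])
    path <- file.path(outDir, paste0(prefix, b, ".csv.bz2"))
    con <- bzfile(path, open = "w")  # bzip2-compressed text connection
    write.csv(dat0[, cols, drop = FALSE], con, row.names = FALSE)
    close(con)
    files <- c(files, path)
  }
  files
}

# Toy example: 3 probes, 5 samples, batches of 2 samples each
set.seed(1)
dat0 <- data.frame(ProbeID = c("cg1", "cg2", "cg3"),
                   matrix(runif(15), nrow = 3,
                          dimnames = list(NULL, paste0("S", 1:5))))
files <- writeBatches(dat0, batchSize = 2)
print(basename(files))
```

If a sample annotation file is uploaded as well, it has to be split into the same batches so that its rows still match the sample columns of each methylation file.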

Normalization, imputation
Additional buttons for the DNAm Age calculator allow you to check whether you want to
normalize the data. It is strongly recommended to use the default setting (i.e. check "Normalize
Data") since it often improves the predictive accuracy.

I have noticed that some users don't select this option since they think that they have their own
superior normalization method. You should still check "Normalize Data". Reason: your
normalization method has a different goal from my normalization method. The purpose of my
normalization method is to make your data comparable to the training data of the epigenetic
clock.

I advise against using the fast imputation method. However, if you have hundreds of samples
with missing data and want to get a quick result then check "Fast Imputation".

Uploading the sample annotation file

Sample annotation format
This sample annotation file is optional. Please upload it if you want to
a) obtain various measures of age acceleration,
b) allow the function to do some basic quality checks (e.g. check of gender, tissue).

Requirements: The sample annotation file should be a comma delimited text file whose rows
correspond to samples (e.g. human subjects). Make sure that the rows (samples) in the sample
annotation file have the same order as the columns (samples) in the methylation file.

1) Not necessary but highly recommended: The first column should report the sample
identifiers (matching those of the DNA methylation data, e.g. "Subject1", etc).
2) Mandatory: a column whose name is spelled "Age". This column should report the
(chronological) age in years, e.g. 0 for a newborn, 0.5 encodes a 6 month old child, 30 for a 30
year old. Prenatal samples would get a negative value, e.g. -0.5 for a sample measured half a year
before the expected birth. If you don't have age values, simply fill up the column with "NA".
3) Optional: I strongly recommend that you include gender information since this allows us
to check whether the data are properly normalized etc. Toward this end, please insert a column
called "Female" (note the capitalization) which takes a value of 1 if the subject is female, 0 if the
subject is male, and NA if the information is not available. If you don't use ones or zeros, you
will get an error message. The calculator will output a column called "predictedGender". If the
gender prediction does not match the known gender then there may be data quality issues.
4) Optional: I strongly recommend that you include a column that reports the DNA source (e.g.
tissue). Toward this end, please insert a column called "Tissue" (note the capitalization) which
takes a descriptive value. The tissue prediction tool is not yet published and its predictions
should be interpreted with all due caution. I include this early version since it may help you
identify mislabeled/suspicious samples.
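The requirements above can be checked before uploading. The following R sketch builds a small sample annotation table and verifies the points listed above; all object names, sample identifiers and values are invented for illustration.

```r
# Build a sample annotation table with the column names the calculator expects
# ("Age", "Female", "Tissue"; capitalization matters) and check that its rows
# are in the same order as the sample columns of the methylation data.
# All object names and values here are made up for illustration.
dat0 <- data.frame(ProbeID  = c("cg1", "cg2"),
                   Subject1 = c(0.10, 0.80),
                   Subject2 = c(0.20, 0.70),
                   Subject3 = c(0.15, 0.90))

datSample <- data.frame(
  SampleID = c("Subject1", "Subject2", "Subject3"),
  Age      = c(30, 0.5, NA),   # years; NA if unknown
  Female   = c(1, 0, NA),      # 1 = female, 0 = male, NA = missing
  Tissue   = c("Blood WB", "Blood WB", "Saliva"),
  stringsAsFactors = FALSE
)

# Row order of the annotation must equal the column order of the methylation data
stopifnot(identical(datSample$SampleID, colnames(dat0)[-1]))
# Column names with the exact capitalization
stopifnot(all(c("Age", "Female", "Tissue") %in% colnames(datSample)))
# Female may only contain 1, 0 or NA
stopifnot(all(datSample$Female %in% c(0, 1, NA)))

annotationFile <- file.path(tempdir(), "SampleAnnotation.csv")
write.csv(datSample, annotationFile, row.names = FALSE)
```

The file name SampleAnnotation.csv is arbitrary; only the column names and the row order matter.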

Check whether one of the following descriptive terms matches your DNA source. If so, please
use it. Otherwise simply report the best name that describes your DNA source.

[1] " Vasc.Endoth(Umbilical)"

[2] "Ape WB"
[3] "Blood CD4 Tcells"
[4] "Blood CD4+CD14"
[5] "Blood Cell Types"
[6] "Blood Cord"
[7] "Blood PBMC"
[8] "Blood WB"
[9] "Bone"
[10] "Brain Cerebellar"
[11] "Brain CRBLM"

[12] "Brain FCTX"
[13] "Brain Occipital Cortex"
[14] "Brain PONS"
[15] "Brain Prefr.CTX"
[16] "Brain TCTX"
[17] "Breast"
[18] "Breast NL"
[19] "Buccal"
[20] "Cartilage Knee"
[21] "Colon"
[22] "Dermal fibroblast"
[23] "Epidermis"
[24] "Fat Adip"
[25] "Gastric"
[26] "GlialCell"
[27] "Head+Neck"
[28] "Heart"
[29] "Kidney"
[30] "Liver"
[31] "Liver "
[32] "Lung"
[33] "MSC" note that this stands for mesenchymal stromal cells
[34] "Muscle"
[35] "Neuron"
[36] "Placenta"
[37] "Prostate NL"
[38] "Saliva"
[39] "Sperm"
[40] "Stomach"
[41] "Thyroid"
[42] "Uterine Cervix"
[43] "Uterine Endomet"

The software will output a column called predictedTissue, which reports the predicted DNA
source, i.e. one of the above mentioned DNA sources. Future versions of the age predictor will
report more potential DNA sources.

After you push the submit button

Push the "Submit" button. After a few minutes you will receive an email with the subject
heading "Your Processing Result" that contains two attachments. The first attached file, whose
name ends with "...output.csv" is a comma delimited file (which can be opened with Excel).

How long does it take to get an email after you have submitted your data?

That depends on your sample size and whether or not you want the software to normalize the
data. If you don't normalize the data, you should get an email within a couple of minutes. In
contrast, normalizing several hundred samples could take several hours.

If you don't get any email, it means that your data crashed the R program. In this case, please
carefully look at your input data. Do they meet the requirements? Maybe your methylation data
set contains non-numeric variables (apart from the identifiers in the first column).

Output file
Note that the output file contains a host of useful information e.g.

• SampleID=sample identifier
• DNAmAge=DNA methylation age=predicted age
• Comment=A comment is only added if a sample looks suspicious.
• noMissingPerSample=number of missing beta values per sample
• meanMethBySample, minMethBySample=the mean and min beta value before
normalization
• corSampleVSgoldstandard=correlation between the sample and the gold standard
(defined by averaging the beta values across the samples from the largest blood data set).
A low value spells trouble and a comment will be added.
• meanAbsDifferenceSampleVSgoldstandard=mean absolute difference between the
sample and the gold standard. A large value spells trouble and a comment will be added.
• predictedGender=predicted gender based on the mean across the X chromosomal
markers. The sample is problematic if the predicted gender does not match the known
gender.
• meanXchromosome=mean beta value across the X chromosomal markers. This variable
is used for predicting gender. Female samples should have a higher value than male
samples if X chromosomal inactivation is applicable.
• predictedTissue=the predicted DNA source (i.e. it does not have to be a tissue)
• ProbabilityFrom.Blood.PBMC=probability that the DNA derives from peripheral blood
mononuclear cells.
• ProbabilityFrom.Brain.Cerebellar=probability that it comes from cerebellar brain samples
• ProbabilityFrom.Brain.FCTX=probability that it comes from frontal cortex
• ETC
• AgeAccelerationDiff=Age acceleration measure defined simply as difference, i.e.
DNAmAge minus Age
• AgeAccelerationResidual=Age acceleration measure defined as residual from regressing
DNAm age on chronological age. In R language: residuals(lm(DNAmAge~Age))
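The two age acceleration measures at the end of this list can be reproduced in a few lines of R. This is a sketch on invented numbers; the calculator's own values depend on the set of samples that was uploaded, so a regression fitted on a different subset will give different residuals.

```r
# Reproduce the two age acceleration measures from DNAmAge and Age.
# Toy numbers; in practice both columns come from the calculator's output.
datOut <- data.frame(Age     = c(20, 35, 50, 65, 80),
                     DNAmAge = c(24, 33, 55, 63, 84))

# Simple difference: DNAmAge minus Age
datOut$AgeAccelerationDiff <- datOut$DNAmAge - datOut$Age

# Residual from regressing DNAm age on chronological age.
# na.exclude keeps the residuals aligned with the rows if Age contains NAs.
fit <- lm(DNAmAge ~ Age, data = datOut, na.action = na.exclude)
datOut$AgeAccelerationResidual <- residuals(fit)

print(round(datOut, 2))

# The residual measure is uncorrelated with Age by construction
print(cor(datOut$AgeAccelerationResidual, datOut$Age))
```

The difference measure can remain correlated with chronological age, whereas the residual measure cannot, which is one reason why the residual is the recommended measure.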

Log file
The second email attachment (ending in log.txt) is a log file that briefly describes the data and
provides some feedback, e.g. warnings or error messages.

Advanced Analysis in Blood

If you measured Illumina 450K data in blood then I recommend that you select the advanced
analysis option in blood. Side note: If you have more than, say, 100 samples then I strongly
recommend using data compression strategies 2 and 1 described in Strategies for uploading very
large data sets.

The advanced analysis option leads to a host of additional output: various measures of biological
age, age acceleration and blood cell counts.

1) BioAge1HO, BioAge2HO, BioAge3HO, BioAge4HO

Explanation: All of these measures of biological age generalize the DNAmAge described in
Horvath 2013. BioAge1HO is simply another name for DNAmAge. BioAge2HO, BioAge3HO,
BioAge4HO are defined as weighted averages based on two, three, and four epigenetic input
variables, respectively. The weights are "dynamically" calculated by correlating the input
variables to chronological age. Measures 2-4 can only be calculated if chronological age,
specified in the variable "Age", is available and has a non-zero variance. If age is not available or
all samples have the same age (zero variance) simply use 3) BioAge2HOStatic,
BioAge3HOStatic, BioAge4HOStatic.

2) BioAge1HA, BioAge2HA, BioAge3HA, BioAge4HA

Explanation: All of these measures extend the predicted age measures based on the 71 CpGs in
Hannum 2013. Again, BioAge2HA, BioAge3HA, BioAge4HA are generalized measures of
biological age based on two, three, and four epigenetic input variables, respectively. They can
only be calculated if chronological age, specified in the variable "Age", is available. If age is not
available or all samples have the same age (zero variance) simply use 4) BioAge2HAStatic,
BioAge3HAStatic, BioAge4HAStatic.

3) BioAge2HOStatic, BioAge3HOStatic, BioAge4HOStatic

These measures are analogous to those described in 1) BioAge1HO, BioAge2HO, BioAge3HO,
BioAge4HO but the weights are static (meaning constant). In particular, these measures can be
calculated even if the column "Age" is filled with missing values or all subjects have the same
chronological age.

4) BioAge2HAStatic, BioAge3HAStatic, BioAge4HAStatic
These measures are analogous to those described in 2) BioAge1HA, BioAge2HA, BioAge3HA,
BioAge4HA but the weights are static (meaning constant). In particular, these measures can be
calculated even if the column "Age" is filled with missing values or all subjects have the same
chronological age.

5) BioAge1HOAdjAge, BioAge2HOAdjAge, BioAge3HOAdjAge, BioAge4HOAdjAge
These are measures of age acceleration based on adjusting measures 1) BioAge1HO,
BioAge2HO, BioAge3HO, BioAge4HO for chronological age using a linear regression model.

In other words, a positive (negative) value indicates that the biological age of the sample is
higher (lower) than expected based on chronological age. Specifically, these measures were
defined as residuals (observed minus predicted) resulting from a simple linear regression model
that regressed biological age on chronological age. By definition, these measures are not
correlated with chronological age, which is an attractive property. A disadvantage of these
measures is that they can only be defined if "Age" has a non-zero variance and is available for at
least 4 subjects.

6) BioAge1HAAdjAge, BioAge2HAAdjAge, BioAge3HAAdjAge, BioAge4HAAdjAge
These are measures of age acceleration based on adjusting measures 2) BioAge1HA,
BioAge2HA, BioAge3HA, BioAge4HA for chronological age using a linear regression model.

7) BioAge2HOStaticAdjAge, BioAge3HOStaticAdjAge, BioAge4HOStaticAdjAge

These are measures of age acceleration based on adjusting measures 3) BioAge2HOStatic,
BioAge3HOStatic, BioAge4HOStatic for chronological age using a linear regression model.

8) BioAge2HAStaticAdjAge, BioAge3HAStaticAdjAge, BioAge4HAStaticAdjAge

These are measures of age acceleration based on adjusting measures 4) BioAge2HAStatic,
BioAge3HAStatic, BioAge4HAStatic for chronological age using a linear regression model.

9) PlasmaBlastAdjAge, CD8pCD28nCD45RAnAdjAge, CD8.naiveAdjAge, CD4.naiveAdjAge
These are age adjusted versions of 11) PlasmaBlast, CD8pCD28nCD45RAn, CD8.naive,
CD4.naive. In other words, these are residuals resulting from a linear model that regresses the
respective cell abundance measure on chronological age.

10) Cell count measures: CD8T, CD4T, NK, Bcell, Mono, Gran
These are estimated proportions of CD8 T cells, CD4 T cells, natural killer cells, B cells,
monocytes and granulocytes. Toward this end, I used the method and R code described in
Houseman et al (2012). Specifically, I used the R command "projectCellType" in the minfi R
package (Aryee et al 2014). If you use these cell types in your work, make sure to cite Houseman
et al 2012.

11) PlasmaBlast, CD8pCD28nCD45RAn, CD8.naive, CD4.naive

These are estimated abundance measures of plasma blasts, CD8+CD28-CD45RA- T cells, naive
CD8 T cells, and naive CD4 T cells. Since a novel approach was used to arrive at these
estimates, please cite the epigenetic clock software (Horvath 2013) if you use these measures.
Interpretation: The resulting estimates should *not* be interpreted as counts or percentages but
rather as ordinal abundance measures. Don't turn them into proportions (by dividing the
measures by the sum). Negative values simply indicate very low values. Personally, I would not
set a negative value to zero but would not object if you do that.

Biology:

a) CD8+CD28-CD45RA- T cells have characteristics of both memory and effector T cells. These
cells increase with chronological age.

b) Naive CD8 T cells decrease with age.

c) Here naive CD8 and CD4 T cells are defined as CD45RA+CCR7+ cells.

d) Plasma cells, also called plasma B cells, and effector B cells, are white blood cells that secrete
large volumes of antibodies. From Wikipedia: Upon stimulation by a T cell, which usually
occurs in germinal centers of secondary lymphoid organs like the spleen and lymph nodes, the
activated B cell begins to differentiate into more specialized cells. Germinal center B cells may
differentiate into memory B cells or plasma cells. Most of these B cells will become
plasmablasts, and eventually plasma cells, and begin producing large volumes of antibodies.

Statistical method for estimating these cell abundance measures: A penalized regression model
(elastic net) was used to regress cell count measures on DNA methylation levels. Estimated values
are predicted values based on this penalized regression model.

12) Cell count measures for multivariate regression models

Since cell heterogeneity can greatly affect DNAm studies, it is often a good idea to adjust for cell
abundance measures (Houseman 2012). If you have whole blood methylation data, then I suggest
you adjust for the following cell counts in your analysis:

1. CD8.naive (Horvath method)

2. CD8pCD28nCD45RAn (Horvath method)
3. PlasmaBlast (Horvath method)
4. CD4T (Houseman)
5. NK (Houseman)
6. Mono (Houseman)
7. Gran (Houseman)

Since many of the cells are highly correlated with each other, I dropped the B cell and CD8T cell
estimates from the Houseman method. When studying various diseases, it is probably a good
idea to replace "Bcell" by "PlasmaBlast" (related to B cells) since the latter is often more disease
relevant. Further, I usually replace "CD8T" by the two measures "CD8.naive" and
"CD8pCD28nCD45RAn" since the latter are probably more disease relevant. I rarely use
CD4.naive since CD8.naive is often more relevant.

To assess whether DNAmAge relates to a disease outcome, I use the following covariate list

DNAmAge+Age+CD8.naive + CD8pCD28nCD45RAn + PlasmaBlast+CD4T+NK+Mono+Gran.

Obviously, you would also adjust for standard variables such as gender, race, body mass index, prior
history of disease e.g. cancer, type II diabetes status, etc.
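The covariate list above translates directly into an R model formula. The sketch below uses simulated data and assumes a binary outcome analyzed with logistic regression; the outcome name "Disease" is hypothetical, and for other outcome types a different model (e.g. lm or a survival model) would be the natural choice.

```r
# Logistic regression of a binary disease outcome on DNAmAge, adjusting for
# chronological age and the seven blood cell measures listed above.
# The data are simulated and the outcome name "Disease" is hypothetical.
set.seed(42)
n <- 200
dat <- data.frame(Age = runif(n, 20, 80))
dat$DNAmAge <- dat$Age + rnorm(n, sd = 4)

cellVars <- c("CD8.naive", "CD8pCD28nCD45RAn", "PlasmaBlast",
              "CD4T", "NK", "Mono", "Gran")
for (v in cellVars) dat[[v]] <- rnorm(n)  # placeholder cell measures
dat$Disease <- rbinom(n, 1, 0.3)          # placeholder outcome

# Build Disease ~ DNAmAge + Age + CD8.naive + ... from the variable names
covariates <- c("DNAmAge", "Age", cellVars)
form <- reformulate(covariates, response = "Disease")
fit <- glm(form, data = dat, family = binomial)

# Coefficient, standard error, z value and p value for DNAmAge
print(coef(summary(fit))["DNAmAge", ])
```

Further covariates such as gender or body mass index can be appended to the covariates vector in the same way.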

12) AAHOAdjCellCounts and AAHAAdjCellCounts

These are measures of age acceleration that adjust for cell counts. Specifically, these are residuals
resulting from multivariate regression models that regress an estimate of DNAm age on age+CD8.naive +
CD8pCD28nCD45RAn + PlasmaBlast+CD4T+NK+Mono+Gran (as described in the previous section).

AAHOAdjCellCounts and AAHAAdjCellCounts correspond to age acceleration measures based on
Horvath (2013) and Hannum (2013), respectively. It turns out that each of the following four measures
1) BioAge1HO, BioAge2HO, BioAge3HO, BioAge4HO leads to the same measure AAHOAdjCellCounts.
Similarly, each of the following four measures 2) BioAge1HA, BioAge2HA, BioAge3HA, BioAge4HA leads
to the same measure AAHAAdjCellCounts.
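A cell-count-adjusted age acceleration measure of this kind can be sketched as the residual of a multivariate linear model. The code below runs on simulated data and only illustrates the definition given above; it is not the calculator's implementation, and the column name AAAdjCellCountsSketch is made up to avoid confusion with the calculator's own output.

```r
# Residuals from regressing a DNAm age estimate on chronological age and
# the seven cell measures. Simulated data; names of new objects are invented.
set.seed(7)
n <- 100
dat <- data.frame(Age = runif(n, 20, 80))
dat$DNAmAge <- dat$Age + rnorm(n, sd = 4)

cellVars <- c("CD8.naive", "CD8pCD28nCD45RAn", "PlasmaBlast",
              "CD4T", "NK", "Mono", "Gran")
for (v in cellVars) dat[[v]] <- rnorm(n)  # placeholder cell measures

form <- reformulate(c("Age", cellVars), response = "DNAmAge")
fit <- lm(form, data = dat, na.action = na.exclude)
dat$AAAdjCellCountsSketch <- residuals(fit)

# By construction the residuals are uncorrelated with every covariate
corWithCovariates <- cor(dat$AAAdjCellCountsSketch, dat[, c("Age", cellVars)])
print(round(corWithCovariates, 10))
```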

Why does the web based calculator not return any results for my data
set?
Answer: A small data set (say fewer than 100 samples) should lead to a response within an hour or so. Try a
small subset of your data to see whether you get a response. If not, your data lead to an error. Here are some
common remedies.

a) If you uploaded a sample annotation file, make sure that its number of rows equals the number of
samples, i.e. the number of columns of dat0 minus 1.

b) Make sure that your DNA methylation data file contains all the necessary probes. While it is OK to have
missing DNA methylation levels, it is not OK to have missing probe IDs. Unless you use all probes on the
450K array or the 27K array, please make sure that your file includes all CpGs listed in datMiniAnnotation.csv.
Probes that were not measured in your data set should lead to a row filled with NAs. But the probe name needs
to be listed. The advanced analysis option for blood requires that your data were measured on the
Illumina450K platform but it only uses the probes in datMiniAnnotation.

c) Line feeds: I have noticed that the session breaks down when users upload the wrong line breaks. It should
be CR+LF (carriage return and line feed) and not just LF or CR. A simple remedy is to open the csv file in
Excel and save it as a .csv file for Windows.
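As an alternative to re-saving the file in Excel, the line endings can be set when the file is written from R. The sketch below is one possible way to do this, with an invented file name and toy data; it uses the eol argument of write.table and a connection opened in binary mode so that the CR+LF sequence is written unchanged on any operating system.

```r
# Write a comma delimited file with Windows (CR+LF) line endings.
# The data frame and the output file name are made up for illustration.
datout <- data.frame(ProbeID = c("cg1", "cg2"),
                     S1 = c(0.1, NA),
                     S2 = c(0.9, 0.4))
outFile <- file.path(tempdir(), "MethylationDataCRLF.csv")

# Binary mode ("wb") prevents the operating system from translating
# the end-of-line characters a second time.
con <- file(outFile, open = "wb")
write.table(datout, con, sep = ",", row.names = FALSE, eol = "\r\n", na = "NA")
close(con)

# Check the raw bytes: every LF (0x0a) should be preceded by a CR (0x0d)
bytes <- readBin(outFile, what = "raw", n = file.size(outFile))
lfPos <- which(bytes == as.raw(0x0a))
print(all(bytes[lfPos - 1] == as.raw(0x0d)))
```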

d) Make sure that you upload numeric data (missing values should be coded as NA and not as null or NULL).
Sometimes a user uploads a file that also contains various annotations (e.g. chromosome number, gene name).
Carefully look at dat0 before you upload it. The first column should contain CpG identifiers. The remaining
columns should only contain numeric values. If a column (sample) only contains missing values, remove it
from dat0 and datSample. If need be, run the following R code before you upload the data.

for (i in 2:dim(dat0)[[2]] ) { dat0[,i]=as.numeric(as.character(dat0[,i])) }
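The remedies a) to d) can be collected into a single pre-upload check. The function below is a hypothetical helper written for this tutorial, not part of the calculator; it assumes that dat0 holds probe identifiers in its first column, that requiredProbes is the first column of datMiniAnnotation.csv, and that datSample is the optional sample annotation.

```r
# Pre-upload checks for the methylation data (dat0), the probe list from
# datMiniAnnotation.csv (requiredProbes) and the optional sample annotation.
# checkUpload is a hypothetical helper; it returns a character vector of
# problems, which is empty if none were found.
checkUpload <- function(dat0, requiredProbes, datSample = NULL) {
  problems <- character(0)
  # d) every column after the first must be numeric
  isNum <- vapply(dat0[-1], is.numeric, logical(1))
  if (!all(isNum)) {
    problems <- c(problems, paste("Non-numeric sample columns:",
                                  paste(names(isNum)[!isNum], collapse = ", ")))
  }
  # d) samples that contain only missing values should be removed
  allNA <- vapply(dat0[-1], function(x) all(is.na(x)), logical(1))
  if (any(allNA)) {
    problems <- c(problems, paste("Samples with only missing values:",
                                  paste(names(allNA)[allNA], collapse = ", ")))
  }
  # b) all required probe identifiers must be listed
  missingProbes <- setdiff(requiredProbes, as.character(dat0[[1]]))
  if (length(missingProbes) > 0) {
    problems <- c(problems, paste("Missing probe IDs:",
                                  paste(missingProbes, collapse = ", ")))
  }
  # a) one annotation row per sample column
  if (!is.null(datSample) && nrow(datSample) != ncol(dat0) - 1) {
    problems <- c(problems, "Annotation rows do not match the number of samples")
  }
  problems
}

# Toy example: column S2 was read as text and probe cg3 is not listed
dat0 <- data.frame(ProbeID = c("cg1", "cg2"),
                   S1 = c(0.1, 0.2),
                   S2 = c("0.3", "null"),
                   stringsAsFactors = FALSE)
problemsFound <- checkUpload(dat0, requiredProbes = c("cg1", "cg2", "cg3"))
print(problemsFound)
```

The line ending check from remedy c) is not covered here because it concerns the written file rather than the data frame.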

Frequently asked questions

Q: Does the order of the samples in the sample annotation file have to match that of the
methylation file?
A: Yes, absolutely. If DNAm age is not correlated with chronological age then there is a good chance that the
user or the lab accidentally permuted the sample order. I could tell you several anecdotes about how the
epigenetic clock software allowed us to find plating errors or labeling errors.

Q: Are additional columns allowed in the sample annotation file?

A: Yes, as many as you can handle. Thousands.

Q: Does the order of the columns matter in the sample annotation file? It seems like you will
require the first column to be "SampleID", second column "Age".
A: No, the order does not matter. The first column does *not* have to be called SampleID. However, it is very
important that the file contains columns called "Age", "Female", and "Tissue". The capitalization has to be as
specified. Don't use variable names such as age, AGE, female, tissue, TISSUE.

Q: In the "Advanced Analysis in Blood" option, the 4 weighted averages are a bit of a mystery
as currently described. Can you elaborate on how the weighted averages were calculated?
A: These measures have not yet been published. Please email Steve Horvath to request a relevant manuscript.

Q: In the advanced analysis option, it appears that only 2 age acceleration measures account
for cell types (e.g. "AAHOAdjCellcounts" and "AAHAAdjCellcounts"). Which epigenetic age
measure is being used?
A: It turns out that multiple measures of biological age lead to the same adjusted measure, which is why it is
sufficient to calculate only two age acceleration measures that account for cell counts. Specifically, each of the
following four measures 1) BioAge1HO, BioAge2HO, BioAge3HO, BioAge4HO leads to the same measure
AAHOAdjCellCounts. Similarly, each of the following four measures 2) BioAge1HA, BioAge2HA,
BioAge3HA, BioAge4HA leads to the same measure AAHAAdjCellCounts.

References
• Horvath S (2013) DNA methylation age of human tissues and cell types. Genome Biol
14(10):R115 PMID: 24138928
• Hannum G, Guinney J, Zhao L, Zhang L, Hughes G, Sadda S, Klotzle B, Bibikova M,
Fan JB, Gao Y, Deconde R, Chen M, Rajapakse I, Friend S, Ideker T, Zhang K (2013)
Genome-wide methylation profiles reveal quantitative views of human aging rates. Mol
Cell 49(2):359-67.
• Houseman EA, Accomando WP, Koestler DC, Christensen BC, Marsit CJ, Nelson HH,
Wiencke JK, Kelsey KT (2012) DNA methylation arrays as surrogate measures of cell
mixture distribution. BMC Bioinformatics 13:86 doi:10.1186/1471-2105-13-86
• Aryee MJ, Jaffe AE, Corrada-Bravo H, Ladd-Acosta C, Feinberg AP, Hansen KD,
Irizarry RA (2014) Minfi: a flexible and comprehensive Bioconductor package for the
analysis of Infinium DNA methylation microarrays. Bioinformatics 30(10):1363-9.
doi: 10.1093/bioinformatics/btu049.

