ASReml Workshop PDF
Salvador A. Gezan
Patricio R. Muñoz
October, 2014
CONTENTS
Session
1 Introduction to ASReml
2 Introduction to Linear Mixed Models
3 Job Structure in ASReml
4 Breeding Theory
5 Genetic Analyses: Parental Models
6 Genetic Analyses: Animal Models
7 Variance Structures in ASReml
8 Multivariate Analysis
9 Multi-environment Analysis
10 Spatial Analyses
11 Generalized Linear Mixed Models
12 Introduction to GBLUP
13 GBLUP in ASReml
Session 1
Introduction to
ASReml
Gezan and Munoz (2014). Analysis of Experiments using ASReml: with emphasis on breeding trials ©
WHAT is ASReml?
ASReml uses the Average Information (AI) algorithm and sparse matrix
methods.
Platforms
Windows 98/ME/2000/XP/Vista/Windows7
Linux
Apple Macintosh
Interface
DOS (edit)
Windows (Notepad, ASReml-W)
R (or S-plus)
Text editors (e.g. ConTEXT)
GSView (graphical viewer)
Terminal (Mac)
WHERE TO GET HELP?
Official Documentation
c:\Program Files\Asreml3\Doc\
Webpages
uncronopio.org/ASReml/HomePage (cookbook)
http://www.vsni.co.uk/software/asreml/htmlhelp/ (distributor page)
www.vsni.co.uk/forum (user forum)
STEPS FOR AN ANALYSIS
$y_{ij} = \mu + \alpha_i + g_j + e_{ij}$
$y_{ij}$ observation from the ith block and jth variety
$\mu$ overall mean
$\alpha_i$ fixed effect of the ith block
$g_j$ random effect of the jth variety, $E(g_j) = 0$, $V(g_j) = \sigma_g^2$
$e_{ij}$ random error of the ijth observation, $E(e_{ij}) = 0$, $V(e_{ij}) = \sigma^2$
i = 1, … , 6 (r blocks)
j = 1, … , 12 (t treatments, i.e. varieties)
ALFALFA EXPERIMENT
Data file: /Day1/Alfalfa/ALFALFA.txt
Source Variety Block Resp
1 A 1 2.17
1 B 1 1.58
1 C 1 2.29
1 D 1 2.23
2 E 1 2.33
2 F 1 1.38
2 G 1 1.86
2 H 1 2.27
3 I 1 1.75
3 J 1 1.52
3 K 1 1.55
3 L 1 1.56
...
3 J 6 1.31
3 K 6 1.13
3 L 6 1.33
ALFALFA EXPERIMENT
Job file: /Day1/Alfalfa/Alfalfa.as
Alfalfa experiment - 12 varieties - Response Yield
Source 3 !I # Not used
Variety 12 !A !SORT
Block 6 !I
yield
ALFALFA.txt !SKIP 1
!DISPLAY 7 !SUMMARY
yield ~ mu Block !r Variety
predict Variety !SED !TDIFF !PLOT
Some syntax
~ separates response from the list of fixed and random terms.
! identifies a qualifier (option).
# starts a comment (the rest of the line is skipped).
ALFALFA EXPERIMENT
ASReml 3.0 [01 Jan 2009] Alfalfa experiment - 12 varieties - Response Yield
Build gt [26 Nov 2010] 32 bit
28 Sep 2013 16:28:33.369 32 Mbyte Windows Alfalfa
Licensed to: UFL 31-dec-2013
***********************************************************
* Contact [email protected] for licensing and support *
***************************************************** ARG *
Folder: C:\WORK\ASReml\ASReml_2013\Distribute_Instr\Day1\Alfalfa
Source 3 !I
Variety 12 !A !SORT
Block 6 !I
QUALIFIERS: !SKIP 1 !DISPLAY 7 !SUMMARY
Reading Alfalfa.txt FREE FORMAT skipping 1 lines
Wald F statistics
Source of Variation NumDF DenDF F-inc P-inc
5 mu 1 11.0 858.95 <.001
3 Block 5 55.0 17.42 <.001
Notice: The DenDF values are calculated ignoring fixed/boundary/singular
variance parameters using algebraic derivatives.
[Figure: residual plot output. Peak count: 6; range: -0.382836 to 0.456330]
ALFALFA EXPERIMENT
Interpreting output
LogL Converged
Wald F statistics
Source of Variation NumDF DenDF F-inc P-inc
5 mu 1 11.0 858.95 <.001
3 Block 5 55.0 17.42 <.001
---- ---- ---- ---- ---- ---- 1 ---- ---- ---- ---- ---- ----
Predicted values of yield
The SIMPLE averaging set: Block
MIXED MODELS
• Mixed models extend the linear model by allowing a more flexible
specification of the errors (and other random factors). This permits a
different type of inference and allows correlation and heterogeneous
variances between observations to be modelled.
• Fixed effects: factors whose levels are selected by a nonrandom
process or whose levels consist of the entire population of possible levels.
Inferences are made only about the levels included in the study. Hint: all
levels of interest are in your data set.
$y_{ij} = \mu + \alpha_i + g_j + e_{ij}$
i = 1, … , 6 (r blocks)
j = 1, … , 12 (t treatments)
MODEL FOR A RCBD
Dataset: two factors to consider: one defining the block to which each
experimental unit is allocated, and the other to the treatment applied
to each unit.
$y_{ij} = \mu + \alpha_i + g_j + e_{ij}$
where,
$y_{ij}$ observation from the ith block and jth variety, i = 1 … r, j = 1 … t
$\mu$ is the population mean
$\alpha_i$ fixed effect of the ith block
$g_j$ random effect of the jth variety, $E(g_j) = 0$, $V(g_j) = \sigma_g^2$
$e_{ij}$ random error of the ijth observation, $E(e_{ij}) = 0$, $V(e_{ij}) = \sigma^2$
$g_j \sim N(0, \sigma_g^2)$
$e_{ij} \sim N(0, \sigma^2)$
MODEL COMPONENTS
response = systematic component + random component
response = structural component + explanatory component + random component
Multi-stratum ANOVA: makes explicit the separation between blocks (or the
more general structure of units) and treatments.
MIXED MODELS
Hypothesis of interest
Test statistic: F or t
$y_{ij} = \mu + \alpha_i + g_j + e_{ij}$
$y_{ij}$ observation from the ith block and jth variety
$\alpha_i$ fixed effect of the ith block
$g_j$ random effect of the jth variety, $E(g_j) = 0$, $V(g_j) = \sigma_g^2$
$e_{ij}$ random error of the ijth observation, $E(e_{ij}) = 0$, $V(e_{ij}) = \sigma^2$
i = 1, … , 6 (r blocks)
j = 1, … , 12 (t treatments, i.e. varieties)
ALFALFA EXPERIMENT
yield = µ + block + variety + error
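$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{g} + \mathbf{e}$
$$
\begin{bmatrix} y_{11} \\ \vdots \\ y_{1r} \\ \vdots \\ y_{t1} \\ \vdots \\ y_{tr} \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
1 & 0 & \cdots & 1 \\
\vdots & \vdots & & \vdots \\
1 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
1 & 0 & \cdots & 1
\end{bmatrix}
\begin{bmatrix} \mu \\ \alpha_1 \\ \vdots \\ \alpha_r \end{bmatrix}
+
\begin{bmatrix}
1 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
1 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}
\begin{bmatrix} g_1 \\ \vdots \\ g_t \end{bmatrix}
+
\begin{bmatrix} e_{11} \\ \vdots \\ e_{1r} \\ \vdots \\ e_{t1} \\ \vdots \\ e_{tr} \end{bmatrix}
$$
In this matrix layout the observations are ordered by variety (first subscript, 1 … t) and then by block (second subscript, 1 … r).
$\mathbf{G} = \sigma_g^2 \mathbf{I}_t$
$\mathbf{R} = \sigma^2 \mathbf{I}_{rt}$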
LINEAR MIXED MODEL
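$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{g} + \mathbf{e}$
$E\begin{bmatrix} \mathbf{g} \\ \mathbf{e} \end{bmatrix} = \begin{bmatrix} \mathbf{0} \\ \mathbf{0} \end{bmatrix}$, $\mathrm{Var}\begin{bmatrix} \mathbf{g} \\ \mathbf{e} \end{bmatrix} = \begin{bmatrix} \mathbf{G} & \mathbf{0} \\ \mathbf{0} & \mathbf{R} \end{bmatrix}$
$$
\mathbf{G} = \mathrm{Var}\begin{bmatrix} g_1 \\ g_2 \\ \vdots \\ g_t \end{bmatrix}
= \begin{bmatrix} \sigma_g^2 & & & 0 \\ & \sigma_g^2 & & \\ & & \ddots & \\ 0 & & & \sigma_g^2 \end{bmatrix}
= \sigma_g^2 \begin{bmatrix} 1 & & & 0 \\ & 1 & & \\ & & \ddots & \\ 0 & & & 1 \end{bmatrix}
= \sigma_g^2 \mathbf{I}_t
$$
$\mathbf{R} = \mathrm{Var}(\mathbf{e}) = \sigma^2 \mathbf{I}_{rt}$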
LINEAR MIXED MODEL
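$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{g} + \mathbf{e}$
$E\begin{bmatrix} \mathbf{g} \\ \mathbf{e} \end{bmatrix} = \begin{bmatrix} \mathbf{0} \\ \mathbf{0} \end{bmatrix}$, $\mathrm{Var}\begin{bmatrix} \mathbf{g} \\ \mathbf{e} \end{bmatrix} = \begin{bmatrix} \mathbf{G} & \mathbf{0} \\ \mathbf{0} & \mathbf{R} \end{bmatrix}$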
Assumptions
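hence, $E(\mathbf{y}) = \mathbf{X}\boldsymbol{\beta}$
$\mathrm{Var}(\mathbf{y}) = \mathbf{V} = \mathbf{V}(\boldsymbol{\theta}) = \mathbf{Z}\mathbf{G}\mathbf{Z}' + \mathbf{R}$
$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\hat{\mathbf{V}}^{-1}\mathbf{X})^{-1}\mathbf{X}'\hat{\mathbf{V}}^{-1}\mathbf{y}$
• $V(\hat{\boldsymbol{\beta}}) = (\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}$
• $V(\mathbf{L}\hat{\boldsymbol{\beta}}) = \mathbf{L}(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{L}'$
• $\mathbf{L}\hat{\boldsymbol{\beta}}$ is the best linear unbiased estimate of $\mathbf{L}\boldsymbol{\beta}$
• Test of H0: $\mathbf{L}\boldsymbol{\beta} = \mathbf{0}$
$\hat{\boldsymbol{\beta}}'\mathbf{L}'[\mathbf{L}(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{L}']^{-1}\mathbf{L}\hat{\boldsymbol{\beta}} / r(\mathbf{L}) \sim F$ (approx) with df1 = r(L) and df2
(Satterthwaite or Kenward-Roger)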
Predictions
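$\hat{\mathbf{P}} = \mathbf{L}'\hat{\boldsymbol{\beta}} + \mathbf{M}'\hat{\mathbf{g}}$
$\mathrm{Var}(\hat{\mathbf{P}}) = \mathbf{L}'\mathbf{C}_{xx}\mathbf{L} + \mathbf{M}'(\hat{\mathbf{G}} - \mathbf{C}_{zz})\mathbf{M}$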
PROPERTIES OF EBLUP (optional)
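SE(BLUP): standard error of a random effect
$\mathrm{SD}(\hat{g}_i) = \sqrt{\mathrm{PEV}} = \sqrt{c^{ii}\sigma_e^2}$
PEV: prediction error variance
$r^2(\hat{g}_i) = 1 - \mathrm{PEV}/\sigma_g^2 = 1 - c^{ii}\sigma_e^2/\sigma_g^2$
r: accuracy
$r(\hat{g}_i) = \sqrt{r^2(\hat{g}_i)} = \sqrt{1 - \mathrm{PEV}/\sigma_g^2}$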
TESTING VAR. COMPONENTS
LRT: likelihood ratio test
Hypothesis    P-value
Two-sided     $\mathrm{Prob}(\chi^2_{r_2 - r_1} > d)$
One-sided     $0.5\,[1 - \mathrm{Prob}(\chi^2_1 \le d)]$
TESTING VAR. COMPONENTS
Critical values
Δdf = r2 - r1    α = 0.05                  α = 0.01
                 Two-sided   One-sided     Two-sided   One-sided
1                3.84        2.71          6.63        5.41
2                5.99        4.61          9.21        7.82
3                7.81        6.25          11.34       9.84
4                9.49        7.78          13.28       11.67
5                11.07       9.24          15.09       13.39
Goodness-of-fit statistics
• AIC and BIC can be used to select/rank non-nested models
AIC = – 2×logL + 2×t
BIC = – 2×logL + t×log(v)
where t is the number of variance parameters and v is the residual degrees of freedom.
Testing Variety
H0: σ2g = 0 against H1: σ2g > 0
d = 2 [51.737 – 44.878] = 13.72 , Δdf = 1
One-sided critical value (α = 0.05): χ2 = 2.71, p-value < 0.001
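The variety test above can be checked numerically. The following is a minimal Python sketch, not part of ASReml: the function names are hypothetical, and the chi-square tail probability is only implemented for 1 degree of freedom (via the complementary error function), which is the case used on this slide.

```python
import math


def lrt_statistic(logl_full, logl_reduced):
    """Likelihood ratio statistic d = 2 (logL_full - logL_reduced)."""
    return 2.0 * (logl_full - logl_reduced)


def chi2_upper_tail_1df(d):
    """Prob(chi-square with 1 df > d), using P(Z^2 > d) = erfc(sqrt(d / 2))."""
    return math.erfc(math.sqrt(d / 2.0))


def lrt_one_sided_pvalue(d):
    """One-sided p-value for a single variance component: 0.5 * Prob(chi2_1 > d)."""
    return 0.5 * chi2_upper_tail_1df(d)


# LogL values taken from the "Testing Variety" slide.
d = lrt_statistic(51.737, 44.878)
p_value = lrt_one_sided_pvalue(d)
print(f"d = {d:.3f}, one-sided p-value = {p_value:.6f}")
```

The same helper reproduces the tabulated critical values: a statistic of 3.84 gives a two-sided tail of about 0.05, and 2.71 gives a one-sided p-value of about 0.05.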
Session 3
Job Structure in
ASReml
JOB FILE
[Job title]
[Data definition]
[Specification of file(s) to read]
[Options]
[Linear model: factors and variables]
[Linear model: variance structure(s)]
[Additional output]
[Restrictions on variance components]
Note: some options can be indicated in this file or they can be added in the
batch command line.
JOB FILE
Some manipulations/transformations
!FILTER f names the variable f used to filter the records (see !SELECT)
!SELECT v selects observations where the !FILTER variable f equals v
!=v creates/overwrites a variable with all values equal to v
!+o adds the number o to the variable
!-o subtracts the number o from the variable
!*o multiplies the variable by the number o
!/o divides the variable by the number o
!^p raises the variable to the power p
!^0 calculates the natural logarithm of the variable
!D v discards records with missing values or with value v
!M v converts values equal to v to missing values
!REPLACE o n replaces data values o with n
READING / MANIPULATING DATA
Examples
Yield !*100 variable yield is multiplied by 100 as it is read
Yield !M-9 observations with -9 are changed to missing
Yield !^0 calculates the natural log of variable yield
Ymean !=0 !+Y1 !+Y2 !/2 mean of two variables
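The transformation examples above are applied left to right as each record is read. The Python sketch below mirrors that reading for two of the examples; it illustrates the arithmetic described on the slide only, the function names are hypothetical, and it does not reproduce how ASReml handles the data internally.

```python
import math


def ymean(y1, y2):
    """Mimics 'Ymean !=0 !+Y1 !+Y2 !/2', one step at a time."""
    value = 0.0      # !=0   start from zero
    value += y1      # !+Y1  add the first variable
    value += y2      # !+Y2  add the second variable
    value /= 2.0     # !/2   divide by two
    return value


def recode_missing(value, missing_code=-9):
    """Mimics 'Yield !M-9': values equal to the code become missing (None)."""
    return None if value == missing_code else value


def natural_log(value):
    """Mimics 'Yield !^0': natural logarithm of the variable."""
    return math.log(value)


print(ymean(2.0, 4.0))
print(recode_missing(-9), recode_missing(2.17))
```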
Relevant Options
!SUMMARY provides a histogram, correlations, counts, etc. (see file .ass)
!OUTLIER performs additional outlier checks (see files .res and .yht)
!X x !Y y produces a scatter plot for variables x and y
!SORT re-orders labels in alphabetical order
!MVINCLUDE missing values in a factor or variate are treated as zeros
!WORKSPACE m assigns m Mbytes of memory for model fitting
!EXTRA n forces n additional iterations after the model converges
!MAXIT m sets a maximum of m iterations
!DOPART $A indicates which of the different parts will be run
!PART n marks a specific model n within a job file (a job may list several parts)
!CONTINUE restarts model fitting from the last iteration
GRAPHICAL OUTPUT
Relevant Options
!DISPLAY n selects type(s) of diagnostic plot
!NODISPLAY suppresses diagnostic plot output
!PS saves plots in ps format
!EPS saves plots in eps format
!PNG saves plots in png format
!WMF saves plots in wmf format
!BMP saves plots in bmp format
Coding !DISPLAY n (codes are added, e.g. !DISPLAY 7 = 1 + 2 + 4)
1 = variogram
2 = histogram
4 = row and column trends
8 = perspective plot of residuals
Univariate case
y ~ <fixed dense> !r <random sparse> !f <fixed sparse>
mu the constant term or intercept (overall mean)
!r random effects to follow
!f sparse fixed effects to follow (not in ANOVA table)
mv term to estimate missing values (as fixed effects)
Examples
yield ~ mu Variety !r Block
Volume ~ mu Site Site.Block !r Mother Mother.Site !f mv
JOB FILE
Specification of Linear Models
• ASReml uses the Wilkinson and Rogers (1973) notation.
• Note that the model term A.B denotes interaction or nested effects
depending on which other terms are previously included in the model.
Examples
Volume ~ mu Site !r Genotype Site.Genotype
Volume ~ mu Site !r Site.Genotype
Yield ~ mu A.B !r Block
JOB FILE
Model functions
(to be used after a specified column, or to create new model variables).
and(t) overlays a design matrix for a model term onto an existing one
at(f,n) creates a binary variable for the condition specified in a factor
fac(v) forms a factor with the values of a continuous variable
lin(f) transforms the factor f into a covariate
uni(v) creates a factor with a level for every record in the data file
fav(v,y) forms a factor with the levels of a combination of 2 factors
ide(f) fits an additional factor without its genetic relationship matrix
inv(v) calculates the inverse of variable v
Example
Volume ~ mu Block !r Mother 0.25 !GF Plot 0.4 !GU
RUNNING ASReml (BATCH MODE)
Some options
Examples
>asreml -rs3c Alfalfa 1
RUNNING ASReml (JOB FILE MODE)
Some options
Example
GENETIC VARIATION
Discrete variation
• Different phenotypic classes are easily distinguished among genotypes
• Few genes with large effect (i.e. major genes).
xi ~ Bin(n, p)
Quantitative variation
• No clear classes between genotypes. Corresponds to most economically
important traits in animal and plant breeding.
• Due to the effect of many genes that contribute to the phenotypic
variation. Every gene with a small additive effect, plus some
environmental variation (infinitesimal model, Fisher 1918).
p=μ+g+e
• Phenotypic value (p) deviates from the mean (μ) because of the genotypic
component (g) and the environmental deviation (e).
• To isolate g we need to test the progeny!!!
g=a+d+i
p=μ+a+d+i+e
a is the additive component, i.e. cumulative effect of the genes or breeding
value (also known as GCA).
d is the dominance deviation, i.e. interaction between alleles or within-locus
interaction (also known as SCA).
i is the epistatic deviation, i.e. between-loci interaction and higher order
interactions.
e is the random deviation or residual.
VARIANCE COMPONENTS
Vp = Vg + Ve
Vp = Va + Vna + Ve
• In the statistical analysis (MM) the genetic variance estimates (e.g. Va) are
obtained by relating them to the causal component (e.g. σa2)
HERITABILITY
Broad sense heritability or degree of genetic determination
Estimation
• By BLUP (Best Linear Unbiased Predictor), i.e. the prediction of the
random effects from linear mixed models.
BLUP (or EBLUP)
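$\hat{\mathbf{g}} = \hat{\mathbf{G}}\mathbf{Z}'\hat{\mathbf{V}}^{-1}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})$
$\hat{\mathbf{g}}$ vector of random effect predictions.
$\hat{\mathbf{G}}\mathbf{Z}' = \mathbf{C}'$ covariance matrix between observations and random
(genetic) effects to be predicted.
$\hat{\mathbf{V}}$ variance-covariance matrix for the observations.
$(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})$ individual observations ‘corrected’ by fixed effects.
$\hat{g}_i = [\sigma_a^2 / \sigma_p^2]\,(y_i - \bar{y})$
$\hat{g}_i = h^2 (y_i - \bar{y})$ = Gain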
Note: the expression changes depending on what trait is being evaluated (y).
SELECTION
• All kinds of selection aim to increase the frequency of favourable
alleles at loci influencing the selected trait(s)
• Types: mass, parental, family, combined, indirect, forward, backward.
[Figure: breeding cycle with the base population, selected population and propagation population; labels: increase genetic gain, increase diversity]
SELECTION DIFFERENTIAL (S)
Example
Assuming normal distribution, truncated selection and h2 = 0.4
[Figure: trait distribution with marked values 25 cm, 35 cm and 29 cm]
• In mass selection, genetic gain can be quantified as the difference between the
average breeding (e.g. additive) values of the selected and original
populations, i.e.
$\Delta G_a = \bar{a}_S - \bar{a}_P = h^2 S$
But $i = S / \sigma_p$, then
$\Delta G_a = h^2 S = i\,h^2 \sigma_p$
• Genetic gain depends on the selection intensity (i), heritability (h2) and the
phenotypic standard deviation.
• Here i corresponds to the selection differential
(S = μselected – μpopulation) expressed in units of phenotypic standard deviations.
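The gain expressions can be evaluated with the numbers from the example slide. This Python sketch assumes that 25 cm is the mean of the original population, 35 cm the mean of the selected individuals, and 29 cm the expected progeny mean; that reading of the figure is an assumption, and the function names and the phenotypic standard deviation of 5 cm are hypothetical.

```python
def selection_differential(mean_selected, mean_population):
    """S = mean of selected individuals minus mean of the original population."""
    return mean_selected - mean_population


def genetic_gain(h2, s):
    """Gain from mass selection as h2 * S."""
    return h2 * s


def genetic_gain_from_intensity(i, h2, sigma_p):
    """Equivalent form: gain = i * h2 * sigma_p, with i = S / sigma_p."""
    return i * h2 * sigma_p


# Assumed reading of the example: population mean 25 cm, selected mean 35 cm.
s = selection_differential(35.0, 25.0)
gain = genetic_gain(0.4, s)
expected_progeny_mean = 25.0 + gain

# Hypothetical phenotypic standard deviation, used only to show both forms agree.
sigma_p = 5.0
gain_via_intensity = genetic_gain_from_intensity(s / sigma_p, 0.4, sigma_p)

print(s, gain, expected_progeny_mean, gain_via_intensity)
```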
TYPE-A CORRELATIONS
Definition: Correlation between traits (pleiotropy)
$r_{gA(p)} = \dfrac{Cov(p_1, p_2)}{\sqrt{Var(p_1)\,Var(p_2)}}$    $r_{gA(g)} = \dfrac{Cov(g_1, g_2)}{\sqrt{Var(g_1)\,Var(g_2)}}$
Indirect Selection
$\Delta G_{a1} = i_2\, h_1\, h_2\, r_{gA(a)}\, \sigma_{p1}$
TYPE-B CORRELATIONS
Definition: Correlation between sites
• Is a relative expression of genotype-by-environment interaction.
• It could be zero or positive (0 to 1).
• A value close to 0 indicates that the rank in one environment is very
different than the rank in another environment (i.e. low stability)
• A value close to 1 indicates that a single ranking can be used across all
environments without loss of information (i.e. high stability).
• Vaxs is the variance estimation of the site by genotype interaction.
• The following expressions represent the average correlation between sites
(if more than 2 sites are analyzed).
$r_{gB(a)} = \dfrac{V_a}{V_a + V_{axs}}$    $r_{gB(g)} = \dfrac{V_g}{V_g + V_{gxs}}$
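As a small illustration of the Type-B expression, the Python sketch below computes the ratio of the genetic variance to the sum of the genetic and genotype-by-site variances. The function name and the numerical inputs are hypothetical; they are not results from any dataset in this workshop.

```python
def type_b_correlation(var_genetic, var_genetic_by_site):
    """Type-B correlation: V / (V + V_xs), for additive (Va) or total (Vg) effects."""
    return var_genetic / (var_genetic + var_genetic_by_site)


# Hypothetical variance components, for illustration only.
r_stable = type_b_correlation(1.0, 0.25)    # little interaction, ranking is stable
r_unstable = type_b_correlation(1.0, 4.0)   # strong interaction, ranking changes by site

print(r_stable, r_unstable)
```

With no interaction variance the ratio equals 1, which matches the statement that a single ranking can then be used across all environments.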
Session 5
Genetic Analyses:
Parental Models
GENETIC MODELS
Parental Models
• Half-sib crosses / sire model.
– One parent known. Parent selection.
• Full-sib crosses model.
– Both parents known. Parent/cross selection. Add and Dom effects estimable.
• Family model.
– Both parents known. Cross selection. Add and Dom effects confounded.
• Clonal model.
– Clonally replicated individuals. Parent/cross/individual selection.
Individual Models
• Animal model.
– One or two parents known. Individual/parent selection.
• Reduced animal model.
– One or two parents known. Individual/parent selection (only individuals with
records).
HALF-SIB / SIRE MODEL
General aspects
• One parent is known (mother, sire, variety).
• The other parent is assumed to be unknown and to mate at random.
• Only additive component (Va) can be estimated.
• Useful for selection of parents (backward selection).
• Parental pedigree can (and should) be incorporated.
• Runs faster than other models (e.g. animal model).
Difficulties
• Concern about situations under non-random mating.
• Selection does not capture non-additive genetic variability.
HALF-SIB / SIRE MODEL
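$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}_1\mathbf{b} + \mathbf{Z}_2\mathbf{s} + \mathbf{e}$
y vector of observations
β vector of fixed effects
b vector of random design effects (e.g. block or plot effect), ~ $N(\mathbf{0}, \mathbf{I}\sigma_b^2)$
s vector of random sire effects (i.e. ½ breeding value), ~ $N(\mathbf{0}, \mathbf{A}\sigma_s^2)$
e vector of random residual effects, ~ $N(\mathbf{0}, \mathbf{I}\sigma^2)$
$V_a = 4\sigma_s^2$    $V_p = \sigma_s^2 + \sigma^2$
$h^2 = V_a / V_p = 4\sigma_s^2 / [\sigma_s^2 + \sigma^2]$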
OPEN POLLINATION
Example: /Day1/OpenPol/OPENPOL.txt
In a tree genetic study, seeds from a total of 28 female parents were
collected after mass selection and tested in an RCBD together with 3 control female
parents. The experiment consisted of 10 replicates with 34 plots each, of size 2 x 3.
The response variables of interest are total height (HT, cm) and diameter at breast
height (DBH, cm). For now we will concentrate on the response HT. The objective is
to rank the female parents for future selections and seed production. In this analysis
parental pedigree will be ignored. Note that a model can be fitted with and without
the controls included as parents.
ID REP PLOT FEMALE TYPE DBH HT
1 1 1 FEM1 Test 23.8 12.4
2 1 1 FEM1 Test 24.4 12.1
3 1 1 FEM1 Test 25.4 10.9
4 1 1 FEM1 Test 28.0 12.7
5 1 1 FEM1 Test 20.9 11.9
6 1 1 FEM1 Test 22.6 11.2
7 1 2 FEM15 Test 22.4 10.7
8 1 2 FEM15 Test 21.9 11.6
9 1 2 FEM15 Test 20.8 11.3
...
OPEN POLLINATION
Example: /Day1/OpenPol/OpenPol_.as
!RENAME !ARGS 1
Open pollination trial
ID
REP 10 !I
PLOT 34 !I
FEMALE 31 !A !SORT
TYPE 2 !A !SORT
DBH
HT
OPENPOL.TXT !SKIP 1
!PART 1
HT ~ mu REP !r FEMALE REP.PLOT
predict FEMALE
OPEN POLLINATION
Interpreting variance components
Source Model terms Gamma Component Comp/SE % C
FEMALE 31 31 0.192379 0.196155 3.48 0 P
REP.PLOT 340 340 0.518915E-01 0.529102E-01 2.58 0 P
Variance 1876 1866 1.00000 1.01963 27.74 0 P
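The components reported above can be plugged into the half-sib expressions Va = 4 σ2s and h2 = Va / Vp. The Python sketch below does this in two ways: with Vp = σ2s + σ2 as written on the model slide, and with the plot variance added to Vp. Including the plot variance is an assumption made here for illustration, and the function name is hypothetical.

```python
def half_sib_heritability(var_parent, var_residual, var_other=0.0):
    """Narrow-sense h2 from a half-sib model: 4 * var_parent / Vp.

    var_other collects any additional components (e.g. plot) that one
    chooses to include in the phenotypic variance; default is none.
    """
    var_additive = 4.0 * var_parent
    var_phenotypic = var_parent + var_other + var_residual
    return var_additive / var_phenotypic


# Components from the output table: FEMALE, REP.PLOT and residual Variance.
var_female = 0.196155
var_plot = 0.0529102
var_residual = 1.01963

h2_without_plot = half_sib_heritability(var_female, var_residual)
h2_with_plot = half_sib_heritability(var_female, var_residual, var_other=var_plot)

print(f"h2 (Vp = s2s + s2)        = {h2_without_plot:.4f}")
print(f"h2 (Vp = s2s + s2plot + s2) = {h2_with_plot:.4f}")
```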
Difficulties
• Dominance effects usually estimated with low precision, or confounded with
other effects.
• Better results obtained with a proper planning of crosses (e.g. connected
diallels).
• Need to check connectivity and number of crosses per parent (male and
female) otherwise this model cannot be fitted.
FULL-SIB: CLASSIC APPROACH
!PART 1
YIELD ~ mu REP !r FEMALE MALE FEMALE.MALE
FULL-SIB
Interpreting variance components
Source Model terms Gamma Component Comp/SE % C
FEMALE 54 54 0.775232E-01 295.857 2.32 0 P
MALE 57 57 0.826941E-01 315.590 2.28 0 P
FAMILY 182 182 0.251902 961.350 6.07 0 P
Variance 3879 3854 1.00000 3816.36 42.73 0 P
Extract solutions for every parent and family and rank!!! (.sln file)
FAMILY MODEL (Optional)
General aspects
• More common in animal breeding
• Occurs when parents are only present in a single cross.
• Parents might, or might not, be known.
• Additive and dominance components (Va and Vd) cannot be separated, unless
there is a well connected parental pedigree.
• Useful for family selection or forward selection.
• Of practical use when dominance variance is known to be negligible.
Difficulties
• Dominance effects are confounded with additive effects.
• Potentially it could over-estimate future genetic gain.
FAMILY MODEL (Optional)
$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}_1\mathbf{b} + \mathbf{Z}_2\mathbf{F} + \mathbf{e}$
β vector of fixed effects (e.g. μ, replication)
b vector of random design effects (e.g. block or plot effect), ~ N(0, Iσ2b)
F vector of random family effects, ~ N(0, Aσ2F) or N(0, Iσ2F)
e vector of random residual effects, ~ N(0, Iσ2)
!MAXIT 40 !DOPART $A
!PART 1
Weight ~ mu !r Family
FAMILY MODEL (Optional)
Interpreting variance components
Source Model terms Gamma Component Comp/SE % C
Family 32 32 0.768666E-01 8.12079 1.98 0 P
Variance 459 458 1.00000 105.648 14.71 0 P
General aspects
• It can estimate total genetic variability (Vg).
• If both parents are known (mother, father, family or cross) then the additive,
dominance and epistasis components (Va, Vd and Vi) can be reasonably
estimated.
• Useful for selection of parents (backward selection), crosses or specific
genotypes.
• Allows capturing, in new generations, additive, dominance and epistasis
effects.
Difficulties
• Presents same difficulties as full-sib models.
• Some confounding of the epistasis component occurs (higher order terms).
• Occasionally produces negative causal variance components.
CLONAL MODEL (Optional)
!PART 1
VOL ~ mu REP !r REP.IBLOCK FEMALE MALE FAMILY CLONE !f mv
!PART 2
VOL ~ mu REP !r REP.IBLOCK FEMALE and(MALE) FAMILY CLONE !f mv
CLONAL MODEL (Optional)
Interpreting variance components
INCORPORATING PEDIGREE
PEDIGREE
Numerator relationship matrix (A)
A (upper triangle shown; the matrix is symmetric):
      1      2      3      4      5      6
1   1.00   0.00   0.50   0.50   0.50   0.25
2          1.00   0.50   0.00   0.25   0.625
3                 1.00   0.25   0.625  0.563
4                        1.00   0.625  0.313
5                               1.125  0.688
6                                      1.125
• Linked to the concept of identity by descent.
• Diagonal aii = 1 + Fi (Fi: inbreeding coefficient of individual i)
Twice the probability that two gametes taken at random from animal i will
carry identical alleles by descent.
• Off-diagonal aij numerator of the coefficient of relationship between animal
i and j.
• Several algorithms are available in ASReml to obtain this matrix.
PEDIGREE
CALCULATING THE A MATRIX
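$a_{i,i} = 1 + a_{s,d}/2 = 1 + F_i$, where s and d are the sire and dam of individual i
In ASReml (pedigree file, 0 = unknown parent)
Indiv Male Female
1 0 0
2 0 0
3 1 2
4 1 0
5 4 3
6 5 2
Graphically
[Figure: pedigree diagram of individuals 1 to 6, with an unknown parent marked "?"]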
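The numerator relationship matrix shown earlier can be reproduced from this six-individual pedigree with the standard tabular method. The Python sketch below is one simple way to do it and is not the algorithm ASReml uses internally; it assumes individuals are numbered 1 to n with parents listed before their offspring, and that 0 denotes an unknown parent. The function name is hypothetical.

```python
def numerator_relationship_matrix(pedigree):
    """Tabular method for the A matrix.

    pedigree: list of (indiv, sire, dam) with ids 1..n in order,
    parents appearing before offspring, and 0 for an unknown parent.
    Returns A as a list of lists (0-based indexing).
    """
    n = len(pedigree)
    # Row and column 0 stand for the unknown parent and stay at zero.
    a = [[0.0] * (n + 1) for _ in range(n + 1)]
    for indiv, sire, dam in pedigree:
        for other in range(1, indiv):
            # Relationship with an older individual: average over the two parents.
            value = 0.5 * (a[other][sire] + a[other][dam])
            a[indiv][other] = value
            a[other][indiv] = value
        # Diagonal: 1 + F_i, with F_i equal to half the parents' relationship.
        a[indiv][indiv] = 1.0 + 0.5 * a[sire][dam]
    return [row[1:] for row in a[1:]]


pedigree = [(1, 0, 0), (2, 0, 0), (3, 1, 2), (4, 1, 0), (5, 4, 3), (6, 5, 2)]
A = numerator_relationship_matrix(pedigree)
for row in A:
    print("  ".join(f"{value:6.4f}" for value in row))
```

Individuals 5 and 6 have diagonal elements of 1.125, i.e. an inbreeding coefficient of 0.125, because their parents are related.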
General aspects
• Requires defining individual and parental pedigree.
• A breeding value (or GCA) is obtained for each individual in the dataset,
and for all individuals (e.g. parents) in pedigree file.
• Typically used to estimate the additive component (Va) only, but it can be
extended to non-additive and maternal effects.
• Useful for selection of individuals based on additive values (forward
selection) but can also be used to select parents.
• GCA values (or EBV) of parents will be proportional to those from a parental model.
Difficulties
• For large datasets it can be computationally costly.
• Pedigree file could be difficult to construct/maintain and it needs to be
checked carefully.
ANIMAL / INDIVIDUAL MODEL
$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}_1\mathbf{b} + \mathbf{Z}_2\mathbf{a} + \mathbf{e}$
β vector of fixed effects
b vector of random design effects (e.g. block effect), ~ N(0, Iσ2b)
a vector of random additive effects (i.e. BV), ~ N(0, Aσ2a)
e vector of random residual effects, ~ N(0, Iσ2)
Va = σ2a
Vp = σ2a + σ2
h2 = Va / Vp = σ2a / [σ2a+ σ2]
Note: any individual included in the pedigree file will have a
prediction of its breeding value (even those that are not measured).
ANIMAL / INDIVIDUAL MODEL
Example: /Day1/Fish/FISH.txt
The dataset for a fish breeding program contains a total of 933 records of fish.
The objective is to fit an animal model that considers the complete pedigree. The
parental pedigree is found in the file PEDPAR.txt, but an individual pedigree
needs to be constructed. For fitting the model consider the factor SEX as a
covariate. The response of interest is days to market size (DAYSM).
!PART 1
DAYSM ~ mu SEX !r INDIV
ANIMAL / INDIVIDUAL MODEL
Source Model terms Gamma Component Comp/SE % C
INDIV 1380 1380 0.584596 2046.39 4.52 0 P
Variance 933 931 1.00000 3500.52 10.21 0 P
Wald F statistics
Source of Variation NumDF DenDF_con F-inc F-con M P-con
7 mu 1 77.6 15677.14 15677.14 . <.001
5 SEX 1 888.2 21.88 21.88 A <.001
---- ---- ---- ---- ---- ---- 1 ---- ---- ---- ---- ---- ----
Predicted values of DAYSM
The SIMPLE averaging set: SEX
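For the fitted animal model, heritability follows from h2 = σ2a / [σ2a + σ2], using either the Component column or the Gamma column (the ratio of each component to the residual variance). The Python sketch below shows both routes with the values from the output table; the function names are hypothetical.

```python
def h2_from_components(var_additive, var_residual):
    """h2 = var_additive / (var_additive + var_residual)."""
    return var_additive / (var_additive + var_residual)


def h2_from_gamma(gamma):
    """Same quantity from gamma = var_additive / var_residual."""
    return gamma / (1.0 + gamma)


# Values from the INDIV and Variance rows of the output.
h2_components = h2_from_components(2046.39, 3500.52)
h2_gamma = h2_from_gamma(0.584596)

print(f"h2 from components = {h2_components:.4f}")
print(f"h2 from gamma      = {h2_gamma:.4f}")
```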
Va = σ2a
Vp = σ2a + σ2ce + σ2
h2 = Va / Vp = σ2a / [σ2a+ σ2ce + σ2]
VARIANCE STRUCTURES
Direct Product
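• Variance structures are specified by using direct products of two or more
matrices (⊗, the Kronecker product).
$$
\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\qquad
\mathbf{A} \otimes \mathbf{B} = \begin{bmatrix} a_{11}\mathbf{B} & a_{12}\mathbf{B} \\ a_{21}\mathbf{B} & a_{22}\mathbf{B} \end{bmatrix}
$$
Example
$$
\mathbf{A} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\qquad
\mathbf{B} = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix}
$$
$$
\mathbf{A} \otimes \mathbf{B} = \begin{bmatrix}
\sigma_1^2 & \sigma_{12} & 0 & 0 & 0 & 0 \\
\sigma_{12} & \sigma_2^2 & 0 & 0 & 0 & 0 \\
0 & 0 & \sigma_1^2 & \sigma_{12} & 0 & 0 \\
0 & 0 & \sigma_{12} & \sigma_2^2 & 0 & 0 \\
0 & 0 & 0 & 0 & \sigma_1^2 & \sigma_{12} \\
0 & 0 & 0 & 0 & \sigma_{12} & \sigma_2^2
\end{bmatrix}
$$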
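The direct product in this example can be built explicitly. The following Python sketch implements the Kronecker product for matrices stored as lists of lists; the function name and the numerical values of B are hypothetical, chosen only to make the block pattern visible.

```python
def kronecker(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    rows_b, cols_b = len(b), len(b[0])
    n_rows = len(a) * rows_b
    n_cols = len(a[0]) * cols_b
    # Element (i, j) of the result is a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b].
    return [
        [a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b] for j in range(n_cols)]
        for i in range(n_rows)
    ]


identity_3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Hypothetical 2 x 2 variance-covariance matrix (variances 2.0 and 3.0, covariance 0.5).
b_matrix = [[2.0, 0.5], [0.5, 3.0]]

product = kronecker(identity_3, b_matrix)
for row in product:
    print(row)
```

With an identity matrix on the left, the result repeats B along the diagonal, which is the pattern shown in the example.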
VARIANCE STRUCTURES
Direct Sum
• The desired matrix is specified by several square matrices in a block
diagonal matrix.
Example
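$$
\mathbf{R} = \oplus_{j=1}^{3} \mathbf{R}_j = \mathrm{diag}(\mathbf{A}_1, \mathbf{A}_2, \mathbf{A}_3) = \begin{bmatrix} \mathbf{A}_1 & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{A}_2 & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{A}_3 \end{bmatrix}
$$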
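A direct sum simply places square matrices along the diagonal of a larger matrix of zeros. The Python sketch below shows this for three small blocks; the function name and the block values are hypothetical.

```python
def direct_sum(*blocks):
    """Block-diagonal matrix built from square matrices (lists of lists)."""
    size = sum(len(block) for block in blocks)
    result = [[0.0] * size for _ in range(size)]
    offset = 0
    for block in blocks:
        for i, row in enumerate(block):
            for j, value in enumerate(row):
                result[offset + i][offset + j] = value
        offset += len(block)
    return result


# Hypothetical blocks, e.g. one residual structure per site.
a1 = [[1.0]]
a2 = [[2.0, 0.5], [0.5, 2.0]]
a3 = [[3.0]]

r_matrix = direct_sum(a1, a2, a3)
for row in r_matrix:
    print(row)
```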
ALFALFA EXPERIMENT
Example: /Day2/VarStruct/AlfalfaS_.as
An experiment was established to compare 12 alfalfa varieties (labeled A-L).
These come from 3 different sources, but the objective is to estimate the
heritability of varieties regardless of their source. A total of 6 plots per variety
were established, arranged in an RCB design. The response variable
is yield (tons/acre) at harvest time. It is of interest to fit a linear
model with a specific error variance for each of the different sources.
Alfalfa experiment - 12 varieties - Response Yield
Source 3 !I
Variety 12 !A !SORT
Block 6 !I
yield
ALFALFAS.TXT !SKIP 1 !DISPLAY 7 !SUMMARY
Wald F statistics
Source of Variation NumDF DenDF F-inc P-inc
5 mu 1 9.9 990.35 <.001
3 Block 5 25.0 24.05 <.001
Correlation/Spatial structures
CORB Banded correlation w-1
AR1 First order autoregressive 1
AR2 Second order autoregressive 2
ARMA Autoregressive and moving average 2
CORG General correlation (homogeneous) w(w - 1)/2
ANTE1 Antedependence of order 1 w(w - 1)/2
LVR Linear variance 1
VARIANCE STRUCTURES
Correlation-variance structures (homogeneous)
AR1V First order autoregressive (homog.) 2
CORUV Uniform correlation (homogeneous) 2
CORBV Banded correlation (homogeneous) w
CORGV General correlation (homogeneous) w(w - 1)/2 + 1
Heterogeneous structures
IDH = DIAG Identity (heterogeneous) w
AR1H First order autoregressive (heterog.) 1+w
CORUH Uniform correlation (heterogeneous) 1+w
CORBH Banded correlation (heterogeneous) 2w - 1
CORGH = US general correlation (heterogeneous) w(w - 1)/2 + w
Special structures
IEXP Isotropic Exponential 1
AEXP Anisotropic Exponential 2
OWNk User supplied G matrix k
GIVk User supplied General (Inverse) matrix 0 or 1
VARIANCE STRUCTURES
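ID: identity
$$
\mathbf{I}_4 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\qquad
\sigma^2\mathbf{I}_4 = \begin{bmatrix} \sigma^2 & 0 & 0 & 0 \\ 0 & \sigma^2 & 0 & 0 \\ 0 & 0 & \sigma^2 & 0 \\ 0 & 0 & 0 & \sigma^2 \end{bmatrix}
$$
AR1V: autocorrelation 1st order
$$
\sigma^2 \begin{bmatrix} 1 & \rho & \rho^2 & \rho^3 \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho^3 & \rho^2 & \rho & 1 \end{bmatrix}
$$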
<sections>
• Number of residual (Rj) structures to define.
$\mathbf{R} = \oplus_{j=1}^{s} \mathbf{R}_j$
However, it is also possible to define each error structure with a direct product:
$\mathbf{R}_j = \mathbf{R}_{j1} \otimes \mathbf{R}_{j2}$
VARIANCE STRUCTURES
Variance Header Line
<dimensions>
• Number of direct product of variance structures that are required to define
each of the residual, Rj, structures.
<number of G structures>
• Number of random effects (Gi, or any interaction) that are defined with
structures different than identically and independently distributed.
Note: each of these components will have to be defined in greater detail later.
VAR. STRS. - EXAMPLES
<sections>
• Number of residual structures to define.
3 1 0
1280 0 ID
1320 0 ID
2300 0 ID
3: acts as a counter (here, 3 sites)
1: only a single structure on each of the residual structures
0: no G structures defined
1280: number of observations in site 1 (sorted by site)
0: sortkey (sorting variable not specified here)
ID: VCODE corresponding to independent errors.
1320: number of observations in site 2 (sorted by site)
2300: number of observations in site 3 (sorted by site)
1 2 0
16 row AR1
20 col AR1
site.genotype 2
site 0 CORGH 0.25 0.25 0.25 1.22 1.46 2.05
genotype 0 AINV
Note: the command !f mv keeps the missing observations and is useful for
counting observations over multiple R structures
VARIANCE STRUCTURES
3 1 1
1280 0 ID
1320 0 ID
2300 0 ID
site.genotype 2
site 0 CORGH 0.25 0.25 0.25 1.22 1.46 2.05
genotype 0 AINV
Example
Volume ~ mu Block !r Mother 0.25 !GF Plot 0.4 !GU
VARIANCE STRUCTURES
• The order of starting values for variance and correlation matrices is important
Variance Matrices
1                         1   2   4   7
2   3                         3   5   8
4   5   6         or              6   9
7   8   9   10                        10
Correlation Matrices
7   1   2   3             7
    8   4   5             1   8
        9   6     or      2   4   9
            10            3   5   6   10
Note: for most complex variance structures it is critical to specify starting values.
CONSTRAINTS IN VAR-COV COMP.
Next to model terms
!GP positive variance component
!GU unrestricted variance component (default)
!GF fixed variance component
Volume ~ mu Block !r Mother 0.25 Plot 0.4 !GF
!PART 3
yield ~ mu Block !r Variety
1 2 0
3 Source DIAG 0.8 0.8 0.8 !=ABA
24 0 ID !S2==1
!PART 4
!VCC 1
yield ~ mu Block !r Variety
3 1 0
24 0 ID
24 0 ID
24 0 ID
4 6
FUNCTIONS OF VAR. COMPS.
• Post-analysis procedure to calculate functions of variance components
(e.g. heritability or genetic correlations).
• Based on approximations using the delta method (i.e. Taylor series approx.)
• It should not be used for statistical inference, only as a rough reference.
pvc file
1 Variety 0.276798E-01
2 Variance 0.476526E-01
3 Vg 1 0.27680E-01 0.15265E-01
4 Vtotal 1 0.75332E-01 0.16972E-01
Herit = Vg 1 3/Vtotal 4= 0.3674 0.1397
Notice: The parameter estimates are followed by
their approximate standard errors.
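The heritability in the pvc output is the ratio of the Variety component to the sum of both components, and its standard error comes from the delta method. The Python sketch below reproduces the ratio and gives the first-order delta-method formula for a ratio x/y. The covariance between numerator and denominator is not printed on the slide, so the standard error is only demonstrated on hypothetical inputs; the function names are also hypothetical.

```python
import math


def ratio(x, y):
    """Simple ratio x / y, e.g. Vg / Vtotal."""
    return x / y


def delta_method_se_ratio(x, y, var_x, var_y, cov_xy):
    """First-order (Taylor series) standard error of x / y.

    Var(x/y) ~ (x/y)^2 * [var_x/x^2 + var_y/y^2 - 2*cov_xy/(x*y)]
    """
    r = x / y
    variance = r * r * (var_x / x**2 + var_y / y**2 - 2.0 * cov_xy / (x * y))
    return math.sqrt(variance)


# Components from the pvc file: Variety and residual Variance.
vg = 0.0276798
vtotal = vg + 0.0476526
herit = ratio(vg, vtotal)
print(f"Herit = {herit:.4f}")

# Hypothetical inputs only, to show how the standard error is evaluated.
se_example = delta_method_se_ratio(1.0, 2.0, var_x=0.04, var_y=0.09, cov_xy=0.0)
print(f"SE (hypothetical inputs) = {se_example:.4f}")
```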
Session 8
Multivariate Analysis /
Repeated Measures
MULTIVARIATE ANALYSIS
General Uses
!PART 2
HT DBH ~ Trait Trait.REP !r Trait.FEMALE Trait.REP.PLOT
1 2 2
0 0 ID
Trait 0 US 1.01 1.82 7.25
Trait.FEMALE 2
Trait 0 US 0.19 0.31 0.61
FEMALE 0 ID
Trait.REP.PLOT 3
Trait 0 US 0.05 0.001 0.001 !GUFF
REP 0 ID
PLOT 0 ID
OPEN POLLINATION (bivariate)
Interpreting analysis
Source Model terms Gamma Component Comp/SE % C
Residual UnStructured 1 1 1.00196 1.00196 29.17 0 U
Residual UnStructured 2 1 1.83449 1.83449 23.69 0 U
Residual UnStructured 2 2 7.43730 7.43730 29.20 0 U
Trait.FEMALE UnStructured 1 1 0.191142 0.191142 3.44 0 U
Trait.FEMALE UnStructured 2 1 0.310167 0.310167 3.16 0 U
Trait.FEMALE UnStructured 2 2 0.705031 0.705031 3.39 0 U
Trait.REP.PLOT DIAGonal 1 0.790064E-01 0.790064E-01 5.19 0 U
Trait.REP.PLOT DIAGonal 2 -0.201829 -0.201829 -2.84 0 U
Covariance/Variance/Correlation Matrix UnStructured Residual
1.002 0.6720
1.834 7.437
Covariance/Variance/Correlation Matrix UnStructured Trait.FEMALE
0.1911 0.8449
0.3102 0.7050
Wald F statistics
Source of Variation NumDF DenDF F-inc P-inc
8 Trait 2 29.2 9584.21 <.001
9 Trait.REP 18 643.1 4.82 <.001
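In the Covariance/Variance/Correlation matrices above, the diagonal holds variances, the lower triangle covariances and the upper triangle correlations. The Python sketch below recomputes the two correlations from the unstructured components reported in the output; the function name is hypothetical.

```python
import math


def correlation(cov_xy, var_x, var_y):
    """Correlation from a covariance and the two variances."""
    return cov_xy / math.sqrt(var_x * var_y)


# Residual components (trait 1 variance, covariance, trait 2 variance).
r_residual = correlation(1.83449, 1.00196, 7.43730)
# Trait.FEMALE components in the same order.
r_female = correlation(0.310167, 0.191142, 0.705031)

print(f"Residual correlation     = {r_residual:.4f}")
print(f"Trait.FEMALE correlation = {r_female:.4f}")
```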
MULTIVARIATE ANALYSIS
Strategy for fitting models in ASReml
Extensions
• Consider different sites (or years) as different traits (e.g. helps to classify
sites).
• Variance-covariance matrices can be used to ‘study’ genetic structure
(e.g. evaluating / separating genetic groups).
REPEATED MEASURES
• Very similar to multivariate analysis but every measurement point (time) is
considered as a different trait.
• Requires modelling of the mean effects (patterns) and variance structures.
• Additional modelling of fixed effects of time points is possible (e.g.
polynomials or splines).
• Convergence conflicts are still present, but to a lesser extent.
• Two modelling approaches:
- Multiple vectors: parallel vectors with, typically, US error structure.
- Single vector: stacked responses with, typically, AR1V correlations.
!PART 1
HT1 HT2 HT3 HT4 ~ Trait Trait.REP !r Trait.FEMALE
1 2 1
0 0 ID
Trait 0 US 419
556 1405
698 1846 3801
821 2306 4624 7154
Trait.FEMALE 2
Tr 0 US 36
48 74
38 70 117
61 126 223 410
FEMALE 0 ID
REPEATED MEASURES: AS MV
Interpreting analysis
Covariance/Variance/Correlation Matrix UnStructured Residual
419.7 0.7241 0.5527 0.4744
556.1 1405. 0.7989 0.7275
698.1 1847. 3801. 0.8868
822.0 2307. 4625. 7155.
Covariance/Variance/Correlation Matrix UnStructured Trait.FEMALE
35.60 0.9375 0.5843 0.5068
48.08 73.88 0.7499 0.7245
37.78 69.86 117.5 1.019
61.25 126.2 223.8 410.4
REPEATED MEASURES: AS UNIV
Example: /Day2/RepMeas/REPCOLS.txt
IDD Indiv Female Rep Time HT
1 1 F09 1 1 62
2 1 F09 1 2 108
3 1 F09 1 3 240
4 1 F09 1 4 411.5
5 2 F02 1 1 66
6 2 F02 1 2 154
7 2 F02 1 3 275
8 2 F02 1 4 442
9 3 F21 1 1 65
10 3 F21 1 2 116
11 3 F21 1 3 245
12 3 F21 1 4 323.1
13 4 F25 1 1 68
14 4 F25 1 2 102
15 4 F25 1 3 225
16 4 F25 1 4 350.5
17 5 F13 1 1 58
18 5 F13 1 2 170
19 5 F13 1 3 325
20 5 F13 1 4 457.2
...
REPEATED MEASURES
Example: /Day2/RepMeas/RepCols_.as
!RENAME !ARGS 1
Repeated Measures Analysis of HT - 4 meas
IDD
INDIV
FEMALE 26 !A
REP 4 !I
TIME 4 !I
HT
REPCOLS.txt !MAXIT 40 !SKIP 1 !DISPLAY 2 !DOPART $A
!PART 1
!FILTER TIME !SELECT 1
HT ~ mu REP !r FEMALE
!PART 2
log(HT) ~ mu lin(TIME) TIME.REP !r,
!{ FEMALE lin(TIME).FEMALE !} !f mv
1 2 1
824 0 ID !S2==1
TIME 0 AR1H 0.8 0.05 0.05 0.05 0.05
FEMALE 2
2 0 CORUH -0.8 0.004 0.0001
FEMALE
REPEATED MEASURES
Interpreting analysis
Source Model terms Gamma Component Comp/SE % C
Residual AR=AutoR 4 0.798949 0.798949 82.75 0 U
Residual AR=AutoR 4 0.641964E-01 0.641964E-01 19.31 0 U
Residual AR=AutoR 4 0.464063E-01 0.464063E-01 19.32 0 U
Residual AR=AutoR 4 0.365361E-01 0.365361E-01 20.19 0 U
Residual AR=AutoR 4 0.310505E-01 0.310505E-01 20.57 0 U
FEMALE CORRelat 2 -0.807724 -0.807724 -6.53 0 U
FEMALE CORRelat 2 0.362337E-02 0.362337E-02 1.87 0 U
FEMALE CORRelat 2 0.262804E-03 0.262804E-03 2.06 0 U
Covariance/Variance/Correlation Matrix CORRelation FEMALE
0.3625E-02 -0.8078
-0.7884E-03 0.2629E-03
Wald F statistics
Source of Variation NumDF DenDF F-inc P-inc
8 mu 1 23.2 0.37E+06 <.001
9 lin(TIME) 1 24.0 17256.68 <.001
10 TIME.REP 14 2096.0 256.63 <.001
Session 9
Multi-environment
Analysis
MET ANALYSIS
General Uses
• Incorporates information from several experiments (over different sites or
years) to obtain overall BVs.
• Allows estimation of Genotype-by-Environment (or Genotype-by-Year)
effects and their variance structure. Hence, it separates genetic effects into
their pure component and their interaction with site (or year).
• Provides unbiased estimates of heritability and Type-B correlations.
• Critical for understanding the genetic structure of the population and for
defining breeding strategies.
Difficulties
• Every site (or year) has its own ‘personality’ (i.e. error structure, design
effects, etc.) that needs to be combined into a single analysis.
• The amount of data can be large, which causes difficulties in fitting and convergence.
• Requires additional prior checks (e.g. EDA, coding, etc.).
MET ANALYSIS
In ASReml
$r_{gB(a)} = \dfrac{V_a}{V_a + V_{a \times s}}$
$r_{gB(g)} = \dfrac{V_g}{V_g + V_{g \times s}}$
MET ANALYSIS
Option 1: Simple GxE structure
• Aims at modelling a common GxE correlation.
• Common structures are: DIAG, CORUH.
• Correlation corresponds to an average value across all sites.
• It is simpler to fit, easy to converge.
• It does not allow for a better understanding of the GxE.
• Provides average genetic values across all sites, together with GxE
deviations for each site.
• Useful for generating ranking across all sites.
• Allows for simplification of GxE term.
!PART 2
HT ~ mu Test Test.REP !r,
at(Test,1).REP.IBlock at(Test,2).REP.IBlock,
at(Test,3).REP.IBlock at(Test,4).REP.IBlock,
Genotype Test.Genotype !f mv
4 1 0
4480 0 ID
4480 0 ID
4608 0 ID
4400 0 ID
MET ANALYSIS
Interpreting variance components
Source Model terms Gamma Component Comp/SE % C
Residual 17968 16537
Genotype 100 100 301.167 301.167 4.60 0 P
Test.Genotype 400 400 158.584 158.584 6.74 0 P
at(Test,1).REP.IBloc 4400 4400 1159.04 1159.04 9.75 0 P
at(Test,2).REP.IBloc 4400 4400 1960.32 1960.32 10.84 0 P
at(Test,3).REP.IBloc 4400 4400 815.989 815.989 9.18 0 P
at(Test,4).REP.IBloc 4400 4400 206.324 206.324 4.77 0 P
Variance 0 0 4390.59 4390.59 44.30 0 P
Variance 0 0 3871.67 3871.67 43.39 0 P
Variance 0 0 4130.69 4130.69 42.40 0 P
Variance 0 0 3812.02 3812.02 42.26 0 P
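As a worked example, the Type-B genetic correlation for this simple GxE model can be computed from the Genotype and Test.Genotype components reported above. This is a minimal sketch; the variable names are illustrative, and the numbers are copied from the Component column.

```python
# Variance components copied from the "Component" column above.
var_genotype = 301.167        # Genotype (main genetic effect across sites)
var_gxe = 158.584             # Test.Genotype (genotype-by-site interaction)

# Type-B genetic correlation: share of genetic variance that is stable across sites.
r_type_b = var_genotype / (var_genotype + var_gxe)
print(f"Type-B correlation: {r_type_b:.3f}")
```

A value close to 1 would indicate little genotype-by-site interaction; lower values indicate that rankings change across sites.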
Test.Genotype 2
Test 0 US 520.7
392.2 563.6
256.7 376.6 392.1
384.1 268.8 200.0 356.8
Genotype 0 ID
MET ANALYSIS
Interpreting variance components
Source Model terms Gamma Component Comp/SE % C
Residual 17968 16537
at(Test,1).REP.IBloc 440 440 1161.07 1161.07 9.76 0 P
at(Test,2).REP.IBloc 440 440 1961.80 1961.80 10.84 0 P
at(Test,3).REP.IBloc 440 440 816.001 816.001 9.18 0 P
at(Test,4).REP.IBloc 440 440 207.978 207.978 4.79 0 P
Variance[ 1] 4480 0 4388.87 4388.87 44.30 0 P
Variance[ 2] 4480 0 3871.39 3871.39 43.38 0 P
Variance[ 3] 4608 0 4131.87 4131.87 42.38 0 P
Variance[ 4] 4400 0 3811.58 3811.58 42.26 0 P
Test.Genotype UnStructured 1 1 520.722 520.722 4.86 0 U
Test.Genotype UnStructured 2 1 392.218 392.218 4.21 0 U
Test.Genotype UnStructured 2 2 563.561 563.561 4.94 0 U
Test.Genotype UnStructured 3 1 256.719 256.719 3.43 0 U
Test.Genotype UnStructured 3 2 376.619 376.619 4.44 0 U
Test.Genotype UnStructured 3 3 392.056 392.056 4.65 0 U
Test.Genotype UnStructured 4 1 304.148 304.148 4.04 0 U
Test.Genotype UnStructured 4 2 268.839 268.839 3.59 0 U
Test.Genotype UnStructured 4 3 200.202 200.202 3.20 0 U
Test.Genotype UnStructured 4 4 356.775 356.775 4.66 0 U
Covariance/Variance/Correlation Matrix UnStructured Test.Genotype
520.7 0.7240 0.5682 0.7056
392.2 563.6 0.8012 0.5995
256.7 376.6 392.1 0.5353
304.1 268.8 200.2 356.8
MET ANALYSIS
BLUP values: Variant 1
Effect Level BLUP SE(BLUP)
MET ANALYSIS
FA model: FAk
$\Sigma = DCD$
$D$ is a diagonal matrix such that $DD = \mathrm{diag}(\Sigma)$
$C$ is a correlation matrix of the form $FF' + E$
$F$ is a matrix of loadings on the correlation scale
$E$ is a diagonal matrix defined by difference (remnant).
FA model: FACVk
$\Sigma = \Gamma\Gamma' + \Psi$
$\Gamma$ is a matrix of loadings on the covariance scale, with $\Gamma = DF$
$\Psi$ is a diagonal matrix, with $\Psi = DED$
MET ANALYSIS
Example Variant 2: /MultiEnv/GxE_.as
!PART 4
HT ~ mu Test Test.REP !r,
at(Test,1).REP.IBlock at(Test,2).REP.IBlock,
at(Test,3).REP.IBlock at(Test,4).REP.IBlock,
Test.Genotype !f mv
4 1 1
4480 0 ID
4480 0 ID
4608 0 ID
4400 0 ID
Test.Genotype 2
Test 0 FA1
0.8 0.9 0.1 0.2 # 1st factor
520.7 563.6 392.1 356.8 # Site Variances
Genotype 0 ID
MET ANALYSIS
Interpreting variance components
Source Model terms Gamma Component Comp/SE % C
Residual 17968 16537
at(Test,1).REP.IBloc 440 440 1159.40 1159.40 9.75 0 P
at(Test,2).REP.IBloc 440 440 1961.62 1961.62 10.84 0 P
at(Test,3).REP.IBloc 440 440 815.999 815.999 9.18 0 P
at(Test,4).REP.IBloc 440 440 207.516 207.516 4.79 0 P
Variance[ 1] 4480 0 4389.44 4389.44 44.29 0 P
Variance[ 2] 4480 0 3871.43 3871.43 43.38 0 P
Variance[ 3] 4608 0 4131.95 4131.95 42.38 0 P
Variance[ 4] 4400 0 3811.38 3811.38 42.26 0 P
Test.Genotype FA D(LL'+E)D 1 1 0.787009 0.787009 10.71 0 U
Test.Genotype FA D(LL'+E)D 1 2 0.931814 0.931814 17.28 0 U
Test.Genotype FA D(LL'+E)D 1 3 0.818246 0.818246 11.66 0 U
Test.Genotype FA D(LL'+E)D 1 4 0.695414 0.695414 7.58 0 U
Test.Genotype FA D(LL'+E)D 0 1 519.153 519.153 4.83 0 U
Test.Genotype FA D(LL'+E)D 0 2 563.923 563.923 4.94 0 U
Test.Genotype FA D(LL'+E)D 0 3 391.055 391.055 4.63 0 U
Test.Genotype FA D(LL'+E)D 0 4 359.863 359.863 4.63 0 U
Covariance/Variance/Correlation Matrix FA D(LL'+E)D Test.Genotype
519.1 0.7333 0.6440 0.5472
396.8 563.9 0.7625 0.6480
290.2 358.1 391.1 0.5690
236.5 291.9 213.4 359.9
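The matrix above can be rebuilt from the eight FA1 parameters in the variance-component table, which helps to see how the model works: with a single factor, each between-site correlation is the product of the two site loadings. The sketch below assumes the first four Test.Genotype rows are the loadings (correlation scale) and the last four are the site variances, as the comments in the job file suggest.

```python
import math

# Estimates copied from the variance-component table above.
loadings = [0.787009, 0.931814, 0.818246, 0.695414]    # F, correlation scale
site_var = [519.153, 563.923, 391.055, 359.863]        # diag(Sigma) = DD

n = len(loadings)
# C = FF' + E: off-diagonals are products of loadings; E fills the diagonal up to 1.
corr = [[1.0 if i == j else loadings[i] * loadings[j] for j in range(n)]
        for i in range(n)]
# Sigma = D C D, with D = diag(sqrt(site variances)).
sd = [math.sqrt(v) for v in site_var]
sigma = [[sd[i] * corr[i][j] * sd[j] for j in range(n)] for i in range(n)]

for i in range(n):
    print("  ".join(f"{sigma[i][j]:8.1f}" for j in range(n)))
```

Compared with the US structure (10 parameters for 4 sites), FA1 uses 8, and the saving grows quickly as the number of sites increases.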
MET ANALYSIS
Two-Stage Analyses
1st Stage
• Every site is analysed individually with its own characteristics.
• Genotype effects are assumed fixed.
• Means and SEMs are obtained for each site.
2nd Stage
• All means (and SEMs) are combined into a single file.
• The use of !TWOSTAGEWEIGHTS generates weights (and covariance) for
each prediction and combines the analyses into a single run.
Session 10
Spatial
Analysis
SPATIAL ANALYSIS
General Uses
• It corresponds to an extension to the single vector repeated measures analysis.
• Incorporates information from physical positions (x and y coordinates).
• Effect: improves estimates (BLUPs) and allows for better control of errors.
Hence, it can increase heritability and genetic gains.
• More efficient analysis (under presence of correlation) as it ‘borrows’
information from neighbours.
• ASReml can handle regular or irregular grids.
• Can be used for unreplicated trials!
Difficulties
• At present it is more of an ‘art’ that requires evaluating several options.
• Requires the knowledge of the position of each individual experimental unit
(e.g. plant or plot).
• Additional variance components need to be estimated (i.e. convergence
problems).
SPATIAL ANALYSIS
• Gradients or Trends
Linear trends
Polynomial functions, e.g. $f(x_c, y_c) = \mu + \beta_1 x_c + \beta_2 y_c + \beta_3 x_c^2 y_c + \beta_4 x_c y_c^2$
Row or Column effects (random).
• Patches
Incomplete Blocks
Spatial Error Structures, e.g. AR1 ⊗ AR1 + η
$\mathrm{Var}(e_{ij}) = \sigma_s^2 + \sigma_{ms}^2$
$\mathrm{Cov}(e_{ij}, e_{i'j'}) = \sigma_s^2\,\rho_x^{h_x}\,\rho_y^{h_y}$
SPATIAL ANALYSIS
$h^2 = \dfrac{4\sigma_g^2}{\sigma_g^2 + \left(\rho_x^{|d_x|}\,\rho_y^{|d_y|}\right)\sigma_e^2 + \sigma_0^2}$
$h^2_{PEV} = 1 - \dfrac{\mathrm{mean}\{PEV(\hat{g})\}}{\sigma_g^2}$
SPATIAL ANALYSIS
Comparing spatial models
• Use LRT when models are nested and have the same fixed effect terms.
• Compare AIC (Akaike Information Criterion) and BIC (Bayesian
Information Criterion) to select among non-nested models (but with the same
fixed effect terms).
• Use $h^2_{PEV}$ to compare among different models.
• Calculate one of the proposed R2 expressions for mixed models.
!PART 1
YA ~ mu REP !r REP.ROW REP.COL FEMALE REP.PLOT !f mv
1 2 0
16 0 ID
16 0 ID
!PART 2
YA ~ mu REP fac(Y) fac(X) !r REP.ROW REP.COL FEMALE REP.PLOT !f mv
1 2 0
16 X AR1 0.3
16 Y AR1 0.3
SPATIAL TRIAL
Interpreting variograms
SPATIAL TRIAL
Traditional Analysis
LogL=-55.4337 S2= 0.40323 252 df
Spatial Analysis
LogL=-61.9450 S2= 0.41594 224 df
Wald F statistics
Source of Variation NumDF DenDF F-inc P-inc
11 mu 1 13.8 6143.27 <.001
2 REP 3 5.6 4.27 0.062
12 fac(Y) 14 14.8 3.33 0.014
13 fac(X) 14 43.1 1.60 0.119
SPATIAL ANALYSIS
BLUP values
Traditional Spatial
Female BLUP SE(BLUP) BLUP SE(BLUP)
1 -0.215 0.197 -0.277 0.189
2 0.204 0.197 0.191 0.190
3 -0.154 0.197 -0.129 0.188
4 -0.099 0.197 -0.207 0.189
Heritabilities
Traditional Spatial
Va 0.421 0.465
Vp 0.790 0.634
mean(PEV) 0.039 0.036
h2 0.532 0.733
h2pev 0.631 0.693
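The two heritabilities in this table can be approximately reproduced from the rows above them. The sketch below assumes that Va is four times the FEMALE variance component (so the family variance is Va / 4), that h2 = Va / Vp, and that h2pev = 1 - mean(PEV) / family variance. Because the inputs are rounded to three decimals, the results match the table only to about two decimals.

```python
# Values copied from the heritability table above (traditional, spatial).
models = {
    "traditional": {"Va": 0.421, "Vp": 0.790, "mean_pev": 0.039},
    "spatial":     {"Va": 0.465, "Vp": 0.634, "mean_pev": 0.036},
}

results = {}
for name, m in models.items():
    var_female = m["Va"] / 4.0                 # assumption: Va = 4 * female variance
    h2 = m["Va"] / m["Vp"]                     # narrow-sense heritability
    h2_pev = 1.0 - m["mean_pev"] / var_female  # PEV-based heritability
    results[name] = (h2, h2_pev)
    print(f"{name:12s} h2 = {h2:.3f}   h2_pev = {h2_pev:.3f}")
```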
UNREPLICATED TRIALS (UR)
11 C2 24 112 23 69 C1 96 22 6 34 C1
85 101 48 C1 28 7 89 60 C2 108 74 56
47 C1 10 43 C2 16 52 5 38 33 C2 93
65 111 64 100 81 104 C2 78 C1 113 21 106
12 C2 44 68 42 C1 97 17 32 73 C1 35
25 C1 27 C2 15 88 29 4 53 C2 55 75
102 84 1 49 C1 61 70 C2 18 95 37 C1
46 86 C2 63 2 51 79 39 59 92 C2 57
66 13 C1 82 41 98 C2 90 C1 77 20 36
C1 45 83 87 C2 62 3 30 72 54 105 76
26 C2 9 14 50 8 40 C1 31 19 C2 C1
110 103 67 C1 99 80 C2 71 91 58 109 94
UNREPLICATED TRIALS (UR)
Example: /Day2/UnRep/PEPPER.TXT
An unreplicated pepper trial was established to evaluate a total of 824 genotypes
planted in single plots and arranged as an RCBD with 4 blocks. In addition, a total of
10 control genotypes were planted with 20 replications each (i.e. 5 replications per
block). All these individuals were arranged in a 32x32 grid, and the response variable
yield, YD, was obtained. It is of interest to rank all the unreplicated genotypes.
!DOPART $A !MAXIT 50
!PART 1
YD ~ mu !r Rep Gens !f mv
1 2 0
32 0 ID
32 0 ID
!PART 2
YD ~ mu !r Rep Gens
1 2 0
32 X AR1 0.5
32 Y AR1 0.5
UNREPLICATED TRIALS (UR)
Traditional Analysis
LogL=-478.184 S2= 0.74805 1023 df
Spatial Analysis
LogL=-468.587 S2= 0.77062 1023 df
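Since both runs have the same fixed effects and the same residual degrees of freedom, the two log-likelihoods can be compared with a likelihood ratio test. The sketch below assumes the models are nested and that the spatial model adds exactly two parameters (the X and Y autocorrelations); it uses the closed form of the chi-square tail for 2 df, so no statistics library is needed.

```python
import math

# Log-likelihoods copied from the output above (same fixed effects, same df).
logl_traditional = -478.184   # independent errors (ID x ID)
logl_spatial = -468.587       # AR1 x AR1 errors

# The spatial model adds two parameters: the row and column autocorrelations.
extra_params = 2
lrt = 2.0 * (logl_spatial - logl_traditional)

# For a chi-square with 2 df the upper-tail probability has the closed form exp(-x/2).
p_value = math.exp(-lrt / 2.0)
print(f"LRT = {lrt:.3f} on {extra_params} df, p = {p_value:.2e}")
```

Under these assumptions the spatial error structure is a clear improvement over the independent-error model for this trial.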
GLMM
General Uses
• It corresponds to an extension of the linear mixed models to situations with a
distribution other than the Normal, typically, Binomial and Poisson.
• It needs the specification of the distribution, together with a link function that
connects the response to the explanatory variables of the linear model.
• For generalized linear models, estimation of parameters is based on maximum likelihood
estimation (MLE), and therefore it can run into problems.
• For generalized linear mixed models, estimation of parameters is based on an
approximation to the MLE.
• Testing is done using an LRT, mainly by comparing the mean deviance.
Difficulties
• Interpretation and calculation of genetic parameters are more difficult, as we
are on a different scale.
• Convergence problems are common, and with unbalanced data it is common
to have biologically inconsistent estimates.
BINOMIAL RESPONSES
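General expression: $g(\mu) = X\beta + Zg$
Link (logit): $\log_e\!\left(\dfrac{p}{1-p}\right) = X\beta + Zg$
Back-transformed model: $p = \dfrac{\mu}{n_i} = \dfrac{\exp(X\beta + Zg)}{1 + \exp(X\beta + Zg)}$
Variance expression: $\mathrm{Var}(p) = \phi\,\dfrac{p(1-p)}{n_i}$
$\phi$: over-/under-dispersion parameter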
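The link and its back-transformation can be illustrated with a few lines of code. This is a minimal sketch of the logit link for a binomial response; the function names and the value of the linear predictor are hypothetical, and the dispersion parameter defaults to 1.

```python
import math

def logit(p):
    """Link function: maps a probability in (0, 1) to the linear-predictor scale."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Back-transformation: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def binomial_variance(p, n, phi=1.0):
    """Variance of an observed proportion based on n trials, with dispersion phi."""
    return phi * p * (1.0 - p) / n

eta = 0.85                      # hypothetical value of X*beta + Z*g
p = inv_logit(eta)
print(f"p = {p:.3f}, logit(p) = {logit(p):.3f}, Var(p) = {binomial_variance(p, n=1):.3f}")
```

Predictions and BLUPs from a binomial model are on the logit scale, so a back-transformation of this kind is needed before reporting them as probabilities.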
General expression: $g(\mu) = X\beta + Zg$
Alternatives
• Perform a transformation of the original data, and then back-transform
predictions.
• Assume a normal distribution (by the CLT), whenever values are relatively
large.
• Collapse data into a higher stratum (e.g. PLOT).
GLMM MODEL
Heritability in GLMM (Binomial)
• Calculation is not direct and it requires an approximation.
• Several alternatives are available in the literature
Logit approach
$h^2_{logit} = \dfrac{4\sigma_s^2}{\sigma_s^2 + \sigma_e^2}$ with $\sigma_e^2 = \pi^2/3$
Distributional approach
$h^2_{Bin} = \dfrac{4\sigma_s^2}{\sigma_s^2 + p(1-p)}$
BINOMIAL MODEL
Example: /Day2/GLMM/SALMONAB.TXT
A salmon breeding program evaluated a total of 933 records of fish originating
from 124 families. The objective is to select individuals that will constitute the
parents for the next generation. The response variables are MARKETA and
MARKETB, which are binary responses that indicate whether a given individual qualifies
for a given market category. The linear model to fit should consider the full
pedigree and the factor SEX as a fixed effect.
INDIV Sire Dam DaysM Sex MarketA MarketB
1001 564 727 741.46 1 1 1
1002 564 727 500.09 2 1 1
1003 564 727 495.07 1 1 1
1004 564 727 506.25 2 0 0
1005 564 727 593.21 2 1 1
1006 564 727 671.1 1 1 1
1007 564 727 523.48 1 1 1
1008 564 727 531.33 1 1 1
1009 564 727 446.02 2 1 0
1010 564 727 599.2 1 1 1
1011 564 727 509.38 2 1 1
1012 564 727 643.45 2 1 1
1013 607 707 711.68 1 1 1
...
BINOMIAL MODEL
Example: /Day2/GLMM/GLMFish_.as
!RENAME !ARGS 1
Breeding Program Salmon
INDIV 2040 !P !SORT
SIRE 115 !I
DAM 124 !I
DAYSM
SEX 2 !I
MARKETA
MARKETB
PEDIND.TXT !SKIP 1 !MAKE
SALMONAB.TXT !SKIP 1
!PART 1
MARKETA !BIN !AOD ~ mu SEX !r INDIV
predict INDIV
!PART 2
MARKETA ~ mu SEX !r INDIV
BINOMIAL MODEL
Interpreting output
Analysis of Deviance Table for MARKETA
Source of Variation df Deviance Derived F
SEX 1 9.20 15.964
Deviance from GLM fit 931 536.33
Variance heterogeneity factor [Deviance/DF] 0.58
Notice: The Derived F is calculated assuming 931 degrees of freedom
which will usually be a false assumption under a mixed model.
The Analysis of Variance below is of the 'working' variable.
Wald F statistics
Source of Variation NumDF DenDF_con F-inc F-con M P-con
8 mu 1 162.4 276.33 134.54 . <.001
5 SEX 1 931.0 4.36 4.36 A 0.038
GLMM MODEL
Heritability
$h^2_{logit} = \dfrac{4\sigma_s^2}{\sigma_s^2 + \sigma_e^2} = \dfrac{0.575}{0.575/4 + 1 \times 3.290} = 0.168$
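The calculation can be repeated step by step. The sketch below assumes that 0.575 is the additive variance from the animal model, so that the equivalent sire variance is 0.575 / 4, and that the residual variance on the logit scale is π²/3. With the rounded component it gives a value close to 0.167, which agrees with the slide's 0.168 up to rounding of the inputs.

```python
import math

var_additive = 0.575                 # additive variance from the animal model (slide value)
var_sire = var_additive / 4.0        # equivalent sire variance, since 4 * sigma2_s = sigma2_a
var_error = math.pi ** 2 / 3.0       # residual variance implied by the logit link (about 3.290)

h2_logit = 4.0 * var_sire / (var_sire + var_error)
print(f"sigma2_e = {var_error:.3f}, h2_logit = {h2_logit:.4f}")
```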
Predictions
Predicted values of MARKETA
The SIMPLE averaging set: SEX
RATIONALE
• Genetic improvement aims to select the best individuals for the production
and breeding populations. However, traditional breeding is a long and
expensive process, with many traits difficult to measure.
• More than 20 years ago, molecular markers promised to aid breeders in
selection through Marker Assisted Selection (MAS). Performing MAS required
QTL or association-genetics analyses.
• Construct prediction models using the current breeding population phenotype
Breeding Value (BV) + Molecular Markers
Prediction model construction: $BV = 1\mu + \sum_{j=1}^{p} W_j m_j + e$
[Figure: Supplementary Figure 1. Histograms of (a) the diagonal and (b) the off-diagonal elements of the raw estimates of the genetic relationship matrix, (c) the diagonal and (d) the off-diagonal ... (adjusted estimates); panel (e): density of Z-scores.]
GENOMIC SELECTION
• Future individuals are genotyped, and their genotypes are used as input to the
prediction models to select superior genotypes in the next cycles
Genotypes Generation i Molecular Markers
Deployment
BENEFITS OF GS
Note
• To apply GS successfully the constructed models need to accurately predict
the genetic performance.
GENOMIC SELECTION
• The level of linkage disequilibrium (LD) between the markers and the QTL
(effective population size and genotyping density).
• If the markers are capturing all genetic variation, then we can assume that: $a = Wm$
• If we also assume: $V(m) = I\sigma_m^2$
• Then we get: $V(a) = WW'\sigma_m^2$
which is a covariance matrix for the individual breeding values $a$.
FROM MARKERS TO GA
• Ideally, we want to model this covariance using the same classical Linear
Mixed Model framework; therefore, it would be desirable to have this
matrix in terms of $\sigma_a^2$:
$\sigma_a^2 = 2\sum_{i=1}^{ALL\_SNPs} p_i q_i\,\sigma_m^2 \quad\Rightarrow\quad \sigma_m^2 = \dfrac{\sigma_a^2}{2\sum_{i=1}^{ALL\_SNPs} p_i q_i}$
$V(a) = \dfrac{WW'}{2\sum_i p_i q_i}\,\sigma_a^2 = G_A\,\sigma_a^2$, by replacing $\sigma_m^2$.
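The construction of G_A from marker data can be sketched in a few lines. The example below is hypothetical: the marker matrix is made up, the 0/1/2 coding and the centring of W by twice the allele frequency are assumptions (the slides do not define W explicitly), and allele frequencies are estimated from the same three individuals.

```python
# Hypothetical marker matrix: 3 individuals (rows) x 4 SNPs (columns),
# coded as the number of copies (0, 1, 2) of the reference allele.
M = [
    [0, 1, 2, 1],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
]
n_ind, n_snp = len(M), len(M[0])

# Allele frequency of each SNP: p_i = mean genotype / 2, and q_i = 1 - p_i.
p = [sum(M[k][i] for k in range(n_ind)) / (2.0 * n_ind) for i in range(n_snp)]
scale = 2.0 * sum(pi * (1.0 - pi) for pi in p)    # 2 * sum(p_i * q_i)

# W: marker matrix centred by twice the allele frequency (assumed centring).
W = [[M[k][i] - 2.0 * p[i] for i in range(n_snp)] for k in range(n_ind)]

# G_A = W W' / (2 * sum(p_i q_i))
G = [[sum(W[r][i] * W[c][i] for i in range(n_snp)) / scale for c in range(n_ind)]
     for r in range(n_ind)]

for row in G:
    print("  ".join(f"{g:7.3f}" for g in row))
```

With frequencies estimated from the genotyped individuals themselves, every row of this G_A sums to zero, so the matrix is singular. This is one reason why a marker-based matrix often needs adjusting before it can be inverted.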
ANIMAL MODEL GBLUP
y Xβ Z1b Z2a e
β vector of fixed effects
b vector of random design effects (e.g. block effect), ~ N(0, Iσ2b)
a vector of random additive effects (i.e. BV), ~ N(0, GAσ2a)
e vector of random residual effects, ~ N(0, Iσ2)
Note:
• The variance-covariance matrix (GA) of the additive effects is now
derived from molecular markers, and it replaces the old A matrix.
GBLUP
• Genomic BLUP (GBLUP) is a Genomic Selection method that uses the
same framework as BLUP analysis, but replaces:
– The numerator relationship matrix (A) derived from the pedigree by,
– The realized relationship matrix (GA) derived from molecular
markers.
• GA is also known as observed relationship matrix or genomic matrix.
Problem:
• GA matrix is usually not positive definite
Solution:
• Bending the matrix (e.g. diag(GA) + 0.00001).
• Blending the matrix (e.g. GA* = 0.99 GA + 0.01 A).
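Both adjustments are simple element-wise operations. The sketch below applies them to two small hypothetical matrices; the values of G_A and A are made up for illustration, and the constants 0.00001, 0.99 and 0.01 are the ones given as examples above.

```python
# Hypothetical 3 x 3 matrices: a marker-based G_A and a pedigree-based A.
# Rows and columns must refer to the same individuals, in the same order.
G = [[1.02, 0.48, 0.01],
     [0.48, 0.97, 0.03],
     [0.01, 0.03, 1.05]]
A = [[1.00, 0.50, 0.00],
     [0.50, 1.00, 0.00],
     [0.00, 0.00, 1.00]]
n = len(G)

# Bending: add a small constant to the diagonal of G_A.
G_bent = [[G[i][j] + (0.00001 if i == j else 0.0) for j in range(n)] for i in range(n)]

# Blending: weighted average of G_A and A, G* = 0.99 G_A + 0.01 A.
G_blend = [[0.99 * G[i][j] + 0.01 * A[i][j] for j in range(n)] for i in range(n)]

print(G_bent[0][0], G_blend[0][1])
```

Whether the adjusted matrix is in fact positive definite still needs to be checked (e.g. through its eigenvalues) before it is inverted and supplied to ASReml.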
GBLUP
• There are several different algorithms to compute the GA matrix from SNP
data:
• Hayes and Goddard (2008)
• VanRaden (2008) – 2 methods
• Yang et al. (2010) – Human genetics
• The relationship matrix (GA) is computed using a given algorithm from other
software (R, Fortran, etc.) based on molecular markers, and then supplied to
ASReml.
Options
!SKIP [n]
!DENSEGRM, !DENSEGIV
!SAVEGIV [f] default dense format, use f = 1 for sparse format
Warning
• The number and order of levels have to match perfectly the ones used for
the associated factor, e.g. animalID, read in the data.
GBLUP in ASReml
How to associate the G matrix with the genetic factor?
Warning: The number and order of levels have to match the ones used for
the associated factor read in the data.
GBLUP in ASReml
Example: /GBLUP/
An experiment evaluated a total of 10 individuals originating from
full-sib families of 4 sires and 4 dams. The objective is to fit a parental model
(i.e. select sires) that considers the molecular pedigree information.
DATA.txt PEDSIRE.txt
Sire 1
Sire 0 GIV1 200
predict Sire
GBLUP in ASReml
Predictions for ‘new’ individuals
10 20 30 40 50 60
1.023 0.012 -0.036 0.364 0.083 0.176
0.012 0.992 0.226 0.023 0.023 0.508
!RENAME !ARGS 4
Evaluating GBLUP
INDIV 10 !I
Sire 4 !P #!I
Dam 3 !I
Resp
DUMMYPED.txt !MAKE !SKIP 1
GMATRIX6.grm !SKIP 1
DATA.txt !SKIP 1 !DISPLAY 7 !DOPART $A