Analysis of Experiments using ASReml:

with emphasis on breeding trials ©

Salvador A. Gezan
[email protected]

Patricio R. Muñoz
[email protected]

October, 2014
CONTENTS

Session
1 Introduction to ASReml
2 Introduction to Linear Mixed Models
3 Job Structure in ASReml
4 Breeding Theory
5 Genetic Analyses: Parental Models
6 Genetic Analyses: Animal Models
5 Variance Structures in ASReml
6 Multivariate Analysis
7 Multi-environment Analysis
8 Spatial Analyses
9 Generalized Linear Mixed Models
10 Introduction to GBLUP
11 GBLUP in ASReml
Session 1
Introduction to
ASReml

Gezan and Munoz (2014). Analysis of Experiments using ASReml: with emphasis on breeding trials ©
WHAT is ASReml?

“ASReml is a statistical package that fits linear mixed models using
Residual Maximum Likelihood (REML)”

“Typical applications include the analysis of (un)balanced longitudinal data,

repeated measures analysis, the analysis of (un)balanced designed
experiments, the analysis of multi-environment trials, the analysis of both
univariate and multivariate animal breeding, genetics data and the analysis
of regular or irregular spatial data.”

ASReml uses the Average Information (AI) algorithm and sparse matrix
methods.

• Useful for the analysis of large and complex datasets.

• Very flexible to model a wide range of variance models for random effects
or error structures (however, complex to program).
HOW TO GET ASReml?
Distributor Page
http://www.vsni.co.uk/products/asreml (version 3)

Platforms
Windows 98/ME/2000/XP/Vista/Windows7
Linux
Apple Macintosh

Interface
DOS (edit)
Windows (Notepad, ASReml-W)
R (or S-plus)
Text editors (e.g. ConTEXT)
GSView (graphical viewer)
Terminal (Mac)
WHERE TO GET HELP?

Official Documentation

c:\Program Files\Asreml3\Doc\

UserGuide.pdf (use Find window for searching)

UpdateR3.pdf

Webpages

uncronopio.org/ASReml/HomePage (cookbook)
http://www.vsni.co.uk/software/asreml/htmlhelp/ (distributor page)
www.vsni.co.uk/forum (user forum)
STEPS FOR AN ANALYSIS

• Identify the problem and experimental design / observational study.

• Detail treatment and design structure.
• Specify hypotheses / components of interest.
• Collect and prepare data file (e.g. Excel, Access).
• Perform initial data validation and exploratory data analysis (EDA) in
statistical software (e.g. SAS, R, GenStat).

• Definition / modification of linear model.
• Running / fitting of linear model.
• Checking output.

• Extract final output.

• Report analysis.
STEPS FOR AN ANALYSIS IN ASReml
• Prepare ASCII data file (any ASCII editor).
• Prepare a job file (.as, e.g. ASReml-W, ConTEXT).
• Run analysis in ASReml (submit job).
• Check diagnostic plots and output.
• Extract results from output files (e.g. .asr, .sln, .yht).
• Review, revise, re-run fitted model.
• Report analysis.
ALFALFA EXPERIMENT
Example: /Day1/Alfalfa/ALFALFA.txt
An experiment was established to compare 12 alfalfa varieties (labeled A-L).
These come from 3 different sources, but the objective is to estimate the
heritability of varieties regardless of their source. A total of 6 plots per variety
were established, arranged in an RCB design. The response variable is
yield (tons/acre) at harvest time.

Source Variety Bk1 Bk2 Bk3 Bk4 Bk5 Bk6

1 A 2.17 1.88 1.62 2.34 1.58 1.66
1 B 1.58 1.26 1.22 1.59 1.25 0.94
1 C 2.29 1.60 1.67 1.91 1.39 1.12
1 D 2.23 2.01 1.82 2.10 1.66 1.10
2 E 2.33 2.01 1.70 1.78 1.42 1.35
2 F 1.38 1.30 1.85 1.09 1.13 1.06
2 G 1.86 1.70 1.81 1.54 1.67 0.88
2 H 2.27 1.81 2.01 1.40 1.31 1.06
3 I 1.75 1.95 2.13 1.78 1.31 1.30
3 J 1.52 1.47 1.80 1.37 1.01 1.31
3 K 1.55 1.61 1.82 1.56 1.23 1.13
3 L 1.56 1.72 1.99 1.55 1.51 1.33
ALFALFA EXPERIMENT
Consider a model with block as fixed and variety as random effects.

yield = µ + block + variety + error

$y_{ij} = \mu + \alpha_i + g_j + e_{ij}$
yij observation from the ith block and jth variety (treatment)
αi fixed effect of the ith block
gj random effect of the jth variety, E(gj) = 0, V(gj) = σg2
eij random error of the ijth observation, E(eij) = 0, V(eij) = σ2

i = 1, … , 6 (r blocks)
j = 1, … , 12 (t treatments)
ALFALFA EXPERIMENT
Data file: /Day1/Alfalfa/ALFALFA.txt
Source Variety Block Resp
1 A 1 2.17
1 B 1 1.58
1 C 1 2.29
1 D 1 2.23
2 E 1 2.33
2 F 1 1.38
2 G 1 1.86
2 H 1 2.27
3 I 1 1.75
3 J 1 1.52
3 K 1 1.55
3 L 1 1.56
...
3 J 6 1.31
3 K 6 1.13
3 L 6 1.33
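The data file above is the wide table from the previous slide rewritten with one record per plot. A minimal Python sketch of that reshaping is given here, using only the first two varieties from the slide; the function name `wide_to_long` is an illustrative choice, not part of ASReml.

```python
# Wide layout from the slide: one row per variety, one yield per block (Bk1..Bk6).
wide_rows = [
    ("1", "A", [2.17, 1.88, 1.62, 2.34, 1.58, 1.66]),
    ("1", "B", [1.58, 1.26, 1.22, 1.59, 1.25, 0.94]),
]


def wide_to_long(rows):
    """Return (source, variety, block, yield) records sorted by block, then variety."""
    records = []
    for source, variety, yields in rows:
        for block, value in enumerate(yields, start=1):
            records.append((source, variety, block, value))
    return sorted(records, key=lambda rec: (rec[2], rec[1]))


long_records = wide_to_long(wide_rows)

# Header line matches the data file shown above (skipped in the job file with !SKIP 1).
lines = ["Source Variety Block Resp"]
lines += [f"{s} {v} {b} {y:.2f}" for s, v, b, y in long_records]
print("\n".join(lines))
```

The printed text can be saved as a space-delimited ASCII file; the remaining varieties would be added to `wide_rows` in the same way.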
ALFALFA EXPERIMENT
Job file: /Day1/Alfalfa/Alfalfa.as
Alfalfa experiment - 12 varieties - Response Yield
 Source 3 !I # Not used
 Variety 12 !A !SORT
 Block 6 !I
 yield
ALFALFA.txt !SKIP 1

!DISPLAY 7 !SUMMARY
yield ~ mu Block !r Variety
predict Variety !SED !TDIFF !PLOT

Some syntax
~ separates response from the list of fixed and random terms.
! Used for identification of option.
# Comment following (skips rest of line).
ALFALFA EXPERIMENT
ASReml 3.0 [01 Jan 2009] Alfalfa experiment - 12 varieties - Response Yield
Build gt [26 Nov 2010] 32 bit
28 Sep 2013 16:28:33.369 32 Mbyte Windows Alfalfa
Licensed to: UFL 31-dec-2013
***********************************************************
* Contact [email protected] for licensing and support *
***************************************************** ARG *
Folder: C:\WORK\ASReml\ASReml_2013\Distribute_Instr\Day1\Alfalfa
Source 3 !I
Variety 12 !A !SORT
Block 6 !I
QUALIFIERS: !SKIP 1 !DISPLAY 7 !SUMMARY
Reading Alfalfa.txt FREE FORMAT skipping 1 lines

Univariate analysis of yield

Summary of 72 records retained of 72 read

Model term Size #miss #zero MinNon0 Mean MaxNon0 StndDevn

1 Source 3 0 0 1 2.0000 3
2 Variety 12 0 0 1 6.5000 12
3 Block 6 0 0 1 3.5000 6
4 yield Variate 0 0 0.8800 1.597 2.340 0.3584
5 mu 1
QUALIFIERS: predict Variety !SED !TDIFF !PLOT
Forming 19 equations: 7 dense.
Initial updates will be shrunk by factor 0.316
Notice: 1 singularities detected in design matrix.
1 LogL= 48.7345 S2= 0.61974E-01 66 df 0.1000 1.000
2 LogL= 50.0218 S2= 0.57316E-01 66 df 0.1705 1.000
3 LogL= 51.1506 S2= 0.52550E-01 66 df 0.2957 1.000
4 LogL= 51.6976 S2= 0.48748E-01 66 df 0.4902 1.000
5 LogL= 51.7366 S2= 0.47751E-01 66 df 0.5717 1.000
6 LogL= 51.7370 S2= 0.47654E-01 66 df 0.5808 1.000
7 LogL= 51.7370 S2= 0.47653E-01 66 df 0.5809 1.000
ALFALFA EXPERIMENT
Final parameter values 0.58087 1.0000

- - - Results from analysis of yield - - -

Approximate stratum variance decomposition

Stratum Degrees-Freedom Variance Component Coefficients
Variety 11.00 0.213732 6.0 1.0
Residual Variance 55.00 0.476526E-01 0.0 1.0

Source Model terms Gamma Component Comp/SE % C

Variety 12 12 0.580868 0.276798E-01 1.81 0 P
Variance 72 66 1.00000 0.476526E-01 5.24 0 P

Wald F statistics
Source of Variation NumDF DenDF F-inc P-inc
5 mu 1 11.0 858.95 <.001
3 Block 5 55.0 17.42 <.001
Notice: The DenDF values are calculated ignoring fixed/boundary/singular
variance parameters using algebraic derivatives.

Solution Standard Error T-value T-prev

3 Block
2 -0.180833 0.891185E-01 -2.03
3 -0.875000E-01 0.891185E-01 -0.98 1.05
4 -0.206667 0.891185E-01 -2.32 -1.34
5 -0.501667 0.891185E-01 -5.63 -3.31
6 -0.687500 0.891185E-01 -7.71 -2.09
5 mu
1 1.87417 0.792320E-01 23.65
2 Variety 12 effects fitted
SLOPES FOR LOG(ABS(RES)) on LOG(PV) for Section 1
0.87
Finished: 28 Sep 2013 16:28:34.002 LogL Converged
ALFALFA EXPERIMENT
[Figure: Residuals vs fitted values. Residuals (Y): -0.3828 to 0.4563; fitted values (X): 0.95733 to 2.09034]
[Figure: Histogram of residuals, 28 Sep 2013 16:28:33. Peak count: 6; range: -0.382836 to 0.456330]
ALFALFA EXPERIMENT
Interpreting output
LogL Converged

Approximate stratum variance decomposition

Stratum Degrees-Freedom Variance Component Coefficients
Variety 11.00 0.213732 6.0 1.0
Residual Variance 55.00 0.476526E-01 0.0 1.0

Source Model terms Gamma Component Comp/SE % C

Variety 12 12 0.580868 0.276798E-01 1.81 0 P
Variance 72 66 1.00000 0.476526E-01 5.24 0 P

Wald F statistics
Source of Variation NumDF DenDF F-inc P-inc
5 mu 1 11.0 858.95 <.001
3 Block 5 55.0 17.42 <.001

gj ~ N[0,σg2] sg2 = 0.0277

eij ~ N[0,σ2] s2 = 0.0477
H2 = 0.0277/(0.0277 + 0.0477) = 0.367
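The heritability arithmetic can be checked with a few lines of Python. This is only a sketch: the two components are copied from the Component column of the output above, and the variable names are illustrative.

```python
# Variance components copied from the .asr output above.
sigma2_g = 0.0276798   # Variety component
sigma2_e = 0.0476526   # residual variance

# Broad-sense heritability: H2 = Vg / (Vg + Ve).
H2 = sigma2_g / (sigma2_g + sigma2_e)

# The same ratio follows from the Gamma column, since gamma = sigma2_g / sigma2_e.
gamma = 0.580868
H2_from_gamma = gamma / (1.0 + gamma)

print(round(H2, 3), round(H2_from_gamma, 3))
```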

Source of variation   Num df   Den df   Variance ratio   P-value
Block                 5        55       17.42            < 0.001

ALFALFA EXPERIMENT
Interpreting output
Alfalfa experiment - 12 varieties - Response Yield 19 Feb 2012 20:34:11
Alfalfa

Ecode is E for Estimable, * for Not Estimable

The predictions are obtained by averaging across the hypertable

calculated from model terms constructed solely from factors
in the averaging and classify sets.
Use !AVERAGE to move ignored factors into the averaging set.

---- ---- ---- ---- ---- ---- 1 ---- ---- ---- ---- ---- ----
Predicted values of yield
The SIMPLE averaging set: Block

Variety Predicted_Value Standard_Error Ecode

A 1.8130 0.0795 E
B 1.3714 0.0795 E
C 1.6485 0.0795 E
D 1.7702 0.0795 E
E 1.7275 0.0795 E
F 1.3675 0.0795 E
G 1.5812 0.0795 E
H 1.6330 0.0795 E
I 1.6796 0.0795 E
J 1.4542 0.0795 E
K 1.5086 0.0795 E
L 1.6071 0.0795 E
ASReml FILES

.apj Project file created with ASReml-W

.as Model and job specifications
.ass Summary statistics for variables from data set
.asr Report output of analysis and summary job
.aov Details of ANOVA table calculations
.sln Solutions of fixed and random effects
.pvs Report predictions and their standard errors
.res Residual statistics and basic residual plots
.ps Graphic files in PS format
.vvp Matrix of variance of variance components
.yht Residuals, predicted and hat values
.pin Calculations of functions of variance components
.pvc Report calculations of functions of variance components
Session 2
Introduction to
Linear Mixed Models

MIXED MODELS
• Mixed models extend the linear model by allowing a more flexible
specification of the errors (and other random factors). Hence, they allow a
different type of inference and can incorporate correlation and
heterogeneous variances among the observations.

• Fixed effects: are those factors whose levels are selected by a nonrandom
process or whose levels consist of the entire population of possible levels.
Inferences are made only to those levels included in the study. Hint: all
levels of interest are in your data set.

• Random effects: are those factors whose levels are a random sample of
levels from a population of possible levels. The inference is about the
population of levels, not just the subset of levels included in the study.

• Mixed linear models contain both random and fixed effects.

ALFALFA EXPERIMENT
yij observation from the ith block and jth variety (treatment)

$y_{ij} = \mu + \alpha_i + g_j + e_{ij}$

αi fixed effect of the ith block

gj random effect of the jth variety, E(gj) = 0, V(gj) = σg2
eij random error of the ijth observation, E(eij) = 0, V(eij) = σ2

i = 1, … , 6 (r blocks)
j = 1, … , 12 (t treatments)
MODEL FOR A RCBD

Dataset: two factors to consider: one defining the block to which each
experimental unit is allocated, and the other to the treatment applied
to each unit.

$y_{ij} = \mu + \alpha_i + g_j + e_{ij}$
where,
yij observation from the ith block and jth variety (treatment), i = 1 … r, j = 1 … t
μ is the population mean
αi fixed effects of the ith block
gj random effects of the jth variety, E(gj) = 0, V(gj) = σg2
eij random error of the ijth observation, E(eij) = 0, V(eij) = σ2

gj ~ N[0,σg2]
eij ~ N[0,σ2]
MODEL COMPONENTS
response = systematic component + random component
response = structural component + explanatory component + random component

Structural component (or blocking structure)

• Concerns the underlying variability (heterogeneity) and structure of the
experimental or measurement units.
• “Controls” different sources of natural variation amongst the units using
factors (e.g. blocks) or variates (e.g. covariates).

Explanatory component (or treatment structure)

• Defines the different treatments (or treatment combinations) applied to the
experimental units.
• Provides information about the differences in response caused by the
different treatments and answers the questions of interest.

Multi-stratum ANOVA: makes explicit the separation between blocks (or the
more general structure of units) and treatments.
MIXED MODELS

Hypothesis of interest

Fixed effects: H0: µ1 = µ2 = … = µt

H1: µi ≠ µj for some i, j in the set 1 … t
(i.e. is there a significant treatment effect)

Test statistic: F or t

Random effects: H0: σg2 = 0

H1: σg2 > 0

(i.e. is there a significant variation due to the random effects)

Test statistic: Chi-square (likelihood ratio test)

ALFALFA EXPERIMENT
Consider a model with block as fixed and variety as random effects.

yield = µ + block + variety + error

$y_{ij} = \mu + \alpha_i + g_j + e_{ij}$
yij observation from the ith block and jth variety (treatment)
αi fixed effects of the ith block
gj random effects of the jth variety, E(gj) = 0, V(gj) = σg2
eij random error of the ijth observation, E(eij) = 0, V(eij) = σ2

i = 1, … , 6 (r blocks)
j = 1, … , 12 (t treatments)
ALFALFA EXPERIMENT
yield = µ + block + variety + error

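$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{g} + \mathbf{e}$$

With the observations ordered by variety and, within variety, by block ($y_{ij}$: ith block, jth variety; r blocks, t varieties):

$$
\begin{bmatrix} y_{11} \\ \vdots \\ y_{r1} \\ \vdots \\ y_{1t} \\ \vdots \\ y_{rt} \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
1 & 0 & \cdots & 1 \\
\vdots & \vdots & & \vdots \\
1 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
1 & 0 & \cdots & 1
\end{bmatrix}
\begin{bmatrix} \mu \\ \alpha_1 \\ \vdots \\ \alpha_r \end{bmatrix}
+
\begin{bmatrix}
1 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
1 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}
\begin{bmatrix} g_1 \\ g_2 \\ \vdots \\ g_t \end{bmatrix}
+
\begin{bmatrix} e_{11} \\ \vdots \\ e_{r1} \\ \vdots \\ e_{1t} \\ \vdots \\ e_{rt} \end{bmatrix}
$$

$$
\mathbf{G} = \begin{bmatrix} \sigma_g^2 & & 0 \\ & \ddots & \\ 0 & & \sigma_g^2 \end{bmatrix},
\qquad
\mathbf{R} = \begin{bmatrix} \sigma^2 & & 0 \\ & \ddots & \\ 0 & & \sigma^2 \end{bmatrix}
$$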
LINEAR MIXED MODEL

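$$
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{g} + \mathbf{e},
\qquad
E\begin{bmatrix} \mathbf{g} \\ \mathbf{e} \end{bmatrix} = \begin{bmatrix} \mathbf{0} \\ \mathbf{0} \end{bmatrix},
\qquad
\mathrm{Var}\begin{bmatrix} \mathbf{g} \\ \mathbf{e} \end{bmatrix} = \begin{bmatrix} \mathbf{G} & \mathbf{0} \\ \mathbf{0} & \mathbf{R} \end{bmatrix}
$$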

X (n x r) design matrix for fixed effects

β (r x 1) vector of fixed effects
Z (n x t) design matrix for random effects
g (t x 1) vector of random effects
e (n x 1) vector of random errors
G (t x t) matrix of variance-covariance of random effects
R (n x n) matrix of variance-covariance of random errors
ALFALFA EXPERIMENT

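Rows and columns of G are indexed by $g_1, g_2, \ldots, g_t$:

$$
\mathbf{G} =
\begin{bmatrix} \sigma_g^2 & & & 0 \\ & \sigma_g^2 & & \\ & & \ddots & \\ 0 & & & \sigma_g^2 \end{bmatrix}
= \sigma_g^2 \begin{bmatrix} 1 & & & 0 \\ & 1 & & \\ & & \ddots & \\ 0 & & & 1 \end{bmatrix}
= \sigma_g^2 \mathbf{I}_t
$$

Rows and columns of R are indexed by $e_{11}, e_{12}, \ldots, e_{rt}$:

$$
\mathbf{R} =
\begin{bmatrix} \sigma^2 & & & 0 \\ & \sigma^2 & & \\ & & \ddots & \\ 0 & & & \sigma^2 \end{bmatrix}
= \sigma^2 \mathbf{I}_{rt}
$$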
LINEAR MIXED MODEL

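$$
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{g} + \mathbf{e},
\qquad
E\begin{bmatrix} \mathbf{g} \\ \mathbf{e} \end{bmatrix} = \begin{bmatrix} \mathbf{0} \\ \mathbf{0} \end{bmatrix},
\qquad
\mathrm{Var}\begin{bmatrix} \mathbf{g} \\ \mathbf{e} \end{bmatrix} = \begin{bmatrix} \mathbf{G} & \mathbf{0} \\ \mathbf{0} & \mathbf{R} \end{bmatrix}
$$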
Assumptions

• Random effects: E(g) = 0, V(g) = G = G(θ)

• Deviations: E(e) = 0, V(e) = R = R(θ)
• g and e independent.

hence, E(y) = Xβ
Var(y) = V = V(θ) = V(y) = ZGZ’ + R

Note: normality assumptions can be made about g and e.

g ~ MVN(0, G) and e ~ MVN(0, R)

VARIANCE COMPONENTS
• Variance components need to be estimated before obtaining estimates of
fixed/random effects and performing any type of inference.
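$$
\hat{\boldsymbol{\theta}} \;\rightarrow\; \hat{\mathbf{G}} = \mathbf{G}(\hat{\boldsymbol{\theta}}),\;\; \hat{\mathbf{R}} = \mathbf{R}(\hat{\boldsymbol{\theta}})
\;\rightarrow\; \hat{\mathbf{V}} = \mathbf{V}(\hat{\boldsymbol{\theta}}) = \widehat{\mathrm{Var}}(\mathbf{y}) = \mathbf{Z}\hat{\mathbf{G}}\mathbf{Z}' + \hat{\mathbf{R}}
$$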
• Restricted/residual maximum likelihood (REML) is a likelihood-based
method used to estimate these variance components. It assumes
that both g and e follow a multivariate normal distribution.
• The REML variance component estimates are later used to estimate the
solution of fixed and random effects.
• Henderson (1950) derived the Mixed Model Equations (MME) to obtain
the solutions of all effects simultaneously:

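$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\hat{\mathbf{V}}^{-1}\mathbf{X})^{-1}\mathbf{X}'\hat{\mathbf{V}}^{-1}\mathbf{y} \qquad \text{BLUE → EBLUE}$$

$$\hat{\mathbf{g}} = \hat{\mathbf{G}}\mathbf{Z}'\hat{\mathbf{V}}^{-1}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) \qquad \text{BLUP → EBLUP}$$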
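For the balanced alfalfa trial (6 plots per variety, one in each block), the EBLUP of a variety effect reduces to a shrunken deviation of the variety mean from the grand mean, with shrinkage factor 6σg2 / (6σg2 + σ2). The Python sketch below applies that shortcut to the alfalfa data, treating the REML components from the output as known. It is a check that relies on the balanced design, not a general mixed-model solver, and the variable names are illustrative.

```python
# Alfalfa yields (tons/acre) by variety, blocks 1 to 6, from the data table.
yields = {
    "A": [2.17, 1.88, 1.62, 2.34, 1.58, 1.66],
    "B": [1.58, 1.26, 1.22, 1.59, 1.25, 0.94],
    "C": [2.29, 1.60, 1.67, 1.91, 1.39, 1.12],
    "D": [2.23, 2.01, 1.82, 2.10, 1.66, 1.10],
    "E": [2.33, 2.01, 1.70, 1.78, 1.42, 1.35],
    "F": [1.38, 1.30, 1.85, 1.09, 1.13, 1.06],
    "G": [1.86, 1.70, 1.81, 1.54, 1.67, 0.88],
    "H": [2.27, 1.81, 2.01, 1.40, 1.31, 1.06],
    "I": [1.75, 1.95, 2.13, 1.78, 1.31, 1.30],
    "J": [1.52, 1.47, 1.80, 1.37, 1.01, 1.31],
    "K": [1.55, 1.61, 1.82, 1.56, 1.23, 1.13],
    "L": [1.56, 1.72, 1.99, 1.55, 1.51, 1.33],
}

# REML variance components copied from the ASReml output.
sigma2_g = 0.0276798   # Variety
sigma2_e = 0.0476526   # residual
n_blocks = 6

grand_mean = sum(sum(v) for v in yields.values()) / (len(yields) * n_blocks)

# Shrinkage factor for a variety mean based on n_blocks plots.
shrinkage = n_blocks * sigma2_g / (n_blocks * sigma2_g + sigma2_e)

predicted = {}
for variety, values in yields.items():
    variety_mean = sum(values) / n_blocks
    blup = shrinkage * (variety_mean - grand_mean)   # EBLUP of the variety effect
    predicted[variety] = grand_mean + blup           # prediction averaged over blocks

for variety in sorted(predicted):
    print(variety, round(predicted[variety], 4))
```

The printed values can be compared with the Predicted_Value column reported by the `predict Variety` statement for this experiment.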
VARIANCE STRUCTURES
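ID: identity

$$
\sigma^2 \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
=
\begin{bmatrix} \sigma^2 & 0 & 0 & 0 \\ 0 & \sigma^2 & 0 & 0 \\ 0 & 0 & \sigma^2 & 0 \\ 0 & 0 & 0 & \sigma^2 \end{bmatrix}
$$

AR1V: autocorrelation 1st order

$$
\sigma^2 \begin{bmatrix} 1 & \rho & \rho^2 & \rho^3 \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho^3 & \rho^2 & \rho & 1 \end{bmatrix}
$$

DIAG: diagonal

$$
\begin{bmatrix} \sigma_1^2 & 0 & 0 & 0 \\ 0 & \sigma_2^2 & 0 & 0 \\ 0 & 0 & \sigma_3^2 & 0 \\ 0 & 0 & 0 & \sigma_4^2 \end{bmatrix}
$$

CORUH: uniform heterogeneous

$$
\begin{bmatrix}
\sigma_1^2 & \rho\sigma_1\sigma_2 & \rho\sigma_1\sigma_3 & \rho\sigma_1\sigma_4 \\
\rho\sigma_1\sigma_2 & \sigma_2^2 & \rho\sigma_2\sigma_3 & \rho\sigma_2\sigma_4 \\
\rho\sigma_1\sigma_3 & \rho\sigma_2\sigma_3 & \sigma_3^2 & \rho\sigma_3\sigma_4 \\
\rho\sigma_1\sigma_4 & \rho\sigma_2\sigma_4 & \rho\sigma_3\sigma_4 & \sigma_4^2
\end{bmatrix}
$$

CORUV: uniform correlation

$$
\sigma^2 \begin{bmatrix} 1 & \rho & \rho & \rho \\ \rho & 1 & \rho & \rho \\ \rho & \rho & 1 & \rho \\ \rho & \rho & \rho & 1 \end{bmatrix}
=
\begin{bmatrix}
\sigma^2 & \rho\sigma^2 & \rho\sigma^2 & \rho\sigma^2 \\
\rho\sigma^2 & \sigma^2 & \rho\sigma^2 & \rho\sigma^2 \\
\rho\sigma^2 & \rho\sigma^2 & \sigma^2 & \rho\sigma^2 \\
\rho\sigma^2 & \rho\sigma^2 & \rho\sigma^2 & \sigma^2
\end{bmatrix}
$$

US: unstructured

$$
\begin{bmatrix}
\sigma_1^2 & \sigma_{12} & \sigma_{13} & \sigma_{14} \\
\sigma_{12} & \sigma_2^2 & \sigma_{23} & \sigma_{24} \\
\sigma_{13} & \sigma_{23} & \sigma_3^2 & \sigma_{34} \\
\sigma_{14} & \sigma_{24} & \sigma_{34} & \sigma_4^2
\end{bmatrix}
$$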
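To make the parameterisation of these structures concrete, the following Python sketch builds two of them, AR1V and CORUH, as nested lists. The function names and numeric values are illustrative only; in ASReml the structures are requested by name in the job file rather than built by hand.

```python
def ar1v(n, rho, sigma2):
    """AR1V: common variance sigma2 times a first-order autocorrelation matrix."""
    return [[sigma2 * rho ** abs(i - j) for j in range(n)] for i in range(n)]


def coruh(sds, rho):
    """CORUH: one common correlation rho, a separate variance for each level."""
    n = len(sds)
    return [
        [sds[i] ** 2 if i == j else rho * sds[i] * sds[j] for j in range(n)]
        for i in range(n)
    ]


# Two parameters (rho, sigma2) describe the whole 4 x 4 AR1V matrix.
for row in ar1v(4, rho=0.5, sigma2=2.0):
    print(row)

# CORUH with 2 levels needs 2 standard deviations plus 1 correlation.
for row in coruh([1.0, 2.0], rho=0.5):
    print(row)
```

Counting parameters this way shows why structured matrices are attractive: for 4 levels, AR1V uses 2 parameters and CORUH uses 5, whereas the unstructured (US) matrix needs 10.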

CORRELATION STRUCTURES
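CORU: uniform correlation

$$
\begin{bmatrix} 1 & \rho & \rho & \rho \\ \rho & 1 & \rho & \rho \\ \rho & \rho & 1 & \rho \\ \rho & \rho & \rho & 1 \end{bmatrix}
$$

CORUB: banded correlation

$$
\begin{bmatrix} 1 & \rho_1 & \rho_2 & \rho_3 \\ \rho_1 & 1 & \rho_1 & \rho_2 \\ \rho_2 & \rho_1 & 1 & \rho_1 \\ \rho_3 & \rho_2 & \rho_1 & 1 \end{bmatrix}
$$

AR1: autocorrelation 1st order

$$
\begin{bmatrix} 1 & \rho & \rho^2 & \rho^3 \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho^3 & \rho^2 & \rho & 1 \end{bmatrix}
$$

CORG: general correlation

$$
\begin{bmatrix} 1 & \rho_{12} & \rho_{13} & \rho_{14} \\ \rho_{12} & 1 & \rho_{23} & \rho_{24} \\ \rho_{13} & \rho_{23} & 1 & \rho_{34} \\ \rho_{14} & \rho_{24} & \rho_{34} & 1 \end{bmatrix}
$$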
PROPERTIES OF EBLUE (optional)

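$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\hat{\mathbf{V}}^{-1}\mathbf{X})^{-1}\mathbf{X}'\hat{\mathbf{V}}^{-1}\mathbf{y}$$

• $\mathrm{Var}(\hat{\boldsymbol{\beta}}) = (\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}$
• $\mathrm{Var}(\mathbf{L}\hat{\boldsymbol{\beta}}) = \mathbf{L}(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{L}'$
• $\mathbf{L}\hat{\boldsymbol{\beta}}$ is the best linear unbiased estimate of $\mathbf{L}\boldsymbol{\beta}$

• Test of H0: Lβ = 0
$\hat{\boldsymbol{\beta}}'\mathbf{L}'[\mathbf{L}(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{L}']^{-1}\mathbf{L}\hat{\boldsymbol{\beta}} \,/\, r(\mathbf{L})$ ~ F (approx.) with df1 = r(L) and df2
(Satterthwaite or Kenward-Roger)

• 100(1-α)% confidence interval for l’β

$$\mathbf{l}'\hat{\boldsymbol{\beta}} \pm z_{\alpha/2}\sqrt{\mathbf{l}'(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{l}}$$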

PROPERTIES OF EBLUP (optional)

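$$
\begin{bmatrix} \hat{\boldsymbol{\beta}} \\ \hat{\mathbf{g}} \end{bmatrix}
=
\begin{bmatrix}
\mathbf{X}'\hat{\mathbf{R}}^{-1}\mathbf{X} & \mathbf{X}'\hat{\mathbf{R}}^{-1}\mathbf{Z} \\
\mathbf{Z}'\hat{\mathbf{R}}^{-1}\mathbf{X} & \mathbf{Z}'\hat{\mathbf{R}}^{-1}\mathbf{Z} + \hat{\mathbf{G}}^{-1}
\end{bmatrix}^{-1}
\begin{bmatrix} \mathbf{X}'\hat{\mathbf{R}}^{-1}\mathbf{y} \\ \mathbf{Z}'\hat{\mathbf{R}}^{-1}\mathbf{y} \end{bmatrix}
$$

$$
\begin{bmatrix} \hat{\boldsymbol{\beta}} \\ \hat{\mathbf{g}} \end{bmatrix}
=
\begin{bmatrix} \mathbf{C}^{xx} & \mathbf{C}^{xz} \\ \mathbf{C}^{zx} & \mathbf{C}^{zz} \end{bmatrix}
\begin{bmatrix} \mathbf{X}'\hat{\mathbf{R}}^{-1}\mathbf{y} \\ \mathbf{Z}'\hat{\mathbf{R}}^{-1}\mathbf{y} \end{bmatrix}
$$

$$\mathrm{Var}(\hat{\boldsymbol{\beta}}) = \mathbf{C}^{xx}$$
$$\mathrm{Var}(\hat{\mathbf{g}}) = \hat{\mathbf{G}} - \mathbf{C}^{zz}$$
$$\mathrm{Var}(\mathbf{g} - \hat{\mathbf{g}}) = \mathbf{C}^{zz}$$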

Predictions

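• Linear combination of fixed and random effects:

$$\hat{P} = \mathbf{L}'\hat{\boldsymbol{\beta}} + \mathbf{M}'\hat{\mathbf{g}}$$
$$\mathrm{Var}(\hat{P}) = \mathbf{L}'\mathbf{C}^{xx}\mathbf{L} + \mathbf{M}'(\hat{\mathbf{G}} - \mathbf{C}^{zz})\mathbf{M}$$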
PROPERTIES OF EBLUP (optional)
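SE(BLUP): standard error of a random effect

$$\mathrm{SD}(\hat{g}_i) = \sqrt{c^{ii}}$$

PEV: prediction error variance

$$\mathrm{PEV}(\hat{g}_i) = c^{ii}\,\sigma_e^2 = (1 - r^2)\,\sigma_g^2$$

r2: reliability (squared correlation between true and predicted BV)

$$r^2(\hat{g}_i) = 1 - \frac{\mathrm{PEV}}{\sigma_g^2} = 1 - \frac{c^{ii}\,\sigma_e^2}{\sigma_g^2}$$

r: accuracy

$$r(\hat{g}_i) = \sqrt{r^2(\hat{g}_i)} = \sqrt{1 - \frac{\mathrm{PEV}}{\sigma_g^2}}$$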
TESTING VAR. COMPONENTS
LRT: likelihood ratio test

• Based on asymptotic derivations.

• Used to compare nested models and is valid if the fixed effects are the same
(under REML).
• Examples: H0: ρ = 0 against H1: ρ ≠ 0
H0: σ2g = 0 against H1: σ2g > 0

• Test statistic: $d = 2[\log L_2 - \log L_1] \sim \chi^2_{r_2 - r_1}$

Hypothesis   P-value
Two-sided    $\mathrm{Prob}(\chi^2_{r_2 - r_1} > d)$
One-sided    $0.5\,(1 - \mathrm{Prob}(\chi^2_1 \le d))$
TESTING VAR. COMPONENTS
Critical values
Δdf         α = 0.05                 α = 0.01
(r2 - r1)   Two-sided   One-sided    Two-sided   One-sided
1           3.84        2.71         6.63        5.41
2           5.99        4.61         9.21        7.82
3           7.81        6.25         11.34       9.84
4           9.49        7.78         13.28       11.67
5           11.07       9.24         15.09       13.39

Goodness-of-fit statistics
• AIC and BIC can be used to select/rank non-nested models
AIC = – 2×logL + 2×t
BIC = – 2×logL + t×log(v)

t number of variance parameters in the model

v residual degrees of freedom, v = n – p
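As a small illustration, the two criteria can be computed for the alfalfa models with and without Variety, using the REML log-likelihoods reported by ASReml for this experiment (51.7370 and 44.8781) and v = 66. The sketch takes the BIC penalty as t×log(v); the function names are illustrative.

```python
import math


def aic(logl, t):
    """AIC = -2 logL + 2 t, with t the number of variance parameters."""
    return -2.0 * logl + 2.0 * t


def bic(logl, t, v):
    """BIC = -2 logL + t log(v), with v the residual degrees of freedom."""
    return -2.0 * logl + t * math.log(v)


v = 66  # residual df reported in the iteration log (72 records, 6 fixed-effect df)

# Model with Variety: 2 variance parameters; model without Variety: 1.
aic_full, bic_full = aic(51.7370, 2), bic(51.7370, 2, v)
aic_reduced, bic_reduced = aic(44.8781, 1), bic(44.8781, 1, v)

print(round(aic_full, 3), round(bic_full, 3))
print(round(aic_reduced, 3), round(bic_reduced, 3))
```

Smaller values indicate the preferred model. The two fits share the same fixed effects (mu and Block), which is what makes their REML log-likelihoods comparable.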
ALFALFA EXPERIMENT
Testing Genetic variation
H0: H2 = 0 against H1: H2 > 0

Model with Variety

7 LogL= 51.7370 S2= 0.47653E-01 66 df 0.5809 1.000

Source Model terms Gamma Component Comp/SE % C

Variety 12 12 0.580868 0.276798E-01 1.81 0 P
Variance 72 66 1.00000 0.476526E-01 5.24 0 P

Model without Variety

2 LogL= 44.8781 S2= 0.75332E-01 66 df 1.000

Source Model terms Gamma Component Comp/SE % C

Variance 72 66 1.00000 0.753324E-01 5.74 0 P

Testing Variety
H0: σ2g = 0 against H1: σ2g > 0
d = 2 [51.737 – 44.878] = 13.72 , Δdf = 1
One-sided critical value (α = 0.05) = 2.71, p-value < 0.001
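The one-sided test for Variety can be reproduced with the Python standard library. The sketch uses the identity Prob(χ² with 1 df > d) = erfc(√(d/2)), so it only covers the case Δdf = 1; the function name is an illustrative choice.

```python
import math


def lrt_one_sided(logl_full, logl_reduced):
    """One-sided LRT for a single variance component (delta df = 1).

    Returns the statistic d and the p-value 0.5 * Prob(chi2_1 > d).
    """
    d = 2.0 * (logl_full - logl_reduced)
    tail = math.erfc(math.sqrt(d / 2.0))   # Prob(chi2 with 1 df > d)
    return d, 0.5 * tail


# Log-likelihoods from the two alfalfa fits above.
d, p_value = lrt_one_sided(51.7370, 44.8781)
print(round(d, 2), p_value)

# Sanity check on the tabulated one-sided critical value for alpha = 0.05.
alpha_at_critical = 0.5 * math.erfc(math.sqrt(2.71 / 2.0))
print(round(alpha_at_critical, 3))
```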
Session 3
Job Structure in
ASReml

JOB FILE

STRUCTURE .as FILE

PART A: Data definition and reading of data set.
PART B: Definition of analysis (options, linear model, output).

[Job title]
[Data definition]
[Specification of file(s) to read]
[Options]
[Linear model: factors and variables]
[Linear model: variance structure(s)]
[Additional output]
[Restrictions on variance components]

Note: some options can be indicated in this file or they can be added in the
batch command line.
JOB FILE

General Relevant File Syntax

~ separates response from the list of fixed and random terms.
! used for identification of option.
# comment following (skips rest of line).
, model specification continues on next line.
$ specifies a user-input option from commands.

Basic Model Syntax Operators

. interaction (e.g. A.B, interaction A and B).

/ forms nested factor expansion.
* forms crossed factor expansion.
+ treated as a space.
- excludes model term from model.
READING / MANIPULATING DATA

Column variables (continuous or discrete)

• Indented by a single space.
• Case sensitive!
• Should follow the same order of the data in original file.
• Less than 16 characters (recommended).
• Should start with a letter.
• No spaces in field name.

Examples of name and type of variables

yield yield is a continuous variable.
treatment * treatments is a simple coded factor (as 1, 2, ... ).
Variety 12 !A variety is an alphabetically coded factor.
dose 4 !I dose is a numerically coded factor (any number).
sex 2 !I !L m f assigns labels to numerical values.
mother [n] !P mother is link to a pedigree structure.
READING / MANIPULATING DATA

• ASCII file (delimited by: tab, comma or space).

• “NA”, “*” and “.” identify missing values.
• First s lines can be skipped by using !SKIP s
• Labels are stored in the order in which they are read.

Some manipulations/transformations
!FILTER f it will filter the variable f
!SELECT v selects observations equal to v from variable f (Above)
!=v to create/overwrite a variable with all values equal to v
!+o adds the number o to the variable
!-o subtracts the number o from the variable
!*o multiplies the variable by the number o
!/o divides variable by number o
!^p raises the variable to the power p
!^0 calculates the natural logarithm of the variable
!D v eliminates record with missing values or v
!M v converts values of v to missing values
!REPLACE o n replace data values o with n
READING / MANIPULATING DATA
Examples
Yield !*100 variable yield is multiplied by 100 as it is read
Yield !M-9 observations with -9 are changed to missing
Yield !^0 calculates the natural log of variable yield
Ymean !=0 !+Y1 !+Y2 !/2 mean of two variables

Relevant Options
!SUMMARY provides a histogram, correlations, counts, etc. (see file .ass)
!OUTLIER performs additional outlier checks (see files .res and .yht)
!X x !Y y produces a scatter plot for variables x and y
!SORT re-orders labels in alphabetical order
!MVINCLUDE missing values in a factor or variate are treated as zeros.
!WORKSPACE m assigns m Mbytes of memory for the fitting model
!EXTRA n forces n additional iterations after the model converges
!MAXIT m indicates a maximum of m iterations
!DOPART $A indicates that different parts will be done
!PART n a specific model n within a job file (may list several parts)
!CONTINUE re-starts fitting of model from last iteration
GRAPHICAL OUTPUT

Relevant Options
!DISPLAY n selects type(s) of diagnostic plot
!NODISPLAY suppresses diagnostic plot output
!PS saves plots in ps format
!EPS saves plots in eps format
!PNG saves plots in png format
!WMF saves plots in wmf format
!BMP saves plots in bmp format

Coding !DISPLAY n
1 = variogram
2 = histogram
4 = row and column trends
8 = perspective plot of residuals

e.g. 1 + 8 = 9 → !DISPLAY 9 (default)
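Because the four codes are powers of two, any !DISPLAY value is a sum that identifies a unique set of plots. A short Python sketch of the decoding follows; the dictionary and function names are illustrative, and the plot labels are the ones listed above.

```python
# Plot codes for !DISPLAY n, as listed above.
DISPLAY_CODES = {
    1: "variogram",
    2: "histogram",
    4: "row and column trends",
    8: "perspective plot of residuals",
}


def decode_display(n):
    """Return the plots selected by a !DISPLAY value (sum of the codes)."""
    return [name for code, name in DISPLAY_CODES.items() if n & code]


print(decode_display(9))   # default: 1 + 8
print(decode_display(7))   # value used in the alfalfa job: 1 + 2 + 4
```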

JOB FILE

Specification of Linear Models

Univariate case
y ~ <fixed dense> !r <random sparse> !f <fixed sparse>
mu the constant term or intercept (overall mean)
!r random effects to follow
!f sparse fixed effects to follow (not in ANOVA table)
mv term to estimate missing values (as fixed effects)

Examples
yield ~ mu Variety !r Block
Volume ~ mu Site Site.Block !r Mother Mother.Site !f mv
JOB FILE
Specification of Linear Models
• ASReml uses the Wilkinson and Rogers (1973) notation.

A.B indicates crossed factors

A*B = A + B + A.B SAS: A + B + A*B

A/B = A + A.B SAS: A + B(A)

• Note that the model term A.B denotes interaction or nested effects
depending on which other terms are previously included in the model.

Examples
Volume ~ mu Site !r Genotype Site.Genotype
Volume ~ mu Site !r Site.Genotype
Yield ~ mu A.B !r Block
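The crossed (`*`) and nested (`/`) operators are shorthand for lists of model terms. The Python sketch below spells out that expansion for two-factor expressions only; `expand` is a hypothetical helper written for illustration and does not reproduce ASReml's own parser.

```python
def expand(term):
    """Expand a two-factor Wilkinson-Rogers expression into its model terms."""
    if "*" in term:                      # crossed: A*B = A + B + A.B
        a, b = term.split("*")
        return [a, b, f"{a}.{b}"]
    if "/" in term:                      # nested: A/B = A + A.B
        a, b = term.split("/")
        return [a, f"{a}.{b}"]
    return [term]                        # plain term, nothing to expand


print(expand("Site*Genotype"))
print(expand("Site/Block"))
```

In the nested case the term `Site.Block` appears without a `Block` main effect, which matches the note above that `A.B` is read as an interaction or as a nested effect depending on the other terms in the model.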
JOB FILE
Model functions
(to be used after a specified column, or to create new model variables).

and(t) overlays a design matrix for a model term into an existing one
at(f,n) creates a binary variable for the condition specified in a factor
fac(v) forms a factor with the values of a continuous variable
lin(f) transform the factor f into a covariate
uni(v) creates a factor with a level for every record in the data file
fav(v,y) forms a factor with the levels of a combination of 2 factors
ide(f) fits an additional factor without its genetic relationship matrix
inv(v) calculates inverse of variable v

log(v) calculates the natural logarithm of v

pow(y,p) raises the variable y to the power p
sqrt(v) calculates the square root of v
spl(v,n) fits a spline for variable v with n knots
pol(y,n) forms a set of orthogonal polynomials of order n
JOB FILE
Some options in the variance components
!GP restricts to the positive parameter space
!GU unrestricted
!GF fixed at a given supplied value (e.g. starting value)
!VCC c indicates the number of variance parameter constraints

Example
Volume ~ mu Block !r Mother 0.25 !GF Plot 0.4 !GU
RUNNING ASReml (BATCH MODE)

>asreml -<options> <filename> <arguments>

<options> single letters that identify output or job options.
<filename> file “.as” with job details.
<arguments> allows for specific user-defined arguments.

Some options

-c re-start iteration from latest one (continue)

-p calculation of a function of variance components (.pin)
-sm assigns different memory space to job (usually 4, 5, 6, 7 or 8)
-rn renames the file with the argument n (default n = 1)
-n suppresses interactive graphics

Examples
>asreml -rs3c Alfalfa 1
RUNNING ASReml (JOB FILE MODE)

Add commands/arguments in the first line of job file.

Equivalent to using batch mode but useful within ASReml-W

Some options

!RENAME renames the file with the arguments

!ARGS n specifies the arguments (can be more than one)
!NOGRAPHS suppresses interactive graphics
!WORKSPACE w sets workspace to w Mbytes (e.g. 1600)
!CONTINUE re-start iteration from the latest one

Example

!RENAME !ARGS 1 2 !WORKSPACE 1600

Session 4
Breeding Theory

GENETIC VARIATION

Discrete variation
• Different phenotypic classes are easily distinguished among genotypes
• Few genes with large effect (i.e. major genes).

xi ~ Bin(n, p)

Quantitative variation
• No clear classes between genotypes. Corresponds to most economically
important traits in animal and plant breeding.
• Due to the effect of many genes that contribute to the phenotypic
variation. Every gene with a small additive effect, plus some
environmental variation (infinitesimal model, Fisher 1918).

Probability distribution gi ~ N(0, σ2)

PHENOTYPIC VALUE

p=μ+g+e
• Phenotypic value (p) deviates from the mean (μ) because of the genotypic
component (g) and the environmental deviation (e).
• To isolate g we need to test the progeny!!!

g=a+d+i
p=μ+a+d+i+e
a is the additive component, i.e. cumulative effect of the genes or breeding
value (also known as GCA).
d is the dominance deviation, i.e. interaction between alleles or within-locus
interaction (also known as SCA).
i is the epistatic deviation, i.e. between-loci interaction and higher order
interactions.
e is the random deviation or residual.
VARIANCE COMPONENTS

• Partition of the variance is central to quantitative genetics and breeding,
because it is the way we quantify the relative importance of genetic and
environmental influences (e.g. heritability).
• Partition is possible with data where the resemblance among relatives can be
used to estimate genetic variance components.

Vp = Vg + Ve
Vp = Va + Vna + Ve

where, Vna = Vd + Vi is the non-additive variance.

• In the statistical analysis (MM) the genetic variance estimates (e.g. Va) are
obtained by relating them to the causal component (e.g. σa2)
HERITABILITY
Broad sense heritability or degree of genetic determination

H2 = Vg / Vp   How much of the total variation is due to genetic
causes (g). Important when working with clonally
replicated individuals.

Narrow sense heritability

h2 = Va / Vp   Extent to which phenotypes are determined by the
genes transmitted from parents. Determines the degree
of resemblance among relatives. The most important
measure for breeding programs.

Heritabilities vary from 0 to 1 (e.g. 0.5 could be considered high).

Other definitions: family, plot-mean heritabilities and clonal repeatability

BREEDING VALUE (BLUP)
Definition
• The average effect of the parental alleles passed to the offspring determines
the mean genotypic value of its offspring, or
• The genetic value of an individual (or cross) judged by mean value of its
progeny.

- Sum of average effects across loci (theoretical, now molecular).

- Mean value of offspring (practical).

• Not equivalent concepts if interaction between loci is present or if mating is
not at random.

Estimation
• By BLUP (Best Linear Unbiased Predictor), i.e. the prediction of the
random effects from linear mixed models.
BLUP (or EBLUP)

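$$\hat{\mathbf{g}} = \hat{\mathbf{G}}\mathbf{Z}'\hat{\mathbf{V}}^{-1}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})$$

$\hat{\mathbf{g}}$ vector of random effect predictions.
$\hat{\mathbf{G}}\mathbf{Z}' = \mathbf{C}'$ covariance matrix between observations and random
(genetic) effects to be predicted.
$\hat{\mathbf{V}}$ variance-covariance matrix for the observations.
$(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})$ individual observations ‘corrected’ by fixed effects.

$$\hat{\mathbf{g}} = \hat{\mathbf{G}}\mathbf{Z}'\hat{\mathbf{V}}^{-1}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})$$
$$\hat{g}_i = [\sigma_a^2 / \sigma_p^2]\,(y_i - \bar{y})$$
$$\hat{g}_i = h^2\,(y_i - \bar{y}) = \text{Gain}$$

Note: the expression changes depending on what trait is being evaluated (y).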
SELECTION
• All kinds of selection aim to increase the frequency of favourable
alleles at loci influencing the selected trait(s).
• Types: mass, parental, family, combined, indirect, forward, backward.

[Figure: diagram of the base, selected and propagation populations, with the labels "Increase genetic gain" and "Increase diversity"]
SELECTION DIFFERENTIAL (S)
Example
Assuming normal distribution, truncation selection and h2 = 0.4

[Figure: normal distribution with truncation selection; labelled values 25 cm, 35 cm and 29 cm]

S = μselected – μpopulation = 35 – 25 =10 cm

GENETIC GAIN (GA)

• In mass selection, genetic gain can be quantified as the difference between the
average breeding (e.g. additive) values from the selected and original
population, i.e.

$$G_a = \bar{a}_S - \bar{a}_P = h^2 S$$
But $i = S / \sigma_p$, then

$$G_a = h^2 S = i\,h^2\,\sigma_p$$
• Genetic gain depends on the selection intensity (i), heritability (h2) and the
phenotypic standard deviation.
• Here i corresponds to the selection differential
(S = μselected – μpopulation) expressed in units of phenotypic standard deviations.
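The gain formula can be applied to the selection differential example (h2 = 0.4, population mean 25 cm, selected mean 35 cm). In the Python sketch below, the phenotypic standard deviation of 5 cm is a hypothetical value introduced only to show that both forms of the formula agree; it is not given on the slides.

```python
h2 = 0.4            # narrow-sense heritability from the example
mean_pop = 25.0     # population mean (cm)
mean_sel = 35.0     # mean of the selected individuals (cm)

S = mean_sel - mean_pop            # selection differential
gain = h2 * S                      # Ga = h2 * S
expected_progeny_mean = mean_pop + gain

# Same gain through the selection intensity, Ga = i * h2 * sigma_p.
sigma_p = 5.0                      # hypothetical phenotypic SD (cm), assumed here
i = S / sigma_p
gain_from_intensity = i * h2 * sigma_p

print(S, gain, expected_progeny_mean, gain_from_intensity)
```

The expected mean after selection, 29 cm, appears to correspond to the third value shown in the selection differential example.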
TYPE-A CORRELATIONS
Definition: Correlation between traits (pleiotropy)

• Property of genes to influence more than one phenotypic trait.

• It could be negative or positive (-1 to 1).
• Informs about the biological relationships among traits.
• Assists in the selection of ‘good’ individuals by looking into two traits
simultaneously.

$$
r_{gA(p)} = \frac{\mathrm{Cov}(p_1, p_2)}{\sqrt{\mathrm{Var}(p_1)\,\mathrm{Var}(p_2)}}
\qquad
r_{gA(g)} = \frac{\mathrm{Cov}(g_1, g_2)}{\sqrt{\mathrm{Var}(g_1)\,\mathrm{Var}(g_2)}}
$$

Indirect Selection

$$G_{a1} = i_2\, h_1\, h_2\, r_{gA(a)}\, \sigma_{p1}$$
TYPE-B CORRELATIONS
Definition: Correlation between sites
• It is a relative expression of the genotype-by-environment interaction.
• It can be zero or positive (0 to 1).
• A value close to 0 indicates that the ranking in one environment is very
different from the ranking in another environment (i.e. low stability).
• A value close to 1 indicates that a single ranking can be used across all
environments without loss of information (i.e. high stability).
• Vaxs is the estimated variance of the site-by-genotype interaction.
• The following expressions represent the average correlation between sites
(if more than 2 sites are analyzed).

$r_{gB(a)} = \dfrac{V_a}{V_a + V_{axs}}$

$r_{gB(g)} = \dfrac{V_g}{V_g + V_{gxs}}$
Session 5
Genetic Analyses:
Parental Models

GENETIC MODELS

Parental Models
• Half-sib crosses / sire model.
– One parent known. Parent selection.
• Full-sib crosses model.
– Both parents known. Parent/cross selection. Add and Dom effects estimable.
• Family model.
– Both parents known. Cross selection. Add and Dom effects confounded.
• Clonal model.
– Clonally replicated individuals. Parent/cross/individual selection.

Individual Models
• Animal model.
– One or two parents known. Individual/parent selection.
• Reduced animal model.
– One or two parents known. Individual/parent selection (only individuals with
records).
HALF-SIB / SIRE MODEL

General aspects
• One parent is known (mother, sire, variety).
• The other parent is assumed to be unknown and to mate at random.
• Only additive component (Va) can be estimated.
• Useful for selection of parents (backward selection).
• Parental pedigree can (and should) be incorporated.
• Runs faster than other models (e.g. animal model).

Difficulties
• Concern about situations under non-random mating.
• Selection does not capture non-additive genetic variability.
HALF-SIB / SIRE MODEL

$y = X\beta + Z_1 b + Z_2 s + e$
y vector of observations
β vector of fixed effects
b vector of random design effects (e.g. block or plot effect), ~ N(0, Iσ2b)
s vector of random sire effects (i.e. ½ breeding value), ~ N(0, Aσ2s)
e vector of random residual effects, ~ N(0, Iσ2)

X, Z1 and Z2 are incidence matrices

A is the numerator relationship matrix for sires. Replace by I if no pedigree.
I is an identity matrix

Va = 4 σ2s Vp = σ2s + σ2
h2 = Va / Vp = 4 σ2s / [σ2s + σ2]
OPEN POLLINATION
Example: /Day1/OpenPol/OPENPOL.txt
In a tree genetics study, seed was collected from a total of 28 female parents
obtained by mass selection and tested in an RCBD together with 3 control female
parents. The experiment consisted of 10 replicates with 34 plots each, of size 2 x 3.
The response variables of interest are total height (HT, cm) and diameter at breast
height (DBH, cm). For now we will concentrate on the response HT. The objective is
to rank the female parents for future selection and seed production. In this analysis
parental pedigree will be ignored. Note that a model can be fitted with or without
the controls included as parents.
ID REP PLOT FEMALE TYPE DBH HT
1 1 1 FEM1 Test 23.8 12.4
2 1 1 FEM1 Test 24.4 12.1
3 1 1 FEM1 Test 25.4 10.9
4 1 1 FEM1 Test 28.0 12.7
5 1 1 FEM1 Test 20.9 11.9
6 1 1 FEM1 Test 22.6 11.2
7 1 2 FEM15 Test 22.4 10.7
8 1 2 FEM15 Test 21.9 11.6
9 1 2 FEM15 Test 20.8 11.3
...
OPEN POLLINATION
Example: /Day1/OpenPol/OpenPol_.as
!RENAME !ARGS 1
Open pollination trial
ID
REP 10 !I
PLOT 34 !I
FEMALE 31 !A !SORT
TYPE 2 !A !SORT
DBH
HT
OPENPOL.TXT !SKIP 1

!MAXIT 40 !DISPLAY 2 !DOPART $A

!PART 1
HT ~ mu REP !r FEMALE REP.PLOT
predict FEMALE
OPEN POLLINATION
Interpreting variance components
Source Model terms Gamma Component Comp/SE % C
FEMALE 31 31 0.192379 0.196155 3.48 0 P
REP.PLOT 340 340 0.518915E-01 0.529102E-01 2.58 0 P
Variance 1876 1866 1.00000 1.01963 27.74 0 P

fi ~ N[0, σf2] sf2 = 0.196
pij ~ N[0, σp2] sp2 = 0.053
eijk ~ N[0, σ2] s2 = 1.020

Va = 4 s2f = 4 x 0.196 = 0.785

Vp = s2f + sp2 + s2 = 0.196 + 0.053 + 1.020 = 1.269
h2 = Va / Vp = 0.785 / 1.269 = 0.619

Extract solutions for every parent and rank!!! (.sln file)
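
The arithmetic on this slide can be reproduced directly from the Component column of the output. The following is a minimal sketch; the function name is ours and is not part of ASReml.

```python
def half_sib_summary(var_female, var_plot, var_residual):
    """Return (Va, Vp, h2) for the open-pollinated model with a plot term."""
    va = 4.0 * var_female                        # Va = 4 * s2f
    vp = var_female + var_plot + var_residual    # Vp = s2f + s2p + s2
    return va, vp, va / vp


# Components copied from the 'Component' column of the output above.
va, vp, h2 = half_sib_summary(0.196155, 0.0529102, 1.01963)
print(f"Va = {va:.3f}, Vp = {vp:.3f}, h2 = {h2:.3f}")
```

Using the unrounded components gives a heritability that differs from the slide value (0.619, obtained from components rounded to three decimals) only in the third decimal.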

FULL-SIB MODELS
General aspects
• Both parents are known (mother, father, family or cross).
• Mating is often planned (e.g. diallels).
• Additive and dominance component (Va and Vd) can be estimated.
• Some designs also allow estimation of common environment effects, reciprocal effects, etc.
• Useful for selection of parents (backward selection) or specific crosses.
• Increased gain as dominance effects can be ‘captured’.
• Parental pedigree can be incorporated.

Difficulties
• Dominance effects usually estimated with low precision, or confounded with
other effects.
• Better results obtained with a proper planning of crosses (e.g. connected
diallels).
• Need to check connectivity and number of crosses per parent (male and
female) otherwise this model cannot be fitted.
FULL-SIB: CLASSIC APPROACH

$y = X\beta + Z_1 b + Z_2 m + Z_3 f + Z_4 mf + e$

β vector of fixed effects (e.g. μ, replicate)
b vector of random design effects (e.g. block or plot effect), ~ N(0, Iσ2b)
m vector of random male effects (i.e. ½ BV), ~ N(0, Aσ2m)
f vector of random female effects (i.e. ½ BV), ~ N(0, Aσ2f)
mf vector of random interaction male by female effects, ~ N(0, Iσ2mf)
e vector of random residual effects, ~ N(0, Iσ2)

Va = 2 (σ2m + σ2f) or Va = 4 σ2m (when σ2m = σ2f)

Vd = 4 σ2mf
Vp = σ2m + σ2f + σ2mf + σ2
h2 = Va / Vp = [2 (σ2m + σ2f)] / [σ2m + σ2f + σ2mf + σ2]
d2 = Vd / Vp = 4 σ2mf / [σ2m + σ2f + σ2mf + σ2]
FULL-SIB: CLASSIC
Example: /Day1/ContPol/CONTPOL.txt
A total of 177 families and 8 checklots were planted in a test using a RCBD with 25
blocks. For all families planted both parents are known. In this analysis parental
pedigree will be ignored. The objective is to estimate the different variance
components, and calculate heritabilities for the response variable YIELD.
REP FAMILY FEMALE MALE YIELD
1 FAM007 PAR0001 PAR0024 128.68
1 FAM163 PAR0059 PAR0041 119.462
1 C10 C10 PAR0043 .
1 FAM040 PAR0020 PAR0053 103.641
1 FAM114 PAR0051 PAR0001 .
1 FAM053 PAR0032 PAR0032 .
1 FAM048 PAR0031 PAR0018 .
1 FAM057 PAR0033 PAR0035 155.226
1 FAM120 PAR0051 PAR0051 .
1 FAM165 PAR0059 PAR0059 193.982
1 FAM133 PAR0053 PAR0009 184.308
1 FAM057 PAR0035 PAR0033 .
1 C30 C30 PAR0043 141.912
1 FAM082 PAR0044 PAR0006 288.692
1 FAM060 PAR0034 PAR0037 .
1 FAM169 PAR0015 PAR0024 245.664
1 FAM047 PAR0031 PAR0016 .
...
FULL-SIB: CLASSIC
Example: /Day1/ContPol/ContPol_.as
!RENAME !ARG 1
Control Crosses trial
REP 25 !I
FAMILY 182 !A
FEMALE 54 !A
MALE 57 !A
YIELD
ANALYSIS
FULLSIB
CHECKLOT
CONTPOLL2.TXT !SKIP 1

!MAXIT 40 !SKIP 1 !DISPLAY 2 !DOPART $A

!PART 1
YIELD ~ mu REP !r FEMALE MALE FEMALE.MALE
FULL-SIB
Interpreting variance components
Source Model terms Gamma Component Comp/SE % C
FEMALE 54 54 0.775232E-01 295.857 2.32 0 P
MALE 57 57 0.826941E-01 315.590 2.28 0 P
FAMILY 182 182 0.251902 961.350 6.07 0 P
Variance 3879 3854 1.00000 3816.36 42.73 0 P

fi ~ N[0, σf2] sf2 = 295.9

mj ~ N[0, σm2] sm2 = 315.6
Fij ~ N[0, σF2] sF2 = 961.4
eijk ~ N[0, σ2] s2 = 3816.4

Va = 2 (s2f + s2m) = 2 (295.9 + 315.6) = 1223.0

Vd = 4 s2F = 4×(961.4) = 3845.6
Vp = s2f + s2m + s2F + s2 = 295.9 + 315.6 + 961.4 + 3816.4 = 5389.3
h2 = Va / Vp = 1223.0 / 5389.3 = 0.23
d2 = Vd / Vp = 3845.6 / 5389.3 = 0.71

Extract solutions for every parent and family and rank!!! (.sln file)
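
As a cross-check of the figures above, the same calculation can be scripted from the Component column of the output. This is a sketch only; the function and dictionary keys are our own names.

```python
def full_sib_summary(var_female, var_male, var_family, var_residual):
    """Additive and dominance variances and ratios for the classic full-sib model."""
    va = 2.0 * (var_female + var_male)                       # Va = 2 (s2f + s2m)
    vd = 4.0 * var_family                                    # Vd = 4 s2F
    vp = var_female + var_male + var_family + var_residual   # Vp
    return {"Va": va, "Vd": vd, "Vp": vp, "h2": va / vp, "d2": vd / vp}


# Components copied from the 'Component' column of the output above.
result = full_sib_summary(295.857, 315.590, 961.350, 3816.36)
for name, value in result.items():
    print(f"{name} = {value:.2f}")
```

The ratios agree with the slide values (h2 = 0.23, d2 = 0.71) to two decimals.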
FAMILY MODEL (Optional)
General aspects
• More common in animal breeding
• Occurs when parents are only present in a single cross.
• Parents might, or might not, be known.
• Additive and dominance components (Va and Vd) cannot be separated, unless
there is a well-connected parental pedigree.
• Useful for family selection or forward selection.
• Of practical use when dominance variance is known to be negligible.

Difficulties
• Dominance effects are confounded with additive effects.
• Potentially it could over-estimate future genetic gain.
FAMILY MODEL (Optional)

$y = X\beta + Z_1 b + Z_2 F + e$
β vector of fixed effects (e.g. μ, replication)
b vector of random design effects (e.g. block or plot effect), ~ N(0, Iσ2b)
F vector of random family effects, ~ N(0, Aσ2F) or N(0, Iσ2F)
e vector of random residual effects, ~ N(0, Iσ2)

σ2F = Va/2 + Vd/4

Vp = σ2F + σ2
h2cross = Vfamily / Vp = σ2F / [σ2F + σ2]

Va and Vd cannot be separated unless we assume that Vd = 0

If Vd = 0 then Va = 2 σ2F
h2 = Va / Vp = 2 σ2F / [σ2F + σ2]
FAMILY MODEL (Optional)
Example: /Day1/FamilyModel/FISHF.txt
A total of 459 fish were derived from single-pair crosses of 32 sires
and 32 females, generating 32 families. The number of individuals per family varied
from 2 to 40. The idea is to rank the families and progeny for selection using the
variable WEIGHT.
ID SireID DamID Family Weight
1001 120 125 22 88.3
1002 120 125 22 84.9
1003 120 125 22 76.8
1004 121 114 23 95.4
1005 121 114 23 85.4
1006 121 114 23 74.8
1007 121 114 23 103.4
1008 121 114 23 78.7
1009 121 114 23 109.5
1010 121 114 23 113.1
1011 121 114 23 95.4
1012 121 114 23 91.1
1013 121 114 23 85.4
1014 121 114 23 85.4
1015 121 114 23 86.0
...
FAMILY MODEL (Optional)
Example: /Day1/FamilyModel/FishF_.as
!RENAME !ARGS 1
Family fish experiment
ID
SireID 32 !A
DamID 32 !A
Family 32 !A !SORT
Weight
Fish_Family.txt !SKIP 1

!MAXIT 40 !DOPART $A

!PART 1
Weight ~ mu !r Family
FAMILY MODEL (Optional)
Interpreting variance components
Source Model terms Gamma Component Comp/SE % C
Family 32 32 0.768666E-01 8.12079 1.98 0 P
Variance 459 458 1.00000 105.648 14.71 0 P

Fi ~ N[0, σF2] sF2 = 8.12

eijk ~ N[0, σ2] s2 = 105.65

Vfamily = s2F = 8.12

Vp = s2F + s2 = 8.12 + 105.65 = 113.77
h2cross = Vfamily / Vp = 8.12 / 113.77 = 0.071

Extract solutions for every family and rank!!! (.sln file)

CLONAL MODEL (Optional)

General aspects
• It can estimate total genetic variability (Vg).
• If both parents are known (mother, father, family or cross) then the additive,
dominance and epistasis components (Va, Vd and Vi) can be reasonably
estimated.
• Useful for selection of parents (backward selection), crosses or specific
genotypes.
• Allows capturing, in new generations, additive, dominance and epistasis
effects.

Difficulties
• Presents same difficulties as full-sib models.
• Some confounding of the epistasis component occurs (higher order terms).
• Occasionally produces negative causal variance components.
CLONAL MODEL (Optional)

$y = X\beta + Z_1 b + Z_2 m + Z_3 f + Z_4 mf + Z_5 mf.c + e$

β and b as defined before
m vector of random male effects, ~ N(0, Aσ2m)
f vector of random female effects, ~ N(0, Aσ2f)
mf vector of random interaction male by female effects, ~ N(0, Iσ2mf)
mf.c vector of random clonal within family effects, ~ N(0, Iσ2c)
e vector of random residual effects, ~ N(0, Iσ2)
Va = 2 (σ2m + σ2f) or Va = 4 σ2m (when σ2m = σ2f)
Vd = 4 σ2mf Vi = σ2c – (σ2m+ σ2f) – 3 σ2mf (approx.)
Vg = Va + Vd + Vi
Vp = σ2m + σ2f + σ2mf + σ2c + σ2
H2 = Vg / Vp h2 = Va / Vp d2 = Vd / Vp
CLONAL MODEL (Optional)
Example: /Day1/Clonal/CLONES.txt
A clonal test derived from a total of 61 families, crossed in a circular mating
design, was established in a field trial with 3 replicates and incomplete blocks.
Each family has several clones. The objective of this study is to estimate all
variance components (additive, dominance and epistasis).

IDSORT FamilyID Female Male cloneid Rep IncBlock Tree VOL

1 46 Par927 Par931 677 1 1 1 537.7436
2 33 Par908 Par914 476 1 1 2 492.1155
3 53 Par924 Par907 775 1 1 3 704.826
4 41 Par913 Par917 608 1 1 4 494.6012
6 27 Par923 Par905 391 1 2 1 622.0541
7 14 Par925 Par908 192 1 2 2 425.1107
8 22 Par913 Par923 304 1 2 3 298.8255
9 11 Par929 Par920 144 1 2 4 513.8072
11 23 Par901 Par924 320 1 3 1 457.7191
12 60 Par929 Par904 838 1 3 2 709.3598
15 12 Par917 Par921 162 1 3 5 *
16 53 Par924 Par907 763 1 4 1 392.4941
17 13 Par901 Par916 179 1 4 2 463.7218
19 24 Par915 Par904 340 1 4 4 445.3584
20 40 Par922 Par917 592 1 4 5 623.984
21 30 Par904 Par903 424 1 5 1 439.2273
...
CLONAL MODEL (Optional)
Example: /Day1/Clonal/Clonal_.as
!RENAME !ARGS 1
Clonal Analysis of Pinus
IDSORT
FAMILY 61 !A
FEMALE 44 !P
MALE 44 !P
CLONE 868 !A
REP 3 !A
IBLOCK 110 !A
TREE
VOL
PEDPAR.TXT !SKIP 1 !MAKE !ALPHA
CLONES.TXT !SKIP 1

!MAXIT 50 !DISPLAY 2 !DOPART $A

!PART 1
VOL ~ mu REP !r REP.IBLOCK FEMALE MALE FAMILY CLONE !f mv

!PART 2
VOL ~ mu REP !r REP.IBLOCK FEMALE and(MALE) FAMILY CLONE !f mv
CLONAL MODEL (Optional)
Interpreting variance components

Different var. comp. for Male and Female

Source Model terms Gamma Component Comp/SE % C
FEMALE 44 44 0.100569 1769.13 2.04 0 P
MALE 44 44 0.218970E-01 385.195 0.70 0 P
FAMILY 61 61 0.433857E-01 763.208 1.13 0 P
REP.IBLOCK 330 330 0.350074E-06 0.615822E-02 0.00 0 B
CLONE 868 868 0.427149 7514.07 8.42 0 P
Variance 2604 1766 1.00000 17591.2 22.65 0 P

Same var. comp. for Male and Female

Source Model terms Gamma Component Comp/SE % C
FAMILY 61 61 0.294966E-01 518.846 1.12 0 P
REP.IBLOCK 330 330 0.353801E-06 0.622336E-02 0.00 0 B
FEMALE 44 44 0.714393E-01 1256.62 2.51 0 P
CLONE 868 868 0.428337 7534.46 8.44 0 P
Variance 2604 1766 1.00000 17590.0 22.65 0 P
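
The causal components for the clonal model can be derived from the first table (different variance components for male and female) by applying the formulas given on the model slide. The sketch below uses our own function name, and the resulting values are our arithmetic rather than results reported in the workshop. The design term REP.IBLOCK is left out of Vp, following the Vp formula on the model slide.

```python
def clonal_summary(var_female, var_male, var_family, var_clone, var_residual):
    """Apply the clonal-model formulas: Va, Vd, Vi (approx.), Vg, Vp and ratios."""
    va = 2.0 * (var_female + var_male)                             # Va = 2 (s2m + s2f)
    vd = 4.0 * var_family                                          # Vd = 4 s2mf
    vi = var_clone - (var_female + var_male) - 3.0 * var_family    # Vi (approx.)
    vg = va + vd + vi                                              # Vg = Va + Vd + Vi
    vp = var_female + var_male + var_family + var_clone + var_residual
    return {"Va": va, "Vd": vd, "Vi": vi, "Vg": vg, "Vp": vp,
            "H2": vg / vp, "h2": va / vp, "d2": vd / vp}


# Components copied from the first table above (FEMALE, MALE, FAMILY, CLONE, Variance).
clonal = clonal_summary(1769.13, 385.195, 763.208, 7514.07, 17591.2)
for name, value in clonal.items():
    print(f"{name} = {value:.3f}")
```

Because Vi is obtained by subtraction, it inherits the sampling error of several components, which is one reason why negative causal components occasionally appear in this model.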
Session 6
Genetic Analyses:
Animal Models

GENETIC MODELS

• Why worry about the pedigree in genetic analyses?

Statistically, random genetic effects (i.e. BLUPs) are not independent and
their matrix of correlations or co-variances (G or A) needs to be specified.
Genetically, it is important to consider information about relatives as they
will share some alleles, and therefore their response is correlated.
• How to incorporate this information?
Genetic relationships can be calculated using genetic theory (expected
values) or molecular information (e.g. SNPs), and included in the linear
mixed model by specifying a pedigree file.
• Are there other benefits?
Many. It is a more efficient use of the information about individuals; in addition, the
genetic values of individuals that were not tested, but have tested relatives, can be
predicted and used for selection.
PEDIGREE
Example
Pedigree of a group of individuals:

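[Figure: pedigree diagram in which individuals 1 and 2 produce 3, individual 1 and an unknown parent (?) produce 4, individuals 4 and 3 produce 5, and individuals 5 and 2 produce 6.]

Individual Male Female
3 1 2
4 1 Unknown
5 4 3
6 5 2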
PEDIGREE
Numerator relationship matrix (A)
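Rows and columns are ordered by individual (1 to 6). The matrix is symmetric, so only the upper triangle is shown.

$$
A = \begin{bmatrix}
1.00 & 0.00 & 0.50 & 0.50 & 0.50 & 0.25 \\
 & 1.00 & 0.50 & 0.00 & 0.25 & 0.625 \\
 & & 1.00 & 0.25 & 0.625 & 0.563 \\
 & & & 1.00 & 0.625 & 0.313 \\
 & & & & 1.125 & 0.688 \\
 & & & & & 1.125
\end{bmatrix}
$$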

• Linked to the concept of identity by descent.
• Diagonal aii = 1 + Fi (inbreeding coefficient on individual i)
Twice the probability that two gametes taken at random from animal i will
carry identical alleles by descent.
• Off-diagonal aij numerator of the coefficient of relationship between animal
i and j.
• Several algorithms are available in ASReml to obtain this matrix.
PEDIGREE
CALCULATING THE A MATRIX

• Let A = {aij} be the relationship matrix.
• Let ai,-i be the i-th row of A excluding its i-th (diagonal) element.
• Assume the relationship matrix for the base animals is known (e.g.
unrelated, non-inbred). This will form a base matrix (e.g. identity).
• The row of the relationship matrix for the progeny i of two parents (sire s and dam d) is
generated as the average of the relationship matrix rows of the parents:

ai,-i = (as,-i + ad,-i)/2

• The diagonal element, ai,i, of this new individual is:

ai,i = 1 + as,d/2 = 1 + Fi

where Fi is the inbreeding coefficient.
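
The recursive rule can be sketched in a few lines of Python and applied to the six-individual pedigree used in this session (0 marks an unknown parent). This is an illustration of the tabular method only; ASReml itself builds the inverse of A directly, and the function name below is ours.

```python
def relationship_matrix(pedigree):
    """Build A from (individual, sire, dam) rows listed parents-before-progeny.

    A parent coded 0 (or any ID not in the pedigree) is treated as unknown.
    """
    n = len(pedigree)
    index = {ind: k for k, (ind, _, _) in enumerate(pedigree)}
    A = [[0.0] * n for _ in range(n)]
    for ind, sire, dam in pedigree:
        i = index[ind]
        s = index.get(sire)   # None when the sire is unknown
        d = index.get(dam)    # None when the dam is unknown
        # Off-diagonals: average of the parental rows.
        for j in range(i):
            a_sj = A[s][j] if s is not None else 0.0
            a_dj = A[d][j] if d is not None else 0.0
            A[i][j] = A[j][i] = 0.5 * (a_sj + a_dj)
        # Diagonal: 1 + F_i, with F_i equal to half the parents' relationship.
        a_sd = A[s][d] if s is not None and d is not None else 0.0
        A[i][i] = 1.0 + 0.5 * a_sd
    return A


pedigree = [(1, 0, 0), (2, 0, 0), (3, 1, 2), (4, 1, 0), (5, 4, 3), (6, 5, 2)]
A = relationship_matrix(pedigree)
for row in A:
    print("  ".join(f"{value:6.4f}" for value in row))
```

Individuals 5 and 6 have diagonal elements above 1 because their parents are related, so they are inbred.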

PEDIGREE FILE

Graphically
[Figure: pedigree diagram of individuals 1 to 6, with one unknown parent (?) for individual 4.]

In ASReml
Indiv Male Female
1 0 0
2 0 0
3 1 2
4 1 0
5 4 3
6 5 2

Analysis Trial AB23
Indiv 6 !P
Sire 3 !A
Dam 2 !A
Sex 2 !I
weight
PEDIGREE.PED !SKIP 1
DATA.DAT !SKIP 1
weight ~ mu Sex !r Indiv
PEDIGREE FILE
In ASReml
• Pedigree file can be part of the data file
(first 3 columns: individual, parent1 and parent2).
• The method used to construct the A inverse is based on the algorithm of
Meuwissen and Luo (1992).
• Genetic groups can be defined here.

Some useful options

!MAKE always generates the A inverse (instead of using a stored one).
!ALPHA accepts alphanumeric names of individuals.
!REPEAT ignores repeated individuals/entries in the pedigree file.
!GIV writes the A inverse matrix in ASCII format (.giv).
!INBRED generates pedigree for inbred lines.
!SELF s allows for partial selfing according to variable s.
!GROUPS g includes genetic groups in the pedigree according to variable g.
PEDIGREE FILE
Construction / Check
• Pedigree information is associated with proper management and
validation/check of data.
• Individuals need to be ordered by generation (e.g. parents need to be
defined before progeny).
• All parents need to be defined in pedigree file (the inclusion of founder
parents is optional).
• All individuals present in dataset (i.e. levels associated with pedigree file)
need to be defined in pedigree file.
• Individuals can appear as both male and female parents (this should be
checked when it is not biologically possible).
ANIMAL / INDIVIDUAL MODEL

General aspects
• Requires defining individual and parental pedigree.
• A breeding value (or GCA) is obtained for each individual in the dataset,
and for all individuals (e.g. parents) in pedigree file.
• Typically used to estimate the additive component (Va) only, but it can be
extended to non-additive and maternal effects.
• Useful for selection of individuals based on additive values (forward
selection) but can also be used to select parents.
• GCA values (or EBVs) of parents will be proportional to those from a parental model.

Difficulties
• For large datasets it can be computationally costly.
• Pedigree file could be difficult to construct/maintain and it needs to be
checked carefully.
ANIMAL / INDIVIDUAL MODEL

$y = X\beta + Z_1 b + Z_2 a + e$
β vector of fixed effects
b vector of random design effects (e.g. block effect), ~ N(0, Iσ2b)
a vector of random additive effects (i.e. BV), ~ N(0, Aσ2a)
e vector of random residual effects, ~ N(0, Iσ2)

Va = σ2a
Vp = σ2a + σ2
h2 = Va / Vp = σ2a / [σ2a+ σ2]

Note: every individual included in the pedigree file will have a
prediction of its breeding value (even those that were not measured).
ANIMAL / INDIVIDUAL MODEL
Example: /Day1/Fish/FISH.txt
The dataset for a fish breeding program contains a total of 933 records of fish.
The objective is to fit an animal model that considers the complete pedigree. The
parental pedigree is found in the file PEDPAR.txt, but an individual pedigree
needs to be constructed. For fitting the model consider the factor SEX as a
covariate. The response of interest is days to market size (DAYSM).

INDIV Sire Dam DaysM Sex Market

1001 564 727 741.46 1 1
1002 564 727 500.09 2 1
1003 564 727 495.07 1 1
1004 564 727 506.25 2 1
1005 564 727 593.21 2 1
1006 564 727 671.10 1 1
1007 564 727 523.48 1 1
1008 564 727 531.33 1 1
1009 564 727 446.02 2 1
1010 564 727 599.20 1 0
1011 564 727 509.38 2 0
...
ANIMAL / INDIVIDUAL MODEL
Example: /Day1/Fish/Fish_.as
!RENAME !ARGS 1
Breeding Program Fish
INDIV 2040 !P !SORT
SIRE 100 !I
DAM 100 !I
DAYSM
SEX 2 !I
MARKET
PEDIND.TXT !SKIP 1 !MAKE
FISH.TXT !SKIP 1

!MAXIT 40 !DISPLAY 2 !FCON !DOPART $A

!PART 1
DAYSM ~ mu SEX !r INDIV
ANIMAL / INDIVIDUAL MODEL
Source Model terms Gamma Component Comp/SE % C
INDIV 1380 1380 0.584596 2046.39 4.52 0 P
Variance 933 931 1.00000 3500.52 10.21 0 P

Wald F statistics
Source of Variation NumDF DenDF_con F-inc F-con M P-con
7 mu 1 77.6 15677.14 15677.14 . <.001
5 SEX 1 888.2 21.88 21.88 A <.001

Va = s2a = 2046.39 Vp = s2a + s2 = 2046.39 + 3500.52 = 5546.91

h2 = Va / Vp = 0.369

SEX 1 0.000 0.000

SEX 2 21.57 4.612
mu 1 549.8 5.172
INDIV 501 6.527 37.33
INDIV 502 6.074 35.14
INDIV 503 -27.03 36.32
INDIV 504 -23.94 37.53
INDIV 505 0.6396 35.30
INDIV 506 7.579 38.26
INDIV 507 -8.798 35.33
...
ANIMAL / INDIVIDUAL MODEL
Breeding Program Fish

Ecode is E for Estimable, * for Not Estimable

The predictions are obtained by averaging across the hypertable

calculated from model terms constructed solely from factors
in the averaging and classify sets.
Use !AVERAGE to move ignored factors into the averaging set.

---- ---- ---- ---- ---- ---- 1 ---- ---- ---- ---- ---- ----
Predicted values of DAYSM
The SIMPLE averaging set: SEX

INDIV Predicted_Value Standard_Error Ecode

501 567.1392 37.3393 E
502 566.6863 35.0927 E
503 533.5860 36.4141 E
504 536.6737 37.5067 E
505 561.2515 35.2528 E
506 568.1914 38.2210 E
507 551.8138 35.2626 E
508 526.8242 36.6684 E
509 525.0169 37.9278 E
510 523.4792 37.2501 E
511 616.2975 36.1484 E
512 563.8451 37.7190 E
513 541.1283 38.2338 E
514 532.9948 37.0123 E
515 541.1283 38.2338 E
516 538.0093 38.6922 E
105 586.5505 40.6930 E
1 586.5505 40.6930 E
...
ANIMAL / INDIVIDUAL MODEL
Additional comments
• When pedigree is available for several generations, including more than 3
generations usually does not produce a significant improvement in the precision of
estimates.
• Incorporation of genetic groups is critical in order to account for previously
achieved genetic gains, and to describe the proper structure of the data.
• The reduced animal model (RAM) is an alternative that runs faster, as only
animals with records are considered.
• Other variants of the animal model exist that consider:
– Environmental effects
– Maternal effects
– Genetic maternal effects
– Non-additive genetic effects (mainly dominance)
– Common environment effects
COMMON ENVIRON. EFFECTS

$y = X\beta + Z_1 b + Z_2 a + Z_3 ce + e$

β vector of fixed effects
b vector of random design effects (e.g. block effect), ~ N(0, Iσ2b)
a vector of random additive effects (i.e. BV), ~ N(0, Aσ2a)
ce vector of random common environmental effects, ~ N(0, Iσ2ce)
e vector of random residual effects, ~ N(0, Iσ2)

Va = σ2a
Vp = σ2a + σ2ce + σ2
h2 = Va / Vp = σ2a / [σ2a+ σ2ce + σ2]

Note: common environment effects are non-genetic effects that cause
resemblance between members of the same family.
Session 7
Variance Structures in
ASReml

VARIANCE STRUCTURES
Direct Product
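• Variance structures are specified using direct products of two or more
matrices (⊗, the Kronecker product).

$$
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\qquad
A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B \\ a_{21}B & a_{22}B \end{bmatrix}
$$

Example

$$
A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\qquad
B = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix}
$$

$$
A \otimes B = \begin{bmatrix}
\sigma_1^2 & \sigma_{12} & 0 & 0 & 0 & 0 \\
\sigma_{12} & \sigma_2^2 & 0 & 0 & 0 & 0 \\
0 & 0 & \sigma_1^2 & \sigma_{12} & 0 & 0 \\
0 & 0 & \sigma_{12} & \sigma_2^2 & 0 & 0 \\
0 & 0 & 0 & 0 & \sigma_1^2 & \sigma_{12} \\
0 & 0 & 0 & 0 & \sigma_{12} & \sigma_2^2
\end{bmatrix}
$$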
VARIANCE STRUCTURES
Direct Sum
• The desired matrix is specified by several square matrices arranged in a block
diagonal matrix (⊕, the direct sum).

Example

$$
R = \oplus_{j=1}^{3} R_j = \mathrm{diag}(A_1, A_2, A_3) =
\begin{bmatrix} A_1 & 0 & 0 \\ 0 & A_2 & 0 \\ 0 & 0 & A_3 \end{bmatrix}
$$
ALFALFA EXPERIMENT
Example: /Day2/VarStruct/AlfalfaS_.as
An experiment was established to compare 12 alfalfa varieties (labeled A-L).
These come from 3 different sources, but the objective is to estimate the
heritability of varieties regardless of their source. A total of 6 plots per variety
were established, arranged in an RCB design. The response variable
is yield (tons/acre) at harvest time. It is of interest to fit a linear
model with a specific error variance for each of the different sources.
Alfalfa experiment - 12 varieties - Response Yield
Source 3 !I
Variety 12 !A !SORT
Block 6 !I
yield
ALFALFAS.TXT !SKIP 1 !DISPLAY 7 !SUMMARY

yield ~ mu Block !r Variety

3 1 0
24 0 ID
24 0 ID
24 0 ID
ALFALFA EXPERIMENT
Interpreting output
Source Model terms Gamma Component Comp/SE % C
Residual 72 66
Variety 12 12 0.222267E-01 0.222267E-01 1.64 0 P
Variance 0 0 0.928105E-01 0.928105E-01 2.93 0 P
Variance 0 0 0.602051E-01 0.602051E-01 2.99 0 P
Variance 0 0 0.146949E-01 0.146949E-01 2.49 0 P

Wald F statistics
Source of Variation NumDF DenDF F-inc P-inc
5 mu 1 9.9 990.35 <.001
3 Block 5 25.0 24.05 <.001

Variety A 0.1685 0.1000

Variety B -0.1668 0.1000
Variety C 0.4365E-01 0.1000
Variety D 0.1361 0.1000
Variety E 0.1211 0.9013E-01
Variety F -0.1983 0.9013E-01
Variety G -0.8737E-02 0.9013E-01
Variety H 0.3721E-01 0.9013E-01
Variety I 0.1027 0.6540E-01
Variety J -0.1585 0.6540E-01
Variety K -0.9548E-01 0.6540E-01
Variety L 0.1860E-01 0.6540E-01
VARIANCE STRUCTURES
Variance models (VCODE)
VCODE Description Number of parameters (w = dimension of the matrix)
Common structures
ID Identity 1
DIAG Diagonal w
US Unstructured w(w + 1)/2
AINV Numerator relationship matrix (A) 0 or 1
CORU Uniform correlation 1

Correlation/Spatial structures
CORB Banded correlation w-1
AR1 First order autoregressive 1
AR2 Second order autoregressive 2
ARMA Autoregressive and moving average 2
CORG General correlation (homogeneous) w(w - 1)/2
ANTE1 Antedependence of order 1 w(w - 1)/2
LVR Linear variance 1
VARIANCE STRUCTURES
Correlation-variance structures (homogeneous)
AR1V First order autoregressive (homogeneous) 2
CORUV Uniform correlation (homogeneous) 2
CORBV Banded correlation (homogeneous) w
CORGV General correlation (homogeneous) w(w - 1)/2 + 1

Heterogeneous structures
IDH = DIAG Identity (heterogeneous) w
AR1H First order autoregressive (heterogeneous) 1+w
CORUH Uniform correlation (heterogeneous) 1+w
CORBH Banded correlation (heterogeneous) 2w - 1
CORGH = US General correlation (heterogeneous) w(w - 1)/2 + w

Special structures
IEXP Isotropic Exponential 1
AEXP Anisotropic Exponential 2
OWNk User supplied G matrix k
GIVk User supplied General (Inverse) matrix 0 or 1
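
To make the parameter counts concrete, the sketch below builds the AR1 correlation matrix (one parameter) and its AR1V version (correlation plus a common variance, two parameters) for w = 4. The function names and the values ρ = 0.5 and variance 2.0 are hypothetical choices for illustration.

```python
def ar1_correlation(w, rho):
    """AR1 correlation matrix: element (i, j) equals rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(w)] for i in range(w)]


def ar1v(w, rho, variance):
    """AR1V: the AR1 correlation matrix scaled by a single common variance."""
    return [[variance * c for c in row] for row in ar1_correlation(w, rho)]


corr = ar1_correlation(4, 0.5)       # hypothetical rho
cov = ar1v(4, 0.5, 2.0)              # hypothetical rho and variance
for row in corr:
    print("  ".join(f"{value:5.3f}" for value in row))
```

The correlation decays with the distance between positions, which is why this structure is common for ordered levels such as rows, columns or repeated measurements.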
VARIANCE STRUCTURES
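ID: identity

$$
\sigma^2 \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
=
\begin{bmatrix} \sigma^2 & 0 & 0 & 0 \\ 0 & \sigma^2 & 0 & 0 \\ 0 & 0 & \sigma^2 & 0 \\ 0 & 0 & 0 & \sigma^2 \end{bmatrix}
$$

AR1V: autocorrelation 1st order

$$
\sigma^2 \begin{bmatrix} 1 & \rho & \rho^2 & \rho^3 \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho^3 & \rho^2 & \rho & 1 \end{bmatrix}
$$

AR1: autocorrelation 1st order

$$
\begin{bmatrix} 1 & \rho & \rho^2 & \rho^3 \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho^3 & \rho^2 & \rho & 1 \end{bmatrix}
$$

CORG: general correlation

$$
\begin{bmatrix} 1 & \rho_{12} & \rho_{13} & \rho_{14} \\ \rho_{12} & 1 & \rho_{23} & \rho_{24} \\ \rho_{13} & \rho_{23} & 1 & \rho_{34} \\ \rho_{14} & \rho_{24} & \rho_{34} & 1 \end{bmatrix}
$$

DIAG: diagonal
CORUV: uniform correlation
CORUH: uniform heterogeneous
US: unstructured
[Figure: matrix illustrations for DIAG, CORUV, CORUH and US.]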
VARIANCE STRUCTURES
Variance Header Line
• Required whenever random effects or residuals are not identically and
independently distributed.
<sections> <dimensions> <number of G structures>

<sections>
• Number of residual (Rj) structures to define.

Example. If several experiments are combined into a single analysis, then
each experiment will have an error structure with its own variance:

$R = \oplus_{j=1}^{s} R_j$

However, it is also possible to define each error structure with a direct product:

$R_j = R_{j1} \otimes R_{j2}$
VARIANCE STRUCTURES
Variance Header Line
<dimensions>
• Number of direct product of variance structures that are required to define
each of the residual, Rj, structures.

Example. A spatial analysis will have an error structure defined by two
elements: correlations across rows and correlations across columns.

<number of G structures>
• Number of random effects (Gi, or any interaction) that are defined with
structures different than identically and independently distributed.

Example. Pedigree matrix can be defined here (G = A)

Note: each of these components will have to be defined in greater detail later.
VAR. STRS. - EXAMPLES
<sections>
• Number of residual structures to define.
3 1 0
1280 0 ID
1320 0 ID
2300 0 ID
3: acts as a counter (here, 3 sites)
1: only a single structure on each of the residual structures
0: no G structures defined
1280: number of observations in site 1 (sorted by site)
0: sortkey (sorting variable not specified here)
ID: VCODE corresponding to independent errors.
1320: number of observations in site 2 (sorted by site)
2300: number of observations in site 3 (sorted by site)

!SECTIONS n number of residual structures to define.

VARIANCE STRUCTURES
<dimensions>
• Number of direct product of variance structures that are required to define
each of the residual structures.

1 2 0
16 row AR1
20 col AR1

1: a single residual structure (1 site here)

2: two direct products that define the residual structure
0: no G structures defined

16: number of rows in experiment (it could be replaced by a number)

row: sortkey for order of rows within dataset
AR1: VCODE corresponding to auto-correlated structure
20: number of columns in experiment (it could be replaced by a number)
col: sortkey for order of columns within dataset
VARIANCE STRUCTURES
<number of G structures>
• Number of random effects (or interactions) that are defined.
3 1 1
1280 0 ID
1320 0 ID
2300 0 ID

site.genotype 2
site 0 CORGH 0.25 0.25 0.25 1.22 1.46 2.05
genotype 0 AINV

3: acts as a counter (here, 3 sites)

1: a single structure in each of the 3 residual elements
1: a single G structure defined

Note: the command !f mv keeps the missing observations and is useful for
counting observations over multiple R structures
VARIANCE STRUCTURES

3 1 1
1280 0 ID
1320 0 ID
2300 0 ID

site.genotype 2
site 0 CORGH 0.25 0.25 0.25 1.22 1.46 2.05
genotype 0 AINV

site.genotype G structure term to be defined

2 number of factors to define for this G structure
site (or 3): acts as a counter (as before with a value of 3)
0: sortkey (not specified)
CORGH: VCODE heterogeneous general correlation matrix
genotype: acts as counter for the genotype factor
0: sortkey (not specified)
AINV: VCODE inverse of the relationship matrix from pedigree file
VARIANCE STRUCTURES
• Starting values and restrictions can be added next to the parameters.
• Important to aid convergence and to speed up fitting.

Some options in the variance components

!GP restricts to the positive parameter space
!GU unrestricted
!GF fixed at a given supplied value (e.g. starting value)
!VCC c indicates the number of variance parameters constraints
!S2==1 qualifier required to fix the error variance at 1.0 and prevent
ASReml from trying to estimate two confounded parameters (usually required for
cases where variance, instead of correlation, matrices are specified)

Example
Volume ~ mu Block !r Mother 0.25 !GF Plot 0.4 !GU
VARIANCE STRUCTURES
• Order of starting values for variance and correlation matrices is important

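Variance Matrices

$$
\begin{bmatrix} 1 & & & \\ 2 & 3 & & \\ 4 & 5 & 6 & \\ 7 & 8 & 9 & 10 \end{bmatrix}
\quad \text{or} \quad
\begin{bmatrix} 1 & 2 & 4 & 7 \\ & 3 & 5 & 8 \\ & & 6 & 9 \\ & & & 10 \end{bmatrix}
$$

Correlation Matrices

$$
\begin{bmatrix} 7 & 1 & 2 & 3 \\ & 8 & 4 & 5 \\ & & 9 & 6 \\ & & & 10 \end{bmatrix}
\quad \text{or} \quad
\begin{bmatrix} 7 & & & \\ 1 & 8 & & \\ 2 & 4 & 9 & \\ 3 & 5 & 6 & 10 \end{bmatrix}
$$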

Note: for most complex variance structures it is critical to specify starting values.
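
The two orderings shown on this slide can be generated programmatically, which helps when writing long lists of starting values. The sketch below only reproduces the numbering shown above for a 4 x 4 matrix; the function names are ours and are not ASReml commands.

```python
def variance_matrix_order(w):
    """Positions for a variance matrix (e.g. US): lower triangle, row by row."""
    order = [[0] * w for _ in range(w)]
    k = 1
    for i in range(w):
        for j in range(i + 1):
            order[i][j] = order[j][i] = k
            k += 1
    return order


def correlation_matrix_order(w):
    """Positions for a correlation matrix (e.g. CORGH): all correlations first
    (lower triangle, column by column), then the variances on the diagonal."""
    order = [[0] * w for _ in range(w)]
    k = 1
    for j in range(w):
        for i in range(j + 1, w):
            order[i][j] = order[j][i] = k
            k += 1
    for i in range(w):
        order[i][i] = k
        k += 1
    return order


us = variance_matrix_order(4)
corgh = correlation_matrix_order(4)
for row in corgh:
    print("  ".join(f"{value:2d}" for value in row))
```

For a 3 x 3 CORGH matrix this ordering gives three correlations followed by three variances, matching the pattern of the starting values in the site.genotype example (0.25 0.25 0.25 1.22 1.46 2.05).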
CONSTRAINTS IN VAR-COV COMP.
Next to model terms
!GP positive variance component (default in most cases)
!GU unrestricted variance component
!GF fixed variance component
Volume ~ mu Block !r Mother 0.25 Plot 0.4 !GF

After model terms

!VCC n reads n lines of variance component constraints, placed after the R and G structure definitions.
25 26 # V25 = V26
2 -3 # V2 = -V3
4 5 * 4 # V4 = V5*4

!=ABA all parameters with the same letter in the G or R structure
are treated as the same parameter.
2 0 US 0.2 0.3 0.5 !=ABA
ALFALFA EXPERIMENT
Example: /Day2/VarStruct/AlfalfaS_.as
It is of interest to fit a linear model with a specific error variance shared by
sources 1 and 3, and a different one for source 2.

!PART 3
yield ~ mu Block !r Variety
1 2 0
3 Source DIAG 0.8 0.8 0.8 !=ABA
24 0 ID !S2==1

!PART 4
!VCC 1
yield ~ mu Block !r Variety
3 1 0
24 0 ID
24 0 ID
24 0 ID

4 6
FUNCTIONS OF VAR. COMPS.
• Post-analysis procedure to calculate functions of variance components
(e.g. heritability or genetic correlations).
• Based on approximations using the delta method (i.e. Taylor series approximation).
• It should not be used for statistical inference, only as a rough reference.

Linear functions of variance components

Va = 4 σ2s
Vp = σ2s + σ2

Ratio of variance components

h2 = Va / Vp = 4 σ2s / [σ2s + σ2]

Correlations based on 3 variance components

$r_{gA(g)} = \dfrac{Cov(g_1, g_2)}{\sqrt{Var(g_1) \times Var(g_2)}}$
FUNCTIONS OF VAR. COMPS.
ASReml options
F Linear functions of variance components
H Ratio of variance components
R Correlations based on 3 variance components

• A .pin file needs to be created with the functions to be calculated,
following the order of the variance components presented in the .asr file;
it also uses output from the .vvp file.
• Output is presented in the .pvc file.
• Alternatively, the functions can be incorporated into the job file using the
qualifiers !PIN !DEFINE, which will generate the .pin file automatically
and then run it.
ALFALFA EXPERIMENT
Example: /Day2/VarStruct/AlfalfaS_.as
!PART 5
yield ~ mu Block !r Variety
!PIN !DEFINE
F Vg 1 #3
F Vtotal 1 2 #4
H Herit 3 4

Variance component estimates

Source Model terms Gamma Component Comp/SE % C
Variety 12 12 0.580868 0.276798E-01 1.81 0 P
Variance 72 66 1.00000 0.476526E-01 5.24 0 P

pvc file
1 Variety 0.276798E-01
2 Variance 0.476526E-01
3 Vg 1 0.27680E-01 0.15265E-01
4 Vtotal 1 0.75332E-01 0.16972E-01
Herit = Vg 1 3/Vtotal 4= 0.3674 0.1397
Notice: The parameter estimates are followed by
their approximate standard errors.
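
The heritability and its approximate standard error in the .pvc output can be recovered with the delta method for a ratio. The sketch below is our own reconstruction, not ASReml code: the covariance between the two components is not shown in the output (ASReml reads it from the .vvp file), so it is backed out here from the reported standard errors of Vg and Vtotal and from the residual standard error implied by Comp/SE = 5.24.

```python
import math


def ratio_with_se(num, den, var_num, var_den, cov_num_den):
    """First-order (delta method) standard error of the ratio num / den."""
    ratio = num / den
    relative_variance = (var_num / num ** 2
                         + var_den / den ** 2
                         - 2.0 * cov_num_den / (num * den))
    return ratio, abs(ratio) * math.sqrt(relative_variance)


# Values copied from the output above.
vg, ve = 0.0276798, 0.0476526            # Variety and residual components
se_vg, se_vtotal = 0.015265, 0.016972    # standard errors from the pvc file
se_ve = ve / 5.24                        # residual SE implied by Comp/SE (rounded)

var_vg, var_ve, var_vtotal = se_vg ** 2, se_ve ** 2, se_vtotal ** 2
# Var(Vtotal) = Var(Vg) + Var(Ve) + 2 Cov(Vg, Ve), solved for the covariance.
cov_vg_ve = (var_vtotal - var_vg - var_ve) / 2.0
# Cov(Vg, Vtotal) = Var(Vg) + Cov(Vg, Ve).
cov_vg_vtotal = var_vg + cov_vg_ve

h2, se_h2 = ratio_with_se(vg, vg + ve, var_vg, var_vtotal, cov_vg_vtotal)
print(f"h2 = {h2:.4f}, approximate SE = {se_h2:.4f}")
```

The result agrees with the reported 0.3674 and 0.1397 up to the rounding of the inputs. The large standard error relative to the estimate illustrates why these functions are better treated as a rough reference than as a basis for inference.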
Session 8
Multivariate Analysis /
Repeated Measures

MULTIVARIATE ANALYSIS
General Uses

• More efficient analysis that combines information on two or more response
variables.
• Improves the precision of the breeding values (BLUPs).
• Allows estimation of correlations among traits (e.g. phenotypic and genetic
correlations).
• Assists in predicting individual breeding values for traits that were not
measured (provided they are correlated with measured traits).
• Relevant to assess the importance of indirect selection.
• Can be used to combine different sources of data, complete or incomplete.
• Generates the matrices required to construct a selection index.
• Recommended analysis for cases where a prior selection was done based on one
trait.
BIVARIATE ANALYSIS
• Considers a 2 x 2 matrix for each effect, e.g. (rows and columns ordered g1, g2)

$V(g_i) = \begin{bmatrix} \sigma^2_{t_1} & \sigma_{t_1 t_2} \\ \sigma_{t_1 t_2} & \sigma^2_{t_2} \end{bmatrix}$
In ASReml
• Uses individual stacked responses: yi = [yi(1) yi(2)]’
• The word Trait is used to define the stacked response vector.
• Typically genetic and error effects are defined with a US (unstructured)
variance structure.
• Other effects can be defined as US or DIAG structures.
• It is also recommended to use one of the correlation structures (e.g. CORUH,
CORGH) to keep estimates within the parameter space.

Strategy for fitting models in ASReml

• Sensitive to initial starting values (as is any multivariate analysis).
• Strategy: start with univariate analyses and add one variable at a time.
• Get rough estimates: estimate phenotypic or genetic correlations /
covariances using univariate solutions, or prior knowledge.
• Use !CONTINUE or -c to start from the estimates of previous runs.
OPEN POLLINATION
Example: /Day2/BivarOpen/OPENPOL.txt
A tree genetics study in which seeds from a total of 28 female parents, obtained
by mass selection, were collected and tested in a RCBD together with 3 control
female parents. The experiment consisted of 10 replicates with 34 plots each, of
size 2 x 3. The response variables of interest are total height (HT, cm) and
diameter at breast height (DBH, cm). For now we will concentrate on the response
HT. The objective is to rank the female parents for future selections and seed
production. Note that a model can be fitted with and without the controls
included as parents.

ID REP PLOT FEMALE TYPE DBH HT

1 1 1 FEM1 Test 23.8 12.4
2 1 1 FEM1 Test 24.4 12.1
3 1 1 FEM1 Test 25.4 10.9
4 1 1 FEM1 Test 28.0 12.7
5 1 1 FEM1 Test 20.9 11.9
6 1 1 FEM1 Test 22.6 11.2
7 1 2 FEM15 Test 22.4 10.7
8 1 2 FEM15 Test 21.9 11.6
9 1 2 FEM15 Test 20.8 11.3
10 1 2 FEM15 Test 21.6 13.3
...
OPEN POLLINATION (bivariate)
Example: /Day2/BivarOpen/BivarOpen_.as
!RENAME !ARGS 2
Open polination trial
ID
REP 10 !I
PLOT 34 !I
FEMALE 31 !A !SORT
TYPE 2 !A !SORT
DBH
HT
OPENPOL.TXT !MAXIT 40 !SKIP 1 !DISPLAY 2 !DOPART $A

!PART 2
HT DBH ~ Trait Trait.REP !r Trait.FEMALE Trait.REP.PLOT
1 2 2
0 0 ID
Trait 0 US 1.01 1.82 7.25

Trait.FEMALE 2
Trait 0 US 0.19 0.31 0.61
FEMALE 0 ID

Trait.REP.PLOT 3
Trait 0 US 0.05 0.001 0.001 !GUFF
REP 0 ID
PLOT 0 ID
OPEN POLLINATION (bivariate)
Interpreting analysis
Source Model terms Gamma Component Comp/SE % C
Residual UnStructured 1 1 1.00196 1.00196 29.17 0 U
Residual UnStructured 2 1 1.83449 1.83449 23.69 0 U
Residual UnStructured 2 2 7.43730 7.43730 29.20 0 U
Trait.FEMALE UnStructured 1 1 0.191142 0.191142 3.44 0 U
Trait.FEMALE UnStructured 2 1 0.310167 0.310167 3.16 0 U
Trait.FEMALE UnStructured 2 2 0.705031 0.705031 3.39 0 U
Trait.REP.PLOT DIAGonal 1 0.790064E-01 0.790064E-01 5.19 0 U
Trait.REP.PLOT DIAGonal 2 -0.201829 -0.201829 -2.84 0 U
Covariance/Variance/Correlation Matrix UnStructured Residual
1.002 0.6720
1.834 7.437
Covariance/Variance/Correlation Matrix UnStructured Trait.FEMALE
0.1911 0.8449
0.3102 0.7050

Wald F statistics
Source of Variation NumDF DenDF F-inc P-inc
8 Trait 2 29.2 9584.21 <.001
9 Trait.REP 18 643.1 4.82 <.001
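The correlations printed above the diagonal of each Covariance/Variance/Correlation matrix follow directly from the US components. A minimal sketch in Python, using only the values printed above (the function name `correlation` is hypothetical):

```python
import math

def correlation(cov12, var1, var2):
    """Correlation from one covariance and two variances."""
    return cov12 / math.sqrt(var1 * var2)

# Trait.FEMALE US components: (1,1), (2,1), (2,2)
r_female = correlation(0.310167, 0.191142, 0.705031)

# Residual US components: (1,1), (2,1), (2,2)
r_resid = correlation(1.83449, 1.00196, 7.43730)

print(f"female (genetic) correlation HT-DBH: {r_female:.4f}")
print(f"residual correlation HT-DBH:         {r_resid:.4f}")
```

These should match the off-diagonal entries 0.8449 and 0.6720 reported in the matrices above, up to rounding.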
MULTIVARIATE ANALYSIS
Strategy for fitting models in ASReml

• For fitting the model, use the same strategies as for bivariate analysis.
• Standardize responses, particularly when variables have different scales.
• Implement simple structures first (e.g. ID, DIAG, CORUH, CORGH).
• Correlation variance structures (CORUH, CORBH, CORGH) tend to give
better results.
• Consider constraining some parameters, e.g. !GPFPUP
• Be aware that it might not fit at all!

Extensions
• Consider different sites (or years) as different traits (e.g. helps to classify
sites).
• Variance-covariance matrices can be used to ‘study’ genetic structure
(e.g. evaluating / separating genetic groups).
REPEATED MEASURES
• Very similar to multivariate analysis but every measurement point (time) is
considered as a different trait.
• Requires modelling of the mean effects (patterns) and variance structures.
• Additional modelling of fixed effects of time points is possible (e.g.
polynomials or splines).
• Convergence problems are still present, but to a lesser extent.
• Two modelling approaches:
- Multiple vectors: parallel vectors with, typically, US error structure.
- Single vector: stacked responses with, typically, AR1V correlations.

Relevant functions in ASReml

pol(y,n) forms a set of orthogonal polynomials of order n
lin(f) transforms the factor f into a covariate
spl(v,k) defines a spline model term for the variable v with k knots
!{ and !} placed around model terms so terms are not reordered
(important for specifying covariances between random terms)
REPEATED MEASURES: AS MV
Example: /Day2/MultiVar/MVCOLS.txt
A total of 824 individuals were measured at 4 equally spaced time points, at 2, 4,
6 and 8 years after establishment. These correspond to offspring of 26 parents
that were planted as a RCBD with 4 blocks.
IDD Indiv Female Rep HT1 HT2 HT3 HT4
1 1 F09 1 62.0 108.0 240.0 411.5
2 2 F02 1 66.0 154.0 275.0 442.0
3 3 F21 1 65.0 116.0 245.0 323.1
4 4 F25 1 68.0 102.0 225.0 350.5
5 5 F13 1 58.0 170.0 325.0 457.2
6 6 F14 1 117.0 265.0 445.0 588.3
7 7 F14 1 * * * *
8 8 F15 1 75.0 162.0 315.0 484.6
9 9 F18 1 74.0 182.0 340.0 493.8
10 10 F03 1 100.0 230.0 350.0 518.2
11 11 F07 1 72.0 148.0 310.0 313.9
12 12 F14 1 69.0 164.0 310.0 469.4
13 13 F11 1 87.0 208.0 340.0 493.8
14 14 F24 1 50.0 148.0 290.0 454.2
15 15 F02 1 66.0 173.0 350.0 521.2
16 16 F21 1 75.0 164.0 305.0 469.4
17 17 F15 1 78.0 166.0 315.0 493.8
...
REPEATED MEASURES: AS MV
Example: /Day2/MultiVar/MV_.as
!RENAME !ARGS 1
Multivariate Analysis of HT - 4 meas
IDD
INDIV
FEMALE !A
REP !A
HT1 HT2 HT3 HT4
MVCols.txt !SKIP 1 !MAXIT 40 !DISPLAY 2 !DOPART $A

!PART 1
HT1 HT2 HT3 HT4 ~ Trait Trait.REP !r Trait.FEMALE
1 2 1
0 0 ID
Trait 0 US 419
556 1405
698 1846 3801
821 2306 4624 7154

Trait.FEMALE 2
Tr 0 US 36
48 74
38 70 117
61 126 223 410
FEMALE 0 ID
REPEATED MEASURES: AS MV
Interpreting analysis
Covariance/Variance/Correlation Matrix UnStructured Residual
419.7 0.7241 0.5527 0.4744
556.1 1405. 0.7989 0.7275
698.1 1847. 3801. 0.8868
822.0 2307. 4625. 7155.
Covariance/Variance/Correlation Matrix UnStructured Trait.FEMALE
35.60 0.9375 0.5843 0.5068
48.08 73.88 0.7499 0.7245
37.78 69.86 117.5 1.019
61.25 126.2 223.8 410.4
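A US structure does not constrain the estimated correlations, so it is worth checking that the fitted matrix stays within the parameter space. The sketch below (plain Python, with the hypothetical helper `lower_to_corr`) recomputes every correlation from the lower triangle of the Trait.FEMALE matrix printed above and flags any value outside [-1, 1].

```python
import math

def lower_to_corr(lower):
    """Correlations for all pairs from a lower-triangular covariance matrix.

    `lower` is a list of rows; row i holds elements (i,1) .. (i,i).
    Returns a dict keyed by 1-based (row, column) pairs.
    """
    n = len(lower)
    sd = [math.sqrt(lower[i][i]) for i in range(n)]
    return {(i + 1, j + 1): lower[i][j] / (sd[i] * sd[j])
            for i in range(n) for j in range(i)}

# Trait.FEMALE covariance matrix (lower triangle) as printed above
female = [
    [35.60],
    [48.08, 73.88],
    [37.78, 69.86, 117.5],
    [61.25, 126.2, 223.8, 410.4],
]

corr = lower_to_corr(female)
out_of_range = {pair: r for pair, r in corr.items() if abs(r) > 1.0}

for pair, r in sorted(corr.items()):
    flag = "  <-- outside [-1, 1]" if pair in out_of_range else ""
    print(pair, f"{r:.4f}{flag}")
```

With the printed values, the correlation between the third and fourth measurements (HT3 and HT4) comes out slightly above 1, which suggests that a constrained or simpler structure (e.g. CORGH or a factor analytic model) may be worth considering for this term.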
REPEATED MEASURES: AS UNIV
Example: /Day2/RepMeas/REPCOLS.txt
IDD Indiv Female Rep Time HT
1 1 F09 1 1 62
2 1 F09 1 2 108
3 1 F09 1 3 240
4 1 F09 1 4 411.5
5 2 F02 1 1 66
6 2 F02 1 2 154
7 2 F02 1 3 275
8 2 F02 1 4 442
9 3 F21 1 1 65
10 3 F21 1 2 116
11 3 F21 1 3 245
12 3 F21 1 4 323.1
13 4 F25 1 1 68
14 4 F25 1 2 102
15 4 F25 1 3 225
16 4 F25 1 4 350.5
17 5 F13 1 1 58
18 5 F13 1 2 170
19 5 F13 1 3 325
20 5 F13 1 4 457.2
...
REPEATED MEASURES
Example: /Day2/RepMeas/RepCols_.as
!RENAME !ARGS 1
Repeated Measures Analysis of HT - 4 meas
IDD
INDIV
FEMALE 26 !A
REP 4 !I
TIME 4 !I
HT
REPCOLS.txt !MAXIT 40 !SKIP 1 !DISPLAY 2 !DOPART $A

!PART 1
!FILTER TIME !SELECT 1
HT ~ mu REP !r FEMALE

!PART 2
log(HT) ~ mu lin(TIME) TIME.REP !r,
!{ FEMALE lin(TIME).FEMALE !} !f mv
1 2 1
824 0 ID !S2==1
TIME 0 AR1H 0.8 0.05 0.05 0.05 0.05

FEMALE 2
2 0 CORUH -0.8 0.004 0.0001
FEMALE
REPEATED MEASURES
Interpreting analysis
Source Model terms Gamma Component Comp/SE % C
Residual AR=AutoR 4 0.798949 0.798949 82.75 0 U
Residual AR=AutoR 4 0.641964E-01 0.641964E-01 19.31 0 U
Residual AR=AutoR 4 0.464063E-01 0.464063E-01 19.32 0 U
Residual AR=AutoR 4 0.365361E-01 0.365361E-01 20.19 0 U
Residual AR=AutoR 4 0.310505E-01 0.310505E-01 20.57 0 U
FEMALE CORRelat 2 -0.807724 -0.807724 -6.53 0 U
FEMALE CORRelat 2 0.362337E-02 0.362337E-02 1.87 0 U
FEMALE CORRelat 2 0.262804E-03 0.262804E-03 2.06 0 U
Covariance/Variance/Correlation Matrix CORRelation FEMALE
0.3625E-02 -0.8078
-0.7884E-03 0.2629E-03

Wald F statistics
Source of Variation NumDF DenDF F-inc P-inc
8 mu 1 23.2 0.37E+06 <.001
9 lin(TIME) 1 24.0 17256.68 <.001
10 TIME.REP 14 2096.0 256.63 <.001
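To see what the AR1H residual structure implies, the residual covariance matrix among the four time points can be rebuilt from the estimates above: one correlation followed by four time-specific variances. This is a minimal Python sketch (the function name `ar1h_cov` is hypothetical). The same arithmetic applied to the CORUH parameters recovers the FEMALE intercept-slope covariance.

```python
import math

def ar1h_cov(rho, variances):
    """Heterogeneous AR1: Cov(i, j) = rho^|i-j| * sqrt(v_i * v_j)."""
    n = len(variances)
    return [[rho ** abs(i - j) * math.sqrt(variances[i] * variances[j])
             for j in range(n)] for i in range(n)]

# Residual AR1H estimates (log scale), in the order printed in the output
rho = 0.798949
variances = [0.0641964, 0.0464063, 0.0365361, 0.0310505]
R = ar1h_cov(rho, variances)

for row in R:
    print("  ".join(f"{value:9.6f}" for value in row))

# FEMALE CORUH: correlation, then variances for intercept and slope
cov_int_slope = -0.807724 * math.sqrt(0.00362337 * 0.000262804)
print(f"intercept-slope covariance: {cov_int_slope:.4e}")
```

The last value should agree with the -0.7884E-03 entry of the FEMALE matrix above, up to rounding.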
Session 9
Multi-environment
Analysis

MET ANALYSIS
General Uses
• Incorporates information from several experiments (over different sites or
years) to obtain overall BVs.
• Allows estimation of Genotype-by-Environment (or Genotype-by-Year)
effects, and their variance structure. Hence, it separates genetic effects into
their pure component and their interaction with site (or year).
• Provides unbiased estimates of heritability and Type-B correlations.
• Critical to understand the genetic structure of the population and to
define breeding strategies.

Difficulties
• Every site (or year) has its own ‘personality’ (i.e. error structure, design
effects, etc.) that needs to be combined into a single analysis.
• The amount of data can be large, with difficulties in fitting and convergence.
• Requires additional prior checks (e.g. EDA, coding, etc.).
MET ANALYSIS
In ASReml

• Flexible and fast enough to incorporate many datasets.
• Each site will have its own model specification (fixed effects, random
components and error structure).
• Allows the use of a 2-stage analysis (see !TWOSTAGEWEIGHTS).

<sections> <dimensions> <number of G structures>

Some useful options

at(f,n) creates a binary variable for the condition specified in a factor
mv creates a missing value as fixed effect (design matrix)
!SECTION n number of residual structures to define.
!MVINCLUDE missing values in a factor are treated as zeros
MET ANALYSIS
Strategy for fitting MET models in ASReml

• Careful cleaning process (same factors, values, etc.).
• Start by analyzing every site individually, determining all necessary (and
significant) design effects and error structure.
• Evaluate which sites to consider for the full analysis (sites with low
heritability contribute little to ranking).
• Consider implementing a data standardization.
• Incorporate and evaluate which variables or factors will act as
‘covariates’ through all trials.
• Combine all trials into a simple single analysis (e.g. heterogeneous error
variances but with common additive variance).
• Progress slowly to more complex variance structures for different model
terms (e.g. DIAG for additive).
• Consider favouring the simplest model that suits your requirements
(practical, operational).
MET ANALYSIS
Complex Variance Structures
• Ideal objective: to fit a US structure to the GxE matrix to understand the
genetic structure and evaluate stability of genotypes and breeding zones.
• A US structure is difficult to fit, but other simpler (approximate) structures
are available.
• ASReml allows other structures based on multivariate techniques to be
considered (e.g. factor analytic covariance).
TYPE-B CORRELATIONS
Definition: Correlation between sites

• Is a relative expression of genotype-by-environment interaction.

• It could be zero or positive (0 to 1).
• A value close to 0 indicates that the rank in one environment is very
different than the rank in another environment (i.e. low stability)
• A value close to 1 indicates that a single ranking can be used across all
environments without loss of information (i.e. high stability).

$r_{gB(a)} = \dfrac{V_a}{V_a + V_{axs}}$

$r_{gB(g)} = \dfrac{V_g}{V_g + V_{gxs}}$
MET ANALYSIS
Option 1: Simple GxE structure
• Aims at modelling a common GxE correlation.
• Common structures are: DIAG, CORUH.
• Correlation corresponds to an average value across all sites.
• It is simpler to fit and converges easily.
• It does not allow for a better understanding of the GxE.

Option 2: Complex GxE structure

• Aims at modelling the ‘full’ GxE correlation structure.
• Common structures are: CORGH, US, FAk, FACVk.
• Provides a different GxE correlation for each pair of sites.
• It is difficult to fit, particularly for several sites.
• Simplifications are usually required, e.g. standardization.
MET ANALYSIS

Variant 1: Explicit GxE

yield ~ mu Site !r Genotype Site.Genotype

• Provides average genetic values across all sites, together with GxE
deviations for each site.
• Useful for generating a ranking across all sites.
• Allows for simplification of the GxE term.

Variant 2: Implicit GxE

yield ~ mu Site !r Site.Genotype

• Provides a different genetic value for each site.
• Useful for generating rankings for each site.
• It could make use of the full correlation structure of the GxE.
• Typically used to understand the dynamics of GxE.
MET HALF-SIB / SIRE MODEL
Explicit GxE
$y = X_1\beta + X_2 l + Z_1 b + Z_2 s + Z_3 sl + e$
y vector of observations
β vector of fixed design or covariate effects
l vector of fixed location (sites or years) effects
b vector of random design effects (e.g. block effect), ~ N(0, Iσ2b)
s vector of random sire effects (i.e. ½ breeding value), ~ N(0, Aσ2s)
sl vector of random sire-by-location interactions, ~ N(0, Iσ2sl)
e vector of random residual effects, ~ N(0, D) or $N(0, \bigoplus_{i=1}^{s} R_i)$

Va = 4 σ2s Vaxs = 4 σ2sl

Vp = σ2s + σ2sl + σ2
h2 = Va / Vp = 4 σ2s / [σ2s + σ2sl + σ2]
rgB(a) = Va / [Va + Vaxs] = ρs
MET ANALYSIS
Example: /Day2/MultiEnv/TRIALS4.txt
A set of 4 trials was established as part of a breeding program. A total of 61
unrelated parents were considered (i.e. half-sib model). All trials corresponded to
incomplete block designs (IBD) with 4 full replicates. The response variable of
interest is HT. We are interested in obtaining an analysis using all four sites
simultaneously.
IDD Test Genotype Rep Iblock Row Column Surv DBH HT
10001 1 G41 1 1 1 1 1 736.6 557.8
10002 1 G33 1 1 2 1 1 685.8 588.3
10003 1 G22 1 1 3 1 1 838.2 551.7
10004 1 G31 1 1 4 1 1 660.4 539.5
10005 1 G18 1 1 5 1 1 406.4 411.5
10006 1 G01 1 1 6 1 1 508.0 417.6
10007 1 G05 1 1 7 1 1 711.2 518.2
10008 1 G54 1 2 8 1 1 609.6 463.3
10009 1 G30 1 2 9 1 1 482.6 466.3
10010 1 G17 1 2 10 1 1 736.6 527.3
10011 1 G58 1 2 11 1 1 584.2 472.4
10012 1 G37 1 2 12 1 1 431.8 442.0
10013 1 G07 1 2 13 1 1 736.6 600.5
10014 1 G42 1 2 14 1 1 711.2 566.9
10015 1 G38 1 3 15 1 1 711.2 518.2
10016 1 G33 1 3 16 1 1 736.6 606.6
10017 1 G50 1 3 17 1 1 736.6 576.1
10018 1 G20 1 3 18 1 1 660.4 539.5
...
MET ANALYSIS
Example Variant 1: /Day2/MultiEnv/GxE_.as
!RENAME !ARGS 2
Four trials to study GxE for HT
IDD
Test 4 !A !SORT
Genotype 61 !A
REP 4 !A
IBlock 110 !A
Row 56 !A
Col 32 !A
Surv
DBH
HT
TRIALS4.txt !SKIP 1 !MAXIT 50 !DISPLAY 2 !DOPART $A

!PART 2
HT ~ mu Test Test.REP !r,
at(Test,1).REP.IBlock at(Test,2).REP.IBlock,
at(Test,3).REP.IBlock at(Test,4).REP.IBlock,
Genotype Test.Genotype !f mv
4 1 0
4480 0 ID
4480 0 ID
4608 0 ID
4400 0 ID
MET ANALYSIS
Interpreting variance components
Source Model terms Gamma Component Comp/SE % C
Residual 17968 16537
Genotype 100 100 301.167 301.167 4.60 0 P
Test.Genotype 400 400 158.584 158.584 6.74 0 P
at(Test,1).REP.IBloc 4400 4400 1159.04 1159.04 9.75 0 P
at(Test,2).REP.IBloc 4400 4400 1960.32 1960.32 10.84 0 P
at(Test,3).REP.IBloc 4400 4400 815.989 815.989 9.18 0 P
at(Test,4).REP.IBloc 4400 4400 206.324 206.324 4.77 0 P
Variance 0 0 4390.59 4390.59 44.30 0 P
Variance 0 0 3871.67 3871.67 43.39 0 P
Variance 0 0 4130.69 4130.69 42.40 0 P
Variance 0 0 3812.02 3812.02 42.26 0 P

Va = 4 s2g = 4 x 301.2 = 1204.7

Vaxs = 4 s2gs = 4 x 158.6 = 634.3
Vp = 301.2 + 158.6 + (4141.7)/4 +(16235.0)/4 = 5553.9
h2 = Va / Vp = 1204.7 / 5553.9 = 0.217
rgB(a) = Va / [Va + Vaxs] = 1204.7 / [1204.7 + 634.3] = 0.655

Note: individual site heritabilities can also be calculated.
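The calculations above can be reproduced directly from the variance component table. The sketch below is plain Python with hypothetical variable names; it follows the slide in averaging the block and residual variances over the four sites, which is an assumption about how Vp was built.

```python
# Variance components from the explicit GxE fit (Component column above)
genotype = 301.167                                  # Genotype
test_genotype = 158.584                             # Test.Genotype
blocks = [1159.04, 1960.32, 815.989, 206.324]       # at(Test,i).REP.IBlock
residuals = [4390.59, 3871.67, 4130.69, 3812.02]    # residual variance per site

va = 4 * genotype          # additive variance (half-sib model)
vaxs = 4 * test_genotype   # additive-by-site variance

# Phenotypic variance: block and residual variances averaged across sites
vp = (genotype + test_genotype
      + sum(blocks) / len(blocks)
      + sum(residuals) / len(residuals))

h2 = va / vp
r_b = va / (va + vaxs)     # Type-B correlation

print(f"Va = {va:.1f}, Vaxs = {vaxs:.1f}, Vp = {vp:.1f}")
print(f"h2 = {h2:.3f}, rgB(a) = {r_b:.3f}")
```

Note that the four residual variances in the table sum to about 16205 rather than the 16235.0 used in the hand calculation, so Vp computed this way is slightly smaller than 5553.9; the heritability still rounds to the same value.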

MET HALF-SIB / SIRE MODEL
Implicit GxE
$y = X_1\beta + X_2 l + Z_1 b + Z_3 sl + e$
y vector of observations
β vector of fixed design or covariate effects
l vector of fixed location (sites or years) effects
b vector of random design effects (e.g. block effect), ~ N(0, Iσ2b)
sl vector of random sire-by-location interactions, ~ $N(0, U \otimes A)$
e vector of random residual effects, ~ N(0, D)
U matrix of variance-covariances
A numerator relationship matrix
D diagonal matrix
MET ANALYSIS
Example Variant 2: /MultiEnv/GxE_.as
!PART 3
HT ~ mu Test Test.REP !r,
at(Test,1).REP.IBlock at(Test,2).REP.IBlock,
at(Test,3).REP.IBlock at(Test,4).REP.IBlock,
Test.Genotype !f mv
4 1 1
4480 0 ID
4480 0 ID
4608 0 ID
4400 0 ID

Test.Genotype 2
Test 0 US 520.7
392.2 563.6
256.7 376.6 392.1
384.1 268.8 200.0 356.8
Genotype 0 ID
MET ANALYSIS
Interpreting variance components
Source Model terms Gamma Component Comp/SE % C
Residual 17968 16537
at(Test,1).REP.IBloc 440 440 1161.07 1161.07 9.76 0 P
at(Test,2).REP.IBloc 440 440 1961.80 1961.80 10.84 0 P
at(Test,3).REP.IBloc 440 440 816.001 816.001 9.18 0 P
at(Test,4).REP.IBloc 440 440 207.978 207.978 4.79 0 P
Variance[ 1] 4480 0 4388.87 4388.87 44.30 0 P
Variance[ 2] 4480 0 3871.39 3871.39 43.38 0 P
Variance[ 3] 4608 0 4131.87 4131.87 42.38 0 P
Variance[ 4] 4400 0 3811.58 3811.58 42.26 0 P
Test.Genotype UnStructured 1 1 520.722 520.722 4.86 0 U
Test.Genotype UnStructured 2 1 392.218 392.218 4.21 0 U
Test.Genotype UnStructured 2 2 563.561 563.561 4.94 0 U
Test.Genotype UnStructured 3 1 256.719 256.719 3.43 0 U
Test.Genotype UnStructured 3 2 376.619 376.619 4.44 0 U
Test.Genotype UnStructured 3 3 392.056 392.056 4.65 0 U
Test.Genotype UnStructured 4 1 304.148 304.148 4.04 0 U
Test.Genotype UnStructured 4 2 268.839 268.839 3.59 0 U
Test.Genotype UnStructured 4 3 200.202 200.202 3.20 0 U
Test.Genotype UnStructured 4 4 356.775 356.775 4.66 0 U
Covariance/Variance/Correlation Matrix UnStructured Test.Genotype
520.7 0.7240 0.5682 0.7056
392.2 563.6 0.8012 0.5995
256.7 376.6 392.1 0.5353
304.1 268.8 200.2 356.8
MET ANALYSIS
BLUP values: Variant 1
Effect Level BLUP SE(BLUP)

Genotype G22 11.03 7.085

Test.Genotype 1.G22 10.43 8.368
Test.Genotype 2.G22 7.668 8.238
Test.Genotype 3.G22 -13.59 8.386
Test.Genotype 4.G22 1.297 8.198

BLUP values: Variant 2

Effect Level BLUP SE(BLUP)

Test.Genotype 1.G22 23.17 7.485

Test.Genotype 2.G22 17.8 7.12
Test.Genotype 3.G22 -1.8 7.147
Test.Genotype 4.G22 12.36 6.817
MET ANALYSIS
Factor Analytic models
• Useful approximations for modelling a U matrix in GxE or multivariate
analyses.
• Flexible models that require fewer variance components than US, and tend
to converge better and quicker.
• Allow for additional interpretation of underlying environmental factors
associated with the matrix of correlations.
• Finding solutions for FA models can be difficult, requiring proper
specification of initial values.
• Several alternative models are available within ASReml: FAk, FACVk and
XFAk.
• Based on the parameterization:

$\Sigma = \Lambda\Lambda' + \Psi$
MET ANALYSIS
FA model: FAk
$\Sigma = DCD$
D is a diagonal matrix such that $DD = \mathrm{diag}(\Sigma)$
C is a correlation matrix of the form $FF' + E$
F is a matrix of loadings on the correlation scale
E is a diagonal matrix defined by difference (remnant).

FA model: FACVk
$\Sigma = \Lambda\Lambda' + \Psi$
$\Lambda$ is a matrix of loadings on the covariance scale, with $\Lambda = DF$
$\Psi$ is a diagonal matrix, with $\Psi = DED$
MET ANALYSIS
Example Variant 2: /MultiEnv/GxE_.as
!PART 4
HT ~ mu Test Test.REP !r,
at(Test,1).REP.IBlock at(Test,2).REP.IBlock,
at(Test,3).REP.IBlock at(Test,4).REP.IBlock,
Test.Genotype !f mv
4 1 1
4480 0 ID
4480 0 ID
4608 0 ID
4400 0 ID

Test.Genotype 2
Test 0 FA1
0.8 0.9 0.1 0.2 # 1st factor
520.7 563.6 392.1 356.8 # Site Variances
Genotype 0 ID
MET ANALYSIS
Interpreting variance components
Source Model terms Gamma Component Comp/SE % C
Residual 17968 16537
at(Test,1).REP.IBloc 440 440 1159.40 1159.40 9.75 0 P
at(Test,2).REP.IBloc 440 440 1961.62 1961.62 10.84 0 P
at(Test,3).REP.IBloc 440 440 815.999 815.999 9.18 0 P
at(Test,4).REP.IBloc 440 440 207.516 207.516 4.79 0 P
Variance[ 1] 4480 0 4389.44 4389.44 44.29 0 P
Variance[ 2] 4480 0 3871.43 3871.43 43.38 0 P
Variance[ 3] 4608 0 4131.95 4131.95 42.38 0 P
Variance[ 4] 4400 0 3811.38 3811.38 42.26 0 P
Test.Genotype FA D(LL'+E)D 1 1 0.787009 0.787009 10.71 0 U
Test.Genotype FA D(LL'+E)D 1 2 0.931814 0.931814 17.28 0 U
Test.Genotype FA D(LL'+E)D 1 3 0.818246 0.818246 11.66 0 U
Test.Genotype FA D(LL'+E)D 1 4 0.695414 0.695414 7.58 0 U
Test.Genotype FA D(LL'+E)D 0 1 519.153 519.153 4.83 0 U
Test.Genotype FA D(LL'+E)D 0 2 563.923 563.923 4.94 0 U
Test.Genotype FA D(LL'+E)D 0 3 391.055 391.055 4.63 0 U
Test.Genotype FA D(LL'+E)D 0 4 359.863 359.863 4.63 0 U
Covariance/Variance/Correlation Matrix FA D(LL'+E)D Test.Genotype
519.1 0.7333 0.6440 0.5472
396.8 563.9 0.7625 0.6480
290.2 358.1 391.1 0.5690
236.5 291.9 213.4 359.9
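With a single factor (FA1) on the correlation scale, the off-diagonal correlation between two sites is simply the product of their loadings, and the covariance follows from the site variances (the D(LL'+E)D form shown in the output header). The Python sketch below rebuilds the matrix above from the eight FA1 estimates; the function name `fa1_matrices` is hypothetical.

```python
import math

# FA1 estimates from the output above
loadings = [0.787009, 0.931814, 0.818246, 0.695414]   # first (only) factor
site_var = [519.153, 563.923, 391.055, 359.863]       # site variances

def fa1_matrices(f, v):
    """Correlation C = ff' + E (unit diagonal) and covariance DCD."""
    n = len(f)
    corr = [[1.0 if i == j else f[i] * f[j] for j in range(n)]
            for i in range(n)]
    cov = [[corr[i][j] * math.sqrt(v[i] * v[j]) for j in range(n)]
           for i in range(n)]
    return corr, cov

corr, cov = fa1_matrices(loadings, site_var)

for i in range(4):
    for j in range(i):
        print(f"sites {j + 1}-{i + 1}: r = {corr[i][j]:.4f}, "
              f"cov = {cov[i][j]:.1f}")
```

The reconstructed values should agree with the printed Covariance/Variance/Correlation matrix up to rounding. Comparing them with the US estimates from Variant 2 gives a feel for how much structure the single factor captures.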
MET ANALYSIS
Two-Stage Analyses

• An MET analysis with several sites (> 5) is difficult to fit, particularly
if there are too many variance components to estimate (e.g. US).
• It is possible to use a two-stage analysis that is decomposed as:

1st Stage
• Every site is analysed individually with its own characteristics.
• Genotype effects are assumed fixed.
• Means and SEMs are obtained for each site.

2nd Stage
• All means (and SEMs) are combined into a single file.
• The use of !TWOSTAGEWEIGHTS generates weights (and covariance) for
each prediction and combines the analyses into a single run.
Session 10
Spatial
Analysis

SPATIAL ANALYSIS
General Uses
• It corresponds to an extension of the single vector repeated measures analysis.
• Incorporates information from physical positions (x and y coordinates).
• Effect: improves estimates (BLUPs) and allows for a better control of errors.
Hence, it will increase heritability and genetic gains.
• More efficient analysis (under presence of correlation) as it ‘borrows’
information from neighbours.
• ASReml can handle regular or irregular grids.
• Can be used for unreplicated trials!

Difficulties
• At present it is more of an ‘art’ that requires evaluating several options.
• Requires knowledge of the position of each individual experimental unit
(e.g. plant or plot).
• Additional variance components need to be estimated (i.e. convergence
problems).
SPATIAL ANALYSIS
• Gradients or Trends
Linear trends
Polynomial functions, e.g. $f(x_c, y_c) = \beta_0 + \beta_1 x_c + \beta_2 y_c + \beta_3 x_c^2 y_c + \beta_4 x_c y_c^2$
Row or Column effects (random).
• Patches
Incomplete Blocks
Spatial Error Structures, e.g. AR1 ⊗ AR1 + η
$\mathrm{Var}(e_{ij}) = \sigma_s^2 + \sigma_{ms}^2$
$\mathrm{Cov}(e_{ij}, e_{i'j'}) = \sigma_s^2 \, \rho_x^{h_x} \, \rho_y^{h_y}$
SPATIAL ANALYSIS

Strategy in ASReml (regular grid)

• Begin with a separable autoregressive error structure: AR1 ⊗ AR1. This is
a first order autoregressive model that assumes separate correlations ρx and
ρy for columns and rows, respectively (i.e. AR1).
• Evaluate if a nugget effect is required (i.e. !r units).
• Check the variogram and incorporate additional random or fixed effects for
trends.
• Use a likelihood ratio test (LRT), BIC or AIC to compare models.

Strategy in ASReml (irregular grid)

• Begin with an isotropic exponential (i.e. IEXP) and then move to more
complex models (e.g. AEXP).
• As before, evaluate if a nugget effect is required (i.e. !r units), check the
variogram and incorporate additional random or fixed effects.
VARIANCE STRUCTURES
Correlation/Spatial structures
Code   Description                          No. of parameters
AR1    First order autoregressive           1
AR2    Second order autoregressive          2
ARMA   Autoregressive and moving average    2
LVR    Linear variance                      1
IEXP   Isotropic Exponential                1
AEXP   Anisotropic Exponential              2

Relevant functions in ASReml

!S2==1 used to fix the R variance to 1.0
!f mv to include dummy missing values in sparse form
units includes nugget (microsite) random error
pol(y,n) forms a set of orthogonal polynomials of order n
lin(f) transforms the factor f into a covariate
fac(v) forms a factor with the values of a continuous variable
spl(v,k) defines a spline model term for the variable v with k knots
SPATIAL ANALYSIS
Heritability in spatial models
• Traditional expression is only valid when distance between individuals is
assumed to be zero.
• Generic expression for spatial analyses:

$h^2 = \dfrac{4\sigma_g^2}{\sigma_g^2 + (\rho_x^{|d_x|} \times \rho_y^{|d_y|})\,\sigma_e^2 + \sigma_0^2}$

• An alternative is to use the PEVs to approximate the mean parental
heritability:

$h^2_{PEV} = 1 - \dfrac{\mathrm{mean}\{PEV(g)\}}{\sigma_g^2}$
SPATIAL ANALYSIS
Comparing spatial models
• Use LRT when models are nested and have the same fixed effect terms.
• Compare AIC (Akaike Information Criteria) and BIC (Bayesian
Information Criteria) to select among non-nested models (but with same
fixed effect terms).
• Use a h2PEV to compare among different models.
• Calculate one of the proposed R2 expressions for mixed models.

AIC = -2×logL + 2×t

BIC = -2×logL + t×log(v)

t number of variance parameters in the model

v residual degrees of freedom, v = n - p
SPATIAL TRIAL
Example: /Day2/Spatial/ROWCOL.TXT
An experiment was established to evaluate a group of open-pollinated families. The
experiment consisted of a row-column design with 4 replicates. The plants within the
experiment were arranged in a 16x16 grid, and it is of interest to rank female parents
based on the response yield (YA) by fitting a spatial model.
ID REP ROW COL PLOT TREE FEMALE X Y YA
1 2 4 1 14 2 4 1 1 8.628352
2 2 4 1 14 1 4 1 2 7.718902
3 2 3 1 26 2 7 1 3 8.041164
4 2 3 1 26 1 7 1 4 9.593278
5 2 2 1 62 2 16 1 5 8.739841
6 2 2 1 62 1 16 1 6 8.456119
7 2 1 1 50 2 13 1 7 9.557565
8 2 1 1 50 1 13 1 8 10.639179
9 1 4 1 1 2 1 1 9 9.938713
10 1 4 1 1 1 1 1 10 8.332414
11 1 3 1 53 2 14 1 11 10.495654
12 1 3 1 53 1 14 1 12 10.130853
13 1 2 1 37 2 10 1 13 11.983712
14 1 2 1 37 1 10 1 14 12.080121
15 1 1 1 33 2 9 1 15 11.203263
16 1 1 1 33 1 9 1 16 10.757546
17 2 4 1 14 4 4 2 1 9.797591
18 2 4 1 14 3 4 2 2 9.206996
19 2 3 1 26 4 7 2 3 8.786462
...
SPATIAL TRIAL
Example: /Day2/Spatial/Spatial_.as
!RENAME !ARGS 1 2
Genetic Spatial trial
ID
REP !I
ROW !I
COL !I
PLOT !I
TREE
FEMALE !A
X 16 # X coordinate
Y 16 # Y coordinate
YA
ROWCOL.TXT !SKIP 1 !MAXIT 40 !DISPLAY 15 !DOPART $A

!PART 1
YA ~ mu REP !r REP.ROW REP.COL FEMALE REP.PLOT !f mv
1 2 0
16 0 ID
16 0 ID

!PART 2
YA ~ mu REP fac(Y) fac(X) !r REP.ROW REP.COL FEMALE REP.PLOT !f mv
1 2 0
16 X AR1 0.3
16 Y AR1 0.3
SPATIAL TRIAL
Interpreting variograms
SPATIAL TRIAL
Traditional Analysis
LogL=-55.4337 S2= 0.40323 252 df

Source Model terms Gamma Component Comp/SE % C

REP.ROW 16 16 0.447880 0.180596 1.96 0 P
REP.COL 16 16 0.144506 0.582684E-01 1.32 0 P
FEMALE 16 16 0.260711 0.105125 1.81 0 P
REP.PLOT 64 64 0.105548 0.425594E-01 0.96 0 P
Variance 256 252 1.00000 0.403225 9.80 0 P

Spatial Analysis
LogL=-61.9450 S2= 0.41594 224 df

Source Model terms Gamma Component Comp/SE % C

REP.ROW 16 16 0.245284 0.102024 1.20 0 P
REP.COL 16 16 0.101193E-06 0.420904E-07 0.00 0 B
FEMALE 16 16 0.279467 0.116242 2.02 0 P
REP.PLOT 64 64 0.503325E-07 0.209354E-07 0.00 0 B
Variance 256 224 1.00000 0.415943 8.85 0 P
Residual AR=AutoR 16 0.522643E-01 0.522643E-01 0.68 0 U
Residual AR=AutoR 16 0.210814 0.210814 2.85 0 U

Wald F statistics
Source of Variation NumDF DenDF F-inc P-inc
11 mu 1 13.8 6143.27 <.001
2 REP 3 5.6 4.27 0.062
12 fac(Y) 14 14.8 3.33 0.014
13 fac(X) 14 43.1 1.60 0.119
SPATIAL ANALYSIS
BLUP values
Traditional Spatial
Female BLUP SE(BLUP) BLUP SE(BLUP)
1 -0.215 0.197 -0.277 0.189
2 0.204 0.197 0.191 0.190
3 -0.154 0.197 -0.129 0.188
4 -0.099 0.197 -0.207 0.189

Heritabilities
Traditional Spatial
Va 0.421 0.465
Vp 0.790 0.634
mean(PEV) 0.039 0.036
h2 0.532 0.733
h2pev 0.631 0.693
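The entries of the heritability table can be recomputed from the two variance component tables shown earlier for this trial. The Python sketch below uses hypothetical names; Vp is taken as the sum of all listed components and Va as 4 times the FEMALE component, as in the half-sib formulas. The mean PEV values are those printed in the table (rounded to three decimals), so h2pev is only reproduced approximately.

```python
# Components from the Traditional and Spatial analyses (Component column)
traditional = {"REP.ROW": 0.180596, "REP.COL": 0.582684e-01,
               "FEMALE": 0.105125, "REP.PLOT": 0.425594e-01,
               "Variance": 0.403225}
spatial = {"REP.ROW": 0.102024, "REP.COL": 0.420904e-07,
           "FEMALE": 0.116242, "REP.PLOT": 0.209354e-07,
           "Variance": 0.415943}

def summarize(components, mean_pev):
    """Va, Vp, narrow-sense h2 and PEV-based h2 for a half-sib model."""
    va = 4.0 * components["FEMALE"]
    vp = sum(components.values())
    return {"Va": va, "Vp": vp, "h2": va / vp,
            "h2pev": 1.0 - mean_pev / components["FEMALE"]}

trad_res = summarize(traditional, mean_pev=0.039)
spat_res = summarize(spatial, mean_pev=0.036)

for name, res in (("Traditional", trad_res), ("Spatial", spat_res)):
    print(name, {key: round(value, 3) for key, value in res.items()})
```

The gain in h2 for the spatial model comes mostly from a smaller Vp, since the row, column and plot components shrink once the AR1 correlations absorb the local trend.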
UNREPLICATED TRIALS (UR)

• Field experiments that allow testing several hundred genotypes with
little or no replication.
• Useful for initial stages of genotype screening.
• Most treatments (with the exception of controls or checks) have a single
replication.
• Checks are used for estimation of local control and to detect trends, and they
allow estimation of the residual variance.
• Typically augmented designs are the base for unreplicated trials.
• Using too many check plots could be expensive.
• Checks should have a similar response to the test genotypes.
• Statistical analysis can be based on simple (e.g. RCBD) or spatial models
(e.g. AR1 ⊗ AR1).
UNREPLICATED TRIALS (UR)
General recommendations
• More control plots improve the efficiency of UR experiments.
• Important gains in efficiency are achieved by using spatial analyses.

11 C2 24 112 23 69 C1 96 22 6 34 C1
85 101 48 C1 28 7 89 60 C2 108 74 56
47 C1 10 43 C2 16 52 5 38 33 C2 93
65 111 64 100 81 104 C2 78 C1 113 21 106
12 C2 44 68 42 C1 97 17 32 73 C1 35
25 C1 27 C2 15 88 29 4 53 C2 55 75
102 84 1 49 C1 61 70 C2 18 95 37 C1
46 86 C2 63 2 51 79 39 59 92 C2 57
66 13 C1 82 41 98 C2 90 C1 77 20 36
C1 45 83 87 C2 62 3 30 72 54 105 76
26 C2 9 14 50 8 40 C1 31 19 C2 C1
110 103 67 C1 99 80 C2 71 91 58 109 94
UNREPLICATED TRIALS (UR)
Example: /Day2/UnRep/PEPPER.TXT
An unreplicated pepper trial was established to evaluate a total of 824 genotypes
planted in single plots and arranged as a RCBD with 4 blocks. In addition, a total of
10 control genotypes were planted with 20 replications each (i.e. 5 replications per
block). All these individuals were arranged in a 32x32 grid, and the response variable
yield, YD, was obtained. It is of interest to rank all the single replicated genotypes.

Gens Control Rep X Y YD

6 0 1 1 25 7.91
16 0 1 7 17 9.04
18 0 1 11 26 9.53
19 0 1 16 20 10.08
22 0 1 2 27 9.78
35 0 1 10 26 9.21
39 0 1 4 30 8.86
40 0 1 8 24 9.15
42 0 1 11 25 9.38
45 0 1 15 22 10.64
48 0 1 10 32 10.32
50 0 1 10 31 11.22
51 0 1 8 26 11.45
...
UNREPLICATED TRIALS (UR)
Example: /Day2/UnRep/Unrep_.as
!RENAME !ARGS 1
Augmented Design
Gens 824 !I !SORT
Control 2 !I !SORT
Rep 4 !I
X 32
Y 32
YD
PEPPER.TXT !SKIP 1

!DOPART $A !MAXIT 50

!PART 1
YD ~ mu !r Rep Gens !f mv
1 2 0
32 0 ID
32 0 ID

!PART 2
YD ~ mu !r Rep Gens
1 2 0
32 X AR1 0.5
32 Y AR1 0.5
UNREPLICATED TRIALS (UR)
Traditional Analysis
LogL=-478.184 S2= 0.74805 1023 df

Source Model terms Gamma Component Comp/SE % C

Rep 4 4 0.101193E-06 0.756971E-07 0.00 0 B
Gens 834 834 0.282634 0.211424 2.66 0 P
Variance 1024 1023 1.00000 0.748048 10.43 0 P

Spatial Analysis
LogL=-468.587 S2= 0.77062 1023 df

Source Model terms Gamma Component Comp/SE % C

Rep 4 4 0.101193E-06 0.779810E-07 0.00 0 B
Gens 834 834 0.238505 0.183796 2.48 0 P
Variance 1024 1023 1.00000 0.770617 11.04 0 P
Residual AR=AutoR 32 0.113712 0.113712 2.98 0 U
Residual AR=AutoR 32 0.120829 0.120829 3.06 0 U
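Both pepper models have the same fixed effects (mu only), so their REML log-likelihoods can be compared. The sketch below is plain Python with hypothetical names. It computes the LRT for the two extra AR1 correlations, plus AIC and BIC using the conventional BIC penalty t×log(v). Counting every listed variance parameter, including the Rep component at the boundary, is an assumption.

```python
import math

def info_criteria(logl, n_var_params, resid_df):
    """AIC and BIC from a REML log-likelihood."""
    aic = -2.0 * logl + 2.0 * n_var_params
    bic = -2.0 * logl + n_var_params * math.log(resid_df)
    return aic, bic

# LogL and residual df as printed; t = number of variance parameters listed
trad = {"logl": -478.184, "t": 3, "df": 1023}   # Rep, Gens, Variance
spat = {"logl": -468.587, "t": 5, "df": 1023}   # + two AR1 correlations

lrt = 2.0 * (spat["logl"] - trad["logl"])
extra_params = spat["t"] - trad["t"]

# For exactly 2 degrees of freedom the chi-square upper tail is exp(-x/2)
assert extra_params == 2
p_value = math.exp(-lrt / 2.0)

aic_trad, bic_trad = info_criteria(trad["logl"], trad["t"], trad["df"])
aic_spat, bic_spat = info_criteria(spat["logl"], spat["t"], spat["df"])

print(f"LRT = {lrt:.3f} on {extra_params} df, p = {p_value:.2e}")
print(f"AIC: traditional {aic_trad:.2f}, spatial {aic_spat:.2f}")
print(f"BIC: traditional {bic_trad:.2f}, spatial {bic_spat:.2f}")
```

Under these assumptions all three criteria favour the spatial model for this trial.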
Session 11
Generalized
Linear Mixed Models

General Uses
• It corresponds to an extension of linear mixed models to situations with a
distribution other than the Normal, typically Binomial and Poisson.
• It needs the specification of the distribution, together with a link function that
connects the response to the explanatory variables of the linear model.
• For generalized linear models, estimation of parameters is based on maximum
likelihood estimation (MLE), and therefore it can run into problems.
• For generalized linear mixed models, estimation of parameters is based on an
approximation to the MLE.
• Testing is done using a LRT, mainly by comparison of the mean deviance.

Difficulties
• Interpretation and calculation of genetic parameters are more difficult as we
are on a different scale.
• Convergence problems are common, and with unbalanced data it is common
to have biologically inconsistent estimates.
BINOMIAL RESPONSES

General expression: $g(\mu) = X\beta + Zg$

Link (logit): $\log_e\left(\dfrac{p}{1-p}\right) = X\beta + Zg$

Back-transformed model: $p = \dfrac{\mu}{n_i} = \dfrac{\exp(X\beta + Zg)}{1 + \exp(X\beta + Zg)}$

Variance expression: $\mathrm{Var}(p) = \phi \, \dfrac{p(1-p)}{n_i}$

φ over-/under-dispersion parameter

Note: ni = 1 for binary data.

POISSON RESPONSES

General expression: $g(\mu) = X\beta + Zg$

Link (log): $\log_e m = X\beta + Zg$

Back-transformed model: $m = \exp(X\beta + Zg)$

Variance expression: $\mathrm{Var}(m) = \phi \, m = \phi \exp(X\beta + Zg)$

φ over-/under-dispersion parameter

FITTING A GLMM

Relevant functions in ASReml

!BIN assumes a Binomial distribution for the response
!TOTAL specifies vector with the Binomial totals
!POISSON assumes a Poisson distribution for the response
!DISP k estimates the dispersion parameter, or fixes it to k
!LOGIT considers a logit link function
!PROBIT considers a probit link function
!AOD obtains the analysis of deviance table for fixed effects.

Alternatives
• Perform a transformation of the original data, and then back-transform
predictions.
• Assume a normal distribution (by the CLT), whenever values are relatively
large.
• Collapse the data to a higher stratum (e.g. PLOT).
GLMM MODEL
Heritability in GLMM (Binomial)
• The calculation is not direct and requires an approximation.
• Several alternatives are available in the literature.

Logit approach
$h^2_{logit} = \dfrac{4\sigma^2_s}{\sigma^2_s + \sigma^2_e}$ with $\sigma^2_e = \pi^2/3$

Distributional approach
$h^2_{Bin} = \dfrac{4\sigma^2_s}{\sigma^2_s + p(1-p)}$
BINOMIAL MODEL
Example: /Day2/GLMM/SALMONAB.TXT
A salmon breeding program evaluated a total of 933 records of fish originating
from 124 families. The objective is to select the individuals that will become the
parents of the next generation. The response variables are MARKETA and
MARKETB, binary responses that indicate whether a given individual qualifies
for a given market category. The linear model to fit should consider the full
pedigree and the factor SEX as a fixed effect.
INDIV Sire Dam DaysM Sex MarketA MarketB
1001 564 727 741.46 1 1 1
1002 564 727 500.09 2 1 1
1003 564 727 495.07 1 1 1
1004 564 727 506.25 2 0 0
1005 564 727 593.21 2 1 1
1006 564 727 671.1 1 1 1
1007 564 727 523.48 1 1 1
1008 564 727 531.33 1 1 1
1009 564 727 446.02 2 1 0
1010 564 727 599.2 1 1 1
1011 564 727 509.38 2 1 1
1012 564 727 643.45 2 1 1
1013 607 707 711.68 1 1 1
...
BINOMIAL MODEL
Example: /Day2/GLMM/GLMFish_.as
!RENAME !ARGS 1
Breeding Program Salmon
INDIV 2040 !P !SORT
SIRE 115 !I
DAM 124 !I
DAYSM
SEX 2 !I
MARKETA
MARKETB
PEDIND.TXT !SKIP 1 !MAKE
SALMONAB.TXT !SKIP 1

!MAXIT 40 !DISPLAY 2 !FCON !DOPART $A

!PART 1
MARKETA !BIN !AOD ~ mu SEX !r INDIV
predict INDIV

!PART 2
MARKETA ~ mu SEX !r INDIV
BINOMIAL MODEL
Interpreting output
Analysis of Deviance Table for MARKETA
Source of Variation df Deviance Derived F
SEX 1 9.20 15.964
Deviance from GLM fit 931 536.33
Variance heterogeneity factor [Deviance/DF] 0.58
Notice: The Derived F is calculated assuming 931 degrees of freedom
which will usually be a false assumption under a mixed model.
The Analysis of Variance below is of the 'working' variable.

Approximate stratum variance decomposition

Stratum Degrees-Freedom Variance Component Coefficients
INDIV 6.68 0.575366 1.0

Source Model terms Gamma Component Comp/SE % C

INDIV 1380 1380 0.575366 0.575366 1.83 0 P
Variance 933 931 1.00000 1.00000 0.00 0 F

Wald F statistics
Source of Variation NumDF DenDF_con F-inc F-con M P-con
8 mu 1 162.4 276.33 134.54 . <.001
5 SEX 1 931.0 4.36 4.36 A 0.038
GLMM MODEL
Heritability
$h^2_{logit} = \dfrac{4\sigma^2_s}{\sigma^2_s + \sigma^2_e} = \dfrac{0.575}{0.575/4 + 1 \times 3.290} = 0.168$
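The arithmetic of the logit-scale heritability can be checked with a few lines of Python. This is only a sketch: it assumes, as the slide's numbers imply, that the INDIV component (0.575366) is an additive variance equal to 4 times the sire variance, that the dispersion is fixed at 1, and that the residual variance on the logit scale is π²/3. The function name is hypothetical.

```python
import math

def h2_logit(sigma2_a, phi=1.0):
    """Logit-scale heritability from the additive (INDIV) variance component."""
    sigma2_s = sigma2_a / 4.0             # sire variance implied by sigma2_a
    sigma2_e = phi * math.pi ** 2 / 3.0   # residual variance on the logit scale
    return 4.0 * sigma2_s / (sigma2_s + sigma2_e)

# INDIV component taken from the ASReml output shown earlier
print(round(h2_logit(0.575366), 3))
```

With the reported component the function returns approximately 0.168, the value quoted on the slide.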

Predictions
Predicted values of MARKETA
The SIMPLE averaging set: SEX

INDIV Logit_value Stand_Error Ecode Retransformed_value approx_SE

501 2.0548 0.7383 E 0.8864 0.0978
502 2.2073 0.7194 E 0.9009 0.0851
503 1.8882 0.7264 E 0.8686 0.1069
504 2.0722 0.7363 E 0.8882 0.0964
505 2.1586 0.7255 E 0.8965 0.0891
506 2.3017 0.7438 E 0.9090 0.0830
507 2.4341 0.7256 E 0.9194 0.0728
508 1.8930 0.7242 E 0.8691 0.1062
509 2.0337 0.7414 E 0.8843 0.0998
510 1.8163 0.7330 E 0.8601 0.1130
511 2.3666 0.7343 E 0.9142 0.0778
512 2.0696 0.7365 E 0.8879 0.0966
513 2.0390 0.7408 E 0.8848 0.0993
...
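The Retransformed_value column is the inverse logit of Logit_value, which can be reproduced with the sketch below. For the rows shown, the approx_SE column also appears to agree with the distance between the back-transformed prediction and the back-transformed value one standard error below it; that reading is an observation from these numbers, not a documented ASReml formula.

```python
import math

def inv_logit(eta):
    """Inverse logit: linear predictor -> probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# (INDIV, Logit_value, Stand_Error) copied from the first rows of the table
rows = [(501, 2.0548, 0.7383), (502, 2.2073, 0.7194), (503, 1.8882, 0.7264)]

for indiv, eta, se in rows:
    p = inv_logit(eta)                   # compare with Retransformed_value
    lower_gap = p - inv_logit(eta - se)  # compare with approx_SE
    print(indiv, round(p, 4), round(lower_gap, 4))
```

Because the logit scale is non-linear, an interval built on the logit scale and then back-transformed is not symmetric around the predicted probability.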
Session 12
Genomic Selection
In ASReml

• Genetic improvement aims to select the best individuals for the production
and breeding populations. However, traditional breeding is a long and
expensive process, with many traits that are difficult to measure.

• More than 20 years ago, molecular markers promised to aid breeders in
selection through Marker Assisted Selection (MAS). Performing MAS
required a QTL or association-genetics type of analysis.

• MAS did work in a few situations, where a marker-QTL association was
found to explain a significant portion of the variance, mainly from single
QTLs with large effect.

• However, most traits of interest in breeding programs are quantitative,
complex traits controlled by a large number of genes.

• Meuwissen et al. (2001) proposed using all markers simultaneously as
random effects to predict genetic performance (a.k.a. Genomic Selection).
GENOMIC SELECTION
• Construct prediction models using the phenotypes of the current breeding
population and molecular markers that capture most of the quantitative variation.

Quantitative phenotypic information: Breeding Value (BV)
Genotypic information: Molecular Markers

Prediction model construction:
$BV = 1\mu + \sum_{j=1}^{p} W_j m_j + e$

[Figure: Supplementary Figure 1. Histograms of (a) the diagonal and (b) the off-diagonal elements of the raw estimates of the genetic relationship matrix, and (c) the diagonal and (d) the off-diagonal elements of the adjusted estimates; panel (e) shows a density against Z-score.]
GENOMIC SELECTION
• Future individuals are genotyped, and their genotypes are used as input to the
prediction models to select superior genotypes in the next cycles.

Genotypes (Generation i): Molecular Markers
Prediction: $BV_F = \sum_{j=1}^{p} W_{jF}\,\hat{m}_j$
Selection (Generation i+1)
Deployment
BENEFITS OF GS

• Decrease the generation cycle of breeding (e.g. Perennials, Cattle).

• Decrease the cost of testing (e.g. Cattle, Maize).
• Screening a larger number of genotypes without field testing, thus
increasing the selection pressure (e.g. Maize, other cereals).
• Predict performance for difficult and/or expensive traits (e.g. Cattle,
Salmon).
• Predict disease performance without challenging, and possibly losing, the
germplasm (all species).
• Can be used regardless of the genetic architecture of the trait.

Note
• To apply GS successfully the constructed models need to accurately predict
the genetic performance.
GENOMIC SELECTION

Accuracy depends on:

• The level of linkage disequilibrium (LD) between the markers and the QTL
(effective population size and genotyping density).

• The number of individuals with phenotypes and genotypes in the reference
population (training set) from which the marker effects are estimated.

• The heritability of the trait in question or, if deregressed breeding values
are used (clonal means or progeny testing), the reliability of these breeding
values.

• The distribution of QTL effects, i.e. number of loci involved.

• Quality of the phenotyping used to construct the prediction model.

ANALYTIC METHODS FOR GS
• BLUP-Based: G-BLUP, RR-BLUP, RR-BLUP_B
• Bayes-Based: BayesA, BayesB, BayesCπ, BayesR
• LASSO-Based: Bayesian Lasso Regression, Improved Lasso
• Semi-Parametric Regression: RKHS
• Non-Parametric: Support Vector Machine, Neural Networks
• Others...

Meuwissen et al. 2001; Habier et al. 2011; De los Campos et al. 2009; Legarra et al. 2011;
Gianola et al. 2006; Long et al. 2011; Gianola et al. 2011
GBLUP
• Genomic BLUP (GBLUP) is a Genomic Selection method that uses the
same framework as BLUP analysis, but replaces:

– the numerator relationship matrix (A), derived from the pedigree, with
– the realized relationship matrix (GA), derived from molecular
markers.

• GA is also known as the observed relationship matrix or genomic matrix.

FROM MARKERS TO GA
Example: $\hat{a} = W\hat{m}$

$$W = \begin{bmatrix} 1 & 0 & 1 & 2 \\ 2 & 2 & 0 & 2 \\ 2 & 1 & 1 & 0 \\ 0 & 2 & 2 & 1 \end{bmatrix}, \quad \hat{m} = \begin{bmatrix} 0.24 \\ 0.02 \\ -0.08 \\ 0.14 \end{bmatrix}, \quad \hat{a} = \begin{bmatrix} 0.44 \\ 0.80 \\ 0.42 \\ 0.02 \end{bmatrix}$$

• If the markers capture all the genetic variation, then we can assume that: $a = Wm$
• If we also assume: $V(m) = I\sigma^2_m$
• Then we get: $V(a) = WW'\sigma^2_m$
which is a covariance matrix for the individual breeding values a.
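The worked example on this slide can be reproduced with a short Python sketch. The marker matrix and marker effects below are read from the slide; the signs of the marker effects were inferred so that the products match the printed breeding values, and the helper name `mat_vec` is hypothetical.

```python
# Marker matrix W (individuals in rows, markers in columns, coded 0/1/2)
W = [
    [1, 0, 1, 2],
    [2, 2, 0, 2],
    [2, 1, 1, 0],
    [0, 2, 2, 1],
]
# Estimated marker effects (m-hat)
m_hat = [0.24, 0.02, -0.08, 0.14]

def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * m for w, m in zip(row, vector)) for row in matrix]

a_hat = mat_vec(W, m_hat)   # predicted breeding values, a-hat = W m-hat
print([round(a, 2) for a in a_hat])
```

Each breeding value is simply the sum of the marker effects weighted by the individual's allele counts, which is why the covariance among breeding values depends on WW'.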
FROM MARKERS TO GA
• Ideally, we want to model this covariance using the same classical Linear
Mixed Model framework; therefore, it would be desirable to express this
matrix in terms of $\sigma^2_a$.

$$\sigma^2_a = \sum_{i=1}^{ALL\_SNPs} 2 p_i q_i\,\sigma^2_m \quad\Rightarrow\quad \sigma^2_m = \frac{\sigma^2_a}{2\sum_{i=1}^{ALL\_SNPs} p_i q_i}$$

• If we recall that $V(a) = WW'\sigma^2_m$, then by replacing $\sigma^2_m$:

$$V(a) = \frac{WW'}{2\sum_i p_i q_i}\,\sigma^2_a = G_A\,\sigma^2_a$$
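A minimal Python sketch of the scaling WW'/(2 Σ p_i q_i) follows. Several choices here are assumptions and not part of the slides: genotypes are coded as allele counts 0/1/2, allele frequencies are estimated from the same sample, and the `centre` option subtracts 2p_i from each marker column, which is the centring used in VanRaden's (2008) first method. The function name is hypothetical.

```python
def genomic_relationship(W, centre=True):
    """Return G = Z Z' / (2 * sum(p_i * q_i)) from an allele-count matrix W."""
    n_ind = len(W)
    n_snp = len(W[0])
    # Allele frequency of each marker, estimated from the 0/1/2 codes
    p = [sum(row[j] for row in W) / (2.0 * n_ind) for j in range(n_snp)]
    scale = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    # Optionally centre each marker column by twice its allele frequency
    Z = [[row[j] - (2.0 * p[j] if centre else 0.0) for j in range(n_snp)]
         for row in W]
    return [[sum(Z[i][k] * Z[j][k] for k in range(n_snp)) / scale
             for j in range(n_ind)] for i in range(n_ind)]

# Toy marker matrix (4 individuals x 4 markers), for illustration only
W = [[1, 0, 1, 2], [2, 2, 0, 2], [2, 1, 1, 0], [0, 2, 2, 1]]
for row in genomic_relationship(W):
    print([round(x, 3) for x in row])
```

With only four markers the result is far too noisy to be a usable relationship matrix; in practice thousands of markers are used and the output is then written to a .grm or .giv file.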
ANIMAL MODEL GBLUP

$y = X\beta + Z_1 b + Z_2 a + e$
β vector of fixed effects
b vector of random design effects (e.g. block effect), $b \sim N(0, I\sigma^2_b)$
a vector of random additive effects (i.e. BV), $a \sim N(0, G_A\sigma^2_a)$
e vector of random residual effects, $e \sim N(0, I\sigma^2)$

Note:
• The variance-covariance matrix (GA) of the additive effects is now
derived from molecular markers, and it replaces the old A matrix.
GBLUP
$$A = \begin{bmatrix} 1 & 0.5 & 0.25 & 0 \\ 0.5 & 1 & 0.25 & 0 \\ 0.25 & 0.25 & 1 & 0.25 \\ 0 & 0 & 0.25 & 1 \end{bmatrix} \qquad G_A = \begin{bmatrix} 0.98 & 0.42 & 0.23 & -0.02 \\ 0.42 & 0.99 & 0.26 & 0.01 \\ 0.23 & 0.26 & 1.03 & 0.20 \\ -0.02 & 0.01 & 0.20 & 0.99 \end{bmatrix}$$
GBLUP

ADVANTAGES AND CONSIDERATIONS

• The use of GBLUP instead of pedigree-based BLUP was shown to
better partition the genetic from the environmental variation.
• The A matrix is derived based on the infinitesimal model and represents an
average (expected) relationship.
• The relationship matrix derived from the markers is more informative
because the relationship estimates include the Mendelian sampling.
• Finally, GBLUP is unbiased: E(GA) = A
GBLUP
ADVANTAGES AND CONSIDERATIONS (cont.):
• GBLUP uses the same framework as BLUP (Linear Mixed Models).
• Fewer normal equations need to be solved in the fitting of the model.
• GBLUP is equivalent to RR-BLUP but it is simpler to implement.
• Allows the direct estimation of individual accuracies (i.e. from the SEP found
in the .sln files).
• Permits the simultaneous analysis of genotyped and non-genotyped
individuals.

Problem:
• GA matrix is usually not positive definite
Solution:
• Bending the matrix (e.g. diag(GA) + 0.00001).
• Blending the matrix (e.g. GA* = 0.99 GA + 0.01 A).
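The two remedies can be sketched in Python as follows. The A and GA matrices are the 4 x 4 example used earlier in this session; the function names are hypothetical, and the constants 0.00001 and 0.99 are the ones quoted in the bullets. Note that adding a small constant to the diagonal shifts every eigenvalue by that constant, so it only helps when the most negative eigenvalue is smaller in magnitude than the constant.

```python
A = [[1.0, 0.5, 0.25, 0.0],
     [0.5, 1.0, 0.25, 0.0],
     [0.25, 0.25, 1.0, 0.25],
     [0.0, 0.0, 0.25, 1.0]]
GA = [[0.98, 0.42, 0.23, -0.02],
      [0.42, 0.99, 0.26, 0.01],
      [0.23, 0.26, 1.03, 0.20],
      [-0.02, 0.01, 0.20, 0.99]]

def bend(G, eps=0.00001):
    """Add a small constant to every diagonal element of G."""
    return [[g + (eps if i == j else 0.0) for j, g in enumerate(row)]
            for i, row in enumerate(G)]

def blend(G, A, weight=0.99):
    """Weighted average of the genomic and pedigree matrices."""
    return [[weight * g + (1.0 - weight) * a for g, a in zip(g_row, a_row)]
            for g_row, a_row in zip(G, A)]

for row in blend(GA, A):
    print([round(x, 4) for x in row])
```

Blending requires that A and GA list the same individuals in the same order.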
GBLUP

COMPUTING THE RELATIONSHIP MATRIX

• There are several different algorithms to compute the GA matrix from SNP
data:
– Hayes and Goddard (2008)
– VanRaden (2008) – 2 methods
– Yang et al. (2010) – Human genetics

• Relationship matrices work well to model the variance-covariance of
additive effects, provided a large number of markers is used.
• Overall, the different algorithms to calculate GA do not differ considerably
in their predictive ability.
GBLUP in ASReml
User supplied special variance structures

• The relationship matrix (GA) is computed using a given algorithm from other
software (R, Fortran, etc.) based on molecular markers, and then supplied to
ASReml.

• The GA matrix is supplied as an independent file in ASCII format.

• It should be listed in the job file after the pedigree file but before the
data file (up to 98 GA matrices are allowed).
• The extension of the file is:
name.grm if the relationship matrix, GA, is provided.
name.giv if the inverse of the relationship matrix, GA^-1, is provided.
GBLUP in ASReml
G matrix format
• It can be in dense format (lower triangle, row-wise), in which case the
!DENSE qualifier must be specified, or
• it can be read in SPARSE (default) format: row, column, value (lower
triangle, row-wise, sorted by column within rows).
• All diagonal elements of the matrix must be included in the file (even 1s).

Options
!SKIP [n]
!DENSEGRM, !DENSEGIV
!SAVEGIV [f] default dense format, use f = 1 for sparse format

Warning
• The number and order of levels have to match perfectly the ones used for
the associated factor, e.g. animalID, read in the data.
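The sparse layout described above (row, column, value for the lower triangle, row by row) can be generated from a full symmetric matrix with a sketch like the one below. The function name is hypothetical, the matrix is the 4-sire GA example used later in this session, and writing the lines to a file is left out.

```python
def sparse_lower_triangle(G):
    """Return 'row column value' lines for the lower triangle of G, row-wise."""
    lines = []
    for i, row in enumerate(G, start=1):
        for j in range(1, i + 1):          # columns 1..i, diagonal included
            lines.append(f"{i} {j} {row[j - 1]}")
    return lines

GA = [[1.023, 0.012, -0.036, 0.364],
      [0.012, 0.992, 0.226, 0.023],
      [-0.036, 0.226, 1.016, 0.068],
      [0.364, 0.023, 0.068, 0.987]]

print("Row Col G")                         # header line, skipped with !SKIP 1
print("\n".join(sparse_lower_triangle(GA)))
```

The row and column indices refer to the position of each level of the associated factor, so the matrix must be built in the same level order that ASReml uses for that factor.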
GBLUP in ASReml
How to associate the G matrix with the genetic factor?

A. In the variance specification lines, e.g.

DAYSM ~ mu SEX !r INDIV
0 0 1
INDIV 1
INDIV 0 GIV1 0.2

B. Directly in the model, e.g.

DAYSM ~ mu SEX !r giv(INDIV,1) 0.12

Warning: The number and order of levels have to match the ones used for
the associated factor read in the data.
GBLUP in ASReml
Example: /GBLUP/
An experiment evaluated a total of 10 individuals originating from
full-sib families of 4 sires and 4 dams. The objective is to fit a parental model
(i.e. to select sires) that considers the molecular pedigree information.

DATA.txt

INDIV Sire Dam Resp
1001 10 50 155
1002 10 60 121
1003 10 70 130
1004 20 50 141
1005 20 60 130
1006 20 70 162
1007 30 50 118
1008 30 60 108
1009 30 70 119
1010 40 80 143

PEDSIRE.txt

INDIV Sire Dam
10 1 0
20 2 0
30 2 0
40 1 0
GBLUP in ASReml
Sires (row and column order): 10, 20, 30, 40

$$A = \begin{bmatrix} 1 & 0 & 0 & 0.25 \\ 0 & 1 & 0.25 & 0 \\ 0 & 0.25 & 1 & 0 \\ 0.25 & 0 & 0 & 1 \end{bmatrix}$$

$$G_A = \begin{bmatrix} 1.023 & 0.012 & -0.036 & 0.364 \\ 0.012 & 0.992 & 0.226 & 0.023 \\ -0.036 & 0.226 & 1.016 & 0.068 \\ 0.364 & 0.023 & 0.068 & 0.987 \end{bmatrix} \qquad G_A^{-1} = \begin{bmatrix} 1.130 & -0.020 & 0.073 & -0.421 \\ -0.020 & 1.062 & -0.237 & -0.001 \\ 0.073 & -0.237 & 1.046 & -0.093 \\ -0.421 & -0.001 & -0.093 & 1.175 \end{bmatrix}$$
GMATRIX.grm

Row Col G
1 1 1.023
2 1 0.012
2 2 0.992
3 1 -0.036
3 2 0.226
3 3 1.016
4 1 0.364
4 2 0.023
4 3 0.068
4 4 0.987

GINVG.giv

Row Col GINV
1 1 1.130249244
2 1 -0.020490012
2 2 1.062319971
3 1 0.072807826
3 2 -0.2369711
3 3 1.045793666
4 1 -0.421368173
4 2 -0.0008723
4 3 -0.093379618
4 4 1.175023193
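Before supplying both files to ASReml, it can be worth confirming that the .giv values really are the inverse of the .grm values. The sketch below does that check in plain Python by multiplying the two matrices and measuring the largest deviation from the identity; the helper name `mat_mul` is hypothetical and the numbers are copied from the two listings.

```python
G = [[1.023, 0.012, -0.036, 0.364],
     [0.012, 0.992, 0.226, 0.023],
     [-0.036, 0.226, 1.016, 0.068],
     [0.364, 0.023, 0.068, 0.987]]
G_INV = [[1.130249244, -0.020490012, 0.072807826, -0.421368173],
         [-0.020490012, 1.062319971, -0.2369711, -0.0008723],
         [0.072807826, -0.2369711, 1.045793666, -0.093379618],
         [-0.421368173, -0.0008723, -0.093379618, 1.175023193]]

def mat_mul(X, Y):
    """Multiply two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

product = mat_mul(G, G_INV)
# Largest absolute deviation of G * G_INV from the identity matrix
max_dev = max(abs(product[i][j] - (1.0 if i == j else 0.0))
              for i in range(4) for j in range(4))
print(max_dev)
```

A deviation far above the rounding of the printed values would suggest that the two files do not describe the same matrix or that the level order differs.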
GBLUP in ASReml
GINV Matrix

!RENAME !ARGS 2
Evaluating GBLUP
INDIV 10 !I
Sire 4 !I
Dam 3 !I
Resp
GINVM.giv !SKIP 1
DATA.txt !SKIP 1 !DOPART $A

!PART 2 # Using GINVM.giv

Resp ~ mu !r giv(Sire,1) Dam
predict Sire

!PART 3 # Another way for GINV

Resp ~ mu !r Sire Dam
1 1 1
10 0 ID

Sire 1
Sire 0 GIV1 200
predict Sire

G Matrix

!RENAME !ARGS 4
Evaluating GBLUP
INDIV 10 !I
Sire 4 !I #!P #!I
Dam 3 !I
Resp
GMATRIX.grm !SKIP 1
DATA.txt !SKIP 1 !DOPART $A

!PART 4 # Using GMATRIX.grm

Resp ~ mu !r giv(Sire,1) Dam
GBLUP in ASReml
Predictions for ‘new’ individuals
GA =

        10      20      30      40      50      60
10   1.023   0.012  -0.036   0.364   0.083   0.176
20   0.012   0.992   0.226   0.023   0.023   0.508
30  -0.036   0.226   1.016   0.068  -0.011   0.136
40   0.364   0.023   0.068   0.987   0.123   0.495
50   0.083   0.023  -0.011   0.083   0.996   0.077
60   0.176   0.508   0.136   0.495   0.077   1.010

!RENAME !ARGS 4
Evaluating GBLUP
INDIV 10 !I
Sire 4 !P #!I
Dam 3 !I
Resp
DUMMYPED.txt !MAKE !SKIP 1
GMATRIX6.grm !SKIP 1
DATA.txt !SKIP 1 !DISPLAY 7 !DOPART $A

!PART 5 # Doing Predictions GMATRIX6.grm

Resp ~ mu !r giv(Sire,1) Dam
GBLUP in ASReml
Predictions for ‘new’ individuals
Source Model terms Gamma Component Comp/SE % C
Dam 4 4 0.318666 46.7809 0.48 0 P
giv(Sire,1) 6 6 1.14196 167.642 0.81 0 P
Variance 10 9 1.00000 146.802 1.41 0 P

Sire Predicted_Value Standard_Error Ecode

10 135.8410 7.3084 E
20 141.4311 7.3654 E
30 120.1485 7.3634 E
40 137.4303 9.8927 E
50 134.5924 15.2820 E
60 139.5677 11.4333 E
SED: Overall Standard Error of Difference 12.58
GBLUP in ASReml
FINAL COMMENTS
• Modifications can be made to incorporate the observed relationships of parents
and all offspring.
• Individuals with measurements form the training population, and 'new'
individuals in the GA matrix are treated as the prediction population.
• It is possible to combine pedigree data (A) with observed relationships (GA)
into a single matrix. This allows individuals without molecular data to be
considered.
• An observed dominance relationship matrix (GD) can also be incorporated to
model these interactions, or higher-order interactions, e.g. A#D.
• Further understanding of the construction (and properties) of the GA matrix
is required.
