This document is a presentation on how to analyse animal breeding data using the Bayesian approach to derive genetic parameter estimates.

Application of Bayesian Model for Animal Breeding Data Analysis

G. R. Gowane
ICAR-CSWRI, Avikanagar

Why do we need to estimate genetic parameters?

They give us knowledge of the genetic architecture of the population.
They are necessary to plan an efficient breeding programme for the trait of interest.

Accurate estimation of VC:
Prediction error variances for predicted random effects (breeding values) increase as the differences between estimated and true values of the variance components increase (Henderson, 1975).

Choice of methods for estimating VC

REML: DFREML, MTDFREML, VCE, Wombat
Gibbs Sampling (GS) algorithm for Bayesian analysis: MTGSAM (Van Tassell and Van Vleck, 1995), RRGibbs (Meyer)

Why Bayes?

It combines information from the prior with the data to obtain the posterior distribution: MORE ACCURACY!
Less memory space is required for estimating variance components (though does that really matter these days?).

It suits threshold trait analysis, for which several algorithms are being explored.
It is an alternative approach to REML.

Bayesian statistics uses probability to express uncertainty about the unknowns being estimated.
Probability is a more efficient way of expressing uncertainty than any other.
Unfortunately, to make this possible, inverse probability requires some prior information.

Three problems of the Bayesian approach

Difficulty of integrating prior information: how to represent ignorance,
because there is no prior information,
because we do not like the way in which the prior is integrated, or
because we would like to assess the information provided by the data without prior considerations.
Using probability to express uncertainty leads to multiple integrals that cannot be solved even with approximate methods.
In the 1990s, MCMC, a numerical method, came to the rescue.

In Bayesian theory we know that all problems reduce to a single pathway: we should look for a posterior distribution, given the distribution of the data and the prior distribution.
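The single pathway described above is Bayes' theorem; in standard notation (not shown on the slides):

```latex
p(\theta \mid y) \;=\; \frac{p(y \mid \theta)\, p(\theta)}{p(y)}
\;\propto\; p(y \mid \theta)\, p(\theta)
```

where p(θ) is the prior for the unknowns (e.g. the variance components), p(y | θ) is the likelihood of the data, and p(θ | y) is the posterior distribution we look for.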

A brief history of Bayesian analysis


Bayes (1763)
Links statistics to probability

Laplace (1800)
Normal distribution
Many applications, including census
[sampling models]

Gauss (1800)
Least squares
Applications to astronomy [measurement error models]

Keynes, von Neumann, Savage (1920s-1950s)


Link Bayesian statistics to decision theory

Applied statisticians (1950s-1970s)


Hierarchical linear models
Applications to animal breeding, education [data in groups]
Daniel Gianola (Wang et al., 1994) and Daniel Sorensen (Sorensen et al. 1994)
brought these techniques into the field of animal breeding.

Prior???
Litter size of Landrace pig ~ 10.
Spanish Landrace: observed mean = 5, prior mean = 10. What to do?
Prior information is the information about the parameters we want to estimate that exists before we perform our experiment.

Bayesian Variance Components Model: Prior Distributions

A "flat" prior distribution is used for the "fixed" effects; that is, there is no prior knowledge about these effects.
Next, the random effects are assumed to be normally distributed. For the genetic effects there is an additional assumption of a known covariance structure among those random effects, corresponding to the relationship matrix.
Finally, the residual effects are assumed to be normally distributed. These assumptions are the same as those used with most likelihood-based methods.
This results in BLUE and BLUP solutions for the fixed and random effects.

Gibbs Sampling Animal Model

Gibbs sampling (GS) is a method of numerical integration that allows inferences to be made about joint or marginal densities, even when those densities cannot be evaluated directly.
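A minimal sketch of this idea (an illustrative bivariate normal with correlation rho, not the animal-model sampler itself): each full conditional is a univariate normal that we can draw from directly, even though we never evaluate the joint density.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_iter, burn_in, seed=42):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Alternately draws x | y ~ N(rho*y, 1 - rho^2) and
    y | x ~ N(rho*x, 1 - rho^2), discarding the burn-in rounds.
    """
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho ** 2)   # conditional standard deviation
    x, y = 0.0, 0.0                  # arbitrary starting values
    samples = []
    for it in range(n_iter):
        x = rng.gauss(rho * y, sd)   # draw x given the current y
        y = rng.gauss(rho * x, sd)   # draw y given the new x
        if it >= burn_in:            # keep only post burn-in rounds
            samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(rho=0.8, n_iter=5000, burn_in=500)
mean_x = sum(s[0] for s in samples) / len(samples)
print(len(samples), round(mean_x, 2))
```

The retained draws behave like samples from the joint density, so their averages approximate marginal posterior means.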

Posterior inference is affected by the specified prior density unless the information in the data analysed (the likelihood) overwhelms the prior.
Under normality, the posterior distribution is simply the (frequentist) likelihood function scaled by the prior distributions of the unknown parameters in the model (Van Tassell and Van Vleck, 1996).

Burn-in: the number of rounds discarded before the values are considered samples from the posterior distribution is usually called the burn-in period (Raftery and Lewis, 1992).
Gibanal (Van Kaam, 1997) can be used to define the burn-in period and the convergence criteria for the problem.

Defining the number of iterations, or the length of the Gibbs sampling chain

The number of Gibbs sampling rounds should be large enough. Although Gibanal also suggests a chain length, one long chain suffices for the analysis (Geyer, 1992).

MTGSAM
Van Tassell and Van Vleck (1995)
Model assumptions:
y = Xb + Zu + e
where b is the vector of fixed effects associated with the records in y through X, u is the vector of random effects associated with the records in y through Z, and e is the vector of random residual effects.
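A toy simulation of this model (hypothetical numbers, not from the slides) makes the roles of the incidence matrices concrete: X maps each record to its fixed-effect level and Z maps each record to its animal.

```python
import random

# Toy instance of y = Xb + Zu + e with 4 records, 2 fixed levels, 3 animals.
b = [10.0, 12.0]                  # fixed-effect solutions (e.g. two sexes)
u = [0.5, -0.3, 0.1]              # animal breeding values
X = [[1, 0], [1, 0], [0, 1], [0, 1]]               # records -> fixed levels
Z = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]   # records -> animals

def matvec(M, v):
    """Plain-Python matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

rng = random.Random(1)
e = [rng.gauss(0.0, 1.0) for _ in range(4)]        # random residuals
y = [xb + zu + ei for xb, zu, ei in zip(matvec(X, b), matvec(Z, u), e)]
print([round(v, 2) for v in y])
```

Without the residuals, record 2 would equal 10.0 + (-0.3) = 9.7: the fixed-level mean plus the breeding value of the animal that made the record.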

Prior distributions in MTGSAM

Flat prior distributions are used for the fixed effects.
For the genetic effects there is an additional assumption of a known covariance structure corresponding to the numerator relationship matrix.
Inverted Wishart (IW) distributions are used as prior distributions for the (co)variance components.
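For a single variance component, the inverted Wishart reduces to a scaled inverse chi-square. The sketch below shows one such Gibbs draw with made-up sums of squares and prior values (an illustration of the distributional form, not MTGSAM's actual code).

```python
import random

def sample_variance(sum_sq, n, prior_df, prior_scale, rng):
    """Draw a variance component from its scaled inverse chi-square
    full conditional -- the one-dimensional case of the inverted
    Wishart prior. All inputs here are hypothetical.
    """
    df = n + prior_df
    scale = sum_sq + prior_df * prior_scale
    # A scaled inverse chi-square draw is scale / chi2(df);
    # chi2(df) is Gamma(df/2, scale 2), sampled with gammavariate.
    chi2 = rng.gammavariate(df / 2.0, 2.0)
    return scale / chi2

rng = random.Random(7)
# e.g. a residual sum of squares of 480 from n = 100 records,
# with a weak prior of 4 degrees of freedom and scale 5.0
draws = [sample_variance(480.0, 100, 4, 5.0, rng) for _ in range(5000)]
post_mean = sum(draws) / len(draws)
print(round(post_mean, 2))
```

Averaging many such draws over the post burn-in chain is exactly how a posterior mean for the variance component is reported.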

Variance components:
The MTGSAM posterior mean estimate for a (co)variance component is based on the expected value of the IW random variable.
The mean of a (co)variance component is calculated as the average of the expected values over the length of the post burn-in chain.

The MTGSAM for genetic analysis: preparing the pedigree and data files

The Malpura.prn file has 7 integer variables (three pedigree components and four fixed effects) and 9 real variables (8 traits and 1 covariate).

This procedure completes the execution of the program. The results are stored mainly in the MTGS60, MTGS61, MTGS62, MTGS63, MTGS81, MTGS82 and MTGS83 files.
The unit MTGS61 file contains the observed values of the variance components.
The unit 62 file contains the parameters used to generate the samples from the appropriate distribution.
These values can be extracted using the software PULLDAT.EXE (Annexure 2) to calculate the mean, SE or SD of the estimates obtained.

MTGS81 is the log file of information from the execution of MTGSNRM. The information includes the number of animals in A-inverse, the number of non-zero elements, and inbreeding information.

Actual results are given in the MTGS83 file.
The MTGS72 file holds the animal effect and the second animal effect.

Pulldat.exe
You need to have pulldat.exe in the same folder where MTGS61, MTGS62 and MTGS63 are present.
Only after the files have been passed through pulldat can they be used for Gibanal or other analyses.
Open the CMD window and change the path to the pulldat directory.
Give a name for the output file.
Follow the options as given, except at the point "Enter the number of variables in each record to be read from MTGS61". In our case, 5 variables need to be extracted; however, the input will change according to the data under consideration.
Gibanal.exe
The observed values of the fixed and random effects are written to unit 61.
We can use Gibanal to examine the serial correlation, the burn-in and the chain length.

A Bayesian analysis produces many samples of the same (co)variance component.
These values for Va, Vm, etc. do not depend on the number of animals but on the length of the chain you run.
You can use 10,000 animals in your analysis; in the frequentist approach you estimate only one value of Va, Vm, etc., but in the Bayesian approach you obtain a sample of these variances whose size depends on the sampling length. For example:
1 - Iteration length = 1100
2 - Burn-in = 100
3 - Thinning interval = 10
So we have: iteration length - burn-in = 1100 - 100 = 1000 (samples after burn-in).
Then: samples after burn-in / thinning interval = 1000 / 10 = 100. This is the number of sampled observations for Va, Vm, etc., not the number of animals.
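The bookkeeping above can be checked in a few lines (a sketch with a dummy chain, not real sampler output):

```python
# 1100 rounds, 100 burn-in, thin by 10 -- the example from the text.
chain = list(range(1100))       # pretend each entry is one sampled Va
burn_in, thin = 100, 10
kept = chain[burn_in::thin]     # drop burn-in, then keep every 10th draw
print(len(kept))                # -> 100 retained samples, whatever the herd size
```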

Frequency distribution of the sampled values (block = bin value):

Block   Frequency
2.18    1
2.46    3
2.73    5
3.01    10
3.29    13
3.57    22
3.85    23
4.13    9
4.41    2

Thank You
