Inferring an Original Sequence from Erroneous Copies: A Bayesian Approach

Keith, J.M., Adams, P., Bryant, D., Mitchelson, K.R., Cochran, D.A.E. and Lala, G.H.

    This paper considers the problem of inferring an original sequence from a number of erroneous copies. The problem arises in DNA sequencing, particularly in the context of emerging technologies that provide high throughput or other advantages, but at the cost of introducing many errors. We develop a Bayesian probabilistic model of the introduction of errors, and search for a sequence that has maximum posterior probability with respect to the model. We present results of extensive tests in which error-prone sequencing of real DNA was simulated. The results obtained using the new approach are compared to results obtained by deriving a consensus sequence from a multiple sequence alignment. We find that a significant improvement in accuracy is obtained using the new approach. The implication is that high error levels need not be a barrier to the adoption of sequencing technologies that are in other respects promising, because most errors can be detected and corrected using a small number of reads.
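The contrast between a maximum-posterior estimate and a consensus vote can be illustrated with a toy sketch. The abstract does not specify the paper's error model, so everything below is an assumption made for illustration only: reads are pre-aligned and of equal length, errors are substitutions only (no insertions or deletions), each read has a known error rate, and the prior over bases is uniform. The function names `map_sequence` and `majority_consensus` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

BASES = "ACGT"


def map_sequence(reads, error_rates):
    """Per-position maximum-posterior base under a toy model (assumed, not the paper's).

    Assumptions: reads are aligned and equal in length, errors are
    substitutions only, read i has known error rate error_rates[i],
    a wrong base is equally likely to be any of the other three,
    and the prior over bases is uniform.
    """
    length = len(reads[0])
    result = []
    for pos in range(length):
        def log_posterior(base):
            # Sum of log-likelihoods over reads; the uniform prior is a
            # constant and does not affect the argmax.
            return sum(
                math.log(1 - e) if read[pos] == base else math.log(e / 3)
                for read, e in zip(reads, error_rates)
            )
        result.append(max(BASES, key=log_posterior))
    return "".join(result)


def majority_consensus(reads):
    """Column-wise majority vote over aligned reads."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))


# One accurate read and two noisy reads that disagree at the third position.
reads = ["ACGT", "ACCT", "ACCT"]
error_rates = [0.01, 0.4, 0.4]

print(map_sequence(reads, error_rates))   # ACGT
print(majority_consensus(reads))          # ACCT
```

In this toy case the two noisy reads outvote the accurate one in the majority consensus, whereas the posterior weighs each read by its assumed reliability. The paper's actual model and search procedure are presumably more elaborate than this sketch.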
Cite as: Keith, J.M., Adams, P., Bryant, D., Mitchelson, K.R., Cochran, D.A.E. and Lala, G.H. (2003). Inferring an Original Sequence from Erroneous Copies: A Bayesian Approach. In Proc. First Asia-Pacific Bioinformatics Conference (APBC2003), Adelaide, Australia. CRPIT, 19. Chen, Y.-P. P., Ed. ACS. 23-28.