Bioinformatics Projects Introduction
Bioinformatics research projects are like traditional biological projects, only they are carried out
using computers. So does this mean you need to know how to write programs to do
bioinformatics research? No. Most bioinformatics projects involve using programs to generate new
data (e.g. BLAST), querying online databases and resources (such as genome browsers), and/or
downloading, integrating and analysing existing datasets (such as microarray or proteomics data).
Others may involve improving or adding information to existing databases, or maybe even building
new ones. Some projects may also involve the design and implementation of new software, or
adding new functions to existing programs. In all cases, bioinformatics projects address a biological
research question, have clearly identifiable objectives that the project sets out to meet, and a proper
study design that attempts to address these objectives, just like any experimental or field project.
Most bioinformatics projects are designed to be accessible to all students in FLS, and can be tailored
to suit the skills and interests of individual students. Even in cases where some programming may be
involved, all interested students should be able to successfully complete the projects regardless of
the degree programme that they are on. The requirements for a bioinformatics project are familiarity
with the standard word processing and spreadsheet tools, an interest in computing, and enthusiasm
for data analysis, but not specialist programming skills.
Examples. These are either examples of projects which have been run already, or generic cases
which represent the types of things students often do on bioinformatics projects.
Understanding differential splicing in the developing chicken embryo using bioinformatics: We
developed a database of 330,000 chicken cDNAs and ESTs (http://www.chick.umist.ac.uk), used by
researchers all over the world, derived from many different tissues including many chick embryos. In
this project, we mapped selected ESTs from different tissues back to the chicken genome, to see
whether alternatively spliced genes are represented in the chicken. Specifically, we were
able to identify genes involved in tissue-specific developmental roles, and then identify certain
spliced products favoured in some tissues over others. The project involved running many
bioinformatics tools and using the Ensembl genome browser web site, and required good data
handling skills. The literature review covered alternative gene splicing, tissue-specific
gene expression and the development of vertebrates. The student involved was named as an author on a
research paper (Tang H, Heeley T, Morlec R, Hubbard SJ. Characterising alternate splicing and tissue
specific expression in the chicken from ESTs. Cytogenet Genome Res. 2007; 117: 268-77).
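
To give a flavour of the data handling involved, here is a minimal Python sketch of the last step: once ESTs have been aligned to the genome, count which exons are seen in which tissues and flag the ones missing from at least one tissue. It is only an illustration, not the method used in the published work. The function names, the (start, end) exon representation and the example coordinates are all made up for this sketch, and it assumes exon boundaries match exactly, which real EST alignments rarely do.

```python
from collections import defaultdict


def exon_usage_by_tissue(alignments):
    """Count how many ESTs from each tissue contain each exon.

    `alignments` is a list of (tissue, exons) pairs, where `exons` is a list of
    (start, end) genomic blocks for one aligned EST (hypothetical format).
    """
    usage = defaultdict(lambda: defaultdict(int))
    for tissue, exons in alignments:
        for exon in exons:
            usage[exon][tissue] += 1
    return usage


def tissue_restricted_exons(alignments):
    """Return exons that were not observed in every tissue sampled."""
    alignments = list(alignments)
    tissues = {tissue for tissue, _ in alignments}
    usage = exon_usage_by_tissue(alignments)
    return {
        exon: sorted(counts)  # tissues in which this exon was seen
        for exon, counts in usage.items()
        if len(counts) < len(tissues)
    }


# Toy data (invented coordinates): the middle exon is skipped in heart ESTs.
example = [
    ("brain", [(100, 200), (300, 400), (500, 600)]),
    ("heart", [(100, 200), (500, 600)]),
    ("heart", [(100, 200), (500, 600)]),
]
restricted = tissue_restricted_exons(example)
```

With the toy data, only the exon at (300, 400) is reported, and only for brain. A real analysis would also need to cope with partial ESTs, fuzzy boundaries and uneven sampling between tissues.
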
Integrating proteomics data with genome annotation: Genome sequences require annotating.
Although we can generate them at amazing rates, we need to know amongst all the ACGTs where the
genes are, where the exons are, and in particular which ones code for protein. One way to do this is
to use mass spectrometry data to “map back” to the genome sequence to either validate gene
predictions, suggest novel gene structure, or even find completely new genes. This process is termed
“proteogenomics” and we have run a few projects along these lines, not least for the whipworm
Trichuris muris, which is the mouse version of a human nematode parasite that infects millions of
people worldwide. The project involves mapping the mass spec data on to databases derived from the
genome and transcriptome to reveal the underlying gene structure.
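
The “map back” idea can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not the pipeline used in the projects: it assumes the peptides have already been identified from the mass spec data, that they do not span an intron, and that the standard genetic code applies. The function names and the example sequence are invented for the sketch. It translates a DNA sequence in all six reading frames and reports where a peptide matches exactly.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map (strand, offset) to the protein string for each of the six frames."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[(strand, offset)] = translate(seq[offset:])
    return frames


def map_peptide(peptide, dna):
    """Return (strand, offset, protein_index) for every exact peptide match."""
    hits = []
    for (strand, offset), protein in six_frame_translations(dna).items():
        start = protein.find(peptide)
        while start != -1:
            hits.append((strand, offset, start))
            start = protein.find(peptide, start + 1)
    return hits


# Invented example: a short sequence encoding M-A-K followed by a stop codon.
forward_hits = map_peptide("MAK", "ATGGCCAAGTAA")
reverse_hits = map_peptide("LLGH", "ATGGCCAAGTAA")
```

Here `forward_hits` locates the peptide in the first forward frame, and `reverse_hits` shows a match on the opposite strand. In practice the peptides are searched against much larger genome or transcriptome derived databases with dedicated search tools, and peptides that cross exon boundaries need spliced models rather than a plain six-frame translation.
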