Bioinformatics Learning Framework
Revision of the CourseSource Framework with respect to the NIBLSE Core Competencies 3/23/21 v3 (FINAL).
Updates approved by previous authors Sally Elgin and Lonnie Welch as well as the NIBLSE Leadership Team including Adam Kleinschmit, Bill Morgan, Mark
Pauley, Anne Rosenwald, Bill Tapprich, and Eric Triplett.
Learning Goal: How are biological data types, structure, and reproducibility described and managed?
Core Competency: C8. Describe and manage biological data types, structure, and reproducibility.

Topic: DNA - Information Storage [GENOMICS]
Learning Goal: Where are data about the genome (e.g., nucleotide sequences, epigenomics) found and how are they stored, retrieved, and organized?
Core Competency: C5. Find, retrieve, and organize various types of biological data.
Learning Goal: How can bioinformatics tools be employed to examine complex biological problems in information storage and flow?
Core Competency: C4. Use bioinformatics tools to examine complex biological problems in evolution, information flow, and other important areas of biology.

Topic: RNA - Information Transfer [TRANSCRIPTOMICS]
Learning Goal: Where are data about the transcriptome (e.g., expression, epigenomics, and structure) found and how are they stored, retrieved, and organized?
Core Competency: C5. Find, retrieve, and organize various types of biological data.
Learning Goal: How can bioinformatics tools be employed to examine complex biological problems in information storage and flow?
Core Competency: C4. Use bioinformatics tools to examine complex biological problems in evolution, information flow, and other important areas of biology.

Topic: Protein - Information in Action [PROTEOMICS]
Learning Goal: Where are data about the proteome (e.g., amino acid sequence and structure) found and how are they stored, retrieved, and organized?
Core Competency: C5. Find, retrieve, and organize various types of biological data.
Learning Goal: How can bioinformatics tools be employed to examine complex biological problems in protein structure and function?
Core Competency: C4. Use bioinformatics tools to examine complex biological problems in evolution, information flow, and other important areas of biology.

Topic: Small Molecules - Cellular Homeostasis [METABOLOMICS & SYSTEMS BIOLOGY]
Learning Goal: Where are data about metabolomics and systems biology found and how are they stored, retrieved, and organized?
Core Competency: C5. Find, retrieve, and organize various types of biological data.
Learning Goal: How can bioinformatics tools be employed to examine complex biological problems in the flow of molecules within pathways?
Core Competency: C4. Use bioinformatics tools to examine complex biological problems in evolution, information flow, and other important areas of biology.

Topic: Ecology and Evolution [METAGENOMICS]
Learning Goal: How can bioinformatics tools be employed to examine complex biological problems in ecology and evolution?
Core Competency: C4. Use bioinformatics tools to examine complex biological problems in evolution, information flow, and other important areas of biology.

Topic: Computational Skills
Learning Goal: How do biologists employ software development as part of the scientific discovery process? How do they use bioinformatics models to explore biological interactions, networks, and data integration?
Core Competency: C6. Explore and/or model biological interactions, networks and data integration using bioinformatics.
Learning Goal: What higher-level computational skills (e.g., command-line tools and scripting) can be used in bioinformatics research?
Core Competency: C7. Use command-line bioinformatics tools and write simple computer scripts.

Topic: Societal Impacts of Bioinformatics
Learning Goal: What are the ethical, legal, medical, and social implications of biological data and how are such implications interpreted?
Core Competency: C9. Interpret the ethical, legal, medical, and social implications of biological data.

Topic Learning Goals Sample Learning Objectives
Topic: Computation in the Life Sciences

Learning Goal: What is the role of computation and data mining in addressing hypothesis-driven and hypothesis-generating questions within the life sciences?
• Describe the role of bioinformatics in the scientific research method.
• Explain the necessity for computation in life sciences research.
• Explain the role of wet-bench techniques in verifying computational results in life science research.
• Compare and contrast computer-based research with wet-lab research.
• Read a scientific article and evaluate how bioinformatics methods were employed by the authors to explore a particular hypothesis.
• Given a scientific question, develop a hypothesis and define computational approaches that could be used to explore the hypothesis.

Learning Goal: What computational concepts (e.g., algorithms and database relations) are important in bioinformatics? What are their applications in the life sciences?
• Define the term algorithm.
• Explain the difference between a heuristic (approximate) algorithm and an exact algorithm.
• Describe the three basic programming structures: sequential, repetition (e.g., while, for) and selection (e.g., if).
• Use variables and data structures (e.g., lists, arrays, scalars, hash functions).
• Describe what a regular expression is.
• Explain the concept of cloud computing.
• Describe the importance of “big data” in bioinformatics.
• Describe the means by which “big data” are managed and stored (e.g., dmptool.org).

Learning Goal: What statistical concepts are important in bioinformatics and how are they applied?
• Calculate average, median, mode, range, and standard deviation for a given data set.
• Calculate p-values using a t-test for discrete data.
• Calculate p-values using a z-test for continuous data.
• Calculate an e-value statistic.
• Describe the importance of statistical analysis of big data sets.
• Create a network to illustrate gene interactions.

Learning Goal: How are biological data types, structure, and reproducibility described and managed?
• Describe common types of biological data (e.g., genomic, transcriptomic, proteomic) used in bioinformatics.
• Explain how the gene features in General Feature Format (GFF) files represent genes in a visual format.
• Identify strengths and weaknesses of different biological data formats (e.g., FASTA, BAM, BED, VCF).
• Describe approaches to improve the reliability of data collection and to ensure reproducible data analysis.
• Compare reproducibility of biological and technical replicate data (e.g., transcriptomic data) using statistical tests (Spearman rank test and false discovery calculations).

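As an illustration of the descriptive-statistics objective in this topic (average, median, mode, range, standard deviation), the following is a minimal sketch using only Python's standard `statistics` module. The `summarize` helper and the read counts are hypothetical and are not part of the framework; the sample (n - 1) standard deviation is one reasonable choice among several.

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        # Sample standard deviation (divides by n - 1).
        "stdev": statistics.stdev(values),
    }


# Hypothetical read counts for one gene across six samples.
counts = [12, 15, 15, 18, 20, 22]
summary = summarize(counts)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A student could swap in `statistics.pstdev` if the data are treated as a whole population rather than a sample.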
Topic: DNA - Information Storage [GENOMICS]

Learning Goal: Where are data about the genome (e.g., nucleotide sequences, epigenomics) found and how are they stored, retrieved, and organized?
• Describe how nucleotide sequence data are represented (FASTA, FASTQ, GenBank).
• Describe the nucleotide databases available at NCBI.
• Describe how the NCBI nucleotide databases intersect with other nucleotide databases (EBI, DDBJ, UniProt, etc.).
• Compare and contrast the data contained in different nucleotide databases.
• Search for a sequence record in a nucleotide database with a given accession number.
• Create a collection of nucleotide sequence records that meet a specified criterion (e.g., gene name or symbol).
• Determine the DNA methylation state of a particular region of a genome.
• Describe the types of metadata that accompany sequence data to make for useful biological interpretation (e.g., biological source, accession number, GeneID, journal articles, etc.).
• Utilize variant databases such as ClinVar and gnomAD to assess frequency and clinical implications of genomic variants (human-specific).
• Use a genome browser to visualize genomic features (UCSC Genome Browser, NCBI, Ensembl, etc.).

Learning Goal: How can bioinformatics tools be employed to examine complex biological problems in information storage and flow?
• Calculate the alignment score between two DNA sequences using a provided scoring matrix.
• Perform a BLASTN search and interpret the results.
• Explain the BLASTN algorithm for nucleotide sequence information.
• Interpret the biological significance of an e-value.
• Annotate a prokaryotic gene (derive a model).
• Annotate a eukaryotic gene (derive a model).
• Create and interpret a multiple sequence alignment (e.g., T-COFFEE, MUSCLE, etc.).
• For a genomic region of interest (e.g., the neighborhood of a particular gene), use a genome browser to view nearby genes, transcription factor binding regions, epigenetic information, etc.
• Describe Hidden Markov Models and how they can be used to assess motifs.

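The objective of calculating an alignment score from a provided scoring matrix can be sketched as follows. This is a minimal illustration, not a prescribed method: the scoring values (+1 match, -1 mismatch, -2 per gap position), the helper names, and the two sequences are all assumptions chosen for the example, and the sequences are taken to be already aligned.

```python
# Hypothetical scoring scheme: +1 match, -1 mismatch, -2 per gap position.
GAP_PENALTY = -2


def build_matrix(match=1, mismatch=-1, alphabet="ACGT"):
    """Build a simple nucleotide scoring matrix as a dict keyed by base pairs."""
    return {
        (a, b): (match if a == b else mismatch)
        for a in alphabet
        for b in alphabet
    }


def alignment_score(seq1, seq2, matrix, gap=GAP_PENALTY):
    """Score two pre-aligned sequences of equal length ('-' marks a gap)."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == "-" or b == "-":
            score += gap
        else:
            score += matrix[(a, b)]
    return score


matrix = build_matrix()
score = alignment_score("ACGT-A", "ACCTGA", matrix)
print(score)
```

Here four matches, one mismatch, and one gap give 4 - 1 - 2 = 1. A class exercise would typically replace `build_matrix` with whatever matrix the instructor provides.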
Topic: RNA - Information Transfer [TRANSCRIPTOMICS]

Learning Goal: Where are data about the transcriptome (e.g., expression, epigenomics, and structure) found and how are they stored, retrieved, and organized?
• Identify the euchromatin/heterochromatin boundaries, histone states, and nucleosome modifications in a given sequence.
• Compare and contrast DNA structure at telomeres and centromeres.
• Describe the RNA databases available at NCBI (e.g., ESTs, UniGene).
• Describe the types of metadata that accompany sequence data to make for useful biological interpretation (e.g., biological source, accession number, GeneID, journal articles, etc.).

Learning Goal: How can bioinformatics tools be employed to examine complex biological problems in information storage and flow?
• Given a microarray or RNA-seq data file, find the set of significantly differentially expressed genes.
• Perform motif discovery on the promoter regions of a set of genes identified by a ChIP-seq experiment.
• Use RNA structure prediction programs (e.g., RNASoft, RNAFold, RNAStructure) to evaluate possible structures for an RNA sequence.
• Identify the different splice isoforms possible from a given gene sequence.

Topic: Protein - Information in Action [PROTEOMICS]

Learning Goal: Where are data about the proteome (e.g., amino acid sequence and structure) found and how are they stored, retrieved, and organized?
• Describe how protein sequence data are represented (e.g., FASTA, GenBank, etc.).
• Describe the different protein databases available at NCBI (sequence, structure, function).
• Describe how the NCBI databases intersect with other databases (e.g., EBI, DDBJ, UniProt, etc.).
• Compare and contrast data contained in different databases.
• Search for a protein record in a database with a given accession number.
• Create a collection of records that meet a specified criterion (e.g., gene name or symbol).
• Describe the types of metadata that accompany sequence, structure, and function data to make for useful biological interpretation (e.g., biological source, accession number, UniProt number, journal articles, etc.).

Learning Goal: How can bioinformatics tools be employed to examine complex biological problems in protein structure and function?
• Explain the BLASTP, BLASTX, tBLASTn, and tBLASTx algorithms for protein sequence information.
• Interpret the biological significance of an e-value.
• Use variant prediction algorithms to determine the effect of a variant on protein structure and function (e.g., VEP, PolyPhen, SIFT, CADD, etc.).
• Describe Hidden Markov Models and how they can be used to assess motifs.
• Query a dataset with a specific protein sequence to learn about potential functions (e.g., Pfam, CDD, SwissProt, UniProt, etc.).
• View and interpret the structure output from Protein Data Bank (e.g., Cn3D, Jmol, etc.).
• Propose potential functions for a given protein structure.
• Explain the outputs from protein-folding algorithms to predict structure from sequence.
• Understand how protein structures are determined (e.g., NMR, crystallography).
• Compare and contrast the output from 2-D gel experiments.
• Analyze the output from mass spectrometry analysis (e.g., use the MASCOT package).

Topic: Small Molecules - Cellular Homeostasis [METABOLOMICS & SYSTEMS BIOLOGY]

Learning Goal: Where are data about metabolomics and systems biology found and how are they stored, retrieved, and organized?
• Describe how metabolomics data are represented (e.g., Human Metabolome Database, METLIN Database, etc.).
• Describe the different metabolomic databases that are available.
• Describe the types of metadata that accompany metabolomic data to make for useful biological interpretation.

Learning Goal: How can bioinformatics tools be employed to examine complex biological problems in the flow of molecules within pathways?
• Perform a GO analysis to identify the pathways relevant to a set of genes (e.g., identified by a transcriptomic study or a proteomic experiment).
• Use the KEGG pathway database to look up the interaction network of a pathway.
• Interpret the data from experiments (e.g., mass spectrometry, nuclear magnetic resonance, etc.) to determine levels of small molecule metabolites.

Topic: Ecology and Evolution [METAGENOMICS]

Learning Goal: How can bioinformatics tools be employed to examine complex biological problems in ecology and evolution?
• Create and interpret a multiple sequence alignment (e.g., T-COFFEE, MUSCLE, etc.).
• Describe the components of a phylogenetic tree (e.g., root, node, leaf).
• Explain the various types of phylogenetic trees (e.g., rooted, unrooted).
• Interpret a phylogenetic tree (e.g., which organism is most closely related to a given organism in the tree).
• Sketch a phylogenetic tree from its Newick representation.
• Use bootstrapping to assess the quality of a phylogenetic tree.
• Create a phylogenetic tree for a set of related sequences (nucleotide or amino acid) (e.g., MEGA).
• Use pre-existing tools to analyze a metagenomic data set to determine the set of organisms present in a metagenomic sample (e.g., 16S rRNA, Greengenes, mothur, etc.).

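To make the "sketch a phylogenetic tree from its Newick representation" objective concrete, here is a small, hedged sketch of a recursive Newick reader that prints the tree as an indented outline. It assumes a simplified Newick dialect (no quoted labels or comments; branch lengths are read and ignored), and the function names, taxa, and branch lengths are invented for illustration.

```python
def parse_newick(text):
    """Parse a simplified Newick string into nested (name, children) tuples."""
    text = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while pos < len(text) and text[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            pos += 1  # consume ")"
        # Read an optional node label.
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos].strip()
        # Skip an optional branch length such as ":0.1".
        if pos < len(text) and text[pos] == ":":
            pos += 1
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
        return (name, children)

    return node()


def outline(tree, depth=0):
    """Return the tree as indented lines; unnamed internal nodes print as '*'."""
    name, children = tree
    lines = ["  " * depth + (name or "*")]
    for child in children:
        lines.extend(outline(child, depth + 1))
    return lines


# Hypothetical three-taxon tree with made-up branch lengths.
tree = parse_newick("((human:0.1,chimp:0.1):0.2,mouse:0.5);")
print("\n".join(outline(tree)))
```

The outline shows human and chimp grouped under one internal node with mouse as their sister, which is the same grouping a hand-drawn sketch would convey.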
Topic: Computational Skills

Learning Goal: How do biologists employ software development as part of the scientific discovery process? How do they use bioinformatics models to explore biological interactions, networks, and data integration?
• Write a script to calculate the reverse complement of a nucleotide sequence.
• Write a script to determine reading frames of a nucleotide sequence.
• Write a script to calculate the melting point of double-stranded DNA.
• Write a script to retrieve the promoter regions for a set of related genes.
• Write a script to find the longest open reading frame in a given nucleotide sequence.
• Write a script to convert an RNA sequence to cDNA and to amino acids.
• Write a script to calculate molecular weight and isoelectric point.
• Write a script to count amino acid frequency.
• Write a script that compares the relative hydrophilicity/hydrophobicity of two protein sequences.

Learning Goal: What higher-level computational skills (e.g., command-line tools and scripting) can be used in bioinformatics research?
• Use a spreadsheet to perform simple data analysis.
• Use a spreadsheet to open, read, parse, modify and output comma-separated (.csv) files that will be ready to use in subsequent tools.
• Perform elementary statistical analysis on an “omics” dataset (e.g., using Excel or Weka).
• Perform Input/Output with data files.
• Interact with remote servers.
• Construct a bioinformatics pipeline.
• Use open-source libraries and packages (e.g., BioPerl, Biopython, R, Bioconductor).
• Use programs at the Unix/Linux command line to analyze bioinformatics data.
• Use graph theory to represent data networks.

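Several of the scripting objectives in this topic (reverse complement, reading frames, longest open reading frame, melting point) fit in a few lines of plain Python. The sketch below is one possible student solution under stated assumptions: an ORF is taken to run from ATG to the first in-frame stop codon, the melting point uses the Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a rough estimate for short oligonucleotides, and the example sequence is invented.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Complement each base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]


def reading_frames(seq):
    """Return six reading frames: three forward, three on the reverse strand."""
    rc = reverse_complement(seq)
    return [strand[offset:] for strand in (seq, rc) for offset in range(3)]


def longest_orf(seq):
    """Return the longest ATG-to-stop ORF (stop included) over all six frames."""
    best = ""
    for frame in reading_frames(seq.upper()):
        start = None
        for i in range(0, len(frame) - 2, 3):
            codon = frame[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = frame[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None
    return best


def wallace_tm(seq):
    """Rough melting temperature in degrees C (Wallace rule, short oligos only)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Hypothetical example sequence.
dna = "CCATGAAATGGTTTTAGCC"
print(reverse_complement(dna))
print(longest_orf(dna))
print(wallace_tm("ATGCATGC"))
```

Libraries named in the framework, such as Biopython, offer tested equivalents of these helpers; writing them by hand is mainly useful as a scripting exercise.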
Topic: Societal Impacts of Bioinformatics

Learning Goal: What are the ethical, legal, medical, and social implications of biological data and how are such implications interpreted?
• Explain the implications of obtaining your own genomic information, either through medical professionals or direct-to-consumer testing services.
• Discuss different perspectives about who should have access to genomic data and how it should be protected.
• Describe how the scientific community protects against the falsification or manipulation of large datasets, including genomic datasets.