The rapid evolution of DNA sequencing technologies over the past 20 years has made it possible to generate enormous amounts of data, which in turn has spurred the development of the computational tools needed to assemble complete genomes and to analyze genomic, transcriptomic and proteomic data. The GSC collaborates with and supports the wider research community. We have an extensive collection of software packages developed in-house, available for download here and through GitHub.
Software
ABySS
Assembly By Short Sequences - a de novo, parallel, paired-end sequence assembler
Adapter Trimming for Small RNA Sequencing
Removes the 3' adapter from Illumina small RNA sequencing reads when the read length is greater than the length of the RNA
ALEA
ALEA is a computational toolbox for allele-specific (AS) epigenomics analysis
Barnacle
A pipeline for detecting and characterizing chimeric transcripts from long RNA sequences
BioBloomTools
BioBloom Tools (BBT) is a fast, general-purpose sequence categorization tool that uses Bloom filters
btllib
A common code library with efficient code and wrappers for many common bioinformatics operations
ChopStitch
Exon annotation and splice graph reconstruction using transcriptome assembly and whole genome sequencing data
Circos
Visualize comparative genomic data such as alignments, conservation, homology, synteny and other positional n-tuples in an attractive and informative circular layout
DIDA
DIDA is a novel framework that performs large-scale alignment tasks by distributing the indexing and alignment stages into smaller subtasks over a cluster of compute nodes
DiscoverySpace
DiscoverySpace is a graphical software application that intends to free the biologist from the micro-level, syntactic detail of the underlying data structures to concentrate on the "big picture" and the meaning of experimental results
FindPeaks
FindPeaks was developed to perform analysis of ChIP-Seq experiments
GraphNER
GraphNER is a named entity recognizer that uses graph propagation and improves on the BANNER and BANNER-ChemDNER systems. Data are available for the gene mention detection task
HLAminer
Derivation of HLA class I and II predictions from shotgun sequence data sets
Internet Contig Explorer (iCE)
iCE is used for viewing fingerprint maps and associated data
KLEAT
c(K)LEavage site Analysis of Transcriptomes (KLEAT) identifies 3' UTR ends of transcripts in de novo RNA-Seq assemblies
Konnector
Connecting Paired-end Reads Using a Bloom Filter de Bruijn Graph
LaneRuler
LaneRuler will identify lanes in a gel image. The core module is a command line C program, whose result can be reviewed and corrected using a Java interface
LongStitch
LongStitch is a de novo genome assembly correction and scaffolding pipeline. LongStitch runs in up to three stages: an initial assembly correction stage (Tigmint-long), followed by two incremental scaffolding stages (ntLink and ARKS-long)
MAVIS
A Python command-line tool for the post-processing of structural variant calls
MSSS
Sampling with Minimum Sum of Squared Similarities for Nystrom-Based Large Scale Spectral Clustering
NanoSim
Nanopore sequence read simulator based on statistical characterization
ntCard
A streaming algorithm for cardinality estimation in genomics data
ntJoin
Fast and lightweight assembly-guided scaffolding using minimizer graphs
ntLink
ntLink is a lightweight de novo genome assembly scaffolder using long reads and minimizers
ntRoot
ntRoot is an alignment-free, computationally lightweight method for inferring human super-population-level global and local ancestry from whole genome assemblies or raw sequencing data
ntSynt
ntSynt detects multi-genome synteny blocks using minimizer graph mappings
PAVFinder
Post-Assembly Variant Finder (PAVFinder) - Structural variant caller on sequence assembly
PAVFinder_transcriptome
Structural and splice variant detection from transcriptome assembly
Physlr
Constructing a physical map from linked reads. The physical map can then be used to scaffold draft genome assemblies
Raw Quant
A Python package for extracting scan metadata and quantification values from Thermo .raw files
SAM - Sequence Assembly Manager
SAM is a Whole Genome Assembly (WGA) Management and Visualization Tool. It provides a generic platform for manipulating, analyzing and viewing WGA data, regardless of input type
Satellog
A database for the identification and prioritization of satellite repeats in disease association studies
Slider
Maximum use of probability information for alignment of short sequence reads and SNP detection
SNVMix
Detecting single nucleotide variants from next generation sequencing data
Spark
NOTE: This software is now being distributed via www.sparkinsight.org - please see that site for the latest release
THOR
THOR, the Targeted High-throughput Ortholog Reconstructor, is a Java application designed to assemble target genomic sequence orthologs in low-coverage genomes
Tigmint
Correct misassemblies in genome assembly drafts using linked or long DNA sequencing reads
TreeBuilder3D
TreeBuilder3D is an interactive viewer that allows the organization of SAGE and other types of gene expression data into hierarchical dendrograms, or phenetic networks
XMatchView
XMatchView is a Python application designed to visualize DNA sequence alignments