Biojava Tutorial
BioJava.org
About BioJava
The BioJava Project is an open-source project dedicated to providing Java tools for processing biological data. This includes objects for manipulating sequences, file parsers, CORBA interoperability, DAS, access to ACeDB, dynamic programming, and simple statistical routines, to name just a few. The BioJava library is useful for automating daily and mundane bioinformatics tasks. As the library matures, it will provide a foundation upon which both free software and commercial packages can be developed.

News
14th March: Take a look at BioJava in Anger, a new website with a collection of cookbook-style documentation for BioJava.
13th February: An experimental forum/blog has been set up. It may not always be there, or it may eventually migrate to another platform or host.
12th December: BioJava 1.22 released, a bugfix for SequenceIO round-tripping errors.
30th October: BioJava 1.21 released, including BioSQL support for general-purpose storage of sequence data in a relational database.
23rd August: BioJava 1.20 is ready, featuring improved sequence and blast parsers, faster dynamic programming routines, client libraries for DAS 1.0, and much more. Take a look at the change log or go straight to the download area.
6th February 2002: There is going to be a repeat of last year's successful `BioJava boot camp' workshop, held at the Wellcome Trust Genome Campus near Cambridge.

Getting a copy of the project
BioJava is distributed under the LGPL. This means that you can use the libraries without your software being forced under either the LGPL or the GPL. The LGPL is not the GPL.
BioJava releases can be obtained by FTP from our download area. Instructions for installing the library, and for building source releases, can be found on the Getting started page.
You can also maintain an up-to-date view of BioJava with CVS. We provide an anonymous CVS server. If you wish to contribute your existing code or help maintain part of the BioJava code-base, we can supply you with a read/write account.

Documentation
The overall design ethos of BioJava is probably the best place to start, as it explains how BioJava is structured. The current JavaDoc API reference should provide up-to-date API documentation. Many of the demonstration programs have some description, both of their purpose and of how to run them. There is also a short tutorial on sequence handling in BioJava.

Thanks
This site could not exist without the donations of bandwidth and hardware from Genetics Institute, Inc. and the Compaq Bioinformatics Solutions Center. In addition, we would like to thank Chris Dagdigian for maintaining the bio* servers.
biojava API
Packages
org.biojava.bio org.biojava.bio.dist org.biojava.bio.dp org.biojava.bio.dp.onehead org.biojava.bio.dp.twohead org.biojava.bio.gui org.biojava.bio.gui.sequence org.biojava.bio.program org.biojava.bio.program.blast2html org.biojava.bio.program.das org.biojava.bio.program.gff org.biojava.bio.program.phred org.biojava.bio.program.sax org.biojava.bio.program.search org.biojava.bio.program.ssbind org.biojava.bio.program.xff org.biojava.bio.program.xml org.biojava.bio.proteomics org.biojava.bio.search org.biojava.bio.seq org.biojava.bio.seq.db org.biojava.bio.seq.db.biosql org.biojava.bio.seq.db.emblcd org.biojava.bio.seq.genomic org.biojava.bio.seq.homol org.biojava.bio.seq.impl org.biojava.bio.seq.io org.biojava.bio.seq.io.agave org.biojava.bio.seq.io.game org.biojava.bio.seq.projection org.biojava.bio.seq.ragbag org.biojava.bio.symbol org.biojava.stats.svm org.biojava.stats.svm.tools org.biojava.utils org.biojava.utils.cache org.biojava.utils.io org.biojava.utils.stax org.biojava.utils.xml
org.biojava.bio.seq.genomic: Interfaces for representing key features of genomes.
org.biojava.bio.seq.io: Classes and interfaces for processing and producing flat-file representations of sequences.
org.biojava.bio.symbol: Representation of the Symbols that make up a sequence, and locations within them.
org.biojava.bio.seq.db.emblcd: Readers for the EMBL CD-ROM format binary index files used by EMBOSS and Staden packages.
org.biojava.bio.seq.io.agave: Classes for converting between AGAVE XML and BioJava objects.
org.biojava.bio.seq.io.game: Event-driven parsing system for the Gene Annotation Markup Elements (GAME).
org.biojava.bio.seq.ragbag: The Ragbag package is a set of classes for setting up a virtual sequence contig without the need to write BioJava code.
All Classes
AbstractAlignmentStyler AbstractAlphabet AbstractBeadRenderer AbstractChangeable AbstractDistribution AbstractFeatureHolder AbstractLocation AbstractLocationDecorator AbstractMatrixPairDPCursor AbstractOrderNDistribution AbstractRangeLocation AbstractSequenceDB AbstractSVMClassifierModel AbstractSVMTarget AbstractSymbol AbstractSymbolList AbstractTrainer AcnumHitReader AcnumTrgReader ActivityListener Agave2AgaveAnnotFilter AGAVEAltIdsPropHandler AGAVEAnnotationsHandler AGAVEAnnotFilter AGAVEAnnotFilterFactory AGAVEAssemblyHandler AGAVEBioSeqCallbackItf AGAVEBioSeqHandler AGAVEBioSequenceHandler AGAVECallbackItf AGAVECdsHandler AGAVEChromosomeCallbackItf AGAVEChromosomeHandler AGAVEClassificationHandler AGAVECompResultHandler AGAVEComputationHandler AGAVEContigCallbackItf AGAVEContigHandler AGAVEDbId AGAVEDbIdCallbackItf AGAVEDbIdPropCallbackItf AGAVEDbIdPropHandler AGAVEDescPropHandler AGAVEElementIdPropHandler AGAVEEvidenceCallbackItf AGAVEEvidenceHandler AGAVEExonsPropHandler AGAVEFeatureCallbackItf AGAVEFragmentOrderHandler AGAVEFragmentOrientationHandler AGAVEGeneHandler AGAVEHandler AGAVEIdAlias AGAVEIdAliasCallbackItf AGAVEIdAliasPropHandler AGAVEKeywordPropHandler
External Tools
org.biojava.bio.program: Java wrappers for interacting with external bioinformatics tools.
org.biojava.bio.program.blast2html: Code for generating HTML reports from blast output.
org.biojava.bio.program.gff: GFF manipulation.
org.biojava.bio.program.phred: Parser for Phred output.
org.biojava.bio.program.sax: Parsers which offer XML representations of the output from common bioinformatics tools.
org.biojava.bio.program.search: Interfaces and classes for parsing the results of external search programs.
org.biojava.bio.program.ssbind: Creation of SeqSimilaritySearchResult objects from SAX events using the BioJava BlastLikeDataSetCollection DTD.
org.biojava.bio.program.xml: Utility classes for the org.biojava.bio.program.sax package.
Developers' Packages
org.biojava.bio.seq.impl: Standard in-memory implementations of Sequence and Feature.
org.biojava.bio.seq.projection: Code for projecting Feature objects into alternate coordinate systems.
org.biojava.utils: Miscellaneous utility classes used by other BioJava components.
org.biojava.utils.cache: A simple cache system with pluggable caching behaviours.
org.biojava.utils.io: I/O utility classes.
org.biojava.utils.stax: The Stack API for XML (StAX).
org.biojava.utils.xml: This package contains a number of utilities for processing XML documents.
Experimental/New Packages
org.biojava.bio.proteomics: Utilities for analysing protein sequences.
org.biojava.bio.seq.homol: Feature interfaces for representing regions of homology between sequences.
AGAVEMapLocation AGAVEMapLocationPropHandler AGAVEMapPosition AGAVEMapPositionPropHandler AGAVEMatchAlignPropHandler AGAVEMatchDescPropHandler AGAVEMatchRegion AGAVEMatchRegionPropHandler AGAVEMrnaHandler AGAVENotePropHandler AGAVEPredictedProteinHandler AGAVEProperty AGAVEQualifierPropHandler AGAVEQueryRegion AGAVEQueryRegionPropHandler AGAVERelatedAnnot AGAVERelatedAnnotPropHandler AGAVEResultGroupHandler AGAVEResultPropertyPropHandler AGAVESciPropertyPropHandler AGAVESeqFeatureHandler AGAVESeqLocationPropHandler AGAVESeqMapHandler AGAVESeqPropHandler AGAVETranscriptHandler AGAVEUnorderedFragmentsHandler AGAVEViewPropHandler AgaveWriter AGAVEXref AGAVEXrefCallbackItf AGAVEXrefPropHandler AGAVEXrefPropPropHandler AGAVEXrefs AGAVEXrefsPropHandler Alignment AlignmentFormat AlignmentHandler AlignmentMarker AlignmentRenderer Alphabet AlphabetIndex AlphabetManager AlphabetResolver Annotatable Annotatable.AnnotationForwarder AnnotatedSequenceDB Annotation Annotation.EmptyAnnotation AnnotationFactory AppBeanRunner AppEntry AppException AssembledSymbolList AtomicSymbol BackMatrixPairDPCursor BackPointer
BarLogoPainter BaseXMLWriter BasicFeatureRenderer BasisSymbol BaumWelchSampler BaumWelchTrainer BetweenLocation BioError BioException BioRuntimeException BioSQLSequenceDB Blast2HTMLHandler BlastDBQueryHandler BlastLikeHomologyBuilder BlastLikeSAXParser BlastLikeSearchBuilder BlastLikeToXMLConverter BlockPainter BooleanElementHandlerBase BumpedRenderer ByteElementHandlerBase Cache CacheMap CacheReference CachingKernel CachingSequenceDB Cell CellCalculator CellCalculatorFactory CellCalculatorFactoryMaker Changeable ChangeableCache ChangeAdapter ChangeEvent ChangeForwarder ChangeListener ChangeListener.AlwaysVetoListener ChangeListener.LoggingListener ChangeSupport ChangeType ChangeVetoException CharacterTokenization CharElementHandlerBase ChunkedSymbolListBuilder CircularLocation CircularSequence CircularView ClassifierExample ClassifierExample.PointClassifier Classify ClustalWAlignmentSAXParser ColourCommand ComponentFeature ComponentFeature.Template ComponentFeatureHandler CompoundLocation
http://www.biojava.org/docs/api/ (4 di 14) [02/04/2003 13.39.41]
Count CrosshairRenderer CrossProductTokenization DAS DASGFFFeatureHandler DASLink DASOptimizableFeatureHolder DASSequence DASSequenceDB DatabaseIdHandler DatabaseURLGenerator DataSetHandler DataSource DefaultURLGeneratorFactory DelegationManager DelegationManager DiagonalAddKernel DiagonalCachingKernel Digest Distribution Distribution.NullModelForwarder DistributionFactory DistributionFactory.DefaultDistributionFactory DistributionLogo DistributionTools DistributionTrainer DistributionTrainerContext DivisionLkpReader DNAStyle DNATools DotState DoubleAlphabet DoubleAlphabet.DoubleSymbol DoubleElementHandlerBase DoubleTokenization DP DP.ReverseIterator DPCompiler DPFactory DPFactory.DefaultFactory DPInterpreter DPInterpreter.Maker DPMatrix DummySymbolList EbiDatabaseURLGenerator Edit ElementRecognizer ElementRecognizer ElementRecognizer ElementRecognizer.AllElementRecognizer ElementRecognizer.AllElementRecognizer ElementRecognizer.AllElementRecognizer ElementRecognizer.ByLocalName ElementRecognizer.ByLocalName ElementRecognizer.ByLocalName ElementRecognizer.ByNSName
ElementRecognizer.ByNSName ElementRecognizer.ByNSName ElementRecognizer.HasAttribute ElementRecognizer.HasAttribute ElementRecognizer.HasAttribute EllipticalBeadRenderer Embl2AgaveAnnotFilter EmblCDROMIndexReader EmblCDROMIndexStore EmblCDROMRandomAccess EmblFileFormer EmblLikeFormat EmblLikeLocationParser EmblProcessor EmblProcessor.Factory EmissionCache EmissionState EntryNamIdxReader EntryNamRandomAccess Exon Exon.Template FastaDescriptionLineParser FastaDescriptionLineParser.Factory FastaFormat FastaSearchBuilder FastaSearchParser FastaSearchSAXParser FastaSequenceSAXParser Feature Feature.ByEmblOrderComparator Feature.ByLocationComparator Feature.Template FeatureBlockSequenceRenderer FeatureFilter FeatureFilter.AcceptAllFilter FeatureFilter.AcceptNoneFilter FeatureFilter.And FeatureFilter.AndNot FeatureFilter.ByAncestor FeatureFilter.ByAnnotation FeatureFilter.ByClass FeatureFilter.ByParent FeatureFilter.BySource FeatureFilter.ByType FeatureFilter.ContainedByLocation FeatureFilter.FrameFilter FeatureFilter.HasAnnotation FeatureFilter.Not FeatureFilter.Or FeatureFilter.OverlapsLocation FeatureFilter.StrandFilter FeatureHandler FeatureHolder FeatureHolder.EmptyFeatureHolder FeatureImpl FeatureLabelRenderer
FeatureLabelRenderer.LabelMaker FeatureRealizer FeatureRenderer FeatureTableParser FilteringRenderer FilterUtils FiniteAlphabet FixedSizeCache FixedSizeMap FloatElementHandlerBase Frame FramedFeature FramedFeature.ReadingFrame FramedFeature.Template FundamentalAtomicSymbol FuzzyLocation FuzzyLocation.RangeResolver FuzzyPointLocation FuzzyPointLocation.PointResolver GAMEAnnotationHandler GAMEAspectPropHandler GAMEDbxrefPropHandler GAMEDescriptionPropHandler GAMEFeatureCallbackItf GAMEFeatureSetHandler GAMEFeatureSetPropHandler GAMEFeatureSpanHandler GAMEGenePropHandler GAMEHandler GAMEMapPosPropHandler GAMENameCallbackItf GAMENamePropHandler GAMEResiduesPropHandler GAMESeqPropHandler GAMESeqRelPropHandler GAMESpanPropHandler GAMETranscriptCallbackItf GAMETypePropHandler GapDistribution GappedPhredSequence GappedSymbolList GenbankFileFormer GenbankFormat GenbankProcessor GenbankProcessor.Factory Gene Gene.Template GeneticCodes GFFDocumentHandler GFFEntrySet GFFErrorHandler GFFErrorHandler.AbortErrorHandler GFFErrorHandler.SkipRecordErrorHandler GFFFilterer GFFParser GFFRecord
GFFRecordFilter GFFRecordFilter.AcceptAll GFFRecordFilter.FeatureFilter GFFRecordFilter.SequenceFilter GFFRecordFilter.SourceFilter GFFWriter HashSequenceDB HitDescHandler HitIdHandler Homology HomologyDB HomologyFeature HomologyFeature.Template HTMLRenderer IDMaker IDMaker.ByName IDMaker.ByURN IgnoreCountsTrainer IgnoreRecordException IllegalAlphabetException IllegalIDException IllegalSymbolException IllegalTransitionException Index IndexedCount IndexedSequenceDB IndexStore Initializable IntegerAlphabet IntegerAlphabet.IntegerSymbol IntegerTokenization IntElementHandlerBase ItemValue LabelRenderer LabelRenderer.RenderNothing LayeredRenderer LazyFeatureHolder LightPairDPCursor LinearKernel LineInfo ListSumKernel ListTools ListTools.Doublet ListTools.Triplet ListWrapper Location Location.EmptyLocation Location.LocationComparator LocationHandlerBase LocationTools LogoContext LogoPainter LongElementHandlerBase MagicalState MarkovModel MarkovModel.DistributionForwarder
MassCalc MatrixPairDPCursor Meme MergeAnnotation MergeFeatureHolder ModelInState ModelTrainer MSFAlignmentFormat MultiLineRenderer NameTokenization NcbiDatabaseURLGenerator NestedError NestedException NestedKernel NestedRuntimeException NormalizingKernel ObjectUtil OptimizableFilter OrderNDistribution OrderNDistributionFactory OverlayAnnotation OverlayMap OverlayMarker OverlayRendererWrapper PaddingRenderer PairDistribution PairDPCursor PairDPMatrix PairwiseDiagonalRenderer PairwiseDP PairwiseFilteringRenderer PairwiseOverlayRenderer PairwiseRenderContext PairwiseSequencePanel PairwiseSequenceRenderer PairwiseSequenceRenderer.PairwiseRendererForwarder ParseErrorEvent ParseErrorListener ParseErrorSource ParseException ParserException PdbSAXParser PdbToXMLConverter PhredFormat PhredSequence PhredTools PlainBlock PlainStyle PointLocation PolynomialKernel PrimaryTranscript PrimaryTranscript.Template ProfileHMM ProjectedFeatureHolder Projection ProjectionContext
ProjectionEngine ProjectionEngine.Instantiator ProjectionEngine.InstantiatorImpl PropDetailHandler Protease ProteinRefSeqFileFormer ProteinRefSeqProcessor ProteinRefSeqProcessor.Factory ProteinTools Qualitative QueryableSequenceDB QueryIdHandler RadialBaseKernel RagbagAssembly RagbagComponentDirectory RagbagComponentDirectory.EmptyComponentDirectory RagbagFeatureTypeCatcher RagbagFileParserFactory RagbagFilteredCachedSeqFactory RagbagFilterFactory RagbagFixedCacheSeqFactory RagbagHashedComponentDirectory RagbagIdleSequenceBuilder RagbagMap RagbagSequenceFactory RagbagSoftRefSeqFactory RagbagUncachedSeqFactory RandomAccessReader RangeLocation RealizingFeatureHolder RectangularBeadRenderer ReferenceServer RelabeledAlignment RemoteFeature RemoteFeature.Region RemoteFeature.Resolver RemoteFeature.Template ResourceEntityResolver ReversibleTranslationTable RNAFeature RNAFeature.Template RNATools RoundRectangularBeadRenderer RulerRenderer SAX2StAXAdaptor SAX2StAXAdaptor ScoreType ScoreType.NullModel ScoreType.Odds ScoreType.Probability SearchBuilder SearchContentHandler SearchParser SearchReader SeqFileFormer SeqFileFormerFactory
SeqIOAdapter SeqIOEventEmitter SeqIOFilter SeqIOListener SeqIOTools SeqSimilarityAdapter SeqSimilaritySearcher SeqSimilaritySearchHit SeqSimilaritySearchHit.ByScoreComparator SeqSimilaritySearchHit.BySubHitCountComparator SeqSimilaritySearchResult SeqSimilaritySearchSubHit SeqSimilaritySearchSubHit.ByScoreComparator SeqSimilaritySearchSubHit.BySubjectStartComparator Sequence SequenceAlignmentSAXParser SequenceAnnotator SequenceBuilder SequenceBuilderBase SequenceBuilderFactory SequenceBuilderFilter SequenceContentHandlerBase SequenceDB SequenceDBInstallation SequenceDBLite SequenceDBSearchHit SequenceDBSearchResult SequenceDBSearchSubHit SequenceDBWrapper SequenceFactory SequenceFormat SequenceHandler SequenceIterator SequencePanel SequencePoster SequenceRenderContext SequenceRenderContext.Border SequenceRenderer SequenceRenderer.RendererForwarder SequenceRendererWrapper SequencesAsGFF SequenceViewerEvent SequenceViewerListener SequenceViewerMotionListener SequenceViewerMotionSupport SequenceViewerSupport SigmoidKernel SimilarityPairBuilder SimilarityPairFeature SimilarityPairFeature.EmptyPairwiseAlignment SimilarityPairFeature.Template SimpleAlignment SimpleAlignmentStyler SimpleAlphabet SimpleAnnotation SimpleAnnotFilter
SimpleAssembly SimpleAssemblyBuilder SimpleAtomicSymbol SimpleDistribution SimpleDistributionTrainer SimpleDistributionTrainerContext SimpleDotState SimpleEmissionState SimpleExon SimpleFeature SimpleFeatureHolder SimpleFeatureRealizer SimpleFramedFeature SimpleGene SimpleGFFRecord SimpleHomology SimpleHomologyFeature SimpleIndex SimpleItemValue SimpleLabelRenderer SimpleMarkovModel SimpleModelInState SimpleModelTrainer SimplePrimaryTranscript SimpleRemoteFeature SimpleRemoteFeature.DBResolver SimpleReversibleTranslationTable SimpleRNAFeature SimpleSeqSimilaritySearchHit SimpleSeqSimilaritySearchResult SimpleSeqSimilaritySearchSubHit SimpleSequence SimpleSequenceBuilder SimpleSequenceDBInstallation SimpleSequenceFactory SimpleSimilarityPairFeature SimpleSpliceVariant SimpleStatePath SimpleStrandedFeature SimpleSVMClassifierModel SimpleSVMTarget SimpleSymbolList SimpleSymbolPropertyTable SimpleSymbolStyle SimpleTranslatedRegion SimpleTranslationTable SimpleWeightMatrix SimpleXMLEmitter SingleDP SingleDPMatrix SingletonAlphabet SingletonList SixFrameRenderer SixFrameZiggyRenderer SmallAnnotation SmallMap
SMORegressionTrainer SMOTrainer SoftReferenceCache SparseVector SparseVector.NormalizingKernel SpliceVariant SpliceVariant.Template SSPropHandlerFactory StackedLogoPainter State StatePath StaticMemberPlaceHolder StAXContentHandler StAXContentHandler StAXContentHandlerBase StAXContentHandlerBase StAXFeatureHandler StAXFeatureHandler StAXHandlerFactory StAXHandlerFactory StAXPropertyHandler StAXPropertyHandler StoppingCriteria StopRenderer StrandedFeature StrandedFeature.Strand StrandedFeature.Template StrandedFeatureHandler StreamParser StreamReader StreamWriter StringElementHandlerBase SubHitSummaryHandler SubPairwiseRenderContext SubSequence SubSequenceDB SubSequenceRenderContext SuffixTree SuffixTree.SuffixNode SuffixTreeKernel SuffixTreeKernel.DepthScaler SuffixTreeKernel.MultipleScalar SuffixTreeKernel.NullModelScaler SuffixTreeKernel.SelectionScalar SuffixTreeKernel.UniformScaler SVM_Light SVM_Light.LabelledVector SVMClassifierModel SVMKernel SVMRegressionModel SVMTarget SwissprotFileFormer SwissprotProcessor SwissprotProcessor.Factory Symbol SymbolList
SymbolList.EmptySymbolList SymbolListViews SymbolPropertyTable SymbolReader SymbolSequenceRenderer SymbolStyle SymbolTokenization SymbolTokenization.TokenType TabIndexStore TextBlock TextLogoPainter TickFeatureRenderer Train Trainable TrainerTransition TrainingAlgorithm TrainingContext TrainingEvent TrainingListener TrainRegression Transition TransitionTrainer TranslatedDistribution TranslatedRegion TranslatedRegion.Template TranslatedSequencePanel TranslationTable TypedProperties TypesListener UniformDistribution URLGeneratorFactory UtilHelper ViewingSequenceDB ViewSequence WeakCacheMap WebSequenceDB WeightMatrix WeightMatrixAnnotator WMAsMM WordTokenization XFFFeatureSetHandler XFFPartHandlerFactory XMLBeans XMLDispatcher XmlMarkovModel XMLPeerBuilder XMLPeerFactory ZiggyFeatureRenderer
A SymbolList can be stored as a list of references to singleton objects.

Actually, it is possible in principle to store a DNA sequence (without gaps or ambiguous residues) using only two bits per residue. Since the BioJava SymbolList is an interface, it only defines how the sequence should be accessed, not how data is stored. If space is important, it is possible to write a `packed' implementation of SymbolList. Client code need never worry about the underlying data model.

BioJava's object-oriented view of sequences brings other advantages. Many programs which analyse DNA sequences need simultaneous access to the original sequence and to its complementary strand. In BioJava this is easy.
http://www.biojava.org/tutorials/chap1.html (2 di 5) [02/04/2003 13.39.42]
    SymbolList forward = getSequence();
    SymbolList backward = DNATools.reverseComplement(forward);
    System.out.println("First base: " + forward.symbolAt(1).getName());
    System.out.println("Complement: " + backward.symbolAt(backward.length()).getName());

Since the reverse complement of a DNA sequence is a simple programmatic transformation, BioJava doesn't need to physically store the sequence in memory at all. Instead, it just creates a special implementation of the SymbolList interface, which computes the reverse strand sequence on the fly. This will typically cost just a few bytes of memory regardless of the sequence length, compared to megabytes for a string representation of a typical genome sequence.
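The two ideas above (two-bit packed storage, and a reverse complement computed on the fly) can be sketched without BioJava at all. The following is only an illustration under stated assumptions: DnaView, PackedDna, ReverseComplementView and DnaViewDemo are hypothetical names invented for this sketch and are not BioJava classes, and the sketch handles only the four unambiguous bases. Positions are 1-based, as in SymbolList.

```java
/** Minimal read-only DNA view with 1-based positions (hypothetical, not a BioJava type). */
interface DnaView {
    int length();
    char baseAt(int pos); // pos runs from 1 to length()
}

/** Stores each unambiguous base in two bits: A=0, C=1, G=2, T=3. */
final class PackedDna implements DnaView {
    private static final String BASES = "ACGT";
    private final byte[] bits; // four bases per byte
    private final int length;

    PackedDna(String seq) {
        length = seq.length();
        bits = new byte[(length + 3) / 4];
        for (int i = 0; i < length; i++) {
            int code = BASES.indexOf(Character.toUpperCase(seq.charAt(i)));
            if (code < 0) {
                throw new IllegalArgumentException("Not an unambiguous base: " + seq.charAt(i));
            }
            bits[i / 4] |= (byte) (code << ((i % 4) * 2));
        }
    }

    public int length() { return length; }

    public char baseAt(int pos) {
        if (pos < 1 || pos > length) throw new IndexOutOfBoundsException("pos=" + pos);
        int i = pos - 1;
        int code = (bits[i / 4] >> ((i % 4) * 2)) & 3;
        return BASES.charAt(code);
    }

    int storageBytes() { return bits.length; }
}

/** Computes the reverse complement on demand; holds only a reference to the source. */
final class ReverseComplementView implements DnaView {
    private final DnaView source;

    ReverseComplementView(DnaView source) { this.source = source; }

    public int length() { return source.length(); }

    public char baseAt(int pos) {
        // Position pos on the reverse strand mirrors length - pos + 1 on the source.
        char c = source.baseAt(source.length() - pos + 1);
        switch (c) {
            case 'A': return 'T';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'T': return 'A';
            default: throw new IllegalStateException("Unexpected base: " + c);
        }
    }
}

final class DnaViewDemo {
    static String asString(DnaView v) {
        StringBuilder sb = new StringBuilder();
        for (int pos = 1; pos <= v.length(); pos++) sb.append(v.baseAt(pos));
        return sb.toString();
    }

    public static void main(String[] args) {
        PackedDna forward = new PackedDna("GATTACA");
        DnaView backward = new ReverseComplementView(forward);
        System.out.println(asString(forward) + " -> " + asString(backward));
        System.out.println("bytes used: " + forward.storageBytes());
    }
}
```

Because client code sees only the DnaView interface, it cannot tell whether it is reading packed storage or a computed view. That is the same separation of access from storage that the SymbolList interface provides.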
A simple example
The following program is a very simple example, which reads one or more DNA sequences from a FASTA format data file and reports the GC content of each. This example is a (very) simple application of the BioJava Sequence I/O framework, described in later chapters. Used as below, it allows you to iterate over all the sequences in a multiple-entry file, rather than holding all of them in memory at once.

    import java.io.*;

    import org.biojava.bio.symbol.*;
    import org.biojava.bio.seq.*;
    import org.biojava.bio.seq.io.*;

    public class GCContent {
        public static void main(String[] args) throws Exception {
            if (args.length != 1)
                throw new Exception("usage: java GCContent filename.fa");
            String fileName = args[0];

            // Set up sequence iterator
            BufferedReader br = new BufferedReader(new FileReader(fileName));
            SequenceIterator stream = SeqIOTools.readFastaDNA(br);

            // Iterate over all sequences in the stream
            while (stream.hasNext()) {
                Sequence seq = stream.nextSequence();
                int gc = 0;
                for (int pos = 1; pos <= seq.length(); ++pos) {
                    Symbol sym = seq.symbolAt(pos);
                    if (sym == DNATools.g() || sym == DNATools.c())
                        ++gc;
                }
                System.out.println(seq.getName() + ": " +
                                   ((gc * 100.0) / seq.length()) + "%");
            }
        }
    }
Ambiguous symbols
Sometimes, it is useful to represent sequences which are not perfectly defined. In such cases, it is common to use ambiguous symbols. A common example is the 'N' character in DNA sequences, which is used to indicate parts of a sequence where the sequencing traces were difficult to interpret. Sometimes, runs of Ns are also used to indicate gaps in assemblies. In the case of DNA, additional ambiguity symbols have been defined, covering all possible combinations of the four bases. For instance, the symbol 'W' really means (A or T).
Within the BioJava object model, it is possible to inspect any ambiguous symbol to determine the set of atomic symbols which it matches, using the getMatches method. Atomic symbols can be considered to be the special case where getMatches returns a set whose size is exactly one. As a convenience, atomic symbols also implement the AtomicSymbol interface.
You might want to modify the GCContent program, above, so as to ignore any ambiguous symbols in the input sequence.
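As a rough illustration of the exercise suggested above, here is a stand-alone sketch that works on plain characters rather than BioJava Symbol objects. IupacDna and its methods are hypothetical names made up for this example. The table holds the standard IUPAC nucleotide codes, and a symbol is treated as atomic when it matches exactly one base, mirroring the getMatches description.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-alone sketch of ambiguity handling on characters (not the BioJava Symbol model). */
final class IupacDna {
    private static final Map<Character, String> MATCHES = new HashMap<>();
    static {
        String[] table = {
            "A=A", "C=C", "G=G", "T=T",
            "R=AG", "Y=CT", "S=CG", "W=AT", "K=GT", "M=AC",
            "B=CGT", "D=AGT", "H=ACT", "V=ACG", "N=ACGT"
        };
        for (String entry : table) {
            MATCHES.put(entry.charAt(0), entry.substring(2));
        }
    }

    /** Returns the bases that a (possibly ambiguous) symbol can stand for. */
    static String matches(char symbol) {
        String m = MATCHES.get(Character.toUpperCase(symbol));
        if (m == null) throw new IllegalArgumentException("Unknown DNA symbol: " + symbol);
        return m;
    }

    /** A symbol is atomic when it matches exactly one base. */
    static boolean isAtomic(char symbol) {
        return matches(symbol).length() == 1;
    }

    /** GC percentage computed over atomic symbols only; NaN if there are none. */
    static double gcPercent(String seq) {
        int gc = 0;
        int atomic = 0;
        for (char c : seq.toCharArray()) {
            if (!isAtomic(c)) continue; // skip N, W, and the other ambiguity codes
            atomic++;
            char u = Character.toUpperCase(c);
            if (u == 'G' || u == 'C') gc++;
        }
        return atomic == 0 ? Double.NaN : gc * 100.0 / atomic;
    }
}
```

Dividing by the number of atomic symbols rather than by the sequence length is a design choice of this sketch: it stops long runs of Ns from dragging the reported GC content towards zero.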
BioJava sequences tutorial 0.3 by Thomas Down. Please mail any comments or suggestions to the author or to the biojava-l mailing list.
A tour of a Sequence
Sequence is a sub-interface of SymbolList. Thus, all the standard methods for accessing sequence data in a SymbolList can equally be applied to a Sequence, and Sequences can be passed to any analysis methods which normally expect to receive a SymbolList. The Sequence interface adds two types of additional data to a SymbolList:
- Global annotations, such as names, database identifiers, and literature references
- Location-specific annotations (features)
Two pieces of global annotation information are considered to be sufficiently important that they have dedicated accessor methods. The name of the Sequence is a simple string description of the Sequence: normally the name or accession number of the Sequence in the database from which it is retrieved. The getURN method, on the other hand, should return a more structured identifier for the sequence, represented as a Uniform Resource Identifier (URI), e.g.:
- urn:sequence/embl:AL121903
- file:///home/thomas/genome.fasta|rpoN
- http://adzel.casseiopeia.org/seqs/myseqs.fasta|seq0001
- acedb://humace.sanger.ac.uk/DNA/AL121903
URNs are a special class of URIs which represent global names for `well known' resources. Note that, despite the method name, it may not be appropriate to give an actual URN for sequences. However, for sequences from databases such as EMBL, where many sites have local installations, use of URNs is encouraged.
The exact use of the name and URN properties is currently dependent to some extent on how the sequence was loaded. As BioJava enters more common use, more formal definitions of these properties will emerge.
Other annotations
In addition to the two `identifier' properties of the Sequence, it may have other annotation data associated with it. BioJava contains an Annotation interface, which represents a set of key-value pairs, a little like a Java Map (indeed, Annotation has an asMap method).

    Sequence seq = getSequence();
    Annotation seqAn = seq.getAnnotation();
    for (Iterator i = seqAn.keys().iterator(); i.hasNext(); ) {
        Object key = i.next();
        Object value = seqAn.getProperty(key);
        System.out.println(key.toString() + ": " + value.toString());
    }

Annotation objects aren't just used in Sequences -- many other BioJava objects, including Features, can also have annotations associated with them.
Currently, there are no specific conventions for the kind of data which might be found in an Annotation. In general, the keys should be strings (although there is no requirement that this be the case), but the values may be any Java object. More guidelines for the contents of Annotation objects may be introduced as BioJava develops.
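To make the `key-value pairs, a little like a Java Map' description concrete, the sketch below shows one way such a container could be written using only the standard library. AnnotationSketch is a hypothetical class for illustration, not the BioJava Annotation interface. Its method names follow the ones mentioned in the text, but throwing an exception for a missing key is an assumption of this sketch.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

/** Stand-alone key/value container in the spirit of an annotation (illustrative only). */
final class AnnotationSketch {
    // Keys are usually strings, but any object is accepted, as are any values.
    private final Map<Object, Object> properties = new LinkedHashMap<>();

    void setProperty(Object key, Object value) {
        properties.put(key, value);
    }

    /** Assumption of this sketch: asking for a missing key is an error. */
    Object getProperty(Object key) {
        if (!properties.containsKey(key)) {
            throw new NoSuchElementException("No property " + key);
        }
        return properties.get(key);
    }

    Set<Object> keys() {
        return Collections.unmodifiableSet(properties.keySet());
    }

    /** Read-only Map view over the same data. */
    Map<Object, Object> asMap() {
        return Collections.unmodifiableMap(properties);
    }

    /** One "key: value" line per property, in insertion order. */
    String describe() {
        StringBuilder sb = new StringBuilder();
        for (Object key : keys()) {
            sb.append(key).append(": ").append(getProperty(key)).append('\n');
        }
        return sb.toString();
    }
}
```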
recursive method below will print a simple text representation of a tree of features:

    public void printFeatures(FeatureHolder fh, PrintWriter pw, String prefix) {
        for (Iterator i = fh.features(); i.hasNext(); ) {
            Feature f = (Feature) i.next();
            pw.print(prefix);
            pw.print(f.getType());
            pw.print(" at ");
            pw.print(f.getLocation().toString());
            pw.println();
            printFeatures(f, pw, prefix + " ");
        }
    }

All Feature implementations include two methods which indicate how the feature fits into a feature tree. getParent returns the FeatureHolder (Sequence or Feature) which is the feature's immediate parent, while getSequence returns the Sequence object which is the root of the tree. Feature objects are always associated with a specific sequence, and always have exactly one parent FeatureHolder.
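The parent and root relationships described above can be mimicked in a few lines of plain Java. In this sketch FeatureNode is a hypothetical stand-in that plays the role of both Sequence (the root) and Feature (every other node). It is not BioJava code, and getRoot corresponds only loosely to getSequence.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-alone tree node illustrating parent/root navigation (not a BioJava class). */
final class FeatureNode {
    private final String type;
    private final int start;
    private final int end;
    private final FeatureNode parent; // null only for the root
    private final List<FeatureNode> children = new ArrayList<>();

    /** Creates a root node, playing the part of the sequence. */
    FeatureNode(String type, int start, int end) {
        this(type, start, end, null);
    }

    private FeatureNode(String type, int start, int end, FeatureNode parent) {
        this.type = type;
        this.start = start;
        this.end = end;
        this.parent = parent;
    }

    /** Children can only be created through their parent, so each has exactly one. */
    FeatureNode createChild(String type, int start, int end) {
        FeatureNode child = new FeatureNode(type, start, end, this);
        children.add(child);
        return child;
    }

    FeatureNode getParent() { return parent; }

    /** Walks up the parent links until the root is reached. */
    FeatureNode getRoot() {
        FeatureNode node = this;
        while (node.parent != null) node = node.parent;
        return node;
    }

    /** Appends one line per descendant, indenting each level by two spaces. */
    void printFeatures(StringBuilder out, String prefix) {
        for (FeatureNode child : children) {
            out.append(prefix).append(child.type).append(" at ")
               .append(child.start).append("..").append(child.end).append('\n');
            child.printFeatures(out, prefix + "  ");
        }
    }
}
```

Creating children only through createChild is what guarantees, in this sketch, that every non-root node has exactly one parent.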
    template.type = "TestFeature";
    template.source = "Test";
    template.location = new RangeLocation(100, 200);
    template.annotation = Annotation.EMPTY_ANNOTATION;
    mySequence.createFeature(template);

Every sub-interface of Feature should have a nested class, also named Template, which extends Feature.Template and adds any extra fields needed to construct that specialized kind of feature.
BioJava sequences tutorial 0.1 by Thomas Down. Please mail any comments or suggestions to the author or to the biojava-l mailing list.
database. We want to allow you to create any kind of Sequence object from a given data stream.
Pluggable filters
Not all users will wish to exactly reflect the contents of a sequence file as a Sequence object. Sometimes it is useful to select specific pieces of data from a file, or to change it into some other format. For instance, BioJava has a hierarchical model for features attached to a sequence, whereas many file formats (for instance, EMBL) do not. You might wish to rebuild some kind of feature hierarchy from an EMBL flatfile during the parsing process.
SequenceBuilders
The sequence input framework is based around the SequenceBuilder interface (this is actually a sub-interface of SeqIOListener, but for these purposes you will usually be using SequenceBuilder). The role of a SequenceBuilder is to accumulate information discovered while parsing a sequence file, and ultimately to construct a Sequence object. There are two kinds of SequenceBuilder implementation.
Builders
These actually construct new Sequence objects. Generally, there will just be one Builder implementation for each Sequence implementation. The basic BioJava library provides one Builder implementation, SimpleSequenceBuilder, which constructs simple in-memory representations for any kind of sequence data.
Filters
These don't construct Sequence objects themselves, but are chained to another SequenceBuilder. When they are notified of data, they perform some processing, then pass the information on to the next SequenceBuilder in the chain.
Whenever a SequenceBuilder is required, you can either simply provide a `Builder' implementation, or you can create a chain consisting of one or more `Filters', leading ultimately to a `Builder'.
A SequenceBuilder object should only be used once. If multiple sequences are being read from a stream, a new SequenceBuilder (or chain) should be constructed for each one. For convenience, we provide a SequenceBuilderFactory interface, whose sole purpose is to encapsulate the construction of SequenceBuilders. Each SequenceBuilder implementation should provide a suitable factory implementation as well. For `Builder' implementations, it is usually possible to provide a `singleton' factory object. For SimpleSequenceBuilder this is the static field SimpleSequenceBuilder.FACTORY. For filters, the factory must be parameterized with another SequenceBuilderFactory so that a complete chain can be constructed. For instance:

    SequenceBuilderFactory mySBF =
        new EmblProcessor.Factory(SimpleSequenceBuilder.FACTORY);

Authors of new SequenceBuilder implementations are encouraged to consider this naming style when implementing SequenceBuilderFactory.
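The builder and filter arrangement described above is an instance of a chain of event listeners with matching factories. The sketch below reproduces the shape of that design in plain Java so that it can be run without BioJava. RecordBuilder, RecordBuilderFactory, SimpleRecordBuilder and UpperCaseFilter are hypothetical names, and the two event methods are a deliberately tiny stand-in for the real SequenceBuilder interface.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Receives parse events and finally yields a record (stand-in for a sequence builder). */
interface RecordBuilder {
    void addProperty(String key, String value);
    void addSymbols(String symbols);
    Map<String, String> build();
}

/** Encapsulates construction, since each builder should be used only once. */
interface RecordBuilderFactory {
    RecordBuilder makeBuilder();
}

/** A `Builder': accumulates events and constructs the result. */
final class SimpleRecordBuilder implements RecordBuilder {
    /** Singleton factory, in the style of a static FACTORY field. */
    static final RecordBuilderFactory FACTORY = SimpleRecordBuilder::new;

    private final Map<String, String> properties = new LinkedHashMap<>();
    private final StringBuilder symbols = new StringBuilder();

    public void addProperty(String key, String value) { properties.put(key, value); }

    public void addSymbols(String s) { symbols.append(s); }

    public Map<String, String> build() {
        Map<String, String> record = new LinkedHashMap<>(properties);
        record.put("symbols", symbols.toString());
        return record;
    }
}

/** A `Filter': processes events, then passes them to the next builder in the chain. */
final class UpperCaseFilter implements RecordBuilder {
    private final RecordBuilder next;

    UpperCaseFilter(RecordBuilder next) { this.next = next; }

    public void addProperty(String key, String value) { next.addProperty(key, value); }

    public void addSymbols(String s) { next.addSymbols(s.toUpperCase(Locale.ROOT)); }

    public Map<String, String> build() { return next.build(); }

    /** Filter factories are parameterized with the factory for the rest of the chain. */
    static final class Factory implements RecordBuilderFactory {
        private final RecordBuilderFactory delegate;

        Factory(RecordBuilderFactory delegate) { this.delegate = delegate; }

        public RecordBuilder makeBuilder() {
            return new UpperCaseFilter(delegate.makeBuilder());
        }
    }
}
```

A chain is then assembled with `new UpperCaseFilter.Factory(SimpleRecordBuilder.FACTORY)`, and each call to makeBuilder() yields a fresh filter wrapped around a fresh builder, which is why a parser can ask for one chain per sequence.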
    import java.io.*;
    import javax.servlet.*;
    import javax.servlet.http.*;

    import org.biojava.bio.*;
    import org.biojava.bio.symbol.*;
    import org.biojava.bio.seq.*;
    import org.biojava.bio.seq.io.*;
    import org.biojava.bio.seq.db.*;

    public class SequenceServlet extends HttpServlet {
        private SequenceDB indexedDB;     // Database to serve
        private SequenceFormat seqFormat; // Used for writing

        public void init(ServletConfig config) throws ServletException {
            super.init(config);
            String dbName = config.getInitParameter("sequence.db");
            if (dbName == null)
                throw new ServletException("Database not specified");
            try {
                TabIndexStore index = TabIndexStore.open(dbName);
                indexedDB = new IndexedSequenceDB(index);
            } catch (Exception ex) {
                log("Can't open sequence database: " + dbName, ex);
                throw new ServletException("Can't open sequence database: " + dbName);
            }
            seqFormat = new FastaFormat();
        }

        public void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException
        {
            String id = req.getParameter("id");
            if (id == null) {
                resp.sendError(HttpServletResponse.SC_NOT_FOUND, "No id parameter in request");
                return;
            }
            try {
                Sequence seq = indexedDB.getSequence(id);
                resp.setContentType("text/plain");
                PrintStream stream = new PrintStream(resp.getOutputStream());
                seqFormat.writeSequence(seq, stream);
            } catch (BioException ex) {
                log("Can't retrieve sequence", ex);
                resp.sendError(HttpServletResponse.SC_NOT_FOUND, "Couldn't load sequence " + id);
            }
        }
    }

BioJava sequences tutorial 0.3 by Thomas Down. Please mail any comments or suggestions to the author or to the biojava-l mailing list.
What is a ChangeEvent?
ChangeEvent extends java.util.EventObject and adds the methods:
- getChange: the new value
- getPrevious: the old value
- getType: the 'type' of event
- getChained: an event that caused this event to be fired

In contrast to the classical Java events model, one event class is shared among all types of BioJava events. The 'type' of the event is signaled by the value of the type property. ChangeType is a final class. Each interface that will fire ChangeEvents will have public static final ChangeType fields with descriptive names. ChangeType objects store a descriptive name but are always compared with the == operator. This scheme is a type-safe extension of the Swing PropertyChangeEvent system, but BioJava interfaces explicitly publish what types of event they may fire.
ChangeSupport is a utility class that handles 99% of the cases where you wish to implement the Changeable interface. Ideally, you should instantiate one of these objects and then delegate the listener methods to it. In addition to the methods in Changeable, ChangeSupport supplies the methods:
- firePreChangeEvent(ChangeEvent ce)
- firePostChangeEvent(ChangeEvent ce)

These methods invoke the preChange and postChange methods of the appropriate listeners. firePreChangeEvent will pass on any ChangeVetoExceptions that the listeners throw.

AbstractChangeable is an abstract implementation of Changeable that delegates to a ChangeSupport. In the cases where your class does not have to inherit from any class but must implement Changeable, this is a perfect base class. It will lazily instantiate the delegate only when listeners need to be registered.

In the next tutorial, we will implement an event source and add some listeners to it.

BioJava events tutorial 0.1 by Matthew Pocock. Please mail any comments or suggestions to the author or to the biojava-l mailing list.
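The two-phase notification described above (ask every listener first, change the state only if nobody objects, then announce the change) can be sketched without BioJava. The names VetoException, NameListener and VetoableName below are hypothetical stand-ins for ChangeVetoException, ChangeListener and a Changeable object; this is a minimal illustration of the idea, not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ChangeVetoException.
class VetoException extends Exception {
    VetoException(String message) { super(message); }
}

// Hypothetical stand-in for ChangeListener.
interface NameListener {
    void preChange(String oldName, String newName) throws VetoException;
    void postChange(String oldName, String newName);
}

class VetoableName {
    private final List<NameListener> listeners = new ArrayList<NameListener>();
    private String name;

    VetoableName(String name) { this.name = name; }

    String getName() { return name; }

    void addListener(NameListener listener) { listeners.add(listener); }

    void setName(String newName) throws VetoException {
        String oldName = name;
        // Phase 1: any listener may veto. The exception propagates to the
        // caller and the state below is never touched.
        for (NameListener listener : listeners) {
            listener.preChange(oldName, newName);
        }
        name = newName;
        // Phase 2: the change has happened, so tell everybody.
        for (NameListener listener : listeners) {
            listener.postChange(oldName, newName);
        }
    }
}

class VetoDemo {
    static String run() {
        VetoableName named = new VetoableName("alpha");
        named.addListener(new NameListener() {
            public void preChange(String oldName, String newName) throws VetoException {
                if (newName.isEmpty()) {
                    throw new VetoException("Empty names are not allowed");
                }
            }
            public void postChange(String oldName, String newName) { }
        });
        try {
            named.setName("beta"); // accepted
            named.setName("");     // vetoed, name stays "beta"
        } catch (VetoException ve) {
            // the second change was refused
        }
        return named.getName();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```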
Changeability examples
By Matthew Pocock
We are going to play with the Changeability code using the example of a GUI for viewing the rolls of a roulette wheel. We will try to estimate the probability of the ball falling on any one of the 40 slots and of it falling on red or black. Grab the source-code or run the applet directly. You may be unable to run the applet as it requires Java2. You should be able to view it in a Java2 appletviewer with no problems.
The imports
We will need to import some standard graphical packages to make the GUI, and java.util as it gives us stuff like iterators. From biojava, we will need org.biojava.utils for both the standard exceptions (NestedException and NestedError), and all of the Changeability API. The other biojava packages give us things like symbol objects, alphabets, annotations and probability distributions.

    import java.awt.*;
    import java.awt.event.*;
    import java.awt.geom.*;
    import java.util.*;
    import javax.swing.*;

    import org.biojava.utils.*;
    import org.biojava.bio.*;
    import org.biojava.bio.symbol.*;
    import org.biojava.bio.dist.*;
    // stuff to make the roulette wheel exist.
    static {
        final int numRolls = 40;

        // make the rolls alphabet
        rolls = new SimpleAlphabet("Rolls");
        allRolls = new Symbol[numRolls];

Having made the rolls alphabet, we now must populate it with each possible roulette wheel outcome - 1..40 - as a symbol instance.

        for(int i = 1; i <= numRolls; i++) {
            Symbol s = allRolls[i-1] = AlphabetManager.createSymbol(
                (char) (i + '0'), i + "", Annotation.EMPTY_ANNOTATION
            );

            // attempt to add the symbol
            // this should work, but we still have to catch the exceptions. Since they
            // should be impossible to throw, we re-throw them as assertion-failures.
            try {
                rolls.addSymbol(s);
            } catch (ChangeVetoException cve) {
                throw new NestedError(
                    cve, "Assertion Failure: Can't add symbol to the rolls alphabet"
                );
            } catch (IllegalSymbolException ise) {
                throw new NestedError(
                    ise, "Assertion Failure: Can't add symbol to the rolls alphabet"
                );
            }
        }

Notice that we have to catch exceptions that should be impossible to generate, but are specified in the API. Under different circumstances, these exceptions may be legitimately thrown, and we would have caught them and done something more sensible to handle the error.

        rolls.addChangeListener(ChangeListener.ALWAYS_VETO, Alphabet.SYMBOLS);

This is an example of using ALWAYS_VETO to prevent things from changing. Here we lock the SYMBOLS property of rolls so that no more symbol instances can be added or removed from the alphabet. This ensures data integrity and makes it harder to write syntactically correct bugs. We must now make the red/black alphabet.

        redBlack = new SimpleAlphabet("Red/Black");

        // the "red" symbol
        red = AlphabetManager.createSymbol(
            'r', "red", Annotation.EMPTY_ANNOTATION
        );
http://www.biojava.org/tutorials/events2.html (2 di 11) [02/04/2003 13.39.46]
        // the "black" symbol
        black = AlphabetManager.createSymbol(
            'b', "black", Annotation.EMPTY_ANNOTATION
        );

        // again, add them and throw any exceptions on as assertion-failures.
        try {
            redBlack.addSymbol(red);
            redBlack.addSymbol(black);
        } catch (ChangeVetoException cve) {
            throw new BioError(
                cve, "Assertion Failure: Can't add symbol to the red/black alphabet"
            );
        } catch (IllegalSymbolException ise) {
            throw new BioError(
                ise, "Assertion Failure: Can't add symbol to the red/black alphabet"
            );
        }

        // and again lock the alphabet
        redBlack.addChangeListener(ChangeListener.ALWAYS_VETO, Alphabet.SYMBOLS);

Notice that again while the symbols are added we must check that nothing goes wrong. Also, again, we lock the red/black alphabet so that it can't be tampered with. Now we will set up a probability distribution that can be sampled from to simulate the rolling of a roulette wheel. We will simply use an instance of UniformDistribution rather than generating a special distribution ourselves - casinos should have unbiased wheels.

        wheelRoler = new UniformDistribution(rolls);
    }

And there we close the static block. Everything is set up for a game of chance.
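To make the idea of sampling an unbiased wheel concrete, here is a small plain-Java sketch that stands in for the uniform distribution over the 40 slots. It uses java.util.Random rather than BioJava; the class WheelSketch and its methods are hypothetical names chosen for this illustration, and the fixed seed is only there to make runs repeatable.

```java
import java.util.Random;

class WheelSketch {
    static final int NUM_SLOTS = 40;

    // Draw one slot number in 1..NUM_SLOTS, each with equal probability.
    static int spin(Random rng) {
        return 1 + rng.nextInt(NUM_SLOTS);
    }

    // Spin the wheel repeatedly; counts[i] is how often slot i+1 came up.
    static int[] tally(int spins, long seed) {
        Random rng = new Random(seed);
        int[] counts = new int[NUM_SLOTS];
        for (int i = 0; i < spins; i++) {
            counts[spin(rng) - 1]++;
        }
        return counts;
    }

    // Turn raw counts into estimated probabilities (relative frequencies).
    static double[] estimate(int[] counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        double[] probs = new double[counts.length];
        if (total == 0) {
            return probs; // no observations yet: all estimates stay at zero
        }
        for (int i = 0; i < counts.length; i++) {
            probs[i] = counts[i] / (double) total;
        }
        return probs;
    }

    public static void main(String[] args) {
        double[] probs = estimate(tally(10000, 42L));
        System.out.println("Estimated P(slot 1) = " + probs[0]);
    }
}
```

With enough spins, every estimated probability should drift towards 1/40, which is what the applet's pie chart shows as the slices even out.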
        try {
            rollDist = DistributionFactory.DEFAULT.createDistribution(rolls);
        } catch (IllegalAlphabetException iae) {
            throw new NestedError(iae, "Could not create distribution");
        }
        redBlackDist = new RedBlackDist(rollDist);

Now we must make an object to estimate the rollDist probabilities. This is done using a DistributionTrainerContext instance called dtc. dtc will collate counts for each of the forty outcomes so that rollDist can then represent these frequencies as a probability distribution.

        final DistributionTrainerContext dtc = new SimpleDistributionTrainerContext();
        dtc.registerDistribution(rollDist);

Now we will create the thread that samples rolls from the roulette wheel. It will synchronize upon itself so that we can suspend it as we wish.

        countAdder = new Thread(new Runnable() {
            public void run() {
                while(true) {

We will check the value of the running member variable to see if we should be sampling the wheel.

                    boolean running;
                    synchronized(countAdder) {
                        running = Roulet.this.running;
                    }
                    if(running) {

Here we perform the sampling and inform the trainer of the roll. To force rollDist to reflect the new counts, we also call dtc.train, and catch all the resulting exceptions (which should be impossible if everything is set up correctly).

                        Symbol s = Roulet.wheelRoler.sampleSymbol();
                        try {
                            dtc.addCount(rollDist, s, 1.0);
                            dtc.train();
                        } catch (IllegalSymbolException ise) {
                            // should be impossible!
                            throw new NestedError(
                                ise, "Assertion Failure: Sampled symbol not in alphabet"
                            );
                        } catch (ChangeVetoException cve) {
                            cve.printStackTrace();
                        }

Now we will synchronize on the thread and sleep for half a second.

                        synchronized(countAdder) {
                            try {
                                countAdder.wait(500);
                            } catch (InterruptedException ie) {
                            }
                        }

This code handles the case when the sampling thread has been asked to stop running temporarily. Again, we must synchronize on the sampling thread.

                    } else {
                        synchronized(countAdder) {
                            try {
                                countAdder.wait();
                            } catch (InterruptedException ie) {
                            } catch (IllegalMonitorStateException imse) {
                                throw new NestedError(imse, "Ouch");
                            }
                        }
                    }
                }
            }
        });

That is the end of the sampling thread. Now we can move on to the GUI. Let's set up buttons to start and stop the sampler thread and to clear the counts so far.

        final JButton start = new JButton("Start");
        final JButton stop = new JButton("Stop");
        final JButton clear = new JButton("Clear");

The start button must start off enabled, and should cause sampling to start.

        start.setEnabled(true);
        start.addActionListener(new ActionListener() {
            public void actionPerformed(ActionEvent ae) {
                synchronized(countAdder) {
                    running = true;
                    start.setEnabled(false);
                    stop.setEnabled(true);
                    countAdder.notify();
                }
            }
        });

The stop button should start off disabled, and should cause the sampling to stop.

        stop.setEnabled(false);
        stop.addActionListener(new ActionListener() {
            public void actionPerformed(ActionEvent ae) {
                synchronized(countAdder) {
                    running = false;
                    start.setEnabled(true);
                    stop.setEnabled(false);
                    countAdder.notify();
                }
            }
        });

The clear button should be enabled, and should both clear the counts and suspend sampling.

        clear.setEnabled(true);
        clear.addActionListener(new ActionListener() {
            public void actionPerformed(ActionEvent ae) {
                synchronized(countAdder) {
                    running = false;
                    start.setEnabled(true);
                    stop.setEnabled(false);
                    dtc.clearCounts();
                    countAdder.notify();
                }
            }
        });

Now we should build the GUI components to render the probability distributions as pie-charts.

        Pie allPie;
        try {
            allPie = new Pie(rollDist, AlphabetManager.getAlphabetIndex(allRolls));
        } catch (IllegalSymbolException ise) {
            throw new NestedError(ise, "Assertion Failure: Can't make indexer");
        } catch (BioException be) {
            throw new NestedError(be, "Assertion Failure: Can't make indexer");
        }
        Pie redBlackPie = new Pie(redBlackDist);

Now, we add all of these components to the applet.

        getContentPane().setLayout(new BorderLayout());
        JPanel top = new JPanel();
        top.setLayout(new FlowLayout());
        top.add(start);
        top.add(stop);
        top.add(clear);
        getContentPane().add(top, BorderLayout.NORTH);
        JPanel center = new JPanel();
        center.setLayout(new FlowLayout());
        center.add(redBlackPie);
        center.add(allPie);
        Dimension d = new Dimension(200, 200);
        redBlackPie.setPreferredSize(d);
        allPie.setPreferredSize(d);
        getContentPane().add(center, BorderLayout.CENTER);
    }

This is the end of init. It has set up the state of the object, ready for it to render estimated probabilities of each wheel outcome being observed by repeatedly sampling the roulette wheel.
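The start/stop handshake between the buttons and the sampling thread boils down to a guarded wait on a shared monitor. The sketch below isolates that pattern in plain Java. PausableSampler is a hypothetical class written for this illustration, and it locks on a private object rather than on the thread itself, which is a design choice of the sketch and not what the applet does.

```java
class PausableSampler {
    private final Object lock = new Object();
    private boolean running = false;
    private int samples = 0;

    // Called from the GUI thread, e.g. by the Start button.
    void start() {
        synchronized (lock) {
            running = true;
            lock.notifyAll(); // wake the worker if it is parked in step()
        }
    }

    // Called from the GUI thread, e.g. by the Stop button.
    void stop() {
        synchronized (lock) {
            running = false;
            lock.notifyAll();
        }
    }

    int samples() {
        synchronized (lock) {
            return samples;
        }
    }

    // One iteration of the worker loop: block while paused, then record a sample.
    void step() throws InterruptedException {
        synchronized (lock) {
            while (!running) {
                lock.wait(); // releases the lock until start() notifies
            }
            samples++;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final PausableSampler sampler = new PausableSampler();
        Thread worker = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        sampler.step();
                        Thread.sleep(10);
                    }
                } catch (InterruptedException ie) {
                    // interrupted: let the worker thread finish
                }
            }
        });
        worker.setDaemon(true);
        worker.start();
        sampler.start();
        Thread.sleep(100);
        sampler.stop();
        System.out.println("Samples taken: " + sampler.samples());
    }
}
```

Checking the flag inside a while loop, rather than a single if, guards against spurious wake-ups and against a stop that arrives between the notification and the wake-up.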
    protected void paintComponent(Graphics g) {
        super.paintComponent(g);
        Graphics2D g2 = (Graphics2D) g;
        double pad = 5.0;
        Rectangle2D boundingBox = new Rectangle2D.Double(
            pad, pad,
            getWidth() - 2.0 * pad, getHeight() - 2.0 * pad
        );
        double midx = getWidth() * 0.5;
        double midy = getHeight() * 0.5;

Now we can render each slice of the pie-chart, using an angular extent proportional to the probability of each symbol, skipping each zero probability.

        double angle = 0.0;
        for(int i = 0; i < indexer.getAlphabet().size(); i++) {
            try {
                Symbol s = indexer.symbolForIndex(i);
                double p = dist.getWeight(s);
                if(p != 0.0) {
                    // a full circle is 360 degrees
                    double extent = p * 360.0;
                    Arc2D slice = new Arc2D.Double(boundingBox, angle, extent, Arc2D.PIE);
                    char token = s.getToken();
                    if(s == Roulet.red) {
                        g2.setPaint(Color.red);
                    } else if(s == Roulet.black) {
                        g2.setPaint(Color.black);
                    } else if( ((token - '0') % 2) == 0) {
                        g2.setPaint(Color.red);
                    } else {
                        g2.setPaint(Color.black);
                    }
                    g2.fill(slice);
                    g2.setPaint(Color.blue);
                    g2.draw(slice);
                    angle += extent;
                }
            } catch (IllegalSymbolException ise) {
                ise.printStackTrace();
            }
        }

The last task is to render some labels so that we know what each slice represents.

        angle = 0.0;
        g2.setPaint(Color.yellow);
        for(int i = 0; i < indexer.getAlphabet().size(); i++) {
            try {
                Symbol s = indexer.symbolForIndex(i);
                double p = dist.getWeight(s);
                if(p != 0.0) {
                    // a full circle is 360 degrees
                    double extent = p * 360.0;
                    double a2 = Math.toRadians(angle + 0.5 * extent);
                    g2.drawString(
                        s.getName(),
                        (float) (midx + Math.cos(a2) * midx * 0.8),
                        (float) (midy - Math.sin(a2) * midy * 0.8)
                    );
                    angle += extent;
                }
            } catch (IllegalSymbolException ise) {
                ise.printStackTrace();
            }
        }
    }
    }

That is the end of the pie-chart class.
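The geometry used by the pie chart can be checked in isolation. The following sketch assumes a full circle of 360 degrees and reuses the 0.8 label radius factor from the drawing code; the class PieAngles and its method names are hypothetical and exist only for this illustration.

```java
class PieAngles {
    // Angular extent in degrees of each slice; a zero weight gives a zero-width slice.
    static double[] extents(double[] weights) {
        double[] out = new double[weights.length];
        for (int i = 0; i < weights.length; i++) {
            out[i] = weights[i] * 360.0;
        }
        return out;
    }

    // Position of a label placed at the mid-angle of a slice, 80% of the way
    // from the centre (midx, midy) to the edge. The y term is subtracted
    // because screen coordinates grow downwards.
    static double[] labelPoint(double startDeg, double extentDeg, double midx, double midy) {
        double a = Math.toRadians(startDeg + 0.5 * extentDeg);
        return new double[] {
            midx + Math.cos(a) * midx * 0.8,
            midy - Math.sin(a) * midy * 0.8
        };
    }

    public static void main(String[] args) {
        double[] e = extents(new double[] { 0.25, 0.75 });
        System.out.println(e[0] + " and " + e[1] + " degrees");
    }
}
```

For example, a slice that starts at 0 degrees and spans 180 degrees has its label at the 90 degree mark, which in a 200 by 200 panel is directly above the centre.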
that is a special instance that passes on changes to one object as knock-on events to another. By using the ChangeEvent constructor that includes a ChangeEvent, we can pass on the complete chain-of-evidence that allows listeners to work out why we are claiming to alter.

            protected ChangeEvent generateEvent(ChangeEvent ce) {
                return new ChangeEvent(
                    getSource(), Distribution.WEIGHTS, null, null, ce
                );
            }
        }, Distribution.WEIGHTS);

We must also add a listener to ourselves to trap successful attempts to change (those that are not vetoed), and to update the values of red and black.

        addChangeListener(propUpdater = new ChangeAdapter() {
            public void postChange(ChangeEvent ce) {
                red = 0.0;
                black = 0.0;
                for(
                    Iterator i = ((FiniteAlphabet) (parent.getAlphabet())).iterator();
                    i.hasNext();
                ) {
                    Symbol s = (Symbol) i.next();
                    try {
                        if( (s.getToken() - '0') % 2 == 0) { // even - red
                            red += parent.getWeight(s);
                        } else { // odd - black
                            black += parent.getWeight(s);
                        }
                    } catch (IllegalSymbolException ise) {
                        throw new NestedError(ise, "Assertion Failure: Can't find symbol");
                    }
                }
            }
        }, Distribution.WEIGHTS);
    }

And that is the end of the constructor. Now we must provide the missing methods in AbstractDistribution. These are fairly boring. Our alphabet is the same as the Roulet redBlack object, and getWeightImpl will return the value of red for the red symbol and the value of black for the black symbol.

    public Alphabet getAlphabet() {
        return Roulet.redBlack;
    }

    protected double getWeightImpl(AtomicSymbol sym) throws IllegalSymbolException {
        if(sym == Roulet.red) {
            return red;
        } else if(sym == Roulet.black) {
            return black;
        } else {
            throw new IllegalSymbolException("No symbol known for " + sym);
        }
    }

All of these methods are just stubs. Notice that they throw ChangeVetoExceptions to indicate that they are not implemented. ChangeVetoException can either mean that the change is disallowed because some listener explicitly stops it, or that the method is not supported. Either way, the state of the object will not be updated.

    protected void setWeightImpl(AtomicSymbol as, double weight)
        throws ChangeVetoException, IllegalSymbolException
    {
        throw new ChangeVetoException("RedBlackDist is immutable");
    }

    protected void setNullModelImpl(Distribution nullModel)
        throws ChangeVetoException, IllegalAlphabetException
    {
        throw new ChangeVetoException("RedBlackDist is immutable");
    }

    public Distribution getNullModel() {
        if(nullModel == null) {
            nullModel = new RedBlackDist(parent.getNullModel());
        }
        return nullModel;
    }
    }
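The arithmetic behind the derived red/black distribution is simply a sum over the parent weights, split by the parity of the slot number. A minimal plain-Java sketch of that aggregation follows; RedBlackSketch is a hypothetical helper for illustration, with weights[i] assumed to hold the probability of slot i+1.

```java
class RedBlackSketch {
    // Returns { red, black }, where red sums the even-numbered slots and
    // black sums the odd-numbered slots, as in the postChange listener.
    static double[] redBlack(double[] weights) {
        double red = 0.0;
        double black = 0.0;
        for (int i = 0; i < weights.length; i++) {
            int slot = i + 1;
            if (slot % 2 == 0) {
                red += weights[i];
            } else {
                black += weights[i];
            }
        }
        return new double[] { red, black };
    }

    public static void main(String[] args) {
        double[] rb = redBlack(new double[] { 0.1, 0.2, 0.3, 0.4 });
        System.out.println("red=" + rb[0] + " black=" + rb[1]);
    }
}
```

Because the derived values are recomputed from the parent each time it changes, the red/black view never needs mutators of its own, which is why the stubs above can safely refuse all changes.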
Implementing Changeability
By Matthew Pocock
We are going to implement a simple ChangeEvent source that stores a String name property and can inform other objects if this name changes. By the end of this tutorial you should be comfortable with the general issues surrounding implementing event sources and with ensuring that resources are allocated only as needed.
            synchronized(cs) {

Next, we make a new ChangeEvent to describe how the object wishes to alter, we fire a preChange notification to the listeners so that they have a chance to veto the change, we make the change, and lastly we inform the listeners that the change has been made.

                ChangeEvent ce = new ChangeEvent(this, Nameable.NAME, name, this.name);
                cs.firePreChangeEvent(ce);
                this.name = name;
                cs.firePostChangeEvent(ce);
            }
        }
    }
    }

That is the end of the implementation.
    }

Some subclasses may wish to override this method and lazily instantiate resources when the first listener for a particular ChangeType is added. In this case, the overridden method should first call super.getChangeSupport and then perform any checks it wishes. Now that the protected methods are in place, we can provide the bodies of the listener management methods. These first use getChangeSupport to retrieve the delegate, and then ask it to add or remove a listener. We must synchronize on the delegate to make sure that it remains in a consistent state.

    public void addChangeListener(ChangeListener cl) {
        ChangeSupport cs = getChangeSupport(null);
        synchronized(cs) {
            cs.addChangeListener(cl);
        }
    }

    public void addChangeListener(ChangeListener cl, ChangeType ct) {
        ChangeSupport cs = getChangeSupport(ct);
        synchronized(cs) {
            cs.addChangeListener(cl, ct);
        }
    }

    public void removeChangeListener(ChangeListener cl) {
        ChangeSupport cs = getChangeSupport(null);
        synchronized(cs) {
            cs.removeChangeListener(cl);
        }
    }

    public void removeChangeListener(ChangeListener cl, ChangeType ct) {
        ChangeSupport cs = getChangeSupport(ct);
        synchronized(cs) {
            cs.removeChangeListener(cl, ct);
        }
    }
    }

And that is the end of the class. You should be able to cut-and-paste this code into your own Changeable objects to implement the basic delegate-management.
    public void setName(String name) throws ChangeVetoException {
        if(!hasListeners()) {
            setNameImpl(name);
        } else {
            ChangeSupport cs = getChangeSupport(Nameable.NAME);
            synchronized(cs) {
                ChangeEvent ce = new ChangeEvent(this, Nameable.NAME, name, this.name);
                cs.firePreChangeEvent(ce);
                setNameImpl(name);
                cs.firePostChangeEvent(ce);
            }
        }
    }

    protected abstract void setNameImpl(String name) throws ChangeVetoException;
    }

The implementation would look something like this.

    public class MyNameable extends AbstractNameable {
        private String name;

        public String getName() {
            return this.name;
        }

        // only the storage is implemented here; setName in the abstract
        // class takes care of the events
        protected void setNameImpl(String name) throws ChangeVetoException {
            this.name = name;
        }
    }

This split between the abstract implementation that handles all of the event guts and a really light-weight implementation that controls access to data-storage is very useful in practice, and is used extensively in BioJava, particularly in the org.biojava.bio.dist package.
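The split between an abstract class that owns the event plumbing and a tiny subclass that owns the storage is an instance of the template-method pattern. The sketch below shows the same shape in plain Java, including the lazily created listener list and the fast path taken when nobody is listening. NameObserver, AbstractNamed and StoredName are hypothetical names for this illustration, and vetoing is left out to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener interface for this sketch.
interface NameObserver {
    void nameChanged(String oldName, String newName);
}

abstract class AbstractNamed {
    private List<NameObserver> observers; // created lazily, on first registration

    protected boolean hasObservers() {
        return observers != null && !observers.isEmpty();
    }

    public synchronized void addObserver(NameObserver observer) {
        if (observers == null) {
            observers = new ArrayList<NameObserver>();
        }
        observers.add(observer);
    }

    public abstract String getName();

    // Subclasses only decide how the value is stored.
    protected abstract void setNameImpl(String name);

    // The event handling lives here, once, for every subclass.
    public final synchronized void setName(String name) {
        if (!hasObservers()) {
            setNameImpl(name); // fast path: nobody to notify
            return;
        }
        String oldName = getName();
        setNameImpl(name);
        for (NameObserver observer : observers) {
            observer.nameChanged(oldName, name);
        }
    }
}

class StoredName extends AbstractNamed {
    private String name;

    public String getName() { return name; }

    protected void setNameImpl(String name) { this.name = name; }
}

class TemplateDemo {
    static String run() {
        final StringBuilder log = new StringBuilder();
        StoredName named = new StoredName();
        named.setName("first"); // no observers yet, so nothing is logged
        named.addObserver(new NameObserver() {
            public void nameChanged(String oldName, String newName) {
                log.append(oldName).append("->").append(newName);
            }
        });
        named.setName("second");
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Making setName final in the sketch means a subclass cannot accidentally bypass the notification by overriding the wrong method.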
What next?
By now, you should be able to define interfaces that are Changeable, and to write implementations of these interfaces using AbstractChangeable or by delegating to ChangeSupport directly. For cases where there are many implementations that differ only in the means of data-storage, you should be able to factor the Changeability code into an abstract class, and subclass this for each form of data-access.

BioJava events tutorial 0.1 by Matthew Pocock. Please mail any comments or suggestions to the author or to the biojava-l mailing list.
Getting up and running Example application - producing HTML from blast-like output
BioJava Blast2HTML tutorial 1.0 by Cambridge Antibody Technology. Please mail any comments or suggestions to the author or to the biojava-l mailing list.
Introduction
The program implements the "occasionally dishonest casino" example used in the book "Biological Sequence Analysis" by R. Durbin, S. Eddy, A. Krogh, G. Mitchison. Basically, it conceives a casino with two dice, one fair and one loaded. The fair die lands on any of its sides with equal probability, while the loaded die yields "6" half the time, all the other sides being of equal probability. These probabilities represent the emission distributions of the fair die state and the loaded die state respectively.

The casino switches between using the fair die and the loaded die periodically. When on the fair die, the probability that the next throw is with the fair die too is 0.95. Similarly, when on the loaded die, the probability of continuing with it is 0.90. These probabilities yield the transition distributions of the states.

The HMM as modelled in the code is slightly modified from the above description with the inclusion of a MagicalState. This state is used to represent the start and end of the states of the model. The transition from the MagicalState to the fair die state occurs with a probability of 0.8, while the transition to the loaded die state occurs with a probability of 0.2. A termination condition was also introduced to allow transitions from the fair die and loaded die states to the MagicalState with a probability of 0.01. The resultant HMM looks like this:
Code
The core of the program is the createCasino() method. This creates an instance of the MarkovModel class that implements the model.

    public static MarkovModel createCasino() {
        Symbol[] rolls = new Symbol[6];

        //set up the dice alphabet
        SimpleAlphabet diceAlphabet = new SimpleAlphabet();
        diceAlphabet.setName("DiceAlphabet");

        for(int i = 1; i < 7; i++) {
            try {
                rolls[i-1] =
                    AlphabetManager.createSymbol((char)('0'+i), ""+i, Annotation.EMPTY_ANNOTATION);
                diceAlphabet.addSymbol(rolls[i-1]);
            } catch (Exception e) {
                throw new NestedError(
                    e, "Can't create symbols to represent dice rolls"
                );
            }
        }

A Symbol array rolls is created to hold the Symbols generated by AlphabetManager to represent the outcomes of the dice. An Alphabet is also defined over these Symbols. Next, distributions representing the emission probabilities of the fair die and loaded die states are created (named fairD and loadedD respectively). The die states themselves are then created as SimpleEmissionStates, fairS and loadedS respectively.

You will observe an int array advance with a single value of 1. In a single-head HMM like ours, there is only one generated sequence and in our case, we progress along this sole sequence a single position per transition in the model. In multihead HMMs, there will be multiple sequences generated by the HMM and it is possible that the increment through the different sequences might be different. For example, single-stepping a protein sequence amounts to an increment of three on its corresponding DNA sequence.

        int [] advance = { 1 };
        Distribution fairD;
        Distribution loadedD;
        try {
            fairD = DistributionFactory.DEFAULT.createDistribution(diceAlphabet);
            loadedD = DistributionFactory.DEFAULT.createDistribution(diceAlphabet);
        } catch (Exception e) {
            throw new NestedError(e, "Can't create distributions");
        }
        EmissionState fairS = new SimpleEmissionState("fair", Annotation.EMPTY_ANNOTATION, advance, fairD);
        EmissionState loadedS = new SimpleEmissionState("loaded", Annotation.EMPTY_ANNOTATION, advance, loadedD);

The HMM is then created with these states:

        SimpleMarkovModel casino = new SimpleMarkovModel(1, diceAlphabet, "Casino");
        try {
            casino.addState(fairS);
            casino.addState(loadedS);
        } catch (Exception e) {
            throw new NestedError(e, "Can't add states to model");
        }

Next, we need to model the transitions between the states.
We do this like so:

        try {
            casino.createTransition(casino.magicalState(), fairS);
            casino.createTransition(casino.magicalState(), loadedS);
            casino.createTransition(fairS, casino.magicalState());
            casino.createTransition(loadedS, casino.magicalState());
            casino.createTransition(fairS, loadedS);
            casino.createTransition(loadedS, fairS);
            casino.createTransition(fairS, fairS);
            casino.createTransition(loadedS, loadedS);
        } catch (Exception e) {
            throw new NestedError(e, "Can't create transitions");
        }

Note the presence of a MagicalState that is returned by casino.magicalState(). This is inherent to the SimpleMarkovModel class and does not need to be created by the user. The emission distributions fairD and loadedD we set up earlier need to be initialised. We do that here.

        try {
            for(int i = 0; i < rolls.length; i++) {
                fairD.setWeight(rolls[i], 1.0/6.0);
                loadedD.setWeight(rolls[i], 0.1);
            }
            loadedD.setWeight(rolls[5], 0.5);
        } catch (Exception e) {
            throw new NestedError(e, "Can't set emission probabilities");
        }

We also need to initialise the transition distributions. Note how this is done: the transition distribution of each state is requested from the model with getWeights() and then updated with the required values by calling the setWeight() method of that distribution. It is not necessary thereafter to call setWeights() to pass the Distribution for a state back to the model. This may seem strange, but it is done this way because the model object may, for greater internal efficiency, use its own Distribution classes that cannot be replaced by a generic Distribution class. Every state in the model needs to have its own transition distribution initialised appropriately.

        //set up transition scores.
        try {
            Distribution dist;

            dist = casino.getWeights(casino.magicalState());
            dist.setWeight(fairS, 0.8);
            dist.setWeight(loadedS, 0.2);

            dist = casino.getWeights(fairS);
            dist.setWeight(loadedS, 0.04);
            dist.setWeight(fairS, 0.95);
            dist.setWeight(casino.magicalState(), 0.01);

            dist = casino.getWeights(loadedS);
            dist.setWeight(fairS, 0.09);
            dist.setWeight(loadedS, 0.90);
            dist.setWeight(casino.magicalState(), 0.01);
        } catch (Exception e) {
            throw new NestedError(e, "Can't set transition probabilities");
        }

Having completed constructing the MarkovModel, all that remains is to return it to the caller.

        return casino;
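A quick sanity check on the numbers above is that the outgoing transition probabilities of every state add up to one. The sketch below tabulates them in plain Java; TransitionCheck is a hypothetical helper for illustration, and each row is ordered as (to fair, to loaded, to magical).

```java
class TransitionCheck {
    // Rows copied from the transition weights set on the casino model.
    static final double[] FROM_MAGICAL = { 0.80, 0.20, 0.00 };
    static final double[] FROM_FAIR    = { 0.95, 0.04, 0.01 };
    static final double[] FROM_LOADED  = { 0.09, 0.90, 0.01 };

    static double rowSum(double[] row) {
        double sum = 0.0;
        for (double p : row) {
            sum += p;
        }
        return sum;
    }

    // True when the row is a probability distribution, up to rounding error.
    static boolean isNormalised(double[] row) {
        return Math.abs(rowSum(row) - 1.0) < 1e-9;
    }

    public static void main(String[] args) {
        System.out.println("magical: " + isNormalised(FROM_MAGICAL));
        System.out.println("fair:    " + isNormalised(FROM_FAIR));
        System.out.println("loaded:  " + isNormalised(FROM_LOADED));
    }
}
```

This also explains the 0.04 and 0.09 switching values: they are what is left of each row after the stay probability (0.95 or 0.90) and the 0.01 termination probability have been taken out.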
http://www.biojava.org/tutorials/dp-doc.html (3 di 5) [02/04/2003 13.39.48]
The top line is the sequence emitted by our HMM when we made it generate 300 throws. The next is the state from which the throw came (f-fair l-loaded, these are the first letters of the labels "fair" and "loaded" we used when creating the SimpleEmissionState objects that represent the dice). The last is similar but this time from the StatePath v that is the result of the Viterbi algorithm. The performance is pretty good on this occasion but it can vary widely!
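For readers who want to see what the Viterbi decoding does under the hood, here is a compact plain-Java sketch for this two-state casino, working in log space. It is an independent illustration and not the BioJava DP code: CasinoViterbi and its members are hypothetical names, while the probabilities are the ones used in the model (start 0.8/0.2, stay 0.95/0.90, switch 0.04/0.09, end 0.01).

```java
class CasinoViterbi {
    // State 0 is the fair die, state 1 is the loaded die.
    static final double[] START = { 0.8, 0.2 };
    static final double[][] TRANS = { { 0.95, 0.04 }, { 0.09, 0.90 } };
    static final double[] END = { 0.01, 0.01 };

    // Emission probability of a roll (1..6) from a state.
    static double emit(int state, int roll) {
        if (state == 0) {
            return 1.0 / 6.0;
        }
        return roll == 6 ? 0.5 : 0.1;
    }

    // Most probable state path, written as 'f' and 'l' characters.
    static String decode(int[] rolls) {
        int n = rolls.length;
        if (n == 0) {
            return "";
        }
        double[][] score = new double[n][2]; // best log-probability ending in each state
        int[][] back = new int[n][2];        // predecessor state on that best path

        for (int s = 0; s < 2; s++) {
            score[0][s] = Math.log(START[s]) + Math.log(emit(s, rolls[0]));
        }
        for (int t = 1; t < n; t++) {
            for (int s = 0; s < 2; s++) {
                int arg = 0;
                double best = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < 2; p++) {
                    double v = score[t - 1][p] + Math.log(TRANS[p][s]);
                    if (v > best) {
                        best = v;
                        arg = p;
                    }
                }
                score[t][s] = best + Math.log(emit(s, rolls[t]));
                back[t][s] = arg;
            }
        }

        // Termination: include the transition back to the end state.
        double endFair = score[n - 1][0] + Math.log(END[0]);
        double endLoaded = score[n - 1][1] + Math.log(END[1]);
        int state = endFair >= endLoaded ? 0 : 1;

        // Trace back from the last position to the first.
        char[] path = new char[n];
        for (int t = n - 1; t >= 0; t--) {
            path[t] = state == 0 ? 'f' : 'l';
            state = back[t][state];
        }
        return new String(path);
    }

    public static void main(String[] args) {
        System.out.println(decode(new int[] { 1, 3, 5, 6, 6, 6, 6, 6, 2, 4 }));
    }
}
```

Working with logarithms avoids numerical underflow on long sequences such as the 300 throws generated above, where the raw path probabilities become vanishingly small.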
Installing Postgresql
If not already installed, PostgreSQL can be installed from RPMs with:

    rpm -ivh postgresql-7.2.1-5.i386.rpm postgresql-libs-7.2.1-5.i386.rpm postgresql-server-7.2.1-5.i386.rpm

Root privileges will almost certainly be required (if not, your machine is seriously insecure!). You will also need a JDBC driver to permit Java to connect to your PostgreSQL database, and that can be installed with postgresql-jdbc-7.1.3-2.i386.rpm. However, I would recommend downloading the latest from here. You will end up with a jar file containing the JDBC implementation which you will need to place in your CLASSPATH.

The installs will place a control script within /etc/init.d named postgresql. When this script runs for the first time, it will create a database cluster and initialise it. This cluster is the set of files used by the database for storage purposes. On RH7.2 the default location for the cluster is /var/lib/pgsql/. This is a bit of a disadvantage as /var is usually a pretty small partition. It is possible at this stage to symlink /var/lib/pgsql to a directory within another partition altogether to circumvent this problem. I would suggest doing this immediately.

At this stage, you will need to create the database you intend using and a user to use it. I would suggest NOT using the superuser named postgres for anything other than occasional essential administration.

At this point, I will digress briefly into PostgreSQL authentication, as the choices you make will affect what you can do. PostgreSQL has a variety of routes to achieve this. The default at installation permits connection only from local users and permits access to a database ONLY by a user of the same username. This may be quite adequate for experimentation but not so convenient if you want to set up a BioSQL database for several local users or possibly even remote users. PostgreSQL has other mechanisms which are described in their documentation. Authentication is specifically described here.
You might consider password authentication but do use md5 encryption with this option, especially if you intend to authenticate remote users. In the Redhat 7.2 installation, the file you will need to edit to set these options is /var/lib/pgsql/data/pg_hba.conf. The location of this file varies with other distributions.

As initially installed in RH7.2, PostgreSQL will require root privileges to set up further. The postgres superuser cannot be logged into but you can invoke the necessary commands from root. A database is created with:

    $ su postgres -c 'createdb <insert db name here>'

and a user created with:

    $ su postgres -c 'createuser <insert user name here>'

For the purposes of this tutorial, I will not change the default authentication so the database name should be chosen to correspond to your user name. The user name used in this exercise is gadfly and this will be reflected in the choice of database name and user name.

One additional change that will be necessary is to enable TCP/IP connections as the Unix domain socket restriction of the default installation is incompatible with the PostgreSQL JDBC implementation.
http://www.biojava.org/tutorials/biosql.html (1 di 3) [02/04/2003 13.39.48]
To do so, you need to add the "-i" flag to the startup script. Edit /etc/init.d/postgresql and change the line:

    su -l postgres -s /bin/sh -c "/usr/bin/pg_ctl -D $PGDATA -p /usr/bin/postmaster start > /dev/null 2>&1" < /dev/null

to:

    su -l postgres -s /bin/sh -c "/usr/bin/pg_ctl -o "-i" -D $PGDATA -p /usr/bin/postmaster start > /dev/null 2>&1" < /dev/null

The /var/lib/pgsql/data/pg_hba.conf file will also need to be edited to permit access via TCP/IP. This can be achieved by uncommenting:

    #host all 127.0.0.1 255.255.255.255 trust

Both these operations require root access: seek advice as to the best option given your local security circumstances.

One additional change is that postgresql in RH7.3 does not come with the plpgsql language enabled. As BioSQL uses that for acceleration, you will need to enable it. This can be done as root with:

    su postgres -c 'createlang plpgsql template1'
Installing BioSQL
The PostgreSQL server must be running to complete the BioSQL installation. You can check that it is with:

    $ /etc/rc.d/postgresql status

and, if it is not running, start it with:

    $ /etc/rc.d/postgresql start

You may require root privileges for this. You should have PostgreSQL started up during system startup with the SysV init system that comes with most Unixen.

You will need three scripts that serve to initialise the new database with the BioSQL schema and load accelerators for this schema. These are:

    biosql-accelerators-pg.sql
    biosqldb-assembly-pg.sql
    biosqldb-pg.sql

They may be obtained from here. We now need to load the schema into the database we have created. We do so as follows (user entries follow the $ and gadfly=> prompts):

    $ psql gadfly
    Welcome to psql, the PostgreSQL interactive terminal.

    Type:  \copyright for distribution terms
           \h for help with SQL commands
           \? for help on internal slash commands
           \g or terminate with semicolon to execute query
           \q to quit

    gadfly=> \i biosqldb-pg.sql
    CREATE
    psql:biosqldb-pg.sql:13: NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index 'biodatabase_pkey' for table 'biodatabase'
    CREATE
    <rest of output snipped>
    INSERT 16862 1
    psql:biosqldb-pg.sql:304: NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index 'cache_corba_support_pkey' for table 'cache_corba_support'
    CREATE
    gadfly=> \i biosqldb-assembly-pg.sql
    <rest of output snipped>
    gadfly=> \i biosql-accelerators-pg.sql
    <rest of output snipped>
    gadfly=> \q
    $

Let's walk through the session above. psql is the name of the PostgreSQL interactive shell. We invoke it to connect to the PostgreSQL server and accept commands for a database named gadfly that we had created earlier. psql starts and displays its user prompt. All psql commands begin with a backslash (\). The \i command instructs psql to take input from a file. I instruct psql to take input from biosqldb-pg.sql, biosqldb-assembly-pg.sql and biosql-accelerators-pg.sql in succession. psql reads the SQL statements within each of the files and constructs the BioSQL database schema, printing out a summary of its actions as it proceeds. Finally, I quit the psql interactive shell with \q.

At this point you have a BioSQL schema installed and ready to run! Do remember that if you do not explicitly load the JDBC driver in your code, you should set the jdbc.drivers Java system property to tell the JVM which driver class to look for, like so:

    java -Djdbc.drivers=org.postgresql.Driver <whatever your java code is>
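The alternative mentioned above, loading the JDBC driver explicitly in your code, is commonly done with Class.forName. The following is a minimal sketch rather than BioJava code: the class name DriverCheck and the helper loadDriver are made up for illustration, and it assumes the PostgreSQL JDBC driver jar is on the classpath when you run it.

```java
public class DriverCheck {

    /**
     * Hypothetical helper: tries to load the named class and reports
     * whether it was found. Loading a traditional JDBC driver class this
     * way lets the driver register itself with java.sql.DriverManager.
     */
    static boolean loadDriver(String driverClassName) {
        try {
            Class.forName(driverClassName);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Driver class name taken from the -Djdbc.drivers example above.
        String driver = "org.postgresql.Driver";
        if (loadDriver(driver)) {
            System.out.println("Loaded " + driver);
        } else {
            System.out.println(driver + " not found on the classpath");
        }
    }
}
```

If the driver is loaded this way, the -Djdbc.drivers option should not be needed on the command line.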
- Allows you to focus on the objects you want to create, and forget about writing complex parsing code
- Allows you to make use of the output from more pieces of software.

Because of the "concept-based" approach to the representation of data, many of the Content Handler classes you write can be re-used with the output of several different programs.
Recipes
The recipes are simple examples designed to get you up and running populating objects in the way you want. For each example recipe, two classes are provided:
- An XML Content Handler (this is the class that does the work of populating objects with data)
- A sample application class that takes blast-like program output and sets up for parsing using the Content Handler class.

NB You will find the complete source code for all the classes described here in the demos section of biojava, in the eventbasedparsing package. After Example 1, the only classes that are described are the XML Content Handler classes, because the application classes are essentially identical for all examples. To help you get going, in addition to the source code for the examples, there are also several examples of raw output from NCBI-blast, WU-blast, and HMMER in the "files" directory of the demos section of biojava.
Example 1
For all the hits from a search as detailed in the summary section of the output, prepare a list of Hit Ids. This is an example of a reusable Content Handler. The same piece of code works equally well with the output from multiple flavours of NCBI Blast, WU-Blast, and HMMER.

Step A - Create an application that sets up the parser and does the parsing

The full source is in eventbasedparsing.TutorialEx1. Because there is no difference between what you do here and what you would do to parse XML files, there isn't much to do. First create a SAX Parser that deals with Blast-like output.

    XMLReader oParser = (XMLReader) new BlastLikeSAXParser();

Next choose the Content Handler. In this case, we will be using the class TutorialEx1Handler, which takes a reference to an ArrayList in the constructor. When the SAX Parser parses the file, the ContentHandler will populate the ArrayList with Hit Ids from the summary section of the output.

    ContentHandler oHandler = (ContentHandler) new TutorialEx1Handler(oDatabaseIdList);
The final step in the set-up is to connect the Content Handler to the SAX Parser.

    oParser.setContentHandler(oHandler);

For the purposes of the tutorial applications, we will simply be reading output from files on disk. Create a FileInputStream, and parse it by calling the parse method on the SAX Parser.

    oInputFileStream = new FileInputStream(oInput);
    oParser.parse(new InputSource(oInputFileStream));

Finally, having populated the ArrayList with Hit Ids, we simply print them out.

    System.out.println("Results of parsing");
    System.out.println("==================");
    for (int i = 0; i < oDatabaseIdList.size(); i++) {
        System.out.println(oDatabaseIdList.get(i));
    }

Step B - Create the logic for parsing

This is simply a matter of writing an XML Content Handler. The full source is in eventbasedparsing.TutorialEx1Handler. The logic here is trivial: we simply wish to identify Hit Ids that are contained within the Summary sections of the output data, and add each Hit Id to the ArrayList.

    if ( (oNameStack.peek().toString().equals("HitId")) &&
         (this.findInStack("Summary") != -1) ) {
        oDatabaseIdList.add(poAtts.getValue("id"));
    }

Running the application

After compiling, run the application from the demos directory by typing the following:

    java eventbasedparsing.TutorialEx1 files/ncbiblast/shortBlastn.out

You should see the following output:

    Results of parsing
    ==================
    U51677
    L38477
    X80457

BioJava Blast-like parsing tutorial 0.1 by Cambridge Antibody Technology. Please mail any comments or suggestions to the author or to the biojava-l mailing list.
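The handler logic from Step B of Example 1 can be tried out without BioJava by running it over ordinary XML with the JDK's own SAX parser. The sketch below is not the TutorialEx1Handler source. The class names and the sample XML layout (Output, Detail) are made up for illustration, and the name stack is maintained with a plain Deque instead of the tutorial's oNameStack and findInStack. Only the element names HitId and Summary and the id attribute come from the tutorial text.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class HitIdSketch {

    /** Collects the id attribute of every HitId element nested inside a Summary element. */
    static class SummaryHitIdHandler extends DefaultHandler {
        private final Deque<String> openElements = new ArrayDeque<>();
        private final List<String> hitIds;

        SummaryHitIdHandler(List<String> hitIds) {
            this.hitIds = hitIds;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Track which elements are currently open, innermost first.
            openElements.push(qName);
            if (qName.equals("HitId") && openElements.contains("Summary")) {
                hitIds.add(atts.getValue("id"));
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            openElements.pop();
        }
    }

    /** Parses the given XML text and returns the Hit Ids found in Summary sections. */
    static List<String> collect(String xml) throws Exception {
        List<String> ids = new ArrayList<>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new SummaryHitIdHandler(ids));
        reader.parse(new InputSource(new StringReader(xml)));
        return ids;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document: the HitId inside Detail must be ignored.
        String xml = "<Output><Summary><HitId id=\"U51677\"/><HitId id=\"L38477\"/></Summary>"
                   + "<Detail><HitId id=\"X80457\"/></Detail></Output>";
        System.out.println(collect(xml)); // prints [U51677, L38477]
    }
}
```

Because the handler only depends on SAX events, the same class could in principle be attached to any XMLReader that emits these element names, which is the re-use the tutorial describes.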
BioJava: Blast2HTML
Cambridge Antibody Technology

Introduction

This tutorial covers the use of the Blast-like parsing framework to generate HTML representations of the Blast-like XML. Here are some examples of the type of output you can generate.
- Blastp
- Blastn

Prerequisites are:
- an up-to-date copy of biojava
- the programs in the demos directory
ColourCommand
    Controls whether a pair of characters in the alignment is styled or not.
AlignmentStyler
    Decides what style to apply to any given pair of characters.
For example, to mark up mismatches in red you would have a ColourCommand that decides that only mismatches are coloured, and then an AlignmentStyler that colours any characters passed to it red. There are a couple of implementations of AlignmentStyler: SimpleAlignmentStyler and BlastMatrixAlignmentStyler. See the Javadocs for details. Of course, you can also use custom handlers to pass on only a subset of the output.

BioJava Blast2HTML tutorial 1.0 by Cambridge Antibody Technology. Please mail any comments or suggestions to the author or to the biojava-l mailing list.
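The division of labour between ColourCommand and AlignmentStyler described in the Blast2HTML tutorial can be illustrated with a small stand-alone sketch. The interfaces below, PairFilter and PairStyler, are hypothetical stand-ins invented for this example. They are not the BioJava interfaces, whose real method signatures are given in the Javadocs. The sketch also assumes the two aligned strings have equal length and emits a bare font tag purely for illustration.

```java
public class MismatchMarkupSketch {

    /** Hypothetical stand-in for the "is this pair styled at all?" role. */
    interface PairFilter {
        boolean isStyled(char query, char hit);
    }

    /** Hypothetical stand-in for the "which style does this pair get?" role. */
    interface PairStyler {
        String styleFor(char query, char hit);
    }

    /** Marks up the hit line, wrapping each styled character in a font tag. */
    static String markUp(String query, String hit, PairFilter filter, PairStyler styler) {
        StringBuilder html = new StringBuilder();
        for (int i = 0; i < hit.length(); i++) {
            char q = query.charAt(i);
            char h = hit.charAt(i);
            if (filter.isStyled(q, h)) {
                html.append("<font color=\"").append(styler.styleFor(q, h)).append("\">")
                    .append(h).append("</font>");
            } else {
                html.append(h);
            }
        }
        return html.toString();
    }

    public static void main(String[] args) {
        PairFilter mismatchesOnly = (q, h) -> q != h; // only mismatching pairs are styled
        PairStyler alwaysRed = (q, h) -> "red";       // whatever is passed in is coloured red
        System.out.println(markUp("ACGT", "ACTT", mismatchesOnly, alwaysRed));
        // prints AC<font color="red">T</font>T
    }
}
```

Keeping the two decisions separate means the same "always red" styler can be paired with a different filter (for example, one that selects matches instead) without changing either piece.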