Perl for Bioinformatics
Programming languages
Self-contained language
Platform-independent
Used to write O/S
C (imperative, procedural)
C++, Java (object-oriented)
Lisp, Haskell (functional)
Prolog (logic)
Scripting language
Closely tied to O/S
Perl, Python, Ruby
Perl overview
Interpreted, not compiled
Fast edit-run-revise cycle
Introduction to Bioinformatics file formats
Practical data-handling algorithms
Exposure to Bioinformatics software
Structural elements
Learning Perl, Schwartz et al., O'Reilly, ISBN 0-596-10105-8
"There's more than one way to do it"
Q: But which is best? A: TESTS
Terminal session
Description of test conditions
Think before you write
Use a good text editor
Good debugging style
Perl basics
Basic syntax of a Perl program:
# Elementary Perl program
print "Hello World\n";

Lines beginning with "#" are comments, and are ignored by Perl
All statements end with a semicolon
Single or double quotes enclose a "string literal" (double quotes are "interpolated")
"\n" means new line
The print statement tells Perl to print the following stuff to the screen

Hello World
Variables
We can tell Perl to "remember" a particular value, using the assignment operator =:
$x = 3;
print $x;
$x = "ACGCGT";    # binding site for yeast transcription factor MCB
print $x;

ACGCGT
Arithmetic operations
Basic operators are + - / * %
$x = 14;
$y = 3;
print "Sum: ", $x + $y, "\n";
print "Product: ", $x * $y, "\n";
print "Remainder: ", $x % $y, "\n";

Sum: 17
Product: 42
Remainder: 2

$x = 5;
print "x started as $x\n";
$x = $x * 2;
print "Then x was $x\n";
$x = $x + 1;
print "Finally x was $x\n";
String operations
Concatenation . .=
$a = "pan";
$b = "cake";
$a = $a . $b;
print $a;

pancake

$a = "soap";
$b = "dish";
$a .= $b;
print $a;

soapdish
accacguuaggucu
C does not have a basic type for strings, only individual characters. Strings are built up from more basic elements, as arrays of characters (we'll get to arrays later). Much of this functionality is provided in C and C++ as part of the standard library.
Conditional blocks
The ability to execute an action contingent on some condition is what distinguishes a computer from a calculator. In Perl, this looks like this:
if (condition) { action } else { alternative }
$x = 149;
$y = 100;
if ($x > $y)
{
    print "$x is greater than $y\n";
}
else
{
    print "$x is less than $y\n";
}

These braces { } tell Perl which piece of code is contingent on the condition.

149 is greater than 100
Conditional operators
Numeric: == != > < >= <=
"equals" is ==, "does not equal" is !=
Note that the test for "$x equals $y" is $x==$y, not $x=$y
20 equals 20
String: eq ne gt lt ge le
"equals" is eq, "does not equal" is ne
Shorthand syntax for assigning more than one variable at a time
($x, $y) = ("Apple", "Banana");
if ($y gt $x) {
    print "$y after $x ";
}
Logical operators
Logical operators: && means "and", || means "or" (the words and and or also work, with lower precedence)

$x = 222;
if ($x % 2 == 0 and $x % 3 == 0) {
    print "$x is an even multiple of 3\n";
}

222 is an even multiple of 3

An exclamation mark ! is used to negate what follows. Thus !($x < $y) means the same as ($x >= $y)
In computers, the value zero is often used to represent falsehood, while any non-zero value (e.g. 1) represents truth. Thus:
if (1) { print "1 is true\n"; }
if (0) { print "0 is true\n"; }
if (-99) { print "-99 is true\n"; }

1 is true
-99 is true
Loops
Here's how to print out the numbers 1 to 10:
$x = 1;
while ($x <= 10) {
    print $x, " ";
    ++$x;    # equivalent to $x = $x + 1;
}

The code inside the braces is repeatedly executed as long as the condition $x<=10 remains true

1 2 3 4 5 6 7 8 9 10
This is a while loop. The code is executed while the condition is true.
This form of while loop is common enough to have its own shorthand: the for loop.
for ($x = 1; $x <= 10; ++$x) {
    print $x, " ";
}

The three parts inside the parentheses are, in order: initialisation ($x = 1), test for completion ($x <= 10) and continuation (++$x)
A variable that has not yet been assigned a value has the special value undef. Often, if you try to do something "illegal" (like reading from a nonexistent file), you end up with undef as a result.
C does not have defined or undef. At best, using an uninitialized value will cause a compiler warning; at worst, it will lead to undefined behavior (i.e. disaster)
Once the file is opened, we can read a single line from it into the scalar $x :
$x = <FILE>;

This reads the next line from the file, including the newline at the end, "\n". If the end of the file is reached, $x is assigned the special value undef.
Debugging
Most programs don't work first time
Most apparently "working" programs actually aren't
Bugs are cryptic
Debugging is a scientific process
As you gain experience, you will begin to "insure" against bugs with your programming technique
Debugging is scientific
Finding bugs can be very frustrating
A job that you thought was nearly finished, for which you have budgeted a certain amount of time, stretches out indefinitely
Often you may have no idea what's wrong
If you think of debugging as a scientific problem and approach it systematically, much of the pain disappears
note recent changes (usually the cause of bugs)
look for similar problems (can ask other developers)
check "machine environment" (e.g. if you move to a different computer, does it have less memory? less disk space?)
what should that code be doing?
this can be seen as a continuation of Step 1 ("identify the problem")
debugging is a cyclic, interactive process
Proactive debugging
Place consistency checks in your code
also called assertions
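A consistency check can be as simple as a die statement guarding an assumption. The sketch below is illustrative only: the subroutine name check_dna and its rule (only the letters A, C, G, T are allowed) are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: dies unless the sequence contains only A, C, G or T.
sub check_dna {
    my ($seq) = @_;
    die "Assertion failed: empty sequence\n" unless length $seq;
    die "Assertion failed: unexpected character in '$seq'\n"
        unless $seq =~ /^[ACGT]+$/;
    return 1;
}

check_dna("TAATAAAAAACGCGTTGTCG");    # passes silently

# eval traps the die, so the script can report the problem and carry on
my $ok = eval { check_dna("ACGU"); 1 };
print "Caught: $@" unless $ok;
```

A failed check stops the program at the point where the assumption first breaks, rather than many lines later where the symptom appears.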
Pattern-matching
A very sophisticated kind of logical test is to ask whether a string contains a pattern e.g. does a yeast promoter sequence contain the MCB binding site, ACGCGT?
$name = "YBR007C";
$dna = "TAATAAAAAACGCGTTGTCG";    # 20 bases upstream of the yeast gene YBR007C
if ($dna =~ /ACGCGT/) {
    print "$name has MCB!\n";
}

YBR007C has MCB!

=~ is the pattern binding operator; /ACGCGT/ is the pattern for the MCB binding site
FASTA format
A format for storing multiple named sequences in a single file
Name of sequence is preceded by > symbol
NB sequences can span multiple lines

>CG11604
TAGTTATAGCGTGAGTTAGT
TGTAAAGGAACGTGAAAGAT
AAATACATTTTCAATACC
>CG11455
TAGACGGAGACCCGTTTTTC
TTGGTTAGTTTCACATTGTA
AAACTGCAAATTGTGTAAAA
ATAAAATGAGAAACAATTCT
GGT
>CG11488
TAGAAGTCAAAAAAGTCAAG
TTTGTTATATAACAAGAAAT
CAAAAATTATATAATTGTTT
TTCACTCT
This file contains 3' UTRs for Drosophila genes CG11604, CG11455 and CG11488
Pattern replacement
$_ is the default variable for these operations
open FILE, "fly3utr.txt";
while (<FILE>) {
    if (/>/) {
        s/>//;
        print;
    }
}
close FILE;

The new statement s/>// is an example of a replacement.
General form: s/OLD/NEW/ replaces OLD with NEW
Thus s/>// replaces ">" with "" (the empty string)
[Flowchart of the sequence-length program. Boxes: "End of file?", "Sequence name", "First sequence?", "Print last sequence length", "Stop"]
open FILE, "fly3utr.txt";
while (<FILE>) {
    chomp;
    if (/>/) {
        if (defined $len) {
            print "$name $len\n";
        }
        $name = $_;
        $len = 0;
    } else {
        $len += length;
    }
}
print "$name $len\n";
close FILE;

>CG11604 58
>CG11455 83
>CG11488 68
Arrays
An array is a variable holding a list of items
@nucleotides = ('a', 'c', 'g', 't');
print "Nucleotides: @nucleotides\n";

Nucleotides: a c g t
Array literals
There are several, equally valid ways to assign an entire array at once.
@a = (1,2,3,4,5);    # the most common: a comma-separated list, delimited by parentheses
print "a = @a\n";
@b = ('a','c','g','t');
print "b = @b\n";
@c = 1..5;
print "c = @c\n";
@d = qw(a c g t);
print "d = @d\n";

a = 1 2 3 4 5
b = a c g t
c = 1 2 3 4 5
d = a c g t
Accessing arrays
To access array elements, use square brackets; e.g. $x[0] means "element zero of array @x"
@x = ('a', 'c', 'g', 't');
print $x[0], "\n";
$i = 2;
print $x[$i], "\n";

a
g

Remember, element indices start at zero!
If you use an array @x in a scalar context, such as @x+0, then Perl assumes that you wanted the length of the array.

@x = ('a', 'c', 'g', 't');
print @x + 0;

4
Array operations
You can sort and reverse arrays...
@x = ('a', 't', 'g', 'c');
@y = sort @x;
@z = reverse @y;
print "x = @x\n";
print "y = @y\n";
print "z = @z\n";

x = a t g c
y = a c g t
z = t g c a
You can read the entire contents of a file into an array (each line of the file becomes an element of the array)
open FILE, "sequence.txt"; @x = <FILE>;
I started with Fame Power Money Then I had Fame Power Success Now I have Glamour Power Success I lost Money and Fame
foreach
Finding the total of a list of numbers:
The foreach statement loops through each entry in an array

@val = (4, 19, 1, 100, 125, 10);
$total = 0;
foreach $x (@val) {
    $total += $x;
}
print $total;
259
Equivalent to:
@val = (4, 19, 1, 100, 125, 10);
$total = 0;
for ($i = 0; $i < @val; ++$i) {
    $total += $val[$i];
}
print $total;
259
Iterator comparison
foreach
[yoko:~] yam% time perl -e 'foreach $n (1..10**6) { $total += log $n } print $total, "\n"'
12815518.3846579
0.765u 0.007s 0:00.80 95.0% 0+0k 0+0io 0pf+0w
for
[yoko:~] yam% time perl -e 'for ($n = 1; $n <= 10**6; ++$n) { $total += log $n } print $total, "\n"'
12815518.3846579
1.080u 0.007s 0:01.12 96.4% 0+0k 0+0io 0pf+0w
Both timings: iMac G5 1.8GHz 512MB, Mac OS X 10.4.2, perl v5.8.6 built for darwin-thread-multi-2level
(technically, the keywords for and foreach are interchangeable; historically, for was used with initialization-continuation-termination constructs and foreach was used with arrays)
Review: pattern-matching
The following code:
if (/ACGCGT/) { print "Found MCB binding site!\n"; }
prints the string "Found MCB binding site!" if the pattern "ACGCGT" is present in the default variable, $_
Instead of using $_ we can "bind" the pattern to another variable (e.g. $dna) using this syntax:
if ($dna =~ /ACGCGT/) {
    print "Found MCB binding site!\n";
}
We can replace the first occurrence of ACGCGT with the string _MCB_ using the following syntax:
$dna =~ s/ACGCGT/_MCB_/;
We can replace all occurrences by appending a 'g':
$dna =~ s/ACGCGT/_MCB_/g;
Regular expressions
Perl provides a pattern-matching engine
Patterns are called regular expressions
They are extremely powerful: probably Perl's strongest feature, compared to other languages
Attachment of a 14-sugar oligosaccharide
Occurs at asparagine residues with the consensus sequence NX1X2, where
X1 can be anything (but proline & aspartic acid inhibit)
X2 is serine or threonine
The special filehandle STDIN means "standard input", i.e. the keyboard
Sometimes (e.g. in Windows IDEs) the output isn't printed until the script stops. This is because of buffering. To stop buffering, set $| to 1 ("autoflush"):
$| = 1; while (<STDIN>) { print; }
In general, square brackets denote a set of alternative possibilities
Use - to match a range of characters: [A-Z]
. matches anything
\s matches spaces or tabs
\S is anything that's not a space or tab
[^X] matches anything but X

Example session (lines beginning "Matched:" are output; the other lines are input text):
Won't match THIS
Will match this
Matched: Will match this
Won't match ThE oThER
Will match the other
Matched: Will match the other
e.g. /the (\S+) sat on the (\S+) drinking (\S+)/ matches "the cat sat on the mat drinking milk" with $1="cat", $2="mat", $3="milk"
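The capture example above can be tried directly. This is a minimal sketch; the variable names $text, $who, $where and $what are chosen here for illustration.

```perl
use strict;
use warnings;

my $text = "the cat sat on the mat drinking milk";
if ($text =~ /the (\S+) sat on the (\S+) drinking (\S+)/) {
    # $1, $2 and $3 hold the text matched by each pair of parentheses
    print "1=$1 2=$2 3=$3\n";
}

# In list context, a match returns the captured groups directly
my ($who, $where, $what) =
    $text =~ /the (\S+) sat on the (\S+) drinking (\S+)/;
print "$who / $where / $what\n";
```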
s/OLD/NEW/ replaces first "OLD" with "NEW"
s/OLD/NEW/g is "global" (i.e. replaces every occurrence of "OLD" in the string)

$| = 1;
while (<STDIN>) {
    $_ = uc $_;
    while (/(N[^PD][ST])/g) {
        print "Potential N-glycosylation sequence ", $1, " at residue ", pos() - 2, "\n";
    }
}
pos() is index of first residue after match, starting at zero; so, pos()-2 is index of first residue of three-residue match, starting at one.
PROSITE: a database of regular expressions for protein families, domains and motifs
Pfam: a database of Hidden Markov Models (HMMs), equivalent to probabilistic regular expressions
Subroutines
Often, we can identify self-contained tasks that occur in so many different places we may want to separate their description from the rest of our program. Code for such a task is called a subroutine.
Examples of such tasks:
finding the length of a sequence (NB: Perl provides the subroutine length($x) to do this already)
reverse complementing a sequence
finding the mean of a list of numbers
Subroutine calls
Numbers: 1 5 1 12 3 4 6 Maximum: 12
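The output above could come from a subroutine like the following. This is a sketch of one possible find_max, not necessarily the version used in the lecture; it assumes a list of numbers.

```perl
use strict;
use warnings;

# Returns the largest element of a list of numbers (undef for an empty list).
sub find_max {
    my @data = @_;
    my $max = shift @data;
    foreach my $x (@data) {
        $max = $x if $x > $max;
    }
    return $max;
}

my @x = (1, 5, 1, 12, 3, 4, 6);
print "Numbers: @x\n";
print "Maximum: ", find_max(@x), "\n";
```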
Data structures
Suppose we have a file containing a table of Drosophila gene names and cellular compartments, one pair on each line:
Cyp12a5 Mitochondrion
MRG15 Nucleus
Cop Golgi
bor Cytoplasm
Bx42 Nucleus

Genes: Cyp12a5 MRG15 Cop bor Bx42
Compartments: Mitochondrion Nucleus Golgi Cytoplasm Nucleus
The opposite of split is join, which makes a scalar from an array: print join (" and ", @gene);
Cyp12a5 and MRG15 and Cop and bor and Bx42
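A minimal sketch of how split and join fit together for the gene/compartment table. To stay self-contained it reads from an in-memory list (@lines, an assumption made here) rather than from a file.

```perl
use strict;
use warnings;

# Stand-in for the lines of the gene/compartment file
my @lines = ("Cyp12a5 Mitochondrion\n", "MRG15 Nucleus\n", "Cop Golgi\n",
             "bor Cytoplasm\n", "Bx42 Nucleus\n");

my (@gene, @comp);
foreach my $line (@lines) {
    chomp $line;
    # split on whitespace into the two fields
    my ($g, $c) = split /\s+/, $line;
    push @gene, $g;
    push @comp, $c;
}

print "Genes: @gene\n";
print "Compartments: @comp\n";
print join(" and ", @gene), "\n";
```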
Binary search
The previous algorithm is inefficient. If there are N entries in the list, then on average we have to search through (N+1)/2 entries to find the one we want. For the full Drosophila genome, N=12,000. This is painfully slow. An alternative is the Binary Search algorithm:
Start with a sorted list.
Compare the middle element with the one we want.
Pick the half of the list that contains our element.
Iterate this procedure to "home in" on the right element.
This takes around log2(N) steps.
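The steps above can be sketched as follows. The subroutine name binary_search is an assumption made here; it compares strings with cmp and takes a reference to the sorted array (written \@genes), so that the list is not copied on every call.

```perl
use strict;
use warnings;

# Returns the index of $target in the sorted array referenced by $list,
# or -1 if it is absent. String comparison (cmp) is assumed.
sub binary_search {
    my ($list, $target) = @_;
    my ($lo, $hi) = (0, $#$list);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        my $c = $target cmp $list->[$mid];
        return $mid if $c == 0;
        if ($c < 0) { $hi = $mid - 1 }    # target is in the lower half
        else        { $lo = $mid + 1 }    # target is in the upper half
    }
    return -1;
}

my @genes = sort qw(Cyp12a5 MRG15 Cop bor Bx42);
print "Sorted: @genes\n";
print "Cop is at index ", binary_search(\@genes, "Cop"), "\n";
```

The list must be sorted with the same comparison that the search uses; here both use the default string order.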
$comp{"Cop"} = "Golgi";
The term substr($x,$i,$len) returns the substring of $x starting at position $i with length $len. For example, substr("Biology",3,3) is "log"
open FILE1, "fosn1.txt";
while (<FILE1>) {
    $gotName{$_} = 1;
}
close FILE1;
open FILE2, "fosn2.txt";
while (<FILE2>) {
    print if $gotName{$_};
}
close FILE2;

CG1041
CG1167
Assigning hashes
A hash can be assigned directly, as a list of "key=>value" pairs:
%comp = ('Cyp12a5' => 'Mitochondrion',
         'MRG15' => 'Nucleus',
         'Cop' => 'Golgi',
         'bor' => 'Cytoplasm',
         'Bx42' => 'Nucleus');
print "keys: ", join(";",keys(%comp)), "\n";
print "values: ", join(";",values(%comp)), "\n";
DDESC
g: 5
a: 5
c: 1
t: 4
1 1 3 2 2 2 1 1 1
Note how we keep passing %freq back into the count_nmers subroutine, to get cumulative counts
Opening a file: open XYZ, $filename;
Closing a file: close XYZ;
Reading a line: $data = <XYZ>;
Reading an array: @data = <XYZ>;
Printing a line: print XYZ $data;
Read-only: open XYZ, "<$filename";
Write-only: open XYZ, ">$filename";
Test if file exists: if (-e $filename) { print "$filename exists!\n"; }
45 113
227 -2 227 -1
16 2
Hexadecimal notation
Computers use binary notation, which is tricky to interconvert to/from decimal notation. Binary notation is also big & unwieldy.
A compromise is to use hexadecimal
Hexadecimal is base 16 (decimal is base 10, binary is base 2)
The letters A-F are used to represent the extra digits for 10-15

Binary:       101   1011   11100   101000011
Decimal:      5     11     28      323
Hexadecimal:  5     B      1C      143
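Perl's built-in printf, hex and oct can do these conversions. A minimal sketch, using the numbers from the table above; the variable names are chosen here for illustration.

```perl
use strict;
use warnings;

foreach my $n (5, 11, 28, 323) {
    # %b, %d and %X format a number in binary, decimal and hexadecimal
    printf "Binary: %b  Decimal: %d  Hexadecimal: %X\n", $n, $n, $n;
}

# hex() and oct() convert strings back to numbers
my $from_hex = hex("1C");             # hexadecimal string to number
my $from_bin = oct("0b101000011");    # oct() also understands a "0b" binary prefix
print "$from_hex $from_bin\n";
```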
References
Recall the subroutine find_max(@x) which returns the largest element in the array @x. Count the number of times we create an array in this code.

@x = (1, 5, 1, 12, 3, 4, 6);    # array @x created here
$max = find_max (@x);           # @x copied into @_ here
sub find_max {
    my @data = @_;
    ...

All in all, we've created three copies of this array. Each copy uses up time and memory. This seems unnecessary... and it is. Instead of passing the whole array into the subroutine, we could simply tell the subroutine where in memory the array begins. The memory address of a particular variable is called a reference to that variable. This is a useful abstraction. Addresses are often displayed in hexadecimal.
Reference syntax
To create a reference to
a scalar, $x: $scalar_ref = \$x;
an array, @x: $array_ref = \@x;
a hash, %x: $hash_ref = \%x;
To access a reference to
a scalar: $x = $$scalar_ref;
an array: @x = @$array_ref;
an array element: $x = $array_ref->[3]; or $x = $$array_ref[3];
a hash: %x = %$hash_ref;
a hash element: $x = $hash_ref->{'key'};
References to scalars
$x = 10;
$y = 20;
print "Initially: x=$x, y=$y\n";
$xReference = \$x;                              # this reference points to $x
print "X-reference: $xReference\n";
print "Referenced variable: $$xReference\n";
$$xReference += 3;                              # this changes the value of $x
print "Now: x=$x, y=$y\n";
$yReference = \$y;                              # this reference points to $y
print "Y-reference: $yReference\n";
print "Referenced variable: $$yReference\n";
$$yReference *= 2;                              # this changes the value of $y
print "Finally: x=$x, y=$y\n";

Initially: x=10, y=20
X-reference: SCALAR(0x1832ac0)    (this is the memory location used to store $x)
Referenced variable: 10
Now: x=13, y=20
Y-reference: SCALAR(0x1832ae4)    (this is the memory location used to store $y)
Referenced variable: 20
Finally: x=13, y=40
References to arrays
@x = ('a', 'c', 'g', 't');
@y = 1..10;
print "x: @x\n";
print "y: @y\n";
$xReference = \@x;                              # this reference points to @x
print "X-reference: $xReference\n";
print "Referenced array: @$xReference\n";
$$xReference[3] =~ tr/t/u/;                     # this changes the 4th element of @x
print "New x: @x\n";
$yReference = \@y;                              # this reference points to @y
print "Referenced array: @$yReference\n";
$yReference->[3] *= 2;                          # this changes the 4th element of @y (NB alternative notation)
print "New y: @y\n";

x: a c g t
y: 1 2 3 4 5 6 7 8 9 10
X-reference: ARRAY(0x1832b08)
Referenced array: a c g t
New x: a c g u
Referenced array: 1 2 3 4 5 6 7 8 9 10
New y: 1 2 3 8 5 6 7 8 9 10

Note that the type of reference is now ARRAY, not SCALAR
References to hashes
%comp = ('Cyp12a5' => 'Mitochondrion',
         'MRG15' => 'Nucleus',
         'Cop' => 'Golgi',
         'bor' => 'Cytoplasm',
         'Bx42' => 'Nucleus');
$ref = \%comp;
print "Values: ", join(" ",values(%comp)), "\n";
print "Ref: $ref\n";
print "Ref values: ", join(" ",values(%$ref)), "\n";
$$ref{'MRG15'} =~ s/N/n/;
print "New values: ", join(" ",values(%comp)), "\n";

Values: Cytoplasm Golgi Nucleus Mitochondrion Nucleus
Ref: HASH(0x1832b08)
Ref values: Cytoplasm Golgi Nucleus Mitochondrion Nucleus
New values: Cytoplasm Golgi Nucleus Mitochondrion nucleus
References to subroutines
We can also have references to subroutines Syntax for assigning a subroutine reference:
$subref = \&read_FASTA;
Anonymous subroutines:
$subref = sub { print "Hello world\n"; };
&$subref();

Hello world
References to code
sub hello { print "Hello @_!\n"; }
my $codeRef1 = \&hello;                         # the reference points to the subroutine hello
&$codeRef1 ("Mr", "President");
print "Ref: $codeRef1\n";
my $codeRef2 = sub { print "Goodbye @_!" };     # this is an anonymous subroutine reference
&$codeRef2 ("cruel", "world");

An anonymous subroutine is one that is never named, but only referenced. We'll be seeing more about anonymous references on the following slides.
We can also create an array and assign a reference to it, without explicitly naming the array variable:
$nucleotide_ref = ['a', 'c', 'g', 't'];
Arrays of arrays
More precisely, arrays of references-to-arrays. Suppose we want to represent this matrix:
This matrix could be a table of RNA base-pairing scores if the row and column indices are (A,C,G,U). The score of a pair is the number of strong hydrogen bonds that it forms. Thus, A-U and U-A pairs score +2; C-G and G-C pairs score +3; G-U and U-G pairs score +1; and all other pairs score 0.
0 0 0 2
0 0 3 0
0 3 0 1
2 0 1 0
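A minimal sketch of this matrix as an array of references-to-arrays. The names @score, %index and pair_score are assumptions made for this example.

```perl
use strict;
use warnings;

# Rows and columns are indexed in the order (A, C, G, U)
my @score = ([0, 0, 0, 2],
             [0, 0, 3, 0],
             [0, 3, 0, 1],
             [2, 0, 1, 0]);

# Map each base to its row/column index
my %index = (A => 0, C => 1, G => 2, U => 3);

# Hypothetical helper: look up the score of a base pair
sub pair_score {
    my ($x, $y) = @_;
    return $score[$index{$x}]->[$index{$y}];
}

print "G-C scores ", pair_score("G", "C"), "\n";
print "G-U scores ", pair_score("G", "U"), "\n";
```

Each element of @score is itself a reference to an anonymous array (one row), so $score[2]->[3] reads row 2, column 3.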
The vector is a C++ template. Templates (like C arrays) are strongly typed, unlike Perl's weakly typed arrays & hashes.
Genome annotations
[Figure: example GFF lines with the columns labelled: Start residue (starts at 1), End residue (starts at 1), Score, Strand (+ or -), Coding frame ("." if not applicable), Group]
Many of these now obsolete, but name/start/end/strand (and sometimes type) are useful
Methods: read, write, compareTo(GFF_file), getSeq(FASTA_file)
Splits the line into at most nine fields, separated by tabs ("\t")
Appends a reference to @data to the @gff array
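The two annotations above describe a read loop along these lines. This is a sketch only: the two GFF lines are made up for illustration and held in an in-memory list (@lines) instead of a file.

```perl
use strict;
use warnings;

# Two made-up GFF lines standing in for a file (tab-separated, nine fields)
my @lines = (
    "chr1\texample\texon\t100\t200\t0\t+\t.\tgeneA\n",
    "chr1\texample\texon\t150\t300\t0\t-\t.\tgeneB\n",
);

my @gff;
foreach my $line (@lines) {
    chomp $line;
    # Split the line into at most nine fields, separated by tabs
    my @data = split /\t/, $line, 9;
    # Append a reference to @data to the @gff array
    push @gff, \@data;
}

# Fields 0, 3, 4 and 6 are name, start, end and strand
foreach my $f (@gff) {
    print "$f->[0] $f->[3]-$f->[4] ($f->[6])\n";
}
```

Because @data is declared with my inside the loop, each pass creates a fresh array, so every reference stored in @gff points to its own line.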
Checking every possible pair takes time N^2 to run, where N is the number of GFF lines (how can this be improved?)
Note: this code is slow. Vast improvements in speed can be gained if we sort the @gff array before checking for intersection.
DNA Microarrays
Normalization is crude (it can eliminate real signal as well as noise), but common
Rescaling an array
For each element of the array: add a, then multiply by b
@array = (1, 3, 5, 7, 9);
print "Array before rescaling: @array\n";
rescale_array (\@array, -1, 2);
print "Array after rescaling: @array\n";

sub rescale_array {
    my ($arrayRef, $a, $b) = @_;
    foreach my $x (@$arrayRef) {
        $x = ($x + $a) * $b;
    }
}

Array before rescaling: 1 3 5 7 9
Array after rescaling: 0 4 8 12 16
Reference to hash of arrays (hash key is gene name, array elements are expression data)
Normalizing by gene
A program to normalize expression data from a set of microarray experiments
($experiment, $expr) = read_expr ("expr.txt");
while (($geneName, $lineRef) = each %$expr) {
    normalize_array ($lineRef);              # normalizes by gene
}

sub normalize_array {
    my ($data) = @_;                         # NB $data is a reference to an array
    my ($mean, $sd) = mean_sd (@$data);
    @$data = map (($_ - $mean) / $sd, @$data);
}

Could also use the following: rescale_array($data,-$mean,1/$sd);
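The code above calls mean_sd, which is not shown. One possible version is sketched below. It assumes the population standard deviation (dividing by N rather than N-1) and a non-empty list; the lecture's own definition may differ.

```perl
use strict;
use warnings;

# Returns the mean and standard deviation of a non-empty list of numbers.
# Assumption: population standard deviation (divide by N, not N-1).
sub mean_sd {
    my @x = @_;
    my $n = @x;
    my ($sum, $sumsq) = (0, 0);
    foreach my $v (@x) {
        $sum   += $v;
        $sumsq += $v * $v;
    }
    my $mean = $sum / $n;
    my $var  = $sumsq / $n - $mean * $mean;
    $var = 0 if $var < 0;              # guard against rounding error
    return ($mean, sqrt $var);
}

my ($mean, $sd) = mean_sd(2, 4, 4, 4, 5, 5, 7, 9);
print "mean=$mean sd=$sd\n";
```

A gene whose expression values are all identical has a standard deviation of zero, so a caller that divides by $sd would need to check for that case first.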
Normalizing by column
Remaps gene arrays to column arrays
($experiment, $expr) = read_expr ("expr.txt");
my @genes = sort keys %$expr;
for ($i = 0; $i < @$experiment; ++$i) {
    my @col;
    foreach $j (0..@genes-1) {
        $col[$j] = $expr->{$genes[$j]}->[$i];    # puts column data in @col
    }
    normalize_array(\@col);                       # normalizes (note use of reference)
    foreach $j (0..@genes-1) {
        $expr->{$genes[$j]}->[$i] = $col[$j];     # puts @col back into %expr
    }
}
Sorting
It is often useful to be able to sort an array
e.g. smallest element first, largest last
This is changing...
Nucleotides: g c t a Sorted: a c g t
y: -1 1 2 5 10 16
$x $x $y $x
Pears cmp Apples: 1
Pears cmp Oranges: 1
Apples cmp Oranges: -1
Pears cmp Pears: 0
This works because (X or Y or Z) = X (if X != 0), or Y (if X == 0 and Y != 0), or Z (if X == 0 and Y == 0)
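A minimal sketch of sorting with explicit comparisons. The first sort uses <=> for numeric order; the second chains comparisons with or, as described above. The records in @feat are made up for illustration.

```perl
use strict;
use warnings;

my @y = sort { $a <=> $b } (5, 16, -1, 10, 2, 1);    # numeric order
print "y: @y\n";

# Made-up records: [name, start, end]
my @feat = (["chr2", 50, 90], ["chr1", 70, 80], ["chr1", 10, 99], ["chr1", 10, 20]);

# Compare by name first; "or" moves on to start, then end, only on a tie (0)
my @sorted = sort {
    $a->[0] cmp $b->[0] or $a->[1] <=> $b->[1] or $a->[2] <=> $b->[2]
} @feat;

print join(" ", @$_), "\n" foreach @sorted;
```

Inside the sort block, $a and $b are the two elements being compared; the block returns a negative number, zero or a positive number, just like cmp and <=>.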
Packages
Perl allows you to organise your subroutines in packages each with its own namespace
use PackageName;    # this line includes a file called "PackageName.pm" in your code
PackageName::doSomething();
Perl looks for the packages in a list of directories specified by the array @INC
print "INC dirs: @INC\n";

INC dirs: Perl/lib Perl/site/lib .

The "." means the current directory (the one the script is run from)
Object-oriented programming
Data structures are often associated with code
FASTA: read_FASTA print_seq revcomp ...
GFF: read_GFF write_GFF ...
Expression data: read_expr mean_sd ...
Object-oriented programming makes this association explicit.
A type of data structure, with an associated set of subroutines, is called a class
The subroutines themselves are called methods
A particular instance of the class is an object
OOP concepts
Abstraction
represent the essentials, hide the details
Encapsulation
storing data and subroutines in a single unit hiding private data (sometimes all data, via accessors)
Inheritance
abstract base interfaces multiple derived classes
Polymorphism
different derived classes exhibit different behaviors in response to the same requests
OOP: Analogy
o Messages (the words in the speech balloons, and also perhaps the coffee itself)
o Overloading (Waiter's response to "A coffee", different response to "A black coffee")
o Polymorphism (Waiter and Kitchen implement "A black coffee" differently)
o Encapsulation (Customer doesn't need to know about Kitchen)
o Inheritance (not exactly used here, except implicitly: all types of coffee can be drunk or spilled, all humans can speak basic English and hold cups of coffee, etc.)
o Various OOP Design Patterns: the Waiter is an Adapter and/or a Bridge, the Kitchen is a Factory (and perhaps the Waiter is too), asking for coffee is a Factory Method, etc.
OOP: Advantages
Often more intuitive
Data has behavior
Modularity
Interfaces are well-defined Implementation details are hidden
Maintainability
Easier to debug, extend
OOP: Jargon
Member, method
A variable/subroutine associated with a particular class
Overriding
When a derived class implements a method differently from its parent class
Constructor, destructor
Methods called when an object is created/destroyed
Accessor
A method that provides [partial] access to hidden data
Factory
An [abstract] object that creates other objects
Singleton
A class which is only ever instantiated once (i.e. there's only ever one object of this class)
C.f. static member variables, which occur once per class
Objects in Perl
An object in Perl is usually a reference to a hash
The method subroutines for an object are found in a class-specific package
Command bless $x, MyPackage associates variable $x with package MyPackage
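A minimal sketch of a class built this way. The package name Sequence and its methods new, name and len are hypothetical, chosen only to illustrate bless.

```perl
use strict;
use warnings;

package Sequence;    # hypothetical class, for illustration only

# Constructor: the object is a reference to a hash, blessed into the class
sub new {
    my ($class, %args) = @_;
    my $self = { name => $args{name}, seq => $args{seq} };
    bless $self, $class;
    return $self;
}

# Accessor method
sub name { my ($self) = @_; return $self->{name}; }

# Method: length of the stored sequence
sub len { my ($self) = @_; return length $self->{seq}; }

package main;

my $obj = Sequence->new(name => "YBR007C", seq => "TAATAAAAAACGCGTTGTCG");
print $obj->name, " has length ", $obj->len, "\n";
print "Object belongs to class ", ref($obj), "\n";
```

The arrow syntax $obj->len looks up len in the package that $obj was blessed into, and passes $obj as the first argument.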
AUTOLOAD
When an undefined method is called on an object, the special method AUTOLOAD is called, if defined
Special variable $AUTOLOAD contains function name
Allows implementation of e.g. default accessors for hash elements
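A minimal sketch of default accessors via AUTOLOAD. The class Gene and its fields are hypothetical; the choice to die on an unknown field is a design decision made for this example.

```perl
use strict;
use warnings;

package Gene;        # hypothetical class, for illustration only

our $AUTOLOAD;

sub new {
    my ($class, %fields) = @_;
    my $self = { %fields };
    return bless $self, $class;
}

# Called for any method that is not defined: treat it as a hash accessor
sub AUTOLOAD {
    my ($self) = @_;
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;                 # strip the package name
    return if $name eq 'DESTROY';      # ignore object destruction
    die "No such field: $name\n" unless exists $self->{$name};
    return $self->{$name};
}

package main;

my $gene = Gene->new(name => "Cop", compartment => "Golgi");
print $gene->name, " is in the ", $gene->compartment, "\n";
```

The DESTROY check matters because Perl calls DESTROY when an object goes out of scope, and without a real DESTROY method that call also lands in AUTOLOAD.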
GD.pm
A graphics package by Lincoln Stein
use GD;
# create a new image
$im = new GD::Image(100,100);
# allocate some colors
$white = $im->colorAllocate(255,255,255);
$black = $im->colorAllocate(0,0,0);
$red = $im->colorAllocate(255,0,0);
$blue = $im->colorAllocate(0,0,255);
# make the background transparent
$im->transparent($white);
# Put a black frame around the picture
$im->rectangle(0,0,99,99,$black);
# Draw a blue oval
$im->arc(50,50,95,75,0,360,$blue);
# And fill it with red
$im->fill(50,50,$red);
# Convert the image to PNG and print it out
print $im->png;
CGI.pm
CGI (Common Gateway Interface)
Page-based web programming paradigm
BioPerl
A set of Open Source Bioinformatics packages
largely object-oriented Can be downloaded from bio.perl.org
Handles various different file formats
Parses the output of BLAST and other programs
Basis for Ensembl, the human genome annotation project (www.ensembl.org)
Example: GenBank
Example: Bio::DB::GenBank
Interface to the GenBank database
use Bio::DB::GenBank;
$gb = new Bio::DB::GenBank;
$seq = $gb->get_Seq_by_id('MUSIGHBA1');         # Unique ID
# or ...
$seq = $gb->get_Seq_by_acc('J00522');           # Accession Number
$seq = $gb->get_Seq_by_version('J00522.1');     # Accession.version
$seq = $gb->get_Seq_by_gi('405830');            # GI Number
Digest::MD5
MD5 is a one-way hash function e.g. gravatar.com uses MD5 to map (authenticated) email addresses to avatar icons
use Digest::MD5 qw(md5 md5_hex md5_base64);
my $baseURL = "http://www.gravatar.com/avatar/";
while (<>) {
    chomp;
    print $baseURL, md5_hex(lc($_)), "\n";
}
Functional languages
More mathematical, cleaner; but less pragmatic
Lisp, Scheme
Lisp is the oldest. (Lots (of (parentheses)))
Co-ordinate transformation
Motivation: map clones to chromosomes
[Figure: clones aligned to a chromosome; labels: chromosome coordinates 17455 and 17855, clone coordinates 403 and 803]