Beginning Perl For Bioinformatics-RVS
By Raghvendra Sachan
CONTENTS
1.0 INTRODUCTION TO PERL
1.1 PERL FACTS
1.2 WHY PERL?
2.0 HISTORY OF PERL
3.0 BIOINFORMATICS (GENERAL VIEW)
4.0 BIOINFORMATICS USING PERL
4.1 PROGRAMMING CONCEPTS
4.2 VARIABLE
4.3 STRING OPERATION
5.0 PERL PROGRAMS
5.1 TO FIND OUT THE FIRST ORF IN THE GIVEN AMINO ACID SEQUENCE
5.2 TO FIND OUT 6 ORF’s IN THE GIVEN DNA SEQUENCE
5.3 TO DETERMINE THE INFORMATION ABOUT 20 AMINO ACIDS
5.4 TO DETERMINE THE INFORMATION ABOUT NUCLEOTIDES
5.5 TO DETERMINE THE MOLECULAR WEIGHT OF THE AMINO ACIDS SEQUENCE
5.6 TO DETERMINE MOLECULAR FORMULA OF THE AMINO ACIDS SEQUENCE
5.7 TO FIND THE REVERSE AND COMPLEMENTARY SEQUENCE
5.8 TO IDENTIFY THE NUMBER OF NUCLEOTIDES IN THE SEQUENCE
5.9 TO IDENTIFY THE NUMBER OF NUCLEOTIDES AND LENGTH IN THE SEQUENCE
5.10 TO DETERMINE MOL. WT. OF THE DNA SEQ. USING FILE HANDLING
6.0 APPENDIX
6.1 WHAT IS PERL?
6.2 VARIABLE & DATA TYPES
6.3 QUOTES AND STRINGS
6.4 OPERATORS
6.5 TESTING
6.6 BOOLEAN EXPRESSIONS
6.7 INPUT PERL FUNCTIONS
7.0 CONCLUSION
1.0 Introduction to Perl
Perl is an interpreted language optimized for scanning arbitrary text files, extracting
information from those text files, and printing reports based on that information. It's
also a good language for many system management tasks. The language is intended to
be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant,
minimal). It combines (in the author's opinion, anyway) some of the best features of
C, sed, awk, and sh, so people familiar with those languages should have little difficulty
with it. (Language historians will also note some vestiges of csh, Pascal, and even
BASIC-PLUS.) Expression syntax corresponds quite closely to C expression syntax.
http://www.activestate.com/Products/ActivePerl/
ActivePerl is the officially blessed version of Perl for Windows. It is released by
ActiveState. ActivePerl can be downloaded for free, or we can order the ActiveCD from
them. It comes with a wealth of widely used third-party libraries such as Tk, LWP, and
the XML bundle.
Whatever operating system we are on, Perl is a valid choice, especially on a UNIX-based
operating system such as Linux, FreeBSD, or Mac OS X.
The official documentation system for Perl is POD, or "Plain Old Documentation". It is
powerful and widely used.
1.1 Perl Facts
• Perl is a stable, cross platform programming language.
• It is used for mission critical projects in the public and private sectors.
• Perl is Open Source software, licensed under its Artistic License, or the GNU
General Public License (GPL).
• Perl was created by Larry Wall.
• Perl 1.0 was released to Usenet's alt.comp.sources in 1987.
• PC Magazine named Perl a finalist for its 1998 Technical Excellence Award in the
Development Tool category.
• The Perl interpreter can be embedded into other systems.
2.0 HISTORY OF PERL
-- Larry Wall when asked if he learned Perl from the perl source
PERL 1.000
Perl 1.000 is unleashed upon the world. Some people take Perl's birthday seriously:
behold as Randal sings Happy Birthday to Larry's answering machine. The description
from the original man page sums up this new language well. (18 December 1987)
PERL 2.000
Perl 2.000 is released. (5 June 1988) Some of the enhancements over Perl 1 included:
• New regexp routines derived from Henry Spencer's.
• Support for /(foo|bar)/.
• Support for /(foo)*/ and /(foo)+/.
• \s for whitespace, \S for non-, \d for digit, \D nondigit
PERL 3.000
Perl 3.000 is released and is distributed by Larry for the first time under the terms of the
GNU General Public License. A few of the new features: (18 October 1989)
• Perl can now handle binary data correctly and has functions to pack and unpack
binary structures into arrays or lists. You can now do arbitrary ioctl functions.
• You can now pass things to subroutines by reference.
• Debugger enhancements.
PERL 4.000
Perl 4.000 is released and includes the Artistic License as well as the GPL. (21 March 1991)
Linus Torvalds releases the first version of Linux. Linus had wanted to name it Freax
(free + freak + unix) but the site administrator liked Linux better. It was distributed under
the GNU General Public License. (July)
PERL 5.000
The much anticipated Perl 5.000 is unveiled. It was a complete rewrite of Perl.
A few of the features and pitfalls are: (18 October 1994)
• Objects.
• The documentation is much more extensive and perldoc along with pod is
introduced.
• Lexical scoping available via my. eval can see the current lexical variables.
• The preferred package delimiter is now :: rather than '.
• New functions include: abs(), chr(), uc(), ucfirst(), lc(), lcfirst(),
chomp(), glob()
• There is now an English module that provides human readable translations for
cryptic variable names.
• Several previously added features have been subsumed under the new keywords use
and no.
• Pattern matches may now be followed by an m or s modifier to explicitly request
multiline or singleline semantics. An s modifier makes . match newline.
• @ now always interpolates an array in double-quotish strings. Some programs may
now need to use backslash to protect any @ that shouldn't interpolate.
• It is no longer syntactically legal to use whitespace as the name of a variable, or as a
delimiter for any kind of quote construct.
• The -w switch is much more informative.
• => is now a synonym for comma. This is useful as documentation for arguments that
come in pairs, such as initializers for associative arrays, or named arguments to a
subroutine.
Perl 5.001 is released. (13 March 1995)
Perl 5.002 is announced, introducing, among other things, subroutine prototypes and
sysopen(). (29 February 1996)
using techniques and concepts from informatics, statistics, mathematics, chemistry,
biochemistry, physics, and linguistics. It has many practical applications in different
areas of biology and medicine.
4.0 Bioinformatics using Perl
Bioinformatics, the use of computers in biology research, has been increasing in
importance during the past decade as the Human Genome Project went from its
beginning to the announcement last year of a "draft" of the complete sequence of human
DNA.
The importance of programming in biology stretches back before the previous decade,
and it certainly has a significant future now that it is a recognized part of research into
many areas of medicine and basic biological research. This may not be news to
biologists, but Perl programmers may be surprised to find that their handsome language
has become one of the most popular computer languages used in bioinformatics, if not
the most popular.
4.1 Programming Concepts
• Program = a text file that contains instructions for the computer to follow
• Programming Language = a set of commands that the computer understands
(via a “command interpreter”)
• Input = data that is given to the program
• Output = something that is produced by the program
• Programming
• Write the program (with a text editor)
• Run the program
• Look at the output
• Correct the errors (debugging)
• Repeat
• (computers are VERY dumb - they do exactly what you tell them to do, so be
careful what you ask for…)
• String
• Text is handled in Perl as a string
• This basically means that you have to put quotes around any piece of text that is
not an actual Perl instruction.
• Perl has two kinds of quotes - single ' ' and double " "
• (they are different - more about this later)
• Print
• Perl uses the term "print" to create output
• Without a print statement, you won't know what your program has done
• You need to tell Perl to put a newline at the end of a printed line
o Use the "\n" (newline) escape sequence
o Include the quotes
o The "\" character is called an escape - Perl uses it a lot
• Numbers and Functions
• Perl handles numbers in most common formats:
• 456
• 5.6743
• 6.3E-26
• Mathematical functions work pretty much as you would expect:
• 4+7, 6*4, 43-27, 256/12, 2/(3-5)
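The print and arithmetic points above can be tried out with a short script. This is only a minimal sketch: the array name @results and the use of strict and warnings are additions for illustration and do not come from the programs in this report.

```perl
use strict;
use warnings;

# Without "\n" the next print continues on the same line.
print "Hello, ";
print "bioinformatics!\n";

# The arithmetic expressions listed above, evaluated and stored in an array.
my @results = (4 + 7, 6 * 4, 43 - 27, 256 / 12, 2 / (3 - 5));

# Print one result per line.
print "$_\n" for @results;
```

Each expression is evaluated before the list is stored, so the parentheses in 2/(3-5) decide what is divided by what.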
4.2 Variable
• To be useful at all, a program needs to be able to store information from one
line to the next
• Perl stores information in variables
• A variable name starts with the "$" symbol, and it can store strings or
numbers
o Variables are case sensitive
o Give them sensible names
• Use the "=" sign to assign values to variables
• $a = 100;
• $s = "ttattagcc";
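A small sketch of these rules follows. The variable names $count, $seq and $Seq are hypothetical, and the my declarations (required under use strict) are an addition that the programs in this report do not use.

```perl
use strict;
use warnings;

my $count = 100;            # a number
my $seq   = "ttattagcc";    # a string
my $Seq   = "GGCTA";        # a different variable: names are case sensitive

$count = $count + 1;        # a stored value can be read back and updated

print "count = $count\n";
print "seq = $seq, Seq = $Seq\n";
```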
4.3 String operation
• Strings (text) in variables can be used for some math-like operations
• Concatenate (join) use the dot . operator
• $seq1 = "ACTG";
• $seq2 = "GGCTA";
• $seq3 = $seq1 . $seq2;
• print $seq3;
ACTGGGCTA
• String comparison (are they the same, > or <)
eq (equal )
ne (not equal )
ge (greater or equal )
gt (greater than )
lt (less than )
le (less or equal )
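The concatenation and comparison operators above can be sketched as follows, reusing the $seq1 and $seq2 values from the example. The result variables ($same, $differs, $before) are hypothetical names introduced only for this illustration.

```perl
use strict;
use warnings;

my $seq1 = "ACTG";
my $seq2 = "GGCTA";
my $seq3 = $seq1 . $seq2;                       # concatenation

# String comparisons work on character order, not on numeric value.
my $same    = ($seq1 eq $seq2) ? "yes" : "no";
my $differs = ($seq1 ne $seq2) ? "yes" : "no";
my $before  = ($seq1 lt $seq2) ? "yes" : "no";  # "A" sorts before "G"

print "$seq3\n";
print "same: $same, differs: $differs, seq1 sorts first: $before\n";
```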
$g='';
while ($c<$len){
$b=substr($a,$c,3);
if ($b=~ /AUG/)
{
$g=$g.'M';
}
if ($b=~ /(UUA)|(UUG)|(CUU)|(CUC)|(CUA)|(CUG)/)
{
$g=$g.'L';
}
if ($b=~ /(UCU)|(UCC)|(UCA)|(UCG)|(AGU)|(AGC)/)
{
$g=$g.'S';
}
if ($b=~ /(AUU)|(AUC)|(AUA)/)
{
$g=$g.'I';
}
if ($b=~ /(UUU)|(UUC)/)
{
$g=$g.'F';
}
if ($b=~ /(GUU)|(GUC)|(GUA)|(GUG)/)
{
$g=$g.'V';
}
if ($b=~ /(CCU)|(CCC)|(CCA)|(CCG)/)
{
$g=$g.'P';
}
if ($b=~ /(ACU)|(ACC)|(ACA)|(ACG)/)
{
$g=$g.'T';
}
if ($b=~ /(GCU)|(GCC)|(GCA)|(GCG)/)
{
$g=$g.'A';
}
if ($b=~ /(UAU)|(UAC)/)
{
$g=$g.'Y';
}
if ($b=~ /(UGU)|(UGC)/)
{
$g=$g.'C';
}
if ($b=~ /UGG/)
{
$g=$g.'W';
}
if ($b=~ /(CAU)|(CAC)/)
{
$g=$g.'H';
}
if ($b=~ /(CAA)|(CAG)/)
{
$g=$g.'Q';
}
if ($b=~ /(CGU)|(CGC)|(CGA)|(CGG)|(AGA)|(AGG)/)
{
$g=$g.'R';
}
if ($b=~ /(AAU)|(AAC)/)
{
$g=$g.'N';
}
if ($b=~ /(AAA)|(AAG)/)
{
$g=$g.'K';
}
if ($b=~ /(GGU)|(GGC)|(GGA)|(GGG)/)
{
$g=$g.'G';
}
if ($b=~ /(GAA)|(GAG)/)
{
$g=$g.'E';
}
if ($b=~ /(GAU)|(GAC)/)
{
$g=$g.'D';
}
if ($b=~ /(UAA)|(UAG)|(UGA)/)
{
$g=$g.'#';
}
$c=$c+3;
}
RESULT
ENTER THE m-RNA SEQUENCE
AUCGAUCGAUGC
THE LENGTH OF DNA SEQUENCE IS 12
THE AMINO ACID IN THE SEQUENCE IN THE 1ST ORF IS IDRC
COMMENT
AIM: TO FIND OUT THE FIRST ORF IN THE GIVEN AMINO ACID SEQUENCE.
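The same codon-by-codon translation can also be written with a lookup table instead of a chain of if statements. The following is a sketch of that alternative, not the program used above: the names %codon_table and translate are hypothetical, the table is the standard genetic code, stop codons are written as '*' rather than '#', and the // operator assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Build the standard genetic code as a hash: codon => one-letter amino acid.
# The amino acid string follows the usual U, C, A, G ordering of the bases.
my @bases = qw(U C A G);
my $aa    = 'FFLLSSSSYY**CC*W' . 'LLLLPPPPHHQQRRRR'
          . 'IIIMTTTTNNKKSSRR' . 'VVVVAAAADDEEGGGG';
my %codon_table;
my $n = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $codon_table{"$b1$b2$b3"} = substr($aa, $n++, 1);
        }
    }
}

# Translate an mRNA string codon by codon, starting at the first base.
sub translate {
    my ($mrna) = @_;
    my $protein = '';
    for (my $i = 0; $i + 3 <= length($mrna); $i += 3) {
        my $codon = uc substr($mrna, $i, 3);
        $protein .= $codon_table{$codon} // '?';   # '?' marks an unknown codon
    }
    return $protein;
}

print translate('AUCGAUCGAUGC'), "\n";   # the input used in the RESULT above
```

A single hash lookup per codon avoids the mistyped-codon errors that are easy to make in a long list of patterns, and an incomplete trailing codon is simply skipped.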
if ($seq=~/GG./i) {$p.='G';}
if ($seq=~/CA[UC]/i) {$p.='H';}
if ($seq=~/AU[UCA]/i) {$p.='I';}
if ($seq=~/AA[AG]/i) {$p.='K';}
if ($seq=~/UU[AG]/i) {$p.='L';}
if ($seq=~/AUG/i) {$p.='M';}
if ($seq=~/AA[UC]/i) {$p.='N';}
if ($seq=~/CC./i) {$p.='P';}
if ($seq=~/CA[AG]/i) {$p.='Q';}
if ($seq=~/CG.|AG[AG]/i){$p.='R';}
if ($seq=~/UC.|AG[UC]/i){$p.='S';}
if ($seq=~/AC./i) {$p.='T';}
if ($seq=~/GU./i) {$p.='V';}
if ($seq=~/UGG/i) {$p.='W';}
if ($seq=~/UA[UC]/i) {$p.='Y';}
if ($seq=~/CU./i) {$p.='L';}
if ($seq=~/UA[AG]|UGA/i){$p.='*';}
$i=$i+3;
}
return $p;
}
print"\nFIRST READING FRAME ";
$q=dna();
print": $q\n";
print"\nSECOND READING FRAME ";
$dna=substr($dna,1,$len);
$p=dna();
print": $p\n";
print"\nTHIRD READING FRAME ";
$dna=substr($dna,1,$len);
$x=dna();
print": $x\n";
$rev=reverse($dna1);
$rev=~ tr/ACTG/UGAC/;
print "\nREVERSE mRNA : $rev\n ";
$dna=$rev;
print"\nFOURTH READING FRAME ";
$q1=dna();
print": $q1\n";
print"\nFIFTH READING FRAME ";
$dna=substr($dna,1,$len);
$p1=dna();
print": $p1\n";
print"\nSIXTH READING FRAME ";
$dna=substr($dna,1,$len);
$x1=dna();
print": $x1\n";
RESULT
ENTER THE DNA SEQUENCE
ATGCGTGACATG
mRNA : UACGCACUGUAC
LENGTH 12
FIRST READING FRAME : YALY
SECOND READING FRAME : THC
THIRD READING FRAME : RTV
REVERSE mRNA : CAUGUCACGCAU
FOURTH READING FRAME : HVTH
FIFTH READING FRAME : MSR
SIXTH READING FRAME : CHA
COMMENT
AIM: TO FIND OUT 6 ORF’s IN THE GIVEN DNA SEQUENCE.
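The frame shifting done above with repeated substr calls can be sketched as one small helper that lists the codons of any frame. The name codons_in_frame is hypothetical; the mRNA string is the one printed in the RESULT, and the reversed strand is obtained the same way the RESULT shows it, by reversing the mRNA.

```perl
use strict;
use warnings;

# Return the complete codons of $seq read from offset $frame (0, 1 or 2).
# An incomplete codon at the end is dropped.
sub codons_in_frame {
    my ($seq, $frame) = @_;
    return substr($seq, $frame) =~ /(.{3})/g;
}

my $mrna = 'UACGCACUGUAC';
my $rev  = reverse $mrna;    # scalar assignment, so the string itself is reversed

for my $frame (0 .. 2) {
    print "frame ", $frame + 1, ": ", join(' ', codons_in_frame($mrna, $frame)), "\n";
}
for my $frame (0 .. 2) {
    print "frame ", $frame + 4, ": ", join(' ', codons_in_frame($rev, $frame)), "\n";
}
```

Because each frame is taken from the unmodified string, no frame depends on the state left behind by the previous one.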
do{
print"*" x 50;
print "\nEnter E for ESSENTIAL AMINO ACIDS\n";
print "Enter N for NONESSENTIALS\n";
print"*" x 50;
$a=<stdin>;
chomp($a);
if ($a eq 'E')
{
print "Isoleucine(I)\n
Leucine(L)\n
Lysine(K)\n
Methionine(M)\n
Phenylalanine(F)\n
Threonine(T)\n
Tryptophan(W)\n
Valine(V)\n
Arginine(R)\n
Histidine(H)\n";
}
if($a eq 'N')
{
print "Alanine(A)\n
Asparagine(N)\n
Aspartate(D)\n
Cysteine(C)\n
Glutamate(E)\n
Glutamine(Q)\n
Glycine(G)\n
Proline(P)\n
Serine(S)\n
Tyrosine(Y)\n";
}
$b= <stdin>;
chomp($b);
if ($b eq 'I')
{
print "Isoleucine\n
Chemical formula: C6H13NO2\n
Molecular mass: 131.18 g•mol-1\n
Systematic name:\n
(2S,3S)-2-amino-3-methylpentanoic acid\n
Abbreviations: I, Ile\n
Synonyms:\n
{2/α}-amino-{3/β}-methylvaleric acid\n
3-methyl-{/erythro-}norvaline\n
Amino-sec-butyl-acetic acid\n
Amino(1-methylpropyl)-acetic acid\n";
}
if ($b eq 'L')
{
print"Leucine\n
Chemical formula: C6H13NO2\n
Molecular mass: 131.18 g•mol-1\n
Systematic name:\n
(S)-2-amino-4-methyl-pentanoic acid\n
Abbreviations: L, Leu\n
Synonyms:\n
{(S)-/L-}2-amino-4-methylvaleric acid\n
4-methyl-norvaline\n
α-aminoisocaproic acid\n";
}
if ($b eq 'K')
{
print"Lysine\n
Systematic name (S)-2,6-Diaminohexanoic acid\n
Abbreviations Lys,K\n
Chemical formula C6H14N2O2\n
Molecular mass 146.19 g/mol\n
PubChem 876\n
Melting point 224 °C\n";
}
if ($b eq 'M')
{
print"Methionine\n
Systematic name (S)-2-amino-4-(methylsulfanyl)-\n
butanoic acid\n
Abbreviations Met,M\n
Chemical formula C5H11NO2S\n
Molecular mass 149.21 g mol-1\n
Melting point 281 °C\n";
}
if ($b eq 'F')
{
print "Phenylalanine\n
Systematic name 2-Amino-3-phenyl-propanoic acid\n
Abbreviations Phe,F\n
Chemical formula C9H11NO2\n
Molecular mass 165.19 g mol-1\n
Melting point 283 °C\n";
}
if ($b eq 'T')
{
print" Threonine\n
Systematic name (2S,3R)-2-Amino-3-hydroxybutanoic acid\n
Abbreviations Thr,T\n
Chemical formula C4H9NO3\n
Molecular mass 119.12 g mol-1\n
Melting point 256 °C\n";
}
if ($b eq 'W')
{
print" Tryptophan\n
Systematic name (S)-2-Amino-3-(1H-indol-3-yl)-propionic acid\n
Abbreviations Trp,W\n
Chemical formula C11H12N2O2\n
Molecular mass 204.23 g mol−1\n
Melting point 289 °C\n";
}
if ($b eq 'V')
{
print" Valine\n
Systematic name (S)-2-amino-3-methyl-butanoic acid\n
Abbreviations Val,V\n
Chemical formula C5H11NO2\n
Molecular mass 117.15 g mol-1\n
Melting point 315 °C\n";
}
if ($b eq 'R')
{
print"Arginine\n
amino)pentanoic acid\n
Chemical data\n
Formula C6H14N4O2\n
Mol. weight 174.2\n";
}
if ($b eq 'H')
{
print" Histidine\n
Systematic (IUPAC) name\n
2-amino-3-(3H-imidazol-4-yl)propanoic acid\n
Chemical data\n
Formula C6H9N3O2\n
Mol. weight 155.16\n";
}
if ($b eq 'A')
{
print" Alanine\n
Systematic (IUPAC) name\n
(S)-2-aminopropanoic acid\n
Chemical data\n
Formula C3H7NO2\n
Mol. weight 89.1\n";
}
if ($b eq 'N')
{
print"Asparagine\n
Systematic (IUPAC) name\n
(2S)-2-amino-3-carbamoyl-propanoic acid\n
Chemical data\n
Formula C4H8N2O3\n
Mol. weight 132.118\n";
}
if ($b eq 'C')
{
print "Cysteine\n
Systematic (IUPAC) name\n
(2R)-2-amino-3-sulfanyl-propanoic acid\n
Chemical data\n
Formula C3H7NO2S\n
Mol. weight 121.16\n";
}
if ($b eq 'D')
{
print"Aspartic acid\n
Systematic (IUPAC) name\n
(2S)-2-aminobutanedioic acid\n
Chemical data\n
Formula C4H7NO4\n
Mol. weight 133.10\n";
}
if ($b eq 'E')
{
print"Glutamic acid\n
Systematic (IUPAC) name\n
(2S)-2-aminopentanedioic acid\n
Chemical data\n
Formula C5H9NO4\n
Mol. weight 147.13\n";
}
if ($b eq 'Q')
{
print" Glutamine\n
Systematic (IUPAC) name\n
(2S)-2-amino-4-carbamoyl-butanoic acid\n
Chemical data\n
Formula C5H10N2O3\n
Mol. weight 146.15\n";
}
if ($b eq 'G')
{
print" Glycine\n
Molecular mass 105.09 g mol-1\n
Melting point 228 °C \n";
}
if ($b eq 'Y')
{
print"Tyrosine\n
Systematic name (S)-2-Amino-3-(4-hydroxy-phenyl)-propanoic acid\n
Abbreviations Tyr,Y\n
Chemical formula C9H11NO3\n
Molecular mass 181.19 g mol-1\n
Melting point 343 °C\n";
}
print "\nEnter Again press Y";
$y=<stdin>;
chomp($y);
}
while($y eq 'Y');
RESULT
ENTER E FOR ESSENTIAL AMINO ACIDS
ENTER N FOR NON ESSENTIALS
E
Synonyms:
{2/α}-amino-{3/β}-methylvaleric acid
3-methyl-{/erythro-}norvaline
Amino-sec-butyl-acetic acid
Amino(1-methylpropyl)-acetic acid
To Start Again Press Y
COMMENT
AIM: TO DETERMINE THE INFORMATION ABOUT 20 AMINO ACIDS.
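A long series of if blocks like the one above can also be expressed as a hash of records keyed by the one-letter code. The sketch below is an alternative layout, not the program used above: %amino_acid and describe are hypothetical names, and only three entries are shown, with values copied from the listing.

```perl
use strict;
use warnings;

# One-letter code => record. Only three entries are shown here.
my %amino_acid = (
    I => { name => 'Isoleucine', formula => 'C6H13NO2',  mass => 131.18 },
    L => { name => 'Leucine',    formula => 'C6H13NO2',  mass => 131.18 },
    K => { name => 'Lysine',     formula => 'C6H14N2O2', mass => 146.19 },
);

# Build the text for one amino acid, or a message for an unknown code.
sub describe {
    my ($code) = @_;
    my $rec = $amino_acid{ uc $code }
        or return "No entry for '$code'\n";
    return "$rec->{name}\nChemical formula: $rec->{formula}\n"
         . "Molecular mass: $rec->{mass} g/mol\n";
}

print describe('K');
print describe('Z');
```

With this layout each code can appear only once, so two amino acids cannot accidentally share the same test letter.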
print"*" x 30;
print "\nEnter 1 for ADENINE\n";
print "Enter 2 for GUANINE\n";
print "Enter 3 for THYMINE\n";
print "Enter 4 for CYTOSINE\n";
print "ENTER 5 for URACIL\n";
print "ENTER YOUR CHOICE\n";
$a =<stdin>;
if($a==1)
{
print "ADENINE\n
Systematic (IUPAC) name 7H-purin-6-amine\n
Synonyms 6-aminopurine\n
Identifiers CAS number 73-24-5 PubChem 190\n
Chemical data\n
Formula C5H5N5\n
Mol. weight 135.127\n
SMILES NC1=NC=NC2=C1N=CN2\n
Physical data\n
Melt. point\n
360 - 365 °C (-265 °F)\n";
}
if ($a==2)
{
print "GUANINE\n
Systematic name 2-amino-1H-purin-6(9H)-one\n
Other names 2-amino-6-oxo-purine,2-aminohypoxanthine\n
Molecular formula C5H5N5O\n
SMILES NC(NC1=O)=NC2=C1N=CN2\n
Molar mass 151.1261 g/mol\n
Appearance White amorphous solid\n
CAS number [73-40-5]\n
Melting point 360°C (633.15 K) deco.\n
Boiling point Sublimes\n";
}
if ($a==3)
{
print "THYMINE\n
Chemical name 5-Methylpyrimidine-2,4(1H,3H)-dione\n
Chemical formula C5H6N2O2\n
Molecular mass 126.11334 g/mol\n
Melting point 316 - 317 °C\n
CAS number 65-71-4\n
SMILES CC1=CNC(NC1=O)=O\n";
}
if ($a==4)
{
print "CYTOSINE\n
Chemical name 4-Aminopyrimidin-2(1H)-one\n
Chemical formula C4H5N3O\n
Molecular mass 111.102 g/mol\n
Melting point 320 - 325°C (decomp)\n
CAS number 71-30-7\n
SMILES NC1=NC(NC=C1)=O\n";
}
if ($a==5)
{
print "URACIL\n
Systematic name Pyrimidine-2,4(1H,3H)-dione\n
Other names Uracil, 2-oxy-4-oxy pyrimidine\n
Molecular formula C4H4N2O2\n
Molar mass 112.08676 g/mol\n
Appearance Solid\n
CAS number [66-22-8]\n
Melting point 335 °C (608 K)\n
Boiling point N/A\n
Acidity (pKa) basic pKa = -3.4\n
acidic pKa = 9.389\n";
}
print "\nTo Start Again press Y";
$y=<stdin>;
chomp($y);
}
while($y eq 'Y');
RESULT
Enter 1 for ADENINE
Enter 2 for GUANINE
Enter 3 for THYMINE
Enter 4 for CYTOSINE
ENTER 5 for URACIL
ENTER YOUR CHOICE
1
ADENINE
Systematic (IUPAC) name 7H-purin-6-amine
Synonyms 6-aminopurine
Identifiers CAS number 73-24-5 PubChem 190
Chemical data
Formula C5H5N5
Mol. weight 135.127
SMILES NC1=NC=NC2=C1N=CN2
Physical data
Melt. point
360 - 365 °C (-265 °F)
To Start Again Press Y
COMMENT
AIM: TO DETERMINE THE INFORMATION ABOUT NUCLEOTIDES.
$b = $b+117.15;
}
if($i eq 'L'){
$b = $b+131.18;
}
if($i eq 'I'){
$b = $b+131.18;
}
if($i eq 'S'){
$b = $b+105.09;
}
if($i eq 'T'){
$b = $b+119.12;
}
if($i eq 'C'){
$b = $b+121.15;
}
if($i eq 'M'){
$b = $b+149.21;
}
if($i eq 'F'){
$b = $b+165.19;
}
if($i eq 'Y'){
$b = $b+181.19;
}
if($i eq 'W'){
$b = $b+204.23;
}
if($i eq 'P'){
$b = $b+115.13;
}
if($i eq 'N'){
$b = $b+132.12;
}
if($i eq 'Q'){
$b = $b+146.15;
}
if($i eq 'D'){
$b = $b+133.10;
}
if($i eq 'E'){
$b = $b+147.13;
}
if($i eq 'K'){
$b = $b+146.19;
}
if($i eq 'H'){
$b = $b+155.16;
}
if($i eq 'R'){
$b = $b+174.20;
}
}
$c=$b-(18*($x-1));
print "The MOLECULAR WEIGHT of the sequence is $c";
RESULT
ENTER THE AMINO ACID SEQUENCE
AVLIST
LENGTH:4
THE MOLECULAR WEIGHT OF THE SEQUENCE IS 414.16
COMMENT
AIM:TO DETERMINE THE MOLECULAR WEIGHT OF THE AMINO ACIDS
SEQUENCE.
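The weight calculation above (sum of the residue masses, minus 18 for each peptide bond) can be sketched with a hash in place of the if chain. The names %mass and peptide_weight are hypothetical. The masses for V through R are copied from the listing; the values for G (75.07) and A (89.09) are assumptions, since that part of the listing is not shown here.

```perl
use strict;
use warnings;

# Free amino acid masses in g/mol. G and A are assumed values.
my %mass = (
    G => 75.07,  A => 89.09,  V => 117.15, L => 131.18, I => 131.18,
    S => 105.09, T => 119.12, C => 121.15, M => 149.21, F => 165.19,
    Y => 181.19, W => 204.23, P => 115.13, N => 132.12, Q => 146.15,
    D => 133.10, E => 147.13, K => 146.19, H => 155.16, R => 174.20,
);

# Sum the masses and subtract one water (18, as above) per peptide bond.
sub peptide_weight {
    my ($seq) = @_;
    my $total = 0;
    my $n     = 0;
    for my $aa (split //, uc $seq) {
        die "Unknown residue '$aa'\n" unless exists $mass{$aa};
        $total += $mass{$aa};
        $n++;
    }
    return $n ? $total - 18 * ($n - 1) : 0;
}

printf "%.2f\n", peptide_weight('AVLI');
```

Unknown letters stop the script instead of being silently skipped, which keeps the number of subtracted waters consistent with the number of residues actually counted.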
if ($b eq 'G')
{
print " GLYCINE=C2H5NO2";
}
if ($b eq 'A')
{
print " ALANINE=C3H7NO2";
}
if ($b eq 'V')
{
print " VALINE=C5H11NO2";
}
if ($b eq 'L')
{
print " LEUCINE = C6H13NO2";
}
if ($b eq 'I')
{
print " ISOLEUCINE=C6H13NO2";
}
if ($b eq 'S')
{
print " SERINE = C3H7NO3";
}
if ($b eq 'T')
{
print " THREONINE = C4H9NO3";
}
if ($b eq 'C')
{
print " CYSTEINE = C3H7NO2S";
}
if ($b eq 'M')
{
print " METHIONINE = C5H11NO2S";
}
if ($b eq 'F')
{
print " PHENYLALANINE = C9H11NO2";
}
if ($b eq 'Y')
{
print " TYROSINE = C9H11NO3";
}
if ($b eq 'W')
{
print " TRYPTOPHAN = C11H12N2O2";
}
if ($b eq 'P')
{
print " PROLINE = C5H9NO2";
}
if ($b eq 'N')
{
print " ASPARAGINE = C4H8N2O3";
}
if ($b eq 'Q')
{
print " GLUTAMINE = C5H10N2O3";
}
if ($b eq 'D')
{
print " ASPARTIC ACID = C4H7NO4";
}
if ($b eq 'E')
{
print " GLUTAMIC ACID = C5H9NO4";
}
if ($b eq 'K')
{
print " LYSINE = C6H14N2O2";
}
if ($b eq 'H')
{
print " HISTIDINE = C6H9N3O2";
}
if ($b eq 'R')
{
print " ARGININE = C6H14N4O2";
}
RESULT
A
ALANINE= C3H7NO2
COMMENT
AIM:TO DETERMINE THE MOLECULAR FORMULA OF THE AMINO ACIDS
SEQUENCE.
$A =0;
$T =0;
$C =0;
$G =0;
foreach $i(@a){
if ($i eq 'A')
{
$A=$A+1;
}
if ($i eq 'T')
{
$T=$T+1;
}
if ($i eq 'C')
{
$C=$C+1;
}
if ($i eq 'G')
{
$G=$G+1;
}
}
print "Adenine = $A";
print "Cytosine= $C";
print "Guanine = $G";
print "Thymine = $T";
print "length= $l";
RESULT
ATCG
Adenine = 1 Cytosine= 1Guanine = 1Thymine1
COMMENT
AIM: TO IDENTIFY THE NUMBER OF NUCLEOTIDES IN THE SEQUENCE.
5.9 PROGRAM NO 9
$a=<stdin>;
chomp($a);
$l= length($a);
@a= split('',$a);
$A =0;
$T =0;
$C =0;
$G =0;
foreach $i(@a){
if ($i eq 'A')
{
$A=$A+1;
}
if ($i eq 'T')
{
$T=$T+1;
}
if ($i eq 'C')
{
$C=$C+1;
}
if ($i eq 'G')
{
$G=$G+1;
}
}
print "Adenine = $A\n";
print "Cytosine= $C\n";
print "Guanine = $G\n";
print "Thymine = $T\n";
print "length= $l\n";
RESULT
ATCG
Adenine = 1
Cytosine= 1
Guanine = 1
Thymine = 1
length= 4
COMMENT
AIM: TO IDENTIFY THE NUMBER OF NUCLEOTIDES AND THE LENGTH IN
THE SEQUENCE.
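The four counters in the program above can be folded into one hash. This is a sketch of that variation, with count_bases as a hypothetical name; it accepts lower-case input and ignores characters other than A, T, C and G, which the original program does not do.

```perl
use strict;
use warnings;

# Count each base in a DNA string and return the counts as a hash.
sub count_bases {
    my ($seq) = @_;
    my %count = (A => 0, T => 0, C => 0, G => 0);
    for my $base (split //, uc $seq) {
        $count{$base}++ if exists $count{$base};   # ignore other characters
    }
    return %count;
}

my $dna   = 'ATCG';
my %count = count_bases($dna);

print "Adenine = $count{A}\n";
print "Cytosine= $count{C}\n";
print "Guanine = $count{G}\n";
print "Thymine = $count{T}\n";
print "length= ", length($dna), "\n";
```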
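$file="D:/orf/as.txt";
open(DNA,$file) or die "Cannot open $file: $!";
@dna=<DNA>;
close(DNA);
$dna=join('',@dna);
$dna=~s/\s//g;
print "$dna\n";
# A: length of the sequence
sub A
{
$l= length($dna);
print"$l\n";
}
# B: reverse sequence
sub B
{
$rev= reverse ($dna);
print "$rev\n";
}
# C: complementary sequence
sub C
{
$comp= $dna;
$comp=~ tr/ATGC/TACG/;
print "$comp\n";
}
# D: reverse complementary sequence
sub D
{
$rc= reverse ($dna);
$rc=~ tr/ATGC/TACG/;
print "$rc\n";
}
# E: nucleotide counts and molecular weight
sub E
{
@a= split('',$dna);
$A =0;
$T =0;
$C =0;
$G =0;
foreach $i(@a){
if ($i eq 'A')
{
$A=$A+1;
}
if ($i eq 'T')
{
$T=$T+1;
}
if ($i eq 'C')
{
$C=$C+1;
}
if ($i eq 'G')
{
$G=$G+1;
}
}
print "ADENINE=$A\nTHYMINE=$T\nGUANINE=$G\nCYTOSINE=$C\n";
$e= ($A* 313.21) + ($T* 304.20) + ($G * 329.21) + ($C* 289.19) - (18.02);
print "$e\n";
}
do{
print"\nenter 1 for length";
print"\nenter 2 for reverse";
print"\nenter 3 for complementary";
print"\nenter 4 for reverse complementary";
print"\nenter 5 for molecular weight\n";
$a =<stdin>;
if($a==1)
{
A();
}
if($a==2)
{
B();
}
if($a==3)
{
C();
}
if($a==4)
{
D();
}
if($a==5)
{
E();
}
print "\nTo Start Again press Y";
$y=<stdin>;
chomp($y);
}
while($y eq 'Y');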
RESULT
ADENINE=2
THYMINE=2
GUANINE=2
CYTOSINE=2
COMMENT
AIM: TO DETERMINE THE LENGTH, REVERSE, COMPLEMENTARY, REVERSE
COMPLEMENTARY AND MOLECULAR WEIGHT OF THE GIVEN DNA
SEQUENCE USING FILE HANDLING.
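The file-reading and reverse-complement steps of this program can be sketched as two small subroutines. The names read_sequence and reverse_complement are hypothetical, and an in-memory filehandle stands in for the file on disk so that the sketch runs without D:/orf/as.txt; with a real file, the second argument of open would be '<' followed by the file name.

```perl
use strict;
use warnings;

# Read every line from a filehandle and return the sequence without whitespace.
sub read_sequence {
    my ($fh) = @_;
    my $seq = join '', <$fh>;
    $seq =~ s/\s//g;
    return uc $seq;
}

# Reverse the string, then swap each base for its partner.
sub reverse_complement {
    my ($dna) = @_;
    my $rc = reverse $dna;
    $rc =~ tr/ATGC/TACG/;
    return $rc;
}

# An in-memory filehandle is used here in place of a file on disk.
my $text = "ATGCGT\nGACATG\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
my $dna = read_sequence($fh);
close($fh);

print "sequence: $dna\n";
print "reverse complement: ", reverse_complement($dna), "\n";
```

Both subroutines return a new string and leave their argument untouched, so the original sequence stays available for the other menu options.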
6.0 APPENDIX
suited for tasks involving quick prototyping, system utilities, software tools, system
management tasks, database access, graphical programming, networking, and world wide
web programming. These strengths make it especially popular with system administrators
and CGI script authors, but mathematicians, geneticists, journalists, and even managers
also use Perl.
6.2 Variables & Data Types
A variable is a named location in memory that is used to hold data that may be modified
by the program. Perl has three schemes for keeping data during program execution: scalars,
arrays of scalars (also known as lists), and hashes. Arrays are groups of scalars indexed by
number, while hashes are indexed by strings.
Scalars
The most basic kind of data structure in Perl is the scalar variable. Scalar variables can
hold both strings and numbers.
$bodytemp = 98.6;
sets the scalar variable $bodytemp to 98.6, but you can also assign a
string to exactly the same variable:
$bodytemp = 'normal';
Perl will also accept numbers as strings,
$bodytemp = '098.6';
and still performs arithmetic and other operations on them.
Arrays
An array variable is a list of scalars, hence in Perl they are often referred to as lists. They
have the same format as scalar variables except that they are prefixed by an @ symbol.
The following statements:
@valine = ("gtg", "gtt", "gta", "gtc");
@hydrophobics = ("valine", "leucine","isoleucine");
@weights = (117.15, 131.17, 131.17);
assign a four element list to the array variable @valine and a three element list to the
array variables @hydrophobics and @weights.
Hashes
Basically, hashes are arrays which are accessed by a string. They are also referred to as
associative arrays. To define a hash we can use the usual parenthesis notation, but the
hash itself is prefixed by a % sign. Suppose we want to store all the hydrophobic amino
acids with their molecular weights in a single data structure. It would look like this:
%molyweights = ("valine", 117.15,
"leucine", 131.17,
"isoleucine", 131.17);
Assigning the hash to an array, as in @data = %molyweights; gives a list that has an
element for every key and every value in the hash %molyweights.
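The array and hash definitions above can be exercised with a few lines. This is a sketch that reuses @hydrophobics and %molyweights from the text; the my declarations and the loop are additions for illustration.

```perl
use strict;
use warnings;

my @hydrophobics = ("valine", "leucine", "isoleucine");
my %molyweights  = ("valine", 117.15,
                    "leucine", 131.17,
                    "isoleucine", 131.17);

print "first: $hydrophobics[0]\n";                # arrays are indexed from 0
print "count: ", scalar(@hydrophobics), "\n";     # number of elements
print "leucine weighs $molyweights{leucine}\n";   # hashes are indexed by string

# Keys come back in no fixed order, so sort them for stable output.
for my $name (sort keys %molyweights) {
    print "$name => $molyweights{$name}\n";
}

my @data = %molyweights;    # flattens the hash into key/value pairs
print scalar(@data), " elements\n";
```

Note the change of symbol when a single element is read: $hydrophobics[0] and $molyweights{leucine} both start with $ because one element is a scalar.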
6.3 Quotes & Strings
\t tab
\n newline
\b backspace
\a alarm (bell)
\$ literal $
\@ literal @
\\ literal backslash
(special characters)
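The difference between the two kinds of quotes shows up as soon as a string contains one of these special characters. A minimal sketch, with hypothetical variable names and a made-up address used only as sample text:

```perl
use strict;
use warnings;

my $seq = "ACTG";

my $double  = "Sequence:\t$seq\n";    # \t, \n and $seq are all interpreted
my $single  = 'Sequence:\t$seq\n';    # every character is kept literally
my $escaped = "Cost: \$5, mail: user\@example.org, path: C:\\perl\n";

print $double;
print $single, "\n";
print $escaped;
```

Inside double quotes, a backslash in front of $ or @ stops Perl from treating the following word as a variable name.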
6.4 Operators
Precedence
Use parentheses when in doubt.
Arithmetic Operators
Math in Perl
x**y exponentiation
-x negation
x/y division
x*y multiplication
x+y addition
x-y subtraction
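A few expressions make the precedence rules concrete. This is a sketch with hypothetical variable names; the comments state how Perl groups each expression.

```perl
use strict;
use warnings;

my $x = 2 + 3 * 4;       # multiplication before addition
my $y = (2 + 3) * 4;     # parentheses override the default order
my $z = -2 ** 2;         # ** binds tighter than unary minus: -(2 ** 2)
my $w = 2 ** 3 ** 2;     # ** groups from the right: 2 ** (3 ** 2)
my $v = 100 / 10 * 2;    # / and * share a level and group from the left

print "$x $y $z $w $v\n";
```

When an expression mixes several of these operators, adding parentheses costs nothing and removes any doubt about the grouping.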
6.5 Testing
Primarily for Numeric Comparison
== TRUE if the left argument is numerically equal to the
right argument; otherwise FALSE.
!= TRUE if the left argument is numerically not equal to
the right argument; otherwise FALSE
< TRUE if the left argument is numerically less than the
right argument; otherwise FALSE
> TRUE if the left argument is numerically greater than
the right argument; otherwise FALSE
<= TRUE if the left argument is numerically less than or
equal to the right argument; otherwise FALSE
>= TRUE if the left argument is numerically greater than
or equal to the right argument; otherwise FALSE
<=> returns -1, 0, or 1 depending on whether the left
argument is numerically less than, equal to, or greater
than the right argument
or equal to the right argument; otherwise FALSE
ge returns TRUE if the left argument is stringwise greater
than or equal to the right argument; otherwise FALSE
cmp returns -1, 0, or 1 depending on whether the left argument
is stringwise less than, equal to, or greater than the right
argument
!($a) Is $a FALSE ?
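The difference between numeric and string comparison is easiest to see when sorting. In this sketch the list @lengths and the other variable names are hypothetical; $a and $b inside the sort blocks are the special variables Perl provides for sorting.

```perl
use strict;
use warnings;

my @lengths   = (120, 9, 45);
my @by_number = sort { $a <=> $b } @lengths;   # numeric order
my @by_string = sort { $a cmp $b } @lengths;   # character-by-character order

print "numeric: @by_number\n";
print "string:  @by_string\n";

# == compares values, eq compares the text of the operands.
my $num_equal = (10 == 10.0)     ? "true" : "false";
my $str_equal = ('10' eq '10.0') ? "true" : "false";
print "10 == 10.0 is $num_equal, '10' eq '10.0' is $str_equal\n";

# Negation: !($flag) is true when $flag is false.
my $flag = 0;
print "flag is false\n" if !($flag);
```

As strings, "120" sorts before "45" because the comparison stops at the first differing character, "1" against "4".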
SYNOPSIS
length EXPR
reverse - reverse a string or a list
SYNOPSIS
reverse STRING
reverse LIST
substr - get or alter a portion of a string
SYNOPSIS
substr EXPR,OFFSET,LEN,REPLACEMENT
substr EXPR,OFFSET,LEN
substr EXPR,OFFSET
index - left-to-right substring search
SYNOPSIS
index STR, SUBSTR, POSITION
index STR, SUBSTR
rindex - right-to-left substring search
SYNOPSIS
rindex STR,SUBSTR,POSITION
rindex STR,SUBSTR
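The string functions listed above can be tried on a short DNA string. A minimal sketch with hypothetical variable names; positions are counted from 0.

```perl
use strict;
use warnings;

my $dna = "ATGCGTGACATG";

my $len      = length($dna);           # number of characters
my $reversed = reverse($dna);          # scalar assignment reverses the string
my $codon2   = substr($dna, 3, 3);     # 3 characters starting at offset 3
my $tail     = substr($dna, 9);        # from offset 9 to the end
my $first    = index($dna, "ATG");     # first match, searching left to right
my $next     = index($dna, "ATG", 1);  # same search, starting at position 1
my $last     = rindex($dna, "ATG");    # first match, searching right to left
my $missing  = index($dna, "TTT");     # -1 means "not found"

# The four-argument form replaces the selected portion in place.
my $copy = $dna;
substr($copy, 0, 3, "atg");

print "$len $reversed $codon2 $tail $first $next $last $missing $copy\n";
```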
Numeric Functions
abs - absolute value function
cos - cosine function
exp - raise e to a power
int - get the integer portion of a number
log - retrieve the natural logarithm for a number
sin - return the sine of a number
sqrt - square root function
6.7.1 Metacharacters
Metacharacters are used to broaden the capabilities of a pattern to match multiple strings
or in specific locations. The following are recognized:
. Match any character (except newline)
^ Match the beginning of the line
$ Match the end of the line (or before newline at the end)
| Alternation
( ) Grouping
[ ] Character class
metacharacters
6.7.2 Character Classes: Perl also provides some predefined character classes. The
following can be used in place of their bracketed alternatives:
\w Match a "word" character [a-zA-Z_0-9]
\W Match a non-word character [^a-zA-Z_0-9]
\s Match a whitespace character [ \t\n\r\f]
\S Match a non-whitespace character [^ \t\n\r\f]
\d Match a digit character [0-9]
\D Match a non-digit character [^0-9]
predefined character classes
6.7.3 Quantifiers
* Match 0 or more times, same as {0,}
+ Match 1 or more times, same as {1,}
? Match 1 or 0 times, same as {0,1}
{n} Match exactly n times
{n,} Match at least n times
{n,m} Match at least n but not more than m times
quantifiers
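The metacharacters, character classes and quantifiers above combine into patterns such as the ones in this sketch. The sequence, the header line and the subroutine name looks_like_orf are made up for illustration; the ORF pattern only checks the start codon, the length and the final stop codon, and does not look for stop codons in the middle.

```perl
use strict;
use warnings;

my $dna = "ATGAAATTTGAATTCGGGTAA";

# ^ and $ anchor the pattern; [ACGT]+ is a character class with a quantifier.
my $is_dna = ($dna =~ /^[ACGT]+$/) ? 1 : 0;

# A literal pattern; $-[0] holds the start position of the last match.
my $site_pos = ($dna =~ /GAATTC/) ? $-[0] : -1;

# {3,} means "three or more times".
my $has_a_run = ($dna =~ /A{3,}/) ? 1 : 0;

# Grouping with ( ), alternation with |, and repetition of a whole group.
sub looks_like_orf {
    my ($seq) = @_;
    return $seq =~ /^ATG([ACGT]{3})*(TAA|TAG|TGA)$/ ? 1 : 0;
}

# \w, \s and \d are the predefined classes; ( ) captures into $1 and $2.
my ($name, $length) = ("seq42 length=21" =~ /^(\w+)\s+length=(\d+)$/);

print "$is_dna $site_pos $has_a_run ", looks_like_orf($dna), " $name $length\n";
```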
if - conditional branching
SYNTAX
if (EXPR) {BLOCK}
if (EXPR) {BLOCK} else {BLOCK}
if (EXPR) {BLOCK} elsif (EXPR) {BLOCK} ... else {BLOCK}
while (EXPR) {BLOCK}
do {BLOCK} while (EXPR)
until - loop structure
SYNTAX
until (EXPR) {BLOCK}
do {BLOCK} until (EXPR)
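The branching and loop forms above can be sketched on a short sequence. The subroutine name gc_class, the 0.4 and 0.6 thresholds and the sample sequence are hypothetical choices made only for this illustration.

```perl
use strict;
use warnings;

# if / elsif / else: classify a sequence by its G+C fraction.
sub gc_class {
    my ($seq) = @_;
    my $gc = ($seq =~ tr/GC//);          # tr returns how many G or C it found
    my $fraction = length($seq) ? $gc / length($seq) : 0;
    if ($fraction > 0.6) {
        return "GC-rich";
    } elsif ($fraction < 0.4) {
        return "AT-rich";
    } else {
        return "balanced";
    }
}

# while: the test is made before each pass through the block.
my $seq = "ATGGCGCCC";
my $pos = 0;
my @codons;
while ($pos + 3 <= length($seq)) {
    push @codons, substr($seq, $pos, 3);
    $pos += 3;
}

# until: loops as long as the test is false.
my $n = 0;
until ($n >= 3) {
    $n++;
}

# do ... while: the block runs once before the test is made.
my $tries = 0;
do { $tries++ } while ($tries < 1);

print gc_class($seq), " @codons $n $tries\n";
```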
7.0 CONCLUSION
We have only touched the tip of the iceberg here. Beyond pure Perl projects, we
could also manage joint C and Perl projects under this infrastructure. The infrastructure is
built in Perl, which means that it is extremely portable, running on platforms ranging
from Linux to Windows to S/390. Once we get used to this infrastructure, we will
find it invaluable for all the projects we work on. We will never have to write an
install script again, and through the use of well-formed test cases, we can have a far
higher level of confidence that our program is performing the way it was intended.
Perl scripts which build dynamic data for a web site, and are already coded to return
HTML data, can benefit from offering PDF output options to users by relying on the
external program HTMLDOC, which already does all the hard work of transforming
HTML into PDF.
We're the first to admit that calling HTMLDOC externally is not the most elegant
solution in the world; sometimes, though, sheer functionality and the smile on our
users' faces are worth more than any elegance!