Lecture14-Perl in Bioinformatics
Lecture 13 & 14
Perl in Bioinformatics
Tamsila Parveen
Perl & Bioinformatics
• Perl is a popular programming language that's extensively used in
areas such as bioinformatics and web programming.
• GenBank and the Protein Data Bank have been growing exponentially
for some time now.
Storing a DNA Sequence & Concatenating Two DNA Fragments
• Let's write a small program that stores some DNA in a variable and
prints it to the screen.
• Two DNA fragments can then be concatenated into a single string.
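As a minimal sketch of both steps, the program below stores DNA in scalar variables, prints one, and joins two of them. The variable names and the two sequences are illustrative assumptions, not data from the lecture.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two illustrative DNA fragments, stored as ordinary Perl strings
my $DNA1 = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
my $DNA2 = 'ATAGTGCCGTGAGAGTGATGTAGTA';

# Print the first fragment to the screen
print "$DNA1\n";

# Concatenate with the dot operator ...
my $DNA3 = $DNA1 . $DNA2;

# ... or by interpolating both variables into one double-quoted string
my $DNA4 = "$DNA1$DNA2";

print "$DNA3\n";
print "$DNA4\n";
```

Both forms produce the same string; the dot operator is usually easier to read when more than two pieces are joined.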
Transcription: DNA to RNA
• Transcription of DNA to RNA is the outcome of the workings
of a delicate, complex, and error-correcting molecular
machinery.
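In a program, transcription is commonly modeled far more simply than the biology suggests: the RNA is taken to have the same sequence as the coding strand of the DNA, with every T replaced by U. A minimal sketch, assuming an upper-case DNA string (the sequence is made up):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative DNA (coding strand)
my $DNA = 'ATGCTTAG';

# Copy the DNA, then substitute every T with U in the copy.
# The g modifier makes the substitution global (all T's, not just the first).
(my $RNA = $DNA) =~ s/T/U/g;

print "DNA: $DNA\n";
print "RNA: $RNA\n";
```

The original `$DNA` is left unchanged because the substitution is bound to the copy.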
Calculating the reverse complement of a
strand of DNA
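• Perl's reverse function does exactly what's needed for the first step. It's
designed to reverse the order of elements, and in scalar context it reverses
the characters of a string.
• Each base is then replaced by its complement (A with T, C with G, and vice versa).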
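A short sketch of the reverse complement, using `reverse` for the order and the `tr` operator for the base swap. The subroutine name `revcom` is a name chosen here for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the reverse complement of a DNA string.
sub revcom {
    my ($dna) = @_;

    # Assigning to a scalar puts reverse in scalar context,
    # so it reverses the characters of the string.
    my $revcom = reverse $dna;

    # tr swaps every base for its complement in a single pass,
    # handling upper and lower case.
    $revcom =~ tr/ACGTacgt/TGCAtgca/;

    return $revcom;
}

my $DNA = 'AAACCC';
print "DNA:                $DNA\n";
print "Reverse complement: ", revcom($DNA), "\n";
```

Using `tr` rather than four separate `s///g` substitutions matters: successive substitutions would turn A into T and then turn those new T's back into A.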
Check if two strands of DNA are reverse
complements of each other
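One way to sketch this check is to compute the reverse complement of one strand and compare it with the other. The subroutine names `revcom` and `are_reverse_complements` are assumptions made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the reverse complement of a DNA string.
sub revcom {
    my ($dna) = @_;
    my $revcom = reverse $dna;
    $revcom =~ tr/ACGTacgt/TGCAtgca/;
    return $revcom;
}

# Return 1 if the two strands are reverse complements, 0 otherwise.
# The comparison ignores case.
sub are_reverse_complements {
    my ($strand1, $strand2) = @_;
    return uc($strand1) eq uc(revcom($strand2)) ? 1 : 0;
}

my $first  = 'ATGC';
my $second = 'GCAT';

if (are_reverse_complements($first, $second)) {
    print "$first and $second are reverse complements\n";
} else {
    print "$first and $second are not reverse complements\n";
}
```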
Flow Control
• Flow control is the order in which the statements of a program are executed.
• A program executes from the first statement at the top of the program to
the last statement at the bottom, in order, unless told to do otherwise.
• There are two ways to tell a program to do otherwise: conditional
statements and loops.
• A conditional statement executes a group of statements only if the
conditional test succeeds; otherwise, it just skips the group of
statements.
• A loop repeats a group of statements until an associated test fails.
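Both kinds of flow control can be sketched in a few lines. The DNA string and variable names below are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $DNA = 'ATGCGT';

# Conditional statement: the first block runs only if the test succeeds.
my $has_start = 0;
if ($DNA =~ /^ATG/) {
    $has_start = 1;
    print "Sequence begins with the start codon ATG\n";
} else {
    print "Sequence does not begin with ATG\n";
}

# Loop: the block repeats until the test ($position < length) fails.
my $position = 0;
my $count_g  = 0;
while ($position < length($DNA)) {
    if (substr($DNA, $position, 1) eq 'G') {
        $count_g++;
    }
    $position++;
}
print "Number of G bases: $count_g\n";
```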
Reading Proteins in Files
• First, create a file on your computer (use your text editor) and put
some protein sequence data into it.
• Whether you type the file yourself or download one with protein sequence
data in it, let's develop a program that reads the protein sequence data
from the file and stores it in a variable.
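A minimal sketch of reading sequence data from a file. So that the example runs on its own, it first writes a small temporary file standing in for the one you would create in a text editor; the file contents and variable names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for a file created by hand: two lines of made-up protein data.
my ($out, $filename) = tempfile(SUFFIX => '.pep', UNLINK => 1);
print {$out} "MNIDDKLEGLFLKCGG\n";
print {$out} "SVLQDRSMPHQEIL\n";
close $out or die "Cannot close $filename: $!";

# Open the file for reading; stop with a message if that fails.
open(my $in, '<', $filename) or die "Cannot open $filename: $!";

# In list context, <$in> returns every line of the file.
my @lines = <$in>;
close $in;

# Remove the trailing newlines and join the lines into one sequence.
chomp @lines;
my $protein = join '', @lines;

print "Read ", scalar(@lines), " lines\n";
print "$protein\n";
```

Reading into an array keeps each line available separately; joining afterwards gives a single string that is easier to search.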
Regular Expressions
• Regular expressions let you easily manipulate strings of all sorts, such as DNA
and protein sequence data.
• What's great about regular expressions is that if there's something
you want to do with a string, you usually can do it with Perl regular expressions.
• Some regular expressions are very simple. For instance, you can just use the
exact text of what you're searching for as a regular expression.
• If you were looking for the word "bioinformatics" in a piece of text, you
could use the regular expression:
/bioinformatics/
• Some regular expressions can be more complex, however.
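The sketch below shows a literal match next to a slightly more complex pattern. The text and DNA strings are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = 'Perl is widely used in Bioinformatics and web programming.';

# Simple case: the pattern is just the text being searched for.
# The i modifier makes the match case-insensitive.
my $found = ($text =~ /bioinformatics/i) ? 1 : 0;
print "Found the word\n" if $found;

# More complex case: T{2,} means "two or more T's in a row".
# The parentheses capture the matched run.
my $dna = 'ACGTTTAGGCA';
my ($run) = $dna =~ /(T{2,})/;
print "First run of T's: $run\n" if defined $run;
```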
Searching for motifs
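A motif search can be sketched as a loop that tries each motif, written as a regular expression, against a sequence. The protein string and the motifs here are made up; a real program would typically read the sequence from a file and the motif from the user.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up protein sequence and motifs (each motif is a regular expression)
my $protein = 'MKVLAAGIVGLSRRKDELAAGN';
my @motifs  = ('AAG', 'R{2}K', 'WWW');

# Record the 0-based position of the first match of each motif.
my %found;
for my $motif (@motifs) {
    if ($protein =~ /$motif/) {
        # After a successful match, $-[0] holds the offset where it began.
        $found{$motif} = $-[0];
        print "Motif $motif found at position $found{$motif}\n";
    } else {
        print "Motif $motif not found\n";
    }
}
```

Because the motif is interpolated as a pattern, character classes and quantifiers such as `R{2}K` work without extra code.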
Counting Nucleotides
• There are many things you might want to know about a piece of DNA.
• Two main steps to count each type of nucleotide in a DNA sequence:
1. Split the DNA into an array of single bases.
2. In a loop, look at each base in turn and increment the count for that
base.
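The two steps can be sketched as follows. The DNA string is illustrative, and keeping the counts in a hash is a design choice of this example rather than something the slide prescribes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $DNA = 'AACGTTTG';

# Step 1: split the DNA into an array of single bases.
# An empty pattern splits the string between every character.
my @bases = split //, $DNA;

# Step 2: look at each base in turn and count it.
my %count  = (A => 0, C => 0, G => 0, T => 0);
my $errors = 0;
for my $base (@bases) {
    if (exists $count{$base}) {
        $count{$base}++;
    } else {
        $errors++;    # anything that is not A, C, G or T
    }
}

for my $base (qw(A C G T)) {
    print "$base = $count{$base}\n";
}
print "unrecognized = $errors\n";
```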
A subroutine to get Seq Data
Scoping
• By keeping all variables a subroutine uses active only within the
subroutine, you can make it safe to call the subroutines from
anywhere.
• You make the variables specific only to the subroutine by declaring
them as my variables.
• my is a keyword defined in Perl that limits variables to the block in
which they are declared.
• Hiding variables and making them local to only a restricted part of a
program is called scoping.
• In Perl, using my variables is known as lexical scoping, and it's an
essential part of modularizing programs.
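A small sketch of lexical scoping: the subroutine declares its own `$dna` with `my`, so changing it does not touch the variable of the same name outside. The subroutine name `add_ACGT` is chosen here for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = 'AAAA';    # visible in the main program

sub add_ACGT {
    # This $dna is a separate variable that exists only inside
    # the subroutine's block.
    my ($dna) = @_;
    $dna .= 'ACGT';
    return $dna;
}

my $longer = add_ACGT($dna);

print "Outside the subroutine: $dna\n";
print "Returned by subroutine: $longer\n";
```

With `use strict` in effect, Perl refuses to compile code that uses an undeclared variable, which makes accidental globals much harder to introduce.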
Translation
• Translation is the process by which a protein is formed: the sequence is
read three bases (one codon) at a time, and each codon specifies one amino acid.
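Translation can be sketched as a lookup from codons to one-letter amino acid codes. The example below builds the standard genetic code table compactly and translates DNA codons directly (the coding strand, with T in place of U). The subroutine name `translate`, the `*` for stop codons, and the `X` for unrecognized codons are conventions assumed for this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Standard genetic code, listed in TCAG order for each codon position:
# TTT, TTC, TTA, TTG, TCT, ... GGG.  '*' marks a stop codon.
my @bases       = qw(T C A G);
my $amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';

my %genetic_code;
my $i = 0;
for my $b1 (@bases) {
    for my $b2 (@bases) {
        for my $b3 (@bases) {
            $genetic_code{"$b1$b2$b3"} = substr($amino_acids, $i++, 1);
        }
    }
}

# Translate a DNA string, codon by codon, starting at the first base.
# A trailing partial codon is ignored; unknown codons become 'X'.
sub translate {
    my ($dna) = @_;
    $dna = uc $dna;
    my $protein = '';
    for (my $pos = 0; $pos + 3 <= length($dna); $pos += 3) {
        my $codon = substr($dna, $pos, 3);
        $protein .= exists $genetic_code{$codon} ? $genetic_code{$codon} : 'X';
    }
    return $protein;
}

my $DNA = 'ATGAAATGGTAA';
print translate($DNA), "\n";
```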
Parsing Annotations
• A GenBank record mixes annotation with the sequence itself, so the first
question is how to extract the useful information.
• The FEATURES table is certainly a key part of the story.
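As a rough sketch of the idea, the code below separates the annotation from the sequence in a GenBank-style record and pulls a few items out of the FEATURES table. The record is a made-up toy, not real GenBank data, and real records have many more fields and multi-line qualifiers that this sketch does not handle.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy record in GenBank flat-file layout (invented for illustration)
my $record = <<'END_RECORD';
LOCUS       TOY000001                 24 bp    DNA     linear   SYN
DEFINITION  Made-up record for illustration only.
FEATURES             Location/Qualifiers
     source          1..24
                     /organism="synthetic construct"
     CDS             1..24
                     /product="hypothetical protein"
ORIGIN
        1 atggccaaat ggtaaacgta cgta
//
END_RECORD

# The ORIGIN line separates the annotation from the sequence.
my ($annotation, $sequence) = split /^ORIGIN.*\n/m, $record, 2;

# Strip whitespace, position numbers and the closing // from the sequence.
$sequence =~ s/[\s\d\/]//g;

# Feature keys start in column 6; qualifier lines are indented further.
my @feature_keys = $annotation =~ /^ {5}(\S+)/mg;

# Pull out one qualifier value from the FEATURES table.
my ($product) = $annotation =~ /\/product="([^"]*)"/;

print "Sequence: $sequence\n";
print "Features: @feature_keys\n";
print "Product:  $product\n";
```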
Random Nucleotide Generation
• Mutation is a fundamental topic in biology. Some mutations affect
proteins and may result in diseases such as cancer.
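Random nucleotides are a simple way to simulate DNA and point mutations. The sketch below uses Perl's built-in `rand`; the subroutine names are assumptions for this example, and the fixed seed is there only to make repeated runs produce the same output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fixed seed so repeated runs give the same output; remove it to get
# different random sequences on each run.
srand(42);

# Pick one nucleotide at random.
sub random_nucleotide {
    my @nucleotides = qw(A C G T);
    # rand @nucleotides gives a number from 0 up to (but not including) 4;
    # using it as an array index truncates it to an integer.
    return $nucleotides[rand @nucleotides];
}

# Build a random DNA string of the requested length.
sub random_dna {
    my ($length) = @_;
    return join '', map { random_nucleotide() } 1 .. $length;
}

# Return a copy of the DNA with one randomly chosen base changed.
sub mutate {
    my ($dna) = @_;
    my $position = int(rand(length($dna)));
    my $old_base = substr($dna, $position, 1);

    # Keep drawing until the new base differs from the old one.
    my $new_base;
    do { $new_base = random_nucleotide() } until $new_base ne $old_base;

    # Four-argument substr replaces the base at $position.
    substr($dna, $position, 1, $new_base);
    return $dna;
}

my $dna    = random_dna(20);
my $mutant = mutate($dna);
print "Original: $dna\n";
print "Mutant:   $mutant\n";
```

This version always changes the chosen base; a simpler variant that just writes a random nucleotide into the position will leave the sequence unchanged about a quarter of the time.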
Lab Assignment 4
• Get the sequence file from the user