Lecture14-Perl in Bioinformatics

This document discusses using the Perl programming language for bioinformatics tasks. It covers storing and manipulating DNA sequences, transcribing DNA to RNA, finding the reverse complement of a DNA strand, reading and writing protein sequences from files, using regular expressions to search strings, counting nucleotides, translating DNA to proteins, and simulating mutations. It also describes passing data between subroutines, scoping variables, and parsing annotation from FASTA format files. The final section outlines a lab assignment involving getting sequence data from a user, checking for valid FASTA format, and modeling the DNA replication, transcription and translation processes.

Uploaded by

noor ulain

BIO400

Lecture 13 & 14

Perl in Bioinformatics

Tamsila Parveen

Perl & Bioinformatics
• Perl is a popular programming language that's extensively used in
areas such as bioinformatics and web programming.

• Biological data is proliferating rapidly.

• GenBank and the Protein Data Bank have been growing exponentially
for some time now.

• Bioperl is an open source bioinformatics toolkit used by researchers
all over the world.

Storing a DNA Sequence & Concatenating Two DNA Fragments
• Let's write a small program that stores some DNA in a variable and
prints it to the screen.

Concatenating two DNA fragments
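The two steps named on this slide can be sketched as follows. This is a minimal illustration, not the lecture's original listing; the sequences and variable names are arbitrary examples.

```perl
use strict;
use warnings;

# Store a DNA fragment in a scalar variable and print it.
my $DNA1 = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
print "First fragment:  $DNA1\n";

# A second fragment to join to the first (arbitrary example data).
my $DNA2 = 'ATAGTGCCGTGAGAGTGATGTAGTA';
print "Second fragment: $DNA2\n";

# Concatenate by interpolating both variables into one string ...
my $DNA3 = "$DNA1$DNA2";

# ... or, equivalently, with the dot (string concatenation) operator.
my $DNA4 = $DNA1 . $DNA2;

print "Concatenated:    $DNA3\n";
```

Both forms produce the same string; the dot operator is usually easier to read when more than two pieces are joined.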

Transcription: DNA to RNA
• Transcription of DNA to RNA is the outcome of the workings
of a delicate, complex, and error-correcting molecular
machinery.

• In a program, it is a simple substitution: when DNA is transcribed to
RNA, all the T's are changed to U's.
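A minimal sketch of that substitution, assuming the DNA string holds the coding strand in upper case (the sequence itself is an arbitrary example):

```perl
use strict;
use warnings;

my $DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';

# Copy the DNA, then replace every T in the copy with U.
# The /g modifier applies the substitution globally, not just once.
(my $RNA = $DNA) =~ s/T/U/g;

print "DNA: $DNA\n";
print "RNA: $RNA\n";
```

Copying first keeps the original DNA unchanged, which is convenient when both strings are needed later.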

Calculating the reverse complement of a strand of DNA
• Perl's reverse function does exactly what's needed for the first step. It
reverses the order of the elements of a list and, in scalar context, the
characters of a string.

• Next, replace every base with its complement: A->T, T->A, G->C, C->G.
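One way to sketch both steps is shown below. The subroutine name revcom is chosen for illustration. The complement step uses the tr operator rather than four separate substitutions, because substituting A->T and then T->A one after the other would turn the new T's straight back into A's.

```perl
use strict;
use warnings;

# Return the reverse complement of a DNA string.
sub revcom {
    my ($dna) = @_;

    # Assigning to a scalar puts reverse in scalar context,
    # so it reverses the characters of the string.
    my $revcom = reverse $dna;

    # tr translates all characters in a single pass, so a base
    # that has just been complemented is never changed again.
    $revcom =~ tr/ACGTacgt/TGCAtgca/;

    return $revcom;
}

my $DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
my $result = revcom($DNA);
print "Original:           $DNA\n";
print "Reverse complement: $result\n";
```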

Check if two strands of DNA are reverse complements of each other
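A possible check, sketched under the assumption that both strands are plain strings of A, C, G and T: compute the reverse complement of one strand and compare it with the other. The subroutine names and the two example strands are made up for illustration.

```perl
use strict;
use warnings;

# Reverse complement of a DNA string (reverse, then complement).
sub revcom {
    my ($dna) = @_;
    my $revcom = reverse $dna;
    $revcom =~ tr/ACGTacgt/TGCAtgca/;
    return $revcom;
}

# Return 1 if the two strands are reverse complements, 0 otherwise.
# The comparison ignores letter case.
sub is_reverse_complement {
    my ($strand1, $strand2) = @_;
    return uc($strand1) eq uc(revcom($strand2)) ? 1 : 0;
}

my $strand_a = 'AAGCTTGC';
my $strand_b = 'GCAAGCTT';

if (is_reverse_complement($strand_a, $strand_b)) {
    print "The strands are reverse complements of each other.\n";
} else {
    print "The strands are not reverse complements of each other.\n";
}
```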

Flow Control
• Flow control is the order in which the statements of a program are executed.
• A program executes from the first statement at the top of the program to
the last statement at the bottom, in order, unless told to do otherwise.
• There are two ways to tell a program to do otherwise: conditional
statements and loops.
• A conditional statement executes a group of statements only if the
conditional test succeeds; otherwise, it just skips the group of
statements.
• A loop repeats a group of statements until an associated test fails.
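Both constructs can be seen in a short sketch. The sequence and the variable names are illustrative only: the if statement runs one of two groups of statements depending on a test, and the while loop repeats until its test fails.

```perl
use strict;
use warnings;

my $DNA = 'ATGCCGTAA';

# Conditional statement: the first block runs only if the test succeeds.
if ($DNA =~ /^ATG/) {
    print "The sequence begins with the codon ATG.\n";
} else {
    print "The sequence does not begin with ATG.\n";
}

# Loop: keep taking three bases at a time until fewer than three remain.
my @codons;
my $position = 0;
while ($position + 3 <= length($DNA)) {
    push @codons, substr($DNA, $position, 3);
    $position += 3;
}

print "Codons: @codons\n";
```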

Reading Proteins in Files
• First, create a file on your computer (use your text editor) and put
some protein sequence data into it.

• Alternatively, download a file that contains protein sequence data. Either
way, let's develop a program that reads the protein sequence data from the
file and stores it in a variable.
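A minimal sketch of such a program follows. To keep it self-contained it first writes a small temporary file with the core File::Temp module; in practice the file would be the one created in your text editor. The protein fragments are made-up example data.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small sample file (stand-in for a file made in a text editor).
my ($out, $filename) = tempfile(UNLINK => 1);
print {$out} "MNIDDKLEGLFLKCGG\n";
print {$out} "IDEMQSSRTMVVMGGV\n";
close $out or die "Cannot close $filename: $!";

# Open the file for reading; stop with a message if that fails.
open(my $in, '<', $filename) or die "Cannot open $filename: $!";

# In list context the <$in> operator reads every line of the file.
my @lines = <$in>;
close $in;

# Remove the trailing newlines and join the lines into one string.
chomp @lines;
my $protein = join '', @lines;

print "Protein: $protein\n";
```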

Regular Expressions
• Regular expressions let you easily manipulate strings of all sorts, such as DNA
and protein sequence data.
• What's great about regular expressions is that if there's something
you want to do with a string, you usually can do it with Perl regular expressions.
• Some regular expressions are very simple. For instance, you can just use the
exact text of what you're searching for as a regular expression.
• For example, to look for the word "bioinformatics" in a text, you could use
the regular expression:
/bioinformatics/
• Some regular expressions can be more complex, however.
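The sketch below shows one simple and one slightly more complex pattern. The text, the DNA string and the second pattern are invented for illustration; the pattern uses a character class and a repeat count to allow any four bases between GAA and TTC.

```perl
use strict;
use warnings;

# Simple case: the pattern is the exact text being searched for.
my $text = 'Perl is widely used in bioinformatics.';
my $found_word = ($text =~ /bioinformatics/) ? 1 : 0;
print "Found the word 'bioinformatics'.\n" if $found_word;

# More complex case: GAA, then any four bases, then TTC.
# [ACGT] is a character class and {4} repeats it four times.
my $dna = 'CCGAATCGATTCGG';
my ($site) = $dna =~ /(GAA[ACGT]{4}TTC)/;

if (defined $site) {
    print "Matched site: $site\n";
} else {
    print "No site found.\n";
}
```

The parentheses capture the matched text, and the list assignment stores that capture in $site.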

Searching for motifs

• One of the most common things we do in bioinformatics is to look for motifs.

• This program finds motifs that the user types in at the keyboard.
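A sketch of a motif search along these lines is given below. The protein string and the subroutine name contains_motif are assumptions for illustration. The typed motif is used as a regular expression, so patterns such as D[DE]K are allowed; a malformed pattern would stop the program with an error. The keyboard loop only runs when the program is started from a terminal.

```perl
use strict;
use warnings;

# Return 1 if the motif (treated as a regular expression) occurs
# anywhere in the sequence, 0 otherwise.
sub contains_motif {
    my ($sequence, $motif) = @_;
    return $sequence =~ /$motif/ ? 1 : 0;
}

# Made-up protein data to search in.
my $protein = 'MNIDDKLEGLFLKCGGIDEMQSSRTMVVMGGV';

# Ask for motifs only when standard input is a terminal.
if (-t STDIN) {
    while (1) {
        print 'Enter a motif to search for (blank line to quit): ';
        my $motif = <STDIN>;
        last unless defined $motif;      # end of input
        chomp $motif;
        last if $motif =~ /^\s*$/;       # blank line ends the loop

        my $message = contains_motif($protein, $motif)
            ? "I found it!\n"
            : "I couldn't find it.\n";
        print $message;
    }
}
```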

Counting Nucleotides
• There are many things you might want to know about a piece of DNA, for
example how many of each nucleotide it contains.
• Counting each type of nucleotide in a DNA sequence takes two main steps:
1. Split the DNA into an array of single bases.
2. In a loop, look at each base in turn and increment the count for that base.
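The two steps can be sketched as follows. The short sequence is made up, and keeping the counts in a hash keyed by base is one possible design; any character other than A, C, G or T is tallied separately.

```perl
use strict;
use warnings;

my $DNA = 'AACGTTTN';

# Step 1: split the string into an array of single characters.
my @bases = split //, $DNA;

# Step 2: loop over the bases and count each kind.
my %count = (A => 0, C => 0, G => 0, T => 0);
my $other = 0;

foreach my $base (@bases) {
    if (exists $count{$base}) {
        $count{$base}++;
    } else {
        $other++;    # not one of A, C, G, T
    }
}

foreach my $base (sort keys %count) {
    print "$base = $count{$base}\n";
}
print "other = $other\n";
```

For this example sequence the counts are A = 2, C = 1, G = 1, T = 3 and one other character.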

A Subroutine to Get Sequence Data

• Pass data into the subroutine as arguments, and then you collect
the return value(s) of the subroutine.
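A sketch of this pattern is shown below. The subroutine name get_sequence_data and the FASTA text are hypothetical. The argument arrives in the special array @_, and the subroutine returns two values that the caller collects in a list. An in-memory filehandle stands in for a real file so the example runs on its own.

```perl
use strict;
use warnings;

# Read FASTA-style data from a filehandle.
# Argument:      an open filehandle.
# Return values: the header text (without '>') and the sequence.
sub get_sequence_data {
    my ($fh) = @_;                      # arguments arrive in @_

    my ($header, $sequence) = ('', '');
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;       # skip blank lines
        if ($line =~ /^>(.*)/) {        # header line
            $header = $1;
            next;
        }
        $line =~ s/\s//g;               # remove any whitespace
        $sequence .= $line;
    }
    return ($header, $sequence);        # return a list of two values
}

# Made-up data, opened as an in-memory file.
my $fasta = ">demo sequence\nACGTAC\nGGTTAA\n";
open(my $fh, '<', \$fasta) or die "Cannot open in-memory file: $!";

# Pass the filehandle in, collect both return values.
my ($header, $sequence) = get_sequence_data($fh);
close $fh;

print "Header:   $header\n";
print "Sequence: $sequence\n";
```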

Scoping
• By keeping all variables a subroutine uses active only within the
subroutine, you can make it safe to call the subroutine from
anywhere.
• You make the variables specific to the subroutine by declaring
them as my variables.
• my is a keyword defined in Perl that limits variables to the block in
which they are declared.
• Hiding variables and making them local to only a restricted part of a
program is called scoping.
• In Perl, using my variables is known as lexical scoping, and it's an
essential part of modularizing programs.
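A small sketch of lexical scoping: the subroutine below declares its own my variable named $dna, which is separate from the $dna in the main program, so changing it inside the subroutine leaves the outer variable untouched. The names are illustrative only.

```perl
use strict;
use warnings;

my $dna = 'AAAA';    # variable in the main program

sub a_to_t {
    # This my declaration creates a new $dna that exists only
    # inside this subroutine block.
    my ($dna) = @_;
    $dna =~ s/A/T/g;
    return $dna;
}

my $result = a_to_t($dna);

print "Inside the subroutine the copy became: $result\n";
print "The outer variable is still:           $dna\n";
```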
Translation
• Translation is the process by which a protein is synthesized from
messenger RNA.

• A subroutine can translate a DNA sequence into a peptide, one codon at a time.
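One possible sketch of such a subroutine is given below. It assumes the standard genetic code and reads the DNA in frame from its first base. The subroutine name dna2peptide, the compact way of building the codon table, and the choice of 'X' for unrecognized codons are illustrative choices, not the lecture's own listing.

```perl
use strict;
use warnings;

# Build the standard genetic code as a hash from codon to one-letter
# amino acid. Codons are generated in the order TTT, TTC, TTA, TTG,
# TCT, ... and paired with the amino acids listed in the same order.
# '*' marks a stop codon.
my %genetic_code;
{
    my @bases       = qw(T C A G);
    my $amino_acids = 'FFLLSSSSYY**CC*W'
                    . 'LLLLPPPPHHQQRRRR'
                    . 'IIIMTTTTNNKKSSRR'
                    . 'VVVVAAAADDEEGGGG';
    my $i = 0;
    for my $b1 (@bases) {
        for my $b2 (@bases) {
            for my $b3 (@bases) {
                $genetic_code{"$b1$b2$b3"} = substr($amino_acids, $i++, 1);
            }
        }
    }
}

# Translate a DNA string into a peptide, three bases at a time.
# Codons that are not in the table (for example with N) become 'X'.
sub dna2peptide {
    my ($dna) = @_;
    $dna = uc $dna;

    my $peptide = '';
    for (my $i = 0; $i + 3 <= length($dna); $i += 3) {
        my $codon = substr($dna, $i, 3);
        $peptide .= exists $genetic_code{$codon} ? $genetic_code{$codon} : 'X';
    }
    return $peptide;
}

my $DNA = 'ATGGCCTGGTAA';
print "DNA:     $DNA\n";
print "Peptide: ", dna2peptide($DNA), "\n";
```

Leftover bases at the end that do not form a full codon are ignored in this sketch.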

Parsing Annotations
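• Sequence database records such as GenBank entries carry annotation alongside
the sequence data; the interesting question is how to extract the useful
information.
• The FEATURES table of a GenBank record is a key part of that annotation.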
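As a rough sketch, the feature keys and their locations can be pulled out of the FEATURES table with a regular expression. The record below is entirely made up for illustration and is much simpler than a real GenBank entry. The sketch assumes feature keys are indented by exactly five spaces and qualifier lines by more, and it ignores locations that continue over several lines.

```perl
use strict;
use warnings;

# A made-up, simplified GenBank-style record (illustration only).
my $record = <<'END_RECORD';
LOCUS       DEMO0001      60 bp    DNA     linear
DEFINITION  Made-up record for illustration only.
FEATURES             Location/Qualifiers
     source          1..60
                     /organism="synthetic construct"
     gene            1..60
                     /gene="demo"
     mRNA            1..60
                     /gene="demo"
     CDS             10..45
                     /gene="demo"
ORIGIN
        1 acgtacgtac gtacgtacgt acgtacgtac gtacgtacgt acgtacgtac gtacgtacgt
//
END_RECORD

my @features;
my $in_features = 0;

foreach my $line (split /\n/, $record) {
    if ($line =~ /^FEATURES/) {      # the table starts here
        $in_features = 1;
        next;
    }
    last if $line =~ /^ORIGIN/;      # the sequence follows; stop
    next unless $in_features;

    # Five spaces, then the feature key, then its location.
    # Qualifier lines are indented further, so they do not match.
    if ($line =~ /^ {5}(\S+)\s+(\S+)/) {
        push @features, { key => $1, location => $2 };
    }
}

foreach my $feature (@features) {
    print "$feature->{key}\t$feature->{location}\n";
}
```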

Random Nucleotide Generation
• Mutation is a fundamental topic in biology. Some mutations affect
proteins and may result in diseases such as cancer.

• Using randomization, it's possible to simulate and investigate the
mechanisms of mutations in DNA and their effect upon the biological
activity of their associated proteins.
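A minimal simulation of a point mutation might look like the sketch below. The subroutine names are illustrative, and the model is deliberately simple: one randomly chosen position is replaced by a randomly chosen different base, with every position and base equally likely.

```perl
use strict;
use warnings;

# Return one of A, C, G, T chosen at random.
sub random_nucleotide {
    my @nucleotides = qw(A C G T);
    return $nucleotides[ int(rand(scalar @nucleotides)) ];
}

# Return a copy of the DNA with one random position changed
# to a different base.
sub mutate {
    my ($dna) = @_;
    return $dna unless length $dna;     # nothing to mutate

    my $position = int(rand(length($dna)));
    my $old_base = substr($dna, $position, 1);

    # Keep drawing until the new base differs from the old one.
    my $new_base;
    do {
        $new_base = random_nucleotide();
    } until ($new_base ne $old_base);

    # Four-argument substr replaces the base at that position.
    substr($dna, $position, 1, $new_base);
    return $dna;
}

my $DNA = 'AAAAAAAAAAAAAAAAAAAA';
print "Original: $DNA\n";
for my $round (1 .. 3) {
    $DNA = mutate($DNA);
    print "Round $round:  $DNA\n";
}
```

Because the positions are random, the output differs from run to run; call srand with a fixed number first if repeatable results are needed.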

Lab Assignment 4
• Get the sequence file from the user.

1. Check carefully whether it is a FASTA file. Use error handling: if the
file is not in FASTA format, print “Sorry this is not FASTA file”.
2. Determine the type of the sequence data (DNA, RNA, or protein).
3. Parse the annotation and get all features from the feature table, e.g.
mRNA, CDS, and gene.
4. Code the complete process: take a DNA sequence and model its
replication, transcription, and translation.
5. Report the number of nucleotides and amino acids.