Perl DNA Sequence Manipulation Guide

This document contains a Perl tutorial covering various tasks for working with DNA/RNA sequences including: storing sequences in variables; concatenating sequences; transcribing DNA to RNA; calculating the reverse complement of a sequence; reading protein sequences from files; determining nucleotide frequencies using regular expressions and loops; and writing results to files. The tutorial provides code examples for each task and discusses concepts like using variables, file I/O, pattern matching, and conditional logic.

Uploaded by

Jessica Mitchell

Perl tutorial

Working with DNA Sequences

#!/usr/bin/perl -w
# Storing DNA in a variable, and printing it out
# First we store the DNA in a variable called $DNA

$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';

# Next, we print the DNA onto the screen

print $DNA;

# Finally, we'll specifically tell the program to exit.

exit;

Concatenating the DNA sequences

#!/usr/bin/perl -w
# Concatenating DNA
# Store two DNA fragments into variables called $DNA1
# and $DNA2

$DNA1 = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
$DNA2 = 'ATAGTGCCGTGAGAGTGATGTAGTA';

# Print the DNA onto the screen

print "Here are the original two DNA fragments:\n\n";

print $DNA1, "\n";
print $DNA2, "\n\n";

# Concatenate the DNA fragments into a third variable and
# print them, using "string interpolation"

$DNA3 = "$DNA1$DNA2";

print "Here is the concatenation of the first two fragments (version 1):\n\n";
print "$DNA3\n\n";

# An alternative way using the "dot operator":

# Concatenate the DNA fragments into a third variable and
# print them

$DNA3 = $DNA1 . $DNA2;

print "Here is the concatenation of the first two fragments (version 2):\n\n";
print "$DNA3\n\n";

# Print the same thing without using the variable $DNA3

print "Here is the concatenation of the first two fragments (version 3):\n\n";
print $DNA1, $DNA2, "\n";
exit;
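
When more than two fragments need joining, the same idea extends naturally. The following is a minimal sketch, not part of the original scripts: the array @fragments, the third fragment, and the variables $joined and $appended are names and data assumed for illustration. It also uses "use strict" and lexical (my) variables, which the scripts above do not.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The first two fragments come from the example above;
# the third is made up for illustration.
my @fragments = (
    'ACGGGAGGACGGGAAAATTACTACGGCATTAGC',
    'ATAGTGCCGTGAGAGTGATGTAGTA',
    'GGGTTT',
);

# join with an empty separator concatenates every element in order
my $joined = join('', @fragments);

# The .= operator appends to an existing string in place
my $appended = '';
$appended .= $_ for @fragments;

print "join: $joined\n";
print ".=  : $appended\n";
```

Both approaches should give the same string; join tends to read better when the fragments are already in an array.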

TRANSCRIPTION: DNA -> RNA

#!/usr/bin/perl -w

# Transcribing DNA into RNA

# The DNA

$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';

# Print the DNA onto the screen

print "Here is the starting DNA:\n\n";
print "$DNA\n\n";

# Transcribe the DNA to RNA by substituting all T's with U's.

$RNA = $DNA;
$RNA =~ s/T/U/g;
# Print the RNA onto the screen
print "Here is the result of transcribing the DNA to RNA:\n\n";
print "$RNA\n";

# Exit the program.

exit;
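
If the transcription step is needed in more than one place, it could be wrapped in a subroutine. This is only a sketch: the subroutine name transcribe is an assumption, and it uses tr/// instead of the s///g substitution shown above so that lowercase t is handled as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the RNA version of a DNA string: every T becomes U.
# Lowercase input is converted too (t becomes u).
sub transcribe {
    my ($dna) = @_;
    (my $rna = $dna) =~ tr/Tt/Uu/;   # copy first, then translate the copy
    return $rna;
}

print transcribe('ACGGGAGGACGGGAAAATTACTACGGCATTAGC'), "\n";
```

Because the translation is applied to a copy, the original DNA string is left unchanged.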

Reverse Complement

#!/usr/bin/perl -w
# Calculating the reverse complement of a strand of DNA

# The DNA
$DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';

# Print the DNA onto the screen

print "Here is the starting DNA:\n\n";
print "$DNA\n\n";

# Calculate the reverse complement

# First, copy the DNA into new variable $revcom
# (short for REVerse COMplement)
#
# It doesn't matter if we first reverse the string and then
# do the complementation; or if we first do the complementation
# and then reverse the string. Same result each time.
# So when we make the copy we'll do the reverse in the same statement.

$revcom = reverse $DNA;

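# The DNA is now reversed. We need to complement the bases
# in $revcom: substitute all bases by their complements.
# A->T, T->A, G->C, C->G

#### Attempt 1:

$revcom =~ s/A/T/g;
$revcom =~ s/T/A/g;
$revcom =~ s/G/C/g;
$revcom =~ s/C/G/g;

# Print the reverse complement DNA onto the screen

print "Here is the reverse complement DNA:\n\n";
print "$revcom\n";

#################
# Does this work? Why?
#
# It does not. Each substitution acts on the result of the previous one:
# the A's that became T's are turned back into A's together with the
# original T's, and likewise for G and C, so the output contains only
# A's and G's.

#### Attempt 2:

# Make a fresh copy of the reversed DNA, because Attempt 1 altered $revcom
$revcom = reverse $DNA;

# See the text for a discussion of tr///
$revcom =~ tr/ACGTacgt/TGCAtgca/;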

# Print the reverse complement DNA onto the screen

print "Here is the reverse complement DNA:\n\n";
print "$revcom\n";
print "\nThis time it worked!\n\n";
exit;
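
The difference between the two attempts is easier to see on a very short sequence. The sketch below is an illustration only: the subroutine names revcom and revcom_broken and the test sequence AACG are assumptions, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reverse complement using tr///, which maps every character in one pass
sub revcom {
    my ($dna) = @_;
    my $rc = reverse $dna;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# The sequential-substitution approach of Attempt 1, kept for comparison
sub revcom_broken {
    my ($dna) = @_;
    my $rc = reverse $dna;
    $rc =~ s/A/T/g;   # all A become T ...
    $rc =~ s/T/A/g;   # ... then every T (old and new) becomes A
    $rc =~ s/G/C/g;   # all G become C ...
    $rc =~ s/C/G/g;   # ... then every C (old and new) becomes G
    return $rc;
}

print revcom('AACG'), "\n";          # prints CGTT
print revcom_broken('AACG'), "\n";   # prints GGAA
```

tr/// works because it decides the replacement for each character from the original string, so a base is never translated twice.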

Reading Proteins in files

#!/usr/bin/perl -w
# Reading protein sequence data from a file
# The filename of the file containing the protein sequence data

$proteinfilename = 'Name_Of_your_sequence_file.txt';

# First we have to "open" the file, and associate

# a "filehandle" with it. We choose the filehandle
# PROTEINFILE for readability.
open(PROTEINFILE, $proteinfilename) || die("Cannot open file \"$proteinfilename\"\n");

# Now we do the actual reading of the protein sequence data
# from the file, by using the angle brackets < and > to get
# the input from the filehandle. We store the data, one line
# per element, into our array variable @protein.

@protein = <PROTEINFILE>;

# Now that we've got our data, we can close the file.

close PROTEINFILE;

# Print the protein onto the screen

print "Here is the protein:\n\n";
print @protein;
exit;
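
A variant of the same read, sketched with the three-argument form of open and a lexical filehandle, which is a common alternative to the bareword filehandle used above. The sample sequence and the temporary file are made up so that the sketch runs on its own; in real use $proteinfilename would point at your sequence file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small temporary file so the sketch is self-contained.
# The two lines below are made-up sample data.
my ($tmp_fh, $proteinfilename) = tempfile(UNLINK => 1);
print {$tmp_fh} "MNIDDKLEGL\nFLKCGGIDEM\n";
close $tmp_fh;

# Three-argument open with a lexical filehandle; $! holds the system error
open(my $fh, '<', $proteinfilename)
    or die "Cannot open file \"$proteinfilename\": $!\n";

my @lines = <$fh>;    # one array element per line, newlines included
close $fh;

chomp @lines;                     # strip the trailing newlines
my $protein = join('', @lines);   # a single continuous sequence

print "Read ", scalar(@lines), " lines, ", length($protein), " residues\n";
```

Joining the lines into one string is optional, but it makes later pattern matching simpler because a motif can no longer be split across two lines.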

Pattern matching: Motifs and Loops

Proceed ONLY if the condition is true.

Code layout:

if (condition) {
    do something
}

Finding Motifs

#!/usr/bin/perl -w
# if-elsif-else

$word = 'MNIDDKL';

# if-elsif-else conditionals

if ($word eq 'QSTVSGE') {
    print "QSTVSGE\n";
} elsif ($word eq 'MRQQDMISHDEL') {
    print "MRQQDMISHDEL\n";
} else {
    print "\"$word\" does not match either peptide.\n";
}
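
The eq tests above compare whole strings. To look for a motif anywhere inside a longer sequence, the binding operator =~ can be combined with the same if/else layout. This is a minimal sketch: the subroutine name find_motif and the protein sequence are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the 1-based position of the first occurrence of $motif in $seq,
# or 0 if the motif does not occur.
sub find_motif {
    my ($seq, $motif) = @_;
    # \Q...\E makes the motif match literally;
    # $-[0] is the offset where the last successful match started.
    return $seq =~ /\Q$motif\E/ ? $-[0] + 1 : 0;
}

# Made-up protein sequence containing the peptide from the example above
my $protein = 'MNIDDKLEGLQSTVSGELQD';
my $motif   = 'QSTVSGE';

my $position = find_motif($protein, $motif);
if ($position) {
    print "Found $motif at position $position\n";
} else {
    print "Could not find $motif\n";
}
```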

GC CONTENT

In PCR experiments, the GC content of a primer is used to predict its annealing temperature
to the template DNA. A higher GC content indicates a higher melting temperature.

GC % = (G + C) / (A + G + C + T) x 100
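
The GC % formula can be computed directly by counting characters with tr///. The following is a sketch under stated assumptions: the subroutine name gc_content is hypothetical, and any characters other than A, C, G and T are simply left out of the total.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# GC % = (G + C) / (A + G + C + T) x 100
sub gc_content {
    my ($dna) = @_;
    # tr/// with an empty replacement list changes nothing
    # and returns the number of matching characters.
    my $gc    = ($dna =~ tr/GCgc//);
    my $total = ($dna =~ tr/ACGTacgt//);
    return 0 if $total == 0;    # avoid division by zero on empty input
    return 100 * $gc / $total;
}

printf "GC content: %.1f%%\n", gc_content('ACGGGAGGACGGGAAAATTACTACGGCATTAGC');
```

For the example sequence used throughout (11 A, 6 C, 11 G, 5 T) this works out to 17 / 33, or about 51.5%.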

Logic:

for each base in the DNA
    if base is A
        count_of_A = count_of_A + 1
    if base is C
        count_of_C = count_of_C + 1
    if base is G
        count_of_G = count_of_G + 1
    if base is T
        count_of_T = count_of_T + 1
done

print count_of_A, count_of_C, count_of_G, count_of_T

the script

#!/usr/bin/perl -w
# Determining frequency of nucleotides
# The name of the file with the DNA sequence data

$dna_filename = 'File_name.txt';

# Remove the newline from the DNA filename
# (only needed when the name is read from user input; harmless here)

chomp $dna_filename;

# open the file, or exit

open(DNAFILE, $dna_filename) || die("Cannot open file \"$dna_filename\"\n");

# Read the DNA sequence data from the file, and store it
# into the array variable @DNA
@DNA = <DNAFILE>;
# Close the file
close DNAFILE;

# From the lines of the DNA file,

# put the DNA sequence data into a single string.
$DNA = join( '', @DNA);
# Remove whitespace
$DNA =~ s/\s//g;

# Now explode the DNA into an array where each letter of

# the original string is now an element in the array.
# This will make it easy to look at each position.
# Notice that we're reusing the variable @DNA for this purpose.
@DNA = split( '', $DNA );

# Initialize the counts.

# Notice that we can use scalar variables to hold numbers.
$count_of_A = 0;
$count_of_C = 0;
$count_of_G = 0;
$count_of_T = 0;
$errors = 0;

# In a loop, look at each base in turn, determine which of

# the four types of nucleotides it is, and increment the
# appropriate count.

foreach $base (@DNA) {
    if ( $base eq 'A' ) {
        ++$count_of_A;
    }
    elsif ( $base eq 'C' ) {
        ++$count_of_C;
    }
    elsif ( $base eq 'G' ) {
        ++$count_of_G;
    }
    elsif ( $base eq 'T' ) {
        ++$count_of_T;
    }
    else {
        print "!!!!!!!! Error - I don't recognize this base: $base\n";
        ++$errors;
    }
}

# print the results

print "A = $count_of_A\n";
print "C = $count_of_C\n";
print "G = $count_of_G\n";
print "T = $count_of_T\n";
print "errors = $errors\n";
# exit the program
exit;
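
The four separate counters can also be kept in a single hash keyed by base. This is an alternative sketch, not the tutorial's method: the hash %count is an assumed name, the sequence is hardcoded instead of being read from a file, and a trailing N has been added on purpose to exercise the error branch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Example sequence from earlier sections, plus a deliberate unknown base (N)
my $DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGCN';

# One hash entry per expected base, all starting at zero
my %count  = (A => 0, C => 0, G => 0, T => 0);
my $errors = 0;

foreach my $base (split //, $DNA) {
    if (exists $count{$base}) {
        ++$count{$base};      # a base we know about
    } else {
        ++$errors;            # anything else counts as an error
    }
}

print "$_ = $count{$_}\n" for sort keys %count;
print "errors = $errors\n";
```

With a hash, supporting another symbol only means adding a key, with no extra elsif branch.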

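Using regular expressions

# Initialize the counts, then count the matches of each pattern.
# The i modifier ignores case; the g modifier finds every match in turn.

$a = $c = $g = $t = $e = 0;

while ($DNA =~ /a/ig)       { $a++ }
while ($DNA =~ /c/ig)       { $c++ }
while ($DNA =~ /g/ig)       { $g++ }
while ($DNA =~ /t/ig)       { $t++ }
while ($DNA =~ /[^acgt]/ig) { $e++ }

print "A=$a C=$c G=$g T=$t errors=$e\n";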

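Next is a new kind of loop, the foreach loop. This loop works over the elements of an array. The line:

foreach $base (@DNA)

loops over the elements of the array @DNA, and each time through the loop the scalar variable $base is set to the next element.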
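
A very small sketch of what foreach does on each pass, using a four-base array so the output is easy to follow. The variables $position and @seen are assumptions added for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# split with an empty pattern gives one element per character
my @DNA = split //, 'ACGT';

my $position = 0;
my @seen;

foreach my $base (@DNA) {
    ++$position;                        # which pass through the loop this is
    push @seen, "$position:$base";      # remember what $base held on this pass
    print "Base $position is $base\n";
}
```

The loop body runs once per array element, in order, and stops by itself when the array is exhausted.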

Writing to files

# Also write the results to a file called "countbase"

$outputfile = "countbase";

open(COUNTBASE, ">$outputfile") || die("Cannot open file \"$outputfile\" to write to!!\n\n");

print COUNTBASE "A=$a C=$c G=$g T=$t errors=$e\n";

close(COUNTBASE);
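
The same write can be sketched with a lexical filehandle and the three-argument open, then checked by reading the file back. Assumptions in this sketch: the counts are sample values standing in for results computed earlier, the hash %count is a hypothetical name, and the file is written into a temporary directory instead of the current one.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Sample counts, standing in for the results computed earlier
my %count  = (A => 11, C => 6, G => 11, T => 5);
my $errors = 0;

# Write into a temporary directory so the sketch leaves nothing behind
my $dir        = tempdir(CLEANUP => 1);
my $outputfile = "$dir/countbase";

# '>' creates the file, or truncates it if it already exists
open(my $out, '>', $outputfile)
    or die "Cannot open file \"$outputfile\" to write to: $!\n";
print {$out} "A=$count{A} C=$count{C} G=$count{G} T=$count{T} errors=$errors\n";
close($out) or die "Cannot close \"$outputfile\": $!\n";

# Read the file back to confirm what was written
open(my $in, '<', $outputfile)
    or die "Cannot reopen \"$outputfile\": $!\n";
my $written = <$in>;
close $in;

print "Wrote: $written";
```

Checking the return value of close on a file opened for writing catches errors such as a full disk, which would otherwise go unnoticed.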
