
Python Exercise Manual

The document provides code solutions to exercises involving manipulating DNA sequences in Python. It includes examples of calculating AT content in a sequence, taking the complement of a sequence, calculating restriction fragment lengths after digestion with an enzyme, and writing DNA sequences to files in FASTA format. The solutions demonstrate reading and writing files, string methods, loops, and processing sequences from a file.

Uploaded by Daniel Alonso
Copyright © All Rights Reserved

Chapter 2: Printing and manipulating text

1. Calculating AT content

Here's a short DNA sequence:

ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT

Write a program that will print out the AT content of this DNA sequence. Hint: you can use normal mathematical symbols like add (+), subtract (-), multiply (*), divide (/) and parentheses to carry out calculations on numbers in Python.

Solution:

# make / do true division in Python 2 (Python 3 already does)
from __future__ import division
my_dna = "ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT"
length = len(my_dna)
a_count = my_dna.count('A')
t_count = my_dna.count('T')
at_content = (a_count + t_count) / length
print("AT content is " + str(at_content))

Output (Python 2; Python 3 prints more decimal places):

AT content is 0.685185185185

2. Complementing DNA

Here's a short DNA sequence:

ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT

Write a program that will print the complement of this sequence.

Solution:

my_dna = "ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT"
# swap each base for its complement in lowercase, so that a later
# replace() cannot swap an already-complemented base back again
replacement1 = my_dna.replace('A', 't')
print(replacement1)
replacement2 = replacement1.replace('T', 'a')
print(replacement2)
replacement3 = replacement2.replace('C', 'g')
print(replacement3)
replacement4 = replacement3.replace('G', 'c')
print(replacement4)
# convert the finished complement back to uppercase
print(replacement4.upper())

Output:

tCTGtTCGtTTtCGTtTtGTtTTTGCTtTCtTtCtTtTtTtTCGtTGCGTTCtT
tCaGtaCGtaatCGatatGataaaGCataCtatCtatatataCGtaGCGaaCta
tgaGtagGtaatgGatatGataaaGgatagtatgtatatatagGtaGgGaagta
tgactagctaatgcatatcataaacgatagtatgtatatatagctacgcaagta
TGACTAGCTAATGCATATCATAAACGATAGTATGTATATATAGCTACGCAAGTA

3. Restriction fragment lengths

Here's a short DNA sequence:

ACTGATCGATTACGTATAGTAGAATTCTATCATACATATATATCGATGCGTTCAT

The sequence contains a recognition site for the EcoRI restriction enzyme, which cuts at the motif G*AATTC (the asterisk marks the position of the cut). Write a program that will calculate the sizes of the two fragments produced when the DNA sequence is digested with EcoRI.

Solution:

my_dna = "ACTGATCGATTACGTATAGTAGAATTCTATCATACATATATATCGATGCGTTCAT"
# find() returns the 0-based index of the G; the cut falls just after it,
# so fragment one is (index + 1) bases long
frag1_length = my_dna.find("GAATTC") + 1
frag2_length = len(my_dna) - frag1_length
print("length of fragment one is " + str(frag1_length))
print("length of fragment two is " + str(frag2_length))

Output:

length of fragment one is 22
length of fragment two is 33

4. Splicing out introns, part one

Here's a short section of genomic DNA:

ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACTCGATCGATCGA
TCGATCGATCGATCGATCGATCGATCATGCTATCATCGATCGATATCGATGCATCGACT
ACTAT

It comprises two exons and an intron. The first exon runs from the start of the sequence to the sixty-third character. The second exon runs from the ninety-first character to the end of the sequence. Write a program that will print just the coding regions of the DNA sequence.

Solution:

# adjacent string literals inside parentheses are joined into one string
my_dna = ("ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACTCGATCGATCG"
          "ATCGATCGATCGATCGATCGATCGATCATGCTATCATCGATCGATATCGATGCATCGAC"
          "TACTAT")
# characters 1-63 are exon one; characters 91 onwards are exon two
exon1 = my_dna[0:63]
exon2 = my_dna[90:]
print(exon1 + exon2)

Output:

ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACTCGATCGATCGA
TCGAATCATCGATCGATATCGATGCATCGACTACTAT

5. Splicing out introns, part two

Using the data from part one, write a program that will calculate what percentage of the DNA sequence is coding.

Solution:

# make / do true division in Python 2 (Python 3 already does)
from __future__ import division
my_dna = ("ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACTCGATCGATCG"
          "ATCGATCGATCGATCGATCGATCGATCATGCTATCATCGATCGATATCGATGCATCGAC"
          "TACTAT")
exon1 = my_dna[0:63]
exon2 = my_dna[90:]
coding_length = len(exon1 + exon2)
total_length = len(my_dna)
print(100 * coding_length / total_length)

Output (Python 2; Python 3 prints more decimal places):

78.0487804878

6. Splicing out introns, part three

Using the data from part one, write a program that will print out the original genomic DNA sequence with coding bases in uppercase and non-coding bases in lowercase.

Solution:

my_dna = ("ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACTCGATCGATCG"
          "ATCGATCGATCGATCGATCGATCGATCATGCTATCATCGATCGATATCGATGCATCGAC"
          "TACTAT")
exon1 = my_dna[0:63]
intron = my_dna[63:90]
exon2 = my_dna[90:]
print(exon1 + intron.lower() + exon2)

Output:

ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACTCGATCGATCGA
TCGAtcgatcgatcgatcgatcgatcatgctATCATCGATCGATATCGATGCATCGACT
ACTAT

Chapter 3: Reading and writing files

7. Splitting genomic DNA

Look in the chapter_3 folder for a file called genomic_dna.txt. It contains the same piece of genomic DNA that we used in the final exercise from chapter 2. Write a program that will split the genomic DNA into coding and non-coding parts, and write these sequences to two separate files.

Hint: use your solution to the last exercise from chapter 2 as a starting point.

Solution:

# open the file and read its contents
dna_file = open("/home/daniel/Python/exercises/Chapter_3/exercises/genomic_dna.txt")
my_dna = dna_file.read()
dna_file.close()

# extract the different bits of DNA sequence, using the same
# exon boundaries as in chapter 2 (exon one is characters 1-63)
exon1 = my_dna[0:63]
intron = my_dna[63:90]
exon2 = my_dna[90:]

# open the two output files
coding_file = open("/home/daniel/Python/exercises/Chapter_3/exercises/coding_dna.txt", "w")
noncoding_file = open("/home/daniel/Python/exercises/Chapter_3/exercises/noncoding_dna.txt", "w")

# write the sequences to the output files
coding_file.write(exon1 + exon2)
noncoding_file.write(intron)

# close the files so that everything is flushed to disk
coding_file.close()
noncoding_file.close()

8. Writing a FASTA file

FASTA is a commonly used file format for DNA and protein sequences. A single sequence in FASTA format looks like this:

>sequence_name
ATCGACTGATCGATCGTACGAT

Here sequence_name is a header that describes the sequence. The greater-than symbol marks the start of the header line. The header often contains an accession number that relates to the record for the sequence in a public sequence database. A single FASTA file can contain multiple sequences, like this:

>sequence_one
ATCGATCGATCGATCGAT
>sequence_two
ACTAGCTAGCTAGCATCG
>sequence_three
ACTGCATCGATCGTACCT

Write a program that will create a FASTA file for the following three sequences. Make sure that all sequences are in upper case and only contain the bases A, T, G and C.

Sequence header    DNA sequence
ABC123             ATCGTACGATCGATCGATCGCTAGACGTATCG
DEF456             actgatcgacgatcgatcgatcacgact
HIJ789             ACTGAC-ACTGT--ACTGTA----CATGTG

Solution:

# set the values of all the header variables
header_1 = "ABC123"
header_2 = "DEF456"
header_3 = "HIJ789"

# set the values of all the sequence variables
seq_1 = "ATCGTACGATCGATCGATCGCTAGACGTATCG"
seq_2 = "actgatcgacgatcgatcgatcacgact"
seq_3 = "ACTGAC-ACTGT--ACTGTA----CATGTG"

# make a new file to hold the output
output = open("/home/daniel/Python/exercises/Chapter_3/exercises/sequences.fasta", "w")

# write the header and sequence for seq1
output.write('>' + header_1 + '\n' + seq_1 + '\n')

# write the header and uppercase sequence for seq2
output.write('>' + header_2 + '\n' + seq_2.upper() + '\n')

# write the header and sequence for seq3 with hyphens removed
output.write('>' + header_3 + '\n' + seq_3.replace('-', '') + '\n')

output.close()

9. Writing multiple FASTA files

Use the data from the previous exercise. This time, instead of creating a single FASTA file, create three new FASTA files, one per sequence. Each file should be named after its sequence header, with the extension .fasta.

Solution:

# set the values of all the header variables
header_1 = "ABC123"
header_2 = "DEF456"
header_3 = "HIJ789"

# set the values of all the sequence variables
seq_1 = "ATCGTACGATCGATCGATCGCTAGACGTATCG"
seq_2 = "actgatcgacgatcgatcgatcacgact"
seq_3 = "ACTGAC-ACTGT--ACTGTA----CATGTG"

# make three files to hold the output, each named after its header
output_1 = open("/home/daniel/Python/exercises/Chapter_3/exercises/" + header_1 + ".fasta", "w")
output_2 = open("/home/daniel/Python/exercises/Chapter_3/exercises/" + header_2 + ".fasta", "w")
output_3 = open("/home/daniel/Python/exercises/Chapter_3/exercises/" + header_3 + ".fasta", "w")

# write sequence 1 to output file 1
output_1.write('>' + header_1 + '\n' + seq_1 + '\n')

# write sequence 2 to output file 2
output_2.write('>' + header_2 + '\n' + seq_2.upper() + '\n')

# write sequence 3 to output file 3
output_3.write('>' + header_3 + '\n' + seq_3.replace('-', '') + '\n')

output_1.close()
output_2.close()
output_3.close()

Chapter 4: Lists and loops

10. Processing DNA in a file

The file input.txt contains a number of DNA sequences, one per line. Each sequence starts with the same 14 base pair fragment: a sequencing adapter that should have been removed. Write a program that will:

(a) trim this adapter and write the cleaned sequences to a new file
(b) print the length of each sequence to the screen

Solution:

# open the input file
input_file = open("/home/daniel/Python/exercises/Chapter_4/exercises/input.txt")

# open the output file
output = open("/home/daniel/Python/exercises/Chapter_4/exercises/trimmed.txt", "w")

# go through the input file one line at a time
for dna in input_file:
    # get the substring from the 15th character to the end, dropping the newline
    trimmed_dna = dna[14:].rstrip("\n")
    # get the length of the trimmed sequence
    trimmed_length = len(trimmed_dna)
    # write the trimmed sequence to the output file, one per line
    output.write(trimmed_dna + "\n")
    # print out the length to the screen
    print("processed sequence with length " + str(trimmed_length))

input_file.close()
output.close()

Output:

processed sequence with length 42
processed sequence with length 37
processed sequence with length 48
processed sequence with length 33
processed sequence with length 47

11. Multiple exons from genomic DNA

The file genomic_dna.txt contains a section of genomic DNA. The file exons.txt contains a list of the start and stop positions of exons. Each exon is on a separate line, with its start and stop positions separated by a comma. Write a program that will extract the exon segments, concatenate them, and write them to a new file.

Solution:

# open the genomic dna file and read the contents
genomic_dna = open("/home/daniel/Python/exercises/Chapter_4/exercises/genomic_dna.txt").read()

# open the exons locations file
exon_locations = open("/home/daniel/Python/exercises/Chapter_4/exercises/exons.txt")

# create a variable to hold the coding sequence
coding_sequence = ""

# go through each line in the exon locations file
for line in exon_locations:
    # split the line using a comma
    positions = line.split(',')
    # get the start and stop positions (int() ignores the trailing newline)
    start = int(positions[0])
    stop = int(positions[1])
    # extract the exon from the genomic dna
    exon = genomic_dna[start:stop]
    # append the exon to the end of the current coding sequence
    coding_sequence = coding_sequence + exon

exon_locations.close()

# write the coding sequence to an output file
output = open("/home/daniel/Python/exercises/Chapter_4/exercises/coding_sequence.txt", "w")
output.write(coding_sequence)
output.close()
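Each solution to exercises 1 to 3 handles a single hard-coded case. The sketch below wraps the same calculations in reusable functions. It assumes Python 3 and is not the manual's own method. `complement()` swaps every base in one pass with `str.maketrans()` and `str.translate()`, which replaces the lowercase-placeholder trick from exercise 2. `fragment_lengths()` extends exercise 3 to any number of EcoRI sites by calling `find()` repeatedly. All three function names are hypothetical helpers introduced here.

```python
# hypothetical helpers, not part of the manual's solutions
def at_content(dna):
    """Fraction of bases in an uppercase DNA string that are A or T."""
    # / is true division in Python 3, so no __future__ import is needed
    return (dna.count("A") + dna.count("T")) / len(dna)


def complement(dna):
    """Complement of an uppercase DNA string (A<->T, C<->G)."""
    # translate() looks up each character in the table exactly once,
    # so a base that was already swapped is never swapped back
    table = str.maketrans("ATCG", "TAGC")
    return dna.translate(table)


def fragment_lengths(dna, motif="GAATTC", cut_offset=1):
    """Lengths of the fragments left after cutting at every copy of motif.

    cut_offset is how many bases of the motif stay on the left-hand
    fragment; 1 matches EcoRI's G*AATTC cut.
    """
    lengths = []
    previous_cut = 0
    position = dna.find(motif)
    while position != -1:
        cut = position + cut_offset
        lengths.append(cut - previous_cut)
        previous_cut = cut
        # keep searching just past the current site
        position = dna.find(motif, position + 1)
    # whatever is left after the last cut is the final fragment
    lengths.append(len(dna) - previous_cut)
    return lengths


my_dna = "ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT"
print("AT content is " + str(at_content(my_dna)))
print(complement(my_dna))  # TGACTAGCTAATGCATATCATAAACGATAGTATGTATATATAGCTACGCAAGTA

ecori_dna = "ACTGATCGATTACGTATAGTAGAATTCTATCATACATATATATCGATGCGTTCAT"
print(fragment_lengths(ecori_dna))  # [22, 33], matching exercise 3
```

With one EcoRI site, `fragment_lengths()` reproduces the two numbers from exercise 3. With several sites, it returns one length per fragment.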

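Exercises 8 and 9 repeat the same `write()` call for every sequence, and each file must be closed by hand. The sketch below is a more general version and assumes Python 3. The hypothetical helpers `clean_sequence()` and `write_fasta()` take a list of (header, sequence) pairs, and a `with` block closes each file automatically. A temporary directory from the standard `tempfile` module stands in for the author's exercise folder, purely so the sketch runs anywhere.

```python
import os
import tempfile


# hypothetical helper, not part of the manual's solutions
def clean_sequence(seq):
    """Uppercase a sequence and drop gap characters ('-')."""
    return seq.upper().replace("-", "")


# hypothetical helper, not part of the manual's solutions
def write_fasta(path, records):
    """Write (header, sequence) pairs to path in FASTA format."""
    # the with-block closes the file even if a write fails
    with open(path, "w") as output:
        for header, seq in records:
            output.write(">" + header + "\n" + clean_sequence(seq) + "\n")


records = [
    ("ABC123", "ATCGTACGATCGATCGATCGCTAGACGTATCG"),
    ("DEF456", "actgatcgacgatcgatcgatcacgact"),
    ("HIJ789", "ACTGAC-ACTGT--ACTGTA----CATGTG"),
]

# assumption: a throwaway directory instead of the manual's exercise folder
out_dir = tempfile.mkdtemp()

# exercise 8: all records in one file
write_fasta(os.path.join(out_dir, "sequences.fasta"), records)

# exercise 9: one file per record, named after its header
for header, seq in records:
    write_fasta(os.path.join(out_dir, header + ".fasta"), [(header, seq)])

print(sorted(os.listdir(out_dir)))
```

Building the file name with `os.path.join(out_dir, header + ".fasta")` keeps the header at the end of the path, so each file is named after its sequence.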
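In exercises 10 and 11, the file handling and the sequence logic are mixed together. The sketch below, which assumes Python 3, moves the logic into two hypothetical functions so it can be tried on small in-memory lists before pointing it at input.txt or exons.txt. `trim_adapter()` strips the newline with `rstrip()` instead of subtracting one from the length, so a final line without a newline is still measured correctly. `extract_exons()` treats each start/stop pair as Python slice indices, as the exercise 11 solution does, and collects the pieces with `str.join()`. The reads and coordinates below are invented for illustration.

```python
# hypothetical helpers, not part of the manual's solutions
def trim_adapter(line, adapter_length=14):
    """Remove the adapter from the start of a line and the newline from its end."""
    return line[adapter_length:].rstrip("\n")


def extract_exons(genomic_dna, exon_lines):
    """Concatenate the slices of genomic_dna named by 'start,stop' lines.

    start and stop are used as Python slice indices (0-based start,
    exclusive stop), the same convention as the exercise 11 solution.
    """
    exons = []
    for line in exon_lines:
        # int() ignores surrounding whitespace, including the newline
        start, stop = map(int, line.split(","))
        exons.append(genomic_dna[start:stop])
    # join once at the end instead of growing a string inside the loop
    return "".join(exons)


# invented inputs standing in for input.txt and exons.txt
reads = ["ATTCGATTATAAGCACTGATCGATCGATCG\n", "ATTCGATTATAAGCGCATCG"]
for read in reads:
    trimmed = trim_adapter(read)
    print("processed sequence with length " + str(len(trimmed)))

print(extract_exons("AAAACCCCGGGGTTTT", ["0,4\n", "8,12\n"]))  # AAAAGGGG
```

An open file object yields one line at a time, so the same functions can also be given the real files. For example, `extract_exons(genomic_dna, open("exons.txt"))` reads the exon coordinates straight from the file.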