exam_programming_exercises

Uploaded by

Manuel Flores

Exercise 1 - 3 points. Modify the code of nw.py so as to count all the ties that occur while
filling up the dynamic programming matrix. You should only consider the ties
between the highest values (i.e. if Sub=4, Ins=2 and Del=2, this should not be counted as a
tie).

command: python nw.py prot_seqs.fasta matrix.lom

output: ties: <number of ties>
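
The tie rule can be sketched as a small helper. This is only an illustration: nw.py is not shown here, so the name is_top_tie and the way the three candidate scores reach it are assumptions.

```python
def is_top_tie(sub, ins, dele):
    """Return True when the highest of the three scores occurs more than once."""
    scores = (sub, ins, dele)
    return scores.count(max(scores)) > 1


# Hypothetical candidate scores (Sub, Ins, Del) for three cells of the matrix.
cells = [(4, 2, 2), (3, 3, 1), (5, 5, 5)]
ties = sum(is_top_tie(*cell) for cell in cells)
print("ties:", ties)
```

The first cell is not a tie because the two equal values are not the maximum.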

Exercise 2 - 3 points. In the file msa.fasta count the number of sequences containing 20 or
more basic residues (R, K and H).

command: python basic_residues.py msa.fasta

output: NSeq: <number of sequences containing 20 or more basic residues>
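
A minimal sketch of the counting step, assuming a plain FASTA parser written by hand. The helper names read_fasta and count_basic_rich and the toy sequences are illustrative, not part of the exam material.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a {name: sequence} dictionary."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:]
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line
    return seqs


def count_basic_rich(seqs, threshold=20):
    """Count sequences with at least `threshold` basic residues (R, K, H)."""
    return sum(
        1 for seq in seqs.values()
        if sum(seq.upper().count(aa) for aa in "RKH") >= threshold
    )


# Toy input: s1 has exactly 20 basic residues, s2 has 3.
example = [">s1", "RK" * 10, ">s2", "RKH-AAA"]
n_seq = count_basic_rich(read_fasta(example))
print("NSeq:", n_seq)
```

With a real file, the lines would come from open("msa.fasta") instead of the toy list.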

Exercise 3 - 4 points. Compute the pairwise identity between both sequences in file
align.fasta. The pairwise identity between two sequences is defined as the number of
columns containing identical residues divided by the total number of ungapped
columns.

command: python identity.py align.fasta

output: identity: <percentage of identity> %
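
The definition above can be sketched as follows, assuming that an "ungapped column" is one where neither sequence has a gap and that gaps are written as "-". The function name pairwise_identity is illustrative.

```python
def pairwise_identity(a, b, gap="-"):
    """Percentage of identical columns among columns without a gap."""
    ungapped = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not ungapped:
        return 0.0
    identical = sum(x == y for x, y in ungapped)
    return 100.0 * identical / len(ungapped)


# Toy alignment: 4 ungapped columns, 3 of them identical.
value = pairwise_identity("AC-GT", "ACTGA")
print("identity: %.1f %%" % value)
```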
Exercise 1 - 2 points. The script nw.py contains an incomplete implementation of the
Needleman–Wunsch algorithm. Modify the code so that it counts and prints the number of
gaps in each of the aligned sequences (seq1 and seq2) and the sum of both numbers.

command: python nw.py twoseq.fasta

output: gaps seq1: <number>
gaps seq2: <number>
total gaps: <number>
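
Once the traceback has produced both aligned strings, the counting itself is short. A sketch, assuming gaps are written as "-"; the function name count_gaps and the toy alignment are illustrative.

```python
def count_gaps(aligned_seq1, aligned_seq2, gap="-"):
    """Return the gaps in each aligned sequence and their sum."""
    gaps1 = aligned_seq1.count(gap)
    gaps2 = aligned_seq2.count(gap)
    return gaps1, gaps2, gaps1 + gaps2


gaps1, gaps2, total = count_gaps("AC--GT", "A-TTGT")
print("gaps seq1:", gaps1)
print("gaps seq2:", gaps2)
print("total gaps:", total)
```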

Exercise 2 - 2 points. The script msa.py performs an alignment of two sequence profiles. In
this version (the same we implemented in class), the score for one cell (one column of one
profile aligned to one column of the other profile) is set to the average of the matching scores
for the individual sequences while ignoring gaps. Modify the align function in msa.py so that
the average score takes gaps into account: the individual score of any amino acid aligned with
a gap should get the value of gep.

command: python msa.py prf1.msa prf2.msa matrix.lom

output: Optimal Score: <number>
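
One possible reading of the modified cell score, as a sketch only. msa.py is not shown, so the function name column_score, the nested-dictionary matrix and the decision to skip gap-gap pairs are all assumptions.

```python
def column_score(col1, col2, submat, gep, gap="-"):
    """Average score over all residue pairs between two profile columns."""
    total, pairs = 0, 0
    for a in col1:
        for b in col2:
            if a == gap and b == gap:
                continue  # assumption: gap-gap pairs do not contribute
            if a == gap or b == gap:
                total += gep  # amino acid aligned with a gap
            else:
                total += submat[a][b]
            pairs += 1
    return total / pairs if pairs else 0.0


# Toy substitution matrix, not taken from matrix.lom.
toy = {"A": {"A": 4, "G": 0}, "G": {"A": 0, "G": 6}}
score = column_score("AA", "A-", toy, gep=-2)
print(score)
```

Here two A/A pairs (4 each) and two A/gap pairs (-2 each) average to 1.0.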

Exercise 3 - 2 points. The script similarity.py computes the pairwise identity between two
sequences in a file in fasta format. Modify it so that it reads a substitution matrix and computes
the pairwise similarity as well (given as a percentage). The pairwise similarity between two
sequences is defined as the number of columns containing pairs with values >0 in the
substitution matrix divided by the total number of ungapped columns. Similarity ≥ identity.

command: python similarity.py align.fasta matrix.lom

output: identity: <percentage> %
similarity: <percentage> %
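
Both percentages can be computed in one pass over the ungapped columns. A sketch, assuming the substitution matrix is available as a nested dictionary; the function name and the toy matrix values are made up for illustration.

```python
def identity_and_similarity(a, b, submat, gap="-"):
    """Return (identity, similarity) as percentages of ungapped columns."""
    ungapped = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    n = len(ungapped)
    if n == 0:
        return 0.0, 0.0
    identical = sum(x == y for x, y in ungapped)
    similar = sum(submat[x][y] > 0 for x, y in ungapped)
    return 100.0 * identical / n, 100.0 * similar / n


# Toy matrix with invented values, not taken from matrix.lom.
toy = {
    "K": {"K": 5, "R": 2},
    "R": {"K": 2, "R": 5},
    "L": {"L": 4, "V": 1},
    "V": {"L": 1, "V": 4},
}
identity, similarity = identity_and_similarity("KR-L", "KKAV", toy)
print("identity: %.1f %%" % identity)
print("similarity: %.1f %%" % similarity)
```

Identical pairs score >0 on the diagonal of usual matrices, which is why similarity is never below identity.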

Exercise 4 - 2 points. The script readtree1.py prints the names of all leaves of a tree (one
line per node). Modify the code so that it also prints (together with the name; and also in the
same line) node id, parent node id and a string indicating whether the node is “right” or “left”
child of the parent node. Do not modify any function, only the main code.

command: python readtree1.py tree_prots.dnd

output: ...
TX7_ANTXA/3-47 27 26 right
...

Exercise 5 - 2 points. Modify the code in the previous exercise (make a copy of readtree1.py
and save it as readtree2.py) to remove from the nodes dictionary all nodes that are left-child
leaves. Print the number of nodes after removal. Do not modify any function, only the main
code.

command: python readtree2.py tree_prots.dnd

output: numb nodes: <number>
Exercise 1 - 2 points. Complete the script profile.py so that it computes a profile or position
weight matrix. This is a matrix that displays the relative frequencies of each base in each of
the positions in the sequences in profile.fasta. All sequences have the same length and
contain no gaps.

command: python profile.py profile.fasta

output: A [<list of numbers>]
C [<list of numbers>]
G [<list of numbers>]
T [<list of numbers>]
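
A minimal sketch of the profile computation, assuming the sequences have already been read into a list of equal-length strings. The function name profile and the toy sequences are illustrative.

```python
def profile(seqs, alphabet="ACGT"):
    """Relative frequency of each base at each position."""
    n = len(seqs)
    length = len(seqs[0])
    return {
        base: [sum(s[i] == base for s in seqs) / n for i in range(length)]
        for base in alphabet
    }


seqs = ["ACGT", "AGGT", "TCGA", "ACGT"]
pwm = profile(seqs)
for base, freqs in pwm.items():
    print(base, freqs)
```

Each column of the resulting matrix sums to 1.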

Exercise 2 - 2 points. The script nw.py contains an implementation of the Needleman–Wunsch
algorithm. Modify the code (save it as nw_rna.py) so as to be able to align the two
sequences in dna_rna.fasta. In the final alignment nucleotides that correspond to one another
in the DNA and the RNA should be aligned.

command: python nw_rna.py dna_rna.fasta

output: Optimal score: <number>
<alignment>
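
One way to make DNA and RNA comparable is to treat U as T inside the scoring step. This sketch assumes that "correspond" means same-strand equivalence (T in the DNA matches U in the RNA) rather than base-pair complementarity; the function name and the match/mismatch values are hypothetical.

```python
def nucleotide_score(a, b, match=1, mismatch=-1):
    """Score two nucleotides, treating RNA uracil (U) as DNA thymine (T)."""
    a = a.upper().replace("U", "T")
    b = b.upper().replace("U", "T")
    return match if a == b else mismatch


print(nucleotide_score("T", "U"))
print(nucleotide_score("A", "U"))
```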

Exercise 3 - 3 points. Modify the script nw.py again (save it as nw_gapvar.py) so that it also
implements amino acid-specific gap penalties that reduce or increase the gap opening penalty
at each position in the alignment or sequence (these are referred to as Pascarella gaps). The
penalties should scale (multiply) the gep penalty. The values are in the file pascarella.txt. As
an example, positions that are rich in glycine are more likely to have an adjacent gap than
positions that are rich in valine.

command: python nw_gapvar.py gapvarseqs.fasta pascarella.txt

output: Optimal score: <number>
<alignment>
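
The scaling step can be sketched as below. The format of pascarella.txt is not shown, so the factors are assumed to have been loaded into a dictionary; the values used here are invented, and taking the factor from the residue facing the gap is also an assumption.

```python
def scaled_gep(residue, gep, factors):
    """Scale gep by the factor of the residue facing the gap (1.0 if unknown)."""
    return gep * factors.get(residue, 1.0)


# Made-up factors, NOT the values in pascarella.txt.
factors = {"G": 0.5, "V": 1.5}
print(scaled_gep("G", -4, factors))  # cheaper gap next to glycine
print(scaled_gep("V", -4, factors))  # more expensive gap next to valine
```

In the recurrence, the constant gep in the insertion and deletion terms would be replaced by a call like this one.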

Exercise 4 - 3 points. The script readtree.py prints all the leaves associated with each of the
nodes of a tree. Modify the script (save it as compare_trees.py) so that it counts the number
of nodes that occur in tree1.dnd but not in tree2.dnd. Report this number. Note that this
measure is closely related to the Robinson-Foulds distance, which is often used in phylogeny
to compare trees.

command: python compare_trees.py tree1.dnd tree2.dnd

output: number of different nodes: <number>
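
The comparison can be sketched by describing each internal node by the set of leaves below it. Nested tuples stand in for the parsed .dnd trees here; the real script works on the nodes structure built by readtree.py, which is not shown, so the function name clades and both toy trees are assumptions.

```python
def clades(tree):
    """Collect the leaf set under every internal node of a nested-tuple tree."""
    found = set()

    def leaves(node):
        if isinstance(node, str):  # a leaf is just its name
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    leaves(tree)
    return found


tree1 = ((("A", "B"), "C"), ("D", "E"))
tree2 = ((("A", "C"), "B"), ("D", "E"))
different = len(clades(tree1) - clades(tree2))
print("number of different nodes:", different)
```

Only the node grouping A with B is missing from the second tree.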
ALGORITHMS FOR SEQUENCE ANALYSIS IN BIOINFORMATICS – FINAL EXAM 2022

1) (1.25 points) aligns.py

Modify the function naive_aligner() in aligns.py so that it generates all possible
alignments (and their scores) of seq1 with seq2 (assume seq1 is longer than seq2) by only
adding gaps to the beginning or end of seq2. As scoring system use: match: +1,
mismatch: -1, gap: 0.
The output of the script should be:
FASTCAT
CAT---- -1
-CAT--- -1
--CAT-- -3
---CAT- -3
----CAT 3
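
A sketch of one way to produce this output. aligns.py is not shown, so the signature below (scores passed as keyword arguments, results returned as a list) is an assumption.

```python
def naive_aligner(seq1, seq2, match=1, mismatch=-1, gap=0):
    """Slide seq2 along seq1, padding with gaps, and score each placement."""
    results = []
    extra = len(seq1) - len(seq2)
    for offset in range(extra + 1):
        padded = "-" * offset + seq2 + "-" * (extra - offset)
        score = 0
        for a, b in zip(seq1, padded):
            if b == "-":
                score += gap
            elif a == b:
                score += match
            else:
                score += mismatch
        results.append((padded, score))
    return results


print("FASTCAT")
for aligned, score in naive_aligner("FASTCAT", "CAT"):
    print(aligned, score)
```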

2) (1.25 points) nw_gapvar.py

Modify the script needleman_wunsch.py to implement amino acid-specific gap
penalties. This approach (Pascarella gaps) modifies the gap penalty by scaling gep. For
instance, Gly is more likely to have an adjacent gap than other amino acids, so its gap
penalty is multiplied by a factor < 1. Scaling factors are in pascarella.txt.

3) (1.25 points) readtree.py

Modify the script readtree.py by writing a function node_info(nodes, child), where the
argument child can take the values "left" or "right". The function should return a list of
lists with information on leaves only (nodes with no left or right child) that are left or
right children (as requested).
The returned data should have the format:
[['TXAA_ANTXA', 28, 26],
['TXH8_ANTS7', 31, 29],
['TXH7_ANTS7', 34, 32],
['TX6_ANTXA', 35, 21],
['TXA2_ANTFU', 39, 37],
['TXAC_ANTEL', 40, 36],
...
The elements in each nested list are i) protein name, ii) node id and iii) parent node id,
respectively.
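
A sketch of node_info() under a strong assumption about the data structure: nodes is taken to be a dictionary mapping node id to a record with the keys "name", "parent", "left" and "right". The real layout in readtree.py is not shown and may differ.

```python
def node_info(nodes, child):
    """List [name, node id, parent id] for leaves that are `child` of their parent."""
    info = []
    for node_id, node in nodes.items():
        is_leaf = node["left"] is None and node["right"] is None
        parent = nodes.get(node["parent"])  # None for the root
        if is_leaf and parent is not None and parent[child] == node_id:
            info.append([node["name"], node_id, node["parent"]])
    return info


# Hypothetical three-node tree: root 1 with leaves 2 (left) and 3 (right).
nodes = {
    1: {"name": "", "parent": None, "left": 2, "right": 3},
    2: {"name": "A", "parent": 1, "left": None, "right": None},
    3: {"name": "B", "parent": 1, "left": None, "right": None},
}
print(node_info(nodes, "left"))
print(node_info(nodes, "right"))
```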
ALGORITHMS FOR SEQUENCE ANALYSIS IN BIOINFORMATICS – RETAKE EXAM 2022

1) (1.25 points) aligns.py

Modify the function naive_aligner() in aligns.py and the match/mismatch scores so that
it generates all possible alignments (and their scores) of seq1 with seq2 (consider
alignments up to length seq1 + seq2 by adding gaps at the end of seq1 and both at the
beginning and end of seq2; assume seq1 shorter than seq2).
The output of the script should be:
FAST---
AST---- -6
-AST--- 9
--AST-- -6
---AST- -6
----AST -6

2) (1.25 points) nw_ties.py

Modify the code of nw_ties.py so that the function nw() returns the count of all the ties
that occur while filling up the dynamic programming matrix. You should only consider
ties between the highest scores (i.e. if subst=4, inser=2 and delet=2, this should not be
counted as a tie).
The script should print: 7

3) (1.25 points) msa.py

The script msa.py performs an alignment of two sequence profiles (two groups of
aligned sequences). In this version (similar to the one implemented in class), the score
for one cell (one column of one profile aligned to one column of the other profile) is set
to the average of the matching scores for the individual sequences while ignoring gaps.
Modify the align() function in msa.py so that i) it prints the optimal score and ii) the
average score for each cell takes gaps into account: the individual score of any amino
acid aligned with a gap should get the value of gep.
The optimal score after the modification should be 16.
ALGORITHMS FOR SEQUENCE ANALYSIS IN BIOINFORMATICS – FINAL EXAM 2023

1) (1 point) similarity.py
Complete the script similarity.py so that it computes the similarity (in percentage)
between two protein sequences. Assume that both sequences have been previously
aligned and have the same length (considering gaps). The pairwise similarity between
the two sequences is the number of columns containing pairs with values >0 in the
substitution matrix divided by the total number of ungapped columns, multiplied by 100.
Similarity ≥ identity.

2) (1 point) left_or_right.py
Modify the script readtree.py so that the function get_leaves() returns the names of the
left or the right children (as requested; e.g. get_leaves(nodes, "left") should return only
left leaves).
With the provided example, the script should print:
['B', 'D', 'F']
ALGORITHMS FOR SEQUENCE ANALYSIS IN BIOINFORMATICS – FINAL EXAM 2023

1) (1 point) profile.py
Complete the script profile.py so that it computes a profile or position weight matrix.
This matrix displays the relative frequencies of each base in each of the positions in the
sequences in profile.fasta. All sequences have been aligned, have the same length and
contain no gaps. The output should be like:
A [0.3, 0.6, 0.1, 0, 0, 0.6, 0.7, 0.2, 0.1]
C [0.2, 0.2, 0.1, 0, 0, 0.2, 0.1, 0.1, 0.2]
G [0.1, 0.1, 0.7, 1, 0, 0.1, 0.1, 0.5, 0.1]
T [0.4, 0.1, 0.1, 0, 1, 0.1, 0.1, 0.2, 0.6]

2) (1 point) orphans.py
Modify the script orphans.py so that the function get_orphans() returns the names of
the leaves having no sister. With the provided example, the script should print:
['C', 'F']

3) (1 point) smith_waterman.py

Modify smith_waterman.py so that:

a) the local alignment uses match and mismatch scores instead of a substitution matrix

b) the function sw() accepts two additional parameters (match and mismatch)

c) the algorithm implementation gives the following alignment with the provided
examples and the parameters match=2, mismatch=0 and gep=-1.
CAAT
CA-T
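
A sketch of sw() with match and mismatch parameters. The provided example sequences are not shown, so CAAT and CAT below are hypothetical inputs. The traceback checks the two gap moves before the diagonal; with a different order the co-optimal alignment CAAT / C-AT would be reported instead.

```python
def sw(seq1, seq2, match, mismatch, gep):
    """Local alignment with a linear gap penalty; returns (score, aln1, aln2)."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq1[i - 1] == seq2[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gep, H[i][j - 1] + gep)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero is reached.
    i, j = best_pos
    aln1, aln2 = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j] + gep:      # gap in seq2
            aln1.append(seq1[i - 1]); aln2.append("-"); i -= 1
        elif H[i][j] == H[i][j - 1] + gep:    # gap in seq1
            aln1.append("-"); aln2.append(seq2[j - 1]); j -= 1
        else:                                 # match or mismatch
            aln1.append(seq1[i - 1]); aln2.append(seq2[j - 1])
            i -= 1; j -= 1
    return best, "".join(reversed(aln1)), "".join(reversed(aln2))


score, aln1, aln2 = sw("CAAT", "CAT", match=2, mismatch=0, gep=-1)
print(aln1)
print(aln2)
```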
