Algorithm Assignment

The document outlines an assignment to write code for multiple sequence alignment. It provides instructions to read protein sequences from a file, perform pairwise and multiple sequence alignments, calculate Kimura distances between sequences, cluster the sequences using these distances, and output the multiple sequence alignment and score. Guidelines are given for various steps and scoring criteria.

Uploaded by

Jibon Khan

Assignment Content

Your task is to write code for multiple sequence alignment. This is an individual assignment. Please
see the comments on rubrics and marking at the bottom.

You will read a text file containing a collection of protein sequences and output their alignment
together with its score. You will use BLOSUM50 as the substitution matrix and a linear gap penalty
with parameter d=8.

The steps are as follows; the assignment is marked out of 100.

1. Python code for pairwise alignment of sequences is in the attached zip file. Make sure you can unzip and run this code. Running the initial pairwise alignment file should return a visualisation of a Needleman-Wunsch (NW) global alignment, with its corresponding score, for any given sequences X and Y. (0 marks)
   Note: when running this for the first time you may see the following error: ModuleNotFoundError: No module named 'blosum'. This means you need to install that package using an installer such as pip or conda. Ensure you install version 2.0.2, as the package was updated very recently, causing v1/2 code to be buggy.
2. Write a piece of code that reads a text file containing several protein sequences (such as those in ./sequences). It should then take each pair of sequences from that file and output a matrix that gives the score of the optimal NW alignment between the two sequences. A sample output for the file multiple3.txt is provided. Note that the diagonal entries align sequences against themselves. Submit the code for this part as part2.py. (25 marks)
3. You will get code from us that computes the alignment of a sequence to a profile, and its score.
   (i) Adapt the code from (2) to compute distances using the Kimura model (details below). You should create a two-dimensional matrix whose (i, j)th entry is the Kimura distance between sequence i and sequence j. (25 marks)
   (ii) Cluster all sequences using the distances from 3(i), using an existing Python library (details below). (20 marks)
   (a) Use the clustering from 3(ii) to create the guide tree for a multiple alignment, then calculate the multiple alignment and its score, and output both. Your code should be able to handle multiple128.txt. Submit this as part3a.py. (20 marks)
   (b) Try any other method to find a better score than the one you got in 3(a). The choice of method is up to you. One option is to choose the closest pair of sequences from 3(i) and then add all other sequences in order of their average distance from those already in the profile. Submit this as part3b.py. (10 marks)
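Step 2 comes down to running the NW recurrence once for every pair of sequences. The sketch below assumes the sequences have already been read into Python strings; it computes only the optimal score (no traceback), applies the linear gap penalty d = 8, and takes the substitution score as a function argument. The names `nw_score`, `score_matrix` and `toy_sub` are hypothetical, and `toy_sub` (+5 for a match, -2 for a mismatch) is only a stand-in for BLOSUM50, not the real matrix.

```python
from typing import Callable, List


def nw_score(x: str, y: str, sub: Callable[[str, str], int], d: int = 8) -> int:
    """Optimal Needleman-Wunsch global alignment score with linear gap penalty d."""
    n, m = len(x), len(y)
    # prev[j] holds F(i-1, j): the best score for aligning x[:i-1] with y[:j]
    prev = [-d * j for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [-d * i] + [0] * m  # F(i, 0): x[:i] aligned entirely against gaps
        for j in range(1, m + 1):
            cur[j] = max(prev[j - 1] + sub(x[i - 1], y[j - 1]),  # align x_i with y_j
                         prev[j] - d,                            # x_i against a gap
                         cur[j - 1] - d)                         # y_j against a gap
        prev = cur
    return prev[m]


def score_matrix(seqs: List[str], sub: Callable[[str, str], int],
                 d: int = 8) -> List[List[int]]:
    """k x k matrix of optimal NW scores; entry (i, i) aligns sequence i with itself."""
    k = len(seqs)
    M = [[0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):  # the matrix is symmetric, so fill both halves at once
            M[i][j] = M[j][i] = nw_score(seqs[i], seqs[j], sub, d)
    return M


def toy_sub(a: str, b: str) -> int:
    """Hypothetical stand-in for BLOSUM50: +5 for a match, -2 for a mismatch."""
    return 5 if a == b else -2


if __name__ == "__main__":
    for row in score_matrix(["HEAGAWGHEE", "PAWHEAE", "HEAE"], toy_sub):
        print(row)
```

To use the real scoring scheme, replace `toy_sub` with a function that looks up BLOSUM50 through the `blosum` package, following whatever call the supplied pairwise code already makes. Keeping only the previous DP row is enough for the score; the full matrix is needed once you also want the traceback.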
Your programs part3a.py and part3b.py should each read a file containing several protein sequences (such as those in ./sequences), compute their multiple alignment, and output both the multiple alignment and its score. An example output could be as follows (of course, your alignment and its score could be different):

Reading input sequences from file multiple10.txt

Number of sequences 10
Computed alignment:
A------WCPP-SMS-WKR-CC--HTTCNPTCNSQHVQT--IYEHMASTSG--GAKVHDC-T-D-A-MN-
--------CPP-SMS-KEKACC--HTTCNPTCNMQHVQTENMYEHMASTSGVHDPDCH-CETVECAAMP-
A------WCPP-SHI-KKA-CC--HTTCNPTCNSQHVQT--IPEHMASTSG---V-VD-C-T-V-A-MN-
APG---TMCPPGSMS-KEKACC--HTTGNPTRNMQHVPTLNMYEHMASTSG-----VHDC-T-V-A-MP-
A------MCPP-SM--KF--CC--HTTCNPTCNSQHVQT--HAEHMASTSG-----VNDC-T-Q-A-MP-
A------MCPP-SMSFSKKACR--HTTCNPTCNQQHVQTEDIYEQMASTSD-----WNDC-T-V-Q-MP-
A------WCPP-SMS-WKR-CC--HTTCNPTCNSQHLQT--IYEHMASTSG-----VHDG-I-D-A-MN-
APGSIHTMCPPGSMS-KEKACC--HTTGNPTRNMQHVQTENMYEHMASTSG-----VHDC-T-V-A-MP-
--------CPP-SMS-KEKACCPTHTTCNPTCNKQCVQTENMYEHEASTSG-----LHDQ-T-V-A-MP-
NPG---TMCPPGSMS-KEKACC--HTTCNPTCNMQHVQTFNMYEHMGDTSG-----VHDC-T-V-A-MPG
Score of alignment: 8833
Process finished with exit code 0
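The assignment does not spell out how the score of a multiple alignment is defined. One common choice is the sum-of-pairs (SP) score, sketched below under that assumption: every pair of rows is scored column by column, a residue against a gap costs d, and a gap against a gap scores 0 (also an assumption). `sum_of_pairs` and `toy_sub` are hypothetical names, and `toy_sub` again only stands in for BLOSUM50.

```python
from itertools import combinations
from typing import Callable, List


def toy_sub(a: str, b: str) -> int:
    """Hypothetical stand-in for BLOSUM50: +5 for a match, -2 for a mismatch."""
    return 5 if a == b else -2


def sum_of_pairs(rows: List[str], sub: Callable[[str, str], int], d: int = 8) -> int:
    """Sum-of-pairs score of a multiple alignment given as equal-length gapped rows.

    Assumption: residue vs gap costs -d, gap vs gap scores 0.
    """
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all alignment rows must have the same length")
    total = 0
    for r1, r2 in combinations(rows, 2):  # every unordered pair of rows
        for a, b in zip(r1, r2):          # walk the columns together
            if a == "-" and b == "-":
                continue                  # gap/gap columns contribute nothing
            if a == "-" or b == "-":
                total -= d                # linear gap penalty
            else:
                total += sub(a, b)        # substitution score
    return total


if __name__ == "__main__":
    print(sum_of_pairs(["AC-", "A-C", "ACC"], toy_sub))
```

If your course defines the alignment score differently (for example, with a different gap/gap rule), only the two gap branches need to change.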
The programming assignment will be due on Friday 4th May 2024 at 2PM.
Kimura Model
The distance between two protein sequences X and Y can be calculated as follows (Kimura, 1983).
More details here.

- Align X and Y.
- Look at each column of the alignment, and ignore any columns containing a gap.
- Let positions_scored be the number of remaining columns.
- Let exact_matches be the number of those columns where the two residues are exactly the same.
- Calculate the distance as follows:
  $S = \text{exact\_matches} / \text{positions\_scored}$
  $D = 1 - S$
  $\text{distance} = -\ln(1 - D - 0.2D^2)$, where $\ln$ is the natural log, i.e. log to the base $e = 2.71828\ldots$
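The Kimura steps translate almost line for line into code. The sketch below assumes the pair has already been aligned (for example by an NW traceback) and is passed in as two equal-length strings with '-' for gaps; `kimura_distance` is a hypothetical name. It raises an error when no gap-free columns remain, or when $1 - D - 0.2D^2$ is not positive (D above roughly 0.854), since the logarithm is undefined there; how to treat such very divergent pairs is left to you.

```python
import math


def kimura_distance(ax: str, ay: str, gap: str = "-") -> float:
    """Kimura protein distance between two already-aligned sequences."""
    if len(ax) != len(ay):
        raise ValueError("aligned sequences must have equal length")
    positions_scored = 0
    exact_matches = 0
    for a, b in zip(ax, ay):
        if a == gap or b == gap:
            continue  # ignore any column containing a gap
        positions_scored += 1
        if a == b:
            exact_matches += 1
    if positions_scored == 0:
        raise ValueError("no gap-free columns to score")
    S = exact_matches / positions_scored
    D = 1 - S
    arg = 1 - D - 0.2 * D ** 2
    if arg <= 0:
        # the log is undefined for very divergent pairs (D above about 0.854)
        raise ValueError(f"Kimura distance undefined for D = {D:.3f}")
    return -math.log(arg)  # math.log is the natural log


if __name__ == "__main__":
    print(kimura_distance("AC-D", "AE-D"))
```

Calling this for every pair (i, j) and storing the result at both [i][j] and [j][i] gives the distance matrix asked for in 3(i); the diagonal is 0.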
Clustering
For clustering, it is recommended that you use an agglomerative clustering method. Given n "objects" (sequences in your case) and an n x n matrix of distances between these objects, it repeatedly groups together either:
- two objects,
- an object with a group of objects, or
- two groups of objects,
until only a single group remains. The grouping is based on distance. More details can be found here.
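To make the merging rule concrete, here is a small from-scratch sketch of agglomerative clustering with average linkage (the linkage used in the scikit-learn example that follows), which at each step merges the two groups with the smallest mean leaf-to-leaf distance. It uses the same labelling convention: leaves are 0 to n-1, and each merge creates the next label from n upwards. `average_linkage` is a hypothetical helper for illustration only; the assignment asks you to use an existing library, and its tie-breaking (lowest labels first) may not match scikit-learn's.

```python
from typing import Dict, List, Tuple


def average_linkage(dist: List[List[float]]) -> List[Tuple[int, int, int]]:
    """Naive average-linkage clustering; returns merges as (left, right, new_label)."""
    n = len(dist)
    members: Dict[int, List[int]] = {i: [i] for i in range(n)}  # label -> leaves
    merges = []
    next_label = n
    while len(members) > 1:
        best = None
        labels = sorted(members)
        for idx, a in enumerate(labels):
            for b in labels[idx + 1:]:
                # mean of all leaf-to-leaf distances between the two groups
                total = sum(dist[i][j] for i in members[a] for j in members[b])
                avg = total / (len(members[a]) * len(members[b]))
                if best is None or avg < best[0]:  # strict '<' keeps the first tie
                    best = (avg, a, b)
        _, a, b = best
        members[next_label] = members.pop(a) + members.pop(b)
        merges.append((a, b, next_label))
        next_label += 1
    return merges


if __name__ == "__main__":
    D = [[0, 1, 2, 4, 4], [1, 0, 2, 4, 4], [2, 2, 0, 2, 2],
         [4, 4, 2, 0, 1], [4, 4, 2, 1, 0]]
    for left, right, new in average_linkage(D):
        print("Align", left, "with", right, "to give", new)
```

This recomputes every group-to-group average on each pass, which is simple but slower than library implementations; it is still quick enough for a few hundred sequences.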
The small example below assumes five objects numbered 0 through 4, with the distances between them given by the array D. Here is code that does the clustering, with some hints on how to use the result as a guide tree:
from sklearn.cluster import AgglomerativeClustering
import numpy as np

# distances between five objects (symmetric matrix with a zero diagonal)
D = np.array([[0, 1, 2, 4, 4],
              [1, 0, 2, 4, 4],
              [2, 2, 0, 2, 2],
              [4, 4, 2, 0, 1],
              [4, 4, 2, 1, 0]])
# metric='precomputed' means D already holds the distances
# (the metric argument needs scikit-learn >= 1.2; older versions called it affinity)
model = AgglomerativeClustering(linkage='average', metric='precomputed',
                                distance_threshold=None)
# cluster the objects based on the distances
cluster = model.fit(D)
n_objects = len(cluster.labels_)  # this is just the number of sequences
# cluster.children_ gives the hierarchical clustering
# leaves are the original objects and are labelled from 0 to 4 here
# the internal nodes in the cluster are labelled from 5 onwards
next_node = n_objects
for merge in cluster.children_:
    print("Align", merge[0], "with", merge[1], "to give", next_node)
    next_node += 1
print("done")
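For part3b.py, the option suggested above (start from the closest pair, then add the remaining sequences in order of their average distance from those already in the profile) only needs the Kimura distance matrix to fix the order. A minimal sketch follows, assuming "average distance" means the mean distance to every sequence already added and that ties go to the lowest index; `closest_pair_order` is a hypothetical name, and the profile alignment itself would come from the code you are given.

```python
from typing import List


def closest_pair_order(dist: List[List[float]]) -> List[int]:
    """Order in which to add sequences: closest pair first, then the rest by
    smallest mean distance to the sequences already in the profile."""
    n = len(dist)
    if n < 2:
        return list(range(n))
    # min() returns the first minimal pair, so ties go to the lowest indices
    i, j = min(((a, b) for a in range(n) for b in range(a + 1, n)),
               key=lambda p: dist[p[0]][p[1]])
    order = [i, j]
    remaining = [k for k in range(n) if k not in order]
    while remaining:
        # mean distance from each candidate to everything already in the profile
        nxt = min(remaining,
                  key=lambda k: sum(dist[k][m] for m in order) / len(order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


if __name__ == "__main__":
    print(closest_pair_order([[0, 5, 1], [5, 0, 9], [1, 9, 0]]))
```

Because the order is recomputed against the growing profile at each step, it can differ from the merge order of the guide tree, which is what makes it a genuine alternative to 3(a).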
Marking criteria
You are expected to write this code yourself. Collusion and plagiarism will be checked for.
Marks for each part are as follows:
- correctness of code (70% weight) -- does the code produce the right output, and does it look likely
it will work for all inputs?
- readability and clarity of code (30% weight). Please include ample comments that explain how your code works.
