
Laboratory Manual BTY177: Matlab

The document provides guidelines for a MATLAB laboratory manual. It outlines general instructions for students, including things to bring (the lab manual), dos (write all points discussed, keep manual organized), and don'ts (no sharing manuals, talking in lab, copying results). It also provides an overview of the 10 experiments to be completed in MATLAB, including introductions to MATLAB data structures, file input/output, plotting, script writing, and other tasks.

Uploaded by

Riya Kumari
Copyright: © All Rights Reserved

LABORATORY MANUAL

BTY177

MATLAB
(For private circulation only)

Name of the Student………………………………………..……………………………………..

Registration Number/Roll No…………………………………………………………….……….

Section and Group………………………………………………………………...………………..

School of Bioengineering and Biosciences

LMBTY177
General Guidelines for the students:

The first practical session, and the first practical session after the MTE, are reserved for introducing the experiments to be performed before and after the MTE respectively.

Students are expected to follow the instructions attentively and, before coming to each subsequent lab, to browse through the relevant databases and tools.

Students should attend the theory classes so that they clearly understand the theory behind the experiments. They should also read the prescribed books and carefully observe and understand the database structures demonstrated in class.

Compulsory things to be carried by the students in lab: Lab manual.

Do’s:

1. Do write down all the points discussed in lab.
2. Bring the lab manual containing all experiments, along with the front page and table of contents.
3. Keep your cell phone switched off.
4. Always be punctual to lab; otherwise no attendance will be awarded.

Don’ts:

1. Don’t share or copy lab manuals.
2. Don’t talk among yourselves.
3. Don’t copy results.
4. Don’t roam around in the lab.
5. Don’t open restricted sites in the lab.

Table of Contents

Sr No:  Title of Experiment

1. Introduction to Matlab data structure and perform basic arithmetic operations on matrices.
2. To read and write in external file using Matlab
3. Plotting 1d, 2d and subplot graphs in Matlab
4. Writing scripts in Matlab including loops and logical operators.
5. To transcribe and translate biological sequence
6. To design a Matlab script for Dotplot algorithm
7. To perform global and local sequence alignment
8. To perform K means clustering on any dataset
9. To design Matlab script for random walk model simulation
10. To determine the value of pi using monte-carlo simulation method

Reference Book(s):
1. Mastering MATLAB 7 by Bruce L. Littlefield and Duane C. Hanselman, Pearson

Experiment 1
1. Experiment: Introduction to Matlab data structure and perform basic arithmetic operations on matrices.

2. Apparatus required: Computer with Matlab installed

3. Learning objective(s): To understand the basics of Matlab and its data structures, and to perform mathematical operations on matrices.

4. Background of the topic: MATLAB (matrix laboratory) is a multi-paradigm numerical computing environment and proprietary programming language developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, C#, Java, Fortran and Python. Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing abilities. An additional package, Simulink, adds graphical multi-domain simulation and model-based design for dynamic and embedded systems.

Matlab's graphical interface is written in Java and should look similar on any OS. It is divided into 4 main parts:

[Figure: MATLAB desktop with the four parts labelled a to d]

a. Command Window: this is where you type commands. Output or error messages usually appear here too.
b. Workspace window: as you define new variables, they should be listed here.
c. Command History window: this is where past commands are remembered. If you want to re-run a previous command or edit it, you can drag it from this window to the Command Window, or double-click it to re-run it.
d. Current Directory window: shows the files in the Current Directory.

5. Outline of the Procedure:


a. Creating Matrices: Everything in MATLAB is stored as a matrix or an array. To
create a 1x1 array, or a scalar, you can write:
a = 5
To create a 4 x 4 matrix, you can use the following syntax.
b = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
or
b = [
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
]
• opening and closing square brackets are required for matrices larger than 1x1
• entries in a row are separated by white space or commas
• semicolons (;) mark the ends of rows in the matrix
• if you end any of the above statements with a semicolon (;), you will notice that MATLAB does not echo the matrix to the display
• if you want to remind yourself of what is in b, you can type b and press Enter

You can also use MATLAB functions to generate matrices. The following table summarizes a few of these functions:

Function     Description
zeros(i,j)   creates an i x j matrix of zeros
ones(i,j)    creates an i x j matrix of ones
rand(i,j)    creates an i x j matrix of uniformly distributed random elements (between 0 and 1)
randn(i,j)   creates an i x j matrix of random elements drawn from a normal distribution with mean 0 and standard deviation 1
eye(i)       creates an i x i identity matrix (a matrix of zeros with ones on the diagonal)

In the above table, i is the number of rows and j is the number of columns. Be careful, this may confuse you if you are used to thinking in terms of x and y.
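The generator functions in the table can be tried out with a short script. This is a minimal sketch; the variable names and sizes are arbitrary choices made for illustration.

```matlab
% Sketch: build matrices with the generator functions and check their shapes.
Z = zeros(2, 3);        % 2 rows, 3 columns of zeros
O = ones(3, 2);         % 3 rows, 2 columns of ones
E = eye(3);             % 3 x 3 identity matrix
R = rand(2, 2);         % uniformly distributed values between 0 and 1
G = randn(2, 2);        % normally distributed values (mean 0, std 1)

disp(size(Z))           % size reports the number of rows first, then columns
disp(E)
```

Because rand and randn return different numbers on every call, only their sizes and value ranges are predictable, not their contents.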

b. Basic matrix operations: Given b (below), the following table summarizes a few basic matrix operations and their results:

b =
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16

Function   Description                                                Result (ans)
sum(b)     Sums the values of each column*                            [28 32 36 40]
diag(b)    Returns the elements along the main diagonal               [1; 6; 11; 16]
b'         Transpose operator. The first column becomes the first     [1 5 9 13; 2 6 10 14; 3 7 11 15; 4 8 12 16]
           row, the second column becomes the second row, and so on
max(b)     Returns the maximum value of each column*                  [13 14 15 16]
min(b)     Returns the minimum value of each column*                  [1 2 3 4]
mean(b)    Returns the average of each column                         [7 8 9 10]
mean2(b)   Returns the average of all the numbers in the matrix       8.5000
           (requires the Image Processing Toolbox)

*It is great to know the maximum, minimum or sum of each column, but sometimes you want it for the entire matrix. If you have a multidimensional matrix, you can turn it into a single-column matrix using the colon notation b(:), which stacks all the columns of b into one column. For instance, to get the sum of all the values in b, you can write:

sum(b(:))
which generates the answer below:

ans =
136
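The same b(:) idiom works for the other column-wise functions. The sketch below applies it to the matrix b used above; the variable names are illustrative, and mean(b(:)) is shown as a toolbox-free way to obtain the value that mean2(b) returns.

```matlab
% Sketch: column-wise, row-wise and whole-matrix reductions on b.
b = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16];

colSums     = sum(b);        % one sum per column: [28 32 36 40]
rowSums     = sum(b, 2);     % one sum per row:    [10; 26; 42; 58]
total       = sum(b(:));     % 136
largest     = max(b(:));     % 16
smallest    = min(b(:));     % 1
overallMean = mean(b(:));    % 8.5
```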
6. Result Required: Output should be obtained at the command window without errors or warnings.
7. Scope of results: Students will be able to understand the basic data structures in Matlab and perform simple functions on them.
8. Caution: Use the semicolon (;) judiciously: at the end of a statement it suppresses the output, and inside square brackets it ends a matrix row.
Suggested readings for students:
Websites
ftp://www.cs.uregina.ca/pub/class/425/MatlabIntro/lesson.html

Worksheet of the student
Date of Performance: Registration Number:

Aim:

Observations/code generated:

Remarks/error(s) obtained (if any):

Result and Discussion

Learning Outcome

To be filled by faculty

S. No.  Parameter                                                    Marks Obtained  Max. Marks
1.      Understanding of the student about the procedure/apparatus                   20
2.      Observations and analysis including learning outcomes                        20
3.      Completion of the experiment, discipline and cleanliness                     10
4.      Signature of the faculty

Experiment 2
1. Experiment: To read and write in external file using Matlab.

2. Apparatus required: Computer with Matlab installed

3. Learning objective(s): To understand the basic concepts of file handling in Matlab.

4. Background of the topic: File handling is an important part of any programming language. Abstractly, a file is a collection of bytes stored on a secondary storage device, which is generally a disk of some kind. The collection of bytes may be interpreted, for example, as characters, words, lines, paragraphs and pages from a textual document; fields and records belonging to a database; or pixels from a graphical image. The meaning attached to a particular file is determined entirely by the data structures and operations used by a program to process the file. A file is simply a machine-readable storage medium where programs and data are stored for machine usage. Essentially, there are two kinds of files that programmers deal with: text files and binary files. Of these two classes, text files are the more relevant to biological data and hence will be discussed in the experiment. A text file is a stream of characters that a computer processes sequentially, and only in the forward direction. For this reason a text file is usually opened for only one kind of operation (reading, writing, or appending) at any given time.

5. Outline of the procedure: On a Windows platform, MATLAB's xlsread and xlswrite open Excel as a COM automation server in the background to read/write data. Sometimes users want xlsread and xlswrite to do more, and would like their own custom xlsread or xlswrite functions (such as a custom xlswrite that puts data into multiple worksheets and creates Excel charts at the same time).

Syntax
num = xlsread(filename,sheet,xlRange) reads from the specified worksheet and range.
xlswrite(filename,A,sheet,xlRange) writes to the specified worksheet and range.

Example

% Write a small table to worksheet 2, starting at cell E1
filename = 'testdata.xlsx';
A = {'Time','Temperature'; 12,98; 13,99; 14,97};
sheet = 2;
xlRange = 'E1';
xlswrite(filename,A,sheet,xlRange)

% Read the first two data rows back from the same worksheet
filename = 'testdata.xlsx';
sheet = 2;
xlRange = 'E2:F3';
subsetA = xlsread(filename,sheet,xlRange)
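In MATLAB R2019a and later, MathWorks recommends writematrix and readmatrix in place of xlswrite and xlsread. The following round trip is a sketch under that assumption; the file name is a temporary, hypothetical one, and the matrix is example data.

```matlab
% Sketch: write a numeric matrix to an Excel file and read it back.
% Assumes MATLAB R2019a or later (writematrix/readmatrix).
M = [1 2 3; 4 5 6; 7 8 9];           % example data
fname = [tempname '.xlsx'];          % hypothetical temporary file name

writematrix(M, fname);               % writes to the first worksheet
back = readmatrix(fname);            % reads the numeric data back

sameData = isequal(M, back);         % true when the round trip preserved the data
delete(fname);                       % remove the temporary file
```

Checking the data after reading it back is a cheap way to confirm that the worksheet and range were the intended ones.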

Problem statement
Read the following data from Excel file(s), add the two matrices, and store the result in a new file.
A = {1, 2, 3 ; 4, 5, 6 ; 7, 8, 9};
B = {1, 2, 3 ; 4, 5, 6 ; 7, 8, 9};

6. Result Required: Creation of three Excel files containing the raw data and the processed data.
7. Scope of results: Students will be able to read data from and write data to Excel files in Matlab.
8. Caution: Files should be stored in permanent storage.
Suggested readings for students:
Websites
https://in.mathworks.com/help/matlab/ref/xlsread.html
https://in.mathworks.com/help/matlab/ref/xlswrite.html

Worksheet of the student
Date of Performance: Registration Number:

Aim:

Observations/code generated:

Remarks/error(s) obtained (if any):

Result and Discussion

Learning Outcome

To be filled by faculty

S. No.  Parameter                                                    Marks Obtained  Max. Marks
1.      Understanding of the student about the procedure/apparatus                   20
2.      Observations and analysis including learning outcomes                        20
3.      Completion of the experiment, discipline and cleanliness                     10
4.      Signature of the faculty

Experiment 3
1. Experiment: Plotting 1d, 2d and subplot graphs in Matlab.

2. Apparatus required: Computer with Matlab installed

3. Learning objective(s): To write scripts for plotting data for analytical purpose.

4. Background of the topic: A graph is a structured drawing that shows how numbers relate to one another. With a computer, a clear, professional-looking graph can be produced quickly. Graphs are useful to every student, not only to those who excel in mathematics. Any kind of analysis benefits from structure, and a graph provides it: graphs are used daily, from stockbrokers tracking prices to companies evaluating performance.

To plot the graph of a function, you need to take the following steps:
1. Define x, by specifying the range of values for the variable x for which the function is to be plotted.
2. Define the function, y = f(x).
3. Call the plot command, as plot(x, y).

5. Outline of the procedure:


The following example demonstrates the concept. Let us plot the simple function y = x for values of x from 0 to 100, with an increment of 5.

Create a script file and type the following code:

x = [0:5:100];
y = x;
plot(x, y)

Next, plot the function y = x^2 for values of x from -100 to 100, with an increment of 20:

x = [-100:20:100];
y = x.^2;
plot(x, y)

MATLAB allows you to add a title, labels along the x-axis and y-axis, and grid lines, and also to adjust the axes to spruce up the graph.
x = [0:0.01:10];
y = sin(x);
plot(x, y), xlabel('x'), ylabel('Sin(x)'), title('Sin(x) Graph'),
grid on, axis equal

Drawing multiple functions on same graph

x = [0 : 0.01: 10];
y = sin(x);
g = cos(x);
plot(x, y, x, g, '.-'), legend('Sin(x)', 'Cos(x)')

Generating subplots

x = [0:0.01:5];
y = exp(-1.5*x).*sin(10*x);
subplot(1,2,1)
plot(x,y), xlabel('x'), ylabel('exp(-1.5x)*sin(10x)'), axis([0 5 -1 1])
y = exp(-2*x).*sin(10*x);
subplot(1,2,2)
plot(x,y), xlabel('x'), ylabel('exp(-2x)*sin(10x)'), axis([0 5 -1 1])
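The call subplot(m,n,p) divides the figure into an m-by-n grid and makes panel p the current axes, with panels counted row by row from the top left. The sketch below fills a 2-by-2 grid in a loop; the decay rates are assumed values chosen only for illustration.

```matlab
% Sketch: a 2-by-2 grid of damped sine curves, one panel per decay rate.
x = 0:0.01:5;
rates = [0.5 1 1.5 2];                    % illustrative decay rates (assumed)
fig = figure('Visible', 'off');           % off-screen; drop the option to show the window

for p = 1:numel(rates)
    subplot(2, 2, p)                      % 2 rows, 2 columns, panel p
    plot(x, exp(-rates(p)*x) .* sin(10*x))
    title(sprintf('decay rate = %.1f', rates(p)))
    axis([0 5 -1 1])
end

nAxes = numel(findobj(fig, 'Type', 'axes'));   % one axes object per panel
```

Using the same axis limits in every panel makes the curves directly comparable.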

6. Result Required: Plots and subplots


7. Scope of results: Students will be able to learn data visualization via plotting techniques.
8. Caution: When using subplots, make sure you understand the subplot(m,n,p) syntax: m rows, n columns, and panel index p.

Suggested readings for students:


Websites
https://www.tutorialspoint.com/matlab/matlab_plotting.htm

Worksheet of the student
Date of Performance: Registration Number:

Aim:

Observations/code generated:

Remarks/error(s) obtained (if any):

Result and Discussion

Learning Outcome

To be filled by faculty

S. No.  Parameter                                                    Marks Obtained  Max. Marks
1.      Understanding of the student about the procedure/apparatus                   20
2.      Observations and analysis including learning outcomes                        20
3.      Completion of the experiment, discipline and cleanliness                     10
4.      Signature of the faculty

Experiment 4
1. Experiment: Writing scripts in Matlab including loops and logical operators.

2. Apparatus required: Computer with Matlab installed

3. Learning objective(s): To understand the basics of programming, including loops and logical operators.

4. Background of the topic: A relational operator compares two numbers by determining whether a comparison statement is true or false. If the statement is true, it is assigned a value of 1, and a value of 0 otherwise.

Relational Operators
• < less than
• > greater than
• <= less than or equal to
• >= greater than or equal to
• == equal to
• ~= not equal to

Logical Operators
• & AND operates on two operands (A and B). If both are true, the result is true (1); otherwise it is false (0).
• | OR operates on two operands (A and B). If either one or both are true, the result is true (1); otherwise (both are false) the result is false (0).
• ~ NOT operates on one operand. It gives the opposite of the operand.

Logical operators have numbers as operands. A nonzero number is true, and zero is false. Like relational operators, logical operations can be used with scalars and arrays.
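Applied to an array, these operators work element by element and return an array of 0s and 1s. A minimal sketch, with an example vector chosen for illustration:

```matlab
% Sketch: relational and logical operators applied element-wise.
a = [1 5 0 -2];

isPositive = a > 0;                % [1 1 0 0]
isNonzero  = a ~= 0;               % [1 1 0 1]
inRange    = (a > 0) & (a < 5);    % [1 0 0 0]  both conditions must hold
outside    = (a < 0) | (a > 4);    % [0 1 0 1]  either condition may hold
isZero     = ~a;                   % [0 0 1 0]  nonzero counts as true, so NOT flips it
```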
Conditional Statements
A conditional statement is a command that allows MATLAB to decide whether to execute a group of commands that follow the conditional statement, or to skip these commands.
• The conditional statements are logical expressions that control the operation of the if construct.
• If CONDITIONAL_STATEMENT1 is true (nonzero), then the program executes the statements in the first block and skips to the end.
• Otherwise, the program checks the result of CONDITIONAL_STATEMENT2. If it is true, then the program executes the statements in the second block and skips to the end.
• If all conditional statements are false (zero), then the program executes the statements in the block associated with the else clause.
• There can be any number of elseif clauses (0 or more) in an if construct, but there can be at most one else clause.
• If all conditional statements are false and there is no else clause, then the program skips to the end without executing any part of the if construct.
• Note that the keyword end in this construct is completely different from the end used for indexing arrays (to refer to the last entry of a vector). MATLAB can tell the difference between the two usages of end from the context.
Loops
Loops are MATLAB constructs that permit us to execute a sequence of statements more than once. MATLAB has two kinds of loops: for loops and while loops. The major difference between the two is in how the repetition is controlled. The code in a for loop is repeated a specified number of times, and the number of repetitions is known before the loop starts. The code in a while loop, on the other hand, is repeated an indefinite number of times, for as long as a user-specified condition remains true.
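The difference between the two loop types can be sketched as follows; the numbers are arbitrary example values.

```matlab
% for loop: the number of repetitions (5) is known before the loop starts.
total = 0;
for k = 1:5
    total = total + k;        % adds 1 + 2 + 3 + 4 + 5
end

% while loop: repeats as long as the condition is true; the number of
% repetitions depends on the data, not on a fixed count.
value = 100;
n = 0;
while value > 1
    value = value / 2;        % halve until the value is no longer above 1
    n = n + 1;
end
```

Here total ends at 15 and the while loop runs 7 times, since 100 must be halved 7 times before it drops to 1 or below.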
5. Outline of the procedure:
The relational operators in Matlab are <, >, <=, >=, == (equal to), and ~= (not equal to). These are binary operators which return the value 0 (false) or 1 (true) for scalar arguments.

Matlab has a standard if-elseif-else conditional; for example:

>> t = rand(1);
>> if t > 0.75
       s = 0;
   elseif t < 0.25
       s = 1;
   else
       s = 1-2*(t-0.25);
   end
>> s
s =
0
>> t
t =
0.7622

The next example shows how to find the indices of the entries of a vector x that are nonnegative:
x = randi([-3,2],1,5);   % 1-by-5 vector of random integers from -3 to 2
jj = 0;
IND = [];                % indices of the nonnegative entries
for ii=1:length(x)
    if (x(ii) >= 0)
        jj = jj+1;
        IND(jj) = ii;
    end
end
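The loop above can be compared with MATLAB's built-in find function, which returns the indices of the nonzero (true) entries of its argument. The sketch below uses a fixed example vector (assumed values) so that the result is reproducible.

```matlab
% Sketch: loop-based index search versus the vectorized find.
x = [-3 0 2 -1 1];                 % fixed example vector (assumed values)

IND_loop = [];
for ii = 1:length(x)
    if x(ii) >= 0
        IND_loop(end+1) = ii;      %#ok<AGROW> append the index of a nonnegative entry
    end
end

IND_find = find(x >= 0);           % same indices without an explicit loop
```

Both approaches give the indices 2, 3 and 5 for this vector; the loop version is useful for practising for and if, while find is the idiomatic choice in everyday code.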

Problem statement

Create a 100-element vector containing the values 1,2,...,100. Then take the square root of
all elements whose value is greater than 30 using a for loop and if construct.

6. Result Required: Program involving if and for statements.


7. Scope of results: Students will be able to learn utilization of conditional statement and
loops.
8. Caution: Nest loops with care, and make sure every while loop has a condition that eventually becomes false; otherwise the program may get stuck in an infinite loop.

Suggested readings for students:


Websites
http://jitkomut.eng.chula.ac.th/matlab/loop.html

Worksheet of the student
Date of Performance: Registration Number:

Aim:

Observations/code generated:

Remarks/error(s) obtained (if any):

Result and Discussion

Learning Outcome

To be filled by faculty

S. No.  Parameter                                                    Marks Obtained  Max. Marks
1.      Understanding of the student about the procedure/apparatus                   20
2.      Observations and analysis including learning outcomes                        20
3.      Completion of the experiment, discipline and cleanliness                     10
4.      Signature of the faculty

Experiment 5
1. Experiment: To transcribe and translate biological sequence.

2. Apparatus required: Computer with Matlab installed

3. Learning objective(s): i) Students will be able to understand the central dogma.
ii) The use of computers in basic bioinformatics will be highlighted.

4. Background of the topic: i) Transcription is the first step of gene expression, in which a particular segment of DNA is copied into RNA (especially mRNA) by the enzyme RNA polymerase. Both DNA and RNA are nucleic acids, which use base pairs of nucleotides as a complementary language. During transcription, a DNA sequence is read by an RNA polymerase, which produces a complementary, antiparallel RNA strand called a primary transcript.
At the sequence level, RNA differs from DNA in that uracil appears wherever DNA has thymine. A simple computational transcription therefore only needs to substitute every thymine with uracil.
ii) In translation, messenger RNA (mRNA) is decoded in a ribosome, outside the nucleus, to produce a specific amino acid chain, or polypeptide. The mRNA carries genetic information encoded as a ribonucleotide sequence from the chromosomes to the ribosomes. The ribonucleotides are "read" by translational machinery in a sequence of nucleotide triplets called codons. Each of those triplets codes for a specific amino acid. In a program, an RNA sequence is converted into a protein sequence by substituting each triplet of RNA characters with the character of the equivalent amino acid. An associative array (also known as a hash) can be used to associate the triplet characters with the amino acid characters.
The associative array corresponding to the codon table maps the triplets to the 20 amino acid characters.
[Table: triplet codon table, not reproduced in this copy]
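The substitution and lookup described above can be sketched in plain MATLAB without any toolbox, using containers.Map as the associative array. The sequence is a made-up example, and only four codons are entered here; a complete table has 64 entries.

```matlab
% Sketch: transcription by substitution, translation by codon lookup.
% The map below is deliberately partial (4 of the 64 codons).
codonMap = containers.Map({'AUG', 'UUU', 'GGC', 'UAA'}, {'M', 'F', 'G', '*'});

dna = 'ATGTTTGGCTAA';                  % made-up example sequence
rna = strrep(dna, 'T', 'U');           % transcription: replace T with U

protein = '';
for k = 1:3:length(rna) - 2            % step through the sequence one codon at a time
    aa = codonMap(rna(k:k+2));         % look up the amino acid character
    if aa == '*'                       % '*' marks a stop codon
        break
    end
    protein(end+1) = aa;               %#ok<AGROW>
end
disp(protein)
```

For this sequence the codons are AUG, UUU, GGC and UAA, so the translated protein is MFG.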

5. Outline of the procedure: SeqRNA = dna2rna(SeqDNA) converts a DNA sequence to an RNA sequence by converting any thymine nucleotides (T) in the DNA sequence to uracil nucleotides (U). The RNA sequence is returned in the same format as the DNA sequence. For example, if SeqDNA is a vector of integers, then so is SeqRNA.

dna = randseq(100)
rna = dna2rna(dna)

If you wish to get the DNA sequence back, use rna2dna.

SeqAA = nt2aa(SeqNT) converts a nucleotide sequence, specified by SeqNT, to an amino acid sequence, returned in SeqAA, using the standard genetic code.

SeqAA = nt2aa(SeqNT, ...'PropertyName', PropertyValue, ...) calls nt2aa with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:

SeqAA = nt2aa(..., 'Frame', FrameValue, ...) converts a nucleotide sequence for a specific reading frame to an amino acid sequence. Choices are 1, 2, 3, or 'all'. Default is 1. If FrameValue is 'all', then output SeqAA is a 3-by-1 cell array.

SeqAA = nt2aa(..., 'GeneticCode', GeneticCodeValue, ...) specifies a genetic code to use when converting a nucleotide sequence to an amino acid sequence. GeneticCodeValue can be an integer or string specifying a code number or code name from the table Genetic Code. Default is 1 or 'Standard'. The amino acid to nucleotide codon mapping for the Standard genetic code is shown in the table Standard Genetic Code.

SeqNT = aa2nt(SeqAA) converts an amino acid sequence, specified by SeqAA, to a nucleotide sequence, returned in SeqNT, using the standard genetic code.
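A short session tying these functions together might look like the sketch below. It assumes the Bioinformatics Toolbox is installed (the check on nt2aa skips the calls otherwise), and the DNA sequence is a made-up 12-nucleotide example rather than real data.

```matlab
% Sketch: transcribe and translate a short sequence with toolbox functions.
dna = 'ATGTTTGGCTAA';                  % made-up example sequence (4 codons)

if exist('nt2aa', 'file')              % run only when the Bioinformatics Toolbox is available
    rna  = dna2rna(dna);               % T replaced by U
    back = rna2dna(rna);               % U replaced by T again
    aa   = nt2aa(dna);                 % translation, reading frame 1, standard genetic code
    disp(rna)
    disp(aa)
end
```

For the problem statement below, the same calls would be applied to each downloaded sequence inside a loop.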
Problem statement
Download 10 genes from NCBI, then transcribe each one and translate it into a protein sequence automatically.
6. Result Required: Translated protein sequence(s).
7. Scope of results: Students will be able to learn automation using the Bioinformatics Toolbox in Matlab.
8. Caution: Choose file names carefully so that the downloaded sequences can be fetched reliably.
Suggested readings for students:
Websites
https://in.mathworks.com/help/bioinfo/ref/dna2rna.html
https://in.mathworks.com/help/bioinfo/ref/nt2aa.html
Worksheet of the student
Date of Performance: Registration Number:

Aim:

Observations/code generated:

Remarks/error(s) obtained (if any):

Result and Discussion

Learning Outcome

To be filled by faculty

S. No.  Parameter                                                    Marks Obtained  Max. Marks
1.      Understanding of the student about the procedure/apparatus                   20
2.      Observations and analysis including learning outcomes                        20
3.      Completion of the experiment, discipline and cleanliness                     10
4.      Signature of the faculty

Experiment 6
1. Experiment: To design a Matlab script for Dotplot algorithm.

2. Apparatus required: Computer with Matlab installed

3. Learning objective(s): i) Analyze sequences by pairwise comparison and plotting.
ii) Understand the basics of Bioinformatics.

4. Background of the topic: The dot plot technique dates back to the late 1960s and early
1970s (Fitch 1969, Gibbs & McIntyre 1970) and was developed in the field of genetics.
Today, the most powerful application for dot plots is still the domain where they initially
emerged from (e.g. Gibbs & McIntyre 1970, Maizel & Lenk 1981, Staden 1982, Brown
et al. 1995, Dunham et al. 1999). Since the early approaches more sophisticated software
tools have been designed (e.g. Staden 1982, Junier & Pagni 2000, Krumsiek et al. 2007)
and continuing technological progress allows for increasingly faster and more complex
analyses. Several state of the art programs have implemented dot plot modules for
homology comparisons and pattern recurrence detection (e.g. STADEN, Geneious,
DNAstar, BioEdit). Dot plot applications are particularly useful in the identification of
interspersed repeats such as transposons and tandem-repeat motifs such as microsatellites
(Leese et al. in press). Furthermore, loss or gain of whole motifs can easily be spotted in
different types of domains, a trait useful in characterising the evolution of certain protein
families (Beaussart et al. 2007). Dot plots are also employed in the investigation of
properties of protein coding sequences by predicting secondary structures, like stem-loop
formation or structural RNA domains (e.g. Guttell et al. 1993). In the field of DNA-based
watermarks, adapted algorithms can support the identification of patent infringements of
genetically modified organisms during routine screening (e.g. Heider & Barnekow 2007).

5. Outline of the procedure: a) Initiation: Declaration of the sequences and parameters.


b) Comparison of two sequences character by character or via word size.
c) Plotting the positions where the character/words are similar.
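The three steps above can be sketched as a short script. This is one possible implementation under stated assumptions, not a reference solution: the two sequences and the word size w are example values, and words are compared for exact identity.

```matlab
% Sketch: dot plot of two sequences using words of length w.
seq1 = 'GATTACAGATTACA';                 % a) sequences and parameters (example values)
seq2 = 'GATTACA';
w = 3;                                   % word size

n1 = length(seq1) - w + 1;               % number of words in seq1
n2 = length(seq2) - w + 1;               % number of words in seq2
D = false(n2, n1);                       % rows: seq2 positions, columns: seq1 positions

for i = 1:n2                             % b) compare every word pair
    for j = 1:n1
        D(i, j) = strcmp(seq2(i:i+w-1), seq1(j:j+w-1));
    end
end

[r, c] = find(D);                        % c) plot the matching positions
fig = figure('Visible', 'off');          % off-screen; drop the option to show the window
plot(c, r, 'k.');
set(gca, 'YDir', 'reverse');             % put position 1 at the top, as in a matrix
xlabel('Position in sequence 1'); ylabel('Position in sequence 2');
```

Because seq1 contains seq2 twice, the matrix D has two parallel diagonals of matches, seven columns apart. Increasing w removes short chance matches at the cost of missing short similarities.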

Interpretation of plots:
a) A continuous main diagonal shows perfect similarity for symbols with the same indices.
b) Lines parallel to the main diagonal indicate repeated regions in the same reading direction on different parts of the sequences. In the illustrated case a region D is found twice in the sequence (D1, D2, so-called ‘duplications’).
c) Lines perpendicular to the main diagonal indicate palindromic areas. In the illustrated case the sequence is completely palindromic in the displayed area. The Latin sentence ‘SATOR AREPO TENET OPERA ROTAS’ may serve as an example.
d) Partially palindromic sequence. (For DNA sequences this refers to a perfect match of the normal strand with its reverse complement, which is frequently found in transposable elements.)
e) Bold blocks on the main diagonal indicate repetition of the same symbol in both sequences, e.g. (G)50, so-called microsatellite repeats.
f) Parallel lines indicate tandem repeats of a larger motif in both sequences, e.g. (AGCTCTGAC)20, so-called minisatellite patterns. The distance between the diagonals equals the length of the motif.
g) A discontinuous diagonal indicates that the sequences T1 and T2 share a common source. In text analyses this may point to plagiarism; in DNA analyses the sequences may be homologous because of a common ancestor. The number of interruptions increases with the number of modifications to the text, or with the time of independent evolution and the mutation rate.
h) Partial deletion in sequence 1 or insertion in sequence 2, a so-called ‘indel’. In protein coding sequences this is often observed for many different types of domains that were lost or substituted during evolution (Beaussart et al. 2007). Comparing mRNA (cDNA) sequences without introns (T1) against the unspliced DNA sequence (T2) generally yields this picture as well.

Problem statement
Design a dotmatrix plot program for comparing two nucleotide sequences and interpret
the results.

6. Result Required: Functional Dot plot matrix program.


7. Scope of results: Students will be able to compare and analyze biological sequences
using dot matrix plot methods.
8. Caution: The importance of word size as a parameter should be well understood.

Suggested readings for students:
Websites
http://www.code10.info/index.php?option=com_content&view=article&id=64:inroduction-to-dot-plots&catid=52:cat_coding_algorithms_dot-plots&Itemid=76

Worksheet of the student
Date of Performance: Registration Number:

Aim:

Observations/code generated:

Remarks/error(s) obtained (if any):

Result and Discussion

Learning Outcome

To be filled by faculty

S. No.  Parameter                                                    Marks Obtained  Max. Marks
1.      Understanding of the student about the procedure/apparatus                   20
2.      Observations and analysis including learning outcomes                        20
3.      Completion of the experiment, discipline and cleanliness                     10
4.      Signature of the faculty

Experiment 7
1. Experiment: To perform global and local sequence alignment.

2. Apparatus required: Computer with Matlab installed

3. Learning objective(s): i) To acquaint students with pairwise sequence alignment.
ii) To help students understand the difference between global and local alignment.

4. Background of the topic: Global alignment is most appropriate for closely related sequences of about the same length. Here, the alignment is carried out from the beginning to the end of the sequences to find the best possible alignment. The Needleman–Wunsch algorithm is used in bioinformatics to align protein or nucleotide sequences. It was published in 1970 by Saul B. Needleman and Christian D. Wunsch; it uses dynamic programming, and was the first application of dynamic programming to biological sequence comparison. It is sometimes referred to as the optimal matching algorithm. The Smith–Waterman algorithm performs local sequence alignment; that is, it determines similar regions between two strings, such as nucleotide or protein sequences. Instead of looking at the total sequence, the Smith–Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure.

5. Outline of the procedure: The first application of dynamic programming to biological sequence alignment (both DNA and protein) was by Needleman and Wunsch. This and related algorithms have been in use since then for detecting similarities and aligning sequence information from protein families. The dynamic programming algorithm finds the optimal alignment through the construction of a score matrix. The path that led to the score in the last row and column is traced back in reverse to generate the alignment. In the case of the Needleman–Wunsch algorithm this is a global alignment.

Global sequence alignment (Needleman–Wunsch algorithm)

The algorithm consists of three basic steps:
i) Initialization step
ii) Matrix fill step
iii) Traceback step

Initialization step: The first step in the global alignment dynamic programming approach is to create a matrix with M + 1 columns and N + 1 rows, where M and N correspond to the sizes of the sequences to be aligned.

The first row and first column of the matrix can be initially filled with 0.

Matrix fill step: One possible (inefficient) solution of the matrix fill step finds the
maximum global alignment score by starting in the upper left-hand corner of the matrix
and finding the maximal score $M_{i,j}$ for each position in the matrix. To find $M_{i,j}$
for any i,j it is sufficient to know the scores for the matrix positions to the left of,
above and diagonal to i,j. In terms of matrix positions, it is necessary to know
$M_{i-1,j}$, $M_{i,j-1}$ and $M_{i-1,j-1}$.

For each position, $M_{i,j}$ is defined to be the maximum score at position i,j; i.e.

$$
M_{i,j} = \max\left\{
\begin{array}{ll}
M_{i-1,j-1} + S_{i,j} & \text{(match/mismatch in the diagonal)} \\
M_{i,j-1} + w & \text{(gap in sequence \#1)} \\
M_{i-1,j} + w & \text{(gap in sequence \#2)}
\end{array}
\right.
$$

Note that in the example, $M_{i-1,j-1}$ will be red, $M_{i,j-1}$ will be green and $M_{i-1,j}$ will be blue.

Using this information, the score at position 1,1 in the matrix can be calculated. Since the
first residue in both sequences is a G, $S_{1,1} = 2$, and by the assumptions stated earlier,
$w = -2$. Thus, $M_{1,1} = \max[M_{0,0} + 2,\ M_{1,0} - 2,\ M_{0,1} - 2] = \max[2, -2, -2] = 2$.

A value of 2 is then placed in position 1,1 of the scoring matrix. Note that an arrow is
also placed pointing back to the cell that gave the maximum score, $M_{0,0}$.

Moving down the first column to row 2, we can see that there is once again a match in both
sequences. Thus, $S_{1,2} = 2$. So $M_{1,2} = \max[M_{0,1} + 2,\ M_{1,1} - 2,\ M_{0,2} - 2]
= \max[0 + 2,\ 2 - 2,\ 0 - 2] = \max[2, 0, -2] = 2$.
A value of 2 is then placed in position 1,2 of the scoring matrix and an arrow is placed to
point back to $M_{0,1}$, which led to the maximum score.
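
The initialization and matrix fill steps described above can be sketched in base MATLAB as follows. This is only an illustration: the two sequences and the mismatch score of -1 are assumptions, while the match score of 2, the gap penalty w = -2 and the zero-filled first row and column follow the worked example. Because MATLAB indices start at 1, position i,j of the text (column i, row j) is stored in F(j+1, i+1).

```matlab
% Sketch of the initialization and matrix fill steps (base MATLAB, no toolbox).
seq1 = 'GAATTCAGTTA';      % assumed sequence #1, one residue per column
seq2 = 'GGATCGA';          % assumed sequence #2, one residue per row
matchScore    =  2;        % S for a match (from the worked example)
mismatchScore = -1;        % S for a mismatch (assumed value)
w             = -2;        % gap penalty (from the worked example)

M = numel(seq1);
N = numel(seq2);
F = zeros(N + 1, M + 1);   % initialization: first row and column are 0

for r = 2:N + 1            % row r holds residue r-1 of seq2
    for c = 2:M + 1        % column c holds residue c-1 of seq1
        if seq1(c - 1) == seq2(r - 1)
            s = matchScore;
        else
            s = mismatchScore;
        end
        F(r, c) = max([F(r - 1, c - 1) + s, ...  % diagonal: match/mismatch
                       F(r - 1, c) + w, ...      % from above: gap in sequence #1
                       F(r, c - 1) + w]);        % from the left: gap in sequence #2
    end
end

disp(F)                    % F(2,2) and F(3,2) are positions 1,1 and 1,2 of the text
```

With these assumed inputs, F(2,2) and F(3,2) both evaluate to 2, in agreement with the two cells worked out by hand above.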

The rest of the score matrix can then be filled in. The completed score matrix will be as
follows:

Traceback step: After the matrix fill step, the maximum global alignment score for the
two sequences is 3. The traceback step determines the actual alignment(s) that result
in the maximum score. It begins at position M,N of the matrix, i.e. the bottom right-hand
cell, where both sequences have been aligned over their full length.

Since we have kept pointers back to all possible predecessors, the traceback step is simple.
At each cell, we look to see where we move next according to the pointers. To begin, the
only possible predecessor is the diagonal match.

This gives us an alignment of
A
|
A

Note that the blue letters and gold arrows indicate the path leading to the maximum score.

We can continue to follow the path using a single pointer until we get to the following
situation.

The alignment at this point is

T C A G T T A
| |   |     |
T C _ G _ _ A

Alignment score = Match score * #matches + Mismatch score * #mismatches + Gap penalty * #gaps
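
As a small worked example of this formula, the sketch below scores the partial alignment shown above. The match score of 2 and gap penalty of -2 are taken from the matrix fill example; the mismatch score of -1 is an assumption (it is not used here, because this fragment contains no mismatches). The variable names are illustrative only.

```matlab
% Score an aligned pair of strings; '_' marks a gap.
top    = 'TCAGTTA';
bottom = 'TC_G__A';
matchScore    =  2;    % from the worked example
mismatchScore = -1;    % assumed value
gapPenalty    = -2;    % from the worked example

isGap      = (top == '_') | (bottom == '_');
isMatch    = ~isGap & (top == bottom);
isMismatch = ~isGap & (top ~= bottom);

score = matchScore * sum(isMatch) ...
      + mismatchScore * sum(isMismatch) ...
      + gapPenalty * sum(isGap);

fprintf('%d matches, %d mismatches, %d gaps, score = %d\n', ...
        sum(isMatch), sum(isMismatch), sum(isGap), score);
```

For this fragment there are 4 matches, 0 mismatches and 3 gaps, so the score is 2*4 + (-2)*3 = 2.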
Matlab way of computing global sequence alignment
[Score, Alignment] = nwalign(Seq1,Seq2)
Returns the optimal global alignment score in Score and, in Alignment, a 3-by-N character
array showing the two sequences, Seq1 and Seq2, in the first and third rows, and symbols
representing the optimal global alignment for them in the second row. The symbol |
indicates amino acids or nucleotides that match exactly. The symbol : indicates amino
acids or nucleotides that are related as defined by the scoring matrix (nonmatches with a
zero or positive scoring matrix value). nwalign is part of the Bioinformatics Toolbox.
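
A minimal usage sketch of nwalign is given below. It assumes that the Bioinformatics Toolbox is installed; the two short nucleotide sequences are made up for illustration, and the 'Alphabet','NT' option tells nwalign to treat them as nucleotides rather than amino acids (the default).

```matlab
% Global alignment of two short nucleotide sequences with nwalign.
% Requires the Bioinformatics Toolbox.
seq1 = 'GAATTCAGTTA';    % assumed example sequence
seq2 = 'GGATCGA';        % assumed example sequence

[score, alignment] = nwalign(seq1, seq2, 'Alphabet', 'NT');

disp(score)              % optimal global alignment score
disp(alignment)          % rows: seq1, match symbols, seq2
```

The scores returned by nwalign depend on its default scoring matrix and gap penalties, so they will generally differ from the hand-calculated example above.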
Local sequence alignment (Smith–Waterman algorithm)
The Smith–Waterman algorithm performs local sequence alignment; that is, it
determines similar regions between two nucleic acid sequences or protein
sequences. Instead of looking at the entire sequence, the Smith–Waterman algorithm
compares segments of all possible lengths and optimizes the similarity measure.
The main difference from the Needleman–Wunsch algorithm is that negative scoring matrix
cells are set to zero, which renders the (thus positively scoring) local alignments visible.

The traceback procedure starts at the highest scoring matrix cell and proceeds until a cell
with score zero is encountered, yielding the highest scoring local alignment.
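
The difference from the global case can be sketched in base MATLAB as follows: the same matrix fill is used, but every cell is floored at zero, and the traceback would start from the largest cell rather than from the bottom right-hand corner. The sequences and scores are assumptions chosen only for illustration.

```matlab
% Sketch of the Smith-Waterman matrix fill (base MATLAB, no toolbox).
seq1 = 'GAATTCAGTTA';      % assumed sequence #1, one residue per column
seq2 = 'GGATCGA';          % assumed sequence #2, one residue per row
matchScore    =  2;        % assumed scores
mismatchScore = -1;
w             = -2;        % gap penalty

H = zeros(numel(seq2) + 1, numel(seq1) + 1);

for r = 2:size(H, 1)
    for c = 2:size(H, 2)
        if seq1(c - 1) == seq2(r - 1)
            s = matchScore;
        else
            s = mismatchScore;
        end
        H(r, c) = max([0, ...                    % negative cells are set to zero
                       H(r - 1, c - 1) + s, ...  % diagonal: match/mismatch
                       H(r - 1, c) + w, ...      % from above: gap
                       H(r, c - 1) + w]);        % from the left: gap
    end
end

% The traceback would start at the highest scoring cell
[bestScore, linearIdx] = max(H(:));
[rBest, cBest] = ind2sub(size(H), linearIdx);
fprintf('Best local score %d at row %d, column %d\n', bestScore, rBest, cBest);
```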
Matlab way of computing local sequence alignment
[Score, Alignment] = swalign(Seq1, Seq2)
Returns the optimal local alignment score in Score and, in Alignment, a 3-by-N character
array showing the two sequences, Seq1 and Seq2, in the first and third rows, and symbols
representing the optimal local alignment between them in the second row. The symbol |
indicates amino acids or nucleotides that match exactly. The symbol : indicates amino
acids or nucleotides that are related as defined by the scoring matrix (nonmatches with a
zero or positive scoring matrix value). swalign is part of the Bioinformatics Toolbox.
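
A minimal usage sketch of swalign is given below, again assuming the Bioinformatics Toolbox and two made-up nucleotide sequences. The optional third output gives the position in each sequence at which the local alignment starts.

```matlab
% Local alignment of two short nucleotide sequences with swalign.
% Requires the Bioinformatics Toolbox.
seq1 = 'GAATTCAGTTA';    % assumed example sequence
seq2 = 'GGATCGA';        % assumed example sequence

[localScore, localAlignment, start] = swalign(seq1, seq2, 'Alphabet', 'NT');

disp(localScore)         % optimal local alignment score
disp(localAlignment)     % rows: aligned part of seq1, match symbols, aligned part of seq2
disp(start)              % starting position of the alignment in seq1 and in seq2
```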
Problem statement
Download two nucleotide/protein sequences from any database of your choice and compare
them using global and local sequence alignment.
6. Result Required: Functional Dot plot matrix program.
7. Scope of results: Students will be able to compare and analyze biological sequences
using dot matrix plot methods.
8. Caution: The importance of word size as a parameter should be well understood.
Suggested readings for students:
Websites
http://cse.iitkgp.ac.in/conf/CBBH/lectures/HandsOn_MATLAB_Session2/tutorials/DNA%20Sequence%20Alignment.xhtml
https://in.mathworks.com/help/bioinfo/ref/swalign.html
https://in.mathworks.com/help/bioinfo/ref/nwalign.html

Worksheet of the student
Date of Performance: Registration Number:

Aim:

Observations/code generated:

Remarks/error(s) obtained (if any):

Result and Discussion

Learning Outcome

To be filled by faculty

S. No. | Parameter | Marks Obtained | Max. Marks
1. | Understanding of the student about the procedure/apparatus | | 20
2. | Observations and analysis including learning outcomes | | 20
3. | Completion of the experiment, discipline and cleanliness | | 10
4. | Signature of the faculty | |

Experiment 8
1. Experiment: To perform K-means clustering on any dataset.

2. Apparatus required: Computer with Matlab installed

3. Learning objective(s): i) To make students understand the use and utility of clustering.
ii) To enable students to use the built-in functions of Matlab for clustering.

4. Background of the topic: K-means clustering is a type of unsupervised learning, which
is used when you have unlabeled data (i.e., data without defined categories or groups).
The goal of this algorithm is to find groups in the data, with the number of groups
represented by the variable K. The algorithm works iteratively to assign each data point
to one of K groups based on the features that are provided. Data points are clustered based
on feature similarity. The results of the K-means clustering algorithm are:
a) The centroids of the K clusters, which can be used to label new data
b) Labels for the training data (each data point is assigned to a single cluster)
Rather than defining groups before looking at the data, clustering allows you to find and
analyze the groups that have formed organically. The number of groups, K, has to be
chosen before the algorithm is run.
Each centroid of a cluster is a collection of feature values which define the resulting
groups. Examining the centroid feature weights can be used to qualitatively interpret what
kind of group each cluster represents.

Algorithm
The K-means clustering algorithm uses iterative refinement to produce a final result. The
algorithm inputs are the number of clusters K and the data set. The data set is a collection
of features for each data point. The algorithm starts with initial estimates for the K
centroids, which can either be randomly generated or randomly selected from the data
set. The algorithm then iterates between two steps:

1. Data assignment step:

Each centroid defines one of the clusters. In this step, each data point is assigned to its
nearest centroid, based on the squared Euclidean distance. More formally, if $c_i$ is a
centroid in the set of centroids $C$, then each data point $x$ is assigned to a cluster based on

$$\underset{c_i \in C}{\arg\min}\ \mathrm{dist}(c_i, x)^2$$

where dist( · ) is the standard (L2) Euclidean distance. Let the set of data point
assignments for the ith cluster centroid be $S_i$.

2. Centroid update step:

In this step, the centroids are recomputed. This is done by taking the mean of all data
points assigned to that centroid's cluster:

$$c_i = \frac{1}{|S_i|} \sum_{x \in S_i} x$$

The algorithm iterates between steps one and two until a stopping criterion is met (e.g., no
data points change clusters, the sum of the distances is minimized, or some maximum
number of iterations is reached).

This algorithm is guaranteed to converge to a result. The result may be a local optimum
(i.e. not necessarily the best possible outcome), so running the algorithm more than once
with randomized starting centroids may give a better outcome.
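
The two steps above can be sketched in base MATLAB without any toolbox, as shown below. The toy data, the choice K = 2 and the iteration cap are assumptions made only for illustration; the built-in kmeans function used later in the procedure is the practical choice. The expression X - C(k, :) relies on implicit expansion, which requires MATLAB R2016b or later.

```matlab
% Minimal K-means sketch in base MATLAB (illustration only).
rng(1);                                   % fixed seed, for reproducibility only
X = [randn(50, 2); randn(50, 2) + 5];     % assumed toy data: two groups of points
K = 2;                                    % number of clusters (chosen by hand)
maxIter = 100;                            % safety cap on the number of iterations

C = X(randperm(size(X, 1), K), :);        % initial centroids: K random data points
labels = zeros(size(X, 1), 1);

for iter = 1:maxIter
    % 1. Data assignment step: squared Euclidean distance to every centroid
    D = zeros(size(X, 1), K);
    for k = 1:K
        D(:, k) = sum((X - C(k, :)).^2, 2);
    end
    [~, newLabels] = min(D, [], 2);

    % Stopping criterion: no data point changed cluster
    if isequal(newLabels, labels)
        break;
    end
    labels = newLabels;

    % 2. Centroid update step: mean of the points assigned to each cluster
    for k = 1:K
        if any(labels == k)               % an empty cluster keeps its old centroid
            C(k, :) = mean(X(labels == k, :), 1);
        end
    end
end

disp(C)                                   % final centroids, one per row
```

Changing the seed changes the starting centroids, which is a simple way to observe the dependence on initialization mentioned above.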

5. Outline of procedure:
a. Clean and transform your data
Visualize the data to review it before proceeding, for example:

load fisheriris
X = meas(:,3:4);

figure;
plot(X(:,1),X(:,2),'k*','MarkerSize',5);
title 'Fisher''s Iris Data';
xlabel 'Petal Lengths (cm)';
ylabel 'Petal Widths (cm)';

b. Choose K and run the algorithm

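rng(1); % For reproducibility of the random starting centroids
[idx,C] = kmeans(X,3); % Cluster the data into K = 3 groups; C holds the centroids

x1 = min(X(:,1)):0.01:max(X(:,1));
x2 = min(X(:,2)):0.01:max(X(:,2));
[x1G,x2G] = meshgrid(x1,x2);
XGrid = [x1G(:),x2G(:)]; % Defines a fine grid on the plot

% Assign each grid point to its nearest centroid in C
idx2Region = kmeans(XGrid,3,'MaxIter',1,'Start',C);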

c. Review the results

figure;
gscatter(XGrid(:,1),XGrid(:,2),idx2Region,...
[0,0.75,0.75;0.75,0,0.75;0.75,0.75,0],'..');
hold on;
plot(X(:,1),X(:,2),'k*','MarkerSize',5);
title 'Fisher''s Iris Data';
xlabel 'Petal Lengths (cm)';
ylabel 'Petal Widths (cm)';

legend('Region 1','Region 2','Region 3','Data','Location','SouthEast');
hold off;
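
One possible way to quantify how well the clusters agree with the known species is sketched below. It assumes that "cluster accuracy" can be read as cluster purity, i.e. the fraction of flowers that belong to the majority species of their cluster; this interpretation and the variable names are assumptions. The code needs the Statistics and Machine Learning Toolbox for fisheriris and kmeans.

```matlab
% Sketch: cluster purity of K-means on the iris petal measurements.
load fisheriris                       % provides meas (150x4) and species (150x1 cell)
X = meas(:, 3:4);                     % petal length and petal width

rng(1);                               % fixed seed, for reproducibility only
idx = kmeans(X, 3);                   % cluster label (1..3) for every flower

% Convert the species names into integer labels 1..3
[speciesNames, ~, trueLabel] = unique(species);

% counts(k, s) = number of flowers in cluster k that belong to species s
counts = accumarray([idx, trueLabel], 1, [3 3]);

% Purity: share of flowers that fall in the majority species of their cluster
purity = sum(max(counts, [], 2)) / numel(idx);

disp(counts)
fprintf('Cluster purity: %.3f\n', purity);
```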

6. Result Required: Clusters and cluster accuracies.

7. Scope of results: Students will be able to see how data science can be helpful in real-life
scenarios.
8. Caution: An appropriate value of K should be determined beforehand.
Suggested readings for students:
Websites
https://in.mathworks.com/help/stats/kmeans.html
https://www.datascience.com/blog/k-means-clustering

Worksheet of the student
Date of Performance: Registration Number:

Aim:

Observations/code generated:

Remarks/error(s) obtained (if any):

Result and Discussion

Learning Outcome

To be filled by faculty

S. No. | Parameter | Marks Obtained | Max. Marks
1. | Understanding of the student about the procedure/apparatus | | 20
2. | Observations and analysis including learning outcomes | | 20
3. | Completion of the experiment, discipline and cleanliness | | 10
4. | Signature of the faculty | |

Experiment 9
1. Experiment: To design a Matlab script for random walk model simulation.

2. Apparatus required: Computer with Matlab installed

3. Learning objective(s): i) To make students understand the concepts of modeling and
simulation.
ii) Designing a script and function for the purpose of simulation.

4. Background of the topic: A random walk is a mathematical object, known as a stochastic
or random process, that describes a path consisting of a succession of random steps on
some mathematical space such as the integers. An elementary example of a random walk
is the random walk on the integer number line, which starts at 0 and at each step moves
+1 or −1 with equal probability. Other examples include the path traced by a molecule as
it travels in a liquid or a gas, the search path of a foraging animal, the price of a fluctuating
stock and the financial status of a gambler. All of these can be approximated by random walk
models, even though they may not be truly random in reality. As illustrated by those
examples, random walks have applications in many scientific fields including ecology,
psychology, computer science, physics, chemistry, biology and economics.
Random walks explain the observed behaviors of many processes in these fields, and thus
serve as a fundamental model for the recorded stochastic activity.
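
The elementary walk on the integer number line can be sketched in a few lines of base MATLAB. The number of steps and the fixed random seed are assumptions made only for illustration.

```matlab
% Sketch: simple symmetric random walk on the integers, starting at 0.
rng(1);                                   % fixed seed, for reproducibility only
nSteps = 1000;                            % assumed number of steps

steps    = 2 * randi([0 1], nSteps, 1) - 1;   % each step is +1 or -1, equally likely
position = [0; cumsum(steps)];                % position after 0, 1, ..., nSteps steps

fprintf('Final position after %d steps: %d\n', nSteps, position(end));
```

The vector position can be plotted against the step number with plot(0:nSteps, position) to see the path of the walker.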

5. Outline of the procedure: A popular random walk model is that of a random walk
on a regular lattice, where at each step the location jumps to another site according to
some probability distribution. In a simple random walk, the location can only jump
to neighboring sites of the lattice, forming a lattice path. In a simple symmetric random
walk on a locally finite lattice, the probabilities of the location jumping to each one
of its immediate neighbors are the same. The best studied example is the random walk
on the d-dimensional integer lattice.

a) We start with the simplest random walk. Take the lattice $\mathbb{Z}^d$ and start at the
origin. At each time step, pick one of the 2d nearest neighbors at random (with equal
probability) and move there.

b) Repeat this step until the walker reaches the edge of the lattice or the maximum
number of steps has been used.
c) Compute the probability of each node in the lattice being visited by the random
walker.
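
Steps a) to c) can be sketched for d = 2 as follows. The lattice half-width L, the number of steps and the fixed seed are assumptions, and the visit fractions are empirical frequencies from a single walk rather than exact probabilities; averaging over many walks would give a better estimate.

```matlab
% Sketch: simple symmetric random walk on a finite 2-D lattice (d = 2).
rng(1);                              % fixed seed, for reproducibility only
nSteps = 1000;                       % maximum number of steps
L = 50;                              % assumed lattice: sites -L..L in x and y
moves = [1 0; -1 0; 0 1; 0 -1];      % the 2d = 4 nearest neighbours

pos = zeros(nSteps + 1, 2);          % row 1 is the origin (0, 0)
visits = zeros(2*L + 1);             % visit counter for every lattice site
visits(L + 1, L + 1) = 1;            % the walker starts at the origin
nDone = 0;

for t = 1:nSteps
    pos(t + 1, :) = pos(t, :) + moves(randi(4), :);   % step a): move to a neighbour
    nDone = t;
    r = pos(t + 1, 2) + L + 1;       % matrix row for the y coordinate
    c = pos(t + 1, 1) + L + 1;       % matrix column for the x coordinate
    visits(r, c) = visits(r, c) + 1;
    if any(abs(pos(t + 1, :)) == L)  % step b): stop at the edge of the lattice
        break;
    end
end
pos = pos(1:nDone + 1, :);           % keep only the steps actually taken

% Step c): empirical fraction of visits received by each site
visitFraction = visits / sum(visits(:));

figure;
plot(pos(:, 1), pos(:, 2), '-');
axis equal;
title(sprintf('2-D random walk, %d steps', nDone));
xlabel('x'); ylabel('y');
```

Because the loop is bounded by nSteps, the simulation always terminates even if the walker never reaches the edge.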

6. Result Required: Graph showing final random walk after 1000 steps.

7. Scope of results: To understand the importance of random processes in biology as
well as other fields.

8. Caution: Make sure the simulation always terminates, e.g. by fixing a maximum number
of steps, so that the random walker does not run indefinitely.

Suggested readings for students:
Websites
http://mathworld.wolfram.com/RandomWalk2-Dimensional.html

Worksheet of the student
Date of Performance: Registration Number:

Aim:

Observations/code generated:

Remarks/error(s) obtained (if any):

Result and Discussion

Learning Outcome

To be filled by faculty

S. No. | Parameter | Marks Obtained | Max. Marks
1. | Understanding of the student about the procedure/apparatus | | 20
2. | Observations and analysis including learning outcomes | | 20
3. | Completion of the experiment, discipline and cleanliness | | 10
4. | Signature of the faculty | |

Experiment 10
1. Experiment: To determine the value of pi using the Monte Carlo simulation method.

2. Apparatus required: Computer with Matlab installed

3. Learning objective(s): i) To make students understand the concepts of modeling and
simulation.
ii) Designing a script and function for the purpose of simulation.

4. Background of the topic: Monte Carlo methods (or Monte Carlo experiments) are a
broad class of computational algorithms that rely on repeated random sampling to obtain
numerical results. Their essential idea is using randomness to solve problems that might
be deterministic in principle. They are often used in physical and mathematical problems
and are most useful when it is difficult or impossible to use other approaches. Monte
Carlo methods are mainly used in three problem classes: optimization, numerical
integration, and generating draws from a probability distribution.
In physics-related problems, Monte Carlo methods are useful for simulating systems with
many coupled degrees of freedom, such as fluids, disordered materials, strongly coupled
solids, and cellular structures. Other examples include modeling phenomena with
significant uncertainty in inputs such as the calculation of risk in business and, in math,
evaluation of multidimensional definite integrals with complicated boundary conditions.
In application to space and oil exploration problems, Monte Carlo–based predictions of
failure, cost overruns and schedule overruns are routinely better than human intuition or
alternative "soft" methods.

5. Outline of procedure: Estimation of Pi

The idea is to simulate random (x, y) points in a 2-D plane whose domain is a square of
side 1 unit. Imagine a circle with the same diameter inscribed in the square. We then
calculate the ratio of the number of points that lie inside the circle to the total
number of generated points. Refer to the image below:

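We know that the area of the square is 1 sq. unit while that of the circle is

$$\text{area of circle} = \pi \left(\frac{1}{2}\right)^2 = \frac{\pi}{4}$$

Now for a very large number of generated points,

$$\frac{\text{area of circle}}{\text{area of square}} = \frac{\#\ \text{points inside the circle}}{\text{total}\ \#\ \text{of points generated}}$$

That is,

$$\pi = 4 \cdot \frac{\#\ \text{points inside the circle}}{\text{total}\ \#\ \text{of points generated}}$$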
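The Algorithm
1. Initialize circle_points, square_points and interval to 0.
2. Generate a random x coordinate, uniformly distributed between 0 and 1.
3. Generate a random y coordinate, uniformly distributed between 0 and 1.
4. Calculate d = x*x + y*y.
5. If d <= 1, increment circle_points (the point lies inside the quarter circle of radius 1
centred on the origin, which covers the same fraction π/4 of the unit square as the
inscribed circle).
6. Increment square_points.
7. Increment interval.
8. If interval < NO_OF_ITERATIONS, repeat from 2.
9. Calculate pi = 4*(circle_points/square_points).
10. Terminate.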
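
The algorithm above can be written in MATLAB roughly as follows. The number of iterations and the fixed seed are assumptions; the variable names mirror circle_points, square_points and interval from the steps above.

```matlab
% Sketch: Monte Carlo estimate of pi (base MATLAB).
rng(1);                        % fixed seed so that the run is repeatable
nIterations  = 1e6;            % NO_OF_ITERATIONS (assumed value)
circlePoints = 0;              % points with x^2 + y^2 <= 1
squarePoints = 0;              % all generated points

for interval = 1:nIterations
    x = rand;                  % random x between 0 and 1
    y = rand;                  % random y between 0 and 1
    d = x*x + y*y;
    if d <= 1
        circlePoints = circlePoints + 1;
    end
    squarePoints = squarePoints + 1;
end

piEstimate = 4 * circlePoints / squarePoints;
fprintf('Estimate of pi after %d points: %.4f\n', squarePoints, piEstimate);
```

The statistical error of the estimate shrinks only in proportion to 1/sqrt(N), so reliable agreement to three decimal places typically needs considerably more points than are used here.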

6. Result Required: The value of pi should be correctly estimated up to 3 decimal
places.

7. Scope of results: To understand numerical estimation using Monte Carlo
simulation methods.

8. Caution: For an accurate estimate, a large number of random points should be generated.
Suggested readings for students:
Websites
https://www.geeksforgeeks.org/estimating-value-pi-using-monte-carlo/

Worksheet of the student
Date of Performance: Registration Number:

Aim:

Observations/code generated:

Remarks/error(s) obtained (if any):

Result and Discussion

Learning Outcome

To be filled by faculty

S. No. | Parameter | Marks Obtained | Max. Marks
1. | Understanding of the student about the procedure/apparatus | | 20
2. | Observations and analysis including learning outcomes | | 20
3. | Completion of the experiment, discipline and cleanliness | | 10
4. | Signature of the faculty | |
