NJ - Corrected Final
Dr. Shivandappa
R V College of Engineering, Bengaluru
Algorithm: NJ (Neighbour Joining)
Illustration of NJ using the sample distance matrix given below (only the upper triangle is shown; the matrix is symmetric).

      H    R    D    C
H     0    5    6    9
R          0    3    6
D               0    5
C                    0
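The sample matrix can be held in code as follows. This is only a minimal sketch: the names `taxa`, `upper` and `dist` are illustrative assumptions, not part of the slides.

```python
# Taxa labels as used on the slide.
taxa = ["H", "R", "D", "C"]

# Upper-triangle distances copied from the sample matrix.
upper = {
    ("H", "R"): 5, ("H", "D"): 6, ("H", "C"): 9,
    ("R", "D"): 3, ("R", "C"): 6,
    ("D", "C"): 5,
}

def dist(a, b):
    """Symmetric lookup: d(a, a) = 0 and d(a, b) = d(b, a)."""
    if a == b:
        return 0
    return upper[(a, b)] if (a, b) in upper else upper[(b, a)]

# Print the full symmetric matrix row by row.
for a in taxa:
    print(a, [dist(a, b) for b in taxa])
```

Storing only the upper triangle mirrors the slide; the helper fills in the lower triangle by symmetry.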
Step 1: For each pair of taxa i, j compute Lij = dij - ((ri + rj)/(n-2)).
To compute Lij we first need the net deviation r(i) of each species 'i' from all other species in the taxa T = {H,R,D,C}, where i is any one of H, R, D, C.
When i=H: r(H) = dHR + dHD + dHC = 5+6+9 = 20
When i=R: r(R) = dHR + dRD + dRC = 5+3+6 = 14
When i=D: r(D) = dHD + dRD + dDC = 6+3+5 = 14
When i=C: r(C) = dHC + dRC + dDC = 9+6+5 = 20

Distance matrix with r(i) appended (rows i, columns j):

      H    R    D    C    r(i)
H     0    5    6    9    20
R          0    3    6    14
D               0    5    14
C                    0    20
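The row sums r(i) can be checked with a few lines of Python. This is a sketch under the assumption that the matrix is stored as a dictionary keyed by unordered pairs; the names `d` and `r` are illustrative.

```python
taxa = ["H", "R", "D", "C"]

# Pairwise distances from the slide, keyed by unordered pair.
d = {frozenset(p): v for p, v in {
    ("H", "R"): 5, ("H", "D"): 6, ("H", "C"): 9,
    ("R", "D"): 3, ("R", "C"): 6, ("D", "C"): 5,
}.items()}

# r(i) is the sum of distances from i to every other taxon.
r = {i: sum(d[frozenset((i, j))] for j in taxa if j != i) for i in taxa}

print(r)
```

Each r(i) is a sum of all distances in row i, so every term is added.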
Step 1 (continued): With T = {H,R,D,C} and n = 4, compute Lij = dij - ((ri + rj)/(n-2)) for each pair of taxa.
When i=R, j=H: LRH = dRH - ((rR + rH)/(n-2)) = 5 - ((14+20)/(4-2)) = 5 - 17 = -12
When i=D, j=H: LDH = dDH - ((rD + rH)/(n-2)) = 6 - ((14+20)/(4-2)) = 6 - 17 = -11
When i=D, j=R: LDR = dDR - ((rD + rR)/(n-2)) = 3 - ((14+14)/(4-2)) = 3 - 14 = -11
When i=C, j=H: LCH = dCH - ((rC + rH)/(n-2)) = 9 - ((20+20)/(4-2)) = 9 - 20 = -11
When i=C, j=R: LCR = dCR - ((rC + rR)/(n-2)) = 6 - ((20+14)/(4-2)) = 6 - 17 = -11
When i=C, j=D: LCD = dCD - ((rC + rD)/(n-2)) = 5 - ((20+14)/(4-2)) = 5 - 17 = -12

Matrix with distances in the upper triangle and Lij in the lower triangle (rows i, columns j):

      H    R    D    C    r(i)
H     0    5    6    9    20
R   -12    0    3    6    14
D   -11  -11    0    5    14
C   -11  -11  -12    0    20
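The six Lij values above can be reproduced with the short sketch below. The dictionary layout and the names `d`, `r` and `L` are assumptions made for illustration only.

```python
from itertools import combinations

taxa = ["H", "R", "D", "C"]
d = {frozenset(p): v for p, v in {
    ("H", "R"): 5, ("H", "D"): 6, ("H", "C"): 9,
    ("R", "D"): 3, ("R", "C"): 6, ("D", "C"): 5,
}.items()}

n = len(taxa)
r = {i: sum(d[frozenset((i, j))] for j in taxa if j != i) for i in taxa}

# L(i, j) = d(i, j) - (r(i) + r(j)) / (n - 2) for every unordered pair.
L = {(i, j): d[frozenset((i, j))] - (r[i] + r[j]) / (n - 2)
     for i, j in combinations(taxa, 2)}

for pair, value in L.items():
    print(pair, value)
```

Since Lij is symmetric in i and j, computing each unordered pair once is enough to fill the lower triangle.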
Step 2: Pick the pair of taxa with the smallest Lij and join them into a cluster W1, i.e., W1←{i,j}. In this case the smallest Lij is -12, reached by both (R,H) and (C,D); the pair i=R, j=H is chosen here, so W1←{R,H}.
(Figure: H and R joined at node W1.)
Remove the members of W1 from the taxa T = {H,R,D,C}:
T - W1 = {H,R,D,C} - {R,H} = {D,C} = T'
Add the cluster W1 to T', i.e., T = T' ∪ W1, so T = {W1,D,C}.

Reduced matrix to be filled in (rows i, columns j):

      W1   D    C    r(i)
W1    0
D          0
C               0
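The selection in Step 2 can be sketched as below. The Lij values are typed in from the table; breaking a tie by taking the first listed pair is an assumption of this sketch, not something the slides prescribe.

```python
# L values from the lower triangle of the slide's table, keyed as (i, j).
L = {
    ("R", "H"): -12, ("D", "H"): -11, ("D", "R"): -11,
    ("C", "H"): -11, ("C", "R"): -11, ("C", "D"): -12,
}

smallest = min(L.values())
# All pairs that reach the minimum; there can be more than one.
tied = [pair for pair, value in L.items() if value == smallest]

# Assumption: the first listed pair wins the tie.
i, j = tied[0]

# Replace the two joined taxa by the new cluster label W1.
T = ["H", "R", "D", "C"]
T_new = ["W1"] + [t for t in T if t not in (i, j)]

print(smallest, tied, T_new)
```

Listing the tied pairs makes it visible that (C,D) would be an equally valid first join for this matrix.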
With T = {W1,D,C} and K = {D,C}, compute the distance from W1 to each K, i.e., the distance between W1 and D as well as between W1 and C, using the following equation:
dKW1 = dK{i,j} = 1/2(dKi + dKj - d{i,j})
When K=D, W1 = {H,R}, where i=H, j=R:
dDW1 = dD{H,R} = 1/2(dHD + dRD - dHR) = 1/2(6+3-5) = 2
When K=C, W1 = {H,R}, where i=H, j=R:
dCW1 = dC{H,R} = 1/2(dHC + dRC - dHR) = 1/2(9+6-5) = 5
The distance between D and C is unchanged: dDC = 5.

Reduced distance matrix (rows i, columns j):

      W1   D    C
W1    0    2    5
D          0    5
C               0
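The distance update for the new cluster can be written as a small helper. This is a sketch; the function name `dist_to_cluster` and the dictionary layout are hypothetical choices for illustration.

```python
# Original pairwise distances from the slide.
d = {frozenset(p): v for p, v in {
    ("H", "R"): 5, ("H", "D"): 6, ("H", "C"): 9,
    ("R", "D"): 3, ("R", "C"): 6, ("D", "C"): 5,
}.items()}

def dist_to_cluster(d, k, i, j):
    """d(K, {i, j}) = 1/2 * (d(K, i) + d(K, j) - d(i, j))."""
    return 0.5 * (d[frozenset((k, i))] + d[frozenset((k, j))] - d[frozenset((i, j))])

# W1 = {H, R}; the remaining taxa are D and C.
d_W1 = {k: dist_to_cluster(d, k, "H", "R") for k in ["D", "C"]}
print(d_W1)
```

Note the minus sign on d(i, j): the distance between the two joined taxa is subtracted, not added.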
Step 3: Compute the branch length (b) from W1 to i and from W1 to j.
[Note: In this example the branch length is called 'b', and the cluster 'c' is nothing but W1], so the equations can be rewritten as
biW1 = 1/(2*(n-2)) * ((n-2)*d{i,j} + ri - rj)
and
bjW1 = 1/(2*(n-2)) * ((n-2)*d{i,j} + rj - ri)
When W1={H,R}, i=H, j=R, n=4, the branch length from W1 to 'i', i.e., W1 to H, is
bHW1 = 1/(2*(4-2)) * ((4-2)*dHR + rH - rR) = 1/4 * ((2*5) + 20 - 14) = 4
and the branch length from W1 to 'j', i.e., W1 to R, is
bRW1 = 1/(2*(4-2)) * ((4-2)*dHR + rR - rH) = 1/4 * ((2*5) + 14 - 20) = 1
(Figure: cluster W1 with branch lengths H-W1 = 4 and R-W1 = 1.)
Now reduce n by 1, i.e., n = 4-1 = 3. If n is 2, then stop; otherwise go to Step 1.
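The two branch lengths can be checked numerically with the sketch below; `branch_lengths` is a hypothetical helper name, and the inputs are the slide's values dHR = 5, rH = 20, rR = 14, n = 4.

```python
def branch_lengths(d_ij, r_i, r_j, n):
    """Branch lengths from the new node to i and to j.

    b_i = ((n - 2) * d_ij + r_i - r_j) / (2 * (n - 2))
    b_j = ((n - 2) * d_ij + r_j - r_i) / (2 * (n - 2))
    """
    b_i = ((n - 2) * d_ij + r_i - r_j) / (2 * (n - 2))
    b_j = ((n - 2) * d_ij + r_j - r_i) / (2 * (n - 2))
    return b_i, b_j

# W1 = {H, R}: i = H, j = R.
b_H, b_R = branch_lengths(d_ij=5, r_i=20, r_j=14, n=4)
print(b_H, b_R)
```

A quick sanity check is that the two branches always add up to the original distance between i and j, here 4 + 1 = 5 = dHR.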
Step 1: For each pair of taxa i, j compute Lij.
To compute Lij, first compute the net deviation r(i) of each species 'i' from the other species in the taxa T = {W1,D,C}, where i is any one of W1, D, C.
When i=W1: r(W1) = dW1D + dW1C = 2+5 = 7
When i=D: r(D) = dW1D + dDC = 2+5 = 7
When i=C: r(C) = dCW1 + dCD = 5+5 = 10

      W1   D    C    r(i)
W1    0    2    5    7
D          0    5    7
C               0    10
Step 1 (continued): With T = {W1,D,C} and n = 3, compute Lij for each pair of taxa.
When i=D, j=W1: LDW1 = dDW1 - ((rD + rW1)/(3-2)) = 2 - ((7+7)/(3-2)) = 2 - 14 = -12
When i=C, j=W1: LCW1 = dCW1 - ((rC + rW1)/(3-2)) = 5 - ((10+7)/(3-2)) = 5 - 17 = -12
When i=C, j=D: LCD = dCD - ((rC + rD)/(3-2)) = 5 - ((10+7)/(3-2)) = 5 - 17 = -12

      W1   D    C    r(i)
W1    0    2    5    7
D   -12    0    5    7
C   -12  -12    0    10
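The second pass over the reduced matrix can be sketched as follows. With only three taxa left, each Lij equals minus the sum of the three pairwise distances, so all pairs tie; the names used here are illustrative assumptions.

```python
from itertools import combinations

taxa = ["W1", "D", "C"]

# Reduced distances after joining H and R into W1.
d = {frozenset(p): v for p, v in {
    ("W1", "D"): 2, ("W1", "C"): 5, ("D", "C"): 5,
}.items()}

n = len(taxa)
r = {i: sum(d[frozenset((i, j))] for j in taxa if j != i) for i in taxa}
L = {(i, j): d[frozenset((i, j))] - (r[i] + r[j]) / (n - 2)
     for i, j in combinations(taxa, 2)}

print(r)
print(L)
# Every pair shares the value -(2 + 5 + 5).
print(set(L.values()))
```

Because of this tie, the choice of which pair to join at n = 3 does not change the resulting unrooted tree, only the labelling of the last internal node.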
Step 2: Pick the pair of taxa with the smallest Lij and join them into a cluster W2, i.e., W2←{i,j}. In this case every Lij is -12, so any pair may be chosen; taking i=D and j=W1 gives W2←{W1,D}.
(Figure: W1 and D joined at node W2, with C not yet attached.)
With T = {W2,C} and K = {C}, the distance between W2 and C is computed using the following equation:
dCW2 = dC{i,j} = 1/2(dCi + dCj - d{i,j})
When K=C, W2 = {W1,D}, where i=W1, j=D:
dCW2 = 1/2(dCW1 + dCD - dW1D) = 1/2(5+5-2) = 4

      W2   C
W2    0    4
C          0
Step 3: Finally compute the branch length (b) from W2 to i and from W2 to j using the following equations.
biW2 = 1/(2*(n-2)) * ((n-2)*d{i,j} + ri - rj)
and
bjW2 = 1/(2*(n-2)) * ((n-2)*d{i,j} + rj - ri)
When W2={W1,D}, i=W1, j=D, n=3, the branch length from W2 to 'i', i.e., W2 to W1, is
bW1W2 = 1/(2*(3-2)) * ((3-2)*dW1D + rW1 - rD) = 1/2 * ((1*2) + 7 - 7) = 1
and the branch length from W2 to 'j', i.e., W2 to D, is
bDW2 = 1/(2*(3-2)) * ((3-2)*dW1D + rD - rW1) = 1/2 * ((1*2) + 7 - 7) = 1
Now reduce n by 1, i.e., n = 3-1 = 2. If n is 2, then stop and combine the clusters; otherwise go to Step 1.
The value of n is now 2, so the algorithm exits. The two remaining nodes, W2 and C, are joined by a branch of length dCW2 = 4.
(Figure: final tree. H and R join at W1 with branch lengths 4 and 1; W1 and D join at W2 with branch lengths 1 and 1; C joins W2 with branch length 4.)
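The whole procedure (Steps 1 to 3, repeated until n = 2) can be put together as in the sketch below. It is a minimal illustration under stated assumptions, not a reference implementation: the function name `neighbour_joining`, the edge-list output, the cluster labels W1, W2, ... and the rule that the first minimal pair wins a tie are all choices made here.

```python
from itertools import combinations

def neighbour_joining(taxa, pairs):
    """Return a list of (node, node, branch_length) edges.

    pairs maps (a, b) tuples to distances; the matrix is assumed symmetric.
    """
    d = {frozenset(p): v for p, v in pairs.items()}
    nodes = list(taxa)
    edges = []
    k = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Step 1: net deviation r(i) and criterion L(i, j).
        r = {i: sum(d[frozenset((i, j))] for j in nodes if j != i) for i in nodes}

        def crit(p):
            return d[frozenset(p)] - (r[p[0]] + r[p[1]]) / (n - 2)

        # Step 2: pair with the smallest L; the first minimal pair wins ties.
        i, j = min(combinations(nodes, 2), key=crit)
        d_ij = d[frozenset((i, j))]
        k += 1
        w = f"W{k}"
        # Step 3: branch lengths from the new node w to i and to j.
        b_i = ((n - 2) * d_ij + r[i] - r[j]) / (2 * (n - 2))
        b_j = d_ij - b_i
        edges += [(i, w, b_i), (j, w, b_j)]
        # Distances from every remaining node to the new cluster.
        rest = [m for m in nodes if m not in (i, j)]
        for m in rest:
            d[frozenset((m, w))] = 0.5 * (
                d[frozenset((m, i))] + d[frozenset((m, j))] - d_ij)
        # New cluster goes first, as in the slide's reduced matrix.
        nodes = [w] + rest
    # n == 2: join the last two nodes with their remaining distance.
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges

pairs = {
    ("H", "R"): 5, ("H", "D"): 6, ("H", "C"): 9,
    ("R", "D"): 3, ("R", "C"): 6, ("D", "C"): 5,
}
tree = neighbour_joining(["H", "R", "D", "C"], pairs)
for edge in tree:
    print(edge)
```

On the sample matrix this sketch yields the edges H-W1 = 4, R-W1 = 1, W1-W2 = 1, D-W2 = 1 and W2-C = 4. Path lengths in this tree reproduce all six input distances, for example H-C = 4 + 1 + 4 = 9 and R-D = 1 + 1 + 1 = 3.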