Be19b004 Endsem
Part II
①
MolWt = Σ_{i=1}^{n} m(r_i)
where r_i is the ith residue of the sequence, m(r_i) is the molecular weight of the ith residue, and n is the length of the sequence.

MolWt = 146 + 147 + 146 + 181 + 174 + 181 = 975
         (K)   (E)   (Q)   (Y)   (R)   (Y)
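The same sum can be checked with a small lookup table. A minimal sketch, assuming the rounded amino-acid weights used above and the peptide KEQYRY as read from the answer; the dictionary only covers the residues that occur here.

```python
# Minimal sketch: molecular weight as the sum of per-residue weights,
# using the rounded amino-acid weights assumed in the calculation above.
residue_weight = {"K": 146, "E": 147, "Q": 146, "Y": 181, "R": 174}

def mol_wt(seq):
    """MolWt = sum over residues of m(r_i)."""
    return sum(residue_weight[r] for r in seq)

print(mol_wt("KEQYRY"))  # 975
```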
② Base stacking energy = Σ_{i=1}^{n-1} E(r_i, r_{i+1}) / (n - 1)
where r_i is the ith base of the sequence and n is the length of the sequence.

For the sequence G T A G A G C:
E(GT) + E(TA) + E(AG) + E(GA) + E(AG) + E(GC) = (-8) + (-9) + (-9) + (-9) + (-9) + (-10) = -54
Average stacking energy = -54 / 6 = -9
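The average can be recomputed with a small dinucleotide lookup. A minimal sketch: the stacking-energy values in the dictionary are just the ones used above, not a full published table.

```python
# Minimal sketch: average base-stacking energy over consecutive dinucleotide
# steps, using only the stacking values assumed in the calculation above.
stack_energy = {"GT": -8, "TA": -9, "AG": -9, "GA": -9, "GC": -10}

def avg_stacking_energy(seq):
    """Sum E(r_i, r_{i+1}) over all n-1 steps, divided by (n - 1)."""
    steps = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return sum(stack_energy[s] for s in steps) / len(steps)

print(avg_stacking_energy("GTAGAGC"))  # -9.0
```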
③
[Pairwise alignment of the two given sequences, with identical residues marked '|' and residues having a positive BLOSUM62 score marked '+'.]

From BLOSUM62, 6 of the 10 aligned positions are identical or similar (positive score).
Similarity = 6/10 = 60%
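Percent similarity can be recomputed by scoring each aligned pair with BLOSUM62. A hedged sketch, assuming Biopython is available; the two aligned strings below are placeholders, not the exact sequences from the question, and gap columns are counted as dissimilar.

```python
# Hedged sketch: count positions with a positive BLOSUM62 score (identical or
# similar residues) and report similarity = similar / alignment length.
from Bio.Align import substitution_matrices

blosum62 = substitution_matrices.load("BLOSUM62")

def percent_similarity(aln1, aln2):
    similar = 0
    for a, b in zip(aln1, aln2):
        if a == "-" or b == "-":
            continue                      # gap columns are not counted as similar
        if blosum62[a, b] > 0:            # positive score => identical or similar
            similar += 1
    return similar / len(aln1)

# Placeholder aligned strings; prints the fraction of similar positions.
print(percent_similarity("MILAITSLKV", "MLLAVTDLRV"))
```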
④
[Dot plot of the sequence VHLEEKVTALWGK against itself: a dot marks every pair of identical residues, giving the full main diagonal plus off-diagonal dots for the repeated V, L, E and K residues.]

Sequence: V H L E E K V T A L W G K
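A dot plot like the one sketched above can be generated in a few lines. A minimal sketch assuming matplotlib, comparing the sequence against itself with window size 1.

```python
# Hedged sketch: self dot plot of the sequence, window size 1
# (a dot wherever residue i equals residue j). Assumes matplotlib is available.
import matplotlib.pyplot as plt

seq = "VHLEEKVTALWGK"
xs, ys = zip(*[(i, j) for i in range(len(seq))
                      for j in range(len(seq)) if seq[i] == seq[j]])

plt.scatter(xs, ys, marker="s", s=20)
plt.xticks(range(len(seq)), seq)
plt.yticks(range(len(seq)), seq)
plt.gca().invert_yaxis()           # conventional dot-plot orientation
plt.title("Self dot plot of VHLEEKVTALWGK")
plt.show()
```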
Part III

Q1
(a) A database is an organized collection of information in a computer-compatible format.
(b) Factors to consider:

Content: Fix the content according to the target audience. If the target audience is biologists but the database holds information related to astronomy, it will be of no use to anyone. Knowing the content also helps in creating the database, since the schema, the features of the user interface, etc. depend on it.

Ontology: Ontology is the description of the terms used. For e.g., a protein has a sequence, thermodynamic properties, chemical properties, a function, etc.

Retrieval: Finally, the user must be able to retrieve the data according to his/her need. Hence an interface must be provided, and some DBMS query language such as SQL must be used.
We should provide whatever is given in our database. Biology as a subject is very complicated, ranging from the micro to the macro scale; there are so many relationships to factor in, with each group conducting its own experiments, creating its own algorithms, etc. The resulting resources contain mostly the same information, but each is also slightly different. This makes these groups very competitive and pushes research forward. Hence a relational database is appropriate, i.e., multiple tables containing specific information such as the various properties, and the relations bringing them together.
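As an illustration of the retrieval and relational points above, here is a minimal sketch using Python's built-in sqlite3: one table of proteins and one of properties, linked by a foreign key and queried with SQL. The table and column names are made up for the example, not taken from any real database.

```python
# Minimal sketch (hypothetical schema): protein records and their properties in
# separate tables linked by protein_id, retrieved with an SQL join.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE protein (id INTEGER PRIMARY KEY, name TEXT, sequence TEXT);
    CREATE TABLE property (protein_id INTEGER REFERENCES protein(id),
                           kind TEXT, value TEXT);
    INSERT INTO protein VALUES (1, 'example protein', 'KEQYRY');
    INSERT INTO property VALUES (1, 'function', 'example function');
""")

# Retrieval: the user queries the database through SQL.
for row in con.execute("""
        SELECT p.name, pr.kind, pr.value
        FROM protein p JOIN property pr ON pr.protein_id = p.id
        WHERE p.name = 'example protein'"""):
    print(row)
```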
Q2

(a) European Molecular Biology Laboratory (EMBL)
(b)
A: GTGCTGCACG
B: AGCTGCAACC
C: GTGCTGCACT
D: CTCTGCAGAA
E: GTATCACATA

d_{i,j} = number of mismatches between the ith and jth sequences:

      A   B   C   D
B     8
C     1   8
D     9   5   9
E     6   8   6   7

UPGMA clustering:

1. The closest pair is (A, C) with d = 1; merge them. Distances to the new cluster are the averages over its members:

        (A,C)   B   D
B        8
D        9      5
E        6      8   7

2. The closest pair is now (B, D) with d = 5; merge them:

        (A,C)   (B,D)
(B,D)    8.5
E        6       7.5

3. The closest pair is ((A,C), E) with d = 6; merge them, then join the two remaining clusters.

Final clustering: (((A, C), E), (B, D))
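The mismatch distances and the UPGMA merge order can be checked programmatically. A hedged sketch: the sequences are taken from the answer above, SciPy is an assumption, and 'average' linkage on the raw distance matrix is exactly UPGMA.

```python
# Hedged sketch: pairwise mismatch (Hamming) distances, then UPGMA via SciPy.
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

seqs = {
    "A": "GTGCTGCACG",
    "B": "AGCTGCAACC",
    "C": "GTGCTGCACT",
    "D": "CTCTGCAGAA",
    "E": "GTATCACATA",
}

def mismatches(s, t):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(s, t))

labels = list(seqs)
# Full symmetric distance matrix, then condensed form for SciPy.
full = [[mismatches(seqs[x], seqs[y]) for y in labels] for x in labels]
condensed = squareform(full)

tree = linkage(condensed, method="average")   # 'average' linkage = UPGMA
print(tree)                                   # merge order and heights
# dendrogram(tree, labels=labels)             # optional plot with matplotlib
```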
(c)
I:   GTGCTGCACG
II:  AGCTGCAACC
III: GTGCTGCACT
IV:  CTCTGCAGAA
V:   GTATCACATA

Count matrix (number of occurrences of each base at positions 1-10):

      1  2  3  4  5  6  7  8  9  10
A     1  0  1  0  0  1  2  4  1  2
G     3  1  2  0  2  2  0  1  0  1
C     1  0  2  2  1  2  3  0  3  1
T     0  4  0  3  2  0  0  0  1  1

The sequences are DNA, hence the background probability p = 1/4, and N = 5.

w_{i,j} = ln(f_{i,j} / p), with f_{i,j} = (n_{i,j} + p) / (N + 1)
where n_{i,j} is the count of base i at position j, N is the number of sequences, and p also serves as the pseudocount.

PSSM:

        1      2      3      4      5      6      7      8      9      10
A    -0.18  -1.79  -0.18  -1.79  -1.79  -0.18   0.41   1.04  -0.18   0.41
G     0.77  -0.18   0.41  -1.79   0.41   0.41  -1.79  -0.18  -1.79  -0.18
C    -0.18  -1.79   0.41   0.41  -0.18   0.41   0.77  -1.79   0.77  -0.18
T    -1.79   1.04  -1.79   0.77   0.41  -1.79  -1.79  -1.79  -0.18  -0.18
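The count matrix and the log-odds PSSM above can be rebuilt directly from the sequences. A minimal sketch: the pseudocount scheme f = (count + p) / (N + 1) with p = 0.25 follows the formula in the answer, and math.log is the natural log used there.

```python
# Minimal sketch: rebuild the count matrix and the log-odds PSSM shown above.
import math

seqs = [
    "GTGCTGCACG",
    "AGCTGCAACC",
    "GTGCTGCACT",
    "CTCTGCAGAA",
    "GTATCACATA",
]
bases = "AGCT"
N = len(seqs)          # number of sequences
L = len(seqs[0])       # alignment length
p = 0.25               # DNA background probability, also used as pseudocount

# counts[b][j] = number of sequences with base b at position j
counts = {b: [sum(s[j] == b for s in seqs) for j in range(L)] for b in bases}

# w[b][j] = ln(f / p) with f = (count + p) / (N + 1)
pssm = {
    b: [round(math.log((counts[b][j] + p) / (N + 1) / p), 2) for j in range(L)]
    for b in bases
}

for b in bases:
    print(b, counts[b], pssm[b])
```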
(d) Bases at the two positions considered:

        Pos 3   Pos 4
I        G       C
II       C       G
III      G       C
IV       C       A
V        A       T

Frequencies:

        Pos 3   Pos 4
A        0.2     0.2
G        0.4     0.2
C        0.4     0.4
T        0       0.2

Entropy = Σ_i f_i ln f_i
where f_i is the frequency of residue/base i at the given position (terms with f_i = 0 are taken as 0).

Pos 3: 0.2 ln 0.2 + 0.4 ln 0.4 + 0.4 ln 0.4 = -1.055
Pos 4: 0.2 ln 0.2 + 0.2 ln 0.2 + 0.4 ln 0.4 + 0.2 ln 0.2 = -1.33
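The per-position entropy term can be recomputed for the two base columns written above. A minimal sketch using only the standard library.

```python
# Minimal sketch: compute sum(f * ln f) for one alignment column.
import math
from collections import Counter

def column_entropy_term(column):
    """Return sum(f * ln f) over the bases observed in one alignment column."""
    counts = Counter(column)
    n = len(column)
    return sum((c / n) * math.log(c / n) for c in counts.values())

pos3 = "GCGCA"   # bases at position 3 (sequences I-V)
pos4 = "CGCAT"   # bases at position 4 as written in the answer

print(round(column_entropy_term(pos3), 3))  # about -1.055
print(round(column_entropy_term(pos4), 3))  # about -1.33
```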
(e) Dynamic-programming alignment of GTGCTGCACT (sequence III) and CTCTGCAGAA (sequence IV):

[Alignment score matrix with GTGCTGCACT along the columns and CTCTGCAGAA along the rows; the circled entries mark the traceback path.]

Traceback alignment:
G T G C T G C A C - T
C T - C T G C A G A A
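The score matrix can be reproduced with a standard global-alignment recurrence. A hedged sketch of the Needleman-Wunsch fill step: the scoring scheme (match = +1, mismatch = 0, gap = 0) is an assumption, since the exact scores used in the answer are not stated, so the numbers may differ from the hand-drawn matrix.

```python
# Hedged sketch: fill a global-alignment DP matrix for the two sequences.
# Scoring is an assumed scheme (match = +1, mismatch = 0, gap = 0).
s1 = "GTGCTGCACT"   # columns
s2 = "CTCTGCAGAA"   # rows
match, mismatch, gap = 1, 0, 0

rows, cols = len(s2) + 1, len(s1) + 1
dp = [[0] * cols for _ in range(rows)]
for i in range(1, rows):
    for j in range(1, cols):
        diag = dp[i - 1][j - 1] + (match if s2[i - 1] == s1[j - 1] else mismatch)
        dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)

for row in dp:
    print(row)   # traceback from dp[-1][-1] recovers the alignment
```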
Q3

The data plays a very significant role. In this regard, preprocessing of the data plays a huge role: repeated data points, for example, can bias the model.

Augmentation means slightly altering the input. Since neural networks have a tendency to overfit to the training data distribution, augmentation confuses the model a little bit and makes it more robust. E.g.: adding some noise to the input feature vector.
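As a concrete version of the noise-augmentation example, a minimal sketch with NumPy: each feature vector gets a small Gaussian perturbation. The noise scale 0.01 and the toy data are arbitrary illustrations.

```python
# Minimal sketch: augment feature vectors by adding small Gaussian noise,
# so the model sees slightly perturbed copies of each training example.
import numpy as np

rng = np.random.default_rng(0)

def augment(features, noise_scale=0.01, copies=3):
    """Return the original features plus `copies` noisy versions of each row."""
    noisy = [features + rng.normal(0.0, noise_scale, size=features.shape)
             for _ in range(copies)]
    return np.vstack([features, *noisy])

X = rng.random((5, 8))          # 5 samples, 8 features (toy data)
print(augment(X).shape)         # (20, 8)
```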
Sample size: deep neural networks are highly dependent on the sample size. If the number of samples is small, the model can easily overfit; a smaller data size may still be fit, but this leads to weaker generalisation.

Dimensionality: using autoencoders, the dimension of the data can be reduced to keep only the most valuable features.
After the data is prepared, we can use multiple methods.

For regression, we can use:
DNN: deep neural networks are very versatile but are contingent on the amount of data available. In bio, the ground truth available (as mentioned above) is really sparse, hence we cannot have very deep networks and hence lose out on performance. A lot of research is being put into finding methods of unsupervised learning.
Classification:
Tree-based methods: Bayesian methods where trees are constructed and modelled on the basis of probability. These have good explainability behind their actions.
SVM: support vector machines can also be used here.
DNN: DNNs can again be used here.
Verification of the model: k-fold cross-validation can be used. Also, the type of optimizer, such as SGD or Adam, plays a huge role.
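For the verification step, a minimal sketch of k-fold cross-validation with scikit-learn (assumed available); the logistic-regression classifier, k = 5 and the toy data are placeholders.

```python
# Hedged sketch: k-fold cross-validation to verify a model on limited data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((100, 10))                                  # toy features
y = (X[:, 0] + 0.1 * rng.random(100) > 0.5).astype(int)    # toy labels

scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean(), scores.std())        # average accuracy over the 5 folds
```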
Q4

N = 75, actual positives AP = 12, predicted positives P = 13
TP = 7   [correctly predicted positive]
FP = 6   [incorrectly predicted positive]
FN = 5   [incorrectly predicted negative]
TN = 57

Sensitivity = TP / (TP + FN) = 7/12 = 0.583
Specificity = TN / (TN + FP) = 57/63 = 0.905
Accuracy = (TP + TN) / N = 64/75 = 0.853
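The same metrics follow directly from the confusion-matrix entries; a minimal sketch using the counts above.

```python
# Minimal sketch: sensitivity, specificity and accuracy from the counts above.
N, TP, FP, FN = 75, 7, 6, 5
TN = N - TP - FP - FN                      # 57

sensitivity = TP / (TP + FN)               # 7/12  = 0.583
specificity = TN / (TN + FP)               # 57/63 = 0.905
accuracy    = (TP + TN) / N                # 64/75 = 0.853

print(round(sensitivity, 3), round(specificity, 3), round(accuracy, 3))
```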
Q5

Hydrophobicity profile: a plot of hydrophobicity values vs. residue position.
[Sketch of the hydrophobicity profile for the sequence below.]
Seq: AIKSWVKTIARTYLLNS
Power of amphiphilicity = Σ | a_… − a_… | / N = 2.01
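The profile itself can be computed with a sliding window over a hydrophobicity scale. A hedged sketch assuming the Kyte-Doolittle scale and a window of 3; the answer does not state which scale or window size was used.

```python
# Hedged sketch: hydrophobicity profile = window-averaged hydrophobicity vs.
# residue position (Kyte-Doolittle scale and window size 3 are assumptions).
kd = {"A": 1.8, "I": 4.5, "K": -3.9, "S": -0.8, "W": -0.9, "V": 4.2,
      "T": -0.7, "R": -4.5, "Y": -1.3, "L": 3.8, "N": -3.5}

seq = "AIKSWVKTIARTYLLNS"
window = 3

profile = [sum(kd[r] for r in seq[i:i + window]) / window
           for i in range(len(seq) - window + 1)]

for pos, value in enumerate(profile, start=1 + window // 2):
    print(pos, round(value, 2))   # window centre position, mean hydrophobicity
```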