Be19b004 Endsem

Uploaded by

be21b002

BT 3040 Endsem

Part II


Mol weight = $\sum_{i=1}^{n} m(r_i)$

where $i \in [1, n]$, $n$ is the length of the sequence,
$r_i$ is the ith residue, and
$m(r_i)$ is the mol wt of the ith residue.

Mol wt = 146 + 147 + 146 + 181 + 174 + 181
          K     E     Q     Y     R     Y

Mol wt = 975 amu = 975 dalton
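The sum above can be checked with a short sketch. It assumes the third residue is Q (146) and uses only the residue masses quoted in the answer; the names `RESIDUE_MASS` and `molecular_weight` are illustrative.

```python
# Residue masses (daltons) exactly as used in the worked answer above.
# Assumption: the third residue of the peptide is read as Q (146).
RESIDUE_MASS = {"K": 146, "E": 147, "Q": 146, "Y": 181, "R": 174}


def molecular_weight(sequence):
    """Sum m(r_i) over every residue r_i of the sequence."""
    return sum(RESIDUE_MASS[residue] for residue in sequence)


print(molecular_weight("KEQYRY"))  # 975
```

A residue missing from the table raises `KeyError`, which is preferable to silently skipping it.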

② Base stacking energy = $\frac{1}{n-1} \sum_{i=1}^{n-1} e(r_i, r_{i+1})$

where $i \in [1, n]$, $n$ is the length of the sequence,
$r_i$ is the ith residue, and
$e(r_i, r_{i+1})$ is the base stacking energy of two consecutive residues at positions i and i+1.

Energy = [(-8) + (-9) + (-9) + (-9) + (-10)] / 5
Dinucleotide steps: TA, AG, GA, AG, GC

Energy = -9
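As a quick check of the averaging step, the sketch below takes the five per-step energies written in the answer (-8, -9, -9, -9, -10) and divides by the number of steps, n - 1. The function name is illustrative, and no energy table beyond these five values is assumed.

```python
# Per-step stacking energies as written in the answer, one per
# consecutive pair of bases (n - 1 = 5 steps for a 6-base sequence).
STEP_ENERGIES = [-8, -9, -9, -9, -10]


def average_stacking_energy(step_energies):
    """Mean of e(r_i, r_{i+1}) over the n - 1 consecutive steps."""
    return sum(step_energies) / len(step_energies)


print(average_stacking_energy(STEP_ENERGIES))  # -9.0
```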


[Pairwise alignment of two sequences, with identical pairs marked | and similar pairs marked + (from BLOSUM62)]

Identity = 2/10 = 20%

Similarity = 6/10 = 60%
[Dot plot; sequence along the axis: V H L E E K V T A L W G K]

HLEE and VT are aligned.

Part III
Q1
(a) A database is an organized collection of information in a computer-compatible format.

(b) Factors

Content: Fix the content according to the target audience. If the target audience are biologists but the database has information related to astronomy, then it will be of use to no one. Knowing the content also helps create the other features of the database, such as the schema, the user interface, etc.
Ontology: Ontology is the description of the terms used. Many a time, for systematic storage of data, a lot of technical terms are used, and these must be described to avoid confusion. Also, the database is to be accessible to people of all levels of knowledge, even new students. Hence, to make the database user-friendly and usable, ontology is very important.

Schema: Schema is the interrelationship within the data. For eg: a protein has sequence, thermodynamic properties, chemical properties, function, etc. How these various terms are connected forms the schema, and hence it plays a crucial role.

Format: To avoid confusion, a defined template/format should be used to present the data; otherwise, the user will get confused when accessing multiple entries.

Retrieval: Finally, the user must be able to retrieve the data according to his/her need. Hence an interface must be provided and some DBMS query language like SQL must be used.

Further search: If a user is interested in knowing more than what's given in our database, we should provide links to other databases with different information.

Biology as a subject is very complicated, ranging from the micro to the macro: there are so many relationships to factor in, so many unanswered questions and so much data to filter through. Hence there is a lot of research being done in the field, with each group conducting their own experiments, creating their own algorithms, etc.

This is why we see so many databases for various categories. They contain mostly the same information, but each is also slightly different.

This makes research very competitive and pushes these groups forward.

Bio databases contain a lot of information and hence structured storage and retrieval are key.

For a single entity, e.g. a protein, there is so much diverse information, e.g. thermal properties, chemical properties. Hence we have a relational database, i.e., multiple tables containing specific information such as the various properties, and the relations bring them together.
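A minimal sketch of the relational idea, using Python's built-in `sqlite3` module. The table names, column names and the single row of values are made-up placeholders for illustration only, not taken from any real database.

```python
import sqlite3

# In-memory database; every table, column and value below is a
# hypothetical placeholder chosen only to illustrate the idea.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE protein (
        id INTEGER PRIMARY KEY,
        name TEXT,
        sequence TEXT
    );
    CREATE TABLE thermal_property (
        protein_id INTEGER REFERENCES protein(id),
        melting_temp_c REAL
    );
    """
)
conn.execute("INSERT INTO protein VALUES (1, 'example_protein', 'KEQYRY')")
conn.execute("INSERT INTO thermal_property VALUES (1, 55.0)")

# The relation (protein_id -> protein.id) brings the two tables together.
rows = conn.execute(
    """
    SELECT p.name, p.sequence, t.melting_temp_c
    FROM protein AS p
    JOIN thermal_property AS t ON t.protein_id = p.id
    """
).fetchall()
print(rows)
```

Each kind of property lives in its own table, and a SQL join retrieves exactly the combination the user asks for.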
European Molecular Biology Laboratory (EMBL)

DNA Data Bank of Japan (DDBJ)


(a) A: GTGCTGCACG
    B: AGCTGCAAGC
    C: GTGCTGCACT
    D: CTCTGCAGAA
    E: GTATCACATA

d_ij = number of mismatches between the ith and jth sequences

      A    B    C    D
B     9
C     1    9
D     9    5    9
E     6    8    6    7

The smallest distance is d(A, C) = 1, so A and C are merged.

      AC   B    D
B     9
D     9    5
E     6    8    7

The smallest distance is d(B, D) = 5, so B and D are merged.

      AC   BD
BD    9
E     6    7.5

The smallest distance is d(AC, E) = 6, so E joins (A, C).

Tree: (((A, C), E), (B, D))
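The clustering above can be reproduced with a short sketch. It assumes sequence B reads AGCTGCAAGC (the reading that gives the tabulated distances) and that cluster-to-cluster distances are averaged over all member pairs, which matches the 7.5 entry for BD versus E. Function names are illustrative.

```python
from itertools import combinations

# Sequence B is read as AGCTGCAAGC, an assumption that reproduces
# the distance matrix in the answer.
SEQS = {
    "A": "GTGCTGCACG",
    "B": "AGCTGCAAGC",
    "C": "GTGCTGCACT",
    "D": "CTCTGCAGAA",
    "E": "GTATCACATA",
}


def mismatches(s, t):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(s, t))


def average_linkage(seqs):
    """Repeatedly merge the two closest clusters (average over member pairs)."""
    clusters = {name: (name,) for name in seqs}
    trees = {name: name for name in seqs}
    merges = []

    def dist(c1, c2):
        pairs = [(x, y) for x in clusters[c1] for y in clusters[c2]]
        return sum(mismatches(seqs[x], seqs[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = min(combinations(sorted(clusters), 2), key=lambda p: dist(*p))
        merges.append((a, b, dist(a, b)))
        new = a + b
        clusters[new] = clusters.pop(a) + clusters.pop(b)
        trees[new] = (trees.pop(a), trees.pop(b))
    return merges, trees[new]


merges, tree = average_linkage(SEQS)
print(merges)
print(tree)
```

The nested tuple returned as `tree` corresponds to the bracket notation of the final tree.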
(b) I:   GTGCTGCACG
    II:  AGCTGCAAGC
    III: GTGCTGCACT
    IV:  CTCTGCAGAA
    V:   GTATCACATA

     1    2    3    4    5    6    7    8    9    10
A    1    0    1    0    0    1    2    4    1    2
G    3    1    2    0    2    2    0    1    1    1
C    1    0    2    2    1    2    3    0    2    1
T    0    4    0    3    2    0    0    0    1    1

DNA, hence p = 1/4, N = 5

n_ij = count of the ith base at the jth position
N: number of sequences

     1      2      3      4      5      6      7      8      9      10
A   -0.18  -1.79  -0.18  -1.79  -1.79  -0.18   0.41   1.04  -0.18   0.41
G    0.77  -0.18   0.41  -1.79   0.41   0.41  -1.79  -0.18  -0.18  -0.18
C   -0.18  -1.79   0.41   0.41  -0.18   0.41   0.77  -1.79   0.41  -0.18
T   -1.79   1.04  -1.79   0.77   0.41  -1.79  -1.79  -1.79  -0.18  -0.18
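The two tables can be regenerated with the sketch below. It assumes sequence II reads AGCTGCAAGC and that each weight is ln[(n_ij + p) / ((N + 1) p)]; this formula is inferred from the tabulated values (for example a count of 0 gives -1.79 and a count of 4 gives 1.04) and is not written out in the answer itself.

```python
import math

# Sequence II is read as AGCTGCAAGC (assumption consistent with the counts).
SEQS = ["GTGCTGCACG", "AGCTGCAAGC", "GTGCTGCACT", "CTCTGCAGAA", "GTATCACATA"]
BASES = "AGCT"
P = 0.25  # background probability of each base for DNA


def count_matrix(seqs):
    """counts[base][j] = number of sequences with `base` at position j."""
    length = len(seqs[0])
    return {b: [sum(s[j] == b for s in seqs) for j in range(length)] for b in BASES}


def log_odds(count, n_seqs, p=P):
    """Assumed weight formula: ln[(count + p) / ((N + 1) * p)]."""
    return math.log((count + p) / ((n_seqs + 1) * p))


counts = count_matrix(SEQS)
weights = {b: [round(log_odds(c, len(SEQS)), 2) for c in counts[b]] for b in BASES}

for base in BASES:
    print(base, counts[base], weights[base])
```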

②        Pos 3   Pos 9
I          G       C
II         C       G
III        G       C
IV         C       A
V          A       T

     3     9
A   0.2   0.2
G   0.4   0.2
C   0.4   0.4
T   0     0.2

Entropy = $\sum_{i=1}^{n} f_i \ln f_i$

where n = number of residues/bases, and
f_i = frequency of the ith entity at the given position.

Pos 3 = 0.2 ln 0.2 + 0.4 ln 0.4 + 0.4 ln 0.4
      = -1.055

Pos 9 = 0.2 ln 0.2 + 0.4 ln 0.4 + 0.2 ln 0.2 + 0.2 ln 0.2
      = -1.33
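A short sketch of the same calculation, keeping the sign convention used in the answer (the sum of f ln f, which is the negative of the usual Shannon entropy). The column strings are the bases of sequences I to V at positions 3 and 9; the function name is illustrative.

```python
import math


def position_entropy(column):
    """Sum of f_i * ln(f_i) over the bases observed in one alignment column.

    Bases with frequency 0 contribute nothing and are skipped.
    """
    n = len(column)
    total = 0.0
    for base in set(column):
        f = column.count(base) / n
        total += f * math.log(f)
    return total


pos3 = "GCGCA"  # bases of sequences I-V at position 3
pos9 = "CGCAT"  # bases of sequences I-V at position 9
print(round(position_entropy(pos3), 3))
print(round(position_entropy(pos9), 2))
```

Position 9 has the more negative value because its bases are spread more evenly, so it is the less conserved of the two positions.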

[Alignment scoring matrix: G T G C T G C A C T along the columns, C T C T G C A G A A along the rows]

GTGCTGCAC-T
CTGCAGAA

(a) When it comes to data-driven modeling using AI methods, the data plays a very significant role. Whether it be a black box like a deep neural net or a statistics-driven Bayesian net, the data can determine whether the model succeeds or not.

In this regard, the preprocessing of data plays a huge role. Preprocessing here can mean normalization, filtering, augmentation, etc.
Normalization is very important, otherwise a large input value can skew the model. For eg, if the model is trained on values in [0, 1] and suddenly there is an input > 1, that will confuse the model and may push it to find spurious correlations.

Filtering could mean removing repeated samples; if samples are repeated, it can bias the model.

Augmentation means slightly altering the input. Since neural nets have a tendency to overfit to the training data distribution, augmentations confuse the model a little bit and make it more robust. Eg: adding some noise to the input feature vector.
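The three preprocessing steps can be sketched in plain Python as below. Min-max scaling is only one possible normalization, Gaussian noise is only one possible augmentation, and the function names and default noise scale are illustrative assumptions.

```python
import random


def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def drop_duplicates(samples):
    """Keep the first occurrence of each feature vector, preserving order."""
    seen = set()
    unique = []
    for sample in samples:
        key = tuple(sample)
        if key not in seen:
            seen.add(key)
            unique.append(sample)
    return unique


def add_noise(vector, scale=0.01, rng=None):
    """Return a copy of the vector with small Gaussian noise added."""
    rng = rng or random.Random(0)
    return [x + rng.gauss(0.0, scale) for x in vector]


print(min_max_normalize([2.0, 4.0, 6.0]))
print(drop_duplicates([[1, 2], [1, 2], [3, 4]]))
print(add_noise([0.1, 0.2]))
```

In practice the minimum and maximum would be taken from the training split only and reused for new inputs, so that unseen data is scaled in the same way.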

Sample size: deep neural nets are highly dependent on the sample size. If the number of data samples is small, then the model can easily overfit. Smaller data size generally requires models with fewer parameters, but this leads to weaker generalisation.

Dimensionality reduction: In models which form hyperplanes for classification, such as SVMs, having redundant feature dimensions can confuse the model. Using methods such as PCA, t-SNE, autoencoders, the dimension can be reduced to keep only the most valuable ones.

After the data is prepared we can use multiple methods.

For regression, we can use:

Linear models: find a linear relationship between the data and the prediction by minimizing the reconstruction loss.

DNN: deep neural nets are very versatile but are contingent on the amount of data available. Since, in bio, the ground truth available (as mentioned above) is really sparse, we cannot have very deep networks and hence lose out on performance. Unsupervised learning, where ground truth is not required, is key, but it is a very difficult task and a lot of research is being put into finding methods of unsupervised learning.

Since DNNs are black boxes by nature, it is very difficult to intuit their working.

Classification:

Tree-based methods: Bayesian methods where trees are constructed between the input and the expected class. The trees are modelled on the basis of probability and hence have good explainability behind their actions.

SVM: finds hyperplanes between the various classes by maximising the margin, i.e. the distance of the separating plane from the nearest points of each class.

DNN: DNNs can again be used here.

Clustering-based methods like K-means, DBSCAN, etc. can be used as well, to classify on the basis of the clusters made.

Verification of model

K-fold cross validation is the standard method of testing the capacity of the model. The data is split into train & test splits. The model is trained on the train split and evaluated on the test split.

In general it is a good practice to have the test split distribution be slightly different from the train split.
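A minimal sketch of how the K-fold splits can be generated, assuming the samples are addressed by index and shuffled once with a fixed seed; each of the k folds serves exactly once as the test split. The function name and the seed are illustrative.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at offset i.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(n_samples=10, k=5))
for train, test in splits:
    print(sorted(train), sorted(test))
```

The model would be trained and evaluated once per split, and the k scores averaged to give the final estimate.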

Hyperparameters: some details

Hyperparameter optimization is a huge task in DNNs too: finding the right learning rate and schedule, and weight initialisation. Alongside, the type of optimizer, such as SGD or Adam, also plays a huge role.

(b) N = 75, AP = 12, PP = 13

TP = 7 [correctly predicted positive]
FP = 6 [incorrectly predicted positive]
FN = 6 [incorrectly predicted negative]
TN = 56 [correctly predicted negative]

sensitivity = TP / AP = 7/12 = 0.583

specificity = TN / (TN + FP) = 56/62 = 0.903

accuracy = (TP + TN) / N = 63/75 = 0.84
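The three ratios can be recomputed with the sketch below. It plugs in the figures exactly as they appear in the answer (TP = 7, TN = 56, FP = 6, 12 actual positives, 75 samples in total) and makes no attempt to reconcile them with one another; the function name is illustrative.

```python
def classification_metrics(tp, tn, fp, actual_positives, total):
    """Sensitivity, specificity and accuracy from the quoted counts."""
    return {
        "sensitivity": tp / actual_positives,  # TP over all actual positives
        "specificity": tn / (tn + fp),         # TN over all actual negatives
        "accuracy": (tp + tn) / total,         # correct predictions over all
    }


metrics = classification_metrics(tp=7, tn=56, fp=6, actual_positives=12, total=75)
for name, value in metrics.items():
    print(name, round(value, 3))
```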

[Hydrophobicity profile: plot of hydrophobicity values vs residue position]

Seq: AIKSWVKTIARTYLLNS

Power of amphipathicity = $\sum |\ldots| / N$ = 2.01
