HANNES ERIKSSON
The Author shall, when transferring the rights of the Work to a third party (for example a
publisher or a company), acknowledge the third party about this agreement. If the Author
has signed a copyright agreement with a third party regarding the Work, the Author
warrants hereby that he/she has obtained any necessary permission from this third party to
let Chalmers University of Technology and University of Gothenburg store the Work
electronically and make it accessible on the Internet.
Hannes Eriksson
A compound transformed into high dimensional space represented in a Gaussian process, see
page 10.
Testing drugs during discovery is time-consuming and expensive. One idea is therefore to eliminate unpromising compounds from the testing phase by using online learning methods to predict the properties of yet-to-be-tested compounds and to determine which drugs to test. This is done by comparing substructures in the graph representations of compounds, transformed into a compressed high-dimensional space, where a Gaussian process bandit and a linear bandit are used to predict the properties of new compounds. Results show that the bandits perform significantly better than random selection and that the feature compression probably does not decrease the overall accuracy of the predictions.
Keywords
Contextual bandits, Gaussian process bandit, experimental design, linear bandit, signature descriptor, compressed sensing, online learning, reinforcement learning
Acknowledgements
I would like to give special thanks to my supervisors Christos Dimitrakakis (Chalmers)
and Lars Carlsson (AstraZeneca) for their help with this master thesis.
Glossary
IC50 A metric for the potency of a drug which indicates how much of the drug is needed
to inhibit a process by half
Contents

1 Introduction
1.1 Aims
1.2 Scope
1.3 Methodology
1.4 AstraZeneca
1.5 Outline
1.6 Related work

2 Background
2.1 Biochemistry terminology
2.2 Bayesian probability
2.3 Gaussian process
2.3.1 Definition
2.3.2 Motivation and application
2.4 Signature descriptors
2.4.1 Definition
2.4.2 Motivation and application
2.5 Compressed sensing
2.5.1 Definition
2.5.2 Motivation and application
2.6 Bandits
2.6.1 Definition
2.6.2 Motivation and application
2.7 Selection algorithms
2.7.1 Linear Bandit selection using Thompson Sampling
2.7.2 Gaussian Process Bandit selection

3 Experiment
3.1 Details
3.2 Initial setting

4 Results
4.1 Initial setting
4.1.1 CDK5
4.1.2 GNRHR
4.1.3 MAPK14
4.1.4 MGLL
4.2 Randomized setting

5 Discussion
5.1 Results
5.1.1 Initial setting
5.1.2 Randomized setting
5.2 Compressed sensing
5.2.1 Gain for GP-UCB
5.2.2 Gain for LinearBandit-TS
5.3 Exploitation vs. exploration with δ
5.4 Selection budget
5.5 GP-UCB vs. LinearBandit-TS
5.5.1 LinearBandit-TS
5.5.2 GP-UCB
5.6 General discussion

6 Conclusion
6.1 Results
6.2 GP-UCB or LinearBandit-TS?
6.3 Concluding remarks
6.4 Future work

References
1 Introduction
Synthesizing and testing compounds for their potential use as drugs is time consuming and expensive [1]. One idea to alleviate this is to restrict the number of compounds sent for testing. Which compounds to continue research on would then be decided by using Artificial Intelligence (AI) methods to predict properties of yet-to-be-tested compounds and to select which compounds to test. If the predictions are accurate enough, some unpromising compounds could perhaps be ruled out even before they have gone through a real testing phase. A breakthrough here would be interesting for the biopharmaceutical industry, as it could decrease costs since fewer compounds would have to be synthesized and tested.
1.1 Aims
The aims of this Master's thesis are to investigate algorithms for prediction on data in a high-dimensional space and to develop methods for automated experiment design for drug development, such that, given a set of compounds, the methods together with experimentalists can rule out unpromising compounds. This is achieved by having the methods successively predict properties (toxicity, target proteins and so on) of yet-to-be-tested compounds and then conducting tests on the compounds with the most promising properties. These properties are then verified by the experimentalists, who feed the tested properties back to the methods for more accurate predictions on future compounds.
1.2 Scope
Signature descriptors are a way of identifying similarities in the graph representations of compounds by looking at which atoms are present, what kinds of bonds connect the atoms, and so on. They are used here to predict properties of yet-to-be-tested compounds. This work only uses signature descriptors to compare the compounds.
One problem with this is that a compound is in reality a 3D-structure, while in this work compounds are modeled as 2D-structures. A consequence of this is that compounds with different 3D-structures, and hence different properties, might be represented by the same 2D-structure.
Global graph kernels, which unlike signature descriptors operate on the whole graph, can find even more information hidden in a graph, as shown by Johansson et al. [2]. These will not be investigated here because of time constraints, but it could be interesting to compare them against the combination of signature descriptors and bandits used in this work.
1.3 Methodology
The work done in this thesis consisted of three phases, namely a literature study phase, a data acquisition and pre-processing phase, and finally the implementation phase. These phases were mostly run in parallel with each other.
In the literature study part, the first goal was to read up on and become acquainted with the following: graph kernel methods, to give a metric on the similarity or dissimilarity of different compounds; contextual bandits, for selection of the most promising compounds; and Gaussian processes, for the online learning and prediction. Techniques for dimensionality reduction, such as compressed sensing, were also investigated because of the high dimensionality of the data.
The molecular data and their corresponding properties used in this report were taken
from literature [3]. This data is organized into several data sets where each of the
compounds in a particular data set was considered for the development of a particular
drug. Each compound has its corresponding IC50 value for a distinct process that the
drug in development is intended to affect.
The implementation phase consisted of both implementing the various algorithms and a testing stage where the algorithms were compared with each other, in order to decide which methods were best suited to the goal of the project.
1.4 AstraZeneca
AstraZeneca is a biopharmaceutical company whose work spans all the way from medicine
research and medicine development to the commercialization of medicines [4]. During
the development of new drugs these drugs normally have to undergo a testing phase to
determine which drugs have the most promising properties. It is in this phase where the
work conducted in this thesis is interesting. Instead of having to test the whole batch of
drugs to find the most promising ones we test a part of the batch and hope to identify
the best drugs in the batch without testing all of them.
Figure 1.1: Information flow from molecular data through signature descriptors (SD) and compressed sensing (CS) to selection.
1.5 Outline
Figure 1.1 depicts the information flow from molecular structures, mapped onto some hyperspace, to what the algorithm deems the most promising compounds in the data set.
In chapter 2 we cover the theoretical part of the different techniques used throughout
this work. In chapter 3 we describe the experiment settings in which these techniques are
used. In section 3.2 we describe the setting of an experiment with set data. In section 3.3
we describe the setting of an experiment on generated data. In chapter 4 we show the
results of running these experiments. In chapter 5 we discuss these results. In chapter 6
we give our concluding remarks on this work as well as possible improvements on our
solutions and other ideas to explore that did not fit in this work.
human intelligence is used to both test and interpret properties of the compounds.
The work done in Srinivas et al. [6] uses the very same idea for bandit selection as
in this work (GP-UCB) and they show regret bounds for different kernels and compare
the algorithm with heuristic methods such as Expected Improvement and Most Probable
Improvement. They then use these methods to try to find the most congested part of a
highway.
Krause and Ong [7] gave multi-task experiment design as an example application of their CGP-UCB algorithm, where the idea was to design a vaccine through a sequence of experiments. In that paper they wanted to identify the maximally binding peptides for complex compounds. The similarity measure used for the context is the Hamming distance between amino acids in the peptide bindings, as opposed to the signature descriptors used in this work, and they focus on a select few compounds that the peptides are supposed to bind.
In a recent paper, Williams et al. [1] employ AI methods in the drug discovery process to discover new compounds to combat tropical diseases. They do this using a Gaussian process with a linear kernel.
The contribution of this thesis to the field of experiment design is that we show that using a Gaussian process for prediction and Upper Confidence Bound for selection works well in the drug development setting. We compare that combination with another selection algorithm using a linear bandit with Thompson sampling, as well as with random selection. We show that using a dimensionality reduction on the data probably does not lower the overall accuracy of the selection and prediction.
2 Background
This chapter will serve as a baseline description for the different methods used throughout
this project. Firstly, we will explain the basic terminology used in this work, then
the methods of processing the data and finally the algorithms we used to control the
prediction and selection process of the drugs in development. The order of things in this
chapter mainly follows the structure depicted in Figure 1.1.
Figure 2.1: Graph of a scaffold with its corresponding R-groups.
of probability. All the methods used in this work that are probabilistic are based on
Bayesian probability.
2.3.1 Definition
Let x = (x1, x2, ..., xn) ∈ X be a set of points in some input space and let f = f(x), where f is a function f : X → R. P(f) is then a GP if the marginal distribution of f over any finite set of points is distributed as a multivariate Gaussian.
A Gaussian process is defined by its mean function m(x) and its covariance function k(x, x′). As shown in Rasmussen and Williams [11], this means the GP can be written as f(x) ∼ GP(m(x), k(x, x′)). The mean function can often be taken as zero, but the chosen covariance function, or kernel, is critical for the GP's ability to predict and compare points with each other. A commonly used kernel is the Radial Basis Function kernel as described in subsection 2.7.2.
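As an illustration of this definition, the posterior mean and variance of a zero-mean GP can be computed in closed form from the kernel matrix. The sketch below is an illustration only, not the implementation used in this work; the RBF width and the jitter term are arbitrary choices. It predicts a noise-free function on [0, 2π], mirroring the setting of Figure 2.2.

```python
import numpy as np

def rbf(a, b, sigma=1.0):
    """RBF kernel matrix between the rows of a and the rows of b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def gp_posterior(X, y, Xs, sigma=1.0, jitter=1e-8):
    """Posterior mean and variance at test points Xs for a zero-mean GP."""
    K = rbf(X, X, sigma) + jitter * np.eye(len(X))  # jitter for stability
    Ks = rbf(X, Xs, sigma)
    mu = Ks.T @ np.linalg.solve(K, y)
    # k(x, x) = 1 for the RBF kernel, so the prior variance is 1 everywhere
    var = 1.0 - np.einsum('ij,ij->j', Ks, np.linalg.solve(K, Ks))
    return mu, var

# Noise-free observations of sin(x) on [0, 2*pi], as in Figure 2.2
X = np.linspace(0, 2 * np.pi, 8).reshape(-1, 1)
y = np.sin(X).ravel()
Xs = np.linspace(0, 2 * np.pi, 50).reshape(-1, 1)
mu, var = gp_posterior(X, y, Xs)  # variance shrinks near observed points
```

The predictive variance collapses at the observed points and grows between them, which is exactly the behaviour the selection algorithms in section 2.7 exploit.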
Figure 2.2: A Gaussian process with predicted mean and variance for points in [0,2π].
2.4.1 Definition
A molecular 2D-structure could be seen as an undirected graph G containing a set
of vertices V and a set of edges E. These vertices can be atoms or more complicated
chemical structures that make up a compound. The edges represent the molecular bonds
present in the complete compound.
Faulon et al. [12] described a way to try to identify properties of a graph given
its spatial structure using different operators on the graph itself. These operators are
Figure 2.3: The paracetamol or acetaminophen compound modeled as a graph.
Figure 2.4: Paracetamol compound with its atoms numbered and with some of its signature descriptors shown for height = 1, n = 4 and height = 2, n = 6.
Table 2.1, adapted from [13], shows all the subgraphs originating from each of the vertices for the heights h = 1, 2, 3. The subgraph for n = 3, h = 1, for instance, is read as follows: a carbon atom that is singly bonded to another carbon atom, singly bonded to a nitrogen atom and doubly bonded to another carbon atom. This follows a recursive pattern.
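To make the recursive reading concrete, the sketch below builds a height-h string for an atom from its bonded neighbours. This is a simplified illustration of the idea only, not Faulon et al.'s canonical signature algorithm; the graph encoding and bond symbols are hypothetical.

```python
def signature(graph, labels, atom, height, parent=None):
    """Recursive neighbourhood string. graph maps atom -> [(neighbour, bond)],
    labels maps atom -> element symbol; bonds are '-' (single) or '=' (double)."""
    if height == 0:
        return labels[atom]
    subs = sorted(
        bond + signature(graph, labels, nb, height - 1, parent=atom)
        for nb, bond in graph[atom]
        if nb != parent  # do not walk straight back along the incoming edge
    )
    return labels[atom] + "(" + ",".join(subs) + ")"

# Toy fragment: a carbon bonded to a carbon, a nitrogen and (doubly) a carbon
graph = {1: [(2, "-")], 2: [(1, "-"), (3, "-"), (4, "=")],
         3: [(2, "-")], 4: [(2, "=")]}
labels = {1: "C", 2: "C", 3: "N", 4: "C"}

print(signature(graph, labels, 2, 1))  # prints C(-C,-N,=C)
```

The height-1 signature of atom 2 reads exactly like the example above: a carbon singly bonded to a carbon, singly bonded to a nitrogen and doubly bonded to another carbon.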
2.5.1 Definition
Given a sparse structure of data x, the question is whether it is possible to compress this data to a much smaller data set y such that the elements in y preserve the information contained in x with high probability. Baraniuk et al. [14] argue that this is the case given that the signal has the restricted isometry property. The idea is then to construct a matrix Φ such that y = Φx.
Let the density, i.e. the number of non-zero elements, of the data set x be S. Let the dimensionality of the data set x be N. Let K = O(S log(N/S)).
Candès [15] showed that it is then possible to generate Φ with dimensions K × N using a Gaussian matrix consisting of values o_{i,j} ∼ N(0, 1/S).
(y_1, y_2, ···)^T_{K×1} = (o_{i,j})_{K×N} (x_1, x_2, ···)^T_{N×1}   (2.1)
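A minimal sketch of this construction follows; the sizes N and S below are arbitrary illustrative choices, not the dimensions used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 10_000                   # dimensionality of the sparse descriptor vector
S = 100                      # density: number of non-zero elements
K = int(S * np.log(N / S))   # compressed dimension, K = O(S log(N/S))

# Random Gaussian projection with entries o_{i,j} ~ N(0, 1/S), Equation 2.1
Phi = rng.normal(0.0, np.sqrt(1.0 / S), size=(K, N))

# A sparse vector x, e.g. signature-descriptor counts with S non-zeros
x = np.zeros(N)
x[rng.choice(N, size=S, replace=False)] = 1.0

y = Phi @ x                  # compressed representation, length K << N
```

Distances between sparse vectors are approximately preserved under such a random projection with high probability, which is what makes the compressed features usable by the bandits downstream.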
2.6 Bandits
A bandit is, as described by Auer et al. [17], a theoretical machine, inspired by regular slot
machines such that when played it produces some reward from some unknown probability
distribution specific to that machine. The inspiration comes from the following problem.
Suppose you are a gambler in a casino. Your goal is to maximize your winnings and
you have a number of slot machines to play on. At each time step you select one slot
machine and play it. Some reward, in this case the amount of money won for that round,
is observed and recorded. At the next time step you have access to the observed data of
all the previous rounds. Now your goal is to maximize your winnings given this historic
data. See Figure 2.5 for an idea of what a multi-armed bandit problem can look like.
2.6.1 Definition
A contextual multi-armed bandit (MAB) problem can have the following parameters: K is the number of arms, a_{i,t} ∈ A is a set of actions for a specific arm at a given time and x_{i,t} ∈ R^d a set of contexts. Each arm has its specific unknown probability distribution D_i and produces a reward r_{a_i,t} ∼ D_i when played. The goal is then to either maximize the total reward over some time T or to minimize the regret over some time T. Regret here is defined as the difference between the observed rewards of the selected arms and the reward from the optimal arm at each time step t = 1, 2, ..., T.
Regret = Σ_{t=1}^{T} (r_{a*_i,t} − r_t)   (2.2)
Clinical trials could be seen as a MAB problem, as described by Katehakis and Veinott [18], where each treatment corresponds to one bandit. At each time step one of the treatments is selected, the corresponding machine is played and some reward is observed, commonly in a binary fashion: 1 if the treatment was successful or 0 if it was not. A common goal in this setting is to try to maximize the number of successful treatments, which in the most basic setting, where the probability that a treatment is successful is the same for all patients, is simply finding the best bandit to play.
The step from a clinical trial model to one of drug design is quite simple. Instead of
maximizing the number of successful treatments as in the clinical trial case we maximize
over the IC50 values of the compounds.
One motivation behind using this particular model is that bandits can handle the exploitation vs. exploration trade-off quite well. A common issue in optimization problems is getting stuck in local maxima/minima, and exploitation vs. exploration deals with just that. The gambler has to choose whether to play the bandit that seems to be optimal right now (exploitation) or to play a bandit that the gambler currently has very little information about (exploration).
Figure 2.6 shows a bandit algorithm used in conjunction with a Gaussian process. It clearly shows the exploitation vs. exploration dilemma: the best points are rarely the ones with the highest mean, so in order to find the optimal bandit to play, the algorithm has to explore unknown space. Exactly how it does this is explained in subsection 2.7.2.
Details There are N arms. Each arm has a context x_i ∈ R^d associated with it. B is a d × d matrix capturing the observed contexts. This matrix is used to update the posterior mean and variance. µ̂ is a d-vector that is the sample mean of the rewards of the observed contexts. v is set to v = R√((24/ε) d log(1/δ)), where ε and δ are the hyper-parameters of the algorithm. R is set such that r_{i,t} ∈ [x_i^T µ − R, x_i^T µ + R]. The algorithm then samples µ̃ from the posterior distribution and selects the arm that maximizes x_i^T µ̃. For more details, see the work by Agrawal and Goyal [19].
Motivation The range of rewards when optimizing for pIC50 values is approximately known beforehand, so the R parameter can be set without issue. Furthermore, the implementation is quite simple and the idea behind it is different enough from GP-UCB. It has also been shown, by Deshpande and Montanari [20], that linear bandits can work well for problems of high dimensionality.
Algorithm 1 LinearBandit-TS
B = I_d, µ̂ = 0_d, f = 0_d.
for all t = 1, 2, ..., T do
    Sample µ̃ ∼ N(µ̂, v² B⁻¹).
    Play arm i ∈ argmax_i x_i^T µ̃. Observe reward r_{i,t}.
    B ← B + x_i x_i^T.
    f ← f + x_i r_{i,t}.
    µ̂ ← B⁻¹ f.
end for
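A runnable sketch of Algorithm 1 on a hypothetical noise-free toy problem follows; the context dimension, the value of v and the linear reward model are illustrative assumptions, not the setup used in this work.

```python
import numpy as np

def linear_bandit_ts(X, rewards, T, v=0.5, seed=0):
    """Sketch of Algorithm 1: Thompson Sampling for a linear bandit.

    X: (N, d) per-arm contexts; rewards: true (noise-free) per-arm rewards.
    """
    rng = np.random.default_rng(seed)
    N, d = X.shape
    B = np.eye(d)           # B = I_d
    mu_hat = np.zeros(d)    # posterior mean estimate
    f = np.zeros(d)
    history = []
    for _ in range(T):
        # Sample a parameter vector from the posterior N(mu_hat, v^2 B^-1)
        mu_tilde = rng.multivariate_normal(mu_hat, v ** 2 * np.linalg.inv(B))
        i = int(np.argmax(X @ mu_tilde))    # play the arm maximizing x_i^T mu_tilde
        r = rewards[i]                      # observe its reward
        B += np.outer(X[i], X[i])
        f += X[i] * r
        mu_hat = np.linalg.solve(B, f)      # mu_hat = B^-1 f
        history.append(r)
    return history

# Hypothetical toy problem: 20 arms, rewards linear in a 3-dimensional context
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
rewards = X @ np.array([1.0, -0.5, 2.0])
history = linear_bandit_ts(X, rewards, T=50)
```

Each round tightens the posterior around the underlying parameter vector, so later rounds should increasingly play near-optimal arms.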
Algorithm 2 GP-UCB
GP prior µ_0 = 0, σ_0, kernel K.
for all t = 1, 2, ..., T do
    Play arm i ∈ argmax_i µ_{t−1}(x_i) + √β_t σ_{t−1}(x_i). Observe reward r_{i,t}.
    Perform Bayesian update to obtain µ_t and σ_t by letting the GP learn (x_i, r_{i,t}).
end for
Details There are N arms. Each arm has a context xi ∈ Rd associated with it. µ0
and σ0 are the initial hyper-parameters of the GP. K is a covariance function that is
used to predict the mean of other points. The kernel used in this work is the Radial Basis Function (RBF) kernel. Two points x_i, x_i′ have their similarity defined by
Figure 2.6: A plot of a Gaussian process in several stages. The points represent the true
reward, the shaded area the variance and the solid line represents the Gaussian process
mean.
K(x_i, x_i′) = exp(−‖x_i − x_i′‖² / (2σ²)). β(t) is a function that grows slowly with time and is used to handle the exploitation vs. exploration trade-off: β(t) = 2 log(d t² π² / (6δ)). For more specific details, see the work by Srinivas et al. [6].
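Combining the pieces above, the following is a sketch of the GP-UCB loop over a fixed set of arms. The kernel width, jitter and δ are illustrative choices; this is not the exact implementation used in this work.

```python
import numpy as np

def gp_ucb(X, rewards, T, delta=0.5, sigma=1.0, jitter=1e-6):
    """Sketch of the GP-UCB loop (Algorithm 2) on a fixed set of arms."""
    N, d = X.shape

    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))

    played, observed = [], []
    for t in range(1, T + 1):
        if played:
            Xp = X[played]
            K = k(Xp, Xp) + jitter * np.eye(len(played))
            Ks = k(Xp, X)
            mu = Ks.T @ np.linalg.solve(K, np.array(observed))
            var = np.clip(1.0 - np.einsum('ij,ij->j', Ks, np.linalg.solve(K, Ks)),
                          0.0, None)
        else:
            mu, var = np.zeros(N), np.ones(N)  # GP prior: mu_0 = 0, unit variance
        beta = 2 * np.log(d * t ** 2 * np.pi ** 2 / (6 * delta))  # beta(t) schedule
        i = int(np.argmax(mu + np.sqrt(beta * var)))  # upper confidence bound
        played.append(i)
        observed.append(rewards[i])
    return played

# 15 arms with contexts on [0, 2*pi] and rewards from a smooth function
X = np.linspace(0, 2 * np.pi, 15).reshape(-1, 1)
rewards = np.sin(X).ravel()
chosen = gp_ucb(X, rewards, T=10)
```

With the noise-free rewards assumed in section 3.1, an arm that has been played gets near-zero variance and hence no exploration bonus, so the loop naturally moves on to unexplored arms.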
Motivation The reasoning for using a GP for bandit selection is that a way to compare points comes innately with the GP through its covariance function K. Should the RBF kernel not work well enough, it could be substituted with some other kernel. The GP can also be used to predict properties of arbitrary points, something that is used in section 3.3.
The motivation behind using UCB as the selection method is that it handles the exploitation vs. exploration trade-off in a clever way. GP-UCB in action can be seen in Figure 2.6. The Gaussian process has no information about the reward values initially, but as more observations are made it becomes more confident in its predictions. The numbers on the x-axis are the compound or bandit numbers. Each of those numbers corresponds to a specific context, as described in section 2.6.
3 Experiment
In this chapter we will describe what is meant by an experiment. In addition, two different settings of the experiment will be discussed. In the first setting the methods will be used on real data sets.
In the other setting the data will be generated using a Gaussian process that has learned a subset of one of the real data sets. The reasoning for this is mainly that there is not much data in the data sets and there are no repeated measurements for each compound, so to strengthen the results the algorithms will also be tried on generated data.
3.1 Details
The experiments are both modeled as multi-armed bandit problems where each compound or data point has its own bandit. Playing these bandits is equivalent to observing the test results from running real chemical tests on the compounds.
The metrics used to test the performance of the algorithms in this work are cumulative regret and simple regret. Regret is defined as in Equation 2.2, and the reasoning for testing two separate definitions of regret is that with cumulative regret we can see how close, on average, the whole tested set is to the optimal tested set, but we miss out on information on how close the best compound in the tested data set is to the optimal compound in the data set. Let d be the size of the feature vector. Let N be the total number of samples. Each set contains a set of points (x_i, y_i), where i = 1, 2, ..., N, x ∈ R^{d×N} is the feature matrix and y ∈ R^N is the reward vector. r* is defined to be r* = max_i y_i. Let r_(i) be the i-th highest reward in the data set. From these definitions we get the two metrics as defined in Equation 3.1 and Equation 3.2.
Cumulative regret = Σ_{t=1}^{T} (r_{(t)} − r_t)   (3.2)
The goal is then to select a number of data points 0 < x ≤ N that minimizes the regret as defined in Equation 3.1 and Equation 3.2. The tests of compounds are here assumed to be noise-free, i.e. multiple tests of the same compound give the same result. Because of this assumption it does not make sense to play the same bandit twice, and as such the regret definition can be simplified to Equation 3.2.
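The two metrics can be computed directly from a reward history. In this sketch, simple regret is paraphrased from the description above as the gap between the best compound overall and the best compound actually tested, and cumulative regret follows Equation 3.2; the pIC50 values are hypothetical.

```python
def simple_regret(all_rewards, tested):
    """Gap between the best reward in the data set and the best one observed."""
    return max(all_rewards) - max(tested)

def cumulative_regret(all_rewards, tested):
    """Equation 3.2: compare the t-th play against the t-th highest reward."""
    best = sorted(all_rewards, reverse=True)  # r_(t): t-th highest reward
    return sum(b - r for b, r in zip(best, tested))

all_rewards = [6.1, 7.4, 8.0, 6.9, 7.8]   # hypothetical pIC50 values
tested = [7.4, 8.0, 6.1]                  # rewards observed, in test order

print(simple_regret(all_rewards, tested))      # 0.0: the best compound was found
print(cumulative_regret(all_rewards, tested))  # (8.0-7.4)+(7.8-8.0)+(7.4-6.1)
```

Note that individual cumulative-regret terms can be negative, since an early play may beat the corresponding rank in the sorted reward list.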
CDK5 Cyclin-dependent kinase (CDK) plays an important role in the cell division
cycle and is often deregulated in developing tumors, as described by Meijer et al. [21].
Figure 3.1: Graphs of the scaffolds and R-groups of the data sets (CDK5, MGLL, GNRHR and MAPK14).
MGLL Monoacylglycerol lipase (MGLL), whose levels may be regulated for pain suppression and for inflammatory disorders, as described by Labar et al. [24].
3.3.2 Details
Since the model now takes noise and multiple testings into account, the regret definitions have to be changed to accommodate this. The new definitions are shown in Equation 3.3 and Equation 3.4. The main difference is that each bandit now produces noisy rewards, so which compound is the best is not immediately apparent even after testing. Since it is now possible for the selection algorithm to choose the optimal bandit multiple times, the new cumulative regret is just the difference between playing the best bandit for all trials and the sum over the entire reward history.
The noise for both the GP model and the rewards is assumed to be distributed as N(0, 1).
Cumulative regret = T·r* − Σ_{t=1}^{T} r_t   (3.4)
4 Results
This chapter contains the results achieved by running the methods used throughout this
work in the two settings described in chapter 3.
4.1.1 CDK5
All methods both with and without compressed sensing are tried on the CDK5 data set.
The results are shown in Figure 4.1.
4.1.2 GNRHR
The LinearBandit-TS algorithm without compressed sensing is skipped in this test because of the long time it takes to evaluate. Other than that, all methods are present once again. Results are shown in Figure 4.2.
4.1.3 MAPK14
The results are shown here only with compression, since not using it takes a lot of time. In addition to the normal tests, two tests with other δ values for GP-UCB are conducted. Results are shown in Figure 4.3.
4.1.4 MGLL
As in the previous section, only tests with compression are conducted, with the very same δ values as before. Results are shown in Figure 4.4.
Figure 4.1: Test results for the CDK5 data set showing the average cumulative and simple regret.
Figure 4.2: Test results for the GNRHR data set showing the average cumulative and simple regret.
Figure 4.3: Test results for the MAPK14 data set showing the average cumulative and simple regret.
Figure 4.4: Test results for the MGLL data set showing the average cumulative and simple regret.
Figure 4.5: Test results for the generated data set showing the average cumulative and simple regret.
Figure 4.6: Test results for the generated data set with noise showing the average cumulative and simple regret.
5 Discussion
In this chapter our interpretation of the results shown in chapter 4 will be discussed, as well as a comparison between the two different selection algorithms. It will also feature arguments for why the problems tackled and the solutions described in this project might be interesting in the future.
5.1 Results
This section contains our interpretation of the results achieved from running the methods
described in this work on the four known data sets as well as on the newly generated
data.
it is learning far too slowly compared to GP-UCB. This is because the linear bandit quite often decides to play the very same bandit a few times before moving on to another bandit.
The biggest difference between the results in the no-noise setting and the noisy setting is that it takes GP-UCB quite a bit more time in the noisy setting to identify the promising compounds. Note that we are dealing with quite heavy noise here. The rewards range from around 6–8 pIC50 and the noise is distributed as N(0, 1), so it is quite likely that a relatively bad compound will be taken for a good one and, similarly, a good compound assumed to be bad. Some test setups performed well even in this difficult task, especially the GP-UCB setups that value exploration more than normal (δ = 0.95).
Overall this non-standard linear bandit with Thompson sampling did not work well
in the randomized setting with repeated measurements. It is possible the standard
algorithm would work better in this case.
Unfortunately we did not have access to information on how noisy the tests for the compounds in the patent data were. Realistically, however, the noise should be quite a lot lower than what we used here.
in the relatively small compounds featured in this work, then we can reduce the d² term to (100 log(d/100))², which is a lot smaller than d² for big d.
5.5.1 LinearBandit-TS
To use this version of Thompson Sampling with a linear bandit, it is crucial to know beforehand in what range the rewards lie. The implementation mainly scales with the dimensionality of the features and so is quite slow for complex data sets without compression.
5.5.2 GP-UCB
It is important that the RBF kernel knows how much to scale the input data for the
algorithm to work properly. This means that the input data has to be of about the
same magnitude. This implementation scales with both the number of samples and the
number of features but in general the number of samples takes precedence. It is easy
to control the exploitation vs. exploration trade-off which might be important for it to
work well, both for data sets where the data is vastly different and for data sets where
the data is quite similar in nature.
those cases is true, then automated experiment design is surely helping society to become
a better place to live in.
6 Conclusion
In this chapter we will summarize our results and our thoughts on this work. We will conclude on when to use GP-UCB and when to use LinearBandit-TS, going by our simplified explanations in section 5.5. We will also go over some improvements and ideas that might be interesting to pursue if one were to continue this work.
6.1 Results
We conclude that these methods work quite well in this context. Identifying promising elements in a data set when the number of tests is much smaller than the number of elements in the data set is not a trivial task, especially when there is no training data or previous observations to make use of. The methods work online and still manage to identify promising compounds in a relatively short period of time.
• GP-UCB
• LinearBandit-TS
    + Simple
    + δ for exploitation vs. exploration
    − Scales with features – slow without compression
    − Slow to adapt?
    − Needs to know reward range
Replacing the current graphlet method for identifying similar structures in graphs (signature descriptors) with graph kernels with geometric embeddings, as in Johansson et al. [2]. The authors show in that paper that their algorithm can work better than, or at least as well as, current graphlet methods such as the one used in this work. There is a possibility that signature descriptors miss out on some interesting information concealed in the graph that a global graph kernel could discover.
Trying out other exploration strategies, not only Thompson Sampling and UCB. There might be a better way of selecting the next drugs to test.
Developing a bandit that can use an arbitrary predictor. As an example, Eklund et al. [13] have noticed that conformal predictors work well for prediction in the drug development setting, so trying out conformal predictors could be interesting.
Figure 6.1: The tree of posterior predictive distributions of a bandit run at trial t.
References
[3] GOSTAR databases 2012; GVK Biosciences Private Ltd.: Hyderabad, India.
[8] J. S. Soothill, R. Ward, and A. J. Girling, “The IC50: an exactly defined measure
of antibiotic sensitivity,” Journal of Antimicrobial Chemotherapy, vol. 29, no. 2, pp.
137–139, 1992.
[25] F. Le Gall, "Powers of Tensors and Fast Matrix Multiplication," CoRR, vol. abs/1401.7714, 2014. [Online]. Available: http://arxiv.org/abs/1401.7714
[26] U. Rester, “From virtuality to reality - Virtual screening in lead discovery and lead
optimization: a medicinal chemistry perspective,” Curr Opin Drug Discov Devel,
vol. 11, no. 4, pp. 559–568, 2008.