Exploring Biological Pathways Using Unity3D
There are two main challenges for using game engines for biological data
visualization. First, how to represent the biological data to be visualized using
the game engine data structures. Second, how to efficiently implement the
necessary data exploration operations using these data structures.
This thesis presents what is, to our knowledge, the first approach for mapping
biological data to the ECS model. We used the approach to implement Freia, an application
for visualizing gene expression data integrated with pathway images. We
evaluated the performance and scalability of Freia by measuring the smoothness
of key data exploration operations. Our results show that Freia provides a frame
rate above 30 fps for these operations for up to 100 simultaneously shown
pathways.
We believe our approach demonstrates that game engines are well suited to
implement data visualization tools for the upcoming biological data studies.
Acknowledgements
First I would like to thank my advisors, Associate Professor Lars Ailo Bongo,
PhD Candidate Bjørn Fjukstad and Associate Professor John Markus Bjørndalen
for their continuous feedback, support and motivation during the course of this
project.
I would also like to thank the NOWAC research group for their input to this project.
To my fellow students, particularly those in "Slytherin": thanks for all the great
years at the university. I will really miss our daily burn runs!
Finally, I would like to thank my beloved family for their encouragement, support
and all the warm dinners throughout the period of this project.
Kenneth
Tromsø, November 2015
Contents
Abstract
Acknowledgements
List of Tables
List of Abbreviations
1 Introduction
1.1 Challenges & Requirements
1.2 Proposed solution
1.3 Contributions
1.4 Outline
2 Mapping
2.1 Entity Component System (ECS)
2.2 Biological Pathways and data
2.3 Mapping from biological data to ECS
2.4 Lesson Learned
3 Freia
3.1 Visualization of Biological Pathways
3.2 Operations
3.2.1 Pathway Search
3.2.2 Gene Search
3.2.3 Gene Expression Visualization
3.2.4 Path Discovery
5 Related Work
5.1 Kvik
5.2 KEGG
5.3 BioCarta
5.4 KEGGViewer
5.5 Caleydo
5.6 Entourage
5.7 Dynamic Exploration
5.8 UnityMol
6 Conclusion
6.1 Future Work
Bibliography
List of Figures
2.1 An entity
2.2 Drawing system
2.3 ECS overview of Super Mario.
2.4 Component alignment in memory.
2.5 KEGG pathway for estrogen signaling.
2.6 Microarray technology.
2.7 ECS table overview of a pathway visualization application.
List of Abbreviations
API Application Programming Interface
1
Introduction
Cancer is a complex group of diseases with many possible causes. It is the
second leading cause of death in the United States, and it is expected to
surpass heart disease as the leading cause within a few years. The increase in
cancer cases is due to an aging and growing world population. The probability
of being diagnosed with cancer is 43% for men and 38% for women, whereas it is
only 3.4% and 5.4%, respectively, for men and women younger than 50 years [1].
The authors of [1] estimate that about 589,430 Americans will die from cancer
this year, which corresponds to about 1,600 deaths per day.
The main focus of the thesis is exploring how well suited game development
frameworks are for such biological data visualization.
The tool should support whatever hardware and software the researchers use.
The tool should adapt to the screen resolution the researchers have available,
so that the user experience is the same whether or not the researcher has the
newest high-resolution monitor.
The tool should be portable to whatever device the researchers have available,
and this may include smartphones. Therefore, the tool has to be lightweight
and requires a backend for computation and storage.
The life science community often uses R³, Microsoft Excel⁴ or BioJS [3] for
visualization of biological data. There are projects that specialize in
visualization of biological data, but these are tailored for special tasks or
data sets, which results in few users. This is elaborated in Chapter 5. The
gaming community, on the other hand, has to continuously create and update
software that supports and exploits the latest hardware to stay competitive in
the gaming industry. The life sciences do not have the user base and commercial
financing needed for such frameworks.
1. kegg.jp
2. reactome.org
3. r-project.org
4. products.office.com/nb-no/excel
1.2 Proposed solution
The Unity3D game engine is also based on a component system that follows the
composition-over-inheritance principle. The advantage of ECS is flexibility:
any entity, whether it is a tree, a bullet or a player, can use the same
component, such as a collision script. An entity can be viewed as a key with a
unique entity ID, with components assigned to it. For an entity to be processed
by a system, and processed efficiently, the entity must contain the components
that the system requires. The ECS concept is further described in Chapter 2.1.
1.3 Contributions
We implemented a data exploration application, called Freia, using Unity3D.
The computation therefore has to be done in the backend. We use the Kvik
framework [4] to handle data and computation. This results in a lightweight
application that the researcher can use, since the Kvik framework provides the
required functionality for retrieving pathway maps and their respective metadata.
The Kvik framework also contains the functionality for accessing experimental
data, such as gene expression or methylation.
In order to use the efficiency advantage of the ecs model, the biological data
5. unity3d.com
6. unrealengine.com
7. cryengine.com
8. blender.org
9. coronalabs.com
• To our knowledge, the first approach for mapping the data types of
biological data visualization tools for efficient representation, processing
and visualization using ecs in game engines.
We found that our approach has several advantages, particularly when it comes
to using a game engine that enforces the use of ECS, because of its modularity
and the reusability of the routines and scripts in the game engine. A script
or a routine can define how an image should behave in the application, and it
can be reused regardless of which image source the image comes from. Finally,
the greatest advantage of our approach is the efficient processing provided
by ECS. Because ECS aligns structures of the same type together in memory,
iterations may exploit spatial locality in the memory layout.
However, there are also some disadvantages. The Unity3D game engine is free to
use but closed source, so we do not know the underlying details of, for
example, ECS memory alignment.
We believe our approach demonstrates that game engines are well suited to
implement data visualization tools for the upcoming biological data studies.
1.4 Outline
The remainder of the thesis is structured as follows.
[Figure 2.1: An entity.]
[Figure 2.2: Drawing system.]
An entity can be viewed as a key with a unique entity ID, and the entity has
components assigned to it, as seen in Figure 2.1. For a system to process an
entity, the entity must have the components that the system requires. As seen in
Figure 2.2, the drawing system requires the sprite and the position components
in order for an entity to be rendered to the screen. The key may unlock the
calculation «key-hole» if it has the right components, the «key-pins». An entity
may acquire components dynamically. For example, in «Super Mario» Mario
can acquire the «Super Mushroom», which grows Mario into a bigger version of
himself. If Mario is hit by a turtle shell when he is Super Mario, he goes
back to normal size. As seen in Figure 2.3, Mario starts with the «Small Mario»
component. When Mario hits the Super Mushroom entity he acquires the Super
Mario component and loses the Small Mario component.
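The key and key-hole idea can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not Unity3D code: the `World` class, the `drawing_system` function and the component names are hypothetical and chosen only to mirror the Super Mario example.

```python
from itertools import count


class World:
    """Minimal ECS store: one dict per component type, keyed by entity ID."""

    def __init__(self):
        self._ids = count(1)
        self.components = {}  # component name -> {entity ID -> data}

    def create_entity(self):
        return next(self._ids)

    def add(self, entity, name, data=None):
        self.components.setdefault(name, {})[entity] = data

    def remove(self, entity, name):
        self.components.get(name, {}).pop(entity, None)

    def has(self, entity, *names):
        return all(entity in self.components.get(n, {}) for n in names)

    def entities_with(self, *names):
        """Entities whose 'key-pins' include every required component."""
        stores = [set(self.components.get(n, {})) for n in names]
        return sorted(set.intersection(*stores))


def drawing_system(world):
    """Processes only entities that have both a sprite and a position."""
    return [
        (world.components["sprite"][e], world.components["position"][e])
        for e in world.entities_with("sprite", "position")
    ]


world = World()
mario = world.create_entity()
world.add(mario, "sprite", "mario.png")
world.add(mario, "position", (0, 0))
world.add(mario, "small_mario")

# Picking up the Super Mushroom swaps one component for another at run time.
world.remove(mario, "small_mario")
world.add(mario, "super_mario")
```

The entity itself is only an integer ID. All state lives in the component stores, and the system decides which entities it can process by checking which components they hold.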
[Figure 2.3: ECS overview of Super Mario. The component manager lists the entities Mario, Coin, Super Mushroom, Fire Flower and Enemy.]
Figure 2.3 shows a small example of how Super Mario could be represented in an
ECS model. Mario is the only entity that has an input component and is
therefore the only entity that the player can control. The coin, super
mushroom, fire flower and enemy entities all have the score component, but the
points gained from these entities are not equal. The super mushroom and fire
flower are worth 1000 points each and coins are worth 200 points; this is set
when attaching a new score component to the entities. It is the system that
contains the game logic. The entity only states which components it has, and
the components hold the state, e.g. a position component stores the x and y
position.
Components are loosely coupled, so components are not aware of other
components. To reference other entities, all that is needed is the entity ID;
there is no need to store a pointer or the entity itself.
Unity3D does not explicitly follow the ECS model, but it shares similarities
with it. It uses game objects as the unique entities and attaches components
to them, e.g. the built-in component «Transform». The transform component
stores the position, rotation and scale of the game object.
The small rectangles in Figure 2.5 represent genes and the arrows from or to
the genes are links. The rounded squares are other pathways that this pathway
is connected to. The small circles are compounds, which are collections of small
molecules, biopolymers, and other chemical substances that are relevant to
biological systems.
The content of the KEGG pathway images is represented in a separate file in
the KEGG Markup Language (KGML). The KGML file describes the type, coordinates
and size of the content, and lists any links. Not all of the content in a KEGG
pathway image is represented in the eXtensible Markup Language (XML) file, so
vital information may be missing, e.g. annotations such as the cell membrane,
which is shown as the two vertical lines in Figure 2.5.
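As an illustration of what such a file contains, the following Python sketch parses a small hand-written fragment in the style of KGML. The element and attribute names (`entry`, `graphics`, `relation`, `entry1`, `entry2`) follow the KGML format, but the identifiers, gene names and coordinates are made up for the example, and `parse_kgml` is a hypothetical helper rather than the parser used in Freia.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment in KGML style; all IDs, names and coordinates are illustrative.
KGML_SAMPLE = """
<pathway name="path:hsa00000" org="hsa" title="Example pathway">
  <entry id="1" name="hsa:0001" type="gene">
    <graphics name="GENEA" type="rectangle" x="100" y="50" width="46" height="17"/>
  </entry>
  <entry id="2" name="hsa:0002" type="gene">
    <graphics name="GENEB" type="rectangle" x="200" y="50" width="46" height="17"/>
  </entry>
  <entry id="3" name="cpd:C00000" type="compound">
    <graphics name="C00000" type="circle" x="150" y="120" width="8" height="8"/>
  </entry>
  <relation entry1="1" entry2="2" type="PPrel"/>
</pathway>
"""


def parse_kgml(text):
    """Returns {entry id: node dict} with type, name, geometry and outgoing links."""
    root = ET.fromstring(text.strip())
    nodes = {}
    for entry in root.findall("entry"):
        graphics = entry.find("graphics")
        nodes[entry.get("id")] = {
            "type": entry.get("type"),
            "name": graphics.get("name"),
            "x": float(graphics.get("x")),
            "y": float(graphics.get("y")),
            "width": float(graphics.get("width")),
            "height": float(graphics.get("height")),
            "links": [],
        }
    # Relations are directed: entry1 links to entry2.
    for relation in root.findall("relation"):
        source, target = relation.get("entry1"), relation.get("entry2")
        if source in nodes and target in nodes:
            nodes[source]["links"].append(target)
    return nodes


nodes = parse_kgml(KGML_SAMPLE)
```

Each parsed node carries exactly the information the text lists: its type, its coordinates and size, and its links.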
The NOWAC biobank is a collection of genetic data from patients and currently
consists of 70 000 blood samples. The genetic information is encoded in the
Deoxyribonucleic acid (DNA) in units (genes) and selectively used
(expressed/transcribed to Messenger ribonucleic acid (mRNA)) as templates for
production of proteins, which in turn are the working units of the cell [6].
2.3 Mapping from biological data to ECS
Id Expression
PCAT7 1.20
PCAN-R2 -1.34
CASC18 10.62
CASC20 4.00
which component this entity has. As the arrows in Figure 2.7 indicate, a new
entity or component is easily added at the end of the table.
The component manager contains systems such as a drawing system. This system
traverses the components and finds which of the entities hold both a sprite
and a position component. If we were to add another entity that should be
drawn, no changes to the drawing system would be needed, since all we have to
do is add the sprite and the position component to the new entity.
Figure 2.7: ECS overview of a pathway visualization application. The columns are
components and each row is an entity (Pathway, Gene, Compound).
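The table view can be sketched as follows. This is a simplified Python illustration, assuming the component names mentioned in this thesis (input, sprite, position, link and gene expression); the helper functions `add_entity` and `entities_for` are hypothetical.

```python
# Each row is an entity; the set lists which component columns are filled in.
table = []


def add_entity(name, components):
    """Appends a new entity row at the end of the table and returns its ID."""
    entity_id = len(table)
    table.append({"id": entity_id, "name": name, "components": set(components)})
    return entity_id


def entities_for(required):
    """Returns the names of the entities that hold every required component."""
    return [row["name"] for row in table if set(required) <= row["components"]]


add_entity("Pathway", ["input", "sprite", "position"])
add_entity("Gene", ["input", "sprite", "position", "link", "expression"])
add_entity("Compound", ["input", "sprite", "position"])

drawable = entities_for(["sprite", "position"])   # what a drawing system would process
with_expression = entities_for(["expression"])    # what a gene expression system would process
```

Adding a new drawable entity only means appending a row with the sprite and position components; the selection logic stays unchanged.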
But as mentioned earlier, we have only mapped one of the popular pathway
databases in this project. Further along in the development we could see
ourselves including other pathway databases, which would require us to refactor
the codebase whenever we want to extend the information basis.
component. For example, all these entities have input, sprite and position
components, but only the pathway entity should be able to be repositioned.
Genes and compounds are tied to their position within the pathway. Should the
position component in the pathway entity then push the new information to the
position components that are linked to the pathway entity, or should all of
this be handled in the system? Components should be independent, so the system
should handle this.
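A minimal Python sketch of this design choice is shown below: the position components stay plain data, and a movement system applies the offset to the pathway and to every entity that belongs to it. The function `move_pathway` and the `members` mapping are hypothetical names used only for illustration.

```python
def move_pathway(positions, members, pathway, dx, dy):
    """Movement system: shifts a pathway and all nodes that belong to it.

    positions: entity ID -> (x, y), i.e. the position components
    members:   pathway ID -> list of gene/compound entity IDs
    """
    for entity in [pathway] + list(members.get(pathway, [])):
        x, y = positions[entity]
        positions[entity] = (x + dx, y + dy)


positions = {"pathway": (0, 0), "gene": (10, 20), "compound": (30, 5), "other": (7, 7)}
members = {"pathway": ["gene", "compound"]}
move_pathway(positions, members, "pathway", 5, -5)
```

No component needs to know about any other component; the relation between a pathway and its nodes is only consulted by the system.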
The ECS model is hard to start with before you fully understand the concept,
and it is even harder to move from a hierarchical model to the ECS model.
However, we believe the simplicity of the final implementation makes the
mapping worthwhile.
3
Freia
The mapping in the previous chapter is realized as a standalone application,
Freia, implemented in Unity3D. It uses the Kvik framework [4] as a backend. The
application is portable to platforms including mobile phones, web browsers,
Windows, Mac, and Linux.
Figure 3.1: Screenshot of Freia. Showing the prostate cancer (hsa05215) pathway with
gene expression data.
The application was designed for researchers exploring gene expression data in
the context of biological pathways. This is reflected in the operations, which
are further explained in Chapter 3.2. We demonstrate the functionality of Freia
in this YouTube video: youtu.be/22XmfSYOwO8.
(a) Original static pathway image from KEGG.
(b) Overlaying graph nodes from the KGML representation of the pathway.
Figure 3.2: Visualizing gene expression data on KEGG pathway maps. Figure is
inspired by Figure 5.5 from [7].
With the knowledge of where the nodes are in the image, it is possible to create
nodes on top of nodes in the image. The user does not see these overlaid
nodes as they are transparent and it creates the illusion that the image is
interactive.
[Figure: Freia and the Kvik framework.]
The Kvik framework provides a REST interface to access the KEGG database
(Table 3.1). In Freia we use simulated gene expression values on a scale from
0 to 1.
Resource                         Description
GET /pathway/{id}/json           Returns the KGML of the pathway with the given id in JSON format.
GET /public/pathways/{id}.png    Returns the PNG image of the pathway with the given id.
GET /search/{term}               Returns pathways that match the given term in JSON format.
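The three resources can be addressed as in the following Python sketch. Only the resource paths come from the table; the base URL is a placeholder assumption, and the helper function names are hypothetical.

```python
from urllib.parse import quote

# Placeholder assumption: the address of a running Kvik backend is not given in the text.
BASE_URL = "http://localhost:8080"


def pathway_json_url(pathway_id):
    """URL for the KGML of a pathway in JSON format."""
    return f"{BASE_URL}/pathway/{quote(pathway_id)}/json"


def pathway_image_url(pathway_id):
    """URL for the PNG image of a pathway."""
    return f"{BASE_URL}/public/pathways/{quote(pathway_id)}.png"


def search_url(term):
    """URL for searching pathways by a term; the term is percent-encoded."""
    return f"{BASE_URL}/search/{quote(term)}"
```

For example, `search_url("bladder cancer")` encodes the space in the term, and `pathway_image_url("hsa05200")` addresses the image of the «Pathways in cancer» pathway.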
Freia is made in Unity3D with a single scene in which the camera is static.
There is a User Interface (UI) at the top of the screen, which the user uses
for the visualization operations elaborated in Chapter 3.2. The biological data
entities described in Chapter 2.3 are instantiated as game objects in Unity3D,
where the pathway and gene nodes in the KGML are represented as child game
objects of the KEGG pathway image (Figure 3.5).
Figure 3.4: Pathway search results from the search term «Cancer».
3.2 Operations
We have implemented operations which we consider important for biological
data exploration: i) search and visualize pathway, ii) search and highlight gene,
iii) gene expression visualization, and iv) path discovery (connections between
genes). Combined these cover a wide variety of visualization and exploration
for researchers.
Pathway search is implemented in Unity3D as a text field where the user can
enter text. Beneath this text field there is a bounded area with vertical
scrolling. Every search term entered is sent to the Kvik framework through its
REST interface to the KEGG database. The Kvik framework returns the pathways
that match the search term, and Freia creates result game objects that display
the title and store the id of the pathway. For instance, the title «Pathways
in cancer» has the pathway id «hsa05200».
When a result is clicked, a pathway entity is instantiated and two new GET
requests are sent to the Kvik framework. One request is for the pathway image
and the other is for the pathway KGML. Freia uses coroutine functions that wait
for a response from the backend. The image returned from the backend is loaded
directly into a Unity3D texture, which is applied to the pathway entity.
Parsing the KGML is slower than retrieving and rendering the image (Chapter
4.3.1), so the pathway image is visible before the user can interact with its
nodes. Every gene and pathway node that is present in the KGML is instantiated
with its position, size, edges, name and id. The nodes are instantiated as
transparent buttons that are child game objects of the pathway. This way they
follow the global space position of the pathway game object and move and scale
along with the pathway.
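The effect of the parent-child relationship can be illustrated with a small Python sketch. It assumes a 2D layout with uniform scale and no rotation, which is a simplification of what a game engine transform hierarchy does; the function name is hypothetical.

```python
def node_world_position(pathway_position, pathway_scale, node_local_position):
    """World position of a child node, derived from its parent pathway.

    Assumes uniform scaling and no rotation (a simplification for illustration).
    """
    px, py = pathway_position
    lx, ly = node_local_position
    return (px + pathway_scale * lx, py + pathway_scale * ly)


# The node keeps its local position; only the pathway is moved or scaled.
before = node_world_position((0.0, 0.0), 1.0, (10.0, 5.0))
after = node_world_position((100.0, 50.0), 2.0, (10.0, 5.0))
```

Because the node position is derived from the pathway, dragging or scaling the pathway never requires updating the nodes individually.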
Another essential function is to search for specific genes. Freia supports
searching for specific genes within pathways, and highlights the results on the
pathway visualization. Freia also lets the user highlight genes, and remove the
highlighting, by clicking on them. There is no communication with the Kvik
framework, since the gene search operates on the active pathways.
Every gene is overlaid on the pathway image as a button. The button is a
Unity3D class that can listen for click events. When a gene is instantiated,
the button is transparent and therefore not visible. When the button registers
that it has been clicked, it becomes visible, but remains transparent enough
that the gene name is readable.
When a search for a specific gene is entered, Freia iterates through every
pathway game object and every node in the pathway game object to check whether
the entered gene name matches the name of the gene game object. If there are n
pathways with m nodes each, it has to perform n × m comparisons. If a gene
entity matches the entered search term, it is highlighted in the same way as
if it had been clicked.
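The search amounts to a nested loop, sketched here in Python. The data layout (a dictionary of pathways, each holding a list of node dictionaries) and the exact-match comparison are assumptions made for the example, not a description of the Freia source code.

```python
def find_gene(pathways, gene_name):
    """Highlights every gene node whose name equals gene_name.

    pathways: pathway ID -> list of node dicts with "type" and "name" keys.
    Visits every node of every pathway, i.e. n x m comparisons.
    """
    matches = []
    for pathway_id, nodes in pathways.items():      # n pathways
        for node in nodes:                          # m nodes per pathway
            if node["type"] == "gene" and node["name"] == gene_name:
                node["highlighted"] = True
                matches.append(pathway_id)
    return matches


pathways = {
    "hsa05219": [{"type": "gene", "name": "ARAF"}, {"type": "gene", "name": "RPS6KA5"}],
    "hsa05200": [{"type": "gene", "name": "ARAF"}, {"type": "pathway", "name": "ARAF"}],
}
found_in = find_gene(pathways, "ARAF")
```

The node lists above are illustrative; in the application the nodes would come from the parsed KGML of each open pathway.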
To get insight into how the processes are affected by gene expression levels in
biological samples, it is common to overlay gene expression data on top of
biological pathways. We retrieve gene expression data, produced with microarray
technology by NOWAC, through the Kvik framework and show it on top of the
genes, see Figure 3.6. When the researchers apply gene expression data, the
genes change color depending on whether they are up- or down-regulated.
Researchers can turn the gene expression visualization on and off by dragging
the slider bar in Figure 3.6.
(a) Pathway with gene expression values not visible.
(b) Pathway with gene expression values visible.
Figure 3.6: Visualizing gene expression data on a KEGG pathway. The slider bar
operates as the on and off button.
The gene expression values that are retrieved from the Kvik framework have been
simulated in this experiment. This is done because the datasets used in this
prototype are not relevant: a gene expression is a value from 0 to 1 and it
does not matter where this value comes from. The simulated data can be swapped
with real data from the NOWAC dataset with minor tweaks.
When the gene expression slider is active, Freia iterates over every pathway
game object and every node in the pathway game object. Every gene has a gene
expression component which stores whether the gene is up- or down-regulated.
There might be cases where there is no gene expression data, and as seen in
Figure 3.6 the color of those genes does not change. The color is visualized by
changing the transparency level of a light red or blue color.
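One possible way to turn an expression value into such a color is sketched below in Python. The midpoint of 0.5 as the boundary between down- and up-regulated, the maximum opacity, and the scaling of opacity with the slider position are all assumptions made for this illustration; the text only states that values lie between 0 and 1 and that the transparency of a red or blue color is changed.

```python
RED = (1.0, 0.0, 0.0)    # up-regulated
BLUE = (0.0, 0.0, 1.0)   # down-regulated
MAX_ALPHA = 0.6          # assumed cap so the gene name stays readable


def expression_color(value, slider=1.0):
    """Maps an expression value in [0, 1] to an (r, g, b, alpha) overlay color.

    value:  expression value, or None when the gene has no data
    slider: slider position in [0, 1]; 0 hides the visualization
    """
    if value is None:
        return (0.0, 0.0, 0.0, 0.0)          # stay fully transparent
    strength = abs(value - 0.5) * 2           # 0 at the midpoint, 1 at the extremes
    alpha = strength * slider * MAX_ALPHA
    rgb = RED if value > 0.5 else BLUE
    return (*rgb, alpha)
```

With this mapping, dragging the slider from 0 to 1 fades the overlay in gradually, which corresponds to the transition from transparent to a noticeable color described above.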
There may be special interactions in which a set of genes is involved, and these
may be easier to discover if researchers can see how genes in a pathway are
connected. Freia supports searching for paths between two genes in a pathway
that is active in Freia. Figure 3.7 shows an illustration of a path
between gene «araf» and gene «rps6ka5» in the Bladder Cancer (hsa05219)
pathway. The path is highlighted by coloring the genes in the path blue.
When searching for a path between gene X and gene Y, Freia iterates over
every pathway game object and then every node in the pathway game object. It
compares gene X and gene Y to the gene node names in the pathway game object,
and if both are present in the same pathway game object it can begin the
recursive search. Since the links in the pathway images form a directed graph,
the path search only checks whether there is a path from gene X to gene Y.
Every gene game object has a link component which holds every other gene it
links to. Freia uses depth-first recursion until gene Y is found or there are
no more unvisited links. The path search function visualizes the path by
changing the transparency level of a blue color.
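The recursive search can be sketched in Python as a depth-first search over the link components. The `links` dictionary stands in for the link components of the gene game objects. The gene names ARAF and RPS6KA5 are taken from the example above, while the intermediate nodes GENE_B and GENE_C and the link structure are hypothetical.

```python
def find_path(links, start, goal, visited=None):
    """Depth-first search for a directed path from start to goal.

    links: gene name -> list of gene names it links to (directed edges)
    Returns the path as a list of gene names, or None if no path exists.
    """
    if visited is None:
        visited = set()
    if start == goal:
        return [start]
    visited.add(start)
    for neighbour in links.get(start, ()):
        if neighbour not in visited:          # skip links we have already followed
            rest = find_path(links, neighbour, goal, visited)
            if rest is not None:
                return [start] + rest
    return None


# Hypothetical link structure, including a cycle between ARAF and GENE_B.
links = {
    "ARAF": ["GENE_B", "GENE_C"],
    "GENE_B": ["ARAF"],
    "GENE_C": ["RPS6KA5"],
    "RPS6KA5": [],
}
path = find_path(links, "ARAF", "RPS6KA5")
reverse = find_path(links, "RPS6KA5", "ARAF")
```

Because the graph is directed, a path from ARAF to RPS6KA5 does not imply a path in the opposite direction, which is why `reverse` is `None` here. The visited set prevents the recursion from looping in the cycle.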
4
Evaluation and discussion
For our Freia implementation, our main focus has been to investigate and
demonstrate how we can map the biological data into the ECS model. However,
performance and scalability are an important motivation for using game
engines.
data with a game engine? This is our subjective experience, but
Lines of Code (LOC) gives an indication of the effort.
The experiments are performed by running the application within the Unity3D
editor with the built-in Unity3D profiler. The profiler gives real-time
feedback from the running application. We used the built-in profiler to run
micro-benchmarks that measure performance. Since we ran the application within
the editor instead of as a standalone application, this may have interfered
with some of the experimental data. We use three different test inputs to see
how performance differs between one pathway, seven pathways and all the
pathways¹ (if it is possible to load all pathways).
The experiment was performed by selecting a random pathway out of the total
set of pathways that KEGG has. As long as the frame rate did not drop below our
threshold of 30 fps, we kept increasing the number of pathways until we found
the upper limit.
Note that the CPU usage shown in the profiler is scaled relative to the highest
peak in the current view. The highest peak does not necessarily use 100 % of
the CPU.
Figure 4.1: Effects of frame rate on user perception. Figure 19 from [8].
We use the results from [8] as our target fps. [8] presents results from an
experiment with one hundred participants playing a first-person shooter game.
First-person shooter games are highly interactive and require that the
information on the screen is up to date at all times. As seen in Figure 4.1
there is little to no change in either quality or playability between 30 fps
and 60 fps, but when the frame rate goes below 30 fps both quality and
playability decline. Freia should therefore provide at least 30 fps in order to
achieve «smooth» performance.
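The 30 fps target translates into a time budget per frame, as the following Python sketch shows. The helper names are hypothetical and the frame times in the usage example are made up; they are not measurements from Freia.

```python
TARGET_FPS = 30


def frame_budget_ms(fps=TARGET_FPS):
    """Time available per frame, in milliseconds, at the given frame rate."""
    return 1000.0 / fps


def average_fps(frame_times_ms):
    """Average frame rate over a list of measured frame times in milliseconds."""
    return 1000.0 * len(frame_times_ms) / sum(frame_times_ms)


def is_smooth(frame_times_ms, target_fps=TARGET_FPS):
    """True if the average frame rate meets the target."""
    return average_fps(frame_times_ms) >= target_fps
```

At 30 fps every frame, including all scripting and rendering work, has to finish within roughly 33 ms. For instance, frame times of 10, 20 and 30 ms average to 50 fps and meet the target, while frame times of 40 and 50 ms do not.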
4.3 Experiments
4.3.1 Load Pathways
We compare the results from Freia with Kvik Pathways [7] and therefore use the
same four pathways as [7] (Table 4.1).
Loading a pathway requires both the pathway KGML and the image from the Kvik
framework. The latency is measured from when the user clicks on a pathway
result in the search field until the pathway is rendered with all the overlay
nodes, as seen in Figure 3.2.
Table 4.2 shows the results for loading different pathways. The results show
that the load time increases as the number of nodes increases, but for the
average pathway the load time would be 1.5 seconds. Pathways with a larger
number of nodes use almost a second longer than the average load time. The
variance in load time is small and not noticeable to the user's eye.
[9] reports that there are many different definitions of response time. In our
case the «System, can you do work for me?» definition from [9] is the most
applicable. This definition states that the acknowledgement should come within
two seconds. As seen in Table 4.2 the largest pathway, hsa05200 (Pathways in
cancer), does not satisfy this response time definition. Since the loading time
increases with the number of nodes in the pathway, this indicates that
processing the KGML is the most CPU-expensive step. Parsing the response from
the Kvik framework and instantiating new game objects cause the slow
processing. Requesting and rendering only the image takes approximately 20
milliseconds, regardless of whether the pathway has few or many nodes.
Comparing Freia (Table 4.2) with Kvik Pathways (Table 4.3), the latency is
higher in Freia for all pathways. We assume that it takes longer because of the
instantiation of new game objects within Unity3D. The large hsa05200 pathway
has 267 nodes, which implies that Unity3D has to instantiate one game object
for the pathway itself and then one game object per node. Instantiating and
destroying game objects is inefficient when it occurs frequently [10].
Freia is more consistent when it comes to standard deviation: the larger the
pathway, the less deviation was registered. For the small pathway the deviation
is 0.1 second, which is not noticeable to the eye when requesting the same
pathway image over and over. Kvik Pathways has a larger deviation in response
time, and its authors suspect that the garbage collector in Firefox is
responsible for it [7]. The same may apply if we deployed Freia as a web
application, in which case it could affect Freia in the same way as it did
Kvik Pathways.
We use the profiler while doing the operations described in the case study in
Chapter 3.2.
1. Open Freia.
2. Enter the search term, «cancer», in the search field that is open when
Freia is started up and press enter.
5. Press the «Gene Expression» box and drag the slider to the right. The
gene boxes that have data will now show as up-regulated (red) or
down-regulated (blue).
6. Press the «Find Path» box and enter a start gene and end gene. The path
found is shown.
In the experiment with one pathway we can see that both the dragging (Figure
A.1) and gene expression (Figure A.2) operations satisfy our requirement of
more than 30 fps. By comparing Figure A.1 and Figure A.2 it is clear that
applying gene expression on every gene is more CPU-expensive and affects the
fps considerably more. Figure A.2 has a high blue peak, which indicates that it
is the scripting that is expensive, and the overview window below it shows that
it is the event system update function that consumes the CPU cycles. As
described in Chapter 3.2.3, the gene expression operation colors a transition
from transparent to a noticeable color and back, depending on which way the
slider is moved. If this were implemented as a simple on/off switch, it might
be more efficient for the CPU.
With one pathway, the gene expression operation (Figure A.2) used a lot of CPU
and gave Freia a frame rate of 60 fps, compared with a frame rate of 120 fps
for the drag operation. The gene expression operation colors every gene that
has data, and it does so on every pathway that is open. The dragging operation
is only performed on one pathway image at a time, so it does not scale with the
number of pathways as the gene expression operation does.
Seven pathways is the expected workload that researchers would work with in
order to gain a proper overview. Dragging an image with seven pathways open
gives an average frame rate of 110 fps, as shown in Figure A.3.
The frame rate has dropped from 120 fps with one pathway to 110 fps, which is
acceptable. With seven pathways open, the gene expression operation barely
satisfies the frame rate requirement, as seen in Figure A.4. In Figure A.5 the
frame rate has been cut in half and clearly violates the frame rate
requirement.
The drag operation can handle many more pathways than the gene expression
operation; it did not violate the requirement until 50 pathways were open. In
Figure A.6 it is clear that rendering consumes more CPU cycles than scripting.
Freia can almost handle 100 pathways (Figure A.7) running simultaneously, but
only with no interactions. Freia serves no purpose if it cannot be interacted
with, but as noted earlier the expected average workload is seven pathways,
which clearly satisfies the requirement. 50 or 100 simultaneously open pathways
are not expected at all, but are used to measure what Freia can handle.
Over many years, game developers have gathered routines and frameworks that can
be reused in a variety of games. These routines and frameworks have merged over
time into game engines, which are platforms dedicated to game development. In
this chapter we describe what it is like to develop with a game engine and how
easy it is to develop a visualization of biological data with the game engine.
Mapping the biological data to the ECS model, and working out how to do this in
Unity3D, was the most time-consuming part of this project. The visualization
was not without hassle either. Unity3D operates with three different coordinate
systems: the world point system, the screen point system and the viewport point
system. When the coordinates of objects need to be relative to each other, you
may need to convert from one coordinate system to another. The world point
system describes where the objects are in the world space of the application;
it gives the absolute XYZ coordinates of the objects. The viewport point system
and the screen point system represent the same area on the screen and are both
2D, but they use different coordinates. The viewport space is where the camera
renders and is typically used for UI elements. All input from a mouse or touch
is received in screen space and has to be mapped to viewport space for the UI
elements. In addition to these three coordinate systems there is also a local
space, as opposed to the world space. Local space gives coordinates relative to
another object, and this was something we could exploit when attaching the gene
and pathway nodes to the pathway game object. Unity3D is a game engine full of
features, from visualization support to controller support, but the Unity3D
documentation [11] and the Unity3D user community [12] are a great help both
when first getting acquainted with Unity3D and when gaining further
experience.
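The relation between screen space and viewport space can be written out as a small Python sketch. It relies on screen coordinates being measured in pixels and viewport coordinates being normalized so that the bottom-left corner is (0, 0) and the top-right corner is (1, 1). The function names are hypothetical; in Unity3D itself the camera provides conversion methods such as `Camera.ScreenToViewportPoint`.

```python
def screen_to_viewport(x, y, screen_width, screen_height):
    """Converts pixel coordinates to normalized viewport coordinates."""
    return (x / screen_width, y / screen_height)


def viewport_to_screen(u, v, screen_width, screen_height):
    """Converts normalized viewport coordinates back to pixel coordinates."""
    return (u * screen_width, v * screen_height)


# A click in the middle of a 1920 x 1080 screen is the viewport centre.
centre = screen_to_viewport(960, 540, 1920, 1080)
```

Because viewport coordinates are normalized, the same UI layout can be reused on screens with different resolutions.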
In our case we have 692 LOC, where the code consists of initializing game
objects, aligning game objects in the coordinate system, sending GET requests,
parsing responses and storing attributes. Comparing our LOC to the data shown
by [13], Freia has fewer LOC than a simple iPhone game application. The data
displays the LOC of different applications. The simple iPhone game application
in the graph that [13] presents may include external libraries and so on, but
it is an indicator of the complexity of the application. Based on LOC and our
subjective experience we conclude that visualization of biological data with a
game engine may be accomplished without great difficulty.
Memory
We did not specify any requirement for memory usage, but as the experiments
were run we noticed some trends. Every pathway we instantiated used
approximately 22 MB, regardless of whether the pathway had few or many nodes.
Every node is instantiated, so we would expect a pathway that has 260 nodes to
use more memory than a pathway that has 40 nodes. This indicates that the
structure used may be inefficient memory-wise and needs further restructuring.
4.4 From 2D to 3D visualization
Since no databases provide a 3D representation of biological processes, our
solution for 3D visualization of biological processes is to create a 3D world
of kegg pathways. We did an initial experiment to see how connected the kegg
pathways for human diseases are. We had some questions regarding how
everything is connected, how we should interpret this information, and how we
could use it to our advantage when mapping the biological data. The experiment
was done using the D3² library, which obtained its data through requests to the
Kvik framework.
2. d3js.org
In Figure 4.4 every node is a pathway and there is an edge between connected
pathways. We discovered that a connection from one pathway to another does not
imply a connection in the opposite direction, so the graph is a directed graph.
Further connectivity visualization, such as 3D visualization of the pathways,
has to indicate this direction somehow. Gene expression in pathway X may affect
gene expression in pathway Y, but not the other way around.
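The directed nature of the pathway connections can be made concrete with a
short sketch. The code below is not part of Freia or of the D3 experiment; it
assumes that the connections are available as (source, target) pairs, and the
pathway names are hypothetical placeholders.

```python
def one_way_links(links):
    """Return the links (a, b) for which the reverse link (b, a) is missing."""
    link_set = set(links)
    return sorted((a, b) for a, b in link_set if (b, a) not in link_set)


# Hypothetical connections: X and Y are only linked in one direction,
# while Y and Z link to each other.
links = [
    ("pathway X", "pathway Y"),
    ("pathway Y", "pathway Z"),
    ("pathway Z", "pathway Y"),
]

print(one_way_links(links))  # [('pathway X', 'pathway Y')]
```

A visualization could use such a list to decide which edges need an arrowhead
or another indication of direction.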
In Figure 4.4 the two clusters of nodes show two groups with strong internal
relationships. There is also a group of nodes that are not connected to any
other node. This raises the question of how to reach these pathways when
clicking through the pathway scheme. Even though pathways may not be
connected, they may share genes or compounds that affect biological processes.
We should therefore have a network path search that shows every pathway that
has the same path between two given genes.
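One possible shape of such a network path search is sketched below. It is an
illustration, not the path discovery implemented in Freia: it assumes each
pathway is available as a list of directed gene-to-gene links, uses a
breadth-first search to find a shortest path in each pathway, and groups the
pathways by the path found. The pathway and gene names are hypothetical.

```python
from collections import defaultdict, deque


def find_path(edges, start, goal):
    """Return a shortest directed path from start to goal, or None."""
    adjacency = defaultdict(list)
    for source, target in edges:
        adjacency[source].append(target)

    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in adjacency[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


def pathways_sharing_paths(pathways, start, goal):
    """Group pathway names by the path they contain between two genes."""
    groups = defaultdict(list)
    for name, edges in pathways.items():
        path = find_path(edges, start, goal)
        if path is not None:
            groups[tuple(path)].append(name)
    return dict(groups)


# Hypothetical pathways described by gene-to-gene links.
pathways = {
    "pathway_1": [("A", "B"), ("B", "C")],
    "pathway_2": [("A", "B"), ("B", "C"), ("C", "D")],
    "pathway_3": [("A", "D"), ("D", "C")],
    "pathway_4": [("C", "A")],
}

result = pathways_sharing_paths(pathways, "A", "C")
print(result)
```

In this example pathway_1 and pathway_2 share the path A, B, C, pathway_3
contains a different path, and pathway_4 contains no path from A to C because
its only link points in the opposite direction.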
Figure 4.4: Pathway connection clustering. Pathways are represented as nodes and
connections between pathways as edges.
5
Related Work
While several tools and applications exist for visualizing and exploring
biological data, biology projects often require custom-built solutions for
their specific problem or data set.
5.1 Kvik
Kvik is a framework for developing applications for exploration of biological
data. Kvik provides a simple interface to systems for executing statistical
analyses and retrieving information from meta-databases, such as kegg [4].
Kvik is designed as a modular framework that allows developers to build
lightweight applications that interact with powerful compute and storage
resources.
Kvik Pathways is an application built using the Kvik framework. It is a web
application that allows researchers to explore gene expression in the context
of kegg pathways [4]. Similar to Freia, Kvik Pathways visualizes biological
pathways by overlaying nodes on top of the static pathway image from the kegg
database. Figure 5.1 shows the user interface of Kvik Pathways.
5.2 KEGG
kegg is an integrated database resource consisting of a collection of 16
databases, which in turn are categorized into four systems [15]. It is used for
understanding high-level functions and utilities of biological systems. One of
its main databases is the pathway database, which consists of a collection of
manually drawn pathway maps that represent the current knowledge on molecular
interaction and reaction networks. kegg is freely available through its
website¹ or through the GenomeNet mirror website², where users may retrieve
the images in png format and in the exchange format kgml.
The kegg resource is freely available, but users who want push notifications
or want to download the whole database may subscribe to a File Transfer
Protocol (ftp) service to gain access.
1. kegg.jp
2. genome.jp/kegg
5.3 BioCarta
BioCarta is an interactive online resource targeted at the life science research
community [16]. It is similar to kegg in that users may search for pathways
and click on genes to view additional information. In the pathway section of
BioCarta, however, the gene interactions are presented as dynamic graphical
models in the Graphics Interchange Format (gif). Figure 5.3 shows the Wnt
signaling pathway from BioCarta.
Figure 5.3: Screenshot of the Wnt signaling pathway in BioCarta. Figure taken from
[16].
The pathways and data are no longer updated, but the pathways are still
available at the Cancer Genome Anatomy Project³.
3. cgap.nci.nih.gov/Pathways/BioCarta_Pathways
5.4 KEGGViewer
KEGGViewer [17] is a BioJS [3] component for visualizing kegg pathways. It
uses the kgml representations of pathways from the kegg rest api to build
pathways, and visualizes them in a web browser using the JavaScript library
Cytoscape [18]. Since KEGGViewer only uses the kgml representation to generate
the visualizations, they are missing added annotations as well as nodes and
edges.
5.5 Caleydo
Caleydo is a visualization system that addresses and supports two workflows: a
pathway-centric approach and analysis of gene expression data [19]. It supports
two pathway databases, kegg and BioCarta, which together comprise 600
pathways. It uses traditional multiple views for viewing large datasets or
highly interactive visualizations, and it also supports a 2.5D technique for
seamless navigation of multiple pathways, which are simultaneously linked to
the expression of the contained genes. Figure 5.5 shows the bucket view with
integration of both the kegg pathways and BioCarta pathways.
Figure 5.5: Caleydo visual analysis framework for gene expression data in its biological
context. Figure taken from [19]
In recent years Caleydo has been divided into smaller projects, and today it
consists of 7 projects [20].
Figure 5.6: Entourage showing the Glioma pathway in detail and contextual informa-
tion of multiple related pathways.
5.6 Entourage
Entourage is a visualization technique that provides contextual information
when visualizing multiple related pathways [21]. It uses a single focus pathway
for main interaction and exploration, and visualizes only what is important
to researchers from other related pathways. Entourage visualizes subsets of
related pathways to give context information about the location of user-selected
genes within related pathways. Entourage builds on existing techniques, such as
enRoute [22] to visualize experimental data and Bubble Sets [23] to highlight
the selected nodes and the route for these nodes within a pathway.
5.7 Dynamic exploration
Figure 5.7: Dynamic exploration of two pathways with connected edges. Figure 2 from
[24].
5.8 UnityMol
UnityMol is a prototype for displaying biological networks and molecular
visualizations for research or educational use [25]. The UnityMol prototype is
built with Unity3D and explores the possibility of rapid development for
educational or scientific purposes. A UnityMol visualization is compared to an
original Cytoscape [18] visualization in Figure 5.8. UnityMol is released in two
versions: a stand-alone application and a web applet that runs on top of the
Unity3D web plugin. UnityMol uses the built-in graphical primitives in Unity3D
to create 3D content with spheres and point-sprite particles. Unlike UnityMol,
our approach is not based on a 3D representation of the biological network.
6
Conclusion
Users may search for specific domain knowledge, such as pathways, genes and
paths between genes. By using a game engine, we can deploy to a multitude
of devices and develop in substantially less time and with less developer effort.
This opens up great potential for rapid development, and our users may also
contribute by using the user-friendly interface.
Our approach to visualizing and exploring biological data is to use a game
engine. We use a familiar representation of biological processes (kegg
pathways) and operations that support the search for patterns related to cancer
development. Such operations must include multiple views of biological
processes, search for biological processes, search within biological processes
and search for connections in the biological processes. These have been
realized as pathway search, gene search, gene expression visualization and
path discovery.
Although we apply the approach to a specific data type (gene expression) and a
specific representation of biological processes (kegg) within the game engine,
the approach is more general. By using the ecs model, game objects in the
game engine can have components that represent different data types and
biological knowledge. A pathway game object can for example get biological
knowledge from other databases such as Reactome, rather than kegg. This
allows researchers to combine different image representations of biological
processes, which may help them explore patterns that may be related to the
development of cancer.
6.1 Future work
Pre-caching: Freia sends every pathway search to the Kvik framework. We
believe that the metadata of the human pathways would not be heavy for Freia
to store, and storing it would reduce the network traffic. Although the
network traffic is not large, having the metadata about pathways in Freia
opens up possibilities for other functionality.
Path networking: The researchers were quite excited about the ability to
search for paths between two genes. This search works on genes only, so if
there is a pathway or a compound between the genes, the path is not detected.
Not every pathway has the compound links in the kgml.
The path search only searches for paths in the opened pathways. If the
metadata about the pathways were stored in a data structure in Freia, the
search could be expanded to find the path between gene X and gene Y in all of
the pathways.
ecs model: Our implementation of the ecs model is not efficient because the
systems are not implemented. The entities and components are implemented, but
functionality such as the drag operation, which should be handled by a
movement system, is instead implemented inside the components. Once the
systems are implemented, there is room for better memory utilization by
aligning components of the same type in memory.
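The separation described in the ecs model item can be sketched as follows.
This is a minimal illustration under our own assumptions, not Freia's code:
the World, Position and Drag names are hypothetical, entities are plain
integers, and components are grouped by type in dictionaries. Python gives no
control over memory placement, so the grouping only mirrors the idea of
keeping components of the same type together.

```python
from dataclasses import dataclass


@dataclass
class Position:
    x: float
    y: float


@dataclass
class Drag:
    dx: float
    dy: float


class World:
    """Stores components grouped by type, keyed by entity id."""

    def __init__(self):
        self._next_entity = 0
        self.components = {}  # component type -> {entity id: component}

    def create_entity(self):
        entity = self._next_entity
        self._next_entity += 1
        return entity

    def add(self, entity, component):
        self.components.setdefault(type(component), {})[entity] = component

    def get(self, component_type):
        return self.components.get(component_type, {})


def movement_system(world):
    """Apply pending drags to positions; components hold data only."""
    positions = world.get(Position)
    drags = world.get(Drag)
    for entity, drag in list(drags.items()):
        position = positions.get(entity)
        if position is not None:
            position.x += drag.dx
            position.y += drag.dy
        del drags[entity]  # the drag has been consumed


world = World()
pathway = world.create_entity()
world.add(pathway, Position(0.0, 0.0))
world.add(pathway, Drag(5.0, -2.0))

movement_system(world)
print(world.get(Position)[pathway])  # Position(x=5.0, y=-2.0)
```

With this split the drag logic lives in one system that iterates over all
entities with a pending drag, instead of being repeated inside each component.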
Bibliography
[1] Rebecca L Siegel, Kimberly D Miller, and Ahmedin Jemal. Cancer statistics,
2015. CA: a cancer journal for clinicians, 65(1):5–29, 2015.
[2] Eiliv Lund, Vanessa Dumeaux, Tonje Braaten, Anette Hjartåker, Dagrun
Engeset, Guri Skeie, and Merethe Kumle. Cohort profile: the norwegian
women and cancer study—nowac—kvinner og kreft. International journal
of epidemiology, 37(1):36–41, 2008.
[3] John Gómez, Leyla J García, Gustavo A Salazar, Jose Villaveces, Swanand
Gore, Alexander García, Maria J Martín, Guillaume Launay, Rafael Al-
cántara, Noemi Del-Toro, Marine Dumousseau, Sandra Orchard, Sameer
Velankar, Henning Hermjakob, Chenggong Zong, Peipei Ping, Manuel
Corpas, and Rafael C Jiménez. BioJS: an open source JavaScript frame-
work for biological data visualization. Bioinformatics (Oxford, England),
29:1103–4, 2013.
[4] Bjørn Fjukstad, Karina Standahl Olsen, Mie Jareid, Eiliv Lund, and
Lars Ailo Bongo. Kvik: three-tier data exploration tools for flexible
analysis of genomic data in epidemiological studies [version 1; referees:
2 approved with reservations]. F1000Research, 4:81 (doi:
10.12688/f1000research.6238.1), 2015.
[6] Karina S Olsen. Blood gene expression, lifestyle and diet - The Norwegian
Women and Cancer Post-genome Cohort. Doctoral thesis, University of
Tromsø, 2013.
[7] Bjørn Fjukstad. Kvik: Interactive exploration of genomic data from the
nowac postgenome biobank. 2014.
[8] Kajal T Claypool and Mark Claypool. On frame rate and player performance
in first person shooter games. Multimedia systems, 13(1):3–17, 2007.
[15] Minoru Kanehisa, Yoko Sato, Masayuki Kawashima, Miho Furumichi, and
Mao Tanabe. Kegg as a reference resource for gene and protein annotation.
Nucleic acids research, page gkv1070, 2015.
[16] Darryl Nishimura. Biocarta. Biotech Software & Internet Report: The
Computer Software Journal for Scient, 2(3):117–120, 2001.
[19] Alexander Lex, Marc Streit, Ernst Kruijff, and Dieter Schmalstieg. Caleydo:
Design and evaluation of a visual analysis framework for gene expres-
sion data in its biological context. In Pacific Visualization Symposium
(PacificVis), 2010 IEEE, pages 57–64. IEEE, 2010.
[22] Christian Partl, Alexander Lex, Marc Streit, Denis Kalkofen, Karl Kashofer,
and Dieter Schmalstieg. enroute: dynamic path extraction from biological
pathway maps for exploring heterogeneous experimental datasets. BMC
bioinformatics, 14(Suppl 19):S3, 2013.
[24] Christian Klukas and Falk Schreiber. Dynamic exploration and editing of
kegg pathway diagrams. Bioinformatics, 23(3):344–350, 2007.
[25] Zhihan Lv, Alex Tek, Franck Da Silva, Charly Empereur-Mot, Matthieu
Chavent, and Marc Baaden. Game on, science - how video game technology
may help biologists tackle visualization challenges. PloS one, 8(3):e57990,
2013.
Appendix A
Figure 1: Screenshot of profiler with one pathway with drag operation.
Figure 2: Screenshot of profiler with one pathway with gene expression operation.
Figure 3: Screenshot of profiler with 7 pathways with drag operation.
Figure 4: Screenshot of profiler with 7 pathways with gene expression operation.
Figure 5: Screenshot of profiler with 14 pathways with drag operation.
Figure 6: Screenshot of profiler with 50 pathways with drag operation.
Figure 7: Screenshot of profiler with 100 pathways running.