See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/321865166

Generate bioinformatics data using Generative Adversarial Network: A Review

Conference Paper · December 2017


All content following this page was uploaded by Sharmilan S on 17 December 2017.


Generate bioinformatics data using Generative
Adversarial Network: A Review
Sharmilan S¹, Hapugahage Thilak Chaminda²
Informatics Institute of Technology, Affiliated to Robert Gordon University, Scotland
Colombo, Sri Lanka
[email protected]¹, [email protected]²

Abstract – Data is the most important part of machine learning. In the bioinformatics field the sensitivity of the data is high, and as a result accessing the data for a secondary purpose (e.g., research) involves many legal and ethical issues. Because of this, in many bioinformatics research projects collecting the data takes more time than the development phase. Some research has addressed the legal and ethical issues by anonymising the data using encryption, de-identification and perturbation of potentially identifiable attributes. To some extent those solutions limited data breaches, but anonymised data did not perform well in analysis and mining tasks. Recently, Generative Adversarial Networks (GANs) have become a research focus of artificial intelligence. The goal of GANs is to estimate the underlying distribution of real data samples and generate new samples from that distribution. Here, the researchers review GANs in bioinformatics for generating data sets, presenting examples of current research. To provide a useful and comprehensive perspective, the research is categorized both by the bioinformatics data and by GAN architecture and flow. Additionally, the issues of using GANs to generate bioinformatics data sets are discussed and future research directions are suggested. The researchers believe that this review will provide valuable insights for researchers who apply GANs to generate bioinformatics data sets.

I INTRODUCTION

Access to data is one of the bottlenecks in the development of machine learning solutions to domain-specific problems. The availability of standard datasets (with associated tasks) has helped to advance the capabilities of learning systems in multiple tasks. However, in the bioinformatics and medical fields it is hard to collect standard datasets in large amounts [33]. In medical, defense, security and some other fields the sensitivity of the data is high, so access to the data is highly controlled.

The exponential growth in the amount of available biological data raises two problems: on one hand, efficient information storage and management, and on the other hand, the extraction of useful information from these data. The second problem requires the development of tools and methods capable of transforming all these heterogeneous data into biological knowledge about the underlying mechanisms [19]. Medical data can be divided into two main types: text and images. Many researchers have predicted or analysed medical issues or diseases using images [19] [33], and at the same time many researchers have made predictions using patient records or electronic health records [8] [36] [33]. Here the sensitivity and privacy of the data come into play, as most health records contain personal details.

When building a prediction model, everything depends on the data sets. Attributes are selected by looking at the data set, and the collected data are preprocessed and used for model training and testing, so the model learns only what those data sets provide. If the gathered data are of poor quality or are class imbalanced, the trained model will be inefficient [27]. To improve this whole process, the collected data should be balanced and of good quality, and should cover a wide range of inputs.

Modern machine learning methods based on deep neural network architectures require large amounts of training data to achieve the best possible results [25]. However, due to privacy and legal issues it is not possible to access real patient data at a large scale, and when some data sets can be accessed, their quality and distribution are often poor [8] [33] [19] [36]. This paper therefore discusses the issues and difficulties that bioinformatics research faces in terms of access to and availability of data. It also discusses recent research that uses GANs to generate various things, including data sets, and how GANs are improving the field of bioinformatics by generating data samples.

II BIOINFORMATICS

Bioinformatics is an interdisciplinary field that develops software tools and machine learning models to understand biological data. As an interdisciplinary area of science, it combines computer science and mathematics to analyse and understand biological data. Because it is a wide and complicated area, bioinformatics has several sub-fields. The most popular ones are sequence analysis, gene and protein expression, analysis of cellular organization, structural bioinformatics, and network and systems biology
[26]. Each of these can be divided further into sub-parts; for example, DNA sequencing, sequence assembly, genome annotation, computational evolutionary biology and analysis of mutations in cancer are sub-parts of sequence analysis [26]. The National Center for Biotechnology Information reports that there are three main scientific applications of bioinformatics, which it categorizes as evolutionary biology, protein modeling and genome mapping [29]. As the field has improved dramatically over the past decades, the quantity and quality of biological information has skyrocketed.

Since bioinformatics draws on mathematics and computer programming, advances in machine learning models and artificial intelligence have also greatly improved the field. Deep learning has advanced rapidly since the early 2000s and now demonstrates state-of-the-art performance in various fields, including bioinformatics [20]. The invention of GANs has also helped bioinformatics researchers generate biomedical data and images to tackle some complicated areas of bioinformatics [33] [11] [19] [41]. Using machine learning and data mining models, bioinformatics has solved many complicated problems, such as predicting diseases at an early stage, calculating a patient's risk level at an early stage, and even modeling and remodeling RNA and DNA [26] [28].

Researchers have developed applications and tools to detect or identify various types of cancer, brain tumors, diabetes, heart attacks, etc. [26] [29]. In conclusion, the field of bioinformatics has absorbed the advances in computing and AI and used them effectively in biology and medicine.

III BIOINFORMATICS DATA

A wide array of biomedical data is generated and made available to healthcare experts and researchers for research purposes. However, due to the diverse nature of medical data, it is difficult to analyse them and predict outcomes [28]. For most diseases, the symptoms and causes differ from region to region or even from country to country. For many diseases, symptoms and causes also vary with non-medical parameters such as climate, behaviour and culture [33] [29] [43]. So data collected in a specific place and used for research may not be applicable to another place with different non-medical parameters. When a researcher tries to access medical data, many important parameters are hidden to protect patient privacy, such as date of birth, birth place, addictions and some diseases like HIV [30] [17]. Most of the time the collected data do not cover all the classes to be predicted, and there may be no data at all for some rare cases. A supervised model trained on such data to predict or classify will not be able to predict the rare cases, or will predict them wrongly, so the efficiency and accuracy of the model go down because of the class imbalance.

A Quality and quantity of the data

Generally, a successful decision support or prediction system needs a good amount of quality data. The data can be collected as domain knowledge or as real patient data sets. The first approach is more expensive, because knowledge of good quality and quantity must be collected [33] [32]. The second is easier to obtain, but the amount of data needed is the issue. During the big data boom, the health care industry, like other industries, recognised the importance of data and started to collect and store it for future work. There has even been research on unifying all medical data into one central system to solve data diversity issues [35]. But a researcher who tries to access the data faces many legal and ethical issues, because these data are sensitive, having been collected from patients.

The amount of data required to train a supervised model is not constant: it depends on the complexity of the problem and of the model [25] [10] [32]. Most bioinformatics research problems are complex and sensitive, so researchers need to build efficient models that provide high-accuracy predictions. Also, models built with nonlinear algorithms need more data samples for training and testing than linear models [25].

B Lack of data in terms of quality and quantity

Data mining and machine learning are typically associated with solving real-world problems characterized by large amounts of data. However, in practice, collecting large amounts of data in the medical field is infeasible. Although data mining could make important advances in this field, several challenges must be addressed [11]. Existing works that apply data mining to small datasets have shown the following challenges:
• The overfitting problem. A classification decision based on a small number of instances is susceptible to overfitting, because of the lack of samples representative of the whole data distribution [27].
• Poor classification performance. A small amount of data leads to very poor classification performance [13].
• Noise. Noisy data instances lead to unclear class boundaries and reduce overall classification accuracy.
New research and innovations are needed in data mining technology to address these challenges [11]. Class imbalance is another major data problem, especially in the medical field. In an imbalanced data set, the class with more instances is called the major class, while the one with relatively fewer instances is called the minor class [27] [10]. Most machine learning algorithms work best when the number of instances of each class is roughly equal. When the number of instances of one class far exceeds the other, problems arise. In such situations most classifiers are
biased towards the major class and hence show very poor classification rates on minor classes. It is also possible that a classifier predicts everything as the major class and ignores the minor class, because it does not have enough evidence for the minor class [27].

IV LEGAL AND ETHICAL ISSUES

One reason behind the limited access is that EHR data are composed of personal identifiers which, in combination with potentially sensitive medical information, raise privacy concerns. As a result, access to such data for secondary purposes (e.g., research) is regulated, as well as controlled by the health care organizations, which are at risk if data are misused or breached [17]. The review process by legal departments and institutional review boards can take months, with no guarantee of access [11]. This process limits timely opportunities to use data and may slow advances in biomedical knowledge and patient care. Health care organizations often aim to mitigate privacy risks through de-identification [22], typically by perturbing potentially identifiable attributes (e.g., dates of birth) via generalization, suppression or randomization, and then make the data available for research use [21]. But most bioinformatics research on disease prediction or risk analysis needs a wide range of data, including the exact date of birth and residential details. So if the data are randomized, the accuracy and efficiency of the resulting models may suffer.

V OTHER ISSUES WHILE ACCESSING DATA

For patient data, many countries, such as Sri Lanka, still have no large-scale EHR management [40]. For some diseases and medical cases, such as maternal health, autism and many mental illnesses, there are still no recorded data sets, only domain knowledge [36] [15] [38]. So if a bioinformatics research project needs to build machine learning models from data, the researchers must collect and record the data themselves, and time allocated for research is diverted to collecting and storing data [36] [40]. Most of the time researchers are not from the medical field, so their domain knowledge in those areas is limited compared with doctors and medical specialists. They may therefore miss some important attributes while building a model, since they consider only the data and the attribute variants. In conclusion, many developed countries have issues with privacy and with storing patient data, while some developing countries are still figuring out ways to collect and store the data. These countries have no history of previous data, only current cases.

VI DATA BREACH AND HACKING

Nowadays the health care industry faces a big issue with data breaches, as it stores sensitive patient data including personal details. In 2017, up to October alone, the 40 biggest health care record breaches around the world were reported [14]. Data breaches have also occurred where data are shared for secondary purposes such as research. To avoid privacy issues in the event of a data breach, health care organizations use information randomization and generalization techniques. However, this approach is not impregnable to attacks, such as linkage via residual information to re-identify the individuals to whom the data correspond [24]. Anonymising and sharing patient data is also a new trend in the health care industry, but the anonymisation process still consumes a lot of time [24]. In addition, researchers are not able to predict outcomes or symptoms by region or by a particular place, because the residential data are anonymised.

Recently, research has been done on generating data samples to overcome these issues. In recent years, automatically generated data have advanced to the point of creating entirely fake records based on samples of real records. This type of solution can resolve the data access issues as well as the data privacy and security issues.

VII GENERATIVE ADVERSARIAL NETWORKS

GANs are neural networks that learn to create synthetic data similar to some known input data. For instance, researchers have generated convincing images of everything from bedrooms to album covers, and GANs display a remarkable ability to reflect higher-order semantic logic [2] [1] [37] [19] [11] [7]. GAN was invented by Ian Goodfellow [16] and first introduced in 2014; afterwards, many GAN variants were introduced by researchers for different tasks [23]. The concept is simple. Someone who wants to improve a skill, especially in games, competes with an opponent who is better than them, analyses what went wrong and at which point, and then uses that knowledge to improve. In the same way, in a GAN the generator network continually competes with a highly accurate discriminator, learns from its mistakes and improves its accuracy, until at some point the generator beats the discriminator [16]. The efficiency and accuracy of the generator depend on how powerful the discriminator is, so a GAN always needs a powerful discriminator [23] [4].

A Architecture and Flow

The architecture of a GAN consists of two models: a generator and a discriminator. The generator learns to generate samples such as images or sound; the other model is the discriminator.
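
The two-model structure can be made concrete with a small sketch. It is a minimal pure-Python illustration under our own simplifying assumptions, not the architecture of any study reviewed here: the hypothetical `generator` is an affine map of Gaussian noise, G(z) = a·z + b, and the hypothetical `discriminator` is a one-feature logistic regression, D(x) = sigmoid(w·x + c), standing in for the neural networks used in practice. Only the discriminator's side of the feedback loop is shown: with the generator held fixed, the discriminator is trained by gradient descent on the cross-entropy between real samples (label 1) and generated samples (label 0), which is a sample estimate of the discriminator loss in equation (1). The distributions N(4, 1) for real data and N(0, 1) for generated data are arbitrary choices for the example.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def generator(z, a=1.0, b=0.0):
    """Hypothetical toy generator: an affine map of a noise sample z."""
    return a * z + b


def discriminator(x, w, c):
    """Hypothetical toy discriminator: estimated probability that x is real."""
    return sigmoid(w * x + c)


def discriminator_loss(real, fake, w, c, eps=1e-12):
    """Sample estimate of equation (1): -1/2 E[log D(x)] - 1/2 E[log(1 - D(G(z)))]."""
    real_term = sum(math.log(discriminator(x, w, c) + eps) for x in real) / len(real)
    fake_term = sum(math.log(1.0 - discriminator(x, w, c) + eps) for x in fake) / len(fake)
    return -0.5 * real_term - 0.5 * fake_term


def train_discriminator(real, fake, w=0.0, c=0.0, lr=0.05, steps=200):
    """Full-batch gradient descent on (w, c) while the generator stays fixed."""
    for _ in range(steps):
        grad_w = grad_c = 0.0
        for x in real:  # derivative of -log D with respect to the logit is D - 1
            err = discriminator(x, w, c) - 1.0
            grad_w += 0.5 * err * x / len(real)
            grad_c += 0.5 * err / len(real)
        for x in fake:  # derivative of -log(1 - D) with respect to the logit is D
            err = discriminator(x, w, c)
            grad_w += 0.5 * err * x / len(fake)
            grad_c += 0.5 * err / len(fake)
        w -= lr * grad_w
        c -= lr * grad_c
    return w, c


rng = random.Random(0)
real = [rng.gauss(4.0, 1.0) for _ in range(200)]             # assumed "real" data, N(4, 1)
fake = [generator(rng.gauss(0.0, 1.0)) for _ in range(200)]  # generator output, N(0, 1) here
before = discriminator_loss(real, fake, 0.0, 0.0)  # D = 0.5 everywhere, so the loss is about log 2
w, c = train_discriminator(real, fake)
after = discriminator_loss(real, fake, w, c)
print(f"discriminator loss before: {before:.3f}, after: {after:.3f}")
```

With the generator fixed this loss is convex in (w, c), so plain gradient descent with a small step lowers it. A full GAN would alternate such discriminator steps with generator steps that change a and b to push D(G(z)) toward 1.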

The discriminator classifies the generated samples as real or fake based on what it has learned. The discriminator model determines whether a given image looks like a real image from the dataset or like an artificially created one. It is basically a binary classifier, which for image data usually takes the form of a normal convolutional neural network. Over the course of many training iterations, the weights and biases of both the discriminator and the generator are trained through backpropagation. The discriminator learns to tell "real" data apart from "fake" data created by the generator. Whenever a generated sample is classified as fake, the generator receives that feedback from the discriminator and starts generating a new sample, and this continues until the discriminator is fooled into classifying the generated samples as real. The structure of a GAN is shown in Fig. 1.

Fig. 1 Generative Adversarial Network Architecture

This section discusses the training and learning process of a GAN. To begin, the optimization of the discriminator D given the generator G is described. Similar to the training of sigmoid-based classifiers, training the discriminator involves minimizing the cross entropy [23]. The loss function is given by

$$\mathrm{Obj}^{D}(\theta_D,\theta_G) = -\frac{1}{2}\,\mathbb{E}_{x\sim p_{data}(x)}\left[\log D(x)\right] - \frac{1}{2}\,\mathbb{E}_{z\sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right] \tag{1}$$

In a GAN the training data come from two distributions: the real data distribution $p_{data}(x)$ and the generated data distribution $p_g(x)$. Here $x$ is sampled from the real data distribution $p_{data}(x)$, $z$ is sampled from a prior distribution $p_z(z)$ such as a uniform or Gaussian distribution, and $\mathbb{E}(\cdot)$ denotes the expectation [23]. This differs slightly from conventional binary classification, because the "fake" examples are produced by the generator rather than taken from a fixed labelled set. The discriminator is trained to minimize the value of equation (1). In continuous space, equation (1) can be reformulated as

$$\mathrm{Obj}^{D}(\theta_D,\theta_G) = -\frac{1}{2}\int_x \left[p_{data}(x)\log D(x) + p_g(x)\log\left(1 - D(x)\right)\right]dx \tag{2}$$

For a fixed $x$, let $m = p_{data}(x)$, $n = p_g(x)$ and $y = D(x)$. The integrand $-m\log y - n\log(1-y)$ achieves its minimum on $[0, 1]$ at $y = m/(m+n)$. Hence, given generator G, the objective in equation (2) achieves its minimum at

$$D^{*}_{G}(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)} \tag{3}$$

This is the optimal discriminator D. Based on equation (3), the discriminator of a GAN estimates the ratio of two probability densities. $D(x)$ denotes the probability that $x$ comes from the real data, so for real input the discriminator tries to make $D(x)$ equal to 1. At the same time, if the input comes from the generator, with $G(z)$ denoting the generated data, the discriminator tries to make $D(G(z))$ equal to 0 [16], while the generator G tries to make it approach 1. Since this is a minimax game between G and D, the loss function of G is $\mathrm{Obj}^{G}(\theta_G) = -\mathrm{Obj}^{D}(\theta_D,\theta_G)$. Therefore, the optimization of a GAN can be formulated as a minimax problem:

$$\min_{G}\max_{D} V(D,G) = \mathbb{E}_{x\sim p_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z\sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right] \tag{4}$$

In summary, during the learning process of a GAN the discriminator D is trained to discriminate input data with maximum accuracy, that is, to decide correctly whether the input data is real or generated, and the generator G is trained to minimize $\log(1 - D(G(z)))$ [23] [16]. The two are trained alternately: D is trained toward its maximum accuracy while G is kept fixed, then D is fixed and G is trained to reduce the discrimination accuracy of D. As this process continues, the model reaches the global optimum if and only if $p_{data} = p_g$ [23] [16]. At that point the discriminator is fooled by the generator: by equation (3), $D(x) = 1/2$ for every $x$, so it can no longer tell generated data from real data.

VIII RELATED WORKS TO GENERATE DATA SETS USING GAN

One study generated electronic health records (EHRs) for secondary purposes [11]. The availability of EHRs has motivated advances in bioinformatics research, but EHR systems still do not automatically provide easy access to the data for research. The main reason for the limited access is that the data contain personal details, so health care organizations place strong security over their data, which can easily be misused or breached.

As a solution, the authors used a GAN to create a new approach called medGAN (medical Generative Adversarial Network) [11]. medGAN generates realistic synthetic patient records: it is trained on real records in order to create fake ones. It combines an autoencoder and a GAN to generate high-dimensional discrete variables, from binary to count values [11]. The authors also proposed minibatch averaging to effectively
avoid mode collapse, and used batch normalization to improve the efficiency of learning [11]. In this combination of a GAN and an autoencoder, the model was trained on real-world data sets and produces similar generated sets: the generator network creates data samples and the discriminator tells whether each one is real or fake. If a sample is judged fake, the generator creates another sample using the feedback from the discriminator. For the experiments they used three large electronic health record data sets: a proprietary dataset from a private health organization consisting of 10 years of longitudinal medical records of 258K patients; the MIMIC-III dataset (Johnson et al., 2016; Goldberger et al., 2000), a publicly available dataset consisting of the medical records of 46K intensive care unit (ICU) patients collected over 11 years; and a proprietary dataset for a heart failure study from a private health organization, covering an 18-month observation period of 30K patients [11]. They carried out a variety of experiments and evaluations to check the quality and performance of the system, and found that minibatch averaging increased the performance of medGAN [11].

Fig. 2 Score distribution for real and fake datasets [11].

Independent sampling naturally performs well on the dimension-wise comparison. medGAN also seems to capture the dimension-wise distribution relatively well, with a specific weakness on codes of low probability. To evaluate medGAN they used two EHR datasets, and they also had the generated data reviewed by medical experts. For the expert review, they randomly selected 50 records from the real data and another 50 from the generated data, shuffled them, and asked doctors to rate each record from 1 to 10 according to how realistic it was. The results are given in Fig. 2 [11].

The findings suggested that medGAN's synthetic data are generally indistinguishable from real data to a human doctor, except for several outliers. The fake records identified by the doctor mainly lacked appropriate medication codes [11], which can also happen in real-world records when data are missing or not recorded.

medGAN uses a GAN to generate realistic multi-label discrete electronic health records (EHR). Through rigorous evaluation using real datasets, medGAN showed impressive results [11]. In the medical field there is no efficient way for researchers to access real data; considering that, medGAN can improve the healthcare industry and make a good contribution to the bioinformatics field in terms of data sets. For future directions, the authors plan to explore a sequential version of medGAN and to include other modalities such as lab measures, patient demographics, and free-text medical notes [11].

Nowadays, images are widely used in bioinformatics for prediction and analysis, and the availability of large amounts of annotated data is becoming increasingly critical [33]. However, annotated medical data are often scarce and costly to obtain, while recent deep learning methods and deep networks require large amounts of data for training. Wide availability of such data may allow researchers to develop and validate more sophisticated computational techniques. As a solution to this problem, researchers introduced a GAN model to generate synthetic retinal color images [33]. It is an image-simulation approach, so researchers can create as many synthetic retinal images as they want.

The researchers proposed an adversarial autoencoder for the task of retinal vessel network synthesis, and used the generated vessel trees as an intermediate stage for generating color retinal images, which is accomplished with a GAN. To train the model they used the Messidor-1 dataset with automatically generated retinal vessel segmentations [33]. The segmentation model achieved a 0.9755 AUC on the DRIVE test set, a result in line with state-of-the-art methods for retinal vessel segmentation. Only images from Messidor-1 with grades 0, 1 and 2 were used in this work, reducing the number of example pairs to 946. This dataset was randomly divided into training (614 pairs), validation (155 pairs) and test (177 pairs) sets, which were downscaled to 256 × 256 before training the model [33]. The restriction to grades 0, 1 and 2 was made to avoid the poor generalization of their vessel segmentation technique, which produced incorrect segmentations for images at later stages of diabetic retinopathy [33]. They used several methods to evaluate the quality of the generated images.

In conclusion, the researchers developed a GAN-based model to generate synthetic retinal vessel images. Even though it produced some good-quality images, the knowledge of the GAN is limited, as it was trained on only 614 images retrieved from one database [33]. The researchers mention that, as a future extension, they plan to train the model with large-scale data sets from different databases. The size of the synthetic images (256 × 256) is also far from the resolution of images produced by current retinal fundus image acquisition systems. These drawbacks could be addressed by using a large amount of data.

As mentioned previously, generated data are in demand among researchers. To contribute to that, another study used a two-stage pipeline for generating synthetic medical images. To test it, they generated
retinal synthetic images. In their research they used dual and increased the image resolution to 128x128.To evaluate
GAN to generate images [19]. the model they used 3 fold cross validation on the SCR Lung
In their two stage pipeline the first one is to produce Database [41]. For the quantitative evaluation, they chose to
segmentation masks that represent the variable geometries perform image segmentation using the U-Net fully
of the dataset, and the second one is to translate the masks convolutional network architecture. They used the Nesterov
produced in Stage-I to photorealistic images. For the optimizer at a learning rate of 0.00001 for the segmentation
training and testing they used DRIVE database for stage one task, with a momentum of 0.99 and a weight decay of
GAN. And it contains forty pairs of retinal fundi images and 0.0005 [4-].
segmentation masks extracted manually by two experts They tested the model using real data, real and synthetic data
[19]. For stage two they used with segmentation masks, and synthetic data only. To evaluate the segmentation
derived from a CNN segmentation network on the results, we used the Dice coefficient and Hausdorff distance
MESSIDOR database. Stage one GAN is to generate metrics [41]. The results shown in table 1 for full training
variable segmentation masks. It is based on the deep set and in table 2 for reduce training set.
convolutional generative adversarial network (DCGAN)
architecture, and built on the TensorFlow platform [19]. TABLE 1
And stage two also build on the TensorFlow. To improve Full training set results.
the realistic of the image they used the u-net. Generation
mask given a photorealistic medical image. The u-net
architecture, specifically formulated for biomedical image
segmentation, is derived from an auto encoder architecture
that relies on unsupervised learning for dimensionality
reduction.
They evaluated the u-net on test images from the DRIVE TABLE 2
Reduce training set results.
database and compared them with the ground truth to
calculate an F1 score. Also calculated the variance between
the 4 synthetic and real datasets through a Kullback–Leibler
(KL) divergence score [19]. They received an F1 accuracy
rating of 0.8877 for synthetic data and an F1 accuracy of
0.8988 on the DRIVE dataset. When testing variance,
In conclusion, they used a dual GAN model to generate medical images because of the extreme complexity of such images. The pipeline is able to reproduce simple features such as general color, shape, and lighting, but more varied and accurate real image data are still needed to improve the dual GAN pipeline so that it generates more realistic images.

Another similar study synthesized medical images using GAN, again to address the lack of data in the bioinformatics field. To overcome this issue, the researchers proposed a new variant of GAN which, in addition to synthesizing medical images, also generates segmentation masks for use in supervised medical image analysis applications.

To get the maximum benefit from generated images in supervised algorithms or deep learning tasks, a ground-truth solution is necessary for every input image. The researchers therefore used a modified GAN that generates segmentation masks together with the images [41]. This makes it easier for the discriminator to decide whether an image is real or synthetic, which improves the learning of both the generator and the discriminator networks. The researchers based this work on the DCGAN architecture: they used a TensorFlow implementation of DCGAN [41] and modified that architecture to include support for the generation of segmentation masks.

In conclusion, the researchers reported that, when image synthesis was combined with segmentation, the generated images reproduced small details and noise more faithfully than before on lung image datasets [41], which was not the case when using a GAN alone. They also found that if the GAN training time is too short, the generated images are not usable for later supervised training. Finding a suitable stopping point for GAN training is still an open topic of current research, since a lower GAN loss during training typically does not indicate higher quality of the generated images. Overall, their study showed that using GAN together with image segmentation can generate more accurate image datasets.

A research group in Switzerland created a tool that uses GAN to generate medical datasets, building on the remarkable success of GANs in generating data. To evaluate the system, they generated a medical time-series dataset for the intensive care unit (ICU) [39]. In ICUs, doctors have to make snap decisions under time pressure, where they cannot afford to hesitate. It is already standard in medical training to use simulations to train doctors, but these simulations often rely on hand-engineered rules and physical props. Thus, a model capable of generating diverse and realistic ICU situations could have an immediate application, especially when given the ability to condition on underlying 'states' of the patient. As a solution, they proposed a Recurrent GAN (RGAN) and a Recurrent Conditional GAN (RCGAN) [39].
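The RCGAN idea, a recurrent generator whose output at every time step can be conditioned on a patient 'state', can be sketched in a few lines. This is an assumption-laden illustration, not the implementation of [39]: the class and helper names are hypothetical, a plain tanh RNN cell stands in for whatever recurrent cell the authors used, biases are omitted, the weights are random rather than trained, and no discriminator or adversarial loss is shown. Feeding fresh noise at every step and concatenating the condition vector to each step's input is one common way to condition a recurrent generator.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small Gaussian-initialised weights; stand-ins for trained parameters."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class RecurrentConditionalGenerator:
    """Forward pass of an RCGAN-style generator (untrained, illustration only).

    At every time step the cell receives fresh noise z_t concatenated with a
    fixed condition vector c; a tanh RNN cell updates the hidden state and a
    linear read-out emits one real-valued vector per step.
    """

    def __init__(self, noise_dim, cond_dim, hidden_dim, out_dim, seed=0):
        self.rng = random.Random(seed)
        self.noise_dim = noise_dim
        self.cond_dim = cond_dim
        self.hidden_dim = hidden_dim
        self.w_in = random_matrix(hidden_dim, noise_dim + cond_dim, self.rng)
        self.w_rec = random_matrix(hidden_dim, hidden_dim, self.rng)
        self.w_out = random_matrix(out_dim, hidden_dim, self.rng)

    def sample(self, condition, seq_len):
        """Return seq_len output vectors, one per time step."""
        if len(condition) != self.cond_dim:
            raise ValueError("condition must have length cond_dim")
        h = [0.0] * self.hidden_dim
        series = []
        for _ in range(seq_len):
            z_t = [self.rng.gauss(0.0, 1.0) for _ in range(self.noise_dim)]
            x_t = z_t + list(condition)  # noise and condition, concatenated
            pre = [a + b for a, b in zip(matvec(self.w_in, x_t), matvec(self.w_rec, h))]
            h = [math.tanh(v) for v in pre]  # recurrent hidden-state update
            series.append(matvec(self.w_out, h))  # linear read-out at step t
        return series


# Hypothetical usage: 30 time steps of 4 channels (e.g. SpO2, HR, RR, MAP),
# conditioned on a one-hot 'patient state' vector.
generator = RecurrentConditionalGenerator(noise_dim=5, cond_dim=2, hidden_dim=16, out_dim=4)
series = generator.sample(condition=[1.0, 0.0], seq_len=30)
```

In a complete RCGAN, a recurrent discriminator would score such sequences together with the same condition, and both networks would be trained adversarially; an unconditional RGAN corresponds to dropping the condition vector.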
Both models are designed to produce realistic, real-valued, multi-dimensional time series, with an emphasis on their application to medical data [39].

The model follows the architecture of a regular GAN in which both the generator and the discriminator have been replaced by recurrent neural networks. The Recurrent GAN (RGAN) generates sequences of real-valued data, and the Recurrent Conditional GAN (RCGAN) generates sequences of real-valued data subject to some conditional inputs [39]. The researchers evaluated their work on toy datasets. They trained a model on generated data and tested it on real data (Train on Synthetic, Test on Real, TSTR) [39], and then reversed the procedure, training the model on real data and testing it on synthetic data (TRTS).

To generate ICU data, they used the recently released Philips eICU database, which contains around 200,000 patients from 208 care units across the US, with a total of 224,026,866 entries divided into 33 tables. For the research they used only four main variables: oxygen saturation measured by pulse oximeter (SpO2), heart rate (HR), respiratory rate (RR), and mean arterial pressure (MAP). After preprocessing the data, they ended up with a cohort of 17,693 patients [39].

If the model overfits, most points in the latent space map to training examples, and all the generated data will be similar or identical to the training data. If the model underfits, the generated data will differ completely from the training data in terms of distribution. To check for both cases, they compared the distribution of reconstruction errors and compared the generated samples [39].

In conclusion, they created a tool that generates datasets using RGAN and evaluated it by generating time-series datasets for the ICU. The major finding of the research is that, by generating labelled training data (conditioning on the labels and generating the corresponding samples), anyone can evaluate the quality of the model using the TSTR technique, in which a model is trained on the synthetic data and evaluated on real data. They additionally illustrated that such a synthetic dataset does not pose a major privacy concern or constitute a data leak for the original sensitive training data.

Another study aimed to generate simulated data, which is useful when only limited data is available. The researchers took smoking cessation as a case study: testing a peer recruitment strategy to increase access to a smoking cessation resource (a technology-assisted tobacco intervention) [42]. A key challenge is that data is limited in health interventions. Identifying ideal recruiters is a classical classification problem, so the researchers augmented the data for this case study to improve the efficiency of the classification algorithms.

They introduced two techniques to augment the datasets: the first randomly changes the values of a very small set of fields, and the second augments the data using GAN [42]. They also proposed a few analysis methods for classification problems to improve the accuracy of a well-sought class: first, removing border samples between two categories; second, reducing dimensionality through feature selection; and third, sacrificing the accuracy of less valuable classes. To train the model they used the Share2Quit dataset [42].

Many existing methods deal with this problem; however, their performance needs to be significantly improved in practice. The results show that applying each of these analysis methods improves the classification accuracy of the well-sought class, and that GANs can generate a large amount of simulated data. The researchers carried out several evaluations to find the more accurate methods. They checked the accuracy before and after feature selection (dropping one feature at a time), excluding border samples, including augmented data, and applying the sacrificial boundary. They obtained around 87% accuracy after including the segmented data and high accuracy after applying the sacrificial boundary [42]. However, because their training data was imbalanced, the accuracy for some classes was poor.

Fig. 3 Accuracy level based on the size of data [42].

The researchers thus presented machine learning methods, namely removing noise, reducing the number of dimensions, generating simulated data, and sacrificing the accuracy of other unimportant classes, to improve the accuracy of the important class [42]. They also obtained good accuracy with simulated data, as shown in Fig. 3. In future work, they plan to improve the quality of the augmented data.

IX PREVIOUS WORKS TO GENERATE THINGS USING GAN

After the remarkable success of GAN, it has been widely used in many industries to generate images, text, music, and many other things. This section discusses the use of GAN for generation in other industries. Research has shown that GAN can generate open-domain dialogue [18].
Fig. 4 Generated bird images using sentences [37].

The authors of [18] proposed a solution using GAN to produce sequences that are indistinguishable from human-generated dialogue utterances. The aim of this research is to generate meaningful and coherent dialogue responses given the dialogue history. They used the idea of adversarial evaluation, training a discriminant function to separate generated and true sentences in order to evaluate the model's sentence-generation capability. They reported good accuracy levels and used human judges to validate the results [18].
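The adversarial evaluation used in [18] reduces to simple ratios once an evaluator (a classifier trained to separate human from generated responses) has produced a probability for each response. The sketch below is illustrative only: the function names, the 0.5 threshold, and the example scores are hypothetical, and it approximates rather than reproduces the exact metric definitions in [18].

```python
def adversarial_success(p_human_generated, threshold=0.5):
    """Fraction of generated responses the evaluator judges to be human-written.

    p_human_generated: evaluator probabilities P(human | response) for
    machine-generated responses (hypothetical scores from a trained classifier).
    """
    if not p_human_generated:
        raise ValueError("need at least one score")
    fooled = sum(1 for p in p_human_generated if p > threshold)
    return fooled / len(p_human_generated)


def evaluator_accuracy(p_human_real, p_human_generated, threshold=0.5):
    """Accuracy of the evaluator on a mixed set of human and generated responses."""
    total = len(p_human_real) + len(p_human_generated)
    if total == 0:
        raise ValueError("need at least one score")
    correct = sum(1 for p in p_human_real if p > threshold)          # human judged human
    correct += sum(1 for p in p_human_generated if p <= threshold)   # generated judged generated
    return correct / total


scores_generated = [0.9, 0.2, 0.6, 0.4]  # hypothetical evaluator outputs
scores_human = [0.8, 0.7]
print(adversarial_success(scores_generated))                          # 0.5
print(round(evaluator_accuracy(scores_human, scores_generated), 3))   # 0.667
```

The better the generator, the more often it fools the evaluator; an evaluator accuracy close to 50% means it can no longer tell the two sources apart.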
Another study improved the training of GAN for image synthesis [6]; GANs are especially popular in the image generation field [7] [37] [12] [33] [19]. Recent work has shown that GANs can produce convincing image samples on datasets with low variability and low resolution [9] [34]. However, GANs struggle to generate globally coherent, high-resolution samples, particularly from datasets with high variability. The researchers constructed a variant of GANs employing label conditioning that results in 128 × 128 resolution image samples exhibiting global coherence. They also expanded on previous work on image quality assessment to provide two new analyses for assessing the discriminability and diversity of samples from class-conditional image synthesis models [6].

In another study, researchers tried to generate high-quality samples of natural images. Building a model that generates high-resolution natural images has been a major problem in computer vision [12], and recent advances in GAN have opened a new path for generating images similar to real ones. Their approach uses a cascade of convolutional networks within a Laplacian pyramid framework to generate images in a coarse-to-fine fashion [12]. At each level of the pyramid, a separate generative convnet model is trained using the GAN approach. In a quantitative assessment by human evaluators, their samples were mistaken for real images around 40% of the time, compared to 10% for samples drawn from a GAN baseline model [12]. They used 15 volunteers to evaluate the quality of the generated images and found that the images generated by their model are far better than those of the standard GAN model.

Another interesting and useful line of research generates images from text, a task for which conventional AI systems and algorithms are still far from the goal [37]. To solve this, the researchers developed a novel deep architecture and GAN formulation that effectively bridges advances in text and image modeling, translating visual concepts from characters to pixels. Their main contribution is a simple and effective GAN architecture and training strategy that enables compelling text-to-image synthesis of bird and flower images from human-written descriptions [37]. Fig. 4 shows generated bird images obtained by interpolating between two sentences (within a row the noise is fixed). In future work, they aim to scale up the model to higher-resolution images and to add more types of text [37].

GAN is also well known for generating text data in handwritten format, and there is already research on generating handwritten digits [5] [16]. Humans learn handwriting as a skill from a very early age. This research deals with the problem of an intelligent system learning the handwriting of an entity using GAN. The researchers start by training the model to generate letters of the alphabet, then single words, and then lists of words. For this task they proposed a modified DCGAN architecture [5], and to achieve faster learning they used a reinforcement learning method. An early implementation of their algorithm shows good performance on the MNIST dataset [5]. During evaluation they also tried to generate ASCII characters and the digits 0 to 9, and the wide variation in handwriting style produced a wide range of different-style images of the same digit [5]. The authors hope that their model gives new insights in this area; its uses include identification of forged documents, signature verification, computer-generated art, and digitization of documents, among others.

Another study generated language using GAN. Training GANs for language generation has proven to be more difficult because of the non-differentiable nature of generating text with recurrent neural networks [31]. In this work, the researchers showed that recurrent neural networks can be trained to generate text with GANs from scratch using curriculum learning, by slowly teaching the model to generate sequences of increasing and variable length. They evaluated the model by generating 640 sequences from each model and measuring %-IN-TEST-n, that is, the proportion of word n-grams from generated sequences that also appear in a held-out test set [31]. They found that training the generator for 50 iterations every 10 training iterations of the discriminator resulted in superior performance [31]. They showed that their approach vastly improves the quality of generated sequences compared to a convolutional baseline.

X ADVANTAGES AND LIMITATIONS OF GAN

As a new framework, GAN comes with advantages as well as disadvantages. One of the major advantages is that the GAN framework works with any type of neural network: GAN can be implemented using conventional neural networks, recurrent neural networks, or even more advanced deep learning networks.
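This flexibility comes from the fact that the framework fixes only the training objective, not the architectures. In the original formulation [16], a generator G and a discriminator D, whatever networks they are, play a two-player minimax game with the value function below, where p_data is the data distribution and p_z is the prior on the noise input z:

```latex
\min_{G}\max_{D} V(D,G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

D is trained to maximize V by assigning high probability to real samples and low probability to G(z), while G is trained to minimize it; in practice the two networks are updated alternately, and [16] notes that G is often trained to maximize log D(G(z)) instead, because log(1 - D(G(z))) saturates early in training.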
Another advantage is that Markov chains are never needed [16] [4] [23] [3]; only conventional back-propagation is used to obtain gradients, and no inference is needed during training [16]. In terms of actual results, GAN produces better samples than the other methods [16]. There is no need to design the model to obey any kind of factorization: any generator network and any discriminator network will work. Compared to PixelRNN, the runtime to generate a sample is smaller, because GANs produce a sample in one shot while PixelRNNs need to produce a sample one pixel at a time [16] [3]. Especially for the generation of high-definition data, GAN performs better than PixelRNN, and GAN does not limit the generation dimension, which widens the range of data samples that can be generated.

The generation process of GANs does not require a tedious sampling sequence; it can directly sample and predict new samples, which improves the efficiency of generating new samples [16]. In practice, the samples generated by GANs are easy for humans to understand; for example, GANs can generate very sharp and realistic images [2] [37] [23]. Not only have GANs made great contributions to the development of generative models, but they are also meaningful and instructive for semi-supervised learning [3].

GANs have solved many problems in generative modeling and brought a new direction to the AI field, but they still have limitations. GANs adopt the adversarial learning idea, but the convergence of the model and the existence of an equilibrium point have not yet been proved [23]. The training process needs to ensure the balance and synchronization of the two adversarial networks: D needs to be well synchronized with G. The key challenge is to keep both D and G synchronized; in particular, G must not be trained too much without updating D, in order to avoid "the Helvetica scenario", in which G collapses too many values of z to the same value of x [16] [23]. Mode collapse refers to scenarios in which the generated samples have similar features; for example, the color or the texture is the same in all images generated by the GAN. Another disadvantage is that if the discriminator is weak, the generator also performs poorly, so the discriminator must be trained to be powerful using many generated and real data samples. Because GAN uses neural networks, it also shares their common defect of poor interpretability.

To overcome these limitations, new techniques and variants of GAN have been introduced and continue to emerge. Wasserstein GAN [44], [23] greatly overcomes the training instability problem and, at the same time, partially solves the mode collapse problem. Similarly, other GAN variants have been introduced, such as Semi-GAN, C-GAN, BiGAN, InfoGAN, AC-GAN, SeqGAN, and BEGAN [44] [23] [4] [3]. How to completely avoid mode collapse and further optimize the training process remains a research direction of GANs [23]. Furthermore, the theory of model convergence and the existence of an equilibrium point remain important research subjects for the near future.

XI CONCLUSION

This paper analyzed and reviewed bioinformatics data, the legal and ethical issues involved in accessing EHR data, and other security issues such as data breaches and misuse. It explained the current methods for preventing data breaches and ensuring security, as well as the struggles researchers face when accessing medical data for research purposes. It then surveyed the state of the art of GANs. This model has been used in many generative works, including images, text, sound, and data, and has received good feedback from scientists and researchers. The basic concept of GAN is a min-max game between two networks, a generator and a discriminator. The best part of this model is that it can be developed using any type of neural network.

In addition, this paper reviewed research on generating bioinformatics data for secondary purposes such as research, covering both image data generation and EHR data generation, as well as the usage of GAN in other industries. Further, it analyzed the advantages and disadvantages of GAN and explained future research directions. The authors believe that this review will provide valuable insights and serve as a starting point for researchers who want to apply GAN to generate datasets for their bioinformatics research.

REFERENCES
[1] Alex Kuefler, Jeremy Morton, Tim Wheeler, and Mykel Kochenderfer, "Imitating Driver Behavior with Generative Adversarial Networks". 2017 IEEE Intelligent Vehicles Symposium (IV), June 11-14, 2017, Redondo Beach, CA, USA.
[2] Alexey Dosovitskiy, Jost Tobias Springenberg, Maxim Tatarchenko, and Thomas Brox, "Learning to Generate Chairs, Tables and Cars with Convolutional Networks". IEEE Transactions on Pattern Analysis and Machine Intelligence.
[3] Analytics Vidhya, "Introductory guide to Generative Adversarial Networks (GANs) and their promise!". https://www.analyticsvidhya.com/blog/2017/06/introductory-generative-adversarial-networks-gans, June 15, 2017.
[4] Antonia Creswell, Tom White, Vincent Dumoulin, Kai Arulkumaran, Biswa Sengupta, and Anil A. Bharath, "Generative Adversarial Networks: An Overview". Submitted to IEEE-SPM, April 2017.
[5] Arna Ghosh, Biswarup Bhattacharya, and Somnath Basu Roy Chowdhury, "Handwriting Profiling using Generative Adversarial Networks". Association for the Advancement of Artificial Intelligence (www.aaai.org), 2017.
[6] Augustus Odena, Christopher Olah, and Jonathon Shlens, "Conditional Image Synthesis with Auxiliary Classifier GANs". Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, PMLR 70, 2017.
[7] Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, and Wenzhe Shi, "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network".
[8] David Weatherall, Brian Greenwood, Heng Leng Chee, and Prawase Wasi, "Disease Control Priorities in Developing Countries", 2nd edition.
[9] Emily L. Denton, Soumith Chintala, Arthur Szlam, and Robert Fergus, "Deep generative image models using a Laplacian pyramid of adversarial networks". CoRR, abs/1506.05751, 2015. http://arxiv.org/abs/1506.05751.
[10] Jason Brownlee, "Machine Learning Mastery". https://machinelearningmastery.com/.
[11] Edward Choi, Siddharth Biswal, Bradley Malin, Jon Duke, Walter F. Stewart, and Jimeng Sun, "Generating Multi-label Discrete Patient Records using Generative Adversarial Networks".
[12] Emily Denton, Soumith Chintala, Arthur Szlam, and Rob Fergus, "Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks".
[13] Gary M. Weiss, "Mining with rarity: a unifying framework". ACM SIGKDD Explorations Newsletter, 6(1):7–19, 2004.
[14] Healthcare IT News, October 12, 2017. http://www.healthcareitnews.com/slideshow/biggest-healthcare-breaches-2017-so-far?page=1
[15] Hemamali Perera, Kamal Chandima Jeewandara, Chandima Guruge, and Sudarshi Seneviratne, "Presenting symptoms of autism in Sri Lanka: analysis of a clinical cohort". Sri Lanka Journal of Child Health, 2013; 42(3): 139-143.
[16] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio, "Generative Adversarial Nets". Université de Montréal.
[17] James Hodge Jr, Lawrence O. Gostin, and Peter Jacobson, "Legal issues concerning electronic health information: privacy, quality, and liability". JAMA, 282(15):1466–1471, 1999.
[18] Jiwei Li, Will Monroe, Tianlin Shi, Sébastien Jean, Alan Ritter, and Dan Jurafsky, "Adversarial Learning for Neural Dialogue Generation".
[19] John Guibas, Tejpal Virdi, and Peter Li, "Synthetic Medical Images from Dual Generative Adversarial Networks". September 7, 2017.
[20] Jürgen Schmidhuber, "Deep Learning in Neural Networks: An Overview". Technical Report IDSIA-03-14, The Swiss AI Lab IDSIA.
[21] K. El Emam, D. Buckeridge, R. Tamblyn, A. Neisa, E. Jonker, and A. Verma, "The re-identification risk of Canadians from longitudinal demographics". BMC Medical Informatics and Decision Making, 11:46, 2011.
[22] Khaled El Emam, Sam Rodgers, and Bradley Malin, "Anonymising and sharing individual patient data". Published online March 20, 2015.
[23] Kunfeng Wang, Chao Gou, Yanjie Duan, Yilun Lin, Xinhu Zheng, and Fei-Yue Wang, "Generative Adversarial Networks: Introduction and Outlook". IEEE/CAA Journal of Automatica Sinica, vol. 4, no. 4, October 2017.
[24] Lawrence Gostin, Laura Levit, Sharyl Nass, et al., "Beyond the HIPAA Privacy Rule: Enhancing Privacy, Improving Health Through Research". National Academies Press, 2009.
[25] Machine Learning Mastery, July 24, 2017. https://machinelearningmastery.com/much-training-data-required-machine-learning/
[26] Mamta Chowdhary, Asha Rani, Jyoti Parkash, Mohd Shahnaz, and Dhruv Dev, "Bioinformatics: An Overview for Cancer Research". Available online 15 July 2016.
[27] Rushi Longadge, Snehlata S. Dongre, and Latesh Malik, "Class Imbalance Problem in Data Mining: Review".
[28] N. M. Luscombe, D. Greenbaum, and M. Gerstein, "What is bioinformatics? An introduction and overview". Yearbook of Medical Informatics, 2001.
[29] NCBI, National Center for Biotechnology Information Search database. https://www.ncbi.nlm.nih.gov/, accessed October 3, 2017.
[30] Office for Civil Rights, "Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule". U.S. Department of Health and Human Services, 2013.
[31] Ofir Press, Amir Bar, Ben Bogin, Jonathan Berant, and Lior Wolf, "Language Generation with Recurrent Generative Adversarial Networks without Pre-training".
[32] Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, "Data mining cluster analysis: basic concepts and algorithms". 2013.
[33] Pedro Larrañaga, Borja Calvo, Roberto Santana, Concha Bielza, Josu Galdiano, Iñaki Inza, José A. Lozano, Rubén Armañanzas, Guzmán Santafé, Aritz Pérez, and Victor Robles, "Machine learning in bioinformatics". Submitted 29 July 2005.
[34] Alec Radford, Luke Metz, and Soumith Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks". CoRR, abs/1511.06434, 2015. http://arxiv.org/abs/1511.06434.
[35] Rahman Ali, Muhammad Hameed Siddiqi, Muhammad Idris, Taqdir Ali, Shujaat Hussain, Eui-Nam Huh, Byeong Ho Kang, and Sungyoung Lee, "GUDM: Automatic Generation of Unified Datasets for Learning and Reasoning in Healthcare". Published 2 July 2015.
[36] S. Sharmilan and H. T. Chaminda, "Pregnancy Complications Diagnosis Using Predictive Data Mining". Proceedings of the International Conference on Innovations in Info-business and Technology, 2016.
[37] Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, and Honglak Lee, "Generative Adversarial Text to Image Synthesis". Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016.
[38] Sri Lanka Foundation, 2016. https://srilankafoundation.org/newsfeed/autism-spectrum-disorders-asd-in-sri-lanka/
[39] Stephanie L. Hyland, Cristóbal Esteban, and Gunnar Rätsch, "Real-valued (Medical) Time Series Generation with Recurrent Conditional GANs". Accessed October 17, 2017.
[40] Tharaniga Jeyakodi, "Adoption of Health Information Systems in Sri Lanka". Proceedings of the 2015 International Conference on Future Computational Technologies.
[41] Thomas Neff, Christian Payer, Darko Štern, and Martin Urschler, "Generative Adversarial Network based Synthesis for Supervised Medical Image Segmentation".
[42] Wei Li, Xiaohui Cui, Kevin Michael Amaral, Rajani Sadasivam, and Ping Chen, "Smoking Cessation Recruitment Analysis: A Case Study". 2017 IEEE International Conference on Big Knowledge.
[43] WHO Media Center, April 2017. http://www.who.int/mediacentre/factsheets/autism-spectrum-disorders/en/
[44] X. L. Wang, A. Shrivastava, and A. Gupta, "A-Fast-RCNN: hard positive generation via adversary for object detection". arXiv:1704.03414, 2017.

Dr. Gypsy Nandi
No ratings yet
Data Science Career Guide Interview Preparation
From Everand
Data Science Career Guide Interview Preparation
Gradient Publication
No ratings yet
Python for Bioinformatics Chapman Hall CRC Mathematical Computational Biology Sebastian Bassi instant download
100% (1)
Python for Bioinformatics Chapman Hall CRC Mathematical Computational Biology Sebastian Bassi instant download
52 pages
Implementation-of-Machine-Learning-Algorithms_-Gaussian-Na%C3%AFve-Bayes-CatBoost-and-LightGBM
No ratings yet
Implementation-of-Machine-Learning-Algorithms_-Gaussian-Na%C3%AFve-Bayes-CatBoost-and-LightGBM
4 pages
Introduction to Bioinformatics_BCHS 4214
No ratings yet
Introduction to Bioinformatics_BCHS 4214
10 pages
An Assignment
No ratings yet
An Assignment
6 pages
Image Retrieval: Unlocking the Power of Visual Data
From Everand
Image Retrieval: Unlocking the Power of Visual Data
Fouad Sabry
No ratings yet
Unveiling The Frontiers of Deep Learning: Innovations Shaping Diverse Domains
No ratings yet
Unveiling The Frontiers of Deep Learning: Innovations Shaping Diverse Domains
64 pages
Bioinformatics Note
No ratings yet
Bioinformatics Note
7 pages
Essentials of Data Analysis
From Everand
Essentials of Data Analysis
Agasti Khatri
No ratings yet
Bioinformatics Question
No ratings yet
Bioinformatics Question
11 pages
Data Analytics with Python: Data Analytics in Python Using Pandas
From Everand
Data Analytics with Python: Data Analytics in Python Using Pandas
Frank Millstein
3/5 (1)
Human-Centered Data Science: An Introduction
From Everand
Human-Centered Data Science: An Introduction
Cecilia Aragon
No ratings yet
AN986: Bluetooth® A2DP and AVRCP Chip
No ratings yet
AN986: Bluetooth® A2DP and AVRCP Chip
50 pages
Answers CH 1
No ratings yet
Answers CH 1
1 page
Letter To Principal
No ratings yet
Letter To Principal
1 page
13 - Economic Section
No ratings yet
13 - Economic Section
44 pages
Virtual internship opportunities on AICTE Internship Portal by ServiceNow-reg
No ratings yet
Virtual internship opportunities on AICTE Internship Portal by ServiceNow-reg
2 pages
Arduino Bootcamp: Learning Through Projects: Parts List
No ratings yet
Arduino Bootcamp: Learning Through Projects: Parts List
17 pages
Dagohoy National High School Senior High School Department Dagohoy, Bohol Detailed Lesson Plan (DLP)
No ratings yet
Dagohoy National High School Senior High School Department Dagohoy, Bohol Detailed Lesson Plan (DLP)
2 pages
Guia 5 Ingles 8vo PDF
No ratings yet
Guia 5 Ingles 8vo PDF
11 pages
Kirin 710F Vs Kirin 710
No ratings yet
Kirin 710F Vs Kirin 710
4 pages
Information Support Assistant Job Description and Person Specification 2
No ratings yet
Information Support Assistant Job Description and Person Specification 2
7 pages
UTD-IPH00-XXXXX-M100-5TW
No ratings yet
UTD-IPH00-XXXXX-M100-5TW
5 pages
Math - Time and Money
No ratings yet
Math - Time and Money
1 page
FRAC_Draft2.1
No ratings yet
FRAC_Draft2.1
87 pages
Narimaster Cattle. Sibi
No ratings yet
Narimaster Cattle. Sibi
14 pages
Auditing, Assurance Services, and Forensics (PDFDrive)
100% (1)
Auditing, Assurance Services, and Forensics (PDFDrive)
494 pages
BOT UCE PHY 1 2025
No ratings yet
BOT UCE PHY 1 2025
5 pages
Piensate-Rico-1
No ratings yet
Piensate-Rico-1
6 pages
Katharina J. Schreiber
No ratings yet
Katharina J. Schreiber
41 pages
Learning Episode 5 - Writing My Lesson Plans: EDUC 302-Field Study 2 Participation and Assistantship
No ratings yet
Learning Episode 5 - Writing My Lesson Plans: EDUC 302-Field Study 2 Participation and Assistantship
4 pages
Brief Encounter (1945) : Movie Info
No ratings yet
Brief Encounter (1945) : Movie Info
7 pages
2010 National Qualifying Exam - Biology: Section A (Multiple Choice)
No ratings yet
2010 National Qualifying Exam - Biology: Section A (Multiple Choice)
4 pages
The House of The Scorpion Hero Cycle
No ratings yet
The House of The Scorpion Hero Cycle
2 pages
Mechanics of Fluids and Hydraulic Machines: Lecture Notes
No ratings yet
Mechanics of Fluids and Hydraulic Machines: Lecture Notes
98 pages
2023 LN, Xii, The Last Lesson
No ratings yet
2023 LN, Xii, The Last Lesson
3 pages
UTAMU
No ratings yet
UTAMU
2 pages
Coping With Loss, Death and Grieving
No ratings yet
Coping With Loss, Death and Grieving
39 pages
As 1289.5.7.1-2006 Methods of Testing Soils For Engineering Purposes Soil Comp Action and Density Tests - Comp
No ratings yet
As 1289.5.7.1-2006 Methods of Testing Soils For Engineering Purposes Soil Comp Action and Density Tests - Comp
2 pages
$$\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right]$$
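In this value function the discriminator D tries to maximize V(D, G) while the generator G tries to minimize it. The following is a minimal sketch, assuming D outputs a probability in (0, 1) and that each expectation is approximated by a sample mean over a minibatch. The function names and the clipping constant `eps` are illustrative assumptions, not taken from the paper.

```python
import math
from typing import Sequence


def _clip(p: float, eps: float) -> float:
    # Keep probabilities away from 0 and 1 so log() stays finite.
    return min(max(p, eps), 1.0 - eps)


def gan_value(d_real: Sequence[float], d_fake: Sequence[float], eps: float = 1e-12) -> float:
    """Sample-mean estimate of V(D, G).

    d_real: discriminator outputs D(x) for samples x drawn from p_data.
    d_fake: discriminator outputs D(G(z)) for noise samples z drawn from p_z.
    """
    if not d_real or not d_fake:
        raise ValueError("need at least one real and one generated sample")
    # First term: E_x[log D(x)], approximated by the mean over real samples.
    real_term = sum(math.log(_clip(p, eps)) for p in d_real) / len(d_real)
    # Second term: E_z[log(1 - D(G(z)))], approximated by the mean over generated samples.
    fake_term = sum(math.log(1.0 - _clip(p, eps)) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    # D maximizes V, so the quantity it minimizes is -V.
    return -gan_value(d_real, d_fake)


def generator_loss(d_fake: Sequence[float], eps: float = 1e-12) -> float:
    # Only the second term of V depends on G, so G minimizes E_z[log(1 - D(G(z)))].
    if not d_fake:
        raise ValueError("need at least one generated sample")
    return sum(math.log(1.0 - _clip(p, eps)) for p in d_fake) / len(d_fake)


if __name__ == "__main__":
    # When D cannot tell real from generated data it outputs 0.5 everywhere,
    # and V(D, G) equals log(1/4) = -log 4.
    print(gan_value([0.5, 0.5], [0.5, 0.5]))
```

A uniform output of 0.5 corresponds to the equilibrium in which the generated distribution matches the data distribution. In practice, Goodfellow et al. note that early in training the generator term saturates, so implementations often have G maximize log D(G(z)) instead; the sketch above keeps the minimax form exactly as written in the equation.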