
Child Face Generation with Deep Conditional Generative Adversarial Networks

Robert Gordan    Mingu Kim    Alexander Muñoz

Abstract

Child face generation is a computer vision problem in which the goal is to synthesize realistic images of a child given images of its parents. We present a model for this problem based on Deep Convolutional Generative Adversarial Networks (DCGANs). Key challenges in this domain include limited datasets with high-dimensional input spaces and the multi-modal nature of the target distribution. We demonstrate convincingly that GANs have a unique ability to capture the latter feature, while use of state-of-the-art training techniques and architecture optimizations allows us to mitigate the impact of the former. As a baseline, we use a simple supervised model that minimizes RMSE with respect to target images. Qualitatively, the clarity and diversity of the images reflect advantages of the GAN model when compared to the baseline model. Quantitatively, we find that after training, the discriminator correctly classifies GAN-generated images as fake at a rate of 71.1%, compared to 99.5% classification accuracy on the images generated by the baseline model, suggesting an objective basis for our observations.

1. Introduction

Since their introduction by Goodfellow et al. in 2014, Generative Adversarial Networks (GANs) have gained rapid adoption as a way of modeling and sampling from latent distributions over certain output spaces. They have proved particularly effective in the image domain, allowing the synthesis of extremely realistic images.

GAN models construct a two-player game opposing a Generator and a Discriminator. The latter is trained to discriminate between output from the Generator and real-world samples, while the former is trained to fool the Discriminator. GANs have faced several practical issues, including a notoriously difficult training process and mode collapse, which occurs when the Generator learns to produce a single point in the output space. However, GANs enjoy several advantages when compared to older generative approaches. GANs do not require a costly inference step, instead learning only through backpropagation. Furthermore, GANs leverage the benefits of deep neural networks, easily incorporating many factors and complex feature interactions.

A natural extension to the model is the Conditional Generative Adversarial Network, in which a set of features is fed to both the Generator and the Discriminator. This allows the model to sample from conditional distributions. We demonstrate a successful application of the model to the child face generation domain. For this problem, we are interested in sampling from the conditional distribution over images of a child's face given images of each of his or her parents. As far as we know, no previous academic work has focused on this problem. Instead, several papers have explored kinship verification, in which the goal is to classify pairs or triples of images as representing a parent-child (or other) relationship. Among other benefits, advances in child face generation may help biological parents find adopted or long-lost children.

Technically, this problem is interesting because of the properties of the distribution we seek to model. A clear cleavage exists in the distribution between male and female children, which means that the generator should be able to produce both types of children. Furthermore, child face generation is an interesting application of computer vision methods such as convolutional layers because of the complex three-way relationship between the visual objects. Rather than simply reducing an image to labels, we must capture information at some level of abstraction that best predicts the face of the parents' child.

We first discuss our data and a more formal definition of our problem. We then review related work, both in model structure and in the kinship subdomain of computer vision. In Section 4, we describe our model, followed by the training process in Section 5 and methods in Section 6. Lastly, we present our results and the conclusions we can draw from them.

2. Background

There exists some distribution C(θ) such that a random variable A ∼ C(θ) is an image of a child, weighted by the probability that a child with parameters θ has the facial phenotype corresponding to that image. Using the universality of the uniform distribution, we can construct a mapping from random noise Z to an arbitrary distribution. Unfortunately, θ includes many variables that are not easily observable, presumably including some environmental interactions. However, if we have some useful information, we can construct f1(Z, f2(θ)), which we can then fit using a training method of our choice and a dataset of child images (in addition to features of θ).

Intuitively, a convenient and useful choice for θ is the images of the parents of a given child. We employ the TSKinFace dataset, which consists of 1015 triples of 64x64 pixel images representing the father, the mother, and the child. The dataset is roughly balanced between female and male children. However, the limitations of the dataset pose some other challenges. First, the small size of the dataset means that overfitting is a constant concern. This concern is aggravated by the relatively large amount of data used as an input to this model. Secondly, the dataset is biased towards East Asian families, which means that the resulting model may lack the ability to generalize beyond this group of people.

Figure 1. Model architecture for the generator network of childGAN. Separate convolutional parts of the network process the two parent images, while a fully connected layer processes the noise. These inputs are combined at the end of the network to produce the output.
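To make the conditioning concrete, the sketch below shows a generator in the spirit of Figure 1, written in PyTorch: two convolutional branches encode the parent images, a fully connected layer projects the noise, and the streams are merged and upsampled. All layer counts, widths, and class names are illustrative assumptions, not the exact published architecture.

    import torch
    import torch.nn as nn

    class ParentEncoder(nn.Module):
        """Convolutional branch that encodes one 64x64x3 parent image."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),    # 64 -> 32
                nn.ELU(),
                nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),  # 32 -> 16
                nn.ELU(),
            )

        def forward(self, img):
            return self.net(img)

    class CondGenerator(nn.Module):
        """Merges two parent encodings with a noise projection, then upsamples."""
        def __init__(self, z_dim=4 * 4 * 64):
            super().__init__()
            self.enc_father = ParentEncoder()
            self.enc_mother = ParentEncoder()
            self.fc_noise = nn.Linear(z_dim, 128 * 16 * 16)  # noise -> feature map
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(128 * 3, 128, 4, stride=2, padding=1),  # 16 -> 32
                nn.ELU(),
                nn.ConvTranspose2d(128, 3, 4, stride=2, padding=1),        # 32 -> 64
                nn.Tanh(),
            )

        def forward(self, z, father, mother):
            f = self.enc_father(father)                  # (B, 128, 16, 16)
            m = self.enc_mother(mother)                  # (B, 128, 16, 16)
            n = self.fc_noise(z.flatten(1)).view(-1, 128, 16, 16)
            return self.decoder(torch.cat([f, m, n], dim=1))  # (B, 3, 64, 64)

The discriminator of Figure 2 mirrors this structure, with the candidate child image entering a further convolutional branch and the merged features reduced to a single probability.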

3. Related Work

Goodfellow et al. (2014) first introduced the GAN model. GANs have demonstrated success in many image-based domains, including image generation (Denton et al., 2015), representation learning (Radford et al., 2015), text-to-image synthesis (Reed et al., 2016), and upsampling images (Ledig et al., 2016). The key innovation is the adversarial loss, in which the loss of the generator is based on the ability of a discriminator to correctly classify the image as coming from the generator. The weights of the generative model can be trained directly by backpropagating through the (fixed) weights of the discriminator, since the output of the generator and the input of the discriminator align.

Figure 2. Model architecture for the discriminator network of childGAN. As in the generator, separate convolutional components process the parent images.

Conditional Generative Adversarial Networks quickly followed GANs. The first application of this model conditioned only on a single digit from 0-9, in order to generate MNIST images. However, one can condition on any type of data that can be easily fed into a neural network (Mirza & Osindero, 2014).

The most natural parallel to the problem of child face generation is image-to-image translation, a domain which generalizes the problem of taking an input image and producing an output image. While such a problem can be approached with a standard CNN and deconvolutions, the CGAN produces more realistic results (Isola et al., 2016). Further developments in this domain include the introduction of models that include an inverse mapping to translate back (Zhu et al., 2017). Our only adjustment to the image-to-image translation model is the use of two images instead of one as the input to our generator.

In the domain of kinship verification, we find a variety of approaches. The paper we draw our dataset from employs an RSBM (relative symmetric bilinear model) and feature extraction (Qin et al., 2015). However, others have found success with this problem using convolutional neural networks (Zhang et al., 2015).

4. Model

As discussed previously, we are interested in modeling and sampling from the distribution over children given their parent images. Because we initially process each input separately, we can express our generator as:

    f1(Z, f2(p1), f3(p2))

Here p1 and p2 represent the two parents. We call this function G. G is therefore a mapping from random noise onto the space of our data x, which is the images of children. The discriminator then maps from the space of child images to a single scalar representing a boolean value indicating whether or not the image was produced by the generator:

    D(x, f4(p1), f5(p2))

We then train D to minimize the probability of misclassifying an image as coming from the generator or the real data. In contrast, G is trained to minimize:

    1 − D(G(Z))

Thus, the generator network is trained to produce images that closely conform to the distribution, as evaluated by the discriminator. The original GAN paper characterizes the relationship between the two neural networks as that of a two-player minimax game with the following value function:

    min_G max_D V(D, G) =
        E_{x ∼ p_data(x)} [ log D(x, f4(p1), f5(p2)) ]
      + E_{Z ∼ p_Z(Z)} [ log(1 − D(G(Z, f2(p1), f3(p2)), f4(p1), f5(p2))) ]

where in this case log-loss is used.
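To make the alternating optimization concrete, the following is a minimal sketch of one training step in PyTorch, using binary cross-entropy (the log-loss above) and the common non-saturating generator objective (training G to maximize log D(G(·)) rather than to minimize log(1 − D(G(·)))). G, D, and the Adam optimizers are assumed to be constructed elsewhere, and D is assumed to end in a sigmoid so that it outputs a probability; this is a sketch of the procedure, not the exact published code.

    import torch
    import torch.nn.functional as F

    def train_step(G, D, opt_G, opt_D, z, father, mother, real_child):
        """One alternating update of the conditional minimax objective."""
        # Discriminator update: push D(real | parents) -> 1 and D(fake | parents) -> 0.
        opt_D.zero_grad()
        fake = G(z, father, mother).detach()          # detach: no gradient into G here
        d_real = D(real_child, father, mother)
        d_fake = D(fake, father, mother)
        loss_d = F.binary_cross_entropy(d_real, torch.ones_like(d_real)) \
               + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
        loss_d.backward()
        opt_D.step()

        # Generator update: fool the (fixed) discriminator, i.e. push D(G(z)) -> 1.
        opt_G.zero_grad()
        d_fake = D(G(z, father, mother), father, mother)
        loss_g = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
        loss_g.backward()
        opt_G.step()
        return loss_d.item(), loss_g.item()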
The parameters of this model are all weights in the neural networks. This presents a challenge because the number of weights is large relative to the amount of data that we have.

However, the advantages of using a neural-network-based model include the ability to represent nearly arbitrary functions and to train with highly efficient backpropagation. This allows us to make very few assumptions about the parameters in our model. Nonetheless, we bake a few assumptions about the structure of the data and the problem into our model. For example, we assume we can process the parents' images separately because the features that affect the looks of their child can be encoded at a higher level of abstraction before they interact. We also use convolutional layers in the network components that take images, which reflects an assumption of positional invariance for the features in the images. Intuitively these are reasonable assumptions, and they help us limit the number of parameters we have to train.

Figure 3. The basic training process for Generative Adversarial Networks.

Our baseline supervised CNN model is nearly identical to the generative network of our GAN. The key difference is that the loss of this model is calculated with respect to the child image from the data.

5. Training

Because our model is composed of neural networks, we use backpropagation to fit reasonable values for the parameters. When compared to other generative methods, GANs have the advantage of not requiring an inference step. Computationally, this makes it possible to estimate parameters for much larger models. GANs learn approximate parameters, and have a notoriously unstable training process, which makes appropriate training procedures very important.

As mentioned previously, we are limited by a relatively small dataset of 1015 examples of parent-parent-child trio image sets. Because of this, we used transfer learning as a way of pseudo-augmenting our data. For example, much of what the generator must learn at a higher level is simply making human-like faces; regardless of the parent images, the output needs to look like a child. This fundamental feature of human faces and higher-level abstraction is not limited to our problem. We assume that this encoding occurs in the first two convolutional layers. Therefore, we can instead pretrain a different DCGAN that takes in a random z matrix as input and outputs a child image. The advantage of splitting the problem up in this manner is that we can use any image dataset of children. We therefore used the Large Age-Gap (LAG) database, an image database with pictures taken of people at a variety of ages (Bianco, 2017). We take only the child images, resulting in 9846 photos.

In summary, the following is the basic outline of our overall training process for the DCGAN:
Submission and Formatting Instructions for ICML 2017

1. Train a DCGAN that receives a random array z as input and generates child images.

2. Initialize the first two convolutional layers (which initially take a random array as input, and which are shown in Figure 1) of a different DCGAN, intended to generate children from parent images, with the weights of the corresponding layers of the pre-trained DCGAN from the previous step.

3. Keeping the weights in these layers fixed, train this DCGAN, conditioned on parent images, to generate potential child images (a code sketch of steps 2 and 3 appears below).

In general, we alternate the training of the discriminator and the generator. The objective functions are as described in Section 4. We use binary cross-entropy loss as the loss function.
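A minimal sketch of the weight transfer and freezing in steps 2 and 3, assuming PyTorch modules in which the two shared convolutional layers can be addressed by name; the attribute names conv1 and conv2 are hypothetical stand-ins for the shared layers of Figure 1.

    def transfer_and_freeze(pretrained_G, child_G, shared=("conv1", "conv2")):
        """Copy the first two conv layers from the pre-trained DCGAN into
        childGAN's generator, then keep them fixed during training."""
        for name in shared:
            src = getattr(pretrained_G, name)
            dst = getattr(child_G, name)
            dst.load_state_dict(src.state_dict())  # step 2: initialize from PreDCGAN
            for p in dst.parameters():
                p.requires_grad = False            # step 3: freeze these weights

    # Only the unfrozen parameters are then handed to the generator's optimizer:
    # trainable = [p for p in child_G.parameters() if p.requires_grad]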

In order to evaluate the quality of the output of our GAN, we also implemented a supervised model in which the loss function for the generator is RMSE with respect to the actual child image. In this case we backpropagate directly from the final layer of the generator instead of doing so through the weights of the discriminator first.
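By contrast, the baseline's update needs no discriminator in the loop; a sketch of the supervised step, with the same G signature assumed as before:

    import torch

    def baseline_step(G, opt_G, z, father, mother, real_child):
        """Supervised baseline: RMSE between the generated and actual child."""
        opt_G.zero_grad()
        pred = G(z, father, mother)
        loss = torch.sqrt(torch.mean((pred - real_child) ** 2))  # RMSE
        loss.backward()   # gradients flow straight from G's final layer
        opt_G.step()
        return loss.item()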
ing the discriminator around t=590 and t=430. 4 How-
6. Methods

In this section we specify the exact details of our implementation of the model and the dataset.

We have two DCGANs in total. One of them (which we will call PreDCGAN) is used to generate child images from random input, for transfer learning purposes. The other DCGAN (which we will refer to as childGAN) uses these pre-trained weights for two of its layers, and trains on parent images to generate possible children images.

To train PreDCGAN, we used the Large Age-Gap dataset, which comprises 9846 child photos. To train childGAN, we used the TSKinFace dataset of 1015 triples of father-mother-child images. Each image was mean-centered around 0 and normalized before training. The architectures of the generator and discriminator for childGAN are shown in Figures 1 and 2. The generator takes in two 64x64x3 images of parents, along with a 4x4x64 random array drawn from Unif(0, 1), and outputs a 64x64x3 image. The discriminator takes in two parent images as well as either an image of a real child or a generated image of a child.
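As a small illustration of this preprocessing and of the noise input, the sketch below mean-centers and normalizes each image and draws the 4x4x64 random array from Unif(0, 1); per-image statistics are one plausible reading of "mean-centered around 0 and normalized".

    import torch

    def preprocess(img_uint8):
        """Map a 64x64x3 uint8 image to a mean-centered, normalized CHW tensor."""
        x = torch.as_tensor(img_uint8, dtype=torch.float32).permute(2, 0, 1) / 255.0
        return (x - x.mean()) / (x.std() + 1e-8)  # center around 0, roughly unit scale

    def sample_noise(batch_size):
        """Draw the generator's random input, shaped 4x4x64, from Unif(0, 1)."""
        return torch.rand(batch_size, 4, 4, 64)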
During our first attempts at the problem, we tried to train childGAN without transfer learning. Although we obtained results, we suffered from mode collapse, where the Generator learned to produce a single point in the output space for all input images. Expanding our effective dataset through transfer learning with PreDCGAN, along with fixed weights for the two convolutional layers shown in green in Figure 1, allowed for greater diversity of output child images.

Much of the architecture was inspired by previous work (Radford et al., 2015), but we made several key changes for our problem. For example, our model is a conditional GAN that takes in two parent images, so we had to adjust the model accordingly. Also, although previous work used ReLUs for activation, we found that ELUs allowed the learning to proceed much more quickly. We also found that too much batch normalization causes the discriminator to 'overpower' the generator early in the training process, so some of the batch normalization layers were omitted. We used Adam optimizers for all generators and discriminators, with a learning rate of 0.0002 and betas = [0.5, 0.999].

For our final model, we trained for 12 hours, which equated to 4000 epochs.
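These choices translate into configuration along the following lines; this is a sketch under the stated hyperparameters, and disc_block is a hypothetical helper rather than the exact published layer stack.

    import torch
    import torch.nn as nn

    def disc_block(in_ch, out_ch, use_bn=False):
        """Discriminator conv block with ELU activation; batch norm is applied
        only where it does not let the discriminator overpower the generator."""
        layers = [nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)]
        if use_bn:
            layers.append(nn.BatchNorm2d(out_ch))
        layers.append(nn.ELU())
        return nn.Sequential(*layers)

    def make_optimizer(net):
        """Adam with the reported learning rate and betas, for G and D alike."""
        return torch.optim.Adam(net.parameters(), lr=0.0002, betas=(0.5, 0.999))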

7. Results

One weakness of the generative model is that it lacks clear quantitative evaluation metrics. We can see in Figure 4 that the generator is successfully learning to confuse the discriminator, as in the spikes in the probability of fooling the discriminator around t=430 and t=590. However, the quality of the generated images is subjective and best evaluated by humans. We observed the largest quality increases in the plateaus of the generator performance. We have included sample images from both our GAN and our baseline supervised model in Figure 5. The GAN produces clearer images, and is also capable of outputting a much greater variety of images. Meanwhile, as expected, the fully supervised model does not produce variety and produces comparatively blurry images.

We also explored the performance impact of various tweaks to our model architecture. We found that pre-training the discriminator had an adverse impact on our model: when the discriminator was much stronger than the generator, the gradients during generator training were not meaningful, because the generator could never successfully fool the discriminator. Lastly, as discussed in Sections 5 and 6, the pseudo-augmentation of data improved subjective performance.

Finally, we compared the generative adversarial model to our baseline model by pitting both against the discriminator network from the GAN. We found that the generator fooled the discriminator in 28.9% of instances, while the supervised CNN model did so in only 0.5% of cases. This suggests that the GAN is a more robust model for sampling child images.

Figure 4. Results for training: probability of the generator fooling the discriminator over time.

Figure 5. Output images by model.
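The fooling rate used for this comparison can be computed as in the sketch below, assuming a held-out loader of (father, mother, child) batches and the trained discriminator D; the 0.5 acceptance threshold is an assumption.

    import torch

    @torch.no_grad()
    def fool_rate(model, D, data_loader, threshold=0.5):
        """Fraction of generated children the discriminator accepts as real."""
        fooled, total = 0, 0
        for father, mother, _ in data_loader:
            z = torch.rand(father.size(0), 4, 4, 64)  # same Unif(0,1) noise as training
            scores = D(model(z, father, mother), father, mother)
            fooled += (scores > threshold).sum().item()
            total += scores.numel()
        return fooled / total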

8. Discussion

Our results demonstrate that the appealing properties of GANs hold even for complex models in which the distribution is conditioned on a large set of features, such as two images. The most important property in this set is the ability to sample from highly multi-modal distributions. The most obvious example of this is that the GAN successfully produces both male and female children. However, the issue of image clarity can also be viewed as a similar problem. While two very clear images, when viewed by a human, may both be judged to be likely images of real children, the pixelwise average of these images is quite unlikely to get the same reaction. Therefore the clarity of images produced by the GAN reflects its ability to find modes, peaks in probability density, instead of averaging them. In contrast, the supervised baseline model, driven by its RMSE, "hedges" its guesses for pixel values between multiple possibilities.

GANs are notorious for "mode collapse," a problem that occurs when the generator learns to produce a single value without any variation. We were able to avoid this problem by using several random re-initializations. The ability to produce a highly diverse set of images is important for this domain. Potential applications include synthesizing hypothetical images of lost children to help biological parents find them, or information retrieval (searching for online images of children given their parents). In these cases, having several diverse candidate images is valuable.

The other interesting result from our experiments is that the GAN was able to produce even remotely reasonable results given the small size of the dataset. This may reflect the fact that generated data augments the real data when training the discriminator; in effect, this doubles the size of the data. Furthermore, our use of transfer learning gave a sort of pseudo-augmentation to the dataset. Our success with this technique could potentially be replicated in other domains with limited data from which to train generative models. By introducing data from spaces which overlap the target space, the generator can quickly learn valuable information applicable to the target space.

9. Conclusion

The results in this paper suggest Generative Adversarial Networks are well suited to problems that involve complex three-way relationships between visual objects. With appropriate training and architectural adjustments, these models can be effective even for low-data problems. The techniques we used, such as pseudo-augmentation, can potentially be extended to other domains.

GANs successfully capture important properties of the parent-child kinship relationship, most importantly its multi-modality. We show that GANs are capable of producing candidate children with highly diverse features.

There has been, unfortunately, relatively little work on the problem of generating child faces given parents. While clearly a small niche, it has the potential for specific real-world impact. The most useful contribution in this area would be the development of larger and more diverse datasets. The limits of the dataset limit the strength and generality of the model.

Further exploration is also needed into techniques for generative models in low-data environments. For example, the incorporation of data from similar spaces proved a promising addition to our model, and large volumes of unlabeled data could also contribute to success on this and similar problems.

References

Deep learning for computer vision: Generative models and adversarial training (UPC 2016). URL https://www.slideshare.net/xavigiro/deep-learning-for-computer-vision-generative-mode

Bianco, Simone. Large age-gap face verification by feature injection in deep networks. Pattern Recognition Letters, 90:36-42, 2017. doi: 10.1016/j.patrec.2017.03.006.

Denton, Emily L, Chintala, Soumith, Fergus, Rob, et al. Deep generative image models using a Laplacian pyramid of adversarial networks. In Advances in Neural Information Processing Systems, pp. 1486-1494, 2015.

Goodfellow, Ian, Pouget-Abadie, Jean, Mirza, Mehdi, Xu, Bing, Warde-Farley, David, Ozair, Sherjil, Courville, Aaron, and Bengio, Yoshua. Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672-2680, 2014.

Isola, Phillip, Zhu, Jun-Yan, Zhou, Tinghui, and Efros, Alexei A. Image-to-image translation with conditional adversarial networks. arXiv preprint arXiv:1611.07004, 2016.

Langley, P. Crafting papers on machine learning. In Langley, Pat (ed.), Proceedings of the 17th International Conference on Machine Learning (ICML 2000), pp. 1207-1216, Stanford, CA, 2000. Morgan Kaufmann.

Ledig, Christian, Theis, Lucas, Huszár, Ferenc, Caballero, Jose, Cunningham, Andrew, Acosta, Alejandro, Aitken, Andrew, Tejani, Alykhan, Totz, Johannes, Wang, Zehan, et al. Photo-realistic single image super-resolution using a generative adversarial network. arXiv preprint arXiv:1609.04802, 2016.

Mirza, Mehdi and Osindero, Simon. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.

Qin, Xiaoqian, Tan, Xiaoyang, and Chen, Songcan. Tri-subject kinship verification: Understanding the core of a family. IEEE Transactions on Multimedia, 17(10):1855-1867, 2015.

Radford, Alec, Metz, Luke, and Chintala, Soumith. Unsupervised representation learning with deep convolutional generative adversarial networks. CoRR, abs/1511.06434, 2015. URL http://arxiv.org/abs/1511.06434.

Reed, Scott, Akata, Zeynep, Yan, Xinchen, Logeswaran, Lajanugen, Schiele, Bernt, and Lee, Honglak. Generative adversarial text to image synthesis. arXiv preprint arXiv:1605.05396, 2016.

Zhang, Kaihao, Huang, Yongzhen, Song, Chunfeng, Wu, Hong, and Wang, Liang. Kinship verification with deep convolutional neural networks. 2015.

Zhu, Jun-Yan, Park, Taesung, Isola, Phillip, and Efros, Alexei A. Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint arXiv:1703.10593, 2017.
