
W. Huang, M. Williams, D. Luo, Y. Wu and Y. Lin (eds.), Learning, Prototyping and Adapting, Short Paper Proceedings of the 23rd International Conference on Computer-Aided Architectural Design Research in Asia (CAADRIA) 2018. © 2018, The Association for Computer-Aided Architectural Design Research in Asia (CAADRIA), Hong Kong.

UNDERSTANDING AND VISUALIZING GENERATIVE ADVERSARIAL NETWORK IN ARCHITECTURAL DRAWINGS

HAO ZHENG¹ and WEIXIN HUANG²

¹University of California, Berkeley, USA
[email protected]
²Tsinghua University, Beijing, China
[email protected]

Abstract. Generative Adversarial Network (GAN) is a model framework in machine learning, designed for learning from and generating data where the input and output share a similar or identical format. PIX2PIXHD is a refined version of GAN, designed for learning paired image data and generating predicted images from the trained network model. The authors applied PIX2PIXHD to learning and generating architectural drawings, marking rooms with different colours automatically by computer program. Then, to understand how the network works, the authors analysed its frame and explain in detail its three working principles: the convolution layer, the residual network layer, and the deconvolution layer. Last, to visualize the network in architectural drawings, the authors exported the data from each layer and each training epoch as grayscale images, finding that the features of architectural plan drawings are learned step by step and stored in the network as parameters. The features in the drawings become more concise as the network goes deeper, and clearer as the training epochs increase, inviting an instructive comparison with the human learning process.

Keywords. Machine learning; architectural drawing; Generative Adversarial Network; visualizing; PIX2PIXHD.

1. Introduction

1.1. GENERATIVE ADVERSARIAL NETWORKS


Goodfellow et al. (2014) first proposed a machine learning network which we call the Generative Adversarial Network. It is used for generating data, especially output data whose format is similar or identical to that of the input data. Given training data in pairs, the program figures out the most suitable parameters in the network, so that the discriminator (D) has the smallest possible chance of distinguishing the generated data (G) from the original data.
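Formally, this is the minimax objective proposed by Goodfellow et al. (2014), in which the generator G tries to fool the discriminator D, while D tries to tell real samples x apart from generated samples G(z):

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]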
Then, Wang et al. (2017) built a refined network called PIX2PIXHD (Figure 1) for generating and evaluating 2D image data. An input image is translated into three 2D matrices based on its width, height, and RGB channels. The matrices then go through 5 groups of convolution layers, each containing one convolution layer, one batch normalization layer, and one ReLU layer; then 9 groups of residual network layers, each consisting of two ReflectionPad2d-Conv2d-InstanceNorm2d-ReLU sequences; and finally 5 groups of deconvolution layers, each containing one deconvolution layer, one batch normalization layer, and one ReLU or Tanh layer.

Figure 1. Network architecture of PIX2PIXHD by Wang et al. (2017).
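To make this structure concrete, the following is a minimal PyTorch sketch of such a generator. It is an illustrative approximation rather than the authors' code: the layer counts follow the text (5 convolution groups, 9 residual groups, 5 deconvolution groups), while the channel widths, kernel sizes, and the omission of the second ReLU inside the residual branch are assumptions based on common practice.

    import torch
    import torch.nn as nn

    class ResnetBlock(nn.Module):
        """A residual group: two ReflectionPad2d-Conv2d-InstanceNorm2d(-ReLU)
        sequences plus a skip connection (see Section 2.2)."""
        def __init__(self, dim):
            super().__init__()
            self.branch = nn.Sequential(
                nn.ReflectionPad2d(1), nn.Conv2d(dim, dim, 3),
                nn.InstanceNorm2d(dim), nn.ReLU(True),
                nn.ReflectionPad2d(1), nn.Conv2d(dim, dim, 3),
                nn.InstanceNorm2d(dim))

        def forward(self, x):
            return x + self.branch(x)

    def build_generator(base=64):
        layers = [nn.Conv2d(3, base, 7, padding=3),
                  nn.BatchNorm2d(base), nn.ReLU(True)]   # convolution group 1
        ch = base
        for _ in range(4):                               # groups 2-5, each halving H and W
            layers += [nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1),
                       nn.BatchNorm2d(ch * 2), nn.ReLU(True)]
            ch *= 2
        layers += [ResnetBlock(ch) for _ in range(9)]    # 9 residual groups
        for _ in range(4):                               # deconvolution groups double H and W
            layers += [nn.ConvTranspose2d(ch, ch // 2, 3, stride=2,
                                          padding=1, output_padding=1),
                       nn.BatchNorm2d(ch // 2), nn.ReLU(True)]
            ch //= 2
        layers += [nn.Conv2d(ch, 3, 7, padding=3), nn.Tanh()]  # final group ends in Tanh
        return nn.Sequential(*layers)

With this layout, a 3 × 512 × 512 input is compressed to 1024 × 32 × 32 feature maps before being enlarged back, matching the 1/16 size reduction noted in Section 2.3.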

A large number of experiments show that the PIX2PIXHD network performs very well in training on and generating 2D images, so the following work in this article is heavily based on this network.

1.2. RECOGNIZING ARCHITECTURAL DRAWINGS


As the main medium through which architects communicate, architectural plan drawings can also be regarded as 2D image data. We therefore collected 100 images of apartment floor plans from lianjia.com and marked the rooms with different colours (Figure 2): R255G0B0 for walkways, R0G255B0 for bedrooms, R0G0B255 for living rooms, R255G255B0 for kitchens, R255G0B255 for toilets, R0G255B255 for dining rooms, R0G0B0 for balconies, R128G0B0 for windows, and R0G128B0 for doors. A plan drawing and its corresponding coloured image together form a training image pair.

Figure 2. Apartment floor plan drawing (left); labelled image (middle); labelling rule (right).
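For reference, the labelling rule can be restated as a colour-to-room lookup table. The Python dict below is only a restatement of the colours above, not the paper's actual data format:

    ROOM_COLOURS = {
        (255, 0, 0):   "walkway",
        (0, 255, 0):   "bedroom",
        (0, 0, 255):   "living room",
        (255, 255, 0): "kitchen",
        (255, 0, 255): "toilet",
        (0, 255, 255): "dining room",
        (0, 0, 0):     "balcony",
        (128, 0, 0):   "window",
        (0, 128, 0):   "door",
    }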

After training on the 100 image pairs, we gave new floor plan drawings to the network and had the program generate predicted labelled images (Figure 3). As Figure 3 shows, the network generated a highly similar labelled image, meaning it performed well in recognizing architectural drawings.

Figure 3. Apartment floor plan drawing (left); generated labelled image (middle); original
labelled image (right).

2. Working principles
To reveal how the PIX2PIXHD network learns image pairs, this section analyses all three parts of the network and explains why they work well for processing image data.

2.1. CONVOLUTION LAYER


As mentioned earlier, the first part of the PIX2PIXHD network is made of 5 groups of convolution layers, each containing one convolution layer, one batch normalization layer, and one ReLU layer. While the batch normalization layer and the ReLU layer do not build connections between pixels, the convolution layer acts as the main calculation rule, extracting and mixing features of an image.
As Figure 4 shows, a convolution kernel is a 3 × 3 (or larger) matrix. When we input a 5 × 5 matrix, the kernel slides to each position, multiplies and sums up the 9 overlapping numbers, and finally outputs a new 3 × 3 matrix. Generally speaking, a convolution kernel is a feature extractor, turning a matrix into a smaller but refined new matrix. A convolution layer usually contains hundreds of kernels, to make sure all features are captured in the layer. The numbers in the kernels are the parameters that the program figures out through machine learning.

Figure 4. Generating a new matrix (image) using convolution kernel.
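The multiply-and-sum operation in Figure 4 can be written out directly. This NumPy sketch (an illustration, not the paper's code) reproduces the 5 × 5 input, 3 × 3 kernel, 3 × 3 output example:

    import numpy as np

    def convolve2d_valid(image, kernel):
        """Slide the kernel over the image, multiplying and summing at each position."""
        kh, kw = kernel.shape
        oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
        out = np.zeros((oh, ow))
        for i in range(oh):
            for j in range(ow):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    image = np.arange(25.0).reshape(5, 5)     # a 5 × 5 input matrix
    kernel = np.array([[1, 0, -1],            # an example vertical-edge kernel
                       [1, 0, -1],
                       [1, 0, -1]])
    print(convolve2d_valid(image, kernel))    # a 3 × 3 output matrix

In a real layer, these nine kernel numbers are the learned parameters, and hundreds of such kernels run side by side.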

2.2. RESIDUAL NETWORK LAYER


The second part of the network is 9 groups of residual network layers (ResNet). One ResNet block contains two groups of convolution layers, but instead of linking them directly into the network, ResNet adds a protecting operation that can skip these two layers if their result turns out worse (Figure 5). This lets the network proceed to deeper layers while ensuring the output does not degrade.

Figure 5. ResNet unit framework.
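The protecting operation can be seen numerically: because the block outputs x + F(x), a residual branch that learns to contribute nothing leaves the input untouched. In this PyTorch sketch (assumed, not from the paper), a single zero-initialized convolution stands in for the branch:

    import torch
    import torch.nn as nn

    branch = nn.Conv2d(8, 8, 3, padding=1)
    nn.init.zeros_(branch.weight)             # the branch learns F(x) = 0
    nn.init.zeros_(branch.bias)

    x = torch.randn(1, 8, 16, 16)
    y = x + branch(x)                         # ResNet forward: x + F(x)
    print(torch.allclose(x, y))               # True: the output is no worse than the input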

2.3. DECONVOLUTION LAYER


After going through the 5 convolution layers and 9 residual network layers, the width and height of the new matrix have shrunk to 1/16 of the original. The next 5 groups of deconvolution layers therefore enlarge the matrix, turning it back to the original size, while using the learned features to generate data resembling the second image in each image pair. Given the length of this article, the reversed matrix operation will not be elaborated.
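As a quick check of the size arithmetic (a PyTorch sketch under the same assumptions as before): a stride-2 transposed convolution doubles the width and height, so four of them restore a 1/16-size feature map to the original size.

    import torch
    import torch.nn as nn

    x = torch.randn(1, 1024, 32, 32)          # a 1/16-size feature map of a 512 × 512 input
    up = nn.ConvTranspose2d(1024, 512, kernel_size=3, stride=2,
                            padding=1, output_padding=1)
    print(up(x).shape)                        # torch.Size([1, 512, 64, 64])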

3. Visualizing architectural features in plan drawings


After understanding the network's frame, we extracted the parameters in each layer at each training epoch, to see what features the network had learned.

3.1. BY NETWORK LAYER


First, we inputted a new image into the network. After passing through each convolution layer, the size of the image halved while the number of channels doubled. That means the features in each image channel became more concise while the number of features, or combinations of features, grew. As Figure 6 shows, only simple features like walls were recognized in conv-layer 1, but as the layers went deeper, more features like the edges of tables and sofas became conspicuous while the image size went down. The features were thus combined and compressed in the convolution layers, much as we human beings move from concrete entities to abstract concepts as we think more deeply.

Figure 6. Visualization of parameters in each convolution layer.
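Grayscale visualizations like Figure 6 can be produced by capturing each layer's output during a forward pass. The following forward-hook sketch is assumed tooling, since the paper does not describe its export code; build_generator refers to the earlier sketch:

    import torch
    from torchvision.utils import save_image

    def save_feature_maps(model, image, out_prefix="layer"):
        """Run one forward pass, saving early channels of each layer as grayscale."""
        handles = []
        for idx, layer in enumerate(model):
            def hook(module, inputs, output, idx=idx):
                for ch in range(min(output.shape[1], 8)):                # first 8 channels
                    fm = output[0, ch]
                    fm = (fm - fm.min()) / (fm.max() - fm.min() + 1e-8)  # scale to [0, 1]
                    save_image(fm.unsqueeze(0), f"{out_prefix}{idx:02d}_ch{ch}.png")
            handles.append(layer.register_forward_hook(hook))
        with torch.no_grad():
            model(image)
        for h in handles:
            h.remove()

    # e.g. save_feature_maps(build_generator(), torch.randn(1, 3, 512, 512))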

Next come the ResNet layers, which did not change the image size or the number of channels, but further reshaped the combinations of features. Last, as Figure 7 shows, the deconvolution layers enlarged the image and reduced the number of channels back to those of the original image.

Figure 7. Visualization of parameters in ResNet layer and each deconvolution layer.



3.2. BY TRAINING EPOCH


We also used the same method to extract parameters at different training epochs. As Figure 8 shows, in convolution layer 3 at epoch 4, node 0 recognized the paving pattern in bedrooms, though not distinctly, since noise from table edges and walkway paving remained. As training went on, the walkway paving disappeared first, by epoch 24, and then the table edges faded away gradually. By epoch 80, only the bedroom paving pattern remained in the parameters, which means this node had come to tell the bedroom area apart from other areas, and it became clearer and clearer as training time increased. Compared with the human learning process, the effect of training time on performance is thus easy to understand.

Figure 8. Visualization of parameters in convolution layer 3 node 0 in each sample epoch.

4. Conclusion
Built on Generative Adversarial Networks, PIX2PIXHD is a powerful machine learning tool for recognizing and generating architectural drawings. The features in the drawings become more concise as the network goes deeper, and clearer as the training epochs increase. This invites an instructive comparison with the human learning process, in which we move from concrete entities to abstract concepts, and from fuzzy cognition to accurate judgement. In the future, Generative Adversarial Networks may be used not only for generating images, but also for self-designing art or architectural works.

Acknowledgements
I would like to express my gratitude to Prof. Weixin Huang of Tsinghua University, who supervised this research, and to Yuming Lin, Lijing Yang, Chenglin Wu, Zhijia Chen, and Xia Su for providing labelled image data and advice.

References
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014). Generative adversarial nets. In Advances in Neural Information Processing Systems (pp. 2672-2680).
Wang, T. C., Liu, M. Y., Zhu, J. Y., Tao, A., Kautz, J., & Catanzaro, B. (2017). High-resolution image synthesis and semantic manipulation with conditional GANs. arXiv preprint arXiv:1711.11585.
