
Generative AI for Architectural Design: A Literature Review

Chengyuan Li¹, Tianyu Zhang², Xusheng Du², Ye Zhang¹*, Haoran Xie²


¹ Tianjin University    ² Japan Advanced Institute of Science and Technology

arXiv:2404.01335v1 [cs.LG] 30 Mar 2024

Abstract

Generative Artificial Intelligence (AI) has pioneered new methodological paradigms in architectural design, significantly expanding the innovative potential and efficiency of the design process. This paper explores the extensive applications of generative AI technologies in architectural design, a trend that has benefited from the rapid development of deep generative models. Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) have been extensively applied in the past, significantly advancing design innovation and efficiency. With continual technological advances, state-of-the-art Diffusion Models and 3D Generative Models are progressively being integrated into architectural design, offering designers a more diversified set of creative tools and methodologies. This article further provides a comprehensive review of the basic principles of generative AI and large-scale models and highlights their applications in the generation of 2D images, videos, and 3D models. In addition, by reviewing the latest literature since 2020, this paper scrutinizes the impact of generative AI technologies at different stages of architectural design, from generating initial architectural 3D forms to producing final architectural imagery. The marked growth in research indicates an increasing inclination within the architectural design community towards embracing generative AI, thereby catalyzing shared enthusiasm for research. These research cases and methodologies have not only been shown to enhance efficiency and innovation significantly but have also posed challenges to the conventional boundaries of architectural creativity. Finally, we point out new directions for design innovation and articulate fresh trajectories for applying generative AI in the architectural domain. This article provides the first comprehensive literature review on generative AI for architectural design, and we believe this work can facilitate more research on this significant topic in architecture.

Keywords: Generative AI, Architectural Design, Diffusion Models, 3D Generative Models, Large-scale Models.

* Corresponding author, [email protected]

Figure 1. Examples of architectural design using generative AI techniques: (a) church design [1]; (b) matrix of cuboid shapes [2]; (c) Frank Gehry's Walt Disney Concert Hall [3]; (d) Bangkok urban design [4]; (e) foresting architecture [4]; (f) urban interiors [4]; and (g) text-to-architectural design [5].

1. Introduction

Nowadays, generative artificial intelligence (AI) techniques are increasingly demonstrating their power and driving a revolution in architectural design. Here, generative AI refers to artificial intelligence technologies dedicated to content generation, such as text, images, music, and videos. Generative AI benefits from the rapid development of deep generative models, including Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and Diffusion Models (DMs). GANs and VAEs are traditional generative models and have been widely explored in architectural design, as illustrated in Figure 1. In this paper, we focus on the recent progress of generative AI, especially the revolutionary diffusion models. DMs have achieved state-of-the-art performance in various content generation tasks such as text-to-image and text-to-3D generation.

Architectural design may encompass multiple themes and scopes, with each project having distinct design requirements and individual styles, leading to diversity and complexity in design approaches. In this work, we adopt six main steps of the architectural design process for the literature review: 1) architectural preliminary 3D forms design, 2) architectural layout design, 3) architectural structural system design, 4) detailed and optimization design of architectural 3D forms, 5) architectural facade design, and 6) architectural imagery expression. After exploring the research papers from 2020 to 2023, we observed a significant increase in the number of research papers on architectural design using generative AI. The number of research papers using generative AI technology in the different architectural design steps reveals the development trends within each subfield, as illustrated in Figure 2(a). Most research is concentrated in the area of architectural plan design. Research in preliminary 3D form design of architecture and in architectural image expression has increased rapidly in the past two years. More research remains to be done by scholars on architectural structural system design, architectural 3D form refinement and optimization design, and architectural facade design.

This sustained growth trend distinctly demonstrates that generative AI in architectural design is expanding at an unprecedented rate, and it also reflects that the architectural design and computer science communities are paying a high level of attention to, and increasingly investing in, generative AI technologies. The most used generative AI techniques are illustrated in Figure 2(b). In computer science, many studies focus on GAN and VAE, while research on DDPM, LDM, and GPT is in its initial stages. The situation is the same in architecture.

1.1. Motivation

Leveraging recent generative AI models in architectural design could significantly improve design efficiency, provide architects with new design processes and ideas, expand the possibilities of architectural design, and revolutionize the entire design process. However, the use of advanced generative models in architectural design has not been explored extensively. The primary reasons hindering the use of advanced generative models in architectural design may have two aspects: professional barriers and the issue of training data.

In terms of professional barriers, deep learning and architectural design are highly specialized fields requiring extensive professional knowledge and experience. The aim of this study is to narrow the professional barriers between architecture and computer science, assist architectural designers in bridging generative AI technologies with applications, promote interdisciplinary research, and delineate future research directions. This review systematically analyzes and summarizes case studies and research outcomes of generative AI applications in architectural design, and showcases the possibilities and potential of the intersection between computer science and architecture. This interdisciplinary perspective encourages collaboration among experts from different fields to address complex issues in architectural design, thus advancing scientific research and technological innovation.

In terms of the issue of training data, deep learning models require high-quality training data to analyze and verify their generalization ability. However, data in the field of architecture is usually unstructured. The search for and organization of architectural training data pose a significant challenge, making model training difficult right from its initial stages. In addition, high-performance Graphics Processing Units (GPUs) are required to train deep learning models on millions of data samples, especially those dealing with complex images and datasets. The scarcity of high-performance GPUs and the difficulty of mastering GPU programming skills may prevent architects from exploring recent diffusion models and large foundation models.

1.2. Structure and Methodology

This article first introduces the development and application directions of generative AI models, then elaborates on the methods of applying generative AI in the architectural design process, and finally forecasts the potential application development of generative AI in the architectural field.

In Section 2, the article offers an in-depth introduction to the principles and evolution of various generative AI models, with a focus on Diffusion Models (DMs), 3D Generative Models, and Foundation Models. In Section 2.1, the article elaborates on the principles and development of Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs). In Section 2.2, the discourse on Diffusion Models elaborates on the working mechanisms and developmental trajectories of DDPM and LDM. In Section 2.3, the segment on 3D Generative Models zeroes in on 3D shape representation, encompassing voxels, point clouds, meshes, implicit functions, and occupancy fields. Within occupancy fields, the paper details Signed Distance Functions (SDF), Unsigned Distance Functions (UDF), and Neural Radiance Fields (NeRF), explaining their respective operational principles. In Section 2.4, the Foundation Models section comprehensively describes the progress and achievements of Large Language Models (LLMs) and Large Vision Models. In Section 2.5, the paper discusses the applications and developments of these models in image generation, video generation, and 3D model generation.

In Section 3, this paper delves into the application development of generative AI models in architectural design. Given the complexity of the architectural design process, this article delineates the architectural design process into six steps, as presented in the introduction. For each step, the article summarizes and discusses the current application methods of generative AI models in these six domains. By analyzing these research papers, the study demonstrates how generative AI can facilitate innovation in architectural design, improve design efficiency, and optimize architectural solutions. Throughout this summarization process, literature retrieval was conducted using databases such as Cumincad and Web of Science, supplemented by searches on Litmaps. To ensure the targeted and accurate nature of the search, specific search queries were set for each design process.

In Section 4, this article explores the potential applications of generative AI technology in generating architectural design images, architectural design videos, and architectural design 3D models, as well as in human-centric architectural design. In Section 4.1, it anticipates applications of architectural design image generation in generating floor plans, facade images, and architectural images. In Section 4.2, regarding architectural design video generation, it foresees applications such as generating videos from a single architectural image, generating videos from multiple architectural images, and style transfer for specific video content. In Section 4.3, regarding architectural design 3D model generation, it envisions possibilities in generating 3D models from images and text prompts, transferring styles to 3D models, and generating and editing detailed styles for 3D models. In Section 4.4, it elaborates on the potential of generative AI in enhancing the human-centric architectural design process.

Figure 2. Overview of generative AI applications in architectural design: statistics on research paper numbers and generative models.

Figure 3. The framework of GAN, VAE, and diffusion models (DM), where z is a compressed low-dimensional representation of the input.

2. Generative AI Models

Generative AI models are currently experiencing rapid development, with new methods continually emerging. The evolution of deep learning-based approaches, particularly Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and Diffusion Models (DMs), has significantly advanced and enhanced image generation techniques. VAEs played a pioneering role in deep learning-based generative models; they employ an encoder-decoder architecture integrated with probabilistic graphical models to learn latent representations for image generation [6]. GANs represent a milestone in the realm of image generation: with a generator and a discriminator, GANs engage in an adversarial training process that prompts the generator to produce images progressively resembling the distribution of real data [7, 8]. Moreover, diffusion models stand out as the most revolutionary technology to have emerged in recent years, with remarkable image generation quality [9, 10].

2.1. Generative Adversarial Networks

A Generative Adversarial Network (GAN) [11] comprises a generator G and a discriminator D, as illustrated in Figure 3. The generator G is responsible for generating samples G(z) from the noise z, while the discriminator D determines the authenticity of the generated samples G(z) against the ground-truth image x̄. Ideally,

    D(x̄) = 1,    D(G(z)) = 0.    (1)

This adversarial nature enables the model to maintain a dynamic equilibrium between generation and discrimination, propelling the learning and optimization of the entire system. Despite its advantages, GAN still faces challenges such as mode collapse during training.
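To make the adversarial objective of Equation (1) concrete, here is a minimal PyTorch training-loop sketch (our illustration, not code from any surveyed paper); the two tiny fully connected networks and the 784-dimensional flattened-image assumption are placeholders:

```python
import torch
import torch.nn as nn

# Minimal fully connected generator/discriminator, for illustration only.
G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real):                      # real: (batch, 784) flattened images
    batch = real.size(0)
    z = torch.randn(batch, 64)             # noise input z
    fake = G(z)

    # Discriminator update: push D(x_real) -> 1 and D(G(z)) -> 0, as in Eq. (1).
    loss_d = bce(D(real), torch.ones(batch, 1)) + \
             bce(D(fake.detach()), torch.zeros(batch, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator update: fool the discriminator so that D(G(z)) -> 1.
    loss_g = bce(D(fake), torch.ones(batch, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```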
Conditional GAN. Conditional image generation is an image generation technique that controls the generation process by introducing conditional information, so as to generate images that match given conditions such as text, labels, or hand-drawn sketches. Conditional image generation introduces additional input conditions, enabling the generator to produce images with specific properties based on the conditional information. To address the limited controllability of GAN models, the Conditional GAN (CGAN) [12] was introduced, which uses additional auxiliary information as a condition to fine-tune both the generator G and the discriminator D. The generator of a CGAN receives conditional information in addition to the random noise; by providing conditional information to the generator, a CGAN can control the generated results more precisely. Additionally, variants such as pix2pix [13] and StyleGAN [7] have been developed.

2.2. Diffusion Models

In image generation, diffusion models outperform GANs and VAEs [14, 15]. Most diffusion models currently in use are based on Denoising Diffusion Probabilistic Models (DDPM) [15], which simplify the diffusion model through variational inference. As shown in Figure 3, diffusion models contain both a forward diffusion process and a reverse denoising (inference) process. The forward process follows the concept of a Markov chain and turns the input image into Gaussian noise. Given a data sample x_0, Gaussian noise is progressively added to the data sample during T steps of the forward process, producing the noisy samples x_t, where the timestep t = {1, ..., T}. As t increases, the distinguishable features of x_0 gradually diminish; eventually, when T → ∞, x_T is equivalent to a Gaussian distribution with isotropic covariance. The inference process can be understood as a sequence of denoising autoencoders with shared weights ε_θ(x_t, t) (ε_θ is typically implemented as a U-Net [16]), which are trained to predict denoised versions of their corresponding inputs x_t.
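The forward process described above has a convenient closed form, q(x_t | x_0) = N(√ᾱ_t x_0, (1 − ᾱ_t)I) with ᾱ_t = ∏_{s≤t}(1 − β_s), which allows sampling x_t at any timestep in one shot. The sketch below is a minimal illustration of this standard DDPM formulation, assuming a common linear noise schedule:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)          # linear noise schedule (common default)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)       # abar_t = prod_{s<=t} (1 - beta_s)

def q_sample(x0: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    noise = torch.randn_like(x0)
    ab = alpha_bar[t].view(-1, *([1] * (x0.dim() - 1)))  # broadcast over image dims
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise

# As t grows, the recognizable content of x0 fades into Gaussian noise.
x0 = torch.randn(4, 3, 64, 64)                 # stand-in batch of images
x_noisy = q_sample(x0, torch.tensor([10, 100, 500, 999]))
```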
Latent Diffusion Model. Unlike DDPM, the Latent Diffusion Model (LDM) [9] does not operate directly on images but in a latent space, an approach called perceptual compression. LDM reduces the dimensionality of the data by projecting it into a low-dimensional, efficient latent space in which high-frequency, imperceptible details are abstracted away. The framework of LDM is illustrated in Figure 4. After the image x is compressed by the encoder E into the latent representation z, the diffusion process is performed in the latent representation space. LDM has a diffusion process similar to that of DDPM. Finally, LDM infers the data sample z from the noise z_T, and the decoder D restores z to the original pixel space, producing the result image x̃.

Specifically, given an image x ∈ R^{H×W×3} with height H and width W in RGB space, LDM first utilizes an encoder E to encode the image x into a latent representation space:

    z = E(x),    (2)

where z ∈ R^{h×w×c}, with height h, width w, and c the number of channels. The decoder D then recovers the image from the latent representation space:

    x̃ = D(z) = D(E(x)).    (3)

To accelerate the generation speed, the Latent Consistency Model (LCM) [17] was proposed to optimize the denoising inference steps.

Figure 4. The framework of the latent diffusion model proposed by Rombach et al. [9].

2.3. 3D Generative Models

In the field of three-dimensional shape modeling, implicit functions are commonly represented in three ways: the Occupancy Field, the Signed Distance Function (SDF), or the Unsigned Distance Function (UDF), alongside the recently emerging Neural Radiance Fields (NeRF).

3D Shape Representation. Representations in 3D vision problems can generally be divided into four categories: voxel-based, point cloud-based, mesh-based, and implicit representation-based.
Figure 5. Representation examples of 3D shapes from [24]: (a) voxel; (b) point cloud; (c) mesh; (d) implicit representation.

Figure 6. DeepSDF [23] representation applied to the Stanford Bunny: (a) depiction of the underlying implicit surface SDF = 0, trained on sampled points inside (SDF < 0) and outside (SDF > 0) the surface; (b) 2D cross-section of the signed distance field; (c) rendered 3D surface recovered from SDF = 0. Note that (b) and (c) are recovered via DeepSDF.

Voxel. As shown in Figure 5a, the voxel format describes a 3D object as a matrix of volume occupancy, where the size of the matrix is fixed. Researchers [18] adopted the voxel representation in the generation of 3D shapes. The voxel format requires high resolution to describe fine-grained details, so as the shape resolution increases, the computational cost explodes. The reconstruction results of voxel-based research are limited in resolution and provide neither topological guarantees nor representations of sharp features.

Point Cloud. As shown in Figure 5b, point clouds are a lightweight 3D representation composed of (x, y, z) coordinate values. Point clouds are a natural way to represent shapes. PointNet [19] extracts global shape features using max-pooling set operations, and it is widely used as an encoder for point-based generative networks [20]. However, point clouds do not represent topology and are unsuitable for generating watertight surfaces.

Mesh. As shown in Figure 5c, meshes are widely used and are constructed from vertices and faces. [21] deformed a predefined template to restrict a fixed topology using graph convolution. Recently, meshes have been used to represent shapes in deep learning techniques [22]. Although meshes are more suitable for describing the topological structure of objects, they usually require advanced preprocessing steps.

Implicit. As shown in Figure 5d, implicit representation refers to describing a surface via the zero-crossing of a volume function ψ: R³ → R, whose values can be adjusted: a 3D shape is represented as a level set of a deep network that maps 3D coordinates to a signed distance function [23] or an occupancy field [24]. Implicit representations can create lightweight, continuous shape representations with no resolution limits.

Occupancy Field. The Occupancy Field is one of the implicit function methods based on deep learning [24]. An occupancy field assigns a binary value to each point in three-dimensional space, determining whether the point is occupied by an object. This approach utilizes neural networks to learn the representation of occupancy fields, facilitating highly detailed three-dimensional reconstruction. The advantage of the occupancy field lies in its dynamic modeling of object occupancy in scenes, making it suitable for handling complex three-dimensional environments.

SDF. Building upon the Occupancy Field, the Signed Distance Function (SDF) has become a crucial direction in implicit function representation within deep learning. An SDF assigns a signed distance value to each point, indicating the shortest distance from the point to the object's surface: positive values signify points outside the object, while negative values indicate points inside the object. As shown in Figure 6, DeepSDF [23] provides an end-to-end approach for continuous SDF learning, enabling precise modeling of irregular shapes and local geometry.

UDF. The UDF and SDF are two distinct yet interrelated implicit function representation approaches. A UDF assigns an unsigned distance value to each point, representing the distance to the nearest surface without considering surface direction. The UDF is particularly useful for capturing intuitive surface-distance information without involving directional aspects. Zhao et al. [26] contribute significantly by jointly exploring the learning of both signed and unsigned distance functions. This approach aims to enrich the expressiveness of implicit functions, simultaneously capturing intricate details through both signed and unsigned distance information.
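To make the sign conventions concrete, the toy sketch below evaluates the analytic SDF and UDF of a sphere (an invented example; a learned model such as DeepSDF would replace the analytic function with a neural network conditioned on a shape code):

```python
import numpy as np

def sphere_sdf(points: np.ndarray, center=(0.0, 0.0, 0.0), radius=1.0) -> np.ndarray:
    """Analytic signed distance: negative inside, zero on the surface, positive outside."""
    return np.linalg.norm(points - np.asarray(center), axis=-1) - radius

def udf(points: np.ndarray) -> np.ndarray:
    """The unsigned distance simply drops the sign (inside/outside) information."""
    return np.abs(sphere_sdf(points))

pts = np.array([[0.0, 0.0, 0.0],    # center        -> SDF = -1.0 (inside)
                [1.0, 0.0, 0.0],    # on surface    -> SDF =  0.0
                [2.0, 0.0, 0.0]])   # outside       -> SDF = +1.0
print(sphere_sdf(pts))              # [-1.  0.  1.]
print(udf(pts))                     # [ 1.  0.  1.]
```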
NeRF. Neural Radiance Fields (NeRF) [25] have revolutionized the fields of computer vision and graphics by introducing a novel approach to scene representation, as shown in Figure 7. At the heart of NeRF lies the concept of representing a scene as a continuous function capturing radiance information at every point. The fundamental equation driving NeRF is the rendering equation, which mathematically formulates the observed radiance along a viewing ray:

    C(p) = ∫ T(p_t) · σ(p_t) · L(p_t, −d) dp_t,

where C(p) represents the observed color at point p, p_t represents points along the viewing ray, T(p_t) is the transmittance function, σ(p_t) represents the volume density, and L(p_t, −d) represents the emitted radiance. NeRF introduces an implicit representation, enabling the encoding of detailed and continuous volumetric information. This allows high-fidelity reconstruction and rendering of scenes with fine-scale structures, surpassing the limitations of explicit representations. Recently, 3D Gaussian Splatting [27] was introduced, projecting 3D information onto a 2D domain using Gaussian kernels, and achieved better performance than NeRF.

Figure 7. An overview of the NeRF scene representation and differentiable rendering procedure [25]: synthesizing images by sampling 5D coordinates (location and viewing direction) along camera rays (a), feeding those locations into an MLP to produce a color and volume density (b), and using volume rendering techniques to composite these values into an image (c), while minimizing the residual between synthesized and ground-truth observed images (d).
2.4. Foundation Models

In computer science, foundation models, also called large-scale models, are deep learning models with numerous parameters and intricate structures, used particularly in natural language processing and computer vision tasks. These models demand substantial computational resources for training but exhibit exceptional performance across diverse tasks. The evolution from basic neural networks to sophisticated diffusion models, as depicted in Figure 8, illustrates the continuous quest for more robust and adaptable AI systems.

2.4.1 Large Language Models (LLM)

Transformer. The Transformer model has achieved remarkable success in natural language processing (NLP). It consists of several components: an encoder, a decoder, positional encoding, and the final linear and softmax layers. Both the encoder and the decoder are composed of multiple identical layers, each containing attention layers and feedforward network layers. Additionally, positional encoding is used to inject positional information into the text embeddings, indicating the position of words within the sequence. Notably, the Transformer has paved the way for two prominent Transformer-based models: Bidirectional Encoder Representations from Transformers (BERT) [28] and the Generative Pre-trained Transformer (GPT) [29]. The main difference is that BERT is based on a bidirectional pre-training language model with fine-tuning, while GPT is based on an autoregressive pre-training language model with prompting.

GPT. GPT aims to pre-train models using large-scale unsupervised learning to facilitate the understanding and generation of natural language. The training process involves two primary stages: initially, a language model is trained in an unsupervised manner on extensive corpora without task-specific labels or annotations; subsequently, supervised fine-tuning occurs during the second stage, catering to specific application domains and tasks.

BERT. BERT has emerged as a breakthrough approach, achieving state-of-the-art performance across diverse language tasks. BERT's training methodology comprises two key stages: pre-training and fine-tuning. Pre-training involves the utilization of extensive text corpora to train the language model; its primary objective is to endow the BERT model with robust language understanding capabilities, enabling it to effectively tackle various natural language processing tasks. Subsequently, fine-tuning uses the pre-trained BERT model in conjunction with smaller labeled datasets to refine the model parameters. This process facilitates the customization of the model to specific tasks, thereby enhancing its suitability and performance for targeted applications.

In recent years, LLMs have witnessed explosive and rapid growth. Basic language models refer to models that are only pre-trained on large-scale text corpora, without any fine-tuning. Examples of such models include LaMDA [30] and OpenAI's GPT-3 [31].

Figure 8. The evolution of prominent large-scale models in computer science: a timeline from before 2014 through 2023 covering large language models (CLIP, ChatGPT (GPT-3.5), ChatGPT (GPT-4), ChatGLM, Gemini, ERNIE Bot), NLP methods (BERT, BART, T5, and GPT-1 through GPT-4), large visual models (CogView, DALL·E, GLIDE, DALL·E 2, Imagen, eDiff, DALL·E 3, Midjourney, Stable Diffusion V1/V2/XL, DreamBooth, DreamFusion, Magic3D, T2I-Adapter, IP-Adapter, VideoCrafter, SVD, PIKA), diffusion methods (DPM, DDPM, DDIM, LDM, LCM, classifier-free diffusion guidance, prompt-to-prompt, LoRA, ControlNet, MultiDiffusion), and other methods (RNN, CNN, FCN, GAN, VAE, GCN, pix2pix, CGAN, DCGAN, Transformer, StyleGAN, point cloud methods, SDF, UDF, NeRF).

2.4.2 Large Vision Models

In computer vision, pretrained vision-language models like CLIP [32] have demonstrated powerful zero-shot generalization performance across various downstream visual tasks. These models are typically trained on hundreds of millions to billions of image-text pairs collected from the web. In addition, some research efforts focus on large-scale base models conditioned on visual input prompts. For example, SAM [33] can perform category-agnostic segmentation from given images and visual prompts (such as boxes, points, or masks).

The current generative models based on the diffusion model present unprecedented understanding and creative capabilities. Stable Diffusion [9] uses the CLIP [32] text encoder and can adjust the model through text prompts; its diffusion process starts with random noise and gradually denoises until it generates a complete data sample. DALL·E 3 [34] utilizes the diffusion model with massive data to generate impressive results. Midjourney excels at adapting actual artistic styles to create images with any combination of effects the user desires.
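As an illustration of how such text-conditioned diffusion models are commonly invoked in practice, the following sketch uses the Hugging Face diffusers library; the checkpoint name and sampler settings are common public defaults and are our assumptions, not recommendations from the surveyed papers:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained latent diffusion pipeline (text encoder + U-Net + VAE decoder).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The prompt conditions the denoising process via the CLIP text encoder.
prompt = "a modern museum facade with curved concrete shells, golden hour"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("museum_facade.png")
```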
2.5. Applications of Generative AI

In this section, we introduce widely used applications of generative AI, including image generation (Section 2.5.1), video generation (Section 2.5.2), and 3D model generation (Section 2.5.3). Furthermore, we present results from the discussed models in Figure 9 as illustrative references.

2.5.1 Image Generation

Text-to-image. Synthesizing high-quality images from text descriptions is a challenging problem in computer vision; StackGAN [35] proposed a two-stage model to solve this issue. In the first stage, StackGAN generates the primitive shape and colors of the object based on the given text description, yielding initial low-resolution images. In the second stage, StackGAN takes the low-resolution result and the text prompts as inputs and generates high-resolution images with photo-realistic details; it can rectify defects in the results of the first stage and add exhaustive details during the refinement process. GLIDE [36] extends the core concepts of the diffusion model by adding additional text information to enhance the training process, ultimately generating text-conditioned images. On this basis, with the release of LDM [9], Stable Diffusion models built on LDM, using diffusion models and massive data, have also sprung up. These works cover areas such as image editing and more powerful 3D generation, further advancing image generation and making it closer to human needs.

Image-to-image. Image-to-image translation can convert the content in an image from one image domain to another, that is, perform cross-domain conversion between images.
Figure 9. Examples of generated results from generative AI models, showing input prompts or conditions and corresponding outputs for StackGAN, AttnGAN, GLIDE, DALL·E 3, Imagen, Stable Diffusion XL, Text2Mesh, DreamFusion, Pix2pix, ControlNet, MultiDiffusion, NeRF, DreamCraft3D, Magic123, Sg2im, SceneGenie, SketchyGAN, Voynov et al., Control3D, Layout2Im, He et al., and GLIGEN.

Sketch. The objective of sketch-to-image generation is to ensure that the generated image maintains consistency in both appearance and context with the provided hand-drawn sketch. Pix2Pix [13] stands out as a classic GAN model capable of handling diverse image translation tasks, including the transformation of sketches into fully realized images. In addition, SketchyGAN [37] focuses on the sketch-to-image generation task and aims to achieve more diversity and realism. Currently, ControlNet [38] can control diffusion models by adding extra conditions. Sketch-to-image generation has been applied in both photo-realistic and anime-cartoon styles [39, 40].

Layout. A layout typically encompasses details such as the position, size, and relative relationships of individual objects. Layout2Im [41] is designed to take a coarse spatial layout, consisting of bounding boxes and object categories, and generate a set of realistic images that accurately depict the specified objects in their intended locations. To enhance the global attention in context, He et al. [42] introduced the Context Feature Conversion Module to ensure that the generated feature encoding for objects remains aware of other coexisting objects in the scene. As for diffusion models, GLIGEN [43] facilitates grounded text-to-image generation in open worlds using prompts and bounding boxes as condition inputs.

Scene Graph. The scene graph was first proposed and utilized in 2018 [44]; it is used to enable explicit reasoning about objects and their relationships. Thereafter, Sortino et al. [45] proposed a model that can satisfy semantic constraints defined by a scene graph and model relations between visual objects in the scene by taking into account a user-provided partial rendering of the desired target. Currently, SceneGenie [46] combines scene graphs with advanced diffusion models to generate high-quality images, enforcing geometric constraints in the sampling process using the bounding box and segmentation information predicted from a scene graph.
8
2.5.2 Video Generation

Since text prompts generate only discrete tokens, text-to-video generation is more difficult than tasks such as image retrieval and image captioning. The Video Diffusion Model [47] is the first work to use the diffusion model for video generation tasks; it proposes a 3D U-Net that can be applied to variable sequence lengths, so it can be jointly trained on video and image modeling objectives, making it suitable for video generation tasks. Additionally, Make-A-Video [48] builds on a pre-trained text-to-image model and adds one-dimensional convolution and attention layers in the time dimension to transform it into a text-to-video model; by learning the connection between text and vision through the T2I model, single-modal video data is utilized to learn the generation of temporally dynamic content. Furthermore, the controllability and consistency of video generation models have also garnered increased attention from researchers. PIKA [49] has been proposed to support dynamic transformations of elements in the scene based on prompts, without causing the overall image to collapse. DynamiCrafter [50] utilizes pre-trained video diffusion priors to add animation effects to static images based on textual prompts; this tool supports high-resolution models, providing better dynamic effects, higher resolution, and stronger consistency.
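For illustration, a text-to-video diffusion pipeline can be driven much like its text-to-image counterpart. The sketch below uses the diffusers library with a public text-to-video checkpoint; the model name and settings are assumptions for demonstration, and the exact output format can vary across library versions:

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# A text-to-video diffusion model generates a short clip as a sequence of frames.
pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

prompt = "slow aerial orbit around a timber pavilion in a forest clearing"
frames = pipe(prompt, num_inference_steps=25, num_frames=16).frames[0]
export_to_video(frames, "pavilion_orbit.mp4", fps=8)
```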
2.5.3 3D Model Generation

Text-to-3D. Recent advancements in text-to-3D synthesis have demonstrated remarkable progress, with researchers employing various sophisticated strategies to bridge the gap between natural language descriptions and the creation of detailed 3D content. The pioneering work DreamFusion [51] harnesses a pre-trained 2D text-to-image diffusion model to generate 3D models without large-scale labeled 3D datasets or specialized denoising architectures. Magic3D [52] improves upon DreamFusion's [51] limitations by implementing a two-stage coarse-to-fine approach, accelerating the optimization process through a sparse 3D representation before refining it into high-resolution textured meshes via a differentiable renderer.
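The core mechanism behind DreamFusion-style text-to-3D is score distillation sampling (SDS): render the current 3D representation, noise the rendering, and use a frozen text-to-image diffusion model's noise prediction as the gradient signal. The following simplified PyTorch sketch is our illustration of that loop; nerf.render_random_view and diffusion.predict_noise are hypothetical placeholders, not the authors' code, and alpha_bar is a precomputed noise-schedule tensor:

```python
import torch

def sds_step(nerf, diffusion, text_emb, optimizer, alpha_bar):
    """One score distillation step: grad ~ eps_pred - eps, injected at the rendering.
    nerf.render_random_view() and diffusion.predict_noise() are hypothetical stand-ins."""
    img = nerf.render_random_view()                  # differentiable render, (1, 3, H, W)
    t = torch.randint(20, 980, (1,))                 # random diffusion timestep
    eps = torch.randn_like(img)
    ab = alpha_bar[t].view(-1, 1, 1, 1)
    noisy = ab.sqrt() * img + (1 - ab).sqrt() * eps  # forward-noise the rendering

    with torch.no_grad():                            # frozen 2D text-to-image prior
        eps_pred = diffusion.predict_noise(noisy, t, text_emb)

    # Skip the U-Net Jacobian (the SDS trick): treat (eps_pred - eps) as d(loss)/d(img).
    grad = (eps_pred - eps).detach()
    loss = (grad * img).sum()                        # so that d(loss)/d(img) == grad
    optimizer.zero_grad(); loss.backward(); optimizer.step()
```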
Image-to-3D. Recent 3D reconstruction techniques particularly focus on generating and reconstructing three-dimensional objects and scenes from a single image or a few images. NeRF [53] represents a state-of-the-art technique in which complex scene representations are modeled as continuous neural radiance fields optimized with sparse input views. CLIP-NeRF [54], leveraging the joint language-image embedding space of the CLIP model, proposes a unified framework that allows manipulating NeRF in a user-friendly way, using either a short text prompt or an exemplar image. DreamCraft3D [55] introduces a hierarchical process for 3D content creation that employs bootstrapped score distillation sampling from a view-dependent diffusion model; this two-step method refines textures through personalized diffusion models trained on augmented scene renderings, thereby delivering high-fidelity, coherent 3D objects. Meanwhile, Magic123 [56] offers a two-stage solution for generating high-quality textured 3D meshes from unposed wild images: it optimizes a neural radiance field for coarse geometry and fine-tunes details using differentiable mesh representations guided by both 2D and 3D diffusion priors.

3. Generative AI for Architectural Design

This study delineates architectural design into six main steps to facilitate a convenient understanding of the process and essence of architectural design. The output of each step is generated based on the project's objective conditions and the architect's subjective intentions. Objective conditions (O) include factors such as site area, building height restrictions, and construction standards that must be adhered to by all architects. Subjective intentions (S) refer to the individual architect's design concept, architectural style, and other subjective preferences. This study explores how generative AI can assist with preliminary design, layout design, structural design, 3D form design, facade design, and imagery expression based on objective conditions and subjective intentions. It also presents a statistical analysis of the generative AI models used in each architectural step and the tasks they accomplish.

3.1. Architectural Preliminary 3D Forms Design

To begin with, creating a preliminary 3D architecture model involves considering objective factors such as the building's type and function, site conditions, and surrounding environment, as well as subjective factors such as design concepts and morphological intentions. This process can be expressed by Equation (4):

    F_{P-3D} = { y_{P-3D} | y_{P-3D} ∈ ⋂_{i=1}^{4} f_{P-3D}(o^i_{P-3D}) ∩ f_{P-3D}(S_{P-3D}) },    (4)

where y_P-3D is a generated preliminary 3D model of the architecture and F_P-3D is the collection of all the options. O_P-3D refers to the objective conditions of the preliminary design, which include the design tasks (o^1_P-3D), such as building functions, building area, building height restrictions, and the number of occupants; the site conditions (o^2_P-3D), such as the red line of the site and the shape of its boundaries; the surroundings conditions (o^3_P-3D), such as nearby traffic arteries and neighboring buildings; and environmental performance (o^4_P-3D), such as daylighting and the wind and thermal environment. S_P-3D refers to the subjective intentions of the preliminary design.
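Equation (4) reads as a filter: a candidate form is admissible only if it satisfies every objective condition and the subjective intention. The toy sketch below makes that set intersection literal; all candidate forms and predicates are invented for illustration:

```python
# Hypothetical candidate forms and constraint predicates illustrating Eq. (4).
candidates = [
    {"name": "tapered block", "height_m": 28, "floor_area_m2": 5200, "daylight_ok": True},
    {"name": "twin towers",   "height_m": 65, "floor_area_m2": 8100, "daylight_ok": True},
    {"name": "courtyard bar", "height_m": 24, "floor_area_m2": 4800, "daylight_ok": False},
]

objective_conditions = [                        # o^1..o^4: every predicate must hold
    lambda y: y["height_m"] <= 30,              # height restriction (design task)
    lambda y: y["floor_area_m2"] >= 5000,       # required building area
    lambda y: y["daylight_ok"],                 # environmental performance
]
subjective_intention = lambda y: "tapered" in y["name"]   # S: the design concept

# F = { y | y satisfies the intersection of all f(o_i) and f(S) }
F = [y for y in candidates
     if all(f(y) for f in objective_conditions) and subjective_intention(y)]
print([y["name"] for y in F])                   # ['tapered block']
```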
To elucidate the specific architectural design process, this paper takes the Bo-DAA apartment project in Seoul, South Korea, as an example.
Figure 10. Architectural preliminary 3D forms design process.

Table 1. Application of generative AI in the preliminary 3D forms design of architecture.

    Data transformation approach          | Paper & methodology
    parameters to F_P-3D                  | VAE [57]; GAN, VAE [58]; 3D-DDPM [59]; GANs [60]; 3D-GAN, CPCGAN [61]
    classify F_P-3D                       | VAE [62]; 3D-AAE [63]
    S_P-3D text to F_P-3D                 | CVAE [64]
    R_P-3D sketch to F_P-3D               | VAE, GAN [65]
    R_P-3D 2d to R_P-3D 2d                | pix2pix [66, 67]; DCGAN [68]; pix2pix, CycleGAN [69, 70]
    (o^2_P-3D + o^3_P-3D) to F_P-3D       | pix2pix [71]; ESGAN [72]
    (S_P-3D + o^3_P-3D) to F_P-3D         | cGAN [73]; GAN [74]
    F_P-3D to F_P-3D                      | TreeGAN [75]; DDPM [76]
    (F_P-3D + o^2_P-3D) to o^4_P-3D       | VAE [77]; pix2pix, CycleGAN [78]

The project requirements include multiple residential units and shared public spaces, encompassing a communal workspace, lounge, shared kitchen, laundry room, and pet bathing area (o^1_P-3D). The site is a regular rectangle with flat terrain (o^2_P-3D), located in an urban setting surrounded by multi-story residential buildings (o^3_P-3D). To enhance resident comfort, the design considered lighting and views for each residential unit (o^4_P-3D).
Based on these requirements, the architect chose "Book House" as the design concept (S_P-3D), creating a preliminary 3D form (F_P-3D) that gradually tapers from bottom to top. This design provides excellent lighting and views for each residential unit level. The process is illustrated in Figure 10.

The applications of generative AI in this process fall into four main categories, as shown in Table 1: generating F_P-3D based on parameters, and classifying F_P-3D; generating F_P-3D based on 2D images or 1D text (usually from o^1_P-3D, o^2_P-3D, o^3_P-3D and S_P-3D, S_P-3D text, R_P-3D 2d, R_P-3D sketch); generating or redesigning F_P-3D based on 3D model data (usually from F_P-3D); and generating environmental performance evaluations (usually o^4_P-3D) based on 3D model data (usually from F_P-3D).

Firstly, generative AI can generate preliminary 3D forms based on input parameters or conduct classification analysis on preliminary 3D models. Initially, Variational Autoencoders (VAE) play a pivotal role in reconstructing and generating detailed 3D models (F_P-3D) from a set of input parameters (parameters to F_P-3D) [57]. Building upon this, Generative Adversarial Networks (GAN) further refine the process by training on the point coordinate data of 3D models, utilizing category parameters for more precise reconstructions (parameters to F_P-3D) [60], and this approach facilitates the creation of innovative architectural 3D forms through the technique of input interpolation (parameters to F_P-3D) [58]. Also, diffusion probability models offer a unique method of training on Taihu stone and architectural 3D models; this training enables the discovery of transitional forms between two distinct 3D models by employing interpolation as an input mechanism (parameters to F_P-3D) [59]. The Structure GAN model, focusing on point cloud data, enables the generation of 3D models based on specific input parameters such as length, width, and height (parameters to F_P-3D) [61]. In a further enhancement of the modeling process, VAE is also utilized for the in-depth training of 3D models (F_P-3D); this allows a comprehensive classification and analysis of the models' distribution within the latent space, paving the way for more nuanced model creation (classify F_P-3D) [62]. Generative AI techniques such as the 3D Adversarial Autoencoder model are employed for the training and generation of point cloud representations, facilitating the reconstruction and classification of architectural forms (classify F_P-3D) [63].

Secondly, 1D text data or 2D image data can serve as the generation conditions for generative AI to produce preliminary 3D forms. Variational Autoencoders (VAE) are also applied to train and generate 3D voxel models, guided by textual labels (S_P-3D text to F_P-3D) [64]. The integration of VAE and GAN models facilitates the generation of architectural 3D forms from sketches (S_P-3D sketch to F_P-3D) [65]. The difficulty of training on 3D data is higher than that of training on 2D image data. Facing the challenges associated with training neural networks on 3D forms, researchers have innovated by transforming 3D forms into 2D representations, such as grayscale images enriched with elevation data. This approach simplifies the training process, enhancing efficiency for architectural forms in specific regions and facilitating the generation of 3D models influenced by the surrounding environment (R_P-3D 2d to R_P-3D 2d) [67-70]. Moreover, the practice of converting 3D models into 2D images for reconstruction, followed by reverting these 2D images back to 3D forms, significantly reduces both training duration and cost while ensuring accurate restoration of the original 3D models (R_P-3D 2d to R_P-3D 2d) [66]. In other generative AI training strategies, researchers incorporate parameters such as the design site's scope (o^2_P-3D) and characteristics of the immediate environment (o^3_P-3D) as generative conditions; this enables the creation of preliminary 3D models that adhere to predefined rule settings ((o^2_P-3D + o^3_P-3D) to F_P-3D) [71, 72]. Furthermore, researchers can create architectural 3D models from design concept sketches (S_P-3D to F_P-3D) [73], and even from a single concept sketch in conjunction with environmental data ((S_P-3D + o^3_P-3D) to F_P-3D) [74].

Afterwards, 3D models can serve as the basis for generative AI creation, or designs can be reworked from 3D models that were themselves generated by generative AI. TreeGAN is used to train point cloud models of churches, leveraging these models for diverse redesign applications (F_P-3D to F_P-3D) [75]. Additionally, diffusion probability models are instrumental in training 3D models, introducing noise into 3D models to create novel forms (F_P-3D to F_P-3D) [76]. Lastly, generative AI is utilized to conduct site and architectural environmental performance evaluations based on 3D models; this involves generating images for assessments such as view analysis, sunlight exposure, and daylighting rates, among others ((F_P-3D + o^2_P-3D) to o^4_P-3D) [77, 78].

3.2. Architectural Plan Design

Architectural plan design, the second phase in the architectural design process, involves creating horizontal section views at specific site elevations. Guided by objective conditions and subjective decisions, this step includes arranging spatial elements like walls, windows, and doors into a 2D plan. This process can be expressed by Equation (5):

    F_{Plan} = { y_{Plan} | y_{Plan} ∈ ⋂_{i=1}^{3} f_{Plan}(o^i_{Plan}) ∩ ⋂_{j=1}^{2} f_{Plan}(s^j_{Plan}) }.    (5)
Figure 11. Architectural plan design process.

Here, y_Plan is a generated architectural plan design and F_Plan is the collection of all the options. O_Plan refers to the objective conditions of the architectural plan design, which include the preliminary architectural 3D form design (o^1_Plan), the result of the prior design phase; spatial requirements and standards (o^2_Plan), such as space area and quantity needs; and spatial environmental performance evaluations (o^3_Plan), such as room daylighting ratio, ventilation rate, etc. S_Plan refers to the subjective intentions of the architectural plan design, which include the functional space layout (s^1_Plan), indicating the size and layout of functional spaces, and spatial sequences (s^2_Plan), such as bubble diagrams and sequence schematics.

By accumulating the plan design results of each floor, the overall plan design outcome is obtained, represented as Equation (6):

    R_{Plan} = Σ_{i=1}^{n} F^i_{Plan}.    (6)

Using the Bo-DAA apartment project as an example, architects first create a preliminary 3D model (o^1_Plan) to outline each floor's plan based on the model's elevation contours. They then design functional spaces (s^1_Plan) according to spatial requirements (o^2_Plan), such as evacuation distances and space area needs, positioning public areas on the lower floors and residential units above. Spatial sequences (s^2_Plan) are structured using corridors and atriums to align with the layout. Environmental evaluations (o^3_Plan) are also conducted to ensure spatial performance. This leads to a comprehensive architectural plan (R_Plan) that meets all established constraints. The process is shown in Figure 11.

The applications of generative AI in plan design fall into four main categories, as shown in Table 2: generating the floor plan F_Plan based on 2D images (usually from o^1_Plan, o^3_Plan, s^1_Plan, s^2_Plan, and F_Plan); generating the functional space layout s^1_Plan based on 2D images (usually from s^1_Plan, s^2_Plan, o^1_Plan, o^2_Plan, o^2_P-3D, o^3_P-3D, o^4_P-3D); generating spatial sequences s^2_Plan based on 2D images (usually from R_Plan, o^1_Plan, o^3_Plan); and generating spatial environmental performance evaluations o^3_Plan based on 2D images (usually from F_Plan, s^1_Plan).
Table 2. Application of generative AI in the architectural plan design.

    Data transformation approach          | Paper & methodology
    o^1_Plan to s^1_Plan to F_Plan        | GANs [79-83]
    s^1_Plan to F_Plan                    | pix2pix [84]
    s^2_Plan to F_Plan                    | Graph2Plan [85]; pix2pix [86]; CycleGAN [87]
    F_Plan to F_Plan                      | pix2pix [88]; GANs [89]
    o^3_Plan to F_Plan                    | pix2pix [90]
    (s^1_Plan + o^4_P-3D) to s^1_Plan     | Genetic Algorithm, FCN [91]
    s^1_Plan to s^1_Plan                  | GNN, VAE [92]
    o^2_Plan to s^1_Plan                  | CoGAN [93]
    (s^1_Plan + o^3_P-3D) to s^1_Plan     | GANs [94-96]; CNN, pix2pixHD [94]
    (s^1_Plan + o^2_Plan) to s^1_Plan     | Transformer [97]
    s^2_Plan to s^1_Plan                  | GANs [98-105]; Transformer [106]; DM [107]
    o^2_P-3D to s^1_Plan                  | pix2pix [104, 108, 109]; GauGAN [110]
    o^1_Plan to s^1_Plan                  | GANs [111]; StyleGAN, Graph2Plan, RPLAN [112]; pix2pix [113]
    R_Plan to s^2_Plan                    | EdgeGAN [114]; cGAN [115]
    o^3_Plan to s^2_Plan                  | DCGAN [116]; VQ-VAE, GPT [117]
    o^1_Plan to s^2_Plan                  | DDPM [118]
    F_Plan to o^3_Plan                    | cGAN [119]
    s^1_Plan to o^3_Plan                  | pix2pix [120]; pix2pix [121]
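Several rows of Table 2 condition generation on spatial sequences (s^2_Plan), which are typically encoded as room-adjacency graphs (bubble diagrams). The sketch below shows one plausible encoding that graph-conditioned models such as Graph2Plan could consume; the room names, areas, and matrix format are our invented illustration, not the data format of any specific cited paper:

```python
# A bubble diagram as a room-adjacency graph: nodes are functional spaces with
# target areas; edges mark required adjacency (doors, openings, visual links).
bubble_diagram = {
    "nodes": {
        "lobby":     {"area_m2": 40, "floor": 1},
        "lounge":    {"area_m2": 65, "floor": 1},
        "workspace": {"area_m2": 80, "floor": 2},
        "unit_a":    {"area_m2": 30, "floor": 3},
    },
    "edges": [("lobby", "lounge"), ("lobby", "workspace"), ("workspace", "unit_a")],
}

def adjacency_matrix(diagram):
    """Convert the graph into the dense matrix form graph-conditioned models expect."""
    rooms = sorted(diagram["nodes"])
    index = {r: i for i, r in enumerate(rooms)}
    n = len(rooms)
    adj = [[0] * n for _ in range(n)]
    for a, b in diagram["edges"]:
        adj[index[a]][index[b]] = adj[index[b]][index[a]] = 1
    return rooms, adj

rooms, adj = adjacency_matrix(bubble_diagram)
print(rooms)   # ['lobby', 'lounge', 'unit_a', 'workspace']
print(adj)
```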

Firstly, in terms of generating architectural floor plans, researchers can create functional space layout diagrams from preliminary design range schematics or site range schematics and then generate the final architectural plan based on these layouts, progressing from o^1_Plan to s^1_Plan, and finally to F_Plan [79-83]. Architectural floor plans can also be generated directly from functional space layout diagrams (s^1_Plan to F_Plan) [84]. Additionally, researchers utilize generative models to convert planar spatial bubble diagrams or spatial sequence diagrams into latent spatial vectors, which are then used to generate architectural floor plans (s^2_Plan to F_Plan) [85-87]. Moreover, by utilizing GAN models, architectural floor plans can be further refined to obtain floor plans with flat furniture (F_Plan to F_Plan) [88]. Some processes of reconstruction and generation of floor plans are achieved through training on architectural floor plans (F_Plan to F_Plan) [89]. Floor plans can also be produced based on spatial environmental evaluations, such as lighting and wind conditions (o^3_Plan to F_Plan) [90].

Secondly, generative AI is not limited to producing architectural floor plans but also plays various roles in the generation of functional space layouts (s^1_Plan). For instance, it can utilize neural networks and genetic algorithms to enhance functional layouts based on wind environment performance evaluations ((s^1_Plan + o^4_P-3D) to s^1_Plan) [91]. Moreover, generative AI can reconstruct and produce matching functional layout diagrams based on the implicit information within functional space layout maps (s^1_Plan to s^1_Plan) [92]. Furthermore, it can generate viable functional layouts per spatial requirements (o^2_Plan to s^1_Plan) [93]. Additionally, it can predict and generate functional space layouts based on surrounding environmental performance evaluations ((s^1_Plan + o^3_P-3D) to s^1_Plan) [94-96, 122]. Similarly, generative AI possesses the ability to complement or augment incomplete functional space layouts based on specific demands ((s^1_Plan + o^2_Plan) to s^1_Plan) [97, 123], and it uses spatial sequences to generate functional space layout diagrams (s^2_Plan to s^1_Plan) [98-107]. In addition, it can generate functional space layout diagrams based on the designated red-line boundary of a design site (o^2_P-3D to s^1_Plan) [104, 108-110], and it skillfully uses plan design boundaries as conditions to generate functional space layout diagrams (o^1_Plan to s^1_Plan) [111-113].

Thirdly, generative AI demonstrates exceptional performance in the generation and prediction of spatial sequences (s^2_Plan). Specifically, it is capable of identifying and reconstructing wall layout sequences from floor plans (R_Plan to s^2_Plan) [114, 115]. Additionally, it can construct spatial sequence bubble diagrams directly from these floor plans (R_Plan to s^2_Plan) [124]. Moreover, generative AI can employ isovists for predicting spatial sequences (o^3_Plan to s^2_Plan) [116, 117]. Lastly, it is also capable of producing these diagrams conditioned on specific plan design boundary ranges (o^1_Plan to s^2_Plan) [118].

Lastly, generative AI can foresee spatial environmental performance evaluations from floor plans (F_Plan to o^3_Plan) [119], such as light exposure and isovist ranges. It can also predict indoor brightness [120] and daylight penetration [121] using functional space layout diagrams (s^1_Plan to o^3_Plan).
3.3. Architectural Structural System Design

The third phase in the architectural design process, architectural structural system design, involves architects developing the building's framework and support mechanisms. This process can be expressed by Equation (7):

    F_{str} = { y_{str} | y_{str} ∈ ⋂_{i=1}^{3} f_{str}(o^i_{str}) ∩ ⋂_{j=1}^{2} f_{str}(s^j_{str}) },    (7)

where y_str is a generated structural system and F_str is the collection of all the options. O_str refers to the objective conditions of the structural system design, which include the structural load distribution (o^1_str), referring to the schematic of the building's structural load distribution; the architectural plan design (o^2_str), the second step of the design process; and the preliminary 3D form of the building (o^3_str), the result of the first step of the design process. S_str refers to the subjective decisions of the structural system design, which include the structural materials (s^1_str), typically encompassing parameters characterizing the materials and texture images, and the structural aesthetic principles (s^2_str), usually involving conceptual diagrams and 3D models of the structural form. The design outcome y_str encapsulates various structural information, such as the structural load capacity (l_str), the structural dimensions (d_str), and the structural layout ((x_str, y_str)). This structural information is determined by a set of objective conditions (O_str) and a set of subjective decisions (S_str).

Using the Bo-DAA apartment project as an illustration, the architect utilized the preliminary 3D model (o^3_str) and the architectural plan (o^2_str) to define the building's spatial form and structural load distribution (o^1_str). Opting for a frame structure, reinforced concrete was chosen as the construction material (s^1_str), embodying modern Brutalism (s^2_str). This approach ensured that the final structure (R_str) adhered to both the aesthetic and the functional constraints. The process is represented in Figure 12.

Figure 12. Architectural structural system design process.

Table 3. Application of generative AI in architectural structural system design.

    Data transformation approach                            | Paper & methodology
    o^2_str to (x_str, y_str)                               | GANs [125-127]
    (l_str text + o^2_str) to (x_str, y_str)                | GANs [128-131]
    (x_str, y_str) to (x_str, y_str)                        | GANs [132]
    (s^1_Plan + o^2_str) to (x_str, y_str)                  | pix2pixHD [133]
    ((x_str, y_str) + d_str) to ((x_str, y_str) + d_str)    | StructGAN-KNWL [134]
form and structural load distribution (o1str ). Opting for a forms design of architectural (o2D-3D ), the result of the first
frame structure, reinforced concrete was chosen as the con- step in the design process; Architectural floor plan design
struction material (s1str ), embodying modern Brutalism (s2str ). (o3D-3D ), the outcome of the second step in the design pro-
This approach ensured that the final structure (Rstr ) adhered cess; Architectural structural system design (o4D-3D ), the re-
to both the aesthetic and the functional constraints.This pro- sult of the third step in the design process. SD-3D refers to
cess is represented in Figure 12. the subjective decisions of refined 3D model of the archi-
The applications of generative AI in structural system design primarily involve the prediction of structural layout ((xstr, ystr)) and structural dimensions (dstr).
In the realm of generating architectural structure layout images, generative AI is capable of recognizing architectural floor plans (o2str) and leveraging this recognition to generate detailed images of the structural layout (o2str to (xstr, ystr)) [125–127]. Moreover, this technology is adept at creating structural layout diagrams that correspond to floor plans based on specified structural load capacities ((lstr text + o2str) to (xstr, ystr)) [128–131]. Additionally, generative AI can refine and enhance existing structural layouts, optimizing the layout within the same structural space ((xstr, ystr) to (xstr, ystr)) [132]. Furthermore, generative AI can predict structural dimensions: it can forecast and create more appropriate structural sizes and layouts based on the layout and existing dimensions, thereby optimizing these dimensions (((xstr, ystr) + dstr) to ((xstr, ystr) + dstr)) [134], and it can also generate dimensions and layouts that meet load requirements based on the structural layout (xstr, ystr) and load capacity (lstr) (((xstr, ystr) + lstr) to ((xstr, ystr) + dstr)).
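Most of the layout-generation entries in Table 3 follow the paired image-to-image translation recipe popularized by pix2pix [13]. The following is a minimal, illustrative PyTorch sketch of that training pattern for the plan-to-layout task (o2str to (xstr, ystr)): a small encoder-decoder generator, a PatchGAN-style critic, and an adversarial plus L1 objective. It is a simplified stand-in rather than the exact architecture of any cited study, and the tensors are random placeholders for paired plan/layout images.

```python
# Illustrative pix2pix-style setup for plan-to-structural-layout translation.
# Not any cited paper's exact model; all data here are random placeholders.
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Encoder-decoder mapping a floor-plan image to a structural-layout image."""
    def __init__(self, in_ch=3, out_ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.ConvTranspose2d(64, out_ch, 4, 2, 1), nn.Tanh(),
        )
    def forward(self, plan):
        return self.net(plan)

class TinyDiscriminator(nn.Module):
    """PatchGAN-style critic on concatenated (plan, layout) pairs."""
    def __init__(self, in_ch=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 4, 1, 1),  # per-patch real/fake logits
        )
    def forward(self, plan, layout):
        return self.net(torch.cat([plan, layout], dim=1))

G, D = TinyGenerator(), TinyDiscriminator()
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

plan = torch.randn(1, 3, 256, 256)    # stand-in for a floor plan o2str
target = torch.randn(1, 3, 256, 256)  # stand-in for its paired structural layout

# Discriminator step: real pairs vs. generated pairs.
fake = G(plan)
real_logits, fake_logits = D(plan, target), D(plan, fake.detach())
d_loss = bce(real_logits, torch.ones_like(real_logits)) + \
         bce(fake_logits, torch.zeros_like(fake_logits))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: fool the critic while staying close to the paired target (L1).
g_logits = D(plan, fake)
g_loss = bce(g_logits, torch.ones_like(g_logits)) + 100.0 * l1(fake, target)
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```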

Figure 12. Architectural structural system design process.

Table 3. Application of Generative AI in Architectural Structural System Design

Data Transformation Approach                            Paper & Methodology
o2str to (xstr, ystr)                                   GANs [125–127]
(lstr text + o2str) to (xstr, ystr)                     GANs [128–131]
(xstr, ystr) to (xstr, ystr)                            GANs [132]
(s1Plan + o2str) to (xstr, ystr)                        pix2pixHD [133]
((xstr, ystr) + dstr) to ((xstr, ystr) + dstr)          StructGAN-KNWL [134]

3.4. Architectural 3D Forms Refinement and Optimization Design

The fourth phase of architectural design focuses on refining and optimizing 3D models to closely represent the building's characteristics based on the initial model. This step enhances detail and form, and the process can be expressed by Equation (8):

$$F_{D\text{-}3D} = \left\{ y_{D\text{-}3D} \;\middle|\; y_{D\text{-}3D} \in \bigcap_{i=1}^{4} f_{D\text{-}3D}(o_{D\text{-}3D}^{i}) \cap \bigcap_{j=1}^{2} f_{D\text{-}3D}(s_{D\text{-}3D}^{j}) \right\} \tag{8}$$

Where yD-3D is the generated refined 3D model of the architecture and FD-3D is the collection of all the options. OD-3D refers to the objective conditions of the refinement design, which include the requirements (o1D-3D) on the refinement of the architectural 3D form, given by indicators; the preliminary 3D forms design of the architecture (o2D-3D), the result of the first step in the design process; the architectural floor plan design (o3D-3D), the outcome of the second step in the design process; and the architectural structural system design (o4D-3D), the result of the third step in the design process. SD-3D refers to the subjective decisions on the refined 3D model of the architecture, which include aesthetic principles (s1D-3D), the principles used by architects to control the overall form and proportions of a building, and the design style (s2D-3D), a manifestation of a period's or region's specific characteristics and expression methods that can be reflected through elements such as the form, structure, materials, color, and decoration of a building.

Using the Bo-DAA apartment project as an example, the architectural form index (o1D-3D), including key metrics like floor area ratio and height, is first established. Next, the preliminary 3D form (o2D-3D) shapes a tapered volume. In the floor plan phase (o3D-3D), refinements such as a sixth-floor setback for public spaces are made. The structural system design (o4D-3D) guides these modifications within structural principles. Aesthetic principles (s1D-3D) and design styles (s2D-3D) are woven throughout, culminating in a refined 3D form (RD-3D) that harmonizes constraints with aesthetics. This process is illustrated in Figure 13.

The applications of generative AI in architectural 3D forms refinement and optimization design include two main categories, as shown in Table 4: using parameters or 1D text to generate FD-3D or to conduct classification analysis (usually from s2D-3D text), and generating FD-3D, represented by 2D images or 3D models, based on 2D images (usually from FD-3D 2d, o3D-3D, or s2Plan).
Figure 13. Architectural 3D forms refinement and optimization design process.

Table 4. Application of Generative AI in Architectural 3D Forms Refinement and Optimization Design.

Data Transformation Approach           Paper & Methodology
generate FD-3D                         3D-GAN [135]
classify FD-3D                         VAE [136]
parameters to FD-3D                    DCGAN, StyleGAN [137]; 3D-GAN [138]
s2D-3D text to FD-3D                   3D-GAN [139]
FD-3D 2d to FD-3D 2d                   StyleGAN, pix2pix [140]; pix2pix, CycleGAN [141]
o3D-3D to FD-3D                        StyleGAN [142]
(o1P-3D + s2Plan) to FD-3D             GCN [142]; cGAN, GNN [143]

In terms of using parameters or 1D text to generate refined architectural 3D models, researchers have trained voxel expression models to generate these refined models (generate to FD-3D) [135]. Additionally, generative AI has been employed to train Signed Distance Function (SDF) voxels, coupled with clustering analysis on shallow vector representations of the 3D models (FD-3D) [136]. Following this, 2D images containing 3D voxel information can be generated based on input RGB channel values (parameters to FD-3D 2d) [138]. Furthermore, new forms of 3D elements can be generated by interpolation (parameters to FD-3D) [137], and voxelized and point-cloud representations of 3D model components (FD-3D) can be trained and generated according to the textual labels of architectural components (s2D-3D text to FD-3D) [139].

In terms of using 2D images to generate refined architectural 3D models, researchers converted refined architectural 3D models (FD-3D) into sectional images and trained paired sectional diagrams with Generative Adversarial Networks (GANs) to learn the connections between adjacent sections. By inputting a single sectional image into the model to reconstruct a new sectional image, and then using the newly generated section as the next input, the iteration of this process completes the reconstruction of the 3D model (FD-3D 2d to FD-3D 2d) [140, 141].
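The section-by-section rollout described above reduces to a short loop. The sketch below assumes a hypothetical trained adjacent-section generator standing in for the GAN models of [140, 141], and stacks the generated sections into a pseudo-volume:

```python
# Minimal sketch of iterative section-to-section 3D reconstruction.
# `generator` is a hypothetical stand-in for a model trained on pairs of
# adjacent building sections; here a noisy identity map keeps the demo runnable.
import numpy as np

def reconstruct_volume(first_section: np.ndarray, generator, n_sections: int):
    """Roll out sections one by one, then stack them along the section axis."""
    sections = [first_section]
    for _ in range(n_sections - 1):
        next_section = generator(sections[-1])  # predict the neighbouring section
        sections.append(next_section)
    return np.stack(sections, axis=0)           # (n_sections, H, W) pseudo-volume

fake_generator = lambda img: np.clip(img + 0.01 * np.random.randn(*img.shape), 0, 1)
volume = reconstruct_volume(np.random.rand(64, 64), fake_generator, n_sections=32)
print(volume.shape)  # (32, 64, 64)
```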
Concurrently, generative AI can also generate refined 3D models from architectural floor plans (o3D-3D to FD-3D) [142], or based on spatial sequence matrices and spatial requirements ((o1P-3D + s2Plan) to FD-3D) [143, 144]. In an innovative approach, generative AI can learn the 3D models of architectural components (FD-3D) and combine them to create refined architectural 3D models; for instance, architectural 3D model components can be pixelated into 2D images for training.

3.5. Architectural Facade Design

The fifth step in architectural design focuses on facade design, aiming to create a building's external appearance so that it reflects the building's style and environmental compatibility and incorporates cultural and symbolic elements. This process can be expressed by Equation (9):

$$F_{Fac} = \left\{ y_{Fac} \;\middle|\; y_{Fac} \in \bigcap_{i=1}^{4} f_{Fac}(o_{Fac}^{i}) \cap \bigcap_{j=1}^{2} f_{Fac}(s_{Fac}^{j}) \right\} \tag{9}$$

Where yFac is the generated architectural facade and FFac is the collection of all the options. OFac refers to the objective conditions of the facade design, which include performance evaluation of the architectural facade (o1Fac), such as daylighting, heat insulation, and thermal retention; the architectural plan design (o2Fac), the result of the second step in the design process; the architectural structural system design (o3Fac), the outcome of the third step in the design process; and the architectural 3D forms refinement and optimization design (o4Fac), the result of the fourth step in the design process. SFac refers to the subjective decisions of the facade design, which include the facade component elements (s1Fac), referring to the specific facade component styles employed by the architect, reflecting the designer's style and concept, and the materials and style of the facade (s2Fac): different materials bring various textures and colors to the building, exhibiting unique architectural characteristics and styles.

Subsequently, the final facade design outcome can be achieved by summing up the facade design results from each direction, as expressed by Equation (10):

$$R_{Fac} = \sum_{i=1}^{4} F_{Fac}^{i} \tag{10}$$

Each direction's final facade design outcome yFac encapsulates various facade information, such as the area (aw) and position (pw) of the wall surface, the area (awin) and position (pwin) of the window surface, and the adoption of a specific style for the facade components (s1Fac). This information is derived from the set of objective conditions OFac and the set of subjective decisions SFac.

In the Bo-DAA apartment project, the architects used the architectural plan (o2Fac) to define windows and walls, incorporating glass curtain walls on the ground floor. The structural design (o3Fac) guided the structuring of the facade, ensuring alignment with the building's structure. The refined 3D model (o4Fac) influenced the facade's shape, with residential windows designed to complement the building's form. Facade performance was enhanced through simulations (o1Fac). Material selection (s2Fac) favored exposed concrete, echoing Brutalist aesthetics (s1Fac) and resulting in a minimalist, sculptural facade (RFac). This process is illustrated in Figure 14.

Figure 14. Architectural facade design process.

The applications of generative AI in architectural facade design include two main categories, as shown in Table 5: generating FFac based on 2D images (usually from s2Fac, FFac, or a semantic segmentation map of the facade), and generating a semantic segmentation map of the facade based on 2D images.

Table 5. Application of generative AI in architectural facade design.

Data Transformation Approach                                          Paper & Methodology
(aw + pw + awin + pwin + s1Fac) to FFac                               GANs [145–151]; DM, CycleGAN [152]
(aw + pw + awin + pwin + s1Fac) to RFac                               pix2pix [153]
FFac to FFac                                                          CycleGAN [154]; StyleGAN2 [155]
(FFac + s2Fac) to FFac                                                GANs [156]
(aw + pw + awin + pwin + s1Fac) to (aw + pw + awin + pwin + s1Fac)    GAN [157]

In generating architectural facades, generative AI first facilitates the generation of facade images from architectural facade semantic segmentation maps, which annotate the precise location and form of facade elements such as walls, window panes, and other components. This process generates facade images under the constraints of a given wall area (aw) and position (pw), window area (awin) and position (pwin), and component elements (s1Fac), represented as (aw + pw + awin + pwin + s1Fac) to FFac [145–152]. Furthermore, complete facade and roof images for all four directions of a building can be generated from semantic segmentation images ((aw + pw + awin + pwin + s1Fac) to RFac) [153]. Additionally, generative AI proves instrumental in training on architectural facade images for both reconstruction and novel generation (FFac to FFac) [154]. Its utility is further demonstrated in the application of style transfer to architectural facades, either by incorporating style images ((FFac + s2Fac) to FFac) [156, 158] or by facilitating style transfer between facade images of diverse architectural styles (FFac to FFac) [155].

In generating semantic segmentation maps for architectural facades, generative AI can be employed for the reconstruction and generation of facade segmentation maps, for example rebuilding the occluded parts of a segmentation map from its unobstructed parts ((aw + pw + awin + pwin + s1Fac) to (aw + pw + awin + pwin + s1Fac)) [157].

3.6. Architectural Image Expression

Architectural image expression synthesizes design elements into 2D images, reflecting the architect's vision and design process. This process can be expressed by Equation (11):

$$F_{Img} = \left\{ y_{Img} \;\middle|\; y_{Img} \in \bigcap_{i=1}^{4} f_{Img}(o_{Img}^{i}) \cap \bigcap_{j=1}^{2} f_{Img}(s_{Img}^{j}) \right\} \tag{11}$$

Where yImg is the generated architectural image and FImg is the collection of all the options. OImg refers to the objective conditions of the architectural image expression, which include the architectural plan design (o1Img), the result of the second step in the design process; the architectural structural system design (o2Img), the result of the third step in the design process; the refined 3D form of the architecture (o3Img), the result of the fourth step in the design process; and the architectural facade design (o4Img), the result of the fifth step in the design process. SImg refers to the subjective decisions of the architectural image expression, which include aesthetic principles (s1Img) and image style (s2Img), the principles architects use to control the composition and style of architectural images.
Table 6. Application of generative AI in architectural image expression process.

Data Transformation Approach          Paper & Methodology
parameter to FImg                     GANs [159]
s1Image text to FImg                  GANs [160–162]; DMs [4, 163–166]; GANs, DMs [5]
(s1Image text + FImg) to FImg         DMs [167–170]; GANs [171, 172]; GANs, DMs [173]; GANs, CLIP [174, 175]
(o3Img + FImg) to FImg                GANs [176]
s1Image mask to FImg                  GANs [177]; CycleGAN [178]
s2Img to FImg                         GANs [179–182]
FImg to FImg                          GANs [183–186]
s2Img to s2Img                        GAN [187]
FImg to s1Img                         VAE [188]

The applications of generative AI in architectural image expression include three main categories, as shown in Table 6: generating an architectural image FImg based on 1D text (usually from a parameter or s1Image text); generating an architectural image FImg based on 2D images (usually from FImg, o3Img, or s1Image mask); and generating different-style images or semantic images (s1Img, s2Img) based on 2D images (usually from s2Img or FImg).

In generating architectural images based on 1D text, researchers employ linear interpolation techniques to create architectural images from varying perspectives (parameter to FImg) [159]. Moreover, the direct generation of architectural images from textual prompts simplifies and streamlines the process (s1Image text to FImg) [4, 5, 161, 163–165]. This approach is also effective for generating architectural interior images, as demonstrated by the use of Stable Diffusion for interior renderings (s1Image text to FImg) [166].
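As a concrete illustration of the s1Image text to FImg pattern, the snippet below produces a rendering from a text prompt with an off-the-shelf latent diffusion checkpoint through the Hugging Face diffusers library. The checkpoint and prompt are illustrative choices, not those used in the cited studies:

```python
# Minimal text-to-image sketch with an off-the-shelf Stable Diffusion checkpoint.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = ("interior rendering of a minimalist concrete apartment living room, "
          "warm wood accents, soft daylight, photorealistic")
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("interior_render.png")
```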
In generating architectural images based on 2D images, several researchers have trained architectural images with paired textual prompts using generative AI models, facilitating the creation of architectural images from textual prompts ((s1Image text + FImg) to FImg) [160, 167–175]. Additionally, researchers utilize generative AI models to train architectural images with corresponding textual prompts, generating architectural images based on these prompts ((o3Img + FImg) to FImg) [162, 176]. Furthermore, the direct generation of architectural images can be achieved through image semantic labels (masks) or textual descriptions; generating architectural images from semantic labels offers precise control over the content of the generated images (s1Image mask to FImg) [177, 178]. Researchers have also explored the transformation of architectural images across different styles, such as generating architectural images from sketches or line drawings (s2Img to FImg) [179–182]. By leveraging generative AI models, architectural images can undergo style blending, where images are generated based on two input images, enhancing the versatility of architectural visualization (FImg to FImg) [186]. GAN models have been employed to generate comfortable underground-space renderings from virtual 3D space images (FImg to FImg) [183] and to create interior decoration images from 360-degree panoramic interior images (FImg to FImg) [184]. Moreover, using StyleGAN2 to generate architectural facade and floor plan images (FImg to FImg) [185] serves as a basis for establishing 3D architectural models.

In generating different-style images or semantic images based on 2D images, generative AI can be instrumental in the reconstruction and generation of architectural line drawings (s2Img to s2Img) [187], and it is capable of producing semantic style images that correspond to architectural images (FImg to s1Img) [188].

4. Future Research Directions

In this section, we illustrate potential future research directions for applying generative AI in architectural design using the latest emerging techniques of image, video, and 3D form generation (Section 2).

4.1. Architectural Design Image Generation

Floor Plan Generation  Researchers have applied various generative AI image generation techniques to the design and generation of architectural plan images. As the technology advances, architects can gradually incorporate more conditional constraints into the generation of floor plans, allowing generative AI to take over part of the architect's thought process. Architects can supply text data to the generative models; such data encompasses client design requirements and architectural design standards (o2Plan), such as building type, occupancy, spatial needs, dimensions of architectural spaces, evacuation route settings and dimensions, fire safety layout standards, etc.
Architects can also supply image data to the generative models, such as site plans (o2P-3D), which define the specific land use of architectural projects, nearby buildings and natural features (o3P-3D), as well as floor layout diagrams (s1Plan) or spatial sequence diagrams (s2Plan).
Based on the aforementioned method, several generative AI models hold developmental potential for architectural floor plan generation. A "Scene Graph" is a data structure, consisting of nodes and edges, capable of intricately describing the elements within a scene and their interrelations; this structure is particularly suited to depicting the connectivity within architectural floor plans. By integrating diffusion models, SceneGenie [46] can accurately generate architectural floor plans using scene graphs. Furthermore, technologies such as Stable Diffusion [9] and Imagen [189] allow further refinement of the floor plan generation process through text prompts and layout controls.
Figure 15. Existing generative models can generate layout diagrams of rooms based on input text (e.g., "Describe the floor plan layout of a house featuring an open-concept design with a seamless flow between the living room, kitchen, and dining area. The spacious master bedroom with ensuite bathroom is positioned for privacy, while additional bedrooms and a shared bathroom are strategically placed for convenience and comfort.") and can also be controlled accordingly based on input layouts.

As shown in Figure 15, existing generative models, such as Stable Diffusion [9] and Imagen [189], can generate complete architectural designs from textual input. However, the generated images often fail to meet professional standards and may not adhere to a rational layout or the designer's intentions. Nonetheless, with the advancement of conditional image generation, it is now possible to incorporate additional constraints, such as bounding boxes, to control the generation process of diffusion models. This integration holds promise for aligning generation with layout considerations in architectural design.
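A sketch of such bounding-box control using the GLIGEN [43] pipeline available in diffusers might look as follows. The checkpoint name, room phrases, and box coordinates are illustrative assumptions, and whether a general-purpose checkpoint yields professionally valid plans is exactly the open question raised above:

```python
# Hedged sketch of grounded (bounding-box) generation with GLIGEN via diffusers.
import torch
from diffusers import StableDiffusionGLIGENPipeline

pipe = StableDiffusionGLIGENPipeline.from_pretrained(
    "masterful/gligen-1-4-generation-text-box", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="architectural floor plan drawing of an open-concept house, top view",
    gligen_phrases=["living room", "kitchen", "master bedroom"],
    gligen_boxes=[[0.05, 0.40, 0.55, 0.95],   # normalized [x0, y0, x1, y1]
                  [0.55, 0.60, 0.95, 0.95],
                  [0.55, 0.05, 0.95, 0.45]],
    gligen_scheduled_sampling_beta=0.4,
    num_inference_steps=50,
).images[0]
image.save("floor_plan_layout.png")
```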
Elevation Generation  Generative AI has been applied to facade generation based on semantic segmentation, textual descriptions, and facade image style transfer, and these advancements have made the facade generation process more efficient. With the continued advancement of generative AI technology, researchers can develop more efficient and higher-quality facade generation models. For instance, architects can provide generative AI models with conditions such as facade sketches, facade mask segmentation images, and descriptive terms for facade generation. These conditions can help architects generate corresponding high-quality facade images, streamlining the facade design process and enhancing design efficiency.

Figure 16. Framework example for building floor plan and facade generation. (Panels: Floor Plans Generation: mask, layout, and parameters fed to a generative model to produce a result; Facade Generation: mask to result.)

The key to applying generative models to architectural design lies in integrating professional architectural data with computational data. As illustrated in Figure 16, layout and segmentation masks can often represent the facade information of a building in 2D image generation, and architectural constraints can serve as hyperparameter inputs that guide the image generation process of the generative model.

The various methods of generative AI image generation have also shown unique potential for creating architectural facade images, such as GLIGEN [43] and MultiDiffusion [190]. Moreover, with the development of generative AI technology, ControlNet [38] can precisely control the content generated by diffusion models by adding extra conditions. It is applicable to the style transfer of architectural facades and can enrich facade designs with detailed elements such as brick textures, window decorations, or door designs. ControlNet can also be used to adjust specific elements in facade drawings, for instance altering window shapes, door sizes, or facade colors, thereby enhancing the personalization and creativity of the design. Simultaneously, analyzing the style and characteristics of surrounding buildings ensures that a new facade design harmonizes with its environment, maintaining consistency in the scene.
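A minimal sketch of this kind of conditioned facade generation, pairing a public segmentation-conditioned ControlNet [38] checkpoint with Stable Diffusion through diffusers, is shown below; the facade mask file and prompt are hypothetical, and a facade-specific ControlNet would need to be trained for production use:

```python
# Segmentation-conditioned facade generation with ControlNet via diffusers.
# The mask file and prompt are hypothetical placeholders.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-seg", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

seg_map = load_image("facade_segmentation.png")  # wall/window/door mask
image = pipe(
    "street facade, exposed concrete, Brutalist style, brick texture details",
    image=seg_map, num_inference_steps=30,
).images[0]
image.save("facade_render.png")
```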
Architectural Image Generation  The text-to-image generation method is capable of producing creative architectural concept designs (FImg) from brief descriptions or a set of parameters (s1Image text, s2Image text). The image-to-image generation method enables the generation of architectural images possessing consistent features or styles. This offers the potential to explore architectural forms and spatial layouts yet to be conceived by human designers. Automatically generated architectural concepts can serve as design inspiration, helping designers break out of traditional mindsets and explore new design spaces. Simultaneously, diffusion probabilistic models can generate realistic architectural rendering images suitable for Virtual Reality (VR) or Augmented Reality (AR) applications, providing designers and clients with immersive design evaluation experiences. This advances higher-quality, interactive architectural visualization technologies, making the design review and modification process more intuitive and efficient.

Stable Diffusion [9], DALLE-3 [34], and GLIDE [36] have been applied extensively in the domain of architectural image generation, demonstrating robust capabilities in image synthesis. ControlNet [38], with its exceptional controllability, has increasingly been utilized by architects for architectural image generation and style transfer, substantially enriching design creativity and enhancing design efficiency. Similarly, GLIGEN [43] and SceneGenie [46] have shown potential in the control of image content, which also holds significant value in the generation and creation of architectural imagery.
4.2. Architectural Design Video Generation

Video Generation based on Architectural Images and Text Prompt  The application of generative AI-based video generation in architectural design has multiple development directions. Through generative AI technology, performance videos can be produced from a single architectural effect image (FImg) together with relevant textual descriptions (s1Image text, s2Image text). Future advancements include compiling multiple images of a structure from various angles to craft a continuous video narrative. Such an approach diversifies presentation techniques and streamlines the design process, yielding significant time and cost savings.

In the field of architectural video generation, Make-A-Video [48], DynamiCrafter [50], and PIKA [49] each showcase their strengths, bringing innovative presentation methods to the forefront. Make-A-Video transforms textual descriptions into detailed dynamic videos, enhancing the visual impact and augmenting audience engagement, enabling designers to effortlessly depict architectural transformations over time through text. DynamiCrafter employs text-prompted technology to infuse static images with dynamic elements, such as flowing water and drifting clouds, with high-resolution support ensuring the preservation of details and realism. PIKA, conversely, demonstrates unique advantages in dynamic scene transformation, supporting text-driven element changes and allowing designers to maintain scene integrity while presenting dynamic details, thereby offering a rich and dynamic visual experience.

With advancements in diffusion models, current generative models can produce high-quality effect videos. As shown in Figure 17, the first and second rows depict effect demonstration videos generated from input images using PIKA [49], where the buildings undergo minor movements and scaling while maintaining consistency with the surrounding environment. DynamiCrafter [50] can generate rotating buildings, as demonstrated in the third row, where the model predicts architectural styles from different angles and ensures consistent generation. From GANs to diffusion models, mature image-to-image style transfer models have been implemented; applying these models ensures that the generated videos exhibit the desired presentation effects, greatly expanding the application scenarios for video.
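PIKA is offered as a hosted product, so as a hedged illustration of the image-to-video workflow discussed here, the sketch below substitutes the open Stable Video Diffusion pipeline in diffusers, which animates a single architectural render (image conditioning only, without a text prompt); the checkpoint, paths, and parameters are illustrative:

```python
# Image-to-video sketch with Stable Video Diffusion via diffusers.
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import export_to_video, load_image

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16
).to("cuda")

render = load_image("architectural_render.png").resize((1024, 576))
frames = pipe(render, num_frames=25, motion_bucket_id=127).frames[0]
export_to_video(frames, "architectural_flythrough.mp4", fps=7)
```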
Style Transfer for Specific Video Content  Looking ahead, the application of generative AI for partial style transfer in architectural video content paves the way to new frontiers in architectural visual presentation. This technology enables designers to replicate an overall style and, more importantly, to select precisely which parts of the video should undergo style transformation. Deep learning-based neural style transfer algorithms have proven their efficacy in applying style transfer to image and video content. These algorithms achieve style transformation by learning specific features of a target style image and applying these features to the original content. This implies that distinct artistic styles or visual effects can be applied to selected portions of architectural videos. Local video style transfer opens up novel possibilities in the architectural domain, allowing designers and researchers to explore and present architectural and urban spaces in ways never before possible. By precisely controlling the scope of the style transfer application, unique visual effects can be created, enhancing architectural videos' expressiveness and communicative value.
Figure 17. Currently, generative models, such as PIKA [49] and DynamiCrafter [50], are capable of generating high-quality videos from images, supporting multi-angle rotation and style transfer. (Each row shows an input image and frames at t = 0 to 3 s; row prompts: "Screenshot for isometric strategy RPG video game, a medieval castle, dynamic Unreal Engine render, panoramic scale, mountains and river, high resolution, ultra detailed"; "Snowing in the forbidden city."; "Rotating view, small house."; "Firework display, cartoon style.")

PIKA [49] showcases significant advantages in style transfer applications for architectural video content, offering robust support for visual presentation and research within the architectural realm. This technology enables designers and researchers to perform precise and flexible style customization for architectural videos, facilitating style transfers tailored to specific video content. Notably, PIKA allows for the style transfer of specific elements or areas within a video instead of a uniform transformation across the entire content. This capability of localized style transfer enables the accentuation of certain architectural features or details, such as presenting a segment of classical architecture in a modern or abstract artistic style, thereby enhancing the video's appeal and expressiveness. Furthermore, PIKA excels in maintaining a video's coherence and visual consistency. By finely controlling the extent and scope of the style transfer, PIKA ensures that the video retains its original structure and narrative while integrating new artistic styles, resulting in an aesthetically pleasing and authentic final product. Additionally, PIKA's style transfer technology is not confined to traditional artistic styles but is also adaptable to various complex and innovative visual effects, providing a vast canvas for creative expression in architectural video content. Whether emulating the architectural style of a specific historical period or venturing into unprecedented visual effects, PIKA is equipped to support such endeavors.

4.3. Architectural 3D Forms Generation

3D Model Generation based on Architectural Images and Text Prompt  Generating 3D building forms using architectural images, such as site information (o2P-3D), or text prompts, such as design requirements (o1P-3D), as input can improve modeling efficiency.

In architectural 3D modeling, technologies such as DreamFusion [51], Magic3D [52], CLIP-NeRF [54], and DreamCraft3D [55] have emerged as revolutionary architectural design and visualization tools. They empower architects and designers to directly generate detailed, high-fidelity 3D architectural models from textual descriptions or 2D images, significantly expanding the possibilities for architectural creativity and enhancing work efficiency. Specifically, as shown in Figure 18, DreamFusion [51] and Magic3D [52] allow designers to swiftly create architectural 3D model prototypes from simple text descriptions, accelerating the transition from concept to visualization. Designers can easily modify the textual descriptions and employ these tools for iterative design, exploring various architectural styles and forms to optimize design schemes. Moreover, CLIP-NeRF [54] and DreamCraft3D [55] enable designers to extract 3D information from existing architectural images, facilitating the precise reconstruction of historical buildings or current sites for restoration, research, or further development. Additionally, designers can create unique visual effects in 3D models by transforming and fusing image styles, further enhancing the artistic appeal and attractiveness of architectural representations.
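At the core of DreamFusion [51] and its successors is score distillation sampling (SDS). The sketch below is a stripped-down illustration of one SDS update, with hypothetical stand-ins for the differentiable renderer, the frozen diffusion noise predictor, and the prompt embedding; it omits the timestep weighting and camera sampling details of the actual method:

```python
# Stripped-down sketch of one score distillation sampling (SDS) step:
# render the 3D scene from a random view, add noise, ask a frozen
# text-conditioned diffusion model to predict that noise, and push the
# residual (eps_pred - noise) back into the 3D parameters.
import torch

def sds_step(theta, optimizer, render, diffusion_eps, text_emb, alphas):
    cam = torch.randn(3)                      # random camera pose (toy)
    img = render(theta, cam)                  # differentiable rendering
    t = torch.randint(20, 980, (1,))          # random diffusion timestep
    a = alphas[t]
    noise = torch.randn_like(img)
    noisy = a.sqrt() * img + (1 - a).sqrt() * noise
    with torch.no_grad():                     # diffusion prior stays frozen
        eps_pred = diffusion_eps(noisy, t, text_emb)
    grad = eps_pred - noise                   # SDS gradient (weighting omitted)
    optimizer.zero_grad()
    img.backward(gradient=grad)               # inject grad through the renderer
    optimizer.step()

# Toy stand-ins so the step actually runs (an 8x8 "scene").
theta = torch.nn.Parameter(torch.zeros(1, 3, 8, 8))
render = lambda p, cam: torch.sigmoid(p)          # ignores the camera (toy)
diffusion_eps = lambda x, t, c: x - c             # fake frozen noise predictor
alphas = torch.linspace(0.999, 0.01, 1000)
opt = torch.optim.Adam([theta], lr=1e-2)
sds_step(theta, opt, render, diffusion_eps, torch.zeros(1, 3, 8, 8), alphas)
```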
Figure 18. Examples of 3D models generated based on textual prompts: (a) Castle, (b) Abbey, and (c) Chichen Itza are produced by DreamFusion [51], while (d) Castle, (e) Pisa tower, and (f) Opera house are generated using Magic3D [52].

Figure 19. Examples of 3D model style editing using GaussianEditor [191]: (a) "Turn the bear into a Grizzly bear"; (b) "Make it Autumn". The left column shows the original models; the right column presents the edited results.

Detail Style Generation and Editing for Architectural 3D Models  With advancements in generative AI for 3D model generation, generative AI can generate architectural 3D models with specific styles and textures from an input preliminary architectural 3D model and style decisions ((FP-3D + SP-3D) to FD-3D). If this technology enables the modification and editing of 3D models based on highly personalized design requirements and allows designers to make real-time adjustments, it would significantly enhance the efficiency of architectural creation and enrich the avenues for architectural design.

GaussianEditor [191] and Magic123 [56] demonstrate their applications and advantages in generating and editing detail styles for architectural 3D models by offering designers greater creative freedom and control over editing. As shown in Figure 19, GaussianEditor's Gaussian semantic tracing and Hierarchical Gaussian Splatting enable more precise and intuitive editing of architectural details. At the same time, Magic123's two-stage approach facilitates the transformation from complex real-world images to detailed 3D models, as shown in Figure 20. The development of these technologies heralds a future of architectural design and visualization characterized by a richer diversity and higher customization of 3D architectural models.

Figure 20. Examples of input images (left) converted into detailed 3D models (right, dragon and teapots) using Magic123 [56].
4.4. Human-Centric Architectural Design

As society evolves and technology advances, the challenges faced by architectural design become increasingly complex, requiring consideration of more factors. Traditional design methods demand extensive time from designers to meet requirements and adjust designs. Moreover, user needs are becoming more diverse, making it a significant issue to better reflect human requirements in design, which necessitates more intelligent tools. At the same time, the rapid development of AI technology, especially generative AI, offers the possibility of more intelligent architectural design. Based on these needs and visions, future generative AI will not only assist in the architectural design process but also, following human-centric design principles, receive multimodal inputs, including text, images, sound, etc., and through intelligent processing quickly understand design requirements and adjust design schemes, thereby generating designs that align with the architect's vision. Such architectural-design AI large models will
be similar to existing co-pilot models but with further enhanced functionality and intelligence.

Realizing this large model requires training the AI model on a vast amount of architectural design data and user feedback to enable it to understand complex design requirements. It also necessitates multimodal input processing, developing technologies capable of handling various types of inputs, such as text, images, and sound, to increase the model's application flexibility. In addition, developing intelligent interaction interfaces is essential; user-friendly interfaces allow architects to communicate intuitively with the AI model, state their needs, and receive feedback. Finally, the model should provide customized output designs, generating multiple design options based on the input requirements and data for architects to choose from and modify.
However, realizing this architectural-design AI large model faces numerous challenges: 1) data collection and processing: high-quality training data is critical to the performance of AI models, and efficiently collecting and processing a vast amount of architectural design data is a significant challenge; 2) the fusion of multimodal inputs: effectively integrating information from different modalities to improve the model's accuracy and application scope requires further technological breakthroughs, and another challenge is optimizing user interaction, since designing an interface that aligns with architects' habits and enables accessible communication with the AI model is crucial for the technology's implementation; 3) ensuring that AI-generated designs meet practical needs while being innovative and personalized is critical for technological development. By addressing these challenges, the future may see the realization of generative AI models that truly aid architectural design, improving design efficiency and quality and achieving human-centric architectural design optimization.

5. Conclusion

The field of generative models has witnessed unparalleled advancements, particularly in image generation, video generation, and 3D content creation. These developments span various applications, including text-to-image, image-to-image, text-to-3D, and image-to-3D transformations, demonstrating a significant leap in the capability to synthesize realistic, high-fidelity content from minimal inputs. The rapid advancement of generative models marks a transformative phase in artificial intelligence, where synthesizing realistic, diverse, and semantically consistent content across images, videos, and 3D models is becoming increasingly feasible. This progress paves new avenues for creative expression and lays the groundwork for future innovations in the digital architectural design process. As the field continues to evolve, further exploration of model efficiency, controllability, and domain-specific applications will be crucial to harnessing the full potential of generative AI models for a broad spectrum of architectural design.

In conclusion, the integration of generative AI into architectural design represents a significant leap forward in the realm of digital architecture. This advanced technology has shown exceptional capability in generating high-quality, high-resolution images and designs, offering innovative ideas and enhancing the creative process across various facets of architectural design. As we look to the future, it is clear that the continued exploration and integration of generative AI in architectural design will play a pivotal role in shaping the next generation of digital architecture. This technological evolution not only simplifies and accelerates the design process but also opens up new avenues for creativity, enabling architects to push the boundaries of traditional design and explore new, innovative design spaces.

References

[1] Matias Del Campo, Sandra Manninger, M Sanche, and L Wang. The church of ai—an examination of architecture in a posthuman design ecology. In Intelligent & Informed—Proceedings of the 24th CAADRIA Conference, Victoria University of Wellington, Wellington, New Zealand, pages 15–18, 2019.
[2] Frederick Chando Kim, Mikhael Johanes, and Jeffrey Huang. Text2form diffusion: Framework for learning curated architectural vocabulary. In 41st Conference on Education and Research in Computer Aided Architectural Design in Europe, eCAADe 2023, pages 79–88. Education and Research in Computer Aided Architectural Design in Europe, 2023.
[3] Xinwei Zhuang, CA Design, ED Phase, C Generative, and AN Network. Rendering sketches. eCAADe 2022, 1:517–521, 1973.
[4] Daniel Koehler. More than anything: Advocating for synthetic architectures within large-scale language-image models. International Journal of Architectural Computing, page 14780771231170455, 2023.
[5] Mathias Bank Stigsen, Alexandra Moisi, Shervin Rasoulzadeh, Kristina Schinegger, and Stefan Rutzinger. Ai diffusion as design vocabulary - investigating the use of ai image generation in early architectural design and education. In Digital Design Reconsidered - Proceedings of the 41st Conference on Education and Research in Computer Aided Architectural Design in Europe (eCAADe 2023), pages 587–596, 2023.
[6] Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Rezende, and Daan Wierstra. Draw: A recurrent neural network for image generation. In International conference on machine learning, pages 1462–1471. PMLR, 2015.
[7] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4401–4410, 2019.
[8] Ran Yi, Yong-Jin Liu, Yu-Kun Lai, and Paul L Rosin. Apdrawinggan: Generating artistic portrait drawings from face photos with hierarchical gans. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10743–10752, 2019.
[9] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.
[10] Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. Zero-shot text-to-image generation. In International Conference on Machine Learning, pages 8821–8831. PMLR, 2021.
[11] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. Advances in neural information processing systems, 27, 2014.
[12] Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
[13] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1125–1134, 2017.
[14] Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. Advances in neural information processing systems, 32, 2019.
[15] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
[16] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, pages 234–241. Springer, 2015.
[17] Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, and Hang Zhao. Latent consistency models: Synthesizing high-resolution images with few-step inference. arXiv preprint arXiv:2310.04378, 2023.
[18] Jiajun Wu, Chengkai Zhang, Xiuming Zhang, Zhoutong Zhang, William T Freeman, and Joshua B Tenenbaum. Learning shape priors for single-view 3d completion and reconstruction. In Proceedings of the European Conference on Computer Vision (ECCV), pages 646–662, 2018.
[19] Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 652–660, 2017.
[20] Andrey Voynov and Artem Babenko. Unsupervised discovery of interpretable directions in the gan latent space. In International conference on machine learning, pages 9786–9796. PMLR, 2020.
[21] Nanyang Wang, Yinda Zhang, Zhuwen Li, Yanwei Fu, Wei Liu, and Yu-Gang Jiang. Pixel2mesh: Generating 3d mesh models from single rgb images. In Proceedings of the European conference on computer vision (ECCV), pages 52–67, 2018.
[22] Charlie Nash, Yaroslav Ganin, SM Ali Eslami, and Peter Battaglia. Polygen: An autoregressive generative model of 3d meshes. In International conference on machine learning, pages 7220–7229. PMLR, 2020.
[23] Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. Deepsdf: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
[24] Lars Mescheder, Michael Oechsle, Michael Niemeyer, Sebastian Nowozin, and Andreas Geiger. Occupancy networks: Learning 3d reconstruction in function space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
[25] Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
[26] Fang Zhao, Wenhao Wang, Shengcai Liao, and Ling Shao. Learning anchored unsigned distance functions with gradient direction alignment for single-view garment reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12674–12683, 2021.
[27] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4), July 2023.
[28] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
[29] Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, et al. Improving language understanding by generative pre-training. 2018.
[30] Romal Thoppilan, Daniel De Freitas, Jamie Hall, Noam Shazeer, Apoorv Kulshreshtha, Heng-Tze Cheng, Alicia Jin, Taylor Bos, Leslie Baker, Yu Du, et al. Lamda: Language models for dialog applications. arXiv preprint arXiv:2201.08239, 2022.
[31] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.
[32] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021.
[33] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. arXiv preprint arXiv:2304.02643, 2023.
[34] Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125, 2022.
[35] Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, and Dimitris N Metaxas. Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks. In Proceedings of the IEEE international conference on computer vision, pages 5907–5915, 2017.
[36] Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741, 2021.
[37] Wengling Chen and James Hays. Sketchygan: Towards diverse and realistic sketch to image synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9416–9425, 2018.
[38] Lvmin Zhang and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. arXiv preprint arXiv:2302.05543, 2023.
[39] Yichen Peng, Chunqi Zhao, Haoran Xie, Tsukasa Fukusato, and Kazunori Miyata. Difffacesketch: High-fidelity face image synthesis with sketch-guided latent diffusion model. arXiv preprint arXiv:2302.06908, 2023.
[40] Zhengyu Huang, Haoran Xie, Tsukasa Fukusato, and Kazunori Miyata. Anifacedrawing: Anime portrait exploration during your sketching. In ACM SIGGRAPH 2023 Conference Proceedings. Association for Computing Machinery, 2023.
[41] Bo Zhao, Lili Meng, Weidong Yin, and Leonid Sigal. Image generation from layout. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
[42] Sen He, Wentong Liao, Michael Ying Yang, Yongxin Yang, Yi-Zhe Song, Bodo Rosenhahn, and Tao Xiang. Context-aware layout to image generation with enhanced object appearance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15049–15058, June 2021.
[43] Yuheng Li, Haotian Liu, Qingyang Wu, Fangzhou Mu, Jianwei Yang, Jianfeng Gao, Chunyuan Li, and Yong Jae Lee. Gligen: Open-set grounded text-to-image generation. arXiv preprint arXiv:2301.07093, 2023.
[44] Justin Johnson, Agrim Gupta, and Li Fei-Fei. Image generation from scene graphs. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1219–1228, 2018.
[45] Renato Sortino, Simone Palazzo, and Concetto Spampinato. Transforming image generation from scene graphs. In 2022 26th International Conference on Pattern Recognition (ICPR), pages 4118–4124. IEEE, 2022.
[46] Azade Farshad, Yousef Yeganeh, Yu Chi, Chengzhi Shen, Björn Ommer, and Nassir Navab. Scenegenie: Scene graph guided diffusion models for image synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 88–98, 2023.
[47] Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, and David J Fleet. Video diffusion models. arXiv:2204.03458, 2022.
[48] Uriel Singer, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu, Harry Yang, Oron Ashual, Oran Gafni, et al. Make-a-video: Text-to-video generation without text-video data. arXiv preprint arXiv:2209.14792, 2022.
[49] Leijie Wang, Nicolas Vincent, Julija Rukanskaitė, and Amy X Zhang. Pika: Empowering non-programmers to author executable governance policies in online communities. arXiv preprint arXiv:2310.04329, 2023.
[50] Jinbo Xing, Menghan Xia, Yong Zhang, Haoxin Chen, Xintao Wang, Tien-Tsin Wong, and Ying Shan. Dynamicrafter: Animating open-domain images with video diffusion priors. arXiv preprint arXiv:2310.12190, 2023.
[51] Ben Poole, Ajay Jain, Jonathan T. Barron, and Ben Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion. In The Eleventh International Conference on Learning Representations, 2023.
[52] Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, and Tsung-Yi Lin. Magic3d: High-resolution text-to-3d content creation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 300–309, June 2023.
[53] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. In ECCV, 2020.
[54] Can Wang, Menglei Chai, Mingming He, Dongdong Chen, and Jing Liao. Clip-nerf: Text-and-image driven manipulation of neural radiance fields. arXiv preprint arXiv:2112.05139, 2021.
[55] Jingxiang Sun, Bo Zhang, Ruizhi Shao, Lizhen Wang, Wen Liu, Zhenda Xie, and Yebin Liu. Dreamcraft3d: Hierarchical 3d generation with bootstrapped diffusion prior. arXiv preprint arXiv:2310.16818, 2023.
[56] Guocheng Qian, Jinjie Mai, Abdullah Hamdi, Jian Ren, Aliaksandr Siarohin, Bing Li, Hsin-Ying Lee, Ivan Skorokhodov, Peter Wonka, Sergey Tulyakov, and Bernard Ghanem. Magic123: One image to high-quality 3d object generation using both 2d and 3d diffusion priors. In The Twelfth International Conference on Learning Representations (ICLR), 2024.
[57] Jaime de Miguel Rodríguez, Maria Eugenia Villafañe, Luka Piškorec, and Fernando Sancho Caparrini. Generation of geometric interpolations of building types with deep variational autoencoders. Design Science, 6:e34, 2020.
[58] Xinwei Zhuang, Yi Ju, Allen Yang, and Luisa Caldas. Synthesis and generation for 3d architecture volume with generative modeling. International Journal of Architectural Computing, page 14780771231168233, 2023.
[59] Yubo Liu, Han Li, Qiaoming Deng, and Kai Hu. Diffusion probabilistic model assisted 3d form finding and design latent space exploration: A case study for taihu stone spacial transformation. In The International Conference on Computational Design and Robotic Fabrication, pages 11–23. Springer, 2023.
[60] Hao Zheng and Philip F Yuan. A generative architectural and urban design method through artificial neural networks. Building and Environment, 205:108178, 2021.
[61] Panagiota Pouliou, Anca-Simona Horvath, and George Palamas. Speculative hybrids: Investigating the generation of conceptual architectural forms through the use of 3d generative adversarial networks. International Journal of Architectural Computing, page 14780771231168229, 2023.
[62] Tomas Cabezon Pedroso, Jinmo Rhee, and Daragh Byrne. Feature space exploration as an alternative for design space exploration beyond the parametric space. arXiv preprint arXiv:2301.11416, 2023.
[63] Frederick Chando Kim and Jeffrey Huang. Towards a machine understanding of architectural form.
[64] Adam Sebestyen, Urs Hirschberg, and Shervin Rasoulzadeh. Using deep learning to generate design spaces for architecture. International Journal of Architectural Computing, page 14780771231168232, 2023.
[65] Alberto Tono, Heyaojing Huang, Ashwin Agrawal, and Martin Fischer. Vitruvio: 3d building meshes via single perspective sketches. arXiv preprint arXiv:2210.13634, 2022.
[66] Qiaoming Deng, Xiaofeng Li, Yubo Liu, and Kai Hu. Exploration of three-dimensional spatial learning approach based on machine learning–taking taihu stone as an example. Architectural Intelligence, 2(1):5, 2023.
[67] Raffaele Di Carlo, Divyae Mittal, and Ondrej Veselý. Generating 3d building volumes for a given urban context using pix2pix gan. Legal Depot D/2022/14982/02, page 287, 2022.
[68] Steven Jige Quan. Urban-gan: An artificial intelligence-aided computation system for plural urban design. Environment and Planning B: Urban Analytics and City Science, 49(9):2500–2515, 2022.
[69] Shiqi Zhou, Yuankai Wang, Weiyi Jia, Mo Wang, Yuwei Wu, Renlu Qiao, and Zhiqiang Wu. Automatic responsive-generation of 3d urban morphology coupled with local climate zones using generative adversarial network. Building and Environment, 245:110855, 2023.
[70] Jingyi Li, Fang Guo, and Hong Chen. A study on urban block design strategies for improving pedestrian-level wind conditions: Cfd-based optimization and generative adversarial networks. Energy and Buildings, page 113863, 2023.
[71] Ondrej Veselý. Building massing generation using gan trained on dutch 3d city models. 2022.
[72] Feifeng Jiang, Jun Ma, Christopher John Webster, Xiao Li, and Vincent JL Gan. Building layout generation using site-embedded gan model. Automation in Construction, 151:104888, 2023.
[73] Diego Navarro-Mateu, Oriol Carrasco, and Pedro Cortes Nieves. Color-patterns to architecture conversion through conditional generative adversarial networks. Biomimetics, 6(1):16, 2021.
[74] Suzi Kim, Dodam Kim, and Sunghee Choi. Citycraft: 3d virtual city creation from a single image. The Visual Computer, 36:911–924, 2020.
[75] Dong Wook Shu, Sung Woo Park, and Junseok Kwon. 3d point cloud generative adversarial network based on tree structured graph convolutions. In Proceedings of the IEEE/CVF international conference on computer vision, pages 3859–3868, 2019.
[76] Adam Sebestyen, Ozan Özdenizci, Robert Legenstein, and Urs Hirschberg. Generating conceptual architectural 3d geometries with denoising diffusion models. In 41st Conference on Education and Research in Computer Aided Architectural Design in Europe-Digital Design Reconsidered: eCAADe 2023, 2023.
[77] Spyridon Ampanavos and Ali Malkawi. Early-phase performance-driven design using generative models. In International Conference on Computer-Aided Architectural Design Futures, pages 87–106. Springer, 2021.
[78] Chenyu Huang, Gengjia Zhang, Jiawei Yao, Xiaoxin Wang, John Kaiser Calautit, Cairong Zhao, Na An, and Xi Peng. Accelerated environmental performance-driven urban design with generative adversarial network. Building and Environment, 224:109575, 2022.
[79] Stanislas Chaillou. Archigan: Artificial intelligence x architecture. In Architectural Intelligence: Selected Papers from the 1st International Conference on Computational Design and Robotic Fabrication (CDRF 2019), pages 117–127. Springer, 2020.
[80] Xiaoni Gao, Xiangmin Guo, and Tiantian Lo. M-strugan: An automatic 2d-plan generation system under mixed structural constraints for homestays. Sustainability, 15(9):7126, 2023.
[81] Xiao Min, Liang Zheng, and Yile Chen. The floor plan design method of exhibition halls in cgan-assisted museum architecture. Buildings, 13(3):756, 2023.
[82] Da Wan, Xiaoyu Zhao, Wanmei Lu, Pengbo Li, Xinyu Shi, and Hiroatsu Fukuda. A deep learning approach toward energy-effective residential building floor plan generation. Sustainability, 14(13):8074, 2022.
[83] Ran Chen, Jing Zhao, Xueqi Yao, Sijia Jiang, Yingting He, Bei Bao, Xiaomin Luo, Shuhan Xu, and Chenxi Wang. Generative design of outdoor green spaces based on generative adversarial networks. Buildings, 13(4):1083, 2023.

[84] Yubo Liu, Yangting Lai, Jianyong Chen, Lingyu Liang, and Qiaoming Deng. Scut-autoalp: A diverse benchmark dataset for automatic architectural layout parsing. IEICE Transactions on Information and Systems, 103(12):2725–2729, 2020.

[85] Ruizhen Hu, Zeyu Huang, Yuhan Tang, Oliver Van Kaick, Hao Zhang, and Hui Huang. Graph2plan: Learning floorplan generation from layout graphs. ACM Transactions on Graphics (TOG), 39(4):118–1, 2020.

[86] Christina Doumpioti and Jeffrey Huang. Intensive differences in spatial design. In 39th eCAADe Conference in Novi Sad, Serbia, pages 9–16, 2021.

[87] Merve Akdoğan and Özgün Balaban. Plan generation with generative adversarial networks: Haeckel’s drawings to palladian plans. Journal of Computational Design, 3(1):135–154, 2022.

[88] Ilker Karadag, Orkan Zeynel Güzelci, and Sema Alaçam. Edu-ai: a twofold machine learning model to support classroom layout generation. Construction Innovation, 23(4):898–914, 2023.

[89] Can Uzun, Meryem Birgül Çolakoğlu, and Arda İnceoğlu. Gan as a generative architectural plan layout tool: A case study for training dcgan with palladian plans and evaluation of dcgan outputs. 17:185–198, 2020.

[90] Sheng-Yang Huang, Enriqueta Llabres-Valls, Aiman Tabony, and Luis Carlos Castillo. Damascus house: Exploring the connectionist embodiment of the islamic environmental intelligence by design. In eCAADe proceedings, volume 1, pages 871–880. eCAADe, 2023.

[91] XY Ying, XY Qin, JH Chen, and J Gao. Generating residential layout based on ai in the view of wind environment. In Journal of Physics: Conference Series, volume 2069, page 012061. IOP Publishing, 2021.

[92] Linning Xu, Yuanbo Xiangli, Anyi Rao, Nanxuan Zhao, Bo Dai, Ziwei Liu, and Dahua Lin. Blockplanner: city block generation with vectorized graph representation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5077–5086, 2021.

[93] Pedram Ghannad and Yong-Cheol Lee. Automated modular housing design using a module configuration algorithm and a coupled generative adversarial network (cogan). Automation in Construction, 139:104234, 2022.

[94] Shuyi Huang and Hao Zheng. Morphological regeneration of the industrial waterfront based on machine learning. In 27th International Conference of the Association for Computer-Aided Architectural Design Research in Asia (CAADRIA 2022), pages 475–484. The Association for Computer-Aided Architectural Design Research in Asia, 2022.

[95] Lehao Yang, Long Li, Qihao Chen, Jiling Zhang, Tian Feng, and Wei Zhang. Street layout design via conditional adversarial learning. arXiv preprint arXiv:2305.08186, 2023.

[96] Hanan Tanasra, Tamar Rott Shaham, Tomer Michaeli, Guy Austern, and Shany Barath. Automation in interior space planning: Utilizing conditional generative adversarial network models to create furniture layouts. Buildings, 13(7):1793, 2023.

[97] Sepidehsadat Hosseini and Yasutaka Furukawa. Floorplan restoration by structure hallucinating transformer cascades.

[98] Nelson Nauata, Kai-Hung Chang, Chin-Yi Cheng, Greg Mori, and Yasutaka Furukawa. House-gan: Relational generative adversarial networks for graph-constrained house layout generation. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I 16, pages 162–177. Springer, 2020.

[99] Nelson Nauata, Sepidehsadat Hosseini, Kai-Hung Chang, Hang Chu, Chin-Yi Cheng, and Yasutaka Furukawa. House-gan++: Generative adversarial layout refinement network towards intelligent computational agent for professional architects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13632–13641, 2021.

[100] Shidong Wang, Wei Zeng, Xi Chen, Yu Ye, Yu Qiao, and Chi-Wing Fu. Actfloor-gan: Activity-guided adversarial networks for human-centric floorplan design. IEEE Transactions on Visualization and Computer Graphics, 2021.

[101] Pedro Veloso, Jinmo Rhee, Ardavan Bidgoli, and Manuel Ladron de Guevara. A pedagogical experience with deep learning for floor plan generation.

[102] Ziniu Luo and Weixin Huang. Floorplangan: Vector residential floorplan adversarial generation. Automation in Construction, 142:104470, 2022.

[103] Morteza Rahbar, Mohammadjavad Mahdavinejad, Amir HD Markazi, and Mohammadreza Bemanian. Architectural layout design through deep learning and agent-based modeling: A hybrid approach. Journal of Building Engineering, 47:103822, 2022.

[104] Yubo Liu, Zhilan Zhang, and Qiaoming Deng. Exploration on diversity generation of campus layout based on gan. In The International Conference on Computational Design and Robotic Fabrication, pages 233–243. Springer, 2022.

[105] Mohammadreza Aalaei, Melika Saadi, Morteza Rahbar, and Ahmad Ekhlassi. Architectural layout generation using a graph-constrained conditional generative adversarial network (gan). Automation in Construction, 155:105053, 2023.

[106] Jiachen Liu, Yuan Xue, Jose Duarte, Krishnendra Shekhawat, Zihan Zhou, and Xiaolei Huang. End-to-end graph-constrained vectorized floorplan generation with panoptic refinement. In European Conference on Computer Vision, pages 547–562. Springer, 2022.

[107] Mohammad Amin Shabani, Sepidehsadat Hosseini, and Yasutaka Furukawa. Housediffusion: Vector floorplan generation via a diffusion model with discrete and continuous denoising. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5466–5475, 2023.

[108] Pei Sun, Fengying Yan, Qiwei He, and Hongjiang Liu. The development of an experimental framework to explore the generative design preference of a machine learning-assisted residential site plan layout. Land, 12(9):1776, 2023.

[109] Yubo Liu, Yihua Luo, Qiaoming Deng, and Xuanxing Zhou. Exploration of campus layout based on generative adversarial network: Discussing the significance of small amount sample learning for architecture. In Proceedings of the 2020 DigitalFUTURES: The 2nd International Conference on Computational Design and Robotic Fabrication (CDRF 2020), pages 169–178. Springer, 2021.

[110] Yuzhe Pan, Jin Qian, and Yingdong Hu. A preliminary study on the formation of the general layouts on the northern neighborhood community based on gaugan diversity output generator. In Proceedings of the 2020 DigitalFUTURES: The 2nd International Conference on Computational Design and Robotic Fabrication (CDRF 2020), pages 179–188. Springer, 2021.

[111] Chao-Wang Zhao, Jian Yang, and Jiatong Li. Generation of hospital emergency department layouts based on generative adversarial networks. Journal of Building Engineering, 43:102539, 2021.

[112] Wamiq Para, Paul Guerrero, Tom Kelly, Leonidas J Guibas, and Peter Wonka. Generative layout modeling using constraint graphs. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6690–6700, 2021.

[113] Ricardo C Rodrigues and Rovenir B Duarte. Generating floor plans with deep learning: A cross-validation assessment over different dataset sizes. International Journal of Architectural Computing, 20(3):630–644, 2022.

[114] Shuai Dong, Wei Wang, Wensheng Li, and Kun Zou. Vectorization of floor plans based on edgegan. Information, 12(5):206, 2021.

[115] Seongyong Kim, Seula Park, Hyunjung Kim, and Kiyun Yu. Deep floor plan analysis for complicated drawings based on style transfer. Journal of Computing in Civil Engineering, 35(2):04020066, 2021.

[116] Mikhael Johanes and Jeffrey Huang. Deep learning spatial signature inverted gans for isovist representation in architectural floorplan. In 40th Conference on Education and Research in Computer Aided Architectural Design in Europe, eCAADe 2022, pages 621–629. Education and Research in Computer Aided Architectural Design in Europe, 2022.

[117] Mikhael Johanes and Jeffrey Huang. Generative isovist transformer.

[118] Peiyang Su, Weisheng Lu, Junjie Chen, and Shibo Hong. Floor plan graph learning for generative design of residential buildings: a discrete denoising diffusion model. Building Research & Information, pages 1–17, 2023.

[119] Christina Doumpioti and Jeffrey Huang. Field condition - environmental sensibility of spatial configurations with the use of machine intelligence. eCAADe proceedings, 2022.

[120] Fatemeh Mostafavi, Mohammad Tahsildoost, Zahra Sadat Zomorodian, and Seyed Shayan Shahrestani. An interactive assessment framework for residential space layouts using pix2pix predictive model at the early-stage building design. Smart and Sustainable Built Environment, 2022.

[121] Qiushi He, Ziwei Li, Wen Gao, Hongzhong Chen, Xiaoying Wu, Xiaoxi Cheng, and Borong Lin. Predictive models for daylight performance of general floorplans based on cnn and gan: a proof-of-concept study. Building and Environment, 206:108346, 2021.

[122] Tomasz Dzieduszyński. Machine learning and complex compositional principles in architecture: Application of convolutional neural networks for generation of context-dependent spatial compositions. International Journal of Architectural Computing, 20(2):196–215, 2022.

[123] Viktor Eisenstadt, Jessica Bielski, Burak Mete, Christoph Langenhan, Klaus-Dieter Althoff, and Andreas Dengel. Autocompletion of floor plans for the early design phase in architecture: Foundations, existing methods, and research outlook. In POST-CARBON, Proceedings of the 27th CAADRIA Conference, Sydney, pages 323–332, 2022.

[124] Yueheng Lu, Runjia Tian, Ao Li, Xiaoshi Wang, and Jose Luis Garcia del Castillo Lopez. Organizational graph generation for structured architectural floor plan dataset. In Proceedings of the 26th International Conference of the Association for Computer-Aided Architectural Design Research in Asia (CAADRIA), pages 81–90, 2021.

[125] Wenjie Liao, Xinzheng Lu, Yuli Huang, Zhe Zheng, and Yuanqing Lin. Automated structural design of shear wall residential buildings using generative adversarial networks. Automation in Construction, 132:103931, 2021.

[126] Yifan Fei, Wenjie Liao, Shen Zhang, Pengfei Yin, Bo Han, Pengju Zhao, Xingyu Chen, and Xinzheng Lu. Integrated schematic design method for shear wall structures: a practical application of generative adversarial networks. Buildings, 12(9):1295, 2022.

[127] Bochao Fu, Yuqing Gao, and Wei Wang. Dual generative adversarial networks for automated component layout design of steel frame-brace structures. Automation in Construction, 146:104661, 2023.

[128] Wenjie Liao, Yuli Huang, Zhe Zheng, and Xinzheng Lu. Intelligent generative structural design method for shear wall building based on “fused-text-image-to-image” generative adversarial networks. Expert Systems with Applications, 210:118530, 2022.

[129] Xinzheng Lu, Wenjie Liao, Yu Zhang, and Yuli Huang. Intelligent structural design of shear wall residence using physics-enhanced generative adversarial networks. Earthquake Engineering & Structural Dynamics, 51(7):1657–1676, 2022.

[130] Yifan Fei, Wenjie Liao, Xinzheng Lu, Ertugrul Taciroglu, and Hong Guan. Semi-supervised learning method incorporating structural optimization for shear-wall structure design using small and long-tailed datasets. Journal of Building Engineering, 79:107873, 2023.

[131] Pengju Zhao, Wenjie Liao, Yuli Huang, and Xinzheng Lu. Intelligent design of shear wall layout based on graph neural networks. Advanced Engineering Informatics, 55:101886, 2023.

[132] Wenjie Liao, Xinyu Wang, Yifan Fei, Yuli Huang, Linlin Xie, and Xinzheng Lu. Base-isolation design of shear wall structures using physics-rule-co-guided self-supervised generative adversarial networks. Earthquake Engineering & Structural Dynamics, 2023.

[133] Pengju Zhao, Wenjie Liao, Hongjing Xue, and Xinzheng Lu. Intelligent design method for beam and slab of shear wall structure based on deep learning. Journal of Building Engineering, 57:104838, 2022.

[134] Yifan Fei, Wenjie Liao, Yuli Huang, and Xinzheng Lu. Knowledge-enhanced generative adversarial networks for schematic design of framed tube structures. Automation in Construction, 144:104619, 2022.

[135] Immanuel Koh. Architectural plasticity: the aesthetics of neural sampling. Architectural Design, 92(3):86–93, 2022.

[136] Michael Hasey, Jinmo Rhee, and Daniel Cardoso Llach. Form data as a resource in architectural analysis: an architectural distant reading of wooden churches from the carpathian mountain regions of eastern europe. Digital Creativity, 34(2):103–126, 2023.

[137] Ingrid Mayrhofer-Hufnagl and Benjamin Ennemoser. From linear to manifold interpolation.

[138] Benjamin Ennemoser and Ingrid Mayrhofer-Hufnagl. Design across multi-scale datasets by developing a novel approach to 3dgans. International Journal of Architectural Computing, page 14780771231168231, 2023.

[139] Dongyun Kim, Lloyd Sukgyo Lee, and Hanjun Kim. Elemental sabotage.

[140] Hang Zhang and Ye Huang. Machine learning aided 2d-3d architectural form finding at high resolution. In Proceedings of the 2020 DigitalFUTURES: The 2nd International Conference on Computational Design and Robotic Fabrication (CDRF 2020), pages 159–168. Springer, 2021.

[141] Hang Zhang and E Blasetti. 3d architectural form style transfer through machine learning (full version), 2020.

[142] KE Asmar and Harpreet Sareen. Machinic interpolations: a gan pipeline for integrating lateral thinking in computational tools of architecture. In Proceedings of the 24th Conference of the Iberoamerican Society of Digital Graphics, Online, pages 18–20, 2020.

[143] Hang Zhang. Text-to-form, August 2021.

[144] Kai-Hung Chang, Chin-Yi Cheng, Jieliang Luo, Shingo Murata, Mehdi Nourbakhsh, and Yoshito Tsuji. Building-gan: Graph-conditioned architectural volumetric design generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 11956–11965, 2021.

[145] Zhenlong Du, Haiyang Shen, Xiaoli Li, and Meng Wang. 3d building fabrication with geometry and texture coordination via hybrid gan. Journal of Ambient Intelligence and Humanized Computing, pages 1–12, 2020.

[146] Qiu Yu, Jamal Malaeb, and Wenjun Ma. Architectural facade recognition and generation through generative adversarial networks. In 2020 International Conference on Big Data & Artificial Intelligence & Software Engineering (ICBASE), pages 310–316. IEEE, 2020.

[147] Cheng Lin Chuang, Sheng Fen Chien, et al. Facilitating architect-client communication in the pre-design phase. In Projections: Proceedings of the 26th International Conference of the Association for Computer-Aided Architectural Design Research in Asia, CAADRIA 2021, volume 2, pages 71–80. The Association for Computer-Aided Architectural Design Research in Asia, 2021.

[148] Cheng Sun, Yiran Zhou, and Yunsong Han. Automatic generation of architecture facade for historical urban renovation using generative adversarial network. Building and Environment, 212:108781, 2022.

[149] Lei Zhang, Liang Zheng, Yile Chen, Lei Huang, and Shihui Zhou. Cgan-assisted renovation of the styles and features of street facades—a case study of the wuyi area in fujian, china. Sustainability, 14(24):16575, 2022.

[150] Hongpan Lin, Linsheng Huang, Yile Chen, Liang Zheng, Minling Huang, and Yashan Chen. Research on the application of cgan in the design of historic building facades in urban renewal—taking fujian putian historic districts as an example. Buildings, 13(6):1478, 2023.

[151] Jiaxin Zhang, Tomohiro Fukuda, Nobuyoshi Yabuki, and Yunqin Li. Synthesizing style-similar residential facade from semantic labeling according to the user-provided example.

[152] Wenyuan Sun, Ping Zhou, Yangang Wang, Zongpu Yu, Jing Jin, and Guangquan Zhou. 3d face parsing via surface parameterization and 2d semantic segmentation network, 2022.

[153] Da Wan, Runqi Zhao, Sheng Zhang, Hui Liu, Lian Guo, Pengbo Li, and Lei Ding. A deep learning-based approach to generating comprehensive building façades for low-rise housing. Sustainability, 15(3):1816, 2023.

[154] Jiahua Dong, Qingrui Jiang, Anqi Wang, and Yuankai Wang. Urban cultural inheritance.

[155] Shengyu Meng. Exploring in the latent space of design: A method of plausible building facades images generation, properties control and model explanation base on stylegan2. In Proceedings of the 2021 DigitalFUTURES: The 3rd International Conference on Computational Design and Robotic Fabrication (CDRF 2021) 3, pages 55–68. Springer Singapore, 2022.

[156] Selen Çiçek, Gozde Damla Turhan, and Aybüke Taşer. Deterioration of pre-war and rehabilitation of post-war urbanscapes using generative adversarial networks. International Journal of Architectural Computing, page 14780771231181237, 2023.

[157] Zhenhuang Cai, Yangbin Lin, Jialian Li, Zongliang Zhang, and Xingwang Huang. Building facade completion using semantic-synchronized gan. In 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, pages 6387–6390. IEEE, 2021.

[158] Xue Sun, Yue Wang, Ting Zhang, Yin Wang, Haoyue Fu, Xuechun Li, and Zhen Liu. An application of deep neural network in facade planning of coastal city buildings. In International Conference on Computer Science and its Applications and the International Conference on Ubiquitous Information Technologies and Applications, pages 517–523. Springer, 2022.

[159] Frederick Chando Kim and Jeffrey Huang. Perspectival gan.

[160] J Chen and R Stouffs. From exploration to interpretation: Adopting deep representation learning models to latent space interpretation of architectural design alternatives. 2021.

[161] Wolf D. Prix, Karolin Schmidbaur, Daniel Bolojan, and Efilena Baseta. The legacy sketch machine: from artificial to architectural intelligence. Architectural Design, 92(3):14–21, 2022.

[162] Ruşen Eroğlu and Leman Figen Gül. Architectural form explorations through generative adversarial networks. Legal Depot D/2022/14982/02, page 575, 2022.

[163] Nervana Osama Hanafy. Artificial intelligence’s effects on design process creativity: “a study on used ai text-to-image in architecture”. Journal of Building Engineering, 80:107999, 2023.

[164] Ville Paananen, Jonas Oppenlaender, and Aku Visuri. Using text-to-image generation for architectural design ideation. arXiv preprint arXiv:2304.10182, 2023.

[165] Hanım Gülsüm Karahan, Begüm Aktaş, and Cemal Koray Bingöl. Use of language to generate architectural scenery with ai-powered tools. In International Conference on Computer-Aided Architectural Design Futures, pages 83–96. Springer, 2023.

[166] Junming Chen, Zichun Shao, and Bin Hu. Generating interior design from text: A new diffusion model-based method for efficient creative design. Buildings, 13(7):1861, 2023.

[167] Frederick Chando Kim, Mikhael Johanes, and Jeffrey Huang. Text2form diffusion.

[168] Gernot Riether and Taro Narahara. Ai tools to synthesize characteristics of public spaces.

[169] Junming Chen, Duolin Wang, Zichun Shao, Xu Zhang, Mengchao Ruan, Huiting Li, and Jiaqi Li. Using artificial intelligence to generate master-quality architectural designs from text descriptions. Buildings, 13(9):2285, 2023.

[170] Sachith Seneviratne, Damith Senanayake, Sanka Rasnayaka, Rajith Vidanaarachchi, and Jason Thompson. Dalle-urban: Capturing the urban design expertise of large text to image transformers. In 2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA), pages 1–9. IEEE, 2022.

[171] Jonathan Dortheimer, Gerhard Schubert, Agata Dalach, Lielle Brenner, and Nikolas Martelaro. Think ai-side the box!

[172] Emmanouil Vermisso. Semantic ai models for guiding ideation in architectural design courses. In ICCC, pages 205–209, 2022.

[173] Daniel Bolojan, Emmanouil Vermisso, and Shermeen Yousif. Is language all we need? a query into architectural semantics using a multimodal generative workflow. In POST-CARBON, Proceedings of the 27th International Conference of the Association for Computer-Aided Architectural Design Research in Asia (CAADRIA), volume 1, pages 353–362, 2022.

[174] Dongyun Kim. Latent morphologies: Encoding architectural features and decoding their structure through artificial intelligence. International Journal of Architectural Computing, page 14780771231209458, 2022.

[175] Kaiyu Cheng, Paulina Neisch, and Tong Cui. From concept to space: a new perspective on aigc-involved attribute translation. Digital Creativity, 34(3):211–229, 2023.

[176] Jeffrey Huang, Mikhael Johanes, Frederick Chando Kim, Christina Doumpioti, and Georg-Christoph Holz. On gans, nlp and architecture: combining human and machine intelligences for the generation and evaluation of meaningful designs. Technology | Architecture + Design, 5(2):207–224, 2021.

[177] Dongyun Kim, George Guida, and Jose Luis García del Castillo y López. Participatory urban design with generative adversarial networks.

[178] Yeji Hong, Somin Park, Hongjo Kim, and Hyoungkwan Kim. Synthetic data generation using building information models. Automation in Construction, 130:103871, 2021.

[179] X. Zhuang. Rendering sketches - interactive rendering generation from sketches using conditional generative adversarial neural network. In Proceedings of the 40th International Conference on Education and Research in Computer Aided Architectural Design in Europe (eCAADe), volume 1, 2022.

[180] Yuqian Li and Weiguo Xu. Using cyclegan to achieve the sketch recognition process of sketch-based modeling. In Proceedings of the 2021 DigitalFUTURES: The 3rd International Conference on Computational Design and Robotic Fabrication (CDRF 2021) 3, pages 26–34. Springer, 2022.

[181] Xinyue Ye, Jiaxin Du, and Yu Ye. Masterplangan: Facilitating the smart rendering of urban master plans via generative adversarial networks. Environment and Planning B: Urban Analytics and City Science, 49(3):794–814, 2022.

[182] Yuqian Li and Weiguo Xu. Research on architectural sketch to scheme image based on context encoder.

[183] Yingbin Gui, Biao Zhou, Xiongyao Xie, Wensheng Li, and Xifang Zhou. Gan-based method for generative design of visual comfort in underground space. In IOP Conference Series: Earth and Environmental Science, volume 861, page 072015. IOP Publishing, 2021.

[184] Ka Chun Shum, Hong-Wing Pang, Binh-Son Hua, Duc Thanh Nguyen, and Sai-Kit Yeung. Conditional 360-degree image synthesis for immersive indoor scene decoration. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4478–4488, 2023.

[185] Matias del Campo. Deep house-datasets, estrangement, and the problem of the new. Architectural Intelligence, 1(1):12, 2022.

[186] Matias Del Campo, Sandra Manninger, and Alexandra Carlson. Hallucinating cities: a posthuman design method based on neural networks. In Proceedings of the 11th annual symposium on simulation for architecture and urban design, pages 1–8, 2020.

[187] Wenliang Qian, Yang Xu, and Hui Li. A self-sparse generative adversarial network for autonomous early-stage design of architectural sketches. Computer-Aided Civil and Infrastructure Engineering, 37(5):612–628, 2022.

[188] Sisi Han, Yuhan Jiang, Yilei Huang, Mingzhu Wang, Yong Bai, and Andrea Spool-White. Scan2drawing: Use of deep learning for as-built model landscape architecture. Journal of Construction Engineering and Management, 149(5):04023027, 2023.

[189] Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, et al. Photorealistic text-to-image diffusion models with deep language understanding. Advances in Neural Information Processing Systems, 35:36479–36494, 2022.

[190] Omer Bar-Tal, Lior Yariv, Yaron Lipman, and Tali Dekel. Multidiffusion: Fusing diffusion paths for controlled image generation. 2023.

[191] Yiwen Chen, Zilong Chen, Chi Zhang, Feng Wang, Xiaofeng Yang, Yikai Wang, Zhongang Cai, Lei Yang, Huaping Liu, and Guosheng Lin. Gaussianeditor: Swift and controllable 3d editing with gaussian splatting. arXiv preprint arXiv:2311.14521, 2023.
