Generative AI For Architectural Design: A Literature Review
Abstract
… 1) architectural preliminary 3D form design, 2) architectural layout design, 3) architectural structural system design, 4) detailed and optimization design of architectural 3D forms, 5) architectural facade design, and 6) architectural imagery expression. After exploring the research papers from 2020 to 2023, we observed a significant increase in the number of research papers on architectural design using generative AI. The number of research papers using generative AI technology in the different architectural design steps reveals the development trends within each subfield, as illustrated in Figure 2(a). Most research is concentrated in the area of architectural plan design. Research on the preliminary 3D form design of architecture and on architectural image expression has increased rapidly in the past two years. More research is needed on architectural structural system design, architectural 3D form refinement and optimization design, and architectural facade design.

This sustained growth trend demonstrates that generative AI applications in architectural design are expanding at an unprecedented rate, and it reflects the high level of attention and increasing investment that the architectural design and computer science communities are devoting to generative AI technologies. The most used generative AI techniques are illustrated in Figure 2(b). In computer science, many studies focus on GAN and VAE, while research on DDPM, LDM, and GPT is at an initial stage. The situation is the same in architecture.

1.1. Motivation

Leveraging recent generative AI models in architectural design could significantly improve design efficiency, provide architects with new design processes and ideas, expand the possibilities of architectural design, and revolutionize the entire design process. However, the use of advanced generative models in architectural design has not been explored extensively. The primary reasons hindering their use may lie in two aspects: professional barriers and the issue of training data.

In terms of professional barriers, deep learning and architectural design are highly specialized fields requiring extensive professional knowledge and experience. The aim of this study is to narrow the professional barriers between architecture and computer science, assist architectural designers in bridging generative AI technologies with applications, promote interdisciplinary research, and delineate future research directions. This review systematically analyzes and summarizes case studies and research outcomes of generative AI applications in architectural design, and showcases the possibilities and potential of the intersection between computer science and architecture. This interdisciplinary perspective encourages collaboration among experts from different fields to address complex issues in architectural design, thus advancing scientific research and technological innovation.

In terms of the issue of training data, deep learning models require high-quality training data to analyze and verify their generalization ability. However, data in the field of architecture is usually unstructured, and the search for and organization of architectural training data pose a significant challenge, making model training difficult right from its initial stages. In addition, high-performance Graphics Processing Units (GPUs) are required to train deep learning models on millions of data samples, especially models dealing with complex images and datasets. The scarcity of high-performance GPUs and the difficulty of mastering GPU programming skills may prevent architects from exploring recent diffusion models and large foundation models.

1.2. Structure and Methodology

This article first introduces the development and application directions of generative AI models, then elaborates on the methods of applying generative AI in the architectural design process, and finally forecasts the potential application development of generative AI in the architectural field.

In Section 2, the article offers an in-depth introduction to the principles and evolution of various generative AI models, with a focus on Diffusion Models (DMs), 3D generative models, and foundation models. Section 2.1 elaborates on the principles and development of Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs). Section 2.2, on diffusion models, elaborates on the working mechanisms and developmental trajectories of DDPM and LDM. Section 2.3, on 3D generative models, focuses on 3D shape representation, encompassing voxels, point clouds, meshes, implicit functions, and occupancy fields; within occupancy fields, the paper details Signed Distance Functions (SDF), Unsigned Distance Functions (UDF), and Neural Radiance Fields (NeRF), explaining their respective operational principles. Section 2.4, on foundation models, comprehensively describes the progress and achievements of Large Language Models (LLMs) and large vision models. Section 2.5 discusses the applications and developments of these models in image generation, video generation, and 3D model generation.

In Section 3, this paper delves into the application of generative AI models in architectural design. Given the complexity of the architectural design process, this article delineates it into the six steps presented in the introduction. For each step, the article summarizes and discusses the current application methods of generative AI models in these six domains. By analyzing these research papers, the study demonstrates how generative AI can facilitate innovation in architectural design, improve design efficiency, and optimize architectural design.
Figure 2. Overview of generative AI applications in architectural design: statistics on research paper numbers and generative models.
… z, while the discriminator D determines the authenticity of the generated samples G(z) against the ground-truth image x̄.
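Ideally, generator and discriminator reach the equilibrium of a two-player minimax game; the standard GAN objective (presumably the equation intended here) is

min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))],

at whose optimum the generator distribution matches the data distribution and the discriminator outputs 1/2 everywhere.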
Figure: 3D shape representations — (a) Voxel; (b) Point cloud; (c) Mesh; (d) Implicit.
Figure 7. An overview of the NeRF scene representation and differentiable rendering procedure [25]: synthesizing images by sampling 5D coordinates (location and viewing direction) along camera rays (a), feeding those locations into an MLP to produce a color and volume density (b), using volume rendering techniques to composite these values into an image (c), and minimizing the residual between synthesized and ground-truth observed images (d).
L(p_t, −d) represents emitted radiance. NeRF introduces an implicit representation, enabling the encoding of detailed and continuous volumetric information. This allows for high-fidelity reconstruction and rendering of scenes with fine-scale structures, surpassing the limitations of explicit representations. Recently, 3D Gaussian Splatting [27] was introduced, projecting 3D information onto a 2D domain using Gaussian kernels, and it achieves better performance than NeRF.
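For reference, NeRF's volume rendering composites color along a camera ray r(t) = o + t·d as

C(r) = ∫_{t_n}^{t_f} T(t) σ(r(t)) c(r(t), d) dt,   with   T(t) = exp( − ∫_{t_n}^{t} σ(r(s)) ds ),

where σ is the volume density, c is the view-dependent emitted radiance predicted by the MLP, and T(t) is the transmittance accumulated along the ray [25].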
2.4. Foundation Models

In computer science, foundation models, also called large-scale models, are deep learning models with numerous parameters and intricate structures, applied particularly to natural language processing and computer vision tasks. These models demand substantial computational resources for training but exhibit exceptional performance across diverse tasks. The evolution from basic neural networks to sophisticated diffusion models, as depicted in Figure 8, illustrates the continuous quest for more robust and adaptable AI systems.

2.4.1 Large Language Models (LLM)

Transformer. The Transformer model has achieved remarkable success in natural language processing (NLP). It consists of several components: an encoder, a decoder, positional encoding, and final linear and softmax layers. Both the encoder and decoder are composed of multiple identical layers, each containing attention layers and feedforward network layers. Additionally, positional encoding is used to inject positional information into the text embeddings, indicating the position of words within the sequence. Notably, the Transformer has paved the way for two prominent Transformer-based models: Bidirectional Encoder Representations from Transformers (BERT) [28] and the Generative Pre-trained Transformer (GPT) [29]. The main difference is that BERT is based on bidirectional language-model pre-training followed by fine-tuning, while GPT is based on autoregressive language-model pre-training followed by prompting.

GPT. GPT aims to pre-train models using large-scale unsupervised learning to facilitate the understanding and generation of natural language. The training process involves two primary stages: initially, a language model is trained in an unsupervised manner on extensive corpora without task-specific labels or annotations; subsequently, supervised fine-tuning occurs during the second stage, catering to specific application domains and tasks.

BERT. BERT has emerged as a breakthrough approach, achieving state-of-the-art performance across diverse language tasks. BERT's training methodology comprises two key stages: pre-training and fine-tuning. Pre-training involves the utilization of extensive text corpora to train the language model; its primary objective is to endow the BERT model with robust language understanding capabilities, enabling it to tackle various natural language processing tasks effectively. Subsequently, fine-tuning uses the pre-trained BERT model in conjunction with smaller labeled datasets to refine the model parameters. This process facilitates the customization of the model to specific tasks, thereby enhancing its suitability and performance for targeted applications.

In recent years, LLMs have witnessed explosive and rapid growth. Basic language models refer to models that are only pre-trained on large-scale text corpora, without any fine-tuning. Examples of such models include LaMDA [30] and OpenAI's GPT-3 [31].
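To make the components above concrete, the following is a minimal PyTorch sketch of sinusoidal positional encoding and a single encoder layer; it is an illustrative reduction written for this review, not the implementation of any surveyed system.

```python
import math
import torch
import torch.nn as nn

def positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Fixed sinusoidal encodings injected into the token embeddings."""
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                    * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

class EncoderLayer(nn.Module):
    """One encoder block: self-attention and a feedforward network,
    each wrapped in a residual connection with layer normalization."""
    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.attn(x, x, x)    # token-to-token self-attention
        x = self.norm1(x + attn_out)        # residual connection + layer norm
        return self.norm2(x + self.ff(x))   # position-wise feedforward

tokens = torch.randn(1, 16, 512)                 # (batch, sequence, embedding)
tokens = tokens + positional_encoding(16, 512)   # add word-position information
print(EncoderLayer()(tokens).shape)              # torch.Size([1, 16, 512])
```

Stacking several such layers, plus the decoder-side masked attention, yields the full encoder-decoder architecture that BERT (encoder-only) and GPT (decoder-only) specialize.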
Figure 8. Timeline of generative model development, from before 2014 through 2023: large-language methods (BERT, BART, T5); large-visual models (CogView, DALL·E, GLIDE, DALL·E 2, Imagen, eDiff, Midjourney, DALL·E 3, Stable Diffusion V1/V2/XL, DreamBooth, T2I-Adapter, IP-Adapter, DreamFusion, Magic3D, VideoCrafter, SVD, PIKA); diffusion methods (DPM, DDPM, DDIM, classifier-free guidance, LDM, Prompt-to-Prompt, ControlNet, LoRA, MultiDiffusion, LCM); and other methods (CNN, RNN, FCN, VAE, GAN, pix2pix, CGAN, DCGAN, GCN, Transformer, StyleGAN, SDF, UDF, NeRF, point cloud methods).
Figure: examples of text-to-image generation, pairing input prompts with generated outputs. Prompts include: "This bird is red and brown in color, with a stubby beak"; "This bird is red with white and has a very short beak"; "A small kitchen with a low ceiling"; "A cute corgi lives in a house made out of sushi"; "A warm room, floor-to-ceiling windows, sunlight, cat"; "An architectural building with large glass windows, overlooking a serene ocean at sunset"; "A chair made of brick"; "A modern matte painting of a castle made of cheesecake surrounded by a moat made of ice cream".
… capable of handling diverse image translation tasks, including the transformation of sketches into fully realized images. In addition, SketchyGAN [37] focuses on the sketch-to-image generation task and aims to achieve greater diversity and realism. Currently, ControlNet [38] can control diffusion models by adding extra conditions, and sketch-to-image generation has been applied in both photo-realistic and anime-cartoon styles [39, 40].

Layout. A layout typically encompasses details such as the position, size, and relative relationships of individual objects. Layout2Im [41] is designed to take a coarse spatial layout, consisting of bounding boxes and object categories, and generate a set of realistic images that accurately depict the specified objects in their intended locations. To enhance global attention to context, He et al. [42] introduced the Context Feature Conversion Module to ensure that the generated feature encoding for objects remains aware of other coexisting objects in the scene. As for diffusion models, GLIGEN [43] facilitates grounded text-to-image generation in open worlds using prompts and bounding boxes as condition inputs.

Scene Graph. The scene graph was first proposed in 2018 [44] to enable explicit reasoning about objects and their relationships. Thereafter, Sortino et al. [45] proposed a model that satisfies semantic constraints defined by a scene graph and models the relations between visual objects in the scene by taking into account a user-provided partial rendering of the desired target. Currently, SceneGenie [46] combines scene graphs with advanced diffusion models to generate high-quality images, enforcing geometric constraints in the sampling process using the bounding box and segmentation information predicted from the scene graph.

2.5.2 Video Generation

Since text prompts consist only of discrete tokens, text-to-video generation is more difficult than tasks such as image retrieval and image captioning.
The Video Diffusion Model [47] is the first work to use a diffusion model for video generation tasks. It proposes a 3D U-Net that can be applied to variable sequence lengths, so the model can be jointly trained on video and image modeling objectives, which makes it suitable for video generation. Additionally, Make-A-Video [48] builds on a pre-trained text-to-image model and adds one-dimensional convolution and attention layers along the time dimension to transform it into a text-to-video model. By learning the connection between text and vision through the T2I model, single-modal video data is used to learn the generation of temporally dynamic content. Furthermore, the controllability and consistency of video generation models have also garnered increased attention from researchers. PIKA [49] has been proposed to support dynamic transformations of elements in the scene based on prompts, without causing overall image collapse. DynamiCrafter [50] utilizes pre-trained video diffusion priors to add animation effects to static images based on textual prompts; it supports high-resolution models, providing better dynamic effects, higher resolution, and stronger consistency.
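The temporal extension described above — adding one-dimensional convolution and attention along the time axis to a pre-trained image backbone — can be sketched as follows. This module is a schematic simplification written for this review (the spatial convolution stands in for a frozen text-to-image layer); it is not Make-A-Video's implementation.

```python
import torch
import torch.nn as nn

class TemporalExtension(nn.Module):
    """Schematic: per-frame spatial features, then 1D mixing across frames."""
    def __init__(self, channels: int, n_heads: int = 4):
        super().__init__()
        # Stand-in for a frozen, pre-trained text-to-image layer:
        self.spatial = nn.Conv2d(channels, channels, 3, padding=1)
        # Newly added temporal layers:
        self.t_conv = nn.Conv1d(channels, channels, 3, padding=1)
        self.t_attn = nn.MultiheadAttention(channels, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, c, h, w = x.shape                   # (batch, frames, channels, H, W)
        x = self.spatial(x.reshape(b * t, c, h, w)).reshape(b, t, c, h, w)
        # Temporal convolution: each pixel location is a length-t sequence.
        y = x.permute(0, 3, 4, 2, 1).reshape(b * h * w, c, t)
        y = self.t_conv(y).reshape(b, h, w, c, t).permute(0, 4, 3, 1, 2)
        # Temporal attention over frames, shared across pixel locations.
        z = y.permute(0, 3, 4, 1, 2).reshape(b * h * w, t, c)
        z, _ = self.t_attn(z, z, z)
        return y + z.reshape(b, h, w, t, c).permute(0, 3, 4, 1, 2)

video = torch.randn(1, 8, 32, 16, 16)         # 8 frames of 32-channel features
print(TemporalExtension(32)(video).shape)     # torch.Size([1, 8, 32, 16, 16])
```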
2.5.3 3D Model Generation

Text-to-3D. Recent advancements in text-to-3D synthesis have demonstrated remarkable progress, with researchers employing various sophisticated strategies to bridge the gap between natural language descriptions and the creation of detailed 3D content. The pioneering work DreamFusion [51] harnesses a pre-trained 2D text-to-image diffusion model to generate 3D models without large-scale labeled 3D datasets or specialized denoising architectures. Magic3D [52] improves upon DreamFusion's [51] limitations by implementing a two-stage coarse-to-fine approach, accelerating the optimization process through a sparse 3D representation before refining it into high-resolution textured meshes via a differentiable renderer.

Image-to-3D. Recent 3D reconstruction techniques particularly focus on generating and reconstructing three-dimensional objects and scenes from a single image or a few images. NeRF [53] represents a state-of-the-art technique in which complex scene representations are modeled as continuous neural radiance fields optimized from sparse input views. CLIP-NeRF [54], leveraging the joint language-image embedding space of the CLIP model, proposes a unified framework that allows manipulating NeRF in a user-friendly way, using either a short text prompt or an exemplar image. DreamCraft3D [55] introduces a hierarchical process for 3D content creation that employs bootstrapped score distillation sampling from a view-dependent diffusion model; this two-step method refines textures through personalized diffusion models trained on augmented scene renderings, thereby delivering high-fidelity, coherent 3D objects. Magic123 [56] offers a two-stage solution for generating high-quality textured 3D meshes from unposed wild images: it optimizes a neural radiance field for coarse geometry and fine-tunes details using differentiable mesh representations guided by both 2D and 3D diffusion priors.

3. Generative AI for Architectural Design

This study delineates architectural design into six main steps to facilitate a convenient understanding of the process and essence of architectural design. The output of each step is generated based on the project's objective conditions and the architects' subjective intentions. Objective conditions (O) include factors such as site area, building height restrictions, and construction standards that must be adhered to by all architects. Subjective intentions (S) refer to the individual architect's design concept, architectural style, and other subjective preferences. This study explores how generative AI can assist with preliminary design, layout design, structural design, 3D form design, facade design, and imagery expression based on objective conditions and subjective intentions. It also presents a statistical analysis of the generative AI models used in each architectural step and the tasks they accomplish.

3.1. Architectural Preliminary 3D Forms Design

To begin with, creating a preliminary 3D architectural model involves considering objective factors, such as the building's type and function, site conditions, and the surrounding environment, and subjective factors, such as design concepts and morphological intentions. This process can be expressed by Equation (4):

F_P-3D = { y_P-3D | y_P-3D ∈ ⋂_{i=1}^{4} f_P-3D(o^i_P-3D) ∩ f_P-3D(S_P-3D) }    (4)

where y_P-3D is a generated preliminary 3D model of the architecture and F_P-3D is the collection of all the options. O_P-3D refers to the objective conditions of the preliminary design, which include design tasks (o^1_P-3D), such as building functions, building area, building height restrictions, and the number of occupants; site conditions (o^2_P-3D), such as the red line of the site and the shape of its boundaries; surroundings conditions (o^3_P-3D), such as nearby traffic arteries and neighboring buildings; and environmental performance (o^4_P-3D), such as daylighting and the wind and thermal environment. S_P-3D refers to the subjective intentions of the preliminary design.
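Read operationally, Equation (4) states that a candidate form is admissible only if it lies in the intersection of the option sets induced by every objective condition and by the subjective intention. A minimal sketch of that set-intersection view, with hypothetical predicates standing in for f_P-3D:

```python
from typing import Callable, Iterable, List, Set, Tuple

Candidate = Tuple[int, int, bool]   # (floor_area_m2, height_m, tapered_form)

def admissible_set(candidates: Iterable[Candidate],
                   constraints: List[Callable[[Candidate], bool]]) -> Set[Candidate]:
    """F = intersection over all constraints of {y : constraint(y) holds}."""
    result = set(candidates)
    for satisfies in constraints:
        result &= {y for y in result if satisfies(y)}
    return result

# Toy massing options and hypothetical predicates standing in for f_P-3D:
candidates = [(5200, 28, True), (6100, 35, False), (4900, 27, True)]
constraints = [
    lambda y: y[0] <= 6000,   # o^1_P-3D: design task caps the floor area
    lambda y: y[1] <= 30,     # o^2_P-3D: site height restriction
    lambda y: y[2],           # S_P-3D: subjective intention, e.g. a tapered form
]
print(admissible_set(candidates, constraints))
# -> {(5200, 28, True), (4900, 27, True)}: only forms meeting every condition remain
```

In practice the "predicates" are learned generative mappings rather than Boolean filters, but the intersection structure is the same.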
To elucidate the specific architectural design process, this paper takes the Bo-DAA apartment project in Seoul, South Korea, as an example. The project requirements include multiple residential units and shared public spaces, encompassing a communal workspace, lounge, shared kitchen, laundry room, and pet bathing area (o^1_P-3D). The site is a regular rectangle with flat terrain (o^2_P-3D), located in an urban setting surrounded by multi-story residential buildings (o^3_P-3D). To enhance resident comfort, the design considered lighting and views for each residential unit (o^4_P-3D). Based on these requirements, the architect chose "Book House" as the design concept (S_P-3D), creating a preliminary 3D form (F_P-3D) that gradually tapers from bottom to top. This design provides excellent lighting and views for each residential unit level. This process is illustrated in Figure 10.

Figure 10. Architectural preliminary 3D forms design process.
The applications of generative AI in this process include four main categories, as shown in Table 1: generating F_P-3D based on parameters, or classifying preliminary 3D models; generating F_P-3D based on 2D images or 1D text (usually from o^1_P-3D, o^2_P-3D, o^3_P-3D and S_P-3D, S_P-3D-text, R_P-3D-2d, R_P-3D-sketch); generating or redesigning F_P-3D based on 3D model data (usually from F_P-3D); and generating environmental performance evaluations (o^4_P-3D) based on 3D model data (usually from F_P-3D).

Firstly, generative AI can generate preliminary 3D forms from input parameters or conduct classification analysis on preliminary 3D models. Variational Autoencoders (VAEs) play a pivotal role in reconstructing and generating detailed 3D models (F_P-3D) from a set of input parameters (parameters to F_P-3D) [57]. Building upon this, Generative Adversarial Networks (GANs) further refine the process by training on the point-coordinate data of 3D models, utilizing category parameters for more precise reconstructions (parameters to F_P-3D) [60], and this approach facilitates the creation of innovative architectural 3D forms through input interpolation (parameters to F_P-3D) [58]. Diffusion probability models offer a unique method of training on Taihu stone and architectural 3D models; this training enables the discovery of transitional forms between two distinct 3D models by employing interpolation as an input mechanism (parameters to F_P-3D) [59]. The Structure GAN model, focusing on point cloud data, enables the generation of 3D models based on specific input parameters such as length, width, and height (parameters to F_P-3D) [61]. In a further enhancement to the modeling process, VAEs are also utilized for the in-depth training of 3D models (F_P-3D), which allows for a comprehensive classification and analysis of the models' distribution within the latent space, paving the way for more nuanced model creation (classify F_P-3D) [62]. Generative AI techniques, such as the 3D Adversarial Autoencoder model, are employed for the training and generation of point cloud representations, facilitating the reconstruction and classification of architectural forms (classify F_P-3D) [63].

Secondly, 1D text data or 2D image data can serve as the generation conditions for generative AI to produce preliminary 3D forms. Variational Autoencoders are applied to train and generate 3D voxel models guided by textual labels (S_P-3D-text to F_P-3D) [64], and the integration of VAE and GAN models facilitates the generation of architectural 3D forms from sketches (S_P-3D-sketch to F_P-3D) [65]. The difficulty of training on 3D data is higher than that of training on 2D image data. Facing the challenges associated with training neural networks on 3D forms, researchers have transformed 3D forms into 2D representations, such as grayscale images enriched with elevation data. This approach simplifies the training process, enhancing efficiency for architectural forms in specific regions and facilitating the generation of 3D models influenced by the surrounding environment (R_P-3D-2d to R_P-3D-2d) [67–70]. Moreover, the practice of converting 3D models into 2D images for reconstruction, followed by reverting these 2D images back to 3D forms, significantly reduces both training duration and cost while ensuring accurate restoration of the original 3D models (R_P-3D-2d to R_P-3D-2d) [66]. In other generative AI training strategies, researchers incorporate parameters such as the design site's scope (o^2_P-3D) and characteristics of the immediate environment (o^3_P-3D) as generative conditions, enabling the creation of preliminary 3D models that adhere to predefined rule settings (o^2_P-3D + o^3_P-3D to F_P-3D) [71, 72]. Furthermore, researchers can create architectural 3D models from design concept sketches (S_P-3D to F_P-3D) [73], and even from a single concept sketch in conjunction with environmental data (S_P-3D + o^3_P-3D to F_P-3D) [74].

Afterwards, 3D models can be used as the basis for generative AI creation, or generated 3D models can be redesigned. TreeGAN has been used to train point cloud models of churches, leveraging these models for diverse redesign applications (F_P-3D to F_P-3D) [75]. Additionally, diffusion probability models are instrumental in training 3D models, introducing noise into 3D models to create novel forms (F_P-3D to F_P-3D) [76]. Lastly, generative AI is utilized to conduct site and architectural environmental performance evaluations based on 3D models; this involves generating images for assessments such as view analysis, sunlight exposure, and daylighting rates (F_P-3D + o^2_P-3D to o^4_P-3D) [77, 78].

3.2. Architectural Plan Design

Architectural plan design, the second phase in the architectural design process, involves creating horizontal section views at specific site elevations. Guided by objective conditions and subjective decisions, this step includes arranging spatial elements such as walls, windows, and doors into a 2D plan. This process can be expressed by Equation (5):

F_Plan = { y_Plan | y_Plan ∈ ⋂_{i=1}^{3} f_Plan(o^i_Plan) ∩ ⋂_{j=1}^{2} f_Plan(s^j_Plan) }    (5)

where y_Plan is the generated architectural plan design and F_Plan is the collection of all the options.

Figure 11. Architectural plan design process.
O_Plan refers to the objective conditions of the architectural plan design, which include the preliminary architectural 3D form design (o^1_Plan), the result of the prior design phase; spatial requirements and standards (o^2_Plan), such as space area and quantity needs; and spatial environmental performance evaluations (o^3_Plan), such as room daylighting ratio and ventilation rate. S_Plan refers to the subjective intentions of the architectural plan design, which include the functional space layout (s^1_Plan), indicating the size and layout of functional spaces, and spatial sequences (s^2_Plan), such as bubble diagrams and sequence schematics.

By accumulating the plan design results of each floor, the overall plan design outcome is obtained, represented as Equation (6):

R_Plan = Σ_{i=1}^{n} F^i_Plan    (6)

Using the Bo-DAA apartment project as an example, architects first create a preliminary 3D model (o^1_Plan) to outline each floor's plan based on the model's elevation contours. They then design functional spaces (s^1_Plan) according to spatial requirements (o^2_Plan), such as evacuation distances and space area needs, positioning public areas on the lower floors and residential units above. Spatial sequences (s^2_Plan) are structured using corridors and atriums to align with the layout. Environmental evaluations (o^3_Plan) are also conducted to ensure spatial performance. This leads to a comprehensive architectural plan (R_Plan) that meets all established constraints. This process is shown in Figure 11.

The applications of generative AI in plan design include four main categories, as shown in Table 2: generating the floor plan F_Plan based on 2D images (usually from o^1_Plan, o^3_Plan, s^1_Plan, s^2_Plan, and F_Plan); generating the functional space layout s^1_Plan based on 2D images (usually from s^1_Plan, s^2_Plan, o^1_Plan, o^2_Plan, o^2_P-3D, o^3_P-3D, o^4_P-3D); generating spatial sequences s^2_Plan based on 2D images (usually from R_Plan, o^1_Plan, o^3_Plan); and generating spatial environmental performance evaluations o^3_Plan based on 2D images (usually from F_Plan, s^1_Plan).
Table 2. Application of generative AI in the architectural plan design
Firstly, in terms of generating architectural floor plans, researchers can create functional space layout diagrams from preliminary design range schematics or site range schematics and then generate the final architectural plan based on these layouts, progressing from o^1_Plan to s^1_Plan and finally to F_Plan [79–83]. Architectural floor plans can also be generated directly from functional space layout diagrams (s^1_Plan to F_Plan) [84]. Additionally, researchers utilize generative models to convert planar spatial bubble diagrams or spatial sequence diagrams into latent spatial vectors, which are then used to generate architectural floor plans (s^2_Plan to F_Plan) [85–87]. Moreover, by utilizing GAN models, architectural floor plans can be further refined to obtain floor plans with flat furniture (F_Plan to F_Plan) [88]. Some processes of reconstructing and generating floor plans are achieved through training on architectural floor plans (F_Plan to F_Plan) [89]. Floor plans can also be generated based on spatial environmental evaluations, such as lighting and wind conditions (o^3_Plan to F_Plan) [90].

Secondly, generative AI is not limited to producing architectural floor plans but also plays various roles in the generation of functional space layouts (s^1_Plan). For instance, it can utilize neural networks and genetic algorithms to enhance functional layouts based on wind environment performance evaluations (s^1_Plan + o^4_P-3D to s^1_Plan) [91]. Moreover, generative AI can reconstruct and produce matching functional layout diagrams based on the implicit information within functional space layout maps (s^1_Plan to s^1_Plan) [92]. Furthermore, it can generate viable functional layouts per spatial requirements (o^2_Plan to s^1_Plan) [93]. Additionally, it can predict and generate functional space layouts based on surrounding environmental performance evaluations (s^1_Plan + o^3_P-3D to s^1_Plan) [94–96, 122]. Similarly, generative AI can complement or augment incomplete functional space layouts based on specific demands (s^1_Plan + o^2_Plan to s^1_Plan) [97, 123], and it can use spatial sequences to generate functional space layout diagrams (s^2_Plan to s^1_Plan) [98–107]. In addition, it can generate functional space layout diagrams based on the designated red-line boundary of a design site (o^2_P-3D to s^1_Plan) [104, 108–110], and it can use plan design boundaries as conditions to generate functional space layout diagrams (o^1_Plan to s^1_Plan) [111–113].

Thirdly, generative AI demonstrates exceptional performance in the generation and prediction of spatial sequences (s^2_Plan). Specifically, it is capable of identifying and reconstructing wall layout sequences from floor plans (R_Plan to s^2_Plan) [114, 115], and it can construct spatial sequence bubble diagrams directly from floor plans (R_Plan to s^2_Plan) [124]. Moreover, generative AI can employ isovists to predict spatial sequences (o^3_Plan to s^2_Plan) [116, 117]. Lastly, it is also capable of producing these diagrams conditioned on specific plan design boundary ranges (o^1_Plan to s^2_Plan) [118].

Lastly, generative AI can foresee spatial environmental performance evaluations from floor plans (F_Plan to o^3_Plan) [119], such as light exposure and isovist ranges. It can also predict indoor brightness [120] and daylight penetration [121] using functional space layout diagrams (s^1_Plan to o^3_Plan).
3.3. Architectural Structural System Design

The third phase in the architectural design process, architectural structural system design, involves architects developing the building's framework and support mechanisms. This process can be expressed by Equation (7):

F_str = { y_str | y_str ∈ ⋂_{i=1}^{3} f_str(o^i_str) ∩ ⋂_{j=1}^{2} f_str(s^j_str) }    (7)

where y_str is the generated structural system and F_str is the collection of all the options. O_str refers to the objective conditions of the structural system design, which include the structural load distribution (o^1_str), referring to the schematic of the building's structural load distribution; the architectural plan design (o^2_str), the second step of the design process; and the preliminary 3D form of the building (o^3_str), the result of the first step of the design process. S_str refers to the subjective decisions of the structural system design, which include structural materials (s^1_str), typically encompassing parameters characterizing the materials and texture images, and structural aesthetic principles (s^2_str), usually involving conceptual diagrams and 3D models of the structural form. The design outcome y_str encapsulates various structural information, such as the structural load capacity (l_str), structural dimensions (d_str), and structural layout (x_str, y_str). This structural information is determined by the set of objective conditions (O_str) and the set of subjective decisions (S_str).

Using the Bo-DAA apartment project as an illustration, the architect utilized the preliminary 3D model (o^3_str) and the architectural plan (o^2_str) to define the building's spatial form and structural load distribution (o^1_str). Opting for a frame structure, reinforced concrete was chosen as the construction material (s^1_str), embodying modern Brutalism (s^2_str). This approach ensured that the final structure (R_str) adhered to both the aesthetic and the functional constraints. This process is represented in Figure 12.

The applications of generative AI in structural system design primarily involve the prediction of the structural layout ((x_str, y_str)) and structural dimensions (d_str).

In the realm of generating architectural structure layout images, generative AI is capable of recognizing architectural floor plans (o^2_str) and leveraging this recognition to generate detailed images of the structural layout (o^2_str to (x_str, y_str)) [125–127]. Moreover, this technology is adept at creating structural layout diagrams that correspond to floor plans based on specified structural load capacities (l_str-text + o^2_str to (x_str, y_str)) [128–131]. Additionally, generative AI is capable of refining and enhancing existing structural layouts, optimizing the layouts within the same structural space ((x_str, y_str) to (x_str, y_str)) [132]. Furthermore, generative AI combines functional space layouts (s^1_Plan) and architectural floor plans (o^2_str) to create corresponding architectural structure layout diagrams (s^1_Plan + o^2_str to (x_str, y_str)) [133].

In terms of predicting and generating structural dimensions, generative AI can forecast and create more appropriate structural sizes and layouts based on the layout and existing dimensions, thereby optimizing these dimensions ((x_str, y_str) + d_str to (x_str, y_str) + d_str) [134]. Furthermore, it can also generate dimensions and layouts that meet load requirements based on the structural layout (x_str, y_str) and load capacity (l_str) ((x_str, y_str) + l_str to (x_str, y_str) + d_str).

3.4. Architectural 3D Forms Refinement and Optimization Design

Architectural design's fourth phase focuses on refining and optimizing 3D models to closely represent the building's characteristics based on the initial model. This step can enhance detail and form, and the process can be expressed by Equation (8):

F_D-3D = { y_D-3D | y_D-3D ∈ ⋂_{i=1}^{4} f_D-3D(o^i_D-3D) ∩ ⋂_{j=1}^{2} f_D-3D(s^j_D-3D) }    (8)

where y_D-3D is the generated refined 3D model of the architecture and F_D-3D is the collection of all the options. O_D-3D refers to the objective conditions of the refinement design, which include the requirements (o^1_D-3D) for the refinement of the architectural 3D form, expressed by indicators; the preliminary 3D form design (o^2_D-3D), the result of the first step in the design process; the architectural floor plan design (o^3_D-3D), the outcome of the second step; and the architectural structural system design (o^4_D-3D), the result of the third step. S_D-3D refers to the subjective decisions of the refined 3D model, which include aesthetic principles (s^1_D-3D), the principles used by architects to control the overall form and proportions of a building, and design style (s^2_D-3D), a manifestation of a period's or region's specific characteristics and expression methods, reflected through elements such as the form, structure, materials, color, and decoration of a building.
Figure 12. Architectural structural system design process.
Using the Bo-DAA apartment project, the architectural form index (o^1_D-3D), including key metrics like floor area ratio and height, is first established. Next, the preliminary 3D form (o^2_D-3D) shapes a tapered volume. In the floor plan phase (o^3_D-3D), refinements such as a sixth-floor setback for public spaces are made. The structural system design (o^4_D-3D) guides these modifications within structural principles. Aesthetic principles (s^1_D-3D) and design styles (s^2_D-3D) are woven throughout, culminating in a refined 3D form (R_D-3D) that harmonizes constraints with aesthetics. This process is illustrated in Figure 13.

The applications of generative AI in architectural 3D form refinement and optimization design include two main categories, as shown in Table 4: using parameters or 1D text to generate F_D-3D or to conduct classification analysis (usually from s^2_D-3D-text); and generating F_D-3D, represented by 2D images or 3D models, based on 2D images (usually from F_D-3D-2d, o^3_D-3D, s^2_Plan).

In terms of using parameters or 1D text to generate refined architectural 3D models, researchers have trained voxel expression models to generate these refined models (generate to F_D-3D) [135]. Additionally, generative AI has been employed to train Signed Distance Function (SDF) voxels, coupled with clustering analysis on shallow vector representations of the 3D models (F_D-3D) [136].
Figure 13. Architectural 3D forms refinement and optimization design process.
Following this, 2D images containing 3D voxel information can be generated based on input RGB channel values (parameters to F_D-3D-2d) [138]. Furthermore, new forms of 3D elements can be generated through interpolation (parameters to F_D-3D) [137]. Voxelized and point cloud representations of enhanced 3D model components (F_D-3D) can be trained and generated according to the textual labels of architectural components (s^2_D-3D-text to F_D-3D) [139].

In terms of using 2D images to generate refined architectural 3D models, researchers converted refined architectural 3D models (F_D-3D) into sectional images; paired sectional diagrams can then be trained with Generative Adversarial Networks (GANs) to learn the connections between adjacent sectional images. By inputting a single sectional image into the model to reconstruct a new sectional image, and then using the newly generated sectional image as input, sectional images are developed iteratively through the above process.
Figure 14. Architectural facade design process.
Table 5. Application of generative AI in architectural facade design.
… the designer's style and concept; and the materials and style of the facade (s^2_Fac): different materials bring various textures and colors to the building, exhibiting unique architectural characteristics and styles.

Subsequently, the final facade design outcome can be achieved by summing the facade design results from each direction. This process can be expressed by Equation (10):

R_Fac = Σ_{i=1}^{4} F^i_Fac    (10)

Each direction's final facade design outcome y_Fac encapsulates various facade information, such as the area (a_w) and position (p_w) of the wall surface, the area (a_win) and position (p_win) of the window surface, and the adoption of a specific style for the facade components (c_Fac). This information is derived from the set of objective conditions O_Fac and the set of subjective decisions S_Fac.

In the Bo-DAA apartment project, architects use the architectural plan (o^2_Fac) to define windows and walls, incorporating glass curtain walls on the ground floor. The structural design (o^3_Fac) guides facade structuring, ensuring alignment with the building's structure. The refined 3D model (o^4_Fac) influences the facade's shape, with residential windows designed to complement the building's form. Facade performance is enhanced through simulations (o^1_Fac). Material selection (s^2_Fac) favors exposed concrete, echoing Brutalist aesthetics (s^1_Fac) and resulting in a minimalist, sculptural facade (R_Fac). This process is illustrated in Figure 14.

The applications of generative AI in architectural facade design include two main categories, as shown in Table 5: generating F_Fac based on 2D images (usually from s^2_Fac, F_Fac, or a semantic segmentation map of the facade); and generating a semantic segmentation map of the facade based on 2D images.

In generating architectural facades, generative AI first facilitates the generation of facade images from architectural facade semantic segmentation maps, which annotate the precise location and form of facade elements such as walls, window panes, and other components. This process involves generating facade images under the constraints of a given wall area (a_w) and position (p_w), window area (a_win) and position (p_win), and component elements (s^1_Fac), represented as (a_w + (p_w) + a_win + (p_win) + s^1_Fac) to F_Fac [145–152]. Furthermore, complete facade and roof images for all four directions of a building can be generated using semantic segmentation images (a_w + (p_w) + a_win + (p_win) + s^1_Fac to R_Fac) [153]. Additionally, generative AI proves instrumental in training on architectural facade images for both reconstruction and novel generation (F_Fac to F_Fac) [154]. Its utility is further demonstrated in the application of style transfer to architectural facades, either by incorporating style images (F_Fac + s^2_Fac to F_Fac) [156, 158] or by facilitating style transfer between facade images of diverse architectural styles (F_Fac to F_Fac) [155].

In generating semantic segmentation maps for architectural facades, generative AI can be employed for the reconstruction and generation of facade segmentation maps, such as rebuilding the occluded parts of a semantic segmentation map based on the unobstructed parts ((a_w + (p_w) + a_win + (p_win) + s^1_Fac) to (a_w + (p_w) + a_win + (p_win) + s^1_Fac)) [157].
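As a concrete example of the segmentation-conditioned route ((a_w + (p_w) + a_win + (p_win) + s^1_Fac) to F_Fac), the sketch below uses the Hugging Face diffusers library with a segmentation-conditioned ControlNet [38]; the checkpoint identifiers, file names, and prompt are illustrative assumptions, not the setups of the surveyed papers.

```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from PIL import Image

# Segmentation-conditioned ControlNet checkpoint (assumed identifiers).
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-seg", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Facade semantic segmentation map: colour-coded walls, window panes, doors.
seg_map = Image.open("facade_segmentation.png")

image = pipe(
    prompt="street facade of a residential building, exposed concrete, large glass windows",
    image=seg_map,              # encodes the (a_w, p_w, a_win, p_win) constraints
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("facade.png")
```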
3.6. Architectural Image Expression

Architectural image expression synthesizes design elements into 2D images, reflecting the architect's vision and design process. This process can be expressed by Equation (11):

F_Img = { y_Img | y_Img ∈ ⋂_{i=1}^{4} f_Img(o^i_Img) ∩ ⋂_{j=1}^{2} f_Img(s^j_Img) }    (11)

where y_Img is a generated architectural image and F_Img is the collection of all the options. O_Img refers to the objective conditions of the architectural image expression, which include the architectural plan design (o^1_Img), the result of the second step in the design process; the architectural structural system design (o^2_Img), the result of the third step; the refined 3D form of the architecture (o^3_Img), the result of the fourth step; and the architectural facade design (o^4_Img), the result of the fifth step.
Table 6. Application of generative AI in architectural image expression process.
S_Img refers to the subjective decisions of the architectural image expression, which include aesthetic principles (s^1_Img) and image style (s^2_Img), the principles architects use to control the composition and style of architectural images.

The applications of generative AI in architectural image expression include three main categories, as shown in Table 6: generating an architectural image F_Img based on 1D text (usually from parameters or s^1_Image-text); generating an architectural image F_Img based on 2D images; and generating architectural images of different styles, or semantic images (s^1_Img, s^2_Img), based on 2D images (usually from s^2_Img, F_Img).

In generating architectural images based on 1D text, researchers employ linear interpolation techniques to create architectural images from varying perspectives (parameter to F_Img) [159]. Moreover, the direct generation of architectural images from textual prompts simplifies and streamlines the process (s^1_Image-text to F_Img) [4, 5, 161, 163–165]. This approach is also effective for generating architectural interior images, as demonstrated by the use of Stable Diffusion for interior renderings (s^1_Image-text to F_Img) [166].
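A minimal sketch of this text-to-image route (s^1_Image-text to F_Img) with Stable Diffusion via the diffusers library follows; the checkpoint name and prompt are illustrative assumptions.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

# A 1D text prompt (s^1_Image-text) describing the intended architectural image.
prompt = ("minimalist concrete apartment building that tapers from bottom to top, "
          "warm evening light, photorealistic rendering")
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("architectural_image.png")
```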
In generating architectural images based on 2D images, several researchers have focused on training on architectural images and their paired textual prompts with generative AI models, facilitating the creation of architectural images based on the textual prompts (s^1_Image-text + F_Img to F_Img) [160, 167–175]. Additionally, researchers utilize generative AI models to train on architectural images and corresponding textual prompts, generating architectural images based on these prompts (o^3_Img + F_Img to F_Img) [162, 176]. Furthermore, the direct generation of architectural images can be achieved through the use of image semantic labels (masks) or textual descriptions: generating architectural images from image semantic labels offers precise control over the content of the generated images (s^1_Image-mask to F_Img) [177, 178]. Researchers have also explored the transformation of architectural images across different styles, such as generating architectural images from sketches or line drawings (s^2_Img to F_Img) [179–182]. By leveraging generative AI models, architectural images can undergo style blending, where images are generated based on two input images, enhancing the versatility of architectural visualization (F_Img to F_Img) [186]. GAN models have been employed to generate comfortable underground-space renderings from virtual 3D space images (F_Img to F_Img) [183] and to create interior decoration images from 360-degree panoramic interior images (F_Img to F_Img) [184]. Moreover, using StyleGAN2 to generate architectural facade and floor plan images (F_Img to F_Img) [185] serves as a basis for establishing 3D architectural models.

In generating architectural images of different styles or semantic images based on 2D images, generative AI can be instrumental in the reconstruction and generation of architectural line drawings (s^2_Img to s^2_Img) [187], and it is capable of producing semantic style images that correspond to architectural images (F_Img to s^1_Img) [188].

4. Future Research Directions

In this section, we illustrate potential future research directions for applying generative AI in architectural design using the latest emerging techniques of image, video, and 3D form generation (Section 2).

4.1. Architectural Design Image Generation

Floor Plan Generation. Researchers have applied various generative AI image generation techniques to the design and generation of architectural plan images. As the technology advances, architects can gradually incorporate more conditional constraints into the generation of floor plans, allowing generative AI to take over part of the architect's thought process. Architects can supply text data to the generative models; such text data encompasses client design requirements and architectural design standards (o^2_Plan), such as building type, occupancy, spatial needs, dimensions of architectural spaces, evacuation route settings and dimensions, fire safety layout standards, and so on.
Architects can also supply image data to the generative models, such as site plans (o^2_P-3D), which define the specific land use of architectural projects; nearby buildings and natural features (o^3_P-3D); and floor layout diagrams (s^1_Plan) or spatial sequence diagrams (s^2_Plan).

Based on the aforementioned method, some generative AI models hold developmental potential in architectural floor plan generation. A scene graph is a data structure capable of intricately describing the elements within a scene and their interrelations, consisting of nodes and edges; this structure is particularly suited for depicting the connectivity within architectural floor plans, as sketched below. By integrating diffusion models, SceneGenie [46] can accurately generate architectural floor plans using scene graphs. Furthermore, technologies such as Stable Diffusion [9] and Imagen [189] allow for further refinement of the floor plan generation process through text prompts and layout controls, which holds promise for aligning with layout considerations in architectural design.
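A scene graph for a floor plan can be as small as rooms for nodes and required adjacencies for edges; the following sketch is illustrative and is not SceneGenie's input format.

```python
import networkx as nx

# Nodes are functional spaces (with attribute constraints); edges are
# required adjacencies -- the connectivity a layout generator must honor.
plan_graph = nx.Graph()
plan_graph.add_node("living_room", min_area=25)
plan_graph.add_node("kitchen", min_area=10)
plan_graph.add_node("bedroom", min_area=12)
plan_graph.add_node("bathroom", min_area=5)
plan_graph.add_edges_from([
    ("living_room", "kitchen"),    # open-concept connection
    ("living_room", "bedroom"),
    ("bedroom", "bathroom"),       # ensuite access
])

# A graph-conditioned generator would consume these (nodes, edges) as constraints:
print(list(plan_graph.nodes(data=True)))
print(list(plan_graph.edges))
```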
Elevation Generation. Generative AI has been applied to facade generation based on semantic segmentation, textual descriptions, and facade image style transfer, and these advancements have made the facade generation process more efficient. With further advances in generative AI technology, researchers can develop more efficient and superior facade generation models. For instance, architects can provide generative AI models with conditions such as facade sketches, facade mask segmentation images, and descriptive terms for facade generation. These conditions can assist architects in generating corresponding high-quality facade images, streamlining the facade design process and enhancing design efficiency.

Figure 16. Framework example for building floor plan and facade generation. For floor plans, parameters and a descriptive prompt (e.g., "Describe the floor plan layout of a house featuring an open-concept design with a seamless flow between the living room, kitchen, and dining area; the spacious master bedroom with ensuite bathroom is positioned for privacy, while additional bedrooms and a shared bathroom are strategically placed for convenience and comfort.") are fed to a generative model; for facades, a mask image conditions the generated result.
Simultaneously, analyzing the style and characteristics of surrounding buildings ensures that a new facade design harmonizes with its environment, maintaining consistency in the scene.

Architectural Image Generation. The text-to-image generation method is capable of producing creative architectural concept designs (F_Img) from brief descriptions or a set of parameters (s^1_Image-text, s^2_Image-text). The image-to-image generation method enables the generation of architectural images possessing consistent features or styles. This offers the potential to explore architectural forms and spatial layouts yet to be conceived by human designers. Automatically generated architectural concepts can serve as design inspiration, helping designers break traditional mindsets and explore new design spaces. Simultaneously, diffusion probabilistic models can generate realistic architectural rendering images suitable for Virtual Reality (VR) or Augmented Reality (AR) applications, providing designers and clients with immersive design evaluation experiences. This advances higher-quality, interactive architectural visualization technologies, making the design review and modification process more intuitive and efficient.

Stable Diffusion [9], DALL·E 3 [34], and GLIDE [36] have been applied extensively in the domain of architectural image generation, demonstrating robust capabilities in image synthesis. ControlNet [38], with its exceptional controllability, has increasingly been utilized by architects for architectural image generation and style transfer, substantially enriching design creativity and enhancing design efficiency. Similarly, GLIGEN [43] and SceneGenie [46] have shown potential in the control of image content, which also holds significant value in the generation and creation of architectural imagery.

4.2. Architectural Design Video Generation

Video Generation based on Architectural Images and Text Prompts. The application of generative AI-based video generation in architectural design has multiple development directions. Through generative AI technology, performance videos can be generated from a single architectural effect image (F_Img) along with relevant textual descriptions (s^1_Image-text, s^2_Image-text). Future advancements include compiling multiple images of a structure from various angles to craft a continuous video narrative. Such an approach diversifies presentation techniques and streamlines the design process, yielding significant time and cost savings.

In the field of architectural video generation, Make-A-Video [48], DynamiCrafter [50], and PIKA [49] each showcase their strengths, bringing innovative presentation methods to the forefront. Make-A-Video transforms textual descriptions into detailed dynamic videos, enhancing visual impact and audience engagement and enabling designers to depict architectural transformations over time through text effortlessly. DynamiCrafter employs text-prompted technology to infuse static images with dynamic elements, such as flowing water and drifting clouds, with high-resolution support ensuring the preservation of detail and realism. PIKA, conversely, demonstrates unique advantages in dynamic scene transformation, supporting text-driven element changes that allow designers to maintain scene integrity while presenting dynamic details, thereby offering a rich and dynamic visual experience.

With advancements in diffusion models, current generative models can produce high-quality effect videos. As shown in Figure 17, the first and second rows depict effect demonstration videos generated from input images using PIKA [49], where the buildings undergo minor movements and scaling while maintaining consistency with the surrounding environment. DynamiCrafter [50] can generate rotating buildings, as demonstrated in the third row, where the model predicts architectural styles from different angles and ensures consistent generation. From GANs to diffusion models, mature image-to-image style transfer models have been implemented; applying these models ensures that generated videos exhibit the desired presentation effects, greatly expanding the application scenarios for videos.
Style Transfer for Specific Video Content. Looking ahead, the application of generative AI for partial style transfer in architectural video content paves the way to new frontiers in architectural visual presentation. This technology enables designers to replicate an overall style and, more importantly, to select precisely which parts of the video should undergo style transformation. Deep-learning-based neural style transfer algorithms have proven their efficacy in applying style transfer to images and video content. These algorithms achieve style transformation by learning specific features of a target style image and applying those features to the original content. This implies that distinct artistic styles or visual effects can be applied to selected video portions in architectural videos. Local video style transfer opens up novel possibilities in the architectural domain, allowing designers and researchers to explore and present architectural and urban spaces in ways never before possible. By precisely controlling the scope of the style transfer, unique visual effects can be created, enhancing architectural videos' expressiveness and communicative value.

PIKA [49] showcases significant advantages in style transfer applications for architectural video content, offering robust support for visual presentation and research within the architectural realm.
Figure 17. Generative models such as PIKA [49] and DynamiCrafter [50] are capable of generating high-quality videos from input images, supporting multi-angle rotation and style transfer. Example input prompt: "Screenshot for isometric strategy RPG video game, a medieval castle, dynamic Unreal Engine render, panoramic scale, mountains and river, high resolution, ultra detailed."
This technology enables designers and researchers to perform precise and flexible style customization for architectural videos, facilitating style transfers tailored to specific video content. Notably, PIKA allows for the style transfer of specific elements or areas within a video instead of a uniform transformation across the entire content. This capability of localized style transfer enables the accentuation of certain architectural features or details, such as presenting a segment of classical architecture in a modern or abstract artistic style, thereby enhancing the video’s appeal and expressiveness. Furthermore, PIKA excels in maintaining video content’s coherence and visual consistency. By finely controlling the extent and scope of the style transfer, PIKA ensures that the video retains its original structure and narrative while integrating new artistic styles, resulting in an aesthetically pleasing and authentic final product. Additionally, PIKA’s style transfer technology is not confined to traditional artistic styles but is also adaptable to various complex and innovative visual effects, providing a vast canvas for creative expression in architectural video content. Whether emulating the architectural style of a specific historical period or venturing into unprecedented visual effects, PIKA is equipped to support such endeavors.
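Functionally, the localized transfer described above can be approximated with ordinary tooling: stylize a frame in full, then composite the stylized result back into the original through a spatial mask that bounds the affected region. The sketch below is a schematic stand-in (PIKA’s internals are proprietary); the sepia stylizer stub and the file names are assumptions.

```python
# Schematic sketch of localized video style transfer: a full-frame
# stylization is blended back into the original through a spatial mask,
# so only the masked region (e.g., one building segment) changes style.
# The stylizer is a stub; requires `pip install numpy pillow`.
import numpy as np
from PIL import Image

def stylize_frame(frame: np.ndarray) -> np.ndarray:
    # Stub for any frame stylizer (a neural style-transfer model in
    # practice). A crude sepia transform lets the sketch run standalone.
    sepia = np.array([[0.393, 0.769, 0.189],
                      [0.349, 0.686, 0.168],
                      [0.272, 0.534, 0.131]])
    return np.clip(frame @ sepia.T, 0, 255).astype(np.uint8)

def localized_style_transfer(frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
    # mask: float array in [0, 1]; 1 where the new style should apply.
    styled = stylize_frame(frame).astype(np.float32)
    blend = mask[..., None] * styled + (1 - mask[..., None]) * frame.astype(np.float32)
    return blend.astype(np.uint8)

frame = np.asarray(Image.open("street_view.png").convert("RGB"))  # assumed input
mask = np.zeros(frame.shape[:2], dtype=np.float32)
mask[:, frame.shape[1] // 2:] = 1.0  # toy mask: restyle the right half only
Image.fromarray(localized_style_transfer(frame, mask)).save("street_view_styled.png")
```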
4.3. Architectural 3D Forms Generation

3D Model Generation based on Architectural Images and Text Prompt. Generating 3D building forms using architectural images, such as site information ($o_2^{\mathrm{P\text{-}3D}}$), or a text prompt, such as design requirements ($o_1^{\mathrm{P\text{-}3D}}$), as input can improve modeling efficiency.
In architectural 3D modeling, technologies such as DreamFusion [51], Magic3D [52], CLIP-NeRF [54], and DreamCraft3D [55] have emerged as revolutionary architectural design and visualization tools. They empower architects and designers to directly generate detailed and high-fidelity 3D architectural models from textual descriptions or 2D images, significantly expanding the possibilities for architectural creativity and enhancing work efficiency. Specifically, as shown in Figure 18, DreamFusion [51] and Magic3D [52] allow designers to swiftly create architectural 3D model prototypes through simple text descriptions, accelerating the transition from concept to visualization. Designers can easily modify textual descriptions and employ these tools for iterative design, exploring various architectural styles and forms to optimize design schemes. Moreover, CLIP-NeRF [54] and DreamCraft3D [55] enable designers to extract 3D information from existing architectural images, facilitating the precise reconstruction of historical buildings or current sites for restoration, research, or further development. Additionally, designers can create unique visual effects in 3D models by transforming and fusing image styles, further enhancing the artistic appeal and attractiveness of architectural representations.
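The published mechanism behind DreamFusion and Magic3D is score distillation sampling (SDS): renderings of a differentiable 3D representation are perturbed with noise, a frozen text-to-image diffusion model predicts that noise, and the prediction error is backpropagated into the 3D parameters. The sketch below is schematic only; an image-like tensor stands in for a NeRF, a stub replaces the real denoiser, and the noise schedule is a toy choice, so it illustrates the update rule rather than either system.

```python
# Schematic sketch of Score Distillation Sampling (SDS), the optimization
# idea behind DreamFusion-style text-to-3D. A frozen 2D diffusion model
# scores noised renderings of a differentiable scene, and the denoising
# error is pushed back into the scene parameters. Everything here is a
# stand-in: the "scene" is an image-like tensor rather than a NeRF, and
# the denoiser stub replaces a real text-conditioned diffusion model.
import torch

H = W = 64
scene_params = torch.randn(3, H, W, requires_grad=True)  # toy scene stand-in
opt = torch.optim.Adam([scene_params], lr=1e-2)

def render(params: torch.Tensor) -> torch.Tensor:
    # Placeholder for a differentiable renderer (a NeRF in DreamFusion).
    return torch.sigmoid(params)

def denoiser_stub(x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    # Stub for the frozen model's noise prediction eps_hat(x_t, t, prompt).
    return 0.1 * x_t

for _ in range(200):
    x = render(scene_params)
    t = torch.randint(20, 980, (1,))                       # random diffusion step
    alpha_bar = torch.cos(t / 1000.0 * torch.pi / 2) ** 2  # toy noise schedule
    noise = torch.randn_like(x)
    x_t = alpha_bar.sqrt() * x + (1 - alpha_bar).sqrt() * noise
    eps_hat = denoiser_stub(x_t, t)
    # SDS update: grad = w(t) * (eps_hat - eps); no gradient flows through
    # the denoiser, only through the rendering x into scene_params.
    grad = ((1 - alpha_bar) * (eps_hat - noise)).detach()
    opt.zero_grad()
    x.backward(gradient=grad)
    opt.step()
```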
(Figure 18 panels: (a) Castle, (b) Abbey, (c) Chichen Itza.)
(Figure panel, text-driven editing example: “(a) Turn the bear into a Grizzly bear”.)
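The text-driven editing shown in the panel above rests on CLIP guidance, the mechanism CLIP-NeRF builds on: renders are scored against a target text embedding and the similarity is maximized by gradient descent. The sketch below is a hedged 2D stand-in; a pixel tensor replaces differentiable NeRF renders, the checkpoint is the public openai/clip-vit-base-patch32, and, as noted in the comments, a real pipeline would also apply CLIP's normalization statistics.

```python
# Hedged sketch of CLIP-guided optimization. A plain pixel tensor stands
# in for differentiable NeRF renders; in CLIP-NeRF the same loss is
# backpropagated through a radiance field instead.
# Requires `pip install torch transformers`.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
for p in model.parameters():
    p.requires_grad_(False)  # CLIP stays frozen; only the "render" moves
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# The target edit expressed as text (e.g., a material or style change).
text_inputs = processor(text=["a red brick facade"], return_tensors="pt", padding=True)
with torch.no_grad():
    text_feat = model.get_text_features(**text_inputs)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

image = torch.rand(1, 3, 224, 224, requires_grad=True)  # stand-in for a render
opt = torch.optim.Adam([image], lr=5e-3)

for _ in range(100):
    # A real pipeline would apply CLIP's mean/std normalization here; the
    # sketch feeds clamped pixel values directly for brevity.
    img_feat = model.get_image_features(pixel_values=image.clamp(0, 1))
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    loss = 1.0 - (img_feat * text_feat).sum()  # cosine distance to the text
    opt.zero_grad()
    loss.backward()
    opt.step()
```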
Such a model would be similar to existing co-pilot models but with further enhanced functionality and intelligence.
Realizing this large model requires training the AI model on a vast amount of architectural design data and user feedback to enable it to understand complex design requirements. It also necessitates multimodal input processing: developing technologies capable of handling various types of inputs, such as text, images, and sound, to increase the model’s application flexibility. In addition, developing intelligent interaction interfaces is essential; user-friendly interfaces allow architects to communicate intuitively with the AI model, state their needs, and receive feedback. Finally, the model should provide customized output designs, generating multiple design options based on the input requirements and data for architects to choose and modify.
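To make the envisioned workflow tangible, the sketch below phrases it as a typed input/output contract: multimodal inputs on one side, several editable options with rationales on the other. It is purely illustrative; no such system or API exists, and every name in it is hypothetical.

```python
# Illustrative sketch only: a typed request/response contract for the kind
# of multimodal architectural co-pilot discussed above. No such system or
# API exists; every name here is hypothetical.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DesignRequest:
    brief: str                                       # textual design requirements
    site_image: Optional[bytes] = None               # e.g., site plan or photo
    audio_note: Optional[bytes] = None               # spoken client feedback
    constraints: dict = field(default_factory=dict)  # e.g., {"max_height_m": 24}
    num_options: int = 3                             # alternatives to return

@dataclass
class DesignOption:
    preview_image: bytes   # rendered visualization
    model_url: str         # link to an editable 3D model
    rationale: str         # the model's explanation of its choices

def generate_options(request: DesignRequest) -> list[DesignOption]:
    """Hypothetical entry point: fuse text, image, and audio inputs, then
    return several editable design options for the architect to refine."""
    raise NotImplementedError("stands in for a future architectural co-pilot")
```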
However, realizing this architectural design AI large model faces numerous challenges: 1) Data collection and processing: high-quality training data is critical to the performance of AI models, and efficiently collecting and processing a vast amount of architectural design data is a significant challenge. 2) Fusion of multimodal inputs: effectively integrating information from different modalities to improve the model’s accuracy and application scope requires further technological breakthroughs. 3) User interaction: designing an interface that aligns with architects’ habits and enables accessible communication with the AI model is crucial for the technology’s implementation. 4) Practical, innovative output: ensuring that AI-generated designs meet practical needs while being innovative and personalized is critical for technological development. By addressing these challenges, the future may see the realization of generative AI models that truly aid architectural design, improving design efficiency and quality and achieving human-centric architectural design optimization.
5. Conclusion

The field of generative models has witnessed unparalleled advancements, particularly in image generation, video generation, and 3D content creation. These developments span various applications, including text-to-image, image-to-image, text-to-3D, and image-to-3D transformations, demonstrating a significant leap in the capability to synthesize realistic, high-fidelity content from minimal inputs. The rapid advancement of generative models marks a transformative phase in artificial intelligence, where synthesizing realistic, diverse, and semantically consistent content across images, videos, and 3D models is becoming increasingly feasible. This progress paves new avenues for creative expression and lays the groundwork for future innovations in the digital architectural design process. As the field continues to evolve, further exploration of model efficiency, controllability, and domain-specific applications will be crucial in harnessing the full potential of generative AI models for a broad spectrum of architectural design.
In conclusion, the integration of generative AI into architectural design represents a significant leap forward in the realm of digital architecture. This advanced technology has shown exceptional capability in generating high-quality, high-resolution images and designs, offering innovative ideas and enhancing the creative process across various facets of architectural design. As we look to the future, it is clear that the continued exploration and integration of generative AI in architectural design will play a pivotal role in shaping the next generation of digital architecture. This technological evolution not only simplifies and accelerates the design process but also opens up new avenues for creativity, enabling architects to push the boundaries of traditional design and explore new, innovative design spaces.

References

[1] Matias Del Campo, Sandra Manninger, M Sanche, and L Wang. The church of ai—an examination of architecture in a posthuman design ecology. In Intelligent & Informed – Proceedings of the 24th CAADRIA Conference, Victoria University of Wellington, Wellington, New Zealand, pages 15–18, 2019.
[2] Frederick Chando Kim, Mikhael Johanes, and Jeffrey Huang. Text2form diffusion: Framework for learning curated architectural vocabulary. In 41st Conference on Education and Research in Computer Aided Architectural Design in Europe, eCAADe 2023, pages 79–88. Education and Research in Computer Aided Architectural Design in Europe, 2023.
[3] Xinwei Zhuang. Rendering sketches. eCAADe 2022, 1:517–521, 2022.
[4] Daniel Koehler. More than anything: Advocating for synthetic architectures within large-scale language-image models. International Journal of Architectural Computing, page 14780771231170455, 2023.
[5] Mathias Bank Stigsen, Alexandra Moisi, Shervin Rasoulzadeh, Kristina Schinegger, and Stefan Rutzinger. Ai diffusion as design vocabulary – investigating the use of ai image generation in early architectural design and education. In Digital Design Reconsidered – Proceedings of the 41st Conference on Education and Research in Computer Aided Architectural Design in Europe (eCAADe 2023), pages 587–596, 2023.
[6] Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Rezende, and Daan Wierstra. Draw: A recurrent neural network for image generation. In International conference on machine learning, pages 1462–1471. PMLR, 2015.
[7] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4401–4410, 2019.
[8] Ran Yi, Yong-Jin Liu, Yu-Kun Lai, and Paul L Rosin. Apdrawinggan: Generating artistic portrait drawings from face photos with hierarchical gans. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10743–10752, 2019.
[9] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.
[10] Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. Zero-shot text-to-image generation. In International Conference on Machine Learning, pages 8821–8831. PMLR, 2021.
[11] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. Advances in neural information processing systems, 27, 2014.
[12] Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
[13] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1125–1134, 2017.
[14] Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. Advances in neural information processing systems, 32, 2019.
[15] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
[16] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, pages 234–241. Springer, 2015.
[17] Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, and Hang Zhao. Latent consistency models: Synthesizing high-resolution images with few-step inference. arXiv preprint arXiv:2310.04378, 2023.
[18] Jiajun Wu, Chengkai Zhang, Xiuming Zhang, Zhoutong Zhang, William T Freeman, and Joshua B Tenenbaum. Learning shape priors for single-view 3d completion and reconstruction. In Proceedings of the European Conference on Computer Vision (ECCV), pages 646–662, 2018.
[19] Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 652–660, 2017.
[20] Andrey Voynov and Artem Babenko. Unsupervised discovery of interpretable directions in the gan latent space. In International conference on machine learning, pages 9786–9796. PMLR, 2020.
[21] Nanyang Wang, Yinda Zhang, Zhuwen Li, Yanwei Fu, Wei Liu, and Yu-Gang Jiang. Pixel2mesh: Generating 3d mesh models from single rgb images. In Proceedings of the European conference on computer vision (ECCV), pages 52–67, 2018.
[22] Charlie Nash, Yaroslav Ganin, SM Ali Eslami, and Peter Battaglia. Polygen: An autoregressive generative model of 3d meshes. In International conference on machine learning, pages 7220–7229. PMLR, 2020.
[23] Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. Deepsdf: Learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
[24] Lars Mescheder, Michael Oechsle, Michael Niemeyer, Sebastian Nowozin, and Andreas Geiger. Occupancy networks: Learning 3d reconstruction in function space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
[25] Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021.
[26] Fang Zhao, Wenhao Wang, Shengcai Liao, and Ling Shao. Learning anchored unsigned distance functions with gradient direction alignment for single-view garment reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12674–12683, 2021.
[27] Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4), July 2023.
[28] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
[29] Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, et al. Improving language understanding by generative pre-training. 2018.
[30] Romal Thoppilan, Daniel De Freitas, Jamie Hall, Noam Shazeer, Apoorv Kulshreshtha, Heng-Tze Cheng, Alicia Jin, Taylor Bos, Leslie Baker, Yu Du, et al. Lamda: Language models for dialog applications. arXiv preprint arXiv:2201.08239, 2022.
[31] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.
[32] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021.
[33] Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C Berg, Wan-Yen Lo, et al. Segment anything. arXiv preprint arXiv:2304.02643, 2023.
[34] Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125, 2022.
[35] Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, and Dimitris N Metaxas. Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks. In Proceedings of the IEEE international conference on computer vision, pages 5907–5915, 2017.
[36] Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741, 2021.
[37] Wengling Chen and James Hays. Sketchygan: Towards diverse and realistic sketch to image synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9416–9425, 2018.
[38] Lvmin Zhang and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. arXiv preprint arXiv:2302.05543, 2023.
[39] Yichen Peng, Chunqi Zhao, Haoran Xie, Tsukasa Fukusato, and Kazunori Miyata. Difffacesketch: High-fidelity face image synthesis with sketch-guided latent diffusion model. arXiv preprint arXiv:2302.06908, 2023.
[40] Zhengyu Huang, Haoran Xie, Tsukasa Fukusato, and Kazunori Miyata. Anifacedrawing: Anime portrait exploration during your sketching. In ACM SIGGRAPH 2023 Conference Proceedings. Association for Computing Machinery, 2023.
[41] Bo Zhao, Lili Meng, Weidong Yin, and Leonid Sigal. Image generation from layout. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
[42] Sen He, Wentong Liao, Michael Ying Yang, Yongxin Yang, Yi-Zhe Song, Bodo Rosenhahn, and Tao Xiang. Context-aware layout to image generation with enhanced object appearance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15049–15058, June 2021.
[43] Yuheng Li, Haotian Liu, Qingyang Wu, Fangzhou Mu, Jianwei Yang, Jianfeng Gao, Chunyuan Li, and Yong Jae Lee. Gligen: Open-set grounded text-to-image generation. arXiv preprint arXiv:2301.07093, 2023.
[44] Justin Johnson, Agrim Gupta, and Li Fei-Fei. Image generation from scene graphs. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1219–1228, 2018.
[45] Renato Sortino, Simone Palazzo, and Concetto Spampinato. Transforming image generation from scene graphs. In 2022 26th International Conference on Pattern Recognition (ICPR), pages 4118–4124. IEEE, 2022.
[46] Azade Farshad, Yousef Yeganeh, Yu Chi, Chengzhi Shen, Björn Ommer, and Nassir Navab. Scenegenie: Scene graph guided diffusion models for image synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 88–98, 2023.
[47] Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, and David J Fleet. Video diffusion models. arXiv preprint arXiv:2204.03458, 2022.
[48] Uriel Singer, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu, Harry Yang, Oron Ashual, Oran Gafni, et al. Make-a-video: Text-to-video generation without text-video data. arXiv preprint arXiv:2209.14792, 2022.
[49] Leijie Wang, Nicolas Vincent, Julija Rukanskaitė, and Amy X Zhang. Pika: Empowering non-programmers to author executable governance policies in online communities. arXiv preprint arXiv:2310.04329, 2023.
[50] Jinbo Xing, Menghan Xia, Yong Zhang, Haoxin Chen, Xintao Wang, Tien-Tsin Wong, and Ying Shan. Dynamicrafter: Animating open-domain images with video diffusion priors. arXiv preprint arXiv:2310.12190, 2023.
[51] Ben Poole, Ajay Jain, Jonathan T. Barron, and Ben Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion. In The Eleventh International Conference on Learning Representations, 2023.
[52] Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, and Tsung-Yi Lin. Magic3d: High-resolution text-to-3d content creation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 300–309, June 2023.
[53] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. In ECCV, 2020.
[54] Can Wang, Menglei Chai, Mingming He, Dongdong Chen, and Jing Liao. Clip-nerf: Text-and-image driven manipulation of neural radiance fields. arXiv preprint arXiv:2112.05139, 2021.
[55] Jingxiang Sun, Bo Zhang, Ruizhi Shao, Lizhen Wang, Wen Liu, Zhenda Xie, and Yebin Liu. Dreamcraft3d: Hierarchical 3d generation with bootstrapped diffusion prior. arXiv preprint arXiv:2310.16818, 2023.
[56] Guocheng Qian, Jinjie Mai, Abdullah Hamdi, Jian Ren, Aliaksandr Siarohin, Bing Li, Hsin-Ying Lee, Ivan Skorokhodov, Peter Wonka, Sergey Tulyakov, and Bernard Ghanem. Magic123: One image to high-quality 3d object generation using both 2d and 3d diffusion priors. In The Twelfth International Conference on Learning Representations (ICLR), 2024.
[57] Jaime de Miguel Rodríguez, Maria Eugenia Villafañe, Luka Piškorec, and Fernando Sancho Caparrini. Generation of geometric interpolations of building types with deep variational autoencoders. Design Science, 6:e34, 2020.
[58] Xinwei Zhuang, Yi Ju, Allen Yang, and Luisa Caldas. Synthesis and generation for 3d architecture volume with generative modeling. International Journal of Architectural Computing, page 14780771231168233, 2023.
[59] Yubo Liu, Han Li, Qiaoming Deng, and Kai Hu. Diffusion probabilistic model assisted 3d form finding and design latent space exploration: A case study for taihu stone spacial transformation. In The International Conference on Computational Design and Robotic Fabrication, pages 11–23. Springer, 2023.
[60] Hao Zheng and Philip F Yuan. A generative architectural and urban design method through artificial neural networks. Building and Environment, 205:108178, 2021.
[61] Panagiota Pouliou, Anca-Simona Horvath, and George Palamas. Speculative hybrids: Investigating the generation of conceptual architectural forms through the use of 3d generative adversarial networks. International Journal of Architectural Computing, page 14780771231168229, 2023.
[62] Tomas Cabezon Pedroso, Jinmo Rhee, and Daragh Byrne. Feature space exploration as an alternative for design space exploration beyond the parametric space. arXiv preprint arXiv:2301.11416, 2023.
[63] Frederick Chando Kim and Jeffrey Huang. Towards a machine understanding of architectural form.
[64] Adam Sebestyen, Urs Hirschberg, and Shervin Rasoulzadeh. Using deep learning to generate design spaces for architecture. International Journal of Architectural Computing, page 14780771231168232, 2023.
[65] Alberto Tono, Heyaojing Huang, Ashwin Agrawal, and Martin Fischer. Vitruvio: 3d building meshes via single perspective sketches. arXiv preprint arXiv:2210.13634, 2022.
[66] Qiaoming Deng, Xiaofeng Li, Yubo Liu, and Kai Hu. Exploration of three-dimensional spatial learning approach based on machine learning–taking taihu stone as an example. Architectural Intelligence, 2(1):5, 2023.
[67] Raffaele Di Carlo, Divyae Mittal, and Ondrej Veselý. Generating 3d building volumes for a given urban context using pix2pix gan. Legal Depot D/2022/14982/02, page 287, 2022.
[68] Steven Jige Quan. Urban-gan: An artificial intelligence-aided computation system for plural urban design. Environment and Planning B: Urban Analytics and City Science, 49(9):2500–2515, 2022.
[69] Shiqi Zhou, Yuankai Wang, Weiyi Jia, Mo Wang, Yuwei Wu, Renlu Qiao, and Zhiqiang Wu. Automatic responsive-generation of 3d urban morphology coupled with local climate zones using generative adversarial network. Building and Environment, 245:110855, 2023.
[70] Jingyi Li, Fang Guo, and Hong Chen. A study on urban block design strategies for improving pedestrian-level wind conditions: Cfd-based optimization and generative adversarial networks. Energy and Buildings, page 113863, 2023.
[71] Ondrej Veselý. Building massing generation using gan trained on dutch 3d city models. 2022.
[72] Feifeng Jiang, Jun Ma, Christopher John Webster, Xiao Li, and Vincent JL Gan. Building layout generation using site-embedded gan model. Automation in Construction, 151:104888, 2023.
[73] Diego Navarro-Mateu, Oriol Carrasco, and Pedro Cortes Nieves. Color-patterns to architecture conversion through conditional generative adversarial networks. Biomimetics, 6(1):16, 2021.
[74] Suzi Kim, Dodam Kim, and Sunghee Choi. Citycraft: 3d virtual city creation from a single image. The Visual Computer, 36:911–924, 2020.
[75] Dong Wook Shu, Sung Woo Park, and Junseok Kwon. 3d point cloud generative adversarial network based on tree structured graph convolutions. In Proceedings of the IEEE/CVF international conference on computer vision, pages 3859–3868, 2019.
[76] Adam Sebestyen, Ozan Özdenizci, Robert Legenstein, and Urs Hirschberg. Generating conceptual architectural 3d geometries with denoising diffusion models. In 41st Conference on Education and Research in Computer Aided Architectural Design in Europe – Digital Design Reconsidered: eCAADe 2023, 2023.
[77] Spyridon Ampanavos and Ali Malkawi. Early-phase performance-driven design using generative models. In International Conference on Computer-Aided Architectural Design Futures, pages 87–106. Springer, 2021.
[78] Chenyu Huang, Gengjia Zhang, Jiawei Yao, Xiaoxin Wang, John Kaiser Calautit, Cairong Zhao, Na An, and Xi Peng. Accelerated environmental performance-driven urban design with generative adversarial network. Building and Environment, 224:109575, 2022.
[79] Stanislas Chaillou. Archigan: Artificial intelligence x architecture. In Architectural Intelligence: Selected Papers from the 1st International Conference on Computational Design and Robotic Fabrication (CDRF 2019), pages 117–127. Springer, 2020.
[80] Xiaoni Gao, Xiangmin Guo, and Tiantian Lo. M-strugan: An automatic 2d-plan generation system under mixed structural constraints for homestays. Sustainability, 15(9):7126, 2023.
[81] Xiao Min, Liang Zheng, and Yile Chen. The floor plan design method of exhibition halls in cgan-assisted museum architecture. Buildings, 13(3):756, 2023.
[82] Da Wan, Xiaoyu Zhao, Wanmei Lu, Pengbo Li, Xinyu Shi, and Hiroatsu Fukuda. A deep learning approach toward energy-effective residential building floor plan generation. Sustainability, 14(13):8074, 2022.
[83] Ran Chen, Jing Zhao, Xueqi Yao, Sijia Jiang, Yingting He, Bei Bao, Xiaomin Luo, Shuhan Xu, and Chenxi Wang. Generative design of outdoor green spaces based on generative adversarial networks. Buildings, 13(4):1083, 2023.
[84] Yubo Liu, Yangting Lai, Jianyong Chen, Lingyu Liang, and Qiaoming Deng. Scut-autoalp: A diverse benchmark dataset for automatic architectural layout parsing. IEICE Transactions on Information and Systems, 103(12):2725–2729, 2020.
[85] Ruizhen Hu, Zeyu Huang, Yuhan Tang, Oliver Van Kaick, Hao Zhang, and Hui Huang. Graph2plan: Learning floorplan generation from layout graphs. ACM Transactions on Graphics (TOG), 39(4):118–1, 2020.
[86] Christina Doumpioti and Jeffrey Huang. Intensive differences in spatial design. In 39th eCAADe Conference in Novi Sad, Serbia, pages 9–16, 2021.
[87] Merve Akdoğan and Özgün Balaban. Plan generation with generative adversarial networks: Haeckel’s drawings to palladian plans. Journal of Computational Design, 3(1):135–154, 2022.
[88] Ilker Karadag, Orkan Zeynel Güzelci, and Sema Alaçam. Edu-ai: a twofold machine learning model to support classroom layout generation. Construction Innovation, 23(4):898–914, 2023.
[89] Can Uzun, Meryem Birgül Çolakoğlu, and Arda İnceoğlu. Gan as a generative architectural plan layout tool: A case study for training dcgan with palladian plans and evaluation of dcgan outputs. 17:185–198, 2020.
[90] Sheng-Yang Huang, Enriqueta Llabres-Valls, Aiman Tabony, and Luis Carlos Castillo. Damascus house: Exploring the connectionist embodiment of the islamic environmental intelligence by design. In eCAADe proceedings, volume 1, pages 871–880. eCAADe, 2023.
[91] XY Ying, XY Qin, JH Chen, and J Gao. Generating residential layout based on ai in the view of wind environment. In Journal of Physics: Conference Series, volume 2069, page 012061. IOP Publishing, 2021.
[92] Linning Xu, Yuanbo Xiangli, Anyi Rao, Nanxuan Zhao, Bo Dai, Ziwei Liu, and Dahua Lin. Blockplanner: city block generation with vectorized graph representation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5077–5086, 2021.
[93] Pedram Ghannad and Yong-Cheol Lee. Automated modular housing design using a module configuration algorithm and a coupled generative adversarial network (cogan). Automation in Construction, 139:104234, 2022.
[94] Shuyi Huang and Hao Zheng. Morphological regeneration of the industrial waterfront based on machine learning. In 27th International Conference of the Association for Computer-Aided Architectural Design Research in Asia (CAADRIA 2022), pages 475–484. The Association for Computer-Aided Architectural Design Research in Asia, 2022.
[95] Lehao Yang, Long Li, Qihao Chen, Jiling Zhang, Tian Feng, and Wei Zhang. Street layout design via conditional adversarial learning. arXiv preprint arXiv:2305.08186, 2023.
[96] Hanan Tanasra, Tamar Rott Shaham, Tomer Michaeli, Guy Austern, and Shany Barath. Automation in interior space planning: Utilizing conditional generative adversarial network models to create furniture layouts. Buildings, 13(7):1793, 2023.
[97] Sepidehsadat Hosseini and Yasutaka Furukawa. Floorplan restoration by structure hallucinating transformer cascades.
[98] Nelson Nauata, Kai-Hung Chang, Chin-Yi Cheng, Greg Mori, and Yasutaka Furukawa. House-gan: Relational generative adversarial networks for graph-constrained house layout generation. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part I 16, pages 162–177. Springer, 2020.
[99] Nelson Nauata, Sepidehsadat Hosseini, Kai-Hung Chang, Hang Chu, Chin-Yi Cheng, and Yasutaka Furukawa. House-gan++: Generative adversarial layout refinement network towards intelligent computational agent for professional architects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13632–13641, 2021.
[100] Shidong Wang, Wei Zeng, Xi Chen, Yu Ye, Yu Qiao, and Chi-Wing Fu. Actfloor-gan: Activity-guided adversarial networks for human-centric floorplan design. IEEE Transactions on Visualization and Computer Graphics, 2021.
[101] Pedro Veloso, Jinmo Rhee, Ardavan Bidgoli, and Manuel Ladron de Guevara. A pedagogical experience with deep learning for floor plan generation.
[102] Ziniu Luo and Weixin Huang. Floorplangan: Vector residential floorplan adversarial generation. Automation in Construction, 142:104470, 2022.
[103] Morteza Rahbar, Mohammadjavad Mahdavinejad, Amir HD Markazi, and Mohammadreza Bemanian. Architectural layout design through deep learning and agent-based modeling: A hybrid approach. Journal of Building Engineering, 47:103822, 2022.
[104] Yubo Liu, Zhilan Zhang, and Qiaoming Deng. Exploration on diversity generation of campus layout based on gan. In The International Conference on Computational Design and Robotic Fabrication, pages 233–243. Springer, 2022.
[105] Mohammadreza Aalaei, Melika Saadi, Morteza Rahbar, and Ahmad Ekhlassi. Architectural layout generation using a graph-constrained conditional generative adversarial network (gan). Automation in Construction, 155:105053, 2023.
[106] Jiachen Liu, Yuan Xue, Jose Duarte, Krishnendra Shekhawat, Zihan Zhou, and Xiaolei Huang. End-to-end graph-constrained vectorized floorplan generation with panoptic refinement. In European Conference on Computer Vision, pages 547–562. Springer, 2022.
[107] Mohammad Amin Shabani, Sepidehsadat Hosseini, and Yasutaka Furukawa. Housediffusion: Vector floorplan generation via a diffusion model with discrete and continuous denoising. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5466–5475, 2023.
[108] Pei Sun, Fengying Yan, Qiwei He, and Hongjiang Liu. The development of an experimental framework to explore the generative design preference of a machine learning-assisted residential site plan layout. Land, 12(9):1776, 2023.
[109] Yubo Liu, Yihua Luo, Qiaoming Deng, and Xuanxing Zhou. Exploration of campus layout based on generative adversarial network: Discussing the significance of small amount sample learning for architecture. In Proceedings of the 2020 DigitalFUTURES: The 2nd International Conference on Computational Design and Robotic Fabrication (CDRF 2020), pages 169–178. Springer, 2021.
[110] Yuzhe Pan, Jin Qian, and Yingdong Hu. A preliminary study on the formation of the general layouts on the northern neighborhood community based on gaugan diversity output generator. In Proceedings of the 2020 DigitalFUTURES: The 2nd International Conference on Computational Design and Robotic Fabrication (CDRF 2020), pages 179–188. Springer, 2021.
[111] Chao-Wang Zhao, Jian Yang, and Jiatong Li. Generation of hospital emergency department layouts based on generative adversarial networks. Journal of Building Engineering, 43:102539, 2021.
[112] Wamiq Para, Paul Guerrero, Tom Kelly, Leonidas J Guibas, and Peter Wonka. Generative layout modeling using constraint graphs. In Proceedings of the IEEE/CVF international conference on computer vision, pages 6690–6700, 2021.
[113] Ricardo C Rodrigues and Rovenir B Duarte. Generating floor plans with deep learning: A cross-validation assessment over different dataset sizes. International Journal of Architectural Computing, 20(3):630–644, 2022.
[114] Shuai Dong, Wei Wang, Wensheng Li, and Kun Zou. Vectorization of floor plans based on edgegan. Information, 12(5):206, 2021.
[115] Seongyong Kim, Seula Park, Hyunjung Kim, and Kiyun Yu. Deep floor plan analysis for complicated drawings based on style transfer. Journal of Computing in Civil Engineering, 35(2):04020066, 2021.
[116] Mikhael Johanes and Jeffrey Huang. Deep learning spatial signature inverted gans for isovist representation in architectural floorplan. In 40th Conference on Education and Research in Computer Aided Architectural Design in Europe, eCAADe 2022, pages 621–629. Education and Research in Computer Aided Architectural Design in Europe, 2022.
[117] Mikhael Johanes and Jeffrey Huang. Generative isovist transformer.
[118] Peiyang Su, Weisheng Lu, Junjie Chen, and Shibo Hong. Floor plan graph learning for generative design of residential buildings: a discrete denoising diffusion model. Building Research & Information, pages 1–17, 2023.
[119] Christina Doumpioti and Jeffrey Huang. Field condition – environmental sensibility of spatial configurations with the use of machine intelligence. eCAADe proceedings, 2022.
[120] Fatemeh Mostafavi, Mohammad Tahsildoost, Zahra Sadat Zomorodian, and Seyed Shayan Shahrestani. An interactive assessment framework for residential space layouts using pix2pix predictive model at the early-stage building design. Smart and Sustainable Built Environment, 2022.
[121] Qiushi He, Ziwei Li, Wen Gao, Hongzhong Chen, Xiaoying Wu, Xiaoxi Cheng, and Borong Lin. Predictive models for daylight performance of general floorplans based on cnn and gan: a proof-of-concept study. Building and Environment, 206:108346, 2021.
[122] Tomasz Dzieduszyński. Machine learning and complex compositional principles in architecture: Application of convolutional neural networks for generation of context-dependent spatial compositions. International Journal of Architectural Computing, 20(2):196–215, 2022.
[123] Viktor Eisenstadt, Jessica Bielski, Burak Mete, Christoph Langenhan, Klaus-Dieter Althoff, and Andreas Dengel. Autocompletion of floor plans for the early design phase in architecture: Foundations, existing methods, and research outlook. In POST-CARBON – Proceedings of the 27th CAADRIA Conference, Sydney, pages 323–332, 2022.
[124] Yueheng Lu, Runjia Tian, Ao Li, Xiaoshi Wang, and Jose Luis Garcia del Castillo Lopez. Organizational graph generation for structured architectural floor plan dataset. In Proceedings of the 26th International Conference of the Association for Computer-Aided Architectural Design Research in Asia (CAADRIA), pages 81–90, 2021.
[125] Wenjie Liao, Xinzheng Lu, Yuli Huang, Zhe Zheng, and Yuanqing Lin. Automated structural design of shear wall residential buildings using generative adversarial networks. Automation in Construction, 132:103931, 2021.
[126] Yifan Fei, Wenjie Liao, Shen Zhang, Pengfei Yin, Bo Han, Pengju Zhao, Xingyu Chen, and Xinzheng Lu. Integrated schematic design method for shear wall structures: a practical application of generative adversarial networks. Buildings, 12(9):1295, 2022.
[127] Bochao Fu, Yuqing Gao, and Wei Wang. Dual generative adversarial networks for automated component layout design of steel frame-brace structures. Automation in Construction, 146:104661, 2023.
[128] Wenjie Liao, Yuli Huang, Zhe Zheng, and Xinzheng Lu. Intelligent generative structural design method for shear wall building based on “fused-text-image-to-image” generative adversarial networks. Expert Systems with Applications, 210:118530, 2022.
[129] Xinzheng Lu, Wenjie Liao, Yu Zhang, and Yuli Huang. Intelligent structural design of shear wall residence using physics-enhanced generative adversarial networks. Earthquake Engineering & Structural Dynamics, 51(7):1657–1676, 2022.
[130] Yifan Fei, Wenjie Liao, Xinzheng Lu, Ertugrul Taciroglu, and Hong Guan. Semi-supervised learning method incorporating structural optimization for shear-wall structure design using small and long-tailed datasets. Journal of Building Engineering, 79:107873, 2023.
[131] Pengju Zhao, Wenjie Liao, Yuli Huang, and Xinzheng Lu. Intelligent design of shear wall layout based on graph neural networks. Advanced Engineering Informatics, 55:101886, 2023.
[132] Wenjie Liao, Xinyu Wang, Yifan Fei, Yuli Huang, Linlin Xie, and Xinzheng Lu. Base-isolation design of shear wall structures using physics-rule-co-guided self-supervised generative adversarial networks. Earthquake Engineering & Structural Dynamics, 2023.
[133] Pengju Zhao, Wenjie Liao, Hongjing Xue, and Xinzheng Lu. Intelligent design method for beam and slab of shear wall structure based on deep learning. Journal of Building Engineering, 57:104838, 2022.
[134] Yifan Fei, Wenjie Liao, Yuli Huang, and Xinzheng Lu. Knowledge-enhanced generative adversarial networks for schematic design of framed tube structures. Automation in Construction, 144:104619, 2022.
[135] Immanuel Koh. Architectural plasticity: the aesthetics of neural sampling. Architectural Design, 92(3):86–93, 2022.
[136] Michael Hasey, Jinmo Rhee, and Daniel Cardoso Llach. Form data as a resource in architectural analysis: an architectural distant reading of wooden churches from the carpathian mountain regions of eastern europe. Digital Creativity, 34(2):103–126, 2023.
[137] Ingrid Mayrhofer-Hufnagl and Benjamin Ennemoser. From linear to manifold interpolation.
[138] Benjamin Ennemoser and Ingrid Mayrhofer-Hufnagl. Design across multi-scale datasets by developing a novel approach to 3dgans. International Journal of Architectural Computing, page 14780771231168231, 2023.
[139] Dongyun Kim, Lloyd Sukgyo Lee, and Hanjun Kim. Elemental sabotage.
[140] Hang Zhang and Ye Huang. Machine learning aided 2d-3d architectural form finding at high resolution. In Proceedings of the 2020 DigitalFUTURES: The 2nd International Conference on Computational Design and Robotic Fabrication (CDRF 2020), pages 159–168. Springer, 2021.
[141] Hang Zhang and E Blasetti. 3d architectural form style transfer through machine learning (full version), 2020.
[142] KE Asmar and Harpreet Sareen. Machinic interpolations: a gan pipeline for integrating lateral thinking in computational tools of architecture. In Proceedings of the 24th Conference of the Iberoamerican Society of Digital Graphics, Online, pages 18–20, 2020.
[143] Hang Zhang. Text-to-form, August 2021.
[144] Kai-Hung Chang, Chin-Yi Cheng, Jieliang Luo, Shingo Murata, Mehdi Nourbakhsh, and Yoshito Tsuji. Building-gan: Graph-conditioned architectural volumetric design generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 11956–11965, 2021.
[145] Zhenlong Du, Haiyang Shen, Xiaoli Li, and Meng Wang. 3d building fabrication with geometry and texture coordination via hybrid gan. Journal of Ambient Intelligence and Humanized Computing, pages 1–12, 2020.
[146] Qiu Yu, Jamal Malaeb, and Wenjun Ma. Architectural facade recognition and generation through generative adversarial networks. In 2020 International Conference on Big Data & Artificial Intelligence & Software Engineering (ICBASE), pages 310–316. IEEE, 2020.
[147] Cheng Lin Chuang, Sheng Fen Chien, et al. Facilitating architect-client communication in the pre-design phase. In Projections – Proceedings of the 26th International Conference of the Association for Computer-Aided Architectural Design Research in Asia, CAADRIA 2021, volume 2, pages 71–80. The Association for Computer-Aided Architectural Design Research in Asia, 2021.
[148] Cheng Sun, Yiran Zhou, and Yunsong Han. Automatic generation of architecture facade for historical urban renovation using generative adversarial network. Building and Environment, 212:108781, 2022.
[149] Lei Zhang, Liang Zheng, Yile Chen, Lei Huang, and Shihui Zhou. Cgan-assisted renovation of the styles and features of street facades—a case study of the wuyi area in fujian, china. Sustainability, 14(24):16575, 2022.
[150] Hongpan Lin, Linsheng Huang, Yile Chen, Liang Zheng, Minling Huang, and Yashan Chen. Research on the application of cgan in the design of historic building facades in urban renewal—taking fujian putian historic districts as an example. Buildings, 13(6):1478, 2023.
[151] Jiaxin Zhang, Tomohiro Fukuda, Nobuyoshi Yabuki, and Yunqin Li. Synthesizing style-similar residential facade from semantic labeling according to the user-provided example.
[152] Wenyuan Sun, Ping Zhou, Yangang Wang, Zongpu Yu, Jing Jin, and Guangquan Zhou. 3d face parsing via surface parameterization and 2d semantic segmentation network, 2022.
[153] Da Wan, Runqi Zhao, Sheng Zhang, Hui Liu, Lian Guo, Pengbo Li, and Lei Ding. A deep learning-based approach to generating comprehensive building façades for low-rise housing. Sustainability, 15(3):1816, 2023.
[154] Jiahua Dong, Qingrui Jiang, Anqi Wang, and Yuankai Wang. Urban cultural inheritance.
[155] Shengyu Meng. Exploring in the latent space of design: A method of plausible building facades images generation, properties control and model explanation base on stylegan2. In Proceedings of the 2021 DigitalFUTURES: The 3rd International Conference on Computational Design and Robotic Fabrication (CDRF 2021) 3, pages 55–68. Springer Singapore, 2022.
[156] Selen Çiçek, Gozde Damla Turhan, and Aybüke Taşer. Deterioration of pre-war and rehabilitation of post-war urbanscapes using generative adversarial networks. International Journal of Architectural Computing, page 14780771231181237, 2023.
[157] Zhenhuang Cai, Yangbin Lin, Jialian Li, Zongliang Zhang, and Xingwang Huang. Building facade completion using semantic-synchronized gan. In 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, pages 6387–6390. IEEE, 2021.
[158] Xue Sun, Yue Wang, Ting Zhang, Yin Wang, Haoyue Fu, Xuechun Li, and Zhen Liu. An application of deep neural network in facade planning of coastal city buildings. In International Conference on Computer Science and its Applications and the International Conference on Ubiquitous Information Technologies and Applications, pages 517–523. Springer, 2022.
[159] Frederick Chando Kim and Jeffrey Huang. Perspectival gan.
[160] J Chen and R Stouffs. From exploration to interpretation: Adopting deep representation learning models to latent space interpretation of architectural design alternatives. 2021.
[161] Wolf D. Prix, Karolin Schmidbaur, Daniel Bolojan, and Efilena Baseta. The legacy sketch machine: from artificial to architectural intelligence. Architectural Design, 92(3):14–21, 2022.
[162] Ruşen Eroğlu and Leman Figen Gül. Architectural form explorations through generative adversarial networks. Legal Depot D/2022/14982/02, page 575, 2022.
[163] Nervana Osama Hanafy. Artificial intelligence’s effects on design process creativity: “a study on used ai text-to-image in architecture”. Journal of Building Engineering, 80:107999, 2023.
[164] Ville Paananen, Jonas Oppenlaender, and Aku Visuri. Using text-to-image generation for architectural design ideation. arXiv preprint arXiv:2304.10182, 2023.
[165] Hanım Gülsüm Karahan, Begüm Aktaş, and Cemal Koray Bingöl. Use of language to generate architectural scenery with ai-powered tools. In International Conference on Computer-Aided Architectural Design Futures, pages 83–96. Springer, 2023.
[166] Junming Chen, Zichun Shao, and Bin Hu. Generating interior design from text: A new diffusion model-based method for efficient creative design. Buildings, 13(7):1861, 2023.
[167] Frederick Chando Kim, Mikhael Johanes, and Jeffrey Huang. Text2form diffusion.
[168] Gernot Riether and Taro Narahara. Ai tools to synthesize characteristics of public spaces.
[169] Junming Chen, Duolin Wang, Zichun Shao, Xu Zhang, Mengchao Ruan, Huiting Li, and Jiaqi Li. Using artificial intelligence to generate master-quality architectural designs from text descriptions. Buildings, 13(9):2285, 2023.
[170] Sachith Seneviratne, Damith Senanayake, Sanka Rasnayaka, Rajith Vidanaarachchi, and Jason Thompson. Dalle-urban: Capturing the urban design expertise of large text to image transformers. In 2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA), pages 1–9. IEEE, 2022.
[171] Jonathan Dortheimer, Gerhard Schubert, Agata Dalach, Lielle Brenner, and Nikolas Martelaro. Think ai-side the box!
[172] Emmanouil Vermisso. Semantic ai models for guiding ideation in architectural design courses. In ICCC, pages 205–209, 2022.
[173] Daniel Bolojan, Emmanouil Vermisso, and Shermeen Yousif. Is language all we need? a query into architectural semantics using a multimodal generative workflow. In POST-CARBON, Proceedings of the 27th International Conference of the Association for Computer-Aided Architectural Design Research in Asia (CAADRIA), volume 1, pages 353–362, 2022.
[174] Dongyun Kim. Latent morphologies: Encoding architectural features and decoding their structure through artificial intelligence. International Journal of Architectural Computing, page 14780771231209458, 2022.
[175] Kaiyu Cheng, Paulina Neisch, and Tong Cui. From concept to space: a new perspective on aigc-involved attribute translation. Digital Creativity, 34(3):211–229, 2023.
[176] Jeffrey Huang, Mikhael Johanes, Frederick Chando Kim, Christina Doumpioti, and Georg-Christoph Holz. On gans, nlp and architecture: combining human and machine intelligences for the generation and evaluation of meaningful designs. Technology | Architecture + Design, 5(2):207–224, 2021.
[177] Dongyun Kim, George Guida, and Jose Luis García del Castillo y López. Participatory urban design with generative adversarial networks.
[178] Yeji Hong, Somin Park, Hongjo Kim, and Hyoungkwan Kim. Synthetic data generation using building information models. Automation in Construction, 130:103871, 2021.
[179] X. Zhuang. Rendering sketches – interactive rendering generation from sketches using conditional generative adversarial neural network. In Proceedings of the 40th International Conference on Education and Research in Computer Aided Architectural Design in Europe (eCAADe), Volume 1, 2022.
[180] Yuqian Li and Weiguo Xu. Using cyclegan to achieve the sketch recognition process of sketch-based modeling. In Proceedings of the 2021 DigitalFUTURES: The 3rd International Conference on Computational Design and Robotic Fabrication (CDRF 2021) 3, pages 26–34. Springer, 2022.
[181] Xinyue Ye, Jiaxin Du, and Yu Ye. Masterplangan: Facilitating the smart rendering of urban master plans via generative adversarial networks. Environment and Planning B: Urban Analytics and City Science, 49(3):794–814, 2022.
[182] Yuqian Li and Weiguo Xu. Research on architectural sketch to scheme image based on context encoder.
[183] Yingbin Gui, Biao Zhou, Xiongyao Xie, Wensheng Li, and Xifang Zhou. Gan-based method for generative design of visual comfort in underground space. In IOP Conference Series: Earth and Environmental Science, volume 861, page 072015. IOP Publishing, 2021.
[184] Ka Chun Shum, Hong-Wing Pang, Binh-Son Hua, Duc Thanh Nguyen, and Sai-Kit Yeung. Conditional 360-degree image synthesis for immersive indoor scene decoration. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4478–4488, 2023.
[185] Matias del Campo. Deep house – datasets, estrangement, and the problem of the new. Architectural Intelligence, 1(1):12, 2022.
[186] Matias Del Campo, Sandra Manninger, and Alexandra Carlson. Hallucinating cities: a posthuman design method based on neural networks. In Proceedings of the 11th annual symposium on simulation for architecture and urban design, pages 1–8, 2020.
[187] Wenliang Qian, Yang Xu, and Hui Li. A self-sparse generative adversarial network for autonomous early-stage design of architectural sketches. Computer-Aided Civil and Infrastructure Engineering, 37(5):612–628, 2022.
[188] Sisi Han, Yuhan Jiang, Yilei Huang, Mingzhu Wang, Yong Bai, and Andrea Spool-White. Scan2drawing: Use of deep learning for as-built model landscape architecture. Journal of Construction Engineering and Management, 149(5):04023027, 2023.
[189] Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, et al. Photorealistic text-to-image diffusion models with deep language understanding. Advances in Neural Information Processing Systems, 35:36479–36494, 2022.
[190] Omer Bar-Tal, Lior Yariv, Yaron Lipman, and Tali Dekel. Multidiffusion: Fusing diffusion paths for controlled image generation. 2023.
[191] Yiwen Chen, Zilong Chen, Chi Zhang, Feng Wang, Xiaofeng Yang, Yikai Wang, Zhongang Cai, Lei Yang, Huaping Liu, and Guosheng Lin. Gaussianeditor: Swift and controllable 3d editing with gaussian splatting. arXiv preprint arXiv:2311.14521, 2023.