AI Art in Architecture
https://doi.org/10.1007/s43503-023-00018-y
Joern Ploennigs¹* and Markus Berger¹
Abstract
Recent diffusion-based AI art platforms can create impressive images from simple text descriptions. This makes them
powerful tools for concept design in any discipline that requires creativity in visual design tasks. This is also true
for early stages of architectural design with multiple stages of ideation, sketching and modelling. In this paper, we
investigate how applicable diffusion-based models already are to these tasks. We research the applicability of the plat-
forms Midjourney, DALL·E 2 and Stable Diffusion to a series of common use cases in architectural design to determine
which are already solvable or might soon be. Our novel contributions are: (i) a comparison of the capabilities of public
AI art platforms; (ii) a specification of the requirements for AI art platforms in supporting common use cases in civil
engineering and architecture; (iii) an analysis of 85 million Midjourney queries with Natural Language Processing
(NLP) methods to extract common usage patterns. From this we derived (iv) a workflow for creating images for inte-
rior designs and (v) a workflow for creating views for exterior design that combines the strengths of the individual
platforms.
Keywords: Image generation, Diffusion models, Natural language processing, Architecture
*Correspondence: Joern Ploennigs, [email protected]
¹ AI for Sustainable Construction, University of Rostock, Rostock, Germany
1.2 State of the art in generative methods
The potential benefits of AI art platforms for creative work are hard to overstate. These AI art platforms all use generative machine learning models, specifically text-to-image generative models. Despite their specialization on generating images, many are based on the natural language model GPT-3, which is trained to generate text that completes a textual input query (Brown et al., 2020). More precisely, it predicts the next possible combinations of words for an input text. The specific Image GPT-3 models used by AI art platforms are instead trained to predict the next cluster of pixels, called a patch, in an image for a given input text.

The most recent generation of generative models combines natural language and so-called diffusion models. The idea for diffusion models was first proposed by Sohl-Dickstein et al. (2015): (structured) image information is slowly destroyed through a forward diffusion process that introduces noise into the image data, and then generated anew through a reversed diffusion process. This reverse process generates completely new image data, as the original information was fully destroyed by the noise. This approach was constantly improved over the years, with a strong focus on optimizing the underlying neural network architectures (Ho et al., 2020), resulting in several variants, like OpenAI's GLIDE model (Nichol et al., 2022). It consists of an encoder that creates a text encoding based on the user prompt, a model implementing the diffusion based on this text encoding, as well as an upsampler that upscales and denoises the result. Current diffusion models often implement the process of text encoding and the association of those text encodings with image parts with the CLIP (Contrastive Language Image Pre-training) architecture presented by Radford et al. (2021) and used by DALL·E 1.
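For reference, the two processes can be written as a minimal sketch in the standard notation of Ho et al. (2020), where x_t is the noisy image at step t, β_t the noise schedule, and θ the learned network parameters:

```latex
% Forward diffusion: each step adds Gaussian noise to the image x_{t-1}
q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t \mathbf{I}\right)

% Reverse diffusion: a trained network predicts how to denoise x_t,
% generating new image data step by step from pure noise
p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)
```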
One of the currently most advanced incarnations of the technology is DALL·E 2 (from here on simply referred to as DALL·E), based on the unCLIP method developed by Ramesh et al. (2022). It encodes both text and images into a diffusion-based joint representation space (the prior). The image generation is done by a similarly trained decoder, which translates the prior's encoding back into an image. Another main platform in the field is the open-source model Stable Diffusion, which is also based on the CLIP text encoder. The third contender, Midjourney, has not published its models, but it is assumed to use a similar structure.

This kind of diffusion model architecture can solve various image generation tasks. If a completely new image is generated from a user-written text prompt, a text-to-image model (txt2img) is used. If an existing image is modified based on a text prompt, an image-to-image model (img2img) is used, which changes the style or arrangement of the image based on the text prompt. If a certain part of the original image was deleted, the model can replace it with entirely new content based on the prompt; this approach is called inpainting by Lugmayr et al. (2022). A similar approach is outpainting, also called uncropping by Saharia et al. (2022), which adds additional content outside the image. If a user requests changes to the original image without manually deleting or masking out parts, this is called image editing. Image editing is not yet available in commercial AI art platforms, but there is recent work on single-image editing through text prompts, for example by Kawar et al. (2022). If another diffusion step is applied to add more detail to the image at a higher resolution, the image is upscaled, a method also called super-resolution by Saharia et al. (2022). Platforms often offer several or all of these methods, with configurable weights between the individual images and words.
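These operations map directly onto the pipeline classes of the open-source diffusers library for Stable Diffusion. The following is a minimal sketch; the model identifiers, the prompt, and the all-black placeholder mask are illustrative assumptions, not settings from this paper:

```python
import torch
from PIL import Image
from diffusers import (
    StableDiffusionPipeline,          # txt2img
    StableDiffusionImg2ImgPipeline,   # img2img
    StableDiffusionInpaintPipeline,   # inpainting
)

device = "cuda" if torch.cuda.is_available() else "cpu"
prompt = "modern single-family home, exterior view, photorealistic"

# txt2img: generate a completely new image from the text prompt
txt2img = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1").to(device)
image = txt2img(prompt).images[0]

# img2img: modify the existing image based on the prompt;
# `strength` configures the weight between input image and prompt
img2img = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1").to(device)
variant = img2img(prompt=prompt, image=image, strength=0.6).images[0]

# inpainting: regenerate only the white regions of the mask
# (an all-black mask is a placeholder that keeps the whole image)
mask = Image.new("L", variant.size, 0)
inpaint = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting").to(device)
result = inpaint(prompt=prompt, image=variant, mask_image=mask).images[0]
```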
Assessing the quality of all these models and architectures systematically is difficult. Attempts at quantitative evaluation are being made, such as Borji (2022). However, in such cases it is difficult to evaluate subjective metrics like style, i.e., whether an oil painting style looks better than a photorealistic one.

As for research into use cases from architecture, Seneviratne et al. (2022) describe a systematic grammar for using DALL·E to generate images in the context of urban planning. They open-sourced 11,000 images generated with Stable Diffusion and 1,000 created with DALL·E using that grammar. They found that, although many realistic images could be generated, the model has weaknesses in creating real-world scenes with a high level of detail. But these models advance quickly, and DALL·E 2 outperforms DALL·E 1 significantly.

Recent progress was also made in generating video (Ho et al., 2022), 3D models via point clouds (Luo & Hu, 2021; Zhou et al., 2021; Zeng et al., 2022), and even 3D animation data (Tevet et al., 2022). While this is beyond the scope of this paper, the generation of 3D models in particular will be revolutionary for architecture. The 3D workflows would be similar to the image-based workflows we present.

2 Model Architectures and Interfaces
Although generative art has been a research area for years, it only entered public perception with the advent of publicly available diffusion model platforms like DALL·E, Midjourney or Stable Diffusion, which combine txt2img, img2img, inpainting and upscaling into easy-to-use workflows.
Fig. 1 Model architecture and image generation process in different models. Grey elements show the AI workflow, coloured elements the user interaction
These models are not only competing on the technical level, but also in terms of user experience. Midjourney² directly interacts with its community by sharing queries across public (or private) channels in the Discord messaging app. It does not provide a dedicated user interface, but simply returns the generated images as a chat response to the query. Direct interactions are possible with attached links that usually result in new queries. In contrast, DALL·E 2³ is only accessible to individuals through a dedicated web-based user interface with authoring and editing tools. It provides a simple query interface without additional query parameters. Stable Diffusion, on the other hand, is released as open source, which fosters a plethora of community-created tools that are usually used more than the official web interface⁴.

The internal model architecture as well as the interface paradigm influence how these models can be utilized. In Fig. 1 we illustrate some of the similarities and differences between DALL·E 2, Stable Diffusion (v2.1), and Midjourney (v4). The core workflows of the foundational models, shown in grey, are similar across technologies, as stated before. Differences lie in the workflows that they provide, which often result from their different interface approaches.

It is apparent in Fig. 1 that while the core workflow in grey is similar, the ways to refine the results vastly differ. Only Midjourney offers successive upscaling of resolution and does so with multiple different sizes. Thus, Midjourney's workflow focuses on generating and comparing different image variants and then upsampling the best results in multiple ways, with limited possibilities to remix the original text prompt throughout.

In contrast, both DALL·E and Stable Diffusion allow direct editing of uploaded images or previous results. They do not provide traditional image editing tools like drawing, filling, layering, or stamping. Instead, all image editing must be done by img2img-based operations like erasing sections (inpainting) or extending the canvas (outpainting). All networks have ways to create images of different sizes and aspect ratios, either by specifying the size in the query or by altering it later through outpainting.

Notably, image generation only takes a few tens of seconds in all models, making it fast enough to use in creative sessions alone or with clients. All three models also allow importing external images in some capacity. Therefore, they can easily be combined into composite workflows, as shown in Sect. 5.2.

One aspect not included in Fig. 1 is the training data. There is little information on the training data used for DALL·E and Midjourney. However, Stable Diffusion was trained on the LAION-5B dataset (Rombach et al., 2022), which is based on image and text data scraped from the web. Similar internet datasets are very likely the source for Midjourney and DALL·E. It is evident that, whether biased by the training data or the training process, these models have developed very different image styles. DALL·E and Stable Diffusion are good at generating both drawn images and photorealistic outputs. Midjourney tends towards a more artistic style, especially in earlier model versions. But with carefully chosen keywords, most styles can now be targeted by all models.

² https://midjourney.com
³ https://openai.com/dall-e-2
⁴ https://beta.dreamstudio.ai
Table 1 Top part: comparison of the platforms (DALL·E 2, Midjourney v4, Stable Diffusion v2.1) and their supported features of txt2img, img2img, in-/outpainting, editing, upscaling, and semantics, rated as full, limited, bad, or no support. Lower part: mapping of the architectural use cases of Sect. 3 to these features, rated by the importance of each feature (high, some, low, or no importance) versus how well it works (well, somewhat, a little, not at all)
3 Architectural use cases
Given the discussed differences between the platforms, they vary in the architectural use cases that they support. To analyse this, we collected a series of use cases where architects and planners normally create or edit images. A direct comparison between the platforms and the use cases is often complicated, because the style and quality of the results heavily depend on the input prompts. We identified that the more differentiating factor is whether a platform supports a specific technical feature that is required to realize the use case.

Therefore, we evaluated for each use case which features it requires and how well it is qualitatively supported by each platform. In addition to the image operations explained in the previous chapter, we also consider support for architectural semantics, i.e., structured knowledge that goes beyond common image training datasets. Table 1 shows the results for the use cases that we discuss below:

• Ideation: Developing ideas by randomly generating images for inspiration. This is what txt2img models are made for, and it works splendidly. Additional image prompts can add style and object references.
• Sketches: Drawing architectural sketches with a specific target style and items. This works well for common examples in the training data, but less so for specific requests.
• Collages: Combining and filling existing images with life by adding people and objects. This can be done through inpainting for individual items, but not through generic requests like "Add many people".
• Image combination: Taking multiple existing image elements (for example multiple buildings), arranging them on a canvas and then creating a coherent composite image.
• Build variants: Taking an existing sketch or picture and generating versions in which certain elements are altered (like adding a garage). This works well through inpainting.
• Style variants: Taking an existing image and transforming its style (e.g., a sketch to photorealistic art deco) without changing content. This works well with certain models.
• Construction plans: Creating detailed layout plans to establish spatial relations. This rarely works, as the models do not understand the semantics of line styles, areas, etc.
• Exterior design: Finding a style and feeling for a building and the surrounding area/landscape. This works well for common scenarios.
• Interior design: Finding a style or feeling for an interior space. This also works well in many scenarios.
• Creating textures: Creating tiled patterns to serve as surface materials for 2D or 3D models. This is currently a unique feature of Midjourney.

Even though Stable Diffusion seems to support fewer features than DALL·E 2, its main advantage cannot be overstated: it is possible to run it locally and train it on one's own data to introduce, e.g., new architectural concepts. Together with the high quality of Stable Diffusion's outputs, this makes it the most potent of the three models to be specialized on architectural drawings.
4 Analysis of architectural queries
We also explored how people use these AI art platforms in practice. We analysed about 85 million user queries that we collected from Midjourney over one year, starting Jan. 30th, 2022. It is the only platform for which many user queries are publicly visible. Midjourney uses the Discord messaging app as its main interface, which allowed us to monitor the public channels for queries that we consider to be of an architectural nature. We selected queries containing either the word "architect", "interior" or "exterior" design, or one of 38 architectural keywords like "building", "facade", or "construction" (listed in Fig. 2(b) and in footnote 5). We identified these keywords by selecting only those terms from architectural glossaries⁶,⁷ that co-occurred with "architect", "interior" or "exterior" in at least 10% of all cases in the queries. We also added to the list of keywords the names of 941 famous architects from Wikipedia⁸, as we noted that several queries refer to their style by naming them. By applying these filters, we identified 5.7 million queries (6.7%) with potential architectural intent, including 2.2 million queries (2.6%) explicitly containing "architect", "interior" or "exterior" design.

⁵ Keywords not listed in Fig. 2(b) due to low count: balcon, basilica, battlement, buttress, gable, hvac, latticework, livingroom, minaret, panelling, pavilion, plinth, rotunda, spire.
⁶ https://www.heritage.nf.ca/articles/society/architectural-terms.php
⁷ https://en.wikipedia.org/wiki/Glossary_of_architecture
⁸ https://en.wikipedia.org/wiki/List_of_architects

In the next step we filter out stopwords (e.g. "a", "and", "the") and build a Word2Vec model (Mikolov et al., 2013) from these queries to obtain a model of the occurrence and co-occurrence of terms. For understanding the results, it is important to know that most Midjourney users do not formulate full sentences; rather, a prompt is a collection of terms that refer to the content, style, or render quality of the targeted image. A minimal sketch of this pipeline is shown below.
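The sketch uses the gensim library; the input file name, the truncated keyword set, and all thresholds are illustrative placeholders, since the query dataset itself is not public:

```python
import re
from gensim.models import Word2Vec
from gensim.parsing.preprocessing import STOPWORDS

# Placeholder input: one Midjourney prompt per line (the study used ~85M)
queries = open("midjourney_queries.txt", encoding="utf-8").read().splitlines()
keywords = {"architect", "interior", "exterior", "building", "facade"}

def tokenize(query):
    # lowercase, split on non-letter characters, drop stopwords like "a", "the"
    return [t for t in re.split(r"[^a-z]+", query.lower())
            if t and t not in STOPWORDS]

# Keep only queries with potential architectural intent
tokenized = [tokenize(q) for q in queries]
filtered = [toks for toks in tokenized if keywords & set(toks)]

# Model of term occurrence and co-occurrence (Mikolov et al., 2013)
model = Word2Vec(sentences=filtered, vector_size=100, window=5, min_count=10)

# Most probable co-located terms per keyword, e.g. for the keyword links
# in Fig. 2(d) or an auto-complete function for architectural queries
print(model.wv.most_similar("interior", topn=5))
```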
Figure 2 visualizes the main results of our analysis. Figure 2(a) shows the most frequent terms used in the filtered queries. The colour blue represents the frequency across all 85 million queries, red the frequency within the filtered 5.7 million queries, and green the frequency within the 2.2 million queries explicitly containing "architect", "interior" or "exterior" design. Note that red, green, and blue are mutually inclusive and overlap. The top 10 words have a similar frequency across all three classes. Many of those refer to Midjourney style commands like "detailed", "realistic", or "cinematic". However, some terms like "black", "creative" or "full" have a high overall frequency, but a low frequency in the architectural context. Other terms like "architecture", "interior" or "house" occur exclusively within our filtered results, as they are part of our keyword list.

Figure 2(b) shows our extended keyword list and the keywords' respective frequencies. As we filter on these keywords, their total frequency is identical to the filtered one, and we do not display it. It is of note that "architecture" and "interior" are the most and third most frequent keywords. It is also notable that "interior" is six times more popular than "exterior" as a keyword, but this may simply be because users refer to exterior design implicitly through "architecture".

Figure 2(c) shows the frequency of famous architects, which we extracted from 430,333 queries that referred to at least one of them. Zaha Hadid is by far the most frequently queried architect, given her well-recognizable parametric style that is popular in the community of people experimenting with AI tools. Michelangelo and William Morris are second and third. The low red bar shows that they are usually used not in an architectural context but for their art contributions. Adrian Smith is also often used in other contexts, probably referring to the musician. The architects Frank Lloyd Wright, Tadao Ando, Frank Gehry, Antoni Gaudí, Lebbeus Woods, and Peter Zumthor complete the top 10 and are often used in an explicitly architectural context, given the strong red bar.

Figure 2(d) shows the links between keywords and their most likely connected terms. We analysed this by predicting with the Word2Vec model, for each keyword on the left, the most probably co-located word on the right, weighted by probability. Interesting combinations here are the links interior-design, floor-plan-drawing, architecture-visualization, cathedral-gothic, or swimming-pool. From this it is possible to build an auto-complete function for architectural queries.

Figure 2(e) shows the mean length of queries depending on whether they were upscaled, remastered, or left in draft mode. A draft mode image is of low image size, so users will normally upscale or remaster it if they like one of the variants. It is notable that for the medium upscale options as well as for the remastered version, the mean query length increases to above 35 terms per query, in comparison to 30 terms for draft mode queries. We also manually classified the 150 most frequent terms into the categories style, content, and quality. It is notable that for the upscaled and refined queries, the percentage of style terms increases significantly from 6.6% to 8.3%.

Figure 2(f) shows the mean number of iterations needed to develop a query.
We classify a query as an iteration if the same user reruns the same or an extended query within 30 min. 54% of all unique queries (single, 852k) are run only once. The other half of the queries are improved in multiple iterations. Queries that remain in draft mode require about 3.7 steps. The 7.8% of queries that are good enough to be upscaled require about 6.75 steps in total; they are upscaled after 4.1 draft steps into different variants (light, medium/beta, max). The 5.3% of queries that are remastered take about 5.1 iterations; they have only 1.4 draft mode queries, but 1.2 remastering steps and 2.3 final upscale steps.
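A minimal sketch of this iteration grouping is given below; the record layout of (user, timestamp, prompt) tuples sorted by user and time is an assumption for illustration, while the 30-minute window and the "same or extended" rule follow the classification above:

```python
from datetime import timedelta

def iteration_chains(records, window=timedelta(minutes=30)):
    """Group queries into iteration chains: a query continues a chain if the
    same user reruns the same or an extended prompt within the time window.
    `records` is assumed to be (user, timestamp, prompt), sorted by user, time."""
    chains, last = [], {}
    for user, ts, prompt in records:
        prev = last.get(user)
        if prev and ts - prev[0] <= window and prompt.startswith(prev[1]):
            last[user] = (ts, prompt, prev[2] + 1)  # extend the current chain
        else:
            if prev:
                chains.append(prev[2])  # close the previous chain
            last[user] = (ts, prompt, 1)  # start a new chain
    chains.extend(length for _, _, length in last.values())
    return chains  # chain lengths; length 1 = query run only once
```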
This analysis illustrates that users do not come up with perfect queries from scratch. We can derive multiple insights:

• 6.7% (5.7 million) of all queries are related to architecture;
• "architecture" is the most popular keyword, and "interior" is six times more popular than "exterior" as a keyword;
• only 13.1% of all unique queries are upscaled or remastered;
• these queries are usually refined in more than 4 iterations on average;
• these queries usually contain more keywords specifying style and quality.

5 Case studies
In this section, we present some refined workflows for architects that utilize the strengths of all three AI art platforms. These workflows are based on the learnings of our analysis and on a large number of experiments to identify the workflows leading to the best results. It should be noted that Midjourney v3 is in use here, which raises the importance of the remastering step; remastering can usually be skipped as of model version 4. As we identified in the analysis, users rarely run only a single query and gain a perfect result; instead, they iterate many times. This requires a good understanding of effective workflows to avoid dead ends and come to adequate results quickly.
5.1 Interior design—comparing the models
First, we will look at how the models perform purely on their own. The example is an interior design scenario. We start with a simple query without any platform-specific commands. On all three platforms, we try to generate a high-quality rendering of a room using the prompt "cozy living room, wood paneling, television, large sofa, natural light, lived in, realistic, full view". We developed this prompt by observing common prompting patterns in the Midjourney query data and testing different iterations across the models until we arrived at this final version. Each comma is considered a topic separator by the AI art platforms. Thus, we are asking for an image of a (i) cozy living room that (ii) has wood paneling, (iii) contains a TV, (iv) a large sofa, etc. With this we ensure that the resulting image should contain similar elements across platforms.

Midjourney starts out with several results that are stylized or of strange perspective, visible in Fig. 3(a, b). The first upscale of the chosen interior design in (c) greatly improves material quality and overall detail, but the central sofa remains an incoherent form in the centre of the image. Anytime the normal image output does not attain a sufficient level of cohesion and realism, we can invoke the "remaster" step, shown upscaled in (d). However, even this last result contains smaller perspective errors, which are difficult to fix without intervention through manual image editing.

The first DALL·E 2 results in Fig. 3(e) show that it struggles to correctly respond to the "realistic" term in the query. Once one of the two realistic variants is picked, the next variant generation step in (f) creates more useful results. To remove perspective or coherence errors, we mask out certain areas of the image in (g) to generate inpaint variants. The final variant is shown in (h). Of note is that even the inpainted regions react correctly to the previously established lighting of the scene. This result is of good quality, but cannot be upscaled any further within the web interface.

Fig. 3 Minimal workflow for Midjourney (a–d), DALL·E 2 (e–h), and Stable Diffusion (i–l) for the given query: (a) Midjourney original query, (b) variant selection, (c) upscale, (d) remaster and maximum upscale; (e) DALL·E original query, (f) variant selection, (g) erase tool to remove artifacts, (h) final inpaint with added "table"; (i) Stable Diffusion original query, (j) resize selection and erase artifacts, (k) variants of in-/outpainting, (l) final selection
Stable Diffusion starts out in Fig. 3(i) with much stronger results than its two contenders, generating images that incorporate the "realistic" and "lived in" aspects of the query very well. These rooms look inhabited and not like artificial renderings. However, all images seem like close-ups of a proper interior view, which is why we do not just use inpainting in (j) to fix errors, but also add additional canvas space for outpainting. This results in some quite incoherent variants for the outpainted areas. After some additional iterations, the variant in (l) was selected as the best.

Overall, Stable Diffusion performs best in this scenario. All of its first variants were realistic and showed no real deficiencies in the later steps. Only during the final outpainting stage was it necessary to manually smooth the transitions. Midjourney generated a final result of similar quality, but offered a weaker beginning selection, with not all images containing all prompted elements (e.g. a missing TV) and with frequent perspective errors and other visual faults.

5.2 Exterior design—combining the models
The best results can be achieved by combining the strengths of all three models and knowing their specific command keywords. For example, one of the first hints that DALL·E shows new users is that the keyword "digital art" can significantly improve many prompts, which is deeply related to the data it was trained on.

In the case of Midjourney, refinement starts as a process of including and removing certain phrases within the prompt to get as close to a desired style as we can. These phrases can be very convoluted. It often helps to include the kinds of modelling software that would create the desired kind of image (like "octane render" or "cinema3D"), or even quality signifiers like "top 10 on artstation", in the prompt. A much more directed way to influence queries is the use of word weights, image references and parameters, which add additional parameterization to the prompt system.

In contrast, DALL·E and Stable Diffusion provide more control over changes with their in- and outpainting tools. Once we understand these tools and the strengths of the platforms, we can combine them into a more flexible workflow. In the following ideation workflow, we will start with a Midjourney prompt to create a desired scene. While Stable Diffusion tends to generate better looking first results, Midjourney is a strong contender once the remaster step is done, and its workflow excels at free-form ideation, which makes it the most appropriate starting point for an exterior design. From the Midjourney remaster stage, we will then refine any errors or undesired results through in-/outpainting in DALL·E and Stable Diffusion to attain the targeted result. An overview of the workflow is provided in Fig. 4, and we will explain it along the results in Fig. 5. The workflow highlights how the different image editing steps can be combined to get the best results independent of the AI tool used; the specific models most appropriate for each step may change with newer versions. The logic behind the workflow targets the best image quality by: (1) generating the image; (2) upscaling it; (3) extending the canvas; (4) finally editing details with inpainting.
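A compressed sketch of these four steps as a single script follows, using Stable Diffusion pipelines from the diffusers library as stand-ins for all three platforms (in the actual workflow, steps (1)–(2) run in Midjourney and steps (3)–(4) in DALL·E or Stable Diffusion); the model identifiers, sizes, and border mask are illustrative assumptions:

```python
from diffusers import (
    StableDiffusionPipeline,
    StableDiffusionUpscalePipeline,
    StableDiffusionInpaintPipeline,
)
from PIL import Image

prompt = ("single-family home with garden, full exterior view, "
          "modern architecture, photo, sunlight")

# (1) Generate the initial image (the role Midjourney plays in Fig. 4)
base = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
image = base(prompt).images[0].resize((256, 256))

# (2) Upscale it with a super-resolution diffusion step (x4)
upscaler = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler")
image = upscaler(prompt=prompt, image=image).images[0]

# (3) Extend the canvas for outpainting: paste the image onto a wider
# canvas and mark only the new border regions for regeneration
canvas = Image.new("RGB", (image.width + 512, image.height), "black")
canvas.paste(image, (256, 0))
mask = Image.new("L", canvas.size, 255)               # white = regenerate
mask.paste(Image.new("L", image.size, 0), (256, 0))   # black = keep

# (4) Edit details with inpainting over the masked border regions
inpaint = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting")
result = inpaint(prompt=prompt, image=canvas, mask_image=mask).images[0]
```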
Figure 5(a–c) shows the beginning stages of an idea as generated in Midjourney. The query that led to this particular result was "single-family home with garden, full exterior view, modern architecture, photo, sunlight". Multiple keyword arrangements and weights on different terms were tried before this result was selected, remastered and then upscaled. It is also possible to start with one or more reference images, though that technique was omitted here.

The result was then uploaded to DALL·E, where it was outpainted in Fig. 5(d) to create a wider viewing angle and subsequently inpainted in Fig. 5(e, f) to replace unwanted details like the cables hanging in the air or the differently colored windows.

After unsuccessful attempts to add a paved path from the sidewalk to the entrance through inpainting in DALL·E, we transferred image (f) to Stable Diffusion.

Fig. 4 The proposed combined workflow over Midjourney, DALL·E and (optionally) Stable Diffusion
Fig. 5 Refinement and variant generation in Midjourney (a–c), DALL·E 2 (d–f), and Stable Diffusion for a walkway (g) and a second story (h, i): (a) Midjourney original query, (b) variants, (c) remastered and upscaled; (g) Stable Diffusion inpainted, (h) 2nd story variant 1, (i) 2nd story variant 2
(a) “architectural floor plan of a (b) “red brick building facade (c) “a drawing of a map of five
modern single-family home, with a dutch clock gable and a buildings arranged around a
ground floor” bow window” park, realistic, detailed,
top-down, site plan, digital
rendering”
Fig. 6 Example failure cases
themselves. The specific architectural element "bow window" was turned into a window with a bow over it. The "clock gable" was represented by a different gable element with a clock below it. Case (c) shows a query for a landscape design with five buildings, which invokes the common problem that these AI tools are simply bad at counting and at complex spatial arrangements beyond foreground and background.

6 Conclusion
In this paper we looked at AI art generation tools and their applicability to architecture and civil engineering. We compared three of the currently available AI art platforms and identified use cases that can be tackled now or are soon to be unlocked. To understand how users are already using these tools, we analyzed millions of queries, providing some insights into how users iterate. Finally, we presented two workflows, for interior and exterior design, with the latter combining the strengths of the different platforms.

The various use cases shown in this paper illustrate the strong potential of AI tools in architecture. The AI platforms still struggle with more complex prompts, usually due to missing semantic understanding of the image content. A floor plan, for example, is not just a collection of lines: these lines carry semantic and contextual information, such as that they form walls enclosing a room with a door to get in. As we already have Building Information Models (BIM) that provide this semantic information, it is just a matter of time until new diffusion models arrive that are trained specifically on these data sets, and it is likely that they will be able to fulfil all requirements from Table 1.

Nonetheless, the high number of 5.7 million queries with architectural context that we identified shows that the tools are already being adopted. In the coming months and years these platforms will further improve. We can already see workflows across tools that converge toward Fig. 4. In the end, single platforms will deliver the full workflows for use cases like ideation, collages, and build and style variants, which will drastically improve productivity and creativity. It is thus likely that these tools will first be adopted for brainstorming sessions with clients and for competitions. We observed that the designs created by current AI tools tend toward organic forms, openwork facades and complex arrangements that break out of the common minimalistic modern design. With the ongoing research on automated evaluation of structural dynamics and in the field of additive and robotic construction technologies, more and more of these designs are becoming structurally and financially possible. This may form a perfect storm leading to a new generation of architecture styles based on AI-generated designs.

Author contributions
JP: Ideation and manuscript, Midjourney study, writing Sects. 1, 3, 4. MB: Use case experiments, writing Sects. 2, 3, 5. Both authors reviewed and edited all sections as well as read and approved the final manuscript.

Funding
No funding was received for conducting this study.

Data availability
The Midjourney datasets generated and analysed during the current study are not publicly available for data privacy reasons. Code for retrieving the data can be made available from the corresponding author on reasonable request.

Declarations

Competing interests
The authors have no financial or proprietary interests in any material discussed in this article.
References
Borji, A. (2022). Generated faces in the wild: Quantitative comparison of Stable Diffusion, Midjourney and DALL-E 2. arXiv preprint http://arxiv.org/abs/2210.00586
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., & Agarwal, S. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901.
Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33, 6840–6851.
Ho, J., Salimans, T., Gritsenko, A. A., Chan, W., Norouzi, M., & Fleet, D. J. (2022). Video diffusion models. ICLR Workshop on Deep Generative Models for Highly Structured Data.
Kawar, B., Zada, S., Lang, O., Tov, O., Chang, H., Dekel, T., Mosseri, I., & Irani, M. (2022). Imagic: Text-based real image editing with diffusion models. arXiv preprint http://arxiv.org/abs/2210.09276
Lugmayr, A., Danelljan, M., Romero, A., Yu, F., Timofte, R., & Van Gool, L. (2022). RePaint: Inpainting using denoising diffusion probabilistic models. CVPR (pp. 11461–11471).
Luo, S., & Hu, W. (2021). Diffusion probabilistic models for 3D point cloud generation. CVPR (pp. 2837–2845).
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems (Vol. 26).
Nichol, A., Dhariwal, P., Ramesh, A., Shyam, P., Mishkin, P., McGrew, B., Sutskever, I., & Chen, M. (2022). GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models. ICML.
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., & Krueger, G. (2021). Learning transferable visual models from natural language supervision. ICML (pp. 8748–8763).
Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., & Chen, M. (2022). Hierarchical text-conditional image generation with CLIP latents. arXiv preprint http://arxiv.org/abs/2204.06125
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. CVPR (pp. 10684–10695).
Saharia, C., Chan, W., Chang, H., Lee, C., Ho, J., Salimans, T., Fleet, D., & Norouzi, M. (2022). Palette: Image-to-image diffusion models. ACM SIGGRAPH (pp. 1–10).
Saharia, C., Ho, J., Chan, W., Salimans, T., Fleet, D. J., & Norouzi, M. (2022). Image super-resolution via iterative refinement. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Seneviratne, S., Senanayake, D., Rasnayaka, S., Vidanaarachchi, R., & Thompson, J. (2022). DALLE-URBAN: Capturing the urban design expertise of large text to image transformers. International Conference on Digital Image Computing: Techniques and Applications.
Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., & Ganguli, S. (2015). Deep unsupervised learning using nonequilibrium thermodynamics. ICML (pp. 2256–2265).
Tevet, G., Raab, S., Gordon, B., Shafir, Y., Cohen-Or, D., & Bermano, A. H. (2022). Human motion diffusion model. arXiv preprint http://arxiv.org/abs/2209.14916
Zeng, X., Vahdat, A., Williams, F., Gojcic, Z., Litany, O., Fidler, S., & Kreis, K. (2022). LION: Latent point diffusion models for 3D shape generation.
Zhou, L., Du, Y., & Wu, J. (2021). 3D shape generation and completion through point-voxel diffusion. CVPR (pp. 5826–5835).