
arXiv:2403.15698v1 [cs.CV] 23 Mar 2024


SceneX: Procedural Controllable Large-scale Scene Generation via Large-language Models

Mengqi Zhou1*, Jun Hou2*, Chuanchen Luo1, Yuxi Wang2Q, Zhaoxiang Zhang1Q, and Junran Peng1

1 Institute of Automation, Chinese Academy of Sciences
2 Centre for Artificial Intelligence and Robotics, Hong Kong Institute of Science & Innovation, Chinese Academy of Sciences

* Equal contribution    Q Co-corresponding authors

[Figure 1 panels: Cherry Blossoms Generation, Large-scale Forest Generation, Unbounded City Generation (with BlenderOSM). User descriptions: "Generate beautiful cherry blossoms" / "Generate large-scale forest" / "Generate a modern city."]

Fig. 1: The proposed SceneX can create large-scale 3D natural scenes or unbounded
cities automatically according to user instructions. The generated models are character-
ized by delicate geometric structures, realistic material textures, and natural lighting,
allowing for seamless deployment in the industrial pipeline.

Abstract. Due to its great application potential, large-scale scene generation has drawn extensive attention in academia and industry. Recent
research employs powerful generative models to create desired scenes
and achieves promising results. However, most of these methods repre-
sent the scene using 3D primitives (e.g. point cloud or radiance field)
incompatible with the industrial pipeline, which leads to a substantial
gap between academic research and industrial deployment. Procedural
Controllable Generation (PCG) is an efficient technique for creating scal-
able and high-quality assets, but it is unfriendly for ordinary users as it
demands profound domain expertise. To address these issues, we resort
to using the large language model (LLM) to drive the procedural model-
ing. In this paper, we introduce a large-scale scene generation framework,
SceneX , which can automatically produce high-quality procedural mod-
els according to designers’ textual descriptions. Specifically, the proposed
method comprises two components, PCGBench and PCGPlanner. The


former encompasses an extensive collection of accessible procedural as-
sets and thousands of hand-crafted API documents. The latter aims to gen-
erate executable actions for Blender to produce controllable and precise
3D assets guided by the user’s instructions. Our SceneX can generate a
city spanning 2.5 km × 2.5 km with delicate layout and geometric struc-
tures, drastically reducing the time cost from several weeks for profes-
sional PCG engineers to just a few hours for an ordinary user. Extensive
experiments demonstrate the capability of our method in controllable
large-scale scene generation and editing, including asset placement and
season translation. Our project page: https://scenex-lab.github.io/.

Keywords: Large-scale Scene Generation · LLM Agents · PCG

1 Introduction
With the rapid progress in generative modeling, 3D content generation has wit-
nessed remarkable achievements in automatic creation of 3D objects [35, 36, 39,
43], avatars [17, 27, 58], and scenes [9, 29, 31, 52]. In this work, we focus on au-
tomatic 3D scene generation, which has extensive applications in domains like
gaming, animation, film, and virtual reality. Although previous approaches have
achieved promising results, a significant gap still remains for industrial-grade applications because these methods represent scenes with 3D primitives (e.g., point clouds or radiance fields) that are incompatible with the industrial pipeline.
To bridge the gap, we resort to procedural modeling, an advanced technique
for the fast creation of high-quality 3D assets. It can generate realistic and intri-
cate 3D assets in natural scenes through adjustable parameters and rule-based
systems [32] using software like Blender. For example, Infinigen [42] proposes a
procedural generator that combines different procedural modeling algorithms to
generate large-scale natural scenes encompassing terrain, weather, vegetation,
and wildlife. [33], [5] and [46] use procedural modeling to generate city-level streets or layouts. These procedural approaches generate high-quality 3D assets at the object, scene, and city levels. However, the comprehensive grasp of generation rules, algorithmic frameworks, and individual parameters that procedural modeling demands makes the generation process beginner-unfriendly and time-consuming, especially for large-scale scene or city generation. For instance, generating a city like New York, as shown in Fig. 1, requires a professional PCG engineer working for over two weeks. Moreover, due to this substantial learning curve, it is formidable for ordinary users to generate large-scale scenes using procedural modeling.
To address the above problems, existing works such as 3D-GPT [45] and
SceneCraft [19] introduce an instruction-driven 3D modeling method by inte-
grating an LLM agent with procedural generation software. Despite the suc-
cessful collaboration with human designers to establish a procedural generation
framework, these methods still exhibit significant limitations. Firstly, SceneCraft
is unable to achieve fine-grained 3D asset editing due to the uneditable nature of
fixed 3D assets compared to those created by procedural generation. Secondly, 3D-GPT, being built on the parametric procedural generation model Infinigen, cannot fully utilize existing PCG resources. Additionally, it is incapable of generating high-quality and large-scale scenes, especially expansive
terrains and boundless cities. Lastly, the diversity of assets is limited due to
constraints imposed by prompt expressions.
To address these issues, we propose an instructive large-scale scene genera-
tion framework, SceneX , which can automatically generate high-quality scenes
according to designers’ textual descriptions. By introducing a PCGBench, we
collect extensive accessible procedural assets, including various terrains, cities,
and natural objects like plants and animals. This helps us to generate diverse, re-
alistic, and practical large-scale scenes. Moreover, we also construct thousands of
API parameter documents, which are used to formulate the agent planner process
and provide a quantitative evaluation. Inspired by existing LLM-driven planning
methods [28,44,49,51], we develop a PCGPlanner framework to produce correct
API actions interfaced with Blender, to automatically generate controllable and
precise 3D assets guided by users’ instructions. This LLM-based automated solu-
tion significantly enhances the efficiency of scene generation, reduces the learning
curve for procedural scene modeling, and enables the customized creation of tar-
get scenes. In addition to the inherent advantages of procedural modeling such
as photorealistic and real geometry, our SceneX possesses several key properties:
1. Efficiency: Benefiting from the proposed PCGBench and PCGPlanner, our
SceneX can rapidly produce extensive, high-quality 3D assets, including
terrain, city, and forest. Moreover, we only need a few hours to generate a
large-scale city, whereas it would take a professional designer over two weeks.
2. Controllability: SceneX can generate 3D scenes satisfying personalized de-
mands. For example, we can achieve scene editing according to the corre-
sponding descriptions, such as adding objects, placing objects at a location,
translating the season, and so on.
3. Diversity: SceneX breaks the limitations of specific generation algorithms
and predefined rules, leveraging advanced language model technology to
achieve more flexible and diverse large-scale scene generation. Specifically, we
generate large-scale terrains in the natural world, like deserts, lakes, forests,
and mountains, and high-quality cities in the real world, such as New York,
Paris, and Tokyo, as well as various objects, including plants, animals, and
buildings.

2 Related Works
2.1 Learning Based 3D Generation.
In recent years, 3D asset generation has witnessed rapid progress, combining
the ideas of computer graphics and computer vision to realize the free creation
of 3D content. Presently, predominant research in 3D asset generation primarily
concentrates on creating individual objects [8, 30, 35, 39], 3D avatars [18, 25, 27, 58], and 3D scenes [11, 16, 59, 61, 62]. Among these, Zero-1-to-3 [35] proposes a diffusion-based method that reconstructs a 3D model of a target object from a single image. DreamFusion [39] proposes a NeRF-based approach that generates a corresponding 3D model from input text.
Compared with the generation of a single object, the generation of large-scale
scenes, including the generation of natural landforms [9, 15, 34] and borderless
cities [29,31,52], is undoubtedly more practical and challenging in the metaverse
and game production. CityDreamer [52] builds vast, large-scale 3D cities based
on the layout of real cities, enhancing the accuracy and stability of urban recon-
struction. SceneDreamer [9] generates unbounded 3D scenes from 2D image collections using a BEV representation. However, these learning-based methods do not produce explicit geometric structures, making them difficult to apply directly to game engines such as Unreal Engine (UE). For example, although InfiniCity [31] and CityDreamer [52]
can generate visually appealing urban scenes, the absence of clear geometry may
lead to issues in practical applications, such as intersecting objects and spatial
discontinuities.

2.2 Procedural Based 3D Generation.

Researchers have delved into the procedural generation of natural scenes [12,60]
and urban scenes [33, 46, 48, 54] using Procedural Content Generation (PCG).
For instance, PMC [38] proposes a procedural way to generate cities based on 2D ocean or city boundaries. It employs mathematical algorithms to generate blocks and streets and rule-based procedures to generate the geometry of buildings. While these traditional computer graphics methods can produce high-quality 3D data, all parameters must be specified in advance for the procedural generation process. The resultant 3D data is bound by rule limitations, introduc-
ing a certain level of deviation from the real world. This significantly constrains
flexibility and practical usability. Infinigen [42] introduces a technique for pro-
cedurally generating realistic 3D natural objects and scenes. This methodology
facilitates the programmatic generation of all assets, with meshes and textures
generated through random mathematical algorithms. Although Infinigen gener-
ates endlessly varied realistic assets, users are unable to customize the generated outcomes to their specific requirements. Therefore, we propose a more convenient
procedural generation method to produce high-quality 3D assets.

2.3 LLM Based Task Planner

Benefiting from the knowledge embedded in large language models (LLMs) [1, 3, 10,
26,41], researchers explore LLMs to address intricate tasks beyond canonical lan-
guage processing domains. These tasks encompass areas such as mathematical
reasoning [22, 50], medicine [23, 53], and planning [14, 20, 21, 57]. Thanks to pow-
erful reasoning and generalization capabilities, LLMs act as practiced planners
for different tasks. For example, [20] utilizes the expansive domain knowledge
of LLMs on the internet and their emerging zero-shot planning capabilities to
execute intricate task planning and reasoning. [14] investigates the application of
LLMs in scenarios involving multi-agent coordination, covering a range of diverse
task objectives. [56] presents a modular framework that employs structured dia-
logue through prompts among multiple large pretrained models.

Fig. 2: (a) Illustrative comparison: a tree PCG example and its tree API documentation example. (b) Illustration of the text-to-image retrieval process.

Moreover, specialized LLMs for particular applications have been explored, such as Hugging-
GPT [44] for vision perception tasks, VisualChatGPT [51] for multi-modality
understanding, Voyager [49] and Ghost in the Minecraft [63] for open-world game agents, SheetCopilot [28] for office software, and Codex [6] for Python code generation. Inspired by these works, we apply an LLM agent to PCG software, e.g., Blender, to provide automatic 3D asset generation.

3 Methods

The proposed SceneX framework contains two main components: the PCGBench
and PCGPlanner. PCGBench is a large-scale dataset that includes diverse re-
sources in procedural modeling; it serves as an asset bank and as a benchmark for evaluating generation performance. Specifically, it includes plugins and assets collected from the Internet, asset annotations, and hand-crafted API documents.
Based on our PCGBench, a novel PCGPlanner framework is proposed to auto-
matically generate 3D scenes using both PCG and pre-defined models. It com-
prises a task planner stage, an asset retrieval stage, and an action execution
stage guided by agents with custom-built instructions. By the cooperation be-
tween the PCGBench and novel PCGPlanner agents, SceneX is endowed with
the capability of efficient, controllable generation of diverse, high-quality scenes.

3.1 PCGBench
To build PCGBench, we first collect abundant resources of PCG models from
Blender Market1 , which includes action APIs and 3D assets. The action APIs
consist of raw APIs and detailed parameters that are used to control asset gener-
ation. Specifically, we have collected 1,532 individual raw APIs in total, involving
each aspect in the scene-generation tasks, such as controlling generated assets’
placement, color, size, and height. Then, we manually merge the most essen-
tial raw APIs according to their functions and obtain 45 action functions. Each
action function contains multiple raw APIs with the corresponding parameters
to produce a complete 3D asset. For example, to generate a 3D tree as Fig
2 shows, we define two individual action functions such as tree_mesh_generate
and tree_material_generate. Each function comprises several raw APIs with the
corresponding parameters like TreeHight, CrownDensity, LeafDensity, etc. These
parameters can be manipulated at the action input, and we can generate various
trees by changing these parameters.
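To make this composition concrete, the following is a minimal, hypothetical sketch of such an action function. The parameter names (TreeHight, CrownDensity, LeafDensity) follow the example above, while the raw_* helpers are stand-ins for the plugin's raw APIs, stubbed here so the snippet is self-contained; they are not the actual Blender Market plugin calls.

    # Stubs standing in for the plugin's raw APIs (illustrative only).
    def raw_set_tree_height(level): print(f"set tree height: {level}")
    def raw_set_crown_density(level): print(f"set crown density: {level}")
    def raw_set_leaf_density(level): print(f"set leaf density: {level}")
    def raw_build_tree_mesh(): print("build tree mesh")

    def tree_mesh_generate(TreeHight="Medium", CrownDensity="High", LeafDensity="Medium"):
        """Action function: expose several raw APIs behind one parameterized call."""
        raw_set_tree_height(TreeHight)
        raw_set_crown_density(CrownDensity)
        raw_set_leaf_density(LeafDensity)
        raw_build_tree_mesh()

    # Varying the exposed parameters yields different trees from the same action function.
    tree_mesh_generate(TreeHight="High", CrownDensity="Low", LeafDensity="Medium")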
To achieve controllable 3D model generation guided by human instructions,
we formulate the action functions for large language models (LLMs), e.g., GPT-4 2 , following previous works [6, 45, 49]. To achieve this, we compile a large set of action-function guidance documents, denoted as API Documents, to expose our functions to LLMs. Concretely, each document includes information on the API type, function description, API name, and a set of raw APIs with the corresponding parameter details. Besides, we also provide a comprehensive record of all parameter names and their possible values. The details are shown in Fig. 2. Moreover, we have categorized these documents into four modules: {
Terrain, City, Detail, Weather}. Simultaneously, we have developed a concise
document outlining the scene generation process, providing detailed explanations
of the construction sequence and capability boundaries for each module.
Regarding asset collection, we have meticulously curated an extensive 3D
dataset comprising 1,908 finely crafted models and 1,294 textures. This diverse
collection spans categories including buildings, flora, urban infrastructure, and
textures gathered from the vast expanse of the Internet. These existing assets
provide an alternative solution for 3D scene generation: retrieving existing individual models rather than generating them progressively, which can significantly improve the efficiency of creating large-scale scenes. For retrieval, we use CLIP [40] encoder features to perform text-to-text and text-to-image retrieval. To support this, we also provide abundant annotations for the collected 3D assets by rendering a picture of each asset.
3.2 PCGPlanner
To achieve LLM-driven asset generation, we introduce a hierarchy of PCGPlanner agents that can automatically generate
high-quality and large-scale scenes corresponding to the users’ textual descrip-
tions. Following previous works [6,44,49,51], we design our module encompassing
1 https://blendermarket.com/
2 https://openai.com/gpt-4

[Figure 3 schematic: the dispatch, specialist, and execution agent prompt templates; CLIP text and image encoders producing embeddings for asset retrieval against API documentation and rendered asset previews; and the resulting action API calls (e.g., Stone_generate, Obj_scatter, Tree_generate) executed on 3D assets in Blender.]
Fig. 3: The PCGPlanner framework comprises three essential components: the task
planner, asset retrieval, and action execution. This framework empowers LLMs with
the capabilities for task planning in complex scenarios, utilizing multiple API actions,
and facilitating large-scale scene generation.

three primary stages: task planning, asset retrieval, and action execution. In the
task planning stage, a sequential planning agent pair is introduced to process
the user query and output a detailed sub-task list. Then, to precisely connect
the sub-task with our PCGBench, we retrieve the actions and assets mentioned
in the task list. Finally, the execution agent leverages the annotated action APIs
to process subtasks of procedural generation or asset manipulation.
Agent construction To endow the LLMs with the capability for modeling
scenes like a specialist, we introduce a systematic template to prompt LLMs
as agents to achieve the subtasks in a scene modeling task. For each agent,
the prompt P has a similar structure defined as P_i(R_i, T_i, D_i, F_i, E_i), where i ∈ {dispatch, specialist, retrieval, execution} distinguishes the responsibility of each agent. The constituents of the prompt are defined as follows (a minimal sketch of assembling such a prompt is given after the list):

• Role: Each agent is given a specific role R which describes its responsibility in the scene generation process.
• Task: T gives a detailed explanation of the goals for the agent; the constraints on executing these tasks are also expounded.
• Document: At each step, the agent is prompted with a knowledge document according to its task. We denote D as the collection that contains all the knowledge documentation pre-defined in PCGBench.
• Format: F denotes the output format for each agent. To precisely and concisely convey information between agents, the output format of each agent is strictly defined.
• Examples: To help agents respond strictly following the format, we provide several examples E for reference in each prompt.
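As an illustration of how these five constituents could be assembled into a single prompt string, a minimal sketch follows. The dispatch role text and the {T_dispatch}/{D_dispatch}/{F_dispatch}/{E_dispatch} placeholders mirror the template shown in Fig. 3; the AgentPrompt class itself is an assumption for illustration, not the released SceneX code.

    from dataclasses import dataclass

    @dataclass
    class AgentPrompt:
        role: str        # R: responsibility of the agent
        task: str        # T: goal and constraints
        document: str    # D: knowledge documentation from PCGBench
        fmt: str         # F: strictly defined output format
        examples: str    # E: few-shot examples

        def render(self) -> str:
            return (
                f"{self.role}\n- Your task is {self.task}.\n"
                f"- You must follow the rules defined in {self.document}.\n"
                f"- Output your answer in the form of {self.fmt}.\n"
                f"- Here is an example:\n{self.examples}\n"
                "Now, you must think about how to accomplish the task step by step."
            )

    dispatch_prompt = AgentPrompt(
        role="You serve as a Blender modeling task planner that helps me design a scene.",
        task="{T_dispatch}", document="{D_dispatch}", fmt="{F_dispatch}", examples="{E_dispatch}",
    )
    print(dispatch_prompt.render())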

Task Planner For a scene generation task, multiple action APIs from differ-
ent modules are usually needed. These APIs often need to be called in a specific sequential order. Thus, a correct function arrangement


between different modules and APIs is of essential importance. To ensure compe-
tence in handling a broad spectrum of scene-generation tasks and surmounting
token limitations, we present a two-stage iterative task planner comprising a
dispatch agent and a specialist agent as depicted in Fig. 3.
Dispatch agent The dispatch agent is the first agent to receive users’ descriptions of the target scene and is responsible for extracting information from the sentences. It serves as a planner that interprets user intents into a rough plan and lists all the objects mentioned in the input instruction. Since users may omit some common background objects in the description, the agent automatically adds to the rough plan all such objects that need to be taken into account.
The detailed object requirements of a scene and their module correspondence
principle are described in the knowledge documentation Ddispatch . We formulate
the dispatch agent as follows:

\{o_1, \dots, o_n\} \leftarrow Agent_{dispatch}(q, P_{dispatch}) \quad (1)

where q is the user input and o_i stands for an object listed in the rough plan in the format of {Module: Object name}.
Specialist agent Then, the specialist agent is activated to make further detailed
plans. At this stage, a specialist agent serves as a professional designer who is ca-
pable of giving detailed plans for the objects given in the rough plan. According
to the module class of the object, the specialist agent is prompted with the corre-
sponding knowledge documentation Dspecialist ⊂ D, where Dspecialist contains
the generation mechanisms and explanations for each module. The specialist
agent decomposes each rough plan into a detailed plan with several sub-tasks,
where each subtask corresponds to an executable API or an asset to be imported

\{s_1, \dots, s_n\} \leftarrow Agent_{specialist}(o_i, P_{specialist}) \quad (2)


where s_j stands for one of the sub-tasks to be completed.
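A minimal sketch of the two-stage planning loop described by Eqs. (1) and (2) is given below; call_llm is a placeholder for the underlying GPT-4 query, and the parsing of the agents' structured outputs is omitted, so this is illustrative rather than the actual SceneX implementation.

    def call_llm(prompt: str, user_input: str) -> list[str]:
        # Placeholder: in practice this queries GPT-4 and parses its structured output.
        return [f"parsed item for: {user_input}"]

    def dispatch_agent(query: str, p_dispatch: str) -> list[str]:
        """Rough plan: objects in the form '{Module: Object name}' (Eq. 1)."""
        return call_llm(p_dispatch, query)

    def specialist_agent(obj: str, p_specialist: str) -> list[str]:
        """Detailed plan: sub-tasks, each mapping to an executable API or an asset (Eq. 2)."""
        return call_llm(p_specialist, obj)

    objects = dispatch_agent("Generate a large-scale forest", "P_dispatch")
    for o in objects:
        subtasks = specialist_agent(o, "P_specialist")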
Asset Retrieval With the detailed plan given by the specialist agent, we need
to search in our PCGBench for a suitable API to execute or an asset to import.
To precisely retrieve them, we propose a retrieval agent that identifies whether
the target of the sub-task is an asset or an API. With the prompt P_retrieval,
the agent is also instructed to output a description as the search index of the asset
or the API. The description is then encoded into an embedding for the retrieval.
Here, a pre-trained CLIP [40] is used to search our action APIs by text-to-text
retrieval and 3D models by text-to-image retrieval, respectively. As shown in the
right part of Fig. 2, the rendered images of each asset in our PCGBench are encoded into normalized 768-dimensional vectors and then compared with the input description embedding; the similarity is measured by cosine distance. One of the five most similar results is randomly selected as the outcome and imported into the Blender scene.

\alpha \mid \gamma \leftarrow Agent_{retrieval}(s_j, P_{retrieval}) \quad (3)



where α ∈ A stands for a retrieved API, A stands for the API collection, and γ ∈ Γ is the retrieved asset from the asset collection Γ. For text-to-text retrieval, the most relevant API is likewise selected by the embedding of its functional description. If
the retrieval result is an asset, it will be directly imported into the scene project.
On the other hand, the retrieved API α is to be passed to the action execution
agent.
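A minimal sketch of this retrieval step is shown below, assuming the Hugging Face transformers CLIP implementation; the paper only specifies that a pre-trained CLIP encoder with normalized 768-dimensional embeddings, cosine similarity, and random selection among the top five matches is used, so the checkpoint name and helper structure here are illustrative assumptions.

    import random
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

    def embed_asset_images(image_paths):
        """Encode the rendered preview of each asset into a normalized embedding."""
        images = [Image.open(p) for p in image_paths]
        inputs = processor(images=images, return_tensors="pt")
        with torch.no_grad():
            feats = model.get_image_features(**inputs)
        return torch.nn.functional.normalize(feats, dim=-1)

    def retrieve_asset(description, image_paths, asset_feats, top_k=5):
        """Rank assets by cosine similarity to the description; sample one of the top-k."""
        inputs = processor(text=[description], return_tensors="pt", padding=True)
        with torch.no_grad():
            text_feat = model.get_text_features(**inputs)
        text_feat = torch.nn.functional.normalize(text_feat, dim=-1)
        sims = (asset_feats @ text_feat.T).squeeze(-1)   # cosine similarity on unit vectors
        top = sims.topk(min(top_k, len(image_paths))).indices.tolist()
        return image_paths[random.choice(top)]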
Action Execution In the PCGBench, each action API is defined in the form
of a Python function with multiple pre-defined parameters. After obtaining the
α, we need to make custom modifications for these initial parameters to meet
the task description. For each API α, there is a knowledge documentation D_α ∈ D that is passed into the prompt of the execution agent. As illustrated in Fig. 2 (a), the
knowledge documentation of each action API is recorded in a JSON format and
has the same architecture. Take tree_mesh_generate API as an example, each
parameter in the function is precisely defined and explained in tree_mesh_doc
= {"type": "tree mesh", "Description": "Function for generating trees mesh.",
"Function API": "tree_mesh_generate", "Parameter Information": {"Possible
values": ("High", "Medium", "Low"), ... } }. The execution agent can be ex-
pressed as:
\alpha^* \leftarrow Agent_{execution}(\alpha, P_{execution}, s_j) \quad (4)
where α* is the manipulated version of the retrieved API α. The description may not specify values for all parameters; in this situation, we instruct the execution agent to automatically choose a feasible value from the options according to the other parameters. The execution agent ultimately executes the action API with the inferred parameters in the Blender Python environment, incrementally adding new objects and making modifications to the scene project until all tasks are completed.
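The following sketch illustrates this execution step under simplifying assumptions: query_llm stands in for the execution agent's LLM call, the default-value fallback is illustrative, and the registry that maps API names to Python action functions is a hypothetical stand-in for the action APIs registered in Blender.

    tree_mesh_doc = {
        "type": "tree mesh",
        "Description": "Function for generating trees mesh.",
        "Function API": "tree_mesh_generate",
        "Parameter Information": {"TreeHight": {"Possible values": ["High", "Medium", "Low"]}},
    }

    def query_llm(subtask, doc):
        # Placeholder for the execution agent's LLM call; returns parameter assignments.
        return {"TreeHight": "High"}

    def execute_action(subtask, doc, registry):
        params = query_llm(subtask, doc)                     # manipulated parameters (alpha*)
        # Fall back to a feasible value when the description leaves a parameter open.
        for name, info in doc["Parameter Information"].items():
            params.setdefault(name, info["Possible values"][0])
        registry[doc["Function API"]](**params)              # run the action API in Blender

    registry = {"tree_mesh_generate": lambda **kw: print("tree_mesh_generate", kw)}
    execute_action("Generate a tall pine tree.", tree_mesh_doc, registry)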

4 Experiments
The goals of our experiments are threefold: (i) to verify the capability of SceneX
for generating photorealistic large-scale scenes, including nature scenes and cities;
(ii) to demonstrate the effectiveness of SceneX for personalized editing, such as
adding or changing; (iii) to compare different LLMs on the proposed benchmark.

4.1 Benchmark Protocol


Dataset. To evaluate the effectiveness of the proposed SceneX, we use GPT-4 to generate 50 high-quality scene descriptions, 50 asset descriptions, and 20 asset
editing descriptions. The scene descriptions involve natural scenes and cities.
Then, we feed them to our SceneX to generate corresponding models, which are
used to perform quantitative and qualitative comparisons.
Models. When generating and editing the 3D scenes, we adopt the leading
GPT-4 as the large language model via its public API. To ensure the stability of the LLM’s output, we set the decoding temperature to 0.
Metrics: We use Executability Rate (ER@1) and Success Rate (SR@1) to
evaluate the capabilities of LLMs on our SceneX.

Fig. 4: Visualization of the generation quality for large-scale scenes and cities. Input descriptions (top to bottom): "Early summer, under the bright sun, the forest is thriving with lush vegetation."; "Cherry blossoms bloom, green grass covers the ground, and a river flows calmly."; "Bright lights illuminate the modern city, showcasing a distinctive contemporary style."

The former measures the proportion of proposed actions that can be executed, and the latter is used to
evaluate action correctness [7]. Moreover, to quantify the aesthetic level of the
generated scenes, we adopt a unified judgment standard as a reference. We divide
the aesthetics of generated scenes into five standards: Poor (1-2 points)/Below
Average (3-4 points)/Average (5-6 points)/Good (7-8 points)/Excellent (9-10
points). We enlisted 35 volunteers to assess the quality of our generation, in-
cluding 5 PCG experts. We compute average score (AS) and average expert
score (AES) to evaluate the effectiveness of our method.
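For clarity, a minimal sketch of how these metrics could be computed is given below. The exact normalization of SR@1 (taken here as the fraction of executed actions judged correct) and the score aggregation are assumptions for illustration, as the paper defines the metrics only verbally and by reference to [7].

    def executability_rate(actions):
        """ER@1: proportion of proposed actions that can be executed."""
        return sum(a["executable"] for a in actions) / len(actions)

    def success_rate(actions):
        """SR@1: fraction of executed actions judged correct (normalization assumed)."""
        executed = [a for a in actions if a["executable"]]
        return sum(a["success"] for a in executed) / len(executed)

    def average_scores(ratings, expert_ids):
        """AS over all raters and AES over the expert subset (1-10 aesthetic scale)."""
        avg = lambda xs: sum(xs) / len(xs)
        return avg(list(ratings.values())), avg([ratings[i] for i in expert_ids])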

4.2 Main Results

Large-Scale Scene Generation. To intuitively illustrate the capability of


SceneX in generating large-scale scenes, we select three subsets with significant
differences for comparison. We report quantitative and qualitative results, including generation quality, aesthetic evaluation, and efficiency comparison.
Generation Quality. We first provide several examples of our SceneX for
generating large-scale nature scenes and unbounded cities. The results are shown
in Fig 4. From the results, we can observe that the proposed SceneX can produce
highly realistic scenes in both natural scenes and cities. Moreover, the generated
content is correctly corresponding to the provided texture descriptions. These
demonstrate the power and effectiveness of our proposed LLM-driven automatic
3D scene generation framework.
Furthermore, to compare with existing learning-based methods, we provide
a qualitative comparison with several representative methods on 3D city generation, such as Persistent Nature [4], SceneDreamer [9], InfiniCity [31], and
CityDreamer [52]. Fig. 5 shows the qualitative comparison results between the
learning-based city generation methods and SceneX. From the results, we can observe that learning-based methods commonly suffer from similar problems: low 3D consistency and structural distortions of buildings. For example, Persistent Nature [4] and InfiniCity [31] both exhibit severe deformation at the whole-scene level. SceneDreamer [9] and CityDreamer [52] have better

(Rows, top to bottom: Persistent Nature, SceneDreamer, InfiniCity, CityDreamer, Ours.)

Fig. 5: Comparative results on city generation. Issues with unreasonable geometry are
observed in previous works, while our method performs well in generating realistic
large-scale city scenes.

structure consistency, but the building quality is still relatively low. These fac-
tors limit their large-scale application in industry. In comparison, the proposed
SceneX generates highly realistic and well-structured urban scenes, avoiding the structural distortions and layout defects observed in learning-based methods. These results demonstrate the effectiveness of our method for large-
scale city scene generation.
Aesthetic Evaluation. To better evaluate the generation quality of SceneX ,
we collect the results of related works on text-to-3D generation and Blender-driven 3D generation. These results are subjected to aesthetic evaluation by a panel comprising 35 voluntary contributors and 5 experts in 3D modeling, with the scoring criteria outlined in Section 4.1. As shown in Table 1, our AS and AES scores surpass the second-highest scores by 2.10 and 0.9 points, respectively. Compared to the average level of other works, our method reaches a good level, indicating the high generation quality of SceneX.
To evaluate the consistency between text inputs and the generated assets,
we calculate the CLIP similarity between input text and rendered images. To
better illustrate the results, we utilize three different CLIP models for testing,
including ViT-L/14 (V-L/14), ViT-B/16 (V-B/16) and ViT-B/32 (V-B/32), re-
spectively. The detailed results are displayed in Table 2. We compare represen-
tative text-to-3D approaches (e.g. WonderJ [55], Text2Room [16], and Dream-
Fusion [39]) and Blender-driven 3D generation works (e.g. BlenderGPT, 3D-
GPT [45], and SceneCraft [19]). Although the similarity scores of text-to-3D
methods are higher, it is reasonable because their training or optimization in-
cludes the text-to-image alignment process.

Fig. 6: Visualization of the personalized editing results on different scenes. Editing instructions include, e.g., "Generate a tree with sprawling branches", "Make trees more lush", "Make the tree look like a dead tree", "a tree in autumn", "Generate a winter forest terrain", "increase the number of mountains", "lower the height of the mountain", and "convert snowy mountains to mud".

Table 1: Comparative analysis of average score (AS) and average expert score (AES).

Method | AS | AES
Magic 3D [30] | 4.48 | 3.50
DreamFusion [39] | 4.55 | 3.60
Text2Room [16] | 5.73 | 6.10
CityDreamer [52] | 5.47 | 6.80
Infinigen [42] | 5.42 | 6.00
3D-GPT [45] | 4.94 | 6.20
WonderJ [55] | 5.28 | 6.00
SceneX (Ours) | 7.83 | 7.70

Table 2: Assessing prompt-rendered result similarity (CLIP SIM) with various CLIP models.

Method | V-L/14 | V-B/16 | V-B/32
WonderJ [55] | 18.78 | 25.70 | 25.45
Text2Room [16] | 23.51 | 30.10 | 29.29
Magic 3D [30] | 27.86 | 31.78 | 31.94
DreamFusion [39] | 29.40 | 35.37 | 31.60
BlenderGPT | 21.23 | 25.65 | 26.19
3D-GPT [45] | 18.67 | 25.80 | 25.59
SceneCraft [19] | 22.04 | 25.82 | 25.30
SceneX (Ours) | 22.82 | 27.82 | 26.89

Compared to the Blender-driven 3D
generation works, SceneX achieves the highest score, indicating its capability to
accurately execute the input prompts and generate results.
Efficiency Evaluation. To illustrate the efficiency of SceneX, we report the time required by our method compared with Infinigen [42] and manual creation.
The experiments are performed on a server equipped with dual Intel Xeon Pro-
cessors (Skylake architecture), each with 20 cores, totaling 80 CPU cores. Addi-
tionally, we consult 3D PCG experts to determine the time needed to construct
the same natural and urban scenes. The comparison results are shown in Table 3.
From the results, we can observe that SceneX is 7 times faster than Infinigen [42] in generating a scene of 150 m × 150 m. Our method is also nearly 30 times faster than human experts creating a large-scale city by hand. This demonstrates
the impressive efficiency of SceneX in both large-scale natural scene and urban
scene generation.
Table 3: Comparing the time required for natural scene generation and city generation at different terrain scales.

Method | 150m × 150m (Scene / City) | 500m × 500m (Scene / City) | 2.5km × 2.5km (Scene / City)
Human | 1h / 40min | 3h / 4h | - / >3w
Infinigen [42] | 14min / - | - / - | - / -
SceneX (Ours) | 2min / 1.5min | 10min / 6min | - / 20h

Personalized Editing Results. To demonstrate the capability of our method for personalized editing, we conduct experiments on 3D asset generation guided by users’ instructions. The results are shown in Fig. 10. It is evident that the changes in the editing text are closely reflected in the modifications of the 3D assets. SceneX demonstrates a versatile, highly controllable, and personalized edit-
ing ability by manipulating 3D assets from various perspectives. These results


demonstrate that our method supports user-instruction editing, significantly re-
ducing the difficulty of 3D asset generation and accelerating the industrial pro-
duction process.

4.3 Ablation Study


In order to analyze the impact of various components within the systematic
template on the system, we conduct an ablation study based on the tree plugin
in PCGBench. During the experiment, we utilize the dataset from Section 4.1
for testing, maintaining a consistent format. We incrementally add or remove
different parts of the systematic template, using ER@1 and SR@1 as metrics to
observe the impact of various components on the system. The results are shown
in Table 4. It is evident from the results that augmenting the Task, Document,
Examples and Role components contributes to an increase in ER@1 and SR@1.
Among these, The inclusion of the Example component results in the most sig-
nificant improvement, resulting in a maximum increase of 76.00% and 51.09% in
ER@1 and SR@1 respectively. Conversely, the Role component has the least im-
pact, with maximum increases of 2.00% and 2.59% in ER@1 and SR@1 after its
addition. These results suggest that the setting of Task significantly impacts the
performance of LLMs. By clarifying the goals and requirements of the task, the
system can better understand the user’s intent and better meet the needs when
generating output. Examples play an important role in designing agent prompts. By providing concrete examples, the system can better understand the user’s intent and needs and produce high-quality output related to the input text. The inclusion of documentation is also important for designing agent prompts. Doc-
umentation provides background information that can help the system better
meet user expectations when generating output.

4.4 Comparing with Different LLMs


To investigate the performance of different variants of large language models
(LLMs) in SceneX, we test public LLM APIs such as GPT-3.5-turbo and GPT-4, as well as sev-
eral external open-source LLMs in this subsection. To ensure the stability of
LLM outputs, we set the temperature of LLM to 0 for all experiments. We
conduct experiments on 3D scene and asset generation based on 50 scenario de-
scriptions and 50 object descriptions in Sec. 4.1. The results are presented in
Table 5. It is evident that gpt-4 delivers the most superior performance, with
Mistral closely following as the second-best.

Table 4: Results of different prompt components for tree asset generation. Examples, Role, Document and Task represent four integral components of the prompt template.

Task Document Examples Role ER@1 SR@1
1 ✓ 16.00 25.00
2 ✓ 42.00 47.62
3 ✓ ✓ ✓ 84.00 71.43
4 ✓ 92.00 73.91
5 ✓ ✓ 92.00 76.09
6 ✓ ✓ 92.00 76.09
7 ✓ ✓ ✓ 92.00 78.26
8 ✓ ✓ ✓ 94.00 78.72
9 ✓ ✓ ✓ ✓ 94.00 80.85

Table 5: Comparing the performance of different language models in natural scene generation and asset generation.

Model | Scene Generation (ER@1 / SR@1) | Asset Generation (ER@1 / SR@1)
Llama2-7B [47] | 30.00 / 53.33 | 38.00 / 57.89
Llama2-13B [47] | 44.00 / 59.09 | 54.00 / 66.66
Mistral [24] | 76.00 / 68.42 | 94.00 / 85.11
Gemma-2B [13] | 6.00 / 33.33 | 22.00 / 45.45
Gemma-7B [13] | 36.00 / 55.56 | 68.00 / 73.53
GPT-3.5-turbo [2] | 66.00 / 60.60 | 82.00 / 82.93
GPT-4 [37] | 86.00 / 86.05 | 96.00 / 85.42

Due to its superior performance
and lower hardware requirements, the open-source Mistral is a highly appealing


option. When compared to asset generation, the executability and success rates
noticeably decline during the generation of large-scale natural scenes, a trend
that can be attributed to the increased task complexity. In particular, as the
number of components involved in the system expands, the LLM may face chal-
lenges in maintaining accuracy. Nonetheless, our method exhibits consistent
performance across different LLMs, maintaining high levels of executability and
success rates.

5 Conclusion

In this paper, we introduce an advanced framework for large-scale scene gener-


ation, denoted as SceneX . This framework automatically generates high-quality
procedural models based on designers’ textual descriptions. The proposed method
consists of two main components: PCGBench and PCGPlanner Agent. PCG-
Bench includes an extensive repository of accessible procedural assets and thou-
sands of meticulously crafted API documents. On the other hand, the PCGPlan-
ner Agent can generate executable actions for Blender to create controllable and
precise 3D assets following user instructions. Utilizing SceneX , we can generate
a city covering 2.5 km × 2.5 km with accurate layout and geometric structures.
This significantly reduces the time required, from two weeks for several profes-
sional PCG engineers to just a few hours for an ordinary user. Our method’s
capabilities in controllable large-scale scene generation and editing, including
tasks like asset placement and season translation, are validated through exten-
sive experiments.
Limitations Although the proposed SceneX is effective for the LLM-driven
procedural large-scale scene generation, it still faces several limitations: 1) The
performance of our method relies on the pretrained LLM model. This reliance
may constrain the framework from generalizing to a wider range of applications.
2) We have collected a limited number of assets and APIs, which limits the vari-
ety of generated scenes and degree of action space. In the future, it is promising
to fine-tune the LLMs with professional modeling knowledge and empower them
to be a professional modeling assistant. Collecting more resources for PCGBench
can also enhance the scene fidelity and aesthetics.

6 Acknowledgement

This work was supported in part by the National Key R&D Program of China
(No. 2022ZD0116500), the National Natural Science Foundation of China (No.
U21B2042, No. 62320106010), and in part by the 2035 Innovation Program of
CAS and the InnoHK program.

References

1. Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F.L., Almeida,
D., Altenschmidt, J., Altman, S., Anadkat, S., et al.: Gpt-4 technical report. arXiv
preprint arXiv:2303.08774 (2023) 4
2. Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Nee-
lakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A.,
Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Win-
ter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark,
J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language
models are few-shot learners (2020) 14
3. Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee,
P., Lee, Y.T., Li, Y., Lundberg, S., et al.: Sparks of artificial general intelligence:
Early experiments with gpt-4. arXiv preprint arXiv:2303.12712 (2023) 4
4. Chai, L., Tucker, R., Li, Z., Isola, P., Snavely, N.: Persistent nature: A generative
model of unbounded 3d worlds. In: Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition. pp. 20863–20874 (2023) 10
5. Chen, G., Esch, G., Wonka, P., Müller, P., Zhang, E.: Interactive procedural street
modeling. In: ACM SIGGRAPH 2008 papers, pp. 1–10 (2008) 2
6. Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H.P.d.O., Kaplan, J., Edwards,
H., Burda, Y., Joseph, N., Brockman, G., et al.: Evaluating large language models
trained on code. arXiv preprint arXiv:2107.03374 (2021) 5, 6
7. Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H.P.d.O., Kaplan, J., Edwards,
H., Burda, Y., Joseph, N., Brockman, G., et al.: Evaluating large language models
trained on code. arXiv preprint arXiv:2107.03374 (2021) 10

8. Chen, R., Chen, Y., Jiao, N., Jia, K.: Fantasia3d: Disentangling geometry
and appearance for high-quality text-to-3d content creation. arXiv preprint
arXiv:2303.13873 (2023) 3
9. Chen, Z., Wang, G., Liu, Z.: Scenedreamer: Unbounded 3d scene generation from
2d image collections. arXiv preprint arXiv:2302.01330 (2023) 2, 4, 10
10. Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A.,
Barham, P., Chung, H.W., Sutton, C., Gehrmann, S., et al.: Palm: Scaling lan-
guage modeling with pathways. Journal of Machine Learning Research 24(240),
1–113 (2023) 4
11. Fridman, R., Abecasis, A., Kasten, Y., Dekel, T.: Scenescape: Text-driven consis-
tent scene generation. arXiv preprint arXiv:2302.01133 (2023) 3
12. Gasch, C., Sotoca, J., Chover, M., Remolar, I., Rebollo, C.: Procedural modeling of
plant ecosystems maximizing vegetation cover. Multimedia Tools and Applications
81 (05 2022). https://doi.org/10.1007/s11042-022-12107-8 4
13. Gemma Team, Google DeepMind: Gemma: Open models based on Gemini research and technology (2024) 14
14. Gong, R., Huang, Q., Ma, X., Vo, H., Durante, Z., Noda, Y., Zheng, Z., Zhu, S.C.,
Terzopoulos, D., Fei-Fei, L., et al.: Mindagent: Emergent gaming interaction. arXiv
preprint arXiv:2309.09971 (2023) 4
15. Hao, Z., Mallya, A., Belongie, S., Liu, M.Y.: Gancraft: Unsupervised 3d neural
rendering of minecraft worlds. In: Proceedings of the IEEE/CVF International
Conference on Computer Vision. pp. 14072–14082 (2021) 4
16. Höllein, L., Cao, A., Owens, A., Johnson, J., Nießner, M.: Text2room: Extracting
textured 3d meshes from 2d text-to-image models. arXiv preprint arXiv:2303.11989
(2023) 3, 11, 12
17. Hong, F., Chen, Z., Yushi, L., Pan, L., Liu, Z.: Eva3d: Compositional 3d human
generation from 2d image collections. In: The Eleventh International Conference
on Learning Representations (2022) 2
18. Hong, F., Zhang, M., Pan, L., Cai, Z., Yang, L., Liu, Z.: Avatarclip: Zero-shot text-
driven generation and animation of 3d avatars. arXiv preprint arXiv:2205.08535
(2022) 3
19. Hu, Z., Iscen, A., Jain, A., Kipf, T., Yue, Y., Ross, D.A., Schmid, C., Fathi, A.:
Scenecraft: An llm agent for synthesizing 3d scene as blender code (2024) 2, 11,
12
20. Huang, W., Abbeel, P., Pathak, D., Mordatch, I.: Language models as zero-shot
planners: Extracting actionable knowledge for embodied agents. In: International
Conference on Machine Learning. pp. 9118–9147. PMLR (2022) 4
21. Huang, W., Wang, C., Zhang, R., Li, Y., Wu, J., Fei-Fei, L.: Voxposer: Composable
3d value maps for robotic manipulation with language models. arXiv preprint
arXiv:2307.05973 (2023) 4
22. Imani, S., Du, L., Shrivastava, H.: Mathprompter: Mathematical reasoning using
large language models. arXiv preprint arXiv:2303.05398 (2023) 4
23. Jeblick, K., Schachtner, B., Dexl, J., Mittermeier, A., Stüber, A.T., Topalis, J.,
Weber, T., Wesp, P., Sabel, B.O., Ricke, J., et al.: Chatgpt makes medicine easy
to swallow: an exploratory case study on simplified radiology reports. European
radiology pp. 1–9 (2023) 4
24. Jiang, A.Q., Sablayrolles, A., Mensch, A., Bamford, C., Chaplot, D.S., de las Casas,
D., Bressand, F., Lengyel, G., Lample, G., Saulnier, L., Lavaud, L.R., Lachaux,
M.A., Stock, P., Scao, T.L., Lavril, T., Wang, T., Lacroix, T., Sayed, W.E.: Mistral
7b (2023) 14

25. Jiang, R., Wang, C., Zhang, J., Chai, M., He, M., Chen, D., Liao, J.: Avatarcraft:
Transforming text into neural human avatars with parameterized shape and pose
control. arXiv preprint arXiv:2303.17606 (2023) 3
26. Kenton, J.D.M.W.C., Toutanova, L.K.: Bert: Pre-training of deep bidirectional
transformers for language understanding. In: Proceedings of naacL-HLT. vol. 1,
p. 2 (2019) 4
27. Kolotouros, N., Alldieck, T., Zanfir, A., Bazavan, E.G., Fieraru, M., Smin-
chisescu, C.: Dreamhuman: Animatable 3d avatars from text. arXiv preprint
arXiv:2306.09329 (2023) 2, 3
28. Li, H., Su, J., Chen, Y., Li, Q., Zhang, Z.: Sheetcopilot: Bringing software
productivity to the next level through large language models. arXiv preprint
arXiv:2305.19308 (2023) 3, 5
29. Li, Y., Jiang, L., Xu, L., Xiangli, Y., Wang, Z., Lin, D., Dai, B.: Matrixcity: A
large-scale city dataset for city-scale neural rendering and beyond. In: Proceedings
of the IEEE/CVF International Conference on Computer Vision. pp. 3205–3215
(2023) 2, 4
30. Lin, C.H., Gao, J., Tang, L., Takikawa, T., Zeng, X., Huang, X., Kreis, K., Fidler,
S., Liu, M.Y., Lin, T.Y.: Magic3d: High-resolution text-to-3d content creation.
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition. pp. 300–309 (2023) 3, 12
31. Lin, C.H., Lee, H.Y., Menapace, W., Chai, M., Siarohin, A., Yang, M.H., Tulyakov,
S.: Infinicity: Infinite-scale city synthesis. arXiv preprint arXiv:2301.09637 (2023)
2, 4, 10
32. Lindenmayer, A.: Mathematical models for cellular interactions in development
i. filaments with one-sided inputs. Journal of Theoretical Biology 18(3), 280–299
(1968). https://doi.org/10.1016/0022-5193(68)90079-9,
https://www.sciencedirect.com/science/article/pii/0022519368900799 2
33. Lipp, M., Scherzer, D., Wonka, P., Wimmer, M.: Interactive modeling of city lay-
outs using layers of procedural content. In: Computer Graphics Forum. vol. 30, pp.
345–354. Wiley Online Library (2011) 2, 4
34. Liu, A., Tucker, R., Jampani, V., Makadia, A., Snavely, N., Kanazawa, A.: Infinite
nature: Perpetual view generation of natural scenes from a single image. In: Pro-
ceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)
(October 2021) 4
35. Liu, R., Wu, R., Van Hoorick, B., Tokmakov, P., Zakharov, S., Vondrick, C.: Zero-
1-to-3: Zero-shot one image to 3d object. In: Proceedings of the IEEE/CVF Inter-
national Conference on Computer Vision. pp. 9298–9309 (2023) 2, 3
36. Melas-Kyriazi, L., Laina, I., Rupprecht, C., Vedaldi, A.: Realfusion: 360deg recon-
struction of any object from a single image. In: Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition. pp. 8446–8455 (2023) 2
37. OpenAI, Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F.L.,
Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., Avila, R., Babuschkin, I.,
Balaji, S., Balcom, V., Baltescu, P., Bao, H., Bavarian, M., Belgum, J., Bello, I.,
Berdine, J., Bernadett-Shapiro, G., Berner, C., Bogdonoff, L., Boiko, O., Boyd,
M., Brakman, A.L., Brockman, G., Brooks, T., Brundage, M., Button, K., Cai, T.,
Campbell, R., Cann, A., Carey, B., Carlson, C., Carmichael, R., Chan, B., Chang,
C., Chantzis, F., Chen, D., Chen, S., Chen, R., Chen, J., Chen, M., Chess, B.,
Cho, C., Chu, C., Chung, H.W., Cummings, D., Currier, J., Dai, Y., Decareaux,
C., Degry, T., Deutsch, N., Deville, D., Dhar, A., Dohan, D., Dowling, S., Dunning,
S., Ecoffet, A., Eleti, A., Eloundou, T., Farhi, D., Fedus, L., Felix, N., Fishman,

S.P., Forte, J., Fulford, I., Gao, L., Georges, E., Gibson, C., Goel, V., Gogineni,
T., Goh, G., Gontijo-Lopes, R., Gordon, J., Grafstein, M., Gray, S., Greene, R.,
Gross, J., Gu, S.S., Guo, Y., Hallacy, C., Han, J., Harris, J., He, Y., Heaton, M.,
Heidecke, J., Hesse, C., Hickey, A., Hickey, W., Hoeschele, P., Houghton, B., Hsu,
K., Hu, S., Hu, X., Huizingi, J., Jain, S., Jain, S., et al.: Gpt-4 technical report
(2023) 14
38. Parish, Y., Müller, P.: Procedural modeling of cities. vol. 2001, pp. 301–308 (08
2001). https://doi.org/10.1145/1185657.1185716 4
39. Poole, B., Jain, A., Barron, J.T., Mildenhall, B.: Dreamfusion: Text-to-3d using 2d
diffusion. In: The Eleventh International Conference on Learning Representations
(2022) 2, 3, 4, 11, 12
40. Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G.,
Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from
natural language supervision. In: International conference on machine learning. pp.
8748–8763. PMLR (2021) 6, 8
41. Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li,
W., Liu, P.J.: Exploring the limits of transfer learning with a unified text-to-text
transformer. The Journal of Machine Learning Research 21(1), 5485–5551 (2020)
4
42. Raistrick, A., Lipson, L., Ma, Z., Mei, L., Wang, M., Zuo, Y., Kayan, K., Wen, H.,
Han, B., Wang, Y., et al.: Infinite photorealistic worlds using procedural generation.
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition. pp. 12630–12641 (2023) 2, 4, 12, 13
43. Raj, A., Kaza, S., Poole, B., Niemeyer, M., Ruiz, N., Mildenhall, B., Zada, S.,
Aberman, K., Rubinstein, M., Barron, J., et al.: Dreambooth3d: Subject-driven
text-to-3d generation. arXiv preprint arXiv:2303.13508 (2023) 2
44. Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: Hugginggpt: Solving ai
tasks with chatgpt and its friends in huggingface. arXiv preprint arXiv:2303.17580
(2023) 3, 5, 6
45. Sun, C., Han, J., Deng, W., Wang, X., Qin, Z., Gould, S.: 3d-gpt: Procedural 3d
modeling with large language models. arXiv preprint arXiv:2310.12945 (2023) 2,
6, 11, 12
46. Talton, J.O., Lou, Y., Lesser, S., Duke, J., Mech, R., Koltun, V.: Metropolis pro-
cedural modeling. ACM Trans. Graph. 30(2), 11–1 (2011) 2, 4
47. Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bash-
lykov, N., Batra, S., Bhargava, P., Bhosale, S., Bikel, D., Blecher, L., Ferrer, C.C.,
Chen, M., Cucurull, G., Esiobu, D., Fernandes, J., Fu, J., Fu, W., Fuller, B., Gao,
C., Goswami, V., Goyal, N., Hartshorn, A., Hosseini, S., Hou, R., Inan, H., Kar-
das, M., Kerkez, V., Khabsa, M., Kloumann, I., Korenev, A., Koura, P.S., Lachaux,
M.A., Lavril, T., Lee, J., Liskovich, D., Lu, Y., Mao, Y., Martinet, X., Mihaylov,
T., Mishra, P., Molybog, I., Nie, Y., Poulton, A., Reizenstein, J., Rungta, R., Sal-
adi, K., Schelten, A., Silva, R., Smith, E.M., Subramanian, R., Tan, X.E., Tang,
B., Taylor, R., Williams, A., Kuan, J.X., Xu, P., Yan, Z., Zarov, I., Zhang, Y., Fan,
A., Kambadur, M., Narang, S., Rodriguez, A., Stojnic, R., Edunov, S., Scialom,
T.: Llama 2: Open foundation and fine-tuned chat models (2023) 14
48. Vanegas, C., Kelly, T., Weber, B., Halatsch, J., Aliaga, D., Müller, P.: Procedural
generation of parcels in urban modeling. Computer Graphics Forum 31, 681–690
(05 2012). https://doi.org/10.1111/j.1467-8659.2012.03047.x 4
49. Wang, G., Xie, Y., Jiang, Y., Mandlekar, A., Xiao, C., Zhu, Y., Fan, L., Anand-
kumar, A.: Voyager: An open-ended embodied agent with large language models.
arXiv preprint arXiv:2305.16291 (2023) 3, 5, 6

50. Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q.V., Zhou,
D., et al.: Chain-of-thought prompting elicits reasoning in large language models.
Advances in Neural Information Processing Systems 35, 24824–24837 (2022) 4
51. Wu, C., Yin, S., Qi, W., Wang, X., Tang, Z., Duan, N.: Visual chatgpt:
Talking, drawing and editing with visual foundation models. arXiv preprint
arXiv:2303.04671 (2023) 3, 5, 6
52. Xie, H., Chen, Z., Hong, F., Liu, Z.: Citydreamer: Compositional generative model
of unbounded 3d cities. arXiv preprint arXiv:2309.00610 (2023) 2, 4, 10, 12
53. Yang, K., Ji, S., Zhang, T., Xie, Q., Ananiadou, S.: On the evaluations of chat-
gpt and emotion-enhanced prompting for mental health analysis. arXiv preprint
arXiv:2304.03347 (2023) 4
54. Yang, Y.L., Wang, J., Vouga, E., Wonka, P.: Urban pattern: Layout design by hi-
erarchical domain splitting. ACM Transactions on Graphics (Proceedings of SIG-
GRAPH Asia 2013) 32, Article No. xx (2013) 4
55. Yu, H.X., Duan, H., Hur, J., Sargent, K., Rubinstein, M., Freeman, W.T., Cole, F.,
Sun, D., Snavely, N., Wu, J., Herrmann, C.: Wonderjourney: Going from anywhere
to everywhere (2023) 11, 12
56. Zeng, A., Attarian, M., Ichter, B., Choromanski, K., Wong, A., Welker, S., Tombari,
F., Purohit, A., Ryoo, M., Sindhwani, V., et al.: Socratic models: Composing zero-
shot multimodal reasoning with language. arXiv preprint arXiv:2204.00598 (2022)
4
57. Zhang, C., Yang, K., Hu, S., Wang, Z., Li, G., Sun, Y., Zhang, C., Zhang, Z.,
Liu, A., Zhu, S.C., et al.: Proagent: Building proactive cooperative ai with large
language models. arXiv preprint arXiv:2308.11339 (2023) 4
58. Zhang, C., Chen, Y., Fu, Y., Zhou, Z., Yu, G., Wang, B., Fu, B., Chen, T., Lin, G.,
Shen, C.: Styleavatar3d: Leveraging image-text diffusion models for high-fidelity
3d avatar generation. arXiv preprint arXiv:2305.19012 (2023) 2, 3
59. Zhang, G., Wang, Y., Luo, C., Xu, S., Peng, J., Zhang, Z., Zhang, M.: Furniscene:
A large-scale 3d room dataset with intricate furnishing scenes. arXiv preprint
arXiv:2401.03470 (2024) 3
60. Zhang, J., Wang, C.b., Qin, H., Chen, Y., Gao, Y.: Procedural modeling of rivers
from single image toward natural scene production. The Visual Computer 35 (02
2019). https://doi.org/10.1007/s00371-017-1465-7 4
61. Zhang, J., Li, X., Wan, Z., Wang, C., Liao, J.: Text2nerf: Text-driven 3d scene
generation with neural radiance fields. arXiv preprint arXiv:2305.11588 (2023) 3
62. Zhang, Q., Wang, C., Siarohin, A., Zhuang, P., Xu, Y., Yang, C., Lin, D., Zhou, B.,
Tulyakov, S., Lee, H.Y.: Scenewiz3d: Towards text-guided 3d scene composition.
arXiv preprint arXiv:2312.08885 (2023) 3
63. Zhu, X., Chen, Y., Tian, H., Tao, C., Su, W., Yang, C., Huang, G., Li, B., Lu, L.,
Wang, X., et al.: Ghost in the minecraft: Generally capable agents for open-world
enviroments via large language models with text-based knowledge and memory.
arXiv preprint arXiv:2305.17144 (2023) 5

A Supplementary Materials

In this paper, we propose SceneX, an advanced framework for large-scale scene generation


according to the user’s instructions. The reproducible code is attached to the
supplementary material, and some other important details are as follows.

A.1 PCGBench Collection

Blender Market offers a wide range of PCG plugins and 3D asset resources.
To meet the demand for generating large-scale natural and urban scenes, we
constructed PCGBench from four directions: Materials and Shading, Model-
ing, Rendering, and Asset Placement. Specifically, for generation-based PCG,
we collected PCG models such as terrain, trees, flowers, rocks, and buildings.
For material-based PCG, we gathered modifiable materials such as plant leaves,
plant roots, terrain, buildings, snow, and sky. For asset placement-based PCG, we
collected various placement PCG methods including slope-based, height-based,
object-based, and geometric space-based placements. In terms of rendering-based
PCG, we collected PCG resources related to rendering engines, rendering set-
tings, and renderer optimizations. Additionally, for 3D asset resources, we col-
lected a wide range of categories including buildings, vegetation, urban infrastructure, and textures sourced from the internet.

Raw API collection: These PCG plugins fall predominantly into two types: those based on geometric node groups and those based on Python code. We need to gather API-related information from these plugins, including API names, API function descriptions, API parameter descriptions, API parameter value ranges, and API parameter default values. For plugins based on geometric node groups, we obtain the original APIs by testing the node groups in Blender and documenting each exposed API along with the corresponding information. For plugins based on Python code, we test the source code, record the functions within the plugin code as the original APIs, and document the relevant information for these functions.
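For concreteness, a single raw API record can be stored as a small structured document. The sketch below shows one possible layout as a Python dictionary; the field names and example values follow the terrain parameter document shown in Fig. 8 and are illustrative rather than the exact storage format used in our code.

```python
# Illustrative layout of one raw API record in PCGBench; the values below mirror
# the Terrain_mesh_generate document in Fig. 8 and are examples only.
terrain_api_record = {
    "api_name": "Terrain_mesh_generate",
    "description": "Generate a terrain mesh through the plugin's geometric node group.",
    "parameters": [
        {
            "name": "Rotate_Slope",
            "description": "Magnitude of the terrain slope.",
            "value_range": (-6.280, 6.280),
            "default": 0.000,
        },
        {
            "name": "Flat_Floor",
            "description": "Whether to flatten the ground.",
            "value_range": [True, False],
            "default": False,
        },
    ],
}
```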

PCGBench: In order to optimize the usage of PCG plugins in PCGPlanner,


we need to simplify each plugin. The simplification rules are as follows:

• For generation-based plugins, we extract the APIs related to mesh generation and remove irrelevant APIs. The criterion for successful extraction is the ability to independently generate the corresponding 3D mesh using the plugin.
• For material-based plugins, we extract the APIs that provide material variations and remove irrelevant APIs. The criterion for successful extraction is the ability to independently change the corresponding material using the plugin.
• For placement-based plugins, we extract the APIs related to placement functionality and remove irrelevant APIs. Additionally, to accommodate multiple placement plugins, we have designed a universal collection format (an illustrative sketch is given after this list). The criterion for successful extraction is the ability to independently place objects in the current scene using the plugin.
• For rendering-based plugins, we extract the APIs that provide scene rendering capabilities and remove irrelevant APIs. The criterion for successful extraction is the ability to independently perform rendering for the entire 3D scene using the plugin.
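As an illustration of what such a universal format could look like, the sketch below normalizes the common placement constraints (density, height range, slope range) used in the placement document of Fig. 9 into a single wrapper call. The function and parameter names are assumptions made for exposition, not the exact collection format shipped with our code.

```python
# Hypothetical unified placement wrapper; parameter names follow the Scatter API
# documented in Fig. 9, while the plugin-specific forwarding is only sketched.
def place_assets(plugin, obj, density=0.20,
                 min_height=0.0, max_height=1.0,
                 min_slope=0, max_slope=90):
    """Normalize common placement constraints and forward them to a placement plugin."""
    params = {
        "Density": density,        # placement density coefficient
        "Min_Height": min_height,  # lowest allowed elevation (fraction of terrain height)
        "Max_Height": max_height,  # highest allowed elevation (fraction of terrain height)
        "Min_Slope": min_slope,    # minimum slope in degrees
        "Max_Slope": max_slope,    # maximum slope in degrees
    }
    # Every wrapped placement plugin is assumed to expose the same entry point.
    return plugin.scatter(obj, **params)
```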

Furthermore, to better integrate and utilize all the PCG plugins and 3D assets, we use Blender's Python API to implement basic functions such as importing, deleting, moving, and deforming objects. These APIs can be accessed and called by PCGPlanner to achieve better results.

3D Asset Dataset: In this study, we have developed a comprehensive dataset consisting of a large collection of 3D assets, including 1908 meticulously crafted models and 1294 textures. To facilitate text-to-text and text-to-image retrieval with the CLIP model, we have extensively annotated each 3D asset and supplemented it with rendered images.
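To show how these annotations might be used at retrieval time, the following is a minimal sketch that matches a sub-task description against asset captions with CLIP text embeddings. The model checkpoint, caption dictionary, and helper names are assumptions made for illustration, not the exact retrieval code used by SceneX.

```python
# Hedged sketch: retrieve the best-matching asset for a sub-task description using
# CLIP text embeddings (Hugging Face transformers); checkpoint choice is illustrative.
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

def embed(texts):
    tokens = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_text_features(**tokens)
    return torch.nn.functional.normalize(feats, dim=-1)

def retrieve_asset(query, asset_captions):
    """asset_captions maps asset ids to annotation strings; returns the closest asset id."""
    ids = list(asset_captions.keys())
    sims = embed([query]) @ embed([asset_captions[i] for i in ids]).T
    return ids[int(sims.argmax())]

# Hypothetical usage with made-up captions:
# retrieve_asset("a stout pine tree with lush foliage",
#                {"tree_017": "tall pine tree", "tree_042": "maple with red leaves"})
```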

Algorithm 1: The workflow of SceneX


Input: User query q, knowledge documentation D, the retrieved API α in the API collection A, the retrieved model γ in the asset collection Γ, P_i denoting the prompt of each agent, End_flag = False
Output: Modified API α*
1 When receiving a user query, the multi-agent framework starts the processing workflow;
2 while End_flag == False do
3     Break down the input q into a rough plan;
4     {o1, . . . , on} = Agent_dispatch(q, P_dispatch);
5     for oi in {o1, . . . , on} do
6         {s1, . . . , sn} = Agent_specialist(oi, P_specialist);
7         Make detailed plans for each object;
8         for sj in {s1, . . . , sn} do
9             API α or model γ = Agent_retrieval(sj, P_retrieval);
10            Use the CLIP model for retrieval;
11            if α then
12                Modify the retrieved API α by α* ← Agent_execution(α, P_execution, sj);
13            else
14                Directly import the model γ into the scene project;
15            end
16            if all the sub-tasks are completed then
17                End_flag = True;
18            end
19        end
20    end
21 end
22 End when all the sub-tasks in the detailed plan are achieved;

A.2 Algorithm

In this section, we provide the pseudocode for the entire framework workflow to clearly illustrate the logical relationships and loops among the various agents. First, an outer loop checks the termination condition, i.e., the algorithm stops when End_flag = True. In each iteration, we first call the dispatch agent, passing it the input q and its prompt, and receive a set of rough-plan objects o1, ..., on. Next, we iterate over each object and use the specialist agent to break it down further into a detailed plan s1, ..., sn. We then execute each detailed plan, calling the retrieval agent and the CLIP model to search for the corresponding assets and APIs in PCGBench. Finally, we invoke the execution agent to manipulate each retrieved API α until all sub-tasks are completed, at which point End_flag is set to True.
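As a complement to the pseudocode, the following is a minimal, framework-agnostic sketch of the same control flow in Python. The agent and retrieval callables are placeholders injected by the caller, standing in for the actual LLM-backed agents and the CLIP retriever; they are assumptions for illustration rather than the released implementation.

```python
from typing import Callable, List, Optional, Tuple

def scenex_workflow(
    query: str,
    dispatch_agent: Callable[[str], List[str]],    # q -> rough-plan objects {o1, ..., on}
    specialist_agent: Callable[[str], List[str]],  # oi -> detailed steps {s1, ..., sn}
    retrieval_agent: Callable[[str], Tuple[Optional[Callable], Optional[str]]],
    execution_agent: Callable[[Callable, str], Callable],
    import_asset: Callable[[str], None],
) -> None:
    """Sketch of the SceneX multi-agent loop in Algorithm 1 with caller-supplied agents."""
    end_flag = False
    while not end_flag:
        rough_plan = dispatch_agent(query)             # break q into rough-plan objects
        for obj in rough_plan:
            detailed_plan = specialist_agent(obj)      # detailed plan for each object
            for step in detailed_plan:
                api, asset = retrieval_agent(step)     # CLIP-based lookup in PCGBench
                if api is not None:
                    modified_api = execution_agent(api, step)  # fill in parameter values
                    modified_api()                             # execute inside the Blender project
                elif asset is not None:
                    import_asset(asset)                        # place the retrieved model directly
        end_flag = True                                # all sub-tasks of the plan handled
```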

A.3 Examples of Prompt Details

To provide a more intuitive demonstration of how agent prompts are designed and executed, we present a series of examples. In Figs. 7, 8, 9, 10, and 11, we show prompts for PCG plugins with different functionalities, along with the corresponding API calls for each plugin. We also provide a tested example for each PCG component.
These images illustrate that our systematized prompt framework enables the
successful execution of different tasks. Furthermore, the adoption of a unified
prompt template for all agents simplifies maintenance and facilitates consistent
invocation.

A.4 More Visualization Results


In Fig. 12, we present the extended capabilities of SceneX in urban and natural
scene generation. The results demonstrate that our proposed SceneX framework
is capable of generating highly realistic scenes in both natural and urban en-
vironments. Furthermore, the generated content accurately corresponds to the
provided text descriptions. These outcomes serve as compelling evidence of
the effectiveness and efficiency of our proposed LLM-based automatic 3D scene
generation framework. In Fig. 13, we show scenes generated by SceneX without
the utilization of any materials, highlighting the partial PCG components em-
ployed in the scene generation process, particularly the placement functionality
achieved through geometric nodes. The results indicate that the generation of
complex scenes requires the use of several sophisticated PCG plugins. Moreover,
the generated scenes are solely based on object meshes, rather than relying on
materials or rendering techniques. This implies that the scenes generated by
SceneX possess depth and other essential information, which holds significant
implications across various domains.

Template:
Role: You serve as a Blender modeling task planner that helps me design a scene.
Task: You will receive a paragraph description of the scene, your task is to help me to make a list includes all the
objects that may appear in the scene.
Document:
You should follow the principles when listing objects:
- The objects should be in one of the modules of {Terrain, City, Weather, Details}
- Terrain: the fundamental terrain, including {plain, basin, river, pool, mountain, hill, canyon}
- City: the major element that makes up the city {layout, road, sidewalk, vegetation, forest, buildings}
- Weather: the different light situation, including{morning, noon, afternoon, evening, midnight}
- Details: all the details may appear in the scene, for natural terrain there are {stone, grass, tree, brushwood,
flowers, …}, for the city, there are {people, trash bins, bicycles, cars, …}, the details are not limited in the
given list. You can imagine whatever may appear in the scene.
You should output a list that contains all the objects in a reasonable order. The objects should be arranged
according to the module in order from bottom to top. For the natural scene, it should be listed in the order of
{Terrain-Details-weather}, for the city scene, it should follow the order {City-Details-Weather}.
Format: Output the rough plan in the following form:
{{Module : Object name},…,{Module : Object name}}
Examples:
{{City : Small and crowded layout}, {City : Asphalt road}, {City : Cement sideWalk}, {City : Grass}, {City :
Maple}, {City : Modern buildings}, {Details : Bicycle}, {Details : People }, {Details : Trash bin}, {Weather :
Sunny noon.}}

Role: You serve as a Blender modeling specialist who helps me design a scene.
Task: You will receive an object description, your task is to make a list that contains a detailed plan to generate
this object in the blender project.
Document:
For each target object in the plan, there is a common generation and manipulation process. Each target may need to
go through at least one step to be placed in the scene, you should make a plan step by step according to the main
task list as follows:
Step 1: Retrieve the required asset. Target objects{Layout, Building, Grass, Tree, Terrain, Weather,…}
Step 2: Manipulate the parameters to change the mesh. Target objects{Terrain, Tree, Building,… }
Step 3: Make the texture manipulation. Target objects{Road, Leaf, Water, Weather, Ground,…}
Step 4: Set the proper location or distribution for the asset. Target objects{Tree, Stone, People, Trash bin, Car,…}
For each task, you should make an inference on which of these steps is to be used. In the meantime, you should
adopt the asset and its carrier according to the target task.
Format: Output the detailed plan in the following form :
{Step 1: The description of task 1},
{Step 2: The description of task 2},
{Step 3: The description of task 3},
{Step 4: The description of task 4}
Examples:
{Step 1: Retrieve the maple tree model},
{Step 2: Change it to a stout pine tree with lush foliage},
{Step 3: Change the leaves to yellow},
{Step 4: Distribute the pine tree at the parking area in a sparse density}

Fig. 7: Prompt example for the planner: a template, code snippet, and specific example.

Template:
Role: You serve as a Blender API operator to choose the feasible parameter value according to the description.
Task: Your task is to break down the user's input into the parameters required for generating terrain mesh. The
API that you can invoke for this purpose is called "Terrain_mesh_generate".
Document:
[{Name: Rotate_Slope, Description: This parameter allows you to define the magnitude of the terrain slope,Value:
-6.280 – 6.280, Default: 0.000},
{Name: Flat_Floor,Description: This parameter allows you to flatten the ground,
Value:[True,False],Default:False},
{Name: Sink, Description: You can change the elevation or height of the terrain, Value: 0.000 – 1.000,Default:
0.500},
{Name: Add_Valleys,Description: Based on the input provided by the user, speculate on whether the addition of
valleys is necessary., Value: :[True,False],Default: False},
{Name: Scale_Valleys,Description: Based on the user input, speculate on the scaling factor for the valleys., Value:
0.000 – 5.000,Default: 1.000}, ...]
Format: Output the modified API in the following form:
[Rotate_Slope: You need to give a parameter value in the range [-6.280 ,6.280], present the answer in the form of
a list,
Flat_Floor: You can only choose one from ["True", "False"], present the answer in the form of a list,
Sink: You need to give a parameter value in the range [0, 1], present the answer in the form of a list, ...],
Add_Valleys: You can only choose one from ["True", "False"], present the answer in the form of a list,
Scale_Valleys: You need to give a parameter value in the range [0.000, 5.000], present the answer in the form of a
list...]...
Examples: Inputs: Construct a summer terrain of flat and vast grasslands. Outputs:[Rotate_Slope: [0.000],
Flat_Floor: ["True"], Sink: [0.500], Add_Valleys: ["True"], Scale_Valleys:[1.000] ...]

Code:
'''python
import Terrain_mesh_generate
def Terrain_mesh_generate(Rotate_Slope, Flat_Floor, Sink, Add_Valleys, Scale_Valleys, ...):
bpy.context.object.modifiers["Terrain Mix"]["Input_25"] = Rotate_Slope
bpy.context.object.modifiers["Terrain Mix"]["Input_26"] = Flat_Floor
bpy.context.object.modifiers["Terrain Mix"]["Input_27"] = Sink
bpy.context.object.modifiers["Terrain Mix"]["Input_41"] = Add_Valleys
bpy.context.object.modifiers["Terrain Mix"]["Input_43"] = Scale_Valley
...
return
'''
Example:
Question: The summer highland terrain unveils a captivating expanse, with undulating hills and valleys covered in a
verdant carpet of lush green grasses, painting a breathtaking tableau of natural beauty.
Solution: Based on the provided document and your description, here are the parameter values you can use for
generating the summer highland terrain:
```plaintext
[Rotate_Slope: [3.142], Flat_Floor: ["False"], Sink: [0.750], Add_Valleys: ["True"], Scale_Valleys: [2.500]...]
```
This configuration will help create undulating hills and valleys, allowing the terrain to have a natural appearance while
maintaining its captivating beauty.
Code:
'''python
Terrain_mesh_generate(Rotate_Slope=3.142, Flat_Floor=False, Sink=0.750, Add_Valleys=True,
Scale_Valleys=2.500, ...)
'''

Fig. 8: Prompt example for terrain PCG mesh generation: a template, code snippet,
and specific example.

Template:
Role: You as the Blender API operator choose feasible parameter values based on the description. You need to
determine the placement parameters for each individual asset
Task: Your task is to break down the user's input into the parameters required for placing objects. The API that
you can invoke for this purpose is called "Scatter".
Document:
[{Name: Density,Description: Based on the user input, speculate on the vegetation density coefficient.,Value: 0.02
- 0.50, Default: 0.20},
{Name: Max_Height, Description: Based on the user input, speculate on the maximum height for placing object ,
represented as a percentage, Value:0 - 1.000, Default:0.800},
{ Name: Min_Height, Description: Based on the user input, speculate on the minimum height for placing object
represented as a percentage, Value:0 - 1.000, Default:0.200},
{Name: Max_Slope, Description: Based on the user input, speculate on the maximum degree of inclination for
placing objects, Value: 0 - 90, Default= 90},
{Name: Min_Slope, Description: Based on the user input, speculate on the minimum degree of inclination for
placing objects, Value: 0 - 90, Default= 0},...]
Format: Output the modified API in the following form:
[Density: You need to give a parameter value in the range [0.02,0.50], present the answer in the form of a list,
Max_Height: You need to give a parameter value in the range [0, 1.000], present the answer in the form of a list,
Min_Height: You need to give a parameter value in the range [0, 1.000], present the answer in the form of a list,
Max_Slope : You need to give a parameter value in the range [0, 90], present the answer in the form of a list,
Min_Slope : You need to give a parameter value in the range [0, 90], present the answer in the form of a list,
...
Examples: Inputs: The tree stands bare without leaves, its branches spreading out in all directions. Outputs:
[Density: [0.2], Max_Height: [1.0], Min_Height: [0.6], Max_Slope : [60], Min_Slope : [0]]

Code:
'''python
import Scatter
def Scatter(object, Density, Max_Height, Min_Height, Max_Slope, Min_Slope, ...):
node_group = bpy.data.node_groups["Density_Scatter_3"]
node_group.nodes["Density_Scatter_slope"].inputs[5].default_value = math.radians(Min_Slope)
node_group.nodes["Density_Scatter_slope"].inputs[7].default_value = math.radians(Max_Slope)
node_group.nodes["Density_Scatter_elev"].inputs[5].default_value = Min_Height
node_group.nodes["Density_Scatter_elev"].inputs[7].default_value = Max_Height
...
return
'''

Example:
Question: The tree stands adorned with lush foliage, its branches reaching upwards in unison.
Solution: Based on the provided document and your description, here are the parameter values you can use for
placing objects, assuming the tree as the object:
```plaintext
[Density: [0.20], Max_Height: [1.000], Min_Height: [0.200], Max_Slope: [90], Min_Slope: [0], ...]
```
This configuration will help place the tree objects with a moderate density, extending from the minimum height of
20% to the maximum height of 100%, and with no restriction on the slope, allowing the branches to spread out in all
directions as described.
Code:
'''python
Scatter(obj, Density=0.20, Max_Height=1.000, Min_Height=0.200, Max_Slope=90, Min_Slope=0, ...)
'''

Fig. 9: Prompt example for object placement: a template, code snippet, and specific
example.

Template:
Role: You serve as a Blender API operator to choose the feasible parameter value according to the description.
Task: Your task is to break down the user's input into the parameters required for generating tree mesh. The API
that you can invoke for this purpose is called "Tree_mesh_generate".
Document:
[{Name: IsTrunkStraight,Description: Based on the input provided by the user, make an inference regarding
whether the tree trunk is perfectly straight or not,Value: [True,False], Default: True},
{Name: TreeHeight, Description: Based on the information provided by the user, estimate the height of the
tree,Value:0 - 100, Default:5},
{ Name: LeafShape, Description: Based on the input from the user, speculate on the shape of the leaves,Value:
[Hexagonal, Elongated, Acerose (Maple-like)], Default: Elongated},
{Name: CrownDensity, Description: Based on the input from the user, speculate on the density of the tree
canopy,Value: 0.5 - 1.5,Default= 1.0},
{Name: LeafDensity, Description: Based on the user's input, speculate on the density of the tree leaves, Value:
[Dense, Medium],Default = Medium}]
Format: Output the modified API in the following form:
[IsTrunkStraight : You can only choose one from ["True", "False"], present the answer in the form of a list,
TreeHeight : You need to give a parameter value in the range [0, 100], present the answer in the form of a list,
LeafShape : You can only choose one from [Hexagonal, Elongated, Acerose (Maple-like)], present the answer in
the form of a list,
CrownDensity : You can only choose one from [Dense, Medium, Sparse], present the answer in the form of a list,
LeafDensity : You can only choose one from [Dense, Medium], present the answer in the form of a list]
Examples: Inputs: Generate a lush leaves and a moderate height. Outputs: [IsTrunkStraight : ["True"],
TreeHeight : [5], LeafShape : [Hexagonal], CrownDensity : [1.0], LeafDensity : [Dense]]

Code:
'''python
import Tree_mesh_generate
def Tree_mesh_generate(IsTrunkStraight, TreeHeight, LeafShape, CrownDensity, LeafDensity):
bpy.ops.preferences.addon_enable(module="add_curve_sapling")
para = process(IsTrunkStraight, TreeHeight, LeafShape, CrownDensity, LeafDensity) # Process parameters.
tree_mesh_generate_1(**para) # Generate a tree grid based on the given parameters.
return
'''

Example:
Question: The tree stands tall and straight, with branches reaching out like arms, adorned with lush green leaves that
gently sway in the breeze.
Solution: Based on the provided document and your description, here are the parameter values you can use for
generating the tree mesh:
```plaintext
[IsTrunkStraight: ["True"], TreeHeight: [15], LeafShape: ["Elongated"], CrownDensity: ["Dense"], LeafDensity:
["Medium"]]
```
This configuration will help create a tall tree with a straight trunk, adorned with lush green leaves that gently sway in
the breeze, capturing the essence of the description you provided.
Code:
'''python
Tree_mesh_generate(IsTrunkStraight=True, TreeHeight=15, LeafShape="Elongated", CrownDensity="Dense",
LeafDensity="Medium")
'''

Fig. 10: Prompt example for tree PCG mesh generation: a template, code snippet,
and specific example.

Template:
Role: You serve as a Blender API operator to choose the feasible parameter value according to the description.
Task: Your task is to break down the user's input into the parameters required for generating terrain. The API that
you can invoke for this purpose is called "map_generate".
Document:
[{Name: MapSize,Description: Based on the input provided by the user, make an inference about the map size for
the scene,Value: [500, 700, 1000, 1500, 2500], Default: 500},
{Name: BuildingDensity, Description: Based on the information provided by the user, estimate the building
density of the city,Value:[0 ~ 1], Default:1},
{ Name: VegetationDensity, Description: Based on the input from the user, make an inference about the tree
density on the street ,Value: [None, Few, Normal, Massive], Default: Normal},
{Name: SidewalkStyle, Description: Based on the input from the user, make an inference about the surface texture
of the sidewalk,Value: [Soil, Asphalt, Cement, Wooden, Brick,...], Default= Cement},
...]
Format: Output the modified API in the following form:
[MapSize: You can only choose one from : [500, 700, 1000, 1500, 2500], present the answer in the form of a list,
BuildingDensity: You need to decide the parameter value in the range [0, 1], present the answer in the form of a
list,
VegetationDensity: You can only choose one from : [None, Few, Normal, Massive], present the answer in the
form of a list,
SidewalkStyle: You can only choose one from[Soil, Asphalt, Cement, Wooden, Brick,...], present the answer in
the form of a list,
Examples: Inputs: Generate a small block with crowded buildings and no vegetation. Outputs: [MapSize : [500],
BuildingDensity : [1], VegetationDensity : [None], SidewalkStyle : [Cement], …]

Code:
'''python
import map_generate
def map_generate(MapSize, BuildingDensity, VegetationDensity, SidewalkStyle, ...):
para = process(MapSize, BuildingDensity, VegetationDensity, SidewalkStyle, ...) # Process parameters.
map_size_config(MapSize)
building_density_config(BuildingDensity)
...
return
'''

Example:
Question: Generate a thriving, modern urban center with high-rise buildings and beautiful urban greenery.
Solution: Based on the provided document and your description, here are the parameter values you can use for
generating the map_generate:
```plaintext
[MapSize : [500], BuildingDensity : [0.8], VegetationDensity : [Normal], SidewalkStyle : [Cement], …]
```
Code:
'''python
map_generate(MapSize=500, BuildingDensity=0.8, VegetationDensity="Normal", SidewalkStyle="Cement", …)
'''

Fig. 11: Prompt example for city layout generation: a template, code snippet, and
specific example.

Generates a cityscape showing a large number of buildings, streets, and trees. Buildings range in style from modern skyscrapers to traditional
low-rise buildings. There are cars moving on the streets, and the green areas represent parks or green spaces.

Generates a cityscape containing multiple skyscrapers, residential buildings, intersections, traffic lights, and other buildings. The buildings are
surrounded by green trees, forming a modern urban scene.

Amidst the urban sprawl, a harmonious blend of cityscape and greenery unfolds, creating a vibrant environment where modern architecture
seamlessly coexists with lush plant life.

Early summer, under the bright sun, the forest is thriving with lush vegetation.

In the center of the majestic canyon, the bright sun casts a warm ray, illuminating the rugged beauty of the rocky terrain, which is dotted
with flowers and plants.

In the wilderness in early autumn, in a dense forest, the soft sunlight bathes the surroundings in warm light, creating a peaceful and warm
atmosphere.

As pure white snow blankets the surroundings, the world transforms into a peaceful winter wonderland, interrupted only by the subtle crunch of
the snow beneath your feet.

Fig. 12: Rendered scene results obtained by inputting text into SceneX: including
three urban and four natural scenes with varied prompts to generate diverse 3D envi-
ronments.

Fig. 13: The image illustrates scenes rendered without applying any materials and
displays partial PCG components used for each scene (presented as a composition of
geometry nodes).
