SceneX - Procedural Controllable Large-Scale Scene Generation Via Large-Language Models
User descriptions: "Generate beautiful cherry blossoms" / "Generate a large-scale forest" / "Generate a modern city".
Fig. 1: The proposed SceneX can create large-scale 3D natural scenes or unbounded
cities automatically according to user instructions. The generated models are character-
ized by delicate geometric structures, realistic material textures, and natural lighting,
allowing for seamless deployment in the industrial pipeline.
1 Introduction
With the rapid progress in generative modeling, 3D content generation has wit-
nessed remarkable achievements in automatic creation of 3D objects [35, 36, 39,
43], avatars [17, 27, 58], and scenes [9, 29, 31, 52]. In this work, we focus on au-
tomatic 3D scene generation, which has extensive applications in domains like
gaming, animation, film, and virtual reality. Although previous approaches have
achieved promising results, a significant gap remains for industrial-grade applications, because these methods represent scenes with 3D primitives (e.g., point clouds or radiance fields) that are incompatible with the industrial pipeline.
To bridge the gap, we resort to procedural modeling, an advanced technique
for the fast creation of high-quality 3D assets. It can generate realistic and intri-
cate 3D assets in natural scenes through adjustable parameters and rule-based
systems [32] using software like Blender. For example, Infinigen [42] proposes a
procedural generator that combines different procedural modeling algorithms to
generate large-scale natural scenes encompassing terrain, weather, vegetation,
and wildlife. [33], [5], and [46] use procedural modeling to generate city-level streets or layouts. These procedural approaches generate high-quality 3D assets at the object, scene, and city levels. However, the comprehensive grasp of generation rules, algorithmic frameworks, and individual parameters required by procedural modeling makes the generation process beginner-unfriendly and time-consuming, especially for large-scale scene or city generation. For instance, generating a New York City scene, as shown in Fig. 1, requires a professional PCG engineer working for over two weeks. Moreover, due to this substantial learning curve, generating large-scale scenes with procedural modeling is formidable for ordinary users.
To address the above problems, existing works such as 3D-GPT [45] and
SceneCraft [19] introduce an instruction-driven 3D modeling method by inte-
grating an LLM agent with procedural generation software. Despite the suc-
cessful collaboration with human designers to establish a procedural generation
framework, these methods still exhibit significant limitations. Firstly, SceneCraft cannot achieve fine-grained 3D asset editing because it relies on fixed, uneditable 3D assets rather than assets created by procedural generation. Secondly,
2 Related Works
2.1 Learning-Based 3D Generation
In recent years, 3D asset generation has witnessed rapid progress, combining ideas from computer graphics and computer vision to enable the free creation of 3D content. Presently, predominant research in 3D asset generation primarily concentrates on creating individual objects [8, 30, 35, 39], 3D avatars [18, 25, 27, 58], and 3D scenes [11, 16, 59, 61, 62]. Among these, Zero-1-to-3 [35] proposes a diffusion-based method that recovers the 3D model of a target object from a single image.
Researchers have delved into the procedural generation of natural scenes [12, 60] and urban scenes [33, 46, 48, 54] using Procedural Content Generation (PCG). For instance, PMC [38] proposes a procedural way to generate cities based on 2D ocean and city boundaries. It employs mathematical algorithms to generate blocks and streets and utilizes extrusion techniques to generate the geometry of buildings. While these traditional computer graphics methods can produce high-quality 3D data, all parameters must be pre-specified in the procedural generation process. The resultant 3D data is bound by rule limitations, introducing a certain level of deviation from the real world, which significantly constrains flexibility and practical usability. Infinigen [42] introduces a technique for procedurally generating realistic 3D natural objects and scenes. This methodology facilitates the programmatic generation of all assets, with meshes and textures generated through random mathematical algorithms. Although Infinigen generates an endless variety of realistic assets, users are unable to customize the generated outcomes to their specific requirements. Therefore, we propose a more convenient procedural generation method to produce high-quality 3D assets.
Benefiting from the knowledge embedded in large language models (LLMs) [1, 3, 10, 26, 41], researchers have explored LLMs to address intricate tasks beyond canonical language processing. These tasks encompass areas such as mathematical reasoning [22, 50], medicine [23, 53], and planning [14, 20, 21, 57]. Thanks to their powerful reasoning and generalization capabilities, LLMs can act as capable planners for diverse tasks. For example, [20] utilizes the expansive domain knowledge of LLMs learned from the internet and their emergent zero-shot planning capabilities to execute intricate task planning and reasoning. [14] investigates the application of LLMs in scenarios involving multi-agent coordination, covering a range of diverse task objectives. [56] presents a modular framework that employs structured dialogue through prompts among multiple large pretrained models. Moreover, specialized LLMs for particular applications have been explored, such as HuggingGPT [44] for vision perception tasks, VisualChatGPT [51] for multi-modality understanding, Voyager [49] and Ghost in the Minecraft [63] for embodied agents in Minecraft, SheetCopilot [28] for office software, and Codex [6] for Python code generation. Inspired by these works, we apply an LLM agent to PCG software, i.e., Blender, to enable automatic 3D asset generation.
Fig. 2: (a) Illustrative comparison between a tree PCG example and a tree API documentation example. (b) Illustration of the text-to-image retrieval process.
3 Methods
The proposed SceneX framework contains two main components: PCGBench and PCGPlanner. PCGBench is a large-scale dataset that includes diverse resources for procedural modeling; it serves as an asset bank and is used to evaluate generation performance. Specifically, it includes plugins and assets collected from the Internet, asset annotations, and hand-crafted API documents. Based on PCGBench, a novel PCGPlanner framework is proposed to automatically generate 3D scenes using both PCG and pre-defined models. It comprises a task planning stage, an asset retrieval stage, and an action execution stage, guided by agents with custom-built instructions. Through the cooperation between PCGBench and the PCGPlanner agents, SceneX is endowed with the capability of efficient, controllable generation of diverse, high-quality scenes.
3.1 PCGBench
To build PCGBench, we first collect abundant PCG resources from Blender Market (https://blendermarket.com/), which include action APIs and 3D assets. The action APIs consist of raw APIs and detailed parameters that are used to control asset generation. Specifically, we have collected 1,532 individual raw APIs in total, covering each aspect of scene-generation tasks, such as controlling the placement, color, size, and height of generated assets. Then, we manually merge the most essential raw APIs according to their functions and obtain 45 action functions. Each action function contains multiple raw APIs with the corresponding parameters to produce a complete 3D asset. For example, to generate a 3D tree as Fig. 2 shows, we define two individual action functions, tree_mesh_generate and tree_material_generate. Each function comprises several raw APIs with corresponding parameters such as TreeHeight, CrownDensity, and LeafDensity. These parameters can be manipulated at the action input, so we can generate various trees by changing them.
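As an illustration of how an action function merges raw APIs behind a few exposed parameters, a minimal sketch is given below; the modifier name and input keys are hypothetical placeholders rather than the actual PCGBench bindings.
'''python
# Minimal sketch of an action function that merges several raw geometry-node
# APIs behind a few human-readable parameters. The modifier name "Tree Gen"
# and the Input_* keys are hypothetical placeholders, not the real PCGBench bindings.
import bpy

def tree_mesh_generate(TreeHeight=5.0, CrownDensity=1.0, LeafDensity="Medium"):
    obj = bpy.context.object                     # the tree base object
    mod = obj.modifiers["Tree Gen"]              # hypothetical geometry-nodes modifier
    mod["Input_2"] = TreeHeight                  # raw API controlling trunk height
    mod["Input_3"] = CrownDensity                # raw API controlling crown density
    mod["Input_4"] = {"Dense": 2, "Medium": 1}[LeafDensity]  # raw API for leaf level
    obj.data.update()                            # refresh the evaluated mesh
    return obj

# Changing the exposed parameters yields different trees from the same rule set:
# tree_mesh_generate(TreeHeight=12.0, CrownDensity=1.4, LeafDensity="Dense")
'''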
To achieve controllable 3D model generation guided by human instructions, we formulate the action functions for large language models (LLMs), e.g., GPT-4 (https://openai.com/gpt-4), following previous works [6, 45, 49]. To achieve this, we compile guidance documents for the action functions, denoted as API Documents, to expose our functions to LLMs. Concretely, each document includes information on the API type, function description, API name, and a set of raw APIs with the corresponding parameter details. Besides, we also provide a comprehensive record of all parameter names and their possible values; details are shown in Fig. 2. Moreover, we have categorized these documents into four modules: {Terrain, City, Detail, Weather}. Simultaneously, we have developed a concise document outlining the scene generation process, providing detailed explanations of the construction sequence and capability boundaries of each module.
Regarding asset collection, we have meticulously curated an extensive 3D dataset comprising 1,908 finely crafted models and 1,294 textures. This diverse collection spans categories including buildings, flora, urban infrastructure, and textures gathered from the Internet. These existing assets provide an alternative solution for 3D scene generation by retrieving individual procedural models rather than generating them progressively, which can significantly improve the efficiency of creating large-scale scenes. For retrieval, we use CLIP [40] encoder features to support both text-to-text and text-to-image retrieval. Accordingly, we also provide abundant annotations for the collected 3D assets and render an image for each asset.
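To make the retrieval step concrete, the following sketch performs text-to-image retrieval over the rendered asset annotations with an off-the-shelf CLIP encoder; the checkpoint name, asset dictionary, and file paths are illustrative assumptions.
'''python
# Minimal sketch of CLIP-based text-to-image asset retrieval; the asset list and
# rendered-image paths are illustrative assumptions about PCGBench's layout.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def retrieve_asset(query: str, asset_images: dict) -> str:
    """Return the asset name whose rendered image best matches the text query."""
    names = list(asset_images)
    images = [Image.open(path) for path in asset_images.values()]
    inputs = processor(text=[query], images=images, return_tensors="pt", padding=True)
    with torch.no_grad():
        text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                           attention_mask=inputs["attention_mask"])
        image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)     # normalize embeddings
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    scores = (image_emb @ text_emb.T).squeeze(-1)                 # cosine similarity per asset
    return names[int(scores.argmax())]

# retrieve_asset("a tall pine tree", {"pine_01": "renders/pine_01.png"})
'''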
3.2 PCGPlanner
To achieve LLM-driven asset generation, we introduce a hierarchy of PCGPlanner agents, comprising multiple agents that can automatically generate high-quality, large-scale scenes corresponding to users' textual descriptions. Following previous works [6, 44, 49, 51], we design our framework to encompass
three primary stages: task planning, asset retrieval, and action execution. In the task planning stage, a sequential planning agent pair is introduced to process the user query and output a detailed sub-task list. Then, to precisely connect each sub-task with PCGBench, we retrieve the actions and assets mentioned in the task list. Finally, the execution agent leverages the annotated action APIs to process sub-tasks of procedural generation or asset manipulation.
Fig. 3: The PCGPlanner framework comprises three essential components: the task planner, asset retrieval, and action execution. This framework empowers LLMs with the capabilities for task planning in complex scenarios, utilizing multiple API actions, and facilitating large-scale scene generation.
Agent construction To endow the LLMs with the capability to model scenes like a specialist, we introduce a systematic prompt template that casts LLMs as agents for the subtasks of a scene modeling task. For each agent, the prompt has a similar structure defined as P_i(R_i, T_i, D_i, F_i, E_i), where i ∈ {dispatch, specialist, retrieval, execution} distinguishes the responsibility of each agent. The constituents of the prompt are defined as follows (a minimal assembly sketch is given after the list):
• Role Each agent is given a specific role R that describes its responsibility in the scene generation process.
• Task T gives a detailed explanation of the goals of the agent; the constraints on executing these tasks are also expounded.
• Document At each step, the agent is prompted with a knowledge document according to its task. We denote D as the collection of all the knowledge documentation pre-defined in PCGBench.
• Format F denotes the output format of each agent. To precisely and concisely convey information between agents, the output format of each agent is strictly defined.
• Examples To help agents respond strictly following the format, we provide several examples E for reference in each prompt.
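To make the template concrete, the sketch below assembles a prompt from the five constituents; the field contents shown for the specialist agent are abbreviated illustrations, not the exact templates in PCGBench.
'''python
# Minimal sketch of assembling an agent prompt P_i from its five constituents
# (Role, Task, Document, Format, Examples); the example specialist fields are
# abbreviated illustrations rather than the exact paper templates.
from dataclasses import dataclass

@dataclass
class AgentPrompt:
    role: str        # R_i: responsibility of the agent
    task: str        # T_i: goal and constraints
    document: str    # D_i: knowledge documentation drawn from PCGBench
    fmt: str         # F_i: strictly defined output format
    examples: str    # E_i: demonstrations of correctly formatted outputs

    def render(self) -> str:
        return (f"Role: {self.role}\n"
                f"Task: {self.task}\n"
                f"Document:\n{self.document}\n"
                f"Format: {self.fmt}\n"
                f"Examples:\n{self.examples}")

specialist = AgentPrompt(
    role="You serve as a Blender modeling specialist.",
    task="Break the given object into a detailed, ordered plan of sub-tasks.",
    document="<generation mechanism of the relevant module>",
    fmt="{Step 1: ...}, {Step 2: ...}",
    examples="{Step 1: Retrieve the maple tree model}, ...",
)
prompt_text = specialist.render()   # the text passed to the LLM
'''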
Task Planner For a scene generation task, multiple action APIs from different modules are usually needed, and these APIs must be coordinated. A dispatch agent is therefore first prompted with the user query q and produces a rough plan of objects o_1, ..., o_n.
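Following the notation of Eq. (4) and the description in Appendix A.2, a plausible form of this first planning step is:
\{o_1 , \dots , o_n\} \leftarrow Agent_{dispatch}(q, P_{dispatch})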
where q is the user input and o_i stands for an object listed in the rough plan in the form of {Module: Object name}.
Specialist agent Then, the specialist agent is activated to make further detailed
plans. At this stage, a specialist agent serves as a professional designer who is ca-
pable of giving detailed plans for the objects given in the rough plan. According
to the module class of the object, the specialist agent is prompted with the corre-
sponding knowledge documentation Dspecialist ⊂ D, where Dspecialist contains
the generation mechanisms and explanations for each module. The specialist
agent decomposes each rough plan into a detailed plan with several sub-tasks s_1, ..., s_m, where each sub-task corresponds to an executable API or an asset to be imported.
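Asset Retrieval For each sub-task s_j, a retrieval agent matches the required API or asset against PCGBench using the CLIP-based retrieval of Section 3.1. Assuming the paper's notation, this step can plausibly be written as:
(\alpha , \gamma ) \leftarrow Agent_{retrieval}(s_j , P_{retrieval})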
where α ∈ A stands for a retrieved API, A is the API collection, and γ ∈ Γ is the retrieved asset from the asset collection Γ. For text-to-text retrieval, the most relevant API is selected by its functional description embedding. If the retrieval result is an asset, it is directly imported into the scene project; otherwise, the retrieved API α is passed to the action execution agent.
Action Execution In PCGBench, each action API is defined as a Python function with multiple pre-defined parameters. After obtaining α, the execution agent customizes these initial parameters to meet the task description. For each API α, there is a knowledge documentation D_α ∈ D that is passed into the prompt P_specialist. As illustrated in Fig. 2 (a), the knowledge documentation of each action API is recorded in JSON format with the same structure. Taking the tree_mesh_generate API as an example, each parameter of the function is precisely defined and explained in tree_mesh_doc = {"type": "tree mesh", "Description": "Function for generating trees mesh.", "Function API": "tree_mesh_generate", "Parameter Information": {"Possible values": ("High", "Medium", "Low"), ... } }. The execution agent can be expressed as:
\alpha ^* \leftarrow Agent_{execution}(\alpha , P_{specialist}, s_j) (4)
where α* is the manipulated version of the retrieved API α. The task description may not contain enough information to determine all the parameters; in this situation, we instruct the execution agent to automatically choose a feasible value from the options according to the other parameters. The execution agent ultimately executes the action API with the inferred parameters in the Blender Python environment, incrementally adding new objects and modifying the scene project until all tasks are completed.
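The final binding of inferred parameters to the retrieved API can be sketched as follows; the action registry, the JSON output format of the agent, and the stand-in action function are illustrative assumptions.
'''python
# Minimal sketch of the execution step: the parameter values inferred by the
# execution agent are bound to the retrieved action API and run inside Blender's
# Python environment. The registry and output format are illustrative assumptions.
import json

def tree_mesh_generate(**params):
    """Stand-in for a PCGBench action API (see the earlier sketch)."""
    print("generating tree with", params)

ACTION_REGISTRY = {"tree_mesh_generate": tree_mesh_generate}

def execute_action(api_name: str, agent_output: str):
    """Parse the execution agent's JSON parameter output and call the action API."""
    params = json.loads(agent_output)            # e.g. {"TreeHeight": 15, ...}
    return ACTION_REGISTRY[api_name](**params)   # incrementally modifies the scene

execute_action("tree_mesh_generate",
               '{"TreeHeight": 15, "CrownDensity": 1.2, "LeafDensity": "Dense"}')
'''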
4 Experiments
The goals of our experiments are threefold: (i) to verify the capability of SceneX
for generating photorealistic large-scale scenes, including nature scenes and cities;
(ii) to demonstrate the effectiveness of SceneX for personalized editing, such as adding or modifying assets; and (iii) to compare different LLMs on the proposed benchmark.
Fig. 4: Visualization of the generation quality for large-scale scenes and cities, given prompts such as "Early summer, under the bright sun, the forest is thriving with lush vegetation. Cherry blossoms bloom, green grass covers the ground, and a river flows calmly." and "Bright lights illuminate the modern city, showcasing a distinctive contemporary style."
proportion of proposed actions that can be executed, and the latter is used to evaluate action correctness [7]. Moreover, to quantify the aesthetic level of the generated scenes, we adopt a unified judgment standard as a reference. We divide the aesthetics of generated scenes into five levels: Poor (1-2 points), Below Average (3-4 points), Average (5-6 points), Good (7-8 points), and Excellent (9-10 points). We enlisted 35 volunteers, including 5 PCG experts, to assess the quality of our generation, and compute the average score (AS) and the average expert score (AES) to evaluate the effectiveness of our method.
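For clarity, the two aggregate metrics can be computed as sketched below, assuming one 1-10 rating per volunteer with experts flagged separately (AS averages over all raters, AES over the expert subset).
'''python
# Minimal sketch of the aggregate aesthetic metrics; the rating layout (one
# 1-10 score per volunteer, experts flagged separately) is an assumption.
def aesthetic_metrics(scores, is_expert):
    """scores: list of 1-10 ratings; is_expert: parallel list of booleans."""
    avg_score = sum(scores) / len(scores)                        # AS over all 35 raters
    expert_scores = [s for s, e in zip(scores, is_expert) if e]
    avg_expert_score = sum(expert_scores) / len(expert_scores)   # AES over the 5 experts
    return avg_score, avg_expert_score
'''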
Fig. 5: Comparative results on city generation against PersistentNature, SceneDreamer, InfiniCity, and CityDreamer. Issues with unreasonable geometry are observed in previous works, while our method generates realistic large-scale city scenes.
structure consistency, but the building quality is still relatively low. These factors limit their large-scale application in industry. In comparison, the proposed SceneX generates highly realistic and well-structured urban scenes, without the structural distortions and layout defects observed in learning-based methods. These results demonstrate the effectiveness of our method for large-scale city scene generation.
Aesthetic Evaluation. To better evaluate the generation quality of SceneX, we collect the results of related works covering text-to-3D and Blender-driven 3D generation. These results are subjected to aesthetic evaluation by a panel comprising 35 voluntary contributors and 5 experts in 3D modeling, with the scoring criteria outlined in Section 4.1. As shown in Table 1, our AS and AES scores surpass the second-highest scores by 2.10 and 0.9 points, respectively. Relative to the average level of other works, our results reach the Good level, indicating the high generation quality of SceneX.
To evaluate the consistency between text inputs and the generated assets, we calculate the CLIP similarity between the input text and rendered images. To better illustrate the results, we utilize three different CLIP models for testing: ViT-L/14 (V-L/14), ViT-B/16 (V-B/16), and ViT-B/32 (V-B/32). The detailed results are displayed in Table 2. We compare representative text-to-3D approaches (e.g., WonderJourney [55], Text2Room [16], and DreamFusion [39]) and Blender-driven 3D generation works (e.g., BlenderGPT, 3D-GPT [45], and SceneCraft [19]). Although the similarity scores of the text-to-3D methods are higher, this is reasonable because their training or optimization includes a text-to-image alignment process. Compared to the Blender-driven 3D
generation works, SceneX achieves the highest score, indicating its capability to accurately execute the input prompts and generate consistent results.
Efficiency Evaluation. To illustrate the efficiency of SceneX, we report the time required by our method compared with Infinigen [42] and manual creation by human experts. The experiments are performed on a server equipped with dual Intel Xeon processors (Skylake architecture), each with 20 cores, totaling 80 CPU cores. Additionally, we consult 3D PCG experts to determine the time needed to construct the same natural and urban scenes. The comparison results are shown in Table 3. From the results, we observe that SceneX is 7 times faster than Infinigen [42] in generating a scene of 150m × 150m. Our method is also nearly 30 times faster than human experts creating a large-scale city by hand. This demonstrates the impressive efficiency of SceneX in both large-scale natural and urban scene generation.
Personalized Editing Results. To demonstrate the capability of our method for personalized editing, we conduct experiments on 3D asset generation guided by users' instructions. The results are shown in Fig. 10. It is evident that the changes in the editing text are closely reflected in the modifications of the 3D assets. SceneX demonstrates a versatile, highly controllable, and personalized editing capability.
Table 3: Comparing the time required for natural scene generation and city generation
at different terrain scales.
Table 4: Results of different prompt components for tree asset generation. Examples,
Role, Document and Task represent four integral components of the prompt template.
5 Conclusion
In this work, we presented SceneX, an LLM-driven procedural modeling framework that reduces the effort of large-scale scene creation from over two weeks for professional PCG engineers to just a few hours for an ordinary user. Our method's
capabilities in controllable large-scale scene generation and editing, including
tasks like asset placement and season translation, are validated through exten-
sive experiments.
Limitations Although the proposed SceneX is effective for LLM-driven procedural large-scale scene generation, it still faces several limitations: 1) The performance of our method relies on the pretrained LLM, which may constrain the framework from generalizing to a wider range of applications. 2) We have collected a limited number of assets and APIs, which limits the variety of generated scenes and the size of the action space. In the future, it is promising to fine-tune LLMs with professional modeling knowledge and empower them to act as professional modeling assistants. Collecting more resources for PCGBench can also enhance scene fidelity and aesthetics.
6 Acknowledgement
This work was supported in part by the National Key R&D Program of China
(No. 2022ZD0116500), the National Natural Science Foundation of China (No.
U21B2042, No. 62320106010), and in part by the 2035 Innovation Program of
CAS and the InnoHK program.
References
1. Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F.L., Almeida,
D., Altenschmidt, J., Altman, S., Anadkat, S., et al.: Gpt-4 technical report. arXiv
preprint arXiv:2303.08774 (2023) 4
2. Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Nee-
lakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A.,
Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Win-
ter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark,
J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language
models are few-shot learners (2020) 14
3. Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee,
P., Lee, Y.T., Li, Y., Lundberg, S., et al.: Sparks of artificial general intelligence:
Early experiments with gpt-4. arXiv preprint arXiv:2303.12712 (2023) 4
4. Chai, L., Tucker, R., Li, Z., Isola, P., Snavely, N.: Persistent nature: A generative
model of unbounded 3d worlds. In: Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition. pp. 20863–20874 (2023) 10
5. Chen, G., Esch, G., Wonka, P., Müller, P., Zhang, E.: Interactive procedural street
modeling. In: ACM SIGGRAPH 2008 papers, pp. 1–10 (2008) 2
6. Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H.P.d.O., Kaplan, J., Edwards,
H., Burda, Y., Joseph, N., Brockman, G., et al.: Evaluating large language models
trained on code. arXiv preprint arXiv:2107.03374 (2021) 5, 6
7. Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H.P.d.O., Kaplan, J., Edwards,
H., Burda, Y., Joseph, N., Brockman, G., et al.: Evaluating large language models
trained on code. arXiv preprint arXiv:2107.03374 (2021) 10
8. Chen, R., Chen, Y., Jiao, N., Jia, K.: Fantasia3d: Disentangling geometry
and appearance for high-quality text-to-3d content creation. arXiv preprint
arXiv:2303.13873 (2023) 3
9. Chen, Z., Wang, G., Liu, Z.: Scenedreamer: Unbounded 3d scene generation from
2d image collections. arXiv preprint arXiv:2302.01330 (2023) 2, 4, 10
10. Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A.,
Barham, P., Chung, H.W., Sutton, C., Gehrmann, S., et al.: Palm: Scaling lan-
guage modeling with pathways. Journal of Machine Learning Research 24(240),
1–113 (2023) 4
11. Fridman, R., Abecasis, A., Kasten, Y., Dekel, T.: Scenescape: Text-driven consis-
tent scene generation. arXiv preprint arXiv:2302.01133 (2023) 3
12. Gasch, C., Sotoca, J., Chover, M., Remolar, I., Rebollo, C.: Procedural modeling of
plant ecosystems maximizing vegetation cover. Multimedia Tools and Applications
81 (05 2022). https://doi.org/10.1007/s11042-022-12107-8 4
13. Gemma Team, Google DeepMind: Gemma: Open models based on Gemini research and technology (2024) 14
14. Gong, R., Huang, Q., Ma, X., Vo, H., Durante, Z., Noda, Y., Zheng, Z., Zhu, S.C.,
Terzopoulos, D., Fei-Fei, L., et al.: Mindagent: Emergent gaming interaction. arXiv
preprint arXiv:2309.09971 (2023) 4
15. Hao, Z., Mallya, A., Belongie, S., Liu, M.Y.: Gancraft: Unsupervised 3d neural
rendering of minecraft worlds. In: Proceedings of the IEEE/CVF International
Conference on Computer Vision. pp. 14072–14082 (2021) 4
16. Höllein, L., Cao, A., Owens, A., Johnson, J., Nießner, M.: Text2room: Extracting
textured 3d meshes from 2d text-to-image models. arXiv preprint arXiv:2303.11989
(2023) 3, 11, 12
17. Hong, F., Chen, Z., Yushi, L., Pan, L., Liu, Z.: Eva3d: Compositional 3d human
generation from 2d image collections. In: The Eleventh International Conference
on Learning Representations (2022) 2
18. Hong, F., Zhang, M., Pan, L., Cai, Z., Yang, L., Liu, Z.: Avatarclip: Zero-shot text-
driven generation and animation of 3d avatars. arXiv preprint arXiv:2205.08535
(2022) 3
19. Hu, Z., Iscen, A., Jain, A., Kipf, T., Yue, Y., Ross, D.A., Schmid, C., Fathi, A.:
Scenecraft: An llm agent for synthesizing 3d scene as blender code (2024) 2, 11,
12
20. Huang, W., Abbeel, P., Pathak, D., Mordatch, I.: Language models as zero-shot
planners: Extracting actionable knowledge for embodied agents. In: International
Conference on Machine Learning. pp. 9118–9147. PMLR (2022) 4
21. Huang, W., Wang, C., Zhang, R., Li, Y., Wu, J., Fei-Fei, L.: Voxposer: Composable
3d value maps for robotic manipulation with language models. arXiv preprint
arXiv:2307.05973 (2023) 4
22. Imani, S., Du, L., Shrivastava, H.: Mathprompter: Mathematical reasoning using
large language models. arXiv preprint arXiv:2303.05398 (2023) 4
23. Jeblick, K., Schachtner, B., Dexl, J., Mittermeier, A., Stüber, A.T., Topalis, J.,
Weber, T., Wesp, P., Sabel, B.O., Ricke, J., et al.: Chatgpt makes medicine easy
to swallow: an exploratory case study on simplified radiology reports. European
radiology pp. 1–9 (2023) 4
24. Jiang, A.Q., Sablayrolles, A., Mensch, A., Bamford, C., Chaplot, D.S., de las Casas,
D., Bressand, F., Lengyel, G., Lample, G., Saulnier, L., Lavaud, L.R., Lachaux,
M.A., Stock, P., Scao, T.L., Lavril, T., Wang, T., Lacroix, T., Sayed, W.E.: Mistral
7b (2023) 14
25. Jiang, R., Wang, C., Zhang, J., Chai, M., He, M., Chen, D., Liao, J.: Avatarcraft:
Transforming text into neural human avatars with parameterized shape and pose
control. arXiv preprint arXiv:2303.17606 (2023) 3
26. Kenton, J.D.M.W.C., Toutanova, L.K.: Bert: Pre-training of deep bidirectional
transformers for language understanding. In: Proceedings of naacL-HLT. vol. 1,
p. 2 (2019) 4
27. Kolotouros, N., Alldieck, T., Zanfir, A., Bazavan, E.G., Fieraru, M., Smin-
chisescu, C.: Dreamhuman: Animatable 3d avatars from text. arXiv preprint
arXiv:2306.09329 (2023) 2, 3
28. Li, H., Su, J., Chen, Y., Li, Q., Zhang, Z.: Sheetcopilot: Bringing software
productivity to the next level through large language models. arXiv preprint
arXiv:2305.19308 (2023) 3, 5
29. Li, Y., Jiang, L., Xu, L., Xiangli, Y., Wang, Z., Lin, D., Dai, B.: Matrixcity: A
large-scale city dataset for city-scale neural rendering and beyond. In: Proceedings
of the IEEE/CVF International Conference on Computer Vision. pp. 3205–3215
(2023) 2, 4
30. Lin, C.H., Gao, J., Tang, L., Takikawa, T., Zeng, X., Huang, X., Kreis, K., Fidler,
S., Liu, M.Y., Lin, T.Y.: Magic3d: High-resolution text-to-3d content creation.
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition. pp. 300–309 (2023) 3, 12
31. Lin, C.H., Lee, H.Y., Menapace, W., Chai, M., Siarohin, A., Yang, M.H., Tulyakov,
S.: Infinicity: Infinite-scale city synthesis. arXiv preprint arXiv:2301.09637 (2023)
2, 4, 10
32. Lindenmayer, A.: Mathematical models for cellular interactions in development
i. filaments with one-sided inputs. Journal of Theoretical Biology 18(3), 280–299
(1968). https://doi.org/https://doi.org/10.1016/0022- 5193(68)90079- 9,
https://www.sciencedirect.com/science/article/pii/0022519368900799 2
33. Lipp, M., Scherzer, D., Wonka, P., Wimmer, M.: Interactive modeling of city lay-
outs using layers of procedural content. In: Computer Graphics Forum. vol. 30, pp.
345–354. Wiley Online Library (2011) 2, 4
34. Liu, A., Tucker, R., Jampani, V., Makadia, A., Snavely, N., Kanazawa, A.: Infinite
nature: Perpetual view generation of natural scenes from a single image. In: Pro-
ceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)
(October 2021) 4
35. Liu, R., Wu, R., Van Hoorick, B., Tokmakov, P., Zakharov, S., Vondrick, C.: Zero-
1-to-3: Zero-shot one image to 3d object. In: Proceedings of the IEEE/CVF Inter-
national Conference on Computer Vision. pp. 9298–9309 (2023) 2, 3
36. Melas-Kyriazi, L., Laina, I., Rupprecht, C., Vedaldi, A.: Realfusion: 360deg recon-
struction of any object from a single image. In: Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition. pp. 8446–8455 (2023) 2
37. OpenAI, Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F.L.,
Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., Avila, R., Babuschkin, I.,
Balaji, S., Balcom, V., Baltescu, P., Bao, H., Bavarian, M., Belgum, J., Bello, I.,
Berdine, J., Bernadett-Shapiro, G., Berner, C., Bogdonoff, L., Boiko, O., Boyd,
M., Brakman, A.L., Brockman, G., Brooks, T., Brundage, M., Button, K., Cai, T.,
Campbell, R., Cann, A., Carey, B., Carlson, C., Carmichael, R., Chan, B., Chang,
C., Chantzis, F., Chen, D., Chen, S., Chen, R., Chen, J., Chen, M., Chess, B.,
Cho, C., Chu, C., Chung, H.W., Cummings, D., Currier, J., Dai, Y., Decareaux,
C., Degry, T., Deutsch, N., Deville, D., Dhar, A., Dohan, D., Dowling, S., Dunning,
S., Ecoffet, A., Eleti, A., Eloundou, T., Farhi, D., Fedus, L., Felix, N., Fishman,
S.P., Forte, J., Fulford, I., Gao, L., Georges, E., Gibson, C., Goel, V., Gogineni,
T., Goh, G., Gontijo-Lopes, R., Gordon, J., Grafstein, M., Gray, S., Greene, R.,
Gross, J., Gu, S.S., Guo, Y., Hallacy, C., Han, J., Harris, J., He, Y., Heaton, M.,
Heidecke, J., Hesse, C., Hickey, A., Hickey, W., Hoeschele, P., Houghton, B., Hsu,
K., Hu, S., Hu, X., Huizingi, J., Jain, S., Jain, S., et al.: Gpt-4 technical report
(2023) 14
38. Parish, Y., Müller, P.: Procedural modeling of cities. vol. 2001, pp. 301–308 (08
2001). https://doi.org/10.1145/1185657.1185716 4
39. Poole, B., Jain, A., Barron, J.T., Mildenhall, B.: Dreamfusion: Text-to-3d using 2d
diffusion. In: The Eleventh International Conference on Learning Representations
(2022) 2, 3, 4, 11, 12
40. Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G.,
Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from
natural language supervision. In: International conference on machine learning. pp.
8748–8763. PMLR (2021) 6, 8
41. Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li,
W., Liu, P.J.: Exploring the limits of transfer learning with a unified text-to-text
transformer. The Journal of Machine Learning Research 21(1), 5485–5551 (2020)
4
42. Raistrick, A., Lipson, L., Ma, Z., Mei, L., Wang, M., Zuo, Y., Kayan, K., Wen, H.,
Han, B., Wang, Y., et al.: Infinite photorealistic worlds using procedural generation.
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition. pp. 12630–12641 (2023) 2, 4, 12, 13
43. Raj, A., Kaza, S., Poole, B., Niemeyer, M., Ruiz, N., Mildenhall, B., Zada, S.,
Aberman, K., Rubinstein, M., Barron, J., et al.: Dreambooth3d: Subject-driven
text-to-3d generation. arXiv preprint arXiv:2303.13508 (2023) 2
44. Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: Hugginggpt: Solving ai
tasks with chatgpt and its friends in huggingface. arXiv preprint arXiv:2303.17580
(2023) 3, 5, 6
45. Sun, C., Han, J., Deng, W., Wang, X., Qin, Z., Gould, S.: 3d-gpt: Procedural 3d
modeling with large language models. arXiv preprint arXiv:2310.12945 (2023) 2,
6, 11, 12
46. Talton, J.O., Lou, Y., Lesser, S., Duke, J., Mech, R., Koltun, V.: Metropolis pro-
cedural modeling. ACM Trans. Graph. 30(2), 11–1 (2011) 2, 4
47. Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bash-
lykov, N., Batra, S., Bhargava, P., Bhosale, S., Bikel, D., Blecher, L., Ferrer, C.C.,
Chen, M., Cucurull, G., Esiobu, D., Fernandes, J., Fu, J., Fu, W., Fuller, B., Gao,
C., Goswami, V., Goyal, N., Hartshorn, A., Hosseini, S., Hou, R., Inan, H., Kar-
das, M., Kerkez, V., Khabsa, M., Kloumann, I., Korenev, A., Koura, P.S., Lachaux,
M.A., Lavril, T., Lee, J., Liskovich, D., Lu, Y., Mao, Y., Martinet, X., Mihaylov,
T., Mishra, P., Molybog, I., Nie, Y., Poulton, A., Reizenstein, J., Rungta, R., Sal-
adi, K., Schelten, A., Silva, R., Smith, E.M., Subramanian, R., Tan, X.E., Tang,
B., Taylor, R., Williams, A., Kuan, J.X., Xu, P., Yan, Z., Zarov, I., Zhang, Y., Fan,
A., Kambadur, M., Narang, S., Rodriguez, A., Stojnic, R., Edunov, S., Scialom,
T.: Llama 2: Open foundation and fine-tuned chat models (2023) 14
48. Vanegas, C., Kelly, T., Weber, B., Halatsch, J., Aliaga, D., Müller, P.: Procedural
generation of parcels in urban modeling. Computer Graphics Forum 31, 681–690
(05 2012). https://doi.org/10.1111/j.1467-8659.2012.03047.x 4
49. Wang, G., Xie, Y., Jiang, Y., Mandlekar, A., Xiao, C., Zhu, Y., Fan, L., Anand-
kumar, A.: Voyager: An open-ended embodied agent with large language models.
arXiv preprint arXiv:2305.16291 (2023) 3, 5, 6
50. Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q.V., Zhou,
D., et al.: Chain-of-thought prompting elicits reasoning in large language models.
Advances in Neural Information Processing Systems 35, 24824–24837 (2022) 4
51. Wu, C., Yin, S., Qi, W., Wang, X., Tang, Z., Duan, N.: Visual chatgpt:
Talking, drawing and editing with visual foundation models. arXiv preprint
arXiv:2303.04671 (2023) 3, 5, 6
52. Xie, H., Chen, Z., Hong, F., Liu, Z.: Citydreamer: Compositional generative model
of unbounded 3d cities. arXiv preprint arXiv:2309.00610 (2023) 2, 4, 10, 12
53. Yang, K., Ji, S., Zhang, T., Xie, Q., Ananiadou, S.: On the evaluations of chat-
gpt and emotion-enhanced prompting for mental health analysis. arXiv preprint
arXiv:2304.03347 (2023) 4
54. Yang, Y.L., Wang, J., Vouga, E., Wonka, P.: Urban pattern: Layout design by hi-
erarchical domain splitting. ACM Transactions on Graphics (Proceedings of SIG-
GRAPH Asia 2013) 32, Article No. xx (2013) 4
55. Yu, H.X., Duan, H., Hur, J., Sargent, K., Rubinstein, M., Freeman, W.T., Cole, F.,
Sun, D., Snavely, N., Wu, J., Herrmann, C.: Wonderjourney: Going from anywhere
to everywhere (2023) 11, 12
56. Zeng, A., Attarian, M., Ichter, B., Choromanski, K., Wong, A., Welker, S., Tombari,
F., Purohit, A., Ryoo, M., Sindhwani, V., et al.: Socratic models: Composing zero-
shot multimodal reasoning with language. arXiv preprint arXiv:2204.00598 (2022)
4
57. Zhang, C., Yang, K., Hu, S., Wang, Z., Li, G., Sun, Y., Zhang, C., Zhang, Z.,
Liu, A., Zhu, S.C., et al.: Proagent: Building proactive cooperative ai with large
language models. arXiv preprint arXiv:2308.11339 (2023) 4
58. Zhang, C., Chen, Y., Fu, Y., Zhou, Z., Yu, G., Wang, B., Fu, B., Chen, T., Lin, G.,
Shen, C.: Styleavatar3d: Leveraging image-text diffusion models for high-fidelity
3d avatar generation. arXiv preprint arXiv:2305.19012 (2023) 2, 3
59. Zhang, G., Wang, Y., Luo, C., Xu, S., Peng, J., Zhang, Z., Zhang, M.: Furniscene:
A large-scale 3d room dataset with intricate furnishing scenes. arXiv preprint
arXiv:2401.03470 (2024) 3
60. Zhang, J., Wang, C.b., Qin, H., Chen, Y., Gao, Y.: Procedural modeling of rivers
from single image toward natural scene production. The Visual Computer 35 (02
2019). https://doi.org/10.1007/s00371-017-1465-7 4
61. Zhang, J., Li, X., Wan, Z., Wang, C., Liao, J.: Text2nerf: Text-driven 3d scene
generation with neural radiance fields. arXiv preprint arXiv:2305.11588 (2023) 3
62. Zhang, Q., Wang, C., Siarohin, A., Zhuang, P., Xu, Y., Yang, C., Lin, D., Zhou, B.,
Tulyakov, S., Lee, H.Y.: Scenewiz3d: Towards text-guided 3d scene composition.
arXiv preprint arXiv:2312.08885 (2023) 3
63. Zhu, X., Chen, Y., Tian, H., Tao, C., Su, W., Yang, C., Huang, G., Li, B., Lu, L.,
Wang, X., et al.: Ghost in the minecraft: Generally capable agents for open-world
environments via large language models with text-based knowledge and memory.
arXiv preprint arXiv:2305.17144 (2023) 5
A Supplementary Materials
A.1 PCGBench
Blender Market offers a wide range of PCG plugins and 3D asset resources.
To meet the demand for generating large-scale natural and urban scenes, we constructed PCGBench from four directions: Materials and Shading, Modeling, Rendering, and Asset Placement. Specifically, for generation-based PCG, we collected PCG models such as terrain, trees, flowers, rocks, and buildings. For material-based PCG, we gathered modifiable materials such as plant leaves, plant roots, terrain, buildings, snow, and sky. For placement-based PCG, we collected various placement methods, including slope-based, height-based, object-based, and geometric-space-based placements. For rendering-based PCG, we collected resources related to rendering engines, rendering settings, and renderer optimizations. Additionally, for 3D asset resources, we collected a wide range of categories, including buildings, vegetation, urban infrastructure, and textures sourced from the Internet.
Furthermore, to better integrate and utilize all the PCG plugins and 3D assets, we use Blender's Python API to create basic functions such as importing, deleting, moving, and deforming; these APIs can be accessed and called by the PCGPlanner to achieve better results (see the sketch below).
3D Asset Dataset: In this study, we have developed a comprehensive dataset consisting of a large collection of 3D assets, including 1,908 meticulously crafted models and 1,294 textures. To facilitate text-to-text and text-to-image retrieval with the CLIP model, we have extensively annotated each 3D asset and supplemented it with rendered images.
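A minimal sketch of such basic wrapper functions is shown below; the OBJ import operator differs across Blender versions, so the exact calls should be treated as assumptions.
'''python
# Minimal sketch of basic asset-manipulation wrappers over Blender's Python API.
# bpy.ops.wm.obj_import is the newer OBJ import operator; older Blender versions
# expose bpy.ops.import_scene.obj instead, so treat the exact call as an assumption.
import bpy

def import_asset(filepath: str) -> bpy.types.Object:
    bpy.ops.wm.obj_import(filepath=filepath)      # load an .obj asset into the scene
    return bpy.context.selected_objects[0]        # the newly imported object

def delete_object(obj: bpy.types.Object) -> None:
    bpy.data.objects.remove(obj, do_unlink=True)  # remove the object and unlink it

def move_object(obj: bpy.types.Object, x: float, y: float, z: float) -> None:
    obj.location = (x, y, z)                      # translate the object in world space
'''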
A.2 Algorithm
In this section, we provide the pseudocode for the entire framework to clearly illustrate the logical relationships and loops between the various agents. First, we use a loop to check the termination condition, i.e., the algorithm stops when End_flag = True. In each loop, we first call the dispatch agent, pass the input q and its prompt to the agent, and receive a set of rough-plan objects o_1, ..., o_n. Next, we iterate over each object and use the specialist agent to further break it down into a detailed plan, obtaining a set of sub-tasks s_1, ..., s_n. We then execute each detailed plan and call the retrieval agent and CLIP to search for the corresponding assets and APIs in PCGBench. Finally, we invoke the execution agent to manipulate each retrieved API α until all sub-tasks are completed, and then set End_flag to True.
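A Python-style restatement of this loop is given below; the agent interfaces and the PCGBench helper are illustrative assumptions that follow the notation of Section 3.2.
'''python
# Python-style restatement of the SceneX generation loop described above; the
# agent interfaces (dispatch/specialist/retrieval/execution) are illustrative
# assumptions that follow the notation of Section 3.2.
def scenex_generate(q, agents, pcgbench):
    end_flag = False
    while not end_flag:
        rough_plan = agents.dispatch(q)                      # {Module: Object name} entries o_1..o_n
        for obj in rough_plan:
            sub_tasks = agents.specialist(obj)               # detailed plan s_1..s_n
            for s in sub_tasks:
                alpha, gamma = agents.retrieval(s, pcgbench) # CLIP-based API/asset lookup
                if gamma is not None:
                    pcgbench.import_asset(gamma)             # assets are imported directly
                else:
                    agents.execution(alpha, s)               # APIs run with inferred parameters
        end_flag = True                                      # all sub-tasks completed
'''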
Template:
Role: You serve as a Blender modeling task planner that helps me design a scene.
Task: You will receive a paragraph description of the scene; your task is to help me make a list that includes all the
objects that may appear in the scene.
Document:
You should follow the principles when listing objects:
- The objects should be in one of the modules of {Terrain, City, Weather, Details}
- Terrain: the fundamental terrain, including {plain, basin, river, pool, mountain, hill, canyon}
- City: the major element that makes up the city {layout, road, sidewalk, vegetation, forest, buildings}
- Weather: the different light situation, including{morning, noon, afternoon, evening, midnight}
- Details: all the details may appear in the scene, for natural terrain there are {stone, grass, tree, brushwood,
flowers, …}, for the city, there are {people, trash bins, bicycles, cars, …}, the details are not limited in the
given list. You can imagine whatever may appear in the scene.
You should output a list that contains all the objects in a reasonable order. The objects should be arranged
according to the module in order from bottom to top. For the natural scene, it should be listed in the order of
{Terrain-Details-Weather}; for the city scene, it should follow the order {City-Details-Weather}.
Format: Output the rough plan in the following form:
{{Module : Object name},…,{Module : Object name}}
Examples:
{{City : Small and crowded layout}, {City : Asphalt road}, {City : Cement sidewalk}, {City : Grass}, {City :
Maple}, {City : Modern buildings}, {Details : Bicycle}, {Details : People}, {Details : Trash bin}, {Weather :
Sunny noon}}
Role: You serve as a Blender modeling specialist who helps me design a scene.
Task: You will receive an object description, your task is to make a list that contains a detailed plan to generate
this object in the blender project.
Document:
For each target object in the plan, there is a common generation and manipulation process. Each target may need to
go through at least one step to be placed in the scene, you should make a plan step by step according to the main
task list as follows:
Step 1: Retrieve the required asset. Target objects{Layout, Building, Grass, Tree, Terrain, Weather,…}
Step 2: Manipulate the parameters to change the mesh. Target objects{Terrain, Tree, Building,… }
Step 3: Make the texture manipulation. Target objects{Road, Leaf, Water, Weather, Ground,…}
Step 4: Set the proper location or distribution for the asset. Target objects{Tree, Stone, People, Trash bin, Car,…}
For each task, you should make an inference on which of these steps is to be used. In the meantime, you should
adopt the asset and its carrier according to the target task.
Format: Output the detailed plan in the following form :
{Step 1: The description of task 1},
{Step 2: The description of task 2},
{Step 3: The description of task 3},
{Step 4: The description of task 4}
Examples:
{Step 1: Retrieve the maple tree model},
{Step 2: Change it to a stout pine tree with lush foliage},
{Step 3: Change the leaves to yellow},
{Step 4: Distribute the pine tree at the parking area in a sparse density}
Fig. 7: Prompt Example for Planner: A Template, Code Snippet, and Specific Exam-
ple.
Template:
Role: You serve as a Blender API operator to choose the feasible parameter value according to the description.
Task: Your task is to break down the user's input into the parameters required for generating terrain mesh. The
API that you can invoke for this purpose is called "Terrain_mesh_generate".
Document:
[{Name: Rotate_Slope, Description: This parameter allows you to define the magnitude of the terrain slope,Value:
-6.280 – 6.280, Default: 0.000},
{Name: Flat_Floor,Description: This parameter allows you to flatten the ground,
Value:[True,False],Default:False},
{Name: Sink, Description: You can change the elevation or height of the terrain, Value: 0.000 – 1.000,Default:
0.500},
{Name: Add_Valleys,Description: Based on the input provided by the user, speculate on whether the addition of
valleys is necessary., Value: :[True,False],Default: False},
{Name: Scale_Valleys,Description: Based on the user input, speculate on the scaling factor for the valleys., Value:
0.000 – 5.000,Default: 1.000}, ...]
Format: Output the modified API in the following form:
[Rotate_Slope: You need to give a parameter value in the range [-6.280 ,6.280], present the answer in the form of
a list,
Flat_Floor: You can only choose one from ["True", "False"], present the answer in the form of a list,
Sink: You need to give a parameter value in the range [0, 1], present the answer in the form of a list, ...],
Add_Valleys: You can only choose one from ["True", "False"], present the answer in the form of a list,
Scale_Valleys: You need to give a parameter value in the range [0.000, 5.000], present the answer in the form of a
list...]...
Examples: Inputs: Construct a summer terrain of flat and vast grasslands. Outputs:[Rotate_Slope: [0.000],
Flat_Floor: ["True"], Sink: [0.500], Add_Valleys: ["True"], Scale_Valleys:[1.000] ...]
Code:
'''python
import Terrain_mesh_generate
def Terrain_mesh_generate(Rotate_Slope, Flat_Floor, Sink, Add_Valleys, Scale_Valleys, ...):
bpy.context.object.modifiers["Terrain Mix"]["Input_25"] = Rotate_Slope
bpy.context.object.modifiers["Terrain Mix"]["Input_26"] = Flat_Floor
bpy.context.object.modifiers["Terrain Mix"]["Input_27"] = Sink
bpy.context.object.modifiers["Terrain Mix"]["Input_41"] = Add_Valleys
bpy.context.object.modifiers["Terrain Mix"]["Input_43"] = Scale_Valleys
...
return
'''
Example:
Question: The summer highland terrain unveils a captivating expanse, with undulating hills and valleys covered in a
verdant carpet of lush green grasses, painting a breathtaking tableau of natural beauty.
Solution: Based on the provided document and your description, here are the parameter values you can use for
generating the summer highland terrain:
```plaintext
[Rotate_Slope: [3.142], Flat_Floor: ["False"], Sink: [0.750], Add_Valleys: ["True"], Scale_Valleys: [2.500]...]
```
This configuration will help create undulating hills and valleys, allowing the terrain to have a natural appearance while
maintaining its captivating beauty.
Code:
'''python
Terrain_mesh_generate(Rotate_Slope=3.142, Flat_Floor=False, Sink=0.750, Add_Valleys=True,
Scale_Valleys=2.500, ...)
'''
Fig. 8: Prompt example for terrain PCG mesh generation: a template, code snippet,
and specific example.
Template:
Role: You serve as the Blender API operator to choose feasible parameter values based on the description. You need to
determine the placement parameters for each individual asset
Task: Your task is to break down the user's input into the parameters required for placing objects. The API that
you can invoke for this purpose is called "Scatter".
Document:
[{Name: Density,Description: Based on the user input, speculate on the vegetation density coefficient.,Value: 0.02
- 0.50, Default: 0.20},
{Name: Max_Height, Description: Based on the user input, speculate on the maximum height for placing object ,
represented as a percentage, Value:0 - 1.000, Default:0.800},
{ Name: Min_Height, Description: Based on the user input, speculate on the minimum height for placing object
represented as a percentage, Value:0 - 1.000, Default:0.200},
{Name: Max_Slope, Description: Based on the user input, speculate on the maximum degree of inclination for
placing objects, Value: 0 - 90, Default= 90},
{Name: Min_Slope, Description: Based on the user input, speculate on the minimum degree of inclination for
placing objects, Value: 0 - 90, Default= 0},...]
Format: Output the modified API in the following form:
[Density: You need to give a parameter value in the range [0.02,0.50], present the answer in the form of a list,
Max_Height: You need to give a parameter value in the range [0, 1.000], present the answer in the form of a list,
Min_Height: You need to give a parameter value in the range [0, 1.000], present the answer in the form of a list,
Max_Slope : You need to give a parameter value in the range [0, 90], present the answer in the form of a list,
Min_Slope : You need to give a parameter value in the range [0, 90], present the answer in the form of a list,
...
Examples: Inputs: The tree stands bare without leaves, its branches spreading out in all directions. Outputs:
[Density: [0.2], Max_Height: [1.0], Min_Height: [0.6], Max_Slope : [60], Min_Slope : [0]]
Code:
'''python
import Scatter
def Scatter(object, Density, Max_Height, Min_Height, Max_Slope, Min_Slope, ...):
node_group = bpy.data.node_groups["Density_Scatter_3"]
node_group.nodes["Density_Scatter_slope"].inputs[5].default_value = math.radians(Min_Slope)
node_group.nodes["Density_Scatter_slope"].inputs[7].default_value = math.radians(Max_Slope)
node_group.nodes["Density_Scatter_elev"].inputs[5].default_value = Min_Height
node_group.nodes["Density_Scatter_elev"].inputs[7].default_value = Max_Height
...
return
'''
Example:
Question: The tree stands adorned with lush foliage, its branches reaching upwards in unison.
Solution: Based on the provided document and your description, here are the parameter values you can use for
placing objects, assuming the tree as the object:
```plaintext
[Density: [0.20], Max_Height: [1.000], Min_Height: [0.200], Max_Slope: [90], Min_Slope: [0], ...]
```
This configuration will help place the tree objects with a moderate density, extending from the minimum height of
20% to the maximum height of 100%, and with no restriction on the slope, allowing the branches to spread out in all
directions as described.
Code:
'''python
Scatter(obj, Density=0.20, Max_Height=1.000, Min_Height=0.200, Max_Slope=90, Min_Slope=0, ...)
'''
Fig. 9: Prompt example for object placement: a template, code snippet, and specific
example.
Template:
Role: You serve as a Blender API operator to choose the feasible parameter value according to the description.
Task: Your task is to break down the user's input into the parameters required for generating tree mesh. The API
that you can invoke for this purpose is called "Tree_mesh_generate".
Document:
[{Name: IsTrunkStraight,Description: Based on the input provided by the user, make an inference regarding
whether the tree trunk is perfectly straight or not,Value: [True,False], Default: True},
{Name: TreeHeight, Description: Based on the information provided by the user, estimate the height of the
tree,Value:0 - 100, Default:5},
{ Name: LeafShape, Description: Based on the input from the user, speculate on the shape of the leaves,Value:
[Hexagonal, Elongated, Acerose (Maple-like)], Default: Elongated},
{Name: CrownDensity, Description: Based on the input from the user, speculate on the density of the tree
canopy,Value: 0.5 - 1.5,Default= 1.0},
{Name: LeafDensity, Description: Based on the user's input, speculate on the density of the tree leaves, Value:
[Dense, Medium],Default = Medium}]
Format: Output the modified API in the following form:
[IsTrunkStraight : You can only choose one from ["True", "False"], present the answer in the form of a list,
TreeHeight : You need to give a parameter value in the range [0, 100], present the answer in the form of a list,
LeafShape : You can only choose one from [Hexagonal, Elongated, Acerose (Maple-like)], present the answer in
the form of a list,
CrownDensity : You can only choose one from [Dense, Medium, Sparse], present the answer in the form of a list,
LeafDensity : You can only choose one from [Dense, Medium], present the answer in the form of a list]
Examples: Inputs: Generate a tree with lush leaves and a moderate height. Outputs: [IsTrunkStraight : ["True"],
TreeHeight : [5], LeafShape : [Hexagonal], CrownDensity : [1.0], LeafDensity : [Dense]]
Code:
'''python
import Tree_mesh_generate
def Tree_mesh_generate(IsTrunkStraight, TreeHeight, LeafShape, CrownDensity, LeafDensity):
bpy.ops.preferences.addon_enable(module="add_curve_sapling")
para = process(IsTrunkStraight, TreeHeight, LeafShape, CrownDensity, LeafDensity) # Process parameters.
tree_mesh_generate_1(**para) # Generate a tree grid based on the given parameters.
return
'''
Example:
Question: The tree stands tall and straight, with branches reaching out like arms, adorned with lush green leaves that
gently sway in the breeze.
Solution: Based on the provided document and your description, here are the parameter values you can use for
generating the tree mesh:
```plaintext
[IsTrunkStraight: ["True"], TreeHeight: [15], LeafShape: ["Elongated"], CrownDensity: ["Dense"], LeafDensity:
["Medium"]]
```
This configuration will help create a tall tree with a straight trunk, adorned with lush green leaves that gently sway in
the breeze, capturing the essence of the description you provided.
Code:
'''python
Tree_mesh_generate(IsTrunkStraight=True, TreeHeight=15, LeafShape="Elongated", CrownDensity="Dense",
LeafDensity="Medium")
'''
Fig. 10: Prompt example for tree PCG mesh generation: a template, code snippet,
and specific example.
Template:
Role: You serve as a Blender API operator to choose the feasible parameter value according to the description.
Task: Your task is to break down the user's input into the parameters required for generating terrain. The API that
you can invoke for this purpose is called "map_generate".
Document:
[{Name: MapSize,Description: Based on the input provided by the user, make an inference about the map size for
the scene,Value: [500, 700, 1000, 1500, 2500], Default: 500},
{Name: BuildingDensity, Description: Based on the information provided by the user, estimate the building
density of the city, Value: [0 ~ 1], Default: 1},
{ Name: VegetationDensity, Description: Based on the input from the user, make an inference about the tree
density on the street ,Value: [None, Few, Normal, Massive], Default: Normal},
{Name: SidewalkStyle, Description: Based on the input from the user, make an inference about the surface texture
of the sidewalk, Value: [Soil, Asphalt, Cement, Wooden, Brick,...], Default: Cement},
...]
Format: Output the modified API in the following form:
[MapSize: You can only choose one from : [500, 700, 1000, 1500, 2500], present the answer in the form of a list,
BuildingDensity: You need to decide the parameter value in the range [0, 1], present the answer in the form of a
list,
VegetationDensity: You can only choose one from : [None, Few, Normal, Massive], present the answer in the
form of a list,
SidewalkStyle: You can only choose one from[Soil, Asphalt, Cement, Wooden, Brick,...], present the answer in
the form of a list,
Examples: Inputs: Generate a small block with crowded buildings and no vegetation. Outputs: [MapSize : [500],
BuildingDensity : [1], VegetationDensity : [None], SidewalkStyle : [Cement], …]
Code:
'''python
import map_generate
def map_generate(MapSize, BuildingDensity, VegetationDensity, SidewalkStyle, ...):
para = process(MapSize, BuildingDensity, VegetationDensity, SidewalkStyle, ...) # Process parameters.
map_size_config(MapSize)
building_density_config(BuildingDensity)
...
return
'''
Example:
Question: Generate a thriving, modern urban center with high-rise buildings and beautiful urban greenery.
Solution: Based on the provided document and your description, here are the parameter values you can use for
generating the map_generate:
```plaintext
[MapSize : [500], BuildingDensity : [0.8], VegetationDensity : [Normal], SidewalkStyle : [Cement], …]
```
Code:
'''python
map_generate(MapSize=500, BuildingDensity=0.8, VegetationDensity="Normal", SidewalkStyle="Cement", ...)
'''
Fig. 11: Prompt example for city layout generation: a template, code snippet, and
specific example.
Generates a cityscape showing a large number of buildings, streets, and trees. Buildings range in style from modern skyscrapers to traditional
low-rise buildings. There are cars moving on the streets, and the green areas represent parks or green spaces.
Generates a cityscape containing multiple skyscrapers, residential buildings, intersections, traffic lights, and other buildings. The buildings are
surrounded by green trees, forming a modern urban scene.
Amidst the urban sprawl, a harmonious blend of cityscape and greenery unfolds, creating a vibrant environment where modern architecture
seamlessly coexists with lush plant life.
Early summer, under the bright sun, the forest is thriving with lush vegetation.
In the center of the majestic canyon, the bright sun casts a warm ray, illuminating the rugged beauty of the rocky terrain, which is dotted
with flowers and plants.
In the wilderness in early autumn, in a dense forest, the soft sunlight bathes the surroundings in warm light, creating a peaceful and warm
atmosphere.
As pure white snow blankets the surroundings, the world transforms into a peaceful winter wonderland, interrupted only by the subtle crunch of
the snow beneath your feet.
Fig. 12: Rendered scene results obtained by inputting text into SceneX : including
three urban and four natural scenes with varied prompts to generate diverse 3D envi-
ronments.
Fig. 13: The image illustrates scenes rendered without applying any materials and
displays partial PCG components used for each scene (presented as a composition of
geometry nodes).