
Toward Automated Programming for Robotic Assembly Using ChatGPT

Annabella Macaluso1, Nicholas Cote2, and Sachin Chitta2

Abstract— Despite significant technological advancements, the process of programming robots for adaptive assembly remains labor-intensive, demanding expertise in multiple domains and often resulting in task-specific, inflexible code. This work explores the potential of Large Language Models (LLMs), like ChatGPT, to automate this process, leveraging their ability to understand natural language instructions, generalize examples to new tasks, and write code. In this paper, we suggest how these abilities can be harnessed and applied to real-world challenges in the manufacturing industry. We present a novel system that uses ChatGPT to automate the process of programming robots for adaptive assembly by decomposing complex tasks into simpler subtasks, generating robot control code, executing the code in a simulated workcell, and debugging syntax and control errors, such as collisions. We outline the architecture of this system and strategies for task decomposition and code generation. Finally, we demonstrate how our system can autonomously program robots for various assembly tasks in a real-world project.

I. INTRODUCTION

The way robots are programmed for adaptive assembly has progressed significantly over the years. Initially, robots were programmed manually, either through a teach pendant or by guiding the robot physically through desired motions. Offline Programming (OLP) software later enabled robots to be programmed, simulated, and optimized on a computer, and Parametric Design workflows further streamlined the process of extracting tool-paths and targets from CAD (Computer Aided Design) geometry. Today, Machine Learning enables robots to adapt to variability in a design or workcell, significantly reducing the need for task-specific programming.

Despite these advancements, the process of programming, testing, and debugging such systems is labor-intensive, time-consuming, and involves a lot of trial and error, resulting in highly specialized code tailored for a specific product. Moreover, it requires deep expertise in multiple domains, such as robotics, perception, manufacturing, and software engineering, which poses a barrier to adoption in industry. While this approach may be suitable for low-mix, high-volume manufacturing, it lacks the flexibility needed to adapt to a diverse range of assembly tasks, highlighting the need for a more general approach.

One wonders: is it possible to automate this process? Recent developments in Large Language Models (LLMs) [1]–[3], like ChatGPT, have shown great promise in answering this question. Specifically, LLMs have shown the capacity to understand and process natural language instructions, the ability to generalize from examples to new tasks, and the ability to write code. We believe these capabilities can be harnessed and applied to real-world challenges in the manufacturing industry and, furthermore, may represent an opportunity to shift the burden of developing adaptive robotic assembly systems from people to LLMs.

In this paper, we present a novel workflow that uses ChatGPT to automate the process of programming robots for adaptive assembly by decomposing complex tasks into simpler subtasks, generating robot control code, executing the code in a simulated workcell, and debugging syntax and control errors, such as collisions. We outline the architecture of our workflow and the strategies we employed for task decomposition and code generation. Finally, we demonstrate how our system can autonomously program robots for various assembly tasks in a simulated real-world project.

Fig. 1. Our workflow utilizes GPT-4's generalization and code-writing abilities to contextualize robotic workcell and CAD information in order to generate code for assembly tasks such as "Assemble the Skateboard Truck".

1 Annabella Macaluso is with the University of California San Diego, La Jolla, CA 92093, USA [email protected]
2 Nicholas Cote and Sachin Chitta are with Autodesk Research (Robotics), Autodesk Inc. [email protected]
II. RELATED WORK

The manufacturing and construction industries are transitioning from traditional methods to digital, computationally-driven design-robotics workflows. This shift is fueled by the rise of increasingly digital workflows [4], [5] that seamlessly integrate computational design methodologies with modern robotic fabrication systems [4], [6]. Integral to these workflows are two-way feedback loops, wherein design goals and manufacturing constraints inform one another. Real-time feedback mechanisms and automated problem-solving strategies during the fabrication process further optimize this process and make it adaptive [7]. This trend leans towards methods that are driven primarily by design data, integrated with CAD software, and which significantly reduce coding and development time [8], [9]. A notable evolution in this area is the incorporation of LLMs into computational modeling and manufacturing [10], laying the groundwork for our research.

LLMs are already making strides in robotics, as in [11]–[13]. In these studies, LLMs serve as language interfaces for real-world robotic applications and scenarios. Studies by [14]–[16] specifically explore tool usage with LLMs. Karpas et al. further suggest integrating LLMs into a system of expert modules and granting them access to external tools to help them solve specific tasks and address their inherent limitations. While GPT-4 [1] is designed to handle multi-modal inputs, its public usage is limited to text-based modalities; thus, overcoming its perceptual, mathematical, and task-specific constraints requires a suite of robotics tools. In [17], Koga et al. introduced a CAD-to-assembly pipeline that provides scripting tools and such a suite of high-level assembly behaviors for designers to plan and automate robotic assembly tasks. This pipeline, enriched by a task-level API, offers a toolkit that code-writing LLMs can utilize.

Despite their challenges, LLMs offer significant promise due to their ability to process natural language, write code, and generalize across diverse tasks. Their proficiency in pattern matching for both text and numeric data without extra fine-tuning makes them even more powerful [18]. Many researchers have demonstrated this ability for robotic applications using task and workcell representations [19]. Our work leans into these strengths in order to decompose complex assembly tasks recursively into manageable subtasks and assembly behavior labels [11] and to write robotic assembly code based on the result. With these advancements in mind, the need for designers or engineers to develop application-level code for manufacturing and construction processes might soon be redundant, with LLMs poised to take on this role.
III. ARCHITECTURE

At a high level, we introduce a multi-agent system that utilizes ChatGPT to generate and test Python scripts for the robotic assembly of an arbitrary design. The term agent in this work refers to a Python class that connects to the OpenAI API, ensures secure interaction with ChatGPT, and stores and maintains the chat history. Agents are herein sub-classed and configured to solve specific problems later on. As others have remarked [1], [20], we found that GPT-4 provides better responses than other models, and we use it solely in this work; the default tuning parameters were also employed. We develop two specialized agents for this workflow, one for task decomposition and one for script generation, discussed in detail later on.

The chat history shows ChatGPT agents what is expected in the response. The entire history is provided to ChatGPT with each prompt; thus, we bootstrap the agent history with contextual information prior to submitting an initial prompt. We also group entries as follows: system guidelines, which include the role the agent is expected to play and rules regarding response content and formatting; task context, which includes the design, workcell constants, reference docs, and examples; and run-time history, which includes responses generated by ChatGPT and feedback provided from simulation. The run-time history grows throughout a session, allowing an agent to iterate and improve upon prior responses. For privacy reasons, certain terms are swapped with a corresponding public or private alias before or after an interaction with the OpenAI API.
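The paper does not publish its implementation, but the agent pattern described above might be sketched roughly as follows; all class, method, and parameter names here are our own illustrative assumptions, not the authors' code:

import os
from openai import OpenAI

class Agent:
    """Illustrative sketch of an agent: wraps the OpenAI API and
    stores and maintains the chat history."""

    def __init__(self, role: str, rules: list[str], task_context: list[str]):
        self.client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
        # System guidelines: the agent's role plus content/formatting rules.
        self.history = [{"role": "system",
                         "content": role + "\n" + "\n".join(rules)}]
        # Task context: design, workcell constants, reference docs, examples.
        self.history += [{"role": "user", "content": c} for c in task_context]

    def ask(self, prompt: str) -> str:
        # The entire history accompanies every prompt; the run-time history
        # grows with each response and each piece of simulation feedback.
        self.history.append({"role": "user", "content": prompt})
        response = self.client.chat.completions.create(
            model="gpt-4", messages=self.history)
        reply = response.choices[0].message.content
        self.history.append({"role": "assistant", "content": reply})
        return reply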
A. CAD to ChatGPT

Although ChatGPT appears to understand natural language assembly procedures and spatial relationships for common objects and assemblies, it is unequipped to handle 3D geometries and standard CAD representations (e.g. STLs). While it is indeed possible to convey some geometrical information to ChatGPT, we observed that presenting a dictionary of assembly information is more useful for code generation. This information is commonly stored by default in the CAD representation of a given assembly and includes individual part names, classes, physical properties, and design poses, as well as meta-information such as part adjacencies, joints, sub-assemblies, and shared origin frames. A subset of this data is then extracted from the CAD model, saved to a JSON file, and provided as text to ChatGPT downstream. To ensure that parts with technical names (i.e. manufacturer-specific serial numbers) are more readable to ChatGPT, we also annotate this file with a brief, General Language Description of each part.
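The paper's exact schema is not published; a dictionary of this kind might look roughly like the following sketch (field names and values are our own illustration, with the part name taken from Table I):

# Hypothetical excerpt of assembly information extracted from CAD to JSON.
assembly_info = {
    "parts": [
        {
            "name": "Kingpin-Bolt-91257A662-Zinc-Plated-Hex-Head-Screw",
            "description": "Kingpin",           # General Language Description
            "class": "fastener",
            "mass_kg": 0.045,                   # physical properties
            "design_pose": [0.0, 0.0, 0.012,    # x, y, z (m)
                            0.0, 0.0, 0.0],     # roll, pitch, yaw (rad)
        },
        # ... one entry per part ...
    ],
    "adjacencies": [["Kingpin", "Baseplate"]],  # meta-information
    "joints": [{"type": "threaded", "parts": ["Kingpin", "Nut"]}],
    "subassemblies": [],
    "origin_frame": "assembly_root",
}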
B. Algorithm

Given a textual representation of a design, the following process generates a set of error-free, simulation-tested Python scripts that can be used for robotic assembly. Note that the algorithm shown doesn't include stop conditions based on the number of failed script generation attempts, errors caused by prior scripts, connection errors with the OpenAI API, and so on.

Initialize a separate thread for the workcell simulation; note that a reference to the workcell will be required later on when executing Python modules. Next, initialize the Task Decomposition Agent (TDA). Presented with the design representation, it infers the assembly process and decomposes it into a sequence of assembly subtasks with corresponding behavior labels. The main thread then enters a loop, continuously checking if all subtasks are completed. Once all subtasks are marked complete, the simulation thread is stopped and the main process ends.
For each iteration of the main loop: the next subtask, its corresponding behavior label, and any errors from prior iterations are acquired. For the acquired subtask, a dedicated Script Generation Agent (SGA) is initialized using the given behavior label. The SGA then enters an inner loop, continuously trying to generate a successful script for the subtask. Whenever an error is caught, this loop continues and the SGA tries to generate a better script.

For each iteration of the SGA inner loop: the SGA generates a Python script string for the specific subtask, behavior label, and error (if present). The string is then saved locally as a Python module, allowing it to be accessed later. The Python module is imported and, if successful, checked for syntax and formatting errors. Then, the module's main function can be called with a reference to the simulated workcell. If the module returns, the subtask is marked as done.

Fig. 2. Example of demonstration provided for few-shot prompting. Provides context on how language output from ChatGPT should be formatted and what a "successful" example looks like. Formatting inspiration taken from VoxPoser [12] and Code as Policies [11].

Algorithm 1 Generate Robotic Assembly Scripts

Require: Textual representation of a design, D
Ensure: Set of tested Python scripts for robotic assembly
 1: simulation ← WorkcellSimulation()
 2: simulation.Start() {separate thread}
 3: TDA ← TaskDecompositionAgent()
 4: subtasks ← TDA.Decompose(D)
 5: while !subtasks.AllDone do
 6:   subtask, label, error ← subtasks.GetNext()
 7:   SGA ← ScriptGenerationAgent(label)
 8:   while true do
 9:     script ← SGA.Write(subtask, error)
10:     fp ← SGA.Save(script)
11:     try:
12:       module ← ImportAndCheckModule(fp)
13:       module.main(simulation.workcell)
14:       subtasks.MarkDone(subtask)
15:       break {script ran without error}
16:     except Exception as e:
17:       error ← e
18:   end while
19: end while
20: simulation.Stop()
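For concreteness, Algorithm 1 translates into Python roughly as follows. The agent and simulation classes mirror the pseudocode and are assumed to be defined elsewhere; import_and_check_module is a minimal sketch of the syntax/format check step, not the authors' implementation:

import importlib.util

def generate_assembly_scripts(design):
    """Sketch of Algorithm 1 (names mirror the pseudocode)."""
    simulation = WorkcellSimulation()
    simulation.start()  # runs in a separate thread
    tda = TaskDecompositionAgent()
    subtasks = tda.decompose(design)

    while not subtasks.all_done():
        subtask, label, error = subtasks.get_next()
        sga = ScriptGenerationAgent(label)
        while True:
            script = sga.write(subtask, error)  # ChatGPT writes the script
            path = sga.save(script)             # persist as a Python module
            try:
                module = import_and_check_module(path)
                module.main(simulation.workcell)  # execute in the simulation
                subtasks.mark_done(subtask)
                break  # script ran without error
            except Exception as e:
                error = e  # feed the error back into the next attempt

    simulation.stop()

def import_and_check_module(path):
    # Minimal dynamic import; a real system would add syntax/format checks.
    spec = importlib.util.spec_from_file_location("generated_script", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module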
Fig. 3. Example TDA input/output. The user inputs a task query and the output is a structured list of subtasks, each with an assigned robotic behavior primitive. In addition to the task query, information about the assembly, the objects that reside in it, grippers, robots, etc. is also provided as context during input.

C. Task Decomposition Agent

The TDA leverages the pattern matching and generalization capabilities of LLMs to break down complex assembly tasks into a sequence of simple subtasks, as opposed to ones requiring detailed or nuanced implementation, and then assigns behavior labels to them. This allows subtask and behavior label pairs to be addressed individually during script generation.

For each assembly subtask, the TDA assigns labels based on common robotic assembly behaviors, such as Move, Pick, Place, and Insert. A task like Assemble Axle might be decomposed into Detect Axle, Pick-Up Axle, Move Axle, and Insert Axle. For each behavior, we supply a high-quality set of demonstrations in the form of Python scripts for few-shot prompting of the SGA later on. Steering the LLMs with few-shot learning allows us to ensure a higher success rate, improve accuracy for downstream tasks, and set an appropriate level of detail for decomposed subtasks. While we assume the topological order of subtasks is generated correctly, this is not necessarily the case, highlighting the need for an additional verification stage in future work.

We formulate the initial chat context as (R, L, S, P, B, E), where R is the agent role, L are the formatting rules, S is the assembly sequence as a dictionary, P is a list of part names, B is a list of available behavior primitives, and E is a set of high-quality examples for few-shot prompting. The user then provides a language description of the task, T (e.g. "Assemble the toy car").

For a simple assembly with 10 parts, the TDA may identify as many as 40-50 subtasks, requiring the use of equally many SGAs. With multiple SGAs working in parallel, this process takes only a few minutes.
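For illustration only (the subtask wording below is ours, not taken from Fig. 3), the structured output for the Assemble Axle example might take a form like:

# Hypothetical TDA output for the query "Assemble Axle"; the real format
# follows the few-shot examples shown in Fig. 2 and Fig. 3.
subtasks = [
    {"subtask": "Detect the axle in the kit",      "behavior": "Detect"},
    {"subtask": "Pick up the axle",                "behavior": "Pick"},
    {"subtask": "Move the axle to the hanger",     "behavior": "Move"},
    {"subtask": "Insert the axle into the hanger", "behavior": "Insert"},
]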
D. Script Generation Agent

The SGA leverages the code-writing, debugging, and text formatting capabilities of LLMs to generate and debug Python scripts for a given subtask and behavior label. To do this, the user provides the subtask and behavior primitive produced by the TDA, e.g. ["Pick the baseplate", "Pick"]. This agent leverages the capabilities of ChatGPT to write Python code and debug it, to format text, and to generalize from example code to generate solutions for specific tasks.
We formulate the initial chat context as (R, L, A, W, D, E), where R is the agent role, L are the scripting rules, A is the assembly context as a dictionary, W is the workcell context as a dictionary, D is reference documentation for the Python API, and E is a user-defined scripting example associated with the provided behavior primitive. The generated script, S, and any syntax errors or runtime exceptions, X, are appended to this context on every iteration of the SGA, allowing the agent to improve upon prior versions: (R, L, A, W, D, E, S1, X1, ..., Sn).

Because each assembly behavior is purposefully succinct, examples often contain only a few lines of code. To increase the diversity of generated code, we provide a few varied examples with different levels of complexity. We observed that agents were more likely to provide a correctly formatted response when formatting is demonstrated in examples, reinforced in an agent's natural language rules, and when prompts appear code-like. We also observed an ability to deduce the behaviors of functions and classes with commonly accepted naming conventions, especially when supplemented with function metadata (e.g. type-hints and docstrings) and detailed error explanations.

It's important to note that a script which runs successfully at the syntax level in a simulated environment isn't guaranteed to be semantically correct. Occasionally, we observed "debugged" motion commands enclosed in a try-except block, which executed without error but failed to complete the subtask, in which case we had to adjust either the prompt or script to achieve success.
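Building on the hypothetical Agent class sketched earlier, the SGA's retry behavior amounts to appending each failure to the context before asking again; the method names below are again our own assumptions:

class ScriptGenerationAgent(Agent):
    """Sketch: generates and iteratively repairs a script for one subtask."""

    def write(self, subtask: str, error: Exception | None = None) -> str:
        if error is None:
            prompt = f"Write a Python script for the subtask: {subtask}"
        else:
            # The chat history already holds (R, L, A, W, D, E, S1, X1, ...),
            # so reporting the latest exception lets ChatGPT produce the
            # next, hopefully improved, script Sn.
            prompt = (f"The previous script raised: {error!r}. "
                      f"Return a corrected script for: {subtask}")
        return self.ask(prompt)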
Fig. 4. Script Generation Agent class architecture. The agent history gets passed to a client that communicates with ChatGPT to generate a script. The script is added to the history and executed in simulation. Then feedback is added to the history and the process repeats.

Fig. 5. Geometry and assembly representations of the skateboard truck.

IV. EXPERIMENTS

Here we use our workflow to assemble a Skateboard Truck consisting of the following parts: Kingpin, Wheel, Bearing, Nut, Base, Axle, and Hanger. These parts are relatively few in number, dimensionally and geometrically diverse, and require various tools and behaviors to assemble. Conveniently, there are numerous online tutorials that describe in layman's terms how to assemble a skateboard truck by hand, to which ChatGPT would have been exposed during training.

We conducted tests on Gripper Selection, Debugging Scripts, and Robotic Assembly to answer questions such as: (1) What processes does the workflow simplify? (2) To what extent does ChatGPT generalize to new, unseen toolsets? (3) What limitations does the workflow run into?

All experiments are conducted using a closed-source robotics simulation platform integrated with Fusion 360. Our workcell consists of two UR-10e robots mounted to a table and equipped with a gripper and camera. Along one side of the table is a tool rack with alternative grippers that can be interchanged as required by the experiment. Between the robots lies a bin-picking station containing either kitted or assorted parts. Directly opposite is an assembly station containing a vice. An overview of the workcell is shown in Fig. 6.

Fig. 6. Digital twin of the workcell in the simulation platform. The workcell contains tool changers hanging off the table, a red kit of parts containing organized skateboard truck pieces, a black vise fixture to hold the skateboard truck pieces, and two UR-10e robots.
A. Gripper Selection

This experiment evaluates the ability of ChatGPT to select the best tool for picking or fastening a part among a varied selection of grippers. Fastening hardware, such as socket head screws, bolts, or nuts (lock-nuts, wing-nuts, etc.), requires a high level of dexterity to grasp and manipulate correctly. We simplify the process by utilizing custom grippers to ensure a secure, successful grasp. We provide the SGA a list of the grippers available, API calls to access tools, and a language description detailing the kind of part each gripper is intended to handle or best suited to grasp. The grippers tested include a Custom Kingpin Gripper, an All-Purpose Gripper, Ratcheting Grippers, and a Custom Baseplate Gripper.
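A minimal sketch of such a tool context, assuming a simple name-to-description mapping (the gripper names come from the experiment; the descriptions are our own paraphrase):

# Hypothetical gripper context provided to the SGA.
grippers = {
    "custom_kingpin_gripper":   "best suited to grasp the kingpin bolt",
    "all_purpose_gripper":      "general-purpose gripper for most parts",
    "ratcheting_gripper":       "for grasping and fastening nuts and bolts",
    "custom_baseplate_gripper": "best suited to grasp the baseplate",
}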
Fig. 7. Examples of ChatGPT choosing the best gripper to pick the part.

We test between a Generic Language Description (GLD) of each part and a CAD-Derived Language Description (DLD) from the CAD model part-name created by the designer or manufacturer. If the result is incorrect, we send the result and history back through the retry loop requesting a different gripper. The success rates (SR) after three trials are shown in Table I. ChatGPT performs well at selecting the correct gripper the first time. In the cases where it doesn't, such as with the kingpin, we found a pass through the retry loop successfully fixed the issue. We observe that part-names inherited from CAD models often contain obscure naming conventions which may make it difficult for ChatGPT to understand the functionality of the part. Thus, as touched on in [22], without keywords and descriptive naming conventions in CAD models, this level of generalization would not be achievable. As a result, adopting conventions that store semantic information within a part-name is incredibly useful for LLM-based workflows.

GLD | GLD SR% | DLD | DLD SR% | SR% w/retry
Kingpin | 100 | Kingpin-Bolt-91257A662-Zinc-Plated-Hex-Head-Screw | 0 | 100
Wheel | 100 | Powell-Peralta-90a-art-bones-wheel | 100 | 100
Bearing | 100 | Hardcore-Bearing | 100 | 100
Nut | 100 | Kingpin-Nut-93298A135-Medium-Strength-Steel-Nylon-Insert-Flange-Locknut | 100 | 100
Base | 100 | Aera-Baseplate-Pneumatic-Fixture-v26 | 100 | 100
Axle | 100 | Aera-Trucks-4140-Axle-+4MM | 100 | 100
Hanger | 100 | Area-K4-Hanger | 100 | 100

TABLE I. Gripper Selection Success Rates (SR)
the same iteration, ChatGPT also incorporated a try/except
We test between a Generic Language Description (GLD) block to handle future runtime exceptions when moving.
of each part and a CAD-Derived Language Description Disappointingly, it did not specify the exact exception raised
(DLD) from the CAD model part-name created by the earlier and the introduction of this block, while making the
designer or manufacturer. If the result is incorrect, we send script more likely to finish, means that the robot may not
the result and history back through the retry loop requesting a move to all 100 positions as originally requested, due to
different gripper. The success rates (SR) after three trials are unreachable positions being skipped. This may indicate a
shown in Table IV-A. ChatGPT performs well at selecting bias in the model towards ensuring code runs without error,
the correct gripper the first time. In the case it doesn’t such even if it may compromise functionality. Following these
as with the kingpin we found a pass through the retry loop changes, the script completed without error.
successfully fixed this issue. We observe that part-names
inherited from CAD models often contain obscure naming
conventions which may make it difficult for ChatGPT to
understand the functionality of the part. Thus, as touched in
[22], without keywords and descriptive naming conventions
in CAD models, this level of generalization would not
be achievable. As a result, adopting conventions that store
semantic information within a part-name is incredibly useful
for LLM based workflows.
GLD SR% DLD SR% SR%
w/retry
Kingpin 100 Kingpin-Bolt-91257A662- 0 100
Zinc-Plated-Hex-Head-Screw
Wheel 100 Powell-Peralta-90a-art-bones- 100 100
Fig. 8. Input (L) and output (R) from SGA for random motion experiment.
wheel
Bearing 100 Hardcore-Bearing 100 100
Nut 100 Kingpin-Nut-93298A135- 100 100
Medium-Strength-Steel-Nylon- C. Robotic Assembly
Insert-Flange-Locknut In this experiment, we explore script generation for an
Base 100 Aera-Baseplate-Pneumatic- 100 100
Fixture-v26 insertion task for one of the the skateboard truck parts after
Axle 100 Aera-Trucks-4140-Axle-+4MM 100 100 the assembly has been processed by the TDA, namely: Place
Hanger 100 Area-K4-Hanger 100 100 Kingpin Bolt on Baseplate.
TABLE I The produced script imports the required modules, defines
G RIPPER S ELECTION S UCCESS R ATES (SR) a main function with workcell as an input parameter,
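For comparison, after the two iterations the repaired script behaved roughly like this reconstruction (again ours, not ChatGPT's verbatim output):

import random

def main(workcell):
    """Move the robot to 100 random positions (after ChatGPT's two fixes)."""
    # raise Exception("Not implemented")  # iteration 1: commented out
    gripper = workcell.gripper
    for _ in range(100):
        print("Generating a wild transform")  # printout left unchanged
        pose = gripper.current_pose()
        # Iteration 2: random range reduced by a factor of 10 ...
        target = [c + random.uniform(-0.1, 0.1) for c in pose]
        try:
            gripper.move_cartesian(target)
        except Exception:
            # ... plus a broad try/except: unreachable poses are skipped
            # silently, so fewer than 100 positions may be visited.
            pass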
and provides a doc-string describing what the function does
and specifies the subtask. It also walks through a series of
Fig. 9. Example of history used to bootstrap a typical SGA on initialization.
Note that what’s shown is a selection, and that entries are significantly longer
in practice.

computational steps to calculate the position and orientation


of the kingpin bolt, a somewhat intricate challenge requiring
matrix multiplication. It appears to do well amalgamating in-
formation in the history, and it accesses the correct constants Fig. 10. Script generated by the SGA: Place Kingpin Bolt on Baseplate
from the assembly and workcell specifications despite being
provided higher-level terms. Before and after placement, the
gripper moves to a retracted position, showing considera- of refining this approach, however, as it has several key
tion for collision avoidance as in the examples. While the limitations.
SGA relies on templates and examples, its capability to While the model demonstrated an impressive ability to
dynamically adjust and produce task-specific code, such as generate code, our experiments highlight areas where careful
for grasping the unique Kingpin and placing it precisely, is human oversight remains crucial, particularly in tasks that
demonstrated in a coherent, functional script. Additionally, demand nuanced understanding of the task and complex
it provides numerous comments throughout the script that spatial reasoning abilities, such as ensuring scripts achieve
separate each distinct stage. their intended outcomes and executing dexterous manipula-
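Fig. 10 contains the full script; the following abbreviated reconstruction conveys its structure only. The workcell accessors and pose utilities are hypothetical, assuming poses as 4x4 homogeneous matrices:

import numpy as np

def main(workcell):
    """Place Kingpin Bolt on Baseplate.

    Moves the grasped kingpin bolt to its design pose relative to the
    baseplate, releases it, and retracts the gripper.
    """
    robot = workcell.robots["robot_1"]

    # Stage 1: compute the bolt's target pose from the baseplate's pose
    # and the bolt's design pose in the assembly (matrix multiplication).
    base_pose = np.asarray(workcell.assembly.pose("Baseplate"))
    bolt_in_base = np.asarray(workcell.assembly.design_pose("Kingpin Bolt"))
    target = base_pose @ bolt_in_base

    # Stage 2: approach from a retracted position to avoid collisions.
    robot.move_to(workcell.retract_pose)
    robot.move_to(target)

    # Stage 3: release the part and retract again.
    robot.gripper.open()
    robot.move_to(workcell.retract_pose)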
When executed in simulation after the preceding script, Pick Kingpin Bolt, the script ran successfully. Subsequently, Insert Part on Kingpin Bolt also ran successfully. Specifically, the initial and final workcell states for this script were compatible with those of the surrounding scripts, allowing the script to function as a harmonious link in a sequence of operations. This is interesting, as the SGA is unaware of the desired states or actions on either side of the script it's generating. It's possible that this information is subtle but apparent in the output of the TDA and chat history, or that such best-practices were seen by ChatGPT during training. Notably, no error checks or success criteria are implemented, and the script assumes certain prerequisites about the workcell, such as the part being available in the designated location.

V. CONCLUSIONS

This research offers a glimpse into the possibility of using LLMs, like ChatGPT, to automate the coding process for robotic assembly tasks, a process traditionally marked by labor intensiveness and the need for expertise. We offer a practical approach to implementing such an automated programming system, and demonstrate its efficacy for basic robotic manufacturing tasks. We recognize the necessity of refining this approach, however, as it has several key limitations.

While the model demonstrated an impressive ability to generate code, our experiments highlight areas where careful human oversight remains crucial, particularly in tasks that demand nuanced understanding of the task and complex spatial reasoning abilities, such as ensuring scripts achieve their intended outcomes and executing dexterous manipulations like re-grasping. These areas, often intuitive for human programmers, show clear gaps in ChatGPT's capabilities for programming successful robotics tasks. By fine-tuning the model on a dataset of prior coding and manufacturing examples, however, there may be an opportunity to not only generalize it for these purposes, but also to reduce the need for the intricate, meticulously configured prompts we employed. While providing robust example code can reduce the likelihood of some errors downstream, real-time debugging of robot programs remains challenging for text-only LLMs, and we're eager to experiment with language models that have innate visual and spatial reasoning skills to overcome these challenges. Looking forward, we're enthusiastic about extending this approach to a broader range of assembly tasks, including those with unique geometries, intricate spatial relationships, and uncommon assembly methods – things that ChatGPT might not have seen during training – pushing the boundaries of what is currently achievable with LLM-driven robot programming.
REFERENCES

[1] OpenAI, "GPT-4 Technical Report," arXiv preprint arXiv:2303.08774, 2023.
[2] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, and G. Lample, "LLaMA: Open and efficient foundation language models," arXiv preprint arXiv:2302.13971, 2023.
[3] D. Zhu, J. Chen, X. Shen, X. Li, and M. Elhoseiny, "MiniGPT-4: Enhancing vision-language understanding with advanced large language models," arXiv preprint arXiv:2304.10592, 2023.
[4] A. Thoma, A. Adel, M. Helmreich, T. Wehrle, F. Gramazio, and M. Kohler, "Robotic fabrication of bespoke timber frame modules," Robotic Fabrication in Architecture, Art and Design 2018, pp. 447–458, 2018.
[5] A. Gandia, S. Parascho, R. Rust, G. Casas, F. Gramazio, and M. Kohler, "Towards automatic path planning for robotically assembled spatial structures," Robotic Fabrication in Architecture, Art and Design 2018, pp. 59–73, 2018.
[6] N. King, N. Melenbrink, N. Cote, and G. Fagerström, "Building the mass lo-Fab Pavilion," Robotic Fabrication in Architecture, Art and Design 2016, pp. 362–373, 2016.
[7] D. Pigram, I. Maxwell, and W. McGee, "Towards real-time adaptive fabrication-aware form finding in architecture," Robotic Fabrication in Architecture, Art and Design 2016, pp. 426–437, 2016.
[8] M. Bechthold and N. King, "Design robotics," Rob|Arch 2012, pp. 118–130, 2013.
[9] P. Eversmann, F. Gramazio, and M. Kohler, "Robotic prefabrication of timber structures: Towards automated large-scale spatial assembly," Construction Robotics, vol. 1, no. 1–4, pp. 49–60, 2017.
[10] L. Makatura, M. Foshey, B. Wang, F. Hähnlein, P. Ma, B. Deng, M. Tjandrasuwita, A. Spielberg, C. E. Owens, P. Y. Chen, A. Zhao, A. Zhu, W. J. Norton, E. Gu, J. Jacob, Y. Li, A. Schulz, and W. Matusik, "How can large language models help humans in design and manufacturing?," arXiv preprint arXiv:2307.14377, 2023.
[11] J. Liang et al., "Code as Policies: Language Model Programs for Embodied Control," 2023 IEEE International Conference on Robotics and Automation (ICRA), London, United Kingdom, 2023, pp. 9493–9500.
[12] W. Huang, C. Wang, R. Zhang, Y. Li, J. Wu, and L. Fei-Fei, "VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models," in Conference on Robot Learning, 2023.
[13] D. Driess et al., "PaLM-E: An embodied multimodal language model," arXiv preprint arXiv:2303.03378, 2023.
[14] Y. Shen et al., "HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face," arXiv preprint arXiv:2303.17580, 2023.
[15] T. Schick et al., "Toolformer: Language models can teach themselves to use tools," arXiv preprint arXiv:2302.04761, 2023.
[16] E. Karpas, O. Abend, Y. Belinkov, B. Lenz, O. Lieber, N. Ratner, Y. Shoham, H. Bata, Y. Levine, K. Leyton-Brown, and D. Muhlgay, "MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning," arXiv preprint arXiv:2205.00445, 2022.
[17] Y. Koga, H. Kerrick, and S. Chitta, "On CAD informed adaptive robotic assembly," 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 2022, pp. 10207–10214.
[18] S. Mirchandani et al., "Large language models as General Pattern Machines," arXiv preprint arXiv:2307.04721, 2023.
[19] I. Singh et al., "ProgPrompt: Generating Situated Robot Task Plans using Large Language Models," 2023 IEEE International Conference on Robotics and Automation (ICRA), London, United Kingdom, 2023, pp. 11523–11530.
[20] W. X. Zhao, K. Zhou, J. Li, T. Tang, X. Wang, Y. Hou, Y. Min, B. Zhang, J. Zhang, Z. Dong, Y. Du, et al., "A survey of large language models," arXiv preprint arXiv:2303.18223, 2023.
[21] S. Vemprala, R. Bonatti, A. Bucker, and A. Kapoor, "ChatGPT for robotics: Design principles and model abilities," arXiv preprint arXiv:2306.17582, 2023.
[22] P. Meltzer, J. G. Lambourne, and D. Grandi, "What's in a Name? Evaluating Assembly-Part Semantic Knowledge in Language Models through User-Provided Names in CAD Files," arXiv preprint arXiv:2304.14275, 2023.
