Macaluso et al. - 2024 - Toward Automated Programming for Robotic Assembly
I. INTRODUCTION
The way robots are programmed for adaptive assembly has progressed significantly over the years. Initially, robots were programmed manually, either through a teach pendant or by guiding the robot physically through desired motions. Offline Programming (OLP) software later enabled robots to be programmed, simulated, and optimized on a computer, and Parametric Design workflows further streamlined the process of extracting tool-paths and targets from CAD (Computer Aided Design) geometry. Today, Machine Learning enables robots to adapt to variability in a design or workcell, significantly reducing the need for task-specific programming.

Despite these advancements, the process of programming, testing, and debugging such systems is labor-intensive, time-consuming, and involves a lot of trial and error, resulting in highly specialized code tailored for a specific product. Moreover, it requires deep expertise in multiple domains, such as robotics, perception, manufacturing, and software engineering, which poses a barrier to adoption in industry. While this approach may be suitable for low-mix, high-volume manufacturing, it lacks the flexibility needed to adapt to a diverse range of assembly tasks, highlighting the need for a more general approach.

One wonders: is it possible to automate this process? Recent developments in Large Language Models (LLMs) [1]-[3], like ChatGPT, have shown great promise in answering this question. Specifically, LLMs have shown the capacity to understand and process natural language instructions, the ability to generalize from examples to new tasks, and the ability to write code. We believe these capabilities can be harnessed and applied to real-world challenges in the manufacturing industry and, furthermore, may represent an opportunity to shift the burden of developing adaptive robotic assembly systems from people to LLMs.

In this paper, we present a novel workflow that uses ChatGPT to automate the process of programming robots for adaptive assembly by decomposing complex tasks into simpler subtasks, generating robot control code, executing the code in a simulated workcell, and debugging syntax and control errors, such as collisions. We outline the architecture of our workflow and the strategies we employed for task decomposition and code generation. Finally, we demonstrate how our system can autonomously program robots for various assembly tasks in a simulated real-world project.

Fig. 1. Our workflow utilizes GPT-4's generalization and code-writing abilities to contextualize robotic workcell and CAD information in order to generate code for assembly tasks such as "Assemble the Skateboard Truck".
1 Annabella Macaluso is with the University of California San Diego, La Jolla, CA 92093, USA [email protected]
2 Nicholas Cote and Sachin Chitta are with Autodesk Research (Robotics), Autodesk Inc. [email protected]

II. RELATED WORK

The manufacturing and construction industries are transitioning from traditional methods to digital, computationally-driven design-robotics workflows. This shift is fueled by the rise of increasingly digital workflows [4], [5] that seamlessly integrate computational design methodologies with modern robotic fabrication systems [4], [6]. Integral to these workflows are two-way feedback loops, wherein design goals and manufacturing constraints inform one another. Real-time feedback mechanisms and automated problem-solving strategies during the fabrication process further optimize this process and make it adaptive [7]. This trend leans towards methods that are driven primarily by design data, integrated with CAD software, and which significantly reduce coding and development time [8], [9]. A notable evolution in this area is the incorporation of LLMs into computational modeling and manufacturing [10], laying the groundwork for our research.

LLMs are already making strides in robotics, as in [11]-[13]. In these studies, LLMs serve as language interfaces for real-world robotic applications and scenarios. Studies by [14]-[16] specifically explore tool usage with LLMs. Karpas et al. further suggest integrating LLMs into a system of expert modules and granting them access to external tools to help them solve specific tasks and address their inherent limitations. While GPT-4 [1] is designed to handle multi-modal inputs, its public usage is limited to text-based modalities; thus, overcoming its perceptual, mathematical, and task-specific constraints requires a suite of robotics tools. In [17], Koga et al. introduced a CAD-to-assembly pipeline that provides scripting tools and such a suite of high-level assembly behaviors for designers to plan and automate robotic assembly tasks. This pipeline, enriched by a task-level API, offers a toolkit that code-writing LLMs can utilize.

Despite their challenges, LLMs offer significant promise due to their ability to process natural language, write code, and generalize across diverse tasks. Their proficiency in pattern matching for both text and numeric data without extra fine-tuning makes them even more powerful [18]. Many researchers have demonstrated this ability for robotic applications using task and workcell representations [19]. Our work leans into these strengths in order to decompose complex assembly tasks recursively into manageable subtasks and assembly behavior labels [11] and to write robotic assembly code based on the result. With these advancements in mind, the need for designers or engineers to develop application-level code for manufacturing and construction processes might soon be redundant, with LLMs poised to take on this role.

III. ARCHITECTURE

At a high level, we introduce a multi-agent system that utilizes ChatGPT to generate and test Python scripts for the robotic assembly of an arbitrary design. The term agent in this work refers to a Python class that connects to the OpenAI API, ensures secure interaction with ChatGPT, and stores and maintains the chat history. Agents are herein subclassed and configured to solve specific problems later on. As others have remarked [1], [20], we found that GPT-4 provides better responses than other models and solely use it in this work. The default tuning parameters were also employed. We develop two specialized Agents for this workflow for task decomposition and script generation, discussed in detail later on.

The chat history shows ChatGPT agents what is expected in the response. The entire history is provided to ChatGPT with each prompt; thus, we bootstrap the agent history with contextual information prior to submitting an initial prompt. We also group entries as follows: system guidelines, which include the role the agent is expected to play and rules regarding response content and formatting; task context, which includes the design, workcell constants, reference docs, and examples; and run-time history, which includes responses generated by ChatGPT and feedback provided from simulation. The run-time history grows throughout a session, allowing an agent to iterate and improve upon prior responses. For privacy reasons, certain terms are swapped with a corresponding public or private alias before or after an interaction with the OpenAI API.

A. CAD to ChatGPT

Although ChatGPT appears to understand natural-language assembly procedures and spatial relationships for common objects and assemblies, it is unequipped to handle 3D geometries and standard CAD representations (e.g., STLs). While it is indeed possible to convey some geometrical information to ChatGPT, we observed that presenting a dictionary of assembly information is more useful for code generation. This information is commonly stored by default in the CAD representation of a given assembly and includes individual part names, classes, physical properties, and design poses, as well as meta-information such as part adjacencies, joints, sub-assemblies, and shared origin frames. A subset of this data is then extracted from the CAD model, saved to a JSON file, and provided as text to ChatGPT downstream. To ensure that parts with technical names (i.e., manufacturer-specific serial numbers) are more readable to ChatGPT, we also annotate this file with a brief General Language Description of each part.

B. Algorithm

Given a textual representation of a design, the following process generates a set of error-free, simulation-tested Python scripts that can be used for robotic assembly. Note that the algorithm shown doesn't include stop conditions based on the number of failed script generation attempts, errors caused by prior scripts, connection errors with the OpenAI API, and so on.

Initialize a separate thread for the workcell simulation; note that a reference to the workcell will be required later on when executing Python modules. Next, initialize the Task Decomposition Agent (TDA). Presented with the design representation, it infers the assembly process and decomposes it into a sequence of assembly subtasks with corresponding behavior labels. The main thread then enters a loop, continuously checking if all subtasks are completed. Once all subtasks are marked complete, the simulation thread is stopped and the main process ends.

For each iteration of the main loop: the next subtask, its corresponding behavior label, and any errors from prior iterations are acquired. For the acquired subtask, a dedicated Script Generation Agent (SGA) is initialized using the given behavior label. The SGA then enters an inner loop, continuously trying to generate a successful script for the subtask. Whenever an error is caught, this loop continues and the SGA tries to generate a better script.

For each iteration of the SGA inner loop: the SGA generates a Python script string for the specific subtask, behavior label, and error (if present). The string is then saved locally as a Python module, allowing it to be accessed later. The Python module is imported and, if successful, checked for syntax and formatting errors. Then, the module's main function can be called with a reference to the simulated workcell. If the module returns, the subtask is marked as done.

Fig. 2. Example of demonstration provided for few-shot prompting. Provides context on how language output from ChatGPT should be formatted and what a "successful" example looks like. Formatting inspiration taken from VoxPoser [12] and Code as Policies [11].
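The nested control flow above can be sketched in Python. The names below (TaskDecompositionAgent, ScriptGenerationAgent, run_assembly, execute) are illustrative stand-ins rather than the authors' actual API, and the agents are stubbed out where the real system would query ChatGPT and run the script against the simulated workcell:

```python
class TaskDecompositionAgent:
    """Stub TDA: decomposes a design into (subtask, behavior label) pairs."""
    def decompose(self, design):
        return [(p["name"], p.get("behavior", "pick_place"))
                for p in design["parts"]]

class ScriptGenerationAgent:
    """Stub SGA: the real agent would prompt ChatGPT for a script string,
    including any prior error in its run-time history."""
    def __init__(self, behavior_label):
        self.behavior_label = behavior_label
        self.attempts = 0

    def generate(self, subtask, error=None):
        self.attempts += 1
        return f"# script for {subtask} ({self.behavior_label}), try {self.attempts}"

def run_assembly(design, execute):
    """Main loop: one SGA per subtask; the inner loop retries on error."""
    tda = TaskDecompositionAgent()
    completed, error = [], None
    for subtask, label in tda.decompose(design):
        sga = ScriptGenerationAgent(label)
        while True:                      # inner loop: no stop condition, as noted
            script = sga.generate(subtask, error)
            try:
                execute(script)          # stands in for import + main(workcell)
                completed.append(subtask)
                break                    # module returned: subtask marked done
            except Exception as exc:
                error = str(exc)         # feedback for the next generation
    return completed
```

As in the algorithm described above, the sketch deliberately omits stop conditions for repeated failures or API errors; a production version would bound the inner loop.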
Fig. 4. Script Generation Agent class architecture. The agent history gets passed to a client that communicates with ChatGPT to generate a script. The script is added to the history and executed in simulation. Then feedback is added to the history and the process repeats.

Fig. 6. Digital twin of workcell in simulation platform. The workcell contains tool changers hanging off the table, a red kit of parts containing organized skateboard truck pieces, a black vise fixture to hold the skateboard truck pieces, and two UR-10e robots.
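The grouped, growing agent history described in Section III might look roughly like the following sketch. The Agent class and its methods are assumptions for illustration, not the authors' implementation, with `client` standing in for the call to the OpenAI API:

```python
class Agent:
    """Illustrative agent: maintains the chat history that is sent in
    full with every prompt."""
    def __init__(self, system_guidelines, task_context):
        # History is bootstrapped with context before the first prompt:
        # system guidelines, then task context (design, constants, examples).
        self.history = [{"role": "system", "content": system_guidelines}]
        self.history += [{"role": "user", "content": c} for c in task_context]

    def prompt(self, text, client):
        """Send the ENTIRE history plus the new prompt; store the reply."""
        self.history.append({"role": "user", "content": text})
        reply = client(self.history)   # stand-in for the OpenAI chat call
        self.history.append({"role": "assistant", "content": reply})
        return reply

    def add_feedback(self, feedback):
        """Run-time history grows with simulation feedback each iteration."""
        self.history.append(
            {"role": "user", "content": f"Simulation feedback: {feedback}"})
```

Because the whole history accompanies every prompt, each retry sees the prior script and its simulation feedback, which is what lets the agent iterate on its own responses.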
A. Gripper Selection

This experiment evaluates ChatGPT's ability to select the best tool for picking or fastening a part among a varied selection of grippers. Fastening hardware, such as socket head screws, bolts, or nuts (lock-nuts, wing-nuts, etc.), requires a high level of dexterity to grasp and manipulate correctly. We simplify the process by utilizing custom grippers to ensure a secure, successful grasp. We provide the SGA a list of the grippers available, API calls to access tools, and a language description detailing the kind of part each gripper is intended to handle or best suited to grasp. The grippers tested include a Custom Kingpin Gripper, an All-Purpose Gripper, Ratcheting Grippers, and a Custom Baseplate Gripper.

We test between a Generic Language Description (GLD) of each part and a CAD-Derived Language Description (DLD) from the CAD model part-name created by the designer or manufacturer. If the result is incorrect, we send the result and history back through the retry loop, requesting a different gripper. The success rates (SR) after three trials are shown in Table I. ChatGPT performs well at selecting the correct gripper the first time. In the case it doesn't, such as with the kingpin, we found a pass through the retry loop successfully fixed the issue. We observe that part-names inherited from CAD models often contain obscure naming conventions, which may make it difficult for ChatGPT to understand the functionality of the part. Thus, as touched on in [22], without keywords and descriptive naming conventions in CAD models, this level of generalization would not be achievable. As a result, adopting conventions that store semantic information within a part-name is incredibly useful for LLM-based workflows.

Fig. 7. Examples of ChatGPT choosing the best gripper to pick the part.

B. Debugging Scripts

In this experiment, we explore script generation and debugging to execute a simple motion task: Move the robot to 100 random positions. We bootstrap the agent history with an example of such a script, which adds a random float to each component of the robot's current pose. The script, however, is intentionally flawed: (1) an early and unnecessary Exception is raised on purpose; (2) the randomness results in unreachable poses; and (3) runtime Exceptions raised by the motion command aren't handled and will cause the script to crash.

The experiment concluded after two iterations. On the first iteration, ChatGPT identified and commented out the Exception in the example script and returned the rest unchanged. When the script was executed, the call to gripper.move_cartesian triggered a MotionException with the note unreachable position, as expected. Provided this exception on the second iteration, ChatGPT reduced the random range by a factor of 10. This is an extremely conservative approach, and the authors would have preferred incremental adjustment and matching edits to the printout "Generating a wild transform". During the same iteration, ChatGPT also incorporated a try/except block to handle future runtime exceptions when moving. Disappointingly, it did not specify the exact exception raised earlier, and the introduction of this block, while making the script more likely to finish, means that the robot may not move to all 100 positions as originally requested, due to unreachable positions being skipped. This may indicate a bias in the model towards ensuring code runs without error, even if it may compromise functionality. Following these changes, the script completed without error.
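For illustration, the intentionally flawed example script from the debugging experiment might look roughly like the following. This is a hedged reconstruction from the description above; the workcell and gripper API names other than move_cartesian are assumptions:

```python
import random

def main(workcell, n=100):
    """Move the robot to `n` random positions (deliberately flawed example)."""
    raise Exception("early, unnecessary exception")  # flaw (1): aborts immediately
    gripper = workcell.get_tool()                    # hypothetical accessor
    for _ in range(n):
        print("Generating a wild transform")
        # flaw (2): large random offsets routinely produce unreachable poses
        pose = [c + random.uniform(-1.0, 1.0) for c in gripper.current_pose]
        # flaw (3): a MotionException here is unhandled and crashes the script
        gripper.move_cartesian(pose)
```

The two-iteration repair described above maps onto this sketch directly: first comment out the raise, then shrink the random range and wrap move_cartesian in try/except.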
TABLE I
GRIPPER SELECTION SUCCESS RATES (SR)

GLD      GLD SR%  DLD                                                                       DLD SR%  SR% w/retry
Kingpin  100      Kingpin-Bolt-91257A662-Zinc-Plated-Hex-Head-Screw                         0        100
Wheel    100      Powell-Peralta-90a-art-bones-wheel                                        100      100
Bearing  100      Hardcore-Bearing                                                          100      100
Nut      100      Kingpin-Nut-93298A135-Medium-Strength-Steel-Nylon-Insert-Flange-Locknut   100      100
Base     100      Aera-Baseplate-Pneumatic-Fixture-v26                                      100      100
Axle     100      Aera-Trucks-4140-Axle-+4MM                                                100      100
Hanger   100      Area-K4-Hanger                                                            100      100

Fig. 8. Input (L) and output (R) from SGA for random motion experiment.

C. Robotic Assembly

In this experiment, we explore script generation for an insertion task for one of the skateboard truck parts after the assembly has been processed by the TDA, namely: Place Kingpin Bolt on Baseplate.

The produced script imports the required modules, defines a main function with workcell as an input parameter, and provides a doc-string describing what the function does and specifying the subtask. It also walks through a series of

Fig. 9. Example of history used to bootstrap a typical SGA on initialization. Note that what's shown is a selection, and that entries are significantly longer in practice.
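Based on that description, a script produced by the SGA for this subtask might have roughly the following shape. Every workcell and gripper call here (get_tool, get_part, pick, release, insertion_pose) is a hypothetical stand-in; only move_cartesian is echoed from the debugging experiment:

```python
def main(workcell):
    """Place the Kingpin Bolt on the Baseplate.

    Subtask: Place Kingpin Bolt on Baseplate. Expects `workcell` to expose
    the kit of parts, fixtures, and a gripper with motion commands.
    """
    gripper = workcell.get_tool("Custom Kingpin Gripper")  # gripper selection
    bolt = workcell.get_part("Kingpin Bolt")
    target = workcell.get_part("Baseplate").insertion_pose
    gripper.pick(bolt)                 # grasp the bolt from the kit of parts
    gripper.move_cartesian(target)     # may raise MotionException in simulation
    gripper.release()                  # returning marks the subtask as done
```

If the call into the simulated workcell raises, the exception text is fed back into the SGA's history and a revised script is generated, as in the algorithm of Section III-B.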